Archive

Posts Tagged ‘Database’

Dipping a Toe in the NoSQL Pool

August 11th, 2010 Patrick Gombola No comments

There’s a great deal of data floating around and it has to be put somewhere. In 1970, the relational database (RDBMS) was invented to help store it. Data that was *related* could be grouped together in a table. A schema was used to define the structure of the data within the database and the relationships among it. SQL was used to manipulate the database and the data it contained. There are many databases that use this model, such as MySQL, PostgresSQL, and Oracle.

For many years, relational databases were the cornerstones of applications. Organizations used them as the backend store for their thick-clients as well as being an integral part of the LAMP stack used in early web applications. Only recently has our software needed a little extra oomph.

The NoSQL movement promises to fulfill requirements of high availability, horizontal scaling, replication, schemaless design, and complex computational capabilities. It contests the notion that RDBMS are always the best place to store your data and opens the doors to greater freedom when choosing your storage mechanism.

The framework used to evaluate these systems is based on consistency, availability, and partition tolerance (CAP). The CAP theorem was developed by Eric Brewer to formally talk about tradeoffs in highly scalable systems1. Like other decisions made in the software world, you can only pick two out of the three criteria.

The NoSQL movement doesn’t subscribe to a particular data model like RDBMS do. There are three other models that are part of the crowd:

key-value: much like a map that supports put, get, and remove (Redis, Dynamo)

column-oriented: still uses tables like the relational model, but without joins (BigTable)

document-oriented: stores structured documents like JSON or XML (CouchDB)

You may be thinking, “Ok, so what is the best one?” I only wish the answer was that simple. Many different factors go into choosing and you are not limited to one mechanism per application. You can choose different stores for different types of data and functionality.2, 3

Structuring your application to take advantage of these data store capabilities requires analysis of your data requirements. You may need fast access or maybe your data is written more than it’s read. Perhaps you need to perform calculations such as map/reduce or graph manipulations. Maybe your data is of the binary variety. And of course, the availability rabbit hole – do you trust your server not to fail when you’ve just been featured on the 6 o’clock news (or Digg)?

While this is a lot to think about, the benefits of charting your way through the NoSQL forest are worth the effort in the long run. Your application will be better suited to expandability and your maintenance efforts may be decreased. However, there’s no cause to throw out your SQL books…just yet.

More info & references:

1. http://www.julianbrowne.com/article/viewer/brewers-cap-theorem

2. http://blog.nahurst.com/visual-guide-to-nosql-systems

3. http://blog.heroku.com/archives/2010/7/20/nosql/

4. http://architects.dzone.com/news/nosql-old-wine-new-bottle

Categories: Data Integration Tags: ,

Supercharging Your SQL Statements

February 4th, 2010 Fred Winkler No comments

Structured Query Language (SQL) is a database computer language designed for managing data within relational databases. The most common applications for SQL are actions (calls) on the database using SELECT, UPDATE, DELETE and INSERT (data) functions.  SQL can provide many other database calls that will contribute to the reduction of user programs.   SQL includes built-in functionality that can help format and calculate data during any of these database actions/calls.  Below are some examples of using SQL functions to accomplish these processes. Read more…

Dealing with the Pain of Version Upgrades

February 2nd, 2010 Andrew Knott No comments

If you’ve ever dealt with changes to a working version of a schema, whether it is database, EDI, XML, or whatever format your data may be in, then you know how painful it is.  In most shops in the typical data processing scenario, either a tool or a custom program is used to process the data in one format and convert it to another format to be piped off for further processing somewhere else.  The most difficult to deal with example can be changes to an XML schema.  The reason is that XML is so extensible and just about anything can be done with it.  The contrasting example would be EDI data where the changes are usually miniscule and the structure itself does not vastly change.  The typical example that most IT shops face is a change in a database which could be the addition of a table or column, the deletion of a table or column, the moving of a table or column, or a change in table/column properties.

If we look at this from the perspective of a model, a schema is really a tree or graph (depending on whether it’s recursive) with entities representing the schema structure. Read more…

Distributed Transactions

January 14th, 2010 Jason Barkanic No comments

Today we are going to talk about distributed transactions, a common component of enterprise systems. First some definitions. Looking through past EXTOL blogs, the term transaction is used in many different ways. You might have specific meanings for the word in your own business. The kinds of transactions I’m talking about here are those at the software component level. These transactions may or may not mirror a higher level business transaction. There are many locations inside a software system where data is exchanged between components, but we usually don’t describe them all with the word ‘transaction’. The term is usually reserved for an exchange were certain conditions are maintained. These extra conditions are nearly always the principles of ACID [A]tomicity, [C]onsistency, [I]solation, and [D]urability. You can look up the details on Wikipidia, but in short, these ideas are important to keep ‘bad’ things from happening as the exchange is performed (such as interference from other actions executing in the system, unexpected crashes, etc…). Read more…

Give your Database a Makeover: Create a View

December 29th, 2009 DJ Zimniski No comments

When setting up a process to translate data from your backend database to another format, it is common to realize that your source DB data is too disparate with how your target data format needs to be mapped/structured. Introducing database views can often simplify integration between two formats and reduce or eliminate the need for external programming.

A view is simply a named select statement that is stored in a database as an object. Using a view can simplify the mapping process by renaming fields (rows) to a purpose more easily understood.   Read more…

“Extend” or “Expand” the Database?

September 21st, 2009 Troy Lunt No comments

Standards only remain “standard” until the next change. Even in the B2B EDI world where trading partner data exchanges are “supposed to be” standardized, new additions and upgrades to later versions will create an opportunity to re-evaluate how data is being moved and stored for company consumption.

Traditional EDI transactions are built upon the “Header” and “Detail” premise where a single iteration of header information could be followed by many iterations of detail information. This makes it simple to build a database in support of this structure – a single transaction would see one record moved into the header file, and one or more records moved into the detail file. Header and detail would be linked together through common key values and sequential controls.

As requirements mature, changes are made and generally new information is presented. This could come by way of additional (single or multiple pieces of) data that could be stored in either the header or detail file. Trouble begins to brew when this new data is defined [by itself] as repeating information. Read more…