Tag Archives: Database

Assessing the Intermediate Database

The types of databases employed can often determine whether a secondary database is necessary.  This secondary database is typically referred to as a staging or intermediate database, because it resides outside the base application.

Should the company “Enterprise Resource Planning” (“ERP”) application support all required business transactions – both inbound and outbound with customers – then the need to have an intermediate database is lessened.  However, the company ERP might provide support for “core” business transactions but might be limited for “extended” business transactions.  This creates a business problem – where to store the extended business transaction data. Continue reading

Dipping a Toe in the NoSQL Pool

There’s a great deal of data floating around and it has to be put somewhere. In 1970, the relational database (RDBMS) was invented to help store it. Data that was *related* could be grouped together in a table. A schema was used to define the structure of the data within the database and the relationships among it. SQL was used to manipulate the database and the data it contained. There are many databases that use this model, such as MySQL, PostgresSQL, and Oracle.

For many years, relational databases were the cornerstones of applications. Organizations used them as the backend store for their thick-clients as well as being an integral part of the LAMP stack used in early web applications. Only recently has our software needed a little extra oomph.

The NoSQL movement promises to fulfill requirements of high availability, horizontal scaling, replication, schemaless design, and complex computational capabilities. It contests the notion that RDBMS are always the best place to store your data and opens the doors to greater freedom when choosing your storage mechanism.

The framework used to evaluate these systems is based on consistency, availability, and partition tolerance (CAP). The CAP theorem was developed by Eric Brewer to formally talk about tradeoffs in highly scalable systems1. Like other decisions made in the software world, you can only pick two out of the three criteria.

The NoSQL movement doesn’t subscribe to a particular data model like RDBMS do. There are three other models that are part of the crowd:

key-value: much like a map that supports put, get, and remove (Redis, Dynamo)

column-oriented: still uses tables like the relational model, but without joins (BigTable)

document-oriented: stores structured documents like JSON or XML (CouchDB)

You may be thinking, “Ok, so what is the best one?” I only wish the answer was that simple. Many different factors go into choosing and you are not limited to one mechanism per application. You can choose different stores for different types of data and functionality.2, 3

Structuring your application to take advantage of these data store capabilities requires analysis of your data requirements. You may need fast access or maybe your data is written more than it’s read. Perhaps you need to perform calculations such as map/reduce or graph manipulations. Maybe your data is of the binary variety. And of course, the availability rabbit hole – do you trust your server not to fail when you’ve just been featured on the 6 o’clock news (or Digg)?

While this is a lot to think about, the benefits of charting your way through the NoSQL forest are worth the effort in the long run. Your application will be better suited to expandability and your maintenance efforts may be decreased. However, there’s no cause to throw out your SQL books…just yet.

More info & references:

1. http://www.julianbrowne.com/article/viewer/brewers-cap-theorem

2. http://blog.nahurst.com/visual-guide-to-nosql-systems

3. http://blog.heroku.com/archives/2010/7/20/nosql/

4. http://architects.dzone.com/news/nosql-old-wine-new-bottle

Supercharging Your SQL Statements

Structured Query Language (SQL) is a database computer language designed for managing data within relational databases. The most common applications for SQL are actions (calls) on the database using SELECT, UPDATE, DELETE and INSERT (data) functions.  SQL can provide many other database calls that will contribute to the reduction of user programs.   SQL includes built-in functionality that can help format and calculate data during any of these database actions/calls.  Below are some examples of using SQL functions to accomplish these processes. Continue reading

Dealing with the Pain of Version Upgrades

If you’ve ever dealt with changes to a working version of a schema, whether it is database, EDI, XML, or whatever format your data may be in, then you know how painful it is.  In most shops in the typical data processing scenario, either a tool or a custom program is used to process the data in one format and convert it to another format to be piped off for further processing somewhere else.  The most difficult to deal with example can be changes to an XML schema.  The reason is that XML is so extensible and just about anything can be done with it.  The contrasting example would be EDI data where the changes are usually miniscule and the structure itself does not vastly change.  The typical example that most IT shops face is a change in a database which could be the addition of a table or column, the deletion of a table or column, the moving of a table or column, or a change in table/column properties.

If we look at this from the perspective of a model, a schema is really a tree or graph (depending on whether it’s recursive) with entities representing the schema structure. Continue reading

Distributed Transactions

Today we are going to talk about distributed transactions, a common component of enterprise systems. First some definitions. Looking through past EXTOL blogs, the term transaction is used in many different ways. You might have specific meanings for the word in your own business. The kinds of transactions I’m talking about here are those at the software component level. These transactions may or may not mirror a higher level business transaction. There are many locations inside a software system where data is exchanged between components, but we usually don’t describe them all with the word ‘transaction’. The term is usually reserved for an exchange were certain conditions are maintained. These extra conditions are nearly always the principles of ACID [A]tomicity, [C]onsistency, [I]solation, and [D]urability. You can look up the details on Wikipidia, but in short, these ideas are important to keep ‘bad’ things from happening as the exchange is performed (such as interference from other actions executing in the system, unexpected crashes, etc…). Continue reading