
Archive for the ‘Data Integration’ Category

Small- versus Large-Scale Provisioning

September 8th, 2010 Jim OLeary

As applied to business integration, the term “provisioning” has many meanings, but in general it refers to the process of defining integration endpoints and establishing connections and integration processes between them. If an integration service that connects a pair of endpoints is simple and tightly constrained – for example, a data syndication service with a fixed process model and limited output options – provisioning can be as simple as selecting from a fixed list of connection and data format delivery options and specifying the delivery endpoint’s address.
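The simple, tightly constrained case can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not EXTOL's provisioning interface: the connection options, the format options, the `Subscription` type, and the `provision` function are all hypothetical. It shows how a constrained service reduces provisioning to choosing from fixed lists and supplying an address.

```python
from dataclasses import dataclass
from enum import Enum


class Connection(Enum):
    # Hypothetical fixed list of connection options offered by the service.
    FTP = "ftp"
    AS2 = "as2"
    HTTPS = "https"


class DataFormat(Enum):
    # Hypothetical fixed list of delivery formats.
    CSV = "csv"
    XML = "xml"
    X12 = "x12"


@dataclass(frozen=True)
class Subscription:
    # Everything needed to provision one delivery endpoint.
    connection: Connection
    data_format: DataFormat
    address: str


def provision(connection: str, data_format: str, address: str) -> Subscription:
    """Provision a constrained syndication service: pick fixed options, supply an address."""
    if not address.strip():
        raise ValueError("delivery endpoint address is required")
    # Looking up an Enum by value raises ValueError for anything outside the fixed list.
    return Subscription(Connection(connection), DataFormat(data_format), address)
```

Because the options are enumerations, an unsupported connection or format is rejected immediately; the more tailored provisioning described next needs far more than three inputs.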

In most cases, however, business integration provisioning involves more steps, because the business problem to be solved requires tailored integration between some set of sources and targets, e.g., integration of an XML transaction set with a Warehouse Management System. Those steps might include defining or specifying endpoints, communication and interface connections, documents/messages and envelopes, data routing, business processes, and data transformations. By combining building blocks that implement these object types, you can solve most kinds of business-to-business, application, and data integration problems. Read more…

Dipping a Toe in the NoSQL Pool

August 11th, 2010 Patrick Gombola

There’s a great deal of data floating around, and it has to be put somewhere. In 1970, the relational model, the foundation of the relational database management system (RDBMS), was introduced to help store it. Data that was *related* could be grouped together in a table. A schema defined the structure of the data within the database and the relationships within it, and SQL was used to manipulate the database and the data it contained. Many databases use this model, such as MySQL, PostgreSQL, and Oracle.

For many years, relational databases were the cornerstones of applications. Organizations used them as the back-end store for thick-client applications and as an integral part of the LAMP stack used in early web applications. Only recently has our software needed a little extra oomph.

The NoSQL movement promises to meet requirements for high availability, horizontal scaling, replication, schemaless design, and complex computational capabilities. It contests the notion that an RDBMS is always the best place to store your data and opens the door to greater freedom in choosing a storage mechanism.

A common framework for evaluating these systems is based on consistency, availability, and partition tolerance (CAP). Eric Brewer proposed the CAP theorem as a formal way to discuss tradeoffs in highly scalable distributed systems [1]. As with many decisions in the software world, you can only pick two of the three: a distributed system cannot guarantee all three properties at the same time.

Unlike RDBMSs, which all share the relational model, the NoSQL movement doesn’t subscribe to a single data model. Three common NoSQL data models are:

key-value: much like a map that supports put, get, and remove (Redis, Dynamo)

column-oriented: still uses tables like the relational model, but without joins (BigTable)

document-oriented: stores structured documents like JSON or XML (CouchDB)
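To make the difference between the first and third models concrete, here is a toy in-memory sketch in Python. It says nothing about how Redis, Dynamo, or CouchDB are implemented or accessed; the class and method names are hypothetical. A key-value store treats each value as opaque and can only reach it by key, while a document store keeps structured documents and can look inside them.

```python
import json


class KeyValueStore:
    """Toy key-value store: a map supporting put, get, and remove."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value  # the value is opaque to the store

    def get(self, key, default=None):
        return self._data.get(key, default)

    def remove(self, key):
        self._data.pop(key, None)  # removing a missing key is a no-op


class DocumentStore:
    """Toy document store: values are JSON documents the store can query by field."""

    def __init__(self):
        self._docs = {}

    def put(self, doc_id, doc):
        self._docs[doc_id] = json.dumps(doc)  # stored as serialized JSON text

    def get(self, doc_id):
        raw = self._docs.get(doc_id)
        return None if raw is None else json.loads(raw)

    def find(self, field, value):
        # Unlike the key-value store, this store can inspect document contents.
        docs = (json.loads(raw) for raw in self._docs.values())
        return [doc for doc in docs if doc.get(field) == value]
```

A column-oriented store is harder to capture in a toy like this, since its benefits come from how data is laid out on disk and distributed across machines.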

You may be thinking, “OK, so which one is best?” I only wish the answer were that simple. Many different factors go into the choice, and you are not limited to one mechanism per application. You can choose different stores for different types of data and functionality [2, 3].

Structuring your application to take advantage of these data store capabilities requires analysis of your data requirements. You may need fast access or maybe your data is written more than it’s read. Perhaps you need to perform calculations such as map/reduce or graph manipulations. Maybe your data is of the binary variety. And of course, the availability rabbit hole – do you trust your server not to fail when you’ve just been featured on the 6 o’clock news (or Digg)?

While this is a lot to think about, the benefits of charting your way through the NoSQL forest are worth the effort in the long run. Your application will be better able to scale, and your maintenance effort may decrease. However, there’s no cause to throw out your SQL books…just yet.

More info & references:

1. http://www.julianbrowne.com/article/viewer/brewers-cap-theorem

2. http://blog.nahurst.com/visual-guide-to-nosql-systems

3. http://blog.heroku.com/archives/2010/7/20/nosql/

4. http://architects.dzone.com/news/nosql-old-wine-new-bottle

Categories: Data Integration

Creating XML in MSSQL

May 26th, 2010 Dan Brown

In my last blog post I briefly touched upon storing XML documents in MSSQL. Now let’s discuss creating them instead, specifically with the FOR XML clause in MSSQL. While similar functionality exists in other database systems such as Oracle and DB2, their implementations differ and warrant a separate blog post.

Most of us are familiar with selecting data using a SQL command and retrieving some sort of result set. With XML used so frequently for a wide range of applications, it is not uncommon for a programmer to want to build an XML message out of data stored in a database. Instead of running SQL queries and looping over the result sets, in MSSQL you can use the FOR XML clause to return that data already formatted as XML. Read more…
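For comparison, here is the approach that FOR XML saves you from writing: run a query, loop over the result set, and build the XML by hand. This Python sketch uses the standard-library `sqlite3` and `xml.etree.ElementTree` modules with a hypothetical `customers` table. In MSSQL, a query along the lines of `SELECT id, name FROM customers FOR XML PATH('Customer'), ROOT('Customers')` should produce a similar element-per-column document directly from the server.

```python
import sqlite3
import xml.etree.ElementTree as ET


def rows_to_xml(conn, query, root_tag, row_tag):
    """Run a query and build XML by looping over the result set."""
    cursor = conn.execute(query)
    columns = [description[0] for description in cursor.description]
    root = ET.Element(root_tag)
    for row in cursor:
        row_el = ET.SubElement(root, row_tag)
        for name, value in zip(columns, row):
            if value is None:
                continue  # this sketch simply omits NULL columns
            ET.SubElement(row_el, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])

xml_text = rows_to_xml(conn, "SELECT id, name FROM customers ORDER BY id", "Customers", "Customer")
print(xml_text)
```

Every element name and nesting decision lives in application code here; pushing that work into the query is the appeal of FOR XML.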

Data Integration 101 Using the EXTOL Business Integrator

May 20th, 2010 Mike Coyle

Data Integration is defined as “the combining of fragmented data residing in different sources and locations which are aligned to support business goals”. There are many reasons to bring data of different types (flat files, DB2, or even spreadsheets), possibly residing on different servers, to one main location to be integrated. If you do Electronic Data Interchange (EDI), translating data between an EDI fixed format and application variable format files, then you are already solving one piece of the data integration puzzle. Read more…
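As a small illustration of that definition, the Python sketch below combines fragments of item data from two hypothetical sources, a fixed-width flat file and a CSV spreadsheet export, into one record per item. The layouts, field names, and functions are assumptions made for the example, not EXTOL Business Integrator features.

```python
import csv
import io


def load_fixed_width(text):
    """Hypothetical fixed-width inventory file: item (8 chars) + on-hand qty (5 chars)."""
    records = {}
    for line in text.splitlines():
        if line.strip():
            records[line[0:8].strip()] = {"on_hand": int(line[8:13])}
    return records


def load_csv(text):
    """Hypothetical spreadsheet export with 'item' and 'price' columns."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["item"]: {"price": float(row["price"])} for row in reader}


def integrate(*sources):
    """Combine per-item fragments from several sources into one record per item."""
    combined = {}
    for source in sources:
        for key, fields in source.items():
            combined.setdefault(key, {}).update(fields)
    return combined
```

Items that appear in only one source keep only that source's fields, which makes gaps between systems easy to spot.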

SQL and XML

May 13th, 2010 Dan Brown

In my previous blog post I talked about the SQL standard. It is tempting to visualize a standard as a list of rules nailed to a wall. However, things in this industry have a habit of becoming a moving target. Read more…

Best Practices for Mapping: Application Files and Fields

May 11th, 2010 Andrew Mihalick

Successful EDI implementations begin with developing and consistently applying efficient, “best practice” object naming conventions. Doing so avoids aggravation and redevelopment later; “Do it correctly the first time” is especially relevant advice here. Good naming conventions pay off most when you create the files (tables) that store EDI data, that is, when you implement and deploy the EDI interface/staging files that support both inbound and outbound EDI transactions.

Read more…

Integrating Data Into Business Knowledge

April 8th, 2010 Joe Wood

To understand and fully appreciate the benefits of Data Integration, we must first ask the question, “What is data?” Data is “information”; it comes in many flavors and formats and serves many purposes. Data could be the contents of a spreadsheet; it could be the contents of a single “cell” within that spreadsheet.

My experience is with the implementation of data, particularly data sent to or received from a trading partner/customer (considered Business-to-Business, or “B2B”, data, such as “EDI”). Data integration focuses on the practical business use of this information and not necessarily the formats in which the data is stored (although those formats do affect how the information is eventually processed and interpreted). Read more…

EXTOL Business Integrator: Dealing with Proprietary Flat File Data (Part 2)

April 6th, 2010 Jeff Barlow

Continuing from my previous blog post on proprietary flat files, let’s look at another complexity of flat files that EXTOL Business Integrator (EBI) includes a tool to address: repeating blocks of records. EBI’s record fragment allows you to handle multiple format flat files in which record formats repeat in blocks or groups.

To illustrate the issue of repeating blocks of records, consider the following record occurrence where the record names are A, B, and C… Read more…
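The excerpt stops before its example, so the following Python sketch makes its own assumptions: the record type is the first character of each line, and every repeating block starts with an A record followed by any number of B and C records. It illustrates the general idea of grouping records into blocks; it is not how EBI’s record fragment is configured, and `group_blocks` is a hypothetical helper.

```python
def group_blocks(lines, block_start="A"):
    """Group flat-file records into repeating blocks.

    Assumes the record type is the first character of each line and that
    every block begins with a record of type `block_start`.
    """
    blocks = []
    for line in lines:
        if not line:
            continue  # skip blank lines
        if line[0] == block_start:
            blocks.append([line])  # a start record opens a new block
        elif blocks:
            blocks[-1].append(line)  # other records belong to the current block
        else:
            raise ValueError(f"record {line!r} appears before the first {block_start} record")
    return blocks


records = ["A001", "B001", "C001", "C002", "A002", "B002", "C003"]
for block in group_blocks(records):
    print(block)
```

A production layout would also validate the order and number of records within each block; this sketch only groups them.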

SQL: The Non-standard Standard

April 1st, 2010 Dan Brown

In a previous blog post titled “Supercharging Your SQL Statements,” Fred Winkler touched upon some interesting capabilities inherent to the SQL query language. It is an unfortunate fact that while SQL as a standard is widely accepted, the actual implementation of that standard varies considerably between vendors.

Variation from the standard is not necessarily a bad thing, although it can be quite confusing to those who aren’t expecting it. Let’s look at the TRIM example Fred used.

SELECT TRIM(field_name) FROM table_name

That should be easy enough; however, MSSQL doesn’t support the TRIM function. Instead, you have to use LTRIM (Left Trim), RTRIM (Right Trim), or a combination of both.

(MSSQL Syntax)
SELECT LTRIM(RTRIM(field_name)) FROM table_name

Not the most pleasant expression to work with, but it gets the job done. This is a good example of how deviation from the standard can make the resulting script harder to read. Read more…
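You can check that the nested form behaves like a single TRIM call for leading and trailing spaces using SQLite, which ships with Python and supports TRIM as well as LTRIM and RTRIM. The table and column names are hypothetical, and SQLite is only a stand-in for MSSQL here. Note that SQL Server 2017 and later versions do provide a TRIM function; the LTRIM(RTRIM(...)) workaround matters for earlier versions.

```python
import sqlite3

# In-memory SQLite database with a hypothetical table and column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (field_name TEXT)")
conn.executemany(
    "INSERT INTO table_name (field_name) VALUES (?)",
    [("  left",), ("right  ",), ("  both  ",)],
)

# Compare the single TRIM call with the nested LTRIM(RTRIM(...)) form.
rows = conn.execute(
    "SELECT TRIM(field_name), LTRIM(RTRIM(field_name)) FROM table_name ORDER BY rowid"
).fetchall()

for trimmed, nested in rows:
    print(repr(trimmed), repr(nested))
```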

Categories: Data Integration

EXTOL Business Integrator: Dealing with Proprietary Flat File Data (Part 1)

March 18th, 2010 Jeff Barlow

In my next two blog posts, I will discuss a common challenge facing EXTOL users: handling proprietary flat file data received from trading partners. Flat files are becoming more popular and, more importantly, are being forced on users by their trading partners. We’ve seen a trend in which you either handle the data in the format in which it is presented or lose the business. Another driver of the increase in flat file processing is the need to accommodate smaller “Mom & Pop” shops that lack the means to provide data in a better format.

Let’s explore the different flavors of flat file data. Flat file data falls into one of two format types: single format or multiple format. Single format files have one common record layout throughout the entire payload. This means, for example, that every record in the payload has exactly the same layout, every field is identified in the same manner in every record, and every record is treated the same way in the pending data transformation. Multiple format files contain more than one record layout in the payload. This means that multiple record layouts must be identified and treated as different records in the pending transformation. Read more…
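The difference between the two format types shows up in how a parser chooses a layout. In the Python sketch below, a single format file applies one layout to every line, while a multiple format file first identifies each record by a type code and then applies that type’s layout. The one-character identifier in column 1, the H/D record types, the field positions, and the function names are all hypothetical.

```python
# Hypothetical fixed-width layouts: lists of (field name, start, end) slices.
ORDER_LAYOUTS = {
    "H": [("type", 0, 1), ("order_no", 1, 9), ("date", 9, 17)],  # header record
    "D": [("type", 0, 1), ("item", 1, 11), ("qty", 11, 16)],     # detail record
}


def slice_fields(line, layout):
    """Cut one fixed-width line into named, whitespace-stripped fields."""
    return {name: line[start:end].strip() for name, start, end in layout}


def parse_single_format(text, layout):
    """Every record shares one layout, so no identification step is needed."""
    return [slice_fields(line, layout) for line in text.splitlines() if line.strip()]


def parse_multiple_format(text, layouts):
    """Identify each record by its type code in column 1, then apply that layout."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue
        layout = layouts.get(line[:1])
        if layout is None:
            raise ValueError(f"unknown record type in {line!r}")
        records.append(slice_fields(line, layout))
    return records


sample = "H0000123420100318\nDWIDGET-00100005\n"
for record in parse_multiple_format(sample, ORDER_LAYOUTS):
    print(record)
```

Raising an error on an unknown record type is a deliberate choice in this sketch: silently skipping records is a common source of missing data in flat file processing.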