Integration Architecture

EXTOL’s Migration Assistant

EXTOL’s Migration Assistant provides a path for users to upgrade, in place, the schemas they are using in a Transformation.  The Migration Assistant is a first-class archivable object that can be found in its own node in the Workbench, and it works for all syntax categories.  Typical use cases are upgrading an EDI version (e.g., a 4010 850 to a 5010 850), upgrading an .xsd (e.g., elements or attributes added or removed from an XML schema, or other structural changes), changes made to a database (e.g., tables or columns added or removed), upgrading a delimited flat file (e.g., fields added or removed), or upgrading a Spreadsheet (e.g., cells or tabs added or removed).

The Migration Assistant is a two-step process.  First, the user creates a map, called a Schema Association Map, between the “Old Schema” and the “New Schema”.  This map is built in a Ruleset-like editor called the Schema Association Map Editor, either manually or with the help of the Auto Association Algorithm.  The Auto Association Algorithm is used by toggling the “Auto-Associate Children” button in the toolbar and dragging a desired source node onto a desired target node; all child nodes of that source are then auto-associated with the child nodes of that target.  The recommended Associations are displayed in an Approval Editor, where the user can review and change them, and the suggestions are ranked by a confidence level.  For schemas that are drastically different, users of the Auto Association Algorithm should start with lower-level source elements they are confident belong with lower-level target elements and work their way up to the root.  For schemas where the changes between the “Old” and “New” schemas are modest (no huge structural changes), it is fine to associate the two roots and let all of their children be auto-associated, but the results should still be reviewed carefully.
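
EXTOL’s actual matching algorithm isn’t documented here, but as a rough, hypothetical illustration of how child-node associations might be suggested and ranked by a confidence level, here is a small Python sketch that simply compares node names with difflib; the node names are made up.

```python
from difflib import SequenceMatcher

# Hypothetical child nodes of an "Old Schema" and a "New Schema" element.
old_children = ["PONumber", "PODate", "ShipTo", "Qty"]
new_children = ["PurchaseOrderNumber", "PODate", "ShipToParty", "Quantity", "UnitPrice"]

def confidence(a: str, b: str) -> float:
    """Crude name-similarity score in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# For each old node, suggest the new node with the highest similarity.
suggestions = []
for old in old_children:
    best = max(new_children, key=lambda new: confidence(old, new))
    suggestions.append((old, best, round(confidence(old, best), 2)))

# Highest-confidence associations first, ready for a review/approval step.
for old, new, score in sorted(suggestions, key=lambda s: -s[2]):
    print(f"{old:10} -> {new:22} confidence={score}")
```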

The second step is to apply the Schema Association Map to the desired Ruleset.  This can be done in a batch mode (used when a schema is referenced by multiple Rulesets) or against a single Ruleset.  To apply it to a single Ruleset, right-click a Ruleset in the Rulesets tab and click Convert Ruleset->Source/Target.  A dialog asks which Schema Association Map to apply; once the proper map is selected, the Ruleset Conversion Editor is displayed.  This is a diff-like editor that lets the user review and apply the changes dictated by the Schema Association Map.  To initiate the batch conversion mode, right-click the desired Schema Association Map and select Convert Rulesets.  A dialog lists all of the Rulesets the selected Schema Association Map can be applied to, and for each conversion the user chooses whether to review the changes in the Ruleset Conversion Editor or to apply the changes dictated by the Schema Association Map automatically.
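
Conceptually, applying a Schema Association Map to a Ruleset amounts to rewriting each rule’s references from old schema nodes to new ones and flagging anything with no association for review.  The sketch below is purely illustrative; the paths, rule structure, and names are hypothetical and not EBI’s internal representation.

```python
# Hypothetical association map: old schema node paths -> new schema node paths.
association_map = {
    "PO/Header/PONumber": "PO/Header/PurchaseOrderNumber",
    "PO/Detail/Qty": "PO/Detail/Quantity",
    # nodes with no association are left for manual review
}

# Hypothetical rules: each maps a source schema node to a target schema node.
ruleset = [
    {"source": "PO/Header/PONumber", "target": "Order/Number"},
    {"source": "PO/Detail/Qty", "target": "Order/Line/Qty"},
    {"source": "PO/Header/Buyer", "target": "Order/Buyer"},
]

converted, needs_review = [], []
for rule in ruleset:
    new_source = association_map.get(rule["source"])
    if new_source is None:
        needs_review.append(rule)                 # surfaced in a review editor
    else:
        converted.append({**rule, "source": new_source})

print(len(converted), "rules converted;", len(needs_review), "left for review")
```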

The hope is that the solution is flexible enough that simple cases require very little of the user, while still handling the really difficult cases.  Hopefully this has been a helpful and informative guide to upgrading your schemas in EBI.

Increasing Productivity With a Local Eclipse Mirror

We use Eclipse pretty heavily here at EXTOL. In addition to using it as our development environment, we also use a lot of the Eclipse platform and tools in our own product. It really helps our productivity both by making it easier for us to write our code, and by allowing us to reuse 3rd party code instead of writing it ourselves.

However, when new releases are made available, there is a whole lot of downloading going on. Each of our developers has to download the new Eclipse install, install it, and then download whatever other features they need from the update sites. Our Internet connection is pretty fast, but it still takes a lot of time. Time that would be better spent doing our actual work.

First, we started keeping the installs in a shared folder. That worked, but only for the installs. One person would download a new release, and then the rest of us would copy it from the shared folder. But that requires someone to do the initial download and save. No specific person is responsible for that, so it isn’t done consistently; sometimes someone does it only to find that someone else already had. It also doesn’t solve the problem of the update sites: after installing, each developer still has to download all of their features from them directly.

The next thing we tried was to set up a local copy of the update sites. Eclipse has a mirroring tool that can replicate an update site to another location. That was progressing pretty well, but then we realized how disjointed the whole thing was: manual downloads for the installs, this mirror for the update sites, and the overhead of managing both.

I started looking into becoming an actual Eclipse mirror site. Hard drives are cheap, so storage space wouldn’t be a problem. But we did not want to be a public mirror, since that would defeat the purpose of trying to save bandwidth. Then, after looking around a bit, I found this page and saw that they accepted requests for internal-only mirrors. That was perfect. I signed up, and a short time later we had access.

Then came the setup. They make it easy for you by using rsync to do the task, and they provide a configuration script to get you started. Since the total size was over 100GB, I had to break the initial sync into chunks spread over a few nights, enabling a few more pieces in that script each night until everything was synced. From then on, it just downloads the changed content each night.
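
For illustration only, a nightly chunked sync along those lines could look something like the sketch below. The rsync module path and subtree names are placeholders; the real values come from the configuration script Eclipse provides once an internal mirror is approved.

```python
import subprocess

# Placeholder source module and local destination; substitute the values from
# the configuration script provided for approved internal mirrors.
SOURCE = "rsync://mirror.example.org/eclipse"
DEST = "/srv/mirrors/eclipse"

# Subtrees enabled so far; add a few more each night until fully synced.
ENABLED = [
    "eclipse/downloads",
    "releases",
    # "technology", "modeling", ...  -> enable on later nights
]

def nightly_sync() -> None:
    for subtree in ENABLED:
        subprocess.run(
            ["rsync", "-rtlv", "--delete",
             f"{SOURCE}/{subtree}/", f"{DEST}/{subtree}/"],
            check=True,
        )

if __name__ == "__main__":
    nightly_sync()
```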

One problem we ran into, though, was that once the whole site was synced, we noticed it was missing some pieces. After further investigation, and an email to the Eclipse webmaster, we found out that what they call the full sync is not actually a true full sync; they cut out some of the subprojects because some mirror sites were worried about the space required. The webmaster offered to put us on the true full sync, which is much larger. But by today’s standards it is still not all that much data: currently about 360GB. My year-old laptop has more than enough space to hold that. I don’t know what people were worried about.

In any case, our true full mirror is now up and running, and it saves a huge amount of time for us. Now when a new release is available, the rsync automatically gets it. Our developers then download it from our local mirror in seconds, instead of the minutes it used to take. The update sites are also pointed to our local mirror, so those downloads also complete in a fraction of the time it used to take. Finally, our automated build process can also use the local mirror. That lets us set up those builds to automatically get updates, but still have a fast turnaround time.

An Overview: Class and Object Design

Class and object design can be tricky.  It’s not really something that can be taught or quantified; the skill is developed through years of experience, and it may be more art than science.  No expert can give an exact formula for how it should be done, but nearly all experts can agree on what a bad design looks like.

A possible starting point is to develop a written problem statement outlining the issue at hand, typically by interviewing the customer or whoever the requirements are coming from.  Once the problem statement is written, you will find that the nouns in the statement tend to become classes and the verbs tend to become operations.  From there, relationships between the classes can be derived: phrases like “is a” can denote inheritance, “has a” can denote composition, “uses” can denote delegation, and so on.
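
As a small, made-up example of that noun/verb analysis, consider the statement “A customer places an order; an order has line items and uses a tax calculator; a premium customer is a customer”:

```python
# Nouns become classes; verbs become operations; relationship phrases become
# structure. Everything here is a made-up illustration, not production design.

class TaxCalculator:
    def tax_for(self, amount: float) -> float:
        return amount * 0.06              # flat rate, purely illustrative


class LineItem:
    def __init__(self, description: str, price: float):
        self.description = description
        self.price = price


class Order:
    def __init__(self, tax_calculator: TaxCalculator):
        self.items: list[LineItem] = []           # "has a" -> composition
        self.tax_calculator = tax_calculator      # "uses"  -> delegation

    def add(self, item: LineItem) -> None:
        self.items.append(item)

    def total(self) -> float:
        subtotal = sum(item.price for item in self.items)
        return subtotal + self.tax_calculator.tax_for(subtotal)


class Customer:
    def place_order(self, order: Order) -> None:  # verb -> operation
        self.last_order = order


class PremiumCustomer(Customer):                  # "is a" -> inheritance
    pass


order = Order(TaxCalculator())
order.add(LineItem("widget", 10.0))
print(order.total())                              # 10.6
```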

A UML diagram should be drawn to model what was discovered in the problem statement.  It is also good to keep a catalogue of design patterns at hand so that a given pattern can be applied to a module of the software system.  Remember that object-oriented software is about loose coupling, encapsulation, and reuse, which design patterns help enforce.  The initial diagram is really more of a guideline than concrete fact.  When implementation begins, new facts will come to light that weren’t considered before and will ultimately force the redesign of some components of the software.  When this happens, the diagram should be updated to reflect the new functionality, creating a cycle that bounces back and forth between implementation and redesign while approaching a final version of the software.
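
As one example from such a catalogue, a Strategy-style arrangement (sketched below with hypothetical names) shows how a pattern keeps a module loosely coupled: the Report class depends only on a Formatter abstraction, so new formats can be added without redesigning it.

```python
from abc import ABC, abstractmethod


class Formatter(ABC):
    @abstractmethod
    def format(self, lines: list[str]) -> str: ...


class PlainTextFormatter(Formatter):
    def format(self, lines: list[str]) -> str:
        return "\n".join(lines)


class HtmlFormatter(Formatter):
    def format(self, lines: list[str]) -> str:
        return "<ul>" + "".join(f"<li>{l}</li>" for l in lines) + "</ul>"


class Report:
    def __init__(self, formatter: Formatter):
        self.formatter = formatter            # encapsulated and swappable

    def render(self, lines: list[str]) -> str:
        return self.formatter.format(lines)


print(Report(HtmlFormatter()).render(["loose coupling", "reuse"]))
```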

This is only a generalized overview of an enormous topic with many different paradigms and many different ways to tackle the problem.  As a closing thought, keep in mind that a design is never really complete; it can always be improved and made more robust, clearer, and more efficient.

Dipping a Toe in the NoSQL Pool

There’s a great deal of data floating around and it has to be put somewhere. In 1970, E. F. Codd proposed the relational model, the foundation of the relational database (RDBMS), to help store it. Data that is *related* can be grouped together in a table. A schema defines the structure of the data within the database and the relationships within it, and SQL is used to manipulate the database and the data it contains. There are many databases that use this model, such as MySQL, PostgreSQL, and Oracle.
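
As a quick refresher, the sketch below uses Python’s built-in sqlite3 module (with made-up tables) to show the pieces just described: a schema that defines structure and relationships, and SQL to manipulate the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The schema defines the structure and the relationship between the tables.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE orders ("
    "  id INTEGER PRIMARY KEY,"
    "  customer_id INTEGER REFERENCES customers(id),"
    "  total REAL)"
)

# SQL manipulates the data...
conn.execute("INSERT INTO customers VALUES (1, 'Acme')")
conn.execute("INSERT INTO orders VALUES (10, 1, 99.50)")

# ...and joins related data back together through the schema's relationships.
for row in conn.execute(
    "SELECT c.name, o.total FROM customers c JOIN orders o ON o.customer_id = c.id"
):
    print(row)   # ('Acme', 99.5)
```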

For many years, relational databases were the cornerstone of applications. Organizations used them as the backend store for their thick clients, and they were an integral part of the LAMP stack used in early web applications. Only recently has our software needed a little extra oomph.

The NoSQL movement promises to fulfill requirements of high availability, horizontal scaling, replication, schemaless design, and complex computational capabilities. It contests the notion that an RDBMS is always the best place to store your data and opens the door to greater freedom when choosing your storage mechanism.

The framework used to evaluate these systems is based on consistency, availability, and partition tolerance (CAP). The CAP theorem was developed by Eric Brewer to formally talk about tradeoffs in highly scalable systems [1]. Like other decisions made in the software world, you can only pick two out of the three criteria.

The NoSQL movement doesn’t subscribe to a particular data model the way RDBMSs do. There are three other models that are part of the crowd (a minimal sketch of each follows the list):

key-value: much like a map that supports put, get, and remove (Redis, Dynamo)

column-oriented: still uses tables like the relational model, but without joins (BigTable)

document-oriented: stores structured documents like JSON or XML (CouchDB)
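
To make the differences concrete, here is a hedged, in-memory sketch of the shape of data in each model; it is not a real client for Redis, BigTable, or CouchDB.

```python
import json

# key-value: a map supporting put, get, and remove
kv = {}
kv["session:42"] = "alice"             # put
user = kv.get("session:42")            # get
kv.pop("session:42", None)             # remove

# column-oriented: rows keyed by id, each holding column families/columns,
# with no joins between "tables"
columns = {
    "orders": {
        "row-10": {"summary": {"customer": "Acme", "total": "99.50"}},
    }
}

# document-oriented: whole structured documents (JSON here) stored by id
documents = {
    "order-10": json.dumps(
        {"customer": "Acme", "items": [{"sku": "A-1", "qty": 3}], "total": 99.5}
    ),
}
print(json.loads(documents["order-10"])["total"])
```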

You may be thinking, “Ok, so what is the best one?” I only wish the answer were that simple. Many different factors go into the choice, and you are not limited to one mechanism per application: you can choose different stores for different types of data and functionality [2, 3].

Structuring your application to take advantage of these data store capabilities requires analysis of your data requirements. You may need fast access or maybe your data is written more than it’s read. Perhaps you need to perform calculations such as map/reduce or graph manipulations. Maybe your data is of the binary variety. And of course, the availability rabbit hole – do you trust your server not to fail when you’ve just been featured on the 6 o’clock news (or Digg)?

While this is a lot to think about, the benefits of charting your way through the NoSQL forest are worth the effort in the long run. Your application will be better positioned to grow, and your maintenance effort may decrease. However, there’s no cause to throw out your SQL books…just yet.

More info & references:

1. http://www.julianbrowne.com/article/viewer/brewers-cap-theorem

2. http://blog.nahurst.com/visual-guide-to-nosql-systems

3. http://blog.heroku.com/archives/2010/7/20/nosql/

4. http://architects.dzone.com/news/nosql-old-wine-new-bottle

AS2, FTP, or VAN: The Race to Zero (Part 2)

With the ever-changing economy, especially now, businesses are looking for ways to reduce costs and become more efficient.  The new technologies (as previously discussed in Part 1 of this blog), coupled with the immediate need for cost cutting, have created the perfect environment to fuel another major shift in business communication.

One factor helping this shift is that the technology is now developed and priced so that even small “Mom and Pop” shops can obtain affordable AS2 and/or FTP solutions.  The falling cost of direct connections will fuel the move away from traditional Value Added Network (VAN) connections toward more AS2 and/or FTP trading-partner connections, and the constant pressure on companies to become more efficient in order to remain competitive will only accelerate it.

Large retailers, such as Walmart, require a “direct” AS2 connection to do EDI business. This represents a key indicator that this shift to AS2 (and/or FTP) is becoming more prominent and recognized.

It is interesting to see this relationship go through another major change.  What we’re actually seeing is communication between trading partners coming full circle.  Initially there were leased lines and individual modems for connecting to trading partners.  Over time this became too expensive and, out of the necessity to reduce communication costs, the VAN was born.  More recently the Internet was introduced, which provided a new and inexpensive means to communicate.  With expanding Internet capabilities, VANs became too expensive and too time consuming to manage, and better, cheaper software was designed to take advantage of these new communication methods.  As a result, we now see more direct connections being established in place of moving data through the traditional VANs.

This shift will require more time…it will not happen overnight.  Instead, it will be a slow migration that will occur over the next decade (possibly longer).  As new business relationships are formed, they will take advantage of these newer technologies; older methods are likely to remain with the VAN service (although the VANs often do provide other solutions and services besides merely the moving of data).  Going forward, implementations will generally see a mix of these connection options until costs and efficiency eventually eliminate those methods that restrict growth.