Throughout my time at EXTOL, I’ve been in close contact with the customer base. One topic of conversation that has consistently come to the forefront over the years is understanding exactly how much data your business integration software needs to “consume and digest.” More often than you would expect, that understanding is partial or missing altogether.
The real problem is that if you don’t know how much data you need to process and how it needs to be processed, future decisions will be difficult to make. Even if you believe you understand your data requirements, a review may still be worth your time: it is much cheaper to pay now than later. This year alone, I’ve been involved with about ten customers who decided to review their consumption requirements. The review paid off in every case. Even customers who were confident that they really knew their system came away with valuable information.
How do you truly know what your system needs to consume? In the cases I was involved in, we simply monitored the data input to the system over time (normally a week, sometimes a month) and then charted it. This may not be as difficult as it sounds if your integration implementation already stores enough information to produce a listing of each file processed by the system. If your current implementation doesn’t allow for this, however, gathering the data can be time-consuming.
Once you have a listing of your data, you can chart it. The most useful method is to break the data into types: day, hour, trading partner, and size. For instance, “fifteen EDI 850 transaction sets from ACME between 2:00 AM and 3:00 AM on Friday totaling 6 MB” or “no XML purchase orders between 1:00 PM and 2:00 PM on Sunday.” What you define as a “type” is really unique to your business and the documents you consume. The result will be a spreadsheet that can be used to build very useful charts and graphs.
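To make the charting step concrete, here is a minimal sketch of the aggregation. The record format and sample values are my own assumptions, not a real EXTOL log layout; the idea is simply to bucket each processed file by day, hour, trading partner, and document type, then write a spreadsheet-ready summary.

```python
import csv
from collections import defaultdict
from datetime import datetime

# Hypothetical processed-file records: (timestamp, partner, doc_type, bytes).
# In practice these would come from your integration system's processing log.
records = [
    ("2023-06-02 02:15", "ACME", "EDI 850", 400_000),
    ("2023-06-02 02:40", "ACME", "EDI 850", 410_000),
    ("2023-06-04 14:05", "Globex", "XML PO", 120_000),
]

# Bucket by (weekday, hour, partner, doc_type), tracking file count and total size.
buckets = defaultdict(lambda: {"count": 0, "bytes": 0})
for ts, partner, doc_type, size in records:
    dt = datetime.strptime(ts, "%Y-%m-%d %H:%M")
    key = (dt.strftime("%A"), dt.hour, partner, doc_type)
    buckets[key]["count"] += 1
    buckets[key]["bytes"] += size

# Write the summary as a CSV that opens cleanly in a spreadsheet.
with open("volume_summary.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["day", "hour", "partner", "doc_type", "files", "total_bytes"])
    for (day, hour, partner, doc_type), agg in sorted(buckets.items()):
        writer.writerow([day, hour, partner, doc_type, agg["count"], agg["bytes"]])
```

From a summary like this, a pivot table or bar chart per partner and hour falls out almost for free, which is exactly the “peaks and valleys” view discussed below.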
What do I mean by digesting data? Basically, it’s what you need to do with the data. Now that you can chart exactly how your data breaks down hourly by type and amount, it’s important to consider what needs to happen with it. You should take into consideration such factors as the number of transformations (“one to one” or “one to many”); how complex the transformations will be (number of data moves and manipulations, program calls during transformation); and turnaround times dictated by service level agreements with customers. To illustrate why this is important, consider the difference between the following two statements:
1. I will be receiving 10,000 EDI purchase orders from Acme in hour x.
2. I will be receiving 10,000 EDI purchase orders from Acme in hour x that I need to transform into a database through a ruleset that performs 50 moves and 2 program calls per purchase order. I also need to turn around a functional acknowledgment and an advance ship notice to Acme within an hour or I will lose the business.
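The second statement is something you can actually budget against. Here is a back-of-the-envelope sketch of that budgeting, using the figures from the statement above; every per-operation cost is an illustrative assumption you would replace with measurements from your own system.

```python
# Load estimate for "10,000 purchase orders, 50 moves and 2 program
# calls each, turned around within one hour."
orders_per_hour = 10_000
moves_per_order = 50
calls_per_order = 2

# Assumed per-operation costs -- measure these on your own hardware.
ms_per_move = 0.5      # one data move/manipulation
ms_per_call = 20.0     # one external program call
ms_overhead = 10.0     # per-document parse and routing overhead

ms_per_order = (moves_per_order * ms_per_move
                + calls_per_order * ms_per_call
                + ms_overhead)
total_seconds = orders_per_hour * ms_per_order / 1000

sla_seconds = 3600  # acknowledgment and ship notice due within one hour
verdict = "within" if total_seconds <= sla_seconds else "exceeds"
print(f"Estimated processing time: {total_seconds:.0f} s ({verdict} the SLA)")
```

With these assumed costs the hour of orders takes roughly 750 seconds of processing, leaving headroom; double the moves or quadruple the program calls and the picture changes quickly, which is why the detail in statement 2 matters.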
So, what does all of this information get you? The ability to make good decisions, and confidence in those decisions, in two major areas:
1. Hardware — How much RAM? Hard drive speed and capacity? Processor types, number of cores? Machine virtualization potential? The ability to run other software on the same hardware configuration vs. having a dedicated system?
2. Data — Source and target types? Sizes: average size, largest documents, the percentage of the total that the largest documents account for, messages per interchange in EDI documents? Volumes: average per hour, average per day, peaks and valleys? Turnaround and processing times?

Still not convinced? Here are a few major areas you may not be considering where this information would be priceless:
1. High availability and disaster recovery — How robust is the system that needs to be available or recovered?
2. Backup requirements — How large will your backups be? How long will they take to run? When do you have adequate windows of “quiet time” to perform them?
3. Maintenance — When are the best times to perform unrelated system maintenance requiring machine downtime so that production is minimally affected?
4. Data retention — How much historical data do you need on hand, and can the system handle that amount without degrading performance? Is a data warehouse solution warranted?
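The backup and retention questions above become simple arithmetic once you have the volume figures. The sketch below shows the shape of that calculation; the daily volume, on-disk growth factor, and retention window are all illustrative assumptions to be replaced with your own numbers.

```python
# Rough retention and backup sizing from the charted volume figures.
avg_daily_mb = 250        # average inbound data per day, from the charting exercise
growth_factor = 3.0       # assumed on-disk expansion: indexes, audit trails, logs
retention_days = 90       # assumed retention policy

history_mb = avg_daily_mb * growth_factor * retention_days
history_gb = history_mb / 1024
backup_gb = history_gb    # a full backup copies the whole retained history

print(f"Estimated live history:  {history_gb:.1f} GB")
print(f"Estimated full backup:   {backup_gb:.1f} GB")
```

Pair the backup size with your measured backup throughput and the hourly “quiet time” valleys from the volume chart, and the backup-window question largely answers itself.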
I’ll close with some advice. Always pay attention to your integration monster. Learn its patterns and behaviors. Be friends with the monster or it will turn on you!