Spreadsheets are a little bit like hammers: you can use them for almost anything. And, they’re a relatively simple mechanism so what could be hard about them…right? Humans seem to have a propensity to find new ways of complicating technology, and I think most of you have probably guessed that that applies to spreadsheets, too. I don’t expect that will ever change.
Spreadsheets are a natural choice for people who don’t want (or don’t have the time) to invest in more formalized options like EDI or XML. They’ve gotten a bad rap, perhaps because they’re arguably less sophisticated than the alternatives, but also because of the complication of dealing with them in an automated fashion.
Before we decided to add support for spreadsheets to our integration platform, we were asked by several prospects and even given some examples of what they needed to handle. These cases were, of course, quite varied and we thought about it for a while, wondering if we could come up with something that would be useful.
In the end, it came down to the language of our metadata: what things do you need to be able to express how a Spreadsheet is going to look when it shows up at your door. A metadata language for Spreadsheets needs to be able to allow the user to tell us where to find data, and what the data looks like. Seems simple…after all, that’s what metadata is.
After iterating on the problem for a while, we discovered that there are two basic types of spreadsheets that our customers need to handle: forms and lists; often times we see both types in the same spreadsheet (think of a purchase order form with an embedded list of line items).
The thing about these two basic types is that how you access the data (or how you populate it, if you’re creating the spreadsheet) is quite different. In general (and it’s dangerous to say “in general” when you’re talking about spreadsheets), references on form-type documents aremore absolute (e.g. the “ship to” address is always in cell C9). Conversely, lists of items will generally start in the same row, but they usually end when the data runs out, not in some fixed row. Usually.
In the case with lists, there are some other interesting complications:
- Sometimes an item in the list will be represented by more than one row. The problem of how you tell the different rows apart for a given item can be rather challenging.
- We find situations where there is a simple list, but every so often there is a sub-total, or following a list of items we find a grand total. The trick here is that in both cases (sub-totals and grand-totals), the location is not fixed and can only be determined heuristically.
- Occasionally, we find two different lists, one following the other. The trick here is to know when one list ends and the other starts.
The things I’ve mentioned above are all dealing with the question of how you locate the cells and rows where your data exists. As we talked with prospects and customers, we found that many of them were already doing this, but with custom code, so they were pretty familiar with the problems. Their solutions were often found in complicated, fragile and hard to maintain custom code for each Spreadsheet they need to consume or produce.
As an alternative, we might have placed the burden for dealing with this problem of locating cells and rows on our users (during mapping of the data). But that seemed like a cop out: they really wanted to get away from this burden and get on to the business behind these Spreadsheets. As with most things in our domain, if we can find ways of declaring characteristics in metadata, we can use these hints to make the mapping less of a burden.
In my next blog on this topic, I’m going to take up the topic of separating the formatting from the business data.