Data Transformation Mapping – Can it be Automated?

In my previous post on this subject, I wrote about the neglected problem of data transformation mapping productivity, and its impact on integration project costs, particularly for B2B integration in companies with many customer relationships (and therefore many document pairs to map).

Although attempts have been made to automate aspects of mapping, it remains largely a manual activity.  Gartner and other industry analysts have published research on automated mapping methods, particularly Dictionary-based mapping, but real-world implementations haven’t put a serious dent in the mapping productivity problem.  Why is that?

Mapping is a large and complex problem space.   It accounts for the single largest part of integration time and effort, encompassing data validation, translation, value derivation, enrichment, aggregation, and routing.  Defining these outcomes involves two kinds of specification, not one:  source-target matching and transformation.  Most attempts to automate mapping have focused intensively on the first part (matching), but very little on the second (data transformations).  But the biggest obstacle to mapping automation is that specifying the right mapping action may require an understanding of business and data contexts and requirements.  And that requires human decision-making.
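
To make the two kinds of specification concrete, here is a minimal sketch in Python. It is illustrative only, not a description of any product's internals: the MapRule class, the field names, and the conversions are all assumptions made up for the example. Each rule pairs a source-target match with the transformation applied to the value.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class MapRule:
    source: str                      # matching: which source field feeds the rule
    target: str                      # matching: which target field it produces
    transform: Callable[[Any], Any]  # transformation: how the value is converted


def apply_rules(record: Dict[str, Any], rules: List[MapRule]) -> Dict[str, Any]:
    """Build a target record by applying every rule to the source record."""
    return {rule.target: rule.transform(record[rule.source]) for rule in rules}


# Hypothetical rules; the field names and conversions are illustrative only.
rules = [
    MapRule("cust_name", "CustomerName", str.strip),
    MapRule("order_date", "OrderDate", lambda d: f"{d[0:4]}-{d[4:6]}-{d[6:8]}"),
    MapRule("qty", "Quantity", int),
]

source_record = {"cust_name": "  Acme Corp ", "order_date": "20240131", "qty": "12"}
target_record = apply_rules(source_record, rules)
print(target_record)
# {'CustomerName': 'Acme Corp', 'OrderDate': '2024-01-31', 'Quantity': 12}
```

A matching engine alone could propose the source and target columns of this table. The transform column is the part that typically needs knowledge of the business and data context.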

So to the degree that mapping automation is possible, it must occur in the context of a broader, human-guided mapping process.  Simply defining a source-target dictionary and “lighting the fuse” won’t produce a map that can be used in a production environment.

Integrating automated mapping methods with the human-guided mapping process imposes critical implementation requirements for integration technology providers, including:

  • Unobtrusive integration of automation methods and human mapping decisions in the UI
  • Support for both source-target matching and data transformation aspects of mapping
  • Automation that works with both familiar and unfamiliar document types and combinations
  • Results that can be verified quickly and easily by humans
  • Configurability to suit different circumstances and preferences
  • Ability to “learn” from and adapt past decisions to future mapping situations

Each of these requirements could be the subject of its own blog post.  The bottom line, however, is that past attempts to automate mapping have ignored or fallen short in all of these areas.

Automating data transformation mapping isn’t easy, but it is possible.  The next post in this series will examine the technology solution strategy taken by EXTOL, with the Smart Mapping feature introduced in EXTOL Business Integrator v2.5.

2 thoughts on “Data Transformation Mapping – Can it be Automated?”

  1. Roy Hayward

    I have used a number of solutions that had some “auto” mapping tools. For the most part they have all fallen short of the promise.

    So I am interested in what you are going to say about this. Is there a better tool that I haven’t found yet? I would like to see it.

    But I will be approaching from a self-acknowledged position of skepticism. Many of the “auto” mapping tools I have been exposed to left the integration person with a map that could not be manually maintained. Thus any change has to go back through the mapping process, and this can introduce unexpected bugs and changes.

    So for me, an advanced integrations expert, when I evaluate an integration mapping tool, I am thinking about how this tool would have behaved in a long history of support situations. And this is not something many of the tool makers are thinking about.

  2. Jim O'Leary

    Roy, sorry for the delay in this response. I will reach out to you directly, but I want to post a comment here as well so that others with similar questions can see my reply.

    I understand and share your skepticism about “auto-mapping” tools, in general. Some, as you point out, are essentially code generators, and don’t support “round trip” modification of generated outputs. And aside from the initial inputs to the generation step, they generally don’t allow human inputs during the generation process to control generated map outputs.

    Most approaches are also one-dimensional in terms of the mapping methods they implement. Source-target matching based on syntax and type comparison is not uncommon, but element naming differences and matching engine limitations (e.g., not cross-matching name and description) limit the scope of application.

    Dictionary and schema tagging approaches generally provide better matching results, but require setup effort and usually don’t provide context-sensitive map rule generation (e.g., based on the document section or level at which a match occurs).

    Matching based on mapping history produces the most accurate results, but it depends on the availability of examples, or of mapping pattern profiles (as is the case with our solution), in order to work.

    We believe that the bottom line on matching methods is that multiple methods are needed, with user-weighting, so the matching engine can adapt to different source / target document cases and the availability of external matching metadata (dictionaries, tagged schemas, mapping examples).
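
    As a rough illustration of combining several matching methods under user-supplied weights, here is a small Python sketch. It is not how any particular product works: the three scoring functions, the weights, the threshold, and the sample field names are all assumptions chosen for the example. Name similarity uses the standard library's difflib.

```python
from difflib import SequenceMatcher


def name_similarity(src, tgt):
    """Syntactic match: similarity of element names, from 0.0 to 1.0."""
    return SequenceMatcher(None, src["name"].lower(), tgt["name"].lower()).ratio()


def type_match(src, tgt):
    """Type comparison: 1.0 when the declared types agree, else 0.0."""
    return 1.0 if src["type"] == tgt["type"] else 0.0


def dictionary_match(src, tgt, dictionary):
    """Dictionary lookup: 1.0 when the dictionary pairs these two names."""
    return 1.0 if dictionary.get(src["name"]) == tgt["name"] else 0.0


def combined_score(src, tgt, weights, dictionary):
    """Weighted average of the individual method scores."""
    scores = {
        "name": name_similarity(src, tgt),
        "type": type_match(src, tgt),
        "dictionary": dictionary_match(src, tgt, dictionary),
    }
    return sum(weights[k] * scores[k] for k in weights) / sum(weights.values())


def best_matches(sources, targets, weights, dictionary, threshold=0.3):
    """Pick the best target per source; None means a human has to decide."""
    result = {}
    for src in sources:
        scored = [(combined_score(src, t, weights, dictionary), t["name"]) for t in targets]
        score, name = max(scored)
        result[src["name"]] = name if score >= threshold else None
    return result


# Hypothetical sample data; all names, weights, and the threshold are illustrative.
sources = [
    {"name": "PO_NUM", "type": "string"},
    {"name": "ship_dt", "type": "date"},
    {"name": "internal_flag", "type": "boolean"},
]
targets = [
    {"name": "PurchaseOrderNumber", "type": "string"},
    {"name": "ShipDate", "type": "date"},
]
dictionary = {"PO_NUM": "PurchaseOrderNumber"}
weights = {"name": 1, "type": 1, "dictionary": 2}

matches = best_matches(sources, targets, weights, dictionary)
print(matches)
# {'PO_NUM': 'PurchaseOrderNumber', 'ship_dt': 'ShipDate', 'internal_flag': None}
```

    In this sketch the dictionary carries PO_NUM even though the names look nothing alike, name and type comparison together carry ship_dt, and internal_flag falls below the threshold and is left for a person to map. Changing the weights changes which method dominates, which is the kind of adaptation to different document cases described above.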

    Beyond matching method concerns, there are other limitations, like syntax category constraints (e.g., methods that only work for EDI), source-target cardinality constraints, and generation scope constraints (e.g., generation limited to full maps or individual elements) that further limit the applicability of many approaches.

    Finally, there are cases in which no automated method can produce a complete result, which goes back to your point about the need to be able to modify the results of generation. A simple example of this is when production of a target document element requires invocation of an external function (e.g., program or web service) that is not available.

    The bottom line is that there is no available automated tool that can “automagically” generate 100% of every map in every circumstance. But if you can generate 50%, 70%, or even 90% in some cases, you can achieve big time reductions in what is probably the most time-consuming and error-prone part of a typical integration project.

    Hope that helps.
