Deciding how to integrate application data in a way that limits the impact of application changes can be a serious challenge. Direct, point-to-point integrations are efficient and can be implemented rapidly, but they sacrifice interface resiliency. To complicate matters, the mode of data transport (files versus messages, for example) and whether the data crosses business domain boundaries both add to the complexity of “normalizing” the information.
These challenges occur in both the Business-to-Business (B2B) and Application-to-Application (A2A) domains. In the B2B space, several data “standards” have emerged to coordinate interoperability efforts, such as ANSI X12 and UN/ECE EDIFACT from the EDI world and RosettaNet for XML content. Integrating to such standard formats limits the impact of application-level changes made by either partner.
In the A2A space, vendors have tried to solve interoperability within their own domains by exposing interfaces into their products. An example of this type of interface is the SAP IDoc format. However, these interfaces are one-sided “point solutions” in that they assist with integration into and out of the ERP, but do not take into account any connectivity to other “interfaces” within the enterprise A2A landscape.
Employing a common, intermediate data model for A2A integration can provide isolation from application interface changes in the same way that EDI isolates partners from application changes in a B2B context. Canonical structures are one of the most popular design patterns from the Enterprise Application Integration realm, and for good reason: they provide a “library” of datasets, typically a superset accommodating all of the content needed across all interfaces.
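To make the idea concrete, here is a minimal sketch of a canonical model in the middle of two applications. The field names and the two source formats (a CRM export and an ERP import) are hypothetical, invented for illustration; they are not real product interfaces.

```python
def crm_to_canonical(crm_record: dict) -> dict:
    """Map a hypothetical CRM order export into the canonical shape."""
    return {
        "order_id": crm_record["OrderNo"],
        "customer_id": crm_record["CustRef"],
        "order_date": crm_record["Created"],
        "total": crm_record["Amount"],
    }

def canonical_to_erp(order: dict) -> dict:
    """Map the canonical shape into a hypothetical ERP import format."""
    return {
        "DOC_NUM": order["order_id"],
        "PARTNER": order["customer_id"],
        "DOC_DATE": order["order_date"],
        "NET_VALUE": order["total"],
    }

# Each application needs only one map into and one map out of the
# canonical model, so N applications need on the order of N maps
# instead of N*(N-1) point-to-point maps.
crm_order = {"OrderNo": "1001", "CustRef": "C-42",
             "Created": "2024-01-15", "Amount": 99.5}
erp_doc = canonical_to_erp(crm_to_canonical(crm_order))
```

When the CRM changes its export format, only `crm_to_canonical` needs to change; every downstream map is untouched.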
There are several approaches that we can leverage to create a canonical model:
(1) Design a continually-evolving superset of data items
(2) Create a set of base documents and add supplements to handle unique cases
(3) Repurpose an industry standard such as EDI for internal integration
So, which is best? All three methods have been implemented successfully over the years; the interesting observation is that no “silver bullet” stands out as the clear best practice. The right choice depends on our implementation requirements.
For the superset option (1), we need to design data sets that encompass a union of all use cases. This results in record formats that contain a large number of fields, with quite a few of them empty in any given record at any given time. The extra fields can increase transport time and memory requirements during processing. Another drawback is that periodic changes to the canonical schema can break the integrations built on it.
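The “dead space” problem is easy to see in a sketch. The record type and field names below are hypothetical; the point is that a superset record populated by any single interface leaves most fields empty.

```python
from dataclasses import dataclass, fields
from typing import Optional

@dataclass
class CanonicalCustomer:
    """Superset record: the union of fields across all interfaces."""
    customer_id: str                         # used by every interface
    name: Optional[str] = None
    billing_address: Optional[str] = None    # used only by invoicing
    shipping_address: Optional[str] = None   # used only by logistics
    loyalty_tier: Optional[str] = None       # used only by marketing
    credit_limit: Optional[float] = None     # used only by finance
    tax_region: Optional[str] = None         # used only by tax reporting

# A record populated by the logistics interface leaves most fields empty.
record = CanonicalCustomer(customer_id="C-42", shipping_address="10 Main St")
empty = sum(1 for f in fields(record) if getattr(record, f.name) is None)
# 5 of the 7 fields ride along empty in every logistics message.
```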
To mitigate the “dead space” in the records, the base document approach (2) works nicely: the base document carries the commonly populated content, and supplements are “joined” to it for connection-specific data extensions. This is an efficient design but adds a layer of complexity, because the supplement records must be linked to their base documents.
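A minimal sketch of the base-plus-supplement idea, with hypothetical record shapes: a compact base record holds the common fields, and supplements are linked back to it by the base document's key plus the connection name.

```python
# Base documents hold only the commonly populated fields.
base_orders = {
    "1001": {"customer_id": "C-42", "order_date": "2024-01-15", "total": 99.5},
}

# Supplements are keyed by (order_id, connection); only connections
# that need extra data carry it.
supplements = {
    ("1001", "logistics"): {"carrier": "XYZ", "ship_from": "WH-3"},
    ("1001", "finance"): {"payment_terms": "NET30"},
}

def assemble(order_id: str, connection: str) -> dict:
    """Join the base document with the supplement for one connection."""
    doc = dict(base_orders[order_id])
    doc.update(supplements.get((order_id, connection), {}))
    return doc

logistics_view = assemble("1001", "logistics")  # base fields + carrier data
```

The linkage is the added complexity the text mentions: every consumer must know to perform the join, and the supplement keys must stay consistent with the base documents.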
The third option (3) is to leverage the “groupthink” that created industry standards such as EDI (X12 or EDIFACT) and map to/from those formats. Generally, this approach accommodates all of the data needed and provides a well-documented framework for most integration needs.
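The flavor of this option can be sketched by rendering an internal record in an EDI-style delimited layout. The segment layout below is heavily simplified and illustrative only; it is not a conformant X12 or EDIFACT message, and the field choices are assumptions.

```python
def to_edi_like(order: dict) -> str:
    """Render an order as '*'-delimited segments terminated by '~',
    loosely in the style of an X12 purchase order. Illustrative only."""
    segments = [
        ["BEG", "00", "SA", order["order_id"], order["order_date"]],
        ["N1", "BY", order["customer_id"]],
        ["PO1", "1", str(order["quantity"]), "EA", str(order["unit_price"])],
    ]
    return "~".join("*".join(seg) for seg in segments) + "~"

msg = to_edi_like({"order_id": "1001", "order_date": "20240115",
                   "customer_id": "C-42", "quantity": 3, "unit_price": 9.99})
```

The appeal is that the segment dictionary already exists and is documented by the standards body, so internal teams inherit a shared vocabulary instead of inventing one.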
Up to this point, I have been touting the value of canonical models, and while they are a good practice, the right tooling can level the playing field for point-to-point integrations. Consider a schema-based migration tool that converts integration maps from an old source/target schema to an updated version. Converting integration maps is then reduced to “mapping” between the old and new schemas, and the actual data maps can be mechanically updated in a predictable, reliable fashion. This could change how we look at integration design, freeing us to choose the option that truly works best for our needs.
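The mechanics of that migration can be sketched in a few lines. Given a dictionary describing how each field of the old target schema maps to the new one, every existing integration map can be rewritten mechanically; the names below are hypothetical.

```python
# An existing integration map: source application field -> old target field.
integration_map = {"OrderNo": "order_num", "Amount": "amt", "CustRef": "cust"}

# The schema migration: old target field -> new target field
# (a value of None would mean the field was dropped).
schema_migration = {"order_num": "order_id", "amt": "total",
                    "cust": "customer_id"}

def migrate_map(old_map: dict, migration: dict) -> dict:
    """Rewrite an integration map against the updated target schema,
    dropping entries whose target field no longer exists."""
    return {
        src: migration[tgt]
        for src, tgt in old_map.items()
        if migration.get(tgt) is not None
    }

new_map = migrate_map(integration_map, schema_migration)
# {'OrderNo': 'order_id', 'Amount': 'total', 'CustRef': 'customer_id'}
```

One schema-to-schema mapping, applied once at design time, updates every map that targeted the old schema; no runtime transformation is added.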
The canonical approach is dictionary-mediated mapping in which the mediation occurs at runtime; the schema-based migration approach is dictionary-mediated mapping in which the mediation occurs at design time. The two are equally flexible, but the latter requires fewer maps and transformations per connection, so both implementation time and execution time can be shorter.
To facilitate integration, take a step back and think about how data moves into, within, and out of your enterprise landscape. Remember, the only true constant is change…so design your systems to accommodate and embrace change. The small upfront investment in time will pay big dividends when change comes knocking on your door…typically in the form of customer requests.