XML Schemas

All things are permissible, but not all things are beneficial; when developing an XML Schema, this is a good rule of thumb.  XML Schema provides many options for developing robust, flexible data structures.  One such option is the ability to nest content models.

All XML elements have a content model.  A content model defines the validation rules and structure of an element’s content.  Element content can consist of character data, child elements, or a mixture of both.  When an element contains child elements, its content model defines the order, cardinality, and presence of those children.

XML Schema provides three compositors for building content models:

  • sequence – defines a list of child elements that must appear under the parent element in the order listed.
  • choice – defines a list of child elements, of which only one may appear under the parent.
  • all – defines a list of child elements that may appear in any order, each at most once.

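The XML Schema specification allows you to nest content models inside an element’s content.  Consider, for example, the following XSD and XML snippets, in which a repeating sequence of <Qualifier> and <ContactDetail> elements is nested inside the outer sequence of <Contact>: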

<element name="Contact">
  <complexType>
    <sequence>
      <element name="Name"/>
      <sequence maxOccurs="unbounded">
        <element name="Qualifier"/>
        <element name="ContactDetail"/>
…………

<Contact>
  <Name/>
  <Qualifier>T</Qualifier>
  <ContactDetail>555.555.5555</ContactDetail>
  <Qualifier>E</Qualifier>
  <ContactDetail>joetheintegrator at xyz dot com</ContactDetail>
</Contact>

Even though XML Schema lets you nest content models, in general we recommend against doing so.  The implied structure of the <Qualifier> and <ContactDetail> elements makes the downstream software or translation maps complex, because they must decipher and interpret the data structure themselves.  You have to build business rules into your maps or programs instead of letting the structure of the data drive them.  You are reduced to finding data structures with some kind of “counting” algorithm: every 2nd, 4th, 6th, etc. child of <Contact> starts a new instance of the data structure.
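To make the “counting” approach concrete, here is a minimal Python sketch using the standard library’s xml.etree.ElementTree, assuming the flat instance document shown above. The function name pairs_by_counting is hypothetical. Note that the pairing rule (each <Qualifier> is followed by exactly one <ContactDetail>) has to be written into the code, because the document itself does not express it.

```python
import xml.etree.ElementTree as ET

# Flat (implied-structure) instance document from the example above.
FLAT_XML = """
<Contact>
  <Name/>
  <Qualifier>T</Qualifier>
  <ContactDetail>555.555.5555</ContactDetail>
  <Qualifier>E</Qualifier>
  <ContactDetail>joetheintegrator at xyz dot com</ContactDetail>
</Contact>
"""


def pairs_by_counting(contact: ET.Element) -> list:
    """Recover (Qualifier, ContactDetail) pairs from the flat layout.

    The business rule lives here, not in the data: every Qualifier is
    assumed to be immediately followed by exactly one ContactDetail.
    """
    # Everything other than <Name> belongs to the repeating group.
    children = [child for child in contact if child.tag != "Name"]
    if len(children) % 2 != 0:
        raise ValueError("children do not form Qualifier/ContactDetail pairs")
    pairs = []
    # Step through the group two elements at a time.
    for i in range(0, len(children), 2):
        qualifier, detail = children[i], children[i + 1]
        if qualifier.tag != "Qualifier" or detail.tag != "ContactDetail":
            raise ValueError(f"unexpected element order at position {i}")
        pairs.append((qualifier.text, detail.text))
    return pairs


if __name__ == "__main__":
    print(pairs_by_counting(ET.fromstring(FLAT_XML)))
```

The stride of 2 and both tag checks are knowledge about the schema that the program has to carry on its own.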

The alternative to using implied data structures is to use explicit data structures as follows:

<element name="Contact">
  <complexType>
    <sequence>
      <element name="Name"/>
      <element name="Type" maxOccurs="unbounded">
        <complexType>
          <sequence>
            <element name="Qualifier"/>
            <element name="ContactDetail"/>
…………

<Contact>
  <Name/>
  <Type>
    <Qualifier>T</Qualifier>
    <ContactDetail>555.555.5555</ContactDetail>
  </Type>
  <Type>
    <Qualifier>E</Qualifier>
    <ContactDetail>joetheintegrator at xyz dot com</ContactDetail>
  </Type>
</Contact>

The introduction of the <Type> element as a “container” element adds an explicit data structure to the XML document.  The <Type> element gives context to the elements contained within it.

In this case, the child elements are structured as a cohesive group of related elements.  Cohesive data structures naturally lend themselves to cohesive programs that operate on the data, one of the goals of well-designed software.

Instead of using some kind of “counting” algorithm to process the elements, you can use standard XML query mechanisms such as XPath.  It’s far easier to say “Retrieve all the <Type> elements and insert their values into my database” than it is to say “Give me all the elements under <Contact>, and each time I encounter a <Qualifier>, take its value and the value of the next element (<ContactDetail>) and insert them into my database.”  In the first case, we can write a method that works with just <Type> elements (cohesive).  In the second case, we have to write a more complicated loop to deal with the unstructured data.
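For contrast, here is a minimal sketch of the query-based approach, again in Python with xml.etree.ElementTree and assuming the nested instance document above. The names read_type and contact_details are hypothetical, and the database insert is left out: the functions simply return the rows that would be inserted.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# Explicit-structure instance document from the example above.
NESTED_XML = """
<Contact>
  <Name/>
  <Type>
    <Qualifier>T</Qualifier>
    <ContactDetail>555.555.5555</ContactDetail>
  </Type>
  <Type>
    <Qualifier>E</Qualifier>
    <ContactDetail>joetheintegrator at xyz dot com</ContactDetail>
  </Type>
</Contact>
"""

Row = Tuple[Optional[str], Optional[str]]


def read_type(type_elem: ET.Element) -> Row:
    """Cohesive rule: everything about one <Type> is handled here."""
    return type_elem.findtext("Qualifier"), type_elem.findtext("ContactDetail")


def contact_details(contact: ET.Element) -> List[Row]:
    """'Retrieve all the <Type> elements': one path query, no counting."""
    return [read_type(t) for t in contact.findall("Type")]


if __name__ == "__main__":
    for row in contact_details(ET.fromstring(NESTED_XML)):
        # A real program would insert the row into its database here.
        print(row)
```

findall("Type") matches direct children of <Contact> only; ElementTree’s limited XPath support also accepts ".//Type" to search at any depth.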

Furthermore, if the schema changes at some point in the future (you can count on it), an explicitly defined data structure will typically require less refactoring of your business logic.  If your business logic matches your schema (by having cohesive rules that deal with cohesive data structures), you can add or remove elements in the data structure, and the change will only affect the rules that apply to that data structure.
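A rough Python sketch of that scenario, assuming a hypothetical new <Preferred> element (not part of the original schema) is added to each qualifier/detail group. The compact counting rule, left unchanged, rejects the new flat layout, while the nested layout only needs one extra line in the rule that reads a <Type>. All function and element names beyond those in the original snippets are illustrative.

```python
import xml.etree.ElementTree as ET

# Version 2 of each layout, with a hypothetical <Preferred> element added
# to the qualifier/detail group. <Preferred> is illustrative only.
FLAT_V2 = """
<Contact>
  <Name/>
  <Qualifier>T</Qualifier>
  <ContactDetail>555.555.5555</ContactDetail>
  <Preferred>Y</Preferred>
</Contact>
"""

NESTED_V2 = """
<Contact>
  <Name/>
  <Type>
    <Qualifier>T</Qualifier>
    <ContactDetail>555.555.5555</ContactDetail>
    <Preferred>Y</Preferred>
  </Type>
</Contact>
"""


def pairs_by_counting(contact: ET.Element) -> list:
    """Stride-2 counting rule, not updated for the new element."""
    children = [child for child in contact if child.tag != "Name"]
    if len(children) % 2 != 0:
        raise ValueError("children do not form Qualifier/ContactDetail pairs")
    return [(children[i].text, children[i + 1].text)
            for i in range(0, len(children), 2)]


def read_type(type_elem: ET.Element) -> tuple:
    """The only rule that changes: one extra findtext() for the new child."""
    return (type_elem.findtext("Qualifier"),
            type_elem.findtext("ContactDetail"),
            type_elem.findtext("Preferred"))


if __name__ == "__main__":
    try:
        pairs_by_counting(ET.fromstring(FLAT_V2))
    except ValueError as err:
        print("counting rule broke:", err)
    contact = ET.fromstring(NESTED_V2)
    print([read_type(t) for t in contact.findall("Type")])
```

With two groups in the flat layout the child count would be even again, and a counting loop without tag checks, like this one, would silently pair the wrong values instead of failing.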

In the long run, explicit structures in XML may add a little bloat to your documents, but they pay big dividends in maintenance and refactoring.
