Our Friend, the Flat File

A common staple in the developer’s diet has been the flat file. These interesting creatures come in as many varieties as there are animals in nature. They have been instrumental in allowing us to send data back and forth between systems and applications, often serving as the lowest common denominator that can be used as a data transport. Let’s explore what these files are and how they came to be adopted as a common basis for how we exchange information today.

First, we should discuss the various flavors of flat files. At the basic level, a flat file is a character string of data.  This long string, ranging from a few bytes to several gigabytes, contains all of the information that the document is trying to convey.

Flat files are used for more than simply containing a single instance of a document. Frequently, flat files are used as “containers” to hold many documents (for example, a single EDI X12 837 Health Care Claim file can contain millions of health care claims from a service provider). The business requirements agreed upon between you and your business partner will dictate the boundaries of a complete “document” within the file.

The contents of the file may contain carriage-return/line-feed (CR/LF) characters, marking the separations we refer to as “records”. However, CR/LF is not the only way to define record boundaries; any character can be defined as a record-boundary delimiter. This definition is made at the business level and then applied to the document. In some flat file standards, the record delimiter is actually identified in a “preamble” record within the document, which allows for a soft definition of the delimiters. Flat files frequently contain many records, and these records can be of different formats; such a file is known as a multi-format flat file. To identify the different types of records, both fixed-length and delimited layouts typically use a few characters at the start of the record (or the first field, in the case of a delimited record) to store a specific “record id” value. As an example, this marker allows a program to easily discover header records, detail records and shipment records in a purchase order.
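To make the record-id idea concrete, here is a minimal Python sketch. The three-character record ids (HDR, DTL, SHP), the “~” record delimiter and the sample purchase order are all made up for illustration; a real layout would come from the agreement with your business partner.

```python
from collections import defaultdict


def split_records(data: str, record_delim: str = "\r\n") -> list[str]:
    """Split raw flat-file text into records, skipping empty pieces."""
    return [rec for rec in data.split(record_delim) if rec]


def group_by_record_id(records: list[str], id_length: int = 3) -> dict[str, list[str]]:
    """Group record bodies by the record id held in the first id_length characters."""
    groups: dict[str, list[str]] = defaultdict(list)
    for rec in records:
        groups[rec[:id_length]].append(rec[id_length:])
    return dict(groups)


# Hypothetical purchase order that uses "~" instead of CR/LF as the record delimiter.
sample = "HDRPO1001~DTLWIDGET~DTLGADGET~SHPDOCK7~"

records = split_records(sample, record_delim="~")
grouped = group_by_record_id(records)
print(grouped)
```

Because the delimiter is just a parameter, the same two functions would work on a CR/LF file by leaving record_delim at its default.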

The two most common field-storage approaches within a flat file are delimited and fixed-length records. Delimited records use a character as a separator between the actual fields that compose the record. Fixed-length records use a pre-defined length for each field, agreed upon as a business requirement. The fields can contain alphanumeric information, including extended characters.  They can also contain binary information, usually re-encoded into a character representation (such as Base64).
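As a rough illustration of the difference, the Python sketch below reads the same hypothetical detail record in both styles. The “|” field delimiter, the field widths (3, 10, 4 and 7 characters) and the sample values are assumptions invented for this example, not part of any standard.

```python
import base64


def parse_delimited(record: str, field_delim: str = "|") -> list[str]:
    """Split a delimited record into its fields."""
    return record.split(field_delim)


def parse_fixed(record: str, widths: list[int]) -> list[str]:
    """Slice a fixed-length record into fields using the agreed widths."""
    fields, pos = [], 0
    for width in widths:
        # Padding spaces are stripped so both styles yield comparable values.
        fields.append(record[pos:pos + width].strip())
        pos += width
    return fields


# The same hypothetical detail record in both storage styles.
delimited_record = "DTL|WIDGET|12|4.50"
fixed_record = "DTL" + "WIDGET".ljust(10) + "0012" + "4.50".rjust(7)
DETAIL_WIDTHS = [3, 10, 4, 7]  # record id, item, quantity, price (assumed layout)

delimited_fields = parse_delimited(delimited_record)
fixed_fields = parse_fixed(fixed_record, DETAIL_WIDTHS)

# Binary data can travel in a character field once it is Base64-encoded.
binary_payload = bytes([0, 255, 16])
encoded_field = base64.b64encode(binary_payload).decode("ascii")

print(delimited_fields)
print(fixed_fields)
print(encoded_field)
```

Note the trade-off this exposes: the delimited record is shorter, but the delimiter character can never appear inside a field unless you also agree on an escaping rule, while the fixed-length record spends bytes on padding and needs no delimiter at all.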

As the X12 example cited above shows, EDI has leveraged the power and compactness of delimited flat files. This has proven to be a very efficient way to store and send data between businesses. In fact, I frequently describe EDI files as multi-format delimited flat files that have a standardized definition. X12 and EDIFACT documents fall into this category. ERP systems, for many years, have also leveraged flat files as a convenient interface into their systems. By providing a flat file interface to incorporate or retrieve data, vendors have wisely insulated the back-end system, allowing changes to take place during upgrades without requiring us to rewrite our interface hooks. This approach also allows the ERP software to validate our data before it is written into the system. That is an important consideration for data safety.

Whether it is used to store millions of records/documents, our preferences for our favorite application, or a few characters that are downloaded to an electronics chip, we owe a lot to this versatile entity and it is clearly here to stay.
