Plain Text Transmogrification through Regular Expression Sorcery

Plain text is everywhere, and it isn’t always in exactly the format you would like. Take, for example, an address line such as “Pottsville, PA 17901” that arrives in a single field (or XML element). The data needs to be separated into city, state, and zip code fields. Luckily, there is a pattern to the address data: although the city name may vary in length and might include spaces, as in “New York”, there is always a comma and a space before a two-letter state abbreviation, then another space, and then a five-digit zip code.

Now, I could write a little parsing program or perhaps use several string manipulation functions in my translator to get what I want, but I know a little something about Regular Expressions (regex). My translator has a replacement action that supports regex, so I’m going to get the information I want using a single expression. Here is the expression I’m going to use:

(.*),\s(\D\D)\s(\d\d\d\d\d)

It appears a little intimidating; however, it is basically describing the pattern the same way I did in the introduction. In the first set of parentheses, the period and asterisk mean “match any run of characters”; the match takes as much as it can and then gives characters back until the rest of the expression matches too. Ignoring the other parentheses for the moment, the rest of the expression says to match a comma, a whitespace character, two non-digit characters (the state letters), another whitespace character, and five digits, in that order. Strictly speaking, \D accepts anything that is not a digit, not just letters, which is loose but good enough for this data. What the parentheses are doing is identifying the parts of the match I’m interested in. The first set captures the city name, the second set the state, and the third set the zip code. So now, for the replacement value in my translator’s search/replace action, I can use $1 to extract the city name, $2 to get the state, and finally $3 for the zip code.
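
To make the capture groups concrete, here is a minimal sketch of the same expression in Python. The translator itself is not shown in this post, so Python is only a stand-in, and the names ADDRESS_PATTERN and split_address are made up for illustration. Note that Python writes group references in a replacement as \1, \2, \3 where the translator uses $1, $2, $3.

```python
import re

# The expression from this post. The three sets of parentheses capture
# the city, the state, and the zip code, in that order.
ADDRESS_PATTERN = re.compile(r"(.*),\s(\D\D)\s(\d\d\d\d\d)")


def split_address(line):
    """Return (city, state, zip_code), or None if the line does not fit the pattern.

    Hypothetical helper for illustration; not part of any translator product.
    """
    match = ADDRESS_PATTERN.fullmatch(line.strip())
    if match is None:
        return None
    return match.groups()


print(split_address("Pottsville, PA 17901"))
print(split_address("New York, NY 10001"))

# Search/replace style: rebuild the line from the captured groups.
# Python's \1, \2, \3 play the role of $1, $2, $3.
print(ADDRESS_PATTERN.sub(r"\1|\2|\3", "Pottsville, PA 17901"))
```

Using fullmatch here is a design choice: it rejects lines with extra text before or after the address instead of quietly matching part of them. A search/replace action in a translator may behave differently, so treat that as an assumption.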

Regular Expressions are very powerful and can be used in a lot of other ways too. Many text editors support them, and they are handy for performing advanced searches. If you are interested in learning more, there are lots of resources on the internet, as well as a great book called “Mastering Regular Expressions, Second Edition” by Jeffrey E.F. Friedl. One other thing: if you are using the Firefox web browser and are interested in experimenting with Regular Expressions or testing them, there is a “Regular Expressions Tester” Add-on available.
