Tag Archives: plain-text

Plain Text Transmogrification through Regular Expression Sorcery

Plain text is everywhere, and it isn’t always in exactly the format you would like. Take, for example, an address line such as “Pottsville, PA 17901” that sits in a single field (or XML element). The data needs to be separated into city, state, and zip code fields. Luckily, the address data follows a pattern: although the city name may vary in length and might include spaces, as in “New York”, there is always a comma and a space before a two-letter state abbreviation, then another space and a five-digit zip code.

Now, I could write a little parsing program or perhaps use several string manipulation functions in my translator to get what I want, but I know a little something about regular expressions (regex).
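As a rough illustration of the pattern described above, here is a minimal Python sketch. The excerpt doesn't show the expression the post actually uses, so the pattern, the group names, and the `split_address` helper are all my own assumptions. It also assumes an uppercase state abbreviation and a plain five-digit zip code.

```python
import re

# Assumed pattern: a city name (which may contain spaces), a comma and a space,
# a two-letter uppercase state abbreviation, a space, and a five-digit zip code.
ADDRESS_RE = re.compile(r"^(?P<city>.+), (?P<state>[A-Z]{2}) (?P<zip>[0-9]{5})$")


def split_address(line):
    """Return (city, state, zip) or None if the line does not fit the pattern."""
    match = ADDRESS_RE.match(line.strip())
    if match is None:
        return None
    return match.group("city"), match.group("state"), match.group("zip")


print(split_address("Pottsville, PA 17901"))
print(split_address("New York, NY 10001"))
```

Returning None for lines that don't fit is a deliberate choice here: a field that breaks the pattern is better flagged than silently split in the wrong place.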

Just what is the Character (Set) of that Document?

In this age of the Internet, where information is exchanged between systems regularly, it is all too easy to forget that computer systems can store their “plain-text” data in a lot of different ways. If you thought that UNIX files versus Windows files were annoying with their Line Feed versus Carriage Return + Line Feed differences, can you imagine the trouble we would have if ASCII didn’t exist?

ASCII, the American Standard Code for Information Interchange, has become a subset of many other character sets in common usage today, so you can exchange a lot of documents without too much hassle. But what do you do if you get something else?
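One cheap first check is whether the bytes are pure ASCII at all. The sketch below is only an assumption about how you might start; it is not the approach from the full post. The `describe_encoding` name is hypothetical, and the fallback of trying UTF-8 next is my own choice. Anything else is reported as unknown rather than guessed.

```python
def describe_encoding(data: bytes) -> str:
    """Classify raw bytes as 'ascii', 'utf-8', or 'unknown' (a rough first pass)."""
    # Every byte below 0x80 means the data is plain ASCII, which is also
    # valid in the many character sets that extend ASCII.
    if data.isascii():
        return "ascii"
    # Non-ASCII bytes are present: see whether they form valid UTF-8.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Could be Latin-1, Windows-1252, EBCDIC, and so on; this sketch does not guess.
        return "unknown"


print(describe_encoding(b"Pottsville, PA 17901"))
print(describe_encoding("café".encode("utf-8")))
print(describe_encoding("café".encode("latin-1")))
```

A successful UTF-8 decode is strong evidence but not proof, and "unknown" still leaves the real detective work to be done.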
