Posts Tagged ‘plain-text’

Plain Text Transmogrification through Regular Expression Sorcery

October 29th, 2009 Jason Honicker

Plain text is everywhere, and it isn't always in exactly the format you would like. Take, for example, an address line such as “Pottsville, PA 17901” stored in a single field (or XML element) when the data needs to be separated into city, state, and zip code fields. Luckily, the address data follows a pattern. Although the city name varies in length and may include spaces, as in “New York”, it is always followed by a comma and a space, then a two-letter state abbreviation, another space, and finally a five-digit zip code.
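To see why a hand-rolled parser gets tedious, here is a minimal sketch that peels the pattern apart with ordinary string methods. Python is my choice for illustration (the post doesn't say what language the translator uses), the name `split_address_plain` is made up for this example, and it assumes the exact “City, ST 12345” layout described above.

```python
def split_address_plain(line: str) -> tuple[str, str, str]:
    """Split 'City, ST 12345' using only string methods (no regex).

    Hypothetical helper; assumes the 'City, ST 12345' layout.
    """
    # The city is everything before the LAST ", ", so "New York" keeps its space.
    city, sep, rest = line.strip().rpartition(", ")
    if not sep:
        raise ValueError(f"no ', ' separator in {line!r}")
    # What remains should be exactly "ST 12345".
    parts = rest.split(" ")
    if len(parts) != 2:
        raise ValueError(f"expected 'ST 12345' after the comma in {line!r}")
    state, zip_code = parts
    # Check the two-letter state abbreviation by hand.
    if not (len(state) == 2 and state.isascii() and state.isalpha() and state.isupper()):
        raise ValueError(f"bad state abbreviation {state!r}")
    # Check the five-digit zip code by hand (ASCII digits only).
    if not (len(zip_code) == 5 and zip_code.isascii() and zip_code.isdigit()):
        raise ValueError(f"bad zip code {zip_code!r}")
    return city, state, zip_code


print(split_address_plain("New York, NY 10001"))
# ('New York', 'NY', '10001')
```

Every rule in the pattern needs its own separate check here, which is exactly the kind of bookkeeping a regular expression can compress into a single line.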

Now, I could write a little parsing program, or perhaps chain several string manipulation functions in my translator, to get what I want, but I know a little something about regular expressions (regex).
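A regular expression can describe the whole pattern at once. The following is a minimal sketch using Python's standard `re` module; `ADDRESS_RE`, `split_address`, and `to_xml_fields` are hypothetical names, and the `<city>`, `<state>`, and `<zip>` element names are placeholders rather than any real schema.

```python
import re

# Hypothetical pattern for "City, ST 12345" lines:
#   city  - one or more characters of any kind, so "New York" keeps its space
#   state - exactly two capital letters
#   zip   - exactly five ASCII digits ([0-9] rather than \d, which also
#           matches non-ASCII digits in Python 3)
ADDRESS_RE = re.compile(r"^(?P<city>.+), (?P<state>[A-Z]{2}) (?P<zip>[0-9]{5})$")


def split_address(line: str) -> dict[str, str]:
    """Return {'city': ..., 'state': ..., 'zip': ...} or raise ValueError."""
    match = ADDRESS_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised address line: {line!r}")
    return match.groupdict()


def to_xml_fields(line: str) -> str:
    """Rewrite the single field into three elements in one substitution.

    The element names are illustrative only. A line that does not match
    the pattern is returned unchanged (apart from surrounding whitespace).
    """
    return ADDRESS_RE.sub(
        r"<city>\g<city></city><state>\g<state></state><zip>\g<zip></zip>",
        line.strip(),
    )


print(split_address("New York, NY 10001"))
# {'city': 'New York', 'state': 'NY', 'zip': '10001'}
print(to_xml_fields("Pottsville, PA 17901"))
# <city>Pottsville</city><state>PA</state><zip>17901</zip>
```

The greedy `.+` grabs as much as it can and then backs off until the rest of the pattern fits, so the city stops at the last comma-space that is followed by a state and zip. Named groups keep the pieces readable in both the match result and the replacement string.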

Just what is the Character (Set) of that Document?

August 21st, 2009 Jason Honicker

In this age of the Internet, where systems exchange information constantly, it is all too easy to forget that computers can store their “plain-text” data in many different ways. If you thought the line-ending difference between UNIX files (Line Feed) and Windows files (Carriage Return + Line Feed) was annoying, can you imagine the trouble we would have if ASCII didn’t exist?
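The line-ending difference is easy to inspect and fix on raw bytes, before you even know the character set. Here is a minimal Python sketch; the function names are hypothetical, treating a lone Carriage Return as a line break too (the old classic Mac OS convention) is my own assumption, and it only works for ASCII-compatible encodings.

```python
def newline_counts(data: bytes) -> dict[str, int]:
    """Count each line-ending convention found in raw bytes.

    Assumes an ASCII-compatible encoding; in UTF-16, CR and LF are two
    bytes each and these byte counts would be wrong.
    """
    crlf = data.count(b"\r\n")
    # Every CRLF also contains one LF and one CR, so subtract those out
    # to get the lone (UNIX-style) LFs and lone CRs.
    return {
        "CRLF": crlf,
        "LF": data.count(b"\n") - crlf,
        "CR": data.count(b"\r") - crlf,
    }


def normalize_newlines(data: bytes) -> bytes:
    """Convert CRLF and lone CR line endings to UNIX-style LF."""
    # Replace CRLF first; doing the lone-CR pass first would turn each
    # CRLF into two line breaks.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


sample = b"Windows line\r\nUNIX line\nold Mac line\r"
print(newline_counts(sample))
# {'CRLF': 1, 'LF': 1, 'CR': 1}
print(normalize_newlines(sample))
# b'Windows line\nUNIX line\nold Mac line\n'
```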

ASCII, the American Standard Code for Information Interchange, is a subset of many other character sets in common use today (ISO-8859-1 and UTF-8 among them), so you can exchange a lot of documents without much hassle. But what do you do when you receive something else?
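One practical answer is to try a short list of likely encodings in order. The sketch below is a heuristic under stated assumptions, not real character-set detection: the function name `decode_guess` and the candidate order (UTF-8, then Windows-1252, then Latin-1 as a fallback) are my own choices for illustration. UTF-8 goes first because its strict byte rules reject most text that isn't UTF-8, while Windows-1252 and Latin-1 accept almost any byte sequence.

```python
def decode_guess(
    data: bytes, candidates: tuple[str, ...] = ("utf-8", "cp1252")
) -> tuple[str, str]:
    """Decode bytes of unknown encoding; return (text, encoding used).

    Hypothetical heuristic: the first candidate that decodes without
    error wins, which is not proof that the guess is right.
    """
    # Pure 7-bit ASCII decodes identically under all of these encodings,
    # so there is nothing to guess.
    if data.isascii():
        return data.decode("ascii"), "ascii"
    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Latin-1 maps every byte 0x00-0xFF to a character, so it never fails;
    # treat it as a last resort rather than a confirmed answer.
    return data.decode("latin-1"), "latin-1"


print(decode_guess(b"caf\xc3\xa9"))  # UTF-8 bytes for "café"
# ('café', 'utf-8')
print(decode_guess(b"caf\xe9"))      # the same word in Windows-1252
# ('café', 'cp1252')
```

The same four bytes `caf` come through unchanged either way; only the non-ASCII byte or bytes for “é” reveal which character set was used.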
