Your leaking thatched hut during the restoration of a pre-Enlightenment state.

 

Hello, my name is Judas Gutenberg and this is my blaag (pronounced as you would the vomit noise "hyroop-bleuach").



links

decay & ruin
Biosphere II
Chernobyl
dead malls
Detroit
Irving housing

got that wrong
Paleofuture.com

appropriate tech
Arduino μcontrollers
Backwoods Home
Fractal antenna

fun social media stuff


Like asecular.com
(nobody does!)

Like my brownhouse:
   better BetterExplode
Wednesday, March 3 2010
In the process of importing a Microsoft-generated Excel spreadsheet into my homemade data-manipulating universe, I learned some things I hadn't known about the data format known as CSV (comma-separated values). In this common export format, database column values are separated by commas and rows are separated by carrigae returns. Unlike the native formats of database tables from nearly all database vendors, this format is human-readable when viewed in a text editor. My assumption was that the values in any particular CSV file are all either enclosed by quotes or they are not, depending on whether or not the column values themselves could potentially contain commas and carriage returns (which would otherwise be interpreted as delimiters). Of course this doesn't really solve anything, since it introduces another character (the double quote) that has to be escaped should it appear in columnar data. But it's a hacked-together data format from the 1960s, a time when tabular data wasn't often used for text information (and when text itself was often all-uppercase and bore only rudimentary punctuation).
It turns out that my assumption about CSV was over-simplistic, because (at least as exported by Microsoft Excel), any particular field in any particular row in any particular CSV file can either have quotes around it or it can not have quotes around it, depending on whether they are necessary. (In the CSV export code I've written, the fields are always quoted and any quotes in the tabular data are always escaped, and Excel never had any problem importing this more predictable format.)
It turns out that parsing a CSV file where the data can go back and forth between being quoted and not being quoted is a challenging programming problem. Today I decided to tackle it after discovering that the PHP function that is supposed to do this (str_getcsv) doesn't seem to work (and even if it did, it's only present in very recent versions of PHP). The function I use for parsing delimited strings is called BetterExplode, and it's been in my arsenal since November, 2007. It basically does what the built-in PHP function explode does, but it pays enough attention to the context of characters used as delimiters to avoid treating them as such when they are obviously part of data (quoted, escaped, or appearing in a PHP comment). Today I added a small amount of code to BetterExplode so that it could successfully parse lines of a CSV file exported by Microsoft Excel. The function works by scanning through a string character by character, going into and out of modes depending on what it finds along the way. My new code is built around a "mop" (a variable called $strMop) that catches cases that had earlier been ignored, outputting them in the result array. For those coming here from Google searches, I've provided the content you are seeking: the latest copy of BetterExplode, complete with all the special functions it relies upon:

Last night on American Idol there was a two hour show featuring the performances of male contestants. Gretchen and I already have our list of contestants we hate, particularly Tim Urban, a mop-topped whiteboy positioning himself as a hearthrob in the Jonas Brothers tradition. His voice is mediocre at best and he seems drawn to pop songs that sound like Christian contemporary composed by a sexual predator. For some strange reason most of our favorites were having an off night, leaving room for the likes of Alex Lambert (who sports an unironic mullet) and Aaron Kelley (whom we refer to as "the twerp") to surprise us.
Tonight was ladies' night, and their performances were decidedly better than those of the gentlemen. There's also an interesting personality issue at play. Usually the contestants on American Idol are some mix of vapid drips and barely-concealed publicity hounds, but the thing that is striking about this season of Idol is the number of female contestants who actually seem interesting, like people we knew and socialized with at their age. These include Crystal Bowersox, Lilly Scott, and the pleasant surprise of the evening, Siobhan Magnus, the existence of whom I had not yet become aware. The woman I disliked the most tonight was Lacey Brown, whose involvement in Idol seems to be even more about her and even less about the music than is typical for a contestant on the show.


For linking purposes this article's URL is:
http://asecular.com/blog.php?100303

feedback
previous | next