   walking around the errors
Wednesday, July 13 2016
Today in my remote workplace, I focused on a task involving missing records in a database. My workplace responsibilities revolve around databases on two servers: one contains the email lists for a proprietary mass mailing system, and the other contains a database of donors and contacts. In theory, the latter database holds the best, most complete copy of our contact information, but for that to be true, it has to be updated with every record that goes into the mass emailing system. For several weeks recently, though, the system that feeds the donor database most of its data was broken, meaning there was now data in the mass emailing system with no corresponding records in the donor database, a situation Meerkat had indicated should never happen. Indeed, he'd been so sure it could never happen that he hadn't built a system to copy data from the mass email system back to the donor database. Now that I knew such a system was necessary, I'd made myself a task to build it, and that was how I spent most of my workday.
First I built an API on the mass email system allowing me to request an arbitrary number of records whose contact IDs were larger than a given number. Since the mass mailing system is fragile, I settled on chunks of 500, a fairly modest number. I then built a cron job on the donor database side to retrieve these chunks of records, scan through them to see if any didn't exist in its database, and update a local table with the number of the last contact ID it had processed. This was based on code Meerkat had written to do the same thing in the opposite direction. It all seemed to work in theory, so I started it up on a test run that simply wrote the missing emails to a text file. But soon theory ran aground on the shoals of practice. The data from the mass emailing system was being returned as JSON, but there was a problem: some of the data had been stored incorrectly and, on retrieval, wasn't valid UTF-8. In a reasonable world, the bad data would just be ignored, but PHP's json_encode function refuses to return any data at all if any of its input is improper UTF-8. This meant that occasionally a request for 500 records would come back with nothing.
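To illustrate the brittleness, here's a minimal sketch (not code from either server; the records and email addresses are made up):

<?php
// json_encode() refuses to return ANY output if even one string in
// the input is invalid UTF-8; the whole 500-record payload is lost.
$records = array(
    array('contact_id' => 101, 'email' => 'good@example.com'),
    array('contact_id' => 102, 'email' => "bad\xC3(@example.com"), // broken UTF-8 byte sequence
);
var_dump(json_encode($records));  // bool(false) -- no data at all
echo json_last_error_msg();       // "Malformed UTF-8 characters..." (PHP 5.5+)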
PHP does provide a way to deal with this: passing json_encode an extra option called JSON_PARTIAL_OUTPUT_ON_ERROR, which allows it to produce as much data as it can while ignoring errors (that is, to not be so fucking brittle). This worked fine on the version of PHP on my main computer, which I use to debug my code. But when I uploaded the code to the mass emailing system, the whole thing exploded in a terrifying way. That server can send thousands of emails per minute and is always sending email somewhere, and evidently its older PHP didn't define JSON_PARTIAL_OUTPUT_ON_ERROR, so it treated the bare name as an unexpected string instead of a reserved integer. In the couple of minutes it took me to figure this out, the server sent me hundreds of admin error emails. These queued up in the Gmail system The Organization uses for its employee mail, and I continued to receive them hours later (as did my boss, though I'd told him not to worry). Eventually I was forced to make myself a filter that sent every email containing the string "json_encode" directly into the trash.
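In hindsight, a version-safe guard on the mass-mailing side would have avoided the explosion. A sketch of what I mean (the constant only exists as of PHP 5.5; on older versions PHP degrades the undefined name into a string):

<?php
// Only pass the option where it actually exists; otherwise pass 0
// (no options) instead of the accidental string
// "JSON_PARTIAL_OUTPUT_ON_ERROR".
$options = defined('JSON_PARTIAL_OUTPUT_ON_ERROR')
    ? JSON_PARTIAL_OUTPUT_ON_ERROR
    : 0;
echo json_encode(array('email' => 'good@example.com'), $options);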
That experience with a single little bug on the mass emailing system made me gun-shy about trying anything else to clean up the JSON data on that side, so I decided to deal with it as best I could on the receiving end. This greatly increased the complexity of my code. Now, instead of always asking for 500 records, it could ask for different numbers of them. If a request came back empty, the next run (a minute later) would ask for half as many records. If that request also came back empty, it would halve the number again, and so on, until it was down to asking for just one record; at that point it would increment the contact ID by one to step past the bad record, then keep scanning forward while doubling the size of each request until it was back up to 500. This way it could get past the occasional record containing bad data without skipping any of the good ones. This halving and doubling of the request size depending on whether a request succeeds resembles a binary search, and is a reasonably efficient way to do it. My system wasn't perfect, though, because it only knew about the present state and the one before it; to do this perfectly, it would need to know about three states. Another limitation was that the script only ran once each minute, meaning that in some minutes it grabbed only a single record. (Time wasn't much of an issue, and I didn't want to overburden the mass mailing server, which has more important things to be doing.)
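Distilled to its essentials, the cron job's halving-and-doubling logic looks something like this (my own paraphrase with made-up names, not the production code):

<?php
// Given the outcome of the last request, decide what to ask for on
// the next run (one minute later). $lastId is the highest contact ID
// known to be processed; $chunkSize is how many records to request.
function nextRequest($lastId, $chunkSize, $requestSucceeded, $highestIdReceived)
{
    if (!$requestSucceeded) {
        if ($chunkSize <= 1) {
            // Even a lone record won't encode, so it must be the bad
            // one: step past it and start ramping back up.
            return array($lastId + 1, 2);
        }
        // Bad UTF-8 lurks somewhere in the chunk: ask for half as many.
        return array($lastId, (int) ($chunkSize / 2));
    }
    // Success: record progress and double back toward the 500 ceiling.
    return array($highestIdReceived, min(500, $chunkSize * 2));
}

// e.g.: list($lastId, $chunkSize) = nextRequest($lastId, $chunkSize, $ok, $maxId);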

At around 9:00pm, I was feeling pretty good about the day's work, so I went with Gretchen when she wanted to go "skinny dipping" at the pool at the end of the Farm Road. I made the mistake of walking there barefoot; the gravel in the relatively new surface of the Farm Road poked into my feet, which haven't been adequately toughened this summer due to all my desk work. We swam around for about twenty minutes while the dogs stood around bored at the pool's edge. But on the walk home (which Gretchen and I did naked), Ramona ran off into the woods and could be heard barking at something interesting. A light rain began to fall, and that might've been what got her to abandon her hunt prematurely.

