Latest posts.

stupid closure tricks in py3k

This is kind of silly, but I like the nonlocal keyword in Python 3 (well, ok, I like let-bindings in Lisps much much better, but this will do) and how you can get a closure and then poke the insides of the closure to see what it’s up to.  I’m sure this isn’t recommended, but it’s a fun trick:

def counter():
    a = 0
    def inc():
        nonlocal a
        a+=1
        return a
    return inc
 
c = counter()
print(c())
print(c())
print(c())
 
print(c.__closure__[0].cell_contents)

I’d guess the locations of cells in the closure tuple are determined at compile time (i.e. they won’t change between runs), but I haven’t investigated this.

relating to nosql

About the time the nosql movement started to get legs, I started to get interested in relational databases.  They’re awesome beasts, and horizontal scalability issues aside, they handle a lot of the very hard problems in the web programming space (i.e. durability and concurrency).  But when data gets large and you can’t find the 2 million quarters in your sofa for that oracle license (or the 2 million braincells for that oh-so-perfect sharding scheme),  the nosql databases start to show their appeal.

But maybe there’s another way.  This article is mostly focussed on relaxing the durability constraints of postgres but also mentions a possible mixing of relational and non-relational systems.  What a cool idea!  Use the somewhat batch-oriented, distributed, and non-relational system as a large backing store, and populate a traditional relational database (almost) on demand by transforming and moving (ETLing) the data there.  With the right transforms, we get all the ad-hoc queries of our beloved star schema data warehouses without the expense of a big-honkin’ database server.  Certainly these ‘transforms’ are the tricky part (and slicing the data out of the non-relational store along a particular dimension (probably time) will almost always be necessary), but the gains seems pretty great.  Maybe we’ll call this approach a “pattern” someday.

Isn’t big data exciting?

simple s-expression parser in python

A few weeks back, I boasted in a bar that anyone could write a simple s-expression parser from scratch (no flex, no yacc, no bison, no antler, etc).  Well, I could not boast without doing it myself, so I finally did.  My s-expression lexer/parser weighs in at about 170 lines without tests (450 lines with them), and parsers integers and symbols.  I even created a little repl which takes this:

(+ 1 2)

And converts it into this:

[Symbol(+), Number(1), Number(2)]

A pretty-printed python list consisting of objects representing symbols and numbers.

I think the design is fairly readable if a bit unusual. There are classes for each token (instances of which are used for fully and partially formed tokens) and we “add” these tokens together to form a larger token and ultimately invoke a next_token_exception allowing the lexer to yield the largest, greediest, most fully-formed token possible.

Here’s an example of the lexer in action:

>>> symbol_token("h")
<parse_sexp.symbol_token object at 0x7fb9e2d253d0>
>>> _ + symbol_token("i")
<parse_sexp.symbol_token object at 0x7fb9e2d25710>
>>> formed = _
>>> formed + " "
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "parse_sexp.py", line 33, in __add__
    raise next_token_exception()
parse_sexp.next_token_exception
>>> formed.value
'hi'

After lexing, we’re left with a stream of fully-formed tokens; it’s a simple matter to filter out the whitespace tokens and start yielding instances of Symbol() and Number() classes with proper, pythonic types for the data.

Although I used regexes (quite unnecessarily) and generators, this little thing could probably be ported to C. It’d be the fastest, most useless parser/lexer around :)

Happy Hacking!

on the road

A month ago, I left Chicago on my bicycle, headed to Moline and then south to Saint Louis.  The journey spanned 7 days, 6 nights, and 589 miles.

I rode alone with dreams of solving my problems and divining my future in sunbaked introspection.  I carried all my gear (too much of it), camped 4 nights (hotel-camped 2 nights ;) ) and navigated my poorly planned route with two guide books and my G1.

Riding alone has its perks.  Plenty of time to think, and plenty of time to think about getting lost.  Fortunately my navigating skills seemed to improve after the second day (I nearly burst into tears that day after the GPS put my location in the middle of a lake, not, as I expected, between the Hennepin and I&M Canals).

Camping alone is different.  There are very few perks.  All my nights were fairly restless, and the one night I didn’t have to hiss at raccoons to shoo them away, a large storm came through at 2:00am.  My tent held out, though, and I can sleep when I’m dead.  But still, I’ve no shame now admitting to being afraid of the dark.

`

Being alone in the middle of Illinois is an odd thing.  It’s possible to ride for hours without seeing another human, but at no point are you truly in the middle of nowhere.  Everything has the imprint of humanity.  The road itself is a man-made scar through the land, but moreover, the land is far from natural.  Almost every inch is cultivated, pruned, or tilled for our use.  I suppose this is the trade-off  of a civilized society (possibly the definition of it), but it still makes me glad there are areas we have set aside for the exclusive non-use by humans.

In general the trip is somewhat hazy.  I probably rode too fast and did not stop at enough towns to really soak everything in.  But still I find myself thinking of very specific things about the ride, whether it be pushing my bike up a giant hill or resting for a bit on a bench.  I will do this again.  Someday I will find the time to ride farther west or maybe even ride the entire American Discovery Trail.

things I have re-learned so far this week

  • close your files, flush your files, fsync() your files, sync() if you have to, just don’t be stupid.
  • don’t use globals for mutable state. especially when you’re in a multi-threaded environment. or you’re interested in correct answers.
  • there’s a good chance the ORM is writing shitty sql.
  • coffee != sleep
  • more tests!

delaying computation: lazy dictionaries in Python

Yesterday I had a problem I ended up solving another way, but my first approach required a lazy dictionary.  That is, a dictionary filled with the results of some other function, but also one that only evaluated this computation when the dictionary was first accessed.  Here’s a silly example use case:

class LazyDictionary(object):
 
    def __init__(self, callback=dict):
        self.data = None
        self.callback = callback
 
    def evaluate_callback(self):
        self.data = self.callback()
 
    def __getitem__(self, name):
        if(self.data is None):
            self.evaluate_callback()
        return self.data.__getitem__(name)
 
    def __setitem__(self, name, value):
        if(self.data is None):
            self.evaluate_callback()
        return self.data.__setitem__(name, value)
 
    def __getattr__(self, name):
        if(self.data is None):
            self.evaluate_callback()
        return getattr(self.data, name)

The real version I started to use (and then threw out because it was not needed) inherited from a dict instead of using delegation. At the time I thought this was necessary to get it to play nice with the Django templating system (I may have been wrong about this). Anyways, I like this version much better, and I view inheritance for the composition of behavior with growing skepticism each day.

I also have unittests for this but they made this post unappealing long (I should really start using that github gist thingie. Is there a mercurial equivalent? (also I fixed my blog comments)).

python thirty days ago

These tiny things make Python fun and useful:

import datetime
thirty_days = datetime.timedelta(days=30)
thirty_days_ago = datetime.date.today() - thirty_days

plotting the racial makeup of Chicago’s public schools

schools

I created this image using matplotlib and the data from my CPS Racial Data Warehouse project.  Circle size represents school population.  The axes are latitude and longitude (as a reference, the loop is the empty rectangle just below the left corner of the legend).  Also I should note the Native American population was excluded from this image because of a technical issue.

Full size image here, more on this project soon (in the meantime, the code is here).

software quality

I’ve been thinking a lot about software quality and testing. This is what I’ve come up with:

softwarequality

palindromes on twitter – rettiwt no semordnilap

Lately I’ve been thinking about processing streams of data on the fly. Weirdly enough, I have a fascination with palindromes, so I wrote a very bad hack with Twisted to hookup to the streaming twitter feed and search for these things. Over the course of about a day, I found 259 distinct palindromes where palindromes are (very narrowly) defined as:

  1. Words without punctuation, numbers or accents (i.e. only a-z in regex).
  2. Longer than four characters.
  3. Consisting of three or more distinct characters.
  4. Not in a set of stop words I decided I was not interested in.

The astute reader will notice that these rules actual mean all valid palindromes will be 5 or more characters. Now for the results:

Twitter Palindromes

“Level” is the clear winner with 662 occurrences. The venerable palindrome “racecar” sped into the #30 spot with a mere 8 occurrences.

Unfortunately I did not keep track of how many tweets I scanned, but it appears a very small percentage of tweets contain this type of palindrome.  I would like to convert this to a full-on palindrome tracking twitter bot that will annoyingly tweet at you if you tweet something palindrome-ish, but I am slightly afraid someone would take that code and make streaming-twitter-pesterbot-v1™.  Hunting for palindromic sentences would be fun, too (a palindromic 144 character tweet would be pure genius).  Most importantly, I’d like to search for sarahpalindromes which I’ve defined as words that become palindromes with one transposition (sarahpalindrome detection algorithms are much appreciated…you could probably get a paper out of something like that).