Monthly Archives: June 2010

simple s-expression parser in python

A few weeks back, I boasted in a bar that anyone could write a simple s-expression parser from scratch (no flex, no yacc, no bison, no antler, etc).  Well, I could not boast without doing it myself, so I finally did.  My s-expression lexer/parser weighs in at about 170 lines without tests (450 lines with them), and parsers integers and symbols.  I even created a little repl which takes this:

(+ 1 2)

And converts it into this:

[Symbol(+), Number(1), Number(2)]

A pretty-printed python list consisting of objects representing symbols and numbers.

I think the design is fairly readable if a bit unusual. There are classes for each token (instances of which are used for fully and partially formed tokens) and we “add” these tokens together to form a larger token and ultimately invoke a next_token_exception allowing the lexer to yield the largest, greediest, most fully-formed token possible.

Here’s an example of the lexer in action:

>>> symbol_token("h")
<parse_sexp.symbol_token object at 0x7fb9e2d253d0>
>>> _ + symbol_token("i")
<parse_sexp.symbol_token object at 0x7fb9e2d25710>
>>> formed = _
>>> formed + " "
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "parse_sexp.py", line 33, in __add__
    raise next_token_exception()
parse_sexp.next_token_exception
>>> formed.value
'hi'

After lexing, we’re left with a stream of fully-formed tokens; it’s a simple matter to filter out the whitespace tokens and start yielding instances of Symbol() and Number() classes with proper, pythonic types for the data.

Although I used regexes (quite unnecessarily) and generators, this little thing could probably be ported to C. It’d be the fastest, most useless parser/lexer around :)

Happy Hacking!

on the road

A month ago, I left Chicago on my bicycle, headed to Moline and then south to Saint Louis.  The journey spanned 7 days, 6 nights, and 589 miles.

I rode alone with dreams of solving my problems and divining my future in sunbaked introspection.  I carried all my gear (too much of it), camped 4 nights (hotel-camped 2 nights ;) ) and navigated my poorly planned route with two guide books and my G1.

Riding alone has its perks.  Plenty of time to think, and plenty of time to think about getting lost.  Fortunately my navigating skills seemed to improve after the second day (I nearly burst into tears that day after the GPS put my location in the middle of a lake, not, as I expected, between the Hennepin and I&M Canals).

Camping alone is different.  There are very few perks.  All my nights were fairly restless, and the one night I didn’t have to hiss at raccoons to shoo them away, a large storm came through at 2:00am.  My tent held out, though, and I can sleep when I’m dead.  But still, I’ve no shame now admitting to being afraid of the dark.

`

Being alone in the middle of Illinois is an odd thing.  It’s possible to ride for hours without seeing another human, but at no point are you truly in the middle of nowhere.  Everything has the imprint of humanity.  The road itself is a man-made scar through the land, but moreover, the land is far from natural.  Almost every inch is cultivated, pruned, or tilled for our use.  I suppose this is the trade-off  of a civilized society (possibly the definition of it), but it still makes me glad there are areas we have set aside for the exclusive non-use by humans.

In general the trip is somewhat hazy.  I probably rode too fast and did not stop at enough towns to really soak everything in.  But still I find myself thinking of very specific things about the ride, whether it be pushing my bike up a giant hill or resting for a bit on a bench.  I will do this again.  Someday I will find the time to ride farther west or maybe even ride the entire American Discovery Trail.

things I have re-learned so far this week

  • close your files, flush your files, fsync() your files, sync() if you have to, just don’t be stupid.
  • don’t use globals for mutable state. especially when you’re in a multi-threaded environment. or you’re interested in correct answers.
  • there’s a good chance the ORM is writing shitty sql.
  • coffee != sleep
  • more tests!