<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>thatmattbone.com &#187; Matt Bone</title>
	<atom:link href="http://thatmattbone.com/author/mbone/feed/" rel="self" type="application/rss+xml" />
	<link>http://thatmattbone.com</link>
	<description>let's ride bikes...</description>
	<lastBuildDate>Mon, 19 Jul 2010 03:46:03 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>stupid closure tricks in py3k</title>
		<link>http://thatmattbone.com/2010/07/stupid-closure-tricks-in-py3k/</link>
		<comments>http://thatmattbone.com/2010/07/stupid-closure-tricks-in-py3k/#comments</comments>
		<pubDate>Mon, 19 Jul 2010 03:46:03 +0000</pubDate>
		<dc:creator>Matt Bone</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://thatmattbone.com/?p=288</guid>
		<description><![CDATA[This is kind of silly, but I like the nonlocal keyword in Python 3 (well, ok, I like let-bindings in Lisps much much better, but this will do) and how you can get a closure and then poke the insides of the closure to see what it&#8217;s up to.  I&#8217;m sure this isn&#8217;t recommended, but [...]]]></description>
			<content:encoded><![CDATA[<p>This is kind of silly, but I like the nonlocal keyword in Python 3 (well, ok, I like let-bindings in Lisps much much better, but this will do) and how you can get a closure and then poke the insides of the closure to see what it&#8217;s up to.  I&#8217;m sure this isn&#8217;t recommended, but it&#8217;s a fun trick:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">def</span> counter<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>:
    a = <span style="color: #ff4500;">0</span>
    <span style="color: #ff7700;font-weight:bold;">def</span> inc<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>:
        nonlocal a
        a+=<span style="color: #ff4500;">1</span>
        <span style="color: #ff7700;font-weight:bold;">return</span> a
    <span style="color: #ff7700;font-weight:bold;">return</span> inc
&nbsp;
c = counter<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
<span style="color: #ff7700;font-weight:bold;">print</span><span style="color: black;">&#40;</span>c<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
<span style="color: #ff7700;font-weight:bold;">print</span><span style="color: black;">&#40;</span>c<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
<span style="color: #ff7700;font-weight:bold;">print</span><span style="color: black;">&#40;</span>c<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">print</span><span style="color: black;">&#40;</span>c.__closure__<span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span>.<span style="color: black;">cell_contents</span><span style="color: black;">&#41;</span></pre></div></div>

<p>I&#8217;d guess the locations of cells in the closure tuple are determined at compile time (i.e. they won&#8217;t change between runs), but I haven&#8217;t investigated this.</p>
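<p>One way to avoid relying on the position of a cell at all: a function&#8217;s code object lists the names of its free variables in <code>__code__.co_freevars</code>, in the same order as the cells in <code>__closure__</code>. The sketch below reuses the counter from above; the <code>closure_vars()</code> helper is a name made up for this illustration.</p>

```python
def counter():
    a = 0
    def inc():
        nonlocal a
        a += 1
        return a
    return inc

def closure_vars(func):
    # Hypothetical helper: pair each free variable name with its cell's value.
    names = func.__code__.co_freevars
    cells = func.__closure__ or ()
    return {name: cell.cell_contents for name, cell in zip(names, cells)}

c = counter()
c()
c()
state = closure_vars(c)
print(state)  # {'a': 2}
```

<p>Looking cells up by name this way means nothing depends on the index into the closure tuple.</p>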
]]></content:encoded>
			<wfw:commentRss>http://thatmattbone.com/2010/07/stupid-closure-tricks-in-py3k/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>relating to nosql</title>
		<link>http://thatmattbone.com/2010/07/relating-to-nosql/</link>
		<comments>http://thatmattbone.com/2010/07/relating-to-nosql/#comments</comments>
		<pubDate>Tue, 13 Jul 2010 04:45:11 +0000</pubDate>
		<dc:creator>Matt Bone</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[databases]]></category>

		<guid isPermaLink="false">http://thatmattbone.com/?p=284</guid>
		<description><![CDATA[About the time the nosql movement started to get legs, I started to get interested in relational databases.  They&#8217;re awesome beasts, and horizontal scalability issues aside, they handle a lot of the very hard problems in the web programming space (i.e. durability and concurrency).  But when data gets large and you can&#8217;t find the 2 million quarters [...]]]></description>
			<content:encoded><![CDATA[<p>About the time the nosql movement started to get legs, I started to get interested in relational databases.  They&#8217;re awesome beasts, and horizontal scalability issues aside, they handle a lot of the very hard problems in the web programming space (i.e. durability and concurrency).  But when data gets large and you can&#8217;t find the 2 million quarters in your sofa for that oracle license (or the 2 million braincells for that oh-so-perfect sharding scheme),  the nosql databases start to show their appeal.</p>
<p>But maybe there&#8217;s another way.  <a href="http://it.toolbox.com/blogs/database-soup/runningwithscissorsdb-39879?rss=1">This article</a> is mostly focused on relaxing the durability constraints of postgres but also mentions a possible mixing of relational and non-relational systems.  What a cool idea!  Use the somewhat batch-oriented, distributed, and non-relational system as a large backing store, and populate a traditional relational database (almost) on demand by transforming and moving (ETLing) the data there.  With the right transforms, we get all the ad-hoc queries of our beloved star schema data warehouses without the expense of a big-honkin&#8217; database server.  Certainly these &#8216;transforms&#8217; are the tricky part (and slicing the data out of the non-relational store along a particular dimension (probably time) will almost always be necessary), but the gains seem pretty great.  Maybe we&#8217;ll call this approach a &#8220;pattern&#8221; someday.</p>
<p>Isn&#8217;t big data exciting?</p>
]]></content:encoded>
			<wfw:commentRss>http://thatmattbone.com/2010/07/relating-to-nosql/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>things I have re-learned so far this week</title>
		<link>http://thatmattbone.com/2010/06/things-i-have-re-learned-so-far-this-week/</link>
		<comments>http://thatmattbone.com/2010/06/things-i-have-re-learned-so-far-this-week/#comments</comments>
		<pubDate>Wed, 23 Jun 2010 03:43:32 +0000</pubDate>
		<dc:creator>Matt Bone</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[software]]></category>

		<guid isPermaLink="false">http://thatmattbone.com/?p=243</guid>
		<description><![CDATA[close your files, flush your files, fsync() your files, sync() if you have to, just don&#8217;t be stupid. don&#8217;t use globals for mutable state. especially when you&#8217;re in a multi-threaded environment. or you&#8217;re interested in correct answers. there&#8217;s a good chance the ORM is writing shitty sql. coffee != sleep more tests!]]></description>
			<content:encoded><![CDATA[<ul>
<li>close your files, flush your files, fsync() your files, sync() if you have to, just don&#8217;t be stupid.</li>
<li>don&#8217;t use globals for mutable state. especially when you&#8217;re in a multi-threaded environment. or you&#8217;re interested in correct answers.</li>
<li>there&#8217;s a good chance the ORM is writing shitty sql.</li>
<li>coffee != sleep</li>
<li>more tests!</li>
</ul>
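<p>For the first item, a minimal sketch of what &#8220;flush and fsync&#8221; looks like in Python. The <code>durable_write()</code> name and the temporary path are made up for this example; the calls themselves (<code>flush()</code>, <code>os.fsync()</code>) are standard library.</p>

```python
import os
import tempfile

def durable_write(path, data):
    # The with-block closes the file even if the write raises.
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # push Python's buffer down to the OS
        os.fsync(f.fileno())  # ask the OS to push its buffer to the disk

# Illustrative usage with a throwaway file.
path = os.path.join(tempfile.mkdtemp(), "example.bin")
durable_write(path, b"hello")
```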
]]></content:encoded>
			<wfw:commentRss>http://thatmattbone.com/2010/06/things-i-have-re-learned-so-far-this-week/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>python thirty days ago</title>
		<link>http://thatmattbone.com/2010/04/python-thirty-days-ago/</link>
		<comments>http://thatmattbone.com/2010/04/python-thirty-days-ago/#comments</comments>
		<pubDate>Fri, 16 Apr 2010 12:42:53 +0000</pubDate>
		<dc:creator>Matt Bone</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://thatmattbone.com/?p=214</guid>
		<description><![CDATA[These tiny things make Python fun and useful: import datetime thirty_days = datetime.timedelta&#40;days=30&#41; thirty_days_ago = datetime.date.today&#40;&#41; - thirty_days]]></description>
			<content:encoded><![CDATA[<p>These tiny things make Python fun and useful:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">datetime</span>
thirty_days = <span style="color: #dc143c;">datetime</span>.<span style="color: black;">timedelta</span><span style="color: black;">&#40;</span>days=<span style="color: #ff4500;">30</span><span style="color: black;">&#41;</span>
thirty_days_ago = <span style="color: #dc143c;">datetime</span>.<span style="color: black;">date</span>.<span style="color: black;">today</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span> - thirty_days</pre></div></div>

]]></content:encoded>
			<wfw:commentRss>http://thatmattbone.com/2010/04/python-thirty-days-ago/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>plotting the racial makeup of Chicago&#8217;s public schools</title>
		<link>http://thatmattbone.com/2010/04/plotting-the-racial-makeup-of-chicagos-public-schools/</link>
		<comments>http://thatmattbone.com/2010/04/plotting-the-racial-makeup-of-chicagos-public-schools/#comments</comments>
		<pubDate>Tue, 13 Apr 2010 12:24:24 +0000</pubDate>
		<dc:creator>Matt Bone</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[software]]></category>
		<category><![CDATA[transparency]]></category>

		<guid isPermaLink="false">http://thatmattbone.com/?p=201</guid>
		<description><![CDATA[I created this image using matplotlib and the data from my CPS Racial Data Warehouse project.  Circle size represents school population.  The axes are latitude and longitude (as a reference, the loop is the empty rectangle just below the left corner of the legend).  Also I should note the Native American population was excluded from [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignnone size-full wp-image-202" title="schools" src="http://thatmattbone.com/wp-content/uploads/2010/04/schools.png" alt="schools" width="500" /></p>
<p>I created this image using <a href="http://matplotlib.sourceforge.net/">matplotlib</a> and the data from my <a href="http://cpswarehouse.thatmattbone.com/">CPS Racial Data Warehouse</a> project.  Circle size represents school population.  The axes are latitude and longitude (as a reference, the loop is the empty rectangle just below the left corner of the legend).  Also I should note the Native American population was excluded from this image because of a technical issue.</p>
<p>Full size image <a href="http://thatmattbone.com/wp-content/uploads/2010/04/schools.png">here</a>, more on this project soon (in the meantime, <a href="http://code.google.com/p/cpswarehouse/">the code is here</a>).</p>
]]></content:encoded>
			<wfw:commentRss>http://thatmattbone.com/2010/04/plotting-the-racial-makeup-of-chicagos-public-schools/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>software quality</title>
		<link>http://thatmattbone.com/2010/03/software-quality/</link>
		<comments>http://thatmattbone.com/2010/03/software-quality/#comments</comments>
		<pubDate>Sun, 07 Mar 2010 14:49:02 +0000</pubDate>
		<dc:creator>Matt Bone</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[software]]></category>

		<guid isPermaLink="false">http://thatmattbone.com/?p=194</guid>
		<description><![CDATA[I&#8217;ve been thinking a lot about software quality and testing. This is what I&#8217;ve come up with:]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been thinking a lot about software quality and testing.  This is what I&#8217;ve come up with:</p>
<p><img src="http://thatmattbone.com/wp-content/uploads/2010/03/softwarequality.jpg" alt="softwarequality" title="softwarequality" width="281" height="185" class="alignnone size-full wp-image-195" /></p>
]]></content:encoded>
			<wfw:commentRss>http://thatmattbone.com/2010/03/software-quality/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>palindromes on twitter &#8211; rettiwt no semordnilap</title>
		<link>http://thatmattbone.com/2010/01/palindromes-on-twitter-rettiwt-no-semordnilap/</link>
		<comments>http://thatmattbone.com/2010/01/palindromes-on-twitter-rettiwt-no-semordnilap/#comments</comments>
		<pubDate>Sun, 24 Jan 2010 23:35:32 +0000</pubDate>
		<dc:creator>Matt Bone</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://thatmattbone.com/?p=177</guid>
		<description><![CDATA[Lately I&#8217;ve been thinking about processing streams of data on the fly. Weirdly enough, I have a fascination with palindromes, so I wrote a very bad hack with Twisted to hook up to the streaming twitter feed and search for these things. Over the course of about a day, I found 259 distinct palindromes where palindromes [...]]]></description>
			<content:encoded><![CDATA[<p>Lately I&#8217;ve been thinking about processing streams of data on the fly.  Weirdly enough, I have a fascination with palindromes, so I wrote <a href="http://bitbucket.org/thatmattbone/twitiwt/">a very bad hack</a> with <a href="http://twistedmatrix.com/trac/">Twisted</a> to hook up to the <a href="http://apiwiki.twitter.com/Streaming-API-Documentation">streaming twitter feed</a> and search for these things.  Over the course of about a day, I found 259 distinct palindromes where palindromes are (very narrowly) defined as:</p>
<ol>
<li>Words without punctuation, numbers or accents (i.e. only a-z in regex).</li>
<li>Longer than four characters.</li>
<li>Consisting of three or more distinct characters.</li>
<li>Not in a set of stop words I decided I was not interested in.</li>
</ol>
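<p>The four rules above can be sketched as a single predicate. This is not the code from the hack itself; the function name, the lowercasing step, and the one-word stop list are assumptions made for illustration.</p>

```python
import re

# Hypothetical stop list; the real one is not given in this post.
STOP_WORDS = {"radar"}

ONLY_LETTERS = re.compile(r"[a-z]+")

def is_interesting_palindrome(word, stop_words=STOP_WORDS):
    word = word.lower()  # assumption: case is ignored
    return (
        ONLY_LETTERS.fullmatch(word) is not None  # rule 1: only a-z
        and len(word) > 4                         # rule 2: longer than four characters
        and len(set(word)) >= 3                   # rule 3: three or more distinct characters
        and word not in stop_words                # rule 4: not a stop word
        and word == word[::-1]                    # and it reads the same backwards
    )

print(is_interesting_palindrome("level"))  # True
print(is_interesting_palindrome("noon"))   # False
```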
<p>The astute reader will notice that these rules actually mean all valid palindromes will be 5 or more characters. Now for the results:</p>
<p style="text-align: center;"><img class="aligncenter size-full wp-image-178" title="Twitter Palindromes" src="http://thatmattbone.com/wp-content/uploads/2010/01/out.png" alt="Twitter Palindromes" width="500" height="600" /></p>
<p>&#8220;Level&#8221; is the clear winner with 662 occurrences. The venerable palindrome &#8220;racecar&#8221; sped into the #30 spot with a mere 8 occurrences.</p>
<p>Unfortunately I did not keep track of how many tweets I scanned, but it appears a very small percentage of tweets contain this type of palindrome.  I would like to convert this to a full-on palindrome tracking twitter bot that will annoyingly tweet at you if you tweet something palindrome-ish, but I am slightly afraid someone would take that code and make streaming-twitter-pesterbot-v1™.  Hunting for palindromic sentences would be fun, too (a palindromic 140-character tweet would be pure genius).  Most importantly, I&#8217;d like to search for sarahpalindromes, which I&#8217;ve defined as words that become palindromes with one transposition (sarahpalindrome detection algorithms are much appreciated&#8230;you could probably get a paper out of something like that).</p>
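<p>For what it&#8217;s worth, a brute-force sarahpalindrome check is only a few lines. This sketch assumes &#8220;one transposition&#8221; means swapping two adjacent characters and that words which are already palindromes don&#8217;t count; both are guesses at the definition, and the function names are made up here.</p>

```python
def is_palindrome(word):
    return word == word[::-1]

def is_sarahpalindrome(word):
    # Assumption: a word that is already a palindrome does not qualify.
    if is_palindrome(word):
        return False
    chars = list(word)
    for i in range(len(chars) - 1):
        # Swap one adjacent pair, test, then swap back.
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
        if chars == chars[::-1]:
            return True
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return False

print(is_sarahpalindrome("raceacr"))  # True
print(is_sarahpalindrome("level"))    # False
```

<p>Each word costs a quadratic amount of work this way, which is probably fine for tweet-sized words; the paper-worthy version is left to someone else.</p>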
]]></content:encoded>
			<wfw:commentRss>http://thatmattbone.com/2010/01/palindromes-on-twitter-rettiwt-no-semordnilap/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Binding Classes to Tags in Python&#8217;s ElementTree</title>
		<link>http://thatmattbone.com/2010/01/binding-classes-to-tags-in-pythons-elementtree/</link>
		<comments>http://thatmattbone.com/2010/01/binding-classes-to-tags-in-pythons-elementtree/#comments</comments>
		<pubDate>Fri, 01 Jan 2010 20:51:21 +0000</pubDate>
		<dc:creator>Matt Bone</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[oop]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://thatmattbone.com/?p=165</guid>
		<description><![CDATA[Most of my work on BetterXML was in the area of data binding. In both of our frameworks, we created mappings between classes and markup elements, and then bound each occurrence of a tag in a document to an instance of the mapped class. Our work there was implemented from scratch (i.e. on top of [...]]]></description>
			<content:encoded><![CDATA[<p>Most of my work on <a href="betterxml.googlecode.com">BetterXML</a> was in the area of data binding.  In both of our frameworks, we created mappings between classes and markup elements, and then bound each occurrence of a tag in a document to an instance of the mapped class. Our work there was implemented from scratch (i.e. on top of a SAX parser), but it turns out that some of our ideas are quite simple to implement in Python&#8217;s <a href="http://docs.python.org/library/xml.etree.elementtree.html">ElementTree</a>.  Let&#8217;s say we are parsing the xml version of someone&#8217;s twitter stream.  It looks a little something like:</p>
<pre>&lt;statuses type="array"&gt;
  &lt;status&gt;
    &lt;created_at&gt;Fri Dec 18 18:31:14 +0000 2009&lt;/created_at&gt;
    &lt;id&gt;6804282413&lt;/id&gt;
    &lt;text&gt;my frozen soup will never thaw &lt;/text&gt;
    ...
  &lt;/status&gt;
  ...
&lt;/statuses&gt;</pre>
<p>We may like to bind each instance of the &lt;text&gt; tag to an instance of a Tweet class:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">class</span> Tweet<span style="color: black;">&#40;</span>_ElementInterface<span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">def</span> word_count<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
        <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #008000;">len</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span>.<span style="color: black;">text</span>.<span style="color: black;">split</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
&nbsp;
    <span style="color: #ff7700;font-weight:bold;">def</span> at_someone<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
        <span style="color: #ff7700;font-weight:bold;">for</span> word <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">self</span>.<span style="color: black;">text</span>.<span style="color: black;">split</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>: <span style="color: #808080; font-style: italic;">#this is in the xml, but here we do it from scratch for fun</span>
            <span style="color: #ff7700;font-weight:bold;">if</span><span style="color: black;">&#40;</span>word.<span style="color: black;">startswith</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'@'</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>:
                <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #008000;">True</span>
        <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #008000;">False</span></pre></div></div>

<p>The actual binding occurs by creating our own factory method, and supplying this to the <a href="http://docs.python.org/library/xml.etree.elementtree.html#treebuilder-objects">ElementTree TreeBuilder</a> and parser:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">def</span> bound_element_factory<span style="color: black;">&#40;</span>tag, attrs<span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">if</span><span style="color: black;">&#40;</span>tag == <span style="color: #483d8b;">'text'</span><span style="color: black;">&#41;</span>:
        <span style="color: #ff7700;font-weight:bold;">return</span> Tweet<span style="color: black;">&#40;</span>tag, attrs<span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">elif</span><span style="color: black;">&#40;</span>tag == <span style="color: #483d8b;">'statuses'</span><span style="color: black;">&#41;</span>:
        <span style="color: #ff7700;font-weight:bold;">return</span> Twitter<span style="color: black;">&#40;</span>tag, attrs<span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">else</span>:
        <span style="color: #ff7700;font-weight:bold;">return</span> _ElementInterface<span style="color: black;">&#40;</span>tag, attrs<span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #dc143c;">parser</span> = XMLTreeBuilder<span style="color: black;">&#40;</span>target=TreeBuilder<span style="color: black;">&#40;</span>element_factory=bound_element_factory<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
tree = ElementTree.<span style="color: black;">parse</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'mytweets.xml'</span>, <span style="color: #dc143c;">parser</span>=<span style="color: #dc143c;">parser</span><span style="color: black;">&#41;</span></pre></div></div>

<p>So now, as the document is parsed, the bound_element_factory() method is called each time a tag is encountered.  Instead of always instantiating an _ElementInterface class for each tag (what the default factory method does), this method instantiates specialized child classes when a &#8220;text&#8221; or &#8220;statuses&#8221; tag is encountered.  These classes can obviously define or override methods willy-nilly.</p>
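<p>The snippets above lean on names from the ElementTree of the time (<code>_ElementInterface</code>, <code>XMLTreeBuilder</code>). A rough self-contained sketch of the same idea against the current standard library, assuming <code>Element</code> and <code>XMLParser</code> as stand-ins for those names, might look like the following. The sample document is invented, and only the Tweet binding is shown.</p>

```python
import xml.etree.ElementTree as ET

class Tweet(ET.Element):
    def word_count(self):
        return len((self.text or "").split())

    def at_someone(self):
        return any(word.startswith("@") for word in (self.text or "").split())

def bound_element_factory(tag, attrs):
    # Bind <text> tags to Tweet; everything else stays a plain Element.
    if tag == "text":
        return Tweet(tag, attrs)
    return ET.Element(tag, attrs)

# Invented sample data in the shape of the feed shown earlier.
SAMPLE = (
    '<statuses type="array">'
    "<status><text>my frozen soup will never thaw </text></status>"
    "<status><text>@someone hello</text></status>"
    "</statuses>"
)

parser = ET.XMLParser(target=ET.TreeBuilder(element_factory=bound_element_factory))
root = ET.fromstring(SAMPLE, parser=parser)
tweets = list(root.iter("text"))

for tweet in tweets:
    print(tweet.word_count(), tweet.at_someone())
```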
<p>While I think this is a pretty neat xml party trick, it is fair to question the utility of this approach in particular and data binding in general.   After all, ElementTree is pretty useful on its own. But data binding is appealing because it allows us to create the object and class hierarchy first, and worry about going to and from the markup representation later.</p>
<p>This version of data binding is a bit backwards, though. Because our objects extend the _ElementInterface object, we still must call the ElementTree&#8217;s <a href="http://docs.python.org/library/xml.etree.elementtree.html#xml.etree.ElementTree.Element.insert">insert()</a> method to build up a tree of objects.  Thus, when we write our classes and build up trees of objects, we&#8217;re aware that they&#8217;re going to or coming from xml.</p>
<p>However, all is not lost. This approach can be useful when going <em>from</em> existing markup (particularly if it is changing or poorly defined) to an object hierarchy.  Again, this is what ElementTree excels at, but here we have the opportunity to create some utility methods in the objects and hide the markup from the users of these objects.  So in this case, if twitter changes the markup from &lt;text&gt; to &lt;tweet&gt;, we need only change the factory and the call to find().  Users of the Tweet class will see no changes.</p>
<p>A complete working example can be found in my <a href="http://bitbucket.org/thatmattbone/elementtree-data-binding/src/tip/example.py">bitbucket</a> (pardon the non-working thing in that repo, too).  Also, a major hat tip to the flexibility of the ElementTree implementation.  I highly recommend reading that very pretty code, too.</p>
]]></content:encoded>
			<wfw:commentRss>http://thatmattbone.com/2010/01/binding-classes-to-tags-in-pythons-elementtree/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Binary tree traversal in Python with generators</title>
		<link>http://thatmattbone.com/2009/09/binary-tree-traversal-in-python-with-generators/</link>
		<comments>http://thatmattbone.com/2009/09/binary-tree-traversal-in-python-with-generators/#comments</comments>
		<pubDate>Wed, 09 Sep 2009 01:45:38 +0000</pubDate>
		<dc:creator>Matt Bone</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://thatmattbone.com/?p=154</guid>
		<description><![CDATA[One of the many things I like about Python is generators. They allow for the creation of iterators without the boilerplate imposed by Java or PHP. Furthermore, iterators can be thought of as streams, and many times we don&#8217;t really want to default to eager behavior (maybe eager behavior should be demanded ). For example, [...]]]></description>
			<content:encoded><![CDATA[<p>One of the many things I like about Python is <a href="http://www.python.org/dev/peps/pep-0255/">generators</a>.  They allow for the creation of iterators without the boilerplate imposed by <a href="http://java.sun.com/javase/6/docs/api/java/util/Iterator.html">Java</a> or <a href="http://www.php.net/~helly/php/ext/spl/interfaceIterator.html">PHP</a>.  Furthermore, iterators can be thought of as streams, and many times we don&#8217;t really want to default to eager behavior (maybe eager <a href="http://www.haskell.org/">behavior should be demanded</a> ).  For example, consider a class implementing a very simple binary tree:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">class</span> BinaryTree:
    <span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #0000cd;">__init__</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, data, left=<span style="color: #008000;">None</span>, right=<span style="color: #008000;">None</span><span style="color: black;">&#41;</span>:
        <span style="color: #008000;">self</span>.<span style="color: black;">data</span> = data
        <span style="color: #008000;">self</span>.<span style="color: black;">left</span> = left
        <span style="color: #008000;">self</span>.<span style="color: black;">right</span> = right
&nbsp;
    <span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #0000cd;">__unicode__</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
        <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #483d8b;">'%s'</span> <span style="color: #66cc66;">%</span> <span style="color: #008000;">self</span>.<span style="color: black;">data</span></pre></div></div>

<p>More of a struct, this simple class has slots (instance variables) for the left sub-node, the right sub-node, and the data.  When we first encounter a structure like this in, say, a data structure course, our inclination is to traverse the thing.  Remember, we can do this any number of ways: depth-first, breadth-first, pre-order, post-order (for the traversals in this article, I will only be concerned with node data, but all the algorithms can easily be modified to yield the nodes themselves).  If we want to write a very simple, eager, depth-first (and also pre-order) traversal of a tree like this, we can do something as follows:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">def</span> recursive_dfs<span style="color: black;">&#40;</span>tree<span style="color: black;">&#41;</span>:
    nodes = <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span>
    <span style="color: #ff7700;font-weight:bold;">if</span><span style="color: black;">&#40;</span>tree <span style="color: #66cc66;">!</span>= <span style="color: #008000;">None</span><span style="color: black;">&#41;</span>:
        nodes.<span style="color: black;">append</span><span style="color: black;">&#40;</span>tree.<span style="color: black;">data</span><span style="color: black;">&#41;</span>
        nodes.<span style="color: black;">extend</span><span style="color: black;">&#40;</span>recursive_dfs<span style="color: black;">&#40;</span>tree.<span style="color: black;">left</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
        nodes.<span style="color: black;">extend</span><span style="color: black;">&#40;</span>recursive_dfs<span style="color: black;">&#40;</span>tree.<span style="color: black;">right</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">return</span> nodes</pre></div></div>

<p>This is a great first step, but, as already mentioned, it is eager.  By calling this function, we get a complete list of all the nodes in the tree whether we need them or not.  With a few simple modifications, however, we can pull nodes out of a tree on demand in the same pre-order fashion by using Python generators.  We simply start the traversal, yield the node data, yield all nodes in the left subtree, and then yield all nodes in the right subtree:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">def</span> basic_dfs<span style="color: black;">&#40;</span>tree<span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">if</span><span style="color: black;">&#40;</span>tree<span style="color: #66cc66;">!</span>=<span style="color: #008000;">None</span><span style="color: black;">&#41;</span>:
        <span style="color: #ff7700;font-weight:bold;">yield</span> tree.<span style="color: black;">data</span>
        <span style="color: #ff7700;font-weight:bold;">for</span> node_data <span style="color: #ff7700;font-weight:bold;">in</span> basic_dfs<span style="color: black;">&#40;</span>tree.<span style="color: black;">left</span><span style="color: black;">&#41;</span>:
            <span style="color: #ff7700;font-weight:bold;">yield</span> node_data
        <span style="color: #ff7700;font-weight:bold;">for</span> node_data <span style="color: #ff7700;font-weight:bold;">in</span> basic_dfs<span style="color: black;">&#40;</span>tree.<span style="color: black;">right</span><span style="color: black;">&#41;</span>:
            <span style="color: #ff7700;font-weight:bold;">yield</span> node_data</pre></div></div>
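<p>To see the laziness in action, here is a small self-contained sketch. It repeats the tree class and the generator (written with <code>yield from</code>, which assumes Python 3.3 or later, instead of the explicit loops above) and adds a <code>visited</code> list purely to record which nodes were actually touched; the sample tree is made up.</p>

```python
from itertools import islice

class BinaryTree:
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right

visited = []  # bookkeeping for this demo only

def basic_dfs(tree):
    if tree is not None:
        visited.append(tree.data)
        yield tree.data
        yield from basic_dfs(tree.left)
        yield from basic_dfs(tree.right)

#       1
#      / \
#     2   3
#    /
#   4
tree = BinaryTree(1, BinaryTree(2, BinaryTree(4)), BinaryTree(3))

first_two = list(islice(basic_dfs(tree), 2))
print(first_two)  # [1, 2]
print(visited)    # [1, 2]
```

<p>Only the two requested nodes are visited; nodes 4 and 3 are never touched because nobody asked for them.</p>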

<p>If we wanted a (not-quite)-post-order traversal, we would yield the nodes in the right subtree first.  We could do this by simply rewriting the function above.  However, we can take this a step further, and leave the decision of what nodes to yield first to another function entirely (I will call this the &#8216;chooser&#8217; function):</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">def</span> left_then_right<span style="color: black;">&#40;</span>tree<span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">if</span><span style="color: black;">&#40;</span>tree<span style="color: #66cc66;">!</span>=<span style="color: #008000;">None</span><span style="color: black;">&#41;</span>:
        <span style="color: #ff7700;font-weight:bold;">yield</span> tree.<span style="color: black;">left</span>
        <span style="color: #ff7700;font-weight:bold;">yield</span> tree.<span style="color: black;">right</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> dfs<span style="color: black;">&#40;</span>tree, chooser=left_then_right<span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">if</span><span style="color: black;">&#40;</span>tree<span style="color: #66cc66;">!</span>=<span style="color: #008000;">None</span><span style="color: black;">&#41;</span>:
        <span style="color: #ff7700;font-weight:bold;">yield</span> tree.<span style="color: black;">data</span>
        <span style="color: #ff7700;font-weight:bold;">for</span> immediate_child <span style="color: #ff7700;font-weight:bold;">in</span> chooser<span style="color: black;">&#40;</span>tree<span style="color: black;">&#41;</span>:
            <span style="color: #ff7700;font-weight:bold;">for</span> node_data <span style="color: #ff7700;font-weight:bold;">in</span> dfs<span style="color: black;">&#40;</span>immediate_child, chooser<span style="color: black;">&#41;</span>:
                <span style="color: #ff7700;font-weight:bold;">yield</span> node_data</pre></div></div>

<p>Thus <code>dfs(sometree)</code> will call the <code>left_then_right()</code> function by default, and perform a pre-order traversal.  For our (not-quite)-post-order traversal, we define the <code>right_then_left()</code> function:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">def</span> right_then_left<span style="color: black;">&#40;</span>tree<span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">if</span><span style="color: black;">&#40;</span>tree<span style="color: #66cc66;">!</span>=<span style="color: #008000;">None</span><span style="color: black;">&#41;</span>:
        <span style="color: #ff7700;font-weight:bold;">yield</span> tree.<span style="color: black;">right</span>
        <span style="color: #ff7700;font-weight:bold;">yield</span> tree.<span style="color: black;">left</span></pre></div></div>
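
<p>To compare the two choosers side by side, here is a self-contained sketch.  The <code>Node</code> class and the sample tree are assumptions made for illustration; the generator functions are repeated from above only so the snippet runs on its own:</p>

```python
class Node:
    """Hypothetical binary tree node, assumed here for illustration."""

    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right


def left_then_right(tree):
    if tree is not None:
        yield tree.left
        yield tree.right


def right_then_left(tree):
    if tree is not None:
        yield tree.right
        yield tree.left


def dfs(tree, chooser=left_then_right):
    # Yield this node, then recurse into whichever children the chooser
    # hands back, in the order it hands them back.
    if tree is not None:
        yield tree.data
        for immediate_child in chooser(tree):
            for node_data in dfs(immediate_child, chooser):
                yield node_data


# Sample tree:
#        4
#      /   \
#     2     6
#    / \   / \
#   1   3 5   7
sample = Node(4, Node(2, Node(1), Node(3)), Node(6, Node(5), Node(7)))

pre_order = list(dfs(sample))                   # [4, 2, 1, 3, 6, 5, 7]
mirrored = list(dfs(sample, right_then_left))   # [4, 6, 7, 5, 2, 3, 1]
print(pre_order)
print(mirrored)
```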

<p>And passing this function with <code>dfs(sometree, right_then_left)</code>, we have our new traversal.  To really see the benefit of our lazy traversals, though, we can go one step further and implement a binary search on top of our <code>dfs()</code> function as a chooser function.  Instead of yielding the left subtree <em>and</em> the right subtree, the chooser function will yield only the left subtree if the value we&#8217;re searching for is less than or equal to the node data, and only the right subtree otherwise, so the traversal follows a single path down the tree:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">def</span> binary_search_chooser<span style="color: black;">&#40;</span>value<span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">def</span> binary_search_chooser_inner<span style="color: black;">&#40;</span>tree<span style="color: black;">&#41;</span>:
        <span style="color: #ff7700;font-weight:bold;">if</span><span style="color: black;">&#40;</span>tree<span style="color: #66cc66;">!</span>=<span style="color: #008000;">None</span> <span style="color: #ff7700;font-weight:bold;">and</span> tree.<span style="color: black;">data</span><span style="color: #66cc66;">!</span>=<span style="color: #008000;">None</span><span style="color: black;">&#41;</span>:
            <span style="color: #ff7700;font-weight:bold;">if</span><span style="color: black;">&#40;</span>value<span style="color: #66cc66;">&lt;</span>=tree.<span style="color: black;">data</span><span style="color: black;">&#41;</span>:
                <span style="color: #ff7700;font-weight:bold;">yield</span> tree.<span style="color: black;">left</span>
            <span style="color: #ff7700;font-weight:bold;">else</span>:
                <span style="color: #ff7700;font-weight:bold;">yield</span> tree.<span style="color: black;">right</span>
&nbsp;
    <span style="color: #ff7700;font-weight:bold;">return</span> binary_search_chooser_inner</pre></div></div>

<p>Don&#8217;t let the closure fool you; this is still simple stuff.  A call to <code>binary_search_chooser(5)</code> returns a chooser function that will decide whether to go left or right down the tree based on a node&#8217;s value.  So to search a BST for 5, we can just call <code>dfs(tree, binary_search_chooser(5))</code>.  This will give us a generator over the data of the nodes along the search path (the path toward a leaf), and 5 will be among them if that value is in the tree.</p>
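<p>Here is a self-contained sketch of the search in action.  The <code>Node</code> class, the sample BST, and the <code>contains()</code> helper are assumptions added for illustration; they are not part of the code above.  Since the chooser keeps descending to the left after it meets an equal value, the sketch tests membership along the path rather than relying on where the match falls, and <code>any()</code> stops pulling from the generator as soon as the value turns up:</p>

```python
class Node:
    """Hypothetical binary tree node, assumed here for illustration."""

    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right


def dfs(tree, chooser):
    if tree is not None:
        yield tree.data
        for immediate_child in chooser(tree):
            for node_data in dfs(immediate_child, chooser):
                yield node_data


def binary_search_chooser(value):
    def binary_search_chooser_inner(tree):
        # Yield exactly one child, so dfs() follows a single path.
        if tree is not None and tree.data is not None:
            if value <= tree.data:
                yield tree.left
            else:
                yield tree.right

    return binary_search_chooser_inner


def contains(tree, value):
    # Hypothetical helper: any() consumes the lazy path only until a match.
    return any(data == value for data in dfs(tree, binary_search_chooser(value)))


# Sample BST:
#        4
#      /   \
#     2     6
#    / \   / \
#   1   3 5   7
sample = Node(4, Node(2, Node(1), Node(3)), Node(6, Node(5), Node(7)))

path_to_5 = list(dfs(sample, binary_search_chooser(5)))  # [4, 6, 5]
print(path_to_5)
print(contains(sample, 5))  # True
print(contains(sample, 8))  # False
```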
<p>For sure there are more efficient ways to do these kinds of traversals with pointer manipulation, etc., but this serves as a fun exercise for fans of generators.  The astute reader will also note that we&#8217;ve implemented the strategy pattern in a functional-programming style through our use of first-class functions.</p>
]]></content:encoded>
			<wfw:commentRss>http://thatmattbone.com/2009/09/binary-tree-traversal-in-python-with-generators/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>the stack!</title>
		<link>http://thatmattbone.com/2009/04/the-stack/</link>
		<comments>http://thatmattbone.com/2009/04/the-stack/#comments</comments>
		<pubDate>Fri, 01 May 2009 01:11:04 +0000</pubDate>
		<dc:creator>Matt Bone</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[software]]></category>

		<guid isPermaLink="false">http://thatmattbone.com/?p=139</guid>
		<description><![CDATA[When I was teaching data structures, one of the things that commonly came up was the whole issue of &#8220;pass by value&#8221; versus &#8220;pass by reference.&#8221; This was a very difficult thing to teach;  somehow I&#8217;ve internalized the common C/Java style memory model, and these things are fairly second nature.  While lecturing, I was definitely [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignnone size-medium wp-image-146" title="img_0386" src="http://thatmattbone.com/wp-content/uploads/2009/04/img_0386-225x300.jpg" alt="img_0386" width="225" height="300" /></p>
<p>When I was teaching data structures, one of the things that commonly came up was the whole issue of &#8220;pass by value&#8221; versus &#8220;pass by reference.&#8221; This was a very difficult thing to teach;  somehow I&#8217;ve internalized the common C/Java style memory model, and these things are fairly second nature.  While lecturing, I was definitely guilty of resorting to mind-numbing aphorisms like &#8220;integers are passed by value while objects are passed by reference&#8221;.  These rules of thumb just don&#8217;t work (and make it incredibly difficult to get your head around the idea that object <em>references</em> are passed by value). This <a href="http://blogs.msdn.com/ericlippert/archive/2009/04/27/the-stack-is-an-implementation-detail.aspx">(fabulous) article over at MSDN</a> has a really nice description of the stack as an implementation detail.  Perhaps the key to solving this problem is really to just make the first course in computer science programs a course in programming language implementation.  Then you can move on to harder things like control flow and data structures (I&#8217;m kind of kidding here).</p>
<p>But in reality, the vagaries of &#8220;value types&#8221; versus heap-allocated objects are too pointy for a data structures course.  Their guts are easily exposed in something like C while being appropriately hidden in an everything-is-an-object language like Python.  The Java/C# mix of these memory allocation strategies is confusing for the beginner.  Expose the details or don&#8217;t.  Exceptions to the rules breed bugs and confusion.</p>
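<p>To make the &#8220;object references are passed by value&#8221; idea concrete, here is a small Python sketch.  The function and variable names are made up for illustration.  Rebinding a parameter inside a function changes only the local copy of the reference, while mutating the object it refers to is visible to the caller:</p>

```python
def rebind(items):
    # Assigning to the parameter points the local name at a new list;
    # the caller's reference is a separate copy and is not affected.
    items = [99]


def mutate(items):
    # Both the caller's name and the parameter refer to the same list,
    # so changing the object in place is visible to the caller.
    items.append(99)


original = [1, 2, 3]

rebind(original)
after_rebind = list(original)   # [1, 2, 3]

mutate(original)
after_mutate = list(original)   # [1, 2, 3, 99]

print(after_rebind)
print(after_mutate)
```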
]]></content:encoded>
			<wfw:commentRss>http://thatmattbone.com/2009/04/the-stack/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

