Posts tagged “oop”.

Binding Classes to Tags in Python’s ElementTree

Most of my work on BetterXML was in the area of data binding. In both of our frameworks, we created mappings between classes and markup elements, and then bound each occurrence of a tag in a document to an instance of the mapped class. Our work there was implemented from scratch (i.e., on top of a SAX parser), but it turns out that some of our ideas are quite simple to implement in Python’s ElementTree. Let’s say we are parsing the XML version of someone’s Twitter stream. It looks a little something like:

<statuses type="array">
  <status>
    <created_at>Fri Dec 18 18:31:14 +0000 2009</created_at>
    <id>6804282413</id>
    <text>my frozen soup will never thaw </text>
    ...
  </status>
  ...
</statuses>

We might like to bind each occurrence of the <text> tag to an instance of a Tweet class:

# assumed import: the stdlib copy of ElementTree from that era
from xml.etree.ElementTree import _ElementInterface

class Tweet(_ElementInterface):
    def word_count(self):
        return len(self.text.split())

    def at_someone(self):
        # this is in the XML, but here we do it from scratch for fun
        for word in self.text.split():
            if word.startswith('@'):
                return True
        return False

The actual binding occurs by creating our own factory method, and supplying this to the ElementTree TreeBuilder and parser:

def bound_element_factory(tag, attrs):
    if tag == 'text':
        return Tweet(tag, attrs)
    elif tag == 'statuses':
        return Twitter(tag, attrs)
    else:
        return _ElementInterface(tag, attrs)

parser = XMLTreeBuilder(target=TreeBuilder(element_factory=bound_element_factory))
tree = ElementTree.parse('mytweets.xml', parser=parser)

So now, as the document is parsed, the bound_element_factory() function is called each time a tag is encountered. Instead of always instantiating an _ElementInterface for each tag (which is what the default factory does), it instantiates specialized child classes when a “text” or “statuses” tag is encountered. These classes can obviously define or override methods willy-nilly.
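The names above come from an older ElementTree. On a recent Python 3 the equivalent pieces are, as far as I know, `Element`, `TreeBuilder`, and `XMLParser`, so the same trick can be sketched roughly like this. It is not the code from the repository: the `Twitter.tweets()` helper, the `parse_tweets()` function, and the inline sample document are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<statuses type="array">
  <status>
    <created_at>Fri Dec 18 18:31:14 +0000 2009</created_at>
    <id>6804282413</id>
    <text>my frozen soup will never thaw </text>
  </status>
</statuses>"""


class Tweet(ET.Element):
    """Bound to each <text> element."""

    def word_count(self):
        return len((self.text or "").split())

    def at_someone(self):
        return any(word.startswith("@") for word in (self.text or "").split())


class Twitter(ET.Element):
    """Bound to the root <statuses> element (hypothetical helper class)."""

    def tweets(self):
        return self.findall("status/text")


def bound_element_factory(tag, attrs):
    # Called by TreeBuilder for every start tag; pick the class per tag name.
    if tag == "text":
        return Tweet(tag, attrs)
    if tag == "statuses":
        return Twitter(tag, attrs)
    return ET.Element(tag, attrs)


def parse_tweets(xml_text):
    builder = ET.TreeBuilder(element_factory=bound_element_factory)
    parser = ET.XMLParser(target=builder)
    return ET.fromstring(xml_text, parser=parser)


timeline = parse_tweets(SAMPLE)
first = timeline.tweets()[0]
print(type(timeline).__name__, type(first).__name__, first.word_count())
```

The elements that come back from `tweets()` are the very objects the factory created, so the extra methods are available wherever the tree is walked.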

While I think this is a pretty neat XML party trick, it is fair to question the utility of this approach in particular and data binding in general. After all, ElementTree is pretty useful on its own. But data binding is appealing because it allows us to create the object and class hierarchy first, and worry about going to and from the markup representation later.

This version of data binding is a bit backwards, though. Because our objects extend the _ElementInterface object, we still must call ElementTree’s insert() method to build up a tree of objects. Thus, when we write our classes and build up trees of objects, we’re aware that they’re going to or coming from XML.
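To make the “backwards” part concrete, here is a rough sketch of assembling a timeline by hand. It assumes Python 3’s `Element` as the base class rather than `_ElementInterface`, and uses `append()` where the text mentions `insert()`; the cut-down `Tweet` class and the message text are made up for the example.

```python
import xml.etree.ElementTree as ET


class Tweet(ET.Element):
    def at_someone(self):
        return any(w.startswith("@") for w in (self.text or "").split())


# Even though Tweet is "our" class, we construct it with a tag name and
# assemble the hierarchy with ElementTree's own tree-editing calls.
tweet = Tweet("text")
tweet.text = "lunch with @bob"

status = ET.Element("status")
status.append(tweet)

statuses = ET.Element("statuses", {"type": "array"})
statuses.append(status)

markup = ET.tostring(statuses, encoding="unicode")
print(markup)
```

Nothing here lets us forget the markup: the tag names and the nesting are spelled out at every step.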

However, all is not lost. This approach can be useful when going from existing markup (particularly if it is changing or poorly defined) to an object hierarchy. Again, this is what ElementTree excels at, but here we have the opportunity to create some utility methods in the objects and hide the markup from the users of these objects. So in this case, if Twitter changes the markup from <text> to <tweet>, we need only change the factory and the call to find(). Users of the Tweet class will see no changes.
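One way to keep that change small is to hold the tag name in a single place that both the factory and the lookup read from. A sketch, assuming Python 3’s `xml.etree.ElementTree`; `TWEET_TAG`, `Timeline`, and `load()` are names invented here, not taken from the example repository, and the lookup uses `iter()` rather than `find()`.

```python
import xml.etree.ElementTree as ET

TWEET_TAG = "text"  # change to "tweet" if the markup is renamed


class Tweet(ET.Element):
    def word_count(self):
        return len((self.text or "").split())


class Timeline:
    """What users see: plain methods, no tag names or lookup calls."""

    def __init__(self, root):
        self._root = root

    def tweets(self):
        return list(self._root.iter(TWEET_TAG))


def element_factory(tag, attrs):
    # Only the tweet tag gets a specialized class; everything else is plain.
    cls = Tweet if tag == TWEET_TAG else ET.Element
    return cls(tag, attrs)


def load(xml_text):
    parser = ET.XMLParser(target=ET.TreeBuilder(element_factory=element_factory))
    return Timeline(ET.fromstring(xml_text, parser=parser))


timeline = load("<statuses><status><text>hi there</text></status></statuses>")
print([t.word_count() for t in timeline.tweets()])
```

Code that calls `timeline.tweets()` and `word_count()` never mentions a tag name, so a rename in the feed touches one constant.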

A complete working example can be found in my Bitbucket (pardon the non-working thing in that repo, too). Also, a major hat tip to the flexibility of the ElementTree implementation. I highly recommend reading that very pretty code, too.

private methods are stupid

Today at work in a unit-testing-brown-bag-lunch-session, I spouted off about private methods, claiming that they are stupid.  I stand by my statement; private methods are definitely stupid.  However, my true feelings are a bit more nuanced.

If we assume we are associating a specific chunk of functionality with every method we write, then there is an overarching hubris associated with declaring a method to be private. When we declare such a method to be private, we predict the future, saying “only I, the privileged creator of this spectacular class, will ever need this functionality.”

But, wait, aren’t there legitimate reasons to declare a private method? What if a particular method leaves an object in an inconsistent state, debilitating all subsequent calls to “public” methods? Should this dangerous method be marked private? Perhaps. Leaving aside arguments about the necessity of creating such a method, there are other ways around the problem. Convention, rather than language-level constructs, solves the issue nicely. Python, for example, uses a leading underscore for “private” methods, and Smalltalk programmers sometimes place these methods in the “private” protocol. And while I am not interested in metaphysical debates about what makes a system “object oriented” (I mean really, I think CLOS is perhaps the best OO system out there), with the examples of Smalltalk and Python, it is clear that private methods are not necessary for OO systems.
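For anyone who hasn’t seen the Python convention in action, here is a small sketch. The `Account` class and its methods are hypothetical, made up to show a “dangerous” helper that is marked only by its name.

```python
class Account:
    def __init__(self):
        self.balance = 0
        self.log = []

    def deposit(self, amount):
        self._credit(amount)
        self._record("deposit", amount)

    # Leading underscore: "internal, call at your own risk".
    # Calling _credit alone skips the log entry, leaving the object inconsistent.
    def _credit(self, amount):
        self.balance += amount

    def _record(self, kind, amount):
        self.log.append((kind, amount))


account = Account()
account.deposit(10)
account._credit(5)  # allowed: the underscore is only a signal
print(account.balance, account.log)
```

The language does nothing to stop the last call. The underscore tells the caller what they are getting into, and the choice stays with them.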

The argument that private methods are necessary to “protect” code is even sillier.  If we’re afraid of how other programmers might use our classes, then we shouldn’t bother programming.  Not only do these fears foster a sense of mistrust among programmers, but also, they are not easily eliminated.  In the most straightforward cases, users of our classes with access to the code may go through and mark useful methods public (note that loosening protection introduces no bugs in non-reflective code), or worse, copy and paste this functionality into another method entirely.   In more severe cases, the “nefarious” programmer will resort to introspection or disassemblers.
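Even Python’s closest thing to enforcement, double-underscore name mangling, gives way to a moment of introspection. A sketch with a hypothetical `Vault` class:

```python
class Vault:
    def __combination(self):  # stored on the class as _Vault__combination
        return "12-34-56"

    def open(self, guess):
        return guess == self.__combination()


vault = Vault()

# From outside the class body the plain name is not found...
try:
    vault.__combination()
except AttributeError:
    blocked = True
else:
    blocked = False

# ...but dir() reveals the mangled name,
hidden = [name for name in dir(vault) if name.endswith("__combination")]
# and then the "private" method is callable like any other.
secret = getattr(vault, hidden[0])()
print(blocked, hidden, secret, vault.open(secret))
```

The “protection” costs the curious programmer two lines, which is roughly the point.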

We programmers are tinkerers, and we should be willing to let ourselves tinker.  Sleep easy, and declare your methods to be public!