Lately I’ve been thinking about processing streams of data on the fly. Weirdly enough, I have a fascination with palindromes, so I wrote a very bad hack with Twisted to hook up to the streaming Twitter feed and search for them. Over the course of about a day, I found 259 distinct palindromes, where a palindrome is (very narrowly) defined as a word that is:
- Words without punctuation, numbers or accents (i.e. only a-z in regex).
- Longer than four characters.
- Consisting of three or more distinct characters.
- Not in a set of stop words I decided I was not interested in.
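
The rules above can be sketched in a few lines of Python. This is a minimal illustration, not the original Twisted hack: the function names, the whitespace tokenization, the case-folding, and the empty default stop-word set are all assumptions, since the real stop-word list isn't given here.

```python
import re

# Rule 1: only the letters a-z (no punctuation, digits or accents).
ONLY_AZ = re.compile(r"[a-z]+")


def is_tracked_palindrome(word, stop_words=frozenset()):
    """Return True if `word` passes all four rules and reads the same reversed.

    `stop_words` is a hypothetical placeholder for the set of words to ignore.
    Lowercasing first is an assumption, so that "Level" counts as "level".
    """
    w = word.lower()
    return (
        ONLY_AZ.fullmatch(w) is not None  # only a-z
        and len(w) > 4                    # longer than four characters
        and len(set(w)) >= 3              # three or more distinct characters
        and w not in stop_words           # not a stop word
        and w == w[::-1]                  # actually a palindrome
    )


def palindromes_in(text, stop_words=frozenset()):
    """Split text on whitespace and keep the tokens that qualify."""
    return [t.lower() for t in text.split() if is_tracked_palindrome(t, stop_words)]
```

Because tokens are split on whitespace only, something like `racecar,` (with the trailing comma) is rejected by the a-z rule rather than cleaned up, which is one way to read "words without punctuation".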
The astute reader will notice that these rules actually mean all valid palindromes will be 5 or more characters. Now for the results:
“Level” is the clear winner with 662 occurrences. The venerable palindrome “racecar” sped into the #30 spot with a mere 8 occurrences.
Unfortunately I did not keep track of how many tweets I scanned, but it appears that only a very small percentage of tweets contain this type of palindrome. I would like to convert this into a full-on palindrome-tracking Twitter bot that will annoyingly tweet at you if you tweet something palindrome-ish, but I am slightly afraid someone would take that code and make streaming-twitter-pesterbot-v1™. Hunting for palindromic sentences would be fun, too (a palindromic 140-character tweet would be pure genius). Most importantly, I’d like to search for sarahpalindromes, which I’ve defined as words that become palindromes with one transposition (sarahpalindrome detection algorithms are much appreciated…you could probably get a paper out of something like that).
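
As a starting point, here is a brute-force sketch of sarahpalindrome detection in Python. It rests on two assumptions that the definition above leaves open: a "transposition" is taken to mean swapping any two letters (not just adjacent ones), and words that are already palindromes are excluded. The function name is made up for illustration, and there are surely cleverer approaches than trying every swap.

```python
def is_sarahpalindrome(word):
    """Return True if swapping exactly one pair of letters yields a palindrome.

    Assumptions: any two positions may be swapped, and a word that is
    already a palindrome does not count.
    """
    chars = list(word.lower())
    if chars == chars[::-1]:
        return False  # already a palindrome, so not a *sarah*palindrome

    n = len(chars)
    for i in range(n - 1):
        for j in range(i + 1, n):
            if chars[i] == chars[j]:
                continue  # swapping identical letters changes nothing
            chars[i], chars[j] = chars[j], chars[i]
            found = chars == chars[::-1]
            chars[i], chars[j] = chars[j], chars[i]  # undo the swap
            if found:
                return True
    return False
```

This tries every pair, so it costs roughly cubic time in the word length, which is fine for tweet-sized words. A cheap pre-filter would be to skip any word with more than one letter that occurs an odd number of times, since a swap never changes letter counts and such a word can never be rearranged into a palindrome.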