Octopodial Chrome

Stuff that Made Sense at the Time

The Personal Weblog of Bob Uhl


Friday, 07 January 2005

Trees vs. Hashes

My friend (and kinda godson—I sponsored him for confirmation) Buzz Anderson writes on the topic of data structures as culture, inspired by a friend’s interview at Microsoft which featured trees and his own interviewing experience at Apple which heavily features hash tables. What, one might ask, is a tree and why would a programmer care about it—and for that matter, what’s a hash table?

When dealing with data in a computer, one often needs to search for a particular datum. This could be done by scanning every last bit of memory to find it, but that would be extremely slow. So instead we somehow associate a key with the data, and look for the key instead. But it’s still too slow to just search through a list of keys to find the one we’re looking for: on average one must examine half the keys to see if they match (what we call O(x), since performance is directly proportional to the number of keys); this can get quite expensive. So we computer scientists build data structures which enable faster searching and use of data.

A tree is arranged like an upside-down tree, with the root at the top and the leaves at the bottom. The entire thing is composed of nodes elements which contain a key, a datum and two child nodes (the leaf nodes have no children). Moreover, there is a constraint on the nodes: the no node in the left-hand subtree (the tree which has the left-hand child node as its root) has a key is greater than the root’s key, and every node in the right-hand subtree has a greater key than the root’s. To search the tree for a number (say, 42), one first visits the root node and checks its key. If it’s 42, then you’re done: return the datum. If its key is greater than 42, then check the left-hand subtree in the same way; otherwise check the right-hand subtree. Keep on doing this until you find the key or run out of nodes to examine. A binary tree provides what we refer to as O(log x) performance: a substantial improvement over linear searching.

A hash tree is best visualised as a black box (there are a lot of different ways to implement them): you give it a key, and it returns the data in constant time. That is, whether the table contains one key or for one million, the time to retrieve the data is the same. This means that a hash table’s performance is O(1)—it’s proportional only to 1. This is really cool: it means that we need never care about how large our data set grows, since performance will be as predictable as it ever was.

You might be asking why hash tables aren’t the only data structure ever used, then. The answer is that the constant time they take is relatively slow compared to trees for small data sets—and the logarithmic time that trees run in is relatively slow compared to linear search for very small data sets. Thus even in a modern program there are probably a few places that use a simple linear search, and in others places using trees, and in others using hash tables.

The beauty of a hash table, though, is that it is conceptually the cleanest and best of the structures. You know that performance on the developer’s workstation with a toy data set will be the exact same as in the field with $100,000,000 worth of financial transactions; you know that you're ready for the future. It’s telling that Apple cares about hash tables and Microsoft about trees. Tree are a speed hack when can bite one in production, but are great when developing; hash tables take up a bit more resources, but are much more durable.

The Innocence of Fr. Brown and What's Wrong with the World

Sleepy Owl Press has published two more books by G.K. Chesterton.

The Innocence of Father Brown
Twelve of G.K. Chesterton's superlative Fr. Brown mysteries. Fr. Brown is a Roman Catholic priest who solves the most varied type of mysteries in the traditional English manner of Miss Marple and Sherlock Holmes. A wonderful read!
What’s Wrong with the World
A classic work on philosophy, imperialism, feminism, education and what is to be done about all the above.

Both are $9.99. Now that I’ve published three Chesterton works, I’m next turning my attention to Doyle—and after that, Kipling! Not so certain which authors will intrigue me enough to add after that, though.

Swede Loses Husband, Son and Mother in Tsunami

A Swedish woman lost her husband, her two-year-old and her mother in the Asian tsunami just a day after she and her husband had a second wedding. One knows neither the time nor the hour indeed.

How the Left Betrayed Iraq

Naseer Flayih Hasan has written How the Left Betrayed my Country—Iraq. A biting commentary on the dishonesty, cynicism and ignorance of the Left.

Alarms and Discursions

I have just published G.K. Chesterton’s Alarms \& Discursions in a new attractively typeset edition. The book is a collection of miscellaneous essays on politics, religion and philosophy first published in 1911. The cover price is $9.99—contact me for bulk orders.


January
Sun Mon Tue Wed Thu Fri Sat
           
7
         
2005
Months
Jan

Powered by Blosxom | Subscribe with Bloglines | Listed on
BlogShares | Blogarama - The Blog Directory | Technorati Profile

This is my blogchalk:
United States, Colorado, Englewood, Centennial, English, , Robert, Male, 21–25, Free Software, Society for Creative Anachronism.