Trees vs. Hashes
My friend (and kinda godson—I sponsored him for confirmation) Buzz Anderson writes on the topic of data structures as culture, inspired by a friend’s interview at Microsoft which featured trees and his own interviewing experience at Apple which heavily features hash tables. What, one might ask, is a tree and why would a programmer care about it—and for that matter, what’s a hash table?
When dealing with data in a computer, one often needs to search for a particular datum. This could be done by scanning every last bit of memory to find it, but that would be extremely slow. So instead we somehow associate a key with the data, and look for the key instead. But it’s still too slow to just search through a list of keys to find the one we’re looking for: on average one must examine half the keys to see if they match (what we call O(x), since performance is directly proportional to the number of keys); this can get quite expensive. So we computer scientists build data structures which enable faster searching and use of data.
A tree is arranged like an upside-down tree, with the
root
at the top and the leaves at the bottom. The entire thing
is composed of nodes elements which contain a key, a datum
and two child nodes (the leaf nodes have no children). Moreover, there
is a constraint on the nodes: the no node in the left-hand subtree (the
tree which has the left-hand child node as its root) has a key is
greater than the root’s key, and every node in the
right-hand subtree has a greater key than the root’s. To search
the tree for a number (say, 42), one first visits the root node and
checks its key. If it’s 42, then you’re done: return the
datum. If its key is greater than 42, then check the left-hand
subtree in the same way; otherwise check the right-hand subtree. Keep
on doing this until you find the key or run out of nodes to examine. A
binary tree provides what we refer to as O(log x) performance: a
substantial improvement over linear searching.
A hash tree is best visualised as a black box (there are a lot of different ways to implement them): you give it a key, and it returns the data in constant time. That is, whether the table contains one key or for one million, the time to retrieve the data is the same. This means that a hash table’s performance is O(1)—it’s proportional only to 1. This is really cool: it means that we need never care about how large our data set grows, since performance will be as predictable as it ever was.
You might be asking why hash tables aren’t the only data structure ever used, then. The answer is that the constant time they take is relatively slow compared to trees for small data sets—and the logarithmic time that trees run in is relatively slow compared to linear search for very small data sets. Thus even in a modern program there are probably a few places that use a simple linear search, and in others places using trees, and in others using hash tables.
The beauty of a hash table, though, is that it is conceptually the cleanest and best of the structures. You know that performance on the developer’s workstation with a toy data set will be the exact same as in the field with $100,000,000 worth of financial transactions; you know that you're ready for the future. It’s telling that Apple cares about hash tables and Microsoft about trees. Tree are a speed hack when can bite one in production, but are great when developing; hash tables take up a bit more resources, but are much more durable.

