Wednesday, September 3, 2008

Dictionary Usability

A few weeks ago, I posted a link to a TED talk by a lexicographer. Now I'll follow up with a tangential response (which is to say, some of the ideas in this post are inspired by her talk).

The definition of a dictionary isn't as simple as people might think. We often think of dictionaries as those books that have an alphabetical list of words, each word (or entry) followed by an explanation. But what about "rhyming dictionaries" that don't define anything at all? Those are merely lists of words grouped by similar sounds. And then there are "translation dictionaries", which usually do little more than list the most likely equivalents in the target language. When you think about it, the term "dictionary" isn't well-defined at all.

But let's move on to the classical notion of a dictionary. We're progressing towards a digital era, in which (we hope) the power necessary to support computers and servers is less harmful to the environment than the deforestation necessary to produce physical-book dictionaries. The electronic "search" function is certainly handier than physically having to flip through pages of other words. And the current OED online has a menu on the left that lists the alphabetically neighbouring entries, just like a printed dictionary.

One area where the OED online fails, however, is guessing the word you intended to search for if you mistyped or misspelt it. Google does this pretty well, though. Searching the OED for {"fedutiary"} gets you only the message "the nearest alphabetical match-point is displayed in the side-frame", which in this case was pretty useless. (The side-frame gets set at "fay", which is pretty far from "fedutiary".) Google is equally useless for {"fedutiary"}, but will give a suggestion if {"feduciary"} is entered ("Did you mean fiduciary?"). Despite its relatively lower prestige, dictionary.com is even better, suggesting "fiduciary" for both {"feduciary"} and {"fedutiary"}.
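To make the "Did you mean...?" behaviour concrete, here is a minimal sketch using Python's standard-library difflib. It is only an illustration of fuzzy matching in general, not how Google, dictionary.com, or the OED actually work; the tiny word list and the function name did_you_mean are made up for the example.

```python
import difflib

# Hypothetical stand-in for a dictionary's headword list.
WORDS = ["fay", "federal", "fiduciary", "judiciary"]

def did_you_mean(query, words=WORDS):
    """Return the headword most similar to the query, or None if nothing is close."""
    # get_close_matches ranks candidates by a similarity ratio and
    # drops anything below its default cutoff of 0.6.
    matches = difflib.get_close_matches(query, words, n=1)
    return matches[0] if matches else None

if __name__ == "__main__":
    for attempt in ("feduciary", "fedutiary"):
        print(attempt, "->", did_you_mean(attempt))
```

Because the comparison looks at the whole word rather than just its alphabetical position, both misspellings land on "fiduciary" instead of on whatever happens to sort nearby.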

My knowledge of programming is pretty limited, but even in Turing, it'd be pretty easy to at least generate common alternatives to sound clusters (e.g. {"-ciary", "-tiary", "-siary"}, {"-e-", "-i-", "-ei-"}, etc.). And yet the OED hasn't bothered. I wonder what's holding them up.
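The sound-cluster idea could look something like the rough sketch below, written in Python rather than Turing. The two cluster groups are just the examples from this post, and the names CLUSTER_GROUPS, variants, and suggest are invented for illustration; a real dictionary would need a much longer list of sound-alike spellings.

```python
# Groups of spellings assumed to sound alike (taken from the examples above).
CLUSTER_GROUPS = [
    ["ciary", "tiary", "siary"],
    ["e", "i", "ei"],
]

def variants(word, groups=CLUSTER_GROUPS):
    """Return every respelling made by swapping clusters for their sound-alikes."""
    if not word:
        return {""}
    results = set()
    # Option 1: keep the first letter as typed and vary the rest.
    for tail in variants(word[1:], groups):
        results.add(word[0] + tail)
    # Option 2: if the word starts with a known cluster, try each sound-alike.
    for group in groups:
        for cluster in group:
            if word.startswith(cluster):
                for tail in variants(word[len(cluster):], groups):
                    for alternative in group:
                        results.add(alternative + tail)
    return results

def suggest(query, headwords):
    """Return the respellings of the query that are real headwords."""
    return sorted(variants(query) & set(headwords))

if __name__ == "__main__":
    headwords = ["fay", "fiduciary"]  # hypothetical mini word list
    print(suggest("fedutiary", headwords))
    print(suggest("feduciary", headwords))
```

The search only has to check the generated respellings against the headword list, so a misspelling like "fedutiary" can reach "fiduciary" even though the two are far apart alphabetically.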
