Breathing new life into old data: how to retro-digitize a dictionary

A new digital home for granddad and grandma:

I have recently worked on a project where we retro-digitized two Irish dictionaries and published them on the web, so I thought it would be a good idea to summarize my experience here. Hopefully somebody somewhere will find it useful.

In the slang of people who care about such things, retro-digitization is the process of taking a work that had previously been published on paper (often a long time ago, way before computers made their way into publishing) and converting it into a digital, computer-readable format. A bit like retro-fitting a house or pimping up an old car. This involves not only scanning and OCRing the pages, but also structuring and indexing the content so it can be searched and interrogated in ways that would have been impossible on paper. This is the bit that matters most if what you are retro-digitizing is a dictionary.

The dictionaries we retro-digitized are Foclóir Gaeilge-Béarla [Irish-English Dictionary] from 1977 (editor Niall Ó Dónaill), and English-Irish Dictionary from 1959 (editor Tomás de Bhaldraithe). Both are sizeable volumes which, despite their age, enjoy the respect, even adoration, of Irish speakers everywhere. They are still widely used and widely available in bookshops. People have been saying for ages how nice it would be if we had electronic versions of these. And now we do, freely available to everybody on a website. Here's how we got there.

Is the Web killing the dictionary?

Paper, e-book, app.

I have been going to lexicography conferences for many years now, including the Euralex congresses and the eLex series. One popular opinion that always emerges in talks and conversations at such events is that the Web is – supposedly – killing the dictionary. Now that I'm about to attend yet another instalment of the eLex conference (taking place in Tallinn, Estonia this year – great, I hear the Baltic Sea is lovely in October!) I thought it would be a good idea to dissect this opinion a little. Let's dissect away, then.

Why Wikipedia works but Wiktionary doesn’t

It’s a truism to say that Wikipedia has been a resounding success. Not only does it have a large community of contributors but it also has an even larger community of readers: people who actually go to Wikipedia to get information. Wiktionary, on the other hand, has been more of an “unmitigated failure”, in the words of the lexicographer Patrick Hanks, which I overheard at the eLex conference in Belgium this October. Sure, Wiktionary does have an active contributor community, like Wikipedia does. But it has not achieved the status of the “go-to place” for lexical information that Wikipedia has for factual information. It seems to me that, by and large, people don’t actually go to Wiktionary to find out about the meanings, usage and translations of words. People tend to prefer proprietary dictionaries (some of which are also available online for free). The question is, why?

On good dictionaries and bad

I had the pleasure of attending the eLex (“electronic lexicography”) conference in Louvain-la-Neuve in Belgium earlier this month. As someone who works a lot with lexical databases, I was in my element at an event where everybody was talking about electronic dictionaries.

One of the issues discussed a lot at the conference was the question of how people actually use electronic dictionaries. There are, of course, many different kinds of electronic dictionaries, including CD-ROM ones, on-line ones, and dictionaries embedded in hand-held devices. And there are quite striking differences in how people use them in different parts of the world. I already knew that hand-held dictionaries are much more popular in Asia than in Europe, but a talk given by Hilary Nesi at the conference added a great deal of detail that was new to me.

