Do minority languages need machine translation?

This is an abbreviated transcript of a talk I gave earlier this month (November 2015) in Dublin, at a British-Irish Council conference on language technology in indigenous, minority and lesser-used languages, under the title ‘Do minority languages need the same language technology as majority languages?’ I wanted to bust the myth that machine translation is necessary for the revival of minority languages. What I had to say didn’t go down well with some in the audience, especially (unsurprisingly) people who work in machine translation. So beware, there is controversy ahead!

Be patient! Bí othar!



10 reasons why Irish is an absolutely awesome language

Health warning! Learning Irish will open your mind, win you interesting friends and make you attractive to the opposite sex.

I have devoted a large chunk of my career to learning Irish, working with Irish and making a living out of Irish. So I thought it would be fair to put together a list of reasons why I think the language is worth it. Mine are proper linguistic reasons though – none of that starry-eyed sentimental nonsense about the language being ‘beautiful’ or ‘romantic’! So, put your language geek hats on, here we go!

(Many of the features mentioned here are actually common to all Celtic languages, including Scottish Gaelic and Welsh, but let’s not be splitting hairs now.)

Why Ireland needs a Minister for Data

I wish more Irish data came with this sticker. Photo by “jwyg” on Flickr.

In this article, I am going to say something important about open data. But first, I need to explain what open data is. If you are familiar with the concept, skip to the next section.

Open data is the principle that data held internally by organizations should be made available to outsiders. It applies mainly to governments, which possess large amounts of data: data about the geography of their countries, anonymized statistics about their populations and the economy, and data about transport infrastructure, traffic and weather. There is a growing understanding in developed countries everywhere that governments should make these data sets available in machine-readable formats, for free reuse by anyone anywhere, without copyright restrictions or royalties. The idea is that society will benefit in two ways. Way number one: opening up government data will encourage transparency in government, because good governments have nothing to hide. Way number two: all that data will provide fodder for innovation and entrepreneurship. People will be able to build applications on top of the data, start businesses and create jobs or, failing that, at least build useful apps that make people’s lives easier.

That is the theory, and politicians everywhere are falling over themselves to proclaim how much they believe in it. But not everywhere are words being converted into actions. Sadly, the country where I live, the Republic of Ireland, is not a leader in this field. Very little government data is available for unrestricted reuse or in formats that lend themselves to easy reuse. I will demonstrate this with a concrete example from personal experience.

Breathing new life into old data: how to retro-digitize a dictionary

A new digital home for granddad and grandma:

I have recently worked on a project where we retro-digitized two Irish dictionaries and published them on the web, so I thought it would be a good idea to summarize my experience here. Hopefully somebody somewhere will find it useful.

In the slang of people who care about such things, retro-digitization is the process of taking a work that had previously been published on paper (often a long time ago, way before computers made their way into publishing) and converting it into a digital, computer-readable format. A bit like retro-fitting a house or pimping up an old car. This involves not only scanning and OCRing the pages, but also structuring and indexing the content so it can be searched and interrogated in ways that would have been impossible on paper. This is the bit that matters most if what you are retro-digitizing is a dictionary.
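To make the structuring-and-indexing step a little more concrete, here is a minimal Python sketch. It is not the method used in the project: it assumes, purely for illustration, that OCR has produced one entry per line in a simplified layout such as `bád, m. Boat.` (headword, part of speech, then senses separated by semicolons). Real entries in these dictionaries are far richer, and the entry layout, the regular expression and the names `parse_entry`, `fold` and `build_index` are all hypothetical. The index also folds away accents such as the fada, so that a search for *bothar* still finds *bóthar*.

```python
import re
import unicodedata
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical layout of one OCR'd entry: "headword, pos. sense; sense; ..."
ENTRY_RE = re.compile(r"^(?P<headword>[^,]+),\s*(?P<pos>[a-z]+)\.\s*(?P<senses>.+)$")


@dataclass
class Entry:
    headword: str
    pos: str
    senses: List[str] = field(default_factory=list)


def parse_entry(line: str) -> Optional[Entry]:
    """Turn one line of OCR output into a structured entry, or None if it doesn't match."""
    m = ENTRY_RE.match(line.strip())
    if m is None:
        return None  # leave unparseable lines for a human to correct
    senses = [s.strip().rstrip(".") for s in m.group("senses").split(";") if s.strip()]
    return Entry(m.group("headword").strip(), m.group("pos"), senses)


def fold(text: str) -> str:
    """Lowercase and strip accents (e.g. the Irish fada) so 'bad' also finds 'bád'."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def build_index(lines: List[str]) -> Dict[str, List[Entry]]:
    """Map each accent-folded headword to the entries that share it."""
    index: Dict[str, List[Entry]] = {}
    for line in lines:
        entry = parse_entry(line)
        if entry is not None:
            index.setdefault(fold(entry.headword), []).append(entry)
    return index
```

For example, `build_index(["bád, m. Boat.", "bóthar, m. Road; way."])["bothar"][0].senses` gives `['Road', 'way']`. Lines that do not match the pattern are skipped rather than guessed at, on the assumption that an editor will fix them by hand; in a real retro-digitization project that manual correction pass is usually where much of the effort goes.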

The dictionaries we retro-digitized are Foclóir Gaeilge-Béarla [Irish-English Dictionary] from 1977 (editor Niall Ó Dónaill), and English-Irish Dictionary from 1959 (editor Tomás de Bhaldraithe). Both are sizeable volumes which, despite their age, enjoy the respect, even adoration, of Irish speakers everywhere, and are still widely used and widely available in bookshops. People have been saying for ages how nice it would be if we had electronic versions of these. And now we do, available freely to everybody on a website. Here’s how we got there.

Is the Web killing the dictionary?

Paper, e-book, app.

I have been going to lexicography conferences for many years now, including the Euralex congresses and the eLex series. One popular opinion that always emerges in talks and conversations at such events is that the Web is – supposedly – killing the dictionary. Now that I’m about to attend yet another instalment of the eLex conference (taking place in Tallinn, Estonia, this year – great, I hear the Baltic Sea is lovely in October!), I thought it would be a good idea to dissect this opinion a little. Let’s dissect away, then.

Linguistics of the Gaelic Languages 2013: a conference report

Oh, the things I do for fun at weekends! Last weekend, for example, I attended the Linguistics of the Gaelic Languages conference at University College Dublin (19–20 April 2013). This was a small but focused event, with 20 to 30 people attending to discuss the latest research on Irish, Scottish Gaelic and Manx. Here is my report.

European Data Forum 2013: a trip report

Impression from EDF2013 by LOD2project on Flickr

I’ve decided to enliven this blog a little by using it as an outlet for trip reports from conferences and other work-related events I travel to. Which is funny, because my first trip report will be from a conference I didn’t have to travel to at all (unless a fifteen-minute walk from my front door counts as travelling): the European Data Forum (EDF), held in Dublin’s Croke Park Conference Centre on 9 and 10 April 2013.

This was an occasion for information professionals to meet and discuss, well, data. You might think that that sounds too vague. Surely, what can anybody have to say about data in general except that it is the stuff that computers eat? A lot, actually. In the last couple of years, something has changed about the way we understand data: what it is, how we produce it, how much of it we produce, and how we use it. I will summarize this under two broad headings: big data and open data.