I wish more Irish data came with this sticker. Photo by “jwyg” on Flickr.
In this article, I am going to say something important about open data. But first, I need to explain what open data is. If you are familiar with the concept, skip to the next section.
Open data is a principle that dictates that data held internally in organizations should be made available to outsiders. This applies mainly to governments. Governments possess large amounts of data: data about the geographies of their countries, anonymized statistics about their populations, about the economy, data about transport infrastructure, about traffic, about weather. There is a growing understanding in developed countries everywhere that governments should make these data sets available, in machine-readable formats, for free reuse by anyone anywhere, without copyright or royalties. The idea is that society will benefit in two ways. Way number one: opening up government data will encourage transparency in government, because good governments have nothing to hide. Way number two: all that data will provide fodder for innovation and entrepreneurship. People will be able to build applications on top of the data, start businesses and create jobs or, if not, at least build useful apps that make people’s lives easier.
That is the theory and politicians everywhere are falling over themselves proclaiming how much they believe in it. But not everywhere are words being converted into actions. Sadly, the country where I live, the Republic of Ireland, is not a leader in this field. Very little government data is available for unrestricted reuse or in formats that lend themselves to easy reuse. I will demonstrate this with a concrete example from personal experience.
I have recently worked on a project where we retro-digitized two Irish dictionaries and published them on the web, so I thought it would be a good idea to summarize my experience here. Hopefully somebody somewhere will find it useful.
In the slang of people who care about such things, retro-digitization is the process of taking a work that had previously been published on paper (often a long time ago, way before computers made their way into publishing) and converting it into a digital, computer-readable format. A bit like retro-fitting a house or pimping up an old car. This involves not only scanning and OCRing the pages, but also structuring and indexing the content so it can be searched and interrogated in ways that would have been impossible on paper. This is the bit that matters most if what you are retro-digitizing is a dictionary.
The dictionaries we retro-digitized are Foclóir Gaeilge-Béarla [Irish-English Dictionary] from 1977 (editor Niall Ó Dónaill), and English-Irish Dictionary from 1959 (editor Tomás de Bhaldraithe). Both are sizeable volumes which, despite their age, enjoy the respect, even adoration, of Irish speakers everywhere, and are still widely used and widely available in bookshops. People have been saying for ages how nice it would be if we had electronic versions of these. And now we do, available freely to everybody on a website. Here’s how we got there.
I have been going to lexicography conferences for many years now, including the Euralex congresses and the eLex series. One popular opinion that always emerges in talks and conversations at such events is that the Web is – supposedly – killing the dictionary. Now that I’m about to attend yet another instalment of the eLex conference (taking place in Tallinn, Estonia this year – great, I hear the Baltic Sea is lovely in October!) I thought it would be a good idea to dissect this opinion a little. Let’s dissect away, then.
Oh, the things I do for fun at weekends! Last weekend, for example, I attended the Linguistics of the Gaelic Languages conference in University College Dublin (19–20 April 2013). This was a small but focused event, with 20 to 30 people attending to discuss the latest research on Irish, Scottish Gaelic and Manx. Here is my report.
I’ve decided to enliven this blog a little by using it as an outlet for trip reports to conferences and other work-related events I travel to. Which is funny because my first trip report will be from a conference I didn’t have to travel to at all (unless a fifteen-minute walk from my front door counts as travelling): the European Data Forum (EDF) held in Dublin’s Croke Park Conference Centre on 9 and 10 April 2013.
This was an occasion for information professionals to meet and discuss, well, data. You might think that that sounds too vague. After all, what can anybody have to say about data in general except that it is the stuff that computers eat? A lot, actually. In the last couple of years, something has changed about the way we understand data: what it is, how we produce it, how much of it we produce, and how we use it. I will summarize this under two broad headings: big data and open data.
In this article, I am going to give a nice and simple example of how learning a new language causes you to start perceiving the world differently. By doing so I will provide support for the Sapir-Whorf hypothesis (in its weak form), which claims that the language you speak determines, to some extent, how you think. I will demonstrate this on my favourite toy language, Irish.
The first programming language I ever learned was called BASIC. This is ancient history now but back when I was a kid, BASIC was the gateway drug for any aspiring computer geek. As a programming language, BASIC was quite, well, basic – it consisted of a small number of keywords like DATA, READ, LET and PRINT (yes, you were supposed to write them in uppercase), you had to number your lines (which you were advised to do in increments of 10 so you could insert additional lines later) and I’m not sure if it could even do loops. If it could, it probably had to be done with the GOTO command followed by a line number, which even back then had the elegance of a bucket of sludge.