Do minority languages need machine translation?

This is an abbreviated transcript of a talk I gave at a British-Irish Council conference on language technology in indigenous, minority and lesser-used languages in Dublin earlier this month (November 2015) under the title ‘Do minority languages need the same language technology as majority languages?’ I wanted to bust the myth that machine translation is necessary for the revival of minority languages. What I had to say didn’t go down well with some in the audience, especially people who work in machine translation (unsurprisingly). So beware, there is controversy ahead!

Be patient! Bí othar!

This sign, which you could find in Dublin airport at one stage until public outcry forced the authorities to change it, is meant to be bilingual but the Irish version – which is probably machine-translated from English – is nothing more than word salad. Apart from the grammar being so jumbled up that it’s virtually incomprehensible, it translates the adjective ‘patient’ with a noun meaning ‘hospital patient’. This is not an exception: botched signs like this are common in Ireland and probably outnumber correct ones. While some are produced by incompetent humans the old-school way, many are machine-translated. Their proliferation is a consequence of widely available machine translation in deadly combination with irresponsible people who don’t understand its limitations.

For better or worse, publicly available machine translation has been with us for some time now, thanks mainly to Google Translate and Microsoft’s Bing Translator. While the output is sometimes so bad it makes people react with laughter and anger and everything in between, machine translation is undeniably useful if used responsibly and for the right purpose. So far so good. But the question I want to ask here is, do minority languages such as Irish, Welsh and Scottish Gaelic need machine translation technology as much as majority languages such as English and French do?

Machine translation: a double-edged sword

Machine translation is viewed by many to be the most prominent artefact of language technology. So the idea naturally springs to mind that, because majority languages have it, minority languages need it too.

Do they? That depends on what machine translation is for, what it can do and what it cannot. One thing people are not aware of enough is that current machine translation technology comes with a margin of error: it is designed with the caveat that it will produce substandard translations some of the time. This means that machine translation is really only suitable for comprehension (= helping me understand a text in a language I don’t speak) and not for production (= writing a text in a language I don’t speak).

In a sense, using machine translation for production, when not followed by post-editing, constitutes not a use but an abuse of the technology. In the English-to-Irish language pair, the abuses are so numerous and so prominently visible in public that they probably outnumber the valid uses. Monolingual English speakers everywhere, including Ireland’s government and civil service, routinely abuse machine translation to produce texts in a language they don’t speak (Irish), in apparent ignorance of the fact that the technology was never intended for production-quality output. Consequently, Google Translate et al. have become a running joke in the Irish-language community in Ireland. From what I hear the story is not too different in other minority languages where a machine translation tool is publicly available.

How is a minority language different from a majority language?

The languages we are dealing with here, Irish, Welsh, Breton and so on, are minority languages which coexist with a majority language like English and French in a situation I call subset bilingualism: everybody who speaks the minority language also speaks the majority language, but not everybody who speaks the majority language speaks the minority language. There are no monolingual speakers of the minority language. The minority-language community is bilingual and forms a subset of the majority-language community.

With this in mind, does anybody really need machine translation from the majority language to the minority language? Not really. It isn’t needed for comprehension and it isn’t meant to be used for production (except when followed by post-editing). When used for production without post-editing, it does more harm than good: it floods the world with inadequate translations which alienate those who speak the language well and mislead those who don’t. And yet, this is exactly what it seems to be (ab)used for most of the time.

Does anybody need the opposite, that is, machine translation from a minority language to a majority language? There the use case for comprehension is valid: there are English speakers in Ireland who don’t speak Irish, French speakers in Brittany who don’t speak Breton. But in reality the amount of content that gets translated from a minority language to a majority one is small, in fact practically zero, for the obvious reason that very little original content tends to be created in the minority language, and the majority-language community isn’t interested in whatever little exists. Most translation flows in the opposite direction, from the majority language to the minority language.

None of this means that machine translation has no legitimate uses for a minority language. But it does mean that those legitimate uses are far fewer than when both languages in the pair are mainstream global languages with no subset bilingualism, such as German and Spanish. There the role of machine translation is often to ‘overcome’ the perceived ‘barriers’ posed by linguistic diversity. In a minority-language setting, however, we often want the opposite: we want to recreate and reinforce linguistic diversity. Machine translation is counter-productive here: it brings lots of low-quality content into the language (= inadequate translations from the majority language) and it allows original content authored in the minority language to ‘escape’ out of it with ease (= via translation to the majority language), leading to even more domain loss for the minority language. (Domain loss is a sociolinguistic concept describing a situation where a language is used in fewer and fewer areas of public life until it is reduced to purely domestic use – think ‘kitchen Welsh’ – before it dies out completely.)

If we don’t need machine translation, what do we need instead?

I am of the opinion that languages are judged by the quality of what is said and written in them – not by the quantity of what can be cheaply translated into and out of them. A language begins to die when nobody has anything original to say in it any more. Therefore, what minority languages need most of all, as far as technology is concerned, is tools that support the creation of original content in them: things like spellcheckers, grammar checkers and dictionaries, including importantly monolingual dictionaries. Minority languages need technology for moving content from one modality to another while keeping it in the same language: speech synthesis, speech recognition, optical character recognition. Minority languages need technology for disseminating content, which in this day and age means online media, web TV, internet radio. Finally, minority languages need technology that normalizes the use of the language in public and in as many domains as possible (by ‘normalize’ I mean ‘create the impression that it is normal, not strange, not weird’), which in the IT industry implies localized software: Windows, Facebook, your favourite word processor.

These are the tools the language technology industry needs to provide in order to support language revival. What they have in common is that they work ‘inside’ the language. They enable speakers of the minority language to have a life in the language. They facilitate contact with other speakers, they make possible the emergence of a language community engaged in a constant conversation with itself. A language that has this is a language worth using, learning and not forgetting, not for its own sake but for the sake of what’s said and written in it.

Only when all this has been provided and taken care of is the right time to start pouring money and effort into machine translation.

23 thoughts on “Do minority languages need machine translation?”

  1. I feel something of a need to present the other side of the coin here. But first let me lead by pointing out that I do agree with most, if not all of what Michal has said above. However, I think to a less informed reader the above could be viewed as an attack on the notion of doing MT for a “small” language at all as it is implied that it can only do harm. This I would refute.

    The points made about the abuse of MT outputs are all very valid. However, the abuse of a tool is not the fault of the tool, rather that of the user. So I’d argue that it’s not MT which could be (or as Michal argues is) doing a disservice to languages such as Irish, rather it’s the users. Therefore it is those who would erroneously use a free online MT service for content production who need to be educated and indeed actively discouraged from working in this manner.

    This is really no different to how we educate our children when learning Irish or any other language at school. So rather than attacking the notion of MT for languages like Irish and Welsh we need to call out these people with the equivalent of the teachers’ edict of “You don’t just look it all up in the dictionary” but for using MT.

    Coming to some of the other points made. It is rightly pointed out that for a minority language lacking in language technology resources, proofing tools like spell/grammar checkers, dictionaries etc. should be prioritised before developing MT. I can’t argue against this. It’s common sense. Likewise for localised software and operating systems. However, since all of these things exist for languages such as Irish and Welsh, both from big companies like Microsoft and as a wealth of open source tools, speakers of these languages can use them as they please. Perhaps they’re not being used, or are underutilised, but that is an entirely different issue with more to do with sociology, language perception and language policy than the development of tools or technologies for processing a given language.

    Finally, what’s also crucial to point out here is that for languages such as Irish and Welsh there is a legal requirement to provide content translated between these languages and the “majority” language. That being the case, being able to equip competent speakers who are trained translators with modern translation tooling isn’t something that’s optional. It is in fact a requirement in order to cope with the demand for translation. As such, MT plus post-editing is really the only viable solution to tackle this and it is already in use for a great many other language pairs and directions and has become very much the norm in the translation industry. To not provide some MT option would in fact be more of a hindrance here because it would slow the production of quality content in these languages.

    So to sum up what I had intended to be a short response but has turned out otherwise 🙂 Michal is indeed correct that if a minority language has no technology to support it then MT should be way down the “to do list” of things to be developed. Proofing/authoring tools are far more important and help content to be produced. However, for languages like Irish and Welsh, where these are available and where there is a genuine need to help content production through translation, MT is not just a viable option, I feel it is the only option to help keep the small number of capable translators on top of the mammoth task that they have. To tackle the crux of the actual problem, i.e. the abuse of MT outputs (which are raw, unfinished translations), instead of blaming the MT itself and deprecating its potential to be useful, we should be attacking the problem of those who would abuse MT and educating them as to how to go about things properly.

    • I’m glad we are in agreement. Thank you for responding so gentlemanly to my angry rant! 🙂 I want to add two observations (which I think you will agree with too).

      Firstly: It seems that the only valid use-case for Irish-to-English MT is to satisfy the massive demand for translation in public administration. I would argue that this demand is artificial, created by legislation based on the mistaken assumption that it is good for the language if we flood it with translations of lots of government documents, most of which are only of interest to specialists rather than the general public and some percentage of which will unavoidably be linguistically substandard, even with post-editing. But of course, this is a matter for politicians and language planners, it’s nothing you or I can fix with tech.

      Secondly: As for localized software, I would argue that it is only partially true that these things already exist for Irish. In the commercial sector, it will always be a never-ending race as new software and new versions come along. In the public sector, our own government is failing to provide most of its websites and online services in Irish, while investing massively into translating other, less important things. This is a consequence of the current language legislation in Ireland: the Official Languages Act places certain obligations on the public sector (such as to translate legislation) and absolves them of others (such as to have bilingual websites). Again, not a technological problem, but a problem nonetheless.

      • I wouldn’t say you were ranting. But I’d be worried others could pick it up that way.

        You’re right, currently the use case with the most urgent demand for EN/GA MT is in the public sector. You say that this is artificial and I see what you’re getting at. But if the legislation disappeared overnight there would be uproar because there is still a demand to have content as Gaeilge, it just happens to be more straightforward to oblige public bodies than others. So if we’re out to normalise the use of the language, providing public information bilingually is a good place to start.

        I think we’re both arriving at more or less the same point here: that the tools aren’t so much the problem but rather the people and the policies that are driving their use. And you’re right, we’re not going to fix that ourselves. But I do think that if we do our part and produce the tech to provide the building blocks, then the folks with the policy know-how (and clout) can help steer them to the right usage. The “problem” is far, far bigger than the tools and resources themselves; it’s in how we educate a society.

  2. As I said to Michal in person at BIC, this is an opinion that he is entitled to, but it is just an opinion and not necessarily fact. Sadly, his argument is framed misleadingly and the main point is lost. As a computational linguist, with a background in machine translation (amongst others), here are my two cents’ worth:

    “Machine translation is viewed by many to be the most prominent artefact of language technology.”
    This is not true. I’ve worked in the field of machine translation (MT), know many world experts in this field and I have never met one that feels that MT is the “most prominent artefact” – either for well-resourced languages or lesser-resourced languages. The beauty of working as a research scientist in this field is to have the opportunity to be exposed to all facets of natural language processing, and see how the well-informed can acknowledge and respect that everyone has a contribution to make, and that all subfields are important. When working towards the preservation of a language through technology, specifically a low-resourced language where skillsets and funding are scarce, it’s inappropriate to assert that one effort or contribution is better than the other.

    Reading this blog, I feel disappointed that, while some points may be valid (yes, there is misuse of free online MT systems in the public domain), the pitch and the framing of the argument are misleading and essentially damaging when we consider that the audience may not be fully informed of the merits of state-of-the-art MT research.

    For many readers, the only exposure to machine translation is Google Translate. Google Translate is an open-domain translation system. That means that it aims to be able to translate “any text that it is given in any domain”. To understand why it fails to work so well for a minority language such as Irish, one must understand the science behind statistical MT. Systems like this are data-driven. Given tons of previously translated material known as “training data” (both language versions of a text), the machine can use statistics to calculate the probability of a word or a combination of words being translated in a certain way. The beauty of it is that it will be able to handle text it has never seen before – provided it has a large amount of training data. In the case of Irish↔English text, the current size of this training data is small. Hence, Google Translate falls over.
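    A minimal sketch of the statistical idea described above, assuming a toy list of already word-aligned English/Irish pairs. The data and the function name `translation_probabilities` are hypothetical, invented purely for illustration; real systems learn alignments from millions of sentence pairs and model whole phrases, not single words.

```python
from collections import Counter, defaultdict

# Hypothetical toy "training data": (English word, Irish word) pairs as they
# might be extracted from previously translated text. Not real corpus data.
aligned_pairs = [
    ("patient", "othar"),        # noun: a hospital patient
    ("patient", "othar"),
    ("patient", "foighneach"),   # adjective: able to wait calmly
    ("hospital", "ospidéal"),
]

def translation_probabilities(pairs):
    """Estimate P(target | source) by relative frequency of aligned pairs."""
    counts = defaultdict(Counter)
    for source, target in pairs:
        counts[source][target] += 1
    probs = {}
    for source, targets in counts.items():
        total = sum(targets.values())
        probs[source] = {t: c / total for t, c in targets.items()}
    return probs

probs = translation_probabilities(aligned_pairs)

# The system simply picks the most probable candidate it has seen.
best = max(probs["patient"], key=probs["patient"].get)
print(best, probs["patient"])
```

    With so few examples, the noun sense wins simply because it happens to occur more often in the toy data, which is the kind of error a small training set makes likely.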

    The argument, therefore, should be that “the general public are not aware that an open-domain free translation system such as Google Translate is only intended to be a gisting tool”. The merit for those using it correctly for this purpose is that it makes text in one language (e.g. Irish) accessible to those who don’t speak the language, providing them with an idea of the content. Google provide this service for almost 80 languages world-wide. Michal doesn’t necessarily have a problem with this language direction (Irish→English), but more so the English→Irish direction (this should be clear from the outset of this blog, and that’s why even the title is misleading).

    Yet, the suggestion that English→Irish translation systems are not needed at all is largely misleading and uninformed. There is a significant need for domain-specific English→Irish translation systems. One of the biggest complaints heard from naysayers of the Irish language is the amount of time and money “wasted” on translating documents into Irish (because Heaven forbid that a native speaker would want to read public material in their own language…?!). The need to meet these translation demands is also closely linked to our native language being recognised as an official language. This recognition is far-reaching and the significance of retaining this status is unquantifiable. Currently, domain-specific MT is helping the government meet these demands. And of course, these MT systems are used in a well-informed context where a translator oversees the automated translated output and edits where appropriate. This pilot system is proving cost effective and highly beneficial, and with further research and development, tailored MT systems will continue to make translation of documents into Irish a less contentious topic.

    So the argument that funding and focus should be prioritised towards other language technology tools is clearly flawed. MT (both statistical and rule-based) equally needs attention, to assist in producing domain-specific translated text in order to facilitate the needs of the native speakers of our language.

    So to answer a few questions:
    DO MINORITY LANGUAGES NEED MACHINE TRANSLATION?
    Yes, they do.

    DO USERS OF FREE ONLINE TRANSLATION SERVICES NEED TO BE BETTER INFORMED OF ITS INTENDED USE?
    Yes, they do.

    DO WE NEED CONTINUED INVESTMENT TO IMPROVE MACHINE TRANSLATION FOR IRISH?
    Yes, we do.

    This blog should be pitched at tackling the abuse of MT and not at attacking MT.

    • I’m sorry the tone of my article rubs you the wrong way, but it is a tone which is justified. For people who know Irish well, the proliferation of badly machine-translated signage, brochures and documents is irritating. This irritation needs to be voiced.

      I want everybody reading this article to understand that not everything is rosy, that Irish is not being rescued and that we do have a genuine problem here.

      The problem is that we are investing too much money and effort into translation (and therefore into machine translation), which is not a useful tool for language revival, while neglecting simple, lower-tech but more effective efforts such as fully bilingual websites with ‘active offer’ splashscreens and same-quality bilingual online services.

    • I think there’s no doubt that experts don’t think statistical MT is the pinnacle (or even the pillar) of language tech. The trouble is that it answers to something deep in the human imagination (resolving the Tower of Babel, perhaps) and does seem magical to the general public, much more so than the more mundane-seeming localizations. A book written in Irish is one thing, but a book that changes from Irish to English and back when you press on the cover is pure sorcery. And because of this awe, ordinary people expect far more from it than it can deliver.

      Of course, human translators have their limits too, as the famous out-of-the-office bilingual Welsh sign shows, and their mistakes can be downright deadly, as the confusion of “Look left” and “Look right” (mentioned in the same article) shows.

  3. Pingback: Do minority languages need machine translation?...

  4. “However, the abuse of a tool is not the fault of the tool, rather that of the user.”

    True. But this is as spurious an argument as the NRA claiming that “guns don’t kill people, people kill people”. Eminently true but ultimately blinkered. The maker, seller or controller of a tool cannot be absolved of the duty to consider what their actions will result in. If I manufacture a gun, I am part of a chain of events which will lead to death. Ditto if I sell a gun. Even more so if I recklessly sell a gun to someone who shouldn’t have a gun or make laws which result in people having guns who shouldn’t. Even if I don’t pull the trigger, I am complicit.

    The same applies to MT. We all know that people will misuse MT, the more so if human translation is a significant cost factor. So whoever takes part in the making of MT and wants to do so responsibly must ask themselves “how will this tool be misused”. And (by the way I love the way you brought it down to two words, comprehension vs production Michal!!) in minority languages, comprehension is rarely the issue – even if someone struggles with a text, the chances are they are learners and by giving them the easy way out, we actually hinder their learning process. There are other options for giving people a dictionary tool to help them with terminology. By far the biggest outcome in a language like Gaelic or Irish will be (bad) production.

    As we say, leac air a’ bheul, a flagstone on its (MT) mouth.

  5. Reblogged this on Dear Developer, and commented:
    I’m glad to see others in the field have similar apprehensions about MT in small languages

  6. “DO MINORITY LANGUAGES NEED MACHINE TRANSLATION?
    Yes, they do.”

    Why? Even as a “gisting tool” (nice word), it is mostly pointless in the context of minority languages. There is no Irish or Gaelic speaker on the planet who needs to rely on a gisting tool to read an English text. They’re all bilingual. So the optimum direction (Some Language > English) is 99% pointless to begin with for speakers of Irish. Sure, it would serve non-speakers/learners to get the gist of some Irish text but honestly, why should we care? English speakers have a glut of written materials at their disposal. If they really want to know what some Irish song or poem says (like there are that many untranslated ones), let them learn Irish or ask someone.

    Which leaves the other direction – English to Irish/Some Language. Which we know is rubbish.

    One day it *might* be possible for Irish to have enough data to feed a statistical MT system that way, but I doubt it. Languages smaller than Irish don’t stand a snowball’s. There simply isn’t the data to do Sami English or Manx English MT. There probably never will be.

    Scarce as resources are, we should always ask ourselves whom we are serving the most by spending money on a particular tool. If we serve the speakers/learners a lot then that can be justified. If it serves mainly non-speakers, then it cannot. It remains a “would be nice to have gimmick”.

    I’d like to hear ONE, just ONE, well-founded real life example of where mediocre MT (which would be an improvement on the current quality of Irish MT) would strongly serve the Irish speaking community.

    • “Sure, it would serve non-speakers/learners to get the gist of some Irish text but honestly, why should we care?”
      Speaking as a non-native speaker and a language learner I think that statement is really unhelpful to anyone and could quite easily be deemed as insulting to non-native learners of Irish. Which I’d add are in the majority when it comes to the Irish language. An elitist attitude like that does a real disservice to the language.

      “Which leaves the other direction – English to Irish/Some Language. Which we know is rubbish. … [snip] … There probably never will be.”

      Come visit our lab, I think it would be easier to show you the reality and talk about it rather than discuss the minutiae here.

      “Scarce as resources are, we should always ask ourselves whom we are serving the most by spending money on a particular tool. If we serve the speakers/learners a lot then that can be justified. If it serves mainly non-speakers, then it cannot. It remains a “would be nice to have gimmick”.

      I’d like to hear ONE, just ONE, well-founded real life example of where mediocre MT (which would be an improvement on the current quality of Irish MT) would strongly serve the Irish speaking community.”

      I’m happy to provide you with ONE example. As I’ve said above, come for a visit and I’ll show you. I get the feeling you’re judging MT having only ever seen the output of generic free online services. If we were all to judge any industry area on the merits of what is given away for free via a webpage I think we’d have a similarly skeptical view.

      To get back to Michal’s point (or what I believe to be the essence of it), language promotion is one thing, translation is another. Translation into Irish is needed (it’s the law after all) and supply can’t keep up with demand. In our lab we have developed a working SMT tool which has been deployed at minimal cost and is already resulting in savings that cover the initial very small set-up costs.

      This has been made possible mostly because the majority of those working on the project are, like myself, non-native language learners of Irish who have put in their time and expertise for free in order to help improve the situation for Irish. I can also introduce you to the translators who use it every day in their jobs and who have plenty of examples of how it has helped them in translating official documents into Irish to help serve those in the community who would like to conduct business as Gaeilge. Those people would otherwise be left waiting.

  7. “non-speakers/learners” was meant as “non-speakers/non-learners”, sorry if that was ambiguous. I am well aware of the valuable contributions learners are making in any language but especially small languages. I’m also, incidentally, aware of the damage they can do.

    But if you can’t give an elevator pitch about how “yours” is working on what must be comparatively little data, then I don’t think I’ll be hopping on a plane soon.

    • Sorry, I didn’t realise you’re overseas. I’d still love to show you our system and introduce you to the team if you’re ever around Dublin.

      The elevator pitch is quite straightforward. We’ve built an EN>GA SMT engine from what few resources we could lay our hands on, tailored it for a specific domain and use case and deployed it for use in a production translation workflow. Currently it’s in use daily, translating in excess of 160k words per month. The users (Irish speaking professional translators) are very happy with the results and productivity testing indicates that some users are working at almost twice the rate they would be without MT in their workflow. This was accomplished in less than 6 months through funding for a student assistant and with myself and others on the team pitching in pro bono. As a result the deployed system is generating savings that mean it will have paid for itself quite soon. Also the increased work rate of the translators using the engine means a greater volume of parallel texts which we can ingest into training the engine. So it’s a virtuous circle.

  8. Ah but you’re talking about something rather different. You’re talking about a very specific non-public (for the want of a better word) controlled and human-proofed system. Basically an extension of a translation memory (I know this is a gross simplification, I mean in terms of workflow/process). I have few issues with that and I can see the value in that.
    Perhaps this is a bit of a case of talking past each other because what Michal and I are so frustrated about is the kind of “free” services (shades of Google Translate) which are just put out there however bad, to be abused by people and with no QA. Which is what (some) people here are trying to get Gàidhlig onto and which is sending shivers down my spine.
    So perhaps we ARE on the same side even – and I would like to take up your offer some time when in Dublin, which is relatively frequently.

    • I do think we are presenting more or less the same point. Though perhaps looking at it from different places. I’m speaking out mostly because I feel that what could be perceived as “MT-bashing” could have serious detrimental effects to the public perception of the nuances of the situation if someone doesn’t voice another side to things.

      If anyone scratches the surface of any MT system they’ll find that such systems are never meant to be used as stand-alone tools, even those provided à la Google Translate or AltaVista’s Babelfish (for those that remember that!). They are at best, as you say, gisting tools. So while our system isn’t available publicly, I do plan at some stage (once the relevant clearance has been received) to set up an online interface so others can use it if they wish. But as with all such demos/services caveat utilitor very much applies. This, for me, will help identify and fix weaknesses. In essence, and this applies to the Google Translate products too, the versions made available for free are not the best of breed, they’re tasters.

      In general though I’d argue what we’ve developed here is quite a different thing to TM. A hammer and chisel are two very different tools, yet they’re both found in a carpenter’s toolkit and when used together they produce great results. I’d argue the same about TM and MT in a translator’s tool kit (and let’s not forget that it’s not so long ago that TM was a very controversial development in the world of translation too!). Also, to extend the metaphor, if someone then takes that hammer and, while attempting to use it, drops it and damages a car, or hits his/her own thumb, then I’d hardly hold the tool manufacturer or the hardware shop liable.

      By the same token, if I was out to promote craftsmanship in carpentry, I certainly wouldn’t believe that selling hammers was the first step. It would be education. So in the same way that in my youth we abused the foclóir póca and swore to our teachers that our prose had to be perfect because it matched with the dictionary, I feel that today MT can be and is being similarly abused. The solution now is the same as it was then: education. No one would attack the publishers of the foclóir póca for the tripe (let’s face it!) that its users most often produced. So in the same way I feel anything that would attack MT on the same grounds (when resources like digital dictionaries and proofing tools exist) is equally misplaced.

      Please do give us a shout when you’re next in Dublin. I think we’re both passionate about, more or less, the same things. We could probably do well with both our heads together.

  9. “But as with all such demos/services caveat utilitor very much applies” – that’s the crux though, isn’t it? As developers we know (or ought to…) that 1 in a 1000 ever reads the manual. Maybe. Hence my position that it’s irresponsible to let something loose knowing that the ways in which people WILL misuse it will be damaging. Thing is, you have some degree of control over who gets to be a carpenter – either people apprentice or train but there will always be that significant number who just buy a hammer. And – and this we really should all know – the moment you put something on the internet, you lose a lot of control.

    “I’d argue what we’ve developed here is quite a different thing to TM” – I didn’t mean it’s technically the same thing. But that like a TM it’s a tool supporting a translator.

    “I’m speaking out mostly because I feel that what could be perceived as “MT-bashing”…” – probably not. Professionals will have a more balanced view anyway and as for anyone else, the less they trust MT the better. The warier you are, the less likely you are just to use something willy-nilly 🙂 And I certainly don’t buy this “we have to present a unified front for the rest of the world”, that’s bad medicine, things start going wrong when you can’t have a robust debate in public.

    But I will call in when I’m in Dub next. Who knows what might come of it 🙂

  10. Pingback: Link love: language (64) | Sentence first

  11. Pingback: Weekly translation favorites (Dec 4-10)

  12. Pingback: Machine translation: Cause or solution of all evils? « Translator T.O.

  13. I agree that at present statistical MT systems do not serve minority languages well, though I think they have their uses, and as a learner, I find Google Translate a useful aid.

    Statistical MT systems are cheap to produce, because they can simply be fed lots of data and left to it.

    Rule-based MT works better, particularly if the available body of manual translation is limited, but requires a lot of human labour to build and is therefore expensive to create.

    I believe in the next 5-10 years we’ll see significant advances in AI-based MT, which will be able to train with significantly less human interaction and be able to learn a language from a much smaller pool of translated work, giving it many of the benefits of both approaches, and I believe these approaches will produce high quality translation at very low cost. And probably also provide real time spoken interpretation.

    Whether this is to be welcomed or dreaded, and whether it will aid the recovery of Irish or speed its demise, I cannot say.

    James
