Why Wikipedia works but Wiktionary doesn’t

It’s a truism to say that Wikipedia has been a resounding success. Not only does it have a large community of contributors, but it also has an even larger community of readers: people who actually go to Wikipedia to get information. Wiktionary, on the other hand, has been more of an “unmitigated failure”, in the words of the lexicographer Patrick Hanks, whom I overheard at the eLex conference in Belgium this October. Sure, Wiktionary does have an active contributor community, as Wikipedia does. But it has not achieved the status of the “go-to place” for lexical information that Wikipedia has for factual information. It seems to me that, by and large, people don’t actually go to Wiktionary to find out about the meanings, usage and translations of words. People tend to prefer proprietary dictionaries (some of which are also available online for free). The question is, why?

At the same eLex conference, the keynote speaker Michael Rundell suggested that the reason probably lies in the nature of the business itself. Lexicography is a different kind of activity from encyclopedia writing. For any encyclopedic subject, be it Bulgarian history or the biology of yeast or the ergonomics of door handles, there is almost certainly an expert somewhere in the world willing to write an article about it for Wikipedia. But when it comes to lexicographic knowledge, the situation is the complete opposite. There are no experts on the words “seal”, “bank” or “close”. Dictionaries are not written by people who already have knowledge about individual words; they are written by people who do research to get that knowledge from sources such as corpora and citations. You can’t write a dictionary article like you write an encyclopedic article. You can’t sit down and say “okay, I’m going to write down everything I know about the word ‘seal’” – unless, of course, you’re a Wiktionary contributor. Apparently, this is exactly what they do.

I tend to agree with Michael Rundell on that. But I also think that there is another reason. The formal structure of dictionary articles is more complex than that of encyclopedic articles. An encyclopedic article is basically free-form text, with some formatting such as paragraph breaks, headings, bulleted lists and the occasional table or illustration. Any literate person can produce content in that format. Dictionary articles, on the other hand, have an intricate structure: the meaning of a word is subdivided into senses and subsenses, various bits of information are adorned with other bits of information such as part-of-speech tags and usage labels that the lexicographer has picked from a pre-agreed list, there are authentic example sentences that have been chosen for a good reason, and so on. Lexicographers usually work with specialized software which typically uses an XML schema to enforce structure and to guarantee consistency. In other words, a dictionary article is a more complex text type than an encyclopedic article. Generally, people who are not trained as lexicographers do not possess the skills to write one.

Most Wiktionary contributors seem not to be trained lexicographers, and it shows. Wiktionary articles could be described as semi-structured at best. There isn’t much explicit structure in the articles beyond surface formatting such as italic type and bullets. There is very little consistency in labelling, as the editorial system doesn’t enforce it. Also, the way in which the labels are applied and the meanings are demarcated sometimes reveals a lack of understanding of what makes a good usage example, what the difference is between a part-of-speech tag and a usage label, between a translation and a gloss, and so on.
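To make the point about enforced structure a little more concrete, here is a minimal sketch in Python of what “picking labels from a pre-agreed list” might look like. Everything in it is an assumption made for illustration: the label lists, the `Entry` and `Sense` classes and the `validate` function are hypothetical, and real dictionary-writing software would more likely express the same constraints in an XML schema.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical pre-agreed label lists; a real project would define its own.
PARTS_OF_SPEECH = {"noun", "verb", "adjective", "adverb"}
USAGE_LABELS = {"formal", "informal", "archaic", "technical"}


@dataclass
class Sense:
    definition: str
    examples: List[str] = field(default_factory=list)
    usage_labels: List[str] = field(default_factory=list)
    subsenses: List["Sense"] = field(default_factory=list)


@dataclass
class Entry:
    headword: str
    part_of_speech: str
    senses: List[Sense]


def validate(entry: Entry) -> List[str]:
    """Return a list of problems; an empty list means the entry conforms."""
    problems = []
    if entry.part_of_speech not in PARTS_OF_SPEECH:
        problems.append(f"unknown part of speech: {entry.part_of_speech}")
    if not entry.senses:
        problems.append("entry has no senses")

    def check(sense: Sense, path: str) -> None:
        # Every sense needs a definition and may only use agreed labels.
        if not sense.definition.strip():
            problems.append(f"sense {path}: empty definition")
        for label in sense.usage_labels:
            if label not in USAGE_LABELS:
                problems.append(f"sense {path}: unknown usage label: {label}")
        # Subsenses are checked recursively and numbered 1.1, 1.2, ...
        for i, sub in enumerate(sense.subsenses, start=1):
            check(sub, f"{path}.{i}")

    for i, sense in enumerate(entry.senses, start=1):
        check(sense, str(i))
    return problems


# An entry with a misspelled part of speech and a label that is not on the list.
draft = Entry(
    headword="seal",
    part_of_speech="nuon",
    senses=[Sense("a marine mammal", usage_labels=["slang"])],
)
for problem in validate(draft):
    print(problem)
```

The design choice worth noticing is that the editor cannot invent a label on the spot: anything outside the agreed lists is reported, which is the kind of consistency a free-form wiki page does not enforce.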

It seems that the articles on Wiktionary are not based on much lexicographic research; they’re more like brain dumps of what the contributors happen to know explicitly about the word in question. But lexical knowledge is a different kettle of fish from factual knowledge. Most of the important facts about how words are used, how they behave, how they combine with other words and how they are related to other words are not easily accessible to introspection. This is inexplicit knowledge, and it can only be revealed by examining large quantities of language in use, such as by scanning large numbers of concordance lines extracted from a corpus. That is hard work, and I don’t think Wiktionary contributors do that or even know how to do that.
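For readers who have never seen one, a concordance is simply every occurrence of a word lined up with a bit of context on either side. The following is a rough sketch in Python, not how any particular corpus tool works: the `concordance` function, its `width` parameter and the two-sentence “corpus” are all made up for illustration, and a real corpus would run to millions of words.

```python
import re


def concordance(text, keyword, width=30):
    """Return keyword-in-context lines for every whole-word match of keyword."""
    lines = []
    # \b restricts matches to whole words, so "seal" does not match "sealed".
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Pad the left context so that the keyword lines up in one column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines


# A toy "corpus" with two different uses of the same word.
corpus = "The seal swam away. Please seal the envelope."
for line in concordance(corpus, "seal", width=15):
    print(line)
```

Even in this toy output the noun and the verb sit side by side, which is the kind of pattern a lexicographer looks for when scanning thousands of such lines.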

So there are two reasons why Wiktionary isn’t succeeding. Reason number one is that lexical knowledge is harder to come by than factual knowledge. Reason number two is that once lexical knowledge has been obtained, it is harder to encode than factual knowledge. Having realized this, it would be easy to accuse Wiktionarians of dilettantism. That is not what I want to do here, though. The idea of creating an open-source, crowd-sourced dictionary is a good one. The world would be better off if a good-quality, open dictionary existed that was owned by everybody and nobody. Wiktionary is to be commended for trying to bring that vision into reality. But Wiktionary’s problem is that it is using the same model as Wikipedia in an area where that model doesn’t fit.

The question now is, is the idea of an open-source, crowd-sourced dictionary feasible at all? It would appear that the answer to that question will have to be no, at least not if the dictionary is to be good. Dictionary writing is a specialized craft which requires a large amount of training. That’s in fact why there is a profession with a name – lexicography – and that’s how it differs from encyclopedia writing. Most people are not lexicographers and those who are are busy working for commercial dictionary publishers. The last thing a professional lexicographer wants to do after a busy day writing dictionaries is write more dictionaries for free. The only people who are prepared to contribute to an open-source dictionary are non-lexicographers. Can you ever get a good dictionary that’s written by non-lexicographers?

I think that the only way to make a dictionary written by non-lexicographers is to set yourself a less ambitious goal. For example, instead of aiming to write a dictionary, you could aim to collect as many translation equivalents as possible for each word, without analysing its meaning into senses and subsenses and without any labelling. That way, you’d be collecting lexical data in the raw sense, but you wouldn’t actually be writing a dictionary – at best, you’d be writing a glossary. Maybe that’s the way Wiktionary should have chosen.


5 thoughts on “Why Wikipedia works but Wiktionary doesn’t”

  1. Have you read “The Meaning of Everything”, Simon Winchester’s “biography” of the Oxford English Dictionary? I took it in as an audio book last month. (Audio books are what turn the tedium of working out into something enjoyable for me. I only listen to them at the gym, so I am forced to go to the gym to listen to the next bit. Right now, it’s Victor Pelevin’s “The Sacred Book of the Werewolf” which has me on the hook.) Anyway, Winchester does an excellent job of analyzing the whole lexicographical enterprise, both theory and practice. I highly recommend it (although he does wax a bit chauvinistic about the glories of the English language).

  2. Alternatively, there were a number of freely available dictionaries on the web before Wiktionary came along, and Urban Dictionary has already grabbed the web jargon market.

  4. Michal, I regularly consult the Russian entries on Wiktionary, and they are very good, with very good coverage and information. Maybe it’s come on since you discussed this year. They have a template to assign noun classes, but it is very detailed, and a slight mistake results in the assignment of the wrong class. So I do regularly see the wrong stress patterns in declensions. Also the Russian Wiktionary (ru.wiktionary.org) is much more authoritative for Russian than the English Wiktionary. I always find the Russian editors get it right on the Russian-language Wiktionary where the English editors have messed things up.

    • I’m not surprised. I too have found some new respect for the German Wiktionary in particular and it is now my “go-to place” for German. I often hear people, including professional lexicographers, saying that the German Wiktionary is surprisingly reliable. I guess I’ll have to rethink what I thought back in 2009!
