It’s a truism to say that Wikipedia has been a resounding success. Not only does it have a large community of contributors, but it also has an even larger community of readers: people who actually go to Wikipedia to get information. Wiktionary, on the other hand, has been more of an “unmitigated failure”, in the words of the lexicographer Patrick Hanks, which I overheard at the eLex conference in Belgium this October. Sure, Wiktionary does have an active contributor community, like Wikipedia does. But it has not achieved the status of the “go-to place” for lexical information that Wikipedia has for factual information. It seems to me that, by and large, people don’t actually go to Wiktionary to find out about the meanings, usage and translations of words. People tend to prefer proprietary dictionaries (some of which are also available online for free). The question is, why?
At the same eLex conference, the keynote speaker Michael Rundell suggested that the reason probably lies in the nature of the business itself. Lexicography is a different kind of activity from encyclopedia writing. For any encyclopedic subject, be it Bulgarian history or the biology of yeast or the ergonomics of door handles, there is almost certainly an expert somewhere in the world willing to write an article about it for Wikipedia. But when it comes to lexicographic knowledge, the situation is the complete opposite. There are no experts on the words “seal”, “bank” or “close”. Dictionaries are not written by people who already have knowledge about individual words; they are written by people who do research to extract that knowledge from sources such as corpora and citations. You can’t write a dictionary article the way you write an encyclopedic article. You can’t sit down and say “okay, I’m going to write down everything I know about the word ‘seal’” – unless, of course, you’re a Wiktionary contributor. Apparently, this is exactly what they do.
I tend to agree with Michael Rundell on that. But I also think that there is another reason. The formal structure of dictionary articles is more complex than that of encyclopedic articles. An encyclopedic article is basically free-form text, with some formatting such as paragraph breaks, headings, bulleted lists and the occasional table or illustration. Any literate person can produce content in that format. Dictionary articles, on the other hand, have an intricate structure: the meaning of a word is subdivided into senses and subsenses, individual pieces of information are annotated with other information such as part-of-speech tags and usage labels that the lexicographer has picked from a pre-agreed list, there are authentic example sentences that have been chosen for a good reason, and so on. Lexicographers usually work with specialized software which typically uses an XML schema to enforce structure and guarantee consistency. In other words, a dictionary article is a more complex text type than an encyclopedic article, and people who are not trained as lexicographers generally do not possess the skills to write one. Most Wiktionary contributors seem not to be trained lexicographers, and it shows. Wiktionary articles could be described as semi-structured at best. There isn’t much explicit structure in the articles beyond surface formatting such as italic type and bullets. There is very little consistency in labelling, because the editorial system doesn’t enforce it. Also, the way in which labels are applied and meanings are demarcated sometimes reveals a lack of understanding of what makes a good usage example, of the difference between a part-of-speech tag and a usage label, between a translation and a gloss, and so on.
It seems that the articles on Wiktionary are not based on much lexicographic research; they’re more like brain dumps of whatever the contributors happen to know explicitly about the word in question. But lexical knowledge is a different kettle of fish from factual knowledge. Most of the important facts about how words are used, how they behave, how they combine with other words and how they are related to other words are not easily accessible to introspection. This is implicit knowledge, and it can only be revealed by examining large quantities of language in use, such as by scanning large numbers of concordance lines extracted from a corpus. That is hard work, and I don’t think Wiktionary contributors do it or even know how to do it.
So there are two reasons why Wiktionary isn’t succeeding. Reason number one is that lexical knowledge is harder to come by than factual knowledge. Reason number two is that once lexical knowledge has been obtained, it is harder to encode than factual knowledge. Given this, it would be easy to accuse Wiktionarians of dilettantism. That is not what I want to do here, though. The idea of creating an open-source, crowd-sourced dictionary is a good one. The world would be all the better for a good-quality, open dictionary that is owned by everybody and nobody. Wiktionary is to be commended for trying to bring that vision into reality. But Wiktionary’s problem is that it uses the same model as Wikipedia in an area where that model is not a good fit.
The question now is: is the idea of an open-source, crowd-sourced dictionary feasible at all? It would appear that the answer has to be no, at least not if the dictionary is to be good. Dictionary writing is a specialized craft which requires a large amount of training. That is in fact why it is a profession with a name of its own – lexicography – and that is how it differs from encyclopedia writing. Most people are not lexicographers, and those who are tend to be busy working for commercial dictionary publishers. The last thing a professional lexicographer wants to do after a busy day of writing dictionaries is to write more dictionaries for free. The only people who are prepared to contribute to an open-source dictionary are non-lexicographers. Can you ever get a good dictionary that’s written by non-lexicographers?
I think that the only way to make a dictionary written by non-lexicographers work is to set yourself a less ambitious goal. For example, instead of aiming to write a dictionary, you could aim to collect as many translation equivalents as possible for each word, without analysing its meaning into senses and subsenses and without any labelling. That way, you’d be collecting raw lexical data, but you wouldn’t actually be writing a dictionary – at best, you’d be writing a glossary. Maybe that’s the path Wiktionary should have taken.