Wikipedia: Mirror, Microcosm, and Motor of Global Linguistic Diversity
Virginie Mamadouh
Contents
Introduction
Linguistic and Political Geographies of Wikipedia
Wikipedia as a Mirror
  A Plurality of Wikipedias
  Multilingualism on Display
  Multilingualism in Practice
Wikipedia as a Microcosm
  Negotiating Multilingualism
  Creating a New Wikipedia: How to Make Your Language Count in the World
  Disputes About Languages
Wikipedia as a Motor
Conclusion
References
V. Mamadouh, Department of Human Geography, Planning and International Development, Universiteit van Amsterdam, Amsterdam, The Netherlands
© Springer Nature Switzerland AG 2019. S. D. Brunn, R. Kehrein (eds.), Handbook of the Changing World Language Map, https://doi.org/10.1007/978-3-319-73400-2_200-1
Abstract
Wikipedia has long presented itself as “the biggest multilingual free-content encyclopedia on the Internet.” This chapter examines Wikipedia as a multilingual project from a geographical perspective. It examines how multilingualism is represented, performed, and reproduced by Wikipedians (i.e., users of Wikipedia and more specifically the community of editors, as opposed to the much broader public of readers consulting the encyclopedia). The chapter first discusses the way linguistic diversity is mirrored in the organization of Wikipedia through the coexistence of a plurality of monolingual Wikipedias, each named after its language (such as English Wikipedia, Portuguese Wikipedia, Japanese Wikipedia, etc.), and the representations of the links between them. It foregrounds the inequalities between Wikipedias and the special position of the English Wikipedia. The chapter then turns to the way Wikipedia is a multilingual environment and to the dynamics that shape the position of languages in the community, including decisions regarding the creation of new Wikipedias in new languages. Finally, it questions the possible effect of Wikipedia on global linguistic diversity in the physical world and how it can influence the evolution of specific languages and their position in the world, and more specifically the position of English as the global language of communication.
Keywords
Wikipedia · Linguistic diversity · Multilingualism · English
Introduction
Wikipedia has long presented itself as “the biggest multilingual free-content encyclopedia on the Internet” (quoted in Ensslin 2011: 555, note 1). It now brands itself as “the encyclopedia that everyone can edit,” but the multilingual and global ambitions remain explicit in the phrasing used in the opening of the Wikipedia article about Wikipedia:
Wikipedia [. . .] is a multilingual, web-based, free encyclopedia based on a model of openly editable content. It is the largest and most popular general reference work on the Internet. (https://en.wikipedia.org/wiki last accessed 15 June 2018)
In the same vein, Wikipedia founder and original funder, Jimmy Wales, is known for having expressed the ambition that the encyclopedia should be “the sum of all knowledge” (a title used in Salor’s dissertation in 2012). Or as the vision of the Wikimedia Foundation proclaims:
Imagine a world in which every single human being can freely share in the sum of all knowledge. That’s our commitment. (https://wikimediafoundation.org/about/vision/ last accessed 15 September 2018)
All these declarations suggest that Wikipedia is a particularly fascinating website for studying how the Internet shapes the new language map. This chapter does so with a contribution from political geography to Wikipedia studies. It does not deal with linguistic aspects as such but with geographical aspects of languages (in the plural), linguistic diversity, and multilingualism. While the chapter entitled “Writing the World in 301 Languages: A Political Geography of the Online Encyclopedia Wikipedia” (Mamadouh, this volume) presents a short history of Wikipedia and a short account of its organization before discussing how it circulates (disputed) geopolitical representations unevenly across the world, this chapter focuses on Wikipedia as a multilingual project. It discusses how linguistic diversity and multilingualism are represented and enhanced (Wikipedia as mirror of global linguistic diversity), how multilingualism is practiced by multilingual Wikipedians, what language policies have been developed for the establishment of a new language version, how languages are sometimes contested (Wikipedia as microcosm of global linguistic diversity), and how Wikipedia might affect the evolution of specific languages and the relations between language groups (Wikipedia as motor of global linguistic diversity).
Linguistic and Political Geographies of Wikipedia
Since its creation in 2001 as an English-language online encyclopedia using wiki technology (a website enabling users to work collaboratively to modify its content and structure), Wikipedia has expanded dramatically, becoming the fifth most visited website in the world (according to the authoritative webwatcher Alexa Internet) and, foremost, a multilingual project with almost 300 language versions. Each version is called a Wikipedia in everyday language, so that Wikipedia is actually a collection of monolingual Wikipedias. The oldest and largest one, English Wikipedia, is said to have surpassed in word count the most voluminous encyclopedia of all time, the Spanish-language Enciclopedia Espasa, in December 2005 (Van Dijk 2009: 234). A few months later, in March 2006, the English Wikipedia reached the symbolic threshold of one million articles, followed by German (December 2009), French (September 2010), Dutch (December 2011), and 11 others since, most recently Chinese (April 2018) and Portuguese (June 2018).
Linguists have long tapped into Wikipedia to generate a corpus for their research about a single language and about crosslinguistic communication. Sociolinguists have used it as a valuable source to study transcultural encounters. A majority of the work is quantitative, using large data dumps to search for editing patterns. Computer scientists use them to trace networks and patterns of meaning, sometimes explicitly looking for an Ur-Wikipedia, i.e., the proto-Wikipedia, behind the diversity of the different versions (Warncke-Wang et al. 2012, see also Bao et al. 2012). Other studies are more qualitative. Kopf (2018), for example, applies critical discourse analysis to the discussions on the talk page of the article on the European Union in the English Wikipedia between 2001 and 2015 to analyze how the EU was perceived and represented and how Wikipedians negotiated contested issues.
In this chapter languages are not studied for their linguistic characteristics but for their sociospatial ones. The objective is to discuss the way Wikipedia engages with global linguistic diversity from a geographical perspective. For geographers, languages have spatial characteristics, not only regarding the distribution of their speakers but also in the way they are employed to make sense of the world and the ways people use them to contribute to place-making. As Claude Raffestin (1995, 2012) documented, language and territory are co-constitutive. Informed by political geography and critical geopolitics, this study is sensitive to the way languages are used politically, for example, to justify territorial claims and to make sense of lived territory and place-making. Territorial claims of states or nationalist movements are often justified by the foregrounding of language differences and/or similarities, and language claims can be instrumental to territorial projects: promoting a specific language variety as a language can be part of the project to achieve political autonomy or independence from the state in which one resides. On the other hand, states can promote one language as the national language and discourage or even forbid others in order to homogenize their population culturally and foster a sense of national identity rooted in a common language. Language conflicts (which are of course not conflicts between languages and often are not about languages but about conflicts between language groups and about power and resources) have been extensively studied by political geographers such as Claude Raffestin, Colin H. Williams, and Alexander B. Murphy.
By contrast, the representation of linguistic diversity is an understudied topic in geography, and linguistic diversity is commonsensically represented as a juxtaposition of monolingual territories. Monolingualism is taken for granted as the default, and it is far too often perceived and represented as the norm, both at the individual and the collective level, while multilingualism is seen as a complication or even a danger to sanity and cohesion. In Europe, where the modern territorial states emerged after the Peace of Westphalia, “one state, one nation, one language, one territory” has long been taken for granted as the ideal for a stable political organization. Many conflicts, ranging from wars about mixed borderlands through forced cultural assimilation and planned linguicide to ethnic cleansing, have been justified by narratives proclaiming the objective of achieving such an ideal. More recently, more attention has been paid to the fact that multilingualism is a more common condition and that the way it is organized socially and spatially can be studied. But then again, economic globalization and transnational migration have created new configurations of linguistic diversity, with an extreme linguistic variety in local contexts, of which the urban multilingualism of New York City, London, or even Amsterdam is emblematic.
A linguistic geography of Wikipedia could entail an analysis of the representation of languages as geographic phenomena and as features of geographic objects such as places, countries, or networks. Such a geographical inquiry could examine how Wikipedians in different Wikipedias represent languages and more specifically their geographical reach, language groups, language contacts, and language conflicts. Likewise, it could examine the linguistic data provided in articles about places, for example, the languages noted as being spoken in a specific country. This focus, however, is not the purpose of this chapter. Instead it will discuss how Wikipedias by their very existence inform us about languages in the plural, i.e., about global linguistic diversity and how multilingualism is practiced.
The analysis is based both on secondary literature and on primary documents (Wikipedia articles, talk pages, Wikimedia articles, and debates). Due to the finite linguistic capabilities of the author, it focuses mainly on sources in English and other major European languages such as French, Dutch, German, Spanish, Italian, and Portuguese (all languages with a large Wikipedia featuring over one million articles). This is problematic because it overlooks the debates in smaller Wikipedias and in non-Western languages (unless they are reported in exchanges in these selected languages) and, therefore, misses important voices and insights. However, most meta-discussions on Wikipedia projects, especially those on the opening and the closure of Wikipedias, are carried out in English. The position of the English language as lingua franca and as a hegemonic language (see Mamadouh 2018) in the world of Wikipedia is, therefore, also an important issue.
Wikipedia as a Mirror
Wikipedia can first be studied for the way it reflects the global linguistic diversity, first by the mere existence of so many linguistic versions and second by the explicit representation of multilingualism, i.e., of contacts and crossings between languages.
A Plurality of Wikipedias
There are two ways to stress the scope of Wikipedia. One is to stress its size (the mere number of articles, editors, users, etc.; for statistics in real time, see https://stats.wikimedia.org/EN/Sitemap.htm), and the second is to focus on the extraordinary number of language versions. New Wikipedias have been created since 2001 and now number almost 300 (please see “Writing the World in 301 Languages: A Political Geography of the Online Encyclopedia Wikipedia” (Mamadouh, this volume)). Figure 1 is a snapshot of an animation showing the extraordinary growth of the number of Wikipedias over time, including both their creation and their growing size. This is truly unmatched by other projects. It does represent the global linguistic diversity adequately in the sense that it represents a fraction of all the languages spoken in the world (about 7097 according to ethnologue.com, of which about half are not written at all) and that all larger and more institutionalized languages (e.g., the official languages of independent states) are well represented.
Linguistic diversity is mirrored in the organization of Wikipedia through the coexistence of a plurality of monolingual Wikipedias, each named after its language (such as English Wikipedia, Portuguese Wikipedia, and Japanese Wikipedia). But Wikipedia actively engages in multilingualism, not just through the parallel existence of different language versions. Even more significantly, these Wikipedias are prominently linked to each other through interwiki links, allowing users to switch between equivalent articles in different language versions. These articles are not identical: although an article in one language has sometimes been translated from another language, each is the result of a unique process of collective editing. Actually, quite a large share of the entries are unique to a language version, which means that they are not linked to any entry addressing the same topic in another language. The interwiki links are a prominent feature of a Wikipedia page, bringing linguistic diversity to each monolingual page, since the names of the languages in which an equivalent article is available are listed in the original language (e.g., Deutsch for German, Tiếng Việt for Vietnamese, 日本語 for Japanese, etc.) in the left margin of every page (this is not true for displays on the Wikipedia app for mobile devices). Even if users are monolingual and do not click on these interwiki links, their visibility does convey a strong sense of global linguistic diversity to users.
Fig. 1 A still of the animation by Erik Zachte of the growth of the language Wikipedias. (Available at https://stats.wikimedia.org/wikimedia/animations/growth/AnimationProjectsGrowthWp.html. Accessed 21 September 2018)
Multilingualism on Display
The multilingual character of Wikipedia is announced upfront in the slogan “the biggest multilingual free-content encyclopedia on the Internet” (a phrase mentioned at the beginning of this chapter) as well as in the logo. It is a three-dimensional jigsaw puzzle representing a globe, and each piece of the puzzle is marked with a glyph (a letter or a sign) in a unique script, representing the beginning of the name Wikipedia in a main language in these different scripts (Fig. 2).
Each piece bears a glyph (a letter or other character), or glyphs, symbolizing the multilingualism of Wikipedia. As with the Latin letter “W,” these glyphs are in most cases the first glyph or glyphs of the name “Wikipedia” rendered in that language. They are as follows:
• Near the center is Latin ⟨W⟩. Above that is Japanese ⟨ウィ⟩ wi; below it are Cyrillic ⟨И⟩ i, Hebrew ⟨ו⟩ w, and (barely visible at the bottom) Tamil ⟨வி⟩ vi.
• To the left of the ⟨W⟩ is Greek ⟨Ω⟩ ō, and below that are Chinese ⟨維⟩ wéi, Kannada ⟨ವಿ⟩ vi, and (barely visible at the bottom) Tibetan ⟨ཝི⟩ wi.
• At left, from the top down, are Armenian ⟨Վ⟩ v, Cambodian ⟨វិ⟩ vĕ (lying on its side), Bengali ⟨উ⟩ U, Devanagari ⟨वि⟩ vi, and Georgian ⟨ვ⟩ v.
• The rightmost column is Ethiopic ⟨ው⟩ wə, Arabic ⟨و⟩ w, Korean ⟨위⟩ wi, and Thai ⟨วิ⟩ wi.
• The empty space at the top represents the incomplete nature of the project, the articles, and languages yet to be added (https://en.wikipedia.org/wiki/Wikipedia_logo).
Fig. 2 The globe in the Wikipedia logo (For the evolution of the logo from Nupedia to the present logo, see the illustrations at https://es.wikipedia.org/wiki/Marcas_corporativas_de_Wikipedia#/media/File:WikipediaLogo-TheOfficialFive.jpg)
There have been debates about the logo, as some have questioned the central positions of the Latin and the Greek alphabets (with the W and Ω), followed by the associated Kanji and the Cyrillic letter, in contrast with the peripheral positions of the Arabic or the Devanagari characters, for example. There are more scripts on the globe, including Klingon (a fictional language from the television series Star Trek), which was included until 2010, when it was replaced by Amharic. Users have also reported a lack of precision and care in the representation of the letters and signs in Japanese, Devanagari, Chinese, and Greek (Ensslin 2011, pp. 549–553).
The text at the bottom of the logo was in English, Wikipedia The Free Encyclopedia, and is available in translation for the main pages of the other Wikipedias, for example, Wikipédia L’encyclopédie libre in French, Wikipedia de vrije encyclopedie in Dutch, Wikipedia Vapaa tietosanakirja in Finnish, Vikipēdija Brīvā enciklopēdija in Latvian, Esperanta Vikipedio in Esperanto, ويكيبيديا الموسوعة الحرة in Arabic, etc. This way the logo is automatically localized depending on where one accesses the Internet.
The splash page of the overall project at https://www.wikipedia.org/ also reflects both diversity and clear language hierarchies. The largest Wikipedias are listed around the logo. Below is a search engine (with a menu for language choice among the larger Wikipedias), followed by a menu “Read Wikipedia in your own language.” (This page displays the interface in the language dominant at the location of the IP address, but the names of the languages are given in the original language.) The menu includes a list of languages; the languages are grouped by size, and within each cluster, they are ranked alphabetically.
• Those with 100,000+ articles
• Those with 10,000+ articles (but fewer than 100,000)
• Those with 1000+ articles (but fewer than 10,000)
• Those with 100+ articles (but fewer than 1000)
• And a link to a list with other languages (the smaller Wikipedias with fewer than 100 articles)
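The grouping logic of this menu can be sketched in a few lines of Python. This is only an illustration of the ordering described above, not the code behind the portal page: the function names and the sample article counts are hypothetical, and plain alphabetical sorting is an assumption that ignores script-specific collation.

```python
from collections import defaultdict

# Size tiers as described for the portal menu, largest first.
TIER_THRESHOLDS = [100_000, 10_000, 1_000, 100]


def tier_label(article_count):
    """Return the menu tier for a Wikipedia with the given number of articles."""
    for threshold in TIER_THRESHOLDS:
        if article_count >= threshold:
            return f"{threshold:,}+"
    return "other"


def group_by_tier(article_counts):
    """Group language names by tier and sort them alphabetically within each tier."""
    groups = defaultdict(list)
    for language, count in article_counts.items():
        groups[tier_label(count)].append(language)
    return {label: sorted(names) for label, names in groups.items()}


# Hypothetical article counts, for illustration only.
sample = {
    "Language C": 250_000,
    "Language A": 120_000,
    "Language B": 4_200,
    "Language D": 37,
}
menu = group_by_tier(sample)
```

Because the tiers depend only on article counts, a language version moves up the menu as soon as it crosses a threshold, whatever the quality of its articles.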
A reproduction of the main page in Ensslin (2011, p. 548) shows that these hierarchies have long been in place. The structure is intact in 2018, but the size categories and the exact list of languages have evolved, since the language versions have not necessarily developed at the same pace.
For editors, that is, active Wikipedians contributing to the encyclopedia, multilingualism is additionally foregrounded in many features of Wikipedia apart from the interwiki links and in many projects of the Wikimedia Foundation. For example, it is clearly represented in the Babel user language template that Wikipedians can use on their personal page to announce their linguistic skills. To label a language, they use codes allocated to languages by the International Organization for Standardization (ISO), the main international standard-setting body, based in Geneva and composed of representatives from various national standards organizations. The Babel user template uses ISO codes (en for English, fr for French, nl for Dutch, etc.) and a standardized classification of expertise: mother tongue, or a level ranging from 1 (for basic ability) to 5 (for professional level). The box displays the description of the level of expertise in the language in question. For example, “This user is able to contribute with a professional level of English” for en-5 vs “This user is able to contribute with a basic level of English” for en-1 (Fig. 3).
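How such a code combines a language and a level can be illustrated with a short Python sketch. It is not the Babel template's actual implementation: the function name is hypothetical, the level labels are taken from Fig. 3, and treating a bare language code as mother tongue is an assumption based on the same figure.

```python
# Level labels as they appear in Fig. 3; the mapping itself is an illustrative assumption.
LEVELS = {
    "1": "basic",
    "2": "intermediate",
    "3": "advanced",
    "4": "near-native",
    "5": "professional",
}


def parse_babel_code(code):
    """Split a code such as 'en-3' into (language code, level label)."""
    language, _, level = code.partition("-")
    if level == "":
        # A bare language code (e.g., 'en') is read here as mother tongue.
        return language, "native"
    if level not in LEVELS:
        raise ValueError(f"unknown proficiency level in {code!r}")
    return language, LEVELS[level]


skills = [parse_babel_code(code) for code in ["nl", "en-5", "fr-3", "de-1"]]
```

The sketch also makes visible what the notation presupposes: Latin-script language codes and Arabic numerals, whatever the language being described.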
It is also noteworthy that members of the language committee are listed along with their linguistic skills (no other information is provided about the committee), and they are all multilingual, mentioning at least three languages, up to 17 (of which 11 are at the basic level) for one member (Fig. 4), and all mention fluency in English, at least en-3, i.e., “can write in this language with some minor errors.” The coding presupposes, however, some familiarity with the Latin alphabet, Arabic numerals, and the language codes used online; in other words “Wikipedia’s multilingual endeavours are skewed towards a power imbalance in favour of code-savvy Western (and specifically Anglophone) users” (Ensslin 2011, p. 554). So it celebrates multilingualism, but participates in the reproduction and deepening of English hegemony.
Another expression of the multilingual habitus of Wikipedia is the Wiktionary, a sister project of Wikipedia to construct free-content multilingual dictionaries (now existing in 171 languages). Wiktionaries collect lexicographic data that can be used for various natural language processing tasks (see, e.g., the English Wiktionary at https://en.wiktionary.org/wiki/Wiktionary:Main_Page), which can be particularly useful resources for Wikipedians editing articles on Wikipedia.
Moreover, there is Babylon, the Wikimedia translators’ portal at https://meta.wikimedia.org/wiki/Meta:Babylon, with a talk page, a mailing list, a newsletter, and an IRC channel. There are multilingual Wikisources. Translation requests can be filed to encourage volunteers to work on specific articles deemed worthy of translation between Wikipedia A and Wikipedia B, either because the existing article has been noted for its qualities or because the topic is particularly relevant and prioritized, since the lack of an entry is assessed as an urgent need.
Fig. 3 Snapshot of the codes for user-en
en-5: This user can contribute with a professional level of English.
en: This user is a native speaker of the English language.
en-4: This user can contribute with a near-native level of English.
en-3: This user can contribute with an advanced level of the English language.
en-2: This user can contribute with an intermediate level of English.
en-1: This user can contribute with a basic level of English.
Fig. 4 Snapshot of the Babel box of one member of the language committee
Table 1 Data for the 15 largest Wikipedias
Content, users, and editors
Code  Language     Articles   Speakers  Eds/speakers  5+ edits  100+ edits  Admins  Bots
en    English      5,708,696  1121 M    26            29,207    3472        1274    312
ceb   Cebuano      5,381,901  20 M      1             28        2           4       60
sv    Swedish      3,771,250  10 M      58            582       106         66      40
de    German       2,214,813  132 M     39            5175      812         198     374
fr    French       2,036,682  285 M     16            4581      763         161     107
nl    Dutch        1,940,504  28 M      40            1109      203         45      269
ru    Russian      1,493,774  264 M     12            3054      538         87      84
es    Spanish      1,466,918  513 M     8             3941      529         71      36
it    Italian      1,458,033  68 M      34            2310      374         109     173
pl    Polish       1,297,360  43 M      29            1235      246         106     68
war   Waray-Waray  1,263,327  3 M       5             12        2           2       42
vi    Vietnamese   1,187,691  68 M      7             446       64          23      130
ja    Japanese     1,118,666  128 M     34            4400      370         49      57
zh    Chinese      1,020,416  1107 M    3             2827      403         83      111
pt    Portuguese   1,003,988  236 M     6             1508      203         69      213
Edits and usage
Code  By bots (%)  Unregistered humans (%)  Origin of page edits (a)  Page views/hour  Origin of page views
en    9            31                       USA 39.3%                 4,604,456        USA 41.5%
ceb   99           20                       no data                   528              USA 38.6%
sv    58           20                       Sweden 87.4%              45,823           Sweden 89%
de    10           20                       Germany 83.4%             563,244          Germany 77.2%
fr    19           21                       France 80.1%              357,279          France 68%
nl    38           19                       Netherlands 70%           74,950           Netherlands 65%
ru    17           25                       Russia 72.5%              559,958          Russia 58.6%
es    18           37                       Spain 32.2%               509,014          Spain 22.1%
it    29           32                       Italy 95%                 220,021          Italy 86.6%
pl    35           19                       Poland 90.9%              138,744          Poland 90.3%
war   94           7                        no data                   392              China 83.2%
vi    58           10                       Vietnam 83.9%             25,850           Vietnam 83%
ja    9            40                       Japan 96.3%               524,151          Japan 96.1%
zh    17           25                       Taiwan 39.1%              264,047          Taiwan 51.3%
pt    24           36                       Brazil 83.4%              145,622          Brazil 72.6%
Speakers: Estimation of the number of primary and secondary speakers in millions
Eds/speakers: Number of editors with 5+ edits/month (3 months average) per million speakers
Edits: Percentage of edits by bots and, among human editors, percentage of edits by unregistered humans
Admins: Administrators (editors with administrating responsibilities)
Origin: Main country of origin and % of localized page edits, resp. page views
Sources: Most data (Content, Users, Editors, and Usage) are for 31 August 2018, from http://stats.wikipedia.org/EN/sitemap.htm. Origin page views data are for 1–30 June 2018; source: https://stats.wikimedia.org/wikimedia/squids/SquidReportPageViewsPerLanguageBreakdown.htm
a Origin page edits data are for 1 July 2009–30 September 2013 (since these statistics are not provided anymore); source: https://stats.wikimedia.org/wikimedia/squids/SquidReportPageEditsPerLanguageBreakdown.htm. NB: no data for Waray-Waray and Cebuano, which were not included in the statistics because they were not so large at the time
Multilingualism in Practice
Wikimedia provides a wide range of statistics about its own activities, including Wikipedias, at https://stats.wikimedia.org/, generating tables about articles, views, editors, etc. for the different Wikipedias. They document the wide differences between the Wikipedias and the uneven representations of languages (Table 1). These impressive figures should not distract observers from the deep inequalities between the language versions: in size and in quality (comprehensiveness, readability, and reliability).
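The ratio of editors to speakers reported in Table 1 shows how these inequalities can be made comparable across languages of very different size. The following Python sketch recomputes it for a few rows, using the editor counts (5+ edits per month) and speaker estimates as read from Table 1; the function name is hypothetical, and rounding to a whole number is an assumption about how the published figures were derived.

```python
def editors_per_million(active_editors, speakers_millions):
    """Editors with 5+ edits per month, per million speakers, rounded to a whole number."""
    return round(active_editors / speakers_millions)


# (editors with 5+ edits per month, speakers in millions), as read from Table 1.
rows = {
    "en": (29_207, 1121),
    "sv": (582, 10),
    "de": (5_175, 132),
    "ja": (4_400, 128),
    "zh": (2_827, 1107),
}
ratios = {code: editors_per_million(*values) for code, values in rows.items()}
```

Under these assumptions the Swedish Wikipedia comes out at 58 active editors per million speakers and the Chinese Wikipedia at 3, although the Chinese version has almost five times as many active editors in absolute terms.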
An earlier comparative study of different Wikipedias warns against comparing the sheer statistics provided by Wikipedia. For example, counting articles gives an approximation of the size of each Wikipedia, but remains a very rough estimate. Articles differ widely in size and quality. Some are very long and detailed, refer to many sources, and feature plenty of illustrations. For example, the article about the European Union in English at https://en.wikipedia.org/wiki/European_Union is 26 pages long if printed and includes almost 300 references and a large number of maps, tables, and pictures. But some articles are one-sentence stubs that are very incomplete entries (e.g., the entry about the EU in Gagauz at https://gag.wikipedia.org/wiki/Evropa_Birlii), and sometimes articles appear in the wrong language (created in or copied to another Wikipedia, often with the intent to get them translated, but not yet translated and left in limbo for who knows how long) (Van Dijk 2009, p. 236). In a sample of 50 random articles for 53 language editions, Van Dijk found some major differences between what he calls “real” and “pseudo” articles (the first being content rich and with strong editing by individuals), with the Japanese Wikipedia scoring 100% real articles, while others were as low as 10% (Upper Sorbian and Corsican) and still others at 0% (in the artificial language Volapük). He uses these findings to distinguish four groups of Wikipedias:
1. Large ones (like German, French, Dutch, Russian, and Chinese, with over 100,000 articles in 2008), covering a vast range of subjects and very active
2. Medium ones (like Catalan or Esperanto), quite similar but more modest, with more than 10,000 articles in 2008
3. Small ones (like Afrikaans, Swahili, and Bavarian), with then over 1000 real articles according to his estimation, covering only fragments of human knowledge and not very active
4. Micro ones, with very few real articles and hardly active (his 2008 sample included Scots, Zealandic, and Tok Pisin) (Van Dijk 2009, pp. 237–238).
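The size component of this typology can be sketched as a simple decision rule in Python. This is a simplification for illustration, not Van Dijk's procedure: it ignores the activity criterion, it estimates the number of real articles by multiplying the article count by the share of real articles found in a sample, and the function name and example figures are hypothetical.

```python
def classify_wikipedia(articles, real_share):
    """Assign a size group using the 2008 thresholds listed above.

    articles:   total number of articles
    real_share: estimated share of "real" articles, between 0 and 1
    """
    estimated_real_articles = articles * real_share
    if articles > 100_000:
        return "large"
    if articles > 10_000:
        return "medium"
    if estimated_real_articles > 1_000:
        return "small"
    return "micro"


# Hypothetical language versions, for illustration only.
examples = {
    "version A": classify_wikipedia(500_000, 0.9),
    "version B": classify_wikipedia(20_000, 0.5),
    "version C": classify_wikipedia(5_000, 0.5),
    "version D": classify_wikipedia(5_000, 0.1),
}
```

A rule based on article counts alone would rank a version inflated by automatically generated stubs among the large ones, which is why the typology also takes activity and the share of real articles into account.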
Almost 10 years later, the sizes of the Wikipedias have changed; the largest ones in particular have become much larger, all now with more than one million articles (as shown earlier in this chapter), but the typology combining size and activity is still a useful categorization.
Generally speaking, the community does not promote the translation of articles without localization in the societal context associated with the language to serve the intended audience. For example, the article about James Joyce in Italian is expected to feature more details about his life in Trieste than the article in English, while the article about a US movie in language X is expected to feature details about the circulation and reception of the movie in that language zone, including the names of the voice actors dubbing for the main characters (when dubbing is applied) or the names of the translators of the subtitles.
Some versions have expanded dramatically using machine translation through the work of bots, or web robots, generating articles by translating them automatically from the other Wikipedias, often the English Wikipedia. As early as 2007, the Volapük Wikipedia increased from about 800 articles to over 110,000, thanks to the use of bot-generated stubs (sketches of articles that are largely empty and need to be filled by editors with additional information). The most productive bot is called lsjbot; it was created by the Swedish Wikipedian Sverker Johansson and used to generate content for the Swedish, Cebuano, and Waray-Waray Wikipedias (the latter two are languages of the Philippines, where his wife is from), especially about animals and plants. He proudly describes himself on his personal page https://en.wikipedia.org/wiki/User:Lsj as the single biggest producer of articles. All three languages are now ranked in the top 15 (over one million articles). Despite these achievements, the use of bots and machine translation is highly disputed. In Winter 2017–2018, a lively discussion ensued about closing the Cebuano Wikipedia altogether because it consists mainly of bot-generated content (99% of the edits, see Table 1). The proposal to close it was eventually rejected, but is likely to resurface in the near future since the general discussion goes on.
The disagreement between proponents and opponents of machine translation boils down to quality assessment. The proponents think it is better to have a (poorly) translated article than nothing; the opponents fear that it circulates hegemonic Anglo-American representations. Moreover, it is disputed whether it is easier and more convenient for local editors to edit a translated article and localize it or to create a new one from scratch.
Disputes about new articles go back as far as the ongoing debates between the inclusionists and the deletionists (Ford 2011, pp. 262–263); the latter are also called exclusionists. The former favor the inclusion of new articles, even if short and/or poorly written (note that this could also mean written in a nonstandard version of the language). The latter prioritize quality and favor the deletion of articles that do not match their high standards. The dispute was particularly serious among German Wikipedians (Wikimedia Deutschland 2011: especially pp. 164–182). The dilemma is as old as the making of encyclopedias. Achim Raschka, the German editor opposing the use of bots, referred in this context to an entry written by Denis Diderot for the Encyclopédie in 1751, titled “Aguaxima”:
Aguaxima, a plant growing in Brazil and on the islands of South America. This is all that we are told about it; and I would like to know for whom such descriptions are made. It cannot be
for the natives of the countries concerned, who are likely to know more about the aguaxima than is contained in this description, and who do not need to learn that the aguaxima grows in their country. It is as if you said to a Frenchman that the pear tree is a tree that grows in France, in Germany, etc. It is not meant for us either, for what do we care that there is a tree in Brazil named aguaxima, if all we know about it is its name? What is the point of giving the name? It leaves the ignorant just as they were and teaches the rest of us nothing. If all the same I mention this plant here, along with several others that are described just as poorly, then it is out of consideration for certain readers who prefer to find nothing in a dictionary article or even to find something stupid than to find no article at all. (quoted in Wikipedia Signpost 29 June 2013, available at https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2013-06-19/News_and_notes)
Geographical entries are particularly well suited for such exercises, since a lot of stubs can be generated stating no more than the existence of a place (see, e.g., the entry for Logan, Utah, in Turkish Wikipedia at https://tr.wikipedia.org/wiki/Logan,_Utah or the Finnish city of Mikkeli in Greenlandic https://kl.wikipedia.org/wiki/Mikkeli or in Chinese https://zh.wikipedia.org/wiki/%E7%B1%B3%E5%87%B1%E5%88%A9). Machine translation or editing based on foreign articles does not necessarily use the English Wikipedia as a starting point. Articles in a related language (linguistically and/or culturally) might prove much more useful, for example, using a Danish article to produce one in Bokmål, a Czech article to produce one in Slovak, a Portuguese article to produce one in Galician, an Indonesian article to produce one in Malay, etc.
A final reason to acknowledge that Wikipedia mirrors global linguistic diversity, including existing inequalities and hierarchies between languages, is the realization that English is the editorial and auxiliary metalanguage, in other words a co-language (Ensslin 2011; Mamadouh 2018) used for discussions among editors and administrators across Wikipedias and other Wikiprojects, not to mention the Wikimedia Foundation itself. This illustrates the simultaneously enabling and disabling function of English (Ensslin 2011, p. 555): it makes cross-cultural communication possible, but at the same time disables some users/editors and favors those with a good command of the language, especially native speakers. In that sense, one might wonder whether the English Wikipedia is an example of the use of English as a lingua franca, i.e., a language shared by everyone, or an instrument of English hegemony, i.e., the hegemonic position of the core speakers of English (the Anglosphere dominated by the UK and the USA). In any event, the English Wikipedia is different from the others because it clearly serves a global audience, while other versions serve more localized audiences, even if the Portuguese, Spanish, and French Wikipedias also serve a public spread across different continents.
Wikipedia as a Microcosm
As an encyclopedia, and as a set of monolingual encyclopedias linked together, Wikipedia reflects the uneven relations between languages as well as the differences between languages regarding their status, prestige, and resources. But as an online
community or as a network of online communities of editors, Wikipedias also deal with linguistic diversity by “doing” multilingualism. As such Wikipedia is a microcosm of global linguistic diversity.
Negotiating Multilingualism
Multilingual users navigate between language versions to find the information they need if no relevant article is published in their first language and to find the most complete article in the languages they can read. But often the purpose is to compare the content of the articles in different language versions, or sometimes to use Wikipedia as a translation tool between the languages they know in order to enlarge their vocabulary (i.e., to check how to translate the name of a disease, a plant, or a specific place or the proper transliteration of a person’s name in another script). This multilingual reading of Wikipedia greatly enriches multilinguals, expanding their mastery of the different languages as well as their understanding of cultural or environmental differences (e.g., the arrival of swallows is associated with spring in French sayings, but with summer in Dutch ones, which is linked to the location of the core areas of both languages: after the winter, swallows arrive sooner in the south of France than in the Netherlands). It is also a reality check on taken-for-granted assumptions about the global renown of national artists, scientists, or politicians when they seem not worthy of a page in another language.
Studies have shown that editors work across language versions. According to Hale (2014a), 15% of active Wikipedia editors are multilingual and edit in multiple language versions. They sometimes edit in languages they do not master well in order to comply with layout rules or systematize interwiki links. Cross-language editing has been studied by Hale (2014a, b); he found a strong negative correlation between the size of the group of users primarily editing a language edition and the percentage of multilingual editors. In other words, he found a higher level of multilingualism among smaller language editions and a lower level of multilingualism among the larger language editions, with the Japanese Wikipedia being the most monolingual. Hara and Doney (2015) compared English and Japanese editors editing articles about Okinawa and found differences in content and interaction style (also between the contributions of bilingual editors in the two languages). Kim et al. (2016) analyzed the content written by multilingual editors in the English, German, and Spanish Wikipedias and found differences among the characteristics of the editors, the policies they adopted, and their behaviors. The English Wikipedia has the largest group of multilingual editors, and the most varied in terms of primary language (only 33% are primary users of English). Editors whose primary language was Spanish or German made more complex edits than those who edited in these languages as their second (or third or fourth...) language. By contrast, editors working in the English Wikipedia as their second (or third or fourth...) language made edits as complex as those of primary users of English. This suggests that multilinguals whose primary language is English were editing interwiki links, adding illustrations, standardizing layout, etc., in the other editions, while multilinguals working in English as an additional language contribute to the contents. These
findings stress the role of the English Wikipedia as a common source for a global community. Content is provided and edited by a linguistically diverse population.
Many studies have used Wikipedias as a source of information about cross-language and cross-cultural differences and to compare Wikipedias and their content. Despite the many interwiki links, it is very common that articles have no counterparts in another edition. For example, local politicians, artists, or scientists might have a page only in the main language of their country of origin. By contrast Donald Trump (since his election as president of the USA), Wolfgang Amadeus Mozart, and Albert Einstein are likely to have an article in a very large number of Wikipedias. Likewise, the USA, the United Nations, Paris (France), and the Olympic Games are likely to be well covered in most languages, while Moresnet, the Indian Ocean Tuna Commission, the small community of Paris, Iowa, or the Dutch championship of Frisian handball will generate many fewer entries.
Academic studies have systematically compared the coverage of the main Wikipedias in terms of topics. They report very moderate overlap, even between languages with a similar cultural background and a very active community of editors (e.g., only 51% overlap between the English and the German, according to Hecht and Gergle 2010). A large number of articles are specific to a single Wikipedia (75% according to Hecht and Gergle 2010). Even smaller editions hold specific knowledge and are centered on the place where the language is used. Others have ranked the people who are most covered (in the most languages), most networked (through links to other articles), and most consulted by users. The Swedish botanist Carl Linnaeus scores high, for example, because in most languages most articles, including those on botanical classification, refer to an article about him.
Samoilenko et al. (2016) have carried out a cluster analysis based on co-editing among 110 Wikipedias, clustering languages that share a large number of concepts, that is, articles about the same concept that are linked together by an interwiki link. The resulting map (Fig. 2 on p. 8 to be found in the open access article at https://epjdatascience.springeropen.com/track/pdf/10.1140/epjds/s13688-016-0070-8) shows 23 clusters, based on linguistic distance (Romance languages), but also geographical proximity (e.g., Hungarian, Czech, Slovak, Romanian, and Esperanto; Japanese, Korean, Chinese, and Thai; or Scandinavian languages and Finnish despite important linguistic differences among them). Likewise, different patterns of controversies have been studied (Apic et al. 2011; Yasseri et al. 2014), identifying language clusters in samples of Wikipedias on the basis of shared contested topics.
Creating a New Wikipedia: How to Make Your Language Count in the World
As a community, the editors of Wikipedia also have to make decisions about the creation of new Wikipedias. The procedure they have developed, for what it is worth, is important since Wikipedians function as gatekeepers. The language proposal policy has been codified following a proposal in June 2006 (that can still be consulted at https://meta.wikimedia.org/wiki/Language_proposal_policy) in response to
the sharp increase of the number of language versions (already 250 by then). A language committee was established (presently 14 members) and published resources to make the procedure more transparent, with handbooks for those requesting a new language and for the editors who want to participate in the decision-making process (see details about both the members of the committee and the handbooks at https://meta.wikimedia.org/wiki/Language_committee).
The specific steps to be followed when filing a request for a new language are specified:
• Check that the project does not already exist.
• Obtain an ISO 639 code (for the language name).
• Ensure the requested language is sufficiently unique that it could not exist on a more general wiki.
• Ensure that there are a sufficient number of native editors of that language to merit an edition in that language.
(see https://meta.wikimedia.org/wiki/Requests_for_new_languages)
The language community needs to develop an active test project and ensure it is active until approval (this is checked by a bot; the threshold is three active editors, i.e., editors having committed at least one edit in the past 30 days). MediaWiki interface translations are also required (to implement the architecture of the website and the interwiki links) in a process called localization.
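The activity threshold just described (three editors with at least one edit each in the past 30 days) can be illustrated with a minimal sketch. The function name, the (editor, timestamp) data layout, and the sample records below are assumptions made for illustration only; this is not the bot actually used for the check.

```python
from datetime import datetime, timedelta

def is_test_project_active(edits, now, min_editors=3, window_days=30):
    """Return True if at least `min_editors` distinct editors made an edit
    within the `window_days` days before `now`.

    `edits` is a hypothetical list of (editor_name, timestamp) pairs.
    """
    cutoff = now - timedelta(days=window_days)
    # Collect the distinct editors with at least one edit in the window.
    active_editors = {editor for editor, ts in edits if cutoff <= ts <= now}
    return len(active_editors) >= min_editors

# Illustrative data only (not real edit histories).
now = datetime(2018, 8, 1)
edits = [
    ("editor_a", datetime(2018, 7, 30)),
    ("editor_b", datetime(2018, 7, 15)),
    ("editor_b", datetime(2018, 7, 20)),  # same editor, counted once
    ("editor_c", datetime(2018, 5, 1)),   # outside the 30-day window
]
print(is_test_project_active(edits, now))  # False: only two active editors
```

Counting distinct editors rather than edits matters here: a single very productive contributor does not satisfy the threshold.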
The language code (a valid ISO 639-1 or 639-3 language code like en for English) should be accompanied by the name of the language in English and in the language itself (as it will eventually appear on the list of available languages for the interwiki links). The URL is modelled after the language code and reads like langcode.wikiproject.org and eventually, if accepted, langcode.wikipedia.org.
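As a rough sketch of the naming scheme described above, the following function derives a host name from a language code. The two URL patterns are taken from the text; the function name, the `approved` flag, and the shape check (two lowercase letters for ISO 639-1, three for ISO 639-3) are assumptions for illustration. The check does not verify that a code is actually registered.

```python
import re

def wikipedia_host(langcode, approved=False):
    """Build the host name for a language edition from its language code.

    Only the shape of the code is checked; whether the code exists in
    ISO 639-1 or ISO 639-3 is not verified here.
    """
    if not re.fullmatch(r"[a-z]{2,3}", langcode):
        raise ValueError(f"not shaped like an ISO 639-1/639-3 code: {langcode!r}")
    # Hypothetical flag: test projects and approved editions use different domains.
    domain = "wikipedia.org" if approved else "wikiproject.org"
    return f"{langcode}.{domain}"

print(wikipedia_host("nn", approved=True))  # nn.wikipedia.org
print(wikipedia_host("ceb"))                # ceb.wikiproject.org
```

Longer, invented codes would be rejected by this shape check and would need separate handling.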
If there is no valid ISO 639 code available, the volunteers must obtain one, i.e., convince the standards organization to create an ISO 639 code. The evolution of the global regime of language recognition has been analyzed by Tomasz Kamusella, who situates its origins in the Summer Institute of Linguistics (SIL) established in 1934 in the USA and closely connected to the Wycliffe Bible Translators (Kamusella 2012, p. 71) and their need to assess and classify indigenous languages to which the society wanted to bring the scripture. In 1946 it merged into the United Bible Societies (UBS), which by 2010 had made the New Testament available in 1231 languages (Kamusella 2012, p. 71). With computer storage revolutionizing libraries in the 1960s, classification was later driven by the Library of Congress and other national libraries in need of a standard to categorize their holdings in a systematic way (Kamusella 2012, p. 63). In 1967 the International Organization for Standardization (ISO) came up with the ISO 639 standards covering the 184 main languages with two-letter codes. Eventually the Internet and its standards extended the possibilities to include languages, and the ISO 639-3 codes do not rely on Romanization anymore but on Unicode, an Internet standard that supports over 600 languages in about 160 scripts (Kamusella 2012, p. 65).
Invented language codes have been allowed occasionally (for 13 Wikipedias at the moment), mostly because standard language codes were not available (e.g., for Simple English, Ripuarian, or Dutch Low Saxon), but in some cases an official code or a new code was created after the Wikipedia itself (e.g., for Samogitian, Norman, Cantonese, or Classical Chinese).
Each proposal is discussed online by proponents and opponents of the new Wikipedia, and arguments are filed (often in English or with an English translation). “The project will be assessed on its linguistic merits and chances of flourishing.” Discussions are public and can be retrieved via the same webpage (https://meta.wikimedia.org/wiki/Requests_for_new_languages). They might be very procedural, that is, participants tend to oppose languages with no ISO code, taking the existence of a code as an indicator of the singularity and notability of any language. Others oppose this rigid interpretation, but suggest the proponents seek to convince ISO first or allow for exceptions.
Other arguments revolve around the singularity of the language and its relations with other languages. When the proposed version is commonly perceived as a local variety (e.g., when Valencian is conceived as a variety of Catalan or Cajun French as a variety of French), a new Wikipedia is opposed, but arguments to prove similarities and differences can be linguistic (differences or similarities in vocabulary or syntax) or political (in the case of Valencian, the reference to the Valencian Autonomous Community, as distinct from the Catalan one, in Spain). Although political arguments are not admissible, since Wikipedia is meant to serve individuals not political communities, the political status of a language does intervene as it is closely linked to different cultural practices. For example, different political and social institutions generate different vocabularies between Belgium and the Netherlands or the USA, Canada, and the UK.
With regional languages, the discussions often revolve around the respective advantages of having one Wikipedia for one language family or separate Wikipedias for more local, and by definition smaller, languages. For example, the proposal for a Wikipedia in Gronings (a Dutch Lower Saxon variety) was opposed as detrimental to the existing Dutch Lower Saxon Wikipedia (including Gronings), despite the fact that Gronings had a specific ISO code and is often acknowledged as a specific language; opponents emphasized that using specific dialects was allowed in that Wikipedia. The existing Wikipedia was therefore seen as a place to cultivate intercomprehension between the different dialects. By contrast, the existence of a specific Dutch Lower Saxon Wikipedia had been justified (despite the absence of an ISO code for it and despite the dialect continuum among Lower Saxon dialects across the border between the Netherlands and Germany) by the fact that speakers on each side of the state border use the national language (Dutch and German, respectively) to write in their regional language and to create new words such as the word for cell phone (see also Van Dijk 2009 on these languages).
The number of living native speakers should be sufficient to form “a viable community and audience.” Nevertheless, Wikipedias have been proposed for historical languages such as Latin and for artificial languages such as Esperanto, and they could be approved (and maintained) if “a reasonable degree of recognition as
determined by discussion” (the phrasing is from the policy, see https://meta.wikimedia.org/wiki/Language_proposal_policy) is demonstrated, according to the language committee. There are Wikipedias in Esperanto, Ido, Interlingua, Interlingue, Lojban, Volapük, and Novial, all constructed languages with a sizeable community of users, but Toki Pona has been removed for lacking any societal grounding.
If a language is verified as eligible, the project can be developed in the incubator (“at least five editors must edit that language regularly before a test project will be considered successful”), and eventually the Wikipedia can be published (or closed if there is little or no activity). To develop the project, a list has been consolidated of a thousand articles every Wikipedia should have, pertaining to basic content regarding biographies, history, geography, society, culture, science, technology, and mathematics. When the test wiki is finally approved, all pages developed in the test wiki are transferred to the actual Wikipedia.
Over time, ten of the smallest Wikipedias have been closed due to inactivity. Some were moved back to the incubator, where the community can develop them further. The list also mentions Wikipedias that have been deprecated (Alsatian has been extended to Alemannic; Akan is now considered a family of languages, and one Wikipedia is now in Twi, one of the Akan languages). Some wikis (Toki Pona and Klingon, the first an artificial language, the other a fictional one) do not belong to the Wikipedia family anymore; they have been removed and are now hosted by wikia.com, as are a few rejected proposals (Lingua Franca Nova, Korean Hanja, and Prussian). Moldovan was closed because Moldovan was found to be a version of Romanian (even according to the 1989 Language Law of Moldova) written in Cyrillic, and there is software to convert between the two scripts. Finally one language has been deleted: this is the infamous case of the so-called Siberian language in 2006, a terrible embarrassment for the community (although it was created before the language policy was put in place following a proposal in June 2006, the very name given to that language should have alarmed editors); it was a fantasy language created by Yaroslav Zolotaryov, a Siberian separatist with a misogynistic, xenophobic, and anti-Semitic political agenda.
Disputes About Languages
At first sight the criteria formalized in the language policy seem rather straightforward, but they are not. Even the number of editors is disputable, and having a Wikipedia can be a strategic step toward the revitalization of a language. A rather small group might start a virtuous circle mobilizing a larger number of participants. By contrast an enthusiastic group might quickly become exhausted and demotivated and stop writing articles.
Two other criteria are even trickier and might generate some discussion in specific cases. Mutual intelligibility is not a technical matter: it greatly depends on attitude and exposure, because linguistic differences reflect power relations. The name given to a language (its recognition as a language as opposed to being a variety, a
dialect, a sociolect, or a regiolect of another language) is politically motivated. Small differences in language use (pronunciation, vocabulary, syntax, idioms, etc.) pertain to linguistic norms and cultural differences between social groups, geographically bounded or not. Cultural differences are easier to tackle than linguistic ones. An article can provide information about different practices in different places and in different sections and make them visible with subheadings. For example, an article on academic degrees or on municipal institutions in English will discuss the matter in general terms before presenting the practices and associated concepts in different societal contexts, typically the USA, Canada, the UK, Ireland, and Australia.
By contrast issues regarding the linguistic code itself are less easy to solve in the article itself. This is especially true when the language is monocentric, with a clear hierarchy between different varieties, such as French, where editors writing in another variety will be “corrected” and see their language framed as incorrect and improper for Wikipedia purposes. In English, similar problems arise for varieties beyond British and American English (but even between them regarding vocabulary and spelling, e.g., colour/color), especially postcolonial World Englishes (Kachru 1996) such as Indian, Nigerian, or Singaporean varieties. The style rules specify that consistency within an article should be achieved. Furthermore, the preferred variety depends on the content: for localized topics the spelling of the local variety is favored, and for general topics the spelling and vocabulary most commonly used (meaning generally the variety used by the largest group of speakers, i.e., American English, French of France, Brazilian Portuguese, German Standard German, etc.).
The hegemony of the core users of a language applies both to the decision to add an article on a specific topic (that can be considered not worthy of an entry by editors who come from the core) and to the syntax and the vocabulary used in an entry. Notable disputes arise around the use of measurements, especially in the English Wikipedia: most editors favor the metric system as a global scientific system, but others argue for American imperial units since the metric system mystifies most American readers, the largest group of English speakers. Again, the global status of the English edition is the cause of the debates. The importance of a topic can also be very local and rejected by a wider community. There was, for example, a controversy surrounding an article for the English Wikipedia about Makmende, a Kenyan fictional superhero reactualized in a video that went viral in 2009, but that, nevertheless, was not seen as noteworthy for an entry by other English-speaking editors (see Ford 2011). The entry has since been stabilized (see https://en.wikipedia.org/wiki/Makmende).
The naming of the language is particularly sensitive. The existence of a Wikipedia can become important in political struggles as an endorsement of claims to autonomy. Political issues are particularly significant when nationalist separatist or irredentist movements are mobilizing around language distinction. Most notably, controversies emerged around the disintegration of Yugoslavia and the process of distinction that was strategically devised among the languages of the successor states of Yugoslavia: Croatian and Bosnian as opposed to Serbian. Montenegrin was
rejected four times by Wikipedia’s language committee, and a new discussion was started in 2018 after an ISO code was attributed to the language. The Serbo-Croatian edition was revived after some discussion in 2005 by proponents of maintaining the language and a Wikipedia version promoting intercommunication between Wikipedians in the successor states of Yugoslavia and promoting intercomprehension between speakers of Serbian, Croatian, Bosnian, and Montenegrin to resist national language policies promoting further distinction between them for political reasons. Discussions about these languages show that some strongly oppose linguistic diversity within the Serbo-Croatian Wikipedia (e.g., the use of different dialects and foremost of both Latin and Cyrillic scripts; Serbian is more and more associated exclusively with the latter, which was not the case in the past). In general code mixing seems to generate opposition. The Norwegian Wikipedia originally accepted articles in both official languages, Bokmål and Nynorsk, but the users of Nynorsk felt marginalized, and in 2004 they split. There are now two Norwegian Wikipedias, one in Bokmål (no.wikipedia.org) and a much smaller one in Nynorsk (nn.wikipedia.org) (see also Van Dijk 2009, p. 243). Egyptian Arabic has its own Wikipedia, next to Arabic, and that has been disputed too, for fear of fragmentation of the latter if other varieties of Arabic were also granted a separate version (Moroccan and Algerian have been verified as eligible, but a proposal for South Levantine Arabic, a Lebanese variety written in Latin script, has been rejected).
Next to disputes about whether or not to create a new Wikipedia, debates about proposals for closure are also worth considering. An overview and more information about closure proposals (and sometimes removal proposals) are available at https://meta.wikimedia.org/wiki/Proposals_for_closing_projects. The arguments put forward by Wikipedians to convince each other and move toward a consensus are insightful and echo political disputes about languages and linguistic identities in society at large. They also provide information about the working of Wikipedias since they reflect upon existing editing practices.
One dispute is slightly different, but particularly insightful regarding global linguistic diversity. It pertains to Simple English, a Wikipedia written in controlled English. With over 100,000 articles, it is a medium-sized Wikipedia. The Simple English Wikipedia is meant for “people with different needs, such as students, children, adults with learning difficulties, and people who are trying to learn English” (as phrased on https://en.wikipedia.org/wiki/Simple_English_Wikipedia). Editors use simpler vocabulary and shorter sentences than in the regular English version and stick to commonly accepted facts (Dowling 2008, see also Yasseri et al. 2012).
In 2018 a third attempt to close Simple English led to a long discussion (54 printed pages) that shows the divide between Wikipedians. Arguments for closure include the fact that Simple English is not a separate language and that the Wikipedia should be merged with the English Wikipedia. Others criticized the poor quality and consistency of its articles and its liability to vandalism. It is also said to distract resources (volunteers’ time) from other Wikipedias and more notably from efforts to write in plain English in the regular English Wikipedia. Its raison d’être is contested: simplification is seen as contradictory to encyclopedic comprehensiveness. Its efficiency is
contested: it does not reach the intended audience, that is, students and language learners do not know about it. Therefore, some want to close it altogether; others want to add more visible tabs on English articles to improve the interconnection between comprehensive and simple articles about the same topic. But then again, others want to prevent any association between English and Simple English for fear that what they perceive as poor quality of the latter might taint the reputation of the former.
Clearly Simple English is experienced in very different ways. Foreign learners write they want to read the real thing and to learn English, not some simplified language. On the other hand, some editors claim to read Simple English articles in technical domains they do not master, and some non-native speakers of English report they dare write and edit an article for Simple English, but not for the English Wikipedia. Finally, the exceptional status of Simple English is resented, as has been noted a number of times, and proposals to create Wikipedias in simple versions of other languages have been rejected (on the grounds that these were not separate languages and had no language community). But, again, Simple English was created before these rules were formulated and adopted. All in all the proposal to close Simple English was rejected on August 1, 2018. But the discussion is likely to be raised again in the near future.
Finally, and in a similar vein, the question is whether or not to have several Wikipedias in the same language, but this time using different scripts. A fascinating ongoing discussion concerns the creation of a Chinese Wikipedia in Pinyin (in the Latin alphabet instead of ideograms). Originally, there were virtually two Chinese Wikipedias under the names of “zh” (or “zh-cn”) and “zh-tw,” but since 2005 this split has been made redundant by the availability of an automatic system to convert between traditional and simplified Chinese. Like Simple English, a Pinyin Wikipedia could be helpful for learners of Chinese, either children or foreigners. The matter is at the time of writing still under deliberation.
Wikipedia as a Motor
A last aspect to consider is whether Wikipedia can bring about changes in a language offline. As we have seen, creating a Wikipedia version is a demanding process, and it could well be seen more and more as an up-to-date criterion to measure the vitality of a language and as such become an important aspect of sociolinguistics and language policies.
The complicated impact of Wikipedia (and more generally speaking Web 2.0) on “weaker” languages, i.e., languages without much institutional support, needs more scrutiny. On the one hand, it offers an incredible opportunity (low-cost production and circulation of content), while on the other it generates a huge pressure on volunteers to standardize and harmonize their languages. Typically issues concern local varieties, spelling, and vocabulary, especially when neologisms are needed (also contradicting Wikipedia’s core principle regarding “no new knowledge”). Moreover, Wikipedians are not or may not necessarily be well connected to established language activists and their institutions.
Baxter (2009) examined the process in more detail for the Breton language, showing how the Breton Wikipedia evolves from being a terminology consumer to being a terminology provider. Disputes revolve around the rejection of French calques (as a result of some kind of inferiority complex) and alternatives such as the use of international calques for the nativization of new words (but then again which one? E.g., Latvia following the name of the country in Latvian, English, and Slavic languages or Letonia following German, Dutch, Scandinavian, and Romance languages, cited in Baxter 2009, p. 70), borrowing from other Celtic languages (Welsh especially) as well as different shades of purism (reformist purism, elitist purism, xenophobic purism) (see Baxter 2009, pp. 71–72) and a preference for continuity with earlier translations (especially to serve children enrolled in Breton language schools). By its sheer volume, the Breton Wikipedia is the biggest corpus in Breton today and the only encyclopedia in Breton, and, therefore, it is bound to be influential in the evolution of the language, especially since in this case no reliable sources like established dictionaries and encyclopedias are available; Wikipedia is, unavoidably, becoming itself such a source of language standardization.
Regional languages lacking full-fledged state support are only one kind of lesser-resourced language greatly affected by Wikipedia. Van Dijk (2009) compares them with third world languages (sic!), by which he means poorly institutionalized languages from the Global South; his analysis does not include Portuguese or Spanish, but he deals with Bahasa Indonesia, Arabic, and Swahili. At the time, poor access to the Internet, lack of software standards (for a long time ASCII-based encodings did not deal properly with Arabic characters), and censorship were likely explanations for their poor visibility in Wikipedia. Van Dijk also notes differences between them, using local geographical knowledge as an indicator: the Swahili Wikipedia covers cities in the region much less than the English Wikipedia does, the situation is slightly more balanced for Arabic, and the Indonesian Wikipedia has more regional city content than the English Wikipedia. This suggests different roles for English in these regional contexts.
More generally, Van Dijk (2009) signals interesting absences that relate to the different repertoires of multilingual speakers. For example, he observes that the Afrikaans Wikipedia is relatively small, most likely because Afrikaners do not feel the urge to create it, as they routinely consult the English Wikipedia instead. By contrast, the Luxembourgian Wikipedia is much larger than expected despite the diglossic situation in Luxembourg and the role of the umbrella language German as the standard language. But then again, Swiss Germans seem to have no incentive to contribute to an Alemannic Wikipedia, and they happily consult the German Wikipedia, since High German is regarded as the proper language for an encyclopedia (Van Dijk 2009, p. 246). These examples show that Wikipedia can be both a tool of language revitalization and a tool of further marginalization. This influences not only the balance between minority and majority languages in regional contexts (e.g., Breton and French in Brittany) or between low and high varieties (e.g., Luxembourgian and German in Luxembourg). It can also affect the balance between national languages and English as the global language. This phenomenon has been signaled very strongly for Icelandic, which is threatened by digital extinction in the age of the English Internet (see Henley 2018). Wikipedia contributes to this development even though there is an Icelandic Wikipedia, which presently (summer 2018) features fewer articles than the Luxembourgian one (see https://stats.wikimedia.org/EN/Sitemap.htm).
Last but not least, Wikipedia is producing new bridges between languages. The interwiki links allow for smooth crossing between any pair of languages (as long as both feature articles identified by editors and bots as dealing with the same topic). This is important because it undermines the hub function that English has in the world of translation (for both literary and scientific publications), which De Swaan (2001) identifies as the hypercentral function of English. In Wikipedia, users can develop their bilingualism and knowledge by navigating between, let us say, Finnish and Maltese, German and Russian, Chinese and Japanese, or Spanish and Cebuano, without necessarily passing through English. This “crossing” shows that Wikipedia can be a tool for the promotion of English (through the role of English as a language of communication among editors of different Wikipedias and through the wide use of the English Wikipedia) while also offering tools to counterbalance the hegemonic position of English.
Conclusion
Wikipedia, as an interlinked set of monolingual Wikipedias, shows a particularly sustained engagement with linguistic diversity. It is both a mirror and a motor of global linguistic diversity, which is represented in all its complexity, including difficult power relations between language groups. It has developed complex policies and mechanisms to regulate the creation of new Wikipedias and at the same time contributes to the changing world language map through the many bridges it can create between languages. The role of multilingual editors is particularly important in shaping the relations between Wikipedias. It is, however, dependent on local contingencies whether the tool adds to the pressure of the growing use of English on the Internet or whether it provides opportunities to revitalize smaller languages. Likewise, the special position of the English Wikipedia as a global encyclopedia serving both an English-speaking monolingual audience in the hegemonic power and a global, multilingual audience is noteworthy.
References
Apic, G., Betts, M., & Russell, R. (2011). Content disputes in Wikipedia reflect geopolitical instability. PLoS One, 6, 1–5.
Bao, P., Hecht, B., Carton, S., Quaderi, M., Horn, M., & Gergle, D. (2012). Omnipedia: Bridging the Wikipedia language gap. In CHI-12. Austin.
Baxter, R. N. (2009). New technologies and terminological pressure in lesser-used languages: The Breton Wikipedia, from terminology consumer to potential terminology provider. Language Problems & Language Planning, 33, 60–80.
van Dijk, Z. (2009). Wikipedia and lesser-resourced languages. Language Problems & Language Planning, 33, 234–250.
Dowling, T. (2008, January 15). Wikipedia too long-winded for you? Try the simple version. The Guardian.
Ensslin, A. (2011). “What an un-wiki way of doing things” Wikipedia’s multilingual policy and metalinguistic practice. Journal of Language and Politics, 10, 535–561.
Ford, H. (2011). The missing Wikipedians. In G. Lovink & N. Tkacz (Eds.), Critical point of view: A Wikipedia reader. Vol. INC reader #7 (pp. 258–269). Amsterdam: Institute of Network Cultures.
Hale, S. A. (2014a). Multilinguals and Wikipedia editing, WebSci’14.
Hale, S. A. (2014b). Cross-language Wikipedia editing of Okinawa, Japan. CHI.
Hara, N., & Doney, J. (2015). Social construction of knowledge in Wikipedia. First Monday, 20(6), 1 June 2015.
Hecht, B., & Gergle, D. (2010). The tower of babel meets Web 2.0. In CHI2010 Atlanta.
Henley, J. (2018, February 26). Icelandic language battles threat of ‘digital extinction’. The Guardian.
Kachru, B. B. (1996). World Englishes: Agony and ecstasy. The Journal of Aesthetic Education, 30, 135–155.
Kamusella, T. (2012). The global regime of language recognition. International Journal of the Sociology of Language, 2012(218), 59–86.
Kim, S., Park, S., Hale, S. A., Kim, S., Byun, J., & Oh, A. H. (2016). Understanding editing behaviors in multilingual Wikipedias. PLoS One, 11(5), e0155305, 1–22.
Kopf, S. E. (2018). Debating the European Union transnationally – Wikipedians’ construction of the EU on a Wikipedia talk page (2001–2015), PhD Thesis Lancaster University.
Mamadouh, V. (2018). Do you speak Globish? Geographies of the globalization of English and linguistic diversity. In R. C. Kloosterman, V. Mamadouh, & P. Terhorst (Eds.), Research handbook on the geographies of globalization (pp. 209–221). Cheltenham: Edward Elgar.
Raffestin, C. (1995). Langue et territoire. Autour de la géographie culturelle. In S. Walty & B. Werlen (Eds.), Kulturen und Raum: theoretische Ansätze und empirische Kulturforschung in Indonesien: Festschrift für Professor Albert Leemann. Zürich: Rüegger.
Raffestin, C. (2012). Space, territory, and territoriality. Environment and Planning D: Society and Space, 30(1), 121–141.
Salor, G. E. (2012). Sum of all knowledge: Wikipedia and the encyclopedic urge, PhD Thesis University of Amsterdam.
Samoilenko, A., Karimi, F., Edler, D., Kunegis, J., & Strohmaier, M. (2016). Linguistic neighbourhoods: Explaining cultural borders on Wikipedia through multilingual co-editing activity. EPJ Data Science, 5, 1–20.
de Swaan, A. (2001). Words of the world, the global language system. Cambridge, UK: Polity Press.
Warncke-Wang, M., Uduwage, A., Dong, Z., & Riedl, J. (2012). In search of the Ur-Wikipedia: Universality, similarity, and translation in the Wikipedia inter-language link network. WikiSym‘12.
Wikimedia Deutschland e.V (Ed.). (2011). Alles über Wikipedia, und die Menschen hinter der größten Enzyklopädie der Welt. Hamburg: Hoffmann und Campe.
Yasseri, T., Kornai, A., & Kertész, J. (2012). A practical approach to language complexity: A Wikipedia case study. PLoS One, 7(11), e-48386.
Yasseri, T., Spoerri, A., Graham, M., & Kertész, J. (2014). The most controversial topics in Wikipedia: A multilingual and geographical analysis. In P. Fichman & N. Hara (Eds.), Global Wikipedia: International and cross-cultural issues in online collaboration. Lanham: Rowman & Littlefield.
Wikipedia and Other Wikimedia Sites Cited in the Text (Apart from the Language Versions) (All Last Accessed in the Summer of 2018, Unless Mentioned Otherwise)
https://meta.wikimedia.org/wiki/Language_committee
https://en.wikipedia.org/wiki/List_of_Wikipedias
https://en.wikipedia.org/wiki/List_of_Wiktionaries
https://en.wikipedia.org/wiki/Wikimedia_Foundation
https://en.wikipedia.org/wiki/Wikipedia_logo
https://en.wikipedia.org/wiki/Wiktionary
https://es.wikipedia.org/wiki/Marcas_corporativas_de_Wikipedia
https://foundation.wikimedia.org/wiki/Wikimedia_official_marks/About_the_official_Marks#What_characters_are_on_the_Wikipedia_puzzle_globe.3F
https://meta.wikimedia.org/wiki/Language_committee/Handbook_(committee)
https://meta.wikimedia.org/wiki/Language_committee/Handbook_(requesters)
https://meta.wikimedia.org/wiki/List_of_articles_every_Wikipedia_should_have
https://meta.wikimedia.org/wiki/Meta:Babylon
https://meta.wikimedia.org/wiki/Proposals_for_closing_projects/
https://meta.wikimedia.org/wiki/Proposals_for_closing_projects/Closure_of_Cebuano_Wikipedia
https://meta.wikimedia.org/wiki/Proposals_for_closing_projects/Closure_of_Siberian_Wikipedia
https://meta.wikimedia.org/wiki/Proposals_for_closing_projects/Closure_of_Simple_English_Wikipedia_(3)
https://meta.wikimedia.org/wiki/Proposals_for_closing_projects/Closure_of_Yiddish_Wikipedia
https://meta.wikimedia.org/wiki/Proposals_for_closing_projects/Deletion_of_Siberian_Wikipedia
https://meta.wikimedia.org/wiki/Requests_for_new_languages
https://meta.wikimedia.org/wiki/Requests_for_new_languages/Wikipedia_Chinese_(Pinyin)_2
https://meta.wikimedia.org/wiki/Requests_for_new_languages/Wikipedia_Gronings
https://meta.wikimedia.org/wiki/Requests_for_new_languages/Wikipedia_Jazayri
https://meta.wikimedia.org/wiki/Requests_for_new_languages/Wikipedia_Montenegrin_5
https://meta.wikimedia.org/wiki/Requests_for_new_languages/Wikipedia_Moroccan
https://meta.wikimedia.org/wiki/Requests_for_new_languages/Wikipedia_South_Levantine_Arabic
https://meta.wikimedia.org/wiki/Requests_for_new_languages/Wikipedia_Valenciano
https://stats.wikimedia.org/EN/
https://stats.wikimedia.org/wikimedia/animations/growth/AnimationProjectsGrowthWp.html
https://stats.wikimedia.org/wikimedia/animations/wivivi/wivivi.html
https://stats.wikimedia.org/wikimedia/squids/SquidReportEditsPerLanguageBreakdown.htm
https://stats.wikimedia.org/wikimedia/squids/SquidReportPageEditsPerLanguageBreakdown.htm
https://stats.wikimedia.org/wikimedia/squids/SquidReportPageViewsPerLanguageBreakdown.htm
https://wikimediafoundation.org/about/vision/