Wikipedia: Mirror, Microcosm, and Motor of Global Linguistic Diversity

Virginie Mamadouh

Contents

Introduction
Linguistic and Political Geographies of Wikipedia
Wikipedia as a Mirror
    A Plurality of Wikipedias
    Multilingualism on Display
    Multilingualism in Practice
Wikipedia as a Microcosm
    Negotiating Multilingualism
    Creating a New Wikipedia: How to Make Your Language Count in the World
    Disputes About Languages
Wikipedia as a Motor
Conclusion
References

Abstract

Wikipedia has long presented itself as “the biggest multilingual free-content encyclopedia on the Internet.” This chapter examines Wikipedia as a multilingual project from a geographical perspective. It examines how multilingualism is represented, performed, and reproduced by Wikipedians (i.e., users of Wikipedia and more specifically the community of editors, as opposed to the much broader public of readers consulting the encyclopedia). The chapter first discusses the way linguistic diversity is mirrored in the organization of Wikipedia through the coexistence of a plurality of monolingual Wikipedias, each named after its language (such as English Wikipedia, Portuguese Wikipedia, Japanese Wikipedia, etc.), and the representations of the links between them. It foregrounds the inequalities between Wikipedias and the special position of the English Wikipedia. The chapter then turns to the way Wikipedia is a multilingual environment and to the dynamics that shape the position of languages in the community, including decisions regarding the creation of new Wikipedias in new languages. Finally, it questions the possible effect of Wikipedia on global linguistic diversity in the physical world and how it can influence the evolution of specific languages and their position in the world, and more specifically the position of English as the global language of communication.

V. Mamadouh (*)
Department of Human Geography, Planning and International Development, Universiteit van Amsterdam, Amsterdam, The Netherlands

© Springer Nature Switzerland AG 2019
S. D. Brunn, R. Kehrein (eds.), Handbook of the Changing World Language Map, https://doi.org/10.1007/978-3-319-73400-2_200-1

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

Keywords

Wikipedia · Linguistic diversity · Multilingualism · English

Introduction

Wikipedia has long presented itself as “the biggest multilingual free-content encyclopedia on the Internet” (quoted in Ensslin 2011: 555, note 1). It now brands itself as “the encyclopedia that everyone can edit,” but the multilingual and global ambitions remain explicit in the phrasing used in the opening of the Wikipedia article about Wikipedia:

Wikipedia [. . .] is a multilingual, web-based, free encyclopedia based on a model of openly editable content. It is the largest and most popular general reference work on the Internet. (https://en.wikipedia.org/wiki last accessed 15 June 2018)

In the same vein, Wikipedia founder and original funder, Jimmy Wales, is known for having expressed the ambition that the encyclopedia should be “the sum of all knowledge” (a title used in Salor’s dissertation in 2012). Or as the vision of the Wikimedia Foundation proclaims:

Imagine a world in which every single human being can freely share in the sum of all knowledge. That’s our commitment. (https://wikimediafoundation.org/about/vision/ last accessed 15 September 2018)

All these declarations suggest that Wikipedia is a particularly fascinating website through which to study how the Internet shapes the new language map. This chapter does so with a contribution from political geography to Wikipedia studies. It does not deal with linguistic aspects as such but with geographical aspects of languages (in the plural), linguistic diversity, and multilingualism. While the chapter entitled “Writing the World in 301 Languages: A Political Geography of the Online Encyclopedia Wikipedia” (Mamadouh, this volume) presents a short history of Wikipedia and a short account of its organization before discussing how it circulates (disputed) geopolitical representations unevenly across the world, this chapter focuses on Wikipedia as a multilingual project. It discusses how linguistic diversity and multilingualism are represented and enhanced (Wikipedia as mirror of global linguistic diversity), how multilingualism is practiced by multilingual Wikipedians, what language policies have been developed for the establishment of a new language version, how languages are sometimes contested (Wikipedia as microcosm of global linguistic diversity), and how Wikipedia might affect the evolution of specific languages and the relations between language groups (Wikipedia as motor of global linguistic diversity).

Linguistic and Political Geographies of Wikipedia

Since its creation in 2001 as an English-language online encyclopedia using wiki technology (a website format that enables users to work collaboratively to modify its content and structure), Wikipedia has expanded dramatically, becoming the fifth most visited website in the world (according to the authoritative web watcher Alexa Internet) and, foremost, a multilingual project with almost 300 language versions. Each version is called a Wikipedia in everyday language, so that Wikipedia is actually a collection of monolingual Wikipedias. The oldest and largest one, English Wikipedia, is said to have surpassed in word count the most voluminous encyclopedia of all time, the Spanish-language Enciclopedia Espasa, in December 2005 (Van Dijk 2009: 234). A few months later, in March 2006, the English Wikipedia reached the symbolic threshold of one million articles, followed by German (December 2009), French (September 2010), Dutch (December 2011), and 11 others since, most recently Chinese (April 2018) and Portuguese (June 2018).

Linguists have long tapped into Wikipedia to generate corpora for research on a single language and on crosslinguistic communication. Sociolinguists have used it as a valuable source to study transcultural encounters. The majority of this work is quantitative, using large data dumps to search for editing patterns. Computer scientists use them to trace networks and patterns of meaning, sometimes explicitly looking for an Ur-Wikipedia, i.e., the proto-Wikipedia, behind the diversity of the different versions (Warncke-Wang et al. 2012, see also Bao et al. 2012). Other studies are more qualitative. Kopf (2018), for example, applies critical discourse analysis to the discussions on the talk page of the article on the European Union in the English Wikipedia between 2001 and 2015 to analyze how the EU was perceived and represented and how Wikipedians negotiated contested issues.

In this chapter languages are not studied for their linguistic characteristics but for their sociospatial ones. The objective is to discuss the way Wikipedia engages with global linguistic diversity from a geographical perspective. For geographers, languages have spatial characteristics, not only regarding the distribution of their speakers but also in the way they are employed to make sense of the world and the ways people use them to contribute to place-making. As Claude Raffestin (1995, 2012) documented, language and territory are co-constitutive. Informed by political geography and critical geopolitics, this study is sensitive to the way languages are used politically, for example, to justify territorial claims and to make sense of lived territory and place-making. Territorial claims of states or nationalist movements are often justified by foregrounding language differences and/or similarities, and language claims can be instrumental to territorial projects: promoting a specific language variety as a language can be part of a project to achieve political autonomy or independence from the state in which one resides. On the other hand, states can promote one language as the national language and discourage or even forbid others in order to homogenize their population culturally and foster a sense of national identity rooted in a common language. Language conflicts (which are of course not conflicts between languages, and often are not about languages at all but about power and resources contested between language groups) have been extensively studied by political geographers such as Claude Raffestin, Colin H. Williams, and Alexander B. Murphy.

By contrast, the representation of linguistic diversity is an understudied topic in geography, and linguistic diversity is commonsensically represented as a juxtaposition of monolingual territories. Monolingualism is taken for granted as the default, and it is far too often perceived and represented as the norm, both at the individual and the collective level, while multilingualism is seen as a complication or even a danger for sanity and cohesion. In Europe, where the modern territorial states emerged after the Peace of Westphalia, “one state, one nation, one language, one territory” has long been taken for granted as the ideal for a stable political organization. Many conflicts, ranging from wars about mixed borderlands through forced cultural assimilation and planned linguicide to ethnic cleansing, have been justified by narratives proclaiming the objective of achieving such an ideal. More recently, more attention has been paid to the fact that multilingualism is a more common condition and that the way it is organized socially and spatially can be studied. At the same time, economic globalization and transnational migration have created new configurations of linguistic diversity, with an extreme linguistic variety in local contexts, of which the urban multilingualism of New York City, London, or even Amsterdam is emblematic.

A linguistic geography of Wikipedia could entail an analysis of the representation of languages as geographic phenomena and as features of geographic objects such as places, countries, or networks. Such a geographical inquiry could examine how Wikipedians in different Wikipedias represent languages and more specifically their geographical reach, language groups, language contacts, and language conflicts. Likewise, it could examine the linguistic data provided in articles about places, for example, the languages noted as being spoken in a specific country. This focus, however, is not the purpose of this chapter. Instead it will discuss how Wikipedias by their very existence inform us about languages in the plural, i.e., about global linguistic diversity and how multilingualism is practiced.

The analysis is based both on secondary literature and on primary documents (Wikipedia articles, talk pages, Wikimedia articles, and debates). Due to the finite linguistic capabilities of the author, it focuses mainly on sources in English and other major European languages such as French, Dutch, German, Spanish, Italian, and Portuguese (all languages with a large Wikipedia featuring over one million articles). This is problematic because it overlooks the debates in smaller Wikipedias and in non-Western languages (unless they are reported in exchanges in these selected languages) and, therefore, misses important voices and insights. However, most meta-discussions on Wikipedia projects, especially those on the opening and the closure of Wikipedias, are carried out in English. The position of the English language as lingua franca and as a hegemonic language (see Mamadouh 2018) in the world of Wikipedia is, therefore, also an important issue.

Wikipedia as a Mirror

Wikipedia can first be studied for the way it reflects global linguistic diversity, first by the mere existence of so many linguistic versions and second by the explicit representation of multilingualism, i.e., of contacts and crossings between languages.

A Plurality of Wikipedias

There are two ways to stress the scope of Wikipedia. One is to stress its size (the sheer number of articles, editors, users, etc.; for statistics in real time, see https://stats.wikimedia.org/EN/Sitemap.htm), and the second is to focus on the extraordinary number of language versions. New Wikipedias have been created since 2001 and now amount to almost 300 (see “Writing the World in 301 Languages: A Political Geography of the Online Encyclopedia Wikipedia” (Mamadouh, this volume)). Figure 1 is a snapshot of an animation showing the extraordinary growth of the number of Wikipedias over time, including both their creation and their growing size. This is truly unmatched by other projects. It represents global linguistic diversity adequately in the sense that, although it covers only a fraction of all the languages spoken in the world (about 7097 according to ethnologue.com, of which about half are not written at all), all larger and more institutionalized languages (e.g., the official languages of independent states) are well represented.

Linguistic diversity is mirrored in the organization of Wikipedia through the coexistence of a plurality of monolingual Wikipedias, each named after its language (such as English Wikipedia, Portuguese Wikipedia, and Japanese Wikipedia). But Wikipedia actively engages in multilingualism, not just through the parallel existence of different language versions. Even more significantly, these Wikipedias are prominently linked to each other through interwiki links, allowing users to switch between equivalent articles in different language versions. These articles are not identical: although sometimes an article in one language has been translated from another, each is the result of a unique process of collective editing. Actually, quite a large share of the entries are unique to a language version, which means that they are not linked to any entry addressing the same topic in another language. The interwiki links are a prominent feature of a Wikipedia page, bringing linguistic diversity to each monolingual page, since the names of the languages in which an equivalent article is available are listed in the language itself (e.g., Deutsch for German, Tiếng Việt for Vietnamese, 日本語 for Japanese, etc.) in the left margin of every page (this is not true for displays on the Wikipedia app for mobile devices). Even if users are monolingual and do not click on these interwiki links, their visibility conveys a strong sense of global linguistic diversity.

Fig. 1 A still of the animation by Erik Zachte of the growth of the language Wikipedia. (Available at https://stats.wikimedia.org/wikimedia/animations/growth/AnimationProjectsGrowthWp.html. Accessed 21 September 2018)

Multilingualism on Display

The multilingual character of Wikipedia is announced upfront in the slogan “the biggest multilingual free-content encyclopedia on the Internet” (the phrase quoted at the beginning of this chapter) as well as in the logo. The logo is a three-dimensional jigsaw puzzle representing a globe; each piece of the puzzle is marked with a glyph (a letter or a sign) in a different script, representing the beginning of the name Wikipedia in a main language written in that script (Fig. 2).

Each piece bears a glyph (a letter or other character), or glyphs, symbolizing the multilingualism of Wikipedia. As with the Latin letter “W,” these glyphs are in most cases the first glyph or glyphs of the name “Wikipedia” rendered in that language. They are as follows:

• Near the center is Latin ⟨W⟩. Above that is Japanese ⟨ウィ⟩ wi; below it are Cyrillic ⟨И⟩ i, Hebrew ⟨ו⟩ w, and (barely visible at the bottom) Tamil ⟨வி⟩ vi.
• To the left of the ⟨W⟩ is Greek ⟨Ω⟩ ō, and below that are Chinese ⟨維⟩ wéi, Kannada ⟨ವಿ⟩ vi, and (barely visible at the bottom) Tibetan ⟨ཝི⟩ wi.
• At left, from the top down, are Armenian ⟨Վ⟩ v, Cambodian ⟨វិ⟩ vĕ (lying on its side), Bengali ⟨উ⟩ u, Devanagari ⟨वि⟩ vi, and Georgian ⟨ვ⟩ v.
• The rightmost column is Ethiopic ⟨ው⟩ wə, Arabic ⟨و⟩ w, Korean ⟨위⟩ wi, and Thai ⟨วิ⟩ wi.
• The empty space at the top represents the incomplete nature of the project, the articles, and languages yet to be added (https://en.wikipedia.org/wiki/Wikipedia_logo).

Fig. 2 The globe in the Wikipedia logo (For the evolution of the logo from Nupedia to the present logo, see the illustrations at https://es.wikipedia.org/wiki/Marcas_corporativas_de_Wikipedia#/media/File:WikipediaLogo-TheOfficialFive.jpg)


There have been debates about the logo, as some have questioned the central positions of the Latin and the Greek alphabets (with the W and Ω), followed by the associated Kanji and the Cyrillic letter, in contrast with the peripheral positions of the Arabic or Devanagari characters, for example. There have been further controversies: Klingon (a fictional language from the television series Star Trek) was included until 2010, when it was replaced by Amharic, and users have reported a lack of precision and care in the representation of the letters and signs in Japanese, Devanagari, Chinese, and Greek (Ensslin 2011, pp. 549–553).

The text at the bottom of the logo is in English, Wikipedia The Free Encyclopedia, and is available in translation for the main pages of the other Wikipedias, for example, Wikipédia L’encyclopédie libre in French, Wikipedia de vrije encyclopedie in Dutch, Wikipedia Vapaa tietosanakirja in Finnish, Vikipēdija Brīvā enciklopēdija in Latvian, Esperanta Vikipedio in Esperanto, ويكيبيديا الموسوعة الحرة in Arabic, etc. This way the logo is automatically localized depending on where one accesses the Internet.

The splash page of the overall project at https://www.wikipedia.org/ also reflects both diversity and clear language hierarchies. The largest Wikipedias are listed around the logo. Below is a search engine (with a menu for language choice among the larger Wikipedias), followed by a menu “Read Wikipedia in your own language.” (This page displays the interface in the language dominant at the location of the IP address, but the names of the languages are given in the languages themselves.) The menu includes a list of languages that are visible; the languages are grouped by size, and within each cluster, they are ranked alphabetically.

• Those with 100,000+ articles
• Those with 10,000+ articles (but fewer than 100,000)
• Those with 1000+ articles (but fewer than 10,000)
• Those with 100+ articles (but fewer than 1,000)
• And a link to a list with other languages (the smaller Wikipedias with fewer than 100 articles)
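The grouping described above (size clusters first, alphabetical order within each cluster) can be sketched in a few lines of Python. This is only an illustration of the logic as described in the text, not the code behind the splash page; the function names and the article counts in the usage note are hypothetical, and plain string sorting stands in for whatever collation the real page uses.

```python
# Size clusters as listed on the splash page: (lower bound, label), largest first.
TIERS = [
    (100_000, "100,000+"),
    (10_000, "10,000+"),
    (1_000, "1,000+"),
    (100, "100+"),
]


def size_tier(article_count: int) -> str:
    """Return the label of the size cluster an edition falls into."""
    for lower_bound, label in TIERS:
        if article_count >= lower_bound:
            return label
    return "other"  # editions with fewer than 100 articles


def group_for_menu(editions: dict[str, int]) -> dict[str, list[str]]:
    """Group edition names by size cluster, alphabetically within each cluster."""
    groups: dict[str, list[str]] = {}
    for name in sorted(editions):  # simple code-point ordering (an assumption)
        groups.setdefault(size_tier(editions[name]), []).append(name)
    return groups
```

For example, `group_for_menu({"B": 150, "A": 120, "C": 20_000})` puts A and B (in that order) in the "100+" cluster and C in the "10,000+" cluster.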

A reproduction of the main page in Ensslin (2011, p. 548) shows that these hierarchies have long been in place. The structure is intact in 2018, but the size categories and the exact list of languages have evolved, since the language versions have not necessarily developed at the same pace.

For editors, that is, active Wikipedians contributing to the encyclopedia, multilingualism is additionally foregrounded in many features of Wikipedia apart from the interwiki links and in many projects of the Wikimedia Foundation. For example, it is clearly represented in the Babel user language template that Wikipedians can use on their personal page to announce their linguistic skills. To label a language, they use codes allocated to languages by the International Organization for Standardization (ISO), the main international standard-setting body, based in Geneva and composed of representatives from various national standards organizations. The Babel user template uses ISO codes (en for English, fr for French, nl for Dutch, etc.) and a standardized classification of expertise: mother tongue, or a level ranging from 1 (for basic ability) to 5 (for professional level). The box displays the description of the level of expertise in the language in question. For example, “This user is able to contribute with a professional level of English” for en-5 vs “This user is able to contribute with a basic level of English” for en-1 (Fig. 3).
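The coding scheme described above (a language code, optionally followed by a level from 1 to 5) can be illustrated with a short Python sketch. It is not the implementation of the Babel template: the function name, the regular expression (which assumes two- or three-letter lowercase codes), and the simplified English-only wording are assumptions made for illustration, with the level labels taken from the English boxes shown in Fig. 3.

```python
import re

# Level labels as shown for English in Fig. 3 (1 = basic ... 5 = professional).
LEVELS = {
    "1": "basic",
    "2": "intermediate",
    "3": "advanced",
    "4": "near-native",
    "5": "professional",
}

# Assumed shape of a code: language code, optionally "-" plus a level 1-5.
BABEL_CODE = re.compile(r"^([a-z]{2,3})(?:-([1-5]))?$")


def describe(code: str, language_name: str) -> str:
    """Return an English description for a code such as 'en-3' or 'en'."""
    match = BABEL_CODE.match(code)
    if match is None:
        raise ValueError(f"not a recognized code: {code!r}")
    level = match.group(2)
    if level is None:
        # A bare language code stands for mother tongue.
        return f"This user is a native speaker of the {language_name} language."
    label = LEVELS[level]
    article = "an" if label[0] in "aeiou" else "a"
    return f"This user can contribute with {article} {label} level of {language_name}."
```

For instance, `describe("en-5", "English")` returns the professional-level sentence, while `describe("en", "English")` returns the native-speaker sentence.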

It is also noteworthy that members of the language committee are listed along with their linguistic skills (no other information is provided about the committee), and they are all multilingual, mentioning at least three languages and up to 17 (of which 11 are at the basic level) for one member (Fig. 4). All mention fluency in English, at least en-3, i.e., “can write in this language with some minor errors.” The coding presupposes, however, some familiarity with the Latin alphabet, Arabic numerals, and the language codes used online; in other words “Wikipedia’s multilingual endeavours are skewed towards a power imbalance in favour of code-savvy Western (and specifically Anglophone) users” (Ensslin 2011, p. 554). So it celebrates multilingualism, but participates in the reproduction and deepening of English hegemony.

Another expression of the multilingual habitus of Wikipedia is the Wiktionary, a sister project of Wikipedia to construct free-content multilingual dictionaries (now existing in 171 languages). Wiktionaries collect lexicographic data that can be used for various natural language processing tasks (see, e.g., the English Wiktionary at https://en.wiktionary.org/wiki/Wiktionary:Main_Page) and can be particularly useful resources for Wikipedians editing articles on Wikipedia.

Moreover, there is Babylon, the Wikimedia translators’ portal at https://meta.wikimedia.org/wiki/Meta:Babylon, with a talk page, a mailing list, a newsletter, and an IRC channel. There are multilingual Wikisources. Translation requests can be filed to encourage volunteers to work on specific articles deemed worthy of translation between Wikipedia A and Wikipedia B, either because the existing article has been noted for its qualities or because the topic is particularly relevant and prioritized, since the lack of an entry is assessed as an urgent need.

Fig. 3 Snapshot codes for user-en

en    This user is a native speaker of the English language.
en-5  This user can contribute with a professional level of English.
en-4  This user can contribute with a near-native level of English.
en-3  This user can contribute with an advanced level of the English language.
en-2  This user can contribute with an intermediate level of English.
en-1  This user can contribute with a basic level of English.

Fig. 4 Snapshot of the Babel box of one member of the language committee


Table 1 Data for the 15 largest Wikipedias

|      |             | Content   | Users    | Editors      |          |            |        |      | Edits       |                         |                   | Usage           |                   |
| Code | Language    | Articles  | Speakers | Eds/speakers | 5+ edits | 100+ edits | Admins | Bots | By bots (%) | Unregistered humans (%) | Origin (a)        | Page views/hour | Origin            |
| en   | English     | 5,708,696 | 1121 M   | 26           | 29,207   | 3472       | 1274   | 312  | 9           | 31                      | 39.3% USA         | 4,604,456       | 41.5% USA         |
| ceb  | Cebuano     | 5,381,901 | 20 M     | 1            | 28       | 2          | 4      | 60   | 99          | 20                      | a                 | 528             | 38.6% USA         |
| sv   | Swedish     | 3,771,250 | 10 M     | 58           | 582      | 106        | 66     | 40   | 58          | 20                      | 87.4% Sweden      | 45,823          | 89% Sweden        |
| de   | German      | 2,214,813 | 132 M    | 39           | 5175     | 812        | 198    | 374  | 10          | 20                      | 83.4% Germany     | 563,244         | 77.2% Germany     |
| fr   | French      | 2,036,682 | 285 M    | 16           | 4581     | 763        | 161    | 107  | 19          | 21                      | 80.1% France      | 357,279         | 68% France        |
| nl   | Dutch       | 1,940,504 | 28 M     | 40           | 1109     | 203        | 45     | 269  | 38          | 19                      | 70% Netherlands   | 74,950          | 65% Netherlands   |
| ru   | Russian     | 1,493,774 | 264 M    | 12           | 3054     | 538        | 87     | 84   | 17          | 25                      | 72.5% Russia      | 559,958         | 58.6% Russia      |
| es   | Spanish     | 1,466,918 | 513 M    | 8            | 3941     | 529        | 71     | 36   | 18          | 37                      | 32.2% Spain       | 509,014         | 22.1% Spain       |
| it   | Italian     | 1,458,033 | 68 M     | 34           | 2310     | 374        | 109    | 173  | 29          | 32                      | 95% Italy         | 220,021         | 86.6% Italy       |
| pl   | Polish      | 1,297,360 | 43 M     | 29           | 1235     | 246        | 106    | 68   | 35          | 19                      | 90.9% Poland      | 138,744         | 90.3% Poland      |
| war  | Waray-Waray | 1,263,327 | 3 M      | 5            | 12       | 2          | 2      | 42   | 94          | 7                       | a                 | 392             | 83.2% China       |
| vi   | Vietnamese  | 1,187,691 | 68 M     | 7            | 446      | 64         | 23     | 130  | 58          | 10                      | 83.9% Vietnam     | 25,850          | 83% Vietnam       |
| ja   | Japanese    | 1,118,666 | 128 M    | 34           | 4400     | 370        | 49     | 57   | 9           | 40                      | 96.3% Japan       | 524,151         | 96.1% Japan       |
| zh   | Chinese     | 1,020,416 | 1107 M   | 3            | 2827     | 403        | 83     | 111  | 17          | 25                      | 39.1% Taiwan      | 264,047         | 51.3% Taiwan      |
| pt   | Portuguese  | 1,003,988 | 236 M    | 6            | 1508     | 203        | 69     | 213  | 24          | 36                      | 83.4% Brazil      | 145,622         | 72.6% Brazil      |

Speakers – Estimation of the number of primary and secondary speakers in millions
Editors/speakers – Number of editors with 5+ edits/month (3 months average) per million speakers
Edits – Percentage of edits by bots, and among human editors, percentage of edits by unregistered humans
Admins – Administrators (editors with administrating responsibilities)
Origin – Main country of origin and % of localized page edits, resp. page views
Sources: Most data (Content, Users, Editors and Usage) 31 August 2018 from http://stats.wikipedia.org/EN/sitemap.htm
Source: https://stats.wikimedia.org/wikimedia/squids/SquidReportPageEditsPerLanguageBreakdown.htm
Origin Page views data are for 1–30 June 2018, Source: https://stats.wikimedia.org/wikimedia/squids/SquidReportPageViewsPerLanguageBreakdown.htm
NB no data for Waray-Waray and Cebuano (not included in the statistics, not so large at the time)
(a) Origin Page edits data are for 1 July 2009–30 September 2013 (since these statistics are not provided anymore)

Multilingualism in Practice

Wikimedia provides a wide range of statistics about its own activities, including the Wikipedias, at https://stats.wikimedia.org/, generating tables about articles, views, editors, etc. for the different Wikipedias. They document the wide differences between the Wikipedias and the uneven representation of languages (Table 1). These impressive figures should not distract observers from the deep inequalities between the language versions, in size and in quality (comprehensiveness, readability, and reliability).

An earlier comparative study of different Wikipedias warns against comparing the raw statistics provided by Wikipedia. For example, counting articles gives an approximation of the size of each Wikipedia but remains a very rough estimate. Articles differ widely in size and quality. Some are very long and detailed, refer to many sources, and feature plenty of illustrations. For example, the article about the European Union in English at https://en.wikipedia.org/wiki/European_Union is 26 pages long if printed and includes almost 300 references and a large number of maps, tables, and pictures. But some articles are one-sentence stubs that are very incomplete entries (e.g., the entry about the EU in Gagauz at https://gag.wikipedia.org/wiki/Evropa_Birlii), and sometimes articles appear in the wrong language (created in or copied to another edition, often with the intent to get them translated, but not yet translated and left in limbo for an unknown time) (Van Dijk 2009, p. 236). In a sample of 50 random articles for 53 language editions, Van Dijk found major differences between what he calls “real” and “pseudo” articles (the former being content rich and strongly edited by individuals), with the Japanese Wikipedia scoring 100% real articles, while others scored as low as 10% (Upper Sorbian and Corsican) or even 0% (the artificial language Volapük). He uses these findings to distinguish four groups of Wikipedias:

1. Large ones (like German, French, Dutch, Russian, and Chinese, with over 100,000 articles in 2008), covering a vast range of subjects and very active

2. Medium ones (like Catalan or Esperanto), quite similar but more modest, with more than 10,000 articles in 2008

3. Small ones (like Afrikaans, Swahili, and Bavarian), with then over 1000 real articles according to his estimation, covering only fragments of human knowledge and not very active

4. Micro ones, with very few real articles and hardly active (his 2008 sample included Scots, Zealandic, and Tok Pisin) (Van Dijk 2009, pp. 237–238).
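The size dimension of this typology can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the thresholds are the 2008 figures quoted above, the activity dimension is left out because the text gives no numeric criterion for it, and the names `Edition` and `size_group` as well as the sample counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Edition:
    """A language edition; the values used below are illustrative, not real data."""
    code: str
    articles: int  # article count (for small editions: estimated "real" articles)


def size_group(edition: Edition) -> str:
    """Assign an edition to one of the four size groups (2008 thresholds)."""
    if edition.articles > 100_000:
        return "large"
    if edition.articles > 10_000:
        return "medium"
    if edition.articles > 1_000:
        return "small"
    return "micro"


# Hypothetical counts, chosen only to exercise each branch.
samples = [Edition("aa", 250_000), Edition("bb", 40_000),
           Edition("cc", 1_500), Edition("dd", 120)]
for edition in samples:
    print(edition.code, size_group(edition))
```

A fuller version would add an activity measure (for example, the number of active editors) as a second criterion, since the typology combines size and activity.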

Almost 10 years later, the sizes of the Wikipedias have changed; the largest ones in particular have become much larger, all now with more than one million articles (as shown earlier in this chapter), but the typology combining size and activity is still a useful categorization.

Generally speaking, the community does not promote the translation of articles without localization to the societal context associated with the language, in order to serve the intended audience. For example, the article about James Joyce in Italian is expected to feature more details about his life in Trieste than the article in English, while the article about a US movie in language X is expected to feature details about the circulation and reception of the movie in that language zone, including the names of the voice actors dubbing the main characters (when dubbing is applied) or the names of the translators of the subtitles.

Some versions have expanded dramatically using machine translation through the work of bots (web robots) that generate articles by translating them automatically from other Wikipedias, often the English Wikipedia. As early as 2007, the Volapük Wikipedia grew from about 800 articles to over 110,000, thanks to bot-generated stubs (sketches of articles that are largely empty and need to be filled by editors with additional information). The most productive bot is called lsjbot; it was created by the Swedish Wikipedian Sverker Johansson and used to generate content for the Swedish, Cebuano, and Waray-Waray Wikipedias (the latter two are languages of the Philippines, where his wife is from), especially about animals and plants. On his personal page, https://en.wikipedia.org/wiki/User:Lsj, he proudly describes himself as the single biggest producer of articles. All three languages are now ranked in the top 15 (over one million articles). Despite these achievements, the use of bots and machine translation is highly disputed. In winter 2017–2018, a lively discussion ensued about closing the Cebuano Wikipedia altogether because it consists mainly of bot-generated content (99% of the edits, see Table 1). The proposal to close it was eventually rejected but is likely to resurface in the near future since the general discussion goes on.

The disagreement between proponents and opponents of machine translation boils down to quality assessment. The proponents think it is better to have a (poorly) translated article than nothing; the opponents fear that it circulates hegemonic Anglo-American representations. Moreover, it is disputed whether it is easier and more convenient for local editors to edit a translated article and localize it or to create a new one from scratch.

Disputes about new articles are as old as the ongoing debates between the inclusionists and the deletionists (Ford 2011, pp. 262–263); the latter are also called exclusionists. The former favor the inclusion of new articles, even if short and/or poorly written (note that this could also mean written in a nonstandard version of the language). The latter prioritize quality and favor the deletion of articles that do not match their high standards. The dispute was particularly serious among German Wikipedians (Wikimedia Deutschland 2011: especially pp. 164–182). The dilemma is as old as the making of encyclopedias. Achim Raschka, a German editor opposing the use of bots, referred in this context to an entry written by Denis Diderot for the Encyclopédie in 1751, titled “Aguaxima”:

Aguaxima, a plant growing in Brazil and on the islands of South America. This is all that we are told about it; and I would like to know for whom such descriptions are made. It cannot be for the natives of the countries concerned, who are likely to know more about the aguaxima than is contained in this description, and who do not need to learn that the aguaxima grows in their country. It is as if you said to a Frenchman that the pear tree is a tree that grows in France, in Germany, etc. It is not meant for us either, for what do we care that there is a tree in Brazil named aguaxima, if all we know about it is its name? What is the point of giving the name? It leaves the ignorant just as they were and teaches the rest of us nothing. If all the same I mention this plant here, along with several others that are described just as poorly, then it is out of consideration for certain readers who prefer to find nothing in a dictionary article or even to find something stupid than to find no article at all. (quoted in Wikipedia Signpost 19 June 2013, available at https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2013-06-19/News_and_notes)

Geographical entries are particularly well suited for such exercises, since a lot of stubs can be generated stating no more than the existence of a place (see, e.g., the entry for Logan, Utah, in Turkish Wikipedia at https://tr.wikipedia.org/wiki/Logan,_Utah or the Finnish city of Mikkeli in Greenlandic at https://kl.wikipedia.org/wiki/Mikkeli or in Chinese at https://zh.wikipedia.org/wiki/%E7%B1%B3%E5%87%B1%E5%88%A9). Machine translation or editing based on foreign articles does not necessarily use the English Wikipedia as a starting point. Articles in a related language (linguistically and/or culturally) might prove much more useful, for example, using a Danish article to produce one in Bokmål, a Czech article to produce one in Slovak, a Portuguese article to produce one in Galician, an Indonesian article to produce one in Malay, etc.

A final reason to acknowledge that Wikipedia mirrors global linguistic diversity, including existing inequalities and hierarchies between languages, is the realization that English is the editorial and auxiliary metalanguage, in other words a co-language (Ensslin 2011; Mamadouh 2018) used for discussions among editors and administrators across Wikipedias and other Wikiprojects, not to mention the Wikimedia Foundation itself. This typically reflects both the enabling and the disabling function of English (Ensslin 2011, p. 555): it makes cross-cultural communication possible but at the same time disables some users/editors and favors those with a good command of the language, especially native speakers. In that sense, one might wonder whether the English Wikipedia is an example of the use of English as a lingua franca, i.e., a language shared by everyone, or an instrument of English hegemony, i.e., the hegemonic position of the core speakers of English (the Anglosphere dominated by the UK and the USA). In any event, the English Wikipedia is different from the others because it clearly serves a global audience, while other versions serve more localized audiences, even if the Portuguese, Spanish, and French Wikipedias also serve a public spread across different continents.

Wikipedia as a Microcosm

As an encyclopedia, and as a set of monolingual encyclopedias linked together, Wikipedia reflects the uneven relations between languages as well as the differences between languages regarding their status, prestige, and resources. But as an online community or as a network of online communities of editors, Wikipedia also deals with linguistic diversity by “doing” multilingualism. As such Wikipedia is a microcosm of global linguistic diversity.

Negotiating Multilingualism

Multilingual users navigate between language versions to find the information they need if no relevant article is published in their first language and to find the most complete article in the languages they can read. But often the purpose is to compare the content of the articles in different language versions, or sometimes to use Wikipedia as a translation tool between the languages they know in order to enlarge their vocabulary (e.g., to check how to translate the name of a disease, a plant, or a specific place, or to find the proper transliteration of a person’s name in another script). This multilingual reading of Wikipedia greatly enriches multilinguals, expanding their mastery of the different languages as well as their understanding of cultural or environmental differences (e.g., the arrival of swallows is associated with spring in French sayings but with summer in Dutch ones, which reflects the location of the core areas of both languages: after the winter, swallows arrive sooner in the south of France than in the Netherlands). It is also a reality check on taken-for-granted assumptions about the global fame of national artists, scientists, or politicians when they seem not worthy of a page in another language.

Studies have shown that editors work across language versions. According to Hale (2014a), 15% of active Wikipedia editors are multilingual and edit in multiple language versions. They sometimes edit in languages they do not master well in order to comply with layout rules or systematize interwiki links. In his studies of cross-language editing, Hale (2014a, b) found a strong negative correlation between the size of the group of users primarily editing a language edition and the percentage of multilingual editors. In other words, he found a higher level of multilingualism among smaller language editions and a lower level among the larger ones, with the Japanese Wikipedia being the most monolingual. Hara and Doney (2015) compared English and Japanese editors editing articles about Okinawa and found differences in content and interaction style (also between the interventions in the two languages by bilingual editors). Kim et al. (2016) analyzed the content written by multilingual editors in the English, German, and Spanish Wikipedias and found differences in the characteristics of the editors, the policies they adopted, and their behaviors. The English Wikipedia has the largest and most varied group of multilingual editors in terms of primary language (only 33% are primary users of English). Editors whose primary language was Spanish or German made more complex edits than those who edited in these languages as their second (or third or fourth...) language. By contrast, editors working in the English Wikipedia as their second (or third or fourth...) language made edits as complex as those of primary users of English. This suggests that English-primary multilinguals working in other editions were editing interwiki links, adding illustrations, standardizing layout, etc., while multilinguals working in English contribute to the contents. These findings stress the role of the English Wikipedia as a common source for a global community. Content is provided and edited by a linguistically diverse population.

Many studies have used Wikipedias as a source of information about cross-language and cross-cultural differences and to compare Wikipedias and their content. Despite the many interwiki links, it is very common that articles have no counterparts in another edition. For example, local politicians, artists, or scientists might have a page only in the main language of their country of origin. By contrast, Donald Trump (since his election as president of the USA), Wolfgang Amadeus Mozart, and Albert Einstein are likely to have an article in a very large number of Wikipedias. Likewise, the USA, the United Nations, Paris (France), and the Olympic Games are likely to be well covered in most languages, while Moresnet, the Indian Ocean Tuna Commission, the small community of Paris, Iowa, or the Dutch Championship Frisian handball will generate far fewer entries.

Academic studies have systematically compared the coverage of the main Wikipedias in terms of topics. They report very moderate overlap, even between languages with a similar cultural background and a very active community of editors (e.g., only 51% overlap between the English and the German editions, according to Hecht and Gergle 2010). A large number of articles are specific to a single Wikipedia (75% according to Hecht and Gergle 2010). Even smaller editions contain specific knowledge and are centered on the place where the language is used. Other studies have ranked the people who are most covered (in the most languages), most networked (through links to other articles), and most consulted by users. The Swedish botanist Carl Linnaeus scores high, for example, because in most languages most articles, including those on botanical classification, refer to an article about him.
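To make the notion of overlap concrete, the following Python sketch computes the share of concepts in one edition that also have an article in another. It is a toy illustration, not the measure used by Hecht and Gergle (2010), whose exact definition is not given here; the function name `overlap_share` and the two small concept sets are assumptions made for the example.

```python
def overlap_share(edition_a: set[str], edition_b: set[str]) -> float:
    """Share of the concepts covered in edition_a that are also covered in edition_b."""
    if not edition_a:
        return 0.0
    return len(edition_a & edition_b) / len(edition_a)


# Hypothetical concept sets (concepts identified by a shared label, as if
# linked through interwiki links); not real coverage data.
english = {"Paris", "Mozart", "Moresnet", "Logan, Utah"}
german = {"Paris", "Mozart", "Mikkeli"}

print(overlap_share(english, german))  # share of English concepts also found in German
print(overlap_share(german, english))  # the measure is not symmetric
```

Because the denominator is the size of the first edition, the result depends on the direction of the comparison, which is one reason why reported overlap figures need to state how they were computed.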

Samoilenko et al. (2016) carried out a cluster analysis based on co-editing among 110 Wikipedias, clustering languages that share a large number of concepts, that is, articles about the same concept that are linked together by an interwiki link. The resulting map (Fig. 2 on p. 8, to be found in the open access article at https://epjdatascience.springeropen.com/track/pdf/10.1140/epjds/s13688-016-0070-8) shows 23 clusters, based on linguistic distance (Romance languages) but also on geographical proximity (e.g., Hungarian, Czech, Slovak, Romanian, and Esperanto; Japanese, Korean, Chinese, and Thai; or Scandinavian languages and Finnish, despite important linguistic differences among them). Likewise, different patterns of controversies have been studied (Apic et al. 2011; Yasseri et al. 2014), identifying clusters of languages in the samples on the basis of shared contested topics.

Creating a New Wikipedia: How to Make Your Language Count in the World

As a community, the editors of Wikipedia also have to make decisions about the creation of new Wikipedias. The procedure they have developed, for what it is worth, is important since Wikipedians function as gatekeepers. The language proposal policy was codified on the basis of a proposal made in June 2006 (which can still be consulted at https://meta.wikimedia.org/wiki/Language_proposal_policy), in response to the sharp increase in the number of language versions (already 250 by then). A language committee was established (presently 14 members), and it published resources to make the procedure more transparent, with handbooks for those requesting a new language and for the editors who want to participate in the decision-making process (see details about the members of the committee and the handbooks at https://meta.wikimedia.org/wiki/Language_committee).

The specific steps to be followed when filing a request for a new language are specified:

• Check that the project does not already exist.
• Obtain an ISO 639 code (for the language name).
• Ensure the requested language is sufficiently unique that it could not exist on a more general wiki.
• Ensure that there are a sufficient number of native editors of that language to merit an edition in that language.

(see https://meta.wikimedia.org/wiki/Requests_for_new_languages)

The language community needs to develop an active test project and ensure it remains active until approval (this is checked by a bot; the threshold is three active editors, i.e., editors having committed at least one edit in the past 30 days). MediaWiki interface translations are also required (to implement the architecture of the website and the interwiki links), in a process called localization.
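The activity criterion described above (three editors with at least one edit in the past 30 days) can be sketched as follows. This is not the bot actually used by the language committee; the function names, the input format (a mapping from editor name to the date of that editor's latest edit), and the sample dates are all assumptions made for illustration.

```python
from datetime import date, timedelta


def active_editors(last_edit_by_editor: dict[str, date], today: date,
                   window_days: int = 30) -> int:
    """Count editors whose latest edit falls within the last `window_days` days."""
    cutoff = today - timedelta(days=window_days)
    return sum(1 for last_edit in last_edit_by_editor.values() if last_edit >= cutoff)


def meets_activity_threshold(last_edit_by_editor: dict[str, date], today: date,
                             threshold: int = 3) -> bool:
    """True if the test project has at least `threshold` active editors."""
    return active_editors(last_edit_by_editor, today) >= threshold


# Hypothetical test project with four editors, one of them inactive.
today = date(2018, 8, 31)
last_edits = {
    "editor_a": date(2018, 8, 30),
    "editor_b": date(2018, 8, 5),
    "editor_c": date(2018, 8, 1),
    "editor_d": date(2018, 6, 1),
}
print(active_editors(last_edits, today), meets_activity_threshold(last_edits, today))
```

The threshold is a parameter so that a stricter requirement can be checked with the same function.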

The language code (a valid ISO 639-1 or 639-3 language code like en for English) should be accompanied by the name of the language in English and in the language itself (as it will eventually appear on the list of available languages for the interwiki links). The URL is modelled after the language code and reads like langcode.wikiproject.org and eventually, if accepted, langcode.wikipedia.org.
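The URL pattern can be illustrated with a short Python sketch. The check below only tests the shape of a code (two lowercase letters for ISO 639-1, three for ISO 639-3); it does not verify that the code is actually registered, and the function name `edition_url` is hypothetical.

```python
import re


def edition_url(langcode: str) -> str:
    """Build the address of an accepted edition from a two- or three-letter code."""
    # Shape check only: this does not consult the ISO 639 registry.
    if not re.fullmatch(r"[a-z]{2,3}", langcode):
        raise ValueError(f"not shaped like an ISO 639-1 or 639-3 code: {langcode!r}")
    return f"https://{langcode}.wikipedia.org"


print(edition_url("en"))
print(edition_url("war"))
```

Codes that do not follow the ISO shape, such as the one used for Simple English, would be refused by this check, which mirrors the procedural objections raised against languages without an ISO code.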

If there is no valid ISO 639 code available, the volunteers must obtain one, i.e., convince the standards organization to create an ISO 639 code. The evolution of the global regime of language recognition has been analyzed by Tomasz Kamusella, who situates its origins in the Summer Institute of Linguistics (SIL), established in 1934 in the USA and closely connected to the Wycliffe Bible Translators (Kamusella 2012, p. 71), and in their need to assess and classify the indigenous languages to whose speakers the society wanted to bring the scripture. In 1946 it merged into the United Bible Societies (UBS), which by 2010 had made the New Testament available in 1231 languages (Kamusella 2012, p. 71). With computer storage revolutionizing libraries in the 1960s, classification was later driven by the Library of Congress and other national libraries in need of a standard to categorize their holdings in a systematic way (Kamusella 2012, p. 63). In 1967 the International Organization for Standardization (ISO) came up with the ISO 639 standard covering the 184 main languages with two-letter codes. Eventually the Internet and its standards extended the possibilities to include languages, and the ISO 639-3 codes no longer rely on Romanization but on Unicode, an Internet standard that supports over 600 languages in about 160 scripts (Kamusella 2012, p. 65).


Invented language codes have been allowed occasionally (for 13 Wikipedias at the moment), mostly because standard language codes were not available (e.g., for Simple English, Ripuarian, or Dutch Low Saxon), but in some cases an official or new code was created after the Wikipedia was set up (e.g., for Samogitian, Norman, Cantonese, or Classical Chinese).

Each proposal is discussed online by proponents and opponents of the new Wikipedia, and arguments are filed (often in English or with an English translation). “The project will be assessed on its linguistic merits and chances of flourishing.” Discussions are public and can be retrieved via the same webpage (https://meta.wikimedia.org/wiki/Requests_for_new_languages). They might be very procedural, that is, participants tend to oppose languages with no ISO code, taking the existence of a code as an indicator of the singularity and notability of any language. Others oppose this rigid interpretation but suggest that the proponents seek to convince ISO first or that exceptions be allowed.

Other arguments revolve around the singularity of the language and its relations with other languages. When the proposed version is commonly perceived as a local variety (e.g., when Valencian is conceived as a variety of Catalan or Cajun French as a variety of French), a new Wikipedia is opposed, but arguments to prove similarities and differences can be linguistic (differences or similarities in vocabulary or syntax) or political (in the case of Valencian, the reference to the Valencian Autonomous Community, as distinct from the Catalan one, in Spain). Although political arguments are not admissible, since Wikipedia is meant to serve individuals, not political communities, the political status of a language does intervene as it is closely linked to different cultural practices. For example, different political and social institutions generate different vocabularies between Belgium and the Netherlands or between the USA, Canada, and the UK.

With regional languages, the discussions often revolve around the respective advantages of having one Wikipedia for one language family or separate Wikipedias for more local, and by definition smaller, languages. For example, the proposal for a Wikipedia in Gronings (a Dutch Low Saxon variety) was opposed as detrimental to the existing Dutch Low Saxon Wikipedia (which includes Gronings), despite the fact that Gronings had a specific ISO code and is often acknowledged as a specific language; opponents emphasized that using specific dialects was allowed in that Wikipedia. The existing Wikipedia was therefore seen as a place to cultivate intercomprehension between the different dialects. By contrast, the existence of a specific Dutch Low Saxon Wikipedia had been justified, despite the absence of an ISO code for it and despite the dialect continuum among Low Saxon dialects across the border between the Netherlands and Germany, by the fact that speakers on each side of the state border use the national language (Dutch and German, respectively) to write in their regional language and to create new words such as cell phone (see also Van Dijk 2009 on these languages).

The number of living native speakers should be sufficient to form “a viable community and audience.” Nevertheless, Wikipedias have been proposed for historical languages such as Latin and for artificial languages such as Esperanto, and they can be approved (and maintained) if “a reasonable degree of recognition as determined by discussion” (the phrasing is from the policy, see https://meta.wikimedia.org/wiki/Language_proposal_policy) is demonstrated, according to the language committee. There are Wikipedias in Esperanto, Ido, Interlingua, Interlingue, Lojban, Volapük, and Novial, all constructed languages with a sizeable community of users, but Toki Pona has been removed for lacking any societal grounding.

If a language is verified as eligible, the project can be developed in the incubator (“at least five editors must edit that language regularly before a test project will be considered successful”), and eventually the Wikipedia can be published (or closed if there is little or no activity). To develop the project, a list has been consolidated of a thousand articles every Wikipedia should have, pertaining to basic content regarding biographies, history, geography, society, culture, science, technology, and mathematics. When the test wiki is finally approved, all pages developed in the test wiki are transferred to the actual Wikipedia.

Over time, ten of the smallest Wikipedias have been closed due to inactivity. Some were sent back to the incubator, where the community can develop them further. The list also mentions Wikipedias that have been deprecated (Alsatian has been extended to Alemannic; Akan is now considered a family of languages, and one Wikipedia is now in the Twi language, one of the Akan languages). Some wikis (Toki Pona and Klingon, the first an artificial language, the other a fictional one) do not belong to the Wikipedia family anymore; they have been removed and are now hosted by wikia.com, as are a few rejected proposals (Lingua Franca Nova, Korean Hanja, and Prussian). Moldovan was closed because it was found to be a version of Romanian written in Cyrillic (even according to the 1989 Language Law of Moldova), and software exists to navigate between the two scripts. Finally, one language has been deleted: the infamous case of the so-called Siberian language in 2006, a terrible embarrassment for the community (although it was created before the language policy was put in place following a proposal in June 2006, the very name given to that language should have alarmed editors); it was a fantasy language created by Yaroslav Zolotaryov, a Siberian separatist with a misogynistic, xenophobic, and anti-Semitic political agenda.

Disputes About Languages

At first sight the criteria formalized in the language policy seem rather straightforward, but they are not. Even the number of editors is disputable, and having a Wikipedia can be a strategic step toward the revitalization of a language. A rather small group might start a virtuous circle mobilizing a larger number of participants. By contrast, an enthusiastic group might quickly become exhausted and demotivated and stop writing articles.

Two other criteria are even trickier and might generate some discussion in specific cases. Mutual intelligibility is not a technical matter: it greatly depends on attitude and exposure, because linguistic differences echo power relations. The name given to a language (its recognition as a language as opposed to being a variety, a dialect, a sociolect, or a regiolect of another language) is politically motivated. Small differences in language use (pronunciation, vocabulary, syntax, idioms, etc.) pertain to linguistic norms and cultural differences between social groups, geographically bounded or not. Cultural differences are easier to tackle than linguistic ones. An article can provide information about different practices in different places in different sections and make them visible with subheadings. For example, an article on academic degrees or on municipal institutions in English will discuss the matter in general terms before presenting the practices and associated concepts in different societal contexts, typically the USA, Canada, the UK, Ireland, and Australia.

By contrast, issues regarding the linguistic code itself are less easy to solve in the article itself. This is especially true when the language is monocentric, with a clear hierarchy between different varieties, such as French, where editors writing in another variety will be “corrected” and see their language framed as incorrect and improper for Wikipedia purposes. In English, similar problems arise for varieties beyond British and American English (but even between them regarding vocabulary and spelling, e.g., colour/color), especially postcolonial World Englishes (Kachru 1996) such as Indian, Nigerian, or Singaporean varieties. The style rules specify that consistency within an article should be achieved. Furthermore, the preferred variety depends on the content: for localized topics the spelling of the local variety is favored, and for general topics the spelling and vocabulary most commonly used (generally meaning the variety used by the largest group of speakers, i.e., American English, French of France, Brazilian Portuguese, German Standard German, etc.).

The hegemony of the core users of a language applies both to the decision to add an article on a specific topic (which can be considered not worthy of an entry by editors who come from the core) and to the syntax and the vocabulary used in an entry. Notable disputes arise around the use of measurements, especially in the English Wikipedia: most editors favor the metric system as a global scientific system, but others argue for American imperial units since the metric system mystifies most American readers, the largest group of English speakers. Again, the global status of the English edition is the cause of the debates. The importance of a topic can also be very local and rejected by a wider community. There was, for example, a controversy surrounding an article for the English Wikipedia about Makmende, a Kenyan fictional superhero revived in a video that went viral in 2009 but that, nevertheless, was not seen as noteworthy enough for an entry by other English-speaking editors (see Ford 2011). The entry has since been stabilized (see https://en.wikipedia.org/wiki/Makmende).

The naming of the language is particularly sensitive. The existence of a Wikipedia can become important in political struggles as an endorsement of claims to autonomy. Political issues are particularly significant when nationalist separatist or irredentist movements are mobilizing around language distinction. Most notably, controversies emerged around the disintegration of Yugoslavia and the process of distinction that was strategically devised among the languages of the successor states of Yugoslavia: Croatian and Bosnian as opposed to Serbian. Montenegrin was rejected four times by Wikipedia’s language committee, and a new discussion was started in 2018 after an ISO code was attributed to the language. The Serbo-Croatian edition was revived in 2005, after some discussion, by proponents of maintaining the language and a Wikipedia version that promotes intercommunication between Wikipedians in the successor states of Yugoslavia and intercomprehension between speakers of Serbian, Croatian, Bosnian, and Montenegrin, in order to resist national language policies promoting further distinction between them for political reasons. Discussions about these languages show that some strongly oppose linguistic diversity within the Serbo-Croatian Wikipedia (e.g., the use of different dialects and, foremost, of both Latin and Cyrillic scripts; Serbian is more and more associated exclusively with the latter, which was not the case in the past). In general, code mixing seems to generate opposition. The Norwegian Wikipedia originally accepted articles in both official languages, Bokmål and Nynorsk, but the users of Nynorsk felt marginalized, and in 2004 they split. There are now two Norwegian Wikipedias, one in Bokmål (no.wikipedia.org) and a much smaller one in Nynorsk (nn.wikipedia.org) (see also Van Dijk 2009, p. 243). Egyptian Arabic has its own Wikipedia, next to Arabic, and that has been disputed too, for fear of fragmentation of the latter if other varieties of Arabic were also granted a separate version (Moroccan and Algerian have been verified as eligible, but a proposal for South Levantine Arabic, a Lebanese variety written in Latin script, has been rejected).

Next to disputes about whether or not to create a new Wikipedia, debates about proposals for closure are also worth considering. An overview and more information about closure proposals (and sometimes removal proposals) are available at https://meta.wikimedia.org/wiki/Proposals_for_closing_projects. The arguments put forward by Wikipedians to convince each other and move toward a consensus are insightful and echo political disputes about languages and linguistic identities in society at large. They also provide information about the workings of Wikipedias since they reflect upon existing editing practices.

One dispute is slightly different, but particularly insightful regarding global linguistic diversity. It pertains to Simple English, a Wikipedia written in controlled English. With over 100,000 articles, it is a medium-sized Wikipedia. The Simple English Wikipedia is meant for “people with different needs, such as students, children, adults with learning difficulties, and people who are trying to learn English” (as phrased on https://en.wikipedia.org/wiki/Simple_English_Wikipedia). Editors use simpler vocabulary and shorter sentences than in the regular English version, and stick to commonly accepted facts (Dowling 2008; see also Yasseri et al. 2012).

In 2018 a third attempt to close Simple English led to a long discussion (54 printed pages) that shows the divide between Wikipedians. Arguments for closure include the fact that Simple English is not a separate language and that the Wikipedia should be merged with the English Wikipedia. Others criticized the poor quality and consistency of its articles and its liability to vandalism. It is also said to divert resources (volunteers’ time) from other Wikipedias and more notably from efforts to write in plain English in the regular English Wikipedia. Its raison d’être is contested: simplification is seen as contradictory to encyclopedic comprehensiveness. Its effectiveness is also contested: it does not reach the intended audience, that is, students and language learners do not know about it. Therefore, some want to close it altogether; others want to add more visible tabs on English articles to improve the interconnection between comprehensive and simple articles about the same topic. But then again, others want to prevent any association between English and Simple English for fear that what they perceive as the poor quality of the latter might taint the reputation of the former.

Clearly Simple English is experienced in very different ways. Foreign learners write that they want to read the real thing and to learn English, not some simplified language. On the other hand, some editors claim to read Simple English articles in technical domains they do not master, and some non-native speakers of English report that they dare to write and edit an article for Simple English, but not for the English Wikipedia. Finally, the exceptional status of Simple English is resented, as has been noted a number of times, and proposals to create Wikipedias in simplified versions of other languages have been rejected (on the grounds that they were not separate languages and had no language community). But, again, Simple English was created before these rules were formulated and adopted. All in all, the proposal to close Simple English was rejected on August 1, 2018. But the discussion is likely to be raised again in the near future.

Finally, and in a similar vein, there is the question of whether or not to have several Wikipedias in the same language, but this time using different scripts. A fascinating ongoing discussion concerns the creation of a Chinese Wikipedia in Pinyin (in the Latin alphabet instead of ideograms). Originally, there were virtually two Chinese Wikipedias under the names of “zh” (or “zh-cn”) and “zh-tw,” but since 2005 this division has been made redundant by the availability of an automatic system to convert between traditional and simplified Chinese. Like Simple English, a Pinyin Wikipedia could be helpful for learners of Chinese, either children or foreigners. The matter is at the time of writing still under deliberation.

Wikipedia as a Motor

A last aspect to consider is whether Wikipedia can bring about changes in a language offline. As we have seen, creating a Wikipedia version is a demanding process, and it could well be seen more and more as an up-to-date criterion for measuring the vitality of a language and as such become an important aspect of sociolinguistics and language policies.

The complicated impact of Wikipedia (and more generally speaking Web 2.0) on “weaker” languages, i.e., languages without much institutional support, needs more scrutiny. On the one hand, it offers an incredible opportunity (low-cost production and circulation of content), while on the other it generates a huge pressure on volunteers to standardize and harmonize their languages. Typically, issues concern local varieties, spelling, and vocabulary, especially when neologisms are needed (which also contradicts Wikipedia’s core principle regarding “no new knowledge”). Moreover, Wikipedians are not necessarily well connected to established language activists and their institutions.


Baxter (2009) examined the process in more detail for the Breton language, showing how the Breton Wikipedia evolves from being a terminology consumer to being a terminology provider. Disputes revolve around the rejection of French calques (as a result of some kind of inferiority complex) and alternatives such as the use of international calques for the nativization of new words (but then again which one? E.g., Latvia, following the name of the country in Latvian, English, and Slavic languages, or Letonia, following German, Dutch, Scandinavian, and Romance languages; cited in Baxter 2009, p. 70), borrowing from other Celtic languages (Welsh especially), as well as different shades of purism (reformist purism, elitist purism, xenophobic purism) (see Baxter 2009, pp. 71–72) and a preference for continuity with earlier translations (especially to serve children enrolled in Breton language schools). By its sheer volume, the Breton Wikipedia is the biggest corpus in Breton today and the only encyclopedia in Breton, and, therefore, it is bound to be influential in the evolution of the language, especially since in this case no reliable sources like established dictionaries and encyclopedias are available; Wikipedia is, unavoidably, itself becoming such a source of language standardization.

Regional languages lacking full-fledged state support are only one kind of lesser-resourced languages greatly affected by Wikipedia. Van Dijk (2009) compares them with third world languages (sic!), by which he means poorly institutionalized languages from the Global South; his analysis does not include Portuguese or Spanish, but he deals with Bahasa Indonesia, Arabic, and Swahili. At the time, poor access to the Internet, lack of software standards (for a long time ASCII did not deal properly with Arabic characters), and censorship were likely explanations for their poor visibility in Wikipedia. Van Dijk also notes differences between them using local geographical knowledge as an indicator: the Swahili Wikipedia covers cities in the region much less than the English Wikipedia, the situation is slightly more balanced for Arabic, and Indonesian has more regional city content than the English Wikipedia. This suggests different roles for English in these regional contexts.

More generally, Van Dijk (2009) signals interesting absences that relate to the different repertoires of multilingual speakers. For example, he observes that the Afrikaans Wikipedia is relatively small, most likely because Afrikaners do not feel the urge to create it as they routinely consult the English Wikipedia instead. By contrast, the Luxembourgian Wikipedia is much larger than expected despite the diglossic situation in Luxembourg and the role of the umbrella language German as the standard language. But then again, Swiss Germans seem to have no incentive to contribute to an Alemannic Wikipedia, and they happily consult the German Wikipedia, since High German is the proper language for an encyclopedia (Van Dijk 2009, p. 246). These examples show that Wikipedia can be both a tool of language revitalization and a tool of further marginalization. This not only influences the balance between minority and majority languages in regional contexts (e.g., Breton and French in Brittany) or between low and high varieties (e.g., Luxembourgian and German in Luxembourg). It can also affect the balance between national languages and English as a global language. This phenomenon has been signaled very strongly for Icelandic, which has been threatened by digital extinction in the age of the English Internet (see Henley 2018). Wikipedia contributes to this development even if there is an Icelandic Wikipedia, which presently (summer 2018) features fewer articles than the Luxembourgian one (see https://stats.wikimedia.org/EN/Sitemap.htm).

Last but not least, Wikipedia is producing new bridges between languages. The interwiki links allow for smooth crossing between any pair of languages (as long as they feature articles identified by editors and bots as dealing with the same topic). This is important because it undermines the hub function English has in the world of translation (both for literary and scientific publications), which has been identified as the hypercentral function of English (De Swaan 2001). In Wikipedia a user can develop her or his bilingualism and knowledge through navigation between, let us say, Finnish and Maltese, German and Russian, Chinese and Japanese, or Spanish and Cebuano, without necessarily passing through English. This “crossing” shows that Wikipedia can be a tool for the promotion of English on the one hand (through the role of English as a language of communication among editors of different Wikipedias and through the wide use of the English Wikipedia), while on the other hand it offers tools to counterbalance the hegemonic position of English.

Conclusion

Wikipedia, as an interlinked set of monolingual Wikipedias, shows a particularly sustained engagement with linguistic diversity. It is both a mirror and a motor of global linguistic diversity, which is represented in all its complexity, including difficult power relations between language groups. It has developed complex policies and mechanisms to regulate the creation of new Wikipedias and at the same time contributes to the changing world language map through the many bridges it can create between languages. The role of multilingual editors is particularly important in shaping the relations between Wikipedias. It is, however, dependent on local contingencies whether the tool adds to the pressure of the growing use of English on the Internet or whether it provides opportunities to revitalize smaller languages. Likewise, the special position of the English Wikipedia as a global encyclopedia serving both an English-speaking monolingual audience in the hegemonic power and a global, multilingual audience is noteworthy.

References

Apic, G., Betts, M., & Russell, R. (2011). Content disputes in Wikipedia reflect geopolitical instability. PLoS One, 6, 1–5.

Bao, P., Hecht, B., Carton, S., Quaderi, M., Horn, M., & Gergle, D. (2012). Omnipedia: Bridging the Wikipedia language gap. In CHI-12. Austin.

Baxter, R. N. (2009). New technologies and terminological pressure in lesser-used languages: The Breton Wikipedia, from terminology consumer to potential terminology provider. Language Problems & Language Planning, 33, 60–80.

van Dijk, Z. (2009). Wikipedia and lesser-resourced languages. Language Problems & Language Planning, 33, 234–250.


Dowling, T. (2008, January 15). Wikipedia too long-winded for you? Try the simple version. The Guardian.

Ensslin, A. (2011). “What an un-wiki way of doing things” Wikipedia’s multilingual policy and metalinguistic practice. Journal of Language and Politics, 10, 535–561.

Ford, H. (2011). The missing Wikipedians. In G. Lovink & N. Tkacz (Eds.), Critical point of view: A Wikipedia reader. Vol. INC reader #7 (pp. 258–269). Amsterdam: Institute of Network Cultures.

Hale, S. A. (2014a). Multilinguals and Wikipedia editing, WebSci’14.

Hale, S. A. (2014b). Cross-language Wikipedia editing of Okinawa, Japan. CHI.

Hara, N., & Doney, J. (2015). Social construction of knowledge in Wikipedia. First Monday 20(6) 1 June 2015.

Hecht, B., & Gergle, D. (2010). The tower of babel meets Web 2.0. In CHI2010 Atlanta.

Henley, J. (2018, February 26). Icelandic language battles threat of ‘digital extinction’, The Guardian.

Kachru, B. B. (1996). World Englishes: Agony and ecstasy. The Journal of Aesthetic Education, 30, 135–155.

Kamusella, T. (2012). The global regime of language recognition. International Journal of the Sociology of Language, 2012(218), 59–86.

Kim, S., Park, S., Hale, S. A., Kim, S., Byun, J., & Oh, A. H. (2016). Understanding editing behaviors in multilingual Wikipedias. PLoS One, 11(5 e0155305), 1–22.

Kopf, S. E. (2018). Debating the European Union transnationally – Wikipedians’ construction of the EU on a Wikipedia talk page (2001–2015), PhD Thesis Lancaster University.

Mamadouh, V. (2018). Do you speak Globish? Geographies of the globalization of English and linguistic diversity. In R. C. Kloosterman, V. Mamadouh, & P. Terhorst (Eds.), Research handbook on the geographies of globalization (pp. 209–221). Cheltenham: Edward Elgar.

Raffestin, C. (1995). Langue et territoire. Autour de la géographie culturelle. In S. Walty & B. Werlen (Eds.), Kulturen und Raum: theoretische Ansätze und empirische Kulturforschung in Indonesien: Festschrift für Professor Albert Leemann. Zürich: Rüegger.

Raffestin, C. (2012). Space, territory, and territoriality. Environment and Planning D: Society and Space, 30(1), 121–141.

Salor, G. E. (2012). Sum of all knowledge: Wikipedia and the encyclopedic urge, PhD Thesis University of Amsterdam.

Samoilenko, A., Karimi, F., Edler, D., Kunegis, J., & Strohmaier, M. (2016). Linguistic neighbourhoods: Explaining cultural borders on Wikipedia through multilingual co-editing activity. EPJ Data Science, 5, 1–20.

de Swaan, A. (2001). Words of the world: The global language system. Cambridge, UK: Polity Press.

Warncke-Wang, M., Uduwage, A., Dong, Z., & Riedl, J. (2012). In search of the Ur-Wikipedia: Universality, similarity, and translation in the Wikipedia inter-language link network. WikiSym’12.

Wikimedia Deutschland e.V. (Ed.). (2011). Alles über Wikipedia, und die Menschen hinter der größten Enzyklopädie der Welt. Hamburg: Hoffmann und Campe.

Yasseri, T., Kornai, A., & Kertész, J. (2012). A practical approach to language complexity: A Wikipedia case study. PLoS One, 7(11), e48386.

Yasseri, T., Spoerri, A., Graham, M., & Kertész, J. (2014). The most controversial topics in Wikipedia: A multilingual and geographical analysis. In P. Fichman & N. Hara (Eds.), Global Wikipedia: International and cross-cultural issues in online collaboration. Lanham: Rowman & Littlefield.


Wikipedia and Other Wikimedia Sites Cited in the Text (Apart from the Language Versions) (All Last Accessed in the Summer of 2018, Unless Mentioned Otherwise)

https://meta.wikimedia.org/wiki/Language_committee
https://en.wikipedia.org/wiki/List_of_Wikipedias
https://en.wikipedia.org/wiki/List_of_Wiktionaries
https://en.wikipedia.org/wiki/Wikimedia_Foundation
https://en.wikipedia.org/wiki/Wikipedia_logo
https://en.wikipedia.org/wiki/Wiktionary
https://es.wikipedia.org/wiki/Marcas_corporativas_de_Wikipedia
https://foundation.wikimedia.org/wiki/Wikimedia_official_marks/About_the_official_Marks#What_characters_are_on_the_Wikipedia_puzzle_globe.3F
https://meta.wikimedia.org/wiki/Language_committee/Handbook_(committee)
https://meta.wikimedia.org/wiki/Language_committee/Handbook_(requesters)
https://meta.wikimedia.org/wiki/List_of_articles_every_Wikipedia_should_have
https://meta.wikimedia.org/wiki/Meta:Babylon
https://meta.wikimedia.org/wiki/Proposals_for_closing_projects/
https://meta.wikimedia.org/wiki/Proposals_for_closing_projects/Closure_of_Cebuano_Wikipedia
https://meta.wikimedia.org/wiki/Proposals_for_closing_projects/Closure_of_Siberian_Wikipedia
https://meta.wikimedia.org/wiki/Proposals_for_closing_projects/Closure_of_Simple_English_Wikipedia_(3)
https://meta.wikimedia.org/wiki/Proposals_for_closing_projects/Closure_of_Yiddish_Wikipedia
https://meta.wikimedia.org/wiki/Proposals_for_closing_projects/Deletion_of_Siberian_Wikipedia
https://meta.wikimedia.org/wiki/Requests_for_new_languages
https://meta.wikimedia.org/wiki/Requests_for_new_languages/Wikipedia_Chinese_(Pinyin)_2
https://meta.wikimedia.org/wiki/Requests_for_new_languages/Wikipedia_Gronings
https://meta.wikimedia.org/wiki/Requests_for_new_languages/Wikipedia_Jazayri
https://meta.wikimedia.org/wiki/Requests_for_new_languages/Wikipedia_Montenegrin_5
https://meta.wikimedia.org/wiki/Requests_for_new_languages/Wikipedia_Moroccan
https://meta.wikimedia.org/wiki/Requests_for_new_languages/Wikipedia_South_Levantine_Arabic
https://meta.wikimedia.org/wiki/Requests_for_new_languages/Wikipedia_Valenciano
https://stats.wikimedia.org/EN/
https://stats.wikimedia.org/wikimedia/animations/growth/AnimationProjectsGrowthWp.html
https://stats.wikimedia.org/wikimedia/animations/wivivi/wivivi.html
https://stats.wikimedia.org/wikimedia/squids/SquidReportEditsPerLanguageBreakdown.htm
https://stats.wikimedia.org/wikimedia/squids/SquidReportPageEditsPerLanguageBreakdown.htm
https://stats.wikimedia.org/wikimedia/squids/SquidReportPageViewsPerLanguageBreakdown.htm
https://wikimediafoundation.org/about/vision/
