Multilinguals and Wikipedia Editing
Scott A. Hale, Oxford Internet Institute
http://www.scotthale.net/pubs/?websci2014
25 June 2014
Scott A. Hale, @computermacgyve Multilinguals and Wikipedia Editing
Background, Motivations
Wikipedia is a global platform covering hundreds of languages, despite evidence of balkanization (Taneja & Wu, in press)
Past studies generally concentrate on one edition (usually English)
Important variations across languages
Content is diverse across languages (Hecht & Gergle, 2010)
Each edition of Wikipedia shows a self-focus bias, with more articles about regions where the language is spoken (Hecht & Gergle, 2009)
Multilingual users may act as unconscious translators bridging language divides (Herring et al., 2007; Eleta & Golbeck, 2012)
Related work
Why edit Wikipedia in a foreign language?
Increased audience size (Crystal, 2003; Zuckerman, 2013)
In a survey in Uzbekistan, Internet users reported accessing content in foreign languages even while simultaneously reporting poor foreign-language skills (Wei & Kolko, 2005)
Editors of many editions of Wikipedia come from a wide variety of time zones, suggesting that bilingual editors are present (Yasseri, Sumi, & Kertesz, 2012)
In a survey of editors, half of all editors reported editing in multiple languages and 72% reported reading more than one language edition of Wikipedia.†
†https://meta.wikimedia.org/w/index.php?title=Editor_Survey_2011/Location_%26_Language&oldid=8409990
Hypotheses
1. Most editors will edit only one language edition
2. Multilingual users will edit different articles than monolingual users
3. When a user edits an article in another language, that same user will usually also edit the corresponding article in his native language
4. Users writing primarily in smaller-sized language editions will be more likely to cross language boundaries than users writing primarily in larger-sized language editions
5. Larger-sized language editions, English chief among them, will be more likely to have contributions from editors of different languages than smaller-sized language editions
Data
All edits to any of the top 46 language editions (all editions with at least 100,000 articles)
Recorded via the IRC stream (code at http://www.scotthale.net/pubs/?websci2014)
32 days (8 July to 9 August 2013)
Edit metadata: datetime, edition, article title, username, size of edit, flags (minor, bot, etc.)
Data cleaning
Non-minor edits by registered, human users to articles
Only edits to main (article) namespace
Removed articles flagged as being created by ‘bots’
Removed anonymous users
Removed undeclared bots and users with only one edit session in the month
Required at least four edits in total and at least two edits to one edition
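The cleaning rules above can be sketched as a filter over edit records. This is a minimal sketch, not the study's code: the `Edit` record, the one-hour inactivity gap used to split edit sessions, and the `bot_created` and `undeclared_bots` inputs are all hypothetical, since the slides do not say how sessions or undeclared bots were identified. "At least two edits to one edition" is read here as "at least two edits in some single edition".

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Edit:
    """Hypothetical edit record; field names are illustrative."""
    user: str
    edition: str        # language edition code, e.g. "en"
    title: str
    namespace: int      # 0 = main (article) namespace
    timestamp: datetime
    minor: bool
    bot: bool
    anonymous: bool


SESSION_GAP = timedelta(hours=1)  # assumed inactivity gap between sessions


def count_sessions(times):
    """Count sessions, starting a new one after a gap longer than SESSION_GAP."""
    times = sorted(times)
    sessions = 1 if times else 0
    for prev, cur in zip(times, times[1:]):
        if cur - prev > SESSION_GAP:
            sessions += 1
    return sessions


def clean(edits, bot_created=frozenset(), undeclared_bots=frozenset()):
    """Apply the cleaning rules; bot_created holds hypothetical (edition, title) pairs."""
    # Edit-level rules: non-minor edits by registered, non-bot users
    # to main-namespace articles that were not created by bots.
    kept = [
        e for e in edits
        if not e.minor and not e.bot and not e.anonymous
        and e.namespace == 0
        and (e.edition, e.title) not in bot_created
        and e.user not in undeclared_bots
    ]
    # User-level rules: more than one session, at least four edits,
    # and at least two edits in some single edition.
    by_user = defaultdict(list)
    for e in kept:
        by_user[e.user].append(e)
    keep = {
        user for user, ues in by_user.items()
        if count_sessions([e.timestamp for e in ues]) > 1
        and len(ues) >= 4
        and max(Counter(e.edition for e in ues).values()) >= 2
    }
    return [e for e in kept if e.user in keep]
```

Edit-level filters run first so that minor or bot edits do not count toward a user's activity thresholds; whether the study applied the rules in this order is not stated.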
Matching users and articles across languages
Look for common usernames across language editions
Check usernames are indeed linked global accounts
Wikidata dump used to match articles across languages
55,568 users with a total of 3,518,955 edits (excluding the Simple English edition)
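A sketch of the cross-language matching, assuming Wikidata's JSON dump layout (one entity object per line inside a top-level array, with `sitelinks` keyed by site IDs such as `enwiki`, each carrying a `title`). The `global_accounts` set is a hypothetical stand-in for whatever check confirmed that a username is a linked global account; unconfirmed usernames are kept apart per edition, and edits to titles with no Wikidata item are dropped. Both choices are assumptions, not the study's documented rules.

```python
import json
from collections import defaultdict


def sitelink_index(dump_lines, editions):
    """Map (edition, title) -> Wikidata item ID from JSON-dump lines."""
    index = {}
    for line in dump_lines:
        line = line.strip().rstrip(",")
        if line in ("", "[", "]"):
            continue  # skip the array brackets wrapping the dump
        entity = json.loads(line)
        sitelinks = entity.get("sitelinks", {})
        for edition in editions:
            # Site IDs are the language code plus "wiki", with "-" written as "_".
            link = sitelinks.get(edition.replace("-", "_") + "wiki")
            if link:
                index[(edition, link["title"])] = entity["id"]
    return index


def items_by_user(edits, index, global_accounts):
    """Group (user, edition, title) edits into {user key: {(edition, item)}}."""
    grouped = defaultdict(set)
    for user, edition, title in edits:
        # Assumption: usernames not confirmed as global accounts stay separate per edition.
        key = user if user in global_accounts else f"{user}@{edition}"
        item = index.get((edition, title))
        if item is not None:  # titles without a Wikidata item are dropped
            grouped[key].add((edition, item))
    return grouped
```

Once every article is keyed by its Wikidata item, "the same article in another language" becomes a simple set comparison on item IDs.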
Data summary
Language      Edits      Articles  Users   NP users  NP edits
English       1,389,647  518,405   27,476  18%       3%
German        256,495    125,647   5,967   18%       2%
French        250,828    106,027   4,549   25%       3%
Spanish       191,934    66,848    4,338   24%       3%
Russian       239,267    92,326    3,961   16%       1%
Japanese      106,848    56,406    3,551   11%       2%
Italian       160,191    69,534    2,919   25%       2%
Chinese       112,888    42,937    2,309   14%       1%
Portuguese    67,505     32,753    1,730   29%       4%
Dutch         80,535     39,463    1,500   33%       3%
Polish        67,038     37,393    1,454   30%       3%
Table: Top language editions. The Users column includes all users who edited the edition during the data collection period. NP users gives the percentage of these users who are non-primary users, i.e., users who edited a different language edition more frequently.
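The NP users column can be derived from (user, edition) edit pairs once each user's primary edition, the one edited most often, is known. A minimal sketch; the slides do not say how ties between equally edited editions were broken, so the alphabetical tie-break below is an assumption.

```python
from collections import Counter, defaultdict


def primary_editions(edits):
    """Map each user to the edition they edited most (ties: alphabetical, assumed)."""
    counts = defaultdict(Counter)
    for user, edition in edits:
        counts[user][edition] += 1
    return {u: min(c, key=lambda ed: (-c[ed], ed)) for u, c in counts.items()}


def edition_summary(edits):
    """Per edition: (number of users, % of them whose primary edition is elsewhere)."""
    primary = primary_editions(edits)
    users = defaultdict(set)
    for user, edition in edits:
        users[edition].add(user)
    return {
        ed: (len(us), 100 * sum(primary[u] != ed for u in us) / len(us))
        for ed, us in users.items()
    }
```

For example, a user with three English edits and one German edit counts toward the German edition's NP users but not toward English.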
Multilinguals vs Monolinguals
15.4% of users (8,544) edited multiple language editions.
Figure: Density plot comparing the number of edits made by monolingual andmultilingual Wikipedia users.
Hypotheses
1. ✓ Most editors will edit only one language edition
2. Multilingual users will edit different articles than monolingual users
3. When a user edits an article in another language, that same user will usually also edit the corresponding article in his native language
4. Users writing primarily in smaller-sized language editions will be more likely to cross language boundaries than users writing primarily in larger-sized language editions
5. Larger-sized language editions, English chief among them, will be more likely to have contributions from editors of different languages than smaller-sized language editions
What do multilinguals edit?
Only 2.6% of edits are from users writing in their non-primary languages. 44% of the articles edited by multilingual users in their non-primary languages were not edited by any monolingual user.
Figure: 2D density plot of the number of multilingual users editing articles in a non-primary language against the number of monolingual users editing the articles.
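Both percentages on this slide follow from the primary-edition definition. A minimal sketch over hypothetical (user, edition, title) triples, treating an article as an (edition, title) pair and a monolingual user as one who edited a single edition; the alphabetical tie-break for the primary edition is again an assumption.

```python
from collections import Counter, defaultdict


def nonprimary_stats(edits):
    """Return (share of edits made in non-primary editions,
    share of non-primary articles that no monolingual user edited)."""
    counts = defaultdict(Counter)
    for user, edition, _ in edits:
        counts[user][edition] += 1
    primary = {u: min(c, key=lambda ed: (-c[ed], ed)) for u, c in counts.items()}
    monolingual = {u for u, c in counts.items() if len(c) == 1}
    # Edits outside a user's primary edition (only multilinguals have these).
    nonprimary = [(u, ed, t) for u, ed, t in edits if ed != primary[u]]
    np_articles = {(ed, t) for _, ed, t in nonprimary}
    mono_articles = {(ed, t) for u, ed, t in edits if u in monolingual}
    edit_share = len(nonprimary) / len(edits)
    untouched = (len(np_articles - mono_articles) / len(np_articles)
                 if np_articles else 0.0)
    return edit_share, untouched
```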
What do multilinguals edit?
Figure: Histogram of the fraction of articles each multilingual user edited in other languages that the user also edited in his or her primary language. The distribution is bimodal: a large number of users did not edit any of the same articles in their primary languages, but a large number of users always edited the same articles in their primary languages.
What do multilinguals edit?
Figure: Histogram of the fraction of articles each multilingual user edited in other languages that the user also edited in his or her primary language, after removing edits to articles that do not exist in the user's primary language.
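The per-user value behind these two histograms can be sketched as follows, assuming articles have already been matched to Wikidata items. For one multilingual user it returns the share of that user's non-primary (edition, item) pairs whose item the user also edited in the primary edition. Passing `exists`, a hypothetical set of (edition, item) pairs taken from Wikidata sitelinks, drops articles with no counterpart in the primary language, mirroring the second histogram.

```python
def same_article_fraction(user_items, primary, exists=None):
    """Share of a user's non-primary (edition, item) pairs whose item the
    user also edited in the primary edition; None if there are none."""
    primary_items = {item for ed, item in user_items if ed == primary}
    others = [(ed, item) for ed, item in user_items if ed != primary]
    if exists is not None:
        # Keep only articles that have a counterpart in the primary edition.
        others = [(ed, item) for ed, item in others if (primary, item) in exists]
    if not others:
        return None
    return sum(item in primary_items for _, item in others) / len(others)
```

Collecting this value for every multilingual user and binning it would give a histogram of the kind shown; a bimodal shape means many users sit at 0 and many at 1.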
Hypotheses
1. ✓ Most editors will edit only one language edition
2. ✓ Multilingual users will edit different articles than monolingual users
3. (mixed) When a user edits an article in another language, that same user will usually also edit the corresponding article in his native language
4. Users writing primarily in smaller-sized language editions will be more likely to cross language boundaries than users writing primarily in larger-sized language editions
5. Larger-sized language editions, English chief among them, will be more likely to have contributions from editors of different languages than smaller-sized language editions
Variations by language
Figure: Scatter plot of language size (number of unique users) against the percentage of users who are multilingual (edit more than one language edition). The three editions with fewer than 10 users in the sample (Uzbek, Cebuano, and Waray-Waray) are omitted.
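One way to compute the points in this scatter plot and summarize their relationship is sketched below. The per-edition share of multilingual users follows the caption (editions with fewer than 10 users omitted); the Spearman rank correlation is an illustrative choice, since the slide does not name a statistic.

```python
from collections import defaultdict


def multilingual_by_edition(edits, min_users=10):
    """Per edition: (unique users, % of them who edit more than one edition)."""
    editions_of = defaultdict(set)
    users_of = defaultdict(set)
    for user, edition in edits:
        editions_of[user].add(edition)
        users_of[edition].add(user)
    result = {}
    for edition, users in users_of.items():
        if len(users) < min_users:
            continue  # the caption omits editions with fewer than 10 users
        multi = sum(len(editions_of[u]) > 1 for u in users)
        result[edition] = (len(users), 100 * multi / len(users))
    return result


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

With `stats = multilingual_by_edition(edits)`, calling `spearman([n for n, _ in stats.values()], [p for _, p in stats.values()])` would give a negative value if smaller editions have proportionally more multilingual users.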
Language crossings
Figure: Co-editing network graph of language editions (nodes: ar, bg, ca, cs, da, de, en, es, fa, fi, fr, he, hu, id, it, ja, ko, nl, no, pl, pt, ro, ru, sv, tr, uk, zh)
Nodes represent language editions
Directed, weighted edges show the log of the number of users primarily editing one language edition who edited another edition
Only edges with weights more than 1.96 standard deviations above the mean are shown
Colors indicate communities found by the Infomap community detection algorithm
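The edge construction described above can be sketched as follows. The logarithm base and whether absent edges count toward the mean and standard deviation are not stated, so natural logs computed over observed edges only are assumptions here, as is the alphabetical tie-break for primary editions; the Infomap community detection step is not reproduced.

```python
import math
from collections import Counter, defaultdict
from statistics import mean, stdev


def coediting_edges(edits, z=1.96):
    """Return {(primary edition, other edition): log(user count)} for edges
    whose weight exceeds mean + z * standard deviation."""
    counts = defaultdict(Counter)
    for user, edition in edits:
        counts[user][edition] += 1
    users_per_edge = Counter()
    for c in counts.values():
        # Primary edition = most-edited edition (alphabetical tie-break assumed).
        primary = min(c, key=lambda ed: (-c[ed], ed))
        for edition in c:
            if edition != primary:
                users_per_edge[(primary, edition)] += 1
    weights = {edge: math.log(n) for edge, n in users_per_edge.items()}
    ws = list(weights.values())
    if len(ws) < 2:
        return weights  # too few edges to estimate a spread
    cutoff = mean(ws) + z * stdev(ws)
    return {edge: w for edge, w in weights.items() if w > cutoff}
```

Taking the log compresses the very large English flows so that the threshold is not set by English alone.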
Language crossings (English removed)
Figure: Co-editing network graph with the English edition removed (nodes: ca, cs, de, es, fr, it, ja, nl, pl, pt, ru, sv, uk, zh). Nodes, edges, the 1.96 standard deviation threshold, and Infomap community colors are defined as in the previous graph.
Hypotheses
1. ✓ Most editors will edit only one language edition
2. ✓ Multilingual users will edit different articles than monolingual users
3. (mixed) When a user edits an article in another language, that same user will usually also edit the corresponding article in his native language
4. ✓ Users writing primarily in smaller-sized language editions will be more likely to cross language boundaries than users writing primarily in larger-sized language editions
5. ✓ Larger-sized language editions, English chief among them, will be more likely to have contributions from editors of different languages than smaller-sized language editions
Simple English
No major changes if the Simple English edition is included
Largest editor overlap is with the English edition
Dedicated group of editors: 45% of editors who edit Simple English most frequently do not edit any other edition (similar to Esperanto)
Comparison with Twitter
Similar percentage of users are multilingual (11% on Twitter)
Similar correlation between activity level and multilingualism
Language size is not correlated with multilingualism on Twitter; some per-language consistencies (Japanese, English) and some variations
Hale, S. A. (2014). Global Connectivity and Multilinguals in the Twitter Network. http://www.scotthale.net/pubs/?chi2014
Implications and future directions
Implications
Multilingual users are found in all editions; multilingualism is correlated with activity
Design for multilingual users (the Universal Language Selector and global accounts already move in this direction)
Important per-language variations
Inverse correlation between multilingual users and self-focus bias as measured by Hecht & Gergle (2009)
Further work
Move from edit metadata to the edit content itself
What types of edits are users making in non-primary languages? Variations by topic/theme? Correlations with link/image overlap?
Viewing vs. editing behavior (survey results show a much higher percentage of users read multiple editions)
I would like to thank Eric T. Meyer, Taha Yasseri, Jonathan Bright, and Mike Thelwall as
well as the anonymous reviewers who provided helpful comments on previous versions of
this research.
Crystal, D. (2003). English as a Global Language (2nd ed.). Cambridge: Cambridge University Press.
Eleta, I., & Golbeck, J. (2012). Bridging Languages in Social Networks: How Multilingual Users of Twitter Connect Language Communities. Proceedings of the American Society for Information Science and Technology, 49(1), 1–4. Available from http://dx.doi.org/10.1002/meet.14504901327
Hale, S. A. (2014). Global Connectivity and Multilinguals in the Twitter Network. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 833–842). New York, NY, USA: ACM. Available from http://doi.acm.org/10.1145/2556288.2557203
Hecht, B., & Gergle, D. (2009). Measuring self-focus bias in community-maintained knowledge repositories. In Proceedings of the Fourth International Conference on Communities and Technologies (pp. 11–20). New York, NY, USA: ACM. Available from http://doi.acm.org/10.1145/1556460.1556463
Hecht, B., & Gergle, D. (2010). The Tower of Babel meets Web 2.0: User-generated content and its applications in a multilingual context. In Proceedings of the 28th International Conference on Human Factors in Computing Systems (pp. 291–300). New York, NY, USA: ACM. Available from http://doi.acm.org/10.1145/1753326.1753370
Herring, S. C., Paolillo, J. C., Ramos-Vielba, I., Kouper, I., Wright, E., Stoerger, S., et al. (2007). Language Networks on LiveJournal. In Proceedings of the 40th Annual Hawaii International Conference on System Sciences. Washington, DC, USA: IEEE Computer Society. Available from http://dx.doi.org/10.1109/HICSS.2007.320
Wei, C. Y., & Kolko, B. E. (2005). Resistance to globalization: Language and Internet diffusion patterns in Uzbekistan. New Review of Hypermedia and Multimedia, 11(2), 205–220.
Yasseri, T., Sumi, R., & Kertesz, J. (2012). Circadian Patterns of Wikipedia Editorial Activity: A Demographic Analysis. PLoS ONE, 7(1), e30091. Available from http://dx.doi.org/10.1371%2Fjournal.pone.0030091
Zuckerman, E. (2013). Rewire: Digital Cosmopolitans in the Age of Connection. London: W. W. Norton & Company.