The University of Innsbruck was founded in 1669 and is one of Austria’s oldest universities. Today, with over 28,000 students and 4,500 staff, it is western Austria’s largest institution of higher education and research. For further information visit: www.uibk.ac.at.
#Wikipedia on Twitter:
Analyzing Tweets about Wikipedia
Eva Zangerle, Georg Schmidhammer, Günther Specht
Research Questions
RQ1: How popular are the various Wikipedias on Twitter and in which language contexts are they referenced?
RQ2: Which features do Wikipedia articles that are popular on Twitter exhibit or share?
RQ3: Does the number of tweets about a certain article correlate with a recent edit, and hence an update, of the page?
Dataset
• Crawl of Twitter using the keyword "wikipedia"
• 2014/10/20 – 2015/03/10
• Total of 4.5 million tweets
• Cleaning of dataset
• Tweets with Wikipedia URL
• Normalization of URLs (also mobile URLs)
• Retweets remain within the set
22% of all Wikipedia article URLs are mobile URLs
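The URL normalization step listed above could look roughly like the following. This is a minimal sketch, not the authors' implementation: the function name `normalize_wikipedia_url` is hypothetical, and it assumes that mobile URLs follow the `<lang>.m.wikipedia.org` host pattern and that scheme and fragment can be ignored when identifying an article.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_wikipedia_url(url: str) -> str:
    """Map a desktop or mobile Wikipedia URL to one canonical desktop form."""
    parts = urlsplit(url.strip())
    labels = parts.netloc.lower().split(".")
    # Assumption: mobile hosts look like "en.m.wikipedia.org"; drop the "m" label.
    if len(labels) == 4 and labels[1] == "m" and labels[2:] == ["wikipedia", "org"]:
        labels.pop(1)
    host = ".".join(labels)
    # Assumption: scheme and fragment do not matter for identifying an article.
    return urlunsplit(("https", host, parts.path, parts.query, ""))


if __name__ == "__main__":
    print(normalize_wikipedia_url("http://en.m.wikipedia.org/wiki/Cod_Wars"))
```

With this normalization, a mobile and a desktop link to the same article collapse to one distinct URL before counting.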
Dataset
Characteristic      Raw          Cleaned
Tweets              4,530,967    2,468,055
Retweets            1,440,122      659,641
Distinct Users      1,730,984      844,975
Mentions            3,334,848    1,880,687
Distinct Hashtags     159,231      118,912
Hashtag Usages      1,528,458      778,737
Distinct URLs       1,447,124    1,121,825
URL Usages          3,393,846    2,793,900
63.24% of all tweets contain 1 URL (maximum: 6 URLs)
77.72% of all URLs point to a Wikipedia page
Tweets per Day
General Observations: Users
• Long-tailed distribution
• Average number of tweets per user: 2.92
• However: maximum number of tweets per user: 64,521
• 19 of 20 most popular users are bots (404 users in total; 264k tweets)
E. Zangerle, G. Schmidhammer, G. Specht: Analysing the Usage of Wikipedia on Twitter: Understanding Inter-Language Links
(accepted at HICSS 2016)
RQ1
Language Analyses
Language Distribution
• Analysis of tweeted Wikipedia articles with regard to language
• Extract Wikipedia edition (language) from URL
Missing: context, underlying data.
Language          Total        Share
English (en)      1,349,623    52.81%
Japanese (ja)       579,157    22.66%
Spanish (es)        140,396     5.49%
Turkish (tr)         78,235     3.06%
French (fr)          64,139     2.51%
German (de)          52,256     2.04%
Russian (ru)         44,347     1.74%
Arabic (ar)          38,757     1.52%
Korean (ko)          27,261     1.07%
Portuguese (pt)      26,442     1.03%
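Extracting the Wikipedia edition from a URL, as described in the bullets above, can be sketched as follows. This is an illustration under assumptions rather than the authors' code: the helper names `wikipedia_edition` and `edition_counts` are hypothetical, and the edition is assumed to be the first host label of a `*.wikipedia.org` URL.

```python
from collections import Counter
from typing import Iterable, Optional
from urllib.parse import urlsplit


def wikipedia_edition(url: str) -> Optional[str]:
    """Return the language code of a Wikipedia URL, or None if it has none."""
    labels = urlsplit(url).netloc.lower().split(".")
    # Assumption: editions are addressed as "<lang>.wikipedia.org" or "<lang>.m.wikipedia.org".
    if len(labels) < 3 or labels[-2:] != ["wikipedia", "org"]:
        return None
    if labels[0] == "www":
        return None  # the portal page carries no language
    return labels[0]


def edition_counts(urls: Iterable[str]) -> Counter:
    """Count tweeted URLs per Wikipedia edition, skipping non-Wikipedia URLs."""
    editions = (wikipedia_edition(u) for u in urls)
    return Counter(e for e in editions if e is not None)
```

Dividing each count by the total then gives shares of the kind reported in the table above.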
Correlation of Language and Wikipedia Size Measures
Measure                     Spearman's ρ
Total number of articles    .76*
Edits                       .65*
Users                       .46*
Admins                      .42*
Active users                .39*
Images                      .39*
Depth (1)                   .35*
* Significant at the 0.001 level
(1) Depth = (Edits / Articles) × (Non-Articles / Articles) × (1 − Stub ratio)
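For readers who want to reproduce this kind of analysis, the following is a minimal, dependency-free sketch of Spearman's ρ (Pearson correlation on average ranks) together with the depth measure from the footnote. The function names are hypothetical, and any input data would have to come from the per-edition Wikipedia statistics, which are not part of these slides.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; assumes equal lengths and non-constant inputs."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def depth(edits: float, articles: float, non_articles: float, stub_ratio: float) -> float:
    """Depth as written in the footnote above."""
    return (edits / articles) * (non_articles / articles) * (1 - stub_ratio)
```

Here `x` would hold the number of tweets per Wikipedia edition and `y` the corresponding size measure; significance testing is left out of the sketch.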
Tweet Languages
Tweets
Language      Share
English       42.90%
Japanese      21.92%
Spanish        5.77%
Arabic         2.56%
French         2.37%
Turkish        2.24%
German         1.75%
Indonesian     1.56%
Russian        1.35%

Wikipedias referenced
Language         Share
English (en)     52.81%
Japanese (ja)    22.66%
Spanish (es)      5.49%
Turkish (tr)      3.06%
French (fr)       2.51%
German (de)       2.04%
Russian (ru)      1.74%
Arabic (ar)       1.52%
Korean (ko)       1.07%
Inter-Language Links
Rows: Twitter language; columns: Wikipedia language
          en      ja      es      ar      fr      tr      de      id      ru      pt
en    97.33%   0.19%   0.42%   0.03%   0.33%   0.05%   0.35%   0.12%   0.10%   0.05%
ja     5.48%  93.56%   0.04%   0.01%   0.11%   0.03%   0.20%   0.01%   0.05%   0.01%
es    19.65%   0.28%  77.48%   0.01%   0.62%   0.03%   0.32%   0.07%   0.03%   0.51%
ar    26.58%   0.02%   0.12%  72.79%   0.17%   0.02%   0.02%   0.00%   0.00%   0.00%
fr    20.21%   0.19%   1.11%   1.92%  74.73%   0.03%   0.73%   0.02%   0.05%   0.17%
tr    20.78%   0.01%   0.17%   0.00%   0.18%  77.62%   0.83%   0.04%   0.10%   0.02%
de    21.15%   0.59%   1.41%   0.06%   0.44%   0.13%  74.94%   0.04%   0.04%   0.06%
id    49.83%   1.20%   1.77%   0.16%   0.60%   0.40%   0.91%  42.84%   0.06%   0.26%
ru    17.74%   0.10%   0.05%   0.00%   0.14%   0.03%   0.32%   0.00%  78.38%   0.01%
pt    28.90%   0.73%   6.91%   0.01%   0.75%   0.05%   0.46%   0.09%   0.03%  60.87%
20% of all tweets link to another language.
85% of all inter-language links do not have a counterpart in the original language.
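A matrix like the one above can be derived from pairs of tweet language and referenced Wikipedia edition. The sketch below is illustrative only: the function names are hypothetical, and it assumes each tweet has already been reduced to one `(tweet_lang, wiki_lang)` pair, for example using Twitter's language tag and the edition taken from the URL.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

Pair = Tuple[str, str]  # (tweet language, Wikipedia edition)


def row_shares(pairs: Iterable[Pair]) -> Dict[str, Dict[str, float]]:
    """Per tweet language, the percentage of links going to each edition."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for tweet_lang, wiki_lang in pairs:
        counts[tweet_lang][wiki_lang] += 1
    return {
        tweet_lang: {w: 100.0 * n / sum(row.values()) for w, n in row.items()}
        for tweet_lang, row in counts.items()
    }


def inter_language_share(pairs: Iterable[Pair]) -> float:
    """Percentage of tweets whose language differs from the linked edition."""
    pairs = list(pairs)
    crossing = sum(1 for tweet_lang, wiki_lang in pairs if tweet_lang != wiki_lang)
    return 100.0 * crossing / len(pairs)
```

Each row of `row_shares` sums to 100 over the editions actually observed for that tweet language.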
Inter-Language Links
• 85% of all links leading to a Wikipedia in a language different from the tweet's language do not have a counterpart in the user's language
• Remaining 15%: the Wikipedia article actually used is of significantly better quality than its counterpart in the tweet's language
RQ2
Top Articles and Categories
Methods
• Tweets about English Wikipedia
• 52.81% of all tweets
• Total of 724,974 references to Wikipedia
• Total of 336,605 distinct English Wikipedia articles
• Extract article titles and categories from DBpedia
• Resolve extended URLs (e.g., diff pages, access to old revisions, etc.)
Distribution: Tweets per Articles
64% of all articles are tweeted only once
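The long-tail figure above is a simple frequency count over article references. As a rough sketch (with hypothetical function names, and assuming each reference has already been resolved to a normalized article title):

```python
from collections import Counter
from typing import Iterable, List, Tuple


def share_tweeted_once(article_refs: Iterable[str]) -> float:
    """Percentage of distinct articles that are referenced exactly once."""
    counts = Counter(article_refs)
    once = sum(1 for n in counts.values() if n == 1)
    return 100.0 * once / len(counts)


def top_articles(article_refs: Iterable[str], k: int) -> List[Tuple[str, int]]:
    """The k most frequently referenced articles with their tweet counts."""
    return Counter(article_refs).most_common(k)
```

Applied to the 724,974 references mentioned under Methods, `share_tweeted_once` would yield the share of articles in the tail and `top_articles` the head of the distribution.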
Top Articles
Article                           No. of Tweets    Share
diff                                     54,432    7.51%
cod_wars                                  6,868    0.95%
user:Giraffedata/comprised_of             4,541    0.63%
matthew_ziff                              2,100    0.29%
kidz_bop                                  2,015    0.28%
gamergate                                 1,703    0.23%
old_revision                              1,517    0.21%
search                                    1,383    0.19%
the_little_mermaid_(1989_film)            1,370    0.19%
No article stands out in particular.
Top Categories
Category                                               No. of Tweets    Share
Living people                                                105,895   14.61%
English-language films                                        18,331    2.53%
American films                                                 9,605    1.32%
Wars involving the United Kingdom                              7,487    1.03%
American male television actors                                7,255    1.00%
20th-century conflicts                                         7,158    0.99%
American male film actors                                      6,981    0.96%
20th-century military history of the United Kingdom            6,968    0.96%
Law of the sea                                                 6,953    0.96%
Wars involving Iceland                                         6,928    0.96%
RQ3
Edits and Tweets
Methods
• Crawled via MediaWiki API
• Tweets about English Wikipedia articles (724,974 references to 336,605 distinct articles)
• Observation period: +/- 24 hours of a tweet
• 543,788 edits in total
• 91,577 edits marked as minor
• 312,160 tweets link to an article edited within +/- 24 hours of tweet
• 233,962 tweets: edit occurred before tweet
• 215,192 tweets: edit occurred after tweet
• No correlation between number of edits and number of tweets: Pearson's r = 0.06 (at the 0.001 significance level)
• Exception: events
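The +/- 24 hour observation window described above could be evaluated per tweet along the following lines. This is a sketch under assumptions, not the method used in the study: the function name `edits_around` is hypothetical, edit timestamps for the linked article are assumed to be available as a sorted list, and an edit at exactly the tweet's timestamp is counted as neither before nor after.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime, timedelta
from typing import List, Tuple

WINDOW = timedelta(hours=24)  # observation period on each side of the tweet


def edits_around(tweet_time: datetime, edit_times: List[datetime]) -> Tuple[int, int]:
    """Count edits within 24 hours before and after a tweet.

    `edit_times` must be sorted in ascending order.
    """
    start = bisect_left(edit_times, tweet_time - WINDOW)
    before_end = bisect_left(edit_times, tweet_time)
    after_start = bisect_right(edit_times, tweet_time)
    end = bisect_right(edit_times, tweet_time + WINDOW)
    return before_end - start, end - after_start


if __name__ == "__main__":
    tweet = datetime(2015, 1, 1, 12, 0)
    edits = [datetime(2015, 1, 1, 11, 0), datetime(2015, 1, 2, 11, 0)]
    print(edits_around(tweet, edits))
```

A tweet counts as linking to a recently edited article if either returned number is greater than zero; both can be positive for the same tweet.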
Conclusion
RQ1: 20% of all tweets link to a Wikipedia of another language.
RQ2: No particular categories or articles are significantly more popular on Twitter. Long-tail distribution for articles (64% of all English articles are tweeted only once).
RQ3: No correlation between the number of edits and the popularity of an article on Twitter can be detected.
Future Work
• Look into inter-language links
• Tweets as quality measure
• Look into those tweets about Wikipedia that do not mention a particular article (qualitatively)
• Interested in joining forces?
#questions? http://en.wikipedia.org/wiki/Q&A #wikipedia
@eva_zangerle
http://www.evazangerle.at
@dbisibk
http://dbis-informatik.uibk.ac.at
https://www.facebook.com/dbisibk