New Methodologies for Capturing and Working with Publicly Available Twitter Data

Date post: 10-May-2015
Paper presented at the Association of Internet Researchers conference, Salford, 19-21 Oct. 2012.

Associate Professor Axel Bruns
@snurb_dot_info
http://mappingonlinepublics.net/
Queensland University of Technology

WHY TWITTER?

• Researching Twitter:
  – Significant world-wide social network
  – ~500 million accounts (but how many active?)
  – Varied range of uses: from phatic communication to emergency coordination
  – Healthy third-party ecosystem (for now)
  – Strong history of user innovation: @replies, #hashtags
  – Flat and open network structure: non-reciprocal following, public profiles by default
  – Good API for gathering (big) data for research

NEW MEDIA AND PUBLIC COMMUNICATION: MAPPING AUSTRALIAN USER-CREATED CONTENT IN ONLINE SOCIAL NETWORKS

• Australian Research Council (ARC) Discovery Project (2010-13) – $410,000
  – QUT (Brisbane), Sociomantic Labs (Berlin)
  – First comprehensive study of Australian social media use
  – Computer-assisted cultural analysis: tracking, mapping, analysing blogs, Twitter, Flickr, YouTube as ‘networked publics’
  – Addressing the problem of scale (‘Big Data’) and disciplinary change in media, cultural and communication studies – natively digital methods
  – Studying society with the Internet (Richard Rogers)

http://mappingonlinepublics.net/

A TWITTER RESEARCH TOOLKIT

• Data Gathering
  – yourTwapperkeeper + in-house crawler
• Data Processing
  – Gawk – open source, multiplatform, programmable command-line tool for processing CSV documents
• Textual Analysis
  – Leximancer – commercial, multiplatform: extracts key concepts from large corpora of text, examines and visualises concept co-occurrence
  – WordStat – commercial, PC-only text analysis tool; generates concept co-occurrence data that can be exported for visualisation
• Visualisation
  – Gephi – open source, multiplatform network visualisation tool
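
The hand-off from data processing to visualisation in this toolkit can be sketched briefly. The slides name Gawk for the processing step; the sketch below uses Python instead, purely for illustration. It assumes the tweet archive is a CSV file with `from_user` and `text` columns, counts @mentions between accounts, and writes a `Source,Target,Weight` edge list of the kind Gephi can import as an edge table. The function names, the mention pattern, and the sample rows are all hypothetical.

```python
import csv
import io
import re
from collections import Counter

# Assumed pattern for screen names: 1-15 letters, digits or underscores.
MENTION = re.compile(r"@(\w{1,15})")

def mention_edges(rows):
    """Count sender -> mentioned-user pairs across an iterable of tweet dicts."""
    edges = Counter()
    for row in rows:
        sender = row["from_user"].lower()
        for target in MENTION.findall(row["text"]):
            edges[(sender, target.lower())] += 1
    return edges

def write_edge_list(edges, out):
    """Write a Source,Target,Weight CSV for import as an edge table."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(edges.items()):
        writer.writerow([source, target, weight])

# Hypothetical sample rows standing in for a real archive export.
SAMPLE = io.StringIO(
    "from_user,text\n"
    "alice,@bob roads closed near the river #qldfloods\n"
    "carol,RT @alice: @bob roads closed near the river #qldfloods\n"
    "alice,@bob thanks for the update\n"
)

edges = mention_edges(csv.DictReader(SAMPLE))
buffer = io.StringIO()
write_edge_list(edges, buffer)
print(buffer.getvalue())
```

Lower-casing screen names is a design choice here so that `@Bob` and `@bob` collapse into one node; a real pipeline might prefer numerical user IDs where they are available.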

SO NOW WHAT?

APPROACHING TWITTER

• Possible research questions:
  – Hashtags as vehicles for ad hoc events and publics:
    • How do online publics form and dissolve? How do they interact, what structures do they form?
    • Where do they draw information from? What do they share?
    • Do they simply consist of the usual suspects? How insular and disconnected are online publics?
  – Hashtags in context:
    • How do different hashtag events compare? Are there common types of hashtags/publics?
    • How ‘big’ are they? What topics attract attention on Twitter?
    • What community (?) structures emerge?

DEVELOPING TWITTER METRICS

• Key data points available through the Twitter API:
  – text: contents of the tweet itself, in 140 characters or less
  – to_user_id: numerical ID of the tweet recipient (for @replies)
  – from_user: screen name of the tweet sender
  – id: numerical ID of the tweet itself
  – from_user_id: numerical ID of the tweet sender
  – iso_language_code: code (e.g. en, de, fr, ...) of the sender’s default language
  – source: client software used to tweet (e.g. Web, Tweetdeck, ...)
  – profile_image_url: URL of the tweet sender’s profile picture
  – geo_type: format of the sender’s geographical coordinates
  – geo_coordinates_0: first element of the geographical coordinates
  – geo_coordinates_1: second element of the geographical coordinates
  – created_at: tweet timestamp in human-readable format
  – time: tweet timestamp as a numerical Unix timestamp
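
As an illustration of working with these data points, the following sketch reads them into typed records. It assumes the tweets are stored as CSV with a header row that uses the field names above, that `to_user_id` is left empty for tweets that are not @replies, and that `time` is in seconds since the Unix epoch. The `Tweet` class, the `parse_tweet` helper, and the sample rows are hypothetical and not part of any tool named in the slides.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass
class Tweet:
    id: int
    from_user: str
    from_user_id: int
    to_user_id: Optional[int]  # None when the tweet is not an @reply
    text: str
    source: str
    iso_language_code: str
    posted: datetime  # derived from the numerical `time` field

def parse_tweet(row):
    """Convert one CSV row (a dict of strings) into a Tweet record."""
    to_user = (row.get("to_user_id") or "").strip()
    return Tweet(
        id=int(row["id"]),
        from_user=row["from_user"],
        from_user_id=int(row["from_user_id"]),
        to_user_id=int(to_user) if to_user else None,
        text=row["text"],
        source=row["source"],
        iso_language_code=row["iso_language_code"],
        posted=datetime.fromtimestamp(int(row["time"]), tz=timezone.utc),
    )

# Hypothetical sample rows; real exports will contain further columns.
SAMPLE = io.StringIO(
    "text,to_user_id,from_user,id,from_user_id,iso_language_code,source,time\n"
    '"@bob stay safe",42,alice,1001,7,en,Web,1294790400\n'
    '"Flood maps: http://example.org/maps",,carol,1002,9,en,Tweetdeck,1294794000\n'
)

tweets = [parse_tweet(row) for row in csv.DictReader(SAMPLE)]
for tweet in tweets:
    print(tweet.posted.isoformat(), tweet.from_user, tweet.to_user_id)
```

Parsing `time` rather than `created_at` avoids depending on the exact human-readable timestamp format, which this sketch does not assume.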

DEVELOPING TWITTER METRICS

• Additional data points from tweets:
  – original tweets: tweets which are neither @reply nor retweet
  – retweets: tweets which contain RT @user… (or similar)
    • unedited retweets: retweets which start with RT @user…
    • edited retweets: retweets which do not start with RT @user…
  – genuine @replies: tweets which contain @user, but are not retweets
  – URL sharing: tweets which contain URLs

• Potential uses:
  – metrics per hashtag
  – metrics per timeframe (day, hour, minute, second, …)
  – metrics per user (or group of users)
  – …

(Bruns & Stieglitz, forthcoming)
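
The tweet categories above can be approximated with a few pattern matches. The sketch below is one possible reading of those definitions, not the authors' implementation: the regular expressions are assumptions, and treating `MT @user` and `via @user` as the "or similar" retweet markers is a guess. It then tallies the categories per user as an example of one of the suggested metrics. All function names and sample tweets are hypothetical.

```python
import re
from collections import Counter, defaultdict

# "RT @user" and similar markers; the MT and via alternatives are an assumption.
RETWEET = re.compile(r"\b(?:RT|MT|via)\s*@\w+", re.IGNORECASE)
# An unedited retweet starts directly with "RT @user".
UNEDITED = re.compile(r"^\s*RT\s*@\w+", re.IGNORECASE)
MENTION = re.compile(r"@\w+")
URL = re.compile(r"https?://\S+", re.IGNORECASE)

def classify(text):
    """Assign a tweet to exactly one of the four tweet types."""
    if RETWEET.search(text):
        return "unedited retweet" if UNEDITED.match(text) else "edited retweet"
    if MENTION.search(text):
        return "genuine @reply"
    return "original"

def shares_url(text):
    """URL sharing is independent of tweet type, so it is tested separately."""
    return bool(URL.search(text))

def metrics_per_user(tweets):
    """Tally tweet types per sender from (from_user, text) pairs."""
    table = defaultdict(Counter)
    for user, text in tweets:
        table[user][classify(text)] += 1
        if shares_url(text):
            table[user]["URL sharing"] += 1
    return table

# Hypothetical sample tweets.
SAMPLE = [
    ("alice", "Brisbane river rising fast #qldfloods"),
    ("alice", "@bob is the bridge open? #qldfloods"),
    ("bob", "RT @alice: Brisbane river rising fast #qldfloods"),
    ("carol", "Stay safe everyone RT @alice: Brisbane river rising fast #qldfloods"),
    ("carol", "Flood maps at http://example.org/maps #qldfloods"),
]

table = metrics_per_user(SAMPLE)
for user in sorted(table):
    print(user, dict(table[user]))
```

The same tally could be keyed by hashtag or by a truncated timestamp instead of by user to produce the per-hashtag and per-timeframe metrics.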

#QLDFLOODS @REPLIES

[Figure labels: mainstream media; authorities]

#ROYALWEDDING

#AUSPOL (FEB.-DEC. 2011)

HASHTAG METRICS

BEYOND HASHTAGS

• Publics on Twitter:
  – Micro: @reply and retweet conversations
  – Meso: follower/followee networks
  – Macro: hashtag ‘communities’
  (Bruns & Moe, forthcoming)

Multiple overlapping publics / networks

• What drives their formation and dissipation?
• How do they interact and interweave?
• How are they interleaved with the wider media ecology?
• Twitter doesn’t contain publics: publics transcend Twitter

‘BIG DATA’ AND THE DIGITAL HUMANITIES

• Emerging needs in Twitter research:
  – Unified, compatible methods and metrics for Twitter analysis
    • Tools and approaches shared at http://mappingonlinepublics.net/
  – Powerful infrastructure for long-term, high-volume tracking of public communication on Twitter
    • Data access requires substantial funding stream
  – Facilities for long-term data storage and preservation
    • Key roles for National Libraries, National Archives
  – Integration with related datasets (e.g. MSM content)
    • Need to address data interoperability questions
  – Robust frameworks for Internet research ethics
    • Clear guidelines which take into account complex new public/private structures

• Twitter as a test case for digital humanities research
  – Widespread, open, public platform for everyday communication
  – Tool for observing society at scale through Internet research

