Sifting Social Data: Word Sense Disambiguation Using Machine Learning

Date post: 02-Jul-2015
Upload: stuart-shulman
Description:
Slide prepared for "Social Media and the Defense Sector" http://www.smi-online.co.uk/defence/uk/social-media-within-the-military-and-defence-sector
Transcript
Page 1: Sifting Social Data: Word Sense Disambiguation Using Machine Learning

Sifting Social Data Word Sense Disambiguation Using Machine Learning

Dr. Stuart Shulman

Founder & CEO, Texifter

“…a wealth of information creates a poverty of attention.” - Herbert Simon, 1971

Page 2: Sifting Social Data: Word Sense Disambiguation Using Machine Learning

Pronounced “tech-sifter”: the metaphor is of a sifter

Page 3: Sifting Social Data: Word Sense Disambiguation Using Machine Learning

Text Classification

A 2,500-year-old problem

Plato argued it would be frustrating and it still is…

Page 4: Sifting Social Data: Word Sense Disambiguation Using Machine Learning

Grimmer & Stewart “Text as Data” Political Analysis (2013)

Volume is a problem for scholars

Coders are expensive

Groups struggle to accurately label text at scale

Validation of both humans and machines is “essential”

Some models are easier to validate than others

All models are wrong

Automated models enhance/amplify, but don’t replace humans

There is no one right way to do this

“Validate, validate, validate”

“What should be avoided then, is the blind use of any method without a validation step.”
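
One common way to carry out the validation step is to compare machine labels against a held-out set of human labels. The sketch below computes precision, recall and F1 for a single class. The label names and the sample data are hypothetical illustrations, not taken from the slides or from Grimmer & Stewart.

```python
def precision_recall_f1(gold, predicted, positive="relevant"):
    """Score machine labels against human (gold) labels for one class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical held-out sample: human labels vs. machine labels.
gold = ["relevant", "relevant", "irrelevant", "relevant", "irrelevant"]
pred = ["relevant", "irrelevant", "irrelevant", "relevant", "relevant"]

precision, recall, f1 = precision_recall_f1(gold, pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Reporting precision and recall separately, rather than a single accuracy figure, shows whether the model is missing relevant items or letting irrelevant ones through.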

Page 5: Sifting Social Data: Word Sense Disambiguation Using Machine Learning

Our free, open-source, web-based text analytics toolkit

Page 6: Sifting Social Data: Word Sense Disambiguation Using Machine Learning

The original software kernel: tools for measurement

Page 7: Sifting Social Data: Word Sense Disambiguation Using Machine Learning

A mission to avoid tennis elbow

Items load to the screen and the coder hits the keystroke

Page 8: Sifting Social Data: Word Sense Disambiguation Using Machine Learning

Keystroke human coding: alone or in groups

Codes

Metadata Data

Human coding can be distributed to individuals, groups & crowds

Page 9: Sifting Social Data: Word Sense Disambiguation Using Machine Learning

Computer science & NSF influences: measure everything

How fast?

How reliable?

How accurate?

Stuart Shulman – Texifter

Page 10: Sifting Social Data: Word Sense Disambiguation Using Machine Learning

Inter-rater reliability is one critical measurement
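
The slide does not say which reliability statistic is used. As one widely used option, Cohen's kappa compares the agreement two coders actually reach with the agreement expected by chance. The sketch below is a minimal pure-Python version; the function name and sample labels are assumptions for illustration.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who labelled the same items in order."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty item list")
    n = len(coder_a)
    # Observed agreement: share of items given the same label by both coders.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of each coder's label proportions, summed.
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical codings of eight items by two coders.
coder_a = ["on", "on", "off", "off", "on", "off", "on", "on"]
coder_b = ["on", "off", "off", "off", "on", "off", "on", "on"]
print(cohens_kappa(coder_a, coder_b))
```

Here the coders agree on 7 of 8 items (0.875) while chance agreement is 0.5, so kappa is 0.75.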

Page 11: Sifting Social Data: Word Sense Disambiguation Using Machine Learning
Page 12: Sifting Social Data: Word Sense Disambiguation Using Machine Learning

Plugged in to APIs & Government

Page 13: Sifting Social Data: Word Sense Disambiguation Using Machine Learning

Import data directly via APIs or from your desktop

Page 14: Sifting Social Data: Word Sense Disambiguation Using Machine Learning

Full historical Twitter access

Page 15: Sifting Social Data: Word Sense Disambiguation Using Machine Learning

PowerTrack operators for more precise queries

Page 16: Sifting Social Data: Word Sense Disambiguation Using Machine Learning

Store social data with survey responses and other data

Page 17: Sifting Social Data: Word Sense Disambiguation Using Machine Learning

Private, 3rd party & free (rate limited) social data sources

Page 18: Sifting Social Data: Word Sense Disambiguation Using Machine Learning

Unlimited “fire hose” premium data sources

Page 19: Sifting Social Data: Word Sense Disambiguation Using Machine Learning

The Five Pillars of Text Analytics

Search

Filtering

De-duplication and Clustering

Human Coding

Machine-Learning

Page 20: Sifting Social Data: Word Sense Disambiguation Using Machine Learning

Pillar #1: Search

Page 21: Sifting Social Data: Word Sense Disambiguation Using Machine Learning

Pillar #1: Defined multi-term search

Page 22: Sifting Social Data: Word Sense Disambiguation Using Machine Learning

Pillar #2: Filters

Page 23: Sifting Social Data: Word Sense Disambiguation Using Machine Learning

Pillar #2: Filters

Page 24: Sifting Social Data: Word Sense Disambiguation Using Machine Learning

Pillar #3: Deduplication & clustering

Page 25: Sifting Social Data: Word Sense Disambiguation Using Machine Learning

Pillar #3: Deduplication & clustering
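
The slides do not describe how duplicates are detected. One common approach, assumed here purely for illustration, is to compare word shingles with Jaccard similarity and group items that exceed a threshold. The shingle size, threshold and sample posts below are hypothetical choices.

```python
import re


def shingles(text, size=3):
    """Return the set of overlapping word n-grams in a text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        return {tuple(words)}
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Jaccard similarity: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_near_duplicates(texts, threshold=0.5):
    """Greedily group texts whose shingles overlap a cluster's first member."""
    clusters = []  # each entry: (shingles of first member, member indices)
    for index, text in enumerate(texts):
        current = shingles(text)
        for representative, members in clusters:
            if jaccard(current, representative) >= threshold:
                members.append(index)
                break
        else:
            clusters.append((current, [index]))
    return [members for _, members in clusters]


# Hypothetical posts: a retweet should land with its original.
posts = [
    "Apple unveils the new iPhone today",
    "RT Apple unveils the new iPhone today",
    "I ate an apple pie for lunch",
]
print(cluster_near_duplicates(posts))
```

Collapsing retweets and near-copies first means human coders label each distinct message once rather than many times.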

Page 26: Sifting Social Data: Word Sense Disambiguation Using Machine Learning

Pillar #4: Human coding (a.k.a. labeling or tagging)

Page 27: Sifting Social Data: Word Sense Disambiguation Using Machine Learning

Pillar #4: Human coding

Page 28: Sifting Social Data: Word Sense Disambiguation Using Machine Learning

Pillar #4: Human coding (adjudication)
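
The slide names adjudication without describing the procedure. A simple scheme, assumed here and not necessarily the one the toolkit uses, accepts a label when a strict majority of coders chose it and queues everything else for a human adjudicator. The item ids and labels are hypothetical.

```python
from collections import Counter


def adjudicate(codings):
    """Split items into majority-resolved labels and disputed item ids.

    codings maps an item id to the list of labels its coders assigned.
    """
    resolved, disputed = {}, []
    for item_id, labels in codings.items():
        ranked = Counter(labels).most_common()
        if not ranked:
            disputed.append(item_id)  # nobody coded this item yet
        elif len(ranked) == 1 or ranked[0][1] > ranked[1][1]:
            resolved[item_id] = ranked[0][0]  # clear winner
        else:
            disputed.append(item_id)  # tie: needs a human adjudicator
    return resolved, disputed


# Hypothetical codings from up to three coders per item.
codings = {
    "t1": ["relevant", "relevant", "irrelevant"],
    "t2": ["relevant", "irrelevant"],
    "t3": ["irrelevant"],
}
resolved, disputed = adjudicate(codings)
print(resolved, disputed)
```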

Page 29: Sifting Social Data: Word Sense Disambiguation Using Machine Learning

Pillar #5: Machine-learning

Page 30: Sifting Social Data: Word Sense Disambiguation Using Machine Learning

Pillar #5: Machine-learning

Page 31: Sifting Social Data: Word Sense Disambiguation Using Machine Learning

Our ActiveLearning engine and coding tools combine…

what humans do best… with what computers do best

Humans and machines learning together

Keep humans “in-the-loop” for more accurate results and better insights
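
The slides do not explain how the ActiveLearning engine chooses what to show coders. A standard active-learning strategy, assumed here only as an illustration, is uncertainty sampling: send humans the items whose model score is closest to 0.5, where a new label is most informative. The function name and scores are hypothetical.

```python
def pick_for_human_review(scores, batch_size=2):
    """Return the item ids whose relevance score is nearest 0.5.

    scores maps an item id to the model's probability that it is relevant.
    """
    by_uncertainty = sorted(scores, key=lambda item_id: abs(scores[item_id] - 0.5))
    return by_uncertainty[:batch_size]


# Hypothetical classifier scores for five uncoded items.
scores = {"a": 0.97, "b": 0.52, "c": 0.10, "d": 0.41, "e": 0.80}
print(pick_for_human_review(scores))
```

Items "b" and "d" are selected because the model is least sure about them; confident predictions such as "a" can be left to the machine.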

Page 32: Sifting Social Data: Word Sense Disambiguation Using Machine Learning

Word sense disambiguation (relevance)

Page 33: Sifting Social Data: Word Sense Disambiguation Using Machine Learning

Word sense disambiguation (relevance)

Page 34: Sifting Social Data: Word Sense Disambiguation Using Machine Learning

Word sense disambiguation (relevance)

Page 35: Sifting Social Data: Word Sense Disambiguation Using Machine Learning

Page 36: Sifting Social Data: Word Sense Disambiguation Using Machine Learning

Human coding can be converted into machine classifiers

Accumulated human coding becomes training data via machine-learning
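
As a rough sketch of how coded items can become a classifier for word sense disambiguation, the example below trains a small multinomial naive Bayes model with add-one smoothing on human-labelled texts. The slides do not state which algorithm the toolkit uses, so naive Bayes is an assumption, and the "apple" training data is hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def train(coded_items):
    """Build word counts per label from (text, label) pairs coded by humans."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in coded_items:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocabulary.add(word)
    return label_counts, word_counts, vocabulary


def classify(model, text):
    """Return the label with the highest smoothed log-probability."""
    label_counts, word_counts, vocabulary = model
    total_items = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total_items)  # class prior
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word in vocabulary:  # words never seen in training are skipped
                smoothed = word_counts[label][word] + 1
                score += math.log(smoothed / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical human coding: which sense of "apple" does each post use?
coded = [
    ("apple releases new iphone and macbook", "company"),
    ("apple stock rises after earnings", "company"),
    ("baked an apple pie with cinnamon", "fruit"),
    ("fresh apple juice and fruit salad", "fruit"),
]
model = train(coded)
print(classify(model, "new apple macbook announced"))
print(classify(model, "apple pie recipe"))
```

Each additional batch of human coding can be passed back through `train`, so the classifier improves as coders keep working.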

Page 37: Sifting Social Data: Word Sense Disambiguation Using Machine Learning

Users can drill into interactive reporting displays

Use metadata to examine sub-sets of responses and create reports.

Page 38: Sifting Social Data: Word Sense Disambiguation Using Machine Learning

Slicing big piles of text into smaller, more focused sets is key

Ultimately all text analytics are filtering techniques

Page 39: Sifting Social Data: Word Sense Disambiguation Using Machine Learning

Crowdsourcing accelerates the insight generation process through machine-learning

Distributed for synchronous & asynchronous collaboration

Page 40: Sifting Social Data: Word Sense Disambiguation Using Machine Learning

CoderRank (patent pending) for enhanced machine-learning is our key innovation

Page 41: Sifting Social Data: Word Sense Disambiguation Using Machine Learning

For more information visit the Texifter table or discovertext.com

@discovertext

Thank you for listening!
