Citizen Sensor Data Mining, Social Media Analytics and Development Centric Web Applications
Tutorial at Semantic Technology Conference, San Francisco, CA.
Karthik Gomadam, Accenture Technology Labs, San Jose
Amit Sheth, Kno.e.sis @ Wright State University
Selvam Velmurugan, eMoksha, Kiirti
Monday, June 6, 2011
Lu Chen (Sentiment Analysis)
Meena Nagarajan (Content Analysis)
Ashutosh Jadhav (Event Analysis)
Hemant Purohit (People & Network Analysis)
Pavan Kapanipathi (Real Time Web)
Selvam Velmurugan (Kiirti, eMoksha NGOs)
Pramod Anantharam (Social & Sensor Web)
Amit Sheth (Semantic Web)
Much of the work discussed in this tutorial comes from the doctoral research of Dr. Meena Nagarajan, currently at IBM Almaden. It also includes current work at the Kno.e.sis Center at Wright State University.
A Quick Word
Citizen Sensing: Role, Enablers, Apps
Systematic Study of Social Media
Citizen Sensing @ Real-time
Emerging Research Areas: Spam and Trust in Social Media, Mobile Social Computing
Research Application: Twitris
Tutorial part 2
Outline
Citizen Sensing
Everyday users of Web 2.0 and social networks: citizens of an Internet- or Web-enabled social community
Observation and information reported by citizens => Citizen Sensing
Human-in-the-loop (participatory) sensing + Web 2.0 + mobile computing = emergence of citizen-sensor networks
Social Signals
The activity of observing, reporting, and disseminating information via text, audio, video and built-in device sensors (and smart devices)
Creating social signals through aggregation, enhancement, analysis, visualization, and interpretation
Immense potential to disseminate information quickly and in real time
Enablers: Mobile Devices & Ubiquitous Connectivity
Mobile device fast emerging as our primary tool
Redefines the way we engage with people, information, etc.
Global, ubiquitous, always available
Sense where you are, how you are,
Mobile Platforms Hit Critical Mass
  Over 5 billion users
  1B+ with internet-connected mobile devices (2010)
  Smartphones > Notebooks + Netbooks (2010E)
  500K+ mobile phone applications
  74% of mobile phone users (2.4B) worldwide texted (2007)
Enablers: Web 2.0 & Social Media
500M+ Facebook users
100M+ Twitter users, 85M+ tweets/day
Internet users: 1.8 billion
Content dissemination medium, even for traditional media (@cnn, @nytimes)
Types of UGC: Twitter (text/microblogs), Facebook (multimedia), YouTube (videos), Flickr (images), Blogs (text), Ping (social network for music)
Iran election
Haiti earthquake
US healthcare debate
Citizen Sensors in Action
Revolution 2.0: Political/Social Activism
"If you want to liberate a government, give them the internet." - Wael Ghonim (Egyptian social activist)
When Blitzer asked, "Tunisia, then Egypt, what's next?", Ghonim replied succinctly, "Ask Facebook."
Citizen Journalism
Twitter Journalism
Social Media Influence: Intelligence, News & Analysis
Many media companies use Facebook and Twitter as news-delivery platforms. Many individuals rely on them as a news source. News is increasingly social.
Business Intelligence: Trend Spotting, Forecasting, Brand Tracking and Crisis Management
  Sysomos: http://www.sysomos.com/
  Trendspotting: http://trendspotting.com
  Simplify: http://simplify360.com/
  Shoutlet: http://www.shoutlet.com/
  Reputation (Defender): http://www.reputationdefender.com/
Development (Education, Health, eGov)
LiveMocha (http://www.livemocha.com/): online language learning tool with social engagement, bridging the gap!
Soliya (http://www.soliya.net/): dialogue between students from diverse backgrounds across the globe using the latest multimedia technologies
Project Einstein (http://digital-democracy.org/what-we-do/programs/): a photography-based digital pen pal program connecting youths in refugee camps to the world
PatientsLikeMe (http://mashable.com/2010/07/13/social-media-health-trends/)
TrialX (http://trialx.com)
Image: http://www.dragonsearchmarketing.com/blog/social-media-development-through-visual-aids-tools/
Why People-Content-Network metadata?
Spatio-Temporal-Thematic + People-Content-Network
Dimensions of Systematic Study of Social Media
"Who says what, to whom, why, to what extent and with what effect?" [Lasswell]
Network: social structure emerges from the aggregate of relationships (ties)
People: poster identities, the active effort of accomplishing interaction
Content: studying the content of communication
Social Information Processing
How does the (semantics or style of) content fit into the observations made about the network?
Often, the three-dimensional dynamic of people, content and link structure is what shapes the social dynamic.
Studying Online Human Social Dynamics
Example: how do the topic of discussion, the emotional charge of a conversation, the presence of an expert and the connections between participants together explain information propagation in a social network?
Metadata/Annotations
Metadata: an organized way to study: types; creation/extraction and storage; use
The Anatomy of a Tweet
Explicit information from user profiles
  User names, pictures, videos, links, demographic information, group memberships...
  Often is not updated
Implicit information from user attention metadata
  Page views, Facebook 'Likes', comments; Twitter 'Follows', retweets, replies...
People Metadata: Variety of Self-expression Modes on Multiple Social Media Platforms
People Metadata: Various Levels
Demographic
Network
Activity
Interests
People Metadata: Continued
User Demographic Metadata
  User-id
  Screen/display name of user
  Real name of user
  Location
  Profile creation date
  User description
  User bio
  URL
Interest Level Metadata
  Author type: trustee/donor, journalist, blogger, scientist etc.
  Favorite tweets
  Types of lists subscribed
  Style of writing: personality indicator
  No. of followees
  Author type trend of followees
Web Presence:
  User affiliations
  KLOUT score: influence measure (www.klout.com)
Activity Level Metadata
  Age of the profile
  Frequency of posts
  Timestamp of last status
  No. of posts
  No. of lists/groups created
  No. of lists/groups subscribed
Influence Level Metadata (inferring people metadata from network-level information)
  No. of followers: normal, influential
  No. of mentions
  No. of retweets/forwards
  No. of replies
  No. of lists/groups following
  No. of people following back
  Authority & hub scores
Content Independent metadata
  date, location, author etc.
Content Dependent metadata
  Direct content-based metadata
    Explicit/mentioned content metadata: named entities in content
    Implicit/inferred content metadata: related named entities from knowledge sources
  Indirect content-based metadata (external metadata)
    context inferred from URLs in content (images, links to articles, FourSquare check-ins etc.)
Content Metadata
For Tweets
  Published date and time
  Location (where the tweet was generated from)
  Tweet posting method (smartphone, twitter.com, clients for Twitter)
  Author information
Content Independent Metadata
For Text Messages
  Published date and time
  Origin location
  Recipient
  Carrier information
Content Dependent Metadata (Tweet)
  Direct content-based metadata
  Indirect content-based metadata (external metadata)
Network Metadata
Connections/Relationships (foundation for the network) matter!
Structure Level Metadata
  Community size
  Community growth rate
  Largest strongly connected component size
  Weakly connected components & max. size
  Average degree of separation
  Clustering coefficient
Relationship Level Metadata
  Type of relationship
  Relationship strength
  User homophily based on a certain characteristic (e.g., location, interest etc.)
  Reciprocity: mutual relationship
  Active community/ties
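Two of the measures listed above, reciprocity and the clustering coefficient, can be computed directly from a follower edge list. The following is a minimal sketch under assumptions: the `follows` edges and user names are made up for illustration, and the clustering coefficient is the common local definition on the undirected version of the graph, which the slides do not specify.

```python
from itertools import combinations

# Hypothetical follower edges: (a, b) means "a follows b".
follows = {("ann", "bob"), ("bob", "ann"), ("ann", "cat"),
           ("cat", "bob"), ("dan", "ann")}

def reciprocity(edges):
    """Fraction of directed edges whose reverse edge also exists."""
    mutual = sum(1 for (a, b) in edges if (b, a) in edges)
    return mutual / len(edges)

def undirected_neighbors(edges):
    """Neighbor sets when edge direction is ignored."""
    nbrs = {}
    for a, b in edges:
        nbrs.setdefault(a, set()).add(b)
        nbrs.setdefault(b, set()).add(a)
    return nbrs

def local_clustering(nbrs, node):
    """Share of a node's neighbor pairs that are themselves connected."""
    friends = nbrs[node]
    if len(friends) < 2:
        return 0.0
    pairs = list(combinations(sorted(friends), 2))
    linked = sum(1 for u, v in pairs if v in nbrs[u])
    return linked / len(pairs)

nbrs = undirected_neighbors(follows)
avg_clustering = sum(local_clustering(nbrs, n) for n in nbrs) / len(nbrs)
```

Both values are structure-level or relationship-level metadata in the sense of this slide: they are created from the network, not read off a profile.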
Metadata: Creation, Extraction and Storage
Extracted Metadata
  Directly visible information from the user profile, tweet content & community structure
Created Metadata
  After processing information in the user profile, content and/or network structure
Metadata Creation & Extraction
Length: 144 characters; General topic: Egypt protest
This poor {sentiment_expression: {target: Lara Logan, polarity: negative}} woman! RT @THR CBS News' {entity: {type=News Agency}} Lara Logan {entity: {type=Person}} Released From Hospital {entity: {type=Location}} After Egypt {entity: {type=Country}} Assault {type=topic} http://bit.ly/dKWTY0 {external_URL}
An Example
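The inline annotations in this example can also be kept apart from the text as stand-off annotations with character offsets. This is a sketch under assumptions: the choice of which surface span each annotation covers (for instance "Hospital" for the Location entity) is a guess from the slide, and the record layout is hypothetical.

```python
# The tweet text from the example, without the inline annotations.
tweet = ("This poor woman! RT @THR CBS News' Lara Logan Released From "
         "Hospital After Egypt Assault http://bit.ly/dKWTY0")

# (surface text, annotation kind, attributes) as marked on the slide.
labels = [
    ("This poor", "sentiment_expression",
     {"target": "Lara Logan", "polarity": "negative"}),
    ("CBS News'", "entity", {"type": "News Agency"}),
    ("Lara Logan", "entity", {"type": "Person"}),
    ("Hospital", "entity", {"type": "Location"}),
    ("Egypt", "entity", {"type": "Country"}),
    ("Assault", "topic", {}),
    ("http://bit.ly/dKWTY0", "external_URL", {}),
]

def to_standoff(text, labels):
    """Turn (surface, kind, attrs) labels into character-offset annotations."""
    annotations = []
    for surface, kind, attrs in labels:
        start = text.find(surface)
        if start == -1:
            raise ValueError(f"{surface!r} not found in text")
        annotations.append({"start": start, "end": start + len(surface),
                            "kind": kind, **attrs})
    return annotations

annotations = to_standoff(tweet, labels)
```

Keeping offsets separate from the text leaves the original tweet untouched, so several annotators (entity, sentiment, topic) can add metadata independently.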
Rich Snippets, RDFa, Open Graph: Semantic Web based social data standards
Relationships/connections play a central role
  Relationships as first-class objects are important
Why Semantic Web is a standard for social metadata?
Semantic Web: A Very Short Primer
Representation: RDF (relationships as first-class objects), OWL
Representing Knowledge and Agreements: nomenclature, taxonomy, folksonomy, ontology
Annotation: RDFa, XLink, model reference
Web of Data: Linked Open Data
Querying: SPARQL; Rules: SWRL, RIF
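To make "relationships as first-class objects" concrete, the sketch below stores social metadata as subject-predicate-object triples and answers a query by pattern matching, in the spirit of RDF and SPARQL. It uses plain Python tuples, not an RDF library, and every identifier in it (`user:alice`, `follows`, and so on) is hypothetical.

```python
# Triples: (subject, predicate, object). Names are hypothetical, not real URIs.
triples = {
    ("user:alice", "follows", "user:bob"),
    ("user:bob", "follows", "user:alice"),
    ("user:alice", "posted", "tweet:1"),
    ("tweet:1", "mentions", "entity:Egypt"),
    ("entity:Egypt", "type", "Country"),
}

def match(pattern, store):
    """Return triples matching a pattern; None acts as a wildcard."""
    return sorted(t for t in store
                  if all(p is None or p == v for p, v in zip(pattern, t)))

# Who does alice follow?
followed = [o for _, _, o in match(("user:alice", "follows", None), triples)]

# Two-step query: countries mentioned in tweets posted by alice.
countries = [
    entity
    for _, _, tweet in match(("user:alice", "posted", None), triples)
    for _, _, entity in match((tweet, "mentions", None), triples)
    if (entity, "type", "Country") in triples
]
```

Because the relationship name is data, people, content and network metadata can live in one store and be joined in a single query.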
Store metadata as data and use standard database techniques
Use filtering and clustering, summarization, statistics - implicit semantics
How to save and use metadata?
Use explicit semantics and Semantic Web standards and technologies
  semantics = meaning
  richer representation, support for relationships, context
  supports use of background knowledge
  better integration, powerful analysis
Semantics: the implicit, the formal and the powerful
Social metadata on the Web
Metadata Extraction from Informal Text
Meena Nagarajan, Understanding User-Generated Content on Social Media, Ph.D. Dissertation, Wright State University, 2010
Characteristics of Text on Social Media
The Formality of Text
Recognize key entities mentioned in content
  Information Extraction (entity recognition, anaphora resolution, entity classification...)
  Discovery of semantic associations between entities
Topic Classification, Aboutness of content
  What is the content about?
Intention Analysis
  Why did they share this content?
Content Analysis-Typical Sub-tasks
Sentiment Analysis
  What opinions are people conveying via the content?
Author Profiling
  What can we infer about the author from the content he posts?
Context (external to content) extraction
  URL extraction, analyzing external content
Examining usefulness of multiple context cues for text mining algorithms
  Compensating for informal, highly variable language, lack of context
  Using context cues: document corpus, syntactic, structural cues, social medium, external domain knowledge
In this talk, highlighting sample metadata creation tasks: NER, Key Phrase Extraction, Intention, Sentiment/Opinion Mining
Research Efforts, Contributions in this space...
Named Entity Recognition
  I loved the hangover!
Key Phrase Extraction
Part 1. NER, Key Phrase Extraction
Multiple Context Cues Utilized for NER in Blogs and MySpace
Multiple Context Cues Utilized for Keyphrase Extraction from Twitter, Facebook and MySpace
Techniques focus on relatively less explored content aspects on social media platforms
Combination of top-down, bottom-up analysis for informal text
  Statistical NLP, ML algorithms over large corpora
  Models and rich knowledge bases in a domain
Focus, Impact
NAMED ENTITY RECOGNITION
I loved your music Yesterday!
It was THE HANGOVER of the year..lasted forever..
So I went to the movies..bad choice picking GI Jane worse now
Identifying and classifying tokens
NER in prior work vs. NER for Informal Text
NER focus in this work: Cultural Named Entities
Artifacts of culture
  Names of books, music albums, films, video games, etc.
Common words in a language
  The Lord of the Rings, Lips, Crash, Up, Wanted, Today, Twilight, Dark Knight
Cultural Named Entities
Varied senses, several poorly documented
  Merry Christmas: covered by 60+ artists
  Star Trek: movies, TV series, media franchise... and cuisines!!
Changing contexts with recent events
  The Dark Knight: reference to Obama, health care reform
Unrealistic expectations
  Comprehensive sense definitions, enumeration of contexts, labeled corpora for all senses...
NER: relaxing the closed-world sense assumptions
Characteristics of Cultural Entities
NER is generally a sequential prediction problem
  NER system that achieves 90.8 F1 score on the CoNLL-2003 NER shared task (PER, LOC, ORGN entities) [Lev Ratinov, Dan Roth]
Focus of approach: Spot and Disambiguate paradigm
  Starting off with a dictionary or list of entities we want to spot
A Spot and Disambiguate Paradigm
Spot, then disambiguate in context (natural language, domain knowledge cues)
Binary classification
  Is this mention of "the hangover" in a sentence referring to a movie?
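The two steps can be sketched as a dictionary spotter followed by a yes/no decision per spot. This is a toy illustration only: the title list and the cue words are assumptions, and the fixed cue-word rule stands in for the supervised classifiers the talk describes.

```python
import re

# Hypothetical dictionary of entities to spot (titles taken from the slides).
MOVIE_TITLES = ["the hangover", "twilight", "up"]
# Assumed context cues; the talk uses learned features, not a fixed word list.
MOVIE_CUES = {"watch", "watching", "watched", "movie", "movies", "scene", "scenes"}

def spot(text, dictionary):
    """Step 1: find every dictionary entry in the text, ignoring case."""
    spots = []
    for entry in dictionary:
        pattern = r"\b" + re.escape(entry) + r"\b"
        for m in re.finditer(pattern, text, re.IGNORECASE):
            spots.append((m.start(), m.end(), entry))
    return sorted(spots)

def is_movie_mention(text, start, end, window=5):
    """Step 2: binary decision from cue words within `window` tokens of the spot."""
    before = re.findall(r"\w+", text[:start].lower())[-window:]
    after = re.findall(r"\w+", text[end:].lower())[:window]
    return any(tok in MOVIE_CUES for tok in before + after)

def annotate(text):
    """Spot, then disambiguate: one (entry, is_movie) pair per spot."""
    return [(entry, is_movie_mention(text, s, e))
            for s, e, entry in spot(text, MOVIE_TITLES)]
```

With these cues, "I am watching Pattinson scenes in Twilight for the nth time." is accepted and "It was THE HANGOVER of the year" is rejected, but "watching the Twilight by the bay" is wrongly accepted as well, which shows why a fixed word list is not enough for informal text.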
Algorithmic Contributions Supervised Algorithms
Examples:
  I am watching Pattinson scenes in Twilight for the nth time.
  I spent a romantic evening watching the Twilight by the bay..
  I love Lily's song
Multiple Senses in the Same Domain
Problem definition: Cultural Entity Identification
  Music albums, tracks: Smile (Lily Allen), Celebration (Madonna)
Corpus: MySpace comments
  Context-poor utterances
  "Happy 25th Lilly, Alfie is funny"
Algorithm Preliminaries
Goal: Semantic annotation of music named entities (w.r.t. MusicBrainz)
Using a Knowledge Resource for NER is not straight-forward..
Approach Overview
Scoped relationship graphs
  Using context cues from the content, webpage title, URL
    "new Merry Christmas tune"
  Reduce potential entity spot size
    new albums/songs
Generate candidate entities
Spot and Disambiguate
Sample Real-world Constraints
Career restrictions
  "release your third album already.."
Recent album restrictions
  "I loved your new album.."
Artist age restrictions
  "happy 25th rihanna, loved alfie btw.."
etc.
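Constraints like these shrink the set of entities a spotter has to look for. The sketch below applies an age cue and a career cue to a toy knowledge base; the artist records, their values and the cue patterns are all illustrative assumptions and are not MusicBrainz data or the authors' rules.

```python
import re

# Toy stand-in for a music knowledge base; all values are illustrative.
ARTISTS = [
    {"name": "Artist A", "age": 25, "albums": 2, "tracks": {"Smile", "Alfie"}},
    {"name": "Artist B", "age": 23, "albums": 4, "tracks": {"Umbrella"}},
    {"name": "Artist C", "age": 52, "albums": 11, "tracks": {"Celebration", "Smile"}},
]

def scope(comment, artists):
    """Keep only artists consistent with cues found in the comment."""
    text = comment.lower()
    scoped = artists
    age = re.search(r"happy (\d+)(?:st|nd|rd|th)", text)
    if age:  # artist age restriction
        scoped = [a for a in scoped if a["age"] == int(age.group(1))]
    if "third album" in text:  # career restriction: two albums released so far
        scoped = [a for a in scoped if a["albums"] == 2]
    return scoped

def candidate_tracks(comment, artists):
    """Union of track names for the scoped artists: the reduced spotting dictionary."""
    return set().union(*(a["tracks"] for a in scope(comment, artists)))
```

For the comment "happy 25th, loved alfie btw.." only Artist A survives, so the spotter searches for two track names instead of four.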
Challenge 1: Several senses in the same domain
  Scoping relationship graphs narrows possible senses
  Solves the named entity identification problem partially
Challenge 2: Non-music mentions
  Got your new album Smile. Loved it!
  Keep your SMILE on!
Non-Music Mentions
Syntactic features
  POS tags, typed dependencies...
Word-level features
  Capitalization, quotes
Domain-level features
Using Language Features to eliminate incorrect mentions..
Supervised Learners
1800+ spots in MySpace user comments from artist pages
"Keep your SMILE on!": good spot, bad spot, inconclusive?
4-way annotator agreements
  Madonna: 90% agreement
  Rihanna: 84% agreement
  Lily Allen: 53% agreement
Hand Labeling - Fairly Subjective
Daniel Gruhl, Meena Nagarajan, Jan Pieper, Christine Robson, Amit Sheth, Context and Domain Knowledge Enhanced Entity Spotting in Informal Text, The 8th International Semantic Web Conference, 2009: 260-276
Dictionary Spotter + NLP Step
Highlights issues with using domain knowledge for an IE task
Two-stage approach: chaining NL learners over results of domain-model-based spotters
  Improves accuracy by up to a further 50%
  Allows the more time-intensive NLP analytics to run on less than the full set of input data
NER on Social Media Text using Domain Knowledge
Daniel Gruhl, Meenakshi Nagarajan, Jan Pieper, Christine Robson, Amit Sheth: Multimodal Social Intelligence in a Real-Time Dashboard System, special issue of the VLDB Journal on "Data Management and Mining for Social Networks and Social Media", 2010. http://www.almaden.ibm.com/cs/projects/iis/sound/
BBC SoundIndex (IBM Almaden): Pulse of the Online Music
http://www.almaden.ibm.com/cs/projects/iis/sound/
The Vision
Only 4% negative sentiments; perhaps ignore the Sentiment Annotator on this data source?
Ignoring spam can change the ordering of popular artists
Trending popularity of artists
Trending topics in artist pages
Several Insights
Billboard's Top 50 Singles chart during the week of Sept 22-28, '07 vs. MySpace popularity charts.
User study indicated 2:1 and up to 7:1 (younger age groups) preference for the MySpace list.
Challenging traditional polling methods!
Predictive Power of Data
Key Phrase Extraction
Key phrases extracted from prominent discussions on Twitter around the 2009 Health Care Reform debate and 2008 Mumbai Terror Attack on one day
Key Phrase Extraction: Example
Different from Information Extraction
Extracting vs. assigning key phrases
  Focus: Key Phrase Extraction
Prior work focus: extracting phrases that summarize a document: a news article, a web page, a journal article, a book...
Focus: summarize multiple documents (UGC) around the same event/topic of interest
Key Phrase Extraction from SM Text
Focus: Summarizing social perceptions via key phrase extraction
Preserving/isolating the social behind the social data
  What is said in Egypt vs. the USA should be viewed in isolation
Key Phrase Extraction on SM Content
Accounting for redundancy, variability, off-topic content
"Met up with mom for lunch, she looks lovely as ever, good genes .. Thanks Nike, I love my new Gladiators ..smooth as a feather. I burnt all the calories of Italian joy in one run.. if you are looking for good Italian food on Main, Buca is the place to go."
Thematic components: similar messages convey similar ideas
Space, time metadata: role of community and geography in communication
Poster attributes: age, gender, socio-economic status reflect similar perceptions
Social and Cultural Logic in SMC
Focus: n-grams, spatio-temporal metadata (social components)
Syntactic cues: in quotes, italics, bold; in document headers; phrases collocated with acronyms
Feature Space (common to several efforts)
Document and structural cues: two-word phrases, appearing in the beginning of a document, frequency, presence in multiple similar documents etc.
Linguistic cues: stemmed form of a phrase, phrases that are simple and compound nouns in sentences etc.
"President Obama in trying to regain control of the health-care debate will likely shift his pitch in September"
  1-grams: President, Obama, in, trying, to, regain, ...
  2-grams: President Obama, Obama in, in trying, trying ...
Key Phrase Extraction: Overview
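The n-gram lists above can be produced with a few lines of code. This is a minimal sketch that splits on whitespace only; the tokenization actually used in the work is not stated on the slide.

```python
def ngrams(text, n):
    """Return the word n-grams of `text`, splitting on whitespace."""
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

sentence = ("President Obama in trying to regain control of the health-care "
            "debate will likely shift his pitch in September")
unigrams = ngrams(sentence, 1)
bigrams = ngrams(sentence, 2)
```

Each n-gram is then a candidate descriptor to be weighted by the thematic, spatial and temporal cues discussed next.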
A descriptor is an n-gram weighted by:
  Thematic importance
    TFIDF, stop words, noun phrases
    Redundancy: statistically discriminatory in nature
    Variability: contextually important
  Spatial importance (local vs. global popularity)
  Temporal importance (always popular vs. currently trending)
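As an illustration of the thematic part of this weighting, the sketch below scores terms with TF-IDF. The slide names TFIDF without giving a formula, so the common variant, term frequency times log(N / document frequency), is assumed here; the three posts are made up, and the spatial and temporal weights are not modeled.

```python
import math
from collections import Counter

def tfidf(docs):
    """Score each term per document as term frequency x log(N / document frequency)."""
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter(term for tokens in tokenized for term in set(tokens))
    n_docs = len(docs)
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        scores.append({t: (c / len(tokens)) * math.log(n_docs / df[t])
                       for t, c in tf.items()})
    return scores

# Hypothetical micro-corpus of posts.
posts = ["health care reform debate",
         "obama health care pitch",
         "mumbai attack news"]
scores = tfidf(posts)
```

In the first post, "reform" outscores "health" because "health" also appears in the second post: terms that discriminate between documents get more weight than terms that are merely frequent.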
Eliminating Off-topic Content [WISE2009]
Frequency-based heuristics will not eliminate off-topic content that is ALSO POPULAR
Yeah i know this a bit off topic but the other electronics forum is dead right now. im looking for a good camcorder, somethin not to large that can record in full HD only ones so far that ive seen are sonys
Canon HV20. Great little cameras under $1000.
Approach Overview
Assume one or more seed words (from domain knowledge base)
  C1 - ['camcorder']
Extracted key words / phrases
  C2 - ['electronics forum', 'hd', 'camcorder', 'somethin', 'ive', 'canon', 'little camera', 'canon hv20', 'cameras', 'offtopic']
Gradually expand C1 by adding phrases from C2 that are strongly associated with C1
Mutual Information based algorithm [WISE2009]
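The expansion of C1 can be sketched as a greedy loop driven by an association score. This is not the algorithm from [WISE2009]: it assumes pointwise mutual information over document co-occurrence as the score, a threshold of zero as the stopping rule, and a tiny made-up background corpus `DOCS`.

```python
import math

# Hypothetical background corpus: each document is the set of phrases it contains.
DOCS = [
    {"camcorder", "hd", "canon hv20"},
    {"camcorder", "canon hv20", "cameras"},
    {"camcorder", "hd"},
    {"electronics forum", "offtopic"},
    {"somethin", "ive", "offtopic"},
    {"cameras", "canon hv20"},
]

def pmi(x, y, docs):
    """Pointwise mutual information of two phrases from document co-occurrence."""
    n = len(docs)
    px = sum(x in d for d in docs) / n
    py = sum(y in d for d in docs) / n
    pxy = sum(x in d and y in d for d in docs) / n
    return math.log(pxy / (px * py)) if pxy > 0 else float("-inf")

def expand(seeds, candidates, docs, threshold=0.0):
    """Greedily move the candidate most associated with C1 into C1, one at a time."""
    c1 = list(seeds)
    c2 = [c for c in candidates if c not in seeds]
    while c2:
        # Association with C1 = best PMI against any current member of C1.
        best = max(c2, key=lambda c: max(pmi(c, s, docs) for s in c1))
        if max(pmi(best, s, docs) for s in c1) <= threshold:
            break
        c1.append(best)
        c2.remove(best)
    return c1
```

Starting from `['camcorder']`, the loop pulls in "hd", "canon hv20" and "cameras" and leaves "electronics forum", "somethin" and "offtopic" behind, since they never co-occur with anything in C1 in this toy corpus.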
Are the key phrases we extracted topical and good indicators of what the content is about?
  If they are, they should act as effective index/search phrases and return relevant content
Evaluation application: Targeted Content Delivery
Key Phrases and Aboutness Evaluations
12K posts from MySpace and Facebook Electronics forums
Baseline phrases: Yahoo Term Extractor
Our method phrases: key phrase extraction, elimination
Targeted content from Google AdSense
Targeted Content Delivery -Evaluations
Targeted Content for all content vs. extracted key phrases
User Studies and Results
TFIDF + social contextual cues yield more useful phrases that preserve social perceptions
Corpus + seeds from a domain knowledge base eliminate off-topic phrases effectively
Impact and Contributions
Intention Mining
On social networks
Use case for this talk
  Targeted content = content-based advertisements
  Target = user profiles
Content-based advertisements (CBAs)
  Well-known monetization model for online content
Targeted Content Delivery via Intention Mining
Circa 2009: Content-based Ads
Circa 2009: Ads on Profiles
Interests do not translate to purchase intents
  Interests are often outdated..
  Intents are rarely stated on a profile..
Cases that do seem to work
  New store openings, sales
  Highly demographic-targeted ads
What is going on here
Intents in User
Content Ads Outside Profiles
Non-trivial
Non-policed content
  Brand image, unfavorable sentiments
People are there to network
  User attention to ads is not guaranteed
Informal, casual nature of content
  People are sharing experiences and events
  Main message overloaded with off-topic content
Targeted Content-based Advertising
I NEED HELP WITH SONY VEGAS PRO 8!! Ugh and i have a video project due tomorrow for merrill lynch :(( all i need to do is simple: Extract several scenes from a clip, insert captions, transitions and thats it. really. omgg i cant figure out anything!! help!! and i got food poisoning from eggs. its not fun. Pleasssse, help? :(
Learning from Multi-topic Web Documents for Contextual Advertisement, Zhang, Y., Surendran, A. C., Platt, J. C., and Narasimhan, M., KDD 2008
Identifying intents behind user posts on social networks
  Identify content with monetization potential
Identifying keywords for advertising in user-generated content
  Considering interpersonal communication & off-topic chatter
Preliminary Results in
Investigations
User studies
  Hard to compare activity-based ads to s.o.t.a.
  Impressions to clickthroughs
How well are we able to identify monetizable posts?
How targeted are ads generated using our keywords vs. the entire user-generated content?
Scribe Intent not same as Web Search Intent [1]
People write sentences, not keywords or phrases
Presence of a keyword does not imply navigational / transactional intents
  "I am thinking of getting X" (transactional)
  "I like my new X" (information sharing)
  "what do you think about X" (information seeking)
[1] B. J. Jansen, D. L. Booth, and A. Spink, Determining the informational, navigational, and transactional intent of web queries, Inf. Process. Manage., vol. 44, no. 3, 2008.
Identifying Monetizable Intents
Action patterns surrounding an entity
  How questions are asked, and not topic words that indicate what the question is about
  "where can I find a chotto psp cam"
    User post also has an entity
From X to Action Patterns
Set of user posts from SNSs
Not annotated for presence or absence of any intent
Conceptual Overview: Bootstrapping to learn IS patterns
Generate a universal set of n-gram patterns; freq > f
S = set of all 4-grams; freq > 3
Bootstrapping to learn IS patterns
Generate set of candidate patterns from seed words (why, when, where, how, what)
Sc = all 4-grams in S that extract seed words
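The construction of S and Sc can be sketched as follows. Two assumptions are made: tokens are lowercased and split on whitespace, and "extract seed words" is read as "contain at least one seed word". The function names are hypothetical.

```python
from collections import Counter

SEED_WORDS = {"why", "when", "where", "how", "what"}

def four_grams(post):
    """All consecutive 4-word tuples in a post (empty for short posts)."""
    tokens = post.lower().split()
    return [tuple(tokens[i:i + 4]) for i in range(len(tokens) - 3)]

def universal_set(posts, min_freq=3):
    """S: all 4-grams whose corpus frequency exceeds `min_freq`."""
    counts = Counter(g for post in posts for g in four_grams(post))
    return {g for g, c in counts.items() if c > min_freq}

def candidate_set(patterns):
    """Sc: patterns from S that contain at least one seed word."""
    return {g for g in patterns if SEED_WORDS & set(g)}
```

A pattern such as ("does", "anyone", "know", "how") enters Sc once it occurs more than three times in the posts; a user then picks the seed patterns for Sis from Sc.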
User picks 10 seed patterns from Sc
Sis = "does anyone know how", "where do I find", "someone tell me where"
Gradually expand Sis by adding Information Seeking patterns from Sc
For every pis in Sis, generate set of filler patterns
.* anyone know how
does .* know how
does anyone .* how
does anyone know .*
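The four filler patterns above come from replacing one word at a time with a wildcard. A minimal sketch, with hypothetical function names, that also recovers the word filling the slot in a candidate 4-gram:

```python
import re

def filler_patterns(pattern):
    """Replace each word of the pattern, one at a time, with the wildcard `.*`."""
    words = pattern.split()
    return [" ".join(words[:i] + [".*"] + words[i + 1:])
            for i in range(len(words))]

def fillers(filler, ngram):
    """Return the word that fills the wildcard slot, or None if `ngram` does not fit."""
    regex = " ".join(r"(\S+)" if w == ".*" else re.escape(w)
                     for w in filler.split())
    m = re.fullmatch(regex, ngram)
    return m.group(1) if m else None
```

For example, matching "does .* know how" against "does someone know how" yields the filler "someone", which is the word the scoring step then checks for functional compatibility and empirical support.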
Extracting and Scoring Patterns
does * know how
  does someone know how
    Functional compatibility: impersonal pronouns
    Empirical support: 1/3
  does somebody know how
    Functional compatibility: impersonal pronouns
    Empirical support: 0
    Pattern retained
  does john know how
    Pattern discarded
Sc = {does anyone know how, where do I find, someone tell me where}
pis = "does anyone know how"
Functional properties / communicative functions of words
From a subset of LIWC [1]
  cognitive mechanical (e.g., if, whether, wondering, find)
    "I am thinking about getting X"
  adverbs (e.g., how, somehow, where)
  impersonal pronouns (e.g., someone, anybody, whichever)
    "Someone tell me where can I find X"
[1] Linguistic Inquiry Word Count, LIWC, http://liwc.net
Expanding the Pattern Pool
Over iterations, single-word substitutions, functional usage and empirical support conservatively expand Sis
Infusing new patterns and seed words
Stopping conditions
Details in [WISE2009] for..
Sample Extracted Patterns
Information seeking patterns generated offline
Information seeking intent score of a post
  Extract and compare patterns in posts with extracted patterns
Transactional intent score of a post
  LIWC Money dictionary: 173 words and word forms indicative of transactions, e.g., trade, deal, buy, sell, worth, price etc.
Identifying Monetizable Posts
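The two scores can be sketched as simple counts. Everything specific here is an assumption: the three patterns stand in for the learned pattern set, the six money words are the examples from the slide and not the full LIWC Money dictionary, and the rule that a post is monetizable when either score is non-zero is a placeholder for whatever combination the authors used.

```python
# Assumed inputs: a few learned patterns and a small sample of money words.
# The real LIWC Money dictionary (173 entries) is not reproduced here.
IS_PATTERNS = ["does anyone know how", "where do i find", "someone tell me where"]
MONEY_WORDS = {"trade", "deal", "buy", "sell", "worth", "price"}

def information_seeking_score(post):
    """Number of learned information-seeking patterns that occur in the post."""
    text = " ".join(post.lower().split())
    return sum(p in text for p in IS_PATTERNS)

def transactional_score(post):
    """Number of tokens in the post that appear in the money word list."""
    tokens = [t.strip(".,!?") for t in post.lower().split()]
    return sum(t in MONEY_WORDS for t in tokens)

def is_monetizable(post):
    # Assumption: a post qualifies if either score is non-zero.
    return information_seeking_score(post) > 0 or transactional_score(post) > 0
```

A post like "Does anyone know how to buy a camcorder at a good price?" scores on both dimensions, while "i got food poisoning from eggs" scores on neither.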
Identifying keywords in monetizable posts
  Plethora of work in this space
  Off-topic noise removal is our focus
"I NEED HELP WITH SONY VEGAS PRO 8!! Ugh and i have a video project due tomorrow for merrill lynch :(( all i need to do is simple: Extract several scenes from a clip, insert captions, transitions and thats it. really. omgg i cant figure out anything!! help!! and i got food poisoning from eggs. its not fun. Pleasssse, help? :("
Keywords for Advertising
Monday, June 6, 2011
Identifying keywords in monetizable posts Plethora of work in this spaceOff-topic noise removal is our focus I NEED HELP WITHSONY VEGAS PRO 8!! Ugh and
ihave a video project due tomorrow for merrilllynch :(( all ineed to do is simple: Extract several scenes from a clip, insert captions, transitions and thatsit. really. omggicant figure out anything!! help!! and igot food poisoning from eggs. its not fun. Pleasssse, help? :(
Keywords for Advertising
Monday, June 6, 2011
Conceptual Overview (also see slides 88,89)
Topical hints: C1 - ['camcorder']
Keywords in post: C2 - ['electronics forum', 'hd', 'camcorder', 'somethin', 'ive', 'canon', 'little camera', 'canon hv20', 'cameras', 'offtopic']
Move strongly related keywords from C2 to C1 one-by-one
Relatedness determined using information gain
Using the Web as a corpus, domain independent

Off-topic Chatter
C1 - ['camcorder']
C2 - ['electronics forum', 'hd', 'camcorder', 'somethin', 'ive', 'canon', 'little camera', 'canon hv20', 'cameras', 'offtopic']
Informative words: ['camcorder', 'canon hv20', 'little camera', 'hd', 'cameras', 'canon']
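The one-by-one move from C2 to C1 can be sketched as a greedy loop. The pairwise scores below are invented for illustration; in the approach described above they would come from information gain computed with the Web as a corpus. The threshold and the helper names are likewise assumptions.

```python
# Hypothetical pairwise relatedness scores in [0, 1].
RELATEDNESS = {
    frozenset(["camcorder", "canon hv20"]): 0.8,
    frozenset(["camcorder", "cameras"]): 0.7,
    frozenset(["camcorder", "hd"]): 0.6,
    frozenset(["camcorder", "little camera"]): 0.6,
    frozenset(["canon hv20", "canon"]): 0.9,
}


def relatedness(keyword, topical):
    """Strongest score between keyword and any keyword already in the topical set."""
    return max(
        (RELATEDNESS.get(frozenset([keyword, t]), 0.0) for t in topical),
        default=0.0,
    )


def filter_off_topic(c1, c2, threshold=0.5):
    """Move keywords from c2 to c1 one at a time, strongest first."""
    topical = list(c1)
    remaining = [k for k in c2 if k not in topical]
    while remaining:
        best = max(remaining, key=lambda k: relatedness(k, topical))
        if relatedness(best, topical) < threshold:
            break  # everything left is treated as off-topic chatter
        topical.append(best)
        remaining.remove(best)
    return topical


c1 = ["camcorder"]
c2 = ["electronics forum", "hd", "camcorder", "somethin", "ive", "canon",
      "little camera", "canon hv20", "cameras", "offtopic"]
informative = filter_off_topic(c1, c2)
print(informative)
```

Because relatedness is re-evaluated against the growing set, 'canon' is only admitted after 'canon hv20' has joined C1, which is the point of moving keywords one at a time.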
Monday, June 6, 2011
Evaluations - User Study
Keywords from 60 monetizable user posts
Monetizable intent, at least 3 keywords in content
45 MySpace Forums, 15 Facebook Marketplace
30 graduate students
10 sets of 6 posts each
Each set evaluated by 3 randomly selected users
Monetizable intents?
All 60 posts voted as unambiguously information seeking in intent

1. Effectiveness of using topical keywords
Google AdSense ads for user post vs. extracted topical keywords

Instructions - User Study

Result - 2X Relevant Impressions
Users picked ads relevant to the post
At least 50% inter-evaluator agreement
For the 60 posts: total of 144 ad impressions, 17% of ads picked as relevant
For the topical keywords: total of 162 ad impressions, 40% of ads picked as relevant

2. Profile Ads vs. Activity Ads
User's profile information: interests, hobbies, TV shows.. (non-demographic information)
Submit a post: looking to buy and why (induced noise)
Ads that generate interest, captured attention

Result - 8X Generated Interest
Using profile ads: total of 56 ad impressions, 7% of ads generated interest
Using authored posts: total of 56 ad impressions, 43% of ads generated interest
Using topical keywords from authored posts: total of 59 ad impressions, 59% of ads generated interest
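The "2X" and "8X" headlines follow directly from the reported percentages. A short check, using only the figures quoted in the two result slides (the variable names are ours):

```python
# Percentages reported in the two user studies.
relevant_posts, relevant_keywords = 17, 40                    # study 1: % of ads picked as relevant
interest_profile, interest_posts, interest_keywords = 7, 43, 59  # study 2: % of ads generating interest

lift_relevance = relevant_keywords / relevant_posts          # topical keywords vs. full post
lift_posts = interest_posts / interest_profile               # authored posts vs. profile ads
lift_keywords = interest_keywords / interest_profile         # topical keywords vs. profile ads

print(round(lift_relevance, 2))  # 2.35
print(round(lift_posts, 2))      # 6.14
print(round(lift_keywords, 2))   # 8.43
```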
Monday, June 6, 2011
To note
User studies are small and preliminary, but clearly suggest:
Monetization potential in user activity
Improvement for Ad programs in terms of relevant impressions
Evaluations based on forum, marketplace
Verbose content
Status updates, notes, community and event memberships
One size may not fit all

To note
A world between relevant impressions and click-throughs
Objectionable content, vocabulary impedance, Ad placement, network behavior
In a pipeline of other community efforts
No profile information taken into account
Cannot custom send information to Google AdSense
Monday, June 6, 2011
SENTIMENT / OPINION MINING
Monday, June 6, 2011
Content Analysis: Sentiment Analysis/Opinion Mining
Two main types of information we can learn from user-generated content: fact vs. opinion
Much of what we read in social media (e.g., blogs, Twitter, Facebook) is a mix of facts and opinions.
For example, "Latest news: Mobile web services not working in #Bahrain and Internet is extremely slow #feb14 {fact}... looks like they "learned" from #Egypt {opinion}"

Sentiment Analysis Motivation
Which movie should I see?
What do customers complain about?
Why do people oppose health care reform?

Sentiment Analysis: Tasks
Example: "How awful that many #Egyptian artifacts are in danger of being destroyed. What Zahi Hawass must be thinking #jan25" (read in the tone of "what were YOU thinking")
Classification
Overall sentiment polarity: positive/neutral/negative
Example: "How awful that many #Egyptian artifacts are in danger of being destroyed." The overall polarity is negative.
Target-specific sentiment polarity: positive/neutral/negative
Example: for target "egyptian artifacts", polarity is "negative"; for target "Zahi Hawass", polarity is "neutral"
Identification & Extraction: opinion, opinion holder, opinion target
Example: opinion="awful", opinion holder="the author", target="egyptian artifacts are in danger"
opinion="must be thinking", opinion holder="the author", target="Zahi Hawass"

Sentiment Analysis: Approaches
Classification
Supervised: labeled training data; features (differ from traditional topic classification tasks); learning strategies
Unsupervised: lexicon-based approach; bootstrapping
Identification & Extraction: utilizing the relations between opinion and opinion target: proximity, syntactic dependency, co-occurrence and prepared patterns/rules
Monday, June 6, 2011
Sentiment Analysis: From Tweets to Polls
Lexicon-based approach for sentiment analysis of tweets:
Subjective lexicon from OpinionFinder (Wilson et al., 2005)
Within topic tweets, count messages containing the positive and negative words defined by the lexicon
Corpus: 0.7 billion tweets, Jan 2008 - Oct 2009; 1.5 billion tweets, Jan 2008 - May 2010
B. O'Connor, R. Balasubramanyan, B. R. Routledge, and N. A. Smith. From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series. In Intl. AAAI Conference on Weblogs and Social Media, Washington, D.C., 2010.
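The counting step described above can be sketched in a few lines of Python. The lexicon here is a tiny hypothetical one (the OpinionFinder lexicon is not reproduced), topic membership is reduced to a single keyword match, and summarising the counts as a positive-to-negative ratio is an assumption about how the counts might be used.

```python
import re

# Tiny hypothetical lexicon standing in for the OpinionFinder subjectivity lexicon.
POSITIVE = {"good", "great", "hope", "support"}
NEGATIVE = {"bad", "awful", "fear", "oppose"}


def sentiment_counts(tweets, topic):
    """Count topic tweets containing at least one positive / negative word."""
    positive = negative = 0
    for tweet in tweets:
        words = set(re.findall(r"[a-z']+", tweet.lower()))
        if topic not in words:
            continue  # not a topic tweet
        # A single tweet may count as both positive and negative.
        positive += bool(words & POSITIVE)
        negative += bool(words & NEGATIVE)
    return positive, negative


tweets = [
    "Great speech by Obama tonight",
    "I fear Obama will oppose this",
    "Obama: good plan, bad timing",
    "Hope Obama wins",
    "great weather today",
]
positive, negative = sentiment_counts(tweets, "obama")
ratio = positive / negative if negative else float("inf")
print(positive, negative, ratio)
```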
Monday, June 6, 2011
Sentiment Analysis: Predicting the Future With Social Media
Corpus: 2.89 million tweets referring to 24 movies released over a period of three months
Sentiment Analysis Classifier:
DynamicLMClassifier provided by the LingPipe linguistic analysis package
Thousands of workers from Amazon Mechanical Turk assigned sentiments (positive, negative, neutral) to a large random sample of tweets
The classifier was trained using an n-gram model
S. Asur and B. Huberman. Predicting the Future With Social Media. 2010. http://arxiv.org/abs/1003.5699
Monday, June 6, 2011
Sentiment Analysis: Target-specific Opinion Identification & Classification of Tweets - Unsupervised Approach
Observations:
The opinions may not contribute toward the given target (1,2,3,6)
The subjectivity and polarity of opinion clues are domain-dependent (5,7)
Single words are not enough (4,7,8)
Simple lexicon-based method doesn't work.

Sentiment Analysis: Target-specific Opinion Identification & Classification of Tweets - Unsupervised Approach
General subjective lexicon: commonly used subjective lexicon + popular slang learned from Urban Dictionary
Domain-dependent sentiment lexicon: learned from domain-specific corpus; bootstrapping
More than words (word/phrase/pattern): n-gram + statistical model
Target-specific opinion identification/extraction: shallow syntactic analysis; rules + proximity
Monday, June 6, 2011
Content Analysis: Context Extraction, Utilization
URL Extraction is for Tweets
FourSquare in Facebook, Twitter
What is it in other mediums/SMS?

Content Analysis: URL extraction
Resolution
Semantic Context
Relevance

Author Categorization: Using Content to derive additional People metadata
Personality Signals
Blogs, Style of Writing
Psychometric analysis of content
Sample study: Gendered writing styles online
Monday, June 6, 2011
People Analysis: Using Network to derive People metadata
Interesting questions to ask:
Who are the most popular people* in the network?
Who are the most influential people in the network?
Who are the most active people in the network?
What are the types of people in communities of the network?
Who are the bridges between communities in the network?

People Analysis: Influence
By Link Analysis Algorithms: HITS [K-99] & variants, PageRank [BP-97] & variants, etc.
Links not sufficient! Million Follower Fallacy [C-10]
Flavor of Context Analysis (activity level)
Popularity NOT = Influence! Influence & Passivity [RGAH-10]
Interest Similarity: TwitterRank: Reciprocity & Homophily [WLJH-10]
Klout Score - True Reach, Amplification [Klout]
Source: informing-arts
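As a concrete instance of the link-analysis family mentioned above, here is a minimal power-iteration PageRank in plain Python. The follower graph, the damping factor of 0.85 and the fixed iteration count are illustrative assumptions; as the slides stress, link structure alone is not a sufficient measure of influence.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {node: [nodes it links to]}."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical "follows" graph: an edge u -> v means u follows v.
follows = {
    "alice": ["carol"],
    "bob": ["carol"],
    "carol": ["alice"],
    "dave": ["carol"],
}
ranks = pagerank(follows)
print(max(ranks, key=ranks.get))
```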
Monday, June 6, 2011
People Analysis: User types & Affiliation
Blogger, Scientist, Journalist, Artist, Trustee, Company X in Domain Y..
Multiple types and affiliations!
User interest mining: key phrase extraction followed by semantic association on user bio, tweets, lists, favorite posts
Twitter Study [BCDMJNRM-09]
Source: kahunainstitute.com

People Analysis: User types & Affiliation
Semantic analysis of profile description
Web Presence: use of Web & knowledge bases (Wikipedia, blogs) to build context for user types
Entity Spotting & Extraction, followed by Semantic Association and Similarity with user-type context

People Analysis: Social Engagement
Frequency distribution analysis of user activity: posting, retweet, reply, mentions, lists, etc.
Source: http://www.syscomminternational.com/
Monday, June 6, 2011
Network Analysis
Interesting questions to ask:
How do communities form around topics: growth & evolution
What are the effects of the presence of influential participants in the communities
What are the effects of the nature of the content (or sentiment, opinions) flowing in the network on the community life
What is the community structure: degree of separation and sub-communities
Foundation of network: Nodes, Connections/Relationships

Network Analysis: Methods
Network structure metrics: Centrality, Connected Component, Avg. Degree, Clustering Coefficient, Avg. Path Length, Bridge, Cohesion, Prestige, Reciprocity
Important Literature: [AB-02, WS-98, BW-00; NW-06, WF-92, MW-10]
Source: http://www.kudos-dynamics.com/

Network Analysis: Algorithms
Community discovery, growth, evolution
Based on relationship types (e.g., signed network), geography/location, etc.
Hierarchical clustering algorithms: top-down, bottom-up
Modularity Maximization [NW-06]
Algorithms comparison survey [B-06]

Network Analysis: Algorithms
Graph Partitioning & Traversal
Best time-complexity & reachability
Follow greedy paths
K-way multilevel partitioning, Bron-Kerbosch, K-plex, K-core or N-cliques, DFS, BFS, MST
"We dream in Graph and We analyze in Matrix" - Barry Wellman, INSNA

Network Analysis: Methods
Network Modeling Approaches
Random graph model (Erdos-Renyi model)
Small-world model (Small World Phenomenon)
Scale-free model (led to Power-Law degree distribution)
Social Network Analysis methods
Centrality (Degree, Eigenvector, Betweenness, Closeness)
Clusters (Cliques and extensions, Communities)
Source: http://www.kudos-dynamics.com/
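Three of the structure metrics named above (degree centrality, average degree, and the local clustering coefficient) are simple enough to compute by hand. The sketch below uses plain Python on a small undirected toy graph; the graph and the function names are ours, chosen for illustration.

```python
from itertools import combinations


def build_graph(edges):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def degree_centrality(graph):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(graph)
    return {node: len(neighbours) / (n - 1) for node, neighbours in graph.items()}


def average_degree(graph):
    return sum(len(neighbours) for neighbours in graph.values()) / len(graph)


def clustering_coefficient(graph, node):
    """Fraction of pairs of a node's neighbours that are themselves connected."""
    neighbours = graph[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    linked = sum(1 for a, b in combinations(neighbours, 2) if b in graph[a])
    return linked / (k * (k - 1) / 2)


graph = build_graph([("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")])
print(degree_centrality(graph)["c"], average_degree(graph), clustering_coefficient(graph, "c"))
```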
Monday, June 6, 2011
Network Analysis: Diffusion & Homophily
Information Flow: Diffusion
Maximizing spread (opinion, innovation, recommendation)
Outbreak detection (e.g., disease)
Social Network: no info about user action
Understanding dynamics is challenging!
Power Law distribution [LAH-07]
Factors impacting flow: sampling strategy, user homophily, content nature [CLSCK-10, NPS-10]

Querying

Analysis & Visualization Tools
NWB (Network WorkBench)
Truthy
Graph-tool
Orange
Pajek
Tulip
http://en.wikipedia.org/wiki/social_network_analysis_software
Source: http://truthy.indiana.edu/
Monday, June 6, 2011
Event Detection
Monday, June 6, 2011
Citizen Sensing in Real-time
Monday, June 6, 2011
Real-Time Motivation
People can't wait for information
500 years ago: single lifetime
20 years ago: next day or two (television, newspapers)
Presently: minutes are not considered fast enough (digital media, social media)

Real-Time Social Media
Is Real-Time the future of the Web?
Social Media for Real-Time Web
Examples:
Disaster Management: Ushahidi
Real-Time Markets
Brand Tracking: Twarql
Movie reviews

Scenario
Journalist
The Guardian, Feb 2010

Challenges
Information Overload: can we aggregate, organize and collectively analyze data?
Real Time: can we deliver the data as it is generated?

A Semantic Web Approach
Expressive description of information need: using SPARQL (instead of traditional keyword search)
Flexibility on the point of view: ability to "slice and dice" the data in several dimensions: thematic, spatial, temporal, sentiment, etc.
Streaming data with background knowledge: enables automatic evolution and serendipity
Scalable real-time delivery: using sparqlPuSH (SFSW'10)
Monday, June 6, 2011
Concept Feed

Architecture

Social Sensor Server

Metadata Extractions (Social Sensor Server)
Named Entity Recognition
2 million entities from DBpedia
Loaded as a trie for efficiency
N-grams matched
Example: Obama, Barack Obama
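The trie-based n-gram matching mentioned above can be sketched with nested dictionaries keyed by word. This is an illustrative toy, not the Social Sensor Server code: the three entity names, the tokenizer and the longest-match policy are assumptions.

```python
import re

END = object()  # sentinel key marking the end of an entity name


def build_trie(entities):
    """Word-level trie: each level maps a lower-cased token to a child dict."""
    root = {}
    for name in entities:
        node = root
        for token in name.lower().split():
            node = node.setdefault(token, {})
        node[END] = name
    return root


def spot_entities(text, trie):
    """Scan left to right, keeping the longest entity that starts at each position."""
    tokens = re.findall(r"\w+", text.lower())
    found = []
    i = 0
    while i < len(tokens):
        node, j = trie, i
        match, match_end = None, i
        while j < len(tokens) and tokens[j] in node:
            node = node[tokens[j]]
            j += 1
            if END in node:
                match, match_end = node[END], j
        if match is None:
            i += 1
        else:
            found.append(match)
            i = match_end
    return found


trie = build_trie(["Obama", "Barack Obama", "Health care reform"])
spotted = spot_entities("Barack Obama spoke about health care reform; Obama left.", trie)
print(spotted)
```

Preferring the longest match is what lets "Barack Obama" win over the shorter "Obama" when both are present in the entity list.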
Monday, June 6, 2011
Metadata Extractions (Social Sensor Server)
URL, HashTag Extraction
Regex extraction
Resolution
URL Resolution: follows HTTP redirects for resolution
HashTag Resolution: Tagdef, Tagal, WTHashTag.com
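The regex extraction step can be sketched with Python's standard `re` module. The two patterns are deliberately simple assumptions (for instance, a URL is taken to run to the next whitespace, so trailing punctuation would be captured), and the resolution steps are not covered.

```python
import re

URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#(\w+)")


def extract_metadata(tweet):
    """Return the URLs and hashtags found in a tweet."""
    urls = URL_RE.findall(tweet)
    # Blank out URLs first so a fragment such as ".../page#top" is not read as a hashtag.
    text_without_urls = URL_RE.sub(" ", tweet)
    hashtags = HASHTAG_RE.findall(text_without_urls)
    return {"urls": urls, "hashtags": hashtags}


metadata = extract_metadata(
    "Fingers crossed for the upcoming #hcrvote http://example.org/a#frag #Ohio"
)
print(metadata)
```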
Monday, June 6, 2011
Metadata Extractions (Social Sensor Server)
Other metadata provided by Twitter
User profile: user name, location, time, etc.
Tweet: RT, reply, etc.

Structured Data (Social Sensor Server)
RDF Annotation
Common RDF/OWL vocabularies
FOAF (foaf-project.org): Friend of a Friend
SIOC (sioc-project.org): Semantically Interlinked Online Communities
OPO (online-presence.net): Online Presence Ontology
MOAT (moat-project.org): Meaning Of A Tag

Structured Data (Social Sensor Server)
A snippet of the annotation:
rdf:type sioct:MicroblogPost ;
sioc:content "Fingers crossed for the upcoming #hcrvote" ;
sioc:has_creator ;
foaf:maker ;
moat:taggedWith dbpedia:Healthcare_reform .
geonames:locatedIn dbpedia:Ohio .
Monday, June 6, 2011
Semantic Publisher
Virtuoso to store triples
Queries formulated by the users are stored
SPARQL protocol over HTTP to access RDF from the store
Combine data from tweets with the background knowledge in the RDF store

Application Server & Distribution Hub
Distribution Hub
PUSH model: PubSubHubbub protocol
Pushes the tweets to the Application Server
Application Server
Delivers data to the clients
RSS-enabled concept feeds
Monday, June 6, 2011
Brand Tracking - Example
?tweet moat:taggedWith ?competitor
?competitor skos:subject ?category
dbpedia:IPad skos:subject ?category
Background Knowledge (e.g. DBpedia): category:Wi-Fi, category:Touchscreen
HP TabletPC, IPhone
@anonymized Lorem ipsum bla bla this is an example tweet
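The brand-tracking pattern above reads naturally as a SPARQL query. The sketch below writes that query out as a string (prefix declarations omitted) and then evaluates the same three triple patterns by hand over a toy in-memory triple set, so it runs without a triple store. The query text is our reading of the diagram, and all identifiers in `TRIPLES` are hypothetical.

```python
# Our reading of the diagram as a SPARQL query (prefix declarations omitted).
QUERY = """
SELECT ?tweet ?competitor WHERE {
  dbpedia:IPad  skos:subject     ?category .
  ?competitor   skos:subject     ?category .
  ?tweet        moat:taggedWith  ?competitor .
}
"""

# Hypothetical toy data: background knowledge plus annotated tweets.
TRIPLES = {
    ("dbpedia:IPad", "skos:subject", "category:Touchscreen"),
    ("dbpedia:IPad", "skos:subject", "category:Wi-Fi"),
    ("dbpedia:IPhone", "skos:subject", "category:Touchscreen"),
    ("dbpedia:HP_TabletPC", "skos:subject", "category:Wi-Fi"),
    ("dbpedia:Toaster", "skos:subject", "category:Kitchen"),
    ("tweet:1", "moat:taggedWith", "dbpedia:IPhone"),
    ("tweet:2", "moat:taggedWith", "dbpedia:Toaster"),
    ("tweet:3", "moat:taggedWith", "dbpedia:IPad"),
}


def find_competitor_tweets(triples, brand):
    """Tweets tagged with an entity that shares a category with the brand."""
    categories = {o for s, p, o in triples if s == brand and p == "skos:subject"}
    competitors = {
        s for s, p, o in triples
        if p == "skos:subject" and o in categories and s != brand
    }
    return sorted(
        (s, o) for s, p, o in triples
        if p == "moat:taggedWith" and o in competitors
    )


matches = find_competitor_tweets(TRIPLES, "dbpedia:IPad")
print(matches)
```

Note that the query as written would also bind `?competitor` to the brand itself; the Python version excludes it explicitly, which in SPARQL would take an additional FILTER.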
Monday, June 6, 2011
1242 articles from NYTimes
Around 800,000 tweets
"President Obama lays out plan for Health care reform in Speech to Joint Session of Congress" (10th Sept, Timeline.com)
"Obama taking an active role in Health talks in pursuing his proposed overhaul of health care system." (13th Aug, NYTimes)

Twarql on Linked Open Data
Monday, June 6, 2011
Emerging Research Areas
Monday, June 6, 2011
Spam in Social Networks
Reasons for spamming include:
Gaining popularity: use of popular topic-related keywords (e.g., hashtags of trending topics) to propagate something off topic.
Launching malicious attacks: phishing attacks, viruses, malware, etc.
Misleading the masses: propagating false information [MM-10].

Spam in Social Networks
Gaining popularity using trending keywords: this tweet uses #Cairo but refers to a fashion website.
Egypt Protests

Spam in Social Networks
Spam detection
Content-based features: content size, URL type, spam words
Metadata-based features: account information, behavior
Network-based features: provenance (e.g., content from a reliable source)
Monday, June 6, 2011
Trust in Social Networks
Reputation, Policy, Evidence, and Provenance are used to derive trustworthiness.
Illustrative examples of online cues used for trust assessment:
Wikipedia: article size, number of references, author, edit history, age of the article, edit frequency, etc.
Product Reviews: number of helpful / very helpful ratings, author expertise, sentiments in comments received for a review, etc.

Trust in Social Networks
We propose a trust ontology [AHTS-10] that:
Captures semantics of trust.
Enables representation of and reasoning with trust.
Semantics of trust specifies, for a given trustor and trustee, the following features:
Type - type of trust relationship.
Scope - context of the trust relationship.
Value - quantifies the trust relationship.

Trust in Social Networks
Gleaning primitive (edge) trust: trust value between two nodes is quantified using numbers, e.g., [0,1] or [-1,1], or a partial ordering [TAHS-09].
Gleaning composite (path) trust: propagation via chaining and aggregation (transitivity)
Some popular algorithms for trust computation: EigenTrust, Spreading Activation, SUNNY, etc.
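Chaining and aggregation can be illustrated with a small recursive function. The choices made here are assumptions for the sake of the example, not the method of [AHTS-10] or [TAHS-09]: edge trust values lie in [0,1], chaining multiplies the values along a path, and aggregation takes the best value over all simple paths.

```python
def path_trust(graph, source, target, visited=None):
    """Best composite trust from source to target, or None if no path exists.

    graph maps a trustor to {trustee: edge trust in [0, 1]}.
    Chaining = product along a path; aggregation = maximum over simple paths.
    """
    if source == target:
        return 1.0
    visited = (visited or set()) | {source}
    best = None
    for neighbour, value in graph.get(source, {}).items():
        if neighbour in visited:
            continue  # avoid cycles
        rest = path_trust(graph, neighbour, target, visited)
        if rest is None:
            continue
        candidate = value * rest
        if best is None or candidate > best:
            best = candidate
    return best


# Hypothetical trust network.
trust = {
    "alice": {"bob": 0.9, "carol": 0.5},
    "bob": {"dave": 0.8},
    "carol": {"dave": 1.0},
}
composite = path_trust(trust, "alice", "dave")
print(composite)
```

With these numbers the path through bob (0.9 x 0.8) beats the path through carol (0.5 x 1.0). Enumerating all simple paths is exponential in the worst case, so this only suits small graphs.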
Monday, June 6, 2011
Integrating Social and Sensor Networks
Machine sensor observations are quantitative in nature, while human observations can be both qualitative and quantitative.
Benefits of combining observations from humans and machine sensors:
Complementary evidence
Corroborative evidence

Integrating Social and Sensor Networks
Applications of integrating heterogeneous sensor observations:
Situation awareness by using human observations to interpret machine sensor observations.
Enhancing trustworthiness using corroborative evidence.

Mobile Social Computing
Instant Discovery: Geo-tagging and location-aware services, in combination with search, have made discovery a two-way street.
Compressed Expression: Mobile makes social networking even more compelling
Outsourced Memory: Cloud-based servers to store all of their mobile applications and databases
Automated Decisions: Smart apps help us make faster decisions, or even make decisions for us
Peer Power: Mobiles can create social movements based on peer influence

Mobile Social Computing (Cont.)
Personalized Branding: advertising is rapidly becoming personalized based on an individual's needs and preferences
Mobiles in social development: becoming an integral part of development
Coordination in disaster situations
Health care delivery, especially in developing countries
Elections and other forms of political expression
Monday, June 6, 2011
Research Application: Twitris
Monday, June 6, 2011
Twitris - Motivation
1. Information Overload
Multiple events around us: WHAT to be aware of
Multiple storylines about the same event!
2. Evolution of citizen observation with location and time
3. Semantics of social perceptions
What is being said about an event (theme), where (spatial), when (temporal)
Twitris lets you browse citizen reports using social perceptions as the fulcrum

Twitris: Semantic Social Web Mash-up
Facilitates understanding of multi-dimensional social perceptions over SMS, tweets, multimedia Web content, electronic news media

Twitris: Architecture

Twitris: Functional Overview

Twitris: Event Summarization 1

Twitris: Event Summarization 2
Sentiment analysis using statistical and machine learning techniques

Twitris: Event Summarization 3
Entity-relationship graph using semantically annotated DBpedia entities mentioned in the tweets

Twitris: Demo, Quick Show
http://twitris.knoesis.org/
http://knoesis1.wright.edu/sidfot/

Twitris: Ongoing work

Twitris: Knowledge-Enabled Computing
Domain models to enhance understanding of the content

Twitris: Coordination
Great role in military and NGO rescue operations during emergencies: Haiti and Chile earthquakes
Coordinating needs and resources in disaster situations
Analyze SMS and Web reports from disaster location
Use domain models for efficient and timely coordination

Twitris: Socio-Cultural-Behavior Model as Lens
Modeling relationships between social behavior, roles, social and cultural values, etc.
Monday, June 6, 2011
Collaboration
"We simply do not have enough genes to program the brain fully in advance, we must work together, extending and supporting our own intelligence with social prosthetic systems that make up for our missing cognitive and emotional capacities: Evolution has allowed our brains to be configured during development so that we are plug compatible with other humans, so that others can help us extend ourselves." - Harvard "Group Brain Project"

Beginnings
Open Source: Linux, Apache, ...
Social Networks: Facebook, Twitter, ...
Crowd Sourcing: Wikipedia, Kiva, Ushahidi, Kiirti, SwiftRiver, Sahana, ...
Collaborative Governance: Peer-to-Patent, ...

http://gomadam.org/tutorial
@namelessnerd

Popular Initiatives
Facebook + Twitter: Iran post-election protests; Tunisia, Egypt, Libya, Bahrain, ...
Ushahidi: Kenya violence; India, Lebanon, Afghanistan, and Sudan elections; Haiti earthquake; Pakistan floods
Kiirti: BBMP election monitoring; Bangalore AutoWatch

FixOurCity Process Flow
FixOurCity allows citizens to report, view and discuss civic issues in their locality.

FixOurCity Backend
Built on top of FixMyCity open-source codebase
Stage I
Report by Area/Ward and Street
Integration with Google Map
Displays ward member name/contact details
Select category of issue, description and severity
Confirmation through email to avoid misuse
Stage II/III
Normalize incoming reports to official wards and categories
Integration with Corporation website to allow auto-forwarding and updating of reports

Ushahidi Features
Information Collection: SMS (FrontlineSMS, Clickatell), Email, Web
Visualization/Interactive Mapping: Timeline, Category, Geo-spatial
Alerts: Geo-spatial
Admin: User Management, Report Moderation / Creation, Site Statistics

SwiftRiver Architecture - I
Enables filtering and verification of real-time data from channels like Twitter, SMS, Email and RSS feeds.

Kiirti Features
Kiirti allows you to set up your own instance of the Ushahidi Platform without having to install it on your own web server. It also provides pre-integrated Voice and SMS reporting capabilities within India.

Kiirti - Flywheel of Engagement
Monday, June 6, 2011
Sahana: a Free and Open Source Disaster Management system. A web based collaboration tool that addresses the common coordination problems during a disaster between Government groups, the civil society (NGOs) and the victims themselves.
Sahana Features
Monday, June 6, 2011
Sahana Features
Monday, June 6, 2011
Requests Management: Tracks requests for aid and matches them against donors who have pledged aid.Volunteer Management: Manage volunteers by capturing their skills, availability and allocation.
Sahana Features
Monday, June 6, 2011
Volunteer Management: Manage volunteers by capturing their skills, availability and allocation.
Sahana Features
Monday, June 6, 2011
Sahana Features
Monday, June 6, 2011
Sahana Features
Monday, June 6, 2011
Missing Persons Registry: Report and Search for Missing Persons.Disaster Victim Identification.Shelter Registry- Tracks the location, distribution, capacity and breakdown of victims in Shelters.
Sahana Features
Monday, June 6, 2011
Hospital Management System- Hospitals can share information on resources & needs.Organization Registry- "Who is doing What & Where". Allows relief agencies to coordinate their activities.Ticketing- Master Message Log to process incoming reports & requests.Delphi Decision Maker- Supports the decision making of large groups of Experts.
Sahana Features
Monday, June 6, 2011
Organization Registry- "Who is doing What & Where". Allows relief agencies to coordinate their activities.Ticketing- Master Message Log to process incoming reports & requests.Delphi Decision Maker- Supports the decision making of large groups of Experts.
Sahana Features
Monday, June 6, 2011
Ticketing- Master Message Log to process incoming reports & requests.Delphi Decision Maker- Supports the decision making of large groups of Experts.
Sahana Features
Monday, June 6, 2011
Delphi Decision Maker- Supports the decision making of large groups of Experts.
Sahana Features
Monday, June 6, 2011
Sahana Features
Monday, June 6, 2011
Sahana Features
Monday, June 6, 2011
Sahana Features
Monday, June 6, 2011
Mapping- Situation Awareness & Geospatial Analysis.Messaging- Sends & Receives Alerts via Email & SMS.Document Library- A library of digital resources, such as Photos & Office documents.
Sahana Features
Monday, June 6, 2011
Peer To Patent is a historic initiative by the United States Patent and Trademark Office (USPTO) that opens the patent examination process to public participation for the first time. Peer to Patent is an online system that aims to improve the quality of issued patents by enabling the public to supply the USPTO with information relevant to assessing the claims of pending patent applications.
Peer to Patent
Twitris 2.0 is a Semantic Web application that facilitates understanding of social perceptions through semantics-based processing of massive amounts of event-centric data. It addresses challenges in large-scale processing of social data while preserving its spatio-temporal-thematic properties.
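One way to picture what preserving spatio-temporal-thematic properties means is to keep every post attached to its place, time and theme, so that perceptions can be compared slice by slice. The sketch below is only an illustration under stated assumptions: the Post fields, the group_posts helper and the sample posts are all hypothetical and are not taken from Twitris itself.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Post:
    """A hypothetical citizen report; field names are illustrative only."""
    text: str
    place: str           # spatial dimension (assumed: a plain place label)
    posted_at: datetime  # temporal dimension
    theme: str           # thematic dimension (assumed: an event label)


def group_posts(posts):
    """Group post texts by (place, calendar day, theme)."""
    groups = defaultdict(list)
    for post in posts:
        key = (post.place, post.posted_at.date(), post.theme)
        groups[key].append(post.text)
    return dict(groups)


# Made-up sample data, used only to exercise the grouping.
posts = [
    Post("Roads flooded near the station", "Chennai", datetime(2011, 6, 5, 9, 30), "flood"),
    Post("Water receding downtown", "Chennai", datetime(2011, 6, 5, 18, 0), "flood"),
    Post("Shelter open at the school", "Mumbai", datetime(2011, 6, 5, 10, 15), "flood"),
]

slices = group_posts(posts)
for (place, day, theme), texts in sorted(slices.items()):
    print(place, day, theme, len(texts))
```

Grouping by calendar day is an arbitrary choice here; a real system would likely choose the spatial and temporal granularity per event.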
Twitris Architecture
Online Dispute Resolution: 30M+ pending cases in India's courts
Public Policy Reviews
Crisis Management
Effective Local Governance
Future Possibilities
http://www.nascio.org/events/2009Midyear/documents/NASCIO-KeynoteNoveck.pdf
http://citizensensing.posterous.com/
[MM-10] Eni Mustafaraj, Panagiotis Metaxas, From Obscurity to Prominence in Minutes: Political Speech and Real-Time Search, In: Proceedings of the WebSci10: Extending the Frontiers of Society On-Line (April 2010).
[AHTS-10] Pramod Anantharam, Cory A. Henson, Krishnaprasad Thirunarayan, and Amit P. Sheth, 'Trust Model for Semantic Sensor and Social Networks: A Preliminary Report', National Aerospace & Electronics Conference (NAECON), Dayton, Ohio, July 14-16, 2010.
[TAHS-09] K. Thirunarayan, Dharan K. Althuru, Cory A. Henson, and Amit P. Sheth, 'A Local Qualitative Approach to Referral and Functional Trust,' In: Proceedings of the 4th Indian International Conference on Artificial Intelligence (IICAI-09), pp. 574-588, December 2009.
References
B. O'Connor, R. Balasubramanyan, B. R. Routledge, and N. A. Smith. From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series. In International AAAI Conference on Weblogs and Social Media, Washington, D.C., 2010.
Sitaram Asur and Bernardo A. Huberman. Predicting the Future With Social Media. 2010. http://arxiv.org/abs/1003.5699
A. Sheth, Semantic Integration of Citizen Sensor Data and Multilevel Sensing: A comprehensive path towards event monitoring and situational awareness, February 17, 2009.
M. Nagarajan et al., Spatio-Temporal-Thematic Analysis of Citizen-Sensor Data - Challenges and Experiences, Tenth International Conference on Web Information Systems Engineering, Oct 5-7, 2009, Poland.
Daniel Gruhl, Meenakshi Nagarajan, Jan Pieper, Christine Robson, Amit Sheth, Multimodal Social Intelligence in a Real-Time Dashboard System, to appear in a special issue of the VLDB Journal on 'Data Management and Mining for Social Networks and Social Media', 2010.
A. Sheth, C. Thomas, and P. Mehra, Continuous Semantics to Analyze Real-Time Data, IEEE Internet Computing, November-December 2010, pp. 80-85.
[NPS-10] M. Nagarajan, H. Purohit, and A. Sheth. A Qualitative Examination of Topical Tweet and Retweet Practices, 4th Int'l AAAI Conference on Weblogs and Social Media, ICWSM 2010.
[RGAH-10] D. Romero, W. Galuba, S. Asur, and B. Huberman. Influence and Passivity in Social Media. Arxiv preprint, arXiv:1008.1253, 2010.
[LLDM-10] J. Leskovec, K. Lang, A. Dasgupta, and M. Mahoney. Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Mathematics, 6(1):29-123, 2009.
[CHBG-10] M. Cha, H. Haddadi, F. Benevenuto, and K. Gummadi. Measuring user influence in twitter: The million follower fallacy. In ICWSM'04, 2010.
[BP-98] S. Brin and L. Page. The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems, Vol 30, 1-7, 1998.
[K-99] Jon Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM 46 (5): 604-632, 1999.
[AB-02] R. Albert and A.L. Barabasi. Statistical Mechanics of Complex Networks. Rev. Modern Physics, vol. 74, no. 1, pp. 47-97, 2002.
[WLJH-10] Jianshu Weng, Ee-Peng Lim, Jing Jiang, and Qi He. TwitterRank: finding topic-sensitive influential twitterers. WSDM, 2010.
[BCDMJNRM-09] N. Banerjee, D. Chakraborty, K. Dasgupta, S. Mittal, A. Joshi, S. Nagar, A. Rai, and S. Madan. User interests in social media sites: an exploration with micro-blogs. CIKM '09.
[RCD-10] A. Ritter, C. Cherry, and B. Dolan. 2010. Unsupervised modeling of Twitter conversations. In Human Language Technologies: ACL (HLT '10).
[WS-10] D.J. Watts and S.H. Strogatz. Collective dynamics of 'small-world' networks. Nature 393 (6684): 409-10, 1998.
[NW-06] M. E. J. Newman, D. J. Watts. The Structure and Dynamics of Networks, Princeton University Press, 2006.
[WF-92] Wasserman & Faust, Social Network Analysis, 1992.
[EK-10] D. Easley, J. Kleinberg. Networks, Crowds, and Markets: Reasoning About a Highly Connected World. Cambridge University Press, 2010.
[MW-10] A. Marin and B. Wellman. Handbook of Social Network Analysis, 2010.
[B-06] H. Balakrishnan. Algorithms for Discovering Communities in Complex Networks. Ph.D. Dissertation. University of Central Florida, Orlando, FL, USA. Advisor(s) Narsingh Deo. 2006.
[CLSCK-10] M. D. Choudhury, Y-R. Lin, H. Sundaram, K. S. Candan, L. Xie, A. Kelliher. How Does the Sampling Strategy Impact the Discovery of Information Diffusion in Social Media? ICWSM 2010.
[LAH-07] J. Leskovec, L. A. Adamic, and B. A. Huberman. The dynamics of viral marketing. ACM Trans. Web 1, 1, Article 5, May 2007.