
Taxonomy Boot Camp Panel Text Analytics

Date post: 01-Jan-2016
Upload: yetta-benjamin
Page 1: Taxonomy Boot Camp Panel Text Analytics

Taxonomy Boot Camp Panel
Text Analytics

Tom Reamy
Chief Knowledge Architect

KAPS Group

Knowledge Architecture Professional Services

http://www.kapsgroup.com

Page 2: Taxonomy Boot Camp Panel Text Analytics

Agenda

Taxonomy and Text Analytics
– Search, Taxonomy, and Text Analytics

Case Study – Taxonomy Development
– Text Analytics as a Taxonomy tool
– Case Studies – Expertise & Sentiment & Beyond

Future of Text Analytics and Taxonomy
– Beyond Indexing - Categorization
– Sentiment, Expertise, Ontologies

Page 3: Taxonomy Boot Camp Panel Text Analytics

Taxonomy and Text Analytics
Text Analytics Features

Noun Phrase Extraction
– Catalogs with variants, rule based dynamic
– Multiple types, custom classes – entities, concepts, events
– Feeds facets

Summarization
– Customizable rules, map to different content

Fact Extraction
– Relationships of entities – people-organizations-activities
– Ontologies – triples, RDF, etc.

Sentiment Analysis
– Rules – Objects and phrases – positive and negative
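
The "catalogs with variants" and "feeds facets" items above can be illustrated with a small Python sketch. It assumes a catalog that maps each canonical entity to a facet and a list of known variants; the catalog entries, facet names, and the function name extract_facets are hypothetical and are not taken from the slides or from any particular product.

```python
import re
from collections import defaultdict

# Hypothetical catalog: canonical entity -> (facet, known variants).
CATALOG = {
    "Acme Corporation": ("organization", ["Acme Corporation", "Acme Corp", "Acme"]),
    "Jane Doe": ("person", ["Jane Doe", "J. Doe"]),
    "RDF": ("concept", ["RDF", "Resource Description Framework"]),
}


def extract_facets(text):
    """Return {facet: sorted canonical entities whose variants occur in text}."""
    facets = defaultdict(set)
    for canonical, (facet, variants) in CATALOG.items():
        for variant in variants:
            # Whole-word, case-insensitive match on each variant.
            pattern = r"\b" + re.escape(variant) + r"\b"
            if re.search(pattern, text, re.IGNORECASE):
                facets[facet].add(canonical)
                break  # one matching variant is enough for this entity
    return {facet: sorted(names) for facet, names in facets.items()}


if __name__ == "__main__":
    sample = "J. Doe joined Acme Corp to work on the Resource Description Framework."
    print(extract_facets(sample))
```

Each variant resolves to one canonical name, so the facet values stay consistent however the entity is spelled in the text.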

Page 4: Taxonomy Boot Camp Panel Text Analytics

Taxonomy and Text Analytics
Text Analytics Features

Auto-categorization
– Training sets – Bayesian, Vector space
– Terms – literal strings, stemming, dictionary of related terms
– Rules – simple – position in text (Title, body, url)
– Semantic Network – Predefined relationships, sets of rules
– Boolean
– Full search syntax – AND, OR, NOT
– Advanced – DIST (#), PARAGRAPH, SENTENCE

This is the most difficult to develop
Build on a Taxonomy
Combine with Extraction
– If any of list of entities and other words
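
The "terms" and "position in text" items above can be sketched in Python as a position-weighted term rule. This is only one possible reading: the field weights, the category term lists, the threshold, and the function names are assumptions made for illustration, and plain substring counting stands in crudely for stemming.

```python
# Hypothetical field weights: a hit in the title counts more than one in the body.
FIELD_WEIGHTS = {"title": 3.0, "url": 2.0, "body": 1.0}

# Hypothetical category term lists (literal lowercase strings).
CATEGORY_TERMS = {
    "taxonomy": ["taxonomy", "facet", "thesaurus"],
    "sentiment": ["sentiment", "opinion", "positive", "negative"],
}


def score(doc, terms):
    """Sum weighted substring counts of the terms across the document fields."""
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        text = doc.get(field, "").lower()
        total += weight * sum(text.count(term) for term in terms)
    return total


def categorize(doc, threshold=3.0):
    """Return the categories whose weighted score reaches the threshold."""
    scores = {cat: score(doc, terms) for cat, terms in CATEGORY_TERMS.items()}
    return sorted(cat for cat, s in scores.items() if s >= threshold)


if __name__ == "__main__":
    doc = {
        "title": "Building a Taxonomy",
        "url": "example.com/taxonomy-basics",
        "body": "A facet is an orthogonal dimension. Opinion varies.",
    }
    print(categorize(doc))
```

Raising the threshold trades recall for precision, so the right value depends on the application.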

Page 5: Taxonomy Boot Camp Panel Text Analytics

Case Study – Categorization & Sentiment

Page 6: Taxonomy Boot Camp Panel Text Analytics

Page 7: Taxonomy Boot Camp Panel Text Analytics

Search, Taxonomy, and Text Analytics
Elements

Multiple Knowledge Structures
– Facet – orthogonal dimension of metadata
– Taxonomy - Subject matter / aboutness
– Categorization, clusters, entity extraction into facets

A Hybrid Model of ECM and Metadata
– Authors, editors-librarians, Text Analytics
– Submit a document -> TA generates metadata, extracts concepts, suggests categorization (keywords) -> author OK’s (easy task) -> librarian monitors for issues
– Use results as input into analytics

And/or Dynamic categorization-extraction at results time

Page 8: Taxonomy Boot Camp Panel Text Analytics

Page 9: Taxonomy Boot Camp Panel Text Analytics

Page 10: Taxonomy Boot Camp Panel Text Analytics

Search, Taxonomy and Text Analytics
Multiple Applications

Platform for Information Applications
– Content Aggregation
– Duplicate Documents – save millions!
– Text Mining – BI, CI – sentiment analysis
– Combine with Data Mining – disease symptoms, new
• Predictive Analytics
– Social – Hybrid folksonomy / taxonomy / auto-metadata
– Social – expertise, categorize tweets and blogs, reputation
– Ontology – travel assistant – SIRI

Use your Imagination!

Page 11: Taxonomy Boot Camp Panel Text Analytics

Taxonomy and Text Analytics
Case Study – Taxonomy Development

Problem – 200,000 new uncategorized documents
Old taxonomy – need one that reflects change in corpus
Text mining, entity extraction, categorization
Content – 250,000 large documents, search logs, etc.
Bottom Up - terms in documents – frequency, date
Clustering – suggested categories
Clustering – chunking for editors
Entity Extraction – people, organizations, Programming languages
Time savings – only feasible way to scan documents
Quality – important terms, co-occurring terms
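
The bottom-up step in this case study (term frequency and co-occurring terms) can be approximated with a short Python sketch. The stopword list, the tokenization, and the function names are assumptions; the slides do not say which tool or method was used, and a real corpus of this size would need something more scalable.

```python
import re
from collections import Counter
from itertools import combinations

# Small illustrative stopword list; a real project would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "is", "with"}


def terms(doc):
    """Lowercase alphabetic tokens with stopwords removed."""
    return [t for t in re.findall(r"[a-z]+", doc.lower()) if t not in STOPWORDS]


def candidate_terms(docs, top_n=5):
    """Return (top terms by document frequency, top co-occurring term pairs)."""
    doc_freq = Counter()
    pair_freq = Counter()
    for doc in docs:
        unique = sorted(set(terms(doc)))
        doc_freq.update(unique)
        # Every pair of distinct terms sharing a document counts once.
        pair_freq.update(combinations(unique, 2))
    return doc_freq.most_common(top_n), pair_freq.most_common(top_n)


if __name__ == "__main__":
    docs = ["Java threads and memory", "Java memory tuning", "Python threads"]
    top_terms, top_pairs = candidate_terms(docs)
    print(top_terms)
    print(top_pairs)
```

Frequent terms suggest candidate taxonomy nodes, and frequently co-occurring pairs hint at terms that may belong under the same node.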

Page 12: Taxonomy Boot Camp Panel Text Analytics

Case Study – Taxonomy Development

Page 13: Taxonomy Boot Camp Panel Text Analytics

Case Study – Taxonomy Development

Page 14: Taxonomy Boot Camp Panel Text Analytics

Case Study – Taxonomy Development

Page 15: Taxonomy Boot Camp Panel Text Analytics

Taxonomy and Text Analytics Applications
Expertise Analysis

Sentiment Analysis to Expertise Analysis (KnowHow)
– Know How, skills, “tacit” knowledge
Experts write and think differently
Basic level is lower, more specific
– Levels: Superordinate – Basic – Subordinate
• Mammal – Dog – Golden Retriever
• Furniture – chair – kitchen chair
Experts organize information around processes, not subjects
Build expertise categorization rules

Page 16: Taxonomy Boot Camp Panel Text Analytics

Expertise Analysis
Expertise – application areas

Taxonomy / Ontology development / design – audience focus
– Card sorting – non-experts use superficial similarities
Business & Customer intelligence – add expertise to sentiment
– Deeper research into communities, customers
Text Mining - Expertise characterization of writer, corpus
eCommerce – Organization/Presentation of information – expert, novice
Expertise location - Generate automatic expertise characterization based on documents
Experiments - Pronoun Analysis – personality types
– Essay Evaluation Software - Apply to expertise characterization
• Model levels of chunking, procedure words over content

Page 17: Taxonomy Boot Camp Panel Text Analytics

Beyond Sentiment: Behavior Prediction
Case Study – Telecom Customer Service

Problem – distinguish customers likely to cancel from mere threats
Analyze customer support notes
General issues – creative spelling, second hand reports
Develop categorization rules
– First – distinguish cancellation calls – not simple
– Second - distinguish cancel what – one line or all
– Third – distinguish real threats

Page 18: Taxonomy Boot Camp Panel Text Analytics

Beyond Sentiment
Behavior Prediction – Case Study

Basic Rule
– (START_20, (AND,
    (DIST_7, "[cancel]", "[cancel-what-cust]"),
    (NOT, (DIST_10, "[cancel]", (OR, "[one-line]", "[restore]", "[if]")))))

Examples:
– customer called to say he will cancell his account if the does not stop receiving a call from the ad agency.
– cci and is upset that he has the asl charge and wants it off or her is going to cancel his act
– ask about the contract expiration date as she wanted to cxl teh acct

Combine sophisticated rules with sentiment statistical training and Predictive Analytics
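
The basic rule above can be read as a proximity test, sketched here in Python. The slides do not define the operators, so this sketch assumes that START_20 restricts matching to the first 20 tokens of a note and that DIST_n requires two terms to lie within n tokens of each other. The term lists behind each bracketed concept are hypothetical, chosen to cover the misspellings in the example notes.

```python
import re

# Hypothetical term lists standing in for the bracketed concepts in the rule.
CONCEPTS = {
    "cancel": {"cancel", "cancell", "cxl"},
    "cancel-what-cust": {"account", "acct", "act", "service"},
    "one-line": {"line"},
    "restore": {"restore", "reinstate"},
    "if": {"if"},
}


def tokenize(text):
    """Split a note into lowercase word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def positions(tokens, concept):
    """Token indexes at which any term of the concept occurs."""
    wanted = CONCEPTS[concept]
    return [i for i, tok in enumerate(tokens) if tok in wanted]


def dist(tokens, n, concept_a, concepts_b):
    """True if a concept_a term lies within n tokens of any concepts_b term."""
    pos_a = positions(tokens, concept_a)
    pos_b = [p for c in concepts_b for p in positions(tokens, c)]
    return any(abs(a - b) <= n for a in pos_a for b in pos_b)


def likely_cancel(note):
    """Assumed reading of the slide's rule: START_20, AND, DIST_7, NOT DIST_10."""
    window = tokenize(note)[:20]  # assumed meaning of START_20
    wants_cancel = dist(window, 7, "cancel", ["cancel-what-cust"])
    hedged = dist(window, 10, "cancel", ["one-line", "restore", "if"])
    return wants_cancel and not hedged


if __name__ == "__main__":
    print(likely_cancel("ask about the contract expiration date as she wanted to cxl teh acct"))
```

Under these assumptions the first example note is rejected, because "if" falls within ten tokens of "cancell", and the third is accepted. The result for the second note depends on how START_20 is interpreted, since "cancel" appears near the 20-token boundary.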

Page 19: Taxonomy Boot Camp Panel Text Analytics

Beyond Sentiment - Wisdom of Crowds
Crowd Sourcing Technical Support

Example – Android User Forum
Develop a taxonomy of products, features, problem areas
Develop Categorization Rules:
– “I use the SDK method and it isn't to bad a all. I'll get some pics up later, I am still trying to get the time to update from fresh 1.0 to 1.1.”
– Find product & feature – forum structure
– Find problem areas in response, nearby text for solution
Automatic – simply expose lists of “solutions”
– Search Based application
Human mediated – experts scan and clean up solutions

Page 20: Taxonomy Boot Camp Panel Text Analytics

Text Analytics Development
Best Practices - Principles

Categorization taxonomy structure
– Tradeoff of depth and complexity of rules
– Multiple avenues – facets, terms, rules, etc.
• No right balance
– Recall-precision balance is application specific
– Training sets of starting points, rules rule
– Need for custom development

Different kinds of taxonomies
– Sentiment – products and features
– Expertise – process
– Categorization – smaller – power in categorization rules
– Facets – combine – more orthogonal categories
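
The recall-precision balance mentioned above can be made concrete with the standard definitions, shown here as a small Python sketch. The function name and the document-id sets are illustrative assumptions: precision is the share of documents a rule assigned to a category that truly belong there, and recall is the share of truly relevant documents that the rule found.

```python
def precision_recall(predicted, relevant):
    """predicted, relevant: sets of document ids assigned to / truly in a category."""
    true_pos = len(predicted & relevant)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # A rule tagged documents 1-4; documents 2, 3 and 5 actually belong.
    p, r = precision_recall({1, 2, 3, 4}, {2, 3, 5})
    print(f"precision={p:.2f} recall={r:.2f}")
```

Which of the two numbers matters more depends on the application, which is the point the slide makes.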

Page 21: Taxonomy Boot Camp Panel Text Analytics

Taxonomy and Text Analytics
Conclusions

Text Analytics (Entity extraction and auto-categorization, sentiment analysis) are an essential platform
Text Analytics add a new dimension to taxonomy
– Taxonomists are an essential resource – understand information structure
Enterprise Search – Hybrid ECM model with text analytics
Future – new kinds of applications:
– Text Mining and Data mining, research tools, sentiment
– Social Media – multiple sources for multiple applications
– Beyond Sentiment – expertise applications, behavior
– NeuroAnalytics – cognitive science meets taxonomy and more
• Watson is just the start

Page 22: Taxonomy Boot Camp Panel Text Analytics

Questions?

Tom Reamy

KAPS Group

Knowledge Architecture Professional Services

http://www.kapsgroup.com
