Date post: 24-Dec-2015
Best of All Worlds Text Analytics and Text Mining and Taxonomy Tom Reamy Chief Knowledge Architect KAPS Group Knowledge Architecture Professional Services http://www.kapsgroup.com


Agenda

Text Analytics Introduction
– Text Analytics
– Text Mining
Case Study – Taxonomy Development
Text Analytics, Text Mining, and Taxonomy
Text Analytics Applications – New Directions
– Search & Info Apps
– Expertise Analysis, Behavior Prediction, More
Conclusions


KAPS Group: General

Knowledge Architecture Professional Services – Network of Consultants
Partners – SAS, SAP, IBM, FAST, Smart Logic, Concept Searching
– Attensity, Clarabridge, Lexalytics
Strategy – IM & KM – Text Analytics, Social Media, Integration
Services:
– Taxonomy/Text Analytics development, consulting, customization
– Text Analytics Quick Start – Audit, Evaluation, Pilot
– Social Media: Text-based applications – design & development

Clients:
– Genentech, Novartis, Northwestern Mutual Life, Financial Times, Hyatt, Home Depot, Harvard Business Library, British Parliament, Battelle, Amdocs, FDA, GAO, etc.

Applied Theory – Faceted taxonomies, complexity theory, natural categories, emotion taxonomies

Presentations, Articles, White Papers – http://www.kapsgroup.com


Taxonomy, Text Mining, and Text Analytics
Text Analytics Features
Noun Phrase Extraction
– Catalogs with variants, rule-based dynamic
– Multiple types, custom classes – entities, concepts, events
– Feeds facets
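A catalog with variants can be sketched in a few lines: each entry maps a canonical name to its surface variants, and every match is reported under its entity type and canonical form, which is what lets extraction feed facets. This is a minimal sketch; the catalog contents and the `build_patterns` / `extract_entities` helpers are hypothetical, and rule-based (dynamic) entities are not covered.

```python
import re

# Hypothetical catalog: entity type -> canonical name -> surface variants.
CATALOG = {
    "organization": {
        "International Business Machines": ["IBM", "International Business Machines", "Big Blue"],
        "SAS Institute": ["SAS", "SAS Institute"],
    },
    "person": {
        "Tom Reamy": ["Tom Reamy", "T. Reamy"],
    },
}


def build_patterns(catalog):
    """Compile one word-bounded regex per (type, canonical) entry."""
    patterns = []
    for etype, entries in catalog.items():
        for canonical, variants in entries.items():
            # Longest variants first so "SAS Institute" is tried before "SAS".
            alts = sorted(variants, key=len, reverse=True)
            regex = r"\b(?:" + "|".join(re.escape(v) for v in alts) + r")\b"
            patterns.append((etype, canonical, re.compile(regex)))
    return patterns


def extract_entities(text, patterns):
    """Return sorted (type, canonical) pairs found anywhere in the text."""
    found = {(etype, canonical) for etype, canonical, p in patterns if p.search(text)}
    return sorted(found)
```

For "Big Blue and SAS Institute partner with T. Reamy." the result normalizes all three variants to their catalog entries, grouped by type for faceting.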

Summarization
– Customizable rules, map to different content

Fact Extraction
– Relationships of entities – people-organizations-activities
– Ontologies – triples, RDF, etc.
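Fact extraction goes a step beyond entities and records how they relate. A minimal pattern-based sketch, assuming one hypothetical relation pattern ("<Person> joined/leads/founded <Organization>"), emits subject-predicate-object triples of the kind an RDF store can hold. The predicate names are illustrative, not a standard vocabulary; production extractors rely on parsing and far richer rules.

```python
import re

# Hypothetical pattern: two capitalized words (person), a relation verb, then a
# run of capitalized words (organization).
FACT_PATTERN = re.compile(
    r"(?P<subject>[A-Z][a-z]+ [A-Z][a-z]+) (?P<verb>joined|leads|founded) "
    r"(?P<object>[A-Z][A-Za-z]+(?: [A-Z][A-Za-z]+)*)"
)

# Illustrative mapping from surface verbs to predicate names.
PREDICATES = {"joined": "memberOf", "leads": "leads", "founded": "founded"}


def extract_triples(text):
    """Return (subject, predicate, object) triples for every pattern match."""
    return [
        (m["subject"], PREDICATES[m["verb"]], m["object"])
        for m in FACT_PATTERN.finditer(text)
    ]
```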

Sentiment Analysis
– Rules – Objects and phrases – positive and negative
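Rule-based sentiment ties phrases to objects rather than scoring a whole document. A minimal sketch, assuming a hypothetical phrase lexicon and a naive sentence splitter: only sentences that mention the target object are scored, and the score is positive phrases minus negative phrases.

```python
import re

# Hypothetical phrase lexicon; real rule sets are much larger and domain-specific.
POSITIVE = {"works great", "love", "fast", "easy to use"}
NEGATIVE = {"keeps crashing", "slow", "hate", "too expensive"}


def sentence_split(text):
    """Naive splitter on runs of '.', '!' and '?'."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def count_phrases(phrases, sentence):
    """Number of whole-word occurrences of any phrase in the sentence."""
    return sum(len(re.findall(r"\b" + re.escape(p) + r"\b", sentence)) for p in phrases)


def object_sentiment(text, obj):
    """(# positive phrases) - (# negative phrases) over sentences mentioning obj."""
    score = 0
    for sentence in sentence_split(text.lower()):
        if obj.lower() in sentence:  # rule fires only on sentences about the object
            score += count_phrases(POSITIVE, sentence) - count_phrases(NEGATIVE, sentence)
    return score
```

With "The camera is fast and easy to use. The battery keeps crashing, I hate it." the camera scores +2 and the battery -2, even though a document-level score would net to zero.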


Taxonomy, Text Mining, and Text Analytics
Text Analytics Features
Auto-categorization
– Training sets – Bayesian, Vector space
– Terms – literal strings, stemming, dictionary of related terms
– Rules – simple – position in text (Title, body, URL)
– Semantic Network – Predefined relationships, sets of rules
– Boolean – Full search syntax – AND, OR, NOT
– Advanced – DIST (#), PARAGRAPH, SENTENCE
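The "training sets – Bayesian" approach can be illustrated with a from-scratch multinomial naive Bayes classifier. This is a minimal sketch, not any vendor's implementation; it uses Laplace smoothing, ignores words unseen in training, and the training examples in the test are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokens(text):
    return re.findall(r"[a-z]+", text.lower())


def train(examples):
    """examples: list of (text, category). Returns (doc counts, word counts, vocab)."""
    doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, cat in examples:
        doc_counts[cat] += 1
        for w in tokens(text):
            word_counts[cat][w] += 1
            vocab.add(w)
    return doc_counts, word_counts, vocab


def classify(text, model):
    """Return the category with the highest log posterior."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best, best_score = None, -math.inf
    for cat in doc_counts:
        total_words = sum(word_counts[cat].values())
        score = math.log(doc_counts[cat] / total_docs)  # log prior
        for w in tokens(text):
            if w in vocab:  # words never seen in training carry no evidence
                score += math.log((word_counts[cat][w] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best, best_score = cat, score
    return best
```

A training set of even a few labeled notes is enough to see the mechanics; real categorization needs far more examples per node, which is one reason the slides pair statistics with rules.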

This is the most difficult to develop
Build on a Taxonomy
Combine with Extraction
– If any of a list of entities and other words
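Combining categorization with extraction can be sketched as a rule that fires when any entity from a list and any supporting word occur in the same sentence. The "Competitor Pricing" category, its entity list, and its word list are hypothetical; in practice the entities would come from an extraction catalog rather than a hard-coded set.

```python
import re

# Hypothetical rule: any listed entity AND any pricing word in one sentence.
RULE = {
    "category": "Competitor Pricing",
    "entities": {"acme", "globex", "initech"},
    "words": {"price", "pricing", "discount", "cheaper"},
}


def sentences(text):
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def apply_rule(text, rule):
    """Return the rule's category if one sentence has an entity and a rule word."""
    for sentence in sentences(text.lower()):
        words = set(re.findall(r"[a-z]+", sentence))
        if words & rule["entities"] and words & rule["words"]:
            return rule["category"]
    return None
```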


Case Study – Categorization & Sentiment


Taxonomy and Text Analytics


Taxonomy, Text Mining, and Text Analytics
Case Study – Taxonomy Development

Problem – 200,000 new uncategorized documents
Old taxonomy – need one that reflects change in corpus
Text mining, entity extraction, categorization
Content – 250,000 large documents, search logs, etc.
Bottom Up – terms in documents – frequency, date
Clustering – suggested categories
Clustering – chunking for editors
Entity Extraction – people, organizations, programming languages
Time savings – only feasible way to scan documents
Quality – important terms, co-occurring terms
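The bottom-up step (important terms, co-occurring terms) can be sketched with document frequencies and co-occurrence counts over term pairs. This is a minimal sketch under stated assumptions: a tiny hand-written stopword list, whitespace-and-regex tokenization, and no date dimension. The token pattern keeps `+`, `#` and `-` so names like C++ or C# survive.

```python
import re
from collections import Counter
from itertools import combinations

# Minimal stopword list for the sketch; a real project would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "for", "on", "with", "is"}


def terms(doc):
    return [t for t in re.findall(r"[a-z][a-z0-9+#-]*", doc.lower()) if t not in STOPWORDS]


def candidate_terms(docs, top_n=5):
    """Document frequency: number of documents each term appears in."""
    df = Counter()
    for doc in docs:
        df.update(set(terms(doc)))
    return df.most_common(top_n)


def cooccurring_pairs(docs, top_n=5):
    """Count term pairs that appear together in the same document."""
    pairs = Counter()
    for doc in docs:
        pairs.update(combinations(sorted(set(terms(doc))), 2))
    return pairs.most_common(top_n)
```

Editors would review the top of both lists as candidate taxonomy nodes and related terms, rather than reading every document.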


Text Analytics Development


New Directions in Social Media
Text Analytics, Text Mining, and Predictive Analytics
Two Systems of the Brain
– Fast, System 1, Immediate patterns (TM)
– Slow, System 2, Conceptual, reasoning (TA)

Text Analytics – pre-processing for TM
– Discover additional structure in unstructured text
– Behavior Prediction – adding depth in individual documents
– New variables for Predictive Analytics, Social Media Analytics
– New dimensions – 90% of information
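Text analytics as pre-processing means turning free text into new variables that sit alongside structured data. A minimal sketch, assuming hypothetical term lists for three 0/1 variables and a record layout with a `note` field; a real pipeline would derive these variables from categorization and sentiment rules rather than bare word lists.

```python
import re

# Hypothetical text-derived variables; the term lists are illustrative only.
TEXT_FEATURES = {
    "mentions_cancel": {"cancel", "cancell", "cxl"},
    "mentions_competitor": {"competitor", "switch", "switching"},
    "negative_tone": {"upset", "angry", "frustrated"},
}


def text_variables(note):
    """Turn one free-text note into 0/1 variables a predictive model can use."""
    words = set(re.findall(r"[a-z]+", note.lower()))
    return {name: int(bool(words & vocab)) for name, vocab in TEXT_FEATURES.items()}


def enrich(records):
    """Merge each record's structured fields with variables derived from its note."""
    rows = []
    for rec in records:
        row = {k: v for k, v in rec.items() if k != "note"}
        row.update(text_variables(rec["note"]))
        rows.append(row)
    return rows
```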

Text Mining for TA
– Semi-automated taxonomy development
– Bottom Up – terms in documents – frequency, date, clustering
– Improve speed and quality – semi-automatic
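The slides do not name a clustering algorithm, so this sketch swaps in one of the simplest: single-pass "leader" clustering over bag-of-words vectors with cosine similarity. Each document joins the most similar existing cluster leader if the similarity reaches a threshold (0.3 here, an arbitrary assumption), otherwise it starts a new cluster. The resulting groups are candidate categories for editors to name.

```python
import math
import re
from collections import Counter


def vector(doc):
    return Counter(re.findall(r"[a-z]+", doc.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def leader_cluster(docs, threshold=0.3):
    """Return clusters as lists of document indices.

    The leader of a cluster is its first document; it is not updated.
    """
    leaders, clusters = [], []
    for i, doc in enumerate(docs):
        v = vector(doc)
        sims = [cosine(v, lead) for lead in leaders]
        if sims and max(sims) >= threshold:
            clusters[sims.index(max(sims))].append(i)
        else:
            leaders.append(v)
            clusters.append([i])
    return clusters
```

Leader clustering is order-dependent; k-means or hierarchical clustering give more stable groups at higher cost.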


Text Analytics and Taxonomy
Complementary Information Platform
Taxonomy provides a consistent and common vocabulary
– Enterprise resource – integrated, not centralized
Text Analytics provides consistent tagging
– Human indexing is subject to inter- and intra-individual variation
Taxonomy provides the basic structure for categorization
– And candidate terms
Text Analytics provides the power to apply the taxonomy
– And metadata of all kinds
Text Analytics and Taxonomy Together – Platform
– Consistent in every dimension
– Powerful and economical
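The division of labor (taxonomy supplies structure and candidate terms, text analytics applies them) can be sketched as a small tree whose nodes carry terms, plus a tagger that returns full node paths. The taxonomy below is hypothetical; the point is that the same document always receives the same tags, unlike human indexing.

```python
import re

# Hypothetical taxonomy: node -> (parent, candidate terms).
TAXONOMY = {
    "Finance": (None, set()),
    "Billing": ("Finance", {"invoice", "bill", "charge"}),
    "Refunds": ("Billing", {"refund", "credit"}),
    "Technology": (None, set()),
    "Networking": ("Technology", {"router", "network", "wifi"}),
}


def path(node):
    """Full path from the root, e.g. 'Finance/Billing/Refunds'."""
    parts = []
    while node is not None:
        parts.append(node)
        node = TAXONOMY[node][0]
    return "/".join(reversed(parts))


def tag(doc):
    """Sorted paths of every node whose candidate terms appear in the document."""
    words = set(re.findall(r"[a-z]+", doc.lower()))
    return sorted(path(node) for node, (_, terms) in TAXONOMY.items() if words & terms)
```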


Taxonomy, Text Mining, and Text Analytics
Metadata – Tagging – the Problem
How do you bridge the gap – taxonomy to documents?
Tagging documents with taxonomy nodes is tough
– And expensive – central or distributed
Library staff – experts in categorization, not subject matter
– Too limited, narrow bottleneck
– Often don’t understand business processes and business uses
Authors – Experts in the subject matter, terrible at categorization
– Intra and Inter inconsistency, “intertwingleness”
– Choosing tags from taxonomy – complex task
– Folksonomy – almost as complex, wildly inconsistent
– Resistance – not their job, cognitively difficult = non-compliance

Text Analytics is the answer(s)!


Taxonomy, Text Mining, and Text Analytics
Metadata Tagging – the Solution
Mind the Gap – Manual, Automatic, Hybrid
All require human effort – the issue is where, and how effective
Manual – human effort is tagging (difficult, inconsistent)
Automatic and Hybrid – human effort is prior to tagging
– Build on expertise – librarians on categorization, SMEs on subject terms

Hybrid Model
– Publish Document -> Text Analytics analysis -> suggestions for categorization, entities, metadata -> present to author
– Cognitive task is simple -> react to a suggestion instead of selecting from memory or a complex taxonomy
– Feedback – if author overrides -> suggestion for new category
– Facets – Requires a lot of metadata – Entity Extraction feeds facets

Hybrid – Automatic is really a spectrum – depends on context
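The hybrid model's loop (suggest, let the author react, treat overrides as signals for new categories) can be sketched as a small class. The `suggest` callable stands in for whatever categorizer produces suggestions; it and the `HybridTagger` name are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class HybridTagger:
    """Suggest tags, let the author accept or override, and log overrides."""

    suggest: Callable[[str], List[str]]  # hypothetical stand-in for a categorizer
    new_category_candidates: list = field(default_factory=list)

    def publish(self, doc: str, author_choice: Optional[str] = None) -> List[str]:
        suggestions = self.suggest(doc)
        if author_choice is None:
            # Author simply accepts the suggestions (the easy cognitive task).
            return suggestions
        if author_choice not in suggestions:
            # Override: queue the author's tag for taxonomy editors to review.
            self.new_category_candidates.append((author_choice, doc))
        return [author_choice]
```

Where the loop sits on the manual-to-automatic spectrum depends on how often `author_choice` is required, which matches the point that hybrid versus automatic depends on context.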


Taxonomy, Text Mining, and Text Analytics
Applications: Search
Multiple Knowledge Structures
– Facet – orthogonal dimension of metadata
– Taxonomy – Subject matter / aboutness
– Ontology – Relationships / Facts
• Subject – Verb – Object
Software – Search, ECM, auto-categorization, entity extraction, Text Analytics and Text Mining
People – tagging, evaluating tags, fine-tuning rules and taxonomy
People – Users, social tagging, suggestions
Rich Search Results – context and conversation
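Facets as orthogonal dimensions of metadata can be illustrated with a filter that ANDs independent facet selections and a counter that produces the value counts a search sidebar shows. The documents and facet names are hypothetical; in the architecture above their values would come from entity extraction and categorization.

```python
from collections import Counter

# Hypothetical documents with facet metadata.
DOCS = [
    {"id": 1, "topic": "Taxonomy", "org": "Acme", "type": "report"},
    {"id": 2, "topic": "Sentiment", "org": "Globex", "type": "report"},
    {"id": 3, "topic": "Taxonomy", "org": "Globex", "type": "slides"},
]


def facet_search(docs, **selected):
    """Keep docs matching every selected facet value (each facet filters independently)."""
    return [d for d in docs if all(d.get(f) == v for f, v in selected.items())]


def facet_counts(docs, facet):
    """Value counts for one facet over the current result set."""
    return Counter(d[facet] for d in docs)
```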


Taxonomy, Text Mining, and Text Analytics
Applications: Search-Based Applications
Platform for Information Applications
– Content Aggregation
– Duplicate Documents – save millions!
– Text Mining – BI, CI – sentiment analysis
– Combine with Data Mining – disease symptoms, new
• Predictive Analytics
– Social – Hybrid folksonomy / taxonomy / auto-metadata
– Social – expertise, categorize tweets and blogs, reputation
– Ontology – travel assistant – SIRI

Use your Imagination!
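Duplicate detection, one of the applications listed above, is commonly done with word shingles and Jaccard similarity; the slides do not say which method they have in mind, so this is a swapped-in standard technique. The sketch compares every pair (quadratic in the number of documents); at scale, MinHash with locality-sensitive hashing is the usual replacement. The shingle size and threshold are assumptions.

```python
import re
from itertools import combinations


def shingles(text, k=3):
    """Set of overlapping k-word sequences (whole text if shorter than k words)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 1.0


def near_duplicates(docs, threshold=0.5, k=3):
    """Index pairs whose shingle sets have Jaccard similarity >= threshold."""
    sets = [shingles(d, k) for d in docs]
    return [
        (i, j)
        for i, j in combinations(range(len(docs)), 2)
        if jaccard(sets[i], sets[j]) >= threshold
    ]
```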


Taxonomy, Text Mining, and Text Analytics
Applications: Expertise Analysis
Sentiment Analysis to Expertise Analysis (KnowHow)
– Know How, skills, “tacit” knowledge
Experts write and think differently
Basic level is lower, more specific
– Levels: Superordinate – Basic – Subordinate
• Mammal – Dog – Golden Retriever
• Furniture – chair – kitchen chair
Experts organize information around processes, not subjects
Build expertise categorization rules
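One way to turn "the basic level is lower, more specific" into an expertise categorization rule is to measure how many of a writer's category terms are subordinate-level. This is a hypothetical heuristic, not the method from the slides: the level lexicon below follows the furniture example, and the score is simply the share of subordinate mentions. Longer phrases are matched and consumed first so "kitchen chair" is not also counted as "chair".

```python
import re

# Hypothetical level lexicon for one domain, following the slide's example.
LEVELS = {
    "furniture": "superordinate",
    "chair": "basic",
    "table": "basic",
    "kitchen chair": "subordinate",
    "windsor chair": "subordinate",
    "trestle table": "subordinate",
}


def level_profile(text):
    """Count mentions per level, longest phrases first."""
    text = text.lower()
    counts = {"superordinate": 0, "basic": 0, "subordinate": 0}
    for term in sorted(LEVELS, key=len, reverse=True):
        pattern = r"\b" + re.escape(term) + r"\b"
        counts[LEVELS[term]] += len(re.findall(pattern, text))
        text = re.sub(pattern, " ", text)  # consume matches so they are not recounted
    return counts


def expertise_score(text):
    """Share of level mentions that are subordinate (more specific) terms."""
    counts = level_profile(text)
    total = sum(counts.values())
    return counts["subordinate"] / total if total else 0.0
```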


Taxonomy, Text Mining, and Text Analytics
Expertise – application areas
Taxonomy / Ontology development / design – audience focus
– Card sorting – non-experts use superficial similarities
Business & Customer intelligence – add expertise to sentiment
– Deeper research into communities, customers
Text Mining – Expertise characterization of writer, corpus
eCommerce – Organization/Presentation of information – expert, novice
Expertise location – Generate automatic expertise characterization based on documents
Experiments – Pronoun Analysis – personality types
– Essay Evaluation Software – Apply to expertise characterization
• Model levels of chunking, procedure words over content
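Pronoun analysis starts from simple rates: how often each pronoun group occurs per 100 words. This sketch computes only those rates with an illustrative grouping; how the rates map to personality types is not described in the slides and is left out.

```python
import re

# Illustrative pronoun groups.
PRONOUNS = {
    "first_singular": {"i", "me", "my", "mine", "myself"},
    "first_plural": {"we", "us", "our", "ours", "ourselves"},
    "second": {"you", "your", "yours", "yourself", "yourselves"},
}


def pronoun_rates(text):
    """Occurrences of each pronoun group per 100 words."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return {group: 0.0 for group in PRONOUNS}
    return {
        group: 100 * sum(w in vocab for w in words) / len(words)
        for group, vocab in PRONOUNS.items()
    }
```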


Beyond Sentiment: Behavior Prediction
Case Study – Telecom Customer Service
Problem – distinguish customers likely to cancel from mere threats
Analyze customer support notes
General issues – creative spelling, second-hand reports
Develop categorization rules
– First – distinguish cancellation calls – not simple
– Second – distinguish cancel what – one line or all
– Third – distinguish real threats
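Creative spelling in support notes is usually handled before rules run, by expanding known shorthand and snapping near-miss spellings onto a vocabulary. A minimal sketch, assuming a hypothetical abbreviation table and vocabulary; fuzzy matching uses the standard library's `difflib.get_close_matches`, and the 0.85 cutoff is an arbitrary choice that keeps short words from being mis-corrected.

```python
import difflib
import re

# Hypothetical shorthand table and canonical vocabulary.
ABBREVIATIONS = {"cxl": "cancel", "acct": "account", "teh": "the"}
VOCABULARY = ["cancel", "account", "contract", "customer", "charge"]


def normalize(note, cutoff=0.85):
    """Lowercase tokens with shorthand expanded and near-miss spellings corrected."""
    out = []
    for tok in re.findall(r"[a-z0-9']+", note.lower()):
        if tok in ABBREVIATIONS:
            out.append(ABBREVIATIONS[tok])
            continue
        close = difflib.get_close_matches(tok, VOCABULARY, n=1, cutoff=cutoff)
        out.append(close[0] if close else tok)
    return out
```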


Beyond Sentiment
Behavior Prediction – Case Study

Basic Rule
(START_20, (AND,
    (DIST_7, "[cancel]", "[cancel-what-cust]"),
    (NOT, (DIST_10, "[cancel]", (OR, "[one-line]", "[restore]", "[if]")))))

Examples:
– customer called to say he will cancell his account if the does not stop receiving a call from the ad agency.
– cci and is upset that he has the asl charge and wants it off or her is going to cancel his act
– ask about the contract expiration date as she wanted to cxl teh acct

Combine sophisticated rules with sentiment statistical training and Predictive Analytics
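The basic rule can be approximated in a few lines to show how its operators interact. This is a sketch under assumptions: DIST_n is read as "within n words of each other", START_20 as "the matching pair begins within the first 20 words", and NOT is checked over the whole note. The contents of the `[cancel]`, `[cancel-what-cust]` and other classes are not given in the slides, so the term lists are hypothetical.

```python
import re

# Hypothetical term classes standing in for "[cancel]", "[cancel-what-cust]", etc.
TERM_CLASSES = {
    "cancel": {"cancel", "cancell", "cancelled", "canceled", "cxl"},
    "cancel-what-cust": {"account", "acct", "act", "contract", "service"},
    "one-line": {"line"},
    "restore": {"restore", "reinstate"},
    "if": {"if"},
}


def tokenize(text):
    return re.findall(r"[a-z0-9']+", text.lower())


def positions(tokens, *classes):
    """Indices of tokens belonging to any of the given classes."""
    vocab = set().union(*(TERM_CLASSES[c] for c in classes))
    return [i for i, tok in enumerate(tokens) if tok in vocab]


def dist_matches(tokens, left, right_classes, n):
    """(i, j) pairs where a `left` term and a right-class term are <= n words apart."""
    rights = positions(tokens, *right_classes)
    return [(i, j) for i in positions(tokens, left) for j in rights if abs(i - j) <= n]


def is_cancellation(note):
    """Approximate the basic rule under the assumptions stated above."""
    tokens = tokenize(note)
    # DIST_7 pairs, kept only if the pair begins within the first 20 words.
    core = [(i, j) for i, j in dist_matches(tokens, "cancel", ["cancel-what-cust"], 7)
            if min(i, j) < 20]
    # NOT DIST_10: reject partial, restore, or conditional ("if") cancellations.
    blocked = dist_matches(tokens, "cancel", ["one-line", "restore", "if"], 10)
    return bool(core) and not blocked
```

Under these assumptions the three example notes evaluate to False, True, True: the first is rejected because "if" falls within 10 words of "cancell", so it reads as a conditional threat. Raw notes could also be passed through spelling normalization first instead of listing variants inside each class.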


Beyond Sentiment – Wisdom of Crowds
Crowd Sourcing Technical Support
Example – Android User Forum
Develop a taxonomy of products, features, problem areas
Develop Categorization Rules:
– “I use the SDK method and it isn't to bad a all. I'll get some pics up later, I am still trying to get the time to update from fresh 1.0 to 1.1.”
– Find product & feature – forum structure
– Find problem areas in response, nearby text for solution
Automatic – simply expose lists of “solutions”
– Search-based application
Human mediated – experts scan and clean up solutions
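"Find problem areas in response, nearby text for solution" can be sketched as: split a forum post into sentences, flag sentences containing problem-area terms, and return the following sentence(s) as solution candidates. The problem-term list and the one-sentence window are assumptions; in the automatic model these pairs would be exposed as lists, and in the human-mediated model experts would clean them up.

```python
import re

# Hypothetical problem-area vocabulary drawn from a forum taxonomy.
PROBLEM_TERMS = {"crash", "crashes", "freeze", "freezes", "battery", "drain"}


def candidate_solutions(post, window=1):
    """Return (problem sentence, following sentences) pairs as solution candidates."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", post) if s.strip()]
    results = []
    for i, sentence in enumerate(sentences):
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        if words & PROBLEM_TERMS:
            results.append((sentence, sentences[i + 1:i + 1 + window]))
    return results
```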


Taxonomy, Text Mining, and Text Analytics
Conclusions
Text Analytics is an essential platform for multiple applications
Text Analytics and Text Mining and Taxonomy are mutually enriching approaches
Sentiment Analysis, Beyond Positive & Negative
– New emotion taxonomies, context around terms
New applications – Expertise, behavior prediction, etc.
Future – new kinds of applications:
– Enterprise Search – Hybrid ECM model with text analytics
– Expertise Analysis, Behavior Prediction, and more
– Social Media and Big Data built from TM & TA
– NeuroAnalytics – cognitive science meets taxonomy and more
• Watson is just the start


Questions?

Tom Reamy

KAPS Group

Knowledge Architecture Professional Services

http://www.kapsgroup.com


Resources

Books
– Women, Fire, and Dangerous Things
• George Lakoff
– Knowledge, Concepts, and Categories
• Koen Lamberts and David Shanks
– Formal Approaches in Categorization
• Ed. Emmanuel Pothos and Andy Wills
– The Mind
• Ed. John Brockman
• Good introduction to a variety of cognitive science theories, issues, and new ideas
– Any cognitive science book written after 2009


Resources

Conferences – Web Sites
– Text Analytics World – http://www.textanalyticsworld.com
– Text Analytics Summit – http://www.textanalyticsnews.com
– Semtech – http://www.semanticweb.com


Resources

Blogs
– SAS – http://blogs.sas.com/text-mining/

LinkedIn Groups:
– Text Analytics World
– Text Analytics Group
– Data and Text Professionals
– Sentiment Analysis
– Metadata Management
– Semantic Technologies


Resources

Web Sites
– Taxonomy Community of Practice: http://finance.groups.yahoo.com/group/TaxoCoP/
– Whitepaper – CM and Text Analytics – http://www.textanalyticsnews.com/usa/contentmanagementmeetstextanalytics.pdf
– Whitepaper – Enterprise Content Categorization strategy and development – http://www.kapsgroup.com


Resources

Articles
– Malt, B. C. 1995. Category coherence in cross-cultural perspective. Cognitive Psychology 29, 85-148.
– Rifkin, A. 1985. Evidence for a basic level in event taxonomies. Memory & Cognition 13, 538-556.
– Shaver, P., J. Schwartz, D. Kirson, and C. O’Connor 1987. Emotion knowledge: further exploration of a prototype approach. Journal of Personality and Social Psychology 52, 1061-1086.
– Tanaka, J. W. and M. E. Taylor 1991. Object categories and expertise: is the basic level in the eye of the beholder? Cognitive Psychology 23, 457-482.

