
Text Analytics: A Tool for Taxonomy Development

Tom Reamy, Chief Knowledge Architect

KAPS Group

Program Chair – Text Analytics World

Knowledge Architecture Professional Services

http://www.kapsgroup.com


Agenda

Introduction

Project: Update ACM taxonomy – after 12+ years

Information Environment

Text Mining / Text Analytics

Multiple Methods / Reports

Conclusion


Introduction: KAPS Group

Knowledge Architecture Professional Services – Network of Consultants

Applied Theory – Faceted & emotion taxonomies, natural categories

Services:
– Strategy – IM & KM – Text Analytics, Social Media, Integration
– Taxonomy/Text Analytics, Social Media development, consulting
– Text Analytics Quick Start – Audit, Evaluation, Pilot

Partners – Smartlogic, Expert System, SAS, SAP, IBM, FAST, Concept Searching, Attensity, Clarabridge, Lexalytics

Clients: Genentech, Novartis, Northwestern Mutual Life, Financial Times, Hyatt, Home Depot, Harvard Business Library, British Parliament, Battelle, Amdocs, FDA, GAO, World Bank, Dept. of Transportation, etc.

Program Chair – Text Analytics World – March 29-April 1 – SF

Presentations, Articles, White Papers – www.kapsgroup.com

Current – Book – Text Analytics: How to Conquer Information Overload, Get Real Value from Social Media, and Add Smart Text to Big Data


Introduction: Approach

Is Automatic Taxonomy Development Here Yet?
– Not yet, but it is getting closer

Hybrid:
– Taxonomists, SMEs, database analysts, text analysts
– Text mining software – basic text analysis – power
– Text analytics software – brains

New taxonomy terms & structure
– Old = indexing, authors adding tags & keywords
– New = auto-tagging, applications
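The shift from manual indexing to auto-tagging can be illustrated with a minimal rule-based tagger. This is a sketch under stated assumptions, not the method or software used in the ACM project: each taxonomy node is assumed to carry a hypothetical list of trigger phrases (`TaxonomyNode`, `auto_tag`, and `min_hits` are illustrative names), and a document receives a tag when enough distinct phrases occur in it. Commercial text analytics tools offer much richer rule languages.

```python
import re
from dataclasses import dataclass, field


@dataclass
class TaxonomyNode:
    """A taxonomy category with hypothetical trigger phrases (illustrative only)."""
    label: str
    phrases: list[str] = field(default_factory=list)
    min_hits: int = 1  # distinct phrases required before the tag is applied


def auto_tag(text: str, nodes: list[TaxonomyNode]) -> list[str]:
    """Return labels of nodes whose trigger phrases occur often enough in text."""
    lowered = text.lower()
    tags = []
    for node in nodes:
        # Count distinct phrases that appear as whole words/phrases in the text.
        hits = sum(
            1
            for phrase in node.phrases
            if re.search(r"\b" + re.escape(phrase.lower()) + r"\b", lowered)
        )
        if hits >= node.min_hits:
            tags.append(node.label)
    return tags


if __name__ == "__main__":
    # Hypothetical categories and phrases, not taken from the ACM CCS.
    taxonomy = [
        TaxonomyNode("Machine learning", ["neural network", "supervised learning", "classifier"], min_hits=2),
        TaxonomyNode("Information retrieval", ["search engine", "query", "ranking"]),
    ]
    doc = "We train a neural network classifier to improve ranking in a search engine."
    print(auto_tag(doc, taxonomy))
```

Requiring more than one distinct phrase (`min_hits`) is one simple way to trade recall for precision on broad categories.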


Information Environment

Existing Taxonomy: Computing Classification System (CCS)

Content:

– Database export of Guide to the Computing Literature bibliographic records (.txt; approximately 7GB in 58 files.)

– Statistical distribution of CCS categories across the Digital Library and Guide to Computing Literature (Excel; 4 files)

– ACM Digital Library full text files (PDFs and XML metadata, including CCS categories; approximately 170GB in 240,000 files)

– Ralston Encyclopedia of Computer Science (PDFs and HTML of each article with XML metadata, including CCS categories; approximately 350MB in 1,850 files)


Text Analytics in Taxonomy Development: Case Study – Multiple Methods

Text Mining – terms in documents – frequency, date, source, etc.
– Text Preparation – Create multiple filters

Quality – important terms, co-occurring terms

Time savings – only feasible way to scan documents

Clustering – suggested categories, chunking for editors
– Clustering within clusters – explore

Entity Extraction – people, organizations, programming languages, hardware/devices, etc.

Joint Work Sessions – interactive exploration
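The text-mining pass described above (term frequency after filtering, plus co-occurring terms) can be sketched with the Python standard library. This is a minimal illustration, not the software used in the case study; the noise-word list, the tokenizer pattern (which keeps tokens such as `c++` and `c#` so programming-language names survive), and the function names are assumptions chosen for the example.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical noise-word list; a real project would build and refine its own.
NOISE_WORDS = {"a", "an", "and", "the", "of", "for", "in", "on", "to", "with", "we", "is", "are"}

# Assumed tokenizer: a letter followed by letters, digits, '+', '#', or '-'.
TOKEN = re.compile(r"[a-z][a-z0-9+#-]*")


def tokens(text: str) -> list[str]:
    """Lower-case word tokens with noise words removed."""
    return [t for t in TOKEN.findall(text.lower()) if t not in NOISE_WORDS]


def term_frequencies(docs: list[str]) -> Counter:
    """Total occurrences of each term across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(tokens(doc))
    return counts


def co_occurrences(docs: list[str]) -> Counter:
    """Number of documents in which each unordered pair of terms appears together."""
    pairs = Counter()
    for doc in docs:
        unique = sorted(set(tokens(doc)))  # sorted so each pair has one canonical order
        pairs.update(combinations(unique, 2))
    return pairs


if __name__ == "__main__":
    docs = [
        "Parallel algorithms for graph partitioning",
        "Graph partitioning with spectral methods",
        "Spectral clustering of graph data",
    ]
    print(term_frequencies(docs).most_common(3))
    print(co_occurrences(docs).most_common(3))
```

Counting pairs per document rather than per occurrence keeps one long article from dominating the co-occurrence report.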


Case Study – Taxonomy Development



Multiple Sets of Reports

Keyword Frequency
– First Pass – 3,026
– Total – 508,941 (Get from Big Database)
– Sub-Totals
  • Year: Pre-1998, By Year, By 5-year blocks
  • Map to other variables – Journals, Authors – basis for communities

Keywords in Abstract/Title

Cluster analysis of keyword-abstract-title

Search Terms in keyword-abstract-title
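The keyword sub-totals (pre-1998, then 5-year blocks) can be sketched as a simple grouping step. This assumes a hypothetical record layout of `(year, [keywords])` pairs and assumes the 5-year blocks start at 1998; the actual export format of the Guide to the Computing Literature records is not described here, and the function names are illustrative.

```python
from collections import Counter, defaultdict


def five_year_block(year: int, start: int = 1998) -> str:
    """Label a year with its 5-year block; years before `start` share one bucket."""
    if year < start:
        return f"pre-{start}"
    block_start = start + (year - start) // 5 * 5
    return f"{block_start}-{block_start + 4}"


def keyword_subtotals(records):
    """records: iterable of (year, [keywords]) pairs (hypothetical layout).

    Returns (overall keyword totals, keyword totals per 5-year block).
    """
    overall = Counter()
    by_block = defaultdict(Counter)
    for year, keywords in records:
        # Normalize author keywords so "XML" and "xml" are counted together.
        normalized = [k.strip().lower() for k in keywords if k.strip()]
        overall.update(normalized)
        by_block[five_year_block(year)].update(normalized)
    return overall, by_block


if __name__ == "__main__":
    sample = [(1996, ["Data Mining"]), (1999, ["data mining", "XML"]), (2004, ["XML"])]
    totals, blocks = keyword_subtotals(sample)
    print(totals)
    print(dict(blocks))
```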


Entity Extraction – Company, Internet, Organization, Title
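A very small gazetteer-plus-pattern extractor illustrates the idea behind entity types such as Company, Organization, and Internet. It is a sketch only: the gazetteer entries, the URL pattern, and `extract_entities` are hypothetical, the "Title" type is not covered, and production entity extraction relies on far larger curated lists plus linguistic rules or statistical models.

```python
import re

# Hypothetical gazetteers for illustration only.
GAZETTEER = {
    "Company": ["IBM", "Microsoft", "Intel"],
    "Organization": ["ACM", "IEEE", "World Wide Web Consortium"],
}

# Assumed pattern for web addresses ending in a few common top-level domains.
URL = re.compile(r"\b(?:https?://)?(?:www\.)?[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.(?:com|org|edu|net)\b")


def extract_entities(text: str) -> dict[str, list[str]]:
    """Return entity type -> matched strings, omitting types with no matches."""
    found: dict[str, list[str]] = {}
    for entity_type, names in GAZETTEER.items():
        # Case-sensitive whole-word lookup of each gazetteer entry.
        hits = [n for n in names if re.search(r"\b" + re.escape(n) + r"\b", text)]
        if hits:
            found[entity_type] = hits
    urls = URL.findall(text)  # all groups are non-capturing, so full matches are returned
    if urls:
        found["Internet"] = urls
    return found


if __name__ == "__main__":
    print(extract_entities("IBM and ACM published results at www.example.org and http://w3.org."))
```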


Multiple Methods - Reports

Spreadsheets – static reports

Database query reports
– Create multiple slices, views, filters

Working reports – eliminate more noise words

Multiple mapping – extractions, author tags & keywords

Map – frequency in abstracts, titles, articles

Search logs – terms and phrases

Date ranges – trend reports – per term, new words
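A date-range trend report (counts per term, plus "new words" whose first appearance falls in each range) might look like the following sketch. The input layout of `(term, year)` occurrences, the inclusive `(start, end)` ranges, and the `trend_report` name are assumptions for illustration; noise words are expected in lower case.

```python
from collections import Counter


def trend_report(term_years, ranges, noise_words=frozenset()):
    """Per-range term counts and new terms for a trend report.

    term_years: iterable of (term, year) occurrences (hypothetical layout).
    ranges: ordered list of inclusive (start_year, end_year) tuples.
    noise_words: lower-case terms to drop before counting.
    """
    occurrences = [(t.lower(), y) for t, y in term_years if t.lower() not in noise_words]

    # Earliest year each term appears anywhere in the data.
    first_seen = {}
    for term, year in occurrences:
        first_seen[term] = min(year, first_seen.get(term, year))

    report = []
    for start, end in ranges:
        counts = Counter(t for t, y in occurrences if start <= y <= end)
        # A term is "new" in the range where it first appears.
        new_terms = sorted(t for t, y in first_seen.items() if start <= y <= end)
        report.append({"range": (start, end), "counts": counts, "new_terms": new_terms})
    return report


if __name__ == "__main__":
    data = [("XML", 1999), ("SQL", 1995), ("sql", 2001), ("Cloud", 2008), ("the", 2008)]
    for row in trend_report(data, [(1995, 1999), (2000, 2004), (2005, 2009)], {"the"}):
        print(row["range"], dict(row["counts"]), row["new_terms"])
```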


Conclusions

Auto-taxonomy not here – yet

Scale requires semi-automated solution

Human effort – initial design, text preparation
– Now would add more auto-categorization

Human effort – analysis & refinement of queries, text mining, and taxonomy

Simple taxonomies are better – part of information ecosystem
– Lower levels of terms – into auto-tagging rules

Early 2015: New Book:
– Text Analytics: Everything You Need to Know to Conquer Information Overload, Mine Social Media for Real Value, and Turn Big Text Into Big Data
– Title might be shorter, but it will cover all you need to know


Questions?

Tom Reamy

KAPS Group

Knowledge Architecture Professional Services

http://www.kapsgroup.com

