Post on 17-Jan-2016
Text Analytics Workshop
Tom Reamy
Chief Knowledge Architect
KAPS Group
Knowledge Architecture Professional Services
http://www.kapsgroup.com
Agenda
Introduction – Elements & Infrastructure
Platform
– Semantics not technology
– Infrastructure not project
– Value of Text Analytics
Evaluating Software
– Two Phase Process
– Designing the Team and Content Structures
Development – Taxonomy, Categorization, Faceted Metadata
Text Analytics Applications
– Integration with Search and ECM
– Platform for Information Applications
KAPS Group: General
Knowledge Architecture Professional Services
Virtual Company: Network of consultants – 8-10
Partners – SAS, SAP, Microsoft-FAST, Concept Searching, etc.
Consulting, Strategy, Knowledge architecture audit
Services:
– Taxonomy/Text Analytics development, consulting, customization
– Technology Consulting – Search, CMS, Portals, etc.
– Evaluation of Enterprise Search, Text Analytics
– Metadata standards and implementation
– Knowledge Management: Collaboration, Expertise, e-learning
– Applied Theory – Faceted taxonomies, complexity theory, natural categories
Introduction to Text Analytics
Semantic Infrastructure - Elements
Taxonomy – Thesauri, Controlled Vocabulary
Metadata – Standard (Dublin Core) and Facets
Basic Text Analytics
– Categorization – Document Topics – Aboutness
– Entity Extraction – noun phrases, feed facets
– Summarization – beyond snippets
Advanced Text Analytics
– Fact extraction – ontologies
– Sentiment Analysis – good, bad, and ugly
What is in a Name – text analytics or ?
Introduction to Text Analytics
Taxonomy
Thesauri, Controlled Vocabulary
– Resources to build on
– Indexing not categorization
Taxonomy – Foundation for Categorization
– Browse – classification scheme
– Formal – Is-Child-Of, Is-Part-Of
– Large taxonomies - MeSH – indexing all topics
– Small is better – for categorization and faceted navigation
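The formal Is-Child-Of relationship mentioned above can be sketched as a simple parent map. This is a minimal illustration only: the node names are invented for the example (they are not taken from MeSH or from the workshop), and `PARENT`, `ancestors`, and `is_descendant_of` are hypothetical names.

```python
# Hypothetical mini-taxonomy: each node maps to its parent (Is-Child-Of).
# Node names are illustrative, not real entries from any published taxonomy.
PARENT = {
    "Diseases": None,
    "Cardiovascular Diseases": "Diseases",
    "Heart Diseases": "Cardiovascular Diseases",
    "Arrhythmias": "Heart Diseases",
}


def ancestors(node):
    """Follow Is-Child-Of links from a node up to the root."""
    path = []
    parent = PARENT[node]
    while parent is not None:
        path.append(parent)
        parent = PARENT[parent]
    return path


def is_descendant_of(node, candidate_ancestor):
    """True if candidate_ancestor appears anywhere above node."""
    return candidate_ancestor in ancestors(node)
```

A document tagged with a narrow node can then be found when browsing any broader node above it, which is one reason a small, clean hierarchy is easier to use for categorization and faceted navigation.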
Introduction to Text Analytics
Metadata
Metadata standards – Dublin Core - Mostly syntactic not semantic
– Description – static or dynamic (summarization)
– Semantic – keywords – very poor performance
Best Bets – high level categorization-search
– Human judgments
Audience – mixed results
– Role, function, expertise, information behaviors
Facets – classes of metadata
– Standard - People, Organization, Document type-purpose
– Specialized – methods, materials, products
Introduction to Text Analytics
Text Analytics
Categorization
– Multiple techniques – examples, terms, Boolean
– Built on a taxonomy
Entity Extraction
– Catalogs with variants, rule based dynamic
Summarization
– Rules – find sentences in a document
Fact Extraction
– Relationships of entities – people-organizations-activities
Sentiment Analysis
– Rules – adjectives & adverbs not nouns
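Two of the techniques listed above, Boolean categorization and catalog-based entity extraction, can be sketched in a few lines. This is a toy illustration under stated assumptions, not how any particular text analytics product works: the rule terms, the `CATALOG` contents, and the function names are all hypothetical.

```python
import re

# Hypothetical catalog: canonical entity name -> known surface variants.
CATALOG = {
    "International Business Machines": ["IBM", "International Business Machines"],
    "SAS Institute": ["SAS", "SAS Institute"],
}


def categorize_merger(text):
    """Toy Boolean rule: (merger OR acquisition) AND NOT rumor."""
    lowered = text.lower()
    has_deal_term = "merger" in lowered or "acquisition" in lowered
    has_exclusion = "rumor" in lowered
    return has_deal_term and not has_exclusion


def extract_entities(text):
    """Return canonical names for every catalog variant found as a whole word."""
    found = set()
    for canonical, variants in CATALOG.items():
        for variant in variants:
            if re.search(r"\b" + re.escape(variant) + r"\b", text):
                found.add(canonical)
    return sorted(found)
```

Mapping variants back to one canonical name is what lets extracted entities feed facets consistently; real rule sets are far larger and usually add proximity and sectional constraints.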
Introduction to Text Analytics
Text Analytics
Why Text Analytics?
– Enterprise search has failed to live up to its potential
– Enterprise content management has failed to live up to its potential
– Taxonomy has failed to live up to its potential
– Adding metadata, especially keywords, has not worked
What is missing?
– Intelligence – human level categorization, conceptualization
– Infrastructure – Integrated solutions not technology, software
Text Analytics can be the foundation that (finally) drives success – search, content management, and much more
Text Analytics Platform
4 Basic Contexts
Ideas – Content Structure
– Language and Mind of your organization
– Applications - exchange meaning, not data
People – Company Structure
– Communities, Users
– Central team - establish standards, facilitate
Activities – Business processes and procedures
Technology
– CMS, Search, portals, taxonomy tools
– Applications – BI, CI, Text Mining
Text Analytics Platform: The start and foundation
Knowledge Architecture Audit
Knowledge Map - Understand what you have, what you are, what you want
– The foundation of the foundation
Contextual interviews, content analysis, surveys, focus groups, ethnographic studies
Category modeling – “Intertwingledness” - learning new categories is influenced by other, related categories
Natural level categories mapped to communities, activities
• Novices prefer higher levels
• Balance of informativeness and distinctiveness
Living, breathing, evolving foundation is the goal
Text Analytics Platform – Benefits
IDC White Paper
Time Wasted
– Reformat information - $5.7 million per 1,000 per year
– Not finding information - $5.3 million per 1,000
– Recreating content - $4.5 million per 1,000
Small Percent Gain = large savings
– 1% - $10 million
– 5% - $50 million
– 10% - $100 million
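The arithmetic behind these figures can be sketched as follows. Two assumptions are not in the slide: "per 1,000" is read as per 1,000 employees, and the headcount is a parameter. The slide does not state the organization size behind its "1% - $10 million" figure, and that figure implies a total wasted-time cost of about $1 billion per year. The names `COST_PER_1000`, `annual_waste`, and `savings` are hypothetical.

```python
# Annual cost of wasted time per 1,000 employees, in dollars (figures from the slide).
COST_PER_1000 = {
    "reformatting information": 5_700_000,
    "not finding information": 5_300_000,
    "recreating content": 4_500_000,
}


def annual_waste(employees):
    """Total yearly cost of wasted time for a given headcount."""
    return sum(COST_PER_1000.values()) * employees / 1000


def savings(employees, percent_gain):
    """Yearly savings if the wasted time is reduced by percent_gain percent."""
    return annual_waste(employees) * percent_gain / 100
```

Under these assumptions the three categories add up to $15.5 million per 1,000 employees per year, so `savings(1000, 1)` is $155,000, and the savings scale linearly with both headcount and the percentage gain.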
Text Analytics Platform – Benefits
Findability within and outside the enterprise
– Savings per year - $millions
Rescue enterprise search and ECM projects
– Add semantics to search
Clean up enterprise content
– Duplication and accurate categorization
Improve the quality of information access
– Finding the right information can save millions
Build smarter applications
– Social networking, locate expertise within the enterprise
Text Analytics Platform – Benefits
Understand your customers
– What they are talking about and how they feel about it
Empower your employees
– Not only more time, but they work smarter
Understand your competitors
– What they are working on, talking about
– Combine unstructured content and rich data sources – more intelligent analysis
Text Analytics Platform – Dangers
Text Analytics as a software project
Not enough resources – to develop, to maintain-refine
Wrong resources – SMEs, IT, Library
– Need all of the above and taxonomists+
Bad Design:
– Start with bad taxonomy
– Wrong taxonomy – too big or too flat
Bad Categorization / Entity Extraction
– Right kind of experience
Resources
Books
– Women, Fire, and Dangerous Things
• George Lakoff
– Knowledge, Concepts, and Categories
• Koen Lamberts and David Shanks
– The Stuff of Thought
• Steven Pinker
Web Sites
– Text Analytics News - http://social.textanalyticsnews.com/index.php
– Text Analytics Wiki - http://textanalytics.wikidot.com/
Resources
Blogs
– SAS - Manya Mayes – Chief Strategist - http://blogs.sas.com/text-mining/
Web Sites
– Taxonomy Community of Practice: http://finance.groups.yahoo.com/group/TaxoCoP/
– Whitepaper – CM and Text Analytics - http://www.textanalyticsnews.com/usa/contentmanagementmeetstextanalytics.pdf
Questions?
Tom Reamy
tomr@kapsgroup.com
KAPS Group
Knowledge Architecture Professional Services
http://www.kapsgroup.com