Facets and Faceted Navigation Development
Tom Reamy, Chief Knowledge Architect
KAPS Group
Knowledge Architecture Professional Services
http://www.kapsgroup.com
Agenda
Two Case Studies: Good and Bad
Development Process
– Research Foundation
– Facet Design: Sources
– Integrated Solution
  • Metadata Strategy – Technology and People
– Develop, Test, Monitor, Refine Application
Conclusions
Enterprise Environment – Case Studies
A Tale of Two Taxonomies
– It was the best of times, it was the worst of times
Basic Approach
– Initial meetings – project planning
– High level K map – content, people, technology
– Contextual and Information Interviews
– Content Analysis
– Draft Taxonomy – validation interviews, refine
– Integration and Governance Plans
Enterprise Environment – Case One – Taxonomy, 7 facets
Taxonomy of Subjects / Disciplines:
– Science > Marine Science > Marine microbiology > Marine toxins
Facets:
– Organization > Division > Group
– Clients > Federal > EPA
– Instruments > Environmental Testing > Ocean Analysis > Vehicle
– Facilities > Division > Location > Building X
– Methods > Social > Population Study
– Materials > Compounds > Chemicals
– Content Type > Knowledge Asset > Proposals
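The facet paths above are hierarchies, so a faceted interface typically shows document counts one level at a time and lets the user drill down. A minimal sketch, assuming each document is tagged with one tuple-valued path per facet (a simplification; real systems often allow several values per facet). `DOCS`, `child_counts`, and the sample values are hypothetical and not part of any product.

```python
from collections import Counter

# Hypothetical documents: one path per facet, read top-down,
# e.g. Organization > Division A > Group 1 (the facet name itself is omitted).
DOCS = {
    "doc1": {"Organization": ("Division A", "Group 1"), "Clients": ("Federal", "EPA")},
    "doc2": {"Organization": ("Division A", "Group 2"), "Clients": ("Federal", "EPA")},
    "doc3": {"Organization": ("Division B", "Group 3"), "Clients": ("State",)},
}


def child_counts(docs, facet, prefix=()):
    """Count documents under each child node of `prefix` within one facet."""
    counts = Counter()
    depth = len(prefix)
    for tags in docs.values():
        path = tags.get(facet)
        # Only documents tagged below the selected node contribute a child value.
        if path and len(path) > depth and path[:depth] == prefix:
            counts[path[depth]] += 1
    return counts


# Top level of the Organization facet, then a drill-down into Division A.
print(child_counts(DOCS, "Organization"))
print(child_counts(DOCS, "Organization", ("Division A",)))
```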
Enterprise Environment – Case One – Taxonomy, 7 facets
Project Owner – KM department – included RM, business process
Involvement of library – critical
Realistic budget, flexible project plan
Successful interviews – build on context
– Overall information strategy – where taxonomy fits
Good draft taxonomy and extended refinement
– Software, process, team – train library staff
– Good selection and number of facets
Final plans and hand-off to client
Enterprise Environment – Case Two – Taxonomy, 4 facets
Taxonomy of Subjects / Disciplines:
– Geology > Petrology
Facets:
– Organization > Division > Group
– Process > Drill a Well > File Test Plan
– Assets > Platforms > Platform A
– Content Type > Communication > Presentations
Enterprise Environment – Case Two – Taxonomy, 4 facets
Environment Issues
– Value of taxonomy understood, but not the complexity and scope
– Under budget, understaffed
– Location – not KM – tied to RM and software
  • Solution looking for the right problem
– Importance of an internal library staff
– Difficulty of merging internal expertise and taxonomy
Enterprise Environment – Case Two – Taxonomy, 4 facets
Project Issues
– Project mindset – not infrastructure
– Wrong kind of project management
  • Special needs of a taxonomy project
  • Importance of integration – with team, company
– Project plan more important than results
  • Rushing to meet deadlines doesn’t work with semantics as well as software
Enterprise Environment – Case Two – Taxonomy, 4 facets
Research Issues
– Not enough research – and wrong people
– Interference of non-taxonomy – communication
– Misunderstanding of research – wanted tinker toy connections
  • Interview 1 implies conclusion A
Design Issues
– Not enough facets
– Wrong set of facets – business not information
– Ill-defined facets – too complex internal structure
Taxonomy Development Conclusion: Risk Factors
Political-Cultural-Semantic Environment
– Not simple resistance – more subtle
  • Re-interpretation of specific conclusions and sequence of conclusions / relative importance of specific recommendations
Understanding project scope
Access to content and people
– Enthusiastic access
Importance of a unified project team
– Working communication as well as weekly meetings
Faceted Navigation: Development Process – Overview
Research Foundation – KA Audit
– Environment – Technology and People
– Users, Content, Information Behaviors and Needs
Facet Design – Sources
– Selection of Facets and Facet Structure
Integrated Solution
– Metadata Strategy – Technology and People
Application – Design, Develop, Test, Refine
– Monitor and Refine
Faceted Navigation: Development Process – Information / Knowledge Environment
Strategic Foundation
– Info Problems – what, how severe
– Political environment – support, special interests
Strategic Questions – why, what value from the taxonomy and facet classification, how are you going to use it
Technology Environment – ECM, Enterprise Search
High Level Content Map / Content Structures
High Level Community Map – formal and informal
Faceted Navigation: Development Process – Facet Design: Sources
Facet Theory and Practice
– Broaden your perspective
Domain Collection – metadata
– Database or Catalog
– Unstructured content – much more difficult
Content Structure – vocabularies, glossaries, etc.
Building Facets – facetize the taxonomy
– Pull out facets
  • Chemistry – Agents/Compounds, Instruments
  • Chemistry and Health – methods
Current or projected metadata as source
– Content Types – presentations, well reports, policy
Faceted Navigation: Development Process – Research Foundation
Users – formal and informal communities
– How do users think, categorize
– Information behaviors and needs
– Natural Level categories
What labels do they use?
– Assets vs. Facilities and Instruments / Processes vs. Activities
– Issue – the labels people use to describe their business can differ from the labels they use to find information
Suitability of Facets and Facet Labels
– Support for user tasks
Interviews, surveys, search log analysis, folksonomies
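One inexpensive way to test candidate labels such as Assets vs. Facilities, or Processes vs. Activities, is to count which ones actually appear in a search log. A minimal sketch, assuming a plain list of query strings and a hand-built list of variants per label; `QUERIES`, `CANDIDATES`, and `label_usage` are hypothetical, and real log analysis would also need stemming and synonym handling.

```python
import re
from collections import Counter

# Hypothetical search log and candidate labels with their spelling variants.
QUERIES = [
    "platform a assets",
    "drilling activities 2008",
    "asset list north sea",
    "facilities map",
    "well test activities",
]
CANDIDATES = {
    "Assets": {"asset", "assets"},
    "Facilities": {"facility", "facilities"},
    "Processes": {"process", "processes"},
    "Activities": {"activity", "activities"},
}


def label_usage(queries, candidates):
    """Count the queries that use each candidate label (any listed variant)."""
    usage = Counter({label: 0 for label in candidates})
    for query in queries:
        tokens = set(re.findall(r"[a-z0-9]+", query.lower()))
        for label, variants in candidates.items():
            if tokens & variants:  # any variant of the label appears in the query
                usage[label] += 1
    return usage


for label, n in label_usage(QUERIES, CANDIDATES).most_common():
    print(f"{label:12}{n}")
```

A label that users never type (Processes in this made-up log) is a signal to check it in interviews before making it a facet name.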
Faceted Navigation: Development Process – An Integrated Approach: Elements
Multiple Knowledge Structures
– Facet – orthogonal dimension of metadata
– Taxonomy – subject matter / aboutness
Technology – Search, Content Management, Text Analytics
– Entity extraction – feeds facets, signatures, ontologies
– Taxonomy & Auto-categorization – aboutness, subject
People – tagging, evaluating tags, fine-tuning rules and taxonomy
People – Users, social tagging, suggestions
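Because each facet is an orthogonal dimension, refining by one facet is a simple conjunction with the others, and the counts shown for each remaining value are recomputed on the current result set. A minimal sketch of that behavior, assuming single-valued facets stored in plain dictionaries; all names and sample values are illustrative.

```python
from collections import Counter

# Hypothetical flat metadata: each facet is an independent dimension,
# and every document carries one value per facet.
DOCS = [
    {"id": 1, "Content Type": "Proposal", "Client": "EPA", "Method": "Population Study"},
    {"id": 2, "Content Type": "Proposal", "Client": "NOAA", "Method": "Field Survey"},
    {"id": 3, "Content Type": "Report", "Client": "EPA", "Method": "Field Survey"},
    {"id": 4, "Content Type": "Report", "Client": "EPA", "Method": "Population Study"},
]
FACETS = ["Content Type", "Client", "Method"]


def refine(docs, selections):
    """Keep documents that match every selected value (AND across facets)."""
    return [d for d in docs if all(d.get(f) == v for f, v in selections.items())]


def facet_counts(docs, facets):
    """Value counts per facet, computed on the current result set only."""
    return {f: Counter(d[f] for d in docs if f in d) for f in facets}


hits = refine(DOCS, {"Client": "EPA"})
print([d["id"] for d in hits])
print(facet_counts(hits, FACETS)["Content Type"])
```

Recomputing counts on the filtered set keeps users away from zero-result selections (NOAA drops out of the Client facet once EPA is chosen), which is one practical difference between faceted navigation and a fixed parametric search form.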
Faceted Navigation: Development Process – Integrated Solutions: Technology
Search – integrated features: facets, clusters, tag clouds, and feedback
Enterprise Content Management – tagging and policy
– Place to add metadata, supported by policy
– Gather input from authors, tag clouds plus
Text Analytics – taxonomy management, entity extraction, categorization, sentiment
– Auto-populate a variety of metadata – author, title, date, etc.
– Relevance – best bets to weights and classes of documents
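Auto-populating simple metadata often starts with heuristics before any trained model is involved. The sketch below assumes plain-text documents whose first non-empty line is the title, an author line beginning with "By", and an ISO-style date (YYYY-MM-DD); those conventions and the `auto_metadata` helper are assumptions for illustration, and commercial text analytics tools apply far richer rules.

```python
import re

DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")          # assumed ISO-style dates
AUTHOR_PATTERN = re.compile(r"^by\s+(.+)$", re.IGNORECASE)   # assumed "By <name>" line


def auto_metadata(text):
    """Fill title, author, and date fields using simple, assumed heuristics."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    meta = {"title": lines[0] if lines else None, "author": None, "date": None}
    for line in lines[1:]:
        match = AUTHOR_PATTERN.match(line)
        if match:
            meta["author"] = match.group(1)
            break  # first author line wins
    date = DATE_PATTERN.search(text)
    if date:
        meta["date"] = date.group(0)
    return meta


sample = "Well Test Plan: Platform A\nBy J. Analyst\nIssued 2008-03-14\nThis plan covers the test schedule.\n"
print(auto_metadata(sample))
```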
Faceted Navigation: Development Process – Software Tools: Auto-categorization
Auto-categorization
– Training sets – Bayesian, Support Vector Machine
– Terms – literal strings, stemming, dictionary of related terms
– Rules – simple – position in text (title, body, URL)
– Advanced – saved search queries (full search syntax)
  • NEAR, SENTENCE, PARAGRAPH
  • Boolean – X NEAR Y AND NOT Z
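The rule types listed above (position in text, NEAR, Boolean AND NOT) can be combined into a single categorization rule. A minimal sketch for a hypothetical "Marine toxins" category, assuming literal-string matching without stemming; the `NEAR/k` window, the rule, and the helper names are illustrative, and each vendor's rule syntax differs.

```python
import re


def tokenize(text):
    """Lowercase word tokens; literal strings only, no stemming."""
    return re.findall(r"[a-z0-9]+", text.lower())


def near(words, x, y, k):
    """True if terms x and y occur within k token positions of each other."""
    xs = [i for i, w in enumerate(words) if w == x]
    ys = [i for i, w in enumerate(words) if w == y]
    return any(abs(i - j) <= k for i in xs for j in ys)


def is_marine_toxins(doc, k=5):
    """Hypothetical rule for the category 'Marine toxins':
    title contains 'toxins'  OR  (body: marine NEAR/k toxins AND NOT freshwater)."""
    title = tokenize(doc["title"])
    body = tokenize(doc["body"])
    if "toxins" in title:  # simple position rule: a title hit is enough
        return True
    return near(body, "marine", "toxins", k) and "freshwater" not in body


docs = [
    {"title": "Survey results", "body": "Levels of marine algal toxins rose in 2007."},
    {"title": "Toxins in shellfish", "body": "Summary of sampling."},
    {"title": "Lake study", "body": "Marine and freshwater toxins compared."},
]
print([is_marine_toxins(d) for d in docs])
```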
Advanced Features
– Facts / ontologies / Semantic Web – RDF +
– Sentiment Analysis – positive, negative, neutral
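Sentiment analysis in its simplest form counts positive and negative terms. The sketch below assumes tiny hand-made lexicons and a one-word negation rule; it is a lexicon-based baseline for illustration, not how any particular product classifies sentiment.

```python
import re

# Tiny hypothetical lexicons; real tools use large, domain-tuned ones.
POSITIVE = {"good", "excellent", "reliable", "improved"}
NEGATIVE = {"bad", "poor", "failed", "delay"}
NEGATORS = {"not", "no", "never"}


def sentiment(text):
    """Label text positive, negative, or neutral from lexicon counts,
    flipping a term's polarity when the previous word is a negator."""
    words = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, w in enumerate(words):
        polarity = (w in POSITIVE) - (w in NEGATIVE)  # +1, -1, or 0
        if polarity and i > 0 and words[i - 1] in NEGATORS:
            polarity = -polarity
        score += polarity
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for s in ["The new pump is reliable and excellent", "The results were not good"]:
    print(s, "->", sentiment(s))
```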
Faceted Navigation: Development Process – Software Tools: Entity Extraction
Dictionaries – variety of entities, coverage, specialty
– Cost of update – service or in-house
– Inxight – 50+ predefined entity types
– Nstein – 800,000 people, 700,000 locations, 400,000 organizations
Rules
– Capitalization, text – Mr., Inc.
– Advanced – proximity and frequency of actions, associations
– Need people to continually refine the rules
Entities and Categorization
– Total number and pattern of entities = a type of aboutness of the document
  • Bar code, fingerprint
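Dictionaries and capitalization rules can be combined, and the resulting pattern of entity types per document treated as a rough fingerprint of what the document is about. A minimal sketch, assuming a tiny hypothetical dictionary and two regex cue rules (titles such as "Mr." and the suffix "Inc."); comparing fingerprints by cosine similarity is one common choice, used here for illustration only.

```python
import math
import re
from collections import Counter

# Hypothetical entity dictionary; real dictionaries are licensed or built in-house.
DICTIONARY = {"EPA": "Organization", "North Sea": "Location"}

# Capitalization rules keyed on cue words ("Mr.", "Inc.").
PERSON_RULE = re.compile(r"\b(?:Mr|Ms|Dr)\.\s+([A-Z][a-z]+(?:\s+[A-Z][a-z]+)?)")
COMPANY_RULE = re.compile(r"\b((?:[A-Z][a-z]+\s+)+)Inc\.")


def extract_entities(text):
    """Return (entity, type) pairs from dictionary lookup plus the two rules."""
    found = []
    for name, etype in DICTIONARY.items():
        hits = re.findall(r"\b" + re.escape(name) + r"\b", text)
        found.extend((name, etype) for _ in hits)
    found.extend((m.group(1), "Person") for m in PERSON_RULE.finditer(text))
    found.extend((m.group(1).strip() + " Inc.", "Company") for m in COMPANY_RULE.finditer(text))
    return found


def fingerprint(entities):
    """Count entities by type: a rough 'bar code' of what the document is about."""
    return Counter(etype for _, etype in entities)


def cosine(a, b):
    """Cosine similarity between two entity-type fingerprints."""
    dot = sum(a[k] * b[k] for k in set(a) | set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


text = "Mr. John Smith of Acme Drilling Inc. met the EPA about the North Sea. EPA staff agreed."
entities = extract_entities(text)
print(entities)
print(fingerprint(entities))
```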
Faceted Navigation: Development Process – Integrated Solution: People
Programmers, Librarians, Taxonomists, Metadata Specialists
– Integrate, design, develop rules, monitor activity & quality
Authors, Subject Matter Experts
– Input into design (important facets), rules, activity meaning
Users – Web 2.0
– Feedback – quality and usability
– Suggestions – missing terms, bad categorization & entities
– Tag clouds & folksonomy – for social networking features, not for information retrieval
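Tag clouds, consistent with the point above, serve as a display feature rather than a retrieval tool. A common way to size tags is to scale font size by the logarithm of tag frequency, so a few very popular tags do not dwarf the rest; the point sizes and the `tag_cloud_sizes` name are assumptions.

```python
import math


def tag_cloud_sizes(tag_counts, min_pt=10, max_pt=32):
    """Map tag frequencies to font sizes (points) on a logarithmic scale."""
    logs = {tag: math.log(n) for tag, n in tag_counts.items() if n > 0}
    if not logs:
        return {}
    lo, hi = min(logs.values()), max(logs.values())
    span = (hi - lo) or 1.0  # all tags equally frequent: avoid dividing by zero
    return {tag: round(min_pt + (v - lo) / span * (max_pt - min_pt)) for tag, v in logs.items()}


print(tag_cloud_sizes({"drilling": 100, "safety": 10, "geology": 1}))
```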
Faceted Navigation: Development Process – Faceted Navigation Application
Usability Studies
– Integration with browse/search – findability
– Equal-ranked facets or primary-secondary facets
– Granularity of facets
– Ordering of the facets
– Sorting within facets
Monitor usage and refine
– Unused facets / preferred facets / facet combinations
– Map to user communities / information behaviors
Refine auto-categorization and entity values
– Disambiguation
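Monitoring facet usage can start from a simple log of which facets each session selected. A minimal sketch, assuming sessions are already grouped into lists of facet names (a hypothetical log format); it reports selection counts, facets never used, and pairs of facets used together in the same session.

```python
from collections import Counter
from itertools import combinations


def facet_usage_report(sessions, all_facets):
    """Summarize facet selections from a (hypothetical) session log.

    sessions: list of lists, each holding the facet names one session used.
    Returns (selection counts, co-used facet pairs, facets never used).
    """
    used = Counter()
    pairs = Counter()
    for selected in sessions:
        facets = sorted(set(selected))          # ignore repeat clicks within a session
        used.update(facets)
        pairs.update(combinations(facets, 2))   # each unordered pair once per session
    unused = [f for f in all_facets if used[f] == 0]
    return used, pairs, unused


SESSIONS = [
    ["Client", "Content Type"],
    ["Client"],
    ["Client", "Method", "Content Type", "Client"],
]
ALL_FACETS = ["Client", "Content Type", "Method", "Facility", "Material"]

used, pairs, unused = facet_usage_report(SESSIONS, ALL_FACETS)
print(used.most_common())
print(pairs.most_common(1))
print(unused)
```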
Conclusion – Development
Design starts with self-knowledge – users, content, activities
An integrated solution is needed
– Multiple knowledge structures, technology, people
– Search, content management, text analytics
Faceted navigation requires a lot of metadata
Text analytics (entity extraction and auto-categorization) is essential
Monitor and Refine never ends – dedicated resources
Semantic projects are different
– Project management, software evaluation
Conclusions – Faceted Navigation
The future is the combination of simple facets (name catalogs of entities) with rich taxonomies that carry complex semantics / ontologies
– Ontologies = relationships of two facets
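One way to read "ontologies = relationships of two facets" is as a typed relation linking values in one facet to values in another, so that a selection in one facet can suggest values in the other. A minimal sketch using plain (subject, predicate, object) tuples; a production system would more likely store these as RDF, and the `housed_in` predicate, the sample values, and `related_values` are hypothetical.

```python
# Hypothetical facet membership for a few values from two facets.
FACET_OF = {
    "Ocean Analysis Vehicle": "Instruments",
    "Spectrometer": "Instruments",
    "Building X": "Facilities",
    "Building Y": "Facilities",
}

# Hypothetical (subject, predicate, object) triples linking the two facets.
TRIPLES = {
    ("Ocean Analysis Vehicle", "housed_in", "Building X"),
    ("Spectrometer", "housed_in", "Building Y"),
    ("Spectrometer", "housed_in", "Building X"),
}


def related_values(selected, predicate, target_facet, triples=TRIPLES, facet_of=FACET_OF):
    """Values in `target_facet` linked by `predicate` to any selected value."""
    return sorted({
        o for s, p, o in triples
        if s in selected and p == predicate and facet_of.get(o) == target_facet
    })


# Selecting an instrument suggests the facilities that house it.
print(related_values({"Spectrometer"}, "housed_in", "Facilities"))
```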
Facets call for a new type of taxonomy
– Faceted taxonomies and/or simple taxonomies
Future – new kinds of applications
– Text mining, research tools, sentiment
Future of Search – smart ways to refine results, not better relevance
– Real problem with 10 million hits – no way to get to the target
– Include facets, taxonomies, semantics, & lots of metadata
Questions?
Faceted Navigation Resources
Articles
– Faceted Classification Resource Collection
  • http://deyalexander.com/resources/faceted-classification.html
– A Simplified Model for Facet Analysis
  • http://iainstitute.org/pg/a_simplified_model_for_facet_analysis.php
– Mailing List for Faceted Classification
  • http://www.poorbuthappy.com/fcd/
– Study – Facets on the Web (75 ecommerce sites)
  • http://mypage.iu.edu/%7Eklabarre/facetstudy.html
Faceted Navigation Resources
Example Implementations
– Berkeley SIMS – Flamenco
  • http://bailando.sims.berkeley.edu/flamenco.html
– Facetmap – demos – www.facetmap.com
Tools
– Business Objects / Inxight – entity and fact extraction – www.inxight.com
– Teragram – www.teragram.com
– Lexalytics – www.lexalytics.com
– Data Harmony – www.dataharmony.com
– Smart Logic – www.smartlogic.com
Faceted Navigation Resources
Vendors
– Most search vendors now offer faceted navigation – FAST, Autonomy, etc.
  • Beware of parametric search sold as facets
– Most focused on facets – application and metrics
  • Endeca – http://www.endeca.com
Faceted Navigation Resources
Articles
– How to Make a Faceted Classification and Put It On the Web
  • http://www.misatonic.org/library/facet-web-howto.html
– Putting Facets on the Web: An Annotated Bibliography
  • http://www.miskatonic.org/library/facet-biblio.html
– Ecommerce – cooking and kitchen – Faceted Navigation http://www.
– Extended Faceted Taxonomies for Web Catalogs
  • http://www.ercim.org/publication/Ercim_News/enw51/tzitzikas.html
– Webdesignpractices – study of ecommerce use of faceted navigation – Use of Faceted Classification
  • http://www.webdesignpractices.com/navigation/facets.html