Date post: | 14-May-2015 |
Category: |
Technology |
Upload: | puckmiller3 |
Enterprise Class Taxonomy Management and Auto-classification: Leveraging the Term Store for Organizational Metadata to Close Information and Records Management Capability Gaps in SharePoint
The Term Store Management Company
Don Miller is a senior executive at Concept Searching with over 20 years of experience in knowledge management. He is a frequent speaker on Records Management and Information Architecture problems and solutions. Don has been a guest speaker at Taxonomy Boot Camp, Managing Electronic Records and numerous SharePoint events on information organization and records management.
Agenda
Concept Searching • Don Miller • (408) 828-3400 • [email protected]
• Introductions
• Company Overview, Unique Differentiator, Use Cases
• The cost and ROI of metadata for Records Management and Findability
• SharePoint 2010 Enterprise Metadata Management Service
• Term Store Basics
• Enterprise Taxonomy and Auto Classification
• Product Screen Shots
• Demo of conceptClassifier for SharePoint 2010
• Show native integration into SharePoint 2010 for Records Management and automatic content type updating
• Dynamic guided navigation within the search platform
• Show enterprise Taxonomy Management and auto-classification capabilities
• Building out new Taxonomies/Term Sets
• Term Store Management
• Enterprise Taxonomy Management
• Company founded in 2002
• Product launched in 2003
• Focus on management of structured and unstructured information
• Technology: automatic concept identification, content tagging, auto-classification, taxonomy management
• Only statistical vendor that can extract conceptual metadata
2009, 2010, 2011 ‘100 Companies that Matter in KM’ (KM World Magazine)
KMWorld ‘Trend Setting Product’ of 2009, 2010
Locations: US, UK, & South Africa
Client base: Fortune 500/1000 organizations
Managed Partner under Microsoft global ISV Program - “go to partner” for SharePoint 2010 Term Store Management
Microsoft Enterprise Search ISV, FAST Partner
Enterprise Product Suite: conceptSearch, conceptTaxonomyManager, conceptClassifier
Concept Searching, Inc.
ConceptSearching’s unique statistical concept identification underpins all technologies
Multi-word term suggestion is more valuable than single-term suggestion
Automated Multi Word Term Suggestions for Term Store
conceptClassifier generates conceptual metadata by extracting multi-word terms, so it identifies ‘triple heart bypass’ as one concept rather than as three unrelated keywords
• Metadata can be used by any search engine index or any application/process that uses metadata
Concept Searching provides Automatic Concept Term Extraction
Triple: Baseball, Three
Heart: Organ, Center
Bypass: Highway, Avoid
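To make the single-word ambiguity above concrete, here is a minimal Python sketch that suggests candidate multi-word terms by counting repeated word sequences. It is a plain frequency heuristic chosen for illustration, not Concept Searching's statistical algorithm; the function name `suggest_terms`, the stop-word list and the sample text are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "of", "for", "to", "in", "was", "is", "after"}

def suggest_terms(text, max_words=3, min_count=2):
    """Return repeated multi-word phrases, longest and most frequent first."""
    counts = Counter()
    # Split on sentence punctuation so phrases never span two sentences.
    for sentence in re.split(r"[.!?;]", text.lower()):
        words = re.findall(r"[a-z]+", sentence)
        for n in range(2, max_words + 1):
            for i in range(len(words) - n + 1):
                gram = words[i:i + n]
                # Skip phrases that start or end with a stop word.
                if gram[0] in STOP_WORDS or gram[-1] in STOP_WORDS:
                    continue
                counts[" ".join(gram)] += 1
    repeated = [(term, c) for term, c in counts.items() if c >= min_count]
    # Prefer longer phrases, then higher counts, then alphabetical order.
    repeated.sort(key=lambda tc: (-len(tc[0].split()), -tc[1], tc[0]))
    return [term for term, _ in repeated]

sample = (
    "The patient had a triple heart bypass. "
    "Recovery after a triple heart bypass is slow. "
    "The highway bypass was closed."
)
suggestions = suggest_terms(sample)
print(suggestions[0])  # the longest repeated phrase in the sample
```

In this sample the three-word phrase ranks ahead of its two-word fragments, which is the behaviour a term-suggestion tool would want before a person reviews the candidates.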
Enterprise Class Product Suite - Deployment Case Studies
USAF Medical Service Global Deployments 70,000 Users
LexisNexis FAST Multi User Distributed Taxonomy Management Architecture
Xerox E Discovery 150 Million Documents
Market Research FAST WWW
Logica FAST 40,000 Users
CAL ISO & MIDWEST ISO FAST WWW
Booz and Company Taxonomy Management
Emerson Climate Technologies Enterprise Deployment
BP Enterprise Deployment
Parsons Brinckerhoff FAST Global Deployment 40,000
CPSC Enterprise wide FAST Enterprise Deployment
National Transportation Safety Board FAST Enterprise Deployment
Health and Human Services FAST Enterprise Deployment
Southern Union Group FAST
What is poor metadata (lack of structure) costing you?
Data Privacy Protection
Problem:
• Average cost per exposed record is $197 and ranges from $90-$305 per record
• 70% of breaches are due to a mistake or malicious intent by an organization’s own staff
• Average cost runs from $225K to $35M
Solution:
• Identify any type of organizationally defined privacy data
• Combines pattern matching with associated vocabulary
• Automatic Content Type updating enabling workflows and rights management
Search
Problem:
• “It’s not about better search”
• Less than 50% of content is correctly indexed, meta tagged or efficiently searchable
• 85% of relevant documents are never retrieved in search
Solution:
• Eliminate manual tagging & replace with automatic identification of multi-word concepts
• Provide guided navigation via the taxonomy structure (i.e. concepts)
• Go beyond dynamic clustering with conceptual clustering based on the taxonomies
Benefit:
• Taxonomy navigation is 36% - 48% faster
• Savings of 2.5 hours per user per day
Records Management
Problem:
• 67% of data loss in Records Management is due to end user error
• It costs an organization $180 per document to recreate it when it is not tagged correctly and cannot be found
Solution:
• Eliminate inconsistent end user tagging
• Automatically declare documents of record based on vocabulary and retention codes
• Automatically change the Content Type and route to the Records Management repository
Benefit:
• Savings of $4.00 - $7.04 per record by eliminating manual tagging
• Ensures compliance and reduces potential litigation exposures
Pre Migration/Collaboration
Problem:
• 60% of stored documents are obsolete
• 50% of documents are duplicates
• Requires resources to identify what should/not be migrated
Solution:
• Eliminate duplicate documents
• Identify privacy data exposures
• Identify and declare records that were not previously identified
• Notify users of high value content
• Migrating required content to a structure
Benefit:
• Reduces migration costs
• Ensures compliance and protection of content assets
• Easy end user updates
A manual metadata approach will fail 95%+ of the time
Issue | Organizational Impact
Inconsistent | Less than 50% of content is correctly indexed, meta-tagged or efficiently searchable, rendering it unusable to the organization (IDC)
Subjective | Highly trained Information Specialists will agree on meta tags between 33% - 50% of the time (C. Cleverdon)
Cumbersome - Expensive | Average cost of manually tagging one item runs from $4 - $7 per document and does not factor in the accuracy of the meta tags nor the repercussions from mis-tagged content (Hoovers)
Malicious Compliance | End users select first value in list (Perspectives on Metadata, Sarah Courier)
No perceived value for end user | What’s in it for me? End user creates document, does not see value for organization nor risks associated with litigation and non conformance to policies.
What have you seen?
Metadata will continue to be a problem due to inconsistent human behavior.
The answer to consistent metadata is an automated approach that can extract the meaning from content, eliminating manual metadata generation yet still providing the ability to manage knowledge assets in alignment with the unique corporate knowledge infrastructure.
Create enterprise automated metadata framework/model | Average return on investment minimum of 38% and runs as high as 600% (IDC)
Apply consistent meaningful metadata to enterprise content | Incorrect meta tags cost an organization $2,500 per user per year, in addition to potential costs for non-compliance (IDC)
Guide users to relevant content with taxonomy navigation | Savings of $8,965 per year per user based on an $80K salary (Chen & Dumais); 100% “Recall” of content; 35% faster access to content (“Precision”)
Use automatic conceptual metadata generation to improve Records Management | Eliminate inconsistent end user tagging at $4-$7 per record (Hoovers); improve compliance processes, eliminate potential privacy exposures
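The per-unit figures quoted above can be turned into a rough estimate for a specific organization. The Python sketch below does only that arithmetic; the $4-$7 per document and $2,500 per user per year values come from the slide, while the document volume, the user count and the function names are hypothetical inputs chosen for illustration.

```python
def manual_tagging_cost(num_documents, low_per_doc=4.0, high_per_doc=7.0):
    """Return the (low, high) cost range of manually tagging num_documents.

    The $4-$7 per-document defaults are the figures quoted on the slide;
    the document volume is an assumption supplied by the caller.
    """
    return num_documents * low_per_doc, num_documents * high_per_doc

def mistagging_cost(num_users, cost_per_user_per_year=2500.0):
    """Yearly cost of incorrect meta tags at the quoted $2,500 per user."""
    return num_users * cost_per_user_per_year

# Hypothetical organization: 100,000 documents and 500 users.
low, high = manual_tagging_cost(100_000)
yearly_mistag = mistagging_cost(500)
print(f"Manual tagging: ${low:,.0f} to ${high:,.0f}")
print(f"Mis-tagging: ${yearly_mistag:,.0f} per year")
```

For those assumed inputs the manual tagging bill lands between $400,000 and $700,000, before counting the yearly cost of tags that are applied incorrectly.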
conceptClassifier for SharePoint 2010 and the Enterprise provides an automated approach to apply metadata and content types for immediate ROI and business value
1. Align, Model and Validate
2. Automate Tagging
3. Findability
4. Business Processes – Alerts & WF
5. Records Management and PII
6. Life Cycle Management
Microsoft’s approach to solving the metadata problem for Records Management, Governance Policies, Sensitive Information Removal and Findability:
Content Types, The Term Store and Enterprise Managed Metadata Services
• A Content Type is a means of applying structure to unstructured or structured content within SharePoint. Content Types inherit from their parent content types.
• This is usually a combination of a term or terms from a single or multiple term sets.
• Terms are metadata, and metadata is information about information.
• Terms can also include governance and retention code policies, and can also exist for the sole purpose of improved findability.
• However, it is best to align Content Types with business goals and business use cases.
What is a content type
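As a rough mental model of the points above, the Python sketch below treats a content type as a named set of columns, each bound to a term set, that inherits columns from its parent. This is a toy data structure for illustration and not the SharePoint object model; the class name, column names and term set names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class ContentType:
    """Toy model of a content type: named columns bound to term sets."""
    name: str
    parent: Optional["ContentType"] = None
    # Maps a column name to the name of the term set that supplies its values.
    columns: Dict[str, str] = field(default_factory=dict)

    def all_columns(self) -> Dict[str, str]:
        """Merge inherited columns with this type's own; child entries win."""
        inherited = self.parent.all_columns() if self.parent else {}
        return {**inherited, **self.columns}

document = ContentType("Document", columns={"Department": "Organization"})
record = ContentType(
    "Record",
    parent=document,
    columns={"Retention Code": "Retention Schedule"},
)
pii_record = ContentType("PII Record", parent=record, columns={"Privacy Class": "PII"})

for column, term_set in pii_record.all_columns().items():
    print(f"{column} <- {term_set}")
```

The child type ends up with its own governance column plus everything its ancestors defined, which is why aligning content types with business use cases pays off further down the hierarchy.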
Introducing EMM, The Term Store and Term Store Management Definitions
Subscription Service
Content Type Hub
Term Store
Term Store Management
Auto Classification
Content Type Updating
SharePoint 2010 Farm
Site Collection
Records Library
Concept Classifier for SharePoint 2010
SharePoint 2010 Enterprise Managed Metadata Service
• Managed Metadata Service
– Manages Enterprise Content Types via the Content Type Hub
– Manages Term Store
• Term Sets (taxonomies) and terms can be shared across multiple SharePoint site collections
• Multiple managed metadata services can be created
• Enables search filtering
• Two types of terms:
– Managed terms: pre-defined by an enterprise administrator and may be hierarchical. Surfaced in the "managed metadata" column type
– Managed keywords: non-hierarchical words or phrases that have been added to SharePoint 2010 items by users (folksonomy)
The Managed Metadata Service
30,000 Terms per Term Set (1 Taxonomy)
1,000 Term Sets
Tested to 1,000,000 Preferred Terms
Enterprise Managed Metadata Service
SharePoint 2010 Managed Metadata Service Considerations
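When planning term sets, the limits above can be checked before anything is loaded into the Term Store. The Python sketch below is a hypothetical planning helper, not a SharePoint API; it treats the 1,000,000 figure as a ceiling on total terms across the store, which is an assumption about how the slide's "tested to" number should be read.

```python
# Limits quoted on the slide for the SharePoint 2010 Managed Metadata Service.
MAX_TERMS_PER_TERM_SET = 30_000
MAX_TERM_SETS = 1_000
MAX_TOTAL_TERMS = 1_000_000  # assumption: read as a total across the term store

def check_term_store(term_sets):
    """Return a list of limit violations for a {term_set_name: term_count} plan."""
    problems = []
    if len(term_sets) > MAX_TERM_SETS:
        problems.append(f"{len(term_sets)} term sets exceeds {MAX_TERM_SETS}")
    for name, count in term_sets.items():
        if count > MAX_TERMS_PER_TERM_SET:
            problems.append(
                f"term set '{name}' has {count} terms, limit is {MAX_TERMS_PER_TERM_SET}"
            )
    total = sum(term_sets.values())
    if total > MAX_TOTAL_TERMS:
        problems.append(f"{total} total terms exceeds {MAX_TOTAL_TERMS}")
    return problems

# Hypothetical plan with one oversized term set.
planned = {"Products": 12_000, "Locations": 31_500}
issues = check_term_store(planned)
for issue in issues:
    print(issue)
```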
SharePoint 2010 Element | Comments
Site Collection/Site Structure | Can be organized by a hierarchical taxonomy structure
Document Library Structure | Can be organized by a hierarchical taxonomy structure
Columns | Where terms are applied to content in Document Libraries and Lists
Term | A metadata value. Metadata is information about information.
Term Set | Hierarchical metadata with values
Managed Metadata | SP 2010’s ability to manage terms and term sets (hierarchical)
Keywords | Allows end users to add metadata; not recommended for enterprise use (flat list only!)
Content Types | Ability to associate managed metadata columns and term sets with a specific content type for a specific business requirement: governance (e.g. PII, Retention Code, SOX) or findability (facets for navigation)
• File Share or Directory Structures
• Database fields/tables
• Excel spreadsheet
• File Plan, especially if using for records management
• Search Analytics
• Topic Maps
• Card Sorting (Open & Closed)
• Subject Matter Experts
• Free industry standard taxonomies
• Wikipedia: “Industry classification” or “Global Industry Classification Standard”
• WWW directory structure
• Tag Clouds: Flickr, Del.icio.us, Technorati
• ConceptSearching: Free Taxonomies
• Hard Core: ANSI/NISO Z39.19-2005
What/where do I find good examples to use to build out term sets and terms
conceptClassifier for SharePoint is the only native Term Store Management tool for 2010
Term Set
Child Term
Parent Term
Grand Child Term
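The parent, child and grandchild levels named above can be pictured as a nested structure. The Python sketch below walks such a structure and prints the full path of each term; the nested-dict representation and the sample "Electronics" terms are hypothetical and are not how the Term Store stores data internally.

```python
def term_paths(tree, prefix=()):
    """Yield every term in a nested dict as a tuple path from the term set root."""
    for term, children in tree.items():
        path = prefix + (term,)
        yield path
        # Recurse into child terms, carrying the path built so far.
        yield from term_paths(children, path)

# Hypothetical term set: parent term -> child terms -> grandchild terms.
electronics = {
    "Electronics": {
        "TVs": {"LCD": {}, "Plasma": {}},
        "Computers": {},
    }
}
paths = [" > ".join(p) for p in term_paths(electronics)]
for p in paths:
    print(p)
```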
A content type can contain one or many taxonomies based on specific business user requirements. The values can be shown as columns or can be hidden from users for administrative or governance purposes only.
Build term sets/taxonomies here in SharePoint 2010 EMM. Plan for 30,000 values
Traditional manual approach is subjective, cumbersome and overwhelming
End user must select values from multiple term sets. Up to 30,000 values per term set and 1,000 term sets per term store. Manual approach is impractical.
ConceptClassifier for SharePoint 2010
An automated solution for applying metadata and providing term store management to enhance SharePoint 2010 capabilities for Records Management, Governance Policies, Rights Management, Sensitive Information Removal and Findability.
Native integration into Term Store
No Service Pack updates, no custom code | ConceptClassifier is a native integration.
No custom property types | Every item is synchronized with the term store and is part of the managed metadata service. All search features work natively as they should. There are no custom search property values, which would require custom code updates and additional custom search controls. ConceptClassifier is a native integration.
Why do we work natively with the term store? | Because it is the natural place to store metadata if you are driving economies of scale by leveraging the Microsoft stack. That is Microsoft’s road map for metadata management.
Easy upgrade | If you want to go back to a purely manual application, there is no code rewrite. ConceptClassifier is a native integration. You just unplug and you are back to native.
conceptClassifier provides a native integration into Term Store
Multi User Distributed Branch and Term Support for Enterprise
Native Term Store Integration for SharePoint 2010
Accelerate building out taxonomies by 75% with automatic Term/Clue Suggestion
Enables information architects to build, model and validate
Automatic Term Boosting for FAST/Search Platforms
Pragmatic Ontology Features for subject matter experts (You don’t need to be a librarian)
• Broad to Narrow
• Preferred Term
• Non preferred terms
• Poly hierarchies – Not supported in Term Store
• Relations – Not supported in Term Store
Enterprise Taxonomy Management and Auto-classification
conceptClassifier for SharePoint 2010
Automatically applies Metadata
Automatically Applies Content Types
Auto Applies Retention Code Policies
Automatically applies Windows Rights Management Policies
Automatic Term Boosting for FAST
Pulls hierarchy directly from Term Store, therefore updates are immediate and accurate for guided taxonomy navigation in FAST
conceptClassifier for SharePoint 2010 drives immediate value for end users for Search, Records Management and Sensitive Information Removal
conceptClassifier for FAST Search
Improves search outcomes by placing conceptual metadata in the FAST Search index to increase relevancy of search results
Enables import of FAST Entities into the conceptClassifier taxonomy manager to fine-tune them with metadata generated from your own content and nomenclature
Runs natively as a FAST Pipeline Stage eliminating integration and customization issues
Eliminates vocabulary normalization issues across global boundaries through controlled vocabularies
Improves faceted search results as facets are based on concepts aligned with the taxonomy
Provides taxonomy browse capabilities based on the nodes within the corporate taxonomy(s)
Provides accurate metadata filters such as numeric range searching and wildcard alphanumeric matching
Removes documents from search results that are confidential/sensitive through automatic Content Type updating and routing to secure server
Automatically tags content with both vocabulary and retention codes and respects SharePoint security that could prevent access to the document once it has been declared a record
Product Screen Shots
Traditional manual approach is subjective, cumbersome and ineffective
End user must select values from multiple term sets. Up to 30,000 values per term set and 1,000 term sets per term store. Manual approach is impractical.
An automated approach ensures accurate Records Management, Sensitive Information Removal and improved Search/Findability
Metadata is automatically applied to content by ConceptClassifier via TaxonomyManager. Content Type Updater can take this a step further and modify the content type to redirect a document/object to a different content type, or migrate it to another site collection or document library. In this example the documents are being changed from the Document content type to the PII or Records Center content type.
Term Store Management is provided by Taxonomy Manager and ConceptClassifier
TaxonomyManager is an intuitive and elegant tool to manage how and when term sets are applied within SharePoint 2010 and what new terms to add to the term store
Deep capabilities to build out rule-based classification approaches including: standard term, phonetics, metadata, class ID, language, case sensitive, regular expression and boosting
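A few of the rule types listed above (standard term, case sensitive, regular expression and boosting) can be sketched as weighted clues that are scored against a document. This Python example is an illustrative model only, not TaxonomyManager's rule engine; the `Clue` class, its field names, the weights and the sample text are all assumptions.

```python
import re
from dataclasses import dataclass

@dataclass
class Clue:
    """One classification rule; the field names are illustrative assumptions."""
    pattern: str
    weight: float = 1.0          # "boosting": how much each match contributes
    is_regex: bool = False
    case_sensitive: bool = False

    def matches(self, text: str) -> int:
        """Count non-overlapping matches of this clue in the text."""
        flags = 0 if self.case_sensitive else re.IGNORECASE
        # Plain terms are escaped and matched on word boundaries.
        expr = self.pattern if self.is_regex else r"\b" + re.escape(self.pattern) + r"\b"
        return len(re.findall(expr, text, flags))

def score(text, clues):
    """Sum the weighted match counts of all clues for one taxonomy node."""
    return sum(c.weight * c.matches(text) for c in clues)

cardiology = [
    Clue("triple heart bypass", weight=5.0),
    Clue("cardiac", weight=2.0),
    Clue(r"\bCABG\b", weight=3.0, is_regex=True, case_sensitive=True),
]
doc = "Cardiac surgery notes: the CABG, a triple heart bypass, went well."
doc_score = score(doc, cardiology)
print(doc_score)
```

A document would then be assigned to a node when its score passes a threshold, which is where boosting the multi-word clue above the single keyword makes a visible difference.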
The documents with 10 in front of them have had their content types updated. In this example the documents are being changed from the Document content type to the PII or Records Center content type. They could also have been moved to a different folder if that was the desired outcome.
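The content type change described here can be sketched as a rule that combines pattern matching with associated vocabulary, as the deck describes for privacy data. The Python example below is a simplified illustration, not the Content Type Updater itself; the Social Security number pattern, the vocabulary list, the function name and the sample documents are hypothetical.

```python
import re

# Illustrative pattern for US Social Security numbers (e.g. 123-45-6789).
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# Hypothetical vocabulary that must appear alongside the pattern.
PII_VOCABULARY = {"ssn", "social security", "date of birth"}

def classify_content_type(text, current_type="Document"):
    """Return 'PII' when a pattern hit is backed by vocabulary, else keep the type."""
    lowered = text.lower()
    has_pattern = SSN_PATTERN.search(text) is not None
    has_vocab = any(term in lowered for term in PII_VOCABULARY)
    return "PII" if has_pattern and has_vocab else current_type

docs = {
    "hr_form.docx": "Employee SSN: 123-45-6789, date of birth on file.",
    "parts_list.docx": "Order part 123-45-6789 from the supplier.",
}
updated = {name: classify_content_type(body) for name, body in docs.items()}
for name, content_type in updated.items():
    print(name, "->", content_type)
```

Requiring both signals keeps the part number in the second document from being flagged, which is the point of pairing a pattern with vocabulary rather than relying on the pattern alone.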
conceptClassifier for 2010 Product Suite provides intuitive guided navigation for FAST
Multi-value select within a term set is the single fastest way to give end users access to the correct content. It is just like picking values on Best Buy or Amazon, but with your personalized corporate term set vocabulary.
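Multi-value selection can be sketched as a filter in which values chosen within one term set are OR-ed together and separate term sets are AND-ed. The Python example below illustrates that logic on an in-memory list; it is not FAST or SharePoint search code, and the function name, facet names and documents are hypothetical.

```python
def filter_by_terms(documents, selections):
    """Keep documents matching every facet; values within a facet are OR-ed."""
    results = []
    for doc in documents:
        tags = doc["tags"]
        # A document passes when, for each selected facet, its tag is one of the chosen values.
        if all(tags.get(facet) in chosen for facet, chosen in selections.items()):
            results.append(doc["title"])
    return results

library = [
    {"title": "Q1 Report", "tags": {"Department": "Finance", "Doc Type": "Report"}},
    {"title": "Hiring Plan", "tags": {"Department": "HR", "Doc Type": "Plan"}},
    {"title": "Press Kit", "tags": {"Department": "Marketing", "Doc Type": "Press Release"}},
    {"title": "HR Report", "tags": {"Department": "HR", "Doc Type": "Report"}},
]

# Two values picked in one term set, one value in another.
picked = {"Department": {"Finance", "HR"}, "Doc Type": {"Report"}}
hits = filter_by_terms(library, picked)
print(hits)
```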
conceptClassifier for FAST and SharePoint 2010 Search
• Set proper expectations
– Select a business unit to begin term set building and classification approaches (Manual vs. Automated) within SharePoint
– Manual: no more than 3 tags
– Manage scope, don’t try to boil the ocean
• Focus on value
– Focus on the key constituents to whom you can show immediate value
– Search or Findability
– Records Management
• Focus on Use Cases
– Understand how and why they will use term sets and how they will apply metadata
• Define Governance (See partner presentation from PPC on governance)
– Roles, responsibilities, policies, and procedures
• Reconfirm expectations, it is a Marathon not a Sprint
– Taxonomy development is an iterative and on-going effort
– It changes and evolves just like your content and terminology
– Add new business units or users after successful feedback from initial term set sponsors
How To Guide for Taxonomies in SharePoint
Best practices for Term Store Development and applying metadata in SharePoint 2010 for Records Management and Findability
Demo
Differentiator | Value
Enterprise Product Suite for Metadata Management that includes: Taxonomy Management (TM), Auto Classification (AC), and Search | Only product to use TM, AC and Search to test, validate and build out meaningful business taxonomies and term sets for records management, sensitive information removal and improved search
Native Integration into SharePoint 2010 for Term Store Management | No custom code, no additional user controls, easy installation and upgrades with Microsoft SharePoint 2010
Positioned for success | Privately funded; Strategic Microsoft Partner for Term Store Management; Leverage partners for deployment and domain expertise across the world; Growing Fortune 2000 Customer Base
In Summary we are an Enterprise Metadata Management Product Suite
Planning
• Determine Key Term Sets
– Think about audience, business needs, content types
– Focus on immediate needs, build out term set
– Ask for immediate feedback
• Governance for Tagging
– Vision and Executive Sponsorship
– Roles and responsibilities – Committee of one
– Policies and procedures – Committee of one
• Adoptability
– Communication – Mandated process?
– Education and Training – How much time to ensure adoption
– Maximum of 3-5 manual tags
• Internal Promotion
– Tag off - Total number of tags per business unit or group
– Show total number of retention code policies as a before and after
• Showing ROI
– The Stop Watch Test
– Governance Applications
– Executive Feedback – Tuning exercise
Initial Planning:
Method | Definition | Examples
Records Management | Retention Code Policies | employment, staffing, training
Subject-oriented | Information categorized by subject or topic. Instantive: each child category is an instance of the parent category. Partitive: each child category is a part of the parent category | water pollution, soil pollution, air pollution
Functional | Information categorized by the process to which it relates | employment, staffing, training
Organizational | Information categorized by corporate departments or business entities | Human Resources, Marketing, Accounting, Research
Document Type | Information categorized by the type of document | presentations, expense reports, press releases
Location | Information categorized by the location where it originated or was conceived | US State, Office locations
Product or Customer | Information categorized by the product or customer it was developed for | Electronics > TVs, DVD Players, Computers
Categorization Schemas
Hardest
Easiest
Records Management Use Cases
www.conceptsearching.com
• Lack of Information Transparency
– Government and Private Sector directives to tag content for retrieval
– Untagged Data Assets = Untapped Resources
– Time Gap between Information Requests and Discovery is Directly Proportional to Volume of Data Assets
• Non-Compliance with Records Management Policies
– Sarbanes-Oxley and Government RM Retention Schedules
– Data Stored in Wrong Location
– Information not Preserved in Accordance with Regulatory Guidelines
• Increasing Volume of Unplanned Data Exposure Events
– Privacy Act Program (PII), Protected Health Information (PHI), HIPAA, Payment Card Industry (PCI), etc.
– Organizational Confidential and Sensitive Information
Problems
Information and Records Management Capability Gaps
Why is this Difficult?
Physical or Cognitive Properties of an Individual or Human Social Behavior which Influence Functioning of Technological Systems
Metadata
Tagging
Records Retention Code
Access Rights
Document Library 1 Document Library 2
Document Library 3 Document Library 4
Server Content with Appropriate Metadata, Retention Codes, and Rights Management
Templates
Human Factors
Limiting Factor = Human Behavior
How Do Organizations Typically Address These Capability Gaps?
• Customize system interface to force manual application of metadata
– Pros: data assets now have metadata
– Cons: high customization costs, increase in end-user labor costs, less end-user productivity, non-standardized application of metadata across enterprise
• Hire temporary staff to add metadata to data assets
– Pros: data assets now have metadata
– Cons: temporary staff = $$$$$ and results in non-standardized tagging
• Acknowledge that it is a problem and do nothing
Alternatives
Records Retention Code Tagging
Automatic Content Type Updating
Records Management
Confidential Secure Data
Collaboration Portal
Concept Classifier
Security
Appropriate Storage & Preservation
Increase Information Retrieval Precision for Search
Semantic Metadata Tagging
Metadata, Auto-classification, Taxonomies Drive Business Value
Tagged for Search
How does Concept Searching Close IM and RM Capability Gaps
• Uses Taxonomy Manager to create and manage organizational taxonomies, ontologies, and metadata environment;
• Employs conceptClassifier for SharePoint as an Automated Metadata Population Service;
• Applies content types based on metadata;
• Uses content types derived from metadata to drive individual and group access to data assets using inherent SharePoint Security;
• Uses content types derived from metadata to drive migration of data assets to proper document libraries where RMS templates are automatically applied to restrict data asset usage.
Leveraging Metadata as an Enabling Asset
Concept Searching in MOSS and Windows Server
SharePoint Server Security and AD-RMS in MOSS