Date post: | 27-Dec-2015 |
Upload: | megan-donna-york |
http://www.sekt-project.com
The BT Digital Library
A case study in intelligent content management
Paul Warren
[email protected]@bt.com
Semantics in content management
limitations of conventional technology
the users’ view
using the technology
enhancing the experience
the starting point

Semantics in content management
Intelligent content management

The need for semantics
Content management systems need to:
• index by meaning, not just text
• combine information from heterogeneous sources
Users need information:
• identified by semantics, not just keywords
  • precise and complete
• selected by their interests and their task context
  • defined semantically
• from heterogeneous sources, accessed uniformly

Higher precision, greater recall
Precision
• Find me information about Washington the man, not the state or city
• Find me information about a company called X which operates in industry Y
Recall
• Finding all relevant documents
• E.g. ask for information about ‘George W Bush’ and be given documents on ‘the President’
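The recall example above can be illustrated with a minimal sketch of alias-based query expansion. This is not the SEKT implementation: the alias table, the function names and the sample documents are all hypothetical, and a real system would draw its aliases from a knowledge base rather than a hard-coded dictionary.

```python
from __future__ import annotations

# Hypothetical alias table, for illustration only (not taken from the BT system).
ALIASES = {
    "george w bush": {"george w bush", "the president"},
}


def expand_query(query: str) -> set[str]:
    """Return the lower-cased query together with any known aliases."""
    key = query.lower().strip()
    return ALIASES.get(key, {key})


def search(query: str, documents: dict[str, str]) -> list[str]:
    """Return the ids of documents that mention the query or one of its aliases."""
    terms = expand_query(query)
    return sorted(
        doc_id
        for doc_id, text in documents.items()
        if any(term in text.lower() for term in terms)
    )


# Made-up sample documents.
docs = {
    "d1": "George W Bush signed the bill.",
    "d2": "The President addressed Congress.",
    "d3": "Washington state apple harvest.",
}

print(search("George W Bush", docs))  # ['d1', 'd2']
```

A plain keyword match would return only d1; the alias lookup is what brings in d2.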

Interests and context
• Need information about Jaguar?
  • interested in cars, the natural world, South America …
  • with a context defined by current activities
Not just about searching
• interest & context to share information …
• … and to push information to user
• … plus many integrated applications

Too much relevant information
Documents with duplicate information.
Goal:
• extract what is unique from each document
• help users prioritise their reading
Need to:
• aggregate from disparate sources
• remove duplication
• present meaningfully
  • classified
  • summarised
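One simple way to approach the "remove duplication" step is near-duplicate detection by Jaccard similarity over word shingles. The slides do not say which technique the project uses, so the sketch below is only an illustration; the function names, the shingle size and the 0.5 threshold are assumptions.

```python
from __future__ import annotations


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles of a text (one shingle if the text is short)."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def drop_near_duplicates(docs: list[str], threshold: float = 0.5) -> list[str]:
    """Keep a document only if it is below the threshold against every kept document."""
    kept: list[str] = []
    kept_shingles: list[set] = []
    for doc in docs:
        current = shingles(doc)
        if all(jaccard(current, seen) < threshold for seen in kept_shingles):
            kept.append(doc)
            kept_shingles.append(current)
    return kept


# Made-up sample abstracts; the second is a light rewording of the first.
sample = [
    "knowledge management in large firms is hard",
    "knowledge management in large firms is very hard",
    "semantic web tools for digital libraries",
]
print(drop_near_duplicates(sample))
```

The pairwise comparison is quadratic in the number of documents, so a collection of millions of abstracts would need an indexing scheme on top of this idea.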

The starting point
The BT digital library before SEKT

The BT digital library
• Two major document databases
• 5 million articles: abstracts plus some full text
• Originally text-based with some attribute-based querying: e.g. author, date
• information spaces defined by queries

An information space
• Query-defined alerts
  • Emailed weekly
  • as database updated
• Public info spaces
  • anyone can subscribe
  • forming communities
• Private info spaces
  • defined by user
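As a rough illustration of the idea, an information space can be modelled as a saved query with a set of subscribers, evaluated against each batch of newly added abstracts. The class, its fields and the substring matching below are hypothetical simplifications, not the library's actual data model.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class InfoSpace:
    """Hypothetical model of an information space: a saved query plus subscribers."""
    name: str
    query: str                      # plain text query that defines the space
    public: bool = False            # public spaces are open to any subscriber
    subscribers: set[str] = field(default_factory=set)

    def matches(self, abstract: str) -> bool:
        """Case-insensitive substring match, standing in for a real query engine."""
        return self.query.lower() in abstract.lower()


def weekly_alerts(spaces: list[InfoSpace], new_abstracts: list[str]) -> dict[str, list[str]]:
    """Map each subscriber to the new abstracts that match their spaces."""
    alerts: dict[str, list[str]] = {}
    for space in spaces:
        hits = [a for a in new_abstracts if space.matches(a)]
        if not hits:
            continue
        for user in sorted(space.subscribers):
            alerts.setdefault(user, []).extend(hits)
    return alerts


# Made-up example data.
km = InfoSpace("KM", "knowledge management", public=True, subscribers={"alice", "bob"})
batch = ["Knowledge management in telecoms", "Fatigue in aluminium alloys"]
print(weekly_alerts([km], batch))
```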

Personalisation
Personalised entry page shows user’s info spaces, journals of interest, recent reading and ‘jottings’ (bookmarks)

Limitations of conventional technology
Why we need semantics

Queries
• Text string ‘knowledge management’
  • 4161 ABI + 5029 Inspec records
• Descriptor ‘knowledge management’
  • 3213 ABI + 2783 Inspec records
• So careful query formulation needed …
  … but average query length is 1.8 words
• Little use of ‘advanced’ functions …
  … 80% of queries use no query modifier

Poor relevancy of results
• A simple keyword search tends to offer high recall and low precision.
• Ambiguity in the query, e.g. synonymy, where several terms describe the same concept, and homonymy, where one word has several different meanings.
[Figure: Venn diagram of relevant and retrieved documents. |A| = relevant documents retrieved; |B| = non-relevant documents retrieved; |C| = non-relevant documents not retrieved; |D| = relevant documents not retrieved.]
Recall = |A|/(|A|+|D|) (proportion of relevant documents retrieved)
Precision = |A|/(|A|+|B|) (proportion of retrieved documents that are relevant)
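The two formulas above can be checked with a short worked example. The function name and the document ids are made up for illustration; the comments map each quantity to the regions |A|, |B| and |D| of the diagram.

```python
from __future__ import annotations


def precision_recall(relevant: set[str], retrieved: set[str]) -> tuple[float, float]:
    """Compute (precision, recall) from the sets of relevant and retrieved documents."""
    a = len(relevant & retrieved)   # |A|: relevant documents retrieved
    b = len(retrieved - relevant)   # |B|: non-relevant documents retrieved
    d = len(relevant - retrieved)   # |D|: relevant documents not retrieved
    precision = a / (a + b) if (a + b) else 0.0
    recall = a / (a + d) if (a + d) else 0.0
    return precision, recall


# Made-up example: 4 relevant documents, 6 retrieved, 2 in common.
relevant = {"d1", "d2", "d3", "d4"}
retrieved = {"d1", "d2", "d5", "d6", "d7", "d8"}

precision, recall = precision_recall(relevant, retrieved)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")  # precision = 0.33, recall = 0.50
```

Here |A| = 2, |B| = 4 and |D| = 2, so precision is 2/6 and recall is 2/4. Region |C| does not appear in either formula.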

Presenting results
Searches
• Only 17% results read after 1st page
  … no more than 10 results checked
• Same query, same results
  • regardless of user’s preference & context
Document descriptors
• Lots, many irrelevant to readership
• Where relevant, not fine-grained
  • e.g. knowledge management

Enhancing the experience
What semantics can offer a digital library

A new experience
• Hybrid searching
  • concepts, instances, information spaces, and text
  • search results meaningfully classified
• Automatic annotation
  • identifying companies, people, …
  • hyperlinked to a knowledgebase
• Topics: finer grained than document descriptors
  • semi-automatically generated
  • automatic document classification
• An extended corpus
  • crawling the Web for related pages
  • Web pages added to share knowledge

A better experience
• Semantics to improve precision & recall
  • Washington the man, not city or state
  • references to the President, not just George W Bush
• Information spaces
  • defined on semantic queries
  • not just text queries
• Taking account of interests and context
  • semantically defined
• Natural language results

Initial questionnaire & focus group
Users want:
• Improved searching and indexing
  • based on a user’s profile
  • integrated into working environment
• To stay in control
  • advise but not decide
  • frustrated by too many email alerts

Features – what the users think
very important / important
• summarising results of search
• with personal interests and preferences
• advanced attribute-based search
• looking beyond the library
• suggesting candidate topic areas
• highlighting & hyperlinking named entities
• natural language queries

After that …
Important / minor importance
• retrieving similar articles
• re-using old queries
• agent searches
• access from a range of devices

Using the technology
Applying semantics to the BT Digital Library

Search: knowledge management
knowledge management as:
• info space
• topic
• term
With clustered results

A complex query
microsoft
• 2 companies
• term
semantic web
• info space
• topic
• term
sem web info space
• Microsoft-authored
• Microsoft as term

Querying a concept
alloy
• a term
but also
• concept in ontology
  … with properties
  … definition
  … sub-concepts

Document with markup
Identified:
• Bhargava
• Waterbury
• Connecticut
• USA
• IEE
Click for related documents, e.g. by Bhargava