
http://www.sekt-project.com

The BT Digital Library
A case study in intelligent content management

Paul Warren
[email protected]

• Semantics in content management
• The starting point
• Limitations of conventional technology
• Enhancing the experience
• The users’ view
• Using the technology

Semantics in content management

Intelligent content management

The need for semantics

Content management systems need to:
• index by meaning, not just text
• combine information from heterogeneous sources

Users need information:
• identified by semantics, not just keywords
  • precise and complete
• selected by their interests and their task context
  • defined semantically
• from heterogeneous sources, accessed uniformly

Higher precision, greater recall

Precision
• Find me information about Washington the man, not the state or city
• Find me information about a company called X which operates in industry Y

Recall
• Finding all relevant documents
• E.g. ask for information about ‘George W Bush’ and be given documents on ‘the President’
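A minimal sketch of how both examples could work, assuming documents have already been annotated with entity identifiers. The knowledge base, the document ids, and the function names are hypothetical and are not taken from the SEKT system. Filtering by entity type narrows ‘Washington’ to the man (precision); matching on the entity rather than the text finds ‘the President’ as well as ‘George W Bush’ (recall).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Entity:
    """One real-world thing, with a type and the names it is known by."""
    entity_id: str
    entity_type: str  # e.g. "person", "city", "state"
    aliases: set[str] = field(default_factory=set)


# Hypothetical knowledge base; entries are illustrative only.
KNOWLEDGE_BASE = [
    Entity("george_washington", "person", {"washington", "george washington"}),
    Entity("washington_dc", "city", {"washington", "washington dc"}),
    Entity("washington_state", "state", {"washington", "washington state"}),
    Entity("george_w_bush", "person", {"george w bush", "the president"}),
]

# Hypothetical documents, each mapped to the entities it mentions.
DOCUMENTS = {
    "doc1": {"george_washington"},
    "doc2": {"washington_dc"},
    "doc3": {"george_w_bush"},  # the text says "the President"
    "doc4": {"george_w_bush"},  # the text says "George W Bush"
}


def resolve(name: str, entity_type: Optional[str] = None) -> set[str]:
    """Map a name to entity ids, optionally restricted to one entity type."""
    key = name.lower()
    return {
        e.entity_id
        for e in KNOWLEDGE_BASE
        if key in e.aliases and (entity_type is None or e.entity_type == entity_type)
    }


def search(name: str, entity_type: Optional[str] = None) -> list[str]:
    """Return ids of documents that mention any entity the name resolves to."""
    wanted = resolve(name, entity_type)
    return sorted(doc for doc, mentioned in DOCUMENTS.items() if mentioned & wanted)
```

Calling `search("Washington", "person")` returns only the document about the man, while `search("George W Bush")` returns both documents about the same person, whichever name their text uses.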

Interests and context

• Need information about Jaguar?
  • interested in cars, the natural world, South America …
  • with a context defined by current activities

Not just about searching
• interest & context to share information …
• … and to push information to user
• … plus many integrated applications

Too much relevant information

Documents with duplicate information.

Goal to:
• extract what is unique from each document
• help users prioritise their reading

Need to:
• aggregate from disparate sources
• remove duplication
• present meaningfully
  • classified
  • summarised
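One simple way to approach the ‘remove duplication’ step is sketched below. It assumes that word overlap (Jaccard similarity) is an acceptable stand-in for ‘duplicate information’; the measure, the 0.8 threshold, and the function names are assumptions for illustration, not the method used in the project.

```python
def word_set(text: str) -> set[str]:
    """Lower-cased set of the words in a text."""
    return set(text.lower().split())


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def remove_near_duplicates(docs: list[str], threshold: float = 0.8) -> list[str]:
    """Keep a document only if it is not too similar to one already kept."""
    kept: list[str] = []
    kept_sets: list[set[str]] = []
    for doc in docs:
        words = word_set(doc)
        if all(jaccard(words, seen) < threshold for seen in kept_sets):
            kept.append(doc)
            kept_sets.append(words)
    return kept


# Hypothetical aggregated results from several sources.
docs = [
    "BT launches new digital library service",
    "BT launches new digital library service today",
    "Semantic web standards progress",
]
unique_docs = remove_near_duplicates(docs)
```

Here the second document shares six of seven distinct words with the first, so it is dropped and the other two are kept.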

The starting point

The BT digital library before SEKT

The BT digital library

• Two major document databases
  • 5 million articles – abstracts plus some full text
• Originally text-based with some attribute-based querying: e.g. author, date
• information spaces defined by queries

An information space

• Query-defined alerts
  • Emailed weekly
  • as database updated
• Public info spaces
  • anyone can subscribe
  • forming communities
• Private info spaces
  • defined by user
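The idea of an information space as a saved query that drives alerts can be sketched as follows. This is an illustration under stated assumptions: the class and field names, the substring match on titles, and the sample records and dates are all hypothetical, not the library's actual data model.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Record:
    title: str
    added: date  # date the record entered the database


@dataclass
class InfoSpace:
    name: str
    query: str   # the text query that defines the space
    owner: str
    public: bool = False
    subscribers: set[str] = field(default_factory=set)

    def matches(self, record: Record) -> bool:
        """A record belongs to the space if its title contains the query."""
        return self.query.lower() in record.title.lower()

    def subscribe(self, user: str) -> None:
        """Anyone may join a public space; a private one is for its owner."""
        if not self.public and user != self.owner:
            raise PermissionError(f"{self.name} is a private info space")
        self.subscribers.add(user)

    def alert(self, records: list[Record], since: date) -> list[str]:
        """Titles of matching records added on or after the given date."""
        return [r.title for r in records if r.added >= since and self.matches(r)]


# Hypothetical records and dates.
records = [
    Record("Knowledge management in practice", date(2004, 5, 3)),
    Record("Optical networks", date(2004, 5, 4)),
    Record("Knowledge management tools", date(2004, 4, 1)),
]
space = InfoSpace("KM", "knowledge management", owner="alice", public=True)
space.subscribe("bob")
weekly = space.alert(records, since=date(2004, 5, 1))
```

The weekly alert contains only the matching record added since the cut-off date; the older matching record was covered by an earlier alert.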

Personalisation

Personalised entry page shows user’s info spaces, journals of interest, recent reading and ‘jottings’ (bookmarks)

Limitations of conventional technology

Why we need semantics

Queries

• Text string ‘knowledge management’
  • 4161 ABI + 5029 Inspec records
• Descriptor ‘knowledge management’
  • 3213 ABI + 2783 Inspec records
• So careful query formulation needed …
  … but average query length is 1.8 words
• Little use of ‘advanced’ functions …
  … 80% of queries use no query modifier

Poor relevancy of results

• A simple keyword search tends to offer high recall and low precision.
• Ambiguity in the query, e.g. synonymy, where several terms could describe the same concept, and homonymy, where a word has many different meanings.

[Diagram: the document collection divided into four regions by the overlap of relevant documents and retrieved documents]
|A| relevant documents retrieved
|B| non-relevant documents retrieved
|C| non-relevant documents (not retrieved)
|D| relevant documents (not retrieved)

Recall = |A|/(|A|+|D|) (proportion of relevant documents retrieved)
Precision = |A|/(|A|+|B|) (proportion of retrieved documents that are relevant)
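The two measures can be computed directly from the set of retrieved documents and the set of relevant documents. The sketch below uses the slide's region names |A|, |B| and |D|; the function name and the example document ids are made up for illustration.

```python
def precision_recall(retrieved: set, relevant: set) -> tuple[float, float]:
    """Return (precision, recall) for one query."""
    a = len(retrieved & relevant)  # |A|: relevant documents retrieved
    b = len(retrieved - relevant)  # |B|: non-relevant documents retrieved
    d = len(relevant - retrieved)  # |D|: relevant documents not retrieved
    precision = a / (a + b) if retrieved else 0.0
    recall = a / (a + d) if relevant else 0.0
    return precision, recall


# Hypothetical query: 8 documents retrieved, 4 documents actually relevant.
retrieved = {1, 2, 3, 4, 5, 6, 7, 8}
relevant = {1, 2, 9, 10}
precision, recall = precision_recall(retrieved, relevant)
```

With these sets |A| = 2, |B| = 6 and |D| = 2, so precision is 2/8 = 0.25 and recall is 2/4 = 0.5.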

Presenting results

Searches
• Only 17% results read after 1st page
  … no more than 10 results checked
• Same query, same results
  • regardless of user’s preference & context

Document descriptors
• Lots – many irrelevant to readership
• Where relevant, not fine-grained
  • e.g. knowledge management

Enhancing the experience

What semantics can offer a digital library

A new experience

• Hybrid searching
  • concepts, instances, information spaces, and text
  • search results meaningfully classified
• Automatic annotation
  • identifying companies, people, …
  • hyperlinked to a knowledgebase
• Topics – finer grained than document descriptors
  • semi-automatically generated
  • automatic document classification
• An extended corpus
  • crawling the Web for related pages
  • Web pages added to share knowledge

A better experience

• Semantics to improve precision & recall
  • Washington the man, not city or state
  • references to the President not just George W Bush
• Information spaces
  • defined on semantic queries
  • not just text queries
• Taking account of interests and context
  • semantically defined
• Natural language results

The users’ view

What users want

Initial questionnaire & focus group

Users want:

• Improved searching and indexing
  • based on a user’s profile
  • integrated into working environment
• To stay in control
  • advise but not decide
  • frustrated by too many email alerts


Features – what the users think

very important / important

• summarising results of search

• with personal interests and preferences

• advanced attribute-based search

• looking beyond the library

• suggesting candidate topic areas

• highlighting & hyperlinking named entities

• natural language queries


After that …

Important / minor importance

• retrieving similar articles

• re-using old queries

• agent searches

• access from a range of devices

Using the technology

Applying semantics to the BT Digital Library

Search: knowledge management

knowledge management as:
• info space
• topic
• term

With clustered results

A complex query

microsoft
• 2 companies
• term

semantic web
• info space
• topic
• term

sem web info space
• Microsoft-authored
• Microsoft as term
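The way one query string can be read as a company, an info space, a topic, or a plain term can be sketched as a lookup across several indexes. The index contents and identifiers below are placeholders: the slide says only that ‘microsoft’ matched 2 companies and that ‘semantic web’ matched an info space and a topic.

```python
# Hypothetical indexes with placeholder identifiers.
INDEXES = {
    "company": {"microsoft": ["company:1", "company:2"]},
    "info space": {"semantic web": ["infospace:semantic-web"]},
    "topic": {"semantic web": ["topic:semantic-web"]},
}


def interpret(query: str) -> dict[str, list[str]]:
    """Return every way the query can be read, keyed by kind of match."""
    key = query.strip().lower()
    readings: dict[str, list[str]] = {}
    for kind, index in INDEXES.items():
        if key in index:
            readings[kind] = index[key]
    # Any query can always fall back to a plain text term.
    readings["term"] = [key]
    return readings
```

For example, `interpret("microsoft")` yields two company readings plus the text term, and the user can then choose which reading to combine with a reading of ‘semantic web’.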

Querying a concept

alloy
• a term

but also -
• concept in ontology
  … with properties
  … definition
  … sub-concepts

Document with markup

Identified:
• Bhargava
• Waterbury
• Connecticut
• USA
• IEE

Click for related documents, e.g. by Bhargava
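A minimal sketch of this kind of markup, assuming simple dictionary lookup of known names. The entity names come from the slide; the entity types, knowledge-base ids, sample corpus, and function names are assumptions made for illustration, and the project's own annotation method is not described here.

```python
import re

# Names are from the slide; types and ids are assumed for this sketch.
KNOWLEDGE_BASE = {
    "Bhargava": ("person", "kb:bhargava"),
    "Waterbury": ("place", "kb:waterbury"),
    "Connecticut": ("place", "kb:connecticut"),
    "USA": ("place", "kb:usa"),
    "IEE": ("organisation", "kb:iee"),
}


def annotate(text: str) -> list[tuple[int, int, str, str, str]]:
    """Find known names in text; return (start, end, name, type, kb_id)."""
    found = []
    for name, (entity_type, kb_id) in KNOWLEDGE_BASE.items():
        for match in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            found.append((match.start(), match.end(), name, entity_type, kb_id))
    return sorted(found)  # ordered by position in the text


def related_documents(kb_id: str, corpus: dict[str, str]) -> list[str]:
    """Ids of documents whose annotations include the given entity."""
    return sorted(
        doc_id
        for doc_id, text in corpus.items()
        if any(a[4] == kb_id for a in annotate(text))
    )


# Hypothetical corpus.
CORPUS = {
    "d1": "Bhargava, Waterbury, Connecticut, USA",
    "d2": "A review published by the IEE",
    "d3": "Further results from Bhargava",
}
by_bhargava = related_documents("kb:bhargava", CORPUS)
```

Each annotation carries a knowledge-base id, so a click on a marked-up name can be answered by `related_documents`, which here finds the two documents that mention Bhargava.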

Categorising results …

... and more categories

In summary

Semantic technology

- provides intelligence in content management

- enhances the user experience

- satisfies proven user needs

