Cape Town, June 2006

PART 1: What is a thesaurus? Concept and samples

Christine Laaboudi-Spoiden
Publications Office of the European Communities
EUR-LEX Unit – Documentary section

EUR-Lex – Searching information

EUR-Lex (http://eur-lex.europa.eu/en/index.htm): direct free access to European Union law

• The treaties, legislation, case-law and legislative proposals
  – Official Journal of the European Union
  – Official Journal L – Legislation
  – Official Journal C – Information and notices
  – Official Journal – Special editions
  – European Court Reports
  – Documents of the institutions
  – Consolidated texts

EUR-Lex: Searching information

COMPUTER CRIME
• Title and text: computer crime (40 hits)

COMPUTER RELATED CRIME
• Title and text: computer crime (58 hits)

CYBERCRIME
• Title and text: cybercrime (55 hits)

CYBER CRIME
• Title and text: cyber crime (48 hits)

COMPUTER CRIME, CYBERCRIME (Boolean OR)
• Title and text: computer crime, cybercrime (129 hits)

USE OF SYNONYMS OR EQUIVALENT TERMS
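The effect of combining equivalent terms with a Boolean OR can be sketched in Python. This is only an illustration: the five-document corpus and the `search` and `search_any` helpers are hypothetical, and they do not reproduce EUR-Lex data or its query engine.

```python
# Hypothetical mini-corpus: document id -> title text (not real EUR-Lex data).
DOCS = {
    1: "Proposal on combating computer crime",
    2: "Report on cybercrime and network security",
    3: "Communication on cyber crime prevention",
    4: "Decision on computer crime and cybercrime",
    5: "Regulation on wine labelling",
}


def search(phrase):
    """Return the ids of documents whose text contains the phrase."""
    needle = phrase.lower()
    return {doc_id for doc_id, text in DOCS.items() if needle in text.lower()}


def search_any(phrases):
    """Boolean OR: the union of the hits for each phrase."""
    hits = set()
    for phrase in phrases:
        hits |= search(phrase)
    return hits


computer_crime = search("computer crime")
cybercrime = search("cybercrime")
either = search_any(["computer crime", "cybercrime"])

print(sorted(computer_crime))  # [1, 4]
print(sorted(cybercrime))      # [2, 4]
print(sorted(either))          # [1, 2, 4]
```

The OR query retrieves more than either term alone, yet document 3 ("cyber crime") is still missed because that spelling was not listed. The searcher has to guess every variant, which is the problem a controlled vocabulary addresses.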

EUR-Lex sample – Bibliographic notice

INDEXING TERMS or DESCRIPTORS
EUROVOC DESCRIPTORS
INDEXING TERMS
PREFERRED TERMS

CLASSIFICATION SCHEME

SUBJECT HEADINGS

Indexing process

Indexing = identifying the concepts represented in a document

EUROVOC descriptors: information society, computer crime, personal data, electronic mail, confidentiality

For information retrieval (information request)
Title and text: computer crime, cybercrime (129 hits)

Content indexing = only one process!
Searching = start again if the results are not relevant to the question.

Search Results

Relevance = relationship between a document and a request.
  – The document is relevant to the topic
  – It replies to the user’s request

Pertinence = relationship between a document and an information need.
• Relevant and useful for a user
• Relevant but the user doesn’t find it useful (language, level of comprehensibility, type)

Irrelevant results retrieved = NOISE
Relevant results not retrieved = SILENCE
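Noise and silence can be made concrete with set arithmetic. The sketch below is illustrative only: the document ids are invented, and the `precision` and `recall` figures are the usual information-retrieval measures, which the slides do not name.

```python
def noise_and_silence(retrieved, relevant):
    """Return (noise, silence) as sets of document ids."""
    noise = retrieved - relevant    # retrieved but not relevant
    silence = relevant - retrieved  # relevant but not retrieved
    return noise, silence


def precision_recall(retrieved, relevant):
    """Precision falls as noise grows; recall falls as silence grows."""
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Invented example: four documents retrieved, three actually relevant.
retrieved = {1, 2, 3, 4}
relevant = {2, 3, 5}

noise, silence = noise_and_silence(retrieved, relevant)
precision, recall = precision_recall(retrieved, relevant)

print(sorted(noise))    # [1, 4]
print(sorted(silence))  # [5]
print(precision)        # 0.5
```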

Causes of searching failures

Two words don’t mean exactly the same thing

Enormous range of choices of words and expressions

No true synonyms, although words are often close in meaning

Words are not clearly understood

Inconsistent use of words

Users are unlikely to choose all the relevant terms

The user might choose the terms used by the indexer with a different understanding of their meaning.

Need for a controlled vocabulary

A controlled vocabulary = a consistent set of words/expressions, along with rules of usage, to be followed when indexing / searching

Nature of an indexing language
• A list of terms acceptable to users
• Mechanisms for structuring and using those terms
• Minimizes the ambiguity of isolated vocabulary that may be out of context

Out of context information

What does SENSITIVE AREA mean?
• urban
• military
• environmental
• sensitive epidermis …

A sensitive area protected by special measures to preserve a highly vulnerable habitat (Eurovoc thesaurus)

Types of Vocabulary – Authority List

Simple list or index enumerating the terms available for indexing a collection of documents

Author names, organization names, countries, e.g.
• Library of Congress Authorities
• ISO Country Codes

Vocabulary control – Classification Scheme

[Figure: extract of the EUR-Lex directory codes, annotated with heading / caption, notation, class, sub-classes, upper level and lower level]

Vocabulary control – Classification Scheme

Systematic arrangement of entities/concepts into classes (groups or categories)
• Class = group of concepts whose members share a common feature
• Vertical arrangement – level of specificity
• Words may appear in several classes

Vocabulary control – Classification Scheme

Classes are identified by
• a heading/caption
• a notation (alphabetical and/or numerical code)
  – Key for arranging items in physical libraries
  – Expressiveness (reflects the structure of the scheme)

11.60.30.20 External relations / Commercial policy / Trade arrangements / Common import arrangements

EUR-Lex Directory Codes

The numerical classification of the “Directory of Community legislation in force” is used to index legislation and preparatory acts.
http://eur-lex.europa.eu/RECH_repertoire.do

20 principal chapters, each covering a specific area of European Union activity.

Each descriptor is composed of eight digits
• (principal chapter heading and up to three subsequent subdivisions, each represented by two digits)
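The expressive structure of an eight-digit directory code can be sketched as follows. The helper names are hypothetical, and treating a `00` pair as "subdivision not used" is an assumption inferred from the code 03.60.55.00 quoted elsewhere in these slides, not a documented EUR-Lex rule.

```python
def parse_directory_code(code):
    """Split 'CC.SS.SS.SS' into its four two-digit parts."""
    parts = code.split(".")
    if len(parts) != 4 or any(len(p) != 2 or not p.isdigit() for p in parts):
        raise ValueError(f"not an eight-digit directory code: {code!r}")
    return parts


def levels(code):
    """List the code's levels from principal chapter downwards.

    Assumption: a '00' pair marks an unused subdivision, so the
    listing stops there.
    """
    parts = parse_directory_code(code)
    result = []
    for depth in range(1, 5):
        if depth > 1 and parts[depth - 1] == "00":
            break
        result.append(".".join(parts[:depth]))
    return result


print(levels("11.60.30.20"))  # ['11', '11.60', '11.60.30', '11.60.30.20']
print(levels("03.60.55.00"))  # ['03', '03.60', '03.60.55']
```

Because every prefix of the notation names a broader class, a search on `11.60` can be widened or narrowed simply by cutting or extending the code.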

EUR-Lex – Subject Headings

One to five descriptors based on the subject-matter list of terms

The alphabetically structured list of over 200 keywords is based on the subdivisions of the treaties and the areas of activity of the institutions.

The descriptors are less specific than those of the Directory code but provide a general overview of the content of the document.

Thesaurus - Definition

ISO 2788 (1984)
A structured list of expressions intended to represent in an unambiguous way the conceptual content of the document in a documentary system and of the queries addressed to the system.

= NOUN, NOUN PHRASE

= INDEXING PROCESS

= ONE SINGLE INTERPRETATION

Thesaurus - Definition

BSI 8723 (2006)
A controlled vocabulary in which concepts are represented by descriptors, formally organized so that paradigmatic relationships between the concepts are made explicit, and the descriptors are accompanied by lead-in entries for synonyms and quasi-synonyms.

The purpose of a thesaurus is
• to guide both the indexer and the searcher to select the same descriptor or combination of descriptors to represent a given subject.

= MUTUALLY EXCLUSIVE RELATIONSHIPS

= EQUIVALENCE

= INDEXING PROCESS

Eurovoc - Scope

Eurovoc
• A multilingual thesaurus (hierarchical list of terms)
• Multidisciplinary vocabulary
  – Community and national point of view
  – Parliamentary activities

Definition of concepts
Samples from Eurovoc

Eurovoc - Coverage

21 FIELDS = HEADINGS
04 POLITICS
08 INTERNATIONAL RELATIONS
10 EUROPEAN COMMUNITIES
12 LAW
16 ECONOMICS
20 TRADE
24 FINANCE
28 SOCIAL QUESTIONS
32 EDUCATION AND COMMUNICATIONS
36 SCIENCE
40 BUSINESS AND COMPETITION
44 EMPLOYMENT AND WORKING CONDITIONS
48 TRANSPORT
52 ENVIRONMENT
56 AGRICULTURE, FORESTRY AND FISHERIES
60 AGRI-FOODSTUFFS
64 PRODUCTION, TECHNOLOGY AND RESEARCH
66 ENERGY
68 INDUSTRY
72 GEOGRAPHY
76 INTERNATIONAL ORGANISATIONS

127 MICROTHESAURI = CLASSES
0806 international affairs
0811 cooperation policy
0816 international balance
0821 defence

Eurovoc - Equivalence

[Screenshot: a NON-DESCRIPTOR entry refers to its DESCRIPTOR through USE]

Eurovoc – Contextual information

DESCRIPTOR

MT - MICROTHESAURUS (MAIN CLASS)

UF (USED FOR) - NON-DESCRIPTOR
This descriptor is USED FOR a non-descriptor

BT - BROADER TERM / GENERIC TERM

NT - NARROWER TERM / SPECIFIC TERM

RT – RELATED TERM

Eurovoc – Relationships

TOP TERM = higher in the hierarchy

SCOPE NOTE (SN) = usage or definition note

Hierarchical relationship (MT, BT, NT), with narrower levels NT1 to NT3

Associative relationship (RT)

Equivalence relationship (USE, UF)

Vocabulary Control – Thesaurus

The scope of a descriptor is limited to a single meaning (unambiguous)
• Nouns or noun phrases
• Pre-coordination of concepts

The context is provided by:
• The hierarchical relationships (MT, BT, NT)
• The scope note (SN)
  – (states the chosen meaning or indicates other meanings excluded for indexing purposes)

A concept may be represented by two or more synonyms
• One term selected as a descriptor (indexing term)
• Equivalents = non-descriptors
  – (lead-in entries or references to the descriptor – USE, UF)
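The equivalence relationship can be sketched as a simple lookup from lead-in entries to descriptors. This is a minimal illustration, not the Eurovoc data model: the `USE` table and helper names are hypothetical. The "Community Customs Code" entry follows an example given in the maintenance part of these slides, while the "cybercrime" entry is assumed purely for illustration.

```python
# Lead-in entries: non-descriptor -> descriptor (the USE reference).
USE = {
    "community customs code": "customs regulations",
    "cybercrime": "computer crime",  # assumed for illustration
}


def preferred_term(term):
    """Map any entry term to the descriptor used for indexing and searching."""
    key = term.strip().lower()
    return USE.get(key, key)


def used_for(descriptor):
    """Inverse view (UF): the non-descriptors that lead to this descriptor."""
    return sorted(nd for nd, d in USE.items() if d == descriptor)


print(preferred_term("Community Customs Code"))  # customs regulations
print(preferred_term("computer crime"))          # computer crime
print(used_for("computer crime"))                # ['cybercrime']
```

Applying `preferred_term` on both the indexing side and the search side is what leads indexer and searcher to the same descriptor.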

Vocabulary control - Targets

Represents the general conceptual structure of a subject area and presents a guide to the user of an index

Closely reflects the vocabulary of the literature and the user’s own technical usage

Employs pre-coordinated phrases to reduce false drops to a minimum
• Venetian blind

Controls synonyms and near-synonyms in order to increase consistency

Only one term from a list of similar terms will be used in indexing

Horizontal and vertical relationships among terms (cross-references)
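The "Venetian blind" false drop can be illustrated with a small sketch. Both documents and both index tables are invented: one indexes single words, the other uses a pre-coordinated phrase as a single indexing term.

```python
# Two hypothetical documents, each indexed in two different ways.
WORD_INDEX = {
    "doc_a": {"venetian", "blind", "window"},    # about window blinds
    "doc_b": {"venetian", "blind", "musician"},  # about a blind musician from Venice
}
PHRASE_INDEX = {
    "doc_a": {"venetian blind", "window"},
    "doc_b": {"blindness", "musician", "venice"},
}


def search_all(index, terms):
    """Boolean AND: documents whose index terms include every query term."""
    wanted = set(terms)
    return sorted(doc for doc, assigned in index.items() if wanted <= assigned)


by_words = search_all(WORD_INDEX, ["venetian", "blind"])
by_phrase = search_all(PHRASE_INDEX, ["venetian blind"])

print(by_words)   # ['doc_a', 'doc_b']  (doc_b is a false drop)
print(by_phrase)  # ['doc_a']
```

Combining single words at search time matches any document that happens to carry both words, whereas the pre-coordinated phrase only matches the intended concept.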

Classification & Thesaurus - Difference

Classification
Single preferred location (physical libraries)
• Directory code: 03.60.55.00 Agriculture / Products subject to market organisation / Wine
• Post-coordination of concepts

Eurovoc
Admits hierarchical relationships

wine
  MT 6021 beverages and sugar
  BT1 alcoholic beverage
    BT2 beverage
  NT1 bottled wine
  NT1 champagne
  NT1 flavoured wine
  NT1 fortified wine
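The hierarchical relationships in the wine entry can be walked in code. The sketch below stores only broader-term (BT) links taken from that entry. The dictionary layout and function names are hypothetical, not how Eurovoc itself is stored.

```python
# Broader-term links taken from the wine entry: term -> its broader term.
BT = {
    "wine": "alcoholic beverage",
    "alcoholic beverage": "beverage",
    "bottled wine": "wine",
    "champagne": "wine",
    "flavoured wine": "wine",
    "fortified wine": "wine",
}


def broader_chain(term):
    """Follow BT links upwards (BT1, BT2, ...) until the top term."""
    chain = []
    while term in BT:
        term = BT[term]
        chain.append(term)
    return chain


def narrower_terms(term):
    """Direct narrower terms (NT1) of a term."""
    return sorted(t for t, parent in BT.items() if parent == term)


def expand(term):
    """The term plus all of its descendants, e.g. to widen a search."""
    result = [term]
    for child in narrower_terms(term):
        result.extend(expand(child))
    return result


print(broader_chain("champagne"))  # ['wine', 'alcoholic beverage', 'beverage']
print(narrower_terms("wine"))
print(expand("alcoholic beverage"))
```

Expanding a query on a broader term to include its narrower terms is one way a hierarchy reduces silence.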

Indexing systems - Types

Assigned-term system
• The indexer determines the scope of the document and assigns descriptors from a controlled vocabulary
• Descriptors identify the concepts expressed by the documents
• Subject heading list, thesaurus, classification, taxonomy
• Intellectual effort
• Greater time and effort
• Cost is important

Derived-term system
• All descriptors are taken from the text itself
• Natural language or free-text indexing
• Automatic indexing

PART 2: EUROVOC THESAURUS

Christine Laaboudi-Spoiden
Publications Office of the European Communities
EUR-LEX Unit – Documentary section

Eurovoc 4.2 - Languages

http://europa.eu/eurovoc/

Official EU languages
CS, DA, DE, EL, EN, ES, ET (*), FI, FR, HU, IT, LT, LV, NL, PL, PT, SI, SK, SV

Acceding countries
BG – Bulgarian, RO – Romanian

Candidate country
HR – Croatian

Local sites: other languages
Albanian, Ukrainian, Russian, Georgian, Serbian

Regional languages: Basque, Catalan

Eurovoc 4.2 in figures

                             Eurovoc 4.2   Eurovoc 4.1
DOMAINS                           21            21
MICROTHESAURI                    127           127
DESCRIPTORS                     6645          6501
GENERIC RELATIONSHIPS           6669          6510
ASSOCIATIVE RELATIONSHIPS       3636          3542

Eurovoc – fields most frequently used

[Bar chart: number of users per field, horizontal axis 0 to 20]
04 – POLITICS: 18
12 – LAW: 17
10 – EUROPEAN COMMUNITIES: 11
28 – SOCIAL QUESTIONS: 10
08 – INTERNATIONAL RELATIONS: 9
16 – ECONOMICS: 7
52 – ENVIRONMENT: 5
20 – TRADE: 4
24 – FINANCE: 4
32 – EDUCATION AND COMMUNICATIONS: 4
44 – EMPLOYMENT AND WORKING CONDITIONS: 4
72 – GEOGRAPHY: 3
48 – TRANSPORT: 2
56 – AGRICULTURE, FORESTRY AND FISHERIES: 2
66 – ENERGY: 2
36 – SCIENCE: 1
40 – BUSINESS AND COMPETITION: 1
68 – INDUSTRY: 1
76 – INTERNATIONAL ORGANISATIONS: 1

Eurovoc – Polyhierarchical relationship

Main rule: descriptors belong to one category (1 BT, 1 MT)

Exception: descriptors from domains 72 & 76
• Field 72: Geography
• Field 76: International Organizations
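The one-broader-term rule and its exception lend themselves to a simple consistency check. The sketch below is hypothetical: it assumes that the first two digits of a microthesaurus code give the field (as in 0806 under field 08, or 6021 under field 60), and the sample entries, apart from the wine entry, use invented broader terms.

```python
# Fields in which a descriptor may have more than one broader term.
POLYHIERARCHICAL_FIELDS = {"72", "76"}  # Geography, International Organisations


def rule_violations(entries):
    """Return descriptors that have several BTs outside fields 72 and 76.

    entries: iterable of (descriptor, microthesaurus_code, broader_terms).
    Assumption: the first two digits of the microthesaurus code are the field.
    """
    bad = []
    for descriptor, mt_code, broader_terms in entries:
        field = mt_code[:2]
        if field not in POLYHIERARCHICAL_FIELDS and len(broader_terms) > 1:
            bad.append(descriptor)
    return bad


entries = [
    ("wine", "6021", ["alcoholic beverage"]),
    # Invented pair of broader terms, allowed because field 72 is an exception.
    ("example region", "7211", ["example country", "example macro-region"]),
    # Invented entry that breaks the main rule.
    ("example drink", "6021", ["alcoholic beverage", "beverage"]),
]

print(rule_violations(entries))  # ['example drink']
```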

Eurovoc - Advantages

Multilingualism
• Indexing in the documentalist’s language
• Search in the user’s language

Update
• 18 months

Cooperation
• National parliaments
• Candidate descriptors

Standardisation
• ISO 2788 & 5964

Eurovoc - Limits

Generic vocabulary, not specific

Does not cover national specificities

Eurovoc - Display

Formats
• Printed – paper version
• Web site http://europa.eu/eurovoc/
• XML files (provided to licensees)
• PDF files to download

Types of display
• Alphabetical
• Thematic
  – Alphabetical listing by field/domain

Eurovoc – Thematic display

[Screenshot: navigating the thematic display by field/domain, microthesauri and languages]

Eurovoc – Thematic display

[Screenshot of a microthesaurus: top term / broader term, specific terms (NT1 – NT2), related terms, and the alphabetical index of descriptors/non-descriptors of the current field]

Eurovoc – Terminology of the field

[Screenshot: alphabetical index of descriptors and non-descriptors]

Eurovoc – Searching for concept

Eurovoc – Alphabetical display

Eurovoc – Alphabetical display

PT

FR

Eurovoc – Translations

A descriptor = an equivalent concept in every language
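One way to picture "an equivalent concept in every language" is a language-neutral concept identifier carrying one label per language. The sketch below is an assumption-laden illustration: the identifiers (`c_wine`, `c_beverage`), the document ids and the helper functions are all hypothetical, and the labels are ordinary dictionary translations rather than entries checked against Eurovoc.

```python
# Hypothetical concept ids, each with one label per language.
LABELS = {
    "c_wine": {"en": "wine", "fr": "vin", "pt": "vinho"},
    "c_beverage": {"en": "beverage", "fr": "boisson", "pt": "bebida"},
}

# Documents are indexed with concept ids, whatever the indexer's language.
INDEXED = {
    "doc_fr_1": {"c_wine"},
    "doc_en_2": {"c_beverage"},
}


def concept_for(label, lang):
    """Find the concept whose label in the given language matches."""
    for concept, labels in LABELS.items():
        if labels.get(lang, "").lower() == label.lower():
            return concept
    return None


def search(label, lang):
    """Search in the user's language, independent of the indexing language."""
    concept = concept_for(label, lang)
    if concept is None:
        return []
    return sorted(doc for doc, concepts in INDEXED.items() if concept in concepts)


def translate(label, source_lang, target_lang):
    """Return the equivalent label in another language, if the concept exists."""
    concept = concept_for(label, source_lang)
    return LABELS[concept][target_lang] if concept else None


print(search("vinho", "pt"))          # ['doc_fr_1']
print(translate("wine", "en", "fr"))  # vin
```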

Eurovoc - History

1982:
• comparative study of the existing documentary languages at the European Commission and the European Parliament

1984: first edition
• seven languages (DA, DE, EN, FR, EL, IT, NL)

1987: 2nd edition
• + ES, PT

1995: 3rd edition - 1999: 3.1 edition
• + SV, FI

2002: 4.0 edition - 2004: 4.1 edition
2005: 4.2 edition
• 17 languages

2006: 4.3 edition
• 21 languages

Eurovoc - Users

National parliaments

European institutions (European Parliament, Publications Office, Court of Justice)

Private users = Eurovoc licence holders

Eurovoc – Users

[Bar chart: total number of users by type, vertical axis 0 to 16]
National Parliament: 16
National Administration: 6
EU Institutions: 5
Consultants: 4
Universities: 3
Private User: 2
Research Institutes: 2

Eurovoc – Users

[Pie chart: users by profession. Categories: translators; informatics; terminologists and linguists; librarians and documentalists; researchers; other. Shares shown: 1%, 6%, 3%, 56%, 14%, 20%]

Eurovoc - Licenses (1)

[Bar chart: number of licences per year]
2003: 15
2004: 25
2005: 44

Eurovoc – Licenses (2)

[Bar chart: number of licences by type (Academic, Commercial, Translation, Indexing) for 2004, 2005 and 2006]

PART 3: EUROVOC MAINTENANCE

Christine Laaboudi-Spoiden
Publications Office of the European Communities
EUR-LEX Unit – Documentary section

Eurovoc - Maintenance

2 interinstitutional committees

Maintenance committee
• Commission, Council, Parliament, Court of Justice, Court of Auditors

Steering committee
• Commission, Council, Parliament, Court of Justice, Court of Auditors

Eurovoc Maintenance Team
• Publications Office

Eurovoc - Steering committee

Supervises the Eurovoc project
• Objectives, priorities, overall timetable
• Resources and budget

Officially adopts each new version

Chaired by a representative of the European Parliament

Eurovoc – The maintenance committee

Examines and votes on the proposals for updating the thesaurus

Decides on the amendments to be made

Chaired by the Publications Office

Meets twice a year

Eurovoc – The maintenance team

Location: Publications Office

Collects and examines the proposals made by all users

Coordinates the work of the Maintenance Committee

Responsible for IT developments, translation monitoring and the web site

Works through a maintenance interface

Eurovoc – Maintenance process

The European Parliament
  – Collects, examines and filters the proposals from the national parliaments

The Maintenance Team
  – Collects the proposals made by all users (EP, licensees, OPOCE)
  – Manages the proposals through the maintenance system

The Maintenance Committee
  – Votes on the various proposals
  – Decides on the final amendments

The Maintenance Team
  – Sends new descriptors and amendments to the EC for translation

The Maintenance Committee
  – Reviews the multilingual draft version

The Steering Committee
  – Officially adopts the new version

EUROVOC – The maintenance interface

https://webgate.cec.eu.int/eurovoc/maint

Users
• EU institutions: members of the maintenance committee, translators
• National parliaments

Features
• Propose candidate descriptors, amendments
• Translation module
• A dedicated layer for each user

EUROVOC – Maintenance

How to propose new concepts / amendments

Eurovoc maintenance form (web site)

Email to [email protected]

CANDIDATE DESCRIPTOR

EUROVOC – Maintenance

Criteria for acceptance / non-acceptance of candidate descriptors

Acceptance:

Creation necessary
• European Food Safety Authority (new European body)
• Greater Poland province in Regions of Poland in MT 7211 (new regions to incorporate)

New concept interesting and useful
• Access to healthcare
• Self-regulation

EUROVOC – Maintenance

Criteria for acceptance / non-acceptance of candidate descriptors

Non-acceptance:

Descriptor already existing under another form
• Second home → secondary home
• Community Customs Code exists as a non-descriptor of “Customs regulations”

Concept which can be obtained by combining two or three existing descriptors
• European Refugee Fund → EC fund + aid to refugees

EUROVOC – Maintenance

Criteria for acceptance / non-acceptance of candidate descriptors

Non-acceptance:

Term too specific (not used enough)
• Arctic agriculture

Term too national (not useful for the other users)
• Popular school (in SV)

Term too vague
• Right to peace
• Small states

PART 4: INDEXING AND SEARCHING WITH EUROVOC & the EP Library

Christine Laaboudi-Spoiden
Publications Office of the European Communities
EUR-LEX Unit – Documentary section

Isabelle Gautier – European Parliament - Library

INDEXING AND SEARCHING WITH EUROVOC

1. Content analysis and subject determination:

Example from Eur-Lex database (Directive 50/2006)

Example from Eur-Lex database (Regulation 802/2006)

INDEXING AND SEARCHING WITH EUROVOC

2. Term selection in Eurovoc
• Check the relationships (hierarchy and semantic environment of a descriptor)
• Definition of horizontal or vertical specificity
• Translation of concepts into indexing terms: cases of generic terms, compound terms, lack of precision, proper names.

3. Depth of indexing:
• Exhaustivity and selectivity

4. Making choices: indexing policy

EUROVOC at EP LIBRARY

1999: the data processing system of our catalogue changed, which required the library to put a new indexing policy in place.
• The new catalogue meant that indexing consistency had to be rebuilt;
• to obtain this consistency, training was organised for all indexers;
• a Working Group was created to coordinate indexing within the library.

EUROVOC at EP LIBRARY
The Indexing Coordination Group

Working Group formed by indexers and information specialists (of different nationalities and languages), in charge of:
• writing an internal guide for the department on the practical rules for indexing;
• creating updated lists (descriptors studied and descriptors created for the Library) and templates (to propose a creation or a modification) for colleagues;
• organising regular meetings on the indexing policy and its implementation;
• training new colleagues.

EUROVOC at EP LIBRARY
The Indexing Guide

Target: more consistent indexing in the catalogue and a good knowledge of the new data processing system.

Contents, in three parts:
• definition and basic rules for indexing;
• the indexing policy in the library;
• practical application in our catalogue.

Supplemented by advice sheets for indexing where necessary.

EUROVOC at EP LIBRARY
Indexing Meetings

Target: the group studies the proposals for new descriptors or modifications sent by colleagues.
• It answers specific questions asked by colleagues and, if necessary, writes advice sheets;
• questions are analysed by the group in its meetings and presented in meetings at department level;
• advisory and support role.

EUROVOC at EP LIBRARY
Examples of proposals received by the Group

Candidate descriptor created (library level): Community law-international law
• MT 1231 international law - BT international law
• SN influence of Community law on international law and vice versa

Candidate descriptor rejected: environmental damage principle
• Advice: index with environmental impact + risk prevention

Modification of a descriptor: polluter pays principle
• Proposal to change the English term (in place of polluter pays policy).

EUROVOC at THE EP LIBRARY
Training

Training organisation for new colleagues:
• internal, with a presentation of the thesaurus, the indexing guide, the indexing policy of the department and indexing in our catalogue, plus short practical exercises;
• internal but with an external trainer, to review indexing practice or, if necessary, to train a group of people to index;
• external, according to the needs expressed by indexers and if training is available in the different countries.

EUROVOC at EP LIBRARY

European Parliament’s role as member of the Maintenance Committee:
• represents both the EP and the national parliaments at the Maintenance Committee;
• receives, as their representative, the proposals of the national parliaments that use the thesaurus;
• filters the proposals (rejection criteria: concept too national, too specific or too vague);
• forwards the proposals of the department and of the national parliaments to the Committee;
• regularly organises seminars with national parliaments.

IN CONCLUSION: USEFUL LINKS

EUROVOC: http://eurovoc.europa.eu

Eur-Lex: http://eur-lex.europa.eu

European Parliament: http://www.europarl.europa.eu

