Page 1: Languages are bridges … not barriers Chiara Carlucci – CEDEFOP Library ReferNet Technical Meeting 24-25 September 2009.

Languages are bridges … not barriers

Chiara Carlucci – CEDEFOP Library

ReferNet Technical Meeting

24-25 September 2009

Page 2

What it is …

Why use it …

How to use it …

What else …

Languages are bridges … not barriers

Page 3

Is there any place left for thesauri in this new information retrieval environment?

What

Page 4

For sure there is a place for thesauri, but they must change in order to continue to be of value. A true thesaurus has equivalence relationships, but it also supports other kinds of relationship and provides navigation assistance by means of scope notes and other aids.

What

Page 5

A thesaurus suggests other ways of expressing an idea which is already in the user's mind, and reminds the user of related ideas that might be valuable in searching.

What

Page 6

It is useful to revisit some classic principles of indexing: because documents are changing rapidly, because doing the same things out of habit leads to repetitive, unconsidered behaviour, and because the thesaurus is meant to be used as a thesaurus!

What

Page 7

It must be remembered that, although a thesaurus appears to be made up of natural-language terms, it is an artificial language: a controlled vocabulary with a limited number of descriptors, the meaning of each being understood through:

– the context provided by the descriptors as a whole.

In a bibliographical context (such as VET-Bib), the information provided by the whole system of descriptors is also supported by:

– the title of the document
– the abstract of the document

What

Page 8

• It is not:

– a dictionary, which contains definitions and pronunciations. Unlike a dictionary, a thesaurus entry does not define words.

– a glossary, which contains explanations of concepts relevant to a certain field of study or action.

– a lexicon, because the lexicon of a language is its vocabulary, including its words and expressions.

– a vocabulary, which is the set of words a person is familiar with in a language. A vocabulary usually grows and evolves with age, and serves as a useful and fundamental tool for communication and acquiring knowledge.

What

Page 9

The thesaurus is a thesaurus

What

Page 10

The thesaurus is a thesaurus

With its own hierarchical relationships, which are used to indicate terms that are narrower and broader in scope. A "Broader Term" (BT) is a more general term, e.g. “Apparatus” is a generalization of “Computer”. Reciprocally, a "Narrower Term" (NT) is a more specific term, e.g. “Digital Computer” is a specialization of “Computer”. BT and NT are reciprocals: a broader term necessarily implies at least one other term which is narrower. BT and NT are used to indicate class relationships as well as part-whole relationships.
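The reciprocity of BT and NT can be illustrated with a minimal Python sketch. The `Thesaurus` class and its method names are hypothetical and not part of any ETT software; the point is only that recording one direction of the hierarchy should automatically record the other.

```python
from collections import defaultdict


class Thesaurus:
    """Hypothetical minimal store for hierarchical (BT/NT) relationships."""

    def __init__(self) -> None:
        self.broader: defaultdict[str, set[str]] = defaultdict(set)   # term -> its BTs
        self.narrower: defaultdict[str, set[str]] = defaultdict(set)  # term -> its NTs

    def add_bt(self, term: str, broader_term: str) -> None:
        # Recording "term BT broader_term" also records the reciprocal
        # "broader_term NT term", so the two directions never drift apart.
        self.broader[term].add(broader_term)
        self.narrower[broader_term].add(term)

    def all_broader(self, term: str) -> set[str]:
        # Walk up the hierarchy to collect every ancestor of a term.
        seen: set[str] = set()
        stack = list(self.broader.get(term, ()))
        while stack:
            bt = stack.pop()
            if bt not in seen:
                seen.add(bt)
                stack.extend(self.broader.get(bt, ()))
        return seen


# Usage, with the examples from the text:
t = Thesaurus()
t.add_bt("Digital Computer", "Computer")
t.add_bt("Computer", "Apparatus")
print(sorted(t.narrower["Computer"]))             # ['Digital Computer']
print(sorted(t.all_broader("Digital Computer")))  # ['Apparatus', 'Computer']
```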

What

Page 11

With its own equivalence relationships, which are used primarily to connect synonyms and near-synonyms. The Use (USE) and Used For (UF) indicators are used when an authorized term is to be used in place of another, unauthorized term: the entry for the authorized term carries a "UF" indicator and, reciprocally, the entry for the unauthorized term carries a "USE" indicator. Unauthorized terms are often called "entry vocabulary", "entry points", "lead-in terms", or "non-preferred terms"; they point to the authorized term (also referred to as the Preferred Term or Descriptor) that has been chosen to stand for the concept.
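A USE/UF pair can be sketched as a single lookup table from non-preferred terms to preferred terms, with the UF list derived as its reciprocal. The entries below are hypothetical, invented for illustration, and are not taken from ETT.

```python
# Hypothetical entry vocabulary: non-preferred term -> preferred term (USE).
USE: dict[str, str] = {
    "computing machine": "computer",
    "electronic computer": "computer",
}


def to_descriptor(term: str) -> str:
    """Follow a USE reference; a preferred term is returned unchanged."""
    return USE.get(term, term)


def used_for(descriptor: str) -> list[str]:
    """The UF list of a descriptor is simply the reverse of the USE table."""
    return sorted(t for t, d in USE.items() if d == descriptor)


print(to_descriptor("computing machine"))  # computer
print(used_for("computer"))  # ['computing machine', 'electronic computer']
```

Storing only the USE direction and deriving UF from it is one way to keep the two indicators from contradicting each other.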

The thesaurus is a thesaurus

What

Page 12

The thesaurus is a thesaurus

With its own associative relationships, which are used to connect two related terms whose relationship is neither hierarchical nor equivalent. This relationship is described by the indicator "Related Term" (RT). Associative relationships should be applied with caution, since excessive use of RT will reduce specificity in searches. Consider the following: if the typical user is searching with term "A", would they also want resources tagged with term "B"? If the answer is no, then an associative relationship should not be established.
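Because an associative relationship is by definition neither hierarchical nor equivalent, a thesaurus editor could refuse RT links between terms that already stand in a BT/NT chain. The sketch below is a hypothetical Python validator, not an ETT tool; the editorial question from the text (would a user searching "A" also want "B"?) still has to be answered by a person before `add_rt` is called.

```python
from collections import defaultdict


def is_ancestor(broader: dict[str, set[str]], term: str, candidate: str) -> bool:
    """True if `candidate` can be reached from `term` by following BT links."""
    stack, seen = list(broader.get(term, ())), set()
    while stack:
        bt = stack.pop()
        if bt == candidate:
            return True
        if bt not in seen:
            seen.add(bt)
            stack.extend(broader.get(bt, ()))
    return False


def add_rt(related: defaultdict, broader: dict[str, set[str]], a: str, b: str) -> None:
    """Record a symmetric RT link, refusing pairs that are already hierarchical."""
    if a == b:
        raise ValueError("a term cannot be related to itself")
    if is_ancestor(broader, a, b) or is_ancestor(broader, b, a):
        raise ValueError(f"{a!r} and {b!r} are hierarchically related; use BT/NT")
    related[a].add(b)  # RT is reciprocal: store both directions
    related[b].add(a)


broader = {"Digital Computer": {"Computer"}, "Computer": {"Apparatus"}}
related: defaultdict = defaultdict(set)
add_rt(related, broader, "Computer", "software")  # hypothetical RT pair
```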

What

Page 13

• To translate the concept you are looking for into keywords

• Multilingualism and standardisation are the main advantages of this powerful indexing tool covering the fields of VET

• The thesaurus is an operational tool used to retrieve documents according to their semantic content

• The thesaurus must be made available to users to help them identify their information needs

• The thesaurus provides a conceptual framework for understanding reality through graphic presentations that preserve specificity

• It presents the conceptual content of documents in an unambiguous way.

Why

Page 14

• A thesaurus is well suited to the digital environment, where it can show its versatility

• It is open to information interoperability, because the thesaurus is not only an operating environment but also an organizational criterion

• It can be integrated with other information retrieval tools

Why

Page 15

Searching in systems of unstructured information

→ the web

Why

Page 16

ETT is used to index and represent the content of a document. It is mostly used by documentalists and librarians to identify the concepts laid down in the text and to represent them by assigning keywords from the thesaurus. This makes it possible to extract the relevant records from a collection of bibliographic references or from a full-text documentary database to answer the user’s query. End-users can combine ETT descriptors to represent their search query. Indexing with ETT enables all documents on the same subject to be retrieved through a single query.
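Combining descriptors in a query can be sketched as an inverted index (descriptor → record IDs) queried with a Boolean AND. This is a generic retrieval technique, not a description of how any Cedefop system is implemented; the record IDs and their indexing are hypothetical.

```python
from collections import defaultdict


def build_index(records: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each descriptor to the set of records indexed with it."""
    index: defaultdict[str, set[str]] = defaultdict(set)
    for record_id, descriptors in records.items():
        for descriptor in descriptors:
            index[descriptor].add(record_id)
    return dict(index)


def search_and(index: dict[str, set[str]], query: list[str]) -> set[str]:
    """Return records indexed with ALL the query descriptors (Boolean AND)."""
    if not query:
        return set()
    hits = [index.get(d, set()) for d in query]
    return set.intersection(*hits)


# Hypothetical records, indexed with descriptors mentioned in this presentation.
records = {
    "r1": ["transparency of qualifications", "key competences"],
    "r2": ["transparency of qualifications"],
    "r3": ["key competences", "certification of learning outcomes"],
}
index = build_index(records)
print(search_and(index, ["transparency of qualifications", "key competences"]))  # {'r1'}
```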

Why

Page 17

ETT is useful for taxonomy and semantic web applications. The main role of a thesaurus is to standardise the indexing process in order to make searches simpler, more efficient and consistent regardless of the language of the query. It is a multilingual conceptual thesaurus which strives to satisfy both Community and national needs on a wide range of subjects.

Each descriptor is related to one concept in each of the languages.
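The idea that each descriptor is tied to one concept across all languages can be sketched as a table keyed by a language-independent concept ID. The ID `C001` and the French and Italian labels are illustrative assumptions, not official ETT data.

```python
# Hypothetical concept table: concept ID -> preferred label per language.
CONCEPTS: dict[str, dict[str, str]] = {
    "C001": {
        "en": "transparency of qualifications",
        "fr": "transparence des qualifications",
        "it": "trasparenza delle qualifiche",
    },
}

# Reverse lookup: (language, label) -> concept ID.
LABEL_TO_CONCEPT = {
    (lang, label): cid
    for cid, labels in CONCEPTS.items()
    for lang, label in labels.items()
}


def translate(label: str, source: str, target: str) -> str:
    """Translate a descriptor by going through its concept, not word by word."""
    concept_id = LABEL_TO_CONCEPT[(source, label)]
    return CONCEPTS[concept_id][target]


print(translate("transparency of qualifications", "en", "it"))
```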

Why

Page 18

Another interesting option offered by ETT is the possibility for users to ask questions in one language and retrieve the answers in different languages, and this is something Google doesn’t do, or not yet!

Why

Page 19

It is only a term

Why

In this case the descriptor ‘transparency of qualifications’ represents a precise concept and can retrieve many web pages, not necessarily documents, that contain the descriptor in exactly this form in their text.

Page 20

Why

In this case ‘transparency of qualifications’ is more than a descriptor: it is a concept. We can find documents relating to the subject even if:

1. the term does not appear in the text
2. the document is in a different language.
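The difference between matching a term and matching a concept can be sketched in a few lines. The documents, their texts and the concept ID `C001` are hypothetical; the sketch only shows why a concept-indexed search finds documents that a plain string search misses.

```python
# Hypothetical documents, each indexed with language-independent concept IDs.
DOCS = {
    "d1": {"text": "A study on transparency of qualifications", "concepts": {"C001"}},
    "d2": {"text": "Making diplomas readable across borders", "concepts": {"C001"}},
    "d3": {"text": "La transparence des qualifications en Europe", "concepts": {"C001"}},
}


def string_search(phrase: str) -> set[str]:
    """Term matching: the exact phrase must occur in the text."""
    return {doc_id for doc_id, doc in DOCS.items() if phrase.lower() in doc["text"].lower()}


def concept_search(concept_id: str) -> set[str]:
    """Concept matching: the document must be indexed with the concept."""
    return {doc_id for doc_id, doc in DOCS.items() if concept_id in doc["concepts"]}


print(sorted(string_search("transparency of qualifications")))  # ['d1']
print(sorted(concept_search("C001")))  # ['d1', 'd2', 'd3']
```

Here `d2` lacks the phrase entirely and `d3` is in French, matching the two cases listed above.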

Page 21

ETT is also used on the Cedefop website for the automatic categorisation or classification of documents on websites, and at the Library’s reference desk to categorise users’ questions. A simple click gives cross-lingual access to the translation of a descriptor or of the complete semantic chain of a descriptor. These advanced options open the door to many cross-lingual applications, such as calculating document similarity across languages.
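Because documents indexed with ETT carry language-independent descriptors, their similarity can in principle be computed without translating their text. The presentation does not say which measure Cedefop uses; the sketch below swaps in the Jaccard coefficient over sets of concept IDs, and all IDs are hypothetical.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity: shared concepts divided by all concepts used."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical concept IDs assigned to an English and an Italian document.
doc_en = {"C001", "C002", "C003"}
doc_it = {"C001", "C002"}
print(jaccard(doc_en, doc_it))  # 0.6666666666666666
```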

Why

Page 22

Indexing with the updated version of ETT

… knowing how something is stored makes finding it easier

How

Page 23

Alphabetical presentation with semantic relations

KWIC index

Hierarchical presentation

How

Page 24

The main, word-by-word alphabetical display is the most familiar, since it provides a variety of information for each descriptor. The term’s main entry in the alphabetical display shows the appropriate coordination.

This includes a scope note (SN), broader and narrower terms (BT and NT), USE and UF relations, and related terms (RT).

But be careful … this approach looks easy to understand but is not so easy for end-users: for example, the fact that BT and NT mean that two terms are related hierarchically is obvious only to specialists!
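The layout of a main entry in the alphabetical display can be sketched as follows. The field order (SN, UF, BT, NT, RT) and the indentation are assumptions for illustration and may differ from the printed ETT; the `Entry` class and both functions are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Hypothetical main entry of a descriptor in the alphabetical display."""
    term: str
    sn: str = ""                                 # scope note
    uf: list[str] = field(default_factory=list)  # non-preferred terms
    bt: list[str] = field(default_factory=list)  # broader terms
    nt: list[str] = field(default_factory=list)  # narrower terms
    rt: list[str] = field(default_factory=list)  # related terms


def format_entry(entry: Entry) -> str:
    """Render the descriptor followed by one indented line per relationship."""
    lines = [entry.term]
    if entry.sn:
        lines.append(f"  SN {entry.sn}")
    for tag, terms in (("UF", entry.uf), ("BT", entry.bt), ("NT", entry.nt), ("RT", entry.rt)):
        lines.extend(f"  {tag} {t}" for t in sorted(terms))
    return "\n".join(lines)


def format_use(non_preferred: str, descriptor: str) -> str:
    """A non-preferred term gets only a USE reference to its descriptor."""
    return f"{non_preferred}\n  USE {descriptor}"


print(format_entry(Entry("Computer", bt=["Apparatus"], nt=["Digital Computer"])))
```

The bare two-letter tags are exactly what the slide warns about: compact for specialists, cryptic for end-users.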

How

Page 25

Showing hierarchical structures to users is a useful mechanism for query expansion, also because …

- users with varying levels of domain knowledge make use of thesauri in different ways

- thesauri are capable of providing end-users with additional, useful terms for query formulation and expansion
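A hierarchical presentation can be generated from the NT links by indenting each level under its broader term. The dot-per-level convention below is an assumption borrowed from common printed thesauri, not necessarily ETT's layout; the terms are the examples from the hierarchical-relationship slide.

```python
def render_tree(narrower: dict[str, list[str]], term: str, depth: int = 0) -> list[str]:
    """Return the term and all its narrower terms, one dot per level of depth.

    Assumes the BT/NT hierarchy has no cycles.
    """
    lines = [". " * depth + term]
    for child in sorted(narrower.get(term, [])):
        lines.extend(render_tree(narrower, child, depth + 1))
    return lines


narrower = {"Apparatus": ["Computer"], "Computer": ["Digital Computer"]}
print("\n".join(render_tree(narrower, "Apparatus")))
```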

How

Page 26

How

A KWIC index is formed by sorting and aligning the words within an article title so that each word in the title (except stop words) can be searched alphabetically in the index. It was a useful indexing method for technical manuals before computerized full-text search became common. The term permuted index is another name for a KWIC index, referring to the fact that it indexes all cyclic permutations of the headings, that is, all rotations of the words of a title: the words keep their cyclic order, and each significant word in turn is moved to the front, where it serves as the sorting key.
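A KWIC (permuted) index can be built by generating, for each title, one rotation per significant word and sorting all rotations alphabetically. The stop-word list below is a small assumed sample, not a standard list.

```python
# Assumed sample of stop words; a real index would use a fuller list.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}


def kwic(titles: list[str]) -> list[tuple[str, str]]:
    """Return (rotation, original title) pairs sorted by the rotated text."""
    entries = []
    for title in titles:
        words = title.split()
        for i, word in enumerate(words):
            if word.lower() in STOP_WORDS:
                continue  # stop words never become index keys
            # Cyclic rotation: the keyword moves to the front, order is kept.
            rotation = " ".join(words[i:] + words[:i])
            entries.append((rotation, title))
    entries.sort(key=lambda entry: entry[0].lower())
    return entries


for rotation, title in kwic(["Transparency of qualifications"]):
    print(f"{rotation:40} <- {title}")
```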

Page 27

Indexing with the updated version of ETT

• 465 new descriptors have been added to the thesaurus as of the 2008 edition, so you cannot search earlier literature using these descriptors

Older literature on the topics represented by these terms can be searched using related descriptors.

How

Page 28

• 415 deleted descriptors are no longer used in indexing, but they may be used for searching database entries made before ETT’s 2008 edition

More recent literature on the topics represented by these terms can be searched using related descriptors.
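The advice to bridge editions with related descriptors can be sketched as a query that searches a descriptor together with its cross-edition neighbours. The mapping and the record IDs are hypothetical; in practice the related descriptors would come from the thesaurus itself. "key competences" is used only because the presentation cites it as a new descriptor; its older counterpart here is a placeholder.

```python
# Hypothetical links from a descriptor to related descriptors used in the other edition.
BRIDGES: dict[str, set[str]] = {
    "key competences": {"older related descriptor (placeholder)"},
}


def search_across_editions(index: dict[str, set[str]], descriptor: str) -> set[str]:
    """Union of records indexed with the descriptor or with its bridging descriptors."""
    hits: set[str] = set()
    for term in {descriptor} | BRIDGES.get(descriptor, set()):
        hits |= index.get(term, set())
    return hits


index = {
    "key competences": {"r_2009"},                         # indexed with the 2008 edition
    "older related descriptor (placeholder)": {"r_2005"},  # indexed before it
}
print(sorted(search_across_editions(index, "key competences")))  # ['r_2005', 'r_2009']
```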

How

Indexing with the updated version of ETT

Page 29

How can I add the new descriptors using VET det?

1) Enter the new descriptors (pp. 16-19 of the printed version of ETT) in the Notes field, preceded by the word NEWDESCRIPTOR and separated by commas, e.g. Notes field: NEWDESCRIPTOR certification of learning outcomes, key competences

– If the new descriptor is a main descriptor, put NEWMAINDESCRIPTOR at the beginning

2) Do not enter the deleted descriptors (pp. 20-22 of the printed version of ETT)
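The Notes-field convention can be sketched as a small helper that adds the keyword prefix and rejects deleted descriptors. The function is hypothetical (not part of VET det), the deleted-descriptor set is a placeholder for the list on pp. 20-22 of the printed ETT, and reading the NEWMAINDESCRIPTOR rule as "use this prefix instead of NEWDESCRIPTOR" is an assumption.

```python
# Placeholder: the real list of deleted descriptors is on pp. 20-22 of the printed ETT.
DELETED_DESCRIPTORS: set[str] = {"deleted descriptor (placeholder)"}


def notes_field(new_descriptors: list[str], main: bool = False) -> str:
    """Build the Notes-field text for new descriptors, refusing deleted ones."""
    rejected = [d for d in new_descriptors if d in DELETED_DESCRIPTORS]
    if rejected:
        raise ValueError(f"deleted descriptors must not be entered: {rejected}")
    prefix = "NEWMAINDESCRIPTOR" if main else "NEWDESCRIPTOR"  # assumed reading of the rule
    return f"{prefix} {', '.join(new_descriptors)}"


print(notes_field(["certification of learning outcomes", "key competences"]))
# NEWDESCRIPTOR certification of learning outcomes, key competences
```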

How

Page 30

Fundamental, basic, classic indexing rules are really important, because VET-Bib contains 70,000 records!!!

Index ONLY what is in the document, and index at the LEVEL of specificity of the document

1. Statements or assumptions are not indexed

How

Page 31

Fundamental indexing rules

2. Very general descriptors are not used unless the document covers a topic very broadly

3. Main descriptors cover the main focus or subject of a document

4. Other descriptors indicate less important aspects within the document

How

Page 32

Fundamental indexing rules

5. ETT indexing avoids ‘indexing up’ to a broader descriptor when an appropriate, more specific one exists
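Rule 5 can be supported by a simple check that flags every assigned descriptor having narrower terms, so the indexer can verify whether a more specific one fits the document. This is a hypothetical helper, not an ETT feature, and the decision itself stays with the indexer.

```python
def more_specific_options(assigned: list[str], narrower: dict[str, list[str]]) -> dict[str, list[str]]:
    """For each assigned descriptor that has NTs, list them for the indexer to review."""
    return {d: sorted(narrower[d]) for d in assigned if narrower.get(d)}


narrower = {"Apparatus": ["Computer"], "Computer": ["Digital Computer"]}
print(more_specific_options(["Computer"], narrower))  # {'Computer': ['Digital Computer']}
```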

How

Page 33

Fundamental indexing rules

How

Page 34

Fundamental indexing rules

• Indexing is complementary to information found in other parts of the document (mainly title and abstract)

How

Page 35

• The number of descriptors should be proportionate to the number of pages

How

Fundamental indexing rules

Page 36

How

Fundamental indexing rules

Page 37

• “Indexable” concepts are translated into descriptors; using the thesaurus helps maintain consistency and prevents a proliferation of concepts

How

Fundamental indexing rules

Page 38

• Thus a single descriptor may be imprecise, even ambiguous, while the more descriptors are used together, the greater the precision

Fundamental indexing rules

How

Page 39

• The word precision is used in a technical sense to mean the proportion of relevant documents in a retrieved set (relevant documents retrieved divided by all documents retrieved)

Fundamental indexing rules

How

Page 40

• The word recall is used to mean the proportion of all relevant documents that are actually retrieved (relevant documents retrieved divided by all relevant documents, retrieved or not)

How

Fundamental indexing rules
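Precision and recall can be computed directly from the set of retrieved documents and the set of relevant documents. A minimal sketch with hypothetical record IDs:

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Precision = relevant retrieved / all retrieved; recall = relevant retrieved / all relevant."""
    hits = len(retrieved & relevant)  # relevant documents that were actually retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: 4 records retrieved, 3 records relevant, 2 in common.
p, r = precision_recall({"r1", "r2", "r3", "r4"}, {"r1", "r2", "r5"})
print(p, r)  # 0.5 0.6666666666666666
```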

Page 41

… for the future

Permitting the searcher to switch between navigating the thesaurus and searching the database can only improve access. An obvious way in which a thesaurus can be applied directly in retrieval is to use its relationships as a means of expanding the search. Research, however, has shown that these relationships must be used with caution (precision/recall).

What else …

Page 42

… for the future

In general, expanding a search to include the narrower terms tends to improve recall without a great sacrifice in precision. Expanding to include broader or related terms, while it does improve recall, typically has a significant negative impact on precision.
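The contrast between expanding downward and expanding upward or sideways can be sketched as an expansion function with a switch for each relationship. The terms reuse the hierarchy examples from the earlier slides, the RT link is hypothetical, and the function itself is an illustration rather than an ETT feature; whether to enable BT or RT expansion is the precision/recall trade-off described above.

```python
def expand(
    term: str,
    narrower: dict[str, list[str]],
    broader: dict[str, list[str]],
    related: dict[str, list[str]],
    *,
    nt: bool = True,
    bt: bool = False,
    rt: bool = False,
) -> set[str]:
    """Expand a query term along the selected thesaurus relationships."""
    terms = {term}
    if nt:
        # Narrower terms at every level below: usually raises recall cheaply.
        stack = list(narrower.get(term, []))
        while stack:
            t = stack.pop()
            if t not in terms:
                terms.add(t)
                stack.extend(narrower.get(t, []))
    if bt:
        terms.update(broader.get(term, []))  # immediate BTs only: precision tends to drop
    if rt:
        terms.update(related.get(term, []))  # immediate RTs only: precision tends to drop
    return terms


narrower = {"Apparatus": ["Computer"], "Computer": ["Digital Computer"]}
broader = {"Computer": ["Apparatus"], "Digital Computer": ["Computer"]}
related = {"Computer": ["software"]}  # hypothetical RT
print(sorted(expand("Computer", narrower, broader, related)))
# ['Computer', 'Digital Computer']
print(sorted(expand("Computer", narrower, broader, related, bt=True, rt=True)))
# ['Apparatus', 'Computer', 'Digital Computer', 'software']
```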

What else …

Page 43

… for the future

• How is it possible to remain positive about the need for continued use of thesauri ?

Because only a thesaurus can become the basis of a more extensive semantic network that provides information not just on what terms are used in indexing but on how they are used within the system.

What else …

