Multi-language CASCOT

Margaret Birch and Ritva Ellison, Institute for Employment Research

Posted on 24-Dec-2015


CASCOT: Computer Assisted Structured Coding Tool

• Software tool for coding text automatically or manually

• Developed at the Institute for Employment Research (IER), University of Warwick, since 1993

• Used by over 100 organisations in the UK and abroad

IER was contracted under the DASISH project to develop a multilingual version of CASCOT for coding job titles to ISCO-08.

This is a large task with limited resources, so it is run as a pilot project.

The 8 selected languages:

- Dutch (Netherlands, Flemish-Belgium)
- English
- Finnish
- French (France, Walloon-Belgium, Switzerland)
- German (Germany, Austria, Switzerland)
- Italian
- Slovak
- Spanish

Key tasks

• Translating Cascot user interface texts
• Constructing national language versions of the ISCO-08 structure for Cascot
• Indexing job titles in the selected languages to ISCO-08
  - Some supplied by national statistical institutes (NSIs) or other partners
  - Some found by exploring relevant national websites
• Validating the software using raw data files from the European Social Survey (ESS) Round 6
• Testing the Cascot multilingual software
• Developing language-based coding rules
• Using the Cascot Performance Tool to fine-tune the software

Coding with Cascot

• Enter text, either typed in or read from a file
• Cascot recommends a code, but the user can change it
• Output can be directed to a file

Selected classification
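To make "a recommended code with a score" concrete, here is a minimal Python sketch of coding a job title against a tiny index. Everything in it is an assumption for illustration: the `INDEX` entries and the `recommend` function are hypothetical, and the score is plain word-overlap (Jaccard) similarity scaled to 0-100, not Cascot's own scoring method. Code `----` with score 0 stands for "no conclusion".

```python
import re

# Hypothetical mini-index: English job title -> ISCO-08 unit group code.
# Illustrative only; a real Cascot index is far larger.
INDEX = {
    "primary school teacher": "2341",
    "secondary school teacher": "2330",
    "mechanical engineer": "2144",
    "motor mechanic": "7231",
    "freight handler": "9333",
}


def tokens(text):
    """Lower-case the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def recommend(text):
    """Return (code, score 0-100) for the best-matching index entry.

    Assumption: score = Jaccard word overlap x 100, not Cascot's algorithm.
    """
    words = tokens(text)
    best_code, best_score = "----", 0  # "----" = no conclusion
    for title, code in INDEX.items():
        entry = tokens(title)
        score = round(100 * len(words & entry) / len(words | entry))
        if score > best_score:
            best_code, best_score = code, score
    return best_code, best_score


print(recommend("Primary teacher"))
```

Here `recommend("Primary teacher")` returns `("2341", 67)`. In Cascot itself the user can still override the recommended code before the output is written.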

Multi-language Cascot

• 8 languages available: Dutch, English, Finnish, French, German, Italian, Slovak and Spanish

• Cascot detects the language automatically, but it can be changed from the menu

• An ISCO-08 classification exists for each country (some with national codes)
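How Cascot detects the language is not described here. As a stand-in technique, the sketch below uses simple vocabulary overlap: count how many input words occur in each language's word list and pick the language with the most hits, falling back to a default on a tie or when no word is known. The `VOCAB` lists and `detect_language` are hypothetical.

```python
import re

# Hypothetical per-language word lists (e.g. words drawn from each
# language's index). Tiny illustrative samples only.
VOCAB = {
    "English": {"teacher", "engineer", "driver", "school", "primary"},
    "German": {"lehrer", "lehrerin", "ingenieur", "fahrer", "schule"},
    "Dutch": {"leraar", "ingenieur", "chauffeur", "school", "basisschool"},
}


def detect_language(text, default="English"):
    """Return the language whose vocabulary covers most input words.

    Ties, or input with no known words, fall back to `default`;
    this is where a manual override from a menu is useful.
    """
    words = re.findall(r"\w+", text.lower())
    hits = {lang: sum(w in vocab for w in words) for lang, vocab in VOCAB.items()}
    best = max(hits.values())
    leaders = [lang for lang, n in hits.items() if n == best]
    if best == 0 or len(leaders) > 1:
        return default
    return leaders[0]


print(detect_language("Lehrer an der Schule"))
```

`detect_language("Lehrer an der Schule")` gives `"German"`, while `"Ingenieur"` appears in both the German and Dutch lists and so falls back to the default.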

Coding in each language: Dutch, Finnish, French, German*, Italian, Slovak, Spanish

* The German index is © Federal Employment Agency

A test of multi-language Cascot

• Comparison of European Social Survey Round 6 ISCO-08 codes with automatic Cascot codes
• Data available from DE, ES, GB and NL

Cascot Performance Tool

• Allows the user to analyse the performance of Cascot by comparing manually coded data with the codes produced by Cascot for the same data.
• Requires a delimited results file containing a reference code, the Cascot code and the Cascot score.
• Shows a Performance Results Display window with a Performance Graph, Summary, Statistics and Key.
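A minimal Python sketch of reading such a results file and measuring agreement at each level of the ISCO-08 hierarchy (1 digit = major group, up to 4 digits = unit group). The column order (reference code, Cascot code, score), the comma delimiter and the absence of a header row are assumptions; `read_results` and `agreement_by_level` are hypothetical helpers, not part of the Performance Tool. The sample rows reuse the reference/Cascot code pairs and scores of the German examples in this document.

```python
import csv
import io


def read_results(stream, delimiter=","):
    """Yield (reference_code, cascot_code, score) tuples from a delimited file.

    Assumed layout: reference code, Cascot code, Cascot score; no header row.
    """
    for row in csv.reader(stream, delimiter=delimiter):
        if not row:
            continue  # skip blank lines
        ref, code, score = (field.strip() for field in row[:3])
        yield ref, code, int(score)


def agreement_by_level(records):
    """Percentage of records whose Cascot code matches the reference code
    on the first 1, 2, 3 and 4 digits of the ISCO-08 code."""
    records = list(records)
    result = {}
    for digits in range(1, 5):
        same = sum(ref[:digits] == code[:digits] for ref, code, _ in records)
        result[digits] = round(100 * same / len(records), 1) if records else 0.0
    return result


# Reference/Cascot code pairs and scores from the German examples.
SAMPLE = """2341,2330,73
2144,7231,52
8321,7522,34
9333,4323,27
5142,----,0
"""

print(agreement_by_level(read_results(io.StringIO(SAMPLE))))
```

Only the first example agrees, and only on the first two digits (2341 vs 2330), so the result is `{1: 20.0, 2: 20.0, 3: 0.0, 4: 0.0}`.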

Opening a results file

Performance Results Display

The longer the green line stays high, the better.

The further to the right the purple/blue lines are, the better.
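The slides do not say exactly what each line plots. One common way to read such a graph, offered here only as an assumption, is to sweep a Cascot score threshold: raising it means fewer records are coded automatically, and the question is how accurate the remaining ones are. The hypothetical `threshold_curve` function computes both figures from (reference code, Cascot code, score) records.

```python
def threshold_curve(records, thresholds=range(0, 101, 10)):
    """Return (threshold, % coded, % correct among coded) for each threshold.

    Assumptions: a record is 'coded' when its Cascot score is at or above
    the threshold, and 'correct' when its Cascot code equals the reference.
    """
    records = list(records)
    curve = []
    for t in thresholds:
        coded = [(ref, code) for ref, code, score in records if score >= t]
        pct_coded = 100 * len(coded) / len(records) if records else 0.0
        correct = sum(ref == code for ref, code in coded)
        pct_correct = 100 * correct / len(coded) if coded else 0.0
        curve.append((t, round(pct_coded, 1), round(pct_correct, 1)))
    return curve


# Made-up (reference code, Cascot code, score) records for illustration.
records = [("2341", "2341", 80), ("2144", "7231", 52),
           ("9333", "9333", 45), ("5142", "----", 0)]
for row in threshold_curve(records, thresholds=[0, 50, 90]):
    print(row)
```

The printed rows are `(0, 100.0, 50.0)`, `(50, 50.0, 50.0)` and `(90, 0.0, 0.0)`: with a high threshold nothing is coded automatically, so the choice of threshold trades coverage against accuracy.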

• The different language versions could be improved by developing coding rules

• Contributions are needed from experts who know each language

• Rules are developed with the Cascot Editor

Fine-tuning multi-language Cascot

Cascot Editor

• Classification files for Cascot are created and modified with the Editor
• Each classification has a Structure, an Index and Rules for coding

Cascot Editor Rules

• Downgraded words: words that are considered to be significantly less important than other words, e.g. deputy, junior, person
• Equivalent word ends: wait|er, wait|ress
• Abbreviations: asst → assistant, fe → further education
• Replacement words: taylor → tailor, tesco → supermarket
  - Omitting noise words, e.g. replace 'part-time' with nothing
• Input modifications: used when the rule absolutely cannot be made elsewhere
• Word alternatives: words and phrases that should also be tried as possible solution candidates
• Conclusions: e.g. retired → cannot conclude; agent → ambiguous (score 39)
• Default coding: a set of words and phrases that should be scored as though they were a different word or phrase
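The following Python sketch shows how a few of the rule types listed above might transform a job title before matching. It is not the Cascot Editor's rule format or engine: the dictionaries and `apply_rules` are hypothetical, and only the example words come from the rule list. Downgraded words are returned separately rather than dropped, so a scorer could weight them less.

```python
import re

# Hypothetical rule tables, using the example words from the rule list.
ABBREVIATIONS = {"asst": "assistant", "fe": "further education"}
REPLACEMENTS = {"taylor": "tailor", "tesco": "supermarket",
                "part-time": ""}  # empty replacement omits a noise word
DOWNGRADED = {"deputy", "junior", "person"}
EQUIVALENT_ENDS = {"wait": ("er", "ress")}  # wait|er and wait|ress are equivalent


def apply_rules(text):
    """Return (main_words, downgraded_words) for a job title."""
    words = []
    for token in re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower()):
        token = ABBREVIATIONS.get(token, token)
        token = REPLACEMENTS.get(token, token)
        words.extend(token.split())  # expansions may add words; "" adds none
    normalised = []
    for word in words:
        for stem, ends in EQUIVALENT_ENDS.items():
            if any(word == stem + end for end in ends):
                word = stem + ends[0]  # map every variant to the first ending
                break
        normalised.append(word)
    main = [w for w in normalised if w not in DOWNGRADED]
    low = [w for w in normalised if w in DOWNGRADED]
    return main, low


print(apply_rules("Part-time asst to junior taylor"))
print(apply_rules("Deputy waitress"))
```

The two calls print `(['assistant', 'to', 'tailor'], ['junior'])` and `(['waiter'], ['deputy'])`.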

Example of a new rule - English

• Add two new Replacement Words rules:

• The result:

• The problem:

Potential for rules - German

| Text to be coded | ISCO-08 (ESCO) | Cascot score | ISCO-08 (Cascot) | Best matching index entry (Cascot) |
|---|---|---|---|---|
| Klassenlehrer/in (Klasse 1-3) | 2341 Lehrkräfte im Primarbereich | 73 | 2330 Lehrkräfte im Sekundarbereich | Klassenlehrer/in |
| Diplomingenieur/in (Fahrzeugbau) | 2144 Maschinenbauingenieure | 52 | 7231 Kraftfahrzeugmechaniker und -schlosser | Fahrzeugbauer/in |
| Mopedbote/-in | 8321 Kraftradfahrer | 34 | 7522 Möbeltischler und verwandte Berufe | Büchsenschäfter/in/in |
| Rampenpersonal | 9333 Frachtarbeiter und verwandte Berufe | 27 | 4323 Bürokräfte in der Transportwirtschaft und verwandte Berufe | Rampenmanager/in |
| Maniküre | 5142 Kosmetiker und verwandte Berufe | 0 | ---- No conclusion | |

• German occupational titles were coded fully automatically with Cascot, and the results were compared with approved codes. The examples above show cases where rules would improve Cascot's coding performance.

• It is helpful to have “gold standard” files containing a large number of real-life job titles to which experts have assigned the correct codes.

• Cascot coding results can be compared with the “gold standard” to find areas for improvement.
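A minimal Python sketch of such a comparison, assuming a gold-standard file has already been read into (job title, expert code, Cascot code) records. It counts how often each expert/Cascot code pair disagrees and keeps example titles, so the most frequent confusions can be reviewed first, for instance as candidates for new rules or index entries. The `disagreements` function and the records are hypothetical.

```python
from collections import Counter, defaultdict


def disagreements(records, top=3):
    """List the most frequent (reference code, Cascot code) disagreements.

    records: iterable of (job_title, reference_code, cascot_code).
    Returns [((ref, cascot), count, example_titles), ...], most frequent first.
    """
    counts = Counter()
    examples = defaultdict(list)
    for title, ref, code in records:
        if ref != code:
            counts[(ref, code)] += 1
            examples[(ref, code)].append(title)
    return [(pair, n, examples[pair]) for pair, n in counts.most_common(top)]


# Made-up gold-standard records: (title, expert code, Cascot code).
gold = [
    ("primary teacher", "2341", "2330"),
    ("infant teacher", "2341", "2330"),
    ("moped courier", "8321", "7522"),
    ("freight handler", "9333", "9333"),
]
for pair, n, titles in disagreements(gold):
    print(pair, n, titles)
```

The 2341 → 2330 confusion is listed first with both example titles; the correctly coded record does not appear.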