Date post: 30-Dec-2015
Page 1

Multilingual Cataloguing of Product Information of Specific Domains: Case Mkbeem System

Aarno Lehtola, Jarno Tenni and Tuula Käpylä

VTT Information Technology

Contents:
• Motivation
• Mkbeem in a nutshell
• Multilingual Cataloguing Tool
• Meaning extraction
• Experiences of test users
• Future
• DEMO

Page 2

Online Language Challenges for eCommerce

Ref: Global Reach, http://www.glreach.com/

(end of year)       1996  1997   1998  1999  2000  2001  2002  2003  2004  2005  upper limit (M)
English             40.0  45.0   72.0   148   192   231   245   270   295   320   550
Total Non-English   10.0  16.0   45.3   109   211   307   663   540   680   820  5850
TOTAL               50.0  70.0  117.0   245   391   522   435   792   956  1120  6400

Native English speakers comprise less than 9% of the world population.

"If I'm selling to you, I speak your language. If I'm buying, dann müssen Sie Deutsch sprechen [then you must speak German]." (Willy Brandt)

Page 3

An Answer: MKBEEM and Multilingual eCommerce Mediation

[Figure: Customer and CP/SP User; MKBEEM Mediation System; CP/SP eCom Service; Monolingual CP/SP]

• EC FP5 IST/HLT project in 2000-2002, budget 4.9 M€

• Goal: Develop intelligent knowledge-based key components (HLP & KRR) for applications in multilingual eCommerce

• Language adaptation via automatic HL translation and interpretation

• Natural dialogues combining HL & navigation

• Harmonised ontologies enabling localised views to products and trading contracts

Customer language information retrieval & trading

Multilingual cataloguing: write once, publish many

Transactions with contract adaptation

• Generic solutions proved by trials in Finnish, French and English in the domains of travel and mail-order sales

• More information: www.mkbeem.com

Page 4

Generic Architecture of Mkbeem

[Figure: architecture diagram. Components shown: Customer, User Interface, User Agent, Rational Agent, Domain Ontology Server, Trading Ontology Server, Human Language Processing Server, CP Agents, CP Interface, CP E-Commerce platform, CP Information System, Content/Service Providers, Manager, Manager Interface, Manager Agent, MKBEEM System]

Page 5

Mkbeem: Bridging Languages via Language Neutral Ontologies

Colour Ontology, Material Ontology, Product Model, Multilingual Product Data

Meaning extraction, Machine translation, Dialogue processing, ...

Extracting Product Properties

"Toppatakki. Muhkea malli, olkapäissä vahvikkeet. Painonapeilla kiinnitetty huppu, jossa joustava nyöri. Vetoketjun alla suojalista. Kaksi kannellista taskua..."
(English: "Quilted jacket. Puffy model, reinforcements on the shoulders. Hood attached with snap fasteners, with an elastic drawstring. Protective strip under the zip. Two flap pockets...")

"Toppatakki. Muhkea malli..." -> "Quilted jacket. Puffy model with reinforcements on the shoulder..."

jacket(X,quilted_jacket), model(X,puffy), part(X,Y,sleeves), property(Y,Z,reinforcement)...

User Information Request Proc.

User: "A brown jacket made of natural material"

Ontological Formula in CARIN:
(c_colour)(X), (r_name)(X,brown),
(c_product)(Y), (r_name)(Y,jacket),
(c_material)(Z), (r_name)(Z,nat_mat).

System:
14 products found:
1. Beige winter jacket of wool
2. Ochre quilted jacket of cotton
...
Any further requirements?

User: "One with a hood"

Page 6

Mkbeem: Multilingual Cataloguing Tool

• Starting point:

• The new product belongs to the supported product domains

• A textual product description in one of the supported languages and a photograph are available

• Basic functionalities:

• Text checking

• Property extraction

• Product Categorisation

• Machine Translation

• NL Query Processing

• Technical key challenge:

• Formalising the relationship between ontologies and HL, and

• Extracting the meaning of input HL texts, with respect to the provided ontologies, in the form of Ontological Formulas

Page 7

Meaning Extraction: Example in Clothing Domain

Long skirt with cargo pockets
Jupe longue avec des poches battle-dress
Pitkä hame, jossa reisitaskut

Concept Bindings:
(c_MKBEEM:81007:clothingProduct)(H6641),
(r_name)(H6641,H6989),
(c_MKBEEM:83383:property)(H6552),
(r_name)(H6552,H6889),
(c_MKBEEM:81011:part)(H6730),
(r_name)(H6730,H7295),

Linguistic Dependencies & Lexical info:
(l_dependency)(H6989,adjAttr,H6889),
(l_dependency)(H6989,prepAttr,H7295),
(l_constituent)(H6889,0,long,[en,long,adj,nom,sg,property]),
(l_constituent)(H6989,1,skirt,[en,skirt,noun,nom,sg,product]),
(l_constituent)(H7295,4,cargo#pockets,[en,cargo#pocket,noun,nom,pl,prodpart])
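To make the indirection in the formula above easier to follow, here is a minimal Python sketch that reads the atoms and resolves each concept variable, through r_name, to the surface form of its l_constituent. The function name summarise and the regular expressions are illustrative assumptions, not part of Mkbeem; the atom syntax is inferred from this one example only.

```python
import re

# An atom looks like "(predicate)(arg1,arg2,...)"; neither part contains parentheses.
ATOM = re.compile(r"\(([^()]+)\)\(([^()]*)\)")
# l_constituent arguments: identifier, position, surface form, [feature list].
CONSTITUENT = re.compile(r"(\w+),(\d+),([^,]+),\s*\[(.*)\]", re.S)

FORMULA = """
(c_MKBEEM:81007:clothingProduct)(H6641), (r_name)(H6641,H6989),
(c_MKBEEM:83383:property)(H6552), (r_name)(H6552,H6889),
(c_MKBEEM:81011:part)(H6730), (r_name)(H6730,H7295),
(l_dependency)(H6989,adjAttr,H6889),
(l_dependency)(H6989,prepAttr,H7295),
(l_constituent)(H6889,0,long,[en,long,adj,nom,sg,property]),
(l_constituent)(H6989,1,skirt,[en,skirt,noun,nom,sg,product]),
(l_constituent)(H7295,4,cargo#pockets,
[en,cargo#pocket,noun,nom,pl,prodpart])
"""

def summarise(formula):
    """Return (concept bindings, dependency links) with identifiers resolved to words."""
    concept_of = {}    # concept variable -> concept label (last part of the predicate)
    name_of = {}       # concept variable -> constituent identifier
    surface_of = {}    # constituent identifier -> surface form
    dependencies = []  # (head identifier, relation, dependent identifier)
    for pred, raw in ATOM.findall(formula):
        args = raw.strip()
        if pred.startswith("c_"):
            concept_of[args] = pred.split(":")[-1]
        elif pred == "r_name":
            var, const = (a.strip() for a in args.split(","))
            name_of[var] = const
        elif pred == "l_dependency":
            head, relation, dependent = (a.strip() for a in args.split(","))
            dependencies.append((head, relation, dependent))
        elif pred == "l_constituent":
            ident, _pos, surface, _feats = CONSTITUENT.fullmatch(args).groups()
            surface_of[ident] = surface.strip()
    bindings = [(concept_of[v], surface_of[name_of[v]]) for v in concept_of]
    links = [(surface_of[h], rel, surface_of[d]) for h, rel, d in dependencies]
    return bindings, links

bindings, links = summarise(FORMULA)
print(bindings)
print(links)
```

The sketch uses the surface form (third argument of l_constituent) rather than the lemma in the feature list, because the position of the lemma inside the feature list differs between the examples in this deck.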

Page 8

VTT's implementation of HLP Services in Mkbeem

Functions (Linguistic Services):
• extractMeaning: HL string -> Ontological Formula
• checkText: HL string -> OK or correction
• translateText: HL string -> Translated string

Software components shown:
• Webtran MT System
• Webtran Dependency Parser
• Meaning Extractor
• Verification
• Unifier: Text Correction S/W
• Cone: Onto S/W Inference

KBs:
• Linguistic Ontology
• ALEs for MT (965 between Finnish, French and English)
• Domain Lexica (Finnish/~4500, French/~1700, English/~1500, Fi->Fr/~1300, Fi->En/~1500)
• Product Model, Colour Ontology, Material Ontology. Altogether: 307 concepts, 1050 attributes, 150 ALEs embedded
• Concept Bindings

Page 9

Augmented Lexical Entries

• Augmented Lexical Entries (ALE) rules (see MT Summit 99):
  • Bilingual or multilingual non-directed entries representing phrase and sentence structures and possibly their translation relations
  • Both surface form entries and generalised rules
  • Possible to declare multidirectional entries
  • Declarative and intuitive formalism, to be used by translators
  • Uniform way of representing phenomena on different levels of language
  • Designed to be suitable for automated or machine-supported language modelling (see SMC 99 paper on learning translation grammars)
  • Can be viewed as a forest of partial dependency parse trees
  • A close relationship to the corresponding conceptual structures can be obtained (concept bindings to ontologies)

• Lexicon
  • All the allowed words
  • Monolingual and bilingual entries

Page 10

Meaning Extraction: A Product Ontology with ALEs Embedded

Page 11

Syntax of ALEs

augmented_lexical_entry ::= [ entry_name pattern.. opt_message opt_repair ]
entry_name ::= name . number_index
name ::= hierarchical_name_w_dots_betw_parts
pattern ::= [ opt_language_id constituent_def.. ]
opt_message ::= | [ message string_w_opt_binding ]
opt_repair ::= | [ repair string_w_opt_binding ]
constituent_def ::= constituent_def*
constituent_def ::= constituent_def..
constituent_def ::= < constituent_def.. >
constituent_def ::= opt_regent_mark opt_lexeme opt_binding opt_feature_constraint
opt_language_id ::= | ISO_std_lang_identifier | ~ ISO_std_lang_identifier
ISO_std_lang_identifier ::= ee | en | fi | fr | se |
opt_regent_mark ::= | ^
opt_lexeme ::= | lexeme | tag | name
opt_binding ::= | binding
opt_feature_constraint ::= | { feature.. }
binding ::= ( variable_name ) | (^)
feature ::= feature_value | property_type binding

Page 12

Examples of ALEs - 1/3

• Basic word correspondence definition:

[footwear.word.27
  [se ^allväderskänga]
  [fi ^jokasäänkenkä]
  [en all weather ^shoe]]

• Specific idiom correspondence:

[price.tax.4
  [se inkl. ^moms tag_price(X)]
  [fi sis. ^alv tag_price(X)]
  [en incl. ^VAT tag_price(X)]]

• Generalised ALE, e.g. "shirt of 100% cotton":

[cloth.material.composition
  [fi ^(A){clothProd} tag_percentage(X) (B){textileMaterial ptv}]
  [fr ^(A){clothProd} en tag_percentage(X) (B){textileMaterial}]
  [en ^(A){clothProd} of tag_percentage(X) (B){textileMaterial}]
  [se ^(A){clothProd} av tag_percentage(X) (B){textileMaterial}]]
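As a rough illustration of what "non-directed" means for a surface-form entry, the following Python sketch parses the footwear.word.27 entry and uses the same entry to translate in any direction. The functions parse_ale and translate are hypothetical helpers written for this illustration; they ignore the regent mark, tags, bindings and feature constraints, and they are not the Webtran implementation.

```python
import re

# Entry name right after the opening bracket, e.g. "footwear.word.27".
NAME = re.compile(r"\[\s*([\w.]+)")
# One language pattern, e.g. "[en all weather ^shoe]" (an optional "~" marks a negative instance).
PATTERN = re.compile(r"\[(~?[a-z]{2})\s+([^\[\]]+)\]")

def parse_ale(text):
    """Return (entry name, {language id: tuple of tokens}) for a surface-form ALE."""
    name = NAME.match(text.strip()).group(1)
    patterns = {
        # Drop the regent mark "^"; this sketch does not use it.
        lang: tuple(tok.lstrip("^") for tok in body.split())
        for lang, body in PATTERN.findall(text)
    }
    return name, patterns

def translate(tokens, source, target, entries):
    """Replace every source-language pattern found in tokens by its target-language pattern."""
    out, i = [], 0
    while i < len(tokens):
        for _name, patterns in entries:
            src, tgt = patterns.get(source), patterns.get(target)
            if src and tgt and tuple(tokens[i:i + len(src)]) == src:
                out.extend(tgt)
                i += len(src)
                break
        else:
            # No entry matched at this position: copy the token unchanged.
            out.append(tokens[i])
            i += 1
    return out

FOOTWEAR = "[footwear.word.27 [se ^allväderskänga] [fi ^jokasäänkenkä] [en all weather ^shoe]]"
entries = [parse_ale(FOOTWEAR)]

print(translate("all weather shoe".split(), "en", "fi", entries))
print(translate(["jokasäänkenkä"], "fi", "se", entries))
```

Because the entry stores one pattern per language and no direction, adding a language to the entry makes it usable as both source and target without further rules.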

Page 13

Examples of ALEs - 2/3

• Semantic and grammatical restrictions, e.g. agreement in "miellyttävä pusero" or "miellyttävää puseroa" ("comfortable blouse"):

[cloth.property.1
  [se (A){adj clothProp gender(B) number(B)} ^(B){noun clothProd}]
  [fi (A){adj clothProp case(B) number(B)} ^(B){noun clothProd}]
  [en (A){adj clothProp} ^(B){noun clothProd}] ]

• An iterative phrase (note the tree flattening):

[cloth.property.2
  [se property.expr{clothProp} ^(B){cloth}]
  [fi property.expr{clothProp} ^(B){cloth}]
  [en property.expr{clothProp} ^(B){cloth}] ]

[property.expr.1
  [se (A){adj prop gender(^) number(^)} ]
  [fi (A){adj prop number(^) case(^)} ]
  [en (A){adj prop} ] ]

[property.expr.2
  [property.expr.2 tag_comma property.expr.3]]

[property.expr.3
  [property.expr.1 {conjAND} property.expr.1]]

Page 14

Examples of ALEs - 3/3

• Negative Instances - Correction ALEs

[correct.ellos.3
  [~se kardborrstängning(A)]
  [~se kardborreförslutning(A)]
  [~se kardborrknäppning(A)]
  [~se kardborreknäppning(A)]
  [message Use the correct synonym "kardborrestängning" instead of word(A)]
  [repair kardborrestängning(A)]]
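The correction entry correct.ellos.3 can be read as "reject these four Swedish variants, report a message, and offer a repair". A minimal Python sketch of that reading follows. The dictionary layout and the function check_text are assumptions made for illustration (the name echoes the checkText service, but this is not its implementation), and matching is simplified to whole whitespace-separated words.

```python
# Hypothetical in-memory form of the correction ALE correct.ellos.3.
CORRECTION = {
    "name": "correct.ellos.3",
    "lang": "se",
    "rejected": {
        "kardborrstängning",
        "kardborreförslutning",
        "kardborrknäppning",
        "kardborreknäppning",
    },
    "repair": "kardborrestängning",
}

def check_text(text, rule=CORRECTION):
    """Return (repaired text, messages); messages is empty when the text is OK."""
    repaired, messages = [], []
    for word in text.split():
        if word.lower() in rule["rejected"]:
            messages.append(
                f'Use the correct synonym "{rule["repair"]}" instead of {word}'
            )
            repaired.append(rule["repair"])
        else:
            repaired.append(word)
    return " ".join(repaired), messages

fixed, notes = check_text("sko med kardborrknäppning")
print(fixed)
print(notes)
```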

Page 15

Meaning Extraction Process

Phases: Syntactico-semantic analysis; Inference of CARIN formulas

Input phrase
-> Lexical Analysis -> Set of tokens
-> Dependence Analysis -> Set of syntactic dependence trees
-> Concept Matching & Verification -> Set of approved lexical-semantic graphs with concepts identified
-> Refining semantics in particular themes, e.g. colours, materials, distances -> Subset of refined semantic graphs
-> Syntactic translation into CARIN -> Set of CARIN formulas

Page 16

Meaning Extraction Process Example

Phases: Syntactico-semantic analysis; Inference of CARIN formulas
Steps: Lexical Analysis (set of tokens), Dependence Analysis (set of syntactic dependence trees), Concept Matching & Verification, Refining semantics in particular themes (e.g. colours, materials, distances; subset of refined semantic graphs), Syntactic translation into CARIN

Input phrase:
musta hame, jossa halkio ja taskut
une jupe noire avec fente et poches
a black skirt with split and pockets

Set of approved lexical-semantic graphs with concepts identified:
(c_MKBEEM:81098:colour)(H1017), (r_name)(H1017,H641),
(c_MKBEEM:84731:clothingProduct)(H984), (r_name)(H984,H684),
(c_MKBEEM:81011:part)(H951), (r_name)(H951,H813),
(c_MKBEEM:81011:part)(H918), (r_name)(H918,H899),
(l_dependency)(H684,adjAttr,H641),
(l_dependency)(H684,prepAttr,H813),
(l_dependency)(H684,prepAttr,H899),
(l_constituent)(H641,0,musta,[fi,colour,musta,adj,nom,sg]),
(l_constituent)(H684,1,hame,[fi,product,hame,noun,nom,sg]),
(l_constituent)(H727,2,tag_comma,[fi]),
(l_constituent)(H770,3,jossa,[fi,jossa,pron,ine,sg]),
(l_constituent)(H813,4,halkio,[fi,prodpart,halkio,noun,nom,sg]),
(l_constituent)(H856,5,ja,[fi,conj,ja,coord_c]),
(l_constituent)(H899,6,taskut,[fi,prodpart,taskut,noun,nom,pl])

Set of CARIN formulas:
(c_product)(H1606), (r_name)(H1606,skirt),
(c_colour)(H1573), (r_name)(H1573,black),
(c_part)(H1540), (r_name)(H1540,split),
(c_part)(H1506), (r_name)(H1506,pocket)
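The flow from input phrase to CARIN-style atoms can be sketched as a chain of small functions. The Python below is a deliberately reduced toy: the lexicon, the class-to-concept table and all function names are invented for illustration, and dependency analysis, verification and thematic refinement are left out entirely. It only shows how lexical lookup and concept matching can lead to output of the same shape as the final formula on this slide.

```python
# Hypothetical toy lexicon: surface form -> (lemma, semantic class).
LEXICON = {
    "black": ("black", "colour"),
    "skirt": ("skirt", "product"),
    "split": ("split", "prodpart"),
    "pockets": ("pocket", "prodpart"),
}

# Hypothetical mapping from semantic class to ontology concept.
CONCEPT_OF_CLASS = {
    "colour": "c_colour",
    "product": "c_product",
    "prodpart": "c_part",
}

def lexical_analysis(phrase):
    """Split the phrase into (position, token) pairs."""
    return list(enumerate(phrase.lower().split()))

def concept_matching(tokens):
    """Keep tokens known to the lexicon and attach the ontology concept."""
    matched = []
    for _position, word in tokens:
        if word in LEXICON:
            lemma, sem_class = LEXICON[word]
            matched.append((CONCEPT_OF_CLASS[sem_class], lemma))
    return matched

def to_carin(bindings):
    """Emit one concept atom and one r_name atom per binding."""
    atoms = []
    for n, (concept, lemma) in enumerate(bindings, start=1):
        atoms.append(f"({concept})(H{n})")
        atoms.append(f"(r_name)(H{n},{lemma})")
    return ", ".join(atoms)

def extract_meaning(phrase):
    return to_carin(concept_matching(lexical_analysis(phrase)))

print(extract_meaning("a black skirt with split and pockets"))
```

Function words such as "a", "with" and "and" drop out here only because they are missing from the toy lexicon; in the slide they survive as l_constituent atoms until the final translation step.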

Page 17

Cataloguing Tool Testing by End-Users

• Goals:
  • Proof of concept ("Swiss army knife of a cataloguer")
  • Usability in real working environment

• Ellos' test group consisted of 8 persons (translators, cataloguers and call-centre workers):
  • familiar with Internet: 5 yes, 1 almost yes, 2 yes at home
  • languages used: 8 Finnish, 6 English, 4 Swedish, 1 French
  • familiar w. catalogue maintenance: 6 yes, 2 no

• Schedule:
  • Short training and preliminary interviews on August 30, 2002
  • Interviews of experiences and summary of the results ready by October 14, 2002

Page 18

... Trial experiences of the Ellos test group

• Cataloguing tool considered to be useful:

• cataloguing process as a whole was seen as an easy and efficient way of producing and classifying product information

• each of the main features was considered good

• very important: semi-automatic translation into target languages

• property extraction and inference with colours and materials seen as important in bringing value-adding services to customers

• helps in producing consistent and uniform information

• can make the working process faster and reduce the amount of manual, repeated routine procedures

• KB management tools considered suitable to their task

• Reported difficulty:

• occasionally long response times, which bore the user and lead e.g. to repeated queries

• e.g. an "hourglass" indicator or provision of partial results could bring quick help

• will be eventually solved by continued product development

Page 19

MT Part (Webtran) in Production Use at Ellos since 2000

Language technology solutions need to be embedded into business processes and IT infrastructure

[Figure: translation workflow. Ellos Sweden (Swedish): Catalogue author, Mac QuarkXPress, Source DB. Webtran Machine Translation Software on a PC Server with Language Modeller: automatic Sw -> Fi translation. Ellos Finland (Finnish): Localised DB, Mac QuarkXPress, Cataloguer.]

About 2000 translated catalogue pages and 10000-15000 product descriptions per year

Benchmark by CSC Inc. reports over 30% time savings after one year of use

Page 20

Work Needed for Adding Domain and Languages

• The marginal cost of adding a new domain or a new language is reasonable with respect to the added value gained

• Based on experiences from modelling the vacation cottage domain in the system (fi, fr, en), we have estimated that introducing a comparable new domain would require:
  • semantic lexicon: 2 man-months
  • translation and meaning extraction rules: 1 man-month
  • product models: 2-4 man-weeks

• We also estimate that adding a language to a pre-existing domain would need:
  • semantic lexicon: 1-2 man-months
  • translation and meaning extraction rules: 2-4 man-weeks
  • product models: 1 man-week
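For a quick total of the estimates on this slide, the ranges can be summed as below. The conversion of 4 man-weeks per man-month is an assumption made here for the arithmetic; the slide does not state one. The helper names are illustrative.

```python
WEEKS_PER_MONTH = 4  # assumption, not stated on the slide

def months(low, high=None, unit="month"):
    """Return an effort range (low, high) expressed in man-months."""
    high = low if high is None else high
    scale = 1 if unit == "month" else 1 / WEEKS_PER_MONTH
    return (low * scale, high * scale)

def total(*ranges):
    """Add effort ranges component-wise."""
    return (sum(r[0] for r in ranges), sum(r[1] for r in ranges))

# New comparable domain: lexicon + rules + product models.
new_domain = total(months(2), months(1), months(2, 4, unit="week"))
# New language in an existing domain: lexicon + rules + product models.
new_language = total(months(1, 2), months(2, 4, unit="week"), months(1, unit="week"))

print("new domain (man-months):", new_domain)
print("new language (man-months):", new_language)
```

Under that assumption a comparable new domain comes to roughly 3.5 to 4 man-months, and a new language to roughly 1.75 to 3.25 man-months.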

Page 21

Future Development Recommendations

• Further development could focus on the following issues:

• information request processing dialogues:

• question answering capabilities (e.g. qualitative questions about the goods selection)

• proper way of handling null queries (e.g. graceful relaxation of the search constraints based on the ontology models and the actual goods selection)

• new languages to the system: Russian, Norwegian, Estonian, German ...

• user-friendlier ways for the acquisition and maintenance of language models and product models (knowledge acquisition bottleneck): machine learning

• special requirements of mobile terminals (e.g. automatic text abstraction)
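One possible reading of "graceful relaxation of the search constraints based on the ontology models" is to generalise a constraint to its parent concept whenever a query returns nothing. The Python sketch below illustrates that idea on a toy ontology. The PARENT table, the two products and all function names are hypothetical; the colour and material values merely echo the jacket example earlier in the deck, and this is not a description of how Mkbeem handles null queries.

```python
# Hypothetical toy ontology: concept -> more general parent concept.
PARENT = {
    "beige": "brown",
    "ochre": "brown",
    "wool": "nat_mat",
    "cotton": "nat_mat",
}

# Hypothetical goods selection.
PRODUCTS = [
    {"name": "winter jacket", "colour": "beige", "material": "wool"},
    {"name": "quilted jacket", "colour": "ochre", "material": "cotton"},
]

def ancestors(value):
    """Return the value followed by all of its ancestors in the ontology."""
    chain = [value]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def search(query):
    """A product matches when every wanted value equals or generalises its own value."""
    return [
        p["name"]
        for p in PRODUCTS
        if all(wanted in ancestors(p[attr]) for attr, wanted in query.items())
    ]

def search_with_relaxation(query):
    """Generalise one constraint at a time until something is found or nothing can be relaxed."""
    query = dict(query)
    while True:
        hits = search(query)
        if hits:
            return hits, query
        relaxable = [attr for attr, value in query.items() if value in PARENT]
        if not relaxable:
            return [], query
        attr = relaxable[0]
        query[attr] = PARENT[query[attr]]

print(search({"colour": "brown", "material": "nat_mat"}))
print(search_with_relaxation({"colour": "ochre", "material": "wool"}))
```

In the second call no product is both ochre and wool, so the colour constraint is widened to brown and the beige wool jacket is returned together with the relaxed query, which a dialogue component could then explain to the customer.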

Page 22

DEMO

