Which Log for which Information? Gathering Multilinguality Data from Different Log File Types

Which Log for which Information?

Gathering Multilinguality Data from Different Log File Types

Maria Gäde, Vivien Petras, and Juliane Stiller

Humboldt-Universität zu Berlin

CLEF 2010 Padova, 21 September 2010

2 / 16

Premise

Assume you are building a multilingual digital library and could log every user action with particular consideration for multilingual activities.

Which questions could one ask? (Which questions cannot be answered by logging?)

Outline:

• Europeana

• Log file types

• Logging multilingual information

• Europeana ClickStreamLogger

3 / 16

Europeana

• 1,000+ content providers

• Portal + APIs

• Services

September 2010:

• 7.8 million images

• 4.6 million texts

• 127,000 videos

• 68,000 sounds

“A digital library that is a single, direct and multilingual access point to the European cultural heritage.”

European Parliament, 27 September 2007

4 / 16

Multilingual Europeana

• Interface

• Search

• Browse

• Results

5 / 16

Multilingual Europeana

6 / 16

Log File Types

123.123.123.123 - - [11/Mar/2010:09:42:06 +0100] "GET /cache/image/?uri=http://images.scran.ac.uk/rb/images/thumb/0098/00980252.jpg&size=BRIEF_DOC&type=IMAGE HTTP/1.0" 200 2843 "http://www.europeana.eu/portal/brief-doc.html?start=1&view=table&query=italy" "Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.9.2) Gecko/20100115 Firefox/3.6 (.NET CLR 3.5.30729)"

Example Apache web server log
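
To illustrate what such a web server log line does and does not reveal, here is a minimal sketch that parses the example entry. It assumes the Apache "combined" log format; the function name `parse_line` and the `browser_locale` heuristic (a locale token such as "; it;" inside older Firefox user agent strings) are illustrative assumptions, not part of the Europeana setup.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Apache "combined" format: host, identity, user, time, request,
# status, bytes, referrer, user agent.
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of log fields, or None if the line does not match."""
    m = COMBINED.match(line)
    if m is None:
        return None
    entry = m.groupdict()
    # For this image request, the user's query is only visible in the referrer URL.
    params = parse_qs(urlsplit(entry["referrer"]).query)
    entry["query"] = params.get("query", [None])[0]
    # Heuristic (assumption): a two-letter locale token between semicolons.
    loc = re.search(r';\s*([a-z]{2}(?:-[A-Za-z]{2})?)\s*;', entry["agent"])
    entry["browser_locale"] = loc.group(1) if loc else None
    return entry

line = (
    '123.123.123.123 - - [11/Mar/2010:09:42:06 +0100] '
    '"GET /cache/image/?uri=http://images.scran.ac.uk/rb/images/thumb/0098/00980252.jpg'
    '&size=BRIEF_DOC&type=IMAGE HTTP/1.0" 200 2843 '
    '"http://www.europeana.eu/portal/brief-doc.html?start=1&view=table&query=italy" '
    '"Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.9.2) Gecko/20100115 '
    'Firefox/3.6 (.NET CLR 3.5.30729)"'
)
entry = parse_line(line)
print(entry["query"], entry["browser_locale"])
```

The browser locale is a property of the client software, not of the portal: the interface language the user selected does not appear anywhere in the line.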

7 / 16

Log File Types

Example Google Analytics

Map overlay (IP address)

Languages (system language)

8 / 16

Log File Types – Missing Information

• Web server log (Apache)

  - Interface language missing

  - Certain actions cannot be distinguished (browse = search)

  - Ajax / Flash actions (saved searches, tags, filter)

  - Reconstruct sessions

• Search engine log (Solr)

  - Only queries

• Google Analytics

  - Queries missing
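
Session reconstruction from a web server log can be sketched as follows. This is a minimal example under stated assumptions: a visitor is approximated by the pair (IP address, user agent), and a gap of more than 30 minutes starts a new session. Both the key and the threshold are common heuristics chosen for illustration, not values taken from the slides.

```python
from datetime import datetime, timedelta

SESSION_GAP = timedelta(minutes=30)  # assumption: a common heuristic

def apache_time(text):
    """Parse an Apache timestamp such as '11/Mar/2010:09:42:06 +0100'."""
    return datetime.strptime(text, "%d/%b/%Y:%H:%M:%S %z")

def reconstruct_sessions(requests, gap=SESSION_GAP):
    """Group (host, agent, timestamp) requests into lists of timestamps.

    Requests from the same (host, agent) pair belong to one session
    as long as consecutive requests are at most `gap` apart.
    """
    sessions = []
    last_seen = {}  # visitor key -> (last timestamp, index into sessions)
    for host, agent, ts in sorted(requests, key=lambda r: r[2]):
        key = (host, agent)
        previous = last_seen.get(key)
        if previous is not None and ts - previous[0] <= gap:
            index = previous[1]
            sessions[index].append(ts)
        else:
            sessions.append([ts])
            index = len(sessions) - 1
        last_seen[key] = (ts, index)
    return sessions

requests = [
    ("123.123.123.123", "Firefox/3.6", apache_time("11/Mar/2010:09:42:06 +0100")),
    ("123.123.123.123", "Firefox/3.6", apache_time("11/Mar/2010:09:50:00 +0100")),
    ("123.123.123.123", "Firefox/3.6", apache_time("11/Mar/2010:11:15:00 +0100")),
]
sessions = reconstruct_sessions(requests)
print(len(sessions))
```

Users behind a shared proxy or with changing IP addresses are merged or split incorrectly by this heuristic, which is one reason a dedicated application-level logger is attractive.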

9 / 16

Logging Multilingual Information

Stages of the interaction:

• Approaching the system / background information

• Launching queries / browsing

• Viewing results

• Interacting with the results (filter, save, tag, repeat)

• User background

• Interface language

• Query language

• Query type

• Query content

• Query translation

• Search results

• Result set views

• Result translation

• Query reformulation

• User-generated content

• Saved searches / docs

10 / 16

Logging Multilingual Information - Background

• User background information

  - Country of access, system language, referrer site

• Interface language

  - Change → stronger intervention

11 / 16

Logging Multilingual Information - Query

• Query language

  - Query processing

  - Adapting languages to system

• Query type

  - Simple, advanced, fielded (e.g. language restriction)

  - Pre-selected categories for browsing

• Query content

  - Named entities, dates, numbers (language ambiguous)

• Query translation
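
The point about language-ambiguous query content can be made concrete with a small sketch. It flags queries that consist only of digits and date separators, or that match a list of named entities, as carrying no reliable language signal. The function name `is_language_ambiguous` and the tiny entity set are hypothetical; a real system would need a proper gazetteer.

```python
import re

# Digits plus common date/number separators, e.g. "1945" or "21.09.2010".
NUMERIC = re.compile(r"^[\d\s./:-]+$")

def is_language_ambiguous(query, known_entities=frozenset()):
    """Return True if the query gives no reliable hint about its language."""
    q = query.strip().lower()
    if not q:
        return True
    if NUMERIC.match(q):
        return True
    # Named entities are often spelled identically across languages.
    return q in known_entities

entities = frozenset({"mozart", "mona lisa"})  # hypothetical gazetteer

for query in ["1945", "21.09.2010", "Mozart", "paintings"]:
    print(query, is_language_ambiguous(query, entities))
```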

12 / 16

Logging Multilingual Information - Results

• Search results

  - Document languages

• Result set views

  - Detailed view, external click → stronger intervention

• Result translation

13 / 16

Logging Multilingual Information – User Activities

• Query reformulation / refinement

  - Language switch

  - Filtering (language), related-item search

• User-generated content

  - Language of tags

  - Language of documents being tagged

• Saved searches / documents

• ???

14 / 16

Europeana ClickStreamLogger

• Interface language

  - State + change for every activity

• Search

  - Result numbers, distribution of results by language / country

  - Filtering and related searches

• Browse

  - Browsing activities + starting points

• Navigation

  - Move outside Europeana

• Ajax

  - Save / remove searches / tags

• User management

  - Account creation etc.
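
A record produced by such an application-level logger might look like the following sketch. It is not the actual ClickStreamLogger format: the class `ClickEvent`, its field names, and the JSON serialization are hypothetical, chosen only to show how interface language state and change, result numbers, and the distribution of results by language could be captured with every action.

```python
import json
from collections import Counter
from dataclasses import dataclass, field, asdict
from typing import Optional

@dataclass
class ClickEvent:
    action: str                      # e.g. "search", "browse", "save_search"
    interface_language: str          # state at the time of the action
    previous_interface_language: Optional[str] = None
    query: Optional[str] = None
    result_count: Optional[int] = None
    results_by_language: dict = field(default_factory=dict)

    @property
    def language_changed(self):
        """True if the interface language differs from the previous action."""
        return (self.previous_interface_language is not None
                and self.previous_interface_language != self.interface_language)

    def to_log_line(self):
        """Serialize the event as one JSON line, including the change flag."""
        record = asdict(self)
        record["language_changed"] = self.language_changed
        return json.dumps(record, sort_keys=True)

# Languages of the documents in a (made-up) result list.
doc_languages = ["it", "it", "en", "de"]

event = ClickEvent(
    action="search",
    interface_language="it",
    previous_interface_language="en",
    query="italy",
    result_count=len(doc_languages),
    results_by_language=dict(Counter(doc_languages)),
)
line = event.to_log_line()
print(line)
```

Logging the interface language with every event, rather than only on change, means each record can be analysed on its own without replaying the session.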

15 / 16

What happens now…

• Soft roll-outs of new releases change the site

• Analysis of log data

• Interpretation

• Re-iteration of “useful information” categories

• Re-design user interaction?

16 / 16

www.europeana.eu

www.europeanaconnect.eu