Which Log for which Information?
Gathering Multilinguality Data from Different Log File Types
Maria Gäde, Vivien Petras, and Juliane Stiller
Humboldt-Universität zu Berlin
CLEF 2010, Padova, 21 September 2010
Premise
Assume you are building a multilingual digital library and could log every user action with particular consideration for multilingual activities.
Which questions could one ask? (Which questions cannot be answered by logging?)
Outline:
• Europeana
• Log file types
• Logging multilingual information
• Europeana ClickStreamLogger
Europeana
• 1,000+ content providers
• Portal + APIs
• Services
September 2010:
• 7.8 million images
• 4.6 million texts
• 127,000 videos
• 68,000 sounds
“A digital library that is a single, direct and multilingual access point to the European cultural heritage.”
European Parliament, 27 September 2007
Multilingual Europeana
• Interface
• Search
• Browse
• Results
Log File Types
123.123.123.123 - - [11/Mar/2010:09:42:06 +0100] "GET /cache/image/?uri=http://images.scran.ac.uk/rb/images/thumb/0098/00980252.jpg&size=BRIEF_DOC&type=IMAGE HTTP/1.0" 200 2843 "http://www.europeana.eu/portal/brief-doc.html?start=1&view=table&query=italy" "Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.9.2) Gecko/20100115 Firefox/3.6 (.NET CLR 3.5.30729)"
Example Apache web server log
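A log entry like the one above can be split into its fields to see what multilingual information it carries. The following is a minimal sketch, assuming the Apache "combined" log format; the function names `parse_line` and `ua_locale` are hypothetical, and the locale extraction is only a heuristic for older browser user-agent strings that include a locale token.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Apache "combined" format:
# host ident user [time] "request" status bytes "referrer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return the log fields as a dict, or None if the line does not match."""
    match = COMBINED.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # The search query is only visible indirectly, in the referrer URL.
    params = parse_qs(urlsplit(entry["referrer"]).query)
    entry["query"] = params.get("query", [None])[0]
    return entry

def ua_locale(agent):
    """Heuristic: find a token such as 'it' or 'en-US' in the first (...) group."""
    group = re.search(r'\(([^)]*)\)', agent)
    if group is None:
        return None
    for token in group.group(1).split(';'):
        token = token.strip()
        if re.fullmatch(r'[a-z]{2}(-[A-Za-z]{2})?', token):
            return token
    return None

line = (
    '123.123.123.123 - - [11/Mar/2010:09:42:06 +0100] '
    '"GET /cache/image/?uri=http://images.scran.ac.uk/rb/images/thumb/0098/'
    '00980252.jpg&size=BRIEF_DOC&type=IMAGE HTTP/1.0" 200 2843 '
    '"http://www.europeana.eu/portal/brief-doc.html?start=1&view=table&query=italy" '
    '"Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.9.2) Gecko/20100115 '
    'Firefox/3.6 (.NET CLR 3.5.30729)"'
)

entry = parse_line(line)
print(entry["query"])              # query recovered from the referrer
print(ua_locale(entry["agent"]))   # browser locale, not the interface language
```

Note that the entry yields the query and the browser locale, but nothing about the interface language the user has selected in the portal.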
7 / 16
Log File Types
Example Google Analytics
• Map overlay (IP address)
• Languages (system language)
8 / 16
Log File Types – Missing Information
• Web server log (Apache)
  - Interface language missing
  - Certain actions cannot be distinguished (browse = search)
  - Ajax / Flash actions (saved searches, tags, filters) are not recorded
  - Sessions have to be reconstructed
• Search engine log (Solr)
  - Only queries
• Google Analytics
  - Queries missing
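Because a web server log has no session identifier, sessions have to be rebuilt from the individual requests. The following is a minimal sketch, assuming that requests from the same IP address and user agent belong to one user and that a session ends after 30 minutes of inactivity. The timeout is a common heuristic, not a value from these slides, and the function name and sample requests are hypothetical.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # assumption: common inactivity cut-off

def reconstruct_sessions(requests, timeout=SESSION_TIMEOUT):
    """Group (host, agent, timestamp) tuples into sessions.

    A new session starts when a (host, agent) pair is seen for the first
    time or has been inactive for longer than `timeout`.
    """
    open_sessions = {}  # (host, agent) -> most recent session for that pair
    sessions = []
    for request in sorted(requests, key=lambda r: r[2]):
        host, agent, timestamp = request
        key = (host, agent)
        current = open_sessions.get(key)
        if current is None or timestamp - current[-1][2] > timeout:
            current = []
            sessions.append(current)
            open_sessions[key] = current
        current.append(request)
    return sessions

# Hypothetical sample requests
requests = [
    ("123.123.123.123", "Firefox/3.6", datetime(2010, 3, 11, 9, 42, 6)),
    ("123.123.123.123", "Firefox/3.6", datetime(2010, 3, 11, 9, 50, 0)),
    ("123.123.123.123", "Firefox/3.6", datetime(2010, 3, 11, 11, 0, 0)),
    ("10.0.0.1", "Opera/10.0", datetime(2010, 3, 11, 9, 45, 0)),
]

sessions = reconstruct_sessions(requests)
print([len(session) for session in sessions])
```

Shared IP addresses (proxies, libraries) and changing IP addresses make this grouping unreliable, which is one reason to log sessions at the application level instead.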
9 / 16
Logging Multilingual Information
Stages of the interaction:
• Approaching the system / background information
• Launching queries / browsing
• Viewing results
• Interacting with the results (filter, save, tag, repeat)
Information to log across these stages:
• User background
• Interface language
• Query language
• Query type
• Query content
• Query translation
• Search results
• Result set views
• Result translation
• Query reformulation
• User-generated content
• Saved searches / docs
10 / 16
Logging Multilingual Information – Background
• User background information
  - Country of access, system language, referrer site
• Interface language
  - A change is a stronger intervention by the user
11 / 16
Logging Multilingual Information – Query
• Query language
  - Query processing
  - Adapting languages to system
• Query type
  - Simple, advanced, fielded (e.g. language restriction)
  - Pre-selected categories for browsing
• Query content
  - Named entities, dates, numbers (language-ambiguous)
• Query translation
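Some of the query properties listed above can be derived from the raw query string. The following is a crude sketch, assuming fielded queries use a `field:value` syntax as in Solr/Lucene; the field name `language` and the function `describe_query` are hypothetical. It flags numbers and numeric dates as language-ambiguous terms. Named entities would need a gazetteer or entity recognizer and are not covered, and simple versus advanced search cannot be told apart from the string alone.

```python
import re

# field:value or field:"quoted value" (Solr/Lucene-style syntax)
FIELDED = re.compile(r'\b(?P<field>[A-Za-z_]+):(?P<value>"[^"]*"|\S+)')
# plain numbers and numeric dates such as 1492, 21.09.2010, 2010-09-21
NUMERIC = re.compile(r'\d+([./-]\d+)*')

def describe_query(raw):
    """Return a small description of a raw query string."""
    fields = {
        m.group("field"): m.group("value").strip('"')
        for m in FIELDED.finditer(raw)
    }
    # Free-text terms are what remains after removing the fielded parts.
    terms = FIELDED.sub(" ", raw).split()
    return {
        "type": "fielded" if fields else "simple",
        "fields": fields,
        "terms": terms,
        # Terms that look the same in every language
        "language_ambiguous": [t for t in terms if NUMERIC.fullmatch(t)],
        "language_restricted": "language" in fields,
    }

info = describe_query("italy 1492 language:it")
print(info["type"], info["fields"], info["language_ambiguous"])
```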
12 / 16
Logging Multilingual Information – Results
• Search results
  - Document languages
• Result set views
  - Detailed view, external click: stronger intervention
• Result translation
13 / 16
Logging Multilingual Information – User Activities
• Query reformulation / refinement
  - Language switch
  - Filtering (language), related-item search
• User-generated content
  - Language of tags
  - Language of documents being tagged
• Saved searches / documents
• ???
14 / 16
Europeana ClickStreamLogger
• Interface language
  - State + change for every activity
• Search
  - Result numbers, distribution of results by language / country
  - Filtering and related searches
• Browse
  - Browsing activities + starting points
• Navigation
  - Move outside Europeana
• Ajax
  - Save / remove searches / tags
• User management
  - Account creation etc.
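To illustrate the idea of recording the interface language state and any change with every activity, here is a minimal sketch of an application-level event record. It is not the actual Europeana ClickStreamLogger; the class names, field names, and action labels are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ClickEvent:
    action: str                  # e.g. "search", "language_change", "save_search"
    interface_language: str      # state at the time of the action
    previous_language: str       # state before the action
    query: Optional[str] = None
    result_languages: Counter = field(default_factory=Counter)

    @property
    def language_changed(self) -> bool:
        return self.interface_language != self.previous_language

class SessionLog:
    """Keeps the current interface language and stamps it on every event."""

    def __init__(self, interface_language):
        self.language = interface_language
        self.events = []

    def record(self, action, interface_language=None, query=None,
               result_languages=()):
        previous = self.language
        if interface_language is not None:
            self.language = interface_language
        event = ClickEvent(action, self.language, previous, query,
                           Counter(result_languages))
        self.events.append(event)
        return event

log = SessionLog("en")
search = log.record("search", query="italy",
                    result_languages=["it", "it", "en"])
switch = log.record("language_change", interface_language="it")
saved = log.record("save_search", query="italy")

print(search.result_languages)   # distribution of results by language
print(switch.language_changed)   # interface language change is explicit
```

Logging at this level captures what the web server log cannot: the interface language for each action, Ajax actions such as saving a search, and the language distribution of each result set.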
15 / 16
What happens now…
• Soft roll-outs of new releases (changing the site)
• Analysis of log data
• Interpretation
• Re-iteration of “useful information” categories
• Re-design user interaction?
www.europeana.eu
www.europeanaconnect.eu