Sessions: 11.35 - 13.15 Area 1
Session : P01 - Corpora and Annotation
Chair: Marko Tadić
59 AiTi Aw, Sharifah Mahani Aljunied, Nattadaporn Lertcheva
and Sasiwimon Kalunsima
TaLAPi – A Thai Linguistically Annotated Corpus for Language
Processing
120 Guiyao Ke, Pierre-Francois Marteau and Gildas Menier Variations on quantitative comparability measures and their
evaluations on synthetic French-English comparable corpora
147 Paul Felt, Eric Ringger, Kevin Seppi and Kristian Heal Using Transfer Learning to Assist Exploratory Corpus
Annotation
187 Miguel B. Almeida, Mariana S. C. Almeida, André F. T.
Martins, Helena Figueira, Pedro Mendes and Cláudia Pinto
Priberam Compressive Summarization Corpus: A New
Multi-Document Summarization Corpus for European Portuguese
253 Patrick Schone, Heath Nielson and Mark Ward Corpus and Evaluation of Handwriting Recognition of
Historical Genealogical Records
294 Milena Hnátková, Michal Křen, Pavel Procházka and Hana
Skoumalová
The SYN-series corpora of written Czech
300 Karel Kučera and Martin Stluka Corpus of 19th-century Czech Texts: Problems and Solutions
308 Maik Stührenberg Extending standoff annotation
345 Stefan Höfler and Kyoko Sugisaki Constructing and exploiting an automatically annotated
resource of legislative texts
Session : P02 - Crowdsourcing
Chair: Alain Couillault
25 Yuan Luo, Thomas Boucher, Tolga Oral, David Osofsky and
Sara Weber
A Study on Expert Sourcing Enterprise Question Collection
and Classification
28 Balamurali A.R Can the Crowd be Controlled?: A Case Study on Crowd
Sourcing and Automatic Validation of Completed Tasks based
on User Modeling
94 Mitesh M. Khapra, Ananthakrishnan Ramanathan, Anoop
Kunchukuttan, Karthik Visweswariah and Pushpak
Bhattacharyya
When Transliteration Met Crowdsourcing: An Empirical
Study of Transliteration via Crowdsourcing using Efficient,
Non-redundant and Fair Quality Control
132 Manjira Sinha, Tirthankar Dasgupta and Anupam Basu Design and Development of an Online Computational
Framework to Facilitate Language Comprehension Research
on Indian Languages
319 Martin Benjamin Collaboration in the Production of a Massively Multilingual
Lexicon
363 Marco Marelli, Stefano Menini, Marco Baroni, Luisa
Bentivogli, Raffaella Bernardi and Roberto Zamparelli
A SICK cure for the evaluation of compositional distributional
semantic models
431 Wajdi Zaghouani and Kais Dukes Can Crowdsourcing be used for Effective Annotation of
Arabic?
471 Héctor Martínez Alonso and Lauren Romeo Crowdsourcing as a preprocessing for complex semantic
annotation tasks
564 Christoph Draxler Online experiments with the Percy software framework -
experiences and some early results
641 Ryan Cotterell and Chris Callison-Burch A Multi-Dialect, Multi-Genre Corpus of Informal Written
Arabic
Session : P03 - Dialogue Chair: Dan Cristea
113 Stefan Ultes, Hüseyin Dikme and Wolfgang Minker First Insight into Quality-Adaptive Dialogue
169 Volha Petukhova, Martin Gropp, Dietrich Klakow, Gregor
Eigner, Mario Topf, Stefan Srb, Petr Motlicek, Blaise Potard,
John Dines, Olivier Deroo, Ronny Egeler, Uwe Meinz, Steffen
Liersch and Anna Schmidt
The DBOX Corpus Collection of Spoken Human-Human and
Human-Machine Dialogues
321 Dietmar Rösner, Rafael Friesen, Stephan Günther and Rico
Andrich
Modeling and evaluating dialog success in the LAST MINUTE
corpus
575 Layla El Asri, Rémi Lemonnier, Romain Laroche, Olivier
Pietquin and Hatim Khouzaimi
NASTIA: Negotiating Appointment Setting Interface
576 Layla El Asri, Romain Laroche and Olivier Pietquin DINASTI: Dialogues with a Negotiating Appointment Setting
Interface
959 Thomas Pellegrini, Vahid Hedayati and Angela Costa El-WOZ: a client-server wizard-of-oz interface
Session : P04 - Phonetic Databases and Prosody
Chair: Philippe Martin
119 Claire Brierley, Majdi Sawalha and Eric Atwell Tools for Arabic Natural Language Processing: a case study in
qalqalah prosody
299 Johann-Mattis List and Jelena Prokić A Benchmark Database of Phonetic Alignments in Historical
Linguistics and Dialectology
381 Anne Lacheret, Sylvain Kahane, Julie Beliao, Anne Dister, Kim
Gerdes, Jean-Philippe Goldman, Nicolas Obin, Paola
Pietrandrea and Atanas Tchobanov
Rhapsodie: a Prosodic-Syntactic Treebank for Spoken French
870 Jean-Philippe Goldman, Tea Prsir and Antoine Auchlin C-PhonoGenre: a 7-hours corpus of 7 speaking styles in
French: relations between situational features and prosodic
properties
454 Abir Masmoudi, Mariem Ellouze Khmekhem, Yannick Esteve,
Lamia Hadrich Belguith and Nizar Habash
A Corpus and Phonetic Dictionary for Tunisian Arabic Speech
Recognition
716 Yuichi Ishimoto, Tomoyuki Tsuchiya, Hanae Koiso and
Yasuharu Den
Towards Automatic Transformation between Different
Transcription Conventions: Prediction of Intonation Markers
from Linguistic and Acoustic Features
727 Tiberiu Boroș, Adriana Stan, Oliver Watts and Stefan Daniel
Dumitrescu
RSS-TOBI - A Prosodically Enhanced Romanian Speech Corpus
931 Klim Peshkov and Laurent Prévot Segmentation evaluation metrics, a comparison grounded on
prosodic and discourse units
DAY1 Poster Sessions
1048 Bistra Andreeva, William Barry and Jacques Koreman A Cross-language Corpus for Studying the Phonetics and
Phonology of Prominence
1200 Liviu Dinu, Alina Maria Ciobanu, Ioana Chitoran and Vlad
Niculae
Using a machine learning model to assess the complexity of
stress systems
1212 Tanja Schultz and Tim Schlippe GlobalPhone: Pronunciation Dictionaries in 20 Languages
Session : P05 - Speech Resources
Chair: Martine Adda-Decker TBC
7 Juan Rafael Orozco-Arroyave, Julián David Arias-Londoño,
Jesús Francisco Vargas-Bonilla, María Claudia Gonzalez-Rátiva
and Elmar Nöth
New Spanish speech corpus database for the analysis of
people suffering from Parkinson's disease
32 François Salmon and Félicien Vallet An Effortless Way To Create Large-Scale Datasets For Famous
Speakers
41 Florian Schiel and Thomas Kisler German Alcohol Language Corpus - the Question of Dialect
89 Jetske Klatter, Roeland Van Hout, Henk van den Heuvel,
Paula Fikkert, Anne Baker, Jan De Jong, Frank Wijnen, Eric
Sanders and Paul Trilsbeek
Vulnerability in Acquisition, Language Impairments in Dutch:
Creating a VALID Data Archive
134 Mirjam Ernestus, Lucie Kočková-Amortová and Petr Pollak The Nijmegen Corpus of Casual Czech
182 Carlos Daniel Hernandez Mena and Abel Herrera Camacho CIEMPIESS: A New Open-Sourced Mexican Spanish Radio
Corpus
252 Marie Kopřivová, Hana Goláňová, Petra Klimešová and David
Lukeš
Mapping Diatopic and Diachronic Variation in Spoken Czech:
the Ortofon and Dialekt Corpora
290 Thomas Schmidt The Research and Teaching Corpus of Spoken German – FOLK
312 Niklas Vanhainen and Giampiero Salvi Free Acoustic and Language Models for Large Vocabulary
Continuous Speech Recognition in Swedish
Sessions: 14.45 - 16.25 Area 2
Session : P06 - Endangered Languages
Chair: Laurette Pretorius TBC
143 Kristiina Jokinen Open-domain Interaction and Online Content in the Sami
Language
438 Tjerk Hagemeijer, Michel Généreux, Iris Hendrickx, Amália
Mendes, Abigail Tiny and Armando Zamora
The Gulf of Guinea Creole Corpora
1046 Dagmar Jung, Katarzyna Klessa, Zsuzsa Duray, Beatrix Oszkó,
Mária Sipos, Sándor Szeverényi, Zsuzsa Várnai, Paul Trilsbeek
and Tamás Váradi
Languagesindanger.eu - including multimedia language
resources to disseminate knowledge and create educational
material on less‑resourced languages
1174 José Pedro Ferreira, Cristiano Chesi, Daan Baldewijns,
Fernando Miguel Pinto, Margarita Correia, Daniela Braga,
Hyongsil Cho, Amadeu Ferreira and Miguel Dias
Casa de la Lhéngua: a set of language resources and natural
language processing tools for Mirandese
1216 Christian Curtis A finite-state morphological analyzer for a Lakota HPSG
grammar
Session : P07 - Evaluation Methodologies
Chair: Violeta Seretan
52 Adam Kilgarriff, Pavel Rychlý, Milos Jakubicek, Vojtěch Kovář,
Vit Baisa and Lucia Kocincová
Extrinsic Corpus Evaluation with a Collocation Dictionary Task
289 Nancy Underwood, Bartolomé Mesa-Lao, Mercedes García
Martínez, Michael Carl, Vicent Alabau, Jesús González-Rubio,
Luis A. Leiva, Germán Sanchis-Trilles, Daniel Ortíz-Martínez
and Francisco Casacuberta
Evaluating the effects of interactivity in a post-editing
workbench
320 Bogdan Ludusan, Maarten Versteegh, Aren Jansen,
Guillaume Gravier, Xuan-Nga Cao, Mark Johnson and
Emmanuel Dupoux
Bridging the gap between speech technology and natural
language processing: an evaluation toolbox for term
discovery systems
398 Paula Lopez-Otero, Laura Docio-Fernandez and Carmen
Garcia-Mateo
Introducing a Framework for the Evaluation of Music
Detection Tools
427 Bartosz Broda, Bartłomiej Nitoń, Włodzimierz Gruszczyński
and Maciej Ogrodniczuk
Measuring Readability of Polish Texts: Baseline Experiments
829 Jason Utt, Sylvia Springorum, Maximilian Köper and Sabine
Schulte im Walde
Fuzzy V-Measure - An Evaluation Method for Cluster
Analyses of Ambiguous Data
887 Andrea Horbach, Alexis Palmer and Magdalena Wolska Finding a Tradeoff between Accuracy and Rater's Workload
in Grading Clustered Short Answers
935 Petra Barancikova, Rudolf Rosa and Ales Tamchyna Improving Evaluation of English-Czech MT through
Paraphrasing
1198 Chi-kiu Lo and Dekai Wu On the reliability and inter-annotator agreement of human
semantic MT evaluation via HMEANT
Session : P08 - Language Resource Infrastructures
Chair: Georg Rehm
206 Nelleke Oostdijk and Henk van den Heuvel The evolving infrastructure for language resources and the
role for data scientists
325 Dorte Haltrup Hansen, Lene Offersgaard and Sussi Olsen Using TEI, CMDI and ISOcat in CLARIN-DK
338 Jonathan Chevelu, Gwénolé Lecorvé and Damien Lolive ROOTS: a toolkit for easy, fast and consistent processing of
large sequential annotated data collections
368 Matteo Abrate, Angelo Mario Del Grosso, Emiliano
Giovannetti, Angelica Lo Duca, Damiana Luzzi, Lorenzo
Mancini, Andrea Marchetti, Irene Pedretti and Silvia Piccini
Sharing Cultural Heritage: the Clavius on the Web Project
517 Verena Lyding, Lionel Nicolas and Egon Stemle 'interHist' an interactive visual interface for corpus
exploration
Session : P09 - Machine Translation
Chair: Jan Hajic TBC
21 Chenhui Chu, Toshiaki Nakazawa and Sadao Kurohashi Constructing a Chinese–Japanese Parallel Corpus from
Wikipedia
43 Lise Rebout and Philippe Langlais An Iterative Approach for Mining Parallel Sentences in a
Comparable Corpus
103 Dan Tufiș Large SMT data-sets extracted from Wikipedia
107 Juan Luo and Yves Lepage Production of Phrase Tables in 11 European Languages using
an Improved Sub-sentential Aligner
162 Hiroaki Shimizu, Graham Neubig, Sakriani Sakti, Tomoki Toda
and Satoshi Nakamura
Collection of a Simultaneous Translation Corpus for
Comparative Analysis
205 Sharid Loaiciga, Thomas Meyer and Andrei Popescu-Belis English-French Verb Phrase Alignment in Europarl for Tense
Translation Modeling
610 Bushra Jawaid and Ondrej Bojar Two-Step Machine Translation with Lattices
Session : P10 - Metadata Chair: Victoria Arranz
156 Matej Durco and Menzo Windhouwer The CMD Cloud
332 Fritz Kliche, Andre Blessing, Jonathan Sonntag and Ulrich Heid The e-Identity Exploration Workbench
1022 Damir Cavar and Malgorzata Cavar Visualization of Language Relations and Families: MultiTree
Session : P11 - MultiWord Expressions and Terms
Chair: Valeria Quochi
263 Pierre André Ménard and Caroline Barriere Linked Open Data and Web Corpus Data for noun compound
bracketing
331 Anita Rácz, István Nagy T. and Veronika Vincze 4FX: Light Verb Constructions in a Multilingual Parallel Corpus
462 Wan Yu Ho, Christine Kng, Shan Wang and Francis Bond Identifying Idioms in Chinese Translations
466 Kara Warburton Narrowing the Gap between Termbases and Corpora in
Commercial Environments
518 Rodrigo Boos, Kassius Prestes and Aline Villavicencio Identification of Multiword Expressions in the brWaC
519 Lis Pereira, Elga Strafella and Yuji Matsumoto Collocation or Free Combination? – Applying Machine
Translation Techniques to identify collocations in Japanese
630 Irina Temnikova, Andrea Varga and Dogan Biyikli Building a Crisis Management Term Resource for Social
Media: The Case of Floods and Protests
Session : P12 - Treebanks Chair: Beatrice Daille
18 Riyaz Ahmad Bhat, Shahid Mushtaq Bhat and Dipti Misra
Sharma
Towards building a Kashmiri Treebank: Setting up the
Annotation Pipeline
42 Shinsuke Mori, Hideki Ogura and Tetsuro Sasada A Japanese Word Dependency Corpus
63 Chris Culy, Marco Passarotti and Ulla König-Cardanobile A Compact Interactive Visualization of Dependency Treebank
Query Results
70 Scott Martens and Marco Passarotti Thomas Aquinas in the TüNDRA: Integrating the Index
Thomisticus Treebank into CLARIN-D
225 Blanca Arias, Nuria Bel, Mercè Lorente, Montserrat
Marimón, Alba Milà, Jorge Vivaldi, Muntsa Padró, Marina
Fomicheva and Imanol Larrea
Boosting the creation of a treebank
382 Montserrat Marimon, Núria Bel, Beatriz Fisas, Blanca Arias,
Silvia Vázquez, Jorge Vivaldi, Carlos Morell and Mercè
Lorente
The IULA Spanish LSP Treebank
303 Per Erik Solberg, Arne Skjærholt, Lilja Øvrelid, Kristin Hagen
and Janne Bondi Johannessen
The Norwegian Dependency Treebank
378 Mojgan Seraji, Carina Jahani, Beáta Megyesi and Joakim Nivre A Persian Treebank with Stanford Typed Dependencies
441 Masood Ghayoomi and Jonas Kuhn Converting an HPSG-based Treebank into its Parallel
Dependency-based Treebank
Sessions: 16.45 - 18.05 Area 1
Session : P13 - Discourse Annotation, Representation and Processing
Chair: Ann Bies
77 Kasia Budzynska, Mathilde Janier, Chris Reed, Patrick Saint-Dizier,
Manfred Stede and Olena Yakorska
A Model for Processing Illocutionary Structures and
Argumentation in Debates
579 Manfred Stede and Arne Neumann Potsdam Commentary Corpus 2.0: Annotation for Discourse
Research
79 Magdalena Rysova Verbs of Saying with a Textual Connecting Function in the
Prague Discourse Treebank
155 Ryu Iida and Takenobu Tokunaga Building a Corpus of Manually Revised Texts from Discourse
Perspective
270 Lanjun Zhou, Binyang Li, Zhongyu Wei and Kam-Fai Wong The CUHK Discourse TreeBank for Chinese: Annotating
Explicit Discourse Connectives for the Chinese TreeBank
280 Thomas Bögel, Jannik Strötgen and Michael Gertz Computational Narratology: Extracting Tense Clusters from
Narrative Texts
330 Susana Bautista and Horacio Saggion Can Numerical Expressions Be Simpler? Implementation and
Demonstration of a Numerical Simplification System for
Spanish
400 Cristina Grisot and Thomas Meyer Cross-linguistic annotation of narrativity for English/French
verb tense disambiguation
Session : P14 - Grammar and Syntax
Chair: Cristina Bosco
47 Richard Sproat, Bruno Cartoni, HyunJeong Choe, David
Huynh, Linne Ha, Ravindran Rajakumar and Evelyn Wenzel-Grondie
A Database for Measuring Linguistic Information Content
50 Katerina Rysova and Jiří Mírovský Valency and Word Order in Czech – A Corpus Probe
211 Ludger Zeevaert Mörkum Njálu. An annotated corpus to analyse and explain
grammatical divergences between 14th-century manuscripts
of Njál's saga
346 Roman Schneider GenitivDB – a Corpus-Generated Database for German
Genitive Classification
361 Tibor Kiss, Francis Jeffry Pelletier and Tobias Stadtfeld Building a reference lexicon for countability in English
Session : P15 - Lexicons Chair: Amália Mendes
34 Ismail El Maarouf, Jane Bradbury, Vít Baisa and Patrick Hanks Disambiguating Verbs by Collocation: Corpus Lexicography
meets Natural Language Processing
58 Nabil Hathout, Franck Sajous and Basilio Calderone GLÀFF, a Large Versatile French Lexicon
102 John Richardson, Toshiaki Nakazawa and Sadao Kurohashi Bilingual Dictionary Construction with Transliteration Filtering
127 Krasimir Angelov Bootstrapping Open-Source English-Bulgarian Computational
Dictionary
128 Mathieu Mangeot MotàMot project: conversion of a French-Khmer published
dictionary for building a multilingual lexical system
154 Menzo Windhouwer, Justin Petro and Shakila Shayan RELISH LMF: Unlocking the Full Power of the Lexical Markup
Framework
175 Liviu Dinu and Alina Maria Ciobanu Building a Dataset of Multilingual Cognates for the Romanian
Lexicon
222 Palmira Marrafa, Raquel Amaro and Sara Mendes LexTec – a rich language resource for technical domains in
Portuguese
Session : P16 - Morphology Chair: Benoît Sagot
2 Fadoua Ataa Allah and Siham Boulaknadel Amazigh Verb Conjugator
66 Menno van Zaanen, Gerhard Van Huyssteen, Suzanne
Aussems, Chris Emmery and Roald Eiselen
The Development of Dutch and Afrikaans Language
Resources for Compound Boundary Analysis
116 Rico Sennrich and Beat Kunz Zmorge: A German Morphological Lexicon Extracted from
Wiktionary
207 Attila Novák A New Form of Humor – Mapping Constraint-Based
Computational Morphologies to a Finite-State Representation
262 Veronika Vincze, Viktor Varga, Katalin Ilona Simkó, János
Zsibrita, Ágoston Nagy, Richárd Farkas and János Csirik
Szeged Corpus 2.5: Morphological Modifications in a
Manually POS-tagged Hungarian Corpus
437 Çağrı Çöltekin A set of open source tools for Turkish natural language
processing
501 Magda Sevcikova and Zdenek Zabokrtsky Word-Formation Network for Czech
593 Arfath Pasha, Mohamed Al-Badrashiny, Mona Diab, Ahmed
El Kholy, Ramy Eskander, Nizar Habash, Manoj Pooleery,
Owen Rambow and Ryan Roth
MADAMIRA: A Fast, Comprehensive Tool for Morphological
Analysis and Disambiguation of Arabic
607 Yvonne Adesam, Malin Ahlberg, Peter Andersson, Gerlof
Bouma, Markus Forsberg and Mans Hulden
Computer-aided morphology expansion for Old Swedish
768 Marcin Woliński Morfeusz Reloaded
Session : P17 - WordNet Chair: Francis Bond
121 Antoni Oliver and Salvador Climent Automatic creation of WordNets from parallel corpora
122 Spandana Gella, Carlo Strapparava and Vivi Nastase Mapping WordNet Domains, WordNet Topics and Wikipedia
Categories to Generate Multilingual Domain Specific
Resources
203 Quentin Pradet, Laurence Danlos and Gaël de Chalendar Adapting VerbNet to French using existing resources
541 Gianluca Lebani, Veronica Viola and Alessandro Lenci Bootstrapping an Italian VerbNet: data-driven analysis of
verb alternations
582 Ahti Lohk, Kaarel Allik, Heili Orav and Leo Võhandu Dense Components in the Structure of WordNet
1071 Yuri Bizzoni, Federico Boschetti, Harry Diakoff, Riccardo Del
Gratta, Monica Monachini and Gregory Crane
The Making of Ancient Greek WordNet
1083 Gerard de Melo Etymological Wordnet: Tracing The History of Words
Sessions: 18.10 - 19.30 Area 2
Session : P18 - Corpora and Annotation
Chair: Steve Cassidy TBC
199 Angela Costa, Tiago Luís and Luísa Coheur Translation errors from English to Portuguese: an annotated
corpus
360 Verginica Barbu Mititelu, Elena Irimia and Dan Tufiș CoRoLa – The Reference Corpus of Contemporary Romanian
Language
523 Houda Bouamor, Nizar Habash and Kemal Oflazer A Multidialectal Parallel Corpus of Arabic
558 Ahmed Salama, Houda Bouamor, Behrang Mohit and Kemal
Oflazer
YouDACC: the Youtube Dialectal Arabic Comment Corpus
529 Miquel Esplà-Gomis, Filip Klubička, Nikola Ljubešić, Sergio
Ortiz-Rojas, Vassilis Papavassiliou and Prokopis Prokopidis
Comparing two acquisition systems for automatically
building an English–Croatian parallel corpus from
multilingual websites
530 Siim Orasmaa Towards an Integration of Syntactic and Temporal
Annotations in Estonian
552 Louise Deleger, Anne-Laure Ligozat, Cyril Grouin, Pierre
Zweigenbaum and Aurelie Neveol
Annotation of specialized corpora using a comprehensive
entity and relation scheme
594 Ritesh Kumar Developing Politeness Annotated Corpus of Hindi Blogs
606 Adriane Boyd, Jirka Hana, Lionel Nicolas, Detmar Meurers,
Katrin Wisniewski, Andrea Abel, Karin Schöne, Barbora
Štindlová and Chiara Vettori
The MERLIN corpus: Learner language and the CEFR
612 Luz Rello, Ricardo Baeza-Yates and Joaquim Llisterri DysList: An Annotated Resource of Dyslexic Errors
624 Jena D. Hwang, Annie Zaenen and Martha Palmer Criteria for Identifying and Annotating Caused Motion
Constructions in Corpus Data
914 Ann Irvine, Joshua Langfus and Chris Callison-Burch The American Local News Corpus
Session : P19 - Document Classification, Text Categorisation
Chair: Damir Cavar
8 Mohamed Morchid, Richard Dufour and Georges Linares A LDA-based Topic Classification Approach from Highly
Imperfect Automatic Transcriptions
104 Juan Soler and Leo Wanner How to Use less Features and Reach Better Performance in
Author Gender Identification
195 Lucie Poláková, Pavlína Jínová and Jiří Mírovský Genres in the Prague Discourse Treebank
291 Stefania Degaetano-Ortlieb, Peter Fankhauser, Hannah
Kermes, Ekaterina Lapshinova-Koltunski, Noam Ordan and
Elke Teich
Data Mining with Shallow vs. Linguistic Features to Study
Diversification of Scientific Registers
402 Mahmoud El-Haj, Paul Rayson, Steve Young and Martin
Walker
Detecting Document Structure in a Very Large Corpus of UK
Financial Reports
470 Noushin Rezapour Asheghi, Serge Sharoff and Katja Markert Designing and Evaluating a Reliable Corpus of Web Genres
via Crowd-Sourcing
498 Ioannis Korkontzelos and Sophia Ananiadou Locating Requests among Open Source Software
Communication Messages
1007 Thamar Solorio, Ragib Hasan and Mainul Mizan Sockpuppet Detection in Wikipedia: A Corpus of Real-World
Deceptive Writing for Linking Identities
Session : P20 - FrameNet Chair: Alessandro Lenci
254 Ildikó Pilán and Elena Volodina Reusing Swedish FrameNet for training semantic roles
455 Marie-Claude L'Homme, Benoît Robichaud and Carlos
Subirats Rüggeberg
Discovering frames in specialized domains
496 Marie Candito, Pascal Amsili, Lucie Barque, Farah Benamara,
Gaël de Chalendar, Marianne Djemaa, Pauline Haas, Richard
Huyghe, Yvette Yannick Mathieu, Philippe Muller, Benoît
Sagot and Laure Vieu
Developing a French FrameNet: Methodology and First
results
Session : P21 - Semantics Chair: Peter Anick TBC
221 Reinhard Rapp Corpus-Based Computation of Reverse Associations
242 Haritz Salaberri, Olatz Arregi and Beñat Zapirain First approach toward Semantic Role Labeling for Basque
267 Tomoko Izumi, Tomohide Shibata, Hisako Asano, Yoshihiro
Matsuo and Sadao Kurohashi
Constructing a Corpus of Japanese Predicate Phrases for
Synonym/Antonym Relations
274 Martin Riedl, Richard Steuer and Chris Biemann Distributed Distributional Similarities of Google Books over
the Centuries
545 Kostadin Cholakov, Chris Biemann, Judith Eckle-Kohler and
Iryna Gurevych
Lexical Substitution Dataset for German
353 Nianwen Xue and Yuchen Zhang Buy one get one free: Distant annotation of Chinese tense,
event type and modality
403 Dan Stefanescu, Rajendra Banjade and Vasile Rus Latent Semantic Analysis Models on Wikipedia and TASA
461 Yuka Tateisi, Yo Shidahara, Yusuke Miyao and Akiko Aizawa Annotation of Computer Science Papers for Semantic
Relation Extraction
574 Moritz Wittmann, Marion Weller and Sabine Schulte im
Walde
Automatic Extraction of Synonyms for German Particle Verbs
from Parallel Data with Distributional Similarity as a
Re-Ranking Feature
233 Gregor Titze, Volha Bryl, Cäcilia Zirn and Simone Paolo
Ponzetto
DBpedia Domains: augmenting DBpedia with domain
information
750 Elena Cabrio, Serena Villata and Fabien Gandon Classifying Inconsistencies in DBpedia Language Specific
Chapters
Session : P22 - Speech Resources
Chair: Giuseppe Riccardi TBC
171 Thomas Schmidt The Database for Spoken German – DGD2
365 Annika Hämäläinen, Jairo Avelar, Silvia Rodrigues, Miguel
Sales Dias, Artur Kolesiński, Tibor Fegyó, Géza Németh, Petra
Csobánka, Karine Lan and David Hewson
The EASR Corpora of European Portuguese, French,
Hungarian and Polish Elderly Speech
394 Barbara Schuppler, Martin Hagmueller, Juan A. Morales-Cordovilla
and Hannes Pessentheiner
GRASS: the Graz corpus of Read And Spontaneous Speech
432 Hanae Koiso, Yasuharu Den, Ken'ya Nishikawa and Kikuo
Maekawa
Design and development of an RDB version of the Corpus of
Spontaneous Japanese
484 Camille Fauth, Anne Bonneau, Frank Zimmerer, Juergen
Trouvain, Bistra Andreeva, Vincent Colotte, Dominique Fohr,
Denis Jouvet, Jeanin Jügler, Yves Laprie, Odile Mella and
Bernd Möbius
Designing a Bilingual Speech Corpus for French and German
Language Learners: a Two-Step Process
511 Rosemary Orr, Marijn Huijbregts, Roeland van Beek, Lisa
Teunissen, Kate Backhouse and David van Leeuwen
Semi-automatic annotation of the UCU accents speech
corpus
514 Ana Lúcia Santos, Michel Généreux, Aida Cardoso, Celina
Agostinho and Silvana Abalada
A corpus of European Portuguese child and child-directed
speech
537 Anna Polychroniou, Hugues Salamin and Alessandro
Vinciarelli
The SSPNet-Mobile Corpus: Social Signal Processing Over
Mobile Phones
553 Katarzyna Klessa and Dafydd Gibbon Annotation Pro + TGA: automation of speech timing analysis
611 Björn Schuller, Felix Friedmann and Florian Eyben The Munich Biovoice Corpus: Effects of Physical Exercising,
Heart Rate, and Skin Conductance on Human Speech
Production
Sessions: 9.45 - 11.25 Area 1
Session : P23 - Collaborative Resource Construction
Chair: Christian Chiarcos TBC
14 Włodzimierz Gruszczyński and Maciej Ogrodniczuk Digital Library 2.0: Source of Knowledge and Research
Collaboration Platform
95 Livio Robaldo, Guido Boella, Luigi Di Caro and Andrea Violato Exploiting networks in Law
151 Alex Rudnick, Taylor Skidmore, Alberto Samaniego and
Michael Gasser
Guampa: a Toolkit for Collaborative Translation
758 Billy T.M. Wong, Ian C. Chow, Jonathan J. Webster and
Hengbin Yan
The Halliday Centre Tagger: An Online Platform for
Semi-automatic Text Annotation and Analysis
769 Mauro Dragoni, Alessio Bosca, Matteo Casu and Andi Rexha Modeling, Managing, Exposing, and Linking Ontologies with a
Wiki-based Tool
817 Mathieu Lafourcade and Karën Fort Propa-L: a semantic filtering service from a lexical network
created using Games With A Purpose
940 Frederik Baumgardt, Giuseppe Celano, Gregory R. Crane,
Stella Dee, Maryam Foradi, Emily Franzini, Greta Franzini,
Monica Lent, Maria Moritz and Simona Stoyanova
Open Philology at the University of Leipzig
975 Joshua Elliot, Logan Kearsley, Jason Housley and Alan Melby LexTerm Manager: Design for an Integrated Lexicography
and Terminology System
1016 Jonathan Wright RESTful Annotation and Efficient Collaboration
Session : P24 - Corpora and Annotation
Chair: Maria Gavrilidou
1094 Zhiyi Song, Stephanie Strassel, Haejoong Lee, Kevin Walker,
Jonathan Wright, Jennifer Garland, Dana Fore, Brian Gainor,
Preston Cabe, Thomas Thomas, Brendan Callahan and Ann
Sawyer
Collecting Natural SMS and Chat Conversations in Multiple
Languages: The BOLT Phase 2 Corpus
656 Daniel Hladek, Jan Stas and Jozef Juhar The Slovak Categorized News Corpus
680 Matus Pleva and Jozef Juhar TUKE-BNews-SK: Slovak Broadcast News Corpus Construction
and Evaluation
675 Irina Temnikova, William A. Baumgartner Jr., Negacy D. Hailu,
Ivelina Nikolova, Tony McEnery, Adam Kilgarriff, Galia
Angelova and K. Bretonnel Cohen
Sublanguage Corpus Analysis Toolkit: A tool for assessing the
representativeness and sublanguage characteristics of
corpora
681 Csaba Oravecz, Tamás Váradi and Bálint Sass The Hungarian Gigaword Corpus
690 Željko Agić and Nikola Ljubešić The SETimes.HR Linguistically Annotated Corpus of Croatian
841 Nikola Ljubešić and Antonio Toral caWaC -- A web corpus of Catalan and its application to
language modeling and machine translation
691 Jerid Francom, Mans Hulden and Adam Ussishkin ACTIV-ES: a comparable, cross-dialect corpus of ‘everyday’
Spanish from Argentina, Mexico, and Spain
714 Vidas Daudaravicius Language Editing Dataset of Academic Texts
777 Suguru Matsuyoshi, Ryo Otsuki and Fumiyo Fukumoto Annotating the Focus of Negation in Japanese Text
1019 Siddharth Jain, Archna Bhatia, Angelique Rein and Eduard
Hovy
A Corpus of Participant Roles in Contentious Discussions
Session : P25 - Machine Translation
Chair: Hitoshi Isahara TBC
210 Michael Carl, Mercedes Martínez García and Bartolomé
Mesa-Lao
CFT13: A resource for research into the post-editing process
384 Nianwen Xue, Ondrej Bojar, Jan Hajic, Martha Palmer,
Zdenka Uresova and Xiuhong Zhang
Not an Interlingua, But Close: Comparison of English AMRs to
Chinese and Czech
390 Miriam Kaeshammer and Anika Westburg On Complex Word Alignment Configurations
414 Anoop Kunchukuttan, Abhijit Mishra, Rajen Chatterjee,
Ritesh Shah and Pushpak Bhattacharyya
Shata-Anuvadak: Tackling Multiway Translation of Indian
Languages
473 Marco Turchi and Matteo Negri Automatic Annotation of Machine Translation Datasets with
Binary Quality Judgements
676 Violeta Seretan, Pierrette Bouillon and Johanna Gerlach A Large-Scale Evaluation of Pre-editing Strategies for
Improving User-Generated Content Translation
735 Nicolas Pécheux, Alexander Allauzen and François Yvon Rule-based Reordering Space in Statistical Machine
Translation
682 Kunal Sachdeva, Rishabh Srivastava, Sambhav Jain and Dipti
Sharma
Hindi to English Machine Translation: Using Effective
Selection in Multi-Model SMT
Session : P26 - Parallel Corpora
Chair: Dan Tufiș
1137 Jayendra Rakesh Yeka, Prasanth Kolachina and Dipti Misra
Sharma
Benchmarking of English-Hindi parallel corpora
328 Mircea Petic and Daniela Gîfu Transliteration and alignment of parallel texts from Cyrillic to
Latin
674 Manuela Sanguinetti, Cristina Bosco and Loredana Cupi Exploiting catenae in a parallel treebank alignment
772 Yves Scherrer, Luka Nerima, Lorenza Russo, Maria Ivanova
and Eric Wehrli
SwissAdmin: A multilingual tagged parallel corpus of press
releases
774 Liang Tian, Derek F. Wong, Lidia S. Chao, Paulo Quaresma,
Francisco Oliveira and Lu Yi
UM-Corpus: A Large English-Chinese Parallel Corpus for
Statistical Machine Translation
807 Raphael Rubino, Antonio Toral, Nikola Ljubešić and Gema
Ramírez-Sánchez
Quality Estimation for Synthetic Parallel Data Generation
DAY2 Poster Sessions
846 Raivis Skadiņš, Jörg Tiedemann, Roberts Rozis and Daiga
Deksne
Billions of Parallel Words for Free: Building and Using the EU
Bookshop Corpus
877 Ahmed Abdelali, Francisco Guzman, Hassan Sajjad and
Stephan Vogel
The AMARA Corpus: Building parallel language resources for
the educational domain
1159 Ann Bies, Justin Mott, Seth Kulick, Jennifer Garland and Colin
Warner
Incorporating Alternate Translations into English Translation
Treebank
1199 Shikun Zhang, Wang Ling and Chris Dyer Dual Subtitles as Parallel Corpora
285 Pavel Vondřička Aligning parallel texts with InterText
Session : P27 - Sign Language
Chair: Thomas Hanke
6 Rosalee Wolfe, John McDonald, Larwan Berke and Marie
Stumbo
Expanding n-gram analytics in ELAN and a case study for sign
synthesis
209 Matti Karppa, Ville Viitaniemi, Marcos Luzardo, Jorma
Laaksonen and Tommi Jantunen
SLMotion - An extensible sign language oriented video
analysis tool
440 Ville Viitaniemi, Tommi Jantunen, Leena Savolainen, Matti
Karppa and Jorma Laaksonen
S-pot - a benchmark in spotting signs within continuous
signing
278 Mayumi Bono, Kouhei Kikuchi, Paul Cibulka and Yutaka Osugi A Colloquial Corpus of Japanese Sign Language: Linguistic
Resources for Observing Sign Language Conversations
371 Leah Geer and Jonathan Keane Exploring factors that contribute to successful fingerspelling
comprehension
585 Jens Forster, Christoph Schmidt, Oscar Koller, Martin
Bellgardt and Hermann Ney
Extensions of the Sign Language Recognition and Translation
Corpus RWTH-PHOENIX-Weather
634 Julie Hochgesang The Use of a FileMaker Pro Database in Evaluating Sign
Language Notation Systems
1138 Mark Dilsizian, Polina Yanovich, Shu Wang, Carol Neidle and
Dimitris Metaxas
A New Framework for Sign Language Recognition based on
3D Handshape Identification and Linguistic Modeling
Sessions: 11.45 - 13.25 Area 2
Session : P28 - Information Extraction
Chair: Diana Maynard
3 Xavier Tannier Extracting News Web Page Creation Time with DCTFinder
190 Hans-Ulrich Krieger, Christian Spurk, Hans Uszkoreit, Feiyu
Xu, Yi Zhang, Frank Müller and Thomas Tolxdorff
Information Extraction from German Patient Records via
Hybrid Parsing and Relation Extraction Strategies
449 Júlia Pajzs, Ralf Steinberger, Maud Ehrmann, Mohamed
Ebrahim, Leonida Della Rocca, Stefano Bucci, Eszter Simon
and Tamás Váradi
Media monitoring and information extraction for the highly
inflected agglutinative language Hungarian
536 Antje Schlaf, Claudia Bobach and Matthias Irmer Creating a Gold Standard Corpus for the Extraction of
Chemistry-Disease Relations from Patent Texts
590 Felice Dell'Orletta, Giulia Venturi, Andrea Cimino and
Simonetta Montemagni
T2K^2: a System for Automatically Extracting and Organizing
Knowledge from Texts
764 Johannes Kirschnick, Alan Akbik and Holmer Hemsen Freepal: A Large Collection of Deep Lexico-Syntactic Patterns
for Relation Extraction
791 Marc Poch, Núria Bel, Sergio Espeja and Felipe Navio Ranking Job Offers for Candidates: learning hidden
knowledge from Big Data
913 Paul Buitelaar, Georgeta Bordea and Barry Coughlan Hot Topics and Schisms in NLP: Community and Trend
Analysis with Saffron on ACL and LREC Proceedings
1009 Andre Blessing and Jonas Kuhn Textual Emigration Analysis (TEA)
Session : P29 - Lexicons Chair: Nianwen Xue
4 Tristan Miller and Iryna Gurevych WordNet–Wikipedia–Wiktionary: Construction of a Three-
way Alignment
248 Lei Zhang, Michael Färber and Achim Rettinger xLiD-Lexica: Cross-lingual Linked Data Lexica
316 Begum Erten, Cem Bozsahin and Deniz Zeyrek Turkish Resources for Visual Word Recognition
339 Martin Jansche Computer-Aided Quality Assurance of an Icelandic
Pronunciation Dictionary
397 Lars Borin, Jens Allwood and Gerard de Melo Bring vs. MTRoget: Evaluating automatic thesaurus
translation
417 Wushouer Mairidan, Toru Ishida, Donghui Lin and Katsutoshi
Hirayama
Bilingual Dictionary Induction as an Optimization Problem
563 Tommaso Caselli, Laure Vieu, Carlo Strapparava and Guido
Vetere
Enriching the "Senso Comune" Platform with Automatically
Acquired Data
588 Sameh Alansary MUHIT: A Multilingual Harmonized Dictionary
604 Aurelie Neveol, Julien Grosjean, Stéfan Darmoni and Pierre
Zweigenbaum
Language Resources for French in the Biomedical Domain
1021 Pyry Takala, Pekka Malo, Ankur Sinha and Oskar Ahlgren Gold-standard for Topic-specific Sentiment Analysis of
Economic Texts
Session : P30 - Large
Projects and Infrastructural
Issues
Chair: Yohei Murakami
31 Peter Spyns and Remco van Veenendaal A decade of HLT Agency activities in the Low Countries: from
resource maintenance (BLARK) to service offerings (BLAISE)
410 Koenraad De Smedt, Erhard Hinrichs, Detmar Meurers,
Inguna Skadina, Bolette Pedersen, Costanza Navarretta,
Núria Bel, Krister Linden, Marketa Lopatkova, Jan Hajic, Gisle
Andersen and Przemyslaw Lenkiewicz
CLARA: A New Generation of Researchers in Common
Language Resources and Their Applications
814 Lina Henriksen, Dorte Haltrup Hansen, Bente Maegaard,
Bolette Sandford Pedersen and Claus Povlsen
Encompassing a spectrum of LT users in the CLARIN-DK
Infrastructure
452 Maarten Truyens and Patrick Van Eecke Legal aspects of text mining
459 Jan Odijk CLARIN-NL: Major results
795 Auður Hauksdóttir An Innovative World Language Centre : Challenges for the
Use of Language Technology
945 Joseph Mariani, Christopher Cieri, Gil Francopoulo, Patrick
Paroubek and Marine Delaborde
Facing the Identification Problem in Language-Related
Scientific Data Analysis
983 Frank Landsbergen, Carole Tiberius and Roderik Dernison Taalportaal: an online grammar of Dutch and Frisian
Session : P31 - Opinion
Mining and Reviews Analysis
Chair: Manfred Stede
TBC
85 Roman Klinger and Philipp Cimiano The USAGE review corpus for fine grained multi lingual
opinion analysis
258 Christian Haenig, Andreas Niekler and Carsten Wuensch PACE Corpus: a multilingual corpus of Polarity-annotated
textual data from the domains Automotive and CEllphone
293 Patrik Lambert and Carlos Rodriguez-Penagos Adapting Freely Available Resources to Build an Opinion
Mining Pipeline in Portuguese
350 Roser Saurí, Judith Domingo and Toni Badia The NewSoMe Corpus: A Unifying Opinion Annotation
Framework across Genres and in Multiple Languages
356 André Bittar, Luca Dini, Sigrid Maurel and Mathieu Ruhlmann The Dangerous Myth of the Star System
1001 Wiltrud Kessler and Jonas Kuhn A Corpus of Comparisons in Product Reviews
Session : P32 - Social Media
Processing
Chair: Fei Xia
1116 Clare Voss, Stephen Tratz, Jamal Laoudi and Douglas Briesch Finding Romanized Arabic Dialect in Code-Mixed Tweets
53 Fabrizio Gotti, Philippe Langlais and Atefeh Farzindar Hashtag Occurrences, Layout and Translation: A Corpus-
driven Analysis of Tweets Published by the Canadian
Government
83 Guoyu Tang, Yunqing Xia, Weizhi Wang, Raymond Lau and
Fang Zheng
Clustering tweets using Wikipedia concepts
317 Eshrag Refaee and Verena Rieser An Arabic Twitter Corpus for Subjectivity and Sentiment
Analysis
442 Iñaki Alegria, Nora Aranberri, Pere Comas, Victor Fresno,
Pablo Gamallo, Lluís Padró, Iñaki San Vicente, Jordi Turmo
and Arkaitz Zubiaga
TweetNorm_es: an annotated corpus for Spanish microtext
normalization
834 Nikola Ljubešić, Darja Fišer and Tomaž Erjavec TweetCaT: a tool for building Twitter corpora of smaller
languages
1146 Tatjana Scheffler A German Twitter Snapshot
Session : P33 - Treebanks Chair: Montserrat
Marimón
444 Elżbieta Hajnicz The Procedure of Lexico-Semantic Annotation of Składnica
Treebank
494 Marie Candito, Guy Perrier, Bruno Guillaume, Corentin
Ribeyre, Karën Fort, Djamé Seddah and Eric de la Clergerie
Deep Syntax Annotation of the Sequoia French Treebank
538 Alina Wróblewska and Adam Przepiórkowski Projection-based Annotation of a Polish Dependency
Treebank
694 Željko Agić, Daša Berović, Danijela Merkler and Marko Tadić Croatian Dependency Treebank 2.0: New Annotation
Guidelines for Improved Parsing
766 Rachel Bawden, Marie-Amélie Botalla, Kim Gerdes and
Sylvain Kahane
Correcting and Validating Syntactic Dependency in the
Spoken French Treebank Rhapsodie
860 Kilian A. Foth, Arne Köhn, Niels Beuck and Wolfgang Menzel Because Size Does Matter: The Hamburg Dependency
Treebank
915 Rudolf Rosa, Jan Mašek, David Mareček, Martin Popel, Daniel
Zeman and Zdeněk Žabokrtský
HamleDT 2.0: Thirty Dependency Treebanks Stanfordized
995 Munshi Asadullah, Patrick Paroubek and Anne Vilnat Bidirectionnal converter between syntactic annotations :
from French Treebank Dependencies to PASSAGE
annotations, and back
1145 Mohamed Maamouri, Ann Bies, Seth Kulick, Michael Ciul,
Nizar Habash and Ramy Eskander
Developing an Egyptian Arabic Treebank: Impact of Dialectal
Morphology on Annotation and Tool Development
Sessions: 14.55 - 16.35 Area 1
Session : P34 - Corpora and
Annotation
Chair: Zygmunt Vetulani
TBC
219 Inès Zribi, Rahma Boujelbane, Abir Masmoudi, Mariem
Ellouze, Lamia Belguith and Nizar Habash
A Conventional Orthography for Tunisian Arabic
956 Wajdi Zaghouani, Behrang Mohit, Nizar Habash, Ossama
Obeid, Nadi Tomeh, Alla Rozovskaya, Noura Farra, Sarah
Alkuhlani and Kemal Oflazer
Large Scale Arabic Error Annotation: Guidelines and
Framework
763 Shinsuke Mori, Hirokuni Maeta, Yoko Yamakata and Tetsuro
Sasada
Flow Graph Corpus from Recipe Texts
842 Marc Kupietz and Harald Lüngen Recent Developments in DeReKo
843 Shu-Kai Hsieh Why Chinese Web-as-Corpus is Wacky? Or: How Big Data is
Killing Chinese Corpus Linguistics
849 Jannik Strötgen, Thomas Bögel, Julian Zell, Ayser Armiti, Tran
Van Canh and Michael Gertz
Extending HeidelTime for Temporal Expressions Referring to
Historic Dates
852 Thomas Eckart, Erla Hallsteinsdóttir, Sigrún Helgadóttir, Uwe
Quasthoff and Dirk Goldhahn
A 500 Million Word POS-Tagged Icelandic Corpus
916 Shan Wang and Francis Bond Building The Sense-Tagged Multilingual Parallel Corpus
922 Anik Dey and Pascale Fung A Hindi-English Code-Switching Corpus
934 Andrea Abel, Aivars Glaznieks, Lionel Nicolas and Egon Stemle KoKo: an L1 Learner Corpus for German
1000 Vasile Rus, Rajendra Banjade and Mihai Lintean On Paraphrase Identification Corpora
1070 Anne Garcia-Fernandez, Anne-Laure Ligozat and Anne Vilnat Construction and Annotation of a French Folkstale Corpus
1226 Shyam Sundar Agrawal, Abhimanue, Shweta Bansal and
Minakshi Mahajan
Statistical Analysis of Multilingual Text Corpus and
Development of Language Models
1037 Vanessa Loza, Shibamouli Lahiri, Rada Mihalcea and Po-
Hsiang Lai
Building a Dataset for Summarization and Keyword
Extraction from Emails
Session : P35 - Grammar
and Syntax
Chair: Tamás Váradi
639 Emily M. Bender Language CoLLAGE: Grammatical Description with the LinGO
Grammar Matrix
773 Anna Vernerová, Václava Kettnerová and Marketa Lopatkova To Pay or to Get Paid: Enriching a Valency Lexicon with
Diatheses
1060 Georgios Petasis The Ellogon Pattern Engine: Context-free Grammars over
Annotations
1079 Dana Dannells and Normunds Gruzitis Extracting a bilingual semantic grammar from FrameNet-
annotated corpora
1149 Kyoko Ohara Relating Frames and Constructions in Japanese FrameNet
1179 Lars Hellan, Dorothee Beermann, Tore Bruland, Mary Esther
Kropp Dakubu and Montserrat Marimon
MultiVal - towards a multilingual valence lexicon
1214 Emanuele Di Buccio, Giorgio Maria Di Nunzio and Gianmaria
Silvello
A Vector Space Model for Syntactic Distances Between
Dialects
349 Jana Sindlerova, Zdenka Uresova and Eva Fucikova Resources in Conflict: A Bilingual Valency Lexicon vs. a
Bilingual Treebank vs. a Linguistic Theory
Session : P36 - Metaphors Chair: Walter Daelemans
TBC
241 Samira Shaikh, Tomek Strzalkowski, Ting Liu, George Aaron
Broadwell, Boris Yamrom, Sarah Taylor, Laurie Feldman, Kit
Cho, Umit Boz, Ignacio Cases, Yuliya Peshkova and Ching-
Sheng Lin
A Multi-Cultural Repository of Automatically Discovered
Linguistic and Conceptual Metaphors
419 Brian MacWhinney and Davida Fromm Two Approaches to Metaphor Detection
737 Andrew Gargett and John Barnden Mining Online Discussion Forums for Metaphors
Session : P37 - Named
Entity Recognition
Chair: German Rigau
186 Kareem Darwish and Wei Gao Simple Effective Microblog Named Entity Recognition: Arabic
as an Example
236 Cyril Grouin Biomedical entity extraction using machine-learning based
approaches
276 Darina Benikova, Chris Biemann and Marc Reznicek NoSta-D Named Entity Annotation for German: Guidelines
and Dataset
358 Haibo Li, Masato Hagiwara, Qi Li and Heng Ji Comparison of the Impact of Word Segmentation on Name
Tagging for Chinese and Japanese
391 Dimitrios Kokkinakis, Jyrki Niemi, Sam Hardwick, Krister
Lindén and Lars Borin
HFST-SweNER – A New NER Resource for Swedish
421 Hege Fromreide, Dirk Hovy and Anders Søgaard Crowdsourcing and annotating NER for Twitter #drift
468 Guillaume Jacquet, Maud Ehrmann and Ralf Steinberger Clustering of Multi-Word Named Entity variants: Multilingual
Evaluation
513 Daniela Amaral, Evandro Fonseca, Lucelene Lopes and
Renata Vieira
Comparative Analysis of Portuguese Named Entities
Recognition Tools
549 Cédric Lopez, Frédérique Segond, Olivier Hondermarck,
Paolo Curtoni and Luca Dini
Generating a Resource for Products and Brandnames
Recognition. Application to the Cosmetic Domain
688 Younggyun Hahm, Jungyeul Park, Kyungtae Lim, Youngsik
Kim, Dosam Hwang and Key-Sun Choi
Named Entity Corpus Construction using Wikipedia and
DBpedia Ontology
865 Andrea Glaser and Jonas Kuhn Exploring the utility of coreference chains for improved
identification of personal names
967 Joachim Bingel and Thomas Haider Named Entity Tagging a Very Large Unbalanced Corpus:
Training and Evaluating NE Classifiers
Session : P38 - Question
Answering
Chair: António Branco
12 Peter Exner and Pierre Nugues REFRACTIVE: An Open Source Tool to Extract Knowledge
from Syntactic and Semantic Relations
74 Akira Fujita, Akihiro Kameda, Ai Kawazoe and Yusuke Miyao Overview of Todai Robot Project and Evaluation Framework
of its NLP-based Problem Solving
124 Kirk Roberts, Kate Masterton, Marcelo Fiszman, Halil Kilicoglu
and Dina Demner-Fushman
Annotating Question Decomposition on Complex Medical
Questions
130 Sérgio Curto, Ana C. Mendes, Pedro Curto, Luísa Coheur and
Angela Costa
JUST.ASK, a QA system that learns to answer new questions
from previous interactions
271 Kugatsu Sadamitsu, Ryuichiro Higashinaka and Yoshihiro
Matsuo
Extraction of Daily Changing Words for Question Answering
902 Artem Ostankov, Florian Röhrbein and Ulli Waltinger LinkedHealthAnswers: Towards Linked Data-driven Question
Answering for the Health Care Domain
990 Axel-Cyrille Ngonga Ngomo, Norman Heino, René Speck and
Prodromos Malakasiotis
A tool suite for creating question answering benchmarks
Session : P39 - Speech
Resources
Chair: Henk van den
Heuvel
650 Luca Cristoforetti, Mirco Ravanelli, Maurizio Omologo,
Alessandro Sosi, Alberto Abad, Martin Hagmueller and Petros
Maragos
The DIRHA simulated corpus
695 Roberto Gretter Euronews: a multilingual speech corpus for ASR
709 Sakriani Sakti, Keigo Kubo, Sho Matsumiya, Graham Neubig,
Tomoki Toda, Satoshi Nakamura, Fumihiro Adachi and
Ryosuke Isotani
Towards Multilingual Conversations in the Medical Domain:
Development of Multilingual Medical Data and A Network-
based ASR System
710 Andrej Zgank, Ana Zwitter Vitez and Darinka Verdonik The Slovene BNSI Broadcast News database and reference
speech corpus GOS: Towards the uniform guidelines for
future work
719 Jan Gorisch, Corine Astésano, Ellen Gurman Bard, Brigitte
Bigi and Laurent Prévot
Aix Map Task corpus: The French multimodal corpus of task-
oriented dialogue
739 Carmen Garcia-Mateo, Antonio Cardenal, Xose Luis Regueira,
Elisa Fernández Rei, Marta Martinez, Roberto Seara, Rocío
Varela and Noemí Basanta
CORILGA: a Galician Multilevel Annotated Speech Corpus for
Linguistic Analysis
744 Igor Odriozola, Inma Hernaez, María Inés Torres, Luis Javier
Rodriguez-Fuentes, Mikel Penagarikano and Eva Navas
Basque Speecon-like and Basque SpeechDat MDB-600:
speech databases for the development of ASR technology for
Basque
799 David Tavarez, Eva Navas, Daniel Erro, Ibon Saratxaga and
Inma Hernaez
New bilingual speech databases for audio diarization
748 Tobias Bocklet, Andreas Maier, Korbinian Riedhammer,
Ulrich Eysholdt and Elmar Nöth
Erlangen-CLP: A Large Annotated Corpus of Speech from
Children with Cleft Lip and Palate
789 Evgeny Stepanov, Giuseppe Riccardi and Ali Orkan Bayer The Development of the Multilingual LUNA Corpus for
Spoken Language System Porting
Sessions: 16.55 - 18.15 Area 2
Session : P40 - Lexicons Chair: Yoshihiko Hayashi
602 Bruno Guillaume, Karën Fort, Guy Perrier and Paul Bédaride Mapping the Lexique des Verbes du Francais (Lexicon of
French Verbs) to a NLP lexicon using examples
633 Satoshi Sato Text Readability and Word Distribution in Japanese
657 Uwe Quasthoff, Dirk Goldhahn, Thomas Eckart, Erla
Hallsteinsdóttir and Sabine Fiedler
High Quality Word Lists as a Resource for Multiple Purposes
672 Þórdís Úlfarsdóttir ISLEX – a Multilingual Web Dictionary
704 Eduard Bejček, Václava Kettnerová and Marketa Lopatkova Automatic Mapping Lexical Resources: A Lexical Unit as the
Keystone
753 Cédric Lopez, Reda Bestandji, Mathieu Roche and Rachel
Panckhurst
Towards Electronic SMS Dictionary Construction: An
Alignment-based Approach
803 Ahmet Aker, Monica Paramita, Marcis Pinnis and Robert
Gaizauskas
Bilingual dictionaries for all EU languages
808 Janine Pimentel Adding a Third Language to a Lexical Resource Describing
Legal Terminology: the assignment of equivalents
844 Tafseer Ahmed Khan Automatic acquisition of Urdu nouns (along with gender and
irregular plurals)
1031 Valeria de Paiva, Livy Real, Alexandre Rademaker and Gerard
de Melo
NomLex-PT: A Lexicon of Portuguese Nominalizations
Session : P41 - Parsing Chair: Simonetta
Montemagni
60 Hen-Hsen Huang, Huan-Yuan Chen, Chang-Sheng Yu, Hsin-Hsi
Chen, Po-Ching Lee and Chun-Hsun Chen
Sentence Rephrasing for Parsing Sentences with OOV Words
62 Cheikh M. Bamba Dione Pruning the Search Space of the Wolof LFG Grammar Using a
Probabilistic and a Constraint Grammar Parser
73 Elena Mitocariu, Daniel-Alexandru Anechitei and Dan Cristea How Could Veins Speed Up The Process Of Discourse Parsing
239 Achim Stein Parsing Heterogeneous Corpora with a Rich Dependency
Grammar
453 Angelina Ivanova and Gertjan van Noord Treelet Probabilities for HPSG Parsing and Error Correction
543 Arda Celebi and Arzucan Özgür Self-training a Constituency Parser using n-gram Trees
1089 Natalia Silveira, Timothy Dozat, Marie-Catherine de
Marneffe, Samuel Bowman, Miriam Connor, John Bauer and
Chris Manning
A Gold Standard Dependency Corpus for English
230 Wolfgang Maier, Miriam Kaeshammer, Peter Baumann and
Sandra Kübler
Discosuite - A parser test suite for German discontinuous
structures
Session : P42 - Part-of-
Speech Tagging
Chair: Krister Linden
510 Timur Gilmanov, Olga Scrivner and Sandra Kübler SWIFT Aligner, A Multifunctional Tool for Parallel Corpora:
Visualization, Word Alignment, and (Morpho)-Syntactic Cross-
Language Transfer
275 Saba Urooj, Sarmad Hussain, Asad Mustafa, Rahila Parveen,
Farah Adeeba, Tafseer Ahmed Khan, Miriam Butt and
Annette Hautli
The CLE Urdu POS Tagset
335 Kareem Darwish, Ahmed Abdelali and Hamdy Mubarak Using Stem-Templates to Improve Arabic POS and
Gender/Number Tagging
362 Gaël de Chalendar The LIMA Multilingual Analyzer Made Free: FLOSS Resources
Adaptation and Correction
544 Bushra Jawaid, Amir Kamran and Ondrej Bojar A Tagged Corpus and a Tagger for Urdu
677 Sigrún Helgadóttir, Hrafn Loftsson and Eiríkur Rögnvaldsson Correcting Errors in a New Gold Standard for Tagging
Icelandic Text
1018 Łukasz Kobyliński PoliTa: A multitagger for Polish
Session : P43 - Semantics Chair: Marc Verhagen
556 Francesca Frontini, Valeria Quochi, Sebastian Padó, Monica
Monachini and Jason Utt
Polysemy Index for Nouns: an Experiment on Italian using the
PAROLE SIMPLE CLIPS Lexical Database
619 Muntsa Padró, Marco Idiart, Aline Villavicencio and Carlos
Ramisch
Comparing Similarity Measures for Distributional Thesauri
643 Elisa Omodei, Jean-Philippe Cointet and Thierry Poibeau Reconstructing the Semantic Landscape of Natural Language
Processing
754 Olivier Ferret Compounds and distributional thesauri
823 Kyle Richardson and Jonas Kuhn UnixMan Corpus: A Resource for Language Learning in the
Unix Domain
866 Tatiana Erekhinskaya, Meghana Satpute and Dan Moldovan Multilingual eXtended WordNet Knowledge Base: Semantic
Parsing and Translation of Glosses
867 Manel Zarrouk and Mathieu Lafourcade Relation Inference in Lexical Networks ... with Refinements
900 Raquel Amaro Extracting semantic relations from Portuguese corpora using
lexical-syntactic patterns
904 David Jurgens An analysis of ambiguity in word sense annotations
1012 Claire Bonial, Julia Bonn, Kathryn Conger, Jena D. Hwang and
Martha Palmer
PropBank: Semantics of New Predicate Types
1043 Michael Mohler, Marc Tomlinson, David Bracewell and Bryan
Rink
Semi-supervised methods for expanding psycholinguistics
norms by integrating distributional similarity with the
structure of WordNet
1150 Gemma Bel Enguix, Reinhard Rapp and Michael Zock A Graph-Based Approach for Computing Free Word
Associations
1168 Martin Gleize and Brigitte Grau A hierarchical taxonomy for classifying hardness of inference
tasks
Session : P44 - Speech
Recognition and Synthesis
Chair: Denise Di Persio
196 Joris Pelemans, Kris Demuynck, Hugo Van hamme and Patrick
Wambacq
Speech Recognition Web Services for Dutch
383 Maria Goryainova, Cyril Grouin, Sophie Rosset and Ioana
Vasilescu
Morpho-Syntactic Study of Errors from Speech Recognition
System
771 Daniel Luzzati, Cyril Grouin, Ioana Vasilescu, Martine Adda-
Decker, Eric Bilinski, Nathalie Camelin, Juliette Kahn, Carole
Lailler, Lori Lamel and Sophie Rosset
Human annotation of ASR error regions: Is "gravity" a
sharable concept for human annotators?
430 Mohamed Elmahdy, Mark Hasegawa-Johnson and Eiman
Mustafawi
Development of a TV Broadcasts Speech Recognition System
for Qatari Arabic
434 Mohamed Elmahdy, Mark Hasegawa-Johnson and Eiman
Mustafawi
Automatic Long Audio Alignment and Confidence Scoring for
Conversational Arabic Speech
533 Giampiero Salvi and Niklas Vanhainen The WaveSurfer Automatic Speech Recognition Plugin
715 Matti Varjokallio and Mikko Kurimo A Toolkit for Efficient Learning of Lexical Units for Speech
Recognition
838 Aimilios Chalamandaris, Pirros Tsiakoulis, Sotiris Karabetsos
and Spyros Raptis
Using Audio Books for Training a Text-to-Speech System
Sessions: 18.20 - 19.20 Area 1
Session : P45 - Anaphora
and Coreference
Chair: Costanza
Navarretta
286 Panot Chaimongkol, Akiko Aizawa and Yuka Tateisi Corpus for Coreference Resolution on Scientific Papers
298 Liane Guillou, Christian Hardmeier, Aaron Smith, Jörg
Tiedemann and Bonnie Webber
ParCor 1.0: A Parallel Pronoun-Coreference Corpus to
Support Statistical MT
372 Nobal Niraula, Vasile Rus, Rajendra Banjade, Dan Stefanescu,
William Baggett and Brent Morgan
The DARE Corpus: A Resource for Anaphora Resolution in
Dialogue Based Intelligent Tutoring Systems
726 Christian Girardi, Manuela Speranza, Rachele Sprugnoli and
Sara Tonelli
CROMER: a Tool for Cross-Document Event and Entity
Coreference
729 Arturs Znotins and Peteris Paikens Coreference Resolution for Latvian
850 Nadjet Bouayad-Agha, Alicia Burga, Gerard Casamayor, Joan
Codina, Rogelio Nazar and Leo Wanner
An Exercise in Reuse of Resources: Adapting General
Discourse Coreference Resolution for Detecting Lexical
Chains in Patent Documentation
891 Anders Björkelund, Kerstin Eckart, Arndt Riester, Nadja
Schauffler and Katrin Schweitzer
The Extended DIRNDL Corpus as a Resource for Coreference
and Bridging Resolution
918 Marcos Garcia and Pablo Gamallo Multilingual corpora with coreferential annotation of person
entities
1088 Maciej Ogrodniczuk, Mateusz Kopeć and Agata Savary Polish Coreference Corpus in Numbers
Session : P46 - Information
Extraction and Information
Retrieval
Chair: Dimitrios
Kokkinakis
45 Véronique Moriceau and Xavier Tannier French Resources for Extraction and Normalization of
Temporal Expressions with HeidelTime
99 Zdenka Uresova, Jan Hajic, Pavel Pecina and Ondrej Dusek Multilingual Test Sets for Machine Translation of Search
Queries for Cross-Lingual Information Retrieval in the
Medical Domain
106 Huijing Deng and Grzegorz Chrupała Semantic approaches to software component retrieval with
English queries
250 Hong Li, Sebastian Krause, Feiyu Xu, Hans Uszkoreit, Robert
Hummel and Veselina Mironova
Annotating Relation Mentions in Tabloid Press
337 Shaoda He, Xiaojun Zou, Liumingjing Xiao and Junfeng Hu Construction of Diachronic Ontologies from People's Daily of
Fifty Years
389 Maria Evangelia Chatzimina, Cyril Grouin and Pierre
Zweigenbaum
Use of unsupervised word classes for entity recognition:
Application to the detection of disorders in clinical reports
409 Alan Akbik and Thilo Michael The Weltmodell: A Data-Driven Commonsense Knowledge
Base
645 Marieke van Erp, Gleb Satyukov, Piek Vossen and Marit Nijsen Discovering and Visualising Stories in News
1107 Tomohide Shibata, Shotaro Kohama and Sadao Kurohashi A Large Scale Database of Strongly-related Events in Japanese
218 Steven Bethard, Philip Ogren and Lee Becker ClearTK 2.0: Design Patterns for Machine Learning in UIMA
Session : P47 - Language
Identification
Chair: Michael Rosner
TBC
435 Dirk Goldhahn and Uwe Quasthoff Vocabulary-Based Language Similarity using Web Corpora
732 Thomas Lavergne, Gilles Adda, Martine Adda-Decker and Lori
Lamel
Automatic language identity tagging on word and sentence-
level in multilingual text sources: a case-study on
Luxembourgish
996 Marcos Zampieri and Binyam Gebre VarClass: An Open-source Language Identification Tool for
Language Varieties
1068 Xiao Jiang, Yufan Guo, Jeroen Geertzen, Dora Alexopoulou,
Lin Sun and Anna Korhonen
Native Language Identification Using Large, Longitudinal Data
1183 Liviu Dinu and Alina Maria Ciobanu On the Romance Languages Mutual Intelligibility
Session : P48 - Morphology Chair: Pavel Smrz TBC
784 Senka Drobac, Krister Lindén, Tommi Pirinen and Miikka
Silfverberg
Heuristic Hyper-minimization of Finite State Lexicons
793 Claudia Borg and Albert Gatt Crowd-sourcing evaluation of automatically acquired,
morphologically related word groupings
896 Patrick Littell, Kaitlyn Price and Lori Levin Morphological parsing of Swahili using crowdsourced lexical
resources
909 Carla Parra Escartín Chasing the Perfect Splitter: A Comparison of Different
Compound Splitting Tools
1003 Vincent Claveau and Ewa Kijak Generating and using probabilistic morphological resources
for the biomedical domain
1051 Peter Baumann and Janet Pierrehumbert Using Resource-Rich Languages to Improve Morphological
Analysis of Under-Resourced Languages
1073 Ozlem Cetinoglu Turkish Treebank as a Gold Standard for Morphological
Disambiguation and Its Influence on Parsing
1074 Krešimir Šojat, Matea Srebačić, Marko Tadić and Tin Pavelić CroDeriV: a new resource for processing Croatian morphology
1090 Jan Šnajder DerivBase.hr: A High-Coverage Derivational Morphology
Resource for Croatian
1207 Jonathan Washington, Ilnar Salimzyanov and Francis Tyers Finite-state morphological transducers for three Kypchak
languages
Session : P49 -
Multimodality
Chair: Volker Steinbiss
TBC
51 Brigitte Bigi, Tatsuya Watanabe and Laurent Prévot Representing Multimodal Linguistic Annotated data
160 Michael Kipp, Levin Freiherr von Hollen, Michael Christopher
Hrstka and Franziska Zamponi
Single-Person and Multi-Party 3D Visualizations for
Nonverbal Communication Analysis
163 Huseyin Cakmak, Jerome Urbain, Thierry Dutoit and Joelle
Tilmanne
The AV-LASYN Database : A synchronous corpus of audio and
3D facial marker data for audio-visual laughter synthesis
189 Vincent Vandeghinste and Ineke Schuurman Linking Pictographs to Synsets: Sclera2Cornetto
192 Dietmar Schabus, Michael Pucher and Phil Hoole The MMASCS multi-modal annotated synchronous corpus of
audio, video, facial motion and tongue motion data of
normal, fast and slow speech
235 Mathieu Chollet, Magalie Ochs and Catherine Pelachaud Mining a multimodal corpus for non-verbal behavior
sequences conveying attitudes
318 Massimo Moneglia, Susan Brown, Francesca Frontini, Gloria
Gagliardi, Fahad Khan, Monica Monachini and Alessandro
Panunzi
The IMAGACT Visual Ontology. An Extendable Multilingual
Infrastructure for the representation of lexical encoding of
Action
354 Kodai Takahashi and Masashi Inoue Multimodal dialogue segmentation with gesture post-
processing
374 Shannon Hennig, Ryad Chellali and Nick Campbell The D-ANS corpus: the Dublin-Autonomous Nervous System
corpus of biosignal and multimodal recordings of
conversational speech
Sessions: 9.45 - 11.25 Area 1
Session : P50 -
Crowdsourcing
Chair: Cristina Vertan
214 Jean-Philippe Goldman, Adrian Leemann, Marie-José Kolly,
Ingrid Hove, Ibrahim Almajai, Volker Dellwo and Steven
Moran
A Crowdsourcing Smartphone Application for Swiss
German: Putting Language Documentation in the
Hands of the Users
738 Theodosia Togia and Ann Copestake TagNText: A parallel corpus for the induction of
resource-specific non-taxonomical relations from
756 Shinsuke Goto, Donghui Lin and Toru Ishida Crowdsourcing for Evaluating Machine Translation
813 George Kiomourtzis, George Giannakopoulos, Georgios
Petasis, Pythagoras Karampiperis and Vangelis Karkaletsis
NOMAD: Linguistic Resources and Tools Aimed at
Policy Formulation and Validation
1106 Darja Fišer, Aleš Tavčar and Tomaž Erjavec sloWCrowd: A crowdsourcing tool for lexicographic
Session : P51 - Emotion
Recognition and Generation
Chair: Patrick Paroubek
322 Maxim Sidorov, Stefan Ultes and Alexander Schmitt Comparison of Gender- and Speaker-adaptive Emotion
341 Maxim Sidorov, Christina Brester, Wolfgang Minker and
Eugene Semenkin
Speech-Based Emotion Recognition: Feature Selection
by Self-Adaptive Multi-Criteria Genetic Algorithm
334 Nesrine Fourati and Catherine Pelachaud Emilya: Emotional body expression in daily actions
377 Juan-María Garrido, Yesika Laplaza, Benjamin Kolz and
Miquel Cornudella
TexAFon 2.0: A text processing tool for the generation
of expressive speech in TTS applications
591 Giovanni Costantini, Iacopo Iaderola, Andrea Paoloni and
Massimiliano Todisco
EMOVO Corpus: an Italian Emotional Speech Database
741 Virginie Demulier, Elisabetta Bevacqua, Florian Focone, Tom
Giraud, Pamela Carreno, Brice Isableu, Sylvie Gibet, Pierre De
Loor and Jean-Claude Martin
A Database of Full Body Virtual Interactions Annotated
with Expressivity Scores
1222 Sophia Lee, Shoushan Li and Chu-Ren Huang Annotating Events in an Emotion Corpus
Session : P52 - Linked Data Chair: John Philip McCrae
703 Tomáš Kliegr and Ondřej Zamazal Towards Linked Hypernyms Dataset 2.0:
complementing DBpedia with hypernym discovery
780 Mohamed Sherif, Sandro Coelho, Ricardo Usbeck, Sebastian
Hellmann, Jens Lehmann, Martin Brümmer and Andreas Both
NIF4OGGD - NLP Interchange Format for Open German
Governmental Data
856 Michael Röder, Ricardo Usbeck, Sebastian Hellmann, Daniel
Gerber and Andreas Both
N³ - A Collection of Datasets for Named Entity
Recognition and Disambiguation in the NLP
788 Riccardo Del Gratta, Gabriella Pardelli and Sara Goggi The LRE Map disclosed
1052 Clara Bacciu, Angelica Lo Duca, Andrea Marchetti and
Maurizio Tesconi
Accommodations in Tuscany as Linked Data
1182 David Lewis, Rob Brennan, Leroy Finn, Dominic Jones, Alan
Meehan, Declan O'Sullivan, Sebastian Hellmann and Felix
Sasaki
Global Intelligent Content: Active Curation of Language
Resources using Linked Data
Session : P53 - Machine
Translation
Chair: Mikel Forcada
835 Ondrej Bojar, Vojtěch Diatka, Pavel Rychlý, Pavel Stranak, Vit
Suchomel, Aleš Tamchyna and Daniel Zeman
HindEnCorp - Hindi-English and Hindi-only Corpus for
Machine Translation
848 Mara Chinea-Rios, Germán Sanchis Trilles, Daniel
Ortiz-Martínez and Francisco Casacuberta
Online optimisation of log-linear weights in interactive
machine translation
964 Kashif Shah, Marco Turchi and Lucia Specia An efficient and user-friendly tool for machine
translation quality estimation
982 Santanu Pal, Sudip Kumar Naskar and Sivaji Bandyopadhyay Word Alignment-Based Reordering of Source Chunks in
PB-SMT
1095 Bruno Laranjeira, Viviane Moreira, Aline Villavicencio, Carlos
Ramisch and Maria José Finatto
Comparing the Quality of Focused Crawlers and of the
Translation Resources Obtained from them
1097 Christian Buck, Kenneth Heafield and Bas van Ooyen N-gram Counts and Language Models from the
Common Crawl
1115 Guillaume Wisniewski, Natalie Kübler and François Yvon A Corpus of Machine Translation Errors Extracted from
Translation Students Exercises
1213 Alexandru Ceausu and Sabine Hunsicker Pre-ordering of phrase-based machine translation
input in translation workflow
1217 Jennifer Drexler, Pushpendre Rastogi, Jacqueline Aguilar,
Benjamin Van Durme and Matt Post
A Wikipedia-based Corpus for Contextualized Machine
Translation
Session : P54 -
Multimodality
Chair: Kristiina Jokinen
TBC
525 Costanza Navarretta and Magdalena Lis Transfer learning of feedback head expressions in
Danish and Polish comparable multimodal corpora
567 Onno Crasborn and Han Sloetjes Improving the exploitation of linguistic annotations in
627 Yoshihiko Hayashi Web-imageability of the Behavioral Features of Basic-
689 Zoraida Callejas, Brian Ravenet, Magalie Ochs and Catherine
Pelachaud
A model to generate adaptive multimodal job
interviews with a virtual recruiter
747 Coline Claude-Lachenaud, Eric Charton, Benoit Ozell and
Michel Gagnon
A multimodal interpreter for 3D visualization and
animation of verbal concepts
912 Philippe Martin New functions for a multipurpose multimodal tool for
phonetic and linguistic analysis of very large speech
947 Mariette Soury and Laurence Devillers Smile and Laughter in Human-Machine Interaction: a
study of engagement
DAY3 Poster Session
1017 Hendrik Buschmeier, Zofia Malisz, Joanna Skubisz, Marcin
Wlodarczak, Ipke Wachsmuth, Stefan Kopp and Petra Wagner
ALICO: a multimodal corpus for the study of active
listening
1053 Przemyslaw Lenkiewicz, Olha Shkaravska, Twan Goosen,
Daan Broeder, Menzo Windhouwer, Stephanie Roth and Olof
Olsson
The DWAN framework: Application of a web
annotation framework for the general humanities to
the domain of language resources
1119 Nicolas Auguin and Pascale Fung Co-Training for Classification of Live or Studio Music
Session : P55 - Ontologies Chair: Monica Monachini
251 Chetana Gavankar, Ashish Kulkarni and Ganesh
Ramakrishnan
Efficient Reuse of Structured and Unstructured
Resources for Ontology Population
686 Maria Pia di Buono and Mario Monteleone From Natural Language to Ontology Population in the
Cultural Heritage Domain. A Computational Linguistics-
781 Alessio Bosca, Matteo Casu, Matteo Dragoni and Nikolaos
Marianos
A Gold Standard for CLIR evaluation in the Organic
Agriculture Domain
851 Bernardo Severo, Cassia Trojahn and Renata Vieira VOAR: A Visual and Integrated Ontology Alignment
Sessions: 11.45 - 13.25 Area 2
Session : P56 - Corpora and Annotation Chair: Tomaž Erjavec TBC
1023 Goran Glavaš, Jan Šnajder, Marie-Francine Moens and Parisa
Kordjamshidi
HiEve: A Corpus for Extracting Event Hierarchies from
News Stories
1075 Masaya Yamaguchi Building a Database of Japanese Adjective Examples
from Special Purpose Web Corpora
1134 Antonio Toral TLAXCALA: a multilingual corpus of independent news
1143 Nathan Green and Septina Dian Larasati Votter Corpus: A Corpus of Social Polling Language
1151 Roald Eiselen and Martin Puttkammer Developing Text Resources for Ten South African
1153 Paul Felt, Robbie Haertel, Eric Ringger and Kevin Seppi Momresp: A Bayesian Model for Multi-Annotator
Document Labeling
1211 Maciej Ogrodniczuk and Mateusz Kopeć The Polish Summaries Corpus
Session : P57 - Information Extraction and Information Retrieval Chair: Feiyu Xu
980 Clément de Groc and Xavier Tannier Evaluating Web-as-corpus Topical Document Retrieval
with an Index of the OpenDirectory
1038 Jordan Schmidek and Denilson Barbosa Improving Open Relation Extraction via Sentence Re-
1058 Pavel Smrz and Jan Kouril Semantic Search in Documents Enriched by LOD-based
1103 Antske Fokkens, Serge Ter Braake, Niels Ockeloen, Piek
Vossen, Susan Legêne and Guus Schreiber
BiographyNet: Methodological Issues when NLP
supports historical research
1156 Tilia Ellendorff, Fabio Rinaldi and Simon Clematide Using Large Biomedical Databases as Gold Annotations
for Automatic Relation Extraction
1170 Yutaka Mitsuishi, Vit Novacek and Pierre-Yves Vandenbussche A Method for Building Burst-Annotated Co-Occurrence
Networks for Analysing Trends in Textual Data
Session : P58 - Lexicons Chair: Kiril Simov
1067 Antonio San Martín and Marie-Claude L'Homme Definition patterns for predicative terms in specialized
lexical resources
1099 Tim vor der Brück, Alexander Mehler and Zahurul Islam ColLex.en: Automatically Generating and Evaluating a
Full-form Lexicon for English
1105 Ajay Dubey, Parth Gupta, Vasudeva Varma and Paolo Rosso Enrichment of Bilingual Dictionary through News
Stream Data
1108 Thomas Francois, Núria Gala, Patrick Watrin and Cédrick
Fairon
FLELex: a graded lexical resource for French foreign
learners
1155 Anabela Barreiro, Fernando Batista, Ricardo Ribeiro, Helena
Moniz and Isabel Trancoso
OpenLogos Semantico-Syntactic Knowledge-Rich
Bilingual Dictionaries
1161 Mona Diab, Mohamed AlBadrashiny, Maryam Aminian,
Mohammed Attia, Heba Elfardy, Nizar Habash and Abdelati
Hawwari
Towards Compiling a large scale three-way Egyptian
Arabic Dictionary
1169 Michael Rosner and Kurt Sultana Automatic Methods for the Extension of a Bilingual
Dictionary using Comparable Corpora
1203 Kevin Black, Eric Ringger, Paul Felt, Kevin Seppi, Kristian Heal
and Deryle Lonsdale
Evaluating Lemmatization Models for Machine-Assisted
Corpus-Dictionary Linkage
Session : P59 - Language Resource Infrastructures Chair: Martin Wynne
396 Menzo Windhouwer and Ineke Schuurman Linguistic resources and cats: how to use ISOcat, RELcat
and SCHEMAcat
660 Lluís Padró, Zeljko Agic, Xavier Carreras, Blaz Fortuna,
Esteban García-Cuesta, Zhixing Li, Tadej Stajner and Marko
Tadić
Language Processing Infrastructure in the XLike Project
743 Piotr Banski, Nils Diewald, Michael Hanl, Marc Kupietz and
Andreas Witt
Access control by query rewriting: the case of KorAP
775 Rodrigo Agerri, Josu Bermudez and German Rigau IXA pipeline: Efficient and Ready to Use Multilingual
930 Trang Mai Xuan, Yohei Murakami, Donghui Lin and Toru
Ishida
Integration of Workflow and Pipeline for Language
Service Composition
1086 Rafal Rak, Jacob Carter, Andrew Rowley, Riza Theresa Batista-
Navarro and Sophia Ananiadou
Interoperability and Customisation of Annotation
Schemata in Argo
Session : P60 - Metadata Chair: Gil Francopoulo
979 Penny Labropoulou, Christopher Cieri and Maria Gavrilidou Developing a Framework for Describing Relations
among Language Resources
1011 Thorsten Trippel, Daan Broeder, Matej Durco and Oddrun
Ohren
Towards automatic quality assessment of component
metadata
Session : P61 - Opinion Mining and Sentiment Analysis Chair: Gerard de Melo TBC
188 Chantal van Son, Marieke van Erp, Antske Fokkens and Piek
Vossen
Hope and Fear: How Opinions Influence Factuality
413 Nathan Hartmann, Lucas Avanço, Pedro Balage, Magali
Duran, Maria das Graças Volpe Nunes, Thiago Pardo and
Sandra Aluísio
A Large Corpus of Product Reviews in Portuguese:
Tackling Out-Of-Vocabulary Words
500 Thierry Declerck and Hans-Ulrich Krieger Harmonization of German Lexical Resources for
617 Anne Garcia-Fernandez, Olivier Ferret and Marco Dinarelli Evaluation of different strategies for domain
adaptation in opinion mining
1010 Amel Fraisse and Patrick Paroubek Toward a unifying model for Opinion, Sentiment and
Emotion information extraction
Session : P62 - Speech Resources Chair: Christoph Draxler
858 Michael Stadtschnitzer, Jochen Schwenninger, Daniel Stein
and Joachim Koehler
Exploiting the large-scale German Broadcast Corpus to
boost the Fraunhofer IAIS Speech Recognition System
889 Ilaine Wang, Sylvain Kahane and Isabelle Tellier Macrosyntactic Segmenters of a French Spoken Corpus
906 Iolanda Alfano, Francesco Cutugno, Aurelio De Rosa, Claudio
Iacobini, Renata Savy and Miriam Voghera
VOLIP: a corpus of spoken Italian and a virtuous
example of reuse of linguistic resources
929 George Christodoulides, Mathieu Avanzi and Jean-Philippe
Goldman
DisMo: A Morphosyntactic, Disfluency and Multi-Word
Unit Annotator. An Evaluation on a Corpus of French
Spontaneous and Read Speech
1020 Vera Cabarrão, Helena Moniz, Fernando Batista, Ricardo
Ribeiro, Nuno Mamede, Hugo Meinedo, Isabel Trancoso, Ana
Isabel Mata and David Martins de Matos
Revising the annotation of a Broadcast News corpus: a
linguistic approach
1193 Ana Isabel Mata, Helena Moniz, Fernando Batista and Julia
Hirschberg
Teenage and adult speech in school context: building
and processing a corpus of European Portuguese
1056 Arjan van Hessen, Franciska de Jong, Stef Scagliola and Tanja
Petrovic
Croatian Memories
1081 Ines Rehbein, Sören Schalowski and Heike Wiese The KiezDeutsch Korpus (KiDKo) Release 1.0
1104 Anthony Rousseau, Paul Deléglise and Yannick Estève Enhancing the TED-LIUM Corpus with Selected Data for
Language Modeling and More TED Talks
1176 Jan Strunk, Florian Schiel and Frank Seifart Untrained Forced Alignment of Transcriptions and
Audio for Language Documentation Corpora using
Sessions: 14.55 - 16.35 Area 1
Session : P63 - Computer-Assisted Language Learning (CALL) Chair: Keith Miller TBC
101 Xiaoyun Wang, Jinsong Zhang, Masafumi Nishida and Seiichi
Yamamoto
Phoneme Set Design Using English Speech Database by
Japanese for Dialogue-Based English CALL Systems
247 Lianet Sepúlveda Torres, Magali Sanches Duran and Sandra
Aluísio
Generating a Lexicon of Errors in Portuguese to
Support an Error Identification System for Spanish
340 Veronika Vincze, János Zsibrita, Péter Durst and Martina
Katalin Szabó
Automatic Error Detection concerning the Definite and
Indefinite Conjugation in the HunLearner Corpus
570 Gabriele Pallotti, Francesca Frontini, Fabio Affè, Monica
Monachini and Stefania Ferrari
Presenting a system of human-machine interaction for
performing map tasks
857 Valentín Cardeñoso-Payo, César González-Ferreras and David
Escudero
Assessment of Non-native Prosody for Spanish as L2
using quantitative scores and perceptual evaluation
892 Elena Volodina, Ildikó Pilán, Lars Borin and Therese
Lindström Tiedemann
A flexible language learning platform based on
language resources and web services
971 Renlong Ai and Marcela Charfuelan MAT: a tool for L2 pronunciation errors annotation
1126 Chris Hokamp, Rada Mihalcea and Peter Schuelke Modeling Language Proficiency Using Implicit Feedback
Session : P64 - Evaluation Methodologies Chair: Kevin Bretonnel Cohen
960 Mohamed Ben Jannet, Martine Adda-Decker, Olivier
Galibert, Juliette Kahn and Sophie Rosset
ETER : a new metric for the evaluation of hierarchical
named entity recognition
1027 Olivier Galibert, Jeremy Leixa, Gilles Adda, Khalid Choukri
and Guillaume Gravier
The ETAPE speech processing evaluation
998 Achim Rettinger, Lei Zhang, Daša Berović, Danijela Merkler,
Matea Srebačić and Marko Tadić
RECSA: Resource for Evaluating Cross-lingual Semantic
Annotation
1147 Helen Hastie and Anja Belz A Comparative Evaluation Methodology for NLG in
Interactive Systems
1189 Juris Borzovs, Ilze Ilziņa, Iveta Keiša, Mārcis Pinnis and
Andrejs Vasiļjevs
Terminology localization guidelines for the national
scenario
Session : P65 - MultiWord Expressions and Terms Chair: Valia Kordoni
706 Kris Heylen, Stephen Bond, Dirk De Hertog, Ivan
Vulić and Hendrik Kockaert
TermWise: A CAT-tool with Context-Sensitive
Terminological Support
883 Pollet Samvelian, Pegah Faghiri and Sarra El Ayari Extending the coverage of a MWE database for Persian
CPs exploiting valency alternations
920 Behrang Zadeh and Siegfried Handschuh Evaluation of Technology Term Recognition with
1064 Johannes Hellrich, Simon Clematide, Udo Hahn and Dietrich
Rebholz-Schuhmann
Collaboratively Annotating Multilingual Parallel
Corpora in the Biomedical Domain—some MANTRAs
1184 Anca Dinu, Liviu Dinu and Ionut Sorodoc Aggregation methods for efficient collocation detection
1197 Sandra Antunes and Amália Mendes An evaluation of the role of statistical measures and
frequency for MWE identification
Session : P66 - Parsing Chair: Giuseppe Attardi
596 Weston Feely, Mehdi Manshadi, Robert Frederking and Lori
Levin
The CMU METAL Farsi NLP Approach
696 Masood Ghayoomi, Kiril Simov and Petya Osenova Constituency Parsing of Bulgarian: Word- vs Class-
1005 Kiril Simov, Iliana Simova, Ginka Ivanova, Maria Mateva and
Petya Osenova
A System for Experiments with Dependency Parsers
809 Wolfgang Seeker and Jonas Kuhn An Out-of-Domain Test Suite for Dependency Parsing
879 Lauma Pretkalniņa, Artūrs Znotiņš, Laura Rituma and Didzis
Goško
Dependency parsing representation effects on the
accuracy of semantic applications — an example of an
970 Ophélie Lacroix and Denis Béchet Validation Issues induced by an Automatic Pre-
Annotation Mechanism in the Building of Non-
1158 Jianqiang Ma Automatic Refinement of Syntactic Categories in
Chinese Word Structures
Session : P67 - Part-of-Speech Tagging Chair: Daniel Flickinger
687 Stephen Wattam, Paul Rayson, Marc Alexander and Jean
Anderson
Experiences with Parallelisation of an Existing NLP
Pipeline: Tagging Hansard
721 Heike Zinsmeister, Ulrich Heid and Kathrin Beck Adapting a part-of-speech tagset to non-standard text:
The case of STTS
755 Antonio Balvet, Dejan Stosic and Aleksandra Miletic TALC-sef A Manually-Revised POS-Tagged Literary
Corpus in Serbian, English and French
801 Cristina Sánchez Marco An open source part-of-speech tagger for Norwegian:
Building on existing language resources
826 Antonio Pareja-Lora, Guillermo Cárcamo-Escorza and Alicia
Ballesteros-Calvo
Standardisation and Interoperation of
Morphosyntactic and Syntactic Annotation Tools for
Session : P68 - Tools, Systems, Applications Chair: Yota Georgakopoulou TBC
185 Peter Fankhauser, Jörg Knappen and Elke Teich Exploring and Visualizing Variation in Language
429 Raphael Winkelmann and Georg Raess Introducing a web application for labeling, visualizing
speech and correcting derived speech signals
621 Maha Althobaiti, Udo Kruschwitz and Massimo Poesio AraNLP: a Java-based Library for the Processing of
Arabic Text
640 Silvia Rodríguez Vázquez, Pierrette Bouillon and Anton
Bolfing
Applying Accessibility-Oriented Controlled Language
(CL) Rules to Improve Appropriateness of Text
Alternatives for Images: an Exploratory Study
824 Jonathan Sonntag and Manfred Stede GraPAT: a Tool for Graph Annotations
862 Vincenzo Galatà, Alberto Benin, Piero Cosi, Giuseppe
Riccardo Leone, Giulio Paci, Giacomo Sommavilla and Fabio
Tesser
Discovering the Italian literature: interactive access to
audio indexed text resources
1102 Horacio Saggion Creating Summarization Systems with SUMMA