Multimodally Grounded Translation by Humans and Machines
Timo Honkela
Aalto University (formerly Helsinki University of Technology)
Department of Information and Computer Science
Cognitive Systems research group
Finland
Tralogy II, Paris, 18 Jan 2013
Related materials
● Full paper: http://lodel.irevues.inist.fr/tralogy/index.php?id=259
● Video recording of the presentation: http://webcast.in2p3.fr/videos-multimodality_grounded_translation_by_humans_and_machines
0 Speaker's background: Some earlier projects
Natural language database interface with dependency-based compositional semantics
● H. Jäppinen, T. Honkela, H. Hyötyniemi & A. Lehtola (1988): A Multilevel Natural Language Processing Model. Nordic Journal of Linguistics 11:69-87.
What is the turnover of the ten largest stock exchange companies in forestry?
Morphological analysis
Dependency parsing
Logical analysis
Database query formation
Result from the SQL database
Several dozen person-years were spent on developing a rule-based natural language processing system.
Classical example: Learning meaning from context:
Maps of words in Grimm fairy tales
Honkela, Pulkki & Kohonen 1995
Automated learning of word relations using a self-organizing map on text context data
Map of Finnish Science
Chemistry
Physics and engineering
Biosciences
Medicine
Culture and society
A fully automated process from terminology extraction to semantic space construction without any manually constructed resources.
WordICA
Timo Honkela, Aapo Hyvärinen, and Jaakko Väyrynen. WordICA - Emergence of linguistic representations for words by independent component analysis. Natural Language Engineering, 16(3):277–308, 2010.
Jaakko J. Väyrynen, Lasse Lindqvist, and Timo Honkela. Sparse distributed representations for words with thresholded independent component analysis. In Proceedings of IJCNN'07, pages 1031–1036, 2007.
Learning taxonomies based on analysing text documents
Mari-Sanna Paukkeri, Alberto Pérez García-Plaza, Víctor Fresno, Raquel Martínez Unanue and Timo Honkela (2012). Learning a taxonomy from a set of text documents. Applied Soft Computing, 12(3), pp. 1138--1148.
Analyzing Complexity of Languages
Markus Sadeniemi, Kimmo Kettunen, Tiina Lindh-Knuutila, and Timo Honkela. Complexity of European Union languages: A comparative approach. Journal of Quantitative Linguistics, 15(2):185–211, 2008.
Concept Formation and Communication - General Theory
Timo Honkela, Ville Könönen, Tiina Lindh-Knuutila, and Mari-Sanna Paukkeri. Simulating processes of concept formation and communication. Journal of Economic Methodology, 15(3):245–259, 2008.
● $\lambda : C_i \times C_j \to \mathbb{R}$, $i \neq j$: a distance between two points in the concept spaces of different agents
● $S_i$: symbol space, the vocabulary of agent $i$, which consists of discrete symbols
● $\xi_i : S_i \to C_i$: an individual mapping function from symbols to concepts
● $\varphi_i : S_i \to D$: an individual mapping from agent $i$'s vocabulary to the signal space $D$, with an inverse mapping $\varphi_i^{-1}$ from the signal space to the symbol space
● $C_i$: $N$-dimensional metric concept space of agent $i$
Observing $f_1$, and after a symbol selection process, agent 1 communicates a symbol $s^*$ to agent 2 as a signal $d$. When agent 2 observes $d$, it maps it to some $s_2 \in S_2$ by using the function $\varphi_2^{-1}$. Then it maps the symbol to some point in its concept space by using $\xi_2$. If this point is close to its observation $f_2$ in the sense of $\lambda$, the communication process has succeeded.
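The communication step can be illustrated with a small simulation. This is only a sketch under assumptions that are not in the slides: the `Agent` class, the nearest-prototype symbol selection, the Euclidean distance used for λ, the success threshold and all example values are hypothetical choices made for illustration.

```python
import numpy as np


class Agent:
    """Hypothetical agent i with a concept space, a vocabulary and two mappings."""

    def __init__(self, prototypes, signals):
        # xi: symbol -> point in this agent's concept space
        self.xi = {s: np.asarray(p, dtype=float) for s, p in prototypes.items()}
        # phi: symbol -> signal, and its inverse: signal -> symbol
        self.phi = dict(signals)
        self.phi_inv = {d: s for s, d in self.phi.items()}

    def select_symbol(self, f):
        # Assumed selection rule: pick the symbol whose concept point is nearest to f.
        f = np.asarray(f, dtype=float)
        return min(self.xi, key=lambda s: np.linalg.norm(self.xi[s] - f))

    def send(self, f):
        # Observe f, select a symbol s*, emit it as a signal d.
        return self.phi[self.select_symbol(f)]

    def receive(self, d):
        # Map the signal back to a symbol, then to a point in the concept space.
        return self.xi[self.phi_inv[d]]


def distance(c1, c2):
    # Stand-in for lambda: Euclidean distance, assuming comparable coordinates.
    return float(np.linalg.norm(np.asarray(c1, dtype=float) - np.asarray(c2, dtype=float)))


def communicate(sender, receiver, f_sender, f_receiver, threshold=0.5):
    # Success if the receiver's interpreted point is close to its own observation.
    d = sender.send(f_sender)
    return distance(receiver.receive(d), f_receiver) <= threshold


# Two agents share the signals but place "run" differently in their concept spaces.
agent1 = Agent({"walk": [1.0, 0.0], "run": [3.0, 0.0]}, {"walk": "d0", "run": "d1"})
agent2 = Agent({"walk": [1.2, 0.1], "run": [2.0, 0.0]}, {"walk": "d0", "run": "d1"})

ok_walk = communicate(agent1, agent2, [1.1, 0.0], [1.0, 0.0])
ok_run = communicate(agent1, agent2, [2.8, 0.0], [2.9, 0.0])
```

With these example values the exchange about walking succeeds, while the one about running fails because the two agents associate "run" with different regions of their concept spaces.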
1 Being-in-the-world: Perception and Movement
Why brains?
● What are the central differences between plants and animals?
“The original need for a nervous system was to coordinate movement, so an organism could go find food, instead of waiting for the food to come to it.”
● An extreme example: A sea squirt transforms from an “animal” to a “plant”. It absorbs its own cerebral ganglion that it used to swim about and find its attachment place.
http://goodheartextremescience.wordpress.com/2010/01/27/meet-the-creature-that-eats-its-own-brain/
http://www.fi.edu/learn/brain/
Human movement
David Bailey's thesis (1997):
Verbs related to hand movement
Point of view from cognitive linguistics
● The meaning of linguistic symbols in the mind of the language users derives from the users' sensory perceptions, their actions with the world and with each other.
● For example, the meaning of the word 'walk' involves
  ● what walking looks like
  ● what it feels like to walk and after having walked
  ● how the world looks when walking (e.g. objects approach at a certain speed, etc.)
  ● ...
Abstract vs concrete grounding
Ronald Langacker
[Diagram of the research project. Topics: motion capture, animation, image analysis, video analysis, robotics, machine learning, language learning, socio-cognitive modeling, symbol grounding, reinforcement learning, learning relations. Equipment: Kinect, OptiTrack. People: Jorma Laaksonen, Tapio Takala, Klaus Förger, Harri Valpola, Oskar Kohonen, Paul Wagner, Markus Koskela, Xi Chen, Timo Honkela.]
Multimodally Grounded Language Technology
A project funded by the Academy of Finland, 2011-2014
Timo Honkela as the Principal Investigator
A collaboration between the departments of
* Information and Computer Science, and
* Media Technology
Labeling movements
From an unpublished manuscript. Experiments by Klaus Förger.
Linking between modalities
Potential uses of the emerging technologies
● Multimodally grounded natural language interaction and machine translation
● Animation based on linguistic instruction
● Automated skill instruction (playing an instrument, learning some sports, etc.)
● Video annotation
● Addressing some of the fundamental issues in traditional AI, cognitive science and philosophy
2 Contextuality and Subjectivity of Understanding
Meaning is contextual
red wine, red skin, red shirt
Gärdenfors: Conceptual Spaces
Hardin: Color for Philosophers
Meaning is contextual
SNOW-WHITE?
WHITE
Meaning is contextual
● “Small”, “big”
● “White house”
● “Get”
● “Every” - “Every Swede is tall/blond”
● etc.
Another comment:
Strict compositionality cannot be assumed
Fuzziness
Learning meaning from context
● Self-Organizing Semantic Maps
● Latent Semantic Analysis
● Latent Dirichlet Allocation
● WordICA
● etc.
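The common idea behind these methods, that words occurring in similar contexts receive similar representations, can be sketched with a truncated SVD of a word-by-context co-occurrence matrix, in the spirit of Latent Semantic Analysis. The toy corpus, the window size, the dimensionality `k` and all function names below are assumptions made for illustration, not part of the presented work.

```python
import numpy as np

# Toy corpus (hypothetical): "king" and "queen" appear in identical contexts.
SENTENCES = [
    "the king rules the land",
    "the queen rules the land",
    "the dog chases the cat",
    "the cat chases the dog",
]


def cooccurrence(sentences, window=2):
    """Count how often each word appears within `window` tokens of another word."""
    vocab = sorted({w for s in sentences for w in s.split()})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for s in sentences:
        tokens = s.split()
        for i, w in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[index[w], index[tokens[j]]] += 1
    return vocab, counts


def embed(counts, k=2):
    """Project each word's context vector onto the top-k singular directions."""
    u, s, _ = np.linalg.svd(counts, full_matrices=False)
    return u[:, :k] * s[:k]


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


vocab, counts = cooccurrence(SENTENCES)
vectors = embed(counts)
word_vec = {w: vectors[i] for i, w in enumerate(vocab)}
sim_king_queen = cosine(word_vec["king"], word_vec["queen"])
sim_king_dog = cosine(word_vec["king"], word_vec["dog"])
```

Because "king" and "queen" have identical context counts in this corpus, their vectors coincide and their cosine similarity is maximal; no dictionary or hand-built resource is involved.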
Meaning is subjective
Meaning is subjective
● Good
● Fair
● Useful
● Scientific
● Democratic
● Sustainable
● etc.
A proper theory of meaning has to take this into account
2b Measuring Subjectivity of Understanding
User-specific difficulty assessment
Basic architecture of the method
User-specific difficulty assessment
Paukkeri, Ollikainen & Honkela, Information Processing & Management, 2013.
GICA: Grounded Intersubjective Concept Analysis
Publication:
Timo Honkela, Juha Raitio, Krista Lagus, Ilari T. Nieminen, Nina Honkela, and Mika Pantzar. Subjects on objects in contexts: Using GICA method to quantify epistemological subjectivity. Proceedings of IJCNN 2012, International Joint Conference on Neural Networks, pp. 2875-2883, 2012.
Case: State of the Union Addresses
● Text mining is used in populating a Subject-Object-Context tensor
● The tensor was populated by counting how often a subject uses an object word in the context of a context word
● Context window of 30 words
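The counting step can be sketched as follows. This is a minimal illustration, not the implementation used in the GICA study: the function name, the whitespace tokenization, the example texts and the reading of "window of 30 words" as up to 15 tokens on either side of the object word are all assumptions.

```python
import numpy as np


def build_soc_tensor(texts_by_subject, object_words, context_words, window=30):
    """Count, per subject, how often each object word co-occurs with each context word."""
    subjects = sorted(texts_by_subject)
    obj_idx = {w: i for i, w in enumerate(object_words)}
    ctx_idx = {w: i for i, w in enumerate(context_words)}
    tensor = np.zeros((len(subjects), len(object_words), len(context_words)), dtype=int)
    half = window // 2  # assumption: the window is split evenly around the object word
    for s, subject in enumerate(subjects):
        tokens = texts_by_subject[subject].lower().split()
        for i, tok in enumerate(tokens):
            if tok not in obj_idx:
                continue
            lo, hi = max(0, i - half), min(len(tokens), i + half + 1)
            for j in range(lo, hi):
                if j != i and tokens[j] in ctx_idx:
                    tensor[s, obj_idx[tok], ctx_idx[tokens[j]]] += 1
    return subjects, tensor


# Hypothetical miniature "speeches" by two subjects.
TEXTS = {
    "speaker_a": "health care costs burden every family",
    "speaker_b": "health of the economy and health of the nation",
}
subjects, tensor = build_soc_tensor(
    TEXTS, object_words=["health"], context_words=["care", "economy", "nation"]
)
# tensor[s, o, c] is the count for subject s, object word o, context word c.
```

Each subject's slice `tensor[s]` is an object-by-context matrix, so the way different subjects use the same word can be compared directly.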
Analysis of the word 'health'
Timo Honkela, Ville Könönen, Tiina Lindh-Knuutila, and Mari-Sanna Paukkeri. Simulating processes of concept formation and communication. Journal of Economic Methodology, 15(3):245–259, 2008.
Conclusions (1)
● Languages, including formal languages, should be considered as tools for coordination, storing and sharing knowledge in a compressed form – approximate and relative to the point of view taken
● Constructing a language or symbol system (such as an ontology) is an investment, and spreading the language into use in a community is an even larger one
From TEDxAALTO presentation “Measuring Subjectivity of Meaning – and How it may change our life” with illustrations by Nelli Honkela
1+2 Movement and Subjectivity
goo.gl/UZnvH
Klaus Förger & Timo Honkela, unpublished results
WALKING
RUNNING
Consider how different languages divide the conceptual space in different ways (cf. e.g. Melissa Bowerman et al.)
3 Implications on Human and Machine Translation
Conclusions and discussion
● Importance and challenge of contextuality and pragmatics (cf. David Farwell's presentation)
● Huge complexity of language at different levels of abstraction (Von Foerster in Responsibilities of Competence (1972): “The hard sciences are successful because they deal with the soft problems; the soft sciences are struggling because they deal with the hard problems”)
● Avenues for future research: applying statistical machine learning, modeling rich contexts including multimodal information, modeling variation at multiple levels, refining the division of labor between humans and machines, etc.
● Acknowledging language's central role in society, which requires substantial investments in several areas of research: general, computational and cognitive linguistics, psycho- and sociolinguistics, machine learning and pattern recognition, dynamical systems theory, renewed language philosophy, etc.
Thank you! Merci!
Kiitos! ¡Gracias! Obrigado!
Danke schön! ありがとう
External references (for which there was not enough time)
● Choi, S. and Bowerman, M. (1991). Learning to express motion events in English and Korean: The influence of language-specific lexicalization patterns. Cognition, 41, 83-121.
● Saenko, K. and Darrell, T. (2008) Unsupervised Learning of Visual Sense Models for Polysemous Words. Proc. of NIPS, Neural Information Processing Systems.
References: modeling subjectivity
● Timo Honkela, Juha Raitio, Krista Lagus, Ilari T. Nieminen, Nina Honkela, and Mika Pantzar. Subjects on objects in contexts: Using GICA method to quantify epistemological subjectivity. In Proceedings of IJCNN 2012, International Joint Conference on Neural Networks, pages 2875-2883, 2012.
● Mari-Sanna Paukkeri, Marja Ollikainen, Timo Honkela: Assessing user-specific difficulty of documents. Information Processing and Management, 49(1): 198-212, 2013.
● He Zhang, Eimontas Augilius, Timo Honkela, Jorma Laaksonen, Hannes Gamper, and Henok Alene. Analyzing emotional semantics of abstract art using low-level image features. In Advances in Intelligent Data Analysis X, pages 413-423, 2011.
Refs: language-related research
● Timo Honkela, Aapo Hyvärinen, and Jaakko Väyrynen. WordICA - Emergence of linguistic representations for words by independent component analysis. Natural Language Engineering, 16(3):277–308, 2010.
● Mari-Sanna Paukkeri, Alberto Pérez García-Plaza, Víctor Fresno, Raquel Martínez Unanue, and Timo Honkela. Learning a taxonomy from a set of text documents. Applied Soft Computing, 12(3):1138–1148, 2012.
● Mari-Sanna Paukkeri, Ilari T. Nieminen, Matti Pöllä, and Timo Honkela. A language-independent approach to keyphrase extraction and evaluation. In Coling 2008, pages 83-86, 2008.
Refs: Translation and multilinguality
● Markus Sadeniemi, Kimmo Kettunen, Tiina Lindh-Knuutila, and Timo Honkela. Complexity of European Union languages: A comparative approach. Journal of Quantitative Linguistics, 15(2):185-211, 2008.
● Timo Honkela, Sami Virpioja, and Jaakko Väyrynen. Adaptive translation: Finding interlingual mappings using self-organizing maps. In Proceedings of ICANN'08, pages 603-612, 2008.
● David Ellis, Mathias Creutz, Timo Honkela, and Mikko Kurimo. Speech to speech machine translation: Biblical chatter from Finnish to English. In Proceedings of the IJCNLP-08 Workshop on NLP for Less Privileged Languages, pages 123-130, 2008.
● Timo Honkela. Philosophical aspects of neural, probabilistic and fuzzy modeling of language use and translation. In Proceedings of IJCNN, pages 2881-2886, 2007.
Refs: Multimodality
● Mats Sjöberg, Jorma Laaksonen, Timo Honkela, and Matti Pöllä. Retrieval of multimedia objects by combining semantic information from visual and textual descriptors. In Proceedings of ICANN, pages 75-83, 2006.
● Mats Sjöberg, Ville Viitaniemi, Jorma Laaksonen, and Timo Honkela. Analysis of semantic information available in an image collection augmented with auxiliary data. In Proceedings of AIAI'06, Artificial Intelligence Applications and Innovations, volume 204, pages 600-608. Springer, 2006.
● Forthcoming papers from the Multimodally Grounded Language Technology project
Refs: Modeling language at cognitive and socio-cultural level
● Timo Honkela, Ville Könönen, Tiina Lindh-Knuutila, and Mari-Sanna Paukkeri. Simulating processes of concept formation and communication. Journal of Economic Methodology, 15(3):245–259, 2008.
● Tiina Lindh-Knuutila, Juha Raitio, and Timo Honkela. Combining self-organized and Bayesian models of concept formation. In Proc. of the Eleventh Neural Computation and Psychology Workshop, pages 193-204, 2009.
● Tiina Lindh-Knuutila, Timo Honkela, and Krista Lagus. Simulating meaning negotiation using observational language games. In Proc. of the Third International Workshop on the Emergence and Evolution of Linguistic Communication, pages 168-179.
● Timo Honkela. Neural nets that discuss: a general model of communication based on self-organizing maps. In Proc. of ICANN'93, pages 408-411, 1993.
Refs: Text mining and information retrieval
● Samuel Kaski, Timo Honkela, Krista Lagus, and Teuvo Kohonen. WEBSOM—self-organizing maps of document collections. Neurocomputing, 21:101-117, 1998.
● Timo Honkela, Raimo Nordfors, and Raimo Tuuli. Document maps for competence management. In Proceedings of the Symposium on Professional Practice in AI, pages 31-39, 2004.
● Nina Janasik, Timo Honkela, and Henrik Bruun. Text mining in qualitative research: Application of an unsupervised learning method. Organizational Research Methods, 12(3):436–460, 2009.
Refs: Links to Paris
● A classical work presented in a conference in Paris:
Timo Honkela, Ville Pulkki, and Teuvo Kohonen. Contextual relations of words in Grimm tales, analyzed by self-organizing map. In Proc. of ICANN'95, 1995.
● An exhibition in Pompidou Centre in Paris:
George Legrady and Timo Honkela. Pockets full of memories: an interactive museum installation. Visual Communication, 1(2):163-169, 2002.