LInfoVis Winter 2011 Chris Culy
Scientific Visualization of Language Data
Chris Culy, Winter 2011
What are we doing?
LInfoVis* (< Language Information Visualization, cf. InfoVis):
the visualization of language-related information, especially on computer displays
* Not a standard term (not yet, anyway)
“Visualization has to be more than pretty pictures. It has to inform. It has to
challenge. It has to further our understanding. Visualizing data is not
about pretty pictures.”
Robert Kosara on www.eagereyes.org
What are we not doing?
(Only language, no other data.)
Source: Lewis Carroll. Alice's Adventures in Wonderland. Ch. 3
Gray Area
Numeric information derived from language data, e.g. frequencies, statistical measures, etc.
There are lots of charting/graphing packages, e.g. with spreadsheets, in R, etc.
But if there is an interesting and useful way to incorporate the language data, we'll do that.
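As a small illustration of numeric information derived from language data, the following sketch counts word frequencies in a short text. The sample sentence and the function name `word_frequencies` are assumptions made for this example, and the tokenization is deliberately naive.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count lowercase word tokens in text (naive regex tokenization)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


# Hypothetical sample text, not taken from the course materials.
sample = "The mouse told a long tale, and the tale had a long tail."
freqs = word_frequencies(sample)

# Print the three most frequent words with their counts.
for word, count in freqs.most_common(3):
    print(word, count)
```

The counts alone could go straight into any charting package; the gray area is whether the words themselves stay visible alongside the numbers.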
Corpus Clouds: http://www.eurac.edu/en/research/institutes/multilingualism/Projects/LInfoVis/CorpusClouds.html
Presentation vs. Analysis
Presentation:
  Convey information known to the author
  To an audience other than the author
  Typically static (e.g. charts in a paper)
Analysis:
  Present information that is not (well) known to the user
  Help the user understand (“make sense of”) the information
  Often interactive, though not necessarily
Different goals, different techniques
Why visualization?
The human visual system is very efficient at discovering certain patterns in large amounts of information.
The eye has on average:
  92 million rods (for light level)
  4.6 million cones (for color)
  Curcio, C. A., Sloan, K. R., Kalina, R. E. and Hendrickson, A. E. (1990), Human photoreceptor topography. The Journal of Comparative Neurology, 292: 497–523. doi: 10.1002/cne.902920402
  updated 10-12 times per second
Things are much more complicated than those basic numbers, but still ...
Preattentive processing: recognition of features before conscious processing
We can take advantage of this capacity to help linguists analyze language, especially in finding patterns
What makes LInfoVis special?
Textual elements are:
Categorical, not numeric
  in general, no scale of comparison
Hearst M. 2009. Search User Interfaces. Cambridge University Press.
NB: we will (almost?) always have non-textual data, but we will always need to show the textual elements as well
What makes LInfoVis special?
Language is:
not mappable -- there is in general no more compact way to visualize language (that is humanly comprehensible)
i.e. unlike numbers, we can't map a word to size, shape, color, etc.
cf. Culy, C., Lyding, V., and Dittmann, H. 2011. "xLDD: Extended Linguistic Dependency Diagrams" in Proceedings of the 15th International Conference on Information Visualisation IV2011, 12, 13 - 15 July 2011, University of London, UK. 164-169.
What makes LInfoVis special?
Linguistics has:
particular data structures (like any field)
standard ones used in different ways, e.g. trees, feature structures, KWIC
with particular (conventional) visual representations, e.g. dependency structures as arcs
What makes LInfoVis special?
Linguists:
Often want to examine the original data, not just the measurements/summary
  More than some (most?) fields
  e.g. word frequencies in a text/corpus -- linguists want to be able to examine the source data, to see the words in context
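Seeing the words in context is what a KWIC (keyword in context) display provides. A minimal sketch, assuming whitespace/regex tokenization and a hypothetical helper named `kwic`; real concordancers handle punctuation, sentence boundaries, and annotation layers.

```python
import re


def kwic(text, keyword, width=3):
    """Return (left, keyword, right) context triples for each match."""
    tokens = re.findall(r"\w+", text)
    rows = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, tok, right))
    return rows


# Hypothetical sample text, not taken from the course materials.
sample = "The mouse told a long tale and the tale had a long tail"
rows = kwic(sample, "tale", width=2)

# Align the keyword in a column, as in a conventional KWIC display.
for left, kw, right in rows:
    print(f"{left:>15} | {kw} | {right}")
```

Each row keeps the link from a frequency (here, two hits for "tale") back to the source text it was counted from.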
Goethe on seeing
Goethe
Man sieht nur das, was man weiß.
You only see what you know.
Culy
You can only visualize what you have.
The real Goethe on seeing
Man erblickt nur, was man schon weiß und versteht.
You glimpse only what you already know and understand.
Kanzler F. v. Müller, Unterhaltungen mit Goethe, 24 April 1819, cited in Lexikon Goethe-Zitate
Was man weiß, sieht man erst!
You see first what you know!
In: Einleitung in die Propyläen
That's more optimistic!
Some challenges in LInfoVis
Dealing with the categorical/non-mappable nature of language
  How can we show textual data in an effective way?
  Exploit the capabilities of the human visual system
  Cater to our general cognitive capabilities
  Interaction is key
Some challenges in LInfoVis
Dealing with large amounts of data
  e.g. a 2560x1440 monitor = 3,686,400 pixels, but one pixel is pretty small, and 3.7M is a lot smaller than the amount of information in a small corpus: Penn Treebank has 4.5M words, plus POS, parses, etc.
  Particular subsets of interest will be smaller, but they often (usually?) contain more information than can fit on a screen
What are effective strategies for dealing with large amounts of data?
  From a visualization perspective
  From an architectural/programming perspective
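The arithmetic behind the monitor example can be checked in a few lines. The figures are the ones quoted above; the variable names are chosen for this sketch only.

```python
# Figures quoted on the slide.
width_px, height_px = 2560, 1440
corpus_words = 4_500_000  # Penn Treebank word count as given above

pixels = width_px * height_px
words_per_pixel = corpus_words / pixels

print(pixels)                     # 3686400
print(round(words_per_pixel, 2))  # 1.22
```

Even at one pixel per word, which no one could read, the words alone would not fit, before counting POS tags or parses.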
Some challenges in LInfoVis
What are the most useful levels of abstraction for LInfoVis tools?
  i.e. what functionalities should LInfoVis components contain?
Other practical challenges
How to integrate LInfoVis into workflows
  Of people: How can LInfoVis be made useful to people doing linguistic analysis?
  Of programs: How can LInfoVis programs be integrated with other tools? e.g. WebLicht
What are the roles of LInfoVis components?
  Producer/consumer
  Read only vs. read/write (i.e. using LInfoVis tools to modify/create data)
  What's the division of labor between LInfoVis components and others?
How do we maintain the connection with the original data?
Where do LInfoVis visualizations come from?
Use existing visualizations as is
Modify and adapt existing visualizations
Add Infovis techniques to standard linguistic diagrams
New approaches
Why components?
In many applications, the visualizations are custom-designed for the application and tightly integrated with it.
But, reinventing the wheel is not very interesting or productive.
LInfoVis visualizations could be more like graphs/charts and parsers: components that can be used with a variety of data of the same type
  Line graphs can be used with data from any field
  Parsers can be used with grammars for any language
Claims (Culy):
a) Linguistic data of the same “type” can be visualized meaningfully by the same visualization(s).
b) There are enough data sets with the same “type” to make (a) interesting, and hence components worth creating.
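One way to picture claim (a) is a component written against a data "type" rather than a particular data set. The sketch below is only an illustration under that assumption: `render_bars` is a hypothetical name, the "type" is any mapping from labels to counts, and text bars stand in for real graphics.

```python
from typing import List, Mapping


def render_bars(counts: Mapping[str, int], max_width: int = 20) -> List[str]:
    """Render any label -> count mapping as text bars, keeping labels visible."""
    if not counts:
        return []
    peak = max(counts.values())
    label_width = max(len(label) for label in counts)
    lines = []
    # Largest counts first; the sort is stable, so ties keep insertion order.
    for label, n in sorted(counts.items(), key=lambda kv: -kv[1]):
        bar_len = round(max_width * n / peak) if peak > 0 else 0
        lines.append(f"{label:<{label_width}} {'#' * bar_len} {n}")
    return lines


# Two hypothetical data sets of the same "type" (label -> count).
word_counts = {"tale": 2, "mouse": 1}
pos_counts = {"NOUN": 40, "VERB": 25, "ADJ": 10}

for dataset in (word_counts, pos_counts):
    print("\n".join(render_bars(dataset)))
    print()
```

The same function serves word frequencies and part-of-speech counts unchanged, which is the sense in which a component could be reused across data sets.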
Structure of the course
A mix of theory and practice
Survey of visualization theory and general techniques (CuC)
Presentation of particular techniques and applications (everyone)
  Read articles, with one person responsible for presenting them
Programming exercises
  Introduction to JavaScript (as necessary)
  Basic drawing (with Java, JavaScript)
  Some higher-level visualization toolkits (e.g. Processing, Protovis/D3)
Project
The project
Goal: develop a scientific visualization of some kind of linguistic data
  Start thinking about what kind of data you want to visualize, and where you'll get it
Who: Small groups
  If you are inexperienced in programming, work with someone who is more experienced
What you'll need to provide me at the end of the term:
  1. A functioning visualization, with some sample data to visualize
  2. Technical documentation of how the visualization works, and how to use it, e.g. Javadoc and help/readme/tutorial
  3. A short (~15 pages) paper describing the visualization: background, its goals, how it works, and future directions
  4. If you have gotten feedback from real or potential users, include that in the paper
Practical information
http://www.sfs.uni-tuebingen.de/~cculy/courses/W2011/vis/
Office: 1.07
Tel: 07071/29-7 3966
Sprechstunden (Office hours): T 14-15, Th 16-17
For next time
Read the tutorial (linked from the web site), through “Principles: visual variables (2)”