Scientific Visualization of Language Data Chris Culy Winter 2011




LInfoVis* (< Language Information Visualization, cf. InfoVis):

the visualization of language-related information, especially on computer displays

* Not a standard term (not yet, anyway)

What are we doing?


“Visualization has to be more than pretty pictures. It has to inform. It has to challenge. It has to further our understanding. Visualizing data is not about pretty pictures.”

Robert Kosara on www.eagereyes.org


What are we not doing?

(Only language, no other data.)

Source: Lewis Carroll. Alice's Adventures in Wonderland. Ch. 3


Gray Area

Numeric information derived from language data, e.g. frequencies, statistical measures, etc.

There are lots of chart/graphing packages, e.g. with spreadsheets, in R, etc.

But, if there is an interesting and useful way to incorporate the language data, we'll do that


Corpus Clouds: http://www.eurac.edu/en/research/institutes/multilingualism/Projects/LInfoVis/CorpusClouds.html


Presentation vs. Analysis

Presentation:

Convey information known to the author

To an audience other than the author

Typically static (e.g. charts in a paper)

Analysis:

Present information that is not (well) known to the user

Help the user understand (“make sense of”) the information

Often interactive, though not necessarily

Different goals, different techniques


Why visualization?

The human visual system is very efficient at discovering certain patterns in large amounts of information.

The eye has on average:

92 million rods (for light level)

4.6 million cones (for color)

Curcio, C. A., Sloan, K. R., Kalina, R. E. and Hendrickson, A. E. (1990), Human photoreceptor topography. The Journal of Comparative Neurology, 292: 497–523. doi: 10.1002/cne.902920402

updated 10-12 times per second

Things are much more complicated than those basic numbers, but still ...

Preattentive processing: recognition of features before conscious processing

We can take advantage of this capacity to help linguists analyze language, especially in finding patterns


What makes LInfoVis special?

Textual elements are:

Categorical, not numeric

In general, no scale of comparison

Hearst M. 2009. Search User Interfaces. Cambridge University Press.

NB: we will (almost?) always have non-textual data, but we will always need to show the textual elements as well



Language is:

not mappable -- there is in general no more compact way to visualize language (that is humanly comprehensible)

i.e. unlike numbers, we can't map a word to a size, shape, color, etc.

cf. Culy, C., Lyding, V., and Dittmann, H. 2011. "xLDD: Extended Linguistic Dependency Diagrams" in Proceedings of the 15th International Conference on Information Visualisation IV2011, 12, 13 - 15 July 2011, University of London, UK. 164-169.



Linguistics has:

particular data structures (like any field)

standard ones used in different ways e.g. trees, feature structures, KWIC

with particular (conventional) visual representations e.g. dependency structures as arcs



Linguists:

Often want to examine the original data, not just the measurements/summary

More than some (most?) fields

e.g. word frequencies in a text/corpus -- linguists want to be able to examine the source data, to see the words in context
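A keyword-in-context (KWIC) listing is one standard way of keeping word frequencies tied to their source text. Below is a minimal Python sketch, assuming a deliberately naive regex tokenizer and a made-up sample text. The names `tokenize` and `kwic` are illustrative only, not part of any course toolkit.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (naive: letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def kwic(tokens, keyword, width=3):
    """Return (left context, keyword, right context) for each occurrence of keyword."""
    rows = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, token, right))
    return rows


# Hypothetical sample text, standing in for a real corpus.
sample = "The cat sat on the mat. The dog sat on the cat."
tokens = tokenize(sample)

# The summary measurement: word frequencies.
freqs = Counter(tokens)

# The source data behind one of those frequencies: each "cat" in context.
rows = kwic(tokens, "cat", width=2)
for left, kw, right in rows:
    print(f"{left:>12} | {kw} | {right}")
```

The frequency table alone says how often "cat" occurs; the KWIC rows show where, which is the part linguists usually want to inspect.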


Goethe on seeing

Goethe

Man sieht nur das, was man weiß.

You only see what you know.

Culy

You can only visualize what you have.


The real Goethe on seeing

Man erblickt nur, was man schon weiß und versteht.

You glimpse only what you already know and understand.

Kanzler F. v. Müller, Unterhaltungen mit Goethe, 24 April 1819, cited in Lexikon Goethe-Zitate

Was man weiß, sieht man erst!

You see first what you know!

In: Einleitung in die Propyläen

That's more optimistic!


Some challenges in LInfoVis

Dealing with the categorical/non-mappable nature of language

How can we show textual data in an effective way?

Exploit the capabilities of the human visual system

Cater to our general cognitive capabilities

Interaction is key



Dealing with large amounts of data

e.g. a 2560x1440 monitor = 3,686,400 pixels, but one pixel is pretty small, and 3.7M is a lot smaller than the amount of information in a small corpus: the Penn Treebank has 4.5M words, plus POS, parses, etc.

Particular subsets of interest will be smaller, but they often (usually?) contain more information than can fit on a screen

What are effective strategies for dealing with large amounts of data?

From a visualization perspective

From an architectural/programming perspective
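The screen-versus-corpus comparison above can be worked through as simple arithmetic. This is a rough Python sketch: the monitor size and word count are the figures quoted above, while the 40 x 12 pixel box for one legible word is purely an assumption for illustration.

```python
import math

# Figures quoted in the text above.
width_px, height_px = 2560, 1440
corpus_words = 4_500_000  # Penn Treebank word count as given above

pixels = width_px * height_px

# Even at one pixel per word, the words alone do not fit on the screen.
words_per_pixel = corpus_words / pixels

# Assumption (not from the source): a legible word needs roughly 40 x 12 pixels.
word_box_px = 40 * 12
legible_words_per_screen = pixels // word_box_px

# Number of full screens needed just to show every word once, with no annotations.
screens_needed = math.ceil(corpus_words / legible_words_per_screen)

print(pixels, round(words_per_pixel, 2), legible_words_per_screen, screens_needed)
```

Under these assumptions a screen holds only a few thousand readable words, so filtering, aggregation, and interaction are needed long before POS tags or parses are added.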



What are the most useful levels of abstraction for LInfoVis tools?

i.e. what functionalities should LInfoVis components contain?


Other practical challenges

How to integrate LInfoVis into workflows

Of people: How can LInfoVis be made useful to people doing linguistic analysis?

Of programs: How can LInfoVis programs be integrated with other tools? e.g. WebLicht

What are the roles of LInfoVis components?

Producer/consumer

Read only vs. read/write (i.e. using LInfoVis tools to modify/create data)

What's the division of labor between LInfoVis components and others?

How do we maintain the connection with the original data?


Where do LInfoVis visualizations come from?

Use existing visualizations as is

Modify and adapt existing visualizations

Add InfoVis techniques to standard linguistic diagrams

New approaches


Why components?

In many applications, the visualizations are custom-designed for the application and tightly integrated with it.

But, reinventing the wheel is not very interesting or productive.

LInfoVis visualizations could be more like graphs/charts and parsers: components that can be used with a variety of data of the same type

Line graphs can be used with data from any field

Parsers can be used with grammars for any language

Claims (Culy):

a) Linguistic data of the same “type” can be visualized meaningfully by the same visualization(s).

b) There are enough data sets with the same “type” to make (a) interesting, and hence components worth creating.
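Claim (a) can be illustrated with a small Python sketch in which one component is reused, unchanged, on two data sets of the same "type". Everything here is hypothetical: the `Token` data type, the `TokenSequenceView` interface, and the text rendering (which stands in for actual drawing) are assumptions made for the example, not an existing LInfoVis API.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence


@dataclass(frozen=True)
class Token:
    """Hypothetical shared data type: a word form plus a category label (e.g. a POS tag)."""
    form: str
    category: str


class TokenSequenceView(Protocol):
    """Hypothetical component interface: anything that can render a token sequence."""

    def render(self, tokens: Sequence[Token]) -> str:
        ...


class InlineTagView:
    """One concrete component: shows each word form with its category attached."""

    def render(self, tokens: Sequence[Token]) -> str:
        return " ".join(f"{t.form}/{t.category}" for t in tokens)


def show(view: TokenSequenceView, tokens: Sequence[Token]) -> str:
    """The caller depends only on the interface, not on the language of the data."""
    return view.render(tokens)


# Two made-up data sets of the same type, from different languages.
english = [Token("the", "DET"), Token("cat", "NOUN"), Token("sleeps", "VERB")]
german = [Token("die", "DET"), Token("Katze", "NOUN"), Token("schläft", "VERB")]

view = InlineTagView()
english_out = show(view, english)
german_out = show(view, german)
print(english_out)
print(german_out)
```

The component knows nothing about English or German, only about the shape of the data, which is the sense in which it resembles a line graph or a parser.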


Structure of the course

A mix of theory and practice

Survey of visualization theory and general techniques (CuC)

Presentation of particular techniques and applications (everyone)

Read articles, with one person responsible for presenting them

Programming exercises

Introduction to JavaScript (as necessary)

Basic drawing (with Java, JavaScript)

Some higher level visualization toolkits (e.g. Processing, Protovis/D3)

Project


The project

Goal: develop a scientific visualization of some kind of linguistic data

Start thinking about what kind of data you want to visualize, and where you'll get it

Who: Small groups

If you are inexperienced in programming, work with someone who is more experienced

What you'll need to provide me at the end of the term:

1. A functioning visualization, with some sample data to visualize

2. Technical documentation of how the visualization works, and how to use it, e.g. Javadoc and help/readme/tutorial

3. A short (~15 pages) paper describing the visualization: background, its goals, how it works, and future directions

4. If you have gotten feedback from real or potential users, include that in the paper


Practical information

http://www.sfs.uni-tuebingen.de/~cculy/courses/W2011/vis/

[email protected]

Office: 1.07

Tel: 07071/29-7 3966

Sprechstunden (Office hours): T 14-15, Th 16-17


For next time

Read the tutorial (linked on the web site), through “Principles: visual variables (2)”
