Post on 19-Dec-2015
Kirrkirr: A Dictionary Visualization Tool
Conrad Wai, Andrei Pop
Presentation Overview
What is Kirrkirr?
– General Purpose
– Technical Introduction
Generalizing Kirrkirr
– Allowing heterogeneous dictionaries
– Nahuatl: a case study
Redesigning Kirrkirr’s Network Visualization
What is Kirrkirr?
Going beyond: unlike paper dictionaries, electronic dictionaries can provide an interactive educational tool customizable to various audiences.
Taking advantage: contrary to the blandness of typical electronic dictionaries, Kirrkirr presents the contents of a dictionary in flexible, interactive, customizable, and (especially) fun ways.
Audience: Kirrkirr has diverse target users with varying levels of literacy, including professional linguists, elementary school children, teachers, and native speakers.
Technical Details
The dictionary is stored in XML
– Rather than load the large (10 MB) XML file into memory, each headword’s XML entry is loaded individually as needed
Formatted entries are rendered using XSLT
Dictionary is accessed via XPath
The program is written in Java
– Java 1.1.8 + Swing for backward compatibility
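As a rough illustration of XPath-based access, the sketch below selects one headword's gloss from a small XML string. It assumes a modern JDK (the `javax.xml.xpath` package is newer than the Java 1.1.8 target named above), and the element names `DICTIONARY`, `ENTRY`, `HW`, and `GL` are hypothetical, not Kirrkirr's actual schema. It also parses the whole input, so it does not reproduce the per-entry loading described on the slide.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class EntryLookup {
    /** Returns the gloss text for a headword, or "" if no entry matches. */
    static String glossFor(String xml, String headword) {
        // Hypothetical layout: <DICTIONARY><ENTRY><HW>..</HW><GL>..</GL></ENTRY>..</DICTIONARY>
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Bind the headword as an XPath variable rather than pasting it into the expression.
        xpath.setXPathVariableResolver(
                name -> "hw".equals(name.getLocalPart()) ? headword : null);
        try {
            return xpath.evaluate("/DICTIONARY/ENTRY[HW = $hw]/GL",
                    new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate lookup", e);
        }
    }
}
```

Calling `EntryLookup.glossFor(xml, "someHeadword")` returns the text of the first matching `GL` element.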
Original Application
Originally, Kirrkirr was used with the Australian Aboriginal language Warlpiri, spoken by about 3,000 people in northern Australia.
Kirrkirr used a Warlpiri-English dictionary developed by linguists in Australia, with detailed information about each word, including glosses, definitions, dialects, grammatical comments and cross-references between words for synonyms, antonyms, “see also” and other relationships.
Generalizing Kirrkirr
Want to incorporate disparate sources
– Generalize to broaden allowable dictionary formats and, consequently, the number of accessible languages
Two ways to generalize dictionary access (issues similar to DB schema integration)
– Specify an overarching format to be adhered to
• But this gets unwieldy as complexity grows, and
• there is no single “best” schema for all purposes
– Allow a generic format and require a conversion specification
• Provide just enough information for the program to get out what it needs (not a full translation of the data)
Generalizing Kirrkirr II
Kirrkirr does the latter (generic format + conversion specification)
– Allow heterogeneous dictionary formats
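One way to picture a conversion specification is as a table from the roles the program needs (headword, gloss, and so on) to XPath expressions that locate them in a particular dictionary. The sketch below is an assumption about the general shape, not Kirrkirr's actual specification format. The class name `ConversionSpec` and the role names are hypothetical.

```java
import java.io.StringReader;
import java.util.Map;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class ConversionSpec {
    // Maps a role name (e.g. "headword") to an XPath expression for one dictionary format.
    private final Map<String, String> roleToXPath;

    ConversionSpec(Map<String, String> roleToXPath) {
        this.roleToXPath = roleToXPath;
    }

    /** Extracts the text for one role from a single entry's XML. */
    String extract(String role, String entryXml) {
        String expr = roleToXPath.get(role);
        if (expr == null) {
            throw new IllegalArgumentException("No mapping for role: " + role);
        }
        try {
            return XPathFactory.newInstance().newXPath()
                    .evaluate(expr, new InputSource(new StringReader(entryXml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath for role " + role, e);
        }
    }
}
```

Two differently structured dictionaries then need only two different maps, for example `Map.of("headword", "/ENTRY/HW")` for one and `Map.of("headword", "/entry/@word")` for the other, while the calling code stays the same.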
Challenges in Conversion
Preprocessing dictionary data
– Converting to XML
– Detecting duplicate entries (homophones) and adding a uniquifier
– Linking up pictures and sounds
– Alphabetizing/ordering of entries
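The uniquifier step can be sketched as follows: the first occurrence of a headword keeps its spelling, and later duplicates get a numeric suffix. The `#` separator and the class name `Uniquifier` are assumptions for illustration. The slides do not say what form Kirrkirr's uniquifier takes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class Uniquifier {
    /** Appends "#n" to the n-th repeat of a headword so that every key is distinct. */
    static List<String> uniquify(List<String> headwords) {
        Map<String, Integer> seen = new HashMap<>();
        List<String> keys = new ArrayList<>();
        for (String hw : headwords) {
            // merge() stores 1 on first sight, otherwise increments the running count.
            int count = seen.merge(hw, 1, Integer::sum);
            keys.add(count == 1 ? hw : hw + "#" + count);
        }
        return keys;
    }
}
```

This simple scheme assumes no real headword already ends in `#` plus a number. A dictionary where that can happen would need a different separator.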
Challenges in Conversion II
Writing the XML conversion specification
– Cross-referencing links between words
– Fuzzy spelling rules (for regexps)
Runtime formatting of dictionary entries
– Designing custom XSLTs for HTML display
• Different XSLTs for different audiences: basic stylesheet for schoolchildren and novices to the language, and more complex views for linguists and teachers
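To make the fuzzy spelling rules mentioned above more concrete, the sketch below turns a typed query into a regular expression by replacing certain letter sequences with a group of alternatives. The three rules shown are invented placeholders, not real orthographic rules for any language, and the class name `FuzzySpelling` is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

class FuzzySpelling {
    // Hypothetical rules: a typed key may stand for any alternative in its value.
    // Longer keys come first so that "rr" is tried before single letters.
    private static final Map<String, String> RULES = new LinkedHashMap<>();
    static {
        RULES.put("rr", "(?:rr|r|rd)");
        RULES.put("k", "(?:k|g)");
        RULES.put("i", "(?:i|ii)");
    }

    /** Builds a case-insensitive pattern that tolerates the spelling variants in RULES. */
    static Pattern toPattern(String query) {
        StringBuilder regex = new StringBuilder();
        int pos = 0;
        while (pos < query.length()) {
            boolean matched = false;
            for (Map.Entry<String, String> rule : RULES.entrySet()) {
                if (query.startsWith(rule.getKey(), pos)) {
                    regex.append(rule.getValue());
                    pos += rule.getKey().length();
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                // Characters without a rule must match literally.
                regex.append(Pattern.quote(String.valueOf(query.charAt(pos))));
                pos++;
            }
        }
        return Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE);
    }
}
```

A headword list can then be filtered with `FuzzySpelling.toPattern(query).matcher(headword).matches()`.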
Nahuatl: A Case Study
Nahuatl: spoken throughout North and Central America (language of the Aztecs)
Dictionary contains parallel data for multiple dialects (in headword and gloss)
An attempt to apply generalization to a real-world example
Unforeseen hurdles to conversion
Nahuatl: Some Issues
Special characters / character encoding
– During preprocessing (“equivalency” in XML)
• Solution: Temporarily change XML encoding
– During runtime display: (A) unrecognized characters and (B) truncated entries
• Solution for A: Wrap in Java stream Readers and Writers
• Solution for B: The issue is multiple character entities under a single element in the DOM tree; collate the entities
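The "wrap in Readers and Writers" fix for unrecognized characters can be sketched as below: bytes are decoded through an `InputStreamReader` with an explicit charset instead of relying on the platform default. UTF-8 is an assumption here, since the slides do not name the encoding actually used, and `EncodedEntryReader` is a hypothetical class name.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class EncodedEntryReader {
    /** Decodes raw entry bytes as UTF-8 rather than the platform default charset. */
    static String readUtf8(InputStream in) {
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                text.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }
}
```

For the truncated-entry issue, DOM methods such as `Node.normalize()` or `Node.getTextContent()` are one plausible way to gather several text nodes under an element, though the slides do not say which approach was taken.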
Nahuatl: More Issues
File naming conflicts
– Implementation uses headword for HTML filenames
– Nahuatl’s colons are not allowed on some platforms
• Short-term solution: substitute on the fly
• Long-term solution: name auto-generated, or based on MIME/Base64 encoding
Dictionary anomalies
– Tags in fields, erroneously escaped by Shoebox
• Solution: Regexp replace
• Solution: Regexp replace
– Invalid or special headwords
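The two file-naming options for headwords containing colons might look like the sketch below. It uses `java.util.Base64`, which requires Java 8 and is therefore an assumption relative to the Java 1.1.8 target. The `_` substitute character, the `.html` suffix handling, and the class name `EntryFileNames` are also illustrative choices.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class EntryFileNames {
    /** Short-term approach: substitute the disallowed colon on the fly. */
    static String substituted(String headword) {
        return headword.replace(':', '_') + ".html";
    }

    /** Long-term approach: derive the name from a URL-safe Base64 encoding of the headword. */
    static String base64Name(String headword) {
        return Base64.getUrlEncoder().withoutPadding()
                .encodeToString(headword.getBytes(StandardCharsets.UTF_8)) + ".html";
    }

    /** Recovers the headword from a name produced by base64Name. */
    static String headwordOf(String fileName) {
        String stem = fileName.substring(0, fileName.length() - ".html".length());
        return new String(Base64.getUrlDecoder().decode(stem), StandardCharsets.UTF_8);
    }
}
```

Substitution can map two distinct headwords (one with `:` and one with `_`) to the same file, which is presumably why it is only a short-term fix. The Base64 form is reversible, but because it is case-sensitive it may still collide on case-insensitive file systems.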
Exploring New Issues
Redesigning network visualization panel
Usability Note / Design Consideration
– Screen size limitations
– General eyesight varies by target audience
– Together, these put an upper bound on what we can do in one panel
Basic Idea
Word links represented visually as nodes and edges.
Edge colors represent link types.
Former Kirrkirr Visualization
HCI issues
– Focus and attention
– Word islands
– Visual organization
• Random positioning
• Misleading distance
– Lack of history or sequence
• No tangible sense of back / forward with a single panel
Former Kirrkirr Visualization II
Software Design Shortcomings
– Procedural paradigm: very un-Java / un-OO
– Lack of extensibility and readability
– One large file doing most of the work
– Flawed and unnecessarily complicated algorithm
– First piece of code written for Kirrkirr; it became “crufty” as Kirrkirr grew and evolved (it was also written when Swing was a nascent technology)
Not necessarily evident to user, but makes extension difficult
New Network Visualization
Basic premise remains the same (nodes and edges representing words and links)
Redesign to address HCI concerns, rewrite (vs. adapt) to improve software design
Addressing HCI Issues
Multiple panels
– Reduce visual clutter
– Related words together, unrelated separate
– Provide sense of sequence
– Background panels should perhaps better indicate nodes they contain
• but feedback somewhat limited by screen size
Improve visual organization
– Group links by type (vs. random)
– Make distance a relevant factor (vs. random)
– User has freedom to move nodes (vs. spring algorithm)
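Grouping links by type could be done in many ways. One simple possibility, sketched below, gives each link type its own angular sector around the focus word and spaces the linked words evenly inside it. This is an illustrative layout under that assumption, not the placement algorithm Kirrkirr uses. `GroupedLayout` and its parameters are hypothetical names.

```java
import java.awt.geom.Point2D;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class GroupedLayout {
    /**
     * Places linked words on a circle of the given radius around the origin
     * (the focus word), with one equal-width sector per link type.
     */
    static Map<String, Point2D> layout(Map<String, List<String>> wordsByLinkType, double radius) {
        Map<String, Point2D> positions = new LinkedHashMap<>();
        int typeCount = wordsByLinkType.size();
        int typeIndex = 0;
        for (List<String> words : wordsByLinkType.values()) {
            double sectorWidth = 2 * Math.PI / typeCount;
            double sectorStart = sectorWidth * typeIndex;
            for (int i = 0; i < words.size(); i++) {
                // Spread words evenly inside the sector, away from its edges.
                double angle = sectorStart + sectorWidth * (i + 1) / (words.size() + 1);
                positions.put(words.get(i),
                        new Point2D.Double(radius * Math.cos(angle), radius * Math.sin(angle)));
            }
            typeIndex++;
        }
        return positions;
    }
}
```

Because the result is only a set of starting coordinates, the user remains free to drag nodes elsewhere afterwards.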
Improved Software Design
Modular, OO approach
– Split up into constituent components: panels, nodes, edges
• Each component handles its own characteristics (e.g., color) and events (e.g., mouse listening)
– Encapsulation: Layered pane contains network panels; each panel consists of nodes and edges
– Model / View separation
• Each object has a model and a view
• Views listen to models; models fire changes
• Little wrinkle with edges: one view (less overhead)
Easier to maintain and extend
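The "views listen to models; models fire changes" arrangement can be sketched with the standard `java.beans` property-change classes. The class names `NodeModel` and `NodeView` and the `"position"` property are assumptions for illustration. The slides do not describe Kirrkirr's actual classes or event mechanism.

```java
import java.awt.Point;
import java.beans.PropertyChangeEvent;
import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeSupport;

/** Model: holds a node's word and position, and fires an event when the position changes. */
class NodeModel {
    private final PropertyChangeSupport changes = new PropertyChangeSupport(this);
    private final String word;
    private Point position = new Point(0, 0);

    NodeModel(String word) { this.word = word; }

    String getWord() { return word; }

    void addListener(PropertyChangeListener listener) {
        changes.addPropertyChangeListener(listener);
    }

    void moveTo(int x, int y) {
        Point old = position;
        position = new Point(x, y);
        // No event is delivered if the old and new values are equal.
        changes.firePropertyChange("position", old, position);
    }
}

/** View: listens to its model and records where the node should be drawn. */
class NodeView implements PropertyChangeListener {
    private Point drawnAt = new Point(0, 0);

    NodeView(NodeModel model) { model.addListener(this); }

    @Override
    public void propertyChange(PropertyChangeEvent event) {
        if ("position".equals(event.getPropertyName())) {
            drawnAt = (Point) event.getNewValue(); // a Swing view would also call repaint()
        }
    }

    Point getDrawnAt() { return drawnAt; }
}
```

After `new NodeView(model)`, a call to `model.moveTo(3, 4)` updates the view without the model knowing anything about how nodes are drawn.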
Future Visual Enhancements
Network visualization
– Further improve algorithm for placing nodes and links
– Improve feedback for panel switching (animate?)
– Incorporate semantic domains
Wordlist sidebar
– Improving navigation and focus
– Perhaps a diamond-shaped “dial” of sorts, with words in the central area larger
• Idea is to provide both context and focus, overview and detail
Future Visual Enhancements II
Semantic domain exploration
– JTree alternatives
– Want ability to browse entire dictionary, not simply a history