1
Penn
HP Labs Bangalore, 8/21/2003
Annotation as Algebra: a formal framework for linguistic annotation
Mark Liberman, University of Pennsylvania
(joint work with Steven Bird, Melbourne University)
Outline
- Motivation
- Sketch of the idea
- Survey of linguistic annotation
- Annotation graphs as a formal framework
- Practical implementations and experience
- Issues for the future
What linguistic annotation is (and isn’t)
“Linguistic annotation” means:
- symbolic descriptions of specific linguistic signals
  - e.g. transcriptions, parses, etc.
It does not include things like:
- metadata
  - e.g. information about speakers, recordings, documents, etc.
  - typically stored in an RDB referenced by elements of the linguistic annotation
- lexicons
But these can be treated in a common framework.
Motivation
- A jungle of annotation file formats
  - e.g. more than 20 common formats for time-marked orthographic transcriptions
  - many new formats every year
- Multiple annotations of the same data
- No good way to search annotations
  - different coding needed for each format
  - extra difficulty of searches across formats
- Problems for:
  - tool builders
  - researchers
  - corpus builders and maintainers
Basic idea #1: what to do
- Abstract away from file formats, to the logical structure of linguistic annotation
- Replace the two-level model with a three-level model
  - as in database technology several decades ago
  - so many applications can access many kinds of data through a consistent API
- Choose a logical structure with good properties
  - simple, conceptually natural, computationally efficient
  - an algebra to facilitate boolean combination of queries
Two-level model:
Three-level model:
Basic idea #2: how to do it
- Three kinds of assertion recur in linguistic annotation:
  - assigning a label: “This chunk of stuff has property X”
  - sequencing labels: “chunk B immediately follows chunk A”
  - anchoring the edges of labels: “this chunk boundary has coordinates k” (in time, space, text...)
- Formalized as a labeled DAG, these primitives provide a logical structure adequate for all linguistic annotation
- The result also defines an algebra, useful for searching and in other ways
Basic assertion type 1: labeling
Associate a “label” (typed, structured symbolic information) with a region of a linguistic signal
Basic assertion type 2: sequencing
Example:
The stretch of signal labeled “this” is followed by a stretch of signal labeled “is”
Basic assertion type 3: anchoring
Example:
The stretch of signal labeled “this” begins 137.4592 seconds from the start of file XYZ.
Informal formalization
An “annotation graph” (AG) is a directed acyclic graph
- whose arcs are labeled with fielded records
  - e.g. phoneme=“p” or word=“this”
- whose nodes may be labeled with signal coordinates
  - e.g. 3.45692 seconds
Labeling → arc labels
Sequencing → graph structure
Anchoring → signal coordinates on nodes
That’s all!
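The informal definition above can be sketched in a few lines of Python. This is only an illustration under assumptions: the class and method names (`Node`, `Arc`, `AnnotationGraph`, `followers`) are hypothetical and are not the AGTK API. It reuses the earlier slide examples, with “this” anchored at 137.4592 seconds and followed by “is”.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """A graph node; `time` is the optional signal coordinate (anchoring)."""
    id: str
    time: Optional[float] = None


@dataclass
class Arc:
    """A labeled arc; `label` is a fielded record such as {"word": "this"}."""
    src: str
    dst: str
    label: Dict[str, str]


@dataclass
class AnnotationGraph:
    """Hypothetical container, not the AGTK API."""
    nodes: Dict[str, Node] = field(default_factory=dict)
    arcs: List[Arc] = field(default_factory=list)

    def add_node(self, node_id: str, time: Optional[float] = None) -> None:
        self.nodes[node_id] = Node(node_id, time)

    def add_arc(self, src: str, dst: str, **label: str) -> None:
        self.arcs.append(Arc(src, dst, dict(label)))

    def followers(self, arc: Arc) -> List[Arc]:
        """Arcs that immediately follow `arc` (sequencing via a shared node)."""
        return [a for a in self.arcs if a.src == arc.dst]


ag = AnnotationGraph()
ag.add_node("n0", time=137.4592)      # anchored node
ag.add_node("n1")                     # unanchored node
ag.add_node("n2")                     # unanchored node
ag.add_arc("n0", "n1", word="this")   # labeling
ag.add_arc("n1", "n2", word="is")     # sequencing: shares node n1 with "this"
next_words = [a.label["word"] for a in ag.followers(ag.arcs[0])]
```

Sequencing needs no separate device here: two arcs are adjacent exactly when the destination node of one is the source node of the other.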
Outcome
- API, open source toolkit (C, C++, Tcl, Python); sample tools
- Java version (“ATLAS”) developed by NIST
Annotation formats & tools
- Surveyed in 1999 by Liberman and Bird
- Documented on web page http://ldc.upenn.edu/annotation
- Used in designing annotation graph system & AG software
- Survey is updated periodically
Some animals in the annotation zoo
1. TIMIT
2. BAS Partitur
3. CHILDES
4. LACITO
5. LDC CALLHOME
6. NIST UTF
7. Switchboard (four types of annotation)
8. ... etc. ...
Sample TIMIT data
train/dr1/fjsp0/sa1.wrd:    train/dr1/fjsp0/sa1.phn:
2360 5200 she               0 2360 h#
5200 9680 had               2360 3720 sh
9680 11077 your             3720 5200 iy
11077 16626 dark            5200 6160 hv
16626 22179 suit            6160 8720 ae
22179 24400 in              8720 9680 dcl
24400 30161 greasy          9680 10173 y
30161 36150 wash            10173 11077 axr
36720 41839 water           11077 12019 dcl
41839 44680 all             12019 12257 d
44680 49066 year            ...
TIMIT interpreted graphically
[Figure: the word "had" (5200 to 9680) aligned with its phonemes hv (5200 to 6160), ae (6160 to 8720) and dcl (8720 to 9680)]
TIMIT as Annotation Graph
W = word level
5200 9680 had
P = phoneme level
5200 6160 hv
6160 8720 ae
8720 9680 dcl
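As a rough sketch of how the TIMIT lines for "had" become an annotation graph, the Python below turns each `start end label` line into an arc and treats each distinct sample offset as one shared node. The tuple layout and the function name `timit_arcs` are assumptions made for illustration, not part of TIMIT or AGTK.

```python
# Excerpts of the .wrd and .phn files shown on the slides (the word "had").
WRD = """5200 9680 had"""
PHN = """5200 6160 hv
6160 8720 ae
8720 9680 dcl"""


def timit_arcs(text, tier):
    """Parse 'start end label' lines into (start, end, tier, label) arcs."""
    arcs = []
    for line in text.splitlines():
        start, end, label = line.split()
        arcs.append((int(start), int(end), tier, label))
    return arcs


# W = word level, P = phoneme level, as on the slide.
arcs = timit_arcs(WRD, "W") + timit_arcs(PHN, "P")

# Assumption: one node per distinct offset, so arcs that meet at the same
# sample share a node.
nodes = sorted({t for start, end, _, _ in arcs for t in (start, end)})
```

With this reading, the word arc and the three phoneme arcs span the same pair of end nodes (5200 and 9680), which is how the two tiers stay aligned without any extra cross-reference.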
BAS Partitur
Goal: a common format for research results from many German speech projects.
A multi-tier description of speech signals:
  KAN - the canonical transcription
  ORT - orthographic transcription
  TRL - transliteration
  MAU - phonetic transcription
  DAS - dialogue act transcription
BAS Partitur: example
KAN:0 j'a:       ORT:0 ja        MAU: 4160 1119 0 j
KAN:1 S'2:n@n    ORT:1 schönen   MAU: 5280 2239 0 a:
KAN:2 d'aNk      ORT:2 Dank      MAU: 7520 2399 1 S
KAN:3 das+       ORT:3 das       MAU: 9920 1599 1 2:
KAN:4 vE:r@+     ORT:4 wäre      MAU: 11520 479 1 n
KAN:5 z'e:6      ORT:5 sehr      MAU: 12000 479 1 n
KAN:6 n'Et       ORT:6 nett      MAU: 12480 479 -1
DAS:0,1,2 @(THANK_INIT BA)
DAS:3,4,5,6 @(FEEDBACK_ACKNOWLEDGEMENT BA)
BAS Partitur graphical structure:
[Figure: tiers KAN (j'a:, S'2:n@n), ORT (ja, schönen), DAS (@(THANK_INIT BA)) and MAU (j, a:), with time marks 4160, 5280 and 7520]
KAN:0 j'a:       ORT:0 ja        MAU: 4160 1119 0 j
KAN:1 S'2:n@n    ORT:1 schönen   MAU: 5280 2239 0 a:
DAS:0,1,2 @(THANK_INIT BA)
Partitur differences from TIMIT
- File organization: everything is in a single file (even metadata)
- Time marking:
  - time anchors are in only one tier (MAU)
  - time anchors use <start offset, duration-1>
- Relationship between the tiers:
  - KAN tier supplies a set of identifiers
  - MAU tier: several lines for each KAN line
  - DAS tier: one line for several KAN lines
- Temporal structure: MAU and DAS define convex intervals
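The <start offset, duration-1> convention can be made concrete with a short sketch. Assuming each MAU line has the fields `MAU: start duration-1 KAN-index label`, as in the example lines, the end offset is `start + (duration-1) + 1`. The function name `parse_mau` and the dictionary layout are hypothetical.

```python
# First three MAU lines from the Partitur example.
MAU_LINES = [
    "MAU: 4160 1119 0 j",
    "MAU: 5280 2239 0 a:",
    "MAU: 7520 2399 1 S",
]


def parse_mau(line):
    """Convert one MAU line to explicit start/end offsets plus its KAN link."""
    _, start, dur_minus_1, kan_ref, label = line.split()
    start = int(start)
    end = start + int(dur_minus_1) + 1   # the file stores duration - 1
    return {"start": start, "end": end, "kan": int(kan_ref), "label": label}


segments = [parse_mau(line) for line in MAU_LINES]

# Several MAU lines point at the same KAN identifier: group labels per word.
by_word = {}
for seg in segments:
    by_word.setdefault(seg["kan"], []).append(seg["label"])
```

Under this reading the end of each segment coincides with the start of the next (4160 + 1119 + 1 = 5280), so consecutive MAU segments can share a node when converted to an annotation graph.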
BAS Partitur: Annotation graph
ORT: 0 ja         MAU: 4160 1119 0 j
ORT: 1 schönen    MAU: 5280 2239 0 a:
                  MAU: 7520 2399 1 S
                  MAU: 9920 1599 1 2:
                  MAU: 11520 479 1 n
DAS:0,1,2 @(THANK_INIT BA)
CHILDES
- Child language acquisition data
- Archive organized by Brian MacWhinney at CMU
- CHAT transcription format
- Tools for creating, browsing, searching
- Contributions by many researchers around the world
CHILDES Annotation
*ROS: yahoo.
%snd: "boys73a.aiff" 7349 8338
*FAT: you got a lot more to do # don't you?
%snd: "boys73a.aiff" 8607 9999
*MAR: yeah.
%snd: "boys73a.aiff" 10482 10839
*MAR: because I'm not ready to go to <the bathroom> [>] +/.
%snd: "boys73a.aiff" 11621 13784
CHILDES differences from TIMIT
- long recordings with multiple speakers
- time specified at turn level only
- there are gaps between the turns
- the transcription contains embedded annotations
CHILDES annotation graph
*ROS: yahoo.
%snd: "boys73a.aiff" 7349 8338
*FAT: you got a lot more to do # don't you?
%snd: "boys73a.aiff" 8607 9999
NB: incomplete time info, disconnected structure
CHILDES: RDB connection
ID  NAME   ROLE    AGE     SEX   BIRTH
1   Ross   Child   6;3.11  male  23-DEC-1977
2   Mark   Child   4;4.15  male  19-NOV-1979
3   Brian  Father
4   Mary   Mother
“metadata” about speakers, recordings etc. stored separately in relational tables
LACITO
- Langues et Civilisations à Tradition Orale
  - recordings of unwritten languages, collected and transcribed over three decades
  - preservation and dissemination
- Based on XML
  - markup for alignment to audio signal
  - different XSL style sheets for display
  - generating HTML with hyperlinks to audio clips
LACITO example
<S id="s1">
  <AUDIO start="2.3656" end="7.9256"/>
  <TRANSCR>
    <W><FORM>nakpu</FORM> <GLS>deux</GLS></W>
    <W><FORM>nonotso</FORM> <GLS>soeurs</GLS></W>
    <W><FORM>si&x014b;</FORM> <GLS>bois</GLS></W>
    <W><FORM>pa</FORM> <GLS>faire</GLS></W>
    <W><FORM>la&x0294;natshem</FORM> <GLS>allerent</GLS></W>
    <W><FORM>are</FORM> <GLS>dit.on</GLS></W>
    <PONCT>.</PONCT>
  </TRANSCR>
  <TRADUC lang="Francais">On raconte que deux soeurs allerent chercher du bois.</TRADUC>
  <TRADUC lang="Anglais">They say that two sisters went to get firewood.</TRADUC>
</S>
LACITO as AG
<AUDIO start="2.3656" end="7.9256"/>
<W><FORM>nakpu</FORM> <GLS>deux</GLS></W>
<W><FORM>nonotso</FORM> <GLS>soeurs</GLS></W>
<W><FORM>si&x014b;</FORM> <GLS>bois</GLS></W>
<W><FORM>pa</FORM> <GLS>faire</GLS></W>
<TRADUC lang="Francais">On raconte que deux ...</TRADUC>
<TRADUC lang="Anglais">They say that two ...</TRADUC>
LACITO discussion
- Two kinds of partiality for times:
  - where they are simply unknown
  - where they are inappropriate
- Unknown times:
  - the annotation is incomplete
  - time-alignment is coarse-grained
- Inappropriate times:
  - for word boundaries in the phrasal translation
  - for punctuation?
LDC Call Home example
980.18 989.56 A: you know, given how he's how far he's gotten, you know, he got his degree at &Tufts and all, I found that surprising that for the first time as an adult they're diagnosing this. %um
989.42 991.86 B: %mm. I wonder about it. But anyway.
991.75 994.65 A: yeah, but that's what he said. And %um
994.19 994.46 B: yeah.
995.21 996.59 A: He %um
996.51 997.61 B: Whatever's helpful.
997.40 1002.55 A: Right. So he found this new job as a financial consultant and seems to be happy with that.
1003.14 1003.45 B: Good.
LDC CallHome as AG
995.21 996.59 A: He %um
996.51 997.61 B: Whatever's helpful.
997.40 1002.55 A: Right. So ...
CallHome discussion
- Speaker overlap
  - no special devices, just turn time-marks
  - scales for an arbitrary number of speakers
- Information about word-level overlap is left ambiguous
- Additional time references could easily specify word overlap
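Because overlap is carried only by the turn time-marks, it can be recovered by comparing intervals. The sketch below is an illustration under assumptions (the tuple layout and the function name `overlaps` are hypothetical); it uses the three turns from the CallHome AG example.

```python
# (start, end, speaker, text) for the three turns shown above.
turns = [
    (995.21, 996.59, "A", "He %um"),
    (996.51, 997.61, "B", "Whatever's helpful."),
    (997.40, 1002.55, "A", "Right. So ..."),
]


def overlaps(turns):
    """Return (speaker1, speaker2, start, end) for each overlapping turn pair."""
    found = []
    for i, (s1, e1, spk1, _) in enumerate(turns):
        for s2, e2, spk2, _ in turns[i + 1:]:
            start, end = max(s1, s2), min(e1, e2)
            # Two intervals overlap when the later start precedes the earlier end.
            if spk1 != spk2 and start < end:
                found.append((spk1, spk2, start, end))
    return found


result = overlaps(turns)
```

The pairwise comparison works unchanged for any number of speakers. It reports only that two turns overlap in time; which words overlap stays ambiguous unless word-level time references are added.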
NIST UTF (circa 1999)
- NIST: National Institute of Standards and Technology (USA)
- UTF: “Universal Transcription Format”
- Intended to generalize over several earlier LDC broadcast news and conversation transcription formats
- Special treatment for: metadata, time stamps, speaker overlap, contractions
- N.B. now abandoned in favor of AG-based representations
NIST UTF example (from BN)
<turn speaker="Roger_Hedgecock" spkrtype="male" dialect= "native" start="2348.811875" end="2391.606000" mode="spontaneous" fidelity="high">
<time sec="2387.353875"> on welfare and away from real ownership \{breath and
<contraction e_form="[that=>that]['s=>is]">that's a real problem in this
<b_overlap start="2391.115375" end="2391.606000"> country<e_overlap>
</turn>
<turn speaker="Gloria_Allred" spkrtype="female" dialect= "native" start="2391.299625" end="2439.820312" mode="spontaneous" fidelity="high">
<b_overlap start="2391.299625" end="2391.606000">well i<e_overlap> think the real problem is that %uh these kinds of republican attacks
<time sec="2395.462500">i see as code words for discrimination
</turn>
NIST UTF: turn element
<turn speaker="Roger_Hedgecock" spkrtype="male" dialect= "native" start="2348.811875" end="2391.606000" mode="spontaneous" fidelity="high">
NIST UTF: Contraction
<contraction e_form="[that=>that]['s=>is]"> that's
NIST UTF: overlap
<b_overlap start="2391.115375" end="2391.606000">country<e_overlap>
NIST UTF: discussion
- Relational data (e.g. speaker demographics) is embedded in the annotation (redundantly).
- Time stamps are stored in three different places.
- Speaker overlap is convolved with the speaker turn, so a time relation with an external event disrupts the internal structure of a turn.
- Contractions are treated in a way that facilitates linking to the lexicon, but may be hard to ignore in a search function.
NIST UTF as AG
AG contraction treatment
- Additional textual annotations (e.g. for expanding a contraction) don't complicate the existing representation
-- facilitates search
NIST UTF / AG version
- Metadata
  - stored in a separate RDB table (cf. CHILDES)
- Time stamps
  - stored in a single place -- AG nodes
- Speaker overlap
  - not convolved with the speaker turn
  - so temporal relationship with an external event remains external to the structure of a turn
- Contractions
  - no new device, easily ignored in search
- No artificial order on speaker turns
Switchboard
- Corpus of 2400 5-minute telephone conversations collected at Texas Instruments in 1991
- Transcribed and aligned on three levels: conversation, speaker turn, word
- Subsequently annotated for: POS, syntactic structure, breath groups, disfluencies, speech acts, phonetic segments, etc.
- Then re-transcribed with many corrections!
-- Proliferation of layers with different tokenizations
-- Problem of correction after annotation
SWB example (1, 2)
B 21.86 0.26 Metric
B 22.12 0.26 system,
B 22.38 0.18 no
B 22.56 0.06 one's
B 22.86 0.32 very,
B 23.88 0.14 uh,
B 24.02 0.16 no
B 24.18 0.32 one
B 24.52 0.28 wants
B 24.80 0.06 it
B 24.86 0.12 at
B 24.98 0.22 all
B 25.66 0.22 seems
B 25.88 0.22 like.

[ Metric/JJ system/NN ] ,/, [ no/DT one/NN ] 's/BES very/RB ,/, [ uh/UH ] ,/, [ no/DT one/NN ] wants/VBZ [ it/PRP ] at/IN [ all/DT ] seems/VBZ like/IN ./.
SWB example (3, 4)
B.22: Yeah, / no one seems to be adopting it. / Metric system, [ no one's very, + {F uh, } no one wants ] it at all seems like. /
((S (NP-TPC Metric system) , (S-TPC-1 (EDITED (RM [) (S (NP-SBJ no one) (VP 's (ADJP-PRD-UNF very))) , (IP +)) (INTJ uh) , (NP-SBJ no one) (VP wants (RS ]) (NP it) (ADVP at all))) (NP-SBJ *) (VP seems (SBAR like (S *T*-1))) . E_S))
Switchboard: AG
Another multiple annotation
It is quite realistic to have this many diverse annotations (and more!) for the same material...
AG formalization: Background
- Annotation - the basic action:
  - associate a label with an extent of signal
  - labels may be of different types
  - different types may span different amounts of time; need not form a hierarchy
- Minimal formalization:
  - directed graph
  - typed, fielded records on the arcs
  - optional time references on the nodes
Timelines
- Nodes are anchored to signals using offsets
- An annotation may reference more than one signal, e.g.
  - simultaneous audio and video signals
  - signals from multiple microphones
  - audio and physiological signals
- All the signals covered by a given annotation must be from the same "flow of time" = timeline T
  - but signals may cover a timeline only partially
- (Other ordered sets, such as the sequence of characters in a text, may also be treated as timelines...)
Two Signals, One Timeline
(Could be treated as a single multi-channel signal -- but different channels might be in different files, have different frame rates, etc.)
AG: Formal Definition
An Annotation Graph G over a label set L and timeline T is a 3-tuple <N,A,t>:
- N = set of nodes
- A = set of arcs labelled with elements of L
- t = partial function from N to T
satisfying the following conditions:
1. <N,A> is acyclic, with no nodes of degree zero
2. for any path from node n1 to n2, if t(n1) and t(n2) are defined, then t(n1) <= t(n2)
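The two conditions can be checked mechanically. The function below is a minimal sketch, assuming nodes are given as a set of ids, arcs as `(src, dst, label)` tuples, and the partial function t as a dictionary that simply omits unanchored nodes. The name `is_valid_ag` is hypothetical and not part of AGTK.

```python
from collections import defaultdict


def is_valid_ag(nodes, arcs, t):
    """Check conditions 1 and 2 for <N, A, t>; t maps only anchored nodes."""
    succ = defaultdict(list)
    touched = set()
    for src, dst, _label in arcs:
        succ[src].append(dst)
        touched.update((src, dst))

    # Condition 1 (second half): no node of degree zero, and no arc may
    # mention a node outside N.
    if touched != set(nodes):
        return False

    # Condition 1 (first half): acyclic, via depth-first search with colors.
    WHITE, GREY, BLACK = 0, 1, 2
    color = {n: WHITE for n in nodes}

    def has_cycle(n):
        color[n] = GREY
        for m in succ[n]:
            if color[m] == GREY or (color[m] == WHITE and has_cycle(m)):
                return True
        color[n] = BLACK
        return False

    if any(color[n] == WHITE and has_cycle(n) for n in nodes):
        return False

    # Condition 2: along every path, anchored times must not decrease.
    for n1 in nodes:
        if n1 not in t:
            continue
        stack, seen = list(succ[n1]), set()
        while stack:
            n2 = stack.pop()
            if n2 in seen:
                continue
            seen.add(n2)
            if n2 in t and t[n1] > t[n2]:
                return False
            stack.extend(succ[n2])
    return True


nodes = {"n0", "n1", "n2"}
arcs = [("n0", "n1", {"word": "this"}), ("n1", "n2", {"word": "is"})]
ok = is_valid_ag(nodes, arcs, {"n0": 1.0, "n2": 2.5})          # n1 unanchored
bad_time = is_valid_ag(nodes, arcs, {"n0": 3.0, "n2": 2.5})    # time runs backwards
orphan = is_valid_ag(nodes | {"n3"}, arcs, {})                 # n3 has degree zero
cyclic = is_valid_ag(nodes, arcs + [("n2", "n0", {})], {})     # n0 -> n1 -> n2 -> n0
```

Condition 2 is checked over every reachable pair of anchored nodes, not only over adjacent ones, because unanchored nodes may sit between them on a path.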
Condition 1
1. <N,A> is acyclic, with no nodes of degree zero
1a. AGs are acyclic
  - expresses the linearity of signal annotations
  - an important property wrt implementations and to QLs containing path expressions
1b. AGs have no orphan nodes
  - the only point of nodes is to anchor the arcs
  - avoids the situation of AGs that are identical but for orphan nodes
Condition 2
for any path from node n1 to n2, if t(n1) and t(n2) are defined, then t(n1) <= t(n2)
2. AGs respect the flow of time (or the structure of another anchoring space)
AG: Interpretation of Labels
Arc labels may be interpreted as:
- substantive content conforming to a coding practice
- as meta-commentary
- as a reference to other material
- as an identifier
- as arbitrary binary data
Choice of label interpretations falls outside the scope of the formalism
AG: Expressiveness
Is the formalism too minimalist?
Some things that some people want:
1. cross-reference from a label to another arbitrary label, arc or node
2. labels as well as anchors for nodes
3. anchoring nodes to arcs or labels rather than timelines
4. anchoring arcs/labels in 2- or 3-dimensional spaces
5. recursive structures in labels
“Core AG” has sufficient expressive capacity to encode, in an intuitive way, all commonly used formats, and also good properties wrt creation, maintenance, search
Our strategy:
- see how far we can go with this core
- dispense with more complex syntax and focus on semantics
- but some of (1) has been added in core AG implementation, and (4) has been added in “ATLAS” (NIST version)
Structures for a single layer
All of these have (one or more) natural representations in the basic AG formalism.
Multiple layers can of course be added in a general way.
Equivalence classes
Equivalence classes (joint reference to an external ID) provide a way to establish symmetrical inter-label linkages without any new formal devices
AG as algebra
- An AG can be represented as a set of arcs, each with an associated label and (optionally-anchored) source and destination nodes
- The power set of this arc set defines a boolean algebra (as usual)
- Every member of the power set is itself a well-defined AG
- This algebra can be used for queries, just as the relational algebra is for RDBs
- Adding e.g. pointers from labels to other arcs compromises this property (because arc subsets are not well-formed if pointers cannot be dereferenced)
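A small sketch may help show what the boolean algebra buys for queries. Assuming each arc is a self-contained tuple carrying its own source and destination anchors (the tuple layout and the helper `select` are hypothetical), query results are just subsets of the arc set and combine with ordinary set operations. The data is the TIMIT word "had" with its phonemes.

```python
# Each arc: (source node, source time, destination node, destination time, label)
ARCS = frozenset({
    ("n0", 5200, "n3", 9680, ("W", "had")),
    ("n0", 5200, "n1", 6160, ("P", "hv")),
    ("n1", 6160, "n2", 8720, ("P", "ae")),
    ("n2", 8720, "n3", 9680, ("P", "dcl")),
})


def select(arcs, pred):
    """A query result is simply the subset of arcs satisfying a predicate."""
    return frozenset(a for a in arcs if pred(a))


phonemes = select(ARCS, lambda a: a[4][0] == "P")   # phoneme-level arcs
early = select(ARCS, lambda a: a[3] <= 8720)        # arcs ending by offset 8720

both = phonemes & early      # boolean AND of the two queries
either = phonemes | early    # boolean OR
neither = ARCS - either      # complement relative to the whole graph
```

Because every arc carries its own end points, each of these subsets can be read as an annotation graph in its own right. This is the property that would be lost if labels held pointers to arcs that a subset might leave out.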
AG as RDB
- An AG can therefore also be interpreted as a relational table, or (more conveniently) as a set of three relational tables
- This allows standard RDB implementations to be used for AG storage and retrieval
- Obvious advantages, though a standard RDB may not use AG structure optimally...
Relational Representation
[Figure: arc Ann1 with labels <l1,l2,...,ln>, running from anchor a1 (offset t1) to anchor a2 (offset t2)]
Three relations: anchor, annotation (=arc), feature (=label)
Anchor Relation
AnchorId  Offset
a1        t1
a2        t2
Annotation (arc) Relation
AnnotationId  Source  Destination
Ann1          a1      a2
Feature Relation
AnnotationId  Feature  Value
Ann1          F1       l1
Ann1          F2       l2
...           ...      ...
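The three relations can be tried out with Python's built-in `sqlite3` module. This is a sketch under assumptions: the column types and the sample row (the TIMIT word "had" from 5200 to 9680, stored under a feature named `word`) are chosen for illustration, while the table and column names follow the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE anchor     (AnchorId TEXT PRIMARY KEY, "Offset" REAL);
CREATE TABLE annotation (AnnotationId TEXT PRIMARY KEY, Source TEXT, Destination TEXT);
CREATE TABLE feature    (AnnotationId TEXT, Feature TEXT, Value TEXT);
""")

# One arc, Ann1, from anchor a1 to anchor a2, carrying a single feature.
conn.executemany("INSERT INTO anchor VALUES (?, ?)", [("a1", 5200), ("a2", 9680)])
conn.execute("INSERT INTO annotation VALUES (?, ?, ?)", ("Ann1", "a1", "a2"))
conn.execute("INSERT INTO feature VALUES (?, ?, ?)", ("Ann1", "word", "had"))

# Recover the labeled, anchored arc by joining the three relations.
row = conn.execute("""
SELECT f.Value, s."Offset", d."Offset"
FROM feature f
JOIN annotation a ON a.AnnotationId = f.AnnotationId
JOIN anchor s ON s.AnchorId = a.Source
JOIN anchor d ON d.AnchorId = a.Destination
WHERE f.Feature = 'word' AND f.Value = 'had'
""").fetchone()
```

`Offset` is quoted because it is also an SQL keyword. An unanchored node would simply have a NULL offset, or no row in the anchor relation, depending on the design chosen.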
Queries across multiple tables
ID Sex DR Ht
AKS0 F 1 5'04"
ASW0 F 5 5'06"
BJL0 F 5 5'07"
train/dr2/fbjl0/
ha /hh aa1/
habit /hh ae1 b ix t/
had /hh ae1 d/
hafta /hh ae1 f t ax/
Queries on AG Tables
select * from FEATURE
  where FEATURE.AGID="Timit:AG80"
select ANNOTATIONID, SPKRINFO.ID
  from FEATURE, SPKRINFO
  where SPKRINFO.DR=1
  and SPKRINFO.Ht=70
  and FEATURE.VALUE="dark"
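A query that combines annotation tables with speaker metadata can be sketched with `sqlite3` as follows. Several things here are assumptions made for illustration only: the linking table `AGSPKR` (the slides do not show how an AG is tied to a speaker), the annotation ids, the second AG id, and heights stored in inches (so 5'04" becomes 64).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE FEATURE  (AGID TEXT, ANNOTATIONID TEXT, FEATURE TEXT, VALUE TEXT);
CREATE TABLE SPKRINFO (ID TEXT PRIMARY KEY, Sex TEXT, DR INTEGER, Ht INTEGER);
-- Hypothetical linking table (not on the slides): which speaker each AG belongs to.
CREATE TABLE AGSPKR   (AGID TEXT, SPKRID TEXT);
""")

# Speaker rows from the metadata table, with heights converted to inches.
conn.executemany("INSERT INTO SPKRINFO VALUES (?, ?, ?, ?)",
                 [("AKS0", "F", 1, 64), ("ASW0", "F", 5, 66), ("BJL0", "F", 5, 67)])
# Illustrative rows: ids and the AG-to-speaker mapping are made up.
conn.executemany("INSERT INTO AGSPKR VALUES (?, ?)",
                 [("Timit:AG80", "AKS0"), ("Timit:AG81", "BJL0")])
conn.executemany("INSERT INTO FEATURE VALUES (?, ?, ?, ?)",
                 [("Timit:AG80", "Ann4", "word", "dark"),
                  ("Timit:AG81", "Ann9", "word", "dark")])

# Annotations of the word "dark" spoken by speakers from dialect region 1
# who are 64 inches tall.
rows = conn.execute("""
SELECT FEATURE.ANNOTATIONID, SPKRINFO.ID
FROM FEATURE
JOIN AGSPKR   ON AGSPKR.AGID = FEATURE.AGID
JOIN SPKRINFO ON SPKRINFO.ID = AGSPKR.SPKRID
WHERE SPKRINFO.DR = 1 AND SPKRINFO.Ht = 64 AND FEATURE.VALUE = 'dark'
""").fetchall()
```

The explicit join condition keeps the result to annotations that actually belong to the selected speakers. Listing two tables in the `from` clause with no linking condition would pair every matching annotation with every matching speaker.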
AG software
- AGTK provides API and language bindings
  - version 2.0 recently released
- Sample applications
- Open-source license
- Available on sourceforge:
AGTK architecture
API Summary
- Functions for creating, accessing, modifying, storing and loading AGs
- C++ library
- Compiles on Unix and Windows
- Scripting language access: Python, Tcl/Tk
File I/O Library
Approach:
- build import methods for all widely used formats
- public API & documentation to encourage others to contribute code for their formats
Currently supported:
- AIF (ATLAS Interchange Format - XML)
- BAS, BU, CALLHOME, CSV, Switchboard, TIMIT, Treebank, xlabel
Integration with other tools
Example: WaveSurfer/SNACK
Sjölander and Beskow
www.speech.kth.se/wavesurfer/
- open source software for sound visualization, analysis and manipulation
- Linux, Windows 95/98/NT/2k, Mac, Solaris, ...
- customizable, extensible, embeddable
- can read and write: wav, au, aiff, mp3, csl, sd, sphere
- unlimited file size
- Unicode support
Wavesurfer Screenshot 1
Wavesurfer Screenshot 2
Wavesurfer Screenshot 3
Wavesurfer Screenshot 4
Annotation Component: Spreadsheet (TRAINS+DAMSL)
Annotation is presented here in spreadsheet mode
- Each row is an annotation of a stretch of signal
- Each column is a type of annotation
TableTrans tool
Seamless integration of AGTK for annotation, and Wavesurfer for audio display and playback.
Components in TableTrans
Another annotation GUI
Issues for the future
Some positive things
- “stand-off” (rather than in-line) annotation is now common
  - though by no means universal
  - but in-line annotators mostly realize they are sinful
- AGTK implementation is mature
  - libraries are well designed & implemented
  - good integration with GUIs and DB backends
  - can read/write many common formats
- Some AG-based tools are good
  - basically, those that have really been used
  - demand pull & influence of users on development
Issues for the future
Some things need more work
- AG API and AGTK are not yet widely used
- Many AG-based tools are rough sketches
- NIST ATLAS is not popular with researchers (Java, complexity)
- For many projects, something simpler & less general is still the local optimum:
  - lines of tab-separated fields, or
  - in-line mark-up (XML or ad hoc), or
  - other legacy or new ad hoc formats
- but it’s still early days...