Improving Interpretive Interfaces for Math Entry
Richard Zanibbi
Department of Computer Science
Rochester Institute of Technology
RIT Document and Pattern Recognition Lab (DPRL)
Goals:
1. Improve theory and tools for constructing and evaluating pattern recognition systems
2. Apply these to problems in document recognition and pen-based computing
Members:
• Richard Zanibbi
• Kurt Kluever (Master’s student)
• New members welcome!
http://www.cs.rit.edu/~rlaz/dprl.html
Current Directions:
1. Theory and Tools:
• Tools for recognition module integration and evaluation, such as the Recognition Strategy Language (Zanibbi et al.)
• Game-theoretic models of recognition problems and systems (e.g. for classifier combination)
• Machine learning algorithms for system optimization
2. Applications:
• Pen and image-based math entry (the lab maintains the open-source Freehand Formula Entry System; Smithies, Novins, Arvo, Zanibbi et al.)
• Optical character recognition (OCR)
• Image and text-based document retrieval
• “CAPTCHAs” (for distinguishing humans from bots)
• Table recognition, etc.
Interpretive Interfaces for Math Entry
Pen-Based Math Entry
Recognition Challenges
• Large number of symbols (e.g., more than 500 in LaTeX), many similar in structure (e.g., 0 and O)
• Layout of symbols on baselines can be ambiguous
• Little redundancy
• Context influences symbol identity and layout interpretation
Example: Freehand Formula Entry System/DRACULAE
Contributors:
FFES was first developed in 1998 as an MSc project at the University of Otago, New Zealand (Smithies, Novins), using the CIT tools of Jim Arvo et al.
Since then, contributors have come from Queen’s University (CA), Concordia University (CA), and around the world (CMU, UC Berkeley, and companies and non-profits in California and France).
DRACULAE (Zanibbi, 2002)
“Diagram Recognition Application for Computer Understanding of Large Algebraic Expressions”
DRACULAE: Layout Classes for Symbols
Symbol name defines class membership.
DRACULAE Layout Analysis: Sketch
Algorithm:
1. Symbols assigned layout type (class) based on symbol identity
2. Sort symbols left-right on leftmost edge of Bounding Box
3. Create baseline structure tree with region node “Expression”
4. Recursively:
a) Search right-to-left to locate the leftmost (“start”) symbol of the baseline, using dominance rules for pairs of symbol layout classes
b) From the start symbol, search left-to-right in the symbol list for symbols adjacent on the baseline (Zhang: fuzzy version)
c) Add baseline symbols as children of the parent region node
d) Place non-baseline symbols in lists associated with region nodes (e.g., for superscript, subscript, and bottom-left regions)
e) Apply a-d to each new region, until no new regions created
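The layout-analysis steps above can be sketched in Python. This is a minimal illustration under simplifying assumptions, not the DRACULAE implementation: boxes use image coordinates (y grows downward), the start symbol is simply the leftmost symbol (the dominance rules of step a are omitted), and a symbol joins the current baseline when its vertical centre falls inside the middle band of the preceding baseline symbol; otherwise it goes to that symbol's SUPER or SUBSC list. The names `Symbol`, `classify`, `parse_baseline`, `to_latex`, and the `band` threshold are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Symbol:
    label: str
    minx: float
    miny: float  # image coordinates: y grows downward
    maxx: float
    maxy: float

    @property
    def cy(self) -> float:
        """Vertical centre of the bounding box."""
        return (self.miny + self.maxy) / 2


@dataclass
class BaselineNode:
    symbol: Symbol
    # region name ("SUPER" or "SUBSC") -> nested baseline (list of BaselineNode)
    regions: dict = field(default_factory=dict)


def classify(sym: Symbol, base: Symbol, band: float = 0.25) -> str:
    """Hypothetical rule: compare sym's centre with the middle band of base."""
    h = base.maxy - base.miny
    if sym.cy < base.miny + band * h:
        return "SUPER"
    if sym.cy > base.maxy - band * h:
        return "SUBSC"
    return "BASELINE"


def attach(node: BaselineNode, pending: dict) -> None:
    """Step e: each non-empty region list is parsed recursively as its own baseline."""
    for name, syms in pending.items():
        if syms:
            node.regions[name] = parse_baseline(syms)


def parse_baseline(symbols: list) -> list:
    if not symbols:
        return []
    ordered = sorted(symbols, key=lambda s: s.minx)  # step 2: sort on left edge
    baseline = [BaselineNode(ordered[0])]  # simplified step a: leftmost symbol starts
    pending = {"SUPER": [], "SUBSC": []}  # step d: non-baseline symbols
    for sym in ordered[1:]:  # step b: left-to-right search
        kind = classify(sym, baseline[-1].symbol)
        if kind == "BASELINE":
            attach(baseline[-1], pending)
            baseline.append(BaselineNode(sym))  # step c: new baseline child
            pending = {"SUPER": [], "SUBSC": []}
        else:
            pending[kind].append(sym)
    attach(baseline[-1], pending)
    return baseline


def to_latex(nodes: list) -> str:
    parts = []
    for node in nodes:
        parts.append(node.symbol.label)
        if "SUBSC" in node.regions:
            parts.append("_{" + to_latex(node.regions["SUBSC"]) + "}")
        if "SUPER" in node.regions:
            parts.append("^{" + to_latex(node.regions["SUPER"]) + "}")
    return "".join(parts)


# Hypothetical bounding boxes for x^2 + 1
example = [
    Symbol("x", 0, 10, 10, 20),
    Symbol("2", 11, 2, 16, 9),
    Symbol("+", 18, 12, 26, 18),
    Symbol("1", 28, 10, 34, 20),
]
print(to_latex(parse_baseline(example)))  # → x^{2}+1
```

Because each SUPER/SUBSC list is parsed by the same function, nested scripts such as x^{2^n} come out of the step e recursion; a fuller version would add the other region types (e.g., bottom-left) and class-specific thresholds.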
Expanding the View…
Integration of scanned and pen-based expressions: Infty system, FFES prototype (implemented by Josh Zimler, 2006)
Long-Term Goal: Flexible input and combination
Allow one to easily combine and then reformat/interpret:
• LaTeX, eqn, etc.
• MATLAB, Mathematica, etc.
• Handwritten expressions (tablet/mouse)
• Scanned images of handwritten or typeset expressions
• “Vector drawing” interface input, e.g., as in Xpress (Pollanen et al.)
Other Math Entry Interfaces
• Natural Log by Matsakis, Miller, and Viola (MIT)
• JIMHR: (Java-Based) Interactive Math Handwriting Recognizer, a merge and port of FFES/DRACULAE and the Natural Log system, by Joy-Gong Ho (Acuitus Corp., USA)
• JMathNotes by Ernesto Tapia Rodriguez (Free University of Berlin)
• Infty by M. Suzuki et al. (Kyushu University, Japan)
• MathJournal by XThink Inc.: the first commercial pen-based math recognition system
• MathPad by Joseph LaViola
Links available: http://www.cs.rit.edu/~rlaz
The Recognition Strategy Language (RSL)
Motivation: A high-level language for pattern recognition algorithms
Table Recognition Survey (Zanibbi et al. 2004)
Summarizes the literature in terms of observations, transformations, and inferences. The techniques studied are characterized as making the following types of inferences (decisions):
• Parameter values (e.g., thresholds)
• Interpretation model operations:
– Segmentation (identifying regions of interest in data)
– Classification (assigning types to regions)
– Relating regions (e.g., topology such as adjacencies)
– Rejecting segments, classes, and region relationships
(Unanswered) Question: How should we combine recognition modules in a complex math entry system?
Example: Simple Table Structure Recognition Algorithm (Part 1)
model regions
  Image Word Cell % default: 'Region'
  Row Column
end regions

model relations
  % default: 'contains'
  adjacent_right adjacent_below
end relations

recognition parameters
  sMaxRowSeparation 2 % millimetres
  sMaxColumnSeparation 2 % millimetres
  aResolution 300 % dpi; default
end parameters
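The separation thresholds are physical distances in millimetres, while word bounding boxes are measured in pixels, so a threshold only becomes usable once the scan resolution is known (presumably why the strategy adapts aResolution from the image). A minimal sketch of the conversion: at 300 dpi, 2 mm is 2 / 25.4 × 300 ≈ 23.6 pixels. The `right_adjacent` predicate is a hypothetical illustration, not the definition of `defineRightAdjacency`, which is not shown here.

```python
MM_PER_INCH = 25.4


def mm_to_pixels(distance_mm: float, dpi: float) -> float:
    """Convert a physical distance in millimetres to pixels at a scan resolution."""
    return distance_mm / MM_PER_INCH * dpi


def right_adjacent(left, right, max_gap_mm: float, dpi: float) -> bool:
    """Hypothetical test on (minx, miny, maxx, maxy) pixel boxes: `right` starts
    after `left` ends, within the gap threshold, and the boxes overlap vertically."""
    gap = right[0] - left[2]
    overlaps_vertically = left[1] <= right[3] and right[1] <= left[3]
    return 0 <= gap <= mm_to_pixels(max_gap_mm, dpi) and overlaps_vertically


print(round(mm_to_pixels(2, 300), 1))  # → 23.6
```

The same 20-pixel gap that counts as adjacent at 300 dpi exceeds the 2 mm threshold at 200 dpi (about 15.7 pixels), which is why the threshold is stated in millimetres rather than pixels.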
strategy main
  adapt aResolution using getScanResolution() observing {Image} regions
  classify {Word} regions as {Cell}
  relate {Cell} regions with {adjacent_right} using defineRightAdjacency(sMaxRowSeparation, aResolution)
  segment {Cell} regions into {Row} regions using relationClosure() observing {adjacent_right} relations
  relate {Cell} regions with {adjacent_below} using defineLowerAdjacency(sMaxColumnSeparation, aResolution)
  segment {Cell} regions into {Column} regions using relationClosure() observing {adjacent_below} relations
  accept interpretations
end strategy
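The two `relationClosure()` steps group cells that are linked, directly or through a chain, by a relation: cells joined by adjacent_right edges form a Row, and cells joined by adjacent_below edges form a Column. Assuming that is the intended semantics (connected components of the relation graph), a union-find sketch follows; `relation_closure` and the cell names are hypothetical.

```python
def relation_closure(regions: list, pairs: list) -> list:
    """Group regions into connected components of the relation given by `pairs`."""
    parent = {r: r for r in regions}

    def find(r):
        # Follow parent links to the root, halving the path as we go.
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    groups = {}
    for r in regions:  # preserve the input order of regions within each group
        groups.setdefault(find(r), []).append(r)
    return list(groups.values())


# Hypothetical 2 x 3 table: c1 c2 c3 / c4 c5 c6
cells = ["c1", "c2", "c3", "c4", "c5", "c6"]
adjacent_right = [("c1", "c2"), ("c2", "c3"), ("c4", "c5"), ("c5", "c6")]
adjacent_below = [("c1", "c4"), ("c2", "c5"), ("c3", "c6")]
print(relation_closure(cells, adjacent_right))  # rows
print(relation_closure(cells, adjacent_below))  # columns
```

A cell with no relation edges ends up in a group of its own, so every Cell is assigned to exactly one Row and one Column.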
[Figure callouts on the strategy: decision type; decision function (external or trivial); observation specification; parameters]
Running RSL Programs
Input: parameters and a graph with Image and Word regions (bounding boxes)
Output: Cells, Rows, Columns
1. Translate the RSL program to TXL (using TXL)
2. Pass the input graph (a text file) to the program
3. Output (text files):
• Accepted structures (interpretations)
• Log of all decisions and their outcomes
New Metrics Based on Hypothesis Histories: Historical Recall and Precision
[Figure: hypothesis history diagram relating recognition targets (correct hypotheses), generated hypotheses (A ∪ R), and false negatives (F)]
Recall                 4/8 (50.0%)    2/8 (25.0%)    8/8 (100.0%)
Precision              4/12 (33.3%)   2/5 (40.0%)    8/8 (100.0%)
Historical Recall      4/8 (50.0%)    6/8 (75.0%)    8/8 (100.0%)
Historical Precision   4/12 (33.3%)   6/17 (35.3%)   8/19 (42.1%)
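Reading the diagram labels as sets (T = recognition targets, A = accepted hypotheses, R = rejected hypotheses, so A ∪ R is every generated hypothesis), the four metrics can be computed as below. These set definitions are an assumption inferred from the slide labels, not a quoted specification; they do reproduce the middle column of the table (2/8, 2/5, 6/8, 6/17). The data and the names `history_metrics` and `as_text` are hypothetical.

```python
def history_metrics(targets: set, accepted: set, rejected: set) -> dict:
    """Return each metric as (numerator, denominator), as reported on the slide.
    Assumes non-empty targets and accepted sets (otherwise division by zero in as_text)."""
    generated = accepted | rejected          # every hypothesis ever proposed (A ∪ R)
    correct_accepted = accepted & targets    # correct hypotheses in the final output
    correct_generated = generated & targets  # correct hypotheses ever proposed
    return {
        "recall": (len(correct_accepted), len(targets)),
        "precision": (len(correct_accepted), len(accepted)),
        "historical_recall": (len(correct_generated), len(targets)),
        "historical_precision": (len(correct_generated), len(generated)),
    }


def as_text(pair: tuple) -> str:
    n, d = pair
    return f"{n}/{d} ({100 * n / d:.1f}%)"


# Hypothetical data matching the middle column:
# 8 targets; 5 accepted (2 correct); 12 rejected (4 correct).
targets = {f"t{i}" for i in range(1, 9)}
accepted = {"t1", "t2", "h1", "h2", "h3"}
rejected = {"t3", "t4", "t5", "t6"} | {f"h{i}" for i in range(4, 12)}

for name, value in history_metrics(targets, accepted, rejected).items():
    print(f"{name:22} {as_text(value)}")
```

Historical recall exceeds ordinary recall exactly when some correct hypotheses were generated and later rejected, which is the situation the hypothesis history is meant to expose.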
Cell Detection Results (Handley, 2001): RSL Re-implementation on Table ‘a038’ (UW-III)
*Only inference times that affect cells are shown
0: Input (words and lines)
1: Classify words as cells
16: Merge ‘horizontally close’ cells
35: Merge cells sharing column and row assignments; nearly 50% of correct cells are rejected, but new correct cells are also detected
47: Two cells merged, producing the column header ‘Total pore space (percent)’
51: Merge header cells bounded by two horizontal lines
83: Merge cells sharing line and white space separators
RSL and Math Entry
Proposal: “MIN” System
• New interface for math entry and offline experiments
• Use RSL to define recognition strategies and capture results
• (Really) a testbed for studying recognition algorithms and their intelligent combination, organization, and deployment in practice
Goals:
• Compare different approaches to recognizing mathematical expressions (from input to output), represented in RSL
• Allow flexible training, combination, and alteration of various recognition strategies
• Extend RSL to accommodate math and other problem domains more effectively, while remaining abstract
(Some) Relevant Journals and Conferences
Journals
• IEEE Trans. Pattern Analysis and Machine Intelligence
• Machine Learning
• Pattern Recognition
• Pattern Recognition Letters
• Artificial Intelligence
• Int’l J. Document Analysis and Recognition
• …
Conferences
• Int’l Conf. Machine Learning
• IEEE Computer Vision and Pattern Recognition
• Computational Learning Theory (COLT)
• Int’l Conf. Document Analysis and Recognition
• Int’l Workshop on Document Analysis Systems
• …
Thank you.
Questions?
Support:
GCCIS Department of Computer Science