IIIT Hyderabad

Writer Identification and Verification for Online Handwriting

Sachin Gupta

Advisor: Dr. Anoop M. Namboodiri

Handwriting

Graphical representation of thoughts
• Using predefined symbols
• Still used frequently (e.g., note taking)

An acquired skill
• Years of habituation and practice

Complex generation process
• Neuromuscular perceptual-motor task
• Hand contains some 27 bones and 40 muscles

Handwriting Identification

Handwritten documents have an associated identity

Handwriting Identification
• Study of the writership of documents
• Comparison with reference handwritten documents

Individuality (example)

[Figure: handwriting samples labelled X and Y]

Applications of handwriting analysis

• Forensic tool

• Crime detection tool

• Social compatibility tool

• Employment tool

• Business tool

• Self development tool

• Genealogy tool

• Scientific research tool

• Graphic tool

• Health tool

Recognition Vs Identification

Handwriting Recognition
• Automatically understand the underlying text in the document
• Design of automated handwritten document reading systems
• Suppress variation due to writer or handwriting style

Handwriting Identification
• Determine the writer of the document
• Enhance the variation due to different handwriting styles

Problem Statement

Writer Identification
• Identify the writer of a questioned document
• Given: a pool of writers

Writer Verification
• Verify whether the claimed identity is correct
• Given: a database of writers

Forensic Document Analysis
• Verify whether two given documents were written by the same person

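Identification (1:N matching)

Who wrote this document?

[Figure: the questioned document is compared against each writer in the reference database. The comparisons yield matching scores (35, 50 and 65 in the example), and the result is Writer 3.]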
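
The 1:N matching on this slide can be sketched in a few lines. This is only an illustration: the feature vectors, the `toy_score` function and the writer names are hypothetical, and the sketch assumes that a higher matching score means a closer match, which the slide does not state.

```python
from typing import Callable, Dict, List, Tuple


def rank_writers(
    questioned: List[float],
    references: Dict[str, List[float]],
    score: Callable[[List[float], List[float]], float],
) -> List[Tuple[str, float]]:
    """Compare the questioned document with every reference (1:N matching).

    Returns (writer, score) pairs sorted from best to worst match.
    Assumption: a higher score means a closer match.
    """
    scores = {writer: score(questioned, ref) for writer, ref in references.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def toy_score(a: List[float], b: List[float]) -> float:
    """Hypothetical similarity: 100 minus the summed absolute feature difference."""
    return 100.0 - sum(abs(x - y) for x, y in zip(a, b))


# Hypothetical feature vectors, for illustration only.
references = {
    "writer_1": [10.0, 40.0, 30.0],
    "writer_2": [25.0, 20.0, 20.0],
    "writer_3": [30.0, 12.0, 8.0],
}
questioned = [28.0, 10.0, 9.0]

ranking = rank_writers(questioned, references, toy_score)
best_writer = ranking[0][0]
print(best_writer, ranking)
```

Returning the full ranking rather than only the best writer also supports top-2 style evaluation.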

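Verification (1:1 matching)

[Figure: the reference database holds the writers Mayank, Sachin and Amit. A questioned document arrives with the claim "Mayank: I wrote this document!". A comparator checks whether the distance is below a threshold: Yes accepts the claim, No rejects it.]

Threshold: decided based on the within-writer and between-writer distance distributions of the training documents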
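
A minimal sketch of the threshold decision follows. The slide only says that the threshold comes from the within-writer and between-writer distance distributions; the selection rule used here (fewest total training errors, equal cost for both error types) is an assumption, and the distances are made up.

```python
from typing import List


def choose_threshold(within: List[float], between: List[float]) -> float:
    """Pick the candidate threshold with the fewest training errors.

    within:  distances between documents of the same writer
    between: distances between documents of different writers
    Assumptions: both lists are non-empty, both error types cost the same,
    and ties keep the smallest candidate.
    """
    best_threshold, best_errors = None, None
    for candidate in sorted(set(within) | set(between)):
        false_rejects = sum(1 for d in within if d >= candidate)
        false_accepts = sum(1 for d in between if d < candidate)
        errors = false_rejects + false_accepts
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = candidate, errors
    return best_threshold


def verify(distance: float, threshold: float) -> bool:
    """Accept the claimed identity when the distance is below the threshold."""
    return distance < threshold


# Hypothetical training distances, for illustration only.
within_distances = [0.8, 1.0, 1.2, 1.5]
between_distances = [1.4, 2.0, 2.5, 3.0]

threshold = choose_threshold(within_distances, between_distances)
print(threshold, verify(1.1, threshold), verify(2.2, threshold))
```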

Individuality Features

Sub-character and character level
• Shape and size
• Choice of allograph

Word level
• Connections and character spacing
• Aspect ratio

Line level
• Slant and slope
• Word spacing

Paragraph and page level
• Indentations and arrangement of text
• Uniformity of margins

[Figure: character-level individuality, writers W1 and W2]

[Figure: word-level individuality, writers W1 and W2]

Line and Paragraph Level

[Figure: page samples from Writer-1 and Writer-2]

• Slant and slope of lines
• Parallelism of lines
• Word spacing (number of words in a line)
• Uniformity of margins
• Overall texture

Challenges

High within-writer variation
• Due to the mood-dependent nature of handwriting
• No two pieces of handwriting by any individual are the same

Low between-writer variation
• Handwriting must be readable
• Degree of variation is low

Online Vs Offline

Offline
• Matrix of integers
• Only shape and size information is available
• Temporal information about how a stroke is drawn is lost

Online
• Sequence of x-y coordinates and pen-up/pen-down events
• Shape and size information is available
• Sequencing of points and strokes is available
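
The difference between the two representations can be made concrete with a small sketch. The `Stroke` class and `rasterize` helper are hypothetical names introduced for illustration; real capture devices also record timing, which is omitted here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Stroke:
    """Online data: points between one pen-down and the next pen-up, in drawing order."""
    points: List[Tuple[int, int]]


def rasterize(strokes: List[Stroke], width: int, height: int) -> List[List[int]]:
    """Offline data: a matrix of integers that keeps shape but drops point order."""
    image = [[0] * width for _ in range(height)]
    for stroke in strokes:
        for x, y in stroke.points:
            image[y][x] = 1
    return image


# The same diagonal drawn in two directions (hypothetical coordinates).
drawn_forward = [Stroke([(0, 0), (1, 1), (2, 2)])]
drawn_backward = [Stroke([(2, 2), (1, 1), (0, 0)])]

same_image = rasterize(drawn_forward, 3, 3) == rasterize(drawn_backward, 3, 3)
same_online = drawn_forward == drawn_backward
print(same_image, same_online)
```

The two strokes give identical offline images but different online sequences, which is the temporal information the slide refers to.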

Data collection and Annotation

Major hurdles
• Sequential process: devices are needed for online handwriting
• People are reluctant to write
• Standard databases are not available

Online handwriting collection devices are not accurate

Automatic segmentation and annotation
• Research problem

Data collection
• 600 pages of data from around 50 writers in various scripts

State of the Art

Done by handwriting experts
• Mostly manually
• State-of-the-art systems are not available

Using
• Context-dependent information such as origin, type and condition of the documents
• Difficult to model mathematically

Theme

Identifying consistent features automatically
• To discriminate between writers

Usability of discriminating features
• Preserve discrimination

Major Contributions

Text-independent writer identification
• Designing a codebook of writers
• Automatically identifying and extracting discriminating features

Text-dependent writer verification
• Writer-specific text generation
• Robust to forgery

Forensic document examination
• Repudiation detection in handwritten documents

Text-independent writer identification

Text-independent?

Underlying text is not known
• Data is not annotated
• Given: sequence of strokes and x-y coordinate values

Challenges of text-independent identification
• Extract consistent curves (features) from documents
• Compare similar features between two documents
• Design a codebook for each individual writer

Consistency…

[Figure: samples labelled X and Y]

Codebook of a writer

Six different clusters extracted from Devanagari script.

Theoretical background

Handwriting modeling studies
• A stroke is the combination of different forces
• Handwriting curves become consistent due to habituation

Relative velocity points of strokes are constant for the same writer (empirical results)

[Figure: a stroke from Devanagari script and the velocity profile of that stroke]
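
A velocity profile like the one in the figure can be computed from timestamped pen samples. The sketch below is an assumption about how such a profile might be obtained (finite differences between consecutive samples); the sample values are invented. It also shows that the positions of the speed extrema do not change when the same stroke is written twice as fast.

```python
import math
from typing import List, Tuple

Sample = Tuple[float, float, float]  # (x, y, t) with strictly increasing t


def velocity_profile(samples: List[Sample]) -> List[float]:
    """Speed between each pair of consecutive pen samples."""
    return [
        math.hypot(x1 - x0, y1 - y0) / (t1 - t0)
        for (x0, y0, t0), (x1, y1, t1) in zip(samples, samples[1:])
    ]


def extrema(speeds: List[float]) -> Tuple[List[int], List[int]]:
    """Indices of strict local minima and maxima of the speed profile."""
    minima, maxima = [], []
    for i in range(1, len(speeds) - 1):
        if speeds[i] < speeds[i - 1] and speeds[i] < speeds[i + 1]:
            minima.append(i)
        elif speeds[i] > speeds[i - 1] and speeds[i] > speeds[i + 1]:
            maxima.append(i)
    return minima, maxima


# Hypothetical stroke written at normal speed and at double speed.
slow = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0), (3.0, 0.0, 2.0), (4.0, 0.0, 3.0), (7.0, 0.0, 4.0)]
fast = [(x, y, t / 2.0) for x, y, t in slow]

slow_speeds = velocity_profile(slow)
fast_speeds = velocity_profile(fast)
print(slow_speeds, extrema(slow_speeds))
print(fast_speeds, extrema(fast_speeds))
```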

Classifier

[Figure: soft classification. Classifiers NN1, NN2, NN3, …, NNn (indexed 1, 2, 3, …, n) each classify the writers, and their outputs are combined into a single result.]

Summarized framework
• Questioned document
• Cluster into different clusters
• Writer classification
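
One simple way to combine soft outputs is to add the per-writer scores of all classifiers and pick the writer with the largest total. The slide does not say which combination rule was used, so the sum rule below is an assumption, and the scores are invented.

```python
from typing import Dict, List, Tuple


def combine_soft_votes(
    per_classifier_scores: List[Dict[str, float]],
) -> Tuple[str, Dict[str, float]]:
    """Sum the per-writer scores of all classifiers and return the top writer.

    Assumption: every classifier reports a score in [0, 1] for each writer.
    """
    totals: Dict[str, float] = {}
    for scores in per_classifier_scores:
        for writer, value in scores.items():
            totals[writer] = totals.get(writer, 0.0) + value
    winner = max(totals, key=lambda writer: totals[writer])
    return winner, totals


# Hypothetical outputs of three classifiers for two writers.
outputs = [
    {"writer_A": 0.6, "writer_B": 0.4},
    {"writer_A": 0.3, "writer_B": 0.7},
    {"writer_A": 0.7, "writer_B": 0.3},
]

winner, totals = combine_soft_votes(outputs)
print(winner, totals)
```

Unlike a hard majority vote, the sum rule lets a confident classifier outweigh a hesitant one.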

Results

Experimented with
• Roman, Hindi, Cyrillic, Arabic and Hebrew

Training data
• Approx. 300-400 curves for Roman
• Approx. 700-800 curves for others

Test data
• 100 curves for Roman
• 200-300 curves for others

Tables and graphs are on the next page.

Varying No of Curves

• Accuracy increases with the number of curves.
• >85% accuracy is reached with 200 curves (10-12 words).

[Figure: accuracy with 12 words]

Script Vs Accuracy

• ~10 writers for all scripts
• For most scripts, top-2 accuracy is nearly 100%, except Chinese
• Confusion between pairs of writers

Related work

• Line level features
  – Word spacing
  – Lower and upper profile
  – Fractal & wavelet features
  – Loops and blobs

• Paragraph level features
  – Image processing
    • Grey scale histogram
    • Run length coding
    • Fractal image compression
  – Texture features
    • Gabor filter, wavelet
    • Contourlet GGD
    • Grey scale covariance matrix
  – Online features
    • Pen pressure, velocity, azimuth
    • Velocity of barycenter
  – Codebook generation
    • Using directional features

• Our approach
  – Codebook design using sub-character features
  – Script-independent framework
  – Online handwriting data
  – Identification with a small amount of data
  – Automatic identification of consistent and discriminating features

Result comparison

Schomaker et al. [28]
• Combination of directional, texture and image processing features
• Identification: accuracy of 87% with 900 writers
• Verification: equal error rate of 3%-8%
• Test data size: 1 page of handwritten data

Our approach [5]
• Using shape-based features
• Identification accuracy of ~85% with 15 writers
• Test data size: 12 words (1 line)

Analysis

Shape and size based primitives
• Obtain reasonable accuracy with a simple algorithm

Chinese script
• Most of the strokes are straight line segments
• Features based on inter-stroke relations can be used

To increase accuracy
• Robust clustering and classification algorithms
• Fusion with higher-level primitives such as line and paragraph primitives

Text dependent writer Verification

Problem Statement

Text-independent systems
• Large amount of data needed

Text-dependent framework
• Higher accuracy
• Small amount of data needed

Problems (text-dependent systems)
• Forgery (due to fixed text known in advance)
• Authentication text not known (usually random text is used)

Signature Vs Text-dependent

Signature and text-dependent handwriting
• Variations are unlimited; a signature need not be readable
• Writer consciously tries to write the same signature

Challenges
• Within-writer and between-writer variation have to be discriminated
• A discriminating distance measure has to be found

System Specification

Empirical finding
• Discriminating power of primitives varies between individuals
• Primitives: sub-characters, characters, words, etc.

System specifications
• Writer-specific text
  – For higher accuracy
  – With a limited amount of text
• Varying text across multiple authentications
  – Robust to forgery

Boosting?

Classifier combination method
• Combines weak classifiers to generate an accurate learning algorithm
• Greedy algorithm

Selects a weak classifier at each stage
• Based on the previously selected classifiers

Maintains a distribution of weights over the training samples
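
The three properties listed above (greedy stage-wise selection, dependence on earlier choices, and a weight distribution over training samples) are all visible in textbook discrete AdaBoost, sketched below. This is the standard algorithm, not necessarily the variant used in this work; the one-dimensional data and threshold classifiers are invented for illustration.

```python
import math
from typing import Callable, List, Tuple

Weak = Callable[[float], int]  # returns +1 or -1


def adaboost(samples: List[float], labels: List[int],
             weak_classifiers: List[Weak], rounds: int) -> List[Tuple[float, Weak]]:
    """Discrete AdaBoost: greedily pick the weak classifier with the lowest weighted error."""
    n = len(samples)
    weights = [1.0 / n] * n  # distribution over training samples
    ensemble: List[Tuple[float, Weak]] = []
    for _ in range(rounds):
        best, best_err = None, None
        for h in weak_classifiers:
            err = sum(w for w, x, y in zip(weights, samples, labels) if h(x) != y)
            if best_err is None or err < best_err:
                best, best_err = h, err
        if best_err >= 0.5:  # no classifier is better than chance
            break
        best_err = max(best_err, 1e-10)  # avoid division by zero for a perfect classifier
        alpha = 0.5 * math.log((1.0 - best_err) / best_err)
        ensemble.append((alpha, best))
        # Misclassified samples gain weight, so the next stage focuses on them.
        weights = [w * math.exp(-alpha * y * best(x))
                   for w, x, y in zip(weights, samples, labels)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble: List[Tuple[float, Weak]], x: float) -> int:
    return 1 if sum(alpha * h(x) for alpha, h in ensemble) >= 0 else -1


def below(t: float) -> Weak:
    return lambda x: 1 if x < t else -1


def above(t: float) -> Weak:
    return lambda x: 1 if x > t else -1


# Hypothetical data: no single threshold separates the two classes.
samples = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
labels = [1, 1, -1, 1, -1, -1]
weak = [below(2.5), below(4.5), above(3.5)]

ensemble = adaboost(samples, labels, weak, rounds=3)
predictions = [predict(ensemble, x) for x in samples]
print(predictions)
```

Each weak classifier alone misclassifies at least one sample, while the weighted combination of three stages fits all six.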

Framework

Verification as a 2-class problem
• Positive samples vs. negative samples

Given
• Set of writers and primitives
• Table of discriminating power

Randomness is included at each stage
• Proportional to the discriminating power of the classifier
• More discriminating: more probable to be accepted
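
Selection with probability proportional to discriminating power can be sketched with weighted random sampling. The primitive names, the power values and the `sample_primitives` helper are hypothetical; the slide does not give the exact sampling procedure.

```python
import random
from typing import Dict, List


def sample_primitives(power: Dict[str, float], k: int, rng: random.Random) -> List[str]:
    """Draw k distinct primitives; each draw is proportional to discriminating power.

    Assumption: powers are positive numbers taken from a per-writer table.
    """
    remaining = dict(power)
    chosen: List[str] = []
    for _ in range(min(k, len(remaining))):
        names = list(remaining)
        pick = rng.choices(names, weights=[remaining[name] for name in names], k=1)[0]
        chosen.append(pick)
        del remaining[pick]  # sample without replacement
    return chosen


# Hypothetical discriminating-power table for one writer.
power = {"word_the": 0.9, "char_a": 0.5, "char_o": 0.1}

text_for_session = sample_primitives(power, 2, random.Random(7))
print(text_for_session)
```

Because the draw is random, two authentication sessions are likely to use different text, while highly discriminating primitives still appear most often.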

Text Generation Process

[Figure: flowchart with a bag of primitives, a list of writers (W1 to W6), and the labels "Fix threshold and reject writers", "Select it or not?" and "Accuracy".]

• Randomness is included in the selection process.
• The selected threshold is biased towards accepting the writer
  – For lower false rejection rates

Effect of Boosting

[Figure: probability distributions of within-writer distance and between-writer distance (axes: distance, probability), shown against the number of boosting stages.]

Dynamic Time Warping

• Time series alignment
• Dynamic programming approach
• Feature vectors of different lengths can be compared

[Figure: naïve alignment of re-sampled series versus DTW alignment]
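
The dynamic-programming recurrence behind DTW fits in a few lines. This is the classic unconstrained formulation on scalar sequences; the features actually compared in this work and any window constraint are not given on the slide, so the absolute-difference cost is an assumption.

```python
from typing import Callable, Sequence


def dtw_distance(
    a: Sequence[float],
    b: Sequence[float],
    dist: Callable[[float, float], float] = lambda p, q: abs(p - q),
) -> float:
    """Classic DTW cost between two non-empty sequences of possibly different length."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best alignment cost of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = dist(a[i - 1], b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# A sequence and a stretched copy align perfectly despite different lengths.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
print(dtw_distance([0, 0], [1, 1]))
```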

Stroke Comparison

Dynamic Time Warping
• Alignment of strokes is done using dynamic programming

Directional features
• Stroke representation: 12 bins of curvature directions
• Curvature angle: difference between adjacent tangent directions

[Figure: example histogram with bin counts 1 1 2 3 3 4 3 0 0 0 0 1 over the range 0 to 360]
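
A 12-bin curvature histogram of the kind described above might be computed as follows. The bin layout (12 equal bins of 30 degrees over 0 to 360) and the use of raw point-to-point tangents are assumptions; the slide does not specify smoothing or normalization.

```python
import math
from typing import List, Tuple


def curvature_histogram(points: List[Tuple[float, float]], bins: int = 12) -> List[int]:
    """Histogram of curvature angles along a stroke.

    The curvature angle is the difference between adjacent tangent directions,
    wrapped into [0, 360) degrees and counted in equal-width bins.
    """
    tangents = [
        math.degrees(math.atan2(y1 - y0, x1 - x0))
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    ]
    width = 360.0 / bins
    histogram = [0] * bins
    for t0, t1 in zip(tangents, tangents[1:]):
        angle = (t1 - t0) % 360.0
        histogram[int(angle // width) % bins] += 1
    return histogram


# Hypothetical strokes: a straight line, a 45-degree left turn, a 45-degree right turn.
straight = curvature_histogram([(0, 0), (1, 0), (2, 0), (3, 0)])
left_turn = curvature_histogram([(0, 0), (1, 0), (2, 1)])
right_turn = curvature_histogram([(0, 0), (1, 0), (2, -1)])
print(straight, left_turn, right_turn)
```

Two strokes can then be compared through any distance between their histograms, independently of the number of sampled points.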

Results

• Experimented with English script (20 writers) and Hindi script (10 writers)
• DTW and directional feature extraction methods are used
• Each user wrote about 10-12 words
• 3-fold cross-validation is used

Performance measures

False acceptance rate
• Percentage of forgers who are accepted
• Should be lower for forensic applications
  – Security is the major concern

False rejection rate
• Percentage of genuine users who are rejected
• Should be lower for civilian applications
  – Usability is the major concern
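
Both rates follow directly from the distances of genuine and forged attempts once a threshold is fixed. The sketch assumes the "accept when distance < threshold" rule from the verification slide; the distances are invented.

```python
from typing import List, Tuple


def error_rates(genuine: List[float], forged: List[float],
                threshold: float) -> Tuple[float, float]:
    """Return (false acceptance rate, false rejection rate) in percent.

    genuine: distances of attempts by the true writer
    forged:  distances of attempts by other writers
    An attempt is accepted when its distance is below the threshold.
    """
    false_accepts = sum(1 for d in forged if d < threshold)
    false_rejects = sum(1 for d in genuine if d >= threshold)
    far = 100.0 * false_accepts / len(forged)
    frr = 100.0 * false_rejects / len(genuine)
    return far, frr


# Hypothetical distances, for illustration only.
genuine_distances = [0.8, 1.0, 1.2, 1.5]
forged_distances = [1.4, 2.0, 2.5, 3.0]

print(error_rates(genuine_distances, forged_distances, 1.45))
print(error_rates(genuine_distances, forged_distances, 1.0))
```

Lowering the threshold trades false acceptances for false rejections, which is why forensic and civilian applications choose different operating points.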

False Accept Rate (Directional Feature)

False Reject Rate(Directional Features)

False Accept Rate (DTW)

False Reject Rate(DTW)

Definition

Threshold-1
• Controls the range of variation within writers
• Decided based on positive samples

Threshold-2
• Confidence required before rejecting other writers (negative samples)
• Lower threshold-2 means higher confidence

Effect of thresholds..(DTW and Hindi script)

Effect of thresholds.. (DTW and Hindi script)

No. of word comparisons..(DTW & Hindi script)

Effect of thresholds.. (Directional feature and Hindi script)

Effect of thresholds.. (Directional feature and Hindi script)

Effect of thresholds.. (Directional features and English script)

Effect of thresholds.. (Directional features and English script)

No. of word comparisons..(Directional & Hindi script)

No. of word comparisons..(Directional & English Script)

Number of writers Vs Accuracy(English)

Number of writers Vs Accuracy(Hindi Script)

Analysis and Summary

Writer-specific text generation framework

Automatic text generation

Automatic threshold generation

Text is varied
• Robust to forgery

Related work

• Features
  – Character level
    • GSC features
    • Structural features
    • Directional features
  – Word level
    • Word model recognition
    • Shape curvature
    • Shape context
    • Morphological features

• Feature selection
  – Static feature selection
  – PCA based discriminating power

• Our approach
  – Writer-specific text generation
  – Boosting based framework
  – Text variation
  – Higher accuracy with limited amount of data

Comparison

Srihari et al. [17]
• Shape context, shape curvature, GSC features, WMR features
• Performance: 42%, 22%, 62% and 28% respectively (1000 writers)
• Test data size: 10 words

Our approach
• Directional features
• Performance: 95% (20 writers)
• Test data size: 5 words

Repudiation Detection in Handwritten Documents

Traditional writer identification Vs QDE

Assumption of natural handwriting

Biometrics terms
• Repudiation (negative biometrics)
• Forgery (positive biometrics)

Quantity and quality of data available

Cost factor involved
• Used as expert witness in a legal verdict

Repudiation

"The rejection or renunciation of a duty or obligation (as under a contract)"
(Merriam-Webster's Dictionary of Law)

Handwriting repudiation
• Deliberately altering one's natural handwriting to avoid detection
• To deny involvement in the case

Repudiation (1:1 matching, no database)

Written by the same writer?

[Figure: a questioned document and a reference document are passed to a comparator, which calculates a distance. Hypothesis testing decides whether the distance is significant: same writer or different writers?]

Verify whether the given documents were written by the same person or by different persons, without assuming natural handwriting

Hard problem?

[Figure: normal handwriting versus repudiated handwriting]

Challenges

• Within-writer variation becomes high.
• Between-writer variation becomes comparatively low.
• Learning cannot be done, as data is not available.

Ray of Hope

• One cannot exclude from one's own writing those discriminating elements of which one is not aware.
• Maximum and minimum velocity points remain the same in spite of changes in absolute velocity.
• Words have significant overlap at the sub-character level.

Framework

• Statistically significant score between two documents

• Utilize the online information that is available

• No assumptions about the distribution of the data
  – Such assumptions may lead to erroneous conclusions

Assumptions

• Questioned and reference documents either have significant overlap or are the same at word level.

• Reference document is collected in online mode.

System Framework

[Figure: pipeline with the stages word segmentation, word comparison and hypothesis testing]

Hypothesis Testing

• To calculate the significance of the distance between two distributions.

• According to the Neyman-Pearson paradigm:
  H0: Documents written by the same writer (null hypothesis)
  H1: Documents written by different writers (alternative hypothesis)

• Intra-document word distances and inter-document word distances are the two distributions to be compared.

• The distributions are compared to find out whether they are generated from the same population.

Distribution Comparison

• KL divergence test (makes assumptions about the nature of the distribution)

• Kolmogorov-Smirnov test (does not make any assumptions)
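
The two-sample Kolmogorov-Smirnov statistic is the largest gap between the two empirical distribution functions. The sketch below computes it for intra-document and inter-document distances and compares it with the standard large-sample critical value; the decision rule actually used in this work is not given on the slide, and the distances are invented.

```python
import math
from typing import List


def ks_statistic(a: List[float], b: List[float]) -> float:
    """Largest absolute difference between the empirical CDFs of a and b."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    largest = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        largest = max(largest, abs(i / n - j / m))
    return largest


def reject_same_population(a: List[float], b: List[float], alpha: float = 0.05) -> bool:
    """Reject H0 (same population) using the asymptotic critical value.

    Assumption: the large-sample approximation c(alpha) * sqrt((n + m) / (n * m))
    with c(alpha) = sqrt(-ln(alpha / 2) / 2) is acceptable for these sample sizes.
    """
    n, m = len(a), len(b)
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2.0))
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(a, b) > critical


# Hypothetical word distances.
intra_document = [float(v) for v in range(10)]
inter_document = [float(v) for v in range(20, 30)]

print(ks_statistic(intra_document, inter_document))
print(reject_same_population(intra_document, inter_document))
print(reject_same_population(intra_document, intra_document))
```

Rejecting H0 here corresponds to the "different writers" outcome; failing to reject it is not proof of a common writer, which matches the later remark that the null hypothesis is never accepted without expert intervention.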

Results

• Data was collected from 23 different users in English.

• For each user, 3 pages of normal data and 3 pages of repudiated data were collected.

• Preprocessing:
  – Words are segmented using a semi-automatic toolkit for word segmentation.

Results

[Figure: distributions of intra-document distance and inter-document distance]

ROC Curve

Genuine Rejection – 82% @ Genuine Acceptance – 100%

Analysis of Results

• Semi-automatic system

• Used as an aid to the expert

• The null hypothesis is never accepted without expert intervention.

Scale used by forensic experts
• Strong probability of identification
• Probable
• Indications
• No conclusion
• Indications did not
• Probably did not
• Strong probability did not

[Figure: score axis marked -1, 0 and 1, with the ends labelled Similar and Different]

Conclusion and Future work

• Learning-based framework to learn similarity in spite of discrimination between documents.

• Can we tell whether a writer is trying to repudiate?

• Framework that can learn more features and give independent scores for each feature.

Conclusions

Proposed algorithms for automatic identification and extraction of discriminating features for online handwriting

Framework proposed for writer-specific text generation and text variations for text-dependent systems

Introduced the problem of repudiation and proposed a hypothesis testing based framework for the same

Publications

• Sachin Gupta and Anoop M. Namboodiri, "Repudiation Detection in Handwritten Documents", Proc. of the 2nd International Conference on Biometrics (ICB'07), pp. 356-365, Seoul, Korea, 27-29 August 2007.

• Anoop M. Namboodiri and Sachin Gupta, "Text Independent Writer Identification from Online Handwriting", International Workshop on Frontiers in Handwriting Recognition (IWFHR'06), October 23-26, 2006, La Baule, Centre de Congress Atlantia, France.

• Sachin Gupta and Anoop M. Namboodiri, "Text dependent Writer Verification using Boosting", in Proceedings of the International Conference on Frontiers in Handwriting Recognition (ICFHR'08), Montreal, Canada.

• Sachin Gupta and Anoop M. Namboodiri, "Text dependent Writer Verification", planned for IEEE Transactions on Information Forensics and Security, 2008.

Future work

Fusion of online and offline features for higher accuracy

Can we automatically detect a person's intention to repudiate or forge?
• Based on a single document

More robust algorithms for feature extraction
• Different from standard feature selection approaches

Other Projects

Face Detection

• Boosting classifiers
• Simple Haar filters were used
• Filters are selected using boosting classifiers

[Figure: panels (a), (b), (c), (d)]

Paul Viola and Michael Jones, "Robust Real-time Face Detection", International Journal of Computer Vision, 2004.

Object classification: learning the invariances

• Use multiple kernel learning framework

Shape - 3.94 Color - 0 Texture - 0

Image Segmentation

• Active contour
  – An initial contour has to be given
  – The contour adjusts itself to the object using external and internal energy
  – Useful in object tracking

• Graph cuts
  – Represents the image as a graph
  – Finds the cut in the graph with minimum cut or maximum flow
  – Cannot diverge outside; will only converge inside

THANKING YOU