NAVIGATING SCIENTIFIC LITERATURE A HOLISTIC PERSPECTIVE Venu Govindaraju
Page 1: NAVIGATING SCIENTIFIC LITERATURE · Data •Numbers •Symbols Information •Processed data •Who/What/When Knowledge •Insights •Theory •Experiment •Framework Understanding

NAVIGATING SCIENTIFIC LITERATURE

A HOLISTIC PERSPECTIVE

Venu Govindaraju

Page 2

Organizing Multiple Experts for Efficient Pattern Recognition
Active Pattern Recognition Using Genetic Programming
A Complexity Framework for Combination of Classifiers in Verification and Identification Systems
Image Processing using Ontology Concepts for Image Segmentation
Language Motivated Approaches for Human Action Recognition and Spotting

Towards a Globally Optimal Approach for Learning Deep Unsupervised Models

PATTERN RECOGNITION

Minutia-Based Partial Fingerprint Recognition
Sequential Pattern Classification without Explicit Feature Extraction
Automatic Recognition of Handwritten Medical Forms for Search Engines
Exploiting the Gap between Human and Machine Abilities in Handwriting Recognition for Web Security Applications

A Framework for Fingerprint Enhancement and Feature Detection

A Framework for Efficient Fingerprint Identification using a Minutiae Tree
Face Modeling and Biometric Anti-spoofing using Probability Distribution Transfer Learning

Multilingual Word Spotting in Offline Handwritten Documents

BIOMETRICS

A Novel Multi-sample Fusion Methodology for Improving Biometric Recognition

Stochastic Modeling of High-level Structures in Handwritten Word Recognition

A Stochastic Framework for Font Independent Devanagari OCR

Language Models and Automatic Topic Categorization for Information Retrieval in Handwritten Documents

Intrusion Detection using Spatial Information and Behavioral Biometrics
Integrating Minutiae Based Fingerprint Matching with Local Correlation Methods

Statistical Techniques for Efficient Indexing and Retrieval of Document Images

Enhancing Cyber Security through the use of Synthetic Handwritten CAPTCHAs

Methods for Biomedical Image Content Extraction Toward Improved Multimodal Retrieval of Biomedical Articles

A Semi Supervised Framework for Handwritten Document Analysis
Bayesian Background Models for Retrieval of Handwritten Documents
Accents in Handwriting: A Hierarchical Bayesian Approach to Handwriting Analysis
Hierarchical and Dynamic-Relational Models for Handwriting Recognition

Enhancement and Retrieval of Low Quality Handwritten Documents

Integrating Facial Expressions and Skin Texture in Face Recognition

Probabilistic Random Field based Text Identification

DOCUMENT ANALYSIS

8/24/2015 ICDAR- 2015 2

Page 3

Page 4

Key Innovations: 1990-2015

Grand Challenges: 2015+

Page 5

Old Order - DIA

UW English Document Image Database
(Phillips, Technical report, 1996, citations: 29)

• Layout analysis
  • Word, line, zone segmentation, etc.
  • Performance: >90% accuracy
• Logical structure analysis
  • Text entity, reading order
  • Performance: >99% accuracy
• Text recognition
  • Performance: >99% accuracy

Page 6

Past/Current Future

Data
• Numbers
• Symbols

Information
• Processed data
• Who/What/When

Knowledge
• Insights
• Theory
• Experiment
• Framework

Understanding
• Analysis
• Meaning

Wisdom
• Judgment

DISCOVER    LEARN

STATE OF THE ART

Scientific Process

nanos gigantum humeris insidentes ("dwarfs standing on the shoulders of giants")

KNOWLEDGE DATA

Page 7

Big Data

Variety

Velocity

Volume

Veracity

Source (4Vs of Big Data): IBM

When knowledge becomes data


Page 8

Scientific Literature

• 2009 estimate: 50 million articles; 28 thousand journals
• 1.8M articles added every year.

23 million articles

(Just biomedical literature)

45 million articles

40 million articles

Unknown (peer reviewed only)

Roughly, the number of papers doubles every 10-15 years!

Variety

Velocity

Volume

Veracity


[Meadows, 1998, p.16]
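The growth figures on this slide can be cross-checked with a little arithmetic. The sketch below assumes steady exponential growth, which is an assumption of this example and not a claim made on the slide; the function names are illustrative only.

```python
import math


def annual_growth_rate(doubling_years: float) -> float:
    """Annual growth rate implied by a doubling time (steady exponential growth assumed)."""
    return 2 ** (1.0 / doubling_years) - 1.0


def doubling_time(annual_rate: float) -> float:
    """Years needed to double at a constant annual growth rate."""
    return math.log(2.0) / math.log(1.0 + annual_rate)


# Doubling every 10-15 years corresponds to roughly 4.7%-7.2% growth per year.
fast = annual_growth_rate(10)
slow = annual_growth_rate(15)

# 1.8M new articles on a base of 50M articles is 3.6% per year (both figures from the slide).
rate_from_counts = 1.8e6 / 50e6
years_to_double = doubling_time(rate_from_counts)

print(f"{slow:.1%} to {fast:.1%} per year")
print(f"{rate_from_counts:.1%} per year doubles in about {years_to_double:.0f} years")
```

The two estimates need not agree exactly: the article counts and the doubling rule are quoted from different sources, so a gap between them is expected in a back-of-the-envelope check like this.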

Page 9

4Vs: Volume, Velocity, Variety, Veracity

4Rs: References, Reinvention, Replicability, Reputation

Big Data Side Effects Challenges for ICDAR Community

Page 10

References Volume Challenge

Page 11

Reinvention Velocity Challenge

Page 12

Replicability Veracity Challenge


• Nullius in verba

“On the word of no one” or “Take nobody's word for it”

Page 13

Reputation Veracity Challenge


• How Many Scientists Does It Take to Write a Paper?

Page 14

Challenges

Page 15

VOLUME

VELOCITY

VARIETY

VERACITY

BIG DATA

GRAND CHALLENGE

References Reinvention Replicability

Cognitive burden Reinventing wheel Authenticity

Page 16

Integrated learning

Scientific articles

Slides

Video talks

Tutorials

Web Blogs

Addressing the Cognitive Burden Volume

Page 17

Addressing Reinventing the wheel? Velocity

• Least squares with linear constraints: one type of quadratic program in mathematics

• Isotonic regression: in statistics
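One reading of the two bullets above is that they name the same problem in two fields: isotonic regression is a least-squares fit under the linear constraints x1 <= x2 <= ... <= xn. The sketch below illustrates that with the pool-adjacent-violators procedure; the function name and sample values are hypothetical and are not taken from the talk.

```python
def isotonic_regression(values):
    """Least-squares fit to `values` subject to a non-decreasing constraint.

    Uses pool-adjacent-violators: neighbouring blocks whose means violate
    the ordering are merged and replaced by their common mean.
    """
    blocks = []  # each block is [sum of values, number of values]
    for v in values:
        blocks.append([float(v), 1])
        # Merge backwards while the previous block's mean exceeds the current one.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


print(isotonic_regression([1, 3, 2, 4]))  # [1.0, 2.5, 2.5, 4.0]
```

The out-of-order pair (3, 2) is replaced by its mean, which is the least-squares solution once the ordering constraint is imposed.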

Trapezoid rule: calculus, 17th century

Tai's Model, 1994, 254 citations
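For reference, the trapezoid rule named on this slide fits in a few lines. This is a minimal sketch with made-up sample points; the function name is illustrative.

```python
def trapezoid_area(xs, ys):
    """Approximate the area under a sampled curve with the trapezoid rule."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) samples of equal length")
    area = 0.0
    for i in range(len(xs) - 1):
        width = xs[i + 1] - xs[i]
        # Each strip is a trapezoid: width times the mean of its two heights.
        area += width * (ys[i] + ys[i + 1]) / 2.0
    return area


# Hypothetical samples of y = 2x on [0, 3]; the exact integral is 9.
print(trapezoid_area([0, 1, 3], [0, 2, 6]))  # 9.0
```

The rule is exact for piecewise-linear curves, which is why the example reproduces the true area.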

Page 18

• Dataset: UNLV/ISRI
  • 64 pages, 6796 blocks
• Heuristics parameters
  • Vertical: 15 pixels
  • Horizontal: 50/30 pixels
• Classes: Text, Table, Caption, Figs
• Classifier: Support Vector Machines
• Accuracy: 91.73% at block level

IBM Journal 1982

Addressing Replicability Velocity
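The details listed on this slide are the kind of facts a reader needs in order to replicate a result. As a minimal sketch, they could be captured in a structured record like the one below. The class, field names, and the two boolean flags are hypothetical; the slide does not say whether the data or code were released, so those values are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentRecord:
    dataset: str
    pages: int
    blocks: int
    heuristic_params: dict
    classes: tuple
    classifier: str
    block_accuracy_percent: float
    dataset_public: bool  # placeholder: not stated on the slide
    code_available: bool  # placeholder: not stated on the slide


def replication_gaps(record: ExperimentRecord) -> list:
    """List the items a reader would still be missing to rerun the experiment."""
    gaps = []
    if not record.dataset_public:
        gaps.append("dataset is not public")
    if not record.code_available:
        gaps.append("code is not available")
    if not record.heuristic_params:
        gaps.append("heuristic parameters are not reported")
    return gaps


record = ExperimentRecord(
    dataset="UNLV/ISRI",
    pages=64,
    blocks=6796,
    # The slide gives "50/30" for the horizontal value; both numbers are kept as-is.
    heuristic_params={"vertical_px": 15, "horizontal_px": (50, 30)},
    classes=("Text", "Table", "Caption", "Figs"),
    classifier="Support Vector Machines",
    block_accuracy_percent=91.73,
    dataset_public=True,    # assumed for illustration
    code_available=False,   # assumed for illustration
)
print(replication_gaps(record))  # ['code is not available']
```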

Page 19

Datasets
• Public
• Benchmark
• Published

Experiments
• Comparative results
• CODE available !!

Reputation
• Authors
• Lab
• Journal

Citations
• Only Counts?

Addressing Authenticity Veracity

Page 20

Veracity: Not all citations are equal


Which citation is more trustworthy?

Sentiment analysis: Targeted NLP
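To make the idea of weighing citations by sentiment concrete, here is a deliberately small sketch. It is not the method from the talk: the cue-word lists, the weights, and the example sentences are all hypothetical, and a real system would need a far richer language model.

```python
import re

# Hypothetical cue words and weights, chosen for illustration only.
POSITIVE_CUES = {"outperforms", "improves", "robust", "effective", "successfully"}
NEGATIVE_CUES = {"fails", "unable", "poorly", "drawback", "limited"}
WEIGHTS = {"positive": 1.0, "neutral": 0.5, "negative": 0.0}


def citation_polarity(sentence: str) -> str:
    """Classify a citing sentence by counting positive and negative cue words."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    score = sum(t in POSITIVE_CUES for t in tokens) - sum(
        t in NEGATIVE_CUES for t in tokens
    )
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def weighted_citation_count(citing_sentences) -> float:
    """Sum citations, weighting each one by the polarity of its citing sentence."""
    return sum(WEIGHTS[citation_polarity(s)] for s in citing_sentences)


sentences = [
    "The method of [12] outperforms earlier systems.",
    "The approach in [12] fails on cursive text.",
    "We use the dataset described in [12].",
]
print(len(sentences), weighted_citation_count(sentences))  # 3 1.5
```

A raw count treats all three citations alike, while the weighted count separates endorsement from criticism and from neutral mention.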

Page 21

Ciresan et al. 2012

Training on automatically augmented dataset: “During training the digits are randomly distorted … The MCDNN has a very low 0.23% error rate”

MNIST: 60k training, 10k testing images

“Gradient-based learning applied to document recognition”, LeCun et al. 1998 (Citations: 3547)

Veracity Dataset linkages
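A dataset linkage can be thought of as a record that ties a reported number to the data it was measured on. The sketch below uses only the figures quoted on this slide; the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetLink:
    paper: str
    dataset: str
    test_images: int
    error_percent: float
    training_note: str

    def misclassified(self) -> int:
        """Number of test images implied by the reported error rate."""
        return round(self.error_percent * self.test_images / 100)


link = DatasetLink(
    paper="Ciresan et al. 2012",
    dataset="MNIST",
    test_images=10_000,
    error_percent=0.23,
    training_note="digits randomly distorted during training",
)
print(link.misclassified())  # 23
```

Keeping the training note next to the number matters: the 0.23% figure was obtained with an augmented training set, so it is not directly comparable with results trained on the original 60k images alone.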

Page 22

OUR NEXT FRONTIER


Tables, Graphs

Equations

Targeted NLP

Keyword spotting

Multimedia

Page 23

Tables Analysis 4Vs

Page 24

Graphs Analysis 4Vs

Page 25

Document awareness
• Dependent variables
• Independent variables
• Regression coefficients, error
• Matrix representation

Domain awareness
• Operators
• Symbols

Equations Analysis Velocity
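The terms listed on this slide match the parts of a linear regression written in matrix form. As an illustration only (the symbols below are the conventional ones and are not taken from the slide):

```latex
\[
  \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon},
  \qquad
  \mathbf{y} \in \mathbb{R}^{n},\;
  \mathbf{X} \in \mathbb{R}^{n \times p},\;
  \boldsymbol{\beta} \in \mathbb{R}^{p},\;
  \boldsymbol{\varepsilon} \in \mathbb{R}^{n}
\]
```

Here y holds the dependent variable, the columns of X hold the independent variables, β holds the regression coefficients, and ε is the error term. Recognizing these roles in a printed equation is the "document awareness" step; knowing what the operators and symbols mean in a given field is the "domain awareness" step.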

Page 26

Query Interface

[Search interface mockup. Query: Face Recognition, FAR (0.0-0.2) vs FRR. Advanced Search Options shown: X-Y Plots, Figures, Tables, Nature, Science, CVPR. Result entries are labeled "Original Paper".]

Page 27

Line graph
• X Axis
  • Label: AdaBoost iterations
  • Range: 0-5000
  • Unit: -
• Y Axis
  • Label: Misclassification Error
  • Range: 0.15-0.30
  • Unit: -
• Legend: -
• Number of lines: 2

Retrieve similar graphs

Query:
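One way to picture "retrieve similar graphs" is to compare the metadata extracted from each plot. The sketch below is an assumption-laden illustration, not the system from the talk: the classes, the scoring formula, and the second graph are hypothetical, and only the query values come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Axis:
    label: str
    low: float
    high: float


@dataclass(frozen=True)
class LineGraph:
    x: Axis
    y: Axis
    num_lines: int


def _label_similarity(a: str, b: str) -> float:
    """Jaccard overlap of the lowercased words in two axis labels."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a and not words_b:
        return 1.0
    return len(words_a & words_b) / len(words_a | words_b)


def _range_overlap(a: Axis, b: Axis) -> float:
    """Shared part of two axis ranges divided by their combined span."""
    shared = min(a.high, b.high) - max(a.low, b.low)
    span = max(a.high, b.high) - min(a.low, b.low)
    return max(shared, 0.0) / span if span > 0 else 1.0


def graph_similarity(g: LineGraph, h: LineGraph) -> float:
    """Score in [0, 1]: average of x-axis, y-axis, and line-count agreement."""
    x_score = (_label_similarity(g.x.label, h.x.label) + _range_overlap(g.x, h.x)) / 2
    y_score = (_label_similarity(g.y.label, h.y.label) + _range_overlap(g.y, h.y)) / 2
    lines_score = 1.0 if g.num_lines == h.num_lines else 0.0
    return (x_score + y_score + lines_score) / 3


# Query graph built from the metadata on the slide.
query = LineGraph(
    x=Axis("AdaBoost iterations", 0, 5000),
    y=Axis("Misclassification Error", 0.15, 0.30),
    num_lines=2,
)
# Hypothetical candidate from some other paper.
candidate = LineGraph(
    x=Axis("Boosting iterations", 0, 1000),
    y=Axis("Test Error", 0.10, 0.30),
    num_lines=3,
)
print(graph_similarity(query, query), graph_similarity(query, candidate))
```

Ranking a collection by this score against the query would return graphs with similar axes first; the equal weighting of the three components is an arbitrary choice made for the sketch.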

Page 28

SUPERVISED

SEMI-SUPERVISED

Linkages

Holistic View

• Online cursive handwriting recognition using speech recognition methods, John Makhoul, Richard Schwartz, and George Chou, ICASSP 1994

Page 29

Accelerated Discovery


DBN

Handwriting

HMM Speech RNN

+30% +10%

? ?

Dropout

Page 30

Key Innovations: 1990-2015

Grand Challenges: 2015+

Page 31

Lexicons

Fusion

Retrieval

Security

Handwriting Recognition Key Innovations

Page 32

Summary

Grand Challenges
• 4Vs of Scientific Big Data
• 4 Rs: References, Reinvention, Replicability, Reputation

Grand Opportunities
• Accelerated Discovery: Supervised linkages, heuristics
• Integrate learning channels

Key Innovations
• Handwriting Recognition: Lexicons; Fusion; Retrieval; Security

Page 33

Special thanks to all my students and colleagues, especially Srirangaraj Setlur and Ifeoma Nwogu

Page 34

Thank You

