Recent Advances in Automated Fact Checking
Immanuel Trummer
Cornell University
Automation & Fact Checking
[Figure: Claim, Check]
• Identifying Check-Worthy Claims
• Matching Claims to Checks
• Verification (Talk Focus)
Data-Driven Fact Checking
Claim (In Natural Language)
Data: Which data?
Formula (aka. SQL Query): Which formula?
Verified/Refuted
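A minimal sketch of the idea on this slide, assuming the claim has already been translated into a SQL query: the query plays the role of the formula, the table is the data, and the outcome is Verified or Refuted. The table name `drinks`, the helper `check_claim`, the tolerance, and the example claim are illustrative assumptions; the two rows reuse the sample values from the data-structure slides.

```python
import sqlite3

def check_claim(conn, sql, claimed_value, tolerance=0.01):
    """Evaluate the SQL 'formula' on the data and compare with the claimed value."""
    (result,) = conn.execute(sql).fetchone()
    return "Verified" if abs(result - claimed_value) <= tolerance else "Refuted"

# Toy data set (hypothetical table name, sample rows from the slides).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE drinks (country TEXT, beer_servings INTEGER, spirit_servings INTEGER)"
)
conn.executemany(
    "INSERT INTO drinks VALUES (?, ?, ?)",
    [("Germany", 346, 117), ("USA", 249, 158)],
)

# Hypothetical claim: "On average, the two countries have 297.5 beer servings."
sql = "SELECT AVG(beer_servings) FROM drinks"
verdict = check_claim(conn, sql, claimed_value=297.5)
print(verdict)
```

The hard part, as the slide asks, is finding which data and which formula belong to a claim; the sketch only covers the final evaluation step.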
International Energy Agency
Paris-based intergovernmental organization
Established in 1974, 30 member countries
Mission: serve statistics on energy sector
Claims Marked Up
Hundreds of pages, Thousands of claims ...
Verification Takes Weeks!
Fact Checking @ IEA
Claim: In 2017, global electricity demand grew by 3% ...
Data: Electricity/Global/2017, Electricity/Global/2016
Formula: D17/D16 = 1.03
Verified/Refuted
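The IEA example can be sketched as a ratio check, assuming the claim rounds the growth rate to whole percent. The numeric values below are hypothetical placeholders, not actual IEA statistics, and the helper name `growth_claim_holds` is an assumption for illustration.

```python
def growth_claim_holds(current, previous, claimed_percent, decimals=0):
    """Check a 'grew by X%' claim by rounding the actual growth like the text does."""
    actual_percent = (current / previous - 1.0) * 100.0
    return round(actual_percent, decimals) == round(claimed_percent, decimals)

# Hypothetical placeholder values, NOT actual IEA data.
data = {
    "Electricity/Global/2017": 103.2,
    "Electricity/Global/2016": 100.0,
}
d17 = data["Electricity/Global/2017"]
d16 = data["Electricity/Global/2016"]

# Formula from the slide: D17/D16 = 1.03, i.e. growth of about 3%.
print(growth_claim_holds(d17, d16, claimed_percent=3))
```

Comparing after rounding is one possible design choice; a fixed tolerance around the claimed value would be another.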
The "Infodemic"
Fighting the Infodemic
Claim: France has more cases than US
Data: CDC/Confirmed/France, CDC/Confirmed/US
Formula: F > U
Verified/Refuted
Other Use Cases
• Data journalism
• Business reports
• Scientific papers
• ...
(Demo) Verifying text summaries of relational data sets.
S. Jo, I. Trummer, W. Yu, X. Wang, C. Yu, D. Liu, N. Mehta. SIGMOD 2019.
Challenges
• Text-Data Inconsistency
• Multi-Claim Sentences
• Context
Fully Automated Checking
Pipeline: Claim, Translation, Formula, Evaluation
May go Wrong!
Semi-Automated Checking
Pipeline: Claim, Translation, Formula, Evaluation
Probability
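One way to read the semi-automated pipeline is that translation yields several candidate formulas with probabilities rather than a single answer. The sketch below, under that assumption, accepts the top candidate automatically only when it is very likely and otherwise hands a short ranked list to the human checker. The function name `propose`, the threshold, and the candidate scores are all hypothetical.

```python
def propose(candidates, k=3, auto_threshold=0.9):
    """candidates: dict mapping formula text to a non-negative (unnormalized) score.

    Returns ("auto", [formula]) if the best formula is likely enough,
    else ("ask_user", top-k formulas) for human feedback.
    """
    total = sum(candidates.values())
    # Normalize scores into probabilities and sort from most to least likely.
    ranked = sorted(((s / total, f) for f, s in candidates.items()), reverse=True)
    top_prob, top_formula = ranked[0]
    if top_prob >= auto_threshold:
        return "auto", [top_formula]
    return "ask_user", [formula for _, formula in ranked[:k]]

# Hypothetical scores for three candidate translations of one claim.
print(propose({"D17/D16": 6.0, "D17-D16": 3.0, "D16/D17": 1.0}, k=2))
```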
Analyze Data Structure
country   beer_servings   spirit_servings   ...
Germany   346             117               ...
USA       249             158               ...
...       ...             ...               ...
Keywords: Country, Germany, USA, Beer, Serving, Spirit, ...
Synonyms: United States, America, U.S., ...
Matches
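A small sketch of the keyword and synonym matching suggested by this slide, assuming keywords are derived from column names and cell values and that a synonym list is available. The dictionary contents and the function `match_elements` are illustrative assumptions; a real system would likely draw synonyms from a lexical resource.

```python
import re

# Keywords per data element; synonym entries are illustrative assumptions.
KEYWORDS = {
    "country": {"country"},
    "beer_servings": {"beer", "serving", "servings"},
    "spirit_servings": {"spirit", "serving", "servings"},
    "Germany": {"germany"},
    "USA": {"usa", "united states", "america", "u.s."},
}

def match_elements(sentence):
    """Count, per data element, how many of its keywords occur in the sentence."""
    text = sentence.lower()
    scores = {}
    for element, phrases in KEYWORDS.items():
        hits = sum(
            1 for p in phrases
            # Match whole phrases only (not inside longer words).
            if re.search(r"(?<!\w)" + re.escape(p) + r"(?!\w)", text)
        )
        if hits:
            scores[element] = hits
    return scores

print(match_elements("Beer servings in America exceed those in Germany."))
```

Note that "servings" matches both serving columns; resolving such ambiguity is where sentence structure and context come in.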
Analyze Sentence Structure
Consider Text Structure
Claim Sentence
Paragraph
Section
Chapter
Integrate Surrounding Keywords
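One plausible way to integrate surrounding keywords is to weight them by how far they are from the claim sentence in the text hierarchy. The weights below and the function `weighted_keywords` are assumptions chosen for illustration, not values from the talk.

```python
# Illustrative weights (assumptions): closer context counts more.
LEVEL_WEIGHTS = {"sentence": 1.0, "paragraph": 0.5, "section": 0.25, "chapter": 0.125}

def weighted_keywords(context):
    """context: dict mapping a level name to the keywords found at that level.

    Each keyword gets the weight of the closest level in which it appears.
    """
    weights = {}
    for level, keywords in context.items():
        for kw in keywords:
            weights[kw] = max(weights.get(kw, 0.0), LEVEL_WEIGHTS[level])
    return weights

# Hypothetical context for the electricity claim.
context = {
    "sentence": ["grew", "3%"],
    "paragraph": ["electricity", "grew"],
    "section": ["global"],
    "chapter": ["2017"],
}
print(weighted_keywords(context))
```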
Understand the Author
Translate Text
Infer Topic
Document Topic Hypothesis
Claim Translation Hypothesis
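The interplay between the document topic hypothesis and the claim translation hypotheses can be sketched as an alternating loop, here in a simplified expectation-maximization style. This is an assumption about how the two steps might interact, reduced to choosing one column per claim; the function name and the scores are hypothetical.

```python
def infer_topic_and_translations(claim_scores, rounds=5):
    """claim_scores: one dict per claim, mapping column name to a keyword score.

    Alternates between translating each claim given the current topic prior
    and re-estimating the document topic from all claim translations.
    """
    columns = sorted({c for scores in claim_scores for c in scores})
    prior = {c: 1.0 / len(columns) for c in columns}  # uniform starting topic
    posteriors = []
    for _ in range(rounds):
        posteriors = []
        for scores in claim_scores:
            # Translate Text: combine claim-specific evidence with the topic prior.
            unnorm = {c: scores.get(c, 0.0) * prior[c] for c in columns}
            z = sum(unnorm.values())
            posteriors.append({c: v / z for c, v in unnorm.items()} if z > 0 else dict(prior))
        # Infer Topic: average the claim translations into a new prior.
        prior = {c: sum(p[c] for p in posteriors) / len(posteriors) for c in columns}
    return prior, posteriors

# Hypothetical scores: two claims clearly about beer, one ambiguous claim.
claims = [
    {"beer_servings": 1.0},
    {"beer_servings": 1.0, "spirit_servings": 1.0},
    {"beer_servings": 1.0},
]
prior, posteriors = infer_topic_and_translations(claims)
print(posteriors[1])
```

In this toy run the ambiguous second claim is pulled toward the column that the rest of the document talks about.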
(Demo) Verifying text summaries of relational data sets.
S. Jo, I. Trummer, W. Yu, X. Wang, C. Yu, D. Liu, N. Mehta. SIGMOD 2019.
System Overview
Data Analysis
Text Analysis
Topic Analysis
Automated Accuracy
[Figure: Correctness Chance (0 to 100) vs. Nr. Proposed Formulas (1 to 10)]
(Billions of possible formulas)
Need Human Feedback
Want Auto-Suggestions
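The quantity on this chart can be read as top-k accuracy: the chance that the correct formula is among the first k proposals. A minimal sketch of that metric follows; the rankings and ground truth are made-up toy data, not results from the talk.

```python
def top_k_accuracy(ranked_proposals, correct, k):
    """Percent of claims whose correct formula is among the first k proposals."""
    hits = sum(
        1 for proposals, truth in zip(ranked_proposals, correct)
        if truth in proposals[:k]
    )
    return 100.0 * hits / len(correct)

# Toy data (hypothetical): ranked formula proposals for four claims.
ranked = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"], ["a", "c", "b"]]
truth = ["a", "a", "a", "b"]

for k in (1, 2, 3):
    print(k, top_k_accuracy(ranked, truth, k))
```

A curve that starts well below 100 at k = 1 and rises with k matches the two takeaways on the slide: a single automated answer is not enough, but a short list of suggestions is useful.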
User Study Results
[Figure: Claims per Minute (0 to 1.6) by Verification Method: With Tool vs. Without Tool]
6x Speedup!
Mistakes Discovered
11% of claims were incorrect
7% average error of claim value
Analyzed 50 articles from major data journalism venues
Scaling It Up
AggChecker
Scrutinizer
Learning Data Mapper: Ask! Learn
Interface Optimization
Which Question?
Which Options?
Which Order?
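One way to approach the three questions, sketched under the assumption that the system holds a probability distribution over candidate formulas: ask about the property whose answer is most uncertain (highest entropy, which equals the expected information gain when each candidate determines the answer), and show the most likely answers first. The property names, probabilities, and function names are hypothetical.

```python
import math
from collections import defaultdict

def answer_distribution(candidates, prop):
    """Probability of each possible answer for one property."""
    dist = defaultdict(float)
    for cand, prob in candidates:
        dist[cand[prop]] += prob
    return dict(dist)

def entropy(dist):
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def plan_question(candidates, props, max_options=3):
    """Which question: most uncertain property. Which options, which order: likeliest first."""
    best = max(props, key=lambda p: entropy(answer_distribution(candidates, p)))
    dist = answer_distribution(candidates, best)
    options = sorted(dist, key=dist.get, reverse=True)[:max_options]
    return best, options

# Hypothetical candidate formulas with probabilities.
candidates = [
    ({"function": "ratio", "column": "demand"}, 0.5),
    ({"function": "ratio", "column": "supply"}, 0.3),
    ({"function": "difference", "column": "capacity"}, 0.2),
]
print(plan_question(candidates, ["function", "column"], max_options=2))
```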
Scaling It Up
Learning Data Mapper
Interface Optimizer
Verify Cheapest Claims First?
Verify Interesting Claims First?
Consider Text Structure?
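A simple compromise between the cheapest-first and interesting-first orderings is to sort claims by interest per unit of cost. This is only an illustrative heuristic; the cost and interest scores are assumed to be given, and all values below are made up.

```python
def order_claims(claims):
    """claims: list of dicts with 'id', 'cost' (> 0) and 'interest'.

    Returns claim ids sorted by interest per unit of verification cost.
    """
    ranked = sorted(claims, key=lambda c: c["interest"] / c["cost"], reverse=True)
    return [c["id"] for c in ranked]

# Hypothetical claims with made-up cost and interest estimates.
claims = [
    {"id": "c1", "cost": 1.0, "interest": 1.0},
    {"id": "c2", "cost": 3.0, "interest": 6.0},
    {"id": "c3", "cost": 2.0, "interest": 1.0},
]
print(order_claims(claims))
```

Here the most expensive claim is still verified first because its interest outweighs its cost, which a pure cheapest-first order would miss.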
Scaling It Up
Learning Data Mapper
Interface Optimizer
Claim Ordering
User Study @ IEA
[Figure: Claims per Minute (0 to 1.2) by Verification Method: With Tool vs. Without Tool]
> 2x Speedup!
(Demo) Scrutinizer: fact checking statistical claims.
G. Karagiannis, M. Saeed, P. Papotti, I. Trummer. VLDB 2020.
CoronaCheck Impact
12,000 Users
Team
• I. Trummer: Cornell Faculty
• S. Jo: PhD @ Cornell
• G. Karagiannis: PhD @ Cornell
• N. Mehta: Ugrad @ Cornell
• D. Liu: Ugrad @ Cornell
• W. Yu: Ugrad @ Cornell
• C. Yu: Scientist @ Google
• X. Wang: Scientist @ Google
• P. Papotti: Eurecom Faculty
• M. Saeed: PhD @ Eurecom
Conclusion
• Data-driven fact checking
• Various use cases
• Presented two tools:
  • AggChecker
    • Verifying text summaries of relational data sets. SIGMOD 2019 (arXiv 2018).
  • Scrutinizer
    • Scrutinizer: fact checking statistical claims. VLDB 2020.