Daniel Deutch Nave Frost Amir Giladamirgilad/papers/VLDB17Presentation.pdf · 2017-08-27 · Daniel...

Provenance for Natural Language Queries

Daniel Deutch Nave Frost Amir Gilad

Tel Aviv University

August 2017

Presented by Amir Gilad

Daniel Deutch, Nave Frost, Amir Gilad Provenance for Natural Language Queries August 2017 1 / 23

Outline

1 Introduction

2 Mappings and Answer Tree - Single Assignment

3 Factorization

4 Summarization

5 Experiments

6 Related Work and Conclusions

Motivation

NL Query

Return the organization of authors who published papers in database conferences after 2005

Formal Query

query(oname) :- org(oid, oname), conf(cid, cname),
                pub(wid, cid, ptitle, pyear), author(aid, aname, oid),
                domainConf(cid, did), domain(did, dname),
                writes(aid, wid), dname = 'Databases', pyear > 2005

Result

Tel Aviv University (TAU) (why?)

What We Have - Provenance

(oname,TAU)·(aname,Tova M.)·(ptitle,OASSIS...)·(cname,SIGMOD)·(pyear,14’) +
(oname,TAU)·(aname,Tova M.)·(ptitle,Querying...)·(cname,VLDB)·(pyear,06’) +
(oname,TAU)·(aname,Tova M.)·(ptitle,Monitoring...)·(cname,VLDB)·(pyear,07’) +
(oname,TAU)·(aname,Slava N.)·(ptitle,OASSIS...)·(cname,SIGMOD)·(pyear,14’) +
(oname,TAU)·(aname,Tova M.)·(ptitle,A sample...)·(cname,SIGMOD)·(pyear,14’) + ...

What We Want - Explanations

TAU is the organization of 43 authors who published 170 papers in 31 conferences in 2006 - 2015
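To make the provenance expression concrete, here is a minimal Python sketch that evaluates the formal query over a tiny hand-made instance and records one monomial per satisfying assignment. The rows, the ids and the helper name `evaluate` are illustrative assumptions, not data or code from the talk, and the brute-force join merely stands in for a provenance-aware engine.

```python
from itertools import product

# Toy instance; every row and id below is made up for illustration.
org = [(1, "TAU")]                                   # (oid, oname)
author = [(10, "Tova M.", 1), (11, "Slava N.", 1)]   # (aid, aname, oid)
conf = [(100, "SIGMOD"), (101, "VLDB")]              # (cid, cname)
domain = [(7, "Databases")]                          # (did, dname)
domain_conf = [(100, 7), (101, 7)]                   # (cid, did)
pub = [                                              # (wid, cid, ptitle, pyear)
    (1000, 100, "OASSIS...", 2014),
    (1001, 101, "Querying...", 2006),
    (1002, 101, "Monitoring...", 2007),
    (1003, 101, "An older paper", 2004),             # filtered out by pyear > 2005
]
writes = [(10, 1000), (11, 1000), (10, 1001), (10, 1002), (10, 1003)]  # (aid, wid)


def evaluate():
    """Return {oname: [monomial, ...]}, one monomial per satisfying assignment."""
    provenance = {}
    for o, a, w, p, c, dc, d in product(org, author, writes, pub, conf, domain_conf, domain):
        (oid, oname), (aid, aname, a_oid) = o, a
        (w_aid, w_wid), (wid, p_cid, ptitle, pyear) = w, p
        (cid, cname), (dc_cid, dc_did), (did, dname) = c, dc, d
        # Join conditions implied by the shared variables of the query body.
        joined = (a_oid == oid and w_aid == aid and w_wid == wid
                  and p_cid == cid and dc_cid == cid and dc_did == did)
        if joined and dname == "Databases" and pyear > 2005:
            monomial = (("oname", oname), ("aname", aname), ("ptitle", ptitle),
                        ("cname", cname), ("pyear", pyear))
            provenance.setdefault(oname, []).append(monomial)
    return provenance


if __name__ == "__main__":
    for monomial in evaluate()["TAU"]:
        print("·".join(f"({var},{val})" for var, val in monomial))
```

Each printed line is one product term; the provenance of TAU is their sum.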

Solution Overview

Solution

Use the input question to formulate a detailed NL answer by replacing words with values

  - This is a general idea: showing provenance in a way that corresponds to the standard user interaction

When a long answer is needed, compact it using algebraic factorization and summarization

  - Here, again, we leverage the structure of the user question

Current Limitations

Only conjunctive queries are supported

Some aspects of the solution are limited to a specific NLIDB

  - But the general idea is not

Framework

[Figure: system architecture. Labels: NL Query; (Augmented) NaLIR; Parser; Query Builder; Dep. Tree; Query + Mapping; DB; SelP; Results + Provenance + Mapping; Factorization; Fact. + Mapping; Sentence Generation; Fact. + Sentence; Sentence; Summarization; Summarized Sentence]

Augment NaLIR [Fei Li, Jagadish, 15’]

Use a provenance-aware engine - SelP [Deutch et al., 15’]

Store the provenance and mappings

Translate results and provenance to NL using factorization and summarization

Mappings

Return the organization of authors who published papers in database conferences after 2005

query(oname) :- org(oid, oname), conf(cid, cname), pub(wid, cid, ptitle, pyear), author(aid, aname, oid), domainConf(cid, did), domain(did, dname), writes(aid, wid), dname = 'Databases', pyear > 2005

[Figure: dependency tree of the NL query. Nodes: Return; the; organization (POS=NN, REL=dobj); of (POS=IN, REL=prep); authors (POS=NNS, REL=pobj); who; published (POS=VBD, REL=rcmod); papers; in; conferences (POS=NNS, REL=pobj); database (POS=NN, REL=nn); after (POS=IN, REL=prep); 2005 (POS=CD, REL=pobj)]

Mapped values:

(oname, TAU)

(aname, Tova M.)

(ptitle, 'OASSIS...')

(cname, SIGMOD)

(pyear, 2014)

From Mappings to an Answer

(oname, TAU)

(aname, Tova M.)

(ptitle, 'OASSIS...')

(cname, SIGMOD)

(pyear, 2014)

[Figure: dependency tree of the NL query (same nodes and POS/REL tags as on the Mappings slide) next to the answer tree. Answer tree nodes: TAU (is the); organization; of; Tova M.; who; published; 'OASSIS...'; in; SIGMOD; in; 2014]

Answer

TAU is the organization of Tova M. who published 'OASSIS...' in SIGMOD in 2014
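The word-replacement step for a single assignment can be sketched in a few lines of Python. This is only an illustration on the flat word sequence: the talk works on the dependency tree, and the tables `WORD_TO_VAR`, `DROPPED` and `REWRITTEN`, as well as the function name `to_answer`, are assumptions read off the example rather than the system's actual rules.

```python
QUERY = ("Return the organization of authors who published papers "
         "in database conferences after 2005").split()

# Assumed word-to-variable mapping, read off the example slide.
WORD_TO_VAR = {"authors": "aname", "papers": "ptitle",
               "conferences": "cname", "2005": "pyear"}
HEAD_VAR = "oname"             # the returned attribute; its value opens the answer
DROPPED = {"database"}         # modifier implied by the concrete conference name
REWRITTEN = {"after": "in"}    # the comparison becomes "in <year>"


def to_answer(assignment):
    """Replace mapped query words with the values of one assignment."""
    words = []
    for word in QUERY:
        if word == "Return":
            words.append(f"{assignment[HEAD_VAR]} is")
        elif word in DROPPED:
            continue
        elif word in WORD_TO_VAR:
            words.append(str(assignment[WORD_TO_VAR[word]]))
        else:
            words.append(REWRITTEN.get(word, word))
    return " ".join(words)


if __name__ == "__main__":
    print(to_answer({"oname": "TAU", "aname": "Tova M.", "ptitle": "'OASSIS...'",
                     "cname": "SIGMOD", "pyear": 2014}))
```

With the assignment shown on the slide, the sketch yields the same sentence as the Answer line above.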

Provenance Factorization

Idea

Use algebraic factorization of the provenance to take out common values that appear in multiple assignments

Provenance

[TAU]·[Tova M.]·[OASSIS...]·[SIGMOD]·[2014] +
[TAU]·[Tova M.]·[Querying...]·[VLDB]·[2006] +
[TAU]·[Tova M.]·[Monitoring...]·[VLDB]·[2007] +
[TAU]·[Slava N.]·[OASSIS...]·[SIGMOD]·[2014] +
[TAU]·[Tova M.]·[A sample...]·[SIGMOD]·[2014]

Two Different Factorizations

[TAU] · ( [SIGMOD] · [2014] · ( [OASSIS...] · ([Tova M.] + [Slava N.])
                               + [Tova M.] · [A sample...] )
        + [VLDB] · [Tova M.] · ( [2006] · [Querying...] + [2007] · [Monitoring...] ) )

[TAU] · ( [Tova M.] · ( [VLDB] · ( [2006] · [Querying...] + [2007] · [Monitoring...] )
                      + [SIGMOD] · [2014] · ( [OASSIS...] + [A sample...] ) )
        + [Slava N.] · [OASSIS...] · [SIGMOD] · [2014] )

Shorter means better?
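A quick way to check that both factorizations denote the same provenance, and to compare their lengths, is to expand them mechanically. The Python sketch below does this under two simplifying assumptions of mine: a monomial is treated as a set of values and a sum as a set of monomials, which is adequate here because all five monomials are distinct. The constructors `lit`, `mul`, `add` and the names `f_short` and `f_long` are hypothetical.

```python
from itertools import product


def lit(value):
    return ("lit", value)


def mul(*factors):
    return ("mul", factors)


def add(*terms):
    return ("add", terms)


def expand(expr):
    """Return the set of monomials (frozensets of values) denoted by expr."""
    kind, body = expr
    if kind == "lit":
        return {frozenset([body])}
    if kind == "add":
        return set().union(*(expand(term) for term in body))
    # Product: pick one monomial from every factor and merge them.
    return {frozenset().union(*combo)
            for combo in product(*(expand(factor) for factor in body))}


def size(expr):
    """Number of value occurrences in the expression."""
    kind, body = expr
    return 1 if kind == "lit" else sum(size(sub) for sub in body)


T, S = lit("Tova M."), lit("Slava N.")
vldb_papers = add(mul(lit("2006"), lit("Querying...")),
                  mul(lit("2007"), lit("Monitoring...")))

f_short = mul(lit("TAU"), add(
    mul(lit("SIGMOD"), lit("2014"),
        add(mul(lit("OASSIS..."), add(T, S)), mul(T, lit("A sample...")))),
    mul(lit("VLDB"), T, vldb_papers)))

f_long = mul(lit("TAU"), add(
    mul(T, add(mul(lit("VLDB"), vldb_papers),
               mul(lit("SIGMOD"), lit("2014"),
                   add(lit("OASSIS..."), lit("A sample..."))))),
    mul(S, lit("OASSIS..."), lit("SIGMOD"), lit("2014"))))

PROVENANCE = {frozenset(m) for m in [
    ("TAU", "Tova M.", "OASSIS...", "SIGMOD", "2014"),
    ("TAU", "Tova M.", "Querying...", "VLDB", "2006"),
    ("TAU", "Tova M.", "Monitoring...", "VLDB", "2007"),
    ("TAU", "Slava N.", "OASSIS...", "SIGMOD", "2014"),
    ("TAU", "Tova M.", "A sample...", "SIGMOD", "2014"),
]}

if __name__ == "__main__":
    print(expand(f_short) == PROVENANCE, expand(f_long) == PROVENANCE)
    print(size(f_short), size(f_long))
```

Both expressions expand to the five original monomials; the first uses 14 value occurrences and the second 15, which is why the first one is the shorter of the two.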

T-Compatibility

NL Query

Return the organization of authors who published papers in database conferences after 2005

Shortest Factorization

[TAU] · ( [SIGMOD] · [2014] · ( [OASSIS...] · ([Tova M.] + [Slava N.])
                               + [Tova M.] · [A sample...] )
        + [VLDB] · [Tova M.] · ( [2006] · [Querying...] + [2007] · [Monitoring...] ) )

As a Sentence

TAU is the organization of authors who published in SIGMOD 2014
'OASSIS...' which was published by Tova M. and Slava N.
and Tova M. published 'A sample...'
and Tova M. published in VLDB
'Querying...' in 2006
and 'Monitoring...' in 2007.

[Figure: dependency tree T of the NL query, with the same nodes and POS/REL tags as on the Mappings slide]

conferences ≤_T authors, but conferences ≰_{f_bad} authors (f_bad denotes the shortest factorization above)

Longer, T-Compatible Factorization

[TAU] · ( [Tova M.] · ( [VLDB] · ( [2006] · [Querying...] + [2007] · [Monitoring...] )
                      + [SIGMOD] · [2014] · ( [OASSIS...] + [A sample...] ) )
        + [Slava N.] · [OASSIS...] · [SIGMOD] · [2014] )

As a Sentence

TAU is the organization of
Tova M. who published
in VLDB
'Querying...' in 2006 and
'Monitoring...' in 2007
and in SIGMOD in 2014
'OASSIS...' and 'A sample...'
and Slava N. who published
'OASSIS...' in SIGMOD in 2014.
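The compatibility condition can be illustrated with a small checker. The Python sketch below uses a simplified reading of T-compatibility, not the formal definition from the paper: a factorization is flagged when a value is nested inside a product whose factored-out value belongs to a word lying below it in the dependency tree. The parent table `PARENT` (restricted to mapped words), the value-to-word table `WORD` and all function names are assumptions made for this example.

```python
# Assumed ancestor structure among the mapped words of the query tree.
PARENT = {"organization": None, "authors": "organization",
          "papers": "authors", "conferences": "authors", "2005": "authors"}

# Which query word each value replaces (assumed from the example).
WORD = {"TAU": "organization", "Tova M.": "authors", "Slava N.": "authors",
        "OASSIS...": "papers", "Querying...": "papers",
        "Monitoring...": "papers", "A sample...": "papers",
        "SIGMOD": "conferences", "VLDB": "conferences",
        "2014": "2005", "2006": "2005", "2007": "2005"}


def lit(value):
    return ("lit", value)


def mul(*factors):
    return ("mul", factors)


def add(*terms):
    return ("add", terms)


def is_descendant(x, y):
    """True if word x lies strictly below word y in the tree."""
    parent = PARENT[x]
    while parent is not None:
        if parent == y:
            return True
        parent = PARENT[parent]
    return False


def violations(expr, enclosing=()):
    """Yield (outer, inner) pairs where inner is nested under outer although
    outer's word is a descendant of inner's word in the tree."""
    kind, body = expr
    if kind == "lit":
        for outer in enclosing:
            if is_descendant(WORD[outer], WORD[body]):
                yield (outer, body)
    elif kind == "add":
        for term in body:
            yield from violations(term, enclosing)
    else:
        # Literal factors of a product enclose its non-literal factors.
        heads = tuple(value for k, value in body if k == "lit")
        for factor in body:
            inner = enclosing if factor[0] == "lit" else enclosing + heads
            yield from violations(factor, inner)


T, S = lit("Tova M."), lit("Slava N.")
vldb_papers = add(mul(lit("2006"), lit("Querying...")),
                  mul(lit("2007"), lit("Monitoring...")))
f_bad = mul(lit("TAU"), add(
    mul(lit("SIGMOD"), lit("2014"),
        add(mul(lit("OASSIS..."), add(T, S)), mul(T, lit("A sample...")))),
    mul(lit("VLDB"), T, vldb_papers)))
f_good = mul(lit("TAU"), add(
    mul(T, add(mul(lit("VLDB"), vldb_papers),
               mul(lit("SIGMOD"), lit("2014"),
                   add(lit("OASSIS..."), lit("A sample..."))))),
    mul(S, lit("OASSIS..."), lit("SIGMOD"), lit("2014"))))

if __name__ == "__main__":
    print(sorted(set(violations(f_bad))))
    print(list(violations(f_good)))
```

Under these assumptions the shortest factorization is flagged, for instance because [Tova M.] is nested under [SIGMOD], while the longer factorization produces no violations.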

Factorization Algorithm

Proposition

Obtaining a minimal T-compatible factorization is coNP-hard

Algorithm

Factorize greedily: traverse the dependency tree level-by-level

For every level with mapped words, factorize their corresponding values in the provenance

Prioritize which values to take out in each level by frequency

Complexity

O(n² · log n): recursively traverse the dependency tree and sort the variables at each layer by their frequency in O(n · log n)
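The greedy strategy can be sketched as a short recursive Python function. This is a simplified illustration, not the authors' implementation: the grouping of variables into `LEVELS` is an assumption about the example tree, ties between equally frequent values are broken by first occurrence, and the result is rendered directly as a string.

```python
from collections import Counter

# Variables grouped by the tree level of their mapped word (assumed grouping).
LEVELS = [["oname"], ["aname"], ["ptitle", "cname", "pyear"]]

MONOMIALS = [
    {"oname": "TAU", "aname": "Tova M.", "ptitle": "OASSIS...", "cname": "SIGMOD", "pyear": "2014"},
    {"oname": "TAU", "aname": "Tova M.", "ptitle": "Querying...", "cname": "VLDB", "pyear": "2006"},
    {"oname": "TAU", "aname": "Tova M.", "ptitle": "Monitoring...", "cname": "VLDB", "pyear": "2007"},
    {"oname": "TAU", "aname": "Slava N.", "ptitle": "OASSIS...", "cname": "SIGMOD", "pyear": "2014"},
    {"oname": "TAU", "aname": "Tova M.", "ptitle": "A sample...", "cname": "SIGMOD", "pyear": "2014"},
]


def factorize(monomials, levels):
    """Return the summands of a greedy, level-by-level factorization."""
    # Keep only variables that have not been taken out yet.
    live = [[v for v in level if v in monomials[0]] for level in levels]
    live = [level for level in live if level]
    if not live:
        return []
    # Most frequent (variable, value) pair within the topmost live level.
    counts = Counter((v, m[v]) for m in monomials for v in live[0])
    var, value = counts.most_common(1)[0][0]
    inside = [{k: x for k, x in m.items() if k != var}
              for m in monomials if m[var] == value]
    outside = [m for m in monomials if m[var] != value]
    inner = factorize(inside, levels)
    term = f"[{value}]"
    if len(inner) == 1:
        term += "·" + inner[0]
    elif inner:
        term += "·(" + " + ".join(inner) + ")"
    return [term] + (factorize(outside, levels) if outside else [])


def render(monomials, levels=LEVELS):
    return " + ".join(factorize(monomials, levels))


if __name__ == "__main__":
    print(render(MONOMIALS))
```

On the running example this should take out [TAU] first, then the authors, and only then conferences, years and titles, giving the T-compatible factorization up to the order of operands.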

Factorization Example

[Figure: dependency tree of the NL query, with the same nodes and POS/REL tags as on the Mappings slide, shown above each step]

Initial provenance:

[TAU]·[Tova M.]·[OASSIS...]·[SIGMOD]·[2014] +
[TAU]·[Tova M.]·[Querying...]·[VLDB]·[2006] +
[TAU]·[Tova M.]·[Monitoring...]·[VLDB]·[2007] +
[TAU]·[Slava N.]·[OASSIS...]·[SIGMOD]·[2014] +
[TAU]·[Tova M.]·[A sample...]·[SIGMOD]·[2014]

After taking out the organization:

[TAU] · ( [Tova M.]·[OASSIS...]·[SIGMOD]·[2014] +
          [Tova M.]·[Querying...]·[VLDB]·[2006] +
          [Tova M.]·[Monitoring...]·[VLDB]·[2007] +
          [Slava N.]·[OASSIS...]·[SIGMOD]·[2014] +
          [Tova M.]·[A sample...]·[SIGMOD]·[2014] )

After taking out the authors:

[TAU] · ( [Tova M.] · ( [OASSIS...]·[SIGMOD]·[2014] +
                        [Querying...]·[VLDB]·[2006] +
                        [Monitoring...]·[VLDB]·[2007] +
                        [A sample...]·[SIGMOD]·[2014] )
        + [Slava N.] · [OASSIS...] · [SIGMOD] · [2014] )

After factorizing the remaining values:

[TAU] · ( [Tova M.] · ( [VLDB] · ( [2006] · [Querying...] + [2007] · [Monitoring...] )
                      + [SIGMOD] · [2014] · ( [OASSIS...] + [A sample...] ) )
        + [Slava N.] · [OASSIS...] · [SIGMOD] · [2014] )

As a sentence:

TAU is the organization of
Tova M. who published
in VLDB
'Querying...' in 2006 and
'Monitoring...' in 2007
and in SIGMOD in 2014
'OASSIS...' and 'A sample...'
and Slava N. who published
'OASSIS...' in SIGMOD in 2014.

Summarization

Two Levels of Summarization

[TAU] · ( [Tova M.] · ( [VLDB] · ( [2006] · [Querying...] + [2007] · [Monitoring...] )
                      + [SIGMOD] · [2014] · ( [OASSIS...] + [A sample...] ) )
        + [Slava N.] · [OASSIS...] · [SIGMOD] · [2014] )

A: the sub-expression that follows [TAU] (the whole outer parenthesized sum)

B: the sub-expression that follows [Tova M.] (the inner parenthesized sum)

Shorter Summarized Answer Based on A

TAU is the organization of 2 authors who published 4 papers in 2 conferences in 2006 - 2014

More Detailed Summarized Answer Based on B

TAU is the organization of Tova M. who published 4 papers in 2 conferences in 2006 - 2014 and Slava N. who published 'OASSIS...' in SIGMOD in 2014.
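The two summarization levels amount to replacing a sub-expression by counts of distinct values and a year range. The Python sketch below reproduces both answers for the running example; the function names, the rule that an author with a single assignment is spelled out in full, and the fixed sentence templates are assumptions of this sketch rather than the system's actual logic.

```python
ROWS = [
    {"oname": "TAU", "aname": "Tova M.", "ptitle": "'OASSIS...'", "cname": "SIGMOD", "pyear": 2014},
    {"oname": "TAU", "aname": "Tova M.", "ptitle": "'Querying...'", "cname": "VLDB", "pyear": 2006},
    {"oname": "TAU", "aname": "Tova M.", "ptitle": "'Monitoring...'", "cname": "VLDB", "pyear": 2007},
    {"oname": "TAU", "aname": "Slava N.", "ptitle": "'OASSIS...'", "cname": "SIGMOD", "pyear": 2014},
    {"oname": "TAU", "aname": "Tova M.", "ptitle": "'A sample...'", "cname": "SIGMOD", "pyear": 2014},
]


def summarize_publications(rows):
    """Collapse papers, conferences and years into counts and a range."""
    papers = {r["ptitle"] for r in rows}
    confs = {r["cname"] for r in rows}
    years = [r["pyear"] for r in rows]
    return (f"{len(papers)} papers in {len(confs)} conferences "
            f"in {min(years)} - {max(years)}")


def answer_level_a(rows):
    """Summarize everything below the organization (level A)."""
    authors = {r["aname"] for r in rows}
    return (f"{rows[0]['oname']} is the organization of {len(authors)} authors "
            f"who published {summarize_publications(rows)}")


def answer_level_b(rows):
    """Keep author names and summarize below each author (level B)."""
    by_author = {}
    for r in rows:
        by_author.setdefault(r["aname"], []).append(r)
    parts = []
    for author, group in by_author.items():
        if len(group) == 1:
            # A single assignment is short enough to be spelled out.
            r = group[0]
            parts.append(f"{author} who published {r['ptitle']} "
                         f"in {r['cname']} in {r['pyear']}")
        else:
            parts.append(f"{author} who published {summarize_publications(group)}")
    return f"{rows[0]['oname']} is the organization of " + " and ".join(parts)


if __name__ == "__main__":
    print(answer_level_a(ROWS))
    print(answer_level_b(ROWS))
```

Choosing the level is a trade-off between brevity and detail: level A hides the author names, level B keeps them and only compresses what each author published.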

Sample Use-Cases

Q: Return the authors who published papers in VLDB before 2016 and after 2007

A: Tova M. published 16 papers in VLDB in 2008 - 2015

Q: Return the authors who published papers in database conferences

A: Tova M. published 134 papers in 18 conferences

Q: Return the organization of authors who published papers in database conferences after 2005

A: TAU is the organization of 43 authors who published 170 papers in 31 conferences in 2006 - 2015

Sample Scalability Results

Computation time as a function of the number of assignments. Overhead of only 16% w.r.t. evaluation time.

[Figure: Time (sec), 0 to 4.5, vs. Number of Assignments, 0 to 5000; one series each for Query 4 through Query 12]

Breakdown of Computation Time

[Figure, two panels, each plotting Time (sec) vs. Domain of Unique Values Per Answer, 0 to 5000, for Query 4 through Query 12: (a) Factorization time, 0 to 0.6 sec; (b) Sentence gen. time, 0 to 1.4 sec]

Summarization time was negligible.

Related Work

NL Interfaces:

Formulate the NL query and present the answers, e.g., [Fei Li et al., 15’], [Song et al., 15’]

Present the answers in NL based on the schema [Franconi et al., 14’]

Explain the query in NL [Koutrika et al., 10’]

Provenance:

Showing the provenance in graph form, e.g., [Ailamaki et al., 98’], [Davidson et al., 08’]

Allowing user control over granularity [Cohen-Boulakia et al., 08’]

Provenance factorization and summarization, e.g., [Chapman et al., 08’], [Olteanu et al., 12’]

Summary

Main Contributions:

First to formulate the provenance of output tuples in NL

Employing both factorization and summarization to make provenance more understandable

Devising a criterion for provenance factorization that accounts for its presentation in NL

Future Work:

Extend the solution to UCQs, aggregation, nested queries, ...

Support more provenance models

Generalize the requirements from NL interfaces

Thank You

Questions?