Post on 30-Dec-2015
ConQuer: Efficient Management of Inconsistent Databases
Presented by: Ariel Fuxman (Univ. of Toronto)
Joint work with:
Renée J. Miller (Univ. of Toronto)
Diego Fuxman (Univ. Nacional del Sur)
Ariel Fuxman, Diego Fuxman, Renée J. Miller

ConQuer
A system designed to answer SQL queries over inconsistent databases

INCONSISTENT DATABASE (name should be the key)

Name    Income
Peter   40K
Peter   200K
Paul    400K
Mary    110K
Mary    130K

One Application
Customer Relationship Management (CRM)
Sources: Sales, Shipping, Customer Support, Web Forms, Demographic Data
Target: Integrated Customer Database

Disagreement Between Sources

sales
name    address              …   income
Peter   276 College Street   …   40K
Paul    100 Bloor Street     …   400K
Mary    20 Union Street      …   110K

web
name    address              …   income
Peter   276 College Street   …   200K
Paul    100 Bloor Street     …   400K
Mary    20 Union Street      …   130K

Which tuple for Peter should we delete?
• Removing both tuples loses consistent information
• Deciding the correct income may require human intervention

Inconsistent Integrated Database
Transfer all conflicting tuples to the integrated database

Sales
name    …   income
Peter   …   40K
Paul    …   400K
Mary    …   110K

Web
name    …   income
Peter   …   200K
Paul    …   400K
Mary    …   130K

Integrated Database (INCONSISTENT DATABASE)
name    …   income
Peter   …   40K
Peter   …   200K
Paul    …   400K
Mary    …   110K
Mary    …   130K
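
The integration step on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not ConQuer code: the lists `sales` and `web` hold the slide's tuples with incomes in thousands, and the helper names `integrate` and `key_violations` are hypothetical.

```python
from collections import defaultdict

# Source tables from the slide as (name, income in K) tuples.
sales = [("Peter", 40), ("Paul", 400), ("Mary", 110)]
web = [("Peter", 200), ("Paul", 400), ("Mary", 130)]

def integrate(*sources):
    """Union all source tuples: conflicting tuples are all kept,
    exact duplicates (such as Paul's) are stored once."""
    integrated = []
    for source in sources:
        for row in source:
            if row not in integrated:
                integrated.append(row)
    return integrated

def key_violations(rows):
    """Return the names that occur with more than one income,
    i.e. the names for which 'name is the key' is violated."""
    incomes = defaultdict(set)
    for name, income in rows:
        incomes[name].add(income)
    return sorted(name for name, values in incomes.items() if len(values) > 1)

integrated = integrate(sales, web)
violations = key_violations(integrated)
print(integrated)
print(violations)
```

Nothing is deleted during integration; the key violations are only detected, which leaves the decision about conflicting tuples to query time.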

Query Answering
q=“Get customers who make more than 100K”

name    income   source
Peter   40K      sales
Peter   200K     web
Paul    400K     sales/web
Mary    110K     sales
Mary    130K     web

Answer: Peter, Paul, Mary

Offering a Platinum credit card…
Peter should NOT be offered a Platinum card!!

Semantics of Query Answering
Get customers who possibly make more than 100K
• Peter, Paul, Mary
Get customers who certainly make more than 100K
• Paul, Mary (CONSISTENT ANSWER) [Arenas et al. 99]

custid   income   source
Peter    40K      sales
Peter    200K     web
Paul     400K     sales/web
Mary     110K     sales
Mary     130K     web
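
The difference between "possibly" and "certainly" can be made concrete with a short Python sketch. It assumes the slide's table as a list of `(custid, income)` pairs with incomes in thousands; the function name `possible_and_certain` is hypothetical. The any/all shortcut used here is valid for this single-table selection query with key `custid`, and is not claimed to hold for arbitrary queries.

```python
from collections import defaultdict

# The inconsistent table from the slide: (custid, income in K).
customer = [("Peter", 40), ("Peter", 200), ("Paul", 400),
            ("Mary", 110), ("Mary", 130)]

def possible_and_certain(rows, predicate):
    """Group incomes by custid, then classify each customer.

    possible: at least one of the customer's tuples satisfies the predicate
    certain:  every one of the customer's tuples satisfies the predicate
    """
    by_key = defaultdict(list)
    for custid, income in rows:
        by_key[custid].append(income)
    possible = {k for k, vals in by_key.items() if any(predicate(v) for v in vals)}
    certain = {k for k, vals in by_key.items() if all(predicate(v) for v in vals)}
    return possible, certain

possible, certain = possible_and_certain(customer, lambda income: income > 100)
print(sorted(possible))
print(sorted(certain))
```

Peter is a possible answer because of his 200K tuple, but not a certain one because of his 40K tuple.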

Repairs
Inconsistent database (Key: custid)

custid   income   source
Peter    40K      sales
Peter    200K     web
Paul     400K     sales/web
Mary     110K     sales
Mary     130K     web

Repairs:
Repair 1: (Peter, 40K), (Paul, 400K), (Mary, 110K)
Repair 2: (Peter, 40K), (Paul, 400K), (Mary, 130K)
Repair 3: (Peter, 200K), (Paul, 400K), (Mary, 110K)
Repair 4: (Peter, 200K), (Paul, 400K), (Mary, 130K)
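
For a key constraint, each repair keeps exactly one tuple per key value, so the repairs of the slide's table can be enumerated as a Cartesian product over the groups of conflicting tuples. The following is a minimal sketch under that reading; the function name `repairs` and the tuple encoding (incomes in thousands) are assumptions made for illustration.

```python
from itertools import groupby, product

# The inconsistent table from the slide: (custid, income in K).
customer = [("Peter", 40), ("Peter", 200), ("Paul", 400),
            ("Mary", 110), ("Mary", 130)]

def repairs(rows):
    """Yield every repair of rows with respect to the key custid.

    Tuples sharing a custid form one group; a repair picks exactly
    one tuple from each group.
    """
    ordered = sorted(rows)  # groupby needs equal keys to be adjacent
    groups = [list(group) for _, group in groupby(ordered, key=lambda r: r[0])]
    for choice in product(*groups):
        yield list(choice)

all_repairs = list(repairs(customer))
for repair in all_repairs:
    print(repair)
```

With two candidate tuples each for Peter and Mary and one for Paul, this produces the four repairs listed on the slide.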

Consistent Query Answers
CONSISTENT ANSWERS: answers obtained no matter which repair we choose

q=“Get customers who make more than 100K”

Repairs and the answer to q on each:
Repair 1: (Peter, 40K), (Paul, 400K), (Mary, 110K)    answer: Paul, Mary
Repair 2: (Peter, 40K), (Paul, 400K), (Mary, 130K)    answer: Paul, Mary
Repair 3: (Peter, 200K), (Paul, 400K), (Mary, 110K)   answer: Peter, Paul, Mary
Repair 4: (Peter, 200K), (Paul, 400K), (Mary, 130K)   answer: Peter, Paul, Mary

CONSISTENT ANSWER = {Paul, Mary}
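
The definition on this slide translates directly into code: evaluate q on every repair and intersect the results. The sketch below does exactly that, by brute force. It is meant to illustrate the definition only, not how ConQuer computes answers; `consistent_answer` and `q` are hypothetical names, and incomes are in thousands.

```python
from itertools import product

# The inconsistent table from the slide: (custid, income in K).
customer = [("Peter", 40), ("Peter", 200), ("Paul", 400),
            ("Mary", 110), ("Mary", 130)]

def q(repair):
    """Get customers who make more than 100K."""
    return {custid for custid, income in repair if income > 100}

def consistent_answer(rows, query):
    """Intersect the answers to query over all repairs (key: custid)."""
    groups = {}
    for row in rows:
        groups.setdefault(row[0], []).append(row)
    # One tuple per key value gives one repair.
    answers = [query(list(repair)) for repair in product(*groups.values())]
    return set.intersection(*answers)

result = consistent_answer(customer, q)
print(sorted(result))
```

Peter drops out because the repairs that keep his 40K tuple do not return him.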

Problem
Potentially HUGE number of repairs!
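
To see how quickly the number of repairs grows, note that under a key constraint it equals the product of the sizes of the groups of tuples sharing a key value. The sketch below counts repairs without building them; `count_repairs` is a hypothetical helper, the rows are assumed to be distinct, and the 50-customer table is synthetic data invented for illustration.

```python
from collections import Counter
from math import prod

def count_repairs(rows):
    """Number of repairs for key custid: product of the group sizes.
    Assumes rows contains no exact duplicate tuples."""
    group_sizes = Counter(custid for custid, _ in rows)
    return prod(group_sizes.values())

# The slide's table: groups of size 2 (Peter), 1 (Paul), 2 (Mary).
customer = [("Peter", 40), ("Peter", 200), ("Paul", 400),
            ("Mary", 110), ("Mary", 130)]
small = count_repairs(customer)

# Synthetic example: 50 customers, each with two conflicting incomes.
synthetic = [(f"c{i}", income) for i in range(50) for income in (40, 200)]
large = count_repairs(synthetic)

print(small)
print(large == 2 ** 50)
```

Only 100 tuples already give 2^50 repairs, so enumerating repairs explicitly does not scale.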

ConQuer
ConQuer is a system designed to compute consistent answers efficiently
• avoids explicit construction of repairs
• reuses commercial database technology

ConQuer’s Solution
Input: Query q, Keys
ConQuer’s Rewriting Algorithm [ICDT 05] [SIGMOD 05]
Output: Rewritten Q*
Commercial database engine, over the inconsistent database
Result: Consistent answer to q
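
As an illustration of the rewriting idea, the sketch below runs a query and a hand-written rewriting of it against the running example. The rewritten query is an assumption written for this one query, not output of ConQuer's algorithm, and SQLite (through Python's standard `sqlite3` module) merely stands in for the commercial database engine. The table name `customer` and incomes in thousands are also assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (custid TEXT, income INTEGER)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [("Peter", 40), ("Peter", 200), ("Paul", 400), ("Mary", 110), ("Mary", 130)],
)

# Original query q: customers who make more than 100K.
original_q = "SELECT DISTINCT custid FROM customer WHERE income > 100 ORDER BY custid"

# Hand-written rewriting: keep a customer only if no tuple with the
# same key value fails the selection condition.
rewritten_q = """
SELECT DISTINCT c.custid
FROM customer AS c
WHERE c.income > 100
  AND NOT EXISTS (
      SELECT 1
      FROM customer AS c2
      WHERE c2.custid = c.custid
        AND c2.income <= 100
  )
ORDER BY c.custid
"""

original_rows = [row[0] for row in conn.execute(original_q)]
rewritten_rows = [row[0] for row in conn.execute(rewritten_q)]
print(original_rows)
print(rewritten_rows)
```

The rewritten query is ordinary SQL, so the engine evaluates it directly on the inconsistent table and no repair is ever materialized.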

Contributions
Rewriting algorithm
• From a large class of SPJ SQL queries
• Into SQL queries
Rewriting for queries with grouping and aggregation
Optimized rewriting
• Exploits precomputed information, if available
Experimental evaluation
• Large databases
• TPC-H queries

Demo
Present a case study of an inconsistent database about airports and cities
Explain the automatically generated rewritings
Deal with Select-Project-Join queries with grouping and aggregation

ConQuer papers
A. Fuxman, E. Fazli, and R. J. Miller. ConQuer: Efficient Management of Inconsistent Databases, SIGMOD 2005.
A. Fuxman and R. J. Miller. First-Order Query Rewriting for Inconsistent Databases, ICDT 2005.