8/2/2019 UPL628050451174393135 Abiteboul Cours 14 Mars Relationnel
The Relational Model
Serge Abiteboul
INRIA Saclay, Collège de France and ENS Cachan
20/03/2012
The principles: abstraction, universality, independence
Abstraction: the relational model
Universality: main functionalities
Independence: the views revisited
Optimization
Complexity and expressiveness
Conclusion
The principles
Goal: the management of large amounts of data
Large amounts of data: a database
Software that does this: a DBMS (database management system)
Characteristics of the data
  Size (gigabytes, terabytes, etc.)
  Shared among many users and programs
  May be geographically distributed
  Heterogeneous storage: hard disk, network
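The data management system acts as a mediator between intelligent users and the objects that store information.
[Figure: Alice asks "Where and at what time can I see a film with Humphrey Bogart?"; the question becomes the formula $\{\, s, h \mid \exists d, t\ (\mathrm{Film}(t, d, \text{"Humphrey Bogart"}) \wedge \text{Séance}(t, s, h)) \,\}$, which in turn becomes low-level code such as a hash-table lookup:
    int get(int key) {
      int hash = (key % TABLE_SIZE);
      while (table[hash] != NULL && table[hash]->getKey() != key)
        hash = (hash …]
The questions are translated into first-order logic and then into programs, with precise and unambiguous syntax and semantics.
Alice does not want to write this program; she does not have to.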
1st principle: abstraction
Data model:
  Definition language for describing the data
  Manipulation language (queries and updates)
Simple data structures: relations, trees, graphs
Formal language for queries: logics
  Declarative vs. procedural
  Graphical languages
[Figure: complex graphical queries with MS Access]
The relational model: Codd 1970
  Data are represented as tables
  Queries are expressed in relational calculus: declarative
  In practice, a richer language: SQL
Very successful, both scientifically and industrially
  Commercial systems such as Oracle, IBM's DB2
  DBMSs on personal computers such as MS Access
2nd principle: universality
DBMSs are designed to capture all data in a single framework, for all kinds of applications
  Rich functionalities: see further
In reality:
  Applications that are too demanding require specialized software
  Today, more and more specialized systems
We need services such as:
  Concurrency and transactions
  Reliability and security
  Data distribution
  Scaling: volume of data, volume of requests
  Performance:
    Response time: the time per operation
    Throughput: the number of operations per time unit
A large variety of applications with important needs for data management
Two main classes:
  OLTP: Online Transaction Processing
    Transactional: e-commerce, banking, etc.
    Simple transactions, known in advance
  OLAP: Online Analytical Processing
    Decision making
    Business intelligence queries
    Often very complex queries involving aggregate functions
    Multidimensional queries: e.g., by date, country, product
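An OLAP multidimensional query groups a measure along some of the dimensions (date, country, product) and aggregates it. A minimal sketch in Python, assuming a hypothetical fact table `sales` of (date, country, product, amount) tuples; `rollup` is an illustrative helper, roughly what SQL's GROUP BY computes, not part of any DBMS API:

```python
from collections import defaultdict

# Hypothetical fact table: (date, country, product, amount)
sales = [
    ("2012-03-01", "FR", "book", 10.0),
    ("2012-03-01", "FR", "dvd", 15.0),
    ("2012-03-01", "DE", "book", 7.5),
    ("2012-03-02", "FR", "book", 12.0),
]

DIMENSIONS = ("date", "country", "product")

def rollup(facts, dims):
    """Sum the amount, grouped by the chosen subset of dimensions."""
    idx = [DIMENSIONS.index(d) for d in dims]
    totals = defaultdict(float)
    for row in facts:
        key = tuple(row[i] for i in idx)
        totals[key] += row[3]  # position 3 holds the measure (amount)
    return dict(totals)

print(rollup(sales, ["country"]))          # {('FR',): 37.0, ('DE',): 7.5}
print(rollup(sales, ["date", "product"]))  # totals per (date, product) pair
```

An empty list of dimensions yields the grand total; computing the totals for every subset of dimensions gives what is usually called a data cube.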
3rd principle: independence (physical/logical/external)
Separation into three levels:
  Physical level: physical organization of data on disk, disk management, schemas, indexes, transactions, log
  Logical level: logical organization of data in a schema; query and update processing
  External level: views, API, programming environments
Independence:
  Physical: we can change the physical organization without changing the logical level
  Logical: we can evolve the logical level without modifying the applications
  External: we can change or add views without affecting the logical level
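Physical independence means that a logical-level query keeps giving the same answers when the storage changes, for instance when an index is added. A minimal sketch, assuming two hypothetical storage classes behind one `lookup` interface (`HeapStorage`, a plain list that is scanned, and `IndexedStorage`, which adds a hash index) and hypothetical Séance(titre, salle, heure) tuples; the query function never mentions which storage it runs on:

```python
from collections import defaultdict

class HeapStorage:
    """Tuples kept in a list; lookup scans every tuple."""
    def __init__(self, rows):
        self.rows = list(rows)

    def lookup(self, attr_pos, value):
        return [r for r in self.rows if r[attr_pos] == value]

class IndexedStorage(HeapStorage):
    """Same tuples plus a hash index on one attribute position."""
    def __init__(self, rows, indexed_pos):
        super().__init__(rows)
        self.indexed_pos = indexed_pos
        self.index = defaultdict(list)
        for r in self.rows:
            self.index[r[indexed_pos]].append(r)

    def lookup(self, attr_pos, value):
        if attr_pos == self.indexed_pos:
            return list(self.index.get(value, []))  # direct access via the index
        return super().lookup(attr_pos, value)       # otherwise fall back to a scan

def screenings_of(storage, title):
    """Logical-level query: the (salle, heure) pairs for a given titre."""
    return sorted((s, h) for (_, s, h) in storage.lookup(0, title))

# Hypothetical sample data: Séance(titre, salle, heure)
seances = [("Casablanca", "Rex", "20:00"), ("Vertigo", "Odeon", "18:00"),
           ("Casablanca", "Odeon", "22:00")]

# Same answer whichever physical organization is used.
print(screenings_of(HeapStorage(seances), "Casablanca") ==
      screenings_of(IndexedStorage(seances, 0), "Casablanca"))  # True
```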
Abstraction
The relational model
$q_{HB} = \{\, s, h \mid \exists d, t\ (\mathrm{Film}(t, d, \text{"Humphrey Bogart"}) \wedge \text{Séance}(t, s, h)) \,\}$
In practice, using a syntax that is easier to understand, SQL:
    select salle, heure
    from Film, Séance
    where Film.titre = Séance.titre and acteur = 'Humphrey Bogart'
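The SQL query above can be run as-is on a small database. A minimal sketch using Python's built-in `sqlite3` module; the table name `Seance` (written without the accent to keep identifiers ASCII), the column name `directeur` for the second attribute of Film, and all rows are hypothetical sample data. The ORDER BY clause is added only to make the output order deterministic:

```python
import sqlite3

# In-memory database with the schema Film(titre, directeur, acteur)
# and Seance(titre, salle, heure); all rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Film   (titre TEXT, directeur TEXT, acteur TEXT);
    CREATE TABLE Seance (titre TEXT, salle TEXT, heure TEXT);
    INSERT INTO Film VALUES ('Casablanca', 'Curtiz', 'Humphrey Bogart'),
                            ('Vertigo', 'Hitchcock', 'James Stewart');
    INSERT INTO Seance VALUES ('Casablanca', 'Rex', '20:00'),
                              ('Vertigo', 'Odeon', '18:00'),
                              ('Casablanca', 'Odeon', '22:00');
""")

# Declarative: the query states what is wanted; the DBMS chooses how.
rows = conn.execute("""
    SELECT salle, heure
    FROM Film, Seance
    WHERE Film.titre = Seance.titre AND acteur = 'Humphrey Bogart'
    ORDER BY salle, heure
""").fetchall()
print(rows)  # [('Odeon', '22:00'), ('Rex', '20:00')]
```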
The query is translated into an algebraic expression that can be evaluated efficiently.
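One algebraic expression equivalent to $q_{HB}$ is $\pi_{salle, heure}(\sigma_{acteur = \text{'Humphrey Bogart'}}(\mathrm{Film}) \bowtie \text{Séance})$. A minimal Python sketch of the three operators it uses, assuming relations are lists of dicts (attribute name to value) and hypothetical sample data; it illustrates the semantics of the operators, not how a DBMS implements them:

```python
# Relations as lists of dicts (attribute name -> value).

def select(rel, pred):
    """sigma: keep the tuples that satisfy pred."""
    return [t for t in rel if pred(t)]

def project(rel, attrs):
    """pi: keep only attrs and remove duplicates (set semantics)."""
    seen, out = set(), []
    for t in rel:
        key = tuple(t[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out

def natural_join(r, s):
    """Join on all attribute names shared by r and s."""
    out = []
    for t in r:
        for u in s:
            common = t.keys() & u.keys()
            if all(t[a] == u[a] for a in common):
                out.append({**t, **u})
    return out

film = [{"titre": "Casablanca", "directeur": "Curtiz", "acteur": "Humphrey Bogart"},
        {"titre": "Vertigo", "directeur": "Hitchcock", "acteur": "James Stewart"}]
seance = [{"titre": "Casablanca", "salle": "Rex", "heure": "20:00"},
          {"titre": "Vertigo", "salle": "Odeon", "heure": "18:00"},
          {"titre": "Casablanca", "salle": "Odeon", "heure": "22:00"}]

# pi_{salle,heure}( sigma_{acteur='Humphrey Bogart'}(Film) join Seance )
plan = project(
    natural_join(select(film, lambda t: t["acteur"] == "Humphrey Bogart"), seance),
    ["salle", "heure"])
print(plan)  # [{'salle': 'Rex', 'heure': '20:00'}, {'salle': 'Odeon', 'heure': '22:00'}]
```

The selection is applied before the join, so only the matching films enter the join; choosing such an order is exactly the kind of decision left to the optimizer.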
Trees: IMS (IBM), late 60s and 70s
  A hierarchy of records with keys
Graphs: Codasyl
  A graph of records with keys
Still widely used
[Figure: the same supplier/part/order data as a tree and as a graph of records, with record types such as Supplier(sno, sname, sadd), Part(pno, pname, qty, price) and Order(ono, …)]
Little abstraction
Languages: navigational, procedural, record at a time
Trees: XML
  Exchange format for the Web
  Standard query languages: XPath, XQuery
  Developing very fast
Graphs: Semantic Web & RDF
  Format for representing knowledge
  Standard query language: SPARQL
  Developing very fast
Abstraction: high-level languages
We will discuss them
Universality: functionalities
Here with a very relational viewpoint
The core of the problem
Be able to support:
  Terabytes of data
  Millions of requests per day
For this, two main tools:
  Parallelism
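Laws about the data:
  To protect data
  To design schemas
  To optimize queries
  To explain data
Examples:
  $\text{Séance}[titre] \subseteq \mathrm{Film}[titre]$: inclusion dependency
  $\text{Séance}: salle\ heure \rightarrow titre$: functional dependency (only one movie is shown at a time in a theater)
  $\forall t, s, h\ (\text{Séance}(t, s, h) \rightarrow \exists d, a\ \mathrm{Film}(t, d, a))$: tgds (tuple-generating dependencies)
  $\forall t, t', s, h\ (\text{Séance}(t, s, h) \wedge \text{Séance}(t', s, h) \rightarrow t = t')$: egds (equality-generating dependencies)
Some of the most sophisticated developments in database theory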
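Checking whether a finite instance satisfies these dependencies is straightforward. A minimal sketch, assuming hypothetical instances of Film(titre, d, a) and Séance(titre, salle, heure) stored as sets of tuples, with attributes identified by position; `satisfies_fd` and `satisfies_inclusion` are illustrative helpers:

```python
# Hypothetical instances of Film(titre, d, a) and Seance(titre, salle, heure)
film = {("Casablanca", "Curtiz", "Humphrey Bogart")}
seance = {("Casablanca", "Rex", "20:00"), ("Vertigo", "Rex", "20:00")}

def satisfies_fd(rel, lhs, rhs):
    """Functional dependency lhs -> rhs, given as lists of attribute positions."""
    seen = {}
    for t in rel:
        key = tuple(t[i] for i in lhs)
        val = tuple(t[i] for i in rhs)
        if seen.setdefault(key, val) != val:
            return False  # two tuples agree on lhs but differ on rhs
    return True

def satisfies_inclusion(r, r_pos, s, s_pos):
    """Inclusion dependency r[r_pos] included in s[s_pos]."""
    left = {tuple(t[i] for i in r_pos) for t in r}
    right = {tuple(t[i] for i in s_pos) for t in s}
    return left <= right

# salle heure -> titre is violated: two films in theater Rex at 20:00
print(satisfies_fd(seance, [1, 2], [0]))            # False
# Seance[titre] included in Film[titre] is violated: Vertigo is not in Film
print(satisfies_inclusion(seance, [0], film, [0]))  # False
```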
From simple dependencies up to complex semantic data models

Person  Child  Car
John    Toto   BMW
John    Toto   2 Chevaux
John    Zaza   BMW
John    Zaza   2 Chevaux
Sue     Lulu

Update anomalies
Null values
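The anomalies arise because the table stores two independent facts about a person, children and cars (a multivalued dependency Person →→ Child). A minimal sketch, assuming the rows of the table above with `None` standing for Sue's missing car: decomposing into (Person, Child) and (Person, Car) removes the redundancy, the join gives John's rows back, and Sue's row shows where null values come from. The car "Clio" is a hypothetical addition:

```python
# Rows of the table above; None stands for Sue's missing car (a null value).
pcc = {("John", "Toto", "BMW"), ("John", "Toto", "2 Chevaux"),
       ("John", "Zaza", "BMW"), ("John", "Zaza", "2 Chevaux"),
       ("Sue", "Lulu", None)}

# Decomposition: children and cars are stored separately.
person_child = {(p, ch) for (p, ch, _) in pcc}
person_car = {(p, car) for (p, _, car) in pcc if car is not None}

def rejoin(pc, pk):
    """Natural join of (Person, Child) and (Person, Car) on Person."""
    return {(p, ch, car) for (p, ch) in pc for (q, car) in pk if p == q}

# John's four rows come back; Sue disappears because she has no car.
print(sorted(rejoin(person_child, person_car)))

# Update anomaly: John buys a third car.
person_car.add(("John", "Clio"))                      # decomposed: 1 insertion
rows_needed = {(p, ch, "Clio") for (p, ch) in person_child if p == "John"}
print(len(rows_needed))                               # original table: 2 insertions
```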
Atomicity: the sequence of operations is indivisible; in case of failure, either all operations are completed or all are canceled.
Consistency: the consistency property ensures that any transaction the database performs will take it from one consistent state to another. (So, consistency states that only consistent data will be written to the database.)
Isolation: when two transactions A and B are executed at the same time, the changes made by A are not visible to B until transaction A is completed and validated (committed).
Durability: once validated, the state of the database must be permanent, and no technical problem should lead to cancelling the operations of the transaction.
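Atomicity, together with consistency enforced by a constraint, can be observed with Python's built-in `sqlite3` module. A minimal sketch, assuming a hypothetical `account` table: a transfer whose second update violates a CHECK constraint is rolled back entirely, including the first update that had already succeeded. Using the connection as a context manager commits on success and rolls back on an exception:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, "
             "balance INTEGER CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("Alice", 100), ("Bob", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Credit dst, then debit src, as a single transaction.
    'with conn' commits if the block succeeds and rolls back on an exception."""
    with conn:
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                     (amount, dst))
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                     (amount, src))

transfer(conn, "Alice", "Bob", 30)          # succeeds: Alice 70, Bob 80
try:
    transfer(conn, "Alice", "Bob", 500)     # the debit violates the CHECK constraint
except sqlite3.IntegrityError:
    pass                                    # the credit to Bob is rolled back as well

print(dict(conn.execute("SELECT name, balance FROM account ORDER BY name")))
# {'Alice': 70, 'Bob': 80}
```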
The DBMS must resist failures
A variety of techniques:
  Journal (log)
  Backup copies
  Hot standby: a second system running simultaneously
as reasonable for an application
Typically the case:
  When integrating several data sources
    Organizations with many branches
    Activities involving several companies
  When using distribution to get better performance
Data localization & global query optimization
Data fragmentation: typically horizontal partitioning
Distributed transactions: two-phase commit
  Typically too heavy for Web applications
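The decision logic of two-phase commit fits in a few lines. A minimal simulation, assuming hypothetical in-memory `Participant` objects; it deliberately omits what makes the real protocol heavy (forced log writes at each site, timeouts, recovery, and participants that stay blocked in the prepared state until the decision arrives):

```python
class Participant:
    """A hypothetical site holding part of the data."""
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit   # whether the local work succeeded
        self.state = "active"

    def prepare(self):
        """Phase 1: vote. A site that votes yes must obey the final decision."""
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def finish(self, decision):
        """Phase 2: apply the global decision ('committed' or 'aborted')."""
        self.state = decision

def two_phase_commit(participants):
    """Coordinator: commit only if every participant votes yes."""
    votes = [p.prepare() for p in participants]        # phase 1: collect all votes
    decision = "committed" if all(votes) else "aborted"
    for p in participants:                             # phase 2: broadcast decision
        p.finish(decision)
    return decision

sites = [Participant("Paris"), Participant("Lyon")]
print(two_phase_commit(sites))                             # committed
sites = [Participant("Paris"), Participant("Lyon", can_commit=False)]
print(two_phase_commit(sites), [p.state for p in sites])   # aborted ['aborted', 'aborted']
```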
Security
  Protect content against unauthorized users (humans or programs)
  Confidentiality: access control, authentication, authorization
Data cleaning
Data mining
Data streaming
Spatiotemporal data
Etc.
Independence: views
Definition: a function f: database states → view states
One of the most fundamental topics in databases
[Figure: the function f maps database states (db1, …, db6) to view states (v1, …)]
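Seen this way, a view is just a function from database states to view states. A minimal sketch, assuming database states are hypothetical sets of Séance(titre, salle, heure) tuples and the view keeps the titles shown from 21:00 on; it shows that f need not be injective, which is one reason why translating a view update back to the database can be ambiguous:

```python
# A view is a function f from database states to view states.
# Hypothetical database states: sets of Seance(titre, salle, heure) tuples.

def f(db):
    """View: the titles shown at 21:00 or later (HH:MM strings compare correctly)."""
    return frozenset(t for (t, s, h) in db if h >= "21:00")

db1 = {("Casablanca", "Rex", "22:00"), ("Vertigo", "Odeon", "18:00")}
db2 = {("Casablanca", "Odeon", "23:00")}

print(f(db1) == f(db2))  # True: two different database states, same view state
```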
Classical: a query (define view)
Implicit definition and recursion: Datalog
Dependencies (tgds)
Mix between XML and service calls
[Figure: an XML tree (Colorado, resorts Aspen and Lake Tahoe, snow depths 2 meters and 1 meter) containing the service calls Unisys.com/snow(Aspen) and Yahoo.com/GetHotels(Aspen)]
Intensional view:
  Query: complex
  Update: do nothing
Materialized view:
  Query: simple
  Update: propagate
Base → view: costly view maintenance
View → base: ambiguous
Query vs. update: the database tradeoff
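The tradeoff can be made concrete with a view that counts screenings per title. A minimal sketch, assuming a hypothetical base relation Séance(titre, salle, heure); the intensional version recomputes at every query, while the materialized one pays at every update through incremental maintenance. The view-to-base direction, the ambiguous one, is not addressed here:

```python
from collections import Counter

# Hypothetical base relation Seance(titre, salle, heure)
base = [("Casablanca", "Rex", "20:00"), ("Vertigo", "Odeon", "18:00")]

def view_intensional(base):
    """Intensional: nothing is stored; the view is recomputed at query time."""
    return Counter(t for (t, _, _) in base)

class MaterializedView:
    """Materialized: the result is stored and each base update is propagated."""
    def __init__(self, base):
        self.counts = Counter(t for (t, _, _) in base)

    def on_insert(self, row):
        self.counts[row[0]] += 1          # incremental maintenance

    def on_delete(self, row):
        self.counts[row[0]] -= 1
        if self.counts[row[0]] == 0:
            del self.counts[row[0]]       # keep only titles still shown

mv = MaterializedView(base)
new_row = ("Casablanca", "Odeon", "22:00")
base.append(new_row)
mv.on_insert(new_row)                        # the update cost is paid here...
print(mv.counts == view_intensional(base))   # ...so reading mv needs no recomputation
```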
Intensional: mediator; queries are complex
Materialized: warehouse; updates are complex
Definitions:
  Global as view: $v = \varphi(db_1, \dots, db_n)$
  Local as view: $db_i = \varphi_i(v)$ for each $i$
  Arbitrarily complex constraints between the database and the views, sometimes called alignments between them
Optimization
The queries are based on relational calculus, a logical language that is simple and understandable by people, especially in its variants (such as SQL).
A calculus query can easily be translated into an expression of relational algebra.
Relational algebra is a limited model of computation (it does not allow computing arbitrary functions). That is why it is possible to optimize the evaluation of algebraic expressions.
Finally, for this language, parallelism allows scaling to very large databases (class AC0).
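For example, two standard equivalences that optimizers use to rewrite algebraic expressions (given here as an illustration; the slides do not list specific rules), and the first one applied to the query on Humphrey Bogart films, where the condition involves only the attribute acteur of Film:

```latex
\begin{align*}
\sigma_{c}(R \bowtie S) &= \sigma_{c}(R) \bowtie S
  && \text{if } c \text{ mentions only attributes of } R\\
\sigma_{c_1}(\sigma_{c_2}(R)) &= \sigma_{c_1 \wedge c_2}(R)\\
\sigma_{acteur=\text{'Humphrey Bogart'}}(\mathrm{Film} \bowtie \text{Séance})
  &= \sigma_{acteur=\text{'Humphrey Bogart'}}(\mathrm{Film}) \bowtie \text{Séance}
\end{align*}
```

Pushing the selection below the join shrinks the input of the join, which is usually the most expensive operator.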
(a) Nested loops (… for each s in Séance do …): complexity in n²
(b) If few tuples pass the selection: complexity in n
(c) Using the index: constant complexity
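The three costs can be made concrete by counting steps. A minimal sketch, assuming hypothetical relations of n tuples in which a single film has Humphrey Bogart as actor, and Python dicts standing in for hash indexes built ahead of time; the step counters are illustrative, not a real cost model:

```python
from collections import defaultdict

# Hypothetical relations with n tuples each: Film(titre, d, acteur), Seance(titre, salle, heure)
n = 1000
film = [(f"film{i}", "dir", "Humphrey Bogart" if i == 7 else "other") for i in range(n)]
seance = [(f"film{i}", f"salle{i % 10}", "20:00") for i in range(n)]

def nested_loops(film, seance):
    """(a) For each film, for each seance: n * n steps."""
    steps, out = 0, []
    for (t, _, a) in film:
        for (t2, s, h) in seance:
            steps += 1
            if t == t2 and a == "Humphrey Bogart":
                out.append((s, h))
    return out, steps

def select_first(film, seance):
    """(b) Select first: n steps, plus n per selected film (few of them)."""
    steps, out, selected = 0, [], []
    for f in film:
        steps += 1
        if f[2] == "Humphrey Bogart":
            selected.append(f)
    for (t, _, _) in selected:
        for (t2, s, h) in seance:
            steps += 1
            if t == t2:
                out.append((s, h))
    return out, steps

def with_index(actor_index, seance_index):
    """(c) Hash indexes: steps depend on the number of matches, not on n."""
    steps, out = 0, []
    for (t, _, _) in actor_index.get("Humphrey Bogart", []):
        steps += 1
        for (_, s, h) in seance_index.get(t, []):
            steps += 1
            out.append((s, h))
    return out, steps

# Indexes on Film.acteur and Seance.titre, built once before query time.
actor_index, seance_index = defaultdict(list), defaultdict(list)
for f in film:
    actor_index[f[2]].append(f)
for s in seance:
    seance_index[s[0]].append(s)

for name, (res, steps) in [("a", nested_loops(film, seance)),
                           ("b", select_first(film, seance)),
                           ("c", with_index(actor_index, seance_index))]:
    print(name, res, steps)
```

All three strategies return the same answer; only the number of steps differs (here 1,000,000, 2,000 and 2).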
Using access structures (e.g., hash indexes)
Using sophisticated algorithms
Cost evaluation to select an execution plan
Technique: rewrite queries based on heuristics, to explore only part of the space of possible plans
These problems can greatly benefit from parallelism
[Figure: a filter (selection) applied in parallel]
Typically: divide the data
This is not true for all problems
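A selection parallelizes well because each part of the data can be filtered independently and the results simply concatenated. A minimal sketch, assuming hypothetical Film(titre, d, acteur) tuples and round-robin horizontal partitioning; it uses a thread pool from the standard library, which in CPython mostly shows the structure of the computation rather than speeding up CPU-bound work:

```python
from concurrent.futures import ThreadPoolExecutor

def partition(rows, k):
    """Horizontal partitioning: tuple i goes to partition i mod k."""
    return [rows[i::k] for i in range(k)]

def local_filter(part):
    """Each worker applies the selection to its own partition only."""
    return [t for t in part if t[2] == "Humphrey Bogart"]

def parallel_select(rows, k=4):
    parts = partition(rows, k)
    with ThreadPoolExecutor(max_workers=k) as pool:
        results = list(pool.map(local_filter, parts))
    # No communication between workers is needed: the answer is the union.
    return [t for r in results for t in r]

# Hypothetical data: every third film stars Humphrey Bogart
films = [(f"film{i}", "dir", "Humphrey Bogart" if i % 3 == 0 else "other")
         for i in range(12)]
print(len(parallel_select(films)))  # 4
```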
Complexity and expressivity
Complexity
http://www.cs.rice.edu/~vardi/papers/sigmod08.pdf
Complexity: for a fixed query q, testing, given (I, t), whether t is in q(I), as a function of the …
  Focus on Boolean queries so as not to depend on the output size
Very different, and if mixed, the dependency on the query typically hides the dependency on the data:
  Data complexity: as a function of the size of the data (query fixed)
  Query complexity: as a function of the size of the query (data fixed)
Relational calculus is in logspace (data complexity)
  The test can be performed using space logarithmic in the size of the data
  This is primarily because the arity of tables is fixed, so a tuple uses logarithmic space
$\text{logspace} \subseteq \text{NC}\ (\subseteq \text{ptime})$
  Good potential for parallelization; see further
The query complexity is pspace
  Intuition: an intermediary result may be very large if it is the join of many relations
  It depends more on the number of variables used in the query than on its actual size
  Naive evaluation of $\pi_A(R \bowtie S)$ requires more space than that of $\pi_A(R) \cap \pi_A(S)$
Polynomial in the tree width
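The projection example can be checked on a small instance. A minimal sketch, assuming hypothetical relations R(A, B) and S(A, C) whose only common attribute is A (the case where the two expressions are equivalent); both give the same answer, but the naive plan materializes the whole join:

```python
# Hypothetical relations R(A, B) and S(A, C); A is the only common attribute.
R = {(a, b) for a in range(3) for b in range(100)}     # A in {0, 1, 2}
S = {(a, c) for a in range(2, 5) for c in range(100)}  # A in {2, 3, 4}

# Naive plan: materialize R join S, then project on A.
join = {(a, b, c) for (a, b) in R for (a2, c) in S if a == a2}
via_join = {a for (a, _, _) in join}

# Equivalent plan: project each relation first, then intersect.
via_intersection = {a for (a, _) in R} & {a for (a, _) in S}

print(via_join == via_intersection, len(join), len(via_intersection))  # True 10000 1
```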
Data complexity: constant parallel time
AC0
  A complexity class used in circuit complexity
  The problems that can be solved with circuits of constant depth and polynomial size, with unbounded fan-in AND and OR gates.
One cannot compute the transitive closure (in relational calculus)
Add a fixpoint:
  Inflationary: fixpoint
  Or not: while
Vardi's theorem: with an order on the domain, while = pspace
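Transitive closure is the standard example of a query that needs recursion; in Datalog it reads T(x, y) :- E(x, y). T(x, z) :- T(x, y), E(y, z). A minimal Python sketch of its inflationary fixpoint evaluation, assuming a hypothetical edge relation `E`; each round only adds pairs, and since there are finitely many pairs over the domain, the iteration terminates:

```python
def transitive_closure(edges):
    """Inflationary fixpoint: start from E, add {(x, z) | T(x, y), E(y, z)}
    until a round adds nothing new."""
    closure = set(edges)
    while True:
        new = {(x, z) for (x, y) in closure for (y2, z) in edges if y == y2}
        if new <= closure:
            return closure      # fixpoint reached
        closure |= new          # inflationary: tuples are only ever added

E = {("a", "b"), ("b", "c"), ("c", "d")}
print(sorted(transitive_closure(E)))
# [('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd'), ('c', 'd')]
```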
One cannot test whether a relation has an even number of tuples
Abiteboul and Vianu: characterization of what can be computed with fixpoint and while
  Theorem: fixpoint = while iff ptime = pspace
Conclusion
And then: always question everything
Revisit the models, languages, principles
Why?
  To scale to ever more data and queries
  To support extreme applications that cannot be supported by standard technology: Visa transactions
  To facilitate application development
  To offer more in terms of performance, reliability, security, etc.
Relational model                      Beyond
Entries in relations = atomic values  Entries are sets of values
                                      Missing data, probabilistic data
ACID                                  Weaker concurrency
Universal                             Specialized: NoSQL
Data are persistent                   Queries on data flows
Data are static                       Data & behavior: object databases, active databases
Constraints are static (FDs, etc.)    Triggers
Thank you!