Course Information
§ Instructor
B. Aditya Prakash, Torg 3160F, [email protected]
– Office Hours: 2:30-3:30pm Mondays and Wednesdays
– And by appointment
– Include the string CS 4604 in the subject of any email you send me
§ Teaching Assistants
Sorour Amiri, McBryde 106, [email protected]
– Office Hours: TBD
Shamimul Hasan, McBryde 106, [email protected]
– Will not hold regular Office Hours
§ Class Meeting Time
Monday and Wednesday, 4:00PM-5:15PM, Lavery Hall 340
§ Keeping in Touch
Course website http://courses.cs.vt.edu/~cs4604 updated regularly through the semester
– Piazza link on the website
Prakash 2016, VT CS 4604

Textbook
§ Required
Database Management Systems, by Raghu Ramakrishnan and Johannes Gehrke. 3rd Ed. McGraw Hill.
Web page for the book (with errata): http://pages.cs.wisc.edu/~dbbook/
§ Optional:
– Garcia-Molina, Ullman and Widom, 3rd Ed.
– Silberschatz, Korth and Sudarshan, 6th Ed.

Pre-reqs and Force-adds
§ Prerequisites: a grade of C or better in CS 3114, senior standing
§ Force-add requests:
– Go to: https://www.cs.vt.edu/S16Force-Adds
– Password: 4604bap$
– Survey link will work during the entire class period, for the first and second lectures
– If you miss both lectures, go to McB 114 and fill out a paper form, and find me to get a signature.

Course Grading
§ Project is spread over 3 deliverables
§ Submit hard copies of homeworks and project assignments at the start of class on the due date
§ Each class has required reading (on course web page)
§ No Pop-Quizzes :)
Homework        30%  6-7
Midterm exam    20%  (Tentative) March 2, Wed., in class
Final exam      30%  May 10, Tue., 3:25pm-5:25pm
Course project  20%  3 assignments

Course Project
§ We will put up the project overview later (first project assignment)
§ 2 or 3 persons per project.
§ Project runs the entire semester with regular assignments and a final implementation assignment.

Class Policies
§ Make sure you go through the detailed policies on the website:
http://courses.cs.vt.edu/~cs4604/Spring16/policies.html
§ Lectures: Inform me in advance if you have to leave a class early or come late for any reason.
§ Late policy: 4 ‘slip’ days (to be used only for HWs, not the project)
§ How to submit late: see web page
§ Exams: no aids allowed, except:
– 1 page with your notes (both sides), for the midterm
– 2 such pages, for the final

Why Study Databases?
§ Academic
– Databases involve many aspects of computer science
– Fertile area of research
– Three Turing awards in databases
§ Programmer
– a plethora of applications involve using and accessing databases
§ Businessman
– Everybody needs databases => lots of money to be made
§ Student
– Get those last three credits and I don’t have to come back to Blacksburg ever again!
– Google, Oracle, Microsoft, Facebook etc. will hire me!
– Databases sound cool!
– ???

What Will You Learn in CS 4604?
§ Implementation
– What is under the hood of a DB like Oracle/MySQL?
§ Design
– How do you model your data and structure your information in a database?
§ Programming
– How do you use the capabilities of a DBMS?
§ CS 4604 achieves a balance between
– a firm theoretical foundation to designing moderate-sized databases
– creating, querying, and implementing realistic databases and connecting them to applications

Course Outline
§ Weeks 1–4: Query/Manipulation Languages and Data Modeling
– Relational Algebra
– Data definition
– Programming with SQL
– Entity-Relationship (E/R) approach
– Specifying Constraints
– Good E/R design
§ Weeks 5–8: Indexes, Processing and Optimization
– Storing
– Hashing/Sorting
– Query Optimization
– NoSQL and Hadoop
§ Weeks 9–10: Relational Design
– Functional Dependencies
– Normalization to avoid redundancy
§ Weeks 11–12: Concurrency Control
– Transactions
– Logging and Recovery
§ Weeks 13–14: Students’ choice
– Practice Problems
– XML
– Data mining and warehousing

What is the goal of a DBMS?
§ Electronic record-keeping
Fast and convenient access to information
§ DBMS == database management system
– ‘Relational’ in this class
– data + set of instructions to access/manipulate data

What is a DBMS?
§ Features of a DBMS
– Support massive amounts of data
– Persistent storage
– Efficient and convenient access
– Secure, concurrent, and atomic access
§ Examples?
– Search engines, banking systems, airline reservations, corporate records, payrolls, sales inventories.
– New applications: Wikis, social/biological/multimedia/scientific/geographic data, heterogeneous data.

Features of a DBMS
• Support massive amounts of data
– Giga/tera/petabytes
– Far too big for main memory
• Persistent storage
– Programs update, query, manipulate data.
– Data continues to live long after the program finishes.
• Efficient and convenient access
– Efficient: do not search the entire database to answer a query.
– Convenient: allow users to query the data as easily as possible.
• Secure, concurrent, and atomic access
– Allow multiple users to access the database simultaneously.
– Allow a user access only to authorized data.
– Provide some guarantee of reliability against system failures.

Example Scenario
§ Students, taking classes, obtaining grades
– Find my GPA
– <and other ad-hoc queries>

Obvious Solution 1: Folders
§ Advantages?
– Cheap; Easy-to-use
§ Disadvantages?
– No ad-hoc queries
– No sharing
– Large physical footprint

Obvious Solution++
§ Flat files and C (C++, Java…) programs
– E.g. one (or more) UNIX/DOS files, with student records and their courses

Obvious Solution++
§ Layout for student records?
– CSV (‘comma-separated-values’)
Hermione Grainger,123,Potions,A
Draco Malfoy,111,Potions,B
Harry Potter,234,Potions,A
Ron Weasley,345,Potions,C

Obvious Solution++
§ Layout for student records?
– Other possibilities like
Hermione Grainger,123      123,Potions,A
Draco Malfoy,111           111,Potions,B
Harry Potter,234           234,Potions,A
Ron Weasley,345            345,Potions,C

Problems?
§ inconvenient access to data (need ‘C++’ expertise, plus knowledge of file-layout)
– data isolation
§ data redundancy (and inconsistencies)
§ integrity problems
§ atomicity problems
§ concurrent-access problems
§ security problems
§ …….

Problems - Why?
§ Two main reasons:
– file-layout description is buried within the C programs and
– there is no support for transactions (concurrency and recovery)
DBMSs handle exactly these two problems
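
To make the first reason concrete, here is a minimal Python sketch of a flat-file program over the CSV student records. The `GRADE_POINTS` mapping and the helper `gpa_from_csv` are assumptions made for illustration and do not come from the slides. The point to notice is that the record layout (name, id, course, grade) exists only as column positions inside the code, so any change to the file layout would force a change to every such program.

```python
import csv
import io

# Hypothetical grade-point mapping, assumed for illustration only.
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}

# The flat file from the slides: name, id, course, grade.
FLAT_FILE = """Hermione Grainger,123,Potions,A
Draco Malfoy,111,Potions,B
Harry Potter,234,Potions,A
Ron Weasley,345,Potions,C
"""


def gpa_from_csv(text, student_id):
    """Scan every record and average the grade points of one student."""
    points = []
    for row in csv.reader(io.StringIO(text)):
        # The layout is hard-coded here: column 1 is the id, column 3 the grade.
        if row[1] == student_id:
            points.append(GRADE_POINTS[row[3]])
    return sum(points) / len(points) if points else None


print(gpa_from_csv(FLAT_FILE, "123"))
```

Every query ("find my GPA", and each other ad-hoc question) needs its own loop like this one, and nothing here protects the file if two programs write to it at once.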

Example Scenario
§ RDBMS = “Relational” DBMS
§ The relational model uses relations or tables to structure data
§ ClassList relation:
Student            Course   Grade
Hermione Grainger  Potions  A
Draco Malfoy       Potions  B
Harry Potter       Potions  A
Ron Weasley        Potions  C
§ Relation separates the logical view (externals) from the physical view (internals)
§ Simple query languages (SQL) for accessing/modifying data
– Find all students whose grades are better than B.
– SELECT Student FROM ClassList WHERE Grade > “B”
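
The ClassList query can be tried out with a small sketch, assuming Python's standard-library `sqlite3` module and an in-memory database (neither is prescribed by the slides). One caveat: when letter grades are stored as text, 'A' sorts before 'B', so a literal `Grade > 'B'` would select the C grade. This sketch therefore writes "better than B" as `Grade < 'B'`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE ClassList (Student TEXT, Course TEXT, Grade TEXT)")
conn.executemany(
    "INSERT INTO ClassList VALUES (?, ?, ?)",
    [
        ("Hermione Grainger", "Potions", "A"),
        ("Draco Malfoy", "Potions", "B"),
        ("Harry Potter", "Potions", "A"),
        ("Ron Weasley", "Potions", "C"),
    ],
)

# Letter grades are stored as text, and 'A' sorts before 'B',
# so "better than B" is written as Grade < 'B' in this sketch.
better_than_b = [
    row[0]
    for row in conn.execute(
        "SELECT Student FROM ClassList WHERE Grade < 'B' ORDER BY Student"
    )
]
print(better_than_b)
```

The query says what is wanted and nothing about how the rows are stored or scanned, which is the separation of logical and physical views mentioned above.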

Transaction Processing
§ One or more database operations are grouped into a “transaction”
§ Transactions should meet the “ACID test”
– Atomicity: All-or-nothing execution of transactions.
– Consistency: Databases have consistency rules (e.g. what data is valid). A transaction should NOT violate the database’s consistency. If it does, it needs to be rolled back.
– Isolation: Each transaction must appear to be executed as if no other transaction is executing at the same time.
– Durability: Any change a transaction makes to the database should persist and not be lost.
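
Atomicity and consistency can be illustrated with a short sketch, again assuming Python's `sqlite3` module. The table definition, the CHECK rule on grades, and the helper `regrade` are hypothetical names chosen for this example. Two updates are grouped into one transaction; the second breaks the consistency rule, so the whole transaction is rolled back and the first update disappears as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ClassList ("
    " Student TEXT PRIMARY KEY,"
    " Grade TEXT CHECK (Grade IN ('A', 'B', 'C', 'D', 'F')))"
)
conn.execute("INSERT INTO ClassList VALUES ('Harry Potter', 'A')")
conn.execute("INSERT INTO ClassList VALUES ('Ron Weasley', 'C')")
conn.commit()


def regrade(conn, changes):
    """Apply all grade changes in one transaction, or none of them."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            for student, grade in changes:
                conn.execute(
                    "UPDATE ClassList SET Grade = ? WHERE Student = ?",
                    (grade, student),
                )
        return True
    except sqlite3.IntegrityError:
        return False


# The second change violates the CHECK rule, so the first is undone too.
ok = regrade(conn, [("Ron Weasley", "B"), ("Harry Potter", "Z")])
grades = dict(conn.execute("SELECT Student, Grade FROM ClassList"))
print(ok, grades)
```

Isolation and durability are not exercised here; showing them would need concurrent connections and an on-disk database.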

Disadvantages over (flat) files
§ Price
§ additional expertise (SQL/DBA)
(hence: over-kill for small, single-user data sets. But: mobile phones (e.g., Android) use SQLite)

A Brief History of DBMS
§ The earliest databases (1960s) evolved from file systems
– File systems
  • Allow storage of large amounts of data over a long period of time
  • File systems do not support:
    – Efficient access of data items whose location in a particular file is not known
    – Logical structure of data is limited to creation of directory structures
    – Concurrent access: Multiple users modifying a single file generate non-uniform results
  • Navigational and hierarchical
  • User programmed the queries by walking from node to node in the DBMS.
§ Relational DBMS (1970s to now)
– View database in terms of relations or tables
– High-level query and definition languages such as SQL
– Allow user to specify what (s)he wants, not how to get what (s)he wants
§ Object-oriented DBMS (1980s)
– Inspired by object-oriented languages
– Object-relational DBMS

The DBMS Industry
§ A DBMS is a software system.
§ Major DBMS vendors: Oracle, Microsoft, IBM, Sybase
§ Free/Open-source DBMS: MySQL, PostgreSQL, Firebird.
– Used by companies such as Google, Yahoo, Lycos, BASF….
§ All are “relational” (or “object-relational”) DBMS.
§ A multi-billion dollar industry

Fundamental concepts
§ 3-level architecture
§ logical data independence
§ physical data independence

3-level architecture
§ view level
§ logical level: e.g., tables
– STUDENT(ssn, name)
– TAKES(ssn, cid, grade)
§ physical level:
– how are these tables stored, how many bytes/attribute etc.

3-level architecture
§ view level, e.g.:
– v1: select ssn from student
– v2: select ssn, cid from takes
§ logical level
§ physical level
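
The view level can be sketched as follows, assuming Python's `sqlite3` module; the column types and the sample rows are made up for illustration. The tables STUDENT and TAKES form the logical level, and the views v1 and v2 expose only some of their columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Logical level: the two tables from the slides (column types are assumed).
conn.execute("CREATE TABLE student (ssn TEXT, name TEXT)")
conn.execute("CREATE TABLE takes (ssn TEXT, cid TEXT, grade TEXT)")
conn.execute("INSERT INTO student VALUES ('123', 'Hermione Grainger')")
conn.execute("INSERT INTO takes VALUES ('123', 'Potions', 'A')")

# View level: v1 and v2 expose only some columns of the tables.
conn.execute("CREATE VIEW v1 AS SELECT ssn FROM student")
conn.execute("CREATE VIEW v2 AS SELECT ssn, cid FROM takes")

v1_rows = conn.execute("SELECT * FROM v1").fetchall()
v2_rows = conn.execute("SELECT * FROM v2").fetchall()
print(v1_rows, v2_rows)
```

A user who is given only v2 can see who takes which course but never sees a grade.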

3-level architecture
§ -> hence, physical and logical data independence:
§ logical D.I.:
– ???
§ physical D.I.:
– ???

3-level architecture
§ -> hence, physical and logical data independence:
§ logical D.I.:
– can add (drop) column; add/drop table
§ physical D.I.:
– can add index; change record order
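
Both kinds of data independence can be checked with a small sketch, assuming Python's `sqlite3` module. The index name `idx_student_ssn`, the added column `email`, and the sample rows are hypothetical. The same query is run before and after a physical change (adding an index) and a logical change (adding a column), and its answer stays the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (ssn TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?)",
    [("123", "Hermione Grainger"), ("345", "Ron Weasley")],
)

QUERY = "SELECT name FROM student WHERE ssn = '345'"
before = conn.execute(QUERY).fetchall()

# Physical change: add an index (hypothetical name idx_student_ssn).
conn.execute("CREATE INDEX idx_student_ssn ON student (ssn)")
after_index = conn.execute(QUERY).fetchall()

# Logical change: add a column (hypothetical name email).
conn.execute("ALTER TABLE student ADD COLUMN email TEXT")
after_column = conn.execute(QUERY).fetchall()

print(before == after_index == after_column)
```

The application's query text never had to change, which is the practical payoff of the 3-level architecture.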

Database users
§ ‘naive’ users
§ casual users
§ application programmers
§ [DBA (Database administrator)]

‘Naive’ users
Pictorially:
[Figure: app. (e.g., report generator), DBMS, data and meta-data = catalog]

DB Administrator (DBA)
§ schema definition (‘logical’ level)
§ physical schema (storage structure, access methods)
§ schema modifications
§ granting authorizations
§ integrity constraint specification

Overall system architecture
§ [Users]
§ DBMS
– query processor
– storage manager
– transaction manager
§ [Files]
[Figure: overall system architecture. Users: naive, app. pgmr, casual, DBA. Query proc.: DDL int., DML proc., emb. DML, query eval., app. pgm (o). Storage mgr.: trans. mgr, buff. mgr, file mgr. Files: data, meta-data.]

Overall system architecture
§ query processor
– DML compiler
– embedded DML pre-compiler
– DDL interpreter
– Query evaluation engine

Overall system architecture (cont’d)
§ storage manager
– authorization and integrity manager
– transaction manager
– buffer manager
– file manager

Overall system architecture (cont’d)
§ Files
– data files
– data dictionary = catalog (= meta-data)
– indices
– statistical data

Some examples:
§ DBA doing a DDL (data definition language) operation, e.g., create table student ...
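
A DDL operation changes the catalog (the meta-data), not the data. The sketch below assumes Python's `sqlite3` module and an assumed column list for `student`. In SQLite specifically, the catalog is itself queryable as a table named `sqlite_master`; other systems expose their catalogs differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: defines a table; no student rows are stored yet.
conn.execute("CREATE TABLE student (ssn TEXT, name TEXT)")

# SQLite keeps its catalog in a table called sqlite_master.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master WHERE name = 'student'"
).fetchall()
row_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
print(catalog, row_count)
```

After the statement runs, the catalog has one new entry while the table itself is still empty.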

Some examples:
§ casual user, asking for an update, e.g.: update student set name = ‘smith’ where ssn = ‘345’

Some examples:
§ app. programmer, creating a report, e.g.
main() { .... exec sql “select * from student” ... }
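
The `exec sql` fragment is embedded SQL inside a C-style host program, which a pre-compiler translates into calls to the DBMS. A rough Python analogue, assuming the `sqlite3` module in place of a pre-compiler, is sketched below; the helper `student_report`, the column types, and the sample rows are hypothetical.

```python
import sqlite3


def student_report(conn):
    """Build a plain-text report, one line per student."""
    lines = []
    # The SQL text is handed to the DBMS; the host program only formats rows.
    for ssn, name in conn.execute("SELECT * FROM student ORDER BY ssn"):
        lines.append(f"{ssn}  {name}")
    return "\n".join(lines)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (ssn TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?)",
    [("234", "Harry Potter"), ("111", "Draco Malfoy")],
)
report = student_report(conn)
print(report)
```

A naive user would then just run the finished report program without ever seeing the SQL inside it.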
[Figure: overall system architecture diagram, with pgm (src) added.]

Conclusions
§ (relational) DBMSs: electronic record keepers
§ customize them with create table commands
§ ask SQL queries to retrieve info