Relational Database Systems 1 - TU Braunschweig · 2017-02-06
Wolf-Tilo Balke Jan-Christoph Kalo Institut für Informationssysteme Technische Universität Braunschweig www.ifis.cs.tu-bs.de Relational Database Systems 1

• Attributes can be renamed

– SQL uses the AS keyword for renaming

– New names cannot be used in the WHERE clause

• Example

– Not allowed: SELECT person.person_name AS name
FROM person WHERE name = 'Smith'

– Correct: SELECT person.person_name AS name
FROM person WHERE person_name = 'Smith'

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 2

Correction – Renaming in SQL

• First off

– We will post up-to-date information on our website

• Language

– exam tasks will be in German

– … but you may answer either in English, German, or Denglisch

• Content

– all content from the lecture or exercises may come up in the exams

• except content that was only in detours and not in an exercise

– This of course also includes lectures 10-14…

Exam Facts

• SQL Syntax

– Use the syntax as introduced in the lecture and exercises

• e.g. you are not allowed to use the Postgres inheritance feature

Exam Facts

• Cheat Sheets

– you may bring two hand-written, two-sided DIN A4 pages with notes

• No photocopies, print-outs, etc.

• Date

– the exam will be written on March 17, 2017, from 13:00 until 14:30/15:00

• Duration

– 90 min or 120 min depending on your exam regulations

• Room allocations will be announced on the website

Exam Facts

14.1 Towards NoSQL & NewSQL

14.2 Server Hardware at Google

14.3 Example: CouchDB

14.4 Outlook: Next Semester

Towards NoSQL & NewSQL

• NoSQL and special databases have been popularized by different communities and are driven by different design motivations
• Base motivations
– Extreme requirements
• Extremely high availability, extremely high performance, guaranteed low latency, etc.
– e.g. global web platforms
– Alternative data models
• Less complex data model suffices
• (More complex) non-relational data model necessary
– e.g. multimedia or scientific data
– Alternative database implementation techniques
• Try to maintain most database features but lessen the drawbacks
– e.g. “traditional” database applications, e.g. VoltDB

14.1 Towards NoSQL & NewSQL

• Traditional databases are usually all-purpose systems

– e.g. DB2, Oracle, MySQL, …

– Theoretically, general-purpose DBs provide all features needed to develop any data-driven application

– Powerful query languages

• SQL, can be used to update and query data; even very complex analytical queries possible

– Expressive data model

• Most data modeling needs can be served by the relational model

14.1 Towards NoSQL & NewSQL

– Full transaction support

• Transactions are guaranteed to be “safe”

– i.e. ACID transaction properties

– System durability and security

• Database servers are resilient to failures

– Log files are continuously written

» Transactions running during a failure can be recovered

– Most databases have support for constant backup

» Even severe failures can be recovered from backups

– Most databases support “hot-standby”

» 2nd database system running simultaneously which can take over in case of severe failure of the primary system

• Most databases offer basic access control

– i.e. authentication and authorization

14.1 Towards NoSQL & NewSQL

• In short, databases could be used as storage solutions in all kinds of applications
• Higher scalability can be achieved with distributed databases, having all features known from classical all-purpose databases
– In order to be distributed, additional mechanisms are needed
• partitioning, fragmentation, allocation, distributed transactions, distributed query processor, …

14.1 Towards NoSQL & NewSQL

• However, classical all-purpose databases may lead to problems in extreme conditions
– Problems when being faced with massively high query loads
• i.e. millions of transactions per second
• Load too high for a single machine or even a traditional distributed database
– Limited scaling
– Problems with fully global applications
• Transactions originate from all over the globe
• Latency matters!
– Data should be geographically close to users
• Claims:
– Amazon: increasing the latency by 10% will decrease the sales by 1%
– Google: increasing the latency by 500ms will decrease traffic by 20%

14.1 Towards NoSQL & NewSQL

– Problems with extremely high availability constraints
• Traditionally, databases can be recovered using logs or backups
• Hot standbys may help during repair time
• But for some applications, this is not enough: Extreme Availability (Amazon)
– “… must be available even if disks are failing, network routes are flapping, and several data centers are destroyed by massive tornados”
– Additional availability and durability concepts needed!

14.1 Towards NoSQL & NewSQL

• Problems with emerging applications requiring new data models
– Traditional databases rely on the relational model which is not optimal for many new applications
• e.g. scientific data management like genome databases, geo-information databases, etc.
• e.g. for handling data streams and massive volumes of sensor data
• e.g. for handling knowledge networks and reasoning

14.1 Towards NoSQL & NewSQL

• In extreme cases, specialized database-like systems may be beneficial
– Specialize on certain query types
– Focus on a certain characteristic
• e.g. availability, scalability, expressiveness, etc.
– Allow weaknesses and limited features for other characteristics

14.1 Towards NoSQL & NewSQL

• In recent years, discussing “NoSQL” databases has become very popular
– Careful: big misnomer!
• Does not necessarily mean that no SQL is used
– There are SQL-supporting NoSQL systems…
• NoSQL often refers to “non-standard” architectures for database or database-like systems
– i.e. systems not implemented as shown in RDB2
– Sometimes, the label NewSQL is also used
• Not formally defined, more used as a “hype” word
– Popular base dogma: Keep It Stupid Simple!

14.1 Towards NoSQL & NewSQL

• The NoSQL movement popularized the development of special-purpose databases
– In contrast to general-purpose systems like Postgres
• NoSQL usually means one or more of the following
– Being massively scalable
• Usually, the goal is unlimited linear scalability
– Being massively distributed
– Being extremely available
– Showing extremely high OLTP performance
• Usually, not suited for OLAP queries
– Not being “all-purpose”
• Application-specific storage solutions showing some database characteristics

14.1 Towards NoSQL & NewSQL

– Not using the relational model
• Usually, much simpler data models are used
• Sometimes, much more complex data models are used (XML, logic-based, objects, etc.)
– Not using strict ACID transactions
• No transactions at all or weaker transaction models
– Not using SQL
• But using simpler query paradigms
– Especially, not supporting “typical” query interfaces
• e.g. JDBC
• Offering direct access from application to storage system
– System is cloud-based, i.e. not installed on a local server
• System managed by a 3rd party

14.1 Towards NoSQL & NewSQL

• In short:
– Many NoSQL & NewSQL systems focus on building specialized high-performance data storage systems!

14.1 Towards NoSQL & NewSQL

• NoSQL and special databases have been popularized by different communities and are driven by different design motivations
– Extreme requirements
• Extremely high availability, extremely high performance, guaranteed low latency, etc.
– e.g. global web platforms
– Alternative data models
• Less complex data model suffices
– See https://highlyscalable.wordpress.com/2012/03/01/nosql-data-modeling-techniques/
• (More complex) non-relational data model necessary
– e.g. multimedia or scientific data
– Alternative database implementation techniques
• Try to maintain most database features but lessen the drawbacks
– e.g. “traditional” database applications, e.g. VoltDB

14.1 Towards NoSQL & NewSQL

Classification | System
Key-Value Cache | Coherence, eXtreme Scale, GigaSpaces, Hazelcast, Infinispan, JBossCache, Memcached, Repcached, Terracotta, Velocity
Key-Value Store | Flare, Keyspace, RAMCloud, SchemaFree
Key-Value Store - Eventually consistent | DovetailDB, Dynamo, Dynomite, MotionDb, Voldemort, SubRecord
Key-Value Store - Ordered | Actord, Lightcloud, Luxio, MemcacheDB, NMDB, Scalaris, TokyoTyrant
Tuple Store | Apache River, Coord, GigaSpaces
Object Database | DB4O, Perst, Shoal, ZopeDB
Document Store | Clusterpoint, CouchDB, MarkLogic, MongoDB, Riak, XML-databases
Wide Columnar Store | BigTable, Cassandra, HBase, Hypertable, KAI, KDI, OpenNeptune, Qbase
Array Databases | SciDB, PostGIS, Oracle GeoRaster, Rasdaman
Stream Databases | StreamSQL, STREAM, AURORA
Analytical Column Stores | Vertica, SybaseIQ
High Throughput OLTP | VoltDB, Hana

14.1 Towards NoSQL & NewSQL

• Hardware costs of a DDBMS
– Usually run by big companies with dedicated data centers
– DDBMS usually resides on extremely expensive blade servers
• DELL PowerEdge M910 (Oct 2011)
– 4x XEON E7-8837, 2.67 GHz, 8 cores each
– 384 GB RAM
– 3.0 TB RAID HD
– 38,000 €
• Building a data center with such blades is very expensive… (1 rack, 32 blades)
– ~1.2 million € for 512 cores, 12 TB RAM, 96 TB HD
– Additional costs for support, housing, etc.
– Analogy: data lives in high-class condos

14.2 Distributed Data

• Hardware costs of a Distributed Data System

– Software usually resides on very cheap low-end hardware

• DELL Vostro D 460 (Oct 2011)
– Intel Core i7-2600, 3.4 GHz, 8 cores
– 16 GB RAM
– 2 TB HD
– 1,000 €
• Performance comes cheap (1,200 machines)
– ~1.2 million € for 9,600 cores, 19.2 TB RAM, 2.4 PB HD
– Blade: ~1.2 million € for 512 cores, 12 TB RAM, 96 TB HD

– Analogy: data lives in the slums
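The totals on the two hardware slides can be recomputed from the per-machine figures. The following is only a sketch: the helper name `clusterTotals` and the object field names are made up for illustration, and terabytes are counted decimally. One thing it makes visible is that the listed 4 × 8 cores per blade multiply out to 1,024 cores for 32 blades, whereas the slide states 512; the slide may assume a different configuration.

```javascript
// Per-machine figures are copied from the slides; names are illustrative only.
function clusterTotals(machine, count) {
  return {
    cores: machine.cores * count,
    ramTB: (machine.ramGB * count) / 1000, // decimal terabytes
    diskTB: machine.diskTB * count,
    priceEUR: machine.priceEUR * count,
  };
}

// Blade: 4 CPUs with 8 cores each, 384 GB RAM, 3 TB disk, 38,000 EUR
const blade = { cores: 4 * 8, ramGB: 384, diskTB: 3, priceEUR: 38000 };
// Desktop: 8 cores as stated on the slide, 16 GB RAM, 2 TB disk, 1,000 EUR
const desktop = { cores: 8, ramGB: 16, diskTB: 2, priceEUR: 1000 };

const bladeRack = clusterTotals(blade, 32);
const commodity = clusterTotals(desktop, 1200);

console.log("blade rack:", bladeRack);
console.log("commodity cluster:", commodity);
```

For roughly the same budget, the commodity cluster ends up with far more cores and disk space, which is the point of the comparison.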

14.2 Distributed Data

• … or how to build one of the most powerful data centers out of crappy hardware
– Google has jealously guarded the design of its data centers for a long time
• In 2007 & 2009 some details have been revealed
• The Google Servers
– Google only uses custom-built servers
– Google is the world's 4th largest server producer
• They don’t even sell servers…
• In 2007, it was estimated that Google operates over 1,000,000 servers across 34 major and many more minor data centers

14.2 Google Servers

– Data centers are connected to each other and major internet hubs via massive fiber lines (2010)
• ~7% of all internet traffic is generated by Google
• ~60% of that traffic connects directly to consumer networks without connecting to the global backbone
– If Google were an ISP, it would be the 3rd largest global carrier

14.2 Google Servers

• Some Google Datacenter facts & rumors

– In 2007, four new data centers were constructed for 600 million dollars
– Annual operation costs in 2007 are reported to be 2.4 billion dollars
– An average data center uses 50 megawatts of electricity
• The largest center in Oregon has an estimated use of over 110 megawatts
• The whole region of Braunschweig is estimated to use roughly 225 megawatts

14.2 Google Servers

• Each server rack holds 40 to 80 commodity-class x86 PC servers with custom Linux (2010)
– Servers run outdated hardware
– Each system has its own 12V battery to counter unstable power supplies
– No cases used, racks are set up in standard shipping containers and are just wired together
• More info: http://www.youtube.com/watch?v=Ho1GEyftpmQ

14.2 Google Servers

• Google servers are very unstable

– … but also very cheap

– High “bang-for-buck” ratio

• Typical first year for a new cluster (several racks):

– ~0.5 overheating (power down most machines in <5 mins, ~1-2 days to recover)

– ~1 PDU (power distribution unit) failure (~500-1000 machines suddenly disappear, ~6 hours to come back)

– ~1 rack-move (plenty of warning, ~500-1000 machines powered down, ~6 hours)

– ~1 network rewiring (rolling ~5% of machines down over 2-day span)

14.2 Google Servers

– ~20 rack failures (40-80 machines instantly disappear, 1-6 hours to get back)

– ~5 racks go wonky (40-80 machines see 50% packet loss)

– ~8 network maintenances (might cause ~30-minute random connectivity losses)

– ~12 router reloads (takes out DNS and external VIPs for a couple minutes)

– ~3 router failures (traffic immediately pulled for an hour)

– ~dozens of minor 30-second DNS blips

– ~1000 individual machine failures

– ~thousands of hard drive failures

– Countless slow disks, bad memory, misconfigured machines, flaky machines, etc.

14.2 Google Servers

• Challenges to the data center software

– Deal with all these hardware failures while avoiding any data loss and keeping ~100% global uptime
– Decrease maintenance costs to a minimum
– Allow flexible extension of data centers
– Solution:
• Build a system with heavy redundancies
• Google: GFS (Google File System) and Google BigTable data system
– Now replaced by Spanner

14.2 Google Servers

• Apache CouchDB

– Couch==cluster of unreliable commodity hardware

– Aimed at serving webpages and web apps

– Core Features

• Distributed Architecture with high degree of replication

– Can run on hundreds of nodes if required

– Focus on availability of data!

– Replicas are NOT always consistent, but eventually consistent

» Some nodes can even be offline!

» A CouchDB cluster can split into partitions; the system repairs this once the partitions reconnect

» Replicas will be synced bi-directionally when opportune

14.3 Example: CouchDB

• No support for transactions
– ... but at least supports some consistency for replicas: eventual consistency
» See the CAP theorem if you are interested in this…
» In short: in a system with replicas, you would like to have availability, consistency, and partition tolerance
• CAP theorem: pick only two
• Uses a document data model
– Stores and retrieves documents given by JSON files
• Has a strong emphasis on open Web APIs
– No client APIs necessary
– No drivers necessary
– All documents have a unique URI, exposed via HTTP REST calls
• Strong support for views
– Views are defined via JavaScript

14.3 Example: CouchDB

• Data Model:

– JSON Documents

• Initially a format designed to serialize JavaScript objects

• Primary use: data exchange in a Web environment

– E.g., AJAX applications

• Extended use: data serialization and storage

• Could be seen as lightweight XML

– pretty easy to integrate into any programming language, with minimal parsing effort

• However: No query language, no schema

• Basic idea: Structured key-value pairs

14.3 Example: CouchDB

• Example: Simple Movie DB
• Simple data items are key-value pairs supporting typical Web data types

"title" : "Terminator 2"
"year" : 1991

14.3 Example: CouchDB

• An object is a key-value pair which has a set of unordered key-value pairs as value
– Sub-item keys must be unique
– Objects can be used as values of a key-value pair

"director": {
  "first_name" : "James",
  "last_name" : "Cameron"
}

"terminator2": {
  "title" : "Terminator 2",
  "year" : 1991,
  "director" : {
    "first_name" : "James",
    "last_name" : "Cameron"
  }
}

14.3 Example: CouchDB

• Also, arrays can be used

Example: CouchDB

"terminator2": {
  "title" : "Terminator 2",
  "year" : 1991,
  "director" : {
    "first_name" : "James",
    "last_name" : "Cameron"
  },
  "actors" : [
    { "first_name" : "Arnold", "last_name" : "Schwarzenegger" },
    { "first_name" : "Linda", "last_name" : "Hamilton" },
    { "first_name" : "Edward", "last_name" : "Furlong" }
  ]
}
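A short sketch of how such a document behaves once parsed may help here. It assumes a plain JavaScript runtime such as Node.js and uses only the standard `JSON.parse`; the variable names are illustrative. It also shows that strict JSON does not accept a trailing comma after the last array element.

```javascript
// The movie document from the slide, written as strict JSON text.
const text = `{
  "title": "Terminator 2",
  "year": 1991,
  "director": { "first_name": "James", "last_name": "Cameron" },
  "actors": [
    { "first_name": "Arnold", "last_name": "Schwarzenegger" },
    { "first_name": "Linda", "last_name": "Hamilton" },
    { "first_name": "Edward", "last_name": "Furlong" }
  ]
}`;

const movie = JSON.parse(text);

// Nested objects and arrays become ordinary JavaScript values.
const actorNames = movie.actors.map((a) => `${a.first_name} ${a.last_name}`);
console.log(movie.director.last_name);
console.log(actorNames.join(", "));

// A trailing comma after the last array element is not valid JSON.
let trailingCommaRejected = false;
try {
  JSON.parse("[1, 2, ]");
} catch (err) {
  trailingCommaRejected = err instanceof SyntaxError;
}
console.log("trailing comma rejected:", trailingCommaRejected);
```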


• Documents are complex and autonomous pieces of information

– Each document has a unique URI

– Can be retrieved, stored, modified, and deleted

• REST Calls: GET, PUT, POST, DELETE

– There are no references between documents

– Also, documents can be versioned, replicated, synchronized, and restructured

• Each document is identified by an id and a revision number

• Each update creates a new revision
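The id-plus-revision scheme can be illustrated with a small in-memory model. This is a simplified sketch, not CouchDB's API: `TinyDocStore` is a hypothetical class, and revisions are plain counters here, whereas real revision identifiers look different. The behavior it mimics is that an update must name the current revision; CouchDB answers an update with an outdated revision with a conflict error.

```javascript
// Hypothetical in-memory stand-in for a document store (illustration only).
class TinyDocStore {
  constructor() {
    this.docs = new Map(); // id -> { rev, body }
  }

  // Store a document. For an update, rev must equal the stored revision.
  put(id, body, rev) {
    const current = this.docs.get(id);
    const currentRev = current ? current.rev : undefined;
    if (rev !== currentRev) {
      return { ok: false, error: "conflict" };
    }
    const nextRev = (currentRev || 0) + 1; // each update creates a new revision
    this.docs.set(id, { rev: nextRev, body });
    return { ok: true, id, rev: nextRev };
  }

  get(id) {
    const current = this.docs.get(id);
    return current ? { _id: id, _rev: current.rev, ...current.body } : null;
  }
}

const store = new TinyDocStore();
const created = store.put("Terminator2", { title: "Terminator 2", year: 1991 });
const updated = store.put(
  "Terminator2",
  { title: "Terminator 2", year: 1991, genre: "Action" },
  created.rev
);
// A second writer still holding the first revision is rejected.
const stale = store.put("Terminator2", { title: "Terminator II" }, created.rev);

console.log(created.rev, updated.rev, stale.error);
```

Rejecting stale writes instead of locking documents is what lets replicas accept updates independently and reconcile later.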

Example: CouchDB

• Quick introduction

– You can use cURL for quick interaction

• Command-line tool for transferring data to and from URLs

• Speaks HTTP, so it can issue the REST calls (GET, PUT, POST, DELETE) and send JSON bodies

– Assume we installed CouchDB locally

Example: CouchDB

• Futon Admin Interface: http://127.0.0.1:5984/_utils/

• Already created movies DB

Example: CouchDB

• Add some data:
– Each document needs an ID, think of one!

curl -X PUT -d '{ "title" : "Terminator 2", "year" : 1991,
  "director" : { "first_name" : "James", "last_name" : "Cameron" },
  "actors" : [
    { "first_name" : "Arnold", "last_name" : "Schwarzenegger" },
    { "first_name" : "Linda", "last_name" : "Hamilton" },
    { "first_name" : "Edward", "last_name" : "Furlong" }
  ]
}' http://127.0.0.1:5984/movies/Terminator2

– Or just use files:

curl -X PUT -d @Terminator2.json http://127.0.0.1:5984/movies/Terminator2

Example: CouchDB
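The same PUT can be prepared from JavaScript. The sketch below only assembles the request and does not contact a server; `buildPutRequest` is a hypothetical helper, and the base URL is the local installation assumed on the slides. Sending it would need a running CouchDB and, depending on the version and configuration, credentials.

```javascript
// Hypothetical helper: builds URL and options for storing one document.
function buildPutRequest(baseUrl, db, id, doc) {
  const url = `${baseUrl}/${encodeURIComponent(db)}/${encodeURIComponent(id)}`;
  return {
    url,
    options: {
      method: "PUT",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(doc),
    },
  };
}

const request = buildPutRequest("http://127.0.0.1:5984", "movies", "Terminator2", {
  title: "Terminator 2",
  year: 1991,
  director: { first_name: "James", last_name: "Cameron" },
});

console.log(request.options.method, request.url);
// To send it: fetch(request.url, request.options)
```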


• This all looks quite easy and nice…

• Let’s query for something by using…no SQL???

– CouchDB only supports views, no queries!

– Views are defined using JavaScript MapReduce functions

• Map functions are run on each document and emit new temporary documents that become part of the view

– Again: A document has a key, and some value…

– View is ordered by key

• Views can then be queried, optionally through a reduce function

– Reduce functions summarize emitted map result grouped by key

• The MapReduce paradigm allows for an easy distribution of queries in a multi-node environment!

Example: CouchDB

• Example: Return an ordered list of all movies from 1991 or older
– i.e., SELECT title FROM movies WHERE year <= 1991
• …but we don’t have SQL…
– CouchDB:
• Create a new view with years as keys and titles as values
• Select from this view all pairs with keys <= 1991
– Views are collected in design documents
• Each design document can have multiple views

Map:

function(doc) {
  if (doc.title && doc.year) {
    emit(doc.year, doc.title);
  }
}

Example: CouchDB

• Example: Return an ordered list of all movies from 1991 or older
– If there is a title and a year, create a new document with key = year and value = title
– No reduce necessary right now
– The resulting view consists of key-value pairs
– We call this view “year-title”

Example: CouchDB
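How such a view comes about can be imitated in plain JavaScript. This is a sketch under assumptions: `buildView` is a made-up helper, `emit` is passed in as a parameter (in CouchDB it is available globally inside the map function), keys are sorted numerically rather than with CouchDB's collation rules, and the third document without a title is invented to show the filter.

```javascript
const docs = [
  { _id: "terminator2", title: "Terminator 2", year: 1991 },
  { _id: "conan", title: "Conan the Barbarian", year: 1982 },
  { _id: "untitled", year: 2000 }, // hypothetical document without a title
];

// Same logic as the map function on the slide, with emit as a parameter.
function mapYearTitle(doc, emit) {
  if (doc.title && doc.year) {
    emit(doc.year, doc.title);
  }
}

// Run the map function on every document and order the rows by key.
function buildView(allDocs, mapFn) {
  const rows = [];
  for (const doc of allDocs) {
    mapFn(doc, (key, value) => rows.push({ id: doc._id, key, value }));
  }
  return rows.sort((a, b) => a.key - b.key);
}

const yearTitle = buildView(docs, mapYearTitle);
console.log(yearTitle);
```

The document without a title emits nothing, so it simply does not appear in the view.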


• Query via REST HTTP
– http://127.0.0.1:5984/movies/_design/rdb1_14/_view/year-title?endkey=1991
• movies: DB name
• rdb1_14: design document name
• year-title: view name
• endkey=1991: all keys up to 1991

Example: CouchDB
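The parts of the query URL can be assembled programmatically. The helper names `viewUrl` and `selectUpTo` below are hypothetical; the sketch assumes that view parameters such as `endkey` are passed JSON-encoded, which for a number is just its digits. The second function imitates locally what `endkey` selects from an ordered view.

```javascript
// Hypothetical helper: builds the URL of a view query.
function viewUrl(baseUrl, db, designDoc, view, params) {
  const query = new URLSearchParams();
  for (const [name, value] of Object.entries(params)) {
    query.set(name, JSON.stringify(value)); // JSON-encode each parameter
  }
  return `${baseUrl}/${db}/_design/${designDoc}/_view/${view}?${query}`;
}

// Local imitation of endkey: keep all rows with keys up to the given one.
function selectUpTo(rows, endkey) {
  return rows.filter((row) => row.key <= endkey);
}

const url = viewUrl("http://127.0.0.1:5984", "movies", "rdb1_14", "year-title", {
  endkey: 1991,
});

const rows = [
  { key: 1982, value: "Conan the Barbarian" },
  { key: 1991, value: "Terminator 2" },
  { key: 2000, value: "A later movie" }, // invented row for illustration
];
const selected = selectUpTo(rows, 1991);

console.log(url);
console.log(selected.map((row) => row.value));
```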


• Example: Create a list of years and the number of movies released in each year
• (skip years without movies released, and consider only years 1991 and older)
– e.g. SELECT year, count(*) FROM movies WHERE year <= 1991 GROUP BY year
• In CouchDB, we can use the same map as for the previous query

– However, we need a reducer


Example: CouchDB


• Reducers are run on all mapped data

– Mapped values are grouped by key, and a reducer is called for each key with a set of all respective values

– Reducers can also be run on their own output

• Called a re-reduce, which can be done multiple times


Example: CouchDB

Reduce:

function(keys, values, rereduce) {
  // On a re-reduce, values are partial counts, so add them up
  if (rereduce) return sum(values);
  return values.length;
}


Example: CouchDB

"terminator2": { "title": "Terminator 2 - Judgement Day", "year": 1991, "genre": "Action" }

"robinHood": { "title": "Robin Hood - Prince of Thieves", "year": 1991, "genre": ["Action", "Romance"] }

"conan": { "title": "Conan the Barbarian", "year": 1982, "genre": "Action" }

Map:

function(doc) {
  if (doc.title && doc.year) {
    emit(doc.year, doc.title);
  }
}

{1991: "Terminator 2 - Judgement Day"}

{1991: "Robin Hood - Prince of Thieves"}

{1982: "Conan the Barbarian"}
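To see how these rows come about, the map step can be simulated outside CouchDB. This is a rough sketch under stated assumptions: `buildView` is a hypothetical stand-in for the view engine, `emit` is passed in as a callback instead of being a global, and ties between equal keys are broken by document id with a plain string comparison (CouchDB uses its own collation rules).

```javascript
// Sample documents from the slide, keyed by document id.
const docs = {
  terminator2: { title: "Terminator 2 - Judgement Day", year: 1991, genre: "Action" },
  robinHood: { title: "Robin Hood - Prince of Thieves", year: 1991, genre: ["Action", "Romance"] },
  conan: { title: "Conan the Barbarian", year: 1982, genre: "Action" },
};

// The map function from the slide; emit is a parameter so it runs outside CouchDB.
function mapFn(doc, emit) {
  if (doc.title && doc.year) {
    emit(doc.year, doc.title);
  }
}

// Hypothetical stand-in for the view engine: run the map on every document,
// then sort the emitted rows by key (ties broken by document id).
function buildView(documents, map) {
  const rows = [];
  for (const [id, doc] of Object.entries(documents)) {
    map(doc, (key, value) => rows.push({ id, key, value }));
  }
  return rows.sort(
    (a, b) => a.key - b.key || (a.id < b.id ? -1 : a.id > b.id ? 1 : 0)
  );
}

const rows = buildView(docs, mapFn);
// Equivalent of ?endkey=1991: keep rows whose key is at most 1991.
const upTo1991 = rows.filter((row) => row.key <= 1991);
console.log(upTo1991.map((row) => `${row.key}: ${row.value}`));
```

Because the rows are kept sorted by key, a range restriction such as `endkey` only needs to read a contiguous part of the view.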


Example: CouchDB

{1991: "Terminator 2 - Judgement Day"}

{1991: "Robin Hood - Prince of Thieves"}

{1982: "Conan the Barbarian"}

Reduce:

function(keys, values, rereduce) {
  // On a re-reduce, values are partial counts, so add them up
  if (rereduce) return sum(values);
  return values.length;
}

{1982: 1}

{1991: 2}
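The grouping and counting can be replayed in plain JavaScript. The sketch below is illustrative only: `groupAndReduce` is a hypothetical stand-in for what the view engine does, the `keys` argument is passed as `null` because the counting reducer does not use it, and the reducer in this sketch handles the re-reduce case by summing partial counts.

```javascript
// Rows as produced by the map step on the slide.
const mapped = [
  { key: 1991, value: "Terminator 2 - Judgement Day" },
  { key: 1991, value: "Robin Hood - Prince of Thieves" },
  { key: 1982, value: "Conan the Barbarian" },
];

// Counting reducer. On a re-reduce the values are partial counts, so they are summed.
function reduceFn(keys, values, rereduce) {
  if (rereduce) {
    return values.reduce((a, b) => a + b, 0);
  }
  return values.length;
}

// Hypothetical stand-in for the view engine: group values by key,
// then call the reducer once per key.
function groupAndReduce(rows, reduce) {
  const groups = new Map();
  for (const { key, value } of rows) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  const result = {};
  for (const [key, values] of groups) {
    result[key] = reduce(null, values, false);
  }
  return result;
}

const perYear = groupAndReduce(mapped, reduceFn);
// Combining the per-year counts is a re-reduce: its inputs are reducer outputs.
const total = reduceFn(null, Object.values(perYear), true);
console.log(perYear, total);
```

A reducer that returned `values.length` in both cases would report the number of partial results on a re-reduce instead of the number of movies, which is why the two cases are separated here.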


• Query via REST HTTP

– http://127.0.0.1:5984/movies/_design/rdb1_14/_view/sums?endkey=1991&group_level=1


Example: CouchDB

Run reducer on level 0 and level 1


• So, how about transactions?

– Not supported per se!

– But there are “easy” workarounds – just keep track of transaction consistency manually

– Example: inventory management

• You are selling hammers and screwdrivers, and don’t want to sell more than you have in stock

• What happens if we sell a hammer?

• In JDBC/SQL, this would be simple…

– Have constraint that inventory number can never be negative

– Start JDBC transaction in your application

– Load current inventory number for hammers

– If there are still hammers, reduce inventory by one

– Commit transaction – if this works out, tell customer that everything is fine

» If not, somebody else snatched the last hammer quicker


Example: CouchDB


• “Solution” A: Work with revision numbers

• Have an inventory document

• Load document with hammer inventory number, store revision

• Sell hammer

• Update hammer inventory document with new number if and only if the document still has the same revision

– If not, retrieve the new document and try to update that one…

– If you find out that there are no hammers anymore, reimburse customer and apologize

– This process catches many potential consistency problems, but gives NO guarantees at all!

• This is horrible in a high concurrency environment!

• You could have purchases which get pushed back all the time…

• You could still sell more hammers than you have…


Example: CouchDB

inventory : {
  "_rev" : "471c37eb3116179b9f269427372a86db",
  "hammers" : 15,
  "screwdrivers" : 9
}
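The load, check, and conditional update cycle can be sketched with a small in-memory stand-in. Everything here is an assumption for illustration: `RevStore` and `sellHammer` are hypothetical names, integer revisions replace CouchDB's revision strings for brevity, and a rejected `put` plays the role of the HTTP 409 conflict that CouchDB returns for an outdated revision.

```javascript
// Hypothetical in-memory store that mimics revision checks:
// an update is accepted only if it carries the current revision.
class RevStore {
  constructor(doc) {
    this.doc = { ...doc, _rev: 1 };
  }
  get() {
    return { ...this.doc }; // hand out a copy, like loading the document
  }
  put(updated) {
    if (updated._rev !== this.doc._rev) {
      return false; // conflict: somebody else updated the document first
    }
    this.doc = { ...updated, _rev: this.doc._rev + 1 };
    return true;
  }
}

// Sell one hammer: load, check stock, write back, retry on conflict.
function sellHammer(store, maxRetries = 5) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const doc = store.get();
    if (doc.hammers <= 0) return "sold out";
    doc.hammers -= 1;
    if (store.put(doc)) return "sold";
  }
  return "gave up"; // pushed back too often under high concurrency
}

const store = new RevStore({ hammers: 1, screwdrivers: 9 });
const stale = store.get(); // a second client loads the same revision
const first = sellHammer(store); // the first client sells the last hammer
stale.hammers -= 1;
const staleAccepted = store.put(stale); // the second client's write is rejected
const second = sellHammer(store);
console.log(first, staleAccepted, second);
```

The sketch runs in a single process, so it only shows the conflict check itself. It says nothing about replicated copies of the inventory document, where two replicas can each accept a write for the same revision.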


• “Solution” B: Build fake “locks” for each item

• For each hammer and screwdriver, keep a separate inventory document

• If you want to know how many hammers you have, create a view and count all hammer documents

• If you sell a hammer, randomly load one hammer document and try to delete it

• If this works, all might be well…

– This process still has problems…

• e.g., inventory documents are replicated, so how do you deal with that?

– Visit our lectures RDB2 and DDM to learn how to program something that will really work…

– … in which case you just build a distributed database transaction manager yourself! Congrats, wheel re-invented!


Example: CouchDB

Hammer_1 : { "_rev" : "471c37eb3116179b9f269427372a86db" }

Hammer_2 : { "_rev" : "5ff77937ea707d35cc907b466f726cc8" }

Hammer_3 : { "_rev" : "3dd521c277ab448b91ce2e8bb57bbb4f" }

Screwdriver_1 : { "_rev" : "a1a70294da183c8b0fb525ec285971c9" }

Screwdriver_2 : { "_rev" : "09bdb275b75fea85369c86f7ba5f3467" }
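The per-item scheme can be sketched in the same spirit. The names `countItems` and `sellItem` are hypothetical, the item type is derived from the document id prefix (an assumption, since the slide's documents carry no type field), and a `Set` of ids stands in for the database, with a successful delete acting as the fake lock.

```javascript
// Hypothetical in-memory stand-in: one entry per item document, identified by id.
const itemDocs = new Set([
  "Hammer_1",
  "Hammer_2",
  "Hammer_3",
  "Screwdriver_1",
  "Screwdriver_2",
]);

// Counting hammers corresponds to a view that counts all hammer documents.
function countItems(docs, prefix) {
  return [...docs].filter((id) => id.startsWith(prefix)).length;
}

// Selling an item: pick one matching document at random and try to delete it.
// Set.delete returns true only if the id was still present, so only one
// caller can "win" a given item document.
function sellItem(docs, prefix) {
  const candidates = [...docs].filter((id) => id.startsWith(prefix));
  if (candidates.length === 0) return null; // nothing left to sell
  const pick = candidates[Math.floor(Math.random() * candidates.length)];
  return docs.delete(pick) ? pick : null;
}

const before = countItems(itemDocs, "Hammer_");
const sold = sellItem(itemDocs, "Hammer_");
const after = countItems(itemDocs, "Hammer_");
console.log(before, sold, after);
```

As with the slide's scheme, this only works while there is a single copy of each document; with replicated documents, two replicas could each accept the delete of the same hammer.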


• Closing words…

– Yes, NoSQL is cool and can do cool things!

• Usually, it’s easy, fast, and scalable!

– No, NoSQL does NOT universally invalidate Relational Databases

– New Challenge for YOU:

• Choose the right tool for the right task!

• What does your application really require?

• What will it require in the future?

• Which technologies fulfill these requirements best?


NoSQL


• Lectures

– Relational Database Systems II

– Information Retrieval and Web Search Engines

– Software Entwicklungs Praktikum


14 Next Semester


• Featuring

– the architecture of a DBMS

– storing data on hard disks

– indexing

– query evaluation and optimization

– transactions and ACID

– recovery


Relational Databases 2


• Data structures for indexes!


Relational Databases 2


• Query optimization!


Relational Databases 2


• Implementing transactions!


Relational Databases 2

[Figure: Transaction Manager, Scheduler, Storage Manager]


• Extremely relevant for practical applications is the retrieval of textual documents

– Document retrieval models

– Indexing

– Language models

– Clustering

– Classification

– Web crawling

– Link analysis

– Spam detection

– Question answering


14.3 IR & Web Search


• Document Retrieval


14.3 IR & Web Search

[Figure: Document1, Document2, and Document3 with entries for the terms “step” and “China”]


• Document Clustering/Classification


14.3 IR & Web Search

[Figure: data points labeled + and −]


• Web Search


14.3 IR & Web Search

[Figure: The Web, Web crawler, Retrieval algorithms, User interface, Users]


14.3 SEP

• TRUMPS UP – Discovering Fake News

– Learn Machine Learning, Natural Language Processing, Information Retrieval


14 That’s all folks…

