340151 Big Data & Cloud Computing (P. Baumann)
Storing Data:
Disks and Files
Garcia Molina, Ullman, Widom
Ramakrishnan/Gehrke Ch. 9
"Digital information lasts forever - or five years, whichever comes first."
-- Jeff Rothenberg, RAND Corp., 1997
Why Not Everything in Main Memory?
Costs too much
• [Rama/Gehrke] $1000 will buy you either 128MB of RAM or 7.5GB of disk
• Today: 80 EUR will buy you either 4 GB of RAM or 1 TB of disk
• …but today we have multi-Terabyte databases!
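The price gap quoted above can be expressed as cost per gigabyte. The following is a minimal sketch that uses only the figures from this slide; the helper name `cost_per_gb` and the decimal unit convention (1 GB = 1000 MB, 1 TB = 1000 GB) are assumptions made for illustration.

```python
def cost_per_gb(price, capacity_gb):
    """Price per gigabyte for a given purchase."""
    return price / capacity_gb

# [Rama/Gehrke] figures: $1000 buys 128 MB of RAM or 7.5 GB of disk
old_ram = cost_per_gb(1000, 0.128)   # assumes 128 MB = 0.128 GB (decimal)
old_disk = cost_per_gb(1000, 7.5)

# "Today" figures from the slide: 80 EUR buys 4 GB of RAM or 1 TB of disk
new_ram = cost_per_gb(80, 4)
new_disk = cost_per_gb(80, 1000)     # assumes 1 TB = 1000 GB (decimal)

old_ratio = old_ram / old_disk
new_ratio = new_ram / new_disk
print(f"RAM vs disk price per GB, older quote: {old_ratio:.1f}x")
print(f"RAM vs disk price per GB, newer quote: {new_ratio:.1f}x")
```

Under these assumptions RAM costs roughly 59 times as much per GB as disk in the older quote and about 250 times as much in the newer one, so the gap has widened rather than closed.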
Main memory is volatile
• want data to be saved between runs (obviously!)
Typical storage hierarchy:
• Main memory (RAM) for currently used data
• Disk for main database (secondary storage)
• Tapes for archiving older versions of data (tertiary storage)
Storage Capacity
Absolute values as of 2003, but ratios still ~ same
Storage Cost
Again, absolute values as of 2003, but ratios still ~ same
Storage Hierarchies
• Primary memory: main memory
• Secondary memory: magnetic disks, RAID systems
• Tertiary memory: magnetic tapes, optical media, magneto-optical media
Moving down the hierarchy: larger storage capacity, cheaper, slower
Numbers
source: http://carlos.bueno.org/2014/11/cache.html
Nearline (Tertiary) Storage
Usually tape
• Reel, today: cartridge
• Capacity: 10 GB ... ~6 TB per tape
Tape robots
• HSM = Hierarchical Storage Management
• multi-Petabytes
Caching & Virtual Memory
Cache: fast memory holding frequently used parts of a slower, larger memory
• small (L1) cache holds a few kilobytes of the memory "most recently used" by the processor
• most operating systems keep the most recently used "pages" of memory in main memory and put the rest on disk
Virtual memory
• programs don't know whether they are accessing a page in main memory or a page on secondary memory (most operating systems)
Database systems usually take explicit control over secondary memory access
Where Databases Reside
Hard disk is the secondary storage device of choice
• Many flavors:
Disk: Floppy (hard, soft); Winchester; RAM disks; Optical, CD-ROM; Arrays
Main advantage over tapes: random access vs. sequential
Data is stored and retrieved in units called disk blocks or pages
Unlike RAM, time to retrieve a disk page varies depending upon its location on disk
• relative placement of pages on disk has major impact on DBMS performance!
The Miracle Called "Hard Disk"
Disk head contains magnet, hovering over spinning platter
flight height: 10-20 nm
(x 5,000 gives one hair!)
Components of a Disk
Platters spin
Arm assembly moves in or out to position head on desired track
Tracks under heads = a cylinder (imaginary!)
Block size = N * sector size (sector size is fixed)
...typical numbers?
Typical Numbers
Diameter: 1 inch ...15 inches
Cylinders: 40 (floppy) ... 20,000
Surfaces: 1 (old CDs) ... 2 (floppies) ... 30
Sector Size: 512 B ... 50 kB
Capacity: 360 kB (old floppy) ... 4 TB
Disk Access Time
"I want block X" -> ? -> "block X in memory"
Disk Access Time
Time = Seek Time +
Rotational Delay +
Transfer Time +
Other
Seek Time
Average Random Seek Time
Typical S: 10 ms ... 40 ms
= roughly 100,000 times a RAM access (~100 ns)!
Average Rotational Delay
R = 1/2 revolution
typical R ≈ 4.17 ms (7,200 RPM: one revolution takes 60 s / 7,200 ≈ 8.33 ms)
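The average rotational delay follows directly from the spindle speed. A minimal sketch; the function name `avg_rotational_delay_ms` is hypothetical, and the RPM values other than 7,200 are assumed examples that do not come from the slides.

```python
def avg_rotational_delay_ms(rpm):
    """Average rotational delay in milliseconds: half of one revolution."""
    ms_per_revolution = 60_000 / rpm  # 60,000 ms in one minute
    return ms_per_revolution / 2

# 7,200 RPM is the slide's example; the other speeds are illustrative assumptions
for rpm in (5_400, 7_200, 10_000, 15_000):
    print(f"{rpm:>6} RPM: {avg_rotational_delay_ms(rpm):.2f} ms")
```

Faster spindles shorten the average wait, but only in proportion to the speed: doubling the RPM halves R.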
Transfer Rate
Transfer rate: t
• typical t: 10 ... 50 MB/second
Transfer time T = block size / t
Ex: block size 32 kB, t = 32 MB/second
transfer time = …?
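The exercise can be checked with a few lines of code. A minimal sketch; the function name `transfer_time_ms` is hypothetical, and decimal units (1 kB = 1000 B, 1 MB = 1000 kB) are an assumption, since the slide does not say which convention it uses.

```python
KB = 1000        # assumed decimal kilobyte
MB = 1000 * KB   # assumed decimal megabyte

def transfer_time_ms(block_size_bytes, rate_bytes_per_s):
    """Transfer time T = block size / t, converted to milliseconds."""
    return block_size_bytes / rate_bytes_per_s * 1000

t_ms = transfer_time_ms(32 * KB, 32 * MB)
print(f"transfer time: {t_ms:.3f} ms")
```

With these units the transfer takes about 1 ms. With binary units (factors of 1024) the same example gives 1/1024 s, or about 0.98 ms, so the convention barely matters here.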
Other Delays
• CPU time to issue I/O
• Contention for controller
• Contention for bus, memory
Typical value: ~ 0 (relative to other values)
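With all four terms described, the access-time formula can be put together in one place. The sketch below is illustrative only: the class `DiskModel` and its field names are hypothetical, and the parameter values (10 ms seek, 7,200 RPM, 32 MB/s in decimal units, 32 kB block) are taken from the typical numbers on the preceding slides rather than from any particular drive.

```python
from dataclasses import dataclass

@dataclass
class DiskModel:
    seek_ms: float          # average random seek time S
    rpm: int                # spindle speed
    rate_mb_per_s: float    # transfer rate t, assumed decimal MB/s
    other_ms: float = 0.0   # CPU, controller, bus contention (~0)

    def rotational_delay_ms(self):
        # Average rotational delay R: half a revolution
        return (60_000 / self.rpm) / 2

    def transfer_ms(self, block_bytes):
        # Transfer time T = block size / t
        return block_bytes / (self.rate_mb_per_s * 1_000_000) * 1000

    def random_access_ms(self, block_bytes):
        # Time = Seek Time + Rotational Delay + Transfer Time + Other
        return (self.seek_ms + self.rotational_delay_ms()
                + self.transfer_ms(block_bytes) + self.other_ms)

disk = DiskModel(seek_ms=10, rpm=7_200, rate_mb_per_s=32)
total = disk.random_access_ms(32_000)
print(f"random access to one 32 kB block: {total:.2f} ms")
```

Under these assumptions seek and rotation account for roughly 14 of the 15 ms, which is why block placement matters far more than raw transfer rate.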
Sequential Read?
So far: Random Block Access
What about: Reading next block?
Disks optimized towards "consecutive" reading!
• Blocks within track
• Tracks within cylinder
• Next cylinder
"Next Block" Costs
"Next" block concept:
• blocks on same track, followed by
• blocks on same cylinder, followed by
• blocks on adjacent cylinder
If we don’t need to change cylinder:
Time to get block = block size / t + negligible
• + switch track (i.e., read with the next head)
• + once in a while, next cylinder
Random vs Sequential Read
Rule of Thumb:
• Random I/O: Expensive
• Sequential I/O: Less expensive
Ex: 1 KB Block:
• Random I/O: ~ 20 ms
• Sequential I/O: ~ 1 ms
relative difference is smaller for larger blocks
Whenever possible, arrange file blocks sequentially on disk (by "next") to minimize seek and rotational delay
• For sequential scan, pre-fetching several pages at a time is a big win! ("burst read")
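The rule of thumb can be illustrated with a simple cost model. This is a sketch under stated assumptions, not a measurement: the constants (10 ms seek, 7,200 RPM, 32 MB/s in decimal units) and the function names are hypothetical, and track and cylinder switches during a sequential scan are ignored. The absolute numbers therefore differ from the ~20 ms / ~1 ms example above; only the trend is of interest.

```python
SEEK_MS = 10.0                # assumed average seek time
ROT_MS = 60_000 / 7_200 / 2   # average rotational delay at 7,200 RPM
RATE_BYTES_PER_MS = 32_000    # assumed 32 MB/s transfer rate (decimal)

def random_read_ms(n_blocks, block_bytes):
    """Random I/O: every block pays seek + rotational delay + transfer."""
    per_block = SEEK_MS + ROT_MS + block_bytes / RATE_BYTES_PER_MS
    return n_blocks * per_block

def sequential_read_ms(n_blocks, block_bytes):
    """Sequential I/O: one initial seek + rotational delay, then transfer only.
    Track and cylinder switches are ignored in this simplification."""
    return SEEK_MS + ROT_MS + n_blocks * block_bytes / RATE_BYTES_PER_MS

for block_bytes in (1_000, 32_000, 1_000_000):
    rnd = random_read_ms(1_000, block_bytes)
    seq = sequential_read_ms(1_000, block_bytes)
    print(f"block {block_bytes:>9} B: random {rnd:10.0f} ms, "
          f"sequential {seq:10.0f} ms, ratio {rnd / seq:6.1f}")
```

In this model the ratio between random and sequential cost shrinks as the block size grows, matching the remark that the relative difference is smaller for larger blocks.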
...Writing?
Cost for writing ≈ cost for reading
... unless we want to verify!
• Then, need to add: block size / t + one (full) rotation
...To Modify a Block?
(a) Read Block
(b) Modify in Memory
(c) Write Block
[ (d) Verify ]
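The cost of modifying a block can be estimated by adding up the steps above. A minimal sketch with hypothetical function names and assumed parameters (10 ms seek, 7,200 RPM, 32 MB/s in decimal units). It treats the in-memory modification as free and, as a pessimistic simplification, charges the write-back as an independent random access; the verify surcharge is one full rotation plus one more transfer.

```python
ROTATION_MS = 60_000 / 7_200   # one full revolution at 7,200 RPM
SEEK_MS = 10.0                 # assumed average seek time
RATE_BYTES_PER_MS = 32_000     # assumed 32 MB/s transfer rate (decimal)

def transfer_ms(block_bytes):
    return block_bytes / RATE_BYTES_PER_MS

def read_ms(block_bytes):
    # Seek + average rotational delay (half a revolution) + transfer
    return SEEK_MS + ROTATION_MS / 2 + transfer_ms(block_bytes)

def write_ms(block_bytes, verify=False):
    cost = read_ms(block_bytes)  # writing costs about the same as reading
    if verify:
        # Wait one full rotation, then read the block back
        cost += ROTATION_MS + transfer_ms(block_bytes)
    return cost

def modify_ms(block_bytes, verify=False):
    # (a) read + (b) modify in memory (taken as 0) + (c) write [+ (d) verify]
    # Assumption: the write pays a fresh seek and rotational delay
    return read_ms(block_bytes) + write_ms(block_bytes, verify)

print(f"modify 32 kB block:        {modify_ms(32_000):.2f} ms")
print(f"modify 32 kB block+verify: {modify_ms(32_000, verify=True):.2f} ms")
```

In practice the head is often still on the right cylinder when the block is written back, so the real cost of step (c) can be lower than this model suggests.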
Wrap-Up
Capacities grow, data hunger grows larger
• Moore's Law vs Greg's Law vs disk growth
Databases are heavily I/O bound
• Disk space management largely determines performance
Disk access time = Seek Time + Rotational Delay + Transfer Time + Other