The Age of Infinite Storage has begun
Many of us have enough money in our pockets right now to buy all the storage we will be able to fill for the next 5 years.
So having the storage capacity is no longer a problem. Managing it is a problem (especially when the volume gets large).
How much data is there?
Terabytes (TBs) are here
1 TB costs 1k$ to buy
1 TB costs ~300k$/year to own
Management and curation are the expensive part
Searching 1 TB takes hours
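A rough way to see why a full scan of 1 TB takes hours is to divide the volume by a sequential read rate. The sketch below assumes a decimal terabyte and two illustrative read rates (50 MB/s and 100 MB/s); those rates are assumptions for the example, not figures from the slides.

```python
def scan_hours(num_bytes, bytes_per_second):
    """Hours needed to read num_bytes sequentially at a constant rate."""
    return num_bytes / bytes_per_second / 3600

ONE_TB = 10**12  # assumption: decimal terabyte

# Assumed sequential read rates in bytes/second (illustrative only).
for rate in (50e6, 100e6):
    print(f"{rate / 1e6:.0f} MB/s -> {scan_hours(ONE_TB, rate):.1f} hours")
```

Under these assumptions a single pass over the data takes roughly 3 to 6 hours, which is why an index or a summary is needed rather than a scan.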
I’m Terrified by TeraBytes
I’m Petrified by PetaBytes
Googi  10^100
. . .
Yotta  10^24
Zetta  10^21
Exa    10^18
Peta   10^15
Tera   10^12
Giga   10^9
Mega   10^6
Kilo   10^3
We are here
I’ll soon be Exafied by ExaBytes
I’m too old to ever be Zettafied by ZettaBytes
But you may be in your lifetime
You may even be Yottafied by YottaBytes
You may never be Googified by GoogiBytes
But the next generation may be?
How much information is there?
Soon everything can be recorded and indexed.
Most of it will never be seen by humans.
Data summarization, trend detection, anomaly detection, and data mining are key technologies
[Figure: storage scale from Kilo to Yotta, marking A Photo, A Book, a Movie, All books (words), All Books MultiMedia, and Everything Recorded]
10^-24 yocto, 10^-21 zepto, 10^-18 atto, 10^-15 femto, 10^-12 pico, 10^-9 nano, 10^-6 micro, 10^-3 milli
First Disk, in 1956: IBM 305 RAMAC
4 MB
50 disks, 24 inches each
1200 rpm (revolutions per minute)
100 milli-seconds (ms) access time
35k$/year to rent
Included computer & accounting software (tubes not transistors)
W. P. 7th Grade
C.S. lab Asst.
10 years later
1.6 meters
30 MB
Memex
As We May Think, Vannevar Bush, 1945
“A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility”
“yet if the user inserted 5000 pages of material a day it would take him hundreds of years to fill the repository, so that he can enter material freely”
Can you fill a terabyte in a year?
Item                                 Items/TB   Items/day
a 300 KB JPEG image                  3 M        9,800
a 1 MB Document                      1 M        2,900
a 1 hour, 256 kb/s MP3 audio file    9 K        26
a 1 hour 1 MPEG video                290        0.8
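The arithmetic behind the table can be sketched as follows. This is a minimal sketch under stated assumptions: a binary terabyte (2^40 bytes), binary KB and MB, 365 days per year, and 256 kb/s read as 256,000 bits per second. The video row is left out because its bit rate is not legible in the slide.

```python
TB = 2**40            # assumption: binary terabyte
KB = 2**10            # assumption: binary kilobyte
MB = 2**20            # assumption: binary megabyte
DAYS_PER_YEAR = 365

def items_per_tb(item_bytes):
    """How many items of the given size fit in one terabyte."""
    return TB / item_bytes

def items_per_day(item_bytes):
    """Items to store each day to fill one terabyte in a year."""
    return items_per_tb(item_bytes) / DAYS_PER_YEAR

# Item sizes in bytes; the MP3 size assumes 256,000 bits per second for one hour.
items = {
    "300 KB JPEG image": 300 * KB,
    "1 MB document": 1 * MB,
    "1 hour, 256 kb/s MP3": 256_000 // 8 * 3600,
}

for name, size in items.items():
    print(f"{name}: {items_per_tb(size):,.0f} per TB, "
          f"{items_per_day(size):,.0f} per day")
```

With these assumptions the per-day figures come out close to the first three rows of the table (about 9,800, 2,900, and 26).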
On a Personal Terabyte, How Will We Find Anything?
Need Queries, Indexing, Data Mining, Scalability, Replication…
If you don’t use a DBMS, you will implement one of your own!
The need for Data Mining and Machine Learning is more important than ever!
Of the digital data in existence today,
80% is personal/individual
20% is Corporate/Governmental
We’re awash with data!
Network data: 10 terabytes by 2004 ~ 10^13 Bytes
US EROS Data Center archives Earth Observing System (near Sioux Falls SD) remotely sensed satellite and aerial imagery data: 15 petabytes by 2007 ~ 10^16 Bytes
National Virtual Observatory (aggregated astronomical data): 10 exabytes by 2010 ~ 10^19 Bytes
Sensor data (including Micro & Nano-sensor networks): 10 zettabytes by 2015 ~ 10^22 Bytes
WWW (and other text collections): 10 yottabytes by 2020 ~ 10^25 Bytes
Genomic/Proteomic/Metabolomic data (microarrays, genechips, genome sequences): 10 gazillabytes by 2030 ~ 10^28 Bytes?
Stock Market prediction data (prices + all the above?): 10 supragazillabytes by 2040 ~ 10^31 Bytes?
Useful information must be teased out of these large volumes of raw data.
And these are only some of the 1/5th of data collections that are "Corporate" or "Governmental". The other 4/5ths of data sets are personal!
I made up these names! Projected data sizes are overrunning our ability to name their orders of magnitude!
Parkinson’s Law (for data): Data expands to fill available storage
Disk-storage version of Moore’s Law
Available storage doubles every 9 months!
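To get a feel for what doubling every 9 months means, the growth factor after a given number of months is 2 raised to (months / 9). The sketch below simply evaluates that formula; the function name and the chosen time spans are illustrative.

```python
def growth_factor(months, doubling_months=9):
    """Multiplicative growth after `months`, given a fixed doubling period."""
    return 2 ** (months / doubling_months)

for years in (1, 3, 5):
    print(f"{years} years -> x{growth_factor(12 * years):.1f}")
```

At this rate storage grows about 2.5 times in one year, 16 times in three years, and roughly 100 times in five years.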
How do we get the information we need from the massive volumes of data we will have?
Querying (for the information we know is there)
Data mining (for the answers to questions we don't know to ask precisely).
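The difference between the two can be illustrated with a toy sketch. The readings below are made-up illustration data, and the z-score test is only a crude stand-in for anomaly detection, not a method taken from the slides.

```python
from statistics import mean, pstdev

# Hypothetical daily storage-usage readings in GB (made-up illustration data).
readings = {"mon": 102, "tue": 98, "wed": 101, "thu": 240, "fri": 99}

def query(data, key):
    """Querying: we know exactly which item we want."""
    return data[key]

def find_anomalies(data, threshold=1.5):
    """Crude mining: flag values more than `threshold` std deviations from the mean."""
    mu = mean(data.values())
    sigma = pstdev(data.values())
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [k for k, v in data.items() if abs(v - mu) / sigma > threshold]

print(query(readings, "tue"))     # a question we knew how to ask
print(find_anomalies(readings))   # a pattern we did not ask for by name
```

The query returns a value we already knew to look up, while the anomaly check surfaces the unusual day without being told which one to inspect.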