Gartner Hype Cycle, July 2013, http://www.gartner.com/newsroom/id/2575515
http://en.wikipedia.org/wiki/File:Transistor_Count_and_Moore%27s_Law_-_2011.svg
http://en.wikipedia.org/wiki/File:Hard_drive_capacity_over_time.svg
http://www.jcmit.com/disk2012.htm
http://www.ibm.com/developerworks/library/os-dataminingrubytwitter/
How much data is generated through social media tools?
• People send more than 144.8 billion email messages a day.
• People and brands on Twitter send more than 340 million tweets a day.
• People on Facebook share more than 684,000 bits of content a day.
• People upload 72 hours (259,200 seconds) of new video to YouTube a minute.
• Consumers spend $272,000 on Web shopping a day.
• Google receives over 2 million search queries a minute.
• Apple receives around 47,000 app downloads a minute.
• Brands receive more than 34,000 Facebook ‘likes’ a minute.
• Tumblr blog owners publish 27,000 new posts a minute.
• Instagram photographers share 3,600 new photos a minute.
• Flickr photographers upload 3,125 new photos a minute.
• People perform over 2,000 Foursquare check-ins a minute.
• Individuals and organizations launch 571 new websites a minute.
• WordPress bloggers publish close to 350 new blog posts a minute.
• The Mobile Web receives 217 new participants a minute.
(The most updated numbers are available from the sites themselves.)
http://marciaconner.com/blog/data-on-big-data/
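The figures quoted above mix per-day and per-minute rates, which makes them hard to compare directly. The following is a minimal sketch of putting a few of the per-minute figures on a per-day basis and checking the YouTube seconds conversion. The function and dictionary names are illustrative assumptions, not part of the original source, and the inputs are only the rounded numbers quoted above.

```python
MINUTES_PER_DAY = 24 * 60      # 1,440 minutes in a day
SECONDS_PER_HOUR = 60 * 60     # 3,600 seconds in an hour

# A few of the per-minute figures quoted above (rounded, as in the source).
PER_MINUTE = {
    "Google search queries": 2_000_000,   # "over 2 million", so a lower bound
    "Apple app downloads": 47_000,
    "Tumblr posts": 27_000,
    "Instagram photos": 3_600,
    "New websites": 571,
}


def per_day(rate_per_minute):
    """Scale a per-minute rate to a per-day rate, assuming a constant rate."""
    return rate_per_minute * MINUTES_PER_DAY


def hours_to_seconds(hours):
    """Convert hours of video to seconds of video."""
    return hours * SECONDS_PER_HOUR


if __name__ == "__main__":
    for name, rate in PER_MINUTE.items():
        print(f"{name:<24} {rate:>10,} / minute  = {per_day(rate):>14,} / day")
    print(f"72 hours of video = {hours_to_seconds(72):,} seconds")
```

Scaling by a constant 1,440 minutes assumes the rate is uniform through the day, which real traffic is not, so the per-day values are only order-of-magnitude estimates.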
Gigabyte: 1,073,741,824 bytes (2^30); approx. 1,000,000,000 (10^9)
• 1 Gigabyte: Paper in the bed of a pickup; symphony in high-fidelity sound; broadcast-quality movie
• 2 Gigabytes: 20 yards of books on a shelf
• 5 Gigabytes: 8mm Exabyte tape
• 10 Gigabytes:
• 20 Gigabytes: Audio collection of the works of Beethoven; five Exabyte tapes; VHS tape used to store digital data
• 50 Gigabytes: Library floor of books on shelves
• 100 Gigabytes: Library floor of academic journals on shelves; large ID-1 digital tape
• 200 Gigabytes: 50 Exabyte tapes
Terabyte: 1,099,511,627,776 bytes (2^40); approx. 1,000,000,000,000 (10^12)
• 1 Terabyte: Automated tape robot; all the X-ray films in a large technological hospital; 50,000 trees made into paper and printed; daily rate of EOS (Earth Orbiting System) data (1998)
• 2 Terabytes: Academic research library
• 10 Terabytes: Printed collection of the U.S. Library of Congress
• 50 Terabytes: Contents of a large mass storage system
Petabyte: 1,125,899,906,842,624 bytes (2^50); approx. 1,000,000,000,000,000 (10^15)
• 1 Petabyte: 3 years of EOS data (2001)
• 2 Petabytes: All U.S. academic research libraries
• 20 Petabytes: 1995 production of hard-disk drives
• 200 Petabytes: All printed material; 1995 production of digital magnetic tape
Exabyte: 1,152,921,504,606,846,976 bytes (2^60); approx. 1,000,000,000,000,000,000 (10^18)
• 5 Exabytes: All words ever spoken by human beings
Zettabyte: 1,180,591,620,717,411,303,424 bytes (2^70); approx. 1,000,000,000,000,000,000,000 (10^21)
Yottabyte: 1,208,925,819,614,629,174,706,176 bytes (2^80); approx. 1,000,000,000,000,000,000,000,000 (10^24)
http://www.jamesshuggins.com/h/tek1/how-big.htm
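Each unit above is given both as a power of two and as a rounded power of ten. As a rough sketch (the names `UNITS` and `unit_sizes` are illustrative assumptions, not from the source), the following reproduces the exact byte counts and shows how far the binary value drifts from the decimal approximation as the units grow.

```python
# Unit names in the order listed above; each step is a factor of 2^10 (binary)
# or 10^3 (decimal) larger than the one before it.
UNITS = ["Gigabyte", "Terabyte", "Petabyte", "Exabyte", "Zettabyte", "Yottabyte"]


def unit_sizes():
    """Return a list of (name, binary_bytes, decimal_bytes) tuples."""
    sizes = []
    for i, name in enumerate(UNITS):
        binary = 2 ** (30 + 10 * i)    # 2^30, 2^40, ..., 2^80
        decimal = 10 ** (9 + 3 * i)    # 10^9, 10^12, ..., 10^24
        sizes.append((name, binary, decimal))
    return sizes


if __name__ == "__main__":
    for name, binary, decimal in unit_sizes():
        # Percentage by which the binary size exceeds the decimal approximation.
        gap = (binary - decimal) / decimal * 100
        print(f"{name:<10} {binary:>36,} bytes  ({gap:.1f}% above the decimal value)")
```

The gap widens with each step, from roughly 7% at the gigabyte level to roughly 21% at the yottabyte level, which is why "approx." matters more for the larger units.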
http://www.emc.com/collateral/analyst-reports/idc-the-digital-universe-in-2020.pdf
Where we’ve been and where we’re going
Timeline: Jan 13, Jun 13, Jan 14, Jun 14
RCE 6 dev RCE 6 alpha / beta
Information security interns
NetApp OS upgrade
Level 3 compliance completion
Level 4 compliance planning
RCE 6 production
NX4 upgrade
RCE cloud batch dev RCE alpha
Big Data playground dev
RCE beta
Merge cloud into prod queues
Non-research file migration
Non-research web migration
RCE cloud production support dev
The next three years: milestones
Timeline: Jun 13, Jun 14, Jun 15, Jun 16
RCE 6
Level 3 Level 4
RCE cloud batch
Big Data 1.0
Non-research file migration complete
RCE 7 RCE 8
Level 5
RCE cloud COD
RCE cloud login
Big Data 2.0
Big Data 3.0
Non-research web migration complete
HUIT / Harvard IT themes
• Cloud computing
• Information security
• Identity management (PIN / IDM)
• Educational tools (HarvardX / EdX)
• Web publishing (HWPI / OpenScholar)
• Library technologies (Dataverse)
IQSS / HMDC themes
• Focus on Social Science research computing
• Moving commodity services to HUIT
• Expanding to the cloud
• Information security (confidential research data)
• Latest and greatest system application software
• Ease-of-use functionality
• Hosting research-oriented web sites
• Working directly with the users
• Support and services
• Technology training and consultation
• Architecting novel approaches to cloud, security, scalable deployment and use of computation, I/O, and databases
Things we could get into more
• Databases
  • Scalable, structured, unstructured, social data, searching, clustering, mining, etc.
• Security
  • Isolating computations, files, resources
• Expanded customization
  • Giving users exactly and only what they need
• Parallel computing
  • Investigate how best to support the underlying technologies (MPI, resource managers, etc.)
  • Leverage developing application support (GridR, etc.)
• Ease-of-use cluster computing
  • Browser / mobile support (NX 4)
  • Single application support
• Developing open source communities