Getting your head around big data

Getting your head around

BIG Data

https://github.com/glennblock

https://twitter.com/gblock

“I should be tweeting”

Make machine data accessible, usable and valuable to everyone.

Platform for Machine Data

Any Machine Data

HA Indexes and Storage

Search and Investigation

Proactive Monitoring

Operational Visibility

Real-time Business Insights

Commodity Servers

Online Services

Web Services

Servers

Security

GPS Location

Storage

Desktops

Networks

Packaged Applications

Custom Applications

Messaging

Telecoms

Online Shopping Cart

Web Clickstreams

Databases

Energy Meters

Call Detail Records

Smartphones and Devices

15,000 BC – Pictures (Lascaux, France)

6000 BC – Symbols

3,500 BC – Language

1,275 BC – Papyrus

1st - 13th Century - Codex

13th Century – Movable type

15th Century – Printing press

19th to 20th Century – Babbage Analytical Engine

1936 – Turing machine

1945 – ENIAC

1947 – The first bug

1977 – ARPANET

1990s – Internet

Phones and Tablets

Services

New consumer devices

90 percent of all the data in the world has been generated over the last two years

source: sciencedaily.com

Every day 2.5 quintillion bytes of data is generated

1 quintillion = 1 followed by 18 zeros!

57.5 billion 32 GB iPads

source: storagenewsletter.com

2.7 zettabytes exist in the digital universe

1 zettabyte = 1 followed by 21 zeros!

42 ZB = All human speech digitized

source: highscalability.com
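
The scale figures above can be sanity-checked with a little arithmetic. This is a rough sketch, assuming decimal units (1 GB = 10^9 bytes, 1 ZB = 10^21 bytes); the variable names are illustrative and not from the talk.

```python
# Rough scale arithmetic, assuming decimal (SI) units throughout.
QUINTILLION = 10**18          # 1 followed by 18 zeros
ZETTABYTE = 10**21            # 1 followed by 21 zeros
IPAD_BYTES = 32 * 10**9       # a 32 GB iPad, taking 1 GB = 10**9 bytes

bytes_per_day = 25 * QUINTILLION // 10    # 2.5 quintillion bytes per day
digital_universe = 27 * ZETTABYTE // 10   # 2.7 zettabytes in total

# How many 32 GB iPads one day's data would fill.
ipads_per_day = bytes_per_day // IPAD_BYTES

# How much data 57.5 billion 32 GB iPads would hold, in zettabytes.
ipads_total_zb = 575 * 10**8 * IPAD_BYTES / ZETTABYTE

# How many days of generation at that rate add up to 2.7 ZB.
days_to_fill = digital_universe // bytes_per_day

print(ipads_per_day)
print(ipads_total_zb)
print(days_to_fill)
```

Under these assumptions, one day's data fills roughly 78 million 32 GB iPads, while 57.5 billion of them would hold about 1.84 zettabytes. The iPad comparison therefore appears to describe a total closer to the whole digital universe than to a single day's output. At 2.5 quintillion bytes per day, 2.7 zettabytes is about 1,080 days of data.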

How big is big?

That’s A LOT of data!

How do you harness it?

This is what big data is really about.

Asking questions and getting answers

Massive amounts of data.

Machine generated

VOLUME

Data is coming from a multitude of sources

Mix of structured and unstructured (JSON, XML, CSV, Plain Text)

Need a way to store it and query it

VARIETY

VARIETY

Log files

Activity Feeds

Emails

Device Streams

Audio Files

Videos
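
To make the variety point concrete, here is a minimal sketch that reads the same kind of event from JSON, XML, CSV and plain text into one common shape so it can be queried uniformly. The sample records, field names and helper functions are hypothetical, made up for illustration.

```python
import csv
import io
import json
import re
import xml.etree.ElementTree as ET

# Hypothetical sample records: the same kind of event in four formats.
JSON_RECORD = '{"user": "alice", "action": "login"}'
XML_RECORD = '<event><user>bob</user><action>logout</action></event>'
CSV_RECORD = 'user,action\ncarol,purchase\n'
TEXT_RECORD = '2013-05-01 12:00:00 user=dave action=search'

def from_json(text):
    return json.loads(text)

def from_xml(text):
    # Each child element becomes one field of the event.
    root = ET.fromstring(text)
    return {child.tag: child.text for child in root}

def from_csv(text):
    # DictReader uses the first row as field names.
    return dict(next(csv.DictReader(io.StringIO(text))))

def from_text(text):
    # Pull key=value pairs out of an otherwise free-form log line.
    return dict(re.findall(r'(\w+)=(\S+)', text))

events = [
    from_json(JSON_RECORD),
    from_xml(XML_RECORD),
    from_csv(CSV_RECORD),
    from_text(TEXT_RECORD),
]

# Once normalized, every source answers the same question the same way.
users = [event["user"] for event in events]
print(users)
```

Real feeds are rarely this tidy. The sketch only shows why a common representation helps before storing and querying mixed data.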

Data arrives at many different frequencies

Need to be able to process it in real time.

VELOCITY
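
One common way to handle data arriving at uneven rates is to keep a sliding time window over the stream. The sketch below assumes events arrive in timestamp order; the class name and the sample arrival times are hypothetical.

```python
from collections import deque

class SlidingWindowCounter:
    """Counts events seen in the last `window_seconds` seconds."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.timestamps = deque()

    def add(self, timestamp):
        """Record one event and return the count inside the window."""
        self.timestamps.append(timestamp)
        # Drop events that have fallen out of the window.
        cutoff = timestamp - self.window_seconds
        while self.timestamps and self.timestamps[0] <= cutoff:
            self.timestamps.popleft()
        return len(self.timestamps)

# Hypothetical arrival times in seconds; the gaps vary on purpose.
arrivals = [0, 1, 2, 10, 11, 30]
counter = SlidingWindowCounter(window_seconds=5)
counts = [counter.add(t) for t in arrivals]
print(counts)
```

Each call does a small, bounded amount of work, so the count stays current as events arrive rather than being recomputed over the full history.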

Not all data that is stored is useful.

Need to identify the useful data

Need to wade through all the noise

VERACITY
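
Wading through the noise often starts with a simple filter. This is a minimal sketch, assuming a made-up log format of timestamp, level and message; the sample lines and the choice of "useful" levels are hypothetical.

```python
import re

# Hypothetical log lines; the format is an assumption for illustration.
LINES = [
    "2013-05-01 12:00:01 DEBUG heartbeat ok",
    "2013-05-01 12:00:02 ERROR payment failed order=42",
    "2013-05-01 12:00:03 DEBUG heartbeat ok",
    "garbled ###",
    "2013-05-01 12:00:04 WARN disk usage 91%",
]

LINE_PATTERN = re.compile(r"^(\S+ \S+) (DEBUG|INFO|WARN|ERROR) (.*)$")
USEFUL_LEVELS = {"WARN", "ERROR"}

def useful_events(lines):
    """Yield (timestamp, level, message) for lines worth keeping."""
    for line in lines:
        match = LINE_PATTERN.match(line)
        if match is None:
            continue  # malformed line: treat as noise
        timestamp, level, message = match.groups()
        if level in USEFUL_LEVELS:
            yield timestamp, level, message

kept = list(useful_events(LINES))
print(len(kept), "of", len(LINES), "lines kept")
```

What counts as useful depends on the question being asked; here only warnings and errors survive, and unparseable lines are dropped.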

SOLUTIONS

Map/Reduce

function map(String name, String document):
    // name: document name
    // document: document contents
    for each word w in document:
        emit (w, 1)

function reduce(String word, Iterator partialCounts):
    // word: a word
    // partialCounts: a list of aggregated partial counts
    sum = 0
    for each pc in partialCounts:
        sum += ParseInt(pc)
    emit (word, sum)
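
The word-count pseudocode above can be sketched as runnable Python on a single machine. This is not how a real cluster framework is invoked: the `shuffle` helper stands in for the grouping a framework would do between the two steps, the sample documents are hypothetical, and lowercasing words is an added assumption.

```python
from collections import defaultdict

def map_words(name, document):
    """Map step: emit a (word, 1) pair for every word in the document."""
    for word in document.split():
        yield word.lower(), 1

def shuffle(pairs):
    """Group emitted values by key, as a framework would between steps."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_counts(word, partial_counts):
    """Reduce step: sum the partial counts for one word."""
    return word, sum(partial_counts)

# Hypothetical input documents.
documents = {
    "doc1": "big data is big",
    "doc2": "data is everywhere",
}

# Run every document through the map step and collect the pairs.
pairs = [pair for name, text in documents.items()
         for pair in map_words(name, text)]

# Group by word, then reduce each group to a single total.
word_counts = dict(reduce_counts(word, counts)
                   for word, counts in shuffle(pairs).items())
print(word_counts)
```

Because each map call looks at one document and each reduce call looks at one word, both steps can be spread across many machines, which is the point of the model.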

High-scale, high-availability databases

Distributed processing of large datasets

Data Visualization and analysis

End to end tools

More information

www.mongodb.org

www.memsql.com

cassandra.apache.org

hadoop.apache.org

www.tableausoftware.com

www.elasticsearch.org

splunk.com

@gblock http://github.com/glennblock

http://www.flickr.com/photos/11812960@N04/4050576435
