Presentation at Google Day on Big Data

Post on 11-Aug-2014

Big Data

Hello

Rezaur Rahman (Jitu)
CTO, G&R Ad Network
jitu@gandr.com.bd
@jituboss

Data is growing at an exponential rate, and traditional tools like RDBMS are not enough to process it

Data is everywhere:

• Flickr (87 million registered members and 3.5 million photos per day)
• YouTube (4B videos streamed per day)
• Yahoo! Webmap (3 trillion links, 300TB compressed, 5PB disk)
• Facebook collects 500 terabytes of data a day
• Walmart handles more than 1 million customer transactions every hour
• IDC estimates that by 2020, business transactions on the internet (business-to-business and business-to-consumer) will reach 450 billion per day

Data is growing at about 40% a year, reaching nearly 45 ZB by 2020, according to IDC

1 ZB is equal to 1 billion TB
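
The conversion above is plain arithmetic with decimal (SI) prefixes. A minimal Python sketch, assuming 1 TB = 10^12 bytes and 1 ZB = 10^21 bytes (the `zb_to_tb` helper is a name made up for this illustration), applied to the 45 ZB figure from the slide:

```python
# Decimal (SI) byte units; binary units (TiB, ZiB) would give slightly different numbers.
TB = 10**12
ZB = 10**21

def zb_to_tb(zettabytes):
    """Convert a whole number of zettabytes to terabytes."""
    return zettabytes * ZB // TB

print(zb_to_tb(1))   # 1 ZB expressed in TB
print(zb_to_tb(45))  # the 45 ZB projected for 2020, expressed in TB
```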

What is Big Data and what is not?

• Order details of an e-commerce site
• All orders across 1000s of e-commerce sites
• One person’s voter ID information
• Every citizen’s voter ID information dataset

Simple definition: Big Data is data that is too big to process on a single machine

What is Big Data?

3 V’s of Big Data (Volume, Velocity, Variety)

Types of Data:

• Relational Data (Tables/Transaction/Legacy Data)
• Unstructured Data
  – Apache weblogs
• Text Data (Web)
• Semi-structured Data (XML)
• Graph Data
  – Social Network, Semantic Web (RDF)
• Streaming Data

Data Processing Tasks:

• Aggregation and Statistics - Data warehouse
• Contextual Advertising - Real Time Bidding, Remarketing
• Indexing, Searching, and Querying - Keyword-based search, Pattern recognition
• Knowledge discovery - Data Mining, Statistical Modeling
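
As an illustration of the first task (aggregation and statistics), here is a minimal Python sketch that counts requests per HTTP status code. The `records` list, its field names, and the `count_by` helper are hypothetical sample material, not taken from the talk. At Big Data volumes the same count would be spread across many machines rather than run in one process.

```python
from collections import Counter

# Hypothetical, already-parsed weblog records (sample data for illustration only).
records = [
    {"path": "/index.html", "status": 200},
    {"path": "/missing", "status": 404},
    {"path": "/index.html", "status": 200},
    {"path": "/ads/click", "status": 302},
]

def count_by(records, field):
    """Aggregate: count how many records share each value of `field`."""
    return Counter(r[field] for r in records)

status_counts = count_by(records, "status")
print(status_counts.most_common())
```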

Traditional Architecture

• Relational Data is everything
  – SQL
  – Embedded
  – Client-Server Based
• Data Stack
  – Web, CDN, Load Balancers, Application, Database and Storage

Traditional Scalability

• Scale-up
  – Memory and hardware have limitations
• Scale-out
  – Reading
    • Cache is everything
      – Query Cache
      – Memcache
    • Pre-fetching, Replication
  – Writes
    • Redundant Disk Arrays, RAID
    • Sharding
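
Sharding, the last item above, spreads writes across several databases by mapping each record key to one shard. A minimal Python sketch, assuming simple hash-modulo placement. The shard count, the `shard_for`, `put`, and `get` helpers, and the in-memory dictionaries standing in for database servers are all hypothetical:

```python
import hashlib

NUM_SHARDS = 4  # hypothetical number of database servers
shards = [dict() for _ in range(NUM_SHARDS)]  # in-memory stand-ins for real databases

def shard_for(key):
    """Map a key to a shard index using a stable hash (not Python's salted hash())."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS

def put(key, value):
    """Write the value to the one shard that owns this key."""
    shards[shard_for(key)][key] = value

def get(key):
    """Read from the shard that owns this key; returns None if the key is absent."""
    return shards[shard_for(key)].get(key)

put("user:1001", {"name": "Alice"})
put("user:1002", {"name": "Bob"})
print(get("user:1001"), "is stored on shard", shard_for("user:1001"))
```

Hash-modulo placement is easy to reason about, but changing `NUM_SHARDS` remaps most keys. Consistent hashing is a common alternative when shards are added or removed often.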

NoSQL Solution

• Many companies emerged to solve the data problem
• Bigtable: Google started to implement a massively distributed, scalable system
• Many companies followed, building scale-out architectures on commodity hardware
• ACID was considered bad for scaling, so relaxed consistency models emerged
• Google Bigtable and Amazon Dynamo are notable examples

Big Data Tools

Big Data Landscape

Thanks

Questions?