
Enabling fast pages and furious development while supporting a billion users

Date post: 05-Jul-2015
Upload: viet-nt
Description:
Facebook is one of the top sites on the internet and supports more than 900 million users. Every day it handles billions of messages and hundreds of millions of photos, and generates hundreds of terabytes of data. This data is also becoming more complex and interconnected over time. Every page the site serves requires processing large amounts of data and needs to be rendered in milliseconds. Business and practical constraints dictate that more users be served with fewer resources. In addition, product changes occur regularly and rapidly. These constraints mean that the site requires an infrastructure that is scalable, fast, efficient, and flexible beyond anything built before. In this talk, we will share key lessons from our experience in building an infrastructure that addresses these challenges. In particular, we will discuss key components of the Facebook software architecture, the instrumentation and data collection mechanisms that allow us to monitor the health of the site, and innovative tools that analyze vast amounts of data to help us pre-empt site issues and identify root causes when things go wrong. We describe how this infrastructure and these tools allow engineers to move fast and rapidly launch products as Facebook builds for a billion users and beyond.
Enabling Fast Pages and Furious Development While Supporting a Billion Users Subbu Subramanian, Ph.D. Software Engineer
Transcript
Page 1: Enabling fast pages and furious development while supporting a billion users

Enabling Fast Pages and Furious Development While Supporting a Billion Users

Subbu Subramanian, Ph.D. Software Engineer

Page 2: Enabling fast pages and furious development while supporting a billion users

Pottery Challenge

Page 3: Enabling fast pages and furious development while supporting a billion users

Pottery Challenge: Day 3

Team 1: Make a PERFECT Pot

Team 2: Make 20 pots

Page 4: Enabling fast pages and furious development while supporting a billion users

950,000,000

Page 5: Enabling fast pages and furious development while supporting a billion users

700B minutes spent

on the site every month

2.5M sites using

social plugins

30B pieces of content

shared each month

500M daily active users

Latest stats at http://newsroom.fb.com/content/default.aspx?NewsAreaId=22

Page 6: Enabling fast pages and furious development while supporting a billion users

Require tackling some UNIQUE TECHNOLOGY CHALLENGES

Page 7: Enabling fast pages and furious development while supporting a billion users

Scaling traditional websites

Bob

Bob’s data

Bob

Bob’s data

Page 8: Enabling fast pages and furious development while supporting a billion users
Page 9: Enabling fast pages and furious development while supporting a billion users

Memcache (speaker notes)

Slide 1: The story of crossing over between schools. Each new school was a cluster of machines, and the clusters were not connected. Then the campuses could be connected, which threw everything into the blender. To make it scale, we looked at open source and found Memcache. Memcache ran out of steam, and we took some engineers to help figure out why Memcache wasn't working. Slide of a graph going up from MIT as things bottlenecked. We solved the problems in Memcache.

Narrative: we solve one thing and then other things begin to break. We were so successful in some areas that we blew up our switches.

Graphic: a school; then multiple schools as isolated bubbles; then schools opening up to other schools and crashing the system, which is why we went to Memcache.

Slide 2: graph

Slide 3: stats

Page 10: Enabling fast pages and furious development while supporting a billion users

Scaling Facebook: Interconnected data

(Animation notes: show one animate in at a time and animate their lines; pop out more stories)

Bob

Page 11: Enabling fast pages and furious development while supporting a billion users

Scaling Facebook: Interconnected data

(Animation notes: show one animate in at a time and animate their lines; pop out more stories)

Bob Brian

Page 12: Enabling fast pages and furious development while supporting a billion users

Scaling Facebook: Interconnected data

(Animation notes: show one animate in at a time and animate their lines; pop out more stories)

Bob Brian Felicia

Page 13: Enabling fast pages and furious development while supporting a billion users

News Feed

950 million unique home pages

Page 14: Enabling fast pages and furious development while supporting a billion users

Multifeed

Multifeed

Actor ID - Object ID - Story type

Actor ID - Object ID - Story type

Stories for up to 5000 friends in milliseconds

Blank photo/ActorID: Object ID/Storytype grey out the text blocks and superimpose the actual copy

Examines thousands of stories to find the 45 most interesting ones and returns them in milliseconds
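
Note (illustrative sketch, not from the talk): selecting the 45 most interesting stories out of thousands of candidates is a top-k problem, which can be solved without sorting the full candidate list. The Story fields mirror the "Actor ID - Object ID - Story type" labels on the slide; the score function, its weights, and the story type names are hypothetical, since the talk does not describe the real ranking signal.

```python
import heapq
from typing import NamedTuple


class Story(NamedTuple):
    actor_id: int     # who performed the action
    object_id: int    # what the action was performed on
    story_type: str   # hypothetical values: "photo", "status", "like"


def score(story: Story) -> float:
    """Hypothetical interest score; the real ranking is not described."""
    weights = {"photo": 3.0, "status": 2.0, "like": 1.0}
    # Small object-based term so that ties are broken deterministically.
    return weights.get(story.story_type, 0.5) + (story.object_id % 100) / 1000.0


def top_stories(candidates, k=45):
    """Return the k highest-scoring stories, best first."""
    return heapq.nlargest(k, candidates, key=score)


# 500 friends with three candidate stories each.
candidates = [
    Story(actor_id=a, object_id=a * 7 + i, story_type=t)
    for a in range(1, 501)
    for i, t in enumerate(("photo", "status", "like"))
]
feed = top_stories(candidates)
```

heapq.nlargest keeps only k items in its heap while scanning, so the cost grows with the number of candidates times log k rather than with a full sort.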

Page 15: Enabling fast pages and furious development while supporting a billion users

Memcache

TAO (Custom Cache)

1 Billion operations per second

Page 16: Enabling fast pages and furious development while supporting a billion users

800M

New Apps February 2004

Sign Up

NewsFeed 2006

Platform launch 2007

Translations 2008

The Stream 2009

Open Graph 2010

Social Plugins 2010

Photos Update 2010

Places 2010

Mobile Event 2010

Groups 2010

Messages 2010

New Profile 2010

Questions 2011

Unified Mobile Sites

while supporting growth …

2011 2004

New Apps 2004/2005

Page 17: Enabling fast pages and furious development while supporting a billion users

Pottery Challenge

Team 1: Make a PERFECT Pot

Team 2: Make 20 pots

Page 18: Enabling fast pages and furious development while supporting a billion users

Scale

Photo by Eole at http://www.flickr.com/photos/eole/2193801804// used under a Creative Commons license

Move Fast

Page 19: Enabling fast pages and furious development while supporting a billion users

Move Fast

Moving fast does not mean poor quality

We want a high ship rate

Invest in removing friction that slows us down

Page 20: Enabling fast pages and furious development while supporting a billion users

Starting on day ONE

Follow your passion – pick your team

Push any time you want to

Empower Engineers

Page 21: Enabling fast pages and furious development while supporting a billion users

Commits per Month

[Chart: commits per month, x-axis from 1/1/2006 to 1/1/2012]

Page 22: Enabling fast pages and furious development while supporting a billion users

Be Bold

and innovate

Move Fast

and build things

Scale Big

with min resources

OMG!

Page 23: Enabling fast pages and furious development while supporting a billion users

Be Bold

and innovate

Move Fast

and build things

Scale Big

with min resources

How can Infrastructure support these goals?

• Pre-empt issues before they hit production

• Know immediately when things go wrong

• Know what to do when things go wrong

=> LOTS OF INSTRUMENTATION, TOOLS and AUTOMATION

Page 24: Enabling fast pages and furious development while supporting a billion users

News feed

Perflab (aka Difflab)

• Performance test every commit

• Spot regressions before deploy
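
Note (illustrative sketch, not from the talk): one simple way to performance test a commit is to compare its measured latencies against a baseline build and flag it when the mean gets worse by more than a threshold. The function name, the 5% threshold, and the sample numbers are assumptions; the talk does not say how Perflab makes this decision.

```python
from statistics import mean


def is_regression(baseline_ms, candidate_ms, threshold=0.05):
    """Flag the candidate if its mean latency exceeds the baseline mean
    by more than `threshold` (a fraction, so 0.05 means 5%)."""
    base = mean(baseline_ms)
    cand = mean(candidate_ms)
    return (cand - base) / base > threshold


# Hypothetical page render times in milliseconds.
baseline = [100.0, 102.0, 98.0, 101.0]
slow_commit = [110.0, 112.0, 109.0, 111.0]
ok_commit = [101.0, 100.0, 99.0, 102.0]
```

A production system would likely need repeated runs and a statistical test to separate real regressions from noise; the plain mean comparison here only shows the shape of the check.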

Page 25: Enabling fast pages and furious development while supporting a billion users

News feed

Perflab (aka Difflab)

• Also tracks slow drift regressions

• Helps us push thousands of revs per week
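
Note (illustrative sketch, not from the talk): a slow drift is a series of small slowdowns that each pass a per-commit check but add up over time. One way to catch it is to compare every revision against a fixed reference measurement instead of against its immediate predecessor. The function name, threshold, and numbers below are assumptions.

```python
def find_drift(history_ms, reference_ms, threshold=0.05):
    """Return the index of the first revision whose latency is more than
    `threshold` above the fixed reference, or None if none is."""
    for index, value in enumerate(history_ms):
        if (value - reference_ms) / reference_ms > threshold:
            return index
    return None


# Hypothetical latencies: each revision is about 1% slower than the last.
history = [100.0, 101.0, 102.0, 103.0, 104.0, 105.0, 106.0, 107.0]

# Step-by-step changes all stay within the 5% per-commit threshold...
steps = [(b - a) / a for a, b in zip(history, history[1:])]

# ...but measured against the fixed reference, the drift is visible.
drift_index = find_drift(history, reference_ms=100.0)
```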

Page 26: Enabling fast pages and furious development while supporting a billion users

News feed

Gatekeeper

if (gk_check('secret_project', $user)) {
  render_cool_feature();
} else {
  render_normal_feature();
}

Simple code but powerful check

• Many options for precise targeting

• 500M+ gatekeeper checks performed every second
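
Note (illustrative sketch, not from the talk): a check like gk_check can combine several targeting options, for example an employee flag, a country list, and a percentage rollout based on a stable hash of the user. The rule table, its field names, and the hashing scheme are all hypothetical; the talk only says that many targeting options exist.

```python
import hashlib

# Hypothetical rule table; the real Gatekeeper configuration is not described.
RULES = {
    "secret_project": {
        "employees_only": False,
        "countries": {"US", "CA"},  # empty set would mean "any country"
        "percent": 10,              # share of eligible users who pass
    },
}


def bucket(project, user_id):
    """Map (project, user) to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{project}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100


def gk_check(project, user):
    """Return True if `user` (a dict) passes the rules for `project`."""
    rule = RULES.get(project)
    if rule is None:
        return False  # unknown projects are closed by default
    if rule["employees_only"] and not user.get("is_employee", False):
        return False
    if rule["countries"] and user.get("country") not in rule["countries"]:
        return False
    return bucket(project, user["id"]) < rule["percent"]


# Count how many of 10,000 hypothetical US users see the feature.
passed = sum(
    gk_check("secret_project", {"id": i, "country": "US"})
    for i in range(10000)
)
```

Hashing the project name together with the user ID keeps each user's answer stable across requests while giving different projects independent rollouts.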

Page 27: Enabling fast pages and furious development while supporting a billion users

Rigorous Test Coverage

Page 28: Enabling fast pages and furious development while supporting a billion users

Assigning ownership to failures

Page 29: Enabling fast pages and furious development while supporting a billion users

Canary Tier and Delta view

Page 30: Enabling fast pages and furious development while supporting a billion users

Be Bold

and innovate

Move Fast

and build things

Scale Big

with min resources

• Pre-empt issues before they hit production

• Know immediately when things go wrong

• Know what to do when things go wrong

Page 31: Enabling fast pages and furious development while supporting a billion users

*Lots* of Instrumentation = Fire Hose of Data

Page 32: Enabling fast pages and furious development while supporting a billion users

News feed

Claspin

• High-density heatmap viewer for large services

• Find needles in a haystack -> drilldown quickly

Page 33: Enabling fast pages and furious development while supporting a billion users
Page 34: Enabling fast pages and furious development while supporting a billion users

tasks sevmanager logview testconsole

differential wirehog domino groups

hipal hsh hud kobold

ods opsfeed scuba serf

Page 35: Enabling fast pages and furious development while supporting a billion users

Be Bold

and innovate

Move Fast

and build things

Scale Big

with min resources

• Pre-empt issues before they hit production

• Know immediately when things go wrong

• Know what to do when things go wrong

Page 36: Enabling fast pages and furious development while supporting a billion users

Scuba: a tool for diving into an emerald sea of data

Page 37: Enabling fast pages and furious development while supporting a billion users

Motivation

Page 38: Enabling fast pages and furious development while supporting a billion users

Requirements for data exploration

Need

✓ Speed

✓ Real-time data

✓ Ad-hoc filtering and grouping

✓ AVG, SUM, COUNT, histograms & percentiles

Don’t Need

⤫  Replication

⤫  Transactions

⤫  Long retention

⤫  Table joins

⤫  Unique keys

⤫  Full map-reduce

Page 39: Enabling fast pages and furious development while supporting a billion users

Hive (hadoop)

•  “Unlimited” storage/CPU

•  Full-featured Query Language

•  Numerous tools and Frameworks

But Slow!

Page 40: Enabling fast pages and furious development while supporting a billion users

MySQL

Works, but …

Page 41: Enabling fast pages and furious development while supporting a billion users

Scuba: Data Model

Few pre-defined types and operations

“Data Sets”

- No upfront schema declaration

- Stored in memory

- Sorted by timestamp
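
Note (illustrative sketch, not from the talk): a data set with no upfront schema, held in memory and ordered by timestamp, can be modeled as a list of free-form rows kept sorted so that time-range queries use binary search. The class name, method names, and sample columns are assumptions, not Scuba's actual interface.

```python
import bisect


class DataSet:
    """In-memory rows with no declared schema, kept ordered by timestamp."""

    def __init__(self):
        self._times = []  # sorted timestamps
        self._rows = []   # row dicts, parallel to _times

    def add(self, timestamp, **columns):
        """Insert a row; any set of column names is accepted."""
        i = bisect.bisect_right(self._times, timestamp)
        self._times.insert(i, timestamp)
        self._rows.insert(i, columns)

    def between(self, start, end):
        """Rows with start <= timestamp < end, located by binary search."""
        lo = bisect.bisect_left(self._times, start)
        hi = bisect.bisect_left(self._times, end)
        return self._rows[lo:hi]


ds = DataSet()
ds.add(30, page="home.php", wall_ms=120)
ds.add(10, page="profile.php", country="US")
ds.add(20, page="home.php", wall_ms=90, datacenter="dc1")
recent = ds.between(10, 30)
```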

Page 42: Enabling fast pages and furious development while supporting a billion users

Scuba Data Types: Integers

VLQ-encoded array of 64-bit integers (single char*)

O(1) lookup

Usage

Aggregate on these (SUM, AVG, etc...)

Filter (==, <=, >=)

“Which pages on the site have an average wall time in the last hour > 2 seconds”
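
Note (illustrative sketch, not from the talk): a variable-length quantity (VLQ) stores an integer in 7-bit groups, using the high bit of each byte to signal that more bytes follow, so small values take a single byte. The sketch below handles unsigned values and uses the little-endian group order found in LEB128; the byte order and layout that Scuba uses are not stated in the talk, so treat both as assumptions. Random access into a packed buffer like this would need a separate offset index, which the sketch omits.

```python
def vlq_encode(n):
    """Encode a non-negative integer, 7 bits per byte, low group first."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)         # high bit clear: last byte
            return bytes(out)


def vlq_decode_all(buf):
    """Decode every integer packed back to back in `buf`."""
    values, current, shift = [], 0, 0
    for byte in buf:
        current |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            values.append(current)
            current, shift = 0, 0
    return values


# Hypothetical wall times in milliseconds, packed into one byte string.
wall_times = [0, 1, 127, 128, 300, 2**63]
packed = b"".join(vlq_encode(v) for v in wall_times)
```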

Page 43: Enabling fast pages and furious development while supporting a billion users

Scuba Data Types: Normals

Strings mapped to ints, Stored as array of ints

size: 4 bytes

String Normalization

// char* => int32

'home.php', 'dc1', 'a2', 'en_US' => 32, 14, 3, 289

Usage

Group By value

Filter (==, !=, in set, not in set)

“Top 10 countries that have the slowest pages today”
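
Note (illustrative sketch, not from the talk): string normalization is a form of dictionary encoding. Each distinct string gets a small integer ID, the column stores only the IDs, and group-by and equality filters then work on integers. The class name, the sequential ID assignment, and the sample data are assumptions.

```python
from collections import defaultdict


class Normalizer:
    """Assigns each distinct string a small integer ID."""

    def __init__(self):
        self._ids = {}
        self._strings = []

    def encode(self, s):
        if s not in self._ids:
            self._ids[s] = len(self._strings)
            self._strings.append(s)
        return self._ids[s]

    def decode(self, i):
        return self._strings[i]


def slowest_groups(norm, column, values, k):
    """Group `values` by the integer IDs in `column` and return the k
    groups with the highest average, as (string, average) pairs."""
    totals = defaultdict(lambda: [0, 0])  # id -> [sum, count]
    for key, value in zip(column, values):
        totals[key][0] += value
        totals[key][1] += 1
    averages = {key: s / n for key, (s, n) in totals.items()}
    ranked = sorted(averages, key=averages.get, reverse=True)[:k]
    return [(norm.decode(key), averages[key]) for key in ranked]


norm = Normalizer()
countries = ["US", "IN", "US", "BR", "IN", "US"]
wall_ms = [100, 900, 200, 400, 700, 300]
column = [norm.encode(c) for c in countries]
slowest = slowest_groups(norm, column, wall_ms, k=2)
```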

Page 44: Enabling fast pages and furious development while supporting a billion users

Scuba Data Types: Denorm

Array of plain ol' char*

To be used ONLY for unique identifiers that would not benefit from normalization

size: 8 + strlen + 1 bytes

Usage

None other than displaying the value

Cannot filter or group by these. No native regex support.

Page 45: Enabling fast pages and furious development while supporting a billion users

Scuba Data Types: Tagsets

A set of normals. Stored as bit vector

size: 8 + 2 + ceil(I / 8) bytes where I is the max index represented

Usage

Filter (has all tags, has some tags, has none of these tags)

Bit Set

'timeline', 'mercury', 'titan_drafts' => 0, 5, 14

“Is there a difference in cpu usage across users in test group A vs test group B”
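
Note (illustrative sketch, not from the talk): with each tag mapped to a bit position, the three tagset filters reduce to single bitwise operations. The tag-to-index mapping is taken from the slide; using a Python integer as the bit vector and the function names are assumptions. The size helper applies the slide's formula as written.

```python
import math

# Tag indexes as shown on the slide.
TAG_INDEX = {"timeline": 0, "mercury": 5, "titan_drafts": 14}


def to_bits(tags):
    """Build a bit vector (as an int) with one bit set per tag."""
    bits = 0
    for tag in tags:
        bits |= 1 << TAG_INDEX[tag]
    return bits


def has_all(row_bits, wanted):
    return (row_bits & wanted) == wanted


def has_some(row_bits, wanted):
    return (row_bits & wanted) != 0


def has_none(row_bits, wanted):
    return (row_bits & wanted) == 0


def tagset_size_bytes(max_index):
    """Size per the slide's formula: 8 + 2 + ceil(I / 8) bytes."""
    return 8 + 2 + math.ceil(max_index / 8)


row = to_bits(["timeline", "mercury"])
```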

Page 46: Enabling fast pages and furious development while supporting a billion users

Storage

1 aggregator / box

8+ leaves / box

Hundreds of boxes

Data stored over Terabytes of RAM!

Page 47: Enabling fast pages and furious development while supporting a billion users

Leaves

Multiple per machine ( #cores / N )

Only queried by the aggregator on the local machine

Persist all write traffic to disk (compressed). Replay all writes on startup

Store all samples efficiently in memory

Leaves are independent; No shared state
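
Note (illustrative sketch, not from the talk): persisting every write to a compressed log and replaying that log on startup lets an in-memory leaf rebuild its state after a restart. The class name, the JSON-lines log format, and the use of gzip are assumptions; the talk does not describe the on-disk format.

```python
import gzip
import json
import os
import tempfile


class Leaf:
    """Keeps samples in memory and appends each write to a gzip log."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.samples = []
        self._replay()

    def _replay(self):
        """Rebuild in-memory state from the log, if one exists."""
        if not os.path.exists(self.log_path):
            return
        with gzip.open(self.log_path, "rt", encoding="utf-8") as log:
            for line in log:
                self.samples.append(json.loads(line))

    def write(self, sample):
        # Each append starts a new gzip member; Python's gzip reader
        # handles files made of concatenated members.
        with gzip.open(self.log_path, "at", encoding="utf-8") as log:
            log.write(json.dumps(sample) + "\n")
        self.samples.append(sample)


path = os.path.join(tempfile.mkdtemp(), "leaf0.log.gz")
leaf = Leaf(path)
leaf.write({"time": 1, "page": "home.php"})
leaf.write({"time": 2, "page": "profile.php"})
restarted = Leaf(path)  # simulates a process restart
```

Because each leaf owns its log and shares no state with the others, a restart of one leaf in this sketch does not involve any other leaf.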

Page 48: Enabling fast pages and furious development while supporting a billion users

Aggregation

Queries are distributed, and results are aggregated as a binary tree

(For now, there is no sorting of results. All aggregation operations must be commutative and associative.)
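
Note (illustrative sketch, not from the talk): when partial results are merged pairwise up a binary tree, the grouping and order of merges depend on the tree shape, which is why the combining operation has to be commutative and associative. The function names and the (sum, count) partial result format are assumptions.

```python
def tree_aggregate(partials, combine):
    """Merge partial results pairwise, level by level, like a binary tree."""
    if not partials:
        raise ValueError("need at least one partial result")
    level = list(partials)
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level) - 1, 2):
            next_level.append(combine(level[i], level[i + 1]))
        if len(level) % 2:
            next_level.append(level[-1])  # odd one out moves up unchanged
        level = next_level
    return level[0]


def combine_sum_count(a, b):
    """Commutative and associative merge of (sum, count) pairs."""
    return (a[0] + b[0], a[1] + b[1])


# Hypothetical per-leaf partial results as (sum, count).
leaf_results = [(300, 3), (150, 1), (90, 2), (60, 4), (400, 5)]
total, count = tree_aggregate(leaf_results, combine_sum_count)
```

An average is not directly mergeable in this way, but the (sum, count) pair it is derived from is, which is one common way to fit AVG into such a scheme.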

Page 49: Enabling fast pages and furious development while supporting a billion users

Operations

4 functions:

visit, summarize, combine, and finalize

Also a Hive-SQL-like query language interface
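
Note (illustrative sketch, not from the talk): the slide names the four functions but does not define them. One plausible reading is that visit processes a single row on a leaf, summarize emits that leaf's partial result, combine merges two partial results, and finalize turns the merged result into the answer. The Average class, the run helper, and these semantics are assumptions made for illustration.

```python
class Average:
    """AVG over one column, split into four assumed phases."""

    def __init__(self, column):
        self.column = column
        self.total = 0
        self.count = 0

    def visit(self, row):
        """Process one row on a leaf; rows lacking the column are skipped."""
        if self.column in row:
            self.total += row[self.column]
            self.count += 1

    def summarize(self):
        """Emit this leaf's partial result."""
        return (self.total, self.count)

    @staticmethod
    def combine(a, b):
        """Merge two partial results."""
        return (a[0] + b[0], a[1] + b[1])

    @staticmethod
    def finalize(summary):
        """Turn the merged partial result into the final answer."""
        total, count = summary
        return total / count if count else None


def run(leaves, column):
    """Run the four phases over a non-empty list of per-leaf row lists."""
    summaries = []
    for rows in leaves:
        op = Average(column)
        for row in rows:
            op.visit(row)
        summaries.append(op.summarize())
    merged = summaries[0]
    for summary in summaries[1:]:
        merged = Average.combine(merged, summary)
    return Average.finalize(merged)


leaves = [
    [{"wall_ms": 100}, {"wall_ms": 300}],
    [{"wall_ms": 200}, {"page": "home.php"}],
]
average_wall_ms = run(leaves, "wall_ms")
```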

Page 50: Enabling fast pages and furious development while supporting a billion users

Querying Front-end

Page 51: Enabling fast pages and furious development while supporting a billion users

tasks sevmanager logview testconsole

differential wirehog domino groups

hipal hsh hud kobold

ods opsfeed scuba serf

Page 52: Enabling fast pages and furious development while supporting a billion users

Be Bold

and innovate

Move Fast

and build things

Scale Big

with min resources

