Scaling your website

SCALING YOUR WEBSITE

Alejandro Marcu

Dutch PHP Conference 2016

Started programming Logo at 8 years old

Then moved to Basic, Turbo Pascal, C++, Java

2001 – 2004: Various programming jobs in Argentina

2004 – 2008: TopCoder

2009 – 2015: Facebook

Scalable architecture

Scaling the database

Caching

Introducing new features

What You Will Learn Today

Scalable architecture

Single Server

Hosted or in the cloud

Web App: Apache/Nginx +

DB: MySQL, MongoDB, etc.

Cache: Memcached, Redis

[Diagram: one Server hosting the Web App, DB, and Cache]

More RAM

More cores or faster CPU

Network Interfaces

Scaling Vertically

Functional Partitioning

Servers can have different hardware specs

More latency

Limited growth

[Diagram: Data Center with the Web App, DB, and Cache each on its own server (Server 1, Server 2, Server 3)]

Splitting the Web App

Web Front End should be a thin presentation layer

Services

Just another class

Remote over SOAP, REST, Thrift

Start simple, plan for scale

[Diagram: Web Front End, iOS App, and Android App in front of the Back End (Service 1, Service 2, ..., Service n)]
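One way to read "just another class" and "start simple, plan for scale" is sketched below in Python. The names `UserService`, `LocalUserService`, and `render_profile` are hypothetical and do not come from the talk; the point is only that the front end depends on an interface, so an in-process class can later be replaced by a remote client.

```python
from typing import Protocol


class UserService(Protocol):
    """Interface the front end depends on (hypothetical)."""

    def get_name(self, user_id: int) -> str: ...


class LocalUserService:
    """In-process implementation: 'just another class'."""

    def __init__(self, users: dict[int, str]) -> None:
        self._users = users

    def get_name(self, user_id: int) -> str:
        return self._users[user_id]


def render_profile(service: UserService, user_id: int) -> str:
    # The front end only formats data; the lookup lives behind the service.
    return f"<h1>{service.get_name(user_id)}</h1>"


service = LocalUserService({1: "John", 2: "Louise"})
page = render_profile(service, 1)
```

A remote implementation (over REST or Thrift, for instance) that exposes the same `get_name` method could replace `LocalUserService` without changing `render_profile`.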

Back end servers can have one or more services

Some services can be in more than one server

[Diagram: Data Center with the Web Front End, DB, Cache, and Back End services (Service 1 to Service n) spread over Server 1 to Server k]

Don’t store anything locally

Use external storage (e.g. databases)

Can use local caching

Stateless Services

HTTP Session

Cookies

External Data Store

Uploaded Files

DFS: GFS, HDFS, GlusterFS

Amazon S3

Stateless Front End

Multiple Front End Servers

Load Balancer:

Cloud based (Amazon ELB)

Software (NGINX, HAProxy)

Hardware (BIG-IP, NetScaler)

Load Balance

[Diagram: Data Center with Front End servers (Web FE 1 to Web FE k), Back End services (Service 1 to Service n), DB, and Cache]

Caching static files

Files that are the same on each request, e.g. jpg, png, css, js, mp3, etc.

Reverse HTTP Proxy

Load balancers usually provide this functionality

CDN (Content Delivery Network)

E.g. Akamai, Amazon CloudFront

Pay for usage

Multiple locations

[Diagram: the User gets static content from the CDN and dynamic content from the Data Center]

Advantages

Lower latency for users

Reduced disaster risk

Economic opportunities

Challenges

Consistency

Latency between data centers

Bandwidth between data centers

Multiple Data Centers

Scaling databases

Too much data

Too many reads

Too many writes

Want higher availability

Scaling relational databases

Replication

Usually many more reads than writes

Higher availability

Read after write can be inconsistent

[Diagram: DB clients, a Master, and two Slaves fed from the Master's Binlogs]
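A minimal Python sketch of how a client might use this layout: writes go to the master, reads are spread over the slaves, and a read that must see a row that was just written is sent to the master, because a slave may not have applied that change yet. The class `ReplicatedDB`, the `read_own_write` flag, and the server names are hypothetical, and the round-robin choice is an assumption.

```python
import itertools


class ReplicatedDB:
    """Routes writes to the master and spreads reads over the slaves."""

    def __init__(self, master: str, slaves: list[str]) -> None:
        self.master = master
        self._slaves = itertools.cycle(slaves)  # simple round-robin

    def server_for(self, sql: str, read_own_write: bool = False) -> str:
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and not read_own_write:
            return next(self._slaves)
        # Writes, and reads that must see a just-written row, use the master.
        return self.master


db = ReplicatedDB("master", ["slave1", "slave2"])
routes = [
    db.server_for("SELECT name FROM user WHERE id = 1"),
    db.server_for("UPDATE user SET name = 'Jon' WHERE id = 1"),
    db.server_for("SELECT name FROM user WHERE id = 1", read_own_write=True),
    db.server_for("SELECT name FROM user WHERE id = 2"),
]
```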

Limited growth

Can separate unrelated functionality, e.g. Payment

Sharding

Tables are split into multiple dbs

Sharding key used to decide which db, e.g. id

Sharding function, e.g. db(id) = (id % 2) + 1

Searching becomes more complicated

id  name
1   John
3   Jack
5   Anne

id  name
2   Louise
6   Marie

Sharding

E.g., add an extra db

New sharding function:

db(id) = (id % 3) + 1

Conclusion: modulo is not a good sharding function

Before (2 dbs):

id  name
1   John
3   Jack
5   Anne

id  name
2   Louise
6   Marie

After (3 dbs):

id  name
1   John

id  name
2   Louise
5   Anne

id  name
3   Jack
6   Marie
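The reallocation cost can be checked with a few lines of Python. This sketch applies the slide's function db(id) = (id % n) + 1 to the five example users, first with 2 dbs and then with 3; the helper name `db_for` is hypothetical.

```python
def db_for(user_id: int, num_dbs: int) -> int:
    """The slide's sharding function: db(id) = (id % n) + 1."""
    return (user_id % num_dbs) + 1


users = {1: "John", 2: "Louise", 3: "Jack", 5: "Anne", 6: "Marie"}

before = {uid: db_for(uid, 2) for uid in users}
after = {uid: db_for(uid, 3) for uid in users}

# Rows whose db changes have to be copied to another server.
moved = sorted(uid for uid in users if before[uid] != after[uid])
```

Here `moved` contains ids 2, 3, and 5, so three of the five rows change db when a single db is added.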

Consistent Sharding

Consistent sharding needs fewer reallocations

Before (2 dbs):

id  name
1   John
3   Jack
5   Anne

id  name
2   Louise
6   Marie

After (3 dbs):

id  name
1   John
3   Jack

id  name
2   Louise

id  name
5   Anne
6   Marie
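The slides do not say how consistent sharding is implemented. One common technique with this property is consistent hashing on a ring, sketched below in Python as an assumption rather than as the speaker's method. `HashRing`, the MD5-based hash, and the 64 virtual nodes per db are illustrative choices.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


class HashRing:
    """Consistent-hashing ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes: list[str], vnodes: int = 64) -> None:
        self._vnodes = vnodes
        self._ring: list[tuple[int, str]] = []
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        # Each node owns several points on the ring to even out the load.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def node_for(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_left(self._ring, (_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


ring = HashRing(["db1", "db2"])
keys = [str(i) for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}
ring.add("db3")
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

Every key in `moved` ends up on `db3`; no key moves between `db1` and `db2`, which is the behavior the tables above illustrate.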

Sharding

Create many logical DBs

Distribute them across servers

Server 1: DB 1, DB 2, …, DB 16

Server 2: DB 17, DB 18, …, DB 32

Sharding

Re-distribute DBs when needed

Need a function to map db to server, can be a configuration

Server 1: DB 1, DB 2, …, DB 16

Server 2: DB 17, DB 18, …, DB 24

Server 3: DB 25, DB 26, …, DB 32
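A small Python sketch of the two-step lookup: a fixed function maps an id to one of 32 logical dbs, and a configuration maps each logical db to a server. Re-distributing then means copying whole logical dbs and editing the configuration; the id-to-db function never changes. The server names and the range-based configuration format are assumptions.

```python
NUM_LOGICAL_DBS = 32

# Configuration: (first_db, last_db) range -> server. Hypothetical names.
TWO_SERVERS = {(1, 16): "server1", (17, 32): "server2"}
THREE_SERVERS = {(1, 16): "server1", (17, 24): "server2", (25, 32): "server3"}


def logical_db(user_id: int) -> int:
    # The number of logical dbs never changes, so modulo is safe here.
    return (user_id % NUM_LOGICAL_DBS) + 1


def server_for(db: int, config: dict[tuple[int, int], str]) -> str:
    for (first, last), server in config.items():
        if first <= db <= last:
            return server
    raise KeyError(f"no server configured for db {db}")


db = logical_db(123)  # 123 % 32 = 27, so logical db 28
old_server = server_for(db, TWO_SERVERS)
new_server = server_for(db, THREE_SERVERS)
```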

Sharding colocation

Put owned data in the same db as its owner (e.g. shard the post table by user_id)

Can execute joins

DB 1

user
id  name
1   John
3   Jack
5   Anne

post
id   user_id  text
100  1        …
125  1        …
180  3        …

DB 2

user
id  name
2   Louise
6   Marie

post
id   user_id  text
143  2        …
110  6        …
175  6        …

Sharding fan-out

Many-to-many relationships are spread out

To get friends’ names:

Get ids

Group by db

Query on each db

Gets worse with more dbs

Caching helps a lot

Needs inverse entries

DB 1

user
id  name
1   John
3   Jack
5   Anne

DB 2

user
id  name
2   Louise
6   Marie

friend
id1  id2
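The three steps (get ids, group by db, query on each db) can be sketched in Python as follows. The in-memory `SHARDS` dictionary stands in for real databases, and the layout (odd ids in DB 1, even ids in DB 2) is an assumption made for this example.

```python
from collections import defaultdict

# In-memory stand-in for two sharded user tables (assumed layout).
SHARDS = {
    1: {1: "John", 3: "Jack", 5: "Anne"},
    2: {2: "Louise", 6: "Marie"},
}


def db_for(user_id: int) -> int:
    # Assumption for this sketch: odd ids in DB 1, even ids in DB 2.
    return 1 if user_id % 2 else 2


def friend_names(friend_ids: list[int]) -> dict[int, str]:
    # Group ids by db so that each shard is queried only once.
    by_db: dict[int, list[int]] = defaultdict(list)
    for fid in friend_ids:
        by_db[db_for(fid)].append(fid)

    names: dict[int, str] = {}
    for db, ids in by_db.items():
        # Stand-in for: SELECT id, name FROM user WHERE id IN (...)
        names.update({fid: SHARDS[db][fid] for fid in ids})
    return names


names = friend_names([2, 3, 6])
```

With more dbs, the same friend list touches more shards per request, which is why the fan-out gets worse as the number of dbs grows.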

Replication

Scales reads, higher availability

Functional partitioning

Limited scalability

Helps across the board

Sharding

Scales reads and writes, handles too much data, and helps with availability

Those 3 techniques can be combined

Database scaling

Caching

Usually required at large scale

Key-Value stores

Set(key, value[, TTL])

Get(key)

Delete(key)

Different levels

Client side (e.g. in the browser in JS)

In the web server (e.g. APC)

Distributed cache (e.g. Redis, Memcached)

Caching application data
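The Set/Get/Delete interface with an optional TTL can be illustrated with a small in-memory Python class. `TTLCache` is a hypothetical stand-in for a real cache such as Memcached or Redis, not their client API.

```python
import time
from typing import Any, Optional


class TTLCache:
    """In-memory stand-in for a key-value cache with optional TTL."""

    def __init__(self) -> None:
        self._data: dict[str, tuple[Any, Optional[float]]] = {}

    def set(self, key: str, value: Any, ttl: Optional[float] = None) -> None:
        expires = time.monotonic() + ttl if ttl is not None else None
        self._data[key] = (value, expires)

    def get(self, key: str) -> Any:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires = entry
        if expires is not None and time.monotonic() >= expires:
            del self._data[key]  # expired: treat as a miss
            return None
        return value

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


cache = TTLCache()
cache.set("greeting", "hello")             # no TTL: stays until deleted
cache.set("short_lived", "bye", ttl=0.01)  # expires after 10 ms
time.sleep(0.05)
```

After the sleep, `cache.get("greeting")` still returns the stored value while `cache.get("short_lived")` is a miss.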

E.g. APC (Alternative PHP Cache)

Very fast

Duplicated caching between web servers

Expensive to invalidate

Use sparingly, mostly for global data

Caching in the web server

Examples:

Memcached (+ McRouter or libmemcached)

One or more cache servers, shared use between clients

Network latency

Distributed cache

Features to consider:

Replication

Partitioning

Separate pools

Persistence

Atomic operations

Distributed cache

When the value is no longer valid, usually just delete the key

Example:

user_friends:100 => ‘John X, Bob Y, Anne Z’

Need to invalidate when:

The user adds or removes friends

A friend removes him as a friend

A friend changes his name

Can you tolerate temporary inconsistencies?

Cache invalidation
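A Python sketch of delete-on-change invalidation for the `user_friends` example. Plain dictionaries stand in for the cache and the database, and the function names are hypothetical. The rename case shows why invalidation is the hard part: it needs the inverse relation (everyone who lists the renamed user as a friend).

```python
cache: dict[str, str] = {}      # stand-in for Memcached/Redis
names = {1: "John X", 25: "Bob Y", 37: "Anne Z"}
friends = {100: [1, 25, 37]}    # stand-in for the database


def get_friend_names(user_id: int) -> str:
    key = f"user_friends:{user_id}"
    if key not in cache:        # miss: rebuild from the database
        cache[key] = ", ".join(names[f] for f in friends[user_id])
    return cache[key]


def remove_friend(user_id: int, friend_id: int) -> None:
    friends[user_id].remove(friend_id)
    cache.pop(f"user_friends:{user_id}", None)  # just delete the key


def rename(friend_id: int, new_name: str) -> None:
    names[friend_id] = new_name
    # Needs the inverse relation: every user who lists friend_id as a friend.
    for user_id, ids in friends.items():
        if friend_id in ids:
            cache.pop(f"user_friends:{user_id}", None)


first = get_friend_names(100)
remove_friend(100, 25)
rename(37, "Anne W")
second = get_friend_names(100)
```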

What happens if you change the structure of the values?

Example:

(old) user_friends:100 => ‘John X, Bob Y, Anne Z’

(new) user_friends:100 => ‘1:John X, 25:Bob Y, 37:Anne Z’

New code breaks with old style keys

Old code breaks with new style keys

Solution: use versions:

(old) user_friends:100:1 => ‘John X, Bob Y, Anne Z’

(new) user_friends:100:2 => ‘1:John X, 25:Bob Y, 37:Anne Z’

Cache versioning
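A Python sketch of versioned keys, assuming the version is a constant compiled into each release. `CACHE_VERSION`, `friends_key`, and `get_friends` are hypothetical names, and the rebuilt value is hard-coded where a real system would query the database.

```python
CACHE_VERSION = 2  # bump whenever the stored value format changes

# Entry written by the old code, which used version 1 keys.
cache = {"user_friends:100:1": "John X, Bob Y, Anne Z"}


def friends_key(user_id: int, version: int = CACHE_VERSION) -> str:
    return f"user_friends:{user_id}:{version}"


def get_friends(user_id: int) -> dict[int, str]:
    key = friends_key(user_id)
    if key not in cache:
        # Miss on the new key: rebuild in the new format (hard-coded here).
        cache[key] = "1:John X, 25:Bob Y, 37:Anne Z"
    pairs = (item.split(":", 1) for item in cache[key].split(", "))
    return {int(fid): name for fid, name in pairs}


result = get_friends(100)
```

Old and new code can run side by side during a deploy because each reads and writes only its own version of the key.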

Objectives:

A/B testing

Quickly revert it if needed

Protect infrastructure

Ease of development

Some possibilities:

1. Development branch

2. Feature toggle

3. Percentage Rollout

4. Advanced Rollout

New branch for the feature, merge when finished

Can be fine in the early stages

No extra setup or complexity

Long living branch, may be hard to merge

Development Branch

Can be changed at run time (console or configuration)

Should distinguish prod from testing

Allows for intermediate commits

Code structure:

if (feature_enabled('homepage_redesign')) {
    new_homepage();
} else {
    old_homepage();
}

Feature Toggle
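The slides leave `feature_enabled` undefined. A minimal Python sketch of one possible implementation follows, assuming toggles are stored per environment in a configuration that can be edited at run time. The `TOGGLES` structure and the `env` parameter are hypothetical.

```python
# Hypothetical toggle configuration; in practice this would be loaded from
# a config store or admin console so it can change without a deploy.
TOGGLES = {
    "homepage_redesign": {"prod": False, "testing": True},
}


def feature_enabled(name: str, env: str = "prod") -> bool:
    # Unknown features and unknown environments default to off.
    return TOGGLES.get(name, {}).get(env, False)


def homepage(env: str) -> str:
    if feature_enabled("homepage_redesign", env):
        return "new homepage"
    return "old homepage"


in_prod = homepage("prod")
in_testing = homepage("testing")
```

Because unfinished code stays behind a toggle that is off in production, intermediate commits can be merged without exposing the feature to users.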

Dynamically control the percentage of users for a feature

When increasing the percentage, should include previous users

Code structure:

if (feature_enabled('homepage_redesign', $user_id)) {
    new_homepage();
} else {
    old_homepage();
}

Percentage Rollout
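One way to make sure that raising the percentage keeps previous users is to give each user a stable bucket from 0 to 99 and enable the feature for buckets below the percentage. The Python sketch below uses CRC32 for the bucket; the hash choice and the explicit `percentage` argument are assumptions.

```python
import zlib


def bucket(feature: str, user_id: int) -> int:
    """Stable bucket in [0, 100) for a (feature, user) pair."""
    return zlib.crc32(f"{feature}:{user_id}".encode()) % 100


def feature_enabled(feature: str, user_id: int, percentage: int) -> bool:
    # A user in bucket b is enabled once percentage > b, so raising the
    # percentage only ever adds users.
    return bucket(feature, user_id) < percentage


users = range(10_000)
at_10 = {u for u in users if feature_enabled("homepage_redesign", u, 10)}
at_50 = {u for u in users if feature_enabled("homepage_redesign", u, 50)}
```

Including the feature name in the hashed string means a given user does not land in the same bucket for every feature.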

Turn on/off features for a percentage of users that:

Are employees

Are in another rollout group

Use a certain language

Are in a certain country

Individually whitelist or blacklist people

Advanced Rollout
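A Python sketch of how some of these rules might combine. It models only the employee, country, percentage, whitelist, and blacklist criteria; the `User` and `Rollout` classes, their field names, and the rule order (blacklist first, then whitelist, then filters, then percentage) are all assumptions.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class User:
    id: int
    country: str = "NL"
    is_employee: bool = False


@dataclass
class Rollout:
    feature: str
    percentage: int = 0                          # applied to matching users
    employees_only: bool = False
    countries: set = field(default_factory=set)  # empty set means any country
    whitelist: set = field(default_factory=set)
    blacklist: set = field(default_factory=set)

    def enabled_for(self, user: User) -> bool:
        if user.id in self.blacklist:            # blacklist always wins
            return False
        if user.id in self.whitelist:
            return True
        if self.employees_only and not user.is_employee:
            return False
        if self.countries and user.country not in self.countries:
            return False
        bucket = zlib.crc32(f"{self.feature}:{user.id}".encode()) % 100
        return bucket < self.percentage


rollout = Rollout("homepage_redesign", percentage=100, employees_only=True,
                  whitelist={7}, blacklist={8})
```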

Some frameworks to check out:

Swivel

Opensoft/rollout

LaunchDarkly

Don’t forget to clean up the old code paths

Contact Information

amarcu@gmail.com

/alejandro.marcu

/alejandromarcu

@AlejandroMarcu

/in/alejandromarcu
