(NET409) How Twilio Migrated Its Services from EC2-Classic to EC2-VPC

Post on 15-Apr-2017


© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

@Sumbry

Director of Cloud Services

Twilio.com

October, 2015

NET409

Movin' On Up to the VPC: How Twilio Migrated its Infrastructure from EC2-Classic to EC2-VPC

Purpose of this talk

- Learn about Twilio

- Review legacy infrastructure

- Why EC2-VPC?

- How we built the Twilio Cloud

- How we migrated

- Internal tools developed

- Lessons learned

What Is a Twilio?

- A global communications company

- A real-time communications API

- Used by over 500,000 developers

- Requires low-latency resilient infrastructure

- Has lots of infrastructure on EC2-Classic

Who are Twilio customers?

Legacy Twilio

What did Twilio look like yesterday?

- Twilio has used AWS since 2008

- Three products

- All infrastructure located in us-east-1

- Hundreds of instances

- 10/8 shared private network

- Non-consecutive EIPs

Before global

What is going global?

- Launched outside US

- Global provisioning

- Route traffic between regions

- Low-latency communications

- Global service discovery

The network after global

Problems with going global

- Overlapping 10/8 networks

- Proxies are not ideal: they are point-to-point

- Routing around failovers

- Need low-latency connectivity
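
The overlapping 10/8 problem can be illustrated with a minimal sketch using Python's standard `ipaddress` module. The region names other than us-east-1 and the exact ranges are assumptions made for illustration, not Twilio's actual layout.

```python
import ipaddress
from itertools import combinations

# Hypothetical layout: every region draws private addresses from the same 10/8 block.
region_networks = {
    "us-east-1": ipaddress.ip_network("10.0.0.0/8"),
    "eu-west-1": ipaddress.ip_network("10.0.0.0/8"),
    "ap-southeast-1": ipaddress.ip_network("10.0.0.0/8"),
}

def overlapping_pairs(networks):
    """Return sorted region-name pairs whose address ranges overlap."""
    return sorted(
        (name_a, name_b)
        for (name_a, net_a), (name_b, net_b) in combinations(networks.items(), 2)
        if net_a.overlaps(net_b)
    )

conflicts = overlapping_pairs(region_networks)
for name_a, name_b in conflicts:
    print(f"{name_a} and {name_b} cannot be routed to each other directly")
```

Any pair reported here shares addresses, so a packet's destination alone does not say which region it belongs to.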

Why EC2-VPC?

What is EC2-VPC?

EC2-VPC is the next major revision of the EC2 platform:

- Software Defined Network

- Elastic Network Interfaces

- HVM and SR-IOV

What is a software defined network?

- Define your own network

- VPC and subnet routing tables

- Network Access Control Lists

- Provision networks like virtual machines

- Protects data-in-transit
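
As a rough illustration of how a route table decides where traffic goes, here is a minimal sketch of longest-prefix matching, which is how VPC route tables choose among matching routes. The CIDR blocks and target names are hypothetical.

```python
import ipaddress

# Hypothetical route table: destination CIDR -> target name.
route_table = {
    "10.20.0.0/16": "local",
    "10.30.0.0/16": "tunnel-to-eu",
    "0.0.0.0/0": "internet-gateway",
}

def lookup(table, destination):
    """Return the target of the most specific route containing destination."""
    addr = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in table.items()
        if addr in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None
    # The route with the longest prefix is the most specific match.
    return max(matches, key=lambda match: match[0].prefixlen)[1]

before = lookup(route_table, "10.30.4.7")

# Rerouting around a failure amounts to repointing a single entry.
route_table["10.30.0.0/16"] = "tunnel-to-eu-backup"
after = lookup(route_table, "10.30.4.7")
print(before, "->", after)
```

In a real VPC the repointing step would be an API call against the route table rather than a dictionary update.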

What are elastic network interfaces?

- Public EIPs and private IPs

- Multiple private IPs per interface

- Multiple ENIs per instance

- Security groups follow an ENI

- ENI has a MAC address

What are HVM instances?

- Hardware Virtualized Machine instances

- PCI Express speeds to network adapter

- Low-latency access to network adapter

- Up to 10 Gbps network speeds

Why move to EC2-VPC?

- SDN solves overlapping 10/8 networks

- Route tables eliminate proxies

- Routing around failovers is an API call

- HVM solves the low-latency connectivity problem

The Twilio Cloud

What is the Twilio Cloud?

- Iteration 2.0 of our infrastructure

- Addresses many EC2-Classic limitations

- Connectivity between data centers

- Automatic failover and redundancy

- Provider agnostic

What does the Twilio Cloud look like?

What about routing?

We built it, did they come?

We solved all previous issues but no one used it:

- Twilio Cloud was isolated from EC2-Classic

- Existing services had no migration path

Data center migration

Why is a migration like moving data centers?

- Separate infrastructure from EC2-Classic

- Need to migrate all your compute

- Zero downtime

The networks

What problems do we need to solve?

- Move an instance from Classic to VPC

- Network connectivity

- Instance discoverability

- No service interruptions

Classic deploy

VPC deploy

Kill Classic

Steps to migrate a service

Wait - you just invented a bunch of stuff …

- Bridge EC2-Classic and VPC?

- Global Service Discovery?

- Multiple Service Deployments?

- WTF!

Migration tools

What are the tools for migrating to EC2-VPC?

We modified existing internal tools:

- IP Tunnel Manager / ClassicLink

- Global Service Discovery

- HAProxy Distributed Load-Balancing

- Config-Renderer

What is IP Tunnel Manager / ClassicLink?

ClassicLink allows you to link your EC2-Classic instance to a VPC in the same account and the same region.

It provides network connectivity between EC2-Classic and EC2-VPC instances.

What is Global Service Discovery?

GSD stores IP addresses for any service in the cluster and serves them on demand.
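
The talk does not show how GSD is implemented. As a minimal sketch of the idea only, a toy in-memory registry might look like the following; the class name, method names, and addresses are all hypothetical.

```python
from collections import defaultdict

class ServiceRegistry:
    """Toy in-memory registry mapping a service name to its IP addresses."""

    def __init__(self):
        self._hosts = defaultdict(set)

    def register(self, service, ip):
        """Record that an instance of service is reachable at ip."""
        self._hosts[service].add(ip)

    def deregister(self, service, ip):
        """Forget an instance; unknown addresses are ignored."""
        self._hosts[service].discard(ip)

    def lookup(self, service):
        """Return the known addresses for service in a stable order."""
        return sorted(self._hosts.get(service, ()))

registry = ServiceRegistry()
registry.register("api", "10.0.1.5")    # hypothetical EC2-Classic instance
registry.register("api", "10.20.1.5")   # hypothetical EC2-VPC instance
during_migration = registry.lookup("api")

registry.deregister("api", "10.0.1.5")  # the Classic instance is retired
after_migration = registry.lookup("api")
print(during_migration, after_migration)
```

During a migration both the Classic and VPC instances are listed, so callers can reach either until the Classic one is removed.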

What is distributed load balancing?

Every instance in the cluster runs its own instance of HAProxy. It load balances requests to any downstream services.
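
To make the idea concrete, here is a minimal sketch of a per-instance balancer that rotates through downstream backends. Round-robin is one of the algorithms HAProxy offers; which one Twilio configured is not stated, and the class and addresses below are hypothetical stand-ins for the local proxy.

```python
import itertools

class LocalBalancer:
    """Round-robin over downstream backends, standing in for a local proxy."""

    def __init__(self, backends):
        backends = list(backends)
        if not backends:
            raise ValueError("at least one backend is required")
        self._cycle = itertools.cycle(backends)

    def pick(self):
        """Return the backend that should receive the next request."""
        return next(self._cycle)

# One Classic and one VPC backend, as they might appear mid-migration.
balancer = LocalBalancer(["10.0.1.5:8080", "10.20.1.5:8080"])
picks = [balancer.pick() for _ in range(4)]
print(picks)
```

Because each instance carries its own balancer, there is no central load balancer to move during a migration.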

What is Config-Renderer?

Config-Renderer renders configuration files filled with data from Global Service Discovery, such as HAProxy configs!
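
Config-Renderer itself is an internal tool and its interface is not shown. A minimal sketch of the rendering step, assuming discovery data arrives as a mapping from service name to IP addresses, could look like this; the function name, service name, port, and addresses are hypothetical, while the output uses standard HAProxy `backend`, `balance`, and `server` directives.

```python
def render_backend(service, addresses, port):
    """Render an HAProxy-style backend section for one service."""
    lines = [f"backend {service}", "    balance roundrobin"]
    for index, ip in enumerate(addresses, start=1):
        # "check" asks HAProxy to health-check the server.
        lines.append(f"    server {service}-{index} {ip}:{port} check")
    return "\n".join(lines) + "\n"

# Hypothetical discovery data: one Classic and one VPC address for "api".
discovered = {"api": ["10.0.1.5", "10.20.1.5"]}
config = render_backend("api", discovered["api"], 8080)
print(config)
```

Re-rendering whenever discovery data changes is what lets traffic shift to VPC instances without editing configs by hand.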

What about deploying services?

Our internal provisioning tool, BoxConfig, lets us deploy services with the click of a button.

How does it all work?

Unix philosophy

We use lots of small tools and combine them:

- Twilio Cloud to route

- ClassicLink to bridge

- HAProxy for distributed load-balancing

- Global Service Discovery for IP info

- Config-Renderer to write HAProxy configs

- BoxConfig to deploy

In conclusion

Where are you today?

- The Twilio Cloud is live today

- Routes traffic through nine virtual data centers

- Over 100 IPsec mesh links

- Automatic region failover thanks to EIGRP

- 35% of Twilio infrastructure is in EC2-VPC

- We can complete the migration in 2015

What are some lessons learned?

- Properly subnet your VPC. You have one shot.

- No need to do a giant migration all at once.

- Tools need to work both ways in case you screw up.

- Less complexity always wins.
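
On the subnetting lesson, a plan can be checked on paper before any VPC is created. The sketch below uses Python's standard `ipaddress` module to carve one block into equal subnets; the VPC range, zone labels, and prefix length are assumptions for illustration.

```python
import ipaddress
from itertools import combinations

def plan_subnets(vpc_cidr, zones, new_prefix):
    """Assign each zone the next equally sized subnet carved from the VPC block."""
    vpc = ipaddress.ip_network(vpc_cidr)
    blocks = vpc.subnets(new_prefix=new_prefix)
    # zip stops at the shorter input, so unused blocks stay free for later growth.
    return dict(zip(zones, blocks))

# Hypothetical plan: a /16 VPC split into /20 subnets, one per zone.
plan = plan_subnets("10.20.0.0/16", ["zone-a", "zone-b", "zone-c"], 20)
for zone, subnet in plan.items():
    print(zone, subnet, subnet.num_addresses)

# Sanity check: no two planned subnets share addresses.
no_overlap = not any(a.overlaps(b) for a, b in combinations(plan.values(), 2))
print(no_overlap)
```

Leaving unassigned blocks inside the VPC range keeps room for subnets that are not needed yet.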

Thank you!

Remember to complete your evaluations!