10 Pro Tips for scaling your
startup from 0-10M users
Stephan Hadinger
Sr Mgr, Solutions Architecture
@aws_actus
Consumer Business
• > 270 million active customer accounts
• 13 countries: Australia, Brazil, Canada, China, France, Germany, India, Italy, Japan, Mexico, Netherlands, Spain, United Kingdom
Seller Business
• > 2 million sellers on Amazon websites
• Use Amazon technology for your own retail website
• Leverage Amazon’s massive fulfillment center network
IT Infrastructure Business
• > 1 million active customers in over 190 countries
• Cloud computing infrastructure for hosting web-scale solutions
“AMAZON WEB SERVICES IS PROBABLY THE MOST IMPORTANT THING THAT HAS HAPPENED TO MOBILE AND WEB APP DEVELOPERS THAT THE PRESS JUST MISSES. …JEFF BEZOS HAS ACCIDENTALLY OR MAYBE ON PURPOSE POWERED A WHOLE GENERATION OF APPLICATIONS.”
- Steve Blank, The Four Steps to the Epiphany - The Startup Owner’s Manual
Iterative Product Development
[Chart: scale over time, starting from an MVP]
Started: Burbn, a location-based mobile app. Photo sharing was just one feature
Now: rewritten as a photo app. Sold to Facebook for $1bn
http://www.slideshare.net/gueste94e4c/dropbox-startup-lessons-learned-3836587
Pro Tip #1: Learn early, learn often
Pro Tip #2: Not Launching = Painful, Not Learning = Fatal
http://www.slideshare.net/gueste94e4c/dropbox-startup-lessons-learned-3836587
DEPLOYMENTS AT AMAZON.COM
• 11.6s: mean time between deployments (weekday)
• 1,079: max number of deployments in a single hour
• 10,000: mean number of hosts simultaneously receiving a deployment
• 30,000: max number of hosts simultaneously receiving a deployment
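To put the deployment figures in perspective, the mean gap of 11.6 seconds can be converted into an hourly rate and compared with the peak hour. This is only a back-of-the-envelope sketch: the 24-hour extrapolation is an assumption, since the slide gives a weekday mean and nothing more.

```python
# Figures taken from the slide.
MEAN_SECONDS_BETWEEN_DEPLOYMENTS = 11.6  # weekday mean
MAX_DEPLOYMENTS_IN_ONE_HOUR = 1079

# Average rate implied by the mean gap between deployments.
deployments_per_hour = 3600 / MEAN_SECONDS_BETWEEN_DEPLOYMENTS

# Assumption: the weekday mean holds across a full 24-hour day.
deployments_per_day = 24 * deployments_per_hour

# How much busier the peak hour is than an average hour.
peak_to_mean_ratio = MAX_DEPLOYMENTS_IN_ONE_HOUR / deployments_per_hour

print(f"{deployments_per_hour:.0f} deployments per hour on average")
print(f"{deployments_per_day:.0f} deployments per weekday (extrapolated)")
print(f"peak hour is {peak_to_mean_ratio:.1f}x the average hour")
```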
Pro Tip #3: “Keep the main thing the main thing”
http://www.slideshare.net/gueste94e4c/dropbox-startup-lessons-learned-3836587
“Fortunately, we spent almost all our effort on making an elegant, simple product that
‘just works’ and making users happy”
Drew Houston, Founder, Dropbox
1. Learn early, learn often
2. Not Launching = Painful, Not Learning = Fatal
3. Keep the main thing the main thing
http://aws.amazon.com/solutions/case-studies/airbnb/
Pro Tip #4: Anytime you can use an AWS service to solve a problem, do it so that you don’t have to solve that problem yourself
• AWS Global Infrastructure: Regions, Availability Zones, Edge Locations
• Foundation Services
– Compute: EC2, WorkSpaces
– Storage: S3, EBS, Glacier, Storage Gateway
– Networking: VPC, Direct Connect, ELB, Route 53
– Databases: RDS, ElastiCache, DynamoDB, Redshift
– Content Delivery: CloudFront
– Analytics: Data Pipeline, EMR, Kinesis
• Application Services: SES, SNS, SQS, Elastic Transcoder, CloudSearch, SWF, AppStream
• Deployment & Management
– Identity & Access: IAM, Federation
– Monitoring: CloudWatch
– Deployment & Management: Elastic Beanstalk, CloudFormation, OpsWorks, CloudTrail
• Interaction (API): Libraries, SDKs, Web Console, Command Line
• Human Interaction: Support
*Non-comprehensive list
Low-cost, fast development on AWS
[Chart: scale over time]
Scenario
• Small team with an initial idea for a mobile app
• 3 months to get to launch
• Unknown customer/problem/solution
• No cash…
• Amazon Route 53: Domain Name System (DNS) web service
• Elastic Load Balancing: dynamic traffic distribution
• Amazon EC2: elastic virtual servers in the cloud (Availability Zone A, Availability Zone B)
• Amazon RDS: managed relational database service (DBA)
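The idea behind spreading web servers over two Availability Zones behind a load balancer can be sketched in a few lines. This is a toy round-robin model, not how Elastic Load Balancing is implemented; the `Instance` and `RoundRobinBalancer` names and the `web-a1`/`web-b1` hosts are hypothetical.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import Iterator, List


@dataclass
class Instance:
    name: str
    zone: str
    healthy: bool = True


class RoundRobinBalancer:
    """Toy load balancer: rotates over instances, skipping unhealthy ones."""

    def __init__(self, instances: List[Instance]) -> None:
        if not instances:
            raise ValueError("need at least one instance")
        self._instances = instances
        self._ring: Iterator[Instance] = cycle(instances)

    def route(self) -> Instance:
        # Look at each instance at most once per request.
        for _ in range(len(self._instances)):
            candidate = next(self._ring)
            if candidate.healthy:
                return candidate
        raise RuntimeError("no healthy instances available")


# One hypothetical web server in each Availability Zone.
fleet = [Instance("web-a1", "zone-a"), Instance("web-b1", "zone-b")]
balancer = RoundRobinBalancer(fleet)
first_four = [balancer.route().name for _ in range(4)]

# Simulate losing zone A: requests keep flowing to zone B.
fleet[0].healthy = False
after_failure = [balancer.route().name for _ in range(3)]

print(first_four)
print(after_failure)
```

With both zones healthy the requests alternate between them; once one zone is marked unhealthy, the remaining zone absorbs all traffic, which is the property the two-zone layout is meant to provide.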
1. Learn early, learn often
2. Not Launching = Painful, Not Learning = Fatal
3. Keep the main thing the main thing
4. Use AWS services to solve problems that you don’t have to solve yourself
http://aws.amazon.com/solutions/case-studies/flipboard
Pro Tip #5: Focus less on MySQL administration, more on scaling out the rest of your services
Getting to MVP for $250
[Chart: scale over time, with spend labels $235, $15, $0]
Total Spend to MVP: $250
• 3 months dev/test/release
• Serving beta customers
• Ready for full production and scale
http://aws.amazon.com/solutions/case-studies/flipboard
Pro Tip #6: Focus on distributed services and fault-tolerant systems from Day 1
1. Learn early, learn often
2. Not Launching = Painful, Not Learning = Fatal
3. Keep the main thing the main thing
4. Use AWS services to solve problems that you don’t have to solve yourself
5. Focus less on MySQL administration, more on the rest of your services
6. Focus on distributed services and fault-tolerant systems from Day 1
One of the fastest growing sites in history. Cites AWS for making it possible to handle growth and scale
http://highscalability.com/blog/2012/5/21/pinterest-architecture-update-18-million-visitors-10x-growth.html
February 2013
48.7 million users globally
Raised $200M (Total = $338M)
$2.5B valuation
How do we keep costs down as we scale up?
Pro Tip #7: Use Auto-scaling
http://highscalability.com/blog/2012/12/12/pinterest-cut-costs-from-54-to-20-per-hour-by-automatically.html
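The principle behind auto-scaling can be shown with a simple proportional rule: resize the fleet so that average CPU lands near a target. This is a minimal sketch under stated assumptions, not the AWS Auto Scaling API; the function name, the 60% target and the fleet bounds are all hypothetical.

```python
import math


def desired_capacity(current: int, cpu_percent: float, target_percent: float,
                     minimum: int, maximum: int) -> int:
    """Proportional scaling rule: size the fleet so average CPU nears the target."""
    if current < 1 or target_percent <= 0:
        raise ValueError("current must be >= 1 and target_percent > 0")
    wanted = math.ceil(current * cpu_percent / target_percent)
    # Never go below the safety floor or above the budget ceiling.
    return max(minimum, min(maximum, wanted))


# Hypothetical numbers: 10 instances, 60% CPU target, fleet bounded to 4..40.
peak = desired_capacity(10, cpu_percent=90.0, target_percent=60.0,
                        minimum=4, maximum=40)
off_peak = desired_capacity(10, cpu_percent=30.0, target_percent=60.0,
                            minimum=4, maximum=40)

print(f"peak: scale out to {peak} instances")
print(f"off-peak: scale in to {off_peak} instances")
```

Scaling in during quiet hours is where the cost savings come from; the minimum bound keeps enough capacity running to absorb a sudden return of traffic.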
• EC2 Instance Hours: up 293%
• S3 Storage Volume: up 1700%
• 80 million objects stored in S3
• 410 terabytes of user data
• 70 master databases
• 150 EC2 instances in the web tier
• 90 instances for in-memory caching
• 35 instances used for internal purposes
• Elastic Load Balancing
• Elastic MapReduce (Hadoop)
Pro Tip #8: Use “Reserved Instances”
http://highscalability.com/blog/2012/12/12/pinterest-cut-costs-from-54-to-20-per-hour-by-automatically.html
AWS offers multiple purchasing models
• On-Demand: pay for compute capacity by the hour with no long-term commitments. For spiky workloads, or to define needs
• Reserved: make a low, one-time payment and receive a significant discount on the hourly charge. For committed utilization
• Spot: bid for unused capacity, charged at a Spot Price which fluctuates based on supply and demand. For time-insensitive or transient workloads
• Auto-scaling – automated shut down of 40% of instances off-peak
• Reserved Instances – to save on EC2 for base workload
Savings: 71%
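Whether a Reserved Instance pays off depends on how many hours the instance actually runs. The sketch below computes the break-even point and compares yearly costs for an always-on base workload. All prices are hypothetical, chosen only to keep the arithmetic easy to follow; they are not AWS list prices and do not reproduce the 71% figure above.

```python
def reserved_break_even_hours(upfront: float, reserved_hourly: float,
                              on_demand_hourly: float) -> float:
    """Hours of use after which a Reserved Instance beats On-Demand."""
    saving_per_hour = on_demand_hourly - reserved_hourly
    if saving_per_hour <= 0:
        raise ValueError("reserved hourly rate must be below the on-demand rate")
    return upfront / saving_per_hour


def yearly_cost(hours_used: float, hourly: float, upfront: float = 0.0) -> float:
    """Total cost for one instance: one-time payment plus hourly charges."""
    return upfront + hours_used * hourly


# Hypothetical prices (assumptions for illustration only).
ON_DEMAND_HOURLY = 0.10
RESERVED_HOURLY = 0.04
RESERVED_UPFRONT = 150.0
HOURS_PER_YEAR = 24 * 365

break_even = reserved_break_even_hours(RESERVED_UPFRONT, RESERVED_HOURLY,
                                       ON_DEMAND_HOURLY)
always_on_on_demand = yearly_cost(HOURS_PER_YEAR, ON_DEMAND_HOURLY)
always_on_reserved = yearly_cost(HOURS_PER_YEAR, RESERVED_HOURLY, RESERVED_UPFRONT)

print(f"break-even after about {break_even:.0f} hours")
print(f"always-on, On-Demand: ${always_on_on_demand:.2f} per year")
print(f"always-on, Reserved:  ${always_on_reserved:.2f} per year")
```

The split suggested by the slide follows from this: reserve capacity for the base workload that runs all year, and leave the auto-scaled peak capacity on On-Demand, where it is only paid for while it runs.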
1. Learn early, learn often
2. Not Launching = Painful, Not Learning = Fatal
3. Keep the main thing the main thing
4. Use AWS services to solve problems that you don’t have to solve yourself
5. Focus less on MySQL administration, more on the rest of your services
6. Focus on distributed services and fault-tolerant systems from Day 1
7. Use Auto-scaling
8. Use “Reserved Instances”
Pro Tip #9: Look into all possible ways to improve product and user experience
Hint: this involves lots of analytics behind the scenes
“THANKS TO AMAZON WEB SERVICES, WE CAN DELIGHT OUR PLAYERS WORLDWIDE.”
Sami Yliharju | Services Lead
Let’s add Big Data for analytics of web, mobile, gaming, and log data
• Amazon EMR (Elastic MapReduce): hosted Hadoop framework
• Amazon Kinesis: real-time processing of large, distributed data streams
• Amazon Redshift: petabyte-scale data warehouse service
http://aws.amazon.com/solutions/case-studies/flipboard
Pro Tip #10: Control your costs as your user base grows
1. Learn early, learn often
2. Not Launching = Painful, Not Learning = Fatal
3. Keep the main thing the main thing
4. Use AWS services to solve problems that you don’t have to solve yourself
5. Focus less on MySQL administration, more on the rest of your services
6. Focus on distributed services and fault-tolerant systems from Day 1
7. Use Auto-scaling
8. Use “Reserved Instances”
9. Look into all possible ways to improve product and user experience
10. Control your costs as your user base grows
Viadeo
France (9m members)
• Founded in 2005
• Market leader, with 9m members
• Very strong ties with local recruiters
• Strong corporate offering, with 21% of top 1,000 companies recruiting on Viadeo
China (25m members)
• Acquired in 2008
• #1 with 25m members
• Trusted source of professional data in China
• Bringing trust into the professional world
• Strong local management team
Russia
• JV 50/50 with local partner, 1m members
• Position in high quality professional profiles
French-speaking countries in Africa (3m members)
• Ideally positioned organically with strong momentum
• #1 with 3m members
Our current platform
• Infrastructure : 250 Linux servers hosted in San Francisco, CA
• Backends : MySQL, ElasticSearch, HBase, Spark, Hadoop
• Service-oriented platform : « Kasper », Java
– CQRS : Command-Query Responsibility Segregation
– ES : Event Sourcing
– DDD : Domain Driven Design
• Web application : « Limbo », Node.js
• Mobile applications : iOS and Android
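For readers unfamiliar with the CQRS and Event Sourcing patterns named in the platform list, here is a minimal sketch of how they fit together: commands append events to a log, and queries read a view rebuilt from that log. It is an illustration only and says nothing about how Kasper is implemented; every class and function name here is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class MemberRenamed:
    """An immutable fact: something that already happened."""
    member_id: str
    new_name: str


@dataclass
class EventStore:
    """Append-only log: the single source of truth (Event Sourcing)."""
    events: List[MemberRenamed] = field(default_factory=list)

    def append(self, event: MemberRenamed) -> None:
        self.events.append(event)


class CommandHandler:
    """Write side: validates commands and records events, never answers queries."""

    def __init__(self, store: EventStore) -> None:
        self._store = store

    def rename_member(self, member_id: str, new_name: str) -> None:
        if not new_name.strip():
            raise ValueError("name must not be empty")
        self._store.append(MemberRenamed(member_id, new_name.strip()))


def project_names(store: EventStore) -> Dict[str, str]:
    """Read side: rebuilds a query-friendly view by replaying every event."""
    view: Dict[str, str] = {}
    for event in store.events:
        view[event.member_id] = event.new_name
    return view


store = EventStore()
commands = CommandHandler(store)
commands.rename_member("m1", "Ada")
commands.rename_member("m2", "Grace")
commands.rename_member("m1", "Ada L.")

names = project_names(store)
print(names)
```

The log keeps all three events even though the view only shows the latest name per member, so the read model can be discarded and rebuilt at any time.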
AWS @ Viadeo : the story so far
• 2011 : file storage (Amazon S3)
• 2012 : Viadeo office flooded! Internal servers destroyed. All services rebuilt in AWS in 48h (VPC, EC2)
• 2013 : data processing (Hadoop, Amazon EMR)
• 2014 : more data !
– New analytics infrastructure : Snowplow → S3 → Redshift (≈ 20 million events / day for starters)
– Content personalization : EMR, Spark
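To get a feel for the load behind "20 million events / day", the daily volume can be converted into a per-second rate. This is a rough sizing sketch: the peak factor of 3 is a hypothetical assumption, not a figure from Viadeo.

```python
EVENTS_PER_DAY = 20_000_000  # approximate figure from the slide
SECONDS_PER_DAY = 24 * 60 * 60

# Average rate if events were spread evenly over the day.
average_events_per_second = EVENTS_PER_DAY / SECONDS_PER_DAY

# Hypothetical assumption: peak traffic is 3x the daily average.
ASSUMED_PEAK_FACTOR = 3
peak_events_per_second = average_events_per_second * ASSUMED_PEAK_FACTOR

print(f"average: {average_events_per_second:.0f} events/s")
print(f"assumed peak: {peak_events_per_second:.0f} events/s")
```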
AWS @ Viadeo : the story so far
• Infrastructure live in 3 regions (us-east-1, us-west-2, eu-west-1)
• 4 VPCs
• ≈ 100 EC2 instances (half production, half development)
• ≈ 15 TB in Amazon S3
• 2 Amazon Redshift clusters (5 instances each)
• AWS services currently in production at Viadeo :
– Core infra : VPC, Route 53, IAM, S3, CloudFront
– Instances : Elastic Load Balancing, Elastic Beanstalk, CloudFormation, EC2
– Data storage & processing : RDS, EMR, Redshift
Enter physical infrastructure challenges…
• Improving our agility
– How can we deploy infrastructure as often as we deploy code (i.e. every day)?
– How can we experiment quickly and at (almost) zero cost?
• Optimizing our spend
– How can we avoid CAPEX peaks caused by hardware refresh?
– How do we best adapt spend to traffic and business conditions?
• Implementing a stronger Disaster Recovery plan
– …without building a 2nd DC which we don’t need anyway
• Scaling storage & CPU for data processing
– Do we really need big servers, lots of rack space and lots of power?
– How do we efficiently handle unpredictable workloads?
Why we decided to move everything to AWS
• We want to focus on our real job : building a great service
– No more issues with hardware vendors, no more licensing hell. Ever!
– Experimenting, (re)deploying & scaling are just a few clicks away
– Great ecosystem of SaaS partners running in AWS
• We’ve been using it for a while now and it just works
– AWS CloudFormation is infrastructure as code
– « Time to server » is in minutes, not days
• We’re ready and eager for it : Agile and DevOps are strong Viadeo values
• We want to measure ROI and optimize cost
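The "infrastructure as code" point can be made concrete: a CloudFormation template is plain structured data, so it can be generated, reviewed and versioned like any other source file. The sketch below builds a minimal template as a Python dictionary and serializes it to JSON. The logical name `WebServer`, the default instance type and the AMI ID are placeholders for illustration, not values from Viadeo's setup.

```python
import json

# Minimal CloudFormation template built as plain data.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "Illustrative single web server (hypothetical example)",
    "Parameters": {
        "InstanceType": {"Type": "String", "Default": "t2.micro"},
    },
    "Resources": {
        "WebServer": {  # hypothetical logical name
            "Type": "AWS::EC2::Instance",
            "Properties": {
                "ImageId": "ami-00000000",  # placeholder, not a real AMI
                "InstanceType": {"Ref": "InstanceType"},
            },
        },
    },
}

# Stable key order keeps diffs small when the template is under version control.
template_body = json.dumps(template, indent=2, sort_keys=True)
print(template_body)
```

Because the template is text, a change to the infrastructure becomes a reviewable commit, which is what makes a "time to server" of minutes repeatable.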
Our AWS manifesto
• Key objectives : automation, scalability, safety
• Continuous integration & delivery on all layers (infrastructure, instances, applications)
• Infrastructure : deep integration with AWS services for maximum leverage
• Applications : use our current stack for now, unless the benefit/cost ratio of adopting an equivalent AWS service is too good to pass up (e.g. Redshift)
• Move everything, but no big bang : the move will be gradual → plan & build with parallel run in mind (DC + AWS)
Infrastructure as code: done!
• GitHub
• CircleCI
• Jenkins
• Amazon S3
• AWS AMIs (packer.io)
• AWS CloudFormation
• Puppet codebase, CloudFormation templates
New challenges ;)
• Some performance & cost trade-off in the short term
• Technical debt (aka « guess what’s under the rug? »)
– Layer 7 load balancing rules (*evil*), legacy filers, etc.
– Servers running really old versions of whatever: keep or reinstall?
– Any cruft ignored over the years. Time to clean up!
• Parallel run requires good connectivity → AWS Direct Connect
• CloudFront performance not great across regions → need a 2nd CDN
• Negotiating early termination of the legacy hosting contract !
Conclusion
• 5 years ago : «Cloud computing? Why?»
• Now : «NO cloud computing? Why?»
• Some good reasons, but mostly really bad ones
• Cloud computing a fashion? I don’t think so.
• Cloud = infrastructure in digital form (like photos, music, movies, money, etc.)
• Using AWS helps Viadeo change every day, by speeding up innovation and delivery
• See you there!