Improving Efficiency of Twitter Infrastructure using Chargeback

Date post: 23-Jan-2018
Upload: vinu-charanya
Improving efficiency of Twitter Infrastructure using Chargeback @vinucharanya @micheal
Transcript
Page 1: Improving Efficiency of Twitter Infrastructure using Chargeback

Improving efficiency of Twitter Infrastructure using Chargeback

@vinucharanya @micheal

Page 2: Improving Efficiency of Twitter Infrastructure using Chargeback

AGENDA

• Brief History
• Problem
• Chargeback
• Engineering Challenges
• The product
• Impact
• Future

Page 3: Improving Efficiency of Twitter Infrastructure using Chargeback

© Getty Images from http://www.fifa.com/worldcup/news/y=2010/m=7/news=pride-for-africa-spain-strike-gold-2247372.html

2010

Page 6: Improving Efficiency of Twitter Infrastructure using Chargeback

© Getty Images from http://www.fifa.com/worldcup/news/y=2010/m=7/news=pride-for-africa-spain-strike-gold-2247372.html

5X increase on avg. TPS

3283 Tweets Per Sec (TPS)

Page 7: Improving Efficiency of Twitter Infrastructure using Chargeback

©The Simpsons

Page 8: Improving Efficiency of Twitter Infrastructure using Chargeback
Page 9: Improving Efficiency of Twitter Infrastructure using Chargeback

MONOLITH SERVICES

Page 10: Improving Efficiency of Twitter Infrastructure using Chargeback

(Micro) Services Oriented Model

• FENCING & OWNERSHIP: Clear isolation of services and their ownership
• RELIABILITY: Failure isolation and graceful degradation
• SCALABILITY & EFFICIENCY: Scale independently, ensuring efficient use of infrastructure
• DEVELOPER PRODUCTIVITY: Make it simple for engineers to build and launch services quickly and easily

Page 11: Improving Efficiency of Twitter Infrastructure using Chargeback

2013

Page 14: Improving Efficiency of Twitter Infrastructure using Chargeback

August 2 at 7:21:50 PDT

28X increase on avg. TPS

143,199 Tweets Per Sec (TPS)

Page 15: Improving Efficiency of Twitter Infrastructure using Chargeback

Hundreds and thousands of #events at any given instant

Page 16: Improving Efficiency of Twitter Infrastructure using Chargeback

Most Retweeted Tweet in History

Page 17: Improving Efficiency of Twitter Infrastructure using Chargeback

RELIABILITY, DEVELOPER AGILITY, SCALABILITY, EFFICIENCY

Page 18: Improving Efficiency of Twitter Infrastructure using Chargeback

“Do More with Less”

Page 19: Improving Efficiency of Twitter Infrastructure using Chargeback

Fast forward to 2016

Page 20: Improving Efficiency of Twitter Infrastructure using Chargeback

• CORE APPLICATION SERVICES: TWEETS, USERS, SOCIAL GRAPH
• PLATFORM SERVICES: SEARCH, MESSAGING & QUEUES, CACHE, MONITORING AND ALERTING, REVERSE PROXY
• FRAMEWORK/LIBRARIES: FINAGLE (RPC), SCALDING (Map Reduce in Scala), HERON (Streaming Compute), JVM
• MANAGEMENT TOOLS: SELF SERVE, SERVICE DIRECTORY, CHARGEBACK, CONFIG, DEPLOY (Workflows)
• DATA & ANALYTICS PLATFORM: INTERACTIVE QUERY, DATA DISCOVERY, WORKFLOW MANAGEMENT
• INFRASTRUCTURE SERVICES
  • STORAGE: MANHATTAN (Key-Val Store), HDFS (File System), BLOBSTORE, GRAPH STORE
  • COMPUTE: AURORA (Scheduler), HADOOP (Map-Reduce), MESOS (Cluster Manager)
• INFRASTRUCTURE AND DATACENTER MANAGEMENT

Page 25: Improving Efficiency of Twitter Infrastructure using Chargeback

THOUSANDS OF SERVICES

HUNDREDS OF TEAMS

Page 28: Improving Efficiency of Twitter Infrastructure using Chargeback

What is the overall use of infrastructure & platform resources across Twitter’s services?

How to attribute resource consumption to teams/organization?

How do you incentivize the right behavior to improve efficiency of resource usage?

Page 29: Improving Efficiency of Twitter Infrastructure using Chargeback

Ability to meter allocation and utilization of resources per service, per engineering team and charge them accordingly

CHARGEBACK
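As a rough illustration of the definition above, the sketch below meters resource usage per service, rolls it up per team, and prices it. The record shape, the resource names, and the unit prices are hypothetical placeholders, not values or schemas from the talk.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    """One metered data point (hypothetical shape, not Twitter's schema)."""
    team: str
    service: str
    resource: str    # e.g. "core-days"
    quantity: float  # amount consumed in the billing period


def bill_by_team(records, unit_prices):
    """Charge each team the sum of quantity * unit price over its records."""
    bills = defaultdict(float)
    for record in records:
        bills[record.team] += record.quantity * unit_prices[record.resource]
    return dict(bills)


# Made-up usage and prices, for illustration only.
records = [
    UsageRecord("ads-serving", "adshard", "core-days", 100.0),
    UsageRecord("ads-serving", "adshard", "storage-gb-days", 50.0),
    UsageRecord("ads-prediction", "prediction", "core-days", 40.0),
]
unit_prices = {"core-days": 2.0, "storage-gb-days": 0.5}

team_bills = bill_by_team(records, unit_prices)
print(team_bills)
```

Keeping the price table separate from the usage records means a price change does not require re-metering, which is one reason to treat the two as distinct inputs.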

Page 34: Improving Efficiency of Twitter Infrastructure using Chargeback

UNIFIED CLOUD PLATFORM

• SERVICE IDENTITY
• RESOURCE CATALOG
• METERING AND CHARGEBACK
• SERVICE METADATA

COMPUTE, STORAGE, PLATFORM AND OTHER SERVICES

SERVICES: Tweet Service, Ads Shard, Who To Follow

RESOURCE: unit of abstraction

MULTI-TENANCY: tenant management using canonical identifiers

Page 35: Improving Efficiency of Twitter Infrastructure using Chargeback

SERVICE IDENTITY

Page 36: Improving Efficiency of Twitter Infrastructure using Chargeback

A canonical way of identifying a service that consumes resources on various platform infrastructure.

Page 37: Improving Efficiency of Twitter Infrastructure using Chargeback

PROBLEM

• Disparate identifiers across infrastructure and platform services
• Multiple provisioning workflows (Self-Serve, Tickets)
• Disparate Ownership trackers (Email, LDAP)
• Lack of support for public cloud Identity and Access Management systems (IAM)

Project: chargeback; Team: Cloud Infra Mgmt; Source code: /cim

• COMPUTE: role: cim-service; job_name: ui; env: prod; id: <role>.<env>.<job_name>
• STORAGE: app_id: cost_reporting; id: <app_id>
• BATCH COMPUTE: role: cim-service; pool: etl_pipe_prod; job_name: compute_cost; id: <role>.<pool>.<job_name>

Page 38: Improving Efficiency of Twitter Infrastructure using Chargeback

OUR APPROACH

• Designed an Entity Model that
  • Defines a canonical identifier scheme across infrastructure and platform services
  • Defines an ownership structure tied to the org
• Single pane of glass for every developer to manage their project IDs (including abstracting out public cloud IAM systems)
• Provider APIs for infrastructure services to provision and manage identity

Components: DASHBOARD; IDENTITY MANAGER (PROVISION, CONSUMPTION); API; INFRASTRUCTURE SERVICES

Page 39: Improving Efficiency of Twitter Infrastructure using Chargeback

Source of truth for the identifier-to-org-structure mapping, improving service ownership within the org

Enables service to service authentication/authorization

IMPACT

Page 40: Improving Efficiency of Twitter Infrastructure using Chargeback

ENTITY MODEL FOR SERVICE IDENTITY

Model that provides a canonical identifier across infrastructure and platform services and ties it to an org structure.

BUSINESS OWNER → TEAM → PROJECT → SERVICE/SYSTEM ACCOUNT → <INFRA, CLIENTID> (each relationship is 1:N)

Page 41: Improving Efficiency of Twitter Infrastructure using Chargeback

ENTITY MODEL: EXAMPLE

BUSINESS OWNER → TEAM → PROJECT → SERVICE/SYSTEM ACCOUNT → <INFRA, CLIENTID> (each relationship is 1:N)

EXAMPLE of services running (on Aurora/Mesos)

BUSINESS OWNER: REVENUE
• TEAM: ADS SERVING → PROJECT: adshard → SERVICE/SYSTEM ACCOUNT: adshard → <Aurora, adshard.prod.adshard>
• TEAM: ADS PREDICTION → PROJECT: prediction → SERVICE/SYSTEM ACCOUNT: ads-prediction → <Aurora, ads-prediction.prod.campaign-x>
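One way to picture the entity model above is as a registry that resolves an <INFRA, CLIENTID> pair back to its owning service account, project, team, and business owner. The sketch below is a minimal illustration using the example names from the slide; the class names and methods are hypothetical, and it assumes both ads teams roll up to REVENUE, as the slide layout suggests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceAccount:
    """Flattened view of the ownership chain (hypothetical structure)."""
    business_owner: str
    team: str
    project: str
    name: str


class IdentityRegistry:
    """Maps <infra, client_id> pairs to the owning service account."""

    def __init__(self):
        self._by_client = {}

    def register(self, infra, client_id, account):
        key = (infra.lower(), client_id)
        if key in self._by_client:
            raise ValueError(f"{infra}/{client_id} is already registered")
        self._by_client[key] = account

    def owner_of(self, infra, client_id):
        """Raises KeyError if the identifier was never registered."""
        return self._by_client[(infra.lower(), client_id)]


registry = IdentityRegistry()
registry.register(
    "Aurora", "adshard.prod.adshard",
    ServiceAccount("REVENUE", "ADS SERVING", "adshard", "adshard"),
)
registry.register(
    "Aurora", "ads-prediction.prod.campaign-x",
    ServiceAccount("REVENUE", "ADS PREDICTION", "prediction", "ads-prediction"),
)

owner = registry.owner_of("aurora", "ads-prediction.prod.campaign-x")
print(owner.team, owner.business_owner)
```

Rejecting duplicate registrations keeps each identifier tied to exactly one owner, which is what makes the registry usable as a source of truth for attribution.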

Page 42: Improving Efficiency of Twitter Infrastructure using Chargeback

RESOURCE CATALOG

Page 43: Improving Efficiency of Twitter Infrastructure using Chargeback

A consistent way of identifying and inventorying the resources of various platform infrastructure.

Page 44: Improving Efficiency of Twitter Infrastructure using Chargeback

PROBLEM

• Lack of clarity on what is available & how many resources are consumed
• Need to capture resource fluidity across infrastructure and platform services
• Better support to model abstract resources (e.g., QPS, Tweets per Second)
• Need to define TCO (Total Cost of Ownership) of a resource per unit time

Resources metered per platform:

• COMPUTE: CPU, MEMORY, DISK
• STORAGE: STORAGE IN GB, WPS, RPS
• BATCH COMPUTE: CPU, FILES ACCESSED, STORAGE IN GB

Page 45: Improving Efficiency of Twitter Infrastructure using Chargeback

CORES MEMORY DISK

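# Aurora task definition. stage_application and run_application are Process
# objects defined elsewhere in the same .aurora configuration.
application = Task(
    name = 'application',
    resources = Resources(cpu = 1.0, ram = 512 * MB, disk = 1024 * MB),
    processes = [stage_application, run_application],
    constraints = order(stage_application, run_application))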

Page 46: Improving Efficiency of Twitter Infrastructure using Chargeback

CORES MEMORY DISK GPU NETWORK

Need for Fluidity!

Page 47: Improving Efficiency of Twitter Infrastructure using Chargeback

• Defining unit price for a resource
  • Framework to price resources
  • Ensure Total Cost of Ownership, e.g. license cost, chargeback cost from other services, human cost, etc.
  • Support for time granularity, e.g. machines/VMs used per day, cores used per day

Page 48: Improving Efficiency of Twitter Infrastructure using Chargeback

Twitter Compute Platform: Total Cost of Ownership ($X per core-day)

• Used Cores
• Non-Prod Used Cores
• Operational Overhead
• Headroom
• Disaster Recovery & Event Spikes
• Excess Quota and Reservation
• Underutilized Quota Allocation
• Container Size Buffer (Underutilized Reservation)
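One plausible reading of the cost components above is that overhead, headroom, and unused reservations are folded into the price of the cores that are actually billed. The sketch below shows that arithmetic under this assumption; the function name and all figures are hypothetical, and the talk does not give the exact formula.

```python
def unit_price_per_core_day(total_cost, billed_core_days):
    """Spread the full platform cost over the core-days that get billed.

    Assumption for illustration: total_cost already includes overhead,
    headroom, and other non-billable capacity.
    """
    if billed_core_days <= 0:
        raise ValueError("billed_core_days must be positive")
    return total_cost / billed_core_days


# Hypothetical monthly figures, not from the talk.
total_cost = 3000.0              # full cost of running the platform
provisioned_core_days = 1000.0   # everything that was racked and powered
billed_core_days = 600.0         # cores actually attributed to customers

price = unit_price_per_core_day(total_cost, billed_core_days)
naive_price = total_cost / provisioned_core_days

print(price, naive_price)
```

With these made-up numbers the fully loaded price is higher than the naive per-provisioned-core price, which is the point: the cost of idle capacity shows up in the bill instead of disappearing.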

Page 49: Improving Efficiency of Twitter Infrastructure using Chargeback

ENTITY MODEL FOR RESOURCE CATALOG

Model that supports Resource Fluidity and captures and manages the unit price of a resource over time.

PROVIDER → INFRASTRUCTURE SERVICE → OFFERINGS → OFFER MEASURES → OFFER MEASURE COST (each relationship is 1:N)

Page 50: Improving Efficiency of Twitter Infrastructure using Chargeback

ENTITY MODEL: EXAMPLE

PROVIDER → INFRASTRUCTURE SERVICE → OFFERINGS → OFFER MEASURES → OFFER MEASURE COST (each relationship is 1:N)

EXAMPLE of Resource Catalog

• TWITTER DC/PUBLIC CLOUD → AURORA → COMPUTE → CORE-DAYS → $X
• TWITTER DC → HADOOP → STORAGE → GB-RAM, FILE ACCESSES, … → $X, $Y, …
• TWITTER DC → HADOOP → PROCESSING CLUSTER → GB-RAM, FILE ACCESSES, … → $M, $N, …
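The catalog hierarchy above, with a unit price that can change over time, could be modelled roughly as follows. This is a sketch under stated assumptions: the class, its methods, the dates, and the prices are all hypothetical, and only the provider, service, offering, and measure names come from the slide.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class OfferMeasure:
    """A billable measure with a history of unit prices (hypothetical)."""
    name: str
    costs: list = field(default_factory=list)  # (effective_from, unit_price)

    def set_cost(self, effective_from, unit_price):
        self.costs.append((effective_from, unit_price))
        self.costs.sort(key=lambda entry: entry[0])

    def cost_on(self, day):
        """Return the latest unit price effective on or before `day`."""
        current = None
        for effective_from, unit_price in self.costs:
            if effective_from <= day:
                current = unit_price
        if current is None:
            raise LookupError(f"no price for {self.name} on {day}")
        return current


# provider -> infrastructure service -> offering -> measure
catalog = {
    "TWITTER DC": {
        "AURORA": {"COMPUTE": {"CORE-DAYS": OfferMeasure("CORE-DAYS")}},
    },
}

measure = catalog["TWITTER DC"]["AURORA"]["COMPUTE"]["CORE-DAYS"]
measure.set_cost(date(2015, 6, 1), 2.0)  # made-up prices
measure.set_cost(date(2016, 1, 1), 1.5)

price_2015 = measure.cost_on(date(2015, 9, 1))
price_2016 = measure.cost_on(date(2016, 3, 1))
print(price_2015, price_2016)
```

Storing prices with an effective date, instead of overwriting a single value, lets historical usage be billed at the price that applied when it was consumed.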

Page 51: Improving Efficiency of Twitter Infrastructure using Chargeback

METERING PIPELINE

Page 52: Improving Efficiency of Twitter Infrastructure using Chargeback

HIGH LEVEL ARCHITECTURE

Page 53: Improving Efficiency of Twitter Infrastructure using Chargeback

The Product

Page 54: Improving Efficiency of Twitter Infrastructure using Chargeback

TEAM/ORG BILL

Page 55: Improving Efficiency of Twitter Infrastructure using Chargeback

INFRASTRUCTURE PNL

Page 56: Improving Efficiency of Twitter Infrastructure using Chargeback

ORG/TEAM BUDGET

Page 57: Improving Efficiency of Twitter Infrastructure using Chargeback

CUSTOM REPORTS

• Infrastructure & Platform Owners
  • Overall Cluster Growth
  • Allocation vs. Utilization of resources by Customer Team
• Service Owners
  • Allocation vs. Utilization of resources across each Infrastructure & Platform
• Finance
  • Budget Management (Budget vs. Spend)
• Execs
  • Efficiency
  • Trends
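Two of the comparisons listed above, allocation versus utilization and budget versus spend, reduce to simple ratios and differences. The sketch below shows one way to compute them; the function names, team names, and numbers are hypothetical and only illustrate the kind of figure such a report might contain.

```python
def utilization_ratio(allocated, utilized):
    """Fraction of the allocated resource that was actually used."""
    if allocated <= 0:
        raise ValueError("allocated must be positive")
    return utilized / allocated


def budget_status(budget, spend):
    """Remaining budget and whether spend has exceeded it."""
    return {"remaining": budget - spend, "over_budget": spend > budget}


# Hypothetical per-team figures: (allocated cores, utilized cores).
usage = {"team-a": (200.0, 150.0), "team-b": (100.0, 40.0)}

utilization_report = {
    team: utilization_ratio(allocated, utilized)
    for team, (allocated, utilized) in usage.items()
}
finance_view = budget_status(budget=1000.0, spend=1200.0)

print(utilization_report)
print(finance_view)
```

A low ratio flags a team that reserves far more than it uses, which is the behaviour the chargeback bill is meant to make visible.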

Page 58: Improving Efficiency of Twitter Infrastructure using Chargeback

What has been the Impact?

Page 59: Improving Efficiency of Twitter Infrastructure using Chargeback

[Chart: Twitter Compute Platform (Aurora/Mesos), Allocated Quota vs. Utilized Cores, 3 months (Jun 1, 2015 to Sept 1, 2015)]

Page 60: Improving Efficiency of Twitter Infrastructure using Chargeback

[Chart: Twitter Compute Platform (Aurora/Mesos), Allocated Quota vs. Utilized Cores, 4 months (Sept 1, 2015 to Jan 1, 2016)]

Page 61: Improving Efficiency of Twitter Infrastructure using Chargeback

33% more core usage against reservation compared to May 2015

Page 62: Improving Efficiency of Twitter Infrastructure using Chargeback

IMPACT

• Ensures unit prices are computed true to cost
• Input for capacity planning and budgeting
• Visibility into organizational spend, enabling accountability
• Improved utilization of infrastructure service resources
• Enables comparison with Public Cloud Offerings
• Improved Service Ownership

Page 63: Improving Efficiency of Twitter Infrastructure using Chargeback

Kite - Unified Cloud Platform

A cloud-agnostic service lifecycle manager

Page 64: Improving Efficiency of Twitter Infrastructure using Chargeback

DASHBOARD (SINGLE PANE OF GLASS)

REPORTING

SERVICE IDENTITY MANAGER

RESOURCE PROVISIONING MANAGER

SERVICE LIFECYCLE WORKFLOWS

METADATA RESOURCE QUOTA MANAGEMENT DEPLOY METERING & CHARGEBACK IDENTITY

PROVIDER APIS & ADAPTERS

INFRASTRUCTURE SERVICE, INFRASTRUCTURE SERVICE, INFRASTRUCTURE SERVICE, INFRASTRUCTURE & PLATFORM SERVICE

Page 65: Improving Efficiency of Twitter Infrastructure using Chargeback

@vinucharanya

@dpkagrawal

@pragashjj

@fvrojas

@micheal

@igb

@imjessicayuen

@_jordanly

@xcv58

Page 66: Improving Efficiency of Twitter Infrastructure using Chargeback
