Testing in Production at Scale

Amit Gud | SREcon19 Americas | March 25, 2019

Meet Alice!

Alice, Software Developer

[Diagram: services A, B, C, D, E and F, labeled as upstream and downstream services]

[Diagram: a test runner, services A to F, and copies of services A', D', E' and F']

Key Takeaway

Testing in Production can be a viable solution.

Agenda

01 The Scale
02 Why Test in Production?
03 Tenancy Oriented Architecture
04 Tenancy Building Blocks
05 Extensions to Tenancy Architecture

The Scale

● 600 Cities
● 64 Countries
● 75m Active Riders
● 3m Active Drivers
● 15m Trips Per Day
● 10b Cumulative Trips
● 1000s of Microservices
● 1000s of Commits per day

Why Test in Production?

Lower operational cost than maintaining a parallel stack.

One knob to control capacity. No synchronization required.

More accurate end-to-end capacity planning.

Delta test traffic runs on the production stack. Test traffic takes the same code path as production traffic.

Bonus: the Testing in Production framework enables other use cases.

Use cases like Canary, Shadowing and A/B Testing become extensions of the Testing in Production framework.

Tenancy Oriented Architecture

[Diagram: test traffic and production traffic enter via the Edge Gateway; requests carry a tenancy context (ctx) to the Msg Q, DB and Cache (keyspace: ctx) and to Log/Metrics (tag: ctx)]

● Isolation between test & production

● Tenancy-based access control
  ○ Test requests cannot create or mutate production artifacts

● Minimal deviation between test and production environments
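
The access-control rule above can be sketched as a single check. The sketch assumes that every request and every persistent artifact carries a tenancy label. `Tenancy`, `Artifact`, `TenancyViolation` and `check_write` are hypothetical names for illustration, not APIs from the talk.

```python
from dataclasses import dataclass
from enum import Enum


class Tenancy(Enum):
    # Tenancy values named on the slides; others could be added.
    PRODUCTION = "production"
    TESTING = "testing"


@dataclass(frozen=True)
class Artifact:
    # Hypothetical persistent artifact tagged with the tenancy that created it.
    key: str
    tenancy: Tenancy


class TenancyViolation(Exception):
    """Raised when a test request targets a non-test artifact."""


def check_write(request_tenancy: Tenancy, target: Artifact) -> None:
    """Allow the write unless a test request would create/mutate a prod artifact."""
    if request_tenancy is Tenancy.TESTING and target.tenancy is not Tenancy.TESTING:
        raise TenancyViolation(
            f"{request_tenancy.value} request cannot mutate "
            f"{target.tenancy.value} artifact {target.key!r}"
        )
```

The sketch enforces only the test-to-production direction, because that is the rule the slide states. Whether production requests may touch test artifacts is left as a separate policy choice.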

Design Considerations

● Infra components needing tenancy support

● Explosion of the support matrix
  ○ Number of transports/encodings
  ○ Number of languages

● Gradual transition from the current architecture to a tenancy-aware architecture

● Tenancy-based service discovery & routing

● Onboarding overhead: impact on developer productivity

Tenancy Building Blocks

1. Context & Context Propagation

2. Tenancy Aware Infrastructure

3. Tenancy Aware Environments

4. Tenancy Aware Routing

1. Context & Context Propagation

● Tenancy context for both in-flight data (requests) and at-rest data (persistent artifacts)

● Tenancy can be ‘testing’, ‘production’, etc.
  ○ Aligns with the tenancy of the actors involved in the request

● Request tenancy is propagated independently of the transport/protocol

● Persistent artifact tenancy implementation depends on the specific data component
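
Here is a minimal sketch of transport-agnostic request tenancy propagation. It assumes tenancy travels as a string key/value entry in whatever metadata the transport offers, such as HTTP headers, RPC metadata or message headers. The key `x-tenancy`, the production default and the helper names are assumptions. Python's `contextvars` keeps the value scoped to the request being handled.

```python
import contextvars
from typing import Mapping, MutableMapping

# Hypothetical metadata key; the talk does not name one.
TENANCY_KEY = "x-tenancy"
# Assumption: requests arriving without a tenancy are treated as production.
DEFAULT_TENANCY = "production"

# Holds the tenancy of the request currently being handled.
_tenancy = contextvars.ContextVar("tenancy", default=DEFAULT_TENANCY)


def extract(metadata: Mapping[str, str]) -> contextvars.Token:
    """Inbound hop: read tenancy from request metadata into the current context."""
    return _tenancy.set(metadata.get(TENANCY_KEY, DEFAULT_TENANCY))


def inject(metadata: MutableMapping[str, str]) -> MutableMapping[str, str]:
    """Outbound hop: copy the current tenancy onto the downstream call's metadata."""
    metadata[TENANCY_KEY] = _tenancy.get()
    return metadata


def current_tenancy() -> str:
    return _tenancy.get()


def reset(token: contextvars.Token) -> None:
    """Restore the previous tenancy once the request finishes."""
    _tenancy.reset(token)
```

`extract` and `inject` only see a mapping. The same two functions could therefore be wired into an HTTP middleware, an RPC interceptor or a queue consumer, which is one way to keep propagation independent of the transport.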

2. Tenancy Aware Infrastructure

● Types of infrastructure components
  ○ Storage datastores, e.g. Cassandra
  ○ Message queues, e.g. Kafka
  ○ External caching, e.g. Redis
  ○ Search, e.g. Elasticsearch
  ○ Observability: logging, metrics

● Two ways of making infrastructure aware of tenancy
  ○ Client library (language specific)
  ○ Gateway integration
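
The client-library approach can be sketched as a thin wrapper around a datastore client. The wrapper namespaces keys by tenancy (the "keyspace: ctx" idea) and tags metrics with tenancy ("tag: ctx"). `TenancyAwareCache` is hypothetical, and a plain dict stands in for an external cache such as Redis.

```python
from collections import Counter
from typing import Dict, Optional


class TenancyAwareCache:
    """Hypothetical client-library wrapper that partitions keys and metrics by tenancy.

    A plain dict stands in for an external cache such as Redis.
    """

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}
        self.metrics: Counter = Counter()

    @staticmethod
    def _key(tenancy: str, key: str) -> str:
        # Prefix every key with its tenancy so test and production data never collide.
        return f"{tenancy}:{key}"

    def set(self, tenancy: str, key: str, value: str) -> None:
        self._store[self._key(tenancy, key)] = value
        # Tag the metric with tenancy so test traffic can be filtered out of dashboards.
        self.metrics[("cache.set", tenancy)] += 1

    def get(self, tenancy: str, key: str) -> Optional[str]:
        self.metrics[("cache.get", tenancy)] += 1
        return self._store.get(self._key(tenancy, key))
```

With gateway integration, the same prefixing and tagging would live in a proxy in front of the datastore instead. That way it is implemented once rather than once per language, which eases the support-matrix concern listed under Design Considerations.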

3. Environments - Mixed Tenancy Mode (Goal State)

● Every service instance is able to handle both test and prod traffic.

● “Native tenancy” support for all the infra components.

[Diagram: Test (pre-prod/dev) environment with a test runner; Edge Gateway; Production (multi-tenant) environment with services in mixed tenancy mode]

3. Environments - Test Tenancy Mode (Intermediate State)

● Supports tenancy adoption in advance of infra support.

● Explicitly separates the infra components by placing them in a separate environment.

● Uses tenancy-based request routing to send test traffic to the Test Tenancy environment.

[Diagram: Test Tenancy environment (prod build) alongside Production environment (prod build); legend: service instance in Production environment, service instance in Test Tenancy environment, downstream service]

4. Tenancy Aware Routing

● Out-of-process sidecar implementation.

● Agnostic to service language and transport used.

● Config-based routing policies and instant kill-switch.

[Diagram: routing layer (Deputy) in front of Test Tenancy Instances and Production and Mixed Tenancy Instances (Ap, At, Bp, Bt, Cm, Dm); legend: mixed, production and test tenancy instances; test and production tenancy requests]
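
A routing sidecar such as Deputy makes a per-request decision, and that decision can be sketched as a pure function over config. This is not Deputy's implementation. `RoutePolicy` and the pool names are assumptions. The kill-switch semantics (rejecting test traffic outright) are also assumed, as is the choice to send test traffic for services without a Test Tenancy environment to mixed tenancy instances.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical names for the two instance groups on the slide.
PRODUCTION_POOL = "production-and-mixed"
TEST_POOL = "test-tenancy"


class KillSwitchEngaged(Exception):
    """Raised when config has switched test traffic off for a service."""


@dataclass(frozen=True)
class RoutePolicy:
    # Per-service routing config; field names are illustrative.
    has_test_environment: bool = False
    kill_switch: bool = False


def route(policies: Dict[str, RoutePolicy], service: str, tenancy: str) -> str:
    """Pick the instance pool for one request to `service`."""
    policy = policies.get(service, RoutePolicy())
    if tenancy != "testing":
        # Production traffic always stays on production/mixed instances.
        return PRODUCTION_POOL
    if policy.kill_switch:
        # Assumed kill-switch semantics: stop test traffic immediately.
        raise KillSwitchEngaged(f"test traffic to {service!r} is switched off")
    # Test Tenancy Mode if the service has a test environment,
    # otherwise assume it runs in Mixed Tenancy Mode.
    return TEST_POOL if policy.has_test_environment else PRODUCTION_POOL
```

Keeping the policy in config rather than in code is what makes an instant kill-switch plausible. Flipping `kill_switch` changes routing on the next request without a deploy.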

Extensions to Tenancy Architecture

● Rate Limiting
  ○ Tenancy-based QoS policies.
  ○ Safeguard production from other traffic.

● Shadow traffic
  ○ Route traffic for A/B testing, where A is experimental code and B is production.
  ○ Ability to route only a portion of the traffic without affecting production.

● Canary Deployments, Blue/Green Deployments
  ○ Gradually bring deployments up or down.

● Record & Replay
  ○ Duplicate part or all of the traffic to record requests for a particular scenario or user.
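
Tenancy-based QoS can be sketched as one token bucket per tenancy. Test traffic then draws from its own budget and cannot starve production. The class names and the `(rate, capacity)` limit format are assumptions for illustration. So is the rule that tenancies without a configured limit pass through unthrottled. An injectable clock keeps the behavior deterministic.

```python
import time
from typing import Callable, Dict, Tuple


class TokenBucket:
    """Classic token bucket: refills `rate` tokens/second, bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float, now: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.updated = now

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class TenancyRateLimiter:
    """Hypothetical QoS policy: one bucket per tenancy."""

    def __init__(self, limits: Dict[str, Tuple[float, float]],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        now = clock()
        self._buckets = {t: TokenBucket(rate, cap, now) for t, (rate, cap) in limits.items()}

    def allow(self, tenancy: str) -> bool:
        bucket = self._buckets.get(tenancy)
        if bucket is None:
            return True  # assumption: tenancies without a policy are not limited
        return bucket.allow(self._clock())
```

For example, `TenancyRateLimiter({"testing": (100.0, 200.0)})` caps test traffic at 100 requests per second with bursts of up to 200. This limiter never throttles production requests.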

#TiP-is-not-as-scary-as-it-sounds!

Building a framework for Testing in Production is a long-term investment and can be a viable solution.

Thanks

