
AB Testing at Expedia

Date post: 15-Apr-2017
Upload: paul-lucas
AB Testing: Revolution through constant evolution
Transcript
Page 1: AB Testing at Expedia

AB Testing

Revolution through constant evolution

Page 2: AB Testing at Expedia
Page 3: AB Testing at Expedia

Expedia SF114 Sansome

www.expedia.com
@expediaeng

Work with us: [email protected]


Page 4: AB Testing at Expedia

Paul Lucas, Sr Director, Technology. Want to visit next? Greece

Jeff Madynski, Director, Technology. Want to visit next? Croatia

Anuj Gupta, Sr Software Dev Engineer. Want to visit next? Peru

Page 5: AB Testing at Expedia

Revolution through constant evolution

Page 6: AB Testing at Expedia
Page 7: AB Testing at Expedia

Technology Evolution

V0 – batch processing from Abacus exposure logs, Omniture, and the booking datamart; Tableau visualization

V1 - Storm, Kestrel, DynamoDB / PostgreSQL reading UIS messages and client log data (Nov 2014 - Dec 2015)

V2 - Introduce Kafka and Cassandra (May 2016)

Page 8: AB Testing at Expedia

TNL – original solution
• Batch processing
• Tableau visualization
• Merged data from OMS/Omniture
• Problems:
  – 1-2 day feedback loop: mistakes in the test implementation (e.g. bucketing not what was anticipated) took days to surface
  – Fixing data import errors meant starting the batch over again

Page 9: AB Testing at Expedia

TNL Dashboard v0

[Diagram: Omniture click data, the booking datamart, and Abacus exposures feed a Hadoop ETL, which loads Tableau]

Page 10: AB Testing at Expedia

TNL v0 -> v1

Page 11: AB Testing at Expedia


Page 12: AB Testing at Expedia
Page 13: AB Testing at Expedia

TNL v1 Problems
• Database size 420 GB; queries took 3-5 minutes
• Data drops (Kestrel)
• Increase in data (multi-brand, more customers)

Page 14: AB Testing at Expedia

TNL v1 -> v1.1, v2
• Fighting fires, borrowing more time
• POC next

Page 15: AB Testing at Expedia

Fighting fires – borrowing more time

Page 16: AB Testing at Expedia

User Interaction Service (UIS) Traffic

Page 17: AB Testing at Expedia

Scaling messaging system

Kafka
• Publish-subscribe based messaging system
• Distributed and reliable
• Longer retention and persistence
• Monitoring dashboard and alerts
• Buffer for system downtime

Kestrel limitations
• Message durability is not available
• Reaching potential scalability issues
• Inactive open-source project
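The publish-subscribe pattern that motivated the move to Kafka can be sketched in miniature with an in-memory broker (a deliberate toy: real Kafka adds partitioning, replication, and disk persistence, but the retention-and-replay behavior below is the property that Kestrel lacked):

```python
from collections import defaultdict

class MiniBroker:
    """Toy publish-subscribe broker: retains messages per topic so
    consumers can replay from any offset (Kafka-style retention),
    unlike a pure queue where a dropped message is gone."""
    def __init__(self):
        self.topics = defaultdict(list)  # topic -> retained message log

    def publish(self, topic, message):
        self.topics[topic].append(message)
        return len(self.topics[topic]) - 1  # offset assigned to the message

    def consume(self, topic, offset=0):
        # Consumers track their own offset and can re-read after downtime.
        return self.topics[topic][offset:]

broker = MiniBroker()
broker.publish("uis-events", {"user": "a", "event": "search"})
broker.publish("uis-events", {"user": "b", "event": "book"})
# A consumer that was down can still replay everything from offset 0.
print(broker.consume("uis-events", offset=0))
```

The topic name and message shapes are illustrative; the point is that retained, offset-addressed logs act as a buffer for downstream system downtime.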

Page 18: AB Testing at Expedia

Scaling database performance

• Database views for caching
  – Views created every 6 hours
  – UI only loads data from views
  – Read-only replicas for select queries
• Archive data
  – Moved old and completed experiment data to separate tables
  – DB cleanup using vacuum and re-indexing
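The archiving step can be sketched with SQLite (table and column names here are hypothetical; the system described above used PostgreSQL, whose VACUUM plays the same space-reclaiming role):

```python
import sqlite3

# In-memory DB for illustration; the real system was PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE experiments (id INTEGER PRIMARY KEY, name TEXT, status TEXT);
    CREATE TABLE experiments_archive (id INTEGER PRIMARY KEY, name TEXT, status TEXT);
    INSERT INTO experiments VALUES (1, 'header-color', 'completed'),
                                   (2, 'new-checkout', 'running');
""")

# Move completed experiments out of the hot table...
conn.execute("""INSERT INTO experiments_archive
                SELECT * FROM experiments WHERE status = 'completed'""")
conn.execute("DELETE FROM experiments WHERE status = 'completed'")
conn.commit()

# ...then reclaim space, analogous to PostgreSQL's VACUUM.
conn.execute("VACUUM")

active = conn.execute("SELECT name FROM experiments").fetchall()
print(active)  # only the running experiment remains in the hot table
```

Keeping the hot table small is what makes the 6-hourly cached views cheap to rebuild.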

Page 19: AB Testing at Expedia

TNL Dashboard v2

Page 20: AB Testing at Expedia

Product Demo

Page 21: AB Testing at Expedia

Streaming

Page 22: AB Testing at Expedia

• Column-oriented, time-series schema
• Time-to-live (TTL) on data
• Only store the most popular aggregates
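Cassandra's per-write TTL can be mimicked in a few lines to show the idea (a toy sketch with an injectable clock; real Cassandra expires columns server-side during reads and compaction):

```python
import time

class TTLStore:
    """Toy key-value store with per-entry time-to-live, mimicking
    Cassandra's TTL: expired aggregates simply disappear on read."""
    def __init__(self, clock=time.time):
        self.clock = clock
        self.data = {}  # key -> (value, expiry timestamp)

    def put(self, key, value, ttl_seconds):
        self.data[key] = (value, self.clock() + ttl_seconds)

    def get(self, key):
        entry = self.data.get(key)
        if entry is None:
            return None
        value, expiry = entry
        if self.clock() >= expiry:
            del self.data[key]  # lazily expire, as a compaction would
            return None
        return value

# Injectable clock makes expiry observable without sleeping.
now = [0.0]
store = TTLStore(clock=lambda: now[0])
store.put("exp42:bookings:daily", 1337, ttl_seconds=3600)
now[0] = 3601.0
print(store.get("exp42:bookings:daily"))  # None: the aggregate expired
```

The key format is invented for the example; the design point is that TTL plus storing only popular aggregates bounds the dataset without explicit cleanup jobs.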

Page 23: AB Testing at Expedia

v1 vs v2
• New architecture
  – More scalable
  – More responsive
  – Less prone to data loss
• Lessons learnt
  – System is as fast as the slowest component
  – Fault-tolerance and resilience
  – Partition data
  – Pre-production environment

Page 24: AB Testing at Expedia

Questions/discussion

Page 25: AB Testing at Expedia

APPENDIX

Page 26: AB Testing at Expedia

Apply statistical power to test results

Using a 90% confidence level, 1 out of 10 tests will be a false positive or false negative

            Heads  Tails
Right hand    51     49
Left hand     49     51

"Right hand is superior at getting heads!"
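A quick two-proportion z-test (normal approximation, standard library only) confirms the coin-hand "result" above is pure noise:

```python
import math

def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """Two-proportion z-test using the pooled normal approximation.
    Returns (z statistic, two-sided p-value)."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return z, p_value

# Right hand: 51/100 heads; left hand: 49/100 heads.
z, p = two_proportion_z(51, 100, 49, 100)
print(round(z, 3), round(p, 3))  # tiny z, large p: no real difference
```

With a p-value far above any reasonable threshold, declaring the right hand "superior" is exactly the kind of false positive the slide warns about.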

Page 27: AB Testing at Expedia

Do’s and Don’ts when concluding tests

Don’t call tests too early; this increases false positives or negatives

Don’t call a test as soon as you see positive results, because test results frequently go up and down

To claim a test a Winner/Loser, the positive/negative effect has to hold for at least 5 consecutive days and the trend must be stable
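The 5-consecutive-day rule can be expressed as a small check (a sketch: the day count and sign convention come from the slide, the function name and input format are illustrative):

```python
def stable_for(daily_effects, days=5):
    """True if the last `days` daily effect readings all share the same
    sign (all positive or all negative), i.e. the trend has held."""
    if len(daily_effects) < days:
        return False
    window = daily_effects[-days:]
    return all(e > 0 for e in window) or all(e < 0 for e in window)

# Observed daily % lift; only the last 5 days are consistently positive.
daily_lift = [0.8, -0.2, 0.5, 0.6, 0.4, 0.7, 0.9]
print(stable_for(daily_lift))  # True
```

A day at exactly zero breaks the streak here, which matches the spirit of requiring a stable, directional trend.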

Please note: this type of chart is not currently available in the Test and Learn dashboard or the SiteSpect UI; the shape of the confidence-interval lines varies test by test

Define one success metric and run tests for a pre-determined duration (for hotel/flight tests in the US, the suggestion is to run until the confidence interval of the conversion change is within +/- 1%); tests should run at least 10 days
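The "+/- 1%" stopping rule amounts to checking the half-width of the confidence interval on the conversion change (a sketch using the normal approximation on the absolute difference in conversion rates; whether the slide means absolute or relative change is an assumption, and all counts below are made up):

```python
import math

Z_90 = 1.645  # z value for a two-sided 90% confidence interval

def ci_half_width(conv_a, n_a, conv_b, n_b, z=Z_90):
    """Half-width of the CI on the difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return z * se

def can_stop(conv_a, n_a, conv_b, n_b, target=0.01):
    """Stop once the CI on the conversion change is within +/- target."""
    return ci_half_width(conv_a, n_a, conv_b, n_b) <= target

print(can_stop(104, 2000, 96, 2000))     # False: CI still wider than +/- 1%
print(can_stop(520, 10000, 480, 10000))  # True: enough traffic observed
```

Note that hitting the width target is a duration check, not a significance test; the 10-day minimum still applies regardless.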

Don’t assume the midpoint (the observed % change during the test period) will hold true after the feature is rolled out: a 4.0% +/- 4.0% test may have zero impact and may not be much better than a 1.0% +/- 1.0% test

Don’t call an inconclusive test “trending positive” or “trending negative”, as test results fluctuate

Contact the ARM testing team for questions: [email protected]

Using a 90% confidence level:
• Winner: lower bound of % change >= 0 (or probability of the test being positive >= 95%)
• Loser: upper bound of % change <= 0 (or probability of the test being negative >= 95%)
• Else: Inconclusive or Neutral
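The Winner/Loser rule is mechanical enough to state as code (a sketch; the function and argument names are assumptions about how a dashboard might expose the 90% CI bounds):

```python
def classify(ci_lower, ci_upper):
    """Classify a test from the 90% CI on % change:
    Winner if the whole interval is non-negative,
    Loser if it is non-positive, otherwise Inconclusive."""
    if ci_lower >= 0:
        return "Winner"
    if ci_upper <= 0:
        return "Loser"
    return "Inconclusive"

print(classify(0.5, 3.2))   # Winner: even the lower bound is positive
print(classify(-2.0, 2.0))  # Inconclusive: the CI straddles zero
```

Note that a 4.0% +/- 4.0% result classifies as a Winner under this rule while still being consistent with near-zero impact, which is exactly the midpoint caveat above.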

