SomeSQL at Skyscanner - Scaling in a changing world of databases and hardware

Post on 27-Nov-2014

description

Slides from Alistair Hann's presentation at the 2014 All Your Base Conference in Oxford, UK.

Synopsis: Skyscanner performs 200 million searches every month and generated $7bn of downstream revenue last year. Scaling from zero brought many challenges, against a backdrop of a continually growing variety of databases and hardware. Alistair will talk about how that rapid change has shaped Skyscanner’s data architecture, and how they moved from using only SQL Server to a range of hardware and data technologies (Postgres, Couchbase, ElasticSearch, Hadoop). (Unfortunately, the animations were destroyed by the upload.)

transcript

SomeSQL: Scaling in a changing world of databases and hardware

Alistair Hann, CTO, Skyscanner

Buzzwords

Web 2.0

Year of Mobile

Big Data

NoSQL

Scaling the live pricing cache

Website Native Apps APIs and White Labels

Traditional Airlines Budget Airlines Online Travel Agencies

Prices +Timetables

Data Collection Services

1) Which websites should we show?

2) What prices do we already have cached?

3) Live update what we still need.

4) Clean up and save the new data

5) Return the prices to the user.
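The five steps above can be read as a cache-then-live-update loop. The following is a minimal Python sketch of that reading, not Skyscanner's implementation: the class name `LivePricingService`, the `fetchers` mapping, the 15-minute freshness window, and the in-memory dictionary standing in for the price cache are all assumptions made for illustration.

```python
import time
from dataclasses import dataclass


@dataclass
class Quote:
    website: str
    price: float
    fetched_at: float


class LivePricingService:
    """Toy cache-then-live-update flow; every name here is hypothetical."""

    def __init__(self, fetchers, max_age_seconds=900.0, clock=time.time):
        self.fetchers = fetchers        # website name -> callable(route) -> price
        self.max_age = max_age_seconds  # assumed freshness window, not from the slides
        self.clock = clock
        self.cache = {}                 # (route, website) -> Quote

    def search(self, route):
        now = self.clock()
        # 1) Which websites should we show? (here: every configured website)
        websites = sorted(self.fetchers)
        # 2) What prices do we already have cached?
        results = {}
        for site in websites:
            quote = self.cache.get((route, site))
            if quote is not None and now - quote.fetched_at <= self.max_age:
                results[site] = quote
        # 3) Live update what we still need.
        for site in websites:
            if site in results:
                continue
            raw_price = self.fetchers[site](route)
            # 4) Clean up and save the new data.
            quote = Quote(site, round(float(raw_price), 2), now)
            self.cache[(route, site)] = quote
            results[site] = quote
        # 5) Return the prices to the user, cheapest first.
        return sorted(results.values(), key=lambda q: q.price)
```

A second `search` for the same route inside the freshness window is served entirely from the cache, which is the behaviour that makes the cache size and write rate the scaling concern.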

Live Pricing Service

Cached Prices (key/value)

2 bn itineraries and quotes

270 GB table

250 GB indices

2000 quotes per second
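As a rough back-of-the-envelope reading of these figures, the sketch below derives the total footprint and the average bytes per cached record. It assumes decimal gigabytes and that the 2 bn records map one-to-one onto table rows; neither assumption is stated on the slide.

```python
GB = 10**9  # assumption: decimal gigabytes

table_bytes = 270 * GB
index_bytes = 250 * GB
records = 2 * 10**9  # "2 bn itineraries and quotes"

total_gb = (table_bytes + index_bytes) / GB                 # table plus indices
table_bytes_per_record = table_bytes / records              # row data only
total_bytes_per_record = (table_bytes + index_bytes) / records
index_to_table_ratio = index_bytes / table_bytes            # index size relative to data

print(f"total: {total_gb:.0f} GB")
print(f"per record: {table_bytes_per_record:.0f} B data, "
      f"{total_bytes_per_record:.0f} B including indices")
print(f"indices are {index_to_table_ratio:.0%} of the table size")
```

Under those assumptions the indices are nearly as large as the data itself.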

What we really needed

Consistency

Horizontal Scaling

Elasticity

Persistence

Speed

Resilience

Simplicity

Live Pricing Service

Cached Prices (key/value)

Beyond key value

Couchbase – Map Reduce Views

{
  "website": {
    "published": true,
    "id": "affd",
    ...
  },
  "office_id": "1",
  "city_id": "AUHA",
  "raw_data": [...],
  "address": ...,
  "closing_time": "00:00",
  "routenodeid": "9618",
  "type": "office"
}
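To illustrate what a map/reduce view adds beyond key/value lookups, here is a small Python emulation of the idea over documents shaped like the one above. Real Couchbase views are defined as JavaScript map functions that call `emit`; this sketch only mimics the concept in plain Python. The sample documents, the `EDIN` city id, and the helper names `map_published_offices` and `build_view` are invented for illustration.

```python
from collections import defaultdict

# Invented sample documents using a few fields from the slide's example.
docs = {
    "office::1": {"type": "office", "city_id": "AUHA", "website": {"published": True}},
    "office::2": {"type": "office", "city_id": "AUHA", "website": {"published": False}},
    "office::3": {"type": "office", "city_id": "EDIN", "website": {"published": True}},
    "route::9": {"type": "route"},
}


def map_published_offices(doc_id, doc):
    """Map step: emit (city_id, 1) for every published office document."""
    if doc.get("type") == "office" and doc.get("website", {}).get("published"):
        yield doc["city_id"], 1


def build_view(documents, map_fn, reduce_fn):
    """Run the map step over all documents, group by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc_id, doc in documents.items():
        for key, value in map_fn(doc_id, doc):
            grouped[key].append(value)
    return {key: reduce_fn(values) for key, values in sorted(grouped.items())}


# Count published offices per city.
view = build_view(docs, map_published_offices, sum)
print(view)
```

The point of the pattern is that the store maintains this secondary index over document contents, so queries are no longer limited to fetching by document key.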

What about the hardware?

Disk for VMs

cf. 250,000 IOPS Fusion-io

Standard $0.03 / GB

Glacier $0.01 / GB
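A quick worked comparison of the two per-GB rates, as a sketch only: the 10,000 GB archive size is a made-up figure, and the billing period behind the rates is not stated on the slide, so the result is simply "cost per billing period".

```python
from decimal import Decimal

# Rates taken from the slide; the billing period is not stated there.
STANDARD_PER_GB = Decimal("0.03")
GLACIER_PER_GB = Decimal("0.01")


def storage_cost(gigabytes, rate_per_gb):
    """Cost of holding `gigabytes` at a flat per-GB rate."""
    return Decimal(gigabytes) * rate_per_gb


archive_gb = 10_000  # hypothetical archive size, for illustration only

standard_cost = storage_cost(archive_gb, STANDARD_PER_GB)
glacier_cost = storage_cost(archive_gb, GLACIER_PER_GB)
saving = standard_cost - glacier_cost

print(f"standard: ${standard_cost}, glacier: ${glacier_cost}, saving: ${saving}")
```

`Decimal` is used so that the currency arithmetic is exact rather than subject to binary floating-point rounding.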

[Diagram: Quote Bus data flow. Labels: Quote Bus (UK1, UK2), Thrift, Loader, staging, filter, long-term archive, GZIP, LZO, queryable hierarchical, queryable flat, Hadoop cluster or Elastic MapReduce, load, analysts query, analytical tools, feed, export]
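The diagram labels distinguish a hierarchical form of the quote data from a flat, queryable form, with compressed copies for archiving. The sketch below shows one way that distinction could look in Python. It is a stand-in, not the pipeline from the talk: JSON replaces Thrift, the standard-library `gzip` module is used for both forms (LZO is not in the standard library), and the record fields and the function names `flatten`, `archive`, and `restore` are hypothetical.

```python
import gzip
import json


def flatten(itinerary):
    """Turn one hierarchical itinerary into one flat row per quote."""
    base = {key: value for key, value in itinerary.items() if key != "quotes"}
    return [{**base, **quote} for quote in itinerary["quotes"]]


def archive(records):
    """Serialise records as newline-delimited JSON and GZIP the result."""
    payload = "\n".join(json.dumps(record, sort_keys=True) for record in records)
    return gzip.compress(payload.encode("utf-8"))


def restore(blob):
    """Inverse of archive(): decompress and parse one record per line."""
    text = gzip.decompress(blob).decode("utf-8")
    return [json.loads(line) for line in text.splitlines()]


# Hypothetical hierarchical record: one itinerary holding several quotes.
hierarchical = [
    {
        "route": "EDI-LHR",
        "date": "2014-12-01",
        "quotes": [
            {"website": "a", "price": 120.0},
            {"website": "b", "price": 99.0},
        ],
    }
]

# Flat form: one self-contained row per quote, easier to scan and filter.
flat = [row for itinerary in hierarchical for row in flatten(itinerary)]
```

The hierarchical form keeps each itinerary together for archiving, while the flat form repeats the itinerary fields on every row so that a query tool can filter rows without unpacking nested structures.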

The death of the data warehouse

[Diagram: event pipeline. Labels: Fluentd, Graphite, Kafka, Elastic MapReduce, raw JSON events, raw events, stitched events, operational metrics, errors, reporting]

Trigger and view materialization

Indexes on the data

A distributed database…

Some things don’t change

We still face the same challenges

RAM and Disk i/o concerns

Administration

Security

Data insert and retrieval

Monitoring and alerting

Performance optimization

The report of my death was an exaggeration

Elasticsearch

NoSQL

Microsoft SQL Server

Relational vs NoSQL

Edinburgh: Quartermile One, 15 Lauriston Place, Edinburgh EH3 9EN

Glasgow: 5th floor, 151-155 St Vincent St, Glasgow G2 5NW

Singapore: No. 08-01&04 & 09-04, 8th floor, Robinson Point, 39 Robinson Rd, Singapore

Beijing: Level 19, Tower E2, Oriental Plaza, No. 1 East Chang An Avenue, Dong Cheng District, Beijing 100738

Miami: 1395 Brickell Ave, Suite 900, Miami, Florida 33131

Barcelona: Torre NN, Calle Tarragona, 157, 4a Planta, Barcelona, 08014

thank you