Date posted: 13-Feb-2017 | Category: Data & Analytics | Uploaded by: marco-rossetti
‡
Data Science at Trainline for Smarter Journeys
London, 22/11/2016
@DataScienceFest
@TrainlineTalent
‡
Outline
• A bit about Trainline.
• Cloud-based serverless architecture for Big Data.
• Case Study: BusyBot.
• Other Case Studies.
John Telford, Head of Data Architecture. Leading the adoption of Big Data technology at Trainline. Manages a team of Data Engineers and Database Administrators. Previously worked on Data Warehousing and Big Data at Channel 4. Computer Science degree from Brunel University.
Twitter: @jtelford1
Marco Rossetti, Senior Data Scientist. Leading personalisation initiatives such as context-aware personalised services, journey recommendations and tailored travel options. Previously worked on recommender systems for researchers at Mendeley. PhD in Computer Science from the University of Milan-Bicocca.
Twitter: @ross85
‡
Trainline - Smarter Journeys
Help our customers save:
• Time (no more queuing for tickets at the station).
• Money (book early, find cheap tickets).
• Energy (remove complexity).
Headlines...
• We process more than £2.3 billion in ticket sales annually.
• 100,000 smarter journeys every single day.
• 44 train companies across 24 European countries.
• ~400 employees (London, Edinburgh, Paris).
• More than 30m visits per month.
• One ticket sold every three seconds.
Trainline takeover of King's Cross station, October 2016.
‡
Bob's cloud laws
It's cloud if…
1. It offers self-provisioning.
2. It offers pay-as-you-go pricing.
3. It is, for all intents and purposes, infinitely scalable.
In other words: no provider support needed for set-up, no upfront licence payments or minimum-term agreements, and no constraints on what I can do!
• Hosting is not cloud.
• BYO (bring-your-own) licensing is not cloud.
‡
From servers... to serverless
Servers = Pets
Virtual Machines = Cattle
Containers & Serverless = Herds
Trainline policy:
Use PaaS wherever possible,
use serverless wherever possible,
... so long as they are good enough.
‡
Lessons: Lambda
• Effortless scaling: we often have more than 100 λs (Lambda instances) running at once.
• Warm-up (cold-start) time.
• Choose language / framework carefully.
• Consequences of the 'freeze' between invocations.
• Monitoring – single thread.
Google "Trainline Engineering Lambda"
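To illustrate the warm-up and 'freeze' points, here is a minimal Python sketch of a Lambda handler. It relies on documented Lambda behaviour: code outside the handler runs once when an execution environment is initialised, and Lambda may freeze that environment after an invocation and thaw it for a later one, so module-level state survives between calls. `expensive_setup` and the returned fields are hypothetical illustrations, not Trainline code.

```python
import time

# Module-level code runs once per execution environment (the cold start).
# If Lambda later thaws and reuses this environment, everything below survives.
_INIT_TIMESTAMP = time.time()
_invocation_count = 0


def expensive_setup():
    """Hypothetical stand-in for costly initialisation (loading a model, opening clients)."""
    return {"model": "loaded"}


# Paid once per environment, not once per request.
_RESOURCES = expensive_setup()


def handler(event, context):
    """Entry point using the standard Python Lambda signature (event, context)."""
    global _invocation_count
    _invocation_count += 1  # persists across warm invocations of the same environment
    return {
        "cold_start": _invocation_count == 1,
        "invocation_count": _invocation_count,
        "seconds_since_init": round(time.time() - _INIT_TIMESTAMP, 3),
        "model": _RESOURCES["model"],
    }
```

Calling `handler({}, None)` twice in one Python process mimics two warm invocations: the second call reports `cold_start` as false. The flip side of the freeze is that anything still in flight when the handler returns (background threads, unflushed buffers, open connections) is suspended and may only resume on a later invocation, if at all.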
[Figure: Service time distribution; x-axis: execution time (ms)]
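A distribution like this is usually summarised with percentiles rather than the mean, because a few slow invocations (for example cold starts) can pull the mean away from the typical case. The sketch below uses the nearest-rank method; `durations_ms` is assumed to come from your own logs or metrics, and the function names are hypothetical.

```python
import math


def nearest_rank_percentile(values, pct):
    """Return the pct-th percentile (0 < pct <= 100) using the nearest-rank method."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank into the sorted list
    return ordered[rank - 1]


def summarise_durations(durations_ms, percentiles=(50, 90, 99)):
    """Map each requested percentile (e.g. 'p90') to an execution time in milliseconds."""
    return {f"p{p}": nearest_rank_percentile(durations_ms, p) for p in percentiles}
```

Nearest-rank always returns an observed value, which keeps the numbers easy to trace back to individual invocations; interpolating methods give smoother estimates on small samples.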
‡
Lessons: Kinesis Streams
• TCO (total cost of ownership) is generally low.
• But understand the costs, which depend on the stream's capacity (number and size of messages), time-to-live (retention), etc.
• Monitoring / alerting: CloudWatch is (probably) not enough.
• Compress and encrypt?
Google "AWS Overview of Security Processes"
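To make the capacity point concrete: AWS documents a per-shard write quota for Kinesis Data Streams of 1 MiB per second and 1,000 records per second (check the current quotas before relying on these figures). The rough sketch below estimates how many provisioned shards a workload needs, and measures how much gzip compression, one of the open questions above, shrinks a record. Function names are hypothetical, and real sizing should leave headroom for traffic spikes.

```python
import gzip
import json
import math

# Assumed per-shard write quotas for Kinesis Data Streams (provisioned mode).
MAX_BYTES_PER_SHARD_PER_SEC = 1024 * 1024  # 1 MiB/s
MAX_RECORDS_PER_SHARD_PER_SEC = 1000


def estimate_shards(records_per_sec, avg_record_bytes):
    """Shards needed so that neither the byte-rate nor the record-rate quota is exceeded."""
    by_bytes = math.ceil(records_per_sec * avg_record_bytes / MAX_BYTES_PER_SHARD_PER_SEC)
    by_records = math.ceil(records_per_sec / MAX_RECORDS_PER_SHARD_PER_SEC)
    return max(1, by_bytes, by_records)


def compressed_size(record):
    """Size in bytes of a JSON-serialised record after gzip compression."""
    return len(gzip.compress(json.dumps(record).encode("utf-8")))
```

Compression only helps the byte-rate side: gzip adds about 18 bytes of header and trailer, so very small records may not shrink at all, and a workload bound by the record-rate quota gains nothing unless records are batched.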
‡
[Chart: Unhappy customers. Causes of dissatisfaction, scale 0% to 70%: Delays, Overcrowding, Value for money, Toilet facilities, Luggage space, Availability of staff, Car parking]
Source: National Rail Passenger Survey (NRPS) 2015
‡
Infrastructure – Data Gateway
Feedback collection
Daily enrichment
{
  "train_destination": "RDG",
  "retail_train_number": "GW2980",
  "train_origin": "NRC",
  "train_date": "2016-08-08T07:38:00.000Z",
  "customer_longitude": 0,
  "train_hashid": "NRC:RDG:08/08/2016 08:38:00:GW2980",
  "customer_location_on_train": "Back",
  "customer_hashid": "…",
  "customer_got_seat": 1,
  "customer_feedback": "Yes",
  "feedback_type": 1,
  "customer_latitude": 0,
  "feedbackid": "…",
  "device_id": "…",
  "timestamp": "2016-08-08T07:41:39.390Z",
  "customer_id": "…"
}
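In the example record, `train_hashid` appears to combine origin, destination, the departure time in UK local time (07:38 UTC is 08:38 BST in August) and the retail train number. That format is inferred from this single record, not documented in the slides. The hypothetical helper below reproduces it, taking the UTC offset as a parameter to avoid depending on a time-zone database.

```python
from datetime import datetime, timedelta


def build_train_hashid(record, utc_offset_hours=1):
    """Rebuild a train_hashid-style key from a feedback record.

    Assumed format (inferred from one example): ORIGIN:DEST:dd/mm/YYYY HH:MM:SS:TRAIN.
    utc_offset_hours converts train_date (UTC) to UK local time: 1 for BST, 0 for GMT.
    """
    departure_utc = datetime.strptime(record["train_date"], "%Y-%m-%dT%H:%M:%S.%fZ")
    departure_local = departure_utc + timedelta(hours=utc_offset_hours)
    return ":".join([
        record["train_origin"],
        record["train_destination"],
        departure_local.strftime("%d/%m/%Y %H:%M:%S"),
        record["retail_train_number"],
    ])
```

A production version would derive the offset from the date itself (for example with a time-zone library) rather than trusting the caller.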
‡
Infrastructure – Data Platform
Model Building and Validation Service
route-origin | route-destination | stop | customer-location-train | percentage-who-got-seat | feedback-count
EUS          | MAN               | EUS  | middle                  | 0.738059701             | 4020
EUS          | BHM               | EUS  | middle                  | 0.63788222              | 3532
KGX          | LDS               | KGX  | middle                  | 0.704984154             | 3471
BHM          | EUS               | BHM  | middle                  | 0.679082241             | 3356
KGX          | EDB               | KGX  | middle                  | 0.5589236               | 3233
EUS          | GLC               | EUS  | middle                  | 0.676663543             | 3201
MAN          | EUS               | MAN  | middle                  | 0.769495772             | 3193
PAD          | SWA               | PAD  | middle                  | 0.608086078             | 3067
EUS          | BHM               | EUS  | front                   | 0.672365666             | 2866
EUS          | MAN               | EUS  | front                   | 0.790479625             | 2773
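The table aggregates the feedback records: for each route, stop and position on the train, the share of respondents who got a seat and the number of responses. A minimal sketch of that aggregation, assuming each record carries the fields from the feedback example plus a hypothetical `stop` field (the station where the feedback was given, which the example record does not show directly):

```python
from collections import defaultdict


def aggregate_seat_feedback(records):
    """Group feedback by (origin, destination, stop, location) and summarise seat availability.

    Returns rows shaped like the table (key fields, share who got a seat, feedback count),
    sorted by feedback count, largest first.
    """
    totals = defaultdict(lambda: [0, 0])  # key -> [seats_found, responses]
    for r in records:
        key = (
            r["train_origin"],
            r["train_destination"],
            r["stop"],  # hypothetical field: station where the feedback was given
            r["customer_location_on_train"].lower(),  # table uses lower-case positions
        )
        totals[key][0] += int(r["customer_got_seat"])
        totals[key][1] += 1
    rows = [(*key, seats / count, count) for key, (seats, count) in totals.items()]
    return sorted(rows, key=lambda row: row[-1], reverse=True)
```

In a daily batch job the same grouping would more likely run in SQL or a data-processing framework; the logic is the same count and mean per group.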
{"retailTrainIdentifier": "VT7280", "isBusy": false, "callingPoints": [
  {"stationCode": "EUS", "coaches": [
    {"position": "Back", "recommend": true},
    {"position": "Front", "recommend": false},
    {"position": "Middle", "recommend": false}
  ]},
  {"stationCode": "MKC", "coaches": [
    {"position": "Back", "recommend": false}, …
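The response attaches a `recommend` flag to each coach position at each calling point. The slides do not give the decision rule, so the sketch below uses one hypothetical rule: recommend the position with the highest estimated chance of a seat at each stop, and flag the train as busy if, at any stop, even that best chance is below a threshold. The 0.5 threshold and the input shape are assumptions.

```python
def build_recommendation(train_id, seat_probabilities, busy_threshold=0.5):
    """Build a BusyBot-style response from per-stop seat probabilities.

    seat_probabilities: ordered list of (station_code, {position: probability}) pairs.
    Hypothetical rule: recommend the best position at each stop; the train is
    flagged busy if, at any stop, even the best position is below busy_threshold.
    """
    calling_points = []
    is_busy = False
    for station_code, by_position in seat_probabilities:
        best = max(by_position, key=by_position.get)  # ties go to the first position listed
        if by_position[best] < busy_threshold:
            is_busy = True
        calling_points.append({
            "stationCode": station_code,
            "coaches": [
                {"position": position, "recommend": position == best}
                for position in by_position
            ],
        })
    return {
        "retailTrainIdentifier": train_id,
        "isBusy": is_busy,
        "callingPoints": calling_points,
    }
```

The probabilities would come from the aggregated table, for example the percentage-who-got-seat value for the matching route, stop and position.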
‡
Data Validation
• At least N feedback responses.
• Feedback spanning at least D days.
• CI on the percentage who got a seat ≤ p.
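The three checks can be sketched in a few lines. Two choices here are assumptions rather than anything stated on the slide: "CI ≤ p" is read as the width of the confidence interval being at most p, and the interval is the Wilson score interval (a standard choice for proportions that stays within [0, 1]). Function names are hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score confidence interval for a proportion (z=1.96 gives roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    return centre - half_width, centre + half_width


def passes_validation(feedback, min_feedbacks, min_days, max_ci_width, z=1.96):
    """Apply the three checks to one (route, stop, position) group.

    feedback: list of (day, got_seat) pairs; day is any hashable day identifier
    (e.g. a datetime.date) and got_seat is 0 or 1.
    """
    n = len(feedback)
    if n < min_feedbacks:  # at least N responses
        return False
    if len({day for day, _ in feedback}) < min_days:  # responses on at least D distinct days
        return False
    successes = sum(got_seat for _, got_seat in feedback)
    low, high = wilson_interval(successes, n, z)
    return high - low <= max_ci_width  # interval narrow enough to trust
```

With roughly 4,000 responses and a 75% seat rate, the interval is under 3 percentage points wide, so a group like the busiest rows in the table would pass a 5-point threshold; ten responses would not.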
‡
Summary
• BusyBot
• Hotels
• Journey Recommendations
• Search
• Prediction
• Delays
• Prices
• Real-Time Information
• Personalisation
• …
‡
Any Questions?
(we are hiring!)
Data Scientist positions: [email protected]
Engineer positions: [email protected]