PyData Paris 2015 - Track 4.4 Clément JAMBOU

Date post: 19-Jul-2015
Upload: pole-systematic-paris-region

Using Python and Data Science to tackle real-time transportation problems at Lyft

Clement Jambou

About me

Clement Jambou

What is Lyft

- over 60 cities in the US
- 5X growth in 2014

Data Science at Lyft

Many complex problems require optimizations:

- Dynamic pricing
- Dispatching drivers efficiently
- Lyft Line passenger matching
- Accurate ETAs
- Marketplace optimization: matching demand and supply through incentives and promotions
- Event detection
- Fraud ...

These problems become even more challenging with the geolocation component.

Dynamic Pricing: Prime Time

Event: a big concert is happening, so demand >> supply.

Prices automatically increase in the affected subregions to encourage drivers to head there.

Where and when do we use Prime Time?
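
As a rough illustration of the idea, the sketch below derives a price multiplier per subregion from the ratio of open requests to idle drivers. The ratio rule, the cap, and the cell names are assumptions made for this example; the slides do not describe how Prime Time is actually computed.

```python
from collections import Counter


def prime_time_multipliers(requests, drivers, cap=3.0):
    """Return a price multiplier per cell from open requests and idle drivers.

    `requests` and `drivers` are lists of cell ids (hypothetical: any
    hashable region key, for example a geohash prefix).
    """
    demand = Counter(requests)
    supply = Counter(drivers)
    multipliers = {}
    for cell, n_requests in demand.items():
        # Treat an empty cell as having one driver to avoid dividing by zero.
        ratio = n_requests / max(supply[cell], 1)
        # No surcharge while supply covers demand; cap the increase otherwise.
        multipliers[cell] = min(max(ratio, 1.0), cap)
    return multipliers


# Made-up snapshot: a concert at the stadium, a quiet downtown.
requests = ["stadium"] * 8 + ["downtown"] * 2
drivers = ["stadium"] * 4 + ["downtown"] * 5
print(prime_time_multipliers(requests, drivers))
```

With this toy rule the stadium cell gets a multiplier of 2.0 and downtown stays at 1.0.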

Dispatching Drivers Efficiently

Request arrives in real-time:

Which driver should we dispatch?

Should we decide not to dispatch a driver that is too far away?

Should our decision depend on the time a driver has been waiting?

How do we minimize pickup time and driver idle time to make the best use of the pool of drivers?
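
One way to make these questions concrete is a scoring rule: skip drivers beyond a distance cutoff, and give long-waiting drivers a small credit. The sketch below is only an assumption for illustration; the cutoff `max_km`, the credit `wait_bonus_km_per_min`, and the sample coordinates are hypothetical and do not come from the talk.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def pick_driver(pickup, drivers, max_km=5.0, wait_bonus_km_per_min=0.05):
    """Return the id of the best driver, or None if every driver is too far.

    `drivers` maps id -> (position, minutes_idle). The score is the distance
    minus a small credit per idle minute, so long-waiting drivers win close calls.
    """
    best_id, best_score = None, float("inf")
    for driver_id, (position, minutes_idle) in drivers.items():
        distance = haversine_km(pickup, position)
        if distance > max_km:
            continue  # too far away: do not consider this driver
        score = distance - wait_bonus_km_per_min * minutes_idle
        if score < best_score:
            best_id, best_score = driver_id, score
    return best_id


pickup = (37.7749, -122.4194)
drivers = {
    "a": ((37.7800, -122.4194), 0),   # closest, but just became idle
    "b": ((37.7810, -122.4194), 10),  # slightly farther, idle for 10 minutes
    "c": ((37.9000, -122.4194), 30),  # beyond the cutoff
}
print(pick_driver(pickup, drivers))
```

Here driver "b" is chosen over the slightly closer "a" because of the waiting credit, and "c" is never considered.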

Lyft Line Matching

Matching passengers going in the same direction.

Matching to optimize efficiency and reduce cost.
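
A minimal sketch of what "efficiency" could mean here: pair two rides only if one shared route is shorter than the two solo trips combined. It assumes straight-line distances on a flat km grid and a brute-force search over pairs, both simplifications chosen for this example rather than anything stated in the talk.

```python
import math
from itertools import combinations


def shared_route_km(a, b):
    """Length of the shorter route that picks up a, then b, then drops both."""
    (pa, da), (pb, db) = a, b
    drop_a_first = math.dist(pa, pb) + math.dist(pb, da) + math.dist(da, db)
    drop_b_first = math.dist(pa, pb) + math.dist(pb, db) + math.dist(db, da)
    return min(drop_a_first, drop_b_first)


def best_match(rides):
    """Return (saving_km, id_a, id_b) for the pair saving the most distance, or None.

    `rides` maps a ride id to a (pickup, dropoff) pair of (x, y) points in km.
    """
    best = None
    for (id_a, a), (id_b, b) in combinations(rides.items(), 2):
        solo = math.dist(*a) + math.dist(*b)
        shared = min(shared_route_km(a, b), shared_route_km(b, a))
        saving = solo - shared
        if saving > 0 and (best is None or saving > best[0]):
            best = (saving, id_a, id_b)
    return best


rides = {
    "r1": ((0, 0), (10, 0)),   # heading east
    "r2": ((1, 0), (9, 0)),    # same direction, nested inside r1
    "r3": ((0, 5), (0, -5)),   # heading south, crosses the others
}
print(best_match(rides))
```

In this toy input, r1 and r2 travel in the same direction and are matched, while r3 is left alone because sharing would lengthen the route.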

Accurate ETAs (Estimated Time of Arrival)

- Knowing when the driver will arrive or when you will reach your destination is useful information for the passenger.

- Order the drivers for dispatch based on ETAs

- We guarantee prices for Lyft Line, but drivers are paid depending on the distance and time of the route.
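
The last point is why ETA accuracy matters financially: the passenger price is fixed from estimates, while driver pay follows the actual route. The sketch below assumes, purely for illustration, that both use the same distance-and-time formula; the formula and all rates are made up.

```python
def trip_cost(distance_km, duration_min, base=2.0, per_km=1.0, per_min=0.25):
    """Distance-and-time cost with hypothetical rates."""
    return base + per_km * distance_km + per_min * duration_min


# Price quoted up front from the estimated route.
quoted_price = trip_cost(distance_km=8.0, duration_min=20.0)
# Driver pay computed afterwards from the route actually driven.
driver_pay = trip_cost(distance_km=8.5, duration_min=28.0)

print(quoted_price, driver_pay, driver_pay - quoted_price)
```

With these numbers, an ETA that is 8 minutes too optimistic leaves a gap of 2.5 between what the passenger was quoted and what the driver earns.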

Python usage

We rely heavily on Python in Data Science as well as for backend services at Lyft.

- numpy, scipy, matplotlib
- scikit-learn
- pandas, geopandas
- geohash, haversine, shapely

Geolocation Data

Many of our problems use geolocation data. We use open-source libraries and develop our own analysis and visualization tools.

- http://geojson.io/
- Demo: IPython Notebook
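
For a quick look at point data, a GeoJSON document can be built with the standard library and pasted into geojson.io. This is a small sketch with made-up coordinates and property names, not the notebook shown in the demo.

```python
import json


def to_feature_collection(points):
    """Build a GeoJSON FeatureCollection from (lat, lon, properties) tuples."""
    features = [
        {
            "type": "Feature",
            # GeoJSON positions are [longitude, latitude], in that order.
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": properties,
        }
        for lat, lon, properties in points
    ]
    return {"type": "FeatureCollection", "features": features}


# Hypothetical sample points.
pickups = [
    (37.7749, -122.4194, {"kind": "pickup"}),
    (37.8044, -122.2712, {"kind": "dropoff"}),
]
print(json.dumps(to_feature_collection(pickups), indent=2))
```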

Data Pipeline

We use Amazon Redshift as our data warehouse.

pandas is directly integrated with Redshift, so we get our data in the right format from any SQL query.
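
The slides do not say how this integration is set up. One common pattern is `pandas.read_sql_query` over a database connection, sketched below with an in-memory SQLite database standing in for Redshift so that the example runs anywhere. The `rides` table and its columns are hypothetical.

```python
import sqlite3

import pandas as pd

# Stand-in for the warehouse connection (assumption: SQLite instead of Redshift).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rides (ride_id INTEGER, city TEXT, fare REAL)")
conn.executemany(
    "INSERT INTO rides VALUES (?, ?, ?)",
    [(1, "SF", 12.5), (2, "SF", 7.5), (3, "LA", 20.0)],
)

# Any SQL query comes back as a DataFrame with one column per selected field.
rides_by_city = pd.read_sql_query(
    "SELECT city, COUNT(*) AS n_rides, AVG(fare) AS avg_fare "
    "FROM rides GROUP BY city ORDER BY city",
    conn,
)
print(rides_by_city)
```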

Simulation

When the problem cannot be solved directly using historical data, we often rely on simulations:

- Replay a week of session data
- What would happen with more or fewer drivers?
- How would our metrics change with a different pricing/dispatch algorithm?

Our simulator is written entirely in Python, runs locally or in parallel, and uses the same algorithms as production.
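
A toy version of this idea: replay recorded requests against a pool of drivers, with the dispatch algorithm passed in as a function so that different policies (or the production one) can be compared on the same data. Everything below is an assumption for illustration, including the flat km grid and the simplification that trips finish instantly.

```python
import math


def nearest_driver(pickup, idle):
    """Dispatch policy: pick the closest idle driver."""
    return min(idle, key=lambda driver_id: math.dist(pickup, idle[driver_id]))


def first_driver(pickup, idle):
    """Baseline policy: pick the driver with the smallest id, ignoring distance."""
    return min(idle)


def replay(requests, drivers, dispatch):
    """Replay (pickup, dropoff) requests in order; return the mean pickup distance.

    Simplification: a trip finishes instantly, so the driver is idle again
    at the dropoff point before the next request arrives.
    """
    idle = dict(drivers)  # copy so every run starts from the same state
    total = 0.0
    for pickup, dropoff in requests:
        driver_id = dispatch(pickup, idle)
        total += math.dist(pickup, idle[driver_id])
        idle[driver_id] = dropoff
    return total / len(requests)


# Made-up session data on a km grid.
drivers = {"d1": (0, 0), "d2": (10, 0)}
requests = [((9, 0), (5, 0)), ((1, 0), (2, 0)), ((6, 0), (8, 0))]

print(replay(requests, drivers, nearest_driver))
print(replay(requests, drivers, first_driver))
```

Changing the `drivers` dictionary answers the "more or fewer drivers" question, and swapping the `dispatch` argument compares algorithms on the same replayed requests.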

Questions

[email protected]
