Highway Tollgates Traffic Flow Prediction
Task 1: Travel Time Prediction
School of Mathematical Sciences, Zhejiang University, Hangzhou, China
Yide Huang
KDD Cup 2017
Outline
1 Introduction
2 Problem Understanding
3 Features
4 Models
5 Summary
1 Introduction
Task: estimate the average travel time from designated intersections to tollgates:
a. Routes from Intersection A to Tollgates 2 & 3
b. Routes from Intersection B to Tollgates 1 & 3
c. Routes from Intersection C to Tollgates 1 & 3
Given: the road network topology, vehicle trajectories, historical traffic volume at the tollgates, and weather data.
Evaluation metric (mean absolute percentage error):
$$\mathrm{MAPE} = \frac{1}{R}\sum_{r=1}^{R}\left(\frac{1}{T}\sum_{t=1}^{T}\left|\frac{d_{rt}-p_{rt}}{d_{rt}}\right|\right)$$
$d_{rt}$: actual average travel time for route r during time window t
$p_{rt}$: predicted average travel time for route r during time window t
R: number of routes
T: number of time windows to predict
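The metric first averages the absolute percentage error over the T windows of each route, then averages those per-route values over the R routes. The Python sketch below is a minimal illustration of that formula; the function name kdd_mape and the nested-list layout (actual[r][t], predicted[r][t]) are hypothetical choices, not part of the competition toolkit.

```python
from typing import Sequence


def kdd_mape(actual: Sequence[Sequence[float]], predicted: Sequence[Sequence[float]]) -> float:
    """MAPE over R routes and T time windows.

    actual[r][t] is d_rt and predicted[r][t] is p_rt (hypothetical layout).
    """
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted need the same, non-zero number of routes")
    per_route = []
    for d_r, p_r in zip(actual, predicted):
        if not d_r or len(d_r) != len(p_r):
            raise ValueError("each route needs the same, non-zero number of windows")
        # (1/T) * sum over t of |(d_rt - p_rt) / d_rt|
        per_route.append(sum(abs((d - p) / d) for d, p in zip(d_r, p_r)) / len(d_r))
    # (1/R) * sum over r of the per-route averages
    return sum(per_route) / len(per_route)


print(kdd_mape([[100.0, 200.0], [50.0, 50.0]], [[110.0, 180.0], [60.0, 40.0]]))
```

Here the first route has an average error of 0.1 and the second 0.2, so the printed value is about 0.15.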
2 Problem Understanding
2.1 Influence factors of travel time
1 Weather conditions
2 Time of day
3 Holidays
4 Traffic conditions
5 Road network topology
5 Road network topology
  1 Whether routes share links (A-3, B-3, C-3)
  2 Link information (length, width and number of lanes)
2.2 Data set partition
[Figure: offline and online partitions of the data into training, validation and testing sets. Date axis: July 19th, Oct. 18th, Oct. 24th, Oct. 25th, Oct. 31st; time-of-day axis: 0:00 to 24:00.]
3 Features
3.1 Feature Engineering
1 Weather
  1 Human comfort index
  2 Precipitation
  3 Their statistical features (mean, sum)
2 Time
  1 Hour of the day
  2 Weekday, weekend or holiday
  3 Whether it is rush hour
3 Road features
  1 The number of cars
  2 The ratio of the road's car count
  3 Road's ETA
  4 Links' ETA
  5 The length-weighted mean of the links' velocities
  6 Whether there is an emergency
  7 Rank features of the links' velocities
  8 Traffic volume
  9 Average vehicle capacity
  10 The number and ratio of cars without ETC
  11 Road network topology features
  12 Last week's historical ETA and car count
  13 Their statistical features (mean, sum)
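Two of the road features above lend themselves to a short illustration: the length-weighted mean of link velocities (item 5) and a rank feature over link velocities (item 7). The sketch below assumes link lengths and per-window link velocities are already available as dictionaries keyed by link ID, and it assumes "rank" means ordering the links of one window from slowest to fastest; the slides do not define the ranking, so treat this as one possible reading. Both function names are hypothetical.

```python
from typing import Dict


def weighted_mean_velocity(lengths: Dict[str, float], velocities: Dict[str, float]) -> float:
    """Mean of link velocities on one route, weighted by link length.

    Links with no velocity observation in this window are skipped.
    """
    observed = [link for link in lengths if link in velocities]
    total_length = sum(lengths[link] for link in observed)
    if total_length <= 0:
        raise ValueError("no observed links with positive length")
    return sum(lengths[link] * velocities[link] for link in observed) / total_length


def velocity_ranks(velocities: Dict[str, float]) -> Dict[str, int]:
    """Rank the links of one window by velocity (1 = slowest); ties keep input order."""
    ordered = sorted(velocities, key=lambda link: velocities[link])
    return {link: rank for rank, link in enumerate(ordered, start=1)}


lengths = {"100": 50.0, "101": 150.0}
velocities = {"100": 10.0, "101": 20.0}
print(weighted_mean_velocity(lengths, velocities))  # 17.5
print(velocity_ranks(velocities))                   # {'100': 1, '101': 2}
```

Weighting by length lets long links dominate the route-level speed, which matches how much of the travel time they account for.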
3.2 Missing data processing
[Figure: each sample uses the six 20-minute windows from 6:00 to 8:00 plus the 7 days' historical ETA for the target windows 8:00-8:20 through 9:40-10:00.]
The target time windows are predicted one by one, day after day.
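A minimal sketch of the window layout behind one sample, assuming 20-minute windows, the 6:00 to 8:00 windows of the prediction day as current features, the 8:00 to 10:00 windows of each of the previous 7 days as the historical part, and one target window per sample. Only the morning period shown in the figure is modeled; SampleLayout and sample_layout are hypothetical names.

```python
from datetime import date, datetime, time, timedelta
from typing import List, NamedTuple

WINDOW = timedelta(minutes=20)


def windows(day: date, start: time, end: time) -> List[datetime]:
    """Start times of the 20-minute windows in [start, end) on one day."""
    current, stop = datetime.combine(day, start), datetime.combine(day, end)
    starts = []
    while current < stop:
        starts.append(current)
        current += WINDOW
    return starts


class SampleLayout(NamedTuple):
    current_day: List[datetime]    # 6:00-8:00 windows of the prediction day
    history: List[List[datetime]]  # 8:00-10:00 windows of each of the previous 7 days
    target: datetime               # the single window this sample predicts


def sample_layout(day: date, target_index: int) -> SampleLayout:
    """Layout of the sample that predicts target window target_index (0..5) on day."""
    targets = windows(day, time(8, 0), time(10, 0))
    return SampleLayout(
        current_day=windows(day, time(6, 0), time(8, 0)),
        history=[windows(day - timedelta(days=k), time(8, 0), time(10, 0)) for k in range(1, 8)],
        target=targets[target_index],
    )


layout = sample_layout(date(2016, 10, 18), 0)
print(layout.target)            # 2016-10-18 08:00:00
print(len(layout.current_day))  # 6
print(layout.history[0][0])     # 2016-10-17 08:00:00
```

Calling sample_layout(day, i) for i in range(6) yields the six per-window samples for one day, and repeating that for every day gives the one-by-one, day-after-day scheme.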
[Figure: a sample is split into Part 1, Part 2 and Part 3; Parts 1 and 2 hold the 6:00 to 8:00 features and the 7 days' historical data and receive missing data processing, while Part 3 holds the target time windows and receives missing label processing.]
Missing data processing:
1 High feature importance: missing values are replaced by mean values.
2 Low feature importance: if the number of time windows with missing values is at most 3, the missing values are replaced by mean values.
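The threshold rule can be sketched as follows, assuming "mean values" means the mean of the observed windows of the same feature within the sample (the slides could also mean a training-set mean). The helper returns None when the rule does not apply, leaving that case to the caller; impute_windows is a hypothetical name.

```python
from statistics import mean
from typing import List, Optional


def impute_windows(values: List[Optional[float]], max_missing: int = 3) -> Optional[List[float]]:
    """Replace missing per-window values (None) with the mean of the observed ones.

    Returns None when more than max_missing windows are missing (or none are
    observed), so the caller decides how to treat that sample.
    """
    observed = [v for v in values if v is not None]
    if len(values) - len(observed) > max_missing or not observed:
        return None
    fill = mean(observed)
    return [fill if v is None else v for v in values]


print(impute_windows([10.0, None, 20.0, None, 30.0, None]))  # [10.0, 20.0, 20.0, 20.0, 30.0, 20.0]
print(impute_windows([None, None, None, None, 1.0, 2.0]))    # None
```

For the high-importance rule, which always fills, the same helper can be called with max_missing=len(values).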
Missing label processing (a balance between more samples and less noisy data):
1 If the number of missing values is at most 3, the missing values are replaced by mean values.
2 Pre-training
Routes B-1, C-1 and C-3: the number of samples increases by about 20%.
4 Models
4.1 Model_1
[Figure: Training Set -> Pre-training -> Training Set' -> XGBoost -> Bagging -> Model_1 result]
1 Pre-training: the ratio of samples that are preserved is 0.8 to 0.95
2 Model: eXtreme Gradient Boosting (XGBoost)
3 Bagging: models trained with different parameters; their predictions are averaged
4 Model_1 result: stage 1 MAPE = 0.1785; stage 2 MAPE = 0.1786
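The slides only state that pre-training preserves 80% to 95% of the samples. One plausible reading, assumed here, is that a first model is fitted and the samples it fits worst (largest absolute percentage error) are dropped as noise before the final model is trained. The sketch below shows that filter and the bagging average; the XGBoost models themselves are left out so the code runs with the standard library alone, and both function names are hypothetical.

```python
from typing import List, Sequence


def pretrain_filter(labels: Sequence[float], predictions: Sequence[float], keep_ratio: float) -> List[int]:
    """Indices of the keep_ratio share of samples with the smallest absolute percentage error.

    predictions come from a first-pass model; the dropped samples are treated as noise.
    """
    if not 0 < keep_ratio <= 1:
        raise ValueError("keep_ratio must be in (0, 1]")
    errors = [abs((y - p) / y) for y, p in zip(labels, predictions)]
    by_error = sorted(range(len(errors)), key=lambda i: errors[i])
    n_keep = max(1, round(keep_ratio * len(errors)))
    return sorted(by_error[:n_keep])


def bagged_prediction(predictions_per_model: Sequence[Sequence[float]]) -> List[float]:
    """Average several models' predictions sample by sample."""
    n_models = len(predictions_per_model)
    return [sum(column) / n_models for column in zip(*predictions_per_model)]


labels = [100.0, 100.0, 100.0, 100.0, 100.0]
first_pass = [101.0, 150.0, 99.0, 130.0, 100.0]
print(pretrain_filter(labels, first_pass, keep_ratio=0.8))  # [0, 2, 3, 4]
print(bagged_prediction([[1.0, 2.0], [3.0, 4.0]]))          # [2.0, 3.0]
```

In this reading, Training Set' is the training set restricted to the indices returned by pretrain_filter.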
4.2 Model_2
Observation: the last time window's features are more important than those of the other time windows.
[Figure: Training Set -> Feature selection -> Pre-training -> Training Set' -> XGBoost -> Bagging -> Model_2 result]
1 Feature selection: preserve the last time window's features and delete the low-importance features of the other five time windows
2 Pre-training: the ratio of samples that are preserved is 0.8 to 0.95
3 Model: eXtreme Gradient Boosting (XGBoost)
4 Bagging: models trained with different parameters; their predictions are averaged
5 Model_2 result: stage 1 MAPE = 0.1792; stage 2 MAPE unknown
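A sketch of the feature selection rule, assuming feature names carry a window suffix such as "eta@w3" (window k = 1..6 over 6:00 to 8:00, so the last window is 6) and that "low importance" means a score below a chosen threshold; both the naming scheme and the default threshold are hypothetical. Features without a window suffix, such as weather or time features, are left untouched.

```python
from typing import Dict, List


def select_features(importance: Dict[str, float], last_window: int = 6, threshold: float = 0.01) -> List[str]:
    """Keep all last-window features; drop low-importance features of the other windows.

    Feature names are assumed to end in "@w<k>", k = 1..6 for the 20-minute windows
    from 6:00 to 8:00. Features without that suffix are always kept.
    """
    kept = []
    for feature, score in importance.items():
        if "@w" not in feature:
            kept.append(feature)  # not tied to a window (e.g. weather, time): untouched
            continue
        window = int(feature.rsplit("@w", 1)[1])
        if window == last_window or score >= threshold:
            kept.append(feature)
    return kept


scores = {"weekday": 0.0, "eta@w6": 0.0, "eta@w1": 0.5, "cars@w2": 0.001, "cars@w6": 0.02}
print(select_features(scores))  # ['weekday', 'eta@w6', 'eta@w1', 'cars@w6']
```

The importance scores would typically come from a first XGBoost fit on all features.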
MAPE:
          Model_1   Model_2   Ensemble
Stage 1   0.1785    0.1792    0.1763
Stage 2   0.1786    Unknown   0.1771
5 Summary
Future work:
1 Making full use of link information
2Missingdataprocessing
Thanks!
Q&A