Catch Me If You Can: Detecting Pickpocket Suspects from Large-Scale Transit Records
Paper by Bowen Du, Chuanren Liu, Wenjun Zhou, Zhenshan Hou and Hui Xiong
Presented by Qi Dong, based on slides by Luyao Niu, Julang Ying and Guojun Wu
Introduction
Pickpockets in public transit
Can you catch them?
Introduction: problem
Passengers in public transit suffer a lot from professional pickpockets
During the first 9 months of 2014 in Beijing, 350 pickpockets were caught in the subway system and 490 on buses.
Big cities all over the world, such as Barcelona, Prague, Rome, and Paris, suffer from pickpockets too
Consequences
- public safety concerns
- decreased passenger satisfaction
Introduction: causes
Pickpocket problem in public transit
- Crowdedness and rush make people vulnerable to thieves
- “Professionals” are experienced and strategic, and therefore hard to catch
How to prevent theft in public transit?
- “Opportunity makes the thief”: they commit crimes to get what they want without being caught (U.S. Department of Justice)
- Catching thieves discourages further crime
Introduction: solution through data
Deploying more security personnel is costly
- A smart surveillance and tracking framework is desired
On the other hand, large automated fare collection (AFC) datasets are available
- Millions of passengers travel with smart cards
- 13 million records each day in Beijing, 1+ billion records used in this study
Data can reveal behavioral differences between thieves and regular passengers
- Notable thief behaviors: long travel times, unnecessary transfers, wandering
Introduction: challenges
No existing solution to detect pickpockets through AFC data
Challenges include
- Identifying useful features
- False positives (not all anomalies are thieves)
- Data imbalance (needle in a haystack)
- Decision making for real-time security support
Introduction: pickpockets vs. anomalies
Suspect identification is not the same as anomaly detection
Outliers may be
- People from less crowded places (E-B)
- Thieves (A-C-D-B)
Related work
On using AFC records to detect people’s behavior
Passenger activity patterns
- About collective passenger behavior, not identifying individuals
Abnormal traveling behavior detection
- Location based
- Trajectory based
Related work: passenger activity patterns
Detecting collective patterns and groups of passengers
- Performance assessment of the transit network
- Identification and improvement of flawed transit routes
- Passenger flow forecasting (crowdedness estimates for stations in the transportation network)
- Transit behavior analysis (for different groups of passengers)
Related work: abnormal traveling behavior detection
Location based
- Detection of location-relevant events, e.g., accidents and protests
- Detection of gathering events, e.g., football matches and concerts
Trajectory based
- Detection of fraudulent taxi driving behavior
System design: overview
Features from AFC records and geographical data
Ground truth from public pickpocket announcements
First filter out normal passengers and focus on anomalies
Then use supervised learning to find thieves
System design: components
Notice the “user in the loop”: police give feedback on whether a suspect is a thief
System design: data description
AFC records
- Smart card ID, route number, event (boarding/exiting), station, time stamp
- 1.7 billion records between April and June 2014; 1.6 billion after removing duplicates
System design: data description
Trips
- Built by combining records
- Transfers and connections are allowed within an empirical cutoff of 30 minutes
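The trip construction above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the record layout `(card_id, event, station, timestamp)`, the event labels, and the rule that the 30-minute gap is measured from an exit to the next boarding are all assumptions made for the example.

```python
from datetime import datetime, timedelta

# Hypothetical record layout: (card_id, event, station, timestamp),
# where event is "boarding" or "exiting".
CUTOFF = timedelta(minutes=30)  # empirical cutoff quoted on the slide


def build_trips(records):
    """Group one card's records into trips.

    An exit always joins the trip in progress. A boarding continues the
    current trip only if it follows an exit by at most CUTOFF (treated as
    a transfer or connection); otherwise it starts a new trip.
    """
    trips = []
    for rec in sorted(records, key=lambda r: r[3]):
        _card, event, _station, ts = rec
        if trips:
            prev = trips[-1][-1]
            closes_ride = event == "exiting"
            quick_transfer = prev[1] == "exiting" and ts - prev[3] <= CUTOFF
            if closes_ride or quick_transfer:
                trips[-1].append(rec)
                continue
        trips.append([rec])
    return trips


# Made-up records for a single card.
records = [
    ("c1", "boarding", "A", datetime(2014, 4, 1, 8, 0)),
    ("c1", "exiting", "B", datetime(2014, 4, 1, 8, 50)),
    ("c1", "boarding", "B", datetime(2014, 4, 1, 9, 5)),   # 15 min gap: transfer
    ("c1", "exiting", "C", datetime(2014, 4, 1, 9, 30)),
    ("c1", "boarding", "C", datetime(2014, 4, 1, 12, 0)),  # 150 min gap: new trip
    ("c1", "exiting", "D", datetime(2014, 4, 1, 12, 20)),
]
trips = build_trips(records)
```

Here the first four records form one trip with a transfer at station B, and the last two form a second trip.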
System design: geographical information
Points of Interest (POI) and functional regions
- Based on (Yuan et al. 2012)
- Segment the city by the major road network
- Functional density as features (frequency/area)
- Categorize regions with LDA into ten functions (see table)
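As a rough sketch of the "frequency/area" density feature, the snippet below counts POIs per category in one region and divides by the region's area. The category names, the square-kilometre unit, and the function name `functional_density` are illustrative assumptions, not taken from the paper.

```python
from collections import Counter


def functional_density(poi_categories, area_km2):
    """Return {category: POI count / area} for a single region."""
    if area_km2 <= 0:
        raise ValueError("area must be positive")
    counts = Counter(poi_categories)
    return {category: n / area_km2 for category, n in counts.items()}


# Made-up POI list for one region of 2 square kilometres.
region_pois = ["restaurant", "restaurant", "school", "shop", "shop", "shop"]
density = functional_density(region_pois, area_km2=2.0)
```

The resulting per-category densities would then serve as the region's feature vector for the categorization step.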
System design: geographical information
System design: geographical information
Public transit network information
- Bus/subway route number
- Sequence ID and name of each station
- Latitude and longitude
44,524 bus stations on 896 routes (grey)
320 subway stations on 18 routes (blue)
Merge stations located at the same road intersection
System design: incident reports
Incident reports via Sina Weibo (a social network service with public posts)
Include official announcements and personal complaints
- Date
- Time
- Location of theft
10,529 records during the study period
System design: mobility characteristics
Travel time and frequency
Abnormal for thieves: they spend 3+ hours traveling and have higher trip frequencies, searching for victims and opportunities (indeed, picking pockets is hard work)
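A minimal sketch of how travel time and frequency could be aggregated per card and per day is shown below. The trip layout `(card_id, start, end)` and the function name `daily_mobility` are assumptions for illustration; the paper's exact feature definitions may differ.

```python
from collections import defaultdict
from datetime import datetime


def daily_mobility(trips):
    """trips: iterable of (card_id, start, end) with datetime endpoints.

    Returns {(card_id, date): (total_travel_hours, trip_count)}.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for card_id, start, end in trips:
        key = (card_id, start.date())
        totals[key][0] += (end - start).total_seconds() / 3600.0
        totals[key][1] += 1
    return {key: (hours, count) for key, (hours, count) in totals.items()}


# Made-up trips for one card on one day.
day_trips = [
    ("c1", datetime(2014, 4, 1, 8, 0), datetime(2014, 4, 1, 9, 30)),
    ("c1", datetime(2014, 4, 1, 11, 0), datetime(2014, 4, 1, 12, 0)),
    ("c1", datetime(2014, 4, 1, 14, 0), datetime(2014, 4, 1, 15, 0)),
]
day = datetime(2014, 4, 1).date()
stats = daily_mobility(day_trips)
```

In this toy input the card accumulates 3.5 hours over 3 trips, which falls in the "3+ hours" range the slide describes as abnormal.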
System design: mobility characteristics
Short rides
- Regular passengers prefer fewer transfers and have longer transit records
- Pickpockets often switch routes within a few stops to avoid detection
- Distribution of record distance: Gaussian, with the average increasing with trip length
System design: mobility characteristics
Functional transitions
- Normal passengers tend to follow sequential patterns of functional regions
- Pickpockets tend not to follow patterns, transitioning randomly between functional regions
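One way to quantify how "random" a passenger's region transitions are is the Shannon entropy of the observed region-to-region pairs. This measure is an assumption chosen for illustration and is not stated in the slides; the region labels are also made up.

```python
import math
from collections import Counter


def transition_entropy(regions):
    """Shannon entropy (in bits) of consecutive region-to-region transitions."""
    pairs = list(zip(regions, regions[1:]))
    if not pairs:
        return 0.0
    counts = Counter(pairs)
    total = len(pairs)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# A repetitive commuter versus a passenger who never repeats a transition.
commuter = ["residential", "business", "residential", "business", "residential"]
wanderer = ["residential", "commercial", "scenic", "business", "commercial"]

commuter_entropy = transition_entropy(commuter)
wanderer_entropy = transition_entropy(wanderer)
```

A passenger with a regular pattern repeats the same few transitions and scores low, while one who moves between regions without a pattern scores higher.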
System design: mobility characteristics
- Frequently visited regions (thieves focus on familiar regions)
- Deviation from social norms (empirically: time gaps of trips and region transitions)
- Historical behaviors (median of daily features, and number of days flagged as suspect)
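The historical-behavior summary (median of a daily feature plus the number of days flagged) could be computed roughly as below. The function name, the dictionary keys, and the sample values are hypothetical.

```python
from statistics import median


def historical_features(daily_values, daily_flags):
    """Summarize one card's history for a single daily feature.

    daily_values: the feature's value on each day.
    daily_flags: True on days the card was flagged as a suspect.
    """
    return {
        "median_daily_value": median(daily_values),
        "days_flagged": sum(1 for flagged in daily_flags if flagged),
    }


# Made-up history: daily travel hours over five days.
hours_per_day = [0.8, 3.5, 1.0, 4.2, 0.9]
flags = [False, True, False, True, False]
hist = historical_features(hours_per_day, flags)
```

The median is less sensitive than the mean to a single unusually long day, which is one plausible reason to prefer it for this kind of summary.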
System design: suspect identification
Two steps to distinguish pickpockets with high accuracy and few false positives
- Anomaly detection techniques identify anomalies
- Support vector machines (SVM) then distinguish the pickpockets among them
System design: suspect identification
Suppose the dataset is $\{(x_i, y_i)\}_{i=1}^{n}$, where $y_i = 1$ means passenger $i$ is a thief
Develop a predictive model of $y$ from the features $x$
Highly imbalanced problem! Thieves are a tiny fraction of the population
Two steps:
- Regular passenger filtering, which allows some false positives
- Suspect detection on the more balanced data, which is more accurate
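The two-step structure can be sketched as a simple cascade in which the second step only sees what the first step lets through. In this illustration the two predicates are plain threshold rules standing in for the One-Class SVM and the two-class SVM; the thresholds, the tuple layout, and the data are all made up.

```python
def two_step_detect(passengers, is_anomalous, is_suspect):
    """Step 1 keeps only anomalies; step 2 classifies the survivors."""
    anomalies = [p for p in passengers if is_anomalous(p)]
    suspects = [p for p in anomalies if is_suspect(p)]
    return anomalies, suspects


# Toy data: (card_id, daily_travel_hours, transfers_per_day).
passengers = [
    ("c1", 0.9, 1),
    ("c2", 1.1, 0),
    ("c3", 3.6, 2),  # long traveler with few transfers, e.g. a long commute
    ("c4", 4.0, 9),  # long traveler with many transfers
]
anomalies, suspects = two_step_detect(
    passengers,
    is_anomalous=lambda p: p[1] > 3.0,  # stand-in for the One-Class SVM filter
    is_suspect=lambda p: p[2] > 5,      # stand-in for the two-class SVM
)
```

The first step discards the two regular passengers, so the second step works on a much more balanced set, and card c3 shows why an anomaly is not automatically a suspect.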
System design: suspect identification - anomaly detection
Anomaly detection to filter out the vast majority of normal passengers
- Helps deal with the data imbalance problem
One-Class SVM with non-linear decision boundaries to detect outliers, using appropriate kernel functions and soft margins
The kernel is defined as $K(x, x') = \langle \phi(x), \phi(x') \rangle$, where $\phi$ maps the original features into a high-dimensional kernel space where the optimal decision boundary exists
System design: suspect identification - anomaly detection
The One-Class SVM is an algorithm that returns a function defining a “small” region capturing most of the data points; its optimization objective uses slack variables $\xi_i$
The parameter C controls the fraction of anomalies (i.e., points x such that g(x) = 1)
We use the Gaussian kernel and learn its best bandwidth parameter h by cross-validation
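For reference, a Gaussian kernel with bandwidth h can be written as a short function. The parametrization exp(-||x - y||^2 / (2 h^2)) used here is one common convention and is an assumption; the slide's own formula did not survive, so the paper may scale h differently.

```python
import math


def gaussian_kernel(x, y, h):
    """K(x, y) = exp(-||x - y||^2 / (2 h^2)), with bandwidth h (assumed form)."""
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * h * h))


same = gaussian_kernel([1.0, 2.0], [1.0, 2.0], h=1.0)
far = gaussian_kernel([0.0, 0.0], [3.0, 4.0], h=5.0)
```

Identical points give the maximum similarity of 1, and the value decays toward 0 as points move apart; a larger h makes the decay slower, which is why h is tuned by cross-validation.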
System design: suspect identification - suspect detection
By controlling the parameter C of the One-Class SVM, the number of false positives is limited and comparable with the number of suspects
Now use a two-class SVM with the same features and kernel
Optimize
Experimental results: baseline and evaluation
1.6 billion records from Beijing
Three kinds of baselines
New app for security personnel
Case study
Experimental results
Experiment on real-world datasets containing over 1.6 billion transit records
Split the data into training and test sets
- Historical training set: three months (from April to June, 2014)
- Test set: the following two weeks (in July 2014)
From the training set, we filter out passengers whose maximum number of daily records is no more than three
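The filtering rule above can be sketched as follows: count records per card per day, take each card's busiest day, and keep only cards whose maximum exceeds three. The input layout `(card_id, day)` and the function name `active_cards` are assumptions for illustration.

```python
from collections import Counter


def active_cards(records, threshold=3):
    """records: iterable of (card_id, day), one entry per AFC record.

    Returns the cards whose busiest day has more than `threshold` records;
    cards with a maximum of `threshold` or fewer are filtered out.
    """
    per_day = Counter(records)  # (card_id, day) -> number of records
    max_daily = {}
    for (card_id, _day), n in per_day.items():
        max_daily[card_id] = max(n, max_daily.get(card_id, 0))
    return {card for card, n in max_daily.items() if n > threshold}


# Made-up records: card "a" never exceeds three records a day, card "b" does.
sample = (
    [("a", "2014-04-01")] * 2
    + [("a", "2014-04-02")] * 3
    + [("b", "2014-04-01")] * 6
)
kept = active_cards(sample)
```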
Platform: Windows Server 2012 64-bit system (4 quad-core CPUs at 2.6 GHz, 128 GB main memory)
Experimental results: baselines
Classification methods (data imbalance led to inferior results for these methods)
- Logistic regression
- Decision trees
- Support vector machines
Anomaly detection
- One-Class SVM
- Local outlier factor (LOF), measuring the deviation of a data point from its neighbors
Two-step method (using LOF as the first step instead of One-Class SVM)
Optimize parameters with 10-fold cross-validation (80% / 20%) if applicable
Experimental results: performance
Evaluation of methods
- Precision and recall
- Precision is the number of correctly identified positives divided by the number of identified positive instances.
- Recall is the number of correctly identified positives divided by the number of all positive instances in the test set.
- F-score
- Run time
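The metrics above can be computed from the set of flagged cards and the set of known thieves, as in this small sketch. The balanced F1 form of the F-score is assumed here, since the slide does not say which variant was used; the card IDs are made up.

```python
def precision_recall_f1(predicted, actual):
    """predicted: cards flagged as suspects; actual: cards known to be thieves."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Four cards flagged, three true thieves, two of them caught.
precision, recall, f1 = precision_recall_f1(
    predicted={"c1", "c2", "c3", "c4"},
    actual={"c1", "c2", "c5"},
)
```

In this toy case precision is 2/4, recall is 2/3, and F1 is their harmonic mean, 4/7.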
Results: our method performs the best; anomaly detection (AD) performs better than the other one-step methods
Experimental results: baselines
Experimental results: feature analysis
Evaluate with different feature combinations
- Db: daily behavior
- Sc: social comparison
- Hb: historical behavior
More features improve accuracy
Accuracy is slightly lower on weekends
Experimental results: prototype
GUI with five basic components
- Statistics
- Passenger flow
- Active region
- Suspects list
- Selected suspect
Database updated every day
Detectives can get instant results
Experimental results: case study
Visitors: attractions; Shoppers: commercial areas; Thieves: random walk
Curves represent transitions
Color represents traffic density (red = high, green = low)
Future work: suggestions and limitations
The method relies on identifying individual smart cards in AFC data, so what if thieves learn not to use smart cards?
Getting new, accurate labels is hard
Caught thieves may not be representative of all thieves, because it is a needle-in-a-haystack problem.
Questions