Catch Me If You Can: Detecting Pickpocket Suspects …dz220/CS671/10Catch.pdfCatch Me If You Can:...

Catch Me If You Can: Detecting Pickpocket Suspects from Large-Scale Transit Records

Paper by Bowen Du, Chuanren Liu, Wenjun Zhou, Zhenshan Hou and Hui Xiong

Presented by Qi Dong, based on slides by Luyao Niu, Julang Ying and Guojun Wu

Introduction

Pickpockets in public transit

Image Source: https://www.google.com/search?q=pickpocket&biw=1920&bih=974&source=lnms&tbm=isch&sa=X&ved=0ahUKEwjAqtOW85_SAhUJw7wKHefSCMYQ_AUIBygC#imgrc=K3pd8op8KP3VCM:

Can you catch them?

Introduction: problem

Passengers in public transit suffer a lot from professional pickpockets

During the first 9 months of 2014 in Beijing, 350 pickpockets were caught in the subway system and 490 were caught on buses.

Big cities all over the world, such as Barcelona, Prague, Rome, and Paris, suffer from pickpockets too

Consequences

- public safety concerns

- decreased passenger satisfaction

Introduction: causes

Pickpocket problem in public transit

- Crowdedness and rush make people vulnerable to thieves

- “Professionals” are hard to catch, experienced and strategic

How to prevent theft in public transit?

- “Opportunity makes the thief”: thieves commit crimes when they can get what they want without being caught (U.S. Department of Justice)

- Catching thieves discourages further crime

Introduction: solution through data

Deploying more security personnel is costly

- A smart surveillance and tracking framework is desired

On the other hand, large automated fare collection (AFC) datasets are available

- Millions of passengers travel with smart cards

- 13 million records each day in Beijing, 1+ billion records used in this study

Data can reveal behavioral differences between thieves and passengers

- Notable thief behaviors: long travel times, unnecessary transfers, wandering

Introduction: challenges

No existing solution to detect pickpockets through AFC data

Challenges include

- Identifying useful features

- False positives (not all anomalies are thieves)

- Data imbalance (needle in a haystack)

- Decision making for real time security support

Introduction: pickpockets vs. anomalies

Suspect identification is not the same as anomaly detection

Outliers may be

- People from less crowded places (E-B)

- Thieves (A-C-D-B)

Related work

Using AFC records to detect people’s behavior

Passenger activity patterns

- About collective passenger behavior, not identifying individuals

Abnormal traveling behavior detection

- Location based

- Trajectory based

Related work: passenger activity patterns

Detecting collective patterns and groups of passengers

- Performance assessment of the transit network

- Identification and improvement of flawed transit routes

- Passenger flow forecasting (crowdedness estimates for stations in the transportation network)

- Transit behavior analysis (for different groups of passengers)

Related work: abnormal traveling behavior detection

Location based

- Detection of location-relevant events, e.g., accidents and protests

- Detection of gathering events, e.g., football matches and concerts

Trajectory based

- Detection of fraudulent taxi driving behavior

System design: overview

Features come from AFC records and geographical data

Ground truth comes from public pickpocket announcements

First filter out normal passengers and focus on anomalies

Then use supervised learning to find thieves

System design: components

Notice the “user in the loop”: police give feedback on whether a suspect is a thief

System design: data description

AFC Records

- Smart card ID, route number, event (boarding/exiting), station, time stamp

- 1.7 billion records between April and June 2014; 1.6 billion after removing duplicates

System design: data description

Trips

- Combine records into trips

- Allow transfers and connections within an empirical cutoff of 30 minutes
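
A minimal sketch of the trip-building step, assuming each ride is already a paired boarding/exiting record for one card. The tuple layout and the function name `build_trips` are hypothetical; only the 30-minute cutoff comes from the slide.

```python
from datetime import datetime, timedelta

# Empirical transfer cutoff stated on the slide.
TRANSFER_CUTOFF = timedelta(minutes=30)


def build_trips(rides, cutoff=TRANSFER_CUTOFF):
    """Group one card's rides into trips.

    `rides` is a list of (board_time, exit_time, route) tuples
    (a hypothetical layout). A ride joins the current trip when it
    starts no later than `cutoff` after the previous ride ended.
    """
    trips = []
    for ride in sorted(rides, key=lambda r: r[0]):
        if trips and ride[0] - trips[-1][-1][1] <= cutoff:
            trips[-1].append(ride)   # transfer or connection
        else:
            trips.append([ride])     # gap too long: start a new trip
    return trips


def at(hour, minute):
    return datetime(2014, 4, 1, hour, minute)


rides = [
    (at(8, 0), at(8, 20), "bus 1"),
    (at(8, 35), at(9, 0), "subway 2"),    # 15-minute gap: same trip
    (at(17, 30), at(18, 0), "subway 2"),  # hours later: new trip
]
trips = build_trips(rides)
```

Here the two morning rides merge into one trip and the evening ride forms a second trip.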

System design: geographical information

Points of Interest (POI) and functional regions

- Based on (Yuan et al. 2012)

- Segment by major road network

- Functional density as features (frequency/area)

- Categorize regions with LDA into ten functions (see table)
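
The density feature above can be sketched as POI counts per category divided by region area. The function name, the category labels, and the use of square kilometres are assumptions for illustration; the LDA step is not shown.

```python
from collections import Counter


def functional_density(poi_categories, area_km2):
    """Return {category: POI count / region area} for one region."""
    if area_km2 <= 0:
        raise ValueError("area must be positive")
    counts = Counter(poi_categories)
    return {category: n / area_km2 for category, n in counts.items()}


# Hypothetical region with six POIs spread over 2 km^2.
region_pois = ["restaurant", "restaurant", "school", "shop", "shop", "shop"]
density = functional_density(region_pois, area_km2=2.0)
```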

System design: geographical information

System design: geographical information

Public transit network information

- Bus/subway route number

- Sequence ID and name of station

- Latitude and longitude

44,524 bus stations on 896 routes (grey)

320 subway stations on 18 routes (blue)

Merge stations located at the same road intersection

System design: incident reports

Incident reports via Sina Weibo (a social network service with public posts)

Includes official announcements and personal complaints

- Date

- Time

- Location of theft

10,529 records during the study period

System design: mobility characteristics

Travel time and frequency

Abnormal for thieves: they spend 3+ hours traveling and ride more frequently, searching for victims and opportunities (indeed, picking pockets is hard work)

System design: mobility characteristics

Short rides

- Regular passengers prefer fewer transfers and have longer transit records

- Pickpockets often switch routes within a few stops to avoid detection

- Distribution of record distance: Gaussian, with the average increasing with trip length
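
A rough sketch of how daily travel time, ride frequency, and ride length could be turned into features for one card. The tuple layout and feature names are hypothetical, and summing in-vehicle time is only one possible reading of "travel time".

```python
from datetime import datetime


def daily_mobility_features(rides):
    """Summarize one card's rides for one day.

    `rides` is a list of (board_time, exit_time, stops_travelled)
    tuples (a hypothetical layout).
    """
    n = len(rides)
    total_minutes = sum((end - start).total_seconds() for start, end, _ in rides) / 60
    mean_stops = sum(stops for _, _, stops in rides) / n if n else 0.0
    return {
        "travel_minutes": total_minutes,
        "num_rides": n,
        "mean_stops_per_ride": mean_stops,
    }


def at(hour, minute):
    return datetime(2014, 4, 1, hour, minute)


# Four short rides with frequent route switches.
rides = [
    (at(9, 0), at(9, 50), 2),
    (at(10, 0), at(10, 50), 3),
    (at(11, 0), at(11, 50), 2),
    (at(12, 0), at(12, 50), 1),
]
features = daily_mobility_features(rides)
```

A card with more than three hours of travel spread over many rides of only a few stops each would match the suspicious pattern described above.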

System design: mobility characteristics

Functional transitions

- Normal passengers tend to have sequential patterns of functional regions

- Pickpockets tend not to follow patterns, transitioning randomly between functional regions

System design: mobility characteristics

- Frequently visited regions (thieves focus on familiar regions)

- Deviation from social norms (empirically: time gaps of trips and region transitions)

- Historical behaviors (median of daily features, and number of days flagged as suspect)
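
The historical-behavior features (median of daily features, number of days flagged) can be sketched as follows. The dictionary keys and the function name are hypothetical, and the daily flags are assumed to come from an earlier detection pass.

```python
from statistics import median


def historical_features(daily_features, daily_flags):
    """Aggregate per-day features for one card.

    `daily_features` is a list of dicts with identical keys (one per day);
    `daily_flags` is a list of booleans (True = flagged as suspect that day).
    """
    keys = daily_features[0].keys() if daily_features else []
    hist = {f"median_{k}": median(day[k] for day in daily_features) for k in keys}
    hist["days_flagged"] = sum(1 for flagged in daily_flags if flagged)
    return hist


days = [
    {"travel_minutes": 40, "num_rides": 2},
    {"travel_minutes": 200, "num_rides": 9},
    {"travel_minutes": 220, "num_rides": 7},
]
history = historical_features(days, [False, True, True])
```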

System design: suspect identification

Two steps to distinguish pickpockets with high accuracy and few false positives

- Use anomaly detection techniques to identify anomalies

- Further distinguish the pickpockets among them with support vector machines (SVM)

System design: suspect identification

Suppose the dataset is $\{(x_i, y_i)\}$, where $x_i$ are the features of passenger $i$ and $y_i = 1$ means thief

Develop a predictive model of $y$ from $x$

Highly imbalanced problem! Thieves are a tiny fraction of the population

Two steps:

- Regular Passenger Filtering that allows some false positives

- Suspect Detection on more balanced data that is more accurate

System design: suspect identification - anomaly detection

Anomaly detection to filter out the vast majority of normal passengers

- Helps deal with the data imbalance problem

One-Class SVM with non-linear decision boundaries to detect outliers, using appropriate kernel functions and soft margins

The kernel is defined as $K(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle$, where $\phi$ maps the original features into a high-dimensional kernel space in which the optimal decision boundary exists

System design: suspect identification - anomaly detection

The One-Class SVM is an algorithm that returns a function defining a “small” region that captures most of the data points; its optimization objective includes slack variables for the soft margin

The parameter C controls the fraction of anomalies (i.e., x such that g(x) = 1)

We use the Gaussian kernel and learn the best parameter h (bandwidth) by cross-validation
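
For reference, a standard soft-margin One-Class SVM formulation (Schölkopf et al.) is sketched below. The formulas on these slides are not preserved, so writing the penalty as C, the sign convention for g, and the exact dependence of the Gaussian kernel on h are assumptions and may differ from the paper's notation.

```latex
% Assumed form: standard one-class SVM with penalty written as C
\min_{w,\,\xi,\,\rho}\quad \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}\xi_i - \rho
\qquad\text{s.t.}\quad \langle w, \phi(x_i)\rangle \ge \rho - \xi_i,\quad \xi_i \ge 0,\quad i = 1,\dots,n

% Assumed decision rule: g(x) = 1 flags x as an anomaly
g(x) =
\begin{cases}
1 & \text{if } \langle w, \phi(x)\rangle < \rho \\
0 & \text{otherwise}
\end{cases}

% Assumed Gaussian kernel with bandwidth h
K(x_i, x_j) = \exp\!\left(-\frac{\|x_i - x_j\|^2}{2h^2}\right)
```

In Schölkopf's parameterization the penalty is $1/(\nu n)$, where $\nu$ upper-bounds the fraction of training points treated as outliers, so a larger penalty corresponds to fewer flagged anomalies.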

System design: suspect identification - suspect detection

By controlling the parameter C of the One-Class SVM, the number of false positives is limited and comparable to the number of suspects

Now use a two-class SVM with the same features and kernel, and optimize its objective on the filtered data
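
The objective on this slide is not preserved. A standard soft-margin two-class SVM, given here as an assumption about what was shown, uses labels recoded as $\tilde{y}_i = 2y_i - 1 \in \{-1, +1\}$ (so $\tilde{y}_i = +1$ means thief) and the same feature map $\phi$:

```latex
% Assumed form: standard soft-margin two-class SVM
\min_{w,\,b,\,\xi}\quad \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{m}\xi_i
\qquad\text{s.t.}\quad \tilde{y}_i\left(\langle w, \phi(x_i)\rangle + b\right) \ge 1 - \xi_i,\quad \xi_i \ge 0,\quad i = 1,\dots,m
```

Here $m$ is the number of passengers that remain after the anomaly-detection step, which is why the two classes are far more balanced than in the full population.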

Experimental results: baseline and evaluation

1.6 billion records from Beijing

Three kinds of baselines

New app for security personnel

Case study

Experimental results

Experiment on real-world datasets containing over 1.6 billion transit records

Split the data into training and test sets

- Historical training set: three months (from April to June, 2014)

- Test set comes from the following two weeks (in July 2014)

From the training set, we filter out passengers whose maximum number of daily records is no more than three
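
A small sketch of this pre-filter, assuming each AFC record has been reduced to a (card ID, date) pair. The function name and the sample cards are hypothetical.

```python
from collections import Counter


def active_cards(records, min_daily_records=4):
    """Return cards whose busiest day has more than three records.

    `records` is an iterable of (card_id, date) pairs, one per AFC record.
    """
    per_day = Counter(records)  # (card_id, date) -> number of records
    max_daily = {}
    for (card, _day), n in per_day.items():
        max_daily[card] = max(max_daily.get(card, 0), n)
    return {card for card, n in max_daily.items() if n >= min_daily_records}


records = (
    [("A", "2014-04-01")] * 2
    + [("A", "2014-04-02")] * 3   # card A never exceeds three records a day
    + [("B", "2014-04-01")] * 5   # card B has a five-record day
    + [("B", "2014-04-02")]
)
kept = active_cards(records)
```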

Platform: Windows Server 2012 64-bit system (4 CPUs, each quad-core at 2.6 GHz, and 128 GB main memory)

Experimental results: baselines

Classification methods (data imbalance led to inferior results in these methods)

- Logistic regression

- Decision trees

- Support vector machines

Anomaly detection

- One-Class SVM

- Local outlier factor (LOF): measures the deviation of a data point from its neighbors
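
To make the LOF baseline concrete, here is a compact pure-Python sketch of the usual definition (k-distance, reachability distance, local reachability density). It takes exactly k neighbors per point and breaks distance ties by index, which simplifies the textbook definition; the sample points are made up and it assumes no duplicate points.

```python
import math


def lof_scores(points, k=2):
    """Local outlier factor of each point, using its k nearest neighbors."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Indices of the k nearest neighbors of each point (excluding itself).
    knn = [
        sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
        for i in range(n)
    ]
    # k-distance: distance to the k-th nearest neighbor.
    k_dist = [dist[i][knn[i][-1]] for i in range(n)]

    def lrd(i):
        # Local reachability density: inverse of the mean reachability distance.
        reach = [max(k_dist[j], dist[i][j]) for j in knn[i]]
        return len(reach) / sum(reach)

    dens = [lrd(i) for i in range(n)]
    # LOF: mean neighbor density divided by the point's own density.
    return [sum(dens[j] for j in knn[i]) / (len(knn[i]) * dens[i]) for i in range(n)]


# A tight cluster plus one isolated point at index 5.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
scores = lof_scores(points, k=2)
```

Scores close to 1 indicate a point about as dense as its neighbors; the isolated point gets a much larger score.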

Two-step method (using LOF as the first step instead of One-Class SVM)

Optimize parameters with 10-fold cross-validation (80% / 20%) if applicable

Experimental results: performance

Evaluation of methods

- Precision and recall

  - Precision is the number of correctly identified positives divided by the number of identified positive instances.

  - Recall is the number of correctly identified positives divided by the number of all positive instances in the test set.

- F-score

- Run time
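
The metrics above can be computed as in this sketch. It assumes binary labels with 1 = thief and takes "F-score" to mean the F1 score (harmonic mean of precision and recall); the sample labels are made up.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # correctly identified
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed positives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

With three true positives, one false positive, and one false negative, all three metrics come out to 0.75.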

Results: our method performs best; anomaly detection (AD) performs better than the other one-step methods

Experimental results: baselines

Experimental results: feature analysis

Evaluate with different feature combinations

- Db: daily behavior

- Sc: social comparison

- Hb: historical behavior

More features improve accuracy

Accuracy is slightly lower on weekends

Experimental results: prototype

GUI with five basic components

- Statistics

- Passenger flow

- Active region

- Suspects list

- Selected suspect

Database updated every day

Detectives can get instant results

Experimental results: case study

Visitors: attractions; Shoppers: commercial area; Thieves: random walk

Curves represent transitions; color represents traffic density (red = high, green = low)

Future work: suggestions and limitations

They rely on identifying individual smart cards in AFC records, so what if thieves learn not to use smart cards?

Getting new, accurate labels is hard

Caught thieves may not be representative of all thieves, because it is a needle-in-a-haystack problem.

Questions
