Clustering of Trajectory Data Obtained from Soccer Game Record - A First Step to Behavioral Modeling
Shoji Hirano Shusaku [email protected] [email protected]
Dept. of Medical Informatics, Shimane Univ. School of Medicine, Japan
Outline
Introduction
Data Structure
Method
Experimental Results
Conclusions and Future Work
Introduction
Clustering of Spatio-temporal Data
  Provides a way to discover interesting characteristics about the motion of targets
  Related fields: meteorology, medical image analysis, sports, crime research, etc.
Approaches
  Spatial clustering + temporal continuity trace (e.g. tracking of moving objects)
  Spatial clustering based on temporal correlation (e.g. fMRI analysis)
  Spatial clustering + observation of the temporal changes of the clusters (e.g. observation of climate regimes)
Objective
Development of a clustering method for trajectories with a multiscale structural comparison scheme
  Compare trajectories according to both local and global views.
  Visualize common characteristics of trajectories.
Application: clustering of trajectories of passes in soccer game records
  Discovery of interesting spatio-temporal patterns of passes which may reflect the strategy and tactics of the team
  Globally similar passes: strategy of the team (e.g. attack from the right side)
  Locally similar passes: tactics of the team (e.g. frequent use of one-two passes)
Data Structure
Soccer game records (provided for research purposes by DataStadium Inc., Japan)
Series Time Action Team1 Player1 Team2 Player2 X1 Y1 X2 Y2
1 10:11:01 KICK OFF Fra -6 0
1 10:11:01 PASS Fra 12 Fra 21 -6 -11 6 -47
1 10:11:01 PASS Fra 21 Fra 21 6 -47 80 71
1 10:11:03 PASS Fra 21 Fra 17 80 71 31 841
1 10:11:04 TRAP Fra 17 Fra 31 841
…
2 10:12:00 THROW IN Jap 11 3500 150
…
37 12:03:04 P END Fra 19 2432 71
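A minimal sketch of how such records could be turned into ball trajectories, assuming each event has already been parsed into a tuple. The tuple layout `(series, time, action, x1, y1)` and the function names are illustrative assumptions, not part of the original data format.

```python
from collections import defaultdict

# Hypothetical in-memory form of the game record: one tuple per event,
# (series, time, action, x1, y1). Values are taken from the rows shown above.
events = [
    (1, "10:11:01", "KICK OFF", -6, 0),
    (1, "10:11:01", "PASS", -6, -11),
    (1, "10:11:01", "PASS", 6, -47),
    (1, "10:11:03", "PASS", 80, 71),
    (1, "10:11:04", "TRAP", 31, 841),
    (2, "10:12:00", "THROW IN", 3500, 150),
]

def build_trajectories(events):
    """Group events by series, keeping the ordered (x, y) ball positions."""
    trajectories = defaultdict(list)
    for series, _time, _action, x, y in events:
        trajectories[series].append((x, y))
    return dict(trajectories)

def goal_series(events):
    """Return the series numbers that contain an 'IN GOAL' event."""
    return {series for series, _t, action, _x, _y in events if action == "IN GOAL"}

trajectories = build_trajectories(events)
```

Each series then becomes one irregularly sampled 2-D sequence, and `goal_series` selects the series that end in a goal.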
Data Structure
Field geometry and pass sequence
[Figure: field coordinate system with X ranging from -3500 to 3500 and Y from -5346 to 5346, and an example pass sequence plotted over time t from PASS start to IN GOAL]
Pass sequence clustering: Problems
Irregularly sampled spatio-temporal sequence
  A data point is generated when a player interacts with the ball
  High interaction -> dense data
  Low interaction -> sparse data
Need for multiscale observation
  Strategy -> global pass feature
  Tactics -> local pass feature
  Both exist concurrently
The comparison scale must be changed locally according to the granularity of the data and the type of events
[Figure: example pass sequence with sparse and dense regions marked]
Trajectory Mining
Preprocessing
Segmentation and Generation of Multiscale Trajectories
Segment Hierarchy Trace and Matching
Calculation of Dissimilarities
Clustering of Trajectories
Method: Multiscale Matching
A pattern matching method that compares structural similarity of planar curves across multiple observation scales
Able to compare objects by partly changing observation scales
Simultaneously compares both global and local similarities
[Figure: segments of Sequence A and Sequence B matched in pairs across scales]
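Multiscale Description (Witkin et al. 1984, Mokhtarian et al. 1986)
Describe convex/concave structure at multiple scales
Sequence description: $c(t) = (x(t), y(t))$, where $t$ is the course parameter
Sequence at scale $\sigma$: $c(t, \sigma) = (X(t, \sigma), Y(t, \sigma))$
$X(t, \sigma) = x(t) \otimes g(t, \sigma) = \int_{-\infty}^{\infty} x(u) \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(t-u)^2}{2\sigma^2}\right) du$, and $Y(t, \sigma)$ is obtained in the same way from $y(t)$
The scale $\sigma$ controls the degree of smoothing: small $\sigma$ shows local features, large $\sigma$ shows global features
[Figure: the curve $c(t, 0)$ and its smoothed versions $c(t, \sigma)$ at increasing scale]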
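The smoothing step can be sketched as a discrete Gaussian convolution. This is a minimal illustration, assuming the trajectory has already been resampled at unit steps of $t$. The kernel truncation at four standard deviations, the edge padding, and the function names are choices made for this sketch only.

```python
import numpy as np

def gaussian_kernel(sigma, radius=None):
    """Sampled, normalized Gaussian g(t, sigma), truncated at +/- radius."""
    if radius is None:
        radius = int(np.ceil(4 * sigma))  # assumption: 4-sigma truncation
    u = np.arange(-radius, radius + 1)
    g = np.exp(-u**2 / (2.0 * sigma**2))
    return g / g.sum()

def smooth_trajectory(xy, sigma):
    """Return c(t, sigma) = (X(t, sigma), Y(t, sigma)) for an (n, 2) trajectory."""
    xy = np.asarray(xy, dtype=float)
    if sigma <= 0:
        return xy.copy()  # scale 0 is the original sequence
    g = gaussian_kernel(sigma)
    radius = len(g) // 2
    # Repeat the end points so that the output keeps the input length.
    padded = np.pad(xy, ((radius, radius), (0, 0)), mode="edge")
    X = np.convolve(padded[:, 0], g, mode="valid")
    Y = np.convolve(padded[:, 1], g, mode="valid")
    return np.column_stack([X, Y])
```

Calling `smooth_trajectory(xy, sigma)` for an increasing list of `sigma` values gives one version of the trajectory per scale.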
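Multiscale Matching based on Convex/Concave Structure of Segments (Ueda et al. 1990)
Segment: partial sequence between adjacent inflection points
Curvature at scale $\sigma$: $K(t, \sigma) = \frac{X'Y'' - X''Y'}{(X'^2 + Y'^2)^{3/2}}$
Derivatives: $X^{(m)}(t, \sigma) = \frac{\partial^m X(t, \sigma)}{\partial t^m} = x(t) \otimes g^{(m)}(t, \sigma)$
Inflection point $c_i(t, \sigma)$: point where the curvature changes sign, $K(t+1, \sigma) \cdot K(t, \sigma) < 0$
Represent a sequence as a set of segments: $A^{(\sigma)} = \{ a_i^{(\sigma)} \mid i = 1, 2, \ldots, N \}$
[Figure: segments $a_1^{(0)}, a_2^{(0)}, \ldots$ of $A^{(0)}$ between inflection points, and the segments of $A^{(\sigma)}$ at a higher scale]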
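The segmentation at one scale can be sketched as follows. This is a simplified illustration: it estimates the derivatives with finite differences on an already smoothed curve instead of convolving with Gaussian derivatives, and the function names are hypothetical.

```python
import numpy as np

def curvature(xy):
    """Curvature K = (X'Y'' - X''Y') / (X'^2 + Y'^2)^(3/2) along an (n, 2) curve."""
    xy = np.asarray(xy, dtype=float)
    x1, y1 = np.gradient(xy[:, 0]), np.gradient(xy[:, 1])
    x2, y2 = np.gradient(x1), np.gradient(y1)
    num = x1 * y2 - x2 * y1
    denom = (x1**2 + y1**2) ** 1.5
    k = np.zeros_like(denom)
    moving = denom > 0          # curvature is left at 0 where the curve does not move
    k[moving] = num[moving] / denom[moving]
    return k

def inflection_indices(k):
    """Indices of the first sample after each sign change of the curvature."""
    return np.nonzero(k[:-1] * k[1:] < 0)[0] + 1

def split_segments(xy, k):
    """Return segments as (start, end) index pairs between inflection points."""
    cuts = [0, *inflection_indices(k), len(xy) - 1]
    return [(int(s), int(e)) for s, e in zip(cuts[:-1], cuts[1:]) if e > s]
```

For an S-shaped curve such as one period of a sine wave, this yields a single inflection point and therefore two segments, one convex and one concave.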
Matching Procedure
[Figure: Sequence A and Sequence B, both ending at IN GOAL, divided at inflection points into segments at scale 0 (A0(0) to A4(0), B0(0) to B6(0)), scale 1 (A0(1) to A2(1), B0(1) to B4(1)), and scale 2 (A0(2) to A2(2), B0(2) to B2(2))]
Segment Dissimilarity
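Dissimilarity of segments $a_i^{(k)}$ and $b_j^{(h)}$:
$d(a_i^{(k)}, b_j^{(h)}) = \max\left( \left| \theta_{a_i}^{(k)} - \theta_{b_j}^{(h)} \right|, \left| \frac{l_{a_i}^{(k)}}{L_A^{(k)}} - \frac{l_{b_j}^{(h)}}{L_B^{(h)}} \right| \right)$
where $\theta$ is the rotation angle of a segment, $l$ is its length, and $L$ is the total length of the sequence at the same scale
Dissimilarity of sequences:
$D(A, B) = \sum_{p=1}^{P} d(a_p^{(0)}, b_p^{(0)})$
P: the number of matched pairs
[Figure: segments $a_i^{(k)}$ and $b_j^{(h)}$ annotated with their rotation angles and lengths]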
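The two dissimilarities can be sketched in a few lines. The `Segment` class and its fields are hypothetical containers for the rotation angle and length of a segment; how these values are measured on the curve is not specified here and is left outside the sketch. The matched pairs are assumed to be already expressed at scale 0.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    rotation_angle: float  # theta, in radians (assumed unit)
    length: float          # l, in the same unit as the sequence length

def segment_dissimilarity(a, b, total_len_a, total_len_b):
    """d(a, b): the larger of the angle difference and the relative length difference."""
    angle_diff = abs(a.rotation_angle - b.rotation_angle)
    length_diff = abs(a.length / total_len_a - b.length / total_len_b)
    return max(angle_diff, length_diff)

def sequence_dissimilarity(matched_pairs, total_len_a, total_len_b):
    """D(A, B): sum of segment dissimilarities over the P matched pairs."""
    return sum(
        segment_dissimilarity(a, b, total_len_a, total_len_b)
        for a, b in matched_pairs
    )
```

Dividing each length by the total sequence length makes the length term independent of the absolute size of the trajectory.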
Indiscernibility-based Clustering: Overview
1. Assignment of initial equivalence relations (ERs)
   Assign an initial ER to each of the N objects.
   Each ER independently performs a binary classification, similar or dissimilar, based on relative proximity.
   Objects that are indiscernible under all of the N ERs form a cluster.
2. Iterative refinement of the initial ERs
   For each pair of objects, compute the ratio of ERs that are able to discriminate them (indiscernibility degree).
   If the ratio is small, assume that these ERs give too fine a classification and disable their ability to discriminate the pair.
   Iterate step 2 until the clusters become stable.
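The two steps can be illustrated with a heavily simplified sketch. It is not the authors' implementation. It assumes that each initial ER labels an object "similar" when its dissimilarity to the ER's own object is within a given `radius`, and that an ER is disabled for a pair by marking both objects as "similar". Both rules, the threshold `th`, and the function name are assumptions made for illustration.

```python
import numpy as np

def indiscernibility_clustering(D, radius, th=0.3, max_iter=100):
    """Cluster N objects from an (N, N) dissimilarity matrix D."""
    D = np.asarray(D, dtype=float)
    n = len(D)
    r = np.broadcast_to(np.asarray(radius, dtype=float), (n,)).reshape(n, 1)
    # Step 1: R[k, i] is True when ER k classifies object i as similar to object k.
    R = D <= r
    # Step 2: refine until no ER changes (entries only switch False -> True).
    for _ in range(max_iter):
        changed = False
        for i in range(n):
            for j in range(i + 1, n):
                discriminating = R[:, i] != R[:, j]
                ratio = discriminating.mean()
                if 0 < ratio < th:
                    # Few ERs separate the pair: treat them as too fine.
                    R[discriminating, i] = True
                    R[discriminating, j] = True
                    changed = True
        if not changed:
            break
    # Objects with identical classifications under all ERs share a cluster.
    ids = {}
    return [ids.setdefault(R[:, i].tobytes(), len(ids)) for i in range(n)]
```

With `th=0` no refinement takes place and only objects that every initial ER classifies identically are grouped. Larger values of `th` merge pairs that are separated by only a few ERs.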
Experiments
Data
  Game records of the FIFA World Cup 2002 (64 games, including all heats and finals)
  Number of goals: 168 (own goals excluded)
Procedure
  Select the series containing an ‘IN GOAL’ event and generate a total of 168 trajectories of 2-D ball location.
  For every possible pair of trajectories, calculate the dissimilarity using multiscale matching.
  Group the trajectories using the obtained dissimilarities and indiscernibility-based clustering.
Experimental Results
Cluster Constitution
Cluster  Cases
1        87
2        24
3        17
4        16
5        8
6        4
7        3
8        3
9        2
10       2
11       2
12       1
Note: 55.2% (7839/14196) of the triplets in the dissimilarity matrix did not satisfy the triangle inequality because of matching failures.
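One way such a check could be carried out is sketched below, assuming matching failures are stored as infinite dissimilarities. The function name is hypothetical, and the sketch is not claimed to reproduce the counts reported above.

```python
from itertools import combinations

def count_triangle_violations(D):
    """Count object triplets whose pairwise dissimilarities break the triangle inequality."""
    n = len(D)
    violations = total = 0
    for i, j, k in combinations(range(n), 3):
        a, b, c = D[i][j], D[j][k], D[i][k]
        total += 1
        # A single infinite entry next to two finite ones always violates.
        if a > b + c or b > a + c or c > a + b:
            violations += 1
    return violations, total
```

A large share of violations indicates that the dissimilarity is not a metric, so clustering methods that rely on metric properties may behave poorly on such a matrix.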
Experimental Results (cont’d)
Cluster 1 (87 cases)
[Figure: matching result for two example trajectories ending at IN GOAL, Turkey vs Japan and Italy vs Korea]
Corner Kick – Goal
Europe: 45, South America: 24, Asia: 9
Experimental Results (cont’d)
Cluster 2 (24 cases)
[Figure: matching result for two example trajectories ending at IN GOAL, Poland vs Portugal and Germany vs Cameroon]
Complex Pass – Side attack- Goal
Europe: 13, South America: 7, Asia: 3
Experimental Results (cont’d)
Cluster 4 (16 cases)
[Figure: matching result for two example trajectories ending at IN GOAL, Slovenia vs Paraguay and China vs Turkey]
Side Change – Centering/Dribble – Goal
Europe: 10, South America: 4, Africa: 2
Experimental Results (cont’d)
Cluster 3 (17 cases)
Side Change – Centering/Dribble – Goal
(Intermediate cases between Cluster 2 and 4)
Europe: 10, South America: 2, Africa: 2, Asia: 2
Summary of Experimental Results
Goal success patterns can be classified into 4 major groups (with 8 minor patterns)
  Patterns: complexity of pass sequences
With additional information
  Dribble/Centering/Side change: European style
  However, the differences are not statistically significant.
Key is “Side Change”
  Players (defenders) should watch the side opposite to the ball movement.
  The higher the complexity of pass transactions, the more the goal success rate gains from a side change.
Conclusions
Presented a new scheme of spatio-temporal data mining
  Grouped similar patterns using multiscale comparison and indiscernibility-based clustering techniques.
  Visualized similar patterns using matching results.
Application to real World Cup data
  Grouping and visualization of interesting pass patterns, e.g. complex pass -> side attack -> goal
Future Work
Technical issues
  Numerical evaluation
  Validation and improvement of the segment dissimilarity measure; inclusion of event type in the dissimilarity
Apply the proposed method to all pass series, including non-‘IN GOAL’ series
  Differences between success and failure are very small. This suggests that the patterns of soccer attack are simple.
Apply the proposed method to the medical environment
  Trajectories of laboratory examinations (IEEE ICDM06)
  Trajectories of patients’ movement: patient safety
Matching Criteria
Criteria for determining the best set of segment pairs
  Complete match: the original sequence should be correctly formed by concatenating the selected segments without any overlaps or gaps
  Minimization of total segment difference: $D(A, B) = \sum_{p=1}^{P} d(a_p^{(0)}, b_p^{(0)})$
  $d(a_p^{(0)}, b_p^{(0)})$: dissimilarity of segments $a_p^{(0)}$ and $b_p^{(0)}$
  P: number of matched segment pairs
[Figure: sequences A (segments a1 to a5) and B (segments b1 to b5), illustrating an overlap and a gap between selected segments]
Matching Failure Problem in MSM
Theoretically, any sequence eventually becomes a single segment at sufficiently high scales. Therefore, any pair of sequences should be successfully matched.
Practically, there must be an upper limit on the scale in order to reduce computational complexity. Therefore, the number of segments can differ even at the highest scale.
If matching is not successful, the method should return an infinite dissimilarity or a magic value that indicates matching failure.
[Figure: segment hierarchy from Scale 1 to Scale n, showing a match case and a no-match case]
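The complete-match criterion and the failure value can be sketched together. This is an illustration under stated assumptions: segments are represented as `(start, end)` index ranges on the scale-0 sequence, adjacent segments share their boundary point, and the candidate sets with their total dissimilarities are produced elsewhere. The function names are hypothetical.

```python
import math

def tiles_sequence(intervals, n_points):
    """True if the segments cover points 0..n_points-1 with no overlap and no gap."""
    ordered = sorted(intervals)
    if not ordered or ordered[0][0] != 0 or ordered[-1][1] != n_points - 1:
        return False
    # Each segment must start exactly where the previous one ends.
    return all(prev[1] == cur[0] for prev, cur in zip(ordered, ordered[1:]))

def best_match_dissimilarity(candidates, n_a, n_b):
    """Pick the smallest total dissimilarity among complete matches.

    candidates: iterable of (intervals_a, intervals_b, total_dissimilarity).
    Returns math.inf when no candidate is a complete match (matching failure).
    """
    complete = [
        total
        for ia, ib, total in candidates
        if len(ia) == len(ib) and tiles_sequence(ia, n_a) and tiles_sequence(ib, n_b)
    ]
    return min(complete) if complete else math.inf
```

Returning `math.inf` keeps failed pairs distinguishable from genuinely large dissimilarities, at the cost of a dissimilarity matrix that is no longer guaranteed to satisfy the triangle inequality.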