
Clustering of Trajectory Data obtained from Soccer Game Record: A First Step to Behavioral Modeling

Shoji Hirano, Shusaku Tsumoto

Dept. of Medical Informatics, Shimane Univ. School of Medicine, Japan

Outline

Introduction
Data Structure
Method
Experimental Results
Conclusions and Future Work

Introduction

Clustering of Spatio-temporal Data
  Provides a way to discover interesting characteristics about the motion of targets
  Related fields: meteorology, medical image analysis, sports, crime research, etc.

Approaches
  Spatial clustering + temporal continuity trace (e.g. tracking of moving objects)
  Spatial clustering based on temporal correlation (e.g. fMRI analysis)
  Spatial clustering + observation of the temporal changes of the clusters (e.g. observation of climate regimes)

Objective

Development of a clustering method for trajectories with a multiscale structural comparison scheme
  Compare trajectories according to both local and global views
  Visualize common characteristics of trajectories

Application: Clustering of trajectories of passes in soccer game records
  Discovery of interesting spatio-temporal patterns of passes which may reflect the strategy and tactics of the team
  Globally similar passes: strategy of the team (e.g. attack from the right side)
  Locally similar passes: tactics of the team (e.g. frequent use of one-two passes)

Data Structure

Soccer game records (provided for research purposes by DataStadium Inc., Japan)

Series  Time      Action    Team1  Player1  Team2  Player2  X1    Y1   X2  Y2
1       10:11:01  KICK OFF  Fra                             -6    0
1       10:11:01  PASS      Fra    12       Fra    21       -6    -11  6   -47
1       10:11:01  PASS      Fra    21       Fra    21       6     -47  80  71
1       10:11:03  PASS      Fra    21       Fra    17       80    71   31  841
1       10:11:04  TRAP      Fra    17       Fra             31    841
…
2       10:12:00  THROW IN  Jap    11                       3500  150
…
37      12:03:04  P END     Fra    19                       2432  71
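As a rough illustration of how such records could be turned into a ball trajectory, the sketch below stores each row in a small record type and collects the 2-D ball locations of one series. The `Event` class, its field names and `ball_trajectory` are hypothetical and not part of the original data format; the sample rows follow one reading of the flattened table above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Event:
    """One row of the game record (field names are illustrative assumptions)."""
    series: int
    time: str
    action: str
    team1: str
    player1: Optional[int]
    x1: int
    y1: int
    team2: Optional[str] = None
    player2: Optional[int] = None
    x2: Optional[int] = None
    y2: Optional[int] = None

def ball_trajectory(events: List[Event], series: int) -> List[Tuple[int, int]]:
    """Collect the (x, y) ball locations of one series in recorded order,
    skipping a location that merely repeats the previous one."""
    points: List[Tuple[int, int]] = []
    for e in events:
        if e.series != series:
            continue
        if not points or points[-1] != (e.x1, e.y1):
            points.append((e.x1, e.y1))
        if e.x2 is not None and e.y2 is not None and points[-1] != (e.x2, e.y2):
            points.append((e.x2, e.y2))
    return points

events = [
    Event(1, "10:11:01", "PASS", "Fra", 12, -6, -11, "Fra", 21, 6, -47),
    Event(1, "10:11:01", "PASS", "Fra", 21, 6, -47, "Fra", 21, 80, 71),
    Event(1, "10:11:03", "PASS", "Fra", 21, 80, 71, "Fra", 17, 31, 841),
]
print(ball_trajectory(events, 1))
```

Because each pass ends where the next one starts, the repeated end points are dropped and the three passes yield four trajectory points.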

Data Structure

Field geometry and Pass sequence

[Figure: field geometry, with X ranging from -3500 to 3500 and Y from -5346 to 5346, and an example pass sequence plotted from PASS start to IN GOAL along time t]

Pass sequence clustering: Problems

Irregularly-sampled spatio-temporal sequence
  A data point is generated when a player interacts with the ball
  High interaction -> dense data
  Low interaction -> sparse data

Need for multiscale observation
  Strategy -> global pass feature
  Tactics -> local pass feature
  Both exist concurrently

The comparison scale must be changed partially, according to the granularity of the data and the type of events

[Figure: example pass sequence with sparse and dense regions of data points]

Trajectory Mining

Preprocessing

Segmentation and Generation of Multiscale Trajectories

Segment Hierarchy Trace and Matching

Calculation of Dissimilarities

Clustering of Trajectories

Method: Multiscale Matching

A pattern matching method that compares structural similarity of planar curves across multiple observation scales

Able to compare objects by partly changing observation scales

Simultaneously compares both global and local similarities

[Figure: segments of Sequence A and Sequence B matched in pairs across scales]

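Multiscale Description (Witkin et al. 1984, Mokhtarian et al. 1986)

Describe convex/concave structure at multiple scales

Sequence description ($t$: course parameter):
$c(t) = (x(t), y(t))$

Sequence at scale $\sigma$:
$c(t,\sigma) = (X(t,\sigma), Y(t,\sigma))$

$X(t,\sigma) = x(t) \otimes g(t,\sigma) = \int_{-\infty}^{\infty} x(u) \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(t-u)^2}{2\sigma^2}\right) du$

$Y(t,\sigma)$ is obtained from $y(t)$ in the same way

Scale $\sigma$ controls the degree of smoothing: small $\sigma$ gives local features, large $\sigma$ gives global features

[Figure: the original sequence $c(t,0)$ and its smoothed version $c(t,\sigma)$ along the scale axis]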
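The smoothing step, convolving each coordinate with a Gaussian of width σ, can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the trajectory has already been resampled at uniform steps of the course parameter t, truncates the kernel at about 3σ, renormalizes the discrete weights, and repeats the end points at the boundaries. The function names are hypothetical.

```python
import math

def gaussian_kernel(sigma):
    """Discrete Gaussian weights truncated at about 3*sigma, normalized to sum to 1."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(u * u) / (2 * sigma * sigma))
               for u in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth(values, sigma):
    """Convolve a coordinate sequence with the Gaussian kernel.
    Boundary handling (repeating the end samples) is an assumption."""
    if sigma <= 0:
        return list(values)  # scale 0: the original sequence
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(values)
    out = []
    for t in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            u = min(max(t + k - radius, 0), n - 1)  # clamp the index at the ends
            acc += w * values[u]
        out.append(acc)
    return out

def trajectory_at_scale(points, sigma):
    """Return c(t, sigma) = (X(t, sigma), Y(t, sigma)) for a list of (x, y) points."""
    xs = smooth([p[0] for p in points], sigma)
    ys = smooth([p[1] for p in points], sigma)
    return list(zip(xs, ys))
```

Calling `trajectory_at_scale` with increasing σ values produces progressively smoother versions of the same trajectory, which is what the later segmentation step operates on.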

Multiscale Matching based on Convex/Concave Structure of Segments (Ueda et al. 1990)

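Segment: partial sequence between adjacent inflection points

Curvature $K(t,\sigma)$ at scale $\sigma$:
$K(t,\sigma) = \frac{X'Y'' - X''Y'}{(X'^2 + Y'^2)^{3/2}}$

where $X'$ and $X''$ denote $X^{(1)}(t,\sigma)$ and $X^{(2)}(t,\sigma)$, with
$X^{(m)}(t,\sigma) = \frac{\partial^m X(t,\sigma)}{\partial t^m} = x(t) \otimes g^{(m)}(t,\sigma)$

Inflection point $c_i(t,\sigma)$: a point where the curvature changes sign between adjacent samples, i.e. the product of $K(t,\sigma)$ and the curvature at the neighboring sample is negative

Represent a sequence as a set of segments:
$A^{(\sigma)} = \{ a_i^{(\sigma)} \mid i = 1, 2, \ldots, N \}$

[Figure: inflection points $c_i(t,\sigma)$ traced along the scale axis, with segments $a_1^{(0)}$, $a_2^{(0)}$ of $A^{(0)}$ and the segment set $A^{(\sigma)}$]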
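To make the segmentation step concrete, the following sketch estimates the curvature of an already smoothed trajectory and cuts it at sign changes. It is an approximation under stated assumptions: derivatives are taken by central differences on uniformly spaced samples rather than by convolution with Gaussian derivatives, and a curvature that is exactly zero at a sample is not counted as a sign change. All function names are hypothetical.

```python
def curvature(points):
    """Discrete curvature K = (X'Y'' - X''Y') / (X'^2 + Y'^2)^(3/2).
    The result has one value per interior point: ks[i] belongs to points[i + 1]."""
    ks = []
    for t in range(1, len(points) - 1):
        (x0, y0), (x1, y1), (x2, y2) = points[t - 1], points[t], points[t + 1]
        dx, dy = (x2 - x0) / 2.0, (y2 - y0) / 2.0        # first derivatives
        ddx, ddy = x2 - 2 * x1 + x0, y2 - 2 * y1 + y0    # second derivatives
        denom = (dx * dx + dy * dy) ** 1.5
        ks.append(0.0 if denom == 0 else (dx * ddy - ddx * dy) / denom)
    return ks

def inflection_indices(points):
    """Indices of points where the curvature sign differs from the previous sample."""
    ks = curvature(points)
    return [i + 1 for i in range(1, len(ks)) if ks[i - 1] * ks[i] < 0]

def split_into_segments(points):
    """Cut the trajectory at inflection points; adjacent segments share the cut point."""
    cuts = [0] + inflection_indices(points) + [len(points) - 1]
    return [points[a:b + 1] for a, b in zip(cuts, cuts[1:]) if b > a]

# An S-shaped curve (y = x^3) changes from concave to convex once.
s_curve = [(x, x ** 3) for x in (-2.5, -1.5, -0.5, 0.5, 1.5, 2.5)]
print(len(split_into_segments(s_curve)))
```

Repeating this on the trajectory smoothed at several scales gives the per-scale segment sets that the matching step compares.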

Matching Procedure

[Figure: Sequence A and Sequence B, both ending at IN GOAL, divided at inflection points into segments at Scale 0 (A0(0) to A4(0), B0(0) to B6(0)), Scale 1 (A0(1) to A2(1), B0(1) to B4(1)) and Scale 2 (A0(2) to A2(2), B0(2) to B2(2)), plotted along t]

Segment Dissimilarity

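Dissimilarity of segments $a_i^{(k)}$ and $b_j^{(h)}$:
$d(a_i^{(k)}, b_j^{(h)}) = \max(\theta, l)$

Rotation angle:
$\theta = \left| \theta_{a_i}^{(k)} - \theta_{b_j}^{(h)} \right|$

Length:
$l = \left| \frac{l_{a_i}^{(k)}}{L_A^{(k)}} - \frac{l_{b_j}^{(h)}}{L_B^{(h)}} \right|$

where $l_{a_i}^{(k)}$ and $l_{b_j}^{(h)}$ are the segment lengths and $L_A^{(k)}$, $L_B^{(h)}$ the total lengths of sequences A and B at those scales

Dissimilarity of sequences:
$D(A,B) = \sum_{p=1}^{P} d(a_p^{(0)}, b_p^{(0)})$

$P$: the number of matched pairs

[Figure: segment $a_i^{(k)}$ and segment $b_j^{(h)}$ annotated with their lengths and rotation angles]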
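A small numerical sketch may help to show how the two levels of dissimilarity combine. It assumes one plausible reading of the slide: the segment dissimilarity is the larger of the absolute rotation-angle difference and the absolute difference of length ratios (segment length divided by total sequence length), and the sequence dissimilarity is the sum over matched pairs. The `Segment` class and both function names are hypothetical, and no normalization of the angle term is applied here.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    rotation_angle: float  # turning angle along the segment, in radians (assumed unit)
    length: float          # arc length of the segment

def segment_dissimilarity(a, b, total_a, total_b):
    """max(theta, l): angle difference vs. difference of relative lengths (assumed form)."""
    theta = abs(a.rotation_angle - b.rotation_angle)
    rel_len = abs(a.length / total_a - b.length / total_b)
    return max(theta, rel_len)

def sequence_dissimilarity(matched_pairs, total_a, total_b):
    """D(A, B): sum of segment dissimilarities over the P matched pairs."""
    return sum(segment_dissimilarity(a, b, total_a, total_b)
               for a, b in matched_pairs)

pairs = [
    (Segment(1.0, 2.0), Segment(0.5, 3.0)),
    (Segment(0.2, 8.0), Segment(0.2, 7.0)),
]
print(sequence_dissimilarity(pairs, total_a=10.0, total_b=10.0))
```

In the first pair the angle term dominates and in the second pair the length term does, which is the behavior a max-based combination is meant to give.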

Indiscernibility-based Clustering: Overview

1. Assignment of initial equivalence relations (ERs)
  Assign an initial ER to each of the N objects.
  Each ER independently performs a binary classification, similar or dissimilar, based on relative proximity.
  Objects that are indiscernible under all of the N ERs form a cluster.

2. Iterative refinement of initial ERs
  For each pair of objects, compute the ratio of ERs that are able to discriminate them (indiscernibility degree).
  If the ratio is small, assume that these ERs give too fine a classification and disable their discrimination ability.
  Iterate step 2 until the clusters become stable.
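Step 1 and the pairwise ratio used in step 2 can be sketched as follows. This is a simplified illustration rather than the full method: each ER is modeled as a fixed-threshold test on one row of the dissimilarity matrix, and the way thresholds are chosen, as well as the refinement loop itself, are left out. The function names and the thresholds are assumptions.

```python
def initial_partitions(dissim, thresholds):
    """ER i labels object j 'similar' (True) iff dissim[i][j] <= thresholds[i]."""
    n = len(dissim)
    return [[dissim[i][j] <= thresholds[i] for j in range(n)] for i in range(n)]

def clusters_from_partitions(partitions):
    """Objects that receive identical labels under every ER share a cluster."""
    n = len(partitions)
    groups = {}
    for j in range(n):
        signature = tuple(partitions[i][j] for i in range(n))
        groups.setdefault(signature, []).append(j)
    return list(groups.values())

def discriminating_ratio(partitions, a, b):
    """Fraction of ERs that put objects a and b into different classes."""
    differing = sum(1 for labels in partitions if labels[a] != labels[b])
    return differing / len(partitions)

# Two tight pairs of objects that are far from each other.
dissim = [
    [0, 1, 9, 9],
    [1, 0, 9, 9],
    [9, 9, 0, 1],
    [9, 9, 1, 0],
]
partitions = initial_partitions(dissim, thresholds=[2, 2, 2, 2])
print(clusters_from_partitions(partitions))
```

A refinement pass would look at pairs whose discriminating ratio is small and relax the few ERs that separate them before recomputing the clusters.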

Experiments

Data
  Game records of FIFA World Cup 2002 (64 games, including all heats and finals)
  Number of goals: 168 (own goals excluded)

Procedure
  Select series containing an ‘IN GOAL’ event, and generate a total of 168 trajectories of 2-D ball location.
  For every possible pair of trajectories, calculate the dissimilarity by using multiscale matching.
  Group the trajectories by using the obtained dissimilarities and indiscernibility-based clustering.
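The pairwise step of this procedure amounts to filling a symmetric matrix. The sketch below shows that loop with a stand-in dissimilarity function; `pairwise_dissimilarities` and `point_count_difference` are hypothetical names, and the stand-in is only a placeholder for the multiscale matching result. Symmetry of the dissimilarity is assumed.

```python
def pairwise_dissimilarities(items, dissimilarity):
    """Evaluate the dissimilarity once per unordered pair and mirror it."""
    n = len(items)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = dissimilarity(items[i], items[j])
    return matrix

def point_count_difference(a, b):
    """Placeholder dissimilarity: difference in the number of trajectory points."""
    return float(abs(len(a) - len(b)))

trajectories = [
    [(0, 0), (10, 20)],
    [(0, 0), (5, 5), (10, 20)],
    [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)],
]
matrix = pairwise_dissimilarities(trajectories, point_count_difference)
print(matrix)
```

The resulting matrix is the only input the clustering step needs, so the trajectories themselves do not have to be kept in memory afterwards.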

Experimental Results

Cluster Constitution

Cluster  Cases
1        87
2        24
3        17
4        16
5        8
6        4
7        3
8        3
9        2
10       2
11       2
12       1

Note: 55.2% (7839/14196) of the triplets in the dissimilarity matrix did not satisfy the triangle inequality, due to matching failure
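A check of this kind can be sketched as below. The slides do not say how the triplets were enumerated, so this version simply tests every unordered triplet of objects and counts those in which one dissimilarity exceeds the sum of the other two; the function name is hypothetical. An infinite entry, as produced by a matching failure, violates the inequality whenever the other two sides are finite.

```python
from itertools import combinations

def triangle_violations(matrix):
    """Return (violated, total) over all unordered triplets of objects."""
    violated = total = 0
    for i, j, k in combinations(range(len(matrix)), 3):
        total += 1
        a, b, c = matrix[i][j], matrix[j][k], matrix[i][k]
        if a > b + c or b > a + c or c > a + b:
            violated += 1
    return violated, total

inf = float("inf")
example = [
    [0.0, 1.0, inf, 2.0],
    [1.0, 0.0, 1.0, 1.0],
    [inf, 1.0, 0.0, 1.0],
    [2.0, 1.0, 1.0, 0.0],
]
print(triangle_violations(example))
```

A high violation ratio is a reminder that the dissimilarities are not a metric, which is one reason for using a clustering method that relies on relative proximity only.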

Experimental Results (cont’d)

Cluster 1 (87 cases): Corner Kick – Goal

[Figure: matching result for Turkey vs Japan and Italy vs Korea, with IN GOAL marked]

Europe: 45, South America: 24, Asia: 9

Complex Pass – Side Attack – Goal

[Figure: matching result for Poland vs Portugal and Germany vs Cameroon, with IN GOAL marked]

Europe: 13, South America: 7, Asia: 3

Side Change – Centering/Dribble – Goal

[Figure: matching result for Slovenia vs Paraguay and China vs Turkey, with IN GOAL marked]

Europe: 10, South America: 4, Africa: 2

Cluster 3 (17 cases): Side Change – Centering/Dribble – Goal (intermediate cases between Clusters 2 and 4)

Europe: 10, South America: 2, Africa: 2, Asia: 2

Summary of Experimental Results

Goal success patterns can be classified into 4 major groups (with 8 minor patterns)
  Patterns: complexity of pass sequences

With additional information
  Dribble/Centering/Side change: European style
  However, the differences are not statistically significant.

The key is “Side Change”
  Players (defenders) should take care of the side opposite to the ball movement.
  The higher the complexity of pass transactions, the more the goal success rate gains from a side change.

Conclusions

Presented a new scheme of spatio-temporal data mining
  Grouped similar patterns using multiscale comparison and indiscernibility-based clustering techniques.
  Visualized similar patterns using matching results.

Application to real World Cup data
  Grouping and visualization of interesting pass patterns, e.g. complex pass -> side attack -> goal

Future Work

Technical issues
  Numerical evaluation
  Validation and improvement of the segment dissimilarity measure; inclusion of event type in the dissimilarity

Apply the proposed method to all path series, including non-‘IN GOAL’ series
  Differences between success and failure are very small. This suggests that the patterns of soccer attack are simple.

Apply the proposed method to medical environments
  Trajectories of laboratory examinations (IEEE ICDM06)
  Trajectories of patients’ movement: patient safety

Matching Criteria

Criteria for determining the best set of segment pairs
  Complete match: the original sequence should be correctly formed by concatenating the selected segments without any overlaps or gaps
  Minimization of total segment difference

$D(A,B) = \sum_{p=1}^{P} d(a_p^{(0)}, b_p^{(0)})$

$P$: number of matched segment pairs
$d(a_p^{(0)}, b_p^{(0)})$: dissimilarity of segments $a_p^{(0)}$ and $b_p^{(0)}$

[Figure: sequence A (segments a1 to a5) and sequence B (segments b1 to b5), illustrating an overlap and a gap between selected segments]
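The complete-match criterion can be checked mechanically once each selected segment is described by the indices of its first and last points on the original sequence. The sketch below assumes that adjacent segments share their boundary point (the inflection point), so a valid selection has each segment starting exactly where the previous one ends; the representation and the function name are hypothetical.

```python
def is_complete_match(intervals, last_index):
    """True iff the (start, end) index intervals tile 0..last_index
    with no overlap and no gap between consecutive segments."""
    ordered = sorted(intervals)
    if not ordered or ordered[0][0] != 0 or ordered[-1][1] != last_index:
        return False
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start != prev_end:  # smaller: overlap, larger: gap
            return False
    return True

print(is_complete_match([(0, 3), (3, 5)], last_index=5))  # covers the sequence
print(is_complete_match([(0, 3), (2, 5)], last_index=5))  # overlap
print(is_complete_match([(0, 2), (3, 5)], last_index=5))  # gap
```

Among the segment selections that pass this check, the one with the smallest total dissimilarity would then be kept.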

Matching Failure Problem in MSM

Theoretically, any sequence eventually becomes a single segment at sufficiently high scales. Therefore, any pair of sequences should be successfully matched.

Practically, there should be an upper limit of scales in order to reduce computational complexity. Therefore, the number of segments can be different even at the highest scales.

If matching is not successful, the method should return infinite dissimilarity or a magic value that indicates matching failure.

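[Figure: segment hierarchies from Scale 1 and Scale 2 up to Scale n, showing one pair of sequences that ends in a match and one that ends in no-match]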
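One simple way to carry a matching failure through the rest of the pipeline is to use floating-point infinity as the sentinel. The sketch below assumes a matcher that returns the list of per-pair segment dissimilarities, or `None` when no complete match was found within the scale limit; that interface and the function names are hypothetical.

```python
import math

MATCH_FAILURE = math.inf  # sentinel dissimilarity for an unsuccessful match

def sequence_dissimilarity_or_failure(matched_costs):
    """Sum the per-pair dissimilarities, or return the sentinel on failure.
    matched_costs is a list of floats, or None when matching failed (assumed interface)."""
    if matched_costs is None:
        return MATCH_FAILURE
    return sum(matched_costs)

def count_failures(matrix):
    """Number of unordered pairs whose dissimilarity is the failure sentinel."""
    n = len(matrix)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if math.isinf(matrix[i][j]))

print(sequence_dissimilarity_or_failure([0.5, 0.25]))
print(sequence_dissimilarity_or_failure(None))
```

Infinity orders correctly against every finite dissimilarity, so comparisons based on relative proximity keep working without a special case.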
