Date post: | 16-Jan-2017 |
Category: |
Data & Analytics |
Finding duplicate labels in behavioral data
An application for E-Sport analytics
Mehdi Kaytoue
2016, Ekaterinburg, Russia
A story of storks...
My hometown My teddy-bear Ekaterinburg
M. Kaytoue (INSA de Lyon, LIRIS) Video gaming and Digital signatures AIST 2016 2 / 59
Short bio
2011 – Ph.D. from Université de Lorraine, Nancy, France: Mining numerical data with formal concept analysis, with Amedeo Napoli and a strong collaboration with Sergei O. Kuznetsov.
2011 – Post-doc in Belo Horizonte (Brazil) with Wagner Meira Jr.
2012 – Assistant professor at INSA Lyon, in the data mining and machine learning team led (at the time) by Jean-Francois Boulicaut (now by Celine Robardet).
What I could have been talking about
Constrained pattern mining
A database, e.g. transaction database
A fixed pattern shape, e.g. itemsets
A search space of all possible patterns (generally a lattice)
Several constraints, e.g. min. frequency
Goal: complete, correct, (non-redundant) extraction of the patterns satisfying the constraints
What I could have been talking about
Constrained pattern mining
Numerical data, sequential data, graph data, augmented graphs, ...
Family of constraints, bounds
Discriminant patterns
Formal and generic frameworks, e.g. Formal Concept Analysis
Generic algorithms and pattern domains that can be applied in many application domains
Patterns in dynamic attributed graphs
• Triggering patterns: attribute variations can impact the topology of the graph: < {a+, b+}, {c-}, {deg+} >
Supervised descriptive rules discovery
description → class label(s)
• Languages with different expressivity
• Heuristic approaches (beam search)
• Subgroup discovery: statistical distribution of classes
• Redescription mining: Jaccard between the supports
• Pareto frontiers: when several measures
Telematics
Mobility trace
Journalism
• Mr. Y says: unemployment decreases!
• He is not wrong but…
• Politicians are experts at giving facts that are true in a favorable context
• A context = a pattern!
The goal is to re-contextualize the fact automatically
Mr. Y's term of office
Mr. X's term of office
Contextualized sub-graph mining
Subgraph of young people, on Mondays, with a Vélo'v subscription
Event detection from social media
But today ...
League of Legends – NA LCS Summer Final
Madison Square Garden in New York, NY (19 August 2015)
Competitive gaming is growing dramatically
Video gaming is a lucrative industry
People enjoy watching others play (streaming via Twitch.tv)
E-sports: professional cyberathletes with teams, commentators, sponsors, cash prizes, ...; between sport and pure marketing
G. Cheung and J. Huang.
Starcraft from the stands: understanding the game spectator. In SIGCHI Conference on Human Factors in Computing Systems. ACM, 2011, pp. 763–772.
M. Kaytoue, A. Silva, L. Cerf, W. Meira Jr. and C. Raïssi.
Watch me playing, i am a professional: a first study on video game live streaming. In WWW 2012 (Companion Volume), pages 1181–1188. ACM, 2012.
T. L. Taylor.
Raising the Stakes: E-Sports and the Professionalization of Computer Gaming. MIT Press, 2012.
A lot of challenges
Millions of games played on a daily basis
Security issues
Bugs, cheaters
Balance issues
Fun vs challenging agents
Profiling & prediction
Match preparation
Playground for AI research
Arthur von Eschen.
Machine Learning and Data Mining in Call of Duty (invited industrial talk). European Conference on Machine Learning and Knowledge Discovery in Databases, ECML/PKDD, Nancy, France, Sept. 2014.
S. Ontanon, G. Synnaeve, A. Uriarte, F. Richoux, D. Churchill, and M. Preuss.
A survey of real-time strategy game AI research and competition in StarCraft. Computational Intelligence and AI in Games, IEEE Transactions on, vol. 5, no. 4, pp. 293–311, 2013.
StarCraft II: real time strategy game
Description
Two players are battling against each other on a map
Each chooses a faction (Zerg, Terran, Protoss: 6 different match-ups are possible)
Goal: use units to gather resources and to create buildings that can produce units ... establish a strategy (choose the right buildings and army composition) to destroy your opponent.
Observation 1
Players and teams observe the game records of others
Complete game logs are available
A global ranking as well (such as the ATP ranking in tennis)
More and more players use several [un-]official accounts to hide their games and avoid being studied by others
http://leagueoflegends.wikia.com/wiki/Smurf
https://www.reddit.com/r/starcraft/comments/3gkfso/sc2_who_is_that_smurf/
Problem 1
[Diagram: Player1 controls Avatar1 and Player2 controls Avatar2 in a match followed by viewers; who controls Avatar3?]
Can we identify if two avatars belong to the same player?
We have huge amounts of behavioral data!
Observation 2 and problem 2
Esport has all elements of a sport (pro, amateurs, coach,commentators, competition with high prizes, sponsors ...)
Studying the strategies of the players is a key problem
Can we discover automatically strategies from game traces?
Game editors need balanced games
Players need to discover frequent strategies of their opponents
Discovering patterns revealing strategies characteristic of a player, or of a win/loss in general
Outline
1 Predictive models from behavioral data
2 Unscrambling confusion matrices to identify aliases
3 Enumerating the lattice of binary classifiers
4 Discovering strategies and balance issues
5 Conclusion
Behavioral data as replay files
The RTS game StarCraft 2: to improve strategy execution, players
assign control groups to units and buildings,
bind them to keyboard hotkeys (1, 2, ..., 9, 0),
use them intensively along with the mouse (see on YouTube 'moon APM demo')
Source: Yan et al., SIGCHI 2015
Avatar   Game trace                                  Outcome
RorO     s,s,hotkey4a,s,hotkey3a,s,hotkey3s, ...     Lose
TAiLS    Base,hotkey1a,s,hotkey1s,s,hotkey1s, ...    Win
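To make the trace format concrete, here is a minimal sketch that turns a comma-separated game trace into a vector of action counts that a classifier could consume. The function name `action_counts`, the fixed vocabulary, and the bag-of-actions encoding itself are assumptions for illustration; the slides do not state which features were actually used.

```python
from collections import Counter

def action_counts(trace, vocabulary):
    """Count how often each action of `vocabulary` occurs in a trace.

    `trace` is a comma-separated string of actions, as in the table above.
    The bag-of-actions encoding is an illustrative assumption.
    """
    counts = Counter(action.strip() for action in trace.split(","))
    return [counts[action] for action in vocabulary]

# Hypothetical vocabulary built from the actions visible in the RorO trace.
vocabulary = ["s", "hotkey3a", "hotkey3s", "hotkey4a"]
roro = "s,s,hotkey4a,s,hotkey3a,s,hotkey3s"
print(action_counts(roro, vocabulary))
```

Counting ignores the order of actions; an order-aware encoding would be needed to capture timing habits.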
Keyboard usage patterns
Hypothesis
A player cannot hide behavioural patterns when changing avatars
[Figure: dendrogram of a hierarchical clustering of 708 traces from 354 games; each color denotes a unique avatar]
Predictive models with high accuracy
[Figure: precision as a function of log(τ), one panel per θ ∈ {5, 10, 15, 20}, for the classifiers j48, smo, nbayes and knn]
Hotkeys hide unique patterns
The first 20 seconds of the game are enough
20 games are enough
We found a similar result, but by purposely considering a dataset without avatar aliases, since precision drops drastically otherwise
Eddie Q. Yan, Jeff Huang, Gifford K. Cheung.
Masters of Control: Behavioral Patterns of Simultaneous Unit Group Manipulation in StarCraft 2. In CHI 2015, Crossings, Seoul, Korea 37–11, 2015.
The duplicate label problem
Outline
1 Predictive models from behavioral data
2 Unscrambling confusion matrices to identify aliases
3 Enumerating the lattice of binary classifiers
4 Discovering strategies and balance issues
5 Conclusion
Notations
A prediction model ρ : T → L is learned
T a set of traces
L a set of trace labels (the avatars)
Tl the set of traces generated by avatar l ∈ L
The model is evaluated (e.g. cross-validation)
ρ(t) ∈ L return the model prediction for the trace t ∈ T
Confusion matrix $C^\rho = [c_{i,j} / |T_{l_i}|]$ with $c_{i,j} = |\{t \in T_{l_i} \text{ s.t. } \rho(t) = l_j\}|$
      l1    l2    l3    l4    l5
l1    0.6   0.4   0     0     0
l2    0.4   0.55  0.05  0     0
l3    0     0     0.8   0.15  0.05
l4    0     0.05  0     0.7   0.25
l5    0     0     0     0.5   0.5
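The definition above can be sketched in a few lines: count, for each true label, how the predictions are distributed, then divide each row by the number of traces of that label. The function name `confusion_matrix` and the list-based inputs are illustrative choices, not the authors' implementation.

```python
def confusion_matrix(true_labels, predicted_labels, labels):
    """Row-normalized confusion matrix: entry [i][j] is the fraction of
    traces of labels[i] that the model predicted as labels[j]."""
    index = {label: k for k, label in enumerate(labels)}
    counts = [[0] * len(labels) for _ in labels]
    for truth, prediction in zip(true_labels, predicted_labels):
        counts[index[truth]][index[prediction]] += 1
    matrix = []
    for row in counts:
        total = sum(row)  # |T_l| for this row's label
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix

# Toy data: 5 traces of l1 and 4 traces of l2.
truth = ["l1"] * 5 + ["l2"] * 4
predicted = ["l1", "l1", "l1", "l2", "l2", "l1", "l2", "l2", "l2"]
for row in confusion_matrix(truth, predicted, ["l1", "l2"]):
    print(row)
```

In practice the predictions would come from the cross-validation mentioned above, so that every trace is predicted by a model that did not see it during training.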
Objectives
Idea: two avatars of the same player should draw a high confusion
      l1    l2    l3    l4    l5
l1    0.6   0.4   0     0     0
l2    0.4   0.55  0.05  0     0
l3    0     0     0.8   0.15  0.05
l4    0     0.05  0     0.7   0.25
l5    0     0     0     0.5   0.5
We are searching for pairs of labels that concentrate the confusion (arbitrary sets are left for later)
$C^\rho_{ij} \simeq C^\rho_{ji} \simeq C^\rho_{ii} \simeq C^\rho_{jj}$
$C^\rho_{ij} + C^\rho_{ji} + C^\rho_{ii} + C^\rho_{jj} \simeq 2$
       ...   l_i       l_j       ...
l_i    ...   C_{i,i}   C_{i,j}   ...
l_j    ...   C_{j,i}   C_{j,j}   ...
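As a rough illustration of the second criterion, the sketch below lists the pairs of labels whose 2×2 block of the confusion matrix sums to almost 2. The `tolerance` parameter and the function names are assumptions introduced here; the slides only state the approximate equalities, and the first criterion (the four entries being close to each other) is not checked.

```python
from itertools import combinations

# Example confusion matrix from the slide (rows and columns l1..l5).
C = [
    [0.6, 0.4, 0, 0, 0],
    [0.4, 0.55, 0.05, 0, 0],
    [0, 0, 0.8, 0.15, 0.05],
    [0, 0.05, 0, 0.7, 0.25],
    [0, 0, 0, 0.5, 0.5],
]

def block_mass(C, i, j):
    """Sum of the four entries C_ii + C_ij + C_ji + C_jj."""
    return C[i][i] + C[i][j] + C[j][i] + C[j][j]

def concentrated_pairs(C, tolerance=0.1):
    """Pairs (i, j) whose block mass is within `tolerance` of 2."""
    return [(i, j) for i, j in combinations(range(len(C)), 2)
            if 2 - block_mass(C, i, j) <= tolerance]

print(concentrated_pairs(C))  # indices are 0-based: (0, 1) is (l1, l2)
```

On this matrix the pairs (l1, l2) and (l4, l5) both have a block mass of 1.95, while every other pair stays at or below 1.65.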
Method (1/2): extract fuzzy concepts
Formal Concept Analysis (FCA) with a fuzzy set intersection
Each label (row) is considered as a fuzzy set
Labels and their (fuzzy) intersections ⊓ form a semi-lattice
Closed sets are extracted and scored (monotone)
M. Kaytoue, V. Codocedo, A. Buzmakov, J. Baixeries, S.O. Kuznetsov, A. Napoli.
Pattern Structures and Concept Lattices for Data Mining and Knowledge Processing. ECML/PKDD 2015, Nectar track
Example
      l1    l2    l3    l4    l5
l1    0.6   0.4   0     0     0
l2    0.4   0.55  0.05  0     0
l3    0     0     0.8   0.15  0.05
l4    0     0.05  0     0.7   0.25
l5    0     0     0     0.5   0.5
$\delta(l_1) = \{l_1^{0.6}, l_2^{0.4}, l_3^{0}, l_4^{0}, l_5^{0}\}$
$\delta(l_2) = \{l_1^{0.4}, l_2^{0.55}, l_3^{0.05}, l_4^{0}, l_5^{0}\}$
$d = \delta(l_1) \sqcap \delta(l_2) = \{l_1^{0.4}, l_2^{0.4}, l_3^{0}, l_4^{0}, l_5^{0}\}$
$support(d) = \{l_1, l_2\}$
$s(d) = \sum_{j=1}^{|L|} d_j = 0.8$
The pair (l1, l2) is an avatar alias candidate
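The example above can be replayed with a short sketch, assuming the fuzzy intersection is the component-wise minimum (which matches the values shown) and that the support of a description is the set of labels whose row dominates it component-wise. The function names are illustrative.

```python
def fuzzy_meet(u, v):
    """Fuzzy set intersection, taken here as the component-wise minimum."""
    return [min(a, b) for a, b in zip(u, v)]

def support(description, rows):
    """Labels whose fuzzy set contains `description` (every degree >=)."""
    return [label for label, row in rows.items()
            if all(r >= d for r, d in zip(row, description))]

def score(description):
    """s(d): sum of the membership degrees."""
    return sum(description)

rows = {
    "l1": [0.6, 0.4, 0, 0, 0],
    "l2": [0.4, 0.55, 0.05, 0, 0],
    "l3": [0, 0, 0.8, 0.15, 0.05],
    "l4": [0, 0.05, 0, 0.7, 0.25],
    "l5": [0, 0, 0, 0.5, 0.5],
}
d = fuzzy_meet(rows["l1"], rows["l2"])
print(d, support(d, rows), score(d))
```

The score can only decrease when more labels are intersected, which is the monotonicity mentioned in the method description.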
Method (2/2): rank and filter pairs
Candidate pairs are scored
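A cosine similarity is used: the higher, the better
$\mathrm{cluster\_score}(a_i, a_j) = \mathrm{cosine}(\langle C^\rho_{ii}, C^\rho_{ij} \rangle, \langle C^\rho_{jj}, C^\rho_{ji} \rangle)$
       ...   l_i       l_j       ...
l_i    ...   C_{i,i}   C_{i,j}   ...
l_j    ...   C_{j,i}   C_{j,j}   ...
Why? When the confusion is one-sided (every trace of a_i and of a_j is predicted as a_i), the two vectors are orthogonal and the pair gets the lowest score:
       a_i   a_j
a_i    1     0
a_j    1     0
$\mathrm{cosine}(\langle 1, 0 \rangle, \langle 0, 1 \rangle) = 0$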
Candidates are ranked; the list is cut with a threshold if necessary
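A minimal sketch of this scoring step, using only the standard library. `cluster_score` follows the formula above; returning 0 for an all-zero vector is a convention chosen here, not something stated in the slides.

```python
import math

def cosine(u, v):
    """Cosine similarity of two vectors (0.0 if one of them is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def cluster_score(C, i, j):
    """cosine(<C_ii, C_ij>, <C_jj, C_ji>) for the candidate pair (i, j)."""
    return cosine([C[i][i], C[i][j]], [C[j][j], C[j][i]])

one_sided = [[1, 0], [1, 0]]                # a_j is always predicted as a_i
symmetric = [[0.6, 0.4], [0.4, 0.55]]       # block (l1, l2) of the example
print(cluster_score(one_sided, 0, 1))
print(cluster_score(symmetric, 0, 1))
```

The one-sided block scores 0, while the symmetric block taken from the running example scores above 0.99, so it would be ranked near the top of the candidate list.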
Experimental settings
Datasets
Collection 1 - 2014 World Championship Series: 955 one-versus-one high-level games and 171 unique players
Collection 2 - Spawning Tool website crawl, July 2014: 10,108 one-versus-one games and 3,805 players
[Figure: number of games played (log scale) against number of players, for Collection 1 and Collection 2]
[Figure: % of actions over time (secs); series: Base, Selection, SingleMineral, Hotkeys]
Chosen features allow powerful prediction
[Figure: AUC and precision as functions of log(τ), one panel per θ ∈ {5, 10, 15, 20}, for the classifiers j48, smo, nbayes and knn]
Building a ground truth and evaluating aliases retrieval
Idea: each class is split into several; can we retrieve them?
Parameters: γ = 0.2, θ = 20, λ = 0.9, τ = 90

Surrogates
Classifier   F1      MAP     Recall  AUC     Precision  P@10
j48          0.468   0.824   0.805   0.904   0.33       1.0
naivebayes   0.226   0.740   0.390   0.915   0.16       0.8
smo          0.312   0.971   0.536   0.993   0.22       1.0
knn          0.567   0.822   0.976   0.882   0.4        0.9

Surrogates & URLS
Classifier   F1      MAP     Recall  AUC     Precision  P@10
j48          0.588   0.907   0.606   0.866   0.57       1.0
naivebayes   0.443   0.857   0.457   0.864   0.43       1.0
smo          0.257   0.912   0.266   0.945   0.25       1.0
knn          0.670   0.937   0.691   0.874   0.65       1.0

Surrogates & URLS & Names
Classifier   F1      MAP     Recall  AUC     Precision  P@10
j48          0.689   0.983   0.606   0.935   0.8        1.0
naivebayes   0.560   0.943   0.492   0.906   0.65       1.0
smo          0.258   0.949   0.227   0.960   0.3        1.0
knn          0.758   0.967   0.667   0.792   0.88       1.0
About false positives
Some FP are not actual false positives (same unique ID, hidden for the experiments)
Some FP with a high score are actually the avatars we are looking for!
[Figure: SMO top 20 (γ=0.05, θ=5, λ=0.9): ranking against score; legend: SUG, URL, NAMES, FP; annotated pair: EGaLive - aLiveRC]
Outline
1 Predictive models from behavioral data
2 Unscrambling confusion matrices to identify aliases
3 Enumerating the lattice of binary classifiers
4 Discovering strategies and balance issues
5 Conclusion
Can we do better?
(bi)-cluster the confusion matrix
Cavadenti, O., V. Codocedo, J.-F. Boulicaut, and M. Kaytoue.
When cyberathletes conceal their game: Clustering confusion matrices to identify avatar aliases. In International Conference on Data Science and Advanced Analytics (IEEE DSAA 2015).
      ℓ1   ℓ2   ℓ3   ℓ4   ℓ5
ℓ1    10   8    0    0    0
ℓ2    7    8    1    0    0
ℓ3    0    0    5    3    1
ℓ4    0    1    0    12   6
ℓ5    0    0    0    5    8
The model is built on a false labeling!
Some labels may be hard to learn
Imbalanced distribution of the labels
Not enough samples for some labels
Virtual identities may be shared
General intuition
The problem of finding label duplicates
Given
a set of instances (game traces) T
each taking a label in L
Find a tolerance relation over L, that is, a set of subsets of L covering L, possibly with non-empty intersections (more general than a partition).
Basically
A tolerance relation is an anti-chain of the lattice of label subsets $(2^L, \subseteq)$
{{l1, l2}, {l3}, {l4, l5}}
{{l1, l2, l3}, {l3, l4, l5}}
...
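The two properties used above (the blocks cover L, and no block is included in another) can be checked mechanically. This is a small sketch with illustrative function names; it only validates a candidate answer and does not search for one.

```python
def is_cover(blocks, labels):
    """True if every label belongs to at least one block."""
    return set().union(*blocks) == set(labels)

def is_antichain(blocks):
    """True if no block is strictly included in another one."""
    return not any(a < b for a in blocks for b in blocks)

labels = {"l1", "l2", "l3", "l4", "l5"}
partition = [{"l1", "l2"}, {"l3"}, {"l4", "l5"}]
overlapping = [{"l1", "l2", "l3"}, {"l3", "l4", "l5"}]
nested = [{"l1"}, {"l1", "l2"}, {"l3", "l4", "l5"}]

for blocks in (partition, overlapping, nested):
    print(is_cover(blocks, labels), is_antichain(blocks))
```

The first two families are the examples from the slide; the third covers L but is rejected because {l1} is included in {l1, l2}.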
General idea
Build a binary classifier for all subsets of labels
[Diagram: lattice of label subsets, from Ø (bottom) to L (top)]
For each B ⊂ L, we have
a model $\rho_B : T \to \{+, -\}$ with + = B and − = $\bar{B}$ (the complement of B), provided with its confusion matrix
Desiderata
A set B ⊂ L is valid iff it represents a set of duplicate labels
How to select these valid sets?
How to avoid building $2^{|L|}$ models?
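F1-measure for each label set B
Confusion matrix $C^{\rho_B}$ (rows: actual, columns: predicted)
             Predicted +   Predicted −
Actual +     α++           α+−
Actual −     α−+           α−−
F1-measure
Given B ⊂ L and $C^{\rho_B}$:
$\varphi_B = \frac{2 \cdot \alpha_{++}}{(2 \cdot \alpha_{++}) + \alpha_{+-} + \alpha_{-+}}$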
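The F1-measure above translates directly into code. In this sketch `tp`, `fn` and `fp` stand for α++, α+− and α−+ respectively; returning 0 when the denominator is zero is a convention chosen here.

```python
def f1_measure(tp, fn, fp):
    """F1 = 2*TP / (2*TP + FN + FP), computed from a binary confusion matrix.

    tp = alpha_++ (actual +, predicted +)
    fn = alpha_+- (actual +, predicted -)
    fp = alpha_-+ (actual -, predicted +)
    """
    denominator = 2 * tp + fn + fp
    return 2 * tp / denominator if denominator else 0.0

print(f1_measure(tp=8, fn=2, fp=2))
```

True negatives (α−−) do not appear in the formula, so the score is not inflated by the typically large − class made of all labels outside B.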
First constraint
Given C, D ⊂ L and E = C ∪ D.
Greedy model improvement
E is valid if $\varphi_E \geq \max(\varphi_C, \varphi_D)$
[Diagram: $\varphi_C = 0.5$ and $\varphi_D = 0.4$; which $\varphi_E$ for their union E?]
Is it enough? (actually it is...)
Given C, D ⊂ L and how the corresponding models classified 10 instances
[Diagram: instances covered by C and by D] C and D are probably not duplicate labels
[Diagram: instances covered by C and by D] C and D are probably duplicate labels
Constraint 2
For E ⊆ L, $P_E$ is composed of the instances classified as TP, FN, FP.
Instance coverage
E ⊆ L is valid if
$\max(|P_C|, |P_D|) \leq |P_E| \leq |P_C| + |P_D| - \mu(P_C, P_D) \cdot \theta$
with µ a measure (min, max) and θ ∈ [0; 1].
Intuitively, if E is valid, we should have $P_E = P_C \cap P_D$, having similar traces.
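Both constraints can be sketched as small predicates. The interpretation of µ as `min` or `max` applied to the sizes |P_C| and |P_D| is an assumption made here (the slide only says "a measure (min, max)"), and the function names are illustrative.

```python
def improves(phi_e, phi_c, phi_d):
    """Constraint 1: the merged label set is at least as easy to learn."""
    return phi_e >= max(phi_c, phi_d)

def coverage_ok(p_c, p_d, p_e, theta, mu=min):
    """Constraint 2 on the sets of instances classified as TP, FN or FP.

    `mu` is applied to the two set sizes (assumption); theta is in [0, 1].
    """
    lower = max(len(p_c), len(p_d))
    upper = len(p_c) + len(p_d) - mu(len(p_c), len(p_d)) * theta
    return lower <= len(p_e) <= upper

# Overlapping instance sets: plausible duplicates.
print(coverage_ok(set(range(1, 7)), set(range(4, 10)), set(range(1, 10)), theta=0.5))
# Disjoint instance sets: merging adds up both populations.
print(coverage_ok(set(range(1, 6)), set(range(6, 11)), set(range(1, 11)), theta=0.5))
```

With θ = 0.5 the first call passes (9 instances for an upper bound of 9) and the second fails (10 instances for an upper bound of 7.5), which matches the intuition that duplicate labels should share most of their involved instances.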
Algorithm
Generate all subsets, level-wise, bottom-up
For each subset B ⊂ L:
• Learn model $\rho_B$
• Validate (cross-validation)
• Compute scores
• Check constraints (remove from candidates otherwise)
• Continue to the next level with the current candidates
The result is given by the maximal elements (size-wise/inclusion-wise)
[Diagram: lattice of label subsets, from Ø (bottom) to L (top)]
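The level-wise enumeration can be sketched as follows. To keep the example self-contained, learning and cross-validating a classifier is replaced by a callable `phi` that returns the F1-measure of a label set (a hypothetical stand-in), and only the first constraint is checked; the instance coverage constraint is omitted.

```python
from itertools import combinations

def levelwise_duplicates(labels, phi):
    """Bottom-up search for label sets that keep improving the F1-measure.

    `phi(frozenset)` is assumed to train and validate the binary model
    "B versus the rest" and to return its F1-measure.
    """
    level = {frozenset([label]) for label in labels}
    scores = {b: phi(b) for b in level}
    valid = set(level)
    while level:
        next_level = set()
        for c, d in combinations(level, 2):
            e = c | d
            if len(e) != len(c) + 1:      # only join sets differing by one label
                continue
            if e not in scores:
                scores[e] = phi(e)        # one model per candidate set
            if scores[e] >= max(scores[c], scores[d]):
                next_level.add(e)
        valid |= next_level
        level = next_level
    # Keep the maximal elements with respect to inclusion.
    return [b for b in valid if not any(b < other for other in valid)]

# Toy scores: labels a and b are hard to separate, but easy once merged.
toy = {frozenset("a"): 0.5, frozenset("b"): 0.5, frozenset("ab"): 0.9,
       frozenset("c"): 0.8, frozenset("d"): 0.8}
print(levelwise_duplicates("abcd", lambda b: toy.get(b, 0.3)))
```

Because candidates of size k+1 are only built from valid sets of size k, invalid branches are pruned early and far fewer than $2^{|L|}$ models are trained on data where most labels are not duplicates.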
Experimental settings
Datasets
Collection C1 - 2014 World Championship Series: 955 one-versus-one high-level games and 171 unique players
Collection C2 - Spawning Tool website crawl, July 2014: 10,108 one-versus-one games and 3,805 players
Need a ground truth from C1.
Ground truth
Imagine several traces/instances of A ∈ L.
A A A A A A
A A A B B B
Balanced split 50% – 50%
A A B B C C
Balanced split 33% – 33% – 33%
A A B B B B
Imbalanced split 33 % – 66 %
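The splits illustrated above can be generated with a small helper that relabels the traces of one avatar into surrogate labels according to given proportions. The function name, the `label#k` naming scheme and the absence of shuffling are assumptions made for this sketch.

```python
def surrogate_labels(label, n_traces, proportions):
    """Relabel n_traces traces of `label` into surrogates label#0, label#1, ...

    `proportions` such as (1, 1) or (1, 2) give the relative sizes of the
    surrogates; integer arithmetic keeps the cut points exact.
    """
    total = sum(proportions)
    result, start, cumulated = [], 0, 0
    for k, share in enumerate(proportions):
        cumulated += share
        end = (n_traces * cumulated) // total   # cut point for surrogate k
        result.extend([f"{label}#{k}"] * (end - start))
        start = end
    return result

print(surrogate_labels("A", 6, (1, 1)))      # balanced 50% - 50%
print(surrogate_labels("A", 6, (1, 1, 1)))   # balanced 33% - 33% - 33%
print(surrogate_labels("A", 6, (1, 2)))      # imbalanced 33% - 66%
```

Since the surrogates are known to come from the same avatar, every pair of surrogates of one label is a true duplicate that the method should retrieve.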
Experimental results on C1
[Figure: precision, recall and duration (sec.) against the split proportions 1_1, 2_3, 1_4, 1_1_1, 1_1_2, 1_2_3, 1_1_1_1, 1_1_2_2, 1_2_3_4, for the classifiers SMO, RandomForest, NaiveBayes, MultilayerPerceptron, J48 and IBk]
New pairs found on C2 with an imbalanced distribution
For example, ex-pro EGStephanoRC is associated with a lIlIlIllIIII name
Outline
1 Predictive models from behavioral data
2 Unscrambling confusion matrices to identify aliases
3 Enumerating the lattice of binary classifiers
4 Discovering strategies and balance issues
5 Conclusion
Goal
Discovery of strategies
Automatically from a large set of games
Evaluate their capacity to win/lose
Framework
Sequential pattern mining
Discriminant pattern mining
Jian Pei et al.
PrefixSpan: Mining Sequential Patterns by Prefix-Projected Growth. In ICDE, 2001.
Guozhu Dong, Jinyan Li.
Efficient Mining of Emerging Patterns: Discovering Trends and Differences. In KDD, 1999.
Sequential pattern mining
id   description
s1   〈a{abc}{ac}d{cf}〉
s2   〈{ad}c{bc}{ae}〉
s3   〈{ef}{ab}{df}cb〉
s4   〈eg{af}cbc〉
Example
Set of items: I = {a, b, c, d, e, f}
Sequence: s1 = 〈a{abc}{ac}d{cf}〉
Sub-sequence: 〈abc〉 ⊑ 〈a{abc}{ac}d{cf}〉
Frequent sub-sequence: 〈cb〉 ⊑ s2, s3, s4
⇒ $|support_D(\langle cb \rangle)| = |\{s_2, s_3, s_4\}| = 3 \geq minSupp = 2$
Problem: extract the complete and correct collection of frequent sequential patterns
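The support computation of the example can be reproduced with a short sketch. Sequences are modeled as lists of Python sets (one set per itemset); the helper names `contains` and `support` are illustrative, and this naive scan is not PrefixSpan itself.

```python
def contains(sequence, pattern):
    """True if `pattern` is a sub-sequence of `sequence`: its itemsets can be
    mapped, in order, to distinct itemsets of `sequence` that include them."""
    position = 0
    for itemset in pattern:
        # Greedy leftmost match of the current pattern itemset.
        while position < len(sequence) and not itemset <= sequence[position]:
            position += 1
        if position == len(sequence):
            return False
        position += 1
    return True

def support(database, pattern):
    """Identifiers of the sequences that contain `pattern`."""
    return [sid for sid, seq in database.items() if contains(seq, pattern)]

database = {
    "s1": [{"a"}, {"a", "b", "c"}, {"a", "c"}, {"d"}, {"c", "f"}],
    "s2": [{"a", "d"}, {"c"}, {"b", "c"}, {"a", "e"}],
    "s3": [{"e", "f"}, {"a", "b"}, {"d", "f"}, {"c"}, {"b"}],
    "s4": [{"e"}, {"g"}, {"a", "f"}, {"c"}, {"b"}, {"c"}],
}
print(support(database, [{"c"}, {"b"}]))
print(contains(database["s1"], [{"a"}, {"b"}, {"c"}]))
```

Scanning the whole database for every candidate is what pattern-growth algorithms avoid by working on projected databases.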
Emerging pattern [Dong, Li - 1999]
id   description              class
s1   〈a{abc}{ac}d{cf}〉       +
s2   〈{ad}cc{bbc}{ae}〉       +
s3   〈{ef}{ab}{df}cb cb〉     −
s4   〈eg{af}cbcbc〉           −
Discriminating power
Each sequence is labeled (+ or −)
A pattern is emerging if it has a high support in one class and a low one in the other
Growth-rate: $gr(s, D_x) = \frac{|support(s, D_x)|}{|D_x|} \times \frac{|D_y|}{|support(s, D_y)|}$
$gr(\langle cb \rangle, D_-) = \frac{2}{2} \times \frac{2}{1} = 2$
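The growth-rate formula can be checked numerically from the four counts it involves. Returning infinity when the pattern never occurs in the other class is a common convention adopted here as an assumption; the slide does not discuss that case.

```python
def growth_rate(support_x, size_x, support_y, size_y):
    """gr(s, D_x) = (|support(s, D_x)| / |D_x|) * (|D_y| / |support(s, D_y)|)."""
    if support_y == 0:
        return float("inf")   # pattern absent from the other class
    return (support_x / size_x) * (size_y / support_y)

# <cb> occurs in both sequences of D- (s3, s4) and in one of the two of D+ (s2).
print(growth_rate(support_x=2, size_x=2, support_y=1, size_y=2))
```

A growth rate of 2 means the pattern is relatively twice as frequent among the − sequences as among the + sequences.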
P. K. Novak, N. Lavrac, and G. I. Webb.
Supervised descriptive rule discovery: A unifying survey of contrast set, emerging pattern and subgroup mining. J. Mach. Learn. Res., 10:377–403, 2009.
How to encode game logs?
Case 1:
Sequence                                                     Winner
〈(j1, a){(j1, b)(j1, c)(j2, c)}{(j2, a)(j1, d)}(j2, b)〉    j1
〈(j3, a){(j3, b)(j3, c)(j3, d)}{(j1, b)(j1, c)}(j1, d)〉    j3
but we wish to generalize to + and − classes only
Case 2:
Player   sequence    class
j1       〈a{bc}d〉   +
j2       〈c{ab}〉    −
j1       〈a{bcd}〉   −
j3       〈{bc}d〉    +
⇒ but we need to take into account the action/reaction principle
Proposed encoding:
Sequence
〈(a,+){(b,+)(c,+)(c,−)}{(a,−)(d,+)}(b,−)〉
〈(a,+){(b,+)(c,+)(d,+)}{(b,−)(c,−)}(d,−)〉
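The proposed encoding replaces each player identifier by + when the action belongs to the winner and by − otherwise, while keeping both players' actions in a single sequence. A minimal sketch, with games modeled as lists of sets of (player, action) pairs; the function name `encode` is illustrative.

```python
def encode(game, winner):
    """Rewrite (player, action) items as (action, '+') for the winner's
    actions and (action, '-') for the loser's, itemset by itemset."""
    return [
        {(action, "+" if player == winner else "-") for player, action in itemset}
        for itemset in game
    ]

# First game of Case 1, won by j1.
game = [
    {("j1", "a")},
    {("j1", "b"), ("j1", "c"), ("j2", "c")},
    {("j2", "a"), ("j1", "d")},
    {("j2", "b")},
]
for itemset in encode(game, winner="j1"):
    print(sorted(itemset))
```

Keeping both players in one sequence is what preserves the action/reaction information that the per-player encoding of Case 2 loses.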
Definitions
Items
Sequences take their symbols in $I = A \times R$ where $R = \{+, -\}$.
Dual of an item, of a sequence
The dual of an item $i = (a, r) \in I$ is given by $\bar{i} = (a, R \setminus r) \in I$.
The dual of a sequence s, denoted $\bar{s}$, is obtained by replacing each item $(a, r) \in I$ with its dual $(a, R \setminus r) \in I$.
Example
s = 〈{(a,−)(b,+)(c,−)}(e,+)〉
$\bar{s}$ = 〈{(a,+)(b,−)(c,+)}(e,−)〉
Discriminating measure
The balance measure
Let s be a frequent sequential pattern,
$balance(s) = \frac{|support_D(s)|}{|support_D(s)| + |support_D(\bar{s})|}$
Properties
balance(s) ∈ [0; 1]
balance(s) = 0.5 ⇒ balanced strategy
balance(s) = 1 or 0 ⇒ imbalanced strategy
$balance(s) + balance(\bar{s}) = 1$
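The dual and the balance measure can be sketched as follows, with sequences modeled as lists of sets of (action, sign) pairs and supports given as plain counts. The function names are illustrative; computing the supports themselves is left out.

```python
def dual(sequence):
    """Dual of a sequence: flip the sign of every item."""
    flip = {"+": "-", "-": "+"}
    return [{(action, flip[sign]) for action, sign in itemset}
            for itemset in sequence]

def balance(support_s, support_dual):
    """balance(s) = |support(s)| / (|support(s)| + |support(dual of s)|).

    s is assumed frequent, so support_s > 0 and the denominator is non-zero.
    """
    return support_s / (support_s + support_dual)

s = [{("a", "-"), ("b", "+"), ("c", "-")}, {("e", "+")}]
print(dual(s))
print(balance(3, 1), balance(1, 3))
```

Swapping the two supports gives the balance of the dual, which is why the two values always sum to 1.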
PrefixSpan [Pei et al., 2001]
Algorithm that enumerates frequent sequence prefixes
Input:
• Sequence database (encoded game logs)
• Minimal support (minSupp)
Output:
• All frequent sequential patterns and only them
[Figure: prefix tree over items i1 ... i6, with nodes <i1>, <i1 i2>, <{i1 i6}>, <i1 i3>, <i4>, <i5>]
Algorithms
Balance measure computation
As a post processing
Naively
For each frequent pattern, build its dual
Scan the base to get its support
Naive optimization
Item   Seq             Dual(Item)
i1     1, 2, 6, 10     i4
i2     1, 2, 20        i6
i3     1, 6            i5
i4     3, 7, 14        i1
i5     3, 8, 9         i3
i6     3, 6, 10, 15    i2
SupportDual(<i1>) = Seq(Dual(i1), i1) = {3, 7, 14}
SupportDual(<i1 i2>) = Intersect(SupportDual(<i1>), Seq(Dual(i2), i1)) = {3}
[Figure: prefix tree with nodes <i1>, <i1 i2>, <{i1 i6}>, <i1 i3>, <i4>, <i5>]
Algorithm
Suppressing redundant patterns
s = 〈{(a,−)(b,+)(c,−)}(e,+)〉
$\bar{s}$ = 〈{(a,+)(b,−)(c,+)}(e,−)〉
As a post-process
Double search in the prefix tree
[Figure: item/dual table (i1 ↔ i4, i2 ↔ i6, i3 ↔ i5) and prefix tree with nodes <i1>, <i1 i2>, <{i1 i6}>, <i1 i3>, <i4>, <i5>]
Algorithms
Actually, plenty of algorithm adaptations are possible for some particular cases of datasets
We designed an efficient and generic algorithm
It extends PrefixSpan by considering two projected databases per node.
G. Bosc, M. Kaytoue, C. Raïssi, J.-F. Boulicaut, P. Tan.
Mining Balanced Sequential Patterns in RTS Games. European Conference on Artificial Intelligence, ECAI 2014.
G. Bosc, P. Tan, J.-F. Boulicaut, C. Raïssi and M. Kaytoue.
A Pattern Mining Approach to Study Strategy Balance in RTS Games. IEEE Transactions on Computational Intelligence and AI in Games (early access), 2015.
Another work applied to StarCraft II data:
C. Low-Kam, C. Raïssi, M. Kaytoue, J. Pei.
Mining Statistically Significant Sequential Patterns. International Conference on Data Mining (ICDM) 2013.
Data collection
Scraping 371 267 replays
Filtering to keep 90 768 games, 30 678 different players
[Figure: number of replays against game time (min)]
[Figure: APM against game time (min): average, average + standard deviation, average - standard deviation]
[Figure: % of actions against game time (min); action types: Build, Train, Select, Move, Click, Research, Upgrade, HotKey, Minimap, Other]
Sequence dataset
Build
Data   Item    Seq.     IS     I/IS
PvP    1,160   6,668    11.5   2.0
PvT    3,655   18,754   19.0   2.6
PvZ    3,748   22,784   19.6   2.7
TvT    2,201   7,457    20.7   2.8
TvZ    4,492   23,637   22.5   2.8
ZvZ    1,689   9,554    14.2   2.2
Table: Encoding building construction during the first 10 minutes
Quantitative results
Symmetric axis: y = 0.5
Non-perfect symmetry: if a sequence s is frequent, it does not imply that its dual $\bar{s}$ is frequent too
Patterns with the highest support are the best-known strategies and are balanced
Example of discovered patterns [Forge-Expand]
Protoss strategy in PvZ
Motivation: favor economy in early game while still being able to defend
minSupp 5% - 591 patterns
s = 〈{(Nexus, 5,+)}{(Gateway, 6,+)(PhotonCannon, 6,+)}〉 - balance(s) = 0.52
s = 〈{(Nexus, 5,+)}{(PhotonCannon, 6,+)(Assimilator, 6,+)}〉 - balance(s) = 0.52
Build Order: Forge Expand
Time (sec)   Action
36 - 40      Pylon
96 - 106     Forge
132 - 145    Nexus
132 - 145    Pylon
144 - 158    Gateway
144 - 158    Photon Cannon
144 - 158    Assimilator x2
Source: http://www.teamliquid.net/
Example of discovered patterns [4 Gates]
Protoss strategy in PvP
Motivation: all-in, aggressive, early game attack (sacrifices economy)
minSupp 5% - 3418 patterns
s = 〈{(Gateway, 3,+, 1)(Assimilator, 3,+, 1)}{(Cyb.Core, 4,+, 1)}{(Gateway, 7,+, 2)(Gateway, 7,+, 3)(Gateway, 7,+, 4)}〉 - balance(s) = 0.59
Build Order: 4 Gates
Time (sec)   Action
36 - 40      Pylon
72 - 79      Gateway
96 - 106     Assimilator
108 - 119    Pylon
132 - 145    Cybernetics Core
192 - 211    Warpgate
216 - 238    Gateway x3
240 - 264    Pylon
240 - 264    Assimilator
Source: http://www.teamliquid.net/
Imbalanced strategies
A hot topic for game editors
TvZ + minSupp = 1%: 17 990 patterns
“Bunker-Rush” detected and imbalanced
• Bunker contained in 602 patterns
• 20 patterns with balance(s) ≥ 0.6 or ≤ 0.4 when the bunker is done in early game
• s = 〈{(Barracks, 1,S, 1)}, {(SpPool, 4,F, 1)}, {(Bunker, 6,S, 1), (SpCrawler, 6,F, 1)}〉 (balance(s) = 0.61)
This balance issue has actually been corrected (May 2012): a Zerg counter unit has been slightly improved and bunker timing is longer.
We divided the dataset into two and ran a comparative analysis: frequent patterns with bunkers are more balanced.
The code is available and can be used for other tasks!
https://github.com/guillaume-bosc/BalanceSpan
(For example, mining (im)-balanced drafting in MOBA games).
Outline
1 Predictive models from behavioral data
2 Unscrambling confusion matrices to identify aliases
3 Enumerating the lattice of binary classifiers
4 Discovering strategies and balance issues
5 Conclusion
Conclusion
Take away facts
E-sport may not be a ’true’ sport, but its development is incredible
New challenges in video game design and analytics: fun/difficulty paradigm to satisfy standard players and pros
Game traces hide individual patterns
• In StarCraft 2, via customizable keyboard usage
• When avatar aliases are present, one needs to unscramble the confusion matrix
• To avoid biases, one can build the lattice of binary classifiers
Game traces hide strategies
• Sequential pattern mining with a new measure, the balance measure, can help discover such patterns
• It can be applied in any zero-sum game scenario for descriptive analytics
Thanks to my colleagues
at INSA / LIRIS: Guillaume Bosc, Jean-Francois Boulicaut, Victor Codocedo, Quentin Labernia, Marc Plantevit, Celine Robardet
at MIT Media Lab / Game Lab: Philip Tan
at INRIA: Chedy Raïssi
and most importantly to you and the AIST organization team!