
Kaytoue Mehdi - Finding duplicate labels in behavioral data: an application for E-Sport analytics.

Date post: 16-Jan-2017
Upload: aist
Transcript
Page 1: Kaytoue Mehdi - Finding duplicate labels in behavioral data: an application for E-Sport analytics.

Finding duplicate labels in behavioral data: an application for E-Sport analytics

Mehdi Kaytoue

2016, Ekaterinburg, Russia

Page 2

A story of storks...

[Photos: my hometown; my teddy bear; Ekaterinburg]

M. Kaytoue (INSA de Lyon, LIRIS) Video gaming and Digital signatures AIST 2016 2 / 59

Page 3

Short bio

2011 – Ph.D. from Université de Lorraine, Nancy, France: Mining numerical data with formal concept analysis, with Amedeo Napoli and in strong collaboration with Sergei O. Kuznetsov.

2011 – Post-doc in Belo Horizonte (Brazil) with Wagner Meira Jr.

2012 – Assistant professor at INSA Lyon, in the data mining and machine learning team, led (at the time) by Jean-François Boulicaut (now by Céline Robardet).

Page 4

What I could have been talking about

Constrained pattern mining

A database, e.g. transaction database

A fixed pattern shape, e.g. itemsets

A search space of all possible patterns (generally a lattice)

Several constraints, e.g. min. frequency

Goal: complete, correct (and non-redundant) extraction of the patterns that satisfy the constraints
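To make this setting concrete, here is a minimal level-wise (Apriori-style) sketch, assuming a transaction database given as a list of Python sets, itemsets as the pattern shape, and a single minimum-frequency constraint. It only illustrates a complete and correct extraction on a toy database; it is not one of the algorithms discussed in the talk.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_freq):
    """Level-wise (Apriori-style) enumeration of itemsets with support >= min_freq.

    transactions: list of sets of items; min_freq: absolute support threshold.
    Returns a dict mapping each frequent itemset (frozenset) to its support.
    """
    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items if support(frozenset([i])) >= min_freq]
    result = {s: support(s) for s in level}
    k = 1
    while level:
        # Join: unions of two frequent k-itemsets that have exactly k + 1 items.
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == k + 1}
        # Prune: min. frequency is anti-monotone, so every k-subset must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in result for s in combinations(c, k))}
        level = [c for c in candidates if support(c) >= min_freq]
        result.update({c: support(c) for c in level})
        k += 1
    return result


db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
print(sorted("".join(sorted(s)) for s in frequent_itemsets(db, min_freq=2)))
# ['a', 'ab', 'ac', 'b', 'bc', 'c']
```

The prune step is where the constraint pays off during the search: since minimum frequency is anti-monotone, no superset of an infrequent itemset ever needs to be counted.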

Page 5

What I could have been talking about

Constrained pattern mining

Numerical data, sequential data, graph data, augmented graphs, ...

Family of constraints, bounds

Discriminant patterns

Formal and generic frameworks, e.g. Formal Concept Analysis

Generic algorithms and pattern domains that can be applied in many application domains

Page 6

Patterns in dynamic attributed graphs

• Triggering patterns: attribute variations can impact the topology of the graph, e.g. 〈{a+, b+}, {c−}, {deg+}〉

Page 7

Supervised descriptive rule discovery

description → class label(s)

• Languages with different expressivity
• Heuristic approaches (beam search)
• Subgroup discovery: statistical distribution of classes
• Redescription mining: Jaccard index between the supports
• Pareto frontiers: when several measures are used

Page 8

Telematics

Mobility trace

[Figure: mobility trace of Asset 290 over 6 h]

Page 9

Journalism

• Mr. Y says: unemployment is decreasing!
• He is not wrong, but…
• Politicians are experts at stating facts that are true in a favorable context
• A context = a pattern!

The goal is to re-contextualize the fact automatically.

[Figure labels: Mr. Y's term of office; Mr. X's term of office]

Page 10

Contextualized sub-graph mining

Subgraph of young people, on Mondays, with a velov subscription (Lyon's bike-sharing service)

Page 11

Event detection from social media

Page 12

But today ...

League of Legends – NA LCS Summer Final, Madison Square Garden in New York, NY (19 August 2015)

Page 13

Competitive gaming is rising dramatically

Video gaming is a lucrative industry

People enjoy watching others play (streaming via Twitch.tv)

E-sports: professional cyberathletes with teams, commentators, sponsors, cash prizes, ...; somewhere between sport and pure marketing

G. Cheung and J. Huang.

Starcraft from the stands: understanding the game spectator. In SIGCHI Conference on Human Factors in Computing Systems, ACM, 2011, pp. 763–772.

M. Kaytoue, A. Silva, L. Cerf, W. Meira Jr. and C. Raïssi.

Watch me playing, I am a professional: a first study on video game live streaming. In WWW 2012 (Companion Volume), pp. 1181–1188, ACM, 2012.

T. L. Taylor.

Raising the Stakes: E-Sports and the Professionalization of Computer Gaming. MIT Press, 2012.

Page 14

A lot of challenges

Millions of games played on a daily basis

Security issues

Bugs, cheaters

Balance issues

Fun vs challenging agents

Profiling & prediction

Match preparation

Playground for AI research

Arthur von Eschen.

Machine Learning and Data Mining in Call of Duty (invited industrial talk). European Conference on Machine Learning and Knowledge Discovery in Databases, ECML/PKDD, Nancy, France, Sept. 2014.

S. Ontañón, G. Synnaeve, A. Uriarte, F. Richoux, D. Churchill, and M. Preuss.

A survey of real-time strategy game AI research and competition in StarCraft. IEEE Transactions on Computational Intelligence and AI in Games, vol. 5, no. 4, pp. 293–311, 2013.

Page 15

StarCraft II: real time strategy game

Description

Two players are battling against each other on a map

Each chooses a faction (Zerg, Terran or Protoss: 6 different match-ups are possible)

Goal: use units to gather resources and create buildings that can produce units ... establish a strategy (choose the right buildings and army composition) to destroy your opponent.

Page 16

Observation 1

Players and teams observe the game records of others

Complete game logs are available

So is a global ranking (such as the ATP ranking in tennis)

More and more players use several [un]official accounts to hide their games and avoid being studied by others

http://leagueoflegends.wikia.com/wiki/Smurf

https://www.reddit.com/r/starcraft/comments/3gkfso/sc2_who_is_that_smurf/

Page 17

Problem 1

[Figure: Player1 (Avatar1) plays a match against Player2 (Avatar2), watched by viewers; a third avatar, Avatar3, is marked with a question mark]

Can we identify whether two avatars belong to the same player?

We have huge amounts of behavioral data!

Page 18

Observation 2 and problem 2

E-sport has all the elements of a sport (pros, amateurs, coaches, commentators, competitions with high prizes, sponsors ...)

Studying the strategies of the players is a key problem

Can we discover automatically strategies from game traces?

Game editors need balanced games

Players need to discover frequent strategies of their opponents

Discovering patterns revealing strategies characteristic of a player, or of a win/loss in general

Page 19

Outline

1 Predictive models from behavioral data

2 Unscrambling confusion matrices to identify aliases

3 Enumerating the lattice of binary classifiers

4 Discovering strategies and balance issues

5 Conclusion

Page 20

Behavioral data as replay files

In the RTS game StarCraft 2, to improve strategy execution, players:

• assign control groups to units and buildings,
• bind them to keyboard hotkeys (1, 2, ..., 9, 0),
• use them intensively along with the mouse (see 'moon APM demo' on YouTube).

Source: Yan et al., SIGCHI 2015

Avatar | Game trace | Outcome
RorO | s, s, hotkey4a, s, hotkey3a, s, hotkey3s, ... | Lose
TAiLS | Base, hotkey1a, s, hotkey1s, s, hotkey1s, ... | Win
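One simple way to turn such traces into feature vectors for a classifier is sketched below. It assumes, hypothetically, that a trace is a list of action tokens like those in the table, and that a feature is the relative frequency of each token among the first `tau` actions; the exact features used in the cited work may differ.

```python
from collections import Counter


def trace_features(trace, vocabulary, tau):
    """Relative frequency of each action token among the first `tau` actions.

    trace: list of action tokens, e.g. ["s", "s", "hotkey4a", ...]
    vocabulary: ordered list of tokens defining the feature dimensions
    tau: number of leading actions kept (hypothetical parameter)
    """
    prefix = trace[:tau]
    counts = Counter(prefix)
    n = len(prefix) or 1  # avoid a division by zero on an empty trace
    return [counts[token] / n for token in vocabulary]


trace = ["s", "s", "hotkey4a", "s", "hotkey3a", "s", "hotkey3s"]
vocab = ["s", "hotkey3a", "hotkey3s", "hotkey4a"]
print(trace_features(trace, vocab, tau=4))  # [0.75, 0.0, 0.0, 0.25]
```

Normalizing by the prefix length keeps vectors comparable across games of different pace; any standard classifier (decision tree, SVM, naive Bayes, kNN) can then be trained with the avatar as the class label.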

Page 21

Keyboard usage patterns

Hypothesis

A player cannot hide behavioural patterns when changing avatars

[Figure: dendrogram of a hierarchical clustering of 708 traces from 354 games; each color denotes a unique avatar]

Page 22

Predictive models with high accuracy

[Figure: precision vs. log(τ) for the j48, smo, nbayes and knn classifiers, one panel per θ ∈ {5, 10, 15, 20}]

Hotkeys hide unique patterns

The first 20 seconds of the game are enough

20 games are enough

We found a similar result, but considering, on purpose, a dataset without avatar aliases, since precision drops drastically when aliases are present

Eddie Q. Yan, Jeff Huang, Gifford K. Cheung.

Masters of Control: Behavioral Patterns of Simultaneous Unit Group Manipulation in StarCraft 2. In CHI 2015, Crossings, Seoul, Korea 37–11, 2015.

Page 23

The duplicate label problem

Page 24

Outline

1 Predictive models from behavioral data

2 Unscrambling confusion matrices to identify aliases

3 Enumerating the lattice of binary classifiers

4 Discovering strategies and balance issues

5 Conclusion

Page 25

Notations

A prediction model ρ : T → L is learned

T a set of traces

L a set of trace labels (the avatars)

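T_l: the set of traces generated by avatar l ∈ L

The model is evaluated (e.g. cross-validation)

ρ(t) ∈ L returns the model prediction for the trace t ∈ T

Confusion matrix $C^{\rho} = [c_{i,j} / |T_{l_i}|]$ with $c_{i,j} = |\{t \in T_{l_i} \text{ s.t. } \rho(t) = l_j\}|$

$$\begin{array}{c|ccccc} & l_1 & l_2 & l_3 & l_4 & l_5 \\ \hline l_1 & 0.6 & 0.4 & 0 & 0 & 0 \\ l_2 & 0.4 & 0.55 & 0.05 & 0 & 0 \\ l_3 & 0 & 0 & 0.8 & 0.15 & 0.05 \\ l_4 & 0 & 0.05 & 0 & 0.7 & 0.25 \\ l_5 & 0 & 0 & 0 & 0.5 & 0.5 \end{array}$$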
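The row-normalized matrix can be computed directly from (true label, predicted label) pairs, for instance those collected during cross-validation. A minimal sketch in plain Python, with a made-up toy input and an illustrative function name:

```python
def normalized_confusion(pairs, labels):
    """C[i][j] = |{t in T_{l_i} : rho(t) = l_j}| / |T_{l_i}|.

    pairs: iterable of (true_label, predicted_label), one per evaluated trace
    labels: ordered list of labels l_1, ..., l_n
    """
    index = {label: k for k, label in enumerate(labels)}
    n = len(labels)
    counts = [[0] * n for _ in range(n)]
    for true, predicted in pairs:
        counts[index[true]][index[predicted]] += 1
    # Divide each row by |T_{l_i}|, the number of traces of the true label.
    return [[c / sum(row) if sum(row) else 0.0 for c in row] for row in counts]


# Toy predictions: 5 traces of l1 (3 correct), 4 traces of l2 (all correct).
pairs = [("l1", "l1")] * 3 + [("l1", "l2")] * 2 + [("l2", "l2")] * 4
print(normalized_confusion(pairs, ["l1", "l2"]))  # [[0.6, 0.4], [0.0, 1.0]]
```

Because each row is divided by its own total, every row sums to 1, which is what makes confusion comparable across avatars with very different numbers of games.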

Page 26

Objectives

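Idea: two avatars of the same player should produce high mutual confusion

$$\begin{array}{c|ccccc} & l_1 & l_2 & l_3 & l_4 & l_5 \\ \hline l_1 & 0.6 & 0.4 & 0 & 0 & 0 \\ l_2 & 0.4 & 0.55 & 0.05 & 0 & 0 \\ l_3 & 0 & 0 & 0.8 & 0.15 & 0.05 \\ l_4 & 0 & 0.05 & 0 & 0.7 & 0.25 \\ l_5 & 0 & 0 & 0 & 0.5 & 0.5 \end{array}$$

We are searching for pairs of labels that concentrate the confusion (arbitrary sets are left for later):

$$C^{\rho}_{ij} \simeq C^{\rho}_{ji} \simeq C^{\rho}_{ii} \simeq C^{\rho}_{jj}$$

$$C^{\rho}_{ij} + C^{\rho}_{ji} + C^{\rho}_{ii} + C^{\rho}_{jj} \simeq 2$$

(each row of $C^{\rho}$ sums to 1, so a sum close to 2 means that rows $l_i$ and $l_j$ put almost all their mass in columns $l_i$ and $l_j$)

$$\begin{array}{c|ccccc} & \cdots & l_i & \cdots & l_j & \cdots \\ \hline \vdots & & \vdots & & \vdots & \\ l_i & \cdots & C_{i,i} & \cdots & C_{i,j} & \cdots \\ \vdots & & \vdots & & \vdots & \\ l_j & \cdots & C_{j,i} & \cdots & C_{j,j} & \cdots \\ \vdots & & \vdots & & \vdots & \end{array}$$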

Page 27

Method (1/2): extract fuzzy concepts

• Formal Concept Analysis (FCA) with a fuzzy set intersection
• Each label (row) is considered as a fuzzy set
• Labels and their (fuzzy) intersections ⊓ form a semi-lattice
• Closed sets are extracted and scored (monotone score)

M. Kaytoue, V. Codocedo, A. Buzmakov, J. Baixeries, S. O. Kuznetsov, and A. Napoli.

Pattern Structures and Concept Lattices for Data Mining and Knowledge Processing. ECML/PKDD 2015, Nectar track.

Example

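$$\begin{array}{c|ccccc} & l_1 & l_2 & l_3 & l_4 & l_5 \\ \hline l_1 & 0.6 & 0.4 & 0 & 0 & 0 \\ l_2 & 0.4 & 0.55 & 0.05 & 0 & 0 \\ l_3 & 0 & 0 & 0.8 & 0.15 & 0.05 \\ l_4 & 0 & 0.05 & 0 & 0.7 & 0.25 \\ l_5 & 0 & 0 & 0 & 0.5 & 0.5 \end{array}$$

$\delta(l_1) = \{l_1^{0.6}, l_2^{0.4}, l_3^{0}, l_4^{0}, l_5^{0}\}$

$\delta(l_2) = \{l_1^{0.4}, l_2^{0.55}, l_3^{0.05}, l_4^{0}, l_5^{0}\}$

$d = \delta(l_1) \sqcap \delta(l_2) = \{l_1^{0.4}, l_2^{0.4}, l_3^{0}, l_4^{0}, l_5^{0}\}$

$support(d) = \{l_1, l_2\}$

$s(d) = \sum_{j=1}^{|L|} d_j = 0.8$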

The pair (l1, l2) is an avatar alias candidate
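The example can be replayed, and extended to every pair of labels, in a few lines. This is a minimal sketch under two assumptions: the fuzzy intersection ⊓ is the component-wise minimum (which reproduces the numbers above), and the support of d is the set of labels whose row dominates d on every component. It only enumerates pairs, not the full semi-lattice of closed sets, and the function names are illustrative.

```python
# Row-normalized confusion matrix of the example (rows = true labels l1..l5).
C = [
    [0.6, 0.4, 0.0, 0.0, 0.0],
    [0.4, 0.55, 0.05, 0.0, 0.0],
    [0.0, 0.0, 0.8, 0.15, 0.05],
    [0.0, 0.05, 0.0, 0.7, 0.25],
    [0.0, 0.0, 0.0, 0.5, 0.5],
]


def meet(u, v):
    """Fuzzy intersection, taken here as the component-wise minimum."""
    return [min(a, b) for a, b in zip(u, v)]


def support(d, matrix):
    """Indices of the labels whose row is >= d on every component."""
    return {i for i, row in enumerate(matrix) if all(r >= x for r, x in zip(row, d))}


def candidate_pairs(matrix):
    """Pairs (i, j) whose meet is closed (support is exactly {i, j}), with s(d)."""
    found = []
    n = len(matrix)
    for i in range(n):
        for j in range(i + 1, n):
            d = meet(matrix[i], matrix[j])
            if sum(d) > 0 and support(d, matrix) == {i, j}:
                found.append(((i + 1, j + 1), round(sum(d), 2)))
    return sorted(found, key=lambda pair: -pair[1])


print(candidate_pairs(C))  # [((1, 2), 0.8), ((4, 5), 0.75), ((2, 3), 0.05)]
```

The low-scoring pair (l2, l3) is closed but carries almost no confusion, which illustrates why candidates are scored and then ranked rather than all kept.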

Page 28

Method (2/2): rank and filter pairs

Candidate pairs are scored

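A cosine similarity is used: the higher, the better

$$\mathrm{cluster\ score}(a_i, a_j) = \cos(\langle C^{\rho}_{ii}, C^{\rho}_{ij} \rangle, \langle C^{\rho}_{jj}, C^{\rho}_{ji} \rangle)$$

$$\begin{array}{c|ccccc} & \cdots & l_i & \cdots & l_j & \cdots \\ \hline \vdots & & \vdots & & \vdots & \\ l_i & \cdots & C_{i,i} & \cdots & C_{i,j} & \cdots \\ \vdots & & \vdots & & \vdots & \\ l_j & \cdots & C_{j,i} & \cdots & C_{j,j} & \cdots \\ \vdots & & \vdots & & \vdots & \end{array}$$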

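Why?

$$\begin{array}{c|cc} & a_i & a_j \\ \hline a_i & 1 & 0 \\ a_j & 1 & 0 \end{array}$$

$\cos(\langle 1, 0 \rangle, \langle 0, 1 \rangle) = 0$: all traces of $a_j$ are predicted as $a_i$ but not the reverse, so the confusion is one-sided and the pair gets the lowest score.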

Candidates are ranked; the list is cut with a threshold if necessary
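A minimal sketch of the cluster score, written directly from the formula above in plain Python (function names are illustrative). It reproduces the "Why?" case and scores the pair (l1, l2) of the running example:

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0


def cluster_score(C, i, j):
    """cosine(<C_ii, C_ij>, <C_jj, C_ji>) on a row-normalized confusion matrix C."""
    return cosine((C[i][i], C[i][j]), (C[j][j], C[j][i]))


# "Why?" example: every trace of a_j is predicted as a_i (one-sided confusion).
print(cluster_score([[1, 0], [1, 0]], 0, 1))  # 0.0

# Pair (l1, l2) of the running example: nearly symmetric confusion.
C = [[0.6, 0.4], [0.4, 0.55]]
print(round(cluster_score(C, 0, 1), 3))  # 0.999
```

The score compares the *shape* of each row's split between the two columns, not its magnitude, so two avatars that confuse each other in similar proportions score close to 1.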

Page 29

Experimental settings

Datasets

Collection 1 - 2014 World Championship Series: 955 one-versus-one high-level games and 171 unique players

Collection 2 - Spawning Tool website crawl, July 2014: 10,108 one-versus-one games and 3,805 players

[Figure: number of games played (log scale) vs. number of players, for Collection 1 and Collection 2]

[Figure: percentage of actions over time (secs) by action type: Base, Selection, SingleMineral, Hotkeys]

Page 30

Chosen features allow powerful prediction

[Figure: AUC and precision vs. log(τ) for the j48, smo, nbayes and knn classifiers, one panel per θ ∈ {5, 10, 15, 20}]

Page 31

Building a ground truth and evaluating alias retrieval

Idea: each class is split into several; can we retrieve them?

Parameters: γ = 0.2, θ = 20, λ = 0.9, τ = 90

Surrogates
Classifier | F1 | MAP | Recall | AUC | Precision | P@10
j48 | 0.468 | 0.824 | 0.805 | 0.904 | 0.33 | 1.0
naivebayes | 0.226 | 0.740 | 0.390 | 0.915 | 0.16 | 0.8
smo | 0.312 | 0.971 | 0.536 | 0.993 | 0.22 | 1.0
knn | 0.567 | 0.822 | 0.976 | 0.882 | 0.4 | 0.9

Surrogates & URLS
Classifier | F1 | MAP | Recall | AUC | Precision | P@10
j48 | 0.588 | 0.907 | 0.606 | 0.866 | 0.57 | 1.0
naivebayes | 0.443 | 0.857 | 0.457 | 0.864 | 0.43 | 1.0
smo | 0.257 | 0.912 | 0.266 | 0.945 | 0.25 | 1.0
knn | 0.670 | 0.937 | 0.691 | 0.874 | 0.65 | 1.0

Surrogates & URLS & Names
Classifier | F1 | MAP | Recall | AUC | Precision | P@10
j48 | 0.689 | 0.983 | 0.606 | 0.935 | 0.8 | 1.0
naivebayes | 0.560 | 0.943 | 0.492 | 0.906 | 0.65 | 1.0
smo | 0.258 | 0.949 | 0.227 | 0.960 | 0.3 | 1.0
knn | 0.758 | 0.967 | 0.667 | 0.792 | 0.88 | 1.0

Page 32

About false positives

Some FPs are not actually false positives (same unique id, hidden for the experiments)

Some FPs with a high score are actually the avatars we are looking for!

[Figure: ranking vs. score of the SMO top-20 candidate pairs (γ = 0.05, θ = 5, λ = 0.9); legend: SUG, URL, NAMES, FP; labeled pair: EGaLive – aLiveRC]

Page 33

Outline

1 Predictive models from behavioral data

2 Unscrambling confusion matrices to identify aliases

3 Enumerating the lattice of binary classifiers

4 Discovering strategies and balance issues

5 Conclusion

Page 34

Can we do better?

(bi)-cluster the confusion matrix

O. Cavadenti, V. Codocedo, J.-F. Boulicaut, and M. Kaytoue.

When cyberathletes conceal their game: Clustering confusion matrices to identify avatar aliases. In International Conference on Data Science and Advanced Analytics (IEEE DSAA 2015).

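$$\begin{array}{c|ccccc} & \ell_1 & \ell_2 & \ell_3 & \ell_4 & \ell_5 \\ \hline \ell_1 & 10 & 8 & 0 & 0 & 0 \\ \ell_2 & 7 & 8 & 1 & 0 & 0 \\ \ell_3 & 0 & 0 & 5 & 3 & 1 \\ \ell_4 & 0 & 1 & 0 & 12 & 6 \\ \ell_5 & 0 & 0 & 0 & 5 & 8 \end{array}$$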

The model is built on a false labeling!

Some labels may be hard to learn

Imbalanced distribution of the labels

Not enough samples for some labels

Virtual identities may be shared

Page 35

General intuition

The problem of finding label duplicates

Given

a set of instances (game traces) T

each taking a label in L

Find a tolerance relation over L, that is, a set of subsets of L covering L, possibly with non-empty intersections (more general than a partition).

Basically

A tolerance relation is an antichain of the lattice of label subsets $(2^L, \subseteq)$, for example:

{{l1, l2}, {l3}, {l4, l5}}
{{l1, l2, l3}, {l3, l4, l5}}
...
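A candidate answer can be checked against the two conditions stated above: its blocks must cover L, and no block may contain another (the antichain condition). A minimal sketch, with illustrative function names, on the two examples plus one counterexample:

```python
def is_cover(blocks, labels):
    """Every label belongs to at least one block, and blocks only use labels of L."""
    return set().union(*blocks) == set(labels)


def is_antichain(blocks):
    """No block is included in another one."""
    sets = [frozenset(b) for b in blocks]
    return not any(a <= b for k, a in enumerate(sets) for m, b in enumerate(sets) if k != m)


def is_admissible(blocks, labels):
    return is_cover(blocks, labels) and is_antichain(blocks)


L = ["l1", "l2", "l3", "l4", "l5"]
print(is_admissible([{"l1", "l2"}, {"l3"}, {"l4", "l5"}], L))        # True
print(is_admissible([{"l1", "l2", "l3"}, {"l3", "l4", "l5"}], L))    # True
print(is_admissible([{"l1", "l2"}, {"l1"}, {"l3", "l4", "l5"}], L))  # False
```

The second example shows the overlap allowed by a tolerance relation (l3 appears in two blocks), which a partition would forbid.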

Page 36

General idea

Build a binary classifier for all subsets of labels

[Figure: the lattice of label subsets, from Ø to L]

For each B ⊂ L, we have

a model ρ_B : T → {+, −} with + = B and − = L \ B, provided with its confusion matrix

Desiderata

A set B ⊂ L is valid iff it represents a set of duplicate labels

How to select these valid sets? How to avoid building $2^{|L|}$ models?

Page 37

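F1-measure for each label set B

$$C^{\rho_B} = \begin{array}{c|cc} \text{actual} / \text{predicted} & + & - \\ \hline + & \alpha_{++} & \alpha_{+-} \\ - & \alpha_{-+} & \alpha_{--} \end{array}$$

F1-measure

Given B ⊂ L and $C^{\rho_B}$:

$$\varphi_B = \frac{2 \cdot \alpha_{++}}{(2 \cdot \alpha_{++}) + \alpha_{+-} + \alpha_{-+}}$$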
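The formula translates directly into code. A minimal sketch, reading $\alpha_{++}$ as true positives, $\alpha_{+-}$ as false negatives and $\alpha_{-+}$ as false positives of $\rho_B$ (the function name is illustrative):

```python
def f1_measure(a_pp, a_pm, a_mp):
    """phi_B = 2*a_pp / (2*a_pp + a_pm + a_mp).

    a_pp: traces of B predicted in B (true positives)
    a_pm: traces of B predicted outside B (false negatives)
    a_mp: traces outside B predicted in B (false positives)
    """
    denominator = 2 * a_pp + a_pm + a_mp
    return 2 * a_pp / denominator if denominator else 0.0


print(f1_measure(8, 2, 2))  # 0.8
```

Note that $\alpha_{--}$ (true negatives) does not appear: a model for B is not rewarded for the many traces outside B it trivially rejects.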

Page 38

First constraint

Given C, D ⊂ L and E = C ∪ D.

Greedy model improvement

E is valid if $\varphi_E \geq \max(\varphi_C, \varphi_D)$

[Figure: example with φ_C = 0.5 and φ_D = 0.4; what is φ_E? Under this constraint, E is valid only if φ_E ≥ 0.5]

Page 39

Is it enough? (actually it is...)

Given C, D ⊂ L and how the corresponding models classified 10 instances:

[Figure, case 1: how the models of C and D classified the 10 instances]
C and D are probably not duplicate labels.

[Figure, case 2: how the models of C and D classified the 10 instances]
C and D are probably duplicate labels.

Page 40

Constraint 2

For E ⊆ L, P_E is composed of the instances classified as TP, FN or FP.

Instance coverage

E ⊆ L is valid if

$$\max(|P_C|, |P_D|) \leq |P_E| \leq |P_C| + |P_D| - \mu(P_C, P_D) \cdot \theta$$

with µ a measure (min, max) and θ ∈ [0; 1].

Intuitively, if E is valid, we should have $P_E = P_C \cap P_D$, i.e. C and D cover similar traces.
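Both validity constraints can be sketched as small predicates. This assumes, as our reading of "a measure (min, max)", that µ is applied to the sizes |P_C| and |P_D|; the function names are illustrative.

```python
def greedy_ok(phi_c, phi_d, phi_e):
    """Constraint 1: merging C and D must not decrease the F1-measure."""
    return phi_e >= max(phi_c, phi_d)


def coverage_ok(p_c, p_d, p_e, theta, mu=min):
    """Constraint 2 (instance coverage).

    p_c, p_d, p_e: sets of instances classified as TP, FN or FP by each model.
    mu is applied to the sizes |P_C| and |P_D| (an assumption on "min, max").
    """
    lower = max(len(p_c), len(p_d))
    upper = len(p_c) + len(p_d) - mu(len(p_c), len(p_d)) * theta
    return lower <= len(p_e) <= upper


# Overlapping instance sets: 6 <= 7 <= 12 - 6 * 0.5
print(coverage_ok(set(range(1, 7)), set(range(2, 8)), set(range(1, 8)), theta=0.5))  # True
# Disjoint instance sets: 6 > 6 - 3 * 0.5
print(coverage_ok({1, 2, 3}, {4, 5, 6}, set(range(1, 7)), theta=0.5))  # False
```

The upper bound is what rejects the disjoint case: if C and D cover different instances, |P_E| is close to |P_C| + |P_D| and exceeds the bound as soon as θ > 0.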

Page 41

Algorithm

Generate all subsets level-wise, bottom-up. For each subset B ⊂ L:

• Learn the model ρ_B
• Validate it (cross-validation)
• Compute the scores
• Check the constraints (otherwise, remove B from the candidates)

Continue to the next level with the current candidates.

The result is given by the maximal elements (size-wise/inclusion-wise)
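The loop above can be sketched as follows, with both constraints inlined. Assumptions to note: `evaluate(B)` is a hypothetical callback standing in for "learn and cross-validate ρ_B" that returns φ_B and P_B; µ is taken as min applied to set sizes; and since the slides do not say which pair (C, D) a candidate E is checked against, E is kept as soon as one pair of current candidates producing it passes. A made-up toy evaluation is included so the sketch runs.

```python
from itertools import combinations


def find_duplicate_labels(labels, evaluate, theta=0.5):
    """Level-wise, bottom-up search over label subsets (sketch).

    evaluate(B) -> (phi_B, P_B): hypothetical callback that learns and
    cross-validates rho_B for the frozenset B and returns its F1-measure and
    the set P_B of instances classified as TP, FN or FP.
    """
    cache = {}

    def ev(B):
        if B not in cache:
            cache[B] = evaluate(B)
        return cache[B]

    def valid(C, D, E):
        (phi_c, p_c), (phi_d, p_d), (phi_e, p_e) = ev(C), ev(D), ev(E)
        if phi_e < max(phi_c, phi_d):  # constraint 1: greedy improvement
            return False
        upper = len(p_c) + len(p_d) - min(len(p_c), len(p_d)) * theta
        return max(len(p_c), len(p_d)) <= len(p_e) <= upper  # constraint 2

    level = [frozenset([label]) for label in labels]
    kept = set(level)
    while level:
        next_level = set()
        for C, D in combinations(level, 2):
            E = C | D
            if len(E) == len(C) + 1 and E not in next_level and valid(C, D, E):
                next_level.add(E)
        kept |= next_level
        level = list(next_level)
    # Result: the maximal kept subsets (inclusion-wise).
    return [B for B in kept if not any(B < other for other in kept)]


# Toy, made-up evaluation: labels a and b behave like aliases of one player.
P = {"a": {1, 2, 3, 4}, "b": {1, 2, 3, 5}, "c": {6, 7, 8}, "d": {9, 10, 11}}
F1 = {frozenset("a"): 0.5, frozenset("b"): 0.5, frozenset("c"): 0.9,
      frozenset("d"): 0.9, frozenset("ab"): 0.9}


def toy_evaluate(B):
    return F1.get(B, 0.3), set().union(*(P[label] for label in B))


result = find_duplicate_labels("abcd", toy_evaluate)
print(sorted("".join(sorted(B)) for B in result))  # ['ab', 'c', 'd']
```

Caching the evaluations matters in practice, since each call stands for training and cross-validating a classifier; pruning failed candidates is what keeps the search far below $2^{|L|}$ models.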

[Figure: level-wise traversal of the lattice of label subsets, from Ø to L]

Page 42

Experimental settings

Datasets

Collection C1 - 2014 World Championship Series: 955 one-versus-one high-level games and 171 unique players

Collection C2 - Spawning Tool website crawl, July 2014: 10,108 one-versus-one games and 3,805 players

A ground truth is needed; it is built from C1.

Page 43

Ground truth

Imagine several traces/instances of A ∈ L.

A A A A A A

Page 44

Ground truth

Imagine several traces/instances of A ∈ L.

A A A B B B

Balanced split 50% – 50%

Page 45

Ground truth

Imagine several traces/instances of A ∈ L.

A A B B C C

Balanced split 33% – 33% – 33%

Page 46

Ground truth

Imagine several traces/instances of A ∈ L.

A A B B B B

Imbalanced split 33% – 66%
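The splits above can be generated mechanically. A minimal sketch, assuming a label's traces are cut into contiguous chunks in their given order (shuffling them first would give a random split); the function name and the `A_0`, `A_1` alias naming are illustrative.

```python
def split_label(traces, label, proportions):
    """Relabel the traces of `label` into len(proportions) artificial aliases.

    proportions: e.g. (1, 1) for 50%-50%, (1, 1, 1) for 33%-33%-33%,
    (1, 2) for 33%-66%. Traces are cut in their given order; shuffling
    them beforehand would give a random split.
    """
    n, total = len(traces), sum(proportions)
    relabeled, start, cumulative = [], 0, 0
    for k, p in enumerate(proportions):
        cumulative += p
        end = n * cumulative // total  # integer boundaries avoid float rounding
        relabeled += [(t, f"{label}_{k}") for t in traces[start:end]]
        start = end
    return relabeled


traces = [f"trace{i}" for i in range(6)]
print([alias for _, alias in split_label(traces, "A", (1, 2))])
# ['A_0', 'A_0', 'A_1', 'A_1', 'A_1', 'A_1']
```

Since the aliases created this way are known, the retrieved groups can be scored against them, which is what turns C1 into a ground truth.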

Page 47

Experimental results on C1

[Figure: precision, recall and duration (sec.) for each split proportion (1_1, 2_3, 1_4, 1_1_1, 1_1_2, 1_2_3, 1_1_1_1, 1_1_2_2, 1_2_3_4), for the SMO, RandomForest, NaiveBayes, MultilayerPerceptron, J48 and IBk classifiers]

New pairs were found on C2 with an imbalanced distribution, for example the ex-pro EGStephanoRC associated with a lIlIlIllIIII name.

Page 48

Outline

1 Predictive models from behavioral data

2 Unscrambling confusion matrices to identify aliases

3 Enumerating the lattice of binary classifiers

4 Discovering strategies and balance issues

5 Conclusion

Page 49

Goal

Discovery of strategies

Automatically from a large set of games

Evaluate their capacity to win/lose

Framework

Sequential pattern mining

Discriminant pattern mining

Jian Pei et al.

PrefixSpan: Mining Sequential Patterns by Prefix-Projected Growth. In ICDE, 2001.

Guozhu Dong, Jinyan Li.

Efficient Mining of Emerging Patterns: Discovering Trends and Differences. In KDD, 1999.

Page 50

Sequential pattern mining

id | description
s1 | 〈a{abc}{ac}d{cf}〉
s2 | 〈{ad}c{bc}{ae}〉
s3 | 〈{ef}{ab}{df}cb〉
s4 | 〈eg{af}cbc〉

Example

• Set of items: I = {a, b, c, d, e, f}
• Sequence: s1 = 〈a{abc}{ac}d{cf}〉
• Subsequence: 〈abc〉 ⊑ 〈a{abc}{ac}d{cf}〉
• Frequent subsequence: 〈cb〉 ⊑ s2, s3, s4

⇒ |support_D(〈cb〉)| = |{s2, s3, s4}| = 3 ≥ minSupp = 2

Problem: extract the complete and correct collection of frequent sequential patterns
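The containment relation ⊑ and the support can be checked on the example database with a short sketch. Sequences are lists of Python sets (itemsets); `parse` is a hypothetical helper that reads the compact notation used on the slide.

```python
def parse(text):
    """Hypothetical helper: 'a{abc}{ac}' -> [{'a'}, {'a', 'b', 'c'}, {'a', 'c'}]."""
    seq, i = [], 0
    while i < len(text):
        if text[i] == "{":
            j = text.index("}", i)
            seq.append(set(text[i + 1:j]))
            i = j + 1
        else:
            seq.append({text[i]})
            i += 1
    return seq


def is_subsequence(pattern, sequence):
    """True if the itemsets of `pattern` are included, in order, in distinct
    itemsets of `sequence`. Greedily matching the earliest itemset is safe."""
    pos = 0
    for itemset in pattern:
        while pos < len(sequence) and not itemset <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1
    return True


def support(pattern, database):
    return {sid for sid, seq in database.items() if is_subsequence(pattern, seq)}


D = {"s1": parse("a{abc}{ac}d{cf}"), "s2": parse("{ad}c{bc}{ae}"),
     "s3": parse("{ef}{ab}{df}cb"), "s4": parse("eg{af}cbc")}
print(is_subsequence(parse("abc"), D["s1"]))  # True
print(sorted(support(parse("cb"), D)))         # ['s2', 's3', 's4']
```

s1 does not support 〈cb〉: its only c before the end sits in {abc}, and no later itemset contains b.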

Page 51

Emerging pattern [Dong, Li - 1999]

id | description | class
s1 | 〈a{abc}{ac}d{cf}〉 | +
s2 | 〈{ad}cc{bbc}{ae}〉 | +
s3 | 〈{ef}{ab}{df}cbcb〉 | −
s4 | 〈eg{af}cbcbc〉 | −

Discriminating power

Each sequence is labeled (+ or −)

A pattern is emerging if it has a high support in one class and a low one in the other

Growth rate: $gr(s, D_x) = \dfrac{|support(s, D_x)|}{|D_x|} \times \dfrac{|D_y|}{|support(s, D_y)|}$

$gr(\langle cb \rangle, D_-) = \dfrac{2}{2} \times \dfrac{2}{1} = 2$
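A minimal sketch of the growth rate on the labeled database above, treating itemsets as Python sets (so {bbc} becomes {b, c}) and returning infinity when the pattern never occurs in the other class; function names are illustrative.

```python
import math


def contains(pattern, sequence):
    """Greedy subsequence test on lists of itemsets (Python sets)."""
    pos = 0
    for itemset in pattern:
        while pos < len(sequence) and not itemset <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1
    return True


def growth_rate(pattern, d_x, d_y):
    """gr(s, D_x) = (|supp(s, D_x)| / |D_x|) * (|D_y| / |supp(s, D_y)|)."""
    sup_x = sum(contains(pattern, seq) for seq in d_x)
    sup_y = sum(contains(pattern, seq) for seq in d_y)
    if sup_y == 0:
        return math.inf if sup_x else 0.0
    return (sup_x / len(d_x)) * (len(d_y) / sup_y)


# The labeled database above; itemsets are sets, so {bbc} is {b, c}.
d_plus = [
    [{"a"}, {"a", "b", "c"}, {"a", "c"}, {"d"}, {"c", "f"}],
    [{"a", "d"}, {"c"}, {"c"}, {"b", "c"}, {"a", "e"}],
]
d_minus = [
    [{"e", "f"}, {"a", "b"}, {"d", "f"}, {"c"}, {"b"}, {"c"}, {"b"}],
    [{"e"}, {"g"}, {"a", "f"}, {"c"}, {"b"}, {"c"}, {"b"}, {"c"}],
]
print(growth_rate([{"c"}, {"b"}], d_minus, d_plus))  # 2.0
```

Support is counted once per sequence, so the repeated cb in s3 and s4 does not inflate the growth rate.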

P. K. Novak, N. Lavrač, and G. I. Webb.

Supervised descriptive rule discovery: A unifying survey of contrast set, emerging pattern and subgroup mining. J. Mach. Learn. Res., 10:377–403, 2009.

Page 52

How to encode game logs?

Case 1:

Sequence | Winner
〈(j1, a){(j1, b)(j1, c)(j2, c)}{(j2, a)(j1, d)}(j2, b)〉 | j1
〈(j3, a){(j3, b)(j3, c)(j3, d)}{(j1, b)(j1, c)}(j1, d)〉 | j3

but we wish to generalize to + and − classes only

Case 2:

Player | Sequence | Class
j1 | 〈a{bc}d〉 | +
j2 | 〈c{ab}〉 | −
j1 | 〈a{bcd}〉 | −
j3 | 〈{bc}d〉 | +

⇒ but we need to take into account the action/reaction principle

Proposed encoding:

Sequence
〈(a,+){(b,+)(c,+)(c,−)}{(a,−)(d,+)}(b,−)〉
〈(a,+){(b,+)(c,+)(d,+)}{(b,−)(c,−)}(d,−)〉
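The proposed encoding is a simple rewrite of each game. A minimal sketch, assuming a game is stored as a list of itemsets of (player, action) pairs and that the winner is known (the function name is illustrative); it reproduces the first encoded sequence above:

```python
def encode(game, winner):
    """Replace each (player, action) item by (action, '+') if the player is the
    winner and by (action, '-') otherwise, keeping the itemset structure."""
    return [[(action, "+" if player == winner else "-") for player, action in itemset]
            for itemset in game]


# First game of Case 1, won by j1.
game = [[("j1", "a")],
        [("j1", "b"), ("j1", "c"), ("j2", "c")],
        [("j2", "a"), ("j1", "d")],
        [("j2", "b")]]
print(encode(game, "j1"))
# [[('a', '+')], [('b', '+'), ('c', '+'), ('c', '-')], [('a', '-'), ('d', '+')], [('b', '-')]]
```

Keeping both players' actions in one sequence preserves the action/reaction ordering that the per-player encoding of Case 2 loses, while dropping player identities generalizes across games.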

Page 53

Definitions

Items

Sequences take symbols in I = A × R, where R = {+, −}.

Dual of an item, of a sequence

The dual of item i = (a, r) ∈ I is given by $\bar{i} = (a, R \setminus \{r\}) \in I$. The dual of a sequence s, denoted $\bar{s}$, is obtained by replacing each item (a, r) ∈ I with its dual (a, R \ {r}) ∈ I.

Example

s = 〈{(a,−)(b,+)(c,−)}(e,+)〉
s̄ = 〈{(a,+)(b,−)(c,+)}(e,−)〉

Page 54

Discriminating measure

The balance measure

Let s be a frequent sequential pattern,

$$balance(s) = \frac{|support_D(s)|}{|support_D(s)| + |support_D(\bar{s})|}$$

Properties

balance(s) ∈ [0; 1]

balance(s) = 0.5 ⇒ balanced strategy

balance(s) = 1 or 0 ⇒ imbalanced strategy

$balance(s) + balance(\bar{s}) = 1$
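A minimal sketch of the dual and the balance measure, computed naively by scanning the database twice. Encoded sequences are lists of sets of (action, sign) items, and the three games are made up for illustration:

```python
def dual(pattern):
    """Dual of an encoded sequence: swap '+' and '-' in every (action, r) item."""
    flip = {"+": "-", "-": "+"}
    return [{(a, flip[r]) for a, r in itemset} for itemset in pattern]


def contains(pattern, sequence):
    """Greedy test: each itemset of `pattern` is included, in order, in an itemset of `sequence`."""
    pos = 0
    for itemset in pattern:
        while pos < len(sequence) and not itemset <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1
    return True


def balance(pattern, database):
    """|supp(s)| / (|supp(s)| + |supp(dual(s))|); None if neither occurs."""
    sup = sum(contains(pattern, seq) for seq in database)
    sup_dual = sum(contains(dual(pattern), seq) for seq in database)
    total = sup + sup_dual
    return sup / total if total else None


# Toy database of three encoded games (made up for illustration).
games = [
    [{("a", "+")}, {("b", "+"), ("c", "-")}],
    [{("a", "-")}, {("b", "-")}],
    [{("a", "+"), ("c", "-")}, {("b", "+")}],
]
s = [{("a", "+")}, {("b", "+")}]
print(balance(s, games), balance(dual(s), games))  # 0.6666666666666666 0.3333333333333333
```

Here s (the winner plays a then b) occurs in two games and its dual (the loser plays a then b) in one, so the strategy leans towards winning; the two balances sum to 1, as stated above.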

Page 55

PrefixSpan [Pei et al., 2001]

Algorithm that enumerates frequent sequence prefixes

Input:

• Sequence database (encoded game logs)
• Minimal support (minSupp)

Output:

All frequent sequential patterns and only them
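To show the prefix-projection idea, here is a simplified PrefixSpan for sequences of single items only (no itemsets), which is a deliberate simplification of the algorithm described here: each frequent item extends the current prefix, and the search recurses on the suffixes that follow its first occurrence.

```python
from collections import defaultdict


def prefixspan(database, min_supp):
    """Simplified PrefixSpan for sequences of single items (no itemsets).

    database: list of sequences (lists or strings of items)
    Returns a list of (pattern, support) for every frequent sequential pattern.
    """
    results = []

    def mine(prefix, projected):
        # Count, once per suffix, the items that can extend the current prefix.
        counts = defaultdict(int)
        for suffix in projected:
            for item in set(suffix):
                counts[item] += 1
        for item in sorted(counts):
            if counts[item] < min_supp:
                continue
            pattern = prefix + [item]
            results.append((pattern, counts[item]))
            # Projected database: what follows the first occurrence of `item`.
            new_projected = [s[s.index(item) + 1:] for s in projected if item in s]
            mine(pattern, new_projected)

    mine([], [list(s) for s in database])
    return results


for pattern, supp in prefixspan(["abc", "acb", "ab"], min_supp=2):
    print("".join(pattern), supp)
# a 3 / ab 3 / ac 2 / b 3 / c 2 (one per line)
```

Each recursive call corresponds to a node of the prefix tree, and infrequent items are never extended, which is why the output contains all frequent patterns and only them.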

[Figure: prefix tree of frequent patterns over items i1, ..., i6, with nodes such as <i1>, <i1 i2>, <i1 i3>, <{i1 i6}>, <i4>, <i5>]

Page 56

Algorithms

Balance measure computation

As a post processing

Naively

For each frequent pattern, build its dual, then scan the database to get its support

Naive optimization

[Figure: index of the sequences containing each item, and the prefix tree of frequent patterns]

Index (item: sequence ids):
i1: 1, 2, 6, 10
i2: 1, 2, 20
i3: 1, 6
i4: 3, 7, 14
i5: 3, 8, 9
i6: 3, 6, 10, 15

Item → Dual(Item): i1 → i4, i2 → i6, i3 → i5, i4 → i1, i5 → i3, i6 → i2

SupportDual(<i1>) = Seq(Dual(i1), i1) = {3, 7, 14}

SupportDual(<i1 i2>) = Intersect(SupportDual(<i1>), Seq(Dual(i2), i1)) = {3}

Page 57

Algorithm

Suppressing redundant patterns (a pattern s and its dual s̄ are redundant, since balance(s̄) = 1 − balance(s)):

s = 〈{(a,−)(b,+)(c,−)}(e,+)〉
s̄ = 〈{(a,+)(b,−)(c,+)}(e,−)〉

As a post-process:

Double search in the prefix tree

[Figure: double search in the prefix tree, using the item/dual table i1 → i4, i2 → i6, i3 → i5, i4 → i1, i5 → i3, i6 → i2]

Page 58

Algorithms

Actually, plenty of algorithm adaptations are possible for some particular cases of datasets

We designed an efficient and generic algorithm that extends PrefixSpan by considering two projected databases per node.

G. Bosc, M. Kaytoue, C. Raïssi, J.-F. Boulicaut, P. Tan.

Mining Balanced Sequential Patterns in RTS Games. European Conference on Artificial Intelligence, ECAI 2014.

G. Bosc, P. Tan, J.-F. Boulicaut, C. Raïssi and M. Kaytoue.

A Pattern Mining Approach to Study Strategy Balance in RTS Games. IEEE Transactions on Computational Intelligence and AI in Games (early access), 2015.

Another work applied to StarCraft II data:

C. Low-Kam, C. Raïssi, M. Kaytoue, J. Pei.

Mining Statistically Significant Sequential Patterns. International Conference on Data Mining (ICDM) 2013.

Page 59

Data collection

Scraped 371,267 replays, filtered to keep 90,768 games from 30,678 different players

[Figure: three plots against game time (min, 0 to 40): number of replays; APM (average, and average ± standard deviation); percentage of actions per type (Build, Train, Select, Move, Click, Research, Upgrade, HotKey, Minimap, Other)]
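The APM panel shows an average curve and a band of one standard deviation around it over game time. A minimal sketch of how such curves could be computed, assuming each replay provides action timestamps in seconds and its duration: only complete minutes are counted, and each minute is averaged over the games still running at that point. The function names are hypothetical, and the slide does not say whether population or sample standard deviation was used (population is used here).

```python
from statistics import mean, pstdev


def apm_curve(action_times, duration_s):
    """Action count for each complete minute of one game (timestamps in seconds)."""
    n_minutes = int(duration_s // 60)
    counts = [0] * n_minutes
    for t in action_times:
        if 0 <= t < n_minutes * 60:
            counts[int(t // 60)] += 1
    return counts


def apm_bands(games):
    """Per-minute (mean, mean - std, mean + std) over the games still running."""
    curves = [apm_curve(times, duration) for times, duration in games]
    horizon = max((len(c) for c in curves), default=0)
    bands = []
    for m in range(horizon):
        values = [c[m] for c in curves if len(c) > m]
        mu, sd = mean(values), pstdev(values)
        bands.append((mu, mu - sd, mu + sd))
    return bands


# Toy replays (hypothetical): (action timestamps in seconds, game duration in seconds).
games = [([5, 10, 70], 120), ([1, 2, 3, 61], 125)]
bands = apm_bands(games)
```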

Sequence dataset

Data (Build)   Item     Seq.    IS     I/IS
PvP            1,160     6,668  11.5   2.0
PvT            3,655    18,754  19.0   2.6
PvZ            3,748    22,784  19.6   2.7
TvT            2,201     7,457  20.7   2.8
TvZ            4,492    23,637  22.5   2.8
ZvZ            1,689     9,554  14.2   2.2

Table: Encoding of building construction during the first 10 minutes
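The table summarizes build-order sequences per matchup. A minimal sketch of such an encoding and of the summary statistics, under stated assumptions: each build event is (time in seconds, building, player sign); events in the first 10 minutes are grouped into one itemset per time slot; an item is (building, slot, sign), mirroring the patterns on the following slides; the slot length is a hypothetical parameter; and Item, Seq., IS and I/IS are read as the number of distinct items, the number of sequences, the average number of itemsets per sequence and the average number of items per itemset.

```python
from collections import defaultdict


def encode_game(build_events, slot_s, horizon_s=600):
    """Group (time_s, building, sign) events before `horizon_s` into one itemset per slot."""
    slots = defaultdict(set)
    for t, building, sign in build_events:
        if 0 <= t < horizon_s:
            slot = int(t // slot_s)
            slots[slot].add((building, slot, sign))
    return [frozenset(slots[k]) for k in sorted(slots)]


def dataset_stats(sequences):
    """Distinct items, sequences, itemsets per sequence, items per itemset."""
    distinct = {item for seq in sequences for itemset in seq for item in itemset}
    n_itemsets = sum(len(seq) for seq in sequences)
    n_items = sum(len(itemset) for seq in sequences for itemset in seq)
    return {
        "Item": len(distinct),
        "Seq.": len(sequences),
        "IS": n_itemsets / len(sequences),
        "I/IS": n_items / n_itemsets,
    }


# Toy games (hypothetical); the Nexus at 700 s falls outside the 10-minute window.
games = [
    [(30, "Pylon", "+"), (35, "Pylon", "-"), (100, "Gateway", "+"), (700, "Nexus", "+")],
    [(10, "Pylon", "+")],
]
stats = dataset_stats([encode_game(g, slot_s=60) for g in games])
```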

Quantitative results

Quantitative results

Axis of symmetry: y = 0.5

Imperfect symmetry: if a sequence s is frequent, its dual s̄ is not necessarily frequent

The patterns with the highest support are the best-known strategies, and they are balanced

Example of discovered patterns [Forge-Expand]

Protoss strategy in PvZ

Motivation: favor the economy in the early game while still being able to defend

minSupp 5% - 591 patterns

s = 〈{(Nexus, 5, +)} {(Gateway, 6, +) (PhotonCannon, 6, +)}〉, balance(s) = 0.52

s = 〈{(Nexus, 5, +)} {(PhotonCannon, 6, +) (Assimilator, 6, +)}〉, balance(s) = 0.52

Build Order: Forge Expand

Time (sec)   Action
36-40        Pylon
96-106       Forge
132-145      Nexus
132-145      Pylon
144-158      Gateway
144-158      Photon Cannon
144-158      Assimilator x2

Source : http://www.teamliquid.net/

Example of discovered patterns [4 Gates]

Protoss strategy in PvP

Motivation: all-in, aggressive early-game attack (sacrifices the economy)

minSupp 5% - 3418 patterns

s = 〈{(Gateway, 3, +, 1) (Assimilator, 3, +, 1)} {(Cyb.Core, 4, +, 1)} {(Gateway, 7, +, 2) (Gateway, 7, +, 3) (Gateway, 7, +, 4)}〉, balance(s) = 0.59

Build Order: 4 Gates

Time (sec)   Action
36-40        Pylon
72-79        Gateway
96-106       Assimilator
108-119      Pylon
132-145      Cybernetics Core
192-211      Warpgate
216-238      Gateway x3
240-264      Pylon
240-264      Assimilator

Source : http://www.teamliquid.net/

Imbalanced strategies

A hot topic for game publishers

TvZ with minSupp = 1%: 17 990 patterns

“Bunker Rush” detected and imbalanced:

602 patterns contain a Bunker

20 patterns have balance(s) ≥ 0.6 or ≤ 0.4 when the Bunker is built in the early game, e.g. s = 〈{(Barracks, 1, S, 1)}, {(SpPool, 4, F, 1)}, {(Bunker, 6, S, 1), (SpCrawler, 6, F, 1)}〉 (balance(s) = 0.61)
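The Bunker Rush analysis keeps the patterns that contain a Bunker and whose balance is at least 0.6 or at most 0.4. A minimal sketch of that selection, assuming mined patterns are available as (pattern, balance) pairs with items shaped like (label, slot, player, count) as in the example pattern; balance values are taken as given rather than computed, and the function name is hypothetical.

```python
def imbalanced_with(patterns, label, low=0.4, high=0.6):
    """Patterns containing an item named `label` whose balance is <= low or >= high."""
    selected = []
    for pattern, balance in patterns:
        has_label = any(item[0] == label for itemset in pattern for item in itemset)
        if has_label and (balance <= low or balance >= high):
            selected.append((pattern, balance))
    return selected


# The first entry is the slide's example; the other two use toy balance values.
mined = [
    ([[("Barracks", 1, "S", 1)], [("SpPool", 4, "F", 1)],
      [("Bunker", 6, "S", 1), ("SpCrawler", 6, "F", 1)]], 0.61),
    ([[("Bunker", 6, "S", 1)]], 0.50),
    ([[("Barracks", 1, "S", 1)]], 0.30),
]
bunker_issues = imbalanced_with(mined, "Bunker")
```

Only the first pattern is selected: the second contains a Bunker but is balanced, and the third is imbalanced but has no Bunker.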

This balance issue was actually corrected (May 2012): a Zerg counter unit was slightly improved and the Bunker build time is longer. We split the dataset into two parts and ran a comparative analysis: frequent patterns with Bunkers are more balanced.

The code is available and can be used for other tasks: https://github.com/guillaume-bosc/BalanceSpan

(For example, mining (im)balanced drafting in MOBA games.)

Outline

1 Predictive models from behavioral data

2 Unscrambling confusion matrices to identify aliases

3 Enumerating the lattice of binary classifiers

4 Discovering strategies and balance issues

5 Conclusion

Conclusion

Take-away facts

E-sport may not be a ‘true’ sport, but its development is incredible

New challenges in video game design and analytics: the fun/difficulty paradigm, satisfying both standard players and pros

Game traces hide individual patterns

In StarCraft 2, notably through customizable keyboard usage

When avatar aliases are present, one needs to unscramble the confusion matrix

To avoid biases, one can build the lattice of binary classifiers

Game traces hide strategies

Sequential pattern mining with a new measure, the balance measure, can help discover such patterns

It can be applied to any zero-sum game scenario for descriptive analytics

Thanks to my colleagues
at INSA / LIRIS: Guillaume Bosc, Jean-Francois Boulicaut, Victor Codocedo, Quentin Labernia, Marc Plantevit, Celine Robardet
at MIT Media Lab / Game Lab: Philip Tan
at INRIA: Chedy Raïssi
and most importantly to you and the AIST organization team!
