Active subgroup mining for descriptive induction tasks Dragan Gamberger Rudjer Bošković Institute,...

Post on 15-Jan-2016


Active subgroup mining for descriptive induction tasks

Dragan Gamberger

Rudjer Bošković Institute, Zagreb

Zdenko Sonicki

University of Zagreb

Talk overview:

- descriptive induction
- active subgroup mining
- subgroup discovery
- data mining server
- a real medical example

Descriptive induction is aimed at generating (inducing) knowledge that is understandable (interpretable) by humans.

It differs from classification-oriented induction, where the main goal is high classification quality (but the induced classification schemes are typically too complex for human interpretation).

Main properties of descriptive induction:

- simple rules

- reasonable prediction quality (on both available and future cases)

Main problem: overfitting

Example: a functional genomics domain with 150 examples, each described by 16,000 measured attribute values.
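Why so few examples and so many attributes invite overfitting can be illustrated with synthetic noise (an assumption, not data from the talk). In the minimal sketch below, class labels and binary attributes are independent coin flips, so no attribute carries real information. Still, with 16,000 attributes and 150 training examples, the best single-attribute rule looks clearly better than chance on the training data and falls back to chance level on fresh data. The function name `spurious_rule_accuracy` is hypothetical.

```python
import random


def spurious_rule_accuracy(n_examples: int, n_attributes: int,
                           n_test: int = 1000, seed: int = 0) -> tuple[float, float]:
    """Train and test accuracy of the best single-attribute rule on pure noise.

    Labels and binary attributes are independent fair coin flips, so no rule
    has real predictive value; training accuracy above 0.5 is a chance fit.
    """
    rng = random.Random(seed)
    labels = [rng.getrandbits(1) for _ in range(n_examples)]

    best_acc, best_negated = 0.0, False
    for _ in range(n_attributes):
        column = [rng.getrandbits(1) for _ in range(n_examples)]
        agree = sum(v == y for v, y in zip(column, labels)) / n_examples
        # A rule may predict "class = attribute" or its negation.
        acc, negated = (agree, False) if agree >= 0.5 else (1 - agree, True)
        if acc > best_acc:
            best_acc, best_negated = acc, negated

    # Fresh test cases: the chosen attribute is still independent of the class,
    # so drawing a new random value for it is equivalent to measuring it.
    test_rng = random.Random(seed + 1)
    hits = 0
    for _ in range(n_test):
        y, v = test_rng.getrandbits(1), test_rng.getrandbits(1)
        prediction = 1 - v if best_negated else v
        hits += prediction == y
    return best_acc, hits / n_test


for n_attr in (10, 16000):
    train, test = spurious_rule_accuracy(150, n_attr)
    print(f"{n_attr:>5} attributes: train {train:.2f}, test {test:.2f}")
```

The more attributes are searched, the higher the best training accuracy climbs by chance alone, while test accuracy stays near 0.5.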


Active subgroup mining is a data analysis approach developed specifically for medical applications (but also applicable to other domains).

It is based on the observation that expert knowledge (in medical domains, the knowledge and experience of medical doctors) is very important for the quality of the obtained results.

In active subgroup mining, the expert is positioned at the center of the process, and machine learning (subgroup discovery) is only a tool that helps the expert in the data analysis process.

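[Figure: the active subgroup mining process, with the expert at the center and subgroup discovery as a supporting tool. Labeled steps: definition of task(s), induction of models, presentation, visualization, integration, statistical evaluation, selection of models.]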
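The expert-centered process described above can be sketched as a loop in which the machine proposes and the expert decides. This is a minimal sketch under stated assumptions: each round induces candidate subgroups for the current task, scores them, and presents them to the expert, who may accept some of them and/or redefine the task. All names (`active_mining`, `Review`, the callbacks) are hypothetical, and the expert is modeled as a plain callback standing in for a person at the user interface.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Review:
    """The expert's decision after inspecting one round of results."""
    accepted: list[str] = field(default_factory=list)  # subgroups the expert keeps
    new_task: Optional[dict] = None  # refined task definition, or None to stop


def active_mining(
    task: dict,
    induce: Callable[[dict], list[str]],
    evaluate: Callable[[str], float],
    review: Callable[[dict, list[tuple[str, float]]], Review],
    max_rounds: int = 5,
) -> list[str]:
    """Hypothetical expert-in-the-loop cycle around a subgroup discovery tool."""
    kept: list[str] = []
    for _ in range(max_rounds):
        candidates = induce(task)                        # induction of models
        scored = [(s, evaluate(s)) for s in candidates]  # statistical evaluation
        decision = review(task, scored)                  # presentation + expert selection
        kept.extend(decision.accepted)
        if decision.new_task is None:                    # the expert is satisfied
            break
        task = decision.new_task                         # the expert redefines the task
    return kept


# Toy stand-ins: the "expert" first asks for more general subgroups, then accepts.
def toy_induce(task: dict) -> list[str]:
    return [f"subgroup@g={task['g']}"]


def toy_evaluate(subgroup: str) -> float:
    return 1.0


def toy_review(task: dict, scored: list[tuple[str, float]]) -> Review:
    if task["g"] < 10:
        return Review(new_task={"g": task["g"] * 10})
    return Review(accepted=[s for s, _ in scored])


print(active_mining({"g": 1}, toy_induce, toy_evaluate, toy_review))
```

Keeping the expert as an explicit decision point, rather than an automatic stopping rule, reflects the slide's claim that the machine learning component is only a tool.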


[Figure: classical versus subgroup discovery induction; panels: very specific subgroup, very sensitive subgroup.]

Generality is the main parameter of the subgroup induction process: it controls the trade-off between very specific and very sensitive subgroups.
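The slides do not give a formula for generality. As an assumption, the sketch below uses the quality measure q_g = TP / (FP + g) of the SD subgroup discovery algorithm. Here TP and FP are the positive and negative examples covered by a subgroup, and g is the generalization parameter. A small g favors very specific subgroups with few false positives, while a large g favors very sensitive subgroups that cover many positives. The two candidate subgroups and their counts are invented for illustration.

```python
def q_g(tp: int, fp: int, g: float) -> float:
    """Generalization quality TP / (FP + g); g > 0 controls generality."""
    if g <= 0:
        raise ValueError("g must be positive")
    return tp / (fp + g)


def best_subgroup(candidates: dict[str, tuple[int, int]], g: float) -> str:
    """Name of the candidate whose (TP, FP) counts give the highest q_g."""
    return max(candidates, key=lambda name: q_g(*candidates[name], g))


# Hypothetical subgroups: (true positives covered, false positives covered).
candidates = {
    "specific": (20, 0),    # covers few cases, no false positives
    "sensitive": (80, 10),  # covers many cases, some false positives
}

for g in (1, 100):
    print(g, best_subgroup(candidates, g))
# → 1 specific
# → 100 sensitive
```

With these counts the preference switches at g = 10/3, where 20/g = 80/(10 + g).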

Subgroup discovery is a beam search algorithm that generates short rules in the form of conjunctions of conditions.

Conditions are based on the values of available attributes.

example:

CHD <- age > 53 AND T.CH > 6.1 AND BMI < 30
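The slides do not spell out the search itself. The sketch below is a generic beam search over conjunctions and should not be read as the authors' SD implementation. Candidate conditions are `attribute > v` and `attribute < v` for every observed value v, and each rule uses at most one condition per attribute. Rules are ranked by q_g = TP / (FP + g), a measure borrowed from the SD algorithm as an assumption, with ties broken in favor of shorter rules. The patient records and the helper names (`covers`, `counts`, `quality`, `beam_search`) are hypothetical. The slide's example rule is evaluated on the same toy data.

```python
Condition = tuple[str, str, float]  # (attribute, operator, threshold)
Rule = tuple[Condition, ...]        # conjunction of conditions

OPS = {">": lambda x, t: x > t, "<": lambda x, t: x < t}


def covers(rule: Rule, record: dict[str, float]) -> bool:
    """True if the record satisfies every condition of the conjunction."""
    return all(OPS[op](record[attr], thr) for attr, op, thr in rule)


def counts(rule: Rule, data) -> tuple[int, int]:
    """(TP, FP): covered positive and covered negative examples."""
    tp = sum(1 for rec, target in data if target and covers(rule, rec))
    fp = sum(1 for rec, target in data if not target and covers(rule, rec))
    return tp, fp


def quality(rule: Rule, data, g: float) -> float:
    tp, fp = counts(rule, data)
    return tp / (fp + g)


def beam_search(data, g: float = 5.0, beam_width: int = 3,
                max_len: int = 3) -> list[Rule]:
    """Keep the beam_width best conjunctions per length; return the overall best."""
    # Candidate conditions: "attr > v" and "attr < v" for every observed value v.
    conditions = sorted({(a, op, v) for rec, _ in data
                         for a, v in rec.items() for op in OPS})

    def rank(rule: Rule):
        # Higher q_g first; among equal quality prefer shorter rules.
        return (-quality(rule, data, g), len(rule), rule)

    beam: list[Rule] = [()]
    best: list[Rule] = []
    for _ in range(max_len):
        refinements = {
            tuple(sorted(rule + (cond,)))
            for rule in beam
            for cond in conditions
            if cond[0] not in {c[0] for c in rule}  # at most one condition per attribute
        }
        beam = sorted(refinements, key=rank)[:beam_width]
        best = sorted(set(best) | set(beam), key=rank)[:beam_width]
    return best


# Hypothetical toy data: (record, has_CHD). All values are invented.
DATA = [
    ({"age": 60.0, "T.CH": 7.0, "BMI": 27.0}, True),
    ({"age": 58.0, "T.CH": 6.5, "BMI": 25.0}, True),
    ({"age": 65.0, "T.CH": 6.8, "BMI": 28.0}, True),
    ({"age": 45.0, "T.CH": 5.0, "BMI": 24.0}, False),
    ({"age": 62.0, "T.CH": 5.5, "BMI": 26.0}, False),
    ({"age": 55.0, "T.CH": 6.6, "BMI": 33.0}, False),
    ({"age": 40.0, "T.CH": 6.9, "BMI": 22.0}, False),
]

# The rule from the slide: CHD <- age > 53 AND T.CH > 6.1 AND BMI < 30
CHD_RULE: Rule = (("age", ">", 53.0), ("T.CH", ">", 6.1), ("BMI", "<", 30.0))
print(counts(CHD_RULE, DATA))  # → (3, 0)

for rule in beam_search(DATA):
    text = " AND ".join(f"{a} {op} {t:g}" for a, op, t in rule)
    print(f"CHD <- {text}  (q = {quality(rule, DATA, 5.0):.2f})")
```

The beam width bounds how many partial rules survive each level, which keeps the search cheap and the resulting rules short.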


Data mining server: dms.irb.hr

Meningoencephalitis domain: a subgroup describing the bacterial type of the disease, in contrast to the viral type


Conclusions:

- descriptive induction and active subgroup mining are novel concepts, potentially very interesting for data analysis and knowledge induction in medical applications

- active and central role of medical experts is essential

- we have extensive and positive experience with this methodology in different medical domains, but no experience in constructing medical guidelines. For such applications, the following might be useful:

- detection of decision points for numerical attributes

- detection of apparent but significant contradictions

- explicit noise detection
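For the first item, detecting decision points for numerical attributes, one standard starting point (not necessarily what the authors have in mind) is to enumerate boundary points. These are midpoints between adjacent distinct values of an attribute at which the class changes, or at which either value already occurs with mixed classes. The function name and the example ages and labels below are hypothetical.

```python
def candidate_cut_points(values: list[float], labels: list[int]) -> list[float]:
    """Midpoints between adjacent distinct values where the class can change."""
    # Collect the set of class labels observed at each distinct value.
    by_value: dict[float, set[int]] = {}
    for v, y in zip(values, labels):
        by_value.setdefault(v, set()).add(y)

    distinct = sorted(by_value)
    cuts = []
    for lo, hi in zip(distinct, distinct[1:]):
        # No boundary only if both values are pure and of the same class.
        if len(by_value[lo] | by_value[hi]) > 1:
            cuts.append((lo + hi) / 2)
    return cuts


# Hypothetical ages with a binary diagnosis (1 = positive).
ages = [60, 58, 65, 45, 62, 55, 40]
diagnosis = [1, 1, 1, 0, 0, 0, 0]
print(candidate_cut_points(ages, diagnosis))  # → [56.5, 61.0, 63.5]
```

Only these points need to be considered as thresholds, and an expert can then judge which of them are medically meaningful decision points.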