Searching for Credible Relations in Machine Learning Doctoral Dissertation

Vedrana Vidulin

Supervisor: prof. dr. Matjaž Gams
Co-supervisor: prof. dr. Bogdan Filipič

Ljubljana, 3 February 2012

Introduction

• Task: domain analysis of complex domains
• Problem:
  – When DM methods construct models on complex domains, the models often contain parts (relations) that are less credible from the perspective of a human analyst.
  – Less-credible parts can:
    • Lead to wrong conclusions about the most important relations in the domain
    • Undermine the user's trust in DM methods (Stumpf et al., 2009).
• Proposed solution: a new method that algorithmically combines human understanding and raw computer power to extract credible relations, that is, relations supported by data and meaningful to the human.

An Example

• A decision-tree model is constructed:
  – With the J48 algorithm in Weka,
  – From a data set that represents the impact of the R&D sector on the economic welfare of a country

Country | GERD per capita (PPP$) | Researchers per million inhabitants (HC) | … | Sector investing the most in R&D | GNI per capita
Armenia | 7.6 | 1,660 | … | Government | low
Latvia | 37.1 | 2,455 | … | Government | middle
Japan | 813.7 | 6,227 | … | Business enterprise | high
… | … | … | … | … | …

Data: 37 attributes: R&D sector; 167 examples: countries; class: economic welfare

An Example (2)

[Figure: J48 decision tree for the R&D data set. Test nodes: GERD per capita (PPP$), with thresholds 105.5 and 10.8; Sector employing the most researchers; Sector investing the most in R&D. Branch values: Abroad, Business enterprise, Government, Higher education, Private non-profit, N/A.]
[Leaves: middle (49.0/20.92), middle (42.87/13.15), middle (5.0), low (12.58/0.39), middle (10.29/4.29), middle (16.7/8.77), high (6.57/1.28), high (24.0/1.0), and three empty leaves labelled high (0.0).]

Outline

• Definition of credible relation
• Human-Machine Data Mining (HMDM) method
• Experimental evaluation
• Conclusions and contributions

Credible Relation

• Relation – a pattern that connects a set of attributes that describe the properties of a concept underlying the data and a class/target attribute that represents the concept.
• Credible relation – of great meaning and of high quality:
  – Meaning – a subjective criterion attributed by the human based on common sense, informal knowledge about the domain, and the observed frequency and stability of the relation.
  – Quality – an objective criterion that indicates support by the selected quality measures.
• Credible model – composed only of credible relations.

How to Establish Credible Relations?

The relation is composed of attributes A1 and A2.

Re-examine the relation's credibility by:
1) Removing attributes A1 and A2 from the data set
2) Adding attributes A1 and A2 to …

If the relation is supported by evidence, add it to the list of candidates for credible relations.

The HMDM Algorithm

Repeat until no new interesting relations are found:
1) Create several models (e.g., trees)
2) Choose the most interesting models
3) For each interesting model, examine the credibility of relations in the model by adding and removing attributes from the data set
4) Merge candidate relations with the output list of credible relations

The HMDM Algorithm (2)

HMDM (data set)
  REPEAT
    Select DM method
    Select parameters and their ranges, define constraints
    Perform INITIAL_DM creating a list of models LM
    FOR each interesting model M from LM, reexamine M:
      REPEAT
        Perform any of the following: {
          ADD_ATTRIBUTES
          REMOVE_ATTRIBUTES
          Expand credibility indicator
        }
        Evaluate the results with several quality measures and for meaning
      UNTIL no more interesting relations are found in the search space near the initial model
      Store credible relations and integrate conclusions
    END FOR
  UNTIL no more new interesting relations are found anywhere in the data set
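The control flow of the HMDM pseudocode can be sketched in Python. This is a minimal sketch under stated assumptions, not the program developed in the dissertation: the function name `hmdm_loop` and the three callables (`build_models`, `is_interesting`, `reexamine`) are hypothetical stand-ins for the DM step and for the analyst's judgements, and `max_rounds` is an added safety bound that the slides do not mention.

```python
from typing import Callable, FrozenSet, Iterable, List, Set

Model = FrozenSet[str]  # a model, reduced here to the attributes it uses
Relation = str          # a relation, reduced here to a text label


def hmdm_loop(
    build_models: Callable[[], List[Model]],
    is_interesting: Callable[[Model], bool],
    reexamine: Callable[[Model], Iterable[Relation]],
    max_rounds: int = 10,  # assumed safety bound, not part of the pseudocode
) -> Set[Relation]:
    """Outer REPEAT ... UNTIL loop of the HMDM pseudocode (sketch only)."""
    credible: Set[Relation] = set()
    for _ in range(max_rounds):
        found_new = False
        for model in build_models():           # INITIAL_DM: list of models LM
            if not is_interesting(model):      # the human picks interesting models
                continue
            for relation in reexamine(model):  # add/remove attributes, evaluate
                if relation not in credible:
                    credible.add(relation)     # store credible relations
                    found_new = True
        if not found_new:                      # UNTIL no new relations are found
            break
    return credible


# Toy usage: stub callables stand in for the analyst and the DM method.
models = [frozenset({"A1", "A2"}), frozenset({"A3"})]
result = hmdm_loop(
    build_models=lambda: models,
    is_interesting=lambda m: len(m) > 1,
    reexamine=lambda m: ["&".join(sorted(m))],
)
print(result)
```

Passing the human decisions in as callables keeps the loop itself mechanical, which mirrors the division of labour the method describes: the computer enumerates and evaluates, the analyst decides what is interesting and meaningful.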

HMDM: ADD_ATTRIBUTES

Model: J48 trees
Quality: Accuracy (%)

ATTRIBUTES
A1 | A2 | A3 | C
1 | 1 | 0 | 1
1 | 1 | 0 | 1
1 | 0 | 1 | 0
0 | 1 | 1 | 0
1 | 1 | 0 | 1
0 | 0 | 1 | 0
1 | 0 | 0 | 0

Add-attributes graph (attribute | accuracy):
NO ATTRIBUTES
A1 | 71.43
A2 | 85.71
A2 | 100

Candidates for credible relations:
A1 & A2 – combination
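The accuracies on this slide can be checked with a short script. This is a minimal sketch, assuming that accuracy is measured on the training examples and using a majority-vote lookup table in place of a J48 tree; the helper name `subset_accuracy` is hypothetical. The rows are the seven examples from the slide.

```python
from collections import Counter, defaultdict

# The seven examples from the slide: columns A1, A2, A3 and class C.
ROWS = [
    (1, 1, 0, 1),
    (1, 1, 0, 1),
    (1, 0, 1, 0),
    (0, 1, 1, 0),
    (1, 1, 0, 1),
    (0, 0, 1, 0),
    (1, 0, 0, 0),
]
COLUMNS = {"A1": 0, "A2": 1, "A3": 2}


def subset_accuracy(rows, attributes):
    """Training accuracy (%) of a majority-vote lookup table on `attributes`.

    A stand-in for building a J48 tree on those attributes, not J48 itself.
    """
    groups = defaultdict(Counter)
    for row in rows:
        key = tuple(row[COLUMNS[a]] for a in attributes)
        groups[key][row[-1]] += 1  # count class values per attribute pattern
    correct = sum(max(counts.values()) for counts in groups.values())
    return round(100.0 * correct / len(rows), 2)


acc_a1 = subset_accuracy(ROWS, ["A1"])
acc_a2 = subset_accuracy(ROWS, ["A2"])
acc_a1_a2 = subset_accuracy(ROWS, ["A1", "A2"])
print(acc_a1, acc_a2, acc_a1_a2)
```

Under these assumptions the values agree with the slide: 71.43 for A1 alone, 85.71 for A2 alone, and 100 once both are present. Neither attribute reaches the accuracy of the pair on its own, which is what marks A1 & A2 as a combination.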

HMDM: REMOVE_ATTRIBUTES

Model: J48 trees
Quality: Accuracy (%)

ATTRIBUTES
A1 | A2 | A3 | C
1 | 0 | 1 | 1
0 | 1 | 0 | 0
0 | 1 | 0 | 0
1 | 0 | 1 | 1
1 | 0 | 1 | 1
1 | 1 | 1 | 1
1 | 1 | 1 | 1

Remove-attributes graph (attribute | accuracy):
ALL ATTRIBUTES | 100
A3 | 100
A1 | 71.43

Candidates for credible relations:
A1 || A3 – redundancy
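The redundancy of A1 and A3 can be illustrated by removing attributes one at a time. This is a minimal sketch, again assuming training-set accuracy and a majority-vote lookup table as a stand-in for J48; the helper name `accuracy_without` is hypothetical. The rows are the seven examples from the slide.

```python
from collections import Counter, defaultdict

# The seven examples from the slide: columns A1, A2, A3 and class C.
ROWS = [
    (1, 0, 1, 1),
    (0, 1, 0, 0),
    (0, 1, 0, 0),
    (1, 0, 1, 1),
    (1, 0, 1, 1),
    (1, 1, 1, 1),
    (1, 1, 1, 1),
]
NAMES = ["A1", "A2", "A3"]


def accuracy_without(rows, removed):
    """Training accuracy (%) after dropping the attributes in `removed`.

    Uses a majority-vote lookup table on the remaining attributes,
    which is a stand-in for rebuilding a J48 tree.
    """
    kept = [i for i, name in enumerate(NAMES) if name not in removed]
    groups = defaultdict(Counter)
    for row in rows:
        groups[tuple(row[i] for i in kept)][row[-1]] += 1
    correct = sum(max(counts.values()) for counts in groups.values())
    return round(100.0 * correct / len(rows), 2)


acc_all = accuracy_without(ROWS, set())
acc_no_a3 = accuracy_without(ROWS, {"A3"})
acc_no_a1 = accuracy_without(ROWS, {"A1"})
acc_no_a1_a3 = accuracy_without(ROWS, {"A1", "A3"})
print(acc_all, acc_no_a3, acc_no_a1, acc_no_a1_a3)
```

Under these assumptions, removing A3 alone or A1 alone leaves the accuracy at 100, while removing both drops it to 71.43. Each attribute can replace the other, which is the pattern the slide labels as redundancy.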

Type-Credibility Scheme

• Three levels of credibility:
  1. Frequent and stable relations
     • Often appear in models
     • When added, improve quality
     • When removed, reduce quality
  2. Frequent and less-stable relations
     • Often appear in models
     • When added, sometimes improve quality and sometimes not
     • When removed, sometimes reduce quality and sometimes not
  3. Not supported by evidence
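The three levels can be read as a simple decision rule. The following is a minimal sketch, not the scheme's actual implementation: the function name `credibility_level`, its parameters, and the `min_frequency` cut-off for "often appear" are assumptions made for illustration, and in HMDM the human analyst also judges meaning, which this rule does not capture. The numbers in the usage lines are invented.

```python
from typing import Sequence


def credibility_level(
    appearances: int,
    n_models: int,
    add_deltas: Sequence[float],
    remove_deltas: Sequence[float],
    min_frequency: float = 0.5,  # assumed cut-off for "often appear in models"
) -> int:
    """Return 1, 2 or 3, following the three levels on the slide.

    add_deltas:    quality change each time the relation's attributes were added
    remove_deltas: quality change each time they were removed
    """
    frequent = n_models > 0 and appearances / n_models >= min_frequency
    helped = [d > 0 for d in add_deltas]   # adding improved quality
    hurt = [d < 0 for d in remove_deltas]  # removing reduced quality
    evidence = helped + hurt
    if frequent and evidence and all(evidence):
        return 1  # frequent and stable
    if frequent and any(evidence):
        return 2  # frequent and less stable
    return 3      # not supported by evidence


# Illustrative (invented) observations for three relations.
level_stable = credibility_level(8, 10, [4.2, 1.5], [-3.0, -0.8])
level_mixed = credibility_level(8, 10, [4.2, 0.0], [-3.0, 0.4])
level_none = credibility_level(1, 10, [0.0], [0.1])
print(level_stable, level_mixed, level_none)
```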

Quality Measures

• The decision trees are evaluated according to:
  – Accuracy
  – Corrected class probability estimate (CCPE)
  – Kappa
• The regression trees are evaluated according to:
  – Correlation coefficient
  – Relative absolute accuracy (RAA)
• In addition, trees are evaluated according to the total change in quality caused by adding and removing attributes.
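Two of the decision-tree measures, accuracy and Kappa, are easy to compute by hand. The sketch below assumes that Kappa means Cohen's kappa, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance; the function name `accuracy_and_kappa` and the example labels are hypothetical. CCPE and RAA are not sketched because the slides do not define them.

```python
from collections import Counter


def accuracy_and_kappa(actual, predicted):
    """Return (accuracy, Cohen's kappa) for two equal-length label lists."""
    n = len(actual)
    # Observed agreement: share of examples where prediction equals the class.
    p_o = sum(a == p for a, p in zip(actual, predicted)) / n
    # Chance agreement: product of marginal class frequencies, summed over classes.
    actual_counts = Counter(actual)
    predicted_counts = Counter(predicted)
    p_e = sum(actual_counts[c] * predicted_counts[c] for c in actual_counts) / (n * n)
    kappa = (p_o - p_e) / (1 - p_e) if p_e < 1 else 0.0
    return p_o, kappa


# Invented example with the three welfare classes used in the R&D data set.
actual = ["low", "low", "middle", "middle", "high", "high"]
predicted = ["low", "middle", "middle", "middle", "high", "low"]
acc, kappa = accuracy_and_kappa(actual, predicted)
print(round(acc, 3), round(kappa, 3))
```

In this invented example four of six predictions are correct, and chance agreement is one third, so Kappa comes out at about 0.5: the model is halfway between chance and perfect agreement, a more conservative reading than the raw accuracy suggests.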

Experimental Evaluation

• Performed on three domains:
  1. Research and development (R&D)
  2. Higher education
  3. Automatic web genre identification

R&D Domain: Remove Attributes Graph

GERD-PC || GERD-GDP
RES-HC || RES-FTE
APP-NON-RES

Domains

• Higher education
  – Goal: An analysis of the impact of the higher education sector on the economic welfare of a country
  – DM methods: J48 and M5P trees
  – Data: 60 attributes; 167 examples: countries; class: GNI per capita
• Automatic web genre identification
  – Goal: Improve predictive performance by eliminating less-credible relations from J48 decision-tree models
  – Data: 500 attributes: words; 1,539 examples: web pages; class: 20 genres

R&D and Higher Education Domains – Credible Relations

R&D
• First level: increase the level of investment in the R&D sector
• Second level:
  – Increase the number of patents
  – Increase the number of researchers
  – Develop the business enterprise sector as the key leader in R&D activities

Higher education
• First level: stimulate participation in higher education and improve student exchange programs
• Second level:
  – Increase the level of investment in all levels of education (“low”)
  – Increase the number of graduates in science programs (“middle”)
  – Attract more foreign students (“middle”)

Evaluation

• User study on 22 participants:
  – 64% of participants did not recognize less-credible relations in the single model
  – When presented with credible models, all participants accepted them as better

Accuracy (%)
Data | J48 | HMDM
HI-EDU | 71.86
R&D | 63.47

Correlation coefficient
Data | M5P | HMDM
HI-EDU | 0.681
R&D | 0.722 | 0.787

Data: Genres
F-Measure | J48 | HMDM
Micro-AVG | 0.280 | 0.370
Macro-AVG | 0.284 | 0.377
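The difference between the micro-averaged and macro-averaged F-Measure in the genre table can be shown with a short sketch. It assumes the usual definitions: the macro average is the mean of per-class F-Measures, while the micro average pools true positives, false positives and false negatives over all classes first. The function names, the genre labels and the counts are hypothetical and are not taken from the experiments.

```python
def f_measure(tp, fp, fn):
    """F-Measure (F1) from true positives, false positives, false negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f(counts):
    """counts maps each class to a (tp, fp, fn) tuple; returns (micro, macro)."""
    # Macro: average the per-class scores, so every class weighs the same.
    macro = sum(f_measure(*c) for c in counts.values()) / len(counts)
    # Micro: pool the counts first, so frequent classes dominate.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return f_measure(tp, fp, fn), macro


# Invented counts for two hypothetical genres.
counts = {"genre_a": (8, 2, 2), "genre_b": (1, 3, 4)}
micro, macro = micro_macro_f(counts)
print(round(micro, 3), round(macro, 3))
```

Reporting both averages, as the slide does, shows whether an improvement comes mainly from the large classes (micro) or holds across classes regardless of their size (macro).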

Conclusions

• A novel method, Human-Machine Data Mining (HMDM), was designed that combines human understanding and raw computer power to extract credible relations from data.
• The HMDM method was evaluated on three complex domains, showing that:
  – the method is able to find important relations in data
  – credible models are better in quality than the models constructed by automatic DM methods
  – humans accept credible models

Contributions

• The main contributions:
  – A new method, Human-Machine Data Mining (HMDM), was designed for extracting credible relations from data
  – The CCPE statistical measure, originally conceived for classification rules, was extended to decision trees
  – Interactive explanation structures in the form of added-attributes and removed-attributes graphs were designed to facilitate the extraction of credible relations
• Additional contributions:
  – A computer program was developed to support the HMDM method
  – Three real-life domains were analysed

