Post on 26-May-2015
transcript
FUNCTION OF RIVAL SIMILARITY IN COGNITIVE DATA ANALYSIS
Nikolay Zagoruiko, Irina Borisova, Vladimir Dyubanov, Olga Kutnenko
Institute of Mathematics of the Siberian Division of the Russian Academy of Sciences,
Pr. Koptyug 4, 630090 Novosibirsk, Russia,
zag@math.nsc.ru
Data Analysis, Pattern Recognition, Empirical Prediction, Discovery of Regularities, Data Mining, Machine Learning,
Knowledge Discovery, Intelligent Data Analysis, Cognitive Computations
Special attention is drawn to the human abilities:
- to estimate similarities and distinctions between objects;
- to classify objects;
- to recognize whether new objects belong to the available classes;
- to discover natural dependences between characteristics;
- to use these dependences (knowledge) for forecasting.
Specificity of Data Mining tasks:
• Polytypic attributes
• Number of attributes >> number of objects
• Presence of noise, outliers ("spikes") and missing values (blanks)
• Absence of information on the distributions
Situation in Data Mining
Thousands of algorithms. The reasons: types of scales, dependences between features, laws of
distribution, linear vs. nonlinear decision rules, small or large training sets, ...
How to build algorithms that are invariant to these features?
Which function is common to all DM algorithms?
The basic function used by a person in clustering, recognition, feature selection, etc. is the estimation of similarity between objects.
Measures of Similarity

S1(a, b) = 1 − ( Σ_{i=1..n} (x_i^a − x_i^b)² )^{1/2}
S2(a, b) = 1 − Σ_{i=1..n} |x_i^a − x_i^b|
S3(a, b) = 1 − max_i |x_i^a − x_i^b|
S4(a, b) = Σ_{i=1..n} min(x_i^a, x_i^b) / max(x_i^a, x_i^b)
S5(a, b) = e^{−Σ_{i=1..n} |x_i^a − x_i^b|}
....
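As an illustration, the first three measures are easy to sketch in Python (a minimal sketch, assuming objects are plain tuples of numeric features, ideally pre-scaled so the "1 −" forms stay near the [−1, +1] range):

```python
import math

def s_euclidean(a, b):
    # S1: one minus the Euclidean distance between feature vectors
    return 1 - math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def s_manhattan(a, b):
    # S2: one minus the city-block (Manhattan) distance
    return 1 - sum(abs(x - y) for x, y in zip(a, b))

def s_chebyshev(a, b):
    # S3: one minus the Chebyshev (maximum-coordinate) distance
    return 1 - max(abs(x - y) for x, y in zip(a, b))
```

Identical objects give similarity 1 under all three measures; the measures differ in how they aggregate per-coordinate differences.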
Similarity is not absolute, but a relative category
Is an object b similar to a, or is it not? Do the objects b and a belong to one class?
(Illustration: the pair a, b judged alone versus in the presence of a third object c.)
We should know the answer to the question: similar in competition with what?
Measure F(z,a|b) of similarity of the object z to the object a in competition with the object b.
Locality: F depends only on the distances (z,a) and (z,b).
Normality: if z = a, F(z,a|b) = +1; if z = b, F(z,a|b) = -1.
If (z,a) = (z,b), F(z,a|b) = F(z,b|a) = 0.
Invariance to translation and rotation of the coordinates.
Antisymmetry: F(z,a|b) = -F(z,b|a)
======================================
Symmetry: F(z,a|b) = F(z,b|a)
Triangle inequality: F(z,a|b) + F(a,b|z) ≥ F(b,z|a)
======================================
Competitive Space
Function of Concurrent (Rival) Similarity (FRiS)
(Figure: an object z lies between the competing standards A and B; r1 is the distance from z to A, r2 the distance from z to B; F runs from +1 at A to -1 at B.)

F(z, 1|2) = (r2 − r1) / (r2 + r1)
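The rival similarity defined above is straightforward to compute; a minimal sketch in Python, assuming Euclidean distance and tuples as objects:

```python
import math

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def fris(z, a, b):
    """Rival similarity of z to a in competition with b:
    F(z, a|b) = (r2 - r1) / (r2 + r1), r1 = d(z, a), r2 = d(z, b).
    Equals +1 when z coincides with a, -1 when z coincides with b,
    and 0 when z is equidistant from the two rivals."""
    r1, r2 = dist(z, a), dist(z, b)
    if r1 + r2 == 0:  # all three points coincide
        return 0.0
    return (r2 - r1) / (r2 + r1)
```

Note that the antisymmetry property F(z,a|b) = -F(z,b|a) holds by construction.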
DM methods that use the FRiS function allow improving old algorithms and solving some new tasks:
• Quantitative estimation of compactness
• Choice of informative attributes
• Construction of decision rules
• Censoring of the training set
• Generalized classification
• Filling of blanks (imputation)
• Forecasting
• Ordering of objects
All pattern recognition methods are based on the hypothesis of compactness (Braverman E.M., 1962). The patterns are compact if:
- the number of boundary points is small in comparison with their total number;
- the patterns are separated from each other by not too elaborate borders.

Compactness
For high compactness it is necessary to have:
- maximum similarity between objects of one pattern;
- minimum similarity between objects of different patterns.
Compact patterns should satisfy two conditions.

1. Maximal similarity between objects of the same pattern:

   D_i = (1/M_A) Σ_{j=1..M_A} F(j, i | b),   F(j, i | b) = (r2 − r1) / (r2 + r1) → max

2. Maximal difference of these objects from the objects of other patterns:

   T_i = (1/(M_A · M_B)) Σ_{i=1..M_A} Σ_{q=1..M_B} F(q, s | i),   F(q, s | i) = (r2 − r1) / (r2 + r1) → max

Each object i receives the compactness estimate C_i = (D_i + T_i) / 2, and for the patterns as a whole

   C_A = (1/M_A) Σ_{i=1..M_A} C_i,   C_B = (1/M_B) Σ_{q=1..M_B} C_q,   C* = C_A · C_B.
Algorithm FRiS-Stolp for selection of the standards ("stolps"): the standard is the object i with the maximal value of C_i = (D_i + T_i) / 2.
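A minimal sketch of this selection in Python. It is a simplification of FRiS-Stolp, not the authors' exact procedure: the rival distance is taken to the nearest object of the other class rather than to its stolps, and a single standard is chosen:

```python
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def fris(r1, r2):
    # rival similarity expressed through the two competing distances
    return (r2 - r1) / (r2 + r1) if r1 + r2 else 0.0

def select_stolp(own, rival):
    """Return the object of `own` maximizing C_i = (D_i + T_i) / 2.
    D_i: mean similarity of own-class objects to candidate i,
         in competition with their nearest rival-class object.
    T_i: mean similarity of rival-class objects to their nearest
         same-class neighbour, in competition with candidate i."""
    best, best_c = None, -2.0
    for i in own:
        d = sum(fris(dist(j, i), min(dist(j, q) for q in rival))
                for j in own) / len(own)
        t = sum(fris(min(dist(q, s) for s in rival if s is not q), dist(q, i))
                for q in rival) / len(rival)
        c = (d + t) / 2
        if c > best_c:
            best, best_c = i, c
    return best, best_c
```

For well-separated classes the winning C is close to +1; values near 0 signal overlapping, non-compact patterns.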
Decision rules. Recognition.

(Figures: decision rules and recognition results for different numbers of stolps: k = K, K+2, K+11, K+29.)
Censoring of the training set

Step | Compactness C | Recognized (of 90) | Stolps
1 | 0.8689 | 90 (90) | 20
2 | 0.8902 | 90 (90) | 20
3 | 0.9084 | 90 (90) | 20
4 | 0.9167 | 90 (90) | 20
5 | 0.8903 | 90 (90) | 20
6 | 0.7309 | 88 (90) | 9
7 | 0.2324 | 86 (90) | 7

The characteristics H_k = (C_k, m'/m, ..., M'/M) are compared with the control quality P; the censoring step is chosen as k* = arg max |r|(H, P) over k = 1, 2, ..., 7; here k* = 4 or 5.
Informativeness by Fisher (for the normal distribution):

   FI = |m_1 − m_2| / (σ_1² + σ_2²)^{1/2}

Compactness has the same sense and can be used as a criterion of informativeness that is invariant to the law of distribution and to the ratio N:M. Comparative studies have shown an appreciable advantage of this criterion over the number of errors in cross-validation.
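For reference, the Fisher criterion for a single attribute is a one-liner over the two class samples (a sketch; m and σ² are taken as the sample means and population variances):

```python
import math
import statistics

def fisher_informativeness(x1, x2):
    """Fisher criterion FI = |m1 - m2| / sqrt(s1^2 + s2^2) for one
    attribute measured on two samples x1 and x2."""
    m1, m2 = statistics.fmean(x1), statistics.fmean(x2)
    s1, s2 = statistics.pvariance(x1), statistics.pvariance(x2)
    return abs(m1 - m2) / math.sqrt(s1 + s2)
```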
Criteria of informativeness
Comparison of the criteria (CV - FRiS)
Order of attributes by informativeness: the two orderings give C = 0.661 and C = 0.883.

(Plot: criterion value vs. noise level 0.05-0.3 for the criteria Fs and U; N = 100, M = 2×100, mt = 2×35, mC = 2×65 + noise.)
Algorithm GRAD is based on a combination of two greedy approaches: forward and backward search. At the forward stage the algorithm Addition is used; at the backward stage the algorithm Deletion is used.

Algorithm AdDel. To ease the influence of accumulating errors, a relaxation method is applied:
n1 - number of the most informative attributes added to the subsystem (Addition);
n2 < n1 - number of the least informative attributes eliminated from the subsystem (Deletion).
Relaxation method: n steps forward, n/2 steps back.

Algorithm AdDel: reliability (R) of recognition in spaces of different dimension:
R(AdDel) > R(DelAd) > R(Ad) > R(Del)
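The Add/Delete relaxation can be sketched as follows. This is a simplified illustration, not the authors' exact procedure: `score` stands for any subset-quality criterion (e.g. FRiS compactness), and `n1`, `n2`, `target_size` are hypothetical parameters:

```python
def ad_del(features, score, n1=3, n2=1, target_size=4):
    """Grow a feature subset by n1 greedy additions, then drop the n2
    features whose removal hurts the criterion least (n2 < n1)."""
    selected = set()
    limit = min(target_size, len(features))
    while True:
        for _ in range(n1):  # Addition stage
            rest = [f for f in features if f not in selected]
            if not rest or len(selected) >= limit:
                break
            selected.add(max(rest, key=lambda f: score(selected | {f})))
        if len(selected) >= limit:
            return selected
        for _ in range(n2):  # Deletion stage: keep the best remainder
            selected.remove(max(selected, key=lambda f: score(selected - {f})))
```

With a purely additive score the procedure simply keeps the top-weighted features; its value shows with criteria where features interact, which is exactly where plain forward selection accumulates errors.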
Algorithm GRAD
• AdDel can work not only with single attributes, but also with groups of attributes (granules) of different capacity m = 1, 2, 3, ...
• The granules can be formed by exhaustive search.
• But: the problem of combinatorial explosion!
Solution: orientation on the individual informativeness of attributes.

(Plot: frequency f of an attribute entering an informative subsystem vs. its serial number L by individual informativeness.)

This allows granulating only the most informative part of the attributes.
Algorithm GRAD (Granulated AdDel)

1. Independent testing of N attributes; selection of the m1 << N first best (m1 granules of power 1).
2. Forming C(m1, 2) combinations; selection of the m2 << C(m1, 2) first best (m2 granules of power 2).
3. Forming C(m1, 3) combinations; selection of the m3 << C(m1, 3) first best (m3 granules of power 3).

M = ⟨m1, m2, m3⟩ - the set of secondary attributes (granules). AdDel(M) selects the m* << |M| best granules, which include n* attributes, e.g. X = ⟨x2, x6, x9, x25, ...⟩.
Value of FRiS for points on a plane

Classification (Algorithm FRiS-Class)

FRiS-Cluster divides the objects into clusters; FRiS-Tax unites the clusters into classes (taxons). Using the FRiS function allows:
- making taxons of any form;
- searching for the optimal number of taxons.

Examples of taxonomy by the FRiS-Class algorithm
Comparison the FRiS-Class with other algorithms of taxonomy
(Plot: quality of taxonomy vs. number of taxons K = 2...15 for FRiS-Cluster, FRiS-Tax, K-means, Forel and Scat.)
Taxonomic Decision Rule
Universal classification

Labeled → Semilabeled → Unlabeled
(Pattern Recognition) (partially labeled recognition) (Clustering)

Unlabeled → Semilabeled → Labeled
(Clustering) (partially labeled recognition) (Pattern Recognition)
=================================
FRiS-TDR
Some real DM tasks

Task | K | M | N
Medicine: Diagnostics of Diabetes II type | 3 | 43 | 5520
Medicine: Diagnostics of Prostate Cancer | 4 | 322 | 17153
Medicine: Recognition of type of Leukemia | 2 | 38 | 7129
Physics: Complex analysis of spectra | 7 | 20-400 | 1024
Commerce: Forecasting of book selling (Data Mining Cup 2009) | - | 4812 | 1862
Data Mining Cup 2009, http://www.prudsys.de/Service/Downloads/bin
Prognosis of data on an absolute scale

TRAINING: rows 1...2300, attribute columns 1...1856 plus 8 target columns; 84% of the values = 0.
CONTROL: rows 1...2418; 19344 cells (2418 × 8) are to be predicted.
DMC 2009
618 teams from 164 universities in 42 countries participated; 231 sent solutions, of which 49 were selected for the rating.
NN | Team | Errors | NN | Team | Errors
1 | Uni Karlsruhe TH_II | 17260 | 16 | TU Graz | 23626
2 | TU Dortmund | 17912 | 18 | Uni Weimar_I | 23796
3 | TU Dresden | 18163 | 19 | Zhejiang University of Sc. and Tech. | 23952
4 | Novosibirsk State University | 18353 | 20 | University Laval | 24884
5 | Uni Karlsruhe TH_I | 18763 | 24 | University of Southampton | 25694
6 | FH Brandenburg_I | 19814 | 25 | Telkom Institute of Technology | 25829
7 | FH Brandenburg_II | 20140 | 26 | University of Central Florida | 26254
8 | Hochschule Anhalt | 20767 | 32 | Indian Institute of Technology | 28517
9 | Uni Hamburg | 21064 | 34 | Anna University Coimbatore | 28670
10 | KTH Royal Institute of Technology | 21195 | 38 | Technical University of Kosice | 32841
11 | RWTH Aachen_I | 21780 | 39 | University of Edinburgh | 45096
14 | Budapest University of Technology | 23277 | 48 | Warsaw School of Economics | 77551
15 | Isfahan University of Technology | 23488 | 49 | FH Hannover | 1938612
Comparison with 10 methods of feature selection

Jeffery I., Higgins D., Culhane A. Comparison and evaluation of methods for generating differentially expressed gene lists from microarray data. BMC Bioinformatics, 2006, 7:359. http://www.biomedcentral.com/1471-2105/7/359

9 tasks on microarray data; 10 methods of feature selection; independent attributes; selection of the n first (best) attributes. Criterion: minimum of errors on cross-validation, 10 times by 50%.

4 decision rules: Support Vector Machine (SVM), Between Group Analysis (BGA), Naive Bayes Classification (NBC), K-Nearest Neighbors (KNN).

40 decisions for each of the 9 tasks.
Methods of selection

Method | Results
Significance analysis of microarrays (SAM) | 42
Analysis of variance (ANOVA) | 43
Empirical Bayes t-statistic | 32
Template matching | 38
maxT | 37
Between group analysis (BGA) | 43
Area under the receiver operating characteristic curve (ROC) | 37
Welch t-statistic | 39
Fold change | 47
Rank products | 42
FRiS-GRAD | 12

Empirical Bayes t-statistic is best for a middle-sized set of objects; area under a ROC curve for small noise and a large set; rank products for large noise and a small set.
Results on tasks

Task | N | m1/m2 | max of 4 | GRAD
ALL1 | 12625 | 95/33 | 100.0 | 100.0
ALL2 | 12625 | 24/101 | 78.2 | 80.8
ALL3 | 12625 | 65/35 | 59.1 | 73.8
ALL4 | 12625 | 26/67 | 82.1 | 83.9
Prostate | 12625 | 50/53 | 90.2 | 93.1
Myeloma | 12625 | 36/137 | 82.9 | 81.4
ALL/AML | 7129 | 47/25 | 95.9 | 100.0
DLBCL | 7129 | 58/19 | 94.3 | 89.8
Colon | 2000 | 22/40 | 88.6 | 89.5
Recognition of two types of Leukemia - ALL and AML
ALL AMLTraining set 38 27 11 N = 7129Control set 34 20 14
I. Guyon, J. Weston, S. Barnhill, V. Vapnik. Gene Selection for Cancer Classification using Support Vector Machines. Machine Learning, 2002, 46(1-3): 389-422.

Training set 38, test set 34:

Ng | Vsuc | Vext | Vmed | Tsuc | Text | Tmed | P
7129 | 0.95 | 0.01 | 0.42 | 0.85 | -0.05 | 0.42 | 29
4096 | 0.82 | -0.67 | 0.30 | 0.71 | -0.77 | 0.34 | 24
2048 | 0.97 | 0.00 | 0.51 | 0.85 | -0.21 | 0.41 | 29
1024 | 1.00 | 0.41 | 0.66 | 0.94 | -0.02 | 0.47 | 32
512 | 0.97 | 0.20 | 0.79 | 0.88 | 0.01 | 0.51 | 30
256 | 1.00 | 0.59 | 0.79 | 0.94 | 0.07 | 0.62 | 32
128 | 1.00 | 0.56 | 0.80 | 0.97 | -0.03 | 0.46 | 33
64 | 1.00 | 0.45 | 0.76 | 0.94 | 0.11 | 0.51 | 32
32 | 1.00 | 0.45 | 0.65 | 0.97 | 0.00 | 0.39 | 33
16 | 1.00 | 0.25 | 0.66 | 1.00 | 0.03 | 0.38 | 34
8 | 1.00 | 0.21 | 0.66 | 1.00 | 0.05 | 0.49 | 34
4 | 0.97 | 0.01 | 0.49 | 0.91 | -0.08 | 0.45 | 31
2 | 0.97 | -0.02 | 0.42 | 0.88 | -0.23 | 0.44 | 30
1 | 0.92 | -0.19 | 0.45 | 0.79 | -0.27 | 0.23 | 27

Pentium, T = 3 hours.
FRiS | Decision rule (gene/weight) | P
0.72656 | 537/1, 1833/1, 2641/2, 4049/2 | 34
0.71373 | 1454/1, 2641/1, 4049/1 | 34
0.71208 | 2641/1, 3264/1, 4049/1 | 34
0.71077 | 435/1, 2641/2, 4049/2, 6800/1 | 34
0.70993 | 2266/1, 2641/2, 4049/2 | 34
0.70973 | 2266/1, 2641/2, 2724/1, 4049/2 | 34
0.70711 | 2266/1, 2641/2, 3264/1, 4049/2 | 34
0.70574 | 2641/2, 3264/1, 4049/2, 4446/1 | 34
0.70532 | 435/1, 2641/2, 2895/1, 4049/2 | 34
0.70243 | 2641/2, 2724/1, 3862/1, 4049/2 | 34

Shorter rules (gene name/weight): 2641/1, 4049/1 | 33; 2641/1 | 32.

In the first 27 subspaces P = 34/34.

Pentium, T = 15 sec.
Comparison: I. Guyon, J. Weston, S. Barnhill, V. Vapnik (SVM) vs. Zagoruiko N., Borisova I., Dyubanov V., Kutnenko O. (FRiS)

Best features | SVM | FRiS
FRE 803, 4846 | 30 (88%) | 33 (97%)
4846 | 27 (79%) | 30 (88%)
Projection of the training set onto features 2641 and 4049 (the AML and ALL classes separate).
Diabetes of type II: ordering of patients

M = 43 (17 + 8 + 18), N = 5520

• The average similarity Fav of each person to the healthy people is computed; on the scale from F = +1 (healthy) to F = -1 (ill), the objects order themselves as Healthy, Group of risk, Patients.
• The group of risk did not participate in training.
• This is useful for early diagnostics of diseases and for monitoring the process of treatment.
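A minimal sketch of such an ordering score. The exact construction used on the slide is not given, so this is an assumption: Fav is taken as the mean rival similarity of an object to the healthy group, with the object's nearest ill neighbour as the competitor:

```python
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_rival_similarity(z, healthy, ill):
    """Fav of object z: mean FRiS of z to each healthy object, each in
    competition with z's nearest ill object. Values near +1 place z
    among the healthy, near -1 among the patients; intermediate values
    suggest a risk group."""
    r2 = min(dist(z, q) for q in ill)  # distance to the nearest rival
    vals = [(r2 - dist(z, h)) / (r2 + dist(z, h)) if r2 + dist(z, h) else 0.0
            for h in healthy]
    return sum(vals) / len(vals)
```

Sorting all examined persons by this score yields exactly the kind of ordering shown on the slide, with the risk group falling between the two ends of the scale.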
DM methods that use the FRiS function allow improving old algorithms and solving some new tasks:
• Quantitative estimation of compactness
• Choice of informative attributes
• Construction of decision rules
• Censoring of the training set
• Generalized classification
• Filling of blanks (imputation)
• Forecasting
• Ordering of objects
Unsettled problems
• Stolp + corridor (FRiS + LDR)
• Imputation of polytypical tables
• Uniting tasks of different types (UC + X)
• Optimization of algorithms
• Realization of the program system (OTEX 2)
• Applications (medicine, genetics, ...)
• ...
Conclusion
The FRiS function:
1. Provides an effective measure of similarity, informativeness and compactness.
2. Provides unification of methods and invariance to the parameters of tasks, the law of distribution, and the ratio M:N.
3. Provides high enough quality of decisions.
Publications:
http://math.nsc.ru/~wwwzag
Thank you!
• Questions, please?