
W07 The Discovery Challenge on Thrombosis Data

Data Provider: Katsuhiko TAKABAYASHI, MD, Chiba University Hospital, Japan

Antiphospholipid Antibody Syndrome (APS)

Anti-cardiolipin antibodies (aCL), lupus anticoagulant (LAC)

Induces thrombotic events (such as AMI, stroke, deep venous thrombosis, miscarriage, pulmonary hypertension, etc.)

Sometimes positive in other collagen diseases (lupus, Sjögren's syndrome)

[Figure: coagulation cascade in collagen disease: factor X → Xa, prothrombin → thrombin, fibrinogen → fibrin, factor XIII → XIIIa, with phospholipid (PL), Ca++ and Mg++, and Protein C]


Collagen Diseases

Autoimmune disease, rheumatic disease, connective tissue disease

Thrombosis: vessel occlusion by blood clots (myocardial infarction, stroke, etc.)




Who among patients with APS develops thrombosis?

When?

Which laboratory data are related to thrombosis, as well as anti-cardiolipin antibodies?

The Goal of this trial (1)

To test whether a data mining technique can properly identify, from the many variables we provided, the key factors (aCL, LAC, PT, APTT) that are already known to be related to thrombosis.

Assessment of validity of each study

The Goal of this trial (2)

The Results to expect

1) To identify high-risk patients who have had no thrombosis so far.

2) To predict the time of thrombosis, or to detect changes in some variables during the course of thrombosis, from the temporal data series.

Evaluation of the results

Common-sense results (positive control)

Probable results

Possible results

Unclear results, difficult to evaluate

Nonsense results (negative control)

(From the current medical point of view)

We cannot judge what we do not know!

The study most of whose results agree well with current knowledge

The study most of whose results agree poorly with current knowledge

Domain researchers cannot believe the rest of the unclear results!

Assessment in the domain field

Domain researchers cannot say that the other, unclear results are also true.

Medical  Data Set

The data set covers 1,241 patients with collagen diseases; 7 basic laboratory data items for aCL were provided from 806 cases.

The temporal laboratory data comprise 57,543 tests of 41 items in total, collected over 17 years.

Seventy-six cases had thrombotic events in their clinical course.

It can predict patients' health state from special examinations (spe-exams) and laboratory examinations (lab-exams) in 99.28% of cases.

CNS lupus is related to the anti-DNA antibody level and IgM-type aCL.

aCL IgM and anti-DNA antibody levels are independently related to future thrombosis.

Evaluations from medical aspects

Coursac I et al.: The bridge theory; genetic programming

Many rules with 100% confidence were found, but most of them were not useful.

Rule: aCL > 2.4 and aCL IgM in the range 1.9 to 2.7 and KCT (-) → SLE.

Rule: sex = M and ANA = 0 → Behcet.

We would like to look at the other rules not shown here to find attractive ones.
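A rule with 100% confidence can still be useless if it covers only a handful of patients. The sketch below shows how support and confidence are computed for a rule shaped like the SLE rule above. The record fields (`acl`, `acl_igm`, `kct_negative`, `diagnosis`) and the four patients are hypothetical toy values, not the challenge data.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # Hypothetical fields for illustration only.
    acl: float
    acl_igm: float
    kct_negative: bool
    diagnosis: str


def antecedent(p: Patient) -> bool:
    """Left-hand side: aCL > 2.4 and 1.9 <= aCL IgM <= 2.7 and KCT (-)."""
    return p.acl > 2.4 and 1.9 <= p.acl_igm <= 2.7 and p.kct_negative


def support_and_confidence(patients, lhs, rhs_label):
    """Return (support, confidence, number of patients covered by lhs)."""
    covered = [p for p in patients if lhs(p)]
    hits = [p for p in covered if p.diagnosis == rhs_label]
    support = len(hits) / len(patients) if patients else 0.0
    confidence = len(hits) / len(covered) if covered else 0.0
    return support, confidence, len(covered)


# Made-up toy records.
patients = [
    Patient(3.0, 2.0, True, "SLE"),
    Patient(2.5, 2.6, True, "SLE"),
    Patient(1.0, 2.0, True, "APS"),
    Patient(3.1, 3.5, True, "Behcet"),
]
s, c, n = support_and_confidence(patients, antecedent, "SLE")
print(f"covered={n} support={s:.2f} confidence={c:.2f}")
```

Here the rule reaches 100% confidence while covering only two patients. This is why confidence alone says little about whether a rule is medically useful.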

Evaluations from medical aspects

Boulicaut et al.: δ-strong classification rules

LAC, ANA, U-pro, centromere-type, SSA, SSB, RNP, SM and Scl-70 were strong contributors to predicting the presence of thrombosis.

This suggests other possible causes of thrombosis without aCL antibodies.
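As a rough illustration, assume a rule X → c counts as δ-strong when its antecedent X occurs in at least γ records and has at most δ exceptions (records containing X but not c). Under that assumption, a brute-force sketch looks like this. Boulicaut et al.'s actual algorithm is more efficient and is not reproduced here. The function name, item labels and records are hypothetical.

```python
from itertools import combinations


def delta_strong_rules(records, target, gamma, delta, max_len=2):
    """Brute-force search for rules X -> target where X occurs in at least
    gamma records and at most delta of those records lack the target."""
    items = sorted({i for r in records for i in r if i != target})
    rules = []
    for k in range(1, max_len + 1):
        for lhs in combinations(items, k):
            lhs_set = set(lhs)
            covered = [r for r in records if lhs_set <= r]
            if len(covered) < gamma:
                continue  # antecedent not frequent enough
            exceptions = sum(1 for r in covered if target not in r)
            if exceptions <= delta:
                rules.append((lhs, len(covered), exceptions))
    return rules


# Made-up records: each is the set of findings for one patient.
records = [
    {"LAC+", "ANA+", "thrombosis"},
    {"LAC+", "thrombosis"},
    {"LAC+", "ANA+", "thrombosis"},
    {"ANA+"},
    {"LAC+", "ANA+"},
]
for lhs, freq, exc in delta_strong_rules(records, "thrombosis", gamma=3, delta=1):
    print(lhs, "-> thrombosis", f"freq={freq} exceptions={exc}")
```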

Evaluations from medical aspects

Jensen S et al.: CRISP-DM (Cross-Industry Standard Process for Data Mining)

Sequential analysis of the temporal data did not show interesting results.

It might be difficult to predict the time of thrombosis. One possibility is that the data were modified by treatment or prophylaxis.
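One simple way to look for changes before a thrombotic event is to compare one lab item's values in a window just before the event with its earlier baseline. This is a minimal sketch, not Jensen et al.'s method. It assumes each item is stored as (day, value) pairs. The window length, the function name `window_means` and the APTT values are all hypothetical. If treatment changed during the window, the comparison reflects the treatment as much as the disease.

```python
from statistics import mean


def window_means(series, event_day, window=90):
    """Split (day, value) measurements into a baseline part (before
    event_day - window) and a pre-event part ([event_day - window, event_day)).
    Return the mean of each part, or None for an empty part."""
    cutoff = event_day - window
    baseline = [v for d, v in series if d < cutoff]
    pre_event = [v for d, v in series if cutoff <= d < event_day]

    def safe_mean(xs):
        return mean(xs) if xs else None

    return safe_mean(baseline), safe_mean(pre_event)


# Made-up APTT measurements (day, seconds) for one patient.
aptt = [(0, 35.0), (60, 36.0), (200, 44.0), (250, 46.0), (320, 50.0)]
print(window_means(aptt, event_day=300, window=120))
```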

Evaluations from medical aspects

Jensen S et al

Bias by physicians

Modification of treatment

Selection of the cases, laboratory data

They determined a discriminant function that separates occurrences of thrombosis with very few false negatives.

However, is it possible to translate the meaning of this function and make it understandable to us?
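The source does not say which discriminant Jensen et al. used. As an illustration, the sketch below uses Fisher's linear discriminant on two hypothetical features (say, an aCL titre and APTT). It then sets the decision threshold at the lowest score of any thrombosis case, so that no training case is a false negative. All data and function names are made up. On new patients, false negatives can still occur, and a threshold pushed this low usually increases false positives.

```python
def fisher_lda_2d(pos, neg):
    """Fisher's linear discriminant for two features: w = Sw^-1 (mu_pos - mu_neg)."""
    def mean_vec(xs):
        n = len(xs)
        return (sum(x[0] for x in xs) / n, sum(x[1] for x in xs) / n)

    mp, mn = mean_vec(pos), mean_vec(neg)
    # Within-class scatter matrix Sw = [[s00, s01], [s01, s11]].
    s00 = s01 = s11 = 0.0
    for xs, mu in ((pos, mp), (neg, mn)):
        for x in xs:
            d0, d1 = x[0] - mu[0], x[1] - mu[1]
            s00 += d0 * d0
            s01 += d0 * d1
            s11 += d1 * d1
    det = s00 * s11 - s01 * s01
    dm0, dm1 = mp[0] - mn[0], mp[1] - mn[1]
    # Multiply (mu_pos - mu_neg) by the inverse of the 2x2 matrix Sw.
    w0 = (s11 * dm0 - s01 * dm1) / det
    w1 = (-s01 * dm0 + s00 * dm1) / det
    return (w0, w1)


def score(w, x):
    return w[0] * x[0] + w[1] * x[1]


def zero_fn_threshold(w, pos):
    """Lowest threshold that still flags every positive training case."""
    return min(score(w, x) for x in pos)


# Made-up (feature1, feature2) values for thrombosis (+) and non-thrombosis (-) cases.
pos = [(3.0, 45.0), (4.0, 50.0), (3.5, 48.0)]
neg = [(1.0, 35.0), (1.5, 33.0), (2.0, 38.0), (1.2, 36.0)]
w = fisher_lda_2d(pos, neg)
t = zero_fn_threshold(w, pos)
fn = sum(score(w, x) < t for x in pos)
fp = sum(score(w, x) >= t for x in neg)
print(f"w={w}, threshold={t:.3f}, FN={fn}, FP={fp}")
```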


Evaluations from medical aspects

Werner J and Fogarty T: genetic programming

When the Results Go Beyond the Experts' Knowledge

Complicated relations might be difficult to explain. No trials have examined drug relations involving three items.

Results obtained through a black box might be ignored by experts simply because they cannot understand them!

Evaluations from medical aspects

Zytkow J and Gupta S: SQL; cross contingency classification

Reasonable results, as with InfoZoom. ANA pattern analysis. Patients with severe attacks are more likely to have other attacks. Thrombosis is related to the level of aCLs. Alveolar hemorrhage and CNS attacks are not associated with milder attacks.
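Contingency-table analysis usually comes down to testing whether two classifications are associated. The exact statistic used by Zytkow and Gupta is not given here. This minimal sketch computes the standard Pearson chi-square statistic, without continuity correction, for a 2x2 table of a hypothetical "aCL high/low" finding against thrombosis. The counts are made up. The 3.841 cut-off is the 5% critical value for 1 degree of freedom.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table
                  thrombosis+  thrombosis-
        aCL high      a            b
        aCL low       c            d
    """
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Made-up counts for illustration only.
chi2 = chi_square_2x2(20, 30, 10, 140)
print(f"chi2={chi2:.2f}, associated at 5% level: {chi2 > 3.841}")
```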

Evaluations from medical aspects

Beilken and Spenke (InfoZoom): the user-friendly interface makes their test results easy to understand. They were able to choose reasonable and interesting rules.

Levin: WizWhy produced 7,356 rules. Complicated rules are difficult to comment on because of their complexity.

Taylor: in the temporal data, missing values disturbed the analysis. Only common-sense findings were selected.
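How missing values in temporal lab data are handled is a choice that domain researchers should make explicit. One common option is last observation carried forward (LOCF), limited to short gaps. This is a minimal sketch, assuming one value per regular visit, with `None` marking a missing test. The function name and the `max_carry` limit are hypothetical. Carrying values forward can hide real changes, for example after treatment is modified.

```python
from typing import Optional


def locf(values: list[Optional[float]], max_carry: int = 2) -> list[Optional[float]]:
    """Carry the last observed value forward for at most max_carry
    consecutive missing visits; longer gaps are left as None."""
    out: list[Optional[float]] = []
    last: Optional[float] = None
    gap = 0
    for v in values:
        if v is not None:
            last, gap = v, 0
            out.append(v)
        else:
            gap += 1
            out.append(last if last is not None and gap <= max_carry else None)
    return out


print(locf([None, 1.0, None, None, None, 2.0, None], max_carry=2))
```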


To obtain good results efficiently

(1) Cleaning of data

Preprocessing of the data by the domain researchers who are concerned with the database is essential to minimize noise.

Definition, classification, adjustment, etc. Recognition of modifications caused by treatment or prophylaxis. Indication of how to treat missing data.

To obtain good results efficiently

(2) Introduction of the domain knowledge

To involve as much medical knowledge as possible with the data set from the beginning.

To cooperate with domain researchers to obtain domain knowledge during data mining.

Causal Relation

Misjudgment of temporal meaning

Bacteria invade → Pneumonia occurs

Bacteria have invaded → Pneumonia occurs

Bacteria will invade → Pneumonia occurs

Backward and non-objective relationships

To obtain good results efficiently

(3) Cooperation with domain researchers

An interactive technique will avoid the user's dissatisfaction with a black box and help steer the analysis in the right direction.

The hypothetico-deductive method will be easily accepted by physicians.

Causal Relation

Misjudgment of temporal meaning

It rains → The road is wet.

It rains → The road is wet.

It will rain → The road is wet.

Backward and non-objective relationships

Data mining

Retrospective approach: the data were not arranged for analysis and contain much noise.

Data: a more genuine and adequate data set must be prepared. Terms, definitions and background must be introduced beforehand.

Rules: complicated rules (relations among more than three items) found by this analysis can neither be explained nor proved true from a medical standpoint.

There have been no clinical trials of three drugs in combination.
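When a rule generator produces thousands of rules, one pragmatic step is to keep only the rules small enough to be checked medically. Based on the remark above, this means rules relating at most three items. This is a minimal sketch, assuming each rule is stored as an antecedent tuple plus a single consequent. The `Rule` type, the `explainable` function and the example rules are hypothetical and are not WizWhy output.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    antecedent: tuple[str, ...]  # conditions on the left-hand side
    consequent: str              # predicted item on the right-hand side


def explainable(rules, max_items=3):
    """Keep rules that relate at most max_items items in total
    (antecedent conditions plus the consequent)."""
    return [r for r in rules if len(r.antecedent) + 1 <= max_items]


# Illustrative rules only.
rules = [
    Rule(("LAC+",), "thrombosis"),
    Rule(("aCL IgM high", "anti-DNA high"), "thrombosis"),
    Rule(("LAC+", "ANA+", "U-pro+"), "thrombosis"),
]
for r in explainable(rules):
    print(" and ".join(r.antecedent), "->", r.consequent)
```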


[Figure: sources of bias (by physicians: modification of treatment, selection of the cases and laboratory data; by accident) and change of the disease before and after the events (thrombosis)]
