Artificial Immune based Approach to Association Rule Mining

By: B. Hoda Helmi
Supervisor: Adel T. Rahmani
January 2008

A Thesis Submitted in Partial Fulfillment of the Requirement for the Degree of Master of Science in Artificial Intelligence-Computer Engineering

1

Outline

• The Immune System: Natural and Artificial
• Association Rules
• Web Usage Mining
• Proposed Algorithm: AISWUM
• Results and Conclusion

2

Natural Immune System

Immune System

• A system that protects the body from foreign substances and pathogenic organisms.

Antibody

• The immune system creates antibodies which match the antigens and cause the pathogens to be destroyed

Antigen

• Substances capable of starting a specific immune response are referred to as antigens (viruses, bacteria, fungi).

3

A High Level Overview

4

Natural Immune System

Immunity

• Innate
• Danger Theory
• Adaptive
• Clonal Selection
• Network Theory
• Affinity Maturation
• Hypermutation

5

Innate versus Adaptive IS

Innate: immediately available for combat

6

Adaptive Immunity

epitope

Low affinity

receptor

structurally similar – high affinity

7

Clonal Selection & Affinity Maturation

8

Network Theory

[Figure: network diagram with nodes 1, 2, 3 and an antigen (Ag)]

Stimulation (Positive Response)

Suppression (Negative Response)

Idiotypic network (Jerne, 1974):
• B cells stimulate each other.
• Creates an immunological memory.

9

Danger Theory

10

Artificial Immune System

Algorithms

Affinity

Representation

Application

Solution

AIS

A Framework for AIS

11

Association Rules

• Set of items: $I = \{I_1, I_2, \ldots, I_m\}$
• Transactions: $D = \{t_1, t_2, \ldots, t_n\}$, $t_j \subseteq I$
• Itemset: $\{I_{i1}, I_{i2}, \ldots, I_{ik}\} \subseteq I$
• Large (frequent) itemset: an itemset whose number of occurrences is above a threshold.
• Support of an itemset: percentage of transactions which contain that itemset.
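The support definition can be sketched in a few lines of Python. This is only an illustration: the toy database `D` and the function name `support` are hypothetical, and the value is returned as a fraction rather than a percentage.

```python
from typing import FrozenSet, Iterable, List


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    items = frozenset(itemset)
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


# Hypothetical toy database D: four transactions over the items a, b, c.
D = [frozenset("ab"), frozenset("abc"), frozenset("ac"), frozenset("b")]
print(support({"a", "b"}, D))  # 2 of the 4 transactions contain both a and b
```

An itemset is then "large" when this value reaches the chosen threshold.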

12

Given:
• a set of items $I = \{I_1, I_2, \ldots, I_m\}$,
• a database of transactions $D = \{t_1, t_2, \ldots, t_n\}$ where $t_i = \{I_{i1}, I_{i2}, \ldots, I_{ik}\}$ and $I_{ij} \in I$,

the Association Rule Problem is to identify all association rules $X \Rightarrow Y$ with a minimum support and confidence.

13

Association Rule Mining Steps

Find Frequent Itemsets.

Generate rules from frequent itemsets.

Challenging Step In Association Rule Mining
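The two steps can be sketched as a brute-force Python example. It is not the thesis algorithm: it enumerates every candidate itemset, which is only feasible for tiny inputs, and it uses the standard definition of confidence, $conf(X \Rightarrow Y) = supp(X \cup Y) / supp(X)$, which the slides do not spell out. The names `frequent_itemsets`, `generate_rules` and the toy database `D` are hypothetical.

```python
from itertools import combinations


def support(items, transactions):
    """Fraction of transactions containing every item in `items`."""
    return sum(1 for t in transactions if items <= t) / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Step 1: enumerate all itemsets and keep those meeting min_support."""
    universe = sorted(set().union(*transactions))
    frequent = {}
    for size in range(1, len(universe) + 1):
        for combo in combinations(universe, size):
            candidate = frozenset(combo)
            s = support(candidate, transactions)
            if s >= min_support:
                frequent[candidate] = s
    return frequent


def generate_rules(frequent, min_confidence):
    """Step 2: split each frequent itemset into X => Y and keep confident rules."""
    rules = []
    for itemset, supp in frequent.items():
        for size in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), size):
                x = frozenset(lhs)
                # Every subset of a frequent itemset is frequent, so x is in the dict.
                confidence = supp / frequent[x]
                if confidence >= min_confidence:
                    rules.append((x, itemset - x, supp, confidence))
    return rules


# Hypothetical toy database.
D = [frozenset("ab"), frozenset("abc"), frozenset("ac"), frozenset("b")]
frequent = frequent_itemsets(D, min_support=0.5)
rules = generate_rules(frequent, min_confidence=0.8)
print(rules)
```

The exponential enumeration in step 1 is what makes frequent itemset discovery the expensive part; step 2 only revisits itemsets that are already known to be frequent.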

14

Goal

In this project our goal is to find all the frequent itemsets in Web usage data using an artificial immune system.

15

Web Usage Mining

Web usage mining is also known as Web log mining.

It applies mining techniques to discover interesting usage patterns from the secondary data derived from the interactions of users while surfing the Web.

16

Web Usage Mining

Applications

• Target potential customers for electronic commerce
• Enhance the quality and delivery of Internet information services to the end user
• Improve Web server system performance
• Identify potential prime advertisement locations
• Facilitate personalization/adaptive sites
• Improve site design
• Fraud/intrusion detection
• Predict user's actions (allows prefetching)

17

Motivations (for choosing this application)

Web

Unstable

Noisy

Enormous

Distributed Data

18

WUM-Definitions

Web Logs

• The set of all accesses to the URLs of a Web site, stored on the Web server.

Session

• A sequence of URLs accessed by a user in one visit to the Web site. (Itemset)

Strong trend

• Crowded paths that are frequently traversed by users. (Frequent Itemsets)

19

Web Log

O:0000002560 || T:1997/09/12-22:43:00 || U:/ || R:http://www.hyperreal.org/

O:0000002560 || T:1997/09/12-22:50:27 || U:/categories/software/ || R:http://www.hyperreal.org/music/machines/

O:0000002560 || T:1997/09/12-22:50:38 || U:/categories/software/Windows/ || R:http://www.hyperreal.org/music/machines/categories/software/

O:0000002560 || T:1997/09/12-22:50:47 || U:/categories/software/Windows/V909V03.TXT || R:http://www.hyperreal.org/music/machines/categories/software/Windows/

O:0000002560 || T:1997/09/12-22:51:06 || U:/categories/software/Windows/ || R:http://www.hyperreal.org/music/machines/categories/software/
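A record in this format can be split into its fields with a short Python sketch. The function name `parse_log_line` is hypothetical, and the reading of the keys (`T` as a timestamp, `U` as the requested URL, `R` as the referrer, `O` as an originator identifier) is an assumption based on the sample values, not something the slides state.

```python
from datetime import datetime


def parse_log_line(line: str) -> dict:
    """Split one 'KEY:value || KEY:value' record into a dict keyed by O, T, U, R."""
    record = {}
    for field in line.split("||"):
        # partition() splits at the first colon only, so URLs keep their own colons.
        key, _, value = field.strip().partition(":")
        record[key] = value.strip()
    # Assumed timestamp layout, inferred from the sample: YYYY/MM/DD-HH:MM:SS.
    record["T"] = datetime.strptime(record["T"], "%Y/%m/%d-%H:%M:%S")
    return record


line = ("O:0000002560 || T:1997/09/12-22:50:27 || U:/categories/software/ "
        "|| R:http://www.hyperreal.org/music/machines/")
rec = parse_log_line(line)
print(rec["U"], rec["T"])
```

Grouping parsed records by `O` and ordering them by `T` would then give the raw material for session construction.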

20

Session Construction

| URL | IDX |
|---|---|
| / | 0 |
| /categories/software/ | 1 |
| /categories/software/Windows/ | 2 |
| /categories/software/Windows/V909V03.TXT | 3 |
| /categories | 4 |
| /manufacturers | 5 |
| /samples.html/ | 6 |
| /gearlists/ | 7 |
| /features/ | 8 |
| /ecards/ | 9 |

Binary: 1 1 1 1 0 0 0 0 0 0

Duration: 07:27 00:11 02:10 00:19 02:01 00:00 00:00 00:00 00:00 00:00

Frequency: 1 1 2 1 0 0 0 0 0 0

$$\text{Frequency}(Page) = \frac{\text{NumberOfVisits}(Page)}{\sum_{Page \in \text{VisitedPages}} \text{NumberOfVisits}(Page)}$$

$$\text{Duration}(Page) = \frac{\text{TotalDuration}(Page) / \text{Length}(Page)}{\max_{page \in \text{VisitedPages}} \left( \text{TotalDuration}(page) / \text{Length}(page) \right)}$$
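The two normalizations on this slide, visits divided by the session's total visits and time per unit of page length divided by the session maximum, can be sketched in Python. The visit counts and times are read off the example session; the page lengths are not given on the slide, so they are set to 1 here as a labeled assumption, and the function names are hypothetical.

```python
def page_frequency(visits):
    """Frequency(Page) = visits to the page / total visits in the session."""
    total = sum(visits.values())
    return {page: n / total for page, n in visits.items()}


def page_duration(total_seconds, length):
    """Duration(Page) = time per unit of page length, scaled by the session maximum."""
    rate = {page: total_seconds[page] / length[page] for page in total_seconds}
    top = max(rate.values())
    return {page: r / top for page, r in rate.items()}


# Visited pages IDX 0..3 of the example session.
visits = {0: 1, 1: 1, 2: 2, 3: 1}
seconds = {0: 447, 1: 11, 2: 130, 3: 19}   # 07:27, 00:11, 02:10, 00:19
length = {page: 1 for page in visits}       # hypothetical: page lengths are not given

frequency = page_frequency(visits)
duration = page_duration(seconds, length)
print(frequency[2], duration[0])
```

Both measures fall in [0, 1], which is what lets them be combined later into a single interest value.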

21

Representation

Antibody (strong trends): URL1 (0/1), URL2 (0/1), ..., URLm (0/1)

Antigen (incoming sessions): URL1 (0/1), URL2 (0/1), ..., URLm (0/1)

Antibody features
• Age
• Stimulation Level
• Scale

Antigen features
• Validity

22

Scenario

• An antigen enters the body.
• Determine whether the first signal is produced. (Two signals are needed for an antigen to trigger the AIS; the first signal is produced if the antigen is harmful to the body.)
• If the first signal is produced, present the antigen to the antibodies and compute distance, weight and influence zone.
• Determine the antibody with maximum weight. If maximum weight > threshold, compute the stimulation level (SL) and influence zone (IZ) for that antibody; else create a new antibody by duplication.
• Clone and mutate.

23

Danger Signal

Danger Theory (two-signal approach): if an antigen is harmful, trigger an IS response; else discard the antigen.

In the data mining context: harmful = interesting (valid).

What is the danger signal in our system?
◦ We should find a measure to determine the validity of sessions.

24

Validity Measure

$$\text{Consistency}(Session) = \frac{\sum_{k=1}^{P-1} \sum_{j=k+1}^{P} \text{similarity}(k, j)}{\frac{1}{2} P (P - 1)}$$

$$\text{similarity}(i, j) = 1 - \frac{d_{i,j}}{D}$$

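$$\text{Duration}(Page) = \frac{\text{TotalDuration}(Page) / \text{Length}(Page)}{\max_{page \in \text{VisitedPages}} \left( \text{TotalDuration}(page) / \text{Length}(page) \right)}$$

$$\text{Frequency}(Page) = \frac{\text{NumberOfVisits}(Page)}{\sum_{Page \in \text{VisitedPages}} \text{NumberOfVisits}(Page)}$$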

25

Validity Measure

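$$\text{Interest}(Page) = \frac{2 \cdot \text{Frequency}(Page) \cdot \text{Duration}(Page)}{\text{Frequency}(Page) + \text{Duration}(Page)}$$

$$\text{Interest}(Session) = \frac{\sum_{i=1}^{P} w_{p_i}}{P}$$

$$\text{Validity}(Session) = \frac{2 \cdot \text{Interest}(Session) \cdot \text{Consistency}(Session)}{\text{Interest}(Session) + \text{Consistency}(Session)}$$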
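A minimal Python sketch of the validity computation follows, under stated assumptions: page interest is the harmonic mean of frequency and duration, session interest is the mean page interest over the $P$ visited pages (taking $w_{p_i}$ to be the interest of page $p_i$), consistency is the mean pairwise similarity over the $P(P-1)/2$ page pairs, and validity is the harmonic mean of session interest and consistency. The pairwise `similarity` function is passed in by the caller because its exact form is not assumed here; the sample pages and all function names are hypothetical.

```python
from itertools import combinations


def harmonic(a, b):
    """Harmonic mean of two non-negative values (0 when both are 0)."""
    return 0.0 if a + b == 0 else 2 * a * b / (a + b)


def session_interest(frequency, duration):
    """Mean page interest over the P visited pages."""
    pages = list(frequency)
    return sum(harmonic(frequency[p], duration[p]) for p in pages) / len(pages)


def session_consistency(pages, similarity):
    """Mean similarity over all P(P-1)/2 unordered page pairs."""
    pairs = list(combinations(pages, 2))
    if not pairs:
        return 1.0  # assumption: a single-page session is treated as consistent
    return sum(similarity(a, b) for a, b in pairs) / len(pairs)


def session_validity(frequency, duration, similarity):
    interest = session_interest(frequency, duration)
    consistency = session_consistency(list(frequency), similarity)
    return harmonic(interest, consistency)


# Hypothetical two-page session with a constant similarity of 1.
freq = {"/a/x": 0.5, "/a/y": 0.5}
dur = {"/a/x": 1.0, "/a/y": 0.5}
validity = session_validity(freq, dur, lambda a, b: 1.0)
print(validity)
```

Sessions whose validity falls below a threshold would then be discarded before reaching the immune network, which is the role the danger signal plays in this design.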

26

Affinity Measure

What affinity measure is used in our proposed algorithm?

$$S_{cos}(antibody_i, antigen_j) = \frac{\sum_{l=1}^{L} antibody_i[l] \cdot \text{Interest}(antigen_j[l])}{\sqrt{\sum_{l=1}^{L} antibody_i[l] \; \sum_{l=1}^{L} antigen_j[l]}}$$
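The affinity can be sketched in Python as an interest-weighted cosine similarity between binary URL vectors. This assumes the denominator is the square root of the product of the two vector sums, which is the usual cosine form for binary vectors, and that `interest[l]` holds the interest of page `l` in the session (0 for unvisited pages). The function name and sample vectors are hypothetical.

```python
import math


def affinity(antibody, antigen, interest):
    """Interest-weighted cosine similarity between two binary URL vectors."""
    numerator = sum(a * w for a, w in zip(antibody, interest))
    denominator = math.sqrt(sum(antibody) * sum(antigen))
    return numerator / denominator if denominator else 0.0


antibody = [1, 1, 0, 0]            # profile containing URLs 0 and 1
antigen = [1, 0, 1, 0]             # session that visited URLs 0 and 2
interest = [0.8, 0.0, 0.4, 0.0]    # hypothetical page interests for the session
print(affinity(antibody, antigen, interest))
```

Only URLs present in both the profile and the session contribute to the numerator, so a long session that barely overlaps a profile still scores low.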

27

Affinity Measure

$$w_{ij} = e^{-\left( \frac{d_{ij}^2}{2 \sigma_{ij}^2} \right)}$$

• The weight function decreases with distance from the antigen/data location.
• $\sigma_{ij}^2$ is a scale parameter that controls the decay rate of the weights along the spatial dimensions.
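The effect of the scale parameter is easy to see numerically. The sketch below assumes the weight is $\exp(-d^2 / (2\sigma^2))$ with the squared distance and the scale supplied by the caller; the function name is hypothetical.

```python
import math


def activation_weight(d2: float, sigma2: float) -> float:
    """Weight of an antigen at squared distance d2 from an antibody with scale sigma2."""
    return math.exp(-d2 / (2.0 * sigma2))


# The same squared distance weighs less under a smaller scale.
for sigma2 in (0.1, 0.5, 2.0):
    print(sigma2, activation_weight(0.25, sigma2))
```

A small scale makes the antibody respond only to very close antigens, while a large scale widens its influence zone.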

28

Stimulation Level

$$s_{iJ} = \frac{\sum_{j=1}^{J} w_{ij}}{\sigma_{iJ}^2}$$

$$s_{iJ} = \frac{W_{iJ-1} + w_{iJ}}{\sigma_{iJ}^2}$$

$$W_{iJ-1} = \sum_{j=1}^{J-1} w_{ij}$$

29

Weighted Stimulation

$$ws_{iJ} = \frac{W_{iJ-1} + w_{iJ} \, w_{validity}(J)}{\sigma_{iJ}^2}$$
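The incremental form means an antibody only needs a running sum of past weights, not the past antigens themselves. A minimal Python sketch, under these assumptions: the stimulation is (running sum + new weight × validity weight) / scale, the running sum accumulates the validity-weighted contributions, and the scale is held fixed because its update rule is not given on the slides. The class and attribute names are hypothetical.

```python
class AntibodyStimulation:
    """Running stimulation level of one antibody."""

    def __init__(self, sigma2: float):
        self.sigma2 = sigma2   # scale, held fixed in this sketch
        self.W = 0.0           # running sum of past (weighted) activation weights

    def update(self, w: float, validity: float = 1.0) -> float:
        """Fold in the weight of the newest antigen and return the new stimulation."""
        contribution = w * validity
        stimulation = (self.W + contribution) / self.sigma2
        self.W += contribution
        return stimulation


antibody = AntibodyStimulation(sigma2=0.5)
first = antibody.update(0.2)                   # validity 1.0: unweighted case
second = antibody.update(0.4, validity=0.5)    # a less valid session counts for less
print(first, second)
```

With `validity` left at 1.0 this reduces to the unweighted stimulation level, so a single pass over the stream is enough to keep every antibody's stimulation up to date.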

30

Network Stimulation & Suppression

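$$ws_{iJ} = \frac{W_{iJ-1} + w_{iJ} \, w_{validity}(J)}{\sigma_{iJ}^2} + \alpha \frac{\sum_{n=1}^{N_B} w_{in}}{\sigma_{iJ}^2} - \beta \frac{\sum_{n=1}^{N_B} w_{in}}{\sigma_{iJ}^2}$$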

31

Cloning

$$N_{clones_i} = K_{clones} \frac{ws_i}{\sum_{n=1}^{N_B} ws_n} \quad \text{if } age \geq age_{min}$$

Antibodies are cloned in proportion to their stimulation levels relative to the average network stimulation.

To avoid premature proliferation of antibodies and to encourage a diverse repertoire, new antibodies do not clone before they are mature (their age exceeds a threshold).
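The cloning rule can be sketched in Python as follows. It assumes each mature antibody receives a share of a clone budget `k_clones` proportional to its stimulation over the network total, and that immature antibodies receive none. Truncating to an integer and the comparison `age >= age_min` are assumptions of this sketch; all names are hypothetical.

```python
def clone_counts(stimulations, ages, k_clones, age_min):
    """Number of clones per antibody: proportional share for mature ones, 0 otherwise."""
    total = sum(stimulations)
    counts = []
    for s, age in zip(stimulations, ages):
        if total > 0 and age >= age_min:
            counts.append(int(k_clones * s / total))   # assumption: round down
        else:
            counts.append(0)
    return counts


# Hypothetical network of three antibodies; the second one is still immature.
print(clone_counts([1.0, 3.0, 4.0], ages=[5, 1, 5], k_clones=8, age_min=2))
```

The age gate keeps a freshly created antibody from flooding the population before it has been stimulated by more than the one antigen that created it.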

32

Hypermutation

Somatic hypermutation is a powerful natural exploration mechanism in the IS that allows it to learn how to respond to new antigens that have never been seen before.

It is a very costly and inefficient operation, since its complexity is exponential in the number of features.

We model this operation in the AIS by an instant antigen duplication whenever an antigen is encountered that fails to activate the entire immune network.

33

Directed Mutation

Antibodies that are added to the population via mutation are always superior individuals.

In this mutation mechanism, whenever the system realizes there are not enough good antibodies to confront the antigens, new antibodies are added to the population.

It is a new form of DANGER THEORY.

The directed mutation mechanism is as follows:

34

Directed Mutation

0 1 1 0 0 0 0 1 0

0 1 0 0 1 1 1 0 0

0 1 1 1 0 1 1 0 0

1 1 0 0 1 1 1 0 0

1 0 0 0 0 1 1 1 1

0 1 0 0 0 1 1 1 0

Web log

In to the system

35

Directed Mutation

0 1 1 0 0 0 0 -1 0

0 1 0 0 +2 1 1 -1 0

0 1 +2 +2 0 1 1 -1 0

1 0 0 0 1 1 1 0 0

+2 -1 0 0 0 1 1 1 +2

0 1 0 0 0 1 1 1 0

36

Directed Mutation

1 1 0 1 0 0 0 1 0

0 1 1 1 0 0 0 1 0

1 1 0 0 0 1 0 1 0

1 0 0 1 0 1 0 0 0

0 1 0 1 0 0 0 1 0

1 1 0 1 0 1 0 1 0

0 1 1 0 0 0 0 -1 0

0 1 0 0 +2 1 1 -1 0

0 1 +2 +2 0 1 1 -1 0

1 0 0 0 1 1 1 0 0

+2 -1 0 0 0 1 1 1 +2

37

Directed Mutation

1 1 0 1 0 -1 0 1 0

0 1 1 1 0 0 0 1 0

1 1 0 -1 0 1 0 1 0

1 0 0 1 0 1 0 0 0

0 1 0 1 0 0 0 1 0

1 1 0 1 0 1 0 1 0

0 1 1 0 0 0 0 -1 0

0 1 0 0 +2 1 1 -1 0

0 1 +2 +2 0 1 1 -1 0

1 0 0 0 1 1 1 0 0

+2 -1 0 0 0 1 1 1 +2

38


Directed Mutation

0 1 1 0 0 0 0 1 0

0 1 0 0 +2 1 1 -1 0

0 1 +3 1 0 1 1 -1 0

1 0 0 0 1 1 1 0 0

+2 -1 0 0 0 1 1 1 +2

0 1 0 1 0 1 1 0 0

1 1 0 1 0 -1 0 1 0

0 1 1 1 0 0 0 1 0

1 1 0 -1 0 1 0 1 0

1 0 0 1 0 1 0 0 0

0 1 0 1 0 0 0 1 0

40

Directed Mutation

1 1 0 1 0 0 -1 1 0

0 1 1 1 0 0 0 1 0

1 1 0 -1 0 1 0 1 0

1 0 0 1 0 1 0 0 0

0 1 0 1 0 0 0 1 0

1 1 0 1 0 0 1 0 0

0 1 1 0 0 0 0 1 0

0 1 0 0 +2 1 1 -1 0

0 1 +3 1 0 1 1 -1 0

1 0 0 0 1 1 1 0 0

+2 -1 0 0 0 1 1 1 +2

41

Directed Mutation

1 1 0 1 0 0 -1 +2 0

0 1 1 1 0 0 0 1 0

1 1 0 -1 0 1 0 1 0

1 0 0 1 0 1 0 0 0

0 1 0 1 0 0 0 1 0

1 1 0 1 0 0 1 0 0

0 1 1 0 0 0 0 1 0

0 1 0 0 +2 1 1 -1 0

0 1 +3 1 0 1 1 -1 0

1 0 0 0 1 1 1 0 0

+2 -1 0 0 0 1 1 1 +2

42

Decide to Mutate

After some time

1 1 -9 1 0 0 -1 +8 0

0 1 1 1 0 0 0 1 0

1 1 0 -10 0 -9 0 1 0

1 0 0 1 0 1 0 0 0

0 1 0 1 0 0 0 1 0

0 1 1 0 0 0 0 1 0

0 1 0 0 +9 1 1 -7 0

0 1 +9 1 0 1 1 -8 0

1 0 0 0 1 1 1 0 0

+2 -1 0 0 0 1 1 1 +2

43

Mutation Occurs

After some time

1 1 -9 1 0 0 -1 +8 0

0 1 1 1 0 0 0 1 0

1 1 0 -10 0 -9 0 1 0

1 0 0 1 0 1 0 0 0

0 1 0 1 0 0 0 1 0

0 1 1 0 0 0 0 1 0

0 1 0 0 +9 1 1 -7 0

0 1 +9 1 0 1 1 -8 0

1 0 0 0 1 1 1 0 0

+2 -1 0 0 0 1 1 1 +2

0 1 0 0 0 1 1 1 0

0 1 0 1 0 1 1 1 0

1 1 1 1 0 0 -1 0 0

1 1 0 1 0 0 0 1 0

44

Directed Mutation

Directed mutation is not computationally complex.

It does not cause antibodies to be destroyed before they should leave the population.

It makes the system intelligent: the system can decide when to create new individuals.

Directed mutation happens after every T antigens that enter the system.

45

Compression

Compression: cluster antibody population into k clusters.

external interactions: those occurring between an antigen (external agent) and the antibody in the immune network.

internal interactions: those occurring between one antibody and all other antibodies in the immune network.

The most expensive computation and storage overhead stems from calculating and storing all the internal network interactions (quadratic complexity with respect to the network size).

After compression:
◦ internal interactions: $N_B^2 \rightarrow \left( \frac{N_B}{k} \right)^2 + k - 1$
◦ external interactions: $N_B \rightarrow k$

Choosing an appropriate number of clusters, $k \approx \sqrt{N_B}$, gives $O(N_B)$ complexity.
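A quick numeric check shows how much compression saves. The sketch assumes the compressed count of internal interactions is $(N_B / k)^2 + k - 1$ (interactions within one subnet plus the other $k - 1$ subnet centroids), against $N_B^2$ without compression, and that $k$ is chosen near $\sqrt{N_B}$. The function name and the sizes are hypothetical.

```python
import math


def internal_interactions(n_b, k=None):
    """Internal interaction count: N_B^2 uncompressed, (N_B/k)^2 + k - 1 with k subnets."""
    if k is None:
        return n_b ** 2
    return (n_b / k) ** 2 + k - 1


n_b = 10_000
k = round(math.sqrt(n_b))   # assumption: k close to the square root of N_B
print(internal_interactions(n_b), internal_interactions(n_b, k))
```

With $k$ near $\sqrt{N_B}$ both terms of the compressed count are at most linear in $N_B$, which is where the linear overall complexity comes from.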

46

Algorithm Visualization

[Figure sequence, slides 47 to 70: animation of a network of numbered nodes; only the node labels survived extraction.]

70

Pseudocode

1- Fix the maximal population size, $N_{B_{max}}$.

2- Initialize antibodies using a cross section of the input data, $\sigma_i^2 = \sigma_{init}$.

3- Compress the immune network into K subnets using five iterations of K-means.

4- Repeat for each antigen $antigen_j$ {

    4-1 Compute $validity(antigen_j)$;

    4-2 If $validity(antigen_j)$ < validity threshold

        4-2-1 discard $antigen_j$ and continue with a new antigen;

    4-3 Present $antigen_j$ to each subnet centroid ($C_k$, $k = 1, \ldots, K$) in the network; compute distance and weight.

    4-4 Determine the most activated subnet (ma subnet), which has maximum $w_{kj}$.

    4-5 If all antibodies in the ma subnet have $w_{ij} < w_{min}$ (antigen too weak to activate the subnet) {

        4-5-1 create by duplication a new antibody (antibody = $antigen_j$, $\sigma_i^2 = \sigma_{init}$)

    } else {

        4-5-1 Increment the number of stimulations of antibody i;
        4-5-2 Compute the $antibody_i$ stimulation level ($ws_{ij}$)
        4-5-3 Update the $antibody_i$ scale value ($\sigma_{ij}^2$)

    }

    4-6 Clone antibodies;

    4-7 If population size > $N_{B_{max}}$ {

        4-7-1 For each antibody i in the network

            4-7-1-1 If $antibody_i.age < age_{min}$: $antibody_i.ws_{iJ} = \sum_{n=1}^{N_B} s_n$;

        4-7-2 Sort antibodies in ascending order of their stimulation level;
        4-7-3 Kill the worst excess (top ($N_B - N_{B_{max}}$)) antibodies.

    }

    4-8 Mutate antibodies after every T antigens.
    4-9 After every T antigens, run five iterations of K-means with the previous centroids as initial centroids.

}

71

Data

Data set 1
• One week of HTTP requests to the Music Machine Web site, www.hyperreal.org
• 220146 Requests.
• 19542 Sessions.
• 4756 URLs.

Data set 2
• One week of HTTP requests to the University of Saskatchewan's WWW server.
• 44298 Requests.
• 9188 Sessions.
• 1519 URLs.

72

Ground Profiles

For evaluating learned profiles, it should be shown that the learned profiles are good representatives of the input data:

Summarization ability of AISWUM

In order to show this ability, a comparison between distribution of the learned profiles and input data should be done, so:

we need some ground profiles

Ground profiles are extracted using:

Scalable K-Means

73

Evaluation Metrics

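$$prc(Ab_i(t), g_c) = \frac{\sum_{k=1}^{L} Ab_{i,k}(t) \, g_{c,k}}{\sum_{k=1}^{L} Ab_{i,k}(t)}$$

$$cvg(Ab_i(t), g_c) = \frac{\sum_{k=1}^{L} Ab_{i,k}(t) \, g_{c,k}}{\sum_{k=1}^{L} g_{c,k}}$$

$$PRC(AB(t), g_c) = \begin{cases} 1 & \text{if } \max_{i=1}^{N_{Ab}(t)} prc(Ab_i(t), g_c) \geq prc_{min} \\ 0 & \text{otherwise} \end{cases}$$

$$CVG(AB(t), g_c) = \begin{cases} 1 & \text{if } \max_{i=1}^{N_{Ab}(t)} cvg(Ab_i(t), g_c) \geq cvg_{min} \\ 0 & \text{otherwise} \end{cases}$$

$$S_{PRC,CVG}(t, c) = PRC(AB(t), g_c) \cdot CVG(AB(t), g_c)$$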
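These metrics can be sketched in Python for binary URL vectors. The sketch assumes precision is the overlap between an antibody and a ground profile divided by the antibody's size, coverage is the same overlap divided by the ground profile's size, and a ground profile counts as matched when the best precision and the best coverage over all antibodies each reach their threshold. The `>=` comparison, the threshold values and all names are assumptions of this sketch.

```python
def prc(antibody, ground):
    """Share of the antibody's URLs that also appear in the ground profile."""
    size = sum(antibody)
    overlap = sum(a * g for a, g in zip(antibody, ground))
    return overlap / size if size else 0.0


def cvg(antibody, ground):
    """Share of the ground profile's URLs that the antibody contains."""
    size = sum(ground)
    overlap = sum(a * g for a, g in zip(antibody, ground))
    return overlap / size if size else 0.0


def s_prc_cvg(antibodies, ground, prc_min, cvg_min):
    """1 if some antibody is precise enough and some antibody is complete enough, else 0."""
    precise = max((prc(ab, ground) for ab in antibodies), default=0.0) >= prc_min
    complete = max((cvg(ab, ground) for ab in antibodies), default=0.0) >= cvg_min
    return int(precise) * int(complete)


ground = [1, 1, 1, 0]                       # hypothetical ground profile
population = [[1, 1, 0, 0], [1, 1, 1, 1]]   # hypothetical learned antibodies
print(s_prc_cvg(population, ground, prc_min=0.9, cvg_min=0.9))
```

Note that under this reading the precise antibody and the complete antibody need not be the same one, since each maximum is taken separately.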

74

Results (Music Machine)

Distribution of the learned antibodies that are simultaneously precise and complete per input category at time t.

75

Precision

Distribution of precise antibodies per input category at time t.

76

Coverage

Distribution of complete antibodies per input category at time t.

77

Results (Saskatchewan University)

Distribution of the learned antibodies that are simultaneously precise and complete per input category at time t.

78

Precision

Distribution of precise antibodies per input category at time t.

79

Coverage

Distribution of complete antibodies per input category at time t.

80

Evaluation Metrics

$P(t)$: overall level of precision of the learned antibodies with respect to the input data.

[Equation: $P(t)$, a ratio of double sums (upper limits $N_x$ and $N_c$) of $S_{PRC,CVG}$ terms; the exact form is not recoverable from the extracted text.]

Ratio of learned antibodies that accurately represent the past input data to all learned antibodies.

81

Evaluation Metrics

$C(t)$: overall coverage of the learned antibodies with respect to the input data.

[Equation: $C(t)$, a ratio of double sums (upper limits $N_x$ and $N_c$) of $S_{PRC,CVG}$ terms; the exact form is not recoverable from the extracted text.]

Ratio of past input data that are summarized accurately by antibodies to all input data.

82

Results (Music Machines)

[Plot over t] Ratio of learned antibodies that accurately represent past input data to all learned antibodies.

[Plot over t] Ratio of past input data that are summarized accurately by antibodies to all input data.

83

Results (Saskatchewan)

[Plot over t] Ratio of learned antibodies that accurately represent past input data to all learned antibodies.

[Plot over t] Ratio of past input data that are summarized accurately by antibodies to all input data.

84

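Results

| State | Maximum Contentment | Minimum Contentment | Average Contentment of 50 users |
|---|---|---|---|
| State 1 | 41% | 15% | 28% |
| State 2 | 60% | 40% | 51% |
| State 3 | 67% | 45% | 56% |

| State | Danger Theory | Weighted Items | Weighted Sessions |
|---|---|---|---|
| State 1 | No | No | No |
| State 2 | Yes | No | No |
| State 3 | Yes | Yes | Yes |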

85

Run Time

The run time for one scan of the data, with non-optimized C++ code on a Pentium 4 PC, was:
◦ For the first dataset: less than 6 min.
◦ For the second dataset: less than 3 min.

86

Comparison with other methods

| Method | AIS-WUM | SKM | DBSCAN | BIRCH | aiNet | Fuzzy AIS | SOSDM |
|---|---|---|---|---|---|---|---|
| Reliability / insensitivity to initial condition | Yes | No | Yes | No | Yes | Yes | Yes |
| Noise tolerance | Yes | No | Yes | No | No | Yes | Moderately |
| Need to scan before learning | No | Yes | Yes | Yes | Yes | Yes | No |
| Time complexity | O(N) | O(N) | O(N log(N)) | O(N) | O(N²) | O(N²) | O(N) |
| Buffer data | No | Yes | Yes | Yes | Yes | Yes | Yes |
| Number of clusters specified | No | Yes | No | Yes | No | No | Yes |
| Handle evolving clusters | Yes | No | No | No | Yes | Yes | Yes |
| Automatic scale estimation | Yes | No | No | No | No | Yes | No |
| Clustering model | Network | Centroids | Medoids | Centroids | Network | Network | Network |
| Handle different similarity measures | Yes | No | Yes | No | Yes | Yes | Yes |
| Density/Partition based | Density | Partition/Distance | Density | Partition | Partition/Distance | Density | Partition/Distance |

87

Novelties of the proposed algorithm

Low Computational Complexity.

Danger Theory in Two Forms

Directed Mutation

Weighted Stimulation

Learning the Data in a Single Pass

Natural Mechanism

Applicable to Stream Data

Bi-functionality: Frequent Itemsets Mining + Finding Centroids of Clusters in Large Datasets

Clear and fast identification of outliers.

88

Conclusion

A robust and scalable algorithm for frequent itemset mining has been designed, which is well suited to noisy, sparse data such as Web usage data.

89

Conclusion

The main factors behind the ability of the proposed algorithm to learn in a single pass are the richness of the immune network structure, which forms a dynamic synopsis of the data, and danger theory, which decides which antigens are dangerous and when new antibodies are needed to combat antigens.

90

Publications

B. Hoda Helmi, Adel T. Rahmani, Nona Helmi, “An Evolutionary Control Model for a Generic Multiagent System Using Artificial Immune Systems”, in Proceedings of the First Joint Congress on Fuzzy and Intelligent Systems, 2007, Ferdowsi University.

B. Hoda Helmi, Adel T. Rahmani, “Image Segmentation with a New Texture Feature Based on AIS”, in Proceedings of the First Conference on Data Mining, AmirKabir University, 2007, Tehran, Iran. (in Farsi)

B. Hoda Helmi, Adel T. Rahmani, “An AIS Algorithm for Web Usage Mining with Directed Mutation”, accepted at the IEEE World Congress on Computational Intelligence, CEC division, 2008, Hong Kong.

B. Hoda Helmi, Adel T. Rahmani, “An Enhanced AIS for WUM, inspired by Danger Theory”, submitted to ICEE 2008, Tarbiat Modarres University, 2008, Tehran, Iran. (in Farsi)

91

Publications

Adel T. Rahmani, B. Hoda Helmi, “EIN-WUM an AIS-based Algorithm for Web Usage Mining”, submitted to the Genetic and Evolutionary Computation Conference, 2008, Atlanta, Georgia.

B. Hoda Helmi, Adel T. Rahmani, “A New Web Usage Mining Method based on An Artificial Immune System Solution with Enhanced Network and Danger Theory”, submitted to the International Journal of Control, Automation, and Systems.

B. Hoda Helmi, Adel T. Rahmani, “Evolutionary based Combining of Evolved Neural Network Classifiers”, accepted at the IASTED International Conference on Signal Processing, Pattern Recognition and Applications, 2006, Austria. (unrelated)

92

The End

Thanks

93