Artificial Immune based Approach to Association Rule Mining
By: B. Hoda Helmi
Supervisor: Adel T. Rahmani
January 2008
A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Science in Artificial Intelligence-Computer Engineering
1
Outline
The Immune System: Natural and Artificial
Association Rules
Web Usage Mining
Proposed Algorithm: AISWUM
Results and Conclusion
2
Natural Immune System
Immune System
• A system that protects the body from foreign substances and pathogenic organisms.
Antibody
• The immune system creates antibodies which match the antigens and cause the pathogens to be destroyed
Antigen
• Substances capable of starting a specific immune response are referred to as antigens (viruses, bacteria, fungi).
3
A High Level Overview
4
Natural Immune System
Immunity: Innate, Adaptive
Danger Theory
Clonal Selection
Network Theory
Affinity Maturation
Hypermutation
5
Innate versus Adaptive IS
Innate: immediately available for combat
6
Adaptive Immunity
epitope
Low affinity
receptor
structurally similar – high affinity
7
Clonal Selection & Affinity Maturation
8
Network Theory
Figure labels: 1, 2, 3, Ag
Stimulation (Positive Response)
Suppression (Negative Response)
Idiotypic network (Jerne, 1974): B cells stimulate each other. This creates an immunological memory.
9
Danger Theory
10
Artificial Immune System
A Framework for AIS
Application, Representation, Affinity, Algorithms, Solution
11
Association Rules
Set of items: I = {I1, I2, …, Im}
Transactions: D = {t1, t2, …, tn}, tj ⊆ I
Itemset: {Ii1, Ii2, …, Iik} ⊆ I
Large (frequent) itemset: an itemset whose number of occurrences is above a threshold.
Support of an itemset: the percentage of transactions that contain that itemset.
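The definitions above can be illustrated with a small sketch. The page names, the toy transaction database `D`, the brute-force enumeration, and the `>=` comparison against the threshold are all assumptions made for illustration; this is not the thesis algorithm, which uses an artificial immune system instead of enumeration.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def frequent_itemsets(transactions, min_support, max_size=2):
    """Brute-force search for itemsets whose support meets the threshold."""
    items = sorted(set().union(*transactions))
    found = {}
    for size in range(1, max_size + 1):
        for candidate in combinations(items, size):
            s = support(candidate, transactions)
            if s >= min_support:  # assumed convention: threshold is inclusive
                found[frozenset(candidate)] = s
    return found

# Toy transaction database D (hypothetical page names).
D = [frozenset(t) for t in (
    {"home", "software", "windows"},
    {"home", "software"},
    {"home", "samples"},
    {"software", "windows"},
)]
freq = frequent_itemsets(D, min_support=0.5)
```

With this toy data, `{"home", "software"}` appears in two of the four transactions, so its support is 0.5.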
12
Given: a set of items I = {I1, I2, …, Im} and a database of transactions D = {t1, t2, …, tn}, where ti = {Ii1, Ii2, …, Iik} and Iij ∈ I,
the Association Rule Problem is to identify all association rules X ⇒ Y with a minimum support and confidence.
13
Association Rule Mining Steps
Find Frequent Itemsets.
Generate rules from frequent itemsets.
Challenging Step In Association Rule Mining
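The second step, generating rules from a frequent itemset, can be sketched as follows. The sketch assumes the standard definition confidence(X ⇒ Y) = support(X ∪ Y) / support(X); the function names and the toy data are hypothetical.

```python
from itertools import combinations

def confidence(antecedent, consequent, transactions):
    """support(X ∪ Y) / support(X), computed from raw counts."""
    x = frozenset(antecedent)
    xy = x | frozenset(consequent)
    n_x = sum(1 for t in transactions if x <= t)
    n_xy = sum(1 for t in transactions if xy <= t)
    return n_xy / n_x if n_x else 0.0

def rules_from_itemset(itemset, transactions, min_confidence):
    """Return (X, Y, confidence) for every split of `itemset` into X => Y."""
    itemset = frozenset(itemset)
    rules = []
    for size in range(1, len(itemset)):
        for x in combinations(sorted(itemset), size):
            x = frozenset(x)
            y = itemset - x
            c = confidence(x, y, transactions)
            if c >= min_confidence:
                rules.append((x, y, c))
    return rules

# Toy transactions (hypothetical page names).
D = [frozenset(t) for t in (
    {"home", "software", "windows"},
    {"home", "software"},
    {"home", "samples"},
    {"software", "windows"},
)]
rules = rules_from_itemset({"software", "windows"}, D, min_confidence=0.9)
```

Here only `windows => software` survives, because every session containing `windows` also contains `software`, while the reverse holds for only two of three sessions.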
14
Goal
In this project our goal is to find all the frequent itemsets in Web usage data using an artificial immune system.
15
Web Usage Mining
Web usage mining, also known as Web log mining:
The application of mining techniques to discover interesting usage patterns from the secondary data derived from the interactions of users while surfing the Web.
16
Web Usage Mining
Applications
• Target potential customers for electronic commerce
• Enhance the quality and delivery of Internet information services to the end user
• Improve Web server system performance
• Identify potential prime advertisement locations
• Facilitate personalization/adaptive sites
• Improve site design
• Fraud/intrusion detection
• Predict user’s actions (allows prefetching)
17
Motivations (for choosing this application)
Web
Unstable
Noisy
Enormous
Distributed Data
18
WUM-Definitions
Web Logs
• The set of all accesses to the URLs of a Web site, stored on the Web server.
Session
• A sequence of URLs accessed by a user in one visit to the Web site. (Itemset)
Strong trend
• Crowded paths that are frequently traversed by users. (Frequent itemsets)
19
Web Log
O:0000002560 || T:1997/09/12-22:43:00 || U:/ || R:http://www.hyperreal.org/
O:0000002560 || T:1997/09/12-22:50:27 || U:/categories/software/ || R:http://www.hyperreal.org/music/machines/
O:0000002560 || T:1997/09/12-22:50:38 || U:/categories/software/Windows/ || R:http://www.hyperreal.org/music/machines/categories/software/
O:0000002560 || T:1997/09/12-22:50:47 || U:/categories/software/Windows/V909V03.TXT || R:http://www.hyperreal.org/music/machines/categories/software/Windows/
O:0000002560 || T:1997/09/12-22:51:06 || U:/categories/software/Windows/ || R:http://www.hyperreal.org/music/machines/categories/software/
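A record in this format can be split into fields with a few lines of code. This is a minimal sketch: the function name is hypothetical, and the reading of the field letters (O as a visitor identifier, T as the timestamp, U as the requested URL, R as the referrer) is an assumption inferred from the sample lines.

```python
from datetime import datetime

def parse_log_line(line):
    """Split one 'K:value || K:value' record into a dict keyed by field letter."""
    record = {}
    for field in line.split("||"):
        # partition() splits at the first colon only, so URLs stay intact.
        key, _, value = field.strip().partition(":")
        record[key] = value
    # Assumed timestamp layout, taken from the sample lines.
    record["T"] = datetime.strptime(record["T"], "%Y/%m/%d-%H:%M:%S")
    return record

line = ("O:0000002560 || T:1997/09/12-22:50:27 || U:/categories/software/ "
        "|| R:http://www.hyperreal.org/music/machines/")
rec = parse_log_line(line)
```

Subtracting the `T` values of consecutive records from the same visitor gives the time spent on each page, which is what the session construction step uses.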
20
Session Construction
URL                                         IDX
/                                           0
/categories/software/                       1
/categories/software/Windows/               2
/categories/software/Windows/V909V03.TXT    3
/categories                                 4
/manufacturers                              5
/samples.html/                              6
/gearlists/                                 7
/features/                                  8
/ecards/                                    9

Visited (0/1): 1 1 1 1 0 0 0 0 0 0
Duration: 07:27 00:11 02:10 00:19 02:01 00:00 00:00 00:00 00:00 00:00
Frequency: 1 1 2 1 0 0 0 0 0 0

\[ \mathrm{Frequency}(Page) = \frac{\mathrm{NumberOfVisits}(Page)}{\sum_{Page \in VisitedPages} \mathrm{NumberOfVisits}(Page)} \]

\[ \mathrm{Duration}(Page) = \frac{\mathrm{TotalDuration}(Page) / \mathrm{Length}(Page)}{\max_{page \in VisitedPages} \left( \mathrm{TotalDuration}(page) / \mathrm{Length}(page) \right)} \]
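The mapping from a session to its visit vectors can be sketched as follows. The function name is hypothetical; the URL index and the example session are taken from the Web log and index table on the slides.

```python
def session_vectors(visited_urls, url_index):
    """Return (binary, frequency) vectors for one session."""
    binary = [0] * len(url_index)
    frequency = [0] * len(url_index)
    for url in visited_urls:
        idx = url_index[url]
        binary[idx] = 1          # page was visited at least once
        frequency[idx] += 1      # count repeated visits
    return binary, frequency

url_index = {
    "/": 0,
    "/categories/software/": 1,
    "/categories/software/Windows/": 2,
    "/categories/software/Windows/V909V03.TXT": 3,
    "/categories": 4,
    "/manufacturers": 5,
    "/samples.html/": 6,
    "/gearlists/": 7,
    "/features/": 8,
    "/ecards/": 9,
}
session = [
    "/",
    "/categories/software/",
    "/categories/software/Windows/",
    "/categories/software/Windows/V909V03.TXT",
    "/categories/software/Windows/",
]
binary, frequency = session_vectors(session, url_index)
```

The page `/categories/software/Windows/` is requested twice in the log excerpt, which is why its frequency entry is 2 while its binary entry stays 1.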
21
Representation
Antibody (strong trends): URL1 (0/1), URL2 (0/1), …, URLm (0/1)
Antibody features
• Age
• Stimulation Level
• Scale
Antigen (incoming sessions): URL1 (0/1), URL2 (0/1), …, URLm (0/1)
Antigen features
• Validity
22
Scenario
An antigen enters the body.
Determine whether the first signal is produced. (Two signals are needed for an antigen to trigger the AIS; the first signal is produced if the antigen is harmful to the body.)
If the first signal is produced, present the antigen to the antibodies and compute distance, weight, and influence zone.
Determine the antibody with maximum weight. If maximum weight > threshold, compute the stimulation level (SL) and influence zone (IZ) for that antibody; else create a new antibody by duplication.
Clone and mutate.
23
Danger Signal
Danger Theory (two-signal approach): if the antigen is harmful, trigger an immune system response; else discard the antigen.
In the data mining context: harmful ≡ interesting (valid).
What is the danger signal in our system?
◦ We should find a measure to determine the validity of sessions.
24
Validity Measure
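\[ \mathrm{Consistency}(Session) = \frac{\sum_{k=1}^{P-1} \sum_{j=k+1}^{P} \mathrm{similarity}(k, j)}{P(P-1)/2} \]

\[ \mathrm{similarity}(i, j) = 1 - \frac{d_{i,j}}{D} \]

\[ \mathrm{Duration}(Page) = \frac{\mathrm{TotalDuration}(Page) / \mathrm{Length}(Page)}{\max_{page \in VisitedPages} \left( \mathrm{TotalDuration}(page) / \mathrm{Length}(page) \right)} \]

\[ \mathrm{Frequency}(Page) = \frac{\mathrm{NumberOfVisits}(Page)}{\sum_{Page \in VisitedPages} \mathrm{NumberOfVisits}(Page)} \]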
25
Validity Measure
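\[ \mathrm{Interest}(Page) = \frac{2 \cdot \mathrm{Frequency}(Page) \cdot \mathrm{Duration}(Page)}{\mathrm{Frequency}(Page) + \mathrm{Duration}(Page)} \]

\[ \mathrm{Interest}(Session) = \frac{\sum_{i=1}^{P} w_{p_i}}{P} \]

\[ \mathrm{Validity}(Session) = \frac{2 \cdot \mathrm{Interest}(Session) \cdot \mathrm{Consistency}(Session)}{\mathrm{Interest}(Session) + \mathrm{Consistency}(Session)} \]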
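The validity measure combines its ingredients through harmonic means, which the sketch below makes explicit. All function names are hypothetical. Two points are assumptions: the page weight is taken to be the page interest, and consistency is taken to be the average similarity over all page pairs, with the pairwise similarity matrix supplied by the caller.

```python
def harmonic_mean(a, b):
    """2ab / (a + b), defined as 0 when both inputs are 0."""
    return 2 * a * b / (a + b) if (a + b) else 0.0

def page_interest(frequency, duration):
    """Interest(Page) as the harmonic mean of Frequency and Duration."""
    return harmonic_mean(frequency, duration)

def session_interest(page_interests):
    """Mean page weight over the P pages of the session."""
    return sum(page_interests) / len(page_interests)

def session_consistency(similarity):
    """Average similarity over all page pairs k < j of a P x P matrix."""
    p = len(similarity)
    if p < 2:
        return 1.0  # assumption: a single-page session is fully consistent
    total = sum(similarity[k][j] for k in range(p - 1) for j in range(k + 1, p))
    return total / (p * (p - 1) / 2)

def session_validity(interest, consistency):
    """Validity(Session) as the harmonic mean of interest and consistency."""
    return harmonic_mean(interest, consistency)

# Two visited pages with made-up (frequency, duration) values.
interests = [page_interest(0.2, 1.0), page_interest(0.4, 0.4)]
consistency = session_consistency([[1.0, 0.5], [0.5, 1.0]])
validity = session_validity(session_interest(interests), consistency)
```

A harmonic mean stays low unless both inputs are high, so a session must be both interesting and consistent to count as valid.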
26
Affinity Measure
What affinity measure is used in our proposed algorithm?
\[ S_{\cos}(antibody_i, antigen_j) = \frac{\sum_{l=1}^{L} antibody_i[l] \cdot \mathrm{Interest}(antigen_j[l])}{\sqrt{\sum_{l=1}^{L} antibody_i[l] \cdot \sum_{l=1}^{L} antigen_j[l]}} \]
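An interest-weighted cosine similarity between a binary antibody and a binary antigen can be sketched as follows. The function name is hypothetical, and the square-root normalization of the standard cosine measure for binary vectors is an assumption; the per-URL interest values are supplied by the caller.

```python
import math

def weighted_cosine(antibody, antigen, interest):
    """Cosine-style similarity where matching URLs count by their interest.

    antibody, antigen: lists of 0/1 flags, one per URL.
    interest: per-URL interest weights of the antigen (session) pages.
    """
    num = sum(a * g * w for a, g, w in zip(antibody, antigen, interest))
    den = math.sqrt(sum(antibody) * sum(antigen))
    return num / den if den else 0.0

full_match = weighted_cosine([1, 1], [1, 1], [1.0, 1.0])
partial = weighted_cosine([1, 1, 0, 0], [1, 1, 1, 0], [1.0, 1.0, 1.0, 1.0])
```

With all interest weights equal to 1 the measure reduces to the plain cosine similarity of two binary vectors.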
27
Affinity Measure
\[ w_{ij} = e^{-\left( \frac{d_{ij}^{2}}{2 \sigma_{ij}^{2}} \right)} \]
The weight function decreases with distance from the antigen/data location.
\( \sigma_{ij}^{2} \) is a scale parameter that controls the decay rate of the weights along the spatial dimensions.
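The decay of the weight with distance can be checked numerically. This is a minimal sketch of a Gaussian-style weight; the function and parameter names are hypothetical.

```python
import math

def activation_weight(distance, sigma2):
    """Weight of an antigen at `distance` from an antibody with scale sigma2."""
    return math.exp(-(distance ** 2) / (2 * sigma2))

near = activation_weight(0.1, sigma2=0.5)
far = activation_weight(0.9, sigma2=0.5)
wide = activation_weight(0.9, sigma2=2.0)
```

A larger scale gives a slower decay, so the same distant antigen activates a wide antibody more strongly than a narrow one.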
28
Stimulation Level
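\[ s_{iJ} = \frac{\sum_{j=1}^{J} w_{ij}}{\sigma_{iJ}^{2}} \]

\[ s_{iJ} = \frac{W_{iJ-1} + w_{iJ}}{\sigma_{iJ}^{2}} \]

\[ W_{iJ-1} = \sum_{j=1}^{J-1} w_{ij} \]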
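The incremental form matters for stream data: the running sum of past weights replaces a rescan of all earlier antigens. The sketch below compares the batch and incremental computations; the class and function names are hypothetical.

```python
class StimulationTracker:
    """Keeps the running sum W of past weights for one antibody."""

    def __init__(self):
        self.W = 0.0  # sum of weights of the antigens seen so far

    def update(self, w_new, sigma2):
        """Return the stimulation after a new antigen, then store its weight."""
        s = (self.W + w_new) / sigma2
        self.W += w_new
        return s

def batch_stimulation(weights, sigma2):
    """Stimulation computed from the full weight history."""
    return sum(weights) / sigma2

weights = [0.5, 0.25, 1.0]
tracker = StimulationTracker()
for w in weights:
    s_incremental = tracker.update(w, sigma2=0.5)
s_batch = batch_stimulation(weights, sigma2=0.5)
```

Both computations give the same value after the last antigen, while the incremental one stores a single number per antibody.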
29
Weighted Stimulation
\[ s_{iJ} = \frac{W_{iJ-1} + w_{iJ} \, w_{validity}(J)}{\sigma_{iJ}^{2}} \]
30
Network Stimulation & Suppression
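\[ s_{iJ} = \frac{W_{iJ-1} + w_{iJ} \, w_{validity}(J)}{\sigma_{iJ}^{2}} + \alpha \frac{\sum_{n=1}^{N_B} w_{in}}{\sigma_{iJ}^{2}} - \beta \frac{\sum_{n=1}^{N_B} w_{in}}{\sigma_{iJ}^{2}} \]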
31
Cloning
\[ N_{clones_i} = K_{clones} \, \frac{ws_i}{\sum_{n=1}^{N_B} ws_n} \quad \text{if } age_i \ge age_{min} \]
Antibodies are cloned in proportion to their stimulation levels relative to the average network stimulation.
To avoid premature proliferation of antibodies and to encourage a diverse repertoire, new antibodies do not clone before they are mature (their age exceeds a threshold).
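Proportional cloning with a maturity gate can be sketched as follows. The function and parameter names are hypothetical, and two details are assumptions: clone counts are truncated to integers, and an antibody counts as mature when its age is at least `age_min`.

```python
def clone_counts(stimulations, ages, k_clones, age_min):
    """Number of clones per antibody, proportional to its share of stimulation."""
    total = sum(stimulations)
    counts = []
    for s, age in zip(stimulations, ages):
        if age >= age_min and total > 0:
            counts.append(int(k_clones * s / total))  # assumed: truncate
        else:
            counts.append(0)  # immature antibodies do not clone
    return counts

counts = clone_counts(
    stimulations=[2.0, 1.0, 1.0],
    ages=[5, 5, 1],
    k_clones=8,
    age_min=3,
)
```

The third antibody has the same stimulation as the second but produces no clones because it is still immature.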
32
Hypermutation
Somatic hypermutation is a powerful natural exploration mechanism in the immune system that allows it to learn how to respond to new antigens that have never been seen before.
It is a very costly and inefficient operation, since its complexity is exponential in the number of features.
We model this operation in the AIS by an instant antigen duplication whenever an antigen is encountered that fails to activate the entire immune network.
33
Directed Mutation
Antibodies that are added to the population via mutation are always superior individuals.
In this mutation mechanism, whenever the system detects that there are not enough good antibodies to confront the antigens, new antibodies are added to the population.
It is a new form of DANGER THEORY.
The directed mutation mechanism is as follows:
34
Directed Mutation
0 1 1 0 0 0 0 1 0
0 1 0 0 1 1 1 0 0
0 1 1 1 0 1 1 0 0
1 1 0 0 1 1 1 0 0
1 0 0 0 0 1 1 1 1
0 1 0 0 0 1 1 1 0
Web log
In to the system
35
Directed Mutation
0 1 1 0 0 0 0 -1 0
0 1 0 0 +2 1 1 -1 0
0 1 +2 +2 0 1 1 -1 0
1 0 0 0 1 1 1 0 0
+2 -1 0 0 0 1 1 1 +2
0 1 0 0 0 1 1 1 0
36
Directed Mutation
1 1 0 1 0 0 0 1 0
0 1 1 1 0 0 0 1 0
1 1 0 0 0 1 0 1 0
1 0 0 1 0 1 0 0 0
0 1 0 1 0 0 0 1 0
1 1 0 1 0 1 0 1 0
0 1 1 0 0 0 0 -1 0
0 1 0 0 +2 1 1 -1 0
0 1 +2 +2 0 1 1 -1 0
1 0 0 0 1 1 1 0 0
+2 -1 0 0 0 1 1 1 +2
37
Directed Mutation
1 1 0 1 0 -1 0 1 0
0 1 1 1 0 0 0 1 0
1 1 0 -1 0 1 0 1 0
1 0 0 1 0 1 0 0 0
0 1 0 1 0 0 0 1 0
1 1 0 1 0 1 0 1 0
0 1 1 0 0 0 0 -1 0
0 1 0 0 +2 1 1 -1 0
0 1 +2 +2 0 1 1 -1 0
1 0 0 0 1 1 1 0 0
+2 -1 0 0 0 1 1 1 +2
38
Directed Mutation
0 1 1 0 0 0 0 1 0
0 1 0 0 +2 1 1 -1 0
0 1 +3 1 0 1 1 -1 0
1 0 0 0 1 1 1 0 0
+2 -1 0 0 0 1 1 1 +2
0 1 0 1 0 1 1 0 0
1 1 0 1 0 -1 0 1 0
0 1 1 1 0 0 0 1 0
1 1 0 -1 0 1 0 1 0
1 0 0 1 0 1 0 0 0
0 1 0 1 0 0 0 1 0
40
Directed Mutation
1 1 0 1 0 0 -1 1 0
0 1 1 1 0 0 0 1 0
1 1 0 -1 0 1 0 1 0
1 0 0 1 0 1 0 0 0
0 1 0 1 0 0 0 1 0
1 1 0 1 0 0 1 0 0
0 1 1 0 0 0 0 1 0
0 1 0 0 +2 1 1 -1 0
0 1 +3 1 0 1 1 -1 0
1 0 0 0 1 1 1 0 0
+2 -1 0 0 0 1 1 1 +2
41
Directed Mutation
1 1 0 1 0 0 -1 +2 0
0 1 1 1 0 0 0 1 0
1 1 0 -1 0 1 0 1 0
1 0 0 1 0 1 0 0 0
0 1 0 1 0 0 0 1 0
1 1 0 1 0 0 1 0 0
0 1 1 0 0 0 0 1 0
0 1 0 0 +2 1 1 -1 0
0 1 +3 1 0 1 1 -1 0
1 0 0 0 1 1 1 0 0
+2 -1 0 0 0 1 1 1 +2
42
Decide to Mutate
After some time
1 1 -9 1 0 0 -1 +8 0
0 1 1 1 0 0 0 1 0
1 1 0 -10 0 -9 0 1 0
1 0 0 1 0 1 0 0 0
0 1 0 1 0 0 0 1 0
0 1 1 0 0 0 0 1 0
0 1 0 0 +9 1 1 -7 0
0 1 +9 1 0 1 1 -8 0
1 0 0 0 1 1 1 0 0
+2 -1 0 0 0 1 1 1 +2
43
Mutation Occurs
After some time
1 1 -9 1 0 0 -1 +8 0
0 1 1 1 0 0 0 1 0
1 1 0 -10 0 -9 0 1 0
1 0 0 1 0 1 0 0 0
0 1 0 1 0 0 0 1 0
0 1 1 0 0 0 0 1 0
0 1 0 0 +9 1 1 -7 0
0 1 +9 1 0 1 1 -8 0
1 0 0 0 1 1 1 0 0
+2 -1 0 0 0 1 1 1 +2
0 1 0 0 0 1 1 1 0
0 1 0 1 0 1 1 1 0
1 1 1 1 0 0 -1 0 0
1 1 0 1 0 0 0 1 0
44
Directed Mutation
Directed mutation is not computationally complex.
It does not cause antibodies to be destroyed before they have to leave the population.
It makes the system intelligent: the system can decide when to create new individuals.
Directed mutation happens each time T antigens have entered the system.
45
Compression
Compression: cluster the antibody population into k clusters.
External interactions: those occurring between an antigen (external agent) and the antibodies in the immune network: \( N_B \).
Internal interactions: those occurring between one antibody and all other antibodies in the immune network.
The most expensive computation and storage overhead stems from calculating and storing all the internal network interactions: \( N_B^{2} \), i.e. quadratic complexity with respect to the network size.
After compression:
◦ internal interactions: \( (N_B / k)^{2} + k - 1 \)
◦ external interactions: k
Choosing an appropriate number of clusters, \( k = \sqrt{N_B} \), gives \( O(N_B) \) complexity.
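The effect of compression on the number of internal interactions can be checked with a little arithmetic. The sketch assumes the post-compression count is (N_B / k)² + k − 1 and that k is chosen as the square root of N_B; the function name is hypothetical.

```python
import math

def internal_interactions(n_b, k=None):
    """Internal interaction count without (k=None) or with k subnets."""
    if k is None:
        return n_b ** 2
    return (n_b / k) ** 2 + k - 1

n_b = 100
k = int(math.sqrt(n_b))          # assumed choice: k = sqrt(N_B)
before = internal_interactions(n_b)
after = internal_interactions(n_b, k)
```

For a network of 100 antibodies the count drops from 10000 to 109, which grows linearly in the network size instead of quadratically.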
46
Algorithm Visualization
[Figure sequence: animation frames of the antibody network with numbered nodes; only the node labels survived extraction.]
Pseudocode
1- Fix the maximal population size, N_Bmax.
2- Initialize antibodies using a cross section of the input data, with σ²_i = σ²_init.
3- Compress the immune network into K subnets using five iterations of K-means.
4- Repeat for each antigen antigen_j {
    4-1 Compute validity(antigen_j);
    4-2 If validity(antigen_j) < validity threshold
        4-2-1 discard antigen_j and continue with a new antigen;
    4-3 Present antigen_j to each subnet centroid (C_k, k = 1, ..., K) in the network; compute distance and weight.
    4-4 Determine the most activated subnet (ma subnet), which has maximum w_kj.
    4-5 If all antibodies in the ma subnet have w_ij < w_min (antigen too weak to activate the subnet) {
        4-5-1 create by duplication a new antibody (antibody = antigen_j, σ²_i = σ²_init)
    } else {
        4-5-1 Increment the number of stimulations of antibody i;
        4-5-2 Compute the antibody_i stimulation level (ws_ij);
        4-5-3 Update the antibody_i scale value (σ²_ij)
    }
    4-6 Clone antibodies;
    4-7 If population size > N_Bmax {
        4-7-1 For each antibody i in the network
            4-7-1-1 If antibody_i.age < age_min, set antibody_i.ws to the network average, (1/N_B) Σ_{n=1..N_B} ws_n;
        4-7-2 Sort antibodies in ascending order of their stimulation level;
        4-7-3 Kill the worst excess (top (N_B − N_Bmax)) antibodies.
    }
    4-8 Mutate antibodies after every T antigens.
    4-9 After every T antigens, run five iterations of K-means with the previous centroids as initial centroids.
}
71
Data
Data set 1
• One week of HTTP requests to the Music Machine Web site (www.hyperreal.org).
• 220146 requests.
• 19542 sessions.
• 4756 URLs.
Data set 2
• One week of HTTP requests to the University of Saskatchewan’s WWW server.
• 44298 requests.
• 9188 sessions.
• 1519 URLs.
72
Ground Profiles
To evaluate the learned profiles, it should be shown that they are good representatives of the input data:
Summarization ability of AISWUM
To show this ability, the distribution of the learned profiles should be compared with the distribution of the input data, so:
we need some ground profiles
Ground profiles are extracted using:
Scalable K-Means
73
Evaluation Metrics
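\[ prc(Ab_i(t), g_c) = \frac{\sum_{k=1}^{L} Ab_{i,k}(t) \cdot g_{c,k}}{\sum_{k=1}^{L} Ab_{i,k}(t)} \]

\[ cvg(Ab_i(t), g_c) = \frac{\sum_{k=1}^{L} Ab_{i,k}(t) \cdot g_{c,k}}{\sum_{k=1}^{L} g_{c,k}} \]

\[ PRC(AB(t), g_c) = \begin{cases} 1 & \text{if } \max_{i=1}^{N_{Ab}(t)} prc(Ab_i(t), g_c) > prc_{min} \\ 0 & \text{otherwise} \end{cases} \]

\[ CVG(AB(t), g_c) = \begin{cases} 1 & \text{if } \max_{i=1}^{N_{Ab}(t)} cvg(Ab_i(t), g_c) > cvg_{min} \\ 0 & \text{otherwise} \end{cases} \]

\[ S_{PRC,CVG}(t, c) = PRC(AB(t), g_c) \cdot CVG(AB(t), g_c) \]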
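Precision and coverage of an antibody against a ground profile can be sketched as follows. The function names and the thresholds are hypothetical. Profiles are assumed to be binary vectors over the L URLs, the indicator is assumed to use a strict comparison against the minimum, and the combined score is assumed to be the product of the two indicators.

```python
def prc(antibody, ground):
    """Share of the antibody's URLs that also belong to the ground profile."""
    overlap = sum(a * g for a, g in zip(antibody, ground))
    total = sum(antibody)
    return overlap / total if total else 0.0

def cvg(antibody, ground):
    """Share of the ground profile's URLs that the antibody covers."""
    overlap = sum(a * g for a, g in zip(antibody, ground))
    total = sum(ground)
    return overlap / total if total else 0.0

def indicator(antibodies, ground, measure, minimum):
    """1 if the best antibody exceeds `minimum` on `measure`, else 0."""
    best = max(measure(ab, ground) for ab in antibodies)
    return 1 if best > minimum else 0

ground = [1, 1, 1, 0]
population = [[1, 1, 0, 0], [0, 0, 0, 1]]
PRC = indicator(population, ground, prc, minimum=0.8)
CVG = indicator(population, ground, cvg, minimum=0.8)
S = PRC * CVG  # 1 only if the category is matched precisely and completely
```

The first antibody is fully precise but covers only two of the three ground URLs, so the category counts as precise but not complete at this threshold.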
74
Results (Music Machine)
Distribution of the learned antibodies that are simultaneously precise and complete per input category at time t.
75
Precision
Distribution of precise antibodies per input category at time t.
76
Coverage
Distribution of complete antibodies per input category at time t.
77
Results (Saskatchewan University)
Distribution of the learned antibodies that are simultaneously precise and complete per input category at time t.
78
Precision
Distribution of precise antibodies per input category at time t.
79
Coverage
Distribution of complete antibodies per input category at time t.
80
Evaluation Metrics
P(t): overall level of precision of the learned antibodies with respect to the input data, i.e. the ratio of learned antibodies that accurately represent the past input data to all of the learned antibodies.
[Equation: P(t) is defined as a ratio of double sums of \( S_{PRC,CVG} \) terms over time and over the input categories c; the exact form did not survive extraction.]
81
Evaluation Metrics
C(t): overall coverage of the learned antibodies with respect to the input data, i.e. the ratio of past input data that are summarized accurately by antibodies to all of the input data.
[Equation: C(t) is defined as a ratio of double sums of \( S_{PRC,CVG} \) terms over time and over the input categories c; the exact form did not survive extraction.]
82
Results (Music Machines)
Ratio of learned antibodies that accurately represent past input data to all of the learned antibodies, plotted against t.
Ratio of past input data that are summarized accurately by antibodies to all of the input data, plotted against t.
83
Results (Saskatchewan)
Ratio of learned antibodies that accurately represent past input data to all of the learned antibodies, plotted against t.
Ratio of past input data that are summarized accurately by antibodies to all of the input data, plotted against t.
84
Results
State | Maximum Contentment | Minimum Contentment | Average Contentment of 50 users
State 1 | 41% | 15% | 28%
State 2 | 60% | 40% | 51%
State 3 | 67% | 45% | 56%

State | Danger Theory | Weighted Items | Weighted Sessions
State 1 | No | No | No
State 2 | Yes | No | No
State 3 | Yes | Yes | Yes
85
Run Time
Run time for one scan of the data, with non-optimized C++ code on a Pentium 4 PC:
◦ For the first dataset: less than 6 min.
◦ For the second dataset: less than 3 min.
86
Comparison with other methods
Method | AIS-WUM | SKM | DBSCAN | BIRCH | aiNet | Fuzzy AIS | SOSDM
Reliability/Insensitivity to initial condition | Yes | No | Yes | No | Yes | Yes | Yes
Noise tolerance | Yes | No | Yes | No | No | Yes | Moderately
Need to scan before learning | No | Yes | Yes | Yes | Yes | Yes | No
Time complexity | O(N) | O(N) | O(N log(N)) | O(N) | O(N²) | O(N²) | O(N)
Buffer data | No | Yes | Yes | Yes | Yes | Yes | Yes
Number of clusters specified | No | Yes | No | Yes | No | No | Yes
Handle evolving clusters | Yes | No | No | No | Yes | Yes | Yes
Automatic scale estimation | Yes | No | No | No | No | Yes | No
Clustering model | Network | Centroids | Medoids | Centroids | Network | Network | Network
Handle different similarity measures | Yes | No | Yes | No | Yes | Yes | Yes
Density/Partition based | Density | Partition/Distance | Density | Partition | Partition/Distance | Density | Partition/Distance
87
Novelties of the proposed algorithm
Low Computational Complexity.
Danger Theory in Two Forms
Directed Mutation
Weighted Stimulation
Learning the Data in a Single Pass
Natural Mechanism
Applicable to Stream Data
Bi-functionality: Frequent Itemsets Mining + Finding Centroids of Clusters in Large Datasets
Clear and fast identification of outliers.
88
Conclusion
A robust and scalable algorithm for frequent itemset mining has been designed that is well suited to noisy, sparse data such as Web usage data.
89
Conclusion
The main factor behind the ability of the proposed algorithm to learn in a single pass lies in the richness of the immune network structure, which forms a dynamic synopsis of the data, and in danger theory, which decides which antigens are dangerous and when new antibodies are needed to combat antigens.
90
Publications
B. Hoda Helmi, Adel T. Rahmani, Nona Helmi, “An Evolutionary Control Model for a Generic Multiagent System Using Artificial Immune Systems”, in Proceedings of the First Joint Congress on Fuzzy and Intelligent Systems, 2007, Ferdowsi University.
B. Hoda Helmi, Adel T. Rahmani, “Image Segmentation with a New Texture Feature Based on AIS”, in Proceedings of the First Conference on Data Mining, AmirKabir University, 2007, Tehran, Iran. (in Farsi)
B. Hoda Helmi, Adel T. Rahmani, “An AIS Algorithm for Web Usage Mining with Directed Mutation”, accepted at the IEEE World Congress on Computational Intelligence, CEC division, 2008, Hong Kong.
B. Hoda Helmi, Adel T. Rahmani, “An Enhanced AIS for WUM, inspired by Danger Theory”, submitted to ICEE 2008, Tarbiat Modarres University, 2008, Tehran, Iran. (in Farsi)
91
Publications
Adel T. Rahmani, B. Hoda Helmi, “EIN-WUM an AIS-based Algorithm for Web Usage Mining”, submitted to the Genetic and Evolutionary Computation Conference, 2008, Atlanta, Georgia.
B. Hoda Helmi, Adel T. Rahmani, “A New Web Usage Mining Method based on An Artificial Immune System Solution with Enhanced Network and Danger Theory”, submitted to the International Journal of Control, Automation, and Systems.
B. Hoda Helmi, Adel T. Rahmani, “Evolutionary based Combining of Evolved Neural Network Classifiers”, accepted at the IASTED International Conference on Signal Processing, Pattern Recognition and Applications, 2006, Austria. (unrelated)
92
The End
Thanks
93