
Focused Reducts Janusz A. Starzyk and Dale Nelson.

Date post: 21-Dec-2015
Page 1: Focused Reducts Janusz A. Starzyk and Dale Nelson.

Focused Reducts

Janusz A. Starzyk and Dale Nelson

Page 2: Focused Reducts Janusz A. Starzyk and Dale Nelson.

What Do We Know? Major Assumption

ASSUMPTION: This is ALL we know

[Figure: diagram relating the Real World, the Model, and the Sampled Data; the sampled data appear as a grid of integer values]

Page 3: Focused Reducts Janusz A. Starzyk and Dale Nelson.

Problem Size Dilemma

250 125 169  89  43 100  33 190  … 1024
251 255 110 200  50  83  56 150
217 250  98 141  48  66  44 232
108  78 105 181  34  33   5 141
119 255 244 241  65  33  19  50
 78 222 212  58 109  38  86 255
 92 124  68 144  67  64  55 218
... 1602

Page 4: Focused Reducts Janusz A. Starzyk and Dale Nelson.

Rough Set Tutorial

• Difference between rough sets and fuzzy sets

• Labeling data

• Remove duplicates/ambiguities

• What is a core?

• What is a reduct?

Page 5: Focused Reducts Janusz A. Starzyk and Dale Nelson.

Rough Sets vs Fuzzy Sets

Fuzzy Sets - How gray is the pixel

Rough Sets - How big is the pixel

Page 6: Focused Reducts Janusz A. Starzyk and Dale Nelson.

Example: Sample HRR Data

Signal  Target ID  Range Bin 1  Range Bin 2  Range Bin 3  Range Bin 4
1       1          .680         .127         .121         .516
2       1          .948         .272         .022         .440
3       1          .821         .189         .139         .480
4       2          .396         .680         .279         .239
5       2          .512         .851         .184         .290
6       2          .394         .281         .338         .564
7       2          .775         .507         .006         .617
8       2          .281         .359         .582         .773
9       3          .113         .097         .451         .450
10      3          .896         .327         .122         .927

Page 7: Focused Reducts Janusz A. Starzyk and Dale Nelson.

Example: Label Data

Signal  Target ID  Range Bin 1  Range Bin 2  Range Bin 3  Range Bin 4
1       1          3            1            1            2
2       1          3            1            1            2
3       1          3            1            1            2
4       2          2            3            1            1
5       2          2            3            1            1
6       2          2            1            2            2
7       2          3            2            1            3
8       2          1            2            2            3
9       3          1            1            2            2
10      3          3            2            1            3

Label 1: value < .25    Label 2: .25 <= value <= .45    Label 3: value > .45

Labeling can be different for different columns/attributes

Ranges can be different for different columns/attributes
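The three-label rule quoted above can be sketched in a few lines. This is a minimal illustration, assuming the cut points .25 and .45 apply to every range bin; the function name `label` and its parameters are hypothetical. Because ranges may differ per column, the sketch does not attempt to reproduce the labeled table on this slide.

```python
def label(value, low=0.25, high=0.45):
    """Map a range bin value to label 1, 2, or 3 using two cut points."""
    if value < low:
        return 1
    if value <= high:
        return 2
    return 3

# Signal 1 from the sample HRR data slide.
signal_1 = [0.680, 0.127, 0.121, 0.516]
labels = [label(v) for v in signal_1]
print(labels)
```

Passing different `low` and `high` values per column would model the per-attribute ranges mentioned above.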

Page 8: Focused Reducts Janusz A. Starzyk and Dale Nelson.

Remove Ambiguities & Duplicates

Signals 7 and 10 carry identical labels (3, 2, 1, 3) but different target IDs (2 and 3), so they are ambiguous and both are removed. The remaining signals are renumbered 1 to 8; signals with identical labels and the same target ID fall into one equivalence class.

Page 9: Focused Reducts Janusz A. Starzyk and Dale Nelson.

Equivalence Classes

E1 = {1, 2, 3}   E2 = {4, 5}   E3 = {6}   E4 = {7}   E5 = {8}

Signal  Target ID  Range Bin 1  Range Bin 2  Range Bin 3  Range Bin 4
1       1          3            1            1            2
2       1          3            1            1            2
3       1          3            1            1            2
4       2          2            3            1            1
5       2          2            3            1            1
6       2          2            1            2            2
7       2          1            2            2            3
8       3          1            1            2            2
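As a rough illustration of the two preceding slides, the sketch below removes ambiguous signals from the ten-signal labeled table, renumbers the survivors, and groups them into equivalence classes. The variable names are hypothetical, and the grouping rule (identical labels on all four range bins) is an assumption about how the slide's classes were formed.

```python
from collections import defaultdict

# (signal, target_id, labels) rows from the labeled example table.
rows = [
    (1, 1, (3, 1, 1, 2)), (2, 1, (3, 1, 1, 2)), (3, 1, (3, 1, 1, 2)),
    (4, 2, (2, 3, 1, 1)), (5, 2, (2, 3, 1, 1)), (6, 2, (2, 1, 2, 2)),
    (7, 2, (3, 2, 1, 3)), (8, 2, (1, 2, 2, 3)), (9, 3, (1, 1, 2, 2)),
    (10, 3, (3, 2, 1, 3)),
]

# Collect the target IDs seen for each label pattern.
targets = defaultdict(set)
for _, target, labels in rows:
    targets[labels].add(target)

# Drop every signal whose pattern maps to more than one target, then renumber.
kept = [(t, lab) for _, t, lab in rows if len(targets[lab]) == 1]
table = [(n, t, lab) for n, (t, lab) in enumerate(kept, start=1)]

# Equivalence classes: signals sharing the same label pattern.
classes = defaultdict(list)
for n, _, labels in table:
    classes[labels].append(n)
equivalence_classes = list(classes.values())
print(equivalence_classes)
```

On this data the grouping matches E1 through E5 as listed on the slide.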

Page 10: Focused Reducts Janusz A. Starzyk and Dale Nelson.

Definitions

• Reduct - A reduct is a reduction of an information system, obtained by removing attributes (range bins), that results in no loss of information (classification ability). There may be one or many reducts for a given information system.

• Core - A core is the set of attributes (range bins) which are common to all reducts.

Page 11: Focused Reducts Janusz A. Starzyk and Dale Nelson.

Compute Core

Signals 6 and 8 are ambiguous upon removal of Range Bin 1. Therefore, Range Bin 1 is part of the core.

Core - The range bins common to ALL reducts - The most essential range bins without which signals cannot be classified

Page 12: Focused Reducts Janusz A. Starzyk and Dale Nelson.

Compute Core

No signals are ambiguous upon removal of Range Bin 2; therefore, Range Bin 2 is NOT part of the core.

Page 13: Focused Reducts Janusz A. Starzyk and Dale Nelson.

Compute Core

No signals are ambiguous upon removal of Range Bin 3; therefore, Range Bin 3 is NOT part of the core.

Page 14: Focused Reducts Janusz A. Starzyk and Dale Nelson.

Compute Core

No signals are ambiguous upon removal of Range Bin 4; therefore, Range Bin 4 is NOT part of the core.

Page 15: Focused Reducts Janusz A. Starzyk and Dale Nelson.

Compute Reducts: Range Bin 1 + Range Bin 2

Range Bin 1 and Range Bin 2 together classify all signals; therefore, they form a reduct.

Page 16: Focused Reducts Janusz A. Starzyk and Dale Nelson.

Compute Reducts: Range Bin 1 + Range Bin 3

Range Bin 1 and Range Bin 3 together do not classify all signals (signals 7 and 8 become ambiguous); therefore, they do NOT form a reduct.

Page 17: Focused Reducts Janusz A. Starzyk and Dale Nelson.

Compute Reducts: Range Bin 1 + Range Bin 4

Range Bin 1 and Range Bin 4 together classify all signals; therefore, they form a reduct.

Page 18: Focused Reducts Janusz A. Starzyk and Dale Nelson.

Reduct Summary

• Range bins 1 and 2 are a reduct

– Sufficient to classify all signals

• Range bins 1 and 4 are a reduct

– Sufficient to classify all signals

• Range bins 1 and 3 are NOT a reduct

– Cannot distinguish target classes 2 and 3

• No need to try

– Range bins 1, 2, 3

– Range bins 1, 2, 4
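The summary above can be checked with a small exhaustive search over the eight-signal example table. This is only a sketch under the definitions given earlier: `classifies` and `reducts` are hypothetical helper names, and the search skips any superset of a reduct already found, which is why bins {1, 2, 3} and {1, 2, 4} never need to be tried.

```python
from itertools import combinations

# (target_id, labels for range bins 1..4) after removing ambiguities.
TABLE = [
    (1, (3, 1, 1, 2)), (1, (3, 1, 1, 2)), (1, (3, 1, 1, 2)),
    (2, (2, 3, 1, 1)), (2, (2, 3, 1, 1)), (2, (2, 1, 2, 2)),
    (2, (1, 2, 2, 3)), (3, (1, 1, 2, 2)),
]

def classifies(bins):
    """True if no two signals agree on `bins` (1-based) but differ in target."""
    seen = {}
    for target, labels in TABLE:
        key = tuple(labels[b - 1] for b in bins)
        if seen.setdefault(key, target) != target:
            return False
    return True

def reducts(n_bins=4):
    """All minimal bin subsets that classify, smallest first."""
    found = []
    for size in range(1, n_bins + 1):
        for subset in combinations(range(1, n_bins + 1), size):
            # Skip supersets of a known reduct: they are not minimal.
            if any(set(r) <= set(subset) for r in found):
                continue
            if classifies(subset):
                found.append(subset)
    return found

print(reducts())
```

The search visits every subset size in turn, so its cost grows exponentially with the number of range bins.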

Page 19: Focused Reducts Janusz A. Starzyk and Dale Nelson.

Did You Notice?

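• Calculating a reduct is time consuming!

• The number of attribute subsets to test for n attributes is

$$ \sum_{k=1}^{n} \frac{n!}{k!\,(n-k)!} = 2^{n} - 1 $$

• n = 29: value = 536,870,911

• We are interested in n of about 50

• This is a BIG NUMBER, requiring a lot of time to compute a reduct; the time is also a function of the number of signals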
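The count quoted for n = 29 is consistent with summing the binomial coefficients C(n, k) for k = 1..n, which equals 2^n - 1. A quick check, with `subsets_to_test` as a hypothetical helper name:

```python
from math import comb

def subsets_to_test(n):
    """Number of non-empty attribute subsets: sum of C(n, k) for k = 1..n."""
    return sum(comb(n, k) for k in range(1, n + 1))

count_29 = subsets_to_test(29)
count_50 = subsets_to_test(50)
print(count_29)
print(count_50)
```

For n = 50 the count exceeds 10^15, which is the scale of the problem the slide refers to.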

Page 20: Focused Reducts Janusz A. Starzyk and Dale Nelson.

Why Haven’t Rough Sets Been Used Before?

Page 21: Focused Reducts Janusz A. Starzyk and Dale Nelson.

The Procedure

• Normalize signal
• Partition signal
– Block
– Interleave
• Wavelet transform
• Binary multi-class entropy labeling
• Entropy based range bin selection
• Determine minimal reducts
• Fuse marginal reducts for classification

Page 22: Focused Reducts Janusz A. Starzyk and Dale Nelson.

Data

• Synthetic generated by XPATCH

• Six targets
– 1071 Signals per target
– 128 Range bins/signal
– Azimuth -25° to +25°
– Elevation -20° to 0°

Page 23: Focused Reducts Janusz A. Starzyk and Dale Nelson.

Normalize the Data

• Ensures all data is range normalized

• Use the 2 Norm

• Divide each signal bin value by N

$$ N = \left( \sum_{i} y_i^{2} \right)^{1/2} $$
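A minimal sketch of 2-norm normalization, assuming N is the square root of the sum of squared bin values of one signal. The function name `normalize` and the handling of an all-zero signal are assumptions made for illustration.

```python
import math

def normalize(signal):
    """Divide every bin by the signal's 2-norm N = sqrt(sum of y_i^2)."""
    norm = math.sqrt(sum(y * y for y in signal))
    if norm == 0:
        return list(signal)  # all-zero signal: nothing to scale (assumption)
    return [y / norm for y in signal]

unit = normalize([3.0, 4.0])
print(unit)
```

After normalization every signal has unit 2-norm, so signals of different overall strength become comparable bin by bin.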

Page 24: Focused Reducts Janusz A. Starzyk and Dale Nelson.

Partition the Signal

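[Figure: Block Partitioning. The 128-bin signal (bins 1 to 128) is split into contiguous pieces: 1 piece (1 to 128); 2 pieces (split at 64/65); 4 pieces (splits at 32/33, 64/65, 96/97); 8 pieces (further splits at 16/17, 48/49, 80/81, 112/113).]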

Page 25: Focused Reducts Janusz A. Starzyk and Dale Nelson.

Partition the Signal

[Figure: Interleave Partitioning. The 128-bin signal (bins 1 to 128) is split into 1 Piece, 2 Pieces, 4 Pieces, and 8 Pieces; in the 8-piece case the pieces are labeled 1st through 8th.]
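The two partitioning schemes can be sketched as follows. Block partitioning cuts the signal into contiguous pieces. For interleave partitioning the slides do not spell out the rule, so the stride-based scheme below (piece j takes bins j, j + pieces, j + 2·pieces, ...) is an assumption, as are the function names. Both functions assume the signal length is divisible by the number of pieces.

```python
def block_partition(signal, pieces):
    """Split into `pieces` contiguous, equal-length blocks."""
    size = len(signal) // pieces
    return [signal[i * size:(i + 1) * size] for i in range(pieces)]

def interleave_partition(signal, pieces):
    """Piece j takes bins j, j + pieces, j + 2*pieces, ... (assumed scheme)."""
    return [signal[j::pieces] for j in range(pieces)]

bins = list(range(1, 129))          # bin numbers 1..128
blocks = block_partition(bins, 4)
strided = interleave_partition(bins, 4)
print([(b[0], b[-1]) for b in blocks])
print([s[:3] for s in strided])
```

With four pieces, the blocks cover bins 1 to 32, 33 to 64, 65 to 96, and 97 to 128, while each interleaved piece spans the whole signal at a coarser sampling.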

Page 26: Focused Reducts Janusz A. Starzyk and Dale Nelson.

Why Use a Wavelet Transform?

[Figure: Feature and Maximum Cluster Sizes. x-axis: Feature Index (0 to 1200); y-axis: Cluster Size (0 to 50). Annotations: the best feature of the original signal classifies 20/60 signals; the best wavelet feature classifies 50/60 signals.]

Many features are better than the best from original signal

Page 27: Focused Reducts Janusz A. Starzyk and Dale Nelson.

HRR Signal and Its Haar Transform

Page 28: Focused Reducts Janusz A. Starzyk and Dale Nelson.

Multi-Class Information Entropy

Let x_i be range bin values across all signals for a target class.

Define

$$ u_t = \{ x_1, \ldots, x_n \} $$

$$ C_{xt} = \{\, x_i \in u_t \mid x_i \le x \,\} $$

Without assuming any particular distribution we can define the probability as:

$$ P_{xt} = \frac{|C_{xt}|}{|u_t|} $$

Using this definition we define two other probabilities

[Two equations, not recoverable from the transcript]

where

[Index relation, not recoverable from the transcript]

Then multi-class entropy is defined as:

[Equation for E, with sums over j = 1..6 and l = 1..36 of terms of the form P log P; not recoverable in full from the transcript]
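The distribution-free probability defined on this slide can be read as an empirical fraction: the share of a target class's range bin values that fall at or below a threshold x. The sketch below assumes that reading (the comparison `<=` is an assumption, since the relation symbol did not survive in the transcript), and `p_xt` is a hypothetical function name.

```python
def p_xt(values, x):
    """Fraction of a class's range bin values that are <= x (assumed relation)."""
    return sum(1 for v in values if v <= x) / len(values)

# Range Bin 1 values of the target-2 signals in the sample HRR data.
target_2_bin_1 = [0.396, 0.512, 0.394, 0.775, 0.281]
fraction = p_xt(target_2_bin_1, 0.45)
print(fraction)
```

Evaluating this fraction per target class for a candidate threshold gives the per-class probabilities that an entropy measure can then combine.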

Page 29: Focused Reducts Janusz A. Starzyk and Dale Nelson.

Binary Multi-Class Labeling

[Figure: four panels titled Target Gaussians Bin 910, Target Gaussians Bin 1000, Target Gaussians Bin 54, and Target Gaussians Bin 130. x-axis: Range Bin Value; one curve per target (target 1 to target 6).]

Page 30: Focused Reducts Janusz A. Starzyk and Dale Nelson.

Range Bin Selection

• Total range bins available depends on partition size

• We chose 50 bins per reduct
– Time considerations
– Implications

• Based on maximum relative entropy

Signal Size  Transformed Range Bins
128          1024
64           448
32           192
16           80

Signal Size  Bins in Classifier
128          50
64           100
32           200
16           400
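The two tables can be cross-checked with simple arithmetic. The sketch below assumes that "Bins in Classifier" equals the number of pieces (128 divided by the signal size) times the 50 bins chosen per reduct. It also notes an observed numerical pattern, not stated in the slides: the transformed range bin counts equal size × (log2(size) + 1). All names are hypothetical.

```python
import math

BINS_PER_REDUCT = 50
FULL_SIGNAL = 128

results = {}
for size in (128, 64, 32, 16):
    pieces = FULL_SIGNAL // size
    in_classifier = pieces * BINS_PER_REDUCT
    # Observed pattern only: the table's transformed-bin counts equal
    # size * (log2(size) + 1); the slides do not state this formula.
    transformed = size * (int(math.log2(size)) + 1)
    results[size] = (transformed, in_classifier)
    print(size, transformed, in_classifier)
```

Smaller pieces offer fewer transformed bins each, but more pieces contribute reducts, so the total number of bins feeding the classifier grows.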

Page 31: Focused Reducts Janusz A. Starzyk and Dale Nelson.

Compute Core

• Computation of core is easy and fast
– Eliminate one range bin at a time and see if the training set is ambiguous - only that range bin can discriminate between the ambiguous signals
– Accumulate the bins resulting in ambiguous data - that is the core

• These range bins MUST be in every reduct

• O(n) process

Page 32: Focused Reducts Janusz A. Starzyk and Dale Nelson.

Compute Minimal Reducts

• To the core add one range bin at a time and compute the number of ambiguities

• Select the range bin(s) with the fewest ambiguities; there may be several. Save these, as we will use them to compute the reducts

• Add that range bin to the core and repeat the previous step until there are no ambiguities - this is a reduct

• Calculate reducts for all bins with an equal number of ambiguities; this yields multiple reducts

• O(n^2) process
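The core and minimal-reduct steps above can be sketched on the eight-signal example table. This is an illustrative reading of the procedure, not the authors' implementation: the helper names are hypothetical, "number of ambiguities" is taken to mean the number of signals whose labels on the chosen bins also occur with a different target, and ties are broken by taking the first bin instead of branching into several reducts.

```python
# (target_id, labels for range bins 1..4) after removing ambiguities.
TABLE = [
    (1, (3, 1, 1, 2)), (1, (3, 1, 1, 2)), (1, (3, 1, 1, 2)),
    (2, (2, 3, 1, 1)), (2, (2, 3, 1, 1)), (2, (2, 1, 2, 2)),
    (2, (1, 2, 2, 3)), (3, (1, 1, 2, 2)),
]

def ambiguities(bins):
    """Count signals whose labels on `bins` also occur with another target."""
    targets = {}
    for target, labels in TABLE:
        targets.setdefault(tuple(labels[b - 1] for b in bins), set()).add(target)
    return sum(
        1 for _, labels in TABLE
        if len(targets[tuple(labels[b - 1] for b in bins)]) > 1
    )

def core(all_bins):
    """Bins whose removal alone makes the training set ambiguous."""
    return [b for b in all_bins
            if ambiguities([a for a in all_bins if a != b]) > 0]

def greedy_reduct(all_bins):
    """Grow the core one bin at a time, always taking the fewest ambiguities."""
    chosen = core(all_bins)
    while ambiguities(chosen) > 0:
        rest = [b for b in all_bins if b not in chosen]
        # min() keeps the first of tied bins; the slides branch on ties
        # to obtain several reducts.
        chosen.append(min(rest, key=lambda b: ambiguities(chosen + [b])))
    return sorted(chosen)

print(core([1, 2, 3, 4]), greedy_reduct([1, 2, 3, 4]))
```

Each growth step scores at most n candidate bins and at most n steps are needed, which is where the quadratic number of table scans comes from. Branching on the tie between bins 2 and 4 would also yield the second reduct of the worked example.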

Page 33: Focused Reducts Janusz A. Starzyk and Dale Nelson.

Time Complexity

Training Set Size: 50 to 400 Attributes (Range Bins), 1602 Signals
Test Set Size: 4823 Signals

[Figure: Run Time. x-axis: Number of Bins (5 to 50); y-axis: Time (1000 to 10000). Annotation: Need 50.]

Page 34: Focused Reducts Janusz A. Starzyk and Dale Nelson.

Fuzzy Rough Set Classification

• Test signals may have a range bin value very close to labeling division point

• If this happens we define a distance where this is considered a “don’t care” region

• Classification process proceeds without the “don’t care” range bin

$$ \delta = b \cdot \min\left( x_d - \min(x_i),\ \max(x_i) - x_d \right) $$
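A possible reading of the "don't care" region is sketched below. It assumes the width of the region is a factor b times the smaller of the two distances from the labeling division point to the minimum and maximum training values. The symbol roles, the function names, and the binary 0/1 labels are all assumptions made for illustration.

```python
def dont_care_width(values, division, b):
    """Assumed form: b * min(division - min(values), max(values) - division)."""
    return b * min(division - min(values), max(values) - division)

def fuzzy_label(x, division, width):
    """0 or 1 by side of the division point, None inside the don't-care band."""
    if abs(x - division) <= width:
        return None
    return 1 if x > division else 0

training = [0.10, 0.20, 0.30, 0.50]
width = dont_care_width(training, division=0.25, b=0.1)
near = fuzzy_label(0.26, 0.25, width)
far = fuzzy_label(0.40, 0.25, width)
print(width, near, far)
```

A test value that returns `None` would simply be left out when matching the signal against a reduct, as described in the last bullet above.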

Page 35: Focused Reducts Janusz A. Starzyk and Dale Nelson.

Weighting Formula Requirements

• We desire the following for combining classifications
– All Pcc(s) = 0: weight = 0
– All Pcc(s) = 1: weight = 1
– Several low Pcc(s): weight higher than any of the Pcc(s)
– One high Pcc and several low Pcc(s): weight higher than the highest Pcc

Page 36: Focused Reducts Janusz A. Starzyk and Dale Nelson.

Weighting Formula

[Equation for the weight W_t in terms of Pcc_i (i = 1..n) and Pcc_max; not recoverable from the transcript]

Page 37: Focused Reducts Janusz A. Starzyk and Dale Nelson.

Fusing Marginal Reducts

• Each signal is marked with the classification by each reduct along with the reduct’s performance (Pcc) on the training set

• A weight is computed for each target class for each signal

• A signal is assigned the target class with the highest weight
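The fusion steps above can be sketched as follows. Because the weighting formula itself is not legible in this transcript, the sketch substitutes a noisy-OR combination, 1 - prod(1 - Pcc_i), which is NOT the formula from the slides but does satisfy the four stated requirements. The function names and the example votes are hypothetical.

```python
from collections import defaultdict

def noisy_or(pccs):
    """Stand-in weight: 1 - prod(1 - Pcc_i). NOT the formula from the slides."""
    remaining = 1.0
    for p in pccs:
        remaining *= 1.0 - p
    return 1.0 - remaining

def fuse(votes):
    """votes: (predicted target class, reduct's training Pcc) pairs for one signal."""
    by_class = defaultdict(list)
    for target, pcc in votes:
        by_class[target].append(pcc)
    weights = {t: noisy_or(p) for t, p in by_class.items()}
    return max(weights, key=weights.get), weights

# Three marginal reducts vote for class 2, one stronger reduct votes for class 5.
winner, weights = fuse([(2, 0.4), (2, 0.4), (2, 0.4), (5, 0.7)])
print(winner, weights)
```

In this example the three agreeing marginal reducts outweigh the single stronger one, which is the behavior the requirements ask for.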

Page 38: Focused Reducts Janusz A. Starzyk and Dale Nelson.

Results - Training

Page 39: Focused Reducts Janusz A. Starzyk and Dale Nelson.

Results - Testing

Page 40: Focused Reducts Janusz A. Starzyk and Dale Nelson.

Conjectures

• Robust in the presence of noise
– Due to binary labeling
– Due to fuzzification

• Robust to signal registration
– Due to binary labeling
– Due to averaging effect of wavelets on interleaved partitions
– Due to fuzzification

Page 41: Focused Reducts Janusz A. Starzyk and Dale Nelson.

Rough Set Theoretic HRR ATR - Summary

METHOD
- Normalize Signal
- Partition Signal
  - Block
  - Interleave
- Wavelet Transform
- Binary Multi-class Entropy Labeling
- Entropy based Range Bin Selection
- Determine Minimal Reducts
- Fuse marginal reducts for classification

BREAKTHROUGHS
- Reduct (classifier) generation time from exponential to quadratic!
- Fusion of marginal (poor performing) reducts
- Wavelet Transform Aiding
- Multi partition to increase number of range bins considered
- Use of binary multi-class entropy labeling
- Entropy based range bin selection
- Performance within 1% of theoretic best
- Max problem size increased by 2 orders of magnitude

APPLICATIONS
- 1-D Signals
- HRR
- LADAR vibration
- Sonar
- Medical
- Stock market
- Data Mining

[Figure: Run Time. x-axis: Number of Bins (5 to 50); y-axis: Time (1000 to 10000). Curves labeled Exponential and Quadratic.]

Page 42: Focused Reducts Janusz A. Starzyk and Dale Nelson.

Future Directions

• Fuzz factor sensitivity study

• Sensitivity to signal alignment

• Sensitivity to noise

• Iterated wavelet transform performance study

• Effectiveness on air to ground targets

• Other application areas

