Date post: 22-Dec-2015
Page 1: Parallel C3M

Aylin Tokuç, Erkan Okuyan, Özlem Gür

Page 2: Outline

• Basics of parallel computing

• Sequential C3M

• Parallel C3M

Page 3: Parallel Computation

Decomposition: The process of dividing a computation into smaller parts.

Task: A programmer-defined unit of computation into which the main computation is subdivided by means of decomposition.
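As a minimal illustration of these two definitions (not taken from the slides), the sketch below decomposes a sum over a list into fixed-size chunks, with each chunk acting as one task. The function names and the chunk size are assumptions chosen for the example.

```python
def decompose(items, task_size):
    """Decomposition: split a list into consecutive chunks, one chunk per task."""
    return [items[i:i + task_size] for i in range(0, len(items), task_size)]


def run_task(chunk):
    """One task: a unit of computation that needs only its own chunk."""
    return sum(chunk)


data = list(range(1, 11))                     # main computation: sum of 1..10
tasks = decompose(data, 4)                    # three tasks of at most 4 items
partial_sums = [run_task(t) for t in tasks]   # tasks do not depend on each other
total = sum(partial_sums)                     # combine the partial results
```

Because the tasks share no data, they could be handed to different processors and only the partial sums would need to be communicated.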

Page 4: Parallel Computation - Primary Considerations

• Load Balancing

• Minimizing Communication

• Task Dependency Optimization

Page 5: Parallel Computation - Load Balancing

Page 6: Parallel Computation - Minimizing Communication

Page 7: Parallel Computation - Task Dependency Optimization

Page 8: C3M Algorithm

1- Determine the cluster seeds of the database.

2- If d_i is not a cluster seed, find the cluster seed (if any) that maximally covers d_i and assign d_i to its cluster.

3- If there remain unclustered documents, group them into a ragbag cluster.
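The three steps can be sketched sequentially as follows. This is a minimal sketch, assuming a binary document-by-term matrix with no empty rows or columns and the standard cover coefficient definitions from Can and Ozkarahan. Rounding the number of clusters to the nearest integer and breaking seed-power ties by document order are assumptions of this example, and the function names are hypothetical.

```python
def cover_coefficients(D):
    """C[i][j] = alpha_i * sum_k d_ik * beta_k * d_jk."""
    m, n = len(D), len(D[0])
    alpha = [1.0 / sum(row) for row in D]                    # reciprocal row sums
    beta = [1.0 / sum(D[i][k] for i in range(m)) for k in range(n)]  # reciprocal column sums
    return [[alpha[i] * sum(D[i][k] * beta[k] * D[j][k] for k in range(n))
             for j in range(m)] for i in range(m)]


def c3m(D):
    m = len(D)
    C = cover_coefficients(D)
    delta = [C[i][i] for i in range(m)]          # decoupling coefficients
    n_c = max(1, round(sum(delta)))              # number of clusters (rounding assumed)
    # Seed power for binary D: delta_i * (1 - delta_i) * number of terms in d_i.
    power = [delta[i] * (1 - delta[i]) * sum(D[i]) for i in range(m)]
    # Step 1: the n_c documents with the highest seed power become seeds.
    # sorted() is stable, so ties keep document order (an assumption).
    seeds = sorted(range(m), key=lambda i: power[i], reverse=True)[:n_c]
    clusters = {s: [s] for s in seeds}
    ragbag = []
    for i in range(m):
        if i in clusters:
            continue
        # Step 2: find the seed that maximally covers d_i.
        best = max(seeds, key=lambda s: C[i][s])
        if C[i][best] > 0:
            clusters[best].append(i)
        else:
            ragbag.append(i)                     # Step 3: covered by no seed
    return clusters, ragbag


# Five documents over six terms (0-based document ids).
D = [
    [0, 0, 0, 1, 0, 1],
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 1],
    [0, 0, 1, 1, 1, 1],
    [1, 0, 1, 0, 0, 1],
]
clusters, ragbag = c3m(D)
```

On this input the sketch selects documents 3 and 2 as seeds and leaves the ragbag cluster empty.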

Page 9: C3M Formulas
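For reference, the standard C3M definitions from Can and Ozkarahan (listed in the references), for an m × n document-by-term matrix D with m documents and n terms, can be written as below. The primed quantities are the term-side counterparts of δ and ψ, computed from the term-by-term cover coefficient matrix.

```latex
\[
\alpha_i = \Bigl(\sum_{j=1}^{n} d_{ij}\Bigr)^{-1},
\qquad
\beta_k = \Bigl(\sum_{i=1}^{m} d_{ik}\Bigr)^{-1}
\]
\[
c_{ij} = \alpha_i \sum_{k=1}^{n} d_{ik}\,\beta_k\,d_{jk},
\qquad
\delta_i = c_{ii},
\qquad
\psi_i = 1 - \delta_i
\]
\[
n_c = \sum_{i=1}^{m} \delta_i
\]
\[
P_i = \delta_i\,\psi_i \sum_{j=1}^{n} d_{ij} \quad \text{(binary } D\text{)},
\qquad
P_i = \delta_i\,\psi_i \sum_{j=1}^{n} d_{ij}\,\delta'_j\,\psi'_j \quad \text{(weighted } D\text{)}
\]
```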

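Page 10: C3M - Sample Matrices

Document-by-term matrix D (5 documents, 6 terms):

\[
D = \begin{pmatrix}
0 & 0 & 0 & 1 & 0 & 1 \\
1 & 1 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 1 \\
0 & 0 & 1 & 1 & 1 & 1 \\
1 & 0 & 1 & 0 & 0 & 1
\end{pmatrix}
\]

Cover coefficient matrix C:

\[
C = \begin{pmatrix}
0.375 & 0.0 & 0.125 & 0.375 & 0.125 \\
0.0 & 0.417 & 0.417 & 0.0 & 0.167 \\
0.083 & 0.277 & 0.361 & 0.083 & 0.194 \\
0.188 & 0.0 & 0.063 & 0.563 & 0.188 \\
0.083 & 0.111 & 0.194 & 0.25 & 0.361
\end{pmatrix}
\]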
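As a worked check, reading the sample D as five documents over six terms, the sketch below recomputes C with the standard definition c_ij = α_i Σ_k d_ik β_k d_jk and compares it with the slide's three-decimal values. The variable names are illustrative.

```python
# Sample document-by-term matrix D: 5 documents (rows) x 6 terms (columns).
D = [
    [0, 0, 0, 1, 0, 1],
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 1],
    [0, 0, 1, 1, 1, 1],
    [1, 0, 1, 0, 0, 1],
]
# Cover coefficient matrix C as given on the slide (three decimals).
slide_C = [
    [0.375, 0.0, 0.125, 0.375, 0.125],
    [0.0, 0.417, 0.417, 0.0, 0.167],
    [0.083, 0.277, 0.361, 0.083, 0.194],
    [0.188, 0.0, 0.063, 0.563, 0.188],
    [0.083, 0.111, 0.194, 0.25, 0.361],
]

m, n = len(D), len(D[0])
alpha = [1.0 / sum(row) for row in D]                              # reciprocal row sums
beta = [1.0 / sum(D[i][k] for i in range(m)) for k in range(n)]    # reciprocal column sums
C = [[alpha[i] * sum(D[i][k] * beta[k] * D[j][k] for k in range(n))
      for j in range(m)] for i in range(m)]

# Largest absolute difference between the recomputed and the slide values.
max_diff = max(abs(C[i][j] - slide_C[i][j]) for i in range(m) for j in range(m))
# Sum of the diagonal entries, which C3M uses as the number of clusters.
n_c = sum(C[i][i] for i in range(m))
print(f"largest deviation from the slide: {max_diff:.4f}")
print(f"sum of diagonal entries: {n_c:.3f}")
```

The recomputed entries agree with the slide to within 0.001, and the diagonal sums to roughly 2, which suggests two clusters for this sample.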

Page 11: Parallel C3M - Distribution

Distribute rows among processors

Load balancing by cyclic block distribution
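One way to read "cyclic block distribution" is as a block-cyclic assignment: rows are grouped into fixed-size blocks and the blocks are dealt to processors round-robin. The sketch below follows that reading; the function name, the block size and the processor count are illustrative assumptions, not values from the slides.

```python
def block_cyclic_rows(num_rows, num_procs, block_size):
    """Return, for each processor, the list of row indices it owns."""
    owned = [[] for _ in range(num_procs)]
    for row in range(num_rows):
        block = row // block_size              # which block this row falls in
        owned[block % num_procs].append(row)   # blocks are dealt round-robin
    return owned


owned = block_cyclic_rows(num_rows=10, num_procs=3, block_size=2)
for proc, rows in enumerate(owned):
    print(f"processor {proc}: rows {rows}")
```

Interleaving blocks in this way tends to even out the load when dense and sparse documents are clustered in different regions of the matrix.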

Page 12: Local Calculations

All processors calculate α, partial β and P_i

Current method for weighted matrix: too costly

Needs column vectors (but the matrix is partitioned row-wise)

Page 13: Seed Powers P_i

• Seed power P_i should be small for a document whose terms appear in too many or too few documents.

• Seed power P_i should be bigger for a document whose terms appear in a moderate number of documents.

Page 14: Minimize Communication - Proposed Heuristic

\[
\beta'_i = \sum_{k=1}^{m} \min(1, d_{ki}) \qquad \text{(\# of non-zeros)}
\]

\[
P'_i = \delta_i \, \psi_i \sum_{j=1}^{n} d_{ij} \, \frac{\beta'_j}{m} \left(1 - \frac{\beta'_j}{m}\right)
\]

All processors calculate α, partial β and β'
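Assuming β' counts the non-zero entries of each column (the "# of non-zeros" annotation), each processor can count over its own rows and the partial counts simply add up to the global count. The sketch below shows that, together with one term weight that is small for very rare and very common terms and largest for moderately frequent ones. The weight's exact form, (β'_j / m)(1 − β'_j / m), and all function names are assumptions of this example, not confirmed by the slides.

```python
def partial_nonzero_counts(local_rows):
    """Per-column count of non-zero entries among locally owned rows.

    Uses min(1, d), so it assumes non-negative integer term frequencies.
    """
    n = len(local_rows[0])
    return [sum(min(1, row[j]) for row in local_rows) for j in range(n)]


def combine(partials):
    """Element-wise sum of the partial count vectors from all processors."""
    return [sum(vals) for vals in zip(*partials)]


def frequency_weight(count, m):
    """Assumed weight: zero at 0 or m documents, largest at m / 2."""
    f = count / m
    return f * (1 - f)


# Weighted matrix with m = 4 documents and 3 terms, split over two processors.
rows_p0 = [[2, 0, 1], [1, 3, 0]]
rows_p1 = [[1, 0, 0], [4, 5, 0]]
m = len(rows_p0) + len(rows_p1)

beta_prime = combine([partial_nonzero_counts(rows_p0),
                      partial_nonzero_counts(rows_p1)])
weights = [frequency_weight(c, m) for c in beta_prime]
```

Only one integer vector of length n is exchanged per processor, which is the kind of saving the row-wise partitioning needs.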

Page 15: Effectiveness of Heuristic

• A MATLAB script was written to evaluate the effectiveness of the proposed heuristic.

• Correlation coefficient = 0.95
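The slides do not say which quantities were correlated; a natural reading is the heuristic seed powers against the exact ones. Under that assumption, a Pearson correlation coefficient could be computed as in this sketch. The two input vectors are made-up toy values for illustration only, not the authors' data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Toy values only: stand-ins for exact and heuristic seed powers.
exact = [1.0, 2.0, 3.0, 4.0]
heuristic = [1.0, 3.0, 2.0, 4.0]
r = pearson(exact, heuristic)
```

A value close to 1 would indicate that the heuristic ranks documents almost the same way as the exact seed power, which is what matters for seed selection.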

Page 16: Communication between Processors

• Partial β and β' vectors are exchanged between processors to calculate the final β and β' vectors.

• Then, all processors calculate c_ii = δ_i
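The exchange can be sketched as follows, assuming that "partial β" means the column sums over locally owned rows and that the exchange behaves like a collective sum (in an MPI program this role could be played by an all-reduce such as MPI_Allreduce). Once β is global, δ_i = c_ii = α_i Σ_k d_ik² β_k needs only row i, so it can be computed locally. The two-processor split of the sample rows and the function names are illustrative.

```python
def partial_column_sums(local_rows):
    """Column sums over the rows owned by one processor."""
    return [sum(col) for col in zip(*local_rows)]


def all_reduce_sum(partials):
    """Stand-in for a collective sum: element-wise total of all partial vectors."""
    return [sum(vals) for vals in zip(*partials)]


def local_deltas(local_rows, beta):
    """delta_i = c_ii = alpha_i * sum_k d_ik^2 * beta_k for locally owned rows."""
    deltas = []
    for row in local_rows:
        alpha = 1.0 / sum(row)
        deltas.append(alpha * sum(d * d * b for d, b in zip(row, beta)))
    return deltas


# Sample matrix rows split over two processors (an arbitrary split).
rows_p0 = [[0, 0, 0, 1, 0, 1], [1, 1, 0, 0, 0, 0], [1, 1, 0, 0, 0, 1]]
rows_p1 = [[0, 0, 1, 1, 1, 1], [1, 0, 1, 0, 0, 1]]

column_sums = all_reduce_sum([partial_column_sums(rows_p0),
                              partial_column_sums(rows_p1)])
beta = [1.0 / s for s in column_sums]      # final beta, identical on every processor
delta_p0 = local_deltas(rows_p0, beta)     # computed without further communication
delta_p1 = local_deltas(rows_p1, beta)
```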

Page 17: # of Clusters

• Processors exchange local δ

• All processors calculate n_c

\[
n_c = \sum_{i=1}^{m} \delta_i
\]
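Since n_c is a plain sum, each processor only needs to contribute the sum of its own δ values. A minimal sketch, using the diagonal of the sample C matrix split over two hypothetical processors; rounding to the nearest integer is an assumption of this example.

```python
# Local delta values per processor (diagonal entries of the sample C matrix).
local_deltas = {
    0: [0.375, 0.417, 0.361],
    1: [0.563, 0.361],
}

# Each processor reduces its own values to a single number before the exchange.
local_sums = {proc: sum(values) for proc, values in local_deltas.items()}

# After the exchange every processor can form the same total.
n_c_exact = sum(local_sums.values())
n_c = max(1, round(n_c_exact))     # number of clusters, at least one
```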

Page 18: Cluster-head Selection

• Calculate the seed powers of local documents.

• Exchange the n_c largest local seed powers.

• Select the n_c largest seed powers among all P_i to find the cluster heads.

\[
P'_i = \delta_i \, \psi_i \sum_{j=1}^{n} d_{ij} \, \frac{\beta'_j}{m} \left(1 - \frac{\beta'_j}{m}\right)
\]
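The selection step works because the n_c globally largest seed powers must each be among the n_c largest on the processor that owns them, so exchanging local top-n_c lists is enough. A minimal sketch of that merge; the function names, the example seed powers and the tie-break (lower document id wins) are assumptions.

```python
import heapq


def rank_key(pair):
    """Order by seed power, then prefer the lower document id on ties."""
    power, doc_id = pair
    return (power, -doc_id)


def local_top(powers, n_c):
    """powers: dict doc_id -> seed power for locally owned documents."""
    pairs = [(p, doc) for doc, p in powers.items()]
    return heapq.nlargest(n_c, pairs, key=rank_key)


def select_seeds(all_local_tops, n_c):
    """Merge the exchanged candidate lists and keep the global top n_c."""
    merged = [pair for top in all_local_tops for pair in top]
    return [doc for _, doc in heapq.nlargest(n_c, merged, key=rank_key)]


# Illustrative seed powers for five documents on two processors.
powers_p0 = {0: 0.469, 1: 0.486, 2: 0.692}
powers_p1 = {3: 0.984, 4: 0.692}
n_c = 2

candidates = [local_top(powers_p0, n_c), local_top(powers_p1, n_c)]
seeds = select_seeds(candidates, n_c)
```

Each processor sends at most n_c (power, id) pairs, so the message size is independent of how many documents it owns.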

Page 19: Clustering Non-seed Docs

• Exchange seed documents

• Cluster non-seed documents (as in sequential C3M) in each processor.
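Once every processor holds the seed documents and the global β, it can assign its own non-seed documents without further communication, using c_is = α_i Σ_k d_ik β_k d_sk. A minimal sketch under those assumptions; the function name and the data layout (dictionaries keyed by document id) are hypothetical, and a result of None stands for the ragbag cluster.

```python
def assign_local_docs(local_docs, seed_docs, beta):
    """Map each local non-seed document to the seed that covers it most.

    local_docs, seed_docs: dict doc_id -> row vector.
    Returns dict doc_id -> seed id, or None when no seed covers the document.
    """
    assignment = {}
    for doc_id, row in local_docs.items():
        if doc_id in seed_docs:
            continue                              # seeds head their own clusters
        alpha = 1.0 / sum(row)
        best_seed, best_cover = None, 0.0
        for seed_id, seed_row in seed_docs.items():
            cover = alpha * sum(d * b * s for d, b, s in zip(row, beta, seed_row))
            if cover > best_cover:
                best_seed, best_cover = seed_id, cover
        assignment[doc_id] = best_seed
    return assignment


# Global beta for the sample matrix (reciprocal column sums 3, 2, 2, 2, 1, 4).
beta = [1 / 3, 1 / 2, 1 / 2, 1 / 2, 1.0, 1 / 4]
# Seed documents, available on every processor after the exchange.
seed_docs = {2: [1, 1, 0, 0, 0, 1], 3: [0, 0, 1, 1, 1, 1]}

docs_p0 = {0: [0, 0, 0, 1, 0, 1], 1: [1, 1, 0, 0, 0, 0], 2: [1, 1, 0, 0, 0, 1]}
docs_p1 = {3: [0, 0, 1, 1, 1, 1], 4: [1, 0, 1, 0, 0, 1]}

assignment_p0 = assign_local_docs(docs_p0, seed_docs, beta)
assignment_p1 = assign_local_docs(docs_p1, seed_docs, beta)
```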

Page 20: Future Work

• Term Based Clustering

• Overlapping Clusters

Page 21: C3M Summary

• Load balancing with cyclic block distribution

• Communication minimization by a new heuristic

• Task dependency minimized with block distribution & heuristic.

\[
P'_i = \delta_i \, \psi_i \sum_{j=1}^{n} d_{ij} \, \frac{\beta'_j}{m} \left(1 - \frac{\beta'_j}{m}\right)
\]

Page 22: References

• Concepts and the effectiveness of the cover coefficient-based clustering methodology, F. Can, E. A. Ozkarahan

• Parallelizing the Buckshot Algorithm for Efficient Document Clustering, Eric C. Jensen, Steven M. Beitzel, Angelo J. Pilotto, Nazli Goharian, Ophir Frieder

• Clustering and Classification of Large Document Bases in a Parallel Environment, Anthony S. Ruocco, Ophir Frieder

• Efficient Clustering of Very Large Document Collections, I.S. Dhillon, J. Fan, Y. Guan

Page 23: Questions?

Page 24: The End

Thank you for your patience

