Distributed MATLAB Workshop

Transcript
  • 1

    1

    Distributed MATLAB Workshop

    Rahman Tashakkori & Darren Greene

    Jan 20, 2007

    Department of Computer Science

    Appalachian State University

    A Consortium to Promote Computational Science and High Performance Computing Appalachian State University

    2

    MATLAB: A Quick Overview

  • 2

    3

    Entering and Generating Matrices

    – Direct assignment in row-major order, such as

    A = [16 3 2 13; 5 10 11 8; 9 6 7 12; 4 15 14 1]

    A =
        16  3  2 13
         5 10 11  8
         9  6  7 12
         4 15 14  1

    4

    Entering and Generating Matrices

    Generate matrices from built-in functions

    • sum(A) returns the sum of each column, which is 34 for every column

    • sum(A')' returns the sum of each row, which is also 34; note that the transpose operator is '

    • sum(diag(A)) sums the main diagonal, also 34

    • sum(diag(fliplr(A))) sums the diagonal from lower left to upper right, which is also 34
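The four sums listed on this slide can be checked directly at the command line. The following is a minimal sketch; the variable names (colSums, rowSums, and so on) are illustrative and do not come from the slides.

```matlab
% The 4x4 matrix from the previous slide.
A = [16 3 2 13; 5 10 11 8; 9 6 7 12; 4 15 14 1];

colSums  = sum(A);                 % 1x4 row vector of column sums
rowSums  = sum(A')';               % 4x1 column vector of row sums
mainDiag = sum(diag(A));           % main diagonal
antiDiag = sum(diag(fliplr(A)));   % diagonal from lower left to upper right

disp(colSums); disp(rowSums); disp([mainDiag antiDiag]);
```

Every value displayed is 34, which is what makes A a magic square.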

  • 3

    5

    Accessing Matrix Elements

    • Subscripts
      – Uses parentheses to indicate subscripts
      – A(1,4) + A(2,4) + A(3,4) + A(4,4) returns 34

    • Out-of-range indices
      – Trying to read an element out of range produces an error message
      – Trying to assign a value to an element out of range expands the matrix (a dangerous one!)

    A(4,5) = 17 produces

        16  3  2 13  0
         5 10 11  8  0
         9  6  7 12  0
         4 15 14  1 17
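The difference between reading and writing out of range can be seen in a short experiment. This is a sketch only; the copy X and the flag readFailed are names introduced here for illustration.

```matlab
A = [16 3 2 13; 5 10 11 8; 9 6 7 12; 4 15 14 1];

% Reading past the last column raises an error, which we catch here.
readFailed = false;
try
    v = A(4,5);
catch
    readFailed = true;
end

% Writing past the last column silently grows the matrix and pads with zeros.
X = A;
X(4,5) = 17;
newSize = size(X);     % the copy now has 4 rows and 5 columns
```

Working on a copy, as above, keeps the original matrix intact while experimenting.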

    6

    The Colon Operator

    • Examples
      – 1:5 produces 1 2 3 4 5
      – 20:-3:0 produces 20 17 14 11 8 5 2
      – 0:pi/4:pi produces 0 0.7854 1.5708 2.3562 3.1416

    • Accessing portions of a matrix
      – A(1:k,j) references the first k elements in the jth column
      – A(:,end) references all elements in the last column
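A brief sketch of the colon operator used both to build vectors and to slice the matrix A from the earlier slides. The values k = 3 and j = 2 are arbitrary choices made for this illustration.

```matlab
A = [16 3 2 13; 5 10 11 8; 9 6 7 12; 4 15 14 1];

up   = 1:5;          % unit step
down = 20:-3:0;      % negative step; stops at 2 because -1 would pass 0

k = 3; j = 2;        % illustrative values
firstK  = A(1:k, j); % first k elements of column j
lastCol = A(:, end); % every row of the last column
```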

  • 4

    7

    The if command

    • An example

    • Some useful Boolean tests

    • Useful Boolean tests for matrices
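The slide's own example did not survive in this transcript, so the following is only a sketch of what an if block with matrix tests might look like. It uses the built-in tests isequal, all, any, and isempty; the matrix B and the message strings are assumptions made for illustration.

```matlab
A = [16 3 2 13; 5 10 11 8; 9 6 7 12; 4 15 14 1];
B = A';                          % illustrative second matrix

% isequal compares whole matrices; A == B would give an element-wise result.
if isequal(A, B)
    msg = 'A is symmetric';
elseif all(sum(A) == sum(B))
    msg = 'A and B differ, but their column sums match';
else
    msg = 'A and B differ';
end

hasBig   = any(A(:) > 15);       % true if any element exceeds 15
nonEmpty = ~isempty(A);          % true for any matrix with elements
disp(msg);
```

Using isequal in the condition avoids the common mistake of passing a matrix-valued comparison to if.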

    8

    Commands for repetition

    • The for command (notice the required 'end')

    • The while command (also requires 'end')

    a = 0; fa = -Inf;
    b = 3; fb = Inf;
    while b-a > eps*b
        x = (a+b)/2;
        fx = x^3 - 2*x - 5;
        if sign(fx) == sign(fa)
            a = x; fa = fx;
        else
            b = x; fb = fx;
        end
    end
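The for example from this slide is not present in the transcript. As a stand-in, here is a minimal sketch of nested for loops that fill a small matrix; the size n and the formula 1/(i+j-1) are illustrative choices, not taken from the slides.

```matlab
n = 4;                      % illustrative size
H = zeros(n);               % preallocate so the matrix does not grow in the loop
for i = 1:n
    for j = 1:n
        H(i,j) = 1/(i+j-1); % each 'for' needs its own matching 'end'
    end
end
disp(H);
```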

  • 5

    9

    Scripts and M files

    • An M-file (.m extension) stores MATLAB code
      – The file name is used to reference the code
      – The code is a sequence of commands that is executed

    • An example: stored in rootFinder.m, executed by typing 'rootFinder' on the command line

    >> rootFinder

    % rootFinder.m
    a = 0; fa = -Inf;
    b = 3; fb = Inf;
    while b-a > eps*b
        x = (a+b)/2;
        fx = x^3 - 2*x - 5;
        if sign(fx) == sign(fa)
            a = x; fa = fx;
        else
            b = x; fb = fx;
        end
    end
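One possible variation on the script is to pull the function being solved out into a function handle, the same @ syntax that the toolbox examples later in this workshop pass to createTask. This is a sketch, assuming the cubic x^3 - 2*x - 5, which changes sign on the interval [0, 3]; the names f and residual are introduced here for illustration.

```matlab
f = @(x) x^3 - 2*x - 5;     % assumed test function with a root in [0, 3]

a = 0; fa = -Inf;
b = 3; fb = Inf;
while b - a > eps*b         % stop when the bracket is about one rounding unit wide
    x = (a + b)/2;          % midpoint of the current bracket
    fx = f(x);
    if sign(fx) == sign(fa)
        a = x; fa = fx;     % root lies in the upper half
    else
        b = x; fb = fx;     % root lies in the lower half
    end
end

residual = abs(f(x));       % should be very close to zero
fprintf('x = %.6f, |f(x)| = %.2e\n', x, residual);
```

Changing f is then enough to solve a different equation, provided the starting bracket still contains a sign change.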

    10

    Distributed MATLAB Overview

  • 6

    11

    Introduction

    The Distributed Computing Toolbox and the MATLAB Distributed Computing Engine enable us to coordinate and execute independent MATLAB operations simultaneously on a cluster of computers, speeding up execution of large MATLAB jobs.

    A job is some large operation that we need to perform in a MATLAB session. We will break a job down into segments called tasks.

    Users decide how best to divide a job into tasks. One can divide a job into identical tasks, but tasks do not have to be identical.

    Building a house is a job; the smaller parts (carpentry, painting, etc.) are tasks. How nice it would be if they could all proceed in parallel.

    12

    A good question for you to think about

    How can a successful parallelization be applied in building houses?

  • 7

    13

    Introduction

    The MATLAB session in which the job and its tasks are defined is called the client session. Often, this is on the machine where we program MATLAB.

    The client uses the Distributed Computing Toolbox to perform the definition of jobs and tasks. The MATLAB Distributed Computing Engine (MDCE) is the product that performs the execution of the job by evaluating each of its tasks and returning the result to the client session.

    The job manager is the part of the engine that coordinates the execution of jobs and the evaluation of their tasks. The job manager distributes the tasks for evaluation to the engine’s individual MATLAB sessions called workers.

    14

    Basic Distributed Computing Configuration

  • 8

    15

    16

    Job Manager

    The job manager can be run on any machine on the network. The job manager runs jobs in the order in which they are submitted (FIFO), unless any jobs in its queue are promoted, demoted, canceled, or destroyed.

    Each worker is given a task from the running job by the job manager, executes the task, returns the result to the job manager, and then is given another task.

  • 9

    17

    Job Manager

    When all tasks for a running job have been assigned to workers, the job manager starts running the next job with the next available worker.

    A MATLAB Distributed Computing Engine setup usually includes many workers that can all execute tasks simultaneously, speeding up execution of large MATLAB jobs.

    It is generally not important which worker executes a specific task. The workers evaluate tasks one at a time, returning the results to the job manager. The job manager then returns the results of all the tasks in the job to the client session.

    18

    Interactions of Distributed Computing Sessions

  • 10

    19

    Configuration with Multiple Clients and Job Managers

    Problem we faced – network and firewall issues, multicasting

    20

    Network Requirements

    A network configuration that supports Jini.

    Jini technology facilitates communication between the machines and processes that comprise a distributed computing configuration.

    The MATLAB Distributed Computing Engine provides Jini as part of the job manager scripts, so there is no need for a download or separate installation. Jini starts up automatically with the job manager service if it is not already running.

    More information on Jini: http://www.jini.org/

  • 11

    21

    Network Requirements

    • Distributed computing processes rely on a DNS service being present on the network in order to locate one another.

    • To allow communication between them, the services of the MATLAB Distributed Computing Engine support both multicast and unicast. We are using unicast with port 27350 (MATLAB calls this the base port). This port is configured in the mdce_def.sh file.

    • Distributed computing processes make use of several TCP ports. We can specify the port of the worker or job manager session by using the -baseport option with the startup scripts.

    startjobmanager.sh -name MyJobManager
    startworker.sh -jobmanager MyJobManager

    Distributed computing processes will work correctly on machines with multiple NICs.

    22

    Network Requirements – cont.

    • On UNIX systems, the command hostname -i must return the address of a network interface card (NIC) instead of the loopback address, so that distributed computing processes can recognize and communicate with each other.

    • The job manager’s checkpoint directories can grow to occupy a lot of disk space. Be sure to locate them where they can be accommodated.

  • 12

    23

    Supported Platforms

    The Distributed Computing Toolbox and MATLAB Distributed Computing Engine are supported on Windows, UNIX, and Macintosh platforms. Mixed platforms are supported, so that the clients, job managers, and workers do not have to be on the same platform.

    Every machine that hosts a worker or job manager session must also run the MATLAB Distributed Computing Engine (MDCE) service. The MDCE daemon makes it possible for these processes on different machines to communicate with each other.

    The MDCE daemon must be configured on each machine that will be running a job manager or worker session.

    You must have root privileges to install the MDCE daemon.

    24

    Note:

    We will skip the administrative work involved with starting the job manager and the workers. Please see pages 22-39 of the Distributed MATLAB manual.

  • 13

    25

    Components Represented in the Client

    A client session communicates with the job manager by calling methods and configuring properties of a job manager object.

    Though not often necessary, the client session can also access information about a worker session through a worker object.

    When we create a job in the client session, the job actually exists in the job manager. The client session has access to the job through a job object. Likewise, tasks that are defined for a job in the client session exist in the job manager, and can be accessed through task objects.

    26

    Program Development Guidelines

    When writing code for the Distributed Computing Toolbox, you should advance one step at a time in the complexity of your application.

    Verifying your program at each step prevents you from debugging several potential problems simultaneously. If you run into any problems at any step along the way, back up to the previous step and re-verify your code.

  • 14

    27

    The recommended programming practice

    The recommended programming practices for distributed computing applications are:

    1 Run code normally on your local machine. First verify your functions so that as you progress, you are not trying to debug the functions and the distribution at the same time.

    2 Run code distributed to only one node, where that node is likely the local machine. Create a job and task to verify that the function is working in a distributed computing model.

    3 Distribute the code to two nodes. Expand your job to include two tasks, preferably executed on two different workers.

    4 Distribute the code to N nodes. Scale up your job to include as many tasks as you need.

    The client session of MATLAB must be running the Java Virtual Machine (JVM) to use the Distributed Computing Toolbox. Do not start MATLAB with the -nojvm flag.
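The four steps can be followed with a single script in which only the number of tasks changes between runs. The following is a sketch under stated assumptions: it needs a running job manager, the name 'MyJobManager' is the placeholder used elsewhere in these slides, and the inputs, nTasks, and the comparison against a local run are choices made for this illustration.

```matlab
inputs = {[1 1], [2 2], [3 3]};          % illustrative task inputs

% Step 1: run the function locally and keep the answers as a reference.
localResults = cell(numel(inputs), 1);
for k = 1:numel(inputs)
    localResults{k} = sum(inputs{k});
end

% Steps 2 to 4: the same function as distributed tasks.
% Start with nTasks = 1, then 2, then numel(inputs).
nTasks = 1;
jm = findResource('jobmanager', 'name', 'MyJobManager');
j  = createJob(jm);
for k = 1:nTasks
    createTask(j, @sum, 1, inputs(k));   % one output, one input argument
end
submit(j);
waitForState(j, 'finished');
distResults = getAllOutputArguments(j);

% The distributed answers should match the local ones before scaling up.
assert(isequal(distResults, localResults(1:nTasks)));
destroy(j);
```

Keeping the local reference in the script means a mismatch shows up at the smallest scale at which it occurs.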

    28

    Life Cycle of a Job

  • 15

    29

    Stages of a Job

    Pending - You create a job on the job manager with the createJob function in your client session of the Distributed Computing Toolbox. The job’s first state is pending. This is when you define the job by adding tasks to it.

    Queued - When you execute the submit function on a job, the job manager places the job in the queue, and the job’s state is queued. The job manager executes jobs in the queue in the sequence in which they are submitted, all jobs moving up the queue as the jobs before them are finished. You can change the order of the jobs in the queue with the promote and demote functions.

    Running - When a job reaches the top of the queue, the job manager distributes the job’s tasks to worker sessions for evaluation. The job’s state is running. If more workers are available than necessary for a job’s tasks, the job manager begins executing the next job. In this way, there can be more than one running job at a time.

    Finished – When all of a job’s tasks have been evaluated, a job is moved to the finished state. At this time, you can retrieve the results from all the tasks in the job with the function getAllOutputArguments.

    30

    Note that when a job is finished, it remains in the job manager, even if you clear all the objects from the client session. The job manager keeps all the jobs it has executed, until you restart the job manager in a clean state. Therefore, you can retrieve information from a job at a later time or in another client session, so long as the job manager has not been restarted with the -clean option.
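The stages can be observed from the client by reading the job object's state at different points. This is a sketch only: it requires a running job manager, 'MyJobManager' is a placeholder name, and reading the state with get(j, 'State') is an assumption about the job object's property name that should be checked against the toolbox documentation for your release.

```matlab
jm = findResource('jobmanager', 'name', 'MyJobManager');

j = createJob(jm);
stateAfterCreate = get(j, 'State');   % pending, per the stages described above

createTask(j, @sum, 1, {[1 1]});      % tasks are added while the job is pending
submit(j);                            % the job moves to queued, then running

waitForState(j, 'finished');          % block until all tasks are evaluated
stateAfterRun = get(j, 'State');

results = getAllOutputArguments(j);   % available once the job is finished
destroy(j);                           % remove the job from the job manager
```

Because a finished job stays in the job manager, the final destroy call is what actually frees its stored data.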

  • 16

    31

    Using the Distributed Computing Toolbox

    1 Find a Job Manager: Your network may have one or more job managers available. The function you use to find a job manager creates an object in your current MATLAB session to represent the job manager that will run your job.

    2 Create a Job: You create a job to hold a collection of tasks. The job exists on the job manager, but a job object in the local MATLAB session represents that job.

    3 Create Tasks: While your job is in the pending state, you can create tasks to add to the job. Each task of a job can be represented by a task object in your local MATLAB session.

    4 Submit a Job to the Job Queue for Execution: When your job is completely defined with all its tasks, you submit it to the queue in the job manager. The job manager distributes your job’s tasks to its workers for evaluation. When all of the workers have completed the job’s tasks, the job manager moves the job to the finished state.

    5 Retrieve the Job’s Results: The resulting data from the evaluation of the job is available as a property value of each task object.

    32

    Example

    1. Find a Job Manager
       jm = findResource('jobmanager','name','MyJobManager');

    2. Create a Job
       j = createJob(jm);

    3. Create Tasks (sum is pre-defined)
       createTask(j, @sum, 1, {[1 1]});
       createTask(j, @sum, 1, {[2 2]});
       createTask(j, @sum, 1, {[3 3]});

    4. Submit a Job to the Job Queue for Execution
       submit(j);

    5. Retrieve the Job’s Results
       waitForState(j)
       results = getAllOutputArguments(j)
       results =
           [2]
           [4]
           [6]

  • 17

    33

    Example - a different approach

    The dfeval function allows you to evaluate a function in a cluster of workers without having to define jobs and tasks yourself. When you can divide your job into similar tasks, using dfeval might be an appropriate way to run your job.

    results = dfeval(@sum, {[1 1] [2 2] [3 3]})
    results =
        [2]
        [4]
        [6]
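Before sending work to the cluster, it can help to compute the same answers serially so there is something to compare against. The sketch below uses cellfun as a local stand-in for the dfeval call shown on this slide; it does not use the toolbox at all, and the variable names are illustrative.

```matlab
inputs = {[1 1] [2 2] [3 3]};

% Apply sum to each cell locally, one after another.
% 'UniformOutput', false keeps the results in a cell array.
expected = cellfun(@sum, inputs, 'UniformOutput', false)';   % 3x1 cell

disp(expected);
```

The transpose gives a column of cells, one row per input, which is the same layout as the results displayed above.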

    34

    Evaluating Functions Synchronously and Asynchronously

    The dfeval function operates synchronously, that is, it blocks the MATLAB command line until its execution is complete. If you want to send a job off to the job manager and get access to the command line while the job is being run asynchronously, you can use the dfevalasync function.

    The dfevalasync function operates in the same way as dfeval, except that it does not block the MATLAB command line, and it does not directly return results.
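A sketch of the asynchronous form follows. It assumes a reachable job manager and assumes the dfevalasync calling form in which the second argument is the number of outputs and the return value is a job object; check both against the toolbox documentation for your release.

```matlab
% Returns immediately with a job object instead of results (assumed signature).
job = dfevalasync(@sum, 1, {[1 1] [2 2] [3 3]});

% ... the command line is free for other work here ...

waitForState(job);                       % block only when the results are needed
results = getAllOutputArguments(job);    % same retrieval as for a hand-built job
destroy(job);
```

The design trade-off is that the caller becomes responsible for waiting, retrieving, and cleaning up, which dfeval otherwise does on its own.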

  • 18

    35

    function listWorkers()
    % List the idle and busy workers known to the job manager.
    jm = findResource('jobmanager', 'LookupURL', 'grid0.cs.appstate.edu');

    numIdleWorkers = size(jm.IdleWorkers, 1);
    numBusyWorkers = size(jm.BusyWorkers, 1);

    fprintf(1, 'Idle: %i Busy: %i Total: %i\n\n', numIdleWorkers, numBusyWorkers, numIdleWorkers+numBusyWorkers);

    fprintf(1, '-----Idle Workers-----\n');
    for workerCount=1:numIdleWorkers
        fprintf(1, '%s (%s)\n', jm.IdleWorkers(workerCount).Name, jm.IdleWorkers(workerCount).HostName);
    end

    if numBusyWorkers > 0
        fprintf(1, '\n\n');
        fprintf(1, '-----Busy Workers-----\n');
        fprintf(1, 'Total: %i\n\n', numBusyWorkers);
        for workerCount=1:numBusyWorkers
            fprintf(1, '%s (%s)\n', jm.BusyWorkers(workerCount).Name, jm.BusyWorkers(workerCount).HostName);
        end
    end

    fprintf(1, '\n');

    36

    function watchWorkers()
    % Poll the job manager twice a second until busy workers become idle again.
    jm = findResource('jobmanager', 'LookupURL', 'grid0.cs.appstate.edu');
    run = true;     % determines when to stop monitoring workers
    active = false; % keeps track of whether any workers have been busy

    while(run)
        pause(0.5);
        % show how many workers are being allocated
        fprintf(1, '%i of %i', jm.NumberOfBusyWorkers, jm.NumberOfIdleWorkers + jm.NumberOfBusyWorkers);

        busy = jm.BusyWorkers;
        fprintf(1, ' --[');
        for workerCount=1:size(busy, 1)
            fprintf(1, ' %s ', busy(workerCount).Name);
        end
        fprintf(1, ' ]--');
        fprintf(1, '\n');

        if jm.NumberOfBusyWorkers == 0
            if active == true
                fprintf(1, 'No activity\n');
                run = false;
            end
        else
            active = true;
        end
    end

  • 19

    37

    Please disable the McAfee Virus Scanner for the duration of lab hours

    38

    High-Performance Image Content-Based Search

    Darren W. Greene *, Rahman Tashakkori *, Barry Kurtz *, Steven H. Heffner **

    * Appalachian State University, Boone, NC
    ** Wake Forest Medical School

    Fall 2006 Distributed Computing Workshop

  • 20

    39

    Introduction

    • Motivation
      – Digital Images (X-ray, CT, MRI, ultrasound)
      – Locate Abnormalities (tumors, fractures)
      – Mammograms
        • Microcalcification Clusters

    40

    Images Used

    Test Case   Image Filename   Image Size (pixels)   Sample Filename         Sample Size (pixels)   Sample Scaling
    1           Mammogram1.bmp   2758 x 1001           Mammogram1_sample.bmp   64 x 64                1.0
    2           Mammogram2.bmp   2478 x 1008           Mammogram2_sample.bmp   128 x 128              0.5
    3           Mammogram3.bmp   2947 x 1022           Mammogram3_sample.bmp   256 x 256              1.5

  • 21

    41

    Test Case 1

    Image: Mammogram1.bmp

    Sample: Mammogram1_sample.bmp

    42

    Test Case 2

    Sample: Mammogram2_sample.bmp

    Image: Mammogram2.bmp

  • 22

    43

    Test Case 3

    Image: Mammogram3.bmp

    Sample: Mammogram3_sample.bmp

    44

    The Search Methods

    • The effectiveness and efficiency of five methods were tested.
      – Standard Convolution (the baseline)
      – Iterative Convolution
      – Mean-Square Measures of Variance
      – Pearson’s Correlations
      – Haar Wavelet Lifting

  • 23

    45

    Summary of Search Methods

    [Figure: Efficiency Comparison of the Various Search Methods. x-axis: Area of Sample (4096, 16384, 65536); y-axis: Time (seconds), 0 to 10000; series: Convolution, Iterative, Mean-Square, Pearson's Correlations, Haar Wavelet]

    46

    ASU Cluster

    • 8 Machines
    • Connected by a Network Switch
    • Red Hat Enterprise Linux 3.0
    • MATLAB Distributed Toolbox

  • 24

    47

    ASU Cluster

    48

    Distributed Convolution

  • 25

    49

    Distributed Convolution

    50

    MATLAB Steps

    1) Find a Job Manager
       jm = findResource('jobmanager', 'LookupURL', 'grid0.cs.appstate.edu');

    2) Create a Job
       job = createJob(jm);
       set(job, 'FileDependencies', {'convolution.m'});

    3) Create Tasks
       task1 = createTask(job, @convolution, 1, {quadrant_1, sample});
       task2 = createTask(job, @convolution, 1, {quadrant_2, sample});

    4) Submit a Job to the Job Queue for Execution
       submit(job);

    5) Wait and Then Retrieve the Job’s Results
       waitForState(job);
       results = getAllOutputArguments(job);

    6) Destroy the Job
       destroy(job);
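The slides do not show how the image partitions passed to each task were formed. One plausible approach, sketched here purely as an assumption, is to cut the image into strips that overlap by the width of the sample minus one, so that stitching the per-strip results reproduces the result for the whole image. The built-in conv2 stands in for the authors' convolution.m, and the tiny image and sample are made-up data; the sketch runs locally without the toolbox.

```matlab
img    = magic(8);                  % stand-in image (assumption)
sample = ones(3);                   % stand-in sample (assumption)
n      = size(sample, 2);           % sample width

% Reference: convolve the whole image at once, keeping only full overlaps.
full = conv2(img, sample, 'valid');             % 6x6 result

% Split into two column strips. The first strip yields h output columns,
% so it needs h + n - 1 input columns; the second strip starts at column h + 1.
h      = 3;
strip1 = img(:, 1:h+n-1);
strip2 = img(:, h+1:end);

% Each strip could be the input of one task; here they run one after another.
part1 = conv2(strip1, sample, 'valid');
part2 = conv2(strip2, sample, 'valid');

stitched = [part1, part2];
disp(isequal(stitched, full));
```

Without the overlap, output columns near the cut would be missing, which is why the strips share n - 1 columns.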

  • 26

    51

    Average Completion Time

    [Figure: Average Time vs Number of Quadrants. x-axis: Number of Quadrants (1, 2, 4, 8, 16); y-axis: Time (minutes), 0 to 80; series: Mammogram 1, Mammogram 2, Mammogram 3; data labels shown: 7.65, 66.91, 40.92, 21.83, 9.16]

    [Figure: Average Time vs Number of Nodes. x-axis: Number of Nodes (1, 2, 4, 8); y-axis: Time (minutes), 0 to 80; series: Mammogram 1, Mammogram 2, Mammogram 3]

    52

    Final Conclusions

    • MATLAB’s DCT significantly reduced the completion time of standard convolution

    • More partitions increased the communication latency

  • 27

    53

    ACKNOWLEDGEMENTS

    The North Carolina Consortium for High Performance Computing Project, funded by the UNC Office of the President

    The Department of Computer Science at Appalachian

    The Office of Student Research

    National Science Foundation Grants CSEMS-0123168 and CSEMS-0324002

    54

    Undergraduate Research Projects

    Consortium Member Universities with a cluster:
    Appalachian State University
    UNCG
    UNCC
    UNCA
    UNCP
    NC A&T
    ELON
    Wake Forest
    Fayetteville State University
    Western Carolina University
    Lenoir Rhyne College
    High Point University

    Non-Consortium Workshop Participants
    UNC-Central
    UNCW (they have a grid program)

  • 28

    55

    Possibilities

    • Set up of the cluster at your university
    • Parallel Computing – MPI, PVM, etc.
    • Grid Computing
    • Distributed MATLAB (requires DML)
