
Post on 01-Apr-2015


E71Chang

Mark L. Chang

Northwestern University

Evanston, IL

mchang@ece.nwu.edu

Adaptive Computing in NASA Multi-Spectral Image Processing

Scott A. Hauck

University of Washington

Seattle, WA

hauck@ee.washington.edu


Background

• (1991) Initiative by NASA to study Earth as an environmental system—Earth Science Enterprise (ESE)

• (1999) Launch of the first Earth Observing System (EOS) satellite, Terra


The Data Flow

• EOS divides telemetry processing into five levels with the following flow:

[Figure: data-flow diagram. Blocks shown: Instrument 1, Instrument 2, Instrument 3, ..., Instrument n; Receiver; Level 0; L1; L2; L3; L4]


The Processing Problem

• I/O intensive

• Terra satellite generates ~918 Gbytes of data per day

• Current NASA-supported data holdings total ~125,000 Gbytes

• MODIS instrument accounts for over half the daily data and processing load

[Image: MODIS instrument]
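To put the daily volume in perspective, the sketch below converts the ~918 Gbytes/day figure into an average sustained rate. It assumes decimal units (1 Gbyte = 10^9 bytes) and a perfectly even flow over 24 hours; neither is stated on the slide, and the function name is purely illustrative.

```python
# Rough arithmetic only. Assumptions (not from the slides):
# 1 Gbyte = 10**9 bytes, and data arrives evenly over a full day.
SECONDS_PER_DAY = 24 * 60 * 60


def sustained_rate_mbytes_per_s(gbytes_per_day: float) -> float:
    """Average rate in Mbytes/s (10**6 bytes) for a given daily volume."""
    bytes_per_day = gbytes_per_day * 10**9
    return bytes_per_day / SECONDS_PER_DAY / 10**6


terra_rate = sustained_rate_mbytes_per_s(918)
print(f"Terra: roughly {terra_rate:.1f} Mbytes/s, sustained around the clock")
```

Under these assumptions the average works out to a little over 10 Mbytes/s, before any of the downstream processing levels are applied.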


Why Adaptive Computing?

• Instrument dependent processing

• Data products involve many different algorithms

• Algorithms often change over the lifetime of the instrument


MATCH Compiler

• Current mappings are done by hand

  • Hardware description languages (Verilog, VHDL)

  • C program interface to adaptive compute engine

  • Requires low-level understanding of the architecture

• MATCH == MATlab Compiler for Heterogeneous computing systems

  • MATLAB codes compiled to a configurable computing system automatically

  • Embedded processors, DSPs, and FPGAs

• Performance goals

  • Within a factor of 2-4 of the best manual approach

  • Optimize performance under resource constraints


MATCH Compiler Framework

• Parse MATLAB programs into intermediate representation

• Build data and control dependence graph

• Identify scopes for fine-grain, medium-grain, and coarse-grain parallelism

• Map operations to multiple FPGAs, multiple embedded processors and multiple DSP processors

• Automatic parallelization, scheduling, and mapping
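As a rough illustration of how a dependence graph exposes parallelism, the sketch below groups operations into levels so that everything in one level depends only on earlier levels. This is a generic levelization, not the MATCH compiler's actual algorithm; the function name and the operation names (loosely modeled on a pixel-classification data path) are hypothetical.

```python
from collections import defaultdict


def parallel_levels(deps):
    """Group operations into dependence levels.

    deps maps an operation name to the operations it needs first.
    Operations in the same level have no dependences on one another
    and could, in principle, be scheduled in parallel.
    """
    all_ops = set(deps) | {p for preds in deps.values() for p in preds}
    level = {}

    def depth(op, path=()):
        if op in path:
            raise ValueError(f"cyclic dependence at {op!r}")
        if op not in level:
            preds = deps.get(op, [])
            # An operation sits one level below its deepest predecessor.
            level[op] = 1 + max(
                (depth(p, path + (op,)) for p in preds), default=-1
            )
        return level[op]

    groups = defaultdict(list)
    for op in sorted(all_ops):
        groups[depth(op)].append(op)
    return [groups[i] for i in sorted(groups)]


# Hypothetical operation names, for illustration only.
pnn_ops = {
    "load_pixel": [],
    "load_weight": [],
    "subtract": ["load_pixel", "load_weight"],
    "square": ["subtract"],
    "band_sum": ["square"],
    "k2_mult": ["band_sum"],
    "exp": ["k2_mult"],
    "class_sum": ["exp"],
}

levels = parallel_levels(pnn_ops)
for i, ops in enumerate(levels):
    print(i, ops)
```

In this toy graph only the two loads share a level; the rest form a chain, which is the kind of structure that favors pipelining over wide parallel issue.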


MATCH Testbed

VME bus and chassis

Motorola MVME-2604 embedded boards

• IBM PowerPC 604

• 64 MB RAM

• OS-9 OS

• Ultra C compiler

Transtech TDMB 428 DSP board

• Four TDM 411 cards containing TI TMS 320C4 DSP, 8 MB RAM

• TI C compiler

Annapolis Wildchild board

• Nine XILINX 4010 FPGAs

• 2 MB RAM

• Wildfire software

Force 5V

• MicroSPARC CPU

• 64 MB RAM

Development Environment:

SUN Solaris 2, HP HPUX and Windows

Ultra C/C++ for MVME

TI C for TMS320

XILINX XACT for XILINX


Motivation for MATCH

• NASA scientists prefer MATLAB

  • High-level language, good for prototyping and development

• NASA applications are well-suited to the MATCH project

  • Lots of image and signal processing applications

  • Same domain as users of embedded systems

  • High degree of data parallelism

  • Small degree of task parallelism

• NASA has an interest in adaptive technologies (ASDP)

• Will be a benchmark for the MATCH compiler


Multi-spectral Image Classification

• Want to classify a multi-spectral image in order to make it more useful for analysis by humans

  • Used to determine type of terrain being represented

  • Similar to data compression

  • Similar to clustering analysis

Pixel[000][000] = Forest
Pixel[123][123] = Urban
Pixel[255][212] = Tundra
Pixel[410][230] = Water
etc…


Multi-Spectral Classification

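$$f(X \mid S_k) = \frac{1}{(2\pi)^{d/2}\,\sigma^{d}} \cdot \frac{1}{P_k} \sum_{i=1}^{P_k} \exp\!\left(-\frac{(X - W_{ki})^{T}(X - W_{ki})}{2\sigma^{2}}\right)$$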
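A minimal sketch of how the class density on this slide can be evaluated, assuming the usual reading of the symbols: X is the pixel vector (one value per band), W_ki is the i-th weight vector of class k, P_k is the number of weights in class k, d is the number of bands, and σ is a smoothing parameter. The function names and the tiny data set are illustrative and do not come from the slides.

```python
import math


def pnn_density(x, weights, sigma):
    """Evaluate f(X | S_k) for one class.

    x       -- pixel vector, one value per band (length d)
    weights -- the P_k weight vectors of class k, each of length d
    sigma   -- smoothing parameter (assumed to be shared by all classes)
    """
    d = len(x)
    norm = 1.0 / ((2.0 * math.pi) ** (d / 2.0) * sigma ** d)
    total = 0.0
    for w in weights:
        # (X - W_ki)^T (X - W_ki): squared Euclidean distance
        dist2 = sum((xi - wi) ** 2 for xi, wi in zip(x, w))
        total += math.exp(-dist2 / (2.0 * sigma ** 2))
    return norm * total / len(weights)


def classify(x, classes, sigma):
    """Return the index of the class with the largest density."""
    scores = [pnn_density(x, w, sigma) for w in classes]
    return scores.index(max(scores))


# Made-up two-band example with two classes of two weights each.
example_classes = [
    [[10, 10], [12, 11]],
    [[200, 190], [205, 195]],
]
label_a = classify([11, 10], example_classes, sigma=5.0)
label_b = classify([201, 192], example_classes, sigma=5.0)
print(label_a, label_b)
```

Each pixel is scored against every weight of every class, which is why the per-pixel cost grows with both the number of classes and the number of weights per class.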


MATLAB Iterative

for p=1:rows*cols
    % load pixel to process
    pixel = data( (p-1)*bands+1:p*bands );

    class_total = zeros(classes,1);
    class_sum = zeros(classes,1);

    % class loop
    for c=1:classes
        class_total(c) = 0;
        class_sum(c) = 0;

        % weight loop
        for w=1:bands:pattern_size(c)*bands-bands
            weight = class(c,w:w+bands-1);
            class_sum(c) = exp( -(k2(c)*sum( (pixel-weight').^2 ))) + class_sum(c);
        end

        class_total(c) = class_sum(c) * k1(c);
    end
    results(p) = find( class_total == max( class_total ) )-1;
end


MATLAB Vectorized

% reshape data
weights = reshape(class',bands,pattern_size(1),classes);

for p=1:rows*cols
    % load pixel to process
    pixel = data( (p-1)*bands+1:p*bands);

    % reshape pixel
    pixels = reshape(pixel(:,ones(1,patterns)), bands,pattern_size(1),classes);

    % do calculation
    vec_res = k1(1).*sum(exp( -(k2(1).*sum((weights-pixels).^2)) ));
    vec_ans = find(vec_res==max(vec_res))-1;
    results(p) = vec_ans;
end


Initial FPGA Mapping

[Figure: block diagram of the PNN data path mapped onto PE0 through PE4. Blocks shown: PNN Controller, Pixel Memory, Weight Memory, Subtraction Unit, Square Unit, Band Accumulator (repeats # of bands times), K2 Mult Unit, K2[K] Memory, exp LUT Unit, exp / K1[K], K1[K] Mem, exp Mult Unit, Class Accumulator (repeats # weights/class times), K1 Mult Unit, Class Compare, Result Memory. Percentages shown in the figure: 5%, 67%, 85%, 82%, 82%]


Improving the Mapping

• Improve speed of PNN

  • Utilize all eight processing elements

  • Time-multiplex low-rate functions

  • Vary precision of multipliers/lookups

[Figure: PNN data-path block diagram across PE0 through PE4 (PNN Controller, Pixel and Weight Memory, Subtraction, Square, Band Accumulator, K2 Mult, exp LUT, exp Mult, Class Accumulator, K1 Mult, Class Compare, Result Memory), annotated with the ratios 1:1, 1:1, 1:4, 1:4, 1:20]

$$f(X \mid S_k) = K_1 \sum_{i=1}^{P_k} \exp\!\left(-K_2\,(X - W_{ki})^{T}(X - W_{ki})\right)$$
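The simplified form folds everything that does not depend on the pixel into two constants, K1 (the scale applied to the class sum) and K2 (the scale applied to the squared distance), and the exponential becomes a good candidate for a lookup table. The sketch below shows one way this could look, assuming K1 = 1 / ((2π)^(d/2) σ^d P_k) and K2 = 1 / (2σ²). The table size, input range, and nearest-entry rounding are assumptions chosen for illustration and are not taken from the hardware design.

```python
import math


def fold_constants(d, sigma, patterns_per_class):
    """K1 and K2 under the assumed definitions given in the lead-in."""
    k1 = 1.0 / ((2.0 * math.pi) ** (d / 2.0) * sigma ** d * patterns_per_class)
    k2 = 1.0 / (2.0 * sigma ** 2)
    return k1, k2


def build_exp_lut(entries, t_max):
    """Sample exp(-t) at `entries` evenly spaced points in [0, t_max]."""
    step = t_max / (entries - 1)
    return [math.exp(-i * step) for i in range(entries)], step


def lut_exp_neg(t, lut, step):
    """Nearest-entry lookup of exp(-t); inputs past the table clamp to the end."""
    index = min(int(t / step + 0.5), len(lut) - 1)
    return lut[index]


# Hypothetical sizing: 256 entries covering t in [0, 8].
lut, step = build_exp_lut(256, 8.0)
worst = max(
    abs(lut_exp_neg(t, lut, step) - math.exp(-t))
    for t in (0.0, 0.5, 1.0, 3.3, 7.9)
)
print(f"step = {step:.4f}, worst sampled error = {worst:.4f}")
```

Because the slope of exp(-t) never exceeds 1 in magnitude for t ≥ 0, nearest-entry lookup keeps the error within half a table step inside the covered range, so table size trades directly against precision.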


Optimized Mapping

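[Figure: optimized mapping of the PNN pipeline onto the processing elements]

PE0: Pixel Reader
PE1: Subtract, Square
PE2: Subtract, Square
PE3: Subtract, Square
PE4: Subtract, Square
PE5: K2 Multiplier
PE6: Exponent Lookup
PE7: Class Accumulator
PE7: K1 Multiplier, Class Comparison

Percentages shown in the figure: 5%, 75%, 85%, 61%, 54%, 97%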


Results

[Images: Raw Image Data; Processed Image]

Pixels Processed per Second (Reference: HP C180 Workstation)

Method               Pixels/s
Matlab Iterative     1.6
Matlab Vectorized    35.4
Java                 149
C                    364
VHDL                 1942
VHDL(2)              5825


Results (Cont’d)

Pixels Processed per Second (Reference: MATCH Testbed; Force 5V, MicroSPARC CPU, 64 MB RAM)

Method     Pixels/s
Java       14.8
C          92
VHDL       1942
VHDL(2)    5825
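The throughput figures on the two Results slides are easier to compare as ratios. The sketch below divides the fastest reported rate, VHDL(2), by each of the others. The numbers are transcribed from the charts; reading "VHDL" as the initial mapping and "VHDL(2)" as the optimized one is an assumption, as are the variable and function names.

```python
# Pixels per second, transcribed from the Results charts.
VHDL2_RATE = 5825.0  # assumed to be the optimized FPGA mapping

baselines = {
    "Matlab Iterative (HP C180)": 1.6,
    "Matlab Vectorized (HP C180)": 35.4,
    "Java (HP C180)": 149.0,
    "C (HP C180)": 364.0,
    "Java (MATCH testbed)": 14.8,
    "C (MATCH testbed)": 92.0,
    "VHDL (assumed initial mapping)": 1942.0,
}


def speedups(target_rate, rates):
    """Ratio of target_rate to each baseline rate."""
    return {name: target_rate / rate for name, rate in rates.items()}


ratios = speedups(VHDL2_RATE, baselines)
for name, ratio in ratios.items():
    print(f"{name:32s} {ratio:8.1f}x")
```

On these figures VHDL(2) is roughly 16 times faster than C on the HP workstation, roughly 63 times faster than C on the testbed host, and about 3 times faster than the first VHDL version.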


Results (Cont’d)

Lines of Code

Method               Lines
Matlab Iterative     39
Matlab Vectorized    27
Java                 474
C                    371
VHDL                 2205
VHDL(2)              2480


Conclusions

• NASA is interested in adaptive computing

• NASA has many candidate applications

• High processing loads and I/O requirements

• Applications are well-suited for acceleration using adaptive computing

• Scientists will want to write in MATLAB rather than C+VHDL

• Good benchmarks for the MATCH compiler

• Will help identify functions and procedures necessary for real-world applications