Systolic Arrays for Image Processing

J. Tasič, U. Burnik Faculty of Electrical and Computer Engineering

University of Ljubljana Tržaška 25

SI-61111 Ljubljana, Slovenia e-mail: [email protected]

Abstract

Recent advances in VLSI technology make it possible to fabricate thousands of switching elements on a single chip. For applications that demand large computational power, performance can be increased significantly by parallel processing of algorithms mapped onto and designed for VLSI technology. The major applications in image processing systems may be divided into a few categories: pre-processing, coding, compression, restoration, segmentation, representation, description and interpretation. Introducing systolic arrays into the area of image processing can significantly improve the speed of processing. VLSI algorithms and arrays for image processing have received special attention in recent years.

In this paper we first present some basic image restoration problems, introducing the importance of fast linear algebra algorithms. Then some basic systolic arrays and algorithms are presented, together with some basic principles of mapping algorithms onto systolic arrays. Special attention is paid to adaptive image filtering techniques.

Finally, the Singular Value Decomposition (SVD) method is applied to two-dimensional adaptive FIR filtering. The resulting two-dimensional adaptive algorithm is presented as an example of systolic arrays applied in the area of image processing.

1 Image degradation models and restoration techniques

In image processing we deal with deterministic and stochastic representations of images and with improving the quality of images by removing the degradation present in them. In image restoration we try to recover, from a degraded image, an image that is as close as possible to the original. Typical degradations include random noise, interference, geometrical distortions, loss of contrast, blurring effects, etc.

The image restoration problem can be described as the problem of determining an appropriate inverse function to the degradation procedure. This is actually a two-part problem: first identifying the distortion function and then computing its inverse. Both can be combined into a single procedure. The most important difficulty is that image restoration is an ill-conditioned problem at best and a singular problem at worst.

For image restoration on digital computers we shall assume the input images of the procedure are in discrete form. Several linear algebra tools may be applied to find the solution if we suppose that the degradation is a linear procedure.

All known methods deal with enormous amounts of data, so fast and effective algorithms or structures have to be applied. The same problem arises in the area of image reconstruction, where high-resolution images or objects have to be reconstructed by processing data obtained from views of the object from many different perspectives. One such problem is the reconstruction of a 3-D object from 2-D projections in tomography. Convolution methods and Fourier transform techniques are extensively used in this area.

1.1 The Image Degradation Model

To attempt image restoration efficiently, the problem we are dealing with should be classified, so that we can find the most appropriate restoration method. On the other hand, a suitable model of image degradation should be selected considering the chosen restoration procedure.

An imaging system can be generally represented as a continuous system with point-spread function

$$h = h(x, y, \xi, \eta, f)$$

and our distorted image will result from a convolution like

$$g(x, y) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} h(x, y, \xi, \eta, f(\xi, \eta)) \, d\xi \, d\eta .$$

This is a very general notation. Imaging systems can be split into two main groups:

• non-linear systems

• linear systems

Furthermore, models in use can be continuous-continuous, continuous-discrete and discrete-discrete models. The most physically appropriate model of a general imaging system is the continuous-discrete model (continuous images are discretized by the imaging systems).

It is very difficult to find an inverse of the non-linear degradation function. Since the selected reconstruction methods are chosen from linear matrix algebra, it would be best to choose a discrete-discrete degradation model. In digital signal processing theory an image is represented by a matrix of discrete-position samples. Therefore, the observed image is presented as a sequence of numerical values which are stored in memory cells of the device that is used to process it. We are looking for effective mathematical tools, which can perform operations on an image presented in the fashion we now describe.

The objective of our study will, throughout this paper, be reduced to discrete linear point-spread degradation functions. For discrete image $f(x, y)$ degraded to image $g(x, y)$ and subjected to additive noise $n(x, y)$, we may write

$$g(x, y) = \sum_{u=1}^{N} \sum_{v=1}^{N} h(x, y, u, v) \, f(u, v) + n(x, y)$$

or, alternatively, in tensor notation

$$[G] = [[H]]\{[F]\} + [N]$$

with two-dimensional matrices G, F and N and using the four-index operator H. The notation may be simplified by restacking the original and distorted image as well as the noise into vectors f, g and n, and representing the point-spread operator H as an $N^2 \times N^2$ matrix,

$$g = Hf + n .$$

It is assumed that the image has been sampled into an $N \times N$ array of points.

It is obvious that the size of the arrays, especially of the matrix H, may become extremely large. The search for an inverse operator, and even the storage of such matrices, is a monumental and rather inconvenient task.

For different classes of distortion functions, matrix H may take on special structures, reducing the complexity of finding an inverse function. There are four specific types of the point-spread function, listed here in order of increasing complexity:

• separable space-invariant point-spread function: $H = A \otimes B$, where A and B are Toeplitz matrices

• nonseparable space-invariant point-spread function: H is a block Toeplitz matrix

• separable space-variant point-spread function: $H = A \otimes B$

• nonseparable space-variant point-spread function: H is arbitrary

For purposes of our further study, we will limit our view to space-invariant point-spread functions.

The goal of digital image restoration is to perform a set of mathematical operations on a degraded image in order to get a result which should be as close to the undistorted image as possible:

$$T\{g\} \rightarrow \hat{f} .$$

There are two main approaches to image restoration: filtering in the frequency domain and linear algebraic restoration. Traditionally, image filtering has been done in the frequency domain. The experience gained by studying Fourier analysis-based filters can be important in creating more powerful linear algebra-based reconstruction structures.

In our particular case, we have to deal with digital, uniformly sampled image values. In order to keep the mathematical notation as clear as possible, most authors suggest presenting the image matrices (of dimensions $M \times N$) in the form of column vectors (of dimensions $MN \times 1$).

The most common approach in linear algebra is the least-squares approach. We are looking for an inverse function G which minimizes the error function

$$e = (Gg - f)^T (Gg - f)$$

in the least-squares sense.

In many cases some information about the images is known, such as smoothness or intensity distribution. The above equation can then be modified using an additional constraint:

$$e_m = (Gg - f)^T (Gg - f) + k (Zf)^T (Zf)$$

where the matrix Z represents the intensity weighting for the overall smoothness of the image and k is the proper regularisation parameter. The inversion function can be presented as follows:

The common problem in solving the system for the inverse function $\tilde{f} = Gg$ is that we often encounter an ill-conditioned problem. Another difficulty, probably less explicit in 1-D signal processing, is the dimension of the matrices. For a general problem with H having no special properties, the solution for G is typically an inverse of a matrix with an enormous number of elements ($6.8 \cdot 10^{10}$ for a low resolution image of 512×512 pixels).

In a situation where a limited finite-impulse response restoration filter would satisfactorily solve the problem, the use of convolution filter is a good alternative. The solution may then be represented by minimizing

$$e = W \ast\ast\, g - f$$

where the size of W is far smaller than the size of transformation matrix G. This solution is only appropriate for shift-invariant imaging functions. Normally, good results can be achieved even for a 3×3 convolution matrix W, but the biggest W we have ever used in simulations was a 9×9 matrix. The symbol ** in this equation stands for 2-D convolution. We have to point out that the solution in such a formulation exists only for linear, space-invariant distortion functions with finite (space-limited) response.

The two-dimensional convolution is a procedure described by the following equation:

$$\hat{f}(i, j) = \sum_{k=1}^{M} \sum_{l=1}^{M} g(i + k - n, j + l - n) \, w(k, l); \quad n = M/2 .$$
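The double sum above can be sketched directly in NumPy. This is only an illustrative sketch: the function name `filter2d` is hypothetical, the filter size M is assumed to be odd, the offset n is taken as the half-width so that the window is centred on pixel (i, j), and only interior pixels (where the whole window fits inside the image) are evaluated.

```python
import numpy as np

def filter2d(g, w):
    """Evaluate f_hat(i, j) = sum_k sum_l g(i+k-n, j+l-n) * w(k, l) on interior pixels."""
    M = w.shape[0]                 # filter is M x M, M assumed odd
    h = M // 2                     # half-width of the window (assumption for the offset n)
    rows, cols = g.shape
    f_hat = np.zeros((rows - 2 * h, cols - 2 * h))
    for i in range(h, rows - h):
        for j in range(h, cols - h):
            window = g[i - h:i + h + 1, j - h:j + h + 1]
            # weighted sum of the window, exactly as the equation is written (no kernel flip)
            f_hat[i - h, j - h] = np.sum(window * w)
    return f_hat
```

For example, a 3×3 averaging filter `np.ones((3, 3)) / 9` applied to an N×N image returns the (N-2)×(N-2) array of local means.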

The idea is to express this convolution filtering process as a matrix-vector multiplication. The filtering process can be cleanly expressed as a system of linear equations:

$$\hat{f} = Xw .$$

The matrix X now contains elements of the distorted image vector g, specially arranged to satisfy the above convolution expression. For this notation, the filtering matrix $W_{M \times M}$ has to be restacked into vector form:

$$w = [w(1,1), w(1,2), \ldots, w(1,M), w(2,1), \ldots, w(M,M)]^T .$$

To calculate the optimum filter weights, we need to define the ordering of elements in the image signal matrix X. Similar to the 1-D problem, a signal matrix can be created by stacking partial image vectors into a new matrix.

Stacking all existing input filter values together, we may define an $(N - 2k)^2 \times M^2$ input signal matrix as

$$X = [x(k, k), x(k, k+1), \ldots, x(N-k, N-k)]^T$$

with input signal vector

$$x(x, y) = [g(x-k, y-k), \ldots, g(x, y), \ldots, g(x+k, y+k)]^T .$$
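The stacking described above can be illustrated with a short NumPy sketch. The function name `signal_matrix` is hypothetical, indices are zero-based (window centres run from k to N-k-1, giving (N-2k)^2 rows), and each window is flattened row by row; any other fixed ordering would work equally well as long as w is stacked the same way.

```python
import numpy as np

def signal_matrix(g, k):
    """Stack every (2k+1) x (2k+1) window of image g as one row of X."""
    N = g.shape[0]                      # square N x N image assumed
    rows = []
    for x in range(k, N - k):           # window centres, row by row
        for y in range(k, N - k):
            window = g[x - k:x + k + 1, y - k:y + k + 1]
            rows.append(window.reshape(-1))   # input signal vector x(x, y)
    return np.array(rows)               # shape ((N-2k)^2, (2k+1)^2)
```

With `w_vec = W.reshape(-1)`, the product `signal_matrix(g, k) @ w_vec` then yields the filtered interior pixels in the same row-by-row order.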

The process of image restoration is therefore the problem of determining the optimum solution of the above expression. Several approaches are well known from the literature [21,22,...]; some of them are listed below:

• Optimization domains

− Fourier Computation Methods

− Linear Algebraic Restoration

• Optimization methods

− Inverse filtering

− Constrained and Unconstrained Least Squares filtering

− Geometrical Mean filtering

− Maximum Entropy filtering

− Pseudo-Inverse filtering

− Natural Algorithms

1.3 Some popular image restoration procedures

1.3.1 2-D Median Filters

Restoration of a video signal can be realized with 1-D median filtering performed on a frame-to-frame basis. Most applications, however, require two-dimensional filtering. The key property of this 2-D median filtering is that it combines two separate 1-D filters, one processing the pixels along the rows and the other processing the pixels along the columns. A virtual 2-D filter can thus be realized using two 1-D arrays of processors.
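The combination of a row filter and a column filter can be sketched as follows. This is a minimal sketch, not the authors' implementation: the function names, the window size of 3 and the edge-replicating border treatment are assumptions, and the cascade of two 1-D medians only approximates a true 2-D window median.

```python
import numpy as np

def median_1d(a, size, axis):
    """1-D running median of odd length `size` along one axis of a 2-D array."""
    h = size // 2
    pad = [(0, 0), (0, 0)]
    pad[axis] = (h, h)
    padded = np.pad(a, pad, mode="edge")          # replicate border pixels (assumption)
    windows = np.lib.stride_tricks.sliding_window_view(padded, size, axis=axis)
    return np.median(windows, axis=-1)            # window samples sit on the last axis

def separable_median(img, size=3):
    """Row-wise 1-D median followed by column-wise 1-D median."""
    return median_1d(median_1d(img, size, axis=1), size, axis=0)
```

A single impulse in an otherwise constant image is removed completely by this cascade, which is the typical use of median filtering against impulsive noise.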

1.3.2 Neural Networks

There exist many methods based on neural network models. A very popular statistically based method for image restoration is the maximum a posteriori (MAP) approach, based on Bayes' theorem; the iterative scheme proposed by J. Besag is known as Iterated Conditional Modes (ICM). The method updates the current solution $x_s$ at pixel s by taking into account all the available information. The best estimate of $x_s$, given $y_s$ and all current estimates $x_{S-\{s\}}$, can be obtained by maximizing

$$P(x_s \mid y_s, x_{S-\{s\}}) \propto P(y_s \mid x_s) \, P_s(x_s \mid x_{\delta s}) \quad \text{for all } s \in S .$$

The optimization procedure can fail if it becomes trapped in a local solution. In such a case natural algorithms, such as genetic algorithms or simulated annealing, can be applied in parallel or sequential order.

1.3.3 Wiener Filtering

Many methods already well known from 1-D signal processing are based on the Wiener optimum solution.

Following the convolution filtering approach presented in section 1.2, a reconstructed image element can be defined by the inner vector product

$$\hat{f}(x, y) = w^T x(x, y)$$

$$w = [w(1,1), w(1,2), \ldots, w(M,M)]^T$$

$$x(x, y) = [g(x-k, y-k), g(x-k+1, y-k), \ldots, g(x, y), \ldots, g(x+k, y+k)]^T, \quad M = 2k + 1 .$$

Based upon this interpretation, the Wiener error surface function is defined as the expectation of a single-point squared error value:

$$J(W) = E[e^2(x, y)]$$

$$J(W) = E[(f(x, y) - w^T x(x, y))^2]$$

$$J(W) = E[(f(x, y))^2] - 2 w^T E[x(x, y) f(x, y)] + w^T E[x(x, y) x^T(x, y)] \, w$$

For a stationary image, the expectation $c = E[(f(x, y))^2]$ is a constant value.

The next expectation, $p = E[x(x, y) f(x, y)]$, is called the cross-correlation vector between the desired and the distorted image.

$R = E[x(x, y) x^T(x, y)]$ is the auto-correlation matrix of the distorted signal. This is probably the most important quantity, and not only for the Wiener approach, since it determines the conditioning of the possible solution. Finally, we may write the squared-error surface function as

$$J(W) = c - 2 w^T p + w^T R w .$$

For stationary images, the mean-squared error J(W) is exactly a second-order function of the filter coefficients W. Such a parabolic surface is well known to possess a unique minimum $W_0$ at the point with zero gradient value:

$$\nabla = \frac{dJ(W)}{dw} = 0$$

$$-2p + 2Rw_0 = 0$$

$$Rw_0 = p .$$

The equation $Rw_0 = p$ is the so-called normal equation that defines the optimum solution for the convolution weight matrix W. We may notice that the solution requires a matrix inversion. As this is a complex procedure, the first attempts at this problem used very rough estimates of the optimum solution. This approach is called the LMS algorithm [18,19] and it is still a very popular engineering approach to the adaptive filtering problem. The procedure uses instantaneous estimates of the gradient of the error surface J(W) to iteratively approach the optimum solution.

The procedure can be extended from 1-D to 2-D adaptive filtering by searching for the solution either columnwise or rowwise. The solution is attractive due to its low computational requirements. The problem of this approach is that for large variances of the input signal, which are typical of image signals, the algorithm converges very slowly and may become unstable.
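A rowwise 2-D LMS scan of the kind described above might look as follows. This is a hedged sketch rather than the algorithm of [18,19]: the function name `lms_2d`, the step size `mu` (with the usual factor 2 of the gradient absorbed into it), the zero initial weights and the number of passes are all assumptions chosen for illustration.

```python
import numpy as np

def lms_2d(g, f, k=1, mu=0.01, passes=1):
    """Rowwise LMS adaptation of a (2k+1) x (2k+1) filter mapping image g towards f."""
    M = 2 * k + 1
    w = np.zeros(M * M)                       # stacked filter weights, start from zero
    rows, cols = g.shape
    for _ in range(passes):
        for x in range(k, rows - k):          # scan the image row by row
            for y in range(k, cols - k):
                xv = g[x - k:x + k + 1, y - k:y + k + 1].reshape(-1)
                e = f[x, y] - w @ xv          # instantaneous estimation error
                w = w + mu * e * xv           # step along the instantaneous gradient estimate
    return w.reshape(M, M)
```

The step size trades stability against convergence speed: for inputs with large power, `mu` has to be reduced, which slows down the adaptation.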

It would be better to determine the shape and orientation of the global error surface and then to calculate the best possible weight matrix in a single operation. Convolution with the resulting weight matrix can be used as the reconstruction procedure. The entire solution is based on the following correlation estimates:

$$\hat{R} = \frac{1}{N - M + 1} X^T X ,$$

$$\hat{p} = \frac{1}{N - M + 1} X^T d ,$$

where d stands for the desired image vector

$$d = [f(k+1, k+1), f(k+1, k+2), \ldots, f(k+1, N-k), \ldots, f(k+2, k+1), \ldots, f(N-k, N-k)]^T .$$

The normal equation is finally expressed as

$$R \cdot w_0 = p$$

$$w_0 = R^{-1} p$$

$$w_0 = (X^T X)^{-1} X^T d .$$

Although matrix R is generally smaller than matrix H the inversion may still remain an ill-conditioned problem and the solution for W unstable. That is the reason why special inversion techniques should be employed in solving such problems (SVD, eigendecomposition).

2 Systolic Linear Algebra Applications

2.1 An Overview of the Problem

The systolic array idea was first introduced by H. T. Kung and C. E. Leiserson [1], who defined such an array as a network of processors with rhythmical computation and propagation of data through the system. In systolic arrays data is pumped from cell to cell across the array, and the required computations are performed concurrently in the cells. Jose Fortes et al. [2] systematically analyze different approaches to transforming an algorithm represented in a high-level construct into a systolic architecture. They group all known transformation procedures into four classes:

• direct mapping from the algorithm-representation level to the systolic architecture,

• mapping from the algorithm representation over an algorithm model into hardware,

• mapping of previously designed architectures into a new architecture,

• symbolic transformations and transformations.

All currently known methods can be found in the second class, such as the H. T. Kung method, the Moldovan and Fortes method, the Miranker and Winkler method, the S. Y. Kung method, the Quinton method, the Cappello and Steiglitz method, etc. In the Cappello and Steiglitz method each index corresponds to a single axis of the geometric space. Each point in this space corresponds to a processor with simple computational operations.

Among researchers, S. Y. Kung's approach [4] is very popular; here the algorithm is represented by a Signal Flow Graph (SFG). After some operations the resulting Signal Flow Graph, with operation and delay modules, maps straightforwardly onto the systolic array.

Most modern DSP applications are based on linear algebra algorithms. In sequential algorithms the complexity of the algorithm depends on the required computation and storage capacity. The complexity analysis of parallel algorithms includes another important parameter, the communication required. Therefore, in massively parallel computation the most important factors are computation, communication and memory.

Data distribution limitations and the finite number of processing elements restrict us to a special class of applications, where recursion and local dependency play a very important role. These restrictions limit the generality of the possible mapping procedures.

2.2 Systolic Array Algorithms

After identification of the tasks and possible VLSI architectures, new algorithms with a high degree of parallelism and regularity and with low communication overheads have to be developed [6].

An array algorithm is a set of rules for solving a problem in a finite number of steps on a number of locally connected processors. Array algorithms are characterized by synchronicity, concurrency control, granularity and communication geometry. A tool for the design of systolic algorithms has been proposed by Leiserson and Saxe [7].

This criterion defines a special class of algorithms that are recursive and locally dependent. The great majority of digital signal processing algorithms possess such properties. A typical class of such algorithms is that of matrix-based algorithms. Major computational needs in signal processing and applied mathematical problems can be reduced to a basic set of matrix operations and other related algorithms [8]. All these algorithms involve repeated application of relatively simple operations, with regular structure and local interconnections of the computing network. This leads to the computational wavefront [5].

The recursive nature of the algorithm and the local data dependency give rise to continuous waves of data. The computation starts with one element and propagates across the processor array. This concept of locality and recursivity provides a theoretical basis for the design of highly parallel processor arrays.

2.3 Architecture of VLSI arrays

Several types of arrays are distinguished according to the data flow and the dimensions of the array. Usually there are one, two or three data paths, with the same or opposite directions. The subclasses of one-dimensional, or linear, arrays are referred to as Unidirectional Linear Array (ULA), Bidirectional Linear Array (BLA), and Three-path communication Linear Array (TLA) [6]. Besides linear arrays, triangular, square and hexagonal geometries of processor arrays are commonly used. Some of the common architectures are presented in Figure 1.

Figure 1: Some examples of systolic arrays: (a) triangular array, (b) square array, (c) BLA array, (d) hexagonal array

2.4 Basic Linear Algebra Algorithms Used for Image Processing

Digital image and signal processing encompasses a broad spectrum of mathematical methods. These include transform techniques, convolution and correlation techniques in filtering processes, and a set of linear algebraic methods such as matrix multiplication, pseudo-inverse calculation, linear system solvers, different decomposition methods, geometric rotation and annihilation.

Generally we can classify all signal processing algorithms into two groups: basic matrix operations and special signal processing algorithms. Fortunately, most of the algorithms fall into the classes of matrix calculations, convolution, or transform-type algorithms. These algorithms possess common properties such as regularity, locality and recursiveness.

In this paper we need to define the speedup of a parallel algorithm. It is defined as the ratio of the corresponding sequential and parallel times. If we define:

$n_p$ as the number of processors,

$\tau_p$ as the time required by the algorithm on $n_p$ processors,

$\tau_1$ as the time required by the same algorithm on one processor,

then the speedup is $\tau_1 / \tau_p > 1$.

Another important parameter is the efficiency of the calculation, defined as $\tau_1 / (n_p \tau_p)$.

2.4.1 Inner vector multiplication

The inner product of two n-dimensional vectors x and y is obtained as the product of the row vector $x^T$ and the column vector y and can be given as:

$$x^T y = \sum_{j=1}^{n} x_j y_j .$$

Sequentially it can be computed in (2n-1) steps; on a parallel computer with n processors it can be computed in 1+log n steps (logarithm to base 2). The speedup of the parallel version is approximately 2n/log(2n) and the achieved efficiency is 2/log(2n).
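The step count quoted above can be illustrated with a small sequential simulation of the parallel schedule. The function name `parallel_inner_product` is hypothetical; each pass of the loop stands for one parallel step in which disjoint pairs are added concurrently, and n ≥ 1 is assumed.

```python
def parallel_inner_product(x, y):
    """Simulate an n-processor inner product and count the parallel steps."""
    # Step 1: all n products are formed concurrently, one per processor.
    vals = [a * b for a, b in zip(x, y)]
    steps = 1
    # Remaining steps: pairwise additions in a binary tree, one tree level per step.
    while len(vals) > 1:
        nxt = [vals[i] + vals[i + 1] for i in range(0, len(vals) - 1, 2)]
        if len(vals) % 2:              # an unpaired element is carried to the next level
            nxt.append(vals[-1])
        vals = nxt
        steps += 1
    return vals[0], steps
```

For n = 8 the simulation needs 4 parallel steps compared with 2n-1 = 15 sequential operations, i.e. a speedup of 15/4 and an efficiency of 15/32.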

2.4.2 Matrix-vector multiplication

Multiplication of an n×m matrix A by a vector x of dimension m results in

$$y = Ax$$

where y is an n-element vector. The i-th element of y is defined as:

$$y_i = \sum_{j=1}^{m} a_{ij} x_j ,$$

where $a_{ij}$ is an element of the matrix A. The Uniform Linear processor Array structure is convenient for this operation, where one data stream flows to the right and the other data stream flows from the top down (Figure 2).

Figure 2: Processor array for matrix-vector multiplication

The proposed parallel solution uses a linear processor array with n processor elements, as shown in Figure 2. The total execution time of the algorithm equals t=2n-1.
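The timing of such a linear array can be checked with a small cycle-by-cycle simulation. The sketch below is one possible data-flow arrangement, assumed for illustration: the elements of x move to the right by one processor per step, the rows of A enter from the top with a skew of one step per processor, and processor i accumulates $y_i$. The name `systolic_matvec` is hypothetical.

```python
import numpy as np

def systolic_matvec(A, x):
    """Simulate y = A x on a linear array of n accumulating processor elements."""
    n, m = A.shape
    y = np.zeros(n)
    pipe = [None] * n                  # x value currently held by each processor
    steps = n + m - 1                  # equals 2n - 1 for a square matrix
    for t in range(steps):
        # The x stream advances one processor to the right; a new element enters on the left.
        pipe = [x[t] if t < m else None] + pipe[:-1]
        for i in range(n):
            if pipe[i] is not None:
                j = t - i              # column index of the a_ij arriving from the top now
                y[i] += A[i, j] * pipe[i]
    return y, steps
```

For a 3×3 matrix the simulation finishes after 5 steps, in agreement with t = 2n-1.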

2.4.3 Matrix-matrix multiplication

Multiplication of an m×n matrix A by an n×p matrix B results in a new matrix C of dimension m×p. Matrix C is given by C=A·B, where the elements are defined as:

$$c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj} .$$

This method can be realized with an array of processors of dimension m×p. The principle is the same as in Figure 4. The connections are realized in the horizontal and vertical directions. A mesh-connected processor array is therefore convenient for this operation, where the data stream of matrix B flows to the right and the data stream of matrix A flows from the top down. The elements of matrix C are stored in the appropriate processors of the array. In the case of the matrix-matrix computation the expected speedup is $O(n^3 / \log n)$.
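The data movement on an m×p mesh can be sketched with a cycle-by-cycle simulation. This is an illustrative model only: here the rows of A are assumed to enter from the left and the columns of B from the top (the mirrored arrangement computes the same products), both streams are skewed by one step per row or column, and processor (i, j) accumulates $c_{ij}$. The name `systolic_matmul` is hypothetical.

```python
import numpy as np

def systolic_matmul(A, B):
    """Simulate C = A B on an m x p mesh in which every cell accumulates one c_ij."""
    m, n = A.shape
    p = B.shape[1]
    C = np.zeros((m, p))
    a_reg = np.zeros((m, p))           # A operands, moving one cell to the right per step
    b_reg = np.zeros((m, p))           # B operands, moving one cell down per step
    steps = m + n + p - 2
    for t in range(steps):
        a_reg[:, 1:] = a_reg[:, :-1].copy()
        b_reg[1:, :] = b_reg[:-1, :].copy()
        for i in range(m):             # skewed input of row i of A at the left boundary
            k = t - i
            a_reg[i, 0] = A[i, k] if 0 <= k < n else 0.0
        for j in range(p):             # skewed input of column j of B at the top boundary
            k = t - j
            b_reg[0, j] = B[k, j] if 0 <= k < n else 0.0
        C += a_reg * b_reg             # every cell multiplies and accumulates concurrently
    return C, steps
```

With this skewing the operands $a_{ik}$ and $b_{kj}$ meet in cell (i, j) at step i + j + k, so the whole product is available after m + n + p - 2 steps.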

2.4.4 Linear equation solvers

Solving a system of linear equations is one of the most important problems in DSP. The problem is to find the solution vector x of dimension (n×1) for a given set of n linear equations Ax=y, where A is a nonsingular matrix of dimension (n×n). The problem can be solved by computing the inverse matrix $A^{-1}$, that is $x = A^{-1} y$. Computing the inverse is computationally very intensive, and the procedure is numerically unstable. Instead, a triangularization procedure is often used to triangularize matrix A. An upper triangular system $A^* x = y^*$, where $A^*$ is an n×n upper triangular matrix, is finally solved by back-substitution.

In the numerical analysis literature there are many matrix triangularisation methods, such as Gauss elimination, QR and LU decomposition. Other effective methods for solving the system of equations also exist, namely bidiagonalization methods and Singular Value Decomposition methods.

Triangularisation of matrices

Different techniques may be applied to obtain a triangular matrix decomposition. The most commonly used are methods using Givens rotations or Householder reflections.

Although Householder reflections are proven to be more efficient in sequential algorithms, this is not the case for parallel execution. Using $O(n^2)$ processors, direct implementations of Householder's reduction and the Gram-Schmidt algorithm require O(n log n) steps. Givens' reduction can be modified to produce a parallel algorithm in O(n) steps with the same number of processors.

The QR triangularization procedure uses Givens rotations to annihilate the lower triangular elements. For each annihilation, one rotation is to be performed. The entire process of triangularization could be written as:

$$R = Q^T \cdot A$$

$$Q^T = Q(n-1, n) \cdots Q(k, n) \cdots Q(k, k+1) \cdots Q(1, 2)$$

where $Q(k, i)$ is an identity matrix except for the four elements at the intersections of rows and columns k and i,

$$\begin{bmatrix} \cos\theta_{ki} & \sin\theta_{ki} \\ -\sin\theta_{ki} & \cos\theta_{ki} \end{bmatrix}, \quad \theta_{ki} = \arctan\frac{a_{ki}}{a_{kk}} .$$

After the algorithm has been transformed into a system of uniform recurrence equations, the mapping to a systolic structure is straightforward. The result is a triangular systolic array, as shown on Figure 3.

Two different purpose processor elements are used. Elements on the diagonal are simply delay elements used to transfer the values of b coming from the top to the right. Other elements perform Givens parameter generation in the first operational step and Givens rotations afterwards. The results can be obtained from the right side of the array.

Actually, only n(n-1)/2 processor elements are required, as the delay elements on the diagonal of the array can simply be realized using registers instead of processor elements.

Figure 3: Triangular array for QR decomposition (processor elements perform Givens generation in the first step and Givens rotation afterwards)
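The arithmetic performed by such an array, Givens triangularization followed by back-substitution, can be sketched sequentially in NumPy. This is a minimal sketch of the numerical procedure, not a model of the array timing; the function name `givens_qr_solve` is hypothetical and A is assumed to be square and nonsingular.

```python
import numpy as np

def givens_qr_solve(A, y):
    """Triangularize A with Givens rotations and solve A x = y by back-substitution."""
    A = np.array(A, dtype=float)
    y = np.array(y, dtype=float)
    n = A.shape[0]
    for k in range(n - 1):                     # column whose subdiagonal is annihilated
        for i in range(k + 1, n):
            if A[i, k] == 0.0:
                continue                       # element is already zero
            theta = np.arctan2(A[i, k], A[k, k])        # Givens generation
            c, s = np.cos(theta), np.sin(theta)
            G = np.array([[c, s], [-s, c]])
            A[[k, i], k:] = G @ A[[k, i], k:]  # Givens rotation of rows k and i
            y[[k, i]] = G @ y[[k, i]]          # same rotation applied to the right-hand side
    x = np.zeros(n)
    for k in range(n - 1, -1, -1):             # back-substitution on the triangular system
        x[k] = (y[k] - A[k, k + 1:] @ x[k + 1:]) / A[k, k]
    return A, x
```

The returned matrix is the upper triangular factor, and the rotated right-hand side plays the role of $y^*$ in the triangular system.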

Eigenvalue and Singular Value Decomposition Problem

Other important methods in signal and image processing are eigenvalue/eigenvector and singular value/vector decomposition methods. Several parallel algorithms have been developed, such as parallel versions of the Jacobi and Jacobi-like algorithms, the QR algorithm for obtaining several eigenvalues of a symmetric tridiagonal matrix [10], etc.

The Jacobi algorithm is described in Golub [11] and in Wilkinson and Reinsch [12]. A real symmetric matrix A can be reduced to diagonal form by a sequence of plane rotations. In practice this iterative reduction of the off-diagonal elements is terminated when the off-diagonal elements become negligible compared to the elements on the main diagonal. The classical Jacobi algorithm eliminates the element in position (p, q) and its symmetric counterpart. The main task is to find a sequence for reducing the off-diagonal elements in parallel, where we are not concerned about destroying zeros that were previously introduced. It is possible to eliminate more than one element simultaneously in one sweep. The maximal number of off-diagonal elements annihilated in one sweep is $(n^2 - n)/2$ pairs. After only a few (about 8) sweeps the matrix becomes practically diagonal. The diagonal elements represent the eigenvalues, and the products of the individual transformations are taken as the eigenvectors. In a structure of $O(n^2)$ processors, one sweep requires O(n) steps, yielding a speedup over the sequential algorithm of $O(n^2)$. The suggested array is shown in Figure 4.

Figure 4: Systolic array implementation of the Jacobi decomposition
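A sequential version of the cyclic Jacobi sweeps can be sketched as below. It is an illustration under stated assumptions, not the parallel ordering used on the array: elements are annihilated one at a time in row order, the function name `jacobi_eigen`, the default of 8 sweeps and the stopping tolerance are hypothetical choices.

```python
import numpy as np

def jacobi_eigen(A, sweeps=8, tol=1e-12):
    """Cyclic Jacobi iteration for a real symmetric matrix A."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    V = np.eye(n)                               # accumulated product of the rotations
    for _ in range(sweeps):
        off = np.sqrt(np.sum(A ** 2) - np.sum(np.diag(A) ** 2))
        if off < tol:                           # off-diagonal part is negligible
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p, q] == 0.0:
                    continue
                d = A[q, q] - A[p, p]
                sgn = 1.0 if d >= 0 else -1.0
                # rotation angle with |theta| <= pi/4 that zeroes A[p, q] and A[q, p]
                theta = 0.5 * np.arctan2(sgn * 2.0 * A[p, q], abs(d))
                c, s = np.cos(theta), np.sin(theta)
                J = np.eye(n)
                J[p, p] = c
                J[q, q] = c
                J[p, q] = s
                J[q, p] = -s
                A = J.T @ A @ J                 # plane rotation applied from both sides
                V = V @ J
    return np.diag(A).copy(), V                 # eigenvalue estimates and eigenvectors
```

The columns of the returned matrix V approximate the eigenvectors belonging to the returned diagonal values, in the same order.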

Other methods reduce the matrix to tridiagonal form or upper Hessenberg form, depending on whether the matrix is symmetric or not. If the matrix is symmetric tridiagonal, we may apply the QR algorithm. This method is described in Wilkinson and Reinsch [12].

Singular Value Decomposition of matrices is useful in multidimensional signal processing. Matrix A can be factorized as $Q_1 \Sigma Q_2^T$, where $Q_1$ is an m×m orthogonal matrix, $Q_2$ is an n×n orthogonal matrix, and $\Sigma$ has the diagonal form

$$\Sigma = \begin{bmatrix} D & 0 \\ 0 & 0 \end{bmatrix}, \quad D = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_r),$$

where $\sigma_1 \geq \sigma_2 \geq \ldots \geq \sigma_r > 0$ and r is the rank of matrix A. The form

$$A = Q_1 \Sigma Q_2^T = \sum_{i=1}^{r} \sigma_i u_i v_i^T$$

is called the SVD of the matrix A, where the singular values $\sigma_i$ are the square roots of the nonzero eigenvalues of $A^T A$, and $u_i$ and $v_i$ are column vectors of the matrices $Q_1$ and $Q_2$ respectively. The column vectors of $Q_1$ are the eigenvectors of $A A^T$ and the column vectors of $Q_2$ are the eigenvectors of $A^T A$.

The preferable method for solving the SVD problem is described in Golub and Van Loan [11]. The described technique finds U and V (the matrices $Q_1$ and $Q_2$ above) simultaneously by implicitly applying the symmetric QR algorithm to $A^T A$.

This method can also be applied to the least-squares problem, a common problem in signal and image processing.
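The properties of the decomposition stated above can be checked numerically on a small example. The snippet below is only a sanity check using NumPy's library routine `np.linalg.svd`; the 3×2 test matrix is an arbitrary choice and not taken from the paper.

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [0.0, 2.0]])

# Full SVD: U is 3x3 (Q1), Vt is the transpose of the 2x2 matrix Q2, s holds the singular values.
U, s, Vt = np.linalg.svd(A)
r = np.linalg.matrix_rank(A)

# Rank-one expansion A = sum_i sigma_i * u_i * v_i^T over the r nonzero singular values.
A_sum = sum(s[i] * np.outer(U[:, i], Vt[i]) for i in range(r))

# Eigenvalues of A^T A in descending order; their square roots are the singular values.
eig = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
```

Here `A_sum` reproduces A up to rounding errors, and `s ** 2` agrees with `eig`.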

3 LS SVD digital image filtering

For illustration, the use of singular value decomposition in two-dimensional filtering applications will be presented. First, the Wiener solution will be extended to the two-dimensional problem by introducing a special formulation of the image signal matrix. The problem will be solved algebraically and implemented as a two-dimensional convolution filter. The Wiener normal equation will be solved by using the singular value decomposition of the image signal matrix. The effectiveness of the suggested method will be illustrated on a practical filtering problem.

3.1 Image Restoration

We have decided to represent the degradation model for our imaging system in the form of discrete linear point-spread degradation functions. For discrete image F degraded to image G and subjected to additive noise N, we may write

$$g(x, y) = \sum_{u=1}^{N} \sum_{v=1}^{N} h(x, y, u, v) \, f(u, v) + n(x, y)$$

or, alternatively, in tensor notation

$$[G] = [[H]]\{[F]\} + [N]$$

with two-dimensional matrices G, F and N and using the four-index operator H [22].

The objective of restoration is to find an inverse to the degradation function. The solution presented is not valid for all cases of image degradation. In some cases it is possible to use a convolution filter to restore the image. The solution may then be represented in the form

$$\hat{F} = W \ast\ast\, G ,$$

with the symbol ** standing for 2-D convolution. We have to point out that a solution in such a formulation exists only for linear, space-invariant distortion functions with finite (space-limited) response.

The general adaptive filter representation for this case is illustrated in Figure 5.

The filter operates on a real image (signal matrix) X that is corrupted with noise. The desired signal (reference image) is also provided. The filtering parameters can be represented in the form of an M×M matrix W, and the filtering process may be represented by convolving the image input X with the matrix W. During the adaptation, the filtering weights may be changed in order to obtain the optimal solution.

The filtering result is given by

$$\hat{f}(x, y) = \sum_{i=0}^{M-1} \sum_{j=0}^{M-1} w(i, j) \, g(x + i - k, y + j - k), \quad M = 2k + 1 .$$

Figure 5: SVD based 2-D adaptive filter

The difference between the desired and the resulting image

e( , ) f( , ) $f( , )x y x y x y= −

is called the estimation error. From Wiener filter theory, optimal filtering coefficients W are defined by the minimum mean-square error criteria. The objective function

should be minimized for W to obtain the optimum filter. J( ) [e ( , )]W = E x y2

For this particular example, Wiener optimal solution is to be applied. The idea is well known from 1-D adaptive filtering, where instantaneous estimates of gradient of the error surface J(W) are used to approach the optimum solution iteratively. The algorithm is popularly called LMS algorithm. It is possible to extend the algorithm to be used in both x and y image dimensions, iteratively searching for the solution either columnwise or rowwise. The procedure is numerically convenient due to low storage and computing requirements. The problem of this approach is that the instantaneous estimates of the error surface have relatively large variances. The estimate of their gradient vectors may then not always be pointing to a global optimum; the fact could cause unstable performance of the algorithm. The stability may be improved using smaller adaptation step-size, however this seriously affects the convergence rate of the procedure.

As already suggested in Section 1.3.3, the normal equation

$$R w_0 = p$$

is to be solved for w, using special inversion techniques. One that has proven very efficient is the method based on the singular value decomposition (SVD) of a matrix.



3.3 Singular Value Decomposition

One of the methods for the stable inversion of the matrix $R = X^T X$ is called SVD pseudo-inversion.

Rather than calculating the straight inverse $R^{-1} = (X^T X)^{-1}$, it is better to apply the pseudo-inversion technique directly to the matrix X. The solution for w can then be expressed directly as

$$w_0 = X^{+} d$$

where the pseudo-inverse $X^{+}$ is defined in terms of the products of the singular-value decomposition of X,

$$\Sigma = U^T X V$$

The procedure is numerically stable and its solution is unique in that its vector norm is minimum [18].
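As an illustration of the pseudo-inverse solution, the sketch below computes $w_0 = X^{+} d$ from the SVD of X, using the standard expansion of the pseudo-inverse over the nonzero singular values. The function name `svd_pinv_solve` and the relative threshold `rel_tol`, below which a singular value is treated as zero, are assumptions of this example and are not taken from the paper.

```python
import numpy as np

def svd_pinv_solve(X, d, rel_tol=1e-10):
    """Minimum-norm least-squares solution w = X^+ d via the SVD of X."""
    # Thin SVD: X = U @ diag(s) @ Vt
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Treat very small singular values as zero (assumed threshold).
    keep = s > rel_tol * s.max()
    # Coefficients u_i^T d / sigma_i for the retained singular values.
    coeffs = (U[:, keep].T @ d) / s[keep]
    # Weighted sum of the right singular vectors v_i.
    return Vt[keep].T @ coeffs
```

Dropping the near-zero singular values is what keeps the inversion stable when X is rank deficient or badly conditioned; the result then coincides with the minimum-norm solution.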

Figure 6: Sharpening of a blurred image using the 2D LS SVD algorithm: (a) original image, (b) image blurred with a 5-by-5 low-pass convolution filter, with additive noise, (c) b, restored with a 7-by-7 2D LS SVD filter, (d) b, inverse filtered, (e) b, noiseless, (f) e, restored

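In terms of the singular values $\sigma_i$ and the corresponding singular vectors $u_i$ and $v_i$ of X, the solution is

$$w = \sum_{\substack{i=1 \\ \sigma_i \neq 0}}^{M^2} \frac{u_i^T d}{\sigma_i}\, v_i$$

The convolution operator W can be created by restacking the values of the vector w back into the M×M matrix form:

$$W = \begin{bmatrix} w(1) & \cdots & w(M) \\ \vdots & \ddots & \vdots \\ w((M-1)M+1) & \cdots & w(M^2) \end{bmatrix}.$$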
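The complete non-iterative procedure can be sketched end to end. How the data matrix X is assembled is not spelled out in this section, so the construction below is an assumption: each row of X holds one M×M window of the degraded image in row-major order, with zero padding at the borders, and d holds the corresponding reference pixels. The names `build_data_matrix` and `ls_svd_filter` are hypothetical.

```python
import numpy as np

def build_data_matrix(g, m):
    """Stack every m-by-m window of g (zero padded) as one row of X."""
    k = m // 2
    rows, cols = g.shape
    padded = np.pad(g.astype(float), k, mode="constant")
    X = np.empty((rows * cols, m * m))
    for i in range(m):
        for j in range(m):
            # Column i*m + j corresponds to weight w(i, j).
            X[:, i * m + j] = padded[i:i + rows, j:j + cols].ravel()
    return X

def ls_svd_filter(g, f, m):
    """Least-squares filter weights W and the filtered image f_hat."""
    X = build_data_matrix(g, m)
    d = f.astype(float).ravel()
    w = np.linalg.pinv(X) @ d          # NumPy's pinv is SVD based
    W = w.reshape(m, m)                # row-major restacking of w into W
    f_hat = (X @ w).reshape(g.shape)   # equivalent to applying W to g
    return W, f_hat
```

A call such as `W, f_hat = ls_svd_filter(degraded, reference, 7)` would correspond to the 7-by-7 filter size used in the experiments, under the assumptions listed above.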

The non-iteratively calculated filtering parameters are optimal for the specific image/distorted image combination. They may be directly applied in a classical two-dimensional convolution filter.

3.4 Implementation of the procedure

The procedure may be implemented as a systolic array algorithm. The actual algorithm is composed of the partial linear algebra solutions presented in Section 2. Note that the array that performs the singular value decomposition is almost identical to the eigendecomposition array.
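To make the computational pattern concrete, here is a sequential sketch of an SVD computed by plane rotations. It assumes a one-sided (Hestenes) Jacobi scheme, in the spirit of the Jacobi-type methods cited in [14]-[17]; this section does not state which variant the array uses, so the choice of variant, the function name `jacobi_svd` and its parameters are assumptions.

```python
import numpy as np

def jacobi_svd(A, sweeps=30, tol=1e-12):
    """One-sided Jacobi SVD sketch: returns U, sigma, V with A ~= U diag(sigma) V^T.

    Singular values are returned unsorted.
    """
    U = np.array(A, dtype=float)   # columns are rotated until mutually orthogonal
    n = U.shape[1]
    V = np.eye(n)                  # accumulates the applied rotations
    for _ in range(sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = U[:, p] @ U[:, p]
                beta = U[:, q] @ U[:, q]
                gamma = U[:, p] @ U[:, q]
                if abs(gamma) <= tol * np.sqrt(alpha * beta):
                    continue       # columns p and q are already orthogonal
                rotated = True
                zeta = (beta - alpha) / (2.0 * gamma)
                if zeta == 0.0:
                    t = 1.0
                else:
                    t = np.sign(zeta) / (abs(zeta) + np.sqrt(1.0 + zeta * zeta))
                c = 1.0 / np.sqrt(1.0 + t * t)
                s = c * t
                for mat in (U, V):  # apply the same rotation to U and V
                    col_p = mat[:, p].copy()
                    mat[:, p] = c * col_p - s * mat[:, q]
                    mat[:, q] = s * col_p + c * mat[:, q]
        if not rotated:
            break
    sigma = np.linalg.norm(U, axis=0)          # column norms = singular values
    U = U / np.where(sigma > 0, sigma, 1.0)    # normalise the nonzero columns
    return U, sigma, V
```

Each rotation touches only two columns, so rotations on disjoint column pairs are independent of one another. That independence is what makes Jacobi-type methods natural candidates for an array of locally connected processor elements.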

Figure 6 shows the sharpening of blurred images using the two-dimensional least-squares SVD algorithm. The original image is shown in Figure 6a. The image was blurred using a 5-by-5 low-pass convolution filter, and some uncorrelated noise was added at the end of the blurring procedure (Figure 6b). The image was restored using a 7×7 adaptive algorithm; the result is shown in Figure 6c. The results of inverse filtering of the same image are presented in Figure 6d. From these results we can deduce that the proposed algorithm is less sensitive to noise than the classical inverse filtering procedure. The image blurred without the presence of noise and the noiseless image sharpened using the 2D LS SVD algorithm are shown in Figures 6e and 6f, respectively.

Figure 7: Removing of noise using the 2D LS SVD algorithm: (a) noisy image, (b) a, 2D LS SVD filtered, (c) a, low-pass filtered

The same procedure applied to noise removal is shown in Figure 7. It is clear from the images that the 2D LS SVD algorithm does not converge to the expected low-pass solution. A large amount of the uncorrelated noise has been successfully removed from the image while the sharp edges of the image are preserved. The softening of image contours is a common problem when low-pass filters are used for noise removal (Figure 7c).



The simulation results show that the Wiener filtering principle can successfully be implemented in image restoration. Methods well known from the linear algebra theory may be applied instead of classical methods based on Fourier transformation. The effectiveness of the procedure may be improved using special updating techniques.

4 Conclusion

Almost all of the linear algebra operations presented here and suggested for use in digital signal processing applications share one characteristic: they consist of a huge number of relatively basic mathematical operations. The fact that the operations are repetitive, yet applied to a wide set of data, inspired us to employ several processor elements performing the same task on separate data elements in parallel. The special properties of this processing problem allow us to construct a massive array of identical processor elements, which concurrently perform the necessary numerical operations.

There exist several well-known parallel computer architectures; the architecture may vary according to the processor elements applied, reconfigurability, data interchange connections, etc. The architecture to be applied to a specific problem depends mostly on the problem itself. As digital signal processing demands high-speed computing with fixed procedures at relatively low cost, general-purpose parallel computers are not convenient for use. Digital signal processing is a data-oriented computing problem, so architectures with global data interchange are to be avoided. What we really need is an array of locally interconnected processor elements with local memory. The processor elements should synchronously perform the same set of operations on the data structure. This architecture is a systolic array: the rhythmical operation of the array reminds us of the systole of the heart.

The basic approach to mapping techniques and some possible applications were presented in this chapter. However, this was only a brief introduction to the world of special-purpose VLSI systolic architectures. More details on the described procedures, as well as on optimization techniques not presented here, may be found in the literature.

Bibliography

[1] H. T. Kung and C. E. Leiserson, Systolic Arrays (for VLSI), Tech. rep. CS-79-103, Carnegie Mellon University, Pittsburgh, PA, Apr. 1978.

[2] J. A. B. Fortes, K. S. Fu, B. W. Wah, Systematic approaches to the design of algorithmically specified systolic arrays, Proc. IEEE ICASSP'85, IEEE Computer Society Press, March 1985, pp. 300-303.

[3] P. R. Cappello, K. Steiglitz, Unifying VLSI array designs with geometric transformations, Proc. of 1983 Int. Conf. on Parallel Processing, 1983, pp. 448-457.

[4] S. Y. Kung, VLSI Array Processors, Prentice Hall, 1988

[5] S. Y. Kung, K. S. Arun, R. J. Gal-Ezer, D. B. Rao, Wavefront array processor: Language, architecture and applications, IEEE Trans. Comput., C-31 (1982), pp. 1054-1066.



[6] M. Gušev, Processor Array Implementations of Systems of Affine Recurrence Equations for Digital Signal Processing, PhD dissertation, University of Ljubljana, June 1992.

[7] C. E. Leiserson, J. B. Saxe, Optimizing synchronous systems, J. VLSI and Computer Systems, 1983, pp. 41-67.

[8] S. Y. Kung, VLSI array processor for signal processing, Conf. Advanced Res. in Integrated Circuits, MIT, Cambridge, Jan. 28-30, 1980.

[9] H. T. Kung, Notes on VLSI computation, Parallel Processing Systems, D. J. Evans ed., Cambridge University Press, 1983, pp. 339-356.

[10] A. H. Sameh, Numerical Parallel Algorithms - A Survey, High Speed Computer and Algorithm Organisation, Academic Press, 1977, pp. 207-228.

[11] G. H. Golub, C. F. Van Loan, Matrix Computations, Johns Hopkins University Press, Baltimore, London, 1989.

[12] J. H. Wilkinson, The Algebraic Eigenvalue Problem, Oxford University Press, London, 1965.

[13] D. I. Moldovan, On the analysis and synthesis of VLSI algorithms, IEEE Trans. Comput., C-31 (1982), pp. 1121-1126.

[14] R. P. Brent, F. T. Luk and C. Van Loan, Computation of the Singular Value Decomposition using mesh-connected processors, J. VLSI and Computer Systems, vol. 1, no. 3, pp. 250-260, 1985.

[15] R. P. Brent, F. T. Luk, The solution of singular-value and symmetric eigenvalue problems on multiprocessor arrays, SIAM J. Sci. Stat. Comput., vol. 6, no. 1, January 1985.

[16] L. Thiele, Computational arrays for cyclic-by-rows Jacobi algorithms, SVD and Signal Processing: Algorithms, Applications and Architectures, Elsevier Science Publishers B. V., North Holland, 1988.

[17] U. Burnik, G. Cain and J. Tasič, On the Parallel Jacobi Method Based Eigenfilters, COST 229 WG4 Workshop on Parallel Computing, Funchal, Portugal, 1993.

[18] S. Haykin, Modern Filters, Macmillan, New York, 1989.

[19] S. Haykin, Adaptive Filter Theory, Prentice Hall, Englewood Cliffs, N. J., 1991.

[20] J. Tasič, et al., Eigenanalysis in Adaptive FIR Filtering, Internal Report, University of Westminster, January 1993.

[21] R. C. Gonzalez, R. E. Woods, Digital Image Processing, Addison-Wesley Publishing Company, 1992.

[22] H. C. Andrews, B. R. Hunt, Digital Image Restoration, Prentice Hall, Englewood Cliffs, New Jersey, 1977.
