Proceedings of International Joint Conference on Neural Networks, Orlando, Florida, USA, August 12-17, 2007

A Hardware-friendly Support Vector Machine for Embedded Automotive Applications

Davide Anguita, Member, IEEE, Alessandro Ghio, Student Member, IEEE, Stefano Pischiutta, Sandro Ridella, Member, IEEE

Abstract - We present here a hardware-friendly version of the Support Vector Machine (SVM), which is useful to implement its feed-forward phase on limited-resource devices such as Field Programmable Gate Arrays (FPGAs) or microcontrollers, where a floating-point unit is seldom available. Our proposal is tested on a machine-vision benchmark dataset for automotive applications.

I. INTRODUCTION

In recent years, interest in hardware implementations of Support Vector Machines (SVMs) has increased and several examples have appeared in the literature, targeting both digital [1] and analog or hybrid implementations [2], [3]. The final goal of these proposals is to build dedicated devices for embedded systems, where it is necessary to satisfy severe resource constraints on power consumption and silicon area. In fact, digital embedded systems often rely on microcontrollers, Digital Signal Processors (DSPs) or Field Programmable Gate Arrays (FPGAs), which reach better resource/performance ratios than general-purpose microprocessors, but require a careful implementation design.

The main problems in implementing the SVM feed-forward phase on FPGAs or resource-limited processors are, among others: (a) the short register length used for storing the variables of the trained machine, and (b) the computational requirements needed to compute the SVM kernel. Usually, unless the training is performed on-chip [1], the coefficients of the SVM are found off-line, on a conventional computer and using floating-point arithmetic; only later are they translated into a more suitable form (e.g. fixed-point or integer arithmetic). The effect of this translation, which can be performed by rounding or truncating the SVM coefficients to the desired precision, is a quantization noise that can be predicted, with good approximation, for most practical purposes [4]. Furthermore, if all the variables of the trained SVM are translated into fixed-point format, it is possible to take advantage of recent results and avoid the use of expensive (in terms of hardware resources) multipliers, so that the kernel computation becomes easier to implement [5].
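As an illustration of this off-line translation step, the following minimal sketch (our own, not part of the original paper; function names are hypothetical) quantizes floating-point coefficients to a fixed-point format by rounding or truncation:

```python
import numpy as np

def to_fixed_point(values, frac_bits=12, mode="round"):
    """Translate floating-point SVM coefficients to fixed-point integers.

    Each value is scaled by 2**frac_bits and then rounded or truncated;
    the resulting integers are what a device without an FPU would store.
    """
    scaled = np.asarray(values, dtype=np.float64) * (2 ** frac_bits)
    quantized = np.round(scaled) if mode == "round" else np.trunc(scaled)
    return quantized.astype(np.int64)

def quantization_noise(values, frac_bits=12, mode="round"):
    """Difference between the original coefficients and their fixed-point images."""
    fixed = to_fixed_point(values, frac_bits, mode) / (2 ** frac_bits)
    return np.asarray(values) - fixed
```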

Here we propose a different approach: we slightly modify the process of finding the coefficients of the SVM, by describing a method for building a "hardware-friendly" SVM from the beginning, starting from the training phase, so as to

D. Anguita, A. Ghio, S. Pischiutta and S. Ridella are with the Department of Biophysical and Electronic Engineering (DIBE), University of Genoa, Via Opera Pia 11A, 16145 Genoa, Italy (e-mail: {Davide.Anguita, Sandro.Ridella}@unige.it, {ghio, pischiutta}@dibe.unige.it).

build an SVM that is more sound from a theoretical point of view.

Section II briefly describes the hardware-friendly SVM; Section III details a method for searching for its optimal hyperparameters and, finally, Section IV shows the experimental results on an automotive application (i.e. pedestrian detection).

II. A SUPPORT VECTOR MACHINE FOR DIGITAL HARDWARE

Let us consider a dataset composed of l patterns {(x_1, y_1), ..., (x_l, y_l)}, where x_i ∈ ℜ^m and y_i = ±1. The SVM learning phase consists in solving the following Constrained Quadratic Programming (CQP) problem:

\min_{\alpha} \; \frac{1}{2} \alpha^T Q \alpha + r^T \alpha, \qquad 0 \le \alpha_i \le C \;\; \forall i \in [1, \ldots, l]    (1)

where r_i = -1 ∀i, and Q is a symmetric positive semidefinite l × l matrix with q_ij = y_i y_j K(x_i, x_j), defined through a Mercer kernel K(·,·) [6]. After solving the above problem, the feed-forward phase can be computed as

f(x) = \sum_{i=1}^{l} y_i \alpha_i K(x_i, x)    (2)

Note that, differently from the conventional CQP problem [6], we omit the equality constraint Σ_i y_i α_i = 0, which is equivalent to searching for an SVM without the bias term (b = 0). The advantage of this simpler formulation is obvious when dealing with limited-precision variables, because it allows us to avoid the computation of b. In fact, it is well known that the bias term does not appear in Eq. (1) and it must be computed using the Karush-Kuhn-Tucker (KKT) conditions [7], which are satisfied only at the solution point α*. Since our objective is to find a fixed-point solution, which will differ, in general, from α*, it would be incorrect to compute the bias term through the KKT conditions [8]. It can be noted, however, that the theoretical framework of the SVM is preserved, even with b = 0, if the kernel satisfies some weak density properties [9]. Valid kernels are, for example, the Gaussian and the Laplacian ones: in these cases, the bias term can be safely removed.
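As a plain illustration of the bias-free decision function of Eq. (2) (the function names and the Gaussian kernel used here are our own choices, not prescribed by the paper):

```python
import numpy as np

def gaussian_kernel(xi, x, gamma=1.0):
    # One of the dense kernels for which the bias term can be safely dropped.
    return np.exp(-gamma * np.sum((np.asarray(xi) - np.asarray(x)) ** 2))

def svm_forward(support_vectors, labels, alphas, x, kernel=gaussian_kernel):
    """Bias-free feed-forward phase: f(x) = sum_i y_i * alpha_i * K(x_i, x)."""
    return sum(y * a * kernel(sv, x)
               for sv, y, a in zip(support_vectors, labels, alphas))

# The predicted label is the sign of f(x); no bias term b is required.
```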

In order to build a hardware-friendly SVM, we perform the following normalization

\beta_i = \alpha_i \, \frac{2^n - 1}{C}    (3)


and apply it to Eq. (1), obtaining

\min_{\beta} \; \frac{1}{2} \beta^T Q \beta + s^T \beta    (4)

0 \le \beta_i \le 2^n - 1 \quad \forall i \in [1, \ldots, l]    (5)

where s_i = -(2^n - 1)/C ∀i. Note that this formulation provides the same solution as the original one, but allows us to remove the regularization constant C from the constraints, making it possible to represent the integer part of the coefficients of the SVM with exactly n bits.
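A minimal sketch of this rescaling step (our own illustration; the helper name is hypothetical):

```python
import numpy as np

def normalize_alphas(alphas, C, n_bits):
    """Rescaling of Eq. (3): beta_i = alpha_i * (2**n_bits - 1) / C.

    Since 0 <= alpha_i <= C, the rescaled coefficients satisfy
    0 <= beta_i <= 2**n_bits - 1, so their integer part fits in n_bits bits.
    """
    betas = np.asarray(alphas, dtype=np.float64) * (2 ** n_bits - 1) / C
    assert np.all((betas >= 0) & (betas <= 2 ** n_bits - 1))
    return betas
```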

The feed-forward phase of the SVM can now be computed by

f(x) = \sum_{i=1}^{l} y_i \beta_i K(x_i, x).    (6)

The output of the SVM is scaled by (2^n - 1)/C with respect to the original one, but its sign, which corresponds to the classification label, is not affected. Any dense positive definite kernel can be used in the above formulation, but a Laplacian kernel is easier to implement in digital hardware, since it avoids the use of expensive multipliers and requires only some shifters and adders [5]. We choose here the following Laplacian kernel:

K(x_i, x) = 2^{-\gamma \| x_i - x \|_1}    (7)

where γ > 0 is the kernel hyperparameter and the 1-norm is defined as ‖x‖_1 = Σ_i |x_i|.

To allow for a simple computation of the kernel, we also restrict the values of the hyperparameter to be a power of two:

\gamma = 2^p    (8)

where p is a (negative or positive) integer value.

Let us suppose to represent the kernel values with u > 1 bits; then

0 \le K(x_i, x_j) \le 1 - 2^{-u} \quad \forall i \in [1, l]    (9)

(note that the value K(x, x) = 1 must be treated separately). Analogously, we suppose to represent each feature of x with v > 1 bits, so that

0 \le x_i \le 1 - 2^{-v} \quad \forall i \in [1, m]    (10)

where m is the dimensionality of x.
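To make the quantized setting concrete, the following sketch (our own, written in plain floating-point arithmetic for readability; the multiplier-free shift-and-add scheme of [5] is not reproduced here) evaluates the Laplacian kernel of Eq. (7) with γ = 2^p, keeps it inside the u-bit range of Eq. (9), and accumulates the feed-forward sum of Eq. (6):

```python
import numpy as np

def laplacian_kernel_q(xi, x, p, u_bits):
    """Laplacian kernel of Eq. (7) with gamma = 2**p, quantized to u_bits bits."""
    E = np.sum(np.abs(np.asarray(xi, dtype=float) - np.asarray(x, dtype=float)))  # 1-norm distance
    k = 2.0 ** (-(2.0 ** p) * E)
    step = 2.0 ** (-u_bits)
    # Keep the quantized value inside [0, 1 - 2^-u]; K(x, x) = 1 is simply clipped
    # here, whereas the paper treats that case separately.
    return min(np.floor(k / step) * step, 1.0 - step)

def svm_forward_hw(support_vectors, labels, betas, x, p, u_bits):
    """Feed-forward phase of Eq. (6) with integer betas and quantized kernel values."""
    return sum(y * b * laplacian_kernel_q(sv, x, p, u_bits)
               for sv, y, b in zip(support_vectors, labels, betas))
```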

III. SEARCHING FOR THE OPTIMAL HYPERPARAMETERS

The tuning of the SVM hyperparameters (C, γ) is one of the main problems that must be solved when searching for the optimal classifier. Even though some practical methods have been suggested for deriving them in a very simple and efficient way [10], the most effective procedure is to solve the related CQP problem several times, with different (C, γ) pairs, and estimate the generalization error at each step by means of some resampling method (e.g. k-fold Cross-Validation) [11]. Finally, the optimal hyperparameters are chosen in correspondence with the minimum of the estimated generalization error.
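A minimal sketch of this standard grid-search procedure (our own illustration; train_fn and error_fn are hypothetical placeholders for a solver of Eqs. (4)-(5) and an error estimator):

```python
import numpy as np
from itertools import product

def grid_search(train_fn, error_fn, X, y, C_grid, gamma_grid, k=10, seed=0):
    """Exhaustive (C, gamma) search with k-fold cross-validation.

    train_fn(X, y, C, gamma) returns a trained model; error_fn(model, X, y)
    returns its error rate. Both are placeholders for the user's own code.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    best = (None, np.inf)
    for C, gamma in product(C_grid, gamma_grid):
        errors = []
        for i in range(k):
            val = folds[i]
            trn = np.concatenate([folds[j] for j in range(k) if j != i])
            model = train_fn(X[trn], y[trn], C, gamma)
            errors.append(error_fn(model, X[val], y[val]))
        if np.mean(errors) < best[1]:
            best = ((C, gamma), np.mean(errors))
    return best  # ((C, gamma) with minimum estimated generalization error, error)
```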

Unfortunately, both the number of steps and the size of the search space can severely influence the quality of the solution and the amount of computation time needed by the search procedure. Some proposals exist for finding the admissible search space for C [12], but no equivalent method is available for γ. We show here that, in the case of a hardware-friendly SVM, the number of admissible values for the hyperparameter γ can be defined in advance, allowing us to explore the search space exhaustively.

Let us define, to simplify the notation, E_i = ‖x_i - x‖_1 and K_i = K(x_i, x). We find the desired result by making use of Interval Arithmetic [13], which simplifies the derivation of the search-space upper and lower bounds. Within this framework, Eq. (9) can be rewritten as:

K_i \in [0, 1 - 2^{-u}]    (11)

Similarly, the bounds for the data features of Eq. (10) can be expressed by the following:

x_i \in [0, 1 - 2^{-v}].    (12)

Note that, if K_i = 0 ∀i, then, from Eq. (6), f(x) = 0; in other words, the classifier is not able to choose between the two classes. This implies that, for the correct functioning of the SVM, an index j must exist such that K_j ≠ 0. Therefore, if we quantize the kernel using u bits, there will be at least one case for which K_i ≥ 2^{-u}, so the interval for K_i can be refined:

K_i \in [2^{-u}, 1 - 2^{-u}]    (13)

The interval for the exponent E_i can be easily defined as:

E_i \in [2^{-v}, m(1 - 2^{-v})].    (14)

The lower bound follows from the fact that there is at least one Support Vector which differs by at least one bit from the input vector x; otherwise, the input vector would coincide with a Support Vector and the class would already be determined. The upper bound, instead, can be determined by considering the case where all the features of the input vector differ the most from any Support Vector.

Using Eq. (8), we can write:

K_i = 2^{-2^p E_i}    (15)

and applying it to Eq. (13):

2^{-2^p E_i} \in [2^{-u}, 1 - 2^{-u}]    (16)

Applying a logarithm to the interval, we obtain:

-2^p E_i \in [-u, \log_2(1 - 2^{-u})].    (17)

Interval Arithmetic properties state that a ∈ [b, c] is equivalent to -a ∈ [-c, -b], then:

2^p E_i \in [d, u]    (18)

where d = -\log_2(1 - 2^{-u}) > 0. Let us apply again the logarithm to obtain:

p + \log_2 E_i \in [\log_2 d, \log_2 u]    (19)

that can be written as:

p \in [\log_2 d - \log_2(m(1 - 2^{-v})), \; v + \log_2 u]    (20)

Since p must be an integer value, a reasonable choice is:

\lfloor \log_2 d - \log_2(m(1 - 2^{-v})) \rfloor \le p \le \lceil v + \log_2 u \rceil    (21)

Sometimes, this interval defines a space of admissible values for γ which is quite wide; this is due mainly to the worst-case analysis, which does not take into account the actual values stored in the dataset. Alternatively, it is possible to use the minimum and the maximum value of the exponent, E_i ∈ [E_min, E_max], computed using the actual training patterns. In this case, by substituting these values in Eq. (21), the admissible range becomes:

\lfloor \log_2 d - \log_2 E_{max} \rfloor \le p \le \lceil \log_2 u - \log_2 E_{min} \rceil    (22)
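Both ranges are straightforward to evaluate; the following sketch (our own, with hypothetical function names) implements Eq. (21) and Eq. (22):

```python
import numpy as np

def p_range_worst_case(u, v, m):
    """Admissible integer range for p from the worst-case analysis of Eq. (21)."""
    d = -np.log2(1 - 2.0 ** (-u))
    lower = np.floor(np.log2(d) - np.log2(m * (1 - 2.0 ** (-v))))
    upper = np.ceil(v + np.log2(u))
    return int(lower), int(upper)

def p_range_from_data(u, E_min, E_max):
    """Admissible integer range for p from the data-based analysis of Eq. (22)."""
    d = -np.log2(1 - 2.0 ** (-u))
    lower = np.floor(np.log2(d) - np.log2(E_max))
    upper = np.ceil(np.log2(u) - np.log2(E_min))
    return int(lower), int(upper)

# Example: u = v = 8 bits and m = 36 * 18 = 648 features (the image size used
# in Section IV) give the worst-case bound [-17, 11], matching Table I.
print(p_range_worst_case(8, 8, 648))
```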

IV. EXPERIMENTAL RESULTS

We tested our method on an automotive application, where the use of embedded systems is widespread, consisting of the detection of pedestrians against the background or other objects. We use the Daimler-Chrysler dataset [14], which has been introduced to benchmark this kind of system. The dataset consists of 8-bit grayscale images (36 x 18 pixels), divided into 4900 patterns representing a pedestrian crossing a road and 4900 non-pedestrian examples. We used one eighth of the entire set for training (1225 patterns), by randomly sampling it, and the remaining patterns for testing (8575).

During the training phase, the kernel is computed a priori with u bits by rounding the real value to the nearest fixed-point one. The optimization process is performed through a modified Sequential Minimal Optimization (SMO) algorithm [15] where only one α_i is changed at each iteration [16]. This is possible because we removed the equality constraint in Eq. (1).
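For intuition, here is a generic single-coefficient update for the box-constrained dual of Eqs. (4)-(5) (our own sketch of plain coordinate descent, not the modified SMO scheme of [15], [16], and without the integer rounding of the coefficients):

```python
import numpy as np

def coordinate_descent_dual(Q, s, B, n_iter=1000, seed=0):
    """Single-variable updates for the bias-free dual of Eqs. (4)-(5).

    Because the equality constraint was removed, each step optimizes one
    coefficient in closed form and clips it to the box [0, B], B = 2**n - 1.
    """
    rng = np.random.default_rng(seed)
    l = len(s)
    beta = np.zeros(l)
    for _ in range(n_iter):
        i = rng.integers(l)                # pick one coefficient per iteration
        grad_i = Q[i] @ beta + s[i]        # partial derivative of the objective
        if Q[i, i] > 0:
            beta[i] = np.clip(beta[i] - grad_i / Q[i, i], 0.0, B)
    return beta
```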

The optimal hyperparameters are found using the method described in the previous sections and a 10-fold Cross-Validation procedure to estimate the generalization error. In the model selection step, the floating-point α solution is used: the corresponding integer solution is found a posteriori, using n bits for representing β. Each variable of the SVM has been found by rounding the corresponding floating-point value to the nearest fixed-point or integer one: in a forthcoming paper we will analyse more effective techniques [16].

The number of bits of the variables describing the SVM is explored for testing its effect on the classifier performance and compared with the floating-point version. Tables I and II show the lower and the upper bounds for the exponent p of the hyperparameter γ using the worst-case analysis of Eq. (21) and the actual data analysis of Eq. (22), respectively. It can be seen that the obtained intervals are quite similar to (and sometimes tighter than) the bounds proposed, using only practical considerations, by [17].

Table III shows the results obtained using the floating-point SVM in double precision (64 bits)¹.

¹ The software used for the experiments is available at http://www.smartlab.dibe.unige.it

TABLE I
BOUNDS FOR p ∈ [a, b] USING THE WORST-CASE ANALYSIS OF EQ. (21).

         v = 6      v = 8      v = 12     v = 16
u = 6    [-15, 9]   [-15, 11]  [-15, 15]  [-15, 19]
u = 8    [-17, 9]   [-17, 11]  [-17, 15]  [-17, 19]
u = 12   [-21, 10]  [-21, 12]  [-21, 16]  [-21, 20]
u = 16   [-25, 10]  [-25, 12]  [-25, 16]  [-25, 20]

TABLE II
BOUNDS FOR p ∈ [a, b] USING THE DATA SET ANALYSIS OF EQ. (22).

         v = 6      v = 8      v = 12     v = 16
u = 6    [-15, 9]   [-15, 11]  [-15, 11]  [-15, 11]
u = 8    [-17, 9]   [-17, 11]  [-17, 11]  [-17, 11]
u = 12   [-21, 10]  [-21, 12]  [-21, 12]  [-21, 12]
u = 16   [-25, 10]  [-25, 12]  [-25, 12]  [-25, 12]

Table IV details the error rates obtained on the test set by varying the number of bits of the kernel (u) and the number of bits of the coefficients β (n). The number of bits of the input data has been fixed to v = 8, because they represent pixel gray values. As can be seen from the experimental results, 12 bits are sufficient for both u and n in order to obtain the same classification rate as the floating-point SVM. The same data are presented in Fig. 1, where it can be seen that there is a large region, corresponding to a small number of bits, where the performance is equivalent to the floating-point version.

Fig. 1. Trend of the error rate varying u (# of bits used for the kernel) and n (# of bits used for β).

In order to analyze the effect of each variable separately, we use floating-point precision for representing two variables, while the third one is quantized with a varying number of bits.

Fig. 2 shows the error rate when varying the number of bits (n) used for the coefficients β. As can be seen, 12 bits are sufficient for obtaining the same performance as the floating-point solution.

Fig. 3 shows the error rate when varying the number of bits (u) used for the kernel. In this case, 8 bits are sufficient for obtaining the best performance.

TABLE III
ERROR RATE ON THE TEST SET FOR THE FLOATING-POINT SVM.

C     γ       Error rate (%)
10    2^-7    5.96

TABLE IV
ERROR RATES FOR v = 8 BITS.

         n = 6    n = 8    n = 12   n = 16
u = 6    14.58    10.54    9.28     9.29
u = 8    17.66    9.83     6.18     6.19
u = 12   22.97    11.38    5.96     5.96
u = 16   20.58    10.53    5.95     5.96

Fig. 4 shows the error rate when varying the number of bits (v) used for the data. This is an interesting result, because it shows that the precision needed to represent the data is lower than expected: in fact, even though the pixels are described by 8 bits, our experimental results show that 4 bits are sufficient.

Fig. 2. Error rate on the test set varying the precision of β.

Fig. 3. Error rate on the test set varying the precision of the kernel.

Fig. 4. Error rate on the test set varying the precision of the data.

V. CONCLUSIONS

We described a method to build a hardware-friendly version of the Support Vector Machine. This approach allows us to find the minimum number of bits that must be used for implementing the SVM in digital hardware, without any performance degradation with respect to a floating-point version. A simple procedure for bounding the search space of the SVM hyperparameters has also been introduced. More refined rounding schemes are being explored and will be presented in a forthcoming work.

REFERENCES

[1] D. Anguita, A. Boni, S. Ridella, "A digital architecture for Support Vector Machines: theory, algorithm and FPGA implementation", IEEE Transactions on Neural Networks, vol. 14, pp. 993-1009, Sept. 2003.

[2] R. Genov, G. Cauwenberghs, "Kerneltron: Support Vector Machines in silicon", IEEE Transactions on Neural Networks, vol. 14, pp. 1426-1434, Sept. 2003.

[3] S. Chakrabartty, G. Cauwenberghs, "Sub-microwatt analog VLSI Support Vector Machine for pattern classification and sequence estimation", in Advances in Neural Information Processing Systems (Chapter 17), edited by M. I. Jordan, Y. LeCun and S. A. Solla, Dec. 2004.

[4] D. Anguita, G. Bozza, "The effects of Quantization on Support Vector Machines", Proc. of the IEEE Int. Joint Conference on Neural Networks, IJCNN 2005, Montreal, Canada, August 2005.

[5] D. Anguita, S. Pischiutta, S. Ridella, D. Sterpi, "Feed-Forward Support Vector Machine without multipliers", IEEE Transactions on Neural Networks, vol. 17, pp. 1328-1331, 2006.

[6] C. Cortes, V. Vapnik, "Support-vector networks", Machine Learning, vol. 20, pp. 273-297, 1995.

[7] N. Cristianini, J. Shawe-Taylor, "An introduction to Support Vector Machines and other kernel-based learning methods", Cambridge University Press, 2000.

[8] D. Hush, P. Kelly, C. Scovel, I. Steinwart, "QP Algorithms with Guaranteed Accuracy and Run Time for Support Vector Machines", Journal of Machine Learning Research, vol. 7, pp. 733-769, May 2006.

[9] T. Poggio, S. Mukherjee, R. Rifkin, A. Rakhlin, A. Verri, "b", in Uncertainty in Geometric Computations (Chapter 11), edited by J. Winkler, M. Niranjan, Oct. 2002.

[10] B.L. Milenova, J.S. Yarmus, M.M. Campos, "SVM in Oracle Database 10g: Removing the Barriers to Widespread Adoption of Support Vector Machines", in Proc. of the 31st Int. Conf. on Very Large Data Bases, Trondheim, Norway, pp. 1152-1163, Aug. 30 - Sep. 02, 2005.

[11] K. Duan, S.S. Keerthi, A.N. Poo, "Evaluation of Simple Performance Measures for Tuning SVM Hyperparameters", Neurocomputing, vol. 51, pp. 41-59, 2003.

[12] T. Hastie, S. Rosset, R. Tibshirani, J. Zhu, "The Entire Regularization Path for the Support Vector Machine", Journal of Machine Learning Research, vol. 5, pp. 1391-1415, 2004.

[13] G. Alefeld, J. Herzberger, Introduction to Interval Computations, Academic Press, 2006.

[14] S. Munder, D.M. Gavrila, "An Experimental Study on Pedestrian Classification", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, pp. 1863-1868, 2006.

[15] C.-J. Lin, "Asymptotic convergence of an SMO algorithm without any assumptions", IEEE Transactions on Neural Networks, vol. 13, pp. 248-250, 2002.

[16] D. Anguita, A. Ghio, S. Pischiutta, S. Ridella, "A Support Vector Machine with integer parameters", submitted to Neurocomputing.

[17] C.-W. Hsu, C.-C. Chang, C.-J. Lin, "A practical guide to support vector classification", Technical report, Dept. of Computer Science, National Taiwan University, July 2003.

