39fJ Analog Artificial Neural Network for Breast Cancer Classification in 65nm CMOS
Ruobing Hua and Arindam Sanyal
Electrical Engineering Department, University at Buffalo, Buffalo, NY.
Email: {ruobingh, arindams}@buffalo.edu
Abstract—An analog artificial neural network (ANN) classifier using a common-source amplifier based nonlinear activation function is presented in this work. A shallow ANN is designed in 65nm CMOS to perform binary classification on the breast cancer dataset and identify each patient's data as either benign or malignant. Use of the common-source amplifier structure simplifies the ANN and results in only 39fJ/classification at a 0.8V power supply and a core area of only 240µm². The classifier is trained using Matlab and validated using Spectre simulations.
Keywords—artificial neural network, analog AI circuit, classifier, CMOS, WBCD
I. INTRODUCTION
Artificial intelligence (AI) and machine learning algorithms are used in a wide variety of applications including medical diagnostic systems. While AI algorithms have traditionally been developed and tested on graphics processing units (GPUs), there have been recent efforts to implement AI classification algorithms using analog integrated circuits [1]–[5]. Compared to GPUs, the energy consumed by custom analog AI circuits is several orders of magnitude lower, thus holding the potential for integration of AI circuits into future internet-of-things (IoT) devices. However, existing analog AI circuits have so far demonstrated only multiply-and-add capabilities [2]–[4] or implemented a sub-circuit, such as linear regression [5], [6].
In this work, we report a complete analog artificial neural network (ANN) classifier that detects benign or malignant breast cancer on the popular Wisconsin breast cancer dataset (WBCD) [7]. The WBCD has 9 attributes corresponding to biopsy results on 699 patients and is grouped into two classes: benign (458 patients) and malignant (241 patients). We use an ANN with 1 hidden layer and a common-source (CS) amplifier based nonlinear activation function to implement the complete ANN classifier in 65nm CMOS technology. The proposed binary classifier consumes only 39fJ/classification, which is several orders of magnitude better than the gender detection system [3] with 655nJ/classification. The rest of this paper is organized as follows: the classifier training and circuit implementation are discussed in Section II, simulation results are presented in Section III and conclusions are drawn in Section IV.
II. PROPOSED ARCHITECTURE
The core arithmetic operation that a classifier needs to perform is given by f(∑ Wi xi), where f(·) can be a linear or nonlinear activation function, Wi is the weight vector and xi is the input vector to the layer implementing the activation
Fig. 1. CS amplifier and its transfer function
function. Analog implementation of the activation function is more energy and area efficient than digital implementation since analog computation leverages the full physical properties of the MOS device, whereas a digital implementation uses the MOS device only as a switch. We propose to take advantage of the inherent nonlinearity of a CS amplifier to implement the nonlinear activation function for the hidden and output layers. Instead of designing circuits which closely match widely used activation functions like tanh [8], we use a CS amplifier as shown in Fig. 1 to approximate the tanh activation function, thus greatly reducing the hardware cost compared to conventional circuit implementations of these activation functions. In order to get good classification accuracy with the proposed approximate activation functions, we import SPICE simulation data into our training algorithm. Fig. 2 shows the design flowchart used for the proposed classifier. In the first step, we design one slice each of the hidden and output layers in SPICE and characterize them completely before importing the data into Matlab. We extract the V-I transfer curves for the hidden and output neurons and use the characterization data as a look-up table during the ANN training phase. After the ANN is trained in Matlab, we use the trained weights to design the ANN in SPICE and perform simulation on the test set to validate the architecture. The ANN weights are encoded in the widths of NMOS transistors in both hidden and output layers.
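As an illustration, the look-up-table step described above can be sketched in Python. The `vin_samples`/`vout_samples` arrays and the tanh-shaped stand-in curve below are hypothetical placeholders for the actual Spectre characterization data, not the paper's measured transfer curve:

```python
import numpy as np

# Hypothetical characterization table: in the actual flow these points are
# exported from Spectre simulations of one hidden/output-layer slice.
vin_samples = np.linspace(0.27, 0.6, 34)             # neuron input voltages (V)
vout_samples = np.tanh(6.0 * (vin_samples - 0.435))  # stand-in transfer curve

def lut_activation(v):
    """Approximate the CS-amplifier activation by linear interpolation
    of the characterization table, as used during the training phase."""
    return np.interp(v, vin_samples, vout_samples)
```

During training, `lut_activation` replaces an analytic tanh so that the learned weights already account for the circuit's approximate nonlinearity.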
Algorithm 1 shows the pseudo-code used to train the ANN in Matlab based on parameters extracted from SPICE simulations. The ANN weights are updated by back-propagation as the network iterates through each input from the training set. We split the WBCD dataset randomly into a training set containing 419 samples and a test set containing 280 samples.

978-1-7281-2788-0/19/$31.00 ©2019 IEEE

Fig. 2. Flowchart for proposed analog classifier design

Algorithm 1 ANN Training pseudo-code
1: W1, W2 ← Random numbers from [-1, 1]
2: W1 ← Weight matrix for hidden layer
3: W2 ← Weight vector for output layer
4: NumberCorrect = 0
5: for i < Number of Training Iterations do
6:   for j < Size(Train set) do
7:     Input(j) ← Trainset(j)
8:     HiddenOutput(j) ← f(Bias; W1, Input(j))
9:     Output(j) ← g(Bias; W2, HiddenOutput(j))
10:    if Output(j) = TrainLabel(j) then
11:      NumberCorrect += 1
12:    end if
13:    delta1 ← (Output(j) − TrainLabel(j)) ∗ (1 − Output(j)²)
14:    delta2 ← (W2 ∗ delta1) ∗ (1 − HiddenOutput(j)²)
15:    W1 ← W1 − alpha ∗ (Input(j) ∗ delta2′)
16:    W2 ← W2 − alpha ∗ (HiddenOutput(j) ∗ delta1′)
17:  end for
18:  accuracy ← NumberCorrect / Size(Train set)
19: end for
20: if W1 > 0 then
21:   W1 ← 1
22: else
23:   W1 ← 0
24: end if
25: if W2 > 0 then
26:   W2 ← 1
27: else
28:   W2 ← 0
29: end if

The 9 attributes in the WBCD dataset take integer values between 0 and 10. We scale the attributes into the 0.27V-0.6V range before feeding them to the proposed ANN circuit. The ANN weights are initialized to random values at the start of training. The weights are updated using the well-known stochastic gradient descent algorithm [9], which computes the derivative of the error (mean squared error in our case) with respect to the current weights and calculates new weights based on the learning rate, the error derivative and the current weights. In order to reduce hardware cost, we constrain all the weights to be in {0, 1}. For any weight transistor that represents 1, the W/L ratio is
Fig. 3. Training accuracy versus iterations
Fig. 4. Schematic of ANN for breast cancer classification
200nm/60nm. We note that the mapping of the weights into binary values is the outcome of joint Matlab and SPICE simulation; it achieves correct classification with the fewest NMOS weight transistors. We can only verify that binary weights work for this particular dataset; they may not work for other datasets. Moreover, a network whose weight matrix is initialized directly as a binary matrix may be difficult or impossible to train with the back-propagation algorithm, so the weights are binarized only after training. Fig. 3 shows the training accuracy versus the number of iterations. The ANN takes fewer than 40 iterations to reach high accuracy.
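The training flow of Algorithm 1, including the attribute scaling and the post-training binarization described above, can be sketched as follows. The toy data, the analytic `tanh` stand-in for the CS-amplifier activation, and the learning rate are illustrative assumptions, not the paper's actual setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def scale_attributes(x):
    # Map WBCD integer attributes in [0, 10] to the 0.27V-0.6V input range.
    return 0.27 + (np.asarray(x, dtype=float) / 10.0) * (0.6 - 0.27)

# Toy stand-in data (the paper uses 419 training samples with 9 attributes).
X = scale_attributes(rng.integers(0, 11, size=(40, 9)))
y = rng.integers(0, 2, size=40).astype(float)

W1 = rng.uniform(-1, 1, size=(9, 5))   # hidden-layer weight matrix
W2 = rng.uniform(-1, 1, size=5)        # output-layer weight vector
alpha = 0.1                            # assumed learning rate

for epoch in range(40):
    for xj, tj in zip(X, y):
        h = np.tanh(xj @ W1)                 # stand-in for the CS-amplifier LUT
        o = np.tanh(h @ W2)
        delta1 = (o - tj) * (1 - o**2)       # output-layer error term
        delta2 = (W2 * delta1) * (1 - h**2)  # hidden-layer error term
        W1 -= alpha * np.outer(xj, delta2)   # SGD weight updates
        W2 -= alpha * h * delta1

# After training, constrain the weights to {0, 1} for the NMOS implementation.
W1b = (W1 > 0).astype(int)
W2b = (W2 > 0).astype(int)
```

The binarization is applied once at the end, matching the observation that initializing the network directly with binary weights may prevent back-propagation from converging.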
Fig. 4 shows the transistor-level schematic of the proposed classifier. The classifier has 1 input layer with 9 inputs, 1 hidden layer with 5 hidden neurons and 1 output layer with a single output neuron. The weights of the hidden layer as well as those of the output layer are encoded in the widths of the NMOS transistors in each neuron as shown in Fig. 4. The weight matrices for the hidden and output layers are also shown in Fig. 4. While the weight vectors are implemented by varying the widths of NMOS transistors, addition is performed in the current domain by summing the currents through each NMOS transistor using a diode-connected PMOS load. The pseudo-differential amplifier in the output layer has a large output swing which allows direct classification of the output as '0/1', with '1' indicating malignant breast cancer and '0' indicating a benign tumor. Since the activation functions are based on current-mode operation, the proposed classifier is immune to charge injection errors, unlike switched-capacitor implementations. The relatively simple nature of the proposed classifier circuit results in very low power consumption and allows classifications to be performed at high speed. As will be shown in Section III, the proposed classifier has good immunity against random mismatches and variations in power supply voltage.
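A behavioral sketch of the current-domain summation: each NMOS transistor whose width encodes a '1' weight contributes its drain current, and the currents sum at the diode-connected PMOS load. The square-law `i_of_v` characteristic below is a hypothetical stand-in for the real device curve, used only to illustrate the weighted current sum:

```python
def i_of_v(v, vth=0.3, k=1e-3):
    """Hypothetical square-law NMOS V-to-I characteristic (illustration only)."""
    return k * max(v - vth, 0.0) ** 2

def neuron_output(v_in, weights, i_of_v=i_of_v):
    """Behavioral model of one neuron: NMOS devices gated by the input
    voltages each draw a current scaled by the binary weight, and the
    currents sum into the diode-connected PMOS load."""
    return sum(w * i_of_v(v) for w, v in zip(weights, v_in))
```

A weight of 0 simply removes that transistor's current from the sum, which is why binary weights map so directly onto transistor widths.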
III. SIMULATION RESULTS
Fig. 5. Classifier layout in 65nm
Fig. 5 shows the layout of the proposed classifier designed in 65nm CMOS technology. The entire classifier core occupies an area of only 16µm × 15µm.
Fig. 6 shows the post-layout receiver operating characteristic (ROC) curve of the proposed classifier. An ROC curve plots the true positive (TP) rate versus the false positive (FP) rate at different classification thresholds, where the TP rate (TPR) and FP rate (FPR) are defined as
TPR = TP/(TP + FN);  FPR = FP/(FP + TN)    (1)
where FN is false negative and TN is true negative. For our dataset, TP is the number of samples that the classifier correctly identifies as malignant while TN is the number of samples the classifier correctly identifies as benign. Similarly, FP is the number of benign samples the classifier incorrectly identifies as malignant while FN is the number of malignant samples the classifier incorrectly identifies as benign. Lowering the classification threshold results in the classifier marking more samples as positive, thus increasing both TP and FP, while increasing the classification threshold lowers both TP and FP. The area under the ROC curve (AUC) is a measure of the separability of the two classes; a higher AUC value means the classifier is better at correctly distinguishing between the two classes. The AUC of the proposed classifier is 0.989, which indicates that the proposed classifier is very good at correctly identifying benign and malignant breast cancers irrespective of the distribution of the two classes in the test dataset.
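The TPR/FPR definitions in (1) can be evaluated at a single classification threshold as in the following sketch; the `roc_point` helper and its inputs are illustrative, not the paper's evaluation code:

```python
def roc_point(scores, labels, threshold):
    """Compute (FPR, TPR) at one classification threshold, per Eq. (1).

    scores: classifier outputs; labels: 1 = malignant, 0 = benign.
    """
    tp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == 1)
    fn = sum(1 for s, l in zip(scores, labels) if s < threshold and l == 1)
    fp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == 0)
    tn = sum(1 for s, l in zip(scores, labels) if s < threshold and l == 0)
    return fp / (fp + tn), tp / (tp + fn)
```

Sweeping the threshold and collecting these points traces out the ROC curve, whose area gives the AUC.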
Fig. 6. ROC curve for proposed classifier
Fig. 7 shows the classifier accuracy and energy versus the power supply voltage, Vdd, and the operating frequency. As seen from Fig. 7(a), the classifier has an accuracy of 0.964 as long as Vdd ≥ 0.7V. Once Vdd drops below 0.7V, the accuracy drops to 0.84 at Vdd = 0.6V. The classifier energy reduces with Vdd, and the classifier consumes only 39fJ at 0.8V. Fig. 7(b) shows classifier accuracy and energy versus operating frequency. The classifier maintains a high accuracy up to 400MHz, and the accuracy starts reducing as the classifier operating frequency exceeds 400MHz. The classifier energy is 39fJ at 300MHz.
Fig. 8 shows the histogram of classification accuracy extracted from 50 Monte Carlo runs across process and mismatch corners. It can be seen that the classifier has an average accuracy of 0.96 with a standard deviation of 0.017. Thus, the classifier accuracy is not significantly affected by random mismatches and process variations.
Table I compares our work with existing WBCD classifiers. Most WBCD classification has been done using neural networks implemented as Python or Matlab programs on GPUs,
Fig. 7. Classifier accuracy and energy versus (a) Vdd; (b) operating frequency
Fig. 8. Classifier accuracy versus Monte Carlo iterations
which can easily consume several mJ/classification [1]. It can be seen from Table I that the proposed classifier implemented in CMOS technology can achieve similar classification accuracy as GPU-based neural networks while consuming only 39fJ.
TABLE I
COMPARISON WITH STATE-OF-THE-ART WBCD CLASSIFIERS

           Accuracy (%)   Platform
[10]       99.54          GPU
[11]       95.57          GPU
[12]       97.40          GPU
[13]       99.58          GPU
[14]       90.8           FPGA
This work  96.43          CMOS IC
IV. CONCLUSION
In this paper, we have presented an analog machine learning classifier which uses a two-transistor common-source amplifier as its basic building block. We have used the popular WBCD breast cancer dataset, which has two classes, benign and malignant, depending on the values of 9 attributes obtained from biopsy data. The classifier employs constant weights, which makes the neural network less flexible; however, it can serve a special purpose, which in our case is classifying the WBCD dataset. The proposed classifier has an accuracy of 0.96 and consumes only 39fJ/classification, which is several orders of magnitude better than existing CMOS classifiers.
REFERENCES
[1] Z. Wang and N. Verma, "A low-energy machine-learning classifier based on clocked comparators for direct inference on analog sensors," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 64, no. 11, pp. 2954–2965, 2017.
[2] E. H. Lee and S. S. Wong, "A 2.5 GHz 7.7 TOPS/W switched-capacitor matrix multiplier with co-designed local memory in 40nm," in IEEE International Solid-State Circuits Conference (ISSCC), 2016, pp. 418–419.
[3] Z. Wang, J. Zhang, and N. Verma, "Realizing Low-Energy Classification Systems by Implementing Matrix Multiplication Directly Within an ADC," IEEE Trans. Biomed. Circuits and Systems, vol. 9, no. 6, pp. 825–837, 2015.
[4] F. N. Buhler, A. E. Mendrela, Y. Lim, J. A. Fredenburg, and M. P. Flynn,“A 16-channel noise-shaping machine learning analog-digital interface,”in IEEE Symposium on VLSI Circuits (VLSI-Circuits), 2016, pp. 1–2.
[5] J. Zhang, Z. Wang, and N. Verma, “A machine-learning classifierimplemented in a standard 6T SRAM array,” in IEEE Symposium onVLSI Circuits (VLSI-Circuits), 2016, pp. 1–2.
[6] D. P. Solomatine and D. L. Shrestha, "AdaBoost.RT: a boosting algorithm for regression problems," Neural Networks, vol. 2, pp. 1163–1168, 2004.
[7] Breast Cancer Wisconsin (Original) Data Set. [Online]. Available:https://archive.ics.uci.edu/ml/datasets/breast+cancer+wisconsin+(original)
[8] M. Carrasco-Robles and L. Serrano, "A novel CMOS current mode fully differential tanh(x) implementation," in IEEE International Symposium on Circuits and Systems, 2008, pp. 2158–2161.
[9] S.-i. Amari, “Backpropagation and stochastic gradient descent method,”Neurocomputing, vol. 5, no. 4-5, pp. 185–196, 1993.
[10] E. D. Ubeyli, "Implementing automated diagnostic systems for breast cancer detection," Expert Systems with Applications, vol. 33, no. 4, pp. 1054–1062, 2007.
[11] J. Abonyi and F. Szeifert, "Supervised fuzzy clustering for the identification of fuzzy classifiers," Pattern Recognition Letters, vol. 24, no. 14, pp. 2195–2207, 2003.
[12] M. Karabatak and M. C. Ince, "An expert system for detection of breast cancer based on association rules and neural network," Expert Systems with Applications, vol. 36, no. 2, pp. 3465–3469, 2009.
[13] A. Marcano-Cedeño, J. Quintanilla-Domínguez, and D. Andina, "Breast cancer classification applying artificial metaplasticity algorithm," Neurocomputing, vol. 74, no. 8, pp. 1243–1250, 2011.
[14] D. Selvathi and R. D. Nayagam, "FPGA implementation of on-chip ANN for breast cancer diagnosis," Intelligent Decision Technologies, vol. 10, no. 4, pp. 341–352, 2016.