
Diss. ETH No 13879

On the Design of Analog VLSI Iterative Decoders

A dissertation submitted to the
Swiss Federal Institute of Technology, Zürich
for the degree of
Doctor of Technical Sciences

presented by

Felix Lustenberger
Ing. en Microtechnique dipl. EPFL
born on May 31, 1969
citizen of Kriens (LU) and Honau (LU)

accepted on the recommendation of
Prof. Dr. George S. Moschytz, examiner
Prof. Dr. Hans-Andrea Loeliger, co-examiner
Prof. Dr. David A. Johns, co-examiner

Series in Signal and Information Processing, Volume 2
Hartung-Gorre Verlag, Konstanz, November 2000


For Rita, Simon, and David


Acknowledgements

First of all, I would like to thank my doctoral father, Prof. Dr. George S. Moschytz, for his confidence in me and my work, for giving me great scientific freedom, and for always keeping his office door open in case of doubts and setbacks. I very much admire his kindness in creating a fruitful working environment within the Signal and Information Processing Laboratory.

I am deeply indebted to my mentor and friend Prof. Dr. Hans-Andrea Loeliger, who set the initial spark for this fascinating research project. With his never-ending enthusiasm he led me safely through all the periods of frustration and exhaustion that are common to ambitious research projects. I very much appreciated the long technical, and sometimes also philosophical, discussions that broadened my understanding of coding theory, data communications, and signal processing.

Many thanks also to Prof. Dr. David A. Johns from the University of Toronto, Canada, for agreeing to serve as co-examiner for the present doctoral dissertation.

My special thanks go to Markus Helfenstein and Felix Tarköy of our research team. Their maturity, both technical and personal, substantially added to the value of the results of this interdisciplinary research. It was, and still is, a great pleasure for me to share ideas and dreams with them.

All the colleagues at the laboratory helped to create an intellectually stimulating and friendly working environment. I especially wish to thank my roommates and friends Hanspeter Schmid and Pascal Vontobel, but also Dieter Arnold, Marcel Joho, Daniel Lippuner, Heinz Mathis, and Stefan Moser for the time they invested in long and profound discussions on many different topics. I also enjoyed, and learned a lot from, the collaboration with Georg Fromherz and Ermanno Schinca, whose diploma work I supervised.


I would also like to say 'thank you' to Max Dünki, who keeps our Sun workstation cluster constantly running and up to date. With great pleasure I also acknowledge the assistance with the design of printed circuit boards and the measurement setup provided by the technical staff of the laboratory, in particular Felix Frey, Patrick Schweizer, and Thomas Schärer.

Whenever there were problems that we were not able to solve at the laboratory, I always found kind people within the department of electrical engineering willing to help me. The following persons (in alphabetical order) provided me with practical solutions to many problems: Christoph Balmer, Hubert Käslin, Ruedi Köppel, and Andreas Wieland from the Design Center; Armin Deiss, Norbert Felber, Clemens Hammerschmied, Hanspeter Mathys, Michael Oberle, Dirk Pfaff, and Robert Reutemann from the Integrated Systems Laboratory (IIS); Didier Cottet and Michael Scheffler from the Electronics Laboratory (IfE); and Geert Bernaerts and Etienne Hirt from Art of Technology, a spin-off of IfE. Thank you very much, all of you, for your help.

My deepest gratitude goes to my wonderful family. My wife and best friend Rita always supported me during the time I worked on this dissertation, which is every bit as hard as doing the research itself. For months, my little sons Simon and David had their father only on weekends. I also thank my parents, who taught me in my younger years the curiosity and the passion to know how things work. All of them gave me the courage to finish this work.

This work was financially supported by the Swiss National Science Foundation under Grants 21-49619.96 and 20-55761.98, and by ETH Zurich under Grant 41-2639.5.


Abstract

The rapidly growing electronic networking of our society has created the need for a high-speed and low-power data communications infrastructure. Both voice and data communications have been made available to the mobile user. Additionally, more and more complex coding schemes and decoding algorithms are being introduced to protect user data from corruption during transmission over a communications channel. The aim of all these new coding and decoding approaches is to come close to the theoretical channel-capacity limit and thus make better use of signal power and channel bandwidth. The iterative probability-propagation algorithms that are used to decode state-of-the-art codes such as Turbo codes and low-density parity-check codes require tremendous computational power. Often, this computational complexity cannot be realized with a traditional digital design approach within a given power budget.

This thesis discusses the efficient implementation of high-performance decoding algorithms in analog VLSI technology. The building blocks are very simple analog translinear circuits that implement vector multipliers with essentially only one transistor per element of the outer product of two discrete probability distributions. The presented analog probability-propagation networks made of these building blocks are a direct image of the underlying sum-product algorithm. The design of these analog networks follows a heavily semiconductor-physics-centered, bio-inspired design approach that exploits, rather than fights against, the inherent nonlinear behaviour of the basic semiconductor devices. By using such a bio-inspired design approach, the performance of these networks in terms of speed, power consumption, or both is increased by at least a factor of 100 compared to digital implementations. Despite the use of very-low-precision circuit devices, a remarkable system-level accuracy can be achieved by such a large, highly connected analog network.


The first part of the thesis discusses the background of channel coding and decoding and the theoretical foundations of factor graphs and the sum-product algorithm, which operates by message passing on such graphs. This part provides a brief introduction to the information-theoretic aspects of this interdisciplinary research effort.

The second part of the thesis is devoted to the actual transistor-level implementation of the sum-product algorithm using very simple analog-VLSI computational building blocks. This part discusses the design-oriented aspects of the research; however, it relies heavily on the information-theoretic concepts introduced in the first part.

Finally, we present practical designs and design studies of several decoding networks. Algorithmic simulations, circuit simulations, and, where available, measurement results of the implemented decoding networks are presented. Two of the decoder examples were actually fabricated in a 0.8µm BiCMOS process. Additionally, application-specific design problems are discussed.

The thesis closes with a summary of the achieved results and propositions for future research in the field of analog decoding.

Keywords: Iterative decoding, low-density parity-check (LDPC) codes, repeat-accumulate (RA) codes, trellis codes, Turbo codes, maximum a-posteriori probability (MAP) decoding, maximum-likelihood (ML) sequence detection, sum-product algorithm, Viterbi algorithm, probability propagation, factor graphs, analog VLSI technology, bio-inspired networks.


Kurzfassung

The rapidly growing electronic networking of our society has created a great demand for fast, low-power data communications infrastructure. Both voice and data communication services have meanwhile become accessible to the mobile user. Additionally, ever more complex coding schemes and decoding algorithms are being introduced to protect user data from transmission errors. The goal of these new encoding and decoding schemes is to reach the theoretical channel-capacity limit, so that the available signal power and channel bandwidth can be exploited optimally. The iterative probability-propagation algorithms used to decode state-of-the-art codes (such as Turbo codes and low-density parity-check codes) require enormous computational power. For a given power budget, this computational complexity often can no longer be achieved with traditional digital design approaches.

This dissertation is concerned with the efficient analog VLSI implementation of powerful decoding algorithms. The building blocks of the presented technique are very simple analog translinear circuits implementing vector multipliers, where practically only one transistor is needed to form each element of the outer product of two discrete probability distributions. The analog probability-propagation networks assembled from these building blocks are a direct image of the underlying sum-product algorithm. The design process for these analog networks follows a semiconductor-physics-oriented, biologically inspired approach, in which the fundamentally nonlinear behaviour of semiconductor devices is exploited rather than fought against. By following this bio-inspired design approach, the performance in terms of power consumption or speed, or both, can be increased by at least a factor of 100 compared with an equivalent digital solution. Although only devices with very poor precision properties are used, these highly connected analog networks achieve an astonishing system-level accuracy.

The first part of the dissertation provides background information on channel coding and decoding and presents the theoretical foundations of factor graphs and of the sum-product algorithm, which is applied to such graphs according to the so-called message-passing principle. This part gives a brief introduction to the information-theoretic aspects of this interdisciplinary research effort.

The second part of the thesis is devoted to the actual transistor-level implementation of the sum-product algorithm by means of very simple computational building blocks. It thus discusses the design-oriented aspects of the work; however, it draws heavily on the information-theoretic concepts introduced in the first part.

Finally, the third part presents and discusses practical implementations and designs of various decoding networks. Algorithmic simulations, circuit simulations, and, where available, measurement results of the decoding networks we built are presented. Two of these decoder examples were fabricated in a 0.8µm BiCMOS technology. In addition, application-specific design problems are discussed.

The dissertation is rounded off by a summary of the achieved results and by proposals for further research projects in the field of analog decoding.

Keywords: Iterative decoding, low-density parity-check codes, repeat-accumulate codes, trellis codes, Turbo codes, maximum a-posteriori probability (MAP) decoding, maximum-likelihood (ML) sequence detection, sum-product algorithm, Viterbi algorithm, probability propagation, factor graphs, analog VLSI technology, bio-inspired networks.


Contents

1 Introduction
  1.1 Motivation
  1.2 Outline of this Thesis

2 Background Information
  2.1 About Coding
    2.1.1 General Communication System
    2.1.2 Types of Codes
    2.1.3 Hamming Codes
    2.1.4 Low-Density Parity-Check Codes
    2.1.5 Trellis Codes
    2.1.6 Turbo Codes
    2.1.7 Channel Models
    2.1.8 Types of Errors
  2.2 Analog Viterbi Decoding
    2.2.1 Computational Considerations on the VLSI Implementation of Viterbi Decoders
    2.2.2 Circuit Implementation of Analog and Mixed-Signal Viterbi Decoders
  2.3 Network Decoding
    2.3.1 Non-Algorithmic Diode Decoding
    2.3.2 Neural Network and Fuzzy Logic Decoding
    2.3.3 Analog Network Decoding
  2.4 Bio-Inspired Networks

3 The Probability-Propagation Algorithm
  3.1 Problem Statement
    3.1.1 Basic Decision Theory
    3.1.2 MAP Decision Rule
    3.1.3 ML Decision Rule
    3.1.4 Decoding Rules
  3.2 Factor Graphs
    3.2.1 Definition of Factor Graphs
    3.2.2 Examples of Factor Graphs
  3.3 The Sum-Product Algorithm
    3.3.1 The Sum-Product Update Rules
    3.3.2 Message Passing Schedules
  3.4 Probability Calculus Modules
    3.4.1 Soft-Logic Gates
    3.4.2 Building Blocks with Multiple Inputs

4 Circuit Implementation
  4.1 Basic Circuit
    4.1.1 Signal Summation
    4.1.2 Basic Translinear Network Theory
    4.1.3 Core Circuit for Matrix Multiplications
    4.1.4 Log-Likelihood Interpretation of Input and Output Distributions
  4.2 Soft-Logic Gates and Trellis Modules
  4.3 Connecting Building Blocks
    4.3.1 Current- or Voltage-Mode Connections?
    4.3.2 Stacking and Folding Building Blocks
    4.3.3 Scaling Probabilities
  4.4 Implementation Issues
    4.4.1 Device Matching Considerations
    4.4.2 Finite Current Gain
    4.4.3 Finite Output Resistance
    4.4.4 Thermal Effects
    4.4.5 Other Implementation Issues

5 Decoder Examples
  5.1 Decoder for a Simple Trellis Code
    5.1.1 Code Description
    5.1.2 Implementation Using Discrete Transistors
  5.2 Decoder for a Tail-Biting Trellis Code
    5.2.1 General Description
    5.2.2 Circuit Design
    5.2.3 Simulation Results
    5.2.4 Test Setup
    5.2.5 Measurement Results
    5.2.6 Power/Speed Comparison
  5.3 Decoder for a Turbo-Style Code
    5.3.1 General Code and Decoder Description
    5.3.2 Circuit Design
    5.3.3 Automating the Design Process
    5.3.4 Simulation Results
    5.3.5 Test Setup
  5.4 Probability-Based Analog Viterbi Decoder
    5.4.1 Reformulation of the Viterbi Algorithm
    5.4.2 Proposed Implementation
  5.5 High-Level Study of Plain CMOS Implementations
    5.5.1 Continuous CMOS Model from Weak to Strong Inversion
    5.5.2 Redundant Equations and Code Realizations
    5.5.3 High-Level Simulation Results
  5.6 Appendix — Schematics of the Tail-Biting Trellis Decoder
  5.7 Appendix — Schematics of the Turbine Decoder

6 Concluding Remarks
  6.1 Summary of the Results
  6.2 Ideas for Further Work and Outlook

A Selected Circuit Structures
  A.1 Transistor Terminals and Voltages
  A.2 Cascode Structures
  A.3 Current Mirrors

List of Abbreviations
List of Symbols
Bibliography


Chapter 1

Introduction

1.1 Motivation

Coding: why, where, and how

In the last few years, the demand for efficient and reliable digital data transmission and data storage has increased tremendously. This trend has been accelerated by the emergence of high-speed data networks, even at the local-area scale. Already in 1948, Shannon [1] showed that it is possible, by proper encoding of the information source, to reduce errors induced by a noisy channel to any desired level, without sacrificing the rate of information transmission or storage of a given channel, as long as that rate is below the so-called channel capacity. The term encoding in this context means that we add redundant information to our data stream. This type of coding is generally called channel coding.¹ The redundancy introduced by channel coding helps the decoding block of a receiver find the best decision for the sent data sequence. Decoding can be regarded as a projection of the possibly infinite number of received messages of the channel-output vector space onto the channel-input vector space of the codewords. As a very simple example, consider the situation shown in Fig. 1.1. Every mark is assumed to be a valid configuration or codeword of our code, whereas the received data can be any point in the plane. The individual codewords are schematically separated by decoding-region border lines. The decoder block in the receiver path tries to match the incoming data, which is corrupted by channel noise, symbol interference, or other destructive events, to the nearest valid configuration, i.e., it creates an estimate of the most probable codeword. This is a very simplistic picture of the purpose of coding, but it helps to see how coding transforms an arbitrary channel into an almost reliable bit pipe.

Figure 1.1: A simple code viewed as discrete points in the plane of all messages possibly received by a data communications receiver. [Codewords x1, ..., x9 shown as marks separated by decoding-region border lines.]

¹ The two other main types of coding are source coding (or data compression), where we try to reduce the redundancy of the data source, and cryptography, which modifies the characteristics of the data source such that unwanted observers cannot see the information content.

Complex coding gives more protection

In general, we can state that the more complex our coding scheme, the more protection we get from coding. On the other hand, decoding becomes more complicated. The computational complexity of decoding for codes that try to reach the theoretical limits defined by Shannon [1] grows more than linearly, i.e., quadratically or even exponentially. Today's state-of-the-art codes such as Turbo codes [2-4], low-density parity-check codes [5-8], and other similarly built codes need huge computational power to deliver real-time results.

Rapidly growing complexity calls for new decoding techniques

Because of this rapidly growing computational effort, new decoding techniques and technologies are being investigated. The problem can be tackled in different ways. The first and most obvious approach is to boost processing speed by using more sophisticated semiconductor processes. Unfortunately, unless more parallelism is introduced into the decoding system, the processing speed increases only linearly with the clock frequency. Altering the decoding system at the algorithmic level is a second and generally more successful approach to the complexity dilemma. But these alterations usually come at the cost of precision, i.e., the decoding is no longer ideal. This can be seen as a loss in bit-error rate (BER). For practical solutions, however, a trade-off can often be found that more or less satisfies both demands, decoding speed and precision. The third approach, which is gaining more and more momentum, is the bio-inspired network-decoding approach.


Bio-inspired circuits may solve the dilemma

Today's comprehension of the functioning of the human brain is that it consists of an agglomeration of neurons [9-11], each having relatively poor precision, but highly interconnected. Comparable to this understanding, one can think of highly connected electrical networks consisting of very simple local processors that globally exchange imprecise information. Interestingly, both 'networks' can reach outstanding precision at the system level through a high degree of interconnectivity. In the case of the human brain, learning modifies and even densifies the connection pattern between the individual neurons. Thus, information storage seems to be a matter of a three-dimensional arrangement of individual cells far more complex than any known standard computer hardware. Nevertheless, electrical networks inspired by their biological counterparts, although relatively simple compared to them, promise very fast and robust system behaviour. Thus, one of the sources that had the greatest influence on our motivation and inspiration is the bio-inspired background in general, and Mead's outstanding work on neuromorphic systems in particular [11]. Mead's work clearly showed the direction to take when very efficient electronic solutions for processing analog signals are demanded. Besides the large academic interest, the bio-inspired design approach is also found in industrial products such as Logitech's Marble trackball [12], OCR readers for banking applications [13], and combined angle/torque sensors for automotive applications [14].

1.2 Outline of this Thesis

The present thesis is divided into six chapters. This chapter gave some introductory comments on the general motivation for the implementation of analog decoders. Chapter 2 is devoted to a short review of the basic notions of coding theory and the presentation of some inspiring sources, with many citations of interesting literature. In Chapter 3, we present the algorithmic background and propose a new dissection of the general sum-product algorithm into generic trellis computation modules. In the fourth chapter, we fill the gap between the mathematical representation of the building blocks and their VLSI implementation. Thereby we introduce the generic transistor-level implementation of these modules, and we discuss many practical design aspects of large probability-propagation networks composed of such modules. Then, in Chapter 5, we present and discuss five designs of decoders for error-control codes. They represent the practical part of the thesis. The examples are worked out and discussed in varying depth, mainly due to time constraints of the whole research project. Besides an implementation using discrete BJT devices, we have designed two complete decoders in BiCMOS technology, which have been partly tested. Additionally, we discuss a Viterbi-decoder design using our generic trellis calculation modules. Finally, we propose a plain CMOS implementation and discuss its properties by means of high-level simulations. The contents of the fifth, as well as of the fourth chapter, are mainly results of original work.

We conclude the thesis with a summary of the results achieved during the whole research project and give some comments on where future work may be directed and where we see the future of analog probability-calculation networks.


Chapter 2

Background Information

2.1 About Coding

Terms and definitions for the binary case

In this section, we briefly introduce some terms and definitions of coding theory. The section primarily addresses circuit designers, to help them understand what follows. We restrict ourselves to the binary case, i.e., codes over the Galois field GF(2). This means that our information unit is the binary digit, or bit. The extension to codes over GF(q) for q > 2 is generally possible and straightforward, but is omitted for the sake of brevity.

2.1.1 General Communication System

'Figure 1' of information theory

Textbooks on coding such as [15, 16] often start with what is commonly known as 'Figure 1' in information theory [1]. Fig. 2.1 shows this system overview of a data communication (transmission or storage) system. An information source emits a sequence of binary digits (bits), called the uncoded sequence u. This sequence is transformed into the coded sequence x by an encoder. We assume that during transmission over the channel (or the storage medium), the coded sequence x is corrupted by a noise vector n. This assumption is correct if the noise is of additive nature and no inter-symbol interference is present, i.e., the channel has no filter transfer function. Thus we observe a noisy sequence y at the input of our decoder. The decoder then estimates the most probably sent data sequence û using y.


Figure 2.1: Simplified model of an encoder/decoder system. [Digital source → encoder → coding channel with additive noise n → decoder → digital sink; sequences u, x, y, û.]

2.1.2 Types of Codes

Different types of coding

In general, we distinguish between two main types of codes in common use today: block codes and convolutional codes. The output of a block encoder is strictly block oriented and is generated by combinatorial operations, whereas convolutional encoders create data streams of possibly infinite length. Additionally, the output of a convolutional encoder is created by a finite-state machine, i.e., the encoder incorporates memory that tracks the history of the incoming data bits. In the following we briefly discuss the two cases and introduce some terms related to coding.

Block Codes

Definition of a block code

A binary block code is defined as an algebraic mapping from the vector space F^k over the Galois field F = GF(2) into the vector space F^n, i.e., a data sequence of length k is mapped onto a codeword of length n. If this mapping from one vector space to another is linear, we speak of a linear code; otherwise the code is non-linear. We restrict ourselves to the most important concepts and thus intentionally omit the detailed presentation of the non-linear case.¹

¹ One can show that capacity is actually achievable with linear codes.

Notation: [n,k] block code

The encoder for a block code cuts the incoming data stream into blocks of length k. In the binary case, 2^k possible messages are encoded into n-bit-long codewords. Thus we speak of an [n,k] block code. A linear block code is entirely described by its generator matrix G. A codeword is built using the relation

    x = u · G,    (2.1)

where x and u are assumed to be row vectors. This assumption, commonly used in coding theory, is in contrast to the general notation in linear algebra. By using (2.1) we observe immediately that linear codes transform the all-zero input vector into the all-zero codeword. Equivalently, the codeword x always has to satisfy the relation

    H · x^T = 0^T,    (2.2)

where H is a parity-check matrix. The rows of this matrix contain the information of which bits are checked by a given parity check, and the columns describe in which parity checks a given bit is involved. The parity-check matrix can be derived from the generator matrix G.

Code rate of a block code

The code rate of a block code is defined as the proportion of information-carrying bits in the total codeword length. If the generator matrix is a k×n matrix with full rank, then the rate is given by

    R = k/n.    (2.3)
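To make (2.1)-(2.3) concrete, here is a minimal Python/NumPy sketch (our own illustration; this G and H describe a systematic [7,4] code, and any full-rank generator with a matching parity-check matrix would work the same way):

    import numpy as np

    # Generator and parity-check matrices of a systematic [7,4] code
    # (illustrative choice; rate R = k/n = 4/7).
    G = np.array([[1,0,0,0,1,1,0],
                  [0,1,0,0,1,0,1],
                  [0,0,1,0,0,1,1],
                  [0,0,0,1,1,1,1]])
    H = np.array([[1,1,0,1,1,0,0],
                  [1,0,1,1,0,1,0],
                  [0,1,1,1,0,0,1]])

    u = np.array([1,0,1,1])          # data block of length k = 4
    x = u @ G % 2                    # codeword of length n = 7, Eq. (2.1)
    assert not (H @ x % 2).any()     # H x^T = 0, Eq. (2.2)
    print(x)                         # [1 0 1 1 0 1 0]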

Convolutional Codes

Definition of convolutional codes

Like block codes, convolutional codes are defined as a mapping from one vector space to another. But this time, the incoming and outgoing data streams of an encoder can be of infinite length. An encoder for a convolutional code is shown in Fig. 2.2 as a finite-state machine, i.e., a sequential logic circuit, with a memory order of m. The generator matrix G of a convolutional encoder has the general form

    G(D) = [ G11(D)  G12(D)  ...  G1n(D) ]
           [ G21(D)  G22(D)  ...  G2n(D) ]
           [  ...     ...    ...   ...   ]
           [ Gk1(D)  Gk2(D)  ...  Gkn(D) ],    (2.4)

Figure 2.2: Binary convolutional encoder with rate R = 1/2 and memory order m = 2. [Input u enters a chain of shift-register stages D; XOR gates tap the register to form the two output streams, which a multiplexer combines into x.]

where each element Gij(D) represents a transfer function (Kronecker-delta response) of a linear discrete-time system (LDS) of order m:

    Gij(D) = (a_m D^m + a_{m-1} D^{m-1} + ... + a_0) / (b_m D^m + b_{m-1} D^{m-1} + ... + b_0).    (2.5)

Code rate of a convolutional code

In the case of the encoder of Fig. 2.2, we observe the two polynomials G11(D) = 1 + D^2 and G12(D) = 1 + D + D^2. The code rate of a convolutional code is defined as the number of input bits divided by the number of output code bits, or equivalently

    R = k/n,    (2.6)

where k and n are the dimensions of G(D). In the example of Fig. 2.2, we have R = 1/2.
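As a cross-check of these definitions, here is a minimal Python sketch of the encoder of Fig. 2.2 (feedforward only, so the denominators in (2.5) are 1; the function name conv_encode and the tap convention are our own choices):

    def conv_encode(u, g1=(1, 0, 1), g2=(1, 1, 1)):
        """Rate-1/2 feedforward convolutional encoder with memory order 2.
        g = (g0, g1, g2) are the coefficients of g0 + g1*D + g2*D^2,
        so the defaults implement G11 = 1 + D^2 and G12 = 1 + D + D^2."""
        s = [0, 0]                 # shift-register contents, initially 00
        x = []
        for b in u:
            taps = [b] + s         # current input bit followed by past bits
            x.append(sum(t*c for t, c in zip(taps, g1)) % 2)   # G11 output
            x.append(sum(t*c for t, c in zip(taps, g2)) % 2)   # G12 output
            s = [b] + s[:-1]       # shift the register
        return x

    print(conv_encode([1, 0, 1, 1]))   # [1, 1, 0, 1, 0, 0, 1, 0]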

Increased complexity by higher-order memory

Typically, the code rate R is kept constant, whereas the memory order m is increased in order to combat channel noise. This means that the complexity of the code is increased to obtain more redundancy.

High code rates for better efficiency

In general, one wishes to keep the code rate close to unity, but still have strong error-protection capabilities. This creates constraints in the process of finding appropriate codes. Large code rates, i.e., larger than 0.9, are normally used for storage applications such as magnetic recording, where the information density on the storage medium is mainly limited by the material used and by the transmission speed. Note that the capacity (in bits per use) of a magnetic-recording channel, and of binary signaling in general, is limited to unity. For data communication applications, the constraint on the code rate is somewhat less stringent, although one wishes to have good error-correcting capabilities without sending too much redundant information. In the end, only the achievable throughput of the communication system matters to us. So we often encounter code rates of 1/2 and less for very noisy channels. Under these circumstances we may transmit the data symbols at a higher speed in order to achieve the desired communication rate. By doing so we intentionally allow the channel to make a certain number of errors that can later be corrected by the decoder. Note also that the code rate can be greater than 1 if the transmitted code symbols are m-ary symbols and no longer binary. This is often the case in modem technology, as in the V.34, V.90, and xDSL standards [17-19].

Capacity of the Channel

Definition of the channel capacity

According to Shannon's pioneering 1948 paper [1], the capacity C of a given noisy channel is defined as the maximum possible transmission rate, i.e., the rate at which the source is properly matched to the channel. This abstract definition has very deep significance. It is always possible to send information at a rate lower than C through a channel with an error probability as small as desired, by properly encoding the information source. This statement on controlled error probability is not true for rates above C. Shannon's theorem on error-free transmission does not tell us how to make good codes; it only tells us the limit. Practical error-control schemes have long been far away from this theoretical limit. Only the advent of the complex Turbo codes [2, 3] and their iterative decoding by belief propagation has brought us to within some tenths of a dB of the Shannon limit. Even better performance has been reached using very large low-density parity-check codes [20], which were invented by Gallager [5].


Systematic Codes

Definition of a systematic code

Often, the uncoded data sequence is part of the codeword. If the uncoded information bits are transmitted together with the parity-check bits over the channel, the code is called systematic. But this need not be the case: the code can equally well consist of parity information only. In order to improve the code rate of a given code, sometimes a different approach is chosen: first the data sequence is encoded as usual, but then some bits are punctured and thus not transmitted over the channel. This encoding procedure is often encountered in Turbo coding.

Hamming Distance

Definition of the Hamming distance

The distance between two codewords is the number of positions where they differ. The minimum distance d of a code is then defined as the minimum of all distances between any two codewords of the code. A code with minimum distance d is capable of correcting ⌊(d−1)/2⌋ errors, where ⌊x⌋ denotes the greatest integer less than or equal to x. If d is even, the code can simultaneously correct (d−2)/2 errors and detect d/2 errors. This statement is strictly true for block codes. In the case of convolutional codes, the decoder may be confused if more than (d−2)/2 errors are present, and it may not recover until it is resynchronized with a known state.

The Hamming distance is called free distance for convolutional codes

The definition of the previous paragraph is only strictly valid for block codes. In the case of convolutional codes, we speak of the free distance of a code, but the meaning of this distance measure is basically the same [21].

2.1.3 Hamming Codes

Hamming codes are a whole class of linear block codes that can correct single errors. Their error-correcting capability is given by the Hamming distance as defined before.

Definition of a Hamming code

According to [15], Hamming codes of length n = 2^r − 1 (r ≥ 2) are defined as having a parity-check matrix H whose columns consist of all non-zero binary vectors of length r, each used once. A Hamming code is thus an (n = 2^r − 1, k = 2^r − 1 − r, d = 3) block code. The Hamming code as it was first defined is the [7,4,3] block code with the parity-check matrix

    H = [ 0 0 0 1 1 1 1 ]
        [ 0 1 1 0 0 1 1 ]
        [ 1 0 1 0 1 0 1 ].    (2.7)

Equivalence of block codes

Any code whose parity-check matrix is created from the original matrix H by linear combinations of rows and column permutations is said to be equivalent. There exist several extensions of the original Hamming code using redundant equations, for example the [8,4,4] extended Hamming code. They generally perform better because of their larger minimum distance d.
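The columns of H in (2.7) are simply the numbers 1 to 7 in binary, so for a single error the syndrome directly spells out the error position. A minimal hard-decision decoding sketch in the same Python style (the function name and test vector are ours):

    import numpy as np

    H = np.array([[0,0,0,1,1,1,1],
                  [0,1,1,0,0,1,1],
                  [1,0,1,0,1,0,1]])

    def correct_single_error(y):
        """Correct at most one bit error in a received [7,4,3] Hamming word."""
        s = H @ y % 2                      # syndrome of Eq. (2.2)
        pos = 4*s[0] + 2*s[1] + s[2]       # read syndrome as position 1..7
        if pos:                            # nonzero syndrome: flip that bit
            y = y.copy()
            y[pos - 1] ^= 1
        return y

    x = np.array([0,1,1,0,0,1,1])          # a valid codeword: H @ x % 2 == 0
    y = x.copy(); y[4] ^= 1                # channel flips bit 5
    assert (correct_single_error(y) == x).all()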

2.1.4 Low-Density Parity-Check Codes

Definition of an LDPC code

Low-density parity-check (LDPC) codes have a parity-check matrix H that is very sparse, i.e., there are only a few 1's in the matrix. According to Gallager's initial definition in 1963 [5], the parity-check matrix contains a small fixed number j of 1's in each column and another small fixed number k of 1's in each row.

LDPC codes may outperform Turbo codes

For good performance, large block lengths are required. MacKay [22] has shown recently that the number of 1's in the columns does not necessarily have to be constant. Even better results are obtained by statistically varying the number of 1's within a column while still keeping the total number small. Recently, it has been shown by Richardson et al. [20] that very long low-density parity-check codes outperform comparably long Turbo codes by a distinct margin and come even closer to the theoretical capacity of a given channel.

2.1.5 Trellis Codes

In what follows, we will briefly present two different ways of temporally describing a convolutionally encoded sequence. The two graphical representations completely define the code. The code-tree representation graphically explodes for large code sequences; the more condensed trellis diagram is a solution to that representation problem.

Figure 2.3: The state-transition diagram for the convolutional encoder of Fig. 2.2. [Four states 00, 01, 10, 11; each branch is labeled input/output bits: 0/00, 1/11, 0/11, 1/00, 0/01, 1/10, 0/10, 1/01.]

Convolutional code represented by a code tree...

Assume the convolutional encoder of Fig. 2.2 and the corresponding state-transition diagram of Fig. 2.3. The initial content of the encoder memory is assumed to be 00. We may now draw a tree, as in Fig. 2.4, with branches labeled according to the output bits of this encoder for any incoming data sequence u. The boxed nodes of the tree denote the content of the encoder memory and will be called the state of the encoder. Following the upgoing branch at a given node means that we have encoded a binary 1, and a 0 otherwise. Going through the entire tree we can, on the one hand, read off the information sequence and, on the other, collect the encoded sequence.

...or a trellis diagram

Because the size of the tree representation rapidly outgrows standard paper sizes for larger code lengths, we might look for a more compact image of the code that is more quickly cognizable. The trellis representation is the solution to this problem. Since the output at time t+1 is simply defined by the state (i.e., the encoder memory) at time t and the new input digit, we can collapse all nodes showing the same state memory content at a given time instant. For our simple example of Fig. 2.4 this means that we have only 2^m = 2^2 = 4 different states to depict.

Figure 2.4: The code tree for the convolutional encoder of Fig. 2.2. [Binary tree over inputs u1, u2, u3, ...; an upward branch encodes a 1, a downward branch a 0; branches carry the output bits and boxed nodes the encoder state.]

As shown in Fig. 2.5, the choice of name seems obvious: the graph looks just like the trellises we find in our gardens to fasten tall flowers and espalier trees to the wall. After the initial transitional trellis sections, the trellis of the code is simply built by concatenating identical trellis sections. Compared to a block code, the convolutional code has possibly infinitely long codewords. But often the code is manually terminated to the zero state by adding an appropriate number of zeros at the encoder input after a certain number of bits. By doing so, we may transform a convolutional code into a block code.

Description of a trellis section

A single section of the complete trellis diagram is characterized by the left states S_t, the right states S_{t+1}, and the branches connecting the left states with the right states in a characteristic pattern (Fig. 2.6). A branch between a left state and a right state indicates an allowed state transition of the encoding state machine of Fig. 2.3. The labels on the branches directly indicate the incoming data and the encoder output.
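In software, such a trellis section reduces to a branch table; the sketch below (our own Python representation, with states 00, 01, 10, 11 packed as integers 0-3) lists the eight branches of the section of Fig. 2.6 and is reused in the ACS example of Section 2.2:

    # Branch table of one trellis section of the code of Fig. 2.2:
    # (left state, input bit) -> (right state, output bits).
    SECTION = {
        (0, 0): (0, (0, 0)),   # 00 --0/00--> 00
        (0, 1): (2, (1, 1)),   # 00 --1/11--> 10
        (1, 0): (0, (1, 1)),   # 01 --0/11--> 00
        (1, 1): (2, (0, 0)),   # 01 --1/00--> 10
        (2, 0): (1, (0, 1)),   # 10 --0/01--> 01
        (2, 1): (3, (1, 0)),   # 10 --1/10--> 11
        (3, 0): (1, (1, 0)),   # 11 --0/10--> 01
        (3, 1): (3, (0, 1)),   # 11 --1/01--> 11
    }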

General trellis codes

Trellis codes represent a huge class of codes that can be defined by trellis diagrams as described above. The encoding of any trellis code can be done by a finite-state machine. A general trellis code may consist of many different trellis sections [21]. Thus, the finite-state machine may be very complex and consist of interconnected simpler finite-state machines. In contrast to general trellis codes, convolutional codes represent only a sub-class of trellis codes. Convolutional codes are linear codes, since the output sequence is found by convolving the input sequence with the Kronecker-delta response of the encoder. Convolutional codes therefore have a limited number of states and state transitions, which are the same for all time instants. Hence, the finite-state machine is generally simple.

Figure 2.5: Trellis representation of the code tree of Fig. 2.4. [Four states 00, 01, 10, 11 repeated over time steps u1, u2, u3, u4, ...; identical sections are concatenated after the initial transient.]

Figure 2.6: One trellis section of the code of Fig. 2.2. [Left states S_t and right states S_{t+1}; branches labeled uncoded/coded: 0/00, 1/11, 0/11, 1/00, 0/01, 1/10, 0/10, 1/01.]

Figure 2.7: Formation of a tail-biting trellis. [The outgoing states of the last trellis section wrap around to the incoming states of the first, forming a closed ring.]

Tail-Biting Trellis Codes

Definition of a tail-biting trellis

A sub-class of the trellis codes are the so-called tail-biting trellis codes. This type of code is defined by the concatenation of a certain number of individual trellis sections. But instead of terminating the overall trellis at the beginning and at the end, the tail-biting trellis is formed by connecting the outgoing states of the last trellis section to the incoming states of the first trellis section. Fig. 2.7 shows such a tail-biting trellis, which forms a closed ring structure. A valid codeword is then defined by a path starting in any state at a certain point, i.e., not necessarily the zero state, and terminating in the same state after one turn. Encoding by a convolutional encoder is somewhat more difficult than in the non-tail-biting case, since an additional condition for the closed path has to be met. The benefit of this closed structure is that no termination information, which can cause considerable overhead for convolutional codes with small block lengths, has to be added.


2.1.6 Turbo Codes

Parallelly concatenated Turbo codes

A recently discovered class of concatenated codes, the so-called Turbo codes, has turned the view of coding theory upside down. In fact, until 1993, all types of codes known up to then were separated from Shannon's limit by well over 1 dB. Even the most advanced coding schemes, such as the serial concatenation of constituent codes that has been used for state-of-the-art satellite communications, struggled with this imagined boundary. In 1993, Berrou et al. [2] presented their first article on Turbo codes. The imagined boundary was overstepped by the introduction of this new coding scheme, which consists of two parallelly concatenated convolutional codes C1 and C2 connected by a bit-interleaving structure or permutation π, as schematically presented in Fig. 2.8. The excellent performance of Turbo codes is rooted in both the construction of the code and the corresponding iterative decoding techniques.

Functioning of Turbo codes

In fact, the high complexity of Turbo codes is distributed over both the constituent convolutional codes and the interleaver. As we will see in Section 3.2.2, the interleaver can be seen as a complex connection pattern between the two parallelly concatenated convolutional codes. If we cut this connection pattern vertically at a given position and count the number of lines crossed by this section, we directly get the number of states of an equivalent convolutional code. As an example, assume that the section crosses m pattern connections of the interleaver. Under these conditions, an equivalent convolutional code would have at least 2^m states, since we have not yet taken into account the number of states of the constituent convolutional codes. Hence, very complex concatenated codes can be constructed even if relatively low-complexity convolutional codes are used as their constituents. Furthermore, the interleaver allows, by spatial (or temporal, depending on the point of view) decorrelation of adjacent pieces of information, the control of hard-to-correct errors. Note, however, that interleaving structures for decorrelating error bursts are also used in other data transmission schemes, but there they are a pre-step before decoding and hence are not involved in the decoding itself. This decorrelation is most important in the case of burst errors, which are present, for example, in mobile communication very close to a base station.

Figure 2.8: Principal ingredients of a Turbo code. [The information sequence u feeds code C1 directly, producing x1, and feeds code C2 through the interleaver π, producing x2.]
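Structurally, the encoder of Fig. 2.8 is only a few lines of Python when the conv_encode sketch from Section 2.1.2 is reused (the random interleaver and the unpunctured output are illustrative assumptions; practical Turbo codes fix π by design and usually puncture the parity streams):

    import random

    def turbo_encode(u, permutation):
        """Parallel concatenation as in Fig. 2.8: systematic bits u plus
        one parity stream from u and one from the interleaved copy of u."""
        x1 = conv_encode(u)                             # constituent code C1
        x2 = conv_encode([u[i] for i in permutation])   # code C2 on pi(u)
        return u, x1, x2

    u = [1, 0, 1, 1, 0, 0, 1, 0]
    pi = list(range(len(u)))
    random.shuffle(pi)                                  # interleaver pi
    systematic, parity1, parity2 = turbo_encode(u, pi)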

A note on decoding of Turbo codes

Convolutional codes with the same complexity as Turbo codes are very difficult to decode by standard means (e.g., the Viterbi algorithm [23, 24]), since the number of states grows exponentially with the memory order m. Using iterative decoding techniques significantly lowers the computational complexity. This is mainly because the interleaver, which creates most of the complexity in a Turbo code, is a simple permutation operation.

Turbo codes show steep waterfall curves

The characteristic of Turbo codes is a very steep curve of bit-error rate (BER) versus signal-to-noise ratio (SNR). Such characteristic curves are also known as waterfall curves. In the case of parallel concatenation of constituent codes, the BER-versus-SNR curve flattens at higher SNR values to an error floor. This is due to the small minimum distance of parallelly concatenated Turbo codes. Serially concatenated codes, in contrast to parallelly concatenated ones, have a higher free distance, so the error-floor phenomenon is not present in that case. A comparison of BER curves of several typical parallelly and serially concatenated codes is shown in Fig. 2.9 (see also [21, 25]).

2.1.7 Channel Models

Up to here, we have treated the coding channel of Fig. 2.1 as a black box. Actually it consists of a modulator, the physical channel, and a demodulator, as shown in Fig. 2.10. In the context of this thesis we introduce two simple channel models of interest, the additive white Gaussian noise (AWGN) channel and the binary symmetric channel (BSC). Although neither represents the actual situation of most of today's mainstream applications with any accuracy, they are well suited for our presentation of analog decoders. Complex situations such as a mobile radio channel incorporate, in addition to the AWGN element, problems like inter-symbol interference (ISI) due to multipath transmission, and fading [26]. We intentionally omit these complex channel situations from the presentation to keep it simple.

Figure 2.9: A comparison of serially (SCCC) and parallelly (PCCC) concatenated Turbo codes; parallelly concatenated Turbo codes show their characteristic error-floor behaviour [25]. [BER from 10^-6 to 10^-1 versus SNR from 0 to 1 dB.]

AWGN Channel

Figure 2.10: A more detailed view of Fig. 2.1 for a coded system on an additive white Gaussian noise (AWGN) channel. [Digital source → encoder → modulator → AWGN channel → matched filter/detector → sampler → Q-level quantizer → decoder → digital sink; modulator, channel, and demodulator together form a discrete memoryless channel; ρ denotes the sampled detector output.]

BPSK modulator

The modulator of Fig. 2.10 adapts the discrete-time, binary sequence to the continuous-time, analog, physical channel. Generally, this is done by so-called pulse-shaping filters. In the binary case, the output of such a modulator is built out of two signals s0(t) and s1(t) for an encoded 0 and 1, respectively. In terms of simple detectability, a good choice of these signals for a wideband channel is

    s0(t) = sqrt(2Ec/Ts) · sin(2π f0 t + π/2),  0 ≤ t < Ts,
    s1(t) = sqrt(2Ec/Ts) · sin(2π f0 t − π/2),  0 ≤ t < Ts,    (2.8)

where Ts is the duration of one symbol (or the sampling period), f0 is a multiple of 1/Ts, and Ec is the energy of each symbol. This is called binary phase-shift keying (BPSK). The transmitted signal is a sine-wave pulse whose phase is either +π/2 or −π/2, depending on the encoder output. Since s1(t) = −s0(t), we speak of antipodal signaling. The two signal forms allow us to send one channel bit per symbol and time Ts.

Continuous AWGN definition

If we assume that the physical channel is memoryless, i.e., the channel output depends only on the presently transmitted symbol, a common form of channel disturbance is AWGN. If the transmitted signal is s(t), the received signal is

    r(t) = s(t) + n(t),  0 ≤ t < Ts,    (2.9)

where n(t) denotes a sample function of the additive white Gaussian noise process with power spectral density (PSD) Φ_nn(f) = N0/2.

BPSK demodulator

The demodulator has to produce an output corresponding to the signal received in each time interval Ts. This output may be a real number or one symbol from a discrete set of preselected symbols, depending on the demodulator design. An optimum demodulator always includes a matched filter or a correlation detector followed by a sampling process. For the BPSK case with coherent detection, the sampled output of the demodulator is the real number

    ρ = ∫0^Ts r(t) · sqrt(2/Ts) · sin(2π f0 t + π/2) dt.    (2.10)

Demodulator for M-ary signals

For the M-ary case, the demodulator decomposes the received signal and the noise into N-dimensional vectors, where N ≤ M. This means that the signal and the noise are expanded into a series of linearly weighted orthonormal basis functions {f_n(t)}, as is shown, for example, in [26]. It is assumed that the N basis functions {f_n(t)} span the signal space, so that each of the possible transmitted signal waveforms can be represented as a weighted linear combination of {f_n(t)}.

Discrete AWGN definition

In contrast to the continuous-time AWGN channel definition of (2.9), we can define an equivalent discrete version of the AWGN channel whose samples at time instant i are characterized by

    y_i = x_i + n_i,    (2.11)

where n_i is a white Gaussian random process, and x_i is obtained by mapping the binary bits of the codeword to an antipodal signal with amplitude sqrt(Ec): x_i ↦ ±sqrt(Ec). The Gaussian random process is white in the sense that each sample is independent of any other sample. The probability density function of each sample of such a process is defined by

    f_n(x) = (1 / sqrt(2π σ_n²)) · exp(−x² / (2σ_n²)),    (2.12)

where σ_n² is the variance of the zero-mean white Gaussian noise n(t). This variance σ_n² is related to the one-sided PSD N0 by

    σ_n² = N0/2.    (2.13)

Figure 2.11: The transition diagram of a binary symmetric channel (BSC) with cross-over probability ε. [Inputs 0 and 1 pass through correctly with probability 1−ε and are flipped with probability ε.]

Analog-input and discrete-input soft decoders

In fact, we can think of the output y_i of the discrete AWGN channel as an unquantized demodulator output ρ of the continuous-time AWGN channel. By doing so we can treat both channel models equivalently. Hence, the sequence of unquantized demodulator outputs can be passed on directly to an analog decoder. Today, a much more common approach to decoding is to quantize the continuous detector output ρ into one of a finite number Q of discrete symbols. In this case, the digital decoder has discrete inputs.
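A minimal NumPy sketch of this discrete channel view, combining the BPSK mapping with (2.11) and (2.13) (the 0 → +sqrt(Ec) sign convention and all variable names are our own choices):

    import numpy as np

    rng = np.random.default_rng(0)
    Ec, N0 = 1.0, 0.5                     # symbol energy, one-sided noise PSD

    bits = rng.integers(0, 2, size=10)    # coded bits x_i in {0, 1}
    x = np.sqrt(Ec) * (1 - 2*bits)        # antipodal mapping: 0 -> +sqrt(Ec)
    n = rng.normal(0.0, np.sqrt(N0/2), x.shape)   # sigma_n^2 = N0/2, Eq. (2.13)
    y = x + n                             # unquantized channel output, Eq. (2.11)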

Binary Symmetric Channel

DMC and BSC definitions

If we assume a memoryless physical channel enclosed by an M-ary modulator and a Q-ary output demodulator, a discrete memoryless channel (DMC) can be modelled. The most important case in the context of this thesis is the binary symmetric channel (BSC) with M = Q = 2. This configuration can be represented by a channel diagram and is completely described by the transition probability ε. The probabilities on the branches of Fig. 2.11 represent the conditional probabilities p(y_i | x_j) with i, j ∈ {0,1}.

DMC and hard decoders

A decoder following a DMC with M = Q is generally called a hard-decision decoder or hard decoder. Because of the quantization of the channel output, which can be relatively coarse, these decoders generally perform worse than those used in conjunction with soft-output channels such as the AWGN channel. Note that in the case of a hard decoder, the input can still be a real-valued number, but the possible number of symbols or values is limited.

BPSK transition probability expressed by the complementary error function Q

The transition probability ε can be calculated from knowledge of the signals used, the probability distribution of the noise, and the output quantization threshold of the demodulator. In the case of BPSK modulation on an AWGN channel with optimum coherent detection and binary output quantization, the transition probability ε is just the bit-error probability for a signal sequence with equally likely symbols, given by

    ε = Q( sqrt(2Ec/N0) ),    (2.14)

where Q(x) := (1/sqrt(2π)) ∫_x^∞ e^(−y²/2) dy is the Q function of Gaussian statistics [26].
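Numerically, Q(x) = erfc(x/sqrt(2))/2, so (2.14) takes one line of Python (SciPy assumed; the function name is ours):

    import numpy as np
    from scipy.special import erfc

    def bsc_epsilon(Ec, N0):
        """Cross-over probability of the BSC obtained by hard-quantizing
        BPSK on an AWGN channel, Eq. (2.14): eps = Q(sqrt(2*Ec/N0))."""
        x = np.sqrt(2*Ec/N0)
        return 0.5 * erfc(x / np.sqrt(2))   # Q(x) = erfc(x/sqrt(2))/2

    print(bsc_epsilon(Ec=1.0, N0=1.0))      # ~0.0786 at Ec/N0 = 0 dB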

2.1.8 Types of Errors

Random-error channels

On memoryless channels, the noise affects each transmitted symbol independently. As an example, consider the BSC with the transition diagram shown in Fig. 2.11. Each transmitted bit has a probability ε of being received incorrectly and a probability 1−ε of being received correctly, independently of other transmitted bits. Hence transmission errors occur at random in the received data sequence, and memoryless channels are therefore called random-error channels. Typical examples of such channels are the deep-space channel and many satellite channels, as well as most line-of-sight transmission systems [16].

Burst-error channels

On the other hand, on channels with memory, the noise is not independent from transmission to transmission. A very simple example is a model with two states: a 'good' state, in which transmission errors occur infrequently, and a 'bad' state, in which transmission errors are highly probable. Both of them may be modeled according to Fig. 2.11, but with different ε's. The channel is in the good state most of the time, but occasionally shifts to the bad state due to a change in the transmission characteristic of the channel, induced, for example, by 'deep fading' caused by multipath transmission. As a consequence, transmission errors occur in clusters or bursts because of the high transition probability in the bad state. Such channels are thus called burst-error channels. Typical examples include radio channels, where the error bursts are caused by signal fading due to multipath transmission; wire and cable transmission, which is subject to impulse switching noise and crosstalk; and magnetic recording, which is subject to tape dropouts due to surface defects and dust particles [16]. For this type of error, codes with error-decorrelation capabilities, such as Turbo codes, are especially suitable. The interleaver separates the error burst, which can then be corrected on a local scale. Large LDPC codes have comparable capabilities (see also Section 2.1.4).

Finally, combinations of both channels with random errors and burst errors can be found. We call these channels compound channels. For each of the above cases, codes especially adapted to their environment may be constructed.

2.2 Analog Viterbi Decoding

2.2.1 Computational Considerations on the VLSI Implementation of Viterbi Decoders

The common basic digital circuits with binary memory cells and logic gates are ideally suited for finite-field arithmetic, which is the mathematical basis of algebraic coding and decoding theory. This happy match has been exploited for a long time to build efficient VLSI implementations of decoders for such codes. Codes constructed according to algebraic coding theory are best used in applications that have a sufficient margin to the theoretical performance limit. Typical examples of such codes are BCH codes and Reed-Solomon codes [15, 27], which provide strong error protection against low noise levels.

In contrast to the algebraic approach, probabilistic techniques like maximum-likelihood (ML) sequence detection using the Viterbi algorithm [23, 24] and decoding techniques based on probability calculus, such as the sum-product algorithm that we use in the context of this thesis, are best suited for applications that have to operate near the theoretical performance limit. However, the match of these techniques to digital VLSI is less than perfect. In fact, the implementation of a high-speed Viterbi decoder takes considerably more chip area than, say, a BCH decoder achieving the same bit rate.

Figure 2.12 Simplified block diagram of a Viterbi decoder with its main constituents: branch-metric computation (BMC) unit, add-compare-select (ACS) unit, and storage-survivor-memory (SSM).

This is mainly due to the binary number system of today's computers, which creates a considerable overhead in the implementation of floating-point arithmetic units and thus needs a significant amount of chip area. Approximations to the floating-point number representation using equivalent fixed-point units can be made, but the binary system still introduces a large amount of redundancy in terms of the data representation, at the cost of higher power consumption and larger chip area.

Because of the ever-growing transmission-speed and power-efficiency requirements for both fixed and mobile communication devices, some researchers have recently become interested in analog decoding techniques. Many analog or hybrid implementations of the Viterbi decoding algorithm for trellis codes have been proposed [28–40]. All of them simply replace the most critical parts of the Viterbi decoder of Fig. 2.12, generally the add-compare-select (ACS) unit and its feedback loop, by analog circuit implementations. But still, the whole decoder remains a sequential machine that performs one trellis computation after the other. Therefore, the time needed for one trellis computation limits the speed of the overall system.

2.2.2 Circuit Implementation of Analog and Mixed-Signal Viterbi Decoders

Regarding the different circuit implementations, we can distinguish between voltage-mode and current-mode implementations. These two terms are generally used to classify whether the information-carrying signals are mainly of current nature or of voltage nature. This distinction is in practice not so obvious. For example, the input information of a simple current mirror can be seen as the current passing through the diode-connected input transistor as well as the gate-source voltage of the same transistor driving the second transistor. A more appropriate means to identify voltage- and current-mode circuits is the criterion of whether currents are driving mostly high-impedance nodes (voltage-mode) or low-impedance nodes (current-mode) at the input of a building block [41, 42].

A second classification criterion for the different analog and mixed-signal implementations of Viterbi decoders is their mode of operation in time. We can mainly distinguish between continuous-time circuits and switched or discrete-time circuits. For the overall decoding function to be implemented, it does not matter at all which of the two operation modes is used. The main classes of discrete-time circuits are switched-capacitor (SC) circuits [43, 44] and switched-current (SI) circuits [45]. In that sense this classification criterion is orthogonal to the distinction between voltage-mode and current-mode circuits.

From the point of view of the impedance criterion, the implementations described in [28–32] are clearly continuous-time voltage-mode circuits using SC cells to store the analog state metrics and feed them back to the input of the ACS unit. They all use opamp-based adders and comparators to implement the ACS unit. With the evolution in time, they have consecutively reached higher operational speeds, starting from some 10 Mbit/s to over 200 Mbit/s. The solution described in [38] uses SC techniques for both metric calculation and storage and is therefore a pure voltage-mode circuit implementation. A processing speed of 500 kbit/s at a power consumption of less than 8 mW has been demonstrated with this technique. The approach in [33, 34] is a mixed current- and voltage-mode one. The metric calculations are accomplished partly in the current domain [33], whereas the comparison and the storage of the state metrics (SC sample-and-hold (S/H) circuits) are in the voltage mode. The ACS function is realized by a new diode network using threshold-programmable diode devices; hence this part of the Viterbi decoder is a continuous-time circuit. A very high operational speed of over 300 Mbit/s has been reported in [32]. Pure current-mode implementations are reported by Demosthenous and Taylor in [35–37]. The addition and comparison of two current signals are simple tasks in continuous-time current-mode circuits, and the storage of current signals can be accomplished by SI memory cells equally well as by SC techniques. Decoding speeds of over 100 Mbit/s are expected using the fully-current-mode approach. But although the calculations in the current domain are very promising regarding their simplicity, the fully-current-mode approach is still in its feasibility-study state, whereas voltage-mode circuits are already being considered for use in practical applications.

2.3 Network Decoding

In order to meet the ever-growing demand for decoding speed, several methodologies for creating network decoders have been proposed so far. Most of them are based on sequential machines or are some other kind of discrete-time signal processors. But there also exist continuous-time processors which, as we will see later on in this thesis, come closest to our approach of probability-based analog, continuous-time networks. In this section we review the most important examples of the network decoding approach.

2.3.1 Non-Algorithmic Diode Decoding

Minty [46] described an elegant solution method for the shortest-path problem: assume a net of flexible, variable-length strings. In that scale model of the network you wish to find the shortest path. If you now pick up the source node and the destination node and pull them apart until the net is tightened, you directly find the solution along the tightened path, as in Fig. 2.13. Note that Minty's solution applies to non-directed graph models only. It is thus not directly applicable to trellis decoding. The decoding of a trellis code is equivalent to the shortest-path problem in a directed graph [24, 46].

An analog circuit solution to the shortest-path problem in directed graph models has been found independently by Davis and, much later, by Loeliger [48, 49]. It consists of an analog network using series-connected diodes. According to the individual path-section lengths, as in Fig. 2.14, a number of series-connected diodes are placed. The current I will then flow along the path with the least number of series-connected, forward-biased diodes.

Figure 2.13 The shortest-path problem solved using a rope net [47] (© 1999 IEEE).

Note however that the sum of the diode threshold voltages fundamentally limits practical applications. Very high supply voltages will be needed for larger diode networks, which makes this elegant solution useless for VLSI implementations.

A method similar to Davis' diode decoder has recently been presented by Bu et al. [47]. The basic network elements are variable-threshold-voltage diode devices connected to an analog network. For each branch of the network, exactly one device is introduced for each variable parameter. Instead of assembling more or fewer diodes in series for one branch, the threshold voltage can be tuned continuously within a certain range. Fundamentally, this solution suffers from the same limitations in terms of supply voltage as the solution of [49]. The idea of variable turn-on voltages, however, offers a possibility to build soft-input decoders. This idea can also be found in the analog Viterbi decoder implementation of Shakiba et al. [33, 34].

2.3.2 Neural Network and Fuzzy Logic Decoding

Classifying signals is a task to which neural networks are often applied. This approach can be found in various domains of signal and information processing, for example in optical character recognition (OCR) [50–53], handwriting recognition [54–56], voice recognition [57, 58], and general pattern recognition and classification [59, 60].

Figure 2.14 Hard-decision decoder for a very simple trellis code. The received bits r_i directly control the switches, as indicated by the dashed lines.

The mapping of the high-dimensional received input-signal space to the lower-dimensional space of valid codewords can also be seen as a 'simple' classification problem. Hence, it is not astonishing that several attempts to solve the decoding problem with neural networks have been proposed [39, 40, 61–63]. Wang and Wicker [39] and Verdier et al. [40] even proposed an analog implementation of a neural network for decoding purposes.

Related to the neural-network decoding approach is the fuzzy-logic decoding idea. In 1965, Zadeh proposed the fuzzy set concept [64] as a means of handling unreliable information. Based on these mathematical foundations, Wu et al. [65] introduced a hybrid fuzzy-neural-network decoder. The proposed fuzzy neural classification network basically consists of a three-layer neural network. To enhance the associative capability of the network, fuzzy membership functions were defined for each hidden node.

2.3.3 Analog Network Decoding

With the advent of computationally demanding iterative decoding techniques, several researchers started to look for non-traditional representation and implementation methods of algorithmic decoding networks. Wiberg et al. were the first to speculate on analog implementations of the so-called sum-product algorithm [66]. Inspired by the analog diode network decoders presented in Section 2.3.1 and Wiberg's work on iterative graph decoding [67], a new analog implementation approach for maximum-a-posteriori (MAP) decoding has recently been presented independently by Hagenauer et al. [68–70] and by us [71–79]. This work is also the basis of the present thesis, which deals mainly with design aspects and implementation issues of analog VLSI iterative decoders based on the sum-product algorithm. For a long time, Hagenauer et al. did not consider the actual transistor implementation of their non-linear networks at all. Only very recently did they become interested in chip implementations of their networks [70]. They finally came up with the same generic transistor modules as we had before [71], with only a slight difference in the way of connecting individual circuit blocks.

2.4 Bio-Inspired Networks

Mead championed a completely new analog VLSI design style [11]. The neuromorphic approach for analog signal-processing circuits mimics in many ways the function of the biological nervous system. This design style is characterized by exploiting, rather than fighting, the fundamental nonlinearities of transistor physics. Precision is achieved on the system level despite low-precision components. Mead suggests in [80] that "adaptive analog systems are 100 times more efficient in their use of silicon, and they use 10'000 times less power than comparable digital systems." Many practical examples such as building blocks for neural networks [81–86], artificial cochleas [87–91], silicon retinas [92–94], and motion detectors [12, 95] have been successfully fabricated. They indeed show astonishing system-level performance compared to traditional analog and digital systems.

In the context of decoding of digital codes, 'neuromorphic circuits' does not seem to be the correct term. In fact, we do not want to copy the function of the nervous system. The aim is rather to have an electronic system that has advantages comparable to its biological counterpart. Thus we speak of bio-inspired circuits instead of neuromorphic circuits to make this difference clear. Key features of bio-inspired circuits are low-power and high-speed operation on the system level. They benefit from collective computation, their precision is gained on the system level despite the use of low-precision components, and they are small in size. A further key feature of the bio-inspired design approach is that one exploits, rather than fights against, the inherent nonlinearities of the basic semiconductor devices. This means that in contrast to a conventional analog circuit-design approach, the bio-inspired approach uses the devices 'as is' and does not try to implement 'linear' transfer characteristics out of linearized non-linearities by relying on the small-signal concept.


Chapter 3

The Probability-Propagation Algorithm

3.1 Problem Statement

In this section, we will review basic decision theory and its application to the decoding problem. Several standard decoding criteria such as the maximum-likelihood (ML) rule and the maximum-a-posteriori (MAP) rule are discussed for both bit- and block-wise decoding.

3.1.1 Basic Decision Theory

Basically, decoding is a decision-making process. Based on the observed data vector, the decoder tries to figure out which actual information bit or information vector has been generated by the information source.

Recalling our data transmission system of Fig. 2.1, we assume in the following that the data source is an independently and identically distributed (i.i.d.) source, so that we can write

$$ P_{\mathbf{U}}(\mathbf{u}) = \prod_{i=0}^{k-1} P_U(u_i) \tag{3.1} $$

with $P_U(u_i)$ having the same distribution for all i.¹ The data transmission is assumed to be over a time-invariant, memoryless and feedback-less channel. Under these conditions we are allowed to write the probability of the received values conditioned on the sent values as a product of conditional probabilities:

$$ P_{\mathbf{Y}|\mathbf{X}}(\mathbf{y}|\mathbf{x}) = \prod_{i=0}^{k-1} P_{Y|X}(y_i|x_i). \tag{3.2} $$

A general model of a decision² problem is shown in Fig. 3.1, with A being a random variable or random vector with finite alphabet $\mathcal{A}$, $B_1\in\mathcal{B}_1,\ldots,B_L\in\mathcal{B}_L$ some additional random variables we are not directly interested in, and the observation R a continuous or discrete random variable or random vector. For reasons of convenience, in the further derivations we assume R to be a finite discrete random variable with alphabet $\mathcal{R}$.

Figure 3.1 A general model of the decision problem. Using the observation R, one wants to decide on A without having direct access to it.

A good quality measure of a decision is the probability of making a correct decision. This probability $P_{\text{correct}} \triangleq P[\hat{A}(R) = A]$ can be calculated in a general way as

$$ P_{\text{correct}} = \sum_{r\in\mathcal{R}} P(\hat{A}(r) = A \,|\, R = r)\,P_R(r) = \sum_{r\in\mathcal{R}} P_{A|R}(\hat{a}(r)\,|\,r)\,P_R(r). \tag{3.3} $$

¹The subscript U of the probability function P denotes the investigated random variable, whereas the argument u is a concrete realization of this random variable U. To have a more condensed notation, one often writes simply P(u). A more detailed description of the random-variable concept and of stochastic processes can be found in [96].

²It is important to note the difference between decision and estimation. While in an estimation problem one aims for a value close (according to a certain distance measure) to the actually occurred value, in a decision problem one looks for the exact value. Therefore, an estimation can be more or less good, whereas a decision is either right or wrong.

The aim of a decision-making process is then to maximize this probability. The two most popular rules, the MAP and the ML decision rules, are discussed in the following subsections.

3.1.2 MAP Decision Rule

By definition, probabilities are non-negative numbers in the range [0, 1], and the decision $\hat{a}(r)$ of A can be defined freely for each outcome r of R. Hence $P_{\text{correct}}$ is maximized by choosing $\hat{a}(r)$ for each r such that $P_{A|R}(a|r)$ is maximized. This follows directly from (3.3), and thus we define

$$ \hat{a}_{\text{MAP}}(r) \triangleq \arg\max_{a\in\mathcal{A}} P_{A|R}(a|r) \tag{3.4} $$

$$ = \arg\max_{a\in\mathcal{A}} \sum_{\mathbf{b}\in\mathcal{B}_1\times\cdots\times\mathcal{B}_L} P_{AB|R}(a,\mathbf{b}\,|\,r), \tag{3.5} $$

where the function $\arg\max_x f(x)$ returns the particular x that maximizes f(x).³ This decision rule is called the maximum-a-posteriori (MAP) decision rule to express that it maximizes the a posteriori probability P(a|r) for r.


³In general, argmax is a function that returns the set of all maxima. Of course, this set has more than one element if the maximum is achieved for different values; but here we assume that argmax returns only one value.


3.1.3 ML Decision Rule

In the previous subsection we have seen that the maximization of $P_{\text{correct}}$ is independent of $P_R(r)$, and so we can write (3.4) as

$$ \hat{a}(r) = \arg\max_{a\in\mathcal{A}} P_{A|R}(a|r) = \arg\max_{a\in\mathcal{A}} \frac{P_{AR}(a,r)}{P_R(r)} = \arg\max_{a\in\mathcal{A}} P_{AR}(a,r) = \arg\max_{a\in\mathcal{A}} P_{R|A}(r|a)\,P_A(a). \tag{3.6} $$

However, $P_A(a)$ is often unknown, so one assumes that A is uniformly distributed, i.e., that $P_A(a)$ is constant for all $a\in\mathcal{A}$. With this assumption we can postulate a new decision rule:

$$ \hat{a}_{\text{ML}}(r) \triangleq \arg\max_{a\in\mathcal{A}} P_{R|A}(r|a). \tag{3.7} $$

This decision rule is known as the maximum-likelihood (ML) decision rule. We immediately verify the equivalence between the ML case on the left-hand side of (3.8) and the MAP case on the right-hand side of (3.8) when A is uniformly distributed:

$$ \arg\max_{a\in\mathcal{A}} P_{R|A}(r|a) = \arg\max_{a\in\mathcal{A}} P_{R|A}(r|a)\,P_A(a), \tag{3.8} $$

again using the fact that the maximization is independent of $P_R(r)$.

We have observed that the MAP decision rule and the ML decision rule are equivalent if the condition that A is uniformly distributed is met. In the general case, however, the ML decision does not necessarily maximize $P_{\text{correct}}$, since the distribution of A can rarely be guaranteed to be uniform. For practical applications, where $P_A(a)$ is often unknown, the ML decision rule is used regularly despite its non-optimum character. This is not a limiting drawback, as the aim of possible source encoding is to modify the probability distribution of A to have as uniform a distribution as possible.
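To make the difference between the two rules concrete, the following Python sketch (our own illustration; the numbers are arbitrary) evaluates (3.6) and (3.7) for a binary A with a skewed prior:

    # Hypothetical prior and channel law for a binary A observed through R.
    P_A = {0: 0.9, 1: 0.1}                     # non-uniform prior P_A(a)
    P_R_given_A = {0: {0: 0.8, 1: 0.2},        # P_{R|A}(r|a), indexed by a
                   1: {0: 0.3, 1: 0.7}}

    r = 1                                      # the observed outcome

    a_ml  = max((0, 1), key=lambda a: P_R_given_A[a][r])
    a_map = max((0, 1), key=lambda a: P_R_given_A[a][r] * P_A[a])

    print(a_ml, a_map)   # -> 1 0: the rules disagree here because
                         # the prior strongly favours a = 0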


3.1.4 Decoding Rules

In order to match the terminology of Fig. 2.1 and Fig. 3.1, we first have to find correspondences. Two different cases have to be distinguished in the context of decoding: block- and bit-wise decoding.

Block-Wise Decoding

For block-wise decoding, the translation is as follows:

A ≜ U    (unknown information vector)
R ≜ Y    (received data vector)
Â ≜ Û    (reconstructed information vector)

The random variables $B_i$ are not used in this case. Starting with these definitions and using a simplified notation, the MAP decision rule is then given by

$$ \hat{\mathbf{u}}_{\text{MAP}}(\mathbf{y}) = \arg\max_{\mathbf{u}} P(\mathbf{u}|\mathbf{y}) \tag{3.9} $$

$$ = \arg\max_{\mathbf{u}} P(\mathbf{u},\mathbf{y}). \tag{3.10} $$

MAP block decoding maximizes $P_{\text{correct}} = P[\hat{\mathbf{U}} = \mathbf{U}]$. This is equivalent to the minimization of the block error probability $P_{\text{Blockerr}} = P[\hat{\mathbf{U}} \neq \mathbf{U}]$. This kind of block decoding is also known as sequence estimation, although the term estimation is somewhat misleading in the context of decoding. More appropriate is the term sequence detection.

Equivalently, the ML decision rule is defined as

$$ \hat{\mathbf{u}}_{\text{ML}}(\mathbf{y}) = \arg\max_{\mathbf{u}} P(\mathbf{y}|\mathbf{u}) \tag{3.11} $$

if $P_{\mathbf{U}}(\mathbf{u})$ is assumed to be uniformly distributed.

An example of such an ML decoder is the well-known Viterbi sequence detector used in hard-disk drive applications (see e.g. [29]). Note that a MAP version of the Viterbi algorithm is possible as well, as we will see in Section 5.4.1.


Decoding type    Decision rule
(block-)MAP      û(y) = argmax_u P(u|y)
(block-)ML       û(y) = argmax_u P(y|u)
symbol-MAP       û_k(y) = argmax_{u_k} P(u_k|y)   ∀k
symbol-ML        û_k(y) = argmax_{u_k} P(y|u_k)   ∀k

Table 3.1 Decision rules for the main decoding types.

Bit-Wise Decoding

Bit-wise decoding uses a somewhat different assignment of terminology:

A ≜ U_k    (unknown information bit)
R ≜ Y      (received data vector)
Â ≜ Û_k    (reconstructed information bit)
B_i ≜ U_i   ∀ i ≠ k

for each k = 0, ..., K − 1. With these definitions, the MAP rule for bit-wise decoding is given by

$$ \hat{u}_k(\mathbf{y}) = \arg\max_{u_k} P(u_k|\mathbf{y}) \tag{3.12} $$

$$ = \arg\max_{u_k} \sum_{\mathbf{u}'\in\mathcal{U}^K:\,u'_k = u_k} P(\mathbf{u}'|\mathbf{y}) \tag{3.13} $$

$$ = \arg\max_{u_k} \sum_{\mathbf{u}'\in\mathcal{U}^K:\,u'_k = u_k} P(\mathbf{u}',\mathbf{y}) \tag{3.14} $$

for all k = 0, ..., K − 1. The MAP decision rule for bit-wise decoding maximizes $P_{\text{correct}} = P[\hat{U}_k = U_k]$. At the same time it minimizes the symbol error probability $P_{\text{Symbolerr}} = P[\hat{U}_k \neq U_k]$. Likewise we define the bit-wise ML decoding rule. Table 3.1 summarizes all four cases.
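Both decoding rules are easily illustrated by brute force on a toy code. The following Python sketch (our own illustration) performs block-MAP decoding, (3.9), and symbol-MAP decoding, (3.13), for the 3-bit repetition code on a BSC; for this tiny code the two rules agree, but on longer codes they may decide individual bits differently:

    from itertools import product

    eps = 0.2                                  # assumed BSC crossover probability
    code = [(0, 0, 0), (1, 1, 1)]              # 3-bit repetition code

    def p_y_given_x(y, x):
        # Memoryless BSC: product of the per-bit transition probabilities.
        p = 1.0
        for yi, xi in zip(y, x):
            p *= eps if yi != xi else 1.0 - eps
        return p

    y = (1, 0, 1)                              # received word

    # Block-MAP with a uniform prior: maximize p(y|x) over the codewords.
    x_block = max(code, key=lambda x: p_y_given_x(y, x))

    # Symbol-MAP: for each bit, marginalize over all codewords sharing it.
    x_sym = tuple(max((0, 1),
                      key=lambda b: sum(p_y_given_x(y, x)
                                        for x in code if x[k] == b))
                  for k in range(3))

    print(x_block, x_sym)                      # (1, 1, 1) in both cases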


3.2 Factor Graphs

After having laid out the decision-theoretic background of decoding, we will briefly have a look at an important graphical model which is used afterwards to describe both the code and the decoding system.

3.2.1 Definition of Factor Graphs

According to the definition in [97], a factor graph is a bipartite graph that expresses how a "global" function of many variables factors into a product of "local" functions. This generalization of Tanner graphs [98] by adding hidden state variables has been introduced in [66] and in Wiberg's doctoral thesis [67]. Factor graphs subsume many other graphical models such as Markov random fields and Bayesian networks. To see the relationship between these graphical models and factor graphs, the reader may consult [97] and the references therein.

To introduce the factor-graph concept, let us start with a very simple example [97]. Take a real-valued function $g(x_1,x_2,x_3,x_4,x_5)$ of five variables that can be written as the product

$$ g(x_1,x_2,x_3,x_4,x_5) = f_A(x_1)\,f_B(x_2)\,f_C(x_1,x_2,x_3)\,f_D(x_3,x_4)\,f_E(x_3,x_5) \tag{3.15} $$

of five functions $f_A$, $f_B$, $f_C$, $f_D$, and $f_E$.

The factor graph corresponding to (3.15) is shown in Fig. 3.2. We can identify a circle for each variable $x_i$, representing the variable nodes, and a filled rectangle for each factor f, representing the function nodes. The variable node for $x_i$ is connected to the function node for f by an edge if and only if $x_i$ is an argument of f. A third type of node, not actually present in Fig. 3.2, is the class of auxiliary variables or state nodes, which are drawn as double circles (see e.g. Fig. 3.5). They form a subset of the variable nodes and are often used to shape the factor graph. They are not observable from the outside of the system.

Figure 3.2 A factor graph expressing that a global function g(x1, x2, x3, x4, x5) factors as the product of the local functions fA(x1), fB(x2), fC(x1, x2, x3), fD(x3, x4), and fE(x3, x5).

Forney’s Normal Graphs

The original definition of factor graphs allows only the connection of nodes of different type [67]. As we will see in Chapter 5, it might be interesting to allow direct connections between function nodes. By doing this, the factor graphs and the sum-product algorithm that runs on these factor graphs can be transformed directly into an analog transistor-level description. In [99, 100], Forney takes the original definition of factor graphs and applies some modifications to allow direct connections of function nodes. He calls these new graphs normal graphs. He uses the convention that the degree of a variable node has to be less than or equal to 2, i.e., variable nodes have at most two connections. By doing this, he can very easily separate external variables (I/O nodes observable from outside) from internal variables or state variables. State-variable nodes have degree two, whereas external variable nodes have only degree one. As we will see in the introduction to the sum-product algorithm, a variable node with two connecting edges has no processing task to fulfill, since it passes the messages right away to the next function node. Since the internal variables have no explicit function anymore, they are no longer needed and can safely be omitted. Thus internal variables, i.e., what we called states in the original factor-graph definition, appear only as the edges between function nodes. Forney even modifies the original graphical representation by introducing so-called stub nodes for the I/O nodes. In the context of this thesis, we do not apply the whole notational rigor and just use the node-splitting property in factor graphs for our implementation purposes.

The transformation of an original factor graph into the new Forney-style factor graph can be carried out easily by node splitting. Variable nodes with a degree higher than 2 can be replaced by the special function 'equal' and variable nodes of degree 2, as in Fig. 3.3. The 'equal' function node assures that y′, y′′ and y′′′ are kept at the same value. For a formal definition of the 'equal' function see Section 3.2.2.

Figure 3.3 a) original factor graph, b) node-splitting procedure to keep the degree of variable nodes below or equal to 2.

Function Summaries

Recalling the simple example of Fig. 3.2, we suppose that we are interested in determining the influence of the rest of the global function on one specific variable. To find the solution to this problem, we need to compute a function summary or marginal. According to the slightly unconventional summation notation of [97], the 'summary for $x_2$' of a function h of three variables $x_1$, $x_2$, and $x_3$ is denoted by a 'not-sum' of the form

$$ \sum_{\sim\{x_2\}} h(x_1,x_2,x_3) \triangleq \sum_{x_1}\sum_{x_3} h(x_1,x_2,x_3). \tag{3.16} $$

Therefore, using the notation of (3.16), we have the general i-th marginal function associated with $g(x_1,\ldots,x_n)$, denoted by

$$ g_i(x_i) \triangleq \sum_{\sim\{x_i\}} g(x_1,\ldots,x_n), \tag{3.17} $$

which is the summary for $x_i$ of g. To get the marginal function, we thus sum over all possible configurations of g other than $x_i$. The need for marginal functions will become clearer in the following subsection, where a specific decoding example is described.


3.2.2 Examples of Factor Graphs

The application range of factor graphs is very broad. Factor graphs can be applied to both set-theoretic and probabilistic modeling. Examples of set-theoretic modeling include code descriptions and state-space models, whereas typical examples of probabilistic modeling include Markov chains and hidden Markov models. Furthermore, factor graphs can even be used to describe fast transforms, which has been demonstrated by Kschischang et al. in [97] for the case of the Fast Fourier Transform (FFT). We will restrict ourselves in this subsection to the presentation of very simple but important examples in the context of coding and decoding.

Linear Block Codes

Let us begin with the description of linear codes by factor graphs. For every code, one or even more than one factor-graph representation can be found. In the case of linear codes, it is convenient to start with the parity-check matrix, as for example in (2.7). The Hamming code $\mathcal{C}$ is defined over GF(2), and each binary 7-tuple $\mathbf{x} \triangleq (x_1,x_2,x_3,x_4,x_5,x_6,x_7)$ has to satisfy $H\mathbf{x}^T = \mathbf{0}^T$. Each row of the parity-check matrix gives us an equation that has to be satisfied by $\mathbf{x}$, and all the equations have to be satisfied simultaneously to form a valid codeword. Now, "Iverson's convention" [101] is very useful to assist behavioural modeling. If P is a predicate, i.e., a Boolean proposition, then [P] is the binary function indicating the truth of P:

$$ [P] \triangleq \begin{cases} 1 & \text{if } P;\\ 0 & \text{otherwise.} \end{cases} \tag{3.18} $$

Figure 3.4 The factor graph of the binary Hamming code defined by the parity-check matrix (2.7).

For each equation of the code $\mathcal{C}$, a binary indicator function can be defined which describes the satisfaction of the check equation. The product of these functions then indicates membership in the code. Therefore, the three rows of the considered code $\mathcal{C}$ can be used to derive the code membership indicator function $g(x_1,x_2,\ldots,x_7)$ as the product of three local functions:

$$ g(x_1,x_2,\ldots,x_7) = [(x_1,x_2,\ldots,x_7)\in\mathcal{C}] \tag{3.19} $$

$$ = [x_4\oplus x_5\oplus x_6\oplus x_7 = 0]\cdot[x_2\oplus x_3\oplus x_6\oplus x_7 = 0]\cdot[x_1\oplus x_3\oplus x_5\oplus x_7 = 0], \tag{3.20} $$

where ⊕ denotes the sum operator (or XOR function) in GF(2). Using (3.20) we can draw the corresponding factor graph as shown in Fig. 3.4. Therein, instead of the black square used in general factor graphs, we have inserted the special symbol ⊞ to indicate the actual parity-check function. We will freely use symbols for function nodes, depending on the type of the local function. However, variable nodes will always be drawn as circles; sometimes, though, double circles will be used for auxiliary variables (states), as in Fig. 3.5.
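The three local indicator functions of (3.20) translate directly into executable form. In the following Python sketch (our own illustration), enumerating all 2^7 binary 7-tuples and keeping those with g = 1 recovers the 2^4 = 16 codewords of the (7,4) Hamming code:

    from itertools import product

    def member(x):
        # Code membership indicator (3.20): the product of the three
        # local parity-check indicator functions.
        x1, x2, x3, x4, x5, x6, x7 = x
        return int((x4 ^ x5 ^ x6 ^ x7) == 0) \
             * int((x2 ^ x3 ^ x6 ^ x7) == 0) \
             * int((x1 ^ x3 ^ x5 ^ x7) == 0)

    codewords = [x for x in product((0, 1), repeat=7) if member(x)]
    print(len(codewords))                      # 16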

Convolutional Codes

We can draw a factor graph for a convolutional code in the same style as in Fig. 3.4. The membership functions are somewhat different this time: as we have seen in Fig. 2.5, a valid codeword of a convolutional code can be read off as a path in the corresponding trellis diagram. At each time step, a new trellis section is added. Thus, not surprisingly, the factor graph of a convolutional code is rectilinear, i.e., a straight line, as shown in Fig. 3.5. Each function node is characterized by a binary indicator function which can be drawn as a trellis diagram where edges exist only for valid state transitions.


Figure 3.5 The factor graph representation of a terminated trellis code.

Figure 3.6 The factor graph of a very short turbo code.

For a non-terminated convolutional code, the length of the factor graph is, like the trellis diagram, possibly infinite. The factor graph of a tail-biting code is formed by identifying the first and the last state node of the factor graph of an ordinary convolutional code, thereby forming a ring structure.

Turbo Codes

Taking again the turbo encoder of Fig. 2.8, we can easily draw the factor graph using our knowledge of the form of convolutional codes. The concatenation of the two constituent codes C1 and C2 of a turbo code as shown in Fig. 2.8 is realized by the interleaver. This permutation π can be identified as the connection pattern in the middle row of Fig. 3.6. Obviously, to get a good code according to Shannon [1], the codes have to be much longer than those of Fig. 3.6.

The connection pattern is the main feature of the factor graph of a turbo code. Apart from the original turbo codes, a whole class of similar codes has been identified. These so-called turbo-style codes also have a highly connected pattern as their key feature. As a beautiful example, we redraw the factor graph of the [22, 11, 7] subcode of a binary Golay code in Fig. 3.7 (see [26, 67]). The membership indicator function of the function nodes on the main ring structure is again characterized by a trellis diagram, as shown in the inset of Fig. 3.7.

Figure 3.7 The factor graph of a turbo-style code.

Note that the aesthetically very pleasing appearance of this factor graph is mainly due to the introduction of additional state-variable nodes, which let us factor the overall function in the desired way.

Probability distributions and decoding

Probability distributions are another important class of functions that can be represented by factor graphs. Since conditional and unconditional independence of random variables is expressed in terms of a factorization of their joint probability mass function or joint probability density function, factor graphs for probability distributions appear in many different situations. As we saw in Section 3.1.1 discussing basic decision theory, decoding is one of the many applications where exactly this kind of function arises.

A situation that is often modeled in coding theory is as follows: select a codeword $(x_1,\ldots,x_n)$ with uniform probability from a code $\mathcal{C}$ of length n, which is then transmitted over a discrete memoryless channel without feedback that has a corresponding output $(y_1,\ldots,y_n)$. Since we assumed a memoryless channel, by definition the conditional probability mass function or conditional probability density function evaluated at a particular channel output is given by the product form

$$ p(y_1,\ldots,y_n|x_1,\ldots,x_n) = \prod_{i=1}^{n} p_{Y|X}(y_i|x_i). \tag{3.21} $$

The a priori probability of selecting a particular codeword is constant. Thus the a priori joint probability mass function of the codeword is proportional to the code-set membership function. Using (3.21), we can therefore write the joint probability mass function of $\{x_1,\ldots,x_n,y_1,\ldots,y_n\}$ as

$$ p(x_1,\ldots,x_n,y_1,\ldots,y_n) = \gamma\cdot[(x_1,\ldots,x_n)\in\mathcal{C}]\cdot\prod_{i=1}^{n} p_{Y|X}(y_i|x_i), \tag{3.22} $$

where γ is a constant, positive scale factor. The code membership indicator function $[(x_1,\ldots,x_n)\in\mathcal{C}]$ itself may factor into a product of local indicator functions, as we have seen in (3.20). Reusing the Hamming code example, we can write the joint probability mass function according to

$$ p(x_1,\ldots,x_7,y_1,\ldots,y_7) = \gamma\cdot[x_4\oplus x_5\oplus x_6\oplus x_7 = 0]\cdot[x_2\oplus x_3\oplus x_6\oplus x_7 = 0]\cdot[x_1\oplus x_3\oplus x_5\oplus x_7 = 0]\cdot\prod_{i=1}^{7} p_{Y|X}(y_i|x_i), \tag{3.23} $$

which can easily be described by the factor graph shown in Fig. 3.8.

Figure 3.8 Factor graph for the joint probability density function of channel input and output for the Hamming code of Fig. 3.4.

Compared to the factor graph of the code (see Fig. 3.4), the factor graph of the joint probability mass function of codeword symbols and channel output symbols is obtained simply by augmenting the factor graph of the code. This is done by adding a channel function and the corresponding channel-input variable for each I/O variable node of the factor graph, as demonstrated in Fig. 3.8. This is a very interesting observation, since we realize that the factor graph of the code and the factor graph of a possible decoding scheme are tightly related. In general, one possible decoder can be derived directly from the code structure.
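The factorization (3.23) can be checked by brute force: summing the joint over all 2^7 configurations with one code symbol fixed yields the bit-wise posterior of (3.13). The Python sketch below (our own illustration, assuming a BSC with crossover probability ε for the channel factors) does exactly the computation that the sum-product algorithm of Section 3.3 organizes efficiently:

    from itertools import product

    eps = 0.1                                  # assumed BSC crossover probability

    def member(x):                             # indicator function (3.20)
        x1, x2, x3, x4, x5, x6, x7 = x
        return ((x4 ^ x5 ^ x6 ^ x7) == 0 and (x2 ^ x3 ^ x6 ^ x7) == 0
                and (x1 ^ x3 ^ x5 ^ x7) == 0)

    def joint(x, y):
        # Unnormalized joint pmf (3.23) with the scale factor gamma = 1.
        if not member(x):
            return 0.0
        p = 1.0
        for xi, yi in zip(x, y):
            p *= (1.0 - eps) if xi == yi else eps
        return p

    y = (1, 1, 0, 1, 0, 1, 0)                  # an arbitrary received word
    num = {b: sum(joint(x, y) for x in product((0, 1), repeat=7) if x[0] == b)
           for b in (0, 1)}
    print(num[1] / (num[0] + num[1]))          # posterior P(x1 = 1 | y)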

Logic functions

Surprisingly, even if you have not explicitly heard of factor graphs before, you may already be quite familiar with certain types of them. The local functions of Fig. 3.9 are drawn as logic gates and remind us of the definition of the corresponding binary indicator function. For example, the OR gate with inputs $u_1$ and $u_2$ and output $x_1$ represents the binary indicator function $f(u_1,u_2,x_1) = [x_1 = (u_1\ \text{OR}\ u_2)]$. The global function of the factor-graph representation of the logic circuit of Fig. 3.9 can be written as

$$ g(u_1,u_2,u_3,u_4,x_1,x_2,y_1) = [x_1 = (u_1\ \text{OR}\ u_2)]\cdot[x_2 = (u_3\ \text{XOR}\ u_4)]\cdot[y_1 = (x_1\ \text{NAND}\ x_2)]. \tag{3.24} $$

Figure 3.9 A logic circuit is also a factor graph.

The global function g takes the value 1 if and only if all its arguments form a valid configuration which is consistent with the logic circuit of Fig. 3.9. In general, every block diagram may be viewed and drawn as a factor graph.
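Written out in Python (a sketch of our own), the global indicator function (3.24) is a one-to-one transcription of the circuit:

    def g(u1, u2, u3, u4, x1, x2, y1):
        # Global indicator (3.24): 1 iff the configuration is consistent
        # with the OR, XOR and NAND gates of the circuit.
        return int(x1 == (u1 | u2)) \
             * int(x2 == (u3 ^ u4)) \
             * int(y1 == 1 - (x1 & x2))        # NAND = NOT AND

    print(g(1, 0, 1, 1, 1, 0, 1))              # consistent configuration -> 1
    print(g(1, 0, 1, 1, 0, 0, 1))              # x1 contradicts u1 OR u2  -> 0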

3.3 The Sum-Product Algorithm

The sum-product algorithm is a generic algorithm that operates on a factor graph via a sequence of local computations at every factor-graph node [97]. The computation rules consist only of multiplications and additions, hence the name 'sum-product algorithm'. The local results are passed as messages along the edges of the factor graph. The algorithm can be used to compute the exact function summary, as defined by (3.17), in a factor graph that forms a tree, i.e., has no loops. But the sum-product algorithm can also be applied to factor graphs with cycles, where it results in an iterative algorithm without a natural termination. This makes the function summary non-exact. But the decoding of turbo codes or low-density parity-check codes is one of the most exciting applications that reflect precisely this situation of a factor graph with cycles. And with some precautions, the algorithm performs very well.

To formally start the mathematical presentation of the sum-product algorithm, we would like to work through a short example [102]. Let us consider the specific case of the real-valued global function defined in (3.15), which may represent a conditional joint probability mass function of a collection of discrete random variables, given some observation y.

Figure 3.10 Gathering separate product terms in the factor graph to compute the marginal g1(x1).

We are then interested in the function summary

$$ p(x_1|y) = \sum_{x_2}\sum_{x_3}\sum_{x_4}\sum_{x_5} g(x_1,x_2,x_3,x_4,x_5) = g_1(x_1). \tag{3.25} $$

Using the factorization given by (3.15), we derive

$$
\begin{aligned}
p(x_1|y) &= \sum_{x_2}\sum_{x_3}\sum_{x_4}\sum_{x_5} f_A(x_1)\,f_B(x_2)\,f_C(x_1,x_2,x_3)\,f_D(x_3,x_4)\,f_E(x_3,x_5)\\
&= f_A(x_1)\underbrace{\sum_{x_2} f_B(x_2)\sum_{x_3} f_C(x_1,x_2,x_3)\underbrace{\underbrace{\sum_{x_4} f_D(x_3,x_4)}_{f_D}\,\underbrace{\sum_{x_5} f_E(x_3,x_5)}_{f_E}}_{f_{DE}}}_{f_{BCDE}}
\end{aligned}
\tag{3.26}
$$

We observe immediately that $g_1(x_1)$ can be calculated by knowing only $f_A$ and $f_{BCDE}$. The latter can be computed by knowing just $f_B$, $f_C$, and $f_{DE}$. Finally, $f_{DE}$ can be calculated by knowing just $f_D$ and $f_E$. The products can be assembled in the factor graph as shown in Fig. 3.10.
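The saving that results from pulling the sums inward can be verified numerically. The Python sketch below (with arbitrary made-up local functions over binary variables) computes g1(x1) once by the naive four-fold sum of (3.25) and once by the nested evaluation of (3.26); the results coincide:

    from itertools import product

    # Arbitrary local functions over binary variables (illustration only).
    fA = {0: 0.6, 1: 0.4}
    fB = {0: 0.3, 1: 0.7}
    fC = {k: 0.1 + 0.2 * (k[0] ^ k[1] ^ k[2]) for k in product((0, 1), repeat=3)}
    fD = {k: 0.5 + 0.1 * (k[0] & k[1]) for k in product((0, 1), repeat=2)}
    fE = {k: 0.2 + 0.3 * (k[0] | k[1]) for k in product((0, 1), repeat=2)}

    def g1_naive(x1):
        # Direct evaluation of (3.25): sum over all configurations.
        return sum(fA[x1] * fB[x2] * fC[x1, x2, x3] * fD[x3, x4] * fE[x3, x5]
                   for x2, x3, x4, x5 in product((0, 1), repeat=4))

    def g1_factored(x1):
        # Nested sums of (3.26): the inner summaries are computed once.
        fD_sum = {x3: fD[x3, 0] + fD[x3, 1] for x3 in (0, 1)}
        fE_sum = {x3: fE[x3, 0] + fE[x3, 1] for x3 in (0, 1)}
        fBCDE = sum(fB[x2] * sum(fC[x1, x2, x3] * fD_sum[x3] * fE_sum[x3]
                                 for x3 in (0, 1))
                    for x2 in (0, 1))
        return fA[x1] * fBCDE

    print(g1_naive(0), g1_factored(0))         # identical values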

With each node in the factor graph we can now imagine an associated processor which is capable of computing local products and local function summaries. These processors may communicate by sending and receiving messages from neighbouring nodes. The messages are whole distributions, i.e., the outcomes of the function nodes, which are passed from one factor-graph node to another connected by an edge. In general, they represent discrete probability mass functions, but continuous probability distributions are also included in the framework. Through this message-passing behaviour, all information needed to calculate $g_1(x_1)$ becomes available at $x_1$. Hence, the information is distributed fully bi-directionally on all branches of the network if we calculate the function summary for all variables.

3.3.1 The Sum-Product Update Rules

The simple computational update rule of the sum-product algorithm can be described, in all generality, as follows [97]:

    The message sent from a node v on an edge e is the product of the
    local function at v (or the unit function if v is a variable node)
    with all messages received at v on edges other than e, summarized
    for the variable associated with e.

Thus, after calculating the product of all incoming messages including the local function, a summary function with respect to the variable to which the resulting message is sent has to be applied.

Let us denote by $\mu_{v\to w}$ the message sent from node v to node w. Then, as illustrated in Fig. 3.11, two different computations can be expressed for the update between a variable node and a function node and vice versa:

variable-to-function update:

$$ \mu_{x\to f}(x) = \prod_{h\in n(x)\setminus\{f\}} \mu_{h\to x}(x) \tag{3.27} $$

function-to-variable update:

$$ \mu_{f\to x}(x) = \sum_{\sim\{x\}}\left(f(X_{n(f)})\prod_{y\in n(f)\setminus\{x\}}\mu_{y\to f}(y)\right) \tag{3.28} $$

Figure 3.11 The sum-product algorithm update rules illustrated in a fragment of a factor graph.

The set n(v) denotes the neighbours of node v, i.e., $n(f) = \{x, y_1, y_2, \ldots\}$ and $n(x) = \{f, h_1, h_2, \ldots\}$. The particularly simple form of (3.27) is due to the fact that there is no local function to include, and the summary for x of a product of functions of x is simply the product itself. Equation (3.28), on the other hand, generally involves complicated function multiplications with the summary operator applied afterwards.

Special cases arise when a variable node has only degree 2, i.e., it has only two neighbours. Then the message is just passed on. Leaf nodes, i.e., nodes with only one neighbour, send the following messages:

$$ \mu_{x\to f}(x) = 1 \tag{3.29} $$

and

$$ \mu_{f\to x}(x) = f(x), \tag{3.30} $$

respectively, where, by a slightly abused notation, 1 denotes the unit function.
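For finite alphabets, both update rules fit in a few lines. The following Python sketch is our own illustration (messages are represented as dictionaries mapping symbols to values); it implements (3.27) and (3.28) and applies the function-to-variable rule to a soft-XOR node:

    from itertools import product

    def var_to_func(incoming):
        # (3.27): pointwise product of the messages arriving at a variable
        # node from all neighbours other than the target function node.
        out = {}
        for v in incoming[0]:
            p = 1.0
            for msg in incoming:
                p *= msg[v]
            out[v] = p
        return out

    def func_to_var(f, alphabets, target, incoming):
        # (3.28): multiply the local function by the incoming messages and
        # summarize for the target variable.  `alphabets` lists the alphabet
        # of every argument of f; `incoming` holds one message per
        # non-target argument, in argument order.
        out = {v: 0.0 for v in alphabets[target]}
        for args in product(*alphabets):
            w = f(*args)
            others = [a for i, a in enumerate(args) if i != target]
            for msg, a in zip(incoming, others):
                w *= msg[a]
            out[args[target]] += w
        return out

    # Message toward z of a soft-XOR node f(x, y, z) = [z = x XOR y]:
    f_xor = lambda x, y, z: 1.0 if z == (x ^ y) else 0.0
    mu_x, mu_y = {0: 0.9, 1: 0.1}, {0: 0.4, 1: 0.6}
    print(func_to_var(f_xor, [(0, 1)] * 3, 2, [mu_x, mu_y]))
    # -> {0: 0.42, 1: 0.58}
    print(var_to_func([mu_x, mu_y]))           # unnormalized product, (3.27)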

3.3.2 Message Passing Schedules

So far, we have discussed the update rules of the sum-product algorithm in detail, so we know exactly how to calculate the messages. But then the question of how to initiate the updates and how to sequence them arises immediately. In fact, finding the optimal update schedule with respect to the least number of calculations is a non-trivial problem.


It is not clear how message passing is initiated, since a message generally depends on messages that have been sent before. Initially we thus suppose that a unit message is present on every edge at any given vertex. This means that every node has sent a unit function to all of its neighbours. With this convention, every node is in a position to send a message at any time, starting from its equilibrium state.

A second assumption is that of a synchronized message-passing schedule, i.e., we assume a discrete-time signal-processing automaton synchronized with a global clock. Although this is not necessarily the case in practical implementations (as we will see with the asynchronous analog VLSI networks described later on in this thesis), it is a fairly reasonable assumption that simplifies the understanding of the problem considerably. Thus only one message may be passed on any given edge in any given direction during one clock cycle, and this message replaces any previous message passed on that edge in that direction. We say that a vertex v has a message pending at an edge e if the message that v can send on e is potentially different from the previous message sent on e. For example, variable nodes initially have no messages pending, since they would initially only send a unit message, and this is exactly what they are supposed to send. On the other hand, function nodes will create pending messages at the beginning. In general, whenever a message arrives at a vertex v, it will create a pending message at every edge other than the one where the message has arrived. Thus, a message that arrives at a leaf node will not cause any other pending message, since there are no edges other than e. Thus leaf nodes absorb pending messages, whereas non-leaf nodes distribute pending messages.

Although we make the assumption of a synchronized automaton, there is a huge number of different possible message-passing schemes. Fortunately, this is not a limiting problem. We only need to ensure that a schedule is nowhere idle for the sum-product algorithm to terminate on a cycle-free factor graph. We call a schedule that sends at least one pending message at each clock tick nowhere idle. If the cycle-free factor graph has a finite number of nodes, i.e., it is a finite tree, the calculations of the sum-product algorithm terminate in a finite number of steps. This is easily understood if one remembers that leaf nodes absorb pending messages and that in a finite tree every path eventually reaches a leaf.

Figure 3.12 Two message-passing schedules for a simple factor graph: a) the flooding schedule and b) a two-way schedule.

Conversely, factor graphs with cycles never have a nowhere-idle message-passing schedule. In these cases the calculations are terminated arbitrarily by truncation or by a suitable stopping rule. The number of iterations to be performed to get satisfying results is generally determined by simulation.

Two examples of possible message-passing schedules are shown in Fig. 3.12, where the flow of pending messages is visualized. A pending message is shown as a dot near the given edge, whereas the transmission of a message is indicated by an arrow attached to the dot. The time flow is also indicated, with messages sent at non-negative integer times t. Figure 3.12a) shows the so-called flooding schedule, in which all pending messages are sent during each clock cycle, starting with the pending messages at function nodes. Having in mind the original factor-graph formulation, one could equivalently say that first all variable nodes are updated and then, a time step later, all function nodes are updated, and so on. As there are no direct connections between two variable nodes of an original factor graph, the order of calculation of the different variable nodes is not important. The same applies also to the function nodes. This alternative formulation of the flooding scheme is not correct if Forney's normal graphs are used.


Though not optimal in the sense of the least number of computations, the flooding schedule is the simplest message-passing scheme to implement in software, since no special attention has to be paid to the order of the updates. As a second example, Fig. 3.12b) shows a two-way schedule. Under this schedule we will have exactly one pending message passing in each direction over a given edge for all time instances.

A third update schedule, which is particularly suitable for rectilinear factor graphs such as the factor graph of a trellis, is the forward-backward scheme. As the name already states, one first calculates in the forward direction of the rectilinear factor graph and then in the backward direction. A variable node and a function node are updated alternately until the end of the factor graph is reached. Then the whole process is reversed for the backward direction. Note that the backward direction is independent of the forward propagation, and the two can be computed at the same time. The forward-backward update schedule is a special case of the two-way schedule and uses the least possible number of calculations to build the function summary in a rectilinear factor graph. In fact, the forward-backward update schedule is essentially equivalent to the calculation sequence of the BCJR algorithm or forward-backward algorithm [103].

Having in mind the application of the sum-product algorithm to the decoding of error-correcting codes using large asynchronous analog networks, none of the previously mentioned message-passing schedules is appropriate. In fact, our previous assumption of synchronicity is not met, and the update schedule cannot be controlled directly. If the calculation speed of all network nodes is equal, the scheduling scheme can be seen as a fully bi-directional asynchronous flooding scheme: messages are transmitted at any time in every direction.

3.4 Probability Calculus Modules

In the previous sections of this chapter we have introduced the basics of factor graphs and the sum-product algorithm which can be run on these graphs. We have also seen in the factor-graph examples of Section 3.2.2 that the messages passed from one node to another often have the meaning of probabilities or probability density functions. To construct probability-propagation networks, we consider in the following building blocks as shown in Fig. 3.13.

Figure 3.13 The building block of probability propagation networks.

The building blocks compute a discrete probability mass function $p_Z$ from the discrete probability mass functions $p_X$ and $p_Y$ as follows: let $\mathcal{X}$, $\mathcal{Y}$, and $\mathcal{Z}$ be finite sets. Let $p_X$ and $p_Y$ be the input probability mass functions defined on the sets (alphabets) $\mathcal{X}$ and $\mathcal{Y}$, respectively. Let $p_Z$ be the output probability mass function on $\mathcal{Z}$ defined by

$$ p_Z(z) = \gamma \sum_{x\in\mathcal{X}}\sum_{y\in\mathcal{Y}} p_X(x)\,p_Y(y)\,f(x,y,z), \qquad \forall z\in\mathcal{Z}, \tag{3.31} $$

where f is a function from $\mathcal{X}\times\mathcal{Y}\times\mathcal{Z}$ into $\{0,1\}$ and where γ is an appropriate scale factor that does not depend on z. The scale factor γ is mathematically required to yield a probability distribution $p_Z(z)$ at the output whose sum is $\sum_i p_Z(z_i) = 1$. Equation (3.31) can be identified as the function-to-variable update of the sum-product algorithm as defined by (3.28). This processing step is used to build summary functions or marginals as defined by (3.17). Note that, compared to the notation of (3.28), the different terms of (3.31) are slightly rearranged.
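Equation (3.31) is simple enough to state directly in executable form. The Python sketch below (our own illustration) implements the generic two-input module for arbitrary finite alphabets, with the scale factor γ applied by normalizing at the end:

    def building_block(f, pX, pY, Z):
        # Two-input module of Fig. 3.13: evaluates (3.31) and normalizes,
        # i.e. the scale factor gamma is applied at the end.
        pZ = {z: sum(pX[x] * pY[y] * f(x, y, z) for x in pX for y in pY)
              for z in Z}
        s = sum(pZ.values())
        return {z: p / s for z, p in pZ.items()}

    # Example: the soft-XOR indicator f(x, y, z) = [z = x XOR y].
    f_xor = lambda x, y, z: 1.0 if z == (x ^ y) else 0.0
    print(building_block(f_xor, {0: 0.8, 1: 0.2}, {0: 0.6, 1: 0.4}, (0, 1)))
    # -> {0: 0.56, 1: 0.44}, matching the soft-XOR formula (3.33) below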

The {0,1}-valued functions f can be illustrated by trellis modules as in Fig. 2.6, the inset of Fig. 3.7, and Fig. 3.14. Such a trellis module is a bipartite graph with labeled edges, as introduced in Section 2.1.5. The set of left-hand vertices is $\mathcal{X}$, the set of right-hand vertices is $\mathcal{Z}$, and an edge between $x\in\mathcal{X}$ and $z\in\mathcal{Z}$ with label $y\in\mathcal{Y}$ exists if and only if f(x, y, z) = 1. Conversely, the trellis module uniquely defines f. In the context of coding theory, the binary indicator functions f are known as the local indicator functions of the factorized global code membership indicator functions, as we have seen in the factor-graph examples of Section 3.2.2.

Figure 3.14 Trellis representations of a) the equal gate, b) the soft-XOR gate, and c) the backward reasoning on a soft-AND gate.

3.4.1 Soft-Logic Gates

Equal Gate

In Section 3.2.1, we encountered the 'equal' function to allow node splitting. We will now have a closer look at this special function. The function f(x, y, z) for this particular case is equal to 1 if and only if x = y = z, and f(x, y, z) = 0 otherwise. The corresponding trellis module is shown in Fig. 3.14a). The indicator function f can be substituted into (3.31) to calculate the probability formulation of the output distribution, which is given by

$$ \begin{bmatrix} p_Z(0)\\ p_Z(1) \end{bmatrix} = \gamma\begin{bmatrix} p_X(0)\,p_Y(0)\\ p_X(1)\,p_Y(1) \end{bmatrix}, \tag{3.32} $$

where γ is a scale factor to satisfy $p_Z(0) + p_Z(1) = 1$. Thus the computations of the equal gate reduce to the component-wise product of $p_X$ and $p_Y$. Such computations appear whenever independent information about some random variable is combined.

Soft-XOR Gate

Another common function f is defined as f(x, y, z) = 1 if and only if z = x ⊕ y, with ⊕ denoting standard modulo-2 addition, and f(x, y, z) = 0 otherwise. The corresponding trellis diagram is depicted in Fig. 3.14b). With this function f, the module of Fig. 3.13 becomes a soft-XOR gate: if $p_X$ and $p_Y$ are the distributions of two independent binary random variables X and Y, respectively, then the distribution of X ⊕ Y is $p_Z$, given by

$$ \begin{bmatrix} p_Z(0)\\ p_Z(1) \end{bmatrix} = \begin{bmatrix} p_X(0)\,p_Y(0) + p_X(1)\,p_Y(1)\\ p_X(0)\,p_Y(1) + p_X(1)\,p_Y(0) \end{bmatrix}. \tag{3.33} $$
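As a check, the closed forms (3.32) and (3.33) can be written out directly. The small Python sketch below (our own illustration) reproduces what the generic module of (3.31) computes for these two indicator functions:

    def equal_gate(pX, pY):
        # Component-wise product (3.32); gamma applied by normalization.
        u0, u1 = pX[0] * pY[0], pX[1] * pY[1]
        s = u0 + u1
        return {0: u0 / s, 1: u1 / s}

    def soft_xor(pX, pY):
        # Soft-XOR output distribution (3.33); already normalized.
        return {0: pX[0] * pY[0] + pX[1] * pY[1],
                1: pX[0] * pY[1] + pX[1] * pY[0]}

    pX, pY = {0: 0.8, 1: 0.2}, {0: 0.6, 1: 0.4}
    print(equal_gate(pX, pY))                  # {0: 0.857..., 1: 0.142...}
    print(soft_xor(pX, pY))                    # {0: 0.56, 1: 0.44}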

Soft-AND Gate in Backward Direction

As a generalization of the soft-XOR gate, “soft” versions of allBackward-reasoningon soft-logic gates is

also possiblestandard logic gates can be constructed by a suitable choice off . But not only the standard forward-functioning direction oflogic gates can be considered. Recall that soft-logic versionsof factor graph nodes work fully bi-directionally. Thus, back-ward reasoning, i.e, from one of the inputs and the output tothe other input, is possible, too. For example assume the logicAND function with inputsy and z and outputx in the back-ward direction fromx and y to z which is shown by the trel-lis diagram of Fig. 3.14(c). If we substitute the correspondingcharacteristic indicator functionf into (3.31) we get[

    \begin{bmatrix} p_Z(0) \\ p_Z(1) \end{bmatrix} = \gamma \begin{bmatrix} p_X(0)\, p_Y(0) + p_X(0)\, p_Y(1) \\ p_X(0)\, p_Y(0) + p_X(1)\, p_Y(1) \end{bmatrix}, \qquad (3.34)

where again γ is needed to ensure a probability distribution p_Z at the output. This module is very unlikely to occur in coding, but would naturally appear in many applications of Bayesian networks.
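Using the combine sketch from above, the three gates of Fig. 3.14 are one-line indicator functions, and evaluating them reproduces (3.32)–(3.34); the example numbers are, of course, arbitrary:

    # Indicator functions for the gates of Fig. 3.14 (binary alphabets).
    equal    = lambda x, y, z: int(x == y == z)
    soft_xor = lambda x, y, z: int(z == x ^ y)
    and_back = lambda x, y, z: int(x == (y & z))   # AND with inputs y, z and output x

    px, py = np.array([0.8, 0.2]), np.array([0.6, 0.4])
    print(combine(px, py, 2, equal))      # component-wise product, cf. (3.32)
    print(combine(px, py, 2, soft_xor))   # distribution of X xor Y, cf. (3.33)
    print(combine(px, py, 2, and_back))   # backward reasoning, cf. (3.34)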

3.4.2 Building Blocks with Multiple Inputs

[Dissection property using Forney's normal graphs]

The building block of Fig. 3.13 takes two input probability distributions and calculates one output probability distribution. In general, however, factor graph nodes have a degree of more than three, which implies that building blocks with more than two inputs are needed. This is not an actual problem, since every building block with more than two input distributions can be dissected into a cascade of two-input building blocks as shown in Fig. 3.15b). This approach is also consistent with Forney's definition of normal graphs [99], where internal degree-2 variable nodes do not modify the messages from the incoming branch to the outgoing branch and can therefore be omitted from the drawing.


Figure 3.15 A factor graph representation of the dissection property of building blocks with n-input probability distributions to make them compatible with the 2-input building blocks of Fig. 3.13. The figure shows a) an n-input block, b) a cascade structure, and c) a tree structure. State nodes are intentionally omitted (see [99]).

Alternatively to the partitioning of Fig. 3.15b), we might draw a tree structure, which is an interesting option from the point of view of signal delays. Because every processing node adds some delay due to parasitic effects in the transistor circuits we want to implement, the dissection of Fig. 3.15b) adds much more delay from port 1 to port n than the tree structure of Fig. 3.15c); in the cascade, only a port in the middle sees a short path. By applying a tree structure, the delay from one port to any other port is partly equalized.
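The two dissections compute the same marginal; they differ only in depth. A sketch (ours, reusing the hypothetical combine and soft_xor helpers from above) for combining n input distributions through two-input soft-XOR blocks:

    from functools import reduce

    def xor_combine(p_a, p_b):
        # Two-input soft-XOR building block, cf. (3.33).
        return combine(p_a, p_b, 2, soft_xor)

    def cascade(dists):
        # Fig. 3.15b): worst-case depth n-1 from port 1 to port n.
        return reduce(xor_combine, dists)

    def tree(dists):
        # Fig. 3.15c): pairwise reduction, depth ~ log2(n).
        while len(dists) > 1:
            nxt = [xor_combine(dists[i], dists[i + 1])
                   for i in range(0, len(dists) - 1, 2)]
            if len(dists) % 2:
                nxt.append(dists[-1])
            dists = nxt
        return dists[0]

Both functions return the same output distribution; in a transistor implementation, only the accumulated parasitic delay differs.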

[Fully bi-directional building blocks can be constructed]

Fully bidirectional factor graph calculation nodes for probability propagation networks can be constructed by using building blocks as in Fig. 3.13. To optimize the number of operations, intermediate results from certain building blocks may be reused. As an example, we look at a soft-XOR gate of degree 4, i.e., a soft-XOR gate with four neighbours, as shown in Fig. 3.16b).


Figure 3.16 A fully bidirectional factor graph node of a soft-XOR gate implemented using 2-input building blocks: a) factor graph node and b) calculations implemented using 2-input building blocks. The inputs and outputs denote the incoming and outgoing messages (probability distributions), respectively.

The actual transistor-level implementation of the building blocks in Fig. 3.13 is described in the following chapter. There we explore how a schematic of an actual probability gate can be generated directly by simply knowing its corresponding indicator function f.


Chapter 4

Circuit Implementation

In Chapter 3 we learnt how to build a factor graph description of a given code, and how to augment it into the factor-graph description of a decoder, on which the sum-product algorithm can be applied to find a symbol-wise MAP solution of the decoding problem. Now we will present a generic solution for the transistor-level implementation of these building blocks. Furthermore, practical design aspects and design issues are discussed.

4.1 Basic Circuit

[Main operations of the building blocks: multiplication and summation]

The underlying equations of the general building block of Fig. 3.13 can be separated into two main computational parts: first, component-wise products of the incoming discrete probability distributions have to be built. Second, the product terms that belong to the valid configurations, i.e., fulfill the binary indicator function f, are summed for the appropriate terms of the discrete output probability distribution.

4.1.1 Signal Summation

[Summation is easy in the current domain]

Summing signals is easily accomplished in the current domain, i.e., when signals are represented by currents. This is due to Kirchhoff's current summation law, which states that the sum of all currents along the incoming branches to a given node is equal to the sum of all currents of the outgoing branches. If only one outgoing branch exists, it automatically carries the sum current of all incoming branches. This means that current summation is simply done by connecting wires. As in the trellis diagram, the selective sum needed for the implementation of (3.31) can be built by connecting the appropriate wires (terms) of the pair-wise products. Doing the same operation in the voltage domain would definitely need more circuitry. Most of the traditional voltage-adder circuits, such as, for example, the opamp-based voltage adders (see e.g. [104]), are based on the principle of current addition anyway, but additionally they need a domain transformation from voltage to current and back. Furthermore, the discrete-time switched-capacitor (SC) adders are not direct voltage adders either. The voltage signals applied to SC circuits are inherently transformed to a charge representation on the capacitors. Hence, the addition is based on a charge transfer from one capacitor to another, and finally the charges are inherently transformed back to a voltage. A comprehensive overview of the general SC techniques can be found in [43, 44]. Note that SC-based circuits are no longer continuous-time, asynchronous circuits and thus not suitable for our intended asynchronous networks. Still, this principle is extensively used in SC filter circuits.

4.1.2 Basic Translinear Network Theory

[The term 'translinear' was coined by Barrie Gilbert in 1975]

Building the outer products of the incoming discrete probability distributions is a more complex task than doing the summation of currents. Luckily, Gilbert coined the term translinear network in 1975 [105]. This design style is also current-based and thus fits well with the extremely efficient current-addition primitive described in the previous subsection. Translinear networks (TN) are based on an astonishingly simple theory [106, 107]. The heart of translinear networks are bipolar junction transistors (BJT). These transistors exhibit an exponential characteristic of the collector current in forward active mode according to

    I_C = A_E J_S(T)\, e^{V_{BE}/n U_T} = I_S(T)\, e^{V_{BE}/n U_T}, \qquad (4.1)

where A_E is the emitter area, J_S is the saturation current density, and I_S is the saturation current. The absolute temperature is T, and U_T denotes the thermal voltage kT/q. The factor n is the 'emission coefficient', an indicator of imperfect emission of electrons, generally close to unity. We use the standard notation for the transistor terminals (see e.g. [108, 109]). A similar characteristic can be found for the drain current of a weakly inverted MOS transistor with

    I_D = \frac{W}{L} J_0(T)\, e^{(V_G - n V_S)/n U_T} = I_0(T)\, e^{(V_G - n V_S)/n U_T}, \qquad (4.2)

where W and L are the width and the length of the transistor, respectively, J_0 is a specific current density comparable to the one in the BJT case, and I_0 is the corresponding specific current. The slope factor n falls in the range 1…2, generally close to 1.5. Note that the mechanisms leading to n in the case of a bipolar transistor and a MOS transistor are not the same, although the effects thereof are comparable.

[Kirchhoff's voltage law applied to a special arrangement of transistors]

With these definitions in mind, we assume the circuit configuration of Fig. 4.1. Furthermore, we assume that all transistors have identical geometric dimensions. According to the translinear theory, we can form a closed loop of an even number N of base-emitter junctions (gate-source steps), N/2 in each direction (clock-wise (CW) and counter-clock-wise (CCW)) and arbitrarily ordered. According to Kirchhoff's voltage law, we write

    \sum_{CW} V_{BE_i} = \sum_{CCW} V_{BE_i}. \qquad (4.3)

[A voltage sum corresponds to current multiplications]

Inserting (4.1) into (4.3) leads to the conclusion that the voltage sum over all V_{BE_i} corresponds to the product of the collector currents I_{C_i} normalized to I_{S_i}:

    \prod_{CW} \frac{I_{C_i}}{I_{S_i}} = \prod_{CCW} \frac{I_{C_i}}{I_{S_i}}. \qquad (4.4)

[The translinear principle is also valid in MOS technology]

This result is totally independent of the temperature T and (at least in principle) also independent of the current gain β of the transistors. Several distinct loops may share some of their base-emitter junctions. The results, as presented for the bipolar case, are also fully valid for weakly inverted MOS transistors [110]. The generalized translinear principle for quadratically behaving MOS devices leads to somewhat different, but also very useful, expressions [111–113]. The MOS translinear principle expresses that the sum of the square-rooted currents of the clock-wise oriented transistors equals that of the opposite direction.
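As a quick numerical sanity check of (4.4) — our own illustration, not from the original text — one can pick junction voltages that satisfy the loop equation (4.3) and confirm that the clockwise and counter-clockwise current products agree:

    import numpy as np

    UT, n, I_S = 0.0258, 1.0, 1e-15   # assumed thermal voltage [V], emission coeff., saturation current [A]

    def i_c(v_be):
        # Ideal exponential BJT law (4.1); identical devices assumed.
        return I_S * np.exp(v_be / (n * UT))

    v_cw = np.array([0.62, 0.58])                  # two CW junction voltages
    v_ccw = np.array([0.60, v_cw.sum() - 0.60])    # chosen so that KVL (4.3) holds

    assert np.isclose(v_cw.sum(), v_ccw.sum())
    print(np.prod(i_c(v_cw) / I_S), np.prod(i_c(v_ccw) / I_S))   # equal, cf. (4.4)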


Figure 4.1 A simple translinear loop.

4.1.3 Core Circuit for Matrix Multiplications

[A transistor matrix inspired by the translinear principle]

Using (4.4), it is relatively easy to construct a circuit that creates pair-wise products of two incoming probability distributions. The fundamental circuit that underlies the realization of all the building blocks is shown in Fig. 4.2. Its inputs are the currents I_{x,i}, i = 1, 2, …, m, and the currents I_{y,j}, j = 1, 2, …, n. Its outputs are the currents I_{i,j}. All transistors in Fig. 4.2 are assumed to be ideal voltage-controlled current sources according to (4.1) and (4.2) for BJT and weakly inverted MOS transistors, respectively. Our terminology and notation correspond in the following to the weakly inverted MOS case. As we will see later in this section, the function of the circuit is then given by

    I_{i,j} = I_z \, (I_{x,i}/I_x)(I_{y,j}/I_y) \qquad (4.5)

with I_x ≜ \sum_{i=1}^{m} I_{x,i}, I_y ≜ \sum_{j=1}^{n} I_{y,j}, and I_z ≜ \sum_{i=1}^{m} \sum_{j=1}^{n} I_{i,j} = I_x. The circuit thus computes the scaled pairwise product of the two probability mass functions p_X(i) ≜ I_{x,i}/I_x, i = 1, …, m, and p_Y(j) ≜ I_{y,j}/I_y, j = 1, …, n.
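In software terms, the core matrix is simply a normalized outer product. A sketch (ours; the name core_matrix is hypothetical) mirroring (4.5):

    import numpy as np

    def core_matrix(i_x, i_y):
        """Output currents I_{i,j} of the fundamental circuit, cf. (4.5).
        The sum of all outputs equals I_x by construction (I_z = I_x)."""
        i_x, i_y = np.asarray(i_x, float), np.asarray(i_y, float)
        i_z = i_x.sum()
        return i_z * np.outer(i_x / i_x.sum(), i_y / i_y.sum())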

[Matrix multiplication for two discrete probability distributions]

The application of the circuit of Fig. 4.2 to the computation of (3.31) is now straightforward. Let X = {x_1, …, x_m} and Y = {y_1, …, y_n}. The input terminals of the circuit are fed with the currents I_{x,i} ≜ I_x p_X(x_i) and I_{y,j} ≜ I_y p_Y(y_j), respectively, where the sum currents I_x and I_y can be chosen freely in the range where (4.1) and (4.2) hold for all transistors in the circuit.


Figure 4.2 Fundamental circuit using two input distributions, each represented as a current vector.

Figure 4.3 Fundamental circuit with (n−1)m independent translinear loops. Two of the loops are shown by the dashed polygons.


The output currents then equal I_{i,j} = I_z p_X(x_i) p_Y(y_j), i = 1, …, m, j = 1, …, n.

The computation of (3.31) is completed by summing the currents I_{i,j} for each z ∈ Z for which f(x_i, y_j, z) = 1. If a term p_X(x_i) p_Y(y_j) is used more than once, the corresponding current I_{i,j} must first be copied a corresponding number of times.

Translinear Network Interpretation

The circuit may be analyzed by first drawing all (n−1)m independent translinear loops in the circuit of Fig. 4.3 and writing down the corresponding loop equations. Second, the constraints on the three sum currents of the individual distributions can also be stated immediately. Finally, the system of (n−1)m + 3 equations can be brought to the form of (4.5).

Standard Large-Signal Analysis

[Large-signal proof of the basic function]

The result of (4.5) may also be verified by standard large-signal analysis, as given in the following proof using the first-order approximations of the drain (or collector) current given by (4.2) and (4.1), respectively.

Proof of (4.5): Let V_{x,i} and V_{y,j} denote the potentials at the input terminals for I_{x,i} and I_{y,j}, respectively. On the one hand, we have

    \frac{I_{i,j}}{I_{x,i}} = I_{i,j} \Big/ \sum_{\ell=1}^{n} I_{i,\ell} \qquad (4.6a)
                            = I_0\, e^{(V_{y,j} - n V_{x,i})/n U_T} \Big/ \sum_{\ell=1}^{n} I_0\, e^{(V_{y,\ell} - n V_{x,i})/n U_T} \qquad (4.6b)
                            = e^{V_{y,j}/n U_T} \Big/ \sum_{\ell=1}^{n} e^{V_{y,\ell}/n U_T}. \qquad (4.6c)


Figure 4.4 The principle of a log-domain signal processor.

On the other hand, we have

    \frac{I_{y,j}}{I_y} = I_{y,j} \Big/ \sum_{\ell=1}^{n} I_{y,\ell} \qquad (4.7a)
                        = I_0\, e^{(V_{y,j} - n V_{ref})/n U_T} \Big/ \sum_{\ell=1}^{n} I_0\, e^{(V_{y,\ell} - n V_{ref})/n U_T} \qquad (4.7b)
                        = e^{V_{y,j}/n U_T} \Big/ \sum_{\ell=1}^{n} e^{V_{y,\ell}/n U_T}. \qquad (4.7c)

Combining (4.6c) and (4.7c) finally yields (4.5). ∎

Log-Domain Signal Processing Interpretation

A third interpretation of the functioning of the circuit of Fig. 4.2 can be given by the log-domain signal processing concept. In this technique, the input signals are compressed by the logarithm function, then the actual processing task is fulfilled in the compressed signal domain, and finally the signal is again expanded by the inverse function (exponentiation), as shown in Fig. 4.4. This concept was first introduced by Adams in 1979 [114] to achieve electronic gain and cut-off frequency control in filters. It was afterwards extensively developed by different researchers [115–118] in its main application field of filtering. Log-domain signal processing is also very attractive for low-power and low-voltage applications [119, 120]. The main aspects of log-domain signal processors (or, generally speaking, companding signal processors) are low-voltage operation, enhanced dynamic range, and high-frequency operation. Furthermore, filters exhibit a wide frequency tuning range. The issues of companding signal processors, such as signal-dependent noise, distortions due to device mismatch, intermodulation products caused by interference, and increased bandwidth requirements, are mainly a problem in linear circuits. Non-linear processors such as decoders are much more robust against these problems.

4.1.4 Log-Likelihood Interpretation of Input and Output Distributions

[Log-likelihood ratios appear between the gate voltages of the diode-connected transistors]

The circuit of Fig. 4.2 can be operated differently if we omit the input diode-connected logarithm transistors in the left-most column, as in Fig. 4.5. Instead of using the input currents I_{y,i} ≜ I_y p_Y(y_i), we apply voltages equal to the different gate voltages V_{y,i} ≜ n U_T ln[p_Y(y_i)] + const. If we consider the voltage difference

    V_{y,1} - V_{y,n} = n U_T \left( \ln[p_Y(y_1)] - \ln[p_Y(y_n)] \right) = n U_T \ln \frac{p_Y(y_1)}{p_Y(y_n)} \qquad (4.8)

we immediately recognize the log-likelihood-ratio representation of inputs 1 and n. Through the thermal voltage U_T, this log-likelihood-ratio representation by (n−1) voltage differences is temperature dependent. The representation is equivalent to the probability representation by n currents if we know the absolute temperature T exactly (see Section 4.4.4 for a detailed analysis of the temperature dependence). In principle, we can thus freely choose the input representation of p_Y. By the same means of a battery of diode-connected transistors, the input and output probability distributions p_X and p_Z, respectively, can be transformed into a log-likelihood-ratio representation.
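The correspondence in (4.8) is easy to tabulate. A small sketch (ours; the nominal U_T and n are assumed values) converting a probability mass function into gate-voltage differences and back:

    import numpy as np

    UT, n = 0.0258, 1.5   # assumed thermal voltage [V] and slope factor

    def probs_to_llr_voltages(p):
        # (n-1) voltage differences w.r.t. the last symbol, cf. (4.8).
        p = np.asarray(p, float)
        return n * UT * (np.log(p[:-1]) - np.log(p[-1]))

    def llr_voltages_to_probs(dv):
        # Inverse mapping; only exact if the true temperature (U_T) is known.
        w = np.exp(np.append(dv, 0.0) / (n * UT))
        return w / w.sum()

    p = [0.7, 0.2, 0.1]
    dv = probs_to_llr_voltages(p)
    print(dv, llr_voltages_to_probs(dv))   # voltages, then the recovered p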

4.2 Soft-Logic Gates and Trellis Modules

[Main pitfalls appear already in very simple modules]

We will now recall the examples of Section 3.4.1 and construct the corresponding circuit diagrams. In many ways, they exhibit the main problems that may arise during the construction of such trellis modules. We will discuss these issues during the presentation of the individual examples.

[Butterfly trellis is visible in the circuit diagram]

Soft-XOR gate. Let us start with the most simple module, the soft-XOR gate. It can be drawn directly using its butterfly trellis section, which has been derived from its binary indicator function f. This circuit module is shown in Fig. 4.6.


Figure 4.5 The generic multiplication matrix with voltage inputs for the p_Y distribution representing log-likelihood ratios.

Like all modules with binary input distributions, it consists of 6 core transistors forming the multiplication matrix and its characteristic connection pattern of the trellis. In fact, the trellis pattern will be directly visible on silicon if the devices are properly arranged. This fact may be helpful in order to create automatic tools for generating such building blocks in a chip-design environment. All product terms are used to build the output probability distribution p_Z. The output terms are mirrored by the current mirrors on top of the kernel circuit. The input currents I_{x,i} are also passed through an input current mirror. By doing this, the module becomes freely cascadable by simply connecting the output current vector I_z to one of the input current vectors I_x or I_y of the next circuit section. This method marks the most simple way of interconnecting several building blocks. Note also that all the current mirrors may equally well operate in the strong-inversion region of MOS transistors, thereby having a standard quadratic behaviour.

[Soft-XOR gate is a version of the Gilbert multiplier]

By defining a proper data representation (different from our probability mass functions), the soft-XOR circuit can be identified as a version of the so-called Gilbert multiplier [121] for two real-valued differential inputs.


Figure 4.6 Transistor-level implementation of the soft-XOR building block.

But in Gilbert multipliers, the inherent scaling of the output signal by the sum of the current inputs I_{x,i} is generally not wanted. Furthermore, the output values are limited to this current sum, which limits the application of the Gilbert multiplier for general-purpose real-value multiplications even more.

[Other multiplier implementations exist]

Beside the Gilbert multiplier [121], many other forms of solid-state multipliers have been developed since. Some of them rely on cross-coupled transistor pairs, both in MOS and bipolar technology [122–124], but different approaches also exist, such as the quadratic-translinear principle [125], the quarter-square principle [126, 127], and floating-gate MOSFET techniques [128, 129]. But none of them reaches the outstanding simplicity of building a multiplication matrix with only one transistor per multiplication. Thus, it is hard to imagine such bulky multiplier circuits in very large analog probability propagation networks, where thousands if not millions of such multipliers are needed.

[Dummy paths need a proper termination]

Equal gate. The second example is identical to the previous example, except for two special signal paths that attract our attention.


Figure 4.7 Transistor-level implementation of the equal gate building block.

The two outputs of the transistors in the middle of the circuit diagram of Fig. 4.7 are not used for the computation of the output distribution of the equal gate. Thus they represent dummy paths. They are terminated in order to ensure the correct operation of the two transistors of the kernel circuit. In practical circuits, one tries to keep the drain voltage levels of the output transistors of the computation kernel at the same level. This reduces the effect of the finite output resistance of these transistors, which is largely drain-voltage dependent. One possible solution could be to pass the dummy currents through a diode-connected p-type transistor, as in the input section of a current mirror. Through its heavy compression, the voltage swing is kept minimal at this point. A more effective, but also costly, solution in terms of circuit overhead would be the addition of cascode stages for each output transistor of the kernel.

[Asymmetric trellis diagrams have to be expanded for a clean circuit implementation]

Let us come back to the dummy product terms. The corresponding transistors cannot simply be omitted, because for a correct operation, the multiplication matrix always has to enclose all n·m transistors. By omitting certain transistors in the multiplication matrix, the input distributions would be changed by building them only on a subset of the terms.


Figure 4.8 Trellis expansion operation for a correct circuit implementation. Fig. a) shows the original trellis of the equal gate and b) the expanded trellis of the same gate, ready for transistor implementation.

The observation that all product terms have to be present in the multiplication matrix can also be illustrated at the trellis-diagram level of the binary indicator function f. At every left state of the incoming distribution p_X, branches for every incoming symbol of y have to be drawn. Valid configurations, i.e., branches actually drawn in the final trellis diagram, lead to the right states, whereas non-valid configurations lead to some dummy right state. Exactly these branches leading to the dummy state represent the dummy paths that are afterwards present in the circuits. As a general statement, we can say that every state of the trellis diagram has to carry the same number and the same type of outgoing branches to enable a correct circuit implementation. This expansion, sometimes necessary for a circuit implementation, is shown schematically in Fig. 4.8 for the equal gate.
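The expansion of Fig. 4.8 can be phrased mechanically: every (x, y) pair without a valid successor is routed to an extra dummy state. A toy sketch (ours; the helper name is hypothetical):

    def expand_trellis(indicator, n_x, n_y, n_z):
        """Return the branch list (x, y, z) with a dummy right state added,
        so that every left state x has a branch for every label y."""
        dummy = n_z   # index of the added dummy state
        branches = []
        for x in range(n_x):
            for y in range(n_y):
                targets = [z for z in range(n_z) if indicator(x, y, z)]
                branches += [(x, y, z) for z in targets] or [(x, y, dummy)]
        return branches

    # Equal gate: (0,1) and (1,0) are routed to the dummy state, cf. Fig. 4.8b).
    print(expand_trellis(lambda x, y, z: int(x == y == z), 2, 2, 2))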

[Backward reasoning through logic gates]

AND gate backwards. In our third example, the backward reasoning of a soft-AND gate, we again encounter a special feature. As already shown in the trellis diagram of Fig. 3.14c), the product term p_X(0) p_Y(0) is used twice. In contrast to duplicating a voltage, current duplicates are not for free. Thus, instead of first adding product terms according to the binary indicator function f, we first have to copy the currents of the individual product terms by means of a current mirror and build the sum only afterwards. The product term p_X(1) p_Y(0) is not used at all in this case. This leads to the circuit implementation of Fig. 4.9.


Figure 4.9 Transistor-level implementation of the backward reasoning of an AND gate building block, i.e., from one input and the output back to the other input of the AND gate.


[Any trellis diagram can be implemented]

General Trellis Diagrams. The implementation of a general trellis diagram is straightforward if one thinks of the previous three examples. All the ingredients necessary for the circuit implementation are presented therein. For a trellis module with n and m elements in the two incoming probability distributions, an n×m transistor matrix is drawn. Additionally, for the distribution p_Y, n diode-connected transistors are drawn and connected to the matrix according to Fig. 4.2 for current inputs. If the log-likelihood representation is chosen, these transistors are omitted. The inputs for p_X are completed by current mirrors, and the outcoming product terms of the transistor matrix are then connected according to the trellis diagram. Unused product terms are connected to a dummy node. Finally, for each term of the output distribution p_Z, a current mirror can be added to allow cascading the module directly with other modules. A small sketch of how this recipe could be automated is given below.
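The following toy generator (our own illustration, not the design flow of the dissertation) emits a list of core transistors and their trellis wiring from an indicator function; terms used by several outputs would additionally need current-mirror copies, as discussed above:

    def module_netlist(indicator, n_x, n_y, n_z):
        """List the n_x * n_y core transistors and the wiring of their
        product-term outputs; unused terms go to a dummy node."""
        netlist = {"core": [], "wiring": {z: [] for z in range(n_z)}, "dummy": []}
        for i in range(n_x):
            for j in range(n_y):
                name = f"M_{i}_{j}"
                netlist["core"].append(name)
                targets = [z for z in range(n_z) if indicator(i, j, z)]
                for z in targets:
                    netlist["wiring"][z].append(name)   # >1 target: mirror copies needed
                if not targets:
                    netlist["dummy"].append(name)
        return netlist

    print(module_netlist(lambda x, y, z: int(x == y == z), 2, 2, 2))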

4.3 Connecting Building Blocks

[Large networks need the connection of many building blocks]

In order to build large probability propagation networks, many building blocks have to be cascaded. Since the input and output signals can be represented by current vectors or voltage vectors, many different possibilities for cascading single stages may be considered. The easiest solution has already been shown in the examples of Section 4.2. Solutions different from the one previously shown are presented in the following. Another issue may be the current loss caused by unused product terms of the multiplication matrix. If many stages are cascaded, the currents may tend to zero and vanish in the noise floor of the electronic circuit. Therefore, the current levels have to be brought back to a reasonable level from time to time. Solutions to that problem are presented in Section 4.3.3.

4.3.1 Current- or Voltage-Mode Connections?

[Ongoing debate: voltage mode vs. current mode]

The debate between voltage-mode adherents and current-mode supporters has been going on for a few years already. Both camps think that their design style is the most appropriate for high-performance integrated-circuit design. But in our opinion, it does not matter


Figure 4.10 Interconnection of several modules by simple current mirrors.

whether one wishes to implement a voltage-mode circuit or a current-mode circuit. It is much more important whether the circuit fits in the overall system or not. It is even shown in [42] that no fundamental difference exists between the two domains. Thus we will look at the question of choosing voltage-mode or current-mode interconnects only with respect to the best fit in the complete system.

[Connecting building blocks is most simple with current mirrors]

As we have already seen in Section 4.2, the most simple solution for cascading our basic building blocks is to pull out the currents at the output by simple current mirrors and feed them directly to the input of the following building block. This introduces the least circuit overhead in terms of transistor count. Extra transistors can be added to the current mirrors to duplicate the currents several times if needed. By doing this, no superfluous domain changes have to be made. One module just smoothly connects to another, as is shown in Fig. 4.10.

[The building blocks may be analyzed with non-traditional approaches]

In the eyes of traditional IC designers, this whole circuit is a real nightmare. Generally, they argue that there exist paths that may have probability values of zero, thus conduct almost no current, and therefore are infinitely slow. Their solution would be to add bias current sources to turn the whole circuit into a full class-A circuit. If we keep in mind the application field of our circuits, this would add a non-tolerable overhead to large networks of our building blocks. Fortunately, such a traditional design approach is not needed in our case. We can even argue intuitively that the large currents, i.e., the large probabilities, mostly determine the transient behaviour of our circuits, as is the case with digital simulations. Small currents (or probabilities) are not negligible, but far less important in the case of decoding.


Figure 4.11 Interconnection of several modules by level-shifter circuits as described by Moerz et al. [70].

[Voltage-mode connections are also possible]

One can think of voltage-mode circuits as interconnects between voltage-output modules and voltage-input modules. In this case we save one connection because of the definition of the log-likelihood ratios of (4.8). The actual voltage-mode connection is established by level-shifters. The most simple implementation of such a level-shifter would be a source-follower circuit (see e.g. [130]). Unfortunately, the outputs of the source followers cannot be connected directly to the inputs of the following stage. Additional domain transformations have to be added by means of long-tailed pairs operating in the linear region and a linear back-transformation to voltage mode. In contrast to the actual computation circuit, the interconnection circuits are then fully class-A. Moerz et al. [70] have chosen this approach in their chip implementation, as shown in Fig. 4.11. The diode-connected transistors in the interconnect circuit are designed to work in the linear region and act as resistors. Note also that the circuit has to be designed with transistors larger than minimum size to obtain the desired performance. The interconnect circuit works well for small alphabet sizes of the input distributions but is a potential source of severe signal errors due to the many domain changes. A careful design will eliminate these problems, but at the cost of a larger circuit area. Additionally, the voltage-mode connections suffer from


Figure 4.12 Stacking several core circuits to get more than two input distributions for one building block.

inherent temperature-tracking problems, as we will discuss in Section 4.4.4.

4.3.2 Stacking and Folding Building Blocks

[Stacking circuits for multiple-input building blocks]

Beside simply cascading building blocks by current mirrors, there exist two other schemes which may save transistors under certain circumstances. As a first possibility, circuit modules may be stacked. This avoids the use of additional current mirrors. The number of stacked modules is thereby strictly limited by the available headroom left by the supply voltage. Generally, the stacking technique is not possible for low-voltage applications, i.e., below 5 V, using state-of-the-art silicon technologies. Apart from the area savings, the main advantage of stacking is a reduced power consumption. The current-mirror sequence for a whole discrete probability distribution represents a dummy path in the sense that the vertical current through that path does not contribute to the actual computation. By omitting as many of these paths as possible, the total power consumption may be drastically reduced.

[Folding reduces the current-mirror count]

A second solution to the unwanted vertical current dissipation problem is creating folded building blocks, as shown in Fig. 4.13.


Figure 4.13 Folding n-type and p-type building blocks saves half of the current mirrors.

Hereby, the outputs of a first, normal n-type module, i.e., a module with n-type transistors, are passed by current mirrors to a second, p-type module. A p-type module is built by simply exchanging all n-type transistors with p-type transistors. From its appearance, it may also be called the 'head-over' version of the building block. By cascading folded versions of the building block, half of the current mirrors are saved. Additionally, no superfluous vertical currents are flowing. Unfortunately, this interconnection technique is not suitable for BiCMOS technology, since generally no good-quality complementary vertical BJTs are available. A second drawback of this folding technique is its heavy use of p-type MOS transistors, which are larger than their n-type counterparts by a factor of about 3 (due to the lower mobility of holes, etc.). This makes p-type circuits considerably larger than n-type circuits.

4.3.3 Scaling Probabilities

[Underflows in data representation may occur in the sum-product algorithm]

The second issue in interconnecting many building blocks, besides choosing the right topology, is induced by the current loss of unused product terms. The problem is inherent to the sum-product algorithm, since the multiplication of a huge number of positive real values below unity tends towards zero. Thus one has to pay attention to underflows in the data representation even in digital floating-point implementations. This issue is generally resolved by scaling up the terms of the discrete probability distributions from time to time to values acceptable in the data representation.


Figure 4.14 A vector normalizer circuit according to Gilbert [131].

[Signals may fade into the noise floor]

In analog implementations of large probability propagation networks, the fading problem of the probability values is even more pronounced. The noise floor of the electronic circuits puts a lower limit on the minimal currents representing the signals in analog electronic systems. As a rule of thumb, current-signal levels should not fall below a few 10^−13 A. This limit is in about the same region where the exponential laws of (4.1) and (4.2) are no longer valid. So the probability mass functions represented by current vectors have to be scaled up electronically. In [131], Gilbert has presented an array-normalizer circuit that implements exactly the needed scaling function. The circuit of Fig. 4.14 can easily be explained by using the translinear principle [107].

[Very efficient scaling circuit]

The scaling circuit of Fig. 4.14 is actually a degenerate version of the fundamental circuit of Fig. 4.2 with m = 1 and I_{x,1} fixed to some constant current. Its function is given by

    I_{out,j} = I_{ref}\, I_{in,j} \Big/ \sum_{\ell=1}^{n} I_{in,\ell}. \qquad (4.9)
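In software, (4.9) is a one-liner. A sketch (ours) that restores a fading current vector to a reference level while preserving the ratios:

    import numpy as np

    def normalize_currents(i_in, i_ref=1e-6):
        # Gilbert's array normalizer (4.9): outputs sum to I_ref.
        i_in = np.asarray(i_in, float)
        return i_ref * i_in / i_in.sum()

    print(normalize_currents([2e-13, 6e-13]))   # ratios preserved, level restored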

[Adding scaling circuits at the output eliminates current fading]

The scaling operation can be integrated into the building blocks. Whether it is placed at the output of the current module (Fig. 4.15a) or at the input of the next stage (Fig. 4.15b) does not matter at first glance. Scaling at the output has the advantage that small currents are brought back to a nominal level immediately after the block that caused the current loss, and therefore speeds up the overall network.


Figure 4.15 Scaling circuits with a) scaling at the output and b) scaling at the input.

Unfortunately, BiCMOS technologies generally provide fast vertical NPN transistors and only relatively slow lateral PNP transistors, which would have to be used for the scaling circuits located at the output of a module. Thus, for practical reasons, scaling in high-speed circuits using BiCMOS technology will mainly be accomplished in the input stage of the following module, as in Fig. 4.15b). An example of a module with scaling at the output is shown in Fig. 4.16. For this, we have taken up the equal gate of Fig. 4.7 again and added a scaling circuit using weakly inverted pMOS transistors.

By adding current mirrors at the outputs, the scaling circuit can even be expanded into a building block of its own. This solution is better suited for low-voltage implementations, since only three transistors are stacked in any given module. But this comes at the cost of increased power consumption, since additional modules are introduced. In many applications, scaling after every module is not necessary. It suffices to scale only after every third or fourth module, or even later. But this is highly application specific and has to be investigated for every considered case or code.


Figure 4.16 The equal gate circuit of Fig. 4.7 with scaled-up outputs.

4.4 Implementation Issues

[Several problems arise in an actual implementation of a probability-propagation network]

Both simulations and physical implementations of our decoding networks have shown a high immunity against non-ideal circuit behaviour. The decoder of Fig. 5.3, which we will discuss in Chapter 5, has been implemented using discrete BJT transistors out of the box, i.e., without any preliminary matching selection. Nevertheless, the overall precision of this decoding network is within 5% of the theoretical values. This result should give a first impression of the robustness of the new technique. In the following subsections, we will give deeper insight into several non-ideal effects that may occur in a physical implementation of the proposed analog probability propagation networks, such as device matching, temperature matching, and the finite input resistance and output conductance of the transistors. Appropriate countermeasures against these effects are given where available.


4.4.1 Device Matching Considerations

[In bio-inspired circuits, precision is gained on the system level]

Mismatch of transistors usually affects the functionality of analog circuits more than digital ones. Correct operation can often only be guaranteed by choosing large device sizes, which slows down the operation speed of the circuit. Fortunately, systems following the bio-inspired design style [11] seem to possess one big advantage over conventional analog systems: precision is gained on the system level by a parallelization of many computational units which are not inherently precise by themselves. In such systems, small device sizes do not degrade the overall precision significantly, which makes high-speed operation possible.

[Quantification of the current errors of a single bipolar transistor]

In a first attempt, we try to quantify the correspondence between errors in the probability representation (current domain) and errors in the log-likelihood representation (voltage domain). To do this, we recall the collector current equation (4.1) of a single bipolar junction transistor. A relative collector-current error ε is introduced. Through some mathematical operations, this error on the left-hand side of (4.10) is then propagated until its influence on the input base-emitter voltage is visible:

    I_C (1+\varepsilon) = \left( I_0\, e^{V_{BE}/U_T} \right) \cdot (1+\varepsilon) \qquad (4.10)
                        = I_0\, e^{V_{BE}/U_T} \cdot e^{\ln(1+\varepsilon)} = I_0\, e^{\left( V_{BE} + U_T \ln(1+\varepsilon) \right)/U_T} \qquad (4.11)
                        \approx I_0\, e^{(V_{BE} + U_T \varepsilon)/U_T}, \qquad (4.12)

where (4.12) has been derived from (4.11) by using the Taylor approximation ln(1+ε) ≈ ε for small ε. This approximation actually over-estimates the influence of relative current errors on the base-emitter voltage for larger ε. We observe in (4.12) that relative errors ε in the collector currents can be equivalently expressed as absolute errors on V_BE. If we assume a voltage swing of ΔV_BE = 300 mV, corresponding to a usable current range of about 5 decades (ΔV_BE = 60 mV/dec in the case of BJTs), we can put the absolute error on V_BE in relation to the total swing. For example, a relative collector-current error ε = 10% results in a voltage error δV_BE of about 2.59 mV; the ratio is then 300/2.59 ≈ 116. This corresponds to an equivalent resolution of V_BE of about 7 bits. Table 4.1 summarizes the relationship between the dynamic range of the circuit and the achievable resolution at the voltage level (log-likelihood ratios) for a given allowed collector-current error ε. Exactly the same results are obtained if a MOS transistor in weak inversion is considered instead of a BJT. The resolution may be reduced at higher circuit temperatures, since U_T is directly proportional to the absolute temperature. From information-theoretic considerations, we know that an equivalent internal resolution of the log-likelihood ratios of about 4 to 6 bits is often sufficient for Turbo decoding applications [132]. For the external channel information, even a resolution of 3 to 4 bits is generally sufficient for an almost negligible degradation of the BER characteristic [133, 134]. Thus, matching problems resulting in current errors should not fatally corrupt the overall circuit behaviour, even if small devices are used in the circuits.

    ε      δV_BE       resolution @ DR = 4 dec    resolution @ DR = 5 dec
                       (ΔV_BE = 240 mV)           (ΔV_BE = 300 mV)
    1%     0.258 mV    9.9 bits                   10.2 bits
    5%     1.26 mV     7.6 bits                   7.9 bits
    10%    2.47 mV     6.6 bits                   6.9 bits
    25%    5.78 mV     5.4 bits                   5.7 bits
    50%    10.5 mV     4.5 bits                   4.8 bits

Table 4.1 Relation between dynamic range and resolution of a single bipolar transistor or a single MOS transistor in weak inversion.
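The entries of Table 4.1 can be reproduced in a few lines (our sketch; note that the table uses the exact logarithm δV_BE = U_T ln(1+ε), whereas the prose above uses the conservative bound U_T ε):

    import numpy as np

    UT = 0.0259   # thermal voltage [V] near room temperature

    for eps in (0.01, 0.05, 0.10, 0.25, 0.50):
        dv = UT * np.log(1 + eps)            # voltage error, cf. (4.11)
        for decades in (4, 5):
            swing = decades * 0.060          # 60 mV per decade of current
            bits = np.log2(swing / dv)       # equivalent resolution
            print(f"eps={eps:4.0%}  dV={dv*1e3:5.2f} mV  DR={decades} dec  {bits:4.1f} bits")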

[Analytical mismatch analysis of a complete decoder is intractable]

Since the analytical mismatch analysis for a complete decoding system is intractable at the present time, extensive Monte-Carlo analyses of several output parameters have been carried out for various decoder implementations. An example of such a simulation record is given in Fig. 4.17 and Fig. 4.18 for the tail-biting trellis decoder described in Section 5.2. Bit 1 and bit 3 have been toggled during their transmission and thus need to be corrected; therefore they have been chosen in this particular codeword configuration. The decoding delays, i.e., the times needed by the decoder to change the output bits to the correctly sliced output state, are denoted DD1 for bit 1 and DD3 for bit 3. The statistical simulations show that the behaviour of the networks is indeed inherently robust. All simulations have shown that the decoder converges to the correct output state.


Figure 4.17 Typical statistical distribution of the decoding delay of bit 1 (DD1) and bit 3 (DD3) affected by device mismatch. [Histograms: DD1 with μ = 23.92 ns, σ = 3.86 ns; DD3 with μ = 27.54 ns, σ = 3.45 ns; N = 765 samples each.]

The robustness of the networks can even be increased by a proper description on the system level, e.g., by the choice of an adequate equation set which describes the code (see Section 5.5.2).

[Monte-Carlo simulations show only a slight degradation in decoding speed]

The only effect of transistor mismatch observed in the simulation results is a variation of the decoding time. As shown in Fig. 4.17, the additive white Gaussian noise which models the variations of the device geometries and parameters in the Monte-Carlo simulation is clearly visible in the distribution of the decoding delays. This effect can be modelled, to a first approximation, by additional Gaussian noise added during the transmission of the symbols over the channel. For a given SNR in the received symbols, i.e., the a priori input probabilities, the decoder takes some amount of time to converge to the correct output state.


Figure 4.18 The correlation between the decoding delay of bit 1 (DD1) and bit 3 (DD3) affected by device mismatch.

The smaller the SNR is, the longer it takes the decoder to correct the erroneous bits. Taking the mismatch effects into account, the SNR is slightly modified according to this simple model. Since the changes of the device parameters are stochastic, the decoding time gets shorter or longer, depending on the applied codeword (Fig. 4.17). As is shown in Fig. 4.18, the correlation between the decoding-delay variations is not very strong (correlation r below 0.5). Thus it may be assumed that the transistor variations are independent.


Figure 4.19 Base-current compensation for the fundamental multiplier matrix circuit: a) initial input connection, b) with an ideal unity-gain voltage buffer, c) simple transistor implementation with one MOS transistor (source-follower circuit).

4.4.2 Finite Current Gain

[The sum of all base currents may exceed the input current in BiCMOS]

A second implementation issue, which can seriously affect the performance of the proposed circuits, is the finite current gain of bipolar transistors. Compared to the almost infinite input resistance of MOS transistors, BJTs in a BiCMOS technology exhibit a current gain β of a few tens or hundreds and thus a much smaller input resistance than their MOS counterparts. In a BiCMOS implementation, each y-input to the core circuit has to drive m+1 bases of the n×m transistor array. If we consider trellis sections with two or four states (less than ten states in general), this would not affect the overall performance too much. But with β = 100 and a trellis section of a thousand states, which is quite usual in channel equalization applications, we would need 10 times the input current only to drive the bases of the transistors. This is clearly not possible without additional circuitry. Note that even in the 16-state case we encounter a systematic loss of about 16% in precision compared to the ideal case. Therefore we need a mechanism to compensate this loss. The basic diode-connected BJT Q1 of Fig. 4.19a) has to be replaced with a buffered version, as shown in Fig. 4.19b). The simplest circuit to implement the unity-gain voltage buffer is the source-follower circuit in Fig. 4.19c). The design of the voltage buffer is a trade-off between speed and the voltage overhead introduced by the additional gate-source voltage of transistor M1.


4.4.3 Finite Output Resistance

[Finite output resistances may even compensate other problems]

The impact of the finite output resistance of both MOS transistors and BJTs is far lower than expected. As has been observed in transient simulations of a complete decoder, the finite output resistance even helps to partially compensate the finite current gain of the BJTs. Once again, the highly connected structure of the analog decoding networks helps to recover from implementation non-idealities. In CMOS versions of the circuits, however, the finite output resistance leads to an over-estimation of the desired output probabilities, i.e., it compresses the probability ratios and thus reduces the log-likelihood ratios. In the case of very clear differences, this is not a problem anyway, since decisions are made easily under these circumstances. On the other hand, if the probabilities are almost equal, then the output probability ratios are far less disturbed by finite output resistances, since the drain-source voltages are at almost the same levels. In summary, the finite output resistance of both BJT and MOS transistors is not a big issue. On the contrary, it may even help to compensate other effects in the circuits of large networks.

[Use cascode circuits for higher-precision outputs]

If high-precision output values are required, standard techniques for improving the output resistance, such as cascode structures, can be applied [135–137]. But this extra transistor circuitry has to be replicated many times in order to improve the whole network, and the increase in chip area may be unacceptable. It may be more economical in terms of chip area to spend only a few extra transistors on compensating the finite current gain, as seen in the previous subsection.

4.4.4 Thermal Effects

An important performance issue of analog circuits is the thermal behaviour of the different devices. The controlling base-emitter voltage of BJTs, for example, is known to show a −2 mV/K temperature gradient (see e.g. [138]). Similar problems arise in weakly inverted MOS transistors, where a temperature dependence of the drain current with respect to the source voltage can be observed [109]. This may cause severe problems if thermal matching cannot be guaranteed. We will discuss two different cases, namely the effects of temperature on probability-based and on log-likelihood-ratio-based analog networks.


Temperature Dependence in Probability-Based Networks

[Three common thermal situations]

In the following, three thermal situations will be discussed: a) a uniform temperature distribution over the whole chip, b) a temperature gradient over the chip but a uniform temperature distribution within one building block, and c) a temperature gradient within one of the building blocks. These situations may affect the temperature behaviour of probability-based networks, i.e., networks in which the basic information exchanged between the building blocks consists of current vectors representing discrete probability distributions.

a) A uniform temperature distribution affects all devices in the same manner. Since the information travelling through the decoding network has the form of current ratios, the temperature, which mainly affects the thermal voltage U_T = kT/q, has no effect. As an example, consider the current ratio of two perfectly matched bipolar transistors in a current mirror. No influence of temperature on the current ratio can be observed in this case. Reasoning with translinear circuit principles [105], we see immediately that this simple example completely describes the temperature-tracking behaviour within one circuit module of our networks.

b) If the temperature within one module is approximately constant, but the entire chip is affected by a slight temperature gradient, the situation is identical to the one in a), by the same argumentation.

c) A temperature gradient within a circuit module is the most difficult case to handle. Depending on the nature of the temperature gradient, and depending on the layout chosen for the implementation, different values for the temperature-tracking error are obtained. To estimate the maximum temperature difference within one module, we can make the following calculation. According to the thermal conductivity law, the temperature difference inside a slice of semiconductor with area A and thickness s is given by

    \Delta T = \frac{P s}{\lambda A}, \qquad (4.13)

with the dissipated power P and the thermal conductivity coefficient λ. Applying (4.13) to the implemented decoder modules with A = 100 · 200 µm², s = 100 µm, P = 3 mW, and


λ = 1.5 W/(cm·K) for a silicon substrate, the maximum temperature difference will be ΔT = 0.1 K. For such a small value, the error introduced by thermal mismatch is negligible compared to the device-matching error:

    \varepsilon_{rel}(\Delta T) = -1 + \exp\!\left( \frac{V_{BE}}{k T_{nom}/q} \cdot \frac{-\Delta T}{T_{nom} + \Delta T} \right) \qquad (4.14)

At room temperature (300 K) and V_BE = 0.7 V, the error is approximately 1%. Although this type of error is not strictly random, it can be well approximated by additional noise, similarly to the device-matching case.

It is most likely that we will encounter a mix of cases b) and c) in our circuit chips, because the body of the package induces a uniform temperature over the entire integrated circuit. Under these assumptions, thermal effects will not have a noticeable impact on the function of the probability-propagation networks.
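A back-of-the-envelope check of (4.13) and (4.14) — our own sketch, using the module data quoted above and silicon's bulk thermal conductivity of 1.5 W/(cm·K), which reproduces the quoted ΔT = 0.1 K:

    import numpy as np

    k, q = 1.381e-23, 1.602e-19                 # Boltzmann constant, electron charge
    A, s, P = 100e-6 * 200e-6, 100e-6, 3e-3     # module area [m^2], thickness [m], power [W]
    lam = 150.0                                 # 1.5 W/(cm K) = 150 W/(m K)

    dT = P * s / (lam * A)                      # (4.13): ~0.1 K
    T_nom, V_BE = 300.0, 0.7
    eps = -1 + np.exp(V_BE / (k * T_nom / q) * (-dT / (T_nom + dT)))   # (4.14)
    print(f"dT = {dT:.2f} K, eps_rel = {eps:.2%}")                     # ~0.1 K, ~ -0.9%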

Temperature Dependence in Log-Likelihood-Ratio-Based Networks

[The log-likelihood-ratio representation appears as temperature-dependent voltages...]

The second case, where the information exchanged between the different circuit modules is represented by voltages corresponding to log-likelihoods (or voltage differences for the log-likelihood ratios, as introduced in Section 4.1.4), is more problematic. As we have noticed in analyzing (4.2), the drain current of a weakly inverted MOS transistor is temperature dependent through the thermal voltage U_T and the slope factor n. For the following temperature analysis, we can rewrite (4.2) as

    I_D = I_0(T)\, e^{(V_G - n V_S)/n U_T} = I_0(T)\, e^{(\alpha V_G - \beta V_S)/U_T}, \qquad (4.15)

where α(T) and β(T) are parameters introduced for the sake of simplicity in the following mathematical analysis. If we use this notation for the drain currents, we can describe the voltages V_{y,j}, no matter whether implicitly present at the inputs of the circuit of Fig. 4.2 or applied to the inputs of the circuit of


Fig. 4.5, by

    V_{y,j} = \frac{1}{\alpha} \left( U_T \ln \frac{I_{y,j}}{I_0} + \beta V_{ref} \right) \qquad (4.16)
            = \frac{1}{\alpha} \left( U_T \ln p_Y(y_j) + U_T \ln \frac{I_y}{I_0} + \beta V_{ref} \right) \qquad (4.17)
            = \frac{1}{\alpha}\, U_T \ln p_Y(y_j) + V_{os}(T), \qquad (4.18)

where V_ref is the source reference potential of the diode-connected transistors at the input of the transistor matrix, and V_os(T) is a temperature-dependent offset voltage which can be freely chosen (corresponding to the free choice of the sum current I_y in (4.18)).

[...hence do not consider voltage-mode connections for large probability-propagation networks]

Now assume a voltage-mode connection situation, i.e., the probabilities are transformed to log-likelihoods by means of a logarithm circuit at the output of the circuit modules, and further assume two different temperatures T1 and T2 at the output and the input of the two modules to connect. Then the temperature-dependent offset term cancels smoothly if we use a differential signal representation, i.e., log-likelihood ratios. But unfortunately, we observe that the temperature-dependent term U_T/α introduces a non-recoverable error in the core transistor matrix, although the differential data representation would allow error-free signal transmission in theory. The voltage error, proportional to the absolute temperature T, is even amplified at the drain-current level by the inherent exponential V-I characteristic of the transistor. Hence, the main conclusion from the above reasoning is that it is not advisable to use voltage-mode connections for global connections in a large analog network, where the temperature cannot be guaranteed to be the same over the whole circuit chip. However, for very small networks, such as, for example, the one presented by Moerz et al. [70], the temperature effects seem to be negligible. The same temperature problem also affects the BJT case, where we observe the famous −2 mV/K temperature gradient on the base-emitter voltage of the transistor.


4.4.5 Other Implementation Issues

Topology-Induced Problems

[Biasing large analog networks requires careful design and layout]

Besides issues directly related to individual circuit devices, we also encounter topology-induced problems. By this we mean that, for example, biasing a large probability propagation network may cause severe problems, since no local matching can be guaranteed for a distributed biasing network (e.g. distributed current mirrors for the current sources needed in each cell). Since the geometrical dimensions may get very large, additional effects such as the non-zero resistance of long metal wires show up. This may affect signal tracks as well as power-supply lines. It must be kept in mind during the design phase that a distributed bias network implemented with BJTs draws a considerable amount of base current, which causes large voltage drops on long metal tracks. In the extreme, these voltage drops may prevent the whole network from working correctly. Comparable problems arise in digital circuits with clock distribution. There, the solution is to balance the load in the different branches of a clock-distribution tree instead of having one large single track. Adapted to our networks, this would mean using local repeaters for the biasing circuits. Errors introduced by these circuits are not critical, since all calculations rely on relative signal strengths.

Construction of Large Analog Networks

A second, more general implementation issue is how to construct large analog computational networks. Up to a few hundred transistors, an analog system may be drawn quite easily if a hierarchical design approach is chosen. But imagine a large factor graph of several hundreds or thousands of individual nodes. How can one make sure that, after a long day of drawing interconnection lines between the individual building blocks, no drawing mistakes have been made? It is certainly a good idea not to rely on one's own drawing capability if a schematic can be generated by a computer program. In the context of coding, the structure of a block code is, for example, described by its parity-check matrix H. This matrix may serve as the basis for many different design steps: the code's performance may be evaluated by using the matrix in a simulation program, but it may also serve as the basis for our schematic generation program. This approach has been demonstrated in the decoder example of Section 5.3. In general, it will not be possible to design a large analog network first-time-right without computer-aided design. By this we subsume not only computer-aided drawing (CAD), but also computer-aided engineering (CAE), which includes much more than the mere sketching of schematics.

Testability

Another big issue with such large networks is testability. How can we guarantee that a circuit leaving the wafer fab works as expected? Rudimentary tests such as checking the supply current or verifying individual test blocks are generally not sufficient to guarantee the overall functionality. Testing large digital circuits is much easier than doing the same for analog networks. Boundary scan and JTAG test-access ports are commonly used today for looking inside a working digital circuit; they mostly rely on digital registers that can be addressed and read out serially on certain chip pins. Testing large analog systems is much more difficult. The measuring circuitry should not modify the overall behaviour (e.g. by creating additional loads on the nodes of interest). Additionally, the resolution of the measured values should be better than the resolution of the actual circuit under test. This means that the measurement circuits have to be more precise, and are thus in general also more complex and space-consuming, than the circuit to verify. Even if a measurement circuit can be shared among many circuit nodes, it may add a considerable overhead to the overall network. It would therefore be desirable that the circuit's functionality could be guaranteed by design. One approach may lie in an information-theoretic analysis that tries to quantify the impact of individual error sources on an overall probability propagation network. We have not yet had the time to investigate such an approach; it will be a subject of future research.


Chapter 5

Decoder Examples

In this chapter, we describe several decoder designs at various levels of completion. First we discuss an implementation of a very simple trellis code using discrete bipolar transistors; the result of this effort is a demonstration unit giving static output results. The second example consists of the complete VLSI implementation of a short tail-biting trellis code, for which we are able to present dynamic measurement results. The third example has not actually been tested, since bad bonding contacts made by a subcontractor prevented us from measuring the chip. The two examples at the end represent design studies that we will use as the basis for further projects at our lab. Note that large schematics are placed in the appendix at the end of this chapter to simplify the reading of the text.

5.1 Decoder for a Simple Trellis Code

5.1.1 Code Description

As a first complete decoder example, we examine a decoder for a binary [5,2,3] block code, i.e., 2 data bits ui are encoded into codewords x of length 5, with the Hamming distance between different codewords being at least 3. The code consists only of the four codewords [0,0,0,0,0], [0,0,1,1,1], [1,1,0,1,1], and [1,1,1,0,0]. The first and third bits, x1 and x3 respectively, are considered as information bits. The considered [5,2,3] block code is systematic, since the uncoded information bits are also present in the codeword. The code can be represented by a 5-section trellis diagram as given in Fig. 5.1. Hence, a valid codeword is indicated by one of the four paths through that trellis diagram.


Figure 5.1   A simple trellis code consisting only of 4 codewords. (Trellis diagram over the code symbols x1, ..., x5 with branch labels 0 and 1.)

Figure 5.2   Factor-graph representation of the binary [5,2,3] trellis code. (Code symbols x1, ..., x5 with channel observations y1, ..., y5.)

Corresponding to the trellis diagram of Fig. 5.1, we can directly draw the augmented factor graph as shown in Fig. 5.2. It directly describes the topology of the decoder. Each function node in the lower part of the factor graph (black rectangle) corresponds to one trellis section of the code. State variables are drawn as double circles, whereas the observable variables are shown as single circles.

The block diagram of the decoder network is shown in Fig. 5.3, with trellis modules as in Fig. 5.4. It is a direct implementation of the forward-backward algorithm [103], which is a special adaptation of the general sum-product algorithm to codes described by trellis diagrams. In principle, we may draw a corresponding building block for each function node in the block diagram. Since the building blocks are only uni-directional, we draw them separately for both directions: the upper row of the decoding network implements the forward part and the middle row the backward part of the forward-backward algorithm. The soft-output decoder network of Fig. 5.3 computes the a posteriori probabilities of the information bits only. Hard decisions on the information bits can easily be formed from these a posteriori probabilities by using a bit slicer.


Figure 5.3   A decoder for the code of Fig. 5.1. The trellis implementations of modules A-D are given in Fig. 5.4. (Forward row: modules A1, B1; backward row: modules A2, B2, B3, B4, C; combination: module D. Inputs µ1(0), µ1(1), ..., µ5(0), µ5(1); outputs p(x1=0|y), p(x1=1|y) and p(x3=0|y), p(x3=1|y).)


Figure 5.4   Trellis modules for the decoder network of Fig. 5.3. (Trellis sections of modules A, B, C, and D.)

The dashed boxes in Fig. 5.3 correspond to those computations of the general forward-backward algorithm which are not used in this example, since they do not contribute to the final result.

In the following paragraphs, all messages passed from block to block in the decoder (or from node to node in the factor graph) will be deduced step by step to point out the underlying algorithm. We assume that a codeword is transmitted over a BSC with transition probability p(y|x). Let y = [y1, ..., y5] be the received channel output. Furthermore, we assume that the a priori probability is uniform over the codewords. Introducing the abbreviation µi(b) ≜ p(yi|xi=b), the a posteriori probability of a codeword x = [x1,x2,x3,x4,x5] can be written as

    p(x|y) = γ · µ1(x1) µ2(x2) µ3(x3) µ4(x4) µ5(x5),                           (5.1)

where γ is a scale factor that does not depend on the codeword. In our example, the a posteriori probabilities of the information bits are thus given by

    p(x1=0|y) = γ · (µ1(0)µ2(0)µ3(0)µ4(0)µ5(0) + µ1(0)µ2(0)µ3(1)µ4(1)µ5(1))    (5.2)
    p(x1=1|y) = γ · (µ1(1)µ2(1)µ3(0)µ4(1)µ5(1) + µ1(1)µ2(1)µ3(1)µ4(0)µ5(0))    (5.3)
    p(x3=0|y) = γ · (µ1(0)µ2(0)µ3(0)µ4(0)µ5(0) + µ1(1)µ2(1)µ3(0)µ4(1)µ5(1))    (5.4)
    p(x3=1|y) = γ · (µ1(0)µ2(0)µ3(1)µ4(1)µ5(1) + µ1(1)µ2(1)µ3(1)µ4(0)µ5(0)).   (5.5)

These quantities, up to the scale factor γ, are computed by the decoding network of Fig. 5.3.
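For concreteness, the following Python sketch (ours) evaluates (5.2)-(5.5) by direct enumeration of the four codewords; the channel metrics mu[i][b] = p(yi|xi=b) are illustrative values for a BSC with crossover probability 0.05.

    # Brute-force evaluation of (5.2)-(5.5) by enumerating the four
    # codewords (our sketch). The received word y = [0,0,1,0,0] carries
    # one error in position 3 relative to the all-zero codeword.
    codewords = [[0,0,0,0,0], [0,0,1,1,1], [1,1,0,1,1], [1,1,1,0,0]]
    eps = 0.05
    y = [0, 0, 1, 0, 0]
    mu = [[1 - eps if yi == b else eps for b in (0, 1)] for yi in y]

    app = {(i, b): 0.0 for i in (0, 2) for b in (0, 1)}   # bits x1, x3 (0-indexed)
    for x in codewords:
        w = 1.0
        for i in range(5):
            w *= mu[i][x[i]]          # the product in (5.1), without gamma
        for i in (0, 2):
            app[(i, x[i])] += w       # sum over codewords with x_i fixed

    for i in (0, 2):
        s = app[(i, 0)] + app[(i, 1)]             # normalization (1/gamma)
        print(f"p(x{i+1}=1|y) = {app[(i, 1)] / s:.4f}")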

We begin the detailed description with the middle row (the backward computation) of Fig. 5.3. Module A2 simply scales the input vector [µ5(0), µ5(1)]^T to some fixed level. For the remaining part of this section, we shall not distinguish between differently scaled versions of a vector. Module B4 then computes the vector [µ4(0)µ5(0), µ4(1)µ5(1)]^T. Module C computes the vector

    [µ3(0)µ4(0)µ5(0), µ3(1)µ4(0)µ5(0), µ3(0)µ4(1)µ5(1), µ3(1)µ4(1)µ5(1)]^T,    (5.6)

and from that the vector

    [µ3(0)µ4(0)µ5(0) + µ3(1)µ4(1)µ5(1),  µ3(1)µ4(0)µ5(0) + µ3(0)µ4(1)µ5(1)]^T.

From the latter, module B3 computes the vector

    [µ2(0)·(µ3(0)µ4(0)µ5(0) + µ3(1)µ4(1)µ5(1)),  µ2(1)·(µ3(1)µ4(0)µ5(0) + µ3(0)µ4(1)µ5(1))]^T.

From that, module B2 computes

    [µ1(0)µ2(0)·(µ3(0)µ4(0)µ5(0) + µ3(1)µ4(1)µ5(1)),  µ1(1)µ2(1)·(µ3(1)µ4(0)µ5(0) + µ3(0)µ4(1)µ5(1))]^T,

which is proportional to [p(x1=0|y), p(x1=1|y)]^T.


Figure 5.5   Schematic of module A: a simple scaling circuit. (Inputs 1, 2; outputs 3, 4; bias current Iref.)

In the upper row (the forward computation) of Fig. 5.3, module A1 scales the input vector [µ1(0), µ1(1)]^T to some fixed level, and module B1 computes [µ1(0)µ2(0), µ1(1)µ2(1)]^T. In the bottom row (combination), module D computes from B1's output and (5.6) the vector

    [µ1(0)µ2(0)µ3(0)µ4(0)µ5(0) + µ1(1)µ2(1)µ3(0)µ4(1)µ5(1),  µ1(0)µ2(0)µ3(1)µ4(1)µ5(1) + µ1(1)µ2(1)µ3(1)µ4(0)µ5(0)]^T,

which finally is proportional to [p(x3=0|y), p(x3=1|y)]^T.
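The same result can be obtained with the modular message flow of Fig. 5.3. The following sketch (ours; it reuses the metrics mu from the previous snippet and ignores the scaling performed by modules A1/A2, as the text does) mirrors modules B1-B4, C, and D.

    # Message-passing view of the decoder of Fig. 5.3 (illustrative sketch).
    def mod_B(a, b):                          # module B: componentwise product
        return [a[0] * b[0], a[1] * b[1]]

    def mod_C(m3, m45):                       # module C: the 4-vector of (5.6)
        return [m3[0]*m45[0], m3[1]*m45[0], m3[0]*m45[1], m3[1]*m45[1]]

    m45 = mod_B(mu[3], mu[4])                 # module B4
    v   = mod_C(mu[2], m45)                   # eq. (5.6)
    bwd = [v[0] + v[3], v[1] + v[2]]          # merged output of module C
    px1 = mod_B(mu[0], mod_B(mu[1], bwd))     # modules B3, B2: ~[p(x1=0|y), p(x1=1|y)]
    f12 = mod_B(mu[0], mu[1])                 # module B1 (forward row)
    px3 = [f12[0]*v[0] + f12[1]*v[2],         # module D: ~[p(x3=0|y), p(x3=1|y)]
           f12[0]*v[3] + f12[1]*v[1]]
    print(px1[1] / sum(px1), px3[1] / sum(px3))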

5.1.2 Implementation Using Discrete Transistors

The decoder of Fig. 5.3 was implemented on a printed circuit board (PCB) using discrete bipolar transistors. We selected the CA3096 transistor array [139] for both the NPN and PNP transistors. On top of each generic module, an output current-scaling circuit using PNP transistors was added to prevent the probability signals from fading into the noise floor. Fig. 5.5 to Fig. 5.8 show the individual circuit schematics. The numbers near the input and output pins correspond directly to the ones found in the block diagram of Fig. 5.3.

The purpose of this simple decoder was purely demonstrational. Thus LED bargraph displays were added to monitor both input and output values on a linear and a logarithmic scale. Additionally, easy-to-use potentiometric input-probability controls were provided.


Figure 5.6   Schematic of module B: the equal gate. (Inputs 1-4; outputs 5, 6; bias current Iref.)

They allow the probabilities to be preset from 0.5 to 0.999, together with the sign, i.e., the choice of either a '1' or a '0', for each input value µi individually. A photograph of the demonstration unit built at our electronics laboratory is shown in Fig. 5.9.

The readout of the output values is purely static; no transient information can be obtained from the demonstration unit. The comparison of the measured a posteriori probabilities with the theoretical values (using the calculations of the previous, introductory section) showed remarkably close agreement, although no preselection of the individual bipolar transistors was carried out. Harris' transistor array CA3096 contains 3 NPN and 2 PNP BJTs. These transistors were partitioned automatically among the modules of the schematic by the packager of the PCB design software. Even if one had taken into account that two matched transistor pairs are present on each array, we could not have ensured that all transistors to be matched in theory would end up in the same package. For the assembly, the transistor arrays were simply taken out of the box and soldered onto the PCB. Hence, one would expect a very bad matching behaviour of the overall system.


Figure 5.7   Schematic of module C: butterfly connection with previous current duplication. (Inputs 1-4; outputs 5-10; bias currents Iref.)


Figure 5.8   Schematic of module D: combining forward and backward computations. (Inputs 1-6; outputs 7, 8; bias current Iref.)


Figure 5.9: Demonstration unit of a MAP decoder for the [5,2,3] block code. The LED bargraphs on the bottom of the front panel represent the probabilities p(x1=1|y) and p(x3=1|y) on a linear scale. The display section on the right-hand side is a logarithmic-scale display of the selectable input and output values.


Additionally, the transistors are on different substrates and thus may also experience different temperatures. Despite these harsh conditions, an overall static precision of about ±5% was measured using both the linear-scale and the logarithmic-scale outputs.

5.2 Decoder for a Tail-Biting Trellis Code

Our second example of a complete decoder was implemented on silicon using the 0.8 µm double-poly, double-metal BiCMOS process of AMS [140]. We discuss the code description of the tail-biting trellis code, the simulation results, the test setup, and the measurement results, and make a coarse comparison with an equivalent digital implementation.

5.2.1 General Description

Description of the Code

The considered binary [18,9,5] code is a tail-biting trellis code as introduced in Section 2.1.5. The trellis diagram of the code consists of nine equal sections like the one shown in Fig. 5.10. First, these nine equal trellis sections are cascaded like an ordinary trellis. Then the outgoing states of the last section are identified with the starting states of the first section to form a closed structure. A valid codeword is a path that starts in an arbitrary state, goes through the entire trellis exactly once, and ends in the same state.

The encoding of the dataword needs some special attention due to the tail-biting nature of the code. It can be carried out with the convolutional encoder of Fig. 5.11. First, the convolutional encoder is reset to the all-zero state. Second, the 9 information bits u1, ..., u9 are fed, one by one, into the convolutional encoder; in this process, 18 output bits (9 pairs) are generated, which we will refer to as x'1, x'2, ..., x'18. Third, two dummy zero bits are fed into the convolutional encoder to drive it back to the all-zero state, thereby generating four extra output bits x'19, ..., x'22.


Figure 5.10   One section of the binary [18,9,5] tail-biting trellis code. (Left and right states 00, 01, 11, 10; branch labels uncoded/coded: 0/00, 1/11, 1/00, 0/11, 1/01, 0/10, 1/10, 0/01.)

Figure 5.11   A convolutional encoder for the binary [18,9,5] tail-biting trellis code. (Input ui, two delay elements D, outputs x'2i-1 and x'2i.)


Finally, the codeword x = [x1, x2, ..., x18] is formed by the rule

    xi = x'i ⊕ x'i+18   for i = 1, ..., 4,
    xi = x'i            for i = 5, ..., 18.                  (5.7)

Note that we could also encode the information bits differently: first the encoder state is initialized with the last two information bits, then the information bits are applied successively, and the output of the encoder is directly the desired codeword. A disadvantage of this method is the necessary initialization of the encoder, but fewer clock steps are needed for the encoding process.

For any closed path in the tail-biting trellis, both the information bits and the corresponding coded bits can be read off the edge labels along the path.
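As a sketch of the overall encoding procedure, the following Python function implements the three steps above together with rule (5.7). The shift-register taps are our reading of Fig. 5.11, i.e., the (7,5)-octal rate-1/2 encoder, which is consistent with the branch labels of Fig. 5.10.

    # Encoder sketch (ours) for the [18,9,5] tail-biting code.
    def encode_tailbiting(u):                 # u: list of 9 information bits
        s1 = s2 = 0                           # the two delay cells, reset to zero
        xp = []                               # x'_1 ... x'_22
        for ui in u + [0, 0]:                 # 9 info bits plus 2 dummy zeros
            xp += [ui ^ s1 ^ s2, ui ^ s2]     # outputs x'_{2i-1}, x'_{2i}
            s1, s2 = ui, s1                   # shift-register update
        # rule (5.7): fold the four tail bits back onto the first four bits
        return [xp[i] ^ xp[i + 18] if i < 4 else xp[i] for i in range(18)]

    print(encode_tailbiting([1, 0, 1, 1, 0, 1, 1, 0, 1]))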

Factor Graph and Block Diagram of the Decoder

With the examples from Section 3.2 in mind, sketching the augmented factor graph of Fig. 5.12 for the tail-biting trellis code is straightforward. The tail-biting nature of the code is clearly visible in this drawing.

In the following, it is assumed that the codewords are transmitted over a memoryless channel with transition probabilities p(y|x), x ∈ {0,1}. A complete decoding network for this code is given in Fig. 5.13, with the computation modules defined in Fig. 5.14. Each signal line in Fig. 5.13 represents a whole probability mass function. The inputs and outputs of the decoder are probability mass functions defined on a two-letter alphabet, whereas the remaining signals represent probability mass functions defined on a four-letter alphabet; we have therefore drawn the latter with heavier lines to make a clear distinction.

The inputs to the decoder are the probabilities p(yi|0) and p(yi|1), i = 1, ..., 18, where y = [y1, ..., y18] is the channel output data. The outputs of the decoder are approximate a posteriori probabilities p(ui|y) for all information bits ui. A final decoding decision may be obtained by comparing p(ui=1|y) with p(ui=0|y).


Figure 5.12   Factor graph representation of the binary [18,9,5] tail-biting trellis code. (Ring of nine sections with information bits u1, ..., u9 and channel outputs y1, ..., y18.)


Figure 5.13   Decoding network for the binary [18,9,5] tail-biting trellis code. (Modules of types A, B, C, and D; inputs p(y1|x1), ..., p(y18|x18); outputs p~(u1|y), ..., p~(u9|y).)


Figure 5.14   Trellis representation of the binary indicator functions to be implemented in the decoder modules for the binary [18,9,5] tail-biting trellis code. (Modules A, B, C, and D.)


Again, this decoder network is a direct implementation of the forward-backward algorithm [103], adapted to a tail-biting trellis. The type-B modules in Fig. 5.13 perform the “forward” computation and the type-C modules the “backward” computation on the tail-biting trellis. The type-A modules precompute the branch metrics, and the type-D modules compute the final probabilities for each information bit.

The outputs of the network are only approximate a posteriori probabilities, because the forward-backward algorithm computes exact a posteriori probabilities only for ordinary (not tail-biting) trellis codes. On the level of factor graphs, we recall that the sum-product algorithm produces exact a posteriori output probabilities only if the factor graph of the code is cycle-free, i.e., if it has the form of a tree. The approximation need not be very good, since in the end only the sign of the difference p(ui=1|y) − p(ui=0|y) matters.

Note that this decoding network contains two loops, corresponding to the forward and the backward computations. In general, networks with multiple loops may not always converge to a stable state. However, networks as in Fig. 5.13 with no interacting loops are guaranteed to converge unless some of the input probabilities are zero [141].

5.2.2 Circuit Design

The decoder network for the binary [18,9,5] tail-biting trellis code was designed as an analog network with about 940 BJTs and 650 pMOS transistors in the AMS 0.8 µm double-metal BiCMOS technology [140]. Fig. 5.15 and Fig. 5.51 to Fig. 5.53 show the circuit implementations of the modules of types A to D.

As a representative example, we discuss the type-B module: the core transistors for the multiplication part of the circuit are minimum-size BJTs (3 × 0.8 µm²). Each BJT sits in its own well, which has to be separated from any other well by 7 µm on each side. This creates much unused active area on the chip layout, which can be used for routing the interconnects. In fact, the size of the BJTs is the main factor determining the area of a building block.


The remaining transistors, located in the current mirrors at the top of the circuit of Fig. 5.51, are pMOS FETs of almost minimal size (12 × 1.6 µm²). They were designed for maximum speed in a 5 V design with VDsat = 2 V. Hence, the current mirrors operate in the strong-inversion region at the nominal currents. The current error was roughly estimated according to the Pelgrom formula [142]. Hand calculations showed that the standard deviation will be far below 1% at the nominal current of 200 µA, but that the error goes up to about 33% for very low currents. This is a minor problem, since the determining factor is the error at equal current levels for all elements of the probability distribution. Under these conditions, the matching is still reasonable if the discrete probability distributions do not have too many elements, i.e., fewer than about 10. If this does not hold, the circuits have to be designed more carefully for the desired matching in the uniformly distributed situation.

The other modules of Fig. 5.14, shown in Fig. 5.51 to Fig. 5.53, are built similarly. For each edge in the trellis representation, a transistor can be identified in the middle row of the circuit diagram. Additionally, dummy transistors are introduced such that each state has the same number and the same types of outgoing branches.

The layout of the individual building blocks was made with the aim of allowing direct back-to-back connections on every edge. In this way, only the connections that close the tail-biting structure have to be drawn. Fig. 5.16 shows one vertical slice containing, from top to bottom, modules of type B, A, C, and D. A considerable part of the area is used by the power-supply stripes.

5.2.3 Simulation Results

The analog network of the whole decoder chip has been simulated using Cadence's Spectre simulation tool. For the MOS transistors, the BSIM3v3 model was used. A typical simulation response is shown in Fig. 5.17. It demonstrates the correction of two toggled bits for a binary symmetric channel (BSC) with crossover probability 0.05. We have chosen the configuration with the two toggled bits separated by 3 correct bits because this is the hardest case for the decoder circuit to recover the correct information. The transient curves in Fig. 5.17 show the computed approximate a posteriori probabilities p~(ui|y) for all nine information bits.


Figure 5.15   Circuit implementation of the type-A module: branch-metric precomputation. (Inputs L1_0, L1_1, L2_0, L2_1; outputs L00_1/2, L01_1/2, L10_1/2, L11_1/2; references VrefA, VrefB, Iref.)


Figure 5.16   A vertical slice of the layout containing, from top to bottom, modules of type B, A, C, and D.


Figure 5.17   A typical simulation response for the decoder of Fig. 5.13: transient curves of the approximate a posteriori probabilities p~(uj|y) for j = 1, 2, ..., 9, given the transmitted all-zero codeword and two bit errors in the received channel information. (Probability 1 corresponds to 50 µA.)

In this example, it takes about 30 ns until the signs of all output probabilities have reached their final values. The bias current of each module was chosen to be 50 µA; a probability of 1 corresponds to a current of 50 µA, and a probability of 0 corresponds to 0 µA. A single 5 V supply was used, and the total (static and dynamic) power consumption was measured to be 50 mW.

To show the effect of different crossover probabilities, i.e., different strengths of the conditional input probabilities, on the error-correcting capability, we made a sequence of five simulations with different ε. This time the circuits were biased with a current of 200 µA per section, and we used a different uncoded dataword u = [1,0,1,1,0,1,1,0,1], but left the two toggled bits at the same positions. Fig. 5.18 to Fig. 5.21 show the plots for BSCs with ε = 40%, 25%, 5%, and 2.5%, respectively, whereas Fig. 5.22 represents the case with 4 erasures, i.e., the conditional input probability of the erased bits is 0.5.


Figure 5.18   Simulation of the decoder correcting two bit errors at ε = 40%. (Dataword 101101101, errors at positions 4 and 8; Ibias = 198.7 µA, static power = 98.5 mW; delay bit 1 = 6.837 ns, delay bit 3 = 8.845 ns.)

As we observe in the five plots, the case with 4 erased bits is the fastest one, since actually no error-correcting action is required. In the remaining four plots we observe that very strong errors make the decoding time considerably longer; very weak errors, on the other hand, are easily corrected, as we have just seen for the special case of erasures.

5.2.4 Test Setup

For testing the implemented circuit chip, an HP 83000 digital circuit tester was used, as shown in Fig. 5.23. The tester commands the 18 off-chip high-speed D/A converters on the DUT adapter board that generate the input voltage waveforms. These input voltages are proportional to p(yi|xi); in the complete transmission system, these probabilities stem from the output of the demodulator, as shown in Fig. 2.10. Since the voltage signals have to be applied in parallel during the decoding process, the converters also act as external analog memory. The input voltages are converted on chip into the current signals needed by the decoder core. Additionally, the DUT adapter board generates the bias currents and reference voltages for the test chip. Finally, the DUT adapter board contains nine I-V converter pairs and nine high-speed ECL comparators (bit slicers) for measurement purposes. The analog voltages may be measured by a high-speed oscilloscope, whereas the bit-slicer outputs are fed directly back to the HP 83000 test system.


Figure 5.19   Simulation of the decoder correcting two bit errors at ε = 25%. (Dataword 101101101, errors at positions 4 and 8; Ibias = 198.7 µA, static power = 98.5 mW; delay bit 1 = 10.742 ns, delay bit 3 = 12.549 ns.)

Figure 5.20   Simulation of the decoder correcting two bit errors at ε = 5%. (Dataword 101101101, errors at positions 4 and 8; Ibias = 198.7 µA, static power = 98.5 mW; delay bit 1 = 23.461 ns, delay bit 3 = 26.296 ns.)


Figure 5.21   Simulation of the decoder correcting two bit errors at ε = 2.5%. (Dataword 101101101, errors at positions 4 and 8; Ibias = 198.7 µA, static power = 98.5 mW; delay bit 1 = 34.130 ns, delay bit 3 = 40.216 ns.)

Figure 5.22   Simulation of the decoder correcting 4 erased bits. (Dataword 101101101, erasures at positions 4, 8, 12, 16; Ibias = 198.7 µA, static power = 98.5 mW; delays 0 ns.)



5.2.5 Measurement Results

Measurements of the transient behaviour of the output probabilities, shown in Fig. 5.24, match well with the simulation results. The measured approximate output probabilities of bit 1 and bit 2 are drawn in this plot; the error correction of bit 1 can be seen clearly. Furthermore, the output of the external comparator for the hard decision on bit 1 is shown, making the decoding delay of approx. 31 ns evident. We observed heavy ringing of the output currents. On the one hand, this ringing was caused by coupling between the pins of the package, which could not be calibrated out completely. On the other hand, the cavity of the package was very wide, and hence the long bond wires added considerable inductance on all pins. This inductance prevents high-speed signals as well as fast changes of the current consumption from propagating properly and thereby generates oscillations. However, it was found that decoding speed and errors were not affected by this ringing. We could have prevented the ringing by assigning the pinout properly and by using, for example, the chip-on-board (COB) mounting technique or flip-chip assembly, which almost eliminate the bond-wire inductances.

In Fig. 5.25, the result of another measurement is shown. The sourceword u = [1,0,1,1,0,1,1,0,1] is encoded and applied to a BSC with crossover probability 0.05. Consecutively, zero, one, two, and three bit errors were applied, where the three-error configuration corresponds to another codeword with two bit errors. The error-correction capability for two applied errors can clearly be observed in Fig. 5.25. In this measurement setup, a probability of 1 corresponds to 200 µA and one oscilloscope division corresponds to 25 µA. The differences in the output amplitudes of bit 1 and bit 3 stem from device-mismatch effects within the decoder core. As we have seen, the building blocks were designed with virtually minimal transistor sizes, as in digital design. Extensive Monte-Carlo simulations were made, in which none of the simulated configurations failed, and the standard deviation of the output currents was within 3 to 4% of the nominal value. Therefore it is assured that the decoder is very robust against mismatch errors.


Figure 5.23: The test setup based on the HP 83000 digital circuit tester. (The HP 83000 drives the 18 D/A converters and bias generation on the DUT adapter board; the tail-biting decoder chip returns nine outputs through I/V converters and comparators to the tester.)


Figure 5.24   A typical measured transient response of the output probabilities of bit 1 and bit 2, together with the comparator output for bit 1. 20 mV/div corresponds to 10 µA/div, and probability 1 corresponds to 50 µA.


Unfortunately, besides the ringing problem visible in Fig. 5.24, we observed a second severe defect in the chip implementation. For very small input values, the V-I converters completely shut off the currents and thus created zero-probability paths in the tail-biting trellis. The decoder cannot recover from a zero probability value. Hence, the malfunctioning V-I converters prevented us from measuring a BER curve, which would be the most important measurement for fully characterizing the decoder chip.

A chip micrograph of the complete prototype implementation is shown in Fig. 5.26. The entire chip area is 2.8 × 2.6 mm² including pads. The area of the decoder itself is 1.7 × 0.7 mm², and it is situated in the lower right corner of the die. The remaining area is taken up by the V-I converters needed for measurement purposes.

With a decoding time of 90 ns per decoded 9-bit sourceword, a data rate of 100 Mbit/s can be achieved, which includes ample margin and reset time.


Figure 5.25   Measured transient response of the output probabilities of bit 1 and bit 3. Consecutively, zero, one, two, and three bit errors have been introduced. Each module is biased at 200 µA (corresponding to probability 1). In the plot, 50 mV/div corresponds to 25 µA/div.


Figure 5.26   Chip micrograph of the prototype chip. The actual decoder is in the bottom right part of the chip, whereas the V-I converters reside in the top half of the chip.

The main chip characteristics are summarized in Table 5.1.

5.2.6 Power/Speed Comparison

In the following, an estimate of the power consumption of an equivalent digital decoder is derived. We found by high-level discrete-time simulations that the decoding algorithm converges to its final value after 15 iterations per section if parallel updates for each node type are assumed. This corresponds to a total of 4400 multiplications and 1100 additions. In order to achieve the desired bit rates, fast but low-power 5-bit multipliers and adders are assumed at each node of the factor graph of Fig. 5.12. The maximum operating frequency of these node processors is 167 MHz (15 iterations within 90 ns). Defining a gate as a basic digital circuit with up to n inputs and 1 output, 30 gates are needed for implementing the full adder and 110 for the one-step multiplier.


  Technology                          AMS 0.8 µm 2M2P BiCMOS
  # BJT                               940
  # pMOS                              650
  Supply voltage                      single 5 V
  Power consumption @ Iref = 50 µA    50 mW (w/o V-I converters)
  Chip area                           2.8 × 2.6 mm²
  Core size                           1.7 × 0.7 mm²
  Decoding speed (with margin)        100 Mbit/s

Table 5.1   Summary of the prototype chip characteristics.


Furthermore, assuming that all gates have the same delay time tD, an average node capacitance of 0.1 pF, an activity per gate of 1/4 (i.e., every node charges and discharges within 8 calculation steps), and an energy loss due to overlap currents of 20%, the power dissipated per multiplication can be estimated as 2.5 mW. Similarly, 0.65 mW is required per 5-bit addition. Therefore, operating from a single 3 V power supply and neglecting additional scaling operations and buffering, the overall power consumption of the decoder under consideration can be estimated to be above 11.5 W. A rough estimate has shown that adding the input network and the analog memories at most doubles the power consumption and chip area. The power efficiency of our analog decoder is therefore superior by a factor of more than 200 compared to its digital counterpart; similar numbers are found for the efficiency in the use of die area. A digital implementation of such iterative decoders is possible with today's CMOS processes if a delay per gate of tD = 0.45 ns is assured for n = 6 processors in parallel. However, such digital implementations are limited to small codes, since otherwise the power consumption and die area become impractically large.
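The arithmetic behind this estimate can be restated in a few lines (our own sanity check of the figures quoted above, not a new estimate):

    # Back-of-the-envelope restatement of the numbers quoted above.
    n_mult, n_add = 4400, 1100                # operations per decoded word
    p_mult, p_add = 2.5e-3, 0.65e-3           # estimated watts per operation
    p_digital = n_mult * p_mult + n_add * p_add
    print(f"digital: {p_digital:.2f} W")      # ~11.7 W, i.e. above 11.5 W
    print(f"ratio: {p_digital / 50e-3:.0f}")  # vs. the 50 mW analog decoder: >200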

However, note that the digital implementation can potentially be optimized at the gate level. Instead of performing the full multiplication, table-lookup techniques can be applied; this is especially suitable for data quantized to only a few bits [143]. Additionally, the whole gate-level schematic can be optimized automatically by the digital design tools offered by most of the important vendors of IC design frameworks. In this way, trade-offs can be evaluated easily.

A second approach to optimized digital implementations is the use of different update schedules. In the digital domain, we might choose a more suitable update schedule for this decoder, as discussed in Section 3.3.2: instead of parallel updates of all nodes, the forward and backward trellises could be calculated serially with approx. 3 rounds each. This means that the presented analog decoding technique develops its full power for low-density parity-check codes, where the factor graph is highly connected; in that case, no such simplifications can be made in the digital implementation.

Another problem in comparing different implementation approaches, without having the exact details of the actual designs, is the existence of different realizations of the same algorithm. The Max-Log-MAP algorithm [144], for example, approximates the MAP algorithm but significantly reduces the complexity by performing the calculations in the log domain; only additions and max operations are then necessary. Although the performance loss is about 0.3 dB at low SNR values, it may be attractive for low-power applications. As an illustration of this design uncertainty, we cite the design study by Vogt et al. [132], who compared different Turbo decoder realizations and found that the efficiency of these realizations differs by as much as a factor of 2.5.


Figure 5.27   The structure of a basic repeat-accumulate (RA) code: parity bits, XOR function nodes, and channel/information bits. Quasi-cyclic repeat-accumulate (QCRA) codes are formed by adding a tail-biting connection (dashed line) to the rectilinear part of the factor graph.

5.3 Decoder for a Turbo-Style Code

Our third complete decoder example is a decoder for a quasi-cyclic repeat-accumulate (QCRA) code. Repeat-accumulate (RA) codes are a special form of low-density parity-check codes with a factor graph as shown in Fig. 5.27. Tanner proposed the initial idea for the basic structure of QCRA codes [145, 146]. This idea was refined by numeric simulations evaluating the performance of different realizations of the code [147]. In the following, we present the code structure, the design-automation process that allows us to generate the final layout directly, and simulation results. Unfortunately, the chip-on-board (COB) assembly of the chips failed due to bonding problems, and at the present time we still have no working module on which to make measurements. We therefore only describe the test setup for the decoder chip at the end of this section.

5.3.1 General Code and Decoder Description

The code chosen for our implementation is a linear [44,22,8] quasi-cyclic repeat-accumulate (QCRA) code that is characterized by the following parity-check matrix:

    H ≜ [ I(1)   —    I(0)   —     —    I(1)
          I(0)  I(9)  I(1)  I(0)   —     —
           —    I(0)   —    I(1)  I(0)   —
          I(1)  I(4)   —     —    I(1)  I(0) ]                  (5.8)

where I(0) ≜ I denotes the 11 × 11 identity matrix, I(1), I(4), I(9), etc. denote 11 × 11 identity matrices whose columns are cyclically shifted by the indicated amount (e.g., I(1) has its ones on the first superdiagonal and in the lower left corner), and '—' is a zero matrix of the appropriate size.
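Since the blocks of (5.8) are just cyclically shifted identity matrices, H is easily generated programmatically; the following numpy sketch (ours) illustrates the construction.

    # Construction of H from cyclically shifted 11x11 identity matrices
    # (our numpy sketch of (5.8)).
    import numpy as np

    def I(k, n=11):                           # identity, columns shifted by k
        return np.roll(np.eye(n, dtype=int), k, axis=1)

    Z = np.zeros((11, 11), dtype=int)         # the '—' blocks
    H = np.block([
        [I(1), Z,    I(0), Z,    Z,    I(1)],
        [I(0), I(9), I(1), I(0), Z,    Z   ],
        [Z,    I(0), Z,    I(1), I(0), Z   ],
        [I(1), I(4), Z,    Z,    I(1), I(0)],
    ])
    print(H.shape)                            # (44, 66): 44 checks, 66 variables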

A corresponding factor-graph representation is shown in Fig. 5.28. Note that the parity-check matrix H of (5.8) has been reordered, by linear combinations of rows and by column permutations, to obtain the symbol numbering shown in the factor graph of Fig. 5.28. We have chosen the even-numbered bits on the outer ring to be the information bits, whereas the odd-numbered bits are parity bits; the bits on the inner ring are auxiliary bits. Hence we have a systematic, linear, rate-1/2 code.


The girth of the graph, i.e., the smallest number of branches in a closed loop of the factor graph, is g = 10. This is an important parameter, since iterative decoding on factor graphs with cycles performs sub-optimally: the larger the girth, the smaller the influence of the cycles on the decoding performance.

The factor graph serves directly as the block diagram of the decoder circuit. Because of the form of the factor graph, we will refer to the decoder as a 'Turbine decoder' in the following. In Fig. 5.28 we identify five types of nodes: fully bi-directional variable nodes on the innermost circle, fully bi-directional soft-XOR nodes with 3 or 4 ports on the second circle from the center, and partly bi-directional 2.5-port¹ variable nodes, with and without an output, for the information bits and the parity bits, respectively. Since the factor-graph nodes work mostly fully bi-directionally and our building blocks allow only two-input circuits, we dissect them into building blocks as shown in Fig. 5.29. The lines in the figures actually represent two-element probability mass functions. The letters at the inputs of the building blocks indicate whether one enters by the x-inputs (bottom of the core transistor matrix) or by the y-inputs (left side of the core transistor matrix). In order to save on input logarithm transistors and to omit unnecessary current duplications, we try to reuse as many of the input terms as possible. However, this is not always possible. Some of the inputs have a letter in parentheses beside them, such as, for example, (x); this indicates a domain change from a y-input to an x-input or vice versa. Such a change increases the current consumption, since a separate scaler circuit has to be used for that purpose.

5.3.2 Circuit Design

General transistor sizing

The transistor-level schematics of the individual factor-graph nodes are given in Fig. 5.30 and in Fig. 5.54 to Fig. 5.57. For most of them, the large schematics are divided into two parts that are shown on opposite pages and should be read together. The computation nodes have been implemented in the same BiCMOS technology from AMS that we used for the tail-biting decoder described in Section 5.2 [140].

¹ The 'half port' is an input-only port, as needed for feeding in the channel information.


Figure 5.28   Factor graph of the binary linear [44,22,8] QCRA code. (Outer ring: channel bits 1-44, alternating information and parity bits; inner ring: auxiliary bits (1)-(22); node types: channel function, XOR function, info/channel bit, channel bit, bit.)


Figure 5.29   Dissection of the factor-graph nodes of Fig. 5.28 into elementary 2-input building blocks: a) fully bi-directional 3-port variable node, b) bi-directional 2.5-port variable node, c) bi-directional 2.5-port variable node with output bit slicer, d) fully bi-directional 3-port soft-XOR node, and e) fully bi-directional 4-port soft-XOR node.


Again, all bipolar transistors are minimum-size (3 × 0.8 µm²). The pMOS transistors located in the current mirrors at the top of the circuits are also almost minimum-size (12 × 1.6 µm²). They were designed for maximum speed in a 5 V design with VDsat = 2 V. Hence, the current mirrors operate in the strong-inversion region at the nominal current of 200 µA.

Input variable nodes

The 2.5-port factor-graph nodes for the input variables of Fig. 5.28 need some special attention. In order to have predictable initial conditions for the sum-product algorithm, the input probabilities of the network are generally set to a uniform distribution. Therefore, additional initialization functionality has been implemented in the circuits of the affected 2.5-port computation nodes by means of clamping reset switches, as shown in Fig. 5.54 and Fig. 5.55. By clamping the binary input distributions together, the input current sum is forced to be distributed (theoretically) uniformly over the two diode-connected input transistors of port 1. Unfortunately, the pMOS transistor M1r (W/L = 20 µm/0.8 µm) is not a perfect switch. Rather, it behaves as a MOS resistor in the linear range, and its drain-source resistance is not negligible at zero gate voltage. This causes a differential voltage drop at input port 1, which directly translates into an imbalance of the input distribution and consequently also of the output distributions. Even if transistor M1r had been made very wide, the parasitic channel resistance would not have dropped to an acceptable level. Therefore, the additional transistors M2r and M3r (W/L = 40 µm/0.8 µm) are connected to the two output ports 2 and 3 to equalize the output distributions during the reset time. These transistors prove to be very efficient in equalizing the input distributions of the whole network, even at their relatively small size.

Current-mode comparator

In addition to the reset hardware, the 2.5-port variable node for the information bits also contains the output circuitry. This includes the summarization calculations, i.e., the pair-wise product over all incoming branches, and a bit slicer.


Figure 5.30   Circuit implementation of the fully bi-directional 3-port variable node, denoted 'Bit' in Fig. 5.28 (left part). (Ports In1, In2, In3, Out1, Out3; references VrefA, VrefB, Iref.)


Figure 5.30 (continued)   Circuit implementation of the fully bi-directional 3-port variable node, denoted 'Bit' in Fig. 5.28 (right part, with output port Out2).


Figure 5.31   Current comparator for the output bit slicer. (Transistors M1-M12; current input in, logic output out.)

The bit slicer is actually a current comparator that compares the two elements of the output distribution: if p(ui=1|y) > p(ui=0|y), it puts a logic '1' at the output, and a logic '0' otherwise. The actual comparator of Fig. 5.31 can only decide whether the signal current is positive (flowing into the circuit) or negative (flowing out of the circuit). Therefore, a precomputation has to be done with the output currents according to

    Icomp = Iz p(ui=1|y) − (Iz p(ui=1|y) + Iz p(ui=0|y)) / 2.        (5.9)

This subtracts from the output value corresponding to the conditional probability of a '1' a dynamic threshold corresponding to the probability 0.5. The resulting value is no longer strictly positive and may hence serve as the input of the current comparator.

The comparator circuit of Fig. 5.31 is a class-B circuit that uses positive feedback in its first stage; it was initially proposed by Traff [148]. Transistors M1 and M2 form a 'head-over' digital inverter followed by a traditional unit-size inverter. At the output of this bistable circuit we observe only small voltage variations, which are subsequently amplified by a scaled inverter chain with the grading 1:1:2:4 of the nominal transistor widths. The sizing of a unit inverter is W/L = 9 µm/0.8 µm for the pMOS transistors and W/L = 5 µm/0.8 µm for the nMOS transistors; this sizing corresponds to the one found in the digital cell library.


On-chip D/A converter

To drive the inputs of the Turbine-decoder core, 7-bit current-output D/A converters with input latches are placed on the chip. This differs from the previous decoder example, where the input signals were produced externally on the DUT adapter board by means of high-speed D/A converters. The main benefits of placing the converters directly near the input nodes on the chip are better speed performance and the absence of problems with zero currents, which would otherwise result in zero-probability paths. A current-steering architecture was chosen for its simplicity and robustness [108, 137]. As shown in Fig. 5.32, the currents that are normally dumped into a dummy load are used for the complementary value of the binary input distribution. By programming the 7-bit value into the register, we choose the conditional input probability p(yi|xi=1) according to

    I·p(yi|xi=1) = 4 Iref · p(yi|xi=1)
                 = 4 Iref · (b1/2 + b2/4 + b3/8 + b4/16 + b5/32 + b6/64 + b7/128),    (5.10)

where the bi are the programmed bits of the D/A converter and Iref is set to IMSB/4. Conversely, we get

    I·p(yi|xi=0) = 4 Iref · p(yi|xi=0)
                 = 4 Iref · [1 − (b1/2 + b2/4 + b3/8 + b4/16 + b5/32 + b6/64 + b7/128)]    (5.11)

for the complement of p(yi|xi=1).
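A small numeric sketch (ours) of (5.10) and (5.11), mapping a 7-bit word to the two complementary output currents:

    # Numeric sketch of (5.10)/(5.11); the reference current value is
    # chosen for illustration only.
    def dac_currents(bits, I_ref=50e-6):      # bits = [b1, ..., b7], MSB first
        p1 = sum(b / 2**(i + 1) for i, b in enumerate(bits))
        return 4 * I_ref * p1, 4 * I_ref * (1 - p1)

    i1, i0 = dac_currents([1, 0, 1, 0, 0, 0, 0])     # p(y|x=1) = 0.625
    print(f"{i1 * 1e6:.1f} uA, {i0 * 1e6:.1f} uA")   # 125.0 uA, 75.0 uA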

For the binary-weighted current sources we used simple cascode structures. The current switches are of very small size (W/L = 2 µm/0.8 µm) to allow fast switching. The summed currents are passed to the input of the decoding network by current mirrors, whose transistors are all equipped with regulated-cascode structures [135]. The resulting high-impedance output of the D/A converter thus provides well-defined input values to the decoding network. The layout of the complete converter, shown in Fig. 5.33, is as compact as possible (334 × 434 µm²). However, the D/A converter design follows a traditional design approach, and the layout therefore uses a considerable amount of silicon area due to matching considerations.


Figure 5.32   Current-steering D/A converter with binary-weighted current sources. (7-bit static register b1, ..., b7 with weights 1 to 64; complementary outputs I·p(yi|xi=1) and I·p(yi|xi=0).)

45 of these D/A converters are placed on the chip: 44 converters are used for the decoding-network inputs, and for one converter the output signals are connected to the chip pads for measurement purposes.

5.3.3 Automating the Design Process

Drawing schematics and layouts of analog decoding networks by hand becomes too complicated for large codes. The design is very susceptible to drawing errors that later affect the overall behaviour of the circuit. Additionally, the design time may get far too long, and changes in the design may require a complete redrawing of all elements of the network. Hence we have to look for a design flow that supports the construction of the decoding network. Our aim is a computer-assisted methodology that checks the critical parts, i.e., the interconnections in the schematic and in the layout of the circuit chip. We have developed such a method in the context of the chosen QCRA code, and we describe it in the following.


Figure 5.33   A part of the layout showing four current-steering D/A converters used for input-value generation for the decoding network. One D/A converter measures 334 × 434 µm².


The parity-check matrix H of (5.8), together with the allocation of the information bits and the channel bits, comprises all the information needed to generate both the factor-graph representation and the actual decoder structure. Thus we can parse this matrix and extract the information needed for the design. Each row of the matrix H describes one parity check (XOR function): rows with 3 or 4 entries correspond to 3-port and 4-port fully bi-directional soft-XOR modules. In the same way we analyze the columns. In the case of the [44,22,8] QCRA code, the first 22 columns of the 44 × 66 parity-check matrix correspond to the non-observable, fully bi-directional internal 3-port variable nodes. The remaining 44 columns correspond to the 2.5-port variable nodes; such a module incorporates the output functionality if the corresponding bit is an information bit. After having parsed the entire parity-check matrix, the same program generates a text file in Verilog syntax containing the structural description of the decoder. Subsequently, this Verilog file is imported as a schematic into the Cadence design environment by the program VerilogIn. In the next step, the transistor-level schematics of the building blocks are filled into the empty place-holders of the design structure. Now we are ready for transistor-level simulations, since we have a complete hierarchical transistor-level description of the circuit. Circuit-simulator tools such as Spectre are well embedded into the design framework; the schematic is thus the central pivot of the whole design flow.

The second part of the decoder design consists in drawing the full-custom layout of the individual building blocks. They are designed to be placed in a library as analog standard cells. This library can then be used with a (digital) place-and-route (P&R) tool such as CellEnsemble or SiliconEnsemble of the Cadence IC design framework. Special attention has to be paid to the choice of the width of an individual cell, such that the modules are placed correctly by the software tool. After the individual cells have been drawn, the layout of the entire decoder can be generated automatically by the P&R tools of the design framework. For the final chip, the additional bias-network blocks are assembled semi-automatically inside the pad ring of the chip layout.

We have sketched the procedure presented above in the flowchart shown in Fig. 5.34. The described method has been successfully applied to the design of our test chip. Fig. 5.35 shows a clipping of the generated decoder core. Clearly visible is the massive power-feeding trunk on the right of the image that provides the supply for the decoder core.

5.3.4 Simulation Results

The analog network of the whole decoder chip has been simulated using Cadence's Spectre simulation tool. The BSIM 3v3 model was used for the MOS transistors. A typical simulation response is shown in Fig. 5.36. It demonstrates the correction of three toggled bits (channel bits 2, 3, and 4) on a binary symmetric channel (BSC) with a crossover probability of 5%. We have chosen the configuration with three consecutively toggled bits because it is hardest for the decoder circuit to recover the correct information in this case. In fact, the toggled bits are located on the same minimum-girth loop, as we can easily see in Fig. 5.28. The transient curves in Fig. 5.36 show the computed approximate a posteriori probabilities $\tilde p(u_i \mid y)$ for all 22 information bits. In this example, it takes about 100 ns until the signs of all output probabilities have reached their final values. The bias current of each module was chosen to be 200 µA. A probability of 1 corresponds to a current of 200 µA, a probability of 0 corresponds to 0 µA. A single 5 V supply was used. The total maximum (static and dynamic) power consumption was estimated to be 1 W. In Fig. 5.36 we also observe the imperfect action of the clamping reset switches. It always results in a slight preference for the applied input value, i.e., a correct bit immediately starts in the right direction of the decoding trajectory.

As soon as the errors are no longer placed on the same minimum-girth loop, the decoding network gets considerably faster. This behaviour is shown in Fig. 5.37 and Fig. 5.38. In these simulations we use the same simulation conditions as before. We observe the opposite situation if we introduce more toggled bits than the decoder is capable of correcting: an oscillatory behaviour is observed if 4 input bits on the same minimum-girth loop are toggled. The transient simulation of this input configuration is shown in Fig. 5.39. Note that we also observe the same oscillations in digital simulations. Hence these oscillations are an inherent problem of this code.


Figure 5.34: Design flow used to automate the construction of large analog decoding networks. (Flowchart steps within the Cadence Design Framework: parse H (MATLAB); write Verilog file; import with VerilogIn; draw the schematic with Composer; Spectre simulation; draw the cell layouts with Virtuoso; P&R with Cell Ensemble/Silicon Ensemble to obtain the core layout; draw the bias networks with Virtuoso; final assembly with Silicon Ensemble to obtain the final layout.)


Figure 5.35: A part of the automatically generated layout of the turbine decoder. The massive metal lines on the right are part of the power-feeding trunk.


Figure 5.36: A typical simulation response for the decoder of Fig. 5.28: transient curves of the approximate a posteriori probabilities $\tilde p(u_j \mid y)$ for $j = 1, 2, \ldots, 22$, given the encoded information word $u = \{1, 0, 0, 0, \ldots, 0\}$ and three bit errors in the received channel information (channel bits 2, 3, and 4). Probability 1 corresponds to 200 µA. (Axes: $\tilde p(u_i \mid y)$ in µA versus $t$ in ns.)

Figure 5.37: Transient simulation using the same simulation conditions as before. Bits 1, 3, and 4 are toggled in this case.


Figure 5.38: Transient simulation using the same simulation conditions as before. Bits 2, 4, and 6 are toggled in this case.

Figure 5.39: Transient simulation using the same simulation conditions as before. Bits 1, 2, 3, and 4 are toggled in this case. Oscillations are also observed in time-discrete simulations.


Table 5.2: Summary of the simulated chip characteristics of the prototype turbine decoder.

    Technology                           AMS 0.8 µm 2M2P BiCMOS
    # BJT                                3895
    # PMOS                               2884
    Supply voltage                       single 5 V
    Power consumption @ Iref = 200 µA    1 W (estimated)
    Chip area                            5.28 × 5.45 mm²
    Core size                            2.7 × 2.5 mm²
    Decoding speed (with margin)         150 MBit/s (estimated)

5.3.5 Test Setup

A chip micrograph of the complete prototype implementation is shown in Fig. 5.40. The entire chip area is 2.8 × 2.6 mm² including pads. The area of the decoder itself is 1.7 × 0.7 mm². The decoder is situated in the lower right corner of the die. The remaining area is taken up by the D/A converters needed for the input-value generation. The main simulated chip characteristics are summarized in Table 5.2.

For testing the integrated circuit chip, an HP 83000 digital circuit tester is used as shown in Fig. 5.41. The silicon dies are mounted by the COB assembly technique on small PCBs. The COB adapter is then connected to the DUT adapter board. The digital tester directly commands the on-chip D/A converters using a 6-bit address bus and a 7-bit data bus. The DUT adapter board generates the bias currents and the reference voltages for the test chip. Additionally, the DUT adapter board contains eleven I-V converters for half of the analog probability outputs. The analog voltages may be measured by a high-speed oscilloscope, whereas the on-chip bit-slicer outputs are directly fed back to the HP 83000 test system on a 22-bit wide digital bus.


Figure 5.40: Chip micrograph of the prototype turbine decoder chip. The actual decoder is the bottom right part of the chip, whereas the D/A converters are placed in an L-shape on the remaining space.

Figure 5.41: The test setup of the turbine decoder based on the HP 83000 digital circuit tester.

5.4 Analog Viterbi Decoder Using Probability Propagation Modules

The Viterbi algorithm [23, 24] is an essential tool for every communications engineer. Several analog implementations have been made so far to speed up Viterbi decoders and lower their power consumption [28, 32, 37, 38]. In all these analog implementations, the digital add-compare-select (ACS) units, which are the key parts of the whole decoder, are replaced by analog ones, but the rest of the decoder, in particular the path memory and the survivor-path trace-back units, is still digital.

In the following, we will reformulate the original Viterbi algorithm to fit the framework of sum-product calculus and probability propagation networks. Using this new formulation we will then propose a modified architecture for the implementation of large probability-propagation-based Viterbi decoders, which could be used, for example, for the decoding of highly complex trellis codes and for channel-tracking applications.

5.4.1 Reformulation of the Viterbi Algorithm

The general MAP sequence detection problem

The Viterbi algorithm was initially proposed in 1967 as a method of decoding convolutional codes [23]. It has been recognized since then that it is a general recursive solution to the MAP sequence detection problem of a finite-state discrete-time Markov process observed under memoryless noise conditions [24]. The encoding Markov process can be characterized by a state transition diagram as given, for example, in Fig. 2.3. The observed process is Markov in the sense that the probability $P(x_{k+1} \mid x_0, x_1, \ldots, x_k)$ of being in state $x_{k+1}$ at time $k+1$, given all states up to time $k$, depends only on the state $x_k$ at time $k$:

$$ P(x_{k+1} \mid x_0, x_1, \ldots, x_k) = P(x_{k+1} \mid x_k). \tag{5.12} $$

Furthermore, the sequence of all possible state transitions can be described by a trellis diagram. In fact, the Viterbi algorithm can be seen as the solution of the so-called shortest-path problem through a given graph, i.e., the trellis diagram.


The original definition of the Viterbi algorithm tries to maximize the joint probability measure

$$ P(\mathbf{x}, \mathbf{y}) = P(\mathbf{x})\, P(\mathbf{y} \mid \mathbf{x}) \tag{5.13} $$

$$ = \prod_{k=0}^{K-1} P(x_{k+1} \mid x_k) \prod_{k=0}^{K-1} P(y_k \mid \underbrace{x_{k+1}, x_k}_{\xi_k}), \tag{5.14} $$

where $\mathbf{x}$ is the state vector and $\mathbf{y}$ is the observed vector of length $K$, and where we use the convenient definition of the transition $\xi_k$ at time $k$ as the pair of states $(x_{k+1}, x_k)$. Usually, $K$ is called the constraint length of a decoder. If we now assign a distance measure to each branch of the trellis diagram, i.e., to each state transition $\xi_k$, according to

$$ \lambda(\xi_k) \triangleq \ln P(x_{k+1} \mid x_k) + \ln P(y_k \mid \xi_k), \tag{5.15} $$

we can rewrite (5.14) as

$$ \ln P(\mathbf{x}, \mathbf{y}) = \sum_{k=0}^{K-1} \lambda(\xi_k), \tag{5.16} $$

which has to be maximized to obtain the sequence most probably generated by the encoding process, given our observation.

Viterbi algorithm for an AWGN channel

We consider in the following a coded transmission system where the channel can be modeled as an AWGN channel. Hence, using the definition (2.12) of the probability density function of a Gaussian process, we may now define the actual path metrics as

$$ \mu \propto p(y \mid x) \propto e^{-\frac{(y - x)^2}{2\sigma^2}}, \tag{5.17} $$

$$ \lambda = -(y - x)^2 \propto \ln \mu, \tag{5.18} $$

where $\lambda$ is the squared Euclidean distance measure between the sent and the received symbol, and $\mu$ is proportional to the conditional probability $p(y \mid x)$, which is simply a shifted Gaussian distribution.


Figure 5.42: A simple butterfly trellis section used for defining the update rules of the Viterbi algorithm. Part a) shows the notation of the max-sum formulation and part b) shows the sum-product case.

According to (5.16), the path length expression may be written in different forms:

$$ \ln \prod_{k=0}^{K-1} \mu_k = \sum_{k=0}^{K-1} \ln \mu_k \propto \sum_{k=0}^{K-1} \lambda_k. \tag{5.19} $$

Since we are heading for the global maximum of the path length, the maximum also has to be satisfied locally. So we can finally formulate the well-known iterative max-sum Viterbi algorithm update rule for encoders with a single input bit:

$$ \Gamma_{i,k+1} = \max\!\left[ \lambda_{ii,k} + \Gamma_{i,k}, \; \lambda_{ji,k} + \Gamma_{j,k} \right], \quad 0 \le i, j \le K - 1, \tag{5.20} $$

for the update from time instance $k$ to time instance $k+1$ (see Fig. 5.42a)). The variables $\Gamma$ are the accumulated path metrics for each state of the trellis diagram. The result of the selection operation (max) at each state is stored in a memory as 1 bit of digital information, hence the name storage-survivor memory. This local decision information is used later on for tracing back the most probable (or shortest) path.

Instead of maximizing $\sum_k \lambda_k$, we could equally well maximize $\ln \prod_k \mu_k$ of the original Viterbi algorithm formulation (5.14). This formulation of the Viterbi algorithm is also known as the max-product formulation. It has been used for a long time to approximate the sum-product algorithm because of its much lower computational complexity.


Instead of approximating the sum-product algorithm with the max-product calculus, we could equally well do the opposite. With some slight modifications of the max-product update rules, we can state the iterative update rule of the sum-product formulation of the Viterbi algorithm:

$$ \Theta_{i,k+1} = \mu_{ii,k} \cdot \Theta_{i,k} + \mu_{ji,k} \cdot \Theta_{j,k}, \quad 0 \le i, j \le K - 1, \tag{5.21} $$

with the notation according to Fig. 5.42b). Again, the variables $\Theta$ are the accumulated path metrics for each state of the trellis diagram.

If we compare the original max-product formulation of the Viterbi algorithm with the sum-product algorithm approximation, we observe strict equivalence if $\max(\mu_1, \mu_2) = \mu_1 + \mu_2$. So let us briefly analyze the following expression, assuming an AWGN channel and $\mu_1 > \mu_2$, or equivalently $a_1 < a_2$, where the parameters $a_1$ and $a_2$ were introduced for mathematical simplicity only:

$$ \frac{\mu_1 + \mu_2}{\mu_1} = 1 + \frac{\mu_2}{\mu_1} = 1 + \underbrace{e^{-\frac{a_2^2 - a_1^2}{2\sigma^2}}}_{\lim_{\sigma \to 0}(\cdot)\, =\, 0} \approx 1 = \frac{\max(\mu_1, \mu_2)}{\mu_1}. \tag{5.22} $$

The parameter $\sigma$ is a free tuning factor and has the meaning of a signal-to-noise ratio (SNR). But do not confuse this parameter $\sigma$ with the SNR of the AWGN channel. We observe that by diminishing $\sigma$ towards zero, the path metrics are pushed further and further apart from each other. In the limit $\sigma = 0$ we get exact equivalence with the maximization function. Hence, the sum-product formulation of the Viterbi algorithm has an independent tuning parameter, which allows it to produce results equivalent to those of the traditional max-sum formulation (or min-sum formulation if the path metrics are defined by $-\ln \mu$).
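The following sketch (again our own Python illustration with hypothetical distance values) contrasts the sum-product update (5.21) with the behaviour predicted by (5.22): as sigma is reduced, the normalized sum of two Gaussian branch metrics approaches the maximum.

    import math

    def sps_step(theta, incoming):
        """One sum-product update as in (5.21), with renormalization
        mimicking the bounded sum current of the analog implementation."""
        new_theta = [sum(mu * theta[prev] for prev, mu in branches)
                     for branches in incoming]
        total = sum(new_theta)
        return [t / total for t in new_theta]

    # Numerical check of (5.22) with hypothetical distances a1 < a2:
    a1, a2 = 0.3, 1.1
    for sigma in (1.0, 0.3, 0.1):
        mu1, mu2 = (math.exp(-a * a / (2 * sigma**2)) for a in (a1, a2))
        print(sigma, (mu1 + mu2) / mu1)   # tends towards 1 = max(mu1, mu2)/mu1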

Simulation results like the one in Fig. 5.43 for a rate $R = 1/2$ [171,133] convolutional code confirm this theoretical equivalence. If we choose a fixed $\sigma < 0.1$, corresponding to $\mathrm{SNR}_{decoder} > 20$ dB, the coding loss becomes negligible. Conversely, one may run into numerical problems if one chooses $\sigma$ too small. In a nutshell: no adaptation of the parameter $\sigma$ is necessary in order to get good decoding results with the sum-product formulation.


Figure 5.43: Comparison of simulation results of a Viterbi decoder for a rate $R = 1/2$ [171,133] convolutional code using a traditional ACS unit and a Viterbi decoder using the equivalent SPS unit. No difference is visible between the min-sum Viterbi decoder and the probability-based Viterbi decoder if $\sigma < 0.1$ ($\mathrm{SNR}_{decoder} > 20$ dB). (Axes: BER versus SNR in dB; curves for the min-sum version and for the probabilistic version at optimal, 5 dB, 10 dB, and 20 dB $\mathrm{SNR}_{decoder}$.)


Viterbi decoder using sum-product-select modules

In the following subsection we will postulate a new Viterbi decoder architecture that uses our probability propagation modules to do the path-metric calculations. Depending on the definition of the branch metrics, the ACS unit of a traditional Viterbi decoder performs an operation known as the min-sum calculus (or the max-sum calculus) in the context of iterative decoding [149]. But as we have seen before, we may change the definition of the branch metrics from the squared Euclidean distance measure to a conditional probability measure. By doing this, the min-sum calculus can easily be transformed into the sum-product calculus.

Following the notation in [37], Fig. 5.44 shows the simplified block diagrams of both a traditional Viterbi decoder and the newly proposed probabilistic Viterbi decoder. The branch-metric computation unit (BMC) computes the appropriate branch metrics with respect to the received symbols. These metrics are passed either to the ACS or to the sum-product-select (SPS) unit. The processed output (1-bit digital information) is then passed to the storage-survivor memory (SSM). This memory keeps track of the decisions made by all ACS units or SPS units, respectively, and traces back to find the most probably sent sequence of information bits. The SSM is strictly the same circuit for both versions of the analog Viterbi decoder.
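Since the SSM is identical in both architectures, a software view of its trace-back step may help to fix ideas. This is a sketch under our own data conventions, not the thesis implementation: decisions[k][s] is the stored survivor bit of state s at step k, and predecessors[s] lists the possible previous states of state s (ordered consistently with the stored bits).

    def trace_back(decisions, predecessors, final_state):
        """Follow the stored 1-bit decisions backwards through the
        trellis to recover the most probable state sequence."""
        state, path = final_state, [final_state]
        for step in reversed(decisions):
            state = predecessors[state][step[state]]
            path.append(state)
        return path[::-1]   # the information bits follow from this sequence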

To date, most analog Viterbi decoder implementations are based on voltage-mode circuits such as switched-capacitor circuits. To our knowledge, current-mode implementations of Viterbi decoders have only just started to emerge. A first complete current-mode implementation using switched-current (SI) memory cells in the ACS loop and state-vector renormalization was presented by Demosthenous and Taylor [37]. However, the decoder can be simplified and thus sped up in many ways by applying the sum-product calculus as in our approach.

5.4.2 Proposed Implementation

A probabilistic Viterbi decoder core for a rate $R = 1/2$ convolutional code with constraint length $K$, using SPS units instead of the traditional ACS units, is shown in Fig. 5.45.


Figure 5.44: Simplified block diagram of a traditional analog Viterbi decoder (top) and a probabilistic analog Viterbi decoder with building blocks of Fig. 3.13 in the SPS unit (bottom). (Both consist of the chain from received channel information through BMC and ACS or SPS to the SSM and the decoded information, with a feedback loop around the ACS or SPS unit.)

As opposed to the ACS circuit implementation of [37], this circuit implementation is inherently immune to branch-metric-representation overflows or underflows, since the branch-metric signal currents represent probability distributions whose sum-current vector is bounded. A renormalization of small currents is also inherent to the circuit topology of the basic sum-product building block, as presented in Chapter 4.

At the core of the SPS unit we find the branch-metric multiplier (Q11 to Q14), consisting of only one bipolar transistor (or one MOS transistor in weak inversion) per element-wise multiplication of two discrete probability distributions. The dashed transistors Q13 and Q14 are present due to symmetry considerations and do not contribute to the output signal, as described in Chapter 4. A small overhead is added by the extra transistors at the input performing the logarithmic compression of the input probability distribution. For practical binary cases with a constraint length $K \ge 7$ and rate $R = 1/2$, the overhead of four input transistors is negligible compared to the $2^{K+1}$ transistors needed for the multiplying part. The bases of Q11 to Q14 are connected (shown as dashed lines in Fig. 5.45) to one of the logarithmically compressed input metrics according to the edge label of the trellis diagram. Before the two corresponding paths of the trellis diagram are added, the product currents are copied twice by the pMOS transistors M10 to M24.


Figure 5.45: Circuit implementation of a $2^{K-1}$-state SPS unit for binary symbols, including its feedback loop.


One copy is used for the current summation, and one copy is fed into the current comparator consisting of transistors M1 to M4. The result of this comparison is then passed to the SSM unit by a digital inverter, where it is stored and used for the survivor-path trace-back process. The sum current of each state is stored in a ping-pong SI memory cell to form the feedback loop of the SPS unit (drawn as a black box in Fig. 5.45). Transistors Q15 and Q16 – together with the remaining $2^{K-1} - 2$ sections for the other states – form a renormalization circuit. The total sum current of all states is thus defined by $I_{bias}$ and affects the speed of the whole decoder.

Although a BiCMOS process is used for the probabilistic analog Viterbi decoder, the architecture is very attractive for high- and highest-speed implementations of such decoders, since the circuits have fewer transistors in the data path than other analog Viterbi decoder implementations. The speed can be adapted over a wide range by changing the bias current. On the other hand, an ordinary CMOS process can be used to build an ultra-low-power analog Viterbi decoder with transistors operating in weak inversion. In this case, all bipolar NPN transistors are replaced by nMOS transistors.

For the design of the circuit, we have to consider the same precision requirements as in the case of a digital implementation. A resolution of 5 to 7 bits on the log-likelihood level, i.e., for the squared Euclidean distances, is generally sufficient for digital circuit implementations. A coarser quantization of the data induces noticeable losses in the coding gain. With the precision discussion of Section 4.4.1 in mind, a design fulfilling these specifications is feasible even with relatively small transistor sizes. However, systematic errors should be avoided by a careful layout of the circuit chip. This point is all the more important since we reuse the same computation section all the time.

5.5 High-Level Study of Plain CMOS Implementations

Today's semiconductor technology is driven by digital applications in CMOS. Hence, BiCMOS technology lags at least one and a half generations behind the leading CMOS technologies. For heavily parallel analog circuit applications, the minimal feature size, which is the key parameter for digital applications, is not always the most important parameter. Wiring density in all three geometrical dimensions is equally or even more important for complex connection patterns. A second argument for using CMOS technology instead of BiCMOS technology is its lower production cost. For economic reasons, everything that can be done in CMOS will be implemented in this technology.

For all the practical and economic reasons mentioned above, it is advantageous to implement the circuits of the considered analog probability propagation networks in CMOS technology. Weak-inversion operation of the MOS transistors enables the direct implementation of the circuits, but by nature these circuits are relatively slow due to the low current densities in the transistors. Raising the current level does not help, since widening the transistors to maintain weak-inversion operation at a higher current level increases the parasitic gate-source capacitances proportionally, and the speed measure $g_m/C$ remains at almost the same level. However, with the transistor sizes of today's advanced CMOS technologies rapidly shrinking towards the sub-0.1 µm range, the transistors operate more and more in moderate or even weak inversion, even for high-speed operation. We will briefly support this claim after introducing a very simple MOS model which is continuous from weak to strong inversion.

In a second subsection we will introduce the concept of different factor-graph representations and thus different decoder architectures for the same underlying code. By doing this we obtain code descriptions with redundant equations. By explicitly constructing code descriptions with over-constrained equation sets, we get better decoding performance if we use quadratic-law MOS transistors in our probability calculation modules. Part of the work presented in this subsection was done in two diploma projects under our supervision: Moser [150] investigated the [8,4,4] Hamming code and its different realizations, and Fromherz and Schinca [151] implemented the continuous MOS model in the high-level simulation tool.


5.5.1 Continuous CMOS Model from Weak to Strong Inversion

The operational region of exponential behaviour in weak inversion and that of quadratic behaviour in strong inversion of CMOS circuits are not separated abruptly; there is a smooth transition between these two regions of operation. Thus, it is interesting to see by how much the bit-error rates are degraded if the MOS transistors are operated in moderate or strong inversion. To simulate these effects, we set up an object-oriented high-level simulation environment. All transistors of the core multiplier matrix are in saturation and are modeled according to [109, 152]

$$ I_D = 2 n \beta U_T^2 \, \ln^2\!\left( 1 + \exp \frac{V_G - n V_S - V_{th}}{2 n U_T} \right), \tag{5.23} $$

where $\beta = \mu C_{ox} W/L$, $n$ is the slope factor, and all other symbols have their usual meaning (see also the List of Symbols at the end of this text). The finite output resistance of the MOS transistors and all other non-idealities were neglected. We will use this simple model in the high-level simulations of Section 5.5.3 to compare different decoder implementations.

Towards Weak Inversion with State-of-the-Art CMOS Processes

The degree of inversion of a single transistor can be determined by the so-called inversion coefficient IC, which was introduced by Vittoz [152]. It describes the ratio between the actual drain current and the specific current, which is the factor in front of the squared logarithm function of (5.23):

$$ IC = \frac{I_D}{2 n \beta U_T^2}. \tag{5.24} $$

For $IC \ll 1$ the transistors operate in weak inversion; generally $IC < 0.1$ is enough for a reasonable approximation of the exponential behaviour. By lowering the transistor length $L$ and leaving the transistor width $W$ and the drain current $I_D$ the same, we actually diminish the inversion coefficient proportionally.


At the same time, the parasitic gate capacitance $C_G$ is also proportionally reduced. Moreover, the frequency measure $g_m/C_G$ goes up by $1/L^2$, since $g_m$ is assumed directly proportional to $I_D$ in the weak-inversion region. If we want to achieve a given operation frequency and use a more advanced process with a smaller minimal feature size $L_{min}$, we will observe that the inversion coefficients of the transistors are reduced by a factor $1/L_{min}^3$. So just using a more sophisticated CMOS process rapidly pushes the operation of the transistors towards the weak-inversion region. Hence, advanced CMOS processes will allow the high-speed operation of our building blocks in weak inversion and thus without any approximations. Additionally, these CMOS processes generally provide many metal layers, which enables very complicated connection patterns on both a local and a global scale. Thus we may implement very complex systems on one single chip.

5.5.2 Redundant Equations and Code Realizations

In the following, we will introduce the concept of redundant parity-check equations by proposing different realizations of the [8,4,4] extended Hamming code. We will use these code descriptions afterwards for high-level simulations to compare the decoding performance in different operating regions of the MOS transistor.

The [8,4,4] extended Hamming code is characterized in its basic form by the four equations

$$ x_5 = u_1 \oplus u_2 \oplus u_3, \tag{5.25a} $$
$$ x_6 = u_1 \oplus u_2 \oplus u_4, \tag{5.25b} $$
$$ x_7 = u_1 \oplus u_3 \oplus u_4, \tag{5.25c} $$
$$ x_8 = u_2 \oplus u_3 \oplus u_4, \tag{5.25d} $$

where $\oplus$ denotes modulo-2 addition. These parity-check equations encode the dataword $\mathbf{u} = [u_1, u_2, u_3, u_4]$ onto the codeword $\mathbf{x} = [u_1, u_2, u_3, u_4, x_5, x_6, x_7, x_8]$. The code is a so-called systematic code because the data bits $u_1, \ldots, u_4$ appear directly in the codeword; $x_5, \ldots, x_8$ are called parity bits.
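For reference, the four equations translate directly into a small encoder function; this is a straightforward transcription of (5.25a) to (5.25d) in Python.

    def encode_hamming_8_4_4(u):
        """Systematic encoder for the extended [8,4,4] Hamming code,
        a direct transcription of (5.25a)-(5.25d); u = [u1, u2, u3, u4]."""
        u1, u2, u3, u4 = u
        x5 = u1 ^ u2 ^ u3
        x6 = u1 ^ u2 ^ u4
        x7 = u1 ^ u3 ^ u4
        x8 = u2 ^ u3 ^ u4
        return [u1, u2, u3, u4, x5, x6, x7, x8]

For example, encode_hamming_8_4_4([1, 0, 0, 0]) yields the codeword [1, 0, 0, 0, 1, 1, 1, 0].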


Figure 5.46: Factor graph of the extended [8,4,4] Hamming code in its basic form.

In Fig. 5.46, the basic realization of the [8,4,4] Hamming code is shown. Note that there exist other realizations, i.e., other factor graphs of the same code. They may lead to different decoding performance, as we will show in the following subsection. To illustrate these differences, two additional examples are given in Fig. 5.47 and Fig. 5.48. For the second realization, four hidden states $s_1$ to $s_4$ are introduced in a tail-biting manner:

$$ s_1 = u_3 \oplus u_4, \tag{5.26a} $$
$$ s_2 = u_4 \oplus u_1, \tag{5.26b} $$
$$ s_3 = u_1 \oplus u_2, \tag{5.26c} $$
$$ s_4 = u_2 \oplus u_3. \tag{5.26d} $$

Using these hidden states, the parity-check equations (5.25a) to (5.25d) become

$$ x_5 = u_3 \oplus s_3, \tag{5.27a} $$
$$ x_6 = u_2 \oplus s_2, \tag{5.27b} $$
$$ x_7 = u_1 \oplus s_1, \tag{5.27c} $$
$$ x_8 = u_4 \oplus s_4. \tag{5.27d} $$

The factor graph of Fig. 5.47 is a straightforward image of the above eight equations (5.26a) to (5.27d).


Figure 5.47: Realization 2 of the extended [8,4,4] Hamming code with four state variable nodes.


Figure 5.48: Realization 3 of the extended [8,4,4] Hamming code with four redundant parity checks.

Realization 3 has been obtained by adding four additional parity-check equations to the equation set (5.25a) to (5.25d) [150]. These redundant equations add complexity to the decoder, but they do not change the code itself. The codeword is still built using the original parity-check equations (5.25a) to (5.25d).
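In the decoder, each of these (redundant) check nodes computes soft-XOR messages. In the idealized exponential case, the sum-product message of a binary parity check reduces to the well-known closed form sketched below; the function name is ours, and the sketch only illustrates the message computation, not the thesis circuits.

    def soft_xor(p):
        """Probability that the modulo-2 sum of independent bits equals 1,
        given the probabilities p[i] that each of the other bits equals 1.
        This is the sum-product message an ideal soft-XOR check node
        sends towards the remaining variable."""
        prod = 1.0
        for pi in p:
            prod *= 1.0 - 2.0 * pi
        return (1.0 - prod) / 2.0

Realization 3 simply gives every variable node more such incoming messages: the redundancy costs extra soft-XOR modules but, as the simulations below show, improves the decoding behaviour of the quadratic-law implementation.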

5.5.3 High-Level Simulation Results

CMOS-only decoder for the extended Hamming code

A first set of simulations was run with the three realizations of the [8,4,4] Hamming code presented in the previous subsection [150]. The transistors were designed to operate either in weak or in strong inversion. The bits were encoded according to the parity-check equations (5.25a) to (5.25d) and then transmitted over a channel with additive white Gaussian noise (AWGN).


Figure 5.49: High-level simulations for the [8,4,4] Hamming code where the encoded bits are transmitted over an AWGN channel. Comparison between the exponential and the quadratic characteristic of the MOS transistors. (Axes: BER versus SNR in dB; curves for Realizations 1, 2, and 3, each with exponential and quadratic device characteristics.)

The results of Fig. 5.49 clearly indicate that different realizations of the same code behave differently during the decoding operation. Realization 3 with four redundant equations performs best, both in weak and strong inversion, but the other realizations also perform rather well in the quadratic (strong-inversion) region, with a coding loss of less than 1 dB compared to the ideal exponential characteristic. These results suggest that the use of redundant equations may even help to overcome the coding loss incurred when using MOS transistors in the quadratic (strong-inversion) region [151].

CMOS-only decoder for the tail-biting trellis code

A second experiment was conducted using the binary [18,9,5] tail-biting trellis code of Section 5.2. All the NPN BJTs were simply replaced by properly sized nMOS transistors.


Figure 5.50: High-level simulations for the binary [18,9,5] tail-biting trellis code where the encoded bits are transmitted over an AWGN channel. Comparison between the exponential and the quadratic characteristic of the MOS transistors. (Axes: BER versus SNR in dB.)

As shown in Fig. 5.50, the decoder of Fig. 5.13 operated in the quadratic region loses at most 0.5 dB compared to its ideal exponential implementation. However, note that no mismatch effects were included in this high-level simulation setup. It is expected that these mismatch effects introduce only a slight decoding loss, as was discussed in Section 4.4.1.

Comparable attempts have been made to force circuits originally designed for bipolar technology to operate in the quadratic (strong-inversion) region of plain CMOS technology. For example, Gilbert's array-normalizing circuit [131] has been used in a fuzzy-logic controller to scale current vectors [153]. Although the circuit no longer scales the quantities correctly, the order of precedence is preserved by the monotonically increasing transfer function, which is sufficient for building fuzzy-logic circuits.

5.6 Appendix — Schematics of the Tail-Biting Trellis Decoder

Figure 5.51: Circuit implementation of the type-B module: forward trellis.

Figure 5.52: Circuit implementation of the type-C module: backward trellis.

Figure 5.53: Circuit implementation of the type-D module: final summarization.


5.7 Appendix — Schematics of the Turbine Decoder

Figure 5.54: Circuit implementation of the fully bidirectional 2.5-port variable node. Denoted as 'Channel-Bit' in Fig. 5.28.


Figure 5.55: Circuit implementation of the fully bidirectional 2.5-port variable node with output bit slicer. Denoted as 'Info-/Channel-Bit' in Fig. 5.28. A special 'boxed' pMOS-transistor symbol is used for the cascoded transistors in the top right part of the circuit.



Figure 5.56: Circuit implementation of the fully bidirectional 3-port soft-XOR node (left part).

Figure 5.56: Circuit implementation of the fully bidirectional 3-port soft-XOR node (right part).


Figure 5.57: Circuit implementation of the fully bidirectional 4-port soft-XOR node (left part).

Figure 5.57: Circuit implementation of the fully bidirectional 4-port soft-XOR node (right part).


Chapter 6

Concluding Remarks

6.1 Summary of the Results

In this dissertation we have presented a technique for efficiently implementing the sum-product algorithm (or probability propagation algorithm) in analog VLSI technology. The described new type of analog computing network exhibits a natural match between probability theory and transistor physics. The elementary modules of which these networks are composed include probabilistic versions of all standard logic gates as well as more general non-binary sum-product modules. The obvious application of such networks is the decoding of error-correcting codes, as described in this dissertation. However, any factor graph in which all function nodes of degree larger than one are {0,1}-valued can be mapped onto such analog networks.

The transistor-level implementations of the building blocks are very simple current-mode vector multipliers and current-mode selective adders that process discrete probability distributions. The core circuits can be interpreted as translinear circuits or log-domain signal processors. Basically, one transistor is needed to build the pair-wise product of two elements of discrete probability distributions.

The presented networks follow a bio-inspired approach and therefore avoid many plagues of traditional analog circuit design, such as data-representation overflows, temperature dependence, linear approximations of non-linearities, component variations, and tedious manual design flows. The circuits exploit rather than fight the inherent non-linearity of the exponential characteristics of both bipolar junction transistors and weakly inverted MOS transistors. By building large, highly connected networks out of very simple and low-precision computation nodes, a high precision and a high processing throughput are reached at the system level. Due to their simplicity and computational efficiency, our analog networks exhibit a distinct advantage in the speed-power ratio compared to their digital counterparts. According to our (still limited) experience, this advantage amounts to at least two orders of magnitude.

We have also presented a design methodology that allows a direct mapping of the parity-check-matrix description of a given code to the factor-graph representation and further to the structural description of the decoding network. It is based on the standard digital design flow of chip-design environments such as the Cadence design tools. By automatically constructing the decoder circuits rather than doing a full-custom design by hand, we circumvent tedious manual verifications and fundamentally speed up the design process. This also opens the prospect of fabricating large first-time-right decoder systems.

The practical decoder examples presented in this dissertation showed a system-level behaviour that is remarkably robust against all sorts of non-idealities, even for the discrete transistor-device implementation. The measured results also showed close agreement with the transistor-level simulations. Furthermore, we verified many theoretical design aspects with these practical implementations. Finally, the high-level design studies are a rich source of ideas for future work on this subject.

Besides the decoding applications, which are the main subject of this dissertation, the probability propagation networks may be applied in various other related domains, such as the tracking of hidden Markov models, widely used for many pattern-recognition tasks, and inference on Bayesian networks, which appear in the context of artificial-intelligence problems.

6.2 Ideas for Further Work and Outlook

To round off this dissertation, we briefly describe a few open issues and ideas for future research. They mainly represent the author's thoughts on where future work might go.

Decoder for a full-size LDPC code. So far, only relatively small decoding networks have been implemented; they incorporate only some tens or hundreds of factor-graph nodes. However, for a good BER performance, the length of a code needs to be quite large. Generally, a code with a length of a few thousand bits per codeword will already be sufficient for a practical application. Hence, we need to construct larger decoding networks, which gives rise to many issues, such as the exacerbating simulation times and the testability of such networks. These problems have not yet been investigated in depth, but they will be of increasing importance for future designs.

Full CMOS implementation. The chip implementations described in this dissertation rely on the exponential characteristic of BJTs. However, it is economically and technically interesting to use standard CMOS processes. As we have seen in Section 5.5, advanced CMOS processes shift the operating region of the transistors more and more towards moderate or even weak inversion. This opens the new prospect of implementing high-speed analog decoders in CMOS technology only.

Use of redundant code equations. A second possibility for purely CMOS analog decoders is the use of redundant equations in the code description. As we have seen in the high-level simulation results, this direction is very promising, since we do not have to rely on the most advanced CMOS technologies.

Application of the probability-propagation calculus to other problems. In this dissertation we have only described decoding networks. However, we have seen that many problems can be described by factor graphs, which in turn can be directly converted into an analog probability-propagation network. It would be very interesting to apply the design technique to other application fields, such as artificial-intelligence problems that might appear in on-line fault-detection circuits of complex systems.

Adaptive filters. By changing the signal representation from a probability-based interpretation to a real-valued interpretation, the well-known equal gate and soft-XOR gate may be operated as real-valued adders and real-valued multipliers, respectively [154]. Hence, they represent the basic operations of discrete-time filters. By making the filter taps adaptive, we could easily build adaptive FIR and IIR filters. Adaptive FIR filters are commonly used for equalizing wire-line channels.

Joint channel equalizer/decoder. In the communications community, there exist several concepts for jointly equalizing a given channel and decoding the transmission code. But all of them work in the digital domain, and it thus makes little sense to do the decoding in the analog way while the remaining part of the receiver works digitally. So why not build most of the receiver front-end using our analog probability networks? For example, the decision-feedback equalizer (DFE) is a good candidate for an analog network implementation, since all the basic operations can be implemented using our generic building blocks. By doing so, we get one step closer to the antenna or the line interface of a data communication system without flipping back and forth too much between analog and digital.

All-analog receiver system. Our experience so far is that many individual blocks of a receiver system can be implemented in analog electronics. Despite the fact that many renowned researchers postulate software radio, i.e., a system that consists merely of an A/D converter as close as possible to the antenna and digital processors for the signal processing, we think that for certain demanding applications analog signal processing in an intelligent manner is the way to go. Our long-term aim is an all-analog receiver system, i.e., to have no digital signals before the decoder block, since the analog decoder performs an inherent A/D conversion. This would potentially provide the very efficient highest-speed and ultra-low-power communications systems needed by today's e-society.


Appendix A

Selected Circuit Structures

A.1 Transistor Terminals and Voltages

In order to facilitate the analysis of a given circuit and to prevent sign problems in circuits using complementary transistors, we define all voltages as positive values for the normal active region. In the case of a bipolar process, the voltages of the transistor terminals are defined with the rails as their reference, as shown in Fig. A.1. For MOS transistors, which are in principle symmetrical structures, we define the voltages with respect to the potential of the well or substrate. Hence, the devices are four-terminal devices, but often we draw only the three main terminals drain, gate, and source. In the case of an n-well technology, the substrate (which is also called bulk) is p-doped semiconductor material and thus is generally tied to the negative rail to prevent forward-biased pn-junctions. Conversely, the n-doped substrate of a p-well technology is generally connected to the positive supply rail. Since the terminal voltages of a MOS transistor are bulk-referenced, the functionality of a given terminal is defined only by its voltage level: the current-carrying terminal with the smaller voltage becomes the source of the transistor, whereas the other terminal becomes the drain.

For the equations describing the behaviour of the semiconductor devices, consult one of the excellent textbooks that exist today: for MOS transistors see [109, 155] and for BJTs [138, 156–158]. The references [109, 156–158] are very device-physics oriented, whereas the remaining books are more design oriented.


Figure A.1: The definition of the terminals and their corresponding voltages for both BJT (on the left side) and CMOS (on the right side).

A.2 Cascode Structures

The output resistance of a single saturated MOS transistor (or forward-biased transistor in the case of a BJT) is generally fairly limited. In the case of a MOS transistor, it is directly proportional to the transistor length L. In order to reduce the effect of a drain-current change due to drain-source-voltage variations, the length L has to be made fairly long for practical applications. Hence, the maximum operation frequency is diminished by the same amount. A simple solution to this problem is the so-called cascode structure. The many available cascode structures lead to 'super transistors' with improved output resistance. These composite transistors can be used as a replacement for any transistor of a given circuit. The symbol of such a super transistor, as we have used it for our purpose, is shown in Fig. A.2a). Many different forms of cascode circuits exist [108, 130, 135, 137, 138]. In this context we cite just two: the simple cascode structure is shown in Fig. A.2b) and the regulated cascode structure in Fig. A.2c). All the reasoning advanced here for MOS transistors is also valid for BJTs, with only slight differences in the calculations.


Figure A.2: Cascoded nMOS transistors: a) the symbol used for a cascoded nMOS transistor, b) a simple cascode structure, and c) a regulated cascode structure.

The simple cascode consists of only two transistors: the main transistor below and the stacked cascode transistor. The cascode transistor stabilizes the drain voltage of the main transistor. The output resistance of the main transistor is multiplied by the transconductance of the cascode transistor times the output resistance of that same transistor. Hence, while keeping fairly small transistor sizes, the output resistance of the composite transistor is considerably augmented. The transistors are designed such that they both operate in saturation. The biasing voltage $V_{G2}$ is chosen according to the maximum drain current and kept constant in time.
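As a rough figure of merit (a standard small-signal result quoted here for orientation; it is not derived in the text), the output resistance of the simple cascode of Fig. A.2b) is approximately

$$ r_{out} \approx (g_{m2}\, r_{o2})\, r_{o1}, $$

where $r_{o1}$ is the output resistance of the main transistor and $g_{m2}$, $r_{o2}$ are the transconductance and output resistance of the cascode transistor. The regulated cascode discussed next raises this product further by roughly the gain of its feedback amplifier.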

For high-precision circuits, the simple cascode structure may not raise the output resistance sufficiently. One solution to this problem consists of stacking more than one cascode stage. However, this also raises the minimum supply-voltage requirements drastically. A much more elegant solution to this problem was proposed by Säckinger et al. [135]. The so-called regulated cascode structure actively controls the drain voltage of the main transistor. By doing so, much higher output resistances can be achieved. However, the operational amplifier adds circuit overhead. The simplest implementation of this amplifier would consist of only one source-follower transistor, but generally more sophisticated implementations of the amplifier have to be chosen. Examples of such implementations can be found in [41] and the references therein.


A.3 Current Mirrors

Current mirrors are a key element of analog integrated circuits. They are used to duplicate currents or to fold or cascade parts of the circuit in order to reduce the supply-voltage requirements. All the structures described below can equally well be implemented with bipolar transistors; the dimensioning of the transistors is only slightly different. An overview of different current-mirror structures can be found in [108, 130, 137, 138].

The simplest structure of an nMOS current mirror is shown in Fig. A.3a). Both transistors have to be in saturation, and we rely on perfect matching for ideal operation. A brief analysis shows that the copy errors due to the finite output resistance of the mirror transistors are relatively large [130, 137, 138]. The output resistance is improved by making the transistors longer. Note that the current mirror can be operated in both strong and weak inversion.¹ However, a simple analysis shows that current mirrors operated in strong inversion match better than those operated in weak inversion [142, 152]. For a given $WL$, where $W$ is the width of the transistor, the matching of the transistors is best if the current mirror is designed to operate with a large $V_{GS}$, i.e., to force the transistors into deep strong inversion. Matching can be improved by augmenting the active transistor area $WL$. This reduces the relative effect of random fabrication errors.

The design of a current mirror generally starts by imposing a certain voltage swing at the nominal current. This leads to a $W/L$ in a given semiconductor technology. The minimum voltage between drain and source that still allows operation in the saturated region can be derived from the given gate-source voltage. Finally, the active transistor area $WL$ is adapted until the desired level of matching is achieved. Note, however, that the parasitic gate capacitance is augmented by the same amount. Hence, the increase of the parasitic capacitance will reduce the maximum operation speed. Generally, several parameters have to be traded off during the design process.

¹The degree of saturation and the degree of inversion are two orthogonal axes of the operational modes of MOS transistors. We can freely choose the one of the four quadrants that is best suited for the purpose of the transistor in the circuit.


Figure A.3: Different nMOS current mirrors: a) the simplest current mirror, b) a simple cascoded current mirror, and c) a low-voltage version of a cascoded current mirror.

Besides raising the transistor length to augment the output resistance of the mirror transistors, we could apply the cascode techniques described in Appendix A.2. This solution is shown in Fig. A.3b). The diode-connected transistors are always in saturation, so no special attention has to be paid to that part of the circuit. The transistors of the output section are designed such that they are saturated at the nominal current. Although very simple and self-biasing, a drawback of this circuit are the two stacked diode-connected transistors, which severely limit the minimal supply voltage for correct operation.

A solution for low-voltage operation is shown in Fig. A.3c) [136]. Compared to the simple cascoded current mirror, we save the threshold voltage of the cascode transistor. However, we need a special biasing circuit, which is fortunately very simple. The sizing of the biasing transistor, which is located on the left side of the low-voltage current-mirror schematic, is according to Fig. A.3c). Again, all transistors are dimensioned to operate in the saturation region.


List of Abbreviations

A/D       Analog-to-Digital
ACS       Add-Compare-Select; computation unit of Viterbi decoders
APP       a posteriori Probability
BCH       algebraic code type named after its inventors Bose, Chaudhuri, and Hocquenghem
BCJR      decoding algorithm named after its inventors Bahl, Cocke, Jelinek, and Raviv
BER       Bit Error Rate
BiCMOS    Bipolar and CMOS transistors in the same silicon process
BJT       Bipolar Junction Transistor
BMC       Branch Metric Computation; computation unit of the Viterbi decoder
BSC       Binary Symmetric Channel
CAD       Computer-Aided Drawing
CAE       Computer-Aided Engineering
CMOS      Complementary MOS
COB       Chip-On-Board; mounting technology for integrated circuits directly on a PCB
D/A       Digital-to-Analog
DFE       Decision-Feedback Equalizer
DMC       Discrete Memoryless Channel
DR        Dynamic Range
ESD       ElectroStatic Discharge
FBA       Forward-Backward Algorithm
FIR       Finite Impulse Response; see also IIR
HMM       Hidden Markov Model
I/V       current-to-voltage conversion
IC        Integrated Circuit
IIR       Infinite Impulse Response; see also FIR
JTAG      Joint Test Action Group; this group created the IEEE 1149.1 standard defining test access ports and boundary scans
LDPC      Low-Density Parity-Check
LDS       Linear Discrete-time System
LED       Light Emitting Diode
MAP       Maximum a posteriori Probability
MOS       Metal-Oxide-Semiconductor
MOSFET    Metal-Oxide-Semiconductor Field-Effect Transistor
P&R       Place-and-Route; tool in the digital chip-design flow
PCB       Printed Circuit Board
PSD       Power Spectral Density
QCRA      Quasi-Cyclic Repeat-Accumulate; LDPC code type
RA        Repeat-Accumulate; LDPC code type
SC        Switched-Capacitor
S/H       Sample-and-Hold
SI        Switched-Current
SNR       Signal-to-Noise Ratio
SPS       Sum-Product-Select; computation unit of a MAP Viterbi decoder using probability propagation modules; see also ACS
SSM       Storage-Survivor Memory; unit in Viterbi decoders
TA        Transconductance Amplifier
TL        Translinear Loop
TN        Translinear Network
V/I       voltage-to-current conversion
VI        voltage-current
VLSI      Very Large Scale Integration


List of Symbols

Symbols related to coding

Roman Symbols

u; u_i        Uncoded data vector; uncoded bit number i
û; û_i        Output of the decoder; decoded bit number i. Estimate of u, u_i
x; x_i        Coded data vector; coded bit number i
y; y_i        Received data vector; received bit number i
s(t)          Input of the physical channel at time t
r(t)          Output of the physical channel at time t
b             Bit
C             (i) Code; (ii) Channel capacity
d             Hamming distance
E_b           Energy of an information bit
E_c           Energy of a channel bit (chip)
F^k           k-dimensional vector space over the field F
g             Girth of the graph; smallest number of branches to form a closed loop in a factor graph
G             Generator matrix of a block code
GF(q)         Galois field with q elements
H             Parity-check matrix of a block code
K             Constraint length of a convolutional code
m             (i) Memory order of a convolutional encoder; (ii) size of alphabet of X or Y
n             Size of X or Y alphabet
N_0           One-sided noise power spectral density
p_X(x); p(x)  Probability of an actual realization x of the random variable X; short-hand notation of the same probability
p̃(x)          Approximate probability of x
Q(x)          Q-function of the Gaussian statistic
R             Code rate
T_s           Symbol duration; sampling interval


Greek Symbols

ε             Transition probability of the binary symmetric channel
λ             Squared Euclidean path-metric in the Viterbi algorithm
π(.)          Permutation; interleaver of a Turbo code
σ_n^2         Variance of white Gaussian noise
µ             Probabilistic path-metric in the Viterbi algorithm
µ_i(b)        Short-hand notation for p(y_i | x_i = b)
µ_{x→f}       Sum-product message from a variable node x to a function node f
µ_{f→x}       Sum-product message from a function node f to a variable node x
ρ             Sampled outputs of the matched-filter detector
ξ_k           State transition at time k

Symbols related to electronics

Roman Symbols

A             Active area
A_E           Active emitter area of a BJT
C; C_i        Capacitor; capacitor number i
C_ox          Unit oxide capacitance of a MOS transistor
f             Frequency
g_m           Transconductance of a transistor
i             (i) Small-signal current; (ii) index
I             Large-signal current
I_C           Collector current of a BJT
I_D           Drain current of a MOS transistor
I_0           Specific current of a MOS transistor
I_S           Specific current of a BJT
J_0           Specific current density of a MOS transistor
J_S           Specific current density of a BJT
I_ref         Reference current
IC            Inversion coefficient of the channel of a MOS transistor
L             Length of a transistor
L_min         Minimum feature size (minimum channel length) of a MOS transistor
M; M_i        MOS transistor; MOS transistor number i
n             (i) Slope factor of a MOS transistor; (ii) emission coefficient of a BJT; (iii) index
P             Power dissipation
Q; Q_i        Bipolar junction transistor; BJT number i
R; R_i        Resistor; resistor number i
s             Thickness
t             Time
T             Absolute temperature
v             Small-signal voltage
V             Large-signal voltage
V_BE          Base-emitter voltage of a BJT
V_D           Drain voltage of a MOS transistor
V_Dsat        Drain voltage to achieve saturation in the drain current
V_G           Gate voltage of a MOS transistor
V_S           Source voltage of a MOS transistor
V_th          Threshold voltage of a MOS transistor
W             Transistor width
U_T           Thermal voltage kT/q; 25.9 mV at 300 K

Greek Symbols

β             (i) Current gain of a BJT; (ii) transfer parameter of a MOS transistor
ε, ε_rel      Relative current error
λ             (i) Channel-length modulation factor; (ii) thermal conductivity coefficient
µ             Carrier mobility


Bibliography

[1] C. E. Shannon. “A mathematical theory of communication.” Bell System Technical Journal, vol. 27, pp. 379–423 (part I), 623–656 (part II), July 1948.

[2] C. Berrou, A. Glavieux, and P. Thitimajshima. “Near Shannon-limit error-correcting coding and decoding: turbo codes.” In Proceedings of the International Conference on Communications, pp. 1064–1070. Geneva, May 1993.

[3] S. Benedetto and G. Montorsi. “Unveiling turbo codes: some results on parallel concatenated coding schemes.” IEEE Transactions on Information Theory, vol. 42, pp. 409–428, March 1996.

[4] S. Benedetto and G. Montorsi. “Iterative decoding of serially concatenated convolutional codes.” Electronics Letters, vol. 32, pp. 1186–1188, June 1996.

[5] R. G. Gallager. Low-Density Parity-Check Codes. MIT Press, 1963.

[6] D. J. C. MacKay and R. M. Neal. “Good codes based on very sparse matrices.” In Cryptography and Coding. 5th IMA Conference (edited by C. Boyd), no. 1025 in Lecture Notes in Computer Science, pp. 100–111. Springer, 1995.

[7] R. M. Tanner. “Codes with sparse graphs: transform analysis and construction.” In Proceedings of the IEEE International Symposium on Information Theory, p. 116. Cambridge, MA, USA, 1998.

[8] M. G. Luby, M. Mitzenmacher, M. A. Shokrollahi, and D. A. Spielmann. “Improved low-density parity-check codes using irregular graphs and belief propagation.” Proceedings of the IEEE International Symposium on Information Theory, p. 117, Aug. 1998.

[9] G. M. Shepherd. The Synaptic Organization of the Brain. Oxford University Press, New York, 1974.

[10] G. M. Shepherd. Neurobiology. Oxford University Press, New York, 1983.

[11] C. A. Mead. Analog VLSI and Neural Systems. Addison Wesley Computation and Neural Systems Series. Addison Wesley, Reading, MA, 1989. ISBN 0-201-05992-4.


[12] X. Arréguit, F. A. van Schaik, F. V. Baudin, M. Bidiville, and E. Raeber. “A CMOS motion detector system for pointing devices.” IEEE Journal of Solid-State Circuits, vol. 31, no. 12, pp. 1916–1921, Dec. 1996.

[13] P. Masa, P. Heim, E. Franzi, X. Arréguit, F. Heitger, P. Ruedi, P. Nussbaum, P. Piiloud, and E. Vittoz. “10 mW CMOS retina and classifier for handheld, 1000 images/s optical character recognition system.” In Proceedings of the IEEE International Solid-State Circuits Conference, pp. 204–205. San Francisco, CA, Feb. 1999.

[14] A. Mortara, P. Heim, P. Masa, E. Franzi, P. F. Ruedi, F. Heitger, and J. Baxter. “An opto-electronic 18 b/revolution absolute angle and torque sensor for automotive steering applications.” In Proceedings of the IEEE International Solid-State Circuits Conference, pp. 182–183. San Francisco, CA, Feb. 2000.

[15] F. J. MacWilliams and N. J. A. Sloane. The Theory of Error-Correcting Codes, vol. 16. North-Holland, 1977. ISBN 0-444-85193-3.

[16] S. Lin and D. J. Costello, Jr. Error Control Coding: Fundamentals and Applications. Prentice-Hall Series in Computer Applications in Electrical Engineering. Prentice Hall, Englewood Cliffs, NJ, 1983.

[17] ITU-T. “Recommendation V.34 — a modem operating at data signalling rates of up to 33600 bit/s for use on the general switched telephone network and on leased point-to-point 2-wire telephone-type circuits.” Tech. rep., International Telecommunication Union, Geneva, February 1998. Available at http://www.itu.int/itudoc/itu-t/rec/v/v34.html.

[18] ITU-T. “Recommendation V.90 — a digital modem and analogue modem pair for use on the public switched telephone network (PSTN) at data signalling rates of up to 56000 bit/s downstream and up to 33600 bit/s upstream.” Tech. rep., International Telecommunication Union, Geneva, September 1998. Available at http://www.itu.int/itudoc/itu-t/rec/v/v90.html.

[19] W. Y. Chen. DSL: Simulation Techniques and Standards Development for Digital Subscriber Line Systems. Macmillan Technology Series. Macmillan Technical Publishing, Indianapolis, IN, 1998.

[20] T. Richardson, A. Shokrollahi, and R. Urbanke. “Design of provably good low-density parity check codes.” April 1999. Submitted to IEEE Transactions on Information Theory.

[21] C. Schlegel. Trellis Coding. IEEE Press, New York, 1997.


[22] D. J. C. MacKay. “Good error-correcting codes based on very sparse matrices.” IEEE Transactions on Information Theory, vol. 45, no. 2, pp. 399–431, March 1999.

[23] A. J. Viterbi. “Error bounds for convolutional codes and an asymptotically optimum decoding algorithm.” IEEE Transactions on Information Theory, vol. 13, pp. 260–269, April 1967.

[24] G. D. Forney, Jr. “The Viterbi algorithm.” Proceedings of the IEEE, vol. 61, no. 3, pp. 268–278, March 1973.

[25] S. Benedetto, D. Divsalar, G. Montorsi, and F. Pollara. “Serial concatenation of interleaved codes: Performance analysis, design, and iterative decoding.” IEEE Transactions on Information Theory, vol. 44, no. 3, pp. 909–926, May 1998.

[26] J. G. Proakis. Digital Communications. McGraw-Hill, third edn., 1995.

[27] R. E. Blahut. Theory and Practice of Error Control Codes. Addison Wesley, 1984.

[28] A. S. Acampora and R. P. Gilmore. “Analog Viterbi decoding for high speed digital satellite channels.” IEEE Transactions on Communications, vol. 26, no. 10, pp. 1463–1470, Oct. 1978.

[29] T. W. Matthews and R. R. Spencer. “An analog CMOS Viterbi detector for digital magnetic recording.” In Proceedings of the IEEE International Solid-State Circuits Conference, pp. 214–215. San Francisco, CA, 1993.

[30] M. H. Shakiba, D. A. Johns, and K. W. Martin. “Analog implementation of class-IV partial-response Viterbi detector.” In Proceedings of the IEEE International Symposium on Circuits and Systems, vol. 4, pp. 91–94. London, May 1994.

[31] M. H. Shakiba, D. A. Johns, and K. W. Martin. “A 200 MHz 3.3 V BiCMOS class-IV partial-response analog Viterbi decoder.” In Proceedings of the IEEE Custom Integrated Circuit Conference, pp. 567–570. Santa Clara, May 1995.

[32] M. H. Shakiba, D. A. Johns, and K. W. Martin. “An integrated 200-MHz 3.3-V BiCMOS class-IV partial-response analog Viterbi decoder.” IEEE Journal of Solid-State Circuits, vol. 33, no. 1, pp. 61–75, Jan. 1998.

[33] M. H. Shakiba, D. A. Johns, and K. W. Martin. “General approach to implementing analogue Viterbi decoders.” Electronics Letters, vol. 30, no. 22, pp. 1823–1824, Oct. 1994.

[34] M. H. Shakiba, D. A. Johns, and K. W. Martin. “BiCMOS circuits for analog Viterbi decoders.” IEEE Transactions on Circuits and Systems–II: Analog and Digital Signal Processing, vol. 45, no. 12, pp. 1527–1537, Dec. 1998.

[35] A. Demosthenous and J. Taylor. “Current-mode approaches to implementing hybrid analogue/digital Viterbi decoders.” In Proceedings of the International Conference on Electronics, Circuits and Systems, vol. 1, pp. 33–36. Rhodos, 1996.

[36] A. Demosthenous, C. Verdier, and J. Taylor. “A new architecture for low power analogue convolutional decoders.” In Proceedings of the IEEE International Symposium on Circuits and Systems, vol. 1, pp. 37–40. Hong Kong, 1997.

[37] A. Demosthenous and J. Taylor. “Low-power CMOS and BiCMOS circuits for analog convolutional decoders.” IEEE Transactions on Circuits and Systems–II: Analog and Digital Signal Processing, vol. 46, no. 8, pp. 1077–1080, Aug. 1999.

[38] K. He and G. Cauwenberghs. “An area-efficient analog VLSI architecture for state-parallel Viterbi decoding.” In Proceedings of the IEEE International Symposium on Circuits and Systems, vol. II, pp. 432–435. Orlando, Florida, May 1999.

[39] Z. Wang and S. B. Wicker. “An artificial neural net Viterbi decoder.” IEEE Transactions on Communications, vol. 44, pp. 165–171, Feb. 1996.

[40] C. Verdier, A. Demosthenous, J. Taylor, and M. Wilby. “An integrated analogue convolutional decoder based on the Hamming neural classifier.” In Proceedings of Neural Networks and Their Applications, pp. 150–155. 1996.

[41] M. Helfenstein. Analysis and Design of Switched-Current Networks. Ph.D. thesis, ETH Zürich, Konstanz, 1997.

[42] H. P. Schmid. Single-Amplifier Biquadratic MOSFET-C Filters. Ph.D. thesis, Swiss Federal Institute of Technology, Zurich, October 2000.

[43] G. S. Moschytz. MOS Switched-Capacitor Filters: Analysis and Design. IEEE Press, New York, 1984.

[44] G. C. Temes and R. Gregorian. Analog MOS Integrated Circuits for Signal Processing. John Wiley & Sons, New York, 1986.

[45] C. Toumazou, J. B. Hughes, and N. C. Battersby, editors. Switched-Currents: an analogue technique for digital technology. IEE/Peter Peregrinus Ltd., 1993. ISBN 0-86341-294-7.

[46] G. J. Minty. “A comment on the shortest-route problem.” Oper. Res., vol. 5, p. 724, 1957.

[47] L. Bu and T.-D. Chiueh. “Solving the shortest path problem using an analog network.” IEEE Transactions on Circuits and Systems–I: Fundamental Theory and Applications, vol. 46, no. 11, pp. 1360–1363, November 1999.

[48] R. C. Davis. “Diode-configured Viterbi algorithm error correcting decoder for convolutional codes.” U.S. Patent 4545054, Oct. 1985.

[49] R. C. Davis and H.-A. Loeliger. “A nonalgorithmic maximum likelihood decoder for trellis codes.” IEEE Transactions on Information Theory, vol. 39, pp. 1450–1453, July 1993.

[50] H. F. Schantz. “An overview of neural OCR networks.” Journal of Information Systems Management, vol. 8, no. 2, pp. 22–27, 1991.

[51] H. F. Schantz. “Neural network-OCR/ICR recognology. Theory and applications.” Document Image Automation, vol. 13, no. 3, pp. 20–23, 1993.

[52] D. Jacquet and G. Saucier. “Design of a digital neural chip: application to optical character recognition by neural network.” In Proceedings of the European Design and Test Conference EDAC-ETC-EUROASIC, pp. 256–260. 1994.

[53] J. Wang and J. Jean. “Segmentation of merged characters by neural networks and shortest path.” Pattern Recognition, vol. 27, no. 5, pp. 649–658, May 1994.

[54] D. A. Kelly. “Neural networks for handwriting recognition.” In Proceedings of the SPIE, vol. 1709, pp. 143–154. 1992.

[55] M. Schenkel, I. Guyon, and D. Henderson. “On-line cursive script recognition using time-delay neural networks and hidden Markov models.” Machine Vision and Applications, vol. 8, no. 4, pp. 215–223, 1995.

[56] R. Seiler, M. Schenkel, and F. Eggimann. “Off-line cursive handwriting recognition compared with on-line recognition.” In Proceedings of the 13th International Conference on Pattern Recognition, vol. 4, pp. 505–509. 1996.

[57] J. Rouat. “Spatio-temporal pattern recognition with neural networks: application to speech.” In Proceedings of Artificial Neural Networks - ICANN ’97, pp. 43–48. 1997.

[58] G. K. Venayagamoorthy, V. Moonasar, and K. Sandrasegaran. “Voice recognition using neural networks.” In Proceedings of the 1998 South African Symposium on Communications and Signal Processing - COMSIG ’98, pp. 29–32. Rondebosch, South Africa, Sept. 1998.

[59] K. M. Olson and G. A. Ybarra. “Performance comparison of neural network and statistical pattern recognition approaches to automatic target recognition of ground vehicles using SAR imagery.” Proceedings of the SPIE, vol. 3161, pp. 159–170, 1997.

[60] S. B. Cho. “Pattern recognition with neural networks combined by genetic algorithm.” Fuzzy Sets and Systems, vol. 103, no. 2, pp. 339–347, April 1999.

[61] N. Wiberg. “Approaches to neural-network decoding of error-correcting codes.” Linköping Studies in Science and Technology, Thesis No. 425, 1994.

[62] Y.-J. Wu, P. M. Chau, and R. Hecht-Nielsen. “A supervised learning neural-network coprocessor for soft-decision maximum-likelihood decoding.” IEEE Transactions on Neural Networks, vol. 6, pp. 986–992, July 1995.

[63] S. H. Bang and B. J. Sheu. “A neural network for detection of signals in communication.” IEEE Transactions on Circuits and Systems–I: Fundamental Theory and Applications, vol. 43, pp. 644–655, Aug. 1996.

[64] L. A. Zadeh. “Fuzzy sets.” Information and Control, vol. 8, pp. 328–353, 1965.

[65] M. Wu, W.-P. Zhu, and S. Nakamura. “A hybrid fuzzy neural decoder for convolutional codes.” In Proceedings of the IEEE International Symposium on Circuits and Systems, vol. 3, pp. 235–238. IEEE, Monterey, CA, June 1998.

[66] N. Wiberg, H.-A. Loeliger, and R. Koetter. “Codes and iterative decoding on general graphs.” European Transactions on Telecommunications, vol. 6, pp. 513–525, Sept./Oct. 1995.

[67] N. Wiberg. Codes and Decoding on General Graphs. Ph.D. thesis, Univ. Linköping, Sweden, 1996.

[68] J. Hagenauer and M. Winkelhofer. “The analog decoder.” In Proceedings of the IEEE International Symposium on Information Theory, p. 145. Cambridge, MA, USA, Aug. 1998.

[69] J. Hagenauer, E. Offer, C. Méasson, and M. Moerz. “Decoding and equalization with analog non-linear networks.” European Transactions on Telecommunications, vol. 10, no. 6, pp. 659–680, Nov./Dec. 1999.

[70] M. Moerz, T. Gabara, R. Yan, and J. Hagenauer. “An analog 0.25 µm BiCMOS tailbiting MAP decoder.” In Proceedings of the IEEE International Solid-State Circuits Conference, pp. 356–357. San Francisco, CA, Feb. 2000.

[71] H.-A. Loeliger, M. Helfenstein, F. Lustenberger, and F. Tarköy. “Probability propagation and decoding in analog VLSI.” In Proceedings of the IEEE International Symposium on Information Theory, p. 146. Cambridge, MA, Aug. 1998.

[72] M. Helfenstein, H.-A. Loeliger, F. Lustenberger, and F. Tarköy. “Verfahren und Schaltung zur Signalverarbeitung, insbesondere zur Berechnung einer Wahrscheinlichkeitsfunktion.” Swiss Patent Application no. 1998 0375/98, Feb. 1998. Filed Feb. 17, 1998.

[73] H.-A. Loeliger, F. Lustenberger, F. Tarköy, and M. Helfenstein. “Decoding in analog VLSI.” IEEE Communications Magazine, vol. 37, no. 4, pp. 99–101, April 1999.

[74] F. Lustenberger, M. Helfenstein, H.-A. Loeliger, F. Tarköy, and G. S. Moschytz. “All-analog decoder for a binary (18,9,5) tail-biting trellis code.” In Proceedings of the European Solid-State Circuits Conference, pp. 362–365. Duisburg, Sep. 1999.

[75] F. Lustenberger, M. Helfenstein, H.-A. Loeliger, F. Tarköy, and G. S. Moschytz. “An analog decoding technique for digital codes.” In Proceedings of the IEEE International Symposium on Circuits and Systems, vol. II, pp. 428–431. Orlando, FL, June 1999.

[76] M. Helfenstein, F. Lustenberger, H.-A. Loeliger, F. Tarköy, and G. S. Moschytz. “High-speed interfaces for analog, iterative decoders.” In Proceedings of the IEEE International Symposium on Circuits and Systems, vol. II, pp. 424–427. Orlando, FL, June 1999.

[77] H.-A. Loeliger, F. Lustenberger, M. Helfenstein, and F. Tarköy. “Probability propagation and decoding in analog VLSI.” Sept. 2000. Accepted for publication in IEEE Transactions on Information Theory, available at http://www.isi.ee.ethz.ch/~lustenbe/papers/IT_2000.pdf.

[78] H.-A. Loeliger, F. Lustenberger, M. Helfenstein, and F. Tarköy. “Analog probability propagation networks — Part I: Fundamentals.” In preparation.

[79] F. Lustenberger, H.-A. Loeliger, M. Helfenstein, and F. Tarköy. “Analog probability propagation networks — Part II: Decoder examples.” In preparation.

[80] C. A. Mead. “Neuromorphic electronic systems.” Proceedings of the IEEE, vol. 78, pp. 1629–1636, Oct. 1990.

[81] J. Lazzaro, S. Ryckebusch, M. A. Mahowald, and C. A. Mead. “Winner-take-all networks of O(n) complexity.” In Advances in Neural Information Processing Systems 1 (edited by D. Touretzky), pp. 703–711. Morgan Kaufmann Publishers, San Mateo, CA, 1988.


[82] J. P. Lazzaro. “Low-power silicon spiking neurons and axons.” In Proceedings of the IEEE International Symposium on Circuits and Systems, vol. 5, pp. 2220–2223. San Diego, CA, 1992.

[83] J. P. Lazzaro, J. Wawrzynek, and R. P. Lippmann. “A micropower analog circuit implementation of hidden Markov model state decoding.” IEEE Journal of Solid-State Circuits, vol. 32, no. 8, pp. 1200–1209, Aug. 1997.

[84] R. Sarpeshkar, L. Watts, and C. A. Mead. “Refractory neuron circuits.” Computation and Neural Systems Memo CNS TR-92-08, California Institute of Technology, Pasadena, CA, 1992.

[85] Y. Arima, M. Murasaki, T. Yamada, A. Maeda, and H. Shinohara. “A refreshable analog VLSI neural network chip with 400 neurons and 40K synapses.” IEEE Journal of Solid-State Circuits, vol. 27, no. 12, pp. 1854–1861, Dec. 1992.

[86] A. F. Murray, L. Tarassenko, H. M. Reekie, A. Hamilton, M. Brownlow, S. Churcher, and D. J. Baxter. “Pulsed silicon neural networks: Following the biological leader.” In VLSI Design of Neural Networks (edited by U. Ramacher and U. Rückert), pp. 103–123. Kluwer Academic Publishers, 1991.

[87] R. F. Lyon and C. A. Mead. “An analog electronic cochlea.” IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 36, no. 7, pp. 1119–1134, July 1988.

[88] J. Lazzaro and C. A. Mead. “Circuit models of sensory transduction in the cochlea.” In Analog VLSI Implementation of Neural Systems (edited by C. A. Mead and M. Ismail), pp. 85–101. Kluwer Academic Publishers, 1989.

[89] L. Watts, D. A. Kerns, R. F. Lyon, and C. A. Mead. “Improved implementation of the silicon cochlea.” IEEE Journal of Solid-State Circuits, vol. 27, no. 5, pp. 692–700, May 1992.

[90] F. Lustenberger. Cochlée artificielle en silicium. Semester project, École Polytechnique Fédérale de Lausanne, Lausanne, 1994.

[91] A. van Schaik, E. Fragnière, and E. Vittoz. “Improved silicon cochlea using compatible lateral bipolar transistors.” In Advances in Neural Information Processing Systems (edited by D. Touretzky), pp. 671–677. MIT Press, Cambridge, MA, 1996.

[92] C. A. Mead. “Adaptive retina.” In Analog VLSI Implementation of Neural Systems (edited by C. A. Mead and M. Ismail), pp. 239–246. Kluwer Academic Publishers, 1989.

[93] M. A. Mahowald. “Silicon retina with adaptive photodetectors.” In Proceedings SPIE, Visual Information Processing: From Neurons to Chips, vol. 1473, pp. 52–58. 1991.


[94] M. A. Mahowald. “Analog VLSI chip for stereocorrespondence.” In Proceedings of the IEEE International Symposium on Circuits and Systems, vol. 6, pp. 347–350. London, 1994.

[95] W. Bair and C. Koch. “Real-time motion detection using an analog VLSI zero-crossing chip.” In Proceedings SPIE, Visual Information Processing: From Neurons to Chips, vol. 1473, pp. 59–65. 1991.

[96] A. Papoulis. Probability, Random Variables and Stochastic Processes. McGraw-Hill, third edn., 1991.

[97] F. R. Kschischang, B. J. Frey, and H.-A. Loeliger. “Factor graphs and the sum-product algorithm.” June 2000. Submitted and revised for publication in IEEE Trans. Inform. Theory. Available at http://www.comm.utoronto.ca/frank/factor.

[98] R. M. Tanner. “A recursive approach to low complexity codes.” IEEE Transactions on Information Theory, vol. 27, no. 5, pp. 533–547, Sept. 1981.

[99] G. D. Forney, Jr. “Codes on graphs: Generalized state realizations.” November 1998. Draft.

[100] G. D. Forney, Jr. “Codes on graphs: Normal realizations.” In Proceedings of the IEEE International Symposium on Information Theory, p. 9. June 2000.

[101] R. L. Graham, D. E. Knuth, and O. Patashnik. Concrete Mathematics. Addison Wesley, New York, NY, 1989.

[102] F. R. Kschischang, B. J. Frey, and H.-A. Loeliger. “Factor graphs and the sum-product algorithm.” July 1998. Private communication.

[103] L. R. Bahl, J. Cocke, F. Jelinek, and J. Raviv. “Optimal decoding of linear codes for minimizing symbol error rate.” IEEE Transactions on Information Theory, vol. 20, pp. 284–287, March 1974.

[104] National Semiconductors. “An application guide for Op Amps. Application Note 20.” In Linear Applications Handbook, pp. 19–30. 1994.

[105] B. Gilbert. “Translinear circuits: A proposed classification.” Electronics Letters, vol. 11, no. 1, pp. 14–16, January 1975.

[106] E. Seevinck. Analysis and Synthesis of Translinear Integrated Circuits, vol. 31 of Studies in Electrical and Electronic Engineering. Elsevier, Amsterdam, first edn., 1988. ISBN 0-444-42888-7.

[107] B. Gilbert. “Translinear circuits: An historical overview.” Analog Integrated Circuits and Signal Processing, vol. 9, no. 2, pp. 95–118, March 1996. Special Issue: Translinear Circuits.


[108] A. B. Grebene. Bipolar and MOS Analog Integrated Circuit Design. John Wiley & Sons, 1984. ISBN 0-471-08529-4.

[109] Y. Tsividis. Operation and Modelling of the MOS Transistor. McGraw-Hill, second edn., 1999. ISBN 0-07-116791-9.

[110] T. Serrano-Gotarredona, B. Linares-Barranco, and A. G. Andreou. “A general translinear principle for subthreshold MOS transistors.” IEEE Transactions on Circuits and Systems–I: Fundamental Theory and Applications, vol. 46, no. 5, pp. 607–616, May 1999.

[111] K. Bult. Analog CMOS square-law circuits. Ph.D. thesis, Twente University of Technology, 1988.

[112] E. Seevinck and R. J. Wiegerink. “Generalized translinear circuit principle.” IEEE Journal of Solid-State Circuits, vol. 26, no. 8, pp. 1098–1102, Aug. 1991.

[113] R. J. Wiegerink. Analysis and synthesis of MOS translinear circuits. Ph.D. thesis, Twente University of Technology, 1992.

[114] R. W. Adams. “Filtering in the log-domain.” Preprint 1470, presented at the 63rd Audio Engineering Society Convention, May 1979.

[115] E. Seevinck. “Companding current-mode integrator: a new circuit principle for continuous-time monolithic filters.” Electronics Letters, vol. 26, pp. 2046–2047, Nov. 1990.

[116] D. Frey. “Log domain filtering: an approach to current-mode filtering.” IEE Proceedings, Part G, vol. 140, pp. 406–416, Dec. 1993.

[117] D. Perry and G. W. Roberts. “The design of log-domain filters based on the operational simulation of LC ladders.” IEEE Transactions on Circuits and Systems–II: Analog and Digital Signal Processing, vol. 43, no. 11, pp. 763–774, Nov. 1996.

[118] Y. Tsividis. “Externally linear, time-invariant systems and their application to companding signal processors.” IEEE Transactions on Circuits and Systems–II: Analog and Digital Signal Processing, vol. 44, no. 2, pp. 65–85, Feb. 1997.

[119] C. Toumazou, J. Ngarmnil, and T. S. Lande. “Micropower log domain filter for electronic cochlea.” Electronics Letters, vol. 30, pp. 1839–1841, Oct. 1994.

[120] C. Enz and Y. Cheng. “MOS transistor modeling issues for RF circuit design.” 1999. Workshop on Advances in Analog Circuit Design (AACD’99).


[121] B. Gilbert. “A precise four-quadrant multiplier with subnanosecond response.” IEEE Journal of Solid-State Circuits, vol. 3, pp. 365–373, 1968.

[122] K. Kimura. “Some circuit design techniques using two cross-coupled pairs.” IEEE Transactions on Circuits and Systems–I: Fundamental Theory and Applications, vol. 41, no. 5, pp. 411–423, May 1994.

[123] C. F. Chan, H. Ling, and O. Choy. “A one volt four-quadrant analog current mode multiplier cell.” IEEE Journal of Solid-State Circuits, vol. 30, no. 9, pp. 1018–1019, Sept. 1995.

[124] G. Colli and F. Montecchi. “Low voltage low power CMOS four-quadrant analog multiplier for neural network applications.” In Proceedings of the IEEE International Symposium on Circuits and Systems, vol. 1, pp. 496–499. 1996.

[125] W. Gai, H. Chen, and E. Seevinck. “Quadratic-translinear CMOS multiplier-divider circuit.” Electronics Letters, vol. 33, no. 10, pp. 860–861, May 1997.

[126] R. J. Wiegerink. “A CMOS four-quadrant analog current multiplier.” In Proceedings of the IEEE International Symposium on Circuits and Systems, vol. 4, pp. 2244–2247. 1991.

[127] K. Kimura. “A bipolar low-voltage quarter-square multiplier with a resistive-input based on the bias offset technique.” IEEE Journal of Solid-State Circuits, vol. 32, no. 2, pp. 258–266, Feb. 1997.

[128] H. R. Mehrvarz and C. Y. Kwok. “A novel multi-input floating-gate MOS four-quadrant analog multiplier.” IEEE Journal of Solid-State Circuits, vol. 31, no. 8, pp. 1123–1131, Aug. 1996.

[129] J. Ramirez-Angulo. “±0.75 V BiCMOS four-quadrant analog multiplier with rail-rail input signal-swing.” In Proceedings of the IEEE International Symposium on Circuits and Systems, vol. 1, pp. 242–245. 1996.

[130] K. R. Laker and W. M. C. Sansen. Design of Analog Integrated Circuits and Systems. McGraw-Hill, third edn., 1994. ISBN 0-07-113458-1.

[131] B. Gilbert. “A monolithic 16-channel analog array normalizer.” IEEE Journal of Solid-State Circuits, vol. 19, pp. 956–963, 1984.

[132] J. Vogt, K. Koora, A. Finger, and G. Fettweis. “Comparison of different turbo decoder realizations for IMT-2000.” In Proceedings of the Global Telecommunications Conference, vol. 5, pp. 2704–2708. Rio de Janeiro, Brazil, Dec. 1999.


[133] F. Poegel. “Private email communication: The resolution of signals in different decoder architectures.” July 2000. This topic will appear in Frank’s PhD thesis, which he is currently finishing at TU Dresden, Germany.

[134] COMATLAS. Datasheet of the Turbo-code codec CAS 5093.

[135] E. Säckinger and W. Guggenbühl. “A high-swing, high-impedance MOS cascode circuit.” IEEE Journal of Solid-State Circuits, vol. 25, no. 1, pp. 289–298, Feb. 1990.

[136] P. J. Crawley and G. W. Roberts. “High-swing MOS current mirror with arbitrarily high output resistance.” Electronics Letters, vol. 28, no. 4, pp. 361–363, Feb. 1992.

[137] D. A. Johns and K. Martin. Analog Integrated Circuit Design. John Wiley & Sons, 1997.

[138] P. R. Gray and R. G. Meyer. Analysis and Design of Analog Integrated Circuits. Wiley, New York, third edn., 1993.

[139] Harris Semiconductors. Datasheet of the CA3096 NPN/PNP Transistor Array, December 1997.

[140] Austria Mikrosystem International GmbH. Process Parameters and Design Rules of the 0.8 µm silicon BiCMOS process, 1999. See also http://www.amsint.com.

[141] A. M. Aji, G. B. Horn, and R. J. McEliece. “Iterative decoding on graphs with a single cycle.” In Proceedings of the IEEE International Symposium on Information Theory, p. 276. Cambridge, MA, Aug. 1998.

[142] M. J. Pelgrom, A. C. J. Duinmaijer, and A. P. G. Welbers. “Matching properties of MOS transistors.” IEEE Journal of Solid-State Circuits, vol. 24, no. 5, pp. 1433–1439, Oct. 1989.

[143] T. Richardson and R. Urbanke. “The capacity of low-density parity check codes under message-passing decoding.” September 2000. Accepted for publication in IEEE Transactions on Information Theory.

[144] P. Robertson, E. Villebrun, and P. Hoeher. “A comparison of optimal and sub-optimal decoding algorithms in the log domain.” In Proceedings of the International Conference on Communications, vol. 2, pp. 1009–1013. Seattle, WA, June 1995.

[145] R. M. Tanner. “On quasi-cyclic repeat-accumulate codes.” In Proc. 37th Allerton Conf. on Communications, Control, and Computing. Monticello, Illinois, Sept. 1999.

[146] R. M. Tanner. “Transforming quasi-cyclic codes with sparse graphs.” Jan. 2000. Submitted to IEEE Trans. Inform. Theory. Available at http://www.cse.ucsc.edu/~tanner/pubs.html.

[147] P. O. Vontobel. “Investigation of quasi-cyclic repeat-accumulate codes suitable for a chip implementation.” Internal report, Signal and Information Processing Laboratory, ETH Zurich, 2000.

[148] H. Traff. “Novel approach to high-speed CMOS current comparators.” Electronics Letters, vol. 28, no. 3, pp. 310–312, Jan. 1992.

[149] G. D. Forney, Jr. “The forward-backward algorithm.” In Proc. 34th Allerton Conf. on Communications, Control, and Computing, pp. 432–446. Allerton House, Monticello, Illinois, Oct. 1996.

[150] S. M. Moser. Investigation of Algebraic Codes of Small Block Length using Factor Graphs. Master’s thesis, Signal and Information Processing Laboratory, ETH Zurich, Zurich, March 1999.

[151] G. Fromherz and E. Schinca. Konvergenzverhalten des Summe-Produkt-Algorithmus in Standard-CMOS-Technologie. Master’s thesis, Signal and Information Processing Laboratory, ETH Zurich, Zurich, March 1999.

[152] E. A. Vittoz. “MOS and Bipolar transistors.” Electronics Laboratories Advanced Engineering Course on CMOS and BiCMOS VLSI Design ’94, Aug. 1994.

[153] A. Rodriguez-Vasquez, R. Navas, M. Delgado-Restituto, and F. Vidal-Verdu. “A modular programmable CMOS analog fuzzy controller chip.” IEEE Transactions on Circuits and Systems–II: Analog and Digital Signal Processing, vol. 46, pp. 251–265, March 1999.

[154] M. Helfenstein, H.-A. Loeliger, F. Lustenberger, and F. Tarköy. “Verfahren zur mathematischen Verarbeitung zweier Werte in einer elektrischen Schaltung.” Swiss Patent Application no. 1999 1448/99, Feb. 1999. Filed Aug. 6, 1999.

[155] Y. Tsividis. Mixed Analog-Digital VLSI Devices and Technology: An Introduction. McGraw-Hill, 1995. ISBN 0-07-065402-6.

[156] S. M. Sze. Physics of Semiconductor Devices. John Wiley & Sons, New York, second edn., 1982. ISBN 0-471-09837-X.

[157] S. M. Sze. Semiconductor Devices, Physics and Technology. John Wiley & Sons, New York, 1985. ISBN 0-471-83704-0.

[158] S. Wang. Fundamentals of Semiconductor Theory and Device Physics. Prentice Hall Series in Electrical and Computer Engineering. Prentice-Hall, Englewood Cliffs, NJ, 1989. ISBN 0-13-344425-2.


Curriculum Vitæ

I was born in Lucerne, Switzerland, on May 31, 1969. After finishing high school at the Kantonsschule Alpenquai, Lucerne, in 1989 (Matura Typus C) and a one-year interruption for military service, I enrolled in Micro Engineering at the Swiss Federal Institute of Technology EPF Lausanne. I received the Diploma (M.Sc.) degree in Micro Engineering (Ing. en Microtechnique dipl. EPFL) in 1995 for the design, implementation, and testing of an artificial silicon cochlea in CMOS technology. In April 1995 I joined the Signal and Information Processing Laboratory (ISI) of ETH Zurich, where I worked as a teaching assistant for two years. During this time, I attended the post-diploma program in Information Technology, which I completed with a Dipl. NDS degree in Information Technology in 2000. From autumn 1997 to summer 2000 I participated as a research assistant in the interdisciplinary research project ‘Design of Analog VLSI Iterative Decoders’ (DAVID). Besides the work on the DAVID project presented in this dissertation, my main interests include general analog and bio-inspired circuit design, micro-systems design, system-oriented VLSI design, and analog design automation.

