EDA Tech Forum Journal: June 2009
Page 1: EDA Tech Forum Journal: June 2009

Embedded | ESL/SystemC | Digital/Analog Implementation | Tested Component to System | Verified RTL to Gates | Design to Silicon

The Technical Journal for the Electronic Design Automation Community

Volume 6, Issue 3, June 2009 | www.edatechforum.com

INSIDE: DAC putting users on fast track

Refining the embedded OS soup

Making virtual prototypes real

Highlighting computational litho

Mastering antenna design basics

Page 2: EDA Tech Forum Journal: June 2009

COMMON PLATFORM TECHNOLOGY

Chartered Semiconductor Manufacturing, IBM and Samsung provide you with the access to innovation you need for industry-changing 32/28nm high-k/metal gate (HKMG) technology, with manufacturing alignment, ecosystem design enablement, and flexibility of support through Common Platform technology.

Collaborating with some of the world’s premier IDMs to develop leading-edge technology as part of a joint development alliance, Chartered, IBM and Samsung provide access to this technology as well as qualified IP and robust ecosystem offerings to help you get to market faster, with less risk and more choice in your manufacturing options.

Visit www.commonplatform.com today to find out how you can get your access to innovation.

www.commonplatform.com

Industry availability of real innovation in materials science, process technology and manufacturing for differentiated customer solutions.

To Find Out More, Visit Us at These Upcoming EDA TF Locations:

August 25 - Hsinchu, Taiwan
August 27 - Seoul, Korea
September 1 - Shanghai, China
September 3 - Santa Clara, CA, USA
September 4 - Tokyo, Japan
October 8 - Boston, MA, USA


Page 3: EDA Tech Forum Journal: June 2009


contents

EDA Tech Forum Journal is a quarterly publication for the Electronic Design Automation community including design engineers, engineering managers, industry executives and academia. The journal provides an ongoing medium in which to discuss, debate and communicate the electronic design automation industry’s most pressing issues, challenges, methodologies, problem-solving techniques and trends.

EDA Tech Forum Journal is distributed to a dedicated circulation of 50,000 subscribers.

EDA Tech Forum is a trademark of Mentor Graphics Corporation, and is owned and published by Mentor Graphics. Rights in contributed works remain the copyright of the respective authors. Rights in the compilation are the copyright of Mentor Graphics Corporation. Publication of information about third party products and services does not constitute Mentor Graphics’ approval, opinion, warranty, or endorsement thereof. Authors’ opinions are their own and may not reflect the opinion of Mentor Graphics Corporation.

EDA Tech Forum, Volume 6, Issue 3, June 2009

< TECH FORUM >

18 Embedded
Embedded software virtualization comes of age, LynuxWorks

24 ESL/SystemC
Using a TLM virtual system prototype for hardware and software validation, Mentor Graphics

28 ESL/SystemC
Bridging from ESL models to implementation via high-level hardware synthesis, CoFluent Design

34 Verified RTL to Gates
Parallel transistor-level full-chip circuit simulation, University of California, San Diego

40 Digital/Analog Implementation
Reducing system noise with hardware techniques, Texas Instruments

46 Design to Silicon
Computational scaling: implications for design, IBM Microelectronics

50 Tested Component to System
Antenna design considerations, LS Research

< COMMENTARY >

6 Start Here
Why DAC and DATE still matter
Networking and interaction have a critical value during a downturn.

8 Conference
At the sharp end
DAC 2009 will see the launch of a track dedicated to real-world design.

12 USB Focus
Connecting to embedded design
We review some of the sector’s increasingly common USB implementations.

Analysis
Nice in Nice
We review the major trends from DATE 2009: multicore, aggregation, ESL and more.

Analysis
Kermit would blush
Just what is a ‘green’ design strategy, and what should it contain?

Profile
Renaissance man
Saul Griffith isn’t just trying to save the planet; he also proves engineers can match charisma with nous.

All content now online at www.edatechforum.com. Register and read.

Page 4: EDA Tech Forum Journal: June 2009

4

team

< EDITORIAL TEAM >
Editor-in-Chief: Paul Dempsey, +1 703 536 1609, [email protected]
Managing Editor: Marina Tringali, +1 949 226 2020, [email protected]
Copy Editor: Rochelle Cohn

< SALES TEAM >
Advertising Manager: Stacy Mannik, +1 949 226 2024, [email protected]
Advertising Manager: Lauren Trudeau, +1 949 226 2014, [email protected]
Advertising Manager: Shandi Ricciotti, +1 949 573 7660, [email protected]


< CREATIVE TEAM >
Creative Director: Jason Van Dorn, [email protected]
Art Director: Kirsten Wyatt, [email protected]
Graphic Designer: Christopher Saucier, [email protected]

< EXECUTIVE MANAGEMENT TEAM >
President: John Reardon, [email protected]
Vice President: Cindy Hickson, [email protected]
Vice President of Finance: Cindy Muir, [email protected]
Director of Corporate Marketing: Aaron Foellmi, [email protected]

Page 5: EDA Tech Forum Journal: June 2009

CONVERGENCE

Power. Noise. Reliability. From prototype to signoff, we provide a unified CPS platform to manage noise and co-optimize your power delivery network across the system. Mitigate design failure risk, reduce system cost, and accelerate time-to-market with Apache.

Visit us at Booth #722

www.apache-da.com


Page 6: EDA Tech Forum Journal: June 2009

< COMMENTARY > START HERE

start here

Why DAC and DATE still matter

Our preview of the forthcoming Design Automation Conference concentrates on the User Track that makes its debut there next month. Given that it shares many of the objectives behind this journal, that is hardly surprising. However, it is not the only aspect of DAC that merits investigation.

Also in the program, conference chair Dr. Andrew Kahng and Dr. Juan-Antonio Carballo of IBM are holding a workshop to look into developing a new generation of roadmap for EDA. Certainly, those that already exist are not perfect—although recasting the concept will not be easy either.

EDA is, by its nature, an industry that responds to clients who hold their R&D cards very closely. Meanwhile, classical scaling is broken and much of what will comprise the worlds of architecture and process within 10 years is up for debate. Nobody wants to force themselves into making big bets based on limited visibility and certain volatility.

The answer of course is for any new roadmapping activity to be as inclusive as possible. Again, a further question arises as to whether or not that can best be achieved within the existing ITRS framework or if it requires one that is separate but complementary. However, where Kahng and Carballo may be right is in deciding that it is time to put these issues back up for debate.

In some respects, this attempt to broaden the scope of the roadmapping debate also reflects a trend that was seen in France earlier this spring at the Design Automation and Test in Europe (DATE) conference, and it feeds into a bigger issue as to why conferences still matter, even in as severe a downturn as this.

DATE has been something of a whipping boy of late, and it is probably fair to say that its days as a top-level commercial exhibition are over. However, it always has been a very strong technology conference, and that was still true in 2009. Indeed, full delegate attendance (i.e., those there for the papers, not the stands) barely fell compared with 2008, despite global economic woes.

Going back to DAC’s new User Track, DATE introduced invited industrial papers a few editions ago. It has also, as many at DAC admit, typically been a step ahead in how it addresses ESL. And this year, its debates, sessions and discussions were impressive in how they genuinely encompassed the views of both users and various forms of vendors.

Solutions today are far more likely to come from consensus than from one company’s eureka moment. It is here that conferences like DATE and DAC (and ISQED, ICCAD and others) really come into their own, even though many may be tempted to slash their companies’ human presence.

Meanwhile, our report from DATE appears in the ‘Editor’s Cut’ extended edition of EDA Tech Forum, just posted at our website alongside coverage of the tricky area that is green design. Find out more at www.edatechforum.com.

Paul Dempsey
Editor-in-Chief

Page 7: EDA Tech Forum Journal: June 2009

Visit www.microchip.com/easy for a 20% discount off of our most popular 8-bit development tools!

8 Reasons why Microchip is the Worldwide Leader in 8-bit Microcontrollers:
1) Broad portfolio of more than 250 8-bit PIC® microcontrollers

2) Comprehensive technical documentation, free software and hands-on training

3) MPLAB® IDE is absolutely free, and supports ALL of Microchip’s 8, 16, and 32-bit microcontrollers

4) Low-cost development tools help speed up prototyping efforts

5) Easy migration with pin and code compatibility

6) Shortest lead times in the industry

7) The only supplier to bring USB, LCD, CAN, Ethernet, and capacitive touch sensing to the 8-bit market

8) World-class, 24/7 technical support

GET STARTED TODAY!

The Microchip name and logo, the Microchip logo, MPLAB and PIC are registered trademarks of Microchip Technology Incorporated in the U.S.A. and other countries. © 2009, Microchip Technology Incorporated. All Rights Reserved.

Why Do More Design Engineers Use PIC® Microcontrollers?

Serial EEPROMs | Analog | Digital Signal Controllers | Microcontrollers

www.microchip.com/easy


Page 8: EDA Tech Forum Journal: June 2009

< COMMENTARY > CONFERENCE

The Design Automation Conference (DAC) returns to San Francisco’s Moscone Center, July 26th-31st, and it is hoped that its proximity to Silicon Valley will see attendances hold up well even in tough times.

However, the organizers are looking to more than just geography to guarantee continued interest in chip design’s main annual gathering. This 46th edition of DAC will also see the return of the CEO panel featuring Lip-Bu Tan of Cadence Design Systems, Wally Rhines of Mentor Graphics, and Aart de Geus of Synopsys. There will also be keynotes from technologists such as William Dally of graphics powerhouse Nvidia, and Fu-Chieh Hsu of leading foundry TSMC. And there will be 29 panels spread across the main conference and the Pavilion on the exhibition floor, including one that will have been voted into being by attendees (its topic was still to be determined as the magazine went to press).

Add DAC’s typically strong technical program and you are already at the point where the conference appears to offer something for everyone. Or, actually, does it? One feature being launched this year has been explicitly designed to meet a need that both users and vendors have wanted the conference to address in more detail: the User Track.

The new feature has been put together by Leon Stok, director of EDA for the IBM Systems and Technology Group, and Soha Hassoun, associate professor in the Computer Science Department of Tufts University.

Stok explains the objective, “There is a strong desire among engineers to get their hands on more information about tools, flows and their use to solve the problems they face every day. But at DAC, the main parts of the technical program have tended to be more academic and focused on research. Meanwhile on the floor, the vendors are advertising their tools in a very specific way. What was missing was something in the middle. That is the void we are trying to fill.”

The idea is not entirely new. DAC’s equivalent across the Atlantic, Design Automation and Test in Europe, has also recently launched a track dedicated to industrial case studies and methodologies in everyday use. The initiative proved popular there, and, based on the response from potential contributors, the same looks likely to happen in the USA.

“We have had 117 submissions, and we expected only 50 to 70 in the first year, so it’s gone a lot further than expectations,” says Hassoun. From that, the track has been divided ultimately into 16 front-end presentations (overseen by Hassoun) and 26 back-end presentations (overseen by Stok), plus a packed poster session (see box opposite and on p. 10).

Of course, some may say that DAC is merely duplicating what the major vendors achieve via their user group meetings. These have also long been based around practical case studies and the so-called ‘war stories’ of how tool users overcame a particular design challenge. At a time when some EDA players have been pushing more investment into these events, you might even suggest that DAC’s User Track would be the source of some tension.

“That’s just not the case, though,” says Stok. “The vendors also want there to be a forum at DAC for the exchange of very practical design information in a technical setting that is decoupled from the marketing message.”

Indeed, evidence of that comes in the form of the User Track’s sponsor, Cadence, the vendor with perhaps the greatest reputation prior to this DAC for wanting to ‘go it alone’.

Hassoun also says that DAC’s User Track can fill another vacuum that single-vendor shows cannot. “The vendor meetings are useful but also, by definition, more focused. And they don’t offer the chance to compare, whereas we have papers from designers working across various flows and with combinations of tools from different sources. It isn’t always the case that you will take everything from one supplier. And again, we have found a number of submissions where the vendor encouraged the user to write the paper up.”

There are also some general trends emerging. At the front-end, Hassoun sees part of the function as bridging the gap between the stronger adoption of ESL strategies seen in Europe and Japan and the more traditional design strategies that are now beginning to move in the same direction in North America. To that end, she is also involved in a workshop on the fast-developing ESL concept of virtual platforms that takes place on Wednesday at this year’s DAC (Room 301).

DAC’s new User Track aims to add a more pragmatic flavor to chip design’s largest conference

At the sharp end

Continued on page 10

Page 9: EDA Tech Forum Journal: June 2009


Design Automation Conference 2009
User Track Program
The User Track runs from Tuesday, July 28 until Thursday, July 30. All the main sessions take place in Room 132 at the Moscone Center.

There is also a separate Poster Session on Wednesday, July 29 from 1.30pm-3.00pm in the Concourse Area. Details on this can be found on page 10.

All details were correct as EDA Tech Forum went to press, but, as ever, may be subject to late change or cancellation. For the most accurate version of the agenda for this and all other DAC events, check the website, www.dac.com.

Tuesday, July 28

10.30am-12.00pm Robust Design and Test

1.1 Electromagnetic Interference Reduction on an Automotive Microcontroller, STMicroelectronics

1.2s Power Integrity Sign-Off Flow Using CoolTime and PrimeTime-SI—Flow and Validation, Aptina Imaging

1.3s Improving Parametric Yield in DSM ASIC/SOC Design, Samsung

1.4 Low-Power Test Methodology, STMicroelectronics

2.00pm-4.00pm Practical Physical Design

2.1 Automated Pseudo-Flat Design Methodology for Register Arrays, Intel

2.2 Qualcomm DSP Semi-Custom Design Flow: Leveraging Place and Route Tools in Custom Circuit Design, Qualcomm

2.3s Auto ECO Flow Development for Functional ECO Using Efficient Error Rectification Method Based on Conformal, Intel

2.4s Monte Carlo Techniques for Physical Synthesis Design Convergence Exploration, Intel

2.5s Tortoise: Chip Integration Solution, STMicroelectronics
2.6s ASIC Clock Distribution Design Challenges, Intel

4.30pm-6.30pm Verification: A Front-End Perspective

3.1 Interactive 2-D Projection Cross Coverage Viewer for Coverage Hole Analysis, ClueLogic, Verifore

3.2 Verification of Power Management Protocols Through Abstract Functional Modeling, Intel, Ipflex

3.3 Design Flow for Embedded System Device Driver Development and Verification, Cadence Design Systems, Virtutech

Wednesday, July 29th

9.00am-11.00am Timing Analysis in the Real World

4.1 Design of a Single-Event Effect Fault Tolerant Micro-Processor for Space Using Commercial EDA Tools, European Space Agency, Atmel

4.2 SSTA and Its Application to SPARC 64 Processor Design, Fujitsu
4.3s A Hierarchical Transistor and Gate-Level Statistical Timing Flow for Micro-Processor Designs, IBM
4.4s Unifying Transistor- and Gate-Level Timing Through the Use of Abstraction, IBM
4.5s The Automatic Generation of Merged-Mode Design Constraints, Texas Instruments, FishTail Design Automation
4.6s Modeling Clock Network Variation in Timing Verification, Sun Microsystems

1.30pm-3.00pm Poster Session and Ice Cream Social

See page 10

3.00pm-4.00pm Toward Front-End Design Productivity

6.1 Unified Chip/Package/Board Codesign Flow for Laminate, Leadframe, and Wafer-Level Packages in a Distributed Design Environment, Infineon Technologies, Cadence Design Systems, CISC Semiconductor Design+Consulting

6.2s Fast FPGA Resource Estimation, Xilinx
6.3s Assessing Design Feasibility Early with Atrenta’s 1Team-Implement SOC, STMicroelectronics, Atrenta

4.30pm-6.00pm Front-End Development: Embedded Software and Design Exploration

7.1s Applying Use Cases to Microcontroller Code Development, Cypress Semiconductor

7.2s Mapping the AVS Video Decoder on a Heterogeneous Dual-Core SIMD Processor, University of Thessaly

7.3s An ‘Algorithm to Silicon’ ESL Design Methodology, STMicroelectronics
7.4s Necessary but Not Sufficient: Lessons and Experiences with High-Level Synthesis Tools, Texas Instruments
7.5 Switching Mechanism in Mixed TLM-2.0 LT/AT System, Intel

Thursday, July 30

9.00am-11.00am Power Analysis and IP Reuse

8.1 Dynamic Power Analysis for Custom Macros Using ESP-CV, Qualcomm

8.2 Power Supply and Substrate Noise Analysis; Reference Tool Experience with Silicon Validation, Kobe University, ARTEC, STARC, Apache Design Solutions

8.3s Modeling and Design Challenges for Multicore Power Supply Noise Analysis, IBM

8.4s Dynamic Power Noise Analysis Method for Memory Designs, Samsung

8.5s Hard IP Reuse in Multiple Metal Systems SOCs, Texas Instruments, Freescale Semiconductor

8.6s Apache Redhawk di/dt Mitigation Method in Power Delivery Design and Analysis, Intel

2.00pm-4.00pm Front-End Power Planning and Analysis

9.1 Power Library Pre-Characterization Automation, NEC, NEC HCL ST
9.2s Chip - Package - PC Board Codesign: Applying a Chip Power Model in System Power Integrity Analysis, Cisco Systems
9.3s New SOC Integration Strategies for Multi-Million Gate, Multi-Power/Voltage Domain Designs, Texas Instruments, Atrenta
9.4 PETRA: A Framework for System-Level Dynamic Thermal and Power Management Exploration, Intel
9.5 ALPES: Architectural-Level Power Planning and Estimation, STMicroelectronics, ST-NXP Wireless, ST-Ericsson

4.30pm-6.00pm Advances in Analog and Mixed-Signal Design

10.1 A Schematic Symbol Library for Collaborative Analog Circuit Development Across Multiple Process Technologies, Stanford University, NetLogic Microsystems, Rambus

10.2 A Mixed-Signal/MEMS CMOS Codesign Flow with MEMS-IP Publishing/Integration, National Chiao Tung University

10.3s Substrate Noise Isolation Characterization in 90nm CMOS Technology, Magwel, NXP Semiconductors

10.4s An Integrated Physical-Electrical Design Verification Flow, Mentor Graphics, SySDSoft

Page 10: EDA Tech Forum Journal: June 2009

< COMMENTARY > CONFERENCE

Among back-end topics, Stok says that timing analysis and dealing with variability are big issues for users, and that parametric yield is also gaining increasing traction. “Also there is a big drive in enabling DFM to try to make appropriate timing and electrical information available to design teams so that they react as early as possible.”

The goals are clear. Get people to show up. Get them talking about the User Track. And have them leave DAC with the sense that they have pulled in a lot of useful information for day-to-day use.

The last point Hassoun and Stok make is that this first run will not be perfect—they never are. By taking this step, DAC begins a process of refinement that will only work if attendees offer detailed feedback about both the good and the bad.

“We’re looking at all the ways we can do that. Some may be formal, some not. There is an ice cream social and we hope to meet a lot of people there. But we’ll be around all week—and we need people to come up to us and tell us what they think,” says Stok.

The papers below will be presented at a poster session and ice cream social that will supplement the main User Track sessions, from 1.30pm-3.00pm on Wednesday, July 29. The event will be held on the Concourse Level of the Moscone Center.

The track’s organizers are also encouraging DAC delegates to attend the social to provide feedback on this new section of the conference program. However, if you are unable to attend this or cannot otherwise contact Leon Stok and Soha Hassoun during the conference, you can also email Soha at [email protected].

User Track – Front End

3-D Visualization of Integrated Circuits in the Electric VLSI Design System, Sun Microsystems

Automatic Generation, Execution and Performance Monitoring of a Family of Multiprocessors on Large Scale Emulator, ENSTA, EVE

C-Based Hardware Design Using AutoPilot Synthesizing MPEG-4 Decoder onto Xilinx FPGA, University of California - Los Angeles, AutoESL Design Tech

C-Based High-Level Synthesis of a Signal Processing Unit Using Mentor Graphics Catapult C, University of Tübingen, Robert Bosch

Design and Verification Challenges of ODC-Based Clock Gating, PwrLite

Effective Debugging Chip-Multiprocessor Design in Acceleration and Emulation, Chinese Academy

Enabling IP Quality Closure at STMicroelectronics with VIP Lane, STMicroelectronics, Satin IP Technologies

Formal Verification Based Automated Approaches to System-On-Chip DFT Logic Verification, Texas Instruments

Interactive Code Optimization for Dynamically Reconfigurable Architecture, Toshiba

Power Gated Design Optimization and Analysis with Silicon Correlation Results, Intel

SystemC: A Complete Digital System Modeling Language: A Case Study, Rambus

Transforming Simulators into Implementations, University of Texas - Austin

Using Algorithmic Test Generation in a Constrained Random Test Environment, Ericsson

Visualizing Debugging Using Transaction Explorer in SOC System Verification, Marvell Semiconductor

User Track – Back End

A Generic Clock Domain Crossing Verification Flow, Advanced Micro Devices

A Simple Design Rule Check for DP Decomposition, National Taiwan University

Algorithm for Analyzing Timing Hot-Spots, eInfochips

An On-Chip Variation Monitor Methodology Using Cell-Based P&R Flow, Faraday Technology

Application and Extraction of IC Package Electrical Models for Support of Multi-Domain Power and Signal Integrity Analysis, Freescale Semiconductor, Sigrity

Applications of Platform Explorer, Integrator and Verifier in SOC Designs, Samsung

Assertion Based Formal Verification in SOC Level, Wipro Technologies

Attacking Constraint Complexity in Verification IP Reuse, Cisco Systems, Synopsys

Automated Assertion Checking in Static Timing with IBM ASICs, IBM

Case Study of Diagnosing Compound Hold-Time Violations, Realtek Semiconductor, Mentor Graphics

Design Profiling – Modeling the ASIC Design Process, IBM

Enhanced SDC Support for Relative Timing Designs, University of Southern California, University of Utah

Hold Time ECO for Hierarchical Design, Global Unichip, Dorado Design Automation

Improving the Automation of the System in Package (SIP) Design Environment via a Standard and Open Data Format, IBM

Interconnect Explorer: A High-Level Power Estimation Tool for On-Chip Interconnects, Université de Bretagne

Managing Information Silos: Reducing Project Risk through Multi-Metric Tracking, Achilles Test Systems

Net-List Level Test Logic Insertion: Flow Automation for MBIST & Scan, Broadcom

Physical Implementation of Retention Cell Based Design, Atoptech

Sequential Clock Gating Optimization in GPU Designs with PowerPro CG, Advanced Micro Devices

Soft-Error-Rate Estimation in Sequential Circuits Utilizing a Scan ATPG Tool, Renesas Technology, Hitachi

Solving FPGA Clock-Domain Crossing Problems: A Real-World Success Story, North Pole Engineering, Honeywell International, Mentor Graphics

Static Timing Analysis of Single Track Circuits, University of Southern California, Sun Microsystems, Intel

Timing Closure in 65-Nanometer ASICs Using Statistical Static Timing Analysis Design Methodology, IBM

Using STA Information for Enhanced At-Speed ATPG, Freescale Semiconductor, Mentor Graphics

Page 11: EDA Tech Forum Journal: June 2009

320,000,000 MILES, 380,000 SIMULATIONS AND ZERO TEST FLIGHTS LATER.

THAT’S MODEL-BASED DESIGN.

Accelerating the pace of engineering and science

After simulating the final descent of the Mars Rovers under thousands of atmospheric disturbances, the engineering team developed and verified a fully redundant retro firing system to ensure a safe touchdown. The result—two successful autonomous landings that went exactly as simulated. To learn more, go to mathworks.com/mbd

©2005 The MathWorks, Inc.


Page 12: EDA Tech Forum Journal: June 2009

< COMMENTARY > USB FOCUS

Universal Serial Bus (USB) is a connectivity specification that provides ease-of-use, expandability and good performance for the end-user. It is one of the most successful interconnects in computer history. Originally released in 1995 for PCs, it is now expanding into use by embedded systems and is replacing older interfaces such as serial and parallel interfaces as the preferred communication link. This article has been written as a tutorial on some of the many ways in which USB can be employed in embedded systems.

USB is not a peer-to-peer protocol like Ethernet. One USB device (the ‘USB host’) acts as the master, while others (‘USB devices’ or ‘USB peripherals’) act as slaves. The host initiates all bus transfers. Up to 127 USB devices can be connected to one USB host via up to six layers of cascaded hubs. For embedded systems, it is very unusual to have more than one hub. In most cases, one USB device connects directly to one USB host with no hub.

A USB host requires a USB host controller and USB host software. The latter is layered from the bottom up as follows: (1) USB host controller driver, (2) USB host stack, and (3) USB class driver. The first layer controls the USB host controller (i.e., it reads and writes registers in the controller and it transfers data). The second layer implements the USB protocol and thus controls connected USB devices. The third layer is device-aware and communicates with and controls the actual device (e.g., disk drive, HID human interface device, CDC communication device, etc.). One USB host stack can support multiple class drivers simultaneously. In an embedded system there is usually only one USB host controller.

A USB device requires a USB device controller and USB device software. The latter is layered from the bottom up as follows: (1) USB device controller driver, (2) USB device stack, and (3) USB function driver. The first layer controls the USB device controller (i.e., it reads and writes registers in the controller and it transfers data). The second layer implements the USB protocol and thus communicates with the USB host stack. The third layer communicates with the class driver in the host and provides the actual device control. It makes the embedded unit look like a USB disk drive, HID, serial device, or another defined type. One USB device stack can support more than one function driver simultaneously, through the composite device framework.
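To make the layering concrete, here is a minimal sketch of how the three device-side layers might be wired together. All type and function names are hypothetical (they are not from any particular USB stack); the point is simply the direction of the calls between the layers described above.

```cpp
// Hypothetical illustration of a three-layer USB device-side stack:
// controller driver -> device stack -> function driver. Names are made up.
#include <cstdint>
#include <cstddef>

// Layer 1: controller driver - talks to the USB device controller hardware.
struct UsbDeviceControllerDriver {
    virtual void   write_endpoint(uint8_t ep, const uint8_t* data, size_t len) = 0;
    virtual size_t read_endpoint(uint8_t ep, uint8_t* data, size_t max_len) = 0;
    virtual ~UsbDeviceControllerDriver() = default;
};

// Layer 3: function driver - makes the unit look like a CDC serial port, disk, etc.
struct UsbFunctionDriver {
    virtual void on_data_received(const uint8_t* data, size_t len) = 0;
    virtual ~UsbFunctionDriver() = default;
};

// Layer 2: device stack - implements the USB protocol and routes traffic
// between the controller driver below and the function driver(s) above.
class UsbDeviceStack {
public:
    explicit UsbDeviceStack(UsbDeviceControllerDriver& hw) : hw_(hw) {}
    void register_function(UsbFunctionDriver& fn) { fn_ = &fn; }
    void poll() {
        uint8_t buf[64];
        size_t n = hw_.read_endpoint(/*ep=*/1, buf, sizeof(buf));
        if (n > 0 && fn_) fn_->on_data_received(buf, n);  // hand data up to the function driver
    }
private:
    UsbDeviceControllerDriver& hw_;
    UsbFunctionDriver* fn_ = nullptr;
};
```

The host-side stack mirrors this structure, with a class driver in place of the function driver.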

An attractive feature of USB is that it is plug-and-play, which means that a USB device will be automatically recognized shortly after being connected to a host. Also, cabling is simple: there is an A receptacle/plug pair for the host-end and a B receptacle/plug pair for the device-end. All hosts and devices adhere to this standard, except On The Go (OTG) devices, which are designed for but not yet widely used in embedded systems.

What follows are descriptions of six examples of how USB can be utilized in an embedded system. Where performance information is given, a “medium performance processor” is assumed to be a 50-80 MHz ARM7 or ColdFire.

1. PC to device via USB serial
Most new PCs and laptops do not provide serial or parallel ports; they have been replaced with USB ports. Hence, connecting a PC to an embedded device via its RS-232 port is no longer possible. As part of their USB host stacks, popular PC OSs include Communication Class Drivers (CDCs). As shown in Figure 1, if

Yingbo Hu and Ralph Moore describe how to implement USB across a range of common functions.

Connecting to embedded design

(Block diagram: PC application → serial port API → USB CDC class → USB host stack → USB host controller driver → USB host controller, connected by a USB cable to the embedded device’s USB device controller → USB device controller driver → USB device stack → USB Serial/CDC function driver → embedded application.)

FIGURE 1 PC to device via USB serial

Source: Micro Digital

Page 13: EDA Tech Forum Journal: June 2009

13EDA Tech Forum June 2009

the embedded device has a Serial/CDC function driver, then it will look like a serial device to the PC. When it is plugged in, it will be recognized by the PC OS as a serial device, and it will be automatically assigned a COM port number. Then, terminal emulators and other serial applications can communicate with the embedded device without any modification. This use of USB is particularly good for control and transferring serial data. Transfer rates of 800 KB/sec are feasible at full speed and 2,500 KB/sec at high speed for medium speed embedded processors.
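As a concrete illustration of the PC side, the fragment below opens such a virtual COM port on a Linux host using the standard POSIX serial API. The device node name /dev/ttyACM0 is an assumption (it is the name Linux typically assigns to a CDC ACM device, but the number can differ), and the "STATUS" command is a hypothetical stand-in for whatever the PC application actually sends.

```cpp
// Minimal sketch: talk to an embedded CDC/ACM device from a Linux PC.
// Assumes the device enumerated as /dev/ttyACM0 (typical, but not guaranteed).
#include <fcntl.h>
#include <termios.h>
#include <unistd.h>
#include <cstdio>

int main() {
    int fd = open("/dev/ttyACM0", O_RDWR | O_NOCTTY);
    if (fd < 0) { perror("open"); return 1; }

    termios tio{};
    tcgetattr(fd, &tio);
    cfmakeraw(&tio);                 // raw 8-bit data, no line editing
    cfsetispeed(&tio, B115200);      // many CDC devices ignore the baud setting,
    cfsetospeed(&tio, B115200);      //   but set it anyway for completeness
    tcsetattr(fd, TCSANOW, &tio);

    const char cmd[] = "STATUS\r\n"; // hypothetical command understood by the device
    write(fd, cmd, sizeof(cmd) - 1);

    char buf[128];
    ssize_t n = read(fd, buf, sizeof(buf));
    if (n > 0) fwrite(buf, 1, static_cast<size_t>(n), stdout);

    close(fd);
    return 0;
}
```

Because the device appears as an ordinary COM port, existing terminal emulators and scripts work unchanged, which is exactly the point made above.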

2. PC to device via USB disk
Another way of connecting a PC or laptop to an embedded device is for the embedded device to emulate a USB disk drive. Popular PC operating systems have built-in USB mass storage class drivers that interface their file systems to the USB host stack, as shown on the left of Figure 2. Adding a mass storage function driver to the embedded device enables it to look like a USB disk drive to the PC. The figure also shows how a resident flash memory can be accessed as a flash disk via the USB function driver connected to its flash driver.

Any other type of storage media could be connected, instead, via its own driver. When the embedded device is plugged into a PC, it is recognized as a disk drive and automatically assigned a drive letter. Thereafter, files can be dragged and dropped to and from the embedded device as though it were a disk drive. In this example, a PC application could read and write the files on the flash disk. Note that the embedded application uses a local file system to access the flash disk itself. This file system must, of course, be OS-compatible. An important concept to

(Block diagram: on the PC, web browser → TCP/IP stack → RNDIS over USB → PC USB host stack → host controller driver → USB host controller; across the USB cable, the embedded device’s USB device controller → device controller driver → embedded USB device stack → USB RNDIS function driver → TCP/IP stack → web server.)

FIGURE 3 Web server access via USB RNDIS

Source: Micro Digital

(Block diagram: the PC application uses the PC’s file system, mass storage class driver, USB host stack and host controller; across the USB cable, the embedded device’s USB mass storage function driver sits on the embedded USB device stack and connects through a flash I/O driver to the flash memory, which the embedded application reaches via its own local file system and flash driver.)

FIGURE 2 PC to device via USB disk

Source: Micro Digital

Continued on next page

Page 14: EDA Tech Forum Journal: June 2009

< COMMENTARY > USB FOCUS

understand is that within the PC, the PC’s file system is used and the embedded device merely looks like another disk drive to it. This use of USB would be particularly good for uploading acquired data files or downloading new versions of code files.
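The glue on the embedded side is essentially a block-device interface: the mass storage function driver turns the host’s sector read/write requests into calls on the flash driver. The sketch below shows that idea with entirely hypothetical names and a RAM-backed stand-in for flash; a real stack’s callback signatures will differ.

```cpp
// Hypothetical sketch of the block-device glue behind a USB mass storage
// function driver. The host sees logical 512-byte sectors; the flash driver
// maps them onto the resident flash memory.
#include <cstdint>
#include <cstddef>
#include <cstring>

constexpr size_t kSectorSize = 512;

// Stand-in for the flash driver described in the text (hypothetical API).
struct FlashDriver {
    uint8_t storage[1024 * kSectorSize];  // pretend 512 KB of flash
    void read (uint32_t lba, uint8_t* dst)       { std::memcpy(dst, &storage[lba * kSectorSize], kSectorSize); }
    void write(uint32_t lba, const uint8_t* src) { std::memcpy(&storage[lba * kSectorSize], src, kSectorSize); }
};

// Callbacks the mass storage function driver would invoke when the PC
// issues read/write commands over USB.
struct MassStorageCallbacks {
    FlashDriver* flash;
    void on_host_read (uint32_t lba, uint8_t* out)      { flash->read(lba, out); }
    void on_host_write(uint32_t lba, const uint8_t* in) { flash->write(lba, in); }
};
```

The embedded application, meanwhile, accesses the same sectors through its local file system, which is why that file system must be compatible with the PC’s (typically FAT).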

3. Web server access via USB RNDIS
RNDIS (Remote Network Driver Interface Specification) permits emulating Ethernet over USB. It is not part of the USB specification, but some popular PC OSs, such as Windows and Linux, support it. As shown in Figure 3 (p. 13), adding an RNDIS function driver to an embedded device allows for the interfacing of its USB device stack to its TCP/IP stack, which in turn connects to its Web server. When the embedded device is plugged into a PC, its browser can connect to the Web server in the embedded device. Hence, it is possible to use a browser to access an embedded device’s Web server, even when there is no Ethernet connection or it is difficult to access. This can be convenient for field troubleshooting or configuration using a laptop. The same information accessed via the network to which the embedded device is connected can be accessed via USB.
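The web server itself does not need to know whether its packets travel over Ethernet or over the RNDIS link; it simply listens on the TCP/IP stack. The sketch below assumes the embedded TCP/IP stack exposes a BSD-style sockets API (many do, but yours may differ) and answers every request with a fixed status page, which is enough to show where the RNDIS path plugs in.

```cpp
// Minimal sketch of an embedded web server sitting on the TCP/IP stack.
// Assumes a BSD-style sockets API; the RNDIS link is just another interface.
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

void run_tiny_web_server() {
    int srv = socket(AF_INET, SOCK_STREAM, 0);
    sockaddr_in addr{};
    addr.sin_family      = AF_INET;
    addr.sin_port        = htons(80);
    addr.sin_addr.s_addr = htonl(INADDR_ANY);   // listen on all interfaces, RNDIS included
    bind(srv, reinterpret_cast<sockaddr*>(&addr), sizeof(addr));
    listen(srv, 2);

    for (;;) {
        int client = accept(srv, nullptr, nullptr);
        if (client < 0) continue;
        char req[512];
        read(client, req, sizeof(req));          // request details ignored in this sketch
        const char resp[] =
            "HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n"
            "<html><body>Device status page</body></html>";
        write(client, resp, sizeof(resp) - 1);
        close(client);
    }
}
```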

4. USB multi-port serial device with UART and other connections
In Example 1 we examined the implementation of one serial channel over a USB connection. However, it is actually possible to run multiple, independent serial channels over one USB connection. This is practical because of the higher speed of USB compared with other similar technologies. Figure 4 shows the block diagram. The CDC ACM class driver in the PC may not be the native driver that comes with the PC OS. A special driver may need to be installed. This driver presents multiple virtual COM ports to the PC application and it multiplexes the corresponding serial channels over the USB connection. In the embedded device, the USB CDC function driver demultiplexes the serial channels. Note that, in this example, one channel goes to an application task, which might return certain internal information, and the other two serial channels connect to actual UARTs. The application in the PC can communicate with physical devices (e.g., modem, barcode reader, printer, etc.) connected to the UARTs as though they were connected directly to serial ports on the PC. For example, with a medium performance processor and full speed USB, a total bandwidth of 200KB/sec is achievable. This would support fifteen 115.2Kbaud channels, with capacity left over.
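As a rough sanity check on those figures (assuming standard 8-N-1 framing, i.e., ten bits on the wire per byte): 115,200 baud corresponds to about 11.5 KB/sec per channel, so fifteen such channels need roughly 173 KB/sec, which indeed fits inside the 200 KB/sec budget with headroom to spare.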

(Block diagram: two PC applications, one using the file system API and mass storage driver, the other using the RS-232 API and CDC-ACM driver, share the PC USB host stack and host controller; across the USB cable, the embedded USB composite device framework on the embedded USB device stack hosts both a USB mass storage function driver with its file system and a USB serial port function driver communicating with the embedded application.)

FIGURE 5 USB composite devices

Source: Micro Digital

(Block diagram: a PC application with three virtual serial ports talks through the PC CDC-ACM driver, USB host stack and host controller; on the embedded device, the USB CDC function driver demultiplexes three channels, one to an application task and two through UART drivers to external devices 1 and 2.)

FIGURE 4 USB multi-port serial device with UART and other connections

Source: Micro Digital

Continued on page 16

Page 15: EDA Tech Forum Journal: June 2009


Page 16: EDA Tech Forum Journal: June 2009

< COMMENTARY > USB FOCUS

5. USB composite devices
It is actually possible for one USB device to look like multiple USB devices to a USB host simultaneously. This is made possible by the USB Composite Device Framework, as shown in Figure 5 (p. 14). The USB host (a PC in this example) will recognize each USB device within the embedded device and load its corresponding class driver. The device looks like a USB disk and a serial port. Note that both function drivers are present. This example is a fairly common case that is supported by PC OSs. This particular one would support an application in the PC transferring files, and another application allowing an operator to control or configure the embedded device.
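In code, a composite configuration usually amounts to registering more than one function driver with the device stack before enumeration begins. The sketch below is purely illustrative; the class and method names are made up and stand in for whatever the composite device framework in use provides.

```cpp
// Hypothetical illustration of a composite device: one physical USB device
// exposing two functions (mass storage + CDC serial). All names are made up.
#include <vector>

struct UsbFunction {                 // stand-in for a registered function driver
    const char* name;
};

class UsbCompositeDeviceStack {      // stand-in for the device stack + framework
public:
    void add_function(const UsbFunction& f) { functions_.push_back(f); }
    void start() { /* would build a composite configuration descriptor listing
                      one interface per registered function, then enable the
                      device controller so the host can enumerate both */ }
private:
    std::vector<UsbFunction> functions_;
};

int main() {
    UsbCompositeDeviceStack stack;
    stack.add_function({"mass-storage"});  // the PC sees a removable disk
    stack.add_function({"cdc-acm"});       // the PC sees a virtual COM port
    stack.start();
}
```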

6. USB thumb drive support
Figure 6 shows how an embedded device can access a USB thumb drive (aka USB memory stick). A mass storage class driver fits between the USB host stack and the local file system in the embedded device. It creates the usual read/write logical address API expected of media drivers. Naturally the file system must be OS-compatible in order to exchange thumb drives with a PC. Thumb drives are commonly used to transfer data from embedded devices to PCs or to update firmware or configuration settings and tables in embedded devices.
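On the embedded side this typically reduces to mounting the drive through the local file system once the host stack reports that a device is present, then using ordinary file calls. In the sketch below, fs_mount() and the "usb0:" volume name are hypothetical placeholders for whatever the file system in use actually provides; a FAT volume is assumed so the stick stays readable on a PC.

```cpp
// Hypothetical sketch: reading a configuration file from a freshly inserted
// USB thumb drive. fs_mount() and "usb0:" are placeholders for the local
// file system's real API; standard C stdio is used after that.
#include <cstdio>

extern bool fs_mount(const char* volume);   // hypothetical: attach the FAT volume on the stick

void load_config_from_thumb_drive() {
    if (!fs_mount("usb0:"))                  // called after the mass storage class driver
        return;                              //   reports that a drive is present

    FILE* f = std::fopen("usb0:/config.txt", "r");
    if (!f) return;

    char line[128];
    while (std::fgets(line, sizeof(line), f)) {
        // parse one configuration line (application-specific, omitted here)
    }
    std::fclose(f);
}
```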

Yingbo Hu is R&D embedded software engineer at Micro Digital, and Ralph Moore is the company’s president.


(Block diagram: on the embedded device, the embedded application and file system (FAT or other) sit on a USB mass storage class driver, embedded USB host stack, USB host controller driver and USB host controller, which connect to the external USB thumb drive.)

FIGURE 6 USB thumb drive support

Source: Micro Digital

Page 17: EDA Tech Forum Journal: June 2009

intelligent, connected devices.

15 billion connected devices by 2015.* How many will be yours? intel.com/embedded

Choose your architecture wisely.

* Gantz, John. The Embedded Internet: Methodology and Findings, IDC, January 2009.Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and other countries. © 2009 Intel Corporation. All rights reserved.


Page 18: EDA Tech Forum Journal: June 2009

< TECH FORUM > EMBEDDED

A major problem facing embedded software developers is how to easily migrate applications onto new hardware. The risks and difficulties of migration often force developers to stick with outdated software and hardware platforms, making it difficult and costly to add new competitive features to their embedded products. Undertaking development in high-level languages (e.g., C and C++) and using RTOS platforms with open standards interfaces (e.g., POSIX) can make the transition a little easier, but there are still major porting efforts required when the hardware changes.

Much of the time, the embedded software engineer just wants to run his legacy system on the new hardware without having to change anything, and add new features alongside it to, say, offer a more modern user interface or perhaps to pick up new communication protocols to talk to the outside world.

Interestingly enough, something similar was recently needed in the IT world, where multiple applications and versions of OSs needed to run on multiple hardware platforms. The prevailing solution in IT has been using software virtualization. It provides a common software environment in which to run different ‘guest’ OSs and their applications across many hardware environments.

This virtualization—or ‘hypervisor’—technology is now available to embedded developers, offering similar benefits for the reuse and migration of legacy platforms. Another interesting benefit of using an embedded hypervisor when combined with a software separation kernel is that it allows a traditional embedded RTOS to be resident on the same physical hardware as a more traditional desktop OS like Windows or Linux without compromising the real-time performance and determinism of the system.

The separation kernel is a modern version of an RTOS that includes the ability to safely and securely partition both time and memory for the different software applications running in the system. The applications run in their own virtual machines that are protected from each other by using the separation kernel and the memory management unit of the processor. Partitioned OSs have been used in safety-critical systems (e.g., avionics applications) for a while, but the separation kernel adds two new components, security and multi-processor support, that make them still more widely applicable.

In a safety-critical system, a partitioned OS is used to guarantee that any fault condition is isolated to the application in question and cannot corrupt or contaminate other applications that are running on the same hardware. At the same time, there are guarantees that each application gets its required time slice regardless of what else is happening, based on predetermined priorities and time allocations. In a system requiring security functionality, the separation kernel must have the ability to stop malicious faults or attacks that have entered into the system through a breach in one of the applications.

The separation kernel can also help with another interesting challenge, that of embedded software development on multicore devices. The traditional RTOS has been good at distributing functionality on a single device, as it can control and marshal all the resources that the single processor can control. In a multicore system there is a need to control

Robert Day is vice president, marketing at LynuxWorks and has more than 20 years of experience in the embedded industry. He is a graduate of The University of Brighton, England, where he earned a Bachelor of Science degree in computer science.

Robert Day, LynuxWorks

Embedded software virtualization comes of age

(Diagram: three partitions on multicore hardware above the separation kernel and hypervisor; Partition 1 on core 1 runs a GUI application on a guest OS via a full virtualization API, Partition 2 on core 2 runs a new application on a new RTOS via a para-virtualization API, and Partition 3, also on core 2, runs a legacy application on a legacy RTOS via a para-virtualization API.)

FIGURE 1 New and legacy applications on a multicore system

Source: LynuxWorks

Page 19: EDA Tech Forum Journal: June 2009


New separation kernels and embedded hypervisors can help ease the pain of migrating legacy systems to new hardware platforms, including multicore processing systems. Bringing multiple OSs and applications on to the same hardware also opens up new possibilities for combining systems that offer real-time performance within a familiar GUI environment. The LynxSecure separation kernel and embedded hypervisor from LynuxWorks is one of the latest generations of this technology, and is used to illustrate these new capabilities.

and synchronize resources and applications that are spread over multiple cores. Running multiple instantiations of a traditional RTOS can deal with the spread of different applications across different cores, but is typically somewhat limited in the communication between, and the synchronization across those applications.

A well designed separation kernel can actually run one instantiation of itself in a multicore system. Using the partitioning and secure communication mechanisms described above, it can allow partitioning of the system across multiple cores and allow communication paths and synchronization mechanisms between applications regardless of which core they are running on.

Separation kernels have a great deal of flexibility and yet offer a greater degree of security than traditional embedded RTOSs, and are likely to become very widely used over a broad range of different embedded applications on many different hardware platforms. This gets really interesting when the separation kernel is married to a software hypervisor. The embedded hypervisor is designed to be smaller and more efficient than its IT cousins, but still gives the benefit of being able to run different ‘guest’ OSs on top of an RTOS.

The hypervisor and separation kernel combination gives a new dimension to embedded software development, offering the ability to run multiple guest OSs and applications securely separated by a real-time system. The combined use of multicore processors, a separation kernel and a hypervisor allows developers to partition their system with guest OSs spread over multiple cores, and retains the ability to control communications between them using a common separation kernel.

Many new multicore processors also offer hardware virtualization to allow hypervisors to more efficiently run guest OSs. Although much of this technology has been built for the IT world, it offers some very compelling benefits for embedded users. By allowing the hardware to help with software virtualization, for example the VT-d and VT-x technology provided on Intel processors, it enables the hypervisor to leave many of its instruction and data handling operations to hardware. This boosts the performance of running guest OSs to near native performance,

and does not compromise the real-time attributes of the guest applications even though they are running in a virtualized environment.

Rather like a memory management unit, hardware virtualization also prevents the guest OSs from interfering with other parts of the system without having to rely totally on software management. This increases the security of the system without degrading the real-time performance and determinism.

For many embedded systems, performance is a key factor, and this separates the new generation of embedded hypervisors from traditional IT-centric offerings. There are different types of software virtualization available to the embedded developer, offering the best control over a guest OS.

The first is not really virtualization or a hypervisor, but rather a software emulation package. This is often provided under the guise of virtualization, but it has a very important element that needs to be clarified—that of performance.

A software emulator essentially translates the machine instructions of one processor on top of the target hardware, and is available as an application that can run on top of a partitioned OS. This software emulator can then present a full emulated environment to run a guest OS on. This can be very appealing as it appears to give the same functionality as a hypervisor, but there is a huge performance hit as the emulator is having to translate each instruction for both the guest OS and the applications running on it.

A true hypervisor will assist in the running of guest OSs, but they will still be running on the processor for which they were designed, removing the need for emulation and hence increasing performance.

Continued on next page

Page 20: EDA Tech Forum Journal: June 2009

< TECH FORUM > EMBEDDED

Two types of hypervisor virtualization are available: para-virtualization and full virtualization. Para-virtualization is a term used for a guest OS that has been modified to run on top of a hypervisor. In this case, the virtualization environment on which the embedded applications run has been optimized for performance, both for the processor environment and for the hypervisor. This approach offers near-native execution performance for the embedded applications and OSs when combined with hardware virtualization extensions, and can be used to host true RTOSs in their own partitions.

Another approach that can be used with this technology is that of full virtualization. Here, the same hypervisor can offer a virtualization environment to the guest OS that is akin to running on the native hardware. This requires no changes to the guest OS, but because the hypervisor is adding more virtualization, there is a small performance hit in comparison to the para-virtualized approach. An advantage of full virtualization is that OSs where the source cannot be para-virtualized (e.g., Windows) can still be run in this embedded system.

The LynxSecure separation kernel and embedded hypervisor from LynuxWorks offers both para- and full virtualization as well as the ability to run these different types of virtualization simultaneously in the same system. This allows the embedded developer to pick and choose the type of virtualization based on what guest OSs are required and the performance aspects of different parts of the system. This elegant solution has unlimited flexibility without sacrificing the performance required by embedded systems.
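To make the mix-and-match idea concrete, the table below sketches how such a system might be described: one fully virtualized GUI partition and two para-virtualized RTOS partitions pinned to cores, echoing Figure 1. The structure, field names and values are purely illustrative and are not LynxSecure’s actual configuration format.

```cpp
// Purely illustrative partition table for a mixed para-/fully virtualized
// system on a dual-core part. Not a real product's configuration format.
#include <cstdint>

enum class VirtType { ParaVirtualized, FullyVirtualized };

struct PartitionConfig {
    const char* name;
    const char* guest;       // guest OS image to load
    VirtType    virt;        // how the hypervisor runs the guest
    uint8_t     core;        // which core the partition is pinned to
    uint32_t    mem_mb;      // private memory granted to the partition
};

constexpr PartitionConfig kPartitions[] = {
    {"gui",     "windows",     VirtType::FullyVirtualized, 1, 512},  // unmodified guest OS
    {"control", "new_rtos",    VirtType::ParaVirtualized,  2, 64},   // modified guest, near-native speed
    {"legacy",  "legacy_rtos", VirtType::ParaVirtualized,  2, 32},   // migrated legacy stack
};
```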

Returning to the original issue of code migration, this new technology not only allows applications to be migrated to new hardware, but also lets them keep running on the OSs they were originally written for. The use of the separation kernel also

(Diagram: above the LynxSecure separation kernel and embedded hypervisor, which handles exceptions and interrupts, device I/O and memory management on the hardware, sit several subjects: Subject 0 runs Windows applications on Windows via the Win32 API, Subject 1 runs Linux applications on Linux with GLIBC and middleware, Subject 2 runs ARINC 653 and POSIX applications on LynxOS-SE through POSIX/APEX and open-standards APIs, and Subjects 3-4 run a CDS guard and an application on Application Run-Time (ART) environments; each guest sits on virtual CSP/BSP/drivers, intersubject communication links them, and the layers map to user, supervisor and hypervisor privilege levels.)

FIGURE 2 The LynxSecure separation kernel and embedded hypervisor offers ultimate flexibility for new embedded designs

Source: LynuxWorks

Virtualization technology is now available to

embedded developers, offering major benefits for the reuse and migration of

legacy platforms.

Page 21: EDA Tech Forum Journal: June 2009


Page 22: EDA Tech Forum Journal: June 2009

< TECH FORUM > EMBEDDED

allows these legacy systems to be contained in their own virtual machine, and even allocated their own core in a multicore system.

Figure 1 (p. 18) shows a legacy RTOS application running in its own partition, sharing a core with a new set of applications on a new RTOS, while the other core has the GUI element with a fully virtualized guest OS like Windows for a familiar man-machine interface.

This scenario is likely to become quite common in many embedded industries from industrial controls through to medical, financial, military and automotive systems. It brings together a real-time system (e.g., data collection from a networked sensor system) with a more traditional GUI-based OS such as Windows or Linux. This combination on a single piece of hardware has always been a challenge, as both systems have individually required control of underlying hardware, which could compromise both the real-time performance aspects as well as the security and integrity of both systems. The separation kernel and hypervisor allows both the RTOS and GUI OS to each run in their own partition, possibly each on their own core in a multicore system, both protected from one another, but allowing each to securely communicate with each other.

Because the software environment is virtualized, any legacy applications will also run as before, removing the need for a large amount of recoding. The RTOS will be able to run in real time because the underlying separation kernel still provides a deterministic environment; the GUI OS will believe it has full control of the processor, and will perform as if it has its own dedicated machine. This system will then be able to offer a familiar user interface to control and display the data, while the RTOS busily collects the information without compromise.

LynuxWorks, 855 Embedded Way, San Jose, CA 95138-1018, USA

T: 1 408 979 3900
W: www.lynuxworks.com

Do you wake up at night thinking about your project?

The DV Notebook gives you real-time status in any web browser. Achilles software automatically extracts, gathers, and organizes key results from your design and verification files as they are generated.

We’ve got just what you need to see clearly each morning.

www.achillestest.com


Page 23: EDA Tech Forum Journal: June 2009

©2009 National Instruments. All rights reserved. CompactRIO, LabVIEW, National Instruments, NI, and ni.com are trademarks of National Instruments. Other product and company names listed are trademarks or trade names of their respective companies. 2009-10794-305-101-D

>> Learn how to simplify embedded design at ni.com/embedded 888 279 9833

Get to market faster and reduce development costs with graphical system design, an approach that combines open, graphical software and off-the-shelf hardware to help you quickly iterate on designs and easily implement them on an NI embedded platform. The NI CompactRIO system offers an ideal embedded prototyping platform with a built-in microcontroller, RTOS, programmable FPGA, integrated signal conditioning, and modular I/O, as well as tight integration with intuitive NI LabVIEW software.

Embedded Prototyping. Simplified.

Traditional Prototyping Tools Graphical System Design Tools


Page 24: EDA Tech Forum Journal: June 2009

< TECH FORUM > ESL/SYSTEMC

In consumer electronics, missing a market window by even a few weeks can result in drastically limited sales. These cost- and schedule-sensitive applications are at the same time among the most challenging to create. Composed of complex hardware blocks, they typically include sophisticated digital circuitry coupled with large memories to provide advanced computational and multimedia capabilities. Frequently battery-powered, they have stringent power restrictions despite the demand for each generation to support ever more features and capabilities.

With all the complexity associated with the hardware, the software is also crucial to the competitive success of these products. The application software is often the key differentiator, allowing the system company to reap substantial profit margins. Software is also increasingly important with regard to the power and performance behavior of the hardware platform.

In traditional product development flows, the software team waits to validate their code on prototype hardware. While this approach worked well in the past, it fails under current technical and time-to-market pressures. According to industry research firm Venture Development Corporation, nearly 40% of project delays can be traced back to flaws in the system architecture design and specification. This problem exists because finding and fixing hardware/software design errors at the late, physical prototype stage is both very difficult and very time-consuming.

Moving hardware/software validation to an earlier point in the design flow enables both groups to quickly model their designs, assess the functionality and attributes of the entire system, and make changes that deliver huge returns in performance, power consumption and system size without endangering deadlines. The conclusion is clear: starting application software and firmware development against a high-level hardware model can save significant development time, and yield products that meet or exceed consumer expectations.

Conducting software validation earlier
A new system design methodology is emerging in response to this pressing need for hardware/software validation early in the design cycle. It is based on the creation of high-level hardware models that describe functionality in sufficient detail for the software team to use them as a development platform, even when hardware design is in its nascent stages. Thus, software developers can start application and firmware validation during the initial stages of the design cycle, where changes are easiest, have the most impact on the characteristics of the final design, and raise little risk of jeopardizing the product launch date.

The methodology is based on a scalable transaction-level modeling (TLM) concept that describes the hardware in SystemC. It provides benefits during both hardware and

Alon Wintergreen has worked for Mentor Graphics as technical support engineer specializing in ESL and Summit Design tools since 2006. He holds a BS in electrical engineering from the Technion, the Israel Institute of Technology, Haifa.

Alon Wintergreen, Mentor Graphics

Using a TLM virtual system prototype for hardware and software validation

[Figure 1 graphic: a TLM reference platform in which a function block with associated timing and power views connects to ports through communication layers. The approach separates function from interface, separates timing and power from function for incremental refinement, and aligns with hardware and software requirements for speed and fidelity.]

FIGURE 1 A scalable TLM approach

Source: Mentor Graphics


The article describes how a methodology based around scalable transaction level modeling (TLM) techniques can be used to enable software design to begin far earlier in a design flow and thus allow companies to bring designs to market faster, particularly in time-sensitive sectors.

It is based on the creation of high-level hardware models that describe functionality in sufficient detail so that the software team can use them as a development platform, even if hardware design is in its earliest stages. Thus, software developers can even start application and firmware validation during the initial stages of the design, when changes are easiest, have the most impact on the characteristics of the final design, and there is little risk of missing a market deadline.

software development. Not only can the software team begin coding much earlier, but TLM hardware descriptions also provide much faster verification times—100x or more. On the hardware side, TLM allows for compact descriptions because the hardware system blocks are captured at a higher level and communicate by function calls, not by detailed signals, significantly reducing simulation time.

The TLM model does not limit the design creativity of the hardware team. TLM allows the separation of functionality from implementation. Hence, instead of forcing engineers to commit to hardware specifics early in the design cycle, the model simply describes the functionality of the hardware, not the details of how the hardware achieves that functionality. It also enables incremental model fidelity for timing and power. In essence, the TLM model is independent of the hardware mechanics, allowing the hardware team to continually refine the design without having to

constantly update the high-level virtual prototype. A scalable TLM approach separates function from interface, timing and power from function for incremental refinement, and aligns with hardware and software requirements for speed and fidelity (Figure 1).

At the same time, software development can align with hardware development from the very earliest stages, allowing system interaction issues to be identified and resolved from the outset, dramatically minimizing their potential impact on the schedule. As a result, this methodology moves software/hardware integration into the electronic system level (ESL).

[Figure 2 graphic: a single scalable TLM model spans the design stages (concept, virtual prototype, architecture exploration, implementation) and timing abstractions (untimed, loosely timed, approximately timed, accurately timed, RTL). Simulation speed (MIPS axis from 1K to 100M) falls from roughly 100M for functional-accurate models down to about 1K as accuracy increases through register, transaction, protocol and bit accurate.]

FIGURE 2 Ideal ESL design modeling

Source: Mentor Graphics


Using the Programmer’s View for software application validation

TLM has several levels of abstraction, all of which support virtual prototyping and hardware/software co-design. However, there are some trade-offs involved.

The very highest level, known as the Programmer’s View (PV) level, is a good point at which to begin software validation. At this stage, the SystemC hardware description does not include any timing information and therefore the simulation performance is extremely efficient—at least 1000 times faster than at the RTL level. The TLM model contains sufficient information to describe the hardware functionality to support software application development. Interface declarations are included so the software can connect with the hardware side. Specifically, there are two kinds of interfaces.

The first is a high-level method interface that the software engineer can call from within his program. The method ‘runs’ the hardware design and ‘returns’ the result value.

The second is a bus-cycle-accurate interface based on memory-mapped registers on the hardware side, allowing the hardware and software sides to interact through read and write transactions along with interrupts. Such a hardware/software interface is achieved either by incorporating an instruction set simulator (ISS) or by using a host-mode technology that uses read/write implicit access. An implicit access captures all the accesses to hardware by identifying the memory space calls. It allows software to

run on any host processor (rather than the target processor) and simplifies software programming because the software engineer does not need to instrument the code with any external API calls. Host-mode execution often offers much faster simulation with slightly less accuracy than using the traditional ISS.
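To make the first, method-style interface concrete, the following is a minimal SystemC sketch of a PV-level hardware model exposing a single untimed function call to the software side. The module, interface and method names (pv_crypto, pv_crypto_if, encrypt_block) are illustrative assumptions, not taken from the article:

// Hypothetical sketch of a PV-level "method" interface: the software model
// calls a function on the hardware model and gets the result back.
#include <systemc.h>

struct pv_crypto_if : public sc_interface {
    // One untimed function call stands in for a whole hardware operation.
    virtual unsigned int encrypt_block(unsigned int plaintext) = 0;
};

struct pv_crypto : public sc_module, public pv_crypto_if {
    SC_CTOR(pv_crypto) {}
    virtual unsigned int encrypt_block(unsigned int plaintext) {
        // Pure functionality: no clocks, no signals, no timing.
        return plaintext ^ 0xA5A5A5A5u;
    }
};

struct software_app : public sc_module {
    sc_port<pv_crypto_if> hw;            // software side binds to the PV model
    SC_CTOR(software_app) { SC_THREAD(run); }
    void run() {
        unsigned int result = hw->encrypt_block(0x12345678u);
        (void)result;                    // application code continues here
    }
};

int sc_main(int, char*[]) {
    pv_crypto hw("hw");
    software_app app("app");
    app.hw(hw);                          // bind the port to the PV model
    sc_start();
    return 0;
}

Because the call is untimed and purely functional, application code can be exercised against it long before any bus protocol or timing detail exists.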

The firmware development environment

Software teams have traditionally been forced to wait for a hardware prototype to develop the firmware because of the level of detail required for validation. However, this aspect of the hardware/software interaction can now be moved to much earlier in the design cycle by using TLM models. At this point, the hardware team should nevertheless introduce detailed timing information because of its potential influence over the behavior of the firmware.

The abstraction level is now bus-cycle-accurate, and here software engineers can decide if they want to work on the target OS (in this case they will use ISS models accompanied by the SW development tools) or on any host OS of their choice (in which case they will use bus-functional models and implicit-access functionality).

This enables the firmware code to interact through bus-functional models with the hardware design. Working in the host OS environment of choice using the cycle-accurate model, any read/write operation will be mapped to the hardware and interact with an actual address in the hardware. An example of this type of implicit access is:

[Figure 3 graphic: relative simulation speed by abstraction level: ANSI C++ system/algorithm model, about 100,000x (roughly 1 second); untimed TLM HW/SW prototype, about 10,000x; timed TLM for architectural analysis, about 1,000x; bus-cycle-accurate model for hardware verification, about 10x; RTL implementation, 1x (roughly 7 days).]

FIGURE 3 Scalable transaction level modeling

Source: Mentor Graphics


*addr1 = value1; // write access to mapped address - addr1
value2 = *addr2; // read access from mapped address - addr2

There are several specific debugging functionalities for firmware-related verification tasks. For instance, the design team can manage both hardware and software environments in one IDE tool. They also can perform debugging operations such as assigning breakpoints on both sides, and perform hardware/software transaction debugging. And they can view all the transactions (read/write/interrupts) and associated information in between hardware and software and break on any specific type of transaction or its parameters.

Selecting hardware verification methods

When it comes to hardware verification and debug, one of two approaches is usually taken.

The first involves the use of ISS models and software development environments at the highest TLM level (fast ISS models) or at the cycle-accurate level as described earlier.

The second approach adopts the emulation of software threads within the SystemC hardware design. As opposed to the previous methods where software is linked through an ISS or host mode, here it is embedded within the hardware CPU model as additional SystemC threads that execute directly with the hardware in one simulation process.

This second option is used specifically for system performance exploration since it offers very high simulation speed while being less accurate with no support for an RTOS. In that approach, which is used mainly by system architects, it is also possible to use ‘token-based’ modeling, which allows high simulation performance.

In the first approach, the PV and the cycle-accurate model can also interact with SystemC verification solutions. They can be connected to existing ISS SystemC models, either at the PV level or cycle-accurate ISS solutions at TLM’s Verification View level. Software developers can work on the real target OS if the host-mode is not accurate enough for them. If the ISS model (or models) and associated software development tools can be fully synchronized with the SystemC hardware description of the system, the target software development can also start earlier in the design cycle.

In the second approach, we define a sub-level of abstraction, which is called the Architect’s View. It includes some timing information, simulates faster than cycle-accurate models, but is not as accurate. This level is mainly used by system architects for performance analysis. Here, the methodology includes a set of configurable hardware models at that abstraction level (e.g., generic buses, generic processor, generic DMA, data generators, etc.). Using this methodology, the system architect can define hardware and software partitioning as well as target processors, bus architectures

and memory hierarchies. Equally important, he or she can add timing and power metrics.

This level also supports token-based modeling, an abstract high-level modeling method that uses tokens (or pointers) to represent the data structure, resulting in an exceptionally fast simulation performance, an important requirement for system performance analysis.

In addition, performance analysis functionalities can be used with custom models, so that system architects can run software emulation as a testbench for the system performance analysis task. Think of this as a software emulation that runs as SystemC threads: it is part of the hardware simulation, but runs extremely fast. This capability can be used by the system architect at the highest level to find the best architecture. The tokens or pointers result in very fast modeling for use in measuring system performance. The system engineer can manipulate the parameters of the different blocks and test various configurations and use cases until reaching the required performance.
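A minimal sketch of what token-based modeling can look like in SystemC follows. Only a small descriptor (pointer plus size) travels through the channel, so the simulator never copies the payload; the struct, module names and rate numbers are illustrative assumptions rather than anything prescribed by the methodology:

// Illustrative sketch of token-based modeling: only a descriptor of the
// data moves through the model, not the data itself.
#include <systemc.h>

struct token {
    const unsigned char* data;   // pointer to the real payload (never copied)
    unsigned int         bytes;  // payload size, used for timing calculations
};

inline std::ostream& operator<<(std::ostream& os, const token& t) {
    return os << "token(" << t.bytes << " bytes)";   // needed by sc_fifo
}

SC_MODULE(producer) {
    sc_fifo_out<token> out;
    SC_CTOR(producer) { SC_THREAD(run); }
    void run() {
        static unsigned char frame[64 * 1024];
        for (int i = 0; i < 4; ++i) {
            token t;
            t.data  = frame;
            t.bytes = sizeof(frame);
            out.write(t);                // hand over the descriptor only
            wait(10, SC_MS);             // next frame 10ms later
        }
    }
};

SC_MODULE(consumer) {
    sc_fifo_in<token> in;
    SC_CTOR(consumer) { SC_THREAD(run); }
    void run() {
        for (;;) {
            token t = in.read();
            // Charge bus time in proportion to the token's size, e.g. 100 bytes/us.
            wait(sc_time(t.bytes / 100.0, SC_US));
        }
    }
};

int sc_main(int, char*[]) {
    sc_fifo<token> chan(4);
    producer p("p");
    consumer c("c");
    p.out(chan);
    c.in(chan);
    sc_start();
    return 0;
}

The consumer charges simulated time in proportion to each token's size, which is how such abstract models can still yield useful bus-occupancy and performance figures at very high simulation speed.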

Integrating development

In markets that are extremely sensitive to cost and schedule slips, hardware and software teams need to work together from the very outset to meet product launch windows. The emerging and scalable TLM methodology described above moves software and firmware validation to the earliest stages of the design cycle, benefiting both teams. Software designers can now validate their applications and firmware long before hardware prototypes are available. At the same time, the hardware team can concentrate on hardware refinement without having to continually update models for the software validation.

By aligning the software and hardware flows at the earliest point possible, this approach minimizes integration risks downstream in the design flow. The result is a significantly reduced threat of schedule slips even as the design team maximizes product differentiation. The use of scalable TLM models is a crucial step in bridging software and hardware design methodologies, bringing us closer to the ultimate goal: true concurrent design.

Mentor Graphics
Corporate Office
8005 SW Boeckman Rd
Wilsonville, OR 97070
USA

T: +1 800 547 3000
W: www.mentor.com


< TECH FORUM > ESL/SYSTEMC

Cycle-accurate models very precisely predict the behavior and performance of hardware and software components. Behavioral and performance transaction-level models (TLMs) enable hardware/software partitioning decisions to be made at the electronic system level (ESL) early in the development phase, long before cycle-accurate models will be available. The problematic gap between these different types of models is known as the ESL implementation gap.

Figure 1 offers a proven methodology that bridges the gap from SystemC TLMs for architectural exploration to SystemC cycle-accurate models of hardware. The behavior of the cycle-accurate models can be verified in the complete system by comparing it with the reference TLMs. The reference model

of the complete system serves as a testbench for the verification and integration of the cycle-accurate models.

This flow allows designers to evaluate alternative architectures early in a project with low modeling effort, and integrate cycle-accurate models later on to progressively replace the TLMs. Exact timing and performance properties obtained from simulating cycle-accurate models (e.g., power consumption, resource load) are used to back-annotate the reference models. This increases the level of confidence in decisions made when exploring the design space.

JPEG system application modeling

A JPEG system will be used to demonstrate the design flow. It consists of a still camera and a JPEG encoder/decoder (Figure 2). The Camera includes a Controller and two image Sensors. The JPEG encoder/decoder consists of two subsystems (JPEG1 and JPEG2), each processing the images acquired by a sensor. The subsystems consist of three functions: MBConverter, MBEncoder and

JPEGSave. The structures of the two subsystems are different for MBEncoder. The test case focuses on the transaction-level and cycle-accurate modeling of the MBEncoder1 and MBEncoder2 functions, the mapping of these functions onto a multiprocessor platform, and the resulting performance of multiple candidate architectures in terms of latency, utilization and power consumption.

MBEncoder1 is modeled with one computational block (Pixelpipe), as shown on the left of Figure 3. This function contains the entire JPEG sequential C code, which constitutes the reference algorithm tested on a PC.

Jérôme Lemaitre is a solution specialist at CoFluent Design. His fields of interest include architectural exploration and performance analysis, MPSoC/NoC design and rapid prototyping. He completed a postgraduate degree in 2002 at ESPEO, University of Orléans, France, majoring in embedded systems.

Jérôme Lemaitre, CoFluent Design

Bridging from ESL models to implementation via high-level hardware synthesis

[Figure 1 graphic: in CoFluent Studio, a mixed graphical and textual specification with a library of timing, power, load, memory and cost properties drives automatic SystemC generation of a transaction-level reference model with timed behavior and performance data, supporting software development and architecture exploration. Catapult hardware behavioral synthesis (loop unrolling, pipelining) produces cycle-accurate models whose behavior is validated and whose real performance, obtained by SW/HW co-simulation, is back-annotated into the reference model during integration and test on the way to the product.]

FIGURE 1 Combining ESL models and cycle-accurate models of HW tasks

Source: CoFluent Design


The article describes a methodology that bridges the gap between SystemC transaction-level models (TLMs) that are used for architectural exploration and SystemC cycle-accurate models of hardware that typically follow much later in a design flow, after many sensitive decisions have been made.

The behavior of the cycle-accurate models can be verified in the complete system by comparing it with the reference TLMs. The reference model of the complete system then serves as a testbench for the verification and integration of the cycle-accurate models.

This flow allows designers to evaluate alternative architectures early in a project with low modeling effort, and they can integrate cycle-accurate models later on to progressively replace the TLMs.

Exact timing and performance properties obtained from simulating cycle-accurate models (e.g., power consumption, resource load) are used to back-annotate the reference models. This increases the level of confidence in decisions made when exploring the design space.

The methodology is illustrated via a case study involving a JPEG system, using Mentor Graphics’ Catapult Synthesis and CoFluent Design’s CoFluent Studio tools to provide a complete ESL flow, from architectural exploration to hardware implementation.

MBEncoder2 has the same functionality as MBEncoder1. The difference is the granularity: MBEncoder2 is more detailed in order to optimize its implementation, as shown on the right of Figure 3. The separation enables the optimization of image processing by introducing parallelism in the application. Mapping these functions onto a hardware or software processor enables the exploration of their behavior and performance.

Functional TLMs of the encoders

In the TLMs of the encoders, the behavior of the Pixelpipe, DCT, Quantize and Huffmanize blocks is implemented by calling procedures that execute sequential C code provided by Mentor Graphics. This C code operates on algorithmic C bit-accurate data types. These allow you to anticipate the replacement of the reference model with the cycle- and bit-accurate model obtained after the hardware undergoes high-level synthesis with Mentor’s Catapult C software.
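As a point of reference, the following is a minimal, hypothetical fragment showing what code written on algorithmic C bit-accurate data types looks like. It assumes Mentor's Algorithmic C datatype header (ac_int.h) is on the include path and is not taken from the actual JPEG source:

// Minimal illustration of algorithmic C bit-accurate types (ac_int).
// The function below is a made-up stand-in, not the actual JPEG code.
#include <ac_int.h>

typedef ac_int<8,  false> pixel_t;    // 8-bit unsigned sample
typedef ac_int<12, true>  coeff_t;    // 12-bit signed intermediate value

coeff_t level_shift(pixel_t p) {
    // JPEG-style level shift: subtract 128 before the DCT.
    return coeff_t(p) - 128;
}

Because the same bit widths are used before and after Catapult C synthesis, the bit-accurate reference results can be compared directly against the synthesized, cycle-accurate model.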

These are the execution times measured in CoFluent Studio for the functions under consideration:

Computation block    Average execution
DCT                  25.40µs
Quantize             26.09µs
Huffmanize           113.60µs
Pixelpipe            152.06µs

These numbers are obtained by calibrating the execution of the application on a 1GHz PC. The measurements provide initial timing information that is precise enough to map the corresponding functions onto software processing units (e.g., CPU, DSP). To map these functions onto hardware processing units (e.g. ASIC, FPGA), more accurate numbers can be obtained from high-level hardware synthesis.

Although the execution time of Pixelpipe is shorter than the sum of the execution times of DCT, Quantize and Huffmanize, the processing of a complete image is shorter with MBEncoder2 (439ms) than with MBEncoder1 (536ms). This is because the DCTMB, QuantizeMB and HuffmanizeMB functions are pipelined, whereas MBEncoder1 has to complete the processing of a macro-block before accepting a new one.

FIGURE 2 JPEG system

Source: CoFluent Design

[Figure 3 graphic: MBEncoder1 (left) contains a single Pixelpipe block between MBFromConverter1 and MBBitStream1. MBEncoder2 (right) chains DCTMB, QuantizeMB and HuffmanizerMB behaviors, linked by MBDCT and MBQuant buffers, between MBFromConverter2 and MBBitStream2.]

FIGURE 3 Structure of MBEncoder1 (left) and MBEncoder2 (right)

Source: CoFluent Design


Also, the processing speed of the pipeline in MBEncoder2 is limited by the HuffmanizeMB function, since it has the longest execution time in the pipeline.

The operations are verified by visualizing images in CoFluent Studio and reviewing the timing properties as shown in Figure 4. Simulating one second of data with the parallel processing of two images of 200x128 pixels at the transaction level requires only a few seconds of actual simulation time.

Platform modeling

The complete JPEG application is mapped onto the platform model shown in Figure 5. It consists of an ExternalPlatform, modeled as a hardware processing unit, and a JPEGPlatform.

CoFluent Studio offers generic models of hardware elements. These processing, communication and storage units are characterized by high-level behavioral and performance properties that are parameterized to represent physical parts. Multiple and various platform models can be described

quickly, without investing in the expensive development or acquisition of models of specific IP components, such as memories, buses or processors. Simulation of software is not based on instruction-set simulators, as the C code used to describe algorithms executes natively on the simulation host.

The FPGA has a speed-up defined as a generic parameter named FPGA_SpeedUp, which can vary from 1 to 250, with a default of 10. This parameter represents the hardware acceleration. The speed-up of the DSP is set to 2, meaning that the internal architecture of the DSP is twice as efficient as a general-purpose processor, due to specialized embedded instructions.

The test case maps MBEncoder1 and MBEncoder2 onto the FPGA and DSP, with exploration of multiple mapping alternatives. The following assumptions were used: the Camera model is mapped onto the External platform, while MBConverters and JPGSave are mapped onto the CPU with execution times short enough not to delay the DSP and FPGA.

Average execution times can now be updated as follows:

Computation block   SW execution (KCycles)   HW execution (KCycles)
DCT                 25.40/2                  25.40/FPGA_SpeedUp
Quantize            26.09/2                  26.09/FPGA_SpeedUp
Huffmanize          113.60/2                 113.60/FPGA_SpeedUp
Pixelpipe           152.06/2                 152.06/FPGA_SpeedUp

The power consumption for each computation block is described using a simplified law that utilizes the FPGA_SpeedUp parameter. A higher speed-up on the FPGA uses more gates, and therefore increases the power consumption. The power consumption equations are:

FIGURE 4 Verification of the functional behavior in CoFluent Studio

Source: CoFluent Design

[Figure 5 graphic: the JPEGPlatform contains a CPU, a DSP and an FPGA connected by a shared bus with shared memory, and communicates with the ExternalPlatform through a serial link, a dual-port memory and a local bus.]

FIGURE 5 Platform model.

Source: CoFluent Design

TABLE 1 Initial performances of the DSP and FPGA in Configuration C

Source: CoFluent Design


Computation block   Power consumption (mW)
DCT                 0.2 * FPGA_SpeedUp^(3/2)
Quantize            0.15 * FPGA_SpeedUp^(3/2)
Huffmanize          0.2 * FPGA_SpeedUp^(3/2)
Pixelpipe           0.25 * FPGA_SpeedUp^(3/2)
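For illustration, the simplified law above is easy to evaluate directly. The small helper below is an assumption of this rewrite (not CoFluent Studio code) that reproduces the numbers used in the exploration:

// Power grows with the FPGA speed-up as coeff * speedup^(3/2),
// per the simplified law in the table above.
#include <cmath>
#include <cstdio>

double block_power_mw(double coeff_mw, double fpga_speedup) {
    return coeff_mw * std::pow(fpga_speedup, 1.5);
}

int main() {
    // At the default FPGA_SpeedUp = 10: DCT ~6.3mW, Quantize ~4.7mW,
    // Huffmanize ~6.3mW, Pixelpipe ~7.9mW.
    std::printf("%.1f %.1f %.1f %.1f\n",
                block_power_mw(0.20, 10.0), block_power_mw(0.15, 10.0),
                block_power_mw(0.20, 10.0), block_power_mw(0.25, 10.0));
    return 0;
}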

Mapping and architecture modeling

Architecture description

One image is fed every 500ms simultaneously to MBEncoder1 and MBEncoder2. Here is a comparison of the three configurations:

Function       Config. A   Config. B   Config. C
MBEncoder1     FPGA        DSP         FPGA
DCTMB          DSP         FPGA        DSP
QuantizeMB     DSP         FPGA        DSP
HuffmanizeMB   DSP         FPGA        FPGA

By studying the impact of FPGA_SpeedUp on the performance of the system in terms of latencies, resource utilization and power consumption, the best architecture and the minimum value required for the FPGA_SpeedUp generic parameter can be selected.

CoFluent Studio’s drag-and-drop mapping operation is used to allocate functions to processing units and route data through communication channels. The resulting architectural models are automatically generated in SystemC.

Profiling data is automatically collected for each configuration at all hierarchical levels during simulation. The simulations are based on individual properties, described using constants or C variables. This information is displayed in tables, as shown in Table 1. The utilization of the FPGA increases to 200% because two functions (MBEncoder1 and HuffmanizeMB) can be executed in parallel on the FPGA.

Initial exploration results

Early exploration results are based on initial timing properties measured by simulating the reference model. Results in terms of utilization and power consumption on the DSP and FPGA, and processing latencies for the two JPEG encoders/decoders, are given for the default case (FPGA_SpeedUp = 10). Configuration C processes both images with the shortest latencies.

                                     Config. A   Config. B   Config. C
Latencies (ms)    Path w. Encoder1   182         366         182
                  Path w. Encoder2   400         136         136
Utilization (%)   DSP                80.01       73.23       25.22
                  FPGA               36.64       39.77       63.85
Power cons. (mW)  DSP                30.60       33.09       8.53
                  FPGA               2.93        2.55        4.71

FIGURE 6 Dynamic power consumption

Source: CoFluent Design

FIGURE 7 Effects of FPGA speed-up generic parameter on latencies and power consumption

Source: CoFluent Design

[Figure 8 graphic: a wrapper sits between the transaction-level channels (MBFromConverter2, MBDCT) and the cycle-accurate SystemC DCT, driving its Clock, Reset, Request, Acknowledge, DataIn, Enable, Address and DataOut pins.]

FIGURE 8 Integrating the DCT cycle-accurate model in CoFluent Studio

Source: CoFluent Design


Figure 6 shows that, on average, Configuration C consumes less power than the two other configurations. However, the power consumption on the FPGA is higher for Configuration C.

In CoFluent Studio, it is possible to explore the impact of generic parameters at the system level for multiple architectures with a single simulation. The results for all configurations are collected and displayed in the same environment. This allows for rapid comparison of architectures.

Figure 7 shows the impact of FPGA_SpeedUp on latencies and power consumption. For Configuration C, MBEncoder2 becomes the bottleneck, since the system performance is limited by the DSP. The simulations show that FPGA_SpeedUp = 15 is the minimum and optimal value, and should be set as the objective for the hardware high-level synthesis tool.

Calibration of the reference model

In the previous section, the JPEG system was modeled at the transaction level, and system-level decisions were made based on the initial exploration results. In this section, cycle-accurate models obtained from Catapult C hardware high-level synthesis are integrated back into CoFluent Studio for further verification and refinement.

Functional cycle-accurate models

Using Catapult C, the sequential C code that is executed in the computation blocks is converted into SystemC cycle-accurate code. The resulting code is integrated back into CoFluent Studio to verify the behavior of the cycle-accurate models against the reference TLMs. Then, the timing and performance properties of the cycle-accurate models are extracted through simulation to calibrate the architecture exploration process for functions that are mapped onto hardware units.

In order to integrate the cycle-accurate models back into CoFluent Studio, SystemC wrappers are created (Figure 8). They convert incoming transaction-level data to cycle-accurate data that is processed by the detailed model, and vice versa. The wrappers handle interfaces and protocols specific to the detailed model, such as handshakes and memories.
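The skeleton below suggests what such a wrapper can look like in SystemC. The pin names follow Figure 8, while the macro-block size, data widths and handshake details are illustrative assumptions rather than the actual CoFluent-generated code:

// Skeleton of a wrapper in the spirit of Figure 8: it accepts a whole
// macro-block as one transaction, drives it pin by pin into a
// cycle-accurate core, then collects the result.
#include <systemc.h>

SC_MODULE(dct_wrapper) {
    sc_in<bool>          clock;
    // Pin-level side, toward the cycle-accurate SystemC DCT.
    sc_out<bool>         request, enable;
    sc_in<bool>          acknowledge;
    sc_out<sc_uint<8> >  address;
    sc_out<sc_int<12> >  data_in;
    sc_in<sc_int<16> >   data_out;
    // Transaction-level side, toward the reference model.
    sc_fifo_in<int*>     mb_in;       // pointer to a 64-sample macro-block
    sc_fifo_out<int*>    mb_out;

    SC_CTOR(dct_wrapper) { SC_THREAD(run); sensitive << clock.pos(); }

    void run() {
        for (;;) {
            int* mb = mb_in.read();             // TLM transaction arrives
            request.write(true);
            do { wait(); } while (!acknowledge.read());
            for (int i = 0; i < 64; ++i) {      // unpack it cycle by cycle
                address.write(i);
                data_in.write(mb[i]);
                enable.write(true);
                wait();
            }
            enable.write(false);
            for (int i = 0; i < 64; ++i) {      // repack the pin-level result
                address.write(i);
                wait();
                mb[i] = data_out.read();
            }
            request.write(false);
            mb_out.write(mb);                   // transaction goes back out
        }
    }
};

The reference TLM on one side of such a wrapper acts as the testbench, so the detailed model can be checked without writing any new stimulus.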

It took one day to wrap the detailed models and integrate them into CoFluent Studio. The verification task is simplified since the reference TLM is used as a testbench. The processing of macro-blocks and images can be displayed directly in CoFluent Studio. However, the simulation speed is slower. For this example, it is approximately 400 times slower than the transaction-level simulation.

These are the exact properties of the cycle-accurate operations in terms of the (measured) number of clock cycles and the (assumed) power consumption:

Function       Average exec (clock cycles)   Average power cons. (mW)
DCTMB          161                           1000
QuantizeMB     72                            800
HuffmanizeMB   576                           1200
MBEncoder1     303                           1400

                                      Config. A        Config. B        Config. C
Latencies (ms)    Path with Encoder1  6 (7)            366 (366)        6 (7)
                  Path with Encoder2  400 (400)        9 (5)            126 (126)
Utilization (%)   DSP                 80.01 (80.01)    73.23 (73.23)    25.22 (25.22)
                  FPGA                1.21 (1.46)      3.38 (1.59)      3.07 (2.55)
Power cons. (mW)  DSP                 30.60 (30.60)    33.09 (33.09)    8.53 (8.53)
                  FPGA                10.66 (14.43)    23.28 (12.04)    27.73 (23.05)

TABLE 2 Exploration after synthesis and back-annotation

Source: CoFluent Design

FIGURE 9 Configuration C dynamic power consumption profiles, before and after back-annotation

Source: CoFluent Design

FIGURE 10 Power consumption when mapping two encoders on the FPGA

Source: CoFluent Design


The back-annotation of the timing information leads to more accurate performance results during the design-space exploration phase.

As in the reference model, HuffmanizeMB is the slowest function in the pipeline in MBEncoder2. This is due to the fact that the cycle-accurate model of the HuffmanizeMB function does not read/write elements of a macro-block continuously whereas the three other models do.

Performance analysis after calibration

In order to explore the performance of the same architectures, the reference model is back-annotated with exact properties of the detailed models. Since timing properties are exact, the value of the FPGA_SpeedUp parameter is set to 1 for this new iteration in the architecture exploration phase.

As shown in Table 2, the speed-up obtained after high-level synthesis is approximately 250. Values obtained based on the reference model for FPGA_SpeedUp = 250 are indicated between brackets for comparison. The metrics of interest converge toward similar values from the reference model for all architectures, confirming design decisions made early, based on the reference transaction-level model.

Configuration C leads to the shortest latencies. As predicted with the reference transaction-level model, the bottleneck for Configuration C is the DSP, since the real speed-up exceeds 15. Configuration C permits each encoder to process two images in parallel every 126ms. As shown in Figure 9, this leads to a peak power consumption of almost 1W on the FPGA. For comparison, the dynamic profiles returned by the reference model with FPGA_SpeedUp = 15 are also shown. The configuration with FPGA_SpeedUp = 15 can process the same number of images, with a lower peak power consumption (approximately 25mW).

Maximizing encoding performance

By optimizing the latencies in the back-annotated model by mapping both MBEncoder1 and MBEncoder2 onto the FPGA, the DSP limitation is avoided. In this configuration, two images can be processed in parallel in 9ms: each encoder processes more than 100 images per second. The resulting average power consumption is higher than 1.6W on the FPGA. Figure 10 shows the case where the two encoders receive an image every 10ms.

The exact durations of the detailed models are indicated in Figure 10, highlighting that the bottleneck is the HuffmanizeMB function in the pipeline. This function must be synthesized differently to reach the execution time of 3030ns of the MBEncoder1 function, leading to approximately 150 images per second.

Conclusion

Joining TLMs and cycle-accurate models obtained after high-level hardware synthesis using Mentor Graphics’ Catapult C Synthesis for architecture exploration in CoFluent

Design’s CoFluent Studio provides a complete ESL flow, from architectural exploration to hardware implementation. With the ‘implementation gap’ closed, designers can benefit from the architectural exploration and profiling completed early in the design cycle.

The design compared the utilization ratio (resource load), processing latency and dynamic and average power consumption of the three configurations. Generic parameterized models of platform elements and the drag-and-drop mapping tool allow quick completion of initial architectures. Once the impact of a generic parameter that represents the hardware acceleration was analyzed, the minimum value required for that parameter to optimize both latencies and power consumption was found.

Reference C algorithms are converted to SystemC cycle-accurate models using the Catapult C synthesis tool. The resulting cycle-accurate models are integrated back into CoFluent Studio to refine the TLMs for those functions that map onto hardware processors. Wrapping SystemC around cycle-accurate models enables the transaction-level models to interface with cycle-accurate models. The behavior of the wrapped, detailed models was verified within CoFluent Studio against the behavior of the reference model, which served as a testbench.

Back-annotated timing properties of the reference TLM are based on exact timing obtained by simulating the detailed models. The back-annotated model is used to explore the same architectures as with the reference model. Reaching the same conclusions confirms that decisions can be made early and with a high level of confidence based on the reference transaction-level model. This also confirms that external SystemC models—hand-written or synthesis result—can be easily integrated into CoFluent Studio, but cycle-accurate models should only be used for validation and calibration, and be replaced by their transaction-level equivalent models to maintain simulation efficiency.

Acknowledgements

The author would like to thank Thomas Bollaert from Mentor Graphics for providing the sequential C code of the JPEG application as well as for the four detailed models generated using Catapult C.

CoFluent Design
24 rue Jean Duplessis
78150 Le Chesnay
France

T: +33 139 438 242

W: www.cofluentdesign.com


< TECH FORUM > VERIFIED RTL TO GATES

1. Introduction

With the rapid scaling down of VLSI feature sizes, the huge size of today's designs is making transistor-level circuit simulation extremely time-consuming. The evolution of hardware technology has made quad-core systems already available in mainstream computers, and higher core counts are planned. The shift from single-core to multicore processing is creating new challenges and opportunities for circuit simulation. Parallel computation is becoming a critical part of circuit simulators designed to handle large-scale VLSI circuits.

Several parallel circuit simulation techniques have been proposed. The Siemens circuit simulator TITAN [6] partitions the circuit by minimizing the total wire length for the circuit, and parallel simulation is performed by using a non-overlapping domain decomposition technique. However, the efficiency of the approach can deteriorate quickly as the size of interface increases. Weighted graphs and graph decomposition heuristics are used in the Xyce parallel circuit simulator [7] to partition the circuit graph to achieve load balance and minimize communication costs. The waveform relaxation technique has been proposed for parallel circuit simulation [8], however, it is not widely used because it is only applicable to circuits with unidirectional signal flow.

Parallel domain decomposition methods have been developed for the simulation of linear circuits such as power ground networks [2]. All these methods are mainly based on parallel matrix solving and device model evaluation; they are either incapable of handling circuits with nonlinear components or their performance drops quickly for circuits with nonlinear components as the number of processor cores and circuit size increase. WavePipe [3] complements previous methods by extending classical time-integration methods in SPICE. It takes fundamental advantage of multi-threading at the numerical discretization level and exploits coarse-grained application-level parallelism by simultaneously computing circuit solutions at multiple adjacent time points. MAPS [4] explores inter-algorithm parallelism by starting multiple simulation algorithms in parallel for a given task.

The method proposed here is a fully parallel circuit simulation for full-chip transient circuit analysis. It can simulate large industrial designs in parallel achieving speedups of orders of magnitude over SPICE without jeopardizing accuracy. It provides orthogonal improvements over methods like WavePipe [3] and MAPS [4] and complements other parallel matrix solver and/or parallel device model evaluation-based approaches in a number of ways.

1. It can perform transistor-level full-chip circuit simulation for general circuit designs with interconnect, clock, power and ground, and analog components with capacitive and inductive couplings.

2. The circuit is partitioned into a linear subdomain and

This paper is based on one originally presented at Design Automation and Test in Europe 2009, in Nice, France.

He Peng and Chung-Kuan Cheng, University of California, San Diego

Parallel transistor-level full-chip circuit simulation

[Figure 1 graphic: (a) the original circuit, two gates surrounded by coupled power, ground and signal/clock networks; (b) and (c) the two nonlinear subdomains, each containing one gate and its signal/clock networks but excluding the power and ground networks.]

FIGURE 1 Circuit partition example. (a) Original circuit (b) Nonlinear subdomain 1 (c) Nonlinear subdomain 2

Source: UCSD


The paper presents a fully parallel transistor-level full-chip circuit simulation tool with SPICE accuracy for general circuit designs. The proposed overlapping domain decomposition approach partitions the circuit into a linear subdomain and multiple nonlinear subdomains based on circuit nonlinearity and connectivity. A parallel iterative matrix solver is used to solve the linear domain while nonlinear subdomains are distributed in parallel across different processors in topological order and solved by a direct solver.

To achieve maximum parallelism, device model evaluation is also done in parallel. A parallel domain decomposition technique is used to iteratively solve the different partitions of the circuit and ensure convergence. The technique is several orders of magnitude faster than SPICE for sets of large-scale circuit designs on up to 64 processors.

multiple nonlinear subdomains. Overlapping domain decomposition is used as an interface to ensure convergence. The linear subdomain is further partitioned by ParMETIS [9] and solved in parallel by the iterative solver GMRES with BoomerAMG as the preconditioner to achieve parallel scalability.

3. Convergence is further improved by solving different nonlinear subdomains according to their circuit topological order.

4. To achieve maximum parallelism, device model evaluation is done in parallel.

5. SPICE accuracy is guaranteed by using the same convergence check and dynamic time stepping techniques as Berkeley SPICE3f5.

This paper is an extension of a method proposed in [5], with the following major improvements:

1. An improved parallel simulation flow with new parallel device model evaluation, numerical integration and linearization steps that increase parallel scalability of the proposed approach.

2. A new and improved parallel implementation that runs on up to 64 processors.

The paper is organized as follows. Section 2 presents the parallel simulation approach in detail as well as the overall simulation flow. Experimental results are then described in Section 3. Finally, we offer our conclusions and proposals for future research.

2. Parallel domain decomposition simulation

A. Domain decomposition partition and parallel graph partition

The proposed approach reads the circuit description in SPICE format and partitions the circuit at linear and nonlinear boundaries at gate level. The circuit is partitioned into a linear subdomain and multiple nonlinear subdomains. First, we partition the circuit into different nonlinear partitions (i.e., subdomains) based on circuit nonlinearity and connectivity. Figure 1(a) shows an example of two NAND gates surrounded by coupled power, ground and signal networks. Figure 1(b) and 1(c) show two different nonlinear subdomains. The nonlinear subdomains include nonlinear functional blocks

as well as input/output signal networks connected to these functional blocks. Power and ground networks are not included because we need to make the nonlinear subdomains small enough to be solved efficiently by a direct solver. Partitions that break feedback loops are avoided. Since these feedback loops usually are not very large, it is feasible to keep them in a single nonlinear subdomain.

Once the circuit is partitioned into different nonlinear subdomains Ω1, ..., Ωk, we add the whole circuit as a linear subdomain Ω0. Asymmetry in the system matrix of Ω0 caused by nonlinear components in the circuit is removed as described in 2C in order to improve the matrix properties of Ω0. This allows us to use parallel iterative matrix solvers to solve the linear subdomain efficiently. We use the parallel graph partition tool ParMETIS [9] to further partition the linear subdomain Ω0 to achieve parallel load balance and make the proposed approach scale with the number of processors.

[Figure 2 graphic: the system matrix A partitioned into a matrix A0 and overlapping sub-matrices A1 and A2.]

FIGURE 2 Overlapping domain decomposition partition of system matrix A

Source: UCSD


B. Gate decoupling and topological order simulation

Because of nonlinear elements in the circuit, the system matrix for linear subdomain Ω0 is not symmetrical and hence it is unsuitable for fast parallel linear solvers. The main asymmetry in the matrix comes from the gate-source and gate-drain coupling in device models. We move this coupling from the system matrix of the linear subdomain to the right hand side of the linear system. For example, the following sub-matrix demonstrates gate-drain coupling in the BSIM3 model:

\[
\begin{bmatrix} a_{gg} & a_{gd} \\ a_{dg} & a_{dd} \end{bmatrix}
\begin{bmatrix} V_g \\ V_d \end{bmatrix}
=
\begin{bmatrix} I_g \\ I_d \end{bmatrix}
\]

We move the off-diagonal terms in the matrix to the right hand side:

\[
\begin{bmatrix} a_{gg} & 0 \\ 0 & a_{dd} \end{bmatrix}
\begin{bmatrix} V_g \\ V_d \end{bmatrix}
=
\begin{bmatrix} I_g - a_{gd}\,V_d^{\,t} \\ I_d - a_{dg}\,V_g^{\,t} \end{bmatrix}
\]

where $V_d^t$ and $V_g^t$ are the solutions at the previous iteration.

This process simplifies the linear subdomain and improves the matrix properties of the system matrix. With this simplification, we could use parallel solvers like GMRES to solve the matrix efficiently. The linear subdomain Ω0, which is the entire circuit with simplification, is then partitioned by ParMETIS and solved in parallel using GMRES with BoomerAMG as a preconditioner. Nonlinear subdomains are evenly distributed into different processors and solved by the direct solver KLU [13].

To increase the convergence rate, we generate a topological order from primary inputs and flip-flops in the circuit and solve nonlinear subdomains according to this order. Feedback loops are reduced into a single node when the topological order is generated. Convergence is achieved by using parallel domain decomposition techniques, which will now be introduced in 2C.
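As a sketch of how such an order can be produced (assuming, as described above, that feedback loops have already been collapsed into single nodes), a plain Kahn-style topological sort over the subdomain fanout graph is enough; the function below is illustrative, not taken from the paper's implementation:

// Topological ordering of the nonlinear subdomains: sources (primary
// inputs and flip-flop outputs) come first, then their fanout.
#include <queue>
#include <vector>

std::vector<int> topo_order(const std::vector<std::vector<int> >& fanout) {
    const int n = (int)fanout.size();
    std::vector<int> indeg(n, 0), order;
    for (int u = 0; u < n; ++u)
        for (size_t k = 0; k < fanout[u].size(); ++k) ++indeg[fanout[u][k]];
    std::queue<int> q;
    for (int u = 0; u < n; ++u)
        if (indeg[u] == 0) q.push(u);        // driven only by primary inputs / flip-flops
    while (!q.empty()) {
        int u = q.front(); q.pop();
        order.push_back(u);
        for (size_t k = 0; k < fanout[u].size(); ++k)
            if (--indeg[fanout[u][k]] == 0) q.push(fanout[u][k]);
    }
    return order;                            // subdomains are solved in this order
}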

C. Parallel domain decomposition techniques for circuit simulation

Domain decomposition methods refer to a collection of divide-and-conquer techniques that have been primarily developed for solving partial differential equations [10], [11]. The partition method introduced in 2A generates overlapping subdomains. For example, as shown in Figure 1, nonlinear subdomains 1 and 2 overlap as they share the same signal/clock network 2.

Partitioning the circuit into a linear subdomain Ω0 and K overlapping nonlinear subdomains Ω1, ..., ΩK is equivalent to partitioning the system matrix A of the circuit into a matrix A0 and K overlapping sub-matrices A1, ..., AK as shown in Figure 2, where Ai is the matrix representing subdomain Ωi.

Schwarz Alternating Procedure [11] is used to iteratively solve the linear system Ax=b. The algorithm is described below. The proposed method first solves the linear subdomain Ω0. The linear subdomain is partitioned by ParMETIS and solved in parallel using GMRES with BoomerAMG as a preconditioner. Next, all nonlinear subdomains Ω1, ..., ΩK are distributed into different processors according to their topological order and solved by the direct solver KLU. Residue values at a subdomain boundary are updated as soon as the solution for a given subdomain is available. The Schwarz Alternating Procedure continues until convergence is reached.

Schwarz Alternating Procedure
1. Input: matrices A, A0, A1, ..., AK, right hand side b
2. Output: solution x
3. Choose initial guess x
4. Calculate residue r = b − Ax
5. repeat
6.   for i = 0 to K do
7.     Solve Ai δi = ri
8.     Update solution x: xi = xi + δi
9.     Update residue values on boundary
10.  endfor
11. until convergence
12. Output x
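A compact, purely sequential sketch of this loop is shown below. It stands in for the real flow only conceptually: the subdomain solves here use a tiny dense elimination instead of GMRES/BoomerAMG and KLU, and everything runs in one process rather than being distributed over MPI; the toy 3x3 system and the two overlapping index sets are invented for illustration:

// Sequential sketch of the Schwarz alternating loop above. Subdomains are
// index sets into the full matrix; each local solve uses a small dense
// Gaussian elimination as a stand-in for the solvers used in the paper.
#include <cstdio>
#include <vector>

typedef std::vector<double> Vec;
typedef std::vector<Vec>    Mat;

// Solve a small dense system (no pivoting; illustration only).
static Vec dense_solve(Mat A, Vec b) {
    const int n = (int)b.size();
    for (int k = 0; k < n; ++k)
        for (int i = k + 1; i < n; ++i) {
            double f = A[i][k] / A[k][k];
            for (int j = k; j < n; ++j) A[i][j] -= f * A[k][j];
            b[i] -= f * b[k];
        }
    Vec x(n);
    for (int i = n - 1; i >= 0; --i) {
        double s = b[i];
        for (int j = i + 1; j < n; ++j) s -= A[i][j] * x[j];
        x[i] = s / A[i][i];
    }
    return x;
}

int main() {
    // A 3x3 toy system and two overlapping "subdomains" {0,1} and {1,2}.
    double a[3][3] = {{4,1,0},{1,4,1},{0,1,4}};
    Mat A(3, Vec(3));
    for (int i = 0; i < 3; ++i) for (int j = 0; j < 3; ++j) A[i][j] = a[i][j];
    Vec b(3); b[0] = 5; b[1] = 6; b[2] = 5;
    int doms[2][2] = {{0,1},{1,2}};

    Vec x(3, 0.0);
    for (int it = 0; it < 20; ++it) {                 // "repeat ... until convergence"
        for (int d = 0; d < 2; ++d) {                 // loop over subdomains
            Mat Ai(2, Vec(2)); Vec ri(2);
            for (int p = 0; p < 2; ++p) {             // restrict residue and matrix
                int gi = doms[d][p];
                double r = b[gi];
                for (int j = 0; j < 3; ++j) r -= A[gi][j] * x[j];
                ri[p] = r;
                for (int q = 0; q < 2; ++q) Ai[p][q] = A[gi][doms[d][q]];
            }
            Vec di = dense_solve(Ai, ri);             // solve Ai * di = ri
            for (int p = 0; p < 2; ++p) x[doms[d][p]] += di[p];
        }
    }
    std::printf("x = %.4f %.4f %.4f\n", x[0], x[1], x[2]);  // converges to 1, 1, 1
    return 0;
}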

[Figure 3 graphic: the flow loads the netlist, performs the domain decomposition partition, partitions the linear domain with ParMETIS and generates the topological order. Each processor (1 to N) then runs device evaluation, numerical integration and linearization; the linear domain is solved in parallel and the nonlinear domains are distributed and solved directly; the Schwarz Alternating Procedure and Newton-Raphson loops iterate until convergence before advancing to the next time step.]

FIGURE 3 Parallel transient simulation flow

Source: UCSD



D. Parallel device loading

Device model evaluation, an essential part of a circuit simulator, could easily consume more than 30% of the total runtime in a serial simulation. For parallel circuit simulation, the device model evaluation part becomes more sensitive as it will introduce significant communication overhead if it is not done in parallel and optimally. The proposed method performs the device loading, numerical integration and linearization steps in parallel. Once the circuit is partitioned, each processor calculates the device model, numerical integration and linearization for its own partition. This approach reduces the device model evaluation runtime and reduces the communication overhead needed for the stamping of Jacobian matrix entries among the processors.
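The following simplified sketch shows the idea of per-partition loading: each process walks only its own device list and accumulates stamps into local structures, so no Jacobian entries have to be exchanged during loading. A linear resistor stands in for the BSIM evaluation the real simulator performs, and all names are illustrative:

// Simplified sketch of per-partition device loading.
#include <map>
#include <utility>
#include <vector>

struct Device { int n1, n2; double g; };           // two-terminal element, conductance g

struct LocalSystem {
    std::map<std::pair<int,int>, double> J;        // local Jacobian entries
    std::map<int, double>                res;      // local KCL residual contributions
};

void load_partition(const std::vector<Device>& my_devices,
                    const std::vector<double>& v,  // node voltages from the last solution
                    LocalSystem& sys) {
    for (size_t k = 0; k < my_devices.size(); ++k) {
        const Device& d = my_devices[k];
        // "Model evaluation": for a resistor the stamp is just its conductance;
        // a transistor would evaluate its device equations here instead.
        double g = d.g;
        double i = g * (v[d.n1] - v[d.n2]);        // current leaving node n1
        sys.J[std::make_pair(d.n1, d.n1)] += g;
        sys.J[std::make_pair(d.n2, d.n2)] += g;
        sys.J[std::make_pair(d.n1, d.n2)] -= g;
        sys.J[std::make_pair(d.n2, d.n1)] -= g;
        sys.res[d.n1] += i;
        sys.res[d.n2] -= i;
    }
    // In the parallel flow, each rank calls this on its own device list only,
    // so loading itself needs no inter-processor communication.
}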

E. Parallel transient simulation flow

The overall parallel transient simulation flow is presented in Figure 3. The device loading, numerical integration and linearization steps are the same as Berkeley SPICE except that they are done in parallel. As shown, the linear subdomain is partitioned into N partitions by ParMETIS, where N is the total number of available processors. Each processor loads its own part of the circuit. After the parallel device loading, numerical integration and Newton-Raphson linearization, the linear subdomain is solved in parallel. Nonlinear subdomains are then evenly distributed into available processors according to their topological order and solved by a direct solver.

3. Experimental results

The proposed approach was implemented in ANSI C. Parallel algorithms are implemented with MPICH1. We adopted

the same dynamic step size control, numerical integration and nonlinear iteration methods as are used in Berkeley SPICE3. The GMRES solver and BoomerAMG preconditioner in the PETSc [12] package were used as iterative solvers, and KLU [13] was used as a direct solver. Five industrial designs were tested on the FWGrid [14] infrastructure with up to 64 processors. Table 1 lists the transient simulation runtime, where DD refers to the proposed method.

Test cases ckt1 and ckt2 are two linear circuit dominant test cases. Figure 4 shows the waveform of one node of the ckt1 circuit. The waveform shows that our result matches the SPICE result. Test cases ckt3, ckt4 and ckt5 are cell designs with power and ground networks. We can see from Table 1 that the proposed method is orders of magnitude faster than SPICE.

The results also show that the proposed method scales very well as we increase the number of available processors. However, the performance increase from 32 to 64 processors is not as great as the increase from 16 to 32 processors. This is due to cross-rack communication overhead on the FWGrid infrastructure. With 64 processors, we need to use at least two racks as each rack

Case   #Nodes   #Tx    #R     #C     #L     SPICE    DD (1 Proc.)   DD (4 Proc.)   DD (16 Proc.)   DD (32 Proc.)   DD (64 Proc.)
ckt1   620K     770    550K   370K   270K   4,257s   661s           245s           106s            58s             49s
ckt2   2.9M     3K     1.9M   1.2M   810K   N/A      18,065s        6,429s         2,545s          1,493s          1,179s
ckt3   290K     80K    405K   210K   0      20.5h    4,761s         1,729s         703s            439s            337s
ckt4   430K     145K   360K   180K   50K    49.4h    7,297s         2,731s         1,182s          782s            596s
ckt5   1M       6.5K   2.2M   1M     5K     N/A      5,714s         2,083s         855s            443s            318s

TABLE 1 Transient simulation runtime

Source: UCSD

[Figure 4 plot: node voltage versus time (0 to 1ns) for the ckt1 design, with values between roughly 0.998V and 1.0025V; the waveform from the proposed approach overlays the SPICE waveform.]

FIGURE 4 Transient simulation waveform of ckt1 design

Source: UCSD


on a FWGrid has only 32 processors. The ParMETIS partition of the linear subdomain is very important for parallel scalability, as we have noticed that without ParMETIS it is very hard to achieve performance gains when more than 16 processors are used.

4. Conclusions and future research

A fully parallel circuit simulation tool for transistor-level simulation of large-scale deep-submicron VLSI circuits has been presented. An orders-of-magnitude speedup over Berkeley SPICE3 is observed for sets of large-scale real design circuits. Experimental results show an accurate waveform match with SPICE3.

For future work, we would like to extend this method to supercomputers with hundreds of processors. Parallel load balancing and machine scheduling techniques will be developed to ensure our tool scales with growing numbers of processors and for circuit sizes with hundreds of millions of elements.

References
[1] L. Nagel, "SPICE2: A Computer Program to Simulate Semiconductor Circuits," Tech. Rep. ERL-M520, Electronics Research Laboratory, UC Berkeley, 1975.
[2] K. Sun, Q. Zhou, K. Mohanram and D. C. Sorensen, "Parallel domain decomposition for simulation of large-scale power grids," in Proc. ICCAD, 2007, pp. 54-59.
[3] W. Dong, P. Li and X. Ye, "WavePipe: Parallel Transient Simulation of Analog and Digital Circuits on Multi-core Shared-memory Machines," in Proc. DAC, 2008, pp. 238-243.
[4] X. Ye, W. Dong, P. Li and S. Nassif, "MAPS: Multi-Algorithm Parallel Circuit Simulation," in Proc. ICCAD, 2008, pp. 73-78.
[5] H. Peng and C.-K. Cheng, "Parallel Transistor Level Circuit Simulation using Domain Decomposition Methods," in Proc. ASPDAC, 2009, pp. 397-402.
[6] N. Frohlich, B. M. Riess, U. A. Wever and Q. Zheng, "A New Approach for Parallel Simulation of VLSI Circuits on a Transistor Level," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 45, no. 6, June 1998.
[7] S. Hutchinson, E. Keiter, R. J. Hoekstra, H. A. Watts, A. J. Waters, R. L. Schells and S. D. Wix, "The Xyce Parallel Electronic Simulator - An Overview," IEEE International Symposium on Circuits and Systems, Sydney, Australia, May 2000.
[8] A. Ruehli and T. A. Johnson, "Circuit Analysis Computing by Waveform Relaxation," in Encyclopedia of Electrical and Electronics Engineering, vol. 3, Wiley, 1999.
[9] G. Karypis, K. Schloegel and V. Kumar, ParMETIS - Parallel Graph Partitioning and Sparse Matrix Ordering, University of Minnesota, http://glaros.dtc.umn.edu/gkhome/metis/parmetis/overview, 2003.
[10] B. Smith, Domain Decomposition: Parallel Multilevel Methods for Elliptic Partial Differential Equations, Cambridge University Press, 2004.
[11] Y. Saad, Iterative Methods for Sparse Linear Systems, SIAM, 2003.
[12] S. Balay, K. Buschelman, V. Eijkhout, W. D. Gropp, D. Kaushik, M. G. Knepley, L. Curfman McInnes, B. F. Smith and H. Zhang, PETSc Users Manual, ANL-95/11, Argonne National Laboratory.
[13] T. A. Davis, Direct Methods for Sparse Linear Systems, SIAM, Philadelphia, Sept. 2006. Part of the SIAM Book Series on the Fundamentals of Algorithms.
[14] FWGrid home page: http://fwgrid.ucsd.edu/

Department of Computer Science and Engineering
University of California, San Diego
La Jolla, CA 92093-0404
USA


Acknowledgment

The authors would like to acknowledge the support of NSF CCF-0811794 and the California MICRO Program.


< TECH FORUM > DIGITAL/ANALOG IMPLEMENTATION

Transmission line reflection and ground bounce are two of the main issues that arise in any discussion of noise issues for digital circuitry. Generally, though, digital circuits operate with relatively large signal levels that have high noise margins, making them inherently immune to low-level noise pick-up.

If a circuit performs analog or data acquisition activities, a small amount of external noise can cause significant interference. For instance, 10mV of noise in the analog ground between a 12-bit analog-to-digital converter (ADC) and

the converter’s driver amplifier can cause an error of eight least-significant-bits (LSB); with a 5V, 12-bit converter, one LSB corresponds to roughly 1.2mV, so a 10mV ground error is worth about eight codes. In contrast, digital systems can tolerate hundreds of millivolts of this type of ground error before intermittent problems start to occur.

Finding the origin and then eliminating interfering noise in the analog domain presents a formidable challenge. Of particular interest are ‘slow’ sensor systems where designers are tempted to ignore problematic, high-frequency noise issues. This article looks at hardware noise reduction strategies for signal conditioning paths with sensors. It will explore noise topics such as conducted, device and radiated noise from an analog perspective.

Data acquisition circuit using a load-cell sensor

Figure 1 shows the example circuit used in this discussion.

Bonnie Baker is a senior applications engineer for Texas Instruments and has been involved with analog and digital designs and systems for over 20 years. She has written hundreds of articles, design and application notes, conference papers, and authored the book A Baker’s Dozen: Real Analog Solutions for Digital Designers.

Bonnie C. Baker, Texas Instruments

Reducing system noise with hardware techniques

[Figure 1 schematic: a 9V wall wart feeds a 78xx-type regulator that produces VDD = 5V, which excites the LCL-816G load-cell bridge (R1/R2 arms with outputs LCP and LCN). A two-op-amp instrumentation amplifier built from the two halves of an OPA2337 with resistors R3, R4 and RG, level-shifted by a REF2925 2.5V reference, drives the ADS7829 12-bit ADC, whose serial interface (DCLOCK, DOUT, CS/SHDN) connects to a TMS320C6713 DSK.]

FIGURE 1 A 12-bit ADC, combined with an instrumentation amplifier, converts a low-level signal from a bridge sensor

Source: TI


Circuit noise problems can originate from a variety of sources. By carefully examining attributes of the offending noise you can identify its source, and noise reduction solutions then become more apparent. There are three subcategories of noise problems: device, conducted and radiated noise.

If an active or passive device is the major noise contributor, you can substitute lower noise devices into the circuit.

You can reduce conducted noise with bypass capacitors and analog filters, and/or by rearranging the positions of the devices on the board with respect to the power connectors and signal path.

You can minimize the contribution of radiated noise with a careful layout that avoids signal-coupling opportunities, inclusion of ground and power planes and system shield-ing techniques.

This article discusses and illustrates these strategies with reference to a data acquisition circuit using a load-cell sensor.

Its analog portion consists primarily of the load-cell sensor, a dual operational amplifier (op amp) (OPA2337 [4]) configured as an instrumentation amplifier, and a 12-bit, 100kHz SAR ADC (ADS7829 [4]).

The sensor (LCL-816G [4]) is a 1.2kΩ, 2mV/V load cell with a full-scale range of ±32oz. In this 5V system, the electrical full-scale output range of the load cell is ±10mV.

The instrumentation amplifier, consisting of two op amps (A1 and A2) and five resistors, creates a 153V/V gain. This gain matches the instrumentation amplifier block's full-scale output swing to the ADC's full-scale input range. The SAR ADC has an internal input sampling mechanism. With this function, each conversion produces a single digitized sample. The processor, for example the TMS320C6713B [4], acquires the data from the SAR converter, performs some calibration and translates the data into a usable format for tasks such as displays or actuator feedback signals.

The transfer function from the sensor to the output of the ADC is:

D_{OUT} = \left((LC_P - LC_N)\cdot GAIN + V_{REF}\right)\cdot\frac{2^{12}}{V_{DD}}

where:

LC_P = V_{DD}\cdot\frac{R_2}{R_1 + R_2},\qquad LC_N = V_{DD}\cdot\frac{R_1}{R_1 + R_2},\qquad GAIN = 1 + \frac{R_4}{R_3} + \frac{2R_4}{R_G}

In this equation, LCP and LCN are the positive and negative sensor outputs and GAIN is the gain of the instrumentation amplifier circuit. VREF is a 2.5V reference, which level shifts the instrumentation amplifier output, VDD is the power supply voltage and sensor excitation voltage, and DOUT is a decimal, whole number representation of the 12-bit digital output code of the ADC.
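As a quick illustration of this transfer function, the Python sketch below evaluates DOUT for the nominal values quoted in the article (VDD = 5V, VREF = 2.5V, GAIN = 153V/V, ±10mV full-scale load-cell swing); the function name and the ideal, noise-free behavior are assumptions made purely for illustration.

def adc_output_code(lc_p, lc_n, gain=153.0, v_ref=2.5, v_dd=5.0, bits=12):
    """Ideal output code of the load-cell signal chain (instrumentation amp -> SAR ADC).

    lc_p, lc_n: load-cell outputs in volts. Returns the clamped integer code.
    """
    v_in = (lc_p - lc_n) * gain + v_ref      # amplified, level-shifted input to the ADC
    code = int(v_in * (2 ** bits) / v_dd)    # ideal, noise-free 12-bit conversion
    return max(0, min(code, 2 ** bits - 1))  # clamp to the converter's output range

# +10mV full-scale differential input from the load cell:
print(adc_output_code(lc_p=0.010, lc_n=0.0))  # ~3301
# Zero differential input sits at the 2.5V reference, i.e. mid-scale:
print(adc_output_code(lc_p=0.0, lc_n=0.0))    # 2048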

If the design implementation is poor, this circuit could be an excellent candidate for noise problems. The symptom of a poor implementation is an intolerable level of uncertainty over the digital output results from the ADC. It is easy to assume that this type of symptom indicates that the last device in the signal chain generates the noise problem.

FIGURE 2 Poor implementation of the 12-bit data acquisition system easily could have an output range of 44 different codes with a 1024 sample size (code width of noise = 44; 6.54 noise-free bits)
Source: TI

FIGURE 3 Noise contributions from devices across the frequency spectrum emulate a 1/f characteristic at low frequencies and a flat response (broadband noise) at the higher frequencies
Source: TI


On the contrary, the root cause of poor conversion results could stem from the other active devices, from passive components, the PCB layout, or even extraneous sources.

For instance, if a designer did not take appropriate noise reduction measures, the 12-bit system in Figure 1 could output a large distribution of codes for a DC input signal, as shown in Figure 2. The data it shows is far from an optimum implementation. Forty-four codes of peak-to-peak error change the 12-bit converter system into a noise-free, 6.5-bit system.
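As a quick check on those numbers (a standard conversion from peak-to-peak code spread to noise-free resolution, not a formula given in the article):

\text{noise-free bits} = 12 - \log_2(44) \approx 12 - 5.46 \approx 6.5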

Noise problems can be separated into these three subcategories:

1. Device Noise. This originates in active or passive devices on the board.

2. Conducted Noise. This appears in the PCB traces, and originates in devices on the board, or as a result of e-fields or b-fields.

3. Radiated Noise. This is transmitted into the system via e-fields or b-fields.

Device noise

You can find device noise in both passive and active devices. The materials in passive devices can be films or composites. Resistors, capacitors, inductors and transformers fall into this category. The material in active devices is silicon. Active devices include bipolar transistors, field effect transistors, CMOS transistors and integrated circuits that use these transistors.

When you add device noise sources together, the equations are different from those used to describe voltage, current and number of bits. The fundamental difference is that noise signals are uncorrelated. Therefore, you implement a simple addition of voltage or current noise sources with an RSS formula, or the square root of the sum of the squares. If adding several voltage sources, you would use the following formula:

V_{TOTAL} = \sqrt{V_1^2 + V_2^2 + V_3^2 + \cdots + V_N^2}

This formula applies to noise contributions over a specific bandwidth (BW). If there is no bandwidth definition, the particular test frequency must be used. In this case, the noise units are V/√Hz. These units of measure describe the voltage noise density (also known as spot noise). Spot noise is measured at a specific frequency over a 1Hz bandwidth. Generally, the units of measure for voltage noise are: nV/√Hz, μVrms or μVp-p.
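A minimal Python sketch of the RSS combination; the three spot-noise values are arbitrary illustrative numbers, not figures from the article.

import math

def rss(*noise_sources):
    """Root-sum-square (RSS) combination of uncorrelated noise sources."""
    return math.sqrt(sum(v ** 2 for v in noise_sources))

# Three uncorrelated spot-noise contributors, in nV/sqrt(Hz):
print(rss(10.0, 4.0, 3.0))  # ~11.2 nV/sqrt(Hz): the largest source dominates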

Passive devices/resistors

There are three basic classes of fixed resistors: wirewound, film type and composition. Regardless of construction, all resistors generate a noise voltage. This noise is primarily a result of thermal noise.

FIGURE 4 A second, revised design has lower noise devices, by-pass capacitors, a second-order anti-aliasing filter and a ground plane
Source: TI


Lower quality resistors such as the composition type have additional noise in the lower frequency spectrum due to shot and contact noise. Thermal noise (aka Johnson noise) is generated by the random thermal motion of electrons in the resistive material. This noise is independent of DC or AC current flow, and is constant across the entire frequency spectrum. The ideal thermal noise for resistors is:

V_N = \sqrt{4\cdot K\cdot T\cdot R\cdot (BW)}

In this equation, K is Boltzmann's constant (1.38 × 10^-23 J/K), T is the temperature in kelvin, R is the resistance value in ohms, and (BW) is the noise bandwidth of interest.
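For example, the following sketch evaluates the ideal thermal noise of the article's 1.2kΩ load cell; the 300K temperature and 100kHz bandwidth are assumed values chosen only to illustrate the formula.

import math

K_BOLTZMANN = 1.38e-23  # J/K

def thermal_noise_vrms(resistance_ohms, bandwidth_hz, temperature_k=300.0):
    """Ideal Johnson (thermal) noise voltage of a resistor, in volts rms."""
    return math.sqrt(4 * K_BOLTZMANN * temperature_k * resistance_ohms * bandwidth_hz)

# The article's 1.2k load cell over an assumed 100kHz bandwidth at ~300K:
print(thermal_noise_vrms(1.2e3, 100e3))  # ~1.4e-06 V, i.e. about 1.4 uVrms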

Wirewound resistors are the quietest of the three and come closest to ideal noise levels. Composition resistors are the noisiest because of their contact noise, which is aggravated by current. Otherwise, composition resistors have the same noise as wirewound types. You can reduce resistive noise by reducing the value of the resistors on your board.

Active devices

This category of devices includes op amps, instrumentation amplifiers, voltage references and voltage regulators, among others. Two areas of voltage noise in the frequency domain are the 1/f and broadband regions.

The 1/f noise is a low-frequency noise where power density varies as the reciprocal of frequency (1/f). This noise is a consequence of trapped carriers in the semiconductor material, which are captured and released in a random manner. The time constant of this energy is concentrated within the lower frequencies. Figure 3 shows an example of 1/f noise.

Broadband noise is associated with the DC current flow across p-n junctions. This noise is due to a random diffusion of carriers through the base of the transistor and a random generation and recombination of hole-electron pairs. You can reduce the noise that the active devices generate by selecting low-noise devices at the start.

Conducted noise

Conducted noise is the noise present on PCB traces. These problems can often be corrected at the point of origin.

Power supply filter strategies

Regardless of the power source, good circuit design implies that by-pass capacitors are used. While a regulator, DC/DC converter, linear or switching power supply can provide power to the board, in all cases by-pass capacitors are a required part of the design.

By-pass capacitors belong in two locations on the PCB: one at the power supply (10μF to 100μF, or both), and one for every active device (digital and analog). The value of each by-pass capacitor will depend on the device it is associated with. Generally speaking, if the device's bandwidth is less than or equal to ~10MHz, a 0.1μF by-pass capacitor will reduce injected noise dramatically. If the bandwidth of the device is above ~50MHz, a 0.01μF by-pass capacitor is probably appropriate. Between these two frequencies, either or both can be used. In all cases, it is best to refer to the manufacturer's guidelines for specifics.

By-pass capacitor leads must be placed as close as possible to the device's power supply pin. If two by-pass capacitors are used for one device, the smaller of the two should be closest to the device pin. Finally, the lead length of the by-pass capacitor should be as short as possible in order to minimize lead inductance.

Signal path filtering strategies

A system such as that shown in Figure 1 requires an analog filter. The primary function of the low-pass, analog filter is to remove the input signal's high-frequency components going into the ADC. If these high frequencies pass to the ADC, they will contaminate the conversion data by aliasing during the conversion process. To attenuate high-frequency noise, a two-pole, anti-aliasing filter is added to the circuit.

FIGURE 5 When implementing the circuit in Figure 1 using noise reduction techniques, a 12-bit system can be achieved (code width of noise = 1; total samples = 1024)
Source: TI


Layout strategies

Device placement is critical. In general, the circuit devices can be separated into two categories: high-speed (>40MHz) and low-speed. Then, they should be separated again into three sub-categories: pure digital, pure analog and mixed signal. The pure analog devices should be furthest away from the digital devices and the connector to ensure that digital switching noise is not coupled into the analog signal path through the traces or ground plane.

Emitted or radiated noiseA circuit’s level of susceptibility to extraneous noise is di-rectly related to the implementing signal traces across the board, ground plane and power plane strategy, and subtle-ties such as using differential signal paths and shielding.

Signal traces

As a basic guideline, both analog and digital signal traces on PCBs should be as short as possible. Shorter traces minimize the circuit's susceptibility to onboard and extraneous signals. The amount of extraneous noise that can influence the PCB is dependent on the environment. Opportunities for onboard signal coupling, however, can be avoided with careful design. One set of terminals to be particularly cautious with are the input terminals of an amplifier. Radiated noise problems can arise because these terminals typically have high-impedance inputs.

Signal coupling problems occur when a trace with a high-impedance termination is next to a trace with fast changing voltages. In such situations, charge is capacitively coupled between the traces per the formula:

I = CdV/dt (Formula 1)

In Formula 1, current is in amperes, C is capacitance, dV is change in voltage, and dt is change in time.
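To get a feel for the magnitudes involved (the capacitance and edge rate below are hypothetical, not values from the article), 0.5pF of inter-trace capacitance and a 5V logic edge with a 2ns rise time inject a coupled current of roughly:

I = C\,\frac{dV}{dt} = 0.5\,\mathrm{pF}\times\frac{5\,\mathrm{V}}{2\,\mathrm{ns}} \approx 1.3\,\mathrm{mA}

into the high-impedance node, easily enough to disturb a low-level sensor signal.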

Ground and power supply strategy

Board layout definition, ground plane implementation and power supply strategy are critical when designing low-noise solutions. The PCB used in the data for Figure 2 did not have a ground plane. Ground planes solve problems such as offset errors, gain errors and noise problems.

The inclusion of the power plane in a 12-bit system is not as critical as the presence of the ground plane. Although a power plane can solve many problems, power noise can be reduced by making the power traces two or three times wider than the minimum trace widths on the board.

Back to the drawing board

If we modify the circuit in Figure 1 with low-noise strategies in mind, we end up with the circuit in Figure 4. The PCB has an added ground plane, lower value resistors, lower noise amplifiers, a low-pass filter and by-pass capacitors. Figure 5 shows the resulting noise data in histogram form.

References

1. Morrison, Ralph, Noise and Other Interfering Signals, John Wiley & Sons, 1992.
2. Ott, Henry W., Noise Reduction Techniques in Electronic Systems, John Wiley & Sons, 1988.
3. Allen, Holberg, CMOS Analog Circuit Design, Holt, Rinehart and Winston, 1987.
4. These datasheets are available for download using the following URLs:
(a) "MicroSIZE, Single-supply CMOS Operational Amplifiers" (OPA337, OPA2337, OPA338, OPA2338), Texas Instruments, March 2005, www.ti.com/opa-ca.
(b) "100ppm/°C, 50μA in SOT23-3 CMOS Voltage Reference" (REF2912, REF2920, REF2925, REF2930, REF2933, REF2940), Texas Instruments, February 2008, www.ti.com/voltageref-ca.
(c) "10/8/12-bit High-speed 2.7V microPOWER Sampling Analog-to-Digital Converter" (ADS7826, ADS7827, ADS7829), Texas Instruments, June 2003, http://www.ti.com/ads7826-ca.
(d) "TMS320C6713B Floating-point Digital Signal Processor", Texas Instruments, November 2005, www.ti.com/tms320c6713b-ca.
(e) "Single-supply, Rail-to-rail Operational Amplifier" (OPA340, OPA2340, OPA4340), Texas Instruments, November 2007, www.ti.com/opa340-ca.
(f) "0.5μV/°C max, Single-supply CMOS Operational Amplifiers, Zero-Drift Series" (OPA334, OPA2334, OPA335, OPA2335), Texas Instruments, www.ti.com/opa334-ca.

Texas Instruments
12500 TI Boulevard
Dallas, TX 75243
USA

T: 1 972 995 2011
W: www.ti.com



www.arm.com

at the heart... of SoC Design
ARM IP — More Choice. More Advantages.

• Full range of microprocessors, fabric and physical IP, as well as software and tools

• Flexible Foundry Program offering direct or Web access to ARM IP

• Extensive support of industry-leading EDA solutions

• Broadest range of manufacturing choice at leading foundries

• Industry’s largest Partner network

© ARM Ltd. AD123 | 04/08

The Architecture for the Digital World®


< TECH FORUM > DESIGN TO SILICON

In these early years of the 21st century, major obstacles to circuit design have appeared as premature perturbations to design practice, attributable to the later-than-desired realization of advanced semiconductor technologies. The perturbations were inevitable, but they still underlined the absence of key elements from the technology roadmap. The most widely known example of this is the demise of traditional CMOS performance scaling experienced during the first half of the decade. The inability to control off current as device channel length was scaled for performance led to the architectural shift from single to multiple core processors. Although the laws of physics were unavoidable, the change in design practice took place earlier than anticipated because industry initially lacked a high-k gate dielectric material.

As we near the end of this decade, we face a similar perturbation in circuit design techniques as they relate to density scaling. Historically, density scaling has relied on lithographic scaling. However, delays to next-generation lithography (NGL) now present us with a discontinuity in the lithographic roadmap supporting that link.

IBM recognized the need for innovation to address this problem some time ago and recently announced that it is pursuing a Computational Scaling (CS) strategy for semiconductor density scaling [1]. This strategy is an ecosystem that includes the following components, alongside necessary technology partnerships:

• a new resolution enhancement technique (RET) that uses source-mask optimization (SMO);
• virtual silicon processing with TCAD;
• predictive process modeling;
• design rule generation;
• design tooling;
• design enablement;
• pixelated illumination sources;
• variance control; and
• mask fabrication.

This article describes the lithographic discontinuity that created the need for this solution, the implications for design, and the design tooling needed for the CS strategy.

Phil Strenski is Computational Technology Rules Lead in Design & Technology Integration at IBM’s Semiconductor Research and Development Center. He holds a doctorate in physics from Stanford University.

Tim Farrell is an IBM Distinguished Engineer in IBM's Systems and Technology Group and is currently leading its initiatives in Computational Technology. He joined IBM in 1982 with dual degrees in Optical Engineering and Economics from the University of Rochester.

Phil Strenski, Tim Farrell, IBM Microelectronics

Computational scaling: implications for design

FIGURE 1 Optical scaling since 1980 (λ/NA, in nanometers; historical trend and outlook, historically ~10%/year)
Source: IBM

FIGURE 2 An expanding range of influence (the 193nm optical range of influence relative to the 90nm, 45nm and 22nm technology nodes)
Source: IBM


The article presents the context for the use of computational scaling (CS) to eke out more from existing lithography tools until next-generation techniques are finally introduced. It discusses the critical elements in the CS ecosystem developed by IBM and partners to overcome roadblocks to optical scaling that demand the use of non-traditional techniques for the incoming 22nm process node.

The differing roles of engineers in the design and process segments of a project flow are discussed, as are some of the tools that will make CS a reality.

Patterning technology

Density scaling is the progressive packing of increasing numbers of circuit patterns into a set area of silicon. For nearly three decades this has been accomplished by optical scaling. Optical scaling is the introduction of lenses with either shorter exposure wavelengths (λ) and/or larger numerical apertures (NA). A metric for optical scaling is λ/NA, where smaller values equate to smaller feature sizes and higher circuit density.
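To attach rough numbers to that metric, the familiar k1 resolution relation (a textbook expression, not one given in the article), with an assumed 1.35-NA 193nm immersion system and the k1 ~ 0.25 regime noted in Figure 3, gives a smallest half-pitch of roughly:

\text{half-pitch} \approx k_1\cdot\frac{\lambda}{NA} = 0.25\times\frac{193\,\mathrm{nm}}{1.35} \approx 36\,\mathrm{nm}

which is roughly the regime the 22nm node requires.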

Operationally this has been accomplished by the periodic purchase of a new exposure tool and the optimized selection of wafer processing set points and mask types. As shown in Figure 1, optical scaling historically enabled a 10% annual reduction in feature size and an 81% annual increase in density through 2007. However, due to economic and technical issues, traditional scaling will not resume until next-generation lithographic (NGL) techniques such as extreme ultraviolet (EUV), nano-imprint and multi-column electron beam become available.

Although we have been able to realize a 10% annual improvement in optical scaling, this did not by itself support the two-year technology development cycle introduced in the late 1990s. As such, there has been a growing gap between desired optical scaling and realized optical scaling. One impact of this gap has been a decrease in the optical isolation of individual design constructs. The consequence, as shown in Figure 2, has been that individual constructs need to be viewed in the context of an expanding neighborhood.

The industry has managed this gap by introducing 193nm lithography, off-axis illumination, immersion lithography and double patterning.

FIGURE 3 Traditional optical scaling is not enough for 22nm: 45nm node (k1 = 0.45, single exposure), 32nm node (k1 = 0.35, single exposure), 22nm node (k1 ~ 0.25, double exposure, DDL); the design topology will not migrate to 22nm
Source: IBM


However, as shown in Figure 3, attempts to extend traditional optical scaling to the 22nm/20nm process node for a traditional 2D circuit pattern produce unacceptable results.

The current industry direction to address the highlighted problem at 22nm is the use of double (or triple) patterning with highly restrictive design rules (e.g., single orientation, near singular pitches, forbidden constructs) and design for manufacturing (DFM) tools that place responsibility for managing technological complexity on the shoulders of the designer. All of these approaches are driven by the increasing variance between designed 2D wafer patterns and resultant wafer patterns. Such a path drives a costly and complex departure from traditional IC design migration paths and increases the cost of wafer production for the fabricator.

Design implications

The first important point to observe is that because λ/NA has not been scaling consistently with incoming geometries, the radius of influence for lithographic concerns has been growing in terms of design features. This problem is illustrated in Figure 3. In the past, this radius might cover at most a nearby edge pair, so width/space rules were generally a sufficient response. As this radius has grown, the complexity of rules has grown as well, resulting in width-dependent space rules and other multiple edge constraints.

At the same time, the typical curvature introduced on the wafer has become comparable to feature size, so that it is no longer reasonable to assume that wafer contours will essentially resemble the original drawn shapes, except for some minor corner rounding. It is necessary therefore to consider patterns of larger radius, and correspondingly less detail.

A second concern is that the various lithographic solutions available are not simply ordered. Any given approach to sub-wavelength lithography favors some classes of layouts at the expense of others. It is critical to work with design evaluation processes that will lead to the selection of the technique that best fits your design. For example, a strong dipole is good at printing parallel lines in one direction at certain pitches. But that comes at the cost of wires in the other direction. Pushing the minimum pitch may also introduce dead zones of forbidden pitch. Going to multiple exposures introduces further trade-offs. Does one use the second exposure to print alternating lines at a tighter pitch at the cost of the other direction, or print both directions with a more relaxed pitch, or enhance the printability of difficult 2D situations?

A helpful concept here is the idea of retargeting (Figure 4). This involves the adjustment of drawn shapes to serve as targets for eventual wafer contours. Of necessity, this is already happening to print certain features, such as isolated lines. But it can also be exploited to simplify the design representation. Given the flexibility to adjust shapes so that they satisfy manufacturability needs, a design can be represented on a coarser grid, capturing the topological intent without undue attention to fine details of edge movement, and without the need for identifying or following an inordinate number of rules when such small movements are allowed.

Density needs can be assisted by the identification of pre-validated constructs, consisting of topologies not generally allowable, but manufacturable in certain well-defined contexts with certain specified retargeting (cf. SRAM cells, but in the context of logic cells). A close design-technology interaction is required to make sure such constructs are properly validated along with ordinary content, and defined for maximum utility to design. Updates to this methodology are likely, but much of the infrastructure is already present in the form of parameterized cells.

It is helpful when thinking about these concepts for design and design automation to consider the design community as falling into two camps. One is made up of those who use technology to produce full chip designs. The other comprises those who work with technology to define its character.

FIGURE 4 Retargeting shapes for eventual wafer contours (design target, retargeted pattern, OPC, mask, on wafer, extraction)
Source: IBM

FIGURE 5 Advanced modeling techniques are the way forward (% change in yield versus lithography variation for two layouts, A and B)
Source: IBM


The first group is increasingly focused on productivity, away from detailed layout and toward automation, micro-architecture and the balance between power and performance. The second is aware of technology limitations and uses tools like lithographic simulation to evaluate the trade-offs between manufacturability and design issues like density, rule complexity and design methodology.

For the first community of chip designers, the overriding technological direction is fairly synergistic and transparent. More design will be prevalidated in cell libraries and other building blocks. Wiring level rules will be evaluated for designability (i.e., friendliness to router technology), making automation more likely. And an early focus on high value patterns (with detailed implementation left to technology) should reduce the risk of significant layout change late in design. Moving toward more regular design rules should also contribute to this simplification, removing the need for this community to worry about detailed edge movements.

There are some areas that could affect the first community, depending on how the design rules evolve. One clear example is the anisotropy in wiring levels. Lithographic solutions often strongly favor one direction, so wire pitch and width values will differ depending on the preferred direction. The value of patterns in expressing design constructs and layout issues suggests opportunities to impact productivity by exploiting such patterns in construction (routing, cell assembly) or analysis (extraction, rule checking) tasks. For example, technology could directly deliver router building blocks rather than delivering edge rules that the router configuration must experiment with to produce useful router rules. Cell assembly could use predefined blocks to achieve better density but still maintain a coarse topological grid. Extraction could use look-up for predefined content to improve accuracy and runtime, and access a retargeting process to improve accuracy.

The second community of designers contributing to technology definition is more obviously affected by the emerging discontinuity. More prevalidated content will be developed (similar, again, to SRAM cells today), with simulation validation and advanced lithographic techniques as illustrated in Figure 5. The modeling of all aspects of the manufacturing flow (in addition to lithography) will need to improve to allow trade-offs to be made intelligently with simulation.

Design and technology will need to develop efficient means of communicating: technology simulation and uncertainty to design, and design evaluations and proposed content to be added to technology offerings. Some aspects of the handoff between design and technology may evolve as a result (e.g., delivering patterns for predefined content rather than rules, or delivering router configurations directly). The involvement of design will occur earlier in a technology node, to help make the difficult decisions among the unavoidable trade-offs. New mechanisms for expressing design intent (beyond just drawn shapes) can enable technology to further optimize the contours for added value (yield, density, power-performance).

Summary

Severely sub-wavelength lithography presents some unavoidable conflicts with traditional scaling assumptions. However, disciplined design-technology co-optimization provides opportunities to define effective value by carefully considering the necessary trade-offs early in the technology cycle. Design communities will be affected differently based on their interaction with technology.

Chip designers will see fairly evolutionary changes, with more regular design rules, perhaps supplemented with patterns of predefined content in constrained situations. Designers working with technology will have an increased ability to influence its direction by understanding trade-offs and working to optimize design value. IBM is engaged in all aspects of this in delivering its computational scaling solution, and is working with its partners to deliver valuable manufacturable technologies in this deep sub-wavelength realm.

References

[1] "IBM Develops Computational Scaling Solution for Next Generation '22nm' Semiconductors", press release issued 17 September 2008 (access online at http://www-03.ibm.com/press/us/en/pressrelease/25147.wss).

IBM Microelectronics Division
East Fishkill facility
Route 52
Hopewell Junction, NY 12533
USA

W: www.ibm.com



< TECH FORUM > TESTED COMPONENT TO SYSTEM

Antenna requirements

Gain and communication range

With the advent of prolific wireless communications applications, system designers are in a position to consider the placement and performance of an antenna system. The first step in establishing antenna requirements is to determine the desired communication range and terminal characteristics of the radio system (i.e., transmit power, minimum receiver sensitivity level). Given those parameters, one can ascertain the amount of gain or loss required to maintain the communication range by using the Friis Transmission formula [1]:

\frac{P_r}{P_t} = \frac{g_t\, g_r\, c^2}{(4\pi r f)^2}

where:
P_r = received power [W]
P_t = transmitted power [W]
c = speed of light [m/s]
g_t = transmit antenna gain [W/W]
g_r = receive antenna gain [W/W]
f = cyclic frequency [Hz]
r = communication range [m]

This relation is only valid for free-space propagation, but it illustrates the important role of the antenna gain in the maximization of the receive-to-transmit power ratio, or system link gain.
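A minimal link-budget sketch built directly on the Friis relation; the 2.4GHz frequency, 0dBi gains, 0dBm transmit power and 100m range are illustrative assumptions rather than values from the article.

import math

C = 3.0e8  # speed of light, m/s

def friis_received_power_dbm(pt_dbm, gt_dbi, gr_dbi, freq_hz, range_m):
    """Free-space received power predicted by the Friis transmission formula."""
    path_gain = (C / (4 * math.pi * range_m * freq_hz)) ** 2  # linear (c / (4*pi*r*f))^2
    return pt_dbm + gt_dbi + gr_dbi + 10 * math.log10(path_gain)

# 0dBm transmitter, 0dBi antennas, 2.4GHz and a 100m free-space path:
print(friis_received_power_dbm(0.0, 0.0, 0.0, 2.4e9, 100.0))  # roughly -80 dBm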

Antenna size and clearance

Antenna gain (or loss) must be part of a trade-off study between performance and the physical realization considerations of size, placement and clearance (distance from obstructions).

Brian Petted is the chief technology officer of L.S. Research, a wireless product development company and EMC testing laboratory. He holds a BSEET degree from the Milwaukee School of Engineering (MSOE) and a MSEE from Marquette University.

Brian Petted, LS Research

Antenna design considerations

FIGURE 1 Gain pdf (left) and associated ccdf (right): gain coverage probability density functions and gain coverage probability plotted against directive gain (dBi) for vertical and horizontal polarizations
Source: LSR


An overview of antenna design considerations is presented. These considerations include system requirements, antenna selection, antenna placement, antenna element design/simulation and antenna measurements. A center-fed dipole antenna is presented as a design/simulation example. A measurement discussion includes reflection parameter measurements and directive gain measurements.

One basic antenna relationship presented below shows that antenna gain, g, and the antenna effective aperture (area) are directly proportional. This roughly indicates that antenna gain is proportional to the physical size of the antenna [2].

A_e = \frac{g\,\lambda^2}{4\pi}\ [\mathrm{m^2}];\qquad \lambda = \text{wavelength} = \frac{c}{f}\ [\mathrm{m}]

Another basic antenna relationship shows the Fraunhofer or Rayleigh distance, d, at which the near/far-field transition zone exists. Ideally, there should be a free-space clearance zone around the antenna of at least d. The largest dimension of the antenna, D, and the operating wavelength determine this distance [3].

d > \frac{2D^2}{\lambda};\qquad r \gg D;\quad r \gg \lambda

For example, if the largest dimension of the antenna is half of a wavelength, the minimum clearance zone is a half-wavelength. This serves as a basic guideline; however, in many physical realizations this clearance zone is compromised and the effects must be determined through simulation or empirical measurement.

d > \frac{2(\lambda/2)^2}{\lambda} = \frac{\lambda}{2}

Antenna gain details

Antenna gain is defined as the ratio of radiated power intensity relative to the radiated power intensity of an isotropic (omni-directional) radiator. Power intensity is the amount of radiated power per unit solid angle measured in steradians (sr) [4]. The sphere associated with the isotropic radiator has a steradian measure of 4π steradians and serves as the normalization reference level for antenna gain.


FIGURE 2 Antenna evolution from the half-wave dipole (left): quarter-wave monopole over a ground plane (center), L-antenna (right)

Source: LSR

FIGURE 3 Inverted F antenna evolution from the L-antenna by feeding the antenna at a more favorable impedance point (left); extruding the Inverted F antenna produces the Planar Inverted F antenna (right)

Source: LSR

FIGURE 4 Sleeve Dipole Design Input into CST Microwave Studio simulator

Source: LSR


g = \frac{U_{rad}}{U_{isotropic}} = \frac{U_{rad}}{P_t/4\pi}\qquad \frac{[\mathrm{W/sr}]}{[\mathrm{W/sr}]}

The antenna gain expression can be expanded further to reveal other factors that contribute to the overall antenna gain. The radiation intensity for the antenna is a function of the antenna efficiency, η, and the directivity, D. The antenna efficiency is a product of the reflection efficiency or mismatch loss and the losses due to the finite resistances and losses in the antenna element conductor and dielectric structures. The mismatch loss can be ascertained through simulation or measurement of the antenna's input impedance or reflection coefficient, Γ. The directivity is a description of the gain variation as a function of the link axis angle(s) or the angle(s) of arrival/departure as described by the standard spherical coordinate system.
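As a worked illustration of that expansion (the reflection-efficiency relation 1 − |Γ|² is standard antenna theory, and the example numbers are assumed, not taken from the article):

g = \eta\, D = (1 - |\Gamma|^2)\,\eta_{cd}\, D

For |\Gamma| = 0.33 (about a 2:1 VSWR), 1 - |\Gamma|^2 \approx 0.89 (about -0.5 dB); with an assumed conductor/dielectric efficiency \eta_{cd} = 0.9 (-0.46 dB) and a half-wave dipole directivity of 2.15 dBi, the net gain is roughly 2.15 - 0.5 - 0.46 \approx 1.2 dBi.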

Antenna gain patterns

Ideally, antenna patterns are displayed as 3D plots (an example is shown to accompany the case study in Figure 6). The plot is often constructed from multiple cross sections known as conical cuts. A typical conical cut is formed by holding the elevation angle, θ, constant and measuring the pattern over a complete revolution of the azimuthal angle, φ. Secondly, a separate plot is generally made for each component of the electrical field or polarization (Eφ-horizontal or Eθ-vertical). Examples of conical cuts are presented in Figure 7, accompanying the case study.

Since most antenna patterns are not necessarily omni-directional, the description of antenna gain is fairly complex. In order to serve a system analysis in terms of determining communication range or system gain, the minimum, maximum and average gain over the entire pattern of a particular cut is typically used as the singular antenna gain value in the Friis transmission formula.

However, designers may want to determine the distribution of communication ranges and system gains, given the non-uniform nature of a directional antenna that is used in an omni-directional application. In those cases, probability density functions (pdfs) can be associated with antenna patterns, both conical cuts and 3D patterns [5]. Even though the directional antenna patterns are deterministic, the fact that their application is omni-directional with a random link axis angle makes the antenna gain a random variable with respect to communication range and system gain analyses. Figure 1 shows the pdf associated with both the omni- and non-uniform patterns presented on the left. On the right is the complementary cumulative density function (ccdf), which is derived from the pdf and indicates the probability that the antenna can provide a minimum level of gain, given a random link axis angle.

FIGURE 5 Sleeve Dipole reflection coefficient (left) and impedance prediction (right).

Source: LSR

FIGURE 6 Sleeve Dipole 3D antenna pattern for θ-directed electric field component

Source: LSR


Note that for the case of the omni-directional antenna, the pdf is an impulse since the gain is single-valued and has no real distribution. The omni-directional case presents an interesting step-function ccdf. It shows that the probability of having a directive gain at least as large as the abscissa is 1 for gains less than the fixed gain value.
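A small Python sketch of how such a ccdf might be tabulated from sampled pattern data; the uniform weighting assumes a single conical cut with a uniformly distributed azimuth angle (a full 3D pattern would need solid-angle weighting), and the toy pattern values are purely illustrative.

import math

def gain_ccdf(gain_dbi_samples, thresholds_dbi):
    """Probability that the directive gain meets or exceeds each threshold,
    assuming the link-axis angle is uniformly distributed over the cut."""
    n = len(gain_dbi_samples)
    return [sum(g >= t for g in gain_dbi_samples) / n for t in thresholds_dbi]

# Toy dipole-like conical cut sampled every degree (illustrative numbers only):
cut = [2.0 + 10 * math.log10(max(abs(math.cos(math.radians(a))), 1e-3))
       for a in range(360)]
print(gain_ccdf(cut, thresholds_dbi=[-10.0, -3.0, 0.0]))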

Antenna topologies

There are many possible topologies or structures for an antenna. One interesting set of structures is that which evolved from the basic half-wave dipole (Figure 2).

Starting with the half-wave dipole, the lower element of the dipole can be realized by a reflected image of the upper element onto a ground plane (using electric field boundary conditions and/or image theory). The monopole can be folded over, however, with degradation in impedance match and gain. The degradation due to matching can be recovered by feeding the antenna at a different point along the resonant length of the antenna (recall the impedance variation of a transmission line with a standing wave present). This results in the inverted 'F' antenna. The elements may be extruded from the wire form to a planar form to realize an increase in impedance and gain bandwidth, but with a small degradation in gain. These additional evolutions are presented in Figure 3.

Antenna design and simulation

The initial design of an antenna can arise from a set of dimensional formulas based on closed-form electromagnetic relations. In practice, however, these antennas require some empirical adjustment and/or tuning steps before you arrive at a final design. Secondly, the electromagnetic relations associated with most antennas are not of a closed form and therefore do not yield dimensional synthesis equations. Therefore, in order to design and validate an antenna prior to fabrication, it is worthwhile simulating the antenna using an electromagnetic field solver that can predict the behavior of radiating systems.

One such solver, CST Microwave Studio [6], offers many options that can simulate open-boundary, radiating structures. Figure 4 shows the relative utility of the simulation tool. It is the input page for a 2.4GHz sleeve dipole antenna, containing the dimensional and material parameter inputs required to carry out the simulation.

Upon completion of the electromagnetic simulation, the radiation pattern of the electric field is available as a 3D plot and as conical cuts. Further, the simulator predicts the input reflection coefficient and represents it as a scattering parameter (S11). The simulator provides all of the essential information about the antenna prior to its physical realization in order to pre-validate the design approach. The predicted reflection coefficient and driving point impedance are presented in Figure 5. The predicted 3D radiation pattern is presented in Figure 6, with associated conical cuts shown in Figure 7.

Antenna design validation and measurement

With the antenna synthesized and realized, the design must be validated through measurement. The first necessary measurement is to measure the reflection coefficient of the antenna input port or driving point. The reflection coefficient and associated driving point impedance are measured with a vector network analyzer (VNA).

FIGURE 7 Associated conical cuts for θ-directed electric field over angle φ at fixed angle θ=90 degrees (left) and θ-directed electric field over angle θ at fixed angle φ=0 degrees (right)

Source: LSR


Care must be taken during this measurement to ensure that the antenna is radiating and not being disturbed by any surrounding objects. Ideally, this measurement is performed in an anechoic chamber. However, with sufficient separation between the antenna and any perturbing obstructions, this measurement can typically be performed within a normal laboratory environment.

In order to initially validate the antenna design, the reflection coefficient and associated driving point impedance must be such that the antenna is reasonably matched to the system impedance (generally 50Ω).

Once it has been established that the antenna is matched to the system impedance, the radiation pattern must be measured to complete the final steps of design validation. The measurements are performed in an anechoic chamber by exciting the antenna under test with a known transmit source power and measuring the received power, received voltage or electric field intensity at a fixed distance.

The antenna is swept through a series of conical cuts in an effort to compare them to simulated results or to build a set of cuts to assemble into a 3D gain pattern. The absolute received signal is normalized either by the conducted power applied to the antenna or compared to a known reference such as a half-wave dipole. Both polarization cases are measured. With the set of pattern data at hand, the measurements can also be examined against the system requirements in terms of minimum, maximum and average gain, or against gain distribution requirements, if applicable.

Conclusion

Antennas provide the primary interface between the radio and the propagation environment. The antenna requires special considerations in terms of performance requirements, design constraints, design and realization. Specification of the antenna gain and relating those requirements to the system performance in terms of range and system link gain is a foundation for the design goals of the antenna. During the antenna topology/structure selection process, consider packaging constraints in terms of the size, location and possible obstructions. Be prepared to compromise performance versus package conformance.

Ideally, one should use a simulation tool to assess the performance of the antenna prior to realization, not only to gauge the fundamental performance of the antenna, but also to check the effects of antenna compaction, obstructions and other compromised parameters. The final physical realization and consequent measurement of input terminal reflection/impedance and antenna gain complete the design process. Often, the measurement results require that the antenna structure be modified to empirically optimize its performance.

References[1] Harald Friis, “A Note on a Simple Transmission Formula,” Proc. IRE, 34, 1946, pp. 254-256. [2] Theile and Stutzman, Antenna Theory and Design, Second Edition, John Wiley and Sons, 1998, p. 79.[3] Theile and Stutzman, Antenna Theory and Design, Second Edition, John Wiley and Sons, 1998, p. 30. [4] Theile and Stutzman, Antenna Theory and Design, Second Edition, John Wiley and Sons, 1998, pp. 39-43. [5] B. Petted, “Antenna Gain Considerations in Communications System Range Analysis”, Seminar in Microwave Engineering, Marquette University, March 20, 2009. [6] CST Microwave Studio, CST of America, 492 Old Connecticut Path, Suite 505, Framingham, MA 01701. W: www.cst.com.

LS Research
W66 N220 Commerce Court
Cedarburg, WI 53012
USA

T: 1 262 375 4400
W: www.lsr.com


Remaining 2009 Worldwide Locations

• June 16 – Munich, Germany
• August 25 – Hsin-Chu, Taiwan
• August 27 – Seoul, South Korea
• September 1 – Shanghai, China
• September 3 – Santa Clara, CA
• September 3 – Beijing, China
• September 4 – Tokyo, Japan
• September 8 – New Delhi, India
• September 10 – Bangalore, India
• October 1 – Denver, CO
• October 8 – Boston, MA

Register Now

STAY ON THE FRONT LINE OF EE DESIGN

2009 Platinum Sponsors:

Attend a 2009 EDA Tech Forum® Event Near You


VERY COOL

Copyright © 2008 Altera Corporation. All rights reserved.

Cool off your system with Altera® Cyclone® III FPGAs. The market’s first 65-nm low-cost FPGA features up to 120K logic elements—2X more than the closest competitor—while consuming as little as 170 mW static power. That’s an unprecedented combination of low power, high functionality, and low cost— just what you need for your next power-sensitive, high-volume product. Very cool indeed.

www.altera.com

Low power

Highest functionality in its class

First 65-nm low-cost FPGA
