EDA Tech Forum Journal: September 2009

Embedded • ESL/SystemC • Digital/Analog Implementation • Tested Component to System • Verified RTL to Gates • Design to Silicon

The Technical Journal for the Electronic Design Automation Community

Volume 6, Issue 4, September 2009
www.edatechforum.com

INSIDE: Junot Diaz brings art to science

Extending the power of UPF

OS horses for embedded courses

AMS design at the system level

Getting the most out of USB 2.0


COMMON PLATFORM TECHNOLOGY

Chartered Semiconductor Manufacturing, IBM and Samsung provide you with the access to innovation you need for industry-changing 32/28nm high-k metal-gate (HKMG) technology, with manufacturing alignment, ecosystem design enablement, and flexibility of support through Common Platform technology.

Collaborating with some of the world’s premier IDMs to develop leading-edge technology as part of a joint development alliance, Chartered, IBM and Samsung provide access to this technology as well as qualified IP and robust ecosystem offerings to help you get to market faster, with less risk and more choice in your manufacturing options.

Visit www.commonplatform.com today to find out how you can get your access to innovation.

www.commonplatform.com

Industry availability of real innovation in materials science, process technology and manufacturing for differentiated customer solutions.

To find out more, visit us at these upcoming EDA TF locations:

September 1 - Shanghai, China
September 3 - Santa Clara, CA, USA
September 4 - Tokyo, Japan
October 8 - Boston, MA, USA


contents

EDA Tech Forum
Volume 6, Issue 4
September 2009

EDA Tech Forum Journal is a quarterly publication for the Electronic Design Automation community including design engineers, engineering managers, industry executives and academia. The journal provides an ongoing medium in which to discuss, debate and communicate the electronic design automation industry’s most pressing issues, challenges, methodologies, problem-solving techniques and trends.

EDA Tech Forum Journal is distributed to a dedicated circulation of 50,000 subscribers.

EDA Tech Forum is a trademark of Mentor Graphics Corporation, and is owned and published by Mentor Graphics. Rights in contributed works remain the copyright of the respective authors. Rights in the compilation are the copyright of Mentor Graphics Corporation. Publication of information about third party products and services does not constitute Mentor Graphics’ approval, opinion, warranty, or endorsement thereof. Authors’ opinions are their own and may not reflect the opinion of Mentor Graphics Corporation.

< TECH FORUM >

16 Embedded
Linux? Nucleus? ...Or both? (Mentor Graphics)

20 ESL/SystemC
Bringing a coherent system-level design flow to AMS (The MathWorks)

28 Verified RTL to gates
A unified, scalable SystemVerilog approach to chip and subsystem verification (LSI)

34 Digital/analog implementation
Implementing a unified computing architecture (Netronome Systems)

38 Design to silicon
System level DFM at 22nm (EDA Tech Forum)

44 Tested component to system
Pushing USB 2.0 to the limit (Atmel and Micrium)

50 Tested component to system
Ensuring reliability through design separation (Altera)

< COMMENTARY >

6 Start here
Stepping up: Engineers must not forget that project management means tough choices.

8 Analysis
Reading the runes: The latest consumer electronics forecasts mix the good with the bad.

10 Interview
Engineering creativity: Pulitzer prize-winner Junot Diaz helps MIT students find their own voices.

12 Low Power
Extending UPF for incremental growth: Now IEEE approved, the standard is adding new verification and abstraction capabilities.


team

< EDITORIAL TEAM >

Editor-in-Chief: Paul Dempsey, +1 703 536 1609, [email protected]

Managing Editor: Marina Tringali, +1 949 226 2020, [email protected]

Copy Editor: Rochelle Cohn

< CREATIVE TEAM >

Creative Director: Jason Van Dorn, [email protected]

Art Director: Kirsten Wyatt, [email protected]

Graphic Designer: Christopher Saucier, [email protected]

< EXECUTIVE MANAGEMENT TEAM >

President: John Reardon, [email protected]

Vice President: Cindy Hickson, [email protected]

Vice President of Finance: Cindy Muir, [email protected]

Director of Corporate Marketing: Aaron Foellmi, [email protected]

< SALES TEAM >

Advertising Manager: Stacy Mannik, +1 949 226 2024, [email protected]

Advertising Manager: Lauren Trudeau, +1 949 226 2014, [email protected]

Advertising Manager: Shandi Ricciotti, +1 949 573 7660, [email protected]

On September 3, 2009, Breker Verification Systems will present information on Model-Based Scenario Generation. Adnan Hamid, founder of Breker, will lead a Lunch and Learn discussion at the Tech Forum, which is being held at the Santa Clara Convention Center in Santa Clara, CA. This discussion will cover how scenario models improve productivity by more than 2X, achieve coverage goals and reduce testcase redundancy.

Some highlights from this discussion include how model-based scenarios:

- Achieve a 10X reduction in testbench code
- Provide a 2X productivity improvement
- Enable faster simulations
- Reduce test sequence redundancy
- Visualize pre-simulation scenarios
- Annotate test coverage results onto visual models
- Reuse verification IP, both vertically and horizontally

Sign up for this session at http://www.edatechforum.com

About Breker Verification Systems

Breker's product, Trek™, the Scenario Modeling tool for Advanced Testbench Automation, provides functional verification engineers with an automated solution for generating input stimulus, checking output results and measuring scenario coverage.

Trek is a proven technology that demonstrates a 10X reduction in testbench development and a 3X improvement in simulation throughput, freeing up resources needed to meet today's aggressive design goals. Architected to run in your current verification environment, this technology also provides powerful graphical visualization and analysis of your design's verification space.

For more information about this leading-edge technology, visit us at www.brekersystems.com or call 512-415-1199.


intelligent, connected devices.

15 billion connected devices by 2015.* How many will be yours? intel.com/embedded

Choose your architecture wisely.

* Gantz, John. The Embedded Internet: Methodology and Findings, IDC, January 2009.
Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and other countries. © 2009 Intel Corporation. All rights reserved.


< COMMENTARY > START HERE

start here

Stepping up

During the CEO Panel at this year's Design Automation Conference, the men leading the three largest EDA vendors stressed that their industry can do well in a slump because it both contributes to the ongoing battle against technological limits, and enables the delivery of ever greater efficiency. But another, parallel question raised by this year's DAC and by today's broader semiconductor environment is, "Just how much can EDA do for semiconductor vendors?"

The point here is not to highlight the limits of design software, or even, as has been previously suggested, ask whether the vendors should limit their horizons to give clients room to differentiate products. Rather, we need to concentrate on the broad and—more pointedly—blunt issue of the fabbed, fabless or sub-contracted design manager.

One of the great economic advances enabled by EDA recently has given ESL abstractions the capability to underpin more easily executable and financially quantifiable designs. Identify both your goals and your main problems early on and chances are you will get to market quicker and more profitably. Yet, even allowing that ESL has varied in its adoption rate across both geography and sector, the overall respin rate remains worryingly high.

IBM fellow Carl Anderson gave an excellent but alarming speech at DAC during a session previewing the 22nm node (see page 38). There is a plethora of tools and the tightest constraints in terms of time and money, but a common reason for major project delays remains late-stage changes. When a project manager must decide whether to stay the course or make concessions, he all too often strays too far down the latter course, incurring risks that history tells us usually do not pay off. For these circumstances, Anderson offers an interesting challenge: "innovation inside the box."

Tools can only do so much—the responsibility comes down to the manager and the team wielding them. What is needed is a combination of craftsman, creator, professional and military general. Anderson had a slide showing both Albert Einstein and George S. Patton, the point being not so much the comparison as the characteristics each represents for different aspects of the industry.

The issue goes further. For example, we are on the cusp of a new age of system-in-package (SiP) technologies as an alternative to monolithic integration within systems-on-chip (SoCs). However, those who believe SiP is simply a poor man's SoC are deluding themselves. One design company describes the comparison more as "entering the world of relative pain." There are good reasons why TSMC has only just added SiP to its Reference Flow, and so far qualified only one EDA vendor, Cadence Design Systems, which itself remains cautious about the road ahead. And why? Because, as you've guessed, the SoC vs. SiP decision is a lot tougher than we have been led to believe, and success will depend on the commitment design managers are prepared to show throughout a project's life.

Tools can and will do an awful lot, but now more than ever, Spiderman’s Uncle Ben was bang on the money: “With great power, comes great responsibility.”

Paul Dempsey
Editor-in-Chief


< COMMENTARY > ANALYSIS

Reading the runes

We revisit some of the latest consumer electronics forecasts. Paul Dempsey reports.

When we first reviewed the consumer electronics market at the beginning of the year, there were still hopes that growth could remain statistically flat despite global economic woes. However, at the beginning of the summer, the Consumer Electronics Association revised its forecast down from January's -0.7% to -7.7%, implying total factory-gate sales of $165B. The CEA is also now predicting that 2010 will see a more modest rebound, in the 1-2% range.

This may appear to be at odds with second quarter financial results posted by companies such as Intel that firmly exceeded Wall Street's expectations. Research firm iSuppli also recently estimated that Q2 consumer electronics sales did rise sequentially on the beginning of the year, by 4.2% to $71.1B, although that number is 11.3% down year-on-year.

The good news—with due deference to Pollyanna—may therefore be that the research is now more a reflection of a slump that is behind the high technology sector, but which was more severe than anticipated. But even here, a devil may lurk in the details.

Speaking shortly before the release of his association's revised forecast, CEA economist Shawn DuBravac said that he expected the recession to bottom out during August. However, the Digital Downtown event that saw this comment was also marked by significant disagreement among analysts from the CEA and elsewhere as to whether the subsequent recovery curve will be V-shaped (i.e., relatively quick) or U-shaped (i.e., bumping along the floor for a while yet). The CEA's revised number suggests that it is taking a relatively conservative position between the two.

If there is a more solid cause for optimism, it may well lie in the relative strength of consumer electronics during this downturn. The forecast decline in factory sales is the first since the 2001 dot.bomb, and is substantially lower than those seen in the automotive market (40% down) and in housing (down by just over a third). An important factor here may be an attitudinal shift among consumers.

"We've been saying for quite some time that many types of consumer electronics have moved from being seen as luxuries to essential purchases," says Stephen Baker, vice president, industry analysis at the NPD Group, one of the leading retail research groups. "A lot of that is not about perception, but how we are becoming a digital society. For example, as we get closer to the fall, you see increased spending on laptops and other products that students need for the new academic year."

There was more evidence of this in June when Apple announced that the iPhone 3G S upgrade shipped one million units during its first three days on sale in the U.S. A high proportion of buyers attributed their 'had-to-have' purchase decisions to productivity rather than fashion. Another almost 'obligatory' purchase in the U.S. has been a new television, after the switch-off of the analog signal finally went ahead on June 12.

There were concerns that extra federal subsidy for the converter box program would slow sales, but according to the CEA this did not happen to any significant degree. Digital displays represented 15% of sales in the first half of the year, and unit shipments are expected to rise by 8%. Strong cost-down pressures here will mean that full-year revenues fall in line with the broader market by 6% to $24B. However, another factor here is that consumers are now moving onto secondary displays at smaller sizes for bedrooms and elsewhere in the home beyond the main lounge. This secondary market is holding up well despite the recession and also helping to establish LCD as the main HDTV technology.

In the longer term, though, analysts and leading technology companies are looking to the mobile Internet device segment to help restore industry growth into 2010 and through 2011. The smartphone is already a strong player and one of the few product segments still expected to grow in revenue terms by the CEA this year, albeit by a modest 3% to $14B. However, the netbook market is still surging ahead with shipments set to rise by 85% to 8.5M and revenues set for $3.4B. One thing that netbooks have going for them, of course, is the public perception that in many cases they are a cheaper alternative to a traditional laptop, and therefore the recession can make them a more attractive buy.

In that light, more and more alliances are looking to exploit this emerging space. July's Design Automation Conference saw ARM and the Common Platform foundry alliance add EDA vendor Synopsys to their existing drive to harness high-K metal gate technology at the 28nm and 32nm nodes to SoCs for mobile applications. At the same event, Mentor Graphics also rolled out the latest extensions to its Vista software tools that allow low-power architectural optimization at the system level through transaction-level modeling. When companies start talking about potential 80% savings on power budgets, again the mobile device market immediately springs to mind.

Driving down board and components cost looks therefore like it will remain the prevailing theme this year, but innovation on the mobile front is also recognized as a necessity in the latest market data.


< COMMENTARY > INTERVIEW

Engineering creativity

Pulitzer-winning author. MIT professor. Junot Diaz shows that those who can do, also teach.

To the world at large, Junot Diaz is well on the way to becoming a literary superstar. His novel The Brief Wondrous Life of Oscar Wao has already earned critical superlatives and the 2008 Pulitzer Prize for Fiction. And deservedly so. It is a fabulous, accessible book, expressed in language that—when you buy a copy (and, trust me, you will)—immediately exposes the poverty of my attempts to praise it here. However, Oscar is not the main reason for our interest.

Like most authors—even the most successful ones—Diaz has 'the day job', and in this case it is as the Rudge and Nancy Allen Professor of Creative Writing at the Massachusetts Institute of Technology (MIT). Yes, he spends a large part of his time teaching scientists and engineers who are, in his words, "only visiting the humanities."

What Diaz is trying to achieve goes beyond the 'challenge' traditionally associated with science students and the written word. There is a lot of talk today about turning engineers into bloggers, smoothing communication within multinational teams, and the profession's need to explain itself to the outside world. Diaz aims to stimulate their creativity.

While that might sound less practical, when I mentioned this interview to a senior Silicon Valley executive, his eyes lit up. In his view, the one thing that university courses often fail to address is students' more imaginative capabilities. The industry does need team players and ascetically analytical thinkers, but given the obstacles it faces today, it desperately requires more newcomers who can make innovative leaps and deliver original ideas.

Diaz, for his part, does not approach his engineering majors from the perspective that they are all that much different from those focused on the humanities. At heart, it is about potential and engagement, not stereotypes.

"Yes, they're brilliant and yes, they've sacrificed a lot of social time to their studies, and yes, they're intense. But at the arts level, they seem a lot like my other students," he says. "I think there's a tendency at a place like MIT to focus on what's weird about the kids, but really what amazes me is how alike they are to their non-science peers. Some are moved deeply by the arts and wish to exercise their passion for it; many are not."

At the same time, he has noticed that science must, by its nature, place some emphasis on a very factual, very dry discourse and a hardcore empiricism. That is not a bad thing, rather the reflection of the demands of a different discipline.

"But it is in a class like mine where students are taught that often the story is the digression or the thing not said—that real stories are not about Occam's Razor but something far more beautiful, messy, and I would argue, elegant," he says.

There is another side to this dialogue—what the students bring to the classes as well. On one level, there is an inherently international side to MIT, given its reputation as a global center of excellence.

"I get to read stories from all over the world. What a wonderful job to have, to be given all these limited wonderful glimpses of our planet," says Diaz. "Having a student from Japan use her own background to provide constructive criticism about describing family pressures to a young person from Pakistan is something to cherish."

Multiculturalism is something that strikes a chord with Diaz for personal reasons. He is himself an immigrant from the Dominican Republic, and his experiences as part of that diaspora, that country's often bloody national history and his own youth in New Jersey all deeply inform Oscar Wao and his short stories.

"Many of my students are immigrants, so I know some of the silences that they are wrestling with," Diaz says. "It makes me sympathetic but also willing to challenge them to confront some of the stuff that we often would rather look away from: the agonies of assimilation, the virulence of some of our internalized racial myths, and things like that."

However, those students with a science background also inspire him. "In every class, I see at least one student who thinks the arts are ridiculous frippery, become, by the end of term, a fierce believer in the power of the arts to describe and explode our realities as human beings," says Diaz.

"Why I love non-arts majors is that they are much more willing to question the received wisdom of, say, creative writing, and that has opened up entirely new lines of inquiry. My MIT students want you to give them two, maybe three different definitions for character. And that's cool."

Diaz's own taste in authors puts something else in the mix here. The fantasies of Tolkien prove a frequent touchstone in his novel, although he seems even closer to a group of other English science-fiction writers who—arguably taking their cue from the genre's father H.G. Wells—explore nakedly apocalyptic themes, especially John Christopher. His prescience is indeed seeing him 'rediscovered' right now in the UK, for works such as The Death of Grass (a.k.a. No Blade of Grass), which foresees an ecological catastrophe.

"Christopher had a stripped-down economic style and often his stories were dominated by anti-heroes—in other words, ordinary human beings," says Diaz. "He may not be well known, but that changes my opinion not a jot. Like [John] Wyndham, like [J.G.] Ballard, Christopher deploys the apocalyptic mode to critique both our civilization and our deeper hidden selves, but I don't think anyone comes close to his fearless, ferocious vision of human weakness and of the terrible world we inhabit."

With opinions like that, MIT is obviously an excellent place to stir up the debate.

"I often find myself defending fantasy to students who themselves have to defend science-fiction to some of their other instructors," says Diaz. "That's why we're in class, to explode these genres, to explore what makes up their power and why some people are immune to their charms."

So, it is a pleasure rather than a challenge for this author to bring arts and literature to those following what we are often led to believe is a very separate academic and intellectual path. In fact, his parting thought is that the combination may in some respects be an improvement, more well-rounded even.

"All I can say is this one thing about engineering and science students: they are so much more accustomed to working as a team, and that's a pleasant relief from the often unbearable solipsistic individualism of more humanities-oriented student bodies," says Diaz.


The Brief Wondrous Life of Oscar Wao and Junot Diaz’s earlier collection of short stories, Drown, are both published in the USA by Riverhead Books.


< COMMENTARY > LOW POWER

Extending UPF for incremental growth

The latest revision of the Unified Power Format offers more flexible design and verification abstraction, explain Erich Marschner and Yatin Trivedi.

Accellera's Unified Power Format (UPF) is in production use today, delivering the low-power system-on-chip (SoC) designs that are so much in demand. Building upon that success, IEEE Std 1801-2009 [UPF] offers additional features that address the challenges of low-power design and verification. These include more abstract specifications for power supplies, power states, and other elements of the power management architecture, all of which provide additional flexibility for specification of low-power intent. Support for the incremental development of power architectures extends the usefulness of the standard into IP-based, hierarchical methodologies where base power architectures may have been established independent of the IP components used in the overall design. This article reviews some of the new features in IEEE Std 1801-2009 that can make UPF-based flows even more effective.

The Accellera standard, fully incorporated in IEEE Std 1801-2009, introduced the concept of a power architecture specification. This includes power domain definitions (i.e., groups of design elements with the same primary power supply), power distribution and switching networks, and power management elements such as isolation and level shifting cells that mediate the interfaces between power domains. The specification of state retention across power down/up cycles was also included. These capabilities encompass both the verification and the implementation semantics, so the same UPF specification can be used in both contexts. IEEE Std 1801-2009 builds on this concept to offer additional capability, flexibility and usability.

Supply sets

UPF introduced the capability of defining supply nets, which represent the power and ground rails in a design. In some cases, such as during the development and integration of soft-IP, it can be more useful to define nets that must be used in pairs to connect design elements to both power and ground. In addition, there may be other supplies involved, such as is entailed in support for bias rails. IEEE Std 1801-2009 introduced the concept of a 'supply set' so that it can represent such a collection of related nets.

Syntax:

create_supply_set <set_name>
    { -function { <func_name> [ <net_name> ] } }*
    [ -reference_gnd <supply_net_name> ]
    [ -update ]

Each of the supply nets in a supply set contributes some abstract function to it. The function names, specified via the '-function' option, can have either predefined values (e.g., 'power', 'ground', 'nwell', 'pwell', 'deepnwell', or 'deeppwell') or user-defined names. The predefined values allow the user to specify standard power functions such as the primary power and ground nets or bias power nets. The user-defined names may be used as placeholders for later modification or as potential extensions for analysis tools.

A supply set can be defined incrementally, as progressively more information becomes available. Initially, a supply set can be defined in terms of functions. Later, those functions can be mapped to specific supply nets. Separately, the supply set can be associated with a power domain; a power switch; or a retention, isolation, or level shifting strategy. For example, the initial definition of a supply set might be given as follows:

create_supply_set SS1 -function {PWR} -function {GND}

Later, the abstract functions 'PWR' or 'GND' can be mapped to specific supply nets, using the '-update' option:

create_supply_set SS1 -function {PWR vdd} -update
create_supply_set SS1 -function {GND vss} -update

The function names defined as part of a supply set also contribute to automating the connection of supply nets to pins of design elements. When a supply set is connected to a design element, each individual supply net in the set is automatically connected to a supply port based on the correspondence between supply net function and supply port attributes. For example, for a given supply set S, the supply net that is the 'power' net within S will be associated with any supply port that has the 'pg_type' attribute 'primary_power'.
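To pull these steps together, the fragment below sketches the incremental pattern described above. It is a minimal sketch rather than a complete power intent file: the names (SS1, vdd, vss, PD1, u_core) are placeholders, and the final association of the supply set with a power domain via 'create_power_domain -supply' is shown as typical IEEE 1801-2009 usage rather than something defined in this article.

# Declare the supply set abstractly, in terms of its functions only
create_supply_set SS1 -function {power} -function {ground}

# Once the rails exist, bind the functions to real supply nets
create_supply_set SS1 -function {power vdd} -update
create_supply_set SS1 -function {ground vss} -update

# Assumed usage: make SS1 the primary supply of a power domain, so that its
# 'power'/'ground' nets auto-connect to ports attributed as primary power/ground
create_power_domain PD1 -elements {u_core} -supply {primary SS1}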

Power States

The Accellera UPF standard introduced several commands ('add_port_state', 'create_pst', 'add_pst_state') to define the power states of the system in terms of the power states of the power domains of which it is composed. These in turn are defined in terms of the states and voltage levels of supply nets. In IEEE Std 1801-2009, these capabilities are expanded to include an additional approach to defining power states: the 'add_power_state' command. This new command applies to both supply sets and power domains.

Syntax:

add_power_state <object_name>
    -state <state_name>
    { [ -supply_expr <boolean_function> ]
      [ -logic_expr <boolean_function> ]
      [ -simstate <simstate> ]
      [ -legal | -illegal ]
      [ -update ]
    }

Power states can be defined abstractly at first, using '-logic_expr', the argument of which specifies the condition or conditions under which the object is in the given power state. The condition is given as a boolean expression that can refer to logic nets or supply nets. This is useful when power states are being defined before the power distribution network has been defined. Later, when the power distribution network is in place, the power state definition can be refined by specifying the state in terms of the states of supply nets only, using '-supply_expr'. The supply expression definition is the golden specification of the power state. When both '-logic_expr' and '-supply_expr' are given, '-supply_expr' is treated as the primary definition, and '-logic_expr' is then used as an assertion, to check that the more specific supply expression is indeed true only when the logic expression is also true. This helps ensure consistency as the power architecture is elaborated and further refined during development.

FIGURE 1 Simstates and their simulation semantics
Source: Accellera

Simstate | Combinational Logic | Sequential Logic | Corruption Semantics
NORMAL | Fully Functional | Fully Functional | None
CORRUPT_STATE_ON_ACTIVITY | Fully Functional | Non-Functional | Regs powered by the supply corrupted when any input to the reg is active
CORRUPT_STATE_ON_CHANGE | Fully Functional | Non-Functional | Regs powered by the supply corrupted when the value of the register is changed
CORRUPT_ON_ACTIVITY | Non-Functional | Non-Functional | Wires driven by logic and regs powered by the supply corrupted when any input to the logic is active
CORRUPT | Non-Functional | Non-Functional | Wires driven by logic and regs powered by the supply corrupted immediately on entering state
NOT_NORMAL | Deferred | Deferred | By default, same as CORRUPT. Tool may provide an override

Incremental refinement for power states can be quite powerful. In this case, it can be applied to the boolean function that defines the power state. For example, an initial power state specification might be given as follows:

add_power_state P1 -state FULL_ON -logic_expr PWR_ON

The end result is that object P1 will be in state 'FULL_ON' whenever logic net 'PWR_ON' evaluates to 'True'. A subsequent command can refine the logic expression to further restrict it, as in:

add_power_state P1 -state FULL_ON -logic_expr RDY -update

which would refine the logic expression for state ‘FULL_ON’ of P1 so that it is now ‘(PWR_ON && RDY)’. The same kind of refinement can be performed on the supply expression.
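For completeness, a hypothetical supply-side refinement of the same state, once net 'vdd' exists, might look like the line below. The supply-state literal form and the voltage value are assumptions based on common IEEE 1801-2009 usage, not an example taken from this article; with both expressions present, '-supply_expr' becomes the primary definition and the earlier '-logic_expr' acts as a consistency check.

# Assumed refinement: restate FULL_ON in terms of the vdd rail itself
add_power_state P1 -state FULL_ON -supply_expr {vdd == `{FULL_ON, 1.0}} -update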

Simstates

The Accellera UPF standard introduced simulation semantics for UPF specifications. In IEEE Std 1801-2009, the simulation behavior semantics have been further developed under the concept of 'simstates'. For any given power state, the simstate specifies the level of operational capability supported by that power state, in terms of abstractions suitable for digital simulation.

Several levels of simstate are defined. 'NORMAL' represents normal operational status, with sufficient power available to enable full and complete operational capabilities with characteristic timing. 'CORRUPT' represents non-operational status, in which power is either off or so low that normal operation is not supported at all, and both nets and registers are in an indeterminate state. In between these two extremes are three other simstates: 'CORRUPT_ON_ACTIVITY', 'CORRUPT_STATE_ON_ACTIVITY' and 'CORRUPT_STATE_ON_CHANGE'. These represent intermediate levels of ability to maintain state despite lower than normal power.

Simstates are defined for power states of supply sets. As the power state of a supply set changes during simulation, the corresponding simstate is applied to the design elements, retention registers, isolation cells, or level shifters to which the supply set is connected. The simulation semantics of the various simstates are shown in Figure 1.

Incremental refinement applies to simstates as well. The simstate of a given power state can be defined initially as 'NOT_NORMAL', indicating that it is a non-operational state, without being more specific. A later UPF command can update the definition to specify that it is any one of the simstates other than 'NORMAL'. By default, the 'NOT_NORMAL' simstate is treated as if it were the 'CORRUPT' simstate, a conservative interpretation. By refining a 'NOT_NORMAL' simstate, the corruption effects can be limited so that they only apply to activity on wires or state elements, or to changes of state elements.
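As an illustration of that refinement path, a minimal sketch using only the options shown in the syntax above might be (object and state names are placeholders, not taken from the article):

# Early in development: mark SLEEP as non-operational without choosing a corruption model
add_power_state P1 -state SLEEP -simstate NOT_NORMAL

# Later: refine the simstate so corruption is applied only on register activity
add_power_state P1 -state SLEEP -simstate CORRUPT_STATE_ON_ACTIVITY -update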

Summary

The Accellera UPF standard provided an excellent foundation for the development of low-power design and verification capabilities and is in production use with customers today. As a standard, it has enabled the interoperability of tools, which has in turn made possible a variety of low-power design and verification flows to meet different needs. Building upon this foundation, IEEE Std 1801-2009, which completely incorporates the Accellera UPF standard, defines new abstractions that can provide more flexible capabilities to support the incremental development of power architectures. These new capabilities can help UPF users address the low-power design and verification challenges that are becoming increasingly significant in today's product development cycles.

Erich Marschner and Yatin Trivedi are members of the IEEE 1801 Working Group.


320,000,000 MILES, 380,000 SIMULATIONS AND ZERO TEST FLIGHTS LATER.

THAT’S MODEL-BASED DESIGN.

Accelerating the pace of engineering and science

After simulating the final descent of the Mars Rovers under thousands of atmospheric disturbances, the engineering team developed and verified a fully redundant retro firing system to ensure a safe touchdown. The result—two successful autonomous landings that went exactly as simulated. To learn more, go to mathworks.com/mbd

©2005 The MathWorks, Inc.


< TECH FORUM > EMBEDDED

Linux? Nucleus? …Or both?

Colin Walls, Mentor Graphics

Until recently, operating system (OS) specification for embedded systems has been seen largely as an 'either/or' exercise. Similarly, OSs that have their foundations in the embedded market and those that have grown out of desktop computers have been seen as competing rather than complementary technologies.

Cost and performance criteria within specifications will often lead to one technology winning out over another. But as hardware moves increasingly to multicore architectures, it is also the case that different types of OS can be specified within a single overall end-product, each specifically handling those tasks it carries out most efficiently.

This article compares one real-time OS (Nucleus from Mentor Graphics) of the type traditionally associated with embedded systems with a general-purpose OS (Linux) that is increasingly being used in that market, to identify their various advantages, and also the emerging opportunities for their use alongside one another.

Introduction

Recent years have seen a lot of publicity and enthusiasm around implementations of Linux on embedded systems. It seems that another mobile device manufacturer announces its support for the Linux general purpose OS (GPOS) every week. At first, many developers viewed Linux as an outright competitor to established and conventional real-time OSs (RTOSs). Over time, though, as the various options have been tried out in the real world, a new reality has dawned. Each OS has its own strengths and weaknesses. No one OS fits all.

To illustrate this new reality, this article takes a closer look at the differences between a commercial RTOS (Nucleus) and a GPOS (Linux)—and considers how the two OSs might even work together.

Embedded vs. desktop

The phrase 'operating system' causes most people to think in terms of the controlling software on a desktop computer (e.g., Windows), so we need to differentiate between the programming environments for a PC and those for an embedded system. Figure 1 highlights the four key differences.

It is interesting to note that as new OSs are announced (such as the Chrome OS for netbooks), the differences between a desktop computer and an embedded system become even less obvious. Further, more complex embedded systems now have large amounts of memory and more powerful CPUs; some often include a sophisticated graphical user interface and require the dynamic loading of applications. While all these resources appear to facilitate the provision of ever more sophisticated applications, they are at odds with demands for lower power consumption and cost control.

Obviously OS selection is not a simple matter. It is driven by a complex mix of requirements, which are intrinsic in the design of a sophisticated device. The ideal choice is always a combination of technical and commercial factors.

Technical factors for OS selection

From a technical perspective, the selection criteria for an OS revolve around three areas: memory usage, performance, and facilities offered by the OS.

Memory

All embedded systems face some kind of memory limitation. Because memory is more affordable nowadays, this constraint may not be too much of a problem. However, there are some situations where keeping the memory usage to a minimum is desired; this is particularly true with handheld devices. There are two motivations for this: cost—every cent counts in most designs; and power consumption—battery life is critical, and more memory consumes more power.

For memory-constrained applications, it is clearly desirable to minimize the OS memory footprint. The key to doing this is scalability. All modern RTOS products are scalable, which means that only the OS facilities (i.e., the application program interface (API) call service code) used by the application are included in the memory image. For certain OSs, this only applies to the kernel. Of course, it is highly desirable for scalability to apply to all the OS components such as networking, file system, the user interface and so on.

Performance

The performance of a device on an embedded application is a function of three factors: application code efficiency, CPU power and OS efficiency. Given that the application code performance is fixed, a more efficient OS may enable a lower power CPU to be used (which reduces power consumption and cost), or may allow the clock speed of the existing CPU to be reduced (which can drastically reduce power consumption). This is a critical metric if the last ounce of performance must be extracted from a given hardware design.

Facilities

Ultimately, an OS is a set of facilities required in order to implement an application. So, the availability of those facilities is an obvious factor in the ultimate selection. The API is important, as it facilitates easy porting of code and leverages available expertise. Most RTOSs have a proprietary API, but POSIX is generally available as an option. POSIX is standard with Linux.

Colin Walls is a member of the marketing team of the Mentor Graphics Embedded Systems Division and has more than 25 years of experience in the electronics industry. He is a frequent presenter at conferences and seminars, and the author of numerous technical articles and two books on embedded software.


Most embedded OSs employ a thread model; they do not make use of a memory management unit (MMU). This facilitates maximum OS performance, as the task context switch can be fast, but it does not provide any inter-task protection. Higher-end OSs, like Linux, use the process model where the MMU is employed to completely insulate each task from all of the other tasks. This is achieved by providing a private memory space at the expense of context switch speed. An interesting compromise is for a thread-based OS to utilize an MMU to protect other tasks' memory, without remapping address spaces. This provides significant inter-task protection, without so much context switch time overhead.

Of course, there is much more to an OS than the kernel and the breadth and quality of the middleware. The availability of specific application code, preconfigured for the chosen OS, may be very attractive. Even if the current target application does not need all of these capabilities, possible upgrades must be anticipated.

Commercial factors for OS selection

The primary consideration in any business decision is cost. Even though selecting an OS is apparently a technical endeavor, financial issues can strongly influence the choice. There are initial costs, which include the licensing of software or procurement of tools, and there are ongoing costs, such as runtime licenses or maintenance charges.

All software has some kind of license, so the costs of legal scrutiny must be factored in, as lawyers with appropriate technical skills do not come cheap. There is also the question of ongoing support and who is going to provide it.

Finally, for most state-of-the-art devices, time-to-market is extremely tight, so the extent to which the choice of OS can accelerate development may be measured not as a cost, but rather, as a cost savings.

Linux or Nucleus? One company's experience

BitRouter, a successful software company from San Diego, California, builds turnkey software solutions for set-top box and television applications. The company has implemented solutions using the Mentor Graphics Nucleus RTOS, uC/OS-II, VxWorks, OS20, WIN32, commercial Linux, as well as embedded Debian Linux for ARM. Some of BitRouter's main customers include Texas Instruments, Toshiba Semiconductors, NXP Semiconductors, ST Microelectronics, Motorola, RCA and NEC.

BitRouter had the opportunity to implement similar digital-to-analog converter set-top boxes using Linux and Nucleus. The Nucleus-based set-top box had a Flash and RAM footprint of roughly half that required by the similar Linux-based set-top box. The boot time required for video to play was three seconds with Nucleus compared to ten seconds with Linux.

FIGURE 1 Differences between a desktop computer and an embedded system.
Source: Mentor Graphics

Desktop Computer | Embedded System
Runs different programs at different times depending upon the needs of the user. | Runs a single, dedicated application at all times.
Has large amounts of (RAM) memory and disk space; both can be readily and cheaply expanded if required. | Has sufficient memory, but no excess; adding more is difficult or impossible.
All PCs have an essentially identical hardware architecture and run identical system software. Software is written for speed. | Embedded systems are highly variable, with different CPUs, peripherals, operating systems, and design priorities.
Boot up time may be measured in minutes as the OS is loaded from disk and initialized. | Boot up time is almost instantaneous—measured in seconds.

FIGURE 2 High-level software architecture with the separation between the control and data planes in a multi-OS/AMP system.
Source: Mentor Graphics

This is just one example of how an application dictates the most suitable OS. In this situation, a commercial RTOS was better suited because it was small, compact, and it was being built into a high-volume system where memory footprint and boot-up time were key issues.

BitRouter is nevertheless still a big supporter of Linux and believes Linux will be a good fit for Java-based set-top boxes and TV sets where the total RAM footprint can exceed 64MB—and where constrained space is not such a critical issue.

The next frontier: multicore, multi-OS designs

The OS selection between Nucleus and Linux may not even be the right question. Perhaps it is better to ask how these two OSs can work together, maximizing their respective strengths to address ever-challenging performance and power efficiency requirements for today's increasingly complex embedded applications.

One solution is moving to multicore system development. And while multicore has been around for some time, what is new are the recent advancements in asymmetric multi-processing (AMP) systems. An AMP multicore system allows load partitioning with multiple instantiation of the same or different operating systems running on the same cores.

Figure 2 shows a basic, high-level software architecture for an AMP system. The design includes both a GPOS and an RTOS—each serving distinctly different purposes. The system is partitioned in such a way that the GPOS handles the control plane activities (e.g., drivers, middleware, GUI) and the RTOS handles data plane activities that are time-sensitive, deterministic, and computationally more intensive.

A key ingredient in the success of the multi-OS/AMP system is the Inter-Process Communication (IPC) method, which, until recently, varied from one design to the next. IPC allows the OSs to communicate back and forth. Today, there are a number of open standards for IPC, which will further expedite multicore, multi-OS development.

Figure 3 takes the multi-OS example one step further. It shows a few of the design decisions behind the integration of the GPOS and RTOS, and a real-world example can be envisioned in terms of what fabless chip vendor Marvell recently accomplished with its Sheeva line of embedded processors.

The company is a specialist in storage, communications and consumer silicon, with a focus on low-power, high-performance devices. Sheeva allows developers to use dual OSs to manage separate function requirements. For example, in one application for enterprise printing, Nucleus could be used for inter-operational tasks where speed is of prime importance, while Linux could be used for networking and the user interface.

Conclusion

Conventional embedded RTOSs and desktop-derived OSs each have a place in the embedded ecosystem.

An RTOS makes less demand on resources and is a good choice if memory is limited, real-time response is essential, and power consumption must be minimized. Linux makes good sense when the system is less constrained and a full spectrum of middleware components can be leveraged.

Finally, there are an increasing number of instances where multicore design can benefit from a multi-OS approach—on a single embedded application—maximizing the best of what Nucleus and Linux have to offer.

More information

Colin Walls also blogs regularly at http://blogs.mentor.com/colinwalls

Mentor Graphics
Corporate Office
8005 SW Boeckman Rd
Wilsonville, OR 97070
USA

T: +1 800 547 3000
W: www.mentor.com

FIGURE 3 Partitioning of system resources in a multicore, multi-OS design.
Source: Mentor Graphics
[Diagram: the GPOS (Linux) and the RTOS (Nucleus) run on separate cores (Core 0 and Core 1) and communicate via IPC; peripherals such as the UART, PCI, LCD, I2C, SPI, interrupt controller, timer, Ethernet and USB, plus external memory, are partitioned into GPOS devices, RTOS devices and shared devices.]


Copyright 2009 Taiwan Semiconductor Manufacturing Company Ltd. All rights reserved. Open Innovation Platform™ is a trademark of TSMC.

Performance. To get it right, you need a foundry with an Open Innovation Platform™ and process technologies that provides the flexibility to expertly choreograph your success. To get it right, you need TSMC.

Whether your designs are built on mainstream or highly advanced processes, TSMC ensures your products achieve maximum value and performance.

Product Differentiation. Increased functionality and better system performance drive product value. So you need a foundry partner who keeps your products at their innovative best. TSMC’s robust platform provides the options you need to increase functionality, maximize system performance and ultimately differentiate your products.

Faster Time-to-Market. Early market entry means more product revenue. TSMC’s DFM-driven design initiatives, libraries and IP programs, together with leading EDA suppliers and manufacturing data-driven PDKs, shorten your yield ramp. That gets you to market in a fraction of the time it takes your competition.

Investment Optimization. Every design is an investment. Function integration and die size reduction help drive your margins. It’s simple, but not easy. We continuously improve our process technologies so you get your designs produced right the first time. Because that’s what it takes to choreograph a technical and business success.

Find out how TSMC can drive your most important innovations with a powerful platform to create amazing performance. Visit www.tsmc.com

A Powerful Platform for Amazing Performance


< TECH FORUM > ESL/SYSTEMC

Bringing a coherent system-level design flow to AMS

Mike Woodward, The MathWorks

For two decades, the benefits of ESL and abstractions have supposedly been confined to engineers working on digital designs and to system architects. Analog and mixed-signal (AMS) design has largely remained a 'circuit level' activity. This article shows that tools now exist that also allow AMS engineers to exploit abstraction, and that can make all types of design flow (analog only, but also where AMS/RF and digital elements are combined) more efficient and more comprehensive.

Software such as Simulink provides access to an extensive range of models, and the same tools can also provide a common communications platform between AMS and digital teams that helps both sides see how one part of a design affects another. Even at the level of the specification itself, cumbersome and sometimes ambiguous paper documentation can be replaced with digital files that define goals and intent throughout the life of a project.

There is a widespread belief that analog and mixed-signal (AMS) design cannot take advantage of abstractions and other ESL design techniques that have shortened design cycles and raised efficiency in the digital domain. This article will show that the reverse is true. Although the first generation of ESL tools tended to focus on linking hardware and software, there are ESL tools that enable AMS engineers to design at the system level and exploit the productivity advantages of ESL. These tools also improve the design and verification of the interface between the analog and digital worlds, where experience shows us that many bugs lurk.

Changing any design flow does involve some risk, sometimes a considerable amount. However, the techniques discussed here are at the lower end of that scale, and demonstrate that even a few minor tweaks can have a dramatic effect.

The status quo

First, let's identify some existing problems that we can fix in a straightforward way. Figure 1 shows a typical AMS design flow that we will use as a vehicle.

A project usually starts with a specification. An internal team—perhaps 'a research group' or 'a systems and algorithms group'—creates models of the proposed system and starts running conceptual simulations. It then passes the resulting goals on to the digital and analog design groups for the implementation stage. The specification normally arrives as a paper (e.g., Acrobat, Word) document.

At this point, the two teams go off and design their parts of the system. In principle, there should be steady back-and-forth communication between them as the design progresses from concept to implementation. After that we come to verification, then to a prototype as a chip or PCB that is typically fabricated by a third-party manufacturing partner.

What are the problems here? The left hand side of the diagram shows the digital designers working through a 'design, verify and implement' loop. They have access to some very good design abstractions that, over the last two decades, have enabled them to work at much higher levels, away from the circuit level. However, analog designers have not seen quite the same amount of advances. Typically, they still focus at the circuit level, so the loop there becomes 'implement and verify' only.

Meanwhile, our flow assumes that there is efficient communication between the digital and analog teams, but we all know that is not often the case. This is another candidate for major improvement.

Let's also ask if we can do better than a paper specification. Could we make the design process more coherent as a project flows from concept to verification through implementation, and also execute the design in a way that really exploits the system knowledge generated during specification?

This gives us three flow-based objectives that we will now address under the following headings:

• Behavioral abstraction for AMS;
• Linking tools and teams; and
• A coherent design process.

Mike Woodward is industry manager for communications and semiconductors at The MathWorks. He has degrees in Physics, Microwave Physics and Microwave Semiconductor Physics, and was a leading player in a team that won the British Computer Society's IT Award for Excellence in 2000. His work on audio signal processing has led to several patents.


FIGURE 1 A typical AMS/digital design flow
Source: The MathWorks


These are all key features of the Simulink tool suite. We are now going to look at them in both the generic sense and by way of some actual user experiences.

Behavioral abstraction for AMS

Simulink enables you to take a more abstract, or behavioral, view of a system. Models are created by assembling components from pre-existing libraries, or by creating components if they do not already exist. This speeds up the design process compared with building each component from scratch every time. Let's take a more specific view of how that works.

Sigma delta modulatorFigure 2 is a model of a second-order sigma delta analog-to-dig-ital converter (ADC). This example gives us the opportunity to show how analog and digital elements are connected together.

How did we get that? Simulink has a set of librar-ies available to it, and you construct models through a drag-and-drop interface. So, progressing to Figure 3 (p. 22), this is the interface where we find that we can drop in an integrator, a source and a gain. Some of these components are analog, and some are digital—we can connect them together directly. Having connected them up, we will want some kind of output so we put in a oscilloscope.

What if we want some behavior that is not the default behavior? Say we want a specific gain signal for example. To do that, you simply double-click on the gain block, and that opens up a dialogue box where you can enter the data (Figure 4, p. 23).

What happens if you want some behavior that is not in an existing library? Then you can create your own Simulink blocks using C or Matlab.

As with many mixed-signal systems this model has a feedback loop in it, something that can cause significant problems for some simulation environments. Simulink copes with feedback loops naturally, and in fact that capa-bility was built-in right from the start.

Variable time step handlingThe temporal behavior of a typical analog system is fairly constant and predictable over long periods, but can some-times undergo dramatic change for short periods. This presents a significant challenge when you want to run simulations.

A simulation can take very large time steps. That will save on computational power and time, but also means the simulation is likely to miss capturing the system’s

For two decades, the benefits of ESL and abstractions have been supposedly confined to engineers working on digital designs and to system architects. Analog and mixed-signal (AMS) design has largely remained a ‘circuit level’ activity. This article shows that tools exist that now also allow AMS engineers to exploit abstraction, and that can make all types of design flow (analog only, but also where AMS/RF and digital elements are combined) more efficient and more comprehensive.

Software such as Simulink provides access to an extensive range of models, and the same tools can also provide a common communications platform between AMS and digital teams that helps both sides see how one part of a design affects another. Even at the level of the specifica-tion itself, cumbersome and sometimes ambiguous paper documentation can be replaced with digital files that define goals and intent throughout the life of a project.

FIGURE 2 Second order sigma-delta ADC

Source: The MathWorks



Alternatively, it can take very small time steps throughout. This captures rapidly changing behavior, but is also unnecessarily lengthy and computationally very expensive.

Simulink offers a third approach. Here, the simulation takes a variable time step, so that when the system is changing very rapidly, it sets a small time step, and when the system is hardly changing at all it sets larger ones. This provides the balance between accuracy and computational efficiency.
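The principle can be illustrated with a generic error-controlled integrator. This is not Simulink's solver, just a minimal C sketch, under assumed dynamics and tolerances, of the basic mechanism: estimate the local error, shrink the step when the solution is changing quickly, and grow it when the solution is hardly changing.

```c
/* Minimal error-controlled (variable step) integration sketch - illustrative only. */
#include <math.h>
#include <stdio.h>

/* example dynamics: a fast decay, dx/dt = -k*x, with k chosen arbitrarily */
static double f(double t, double x) { (void)t; return -50.0 * x; }

int main(void)
{
    double t = 0.0, x = 1.0;
    double h = 1e-3;                        /* initial step */
    const double tol = 1e-4, hmin = 1e-7, hmax = 0.1;

    while (t < 1.0) {
        /* one full Euler step vs. two half steps gives a local error estimate */
        double x_full = x + h * f(t, x);
        double x_half = x + 0.5 * h * f(t, x);
        double x_two  = x_half + 0.5 * h * f(t + 0.5 * h, x_half);
        double err = fabs(x_two - x_full);

        if (err > tol && h > hmin) {
            h *= 0.5;                       /* solution changing fast: shrink step */
            continue;                       /* retry the step */
        }
        t += h;
        x  = x_two;                         /* accept the more accurate estimate */
        if (err < 0.25 * tol && h < hmax)
            h *= 2.0;                       /* solution hardly changing: grow step */
        printf("t=%g x=%g h=%g\n", t, x, h);
    }
    return 0;
}
```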

Our ADC system has different data rates, something we can see by turning on the rate display as shown in Figure 5 (p. 24). Different colors show different data rates, with variable time step branches in black (analog components), and the different fixed steps in different colors (the digital components). As you can see, the model consists of multiple different time steps. Note how blocks operating at different rates are directly connected together.

If various time steps are being used in the simulation, how can we control them? Figure 6 (p. 26) shows the Simulink configuration menu where the time steps are controlled. This is a simple interface where the default options are the best in almost all cases. If we have analog components in the model we can select a variable time step, or if the model is wholly digital we can select a fixed time step.

If you need even greater control, you can change the solver, you can change the minimum tolerances, or you can change the maximum step sizes. All those advanced options are there. But if you just want a ballpark answer, you can use the default options.

Executable specifications

If this kind of system-model is developed early on, then it can be used as an executable specification. Paper specifications are ambiguous, but an executing system is not. An executable specification in the form of a functioning system model enables geographically separated groups to reason about the behavior of the system and to understand how their component has to fit into the overall system.

The bottom line

This design process enables us to get something up and running very quickly—we can find an accurate behavioral solution much more rapidly than is usually the case in AMS design.

Lastly, before we start putting all this together, we must note that while efficiencies in modeling and temporal analysis are important, there may be points where the granularity of circuit-level simulation is required. That is still available. You do not give it up when you move to a more system-oriented flow, as we will now go on to discuss.

In the real world

Semiconductor company IDT New Wave was looking to improve its mixed-signal simulations. Its previous method was based purely at the circuit level, and it used to take days to run. The feedback loops in the design slowed the simulation engine down greatly. In addition to the variable time step solver, Simulink has the capacity to deal with algebraic loops, so IDT was able to use the tools and concepts described above to shorten its design cycle and identify algorithmic flaws earlier in its flow.

Let's summarize the benefits of using this type of approach. Using traditional design models, you can easily become entangled in the details of the analog components, and because of the cost of changing these models, can only examine a few different architectures. By taking a more abstracted approach, you can quickly evaluate a wider range of architectures. This more comprehensive exploration will give you confidence that your final decision is the 'right' one. Its rapid design capability also substantially reduces the risk of serious errors being found later in the design process and helps avoid respins or late-stage ECOs.

Linking tools and teams

Effective communication between analog and digital engineers has a tremendous impact on your flow's efficiency once you move beyond behavioral evaluation toward actual implementation. Consider again Figure 1. Our simplified design flow shows no obstacles between the digital and analog groups, but in many cases there might as well be a brick wall.

FIGURE 3 Libraries of analog and digital components

Source: The MathWorks


One semiconductor company told us it was so hard to get the analog and digital teams to communicate during design that they waited until they had a test chip from the foundry before testing the analog-digital interface.

The problem is not that these teams inherently dislike one another or do not appreciate the need for constant and productive communication; rather, it lies in the lack of a common language.

A digital engineer will at some point need to check that his design works with the design of his analog counterpart. So, he will ask his colleague to supply some test source, and that colleague will then run an analog design tool to output the necessary data. The data will almost certainly not be in a format that the digital tools can read. So, it will need some translation—and maybe a graduate engineer at the company has written some Perl scripts to do that. But, if there are 10, 20 or 30 seconds of data, it will be an enormous file and it is going to take time to process. Finally, though, the digital engineer can read the data into his work and get some answers.

Then, inevitably, the analog designer asks his digital colleague to return the favor and we go through the process again, but in the other direction.

There are several problems with this.

• It is slow. The time taken for simulation is compounded by the time taken translating and transferring data.

• It is cumbersome. Simply moving these enormous files around can be awkward. I have worked on a project where, even with a fast link between two different sites in different countries, translation and processing meant that it was still quicker to burn DVDs and send them via a courier than to transfer the data online.

• It is very static, and this is the biggest problem of all. We cannot simulate the dynamic analog-digital interaction. If something in the simulation causes the modulation scheme to change, this will affect the analog components, and this change in the analog behavior may in turn affect the digital parts of the system. This kind of interaction cannot be studied using a file-based simulation method.

Both analog and digital designers need a common platform that removes these obstacles. This is where Simulink shows its capabilities. Not only can it simulate mixed-signal systems, but it has links to tools from other vendors for implementation, and those other vendors have links from their products to Simulink.

In the digital domain, there are co-simulation links from Simulink to Mentor Graphics' ModelSim, Synopsys' Discovery and Cadence Design Systems' Incisive. In the analog domain there are links to Cadence's PSpice and Spectre RF, Synopsys' Saber and others.

Co-simulation can be defined as the use of bidirectional runtime links between Simulink (and, for that matter, Matlab) and other tools. You can run the two tools together and exchange data in real time as the simulation progresses. Basically, for every model time step you take, you exchange data. This means you can see the dynamic changes of behavior between the different models in the system. In essence, we thus have Simulink acting as the common design platform. From a behavioral description, you can now call down from a higher level abstraction to a detailed implementation in a tool from another vendor—all at run time.
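Conceptually, the run-time link is a lock-step loop in which the two simulators advance together and swap signal values at each step. The following C sketch shows only that pattern; the function names and the dummy models are placeholders and do not correspond to any vendor's actual co-simulation API.

```c
/* Conceptual lock-step co-simulation loop - function names are placeholders,
 * not a real Simulink/ModelSim/SpectreRF API. */
#include <stdio.h>

typedef struct { double analog_out; double digital_out; } bus_t;

/* stand-in for the behavioral system model advancing one time step */
static void system_model_step(double t, const bus_t *in, bus_t *out)
{
    (void)t;
    out->analog_out = in->digital_out * 0.5;            /* dummy behavioral model */
}

/* stand-in for the detailed implementation advancing one time step */
static void implementation_step(double t, const bus_t *in, bus_t *out)
{
    (void)t;
    out->digital_out = (in->analog_out > 0.1) ? 1.0 : 0.0;  /* dummy RTL proxy */
}

int main(void)
{
    bus_t to_impl = {0.0, 1.0}, to_model = {0.0, 0.0};
    const double dt = 1e-6;

    for (double t = 0.0; t < 1e-3; t += dt) {
        system_model_step(t, &to_model, &to_impl);       /* model produces stimulus */
        implementation_step(t, &to_impl, &to_model);     /* implementation responds */
        /* values are exchanged every step, so each side sees the other's dynamics */
    }
    printf("co-simulation loop finished\n");
    return 0;
}
```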

Let's take an example. A real RF receiver will introduce distortions into a signal that affect the behavior of a digital receiver. By making a call from Simulink to, say, ModelSim, a digital engineer can see straightaway how his digital receiver implementation copes with those distortions and decide if the result is within tolerance or if the design needs to be changed. Meanwhile, an analog engineer can call from the analog portions of his system in Simulink to see implementations running on SpectreRF. He can thus see how his designs perform within the context of the digital simulation in Simulink.

FIGURE 4 Adding behavior

Source: The MathWorks



In both scenarios, Simulink is acting as a test harness, giving analog and digital designers confidence that the interplay between their work will actually meet the needs of the final system, and providing that information much more quickly and dynamically.

• This is faster. There is no need to swap files. We can use just one model and isolate the pieces that we want to test by calling directly down to other appropriate software.

• It is easier. There are no huge data files floating around. In fact, all that’s ‘floating around’ is the common and agreed Simulink model.

• It's very dynamic. We can almost immediately see how changes in the digital system affect the analog system and vice versa because they all execute in the same model at the same time. Two vendors' simulation environments work together to enable you to study your environment much better.

As well as the links cited above, Simulink has links to test equipment from manufacturers such as Agilent, Anritsu, LeCroy, Rohde & Schwarz and others, enabling hardware-in-the-loop testing and bringing to bear the power of Matlab data analysis for interpreting the test results.

We have moved from a very high level of abstraction into the implementation world by using Simulink models as the test harness for analog and digital designs. Reusing system-models in this way enables us to find errors much earlier in the flow.

In the real world

Realtek was developing an audio IC and had the same need to improve intra-team communication while streamlining the project's life cycle. Using Simulink as a common design platform, they were able to get the teams to share data from early in the design process and this made it far easier for them to work together. The teams could speak the same language and use the same test environment. Notably, the resulting design took a very high market share in its first year of release.

A coherent design process

We are now ready to bring all these elements together in a single flow and create the coherent design process we discussed earlier.

Simulink allows you to combine multiple domains in the same model: continuous time, discrete time, discrete event, finite state machines, physical and circuit models. So, one simulation can include digital hardware, analog and RF hardware, embedded software and the environment, with each part interacting with the others as appropriate.

Using Simulink, you can quickly create an abstract behavioral model of a system. This enables you to very rapidly choose between different system architectures, so giving you greater confidence the design will work on a first pass.

Moving on to implementation, analog parts of a model can be removed and replaced by co-simulation links to an analog design tool. The analog implementation team can thus continue to use its existing design tools, and also dynamically test the behavior of its analog subsystem against the dynamic behavior of the digital subsystem.

Similarly, the digital engineers can replace the digital portions of the model with co-simulation links and hence test the digital implementation against the analog behavior of the system.

FIGURE 5 Exposing the clocks in the ADC

Source: The MathWorks


This approach flushes out interface errors much earlier in the design process.

The final step in this process is the reuse of the system-model as a golden reference against which to compare the delivered hardware. This involves connecting Simulink to the test equipment and to the analysis and interpretation of the captured test results.

In essence, we are reusing the same system-model again and again as we move through the design stages.

Summary

By introducing abstraction into the design process we enable AMS designers to find a working architecture much more quickly than they could previously. We've linked different groups in the design process via a common design platform. And we've cut down on development effort by reusing the same model through the design process.

In more specific terms for AMS designers, we have shown how they can gain practical advantages by taking a more abstract system-level view, and how this same process can be used to improve the communication between analog and digital engineers.


The MathWorks
3 Apple Hill Drive
Natick
MA 01760
USA

T: +1 508 647 7000
W: www.mathworks.com

FIGURE 6 Controlling the time steps

Source: The MathWorks


A unified, scalable SystemVerilog approach to chip and subsystem verification
Thomas Severin, Robert Poppenwimmer, LSI

The ability to perform chip and submodule verification within a unified and scalable SystemVerilog (SV) environment minimizes the effort required for testbench development and frees up resources for work on the primary test cases specified in the verification plan.

The right combination of SV verification libraries (as found in the Advanced Verification Methodology (AVM) and the Open Verification Methodology (OVM)), SV Assertions and integrated processor models covers all verification needs. It offers the wide range of configurations needed to ultimately achieve dedicated submodule verification at maximum simulation speed without simulation overhead.

This approach reduces the overall time needed for functional verification of a system-on-chip (SoC) and its corresponding submodules by exploiting several inherent advantages. These include scalability, automation, flexibility and reuse.

• The test environment is configurable at a high level, so the user can focus tightly on the part of a design that must be verified. Both top-module and submodule verification tests can be executed.

• Automation includes test and data generation, self-checking tests and regression mechanisms.

• The user can employ multiple stimulus techniques—these can include direct stimulation for integration testing, randomized stimulation for higher levels of abstraction during block-level testing, and processor models.

• Finally, the environment can be established as an open framework. The transactor models and scoreboards include standardized transaction-level modeling (TLM) interfaces. This greatly reduces ramp-up time for the verification of new designs.

Traditional verification methodologies use separate environments for chip and submodule verification. By contrast, the newer, more integrated strategy described here provides improvements in efficiency that more than repay the extra effort required during the initial set-up and to manage the increased complexity of the resulting environment. Moreover, only one environment now needs to be developed to cover all top-level and submodule verification tasks, and the increased complexity can be resolved by a well-defined class and directory structure and by documentation.

This article describes the SoC verification process for a specific chip to illustrate the approach. The design was a state-of-the-art, multi-million-gate storage controller ASIC including a variety of interface standards, mixed-language RTL and intellectual property (IP) blocks. The top-level module was captured in VHDL. It contained VHDL, Verilog and SystemVerilog RTL components, with multiple IP blocks as both hard and soft macros (Figure 1). The SoC could be partitioned into three main parts: an ARM subsystem with three CPUs, a host subsystem containing the host interfaces, and a subsystem with customer logic.

The environment

The SV environment was built on top of an AVM library to take advantage of the methodology's TLM features. You can also use OVM or other libraries. Several transactors served the different external interfaces (e.g., SAS, SATA, Ethernet and various memory interfaces). We used SV's 'interface' construct so that tests could access the device-under-test's (DUT's) internal and external interfaces. All transactors outside the SV environment (e.g., memory models, ARM trace ports) were instantiated in the top-level testbench along with the DUT and the SV verification environment.

The SV environment itself was instantiated inside a program block in the top-level testbench file. This gave us the option of using 'force' instructions in tests where considered necessary. We also used a test factory class that generated a user-selectable environment object during runtime. To achieve this, the simulator call included a parameter that selected the type of environment class to be used as the test environment. This allowed us to construct different transactor configurations in each test. It also allowed us to run different tests without recompiling and restarting the simulator: when one test was finished, the corresponding environment object would be destroyed and the next test's environment object was constructed and started.

Thomas Severin is a SoC product development engineer in LSI’s Storage Peripheral Division. He has 10 years of experience in the ASIC industry and joined LSI in 2008. His special focus is in the use of advanced verification methodologies for complex designs.

Robert Poppenwimmer is a senior ASIC SoC design/verification engineer in LSI's Storage Peripheral Division. He joined LSI in 2000 and has a master's degree in electrical engineering from the Technical University of Munich.



The article describes LSI's work on the use of a single SystemVerilog-based (SV) verification environment for both the chip and its submodules. The environment is based on the Advanced Verification Methodology (AVM) SV libraries, although alternatives are available. One particular reason for choosing AVM was that LSI wanted to leverage its transaction-level modeling capabilities, among other advantages.

"A verification environment that offers reusability, scalability and automation allows our verification experts to focus on the functional verification goals of a complex SoC much more efficiently," says Thomas Kling, engineering manager for Custom SoC Products in LSI's Storage Peripheral Division.

The main part of the article describes the environment's development and application to a specific design: a multi-million-gate storage controller ASIC equipped with a variety of interface standards and intellectual property blocks, and expressed at the RTL in multiple languages. "In using the described approach we were able to increase our engineering efficiency and maintain a high level of flexibility for reaching our verification goals on time," says Kling.


The environment consisted of a base class that included all objects common to each test, with the exception of the three AHB drivers. All tests were classes derived from this base environment class, formed after individual selection of additional testbench modules (e.g., scoreboards, reference models, monitors and drivers for the DUT's internal interfaces).

Test strategy

Our strategy primarily took two approaches to verification. Many tests covered control functions that were verified by applying directed tests, often supported by assertions to get feedback on the functional coverage. When it came to data path testing, we used a more suitable directed random approach. These tests were appropriate for testing the memory paths, Ethernet packet transfers, and SAS and SATA frame handling.

So, for the directed random approach, we implemented memory-based, self-checking capabilities. Data accesses were randomly applied to both the path-to-be-tested and a virtual reference memory model. All the read data packets were then routed to a scoreboard for comparison.
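The self-checking idea is easiest to see in a small sketch. The real environment implemented it with SystemVerilog transactors and an AVM scoreboard; the C fragment below is only a language-neutral illustration of the same pattern, with dut_write() and dut_read() standing in for the memory path under test.

```c
/* Illustration of the directed-random, self-checking idea: apply the same
 * random accesses to the path under test and to a reference memory, and
 * compare on every read. dut_write/dut_read are hypothetical stand-ins. */
#include <stdio.h>
#include <stdlib.h>

#define MEM_WORDS 1024

static unsigned dut_mem[MEM_WORDS];                 /* stand-in for the DUT path */
static void     dut_write(unsigned a, unsigned d) { dut_mem[a] = d; }
static unsigned dut_read(unsigned a)              { return dut_mem[a]; }

int main(void)
{
    static unsigned ref_mem[MEM_WORDS];             /* virtual reference memory */
    int errors = 0;

    srand(1);
    for (int i = 0; i < 10000; i++) {
        unsigned addr = (unsigned)rand() % MEM_WORDS;
        unsigned data = (unsigned)rand();

        if (rand() & 1) {                           /* random write */
            dut_write(addr, data);
            ref_mem[addr] = data;                   /* mirror into the reference */
        } else {                                    /* random read + scoreboard check */
            unsigned got = dut_read(addr);
            if (got != ref_mem[addr]) {
                printf("MISMATCH addr=%u exp=%u got=%u\n", addr, ref_mem[addr], got);
                errors++;
            }
        }
    }
    printf("%s: %d mismatches\n", errors ? "FAIL" : "PASS", errors);
    return errors ? 1 : 0;
}
```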

We made heavy use of assertions to make sure we covered all possible access modes on the buses and all functional capabilities of the relevant submodules (e.g., bridges). All test classes were completely self-checking and a detailed test report was provided after every simulation run.

Structure

Our testbench's top level was an SV module with different functional sections. In its first section, we defined the SV interfaces and the code necessary to instantiate and connect the DUT. A section with conditional instantiations of simulation models followed. Depending on 'define' macros inside the starting script, we could attach several different models at the DUT's boundaries (e.g., Fibre Channel transceivers, SAS and SATA models, DDR & flash memory models, and ARM boundary scan trickboxes (BSTs) & embedded trace macrocell (ETM) models).

The next section included different groups of proprietary connections and several blocks of SV 'bind' instructions. All these were separated by their functionality into 'include' files and again included, depending on the 'defines'. These blocks were used to connect empty wrapper ports inside the DUT to SV interface signals, and the 'bind' blocks brought additional assertion checker modules into the RTL design. The final section contained the definition of a program block (outside the testbench module) that oversaw the construction and control of the test environment and its instantiation.

As shown in Figure 2, the environment base class ('cl_env_base') had all the internal drivers instantiated and connected to local virtual interfaces. Only the AHB drivers were left to be inserted on demand in a derived testcase class. As most of the drivers handled accesses to memory-mapped regions, they were connected to 'Memory Slave' units that simulated memory arrays of customizable size.


FIGURE 1 Testbench structure.

Source: LSI


If we had used configurations in which real RTL was used instead of the empty wrappers, the affected drivers' interface connections would simply be left unconnected. But as all were used in most testcases, they were implemented in the base class.

As some tests involved using AHB bus functional models (BFMs) while others used ARM design simulation models (DSMs), we decided to instantiate the AHB transactors inside the testcase-specific classes. These were derived from the base class and, therefore, inherited all the base class's transactors and connections.

In each testcase class, we could define the specific AHB transactor to be used (or none where we used DSMs), as well as all the test-supporting infrastructural models (e.g., scoreboards, stimulus generators and reference models). The testcase class also contained the actual test inside the 'run()' task. The general control flow of all involved models was implemented here.

Through the SV interfaces and their connection to the environment, it was now very easy to build a largely customizable periphery around the DUT. Most test-specific transactors were defined inside the environment; only the static ones were directly instantiated at the top level. Even there we could customize the transactors using different 'define' parameters.

Given also the option to replace RTL parts of the DUT (or even whole subsystems) with empty wrappers connected by SV interfaces to dedicated transactors in the environment, we could now use one environment to test blocks or subsystems of the design as well as the whole chip.

For example, we had some tests that verified only the Ethernet interface and the attached frame buffer memory, while other tests verified the different ARM subsystems on a stand-alone basis. Of course, we finally used the complete DUT design for chip-level verification.

The AVM-based approach also allowed us to integrate large customer-designed parts that were provided late in the project schedule. We simply inserted empty wrappers, connected them to our transactors, and verified the interfaces to the customer logic. Later we replaced the wrappers with the real RTL, dynamically removed the transactors, and were able to reuse all the available tests.

FIGURE 2 Verification environment.

Source: LSI

Connectivity

In the top-level testbench we defined SV interfaces and assigned DUT ports to their signals (Figure 1). For the internal connections to the empty wrapper modules in the design, we connected the wrapper's ports to the corresponding SV interfaces. Inside the environment base class, we had a virtual interface defined for each interface used in the top level. Both interfaces and virtual interfaces were connected at simulation start-up time to provide signal access to the environment.

To make life a little easier, we defined an additional 'merger' interface that had all the other interfaces nested inside, so we only needed to route one interface through the environment hierarchy instead of a dozen.

When a wrapper was later replaced by real RTL, the 'include' file that built the connections was not included, resulting in an unconnected interface. On the other side, we would not generate the corresponding driver anymore, thus maintaining a fully working environment.

For some tests, especially DSM-related ones executed on an ARM CPU model, it is worth having a transactor connected to an internal interface even when the RTL code is used. We had some transactors that established a communication channel (through accesses to dedicated memory areas) between the C program running on an ARM DSM model and the SV testbench routine.

For this to work we had to leave the 'include' files integrated after replacing the wrappers with RTL, effectively connecting the SV interface signals to the real RTL module's ports. Another helpful technique was to add an SV interface for debugging purposes to the merger. As signals inside an SV interface can be displayed in a waveform (unlike SV dynamic variables or objects), we could assign values to such 'debug interface' signals inside an SV transactor to watch them in the waveforms. This took away a lot of pain during the SV transactor development and debugging process.

Verification components

The most difficult task was the integration of all the required transactors, especially the provision of a simple and unified access method for the test writers. To illustrate: we had to use some drivers (e.g., SAS/SATA) that were available only in Verilog; our AHB driver was a reused and quite complex module written in pure SystemVerilog; and we needed to code several new transactor classes.

We developed new drivers for several different internal bus protocols as well as a basic Ethernet driver, memory transactors, enhanced scoreboards capable of comparing out-of-order transactions, reference models and testbench-supporting transactors. These transactors enabled synchronization by event triggering and message passing between the SV environment and C routines running on the DUT's ARM subsystems.
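As an illustration of what such a channel can look like from the software side, here is a minimal C sketch of an ARM-side mailbox that writes a command word to a dedicated address and polls for the testbench's acknowledge. The addresses, command codes and hand-shake are invented for this example; the project's actual protocol is not described in the article.

```c
/* Sketch of the ARM-side half of a memory-mailbox channel to the SV testbench.
 * Addresses and command codes are illustrative placeholders only. */
#include <stdint.h>

#define MBOX_CMD   (*(volatile uint32_t *)0x40000000u)  /* hypothetical command word */
#define MBOX_ARG   (*(volatile uint32_t *)0x40000004u)  /* hypothetical argument     */
#define MBOX_ACK   (*(volatile uint32_t *)0x40000008u)  /* testbench writes ack here */

#define CMD_TRIGGER_EVENT  0x1u   /* ask the SV side to fire a named event */

static void tb_send(uint32_t cmd, uint32_t arg)
{
    MBOX_ARG = arg;
    MBOX_CMD = cmd;                 /* an SV monitor on this address sees the write */
    while (MBOX_ACK != cmd)         /* busy-wait until the testbench acknowledges   */
        ;
    MBOX_ACK = 0u;                  /* clear for the next transaction               */
}

int main(void)
{
    tb_send(CMD_TRIGGER_EVENT, 42u);  /* synchronize: C program <-> SV environment */
    return 0;
}
```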



As our goal was to take maximum advantage of the TLM features to simplify the interconnections between the transactors and unify their utilization, we put some effort into making as many components AVM-compliant as possible. This was also very important with regard to plans for our subsequent reuse strategy and later migration to the OVM library for future projects.

Using the AVM library saved resources that were no longer taken up in handling the details of managing transaction movement inside an environment. The predefined TLM structure made it possible for a single engineer to plan, build and maintain the whole environment, including most of the transactors. The rest of the team could concentrate on RTL development and test writing.

Converting the Fibre Channel and SAS/SATA Verilog transactors to SystemVerilog and AVM was not feasible within this project's schedule, but these tasks will be undertaken for our next-generation environment. Porting our already available SV AHB driver to AVM compliance required some changes in its internal structure, but was accomplished in a reasonable time. The development of all the new transactors was accomplished ahead of schedule thanks to the easy-to-use structural TLM building blocks of the AVM library.

LSI
Orleansstrasse 48
81669 Munich
Germany

T: +49 (0) 89 45833 0
W: www.lsi.com


Implementing a unified computing architecture
Kurt Parker, Netronome Systems

Kurt Parker is a field applications and product marketing engineer for Netronome Systems. He holds a Master of Engineering and an MBA from Arizona State University.

Unified computing architectures (UCAs) bring together networking, computing, storage access and virtualization in systems that aim to streamline data center resources, scale service delivery, and reduce the number of devices within a system that require setup and management. They must deliver powerful packet processing (e.g., thousands of applied processing cycles per packet, and more than 30 million packets per second); high levels of integration (e.g., I/O virtualization, security and encryption); and ease of both implementation and use.

In adopting UCAs, system architects seek to avoid costly, lengthy and risky custom ASIC developments and instead favor merchant silicon providers that offer highly programmable network processors. Such devices give them the ability to develop and deliver innovative, differentiated products while conforming to industry standards—standards that are themselves often in flux. Meanwhile, performance, power and space budgets are fueling the popularity of multithreaded, multicore architectures for communications systems.

Such a range of technologies is encapsulated within our company's Netronome NFP-3200 processor. Our experience here suggests that comprehensive suites of intuitive and familiar software applications and tools are fundamental to the success of next-generation communications processing projects.

The application development suites must be based on a familiar GUI and act as the easy-to-use gateway to a software development kit that contains the tools needed for all phases of a project (e.g., initial design, code creation, simulation, testing, debugging and optimization). A command line interface with flexible host OS requirements will speed interactive development tasks. You also need a powerful simulation environment that allows the software and hardware teams to develop the next-generation platform simultaneously and thereby take fullest advantage of the capabilities available in highly programmable UCAs. Other requirements for effective development that work within this model include:

• The ability to switch between high-level programming languages and assembly code at a very granular level. C compilers provide a familiar high-level programming language with isolation from hardware specifics for faster time-to-market and optimum code portability. Assembly code can also be used to fine-tune portions of an application to maximize performance, and should be embedded in the C code for optimal results.

• The appropriate use of legacy architectures. Architectural choices backed by technologies that can boast years of market leadership and success are inherently both safer and more stable. Such legacy technologies also provide access to and thus take advantage of available pools of talent and experience. Meanwhile, most customers will expect full backward-compatibility with existing architectures.

• A choice of development platforms. Access to multiple development platforms and an ability to debug applications on internally developed platforms will enable accurate simulations of real-world performance in hardware during the system design process.

• Access to advanced flow processing development tools. Cycle- and data-accurate architectural simulations are vital to the rapid prototyping and optimization of applications and parallel hardware/software development. Flexible traffic simulation and packet generation tools reduce testing and debugging time.

Applications enabled by unified computing architectures

Enterprises and service providers alike are using various network-based appliances and probes across a widening range of important activities. These include the test and measurement of applications and services, and deep packet inspection to provide billing, accounting and the enforcement of acceptable-use policies.

These appliances must therefore offer high performance. They must be sufficiently programmable that they can adapt to the evolving networking landscape. And they must have extremely low latency to avoid inserting any delay into applications and services that measure or protect system activity. Simple configurations will not suffice. Evolving standards, government oversight and regulation, and technological innovation require not only UCAs but also powerful and flexible tools that give fast, easy access to those architectures.



Netronome offers a range of programmable Network Flow Processors, which deliver high-performance packet processing and are aimed at designers of communications equipment whose requirements extend beyond simple forwarding.

Many network processors and multicore CPUs lack L4-L7 programmability or cannot scale to 10Gbit/s and beyond. Netronome's flow processors are powered by 40 programmable networking cores that deliver 2,000 instructions and 50 flow-operations-per-packet at 30 million packets-per-second, enabling 20Gbit/s of L2-L7 processing with line-rate security and I/O virtualization. This article describes the tool flow for the development of a high-end application using the processor.

Network-based threats, such as spam, spyware and viruses, identity theft, data theft and other forms of cyber crime, have become commonplace. To combat these threats, a multi-billion-dollar industry of independent software vendors (ISVs) has emerged. These ISVs provide numerous categories of network and content security appliances such as firewalls, intrusion detection systems, intrusion prevention systems, anti-virus scanners, unified threat management systems, network behavior analysis, network monitoring, network forensics, network analysis, network access control, spam/spyware, web filters, protocol acceleration, load balancing, compression and more.

The ISVs desire tools and software libraries that deliver quick, easy access to the powerful UCAs built for deep packet inspection and the deployment of security applications in increasingly virtualized environments.

FIGURE 1 Comprehensive tools for all design phases: the Netronome Programmer Studio (centralized control of compiler, linker, assembler, debugger, simulation and testing) spans initial design, code creation (FlowC compiler and optional Network Flow Assembler), software simulation (Precision Flow Modeler for data- and cycle-accurate simulation), development testing (packet generation and traffic simulation), debugging (local simulation with local, remote or no foreign model, and hardware), performance optimization and integrated project management (project workspace with libraries and documentation)

Source: Netronome


Another area where communications equipment manufacturers will see an impact from UCAs is in intelligent network interface cards for virtual machine environments within multicore Intel Architecture system designs by way of virtualized on-chip networks. Today, in single-core systems, an external network provides functions (e.g., VLAN switching, packet classification and load balancing) to direct traffic to one or more systems. As these systems are now combined within a multicore virtualized system, the server's network I/O facility must provide the same functionality that would previously have been provided externally.

The Netronome Network Flow Processing software development kit (SDK) and related application code enables system designers to take such an integrated approach by employing high-speed network flow processing to intelligently classify millions of simultaneous flows and direct traffic to the appropriate core and/or virtual machine.

While unified computing systems (UCS) are extending to and through 10Gbit/s data rates, their very existence obviates merely configurable architectures, which offer little or no ability to differentiate in services or performance. Purpose-built processors designed to handle the growing and changing needs of UCSs through their programming flexibility and high levels of integration are the only way to achieve maximum performance efficiency. The NFP-3200 has 40 multithreaded packet processing microengines running at 1.4GHz, and in the next section we will use it as an example of how such high performance can be exploited by an appropriate toolset to develop a UCS.

Implementation flow

The NFP SDK provides the tools needed to implement next-generation designs. These are the main steps you would take to develop a UCS.

Configuration

The Netronome Programmer Studio is a fully integrated development environment (IDE) that allows for the building and debugging of networking applications on a unified GUI. Its graphical development environment conforms to the standard look and feel of Microsoft Windows, allowing developers to customize the workspace to fit their personal flows and comfort.

To enhance organization and multi-party access, ongoing development settings and files are managed through projects. Most projects are set up in a standard fashion that allows full assemble, compile and build control. There is also the ability to create a 'debug-only' project. This allows fast-track enablement to debug functionality on externally controlled projects. The Project Workspace, a window within the Programmer Studio, provides tabs with important project and development related information, including a tree listing of all project files; a listing of all the Microengine cores in the NFP-3200 that are loaded with microcode when debugging; a listing of documents included in the SDK; and a tree listing of all microcode blocks that are found when opening a project.

Development

Application development can be broken into six phases: Initial Design, Code Creation, Software Simulation, Development Testing, Debugging and Performance Optimization (Figure 1). The process of architecting a system often entails an iterative series of requirements analysis, estimation (e.g., size, speed, resource cost), proof-of-concept implementation and test. Many blocks of code already exist for standard functions (e.g., packet classification, forwarding and traffic management) and can help during architectural development. In particular, they give an indication of the code and resource footprint for typical functions. This allows developers to focus on innovations that drive value to their users and differentiate the end system. Proof-of-concept and test are mini-develop/debug phases that can be accelerated by using the SDK's code-building and simulation tools.

Powerful high-level language tools drive rapid code development. The Netronome Flow C Compiler (NFCC) provides the programming abstraction through the C language. It focuses on one given microengine within the NFP-3200, with threading and synchronization exposed at the language level. When a program is executed on a microengine, all its threads execute the same program. Therefore, each thread has a private copy of all the variables and data structures in memory.

The compiler supports a combination of standard C, language extensions and intrinsic functions. The intrinsic functions provide for access to such NFP features as hash, content addressable memory (CAM) and cyclic redundancy check (CRC) capabilities. Developers can configure the NFCC through the GUI or through a command line to optimize their code for size, speed, or debugging at the function level or on a whole-program basis. The compiler also supports inline assembly language, both in blocks and individual lines.
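To give a flavor of how C, intrinsics and inline assembly can be mixed at this level, the sketch below shows a per-thread packet-classification loop in portable C. The helper crc32_step() stands in for a hardware-assisted intrinsic, and the loop body marks where inline assembly might be dropped in; none of the names here are the actual NFCC intrinsic API, which the article does not list.

```c
/* Structure sketch of a per-microengine packet loop. On the NFP, the helper
 * below would be replaced by an NFCC intrinsic (e.g., a hardware CRC or hash
 * operation) and the hottest statements could be rewritten as inline assembly;
 * the names and loop here are illustrative only, not the NFCC API. */
#include <stdint.h>
#include <stddef.h>

/* plain-C stand-in for a hardware-assisted CRC intrinsic */
static uint32_t crc32_step(uint32_t crc, uint8_t byte)
{
    crc ^= byte;
    for (int i = 0; i < 8; i++)
        crc = (crc >> 1) ^ (0xEDB88320u & (uint32_t)-(int32_t)(crc & 1u));
    return crc;
}

/* each thread on a microengine runs the same program over its own packets */
uint32_t classify_packet(const uint8_t *pkt, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++)
        crc = crc32_step(crc, pkt[i]);   /* candidate spot for an intrinsic or asm */
    return crc ^ 0xFFFFFFFFu;            /* e.g., used as a flow-hash/classifier key */
}
```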

FIGURE 2 The Network Flow Linker interface

Source: Netronome


The Netronome Flow Assembler (NFAS) will assemble microengine code developed for the IXP2800 legacy mode or for the NFP-3200's extended addressing and functionality (a.k.a. extended mode). Like the NFCC, it assembles on a per-microengine basis. Invoking the assembler results in a two-step process: preprocessing and assembly.

The preprocessor is invoked automatically by the assembler to transform a program before it reaches the assembly process, including the processing of files and replacement of certain literals. At this stage, developers can also invoke any or all of the following facilities: declaration file inclusion, macro expansion, conditional compilation, line control, structured assembly and token replacement.

The assembly process includes code conversion, optimization and register/memory allocation.

For single functions or entire applications, ready-to-build code can be partitioned and imaged using Netronome's Network Flow Linker (NFLD). The NFLD interface allows users to manage the complexity of a multi-threaded, multi-processor architecture within an easy-to-use GUI (Figure 2). The user assigns list files output by the assembler and compiler into each of the microengines within the chip. Various memory reservation, fill options and build options are presented as well.

Debug and optimization

The Programmer Studio is backed by the Netronome Precision Flow Modeler (PFM), a cycle- and data-accurate simulation model of the entire data-plane portion of the chip and its interfaces. Figure 3 shows the PFM in action. In the debug phase of the design, a customer can select and view the code and program counter position for any thread with code loaded in the build system. Using breakpoints is a standard tool for checking for code correctness, and the PFM allows them to be set not only on points in the code being run, but also on changes in internal registers and external memory locations.

In many communication applications, performance efficiency as it relates to power, cost and size is as important as performance. Competitive differentiation is often gained through the ability to apply increasing amounts of functionality to every packet in a high-speed data flow. In these cases, it is desirable to tune application code to maximize performance and functionality in the communications system. Because the PFM is a cycle-accurate simulation, developers can use it to see exactly how well their code is running as written for the NFP without actually loading it on the chip. In addition, Programmer Studio captures code coverage so the user can identify dead and highly executed code in an application. This allows performance improvements and iterative code development in parallel with hardware design and integration. Of specific use to developers in the optimization phase is the Thread History Window (seen at the foot of Figure 3). Color coding of cycle-by-cycle activity on each microengine thread gives a quick visualization of when the microengine is executing code or might be stalled and in need of a software switch to the next context in line. Performance statistics, execution coverage, and an ability to craft simulated customized traffic patterns into the NFP-3200 help developers see hot spots in their code where additional focus would bring performance gains in the application.

FIGURE 3 The Precision Flow Monitor simulation system

Source: Netronome

Netronome
144 Emeryville Drive
Suite 230
Cranberry Twp
PA 16066
USA

T: 1 724 778 3290
W: www.netronome.com


System level DFM at 22nm
Special Digest, EDA Tech Forum

A recent session at the Design Automation Conference in San Francisco considered how to make the 22nm process node a reality despite an increasing number of obstacles. All the speakers were unanimous that part of the answer will come from using system-level design strategies to address manufacturability.

Much has already been said and written about the need to bring design-for-manufacture (DFM) further up the design flow, although it would appear that necessity will prove as much the mother of abstraction as invention in this case, with 22nm creating a series of challenges that make the shift necessary.

According to various data, 22nm manufacturing is expected in 2012 and leading manufacturers are already installing or preparing to install capacity (Figure 1).

The challenges of 22nm

Intel

Intel has always been at the forefront of the intersection between design and manufacturing, and remains one of the few semiconductor companies fully active in (and committed to) both areas.

Shekhar Borkar, a company fellow and director of its Microprocessor Technology Labs, divided the challenges presented by the 22nm node into the technological and the economic, and also made some observations on the future of custom design. His overarching theme was that 'business as usual' simply is not an option.

From the technological point of view he cited a number of relatively familiar but now increasingly large problems. The main ones were:

• slowdowns in gate delay scaling;

• slowdowns in both supply and threshold voltages as subthreshold leakage becomes excessive;

• increased static variations due to random dopant fluctuations (RDFs) and the limitations of sub-wavelength lithography (stuck at 193nm, with 'next-generation' extreme ultra-violet (EUV) lithography still to arrive in commercial form);

• increased design rule complexity and more restrictions due, again, largely to sub-wavelength lithography; and

• greater degradation and less reliability due to the high electric fields.

From the economic point of view, Borkar noted that:

• Intel expects a 22nm mask set to cost more than $1M, indicating that the manufacturing cost ramp is hardly slowing down (separately, the Globalfoundries fab being prepped for 22nm has been 'conservatively' estimated at a cost of $4.5B); and

• there is a growing 'tools gap' between increases in efficiency that are being delivered and the ability of the EDA software to deal with the increases in complexity presented by a node that is likely to offer one billion transistors on a single piece of silicon.

Then, Borkar described how the traditional advantages of custom design would be reduced or obviated completely:

• The best operational frequency is no longer achieved largely by optimizing the resistance and capacitance metrics for interconnect, since transistor scaling, variability and power consumption have increased their influence so greatly.

• The restricted design rules (RDRs) imposed by manufacturing variability virtually eliminate the chances of going in to manually enhance how closely transistors and interconnects are packed together on a chip.

• There have been cases where attempts to take a local rather than a global view of optimization, in the hope that there may be some islands available, have actually worsened rather than improved a design, or at least added to the time-to-market to little effect.

Carnegie Mellon/PDF Solutions

In his presentation, Andrzej Strojwas, Keithley professor of Electrical and Computer Engineering at Carnegie Mellon University and chief technologist of PDF Solutions, echoed many of Borkar's points and added some observations of his own, more directly related to manufacturing metrics.

The 2009 Design Automation Conference was held in San Francisco, July 26-31, 2009. Further information on accessing archived proceedings and papers presented at the event is available at www.dac.com.



The article provides an overview of one common theme in the papers presented at a special session of the 2009 Design Automation Conference, Dawn of the 22nm Design Era. As such, we would recommend that readers wishing to access still more detail on this topic (in particular, on device structures for 22nm and project management requirements) read the original contributions in full.

The papers are:

8.1 “Design Perspectives on 22nm CMOS and Beyond”, Shekhar Borkar, Intel.

8.2 "Creating an Affordable 22nm Node Using Design-Lithography Co-Optimization", Andrzej Strojwas, Tejas Jhaveri, Vyacheslav Rovner & Lawrence Pileggi, Carnegie Mellon University/PDF Solutions.

8.3 “Device/Circuit Interactions at 22nm Technology Node”, Kaushik Roy, Jaydeep P. Kulkarni, Sumeet Kumar Gupta, Purdue University.

8.4 “Beyond Innovation: Dealing with the Risks and Com-plexity of Processor Design in 22nm”, Carl Anderson, IBM.

It is his belief—and that of his CM/PDF co-authors—that the recent, highly lauded innovations in high-k metal-gate stacks will reduce RDF-based variation only for the 32/28nm process generation. For 22/20nm, the industry will need to move to long-touted device architectures such as FinFETs and ultra-thin-body or fully depleted silicon on insulator (SOI), technologies that mitigate RDFs by reducing the dopant concentration in the channel.

Furthermore, CM/PDF research suggests that systemic variations will reach prohibitive levels at 22nm if issues surrounding limitations in lithography resolution and the design enhancements offered through stress technologies are not addressed. In particular, the ongoing lack of EUV lithography is forcing the introduction of double patterning techniques (DPTs). In the context of the modeling, characterization and printability challenges such multi-exposure DPTs suggest, the technique will be extremely expensive.

Purdue University

Kaushik Roy, professor in the Nanoelectronics Research Laboratory at Purdue University, joined Strojwas in putting forward a structural answer to the challenges presented by 22nm.

In Purdue’s case, one proposal was for multi-gate FETs (MUGFETS) to address the increase in short channel effects (SCEs) as shrinking transistor sizes bring the source close to the drain.

MUGFETS will not be immune to SCEs—nor to threats from body thickness—but they do offer a broader range of design elements that can be tuned for the node. These include multi-fin and width quantization, use of the back-gate as an independent gate, gate-underlapping and fin orientation (Figure 2, p.40). An important question, though, is where in the flow such tuning should take place.

IBM

Carl Anderson, an IBM Fellow who manages physical design and tools for the company's server division, addressed the challenges inherent in 22nm from a project management perspective.

As complexity increases, he argued, so does the emphasis that needs to be placed on the culture and discipline through which companies manage risks and resources.

Even today, respins and major delays could often be attributed to changes that were sought relatively late in the design cycle, he said.


High Volume Manufacturing 2006 2008 2010 2012 2014 2016 2018

Technology Node (nm) 65 45 32 22 16 11 8

Integration Capacity (BT) 4 8 16 32 64 128 256

Delay = CV/I scaling ~0.7 >0.7 Delay scaling will slow down

Energy/Logic Op scaling >0.5 >0.5 Energy scaling will slow down

Variability Medium High Very High

FIGURE 1 Technology outlook

Source: Intel


the designer (e.g., clocking).There would be still more integration required to meet

the needs of the test stage, Borkar said, with each func-tional block requiring either built-in self-test capability or a straightforward interface to an external tester.

Strojwas defined goals that addressed a traditional set of DFM objectives more specifically, but also placed great em-phasis on pre-characterized circuit elements and templates. This may suggest a slightly greater degree of granularity than Borkar’s vision, although to say as much to any major degree would be splitting hairs.

Strategically, Strojwas says that DFM must become proac-

design cycle, he said.He also warned that “It will be very easy and tempting to

propose chip designs and goals for 22nm that are unachiev-able within finite schedules and resources.”

The system-level solutionGiven the variation in topics across the four presentations, there was nevertheless broad agreement that some kind of high-level design flow strategy needs to be adopted to take full account of 22nm’s sheer range.

Correct-by-constructionThe notion of abstracting to the system level to achieve correct-by-construction design was cited explicitly by both Borkar and Strojwas. Borkar said the main objective had to be designs that are fully functional, on specification and manufacturable after the first past. To do this, he said that the industry needs to shift to a system design methodology that is “SoC-like” [system-on-chip].

The main difference will lie in the fact that today such SoC strategies are based around application-specific blocks, whereas the approach required at 22nm will be more concerned with soft blocks (or macros) that represent such components of a design as cores, on-die networks and some 'special function' hardware (Figure 3). At the same time, Borkar said that the shift to place more of the overall system onus on software will continue.

These new blocks will be predesigned and well characterized, and as a result, the emphasis in differentiating your product and enhancing its performance will move to system-level optimization as opposed to designing logic blocks. Physical design will also be predominantly automated, and numerous aspects of a design might now be 'hidden' from the designer (e.g., clocking).

FIGURE 2 Technology-device-circuit-system codesign options for double-gate MOSFETs from 22nm. The options shown span process (single or double gate; fin height, fin thickness, gate workfunction, channel doping, fin orientation, gate underlap), circuit (standard CMOS, SRAM and logic; special circuit styles such as Schmitt trigger, dynamic logic, SRAM and skewed logic; asymmetric gates), technology-design options (Vdd/high-Vt choices, symmetric/asymmetric gate underlap, fin orientation) and system requirements (lower power, robustness, high performance, area efficiency). Source: Purdue University

FIGURE 3 SoC-like design methodology with 'soft' macros: small cores, a large core, special-function hardware and memory connected by a network-on-chip. Source: Intel



With his focus on integrating tools and design methodologies into project management—arguably one of the more neglected areas in terms of EDA implementations and executions—Anderson notes that design teams will need to make still more numerous and complex trade-offs between schedule, function, performance, cost and resources for 22nm. He continues:

Trade-offs between custom vs. synthesized macros, reused vs. new custom [intellectual property], more features vs. power, etc. will have to be made early in the design cycle. The challenge will be to optimize the entire chip and not just a single component.

However, given the threats posed by late-stage changes or “problems that are hidden or that only surface near the end of the design cycle,” and the fact that the ability to address them may be constrained, he also notes that a good engineer's ability will continue to be defined by whether or not he can innovate “inside the box.”

Cultural change
Until very recently, DFM and ESL have been seen as two very different areas of endeavor within EDA and chip design more generally. There has also been a perceived geographic split, with North America being considered stronger on manufacturability issues while Asia and Europe were considered more advanced in terms of abstracted design flows.

The message from manufacturing specialists at 22nm, however, is that this distinction is becoming increasingly untenable. The minutiae of a circuit or a transistor still matter—indeed, the structure of circuits would seem fundamental to any progress along the process node path—but systems that are defined with an awareness of fabrication challenges are vital to future progress.

Such a proactive DFM strategy is required notwithstanding its inherent need to operate on a considerable volume of already generated and deeply researched manufacturing data (indeed, exactly the kind of analysis in which PDF Solutions specializes). This alone, Strojwas said, will effectively counter systemic variation.

In the paper that accompanies the DAC presentation, Strojwas and his co-authors write, “We propose a novel design methodology (Figure 4)…that will ensure a correct-by-construction integrated circuit design.” They continue:

The key enabler of the methodology is the creation of a regular design fabric onto which one can efficiently map the selected logic templates using a limited number of printability-friendly layout patterns. The co-optimization of circuit layout and process is achieved by selection of logic functions, circuit styles, layout patterns and lithography solutions that jointly result in the lowest cost per good die. The resulting set of layout patterns and physical templates are then fully characterized on silicon through the use of specially designed test structures.

Roy and Anderson are looking at this from less specifically methodological aspects, but where their papers intersect, they reach broadly similar conclusions.

Roy and his co-authors conclude their review of various device structures by noting that system-level strategies are already developing that are independent of the underlying technology, and continue by stating:

Technology scaling in 22nm [will] require closer interaction between process, device, circuit and architecture. Co-optimization methodologies across different levels [will] help to mitigate the challenges arising due to changes in transistor topology and increased process variations. Novel device/circuit/system solutions integrated seamlessly with the EDA tools [will] help meet the desired yield and [will] help the semiconductor industry to reap the benefits of scaling economics in sub-22nm nodes.

FIGURE 4 Extremely regular layout methodology: product design objectives feed circuit design-style templates and layout/lithography choices (design style; DPT, MEBM, IL), and the resulting patterns are characterized on silicon to deliver first-pass silicon success. Source: Carnegie Mellon University/PDF Solutions



Matt Gordon is technical marketing engineer at Micrium and has several years of experience as an embedded software engineer. He has a bachelor’s degree in computer engineering from the Georgia Institute of Technology.

Jacko Wilbrink is product marketing director at Atmel. He has more than 20 years of experience in the semiconductor industry and fostered the development of the industry’s first ARM-based microcontroller, the SAM7. He has a degree in electronics engineering from the University of Twente, the Netherlands.

The Universal Serial Bus (USB) revolutionized the PC world and is rapidly gaining ground in the embedded controls market. The basis of its success is simplicity, reliability and ease-of-use.

In the PC world, USB has completely replaced the UART, PS2 and IEEE-1284 parallel ports with a single interface type, greatly simplifying software drivers and reducing the real estate that must be dedicated to bulky connectors. The rapid drop in solid state memory prices, combined with the increased speed of the USB 2.0 specification (480Mbps), created the opportunity to quickly store several gigabytes of data on USB memory sticks. As a result, memory sticks have replaced floppy disks and CD drives as the primary vehicle for data storage and transfer.

One further key to USB’s success is interoperability, based on the well-defined standard (Figure 1) and guaranteed by the USB Consortium. Any USB-certified device from any vendor will work in a plug-and-play fashion with any other USB-certified device from any other vendor. Multiple devices can operate on the same bus without affecting each other. The end-user no longer needs to specify IRQs for every peripheral in the system. The USB standard does all the housekeeping.

Another major advantage of USB is that it relieves system designers of the burden of implementing one-off interfaces and drivers that are incompatible and often unreliable. For users of embedded controls systems in particular, USB obviates the need to maintain an inventory of different, bulky cables as well as any concerns over their long-term availability because of the drop-in replacement nature of USB peripherals.

All these advantages have fostered the adoption of USB in the embedded space. It has become so popular that virtually every vendor of 32-bit flash MCUs offers several derivatives with USB full-speed device or On-The-Go (OTG) capabilities. Embedded microprocessors frequently include both high-speed device and host ports. Some even have an integrated USB hub that supports the connection of multiple USB devices, going some way beyond the initial line-up of keyboards, mice and storage card readers.

The simplicity and ease-of-use of USB software and its high sustainable data rates are driving many designers to migrate designs to USB-enabled 32-bit MCUs, which are now price-competitive with 8- and 16-bit devices and offer higher internal bandwidth to handle and process hundreds of thousands of bits for each attached high-speed peripheral.

USB also offers the opportunity to replace wires between PCBs within a system (e.g., a host processor platform's connection to a user interface panel). In most cases, the technology brings together different PCBs that do not sit close together. The USB cable is a robust, low-cost and EMI-tolerant alternative to parallel wires.

As USB has found its way into an increasing number of embedded devices, software developers have become wary of the additional complexity that this protocol can bring to an application. The developers of USB-enabled products must shoulder a hefty burden in order to grant end-users the convenience that has made this means of serial communication so popular. Whereas software drivers for SPI, RS-232 and other simple serial protocols typically involve little more than read and write routines, USB software drivers can span thousands of lines, incorporating routines that are difficult to develop and to debug. The software that sits on top of the drivers can be similarly complex, in part because this code must manage enumeration, the byzantine process by which USB hosts identify devices.

In order to avoid concerning themselves with enumeration and other confusing aspects of USB, many engineers turn to commercial off-the-shelf (COTS) USB software. For a developer using a reliable, well-tested software module, USB communication simply becomes a series of calls to easily understandable API functions. Thus, programmers who rely on such modules can afford their end-users the convenience of USB without significantly lengthening product development times. Using COTS USB software also offers the best guarantee that devices can interoperate, interconnect and/or communicate with one another as specified by the USB standard.

Pushing USB 2.0 to the limit
Jacko Wilbrink, Atmel & Matt Gordon, Micrium


USB offers many advantages for use on embedded systems, although software developers remain concerned about the additional complexity it can bring to an application. For example, software drivers for SPI, RS-232 and other traditional serial protocols typically involve little more than read and write routines, while USB software drivers can span thousands of lines, incorporating routines that are difficult to develop and to debug. The software that sits on top of the drivers can be equally complex.

To avoid being forced to address enumeration and other confusing aspects of USB, many engineers turn to commercial off-the-shelf (COTS) USB software. For a developer using a reliable, well-tested software module, USB communication simply becomes a series of calls to easily understandable API functions. Thus, programmers who rely on such modules can afford their end-users the convenience of USB without significantly lengthening product development times. Using COTS USB software also offers the best guarantee that devices can interoperate, interconnect and/or communicate with one another as specified by the USB standard.


Software solutions for USB implementations
For the sake of simplicity, ease-of-use and software portability, three hardware/software interface standards have been defined for the register-level interface and memory data structures of the host controller hardware implementation: the Universal Host Controller Interface (UHCI) and Open Host Controller Interface (OHCI) for low- and full-speed controllers, and the Enhanced Host Controller Interface (EHCI), defined by Intel, for high-speed USB 2.0 host controllers.

The USB driver abstracts the details of the particular host controller for a given operating system. On top of that driver, multiple client drivers support specific classes of devices. Examples of device classes are the Human Interface Device (HID), Communication Device Class (CDC) and Mass Storage Class.
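
To make this layering concrete, the sketch below shows one way a host-side stack of this kind could be organized in C. All type, field and function names here are illustrative assumptions for the example, not the interface of any particular commercial stack.

    /* Illustrative layering of a USB host stack; all names are hypothetical. */
    #include <stdint.h>
    #include <stddef.h>

    /* A host controller driver hides the UHCI/OHCI/EHCI register interface
     * behind a small set of operations used by the common USB driver. */
    typedef struct {
        const char *name;                              /* e.g., "EHCI" */
        int (*submit_transfer)(uint8_t dev_addr, uint8_t endpoint,
                               void *buf, size_t len);
    } hcd_ops_t;

    /* Class drivers (HID, CDC, mass storage, ...) sit on top of the common
     * USB driver and never touch controller registers directly. */
    typedef struct {
        uint8_t class_code;                            /* 0x03 HID, 0x02 CDC, 0x08 MSC */
        int (*on_device_attached)(uint8_t dev_addr);   /* called after enumeration */
    } usb_class_driver_t;

After enumeration, the common USB driver would simply walk a table of such class-driver entries and hand the new device to the first entry whose class code matches.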

Developers whose products function as USB hosts are not the only engineers who can benefit from a quality USB software module; implementers of USB OTG and devices also have much to gain. Although the makers of USB devices are somewhat insulated from the aforementioned host controller differences, these developers still must ensure that high-speed hosts can actually recognize their devices. A home-grown USB device implementation capable of full-speed communication must normally be overhauled to support high-speed communication. Even if the underlying USB device controller is capable of high-speed communication, the upper-layer software might not support the additional enumeration steps that high-speed communication involves. The upper layers of a solid COTS implementation, however, are intended to be used with any type of host, full- or high-speed.
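
From the application's point of view, using a COTS device stack typically reduces to a handful of initialization calls. The sketch below assumes hypothetical function names (usb_dev_init(), usb_dev_add_cdc(), usb_dev_start(), usb_dev_cdc_write()); they stand in for whatever API a given stack actually provides and are not Micriµm's API.

    /* Hypothetical use of a COTS USB device stack; the prototypes below are
     * placeholders that a real stack would supply. */
    #include <stdint.h>

    int usb_dev_init(void);                   /* device controller and stack state */
    int usb_dev_add_cdc(void);                /* add a CDC-ACM (virtual COM) class  */
    int usb_dev_start(void);                  /* attach; enumeration handled inside */
    int usb_dev_cdc_write(const uint8_t *buf, uint32_t len);

    int start_usb_application(void)
    {
        static const uint8_t msg[] = "hello over USB\r\n";

        if (usb_dev_init() != 0)    { return -1; }   /* hardware and data structures */
        if (usb_dev_add_cdc() != 0) { return -1; }   /* descriptors and endpoints    */
        if (usb_dev_start() != 0)   { return -1; }   /* full- or high-speed host     */

        /* After enumeration, USB behaves like any other I/O channel. */
        return usb_dev_cdc_write(msg, (uint32_t)(sizeof(msg) - 1u));
    }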

Because hardware-related issues for both hosts and devices are minimized by USB software modules, overhead can be a major concern for developers considering these modules. Although most embedded microcontrollers cannot maintain high-speed USB's 480Mbps maximum data rate, a low-overhead software module can ensure that rates well over the full-speed maximum of 12Mbps are viable. Because these modules rely heavily on DMA for transferring packets to and from memory, applications that incorporate them are not forced to waste CPU cycles copying data.

FIGURE 1 USB 2.0 host controller architecture: client driver software and the Universal Bus Driver (UBD) sit above the Enhanced Host Controller Driver (EHCD) and the companion (UHCI or OHCI) host controller drivers, which in turn drive the Enhanced Host Controller (EHCI) and companion host controller hardware connected to the USB device. Source: USB Consortium

TABLE 1 Effective data rates for USB HS operating modes

Mode          Max bandwidth
Bulk          53.24MB/s
Interrupt     49.15MB/s
Isochronous   49.15MB/s
Control       15.87MB/s

Source: Atmel/Micrium


These chips are likely to speed up the adoption of USB for the majority of interconnects between PCBs as well as between the embedded system and its peripherals.

It is a relatively straightforward task to sustain a 480Mbps data rate in a PC or a 400MHz ARM9-based product running a Microsoft or Linux OS with a single memory space connected to a single high-speed bus. Achieving this on an ARM Cortex M3 flash MCU with a clock frequency of 96MHz is another story. To run at that speed, store the data in external or internal memory, process it and then resend it either over the USB link or another equivalent-speed interface (e.g., an SDCard/SDIO or MMC) needs a highly parallel architecture where DMAs stream data without CPU intervention between memories and peripherals, and where the CPU has parallel access to its memory space to process the data.

Atmel solved this problem on the SAM3U Cortex M3 flash microcontroller with a high-speed USB interface by adapting the multi-layer bus architecture of its ARM9 MPUs to the Cortex M3 and dividing the memory into multiple blocks distributed across the architecture.

Three types of DMA are connected to minimize the loading that any data transfer places on the bus and memories, and to free the processor for data processing and system control tasks.

Ideally, the central DMA features a built-in FIFO for increased tolerance to bus latency, programmable-length burst transfers that optimize the average number of clock cycles per transfer, and scatter, gather and linked-list operations. It can be programmed for memory-to-memory transfers or for memory-to-peripheral transfers to interfaces such as a high-speed SPI or the SDIO/SD/MMC Media Card Interface (MCI). The high-speed DMA used for the USB High-Speed Device (HSD) interface has a dedicated layer in the bus matrix, maximizing parallel data transfers. The risk of waiting for bus availability has been removed, and the only critical access the programmer needs to manage is the access to the multiple memory blocks. Simultaneous accesses need to be avoided, otherwise a FIFO overrun or underrun can occur and data will be lost or the transfer will be stalled.
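
Scatter-gather, linked-list DMA of this kind is normally programmed through a chain of descriptors in memory. The fragment below is a generic illustration of the idea; the descriptor layout, field names and buffer sizes are assumptions for the example and do not reflect the SAM3U's actual register map.

    /* Generic linked-list DMA descriptor chain; layout is illustrative only. */
    #include <stdint.h>
    #include <stddef.h>

    typedef struct dma_desc {
        uint32_t         src;    /* source address (memory or peripheral FIFO) */
        uint32_t         dst;    /* destination address                        */
        uint32_t         len;    /* bytes to move for this descriptor          */
        struct dma_desc *next;   /* next descriptor in the chain, NULL to stop */
    } dma_desc_t;

    /* Two SRAM buffers so the CPU can process one block while the DMA fills
     * the other, matching the multi-block memory layout described in the text. */
    static uint8_t     block_a[2048];
    static uint8_t     block_b[2048];
    static dma_desc_t  chain[2];

    void dma_build_chain(uint32_t fifo_addr)
    {
        chain[0].src  = fifo_addr;
        chain[0].dst  = (uint32_t)(uintptr_t)block_a;
        chain[0].len  = sizeof(block_a);
        chain[0].next = &chain[1];

        chain[1].src  = fifo_addr;
        chain[1].dst  = (uint32_t)(uintptr_t)block_b;
        chain[1].len  = sizeof(block_b);
        chain[1].next = NULL;

        /* A real driver would now write &chain[0] to the controller's
         * descriptor register and program the burst length and FIFO options. */
    }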

The peripheral DMA should be tightly integrated into the peripheral programmer's interface, which will simplify peripheral driver development.

Of course, a low-overhead software module should use both memory and CPU clock cycles efficiently. The best commercial off-the-shelf (COTS) USB solutions are devoid of redundant code and superfluous data structures that would otherwise bring about bloated memory footprints. Given the magnitude of the USB protocol, the compact size of these modules is particularly impressive. For example, the code size of a normal configuration of Micriµm's µC/USB-Host is just 35 kilobytes, while µC/USB-Device, which is Micriµm's USB device stack, has a code size of only 15 kilobytes. These modules' memory needs, as well as those of Micriµm's OTG module, µC/USB-OTG, are summarized in the graph in Figure 2.

The benefits that an expertly crafted USB module offers easily outweigh the small sacrifices necessary to accommodate such a module. Although developing a high-speed USB host or device without one of these modules is not impossible, it is hardly advisable. With a capable COTS solution on their side, astute engineers can accelerate the transition from full speed to high speed and can quickly move their USB-enabled products to market.

Hardware implications in sustaining high-speed USB bandwidth
Most USB-enabled MCUs are limited to 12Mbps full-speed USB 2.0. The problem here is that the amount of data being collected, stored and ultimately offloaded to a storage device for remote processing in today's embedded controls applications has increased exponentially. Full-speed USB does not compete effectively with 20Mbps SPI or a 100Mbps-plus parallel bus. Fortunately, flash MCUs and embedded MPUs are coming to market with on-chip 480Mbps high-speed USB 2.0.

FIGURE 2 Memory footprint (code and data, in Kbytes) of the µC/USB-Device, µC/USB-Host and µC/USB-OTG modules. Source: Micrium



More likely, they will run at tens or hundreds of Mbps, to escape the limitations of full-speed USB (12Mbps) or SPI (tens of Mbps). However, over time, data requirements will continue to grow and thereby push demands on any system.

Running at the maximum 480Mbps data rate on a Cortex M3-class flash MCU is feasible through careful design of the internal bus, memory and DMA architecture. Using COTS software takes the burden and risk away from the software developer, providing the best guarantee of USB compliance and interoperability in the minimum amount of time. The use of market-standard implementations of the standard USB host controller interfaces increases the choice of OSs and RTOSs.

It should also have a reduced gate count, so that it can be implemented widely without adding significant cost while still reducing processor overhead in data transfers. Gate count reduction can be achieved by removing local storage capabilities and reducing linked-list support to two memory areas.

Multiple data memory blocks should be distributed in the microcontroller. For example, two central SRAM blocks can allow the CPU to run from one with the DMAs loading and storing in parallel from the other. There should be several FIFOs built into the high-speed peripherals and DMA controller, including a 4KB DPRAM in the USB HSD interface. These memories reduce the impact of bus or memory latency on the high-speed transfer. The programmer can allocate the 4KB DPRAM in the USB HSD to the different endpoints, except for the control endpoint since its data rate is low. Up to three buffers can be allocated to a single endpoint to support micro-chain messages.
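
As a rough illustration of the budgeting this implies, the sketch below divides a 4KB DPRAM across endpoints, giving the control endpoint a single small buffer and the high-speed bulk endpoints multiple buffers. The endpoint numbers and sizes are assumptions for the example, not a SAM3U register-level configuration.

    /* Illustrative 4KB DPRAM budget across endpoints; values are assumptions. */
    #include <stdint.h>

    typedef struct {
        uint8_t  endpoint;    /* endpoint number                       */
        uint16_t buf_size;    /* bytes per buffer                      */
        uint8_t  num_bufs;    /* >1 enables ping-pong/triple buffering */
    } ep_dpram_alloc_t;

    static const ep_dpram_alloc_t dpram_plan[] = {
        { 0u,  64u, 1u },     /* control endpoint: low data rate       */
        { 1u, 512u, 3u },     /* high-speed bulk IN: three buffers     */
        { 2u, 512u, 3u },     /* high-speed bulk OUT: three buffers    */
        { 3u, 512u, 1u },     /* interrupt endpoint: single buffer     */
    };
    /* Total: 64 + 3*512 + 3*512 + 512 = 3,648 bytes of the 4,096-byte DPRAM. */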

Table 1 provides benchmark data on the effective data rates for the different operating modes of the USB HS interface in Atmel's Cortex M3-based SAM3U. The data is streamed in parallel with the processor doing the data packing or unpacking. The delta between the effective data rates and the maximum of 480Mbps (60MB/s) in the Bulk, Interrupt and Isochronous modes is due to protocol overhead and not to any architectural limits.
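
The bulk figure in Table 1 can be cross-checked against the protocol itself: a high-speed microframe lasts 125µs and can carry at most 13 bulk data packets of 512 bytes each, while the raw 480Mbps line rate corresponds to 60MB/s:

    \frac{13 \times 512\ \text{bytes}}{125\ \mu\text{s}} = \frac{6656\ \text{bytes}}{125\ \mu\text{s}} \approx 53.2\ \text{MB/s}
    \qquad \text{versus} \qquad
    \frac{480\ \text{Mbit/s}}{8\ \text{bits/byte}} = 60\ \text{MB/s}

The difference of roughly 11% is the bulk-mode protocol overhead referred to above.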

The gap between the data requirements of embedded systems and the hardware that moves and processes that data has been growing exponentially in recent years. Recent developments in both microcontrollers and software capable of supporting high-speed USB provide a much-needed solution. In the early stages of adoption, the majority of users are unlikely to run at the maximum 480Mbps data rate.

FIGURE 3 Block diagram of the SAM3 with multi-layer bus, DMAs and high-speed interfaces (HSMCI, EBI): the Cortex-M3 processor, SRAM (16KB and 32KB) and flash (2 x 128KB) blocks, the high-speed USB device with 4KB DPRAM and dedicated DMA, a central DMA, the external bus interface and the peripheral DMA controller all connect through a 5-layer AHB matrix to peripheral bridges serving low-speed (ADC, timer, PWM, UART, I2C) and high-speed (MMC/SDCard, SDIO, SPI, I2S) peripherals, alongside the system peripherals, backup unit and MPU. Source: Atmel


Micrium, 1290 Weston Road, Suite 306, Weston, FL 33326, USA
T: 1 954 217 2036  W: micrium.com

Atmel, 2325 Orchard Parkway, San Jose, CA 95131, USA
T: 1 408 441 0311  W: www.atmel.com


Paul Quintana is a senior technical marketing manager in Altera’s military and aerospace business unit, focusing on secure communication and cryptography, DO-254 and software defined radio. He holds an MSEE and a BSEE from New Mexico State University.

FPGAs are a ubiquitous part of today's processing landscape. Their use has extended from their long-established role as glue logic interfaces to the very heart of the advanced information-processing systems used by core Internet routers and high-performance computing systems. What remains common throughout this evolution is the drive to integrate more functionality in less space while decreasing power and cost.

High-reliability system design—as well as other system design areas such as information assurance, avionics and industrial safety systems—sets similar requirements for reduced system size, power and cost. Traditionally, high-reliability system designs have approached reliability through redundancy. The drawback with redundancy, however, is increased component count, logic size, system power and cost.

Altera has developed a strategy that addresses the conflicting needs for low power, small size and high functionality while maintaining the high reliability and information assurance these applications require. The design separation feature in its Quartus II design software and Cyclone III LS FPGAs gives designers an easy way of executing high-reliability redundant designs using single-chip FPGA-based architectures.

Life-cycles and reliability
The concept of reliability engineering has been driven by the U.S. Department of Defense (DoD) since its first studies into the availability of Army and Navy materiel during World War II. For example, the mean time between failures (MTBF) of a bomber was found to be less than 20 hours, while the cost of repeatedly repairing the bomber would eventually reach more than ten times the original purchase price. Subsequently, total life-cycle cost has become a critical metric for system specification and design.

High-assurance cryptographic systems have historically followed a similar path. Failures in a cryptographic system affect the total life cycle in such fundamental terms as security for military systems and commerce for financial systems. Given this context, high-assurance cryptographic systems have similar design and analysis requirements to high-reliability systems.

In each case, the designer's goal is to shrink the PCB size and reduce the number of components needed for a particular application. This has been the trend in the electronics industry for decades, most recently in system-on-chip (SoC) ASICs and today progressing to SoC FPGAs. Developing SoC ASICs consolidated external digital logic into a single device. This paradigm progressed successfully until the cost and schedules of ASIC development exceeded the market's time and money budgets. With ASIC costs having grown so much, system designers are increasingly turning to FPGAs, where performance and logic densities enable logic consolidation onto a reprogrammable chip.

Ensuring reliability through design separation
Paul Quintana, Altera

FIGURE 1 Evolution of security criteria design and analysis, from the 1985 U.S. 'Orange Book' (TCSEC) through the French 'Blue-White-Red Book', German IT-Security Criteria, Netherlands Criteria and U.K. Systems Security Confidence Levels/'Green Books' (all 1989), the European ITSEC (1991), the U.S. Federal Criteria draft (1992) and the Canadian CTCPEC (1993), to the Common Criteria, ISO 15408 (v1.0 1996, v2.0 1998, v2.1 1999, v2.2 2004, v3.1 2006). Source: Altera


System designs have traditionally achieved reliability through redundancy, even though this inevitably increases component count, logic size, system power and cost. The article describes the design separation feature in Altera software that seeks to address these issues as well as today's conflicting needs for low power, small size and high functionality while maintaining high reliability and information assurance.


However, while the growth in SoC designs has been steady for many years, the design and complexity of FPGAs have until now prevented the integration of redundant designs. Many system and security analysts deemed the analysis necessary to verify separate and independent datapaths too difficult.

By working with certification authorities, Altera has simplified complex FPGA device analysis and ensured separate and independent datapaths. By providing users with FPGA tools and data flows that have this analysis in mind from the start, we enable designers to consolidate fail-safe logic designs into a single FPGA fabric. This allows them to meet development budgets and also the requirements of high-reliability and high-assurance applications.

Information-assurance applications
Information-assurance equipment must provide a high level of trust in the design and implementation of the cryptographic equipment. Guaranteeing that a complex system design is trustworthy requires robust design standards and system analysis, and several security-design standards and evaluation bodies exist. While explaining the design requirements and evaluation criteria used by each of these bodies exceeds the scope of this article, an overview of their evolution and complexity is shown in Figure 1.

IT systems have the greatest influence on information assurance. With an ever-increasing number of infrastructure-control systems, and with corporate and personal information accessible via the Internet, they are increasingly relied on to protect sensitive information and systems from hackers and criminals.

To provide information assurance on the Internet, a user must not only inspect data for viruses, but also protect sensitive information by using security and encryption technologies such as IPsec and HTTPS. While the HTTPS cryptographic algorithm is typically implemented in software running on a computer platform, IPsec and virtual private network (VPN) encryption applications usually require higher performance and rely more heavily on hardware. Network IT equipment must be evaluated at all appropriate levels to ensure trust in the overall system.

This trust must be proven by hardware analysis of each IT component, ensuring that information-assurance levels meet the security requirements of either the Common Criteria or Federal Information Processing Standard (FIPS) 140-2 or 140-3. As shown in Table 1, this analysis is complex and can greatly extend the design cycle.

Commercial cryptography
The financial industry today drives the development of commercial cryptography and cryptographic equipment. Its need for information assurance has become ever more pervasive, as its use of technology has grown from inter- and intra-bank electronic data interchange (EDI) transactions, to public automatic teller machines (ATMs), to high-performance cryptographic applications driving electronic commerce.

Like the military, commercial electronic commerce needs commonly accepted standards for the design and evaluation of cryptographic hardware. The financial industry's need for cryptographic interoperability has been a key differentiator in this market. Commerce extends beyond national boundaries and therefore so must the cryptographic equipment it uses. A major complication in this landscape is the classification of cryptography as a regulated technology under the International Traffic in Arms Regulations (ITAR). High-performance electronic-commerce cryptographic equipment is developed mainly by large server manufacturers that can invest in the expertise and long design cycles necessary to create FIPS 140-2-certified modules.

High-reliability applications
Industrial applications also take advantage of the design separation and independence available from FPGAs. For example, increasing numbers of embedded control units (ECUs) are used in automobiles, with increasing complexity and functionality. ECU designers must maintain reliability while reducing size and cost. An ability to separate redundant logic within a single FPGA allows them to reduce the number of system components while maintaining fault isolation.

FIGURE 2 Design separation for high reliability and information assurance. The portions of logic shown in blue and in yellow are in separate, secure partitions. Source: Altera


FIGURE 3 High-assurance design flow using incremental compile. Design sources (Verilog HDL .v, VHDL .vhd, AHDL .tdf, Block Design File .bdf, EDIF .edf and VQM .vqm netlists) are divided into design partitions for analysis and synthesis, which resynthesizes only changed partitions and produces one post-synthesis netlist per partition. Floorplan location assignments (create/modify the initial floorplan, assign security attributes, assign routing regions and signals, assign I/O) feed the partition merge, fitter, assembler and timing analyzer; if requirements are not satisfied, design and assignment modifications are made, otherwise the device is programmed/configured. Source: Altera


Design separation
Information-assurance and high-reliability applications currently require at least two chips to ensure the logic remains separate and functions independently. This ensures that a fault detected in one device does not affect the remainder of the design. In cases where design separation is critical—such as financial applications, where data must be encrypted—data must not be able to leak from one portion of the design to another in the event of an inadvertent path being created by a fault.

TABLE 1 FIPS 140-2 security requirements (summarized by section and security level)

1. Cryptographic module specification — All levels: specification of the cryptographic module, cryptographic boundary, approved algorithms and approved modes of operation; description of the module, including all hardware, software and firmware components; statement of the module security policy.
2. Cryptographic module ports and interfaces — Levels 1-2: required and optional interfaces; specification of all interfaces and of all input and output datapaths. Levels 3-4: data ports for unprotected critical security parameters logically separated from other data ports.
3. Roles, services and authentication — Level 1: logical separation of required and optional roles and services. Level 2: role-based or identity-based operator authentication. Levels 3-4: identity-based operator authentication.
4. Finite state model — All levels: specification of a finite state model, required and optional states, state transition diagram and specification of state transitions.
5. Physical security — Level 1: production-grade equipment. Level 2: locks or tamper evidence. Level 3: tamper detection and response for covers and doors. Level 4: tamper detection and response envelope, EFP and EFT.
6. Operational environment — Level 1: single operator, executable code, approved integrity technique. Level 2: referenced PPs evaluated at EAL2 with specified discretionary access control mechanisms and auditing. Level 3: referenced PPs plus trusted path evaluated at EAL3, plus security policy modeling. Level 4: referenced PPs plus trusted path evaluated at EAL4.
7. Cryptographic key management — All levels: key management mechanisms (random number and key generation, key establishment, key distribution, key entry/output, key storage and key zeroization). Levels 1-2: secret and private keys established using manual methods may be entered or output in plaintext form. Levels 3-4: secret and private keys established using manual methods shall be entered or output encrypted or with split-knowledge procedures.
8. EMI/EMC — Levels 1-2: 47 CFR FCC Part 15, Subpart B, Class A (business use); applicable FCC requirements (for radio). Levels 3-4: 47 CFR FCC Part 15, Subpart B, Class B (home use).
9. Self-tests — All levels: power-up tests (cryptographic algorithm tests, software/firmware integrity tests, critical functions tests) and conditional tests. Level 3: statistical RNG tests callable on demand. Level 4: statistical RNG tests performed at power-up.
10. Design assurance — Level 1: configuration management (CM), secure installation and generation, design and policy correspondence, guidance documents. Level 2: CM system, secure distribution, functional specification. Level 3: high-level language implementation. Level 4: formal model, detailed explanations (informal proofs), preconditions and postconditions.
11. Mitigation of other attacks — All levels: specification of mitigation of attacks for which no testable requirements are currently available.

Source: Altera


The design separation feature is fully supported using the Mentor Graphics ModelSim verification environment, allowing designers to achieve high system reliability through logical redundancy. ModelSim allows designers to verify the functional equivalence of redundant logic on a single Cyclone III LS FPGA.

Conclusion
Requirements for high-reliability and information-assurance systems have many similarities. Both systems require design separation and independence, as each system requires redundancy to ensure proper design operation in the event of hardware faults. Traditionally, the implementation of redundancy increases system size, weight, power and costs because this redundancy is implemented at the board level. To reduce these factors, low-power FPGA processes can be used with a high-assurance design flow to meet stringent NSA Fail Safe Design Assurance requirements.

By ensuring design separation and independence, redundant logic can be transferred from the board level to a single FPGA as part of an SoC design approach. Combining low-power, high-logic-density and design-separation features allows developers of high-reliability, high-assurance cryptographic and industrial systems to minimize design development and schedule risk by using reprogrammable logic, and to improve productivity by using a proven incremental-compile design flow.

Further information
• Cyclone III FPGAs—Security: www.altera.com/products/devices/cyclone3/overview/security/cy3-security.html
• Partitioning FPGA Designs for Redundancy and Information Security (webcast): www.altera.com/education/webcasts/all/wc-2009-partitioning-fpga-redundancy.html
• AN 567: Quartus II Design Separation Flow: www.altera.com/literature/an/an567.pdf
• Protecting the FPGA Design From Common Threats: www.altera.com/literature/wp/wp-01111-anti-tamper.pdf

In cases where high reliability is critical—such as industrial systems where entire manufacturing lines may be shut down if one piece of equipment fails—redundant circuits continue to control the system in the event of a main circuit failing, ensuring little to no downtime.

The design separation feature in the Quartus II design software allows designers to maintain the separation of critical functions within a single FPGA. This separation is created using Altera's LogicLock feature, which allows designers to allocate design partitions to a specific section of the device. When the design separation flow is enabled, as shown in Figure 2, each secure partition has an automatic fence (or 'keep out' region) associated with it. In this way, no other logic can be placed in its proximity, creating one level of increased fault tolerance.

However, to ensure true separation, the routing also must be separated. Therefore, all routing is restricted to the LogicLock area of the design partition. This means that the fence region does not contain logic and does not allow routing to enter or exit the fence, ensuring the region's physical isolation from any other function in the device. Routing interfaces can then be created using interface LogicLock regions. These interface LogicLock regions can route signals into or out of separated regions by creating an isolated channel between two separated partitions. This is effectively the same as using two physical devices to ensure separation.

Altera has designed the Cyclone III LS fabric architecture to ensure the separation results in increased fault tolerance with the minimal fence size, enabling designers to use over 80% of the resources for their design. The design separation flow also enables specific banking rules that ensure the separation created in the fabric for critical design partitions extends to the I/Os. The Cyclone III LS packages are also designed to support such I/O separation.

Single-chip high-assurance design flow
This uses a standard incremental compile design flow (Figure 3) with five additional steps during floorplanning:

• Create design partition assignments for each secure region using incremental compilation and floorplanning. Each secure region must be associated with one partition only, which means the design hierarchy should be organized early in the design process.

• Plan and create an initial floorplan using LogicLock regions for each secure partition. Top-level planning early in the design phase helps prevent and mitigate routing and performance bottlenecks.

• Assign security attributes for each LogicLock region. Locked regions are used for those parts of a design that require design separation and independence.

• Assign routing regions and signals. To ensure each signal path is independent, a secure routing region must be created for every signal entering or leaving a design partition.

• Assign I/Os. Each secure region with fan-outs to I/O pins cannot share a bank with any other secure region to ensure design separation and isolation.

Altera, 101 Innovation Drive, San Jose, CA 95134, USA
T: 1 408 544 7000  W: www.altera.com
