Date post: 04-Aug-2020
RTC MAGAZINE, OCTOBER 2013 / JANUARY 2014

TECHNOLOGY CORE

Altera Cyclone V: The Marriage of CPU and FPGA

Processors and field-programmable gate arrays (FPGAs) perform the heavy lifting in most embedded systems. While processors and FPGAs often work alone, the two technologies work brilliantly together, forming an even more powerful embedded computing platform. Often in these systems, the processor provides the high-level management functionality while the FPGA performs stringent real-time operations, extreme data processing, or interface functions not easily supported by a processor.

SoC FPGA devices successfully integrate both processor and FPGA architectures in a single device. Melding the two technologies provides a variety of benefits including higher integration, lower power, smaller board size and higher bandwidth communication between the processor and FPGA. Best-in-class devices exploit the unique advantages of a merged processor/FPGA system while retaining the benefits of stand-alone processor and FPGA.

An SoC FPGA provides functionality and performance at least comparable to, and likely better than, previous-generation designs, but with less board space, lower power and lower system cost, perhaps as much as 50% less. By integrating these technologies on the same piece of silicon, system developers can eliminate the cost of one of the plastic packages. If the CPU and FPGA in a design use separate external memories, designers may also be able to consolidate both into one memory device, saving even more system cost, board space and power. Because the signals between the processor and the FPGA now reside on the same silicon, communication between the two consumes substantially less power than it does between separate chips. Plus, thanks to thousands of internal connections between the processor and the FPGA, an integrated solution has substantially higher bandwidth and lower latency than a two-chip solution.

There are several design considerations and engineering decisions embedded developers should take into account when choosing the best SoC FPGA for their application. These selection criteria include system performance, system reliability, power consumption, development tools and future roadmap.

Increasing System Performance with SoC FPGAs

Ultimately, system performance in SoC FPGAs is dictated by efficiently moving data between four major SoC functions: the processor, the FPGA logic, the interconnect, and on-chip and off-chip memory.

In a variety of applications, system performance is dominated by the data path performance, where a device must process continuous streams of data at “line speed” or “wire speed” with a minimum of stalling or interruptions. In these applications, the FPGA logic crunches the critical data path while the processor provides high-level management over the control path. The processor intercepts a small fraction of the incoming data and mostly attempts to stay out of the way of the data path.

To perform this delicate dance, modern-day SoC FPGAs leverage an ARM dual-core Cortex-A9 application processor integrated into the fabric of an advanced 28nm FPGA. The Cortex-A9 offers an ideal mixture of low power, capabilities, bandwidth and performance compared to other application processors.

Architecture Matters When Choosing the Right SoC FPGA

Devices that combine ARM processors with FPGA fabrics on a single die show great promise. Still, it is important to pay attention to the internal details when selecting a device to ensure the highest performance.

by Todd Koelling, Altera

FIGURE 1

Cyclone V SoCs feature a >100 Gbit/s interconnect between the FPGA and processor.

The interconnect featured in Cyclone V SoCs is designed specifically to increase system performance by supporting more than 100 Gbit/s of throughput between the FPGA logic and the Cortex-A9 processor (Figure 1), giving the system sufficient interconnect performance to support high-throughput traffic.

The ability to efficiently access on-chip and off-chip memory also enables SoC FPGAs to increase system performance. Hardened memory controllers featured in Cyclone V SoCs employ advanced algorithms to squeeze out as much memory efficiency as possible. These algorithms extract maximum bandwidth by managing transaction priority, reordering commands and data, and scheduling pending transactions using algorithms like deficit weighted round robin. Additional performance comes from customizing the memory controller via software to best fit a custom data profile.
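To make the scheduling idea concrete, here is a minimal sketch of the textbook deficit weighted round robin algorithm arbitrating between two request queues. It is not Altera's controller logic: the port names, quanta and transaction sizes are hypothetical, chosen only to show how a larger quantum gives one port a larger share of service.

```python
from collections import deque

def dwrr_schedule(queues, quanta):
    """Serve transactions using deficit weighted round robin.

    queues: dict mapping port name -> list of transaction sizes in bytes
    quanta: dict mapping port name -> positive credit added per round (the weight)
    Returns the service order as a list of (port, size) tuples.
    """
    pending = {port: deque(sizes) for port, sizes in queues.items()}
    deficit = {port: 0 for port in queues}
    order = []
    while any(pending.values()):
        for port in queues:
            q = pending[port]
            if not q:
                continue  # nothing waiting on this port
            deficit[port] += quanta[port]
            # Serve head-of-line transactions while the credit covers them.
            while q and q[0] <= deficit[port]:
                size = q.popleft()
                deficit[port] -= size
                order.append((port, size))
            if not q:
                deficit[port] = 0  # an emptied queue does not bank credit
    return order

# Hypothetical workload: the "fpga" port has twice the weight of the "cpu" port.
queues = {"cpu": [64, 64, 64], "fpga": [64, 64, 64]}
quanta = {"cpu": 64, "fpga": 128}
order = dwrr_schedule(queues, quanta)
print([port for port, _ in order])
```

With these made-up numbers the "fpga" port is served two transactions per round to the "cpu" port's one until its queue drains, which is the weighting behavior the algorithm is designed to provide.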

When evaluating the performance of a memory controller, it is important to look beyond bus width and speed. System-level benchmarks, such as LMbench, are useful for assessing the overall performance of the memory subsystem. In an LMbench run on a 667 MHz Cyclone V SoC system, the Cyclone V SoC with the smarter memory controller extracts more memory bandwidth, up to 17% more than a competitive SoC device, despite a 25% lower memory operating frequency. This efficiency advantage enables the Cyclone V SoC to deliver more bandwidth at lower clock rates, resulting in system power savings.
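The two percentages can be combined into a single per-clock figure. The short calculation below is only a back-of-the-envelope sketch: it assumes both percentages come from the same benchmark run and takes the "up to 17%" best case. The function name is illustrative.

```python
def per_clock_advantage(bandwidth_gain, clock_reduction):
    """Bandwidth delivered per memory clock cycle, relative to the baseline device."""
    return (1.0 + bandwidth_gain) / (1.0 - clock_reduction)

# 17% more bandwidth at a 25% lower memory clock (figures quoted in the text).
ratio = per_clock_advantage(0.17, 0.25)
print(f"{ratio:.2f}x bandwidth per memory clock")  # prints "1.56x bandwidth per memory clock"
```

Under those assumptions, each memory clock cycle moves roughly 1.56 times as much data, which is why the lower clock rate does not cost bandwidth.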

Increasing System Reliability with SoC FPGAs

As memory sizes continue to increase, so does the need for error detection and correction in today’s designs. Most modern systems include dedicated hardware to help ensure data integrity. This includes error correction code (ECC) protection, not only as part of the memory controller, but also integrated within the processor’s on-chip memories, caches, peripheral buffers and in the FPGA itself. Error checking and correction circuitry makes a system more robust and resilient against unexpected data errors or corrupted data.
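As an illustration of how ECC recovers from a flipped bit, the sketch below implements a toy Hamming(7,4) code, which protects 4 data bits with 3 parity bits and corrects any single-bit error. Real memory ECC uses much wider codes and dedicated hardware. The function names are illustrative and nothing here describes the Cyclone V implementation.

```python
def hamming74_encode(data):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit Hamming codeword."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_correct(codeword):
    """Return (data bits, error position); position 0 means no error was detected."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s3  # the syndrome names the flipped position
    if position:
        c[position - 1] ^= 1  # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]], position

word = hamming74_encode([1, 0, 1, 1])
corrupted = list(word)
corrupted[4] ^= 1  # simulate a single-bit upset at codeword position 5
data, position = hamming74_correct(corrupted)
print(data, position)  # prints "[1, 0, 1, 1] 5"
```

The syndrome bits computed on read directly encode the position of the faulty bit, so correction costs only a few XOR gates in hardware.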

Memory protection is a feature often associated with the memory controllers in more advanced processors, whether called a memory management unit (MMU) or memory protection unit (MPU). The processor’s memory protection unit prevents errant or illegal processor transactions from reading or corrupting other memory regions. In the Cortex-A9 processor, ARM extends this protection concept with TrustZone, which provides a system-wide approach for security-sensitive systems.

Using the Cyclone V SoC, specific memory regions may be dedicated to the operating system and embedded software applications while other memory regions may be dedicated to FPGA-based functions, as shown in Figure 2. Via memory protection, the FPGA master functions are prevented from corrupting the operating system or embedded software regions.
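The region-based protection described above can be modeled as a simple rule table. The sketch below is a conceptual model only: the addresses, region sizes, master names and the rule that the CPU may also reach the FPGA region are all assumptions made for illustration, not the Cyclone V register interface.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Region:
    base: int
    size: int
    allowed_masters: frozenset

    def contains(self, addr, length):
        """True if the whole access [addr, addr + length) lies inside the region."""
        return self.base <= addr and addr + length <= self.base + self.size

def is_allowed(regions, master, addr, length):
    """Permit an access only if it sits wholly inside a region open to this master."""
    return any(
        master in r.allowed_masters and r.contains(addr, length) for r in regions
    )

# Hypothetical DDR map: software region for the CPU only, shared FPGA IP space.
REGIONS = [
    Region(0x0000_0000, 0x2000_0000, frozenset({"cpu"})),
    Region(0x2000_0000, 0x1000_0000, frozenset({"cpu", "fpga"})),
]

print(is_allowed(REGIONS, "fpga", 0x2000_0000, 64))  # inside the FPGA IP space
print(is_allowed(REGIONS, "fpga", 0x0000_1000, 64))  # inside the software region
```

An access that straddles a region boundary is rejected as well, since no single region fully contains it.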

Integration Leads to Power Savings

New electronics applications are increasingly power aware, and not just in handheld devices, but also in automotive applications and even server racks with their seemingly endless power and cooling budget. SoC FPGA devices are viable solutions to help embedded developers stay within their power budgets.

As illustrated in Figure 3, simply integrating the processor and FPGA components into a single SoC FPGA can potentially reduce system power by 10% to 30%. I/Os carrying signals between devices, often at higher voltages, are one of the most power-consuming functions in an application.

Beyond the power savings that simple integration provides, Cyclone V SoCs feature power-saving modes such as clock gating and scaling. The processor and FPGA also have independent power planes, allowing an application to turn off power to the FPGA completely while keeping the processor active to monitor any interrupts.

To optimize power, SoC designs are becoming more interrelated with power supply design. At a system level, the power supply design often consumes more power than the SoC device itself. The challenge in these systems is balancing the engineering tradeoff between minimizing the power supply footprint and maximizing the efficiency of the power supply. Cyclone V SoCs are supported by a range of power supply options and by advanced DC/DC power converter technologies that enable designers to meet stringent power targets and space constraints. Altera offers a new line of Enpirion power modules specifically suited to meet the space and efficiency constraints of SoC FPGA-based embedded systems.

FIGURE 2

DDR memory protection in an SoC application where processors and FPGA share a common memory. (Diagram labels: FPGA, Processor System, DDR I/F, Protection, DDR Controller, IP, DDR Memory, FPGA IP Space, Operating Systems/Embedded Software.)

FIGURE 3

Integrating the processor and FPGA into a single SoC FPGA reduces power-hungry, inter-chip I/O connections, as does sharing an external memory interface. (Diagram: before, a separate FPGA and CPU, each with its own devices and memory; after, a single FPGA+CPU with devices and one memory.)

Familiar Development Tools Support SoC FPGAs

This new class of SoC devices that integrate leading-edge ARM application processors and FPGA fabric opens a wealth of possibilities for faster, cheaper and more energy-efficient electronic products. However, the innovation in hardware must be matched by similar innovation in the FPGA tools, on-chip debugging, software debugging and analysis tools. Software ultimately determines how successful a designer will be using these devices. For broader use, software developers must find working with SoC FPGAs and their features as easy and efficient as software development on stand-alone processors.

SoC FPGAs from Altera are supported by an SoC Embedded Design Suite (EDS) that includes a comprehensive, ARM-compatible tool suite for embedded software development. It contains development tools, utility programs, run-time software and application examples to expedite firmware and application software development for SoC embedded systems. As the result of a strategic relationship between Altera and ARM, the SoC EDS includes the exclusive offering of the ARM Development Studio 5 (DS-5) Altera Edition Toolkit. By combining the ARM DS-5 advanced multicore debugging capabilities with FPGA-adaptive capability (the ability to see changes in the FPGA hardware immediately) and a seamless link to the Altera SignalTap logic analyzer, the SoC EDS toolkit provides embedded software developers an unprecedented level of full-chip visibility and control.

When a bug makes an unwelcome appearance, the development team must determine whether it is a hardware or software issue. The tools that support Altera SoC FPGAs make finding the cause of these faults much easier by allowing the processor subsystem and FPGA subsystem to cross-trigger from code to waveform or from waveform to code. As a result, the development team can find and track how and why a particular condition occurred in the system. Cross-triggering, trace and global time-stamping are valuable features for IP verification, custom driver development and the system integration portion of a project.

Besides finding the location of a fault, the SoC EDS allows embedded system developers to find out exactly how and why the system entered the faulty state. The ARM System Trace Module (STM) enables tracking of CPU-based software events. Application software can issue hardware and software event “bread crumbs” as the system executes over time to monitor system behavior and to gain deep insights into its operation. In an “FPGA adaptive” debugging environment, STM enables event monitoring of both the CPU and FPGA domains without having to stop the system.
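The value of globally time-stamped "bread crumbs" is that events from both domains can be placed on one timeline and inspected around a point of interest. The sketch below is a conceptual model of that idea, not the STM or DS-5 interface: the event names, tick values and helper functions are invented for illustration.

```python
import heapq

# Hypothetical (timestamp, domain, event) records, each stream sorted by timestamp.
cpu_events = [(100, "cpu", "dma_start"), (250, "cpu", "irq_enter")]
fpga_events = [(180, "fpga", "fifo_overflow"), (240, "fpga", "irq_raise")]

def merged_timeline(*streams):
    """Merge per-domain event streams that share a global time base."""
    return list(heapq.merge(*streams))

def events_before(timeline, label, window):
    """Events in the `window` ticks leading up to the first event named `label`."""
    t_hit = next(t for t, _, name in timeline if name == label)
    return [e for e in timeline if t_hit - window <= e[0] < t_hit]

timeline = merged_timeline(cpu_events, fpga_events)
for event in events_before(timeline, "irq_enter", window=100):
    print(event)
```

In this made-up trace, looking back 100 ticks from the interrupt entry surfaces the FPGA-side FIFO overflow and interrupt request that preceded it, which is the kind of cross-domain cause-and-effect the text describes.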

Future SoC FPGA Roadmap

When selecting SoC FPGAs, it is imperative to make certain that the vendor’s product roadmap will keep your systems competitive and offer forward migration of software for the long term (Figure 4). To begin with, consider the foundation of all silicon roadmaps, which is the underlying silicon process technology. The Cyclone V and Arria V SoCs currently available from Altera are built on a 28nm low-power process to help minimize power for industrial, automotive, medical and communications applications where power consumption is a major factor.

The next-generation Arria 10 SoCs from Altera deliver optimal performance, power efficiency, small form factor and low cost for a wide variety of midrange wireless infrastructure, broadcast, military, and compute and storage applications. Arria 10 SoCs are based on TSMC 20nm process technology and combine a dual-core Cortex-A9 processor system with industry-leading programmable logic technology. Retaining the dual-core Cortex-A9 processor system eases software migration from first-generation SoC FPGAs, while the smaller-geometry process technology boosts processor performance to 1.5 GHz.

The third-generation Stratix 10 SoCs will deliver breakthrough levels of performance and bandwidth for advanced communications, military and data center applications. Stratix 10 SoCs are based on Intel 14nm Tri-Gate process technology and feature a 64-bit quad-core ARM Cortex-A53 processor. The Cortex-A53 supports a 32-bit compatibility mode to ease migration of existing software if desired.

SoC FPGAs are a powerful new class of programmable devices that are applicable to a wide range of electronic designs. The most popular commercially available devices integrate a standard ARM dual-core Cortex-A9 with a rich set of peripherals, a high-speed internal interconnect architecture, a hierarchy of on-chip memory and a leading-edge FPGA fabric. Innovative new software design and debug tools enable developers to simultaneously view and cross-trigger both sides (processor and FPGA) of the chip. While the available devices on the market may seem similar at first glance, upon a closer look, the underlying architecture matters.

Altera San Jose, CA (408) 544-7000 www.altera.com

FIGURE 4

Stratix 10 SoCs, the third generation of SoCs from Altera, integrate a quad-core Cortex-A53 processor built on Intel’s 14 nm Tri-Gate process technology. (Chart axes: logic density; low power to high performance.)

LOW-COST SoCs (Lowest Power, Form Factor, and Cost)

• 28 nm TSMC
• 925 MHz Dual ARM Cortex-A9 MPCore
• 5 Gbps Transceivers
• 400 MHz DDR3
• 25 to 110K LEs
• Up to 224 Multipliers (18x19)

MIDRANGE SoCs (High Performance with Low Power, Form Factor, and Cost)

• 28 nm TSMC
• 1.05 GHz Dual ARM Cortex-A9 MPCore
• 10 Gbps Transceivers
• 533 MHz DDR3
• Up to 462K LEs
• Up to 2,136 Multipliers (18x19)

• 20 nm TSMC
• 1.5 GHz Dual ARM Cortex-A9 MPCore
• 17 Gbps Transceivers
• 1333 MHz DDR4
• Up to 660K LEs
• Up to 3,300 Multipliers (18x19)

HIGH-END SoCs (Highest Performance and System Bandwidth)

• 14 nm Intel Tri-Gate
• 64 bit Quad ARM A53 MPCore
• Optimized for Maximum Performance per Watt
• Over 4,000K LEs

Altera Announces Quad-Core 64-bit ARM Cortex-A53 for Stratix 10 SoCs

Altera’s Stratix 10 SoC devices, manufactured on Intel’s 14nm Tri-Gate process, will now incorporate a high-performance, quad-core 64-bit ARM Cortex-A53 processor system, complementing the device’s floating-point digital signal processing (DSP) blocks and high-performance FPGA fabric. Coupled with Altera’s system-level design tools, including OpenCL, this versatile heterogeneous computing platform will offer exceptional adaptability, performance, power efficiency and design productivity for a broad range of applications, including data center computing acceleration, radar systems and communications infrastructure.

The ARM Cortex-A53 processor, the first 64-bit processor used on an SoC FPGA, is an attractive fit for use in Stratix 10 SoCs due to its performance, power efficiency, data throughput and advanced features. The Cortex-A53 is among the most power-efficient of ARM’s application-class processors, and when delivered on the 14nm Tri-Gate process will achieve over six times more data throughput compared to today’s highest performing SoC FPGAs. The Cortex-A53 also delivers important features, such as virtualization support, 256 Tbyte memory reach and error correction code (ECC) on L1 and L2 caches. Furthermore, the Cortex-A53 core can run in 32-bit mode, which will run Cortex-A9 operating systems and code unmodified, allowing a smooth upgrade path from Altera’s 28nm and 20nm SoC FPGAs. Leveraging Intel’s 14nm Tri-Gate process and an enhanced high-performance architecture, Altera Stratix 10 SoCs will have a programmable-logic performance level of more than 1 GHz, two times the core performance of current high-end 28nm FPGAs.
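For a sense of scale, the quoted 256 Tbyte memory reach can be converted into address bits and compared with a plain 32-bit address space. The short calculation below assumes "Tbyte" means the binary unit of 2**40 bytes. The helper name is illustrative.

```python
TIB = 2 ** 40  # assumption: "Tbyte" is read as the binary unit, 2**40 bytes

def address_bits(reach_bytes):
    """Number of address bits needed to cover `reach_bytes` (must be a power of two)."""
    if reach_bytes <= 0 or reach_bytes & (reach_bytes - 1):
        raise ValueError("expects a positive power of two")
    return reach_bytes.bit_length() - 1

bits_needed = address_bits(256 * TIB)
ratio_to_32bit = (256 * TIB) // (2 ** 32)
print(bits_needed, ratio_to_32bit)  # prints "48 65536"
```

Under that assumption, a 256 Tbyte reach corresponds to 48 address bits, or 65,536 times the 4 Gbyte span of a 32-bit address.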

By standardizing on ARM processors across its three-generation SoC portfolio, Altera will offer software compatibility and a common ARM ecosystem of tools and operating system support. Embedded developers will be able to accelerate debug cycles with Altera’s SoC Embedded Design Suite (EDS) featuring the ARM Development Studio 5 (DS-5) Altera Edition toolkit, the industry’s only FPGA-adaptive debug tool, as well as use Altera’s software development kit (SDK) for OpenCL to create heterogeneous implementations using the OpenCL high-level design language. Stratix 10 SoCs will offer designers a versatile and powerful heterogeneous compute platform enabling them to innovate and get to market faster.