8/3/2019 Accelerated Processing Unit
1/15
Accelerated Processing Unit
CHAPTER 1
INTRODUCTION
Imagine a PC that:
Recognizes your gestures without a remote
Responds to your touch or voice to do your bidding
Supports bi-directional hi-definition video chat over links with limited bandwidth
Finds and tags the photos and videos in your library that contain particular faces, places or objects
Helps you sort through your photo libraries to eliminate duplicates saved with different file names
Enhances the videos you've created with regard to color, focus and image stability
Up-scales even low-quality content to seamlessly match the capabilities of your HD display
Adds stereoscopic 3D realism to 2D content
Supports immersive, multi-monitor 3D gaming experiences
Department of Electronics and Communication,College of Engineering , Adoor 1
Sells at price points well within reach of the mainstream consumer.
Many of these capabilities exist today piecemeal in labs, running on expensive,
workstation-class computers that cost as much as tens of thousands of dollars. Why
haven't we progressed further, faster in delivering these capabilities to the mainstream?
The semiconductor industry prides itself on rapid improvements in system performance, but hardware that runs fast enough to enable these advanced capabilities still costs far too
much to enable high-volume deployment. Software developers, always tuned to market
realities as well as technology, have focused their efforts on applications that run well on
the dual- and quad-core x86 processors that comprise the bulk of today's mainstream
system offerings. But change is in the air; in 2011, affordable mainstream systems that
can support these advanced capabilities are set to enter the market. You've probably
heard this story before. Every two years, advances in semiconductor technology allow
chip architects to double the number of transistors they can fit in a given area of silicon.
Over the past decade, these extra transistors have been used to increase the size of on-
chip caches and add more x86 processor cores to designs, making today's CPUs the
fastest processors ever. Even the slowest contemporary CPUs have more than enough
performance to handle traditional office productivity, Internet browsing and e-mail
applications, which long ago ceased to be limited by CPU speed. But as fast as they are,
today's CPUs lack the performance to deliver a vivid, modern computing experience on
their own. The latest applications require CPUs that can deal with vast amounts of data
and require hundreds, if not thousands of individual threads to manipulate the massive
databases needed to recognize an object in a scene, the meaning in a sentence, or an
anomaly in an x-ray image. Not surprisingly, traditional CPU architectures and
application programming tools optimized for scalar data structures and serial algorithms fit poorly with these new vector-oriented, multi-threaded, data-parallel models.
Fortunately, innovative architectures and tools better suited for these new workloads have
emerged. Graphics processing units (GPUs), originally intended to enhance 3D
visualization, have evolved into powerful, programmable vector processors that can
accelerate a wide variety of software applications. Software tools like DirectCompute and
OpenCL permit developers to create standards-based applications that combine the power
of CPU cores and programmable GPU cores, and run on a wide variety of hardware
platforms. A few ambitious independent software vendors (ISVs) have already added
support for these new vector capabilities into their most advanced products, even if they
had to structure their code around proprietary hardware and software interfaces to get the
job done.
Advanced Micro Devices' (AMD's) forthcoming Accelerated Processing Units
(APUs) build upon this momentum and take PC computing to the next level. These new
processors are being designed to accelerate multimedia and vector processing
applications, enhance the end-user's PC experience, reduce power consumption, and
offer a superior visual graphics experience at mainstream system price points. More
importantly, these APUs will enable ISVs to create new generations of applications and
user interfaces limited, perhaps, only by the inventiveness of their developers, rather than
by the constraints of the traditional CPU architectures that have dominated the computer
industry for decades.
CHAPTER 2
ACCELERATED PROCESSING UNIT
At the most basic level, Accelerated Processing Units combine general-purpose x86 CPU cores with programmable vector processing engines on a single silicon die. APUs also include a variety of critical system elements, including memory controllers, I/O controllers, specialized video decoders, display outputs, and bus interfaces, but the real appeal of these chips stems from the inclusion of both scalar and vector hardware as full-fledged processing elements. CPUs and basic graphics units have been lashed together in a single package before (as in VIA's CoreFusion), but never with truly programmable GPUs like those in AMD's Fusion, let alone GPUs that can be programmed using high-level industry-standard tools like DirectCompute and OpenCL. AMD is best situated to address this engineering challenge, as it is currently the only company with access to extensive IP resources (e.g. patents and engineering expertise) in both x86 processor technology and industry-leading GPU technology. In
fact, AMD's recognition that it needed proven GPU technology for future converged products drove its 2006 acquisition of ATI Technologies. APUs are set to arrive in a variety of shapes and sizes adapted to the requirements of their target markets. AMD has disclosed that its first APUs, code-named Llano and Ontario, are designed for mainstream desktop and notebook platforms, and for thin-and-light notebooks, netbooks and slates, respectively. Both of these APUs will combine multiple superscalar x86 processor cores with an array of programmable SIMD engines leveraged from AMD's discrete graphics portfolio. The key aspect to note is that all the major system elements, the x86 cores, the vector (SIMD) engines, and a Unified Video Decoder (UVD) for HD decoding tasks, attach directly to the same high-speed bus, and thus to the main system memory. This design concept eliminates one of the fundamental constraints that limit the performance of traditional integrated graphics processors (IGPs).
Until now, transistor budget constraints typically mandated a two-chip solution for such systems, forcing system architects to use a chip-to-chip crossing between the memory controller and either the CPU or GPU. These transfers affect memory latency, consume system power and thus impact battery life. The APU's scalar x86 cores and SIMD engines share a common path to system memory to help avoid these constraints. Total system performance can be further enhanced through the addition of a discrete GPU. The common architectures of the APU and GPU allow for a multi-GPU configuration where the system can scale to harness all available resources for exceptional graphics and enable truly breathtaking
overall performance. Although the APU's scalar x86 cores and SIMD engines share a common path to system memory, first-generation APU implementations divide that memory into regions managed by the operating system running on the x86 cores and other regions managed by software running on the SIMD engines. The APU provides high-speed block transfer engines that move data between the x86 and SIMD memory partitions. Unlike transfers between an external frame buffer and system memory, these transfers never hit the system's external bus. Clever software developers can overlap the loading and unloading of blocks in the SIMD memory with execution involving data in
other blocks. Insight 64 anticipates that future APU architectures will evolve towards a
more seamless memory management model that allows even higher levels of balanced
performance scaling. Just as AMD's architects have woven x86 cores and GPU cores
into a single hardware fabric, astute software developers can now begin to weave high
performance vector algorithms into programs previously constrained by the limited
computational capabilities of conventional scalar processors, even when arranged in
multi-core configurations. In just a few years, machines equipped with programmable
GPUs are expected to comprise a meaningful portion of the installed base of PCs.
Software coming from ISVs who take advantage of these enhanced capabilities will have
the ability to execute well beyond the capability of packages that lack support for these
features.
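The overlap of block loading and unloading with SIMD execution described above is classic double buffering. The sketch below (a hypothetical Python simulation, not an actual APU API; the function and parameter names are invented for illustration) shows the ping-pong pattern: while the block in one buffer is consumed, the next block is staged into the other.

```python
def process_blocks(data, block_size, compute):
    """Ping-pong (double-buffered) processing: while the block in one
    buffer is being computed, the next block is staged into the other.
    On real hardware the staging transfer would overlap the compute."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    results = []
    if not blocks:
        return results
    buffers = [blocks[0], None]          # prime the first staging buffer
    for i in range(len(blocks)):
        if i + 1 < len(blocks):
            buffers[(i + 1) % 2] = blocks[i + 1]   # stage the next block
        results.extend(compute(x) for x in buffers[i % 2])  # "SIMD" work
    return results

print(process_blocks(list(range(6)), 2, lambda x: x * x))  # [0, 1, 4, 9, 16, 25]
```

In a real APU program the staging step would be an asynchronous block transfer issued before the compute kernel is launched, so transfer latency hides behind useful work.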
CHAPTER 3
REASONS FOR MERGING
The CPU and the GPU have been on this collision course for quite some time;
although we often refer to the CPU as a general purpose processor and the GPU as a
graphics processor, the reality is that they are both general purpose. The GPU is merely a highly parallel general-purpose processor, particularly well suited for applications such as 3D gaming. As the GPU became more programmable and thus more general purpose, its highly parallel nature became interesting to new classes of applications: things like scientific computing are now within the realm of possibility for
execution on a GPU.
Today's GPUs are vastly superior to what we currently call desktop CPUs when it comes to things like 3D gaming, video decoding and a lot of HPC applications. The problem is that a GPU is fairly worthless at sequential tasks, meaning that it relies on having a fast host CPU to handle everything other than what it's good at.
Figure 3: Amdahl's Law
ATI discovered that, in the long term, as the GPU grows in power it will eventually be bottlenecked by its ability to do high-speed sequential processing. In the same vein, the CPU will eventually be bottlenecked by its ability to do highly parallel processing. In other words, GPUs need CPUs and CPUs need GPUs for the workloads going forward. Neither approach alone will solve every problem or run every program optimally, but the combination of the two is what is necessary.
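The point Figure 3 makes can be stated concretely with Amdahl's Law: the overall speedup from parallel hardware is capped by the sequential fraction of the workload. A short illustrative calculation (the workload fractions below are hypothetical):

```python
def amdahl_speedup(parallel_fraction, n_units):
    """Overall speedup when parallel_fraction of the work runs on n_units
    parallel engines and the rest stays sequential (Amdahl's Law)."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_units)

# A workload that is 90% parallel is capped at 10x speedup no matter
# how many GPU-like engines are thrown at it:
print(round(amdahl_speedup(0.90, 100), 2))     # 9.17
print(round(amdahl_speedup(0.90, 10_000), 2))  # 9.99
```

This is exactly why each side needs the other: a fast sequential CPU shrinks the serial term, while the parallel engines shrink the parallel term.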
To understand the point of combining a highly sequential processor like modern-day desktop CPUs with a highly parallel GPU, you have to look above and beyond the gaming market, into what AMD is calling stream computing. AMD perceives a number of potential applications that will require a very GPU-like architecture to solve, things that we already see today. Simply watching an HD-DVD can eat up almost 100% of some of the fastest dual-core processors today, while a GPU can perform the same decoding task with much better power efficiency. H.264 encoding and decoding are perfect examples of tasks better suited for highly parallel processor architectures than those desktop CPUs are currently built on. But just as video processing is important, so are general productivity tasks, which is where we need the strengths of present-day out-of-order superscalar CPUs. A combined architecture that can excel at both types of applications is clearly a direction desktop CPUs need to target in order to remain
relevant in future applications, for consumers as well as in research.
Future applications will easily combine stream computing with more sequential tasks, and we already see some of that now with web browsers. Imagine browsing a site like YouTube, except where all of the content is much higher quality and requires far more CPU (or GPU) power to play. You need the strengths of a high-powered sequential processor to deal with everything other than the video playback, but you need the strengths of a GPU to actually handle the video. Examples like this one are overly simple, as it is very difficult to predict the direction software will take when given even more processing power; the point is that CPUs will inevitably have to merge with GPUs in order to handle these types of applications.
CHAPTER 4
MERGING CPUS AND GPUS
AMD views the APU progression as three discrete steps. Today we have a CPU and a GPU separated by an external bus, with the two being quite independent. The CPU does what it does best, and the GPU helps out wherever it can.
Step 1 is what AMD is calling integration, and it is what we can expect in the first Fusion product. The CPU and GPU are simply placed next to one another and there's minor leverage of that relationship, mostly from a cost and power efficiency standpoint.
Step 2, which AMD calls optimization, gets a bit more interesting. Parts of the CPU can be shared by the GPU and vice versa. There's not a deep level of integration, but it begins the transition to the most important step: exploitation.
The final step in the evolution of the APU is where the CPU and GPU are truly integrated, and the GPU is accessed by user-mode instructions just like the CPU. You can expect to talk to the GPU via extensions to the x86 ISA, and the GPU will have its own register file (much like the FP and integer units each have their own register files). Elements of the architecture will be shared, especially things like the cache hierarchy, which will prove useful when running applications that require both CPU and GPU power.
The GPU could easily be integrated onto a single die as a separate core behind a shared L3 cache. For example, if you look at the current Barcelona architecture you have four homogeneous cores behind a shared L3 cache and memory controller; simply swap one of those cores with a GPU core and you've got an idea of what one of these chips could look like. Instructions that can only be processed by the specialized core will be dispatched directly to it, while instructions better suited for other cores will be sent to them. There would have to be a bit of front-end logic to manage all of this, but it's easily done.
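The front-end logic described above can be caricatured as a routing table from instruction class to core type. This is purely an illustrative sketch (the class and core names are invented, not AMD's actual dispatch design):

```python
# Hypothetical instruction classes and core types; illustrative only,
# not AMD's actual dispatch design.
DISPATCH_TABLE = {
    "scalar_int": "x86_core",   # branchy, sequential work stays on the CPU
    "branch":     "x86_core",
    "vector_fp":  "gpu_core",   # wide data-parallel work goes to the SIMD core
    "texture":    "gpu_core",
}

def dispatch(instruction_class):
    """Route an instruction to the core best suited to execute it;
    anything unrecognized defaults to the general-purpose x86 core."""
    return DISPATCH_TABLE.get(instruction_class, "x86_core")

print(dispatch("vector_fp"))   # gpu_core
print(dispatch("scalar_int"))  # x86_core
```

The interesting engineering lies in the default path: everything the specialized core cannot handle must still run correctly on the x86 cores, which is why the table falls back to them.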
CHAPTER 5
APU IN CONSUMER ELECTRONICS
The potential of Fusion extends far beyond the PC space and into the embedded
space. If you can imagine a very low power, low profile Fusion CPU, you can easily see
it being used in not only PCs but consumer electronics devices as well. The benefit is that your CE devices could run the same applications as your PC devices, truly encouraging and enabling convergence and cohabitation between CE and PC devices.
Despite both sides attempting to point out how they are different, AMD and Intel
actually have very similar views on where the microprocessor industry is headed. Both
companies have stated to us that they have no desire to engage in the "core wars"; we won't see a race to keep adding cores. The explanation for why not is the same one that applied to the GHz race: if you scale exclusively in one direction (clock speed or number of cores), you will eventually run into the same power wall. The true path to performance is a combination of increasing instruction-level parallelism, clock speed, and number of cores in line with the demands of the software you're trying to run.
AMD has been a bit more forthcoming than Intel in this respect by indicating that
it doesn't believe that there's a clear sweet spot, at least for desktop CPUs. AMD doesn't
believe there's enough data to conclude whether 3, 4, 6 or 8 cores are the ideal number for
desktop processors. From our testing with Intel's V8 platform, an 8-core platform targeted at the high-end desktop, it is extremely difficult to find high-end desktop applications that can even benefit from 8 cores over 4. Our instincts tell us that for mainstream desktops, 3 - 4 general-purpose x86 cores appear to be the near-term target that makes sense. You could potentially lower the number of cores needed if you combine them with other specialized hardware (e.g. an H.264 encode/decode core).
What's particularly interesting is that many of the same goals Intel has for the
future of its x86 processors are in line with what AMD has planned. For the past couple
of IDFs Intel has been talking about bringing to market a < 0.5W x86 core that can be
used for devices that are somewhere in size and complexity between a cell phone and a UMPC (e.g. the iPhone). Intel has committed to delivering such a core, called Silverthorne, in 2008, based around a new micro-architecture designed for these ultra-low-power environments.
AMD confirmed that it too envisions ultra low power x86 cores for use in
consumer electronics devices, areas where ARM or other specialized cores are commonly used. AMD also recognizes that it can't address this market by simply reducing the clock speed of its current processors, and thus AMD mentioned that it is working on a separate micro-architecture to address these ultra-low-power markets. AMD didn't attribute any timeframe or roadmap to its plans, but knowing what we know about Fusion's debut, we'd expect a lower-power version targeted at the UMPC and CE markets to follow as early as possible.
Why even think about bringing x86 cores to CE devices like digital TVs or
smartphones? AMD offered one clear motivation: the software stack that will run on
these devices is going to get more complex. Applications on TVs, cell phones and other CE devices will get more complex to the point where they will require faster processors. Combine that with the fact that software developers don't want to target multiple processor architectures when they deliver software for these CE devices, and by using x86 as the common platform between CE and PC software you end up creating an entire environment where the same applications and content can be available across any device. The goal of PC/CE convergence is to allow users access to any content, on any device, anywhere; if all the devices on which you're trying to access content and programs happen to be x86, the process becomes much easier.
Why is a new core necessary? Although x86 can be applied to virtually any market segment, the range of usefulness of a particular core extends through only about an order of magnitude of power. For example, AMD's current desktop cores can easily be scaled up or down to hit TDPs in the 10W - 100W range, but they would not be good for hitting something in the sub-1W range. AMD can address the sub-1W market, but it will require a different core from the one it uses to address the rest of the market. This
philosophy is akin to what Intel discovered with Centrino; in order to succeed in the
mobile market, you need a mobile specific design. To succeed in the ultra mobile and
handtop markets, you need an ultra mobile/handtop specific processor design as well.
Both AMD and Intel realize this, and both companies have now publicly stated that they plan to address these new consumer requirements.
CHAPTER 6
NEW ERA OF SOFTWARE DEVELOPMENT
The GPU is ushering in a new age for software developers. That's because the GPU is no longer just about visualization or high-end graphics. Sure, those are important functions, but new software and applications will more fully leverage the latent capabilities of the GPU as it takes its place alongside the CPU as a powerful computational engine. This merging of CPU and GPU processing power, combined with the changing face of the Internet, promises to drive software to the next level of innovation.
As Wired Magazine boldly declared recently, the Internet isn't just about web browsing anymore; it's about instant communication and the applications and data to deliver video, photos and audio. The changing dynamics of the Internet are putting mobility at a premium and driving consumers increasingly into the market for the broadening range of mobile devices: smartphones, tablets, netbooks, and notebooks.
What better time for the emergence of the APU, a processor that will combine the power of the CPU and GPU onto a single chip in a small, power-saving format.
Software developers have already started to ask, "How do I embrace the new age of GPU and APU computing?" Luckily, AMD is in the trenches working with industry leaders on the tools and standards needed to help smooth the transition. As we've touched on in previous blog posts, AMD supports:
OpenCL: OpenCL is an open standard framework for writing parallel programs that execute across heterogeneous platforms consisting of CPUs, GPUs, and other processors. Notably, the standard enables applications to access the GPU for non-graphical computing and to balance computation between the CPU and GPU, making it the perfect development environment for the APU. We're seeing a lot of exciting innovation happening around OpenCL, such as MainConcept's new OpenCL H.264/AVC encoder. MainConcept offers a flexible and powerful software development kit so other software developers can easily add OpenCL-accelerated encoding to their own solutions. OpenCL is also helping to drive developments around more natural user interfaces, like touch, gesture, and object and facial recognition, as well as allowing developers to harness the power of the GPU for productivity in HD video conferencing and virus scanning.
Microsoft's DirectX: DirectX, Microsoft's Windows graphics technology, provides a collection of APIs that developers can use for handling tasks related to multimedia. It has been widely used by Windows developers for games and video applications, and is catching the attention of a larger group of developers by enabling code to be offloaded to the GPU. DirectX APIs include:
o D2D: a hardware-accelerated 2-D graphics API that provides high performance and high-quality rendering for 2-D geometry, bitmaps, and text. D2D drives your day-to-day software experience to a new level, particularly when it comes to online gaming and productivity applications. The next generation of web browsers is making use of D2D technology, including Microsoft's IE9 beta and Mozilla's Firefox 4 beta.
o DirectCompute: another DirectX API, DirectCompute provides programmers with a more flexible way to access the computational capability of GPUs that support DirectX 10 and DirectX 11. CyberLink's MediaShow 5 FaceMe technology, which is designed to quickly identify faces in photos, is optimized for Microsoft DirectX 11 DirectCompute.
OpenGL: OpenGL is another standard specification defining a cross-language, cross-platform API for writing applications that produce 2D and 3D computer graphics, such as content design software and high-end games. While OpenGL
isn't new, it does have noteworthy new functionality that simplifies porting between mobile and desktop platforms and increases interoperability with OpenCL. The recently released OpenGL 4.0 specification also includes updates to the OpenGL Shading Language that let developers better utilize GPU acceleration.
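Balancing computation between the CPU and GPU, as the OpenCL entry above describes, often comes down to splitting a data-parallel workload in proportion to each device's throughput. A minimal sketch of that idea (hypothetical throughput numbers, not a real OpenCL call):

```python
def split_work(n_items, cpu_rate, gpu_rate):
    """Partition a data-parallel job so both devices finish at about the
    same time: each device gets work in proportion to its throughput."""
    gpu_share = gpu_rate / (cpu_rate + gpu_rate)
    n_gpu = round(n_items * gpu_share)
    return n_items - n_gpu, n_gpu   # (cpu_items, gpu_items)

# A GPU with 4x the CPU's throughput should take 80% of the items:
print(split_work(1000, cpu_rate=1.0, gpu_rate=4.0))   # (200, 800)
```

Real schedulers measure device throughput at runtime rather than assuming it, but the proportional split is the core of the load-balancing decision.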
CHAPTER 7
PRACTICALITY
Although it's exciting to look at the new applications that will finally become practical in the Fusion era, the fact remains that most users will want their new APU-based systems to handle a mix of traditional applications for office productivity and Internet access, along with those exciting new apps. Fortunately, the changes AMD made to enable new APU-accelerated applications can also help existing applications run better as well.
Many of these improvements stem from AMD's ability to fit the CPU cores, GPU cores and north bridge (the part of the chip where the memory controller and PCI Express interfaces reside) onto a single piece of silicon. As noted earlier, this eliminates a chip-to-chip linkage that adds latency to memory operations and consumes power. It
takes less energy to move electrons across a chip than to move those same electrons
between two chips, and the power saved by this small change alone can help significantly increase system battery life. The co-location of all key elements on one chip also allows AMD to take a holistic approach to power management on these APUs. They can power various parts of the chip up and down depending on workloads, squeezing out a few milliwatts here and another few milliwatts there, which in the aggregate can amount to significant power savings.
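How a few milliwatts at a time add up can be sketched with a back-of-the-envelope battery-life model (all numbers below are hypothetical, chosen only to illustrate the aggregation):

```python
def battery_life_hours(battery_wh, base_power_w, gated_blocks):
    """Estimate battery life given per-block power gating.
    gated_blocks: list of (block_power_w, fraction_of_time_gated)."""
    saved_w = sum(power * gated for power, gated in gated_blocks)
    return battery_wh / (base_power_w - saved_w)

# 50 Wh battery, 10 W average platform draw; gating a 0.5 W block 80% of
# the time and a 0.3 W block 50% of the time saves only 0.55 W, yet it
# still buys roughly 17 extra minutes of battery life:
print(round(battery_life_hours(50.0, 10.0, []), 2))                        # 5.0
print(round(battery_life_hours(50.0, 10.0, [(0.5, 0.8), (0.3, 0.5)]), 2))  # 5.29
```

The smaller the platform's base power, the larger the relative payoff of each gated block, which is why this approach matters most for notebooks.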
Finally, some of the improvements can be attributed to the advanced GPU technology AMD embeds in its APU offerings. Although the company has yet to reveal the technical specs of these GPUs, it has disclosed that they will be DirectX 11-compliant. These will be the first APU-based systems that can support DirectX 11's enhanced visual experience without a discrete GPU, and thus will represent a cost-effective solution for systems developers.
CHAPTER 8
CONCLUSION
Since the days of the earliest personal computers, each major advance in system
capability has enabled innovative software developers to create new products that opened
new markets. The Apple II gave us VisiCalc, the first spreadsheet. The original IBM PC
led to Lotus 1-2-3, the first spreadsheet with graphics. The Macintosh ushered in an era of desktop publishing that has forever changed the way the world creates and distributes
information.
The dramatic increase in performance enabled by AMD Fusion technology can
create new opportunities for entrepreneurial developers to innovate and make the world a better and richer place. Along the way, they may enrich themselves as well. That's the
way the system is supposed to work.
More importantly, compared to today's mainstream offerings, APU-based platforms will possess prodigious amounts of computational horsepower. This processing power will allow developers to tackle problems that lie beyond the capabilities of today's mainstream systems, and will enable innovative developers to step up and update existing applications or invent new ones that take advantage of GPU acceleration. These features will be a standard part of every APU. Over time, even the most affordable PCs can be expected to have the computational performance of yesterday's million-dollar mainframes, with all-day battery life.
Of course, few users will want to run the same applications on tomorrow's notebooks that they ran on yesterday's mainframes and supercomputers. They will likely want to run applications that help them in their everyday lives, doing tasks they cannot accomplish on the systems they own today. They may want to use facial recognition software to sort their photos and videos, or even to help them identify people they meet on the street or actors they see in movies. They may want the on-screen appearance of the videos they stream to approach that of the HD content on their TVs, even when bandwidth constrains that content to a low-resolution format.
For the hardware developer, ODM or PC manufacturer, it's time to start thinking about how to incorporate these new APUs into product lines in order to enhance the consumer experience. Software developers should look to this new power to help their software run even better. All developers are encouraged to upgrade their skills and learn about OpenCL and DirectCompute, and to examine current software projects to see how they can be improved in a world where systems have dramatically more power. Because pretty soon, they will.
REFERENCES
Nathan Brookwood, "The Industry-Changing Impact of Accelerated Computing", fusion.amd.com
http://www.anandtech.com/show/2229
http://sites.amd.com/us/fusion/APU/Pages/fusion.aspx
http://www.dailytech.com/article.aspx?newsid=4696
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter34.html