
by Justin Rattner

Applications may be on the edge of science fiction, but they are starting to happen, although the challenges are formidable.

The digital revolution, far from abating, continues with even greater intensity in new applications in health, media, social networking, and many other areas of our lives. These applications will require revolutionary improvements in speed and capacity in future microprocessors so that they can process terabytes of information with teraflops of terascale computing power. Tera is not an exaggeration: trillions of hertz and trillions of bytes will be needed (Figure 1). In a terascale world, there will be new processing capabilities for mining and interpreting the world's growing mountain of data, and for doing so with even greater efficiency.

Examples of applications are artificial intelligence in smart cars and appliances and virtual reality for modeling, visualization, physics simulation, and medical training. Many other applications are still on the edge of science fiction [1]. In these applications, massive amounts of data must be processed. Three-dimensional (3-D) images in connected visual computing applications like virtual worlds can include hundreds of hours of video, thousands of documents, and tens of thousands of digital photos that require indexing and searching. Terascale computing refers to this massive processing capability with the right mix of memory and input/output (I/O) capabilities for use in everyday devices, from servers to desktops to laptops.

With a terascale architecture computer, the user will participate in a rich, immersive, real-time collaboration in a virtual environment, with studio-quality, photo-realistic 3-D graphics. Or the user will manage personal media by automatically analyzing, tagging, and sorting snapshots and home videos. Advanced algorithms will improve the quality of movies captured on older, low-resolution video cameras. An advanced digital health application might assess a patient's health by interpreting huge volumes of data in a scan and aid doctors in making real-time decisions.

The requirements of complex and compelling applications will pose many challenges and will offer many opportunities in the future development of microprocessors.

Challenges in Circuits and Processes

The technology scaling described by Moore's law provides three benefits:

■ It doubles transistor integration capacity every generation, thereby reducing the cost per transistor by half.
■ It improves the performance of the circuit.
■ It reduces power consumption.

If the combination of all these benefits sounds too good to be true, it is nonetheless real. We have enjoyed these benefits for many decades now.

As the transistor scales down in size, it has to scale in all dimensions. One major factor in the vertical dimension is the scaling of gate insulator thickness. Innovation in scaling is now a combination of materials and dimension. Although the transistor has been seamlessly scaled over the last few generations, the gate insulation (SiO2) has become so thin that it has started to leak as a result of tunneling currents. Does this mean the end of scaling? Skeptics will say so, but the innovators have invented the high-k gate insulator with a new metal gate structure, shown in Figure 2. Also shown is a seven-layer copper and low-k dielectric interconnection system. More research (Figure 3) is under way to see if compound semiconductor transistors could provide benefits in transistor scaling and performance. The high mobility of compound semiconductors offers the potential to make transistors even faster. Figure 3 shows additional examples of research, including III-V materials, nanowires, carbon nanotubes, optical interconnects, and 3-D/trigate devices.

Figure 1: Terascale applications require increased performance and data set sizes. (The original chart plots dataset size, from kilobytes to terabytes, against performance, from KIPS to TIPS, with application classes progressing from single-core text processing, through multicore indexing, multitasking, and multimedia, to model-based terascale applications in entertainment, travel and learning, personal media, and medicine.)

Figure 2: The new era of scaling: material innovation + dimensional scaling. (a) Copper + low-k, (b) strained silicon, and (c) high-k + metal gate.

Power supply voltage scaling, however, has slowed down because of a lack of threshold voltage scaling. As supply voltage scales downward, the threshold voltage of the transistor has to scale down as well. However, the subthreshold leakage of a transistor increases exponentially with reduction of the threshold voltage, and leakage has already reached its allowable limits. This slowing of supply voltage scaling will result in excessive power consumption. In order to keep power consumption under control, circuit designers will be forced to scale supply voltage in spite of a loss in performance. Consequently, the frequency of operation of large designs will not continue to increase at the historic rate, and the bulk of the performance gain will have to come from other means. An example is the use of parallelism to exploit the abundance of available transistors.
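To make the exponential sensitivity concrete, here is a minimal Python sketch of the textbook subthreshold model, I ≈ I0 · exp(−Vth / (n · kT/q)). The constants I0 and n are illustrative placeholders, not data for any real process.

```python
import math

def subthreshold_leakage(vth_volts, i0=1e-6, n=1.5, thermal_voltage=0.026):
    """Textbook model: leakage grows exponentially as Vth drops.

    i0 (amperes) and n (subthreshold slope factor) are assumed,
    illustrative values, not measurements of a real process.
    """
    return i0 * math.exp(-vth_volts / (n * thermal_voltage))

for vth in (0.5, 0.4, 0.3, 0.2):
    print(f"Vth = {vth:.1f} V -> leakage ~ {subthreshold_leakage(vth):.2e} A")

# Each 100 mV of threshold reduction multiplies leakage by roughly 13x here,
# which is why threshold (and hence supply) voltage scaling has stalled.
```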

In the past, the metal interconnect system was deemed to be the performance limiter due to higher resistance-capacitance (RC) delay. However, the lack of supply voltage scaling is a greater limiter. In fact, since clock frequencies will not increase much, logic domains will shrink in size, and interconnect-related issues will be less challenging than previously expected.

Variability among transistors has also become worse because of random dopant fluctuations and line edge roughness. The latter is mostly due to lithography challenges [2]. The variability manifests itself as variation in the threshold voltage, resulting in instability of static random-access memory (SRAM) cells and variations in circuit performance. These effects are well known, and work to deal with them through variation-tolerant circuit design is ongoing [3].
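As a rough illustration of why variation-tolerant design matters, the following Monte Carlo sketch samples per-transistor threshold voltages from a Gaussian. The nominal value, sigma, and stability window are assumed, illustrative numbers, not process data.

```python
import random

# Sample per-transistor threshold voltages and count how many fall outside
# an assumed window in which an SRAM cell still reads and writes reliably.
random.seed(0)
nominal_vth, sigma = 0.30, 0.03           # volts; assumed values
lo, hi = 0.22, 0.38                        # assumed stability window
samples = [random.gauss(nominal_vth, sigma) for _ in range(1_000_000)]
failures = sum(1 for v in samples if not (lo <= v <= hi))
print(f"cell-transistor failure rate ~ {failures / len(samples):.2e}")
# A small per-device tail becomes significant across millions of SRAM cells.
```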

As Figure 4 shows, the 193-nm light source will be the workhorse for lithography until extreme ultraviolet (EUV) sources become practicable. The gap between the light source wavelength and the fine geometries being patterned requires a continued tightening of design rules. Layout design rules have already become more and more restrictive, and these restrictions are expected to get even worse. In the presence of severely restrictive design rules, custom layouts will likely result in a poorer layout than an automated layout, a paradigm shift. Design will then become fully automated, from high-level specification to layout. This shift will require tools to optimize the entire platform itself. Interestingly, such tools will also make it easier to "port" process technology from one generation to the next.

Challenges in Design

In the last 30 years, Intel has delivered dramatic performance gains by increasing the frequency of its processors, from 5 MHz to more than 3 GHz, while at the same time improving instructions per cycle.

Recently, power and thermal issues, such as dissipating heat from increasingly densely packed transistors, have begun to limit the rate at which processor frequency can be increased. The smaller form factors of mobile clients and the energy costs of servers can be expected to increasingly limit platform power and energy budgets. Although frequency increases have been a design staple for the last 20 years, the next 20 years will require new approaches. This is just one of the challenges in core (CPU) and "uncore" (interfaces, memory, and other components outside the CPU) design for terascale architecture.

Continuing to increase the performance of microprocessor hardware within a fixed power budget will require more than just adding cores. The energy efficiency of the cores and their execution units must increase commensurately. Approaches such as the continued reduction of supply voltage require both circuit and core microarchitecture research to provide the necessary resiliency [4].

Figure 3: A snapshot of semiconductor process and device research. (The original figure shows computational lithography, metal gates, Ge, Cu barriers, floating body cells (FBC), FinFETs, carbon nanotube FETs, optical interconnects, III-V devices, high-k materials, and 3-D bonded-wafer structures with through-silicon vias (TSVs).)



Core Design

Since Moore's law predicts a continued increase in the number of transistors per chip for every new process generation, microprocessor development must include approaches that exploit the use of additional silicon real estate in power-efficient ways.

To increase instructions per cycle, architects have taken advantage of the increased transistor budget Moore's law provides, adding microarchitectural support for features that exploit instruction-level parallelism, such as out-of-order execution and speculation.

Unfortunately, experience bears out the diminishing returns expressed by Pollack's rule: performance increase is roughly proportional to the square root of the increase in complexity. As Figure 5 shows, performance increases with only the square root of the area (that is, complexity), while power consumption increases linearly. Hence, alternative approaches to increasing performance become even more important.
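A quick worked example of Pollack's rule, under the idealized assumptions that power tracks area and that parallel software can use every core:

```python
import math

def pollack_single_thread_perf(relative_area):
    """Pollack's rule: single-thread performance ~ sqrt(area/complexity)."""
    return math.sqrt(relative_area)

# Spend a 4x transistor budget one of two ways (illustrative only):
one_big_core = pollack_single_thread_perf(4.0)      # ~2x performance at ~4x power
four_small_cores = 4 * pollack_single_thread_perf(1.0)  # ~4x aggregate throughput,
                                                        # if the workload parallelizes
print(one_big_core, four_small_cores)
```

This is the arithmetic behind the shift to many smaller cores: the big core doubles single-thread performance for quadruple the power, while the same budget in small cores can quadruple throughput on parallel workloads.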

One of the differentiating aspects of terascale architecture is large-scale core-level parallelism combined with a core microarchitecture that is optimized for highly parallel multithreaded workloads. Using simultaneous multithreading (multiple independent threads of execution to better utilize the resources provided by the processor) to deal with memory latencies, and reducing or eliminating out-of-order complexity and speculation, are possibilities for achieving a more power-efficient, cost-effective terascale architecture. Additional specialized functions such as vector instructions will be useful in addressing highly computation-intensive aspects of emerging workloads.

For terascale processing, modular tile-based design methodologies will be required. Monolithic designs, with their associated complexity, speed paths, clock-distribution challenges, and so forth, will be intractable.

Uncore Design

Bringing the terascale vision to fruition requires the integration of not only a large number of general-purpose computing cores but also the uncore: special-purpose computing engines (texture units, shader units, and fixed-function units, for example) and platform elements, such as memory and I/O controllers, into a single die. This integration requires an on-die fabric that has scalable, high-bandwidth, low-latency, and power-efficient interconnections to link the computing and platform elements together. The fabric must allow them to exchange information with each other, access memory, and communicate with the rest of the system. Given the uncore's central nature, certain additional requirements, such as partitionability, fault tolerance, validation and testing, regularity and flexibility, and design friendliness, need to be addressed [5].

The addition of specialized functions requires research on the best way to incorporate the functionality so that it can be used (and reused) most effectively by software in new and unanticipated ways. Large-grain fixed-function acceleration requires heavy utilization to justify its inclusion. Finer-grained enhancements to the instruction set and reconfigurable accelerator functionality offer the programmer more options.

Figure 4: The gap between subwavelength lithography and technology scaling. (The original plot tracks feature size from 1980 to 2020, shrinking through the 65-, 45-, 32-, and 22-nm nodes, while the lithography wavelength holds at 248 nm and then 193 nm until EUV at 13 nm, opening a widening gap.)

Figure 5: Pollack's rule: integer performance for new microarchitectures increases with area. The graph plots integer performance increase against area and power increase from the previous-generation microarchitecture, in the same process technology. (Performance ~ sqrt(area); the log-log slope is ~0.5.)

The diversity of workloads and the concentration of computation resources in terascale architecture put tremendous demands on the cache hierarchy and coherency protocol. These demands require a flexible, distributed (tiled) cache organization that can adapt to workload demands. Such an organization must place minimal restrictions on the software to fully realize the performance potential. The associated coherency protocol needs to be efficient and scalable. It should also be flexible in terms of the requirements it imposes on the building blocks of terascale architecture.
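The sketch below shows one simple way such a distributed organization can work: hashing each cache line to a home tile so that aggregate last-level-cache bandwidth scales with the tile count. The tile count, line size, and modulo hash are illustrative choices, not a description of any Intel protocol.

```python
# Toy address-to-tile mapping for a distributed (tiled) last-level cache.
NUM_TILES = 16
LINE_BYTES = 64

def home_tile(addr: int) -> int:
    """Map a physical address to the tile that 'homes' its cache line."""
    line = addr // LINE_BYTES
    return line % NUM_TILES          # simple modulo hash; real designs vary

caches = [dict() for _ in range(NUM_TILES)]   # tile-local L2 slices

def load(addr: int) -> int:
    tile = home_tile(addr)
    # A real coherency protocol would track sharers and owners here;
    # this sketch only shows where a line lives.
    return caches[tile].setdefault(addr, 0)

print(home_tile(0x12345), load(0x12345))
```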

Memory Design

With substantial increases in the computation power on a single die, one faces the challenge of feeding the device with enough data bandwidth. For a small class of applications, where the memory footprint is small, memory accesses mostly exercise the on-die caches. For the majority of applications, a major increase in off-chip memory bandwidth is required.

This increase manifests itself in two ways:

■ providing power-efficient, high-speed off-die I/O
■ providing power-efficient, high-bandwidth access to dynamic random-access memory (DRAM).

The former has seen steady progress in the past decade but not at the required pace. The latter may require a new look at system-level memory repartition and optimization along with I/O design. Potential solutions to address memory bandwidth challenges include higher-density memory for embedded applications, such as the scaled floating body cell memory [6], and the integration of a large DRAM cache inside the processor package. Such an approach must provide a more efficient I/O channel that allows a higher bandwidth than the path provided by dual in-line memory modules (DIMMs), which crosses the package to the motherboard connector. Examples include IBM's work on 3-D integrated circuits [7].
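A back-of-envelope calculation shows why off-chip bandwidth dominates this discussion. The bytes-per-flop ratio and the per-channel figure below are assumed, round numbers for illustration, not Intel data.

```python
# Back-of-envelope memory bandwidth demand for a terascale chip.
flops = 1e12                  # 1 teraflop/s of sustained compute
bytes_per_flop = 0.5          # assumed operand traffic that escapes the caches
required_gb_s = flops * bytes_per_flop / 1e9

dimm_channel_gb_s = 10        # assumed order of magnitude for a DIMM channel
print(f"required ~ {required_gb_s:.0f} GB/s "
      f"~= {required_gb_s / dimm_channel_gb_s:.0f} conventional channels")
# ~500 GB/s, dozens of conventional channels, which is why in-package DRAM
# caches and more efficient I/O channels are attractive.
```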

I/O Design

With increasing levels of integration [8], the terascale microprocessor truly becomes a full system on a chip. This implies a need to provide system I/O, such as Ethernet and directly attached storage, as well as interprocessor interconnects such as QuickPath Interconnect (QPI) [9].

Issues such as resilient distribution, the mapping of data flows to cores, and the balance between the number of interfaces and the bandwidth of interfaces must all be resolved.

Example of a Terascale Architecture

The "Larrabee" architecture [10], shown in Figure 6, is Intel's first terascale architecture microprocessor targeted at visual computing workloads. The cores communicate on a wide ring bus, resulting in fast access to memory and fixed-function blocks and ensuring cache coherency. The last-level cache (L2) is partitioned among the cores to provide high aggregate bandwidth and allow for data replication and sharing. The Larrabee core has separate scalar and vector units with separate registers. The cores are in-order x86 scalar cores with short execution pipelines. Each core has fast access to its L1 cache and a direct connection to its own subset of the L2 cache. The prefetch instruction can load both the L1 and L2 caches. The vector processing unit (VPU) comes with a complete vector instruction set, including scatter/gather for vector load/store, and mask registers that select which lanes to write, providing data-parallel flow control and enabling the mapping of a separate execution kernel to each VPU lane.
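The following NumPy fragment emulates the effect of these features. It illustrates write-masking and scatter/gather in general, not Larrabee's actual instruction encoding.

```python
import numpy as np

# Masked vector operation: only lanes whose mask bit is set get written;
# the other lanes keep their old values.
lanes = np.array([10.0, 20.0, 30.0, 40.0])
mask = np.array([True, False, True, False])
update = lanes * 2.0
result = np.where(mask, update, lanes)    # write-masked merge
print(result)                             # [20. 20. 60. 40.]

# Gather: indexed vector load from memory; scatter: indexed vector store.
memory = np.arange(100.0)
indices = np.array([3, 17, 42, 99])
gathered = memory[indices]                # vector gather
memory[indices] = gathered + 1.0          # vector scatter
```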

Figure 6: Block diagram of the Larrabee graphics processing unit (GPU). (The original diagram shows multithreaded wide-SIMD cores, each with instruction and data caches, attached to a shared L2 cache, together with memory controllers, fixed-function texture logic, a display interface, and a system interface.)

Challenges in Software

Writing software to gain the full benefits of multicore processing and scale, with increasing parallelism, may be the greatest challenge terascale architecture poses. Parallel programming has been the province of only a few experts in the server and high-performance computing communities because it is difficult to develop and test. Programming even small-scale multiprocessing (two to four processors) is a formidable task. The required synchronization leads to problems with deadlock and race conditions that introduce very subtle, hard-to-reproduce errors. Tools to assist with multicore and existing programming are just beginning to make inroads into this problem.
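A minimal Python illustration of the kind of race the text describes, along with the lock-based fix that introduces the very synchronization that limits scaling:

```python
import threading

counter = 0
lock = threading.Lock()

def unsafe_increment(n):
    # Racy: the read-modify-write of the shared counter is not atomic,
    # so concurrent updates can be silently lost.
    global counter
    for _ in range(n):
        counter += 1

def safe_increment(n):
    # Serializing the update with a lock fixes the race, at the cost of
    # contention that grows with the number of cores.
    global counter
    for _ in range(n):
        with lock:
            counter += 1

threads = [threading.Thread(target=safe_increment, args=(100_000,))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)   # 400000 with the lock; often less with unsafe_increment
```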

Even with application software that is designed to exploit parallelism, the system software, from firmware through virtual machine managers and the operating system to the managed run-time environment, must be designed to avoid serialization in ways that prevent scaling. For example, existing firmware initialization sequences with extensive rendezvous of the processors are already very time consuming and completely impractical for dozens of cores. Operating system code currently optimized for small symmetric multiprocessing (SMP, where two or more identical processors connect to a single shared main memory) must be carefully recrafted to scale with many more cores and exploit the benefits of the high-speed on-die interconnects among cores.

Legacy software and inherently single-threaded algorithms present another challenge to terascale architecture. We must explore ways to incorporate heterogeneous general-purpose cores, both single-thread and multithread optimized types, into the architecture [11]. We must also research the use of ensembles of simple cores to accelerate single threads through thread-level speculation and other techniques [12].

The programming environment we take for granted (languages, debugging, and compilers, for example) must be extended to address both task parallelism (i.e., multithreading) and data parallelism. Beyond the more common shared-memory model of SMP programming, we need to provide the tools to support programming models that have been successfully employed in existing parallel programming environments. Examples are stream-based computing, map-reduce, and new language extensions that abstract away concurrency, such as C for Throughput (Ct), a flexible, data-parallel programming language [13].
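As a small illustration of the data-parallel style that such models (and Ct-like languages) abstract, here is a plain Python map over independent elements; brighten and the pixel data are made-up stand-ins, not part of any of the cited systems.

```python
from multiprocessing import Pool

def brighten(pixel):
    # Data parallelism: the same pure operation applied independently to
    # every element, so the runtime is free to split the work across cores.
    return min(pixel + 32, 255)

if __name__ == "__main__":
    pixels = list(range(256)) * 1000          # stand-in for image data
    with Pool() as pool:                      # one worker per core by default
        result = pool.map(brighten, pixels)   # parallel map over the data
    print(result[:8])
```

Because each element is processed independently, no locks are needed, which is exactly what makes the data-parallel style easier to scale than hand-threaded code.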

The software challenge may also be addressed by adding features to the processor. Acceleration of software transactional memory [14] and support for debugging and correctness [15] are just two examples of such work. Moving forward, a key aspect of microprocessor research is to find other hardware features that simplify parallel programming.

New Opportunities

While terascale architecture presents many challenges, it also offers many unique opportunities as a consequence of its highly integrated multicore design. One example is an increased ability to deal with the reliability challenges of future process technology. With many cores, the potential to maintain spares will give new options for dealing with hard errors. Failed or deteriorating cores can be replaced by spares during testing or during the lifetime of the processor. Multiple cores or hardware threads could be used for redundant computation for error detection. Small cores will have a smaller area of confinement for defects, thereby reducing their impact. When combined with spare cores, this will dramatically improve yields over designs with fewer, larger cores.
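A toy sketch of redundant computation for error detection, run here on threads standing in for cores; redundant_compute is a hypothetical helper for illustration, not a proposed API.

```python
from concurrent.futures import ThreadPoolExecutor

def redundant_compute(fn, *args, copies=2):
    """Run fn several times (here on threads standing in for cores) and
    compare the results; a mismatch signals a possible core fault."""
    with ThreadPoolExecutor(max_workers=copies) as pool:
        futures = [pool.submit(fn, *args) for _ in range(copies)]
        results = [f.result() for f in futures]
    if len(set(results)) != 1:
        raise RuntimeError("redundant computation mismatch: possible core fault")
    return results[0]

print(redundant_compute(sum, range(1_000_000)))
```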

Terascale architecture's very high computation density, combined with its fine granularity (each small core will have significant processing power), will allow the flexibility to dedicate resources to certain functions. Examples are system management and single-use devices, such as appliances using Voice over Internet Protocol (VoIP). The dedicated resources allow for an independent software environment. Together with an on-die fabric having hardware performance isolation, these resources provide the quality of service needed for flawless real-time media.

The modularity of a scalable fabric designed to support various numbers and types of compute elements will allow reduced time to market. It makes possible the shipment of highly segmented products, with intermediate releases that can be differentiated via a new computation element or core. The development of smaller building blocks that function within a regular fabric will also allow for the factoring of validation.

Of course, the greatest opportunities for terascale architecture come from the continued improvement of platform performance that has been an abundant source of innovation for the PC platform. New applications with new capabilities and dramatically better human interfaces will be enabled by terascale architecture.

Summary

We foresee a continuing digital revolution in health, media, social networking, and many other areas of our lives based on new and emerging applications. These applications will require revolutionary improvements in performance and capabilities in future microprocessors so that they can process terabytes of information with teraflops of terascale computing power.

In a terascale world, there will be new processing capabilities for mining and interpreting the world's growing mountain of data, and for doing so with even greater efficiency. Intelligent agents could advise users in real time on stock trades and other financial decisions. Such agents could search massive collections of digital videos to find specific people and events, and even edit a new video based on what the user wants to see. For gamers, there is the obvious benefit of photo-realistic, real-time graphics. The benefits go far beyond gaming: interactive virtual environments are now being developed for both collaboration and education, such as learning a language by interacting with virtual native speakers, or learning to deal with medical emergencies on a simulated human body.

Microprocessor development must meet many technical challenges to realize the opportunity that these emerging applications present. Device technology scaling continues to follow Moore's law, with innovations such as high-k gate insulation ensuring its continuation. However, the slowing of threshold voltage scaling and power constraints will limit frequency as the basis for increased performance. We must exploit other means to achieve radical increases in performance, such as chip-level multiprocessing. An on-die interconnection fabric that provides the requisite high bandwidth and low latency must be developed to interconnect the cores. Feeding this level of computation power will require corresponding increases in power-efficient system memory access and I/O, requiring innovations in caches, memory devices, and their interconnects.

Multicore parallelism is visible to software and requires multithreaded concurrent programming, a new challenge to today's mainstream programmers. Acceleration of single-threaded code and legacy-threaded binaries will require new technology. New languages, programming models, and methods will be developed to better enable parallel programming, and these will themselves inspire new extensions to microprocessor architecture.

The resulting discoveries and successes will not only shape the future of microarchitecture but will guide the capabilities of the underlying platforms and allow the possibilities of the future to become a reality through revolutionary applications.

References

[1] P. Dubey, "A platform 2015 model: Recognition, mining and synthesis moves computers to the era of tera," Intel Technol. J., vol. 9, no. 2, Feb. 2005.
[2] S. Borkar et al., "Parameter variations and impact on circuits and microarchitecture," in Proc. ACM/IEEE Design Automation Conf., Jun. 2003, pp. 338–342.
[3] S. Borkar, "Tackling variability and reliability challenges," IEEE Design Test Comput., vol. 23, no. 6, p. 520, Nov. 2006.
[4] H. Kaul, M. Anders, S. Mathew, S. Hsu, A. Agarwal, R. Krishnamurthy, and S. Borkar, "A 320 mV 56 μW 411 GOPS/Watt ultra-low-voltage motion-estimation accelerator in 65 nm CMOS," in Int. Solid-State Circuits Conf. Tech. Dig., 2008, pp. 316–616.
[5] D. N. Jayasimha, B. Zafar, and Y. Hoskote, "On-die interconnection networks: Why they are different and how to compare them," Microprocessor Technology Lab, Corporate Technology Group, Intel Corp. [Online]. Available: http://blogs.intel.com/research/terascale/ODI_why-different.pdf
[6] I. Ban et al., "A scaled floating body cell (FBC) memory with high-k+metal gate on thin-silicon and thin-BOX for 16-nm technology node and beyond," in Proc. Symp. VLSI Technology, 2008.
[7] A. W. Topol et al., "Three-dimensional integrated circuits," IBM J. Res. Devel., vol. 50, no. 4/5, pp. 491–506, 2006.
[8] M. Azimi, N. Cherukuri, D. N. Jayasimha, A. Kumar, P. Kundu, S. Park, I. Schoinas, and A. Vaidya, "Integration challenges and tradeoffs for tera-scale architectures," Intel Technol. J., vol. 11, no. 3, pp. 173–181, Aug. 2007. [Online]. Available: http://www.intel.com/technology/itj/2007/v11i3/1-integration/1-abstract.htm
[9] "Intel QuickPath Architecture." [Online]. Available: www.intel.com/technology/quickpath/whitepaper.pdf
[10] L. Seiler, D. Carmean, E. Sprangle, T. Forsyth, M. Abrash, P. Dubey, S. Junkins, A. Lake, J. Sugerman, R. Cavin, R. Espasa, E. Grochowski, T. Juan, and P. Hanrahan, "Larrabee: A many-core x86 architecture for visual computing," in Proc. Int. Conf. Computer Graphics Interactive Techniques (SIGGRAPH), Los Angeles, 2008.
[11] T. Li, D. Baumberger, D. A. Koufaty, and S. Hahn, "Efficient operating system scheduling for performance-asymmetric multi-core architectures," in Proc. 2007 ACM/IEEE Conf. Supercomputing, 2007, article 53.
[12] P. Marcuello and A. González, "Thread-spawning schemes for speculative multithreaded architectures," in Proc. 8th Int. Symp. High-Performance Computer Architecture, 2002, p. 55.
[13] "Ct: A flexible parallel programming model for tera-scale architectures." [Online]. Available: http://techresearch.intel.com/UserFiles/en-us/File/terascale/Whitepaper-Ct.pdf
[14] B. Saha, A.-R. Adl-Tabatabai, and Q. Jacobson, "Architectural support for software transactional memory," in Proc. 39th Annu. IEEE/ACM Int. Symp. Microarchitecture, 2006, pp. 185–196.
[15] S. Chen, B. Falsafi, P. B. Gibbons, M. Kozuch, T. C. Mowry, R. Teodorescu, A. Ailamaki, L. Fix, G. R. Ganger, B. Lin, and S. W. Schlosser, "Log-based architectures for general-purpose monitoring of deployed code," in Proc. 1st Workshop Architectural and System Support for Improving Software Dependability, San Jose, CA, 2006, pp. 63–65.

About the Author

Justin Rattner ([email protected]) is vice president and chief technology officer of Intel Corporation. He is also an Intel Senior Fellow and head of Intel's Corporate Technology Group. He directs Intel's global research efforts in microprocessors, systems, and communications, including the company's "disruptive" research activity aimed at replacing existing dominant technologies. In 1989, Rattner was named scientist of the year by R&D Magazine for his leadership in parallel and distributed computer architecture. In December 1996, he was featured as person of the week by ABC World News for his visionary work on the Department of Energy's ASCI Red system. In 1997, he was honored as one of the Computing 200 and profiled in the book Wizards and Their Wonders (ACM Press). He has received two Intel Achievement Awards for his work in high-performance computing and advanced cluster communication architecture. He is a member of the executive committee of Intel's Research Council and serves as the Intel executive sponsor for Cornell University, where he is a member of the External Advisory Board for the School of Engineering. He is also a trustee of the Anita Borg Institute for Women and Technology. Before joining Intel in 1973, he worked at Hewlett-Packard and Xerox. Intel named him its first principal engineer in 1979 and its fourth Intel Fellow in 1988. He holds B.S. and M.S. degrees from Cornell University in electrical engineering and computer science.
