EcoCloud: Enabling Eco-Friendly Smart Data

Babak Falsafi, Director, EcoCloud
ecocloud.ch

Transcript

A Brief History of IT

§ From computing-centric to data-centric
§ Consumer Era: interfacing, connectivity and access

Timeline: Mainframes (1970s), PC Era (1980s), Communication Era (1990s), Consumer Era (today)

Data Shaping the Future of IT


IoT: An Increasing Source of Data

§ From industrial use to transportation, the grid, buildings, self-driving cars, …

[Source: IDC]

28 billion connected devices; 4 zettabytes of data, 10% of the digital universe

Data Shaping All Science & Technology

Science entering the 4th paradigm
§ Analytics using IT on:
§  Instrument data
§  Simulation data
§  Sensor data
§  Human data
§  …

Complements theory, empirical science & simulation

Shifting funding in sciences towards Big Data analytics!

Big Data Analytics in the Human Brain
[Picture from the Blue Brain Project, EPFL]

1 billion euros to model the brain (a consortium of 150 scientists from around the world)

Big Data for Digital Humanities: an online view of millennia of a city's history

Venice Time Machine

Source: James Hamilton, 2014 http://mvdirona.com/jrh/TalksAndPapers/JamesHamilton_Reinvent20131115.pdf

Data Growing Faster than Technology

[Chart: petabytes of data, 2004-2012; technology improvement (Moore's Law) vs. the largest publicly-reported data warehouse size]

Silicon is running out of steam!

Voltage scaling is nearly dead; density scaling is getting there

[Chart: $/transistor across process nodes, 130 nm down to 10 nm]
[Chart: power supply Vdd, 2001-2026; slope = 0.053 through 2001, 0.014 from 2013]

Increasing Strain on Data Platforms

§ Modern datacenters → 20 MW!
§ In the modern world, 6% of all electricity, growing at >20%!

[Chart: US datacenter electricity demand in billion kilowatt-hours/year, 2001-2017 (source: Energy Star); equivalent to 50 million Swiss homes]

A modern datacenter: 17x a football stadium, $3 billion

How do we build the Big Data platforms of tomorrow?


The first & only academic research center of its kind

A multi-disciplinary research center pioneering efficient and sustainable data-centric information technologies (founded in 2011)

A center to bring efficiency to data
§ 18 faculty, 50 researchers
§ Around $8M/year in research

Mission:
§ Energy-efficient data-centric IT
§ From algorithms to machine infrastructure
§  E.g., Big Data analytics, integrated computing/cooling, …
§ Maximizing performance/TCO for Big Data

Faculty: Aberer, Ailamaki, Argyraki, Atienza, Bugnion, Candea, Cevher, Falsafi, Ford, Guerraoui, Jaggi, Jones, Koch, Larus, Lenstra, Odersky, Thome, Zwaenepoel

Who We Are

Strong team
§ Several ACM/IEEE Fellows + members of European academies
§ Award-winning: e.g., 10 ERCs, 1 EURYI, 2 Sloans
§ High citation index + top academic alumni placement
§ High impact on industry:
§  Co-founder of VMware, a VP of Cisco + several startups
§  Father of the Scala language: in use by Amazon, LinkedIn, Twitter, UBS
§  Technology transfer: IBM BlueGene, AMD SimNow & HP

Strong track record of collaborating with industry

Today's Server Ecosystem

IT industry/market is horizontal:
§ Per-vendor layer
§ Well-defined interfaces
§ Near-neighbor optimization at best

Big vendors (e.g., Amazon, Google)
§ Can do cross-layer optimizations
§ But,
§  Only limited to services of interest
§  Are limited in extent (e.g., software)
§  Use proprietary technologies
§  Circumscribe data usage!

[Stack diagram: Application / Runtime System (scripting, DSLs) / Middleware (data, web services) / Operating System (resource management) / Server (processor, memory, storage, network) / Infrastructure (cooling, power)]

Our Vision: Holistic Optimization of Data Platforms

Holistic optimization
§ From algorithms to infrastructure
§ Cross-layer optimization
§ IT paradigms to manage energy
§ End-to-end robust and resilient data platforms

Open technologies!

Our Vision: The ISA Triangle of Efficiency
§ Approximation (tailor work for output quality)
§ Holistic optimization

Integrated Thermal & Load Balancing

Project PMSM
§  Synergistic IT load/thermal control
§  Real-time monitoring of 5K servers
§  Fine-grain power/thermal sensors

50% energy savings!

Integrated Cooling: CMOSAIC

A 3D server chip with two-phase liquid cooling
§ Enables higher thermals
§ Dramatically better heat removal

Prototyped by IBM

[Diagram: PCB, micro-heater, liquid micro-channels]

Integrated Power Subsystem: GreenDataNet

Towards energy-neutral datacenters
§ Power generation + storage + provisioning with services
§ Federated sites
§ Grid load management

Scale-Out Datacenters

Vast data sharded across servers

Memory-resident workloads
§ Necessary for performance
§ Major TCO burden

Processors access data in memory
§ Abundant request-level parallelism
§  Performance scales with core count

Design servers around memory!

How efficient are servers for in-memory apps? CloudSuite 3.0 (parsa.epfl.ch/cloudsuite)


In-Memory Analytics: recommendation system
Graph Analytics: GraphX
Data Analytics: machine learning
Web Search: Apache Solr & Nutch
Media Streaming: Nginx, HTTP server
Web Serving: Nginx, PHP server
Data Serving: Cassandra NoSQL
Data Caching: Memcached

In use by AMD, Broadcom, Cavium, Huawei, HP, Intel, Google, …

Big Data Workloads Stuck in Memory!

[Chart: application IPC (0-4) and the fraction of total execution cycles (0-100%) spent in memory; applications execute ~1 instruction per cycle]

CloudSuite on Modern Servers [ASPLOS 2012]


§ Too few cores!
§ Cores too fat!
§ 8 MB (60%) of the LLC is wasted space (no reuse)!
§ B/W unused!

Workload/Server Mismatch

What do Existing Processors Offer?

Intel Xeon (~100 W): ✘ few fat cores, ✘ large LLC
Calxeda (~5 W): ✘ few lean cores, ✓ compact LLC
Tilera (~30 W): ✓ many lean cores, ✘ large LLC, ✘ large distance

Mismatch with workload demands!

Specialized Processors for In-Memory Services: Scale-Out Processors [ISCA'12, IEEE Micro'12]

Pods: one or more stand-alone (physical) servers
§ Each runs a full software stack

No inter-pod connectivity or coherence
§  Scalability and optimality across generations

Pods can share chip I/O (e.g., memory, network, etc.)

Inherently software scalable! (40 nm, 20 nm, 10 nm)

[Diagram: pods of cores (C) and caches ($) replicated across 40 nm, 20 nm, and 10 nm chips]

NOC-Out [Micro'12]: Specialized Network-on-Chip for Pods

Exactly the opposite of current NoCs
§  Cache coherent
§  But designed for core-to-cache communication, not core-to-core!

LLC network:
§ Flattened Butterfly (FB) topology

Request & reply networks:
§ Tree topology
§ Limited connectivity for efficiency

FB's performance at 1/10th the cost

[Diagram: NOC-Out organization of cores (C) and LLC tiles ($)]

Footprint Cache [ISCA'13]: Effective Die-Stacked Caching for Pods

Die-stacked caching:
§  Rich connectivity → high on-chip BW
§  High capacity → low off-chip BW

Footprint Cache:
§  Allocate tags for pages
§  Predict & fetch a page's footprint

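The prediction step can be sketched in a few lines: remember which blocks of a page were actually touched during its last residency, and fetch only that footprint the next time the page is allocated. This is a toy software model under stated assumptions (the class and method names are illustrative; the actual design tracks footprints in hardware, e.g., per page and triggering instruction):

```python
class FootprintPredictor:
    """Toy footprint-cache model: learn per-page block footprints, then
    prefetch only those blocks instead of the whole page."""
    def __init__(self):
        self.history = {}   # page -> blocks touched during its last residency
        self.current = {}   # page -> blocks touched so far this residency

    def access(self, page, block):
        self.current.setdefault(page, set()).add(block)

    def evict(self, page):
        # Remember the footprint observed while the page was cached.
        self.history[page] = self.current.pop(page, set())

    def blocks_to_fetch(self, page, blocks_per_page=64):
        # Predicted footprint if known; fall back to the whole page otherwise.
        fp = self.history.get(page)
        return sorted(fp) if fp else list(range(blocks_per_page))

p = FootprintPredictor()
for b in (0, 3, 7):
    p.access(page=42, block=b)
p.evict(page=42)
print(p.blocks_to_fetch(page=42))  # [0, 3, 7]
```

Fetching only the learned footprint is what lets the die-stacked cache spend its capacity and bandwidth on blocks that will actually be reused.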

[Diagram: a scale-out processor with a die-stacked cache (logic layer) in front of off-chip memory]

EuroCloud Server (eurocloudserver.com): 3D Scale-Out Chip for In-Memory Computing

Mobile efficiency in servers
§ Swarms of ARM cores
§ 3D memory
§ 10x performance/TCO
§ Runs the Linux LAMP stack

Planned prototype:
§ ARM/ST/CEA + Chalmers/FORTH in the EuroServer FP7 project
§ Data Processing Unit by Huawei

Cavium’s ThunderX

48-core 64-bit ARM SoC based on "Clearing the Clouds" [ASPLOS'12]
§  Large instruction caches
§  Low single-thread ILP
§  2x area for cores vs. cache
§  Data-centric scale-out chip

ATraPos Data Islands [VLDB'12, ICDE'14]

§ NUMA is a bottleneck in OLTP
§ No single configuration is optimal for all workloads
§ Technologies for cache affinity in OLTP

[Diagram: a multi-socket NUMA system (cores with L1/L2, shared L3, memory controllers, inter-socket links) partitioned by ATraPos into islands/pods over a lightweight fabric to reach an optimal system configuration]

Flashback 2004: Shekhar Borkar's (Intel Fellow) Keynote @ MICRO

[Die photo: 2.23 mm x 3.54 mm, 260K transistors]

An idea too early for its time?

[Chart: MIPS vs. year, 1995-2015 (log scale); Pentium @ 75 W vs. XScale @ ~2 W]

Intel's TCP/IP Processor

Specialization: An idea whose time has come

Microsoft Unveils Catapult to Accelerate Bing! [EcoCloud Annual Event, June 5th, 2014]

§  One FPGA per blade
§  All FPGAs connected in a half rack
§  6x8 2-D torus topology
§  High-end Stratix V FPGAs
§  Running Bing kernels for feature extraction and machine learning

Future of Specialization
§ High-level functional programs
§ Optimize using domain knowledge
§ Strip abstraction overhead using staging
§ Heterogeneous + parallel targets

[Diagram (illustration: Markus Püschel): instead of one general-purpose compiler, programmers target heterogeneous hardware (MilkyWay-2, Xeon Phi, Nvidia Maxwell, Altera FPGA) through MPI/PGAS, Pthreads/OpenMP, CUDA/OpenCL, and Verilog/VHDL]

Key idea: make compilation part of the program
Result: a high-bandwidth, extensible compilation pipeline

Evaluation: k-means [Sujeeth et al., ICML'11; Brown et al., 2013 (submitted)]

[Charts: speedups of Spark, Delite CPU, and Delite GPU on a cluster, and of parallelized MATLAB, OptiML, and C++ on 1-8 CPUs]

Scala DSLs in Datacenters

§ Key challenge is programming accelerators
§  FPGA programming is hardware development
§  Systems change very frequently
§ Using DSLs to specialize data & system services

Specialized Database Stack: DBToaster (dbtoaster.org)

Compiling offline analytics into online/incremental engines; aggressive code specialization

Low-latency in-memory data stream processing: up to 6 orders of magnitude faster than commercial systems

[Chart: speedup (log scale, up to 10^7) on queries Q3, Q9, Q10, Q11, Q14, Q15, Q21]
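The core DBToaster idea is to compile a query into code that maintains its result incrementally as the stream changes, rather than re-running the query over all the data. A toy sketch of such incremental view maintenance (the revenue aggregate and field names are hypothetical illustrations, not DBToaster-generated code):

```python
class IncrementalRevenue:
    """Maintains SUM(price * qty) under stream inserts/deletes,
    DBToaster-style: each update is an O(1) delta, never a rescan."""
    def __init__(self):
        self.total = 0.0

    def on_insert(self, price, qty):
        self.total += price * qty   # apply the delta of the new tuple

    def on_delete(self, price, qty):
        self.total -= price * qty   # retract the tuple's contribution

view = IncrementalRevenue()
for price, qty in [(10.0, 3), (2.5, 4)]:
    view.on_insert(price, qty)
view.on_delete(2.5, 4)
print(view.total)  # 30.0
```

Real DBToaster derives such delta programs automatically, including for nested and join queries, which is where the orders-of-magnitude gains over recomputation come from.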

Graph Analytics with X-Stream [SOSP'13]

Weakly connected components on a 1U server:
§ 512 million edges in 28 sec
§ 4 billion edges in 26 mins
§ 64 billion edges in 26 hours
(graphs held in RAM, SSD, and HDD; datasets include the Swiss road network and Facebook)
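X-Stream processes graphs edge-centrically: it streams the edge list sequentially (which is friendly to SSD and HDD bandwidth) and updates vertex state, instead of chasing pointers at random. A minimal in-memory sketch of weakly connected components in that style (toy code, not X-Stream's actual scatter-gather engine):

```python
def wcc(num_vertices, edges):
    """Label-propagation WCC: repeatedly stream ALL edges in sequence
    (edge-centric, as in X-Stream) until no label changes."""
    label = list(range(num_vertices))
    changed = True
    while changed:
        changed = False
        for u, v in edges:              # one sequential pass over the edge stream
            lo = min(label[u], label[v])
            if label[u] != lo or label[v] != lo:
                label[u] = label[v] = lo
                changed = True
    return label

# Two components: {0, 1, 2} and {3, 4}.
print(wcc(5, [(0, 1), (1, 2), (3, 4)]))  # [0, 0, 0, 3, 3]
```

The real system partitions vertex state to fit fast memory and streams edges from whichever medium holds them, which is why the same algorithm scales from RAM to SSD to HDD.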

OpenIoT: Open-Source Cloud Solution for the Internet of Things (github.com/OpenIotOrg/openiot)

Established open-source platform for IoT
•  Configure, deploy and use IoT services
•  Auditing privacy of IoT apps in the cloud
•  Energy-efficient data harvesting
•  Publish/subscribe for continuous processing and sensor data filtering

Use cases: smart manufacturing, campus guide, air monitoring, agriculture sensing

CloudSpaces: Open Service Platform for the Next Generation of Personal Clouds

•  Privacy-enhancing tools giving end-users protection of privacy
•  Transparent access to data stored at multiple cloud services
•  Federated datacenters to share resources and meet user demands
•  Framework based on context and semantics of data sharing
•  Crowdsourcing to determine sharing sensitivity
•  Software to provide policy recommendations for safe sharing of files

Invisible DBMS

NoDB: in-situ queries in the cloud (dias.epfl.ch/NoDB)

Zero data-to-query time!
§ Give me your data as is (supports SQL + your tools)
§ Give me your queries
§ Get your results!

Adaptive kernel, adaptive load, adaptive data store
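The NoDB idea, querying raw files in place with no load step, can be illustrated with a toy in-situ scan (plain Python over CSV-shaped data; the file contents and predicate are hypothetical, and NoDB additionally builds positional indexes and caches adaptively as queries run):

```python
import csv
import io

# Raw data "as is": never loaded into a database first.
RAW = "name,city,sales\nalice,geneva,120\nbob,lausanne,80\ncarol,geneva,45\n"

def in_situ_query(raw, predicate):
    """Scan the raw file on demand, at query time."""
    rows = csv.DictReader(io.StringIO(raw))
    return [r for r in rows if predicate(r)]

hits = in_situ_query(RAW, lambda r: r["city"] == "geneva")
print([r["name"] for r in hits])  # ['alice', 'carol']
```

The point of the adaptive kernel is that repeated queries get faster: each scan leaves behind parsing positions and cached values, shrinking the gap to a fully loaded DBMS without ever paying the upfront load cost.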

ViDa: Building databases through queries (dias.epfl.ch/ViDa) [VLDB'14, CIDR'15]

[Diagram: JIT data virtualization over a MapReduce engine, relational DBMS, reporting server, enterprise spreadsheet, search engine, external data sources, and operational databases]

JIT data virtualization:
✓ Query heterogeneous data (CSV, JSON, arrays, DBMS, …)
✓ Query engine built at runtime
✓ No static decisions: adapt to queries and data

Specialized Network Stack: The IX Kernel [Belay'14, OSDI best paper]

§ Data plane principles: zero-copy, run-to-completion, coherence-free
§ Protected operating system with a clean-slate API
§ Specialized for in-memory event-driven applications

[Diagram: app over IX at kernel/hypervisor level, NIC below, user-level Linux alongside]

3.6x throughput with <50% latency @ the 99th percentile

Specialized Datacenter Networks

Software-defined (SDN):
§ Hides resource management
§ Leverages workload locality
§ Alleviates TCAM scalability
§ Spoke with Intel Berkeley ISTC

Rack-level virtualization:
§ Uses hardware assist
§ Integrated into OpenStack
§ Prototype with Broadcom Trident

[Diagram: SDN controller (access control, traffic engineering) managing servers with VMs, a virtual switch in the hypervisor, and a TCAM in the supervisor engine]

Today's Network Fabrics Are the Bottleneck!

In-memory latency-critical services
§ Graphs, KV stores, DBs

Vast datasets → distribute
§ Often within a rack

Today's networks: remote access latency is 20x-1000x that of local DRAM

Big Data on ccNUMA: Expensive (512 GB to 3 TB to 32 TB machines)
§ Ultra-low latency
§ But cost and complexity of scaling up; fault containment

Ultra-low latency, but ultra expensive

Big Data on Commodity Fabrics: Slow (e.g., AMD's SeaMicro, HP's Moonshot)
§ Cost-effective rack-scale fabrics of SoCs
§ But high remote latency (>~10 µs)

Need a low-latency rack-scale fabric!

Scale-Out NUMA (soNUMA): Rack-Scale In-Memory Computing [ASPLOS'14]

§ Global virtual address space w/o global coherence
§ RDMA-inspired programming model
§  Integrated Network Interface (NI)
§  Software-accessible Remote Memory Controller (RMC)
§ Lean NUMA fabric
§ Reliable user-level messaging over a minimal protocol

[Diagram: per-node cores, LLC, memory controller, and a remote MC with an integrated NI; coherence domains connected by a NUMA fabric]

300 ns round-trip latency to remote memory
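soNUMA's RDMA-inspired model exposes remote memory through one-sided reads and writes issued to the RMC, with no remote CPU involved. A toy sketch of that programming style (the RMC class and its methods are hypothetical illustrations, not soNUMA's actual interface):

```python
class RMC:
    """Toy stand-in for a software-accessible Remote Memory Controller:
    one-sided access to another node's memory, no remote CPU involved."""
    def __init__(self, nodes):
        self.nodes = nodes  # node_id -> bytearray modeling that node's memory

    def remote_read(self, node_id, offset, length):
        # One-sided read: pull bytes straight out of the remote node's memory.
        return bytes(self.nodes[node_id][offset:offset + length])

    def remote_write(self, node_id, offset, data):
        # One-sided write: deposit bytes directly into remote memory.
        self.nodes[node_id][offset:offset + len(data)] = data

rmc = RMC({0: bytearray(64), 1: bytearray(64)})
rmc.remote_write(1, 0, b"hello")   # write into node 1's memory from node 0
print(rmc.remote_read(1, 0, 5))    # b'hello'
```

Because neither operation involves the remote node's software stack, the round trip is bounded by the fabric and the RMC, which is how the 300 ns figure becomes reachable.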

PriFi: Fingerprinting-Resistant WLANs

Current home/business WiFi networks are vulnerable to fingerprinting, tracking, and snooping attacks
§ Even if the wireless network is encrypted [Srinivasan et al., UbiComp '08]

PriFi: a new wireless access point and VPN design that "blends together" the traffic of all WLAN devices, making them provably indistinguishable
§ Builds on Prof. Ford's prior Dissent project on anonymous communication technologies


PriFi Architecture

[Diagram: users and devices (untrackable, indistinguishable to snoopers) connect through an untrusted nearby relay (~40 ms ping) to untrusted online destinations, near or far; remote, high-latency (~400 ms ping), independent trustees ensure clients' anonymity, security, and indistinguishability, but are not in the communication "inner loop"]

IoT devices require roots of trust

[Illustration: Alice and Bob checking e-mail, sending text messages, downloading software updates, asking "What's the time?"]

Weak roots of trust → weak networks


Secure, Decentralized Roots of Trust

IoT devices need secure sources of time, cryptographic randomness, naming/directory services, etc.
§ But users are rightfully concerned about trusting a single centralized authority
§ Need to distribute trust across multiple services
§  Move from weakest-link to strongest-link security

CoSi: Scalable Collective Signing

Enables multiple authorities and many "witnesses" to cross-check and keep each other honest

[Diagram: an authority (leader) plus witnesses form a collective authority ("cothority") that co-signs statements such as "Bob's public key is Y", "The time is 3 PM", "Gmail's public key is X", "The latest version of Firefox is Z"]
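The cross-checking idea can be sketched with a toy threshold scheme: a statement only counts as collectively signed if the leader and enough witnesses each endorse it. This is purely illustrative (HMAC stand-ins and made-up keys); the real CoSi protocol aggregates Schnorr signatures over a communication tree so the collective signature stays compact:

```python
import hashlib
import hmac

# Toy keys for a leader and three witnesses (illustrative values only).
KEYS = {name: name.encode() * 4 for name in ["leader", "w1", "w2", "w3"]}

def endorse(name, statement):
    """A participant's endorsement of a statement (HMAC as a toy signature)."""
    return hmac.new(KEYS[name], statement, hashlib.sha256).hexdigest()

def collectively_signed(statement, endorsements, threshold=2):
    """Accept only if the leader AND >= threshold witnesses endorsed it."""
    ok = {n for n, sig in endorsements.items()
          if hmac.compare_digest(sig, endorse(n, statement))}
    return "leader" in ok and len(ok - {"leader"}) >= threshold

stmt = b"Bob's public key is Y."
sigs = {n: endorse(n, stmt) for n in ["leader", "w1", "w2"]}
print(collectively_signed(stmt, sigs))                                # True
print(collectively_signed(stmt, {"leader": endorse("leader", stmt)})) # False
```

The second call shows the point of witnessing: a leader acting alone, e.g., a compromised authority asserting a bogus key, cannot produce an acceptable collective signature.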

A Few Words on Approximation

Data services are probabilistic → yet digital platforms are precise!

Much opportunity at the algorithmic/software level
§ Learning algorithms (Cevher et al.)
§ Approximate querying (Koch et al.)
§ Programming (Rinard et al.)

Architecture?
§ Bad: von Neumann is not best suited for approximation
§  Control path dominates energy
§  Dual datapath (Ceze et al.) shown not useful
§ Good: support for neural processing
§  Analog (Temam et al.) or digital (Esmaeilzadeh et al.)


Big Data Analytics [NIPS’14]

Convex machine learning models
§ Broad set of applications: sparse SVMs, low-rank matrix completion, …
§ The bigger the data → the faster the algorithms, for the same statistical risk!

Smoothing idea: simplify the problem as the data size gets larger

Role of Convex Analysis and Optimization (Lecture 0: Motivation)

Convex models: F* = min_{x ∈ R^p} { F(x) := f(x) + g(x) : x ∈ Ω }

Key advantages:
1. convex geometry: provable estimation & noise stability
2. scalable algorithms: computation vs. accuracy tradeoffs
3. superb practical performance: de facto standard

Key concept: time-data tradeoffs for approximate estimation/prediction

[Charts: iterations and time (s) to solve the elastic net vs. number of samples/dimension (n/p) and the elastic net parameter µ]

Prof. Volkan Cevher, Mathematics of Data: From Theory to Computation
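For concreteness, the elastic net referenced in the charts uses a penalty g(x) = λ‖x‖₁ + (µ/2)‖x‖₂², whose proximal operator has a closed form: elementwise soft-thresholding followed by shrinkage. This is the workhorse step of the scalable proximal algorithms mentioned above; the parameter values below are illustrative only:

```python
def prox_elastic_net(v, lam, mu, step=1.0):
    """Proximal operator of g(x) = lam*||x||_1 + (mu/2)*||x||_2^2,
    elementwise: soft-threshold by step*lam, then shrink by 1/(1 + step*mu)."""
    out = []
    for t in v:
        if abs(t) <= step * lam:
            out.append(0.0)                      # thresholded to exact zero
        else:
            s = (abs(t) - step * lam) * (1.0 if t > 0 else -1.0)
            out.append(s / (1.0 + step * mu))    # ridge-style shrinkage
    return out

print(prox_elastic_net([3.0, -0.5, 1.5], lam=1.0, mu=1.0))  # [1.0, 0.0, 0.25]
```

The ℓ1 term produces exact zeros (sparsity) while the ℓ2 term keeps the map strongly convex, which is what makes the iteration counts in the charts well-behaved as n/p grows.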


Industry Affiliates Program

The program links industrial affiliates with EcoCloud's educational programs and research projects:
•  Annual meeting of affiliates
•  Technical Advisory Board membership
•  Access to EcoCloud conferences/programs
•  Recruiting from EcoCloud; internship program
•  Continuing education/outreach
•  Access to EcoCloud research results
•  Joint research projects
•  Networking with EcoCloud faculty and researchers

Bringing it All Together

EcoCloud was founded in 2011
§ Bridging Big Data w/ the Energy Wall
§ Our vision: ISA optimization in datacenters
§ Strong industrial partnership program
§ Real impact (industry, EU and beyond)

Thank You


End of Dennard Scaling

[Chart: power supply Vdd, 2001-2026; slope = 0.053 through 2001, 0.014 from 2013, nearly flat today]

The fundamental energy silver bullet is gone!

[source: ITRS]

[Diagram: Big Data and Big Energy on a collision course; bridging technologies will determine IT's future]

Two IT Trends on a Collision Course

1.  Big Data
§  Data grows faster than 10x/year
§  Silicon performance & capacity grow at 1.5x/year

2.  Energy
§  Silicon density increase continues
§  But silicon efficiency has slowed down and will stop improving
§  IT energy is not sustainable

Inflection Point #1: IT is all about Data

§ Data growth (by 2015) = 100x in ten years [IDC 2012]
§ Population growth = 10% in ten years

§ Monetizing data for commerce, health, science, services, …
§ Big Data is shaping IT & pretty much whatever we do!

[source: Economist]

Inflection Point #2: Energy used to be “Free”

Four decades of Dennard scaling (1970~2005):
§ P = C · V² · f
§ More transistors
§ Lower voltages → constant power/chip

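Dennard scaling in numbers: if every linear dimension and the supply voltage scale down by 1/k, then per-transistor capacitance scales by 1/k, frequency scales up by k, and transistor count per fixed chip area grows by k², so chip power P = C · V² · f · N stays constant. A quick sanity check with illustrative values:

```python
def chip_power(c, v, f, transistors):
    # Per-transistor capacitance c, shared supply voltage v and frequency f.
    return c * v**2 * f * transistors

k = 1.4  # one Dennard generation: linear dimensions shrink by ~1/k
p_old = chip_power(c=1.0, v=1.0, f=1.0, transistors=1.0)
p_new = chip_power(c=1.0 / k, v=1.0 / k, f=k, transistors=k**2)
print(p_old, round(p_new, 6))  # both 1.0: power per chip stays constant
```

The factors cancel exactly: (1/k) · (1/k)² · k · k² = 1. Once voltage stopped scaling, the V² term no longer shrank, and power per chip started climbing with transistor count, which is the energy wall described above.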

Robert H. Dennard, picture from Wikipedia

Dennard et al., 1974

The Rise of Parallelism to Save the Day

With voltages leveling:
§ Parallelism has emerged as the only silver bullet
§ Use simpler cores (a Prius instead of an Audi)
§ Restructure software
§ Each core → fewer joules/op


The Rise of Dark Silicon: End of Multicore Scaling

But parallelism cannot offset leveling voltages
§ Even in servers with abundant parallelism
§ Core complexity has leveled off too!
§ Soon, we cannot power the whole chip

[Chart: multicore scaling of a conventional server CPU (e.g., Xeon) vs. a modern multicore CPU (e.g., Tilera)]

[Chart: number of cores (1-1024, log scale) vs. year of technology introduction, 2004-2019; max EMB cores, EMB + 3D memory, GPP + 3D memory; the shortfall is dark silicon]

Hardavellas et al., “Toward Dark Silicon in Servers”, IEEE Micro, 2011

