A Brief History of IT
§ From computing-centric to data-centric
§ Consumer Era: interfacing, connectivity, and access
[Timeline: Mainframes (1970s) → PC Era (1980s) → Communication Era (1990s) → Consumer Era (today+)]
IoT: An Increasing Source of Data
§ From industrial use to transportation, the grid, buildings, self-driving cars, …
§ 28 billion connected devices; 4 zettabytes of data, 10% of the digital universe [Source: IDC]
Data Shaping All Science & Technology
Science is entering the 4th paradigm
§ Analytics using IT on
§ Instrument data
§ Simulation data
§ Sensor data
§ Human data
§ …
Complements theory, empirical science & simulation
Shifting funding in the sciences towards Big Data analytics!
[Picture from the Blue Brain Project, EPFL]
Big Data Analytics in the Human Brain
1 billion euros to model the brain (a consortium of 150 scientists from around the world)
Big Data for Digital Humanities: Online view of millennia of a city's history
Venice Time Machine
Data Growing Faster than Technology
[Chart: petabytes of data, 2004-2012: largest publicly reported data warehouse size vs. technology improvement (Moore's Law). Source: James Hamilton, 2014, http://mvdirona.com/jrh/TalksAndPapers/JamesHamilton_Reinvent20131115.pdf]
Silicon is running out of steam!
§ Voltage scaling is nearly dead: the projected Vdd slope has dropped from .053 (projection dated 2001) to .014 (dated 2013)
§ Density scaling is getting there: $/transistor keeps falling from 130 nm down to 10 nm
[Charts: power supply Vdd, 2001-2026; $/transistor by process node, 130 nm to 10 nm]
Increasing Strain on Data Platforms
§ Modern datacenters → 20 MW!
§ In the modern world, 6% of all electricity, growing at >20%!
[Chart: datacenter electricity demands in the US, 2001-2017, in billion kilowatt-hours/year: roughly 50 million Swiss homes (source: Energy Star)]
A Modern Datacenter
17x a football stadium, $3 billion
The first & only academic research center of its kind
A multi-disciplinary research center pioneering efficient and sustainable data-centric information technologies (founded in 2011)
A center to bring efficiency to data
§ 18 faculty, 50 researchers
§ Around $8M/year in research
Mission:
§ Energy-efficient data-centric IT
§ From algorithms to machine infrastructure
§ E.g., Big Data analytics, integrated computing/cooling, …
§ Maximizing performance/TCO for Big Data
Faculty
Aberer, Ailamaki, Argyraki, Atienza, Bugnion, Candea, Cevher, Falsafi, Ford, Guerraoui, Jaggi, Jones, Koch, Larus, Lenstra, Odersky, Thome, Zwaenepoel
Who We Are
Strong team
§ Several ACM/IEEE Fellows + members of European academies
§ Award-winning: e.g., 10 ERCs, 1 EURYI, 2 Sloans
§ High citation index + top academic alumni placement
§ High impact on industry:
§ Co-founder of VMware, a VP of Cisco + several startups
§ Father of the Scala language: in use by Amazon, LinkedIn, Twitter, UBS
§ Technology transfer: IBM BlueGene, AMD SimNOW & HP
Strong track record of collaborating with industry
Today's Server Ecosystem
The IT industry/market is horizontal:
§ One layer per vendor
§ Well-defined interfaces
§ Near-neighbor optimization at best
Big vendors (e.g., Amazon, Google)
§ Can do cross-layer optimizations
§ But,
§ Only for services of interest
§ Limited in extent (e.g., software)
§ Use proprietary technologies
§ Circumscribe data usage!
[Layer stack: Application / Runtime System (scripting, DSLs) / Middleware (data, web services) / Operating System (resource management) / Server (processor, memory, storage, network) / Infrastructure (cooling, power)]
Our Vision: Holistic Optimization of Data Platforms
Holistic optimization
§ From algorithms to infrastructure
§ Cross-layer optimization
§ IT paradigms to manage energy
§ End-to-end robust and resilient data platforms
Open technologies!
[Figure: optimization spanning from algorithms down to infrastructure]
Our Vision: The ISA Triangle of Efficiency
[Triangle diagram around Holistic Optimization; the surviving vertex label is Approximation (tailor work for output quality), with Integration and Specialization covered in the slides that follow]
Integrated Thermal & Load Balancing
Project PMSM
§ Synergistic IT load/thermal control
§ Real-time monitoring of 5K servers
§ Fine-grain power/thermal sensors
50% energy savings!
Integrated Cooling: CMOSAIC
3D server chip with two-phase liquid cooling
§ Enables higher thermal budgets
§ Dramatically better heat removal
Prototyped by IBM
[Figure: PCB, micro-heater, liquid micro-channels]
Integrated Power Subsystem: GreenDataNet
Towards energy-neutral datacenters
§ Power generation + storage + provisioning with services
§ Federated sites
§ Grid load management
Scale-Out Datacenters
Vast data sharded across servers
Memory-resident workloads
§ Necessary for performance
§ A major TCO burden
Processors access data in memory
§ Abundant request-level parallelism
§ Performance scales with core count
[Figure: cores and caches accessing data resident in memory]
Design servers around memory!
How efficient are servers for in-memory apps? CloudSuite 3.0 (parsa.epfl.ch/cloudsuite)
§ In-Memory Analytics: recommendation system
§ Graph Analytics: GraphX
§ Data Analytics: machine learning
§ Web Search: Apache Solr & Nutch
§ Media Streaming: Nginx HTTP server
§ Web Serving: Nginx, PHP server
§ Data Serving: Cassandra NoSQL
§ Data Caching: Memcached
In use by AMD, Broadcom, Cavium, Huawei, HP, Intel, Google, …
Big Data Workloads Stuck in Memory!
CloudSuite on Modern Servers [ASPLOS 2012]
[Chart: total execution cycles (fraction spent in memory) and application IPC per workload; applications execute ~1 instruction per cycle]
Workload/Server Mismatch
§ Too few cores!
§ Cores too fat!
§ 8 MB (60%) of cache space wasted (no reuse)!
§ Off-chip bandwidth unused!
What do Existing Processors Offer?
§ Intel Xeon (~100 W): ✘ few fat cores, ✘ large LLC
§ Calxeda (~5 W): ✘ few lean cores, ✓ compact LLC
§ Tilera (~30 W): ✓ many lean cores, ✘ large LLC, ✘ large distance
[Figures: tiled core/cache layouts for each chip]
Mismatch with workload demands!
Specialized Processors for In-Memory Services: Scale-Out Processors [ISCA'12, IEEE Micro'12]
One or more stand-alone (physical) servers ("pods") per chip
§ Each runs a full software stack
No inter-pod connectivity or coherence
§ Scalability and optimality across generations
Pods can share chip I/O (e.g., memory, network, etc.)
Inherently software scalable!
[Figure: pod tiling at 40 nm, 20 nm, and 10 nm]
NOC-Out [Micro'12]: Specialized Network-on-Chip for Pods
Exactly the opposite of current NoCs
§ Cache coherent
§ But designed for core-to-cache communication
§ Not core-to-core!
LLC network:
§ Flattened Butterfly (FB) topology
Request & reply networks:
§ Tree topology
§ Limited connectivity for efficiency
FB's performance at 1/10th the cost
[Figure: cores connected to the LLC through tree-based request/reply networks]
Footprint Cache [ISCA'13]: Effective Die-Stacked Caching for Pods
Die-stacked caching:
§ Rich connectivity → high on-chip bandwidth
§ High capacity → low off-chip bandwidth
Footprint Cache:
§ Allocate tags for pages
§ Predict & fetch each page's footprint (sketched below)
[Figure: scale-out processor with a die-stacked cache in front of off-chip memory]
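The footprint idea can be made concrete with a small sketch, assuming a history table indexed by (PC, first-accessed offset); the names and structure are illustrative stand-ins, not the ISCA'13 implementation:

```python
# Hypothetical sketch of footprint prediction (illustrative; not the ISCA'13 code).
# A page's "footprint" is the set of blocks actually touched while it was cached.
# On a miss, we fetch only the predicted footprint instead of the whole page.

BLOCKS_PER_PAGE = 32  # e.g., a 2 KB page of 64 B blocks

class FootprintPredictor:
    def __init__(self):
        # History table: (pc, first_offset) -> bitmask of blocks used last time.
        self.history = {}
        self.live = {}  # page -> (history key, bitmask of blocks touched so far)

    def on_page_miss(self, page, pc, offset):
        """Page enters the cache: predict which blocks to fetch."""
        key = (pc, offset)
        predicted = self.history.get(key, 1 << offset)  # default: just this block
        self.live[page] = (key, 1 << offset)
        return [b for b in range(BLOCKS_PER_PAGE) if predicted & (1 << b)]

    def on_access(self, page, offset):
        """Record which blocks the page actually uses while resident."""
        if page in self.live:
            key, mask = self.live[page]
            self.live[page] = (key, mask | (1 << offset))

    def on_evict(self, page):
        """Page leaves the cache: remember its footprint for next time."""
        if page in self.live:
            key, mask = self.live.pop(page)
            self.history[key] = mask

# Usage: the second miss fetches only 3 blocks instead of all 32.
p = FootprintPredictor()
p.on_page_miss(page=7, pc=0x400, offset=3)
for off in (3, 4, 9):
    p.on_access(7, off)
p.on_evict(7)
print(p.on_page_miss(page=7, pc=0x400, offset=3))  # -> [3, 4, 9]
```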
EuroCloud Server (eurocloudserver.com): 3D Scale-Out Chip for In-Memory Computing
Mobile efficiency in servers
§ Swarms of ARM cores
§ 3D memory
§ 10x performance/TCO
§ Runs the Linux LAMP stack
Planned prototype:
§ ARM/ST/CEA + Chalmers/FORTH in the EuroServer FP7 project
§ Data Processing Unit by Huawei
Cavium’s ThunderX
A 48-core 64-bit ARM SoC based on "Clearing the Clouds" [ASPLOS'12]
§ Large instruction caches
§ Low single-thread ILP
§ 2x area for cores vs. cache
§ Data-centric scale-out chip
ATraPos Data Islands [VLDB'12, ICDE'14]
§ NUMA is a bottleneck in OLTP
§ No single configuration is optimal for all workloads
§ Technologies for cache affinity in OLTP
[Figure: multi-socket NUMA topology (cores, L1/L2/L3 caches, memory controllers, inter-socket links) partitioned by ATraPos into islands/pods over a lightweight fabric for an optimal system configuration]
Flashback 2004: Shekhar Borkar's (Intel Fellow) Keynote @ MICRO
Intel's TCP/IP processor: 2.23 mm x 3.54 mm, 260K transistors. An idea too early for its time?
[Figures: TCP/IP processor floorplan; MIPS over time, 1995-2015, Pentium @ 75 W vs. XScale @ ~2 W]
Specialization: An Idea Whose Time Has Come
Microsoft unveils Catapult to accelerate Bing! [EcoCloud Annual Event, June 5th, 2014]
§ One FPGA per blade
§ All FPGAs connected in a half rack
§ 6×8 2-D torus topology
§ High-end Stratix V FPGAs
§ Running Bing kernels for feature extraction and machine learning
Future of Specialization
§ High-level functional programs
§ Optimize using domain knowledge
§ Strip abstraction overhead using staging
§ Heterogeneous + parallel targets
[Figure: from the programmer down to hardware: MPI/PGAS, Pthreads/OpenMP, CUDA/OpenCL, and Verilog/VHDL targeting MilkyWay-2, Xeon Phi, Nvidia Maxwell, and Altera FPGAs]
Key idea: make compilation part of the program (sketched below)
Result: a high-bandwidth, extensible compilation pipeline
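To make "compilation as part of the program" concrete, here is a minimal staging sketch, not the LMS/Delite implementation: a high-level expression is built as a tree at runtime, simplified with domain-knowledge rewrites, and then compiled to straight-line code, stripping the abstraction overhead:

```python
# Hypothetical staging sketch (not the LMS/Delite implementation): the program
# builds an expression tree at runtime, optimizes it, then compiles it.

class Expr:
    def __add__(self, other): return Op('+', self, other)
    def __mul__(self, other): return Op('*', self, other)

class Var(Expr):
    def __init__(self, name): self.name = name
    def gen(self): return self.name

class Const(Expr):
    def __init__(self, v): self.v = v
    def gen(self): return repr(self.v)

class Op(Expr):
    def __init__(self, op, l, r): self.op, self.l, self.r = op, l, r
    def gen(self):
        # Domain-knowledge rewrites applied at staging time: x*1 -> x, x+0 -> x.
        if self.op == '*' and isinstance(self.r, Const) and self.r.v == 1:
            return self.l.gen()
        if self.op == '+' and isinstance(self.r, Const) and self.r.v == 0:
            return self.l.gen()
        return f'({self.l.gen()} {self.op} {self.r.gen()})'

def compile_fn(expr, *params):
    # "Compilation is part of the program": emit and exec specialized code.
    src = f'def f({", ".join(params)}): return {expr.gen()}'
    ns = {}
    exec(src, ns)
    return ns['f']

x, y = Var('x'), Var('y')
f = compile_fn(x * Const(1) + y + Const(0), 'x', 'y')  # compiles to: (x + y)
print(f(3, 4))  # -> 7
```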
Evaluation: k-means [Sujeeth et al., ICML'11; Brown et al., 2013 (submitted)]
[Charts: k-means speedups. Cluster: Spark 1x, Delite CPU 4.5x, Delite GPU 7.1x. Single node (1, 4, 8 CPUs): parallelized MATLAB, OptiML, and C++ baselines]
Scala DSLs in Datacenters
§ The key challenge is programming accelerators
§ FPGA programming is hardware development
§ Systems change very frequently
§ Using DSLs to specialize data & system services
Specialized Database Stack: DBToaster (dbtoaster.org)
Compiling offline analytics into online/incremental engines; aggressive code specialization (see the sketch below)
Low-latency in-memory data-stream processing, up to six orders of magnitude faster than commercial systems
[Chart: speedup (log scale, 1 to 10^7) on queries Q3, Q9, Q10, Q11, Q14, Q15, Q21]
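A minimal sketch of the incremental principle behind DBToaster (illustrative; not DBToaster's generated code): rather than re-running an aggregate over the full dataset on every update, the query result is kept as materialized state and each incoming tuple applies a constant-time delta:

```python
# Illustrative sketch of incremental view maintenance (not DBToaster's output).
# Query: SELECT group, SUM(amount) FROM stream GROUP BY group.
# Re-evaluation costs O(n) per update; delta maintenance costs O(1).

from collections import defaultdict

class IncrementalSum:
    def __init__(self):
        self.view = defaultdict(float)  # materialized query result

    def on_insert(self, group, amount):
        self.view[group] += amount      # delta update, O(1)

    def on_delete(self, group, amount):
        self.view[group] -= amount

view = IncrementalSum()
for g, a in [("EU", 10.0), ("US", 5.0), ("EU", 2.5)]:
    view.on_insert(g, a)
print(dict(view.view))  # -> {'EU': 12.5, 'US': 5.0}
```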
Graph Analytics with X-Stream [SOSP'13]
Weakly connected components on a 1U server (edge-centric loop sketched below):
§ 512 million edges in 28 sec (RAM)
§ 4 billion edges in 26 min (SSD)
§ 64 billion edges in 26 hours (HDD)
[Figure: Swiss road network]
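X-Stream's key idea is edge-centric processing: it streams the unordered edge list sequentially, which favors the high sequential bandwidth of SSDs and HDDs over random access. Below is a minimal single-threaded sketch of that scatter-gather loop for weakly connected components; X-Stream itself adds streaming partitions and out-of-core buffering:

```python
# Illustrative edge-centric scatter-gather for weakly connected components,
# in the spirit of X-Stream (no streaming partitions or out-of-core machinery).

def connected_components(num_vertices, edges):
    # Vertex state: component label, initialized to the vertex id.
    label = list(range(num_vertices))
    changed = True
    while changed:
        # Scatter: stream over ALL edges sequentially; propagate smaller labels.
        updates = []
        for src, dst in edges:            # sequential scan, no random edge access
            if label[src] < label[dst]:
                updates.append((dst, label[src]))
            elif label[dst] < label[src]:
                updates.append((src, label[dst]))
        # Gather: apply updates to vertex state.
        changed = False
        for v, new_label in updates:
            if new_label < label[v]:
                label[v] = new_label
                changed = True
    return label

edges = [(0, 1), (1, 2), (3, 4)]          # two components: {0,1,2} and {3,4}
print(connected_components(5, edges))     # -> [0, 0, 0, 3, 3]
```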
OpenIoT: Open-Source Cloud Solution for the Internet of Things (github.com/OpenIotOrg/openiot)
An established open-source platform for IoT
• Configure, deploy, and use IoT services
• Auditing privacy of IoT apps in the cloud
• Energy-efficient data harvesting
• Publish/subscribe for continuous processing and sensor data filtering
Use cases: smart manufacturing, campus guide, air monitoring, agriculture sensing
CloudSpaces: Open Service Platform for the Next Generation of Personal Clouds
• Privacy-enhancing tools for end-users
• Transparent access to data stored at multiple cloud services
• Federated datacenters to share resources and meet user demands
• A framework based on the context and semantics of data sharing
• Crowdsourcing to determine sharing sensitivity
• Software that provides policy recommendations for safe file sharing
Invisible DBMS
NoDB: in-situ queries in the cloud (dias.epfl.ch/NoDB)
Zero data-to-query time! (toy sketch below)
§ Give me your data as is (supports SQL + your tools)
§ Give me your queries
§ Get your results!
[Architecture: adaptive kernel, adaptive load, adaptive data store]
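A toy sketch of the in-situ idea, not the NoDB engine: query raw CSV files directly and lazily build a positional map of row offsets, so the first query pays the parsing cost and later queries touch only the bytes they need:

```python
# Toy in-situ query engine over raw CSV, in the spirit of NoDB (illustrative).
# First query parses the raw file; a positional map caches row offsets so
# later queries skip re-parsing. No load step, no DBMS import.

import csv, io

RAW = "id,city,revenue\n1,Geneva,10\n2,Lausanne,30\n3,Zurich,20\n"

class InSituTable:
    def __init__(self, raw_text):
        self.raw = raw_text
        self.header = raw_text.splitlines()[0].split(",")
        self.positional_map = None  # built lazily, on first query

    def _build_positional_map(self):
        # Record the byte offset of each data row (adaptive indexing state).
        offsets, pos = [], len(self.raw.splitlines()[0]) + 1
        for line in self.raw.splitlines()[1:]:
            offsets.append(pos)
            pos += len(line) + 1
        self.positional_map = offsets

    def scan(self, column, predicate):
        """SELECT <column> FROM raw WHERE predicate(row)."""
        if self.positional_map is None:
            self._build_positional_map()  # pay the parsing cost once
        col = self.header.index(column)
        out = []
        for off in self.positional_map:
            line = self.raw[off:self.raw.index("\n", off)]
            row = next(csv.reader(io.StringIO(line)))
            if predicate(row):
                out.append(row[col])
        return out

t = InSituTable(RAW)
print(t.scan("city", lambda r: int(r[2]) > 15))  # -> ['Lausanne', 'Zurich']
```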
ViDa: Building Databases Through Queries (dias.epfl.ch/ViDa) [VLDB'14, CIDR'15]
[Figure: data sources spanning MapReduce engines, relational DBMSs, reporting servers, spreadsheets, enterprise search engines, external data sources, and operational databases]
JIT data virtualization:
✓ Query heterogeneous data • CSV, JSON, arrays, DBMSs, …
✓ Query engine built at runtime
✓ No static decisions • adapt to queries and data
Specialized Network Stack: The IX Kernel [Belay'14, OSDI best paper]
§ Data-plane principles: zero-copy, run-to-completion, coherence-free (loop sketched below)
§ Protected operating system with a clean-slate API
§ Specialized for in-memory event-driven applications
[Figure: the app above the IX data plane with direct NIC access, alongside kernel-level, hypervisor-level, and user-level Linux layers]
3.6x throughput with <50% latency @ the 99th percentile
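A minimal sketch of the run-to-completion principle (illustrative; not the IX API): each core polls a bounded batch of packets and processes every request fully before touching the next, with no cross-core hand-offs or locks on the hot path:

```python
# Illustrative run-to-completion data-plane loop (not the IX implementation).
# One loop per core: poll a batch from the NIC queue, process each request
# fully (parse -> app logic -> reply) before the next one. Avoiding
# inter-core queues avoids coherence traffic and locking on the hot path.

from collections import deque

def run_to_completion_loop(rx_queue, app_handler, batch_size=16):
    tx = []
    while rx_queue:
        # Poll a bounded batch (adaptive batching amortizes per-packet cost).
        batch = [rx_queue.popleft() for _ in range(min(batch_size, len(rx_queue)))]
        for pkt in batch:
            request = pkt.decode()          # stands in for parsing in place
            response = app_handler(request) # run the request fully to completion
            tx.append(response.encode())    # queue the reply on the same core
    return tx

rx = deque([b"GET k1", b"GET k2"])
kv = {"k1": "v1", "k2": "v2"}
print(run_to_completion_loop(rx, lambda r: kv.get(r.split()[1], "MISS")))
# -> [b'v1', b'v2']
```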
Specialized Datacenter Networks
Software-defined (SDN):
§ Hides resource management
§ Leverages workload locality
§ Alleviates TCAM scalability
§ Spoke with Intel Berkeley ISTC
Rack-level virtualization:
§ Uses hardware assist
§ Integrated into OpenStack
§ Prototype with Broadcom Trident
[Figure: SDN controller (access control, traffic engineering) managing servers with VMs behind a virtual switch, plus TCAM and supervisor engine at the hypervisor]
Today's Network Fabrics Bottleneck!
In-memory latency-critical services
§ Graphs, KV stores, DBs
Vast datasets → distribute
§ Often within a rack
Today's networks: latency 20x-1000x that of DRAM
Remote access latency >> local access latency
Big Data on ccNUMA: Expensive
✓ Ultra-low latency
✘ Cost and complexity of scaling up
✘ Fault containment
[Figure: machines scaling from 512 GB to 3 TB to 32 TB]
Ultra-low latency, but ultra expensive
Big Data on Commodity Fabrics: Slow
✓ Cost-effective rack-scale fabrics of SoCs
✘ High remote latency (> ~10 µs)
Examples: AMD's SeaMicro, HP's Moonshot
Need a low-latency rack-scale fabric!
Scale-Out NUMA (soNUMA): Rack-Scale In-Memory Computing [ASPLOS'14]
§ Global virtual address space w/o global coherence
§ RDMA-inspired programming model (sketched below)
§ Integrated Network Interface (NI)
§ Software-accessible Remote Memory Controller (RMC)
§ Lean NUMA fabric
§ Reliable user-level messaging over a minimal protocol
[Figure: per-node cores, LLC, memory controller, RMC, and NI; two coherence domains joined by the NUMA fabric]
300 ns round-trip latency to remote memory
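From software, the RDMA-inspired model might look like the following sketch; the queue names and types are hypothetical stand-ins, not soNUMA's actual interface. The application posts a remote read to the RMC and polls for completion entirely at user level:

```python
# Hypothetical sketch of an RMC-style user-level remote read (names are
# illustrative, not soNUMA's actual API). The app writes a work-queue entry
# and polls a completion queue; the RMC moves data without kernel involvement.

from dataclasses import dataclass

@dataclass
class WQEntry:                  # work-queue entry posted by the application
    op: str                     # 'read' or 'write'
    remote_node: int
    remote_addr: int
    local_buf: bytearray

class RMC:
    """Software stand-in for the hardware Remote Memory Controller."""
    def __init__(self, remote_memories):
        self.remote_memories = remote_memories   # node id -> bytes
        self.wq, self.cq = [], []                # work and completion queues

    def post(self, entry):
        self.wq.append(entry)

    def tick(self):
        # The real RMC services entries asynchronously over the NUMA fabric;
        # here we complete one entry per tick to mimic polling.
        if self.wq:
            e = self.wq.pop(0)
            if e.op == 'read':
                mem = self.remote_memories[e.remote_node]
                data = mem[e.remote_addr:e.remote_addr + len(e.local_buf)]
                e.local_buf[:] = data            # one-sided: no remote CPU involved
            self.cq.append(e)

    def poll(self):
        return self.cq.pop(0) if self.cq else None

rmc = RMC({1: b"hello, remote memory"})
buf = bytearray(5)
rmc.post(WQEntry('read', remote_node=1, remote_addr=7, local_buf=buf))
while rmc.poll() is None:       # spin on the completion queue (user level)
    rmc.tick()
print(buf.decode())             # -> 'remot'
```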
PriFi: Fingerprinting-Resistant WLANs
Current home/business WiFi networks are vulnerable to fingerprinting, tracking, and snooping attacks
§ Even if the wireless network is encrypted [Srinivasan et al., UbiComp '08]
PriFi: a new wireless access point and VPN design that "blends together" the traffic of all WLAN devices, making them provably indistinguishable
§ Builds on Prof. Ford's prior Dissent project on anonymous communication technologies
PriFi Architecture Diagram
[Diagram: users/devices (all untrackable, indistinguishable to snoopers); relay (untrusted, near clients, ~40 ms ping to destinations); online destinations (untrusted, near or far); trustees (remote, high-latency, independent servers, ~400 ms ping)]
Remote trustees ensure clients' anonymity, security, and indistinguishability, but are not in the communication "inner loop"
IoT Devices Require Roots of Trust
[Figure: everyday interactions that need trust: Alice checks e-mail, sends a text message, and downloads a software update; Bob asks "What's the time?"]
Secure, Decentralized Roots of Trust
IoT devices need secure sources of time, cryptographic randomness, naming/directory services, etc.
§ But users are rightfully concerned about trusting a single centralized authority
§ Need to distribute trust across multiple services
§ Move from weakest-link to strongest-link security
CoSi: Scalable Collective Signing
Enables multiple authorities and many "witnesses" to cross-check and keep each other honest (toy signing sketch below)
[Figure: an authority (leader) plus witnesses forming a collective authority ("cothority"), jointly signing statements such as "Bob's public key is Y.", "The time is 3 PM.", "Gmail's public key is X.", "The latest version of Firefox is Z."]
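Collective signing can be illustrated with a toy aggregate Schnorr signature: each witness contributes a commitment and a response, and the combined signature verifies against the combined public key. The parameters below are tiny and insecure, and the flat aggregation stands in for CoSi's tree-structured rounds:

```python
# Toy collective Schnorr signature in the spirit of CoSi (illustrative only:
# tiny insecure parameters, flat instead of tree-structured aggregation).
import hashlib, random

p, q, g = 23, 11, 2          # g generates the order-q subgroup of Z_p*

def H(*parts):
    h = hashlib.sha256("|".join(map(str, parts)).encode()).digest()
    return int.from_bytes(h, "big") % q

class Witness:
    def __init__(self):
        self.x = random.randrange(1, q)      # secret key
        self.X = pow(g, self.x, p)           # public key
    def commit(self):
        self.v = random.randrange(1, q)      # per-signature nonce
        return pow(g, self.v, p)             # commitment V_i = g^v_i
    def respond(self, c):
        return (self.v - c * self.x) % q     # response r_i = v_i - c*x_i

def collective_sign(witnesses, msg):
    V = 1
    for V_i in (w.commit() for w in witnesses):    # round 1: commitments
        V = (V * V_i) % p                          # aggregate commitment
    c = H(V, msg)                                  # round 2: shared challenge
    r = sum(w.respond(c) for w in witnesses) % q   # round 3: aggregate response
    return (c, r)

def verify(pubkeys, msg, sig):
    c, r = sig
    X = 1
    for X_i in pubkeys:
        X = (X * X_i) % p                          # aggregate public key
    V = (pow(g, r, p) * pow(X, c, p)) % p          # reconstruct g^r * X^c
    return c == H(V, msg)

ws = [Witness() for _ in range(5)]                 # a leader + 4 witnesses
sig = collective_sign(ws, "The time is 3 PM.")
print(verify([w.X for w in ws], "The time is 3 PM.", sig))  # -> True
```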
A Few Words on Approximation
Data services are probabilistic → yet digital platforms are precise!
Much opportunity at the algorithmic/software level
§ Learning algorithms (Cevher et al.)
§ Approximate querying (Koch et al.)
§ Programming (Rinard et al.)
Architecture?
§ Bad: von Neumann is not best suited for approximation
§ The control path dominates energy
§ The dual datapath shown (Ceze et al.) is not useful
§ Good: support for neural processing
§ Analog (Temam et al.) or digital (Esmailizadeh et al.)
Big Data Analytics [NIPS'14]
Convex machine learning models
§ A broad set of applications: sparse SVMs, low-rank matrix completion, …
§ The bigger the data → the faster the algorithms for the same statistical risk!
Smoothing idea: simplify the problem as the data size gets larger
Role of convex analysis and optimization
§ Convex models: $F^\star = \min_{x \in \mathbb{R}^p} \{ F(x) := f(x) + g(x) : x \in \mathcal{X} \}$
§ Key advantages
1. Convex geometry: provable estimation & noise stability
2. Scalable algorithms: computation vs. accuracy tradeoffs
3. Superb practical performance: de facto standard
Key concepts: time-data tradeoffs for approximate estimation/prediction
[Charts: iterations and time to solve the elastic net vs. number of samples/dimension (n/p), and the elastic-net parameter µ]
Prof. Volkan Cevher, [email protected]: Mathematics of Data: From Theory to Computation
Industry Affiliates Program
Benefits for industrial affiliates, spanning educational programs and research projects:
§ Annual meeting of affiliates
§ Company membership on the Technical Advisory Board
§ Access to EcoCloud conferences/programs
§ Recruiting from the EcoCloud internship program
§ Continuing education/outreach
§ Access to EcoCloud research results
§ Joint research projects
§ Networking with EcoCloud faculty and researchers
Bringing it All Together
EcoCloud was founded in 2011
§ Bridging Big Data with the energy wall
§ Our vision: ISA optimization in datacenters
§ A strong industrial partnership program
§ Real impact (industry, the EU, and beyond)
End of Dennard Scaling
[Chart: power supply Vdd, 2001-2026: the projected slope has dropped from .053 (projection dated 2001) to .014 (dated 2013)]
The fundamental energy silver bullet is gone!
[source: ITRS]
Two IT Trends on a Collision Course
1. Big Data
§ Data grows faster than 10x/year
§ Silicon performance & capacity grow at 1.5x/year
2. Energy
§ Silicon density increases continue
§ But silicon efficiency has slowed and will eventually stop improving
§ IT energy is not sustainable
Inflection Point #1: IT is all about Data
§ Data growth (by 2015) = 100x in ten years [IDC 2012]
§ Population growth = 10% in ten years
§ Monetizing data for commerce, health, science, services, …
§ Big Data is shaping IT & pretty much everything we do!
[source: Economist]
Inflection Point #2: Energy used to be “Free”
Four decades of Dennard scaling (1970~2005):
§ P = C V² f
§ More transistors
§ Lower voltages ➜ constant power/chip (see the worked scaling below)
[Robert H. Dennard, picture from Wikipedia; Dennard et al., 1974]
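A one-step worked version of why those bullets yield constant chip power, using the classic scaling arithmetic:

```latex
% Dennard scaling at a linear shrink factor k > 1 (the classic 1974 argument):
% capacitance, voltage, and delay each scale as 1/k, frequency as k.
\begin{align*}
  P_{\text{transistor}} &= C V^2 f
    \;\longrightarrow\; \frac{C}{k}\cdot\frac{V^2}{k^2}\cdot k f
    = \frac{1}{k^2}\, C V^2 f \\
  N_{\text{transistors}} &\longrightarrow k^2 N
    \quad\text{(same die area, } 1/k^2 \text{ area per device)} \\
  P_{\text{chip}} &= N \cdot P_{\text{transistor}}
    \;\longrightarrow\; k^2 N \cdot \frac{1}{k^2}\, C V^2 f = P_{\text{chip}}
\end{align*}
% Once V stops scaling (post ~2005), per-transistor power falls only as 1/k,
% so chip power at constant area grows ~k: the energy silver bullet is gone.
```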
The Rise of Parallelism to Save the Day
With voltages leveling:
§ Parallelism has emerged as the only silver bullet
§ Use simpler cores (a Prius instead of an Audi)
§ Restructure software
§ Each core → fewer joules/op
Multicore Scaling
[Figure: conventional server CPU (e.g., Xeon) vs. modern multicore CPU (e.g., Tilera)]
The Rise of Dark Silicon: End of Multicore Scaling
But parallelism cannot offset leveling voltages
Even in servers with abundant parallelism
Core complexity has leveled off too!
Soon, we cannot power the whole chip
[Chart: number of cores (1 to 1024, log scale) vs. year of technology introduction, 2004-2019: max EMB cores, EMB + 3D memory, GPP + 3D memory, and the dark-silicon gap]
Hardavellas et al., "Toward Dark Silicon in Servers", IEEE Micro, 2011