Miller.on Chip Optical Communications

Date post: 29-May-2018
Upload: sanu81
  • 8/8/2019 Miller.on Chip Optical Communications

    1/24

    On-Chip Optical Communication for Multicore Processors

    Jason Miller

    Carbon Research Group

    MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LAB

    The Future of Multicore

    Number of cores doubles every 18 months

    Parallelism replaces clock frequency scaling and core complexity

    Resulting challenges: Scalability, Programming, Power

    MIT RAW, Sun UltraSPARC T2, IBM XCell 8i, Tilera TILE64

    Multicore Challenges

    Scalability
      - How do we turn additional cores into additional performance?
      - Must accelerate single apps, not just run more apps in parallel
      - Efficient core-to-core communication is crucial
      - Architectures that grow easily with each new technology generation

    Programming
      - Traditional parallel programming techniques are hard
      - Parallel machines were rare and used only by rocket scientists
      - Multicores are ubiquitous and must be programmable by anyone

    Power
      - Already a first-order design constraint
      - More cores and more communication → more power
      - Previous tricks (e.g. lower Vdd) are running out of steam

    Multicore Communication Today

    Bus-based Interconnect
      - Single shared resource
      - Uniform communication cost
      - Communication through memory
      - Doesn't scale to many cores due to contention and long wires
      - Scalable up to about 8 cores

    [Figure: bus-based interconnect (labels: p, c, BUS, L2 Cache, DRAM)]

    Multicore Programming Trends

    Meshes and small cores solve the physical scaling challenge, but programming remains a barrier

    Parallelizing applications to thousands of cores is hard
      - Task and data partitioning
      - Communication becomes critical as latencies increase
      - Increasing contention for distant communication
          - Degraded performance, higher energy
      - Inefficient broadcast-style communication
          - Major source of contention
          - Expensive to distribute signal electrically

    Multicore Programming Trends

    For high performance, communication and locality must be managed

    Tasks and data must be both partitioned and placed
      - Analyze communication patterns to minimize latencies
      - Place data near the code that needs it most
      - Place certain code near critical resources (e.g. DRAM, I/O)

    Dynamic, unpredictable communication is impossible to optimize

    Orchestrating communication and locality increases programming difficulty exponentially

    Improving Programmability

    Observations:

    A cheap broadcast communication mechanism can make programming easier
      - Enables convenient programming models (e.g., shared memory)
      - Reduces the need to carefully manage locality

    On-chip optical components enable cheap, energy-efficient broadcast

    Optical Broadcast Network

      - Waveguide passes through every core
      - Multiple wavelengths (WDM) eliminate contention
      - Signal reaches all cores in

    Optical bit transmission

    [Figure: optical link from sending core to receiving core. Labels: multi-wavelength source waveguide, flip-flop, modulator driver, modulator, data waveguide, filter, photodetector, transimpedance amplifier, flip-flop]

      - Each core sends data using a different wavelength → no contention
      - Data is sent once; any or all cores can receive it → efficient broadcast

    Core-to-core communication

      - 32-bit data words transmitted across several parallel waveguides
      - Each core contains receive filters and a FIFO buffer for every sender
      - Data is buffered at receiver until needed by the processing core
      - Receiver can screen data by sender (i.e. wavelength) or message type

    [Figure: sending core A and sending core B connected to a receiving core over 32-bit links; each Processor Core has a set of FIFO buffers]
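
    The per-sender buffering described on this slide can be sketched as a toy software model. This is only an illustration under stated assumptions, not the ATAC hardware design: the names `ReceivingCore`, `deliver`, `receive`, and `broadcast` are hypothetical, FIFOs are unbounded, and a "wavelength" is represented simply by the sender's core id.

```python
from collections import deque


class ReceivingCore:
    """Toy model of a core's receive side: one FIFO per sender (hypothetical)."""

    def __init__(self, core_id, num_cores):
        self.core_id = core_id
        # One FIFO per *other* core, mirroring "a FIFO buffer for every sender".
        self.fifos = {s: deque() for s in range(num_cores) if s != core_id}

    def deliver(self, sender, word):
        # Buffer a 32-bit word until the processing core asks for it.
        self.fifos[sender].append(word & 0xFFFFFFFF)

    def receive(self, sender):
        # Screen by sender: pop the oldest word from that sender, or None.
        fifo = self.fifos[sender]
        return fifo.popleft() if fifo else None


def broadcast(cores, sender, word):
    """Send once; every other core buffers the word in its FIFO for `sender`."""
    for core in cores:
        if core.core_id != sender:
            core.deliver(sender, word)


cores = [ReceivingCore(i, num_cores=4) for i in range(4)]
broadcast(cores, sender=0, word=0xDEADBEEF)
broadcast(cores, sender=1, word=0x12345678)
```

    Because each sender has its own FIFO at every receiver, two cores broadcasting at the same time never compete for the same buffer, which is the software analogue of the no-contention property the slides attribute to WDM.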

    ATAC Bandwidth

    64 cores, 32 lines, 1 Gb/s
      - Transmit BW: 64 cores x 1 Gb/s x 32 lines = 2 Tb/s
      - Receive-Weighted BW: 2 Tb/s x 63 receivers = 126 Tb/s
      - Good metric for broadcast networks; reflects WDM

    ATAC allows better utilization of computational resources because less time is spent performing communication
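
    The bandwidth arithmetic on this slide can be reproduced with a short sketch. The function names are hypothetical. Note that the exact product is 2.048 Tb/s; the slide rounds this to 2 Tb/s before multiplying by 63 receivers, which is how it arrives at 126 Tb/s.

```python
def transmit_bandwidth(cores, lines, rate_bps):
    """Aggregate transmit bandwidth: every core drives every line at rate_bps."""
    return cores * lines * rate_bps


def receive_weighted_bandwidth(cores, lines, rate_bps):
    """Each transmitted bit can be received by all other cores (cores - 1)."""
    return transmit_bandwidth(cores, lines, rate_bps) * (cores - 1)


tx = transmit_bandwidth(cores=64, lines=32, rate_bps=1_000_000_000)
rx = receive_weighted_bandwidth(cores=64, lines=32, rate_bps=1_000_000_000)

print(f"Transmit BW: {tx / 1e12:.3f} Tb/s")
print(f"Receive-weighted BW (unrounded): {rx / 1e12:.1f} Tb/s")
print(f"Receive-weighted BW (slide rounding): {2e12 * 63 / 1e12:.0f} Tb/s")
```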

    System Capabilities and Performance

    Baseline: Raw Multicore Chip (leading-edge tiled multicore)
    ATAC Multicore Chip (future optical interconnect multicore)

    Metric                    Raw Multicore Chip    ATAC Multicore Chip
    System                    64-core (65nm)        64-core (65nm)
    Peak performance          64 GOPS               64 GOPS
    Chip power                24 W                  25.5 W
    Theoretical power eff.    2.7 GOPS/W            2.5 GOPS/W
    Effective performance     7.3 GOPS              38.0 GOPS
    Effective power eff.      0.3 GOPS/W            1.5 GOPS/W
    Total system power        150 W                 153 W

    Optical communications require a small amount of additional system power but allow for much better utilization of computational resources.
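
    As a quick consistency check, the power-efficiency figures above appear to be performance divided by chip power (not total system power). The sketch below recomputes them under that assumption; the dictionary layout and the function name `power_efficiency` are illustrative only.

```python
def power_efficiency(gops, chip_power_w):
    """GOPS per watt of chip power (assumed basis for the slide's figures)."""
    return gops / chip_power_w


chips = {
    "Raw": {"peak": 64.0, "effective": 7.3, "chip_power_w": 24.0},
    "ATAC": {"peak": 64.0, "effective": 38.0, "chip_power_w": 25.5},
}

results = {}
for name, c in chips.items():
    theoretical = power_efficiency(c["peak"], c["chip_power_w"])
    effective = power_efficiency(c["effective"], c["chip_power_w"])
    results[name] = (theoretical, effective)
    print(f"{name}: theoretical {theoretical:.1f} GOPS/W, "
          f"effective {effective:.1f} GOPS/W")

# Ratio of effective performance, ATAC over Raw.
speedup = chips["ATAC"]["effective"] / chips["Raw"]["effective"]
print(f"Effective performance ratio: {speedup:.1f}x")
```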

    Programming ATAC

    Cores can directly communicate with any other core in one hop (

    Communication-centric Computing

      - ATAC reduces off-chip memory calls, and hence energy and latency
      - View of extended global memory can be enabled cheaply with on-chip distributed cache memory and ATAC network

    [Figure: Bus-Based Multicore (p, c, BUS, L2 Cache, memory) compared with ATAC; transfers annotated 500pJ and 3pJ]

    Operation               Energy    Latency
    Network transfer        3pJ       3 cycles
    ALU add operation       2pJ       1 cycle
    32KB cache read         50pJ      1 cycle
    Off-chip memory read    500pJ     250 cycles
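
    To make the table concrete, the sketch below compares an off-chip memory read with a read served from another core's on-chip cache over the network. The composition of the remote read (one network transfer for the request, one 32KB cache read, one network transfer for the reply, all serialized) is an assumption made for illustration; the slides give only the per-operation costs.

```python
# (energy in pJ, latency in cycles), taken from the table above.
COSTS = {
    "network_transfer": (3, 3),
    "alu_add": (2, 1),
    "cache_read_32kb": (50, 1),
    "offchip_memory_read": (500, 250),
}


def total_cost(ops):
    """Sum energy and latency over a sequence of operations (assumed serial)."""
    energy = sum(COSTS[op][0] for op in ops)
    latency = sum(COSTS[op][1] for op in ops)
    return energy, latency


# Hypothetical remote-cache read: request + cache read + reply.
remote_cache = total_cost(
    ["network_transfer", "cache_read_32kb", "network_transfer"]
)
offchip = total_cost(["offchip_memory_read"])

print(f"Remote on-chip cache read: {remote_cache[0]} pJ, {remote_cache[1]} cycles")
print(f"Off-chip memory read:      {offchip[0]} pJ, {offchip[1]} cycles")
```

    Under these assumptions the on-chip path costs 56 pJ and 7 cycles against 500 pJ and 250 cycles off-chip, which illustrates why replacing off-chip accesses with network transfers saves both energy and latency.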

    Summary

      - ATAC uses optical networks to enable multicore programming and performance scaling
      - ATAC encourages communication-centric architecture, which helps multicore performance and power scalability
      - ATAC simplifies programming with a contention-free all-to-all broadcast network
      - ATAC is enabled by recent advances in CMOS integration of optical components

    Backup Slides

    What Does the Future Look Like?

    Corollary of Moore's law: Number of cores will double every 18 months

    Year        '02    '05    '08    '11    '14
    Research     16     64    256   1024   4096
    Industry      4     16     64    256   1024

    (Cores minimally big enough to run a self respecting

    1K cores by 2014! Are we ready?
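
    The doubling rule can be checked against the figures on this slide with a small sketch. It assumes the two-digit years mean 2002 through 2014 and takes the 2002 values (16 cores for research, 4 for industry) as starting points; `projected_cores` is a hypothetical helper name.

```python
def projected_cores(base_cores, base_year, year, doubling_months=18):
    """Core count if the number of cores doubles every `doubling_months`."""
    months = (year - base_year) * 12
    return base_cores * 2 ** (months / doubling_months)


for year in (2002, 2005, 2008, 2011, 2014):
    research = round(projected_cores(16, 2002, year))
    industry = round(projected_cores(4, 2002, year))
    print(f"{year}: research {research:5d}, industry {industry:5d}")
```

    Doubling every 18 months is a factor of 4 every 3 years, so the projection lands on 1024 industry cores in 2014, matching the "1K cores by 2014" claim.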

    Scaling to 1000 Cores

      - Purely optical design scales to about 64 cores
      - After that, clusters of cores share optical hubs
          - ENet and BNet move data to/from optical hub
          - Dedicated, special-purpose electrical networks

    [Figure: 64 optically-connected clusters; electrical networks connect 16 cores to each optical hub. Labels: Proc, Dir$, $, memory, ONet, BNet, ENet, HUB, NET]

    ATAC is an Efficient Network

    Modulators are the primary source of power consumption
      - Receive power: requires only ~2 fJ/bit even with -5 dB link loss
      - Modulator power: Ge-Si EA design, ~75 fJ/bit (assumes 50 fJ/bit for modulator driver)

    Example: 64-core communication
    (i.e. N = 64 cores = 64 wavelengths; for a 32-bit word: 2048 drops/core and 32 adds/core)
      - Receive power: 2 fJ/bit x 1 Gbit/s x 32 bits x N^2 ≈ 262 mW
      - Modulator power: 75 fJ/bit x 1 Gbit/s x 32 bits x N ≈ 153 mW
      - Total energy/bit = 75 fJ/bit + 2 fJ/bit x (N-1) = 201 fJ/bit

    Comparison: electrical broadcast across 64 cores
      - Requires 64 x 150 fJ/bit = 9.6 pJ/bit, about 10 pJ/bit (~50X more power)
        (Assumes 150 fJ/mm/bit, 1-mm spaced tiles)
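
    The formulas on this slide can be evaluated directly. The sketch below uses only the per-bit energies, bit rate, word width, and core count stated above; the function names and default arguments are hypothetical. With N = 64, the receive and modulator formulas evaluate to roughly 0.26 W and 0.15 W.

```python
FJ = 1e-15  # one femtojoule, in joules


def receive_power_w(n_cores, bits=32, rate_bps=1e9, rx_fj_per_bit=2.0):
    """Receive power: every core receives every sender, hence n_cores squared."""
    return rx_fj_per_bit * FJ * rate_bps * bits * n_cores ** 2


def modulator_power_w(n_cores, bits=32, rate_bps=1e9, mod_fj_per_bit=75.0):
    """Modulator power: each core modulates `bits` lines at rate_bps."""
    return mod_fj_per_bit * FJ * rate_bps * bits * n_cores


def optical_energy_per_bit_fj(n_cores, mod_fj=75.0, rx_fj=2.0):
    """One modulation plus one reception at each of the other cores."""
    return mod_fj + rx_fj * (n_cores - 1)


def electrical_energy_per_bit_fj(n_cores, fj_per_mm=150.0, tile_mm=1.0):
    """Electrical broadcast: drive the wire across n_cores tiles."""
    return n_cores * fj_per_mm * tile_mm


n = 64
optical = optical_energy_per_bit_fj(n)
electrical = electrical_energy_per_bit_fj(n)
print(f"Receive power:   {receive_power_w(n) * 1e3:.1f} mW")
print(f"Modulator power: {modulator_power_w(n) * 1e3:.1f} mW")
print(f"Optical broadcast:    {optical:.0f} fJ/bit")
print(f"Electrical broadcast: {electrical / 1e3:.1f} pJ/bit")
print(f"Electrical / optical: {electrical / optical:.1f}x")
```

    The unrounded ratio comes out a little under 48, which the slide rounds to about 50X.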

