7 eti pres

Post on 23-Jun-2015

SCC ETI Power

©2011 ET International, Inc

ETI SCC Baremetal Framework: Bandwidth and Power Findings

Rishi Khan, 3/30/11

Outline

• SCC Framework Overview
• Bandwidth Findings
• Power Findings
• Software Access

SCC Framework Overview

Messaging Goals

• Asynchronous communications
• Single threaded
• Possibly long latency until data is received
• Maximize bandwidth
• Handle big and small messages
• Extensible layer that supports MPI, BSD sockets, etc.

Design Choices

• One channel per core-pair per direction
• Large window size (up to 1 MB/channel)
• Fast polling of incoming data (use MPB)
• Circular buffer with 16 slots and read/write pointers
• Poll local pointers, signal remote pointers
• Use separate cache lines to avoid locking
    - 2 cache lines * 48 channels = 3K per core
• Double map read and write pages
    - Read: L2 cache enabled
    - Write: L2 cache disabled (write back)

Circular Buffer Example

[Figure: one channel between two cores]
• Core 0 (reader): Cache, MPB; Channel->local_read, Channel->mpb_write
• Core 1 (writer): Cache, MPB; Channel->local_write, Channel->mpb_read
• DRAM: Channel->body[]

Steps shown in the figure:
• Is there space?
• Write the data (with length as first 2 bytes)
• Update write pointer
• Poll local write pointer
• Read data
• Update read pointer
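The steps in the figure can be sketched as a small single-process ring buffer. This is a minimal illustration, not ETI's implementation: the names `channel_t`, `channel_put`, `channel_get`, and the slot size `SLOT_BYTES` are hypothetical, the separate local and MPB copies of each pointer are collapsed into one pair of counters, and no cache control or memory barriers are shown. Only the 16 slots and the 2-byte length prefix come from the slides.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SLOTS      16    /* slot count from the slides */
#define SLOT_BYTES 256   /* hypothetical slot size, chosen for illustration */

typedef struct {
    volatile uint32_t write_idx;        /* advanced only by the writer */
    volatile uint32_t read_idx;         /* advanced only by the reader */
    uint8_t body[SLOTS][SLOT_BYTES];    /* message slots (DRAM in the slides) */
} channel_t;

/* Writer side: returns 0 on success, -1 if the ring is full or len is too big. */
static int channel_put(channel_t *ch, const void *buf, uint16_t len)
{
    assert(ch != NULL && buf != NULL);
    if (len > SLOT_BYTES - 2)
        return -1;
    if (ch->write_idx - ch->read_idx == SLOTS)   /* "Is there space?" */
        return -1;
    uint8_t *slot = ch->body[ch->write_idx % SLOTS];
    memcpy(slot, &len, 2);                       /* length as first 2 bytes */
    memcpy(slot + 2, buf, len);                  /* write the data */
    ch->write_idx = ch->write_idx + 1;           /* update write pointer */
    return 0;
}

/* Reader side: returns the message length, or -1 if nothing is pending
 * or the caller's buffer is too small. */
static int channel_get(channel_t *ch, void *buf, size_t cap)
{
    assert(ch != NULL && buf != NULL);
    if (ch->read_idx == ch->write_idx)           /* poll the write pointer */
        return -1;
    const uint8_t *slot = ch->body[ch->read_idx % SLOTS];
    uint16_t len;
    memcpy(&len, slot, 2);
    if (len > cap)
        return -1;
    memcpy(buf, slot + 2, len);                  /* read data */
    ch->read_idx = ch->read_idx + 1;             /* update read pointer */
    return (int)len;
}
```

Because each counter has exactly one writer, the two sides never need a lock, which is the point of keeping the pointers on separate cache lines in the design above.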

Socket API

• int stream_recv(int nid, void *buf, size_t len, int nb);
• int stream_send(int nid, const void *buf, size_t len);

[Figure: Messaging Bandwidth (MB/sec, 0 to 120) vs. Message Size (bytes), with L1 and L2 marked. Series: Intel RCCE; ETI Streams (DRAM, Blocking); ETI Streams (MPBs, Blocking); ETI Streams (MPBs, Non-Blocking)]

MPI

[Figure: Messaging Bandwidth (MB/sec, 0 to 120) vs. Message Size (bytes), with L1 and L2 marked. Series: Linux MPI (Intel, Blocking, TCP); Baremetal MPI (ETI, Blocking); Baremetal MPI (ETI, Non-blocking); RCKMPI]

Power Goals

• External monitoring of voltage and current
• Backend Power API
    - Update time functions with frequency changes
    - Keep chip under safe conditions!!
• Internal synchronization of clocks
• External synchronization of host and SCC

External Monitoring

• Read /opt/sccKit/systemSettings.ini
• Telnet to the BMC (port 5010)
• Request status / parse data
• Store timestamps

Backend Power API

• power_session scc_open_power(heap h);
• void scc_close_power(power_session ps);
• int scc_set_freq(power_session ps, u32 requested_frequency);
• int scc_set_voltage(power_session ps, u32 requested_millivolts);
• char* scc_error_string(status_code code);

Allowable Frequency (MHz) by Voltage (V); X = allowed, - = not allowed

Voltage | 100 106 114 123 133 145 160 178 200 266 320 400 533 800
0.7     |  X   X   X   X   X   X   X   X   X   X   X   X   -   -
0.8     |  X   X   X   X   X   X   X   X   X   X   X   X   X   -
0.9     |  X   X   X   X   X   X   X   X   X   X   X   X   X   -
1.0     |  X   X   X   X   X   X   X   X   X   X   X   X   X   -
1.1     |  X   X   X   X   X   X   X   X   X   X   X   X   X   X
1.2     |  X   X   X   X   X   X   X   X   X   X   X   X   X   X
1.3     |  X   X   X   X   X   X   X   X   X   X   X   X   X   X
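A caller might check a requested setting against the table before invoking scc_set_voltage or scc_set_freq. The sketch below is one way to encode that check, under a stated assumption: the table gives 12, 13, or 14 marks per voltage row, and the code assumes those marks fill each row from the lowest frequency upward. The names `freq_mhz`, `vrow_t`, `rows`, and `setting_allowed` are hypothetical, and the allowable combinations are chip specific, so the values here are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Frequency columns of the table, in MHz, left to right. */
static const unsigned freq_mhz[14] = {
    100, 106, 114, 123, 133, 145, 160, 178, 200, 266, 320, 400, 533, 800
};

/* One table row: a voltage and how many columns carry a mark.
 * Assumption: marks run from the lowest frequency upward. */
typedef struct {
    unsigned millivolts;
    size_t   marked;
} vrow_t;

static const vrow_t rows[] = {
    {  700, 12 }, {  800, 13 }, {  900, 13 }, { 1000, 13 },
    { 1100, 14 }, { 1200, 14 }, { 1300, 14 }
};

/* Returns 1 if (millivolts, mhz) is marked in the table, 0 otherwise.
 * Voltages or frequencies that are not in the table are rejected. */
static int setting_allowed(unsigned millivolts, unsigned mhz)
{
    for (size_t r = 0; r < sizeof rows / sizeof rows[0]; r++) {
        if (rows[r].millivolts != millivolts)
            continue;
        assert(rows[r].marked <= sizeof freq_mhz / sizeof freq_mhz[0]);
        for (size_t c = 0; c < rows[r].marked; c++) {
            if (freq_mhz[c] == mhz)
                return 1;
        }
        return 0;
    }
    return 0;
}
```

For example, `setting_allowed(700, 400)` returns 1 while `setting_allowed(700, 533)` returns 0 under this encoding.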

Internal Synchronization

• Cores come out of sccReset in 20ms intervals
• Each core's clock starts at cycle 0 at reset
• Each core's frequency may be different
• Solution:
    - Set all cores to 400MHz
    - Barrier
    - After barrier, set internal integrator to 0

Formulas for Time

• Use this formula for time:
    count = scc_cycle_count() - _integral_cycle;
    ns = _integral_time_ns + count * _current_ns_in_cycles;

• Use this for frequency change:
    current_time = scc_cycle_count();
    _integral_time_ns += (current_time - _integral_cycle) * _current_ns_in_cycles;
    _integral_cycle = current_time;
    _current_ns_in_cycles = 1.0e9 / ((double)_global_clock / (double)freq_divider);

[Figure: timeline relating Freq, Integral Time, _integral_cycle, and scc_cycle_count()]
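The two formulas can be tried off-chip with a small sketch. It follows the slide's arithmetic but makes some assumptions: the type `scc_clock_t` and the functions `clock_reset`, `clock_now_ns`, and `clock_change_divider` are hypothetical names, `_global_clock` is taken to be in Hz, and the cycle count is passed in as an argument because scc_cycle_count() exists only on the SCC.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    double   global_clock_hz;   /* _global_clock, assumed to be in Hz */
    uint64_t integral_cycle;    /* cycle count at the last frequency change */
    double   integral_time_ns;  /* time accumulated up to integral_cycle */
    double   ns_per_cycle;      /* _current_ns_in_cycles */
} scc_clock_t;

/* Nanoseconds per core cycle for a given divider of the global clock. */
static double ns_per_cycle_for(double global_clock_hz, unsigned freq_divider)
{
    assert(freq_divider > 0);
    return 1.0e9 / (global_clock_hz / (double)freq_divider);
}

/* Start a new time base: the "set internal integrator to 0" step. */
static void clock_reset(scc_clock_t *c, double global_clock_hz,
                        unsigned freq_divider, uint64_t now_cycles)
{
    c->global_clock_hz  = global_clock_hz;
    c->integral_cycle   = now_cycles;
    c->integral_time_ns = 0.0;
    c->ns_per_cycle     = ns_per_cycle_for(global_clock_hz, freq_divider);
}

/* Formula for time: accumulated time plus cycles since the last change. */
static double clock_now_ns(const scc_clock_t *c, uint64_t now_cycles)
{
    uint64_t count = now_cycles - c->integral_cycle;
    return c->integral_time_ns + (double)count * c->ns_per_cycle;
}

/* Formula for frequency change: fold elapsed time into the integral,
 * move the reference cycle, then switch to the new cycle length. */
static void clock_change_divider(scc_clock_t *c, unsigned freq_divider,
                                 uint64_t now_cycles)
{
    c->integral_time_ns = clock_now_ns(c, now_cycles);
    c->integral_cycle   = now_cycles;
    c->ns_per_cycle     = ns_per_cycle_for(c->global_clock_hz, freq_divider);
}
```

As a worked case, assuming a 1.6 GHz global clock: with divider 4 (400 MHz, 2.5 ns per cycle), 400 cycles give 1000 ns; switching to divider 2 (800 MHz, 1.25 ns per cycle) at cycle 400 and reading at cycle 1200 gives 1000 + 800 * 1.25 = 2000 ns.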

Syncing Front/Back

• Change voltage from 0.7 to 1.1 every 1 second
• Measure changes on frontend
• Cannot get better than 0.5 seconds

[Figure: two plots over 0 to 20 on the x-axis. Top: Amps (20 to 22.5). Bottom: Residuals (0 to 0.6)]

Bug in BMC Voltage Readings

• 3 power islands
• Drop voltage from 1.2 to 0.7 immediately
• Raise voltage after 20 seconds

[Figure: two plots against Time (0 to 40). Top: Voltage (0.7 to 1.2) for V0, V1, V2. Bottom: Amps (17 to 23). Annotations: 20.5 Seconds, 0.6 Seconds]

Other SCC issues

• If more than 24 cores pound on one MPB, contention overtakes the system.
    - Sleep required between polling
• Allowable voltage/freq are chip specific
• BMC telnet response is > 100ms
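The first issue above, sleeping between polls so that many cores do not hammer one MPB, can be sketched as a polling loop with a pause. This is only an illustration: `poll_with_sleep`, `poll_fn`, and `sleep_fn` are hypothetical names, the poll and sleep operations are passed in as callbacks because the real ones are platform specific, and `ready_after` and `fake_sleep` are stand-ins for trying the loop off-chip.

```c
#include <assert.h>
#include <stddef.h>

typedef int  (*poll_fn)(void *ctx);              /* nonzero when data is ready */
typedef void (*sleep_fn)(unsigned microseconds); /* platform-specific pause */

/* Polls until data is ready, pausing between attempts.
 * Returns the number of polls used, or -1 after max_polls failed attempts. */
static int poll_with_sleep(poll_fn poll, void *ctx, sleep_fn pause_for,
                           unsigned sleep_us, unsigned max_polls)
{
    assert(poll != NULL && pause_for != NULL);
    for (unsigned i = 0; i < max_polls; i++) {
        if (poll(ctx))
            return (int)(i + 1);
        pause_for(sleep_us);   /* back off so other cores can reach the MPB */
    }
    return -1;
}

/* Stand-ins for off-chip experiments (not part of any SCC API). */
static unsigned slept_us_total;

static void fake_sleep(unsigned microseconds)
{
    slept_us_total += microseconds;   /* record the pause instead of waiting */
}

static int ready_after(void *ctx)
{
    unsigned *remaining = ctx;        /* polls left before data "arrives" */
    if (*remaining == 0)
        return 1;
    (*remaining)--;
    return 0;
}
```

With `remaining` set to 3 and a 50 microsecond pause, the loop succeeds on the fourth poll after three pauses. The right pause length depends on the chip and the number of pollers and is not given in the slides.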

Future Work

• DARPA UHPC: Study how voltage/freq affect power dissipation
• Allan Snavely (UCSD)
    - Systematically study loops over a number of parameters to find the best voltage/freq.
    - Create formulas to approximate good power settings for unknown loops

Access to Software

• Email scc-support@etinternational.com
• Beta available
• Considering open sourcing SCC-specific portions of our work for others to test/learn/improve

Acknowledgements

• Mark Deazley (ETI)
• Eric Hoffman (ETI)
• Allan Snavely (UCSD)
• Intel:
    - Tim Mattson
    - Ted Kubaska
    - Rob Noradki
    - Wilf Pinfold, Shekhar Borkar (UHPC)