©2011 ET International, Inc
ETI SCC Baremetal Framework: Bandwidth and Power Findings
Rishi Khan, 3/30/11

Outline
• SCC Framework Overview
• Bandwidth Findings
• Power Findings
• Software Access

SCC Framework Overview

Messaging Goals
• Asynchronous communications
• Single threaded
• Possibly long latency until data is received
• Maximize bandwidth
• Handle big and small messages
• Extensible layer that supports MPI, BSD sockets, etc.

Design Choices
• One channel per core-pair per direction
• Large window size (up to 1 MB/channel)
• Fast polling of incoming data (use the MPB)
• Circular buffer with 16 slots and read/write pointers
• Poll local pointers, signal remote pointers
• Use separate cache lines to avoid locking
  - 2 cache lines * 48 channels = 3K per core (with the SCC's 32-byte cache lines: 2 * 48 * 32 bytes = 3072 bytes)
• Double map read and write pages
  - Read: L2 cache enabled
  - Write: L2 cache disabled (write back)

Circular Buffer Example
[Diagram: one channel between Core 0 (reader) and Core 1 (writer)]
• Core 0 (reader): Channel->local_read in cache, Channel->mpb_write in its MPB
• Core 1 (writer): Channel->local_write in cache, Channel->mpb_read in its MPB
• DRAM: Channel->body[]
• Writer steps: Is there space? Write the data (with length as first 2 bytes). Update write pointer.
• Reader steps: Poll local write pointer. Read data. Update read pointer.
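
The writer and reader steps above can be sketched as a single-producer, single-consumer ring in plain C. This is a minimal sketch under stated assumptions, not ETI's implementation: the names channel_t, chan_send, and chan_recv and the 64-byte slot size are hypothetical, ordinary memory stands in for the MPB and DRAM, and the cache control and ordering that real SCC hardware needs are left out. Only the 16 slots and the 2-byte length prefix come from the slides.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_SLOTS  16   /* from the slides: circular buffer with 16 slots */
#define SLOT_BYTES 64   /* hypothetical slot size, chosen for illustration */

/* Hypothetical channel layout. On the SCC the two pointers would sit in
 * separate MPB cache lines and body[] in DRAM; here it is ordinary memory. */
typedef struct {
    volatile uint32_t write_idx;  /* advanced only by the writer */
    volatile uint32_t read_idx;   /* advanced only by the reader */
    uint8_t body[NUM_SLOTS][SLOT_BYTES];
} channel_t;

/* Writer side. Returns 0 on success, -1 if the ring is full or the
 * message does not fit in one slot. */
int chan_send(channel_t *ch, const void *buf, uint16_t len)
{
    assert(ch != NULL && buf != NULL);
    if (len > SLOT_BYTES - 2) return -1;
    uint32_t w = ch->write_idx;
    if (w - ch->read_idx == NUM_SLOTS) return -1;  /* is there space? */
    uint8_t *slot = ch->body[w % NUM_SLOTS];
    memcpy(slot, &len, 2);            /* length as the first 2 bytes */
    memcpy(slot + 2, buf, len);       /* write the data */
    ch->write_idx = w + 1;            /* update write pointer */
    return 0;
}

/* Reader side. Returns the message length, or -1 if nothing is pending
 * or the caller's buffer is too small. */
int chan_recv(channel_t *ch, void *buf, size_t cap)
{
    assert(ch != NULL && buf != NULL);
    uint32_t r = ch->read_idx;
    if (r == ch->write_idx) return -1;  /* poll the write pointer */
    const uint8_t *slot = ch->body[r % NUM_SLOTS];
    uint16_t len;
    memcpy(&len, slot, 2);
    if (len > cap) return -1;
    memcpy(buf, slot + 2, len);         /* read data */
    ch->read_idx = r + 1;               /* update read pointer */
    return (int)len;
}
```

Because each index is written by exactly one side, neither function needs a lock, which mirrors the "separate cache lines to avoid locking" design choice.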

Socket API
• int stream_recv(int nid, void *buf, size_t len, int nb);
• int stream_send(int nid, const void *buf, size_t len);
[Figure: messaging bandwidth (MB/sec, axis 0 to 120) versus message size (bytes) for Intel RCCE, ETI Streams (DRAM, Blocking), ETI Streams (MPBs, Blocking), and ETI Streams (MPBs, Non-Blocking); L1 and L2 are marked on the plot.]

MPI
[Figure: messaging bandwidth (MB/sec, axis 0 to 120) versus message size (bytes) for Linux MPI (Intel, Blocking, TCP), Baremetal MPI (ETI, Blocking), Baremetal MPI (ETI, Non-blocking), and RCKMPI; L1 and L2 are marked on the plot.]

Power Goals
• External monitoring of voltage and current
• Backend Power API
  - Update time functions with frequency changes
  - Keep chip under safe conditions!!
• Internal synchronization of clocks
• External synchronization of host and SCC

External Monitoring
• Read /opt/sccKit/systemSettings.ini
• Telnet BMC 5010
• Request status / parse data
• Store timestamps

Backend Power API
• power_session scc_open_power(heap h);
• void scc_close_power(power_session ps);
• int scc_set_freq(power_session ps, u32 requested_frequency);
• int scc_set_voltage(power_session ps, u32 requested_millivolts);
• char* scc_error_string(status_code code);
Allowable frequency (MHz) by voltage (V); X marks an allowed combination:
V \ MHz  100 106 114 123 133 145 160 178 200 266 320 400 533 800
0.7       X   X   X   X   X   X   X   X   X   X   X   X
0.8       X   X   X   X   X   X   X   X   X   X   X   X   X
0.9       X   X   X   X   X   X   X   X   X   X   X   X   X
1.0       X   X   X   X   X   X   X   X   X   X   X   X   X
1.1       X   X   X   X   X   X   X   X   X   X   X   X   X   X
1.2       X   X   X   X   X   X   X   X   X   X   X   X   X   X
1.3       X   X   X   X   X   X   X   X   X   X   X   X   X   X
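
One way to use the table in software is a small lookup that returns the lowest tabulated voltage permitting a requested frequency. The sketch below is illustrative only. It assumes that the X marks in each row cover the lowest frequencies (the column positions are not recoverable from the slide text, only the count per row), and the name min_millivolts_for is hypothetical, not part of the Backend Power API. The slides also note that allowable voltage/frequency pairs are chip specific, so real values would have to be measured per chip.

```c
#include <assert.h>
#include <stddef.h>

/* Frequency columns from the table, in MHz. */
static const unsigned FREQS_MHZ[] = {
    100, 106, 114, 123, 133, 145, 160, 178, 200, 266, 320, 400, 533, 800
};
#define NUM_FREQS (sizeof FREQS_MHZ / sizeof FREQS_MHZ[0])

/* Rows: voltage in millivolts and the number of X marks in that row.
 * ASSUMPTION: the marks cover the lowest `allowed` frequencies. */
static const struct { unsigned millivolts; size_t allowed; } ROWS[] = {
    {700, 12}, {800, 13}, {900, 13}, {1000, 13},
    {1100, 14}, {1200, 14}, {1300, 14}
};
#define NUM_ROWS (sizeof ROWS / sizeof ROWS[0])

/* Return the lowest tabulated voltage (mV) that allows freq_mhz,
 * or 0 if the frequency is not a column of the table. */
unsigned min_millivolts_for(unsigned freq_mhz)
{
    size_t col = NUM_FREQS;
    for (size_t i = 0; i < NUM_FREQS; i++)
        if (FREQS_MHZ[i] == freq_mhz) col = i;
    if (col == NUM_FREQS) return 0;

    for (size_t r = 0; r < NUM_ROWS; r++) {
        assert(ROWS[r].allowed <= NUM_FREQS);
        if (col < ROWS[r].allowed) return ROWS[r].millivolts;
    }
    return 0;
}
```

A caller could pass the result to scc_set_voltage before raising the frequency with scc_set_freq, so that the chip never runs a frequency the current voltage does not support.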

Internal Synchronization
• Cores come out of sccReset in 20 ms intervals
• Each core's clock starts at cycle 0 at reset
• Each core's frequency may be different
• Solution:
  - Set all cores to 400 MHz
  - Barrier
  - After the barrier, set the internal integrator to 0

Formulas for Time
• Use this formula for time:
    count = scc_cycle_count() - _integral_cycle;
    ns = _integral_time_ns + count * _current_ns_in_cycles;
• Use this for a frequency change (read the cycle counter once so the same value updates both fields):
    current_time = scc_cycle_count();
    _integral_time_ns += (current_time - _integral_cycle) * _current_ns_in_cycles;
    _integral_cycle = current_time;
    _current_ns_in_cycles = 1.0e9 / ((double)_global_clock / (double)freq_divider);
[Figure: timeline of integral time across frequency (Freq) changes, with scc_cycle_count() and _integral_cycle marked.]
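
The two formulas can be exercised off-chip with a small piecewise integrator. This is a sketch under assumptions: the struct and function names (scc_clock_t, clock_init, clock_now_ns, clock_set_divider) are hypothetical, and the cycle count is passed in as a parameter instead of calling scc_cycle_count(), which is only available on the SCC.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical state mirroring the slides' variables. */
typedef struct {
    uint64_t integral_cycle;    /* cycle count at the last frequency change */
    double   integral_time_ns;  /* nanoseconds accumulated up to that point */
    double   ns_per_cycle;      /* the slides' _current_ns_in_cycles */
    double   global_clock_hz;   /* the slides' _global_clock */
} scc_clock_t;

static double ns_per_cycle_for(double global_clock_hz, unsigned divider)
{
    assert(divider > 0);
    return 1.0e9 / (global_clock_hz / (double)divider);
}

void clock_init(scc_clock_t *c, double global_clock_hz,
                unsigned divider, uint64_t now_cycles)
{
    c->integral_cycle   = now_cycles;
    c->integral_time_ns = 0.0;
    c->global_clock_hz  = global_clock_hz;
    c->ns_per_cycle     = ns_per_cycle_for(global_clock_hz, divider);
}

/* Formula for time: accumulated ns plus cycles since the last change. */
double clock_now_ns(const scc_clock_t *c, uint64_t now_cycles)
{
    uint64_t count = now_cycles - c->integral_cycle;
    return c->integral_time_ns + (double)count * c->ns_per_cycle;
}

/* Frequency change: fold elapsed cycles into the integral at the old
 * rate, then switch to the new ns-per-cycle value. */
void clock_set_divider(scc_clock_t *c, unsigned divider, uint64_t now_cycles)
{
    c->integral_time_ns = clock_now_ns(c, now_cycles);
    c->integral_cycle   = now_cycles;
    c->ns_per_cycle     = ns_per_cycle_for(c->global_clock_hz, divider);
}
```

For example, with an assumed 1.6 GHz global clock and divider 4 (400 MHz), 400 cycles are 1000 ns; switching to divider 2 at that point and running 800 more cycles adds another 1000 ns, so time stays continuous across the change.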

Syncing Front/Back
• Change voltage from 0.7 V to 1.1 V every 1 second
• Measure changes on the frontend
• Cannot get synchronization better than 0.5 seconds
[Figure: two panels over a horizontal axis from 0 to 20: measured current in Amps (20 to 22.5) and Residuals (0 to 0.6).]

Bug in BMC Voltage Readings
• 3 power islands
• Drop voltage from 1.2 V to 0.7 V immediately
• Raise voltage after 20 seconds
[Figure: two panels over Time from 0 to 40: Voltage (0.7 to 1.2) for islands V0, V1, and V2, and Amps (17 to 23); annotated intervals of 20.5 seconds and 0.6 seconds.]

Other SCC issues
• If more than 24 cores pound on one MPB, contention overwhelms the system; a sleep is required between polls
• Allowable voltage/freq values are chip specific
• BMC telnet response time is > 100 ms
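
The "sleep between polls" remedy can be sketched as a polling loop that calls a delay hook after every unsuccessful check. Everything named here is hypothetical (poll_with_sleep, delay_fn, and the sim_writer_t helper that stands in for a remote core); on the SCC the delay would be whatever wait primitive the baremetal runtime provides, which the slides do not specify.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical delay hook: called once between consecutive polls. */
typedef void (*delay_fn)(void *ctx);

/* Poll *flag until it differs from last_seen, pausing between polls so
 * that many cores do not hit the same MPB back to back.
 * Returns the number of polls performed, or -1 after max_polls. */
int poll_with_sleep(volatile const uint32_t *flag, uint32_t last_seen,
                    int max_polls, delay_fn delay, void *ctx)
{
    assert(flag != 0 && delay != 0 && max_polls > 0);
    for (int polls = 1; polls <= max_polls; polls++) {
        if (*flag != last_seen) return polls;
        delay(ctx);   /* back off before touching the MPB again */
    }
    return -1;
}

/* Off-chip stand-in for a remote writer: bumps the flag once a given
 * number of delays has elapsed. Only for trying the sketch on a host. */
typedef struct {
    volatile uint32_t *flag;
    int delays_left;
} sim_writer_t;

void sim_delay(void *ctx)
{
    sim_writer_t *s = (sim_writer_t *)ctx;
    if (--s->delays_left == 0) *s->flag += 1;
}
```

Keeping the delay behind a function pointer leaves the polling logic unchanged whether the pause is a timed sleep, a cycle-count spin, or a simulated writer as above.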

Future Work
• DARPA UHPC: Study how voltage/freq affect power dissipation
• Allan Snavely (UCSD)
  - Systematically study loops over a number of parameters to find the best voltage/freq
  - Create formulas to approximate good power settings for unknown loops

Access to Software
• Email [email protected]
• Beta available
• Considering open sourcing SCC-specific portions of our work for others to test/learn/improve

Acknowledgements
• Mark Deazley (ETI)
• Eric Hoffman (ETI)
• Allan Snavely (UCSD)
• Intel:
  - Tim Mattson
  - Ted Kubaska
  - Rob Noradki
  - Wilf Pinfold, Shekhar Borkar (UHPC)