
Use QED check: Ra is the original register and Ra’ is the corresponding duplicated register. If Ra ≠ Ra’: ERROR DETECTED.

QED Effective Post-Silicon Validation and Debug

Eshan Singh, David Lin, PI: Subhasish Mitra, Robust Systems Group, Stanford University

Post-Silicon Validation Critical

Quick Error Detection

Quick Error Detection Highlights

Symbolic QED

Electrical Bugs

Structured and Effective

10^9X quicker detection, 4X coverage

Automatically localize logic bugs

No failure reproduction, no simulation

Broadly applicable

Cores, uncore, power management, logic & electrical, accelerators

[Figure: post-silicon bug count by year. Label: Pre-silicon verification inadequate. Source: Intel]

“Post-silicon cost & complexity rising faster than design cost”

– S. Yerramilli, V.P., Intel

Design → Pre-silicon Verification → Post-silicon Validation → High Volume Fab

Localization Dominates Cost

Run tests (OS, games) → Detect bugs → Localize bugs → Root-cause & fix

Debug time: 1-4 weeks per bug

Long Error Detection Latency Challenge

[Timeline figure: test execution, with the error detection latency spanning from "Error occurred" to "Error detected". Label: Localization.]

Error detection latency: ideal ~1,000 cycles; reality ~billions of cycles

Intel® 48-Core SCC

Symbolic QED Results

Fast QED using Hardware Support

QED

[Figure: QED transforms original tests (Test 1, Test 2, ..., Test N) into QED family tests (QED Test 1, QED Test 2, ..., QED Test N). Labels: Wide variety, Diversity, Systematic, Automated.]

Error detection latency: guaranteed short

Coverage: improved

Software & hardware approaches

[Figure: detected error count (normalized to QED, axis 0 to 1) versus error detection latency (clock cycles), comparing QED and No-QED. Latency bins: 0-10K and 1-10 Billion. Annotations: 10^6X, 4X.]

Software-only QED: no hardware modifications; bugs inside processor cores, bugs inside uncore components, bugs from power-management features

Hybrid QED: non-programmable accelerators; logic bugs and electrical bugs

Symbolic QED: automatically localize logic bugs; no additional hardware

Fast QED: 0.4% area overhead; very low runtimes

QED Transformation Examples

Fully automated logic bug localization using Bounded Model Checking (BMC)

No trace buffers → No area overhead

Effective for large SoCs

No failure reproduction, no simulation

Collaborator: Prof. Clark Barrett (NYU)

Traditional debug: weeks to months; long bug traces

Automatic S-QED: 20 mins. to 7 hours; 3- to 22-cycle bug traces

[Figure: code running on Core 1, Core 2, ..., Core N, with <PLC mem[1..N]> checks inserted repeatedly on every core.]

A’=A  B’=B  C’=C
A = B * 2
A’ = B’ * 2
Check(A == A’)

D’=D  E’=E  F’=F  G’=G  H’=H
E = F * G
E’ = F’ * G’
Check(E == E’)
H = D + E
H’ = D’ + E’
Check(H == H’)

E’=E  I’=I  J’=J  K’=K
I = E / 2
I’ = E’ / 2
Check(I == I’)
Load J ← mem[7]
Load J’ ← mem[7’]
Check(J == J’)
K = J + 1
K’ = J’ + 1
Check(K == K’)

Lock(1,1’)
Store mem[1] ← C
Store mem[1’] ← C’
Unlock(1,1’)

Lock(5,5’)
Store mem[5] ← H
Store mem[5’] ← H’
Unlock(5,5’)
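The duplicate-and-check pattern in the transformation examples above can be mimicked in a few lines of Python. This is only a minimal sketch, not the authors' tool: the register dictionary, the `_d` suffix standing in for the prime (A’ becomes `A_d`), the `check` helper, and the injected bit flip are all assumptions made for illustration.

```python
class QEDError(Exception):
    """Raised when a register and its duplicate disagree."""


def check(regs, name):
    # QED check: compare register `name` with its duplicate `name_d`.
    original, duplicate = regs[name], regs[name + "_d"]
    if original != duplicate:
        raise QEDError(f"{name} != {name}': {original} vs {duplicate}")


def run_block(b, inject_fault=False):
    regs = {"A": 0, "B": b}
    regs["A_d"], regs["B_d"] = regs["A"], regs["B"]  # A'=A, B'=B
    regs["A"] = regs["B"] * 2                        # A = B * 2
    if inject_fault:
        regs["A"] ^= 1  # hypothetical bit flip in the original result
    regs["A_d"] = regs["B_d"] * 2                    # A' = B' * 2
    check(regs, "A")                                 # Check(A == A')
    return regs["A"]


def fault_is_detected(b):
    # True if the injected fault trips the QED check.
    try:
        run_block(b, inject_fault=True)
    except QEDError:
        return True
    return False
```

Because the check runs immediately after the duplicated instruction, a corrupted result is flagged within a few instructions of the fault, which is the short-latency behavior the poster describes.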

Proactive Load and Check, <PLC mem[1..N]>, on ALL cores and ALL threads:

for ALL i, i’:
    Lock(i)
    Lock(i’)
    Load X ← mem[i]
    Load X’ ← mem[i’]
    Check(X == X’)
    Unlock(i’)
    Unlock(i)
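The PLC loop above might look as follows in Python. This is a sketch under stated assumptions: the memory is modeled as two lists (`mem` for the original locations and `mem_dup` for the duplicated ones), each address has its own `threading.Lock`, and `N`, `qed_store`, and `plc` are hypothetical names rather than part of any QED tool.

```python
import threading

N = 8                                   # hypothetical number of checked addresses
mem = [0] * N                           # original locations mem[i]
mem_dup = [0] * N                       # duplicated locations mem[i']
locks = [threading.Lock() for _ in range(N)]
locks_dup = [threading.Lock() for _ in range(N)]


def qed_store(i, value):
    # Store to mem[i] and mem[i'] while holding both locks.
    with locks[i], locks_dup[i]:
        mem[i] = value
        mem_dup[i] = value


def plc():
    # Proactive Load and Check: return the addresses whose two copies disagree.
    mismatches = []
    for i in range(N):
        # Lock(i), Lock(i'); the `with` statement releases them in reverse order.
        with locks[i], locks_dup[i]:
            x, x_dup = mem[i], mem_dup[i]
            if x != x_dup:
                mismatches.append(i)
    return mismatches
```

Holding both locks during the load pair keeps a concurrent `qed_store` from being observed half-done, so a mismatch reported by `plc()` points at corruption rather than at a race in the test itself.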

IEEE TCAD comments (QED paper)

“All reviewers agree this will be a classic paper for years to come.”

“I will personally pay for page charges if you promise to thank me (anonymously) when you win a major award for this paper!”

Intel (Nagib Hakim, PE)

“QED is revolutionary... Intel is in the process of implementing a prototype of QED. This would enable a whole slew of applications.”

AMD (Jeff Rearick, Senior Fellow)

QED: “magical thinking needed” in ETS keynote.

Freescale (Sharad Kumar, Manager)

“We evaluated QED & are adopting in our tools flow for multi-core debug.”

QED is one such promising technique that we have evaluated and are adopting in our tools flow for multi-core debug.

Proactive Load and Check

Control Flow Tracking Using Software Signatures

if (last_signature == #3) or (last_signature == #4):
    last_signature = #5
else:
    ERROR_DETECTED!
<Block 5>
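The signature check for Block 5 generalizes to a table of legal predecessors per block. The sketch below illustrates that idea in Python; only Block 5's entry (predecessors 3 and 4) comes from the check shown above, while the other edges and the function name are assumptions for this example.

```python
# Legal predecessors of each block. Only Block 5's entry ({3, 4}) is taken from
# the check shown above; the remaining edges are assumed for illustration.
LEGAL_PREDECESSORS = {1: {None}, 2: {1}, 3: {2}, 4: {2}, 5: {3, 4}}


def first_illegal_transition(path):
    """Return the index in `path` of the first block entered illegally, else None."""
    last_signature = None
    for idx, block in enumerate(path):
        if last_signature not in LEGAL_PREDECESSORS[block]:
            return idx              # ERROR_DETECTED
        last_signature = block      # e.g. last_signature = #5 on entering Block 5
    return None
```

For instance, the path 1, 2, 5 is flagged at its third block, because Block 5 may only be entered from Block 3 or Block 4.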

[Figure: control-flow graph of Block 1 to Block 5, with a CFCSS-V check attached to each block. The CFCSS-V check for Block 5 flags an illegal transition: ERROR!]

Freescale SoC Logic Bug

Error detection latency (cycles): Original: 15 Billion; QED: 9
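As a quick sanity check on these two numbers (a back-of-the-envelope calculation, not a figure taken from the poster), the improvement factor is:

```latex
\[
\frac{\text{EDL}_{\text{Original}}}{\text{EDL}_{\text{QED}}}
  = \frac{15 \times 10^{9}\ \text{cycles}}{9\ \text{cycles}}
  \approx 1.7 \times 10^{9}
\]
```

That is an improvement of roughly nine orders of magnitude for this bug.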

[Figure: SoC block diagram with a Random Instruction Test Generator; Core 0, Core 1, Core 2, Core 3, ..., Core N; an interconnection network; shared caches; memory controllers; accelerators; other uncore components.]

[Figure: cumulative memory bugs detected (0% to 100%) versus error detection latency (cycles; 100, 1K, 10K, 10 Billion), QED versus original test. Annotation: 10^6X improved.]

8-Core Industrial Test

[Figure: cumulative bugs detected (0% to 100%) versus error detection latency (clock cycles, 100 to >100M). QED: median EDL 392, maximum EDL 3k. Original test: median EDL 10M, maximum EDL 100M. Annotations: 10^4X, 2X.]

Power Management Bugs

[Figure: error detection latency (cycles, axis 0 to 20k) versus PLC-H checkers count (axis 0 to 140), with area cost between 0.05% and 0.4%.]

0.05% - 0.4% area impact

Fast QED

10^5X quicker detection

2X coverage

No intrusiveness

Runtime: 1.04X – 6X

MBIST reuse

Core, uncore, power management bugs

Uncore Bugs

[Figure: Intel 48-core SCC chip results. 48 processor cores, 0.9V, 800 MHz. Legend: No boot, Pass, QED unique detect, QED enhanced detect, QED quick detect.]

[Figure: cumulative bugs detected (0% to 100%) versus error detection latency (cycles, 100 to 10M). Original: median EDL 241k, maximum EDL 10M. QED: median EDL 675, maximum EDL 8k. Annotations: 10^4X, 2X.]

Difficult Logic Bugs

QED Techniques

Hybrid QED

[Figure: coverage (percentage, 0% to 100%) versus error detection latency (cycles, 1 to 10M). Hybrid QED: mean EDL = 705 cycles. Original: mean EDL = 124k cycles. Annotation: 10^2X improved.]

Accelerator validation and debug

Using high-level synthesis

Collaborator: Prof. Deming Chen (UIUC)

[Figure: cumulative bugs detected (0% to 100%) versus bug trace length (cycles, 0 to >10M). Original: min., mean, max. = 722, 1.9M, 11M. Symbolic QED: min., mean, max. = 13, 20, 29. Annotations: 10^6X, 2X.]

BMC tool: runs automatically, overnight

Inputs to the BMC tool: 1. “Universal” property (QED check + initial state); 2. Partial instances + QED modules

Output: logic bugs localized

1. “Universal” Property: QED Check

What property should the BMC tool check?

CMP Ra == Ra’

QED checks are compositional

Not design/implementation specific

Preserved across partial instances

Unlike traditional properties

2. Partial Instantiation

How to ensure the design fits in the BMC tool?

Systematically instantiate only the modules needed to activate the bug

BMC tool finds a bug trace
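To make the idea of checking Ra == Ra’ over all short instruction sequences concrete, here is a toy stand-in written in Python. It is not a real BMC tool (those work symbolically with SAT/SMT solvers): it simply enumerates every sequence up to a bound on an invented 4-bit, four-register design. The instruction set, the injected bug (an ADD issued right after a MUL flips bit 0), and all function names are assumptions for this sketch.

```python
from itertools import product

OPS = ("ADD", "MUL")
MASK = 0xF  # 4-bit toy datapath


def step(regs, prev_op, op, rd, rs, buggy):
    # Execute one instruction on the toy design and return the new register tuple.
    a, b = regs[rd], regs[rs]
    result = (a + b) & MASK if op == "ADD" else (a * b) & MASK
    if buggy and op == "ADD" and prev_op == "MUL":
        result ^= 1  # hypothetical logic bug: ADD right after MUL flips bit 0
    new = list(regs)
    new[rd] = result
    return tuple(new)


def qed_violated(seq, init, buggy):
    # R0, R1 are original registers; R2, R3 are their duplicates.
    regs = (init[0], init[1], init[0], init[1])  # QED-consistent initial state
    prev = None
    for offset in (0, 2):  # run the original instructions, then the duplicates
        for op, rd, rs in seq:
            regs = step(regs, prev, op, rd + offset, rs + offset, buggy)
            prev = op
    return regs[0] != regs[2] or regs[1] != regs[3]  # QED check: Ra != Ra'


def find_bug_trace(bound, buggy=True):
    # Exhaustive bounded search, shortest sequences first.
    instrs = list(product(OPS, (0, 1), (0, 1)))
    inits = list(product(range(MASK + 1), repeat=2))
    for length in range(1, bound + 1):
        for seq in product(instrs, repeat=length):
            if any(qed_violated(seq, init, buggy) for init in inits):
                return seq
    return None
```

Calling `find_bug_trace(3)` returns a two-instruction trace (an ADD followed by a MUL, so that the duplicated ADD runs right after a MUL), while `find_bug_trace(2, buggy=False)` returns `None`. The same check works for any bug in this toy design, which is the sense in which the QED check acts as a design-independent property.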

[Figure: full design with Cores 0-7, L2Banks 0-7, Memory controllers 0-3, I/O controllers, and a Crossbar interconnect.]

Reduce instances, keeping at least 1 core. Partial instances shown:

Core 0, L2Bank 0, Crossbar interconnect

Core 0, L2Bank 0, Memory controller 0, Crossbar interconnect

Core 1, Core 0, L2Bank 0, Crossbar interconnect, Memory controller 0

Run each. Outcome labels: No Trace Found; Trace Found; Trace Found; Best Localization.
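The reduce-and-run loop can be sketched as a search over module subsets. The Python below is a hypothetical illustration, not the authors' flow: it assumes that "best localization" means the smallest partial instance that still yields a trace, and `bmc_finds_trace` is a placeholder callback standing in for an actual BMC run.

```python
from itertools import combinations


def best_localization(modules, bmc_finds_trace):
    """Return the smallest partial instance (with at least one core) that yields a trace."""
    for size in range(1, len(modules) + 1):
        for instance in combinations(modules, size):
            if not any(m.startswith("Core") for m in instance):
                continue  # keep at least 1 core
            if bmc_finds_trace(frozenset(instance)):
                return set(instance)
    return None


MODULES = ["Core 0", "Core 1", "L2Bank 0", "Memory controller 0", "Crossbar interconnect"]


def toy_bmc(instance):
    # Hypothetical stand-in for a BMC run: here the bug needs a core,
    # the crossbar, and an L2 bank to be activated.
    return {"L2Bank 0", "Crossbar interconnect"} <= instance and any(
        m.startswith("Core") for m in instance
    )
```

With this toy callback, `best_localization(MODULES, toy_bmc)` returns the three-module instance of Core 0, L2Bank 0, and the crossbar. Exhaustive enumeration is exponential in the number of modules, so a real flow would presumably prune the candidates rather than try every subset.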
