TRACK F: Improving Utilization of Acceleration Platforms by Using Off-Platform Test Generation / Arkadiy Morgenshtein

May 1, 2013 1

Improving Utilization of Acceleration Platforms by Using Off-Platform Test Generation

May 1, 2013

Wisam Kadry, Dmitry Krestyashyn, Arkadiy Morgenshtein, Amir Nahir, Vitali Sokhin, Elena Tsanko

IBM Research - Haifa


Outline

Introduction

• Functional verification

• Exercisers for Post-Si validation

• Exercisers on Accelerators (EoA)

Threadmill Overview

• Architecture

• Main features

Offline Generation Mode

• Motivation

• Methodology

Results

• Utilization improvement

• Coverage improvement

Conclusions and Future Work


Typical Functional Verification Flow

[Figure: flow diagram. Blocks shown: Test Template, Random Stimuli Generator, Test, Simulator running the DUV, Checking and Assertions (Pass / Fail), Coverage Information, Coverage Analysis Tool, Coverage Reports.]


Pre and Post Silicon Tradeoffs

[Figure: verification platforms (Software Simulation, Acceleration, Prototyping, Silicon) placed on two axes: Speed (ticks at 10, 1K, 100K, 10M, 1G) and Controllability and Observability.]


Post Silicon Validation Alternatives

• Run operating systems and applications

– Very limited coverage

– Very little variability

– Hard to debug

• Run test-cases generated by pre-silicon test generators

– Long generation time means many servers are needed to feed one silicon platform

– Low utilization due to loading time

– Poor solutions for built-in online checking at test level: Pre-Si checking uses checkers of the simulation platforms, which are unavailable at Post-Si

• Exercisers


Exercisers: Post Silicon Validation Tools

Exerciser: a program that runs on a testing environment (accelerator and/or silicon) and “exercises” the design by testing interesting scenarios on it


Exerciser requirements

• Include a random stimuli generation component (as in pre-silicon)

– Valid stimuli

– Adhere to user requests

– High quality stimuli

– Generate many test-cases from the same test-template

• Simple and fast

– Can run on early bring-up silicon

– Eases debugging

– Increases platform utilization

• Self-contained

– Minimal interaction with the environment

– Loaded once on the DUV, runs “forever”

• Bare-Metal

– Contains OS services required by the test-cases

– Enables complete machine control


Threadmill: IBM Post-Silicon Exerciser

[Figure: Threadmill architecture. Labels shown: Test Template, System Configuration, and Architectural Model & Testing Knowledge; Builder; Exerciser Image containing the Generator & Kernel (Generation, Execution, Checking, OS services) together with the Test Template, Topology and Architectural Model; target platforms Silicon, Accelerator and Reference Model.]


Threadmill - Main Features

Def language for test-templates:

• Rich language to describe the test-plan scenarios

• Multi-threaded support (each thread with its own scenario)

Checking:

• Multi-pass checking: comparing values of architectural resources (GPRs, SPRs, memory) between different executions of the same test-case

• Variability originates from changes to the state of the design

– Timing variations in multithreaded processing

– Randomization of uArch modes of the processors (thread priority, internal control modes)

– Variations in pipeline and cache states

• User ability to specify self-checking as part of the test-case
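The multi-pass checking idea can be illustrated with a small sketch. It is written in Python for readability only and is not Threadmill code; `run_test_case`, the register names and the timing seed are hypothetical stand-ins for executing a test-case on the design under varying timing. The point it shows is that no expected values are needed: the final architectural state of one execution is compared against the others.

```python
import random

def run_test_case(program, initial_regs, timing_seed):
    """Toy stand-in for one execution of a test-case (hypothetical).

    The timing seed only changes how the per-thread instruction streams
    are interleaved, mimicking timing variation between executions.
    """
    regs = dict(initial_regs)
    threads = [list(ops) for ops in program]
    rng = random.Random(timing_seed)
    pending = [i for i, ops in enumerate(threads) if ops]
    while pending:
        tid = rng.choice(pending)            # pick the next thread to advance
        reg, delta = threads[tid].pop(0)     # its next "instruction"
        regs[reg] += delta
        pending = [i for i, ops in enumerate(threads) if ops]
    return regs

def multi_pass_check(program, initial_regs, passes=4):
    """Execute the same test-case several times and compare final states."""
    reference = run_test_case(program, initial_regs, timing_seed=0)
    mismatches = []
    for seed in range(1, passes):
        result = run_test_case(program, initial_regs, timing_seed=seed)
        diff = {r: (reference[r], result[r])
                for r in reference if reference[r] != result[r]}
        if diff:
            mismatches.append((seed, diff))
    return mismatches

# Two threads that touch disjoint registers, so every pass must agree.
program = [
    [("r1", 1), ("r1", 2)],
    [("r2", 5)],
]
initial = {"r1": 0, "r2": 0}
mismatches = multi_pass_check(program, initial)
```

An empty `mismatches` list means all passes agreed. A check of this kind cannot detect a result that is wrong in the same way on every pass.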


Threadmill - Main Features

Generation:

• Concurrent multi-threaded generation

• Light-weight, on-platform

• Static: no reference model and no state tracking

• Very fast: hundreds of tests per second on silicon

• Utilization: 90% generation, 10% execution and checking


Accelerators

• A large number of processors, each of which simulates a small portion of the design and passes its results to the others

• The processors run in parallel, allowing high execution performance

• Orders of magnitude faster than simulation

• Allow good observability and coverage analysis

• Allow test execution of billions of cycles at the pre-Si stage

• The platform is used extensively and simultaneously by multiple projects and locations

• High cost and limited resources demand efficient utilization


Exercisers on Accelerator

• Motivation:

– Verification of early design models: more cycles and longer tests than in simulation

– Debug at bring-up stage (better observability than Si, higher speed than simulation)

– Use of failure event checkers, available only on the Accelerator

– SW validation

– Test quality analysis: coverage (count, specific functions hit)

• Challenges:

– High system cost and limited resource availability dictate a need for improved utilization efficiency

– Tests run by the exercisers should target coverage maximization within the constraints of limited resources

• Proposed approach: Off-Platform Generation


Threadmill Offline Generation Mode

[Figure: two parts. Timelines starting at t0 in which test-cases (TC1 ... TC10) pass through Generation, Execution and Checking and produce results, shown alongside a New Image running on the Accelerator. Build flow with the labels Test Template, Configuration, Architectural Model, Reference Model, Builder, and Exerciser Image (Generator & Kernel with Generation, Execution, Checking and OS services, plus Test Template, Topology and Architectural Model).]


Threadmill Offline Generation Mode

• Create an image with the generator component enabled

– Include empty data structures for the test-cases, memory initializations, translation tables and expected results

• Run the post-silicon application on a software reference model

• Extract the necessary test-case, memory and result data from the run on the software reference model

– Fill the data structures with all the data

• Produce an image that includes all harvested data

– Disable the generator component

• Load the image onto the acceleration platform

• Run the image without the overhead associated with the generation of test-cases and initializations
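The steps above can be sketched as a small pipeline. This is an illustrative Python model under stated assumptions, not the Threadmill build flow: `StoredTestCase`, `Image`, `reference_model`, `build_offline_image` and `run_on_accelerator` are hypothetical names, and the "instructions" are toy register additions. It shows test-cases and expected results being produced off-platform and the final image running with the generator disabled.

```python
import random
from dataclasses import dataclass, field

@dataclass
class StoredTestCase:
    instructions: list   # toy instructions: (register, immediate) pairs
    expected: dict       # expected final register values

@dataclass
class Image:
    generator_enabled: bool
    test_cases: list = field(default_factory=list)

def reference_model(instructions):
    """Hypothetical software reference model: add immediates to registers."""
    regs = {"r0": 0, "r1": 0}
    for reg, imm in instructions:
        regs[reg] += imm
    return regs

def build_offline_image(num_test_cases, seed=0):
    rng = random.Random(seed)
    # Image with the generator enabled and empty data structures.
    staging = Image(generator_enabled=True)
    # Generate on the reference model and harvest test-cases and results.
    for _ in range(num_test_cases):
        instrs = [(rng.choice(["r0", "r1"]), rng.randint(1, 9))
                  for _ in range(5)]
        staging.test_cases.append(
            StoredTestCase(instrs, reference_model(instrs)))
    # Final image: harvested data included, generator disabled.
    return Image(generator_enabled=False, test_cases=staging.test_cases)

def run_on_accelerator(image, execute):
    """Execute stored test-cases and compare against expected results."""
    if image.generator_enabled:
        raise ValueError("offline image must have the generator disabled")
    return [execute(tc.instructions) == tc.expected
            for tc in image.test_cases]

image = build_offline_image(10)
# Here the "platform" is simply the reference model again, so all checks pass.
results = run_on_accelerator(image, execute=reference_model)
```

Passing a different `execute` callable plays the role of a design whose behaviour deviates from the reference model; any deviation shows up as a `False` entry in `results`.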


Offline vs. Regular Generation

Pros

• No cycles “wasted” on on-platform generation

• More test-cases can be run in the same number of cycles

• Higher test coverage can be expected

• Comparison with the SW reference model may reveal “2+2=5” bugs (wrong results that repeat identically across executions)

Cons

• Depends on a reference model

• Loading the larger image affects the number of test-cases


Experimental Setup

• Two example test templates used as benchmarks:

– Random: 100 random instructions

– Directed: some threads perform loads/stores; other threads run a functional scenario

• For each test template 3 images were prepared:

– Regular mode

– Offline mode with 50 test-cases

– Offline mode with 100 test-cases


Results – Random Test

                          Regular mode   Offline mode 50 TC   Offline mode 100 TC
Time to prepare image     0.6 min        8 min                15.8 min
Image size                3.5 MB         23.7 MB              44.3 MB
Total Accelerator cycles  595 M          65 M                 135 M
Num of test-cases         124            50                   100
Cycles per test-case      4.8 M          1.3 M                1.35 M

Accelerator utilization improvement: x3.7


Results – Directed Test

                          Regular mode   Offline mode 50 TC   Offline mode 100 TC
Time to prepare image     0.7 min        10.2 min             17.9 min
Image size                3.7 MB         24.6 MB              45.9 MB
Total Accelerator cycles  295 M          70 M                 145 M
Num of test-cases         42             50                   100
Cycles per test-case      7 M            1.4 M                1.45 M

Accelerator utilization improvement: x5
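The slides do not state how the utilization improvement is computed. The short Python check below assumes it is the ratio of cycles per test-case in regular mode to cycles per test-case in offline mode with 50 test-cases; that assumption reproduces both reported figures. The dictionary layout and function names are illustrative only; the numbers are taken from the two results tables.

```python
# Figures from the Random and Directed results tables.
results = {
    "random": {
        "regular":     {"test_cases": 124, "cycles_per_tc": 4.8e6},
        "offline_50":  {"test_cases": 50,  "cycles_per_tc": 1.3e6},
        "offline_100": {"test_cases": 100, "cycles_per_tc": 1.35e6},
    },
    "directed": {
        "regular":     {"test_cases": 42,  "cycles_per_tc": 7.0e6},
        "offline_50":  {"test_cases": 50,  "cycles_per_tc": 1.4e6},
        "offline_100": {"test_cases": 100, "cycles_per_tc": 1.45e6},
    },
}

def improvement(template, offline_mode="offline_50"):
    """Assumed definition: regular cycles per test-case over offline ones."""
    modes = results[template]
    return (modes["regular"]["cycles_per_tc"]
            / modes[offline_mode]["cycles_per_tc"])

def total_cycles(template, mode):
    """Total accelerator cycles = number of test-cases x cycles per test-case."""
    entry = results[template][mode]
    return entry["test_cases"] * entry["cycles_per_tc"]

for template in results:
    print(f"{template}: x{improvement(template):.1f}")
# prints:
# random: x3.7
# directed: x5.0
```

The same data also cross-checks the tables: `total_cycles("random", "regular")` gives about 595 M cycles, and the directed regular run gives 294 M against the reported 295 M, which is consistent with the per-test-case figure being rounded.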


Coverage Comparison

• About 50,000 coverage events are analyzed in the Accelerator model

• A test of a new special feature of the next Power design was selected for coverage comparison

• Only events related to the specific functionality were analyzed

• Exerciser code does not use the analyzed feature, so there is less coverage “noise”

• Number of covered events (out of 310 analyzed events):

– Offline: 237

– Regular: 209

• Total count of hits of all events:

– Offline: 117,020

– Regular: 56,708
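For a quick sense of scale, the coverage figures above can be turned into percentages and ratios. This is plain arithmetic on the reported numbers; the variable names are illustrative.

```python
ANALYZED_EVENTS = 310
covered = {"offline": 237, "regular": 209}
total_hits = {"offline": 117_020, "regular": 56_708}

# Share of the analyzed events covered in each mode.
coverage_pct = {mode: 100 * n / ANALYZED_EVENTS for mode, n in covered.items()}

# Additional events covered by the offline mode and ratio of total hits.
extra_events = covered["offline"] - covered["regular"]
hit_ratio = total_hits["offline"] / total_hits["regular"]

print(f"offline: {coverage_pct['offline']:.1f}%")   # offline: 76.5%
print(f"regular: {coverage_pct['regular']:.1f}%")   # regular: 67.4%
print(f"extra events: {extra_events}")              # extra events: 28
print(f"hit ratio: {hit_ratio:.2f}")                # hit ratio: 2.06
```

In other words, the offline mode covered 28 more of the 310 analyzed events and accumulated roughly twice as many hits overall.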


Coverage Comparison

[Figure: number of hits (logarithmic axis, 1 to 100,000) per coverage event for two series, hitCounter_offline and hitCounter_regular. Annotations: “Events hit only by Offline” and “Offline achieves more hits for most events”.]


Conclusions and Future Work

• More TCs – higher chance of triggering various scenarios

• Improved coverage

• Quality assessment of test content that is later used at bring-up

• The Offline generation concept may be used in the future as the basis for a dedicated tool for Accelerator-based verification




Thank You!