Improving Utilization of Acceleration Platforms by Using Off-Platform Test Generation
May 1, 2013
Wisam Kadry, Dmitry Krestyashyn, Arkadiy Morgenshtein,
Amir Nahir, Vitali Sokhin, Elena Tsanko
IBM Research - Haifa
Outline
Introduction
• Functional verification
• Exercisers for Post-Si validation
• Exercisers on Accelerators (EoA)
Threadmill Overview
• Architecture
• Main features
Offline Generation Mode
• Motivation
• Methodology
Results
• Utilization improvement
• Coverage improvement
Conclusions and Future Work
Typical Functional Verification Flow
[Figure: verification flow diagram. Blocks: Test Template, Random Stimuli Generator, Test, DUV, Simulator, Checking and Assertions, Pass/Fail, Coverage Information, Coverage Analysis Tool, Coverage Reports.]
Pre and Post Silicon Tradeoffs
[Figure: the platforms Software Simulation, Acceleration, Prototyping and Silicon placed by Speed (axis ticks 10, 1K, 100K, 10M, 1G) against Controllability and Observability.]
Post Silicon Validation Alternatives
• Run operating systems and applications
– Very limited coverage
– Very little variability
– Hard to debug
• Run test-cases generated by pre-silicon test-generators
– Long generation time means many servers are needed to feed one silicon platform
– Low utilization due to loading time
– Poor solutions for built-in online checking at test level: Pre-Si checking uses checkers of the simulation platforms, which are unavailable at Post-Si
• Exercisers
Exercisers: Post Silicon Validation Tools
Exerciser: a program that runs on a testing environment (accelerator and/or silicon) and “exercises” the design by testing interesting scenarios on it
Exerciser requirements
• Include a random stimuli generation component (as in pre-silicon)
– Valid stimuli
– Adhere to user requests
– High quality stimuli
– Generate many test-cases from the same test-template
• Simple and fast
– Can run on early bring-up silicon
– Eases debugging
– Increases platform utilization
• Self-contained
– Minimal interaction with the environment
– Loaded once on the DUV, runs “forever”
• Bare-Metal
– Contains OS services required by the test-cases
– Enables complete machine control
Threadmill: IBM Post-Silicon Exerciser
[Figure: Threadmill architecture. Inputs: Test Template, System Configuration (Topology), Architectural Model & Testing Knowledge. A Builder combines them with the Generator & Kernel into an Exerciser Image, which holds the Generation, Checking, Execution and OS services components together with the Test Template, Topology and Architectural Model. Target platforms shown: Silicon, Accelerator, Reference Model.]
Threadmill - Main Features
Def language for test-templates:
• Rich language to describe the test-plan scenarios
• Multi-threaded support (each thread with its own scenario)
Checking:
• Multi-pass checking: comparing values of architectural resources (GPRs, SPRs, memory) between different executions of the same test-case
• Variability originates from changes to the state of the design
– Timing variations in multithreaded processing
– Randomization of uArch modes of the processors (thread priority, internal control modes)
– Variations in pipeline and cache states
• User ability to specify self-checking as part of the test-case
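The multi-pass checking idea can be illustrated with a small Python sketch. This is not Threadmill code: the toy instruction format, the three "design" functions and the use of a pass index to stand in for timing and uArch variability are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

Instruction = Tuple[str, int]   # toy instruction: add a value to a register
State = Dict[str, int]          # architectural state: register name -> value
Design = Callable[[List[Instruction], int], State]


def correct_design(program: List[Instruction], pass_index: int) -> State:
    """Hypothetical correct design: the result does not depend on the pass."""
    state: State = {}
    for reg, value in program:
        state[reg] = state.get(reg, 0) + value
    return state


def flaky_design(program: List[Instruction], pass_index: int) -> State:
    """Hypothetical timing bug: one update is lost, which one depends on the pass."""
    if not program:
        return {}
    lost = pass_index % len(program)
    state: State = {}
    for index, (reg, value) in enumerate(program):
        if index != lost:
            state[reg] = state.get(reg, 0) + value
    return state


def off_by_one_design(program: List[Instruction], pass_index: int) -> State:
    """Hypothetical design that is wrong in the same way on every pass."""
    return {reg: value + 1 for reg, value in correct_design(program, pass_index).items()}


def multi_pass_check(design: Design, program: List[Instruction], passes: int = 3):
    """Run the same test-case several times and compare the final state to pass 0."""
    reference = design(program, 0)
    mismatches = []
    for pass_index in range(1, passes):
        state = design(program, pass_index)
        for resource in sorted(set(reference) | set(state)):
            if reference.get(resource) != state.get(resource):
                mismatches.append((pass_index, resource))
    return mismatches


program = [("r1", 1), ("r2", 2), ("r3", 3)]
print(multi_pass_check(correct_design, program))
print(multi_pass_check(flaky_design, program))
print(multi_pass_check(off_by_one_design, program))
```

In this sketch the pass-dependent bug is reported as mismatching registers, while the design that is wrong identically on every pass produces no mismatch, since the comparison is only between passes and not against an independent expected result.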
Threadmill - Main Features
Generation:
• Concurrent multi-threaded generation
• Light-weight, on-platform
• Static: no reference model and no state tracking
• Very fast: 100s of tests per second on silicon
• Utilization: 90% generation, 10% execution and checking
Accelerators
• Large number of processors, each of which simulates a small portion of the design and passes its results to the others
• Processors run in parallel, allowing high execution performance
• Orders of magnitude faster than simulation
• Allow good observability and coverage analysis
• Allow test execution of billions of cycles at the pre-Si stage
• The platform is used extensively and simultaneously by multiple projects and locations
• High cost and limited resources demand efficient utilization
Exercisers on Accelerator
• Motivation:
– Verification of early design models: more cycles and longer tests than in simulation
– Debug at bring-up stage (better observability than Si, higher speed than simulation)
– Use of failure event checkers, available only on the Accelerator
– SW validation
– Test quality analysis: coverage (count, specific functions hit)
• Challenges:
– High system cost and limited resource availability dictate a need for improved utilization efficiency
– Tests run by the exercisers should target coverage maximization within the constraints of limited resources
• Proposed approach: Off-Platform Generation
Threadmill Offline Generation Mode
[Figure: offline generation flow. The Builder takes the Test Template, Configuration, Architectural Model and Generator & Kernel and produces an Exerciser Image (Generation, Checking, Execution, OS services; Test Template, Topology, Architectural Model). On the Reference Model the image generates test-cases TC1 to TC10 and their results. A New Image carrying the test-cases and results then runs on the Accelerator, where only Execution and Checking are shown per test-case.]
Threadmill Offline Generation Mode
• Create an image with the generator component enabled
– Include empty data structures for the test-cases, memory initializations, translation tables and expected results
• Run the post-silicon application on a software reference model
• Extract the necessary test-case, memory and result data from that run
– Fill the data structures with all the data
• Produce an image that includes all harvested data
– Disable the generator component
• Load the image onto the acceleration platform
• Run the image without the overhead associated with the generation of test-cases and initializations
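The steps above can be sketched in Python as a toy model. Everything here is hypothetical and only mirrors the shape of the flow: the `Image` class, the toy instruction format and the function names are assumptions, and memory initializations and translation tables are left out for brevity.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Instruction = Tuple[str, int]      # toy instruction: add a value to a register
Registers = Dict[str, int]


@dataclass
class Image:
    """Hypothetical exerciser image with initially empty data structures."""
    generator_enabled: bool
    test_cases: List[List[Instruction]] = field(default_factory=list)
    expected_results: List[Registers] = field(default_factory=list)


def execute(test_case: List[Instruction]) -> Registers:
    """Toy reference model: apply each instruction and return the final registers."""
    regs: Registers = {}
    for reg, value in test_case:
        regs[reg] = regs.get(reg, 0) + value
    return regs


def harvest_on_reference_model(image: Image, count: int, seed: int = 0) -> Image:
    """Off-platform part: generate test-cases, record expected results,
    and return a new image with the generator disabled."""
    assert image.generator_enabled, "generation needs the generator component"
    rng = random.Random(seed)
    offline = Image(generator_enabled=False)
    for _ in range(count):
        test_case = [(rng.choice(["r1", "r2", "r3"]), rng.randint(1, 9)) for _ in range(5)]
        offline.test_cases.append(test_case)
        offline.expected_results.append(execute(test_case))
    return offline


def run_on_accelerator(image: Image,
                       platform: Callable[[List[Instruction]], Registers] = execute) -> int:
    """On-platform part: execute and check stored test-cases only.
    Returns the number of test-cases whose result differs from the expected one."""
    assert not image.generator_enabled, "offline image must not generate on-platform"
    failures = 0
    for test_case, expected in zip(image.test_cases, image.expected_results):
        if platform(test_case) != expected:
            failures += 1
    return failures


offline_image = harvest_on_reference_model(Image(generator_enabled=True), count=10)
print(len(offline_image.test_cases), run_on_accelerator(offline_image))
```

The point of the sketch is the split of responsibilities: all generation cost is paid once, off-platform, and the platform loop contains only execution and comparison against the stored expected results.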
Offline vs. Regular Generation
Pros
• No cycles are “wasted” on on-platform generation
• More test-cases can be run in the same number of cycles
• Higher test coverage can be expected
• Comparison with the SW reference model may reveal “2+2=5” bugs, where every pass gives the same wrong result and multi-pass checking sees no difference
Cons
• Depends on a reference model
• Image size grows with the number of test-cases, so image loading limits how many can be included
Experimental Setup
• Two example test templates were used as benchmarks:
– Random: 100 random instructions
– Directed: some threads perform loads/stores; other threads run a functional scenario
• For each test template, 3 images were prepared:
– Regular mode
– Offline mode with 50 test-cases
– Offline mode with 100 test-cases
Results – Random Test
                          Regular mode   Offline mode 50 TC   Offline mode 100 TC
Cycles per test-case      4.8 M          1.3 M                1.35 M
Num of test-cases         124            50                   100
Total Accelerator cycles  595 M          65 M                 135 M
Image size                3.5 MB         23.7 MB              44.3 MB
Time to prepare image     0.6 min        8 min                15.8 min
Accelerator utilization improvement: x3.7
Results – Directed Test
                          Regular mode   Offline mode 50 TC   Offline mode 100 TC
Cycles per test-case      7 M            1.4 M                1.45 M
Num of test-cases         42             50                   100
Total Accelerator cycles  295 M          70 M                 145 M
Image size                3.7 MB         24.6 MB              45.9 MB
Time to prepare image     0.7 min        10.2 min             17.9 min
Accelerator utilization improvement: x5
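As a worked check of the reported improvement factors, the short Python sketch below divides the regular-mode cycles per test-case by the offline-mode figure. Reading "utilization improvement" as this ratio is an assumption; it happens to reproduce the reported x3.7 and x5 when the 50 test-case offline image is used. The dictionary layout and function name are illustrative only.

```python
# Cycles per test-case taken from the two results tables.
CYCLES_PER_TEST_CASE = {
    "random": {"regular": 4.8e6, "offline_50": 1.3e6, "offline_100": 1.35e6},
    "directed": {"regular": 7.0e6, "offline_50": 1.4e6, "offline_100": 1.45e6},
}


def utilization_improvement(regular_cycles: float, offline_cycles: float) -> float:
    """Assumed definition: how many offline test-cases fit in the cycles
    that one regular-mode test-case consumes."""
    return regular_cycles / offline_cycles


for template, cycles in CYCLES_PER_TEST_CASE.items():
    for mode in ("offline_50", "offline_100"):
        factor = utilization_improvement(cycles["regular"], cycles[mode])
        print(f"{template:8s} {mode:11s} x{factor:.2f}")
```

Under this reading, the 100 test-case images give slightly lower factors (about 3.6 and 4.8), because their cycles per test-case are slightly higher.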
Coverage Comparison
• About 50,000 coverage events are analyzed in the Accelerator model
• A test of a new special feature of the next Power design was selected for coverage comparison
– Only events related to the specific functionality were analyzed
– Exerciser code does not use the analyzed feature, so there is less coverage “noise”
• Number of covered events (out of 310 analyzed events):
– Offline: 237
– Regular: 209
• Total count of hits of all events:
– Offline: 117,020
– Regular: 56,708
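The coverage figures can be put in relative terms with a few lines of Python. The numbers are those quoted on the slide; the variable and function names are illustrative.

```python
ANALYZED_EVENTS = 310
COVERED = {"offline": 237, "regular": 209}
TOTAL_HITS = {"offline": 117_020, "regular": 56_708}


def coverage_percent(covered: int, analyzed: int = ANALYZED_EVENTS) -> float:
    """Share of the analyzed events that were hit at least once."""
    return 100.0 * covered / analyzed


extra_events = COVERED["offline"] - COVERED["regular"]
hit_ratio = TOTAL_HITS["offline"] / TOTAL_HITS["regular"]

print(f"offline coverage: {coverage_percent(COVERED['offline']):.1f}%")
print(f"regular coverage: {coverage_percent(COVERED['regular']):.1f}%")
print(f"additional events covered by offline: {extra_events}")
print(f"offline/regular hit ratio: {hit_ratio:.2f}")
```

This works out to roughly 76.5% versus 67.4% of the analyzed events, 28 additional events covered, and about twice as many total hits in offline mode.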
Coverage Comparison
[Figure: number of hits per coverage event on a logarithmic axis (1 to 100,000), with series hitCounter_offline and hitCounter_regular. Annotations: "Events hit only by Offline" and "Offline achieves more hits for most events".]
Conclusions and Future Work
• More TCs – higher chance of triggering various scenarios
• Improved coverage
• Quality assessment of test content that is later used at bring-up
• The Offline generation concept may be used in the future as a basis for a dedicated tool for Accelerator-based verification
References
• A. Adir, S. Copty, S. Landa, A. Nahir, G. Shurek, A. Ziv, C. Meissner, J. Schumann, “A unified methodology for pre-silicon verification and post-silicon validation” – DATE 2011
• A. Adir, M. Golubev, S. Landa, A. Nahir, G. Shurek, V. Sokhin, A. Ziv, “Threadmill: A post-silicon exerciser for multi-threaded processors” – DAC 2011
• A. Adir, A. Nahir, G. Shurek, A. Ziv, C. Meissner, J. Schumann, “Leveraging pre-silicon verification resources for the post-silicon validation of the IBM POWER7 processor” – DAC 2011
Thank You!