Multidimensional Performance Modeling for Advanced Embedded Signal Processors Michael Stebnisky Carl...

Posted on 27-Dec-2015


Multidimensional Performance Modeling for Advanced Embedded Signal Processors

Michael Stebnisky, Carl Hein

Lockheed Martin Advanced Technology Laboratories, 1 Federal Street • A&E Building 2W

Camden, New Jersey 08102 • mstebnis@atl.lmco.com

Systems Integration


Multidimensional Performance Modeling

• Problem:
  - Traditional performance modeling approaches are unable to address emerging requirements and component technologies. This results from a growing awareness of, and need for, dynamically adaptive or reconfigurable systems, particularly in the area of power dissipation/performance.

• Goal(s)/Objective(s):
  - Define methods/algorithms to accurately model and optimize reconfigurable architectures and the functions (services) required to support multidimensional performance modeling.
  - Apply ideas developed from InfoPad, ACS, PAC/C, DARES, PCA, and MSP to develop a unique new rapid prototyping/optimization capability.

• Approach:
  - Define the features required to support accurate performance and multidimensional modeling and optimization of DRAs (dynamically reconfigurable architectures).
  - Evaluate algorithms/methods for performing intelligent, reactive dynamic scheduling.
  - Evaluate algorithms/methods for performing offline analysis, data reduction, pattern recognition, and execution planning.

DoD missions/systems require new approaches/tools to exploit emerging reconfigurable technologies to form polymorphous/power-aware systems.


A Methodology for Verifiable, Validatable Polymorphic Architectures

[Figure: DRA Optimization Process Flow. Labeled elements: Application Flowgraphs; DRA Object Library; Estimates; Local Designs; COTS; Offline Optimization (speed, latency, power, energy, QoS, ...); Parameterized Resource Executables; Preprocessed Scheduling Information; Runtime System Model; Runtime Optimization (speed, latency, power, energy, QoS, ...); Testcase/Scenario Testbench; Schedule/Component Characteristics; Analysis/Feedback to Optimization]

Dynamically Reconfigurable Architectures

• Based upon a two-phase optimization process: offline and online

• Offline optimization
  - Overall goal: perform component selection, pruning, data extraction, and sensitivity analysis that will maximize the effectiveness of the online optimization
  - Selects an optimum set of verified, validated components from existing libraries
  - Facilitates analysis to identify potential implementations better suited to the application
  - Identifies dependencies/relationships in the data flow graph that will facilitate online scheduling

• Online optimization
  - Components are selected for execution at runtime based upon dynamic figures of merit
  - Figures of merit are complex functions of component characteristics
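The online step can be illustrated with a small selection routine. This is a minimal sketch under stated assumptions: the `Implementation` fields, the weighted-ratio figure of merit, the weights, and all numbers are hypothetical, since the slides describe figures of merit only as complex functions of component characteristics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Implementation:
    """One hypothetical implementation of a task, abstracted by its characteristics."""
    name: str
    throughput: float  # operations per second (assumed unit)
    latency: float     # seconds
    power: float       # watts


def figure_of_merit(impl, w_throughput, w_latency, w_power):
    """Hypothetical figure of merit: reward throughput, penalize latency and power.

    The exponents act as weights; a weight of 0 removes that characteristic.
    """
    return (impl.throughput ** w_throughput) / (
        (impl.latency ** w_latency) * (impl.power ** w_power)
    )


def select_online(candidates, weights):
    """Pick the candidate with the highest figure of merit under the current weights."""
    return max(candidates, key=lambda impl: figure_of_merit(impl, *weights))


fft_variants = [
    Implementation("fft_fast", throughput=4.0e9, latency=1e-6, power=12.0),
    Implementation("fft_lowpower", throughput=1.0e9, latency=4e-6, power=2.0),
]

# The weights may change at run time as mission conditions change.
print(select_online(fft_variants, (1.0, 0.0, 0.0)).name)  # fft_fast
print(select_online(fft_variants, (1.0, 0.0, 2.0)).name)  # fft_lowpower
```

Raising the power weight from 0 to 2 flips the choice from the fast variant to the low-power one, which is the kind of runtime trade the online phase is meant to make; a fuller figure of merit could also fold in energy, QoS, or deadlines.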


Key Aspects of Approach

• Dealing with overwhelming flexibility
  - Management of complexity and methods of abstraction/simplification

• Two-stage optimization
  - Offline (component and subsystem) and online (scheduling)

• Methodology for generating/optimizing/representing components and reconfigurable devices
  - Develop operating points, analyze, improve, and abstract

• Dynamic scheduling with multidimensional goals/constraints
  - “Toolkit” approach using offline extracted information

• Offline information extraction and optimization of scheduling
  - Analysis/characterization/abstraction of tasks/graphs

• Constraint/goal simplification using modes
  - Abstraction method that minimizes computation and enforces interfaces

• Reuse of existing capability to perform the required analysis and data visualization
  - Use the internally developed CSIM
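Offline extraction of dependencies from a task graph can be illustrated with a classic critical-path (slack) analysis. This is a swapped-in textbook technique, not necessarily what the authors' toolkit computes, and the graph, task names, and durations below are hypothetical. Slack indicates how far a task can be delayed, or moved to a slower, lower-power implementation, without extending the schedule. Requires Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter


def analyze_dfg(deps, duration):
    """Offline pass over a data flow graph.

    deps maps each task to the tasks whose outputs it consumes; duration maps each
    task to its (assumed) execution time. Returns earliest start times, the
    makespan, and the slack of every task.
    """
    order = list(TopologicalSorter(deps).static_order())

    # Forward pass: a task can start once all of its predecessors have finished.
    earliest = {}
    for task in order:
        earliest[task] = max(
            (earliest[p] + duration[p] for p in deps.get(task, ())), default=0
        )
    makespan = max(earliest[t] + duration[t] for t in order)

    # Backward pass: latest start that does not delay the overall makespan.
    successors = {t: [] for t in order}
    for task, preds in deps.items():
        for p in preds:
            successors[p].append(task)
    latest = {}
    for task in reversed(order):
        finish_by = min((latest[s] for s in successors[task]), default=makespan)
        latest[task] = finish_by - duration[task]

    slack = {t: latest[t] - earliest[t] for t in order}
    return earliest, makespan, slack


# Hypothetical graph loosely named after FFT/FIR/VMUL tasks; durations are arbitrary units.
deps = {
    "fft_a": [],
    "fir": [],
    "fft_b": ["fir"],
    "vmul_1": ["fft_a"],
    "vmul_2": ["fft_a", "fft_b"],
    "vmul_3": ["vmul_1", "vmul_2"],
}
duration = {"fft_a": 4, "fir": 2, "fft_b": 4, "vmul_1": 1, "vmul_2": 1, "vmul_3": 1}

earliest, makespan, slack = analyze_dfg(deps, duration)
print(makespan)  # 8
print(sorted(t for t, s in slack.items() if s == 0))  # ['fft_b', 'fir', 'vmul_2', 'vmul_3']
```

Here `fft_a` and `vmul_1` each have 2 units of slack, so an online scheduler could give them a low-power implementation without lengthening the 8-unit schedule.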


[Figure, labeled elements: graphical hierarchical hardware architecture definition (model of hardware with PE, SE, and ME behaviors; structure: network topology); graphical hierarchical software architecture definition (SW data flow graph with FFT, FIR, and VMUL tasks); Partition, Map/Allocate, and Schedule steps; application SW for PEs (e.g., Board 1/PE1: Recv 1024; Compute 3.2 us; Send 512,2; Send 512,3; Recv 1024); HW/SW co-simulation; visualization and animation; analysis; two "Modify" labels. The simulation produces animated and static simulation timeline displays as well as processor, communication, and memory statistical data.]

Detailed information on CSIM can be found at: www.atl.lmco.com/proj/csim/

CSIM: HW/SW Virtual Prototyping Capabilities and Features

• CSIM has demonstrated the ability to model complex signal processor network behavior and performance

• CSIM is a C-based, hierarchical simulation environment for modeling system, subsystem, and individual module/component HW/SW performance

• Because CSIM models HW and SW independently, it provides a path to reducing both development costs and life-cycle costs

• CSIM provides a common environment for developing and verifying system, subsystem, and module HW/SW interfaces and overall system performance
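The application SW script for Board 1/PE1 in the CSIM figure (Recv 1024; Compute 3.2 us; Send 512,2; Send 512,3; Recv 1024) hints at how token-based performance modeling works: messages are abstract tokens that carry only a size, and only the time they occupy resources is modeled. The sketch below replays such a script for one PE. It is not CSIM's C API; reading the sizes as bytes, reading "Send 512,2" as 512 bytes to PE 2, assuming data is always ready, and the 256 bytes/us link bandwidth are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Event:
    start_us: float
    end_us: float
    op: str
    detail: str


def simulate_pe(script, link_bytes_per_us):
    """Replay one PE's script as a timeline of abstract tokens.

    Messages carry only a size, so a transfer costs size / bandwidth;
    compute steps carry only a duration. No actual data is processed.
    Assumes incoming data is always ready and the PE runs steps sequentially.
    """
    now = 0.0
    timeline = []
    for step in script:
        op = step[0]
        if op == "recv":
            size = step[1]
            duration = size / link_bytes_per_us
            detail = f"{size} bytes"
        elif op == "send":
            size, dest = step[1], step[2]
            duration = size / link_bytes_per_us
            detail = f"{size} bytes to PE{dest}"
        elif op == "compute":
            duration = step[1]
            detail = f"{duration} us"
        else:
            raise ValueError(f"unknown operation: {op}")
        timeline.append(Event(now, now + duration, op, detail))
        now += duration
    return timeline


# Hypothetical encoding of the PE1 script from the figure.
pe1_script = [
    ("recv", 1024),
    ("compute", 3.2),
    ("send", 512, 2),
    ("send", 512, 3),
    ("recv", 1024),
]

for ev in simulate_pe(pe1_script, link_bytes_per_us=256.0):
    print(f"{ev.start_us:6.1f} - {ev.end_us:6.1f} us  {ev.op:<7} {ev.detail}")
```

Because no payload is simulated, each step costs a constant amount of work regardless of message size, which is one reason token-based models can sweep many architecture variants quickly.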


[Figure: Design space resulting from evaluating a representative signal processing application]

Automated Support to Offline Analysis/Optimization

• Each task/component (e.g., fft, fir) is abstracted by several parameters (e.g., throughput, latency, power, size)

• Each task may have several differently optimized implementations

• An online library of implementations is selected

• The application(s) of interest are simulated using a subset of the available components

• The results of simulating each subset are characterized with respect to the parameters of interest

• The characterizations are used to identify the subset of implementations that perform best across the parameters of interest
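The last bullet leaves open what "best across the parameters of interest" means. One common way to make it precise, offered here as an assumption rather than the authors' stated method, is Pareto dominance: keep every implementation that no other implementation beats on all parameters at once. The `Characterization` record, the better/worse direction assigned to each parameter, and the numbers are hypothetical.

```python
from typing import NamedTuple


class Characterization(NamedTuple):
    """Hypothetical simulated characterization of one candidate implementation."""
    name: str
    throughput: float  # higher is better
    latency: float     # lower is better
    power: float       # lower is better
    size: float        # lower is better (e.g., area or memory)


def dominates(a, b):
    """True if a is no worse than b on every parameter and strictly better on one."""
    no_worse = (a.throughput >= b.throughput and a.latency <= b.latency
                and a.power <= b.power and a.size <= b.size)
    better = (a.throughput > b.throughput or a.latency < b.latency
              or a.power < b.power or a.size < b.size)
    return no_worse and better


def pareto_front(candidates):
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


candidates = [
    Characterization("fir_parallel", throughput=8.0, latency=1.0, power=10.0, size=4.0),
    Characterization("fir_serial", throughput=2.0, latency=4.0, power=2.0, size=1.0),
    Characterization("fir_naive", throughput=2.0, latency=5.0, power=3.0, size=1.0),
]
print([c.name for c in pareto_front(candidates)])  # ['fir_parallel', 'fir_serial']
```

`fir_naive` is pruned because `fir_serial` matches it on throughput and size and beats it on latency and power; the two survivors represent a genuine speed/power trade that only the online phase can resolve.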


Processor Architecture and Environment

This graphic illustrates a representative processor architecture and an associated CSIM visualization environment. The core CSIM capability has been augmented with a dynamic scheduler that can apply complex selection criteria to the scheduling process. In this example, these complex selection criteria are encapsulated as several “modes”; the buttons on the lower right are used to interactively switch between the modes. In addition, any of the processors can be “killed” or returned to the simulation, facilitating the evaluation of failover and similar analyses. Illustrative results are shown in the following slides.
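The paragraph above describes complex selection criteria encapsulated as modes that can be switched interactively. A minimal sketch of that idea follows, assuming each mode resolves offline to a per-task implementation choice. The catalog, numbers, and sort keys are hypothetical; only the mode names “Fastest” and “LoPwr” come from the slides, and the actual criteria behind them (and behind “DDPSWC”) are not described.

```python
# Hypothetical catalog: task -> list of (implementation, execution_time_us, power_w).
CATALOG = {
    "fft": [("fft_fpga", 1.0, 9.0), ("fft_dsp", 3.0, 2.5), ("fft_gpp", 6.0, 4.0)],
    "fir": [("fir_fpga", 0.5, 6.0), ("fir_dsp", 2.0, 1.5)],
}

# Each mode encapsulates one selection criterion as a sort key over catalog entries.
# These keys are stand-ins suggested by the mode names, not the tool's actual criteria.
MODE_CRITERIA = {
    "Fastest": lambda entry: entry[1],  # minimize execution time
    "LoPwr": lambda entry: entry[2],    # minimize power dissipation
}


def build_mode_tables(catalog, criteria):
    """Offline: resolve every (mode, task) pair to a single implementation name."""
    return {
        mode: {task: min(entries, key=key)[0] for task, entries in catalog.items()}
        for mode, key in criteria.items()
    }


class ModeScheduler:
    """Online: switching modes swaps lookup tables, so no criteria are re-evaluated."""

    def __init__(self, tables, mode):
        self.tables = tables
        self.switch_mode(mode)

    def switch_mode(self, mode):
        if mode not in self.tables:
            raise KeyError(f"unknown mode: {mode}")
        self.mode = mode

    def implementation_for(self, task):
        return self.tables[self.mode][task]


scheduler = ModeScheduler(build_mode_tables(CATALOG, MODE_CRITERIA), "Fastest")
print(scheduler.implementation_for("fft"))  # fft_fpga
scheduler.switch_mode("LoPwr")  # e.g., the operator presses a mode button
print(scheduler.implementation_for("fft"))  # fft_dsp
```

Precomputing the tables is one reading of modes as "an abstraction method minimizing computation": at run time a mode change costs a lookup rather than a fresh optimization.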


[Figure panels: power-driven scheduling (“LoPwr”); complex-criteria-driven scheduling (“DDPSWC”); assigned-task scheduling (“Base”); assigned-speed scheduling (“Fast”); max-speed scheduling (“Fastest”)]

Dynamically Reconfigurable Architectures

• Real-time, on-demand dynamic allocation of computational resources

• Each task/component (e.g., fft, fir) is abstracted by several parameters (e.g., throughput, latency, power, size)

• Each task may have several differently optimized implementations

• An online library of implementations is selected

• At run time, the implementation best matched to the current selection criteria is executed

• For this example, the ratio of maximum throughput to minimum power dissipation is improved >10X versus standard practice

SWEPT improvement >10X vs. standard practice


[Figure annotations: processors faulted/switched out; processors returned to operation]

Dynamically Reconfigurable Architectures

• Validation and verification of next-generation processor architectures

• Any processor may be temporarily or permanently removed from the architecture

• Processors may be returned to the architecture

• Facilitates development and analysis of failover/fault adaptation methodologies

• Extends multidimensional optimization to architectures with intermittently inactive components (especially useful for throughput/power trades)
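Removing and returning processors implies some policy for re-mapping work. The sketch below assumes a simple least-loaded policy that uses the number of assigned tasks as the load measure; the `ProcessorPool` class, processor names, and policy are hypothetical and not taken from CSIM.

```python
class ProcessorPool:
    """Hypothetical pool that re-maps tasks when processors are removed or returned."""

    def __init__(self, processors):
        self.active = set(processors)
        self.assignment = {}  # task -> processor

    def _least_loaded(self):
        load = {p: 0 for p in self.active}
        for proc in self.assignment.values():
            if proc in load:
                load[proc] += 1
        # Iterate in name order so ties are broken deterministically.
        return min(sorted(load), key=lambda p: load[p])

    def assign(self, task):
        if not self.active:
            raise RuntimeError("no active processors")
        self.assignment[task] = self._least_loaded()
        return self.assignment[task]

    def remove(self, proc):
        """Fault or switch out a processor and move its tasks to the remaining ones."""
        self.active.discard(proc)
        orphaned = sorted(t for t, p in self.assignment.items() if p == proc)
        for task in orphaned:
            del self.assignment[task]
        for task in orphaned:
            self.assign(task)
        return orphaned

    def restore(self, proc):
        """Return a processor to operation; it becomes eligible for new work."""
        self.active.add(proc)


pool = ProcessorPool(["pe1", "pe2", "pe3"])
for task in ["fft", "fir", "vmul_1", "vmul_2"]:
    pool.assign(task)
print(pool.remove("pe1"))     # ['fft', 'vmul_2'], now re-mapped to pe2 and pe3
pool.restore("pe1")
print(pool.assign("vmul_3"))  # pe1, the least-loaded processor again
```

A returned processor here only picks up new work; a fuller failover model might also migrate existing tasks back, or weigh per-processor power when deciding where orphaned tasks go.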


Other Applications of CSIM

Program | Description | Customer | Key Technology
VNS | Network attack simulator for training IA operators | US Army CECOM | Token-based performance modeling, HLA
AMRFS | Agent-based resource management | ONR | Intelligent agents, dynamic scheduler
Advanced Surface Ship Model | Computing infrastructure model | LM NE&SS-Moorestown | Token-based performance modeling, pre-emptive multitasking CPU models
Avionics Mission Computing System | Model of JSF Integrated Core Processor | LM NE&SS-Egan | Token-based performance modeling, hardware/software co-simulation
Wireless Mobile Networks | Dynamic ad hoc routing behaviors | DARPA (XGComms); US Army CECOM (WIN-T) | Abstract propagation and arbitration models, dynamic routing algorithms


[Figure labels: Specification; Requirements; CSIM & MaC]

Run-time Environment and Design Application for Polymorphous Technology Verification & Validation (RE-ADAPT V&V)

• Impact
  - Design-time modeling and simulation environment for verification and validation (V&V) of PCA-enabled applications
  - Run-time monitoring for V&V of PCA-enabled applications
  - Validation of the approach via application to a PCA-enabled avionics design

• New Ideas
  - Methodology for run-time detection of requirement violations
  - Framework for automatic generation of monitoring and checking components
  - Dynamic run-time corrector to force reconfiguration into a safe state
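Run-time detection of requirement violations and a corrector that forces a safe reconfiguration can be pictured as a monitor that checks each observed state and, on a violation, hands control to the corrector. This sketch hand-codes the requirements as Python predicates rather than generating checkers automatically from a specification; the `SystemState` fields, thresholds, mode names, and the "safe" mode are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class SystemState:
    """Hypothetical snapshot of the observed system."""
    mode: str
    latency_ms: float
    power_w: float


@dataclass(frozen=True)
class Requirement:
    """A named predicate that must hold for every observed state."""
    name: str
    holds: Callable[[SystemState], bool]


REQUIREMENTS = [
    Requirement("latency <= 10 ms", lambda s: s.latency_ms <= 10.0),
    Requirement("power <= 50 W", lambda s: s.power_w <= 50.0),
]


def check(state, requirements):
    """Monitor: return the names of the requirements the observed state violates."""
    return [r.name for r in requirements if not r.holds(state)]


def correct(state):
    """Corrector: force a (hypothetical) safe configuration."""
    return replace(state, mode="safe")


def monitor_step(state, requirements=REQUIREMENTS):
    """Check one observation; reconfigure to the safe mode if anything is violated."""
    violated = check(state, requirements)
    return (correct(state) if violated else state), violated


print(monitor_step(SystemState("Base", latency_ms=6.0, power_w=40.0)))
print(monitor_step(SystemState("Fastest", latency_ms=4.0, power_w=65.0)))
```

The second observation violates only the power requirement, so the corrector switches the system to the safe mode while leaving the other observed values untouched.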

Real-time Systems Group, University of Pennsylvania


[Figure: DARPATech demo display. Labeled elements: PCA virtual processor state and activity (stream processors indicated by filled boxes, GP processors by outlined boxes); dynamic bar chart of total active processors, active stream processors, active GP processors, and active threaded processors; total active processor count display; MaCS messages and status; mission status; RADAR tasks; imaging; route planning; self test and MaCS; flight control; threat avoidance; mission assignment; communications; system state and task flow]


DARPATech Demo