High Performance Embedded Computing

© 2007 Elsevier

Lecture 8: Embedded Processor Issues

Embedded Computing Systems

Mikko Lipasti, adapted from M. Schulte

Based on slides and textbook from Wayne Wolf

© 2006 Elsevier

Topics

Bus encoding.

Security-oriented architectures.

CPU simulation.

Configurable processors.

Bus encoding

Encode information on the bus to reduce toggles and dynamic energy consumption.

Estimate energy consumption by counting toggles.

Bus encoding is invisible to the rest of the architecture.

Some schemes transmit side information about the encoding.

[Figure: CPU and memory (mem) connected through an encoder (enc) and decoder (dec) by an encoded bus plus side information.]
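
As a rough illustration of counting energy by toggles, the sketch below counts how many bus lines flip as a sequence of values is driven. The function name `count_toggles`, the 8-bit default width, and the all-zero starting state are assumptions made for this example, not part of the lecture.

```python
def count_toggles(values, width=8, initial=0):
    """Count bit flips on a `width`-bit bus as `values` are driven in order."""
    mask = (1 << width) - 1
    previous = initial & mask
    toggles = 0
    for value in values:
        value &= mask
        # XOR marks the lines that changed; count the 1 bits.
        toggles += bin(previous ^ value).count("1")
        previous = value
    return toggles
```

Under a toggle-count energy model, an encoding is worthwhile when it lowers this number by more than the encoder and decoder cost.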

Bus-invert coding

Stan and Burleson: take advantage of correlation between successive bus values.

Choose between sending the true or complement form of each bus value to minimize toggles.

Why might this approach work well?

Can break the bus into fields and apply bus-invert coding to each field.

How might the bus be divided?
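
A minimal sketch of bus-invert coding on a single field, assuming an 8-bit bus, an all-zero starting state, and one extra invert line as the side information. The function names are hypothetical. A word is sent complemented whenever sending it in true form would flip more than half of the data lines.

```python
def popcount(x):
    return bin(x).count("1")

def bus_invert_encode(words, width=8):
    """Return (bus_value, invert_bit) pairs for each word."""
    mask = (1 << width) - 1
    bus = 0  # value currently on the data lines
    encoded = []
    for word in words:
        word &= mask
        if 2 * popcount(bus ^ word) > width:
            bus, invert = word ^ mask, 1  # complement flips fewer lines
        else:
            bus, invert = word, 0
        encoded.append((bus, invert))
    return encoded

def bus_invert_decode(encoded, width=8):
    mask = (1 << width) - 1
    return [bus ^ mask if invert else bus for bus, invert in encoded]

def total_toggles(encoded, width=8):
    """Toggles on the data lines plus the invert line, starting from all zeros."""
    previous = 0
    toggles = 0
    for bus, invert in encoded:
        lines = (invert << width) | bus
        toggles += popcount(previous ^ lines)
        previous = lines
    return toggles
```

For the alternating sequence 0x00, 0xFF, 0x00, 0xFF the data lines never change and only the invert line toggles, which is the best case for this scheme.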

Working zone encoding

Musoll et al.: used to encode address buses.

Uses the observation that most of a program's execution time is spent in a small range of addresses.

Divides addresses into sets called working zones.

An address in a working zone is sent as an offset from the zone's base in a one-hot code.

Why is a one-hot code used?

Addresses that are not in a working zone have the entire value sent.

Compared to bus-invert coding, what would you expect to be the advantages and disadvantages of this approach?
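
The sketch below follows only the description on this slide: a fixed list of zone bases, a one-hot offset on a hit, and the full address on a miss. The published scheme has more detail than this. The names `wze_encode` and `wze_decode`, the tuple layout, and the fixed `zone_size` are assumptions for illustration.

```python
def wze_encode(addr, zone_bases, zone_size):
    """Return ("zone", zone_id, one_hot_offset) on a hit, else ("full", None, addr)."""
    for zone_id, base in enumerate(zone_bases):
        offset = addr - base
        if 0 <= offset < zone_size:
            return ("zone", zone_id, 1 << offset)
    return ("full", None, addr)

def wze_decode(kind, zone_id, payload, zone_bases):
    if kind == "zone":
        # The position of the single 1 bit is the offset from the base.
        return zone_bases[zone_id] + payload.bit_length() - 1
    return payload
```

Any two different one-hot codes differ in exactly two bit positions, so moving between addresses inside one zone flips two offset lines regardless of how far apart the offsets are.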

Address bus encoding

Benini et al.: cluster correlated address bits, then encode each cluster.

Compute correlation coefficients of transition variables to determine clusters. [Equation not captured in this transcript.]

Clusters must not become too large, since large clusters increase the encode/decode logic.

Use logic synthesis to design encoders and decoders for each cluster.
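
One way to picture the clustering input, as a sketch only: treat each address bit's transition variable as 1 when that bit toggles between successive addresses, then correlate pairs of bits. The exact formula used by Benini et al. is not shown in this transcript, so the plain Pearson coefficient here is an assumption, as are the function names.

```python
def transition_vars(addresses, width):
    """One 0/1 list per bit: 1 where that bit toggles between successive addresses."""
    pairs = list(zip(addresses, addresses[1:]))
    return [[((a ^ b) >> bit) & 1 for a, b in pairs] for bit in range(width)]

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5
```

Bits whose transition variables correlate strongly are candidates for the same cluster, subject to the cluster-size limit noted above.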

Benini et al. results

[Ben98] © 1998 IEEE

What important tradeoffs of the address encoding technique are not shown in the table below?

Dictionary-based encoding

Takes advantage of the observation that many values are repeated on buses.

Divides bus into three parts:

Only the upper bits of the bus are stored in the dictionary and used to match dictionary values that are indexed by the index part.

When the upper bits match, they are put in a high-Z state and the remaining bits are sent; otherwise all bits are sent.
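
A simplified sketch of the matching rule described above, assuming a word laid out as upper bits, then index bits, then the remaining low bits, with a hit flag sent as side information. The class name, the field widths, and the update-on-miss policy are assumptions for illustration and are not taken from Lv et al.

```python
class DictionaryBusCodec:
    """Sender and receiver each keep one instance; the tables stay in step."""

    def __init__(self, index_bits=2, bypass_bits=6):
        self.bypass_bits = bypass_bits
        self.low_bits = index_bits + bypass_bits
        self.low_mask = (1 << self.low_bits) - 1
        self.table = [None] * (1 << index_bits)  # stored upper bits per index

    def _index(self, word):
        return (word & self.low_mask) >> self.bypass_bits

    def encode(self, word):
        upper = word >> self.low_bits
        index = self._index(word)
        if self.table[index] == upper:
            # Hit: the upper lines are left undriven, only the low bits are sent.
            return True, word & self.low_mask
        self.table[index] = upper
        return False, word

    def decode(self, hit, payload):
        index = self._index(payload)
        if hit:
            return (self.table[index] << self.low_bits) | payload
        self.table[index] = payload >> self.low_bits
        return payload
```

On a hit the upper lines do not toggle at all, which is where the energy saving comes from when many values share their upper bits.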

Lv et al. dictionary-based architecture

[Lv03] © 2003 IEEE

Lv et al. energy savings

[Lv03] © 2003 IEEE

Security-oriented architectures

There are a variety of security attacks:

Typical desktop/server attacks, such as Trojan horses and viruses.

Physical access allows side channel attacks.

Cryptographic instruction sets have been developed for several architectures.

Embedded systems architecture must add protection for side effects and consider energy consumption.

Secure architectures

SmartMIPS and ARM SecureCore offer security extensions.

Include encryption instructions, specialized memory management units, etc.

SAFE-OPS:

Designed to protect against software modification.

Compiler embeds a watermark into code based on register assignment.

FPGA accelerator checks the validity of the watermark during execution.

Power attacks

Kocher et al.: Adversary can observe power consumption at pins and deduce data, instructions within CPU.

Yang et al.: Dynamic voltage/frequency scaling (DVFS) can be used as a countermeasure.

[Yan05] © 2005 ACM Press

CPU simulation

Performance vs. energy/power simulation.

Temporal accuracy.

Trace vs. execution.

Simulation vs. direct execution.

Simulate using appropriate benchmarks for embedded systems.

Don’t use SPEC CPU benchmarks!

Embedded benchmarks include EEMBC, MediaBench, and MiBench.

Benchmarks often should be domain-specific.

Trace-based analysis

Instrumentation generates side information.

PC-sampling checks PC value during execution.

Can measure control flow, memory accesses.

Program counter (PC) sampling

Example: Unix prof.

Interrupts are used to sample the PC periodically.

Must run on the platform.

Doesn’t provide a complete trace.

Subject to sampling problems: undersampling, periodicity problems.

Generates a call-graph report that indicates the percentage of execution time spent in each function of the program.
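
To make the periodicity problem concrete, here is a small simulation sketch. It does not use a real profiler: the PC trace, the address ranges for `f` and `g`, and the function names are all made up for illustration.

```python
from collections import Counter

def sample_pcs(trace, period, offset=0):
    """Keep every `period`-th PC value, as a periodic timer interrupt would."""
    return trace[offset::period]

def profile(samples, regions):
    """Fraction of samples that fall in each named [low, high) address range."""
    counts = Counter()
    for pc in samples:
        for name, (low, high) in regions.items():
            if low <= pc < high:
                counts[name] += 1
                break
    return {name: counts[name] / len(samples) for name in regions}

# Hypothetical program: a loop that runs 3 instructions in f, then 1 in g.
regions = {"f": (0x100, 0x10C), "g": (0x200, 0x204)}
trace = [0x100, 0x104, 0x108, 0x200] * 100
```

The true split is 75% in `f` and 25% in `g`. Sampling every third instruction comes close to that, while a period of 4 locks onto the loop and can attribute every sample to `g`.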

Program instrumentation

Example: dinero.

Modify the program to write trace information.

Track entry into basic blocks.

Requires editing object files.

Provides complete trace.
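
The idea of tracking entry into basic blocks can be sketched at source level. Real tools of this kind edit object files instead, so this only illustrates what the trace contains. The `enter` helper and the block labels B0, B1, B2 are hypothetical.

```python
trace = []

def enter(block_id):
    """Instrumentation hook: record that a basic block was entered."""
    trace.append(block_id)

def gcd(a, b):
    enter("B0")          # function entry block
    while b:
        enter("B1")      # loop body block
        a, b = b, a % b
    enter("B2")          # exit block
    return a
```

Calling `gcd(12, 18)` leaves the full block sequence in `trace`, with one entry per block execution rather than a statistical sample.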

Microarchitecture-modeling simulators

Varying levels of detail:

Instruction scheduler is not cycle-accurate.

Cycle timers are cycle-accurate.

Can simulate for performance or energy/power.

Typically written in a general-purpose programming language (e.g., C), not a hardware description language.

Cycle-accurate simulator

Models the microarchitecture.

Simulating one instruction requires executing routines for instruction fetch, decode, execute, etc.

Models pipeline state.

Microarchitectural registers are exposed to the simulator.

[Figure: simulator block diagram showing reg, IR, PC, and I-box.]
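
A toy sketch of how such a simulator exposes pipeline state, assuming a three-stage pipeline (fetch, decode, execute) with no hazards and a made-up ISA of `("addi", n)` and `("nop", 0)` operating on one accumulator. The class and field names are illustrative only.

```python
class TinyPipeline:
    def __init__(self, program):
        self.program = program
        self.pc = 0           # program counter
        self.ir = None        # fetch/decode pipeline register
        self.decoded = None   # decode/execute pipeline register
        self.acc = 0          # architectural accumulator
        self.cycles = 0

    def step(self):
        """Advance one clock cycle; stages run back to front so each one
        reads the register values latched in the previous cycle."""
        if self.decoded is not None:          # execute
            op, arg = self.decoded
            if op == "addi":
                self.acc += arg
        self.decoded = self.ir                # decode (trivial here)
        if self.pc < len(self.program):       # fetch
            self.ir = self.program[self.pc]
            self.pc += 1
        else:
            self.ir = None
        self.cycles += 1

    def run(self):
        while (self.pc < len(self.program)
               or self.ir is not None or self.decoded is not None):
            self.step()
        return self.acc, self.cycles
```

A program of N instructions takes N + 2 cycles in this model, because the last instruction needs two extra cycles to drain through decode and execute.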

Trace-based vs. execution-based

Trace-based:

Gather trace first, then generate timing information.

Basic timing information is simpler to generate.

Full timing information may require regenerating information from the original execution.

Requires owning the platform.

Execution-based:

Simulator fully executes the instruction.

Requires a more complex simulator.

Requires explicit knowledge of the microarchitecture, not just instruction execution times.
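
A sketch of the basic trace-based case: given a recorded instruction trace and a fixed latency per instruction class, timing is a table lookup and a sum. The opcode names and latencies are made up for the example.

```python
def trace_cycles(trace, latency):
    """Estimate cycles for a recorded trace from fixed per-instruction latencies."""
    return sum(latency[op] for op in trace)

# Hypothetical latency table and trace.
latency = {"load": 3, "add": 1, "branch": 2}
recorded = ["load", "add", "add", "branch"]
```

This estimate ignores pipeline overlap, stalls, and cache behavior, which is why fuller timing needs either more information from the original run or an execution-based simulator.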

Power simulation

Model capacitance in the processor.

Keep track of activity in the processor.

Requires full simulation.

Activity determines capacitive charge/discharge, which determines power consumption.

CPU power simulators include:

SimplePower and Wattch for embedded general-purpose (GP) processors.

Trimaran with EPIC Explorer for embedded VLIW.
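
As a sketch of how activity and capacitance combine, the function below charges each recorded transition the usual dynamic switching energy of one half C times Vdd squared. The unit names, capacitance values, and supply voltage are placeholders, and this is not how any of the simulators named above is structured.

```python
def dynamic_energy(activity, capacitance, vdd):
    """Total switching energy in joules.

    activity:    transitions counted per unit during simulation
    capacitance: switched capacitance per unit, in farads
    vdd:         supply voltage, in volts
    """
    return sum(0.5 * capacitance[unit] * vdd ** 2 * count
               for unit, count in activity.items())

# Placeholder numbers for illustration only.
activity = {"alu": 100, "regfile": 50}
capacitance = {"alu": 2e-12, "regfile": 1e-12}
```

Dividing the result by the simulated run time gives average power, so the simulator has to track both activity and cycles.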

Automated CPU design

Customize aspects of the CPU for the application:

Instruction set.

Functional units.

Memory system (including register files).

Busses, I/O, and peripherals.

Tools help design and implement custom CPUs.

FPGAs make it easier to implement custom CPUs.

Application-specific instruction processor (ASIP) has a custom instruction set.

Configurable processor is generated by a tool set.

Techniques

Architecture optimization tools help choose the instruction set and microarchitecture.

Configuration tools implement the microarchitecture (and perhaps compiler).

Early example: MIMOLA [1984] analyzed programs, created microarchitecture and instructions, synthesized logic.

CPU configuration process

Tensilica configuration options

© 2004 Tensilica

Tensilica EEMBC comparison

© 2004 Tensilica

Tensilica energy consumption by subsystem

© 2006 Tensilica

Toshiba MePcore

LISA language

[Hof01] © 2001 IEEE

LISA descriptions and generation

Memory model includes registers and other memories.

Uses clause binds operations to hardware.

Timing specified by PIPELINE, IN, ACTIVATION, ENTITY.

Generates hierarchical VHDL design.

PEAS-III

Synthesis driven by:

Architectural parameters such as number of pipeline stages.

Declaration of function units.

Instruction format definitions.

Interrupt conditions and timing.

Micro-operations for instructions and interrupts.

Generates both simulation and synthesis models in VHDL.

Instruction set synthesis

Generate instruction set from application program, other requirements.

Sun et al. analyzed design space for simple BYTESWAP() program.

[Sun04] © 2004 IEEE

Complex function definition

Atasu et al. try to combine many operations into an instruction:

Disjoint operator graphs.

Multi-output instructions.

Operator graph must be convex: a value cannot leave the instruction and then re-enter it.

Textbook discusses several other approaches.

[Ata03] © 2003 ACM Press
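
The convexity condition can be checked directly on a dataflow graph. The sketch below is a generic reachability test, not Atasu et al.'s algorithm: starting from outside operations that consume a value produced inside the candidate instruction, it looks for any path that leads back inside. The edge-list format and the function name are assumptions.

```python
def is_convex(edges, subset):
    """True if no dataflow path leaves `subset` and later re-enters it.

    edges:  iterable of (producer, consumer) pairs in a DAG
    subset: operations proposed for one custom instruction
    """
    successors = {}
    for producer, consumer in edges:
        successors.setdefault(producer, []).append(consumer)
    subset = set(subset)
    # Outside nodes that directly consume a value produced inside the subset.
    stack = [v for u in subset for v in successors.get(u, []) if v not in subset]
    seen = set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        for nxt in successors.get(node, []):
            if nxt in subset:
                return False  # value left the subset and came back in
            stack.append(nxt)
    return True
```

For a diamond graph a→b, a→c, b→d, c→d, the set {a, b, d} is not convex because the value flowing through c leaves and re-enters, while {a, b, c} is convex.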

Limited-precision arithmetic

Fang et al. used affine arithmetic to analyze numerical characteristics of algorithms.

Mahlke et al. synthesize variable bit-width architectures given bit-width requirements.

Cluster operations to find a small number of distinct bit widths.

What advantages and disadvantages might this approach have?

[Mah01] © 2001 IEEE

