Posted on 12-Jan-2016
Instruction-Level Parallelism for Low-Power Embedded Processors
January 23, 2001
Presented by
Anup Gangwar
Embedded Systems Group IIT Delhi
Slide 2
Introduction
Need for high-performance, low-power processors
Synergistic hardware-compiler design for EPIC- or VLIW-like architectures
A new variable-length instruction scheme
Full predication support in hardware
Slide 3
Outline
Instruction-Level Parallelism
Power Consumption in VLSI Circuits
A Look at Available Mobile and DSP Processors
High-Level Evaluation of a Low-Power VLIW Processor
The DEVIL Low-Power Processor
A Step Towards Predicated Execution
Conclusion
Slide 4
ILP: Concepts and Limitations
Data Dependences
  Flow Dependence or RAW
  Anti-Dependence or WAR
  Output Dependence or WAW
  Reduction of the critical path
Control Dependences
Resource Conflicts
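A minimal sketch of how the three data-dependence kinds can be told apart for a pair of register instructions, assuming a simple three-address format with one destination and a tuple of sources. The `Instr` type and `dependences` helper are hypothetical, not taken from the slides. Only RAW is a true dependence; WAR and WAW are name dependences that register renaming can remove.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical three-address instruction: dest <- op(srcs)."""
    text: str
    dest: Optional[str]
    srcs: Tuple[str, ...]


def dependences(first: Instr, second: Instr) -> Set[str]:
    """Classify the data dependences of `second` on an earlier `first`."""
    kinds: Set[str] = set()
    if first.dest is not None and first.dest in second.srcs:
        kinds.add("RAW")  # flow: second reads the value first writes
    if second.dest is not None and second.dest in first.srcs:
        kinds.add("WAR")  # anti: second overwrites a value first still reads
    if first.dest is not None and first.dest == second.dest:
        kinds.add("WAW")  # output: both write the same register
    return kinds


i1 = Instr("r1 = r2 + r3", "r1", ("r2", "r3"))
i2 = Instr("r4 = r1 * r5", "r4", ("r1", "r5"))
i3 = Instr("r2 = r6 - 1", "r2", ("r6",))
i4 = Instr("r1 = r7", "r1", ("r7",))

print(dependences(i1, i2))  # {'RAW'}
print(dependences(i1, i3))  # {'WAR'}
print(dependences(i1, i4))  # {'WAW'}
```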
Slide 5
Slide 6
Achieving ILP: Pipelining
Control dependencies affect pipelined execution
Data dependencies affect pipelined execution
Resource conflicts affect pipelined execution
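To make the data-dependence case concrete, here is a small sketch under simplifying assumptions: a single-issue, in-order pipeline, a fixed result latency per instruction, full forwarding (a consumer may issue as soon as the producer's latency has elapsed), and no control or structural hazards. The `inorder_cycles` helper, register names, and latencies are hypothetical. It shows how moving an independent instruction between a load and its consumer hides the load latency.

```python
from typing import Dict, List, Tuple

# (destination register, source registers, result latency in cycles)
Op = Tuple[str, Tuple[str, ...], int]


def inorder_cycles(program: List[Op]) -> int:
    """Cycles to issue `program` on a single-issue, in-order pipeline that
    stalls until every source operand is ready (RAW hazards only)."""
    ready: Dict[str, int] = {}  # register -> first cycle its value can be used
    cycle = -1  # issue cycle of the previous instruction
    for dest, srcs, latency in program:
        cycle = max([cycle + 1] + [ready.get(s, 0) for s in srcs])
        ready[dest] = cycle + latency
    return cycle + 1


load = ("r1", ("r9",), 2)         # r1 = load [r9], 2-cycle latency
use = ("r2", ("r1", "r3"), 1)     # r2 = r1 + r3, needs the load result
other = ("r4", ("r5", "r6"), 1)   # r4 = r5 - r6, independent

print(inorder_cycles([load, use, other]))  # 4 (one stall cycle)
print(inorder_cycles([load, other, use]))  # 3 (stall hidden by reordering)
```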
Slide 7
Achieving ILP: Superscalar Architectures
In-order issue with in-order completion
In-order issue with out-of-order completion
Out-of-order issue with out-of-order completion
Slide 8
Slide 9
Slide 10
Slide 11
Achieving ILP: VLIW Processors
Lower circuit overhead than superscalar processors
Limited number of resources
Explicit insertion of NOPs increases code size
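A small illustration of why explicit NOPs inflate VLIW code, assuming a hypothetical 4-slot machine in which every bundle is stored at full width and every operation is 32 bits. The `pad_bundles` helper and the sample schedule are invented for this sketch.

```python
from typing import List

NOP = "nop"


def pad_bundles(schedule: List[List[str]], width: int) -> List[List[str]]:
    """Fill every cycle of a VLIW schedule up to `width` issue slots with NOPs.

    Empty cycles (e.g. waiting on a long-latency result) become all-NOP bundles.
    """
    bundles = []
    for ops in schedule:
        if len(ops) > width:
            raise ValueError(f"{len(ops)} ops exceed {width} issue slots")
        bundles.append(ops + [NOP] * (width - len(ops)))
    return bundles


schedule = [["add", "mul"], ["ld"], [], ["st", "sub", "and"]]
bundles = pad_bundles(schedule, width=4)
slots = sum(len(b) for b in bundles)
nops = sum(op == NOP for b in bundles for op in b)
print(slots, nops)  # 16 10
print(f"{slots * 4} bytes vs {(slots - nops) * 4} bytes of useful ops")
# 64 bytes vs 24 bytes (assuming 32-bit operations)
```

Here 10 of 16 slots are NOPs, so most of the stored code carries no work; the wider the machine relative to the available ILP, the worse this ratio gets.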
Slide 12
Slide 13
Extracting ILP: Basic Block Scheduling
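A sketch of classic list scheduling within one basic block, using critical-path height as the priority. It assumes every issue slot can execute any operation and each operation has a fixed latency; the op names, latencies, and the `list_schedule` helper are hypothetical, not the compiler described in the talk.

```python
from typing import Dict, List


def list_schedule(preds: Dict[str, List[str]], latency: Dict[str, int],
                  width: int) -> List[List[str]]:
    """Greedy cycle-by-cycle list scheduling of one basic block.

    preds[op] lists the ops whose results `op` consumes (its DAG predecessors).
    Ready ops are picked by critical-path height, at most `width` per cycle.
    """
    succs: Dict[str, List[str]] = {op: [] for op in preds}
    for op, ps in preds.items():
        for p in ps:
            succs[p].append(op)

    height: Dict[str, int] = {}

    def critical_path(op: str) -> int:
        # Longest latency-weighted path from `op` to the end of the block.
        if op not in height:
            height[op] = latency[op] + max(
                (critical_path(s) for s in succs[op]), default=0)
        return height[op]

    done_at: Dict[str, int] = {}  # cycle at which each issued op's result is ready
    remaining = set(preds)
    schedule: List[List[str]] = []
    cycle = 0
    while remaining:
        ready = [op for op in remaining
                 if all(p in done_at and done_at[p] <= cycle for p in preds[op])]
        ready.sort(key=lambda op: (-critical_path(op), op))
        issued = ready[:width]
        for op in issued:
            done_at[op] = cycle + latency[op]
            remaining.discard(op)
        schedule.append(issued)  # an empty list is a cycle of NOPs
        cycle += 1
    return schedule


preds = {"ld1": [], "ld2": [], "add": ["ld1", "ld2"], "mul": ["add"], "ld3": []}
latency = {"ld1": 2, "ld2": 2, "add": 1, "mul": 2, "ld3": 2}
print(list_schedule(preds, latency, width=2))
# [['ld1', 'ld2'], ['ld3'], ['add'], ['mul']]
```

The low-priority `ld3` fills a slot in cycle 1 that would otherwise hold only NOPs while the first two loads complete.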
Slide 14
Extracting ILP: Superblock Scheduling
Slide 15
Extracting ILP: Predicated Execution
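A toy interpreter showing the idea behind if-conversion: both arms of a branch are executed, each guarded by a predicate, so the control dependence becomes a data dependence on the predicate. The tuple format and the `run` helper are hypothetical; the separate `np = !p` step is an assumption (some ISAs, such as IA-64, write both predicates with a single compare).

```python
from typing import Callable, Dict, List, Optional, Tuple

# (guard predicate register or None, destination, operation, source registers)
PredOp = Tuple[Optional[str], str, Callable[..., object], Tuple[str, ...]]


def run(program: List[PredOp], regs: Dict[str, object]) -> Dict[str, object]:
    """Execute straight-line predicated code. An op whose guard is false is
    squashed: it occupies its slot but does not write its destination."""
    regs = dict(regs)
    for guard, dest, op, srcs in program:
        if guard is None or regs[guard]:
            regs[dest] = op(*(regs[s] for s in srcs))
    return regs


# if (a > b) x = a - b; else x = b - a;  becomes, after if-conversion:
abs_diff = [
    (None, "p", lambda a, b: a > b, ("a", "b")),   # p  = (a > b)
    (None, "np", lambda p: not p, ("p",)),         # np = !p
    ("p", "x", lambda a, b: a - b, ("a", "b")),    # (p)  x = a - b
    ("np", "x", lambda a, b: b - a, ("a", "b")),   # (np) x = b - a
]

print(run(abs_diff, {"a": 7, "b": 3})["x"])  # 4
print(run(abs_diff, {"a": 2, "b": 9})["x"])  # 7
```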
Slide 16
Power Consumption in CMOS Circuits: Parallelism for Energy Efficiency
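The usual first-order argument, sketched with the standard CMOS dynamic-power model (switching activity α, switched capacitance C_L, supply voltage V_DD, clock frequency f): duplicating a datapath and halving its clock keeps throughput constant, and the longer cycle time permits a lower supply voltage. The scaling factor β is a placeholder, and the multiplexing and routing overhead that makes the real capacitance somewhat more than doubled is ignored.

```latex
P_{\text{dyn}} = \alpha \, C_L \, V_{DD}^{2} \, f

% Two copies of the datapath, each clocked at f/2, supply scaled by \beta < 1:
C' \approx 2C_L, \qquad f' = \tfrac{f}{2}, \qquad V_{DD}' = \beta V_{DD}

P' \approx \alpha \,(2C_L)\,(\beta V_{DD})^{2}\,\tfrac{f}{2} = \beta^{2} P_{\text{dyn}}
```

Because power falls quadratically with supply voltage, any β < 1 that still meets the halved clock rate saves energy at equal throughput.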
Slide 17
Slide 18
Available Mobile and VLIW Processors
The ARM Family
  The ARM7 Generation
  The StrongARM
  The ARM Thumb Option
  The ARM Piccolo Option
  The ARM9 and ARM10
Slide 19
Available Mobile and VLIW Processors
The Motorola M-Core
The LSI TinyRISC
The Hitachi SuperH Family
VLIW Processors
  The Motorola-Lucent Star*Core
  The Philips TriMedia
  The HP/Intel IA-64
Slide 20
High-Level Evaluation of a Low-Power VLIW Processor
Energy consumption distribution
Slide 21
High-Level Evaluation of a Low-Power VLIW Processor
NOP Elimination in a VLIW Processor
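The slides do not spell out the encoding used, so this is a sketch of one generic NOP-elimination scheme: NOPs are dropped and the last operation of each bundle carries a stop bit, from which a decoder can rebuild full-width bundles. It assumes operations are packed into the leftmost slots (so slot position is implied by order); the `compress` and `expand` helpers are hypothetical.

```python
from typing import List, Tuple

NOP = "nop"


def compress(bundles: List[List[str]]) -> List[Tuple[str, bool]]:
    """Drop NOPs and mark the last op of every bundle with a stop bit.

    An all-NOP bundle keeps one NOP so that the idle cycle is preserved.
    """
    stream: List[Tuple[str, bool]] = []
    for bundle in bundles:
        ops = [op for op in bundle if op != NOP] or [NOP]
        for i, op in enumerate(ops):
            stream.append((op, i == len(ops) - 1))  # (operation, stop bit)
    return stream


def expand(stream: List[Tuple[str, bool]], width: int) -> List[List[str]]:
    """Rebuild fixed-width bundles, as a decoder would, by padding with NOPs."""
    bundles: List[List[str]] = []
    current: List[str] = []
    for op, stop in stream:
        current.append(op)
        if stop:
            bundles.append(current + [NOP] * (width - len(current)))
            current = []
    return bundles


bundles = [["add", "mul", NOP, NOP], ["ld", NOP, NOP, NOP],
           [NOP] * 4, ["st", "sub", "and", NOP]]
stream = compress(bundles)
print(len(stream), "encoded ops instead of", sum(len(b) for b in bundles))
# 7 encoded ops instead of 16
print(expand(stream, 4) == bundles)  # True
```

The price of the scheme is one extra bit per operation plus a decoder that must locate bundle boundaries at run time.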
Slide 22
High-Level Evaluation of a Low-Power VLIW Processor
Speed-up Comparison
Slide 23
High-Level Evaluation of a Low-Power VLIW Processor
Energy Comparison
Slide 24
High-Level Evaluation of a Low-Power VLIW Processor
Energy-Delay Product Comparison
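The energy-delay product weighs energy and performance equally (lower is better), which is why it is used alongside the separate speed-up and energy comparisons. A minimal sketch; the configuration names and numbers are hypothetical and are not results from the talk.

```python
from typing import Dict, Tuple


def energy_delay_product(energy_j: float, delay_s: float) -> float:
    """EDP = E * D; since E = P * D, this also equals P * D**2."""
    return energy_j * delay_s


def normalized_edp(runs: Dict[str, Tuple[float, float]],
                   baseline: str) -> Dict[str, float]:
    """EDP of each (energy, delay) configuration relative to `baseline`."""
    base = energy_delay_product(*runs[baseline])
    return {name: energy_delay_product(e, d) / base
            for name, (e, d) in runs.items()}


# Hypothetical numbers: B spends 20% more energy but finishes 25% sooner.
runs = {"A": (10e-3, 2.0), "B": (12e-3, 1.5)}
print({k: round(v, 3) for k, v in normalized_edp(runs, baseline="A").items()})
# {'A': 1.0, 'B': 0.9}
```

A design can therefore win on EDP while losing on raw energy, as long as its speed-up outweighs the extra energy.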
Slide 25
The DEVIL Low-Power Processor
Complexity in VLIW Architectures
  Hardware duplication
  Number of functional units (FUs), registers, and register-file ports
  Number of FUs versus type of FU
  Number of FUs versus available ILP
Slide 26
The DEVIL Low-Power Processor
Code Memory
Slide 27
The DEVIL Low-Power Processor
Slide 28
The DEVIL Low-Power Processor
Instruction Fetch Mechanism
Slide 29
The DEVIL Low-Power Processor
Branch Prediction Mechanism
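The transcript gives only the slide title, so DEVIL's actual predictor is not known from it. As one plausible mechanism for a low-cost core, this sketches the textbook bimodal scheme: a table of 2-bit saturating counters indexed by low PC bits. The table size, initial counter value, and branch trace are assumptions.

```python
from typing import Iterable, List, Tuple


class BimodalPredictor:
    """Table of 2-bit saturating counters indexed by the low bits of the PC.

    Counter values 0-1 predict not-taken; 2-3 predict taken.
    """

    def __init__(self, entries: int = 64) -> None:
        self.entries = entries
        self.counters: List[int] = [1] * entries  # start weakly not-taken

    def predict(self, pc: int) -> bool:
        return self.counters[pc % self.entries] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = pc % self.entries
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def accuracy(trace: Iterable[Tuple[int, bool]], entries: int = 64) -> float:
    predictor = BimodalPredictor(entries)
    hits = total = 0
    for pc, taken in trace:
        hits += predictor.predict(pc) == taken
        predictor.update(pc, taken)
        total += 1
    return hits / total


# A loop-closing branch: taken 9 times, then falls through; the loop runs 10 times.
trace = ([(0x40, True)] * 9 + [(0x40, False)]) * 10
print(accuracy(trace))  # 0.89
```

The 2-bit hysteresis means the single not-taken outcome at each loop exit does not flip the prediction for the next pass.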
Slide 30
The DEVIL Low-Power Processor
Performance with and without superscalar optimizations
Slide 31
The DEVIL Low-Power Processor
Effect of superscalar optimizations on code size
Slide 32
The DEVIL Low-Power Processor
Effect of NOP elimination on code size
Slide 33
The DEVIL Low-Power Processor
Effect of NOP elimination on the number of accesses to code memory
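A rough model of why removing NOPs also cuts code-memory traffic: with a fetch buffer that holds one memory line, straight-line code needs one access per line touched. The line size of four operations, the bundle sizes, and the `line_accesses` helper are hypothetical, and branches are ignored.

```python
from typing import List, Optional


def line_accesses(ops_per_bundle: List[int], ops_per_line: int) -> int:
    """Count code-memory line reads for straight-line fetch of consecutive bundles.

    Assumes a one-line fetch buffer: a new line is read only when the next
    operation lies outside the line currently buffered. Branches are ignored.
    """
    accesses = 0
    buffered: Optional[int] = None  # index of the line held in the fetch buffer
    pos = 0  # position of the next operation in the code image
    for n in ops_per_bundle:
        for k in range(n):
            line = (pos + k) // ops_per_line
            if line != buffered:
                accesses += 1
                buffered = line
        pos += n
    return accesses


fixed = [4, 4, 4, 4]       # four 4-slot bundles stored with their NOPs
compressed = [2, 1, 1, 3]  # same cycles with NOPs removed (one kept for an idle cycle)
print(line_accesses(fixed, 4), line_accesses(compressed, 4))  # 4 2
```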
Slide 34
The DEVIL Low-Power Processor
Effect of the instruction fetch mechanism on code size
Slide 35
The DEVIL Low-Power Processor
Code size comparison with existing mobile processors
Slide 36
A Step Towards Predicated Execution
Compiler techniques for reducing predicated code size
  Reduction of the number of control instructions
  Predicate promotion and instruction merging
  Instruction reduction for advanced code generation
Slide 37
A Step Towards Predicated Execution: Reduction of the Number of Control Instructions
Slide 38
A Step Towards Predicated Execution: Predicate promotion and Instruction merging
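The slide gives only the titles, so this is a sketch of one common formulation of each transformation, not necessarily the talk's. Predicate promotion removes the guard from a side-effect-free operation whose result is not live out and is read only under the same guard, so executing it on the other path is harmless. Instruction merging (as read here) replaces an identical pair of operations guarded by p and !p with one unguarded operation. The `Op` type, register names, and live-out set are hypothetical, and memory dependences are ignored.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class Op:
    guard: Optional[str]  # "p", "!p", or None (always executes)
    dest: str
    opcode: str
    srcs: Tuple[str, ...]
    side_effects: bool = False  # e.g. stores or loads that may fault


def complement(guard: str) -> str:
    return guard[1:] if guard.startswith("!") else "!" + guard


def promote(ops: List[Op], live_out: Set[str]) -> List[Op]:
    """Predicate promotion: drop the guard when running the op on the other
    path cannot be observed (no side effects, result read only under the same
    guard, and not live out of the region)."""
    out = []
    for k, op in enumerate(ops):
        readers = [o for j, o in enumerate(ops) if j != k and op.dest in o.srcs]
        safe = (op.guard is not None and not op.side_effects
                and op.dest not in live_out
                and all(r.guard == op.guard for r in readers))
        out.append(replace(op, guard=None) if safe else op)
    return out


def _interferes(o: Op, a: Op) -> bool:
    # Register conflicts only; memory dependences are ignored in this sketch.
    pred = a.guard.lstrip("!") if a.guard else None
    return a.dest in o.srcs or o.dest in (a.dest, pred) or o.dest in a.srcs


def merge(ops: List[Op]) -> List[Op]:
    """Instruction merging: identical ops under p and !p execute exactly once
    between them on every path, so the pair becomes one unguarded op."""
    out = list(ops)
    i = 0
    while i < len(out):
        a = out[i]
        if a.guard is not None:
            for j in range(i + 1, len(out)):
                b = out[j]
                if (b.guard == complement(a.guard)
                        and (b.dest, b.opcode, b.srcs) == (a.dest, a.opcode, a.srcs)
                        and not any(_interferes(o, a) for o in out[i + 1:j])):
                    out[i] = replace(a, guard=None)
                    del out[j]
                    break
        i += 1
    return out


pair = [Op("p", "y", "add", ("a", "b")), Op("!p", "y", "add", ("a", "b"))]
print(merge(pair))
# [Op(guard=None, dest='y', opcode='add', srcs=('a', 'b'), side_effects=False)]

region = [
    Op("p", "t1", "add", ("a", "b")),
    Op("p", "x", "mul", ("t1", "c")),
    Op("!p", "x", "sub", ("a", "b")),
]
print([op.guard for op in promote(region, live_out={"x"})])  # [None, 'p', '!p']
```

Both transformations shrink the number of operations that need a predicate, which matters when predicate information costs encoding bits.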
Slide 39
A Step Towards Predicated Execution
Introducing predication support into the processor
  Effect of full predication on code size
  Code size and execution characteristics of predication
  Prefix-based predication
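The transcript does not describe DEVIL's prefix format, so this is only a size model under hypothetical parameters (32-bit base instructions, a 4-bit predicate field for full predication, a 16-bit prefix that guards a run of up to four instructions). It illustrates the trade-off: full predication charges every instruction, while prefixes charge only predicated runs.

```python
from typing import List, Optional

Guards = List[Optional[str]]  # guard of each instruction, None if unpredicated


def full_predication_bits(guards: Guards, base_bits: int = 32,
                          pred_field_bits: int = 4) -> int:
    """Every instruction carries a predicate field, whether predicated or not."""
    return len(guards) * (base_bits + pred_field_bits)


def prefix_predication_bits(guards: Guards, base_bits: int = 32,
                            prefix_bits: int = 16, max_run: int = 4) -> int:
    """Unpredicated instructions pay nothing extra; one prefix sets the guard
    for a run of up to `max_run` consecutive instructions with the same guard."""
    prefixes = 0
    run_guard: Optional[str] = None
    run_len = 0
    for g in guards:
        if g is None:
            run_guard, run_len = None, 0
        elif g == run_guard and run_len < max_run:
            run_len += 1
        else:
            prefixes += 1
            run_guard, run_len = g, 1
    return len(guards) * base_bits + prefixes * prefix_bits


mostly_plain = [None, None, "p", "p", "p", "!p", "!p", None, None, None]
alternating = ["p", "!p", "p", "!p"]
print(full_predication_bits(mostly_plain), prefix_predication_bits(mostly_plain))  # 360 352
print(full_predication_bits(alternating), prefix_predication_bits(alternating))    # 144 192
```

Under these assumptions prefixes win when predicated instructions are sparse or clustered, and lose when guards alternate densely.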
Slide 40
A Step Towards Predicated Execution
Relative number of predicated instructions
Slide 41
A Step Towards Predicated Execution
Code expansion considering predication
Slide 42
A Step Towards Predicated Execution
Code reductions due to predicated execution
Slide 43
Conclusions
A synergistic hardware-compiler approach for low-power processors
A new VLIW architecture that limits the increase in code size
A prefix-based architectural framework for predicated execution