Introduction to gem5
Nizamudheen Ahmed
Texas Instruments
Introduction
• A full-system computer architecture simulator
– Open source tool focused on architectural modeling
– BSD license
• Encompasses system-level architecture as well as processor micro-architecture
• The gem5 simulation infrastructure is the merger of
– The best aspects of the M5 and
– The best aspects of GEMS
• M5 – Highly configurable simulation framework to support multiple ISAs, and diverse CPU models
– developed @ The University of Michigan
• GEMS [General Execution-driven Multiprocessor Simulator] – detailed and flexible memory system model
– Includes support for multiple cache coherence protocols and interconnect models
– developed @ The University of Wisconsin Madison
Ref: http://gem5.org/dist/tutorials/hipeac2012/gem5_hipeac.pdf
Features - Framework
• A Simulation framework
– C++ based simple Discrete event simulation kernel
– sc_thread, sc_cthread and wait not supported
– gem5 Events provide a mechanism to schedule, deschedule and reschedule events on the simulation time-line
• Objects (derived from SimObject) schedule their own events on the EventQueue.
• Python based front-end interface
– Python scripts to construct topology being simulated
– Initialization, configuration, simulation control & statistics
• Supports: Alpha, ARM, MIPS, Power, SPARC, and x86
– ARM
• ARM detailed configuration similar to Cortex-A15
• including support for Thumb, Thumb-2, VFPv3 and NEON instruction set extensions
• Multiple system simulation
– Example: multiple SoC connected over a simulated-Ethernet link
• Boots Linux and Android
– Enough IP model supported to boot Linux
• VNC capabilities (Graphics capabilities)
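The schedule/deschedule/reschedule mechanism listed above can be illustrated with a minimal sketch. This is not gem5 code: `Event`, `EventQueue` and `Ticker` are hypothetical Python stand-ins, written only to show how an object can place its own events on a time-ordered queue.

```python
import heapq
import itertools


class Event:
    """A callback plus its scheduled tick; 'when' is None while unscheduled."""

    def __init__(self, callback):
        self.callback = callback
        self.when = None
        self._token = None  # identifies the live heap entry for this event


class EventQueue:
    """Toy discrete-event queue ordered by tick, then by insertion order."""

    def __init__(self):
        self.cur_tick = 0
        self._heap = []
        self._counter = itertools.count()

    def schedule(self, event, when):
        if event.when is not None:
            raise RuntimeError("event already scheduled")
        if when < self.cur_tick:
            raise ValueError("cannot schedule in the past")
        event.when = when
        event._token = next(self._counter)
        heapq.heappush(self._heap, (when, event._token, event))

    def deschedule(self, event):
        if event.when is None:
            raise RuntimeError("event not scheduled")
        # Lazy deletion: the stale heap entry is skipped when it is popped.
        event.when = None
        event._token = None

    def reschedule(self, event, when):
        if event.when is not None:
            self.deschedule(event)
        self.schedule(event, when)

    def run(self):
        while self._heap:
            when, token, event = heapq.heappop(self._heap)
            if event._token != token:
                continue  # entry was descheduled or rescheduled
            self.cur_tick = when
            event.when = None
            event._token = None
            event.callback()


class Ticker:
    """Toy stand-in for a SimObject that schedules its own periodic event."""

    def __init__(self, queue, period, count, log):
        self.queue, self.period, self.remaining, self.log = queue, period, count, log
        self.event = Event(self.tick)
        queue.schedule(self.event, period)

    def tick(self):
        self.log.append(self.queue.cur_tick)
        self.remaining -= 1
        if self.remaining > 0:
            self.queue.schedule(self.event, self.queue.cur_tick + self.period)
```

Lazy deletion keeps `deschedule` cheap at the cost of leaving stale entries in the heap until they are popped; a real kernel may choose a different data structure.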
Features – System modes
• gem5 supports 2 fundamental modes of operation
• Full system (FS)
– Models bare hardware, including devices
– Interrupts, exceptions, privileged instructions, fault handlers
– Use-case: booting an operating system
– Additional feature: Simulated UART output & frame buffer output
• Syscall emulation (SE)
– Models user-visible ISA plus common system calls
– System calls emulated, typically by calling host OS
– Simplified address translation model, no scheduling
– Use-case: benchmarking individual applications, or a set of applications on MP
Features – CPU Models
• Configurable CPU models : Supports 3 CPU models
– Simple Atomic/Timing : Fast CPU model
– InOrder: Detailed pipelined in-order CPU model
– O3: Detailed pipelined out-of-order CPU model
• Supports a domain specific language to represent ISA details
• Includes information to generate the decode function
• Example
def bitfield OPCODE <31:26>;
def bitfield IMM <12>;
def signed bitfield MEMDISP <15:0>;
decode OPCODE {
0: Integer::add({{ Rc = Ra + Rb; }});
1: Integer::sub({{ Rc = Ra - Rb; }});
}
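To make the role of the bitfield definitions concrete, here is a small Python sketch of the kind of decode function such a description could generate. It is not output of the gem5 ISA parser. The OPCODE, IMM and MEMDISP positions come from the example above; the register field positions (Ra <25:21>, Rb <20:16>, Rc <4:0>) are assumptions made for illustration, since the example does not define them.

```python
def bits(word, hi, lo=None):
    """Return bits hi..lo (inclusive) of word; a single bit if lo is omitted."""
    if lo is None:
        lo = hi
    width = hi - lo + 1
    return (word >> lo) & ((1 << width) - 1)


def sext(value, width):
    """Sign-extend a width-bit unsigned value to a Python int."""
    sign = 1 << (width - 1)
    return (value ^ sign) - sign


def fields(word):
    """Extract the bitfields named in the ISA description example."""
    return {
        "OPCODE": bits(word, 31, 26),
        "IMM": bits(word, 12),
        "MEMDISP": sext(bits(word, 15, 0), 16),  # 'signed bitfield'
    }


def decode(word):
    """Map OPCODE to an operation name plus (assumed) register indices."""
    ra, rb, rc = bits(word, 25, 21), bits(word, 20, 16), bits(word, 4, 0)
    opcode = bits(word, 31, 26)
    if opcode == 0:
        return ("add", ra, rb, rc)
    if opcode == 1:
        return ("sub", ra, rb, rc)
    raise ValueError("unknown opcode %d" % opcode)


def execute(word, regs):
    """Apply the decoded instruction to a register file (a list of ints)."""
    op, ra, rb, rc = decode(word)
    if op == "add":
        regs[rc] = regs[ra] + regs[rb]
    else:
        regs[rc] = regs[ra] - regs[rb]
```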
Features – Memory reference Interfaces
• Three transport interfaces : functional, atomic, timing
• Functional
– Similar to TLM debug-transport
– Untimed call
– No state change intended
– Use-case: loading binaries, memory introspection, etc.
• Atomic
– Similar to TLM blocking transport (but no wait)
– Time annotation
– State change allowed (cache fill, eviction and so on)
– Use-case: LT-style modeling, cache warming, etc.
• Timing
– Similar to TLM non-blocking transport
– Non-blocking interface, time annotations, multiple phases
– Use-case: Detailed memory access behavior analysis
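The difference between the three interfaces can be sketched with a toy memory model. The class and method names below (`ToyMemory`, `Timeline`, `functional_read`, `atomic_read`, `timing_read`) are hypothetical and are not the gem5 port API; the `accesses` counter merely stands in for model state such as cache contents or statistics.

```python
import heapq


class Timeline:
    """Tiny event scheduler used only to deliver timing-mode responses."""

    def __init__(self):
        self.now = 0
        self._pending = []
        self._seq = 0

    def schedule(self, when, fn):
        heapq.heappush(self._pending, (when, self._seq, fn))
        self._seq += 1

    def run(self):
        while self._pending:
            self.now, _, fn = heapq.heappop(self._pending)
            fn()


class ToyMemory:
    def __init__(self, size, latency):
        self.data = bytearray(size)
        self.latency = latency
        self.accesses = 0  # stand-in for timing/statistics state

    def functional_write(self, addr, payload):
        """Untimed access, e.g. for loading a binary; no model state is touched."""
        self.data[addr:addr + len(payload)] = payload

    def functional_read(self, addr, length):
        """Untimed debug-style read; no model state is touched."""
        return bytes(self.data[addr:addr + length])

    def atomic_read(self, addr, length):
        """Completes within one call and returns (data, annotated latency)."""
        self.accesses += 1
        return bytes(self.data[addr:addr + length]), self.latency

    def timing_read(self, addr, length, timeline, on_response):
        """Request returns immediately; the response arrives later via callback."""
        self.accesses += 1
        data = bytes(self.data[addr:addr + length])
        timeline.schedule(timeline.now + self.latency, lambda: on_response(data))
```

In this sketch the atomic call hands back a latency for the caller to account for, whereas the timing call splits the access into a request phase and a later response phase.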
Features – Memory System (1)
• Memory System:
– Classic (from M5): Fast and configurable memory system model
– Ruby (from GEMS): framework/infrastructure to model a variety of cache-coherent memory systems.
• Classic memory model
– Fast and easily configurable memory-model.
– Supports Atomic as well as Timing mode operation
– Higher simulation speed compared to Ruby
– Models a simple snooping cache-coherence protocol.
– Less accurate than detailed Ruby model
Features – Memory System (2)
• Ruby
– Detailed model for the memory subsystem.
– Supports the timing access interface; does not support the atomic access interface.
– Supports a domain-specific language called SLICC (Specification Language for Implementing Cache Coherence)
• Supports a wide variety of cache coherence protocols, from directory to snooping protocols and several points in between.
• SLICC file → SLICC compiler → documentation and cache controller model code for cache coherency
– Includes
• Inclusive/exclusive cache hierarchies
• Various replacement policies
• Coherence protocols
• Interconnection network
• DMA & Memory controller
– Ruby accurately models on-chip network contention and flow control
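A coherence protocol description is essentially a table of (state, event) → (next state, actions) transitions. The sketch below shows that idea in Python for a textbook MSI protocol with stable states only. It is not SLICC syntax and not a protocol shipped with gem5; the event and action names are made up for illustration, and a real protocol would also need transient states.

```python
# (current state, event) -> (next state, actions)
TRANSITIONS = {
    ("I", "Load"): ("S", ["issue_GetS"]),
    ("I", "Store"): ("M", ["issue_GetM"]),
    ("I", "Other_GetS"): ("I", []),
    ("I", "Other_GetM"): ("I", []),
    ("S", "Load"): ("S", ["hit"]),
    ("S", "Store"): ("M", ["issue_GetM"]),
    ("S", "Other_GetS"): ("S", []),
    ("S", "Other_GetM"): ("I", ["invalidate"]),
    ("S", "Replacement"): ("I", []),
    ("M", "Load"): ("M", ["hit"]),
    ("M", "Store"): ("M", ["hit"]),
    ("M", "Other_GetS"): ("S", ["send_data"]),
    ("M", "Other_GetM"): ("I", ["send_data"]),
    ("M", "Replacement"): ("I", ["writeback"]),
}


class CacheLine:
    """One cache line driven by the transition table above."""

    def __init__(self):
        self.state = "I"
        self.log = []

    def handle(self, event):
        # A missing table entry means the protocol does not define the case.
        try:
            next_state, actions = TRANSITIONS[(self.state, event)]
        except KeyError:
            raise ValueError("no transition for %s in state %s" % (event, self.state))
        self.log.extend(actions)
        self.state = next_state
        return actions
```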
Features – Checkpointing & Fast-forward
• Checkpointing
– Snapshot the relevant system state
– Restore it later
• The ISA, number of cores and memory-map need to be the same to restore the session
– Uses serialize and unserialize concepts
– Supported on classic memory-model as well as Ruby memory-model.
• Fast-forward
– The idea is to start the simulation in atomic mode and switch over to detailed mode for the relevant/important simulation period
– The switch may consume a few more simulation cycles to drain outstanding memory-access requests
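The serialize/unserialize idea behind checkpointing, including the requirement that ISA, core count and memory map match on restore, can be sketched as follows. `ToySystem` and its JSON layout are hypothetical and do not reflect gem5's checkpoint format or API.

```python
import json


class ToySystem:
    """Hypothetical stand-in for a simulated system with checkpointable state."""

    def __init__(self, isa, num_cores, mem_map):
        self.isa = isa
        self.num_cores = num_cores
        # Memory map as [start, size] pairs; lists survive a JSON round trip.
        self.mem_map = [list(region) for region in mem_map]
        self.tick = 0
        self.pcs = [0] * num_cores

    def _config(self):
        return {"isa": self.isa, "num_cores": self.num_cores, "mem_map": self.mem_map}

    def serialize(self):
        """Snapshot the relevant state together with the configuration."""
        return json.dumps({"config": self._config(), "tick": self.tick, "pcs": self.pcs})

    def unserialize(self, blob):
        """Restore a snapshot, refusing one taken on a different configuration."""
        state = json.loads(blob)
        if state["config"] != self._config():
            raise ValueError("checkpoint does not match ISA, core count or memory map")
        self.tick = state["tick"]
        self.pcs = list(state["pcs"])
```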
Flexibility
Source: The gem5 Simulator, May 2011 issue of ACM SIGARCH Computer Architecture News
GEM5 accuracy
Ref: A. Butko, R. Garibotti, L. Ost, and G. Sassatelli, "Accuracy Evaluation of GEM5 Simulator System," in Proceedings of the IEEE International Workshop on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC), York, United Kingdom, July 2012.
• Real System:
– ST-Ericsson Nova A9500 processor
– Dual-core ARM Cortex-A9 processor (1 GHz) running a Linux kernel
– It also features a number of DSP and ASIP cores along with a Mali-400 GPU
• GEM5 System
– Dual-core ARM Cortex-A9 core running at 1 GHz
– 32-kB private L1 data and instruction caches, 512-kB shared L2 cache
– DDR physical memory running at 400 MHz
– Linux Kernel 2.6.38
• Conclusion
– “According to the results, the accuracy varies
from 1.39% to 17.94% depending on the
memory traffic. In the worst scenario,
mismatch has been shown to result from
overly simple model of the external DDR
memory . . .”
TI & GEM5
• Wrapped the gem5 ISA simulator in a SystemC wrapper and plugged it into the architecture modeling tool chain (1H2011).
– SystemC scheduler integration
– Classic memory-model to TLM 2.0 bridge
• Working closely with a third party (3P) to upstream the SystemC wrapper and TLM 2.0 bridge into the standard gem5 code-base
– Plan to close this by 1H13
• Enabled full-system performance optimization for next-gen
heterogeneous SoC
– Running complex Linux workloads
– Heavily used to address many-core challenges
SystemC Integration (1)
[Figure: gem5 Model → Event → gem5 Event Queue (ordered by time); sc_event.notify(t); Event popped to the gem5 Model when the time comes]
Ref: Integrating gem5 in systemC simulations, Alexandre Romaña, Texas Instruments
http://www.m5sim.org/wiki/images/7/72/Gem5_workshop_systemC_integration_ext.pdf
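One way to read the scheduler integration is that the gem5 event queue asks the SystemC kernel to wake it at the time of its earliest event, in the manner of sc_event.notify(t), and then pops whatever is due. The Python sketch below illustrates that reading only; it is not the TI wrapper, and `HostKernel`, `HostEvent` and `BridgedQueue` are hypothetical stand-ins for the SystemC kernel, an sc_event with one sensitive method, and the wrapped gem5 queue. It assumes that when a notification is already pending, the earlier one wins.

```python
import heapq


class HostKernel:
    """Stand-in for the host (SystemC-like) scheduler."""

    def __init__(self):
        self.now = 0
        self._heap = []
        self._seq = 0

    def _arm(self, when, event):
        heapq.heappush(self._heap, (when, self._seq, event))
        self._seq += 1

    def run(self):
        while self._heap:
            when, _, event = heapq.heappop(self._heap)
            if event.pending != when:
                continue  # stale entry, overridden by an earlier notification
            self.now = when
            event.pending = None
            event.handler()


class HostEvent:
    """Stand-in for an sc_event with a single sensitive method."""

    def __init__(self, kernel, handler):
        self.kernel, self.handler, self.pending = kernel, handler, None

    def notify(self, delay):
        when = self.kernel.now + delay
        # Keep only the earliest pending notification.
        if self.pending is None or when < self.pending:
            self.pending = when
            self.kernel._arm(when, self)


class BridgedQueue:
    """Toy gem5-style event queue that is driven by the host kernel."""

    def __init__(self, kernel):
        self.kernel = kernel
        self._events = []
        self._seq = 0
        self.wake = HostEvent(kernel, self._service)

    def schedule(self, when, callback):
        if when < self.kernel.now:
            raise ValueError("cannot schedule in the past")
        heapq.heappush(self._events, (when, self._seq, callback))
        self._seq += 1
        self.wake.notify(when - self.kernel.now)

    def _service(self):
        # Pop every event that is due, then re-arm for the next one.
        while self._events and self._events[0][0] <= self.kernel.now:
            _, _, callback = heapq.heappop(self._events)
            callback()
        if self._events:
            self.wake.notify(self._events[0][0] - self.kernel.now)
```

With this arrangement the host kernel remains the only owner of simulated time, so native host events and bridged events interleave in time order.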
SystemC Integration (2)
[Figure: blocks labeled GEM5 classic, Simulation bridge, Generic TLM 2.0, Protocol Bridge, AMBA TLM 2.0 and SystemC AMBA Model; annotation: "Free from Carbon Design Systems"]
Ref: Integrating gem5 in systemC simulations, Alexandre Romaña, Texas Instruments
http://www.m5sim.org/wiki/images/7/72/Gem5_workshop_systemC_integration_ext.pdf
Tool dependency
• GCC 4.2+
• Python
• SWIG
• SCons (build)
• Google Protocol Buffers
Summary
• gem5 introduction
– High-level features (CPU/Memory/System)
• Active gem5 community
– The gem5 community and user group are very active
• Past 100 days
– ~850 mails on the gem5-users mailing list
– ~1200 mails on the gem5-dev mailing list
• Resources
– Subscribe to the mailing lists
• gem5-users – Questions about using/running gem5
• gem5-dev – Questions about modifying the simulator
– Submit a patch to our ReviewBoard
• http://reviews.gem5.org
– Read & Contribute to the wiki
• http://www.gem5.org
Q & A
Envisioned use-case for system simulation
• SW development and verification
– Binary translation models (QEMU/OVP) are fast enough to do this and have
a mature SW development environment
• HW/SW performance verification
– Need a performance measure of 1st-order accuracy, capturing the things that actually matter
• Early architecture Exploration
– Need an environment where it is fast and easy to model and connect the
key architectural components of hardware platform
• HW/SW functional verification
– RTL is representative enough and has enough visibility and a mature
methodology
Courtesy: http://gem5.org/dist/tutorials/hipeac2012/02.introduction.m4v