
Predictable Integration of Safety-Critical Software on COTS- based Embedded Systems

Date post: 25-Feb-2016
Predictable Integration of Safety-Critical Software on COTS-based Embedded Systems Marco Caccamo University of Illinois at Urbana-Champaign
Transcript
Page 1: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

Predictable Integration of Safety-Critical Software on COTS-based Embedded Systems

Marco Caccamo
University of Illinois at Urbana-Champaign

Page 2: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

Outline

• Motivation
• PRedictable Execution Model (PREM)
  – Peripheral scheduler & real-time bridge
  – Memory-centric scheduling
• MemGuard
  – Memory bandwidth isolation
• Colored Lockdown
  – Cache space management

Page 3: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

Real-Time Applications

• Resource-intensive real-time applications
  – Multimedia processing (*), real-time data analytics (**), object tracking
• Requirements
  – Need more performance at lower cost: Commercial Off-The-Shelf (COTS)
  – Performance guarantee (i.e., temporal predictability and isolation)

(*) ARM, QoS for High-Performance and Power-Efficient HD Multimedia, 2010
(**) Intel, The Growing Importance of Big Data and Real-Time Analytics, 2012

Page 4: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

Modern System-on-Chip (SoC)

• More cores
  – Freescale P4080 has 8 cores
• More sharing
  – Shared memory hierarchy (LLC, MC, DRAM)
  – Shared I/O channels

More performance, less energy, less cost.

But, isolation?

Page 5: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

SoC: challenges for RT safety-critical systems

• In a multicore chip, memory controllers, last-level cache, memory, on-chip network and I/O channels are globally shared by the cores. Unless a globally shared resource is over-provisioned, it must be partitioned, reserved, or scheduled. Otherwise:
  – Complexity, cost and schedule: the schedulability analysis, testing and temporal certification of an IMA partition in one core will also depend on tasks running in other cores.
  – Safety concerns: a software change in one core could cause tasks in other cores' IMA partitions to miss their deadlines. This is unacceptable!

Page 6: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

Problem: Shared Memory Hierarchy

• Shared hardware resources
• OS has little control

[Figure: App 1 to App 4 run on Core1 to Core4 over a shared Last Level Cache (LLC), Memory Controller (MC), and DRAM; labels: space sharing, access contention.]

Page 7: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

Problem: Task-Peripheral conflict (1 core)

• Task-peripheral conflict:
  – Master peripheral working for Task B.
  – Task A suffers cache miss.
  – Processor activity can be stalled due to interference at the FSB level.
• How relevant is the problem?
  – Up to 49% increased WCET for memory-intensive tasks.
  – Contention for access to main memory can greatly increase a task's worst-case computation time!

This effect MUST be considered in WCET computation!!

[Figure: CPU (Task A, Task B), Front Side Bus, DDRAM, Host PCI Bridge, PCI Bus, master peripheral, slave peripheral.]

Sebastian Schonberg, Impact of PCI-Bus Load on Applications in a PC Architecture, RTSS 03

Page 8: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

Experiment: Task and Peripherals

• Experiment on Intel platform, typical embedded system speed.
• PCI-X 133 MHz, 64-bit, fully loaded by traffic generator peripheral.
• Task suffers continuous cache misses.
• Up to 44% WCET increase.

Page 9: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

Experiment: 2 Cores Interference

• Task A suffers max number of cache misses (92% stall time).
• Task B has variable cache stall time.

WCET increase proportional to cache stall time

Max WCET increase ~= cache stall time of task A

• Adding PCI-E peripheral interference -> 196% WCET increase!

Multicore interference is a serious problem!!!

Page 10: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

Problem: Bus Contention

• Two DMA peripherals transmitting at full speed on PCI-X bus.
• Round-robin arbitration does not allow timing guarantees.

Transaction Length    Bandwidth (256B)
No interference       596MB/s (100%)
128 bytes             441MB/s (74%)
256 bytes             346MB/s (58%)
512 bytes             241MB/s (40%)

[Figure: CPU and RAM on the shared bus.]

Page 11: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

Problem: Bus Contention

• Two DMA peripherals transmitting at full speed on PCI-X bus.
• Round-robin arbitration does not allow timing guarantees.

[Figure: NO BUS SHARING. Timelines for the two peripherals (time axis t: 0, 8, 16; values 3 and 6); CPU and RAM.]

Page 12: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

Problem: Bus Contention

• Two DMA peripherals transmitting at full speed on PCI-X bus.
• Round-robin arbitration does not allow timing guarantees.

[Figure: BUS CONTENTION, 50% / 50%. Timelines for the two peripherals (time axis t: 0, 8, 16; values 6, 10 and 4); CPU and RAM.]

Page 13: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

Problem: Bus Contention

• Two DMA peripherals transmitting at full speed on PCI-X bus.
• Round-robin arbitration does not allow timing guarantees.

[Figure: BUS CONTENTION, 33% / 66%. Timelines for the two peripherals (time axis t: 0, 8, 16; values 9 and 9); CPU and RAM.]

Integration Nightmare!!!

Page 14: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

Cache Delay Analysis (contention-based access)

• Compute worst-case increase in task computation time due to peripheral interference (single-core system).
• Main idea: treat the memory subsystem as a switch that multiplexes accesses between the CPU and peripherals.
• The same analysis was later extended to multicore platforms.

[Figure: task cache fetches over time t (WCET with no interference, followed by the WCET increase), shown against peripheral bandwidth over time t.]

R. Pellizzoni and M. Caccamo, "Impact of Peripheral-Processor Interference on WCET Analysis of Real-Time Embedded Systems", IEEE Transactions on Computers (TC), Vol. 59, No. 3, March 2010.

Page 15: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

Modeling I/O traffic: Peripheral Arrival Curve

• Key idea: the maximum task delay depends on the amount of peripheral traffic (single core).
• Peripheral arrival curve: maximum amount of time required by all peripherals to access main memory in any interval of length t.
• Can be obtained using…
  – Measurement
  – Distributed traffic analysis
  – Enforced through engineering solution (more on that later…)

Page 16: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

The Need for Engineering Solutions

• Analysis bounds are tight but depend on very peculiar arrival patterns.
• Average case significantly lower than worst case.
  – Main issue: COTS arbiters are not designed for predictability.
• We propose engineering solutions to:
  1. schedule memory accesses at high level (coarse granularity): memory-centric real-time scheduling,
  2. control cores' memory bandwidth usage,
  3. manage cache space in a predictable manner.

Page 17: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

Outline

• Motivation
• PRedictable Execution Model (PREM)
  – Peripheral scheduler & real-time bridge
  – Memory-centric scheduling
• MemGuard
  – Memory bandwidth isolation
• Colored Lockdown
  – Cache space management

Page 18: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

Peripheral Scheduling

• Solution: enforce peripheral schedule (single resource scheduling).
• No need to know low-level parameters!

[Figure: IMPLICIT SCHEDULE ENFORCEMENT. Timelines for the two peripherals (time axis t: 0, 8, 16; value 3), each with a BLOCK interval; CPU and RAM.]

COTS peripherals do not provide block functionality, so how do we do this?

Page 19: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

Real-Time I/O Management System

• Real-Time Bridge interposed between peripheral and bus.
• RT-Bridge buffers incoming/outgoing data and delivers it predictably.
• Peripheral Scheduler enforces traffic isolation.

[Figure: CPU, RAM, North Bridge, South Bridge; PCIe, PCI-X and ATA links; four RT Bridges; Peripheral Scheduler.]

E. Betti, S. Bak, R. Pellizzoni, M. Caccamo and L. Sha, "Real-Time I/O Management System with COTS Peripherals", IEEE Transactions on Computers (TC), Vol. 62, No. 1, pp. 45-58, January 2013.

Page 20: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

Peripheral Scheduler

• Peripheral Scheduler receives data_rdy_i information from Real-Time Bridges and outputs block_i signals.
• Server provides isolation by enforcing a timing reservation.
• Fixed priority, cyclic executive etc. can be implemented in HW with very little area.

Scheduler (FP):
  EXEC1 = READY1
  EXEC2 = READY2 and not EXEC1
  . . .
  EXECi = READYi and not EXEC1 … and not EXECi-1

[Figure: Server1, Server2, …, Serveri with signals data_rdy_i, block_i, READYi and EXECi, connected to the Scheduler (FP).]
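The fixed-priority equations on this slide can be read as a priority chain: a server executes only if it is ready and no higher-priority server is executing. The Python sketch below illustrates that reading only; it is not the hardware design from the talk, and the function name `fixed_priority_exec` and the list-based signal representation are assumptions made for the example.

```python
def fixed_priority_exec(ready):
    """Compute EXEC signals from READY signals.

    `ready` is ordered from highest priority (index 0) to lowest.
    Implements EXECi = READYi and not EXEC1 ... and not EXECi-1.
    """
    exec_signals = []
    higher_executing = False  # True once some higher-priority EXEC is set
    for is_ready in ready:
        executes = is_ready and not higher_executing
        exec_signals.append(executes)
        higher_executing = higher_executing or executes
    return exec_signals


# Servers 2 and 3 are ready; only server 2 (higher priority) executes.
grants = fixed_priority_exec([False, True, True])
```

At most one EXEC signal is ever set, which is what makes this a single-resource scheduler for the shared bus.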

Page 21: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

Real-Time Bridge

• FPGA System-on-Chip design with CPU, external memory, and custom DMA Engine.
• Connected to main system and peripheral through available PCI/PCIe bridge modules.

[Figure: FPGA block diagram with FPGA CPU, PLB, Interrupt Controller, DMA Engine, Memory Controller, Local RAM and two PCI Bridges; outside the FPGA: Host CPU and Main Memory (System + PCI) and the controlled peripheral (PCI). Signals: IntMain, IntFPGA, block, data_rdy.]

Page 22: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

Real-Time Bridge

• The controlled peripheral reads from and writes to Local RAM instead of Main Memory (completely transparent to the peripheral).
• DMA Engine transfers data between Main Memory and Local RAM.

[Figure: FPGA block diagram with FPGA CPU, PLB, Interrupt Controller, DMA Engine, Memory Controller, Local RAM and two PCI Bridges; outside the FPGA: Host CPU and Main Memory (System + PCI) and the controlled peripheral (PCI). Signals: IntMain, IntFPGA, block, data_rdy.]

Page 23: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

Peripheral Virtualization

• RT-Bridge supports peripheral virtualization.
• Single peripheral (e.g., Network Interface Card) can service different software partitions.
• HW virtualization enforces strict timing isolation.

Page 24: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

Implemented Prototype

• Xilinx TEMAC 1Gb/s Ethernet card (integrated on FPGA).
• Optimized virtual driver implementation with no software packet copy (PowerPC running Linux).
• Full VHDL HW code and SW implementation available.

Page 25: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

Evaluation

• 3 x Real-Time Bridges, 1 x Traffic Generator with synthetic traffic.
• Rate Monotonic with Sporadic Servers.

Peripheral    Transfer Time    Budget    Period
RT Bridge     7.5ms            9ms       72ms
Generator     4.4ms            5ms       8ms

Utilization 1, harmonic periods.

Scheduling flows without peripheral scheduler (block always low) leads to deadline misses!

[Figure: bus traces for the Generator and the three RT-Bridges.]
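As a quick check of the "Utilization 1, harmonic periods" remark, the sketch below sums budget/period over the four servers. It assumes that all three Real-Time Bridges use the same 9ms/72ms reservation (the slide lists a single RT Bridge row), and the variable names are illustrative only.

```python
from fractions import Fraction

# (budget_ms, period_ms) per server: three RT Bridges (assumed identical)
# plus one traffic generator, taken from the table on this slide.
servers = [(9, 72)] * 3 + [(5, 8)]

# Total utilization: 3 * 9/72 + 5/8
utilization = sum(Fraction(budget, period) for budget, period in servers)

# Periods are harmonic if each distinct period divides the next larger one.
periods = sorted({period for _, period in servers})
harmonic = all(longer % shorter == 0
               for shorter, longer in zip(periods, periods[1:]))
```

Using exact fractions avoids floating-point rounding when checking that the utilization is exactly 1.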

Page 26: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

Evaluation

• 3 x Real-Time Bridges, 1 x Traffic Generator with synthetic traffic.
• Rate Monotonic with Sporadic Servers.

Peripheral    Transfer Time    Budget    Period
RT Bridge     7.5ms            9ms       72ms
Generator     4.4ms            5ms       8ms

No deadline misses with peripheral scheduler

[Figure: bus traces for the Generator and the three RT-Bridges.]

Page 27: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

Testbed (single core, distributed)

• Embedded testbed used to prove the applicability of our techniques.
• System objective: control a 3DOF Quanser helicopter.
  – Non-linear control.
  – 100 Hz sensing and actuation.
• End-to-end delay control using:
  – I/O Management System.
  – Real-Time Bridge.

Page 28: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

Testbed (single core, distributed)

• Sensor node performs sensing/actuation.
• Control node executes control algorithm.
• Data exchanged on real-time network.

[Figure: Quanser 3DOF helicopter, Sensor Node, RT Network, Control Node.]

Page 29: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

Testbed

[Figure: testbed block diagram with Sensing/actuation node, Control Node, RT Switch and GUI Node; components shown: CPU, RAM, Mem logic, Peripheral Scheduler, PCI, RT Bridge, Traffic Generator, two RT NIC cards, ADC/DAC card, two NICs; flows labeled Sensing data, Actuation, Disturb.]

Page 30: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

Real-Time Bridge Demo

Page 31: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

Predictable Execution Model (PREM uni-core)

• (The rule) Real-time embedded applications should be compiled according to a new set of rules to achieve predictability.
• (The effect) The execution of a task can be divided into a memory-intensive phase (with cache prefetching) and a local computation phase (with cache hits).
• (The benefit) High-level coscheduling can be enforced among all active components of a COTS system: contention for accessing shared resources is implicitly resolved by the high-level coscheduler without relying on low-level arbiters.

R. Pellizzoni, E. Betti, S. Bak, G. Yao, J. Criswell, M. Caccamo, R. Kegley, "A Predictable Execution Model for COTS-based Embedded Systems", Proceedings of 17th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS), Chicago, USA, April 2011.

Page 32: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems


Page 33: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems


Page 34: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems


Page 35: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems


Page 36: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems


Page 37: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

Memory-centric scheduling (multicore)

• It uses the PREM task model: each task is composed of a sequence of intervals, each including a memory phase followed by a computation phase.
• It enforces a coarse-grain TDMA schedule for granting memory access to each core.
• Each core can be analyzed in isolation as if tasks were running on a "single-core equivalent" platform.

G. Yao, R. Pellizzoni, S. Bak, E. Betti, and M. Caccamo, "Memory-centric scheduling for multicore hard real-time systems", Real-Time Systems Journal, Vol. 48, No. 6, pp. 681-715, November 2012.

Page 38: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

Two cores example: TDMA slot of core 1

With a coarse-grained TDMA, tasks on one core can perform the memory access only when the TDMA slot is granted.

[Figure: schedule of jobs J1, J2, J3 (time axis 0, 4, 8, 12), showing memory phases and computation phases. Label: Core Isolation.]

Page 39: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems


Memory-centric scheduling: three rules

• Assumption: fixed priority, partitioned scheduling

• Rule 1: enforce a coarse-grain TDMA schedule among the cores for granting access to main memory;

• Rule 2: raise scheduling priority of memory phases over execution phases when TDMA memory slot is granted;

• Rule 3: memory phases are non-preemptive.
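
Rule 1 can be illustrated with a small sketch that answers "which core owns the memory at time t?". It assumes equal-length slots handed out in round-robin order, which the slides do not state explicitly; the function names `slot_owner` and `may_start_memory_phase` are hypothetical.

```python
def slot_owner(t, slot_len, num_cores):
    """Return the index of the core that owns the TDMA memory slot at time t.

    Assumes equal slots of length `slot_len`, assigned round-robin
    starting with core 0 at t = 0.
    """
    return (t // slot_len) % num_cores


def may_start_memory_phase(t, core, slot_len, num_cores):
    """A core may start a memory phase only inside its own TDMA slot."""
    return slot_owner(t, slot_len, num_cores) == core


# Two cores, slot length 4 (an assumed value for illustration).
owners = [slot_owner(t, 4, 2) for t in (0, 3, 4, 7, 8)]
```

Rules 2 and 3 (priority boost and non-preemptive memory phases) act inside each core's own slot and are not modeled here.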

Page 40: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

Raise priority of mem. phases during TDMA slot

[Figure: two schedules of jobs J1, J2, J3 (time axis 0, 4, 8, 12), showing memory phases and computation phases.]

Page 41: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

Make memory phases non-preemptive

[Figure: two schedules of jobs J1, J2, J3 (time axis 0, 4, 8, 12).]

Page 42: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems


Summary of two cores example

Rule 1 – TDMA memory schedule

Rule 2 – Prioritize memory phases during a TDMA memory slot

Rule 3 – memory phases are non-preemptive

Page 43: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

Intuition of response time analysis

The linearized TDMA model:
1. b is the memory bandwidth assigned to the core (b = TDMA_slot / TDMA_period).
2. Each memory phase is inflated by a factor 1/b; each execution phase is inflated by a factor 1/(1-b).
3. Interfering jobs that contribute to worst-case response time can be separated as a memory chain followed by an execution chain.

[Figure: jobs J1 to J5 on a time axis (0, 10, 20, 30, 40), grouped into a memory chain and an execution chain.]
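
The inflation step of the linearized TDMA model can be written out directly. The sketch below covers only items 1 and 2 of the list (computing b and inflating one memory phase and one execution phase); it is not the full response-time analysis, and the function name `linearize` and its parameters are assumptions for illustration.

```python
from fractions import Fraction


def linearize(mem_phase, exec_phase, tdma_slot, tdma_period):
    """Inflate one job's phases under the linearized TDMA model.

    b = tdma_slot / tdma_period is the memory bandwidth share of the core.
    The memory phase is inflated by 1/b, the execution phase by 1/(1-b).
    """
    b = Fraction(tdma_slot, tdma_period)
    if not 0 < b < 1:
        raise ValueError("the model needs 0 < b < 1")
    return mem_phase / b, exec_phase / (1 - b)


# Slot of 2 in a period of 8 gives b = 1/4:
# a memory phase of 1 becomes 4, an execution phase of 3 becomes 4.
inflated_mem, inflated_exec = linearize(1, 3, tdma_slot=2, tdma_period=8)
```

Exact fractions keep the inflated lengths free of rounding error, which is convenient when comparing them against deadlines.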

Page 44: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

Pipelining memory and exec. phases

Key observations:
• The inflated memory and execution phases can run in parallel.
• Only ONE joint job contributes to both memory and execution chains (in this figure, J3 is the joint job).

[Figure: jobs J1 to J5 on a time axis (0, 10, 20, 30, 40), grouped into a memory chain and an execution chain.]

Page 45: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

Worst-case response time of Job Ji

1. Both the memory and the computation of the joint job
2. Longest memory phase of one job with lower priority (due to non-preemptive memory)
3. The max of memory and computation phase for each higher priority job
4. The computation phase of the job under analysis

[Figure: response-time timeline with segments labeled: 1. Upper bound of the memory phase of the joint job; 2. Memory blocking from one lower priority job; 3. Either memory or computation from hp(i); 4. Computation of job under analysis.]

Page 46: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

Schedulability of synthetic tasks

In an 8-core, 10-task system, the memory-centric scheduling bound is superior to the contention-based scheduling bound.

[Figure: schedulability ratio plotted over core utilization (Core Util) and memory utilization (Memory Util).]

Page 47: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

Schedulability of synthetic tasks

The contour line at 50% schedulable level

[Figure: contour of schedulability ratio (Ratio = .5) over core utilization (Core Util) and memory utilization (Memory Util).]

Page 48: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

Outline

• Motivation
• PRedictable Execution Model (PREM)
  – Peripheral scheduler & real-time bridge
  – Memory-centric scheduling
• MemGuard
  – Memory bandwidth isolation
• Colored Lockdown
  – Cache space management

Page 49: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

Memory Interference

• Key observations:
  – Memory bandwidth (variable) != CPU bandwidth (constant)
  – Memory controller queuing/access delay is unpredictable

[Figure: Intel Core2 with two cores, L2 caches and shared memory; the foreground benchmark (X-axis) runs against the background benchmark 470.lbm. Bar chart of foreground slowdown ratio (y-axis 1.0 to 2.2) for 437.leslie3d, 462.libquantum, 410.bwaves and 471.omnetpp, with bandwidth labels (1.6GB/s), (1.5GB/s), (1.5GB/s), (1.4GB/s) and a further label (2.1GB/s).]

Page 50: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

Memory Access Pattern

• Memory access patterns vary over time
• Static resource reservation is inefficient

[Figure: two plots of LLC misses versus Time (ms).]

Page 51: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems


Memory Bandwidth Isolation

• MemGuard provides an OS mechanism to enforce memory bandwidth reservation for each core

H. Yun, G. Yao, R. Pellizzoni, M. Caccamo, L. Sha, "MemGuard: Memory Bandwidth Reservation System for Efficient Performance Isolation in Multi-core Platforms", to appear at IEEE RTAS, April 2013.

Page 52: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

MemGuard

• Characteristics
  – Memory bandwidth reservation system
  – Memory bandwidth: guaranteed + best-effort
  – Prediction-based dynamic reclaiming for efficient utilization of guaranteed bandwidth
  – Maximize throughput by utilizing best-effort bandwidth whenever possible
• Goal
  – Minimum memory performance guarantee
  – A dedicated (slower) memory system for each core in multi-core systems

Page 53: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

Memory Bandwidth Reservation

• Idea
  – Control interference by regulating per-core memory traffic
  – OS monitors and enforces each core's memory bandwidth usage
    • Using per-core HW performance counter (PMC) and scheduler

[Figure: per-core Budget and Core activity (computation, memory fetch) over time (axis 0, 10, 20; periods 1 and 2), with events labeled Dequeue tasks, Enqueue tasks, Dequeue tasks.]
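
The budget-and-period idea can be simulated in a few lines. This is a simplified model rather than the MemGuard kernel module: time advances in discrete ticks, the per-tick memory demand is given as a list, and the name `regulate` and its parameters are assumptions for illustration. In the real system the counting would be done by the per-core performance counter mentioned on the slide.

```python
def regulate(demand_per_tick, budget, period):
    """Simulate per-core memory bandwidth regulation.

    At the start of every `period` ticks the budget is replenished and the
    core's tasks are enqueued again. Once the budget is used up, the tasks
    are dequeued (the core is throttled) until the next period.
    Returns the number of memory accesses actually served in each tick.
    """
    served = []
    remaining = budget
    throttled = False
    for tick, demand in enumerate(demand_per_tick):
        if tick % period == 0:
            remaining = budget   # replenish the budget
            throttled = False    # "Enqueue tasks"
        if throttled:
            served.append(0)
            continue
        grant = min(demand, remaining)
        remaining -= grant
        served.append(grant)
        if remaining == 0:
            throttled = True     # "Dequeue tasks"
    return served


# Budget of 5 accesses per 4-tick period.
trace = regulate([3, 3, 3, 3, 1, 1], budget=5, period=4)
```

In this trace the core exhausts its budget in the second tick, idles for the rest of the first period, and resumes when the second period starts.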

Page 54: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

Guaranteed Bandwidth: rmin

• Definition
  – Minimum memory transfer rate
    • when requests are back-logged in the DRAM controller
    • worst-case access pattern: same bank & row miss
• Example (PC6400-DDR2*)
  – Peak B/W: 6.4GB/s
  – Measured minimum B/W: 1.2GB/s

(*) PC6400-DDR2 with 5-5-5 (RAS-CAS-CL latency setting)
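
For context, the peak figure follows from the module's data rate and bus width. The sketch below assumes the usual reading of PC6400 as DDR2-800 (800 million transfers per second) on a 64-bit data bus, which is background knowledge rather than something stated on the slide; the function name is illustrative.

```python
def peak_bandwidth_mb_s(mega_transfers_per_s, bus_width_bits):
    """Peak bandwidth in MB/s = transfers per second * bytes per transfer."""
    return mega_transfers_per_s * bus_width_bits // 8


peak = peak_bandwidth_mb_s(800, 64)   # assumed DDR2-800, 64-bit bus
guaranteed = 1200                      # measured minimum from the slide, MB/s
guaranteed_fraction = guaranteed / peak
```

Under these assumptions the guaranteed bandwidth is under a fifth of the peak, which is why reserving only up to rmin is conservative.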

Page 55: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

Memory Bandwidth Reservation

• System-wide reservation rule
  – The sum of the per-core reservations Bi may be up to the guaranteed bandwidth rmin (m: number of cores)
• MemGuard approximates a dedicated (ideal) memory subsystem
  – bandwidth: Bi (bytes/sec)
  – latency: 1/Bi (sec/byte)
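
The reservation rule amounts to an admission check on the per-core reservations. The sketch below is one way to express it, together with a helper that splits rmin by weights (such as the 1:2:4:8 weighting used later in the evaluation). Both function names are hypothetical, and values are in MB/s.

```python
def reservations_fit(reservations_mb_s, r_min_mb_s):
    """True if the per-core reservations add up to at most rmin."""
    return sum(reservations_mb_s) <= r_min_mb_s


def split_by_weight(r_min_mb_s, weights):
    """Divide rmin among cores in proportion to the given weights."""
    total = sum(weights)
    return [r_min_mb_s * w / total for w in weights]


# rmin = 1.2GB/s as measured on the evaluation platform.
ok = reservations_fit([1000, 200], 1200)          # 1.0GB/s + 0.2GB/s
too_much = reservations_fit([1000, 2000], 1200)   # exceeds rmin
shares = split_by_weight(1200, [1, 2, 4, 8])
```

With each core held to its share Bi, the per-byte service time seen by that core is at most 1/Bi, which is the "dedicated (slower) memory system" view on the slide.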

Page 56: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

Memory Bandwidth Reclaim

• Key objective
  – Utilize guaranteed bandwidth efficiently
• Regulator
  – Predicts memory usage based on history
  – Donates surplus to the reclaim manager at the beginning of every period
  – When remaining budget (assigned – donated) is depleted, tries to reclaim from the reclaim manager
• Reclaim manager
  – Collects the surplus from all cores
  – Grants reclaimed bandwidth to individual cores on demand
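
The interaction between the regulators and the reclaim manager can be sketched as two small classes. This is an illustrative model only: the class and method names are invented, the prediction is passed in as a number instead of being derived from history, and clearing the pool at each period boundary is an assumption.

```python
class ReclaimManager:
    """Global pool of donated budget (arbitrary units, e.g. memory accesses)."""

    def __init__(self):
        self.pool = 0

    def new_period(self):
        self.pool = 0  # assumption: donations do not carry over

    def donate(self, amount):
        self.pool += amount

    def reclaim(self, wanted):
        granted = min(wanted, self.pool)
        self.pool -= granted
        return granted


class Regulator:
    """Per-core regulator that donates predicted surplus and reclaims on demand."""

    def __init__(self, assigned, manager):
        self.assigned = assigned
        self.manager = manager
        self.budget = assigned

    def start_period(self, predicted_usage):
        # Donate whatever the core is not expected to need this period.
        surplus = max(0, self.assigned - predicted_usage)
        self.manager.donate(surplus)
        self.budget = self.assigned - surplus

    def on_budget_depleted(self, wanted):
        # Ask the reclaim manager for more before throttling the core.
        extra = self.manager.reclaim(wanted)
        self.budget += extra
        return extra
```

If a core's prediction was too low and the pool has already been drained by other cores, `on_budget_depleted` returns 0; this is the misprediction case that makes reservations with reclaiming "soft".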

Page 57: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

Hard/Soft Reservation on MemGuard

• Hard reservation (w/o reclaiming)
  – Guarantees memory bandwidth Bi regardless of other cores
  – Selectively applicable on per-core basis
• Soft reservation (w/ reclaiming)
  – Does not guarantee reserved bandwidth due to potential misprediction
  – Error cases can occur due to misprediction
  – Error rate is small (shown in evaluation)
• Best-effort bandwidth
  – After all cores use their given budgets, and before the next period begins, MemGuard broadcasts to all cores to continue executing

Page 58: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

Evaluation Platform

• Intel Core2Quad 8400, 4MB L2 cache, PC6400 DDR2 DRAM
• Modified Linux kernel 3.6.0 + MemGuard kernel module
  – https://github.com/heechul/memguard/wiki/MemGuard
• Used all 29 benchmarks from SPEC2006 and synthetic benchmarks

[Figure: Intel Core2Quad with Core 0 to Core 3, each with L1-I and L1-D caches; two L2 caches; System Bus; DRAM.]

Page 59: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

Isolation Effect of Reservation

• When the sum of the b/w reservations does not exceed rmin (1.2GB/s): isolation
  – 1.0GB/s (X-axis) + 0.2GB/s (lbm) = rmin

[Figure: Core 0: 1.0 GB/s for X-axis; Core 2: 0.2 – 2.0 GB/s for lbm. Labels: Isolation; Solo [email protected]/s.]

Page 60: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

Effects of Reclaiming and Spare Sharing

• Guarantee foreground ([email protected]/s)
• Improve throughput of background ([email protected]/s): 368%

Page 61: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

Effect of MemGuard

• Soft real-time application on each core.
• Provides differentiated memory bandwidth
  – weight for each core = 1:2:4:8 for the guaranteed b/w; spare bandwidth sharing is enabled

Page 62: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

Outline

• Motivation
• PRedictable Execution Model (PREM)
  – Peripheral scheduler & real-time bridge
  – Memory-centric scheduling
• MemGuard
  – Memory bandwidth isolation
• Colored Lockdown
  – Cache space management

Page 63: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

LVL3 Cache & Storage Interference

• Inter-core interference
  – The biggest issue wrt modular certification
  – Fetches by one core might evict cache blocks owned by another core
  – Hard to analyze!
• Inter-task/inter-partition interference
• Intra-task interference
  – Also present in single-core systems; intra-task interference is mainly a result of cache self-eviction.

Page 64: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

Inter-Core Interference: Options

• Private cache
  – Often not the case: the majority of COTS multicore platforms have a last-level cache shared among cores
• Cache-Way Partitioning
  – Easy to apply, but inflexible
  – Reducing number of ways per core can greatly increase cache conflicts
• Colored Lockdown
  – Our proposed approach
  – Use coloring to solve cache conflicts
  – Fine-grained assignment of cache resources (page size: 4 Kbytes)
  – Use cache locking instructions to lock "hot" pages of RT-critical tasks: locked pages cannot be evicted from cache

R. Mancuso, R. Dudko, E. Betti, M. Cesati, M. Caccamo, R. Pellizzoni, "Real-Time Cache Management Framework for Multi-core Architectures", to appear at IEEE RTAS, Philadelphia, USA, April 2013.

Page 65: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

How Coloring Works

• The position of a cache block inside the cache depends on the value of the index bits within its physical address.

• Key idea: the OS decides the physical memory mapping of a task's virtual memory pages, so it can manipulate the index bits to map different pages into non-overlapping sets of cache lines (colors).
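
The mapping from physical address to color can be sketched in a few lines. This is only an illustration: the cache geometry below (2 MiB, 32 ways, 64-byte lines, 4 KiB pages) is an assumption chosen so that the result matches the 32 ways / 16 colors picture used in this deck, and the function names are hypothetical.

```python
# Assumed cache geometry (not taken from the slides).
CACHE_SIZE = 2 * 1024 * 1024  # bytes
WAYS = 32
LINE_SIZE = 64                # bytes per cache line
PAGE_SIZE = 4096              # bytes per page

NUM_SETS = CACHE_SIZE // (WAYS * LINE_SIZE)  # sets selected by the index bits
WAY_SIZE = NUM_SETS * LINE_SIZE              # bytes covered by one way
NUM_COLORS = WAY_SIZE // PAGE_SIZE           # pages that fit side by side in one way


def set_index(paddr):
    """Cache set selected by the index bits of a physical address."""
    return (paddr // LINE_SIZE) % NUM_SETS


def page_color(paddr):
    """Color of the physical page containing paddr."""
    return (paddr // PAGE_SIZE) % NUM_COLORS


def sets_of_page(page_base):
    """All cache sets that the lines of one physical page map to."""
    return {set_index(page_base + off) for off in range(0, PAGE_SIZE, LINE_SIZE)}


if __name__ == "__main__":
    print("colors:", NUM_COLORS)
    # Pages with different colors never share a cache set.
    print(sets_of_page(0x0000) & sets_of_page(0x1000))  # empty set
```

Under these assumptions, two pages conflict in the cache only if they have the same color, which is why the OS can separate tasks simply by choosing which physical pages back their virtual pages.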


Page 67: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

How Coloring Works

• You can think of a set-associative cache as an array…

[Figure: the cache drawn as an array with 32 ways (columns) and 16 colors (rows)]

Page 68: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

How Coloring Works

• You can think of a set-associative cache as an array…
• Using only cache-way partitioning, you are restricted to assigning cache blocks by columns.
• Note: assigning one way turns it into a direct-mapped cache!

Page 69: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

How Coloring + Locking Works

• You can think of the cache as an array…
• Combining coloring and locking, you can assign an arbitrary position to cache blocks, independently of the replacement policy.
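
The array picture can be made concrete with a small model. The sketch below assumes the 16 colors by 32 ways grid from the figure; `way_partition`, `ColoredLockdown`, and the first-free-way placement policy are hypothetical names and choices made for illustration, not the framework's actual interface.

```python
NUM_COLORS = 16  # rows of the array
NUM_WAYS = 32    # columns of the array


def way_partition(ways):
    """Cells (color, way) owned under way partitioning: always whole columns."""
    return {(color, way) for way in ways for color in range(NUM_COLORS)}


class ColoredLockdown:
    """Toy model: each locked page occupies exactly one (color, way) cell."""

    def __init__(self):
        self.owner = {}  # (color, way) -> page name

    def lock(self, page, color):
        # Assumed policy: place the page in the first free way of its color.
        for way in range(NUM_WAYS):
            if (color, way) not in self.owner:
                self.owner[(color, way)] = page
                return way
        raise RuntimeError("no free way left for this color")


if __name__ == "__main__":
    # One way per core: a single cell per color, i.e. direct-mapped behavior.
    print(len(way_partition([0])))  # 16 cells, one per color
    cache = ColoredLockdown()
    print(cache.lock("task1_hot0", color=3))  # way 0 of row 3
    print(cache.lock("task1_hot1", color=3))  # way 1 of row 3
```

In this model, way partitioning always hands out whole columns, while colored lockdown can fill individual cells, so several pages of the same color can coexist as long as free ways remain in that row.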

Page 70: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

Colored Lockdown: Final Goal

• Target model: suffer cache misses in hot memory regions only once
  – During the startup phase, prefetch & lock the hot memory regions
  – Sharp improvement in terms of WCET reduction (and schedulability)

[Figure: timelines of task T1 on CPU1 and task T2 on CPU2; legend: startup, memory access, execution, hot region]
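
The "miss only once" goal can be illustrated with a toy simulation of a single cache set. Everything here is an assumption made for the sake of the example: an LRU replacement policy, one hot line belonging to the critical task, and an interfering core that streams through new lines between the task's accesses. It is not a model of any specific platform.

```python
from collections import OrderedDict


def hot_line_misses(ways, rounds, interfering_lines, lock_hot):
    """Count misses on one hot line in a single LRU cache set.

    Each round, the critical task touches its hot line once, then another
    core touches `interfering_lines` fresh lines mapping to the same set.
    """
    lru = OrderedDict()  # unlocked lines, least recently used first
    hot_locked = False
    misses = 0
    next_line = 0

    def capacity():
        # A locked line permanently occupies one way of the set.
        return ways - (1 if hot_locked else 0)

    def insert(line):
        if capacity() == 0:
            return  # no unlocked way left; the access bypasses this set
        while len(lru) >= capacity():
            lru.popitem(last=False)  # evict the least recently used line
        lru[line] = True

    for _ in range(rounds):
        if hot_locked:
            pass  # locked lines always hit
        elif "hot" in lru:
            lru.move_to_end("hot")  # hit: refresh LRU position
        else:
            misses += 1
            if lock_hot:
                hot_locked = True  # prefetch & lock during startup
            else:
                insert("hot")
        for _ in range(interfering_lines):
            insert(("other", next_line))
            next_line += 1
    return misses


if __name__ == "__main__":
    print(hot_line_misses(ways=4, rounds=10, interfering_lines=4, lock_hot=False))
    print(hot_line_misses(ways=4, rounds=10, interfering_lines=4, lock_hot=True))
```

With enough interference to sweep the whole set between two accesses, the unlocked hot line misses in every round, whereas the locked line misses only during startup.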

Page 71: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

Detecting Hot Regions

• In the general case, the cache is not large enough to hold the working sets of all running real-time critical tasks.

• For each real-time critical task, we can identify some highly used virtual memory regions, called hot memory regions. Such regions can be identified through profiling.

• Critical tasks do NOT color dynamically linked libraries. Dynamic memory allocation is allowed only during the startup phase.

Page 72: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

Detecting Hot Regions

• How can we detect hot pages? Given a process address space:
  – Their location is unknown
  – Their absolute virtual memory addresses change from run to run

[Figure: process address space with text, data, and heap sections; hot regions marked]

Page 73: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

Detecting Hot Regions

• Execute the unmodified task inside a profiling environment

• The output is the list of every accessed virtual memory address

• We keep per-page access counters. Hotter pages will record a higher number of accesses.

[Figure: the observed task runs inside the profiling environment; instrumentation code is added at run-time and memory accesses are caught]
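
The per-page counters can be sketched as follows, assuming the profiler's output is available as a plain list of virtual addresses and that pages are 4 KiB. The function names and the coverage threshold used to pick hot pages are assumptions for illustration; the slides do not say how the cut-off is chosen.

```python
from collections import Counter

PAGE_SHIFT = 12  # assumes 4 KiB pages


def rank_pages(trace):
    """Return (virtual page number, access count) pairs, hottest first."""
    counts = Counter(addr >> PAGE_SHIFT for addr in trace)
    return counts.most_common()


def hot_pages(trace, coverage):
    """Pick the hottest pages until `coverage` (0..1) of all accesses is caught."""
    ranked = rank_pages(trace)
    total = sum(n for _, n in ranked)
    if total == 0:
        return [], 0.0
    chosen, covered = [], 0
    for page, n in ranked:
        if covered >= coverage * total:
            break
        chosen.append(page)
        covered += n
    return chosen, covered / total


if __name__ == "__main__":
    # Hypothetical trace: three pages with 6, 3, and 1 accesses.
    trace = [0x8040A010] * 6 + [0x8040B000] * 3 + [0x90000004]
    print([(hex(p), n) for p, n in rank_pages(trace)])
    print(hot_pages(trace, coverage=0.8))
```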

Page 74: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

Detecting Hot Regions

• Rank the virtual pages by number of accesses.

• Since absolute addresses change from run to run, identify each page as a pair of values:
  – The index of the section which contains the page
  – The offset, expressed in pages, from the beginning of the section
  E.g.: virtual page #: 0x8040A → Section #3 (text) + 0x3

• Execute the task again outside the profiling environment to obtain an unaltered list of sections.

• Compute the relative position of a hot page according to the unaltered list of sections.
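
The section-relative encoding can be sketched as below. The section layout is hypothetical: it is only chosen so that section 3 starts at page 0x80407, which reproduces the example 0x8040A → section 3 + 0x3. Sections are represented as (start page, length in pages) pairs sorted by start page, and the list position is used as the section index; these are assumptions of the sketch.

```python
from bisect import bisect_right


def to_section_relative(vpage, sections):
    """Map a virtual page number to (section index, page offset)."""
    starts = [start for start, _ in sections]
    idx = bisect_right(starts, vpage) - 1  # last section starting at or before vpage
    if idx < 0:
        raise ValueError("page lies before the first section")
    start, length = sections[idx]
    if vpage >= start + length:
        raise ValueError("page does not belong to any section")
    return idx, vpage - start


def to_absolute(index, offset, sections):
    """Recover the virtual page number in a run with a different layout."""
    return sections[index][0] + offset


if __name__ == "__main__":
    # Hypothetical layouts of the same task in two different runs.
    profiled_run = [(0x10000, 4), (0x20000, 8), (0x30000, 2), (0x80407, 16)]
    later_run = [(0x11000, 4), (0x21000, 8), (0x31000, 2), (0x7F123, 16)]
    index, offset = to_section_relative(0x8040A, profiled_run)
    print(index, hex(offset))
    print(hex(to_absolute(index, offset, later_run)))
```

Because only the pair (section index, offset) is stored, the same profile stays valid even when the sections are loaded at different addresses in a later run.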

Page 75: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

Detecting Hot Regions

• The final memory profile will look like:

Rank   # + page offset
A      1 + 0x0002
B      1 + 0x0004
C      25 + 0x0000
D      1 + 0x0001
E      25 + 0x0003
I      3 + 0x0000
K      4 + 0x0000
O      6 + 0x0002
P      1 + 0x0005
Q      1 + 0x0000
...

• A, B, … is the page ranking
• "#" is the section index
• The profile can be fed into the kernel to perform selective Colored Lockdown
• How many pages should be locked per process? Task WCET reduction as a function of the number of locked pages has an approximately convex shape; convex optimization can be used to allocate cache among real-time critical tasks.
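
As a rough illustration of the allocation question, the sketch below uses a greedy marginal-gain rule instead of a general convex solver. It is a stand-in, not the method used in the talk, and it gives the best split only under the assumption that each task's WCET decreases with diminishing returns as more pages are locked. The WCET numbers and task names are made up.

```python
import heapq


def allocate_pages(wcet_curves, budget):
    """Split `budget` lockable pages among tasks.

    wcet_curves maps a task name to a list where entry k is the task's
    WCET with k hot pages locked (hypothetical profiling data).
    """
    alloc = {name: 0 for name in wcet_curves}
    heap = []  # max-heap on WCET gain, stored as negative values
    for name, curve in wcet_curves.items():
        if len(curve) > 1:
            heapq.heappush(heap, (-(curve[0] - curve[1]), name))
    while budget > 0 and heap:
        neg_gain, name = heapq.heappop(heap)
        if -neg_gain <= 0:
            break  # no remaining page reduces any WCET
        alloc[name] += 1
        budget -= 1
        k = alloc[name]
        curve = wcet_curves[name]
        if k + 1 < len(curve):
            # Queue the gain of this task's next page.
            heapq.heappush(heap, (-(curve[k] - curve[k + 1]), name))
    return alloc


if __name__ == "__main__":
    curves = {
        "taskA": [100, 60, 40, 35],  # made-up WCETs for 0..3 locked pages
        "taskB": [80, 70, 65, 64],
    }
    print(allocate_pages(curves, budget=3))
```

Each step gives the next page to the task whose WCET drops the most, so with the made-up curves above the first two pages go to taskA and the third to taskB.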

Page 76: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

EEMBC Results

• EEMBC Automotive benchmarks
  – Benchmarks converted into periodic tasks
  – Each task has a 30 ms period
• ARM-based platform
  – 1 GHz dual-core Cortex-A9 CPU
  – 1 MB L2 cache + private L1 (disabled)
• Tasks observed on Core 0
  – Each plotted sample summarizes the execution of 100 jobs
• Interference generated with synthetic tasks on Core 1

Page 77: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

EEMBC Results

• Angle to time conversion benchmark (a2time)

• Baseline reached when 4 hot pages are locked / 81% accesses caught

Page 78: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

EEMBC Results

• CAN remote data request benchmark (canrdr)

• Baseline reached when 3 pages are locked / 91% accesses caught

Page 79: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

EEMBC Results

• Same experiment executed on 7 EEMBC benchmarks

Benchmark   Total Pages   Hot Pages   % Accesses in Hot Pages
a2time      15            4           81%
basefp      21            6           97%
bitmnp      19            5           80%
cacheb      30            5           92%
canrdr      16            3           85%
rspeed      14            4           85%
tblook      17            3           81%

Page 80: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

EEMBC Results

• One benchmark at a time scheduled on Core 0
• Only the hot pages are locked

[Figure legend: No Prot. / No Interf.; No Prot. / Interf.; Prot. / Interf.]

Page 81: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

EEMBC Results

• Four benchmarks at a time scheduled on Core 0
• Only the hot pages are locked

[Figure legend: Prio 4 (top priority), Prio 3, Prio 2, Prio 1 (low priority)]

Page 82: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

Conclusions

• In a multicore chip, memory controllers, last-level cache, memory, on-chip network, and I/O channels are globally shared by cores. Unless a globally shared resource is overprovisioned, it must be partitioned/reserved/scheduled.

• We proposed a set of engineering solutions to:
  1. schedule memory accesses at a high level (PREM + memory-centric scheduling),
  2. control cores' memory bandwidth usage (MemGuard),
  3. manage cache space in a predictable manner (Colored Lockdown).

• We demonstrated our techniques on different platforms based on Intel and ARM, and tested them against other options.

• Questions?

Page 83: Predictable  Integration  of  Safety-Critical Software  on  COTS- based Embedded  Systems

Acknowledgements

• Part of this research is joint work with Prof. Lui Sha and Prof. Rodolfo Pellizzoni
• This presentation is from selected research sponsored by
  – National Science Foundation (NSF), Office of Naval Research (ONR)
  – Lockheed Martin Corporation
  – Rockwell Collins
• Graduate students and postdocs involved in this research: Stanley Bak, Heechul Yun, Renato Mancuso, Roman Dudko, Emiliano Betti, Gang Yao

References

• E. Betti, S. Bak, R. Pellizzoni, M. Caccamo and L. Sha, "Real-Time I/O Management System with COTS Peripherals", IEEE Transactions on Computers (TC), Vol. 62, No. 1, pp. 45-58, January 2013.
• R. Pellizzoni, E. Betti, S. Bak, G. Yao, J. Criswell, M. Caccamo, R. Kegley, "A Predictable Execution Model for COTS-based Embedded Systems", Proceedings of 17th RTAS, Chicago, USA, April 2011.
• G. Yao, R. Pellizzoni, S. Bak, E. Betti, and M. Caccamo, "Memory-centric scheduling for multicore hard real-time systems", Real-Time Systems Journal, Vol. 48, No. 6, pp. 681-715, November 2012.
• H. Yun, G. Yao, R. Pellizzoni, M. Caccamo, L. Sha, "MemGuard: Memory Bandwidth Reservation System for Efficient Performance Isolation in Multi-core Platforms", to appear at IEEE RTAS, April 2013.
• R. Mancuso, R. Dudko, E. Betti, M. Cesati, M. Caccamo, R. Pellizzoni, "Real-Time Cache Management Framework for Multi-core Architectures", to appear at IEEE RTAS, Philadelphia, USA, April 2013.
