Network On Chip Cache Coherency

Post on 22-Feb-2016

Network On Chip Cache Coherency

Final report, part B

Students: Zemer Tzach Kalifon Ethan

Instructor: Walter Isaschar

Winter 2009

Agenda

General concepts.

Description of the coherency protocol.

Architecture design.

Components implementation.

Simulations.

Functionality demonstration.

General Concepts

General Background

Modern CPUs are based on CMP (Chip Multi-Processor).

Improved performance is achieved by “Distribution and Parallelism”.

Cores interact over a NoC (Network on Chip).

NoC General Diagram

NoC Characteristics

Wormhole packet routing.

Packets follow an X-Y path.

Units can communicate simultaneously.

Reduced power consumption.

Scalability.
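
As a rough illustration of the X-Y path mentioned above, the following Python sketch lists the routers a packet would visit on a 2D grid. It assumes the packet first travels along X and then along Y, which is the usual meaning of X-Y routing. The function name `xy_route` and the (x, y) coordinate tuples are hypothetical and are not taken from the project.

```python
def xy_route(src, dst):
    """Return the (x, y) router coordinates visited from src to dst.

    Assumed X-Y routing: move along the X dimension first, then along Y.
    """
    x, y = src
    dst_x, dst_y = dst
    path = [(x, y)]

    # Phase 1: correct the X coordinate one hop at a time.
    step_x = 1 if dst_x > x else -1
    while x != dst_x:
        x += step_x
        path.append((x, y))

    # Phase 2: correct the Y coordinate one hop at a time.
    step_y = 1 if dst_y > y else -1
    while y != dst_y:
        y += step_y
        path.append((x, y))

    return path
```

Under this assumption, a packet sent from (0, 0) to (1, 1) on a 2x2 grid passes through (1, 0) on its way.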

Cache Coherency

Cache: fast, on-chip temporary storage.

Cache Coherency: CMP cores use only up-to-date data.

Traditionally, cache coherency is achieved by a central memory control unit.

Traditional Cache Coherency

[Figure labels: Line 1000 = X, Line 1000 = X, Line 1000 = Y]

Problem Description

Prior cache coherency protocols are not applicable: a NoC has no central unit.

Adding such a unit would damage both the NoC’s scalability and its parallelism.

Solution Requirements

High performance: avoid “hot spots” and “bottlenecks”.

Minimize resources.

Must not affect the main NoC characteristics (e.g. scalability).

Solution Basics

Memory control is distributed according to memory spaces.

Control units are placed as part of the NoC.
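
One way to picture the distribution of memory control is a function that maps each memory line to the controller responsible for it. The sketch below assumes the lines are split into equal contiguous ranges, one range per controller. The slides do not state how the memory spaces are actually divided, so the split rule, the function name `home_controller`, and all parameters are hypothetical.

```python
def home_controller(line, num_lines, num_controllers):
    """Return the index of the controller assumed to own `line`.

    Assumption: lines are divided into contiguous, equally sized ranges.
    """
    if not 0 <= line < num_lines:
        raise ValueError("line number out of range")
    # Ceiling division, so every line gets a controller even when the
    # number of lines is not a multiple of the number of controllers.
    lines_per_controller = -(-num_lines // num_controllers)
    return line // lines_per_controller
```

With 8 lines and 2 controllers, lines 0 to 3 would map to controller 0 and lines 4 to 7 to controller 1, so requests for different ranges never meet at a single central unit.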

Solution Diagram

Solution General Example

Read miss on line 1000.

The CPU refers to the appropriate controller.

The controller orders the transfer of data.

Another CPU sends the cache line.

[Figure labels: Line 1000 = ?, Line 1000 = X, Line 1000 = X]

Project Goal

Design and implement a cache coherency protocol for a CMP-based NoC.

Implement the NoC (part one).

Implement cache coherency support for the NoC (part two).

Coherency Protocol

General Description

Three types of transactions: Read, Read for Ownership, and Invalidation.

A cache line’s status can be I/S/E (Invalid/Shared/Exclusive, respectively).

Each cache control unit keeps a journal that determines each line’s status.

Requests are first addressed to the appropriate cache control unit.
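
The three transaction types and the I/S/E statuses can be written down as plain enumerations, with the journal modelled as a per-line status lookup. This is only a minimal sketch: the class names are hypothetical, and treating a line that was never recorded as Invalid is an assumption, not something the slides specify.

```python
from enum import Enum


class LineStatus(Enum):
    INVALID = "I"
    SHARED = "S"
    EXCLUSIVE = "E"


class Transaction(Enum):
    READ = "Read"
    READ_FOR_OWNERSHIP = "Read for Ownership"
    INVALIDATION = "Invalidation"


class Journal:
    """Hypothetical journal kept by one cache control unit."""

    def __init__(self):
        self._status = {}  # line number -> LineStatus

    def status(self, line):
        # Assumption: a line with no recorded status is Invalid.
        return self._status.get(line, LineStatus.INVALID)

    def set_status(self, line, status):
        self._status[line] = LineStatus(status)
```

The sketch deliberately leaves out the state transitions themselves, since those depend on protocol details shown only in the diagrams.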

Protocol’s Terminology

Requester.

Home Node.

Closest Sharer.

Owner.

Read Miss: Line is Shared

(1) Read Request

(2) Forward Request

(3) Data

(4) ACK
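
The four numbered messages can be sketched as a trace of (step, sender, receiver, message) tuples. The directions of the first three messages follow the Solution General Example (the CPU refers to the controller, the controller orders the transfer, another CPU sends the line). The direction of the ACK, and the choice of the closest sharer by hop count on the grid, are assumptions; the function names are hypothetical.

```python
def closest_sharer(requester, sharers):
    """Pick the sharer with the fewest grid hops to the requester (assumed metric)."""
    return min(
        sharers,
        key=lambda node: abs(node[0] - requester[0]) + abs(node[1] - requester[1]),
    )


def read_miss_shared(requester, home, sharers):
    """Return the assumed message trace for a read miss on a Shared line."""
    sharer = closest_sharer(requester, sharers)
    return [
        (1, requester, home, "Read Request"),
        (2, home, sharer, "Forward Request"),
        (3, sharer, requester, "Data"),
        (4, requester, home, "ACK"),  # direction assumed, not given in the slides
    ]
```

For a requester at (1, 1) with sharers at (0, 0) and (1, 0), this sketch forwards the request to (1, 0), which is one hop away rather than two.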

Write Miss: Line is Shared

(1) Read for Ownership

(2) Forward and Invalidation Request

(3) Data

(4) ACK

(5) Invalidation

(6) Invalidation ACK

(7) Grant Ownership
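
The seven steps can likewise be laid out as a message trace. The step numbers and message names come from the diagram labels; every sender and receiver below is an assumption, as is the idea that steps 5 and 6 are repeated once per remaining sharer. The function name and node labels are hypothetical.

```python
def write_miss_shared(requester, home, closest, other_sharers):
    """Return an assumed message trace for Read for Ownership on a Shared line."""
    trace = [
        (1, requester, home, "Read for Ownership"),
        (2, home, closest, "Forward and Invalidation Request"),
        (3, closest, requester, "Data"),
        (4, requester, home, "ACK"),
    ]
    # Assumed: the home node invalidates every remaining sharer ...
    trace += [(5, home, sharer, "Invalidation") for sharer in other_sharers]
    # ... and each of them acknowledges before ownership is granted.
    trace += [(6, sharer, home, "Invalidation ACK") for sharer in other_sharers]
    trace.append((7, home, requester, "Grant Ownership"))
    return trace
```

In this reading, Grant Ownership is the last message because the requester may only treat the line as Exclusive once no other copy remains valid.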

Design difficulties (1st example)

(1) Read for Ownership

(2) Invalidation

(3) Invalidation ACK

(4) Forward and Invalidation Request

(5) Data

Design difficulties (2nd example)

(1) Read for Ownership

(2) Forward and Invalidation Request

(2) Invalidation

(3) Data

(4) ACK

Protocol’s Features

Parallel handling of Read requests.

Data is forwarded by the Closest Sharer.

Transparency: any CPU that uses M/E/S/I is supported.

The protocol supports strongly consistent processors.

Architecture

CMP Diagram

CPU Node Structure

NoC Interface

Functions as a gateway to the NoC.

Packs flits into, and unpacks them from, the NoC’s packets.

Transmits and receives data simultaneously.
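
The packing and unpacking role can be sketched as splitting a packet payload into flits and joining them back. The flit payload size, the HEAD/BODY/TAIL tags, and the function names are all hypothetical; the slides do not describe the actual flit format.

```python
FLIT_PAYLOAD_BYTES = 2  # hypothetical flit payload width


def to_flits(payload: bytes) -> list:
    """Split a packet payload into (kind, chunk) flits."""
    if not payload:
        raise ValueError("payload must not be empty")
    chunks = [
        payload[i:i + FLIT_PAYLOAD_BYTES]
        for i in range(0, len(payload), FLIT_PAYLOAD_BYTES)
    ]
    flits = []
    for index, chunk in enumerate(chunks):
        if index == 0:
            kind = "HEAD"  # a single-flit packet consists of a HEAD only
        elif index == len(chunks) - 1:
            kind = "TAIL"
        else:
            kind = "BODY"
        flits.append((kind, chunk))
    return flits


def from_flits(flits) -> bytes:
    """Reassemble the packet payload from its flits, in order."""
    return b"".join(chunk for _, chunk in flits)
```

Because the flit count follows the payload length, short control messages would occupy fewer flits than messages carrying a cache line.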

NoC Interface Structure

CPU Interface

Adapts between the NoC’s cache coherency protocol and the CPU.

Translates the NoC’s packets into and from FSB transactions.

CPU transactions do not prevent the CPU Interface from handling the protocol’s packets.

CPU Interface Structure

Controller Node Structure

Cache Coherency Controller

Manages the coherency protocol.

Each CCC (Cache Coherency Controller) is responsible for a specific set of the memory lines.

The Directory Table (DT) holds the status of those lines, as well as several protocol information bits.

CCC Structure

DT General Structure

The DT will contain the following data for each Line:

Architecture Features

Message length varies according to its purpose, which reduces NoC congestion.

Messages carry the transaction information (reduces HW requirements).

A transaction can be blocked only by a memory update (allows high parallelism).

Scalable.

CMP Implementation

CMP Characteristics

Size of a memory unit is 1 [Byte].

A cache line comprises 2 memory units (can be enlarged).

Size of memory is 16 [Byte].

The CPU’s actions are determined by the user.
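
These sizes imply a small, fixed number of cache lines. The sketch below works out that count and splits a byte address into a line index and an offset within the line. The constant and function names are hypothetical, and the assumption is that lines are aligned and numbered from address 0.

```python
MEMORY_UNIT_BYTES = 1  # size of a memory unit
UNITS_PER_LINE = 2     # memory units per cache line
MEMORY_BYTES = 16      # total memory size

LINE_BYTES = MEMORY_UNIT_BYTES * UNITS_PER_LINE  # 2 bytes per line
NUM_LINES = MEMORY_BYTES // LINE_BYTES           # 8 lines in total


def locate(address):
    """Return (line index, offset within the line) for a byte address."""
    if not 0 <= address < MEMORY_BYTES:
        raise ValueError("address out of range")
    return divmod(address, LINE_BYTES)
```

For example, byte address 5 would fall in line 2 at offset 1 under this numbering.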

CPU Implementation

CPU Node Implementation

CCC Node Implementation

CMP Implementation

Synthesis Parameters

System Performance

System’s clock frequency is 100 [MHz].

CPU’s hold-up (in cycles):

Event         Line’s Status   CPU Delay   Total
Invalidation  S               0           9
Invalidation  E               19          28 (M)
Read Miss     I               29          38 (M)
Read Miss     S               29          38
Read Miss     E               49          58 (M)

System Performance

Event         Line’s Status   CPU Delay   Total
Write Miss    I               29          38 (M)
Write Miss    S               29          38 (C)
Write Miss    E               29          38 (C)

M: memory penalty.

C: dependent on the number of CPUs.

Delay in every node is one or two cycles.

In larger systems the network factor becomes greater.
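
At 100 MHz one clock cycle lasts 10 ns, so the cycle counts in the performance tables convert directly to time. The sketch below copies the table values and performs that conversion. The names `HOLD_UP` and `cycles_to_ns` are hypothetical, and the (M) and (C) annotations are ignored because the slides do not quantify them.

```python
CLOCK_HZ = 100_000_000  # 100 MHz system clock

# (event, line status) -> (CPU delay, total), both in cycles, from the tables.
HOLD_UP = {
    ("Invalidation", "S"): (0, 9),
    ("Invalidation", "E"): (19, 28),
    ("Read Miss", "I"): (29, 38),
    ("Read Miss", "S"): (29, 38),
    ("Read Miss", "E"): (49, 58),
    ("Write Miss", "I"): (29, 38),
    ("Write Miss", "S"): (29, 38),
    ("Write Miss", "E"): (29, 38),
}


def cycles_to_ns(cycles):
    """Convert a cycle count to nanoseconds at CLOCK_HZ."""
    return cycles * 1_000_000_000 / CLOCK_HZ


def total_ns(event, status):
    """Return the tabulated total hold-up for an event, in nanoseconds."""
    _, total_cycles = HOLD_UP[(event, status)]
    return cycles_to_ns(total_cycles)
```

A read miss on a Shared line, at 38 cycles, therefore corresponds to 380 ns. In every row the Total exceeds the CPU Delay by 9 cycles; the slides do not say what those 9 cycles account for.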

CMP Simulations

Read Miss: Line is Shared (1)

CPU1x1 reads a cache line. The requested line is held by CPU0x0.

Read Miss: Line is Shared (2)

Read Miss: Line is Shared (3)

Read Miss: Line is Exclusive (1)

CPU1x1 issues a read for ownership. The requested line is held by CPU0x0.

Read Miss: Line is Exclusive (2)

Read Miss: Line is Exclusive (3)

Read Miss: Line is Exclusive (4)

Demonstration

Demonstration Diagram

Tasks – Part A

Familiarize with the design tools.

Familiarize with the Virtex-II Pro FPGA (application & components).

Design and implement the NoC’s router.

Assemble the NoC (2x2 grid) using our router implementation.

Tasks – Part B

Design a cache coherency protocol for the CMP, based on faculty research.

Assemble a CMP based on our NoC.

Implement the protocol as part of the assembled CMP.

Future Work

Memory should be distributed.

Improve NoC Interface latency.

Messages carry all the transaction’s information.

Strongly consistent processors.

Conclusions (1)

All architectural goals were achieved.

Minimal HW utilization makes for a practical solution.

The solution is the most efficient possible under the protocol’s definition.

Conclusions (2)

The generic design makes a good basis for further studies and research.

With larger systems, the project’s advantages would be even more pronounced.