
T-HEAD - Hot Chips · 2020. 8. 16. · T-HEAD • RISC-V RV64GCV • Cluster Based Muti-core...

Date post: 26-Feb-2021
T-HEAD Xuantie-910: Innovating Cloud and Edge Computing by RISC-V, Yu Pu
Transcript
Page 1:

T-HEAD

Xuantie-910: Innovating Cloud and Edge Computing by RISC-V

Yu Pu

Page 2:

T-HEAD

Open Source, Building the Chip Ecosystem in the New AIoT Era T-HEAD

Chen Chen, Xiaoyan Xiang, Chang Liu, Yunhai Shang, Ren Guo, Dongqi Liu, Yimin Lu, Ziyi Hao, Jiahui Luo, Zhijian Chen, Chunqiang Li, Yu Pu, Jianyi Meng, Xiaolang Yan, Yuan Xie and Xiaoning Qi

Page 3:

T-HEAD

Infrastructure Provider of the AIoT Era

Building the Chip Infrastructure of the AIoT Era

AliOS

Domain Specific Architecture

RISC-V Compatible Processor

… Intelligent Computing Security Memory Control MCU

Domain Specific SoC Platforms (IPs from Partners)

Industry Control

Page 4:

T-HEAD

Xuantie - The Evolving Processor Architecture

902: First RISC-V Processor with HW TEE

910: Ultra High Performance Processor with AI Acceleration Engine

9xx: In progress

Page 5:

T-HEAD

Ultra High Performance Architecture - Xuantie910

• RISC-V RV64GCV
• Cluster Based Multi-core Architecture
• 1/2/4 Cores per Cluster
• 32KB/64KB L1 D$; 32KB/64KB L1 I$
• 64-bit, 12-stage, Out-of-Order
• 3-decode, 8-issue
• Dual Issue Out-of-Order Memory Access
• High Performance Hybrid Branch Processing
• Multi-mode Dynamic Data Prefetch
• Vector Engine for AI Acceleration
• AI, Edge servers, Industrial control, ADAS

[Block diagram labels: 910 Core, I-Cache, D-Cache, Vector Computing Unit, FPU, Coherence Interconnect Bus, L2 Cache, Master IF, PLIC, Timer, Debug Unit, Trace Unit]

Page 6:

T-HEAD

Remarkable Performance

[Bar chart: CoreMark/MHz, vertical axis 0 to 8; values paired with labels in the order extracted]

rocket: 2.32
NXF: 3.22
BOOM-2w: 3.91
BOOM-4w: 4.7
SweRV: 4.9
SCR7: 5
U74: 5.1
C910 (Xuantie910): 7.1

Chart annotation: 40%

Data source:
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2018/EECS-2018-151.pdf
https://content.riscv.org/wp-content/uploads/2019/04/RISC-V_SweRV_Roadshow-.pdf
https://content.riscv.org/wp-content/uploads/2019/06/17.00-syntacore_zurich_ws.pdf
https://www.sifive.com/cores/u74-mc
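The 40% annotation on this chart appears to compare the Xuantie910 bar with the next-highest bar. The following is a minimal Python sketch of that arithmetic. It assumes the extracted values pair with the core labels in the order they appear on the slide; the function name `lead_over_runner_up` is hypothetical and used only for illustration.

```python
# CoreMark/MHz values as read from the slide, paired with labels in order.
# The pairing is an assumption based on extraction order, not stated explicitly.
scores = {
    "rocket": 2.32,
    "NXF": 3.22,
    "BOOM-2w": 3.91,
    "BOOM-4w": 4.7,
    "SweRV": 4.9,
    "SCR7": 5.0,
    "U74": 5.1,
    "C910": 7.1,
}


def lead_over_runner_up(table: dict, name: str) -> float:
    """Return the fractional lead of `name` over the best other entry."""
    best_other = max(value for key, value in table.items() if key != name)
    return table[name] / best_other - 1.0


lead = lead_over_runner_up(scores, "C910")
print(f"C910 lead over the next-highest core: {lead:.1%}")  # about 39%
```

Under that pairing, 7.1 / 5.1 is roughly 1.39, which rounds to the 40% shown on the slide.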

Page 7:

T-HEAD

Compatible with RISC-V Specification

ISA: RV64GCV
Vector: RISC-V 0.7.1 Vector Extension; FP16/32/64, INT8/16/32/64
Privilege Mode: Machine + Supervisor + User
Memory Management: Sv39 MMU + 8/16 PMP
Interrupt Controller: CLINT + PLIC
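For readers unfamiliar with Sv39, the scheme named under Memory Management, the sketch below shows how a 39-bit virtual address splits into three 9-bit virtual page numbers and a 12-bit page offset, as defined by the RISC-V privileged specification. It illustrates the address format only and says nothing about how the Xuantie910 MMU is implemented; the function names are hypothetical.

```python
PAGE_OFFSET_BITS = 12  # 4 KiB base pages
VPN_BITS = 9           # 512 entries per page-table level
LEVELS = 3             # Sv39 uses a three-level page table


def is_canonical_sv39(va: int) -> bool:
    """Bits 63..39 must all equal bit 38 for a valid Sv39 address."""
    upper = va >> 38  # bits 63..38 (26 bits)
    return upper == 0 or upper == (1 << 26) - 1


def split_sv39(va: int):
    """Return (VPN[2], VPN[1], VPN[0], page_offset) for a 64-bit address."""
    if not 0 <= va < (1 << 64):
        raise ValueError("address must fit in 64 bits")
    if not is_canonical_sv39(va):
        raise ValueError("not a canonical Sv39 address")
    offset = va & ((1 << PAGE_OFFSET_BITS) - 1)
    vpn = [
        (va >> (PAGE_OFFSET_BITS + VPN_BITS * level)) & ((1 << VPN_BITS) - 1)
        for level in range(LEVELS)
    ]
    return vpn[2], vpn[1], vpn[0], offset


example = (5 << 30) | (3 << 21) | (7 << 12) | 0x123
print(split_sv39(example))  # (5, 3, 7, 291)
```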

Page 8:

T-HEAD

Extended Enhancement - RISC-V Turbo

RISC-V Turbo
• Computing
• Bit operation
• Memory access
• Multi-core synchronization
• Memory management
• Cache, TLB

[Bar chart: Native RV vs. XT910 on EEMBC, Nbench and OpenSSL; vertical axis 0 to 1.4; bar values not recoverable]

Page 9:

T-HEAD

Deep Superscalar Out-of-Order Pipeline

• Front-End
  • Fetch 8 Instructions/cycle
  • Decode 3 Instructions/cycle
  • Issue 8 Instructions/cycle
• Back-End
  • Out-of-Order Memory Access
  • Dedicated Branch Processing
  • Out-of-Order Vector Computing

[Pipeline diagram labels: IF, IP, IB, ID, IR, IS, RF (instruction fetch; instruction decode & issue); ALU/MUL pipe (ALU, MUL2, MUL3); ALU/DIV pipe (ALU); Branch pipe (BR); Load/Store pipe (AG, DC, DA); Vector pipe (VEC1, VEC2, VEC3, VEC4, shown twice); WB]

Page 10:

T-HEAD

Instruction Fetch Unit with Hybrid Prediction

• Hybrid Multi-mode Branch Prediction
  • Branch Direction Prediction
  • Branch Target Prediction
  • Return Address Prediction
  • Indirect Branch Prediction
• High-bandwidth Parallel Fetch
  • 128-bit Fetch
  • Up to 8 Instructions Packaged in Parallel
  • Instruction Cache Way Prediction
  • Loop Acceleration

[Block diagram labels: Bi-mode predictor (Branch Predictor), GHR, BTB (Branch Target Predictor), RAS (Return Address Predictor), Indirect Branch Predictor, Branch Address Bus]

Page 11:

T-HEAD

Dual Issue Out-of-Order Load Store Unit

• Out-of-Order Dual Issue
  • Load/store Address Pipeline
  • Independent Store Data Pipeline
  • Speculation Fail Prediction
• Load/store Fast Complete
  • 3 cycle Load-to-Use
  • 1 cycle Store Execution
• Powerful Prefetching Capabilities
  • Multi-mode and Multi-stream
  • Both Virtual and Physical Address Prefetching
  • Configurable Prefetch Capacity

[Block diagram labels: Issue Queue, AGU, LD, D-CACHE, ST AGU, Write Buffer, ST_DATA]

Page 12:

T-HEAD

Efficient Multi-core Interconnection

• Decoupled Processor Interface Units (PIUx)
• MOESI Coherence Protocol
• Directory-based Architecture
• Snoop Filter Supported
• Configurable L2 Cache, up to 8MB
• ECC Supported

[Block diagram labels: PIUx (four instances), Slave IF, L2 Cache, Master IF, Snoop filter, Snoop control]
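As background for the MOESI bullet, the sketch below encodes the five states and the textbook transitions a cache line takes when another core reads or writes it. It is a generic illustration of the protocol family, not a description of the Xuantie910 coherence logic, and the function names are hypothetical.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"   # only copy, dirty
    OWNED = "O"      # dirty, but other caches may hold shared copies
    EXCLUSIVE = "E"  # only copy, clean
    SHARED = "S"     # clean copy, others may exist
    INVALID = "I"    # no valid copy


# Next state of a line in this cache when another core reads it.
_REMOTE_READ = {
    State.MODIFIED: State.OWNED,     # keep the dirty data, now shared
    State.OWNED: State.OWNED,
    State.EXCLUSIVE: State.SHARED,
    State.SHARED: State.SHARED,
    State.INVALID: State.INVALID,
}


def on_remote_read(state: State) -> State:
    return _REMOTE_READ[state]


def on_remote_write(state: State) -> State:
    """Another core gains write permission, so every other copy is invalidated."""
    return State.INVALID


def on_local_write(state: State) -> State:
    """After a local write completes, this cache holds the only, dirty copy."""
    return State.MODIFIED


def is_dirty(state: State) -> bool:
    """Dirty lines must eventually be written back to the next level."""
    return state in (State.MODIFIED, State.OWNED)


print(on_remote_read(State.MODIFIED).value)  # O
```

The Owned state is what distinguishes MOESI from MESI: a dirty line can be shared with other cores without first being written back.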

Page 13:

T-HEAD

AI Optimized Vector Computing Engine

• Compatible with RISC-V 0.7.1 Vector Extension
• Supports FP16/32/64, INT8/16/32/64
• 256-Bit Operation Width, VL = 128 and 2 pipelines
• Two 128-Bit Vector ALU Ops per Cycle
• One 128-Bit Vector Load and One 128-Bit Vector Store per Cycle
• Direct Access to L1$ on Vector Load and Vector Store
• Dual-issue Out-of-Order Vector Execution Pipeline
• More than 300 GFLOPS of FP16 Computing Power per Cluster (32 FLOPS/core/cycle x 2.5 GHz x 4 Cores)
• Half of FP16 computing power when widening to FP32

[Block diagram labels: RAS (three instances), Vector Register Files, Vector Execution (eight instances)]
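The per-cluster FP16 peak quoted on this slide follows from the product given in parentheses. The sketch below reproduces that arithmetic. The breakdown of 32 FLOPS/core/cycle into pipelines, lanes and operations per lane is an assumption (it treats each FP16 lane as doing one fused multiply-add counted as two FLOPs); the slide states only the total.

```python
def peak_gflops(flops_per_core_per_cycle: float, freq_ghz: float, cores: int) -> float:
    """Peak throughput in GFLOPS for one cluster."""
    return flops_per_core_per_cycle * freq_ghz * cores


# Figures taken from the slide.
fp16_peak = peak_gflops(32, 2.5, 4)   # 320.0, i.e. "more than 300 GFLOPS"
fp32_peak = fp16_peak / 2             # 160.0, "half of FP16 computing power"

# Assumed breakdown of 32 FLOPS/core/cycle (not stated on the slide):
pipelines = 2                  # two 128-bit vector ALU ops per cycle
lanes = 128 // 16              # FP16 elements per 128-bit operation
flops_per_lane = 2             # one multiply-add counted as two FLOPs
assumed_flops_per_cycle = pipelines * lanes * flops_per_lane  # 32

print(fp16_peak, fp32_peak, assumed_flops_per_cycle)  # 320.0 160.0 32
```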

Page 14:

T-HEAD

Efficient Profiling Engine

Page 15:

T-HEAD

Experimental Results

[Bar charts: A73 vs. XT910; vertical axes 0 to 1.4; bar values not recoverable]

EEMBC: automotive, consumer, networking, telecom

nBench: ASSIGNMENT, BITFIELD, FOURIER, FP EMULATION, HUFFMAN, IDEA, LU DECOMPOSITION, NEURAL NET, NUMERIC SORT, STRING SORT

Page 16:

T-HEAD

FPGA DEMO

Page 17:

T-HEAD

ASIC Implementation

Process Technology: TSMC 12nm FinFET
Operating Frequency: 2.0 GHz (a) ~ 2.5 GHz (b)
Area per Core (excl. L2$): 0.6 mm² (without VEC); 0.8 mm² (with VEC)

a. LVT 6T-turbo STD cell, 0.8V VDD, TT 85°C
b. 30% ULVT STD cell, 1.0V VDD, TT 85°C

Page 18:

T-HEAD

Wujian SoC Platform with Xuantie

Enabling chip differentiation competition

Reducing Chip Design Time by 50%

Saving up to 50% on Design Cost

Page 19:

T-HEAD

Conclusion

• Ultra High Performance Superscalar Processor
• RISC-V Compatible plus RISC-V Turbo Technology
• Dual issue Out-of-Order Memory Subsystem
• AI Vector Acceleration Engine

