
A Tiny RISC-V Floating-Point Unit - PULP platform

Page 1: A Tiny RISC-V Floating-Point Unit - PULP platform

Information Classification: General

December 8-10 | Virtual Event

A Tiny RISC-V Floating-Point Unit

Luca Bertaccini

PhD Student

ETH Zurich

#RISCVSUMMIT


Luca Bertaccini

PhD Student at the Digital Circuits and Systems Group (ETH Zurich), part of the PULP team.

Internet of Things

▪ Billions of devices gathering and sending data to servers

▪ Processing on the edge to save bandwidth and energy

▪ More and more processing is done on the edge

▪ Many existing algorithms require FP arithmetic


Floating-Point Supports

FPUs introduce large area overhead

▪ ~20kGE for single-precision

▪ ~50kGE for double-precision

SW emulation libraries introduce large code size overhead

▪ ~5kB for single-precision support

▪ ~15kB for double- and single-precision

▪ More system memory required

Area-optimized cores occupy 10-20kGE
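
To put these numbers side by side, the short sketch below computes how large a full FPU is relative to an area-optimized core, using only the approximate figures quoted above. The function name and the pairing of low-end and high-end core sizes are illustrative assumptions.

```python
# Approximate figures quoted on the slide (kGE = kilo gate equivalents).
FPU_AREA_KGE = {"single": 20, "double": 50}
CORE_AREA_KGE = (10, 20)  # area-optimized core, low and high end


def relative_overhead(fpu_kge, core_kge):
    """Return the FPU area as a fraction of the core area."""
    return fpu_kge / core_kge


for precision, fpu in FPU_AREA_KGE.items():
    # Smallest ratio uses the largest core, largest ratio the smallest core.
    low = relative_overhead(fpu, max(CORE_AREA_KGE))
    high = relative_overhead(fpu, min(CORE_AREA_KGE))
    print(f"{precision}-precision FPU: {low:.0%} to {high:.0%} of the core area")
```

With these inputs, even a single-precision FPU is at least as large as the core it is attached to.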


Why a Small FPU?

Large area overhead for full-fledged FPU

Large code size overhead for SW emulation

Not affordable for low-cost MCUs

Need for a tiny FPU


Snitch

▪ Snitch as host system

▪ Snitch is an RV32IMAFD core

▪ Snitch has been designed for high performance

▪ Snitch includes an open-source multi-format RISC-V FPU optimized for high performance and energy efficiency (FPnew*)

▪ The integer Snitch core is optimized for area

▪ Why not couple a single-core Snitch with a tiny FPU?

*https://github.com/pulp-platform/fpnew/


From Fast-FPU to Tiny-FPU

Snitch’s FPU (Fast-FPU):

▪ Modular (ADDMUL, COMP, CONV) and multi-format

▪ High-performance and energy-efficient FPU

▪ Large area and fully combinatorial

▪ ADDMUL is the largest module


From Fast-FPU to Tiny-FPU

ADDMUL is the largest module:

▪ Multiple large adders

▪ Multiple large shifters

▪ One large multiplier

How can we optimize Snitch’s FPU for area?

[Figure: block diagram of the ADDMUL datapath, showing a multiplier, adders/subtractors, shifters, and a leading-zero counter (LZC)]

Tiny-FPU - Trading area for latency

▪ Two versions:

▪ Double-precision Tiny-FPU

(with support for single-precision)

▪ Single-precision Tiny-FPU

▪ Iterative, multi-cycle execution

▪ Reuse datapath resources in a time-multiplexed fashion

▪ Maximize internal register utilization


Tiny-FPU

▪ (p × p) → (p × 2)

▪ 3p + 4 → (3p + 4)/3 + 1

▪ Small components (not optimized)

▪ Overhead for time-multiplexing

▪ Comparisons and casts did not require additional arithmetic components
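
The p-by-p to p-by-2 annotation above can be read as replacing a full mantissa multiplier with a narrow one that is reused over several cycles. The sketch below illustrates that general idea in Python under this reading. The function name, the two-bits-per-cycle schedule, and the cycle count are assumptions for illustration, not a description of the Tiny-FPU RTL.

```python
def iterative_multiply(a, b, p, bits_per_cycle=2):
    """Multiply two p-bit unsigned mantissas with a narrow multiplier.

    Each loop iteration models one clock cycle: a (p x bits_per_cycle)
    multiplier produces a partial product, which is shifted into place
    and accumulated. Returns (product, cycles).
    """
    assert 0 <= a < (1 << p) and 0 <= b < (1 << p)
    mask = (1 << bits_per_cycle) - 1
    acc = 0
    cycles = 0
    for shift in range(0, p, bits_per_cycle):
        digit = (b >> shift) & mask      # next slice of the multiplier operand
        acc += (a * digit) << shift      # narrow multiply, shift, accumulate
        cycles += 1
    return acc, cycles


# IEEE 754 precision: p = 24 for FP32, p = 53 for FP64 (hidden bit included).
product, cycles = iterative_multiply(0xABCDEF, 0x123456, p=24)
assert product == 0xABCDEF * 0x123456
print(cycles)  # 12 iterations for p = 24 with 2 bits per cycle
```

The same adder and the same narrow multiplier are used in every iteration, which is the essence of trading latency for area.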


Tiny-FPU - Performance

▪ fdiv/fsqrt are not supported in hardware (emulated in SW; GCC option -mno-fdiv)

▪ When emulating FP in SW (libgcc functions), even fadd and fmul can take hundreds of cycles

Latency [cycles]

                          fmadd   fadd    fsub    fmul   comparison   cast
FP32                      21-24   10-13   10-13   18     2            9
FP64                      36-39   10-13   10-13   33     2            9
FP32 (on FP64 datapath)   22-25   10-13   10-13   19     2            9
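
As a rough illustration of how the latency table can be used, the sketch below encodes it as a lookup and estimates the FP cycles of a short dot product. It assumes back-to-back, non-overlapped execution and ignores loads, stores, and loop overhead. The names and the dot-product workload are hypothetical.

```python
# (min, max) latency in cycles, copied from the latency table.
LATENCY = {
    "FP32": {"fmadd": (21, 24), "fadd": (10, 13), "fsub": (10, 13),
             "fmul": (18, 18), "comparison": (2, 2), "cast": (9, 9)},
    "FP64": {"fmadd": (36, 39), "fadd": (10, 13), "fsub": (10, 13),
             "fmul": (33, 33), "comparison": (2, 2), "cast": (9, 9)},
    "FP32 on FP64": {"fmadd": (22, 25), "fadd": (10, 13), "fsub": (10, 13),
                     "fmul": (19, 19), "comparison": (2, 2), "cast": (9, 9)},
}


def fp_cycles(config, op_counts):
    """Sum best- and worst-case FP cycles for a mix of operations."""
    best = sum(LATENCY[config][op][0] * n for op, n in op_counts.items())
    worst = sum(LATENCY[config][op][1] * n for op, n in op_counts.items())
    return best, worst


# Hypothetical workload: an 8-element dot product.
n = 8
fused = fp_cycles("FP32", {"fmadd": n})
split = fp_cycles("FP32", {"fmul": n, "fadd": n})
print("fmadd only:", fused)   # (168, 192)
print("fmul + fadd:", split)  # (224, 248)
```

Under these assumptions the fused operation is cheaper than a separate multiply and add, which matches the table: 21-24 cycles versus 18 plus 10-13.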


Five Snitch Implementations

To evaluate our Tiny-FPU, we considered five Snitch implementations

• Snitch-int: Snitch + libgcc

• Snitch-tiny64: Snitch + FP64 Tiny-FPU

• Snitch-fast64: Snitch + FP64 Fast-FPU

• Snitch-tiny32: Snitch + FP32 Tiny-FPU

• Snitch-fast32: Snitch + FP32 Fast-FPU


Benchmarks

Synthetic Benchmarks:

▪ Two matrix multiplications (one integer, one FP)

▪ Tuning FP intensity (%FP from 0.07% to 53%)

Real Benchmarks:

▪ fann (%FP = 21%)

▪ conv2d (%FP = 20%)

▪ knn (%FP = 7%)

▪ fixed-point fann (%FP = 0%)

Results - Area

• GF22FDX @100MHz

• Tiny-FPU is 53% (DP) and 37% (SP) smaller than Fast-FPU

• RISC-V defines a separate register file (RF) for FP instructions

• The RF accounts for around 70% of the area overhead of supporting the D- and F-extensions that is not dedicated to the FPU, and it is larger than the Tiny-FPU


Tiny-FPU Reduces Code Size

[Figure: bar chart of code size overhead in kB (axis 0 to 16) for Snitch-int, Snitch-tiny, and Snitch-fast, grouped by RV32IMAFD and RV32IMAF]

• Code size overhead for a full SW emulation library

• The FPUs need just the functions to emulate fdiv and fsqrt on the integer datapath

• Code size overhead up to 80% smaller when implementing Tiny-FPU

Chart annotations: -79% and -80%

Results - Performance

High %FP

▪ Snitch-tiny up to 18.5x (DP) and 15.5x (SP) faster than Snitch-int

Low %FP (<5%)

▪ Snitch-tiny only 1.33x (DP) and 1.18x (SP) slower than Snitch-fast, while being 5x (DP) and 3x (SP) faster than Snitch-int

Results - Power

%FP > 16%

▪ Steep increase in Snitch-fast power consumption due to heavier utilization of system resources

High %FP

▪ Snitch-tiny consumes up to 47% (DP) and 33% (SP) less power than Snitch-fast

Results - Energy Efficiency

High %FP

▪ Snitch-tiny is not as energy-efficient as Snitch-fast due to the multi-cycle execution

▪ Snitch-tiny is up to 8x more energy-efficient than Snitch-int
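
The gap between lower power and lower energy follows from energy = power × time. The sketch below uses only the "up to 47% less power" figure quoted for DP and shows how large a slowdown a lower-power design can tolerate before it uses more energy than the faster one. The function names are illustrative, and the slowdown values are hypothetical inputs rather than measured results.

```python
def relative_energy(power_ratio, slowdown):
    """Energy of design A relative to design B on the same workload.

    power_ratio is P_A / P_B and slowdown is T_A / T_B.
    """
    return power_ratio * slowdown


def break_even_slowdown(power_ratio):
    """Slowdown at which both designs consume the same energy."""
    return 1.0 / power_ratio


# "Up to 47% less power" (DP) corresponds to a power ratio of about 0.53.
power_ratio = 1.0 - 0.47
print(f"break-even slowdown: {break_even_slowdown(power_ratio):.2f}x")

# Hypothetical slowdowns, for illustration only.
for slowdown in (1.5, 2.0, 3.0):
    ratio = relative_energy(power_ratio, slowdown)
    verdict = "less" if ratio < 1.0 else "more"
    print(f"slowdown {slowdown}x -> {verdict} energy than the faster design")
```

Any slowdown beyond the break-even point makes the multi-cycle design less energy-efficient despite its lower power draw.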


Further Optimization

RISC-V defines a separate register file for FP instructions

Zfinx proposes to use just one register file when FP and INT share the same word size

With Zfinx, Snitch-tiny32 would be just 1.7x larger than Snitch-int

Conclusions

▪ Tiny-FPU is a new area-optimized RISC-V FPU:

▪ 53% (DP) and 37% (SP) smaller than a high-performance and energy-efficient FPU

▪ We evaluated the costs and performance of five different floating-point support configurations, keeping Snitch as the host system

▪ Snitch coupled with Tiny-FPU is:

▪ up to 18.5x (DP) and 15.5x (SP) faster than Snitch employing SW emulation

▪ up to 8x more energy-efficient than Snitch employing SW emulation

▪ up to 47% (DP) and 33% (SP) lower in power consumption than Snitch coupled with Fast-FPU

▪ Future work: Zfinx version of Snitch-tiny32 to achieve the lowest area overhead to support FP in HW.


Open Source

Joining our open-source repositories soon!


https://pulp-platform.org

https://github.com/pulp-platform


Acknowledgement

ETH:

▪ Matteo Perotti

▪ Stefan Mach

▪ Pasquale Davide Schiavone

▪ Florian Zaruba

▪ Luca Benini

HUAWEI:

▪ Tariq Kurd

▪ Mark Hill

▪ Lukas Cavigelli

Thank you for your attention!

#RISCVSUMMIT @risc_v

