The Next Generation of GAP - RISC-V · 12.12.2019  · •First generation GAP processor •Based...

Date post: 18-Oct-2020
The Next Generation of GAP
An IoT Application Processor for Inference at the Very Edge

Presented by: Martin Croome, VP Marketing, GreenWaves Technologies

GreenWaves

• French fabless semiconductor startup
• Based in Grenoble, France
• Founded in November 2014
• Focused on designing and selling chips for AI and signal processing on battery operated IoT and wearable devices

Machine learning is becoming an attribute of all connected things

Sense, Interpret & Analyse, Act & communicate

IoT & wearable devices require signal processing and inference capabilities within a highly constrained power budget

Human…

Beyond human…
• High frequency vibration
• Radar
• Infrared
• Ultrasound

Machine learning is becoming an attribute of all connected things

• Security & Access Control
• Machine Health Monitoring
• Workspace & Energy Management
• Safety & Signalling Solutions
• Wearables & Hearables
• Smart Appliances
• Mini Robotics & Toys
• People Counting
• Object detection
• Vibration Analysis
• People Detection
• Activity Classification
• Face ID
• Gesture control
• Pose detection
• Object / People Detection
• Sound analysis
• Text Recognition
• Face Detection
• Keywords Detection
• Biosignal analysis
• Medical sensors

The fundamentals of GAP: Intelligence at the very edge

• MCU class energy consumption
  • Highly efficient parallelization
  • Sophisticated architecture (including instruction set architecture extensions)
  • Explicit memory movement
• Agility
  • Fine grained compute / energy scaling
  • Ultra fast state transitions
• Programmability
  • Applicable to many real world problems, not just CNNs
  • Exploits fast evolution of state-of-the-art
  • Single code model across architecture

Single chip solution for an intelligent sensor

[Diagram: Embedded Compute Cluster (Multi core, Shared memory, Shared instruction cache, Hardware synchronization, Custom accelerators); Ultra low power SoC Chassis (Controller Core, L2 Memory, Peripheral µDMA, Cluster DMA, Integrated PMU)]

GAP8

• First generation GAP processor
• Based on 5 years of research in PULP program
• TSMC 55nm process
• 9 Extended ISA RISC-V RV32IMC cores
  • 8 core cluster
  • 1 core ‘fabric controller’
• HDKs since May 2018
• Production qualified
• First shipping products Q1 2020

GAP8 has already achieved industry leading performance

• QVGA Face ID
  • Face detection: ~25ms, ~1mW / frame / second
  • Face reidentification: 400ms, 22mW / frame / second
  • 93% accuracy on the Labeled Faces in the Wild dataset
  • Embeddable owner detection on battery operated devices
• IR people detection
  • 80 x 80 IR image (LynRED ThermEye)
  • Image preprocessing + human detection
  • 62ms, ~4.4mW / frame / second
  • 99% accuracy on internally collected training set
  • A full solution for people counting / occupancy detection on a battery for > 5 years
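One way to read the "mW / frame / second" figures above is as average power per unit of frame rate, which is dimensionally the same as energy per frame (1 mW per frame/s = 1 mJ per frame). The Python sketch below works under that reading and additionally assumes that idle power between frames is negligible. The function names are illustrative and not part of any GAP tooling.

```python
# Figures copied from the slide: (latency in ms, mW per frame-per-second).
WORKLOADS = {
    "face detection": (25.0, 1.0),
    "face reidentification": (400.0, 22.0),
    "IR people detection": (62.0, 4.4),
}


def average_power_mw(mw_per_fps, frames_per_second):
    """Average power at a given frame rate, ignoring idle power (assumption)."""
    return mw_per_fps * frames_per_second


def duty_cycle(latency_ms, frames_per_second):
    """Fraction of time spent computing; it must stay <= 1 to be feasible."""
    return latency_ms / 1000.0 * frames_per_second


for name, (latency_ms, mw_per_fps) in WORKLOADS.items():
    for fps in (1.0, 0.1):
        print(f"{name:22s} {fps:4.1f} fps: "
              f"{average_power_mw(mw_per_fps, fps):6.2f} mW average, "
              f"{duty_cycle(latency_ms, fps):6.1%} busy")
```

Under these assumptions average power scales linearly with frame rate, so running a detector once every ten seconds instead of once per second cuts its average power by a factor of ten.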

Introducing GAP9

• GAP8
  • Combined market leading architecture…
  • …with mature semiconductor process TSMC 55nm LP
• GAP9
  • Tunes GAP8 architecture with experience gained from GAP8
  • Exploits market leading semiconductor process GlobalFoundries 22nm FDX
• GAP9 establishes a new capability / power consumption point in the industry
  • 10 times larger problems than GAP8
  • 5 times less power than GAP8
  • Increases agility and programmability

GAP9: Examples of architectural evolutions

• Increased capability
  • Larger problems
    • 1.6MB internal RAM
    • Peak cluster L1 bandwidth of 41.6 GB/s
    • Peak L2 bandwidth of 7.2 GB/s
    • Hardware compression
  • More compute states
    • 400MHz cluster top frequency
    • New power states
• More flexibility
  • 32 / 16 / 8-bit floating point support
  • New bi-directional multi-channel digital audio interfaces
  • Additional CSI2 camera interface
• Increased security
  • HW AES 256/128 bit
  • HW Physical Unclonable Function (PUF)

[Diagram: Embedded Compute Cluster (Multi core, Shared memory, Shared instruction cache, Hardware synchronization, Custom accelerators); Ultra low power SoC Chassis (Controller Core, L2 Memory, Peripheral µDMA, Cluster DMA, Integrated PMU)]

GAP9 vs. Arm M7 on MobileNet v1

Target    Clock (MHz)  Time (ms)  Cycles (M)  FPS   Active Power (mW)  Image Size  Channel Scaling  Top-1 ImageNet Accuracy
STM32 H7  400          162.5      65          6.2   170                160x160     0.25             43%
GAP9      29           162.5      4.77        6.2   5                  160x160     0.25             43%
GAP9      400          11.925     4.77        83.9  50                 160x160     0.25             43%
GAP9      400          167.5      67          6.0   50                 192x192     1                70%

Slide callouts: 34 x less; 14 x more; More accuracy

STM H7 figures: Running MobileNet on STM32 MCUs at the edge, Manuele Rucci
Accuracy estimates from TensorFlow model library
ImageNet performance in 1000 image classes
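The columns of the comparison are linked by simple arithmetic: cycles = clock × time, FPS = 1 / time, and energy per inference = active power × time. The Python sketch below recomputes those quantities from the clock, time and power values on the slide. The energy-per-inference column is derived here and does not appear on the slide, and the `derive` helper is an illustrative name.

```python
# Rows copied from the slide: (target, clock in MHz, time in ms, active power in mW).
ROWS = [
    ("STM32 H7, 160x160, 0.25", 400, 162.5, 170),
    ("GAP9, 160x160, 0.25", 29, 162.5, 5),
    ("GAP9, 160x160, 0.25", 400, 11.925, 50),
    ("GAP9, 192x192, 1", 400, 167.5, 50),
]


def derive(clock_mhz, time_ms, power_mw):
    """Return (Mcycles, frames per second, mJ per inference)."""
    cycles_m = clock_mhz * time_ms / 1000.0  # MHz * ms = thousands of cycles
    fps = 1000.0 / time_ms
    energy_mj = power_mw * time_ms / 1000.0  # mW * ms = microjoules
    return cycles_m, fps, energy_mj


for name, clock_mhz, time_ms, power_mw in ROWS:
    cycles_m, fps, energy_mj = derive(clock_mhz, time_ms, power_mw)
    print(f"{name:24s} {cycles_m:6.2f} Mcycles  {fps:5.1f} FPS  "
          f"{energy_mj:7.3f} mJ/inference")

# Ratios that appear to correspond to the slide callouts.
print("power ratio at equal FPS:", 170 / 5)
print("speed ratio at equal clock:", round(162.5 / 11.925, 1))
```

The recomputed values match the slide for three of the four rows. For the 29 MHz row the product is about 4.71 Mcycles rather than 4.77, which suggests the clock figure was rounded (4.77 Mcycles in 162.5 ms corresponds to roughly 29.35 MHz).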

But architecture is only 50% of the story: tools are the rest

• What are customers expecting? Different things:
  • Expecting a packaged solution, OR
  • Expecting a known network, OR
  • Expecting to revolutionize the world
• Each of these customers requires a different tool set:
  • Solutions
  • Pre-packaged fine-tuneable networks
  • Expert tools to exploit architecture

GAPFlow

• A series of build system agnostic, modular tools that convert Graphs to GAP code
• Build examples based on Makefiles but usable with any build system
• NN focus but by no means limited to NN
• Use one, use all, use none
• Extendable
• Clear points of failure
• Enhanced with example networks and full applications that use it

[Diagram: GAP Cluster Code; NNTool (Graph conversion, Post-training quantization); Model Library; Model Fine Tuning; Training aware quantization; Reference Platforms; AutoTiler, Graph Description, Generators, Kernel, Model]

GAP AutoTiler: Explicit memory movement

[Diagram: External L3 (RAM/Flash), µDMA, L2, DMA, Shared L1, cores 1 to N; pipelined stages: L3 to L2, L2 to L1, Exec; AutoTiler, Graph Description, Generators, Kernel, Model]

• Data caches are not good for streamed data: low cache efficiency
• ML / Signal Processing data traffic sizing is known at compile time
• Generate code for automatic data tiling and pipelined memory transfer interleaved with parallel call to compute kernel
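The tiling and pipelining idea on this slide can be illustrated with a small double-buffering sketch. This is not AutoTiler output or the GAP SDK API: it is a plain Python model in which a list stands in for L2, two small buffers stand in for shared L1, and the names `tile_ranges` and `run_tiled` are hypothetical. On real hardware the "load" of the next tile would be an asynchronous DMA transfer running while the cluster cores execute the kernel on the current tile; here the two are only recorded together in a trace.

```python
def tile_ranges(length, tile_size):
    """Split range(length) into consecutive [start, stop) tiles."""
    return [(start, min(start + tile_size, length))
            for start in range(0, length, tile_size)]


def run_tiled(data, tile_size, kernel):
    """Apply kernel tile by tile, alternating between two 'L1' buffers."""
    tiles = tile_ranges(len(data), tile_size)
    buffers = [None, None]  # one buffer is computed on while the other is filled
    out, trace = [], []
    if not tiles:
        return out, trace

    # Prologue: fetch the first tile before any compute can start.
    start, stop = tiles[0]
    buffers[0] = data[start:stop]
    trace.append("load 0")

    for i in range(len(tiles)):
        step = []
        if i + 1 < len(tiles):
            # Prefetch the next tile into the buffer that is not in use.
            start, stop = tiles[i + 1]
            buffers[(i + 1) % 2] = data[start:stop]
            step.append(f"load {i + 1}")
        out.extend(kernel(buffers[i % 2]))
        step.append(f"exec {i}")
        trace.append(" | ".join(step))
    return out, trace


result, trace = run_tiled(list(range(10)), 4, lambda tile: [2 * x for x in tile])
print(result)
print(trace)
```

Because the tile sizes and transfer order are fixed before the loop runs, the whole schedule is known ahead of time, which is the property the slide relies on when it says data traffic sizing is known at compile time.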

GAP family is enabling groundbreaking applications at the very edge

Real … Now …

• GAP9 development boards available in early 2020 for lead customers
• GAP9 simulator has been in customer hands since May 2019
• GAP8 shipping now in production
• GAP8 development boards and evaluation boards for vision and IR vision shipping now
• GAP SDK available on our GitHub repository
• Come and see our demonstrations on the Open HW Group booth

Thank you!

Questions?

