AI and SDR - GNU Radio...Source: Manuel Uhm, Software-Defined Radio: To Infinity and Beyond,...

© Copyright 2019 Xilinx

Manuel Uhm

Director, Silicon Marketing

Chair of the Board, Wireless Innovation Forum (SDR Forum v2.0)

Jason Vidmar

Sr. System Architect – MILCOM / SATCOM / Machine Learning

AI and SDR: Software Meets Hardware Again…


SDR Evolution

Figure 1: How successive generations of SDRs have come to dominate the radio industry and will continue to evolve. Source: Manuel Uhm, Software-Defined Radio: To Infinity and Beyond, Military Embedded Systems, October 2016

Key semiconductor technology drivers:

• Moore’s Law

• FPGAs

• RFICs

• Analog/Digital Integration


AI Evolution


Source: Verhaert, 2019 Perspective on Artificial Intelligence Evolution

Key semiconductor technology drivers:

• Moore’s Law

• GPUs

• FPGAs

• ASICs


SDR & AI Payload Convergence

• Cognitive Radar

• Cognitive SIGINT

• Cognitive EW

• Cognitive Radio

Multi-Mission Situationally Aware Payload: Enabled by SDR and AI Technology


End of the Line for Processor Performance?


• Moore’s Law: end of “PPA” (power, performance, area) improvement

• Amdahl’s Law: multicore hits its limit

• End of Dennard Scaling: power density rises

Moving Forward: Domain-Specific Architectures (DSAs)

Source: John Hennessy and David Patterson, Computer Architecture: A Quantitative Approach, 6/e, 2018
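Amdahl’s Law explains why adding cores stops paying off. A minimal sketch of the standard formula, speedup = 1 / ((1 - p) + p / n), where p is the fraction of the work that parallelizes and n is the core count; the 95% value in the usage lines is an illustrative assumption, not a figure from the slide.

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Overall speedup when a fraction p of the work runs in parallel on n cores."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    # The serial part (1 - p) is unchanged; only the parallel part shrinks by n.
    return 1.0 / ((1.0 - p) + p / n)


if __name__ == "__main__":
    p = 0.95  # illustrative assumption: 95% of the workload parallelizes
    for n in (1, 4, 16, 64, 1024):
        print(f"{n:5d} cores -> {amdahl_speedup(p, n):5.2f}x")
    # No core count can exceed 1 / (1 - p), i.e. 20x for p = 0.95.
```

With p = 0.95, even 1024 cores give only about 19.6x, which is the “multicore hits its limit” effect the slide refers to.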

Figure: 40 Years of Processor Performance (performance vs. VAX11-780, log scale, 1980 to 2015)

• CISC: 2x / 3.5 yrs (22%/yr)

• RISC: 2x / 1.5 yrs (52%/yr)

• End of Dennard Scaling, Multicore: 2x / 3.5 yrs (23%/yr)

• Amdahl’s Law: 2x / 6 yrs (12%/yr)

• End of the line?: 2x / 20 yrs (3%/yr)
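Each era on the chart is labelled both with a doubling time and an annual growth rate. This is a small sketch of the compound-growth conversion between the two, rate = 2^(1/T) - 1 and T = ln 2 / ln(1 + rate). The era table simply copies the chart labels; nothing else is assumed.

```python
import math


def annual_rate_from_doubling(years: float) -> float:
    """Compound annual growth rate implied by doubling every `years` years."""
    return 2.0 ** (1.0 / years) - 1.0


def doubling_time_from_rate(rate: float) -> float:
    """Years needed to double at compound annual growth `rate` (0.22 means 22%)."""
    return math.log(2.0) / math.log1p(rate)


# (era, doubling time in years, annual rate) as labelled on the chart
ERAS = [
    ("CISC", 3.5, 0.22),
    ("RISC", 1.5, 0.52),
    ("Multicore", 3.5, 0.23),
    ("Amdahl's Law", 6.0, 0.12),
    ("End of the line?", 20.0, 0.03),
]

if __name__ == "__main__":
    for era, years, rate in ERAS:
        print(f"{era:18s} {years:4.1f} yrs -> {annual_rate_from_doubling(years):6.1%}/yr; "
              f"{rate:4.0%}/yr -> {doubling_time_from_rate(rate):4.1f} yrs")
```

The chart labels are rounded, so the two directions agree only approximately: 22%/yr does correspond to about 3.5 years, but 52%/yr corresponds to roughly 1.7 years and 3%/yr to about 23 years.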


Why ACAP?

Figure: Evolving Processor Landscape (performance/power efficiency vs. number of applications), comparing FPGAs, ASICs, ASSPs, General Purpose Processors, and ACAPs (Domain Specific Architecture)


The Adaptive Compute Acceleration Platform

• Diverse Workloads in Milliseconds

• Future-Proof for New Algorithms

• Scalar Engines, Adaptable Engines, Intelligent Engines

• Enabling Data Scientists, SW Developers, HW Developers

• Development Tools, HW/SW Libraries, Run-time Stack

• SW Programmable Silicon Infrastructure

• Multi-core Processing System, Programmable Logic, DSP (Vector-based & Fabric-based)


Hardware Adaptable: Accelerating the Whole Application

Block diagram: a NETWORK-ON-CHIP connecting the AI Engines, Arm Dual-Core Cortex-R5F, Arm Dual-Core Cortex-A72, programmable logic, and I/O, with TB/s of bandwidth PL-to-AI Engine

• Scalar Engines: Scalar, Sequential & App Processing

• Adaptable Engines: Flexible Parallel Compute, Data Manipulation

• Intelligent Engines: ML & Signal Processing; Vector, Compute Intensive; 128 GB/s of Memory B/W per Core

• Any-to-Any Connectivity

• Custom Memory Hierarchy

• Delivering Deterministic Performance & Low Latency

• Robust Device & Run-time Security

Heterogeneous Processing For Tactical Edge Systems (Example Applications): Adaptive Beamforming, AJ (anti-jam), Tactical Networking, SAR Backprojection, Spectrum Processing, Machine Learning. Applications are combined into Domain Specific Architectures (DSAs).


Versal ACAP: A Platform for Software and Hardware Developers

• User Application: C, C++, Python

• Frameworks

• Software Platform: Runtime, Scout, OS Drivers

• Hardware Platform: Vivado, IP Libraries

• Versal ACAP Device & Integrated Shell

• Evaluation & Deployment Boards

Fully Software Programmable with Hardware Design Path


Possible Platform Example: Multi-Mission Situationally Aware UAV Payload with Versal ACAP

• UAV Platform

• Multi-Mission Applications: Comms, Radar, SIGINT, EW

• Frameworks and ML Overlay

• Libraries: xfOpenCV, DSPlib

• Xilinx Runtime (XRT)

• VERSAL ACAP: Scalar Engines, AI Engines, Adaptable Engines

• Versal ACAP Eval Board


Versal ACAP Roadmap

• HBM: Memory Integration

• AI Edge: Lowest power AI

• Premium: 112G SerDes, 600G Cores

• AI Core: AI Inference Throughput

• Prime: Broadest Application

• AI RF: AI with Integrated RF


Advanced SDR: Technologies and Challenges


Trends in SDR Pushing the Compute Boundary

• [CAPACITY] 5G: 100X complexity¹ vs. 4G (5G capability improvement factors over 4G shown as 10X, 20X, 10X, 100X, 100X, and 3X)

• [RESILIENCY] Operations in Contested Spectrum

• [AUTONOMY] Rise of Deep Learning (Dawn of Next Wave of AI): 300,000X growth in training compute from AlexNet to AlphaGo Zero

¹ Source: ETRI RWS-150029, 5G Vision and Enabling Technologies, Dec. 2015 (http://www.3gpp.org/ftp/tsg_ran/TSG_RAN/TSGR_70/Docs)

Source: “AI and Compute,” OpenAI, May 2018 (https://openai.com/blog/ai-and-compute/)


Enabling Technologies

˃ Direct-RF / High-IF Sampling Data Converters

˃ Array Antennas

˃ Compute Optimizations for Deep Learning

Array Antennas:

• Controlled Reception Pattern Array (CRPA) beam patterns (source: gpsworld.com, https://www.gpsworld.com/anti-jam-technology-demystifying-the-crpa/)

• mMIMO Spatial Multiplexing and Beamforming (5G) (Matheus, 2016)

Deep Learning Classification:

• Image Input → Classification Result: “Cat”, “Dog”, “Bird”…

• Non-image Input (RF) → Classification Result: “QPSK”, “BPSK”, “8PSK”…

Animation credit: Philip Leong, Univ. of Sydney, presentation (http://phwl.org/wp-content/uploads/2018/10/lstmslides-milcom18.pdf)


Advanced SDR: Compute Comparisons

Space Time Adaptive Processing. Application Example: Beamforming/Nulling (Comms / Anti-Jam)

• Adaptive weights: $W = R_{xx}^{-1} b$, where $R_{xx}$ is the covariance matrix and $b$ is the steering vector

• Covariance Matrix Decomposition: QR, Cholesky, etc.

• Complex-valued

• Higher Precision Desirable (e.g., SPFP32)

• Typical FLOPS: up to MFLOPS per Decomposition

References: “Implementing a Real-Time Beamformer on an FPGA Platform,” Xilinx, Xcell Journal 60. See also: Xilinx WP452, “Adaptive Beamforming for Radar: Floating-Point QRD+WBS in an FPGA” (https://www.xilinx.com/support/documentation/white_papers/wp452-adaptive-beamforming.pdf)

Deep Learning Inference (Conv. Nets). Application: Modulation Recognition, Waveform Classification

• Convolutional Layer Processing: $Y = X * K$ (input $X$ convolved with kernel $K$ gives output $Y$)

• Real-valued

• Lower Precision Desirable (e.g., INT8)

• Typical OPS: 7.6 GOPS (Resnet-50 unpruned)

Figure: Resnet-50 visualization (Kaggle.com). References: “Applied Deep Learning - Part 4: Convolutional Neural Networks,” Towards Data Science (blog) (https://towardsdatascience.com/applied-deep-learning-part-4-convolutional-neural-networks-584bc134c1e2)
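To make the beamforming column concrete, here is a minimal pure-Python sketch of computing W = Rxx^-1 b by Cholesky factorization (one of the decompositions listed above) followed by two triangular solves, rather than forming an explicit inverse. The random snapshot data, the diagonal-loading value, and the all-ones steering vector are illustrative assumptions; a real STAP or anti-jam chain would use measured array samples and, as the slide notes, higher-precision hardware arithmetic.

```python
import math
import random


def sample_covariance(snapshots, loading=0.0):
    """Rxx = (1/N) * sum of x x^H over N snapshots, plus optional diagonal loading."""
    n, count = len(snapshots[0]), len(snapshots)
    R = [[0j] * n for _ in range(n)]
    for x in snapshots:
        for i in range(n):
            for j in range(n):
                R[i][j] += x[i] * x[j].conjugate() / count
    for i in range(n):
        R[i][i] += loading
    return R


def cholesky(R):
    """Lower-triangular L with R = L L^H for a Hermitian positive-definite R."""
    n = len(R)
    L = [[0j] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = R[i][j] - sum(L[i][k] * L[j][k].conjugate() for k in range(j))
            if i == j:
                if s.real <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][i] = complex(math.sqrt(s.real), 0.0)
            else:
                L[i][j] = s / L[j][j]
    return L


def adaptive_weights(R, b):
    """Solve R w = b (i.e. w = R^-1 b) using the Cholesky factor, no explicit inverse."""
    L = cholesky(R)
    n = len(b)
    z = [0j] * n
    for i in range(n):  # forward substitution: L z = b
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    w = [0j] * n
    for i in reversed(range(n)):  # back substitution: L^H w = z
        w[i] = (z[i] - sum(L[k][i].conjugate() * w[k] for k in range(i + 1, n))) / L[i][i]
    return w


if __name__ == "__main__":
    random.seed(0)
    n_elements, n_snapshots = 4, 64
    snaps = [[complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n_elements)]
             for _ in range(n_snapshots)]
    R = sample_covariance(snaps, loading=0.1)  # loading value is an assumption
    b = [1 + 0j] * n_elements  # hypothetical broadside steering vector
    print(adaptive_weights(R, b))
```

Everything here is complex-valued, which is why this side of the comparison favours floating-point precision, in contrast to the INT8 convolution workload.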


AI Engine: Multi-Precision Math Support

Figure: MACs / Cycle (per core) for each operand precision

• Real data types: 32x32 SPFP, 32x32, 32x16, 16x16, 16x8, 8x8 (up to 128 MACs / cycle for 8x8)

• Complex data types: 32x32, 32x16, 16x16, 16 complex x 16 real

Optimized For:

• Linear Algebra: Matrix-Matrix Mult, Matrix-Vector Mult

• Convolution: FIR Filters, 2-D Filters

• Transforms: FFTs/IFFTs, DCT, etc.
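The 8x8 real mode is what makes low-precision FIR filters and convolutions cheap: 8-bit samples times 8-bit coefficients, summed in a wider accumulator and then scaled back down. The sketch below models that arithmetic in Python. The int32 accumulator width and the round-half-up shift followed by saturation are illustrative assumptions, not a description of the AI Engine’s actual accumulator registers or intrinsics.

```python
INT8_MIN, INT8_MAX = -128, 127
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def fir_int8(x, h):
    """Causal FIR y[n] = sum_k h[k] * x[n - k] with int8 inputs, checked against int32."""
    for v in (*x, *h):
        if not INT8_MIN <= v <= INT8_MAX:
            raise ValueError(f"{v} does not fit in int8")
    y = []
    for n in range(len(x)):
        acc = 0
        for k, coeff in enumerate(h):
            if n - k >= 0:  # samples before x[0] are treated as zero
                acc += coeff * x[n - k]  # one 8x8 multiply-accumulate
        if not INT32_MIN <= acc <= INT32_MAX:
            raise OverflowError("accumulator overflowed int32")
        y.append(acc)
    return y


def shift_round_saturate(acc, shift):
    """Bring an accumulator back to int8: right shift with rounding, then clamp."""
    if shift > 0:
        acc = (acc + (1 << (shift - 1))) >> shift  # round half toward +infinity
    return max(INT8_MIN, min(INT8_MAX, acc))


if __name__ == "__main__":
    acc = fir_int8([10, 20, 30, 40], [1, 2, 1])
    print(acc)                                        # [10, 40, 80, 120]
    print([shift_round_saturate(a, 1) for a in acc])  # [5, 20, 40, 60]
```

Keeping the accumulator wide and rounding only once at the end is the usual design choice for fixed-point filters, since rounding every partial product would compound the quantization error.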


AI Engine: Scalar Unit, Vector Unit, Load Units and Memory

• Scalar Unit: 32-bit scalar RISC processor with scalar register file, scalar ALU, and non-linear functions

• Vector Unit: vector processor with a 512-bit SIMD datapath, vector register file, fixed-point vector unit, and floating-point vector unit; multiple vector lanes; 8 / 16 / 32-bit & SPFP operands

• Instruction Fetch & Decode Unit

• Load Unit A, Load Unit B, and Store Unit, each with its own AGU (address generation unit)

• Local, shareable memory: 32KB local, 128KB addressable, through a highly parallel memory interface

• Stream Interface

Instruction parallelism (VLIW): 7+ operations / clock cycle

• 2 Vector Loads / 1 Mult / 1 Store

• 2 Scalar Ops / Stream Access

Data parallelism (SIMD): up to 128 MACs / clock cycle per core (INT8); 8 FLOPs / clock cycle (32-bit SPFP)
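The per-cycle figures above translate into peak throughput by simple multiplication. This sketch assumes a 1 GHz clock (the terminology slide says “1GHz+”), uses the 400 AI Engines quoted for the VC1902 at the end of the deck, and counts one MAC as two operations, which is a common convention rather than anything the slide states.

```python
def peak_ops_per_second(ops_per_cycle: int, clock_hz: int, engines: int = 1) -> int:
    """Theoretical peak = operations per clock x clock rate x number of engines."""
    return ops_per_cycle * clock_hz * engines


INT8_MACS_PER_CYCLE = 128  # from the slide: up to 128 INT8 MACs/cycle per core
SPFP_FLOPS_PER_CYCLE = 8   # from the slide: 8 FLOPs/cycle (32-bit SPFP)
OPS_PER_MAC = 2            # convention: one multiply plus one add
CLOCK_HZ = 1_000_000_000   # assumption: 1 GHz
ENGINES = 400              # VC1902 AI Engine count

if __name__ == "__main__":
    per_core = peak_ops_per_second(INT8_MACS_PER_CYCLE * OPS_PER_MAC, CLOCK_HZ)
    device = peak_ops_per_second(INT8_MACS_PER_CYCLE * OPS_PER_MAC, CLOCK_HZ, ENGINES)
    fp32 = peak_ops_per_second(SPFP_FLOPS_PER_CYCLE, CLOCK_HZ, ENGINES)
    print(f"INT8 per core: {per_core / 1e9:.0f} GOPS")  # 256 GOPS
    print(f"INT8 device:   {device / 1e12:.1f} TOPS")   # 102.4 TOPS
    print(f"SPFP device:   {fp32 / 1e12:.1f} TFLOPS")   # 3.2 TFLOPS
```

These are theoretical peaks; what a kernel actually sustains depends on how close it runs to peak, which is what the compute-efficiency figures later in the deck measure.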


AI Engine: Terminology

Figure: AI Engine Array, a grid of tiles, each pairing an AI Engine with its memory

• Versal ACAP > AI Engine Array > AI Engine Tile

• AI Engine Tile: AI Engine, Local Memory, Interconnect, Data Mover

• AI Engine: 1GHz+ VLIW / SIMD vector processor (ISA-based), with AI Vector Extensions and 5G Vector Extensions

• AI Engine internals: Scalar Unit (scalar register file, scalar ALU, non-linear functions), Vector Unit (vector register file, fixed-point and floating-point vector units), Instruction Fetch & Decode Unit, Load Unit A, Load Unit B and Store Unit (each with an AGU), Memory Interface, Stream Interface


AI Inference Mapping on Versal™ ACAP

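Block diagram: a NETWORK-ON-CHIP connecting the AI Engines (Intelligent), the PL (Adaptable), Arm® Dual-Core Cortex™-A72 and Arm Dual-Core Cortex-R5 (Scalar), I/O, and External Memory (e.g., DDR). A CNN (Convolution Layers, ReLU, Max Pool, Fully Connected Layers) is mapped onto the device, with a Weight Buffer and an Activation Buffer held in PL URAM.

˃ Custom memory hierarchy: buffer on-chip vs. off-chip to reduce latency and power

˃ Stream multi-cast on AI interconnect: weights and activations are read once, reducing memory bandwidth

˃ AI-optimized vector instructions (128 INT8 mults/cycle)

Block matrix multiplication across AI Engines (A = Activations, W = Weights), with partial sums passed between engines over the cascade stream; a (4x8) activation matrix times an (8x4) weight matrix gives a (4x4) result:

$$
\begin{bmatrix} A_{00} & A_{01} \\ A_{10} & A_{11} \end{bmatrix}
\times
\begin{bmatrix} W_{00} & W_{01} \\ W_{10} & W_{11} \end{bmatrix}
=
\begin{bmatrix}
A_{00} W_{00} + A_{01} W_{10} & A_{00} W_{01} + A_{01} W_{11} \\
A_{10} W_{00} + A_{11} W_{10} & A_{10} W_{01} + A_{11} W_{11}
\end{bmatrix}
$$

Program Directly From High-level ML Frameworks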
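The block identity can be checked with a short sketch: each output tile is the sum of two tile products, so one engine can compute A_i0 × W_0j and hand its partial sum to a neighbour, which adds A_i1 × W_1j. This models only the arithmetic of tiling plus cascaded accumulation in plain Python; the helper names are hypothetical, and it does not describe how the Xilinx tools actually place kernels on engines.

```python
def matmul(A, B):
    """Plain dense matrix product of list-of-lists matrices."""
    inner = len(B)
    return [[sum(A[i][k] * B[k][j] for k in range(inner)) for j in range(len(B[0]))]
            for i in range(len(A))]


def add(X, Y):
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def tile(M, r0, r1, c0, c1):
    """Sub-matrix M[r0:r1, c0:c1]."""
    return [row[c0:c1] for row in M[r0:r1]]


def blocked_matmul_2x2(A, W):
    """C = A x W computed as a 2x2 grid of tiles, each built from a cascaded partial sum."""
    m, k, n = len(A), len(W), len(W[0])
    assert len(A[0]) == k and m % 2 == 0 and k % 2 == 0 and n % 2 == 0
    hm, hk, hn = m // 2, k // 2, n // 2
    C = [[0] * n for _ in range(m)]
    for i in range(2):      # output tile row
        for j in range(2):  # output tile column
            # first engine: A_i0 x W_0j
            partial = matmul(tile(A, i * hm, (i + 1) * hm, 0, hk),
                             tile(W, 0, hk, j * hn, (j + 1) * hn))
            # second engine: adds A_i1 x W_1j to the partial sum it receives
            partial = add(partial, matmul(tile(A, i * hm, (i + 1) * hm, hk, k),
                                          tile(W, hk, k, j * hn, (j + 1) * hn)))
            for r in range(hm):
                C[i * hm + r][j * hn:(j + 1) * hn] = partial[r]
    return C


if __name__ == "__main__":
    A = [[(r * 8 + c) % 5 - 2 for c in range(8)] for r in range(4)]  # 4x8 activations
    W = [[(r * 4 + c) % 3 - 1 for c in range(4)] for r in range(8)]  # 8x4 weights
    assert blocked_matmul_2x2(A, W) == matmul(A, W)
    print(blocked_matmul_2x2(A, W))  # the 4x4 result
```

Because every tile of W is reused across a whole row of A tiles, streaming the weights once and multicasting them is what lets the hardware read them a single time.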


AI Engine Delivers High Compute Efficiency

Vector processor efficiency (% of peak kernel theoretical performance):

• ML Convolutions: 95% (block-based matrix multiplication, (32×64) × (64×32))

• FFT: 80% (1024-pt FFT/iFFT)

• DPD: 98% (Volterra-based forward-path DPD)

˃ Adaptable, non-blocking interconnect: a flexible data movement architecture that avoids interconnect “bottlenecks”

˃ Adaptable memory hierarchy: local, distributed, shareable memory gives extreme bandwidth, with no cache misses or data replication, and extends to PL memory (BRAM, URAM)

˃ Transfer data while the AI Engine computes: overlap compute and communication (figure: alternating Compute and Comm phases running in parallel)
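Overlapping compute and communication is usually done with ping-pong (double) buffering: while the engine computes on one buffer, the next block of data lands in the other. The sketch below is a hypothetical timing model, assuming loads are serialized on one channel, computes run on one engine, and times are in arbitrary units. It simulates the schedule and compares it with the closed form comm + (n - 1) · max(comm, compute) + compute.

```python
def simulate_pipeline(n_blocks, comm, compute, buffers=2):
    """Finish time when each block is loaded (comm) and then computed (compute).

    With buffers=1 every load waits for the previous compute (no overlap);
    with buffers=2 (ping-pong) the next block loads while the current one computes.
    """
    load_end, compute_end = [], []
    for i in range(n_blocks):
        start = load_end[i - 1] if i > 0 else 0
        if i >= buffers:
            # the buffer being reused must already have been consumed by a compute
            start = max(start, compute_end[i - buffers])
        load_end.append(start + comm)
        engine_free = compute_end[i - 1] if i > 0 else 0
        compute_end.append(max(load_end[i], engine_free) + compute)
    return compute_end[-1] if compute_end else 0


def overlapped_time(n_blocks, comm, compute):
    """Closed-form finish time for double buffering."""
    if n_blocks == 0:
        return 0
    return comm + (n_blocks - 1) * max(comm, compute) + compute


if __name__ == "__main__":
    print(simulate_pipeline(4, 6, 10, buffers=1))  # 64: comm and compute serialized
    print(simulate_pipeline(4, 6, 10, buffers=2))  # 46: comm hidden behind compute
```

Once the pipeline fills, each block costs only max(comm, compute) instead of comm + compute, so a compute-bound kernel effectively hides its data movement.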


Summary

˃ The evolution of processing for AI is following a track similar to SDR, where hardware and software need to be tightly coupled

˃ The drive for more Capacity, Autonomy and Resiliency in advanced SDRs carries high compute demands and calls for mixed-precision processing capabilities

˃ Moore’s Law is running out of steam, which means the goal of a SWaP-friendly, multi-mission, situationally aware payload requires advancements in processing beyond just process technology

˃ ACAPs are a response to this new reality

Xilinx VC1902 Versal ACAP with 400 AI Engines.

First shipment June 2019.

Visit https://www.xilinx.com/products/silicon-devices/acap/versal.html for datasheets, whitepapers, and product tables.


Adaptable.

Intelligent.

THANK YOU!
