Page 1: High Performance Video Pipelining: A Flexible Architecture for … · 2014-04-07 · Title: High Performance Video Pipelining: A Flexible Architecture for GPU Processing of Broadcast

High Performance Video Pipelining: A Flexible Architecture for GPU Processing of Broadcast Video

Peter Walsh Chief Emerging Technology Engineer

ESPN


Overview

• Real-time GPU processing of broadcast video

– Maximize GPU utilization

– Maintain flexibility

• High Performance Video Pipeline

– CPU and GPU buffers

– Data transfer


Monday Night Football production truck


NASCAR production truck


Studio (BCS championship “Film Room”)


GPU Processing

• Segmentation (generating chromakey)

• Inserting graphics (linear and chromakeying)

• Field (camera) tracking

• Object (player) tracking


[Diagram: CPU/GPU block diagram with the labels Input Video, Segmentation, Field Tracking, Object Tracking, GFX insertion, Interop, Rendering, Output Video]


Background

• “Best Practices in GPU-Based Video Processing,” Tom True, NVIDIA, GTC 2013

• “Topics in GPU-Based Video Processing,” Tom True, NVIDIA, GTC 2014


Naïve Sequential Implementation

• Acquire

• Upload

• Process

• Download

• Output

1 Frame Time


Simultaneous Operations

• Acquire

• Upload

• Process

• Download

• Output

1 Frame Time
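
The overlap on this slide can be sketched on the host alone. Assuming each of the five stages gets up to one frame time and all stages run concurrently, stage s works on frame t - s during frame time t. The helper names below are hypothetical and no CUDA or video I/O is involved; this is only a model of the schedule, not the presenter's implementation.

```cpp
#include <cassert>
#include <cstdio>

// The five stages named on the slide, in pipeline order.
static const char* const kStages[] = {
    "Acquire", "Upload", "Process", "Download", "Output"};
static const int kStageCount = 5;

// Frame index handled by `stage` during frame time `tick` when all stages
// overlap. Returns -1 while the pipeline is still filling.
// (Hypothetical helper for illustration only.)
inline int FrameAtStage(int tick, int stage) {
    assert(tick >= 0 && stage >= 0 && stage < kStageCount);
    return tick >= stage ? tick - stage : -1;
}

// Print which frame each stage holds during one frame time.
inline void PrintTick(int tick) {
    std::printf("tick %d:", tick);
    for (int s = 0; s < kStageCount; ++s) {
        const int frame = FrameAtStage(tick, s);
        if (frame >= 0) std::printf(" %s(frame %d)", kStages[s], frame);
    }
    std::printf("\n");
}
```

Calling PrintTick(4) lists all five stages busy at once, each on a different frame, which is the steady state the slide depicts.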


Techniques

• Avoid CPU memory copies

• Use pinned system memory

• DMA Video I/O using pinned memory

• DMA between CPU and GPU

• Asynchronous – using multiple CUDA streams

• Double buffers for simultaneous R/W


Frame Buffers

Pinned System

System

GPU


Buffer Allocation

• Device

• System

• Pinned System

• 1D

• 2D (pitch specified)

• 2D (pitch determined by CUDA allocation)


Pitch
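
Pitch is the number of bytes from the start of one image row to the start of the next; it can exceed width times bytes per pixel when rows are padded for alignment. A minimal host-side sketch of the arithmetic follows. The function names and the alignment value are assumptions for illustration; cudaMallocPitch() chooses its own pitch.

```cpp
#include <cassert>
#include <cstddef>

// Bytes actually occupied by pixels in one row.
inline std::size_t RowBytes(std::size_t widthPixels, std::size_t bytesPerPixel) {
    return widthPixels * bytesPerPixel;
}

// Round the row size up to a multiple of `alignment` to obtain a pitch.
// The alignment is a caller-supplied assumption, not a CUDA constant.
inline std::size_t AlignedPitch(std::size_t rowBytes, std::size_t alignment) {
    assert(alignment > 0);
    return (rowBytes + alignment - 1) / alignment * alignment;
}

// Byte offset of pixel (x, y) from the start of a pitched buffer.
inline std::size_t PixelOffset(std::size_t x, std::size_t y,
                               std::size_t pitchBytes,
                               std::size_t bytesPerPixel) {
    assert(x * bytesPerPixel < pitchBytes);
    return y * pitchBytes + x * bytesPerPixel;
}
```

For a 1920-pixel row at 3 bytes per pixel, the row holds 5760 bytes; with an assumed 512-byte alignment the pitch becomes 6144, so each row carries 384 bytes of padding.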


CUDA API

Allocation:

cudaMalloc()
cudaHostAlloc()
cudaMallocPitch()

Memory Copies:

cudaMemcpy()
cudaMemcpy2D()
cudaMemcpyAsync()
cudaMemcpy2DAsync()


Buffer Transfers

B.Copy(A, pStream)

• Source and destination buffers

– System, pinned system, device

– Different pitches

• Supports Synchronous/Asynchronous transfers
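
A copy between buffers with different pitches has to move one row at a time, skipping each side's padding. The sketch below models that on the host only. PitchedBuffer is a hypothetical stand-in, not the HPVP class, and it omits the stream argument shown on the slide; a CUDA-backed version would presumably call cudaMemcpy2D() or cudaMemcpy2DAsync() instead of the per-row memcpy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical host-only stand-in for a pitched frame buffer.
struct PitchedBuffer {
    std::size_t rowBytes;  // payload bytes per row
    std::size_t height;    // number of rows
    std::size_t pitch;     // bytes between row starts, >= rowBytes
    std::vector<std::uint8_t> data;

    PitchedBuffer(std::size_t rowBytes_, std::size_t height_, std::size_t pitch_)
        : rowBytes(rowBytes_), height(height_), pitch(pitch_),
          data(pitch_ * height_, 0) {
        assert(pitch >= rowBytes);
    }

    // Copy src into this buffer row by row, honoring both pitches.
    // Padding bytes in the destination are left untouched.
    void Copy(const PitchedBuffer& src) {
        assert(src.rowBytes == rowBytes && src.height == height);
        for (std::size_t y = 0; y < height; ++y) {
            std::memcpy(data.data() + y * pitch,
                        src.data.data() + y * src.pitch, rowBytes);
        }
    }
};
```

The destination-first call shape, B.Copy(A), mirrors the slide; everything else about the type is an assumption.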


CUDA Kernels

LaunchKernel( A, B, pStream, …)

• Buffers A and B are in device memory

• Sync/Async behavior controlled by pStream


[Diagram: buffers A and D on the CPU side, buffers B and C on the GPU side, with Processing between B and C]

Acquire(A)
B.Copy(A, pUploadStream)
Process(B, C, pProcessingStream, params)
D.Copy(C, pDownLoadStream)
Output(D)


Double Buffering

[Diagram: two buffers exchange the Src and Dst roles between frame “i” and frame “i + 1”]
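
The role swap between frame "i" and frame "i + 1" can be sketched as a small host-side class. DoubleBuffer and its members are hypothetical names; the sketch shows only the index flip, with no synchronization or CUDA streams, which a real pipeline would need before swapping.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical double buffer: the producer writes Dst while the consumer
// reads Src, then Swap() exchanges the roles at the frame boundary.
template <typename T>
class DoubleBuffer {
public:
    explicit DoubleBuffer(std::size_t size) {
        assert(size > 0);
        buffers_[0].resize(size);
        buffers_[1].resize(size);
    }

    // Buffer currently being written.
    std::vector<T>& Dst() { return buffers_[dst_]; }

    // Buffer currently being read (the one written during the last frame).
    const std::vector<T>& Src() const { return buffers_[1 - dst_]; }

    // Exchange roles; no data is copied.
    void Swap() { dst_ = 1 - dst_; }

private:
    std::vector<T> buffers_[2];
    int dst_ = 0;
};
```

Because Swap() only flips an index, the reader and writer never touch the same storage within a frame, which is what permits simultaneous read and write.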


Double Buffering

[Diagram: the CPU-to-GPU pipeline with each buffer drawn as a Src/Dst pair on either side of Processing]


[Diagram: CPU/GPU block diagram with the labels Input Video, Segmentation, Field Tracking, Object Tracking, GFX insertion, Interop, Rendering, Output Video]


Simultaneous Operations

• Acquire

• Upload

• Process

• Download

• Output

1 Frame Time


Intel IPP

ippiFilter_8u_C1R(pSrcImgOffset, srcPitch, pDstImgOffset, dstPitch, roi, filterKernel, kernelSize, anchor, divisor);

NVIDIA NPP

nppiFilter_8u_C1R(pSrcImgOffset, srcPitch, pDstImgOffset, dstPitch, roi, filterKernel, kernelSize, anchor, divisor);

HPVP

Filter_8u_C1R(pSrc, pDest, roi, pFilterKernel);
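
The three calls differ mainly in how much bookkeeping the caller passes explicitly. One plausible way to arrive at the shorter signature is to let image objects carry their own pitch and let the kernel object carry its size, anchor and divisor. The sketch below illustrates that idea on the host; every type and function in it is hypothetical, the raw routine is a plain correlation standing in for a library call, and none of it is the IPP, NPP or HPVP implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Size { int width; int height; };
struct Point { int x; int y; };

// Hypothetical image view: pointer and pitch travel together.
struct Image8u {
    std::uint8_t* data;
    int pitch;  // bytes between row starts
};

// Hypothetical filter description bundling the kernel parameters.
struct FilterKernel {
    std::vector<int> coeffs;  // row-major, size.width * size.height entries
    Size size;
    Point anchor;
    int divisor;
};

// Stand-in for a low-level routine with a long, library-style argument list.
// The caller must guarantee that the kernel footprint around every ROI pixel
// lies inside the source image.
void FilterRaw_8u_C1R(const std::uint8_t* src, int srcPitch,
                      std::uint8_t* dst, int dstPitch, Size roi,
                      const int* kernel, Size kernelSize, Point anchor,
                      int divisor) {
    assert(divisor != 0);
    for (int y = 0; y < roi.height; ++y) {
        for (int x = 0; x < roi.width; ++x) {
            int sum = 0;
            for (int ky = 0; ky < kernelSize.height; ++ky)
                for (int kx = 0; kx < kernelSize.width; ++kx)
                    sum += kernel[ky * kernelSize.width + kx] *
                           src[(y + ky - anchor.y) * srcPitch +
                               (x + kx - anchor.x)];
            const int v = sum / divisor;
            // Saturate to the 8-bit range.
            dst[y * dstPitch + x] =
                static_cast<std::uint8_t>(v < 0 ? 0 : (v > 255 ? 255 : v));
        }
    }
}

// Short wrapper: pitches and kernel parameters are unpacked from the objects.
void Filter_8u_C1R(const Image8u& src, Image8u& dst, Size roi,
                   const FilterKernel& k) {
    FilterRaw_8u_C1R(src.data, src.pitch, dst.data, dst.pitch, roi,
                     k.coeffs.data(), k.size, k.anchor, k.divisor);
}
```

The wrapper adds no computation; it only moves the pitch and kernel bookkeeping out of the call site, so the caller cannot pass a pitch that disagrees with the buffer.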


Live Filtering

• Acquire(A)

• B.Copy(A, pUploadStream)

• Filter_8u_C3R(B, C, roi, pFilterKernel) *

• D.Copy(C, pDownLoadStream)

• Output(D)

* CUDA stream for processing already defined


References/Links

“Best Practices in GPU-Based Video Processing,” Tom True, NVIDIA, GTC 2013

“Topics in GPU-Based Video Processing,” Tom True, NVIDIA, GTC 2014

http://www.youtube.com/watch?v=QpEV-XVIxNw

http://frontrow.espn.go.com/2014/01/espns-advanced-replay-tool-art-graphically-enhances-sports-telecasts/


Questions

Peter Walsh ESPN [email protected] (860) 766-2908

