Date post: 01-Aug-2020
GTC - DC – November - 2019 GPU ACCELERATED SPEECH - TO - TEXT
Page 1: GPU ACCELERATED SPEECH-TO-TEXT - NVIDIA...Personal assistants, Medical transcription, Call center analytics, Video search, etc INTRODUCTION TO ASR Translating Speech into Text NVIDIA

GTC - DC – November - 2019

GPU ACCELERATED SPEECH-TO-TEXT

AGENDA

1) Brief introduction to speech processing

2) Previous Results

3) Performance Updates Since GTC

4) Kaldi Container

5) Production Deployment with Intelligent Voice

Speech Recognition: the process of taking a raw audio signal and transcribing to text

Use of Automatic Speech Recognition has exploded in the last ten years:

Personal assistants, Medical transcription, Call center analytics, Video search, etc

INTRODUCTION TO ASR: Translating Speech into Text

[Figure: the example transcript "NVIDIA is cool" alongside a weighted decoding graph (states 0 to 4) with arc labels such as nvidia:nvidia/1.0, ai:ai/1.24 and speech:speech/1.63]

KALDI

Kaldi is a speech processing framework out of Johns Hopkins University

Uses a combination of DL and ML algorithms for speech processing

Started in 2009 with the intent to reduce the time and cost needed to build ASR systems

http://kaldi-asr.org/

Considered state-of-the-art

Speech Processing Framework

SPEECH RECOGNITION

• Kaldi fuses known state-of-the-art techniques from speech recognition with deep learning

• Hybrid DL/ML approach continues to perform better than deep learning alone

• "Classical" ML Components:

  • Mel-Frequency Cepstral Coefficients (MFCC) features – represent audio as a spectrum of a spectrum

  • I-vectors – use factor analysis and Gaussian Mixture Models to learn a speaker embedding, which helps the acoustic model adapt to variability in speakers

  • Predict phone states (HMM) – unlike "end-to-end" DL models, Kaldi acoustic models predict context-dependent phone substates as Hidden Markov Model (HMM) states

• The result is a system that, to date, is more robust than DL-only approaches and typically requires less data to train

State of the Art

KALDI SPEECH PROCESSING PIPELINE

[Figure: pipeline Raw Audio → Feature Extraction → Acoustic Model → Language Model → Output, ending in the transcript "NVIDIA is cool". Kaldi components shown under the stages: MFCC & Ivectors, NNET3, Decoder, Lattice]

PREVIOUS RESULTS

PREVIOUS WORK

NVIDIA Presentations/Publications:

GTC On Demand: https://on-demand-gtc.gputechconf.com/gtcnew/on-demand-gtc.php

Spring 2018: S81034

Fall 2018: DC8189

Spring 2019: S9672

https://arxiv.org/abs/1910.10032

Devblogs:

https://devblogs.nvidia.com/nvidia-accelerates-speech-text-transcription-3500x-kaldi/

https://devblogs.nvidia.com/gpu-accelerated-speech-to-text-with-kaldi-a-tutorial-on-getting-started/

GTC 2019 ACCELERATED COMPONENTS

[Figure: pipeline Raw Audio → Feature Extraction → Acoustic Model → Language Model → Output ("NVIDIA is cool", with the decoding graph), with a "GPU Accelerated" legend marking the accelerated stages]

GTC-2019 PERFORMANCE: 1 GPU, LibriSpeech, 19.03 container

Hardware        Perf (RTFx)   WER    Speedup

LibriSpeech Model, Libri Clean Data
2x Intel Xeon   381           5.5    1.0x
AGX Xavier      500           5.5    1.3x
Tesla T4        1635          5.5    4.3x
Tesla V100      3524          5.5    9.2x

LibriSpeech Model, Libri Other Data
2x Intel Xeon   377           14.0   1.0x
AGX Xavier      450           14.0   1.2x
Tesla T4        1439          14.0   3.8x
Tesla V100      2854          14.0   7.6x

2x Xeon*: 2x Intel Xeon Platinum 8168; Xavier: AGX Devkit; T4*: PCI-E; V100*: SXM

Determinized lattice output, beam=10, lattice-beam=7

Uses all available HW threads
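
The speedup column can be reproduced from the RTFx column. The sketch below assumes the usual definition of RTFx (seconds of audio processed per second of wall-clock time), which the slide does not spell out, and treats each speedup as a platform's RTFx divided by the 2x Xeon baseline. The function names are hypothetical.

```python
def rtfx(audio_seconds, processing_seconds):
    """Real-time factor: seconds of audio transcribed per wall-clock second."""
    return audio_seconds / processing_seconds


def speedup(platform_rtfx, baseline_rtfx):
    """Throughput relative to the CPU baseline, rounded to one decimal as on the slide."""
    return round(platform_rtfx / baseline_rtfx, 1)


# RTFx values copied from the Libri Clean rows of the table.
CLEAN_RTFX = {
    "2x Intel Xeon": 381,
    "AGX Xavier": 500,
    "Tesla T4": 1635,
    "Tesla V100": 3524,
}

clean_speedups = {
    name: speedup(value, CLEAN_RTFX["2x Intel Xeon"])
    for name, value in CLEAN_RTFX.items()
}
print(clean_speedups)
```

Running the same calculation on the Libri Other rows (baseline 377) gives 1.2x, 3.8x and 7.6x, matching the slide.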

GTC-2019 Scale-up performance

[Figure: bar chart of speedup (axis 0x to 30x) for T4 and V100 at 1, 2, 4 and 8 GPUs, with RTFx data labels]

GPUs   T4 Performance (RTFx)   V100 Performance (RTFx)
1      1635                    3524
2      3371                    7082
4      6368                    10011
8      7906                    9399
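
One way to read the scale-up numbers is as parallel efficiency: measured RTFx at n GPUs divided by n times the single-GPU RTFx. The sketch below is an illustration under two assumptions: that the data labels on the chart belong to the T4 and V100 series in the order they appear in the transcript, and that efficiency is defined this way (the slide does not state a formula).

```python
def parallel_efficiency(rtfx_by_gpus):
    """Efficiency at n GPUs = RTFx(n) / (n * RTFx(1)); 1.0 means linear scaling."""
    single = rtfx_by_gpus[1]
    return {n: round(value / (n * single), 2) for n, value in rtfx_by_gpus.items()}


# RTFx per GPU count, as read from the chart's data labels (assumed series order).
T4 = {1: 1635, 2: 3371, 4: 6368, 8: 7906}
V100 = {1: 3524, 2: 7082, 4: 10011, 8: 9399}

print("T4  ", parallel_efficiency(T4))
print("V100", parallel_efficiency(V100))
```

Under this reading, scaling is close to linear up to 2 GPUs and falls off sharply by 8 GPUs, with the V100 delivering less throughput at 8 GPUs than at 4.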

MULTI-GPU PERFORMANCE LIMITERS

Scalability Limited Due to CPU Overhead

Feature Extraction and Determinization become bottlenecks

CPU has a hard time keeping up with GPU performance

RECENT PERFORMANCE UPDATES

RECENT IMPROVEMENTS: Since GTC 2019

Multi-threading improvements
  Moved more tasks to worker threads, which allows control threads to submit work faster and keep the GPU busy

Reduced memory usage
  Increased batch size = more performance

General optimization

Container improvements
  Automatic segmenting and dataset preparation
  Added ASpIRE model
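
The multi-threading change follows a common producer/consumer pattern: the control thread only enqueues work, and worker threads take over the slower per-item processing. The sketch below is a generic Python illustration of that pattern, not Kaldi's implementation. The function name and the stand-in workload are hypothetical.

```python
import queue
import threading


def process_batches(batches, num_workers=4):
    """The control thread only enqueues work; worker threads do the slow part."""
    tasks = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            item = tasks.get()
            if item is None:  # sentinel: no more work for this worker
                return
            index, samples = item
            total = sum(samples)  # stand-in for CPU-side per-utterance work
            with lock:
                results[index] = total

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for index, samples in enumerate(batches):  # submission is cheap and non-blocking
        tasks.put((index, samples))
    for _ in threads:  # one sentinel per worker
        tasks.put(None)
    for t in threads:
        t.join()
    return [results[i] for i in range(len(batches))]


print(process_batches([[1, 2], [3, 4], [5]]))  # [3, 7, 5]
```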

LATEST SINGLE GPU PERFORMANCE: 1 GPU, LibriSpeech, 19.11 Container

Hardware        19.11 Perf (RTFx)   WER    GTC 2019 (19.03) Speedup   GTC-DC 2019 (19.11) Speedup

LibriSpeech Model, Libri Clean Data
2x Intel Xeon   381                 5.5    1.0x                       1.0x
Tesla T4        1849                5.5    4.3x                       4.9x
Tesla V100      5154                5.5    9.2x                       13.5x

LibriSpeech Model, Libri Other Data
2x Intel Xeon   377                 14.0   1.0x                       1.0x
Tesla T4        1679                14.0   3.8x                       4.5x
Tesla V100      3925                14.0   7.6x                       10.4x

2x Xeon*: 2x Intel Xeon Platinum 8168; V100*: SXM

Determinized lattice output, beam=10, lattice-beam=7

Uses all available HW threads
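
The release-to-release gain on each GPU can be estimated by comparing the RTFx reported for the earlier container with the 19.11 figures. This is a small sketch with a hypothetical helper name, and the pairing of values across the two single-GPU tables is my own.

```python
# (earlier RTFx, 19.11 RTFx) per GPU, copied from the two single-GPU tables.
CLEAN = {"Tesla T4": (1635, 1849), "Tesla V100": (3524, 5154)}
OTHER = {"Tesla T4": (1439, 1679), "Tesla V100": (2854, 3925)}


def release_gain(before, after):
    """Percentage throughput gain from one container release to the next."""
    return round(100.0 * (after - before) / before, 1)


for dataset, rows in (("clean", CLEAN), ("other", OTHER)):
    for gpu, (old, new) in rows.items():
        print(f"{dataset:5s} {gpu:10s} +{release_gain(old, new)}%")
```

By this estimate the V100 gains considerably more than the T4 on both datasets.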

SPRING 2019 ACCELERATED COMPONENTS

[Figure: pipeline Raw Audio → Feature Extraction → Acoustic Model → Language Model → Output ("NVIDIA is cool", with the decoding graph), with a "GPU Accelerated" legend marking the accelerated stages]

Large amount of CPU work. When scaling to multi-GPU, CPU threads cannot keep up.

GPU ACCELERATED FEATURE EXTRACTION: Reduce CPU overhead

[Figure: pipeline Raw Audio → Feature Extraction → Acoustic Model → Language Model → Output ("NVIDIA is cool", with the decoding graph)]

Batch=1 implementation moved work to the GPU, significantly reducing CPU load.

FEATURE EXTRACTION: Pipeline

[Figure: block diagram from Raw Audio to the Input Feature and Ivector Feature outputs. Blocks: Base Feature (MFCC, FBANK), Pitch, OnlineCmvn, Ivector Extraction. One component is annotated "Currently not supported."]

Green = Implemented in CUDA. Individual models may not use all components.

Batch=1 implementation only

IVECTOR EXTRACTION: Pipeline

[Figure: block diagram from Base Feature to Ivector Feature. Blocks: LDA Transform, Online CMVN, Splice, LDA Transform, Posteriors, Ivector Stats, Compute Ivector, Splice]

Green = Implemented in CUDA. Individual models may not use all components.

Batch=1 implementation only

GPU FEATURE EXTRACTION: End-to-End Scalability & Efficiency (DGX-1V)

GPU_THREADS=2, MAX_BATCH_SIZE=300, BATCH_DRAIN_SIZE=40, DATASETS=test_clean, COPY_THREADS=0

[Figure: "Parallel Scalability". Real Time Factor (axis 0 to 25000) versus number of V100-SXM-16GB GPUs (1, 2, 4, 8), comparing CPU feature extraction with GPU feature extraction]

[Figure: "Parallel Efficiency". Parallel efficiency (axis 0% to 100%) versus number of V100-SXM-16GB GPUs (1, 2, 4, 8), comparing CPU feature extraction with GPU feature extraction]

FULL NODE PERFORMANCE

Good Scalability across a range of hardware platforms

[Figure: "Kaldi - Multi-GPU Scalability". RTFx (axis 0 to 60000) for DGX-1V (V100 SXM 16GB), DGX-2 (V100 SXM 32GB) and SYS-6049GP-TRT (T4) at 1, 2, 4, 8, 16 and 20 GPUs]

[Figure: "Kaldi - Multi-GPU Efficiency". Parallel efficiency (axis 0% to 120%) for the same platforms and GPU counts]

14 hours in one second
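
The "14 hours in one second" callout can be related to RTFx with simple arithmetic, assuming RTFx means seconds of audio processed per wall-clock second. The exact RTFx behind the callout is not given in the transcript, so the value below is derived from the callout rather than read off the chart. The function names are hypothetical.

```python
SECONDS_PER_HOUR = 3600


def rtfx_for_hours_per_second(hours):
    """Aggregate RTFx needed to transcribe `hours` of audio every wall-clock second."""
    return hours * SECONDS_PER_HOUR


def audio_hours_per_second(rtfx):
    """Hours of audio transcribed during one wall-clock second at a given RTFx."""
    return rtfx / SECONDS_PER_HOUR


print(rtfx_for_hours_per_second(14))   # 50400
print(audio_hours_per_second(50_400))  # 14.0
```

An aggregate RTFx of roughly 50,000 falls within the 0 to 60000 range of the scalability chart.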

FUTURE WORK

Batched Feature Extraction

More performance

Online Speech Pipeline

Streaming audio

Lower latency, higher throughput, less memory

Can emulate offline with same benefits

More models

Look for these features at GTC - 2020

THE NGC CONTAINER REGISTRY: Simple Access to GPU-Accelerated Software

Discover over 40 GPU-Accelerated Containers: Spanning deep learning, machine learning, HPC applications, HPC visualization, and more

Innovate in Minutes, Not Weeks: Pre-configured, ready-to-run

Run Anywhere: The top cloud providers, NVIDIA DGX Systems, PCs and workstations with select NVIDIA GPUs, and NGC-Ready systems

NGC CONTAINER

Get an NGC account: https://ngc.nvidia.com/signup

Free & Easy

# log in to NGC, pull the container, and run it

%> docker login nvcr.io

%> docker pull nvcr.io/nvidia/kaldi:19.10-py3

%> nvidia-docker run --rm -it nvcr.io/nvidia/kaldi:19.10-py3

#prepare models and data

%> cd /workspace/nvidia-examples/librispeech

%> ./prepare_data.sh

#run benchmark

%> ./run_benchmark.sh

BENCHMARK OUTPUT: NGC Container

Process 0:

~Group 0 completed Aggregate Total Time: 15.3179 Audio: 19452.5 RealTimeX: 1269.92

~Group 1 completed Aggregate Total Time: 20.8032 Audio: 38905 RealTimeX: 1870.15

~Group 2 completed Aggregate Total Time: 26.5266 Audio: 58357.4 RealTimeX: 2199.96

~Group 3 completed Aggregate Total Time: 31.8119 Audio: 77809.9 RealTimeX: 2445.94

~Group 4 completed Aggregate Total Time: 37.179 Audio: 97262.4 RealTimeX: 2616.06

~Group 5 completed Aggregate Total Time: 42.5534 Audio: 116715 RealTimeX: 2742.79

~Group 6 completed Aggregate Total Time: 48.0023 Audio: 136167 RealTimeX: 2836.68

~Group 7 completed Aggregate Total Time: 49.4219 Audio: 155620 RealTimeX: 3148.8

~Group 8 completed Aggregate Total Time: 54.2707 Audio: 175072 RealTimeX: 3225.91

~Group 9 completed Aggregate Total Time: 57.2566 Audio: 194525 RealTimeX: 3397.42

Overall: Aggregate Total Time: 57.2567 Total Audio: 194525 RealTimeX: 3397.42

%WER 5.54 [ 29134 / 525760, 3900 ins, 2321 del, 22913 sub ]

%SER 51.50 [ 13494 / 26200 ]

Scored 26200 sentences, 0 not present in hyp.

Decoding completed successfully.

Total RTF: 3397.42 Average RTF: 3397.4200 Average WER: 5.5400

All WER and PERF tests passed.
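
The summary figures in this output can be cross-checked against the raw counts on the same lines. The sketch below parses two lines copied from the log and recomputes RealTimeX (audio seconds divided by total time) and WER (insertions plus deletions plus substitutions, divided by reference words). The regular expressions and function names are my own and only assume the line layout shown above.

```python
import re

OVERALL = "Overall: Aggregate Total Time: 57.2567 Total Audio: 194525 RealTimeX: 3397.42"
WER_LINE = "%WER 5.54 [ 29134 / 525760, 3900 ins, 2321 del, 22913 sub ]"


def parse_overall(line):
    """Return (total_time_s, audio_s, reported_rtfx) from the 'Overall:' line."""
    m = re.search(
        r"Total Time: ([\d.]+) Total Audio: ([\d.]+) RealTimeX: ([\d.]+)", line
    )
    return tuple(float(g) for g in m.groups())


def parse_wer(line):
    """Return the reported WER and the raw error counts from the '%WER' line."""
    m = re.search(
        r"%WER ([\d.]+) \[ (\d+) / (\d+), (\d+) ins, (\d+) del, (\d+) sub \]", line
    )
    reported = float(m.group(1))
    errors, words, ins, dele, sub = (int(g) for g in m.groups()[1:])
    return {"reported": reported, "errors": errors, "words": words,
            "ins": ins, "del": dele, "sub": sub}


total_time, audio, reported_rtfx = parse_overall(OVERALL)
wer = parse_wer(WER_LINE)

computed_rtfx = audio / total_time
computed_wer = 100.0 * (wer["ins"] + wer["del"] + wer["sub"]) / wer["words"]

print(f"RealTimeX reported {reported_rtfx}, recomputed {computed_rtfx:.2f}")
print(f"WER reported {wer['reported']}, recomputed {computed_wer:.2f}")
```

The recomputed values agree with the reported ones to two decimal places.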

BENCHMARK FEATURES

Transcribes a corpus of audio using multiple threads and an NVIDIA GPU

Create corpus from a directory of wav files or use a provided corpus

Scores transcriptions when gold text is present

Comes with two English models (LibriSpeech & ASpIRE)

Highly tunable through various parameters

CPU_THREADS, GPU_THREADS, NUM_PROCESSES, SEGMENT_SIZE, ITERATIONS, etc

https://devblogs.nvidia.com/gpu-accelerated-speech-to-text-with-kaldi-a-tutorial-on-getting-started/

NVIDIA TECHNICAL CONTRIBUTORS

*Justin Luitjens

Senior Developer Technology Engineer

*Ryan Leary

Senior Applied Research Scientist

Hugo Braun

Senior AI Developer Technology Engineer

*Levi Barnes

Developer Technology Engineer

*David Taubenheim

Senior Solutions Architect

*Adam Thompson

Senior Solutions Architect

*Attending GTC-DC 2019. Come ask questions and tell us how we can help solve your mission!

Nigel Cannings, CEO

INTELLIGENT VOICE LTD

@intelligentvox

www.intelligentvoice.com

Some of the world’s best speech solutions are driven by IV

+130 more..

Intelligent Voice Limited is a global leader in the development of proactive compliance and eDiscovery technology solutions for voice, video and other media. Its clients include government agencies, banks, securities firms, Call-Centers, litigation support providers, international consultancy, advisory businesses and insurers, all involved in the management of risk and meeting of multi-jurisdictional regulation.

A Brief History of GPU Accelerated Voice

May 2 2007: Nigel reads about “A Supercomputer on your Desk”

May 3 2007: Nigel’s wife: “We haven’t got a spare 60k”

Aug 1 2013: The UK Government gives Nigel a Grant to GPU Accelerate ASR

Early 2014: 27 CUDA programmers tell Nigel it is Impossible

Jun 11 2014: One man says “Alright, I’ll give that a try”

Mar 17 2015: Nigel Releases GPU Powered ASR at GTC 2015 Running at 31x Realtime on a K80!

Nov 6 2019: Nigel and NVIDIA Show the same process running 1000X Realtime on a V100

Intelligent Voice – Model Performance

Real world speed:
- x250 on a T4
- x1000 on a V100 32GB

CPU vs GPU accuracy virtually identical

[Figure: bar chart of RTFx (axis 0 to 1800) per model, comparing 1x V100 16GB SXM with 2x Intel E5-2698. Data labels = speedup. Beam=15, Lattice Beam=2.5]

Data labels (speedup), in extraction order: 13.6x, 14.5x, 13.7x, 15.6x, 15.2x, 15.4x, 12.1x, 17.1x, 12.8x, 15.8x, 11.1x, 12.2x, 18.0x, 14.4x, 13.5x, 15.4x, 12.7x, 14.4x, 14.3x, 13.3x, 15.0x, 9.3x, 14.1x

Nobody just wants a transcript..

It is in theory possible to extrapolate the whole of creation—every Galaxy, every sun, every planet, their orbits, their composition, and their economic and social history from, say, one small piece of fairy cake.

Douglas Adams

The Voice Suite

High Speed ASR: Lightning fast speech-to-text
Live Call Monitoring: Catch anomalies in real-time
IVNOTE + Smart Transcript: Search what’s said
Model Building: Accelerates ‘learning’ and accuracy
API-based Integration: Let our features enhance yours
Onsite or in-cloud: Choose where your data lives
Biometric Search: Voice ID
Hyperphonic Search: Sounds & phrases searched, instantly

INDEX: Intelligent Voice indexes key words and phrases from your telephone calls
SPEECH TO TEXT: This allows you to search for telephone calls as if they were text.
ANALYSE: Add-on modules give you the power to analyze calls and track behavior.
STORE: You have full control of your data - Securely encrypted on your cloud platform of choice.

Emotional Intelligence: Behavioural analysis of human speech
PCI Redaction: Automatically remove Payment Card information from audio recordings.
Edge Processing: On device speech recognition, with FULL vocabulary
Live conference Transcription: Instant and accurate.
Encrypted Search (PATENTED): Search sound, keeping the words hidden
Automated Language Detection

Legend: FEATURES: CURRENTLY AVAILABLE / IN DEVELOPMENT

What can we do with it - use cases

Communication Surveillance (Financial Institutions)
- Compliance monitoring
- Surveillance – Voice, Chat, Email, Web conference

Live and post Call monitoring (Call Centre, Law Enforcement)
- Key word and Phrase spotting and alerting
- Biometric authorisation
- Quality Assurance reporting
- PCI data identification and redaction

E-Discovery (Legal Service Providers, Forensic analysts, Regulators)
- Search & Review of large audio and text data sets
- Biometric Search, persons of Interest

Fraud Detection (Insurance Claims)

Credibility Analysis (Earnings Calls)
- Behavioural analysis of Voice communication

Intelligent Voice Differentiators

GPU Acceleration Model Training

Language Detection

Lattice/Alternate Search

Integrations

Optimised Pipeline

Audio Pre-Filtering

Custom VAD

Low Confidence Region Boosting

Number Optimisation

Dynamic Lattice Boost

Available Languages

English – UK

English – US

English – SA

English – AUS

English – Global

Spanish – MEX

Spanish – EU

Catalan

German – DE

German – Swiss

Portuguese – BR

Portuguese – EU

Dutch

Norwegian

Danish

Japanese

French

Russian

Korean

Mandarin

Tagalog

Cantonese

Italian

Canadian French

Coming Soon:

Arabic

SmartTranscript™

Trace Alert Terms

Lattice Matching

Redact text and audio direct in your review platform using simple word highlighting

Karaoke

Automated Topics

Speaker Separated Transcripts

See Word Alternatives

“Vox in a Box”

Pre-Configured Speech Server

Transcription in 20+ Languages and Dialects

Fully Trainable

REST-based API

Highly Optimised for Speed

EDGE or Data Centre

10-50,000 hours per day

GPU powered

"In an infinite universe, the one thing sentient life cannot afford to have is a sense of proportion."

Douglas Adams
