
Windows speech platform and Cortana extension introduction · 2015-08-26

Date post: 26-Jul-2020

MICROSOFT CONFIDENTIAL – for discussion purposes only. © 2015 Microsoft Corporation. All rights reserved.


Page 5:

Effects (APO/DSP vendor)


Page 9:

• Provide value-added features, e.g. AEC, AGC

• Implemented as COM objects that run in user mode

• Proxy APO for hardware DSP; Windows provides a default proxy APO (MsApoFxProxy.dll)

• Three different locations for an APO:

o Stream Effect (SFX): an instance of the effect for every stream

o Mode Effect (MFX): applied to all streams that are mapped to the same mode

o Endpoint Effect (EFX): applied to all streams that use the same endpoint; always applied, even to RAW


Page 11:

Expose all audio effects, including beamforming, noise suppression, and echo cancellation, via the FX_Stream_CLSID, FX_Mode_CLSID, and FX_Endpoint_CLSID APOs


Page 12:

• Describes the number, position, type, and angle of the microphones, and so on

• The audio driver reports it to Windows through KSPROPERTY_AUDIO_MIC_ARRAY_GEOMETRY

• Very important for the Windows speech platform enhancement pipeline

• Descriptor
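
As a rough illustration of the kind of information such a descriptor carries (number, position, type, and angle of the microphones), the sketch below models a two-element array in Python. The class names, field names, and units (millimeters, degrees) are hypothetical and chosen only for this example; they do not reproduce the actual structure returned for KSPROPERTY_AUDIO_MIC_ARRAY_GEOMETRY.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MicElement:
    """One microphone in the array (hypothetical fields, assumed units)."""
    mic_type: str                        # e.g. "omnidirectional"
    x_mm: int                            # position relative to the array origin
    y_mm: int
    z_mm: int
    horizontal_angle_deg: float = 0.0    # direction of the main response axis
    vertical_angle_deg: float = 0.0


@dataclass
class MicArrayGeometry:
    """Illustrative descriptor: array type plus one entry per microphone."""
    array_type: str                      # e.g. "linear", "L-shaped"
    mics: List[MicElement] = field(default_factory=list)

    @property
    def number_of_microphones(self) -> int:
        return len(self.mics)

    def aperture_mm(self) -> float:
        """Largest distance between any two microphones, in millimeters."""
        best = 0.0
        for i, a in enumerate(self.mics):
            for b in self.mics[i + 1:]:
                d = ((a.x_mm - b.x_mm) ** 2
                     + (a.y_mm - b.y_mm) ** 2
                     + (a.z_mm - b.z_mm) ** 2) ** 0.5
                best = max(best, d)
        return best


# Example: two omnidirectional microphones 100 mm apart along the y axis.
geometry = MicArrayGeometry(
    array_type="linear",
    mics=[
        MicElement("omnidirectional", 0, -50, 0),
        MicElement("omnidirectional", 0, 50, 0),
    ],
)
```

A single microphone would simply be a one-entry list, in which case the aperture is zero.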


Page 14:

• Speech mode specifies:

The application expects speech-recognition-specific signal processing at the lowest latency

The hardware-preferred sample rate for wideband speech (such as 16 kHz)

• Speech mode support is required when using the OEM pipeline

#define STATIC_AUDIO_SIGNALPROCESSINGMODE_SPEECH \
    0xfc1cfc9b, 0xb9d6, 0x4cfa, 0xb5, 0xe0, 0x4b, 0xb2, 0x16, 0x68, 0x78, 0xb2
DEFINE_GUIDSTRUCT("FC1CFC9B-B9D6-4CFA-B5E0-4BB2166878B2", AUDIO_SIGNALPROCESSINGMODE_SPEECH);
#define AUDIO_SIGNALPROCESSINGMODE_SPEECH DEFINE_GUIDNAMED(AUDIO_SIGNALPROCESSINGMODE_SPEECH)
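
The definition above states the same GUID twice, once as a field list and once as a string. A minimal Python sketch, using only the standard `uuid` module, can confirm that the two forms agree. The helper name `guid_struct_fields` is hypothetical; the values are copied from the definition above.

```python
import uuid

SPEECH_MODE_GUID = "FC1CFC9B-B9D6-4CFA-B5E0-4BB2166878B2"


def guid_struct_fields(guid_text: str):
    """Split a GUID string into (Data1, Data2, Data3, Data4[8])."""
    u = uuid.UUID(guid_text)
    data1, data2, data3 = u.fields[0], u.fields[1], u.fields[2]
    data4 = list(u.bytes[8:])   # the last eight bytes, in string order
    return data1, data2, data3, data4


# Field list copied from STATIC_AUDIO_SIGNALPROCESSINGMODE_SPEECH.
STATIC_FIELDS = (
    0xFC1CFC9B, 0xB9D6, 0x4CFA,
    [0xB5, 0xE0, 0x4B, 0xB2, 0x16, 0x68, 0x78, 0xB2],
)

fields_match = guid_struct_fields(SPEECH_MODE_GUID) == STATIC_FIELDS
```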


Page 15:

• Mic gain is key to the Cortana experience

• The default mic gain is the OEM-recommended mic gain for customers to use with Cortana

• HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech_OneCore\AudioInput\MicWiz\DefaultDefaultMicGain

• Set the registry key only for integrated mic arrays

• Set the registry key only if the device meets or exceeds the Standard metrics


Page 16:

USB

Terminal Type                Code    I/O  Description
Input Undefined              0x0200  I    Input Terminal, undefined Type.
Microphone                   0x0201  I    A generic microphone that does not fit under any of the other classifications.
Desktop microphone           0x0202  I    A microphone normally placed on the desktop or integrated into the monitor.
Personal microphone          0x0203  I    A head-mounted or clip-on microphone.
Omni-directional microphone  0x0204  I    A microphone designed to pick up voice from more than one speaker at relatively long ranges.
Microphone array             0x0205  I    An array of microphones designed for directional processing using host-based signal processing algorithms.
Processing microphone array  0x0206  I    An array of microphones with an embedded signal processor.
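
The terminal type codes in the table lend themselves to a simple lookup. The following Python sketch maps each code to its name and flags the two array types; the function names are hypothetical, and the codes and names are taken from the table above.

```python
# Input terminal type codes and names, as listed in the table above.
USB_INPUT_TERMINAL_TYPES = {
    0x0200: "Input Undefined",
    0x0201: "Microphone",
    0x0202: "Desktop microphone",
    0x0203: "Personal microphone",
    0x0204: "Omni-directional microphone",
    0x0205: "Microphone array",
    0x0206: "Processing microphone array",
}

# Codes that describe an array rather than a single microphone.
MIC_ARRAY_TYPES = {0x0205, 0x0206}


def describe_terminal(code: int) -> str:
    """Return the terminal type name, or a placeholder for unknown codes."""
    return USB_INPUT_TERMINAL_TYPES.get(code, f"Unknown (0x{code:04X})")


def is_microphone_array(code: int) -> bool:
    """True for the two array terminal types (0x0205 and 0x0206)."""
    return code in MIC_ARRAY_TYPES
```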


Page 18:

[Flowchart: speech pipeline selection. Decision nodes: Is Mic an array; Is Mic geometry exposed; Is Speech Mode supported; Are AEC and NS exposed; Is Raw Mode Supported. Outcomes: Run OEM pipeline in speech mode; Run MS pipeline in raw mode; Run MS pipeline in default mode. Note: a single microphone does not require a microphone geometry.]


Page 19:

• Driver Configuration Verification Tool

• OEMVerificationWin10x86.exe

• Recorder and Sound files

• Score Utility

• OEMScoreUtilityx64.exe


Page 21:

• Good acoustic design is a function of many parameters other than just microphone design, and is highly dependent on the device integration and usage

[Diagram labels: OEM; Mic EQ, Gain; Microsoft speech pipeline; Multi-channel Echo Canceling; Noise Suppression; Beamforming; Automatic Gain Control; Speech Recognizer; Acoustic Models; Voice Activation; Cortana]


Page 22:

• Microsoft recommends two or more microphones

• Benefits:

Sound source localization.

Reduction of ambient noise.

Partial de-reverberation, because most indirect paths are attenuated.

Reduction of the effects of electronic noise.

Target Characteristics (for reference)

Microphone array                 Elements  Type             NG, dB  NGA, dB  DI, dB
Linear, small                    2         uni-directional  -12.7   -6.0     7.4
Linear, big                      2         uni-directional  -12.9   -6.7     7.1
Linear, 4 el                     4         uni-directional  -13.1   -7.6     10.1
L-shaped                         4         uni-directional  -12.9   -7.0     10.2
Linear, 4 el, second geometry    4         integrated       -12.9   -7.3     9.9
Good omnidirectional microphone  1         integrated       0       0        4.5


Page 23:

• Covers a quiet office or cubicle with good sound capture

• The speaker is less than 0.6 meters from the microphone

[Figures: small two-element microphone array; big two-element microphone array]


Page 24:

[Figures: L-shaped four-element microphone array; linear four-element microphone array]

• Covers a quiet office or cubicle with good sound capture

• The speaker is less than 2 meters from the microphone


Page 25:

Circle microphone array geometry


Page 26:

• Important for preserving the temporal relationship between the signals from the microphones

• Important for beamforming and the source localizer

Frequency (Hz)  Phase response matching
250             < 30 deg (< 20 deg is better)
1K              < 30 deg (< 20 deg is better)
4K              < 30 deg (< 20 deg is better)
7K              < 30 deg (< 25 deg is better)
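
The phase-matching limits above can be checked mechanically once a mismatch has been measured. The sketch below grades a measured mismatch against the table and also converts it to an equivalent time offset, which is what the "temporal relationship" bullet refers to. The function names and the three grade labels are assumptions made for this example; only the limits come from the table.

```python
# Frequency (Hz) -> (required limit, preferred limit), in degrees, from the table.
PHASE_LIMITS_DEG = {
    250: (30.0, 20.0),
    1000: (30.0, 20.0),
    4000: (30.0, 20.0),
    7000: (30.0, 25.0),
}


def grade_phase_mismatch(freq_hz: int, mismatch_deg: float) -> str:
    """Return 'preferred', 'pass', or 'fail' for a measured phase mismatch."""
    required, preferred = PHASE_LIMITS_DEG[freq_hz]
    magnitude = abs(mismatch_deg)
    if magnitude < preferred:
        return "preferred"
    if magnitude < required:
        return "pass"
    return "fail"


def mismatch_as_delay_us(freq_hz: float, mismatch_deg: float) -> float:
    """Convert a phase mismatch at one frequency to a time offset in microseconds."""
    return abs(mismatch_deg) / 360.0 / freq_hz * 1e6
```

For instance, a 22 degree mismatch grades as "preferred" at 7 kHz but only "pass" at 1 kHz, because the preferred limit is looser at the top of the band.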


Page 27:

Frequency (Hz)  THD
250             < 3.2% (2.5% is better)
1K              < 3.2% (2.5% is better)
4K              < 3.2% (2.5% is better)
5K              < 4%
6K              < 6.3%
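
To relate a measurement to these limits, THD first has to be computed from the harmonic levels. The slide does not say which THD definition is used; the sketch below assumes the common one, the ratio of the RMS sum of the harmonics to the fundamental, and then compares the result with the table. The function names are hypothetical.

```python
import math

# Frequency (Hz) -> maximum THD in percent, from the table above.
THD_LIMITS_PERCENT = {250: 3.2, 1000: 3.2, 4000: 3.2, 5000: 4.0, 6000: 6.3}


def thd_percent(fundamental_rms: float, harmonic_rms: list) -> float:
    """THD relative to the fundamental, in percent (assumed definition)."""
    harmonic_power = sum(h * h for h in harmonic_rms)
    return 100.0 * math.sqrt(harmonic_power) / fundamental_rms


def meets_thd_limit(freq_hz: int, measured_percent: float) -> bool:
    """True when the measured THD is below the limit for that frequency."""
    return measured_percent < THD_LIMITS_PERCENT[freq_hz]
```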


Page 28:

Requirement: Mic array far away from speaker


Page 31:

• SNR

Better than 61 dB; 63 dB is better

• Mic sensitivity

Within +/-3 dB; +/-1 dB is better

• No signal leak

The acoustic path from the port hole to the microphone should be physically sealed, with no leak


Page 32:

Component                 Requirement
Sampling rate             16,000 Hz, synchronized for all ADCs
Sampling synchronization  Better than 1/64th of the sampling period
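
The synchronization requirement is easier to apply as an absolute time. At 16,000 Hz the sampling period is 62.5 microseconds, so 1/64th of it is just under 1 microsecond. A minimal Python sketch of that arithmetic follows; the function names are hypothetical.

```python
SAMPLE_RATE_HZ = 16_000
SYNC_FRACTION = 64   # skew must be better than 1/64th of the sampling period


def max_sync_skew_us(sample_rate_hz: int = SAMPLE_RATE_HZ,
                     fraction: int = SYNC_FRACTION) -> float:
    """Largest allowed inter-ADC skew, in microseconds."""
    period_us = 1e6 / sample_rate_hz     # 62.5 us at 16 kHz
    return period_us / fraction


def is_synchronized(skew_us: float) -> bool:
    """True when the measured skew between ADCs is within the requirement."""
    return abs(skew_us) < max_sync_skew_us()
```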


Page 33:

Document: Portclass Miniport Driver Development Guide
Detail: Describes how to develop audio drivers for PCI and DMA-based audio devices, and how to write a miniport PortClass audio driver.
Reference link: https://msdn.microsoft.com/EN-US/library/windows/hardware/ff536829(v=vs.85).aspx

Document: Microphone Array Support in Windows
Detail: The microphone array design guide for Windows.
Reference link: https://msdn.microsoft.com/en-us/library/windows/hardware/dn613960.aspx

Document: Windows Audio Driver Development Guide
Detail: Windows has several kinds of audio drivers. This link describes how to choose which kind of audio driver to develop for a device.
Reference link: https://msdn.microsoft.com/en-us/library/windows/hardware/ff537861(v=vs.85).aspx

Document: Enabling Great Audio Experiences in Windows 10
Detail: Windows 10 audio framework.
Reference link: https://channel9.msdn.com/Events/WinHEC/2015/WHT202


Page 34:

(c) 2015 Microsoft Corporation. All rights reserved. This document is provided "as-is." Information and views expressed in this document, including URL and other Internet Web site references, may change without notice. You bear the risk of using it. This document does not provide you with any legal rights to any intellectual property in any Microsoft product. You may copy and use this document for your internal, reference purposes.

Some information relates to pre-released product which may be substantially modified before it’s commercially released. Microsoft makes no warranties, express or implied, with respect to the information provided here.

