
Towards a Universal Quality Scale for Narrowband, Wideband ... · Problem: Quality judgment depends...

Date post: 17-Jul-2020
Page 1: Towards a Universal Quality Scale for Narrowband, Wideband ... · Problem: Quality judgment depends on the test context, i.e. the conditions included in the test corpus → “corpus

Life is for sharing.

Towards a Universal Quality Scale for Narrowband, Wideband and Fullband Speech Services Sebastian Möller1, Jens Berger2

1 Quality and Usability Lab, Telekom Innovation Laboratories, TU Berlin, Germany
2 SwissQual AG – A Rohde & Schwarz Company, Solothurn, Switzerland

Page 2

Agenda

Motivation

Influence Factors: Modelling a telephony situation, Bandwidth

Establishment of the Universal Scale: Integration of different types of degradations, Scale requirements, Proposed procedure

Conclusions and Next Steps

Page 3

Motivation: Today’s situation.

Problem statement: Many different subjective experiments use a single scale from 1 to 5, but the interpretation of the score depends strongly on the experimental context:

Listening-only vs. conversation

Bandwidth limitation (e.g. only one bandwidth in the test, or different ones)

Length of the stimuli (short sentences vs. long passages or emulated calls)

In practice, two questions are mainly discussed:

1) What is the relation between the score for a typical sentence and the quality of a longer call? (measurement episode, conversational mode)

2) What is the relation between a narrowband score and a super-wideband score? (bandwidth)

Idea: Bandwidth- and situation-independent “universal” scale

Page 5

Influence Factors: Modeling a telephony situation.

Test types, ordered from real human conversation to pure listening:

Free conversation: free conversation between two persons

Controlled conversation: scripted dialog between two persons

Emulated conversation: listening to pre-recorded samples; own speech activity is emulated by keyword spotting

3rd party listening test: listening to a pre-recorded conversation; no own activity

Listening-only test: listening to pre-recorded short speech samples; no own activity

Related ITU-T Recommendations: P.805 and P.800 (conversation tests), P.1302 (emulated conversation tests), P.800 (listening-only tests)

Page 6

Influence Factors: Bandwidth.

[Figure: bandwidth-related quality results, including the perceptual dimension “Noisiness”] (Wältermann et al., JAES 2010)

Page 7

The result is a Mean Opinion Score (MOS) representing overall listening quality (e.g. ITU-T P.800)

This integral score reflects all degradations perceived by the users, including individual preferences and cross-masking effects

Result: one score per presented speech sample, regardless of its length and bandwidth; only the listening mode is addressed

[Figure: five-point ACR listening-quality scale, excellent (5), good (4), fair (3), poor (2), bad (1), shown as labelled categories and as a scale from 5 to 1]

Influence Factors: Example of a test according to ITU-T P.800.
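In such a test, the MOS of a condition is the arithmetic mean of the ratings collected for it. A minimal Python sketch, assuming the ratings arrive as ACR category labels; the function name and the vote list are made up for illustration:

```python
from statistics import mean, stdev

# Five-point ACR listening-quality scale (ITU-T P.800): category label -> score
ACR_SCALE = {"excellent": 5, "good": 4, "fair": 3, "poor": 2, "bad": 1}


def mos_with_spread(votes: list[str]) -> tuple[float, float]:
    """Return (MOS, sample standard deviation) for one condition's ACR votes."""
    scores = [ACR_SCALE[v.lower()] for v in votes]
    if len(scores) < 2:
        raise ValueError("need at least two votes to estimate the spread")
    return mean(scores), stdev(scores)


# Hypothetical votes of six listeners for a single condition.
votes = ["good", "fair", "good", "excellent", "fair", "good"]
mos, sd = mos_with_spread(votes)
print(f"MOS = {mos:.2f}, SD = {sd:.2f}")  # prints: MOS = 3.83, SD = 0.75
```

The standard deviation is reported alongside the mean because the MOS alone hides how much listeners disagreed.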

Page 9


E-model approach:

Establishment of the Universal Scale: Integration of different types of degradations.

[Figure: E-model reference connection across an IP WAN, annotated with the degradations it covers: background noise and acoustic coupling at both ends, linear distortion, delay, codec, packet loss, jitter buffer, VAD, talker echo, listener echo, circuit noise]

Page 10


E-model approach:

Establishment of the Universal Scale: Integration of different types of degradations.

[Figure: E-model reference connection across an IP WAN]

Overall quality: R = Ro - Is - Id - Ie,eff

Estimated user judgment: MOS = f(R)

Impairment types: SNR (Ro), simultaneous (Is), delayed (Id), nonlinear/time-variant (Ie,eff)

Input parameters: Ps, Ds, STMR, SLR, RLR, Ta, Ie, qdu, Ppl, Bpl, TELR, T, WEPL, Tr, Nc, Nfor, Pr, Dr, LSTR
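The additive structure of the E-model can be sketched in a few lines of Python. The Ie,eff formula follows ITU-T G.107; the function names and numeric inputs are illustrative assumptions rather than values from these slides, and the advantage factor A that G.107 adds to R is omitted to match the formula above.

```python
def effective_equipment_impairment(ie: float, ppl: float, bpl: float,
                                   burst_r: float = 1.0) -> float:
    """Ie,eff = Ie + (95 - Ie) * Ppl / (Ppl / BurstR + Bpl), as in ITU-T G.107.

    ie:      equipment impairment factor of the codec
    ppl:     packet-loss probability in percent
    bpl:     packet-loss robustness factor of the codec
    burst_r: burst ratio (1.0 means random loss)
    """
    return ie + (95.0 - ie) * ppl / (ppl / burst_r + bpl)


def transmission_rating(ro: float, i_s: float, i_d: float, ie_eff: float) -> float:
    """R = Ro - Is - Id - Ie,eff (slide form, without the advantage factor A)."""
    return ro - i_s - i_d - ie_eff


# Illustrative inputs (assumptions, not taken from the slides).
ie_eff = effective_equipment_impairment(ie=0.0, ppl=2.0, bpl=10.0)
r = transmission_rating(ro=94.77, i_s=1.43, i_d=0.15, ie_eff=ie_eff)
print(f"Ie,eff = {ie_eff:.2f}, R = {r:.1f}")  # prints: Ie,eff = 15.83, R = 77.4
```

Keeping each impairment factor as a separate input mirrors the additivity assumption of the model: each degradation type lowers R independently.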

Page 11


E-model approach:

Problem: Quality judgment depends on the test context, i.e. the conditions included in the test corpus → “corpus effect”

Definition of an “absolute quality scale” (R-scale) that should be independent of the judgment context

Relationship between judgment scale and quality scale is then context-dependent

Establishment of the Universal Scale: Integration of different types of degradations.

[Figure: MOS as a function of R (R axis from 0 to 150, MOS axis from 1 to 4.5)]
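One concrete choice for MOS = f(R) is the narrowband mapping defined in ITU-T G.107, sketched below in Python. Whether the plotted curve uses exactly this mapping is not stated here, so treat it as an assumption:

```python
def r_to_mos(r: float) -> float:
    """Narrowband R-to-MOS mapping of ITU-T G.107 (MOS = f(R))."""
    if r < 0:
        return 1.0
    if r > 100:
        return 4.5
    # Cubic mapping for 0 <= R <= 100; it equals 1 at R = 0 and 4.5 at R = 100.
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


for r in (0, 50, 80, 93.2, 100, 120):
    print(f"R = {r:>5}: MOS = {r_to_mos(r):.2f}")
```

This mapping saturates at MOS 4.5 for R of 100 and above, so on its own it does not describe the part of the R axis beyond 100.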

Page 12


Telephony situation:

Scale should reflect conversational quality, measured e.g. according to ITU-T Rec. P.800 and P.805

Listening-only tests may be used if no “conversational impairments” are present; however, the ends of the scale may then be used more frequently than in conversation tests

Conversations may be approximated by presenting selected stretches of speech (4…8 s) in “emulated conversation tests” according to ITU-T Rec. P.1302

Listening-only tests according to ITU-T Rec. P.800 may be used for evaluating the single stretches of speech

Establishment of the Universal Scale: Scale requirements.

Page 13


Bandwidth:

The scale should correctly rank narrowband, wideband, super-wideband and fullband signals, and should also reflect a “per-call quality”

It must be possible to transform the results of individual (P.800) experiments with different bandwidth contexts to the universal scale

Establishment of the Universal Scale: Scale requirements.

Page 14


Bandwidth:

Conduct different tests according to ITU-T P-series Recommendations in any mode:

Listening-only tests

Conversation tests

Emulated conversation tests

Transform results onto the universal scale using fixed anchor conditions

Use the transmission rating scale rather than the MOS scale as a first guess

Establishment of the Universal Scale: Proposed procedure.
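If the G.107 narrowband mapping is accepted as a first guess for relating MOS and R (the slides do not name a specific mapping), MOS results can be moved onto the transmission rating scale by inverting that mapping numerically. A minimal Python sketch using bisection; the function names and MOS values are hypothetical:

```python
def r_to_mos(r: float) -> float:
    """Narrowband R-to-MOS mapping of ITU-T G.107."""
    if r < 0:
        return 1.0
    if r > 100:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


def mos_to_r(mos: float, tol: float = 1e-6) -> float:
    """Numerically invert r_to_mos by bisection.

    The G.107 curve dips slightly below MOS 1 for very small R and increases
    monotonically between roughly R = 3.3 and R = 100, so that interval
    brackets exactly one solution for any MOS in [1, 4.5].
    """
    if not 1.0 <= mos <= 4.5:
        raise ValueError("MOS must lie in [1, 4.5]")
    lo, hi = 3.3, 100.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if r_to_mos(mid) < mos:
            lo = mid  # root lies above mid
        else:
            hi = mid  # root lies at or below mid
    return 0.5 * (lo + hi)


# Hypothetical per-condition MOS values from one listening-only test.
for mos in (1.5, 3.0, 4.2):
    print(f"MOS {mos:.1f} -> R = {mos_to_r(mos):.1f}")
```

Bisection is used instead of a closed-form inverse because it only needs the forward mapping and an interval on which that mapping is monotonic.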

Page 15

Transformation relative to anchor conditions

Regardless of the original experimental context, a condition receives the same score on the universal scale

Establishment of the Universal Scale: Proposed procedure.
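The transformation relative to anchor conditions can be illustrated with a small Python sketch. Everything concrete here is an assumption: the anchor values on the universal scale, the raw MOS values, the hypothetical function names `fit_affine` and `to_universal`, and the choice of an affine (straight-line) least-squares fit per experiment. The slides only state that results are transformed relative to fixed anchor conditions.

```python
def fit_affine(x: list[float], y: list[float]) -> tuple[float, float]:
    """Least-squares fit y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def to_universal(anchor_scores: list[float], anchor_targets: list[float],
                 scores: dict[str, float]) -> dict[str, float]:
    """Map one experiment's raw scores onto the universal scale via its anchors."""
    slope, intercept = fit_affine(anchor_scores, anchor_targets)
    return {cond: slope * s + intercept for cond, s in scores.items()}


# Hypothetical universal-scale values of three fixed anchor conditions.
ANCHOR_TARGETS = [30.0, 60.0, 90.0]

# Two hypothetical experiments with different contexts: the same anchors and
# the same test condition "X" receive different raw MOS values in each.
narrowband_test = {"anchors": [2.2, 3.4, 4.6], "scores": {"X": 4.0}}
mixed_band_test = {"anchors": [2.1, 3.0, 3.9], "scores": {"X": 3.45}}

for name, test in (("narrowband", narrowband_test), ("mixed-band", mixed_band_test)):
    universal = to_universal(test["anchors"], ANCHOR_TARGETS, test["scores"])
    print(name, {c: round(v, 1) for c, v in universal.items()})
```

In this constructed example both experiments relate raw MOS to the universal scale by a straight line, so the affine fit assigns condition X the same universal score (75) in both contexts; real corpus effects need not be linear.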

Page 17


Conclusions:

Requirements for the new scale have been defined regarding:

length of the measurement episode

conversational mode

bandwidth

Establishment of the scale requires top-down and bottom-up considerations

Next steps:

Define anchor conditions

Conduct subjective tests

Transform results and adjust

Define transformation laws also for instrumental models

Conclusions and Next Steps: Universal quality scale.

Page 18

Thank you for your attention!

Visit www.qu.tu-berlin.de for more information.

