W3C TPAC 2020
Joint Meetings
WEBRTC WG
October 13, 15, 16, 2020
8 AM - 9 AM Pacific Time
Chairs: Bernard Aboba, Harald Alvestrand, Jan-Ivar Bruaroey
1
Welcome!
● Welcome to the joint meetings of the W3C WebRTC WG at TPAC 2020!
2
Agenda for Joint Meeting Week
https://www.w3.org/2011/04/webrtc/wiki/TPAC_2020
https://www.w3.org/2020/10/TPAC/group-schedule.html
Tuesday, October 13, 2020 (15:00 - 16:00 UTC, 8 AM - 9 AM Pacific)
Joint Meeting of APA and WEBRTC WG
Zoom link: https://us02web.zoom.us/j/88497697633?pwd=WmJxRGxzRmlRUFNFVml1TTg0K2dDZz09
Thursday, October 15, 2020 (15:00 - 16:00 UTC, 8 AM - 9 AM Pacific)
Joint Meeting of PING and WEBRTC WG
Zoom link: https://us02web.zoom.us/j/83080493198?pwd=MXlKY1Fidzd4MG5EU2ZNdlRHdks4Zz09
Friday, October 16, 2020 (15:00 - 16:00 UTC, 8 AM - 9 AM Pacific)
Joint Meeting of MEIG and WEBRTC WG
Zoom link: https://us02web.zoom.us/j/88027802654?pwd=UDBnTUJ5SzI4S1VXNFRjMnNJUkxjdz09
3
W3C WG IPR Policy
● This group abides by the W3C Patent Policy: https://www.w3.org/Consortium/Patent-Policy/
● Only people and companies listed at https://www.w3.org/2004/01/pp-impl/47318/status are allowed to make substantive contributions to the WebRTC specs
4
Joint Meeting
APA and WEBRTC WG
October 13, 2020
8 AM - 9:00 AM Pacific Time
5
W3C WG IPR Policy
● This group abides by the W3C Patent Policy: https://www.w3.org/Consortium/Patent-Policy/
● Only people and companies listed at https://www.w3.org/2004/01/pp-impl/47318/status are allowed to make substantive contributions to the WebRTC specs
6
About this Meeting
● Meeting info:
○ https://www.w3.org/2011/04/webrtc/wiki/TPAC_2020
● Link to slides has been published on the WG wiki
● Scribe? IRC: http://irc.w3.org/ Channel: #apa
● The meeting is being recorded.
7
Welcome!
● Welcome to the joint meeting of the W3C APA and WebRTC WGs at TPAC 2020!
● During this meeting, we will discuss issues relating to accessibility in real-time communications.
8
Agenda for Joint APA/WEBRTC WG Meeting
● WEBRTC WG Charter and Deliverables (Chairs + Dom, 10 minutes)
● W3C Machine Learning Workshop (Bernard + Dom, 5 minutes)
○ https://www.w3.org/2020/06/machine-learning-workshop/
● IETF accessibility initiatives (Bernard + Lorenzo, 15 mins)
○ Real-Time Text (RTT) and WebRTC Data Channel
○ Human Language Negotiation
○ WebRTC Interoperability Profile for the Video Relay Service
● RTC Accessibility User Requirements: https://w3c.github.io/apa/raur/ (Joshue, 30 minutes)
9
WebRTC WG Charter
● WEBRTC WG recently re-chartered through September 2022: https://w3c.github.io/webrtc-charter/webrtc-charter.html
● Out of scope:
○ The definition of the network protocols used to establish the connections between peers is out of scope for this group; in general, it is expected that protocol considerations will be handled in the IETF.
○ The definition of any new codecs for audio and video is out of scope.
● In scope:
○ API functions to explore device capabilities, e.g. camera, microphone, speakers,
○ API functions to capture media from local devices (e.g. camera and microphone, but also output devices such as a screen),
○ API functions for encoding and other processing of those media streams,
○ API functions for accessing the data in these media streams,
○ API functions for decoding and processing (including echo canceling, stream synchronization and a number of other functions) of those streams at the incoming end,
○ Delivery to the user of those media streams via local screens and audio output devices (partially covered with HTML5).
10
WebRTC WG Deliverables
● WebRTC 1.0 API: https://w3c.github.io/webrtc-pc/
● WebRTC-Stats: https://w3c.github.io/webrtc-stats/
● WebRTC-NV Use Cases: https://w3c.github.io/webrtc-nv-use-cases/
● WebRTC Extensions: https://w3c.github.io/webrtc-extensions/
● WebRTC-ICE: https://github.com/w3c/webrtc-ice
● WebRTC SVC: https://github.com/w3c/webrtc-svc
● Insertable Streams: https://github.com/w3c/webrtc-insertable-streams
● WebRTC Priority: https://w3c.github.io/webrtc-priority/
● WebRTC-DSCP: https://w3c.github.io/webrtc-dscp-exp/
11
WebRTC WG Deliverables (Capture)
● Media Capture Automation: https://w3c.github.io/mediacapture-automation/
● Media Capture and Streams:
● Media Capture Image: https://w3c.github.io/mediacapture-image/
● Audio Output:
● Screen Capture:
● Media Recording:
● Content-Hints: https://w3c.github.io/mst-content-hint/
12
Machine Learning and Accessibility
● Machine learning is increasingly used to address accessibility concerns. Examples:
○ Speech transcription
○ Language translation
○ Image recognition
○ Image to text or speech
○ Emotion analysis (from audio or video)
● W3C Machine Learning Workshop: https://www.w3.org/2020/06/machine-learning-workshop/
● There is work underway to enable machine learning algorithms to access raw media in an efficient way, including:
○ VideoTrackReader API (WebCodecs)
○ Insertable Streams proposal for raw media (WEBRTC WG)
13
Accessibility Work within the IETF ART Area
● T.140 over WebRTC Data Channel (MMUSIC): draft-holmberg-mmusic-t140-usage-data-channel
○ Enables Real-Time Text (RTT) to be sent and received over the WebRTC data channel using the WebRTC 1.0 API.
○ Compatible with RFC 8373 language negotiation.
● Language negotiation (SLIM): RFC 8373, draft-ietf-slim-use-cases
○ Enables SDP negotiation of spoken, written and signed languages between parties.
○ Supports audio (spoken languages), video (signed and captioned languages), and text (written languages).
● Interoperability profile of the Video Relay Service (RUM): https://tools.ietf.org/html/draft-ietf-rum-rue
○ Interoperability Profile for Relay User Equipment, referencing RTCWEB documents, including JSEP, Overview, RTP Usage, Security Architecture, Transports, RFC 7742 (video requirements) and RFC 7874 (audio requirements).
○ Open source implementation available.
○ History & background: https://datatracker.ietf.org/meeting/105/materials/slides-105-rum-rum-history-background-00
T.140/WebRTC Gateway Specification
draft-ietf-mmusic-t140-usage-data-channel
● Enables Real-Time Text (RFC 4103) to be sent over the WebRTC data channel.
● Uses reliable, ordered transport.
● Compatible with existing implementations of the W3C WebRTC API.
● Compatible with negotiation of human language (RFC 8373).
● Requires a gateway between WebRTC data channel and RTT endpoints.
● Implementation in Janus (Lorenzo Miniero):
○ PR: https://github.com/meetecho/janus-gateway/pull/1898
○ Article: https://www.meetecho.com/blog/realtime-text-sip-and-webrtc/
15
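The channel setup the draft describes can be sketched in a few lines. This is a hedged illustration: `t140ChannelInit` is a hypothetical helper, the `"t140"` subprotocol string reflects the draft's dcmap negotiation as summarized above, and the RTCPeerConnection wiring is shown only in comments.

```javascript
// Hypothetical helper: build the RTCDataChannelInit the draft calls for.
// T.140 real-time text rides a reliable, ordered data channel whose
// negotiated subprotocol is "t140". A data channel is reliable by default
// (no maxRetransmits/maxPacketLifeTime set), so only these fields matter.
function t140ChannelInit() {
  return {
    ordered: true,     // T.140 requires in-order delivery
    protocol: "t140",  // subprotocol carried in the dcmap line
  };
}

// Illustrative browser usage (not runnable outside a browser):
// const pc = new RTCPeerConnection();
// const rtt = pc.createDataChannel("t140", t140ChannelInit());
// rtt.send("h"); rtt.send("i"); // RTT sends text as it is typed
```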
T.140/WebRTC Gateway Implementation Experience
● Integrated in the Janus SIP plugin as an experimental feature
○ Negotiates m=text on the SIP side
○ Negotiates m=application on the WebRTC side
● Both plain T.140 and T.140 over RED supported on the SIP side
○ In both cases, always T.140 on data channels
● Initial integration for T.140 data channel negotiation properties
○ dcmap line offered with hardcoded values
■ No support for any dcsa attributes, though
○ Subprotocol and label used for delivery on data channels
● Patch includes simple integration in the demo UI as well
○ Basic UI to send local and display remote real-time text
16
T.140/WebRTC Gateway Implementation Experience
● A few limitations in the current effort
○ Only tested with TIPcon1 (very old open source Java application)
○ dcmap values are currently hardcoded, and ID is ignored
■ Note: may be hard to enforce in browsers in general?
○ No buffering currently performed on the send side
■ Neither in the Janus plugin (when receiving data from DC)
■ … nor in the browser/UI (when sending data on DC)
○ Behaviour in presence of packet loss not tested properly yet
■ RTCP support still missing for the SIP/RTT SSRC
● Ready for experimentation, though!
○ Would love to see this effort move forward
17
Negotiating Human Language (RFC 8373)
● Supports negotiation of the language used to send and receive for each media component.
○ Enables user language preferences to be described in the Session Description Protocol (SDP).
○ “Language preferences” apply to all media, covering signed (e.g. ASE), spoken and written languages (RTT).
○ Language negotiation handled via signaling outside the WebRTC API.
■ SDP language negotiation attributes are not passed in JSEP.
● Examples:
○ Negotiation of sending and/or receiving American Sign Language within a video stream.
○ An Offer indicating a preference to write Spanish text and receive Spanish-language audio, with English as a second choice for both modalities.
18
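The Offer example above can be made concrete with a small sketch. `hlangAttributes` is a hypothetical helper; the `hlang-send`/`hlang-recv` attribute names and the space-separated, preference-ordered language tags come from RFC 8373.

```javascript
// Build RFC 8373 media-level SDP attributes from language preferences.
// Each attribute value is a space-separated list of language tags in
// preference order.
function hlangAttributes({ send = [], recv = [] }) {
  const lines = [];
  if (send.length) lines.push(`a=hlang-send:${send.join(" ")}`);
  if (recv.length) lines.push(`a=hlang-recv:${recv.join(" ")}`);
  return lines;
}

// The slide's example: prefer writing Spanish text, English second:
// hlangAttributes({ send: ["es", "en"] }) → ["a=hlang-send:es en"]
```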
Operational Model for RFC 8373
● Language preferences negotiated for each media stream can be used to:
○ Route the call (e.g. to a Spanish speaker or ASL interpreter).
○ Configure machine learning gateway services. Examples:
■ Text to speech or speech to text
■ Translation from one language to another
● Media usage is up to the participants. Example:
○ Participants negotiate sending/receiving ASE on video, sending/receiving English via text, as well as receiving English audio.
○ Whether the participants sign, text or speak is up to them (and can change dynamically). Example:
■ Participants can try signing, encounter video quality issues, then agree to switch to text.
19
RTC Accessibility User Requirements (RAUR)
Updated draft overview - Oct 2020
There is now an updated FPWD available:
http://raw.githack.com/w3c/apa/AccessibleRTC/raur/index.html
20
New User Needs: Changes to RAUR
# Window anchoring and pinning
User Need 1: A deaf or hard of hearing user needs to anchor or pin certain windows in an RTC application so both a sign language interpreter and the person speaking (whose speech is being interpreted) are simultaneously visible.
21
New User Needs: Changes to RAUR
# Window anchoring and pinning
REQ 1a: Provide the ability to anchor or pin specific windows so the user can associate the sign language interpreter with the correct speaker.
REQ 1b: Allow the use of flexible pinning of captions or other related content alternatives. This may be to second-screen devices.
REQ 1c: Ensure the source of any captions, transcriptions or other alternatives is clear to the user, even when second-screen devices are used.
22
New User Needs: Changes to RAUR
# Pause capture of 'on record' captioning in RTC
User Need 2: A deaf or hard of hearing user may need captioning of content in a meeting or presentation to be private.
23
New User Needs: Changes to RAUR
# Pause capture of 'on record' captioning in RTC
REQ 2a: Ensure there is a host-operable toggle in the captioning service (whether human or automated) that facilitates going on and off record for the preserved transcript, but continues to provide captions meanwhile for 'off record' conversations.
REQ 2b: Ensure the toggle between saving recordings also applies to the saving of captions. There should be a mechanism by which both audio and captions can be paused or stopped, and both can be simultaneously restored for recording.
24
New User Needs: Changes to RAUR
# Accessibility user preferences and profiles
User Need 3: A user may need to change device or environment and have their accessibility user preferences preserved.
25
New User Needs: Changes to RAUR
# Accessibility user preferences and profiles
REQ 3a: Ensure user profiles and accessibility preferences in RTC applications are mobile and can move with the user as they change device or environment.
26
New Requirements: # Emergency calls and RTT
REQ 11b: Avoid the problem of unsent emergency messages. A user may not be aware when they have not successfully sent an emergency message. For example, RTT avoids this problem due to instantaneous data transfer, but this may be an issue for other messaging platforms.
REQ 12b: Provide support for other languages and translations. For example, VRS calls may be made between ASL (American Sign Language) users and hearing persons speaking either English or Spanish, or involve variations in signing itself, such as Irish Sign Language (ISL, more closely related to French Sign Language) and British Sign Language (BSL), and a user may need to stream both or pin one.
27
Other changes
# Note on the relationship between RTC and XR Accessibility User Needs.
# Note on work on personalisation semantics and CSS media queries.
# Moved User Need 19 (A deaf user watching a signed broadcast needs a high-quality frame rate to maintain legibility and clarity in order to understand what is being signed) to the 'Quality of service issues' section.
# Added note on ITU definition of Total Conversation services.
REQ 10a: Ensure support for multiple simultaneous streams.
28
Conclusions
● Review and feedback requested from the WebRTC WG to APA.
29
Thank you
Special thanks to:
WG Participants, Editors & Chairs
30
Joint Meeting
PING and WEBRTC WG
October 15, 2020
8 AM - 9:00 AM Pacific Time
31
W3C WG IPR Policy
● This group abides by the W3C Patent Policy: https://www.w3.org/Consortium/Patent-Policy/
● Only people and companies listed at https://www.w3.org/2004/01/pp-impl/47318/status are allowed to make substantive contributions to the WebRTC specs
32
About this Meeting
● Meeting info:
○ https://www.w3.org/2011/04/webrtc/wiki/TPAC_2020
● Link to slides has been published on the WG wiki
● Scribe? IRC: http://irc.w3.org/ Channel: #webrtc
● Do we want to record this meeting?
33
Welcome!
● Welcome to the joint meeting of the W3C PING and WebRTC WGs at TPAC 2020!
● During these meetings, we hope to make progress on WebRTC privacy issues.
34
Agenda for Joint PING/WEBRTC WG Meeting
1. State of privacy in Media Capture and Streams (Jan-Ivar)
2. Open privacy issues (Jan-Ivar)
3. State of privacy in Audio Output Devices API (Jan-Ivar)
4. Media Capture Extensions: In-Browser Cam/Mic Picker (Jan-Ivar)
5. State of privacy in WebRTC-PC / Stats / SVC (Jan-Ivar)
35
State of privacy in Media Capture and Streams
Thanks to PING for reviewing these APIs!

await navigator.mediaDevices.enumerateDevices() // device enumeration
await navigator.mediaDevices.getUserMedia()     // camera/mic access
navigator.mediaDevices.ondevicechange = func    // detect device add/rem

12 issues were filed (4 open, 8 closed). 7 PRs were merged from the review.
An overview of the state of these APIs before diving into open issues.
36
State of device enumeration

await navigator.mediaDevices.enumerateDevices()

Drive-by web in iframe (without allow="camera" or allow="microphone"):
[]

Drive-by web with focus: 2 bits - whether user has 0 cameras or 0 microphones*
[{kind: "videoinput"}, {kind: "audioinput"}] // all other members are "". length is max 2.

A site with persisted permission to use camera or microphone: same (2 bits)*
[{kind: "videoinput"}, {kind: "audioinput"}] // all other members are "". length is max 2.

Camera and/or microphone captured in the current document: full list w/deviceIds & labels
[{kind: "videoinput", deviceId: [origin-specific id], label: "FaceTime HD Camera", groupId: [rotated id]},
 {kind: "audioinput", deviceId: [origin-specific id], label: "MacBook Pro Microphone", groupId: [rotated id]},
 ...more] // lets site implement device selection

* Implemented in Safari. In development in Chrome & Firefox. Principle: “Trackers fear device light”.
We’ve deprecated the enumerate-first web strategy in favor of device-first. Some breakage expected.
https://w3c.github.io/mediacapture-main/#access-control-model
37
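The exposure levels above can be summarized as a tiny classifier over an enumerateDevices() result. `exposureLevel` is a hypothetical helper for reasoning about what a page observes, not part of any spec.

```javascript
// Classify what an enumerateDevices() result reveals, mirroring the
// exposure levels on this slide. Labels are "" until the document is
// capturing (or has just captured).
function exposureLevel(devices) {
  if (devices.length === 0) return "none";                  // drive-by iframe
  if (devices.every(d => !d.label)) return "presence-only"; // the "2 bits"
  return "full";                                            // deviceIds & labels
}
```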
State of devicechange event
navigator.mediaDevices.ondevicechange = func
Drive-by web in iframe (without allow="camera" or allow="microphone"): never fired
Drive-by web with focus: fired on 0→1 and 1→0 transitions in number of cameras, ditto mics*
A site with persisted permission to use camera or microphone: same (0→1 and 1→0)*
Camera and/or microphone captured in the current document: fired on any change to full list
* Implemented in Safari. In development in Chrome & Firefox.
TL;DR: the event fires on changes observable at the different enumerateDevices exposure levels.
38
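The 0→1 / 1→0 rule can be sketched as a comparison of two enumerateDevices() snapshots. `presenceChanged` is a hypothetical helper mirroring the firing condition at the presence-only exposure level.

```javascript
// True when the number of devices of `kind` crossed zero between two
// enumerateDevices() snapshots (the only change visible pre-capture).
function presenceChanged(before, after, kind) {
  const had = before.some(d => d.kind === kind);
  const has = after.some(d => d.kind === kind);
  return had !== has;
}

// Illustrative browser wiring:
// navigator.mediaDevices.ondevicechange = async () => {
//   const snapshot = await navigator.mediaDevices.enumerateDevices();
//   // compare with the previous snapshot using presenceChanged(...)
// };
```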
State of camera and microphone capture (permission)
await navigator.mediaDevices.getUserMedia({video: true, audio: true})
Drive-by web in iframe (without allow="camera" or allow="microphone"): NotAllowedError
Drive-by web with focus (top-level or iframe): Permission (& prompt) tied to top-level origin, per standard permissions-policy rules
A site with focus & persisted permission to use cam(s) and mic(s): Permitted (no prompt)
In focus & camera and microphone capturing in the current document: Permitted (no prompt)
TL;DR: unchanged
39
State of camera and microphone deviceId
Recap of intended access model: good sites remember which camera the user is using (ditto mic):

const video = {deviceId: localStorage.cameraId}; // from last visit
const stream = await navigator.mediaDevices.getUserMedia({video});
const [track] = stream.getVideoTracks();
localStorage.cameraId = track.getSettings().deviceId; // in case of a new camera

In iframes: “the decision of whether or not the identifier is the same across documents, MUST follow the User Agent's partitioning rules for storage (such as localStorage)”

● No longer a UUID
● No longer in enumerateDevices() except during capture
● Best practice: → low entropy (no impl)
40
State of camera and microphone constraints probing

Recap: sites may triage undesired devices ahead of prompt using required constraints (min, exact):

const video = {height: {min: 1080}}; // 1080p or higher only
try {
  await navigator.mediaDevices.getUserMedia({video});
} catch (e) {
  // No prompt. No 1080p or higher
}

But if there is 1080p, this would prompt the user or turn the camera on (min 3 seconds)!
This is thought to be a sufficient deterrent. Tracking libraries won’t risk device light or a prompt.

To be conservative, we’ve made required constraints opt-in (for other specs like ImageCapture):

“The allowed required constraints for device selection contains the following constraint names: width, height, aspectRatio, frameRate, facingMode, resizeMode, sampleRate, sampleSize, echoCancellation, autoGainControl, noiseSuppression, latency, channelCount, deviceId, groupId.”
41
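The quoted allowlist can be turned into a small checker; a sketch (the constraint names are the ones listed in the spec text above; the function itself is hypothetical):

```javascript
// The allowed required constraints for device selection, per the quote above.
const ALLOWED_REQUIRED = new Set([
  "width", "height", "aspectRatio", "frameRate", "facingMode", "resizeMode",
  "sampleRate", "sampleSize", "echoCancellation", "autoGainControl",
  "noiseSuppression", "latency", "channelCount", "deviceId", "groupId",
]);

// True when every constraint name in the object is on the allowlist.
function hasOnlyAllowedRequired(constraints) {
  return Object.keys(constraints).every(name => ALLOWED_REQUIRED.has(name));
}
```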
Open Privacy Issues

● #640 - Only reveal labels of devices user has given permission to
● #645 - enumerateDevices should only provide device info for granted device types
● #646 - Should enumerateDevices by default return an empty list?
● #672 - Deprecate inputDeviceInfo.getCapabilities() for better privacy
42
#640 - Only reveal labels of devices user has given permission to

Labels are bad for web compat and privacy, but will take time to get rid of.

Exposure is significantly reduced now that a document must be capturing, or have actively captured just now, to see labels (persistent permission is no longer sufficient).

Labels of non-granted devices are needed during capture to support sites implementing ⚙ device pickers in browsers that don’t grant all devices at once.

Long-term solution: in-browser picker for camera & mic (extension spec)

Short-term sub-issues:
1. Labels may contain private information. Encourage sanitization.
2. Clarify label is for display purposes; don’t rely on == model/manufacturer.

Propose: Close issue after short-term issues are solved, and revisit with the in-browser picker extension spec.
43
#645 - enumerateDevices should only give info for granted types

Current spec allows enumeration of cams AND mics upon capturing either a camera OR microphone.

In theory it seems logical to further restrict this to allow:
● enumerating cameras only if the document is capturing/has captured a camera
● enumerating microphones only if the document is capturing/has captured a mic

OTOH: Is successfully obtaining camera or microphone from the user perhaps sufficient to build a device picker for both?

Permission escalation example: Site X allows users to join web conferences with only microphone permission. Users expect to see camera choices in the site’s ⚙ options panel. Restriction may complicate ⚙ UX, so site X demands camera on entry instead.

Consensus: Restrict devices by granted types. This is what Chrome is implementing, so breakage risk is probably low.
44
#646 - Should enumerateDevices by default return an empty list?

Not web compatible to return an empty list. Booleans enable cam/mic UX display.

Our thinking: allow user agents to fake a camera and/or microphone if missing.

Side-effects (inherent from loss of information, regardless of approach):
1. Camera/mic-related UX (buttons) always visible on sites that today hide them for users without a camera and/or mic. Site only learns of absence when getUserMedia fails with NotFoundError. (Mild)
2. devicechange event will never fire when users plug in their first camera or mic. Site cannot take action on these events. (workflow issue?)

Consensus: Return booleans for cam/mic. The spec already allows user agents to fake devices (Safari has an option to expose fake devices). Propose: Note this.
45
#672 - Deprecate inputDeviceInfo.getCapabilities(); better privacy

This API helps sites enforce their constraints while building their pickers:

for (const device of await navigator.mediaDevices.enumerateDevices()) {
  if (device.kind != "videoinput") continue; // height is a camera capability
  if (device.getCapabilities().height.max < 1080) continue;
  options.push({name: device.label});
}

● But it lets a site enumerate capabilities (min/max ranges, enums) of all devices.
● Only available during capture
● One implementation

Long term: A constraints-based in-browser picker would obsolete this need.
Side-effect of losing it: the user would be able to pick a device violating site constraints.
No consensus. Feature at risk (1 implementation). Revisit with the in-browser picker.
46
State of privacy in Audio Output Devices API

The early spec followed an enumerate-first model with similar problems:

await navigator.mediaDevices.enumerateDevices() // enumerate speakers
await audioElement.setSinkId(deviceId)          // select speaker device

Implemented only in Chrome/ium behind microphone permission, which limits applications to web conferencing today (asking for mic to use speakers is escalation).

We got rid of this 3rd bit in enumerateDevices (before capture):
[{kind: "videoinput"}, {kind: "audioinput"}, {kind: "audiooutput"}]

The latest API is an in-browser picker API (no implementation yet):
await navigator.mediaDevices.selectAudioOutput() // in-browser picker
47
State of speaker selection (in-browser picker)

Latest API:

const id = await navigator.mediaDevices.selectAudioOutput(); // picker
audioElement.setSinkId(id); // redirect audio from default speakers

● Single id exposed in session in enumerateDevices() only after the user picks.
● Works without microphone permission; redirects audio from any source.
● Off in iframes by default. Needs allow="speaker-selection"
● Firefox plans to implement soon. Thanks to Safari for driving the design!
48
State of speaker selection (choice persistence)

Sites still need a way to remember the device so as not to prompt every time (if the user permits), but must call selectAudioOutput again to validate the id:

const deviceId = localStorage.speakerId; // from last visit
const id = await navigator.mediaDevices.selectAudioOutput({deviceId});
await audioElement.setSinkId(id);
localStorage.speakerId = id; // store id for next time (might be new)

If accepted, the picker is skipped. But the user agent may show the picker at times (e.g. if the speaker device is no longer available), deterring trackers.

The id only appears in enumerateDevices if the call succeeds.
49
Implicit microphone permission (headset detection)

Some devices are both microphone and speakers (e.g. headsets, laptops), detectable by a shared groupId in enumerateDevices().

Such speakers get exposed with mics during (& immediately after) mic capture:
[{kind: "videoinput", ...},
 {kind: "audioinput", deviceId: [origin-specific id], label: "AirPods", groupId: "17"},
 {kind: "audiooutput", deviceId: [origin-specific id], label: "AirPods", groupId: "17"},
 ...more] // lets site do headset detection

...but not before (no need):
[{kind: "videoinput"}, {kind: "audioinput"}] // all other members are "". length is max 2.

This is to allow headset detection & full-duplex audio (I/O on the same device).

The spec is narrower than Chrome, which atm exposes all speaker devices on (its global) microphone permission; that passed the old spec, but not the new one.
50
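The headset-detection step can be sketched as a pairing by groupId. `findHeadsetOutput` is a hypothetical helper operating on an enumerateDevices() result like the one above.

```javascript
// Find an audiooutput that shares a groupId with an exposed audioinput,
// i.e. a speaker on the same physical device as a captured microphone.
function findHeadsetOutput(devices) {
  const micGroups = new Set(
    devices
      .filter(d => d.kind === "audioinput" && d.groupId)
      .map(d => d.groupId));
  return devices.find(
    d => d.kind === "audiooutput" && micGroups.has(d.groupId)) || null;
}
```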
Extension spec: In-Browser Device Picker

Long term we want to get away from in-content device selection.

PING wants a privacy-by-default in-browser device picker:
1. site asks for a category (or categories) of device
2. browser prompts the user for one, many or all devices
3. site gains access to only the device + label of the hardware the user selects.
51
In-Browser Cam/Mic Picker

Why not selectCamera() and selectMicrophone()? It’s complicated:
● Web apps want constraints on camera selection (e.g. resolution)
● Web apps want some discovery (emerging use cases, streamers use 2 cams, WebVR)
● Users want sites to remember their configuration(s) and not pick a device every time
● User agents differentiate in permission models (persistent on/off vs one-shot, innovation)
● What would the migration path be?
○ getUserMedia (unlike setSinkId) is already implemented in all browsers
○ Which sites will upgrade to prefer a less powerful / less established API?

Current goal:
1. Get rid of labels & capabilities of non-captured devices (consensus)
2. Prevent user agents from granting permission to all cameras/mics (no consensus)
3. Limit capabilities exposure of in-use cam/mic (“availability API”) (no consensus)
52
In-Browser Camera/Mic Picker Model (goals)
53
[Diagram: three approaches, ordered from most to least powerful, with a migration path toward the least powerful. Callout: "Incremental instead of new API."]
● Device enumeration API (most powerful): getUserMedia() + enumerateDevices(). During capture: all labels, all deviceIds, all capabilities. Implemented in all browsers.
● TAG/PING-definition picker-style API: no “selectCamera()”, no “selectMicrophone()”. During capture: select deviceIds, select capabilities. No consensus on cam/mic; wouldn’t get rid of getUserMedia or labels. May revisit.
● Label-less device picker-style API: getUserMedia()++, enumerateDevices()--. During capture: all deviceIds, select capabilities. No implementation.
Incremental API

getUserMedia already has a picker in Firefox (tied to permission), letting the user, instead of the user agent, choose within the app’s constraints when choices > 1.

[Screenshot: Firefox camera picker shown on Meet.com]

← Apps could have just called getUserMedia() again to get a different camera, but web compat prevents showing a prompt then, because lazy sites expect the same result (no prompt).
Incremental API (getUserMedia++)

Solution: Migrate to new getUserMedia semantics over time:

< await navigator.mediaDevices.getUserMedia({video: true, semantics: "browser-chooses"});
> await navigator.mediaDevices.getUserMedia({video: true, semantics: "user-chooses"});

New semantics mandate a picker if app constraints don’t narrow down the selection to 1 device per kind (where the user agent normally would choose). Orthogonal to permission.

Migration strategy:
1. Browsers implement pickers for "user-chooses" where the agent chooses today.
2. Allow sites time to replace in-content pickers in their ⚙ panel with browser pickers.
3. Remove all labels from enumerateDevices(). Deprecate device.getCapabilities().
4. (Maybe) flip the default. “2023: No more labels!”

Criticism / feature (for users w/multiple cams/mics): Flipping the default would mean they see a picker even initially, instead of the browser picking the OS default device for them. On sites w/o device selection, they’d be prompted every time (an improvement over the wrong device).
In-Browser Camera/Mic Picker Model (goals)
56
[Diagram revisited: the same approaches and migration path, with the endpoint updated.]
● TAG/PING-definition picker-style API: no “selectCamera()”, no “selectMicrophone()”. During capture: select deviceIds, select capabilities. No consensus on cam/mic; wouldn’t get rid of getUserMedia or labels. May revisit.
● Label-less device picker-style API (least powerful; migration path): getUserMedia()++, enumerateDevices()--. During capture: all deviceIds, select capabilities. Someday implemented in all browsers.
State of privacy in WebRTC-PC / Stats / SVC
Thanks to PING for reviewing these APIs!

RTCRtpSender.getCapabilities("audio");
RTCRtpSender.getCapabilities("video");
RTCRtpReceiver.getCapabilities("audio");
RTCRtpReceiver.getCapabilities("video");

const pc = new RTCPeerConnection();
const offer = await pc.createOffer();
const stats = await pc.getStats();

3 issues were filed (all open):
● #2460 - getCapabilities seems to leak hardware capabilities w/o permission
● #22 - getCapabilities seems to leak hardware capabilities w/o permission
● #550 - Stats API should require additional permission
57
#2460/#22 - getCapabilities leaks hardware capabilities w/o permission

A site can learn about the visitor’s underlying hardware capabilities w/o a permission prompt or some other positive, affirmative action by the visitor.

Most of the same information is available in the SDP offer from pc.createOffer(), which inherently needs to be signaled by JS to form a peer-to-peer connection, as described in JSEP (IETF).

Use cases:
● Data channels
● Receive media
● Send media other than cam/mic/screen, e.g. canvas/elem.captureStream()
58
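To illustrate the point that much of the same information is visible in the offer SDP, here is a hypothetical helper that pulls codec names out of a=rtpmap lines; the SDP fragment in the usage note is made up.

```javascript
// Extract codec names from an SDP blob's a=rtpmap lines, e.g.
// "a=rtpmap:96 VP8/90000" → "VP8". A page can read these from the
// pc.createOffer() result without any permission prompt.
function listCodecsFromSdp(sdp) {
  const codecs = [];
  for (const line of sdp.split(/\r?\n/)) {
    const m = /^a=rtpmap:\d+ ([^\/]+)\//.exec(line);
    if (m) codecs.push(m[1]);
  }
  return codecs;
}
```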
#2460/#22 - getCapabilities leaks hardware capabilities w/o permission

[Spec excerpts shown for getCapabilities and what createOffer says.]

Conclusion from the Graphics Hardware Fingerprinting document linked in the issue:
● “Information relating to graphics hardware capabilities provided by [WEBRTC], [WebRTC-Stats], [WebRTC-SVC] ... may also be inferred from other sources such as Web-GPU, Web-GL and performance API.”
● “...graphics hardware fingerprinting concerns are not WebRTC-specific. ...consider adding a permission relating to “whether the page is permitted to know what graphics hardware the user is running” (outside WebRTC)”

Proposed resolution is to include a note relating to implementation issues with hardware capabilities.
#550 - Stats API should require additional permission

Two privacy harms / risks reported:
1. Leaking communication / plain text
○ A useful consideration for isolated streams (WebRTC-identity), but for regular streams, the web page has prior access to audio and text content.
2. Hardware fingerprinting (decoderImplementation & codec)
○ Similar to #2460 (covered in the previous slide)
Conclusions
● Conclusions and next steps here
61
Thank you!
Special thanks to:
PING, WG Participants, Editors & Chairs, Youenn Fablet
Bat by OpenClipart-Vectors from Pixabay; Crowd from PNG EGG
62
Joint Meeting
MEIG and WEBRTC WG
October 16, 2020
8 AM - 9:00 AM Pacific Time
63
W3C WG IPR Policy
● This group abides by the W3C Patent Policy: https://www.w3.org/Consortium/Patent-Policy/
● Only people and companies listed at https://www.w3.org/2004/01/pp-impl/47318/status are allowed to make substantive contributions to the WebRTC specs
64
Welcome!
● Welcome to the joint meeting of the W3C MEIG and WebRTC WGs at TPAC 2020!
● During these meetings, we hope to make progress on the future of the capture and output specifications.
65
About this Meeting
● Meeting info:
○ https://www.w3.org/2011/04/webrtc/wiki/TPAC_2020
● Link to slides has been published on the WG wiki
● Scribe? IRC: http://irc.w3.org/ Channel: #me
● The meeting is being recorded.
66
Agenda for Joint MEIG/WEBRTC WG Meeting
● WebRTC WG Charter and Deliverables
● Status of Capture and Output deliverables
● Machine Learning
● New work
○ WebCodecs (WICG)
○ Insertable Streams for Raw Media (WEBRTC WG)
67
WebRTC WG Charter
● WEBRTC WG recently re-chartered through September 2022: https://w3c.github.io/webrtc-charter/webrtc-charter.html
● Out of scope:
○ The definition of the network protocols used to establish the connections between peers is out of scope for this group; in general, it is expected that protocol considerations will be handled in the IETF.
○ The definition of any new codecs for audio and video is out of scope.
● In scope:
○ API functions to explore device capabilities, e.g. camera, microphone, speakers,
○ API functions to capture media from local devices (e.g. camera and microphone, but also output devices such as a screen),
○ API functions for encoding and other processing of those media streams,
○ API functions for accessing the data in these media streams,
○ API functions for decoding and processing (including echo canceling, stream synchronization and a number of other functions) of those streams at the incoming end,
○ Delivery to the user of those media streams via local screens and audio output devices (partially covered with HTML5).
68
WebRTC WG Deliverables: Networking
● WebRTC 1.0 API: https://w3c.github.io/webrtc-pc/
● WebRTC-Stats: https://w3c.github.io/webrtc-stats/
● WebRTC-NV Use Cases: https://w3c.github.io/webrtc-nv-use-cases/
● WebRTC Extensions: https://w3c.github.io/webrtc-extensions/
● WebRTC-ICE: https://github.com/w3c/webrtc-ice
● WebRTC SVC: https://github.com/w3c/webrtc-svc
● Insertable Streams: https://github.com/w3c/webrtc-insertable-streams
● WebRTC Priority: https://w3c.github.io/webrtc-priority/
● WebRTC-DSCP: https://w3c.github.io/webrtc-dscp-exp/
69
WebRTC WG Deliverables: Media Capture & Output
● Media Capture and Streams (recycled at CR):
● Media Capture Automation (for testing; just adopted): https://w3c.github.io/mediacapture-automation/
● Audio output devices API (CR): https://w3c.github.io/mediacapture-output/
● MediaCapture from DOM: https://w3c.github.io/mediacapture-fromelement/
● Screen Capture:
● Media Capture Image: https://w3c.github.io/mediacapture-image/
● Media Recording:
● Content-Hints: https://w3c.github.io/mst-content-hint/
● Media Capture Depth: https://w3c.github.io/mediacapture-depth/
● Media Capture Extensions: https://github.com/w3c/mediacapture-extensions
70
State of Capture and Output Deliverables
● Most (all?) specifications have been implemented by at least one browser, several in multiple browsers.
● Several specifications have gone to CR (or are being recycled at CR).
● Other specifications remain Working Drafts for many months or even years without advancing.
○ Is there enough energy to get to CR (let alone PR)?
○ WPT test coverage varies.
● Privacy is an ongoing concern.
○ “Browser picker” model for media capture under development (Jan-Ivar).
● Is there a “hidden pool” of individuals whom we could motivate to help with these specifications?
○ Or is this another example of the “Keebler Elf Theory of Software”? 71
MediaStream model: sources and sinks
[Diagram: sources — getUserMedia() (camera 📷, microphone 🎙), getDisplayMedia() (screen 🖥), element/canvas .captureStream(), WebAudio .createMediaStreamDestination(), and RTCPeerConnection (RTCRtpReceiver.track) — all produce a MediaStreamTrack. Sinks consume it: a media element via .srcObject (routed with .setSinkId 🎧), MediaRecorder ((stream).start()), ImageCapture ((track).takePhoto()), WebAudio .createMediaStreamTrackSource(track), and RTCPeerConnection networking via .addTrack() / RTCRtpSender.replaceTrack().]
72
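As a hedged sketch of the model above (assuming a browser context; `deviceId` would come from `enumerateDevices()`), the same source-to-sink wiring might look like this in script:

```javascript
// Sketch only: wiring the diagram's sources and sinks in JavaScript.
// These are browser-only APIs; nothing here runs outside a browser.
async function wireSourcesAndSinks(videoEl, deviceId) {
  // Source: camera + microphone
  const stream = await navigator.mediaDevices.getUserMedia({ video: true, audio: true });

  // Sink: media element, with optional audio-output routing
  videoEl.srcObject = stream;
  if (videoEl.setSinkId) await videoEl.setSinkId(deviceId);

  // Sink: network, via RTCPeerConnection
  const pc = new RTCPeerConnection();
  for (const track of stream.getTracks()) pc.addTrack(track, stream);

  // Sink: recording
  const recorder = new MediaRecorder(stream);
  recorder.start();
  return { pc, recorder };
}
```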
State of Capture and Output Deliverables
mediacapture-main: (getUserMedia, enumerateDevices, MediaStreamTrack)
Camera & microphone. Final CR. 28 issues, mostly minor. ✅✅✅✅ All browsers.
Work to reduce fingerprinting with an in-browser device picker moved to mediacapture-extensions.
mediacapture-output: (setSinkId / selectAudioOutput)
Speaker selection. CR from 2017. Work picked up in 2020 with a more private picker-based model (selectAudioOutput) that also supports non-mic audio sources. 13 issues.
✅✅ Chromium (old API), ☐ Firefox in development (new API), ☐ Safari interest (new API).
mediacapture-from-element: (canvas/element.captureStream())
WD from 2017. 22 minor issues. Mature. No recent activity.
canvas.captureStream() ✅✅✅✅ All browsers. element.captureStream() ✅✅✅ most browsers, ❌ Safari. 73
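A minimal sketch of the captureStream-plus-recording path (browser-only; the 30 fps hint and `video/webm` container are assumptions, not requirements of the spec):

```javascript
// Sketch: record a canvas for a fixed duration using canvas.captureStream()
// (supported in all browsers per the slide) piped into MediaRecorder.
function recordCanvas(canvas, durationMs) {
  const stream = canvas.captureStream(30); // capture at up to 30 fps
  const recorder = new MediaRecorder(stream);
  const chunks = [];
  recorder.ondataavailable = e => chunks.push(e.data);
  const done = new Promise(resolve => {
    recorder.onstop = () => resolve(new Blob(chunks, { type: 'video/webm' }));
  });
  recorder.start();
  setTimeout(() => recorder.stop(), durationMs);
  return done; // resolves to a Blob holding the recording
}
```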
State of Capture and Output Deliverables
mediacapture-screen-share: (getDisplayMedia)
WD from 2019. Mature. ✅✅✅✅ all browsers. 20 issues, mostly minor.
Recent 2020 interest in same-origin DOM capture (document.captureStream()), privacy.
mediacapture-image: (takePhoto, more camera constraints: brightness, whiteBalance, etc.)
WD from 2017. ✅✅ Chromium. ❌ Firefox. Work picked up in 2020 with pan, tilt & zoom constraints behind an extra permission. 16 issues. ☐ PTZ interest from Safari?
mediacapture-record: (MediaRecorder)
WD from 2017. Low activity. ✅✅✅ most browsers, ☐ Safari Tech Preview (pref?). 33 open issues, mostly around config/codec interop, adding/removing tracks, seekable recordings.
mediacapture-extensions: (New in-browser camera/mic picker, miscellaneous)
Extensions repo for Rec updates to mediacapture-main. 74
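The picker-based APIs above can be sketched together. Note selectAudioOutput was still a proposal at the time of this deck, so it is feature-detected here rather than assumed:

```javascript
// Sketch, assuming a browser: share the screen via getDisplayMedia(), then
// let the user pick a speaker with the (then-proposed) selectAudioOutput().
async function shareScreenAndPickSpeaker(videoEl) {
  const display = await navigator.mediaDevices.getDisplayMedia({ video: true });
  videoEl.srcObject = display;
  if (navigator.mediaDevices.selectAudioOutput) {
    // Browser shows its own output-device picker; page never sees the full list.
    const { deviceId } = await navigator.mediaDevices.selectAudioOutput();
    await videoEl.setSinkId(deviceId);
  }
  return display;
}
```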
Recent interest in new use cases and features
1. Raw media access
Performant access to uncoded bytes of video tracks (MediaStreamTrack).
○ Use cases: filters, virtual backgrounds, machine learning
2. Capture HTML rendering
Safely capture a same-origin-isolated document into a video track.
○ Use cases:
■ Let a page stream itself into a web conference (e.g. “Projecting Google Slides into a conference”)
■ Record a web conference
75
Machine Learning
● Machine Learning is increasingly important in Media and Entertainment scenarios:
○ Background replacement (e.g. blurring, images, etc.)
○ Constructed environments (e.g. “together mode”, AR/VR, etc.)
○ Accessibility (transcription, translation, etc.)
● W3C Machine Learning Workshop: https://www.w3.org/2020/06/machine-learning-workshop/
● Efficient access to raw media is a prerequisite. Proposals include:
○ VideoTrackReader API (WebCodecs)
○ Insertable Streams for Raw Media (WEBRTC WG)
76
WebCodecs
● Currently being incubated in WICG: https://wicg.github.io/web-codecs/
● Provides access to raw media (via the VideoTrackReader interface)
● Provides low-level access for encoding and decoding of audio and video.
● Still in the early stages. Known issues/limitations:
○ M86 limitations
○ Decoupled from WHATWG Streams.
○ No support for advanced video (simulcast, SVC)
○ No support for content protection.
○ Performance optimizations (HW encode/decode, GPU, etc.) 77
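A hedged sketch of the VideoTrackReader shape named on this slide. This interface belonged to an early WebCodecs draft and was later dropped, so the exact method names here (start/stop, per-frame release) are assumptions about that draft, not a stable API:

```javascript
// Sketch only: consume raw frames from a MediaStreamTrack via the early
// VideoTrackReader draft interface (browser-only, experimental at the time).
function countFrames(track, onFrame) {
  let frames = 0;
  const reader = new VideoTrackReader(track); // early WebCodecs draft class
  reader.start(frame => {   // callback invoked once per decoded VideoFrame
    frames++;
    onFrame(frame, frames);
    frame.destroy();        // early drafts required explicit frame release
  });
  return () => reader.stop(); // returns a function that stops reading
}
```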
Insertable Streams for Raw Media
● Open up the MediaStreamTrack
● Keep it fast
● Keep it simple
78
RTC media flow in WebRTC 1.0
[Diagram: on the send side, getUserMedia capture feeds pre-processing, encode, and transport inside the PeerConnection, with network adaptation; on the receive side, transport feeds decode and display, adapting to loss/delay. The Web Application sits above the PeerConnection.]
Open up the MediaStreamTrack
[Diagram: Breakout Box stages 1 → 2 — media and feedback flow through the track, with a Javascript processing step inserted into the media path.]
Stage Two Track Processor

// detectFace() and drawMoustache() are assumed helpers; the original slide
// recursively called addMoustache() here, which appears to be a typo.
function addMoustache(videoFrame) {
  let facePosition = detectFace(videoFrame.data);
  return drawMoustache(videoFrame.data, facePosition);
}

processingTrack = new ProcessingMediaStreamTrack(videoTrack);
transformer = new TransformStream({
  transform: (videoFrame, controller) => {
    videoFrame.modifyData(addMoustache(videoFrame));
    controller.enqueue(videoFrame);
  }
});
processingTrack.readable
  .pipeThrough(transformer)
  .pipeTo(processingTrack.writable);
81
Break Apart the MediaStreamTrack
[Diagram: Breakout Box stage 3 — allows for generating and consuming tracks directly; Javascript sits between separate media/feedback paths.]
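A hedged sketch of what "stage 3" (generating and consuming tracks directly) could look like. The class names below, MediaStreamTrackProcessor and MediaStreamTrackGenerator, are taken from Chromium's later Breakout Box experiment, not from this deck:

```javascript
// Sketch only: consume frames from one track, transform them, and emit a
// brand-new MediaStreamTrack, with no PeerConnection involved.
function transformTrack(inputTrack, transformFn) {
  const processor = new MediaStreamTrackProcessor({ track: inputTrack });
  const generator = new MediaStreamTrackGenerator({ kind: 'video' });
  processor.readable
    .pipeThrough(new TransformStream({
      transform(frame, controller) {
        controller.enqueue(transformFn(frame)); // emit the processed frame
      }
    }))
    .pipeTo(generator.writable);
  return generator; // usable anywhere a MediaStreamTrack is accepted
}
```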
Status and Next Steps
● Experimental implementation will be landing in Chrome 88
● Start of specification available
○ https://github.com/alvestrand/mediacapture-insertable-streams
83
Capture HTML rendering
Today (getDisplayMedia)
● Web surfaces may be captured by screen-sharing only if users pick them.
● Sharing them carries significant risks not understood by users (malicious sites may mount active attacks on the web’s same-origin policy).
● The API prevents sites from influencing users to choose these, to deter attacks.
● Behind elevated permission (browsers are supposed to warn of the risks).
Ironically, sharing native apps is safer. Unfortunate, since we’d like to promote web over native.
Prohibitive UX flow for the “record this meeting” or “Present Google Doc” use cases. 84
Capture HTML rendering
Better integration: What if web pages could stream themselves into a conference?
The page could use existing tech (RTCPeerConnection) to join an ongoing meeting and stream itself there, if it could capture itself.
● The document needs only capture itself.
● To be secure, the document must be origin-isolated as a matter of policy.
● CORS only allows opt-in, which isn’t strong enough, since rendering a document from another origin is different from reading it.
● New policy needed, e.g. Cross-Origin-Embedder-Policy: disallow
85
Capture HTML rendering
More secure, but still needs permission: rendering may contain private info
○ link purpling (browser history)
○ form autofill (address, credit card info)
○ extensions (e.g. LastPass)
○ file input element sometimes contains private info
Active attacks could harvest information quickly & covertly (CSS color shading).
These risks are hard to explain to users in a prompt.
86
Capture HTML rendering
HTML → Video is a powerful paradigm: remote browsing; streaming web apps.
Seems a lower-level API than screen-sharing in use cases, behavior, challenges & potential.
API suggestions:
● document.captureStream() or even
● canvas.drawImage(document) if we leave out audio (since we already have canvas.captureStream())
(The latter would put it out of scope for WebRTC.) 87
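The "page streams itself" idea can be sketched as follows. This is hypothetical only: document.captureStream() does not exist anywhere; it is the API shape the slide proposes, combined with existing RTCPeerConnection plumbing:

```javascript
// Hypothetical sketch: an origin-isolated page captures its own rendering
// (proposed document.captureStream()) and streams it into an ongoing meeting
// over an already-established RTCPeerConnection.
async function streamSelfToMeeting(pc) {
  const stream = document.captureStream(); // proposed, not implemented
  for (const track of stream.getTracks()) {
    pc.addTrack(track, stream);            // existing WebRTC machinery
  }
  return stream;
}
```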
Status and Next Steps
● Early idea stage
● Is it a good idea / safe? Do we want this?
● Is WebRTC the right WG for this?
● What’s the right audience?
● Interested?
88
Open Calls for Review Feedback
● <Carine to fill in these slides>
89
Conclusions
● Conclusions and next steps here
90
Thank you
Special thanks to:
WG Participants, Editors & Chairs
91