This document proposes a mechanism by which an application, APP, can opt in to exposing certain information to another application, CAPTR, if CAPTR is screen-capturing the tab in which APP is running.
Consider a web application, running in one tab, which we’ll name "main_app." Assume main_app calls getDisplayMedia and the user chooses to share another tab, in which an application we’ll call "captured_app" is running.
Note that:
Both of these traits are desirable in the general case, but there exist legitimate use cases where the browser would want to allow applications to opt in to bridging that gap and establishing a connection.
We wish to enable these legitimate use cases while leaving the general case unchanged.
Consider two applications that wish to cooperate, for example a VC app and a presentation app. Assume the user is in a VC session and starts sharing a presentation. Both applications are interested in letting the VC app discover that it is capturing a slides session, which application is presenting it, and even which session it is, so that the VC app can expose controls to the user for flipping through slides. When the user clicks those controls, the VC app can send messages to the presentation app (either through a service worker or through a shared back-end infrastructure). These messages instruct the presentation app to flip through slides, enter or leave presentation mode, etc.
Capturing applications often wish to gather statistics about which applications their users tend to capture. For example, VC applications would like to know how often their users share presentation applications from specific providers, Wikipedia, CNN, etc. Such information can be used to improve service for users, for example by introducing new collaborations such as the one described above.
Users sometimes choose to share the wrong tab. Sometimes they switch to sharing the wrong tab by mistakenly clicking the share-this-tab-instead button. A benevolent application could try to protect the user by presenting an in-app dialog asking for re-confirmation if it believes that the user may have made a mistake.
This use case is a sub-case of the previous one (detecting that the user may have shared the wrong tab), but deserves its own section due to its importance. The "Hall of Mirrors" effect occurs when users choose to share the tab in which the VC call takes place. When detecting self-capture, a VC application can avoid displaying the captured stream back to the user, thereby avoiding the dreaded effect.
The capture-handle identification mechanism consists of two main parts: one on the captured side, and one on the capturing side.
Applications can expose information to capturing applications by calling {{MediaDevices/setCaptureHandleConfig}} with an appropriate {{CaptureHandleConfig}}. They would typically do so before even knowing whether they are being captured.
The {{CaptureHandleConfig}} dictionary tells the user agent what information the captured application intends to expose, and to which applications it is willing to expose that information.
dictionary CaptureHandleConfig {
  boolean exposeOrigin = false;
  DOMString handle = "";
  sequence<DOMString> permittedOrigins = [];
};
If {{CaptureHandleConfig/exposeOrigin}} is true, the user agent MUST expose the captured application's origin through the {{CaptureHandle/origin}} field of {{CaptureHandle}}. If it is false, the user agent MUST NOT expose the captured application's origin.
The user agent MUST expose the value of {{CaptureHandleConfig/handle}} as {{CaptureHandle/handle}}.
Note: Values of this field are limited to 1024 16-bit characters. This limitation is specified further in {{MediaDevices/setCaptureHandleConfig}}.
Legal values of {{CaptureHandleConfig/permittedOrigins}} include the empty list (the default), a list consisting of the single item "*", and a list of origins.
If {{CaptureHandleConfig/permittedOrigins}} consists of the single item "*", then the {{CaptureHandle}} is [=observable=] by all capturers. Otherwise, the {{CaptureHandle}} is [=observable=] only to capturers whose origin is listed in {{CaptureHandleConfig/permittedOrigins}}.
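The observability rule for {{CaptureHandleConfig/permittedOrigins}} can be modeled as a small pure function. This is a non-normative sketch: the names `isObservableBy` and `normalizeOrigin`, and the use of the WHATWG `URL` parser to serialize origins before comparing them, are assumptions made for illustration, not requirements of this specification.

```typescript
// Non-normative model of the CaptureHandleConfig dictionary; omitted members take IDL defaults.
interface CaptureHandleConfig {
  exposeOrigin?: boolean;
  handle?: string;
  permittedOrigins?: string[];
}

// Hypothetical helper: serialize the origin of a URL string, or return null if it does not parse.
function normalizeOrigin(value: string): string | null {
  try {
    return new URL(value).origin;
  } catch {
    return null;
  }
}

// Returns true if a capturer whose origin is `capturerOrigin` may observe the CaptureHandle.
function isObservableBy(config: CaptureHandleConfig, capturerOrigin: string): boolean {
  const permitted = config.permittedOrigins ?? [];
  // The single item "*" makes the CaptureHandle observable by all capturers.
  if (permitted.length === 1 && permitted[0] === "*") {
    return true;
  }
  // Otherwise the capturer's origin must be listed; the default empty list permits nobody.
  const capturer = normalizeOrigin(capturerOrigin);
  return capturer !== null && permitted.some((origin) => normalizeOrigin(origin) === capturer);
}
```

Normalizing both sides means that, for example, `https://vc.example:443` and `https://vc.example` compare as the same origin.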
{{MediaDevices}} is extended with a method, {{MediaDevices/setCaptureHandleConfig}}, which accepts a {{CaptureHandleConfig}} object. By calling this method, an application informs the user agent which information it permits capturing applications to observe.
partial interface MediaDevices {
  undefined setCaptureHandleConfig(optional CaptureHandleConfig config = {});
};
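As an illustration of the captured side, a presentation application might expose its origin and a session identifier to a single VC provider. In this sketch, `CaptureHandleConfigurable` is a hypothetical structural type standing in for {{MediaDevices}}; the origin `https://vc.example`, the function name `exposeSlidesSession` and the JSON encoding of the handle are all assumptions, since the contents of the handle are left to the application.

```typescript
// Structural stand-ins for the WebIDL types; not TypeScript's built-in DOM typings.
interface CaptureHandleConfig {
  exposeOrigin?: boolean;
  handle?: string;
  permittedOrigins?: string[];
}

interface CaptureHandleConfigurable {
  setCaptureHandleConfig(config?: CaptureHandleConfig): void;
}

// Hypothetical: expose this app's origin and a session ID to a single VC provider.
function exposeSlidesSession(mediaDevices: CaptureHandleConfigurable, sessionId: string): void {
  mediaDevices.setCaptureHandleConfig({
    exposeOrigin: true,
    // The handle is an opaque string; JSON is just one possible encoding.
    handle: JSON.stringify({ sessionId }),
    // Only capturers from this (hypothetical) origin may observe the CaptureHandle.
    permittedOrigins: ["https://vc.example"],
  });
}
```

In a browser implementing this API, `navigator.mediaDevices` would be passed in; the structural type is used because TypeScript's bundled DOM typings may not declare this method.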
The user agent MUST run the following validations:
If all validations passed, the user agent MUST accept the new config. The user agent MUST forget any previous call to {{MediaDevices/setCaptureHandleConfig}}; from now on, the application's {{CaptureHandleConfig}} is config.
The [=observable=] {{CaptureHandle}} is re-evaluated for all capturing applications.
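Accepting a config replaces any earlier one rather than merging with it. The sketch below models that behavior and is not the normative algorithm: the full list of validations is not reproduced here, so it only enforces the 1024-character limit on {{CaptureHandleConfig/handle}} noted earlier, and throwing a `TypeError` for it is an assumption. `CaptureHandleState` and its re-evaluation callback are hypothetical names.

```typescript
interface CaptureHandleConfig {
  exposeOrigin?: boolean;
  handle?: string;
  permittedOrigins?: string[];
}

// Hypothetical model of the captured document's capture-handle state in the user agent.
class CaptureHandleState {
  private config: Required<CaptureHandleConfig> = { exposeOrigin: false, handle: "", permittedOrigins: [] };
  private readonly reevaluate: () => void;

  constructor(reevaluate: () => void) {
    this.reevaluate = reevaluate;
  }

  setCaptureHandleConfig(config: CaptureHandleConfig = {}): void {
    const handle = config.handle ?? "";
    // Assumed check: string length counts UTF-16 code units, i.e. 16-bit characters.
    if (handle.length > 1024) {
      throw new TypeError("handle must not exceed 1024 16-bit characters");
    }
    // Replace any previous config entirely; omitted members revert to their defaults.
    this.config = {
      exposeOrigin: config.exposeOrigin ?? false,
      handle,
      permittedOrigins: (config.permittedOrigins ?? []).slice(),
    };
    // Re-evaluate the observable CaptureHandle for all capturing applications.
    this.reevaluate();
  }

  current(): Required<CaptureHandleConfig> {
    return this.config;
  }
}
```

A rejected call leaves the previous config in place and triggers no re-evaluation.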
Capturing applications that are permitted to [=observable|observe=] a track's {{CaptureHandle}} have two ways of reading it: calling {{MediaStreamTrack/getCaptureHandle}}, and listening for {{CaptureHandleChangeEvent}}s on the track.
The user agent exposes information about the captured application to the capturing application through the {{CaptureHandle}} dictionary. Note that a {{CaptureHandle}} object MUST NOT be given to a capturing application that is not permitted to [=observable|observe=] it.
dictionary CaptureHandle {
  DOMString origin;
  DOMString handle;
};
If the captured application opted in to exposing its origin (by setting {{CaptureHandleConfig/exposeOrigin}} to true), then the user agent MUST set {{CaptureHandle/origin}} to the origin of the captured application. Otherwise, {{CaptureHandle/origin}} is not set.
The user agent MUST set {{CaptureHandle/handle}} to the value that the captured application set in {{CaptureHandleConfig/handle}}.
Extend {{MediaStreamTrack}} with a method called {{MediaStreamTrack/getCaptureHandle}}.
When the {{MediaStreamTrack}} is a video track derived from screen capture, {{MediaStreamTrack/getCaptureHandle}} returns the latest [=observable=] {{CaptureHandle}}. Otherwise, it returns null.
partial interface MediaStreamTrack {
  CaptureHandle? getCaptureHandle();
};
If the track in question is not a video track, or is not the result of a capture of a display surface, then the user agent MUST return null.
If the captured application did not set a {{CaptureHandleConfig}}, or if it last set it to the empty {{CaptureHandleConfig}}, then the user agent MUST return null.
The user agent MUST compare the origin of the capturing document against those that the captured application listed in {{CaptureHandleConfig/permittedOrigins}}. If the capturing origin is not permitted to [=observable|observe=] the {{CaptureHandle}}, then the user agent MUST return null.
If all previous validations pass, then the user agent MUST return a {{CaptureHandle}} dictionary with values derived from the last {{CaptureHandleConfig}} set by the captured application.
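The steps of {{MediaStreamTrack/getCaptureHandle}} above can be sketched as a function over a hypothetical description of the track. `TrackInfo` stands in for state the user agent holds internally, and origins are compared as already-serialized strings; both are simplifying assumptions of this sketch.

```typescript
interface CaptureHandleConfig {
  exposeOrigin?: boolean;
  handle?: string;
  permittedOrigins?: string[];
}

interface CaptureHandle {
  origin?: string;
  handle: string;
}

// Hypothetical summary of what the user agent knows about a track and its captured source.
interface TrackInfo {
  kind: "audio" | "video";
  isDisplaySurfaceCapture: boolean;
  capturedOrigin: string;
  capturedConfig?: CaptureHandleConfig; // last config set by the captured app, if any
}

function isEmptyConfig(config: CaptureHandleConfig): boolean {
  return !config.exposeOrigin && (config.handle ?? "") === "" && (config.permittedOrigins ?? []).length === 0;
}

function getCaptureHandle(track: TrackInfo, capturerOrigin: string): CaptureHandle | null {
  // Only video tracks resulting from display-surface capture can have a CaptureHandle.
  if (track.kind !== "video" || !track.isDisplaySurfaceCapture) {
    return null;
  }
  // No config, or the empty config, exposes nothing.
  const config = track.capturedConfig;
  if (config === undefined || isEmptyConfig(config)) {
    return null;
  }
  // The capturing origin must be permitted to observe the CaptureHandle.
  const permitted = config.permittedOrigins ?? [];
  const allowAll = permitted.length === 1 && permitted[0] === "*";
  if (!allowAll && permitted.indexOf(capturerOrigin) === -1) {
    return null;
  }
  // Derive the result from the config; origin is present only if exposeOrigin is true.
  const result: CaptureHandle = { handle: config.handle ?? "" };
  if (config.exposeOrigin) {
    result.origin = track.capturedOrigin;
  }
  return result;
}
```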
Whenever the [=observable=] {{CaptureHandle}} for a given capturing application changes, the user agent fires a {{CaptureHandleChangeEvent}} named `capturehandlechange` at the affected track. This can happen in the following cases:
[Exposed=Window]
interface CaptureHandleChangeEvent : Event {
  constructor(CaptureHandleChangeEventInit init);
  [SameObject] CaptureHandle captureHandle();
};
The track's {{CaptureHandle}} at the time the event was fired, as [=observable=] by the capturing application. If it is not [=observable=] by the capturing application, all of the {{CaptureHandle}}'s fields are set to their default value, the empty {{DOMString}}.
dictionary CaptureHandleChangeEventInit : EventInit {
  CaptureHandle captureHandle;
};
The track's {{CaptureHandle}} at the time the event was fired.
{{MediaStreamTrack}} is extended with an {{EventHandler}} attribute called {{oncapturehandlechange}}.
partial interface MediaStreamTrack {
  attribute EventHandler oncapturehandlechange;
};
{{EventHandler}} for events of type {{CaptureHandleChangeEvent}}.
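On the capturing side, a VC application might combine {{MediaStreamTrack/getCaptureHandle}} with {{oncapturehandlechange}} to track whether it is capturing a known presentation application. `CapturedTrackLike`, `watchCapturedApp` and `https://slides.example` are hypothetical. The sketch relies on the behavior described above: an unobservable handle arrives with empty-string fields, and {{MediaStreamTrack/getCaptureHandle}} may return null.

```typescript
interface CaptureHandle {
  origin?: string;
  handle: string;
}

// Structural stand-ins for the WebIDL types; not TypeScript's built-in DOM typings.
interface CaptureHandleChangeEventLike {
  captureHandle(): CaptureHandle;
}

interface CapturedTrackLike {
  getCaptureHandle(): CaptureHandle | null;
  oncapturehandlechange: ((event: CaptureHandleChangeEventLike) => void) | null;
}

// Hypothetical origin of a presentation app that this VC app knows how to work with.
const SLIDES_ORIGIN = "https://slides.example";

// Reports whether the captured content is the known slides app, now and after every change.
function watchCapturedApp(
  track: CapturedTrackLike,
  onUpdate: (isSlidesApp: boolean, handle: string) => void,
): void {
  // A null handle, or one whose fields are empty strings, yields (false, "").
  const report = (captureHandle: CaptureHandle | null): void =>
    onUpdate(captureHandle?.origin === SLIDES_ORIGIN, captureHandle?.handle ?? "");
  // Read the current value once...
  report(track.getCaptureHandle());
  // ...then again whenever the observable CaptureHandle changes.
  track.oncapturehandlechange = (event) => report(event.captureHandle());
}
```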
The capture-handle actions mechanism consists of two parts: one on the captured side, and one on the capturing side.
Applications in top-level documents can declare the [=capture actions=] they support, if any. They would typically do so before even knowing whether they are being captured. The intent is that capturing applications send these actions, in response to user interaction, to control the progression of the captured session, and that the captured application responds to them. Supported actions are declared by calling {{MediaDevices/setSupportedCaptureActions}} with an array of the names of the actions the application is prepared to respond to.
{{MediaDevices}} is extended with a method, {{MediaDevices/setSupportedCaptureActions}}, which accepts an array of {{DOMString}}s. By calling this method, an application registers with the user agent a set of zero or more [=capture actions=] it wishes to respond to.
Capture actions are values defined in {{CaptureAction}}. They are meant to be interpreted as instructions from the capturing application to advance the presentation of the captured session, in whatever way the captured application chooses to define this. The intent is to support capturing applications that implement interactive controls for these actions; sending an action requires [=transient activation=] and will [=consume user activation=].
partial interface MediaDevices {
  undefined setSupportedCaptureActions(sequence<DOMString> actions);
  attribute EventHandler oncaptureaction;
};

enum CaptureAction {
  "next",
  "previous",
  "first",
  "last"
};
When {{MediaDevices/setSupportedCaptureActions}} is invoked, the user agent MUST run the following steps:
The event type of the {{MediaDevices/oncaptureaction}} event handler is `"captureaction"`.
When {{MediaDevices}} is created, give it a [[\RegisteredCaptureActions]] internal slot, initialized to an empty list.
A {{CaptureActionEvent}} is fired on the captured application's {{MediaDevices}} object whenever an action it registered with {{MediaDevices/setSupportedCaptureActions}} is triggered. This lets the application respond by executing its implementation of that action.
[Exposed=Window]
interface CaptureActionEvent : Event {
  constructor(CaptureActionEventInit init);
  readonly attribute CaptureAction action;
};
dictionary CaptureActionEventInit : EventInit {
  DOMString action;
};
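A captured presentation application might register all four actions and map each one to a slide transition. In this sketch, `MediaDevicesLike` is a hypothetical structural stand-in for {{MediaDevices}}, and `SlideDeck` with its clamping at the first and last slides is an illustrative choice, since the meaning of each action is left to the captured application.

```typescript
type CaptureAction = "next" | "previous" | "first" | "last";

// Structural stand-ins for the WebIDL types; not TypeScript's built-in DOM typings.
interface CaptureActionEventLike {
  action: CaptureAction;
}

interface MediaDevicesLike {
  setSupportedCaptureActions(actions: string[]): void;
  oncaptureaction: ((event: CaptureActionEventLike) => void) | null;
}

// Hypothetical slide-deck state owned by the captured presentation app.
class SlideDeck {
  current = 0;
  readonly slideCount: number;

  constructor(slideCount: number) {
    this.slideCount = slideCount;
  }

  // This app clamps at the first and last slides; other apps may choose differently.
  apply(action: CaptureAction): void {
    switch (action) {
      case "next":
        this.current = Math.min(this.current + 1, this.slideCount - 1);
        break;
      case "previous":
        this.current = Math.max(this.current - 1, 0);
        break;
      case "first":
        this.current = 0;
        break;
      case "last":
        this.current = this.slideCount - 1;
        break;
    }
  }
}

// Registers all four actions and routes each captureaction event to the deck.
function enableCaptureActions(mediaDevices: MediaDevicesLike, deck: SlideDeck): void {
  mediaDevices.setSupportedCaptureActions(["next", "previous", "first", "last"]);
  mediaDevices.oncaptureaction = (event) => deck.apply(event.action);
}
```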
Capturing applications can enumerate available [=capture actions=] that are supported on the video track they have obtained, by using {{MediaStreamTrack/getSupportedCaptureActions}}, and can trigger those actions by using {{MediaStreamTrack/sendCaptureAction}}.
When a {{MediaStreamTrack}} is a video track derived from screen-capture of a browser display surface, {{MediaStreamTrack/getSupportedCaptureActions}} returns the set of available [=capture actions=], if any, supported by the captured application associated with this video track.
partial interface MediaStreamTrack {
  sequence<DOMString> getSupportedCaptureActions();
  Promise<undefined> sendCaptureAction(CaptureAction action);
};
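A capturing application would typically offer controls only for the actions the captured application supports. This hypothetical sketch returns one control per supported action. Because sending an action requires transient activation, a real page would invoke `send` from a user-initiated event such as a click. `CaptureActionTrackLike`, `SlideControl` and `buildSlideControls` are assumed names.

```typescript
type CaptureAction = "next" | "previous" | "first" | "last";

// Structural stand-in for a capturing MediaStreamTrack; not TypeScript's built-in DOM typings.
interface CaptureActionTrackLike {
  getSupportedCaptureActions(): string[];
  sendCaptureAction(action: CaptureAction): Promise<void>;
}

interface SlideControl {
  action: CaptureAction;
  // In a page, call this from a user-initiated event: sending requires transient activation.
  send: () => Promise<void>;
}

const ALL_ACTIONS: CaptureAction[] = ["next", "previous", "first", "last"];

// Hypothetical helper: one control per action that the captured application supports.
function buildSlideControls(track: CaptureActionTrackLike): SlideControl[] {
  const supported = track.getSupportedCaptureActions();
  return ALL_ACTIONS
    .filter((action) => supported.indexOf(action) !== -1)
    .map((action) => ({ action, send: () => track.sendCaptureAction(action) }));
}
```

Filtering against a fixed list keeps the controls in a stable order regardless of the order in which the captured application registered its actions.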
When {{MediaStreamTrack/getSupportedCaptureActions}} is invoked, the user agent MUST return [=this=]'s {{MediaStreamTrack/[[AvailableCaptureActions]]}} if defined, or `[]` if not defined.
When {{MediaStreamTrack/sendCaptureAction}} is invoked, the user agent MUST run the following steps:
Queue a task on the task-list of the captured browser display surface's [=top-level browsing context=]'s [=active document=] to run the following steps:
When a video {{MediaStreamTrack}} is created as part of the getDisplayMedia algorithm, whose source is a browser display surface, give it an [[\AvailableCaptureActions]] internal slot, initialized to the captured browser display surface's [=top-level browsing context=]'s [=Browsing context/active window=]'s associated navigator's {{MediaDevices}} object's {{MediaDevices/[[RegisteredCaptureActions]]}}.
While capture of a browser display surface is occurring, whenever that surface's [=top-level browsing context=] is navigated, then for each capturer of that surface, queue a task on that capturer's task-list to set all associated video {{MediaStreamTrack}}s' {{MediaStreamTrack/[[AvailableCaptureActions]]}} to `[]`.
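The life cycle of the {{MediaStreamTrack/[[AvailableCaptureActions]]}} slot can be modeled as follows. This is a simplified, non-normative sketch. `CapturedTab` and `CapturingTrack` are hypothetical, and task queuing is collapsed into synchronous updates. Navigation is assumed to leave the new document with an empty [[RegisteredCaptureActions]] list, since a newly created {{MediaDevices}} starts with one. Later calls to {{MediaDevices/setSupportedCaptureActions}} are not propagated, because those steps are not reproduced in this section.

```typescript
// Hypothetical model of a capturing video track and its [[AvailableCaptureActions]] slot.
class CapturingTrack {
  availableCaptureActions: string[];

  constructor(initial: string[]) {
    this.availableCaptureActions = initial;
  }

  // Mirrors getSupportedCaptureActions(); returns a copy so callers cannot mutate the slot.
  getSupportedCaptureActions(): string[] {
    return this.availableCaptureActions.slice();
  }
}

// Hypothetical model of a captured tab and its MediaDevices [[RegisteredCaptureActions]] slot.
class CapturedTab {
  registeredCaptureActions: string[] = [];
  private readonly tracks: CapturingTrack[] = [];

  // Models getDisplayMedia creating a video track whose source is this tab.
  capture(): CapturingTrack {
    const track = new CapturingTrack(this.registeredCaptureActions.slice());
    this.tracks.push(track);
    return track;
  }

  // Models navigation of the tab's top-level browsing context during capture.
  navigate(): void {
    // Assumption: the new document's MediaDevices starts with an empty list.
    this.registeredCaptureActions = [];
    // Every capturer's tracks lose their available actions.
    for (const track of this.tracks) {
      track.availableCaptureActions = [];
    }
  }
}
```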