Video Support
Prysm can play video/audio through the standard <video> element like so:
<video width="320" height="240" src="my_movie.webm">
</video>
The video player will:
- play opaque videos
- play transparent videos
- send audio data to the engine for playing (it won’t play it by itself)
- currently work only with videos
  - in the WebM format
  - with 1 video track encoded with VP8 or VP9
  - with 1 audio track encoded in Vorbis
The supported standard attributes are “src”, “autoplay”, “loop”, “muted” and “preload=auto”. The supported events are “durationchange”, “ended”, “loadstart”, “seeking”, “seeked”, “volumechange”.
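As a quick way to see which of these events fire during playback, the following sketch attaches one listener per listed event. The helper name `watchVideoEvents` and the `onEvent` callback are hypothetical and only serve the illustration; the event names are the ones listed above.

```javascript
// Events the text above lists as supported by the video player.
const SUPPORTED_EVENTS = [
  "durationchange", "ended", "loadstart",
  "seeking", "seeked", "volumechange",
];

// Attach one listener per supported event and report each event name
// through `onEvent` (a hypothetical callback, e.g. console.log).
// Returns a function that removes all listeners again.
function watchVideoEvents(video, onEvent) {
  const handler = (event) => onEvent(event.type);
  for (const name of SUPPORTED_EVENTS) {
    video.addEventListener(name, handler);
  }
  return () => {
    for (const name of SUPPORTED_EVENTS) {
      video.removeEventListener(name, handler);
    }
  };
}
```

In a page, this could be used as `const stop = watchVideoEvents(document.querySelector("video"), console.log);` and `stop()` called once the logging is no longer needed.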
Distribution
The video player is distributed in a separate dynamic library to reduce binary size for users who don’t need it. The separate library is called MediaDecoders.[PlatformName].[PlatformExtension] (e.g. MediaDecoders.WindowsDesktop.dll on Windows). You need to place that extra binary next to the core Cohtml library in your distribution package and it will be automatically loaded.
Distribution on iOS
iOS doesn’t support dynamic libraries so it’s an exception to the rule above. Instead of placing a dynamic library, on iOS, you’ll need to statically link to either libMediaDecoders.iOS.a or libMediaDecodersEmpty.iOS.a depending on whether you want to use the video feature or not.
How to encode videos
Prysm doesn’t have any special requirements for the video files as long as the format and the codecs used are supported. However, video quality and playback performance depend heavily on how the video was encoded. Using the latest versions of the VPX encoders is highly recommended. Performance may suffer if old encoders are used or certain encoder features are disabled.
We recommend using the latest release version of ffmpeg for encoding your videos even if your preferred video editing software has an option to export in WebM. We have found ffmpeg to provide the best video quality and resulting decoding performance with its default VPX encoder settings, which makes ffmpeg easy to use, without the need to pass additional encoder options.
The basic usage of ffmpeg for transcoding is the following:
- specify the input file with “-i”
- choose a video codec with “-c:v” (either “libvpx-vp9” or “libvpx” for VP8)
- specify a target bitrate for the video stream with “-b:v” (use “k” suffix for kilobits e.g. 512k or capital “M” suffix for megabits e.g. 3M)
- choose an audio codec with “-c:a” (e.g. “libvorbis”)
- specify a target bitrate for the audio stream with “-b:a”
- finish the command with an output filename whose extension is “webm”; ffmpeg will automatically set the format accordingly
Example command for transcoding:
ffmpeg -i VideoIn.mp4 -c:v libvpx-vp9 -b:v 1M -c:a libvorbis -b:a 128k VideoOut.webm
For better video quality, we recommend two-pass encoding, in which you run ffmpeg twice with almost the same settings, except that:
- in pass 1 and 2, use the “-pass 1” and “-pass 2” options, respectively.
- in pass 1, output to a null file descriptor, not an actual file. (This will generate a data file that ffmpeg needs for the second pass.)
- in pass 1, you need to specify an output format (with “-f”) that matches the output format you will use in pass 2.
- in pass 1, you can leave audio out by specifying “-an”.
Example commands for two-pass encoding (NUL is the null device on Windows; use /dev/null on Linux and macOS):
ffmpeg -i VideoIn.mp4 -c:v libvpx-vp9 -b:v 1M -an -pass 1 -f webm NUL
ffmpeg -i VideoIn.mp4 -c:v libvpx-vp9 -b:v 1M -c:a libvorbis -b:a 128k -pass 2 VideoOut.webm
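If the encoding is driven from a build script, the two commands can be generated from one set of settings so that both passes stay in sync. The sketch below only assembles the argument lists; the helper name `buildTwoPassArgs` and its parameter names are hypothetical, while the ffmpeg options are the ones used in the commands above.

```javascript
// Build the ffmpeg argument lists for a two-pass VP9/Vorbis encode.
// `nullDevice` is "NUL" on Windows and "/dev/null" on Linux and macOS.
function buildTwoPassArgs({ input, output, videoBitrate, audioBitrate, nullDevice }) {
  // Video settings shared by both passes.
  const video = ["-i", input, "-c:v", "libvpx-vp9", "-b:v", videoBitrate];
  return {
    // Pass 1: no audio, explicit container format, output discarded.
    pass1: [...video, "-an", "-pass", "1", "-f", "webm", nullDevice],
    // Pass 2: same video settings plus the audio stream and the real output file.
    pass2: [...video, "-c:a", "libvorbis", "-b:a", audioBitrate, "-pass", "2", output],
  };
}

const { pass1, pass2 } = buildTwoPassArgs({
  input: "VideoIn.mp4",
  output: "VideoOut.webm",
  videoBitrate: "1M",
  audioBitrate: "128k",
  nullDevice: "NUL",
});
console.log("ffmpeg " + pass1.join(" "));
console.log("ffmpeg " + pass2.join(" "));
```

With the values shown, the two printed lines match the example commands above.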
Refer to the ffmpeg documentation to learn how to adjust resolution, frame rate, audio channels and other media properties as well.
Transparent video support
The basic authoring process of transparent videos goes as follows:
- Export a video with an alpha channel, if such a format is available in your video editing tool (e.g. QuickTime PNG with RGBA), OR export a sequence of transparent PNGs.
- Feed the transparent video OR the sequence of PNGs (e.g. -i sequence-%05d.png) as input to ffmpeg. The ffmpeg commands are the same as above, with the addition of the “-pix_fmt yuva420p” switch, which enables transparency for VPX.
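As an illustration of the second step, the sketch below assembles a single-pass argument list for a PNG sequence. The helper name `buildTransparentArgs`, the choice of VP9, the bitrate and the output name are assumptions for the example; only the input pattern and the “-pix_fmt yuva420p” switch come from the text above.

```javascript
// Build ffmpeg arguments for encoding a transparent WebM video.
// The only difference from an opaque encode is the "-pix_fmt yuva420p" switch.
function buildTransparentArgs({ input, output, videoBitrate }) {
  return [
    "-i", input,
    "-c:v", "libvpx-vp9",
    "-b:v", videoBitrate,
    "-pix_fmt", "yuva420p", // keep the alpha channel
    output,
  ];
}

const args = buildTransparentArgs({
  input: "sequence-%05d.png",
  output: "VideoOut.webm",
  videoBitrate: "1M",
});
console.log("ffmpeg " + args.join(" "));
```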
Video playback performance
The video playback performance depends on the amount of data that needs to be processed and the amount of video data that needs to be displayed. The size of the processing data is determined by the bitrate and the size of the display data is determined by the resolution. Using transparency in videos adds more data to process and an additional channel to display.
- Bitrate - lower bitrate yields better performance
- Resolution - lower resolution yields better performance
- Transparency - adds up to 60% more data for processing, depending on how complex the alpha masking is, and 25% more display data
Most video compression algorithms reuse the frame data from the previous frame and process only the changes, as this proves to be a very efficient compression technique for most videos. This makes the size of the processing data depend heavily on the amount of motion and the number of scene switches in the video. Due to the dynamic nature of most videos, the amount of data to process varies from frame to frame.
The bitrate is the number of bits that are processed in a unit of time. Video data rates are given in bits per second. There are two methods of compression:
- Using constant bitrate (CBR), the raw data is compressed as much as needed to meet the target bitrate for each frame. This means that the video quality varies depending on the dynamics of the scene. This method ensures stable performance for the entire duration of the video, but it is considered wasteful and is best avoided.
- Using variable bitrate (VBR), the raw data is compressed by a fixed amount, calculated so that the average over the entire duration of the video meets the target bitrate. This means that the video quality stays constant throughout the video, but performance issues may appear in parts where the bitrate is very high. When using VBR, we recommend setting an upper limit on the bitrate ("-maxrate" in ffmpeg).
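To get a feel for how much data a given target bitrate represents, the arithmetic is bits per second times duration, divided by 8 to get bytes. The sketch below applies it to ffmpeg-style bitrate strings; the helper names `parseBitrate` and `estimateStreamBytes` are hypothetical, and the result is an average that ignores container overhead and VBR peaks.

```javascript
// Parse an ffmpeg-style bitrate such as "512k" or "3M" into bits per second.
// "k" means 1000 and "M" means 1000000.
function parseBitrate(text) {
  const match = /^(\d+(?:\.\d+)?)([kM]?)$/.exec(text);
  if (!match) throw new Error(`Unrecognized bitrate: ${text}`);
  const scale = { "": 1, k: 1000, M: 1000000 }[match[2]];
  return Number(match[1]) * scale;
}

// Average number of bytes to process for a stream of the given duration.
function estimateStreamBytes(bitrate, durationSeconds) {
  return (parseBitrate(bitrate) * durationSeconds) / 8;
}

// A 60-second video at 1M video plus 128k audio:
const total = estimateStreamBytes("1M", 60) + estimateStreamBytes("128k", 60);
console.log(total); // 8460000 bytes, about 8.5 MB
```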
The resolution isn’t usually changed during playback, so it can be said that it has a fixed performance cost. Higher resolution or frame rate means more data to process, hence it will require a higher bitrate to preserve the video quality.
VP8 vs VP9 - We have found that, with exactly the same encoding properties, VP8 gives better performance but produces worse video quality, while VP9 performs slower but produces better video quality. If we compensate for the quality difference by adjusting the bitrate, either higher for VP8 or lower for VP9, the performance difference evens out. Additionally, for the same perceived quality, VP9 results in smaller file sizes due to the lower bitrate. We recommend using VP9 when possible; VP8 is generally used for compatibility reasons.
Seek performance
Seeking a video is an unexpected event both for the video player from the buffering standpoint and for the media itself.
Because most video compression formats only store incremental changes between frames (except for keyframes), it is not possible to directly seek at any arbitrary point in the video stream. In order to display an arbitrary interframe (non-keyframe), the decoder must start with the nearest previous keyframe and apply the changes of all interframes to that point.
Simplifying this to plain data looks like this (K - keyframe, I - interframe): [154(K), +6(I), -10(I), +5(I), 212(K), -15(I), …]. If we seek to the third frame (-10(I)), in order to know the value we have to calculate 154 + 6 - 10 = 150.
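The same simplified model can be written as a few lines of code, which also makes the cost of a seek visible as the number of frames that must be decoded. This is an illustration of the idea only, not how the decoder is implemented; the function name `decodeFrameAt` and the frame representation are hypothetical.

```javascript
// Each frame is either a keyframe holding a full value
// or an interframe holding a change relative to the previous frame.
const frames = [
  { key: true, value: 154 },
  { key: false, value: 6 },
  { key: false, value: -10 },
  { key: false, value: 5 },
  { key: true, value: 212 },
  { key: false, value: -15 },
];

// Reconstruct the frame at `index` by walking back to the nearest previous
// keyframe and applying every interframe up to the requested one.
// Assumes a valid index and that the first frame is a keyframe.
function decodeFrameAt(frames, index) {
  let start = index;
  while (!frames[start].key) start--;
  let value = frames[start].value;
  for (let i = start + 1; i <= index; i++) value += frames[i].value;
  return { value, framesDecoded: index - start + 1 };
}

console.log(decodeFrameAt(frames, 2)); // { value: 150, framesDecoded: 3 }
console.log(decodeFrameAt(frames, 4)); // { value: 212, framesDecoded: 1 }
```

Seeking to a keyframe (index 4) decodes a single frame, while seeking to an interframe decodes everything since the previous keyframe.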
All videos start with a keyframe and encoders create additional keyframes only when that benefits the compression - when a keyframe is smaller in size than an interframe describing the changes (e.g. on scene switch). This means that you can have a video that doesn’t have a keyframe for several seconds. Seeking to such a point can require decoding hundreds of frames before you get the desired frame to show.
If you want to seek a video with good performance, make sure that the seek happens at a keyframe to avoid decoding more than one frame. We have added a custom attribute/property that does that automatically: the cohfastseek attribute and the HTMLMediaElement.cohFastSeek property, which, when present or enabled, will force the seek to happen on the nearest keyframe. For example, if we have keyframes at the time points [0s, 1.5s, 5s], a seek with currentTime = 1.4; will seek to 1.5s instead, and reading currentTime will report 1.5.
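The snapping behaviour can be modelled with a small function that picks the keyframe timestamp closest to the requested time. This is a sketch of the described behaviour, not the engine's implementation; the name `nearestKeyframe` is hypothetical, and resolving ties toward the earlier keyframe is an assumption.

```javascript
// Return the keyframe timestamp closest to `time`.
// On a tie the earlier keyframe wins (an assumption of this sketch).
// With no known keyframes the requested time is returned unchanged.
function nearestKeyframe(keyframes, time) {
  if (keyframes.length === 0) return time;
  return keyframes.reduce((best, t) =>
    Math.abs(t - time) < Math.abs(best - time) ? t : best);
}

console.log(nearestKeyframe([0, 1.5, 5], 1.4)); // 1.5
```

On a real element no such helper should be needed: with the cohfastseek attribute present, or the cohFastSeek property enabled, assigning `currentTime = 1.4` is described as landing on 1.5 directly.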
You can manually force the encoder to create keyframes at the desired seek points with ffmpeg by passing the following argument:
-force_key_frames 0:05:00,0:07:50,...
You can read the timestamps of the keyframes in a video by using another custom API:
HTMLMediaElement.cohGetKeyframeTimestamps() - returns an array of timestamps, in seconds, of all keyframes. This info is available only after the video metadata is parsed; otherwise, an empty array is returned.
The video player requires at least two buffers to guarantee smooth playback as the data is processed asynchronously. During normal playback, the video player buffers future encoded frame data and future decoded frames and discards past data to minimize memory consumption.
The encoded frame data can get delayed due to slow I/O operations. The player buffers future encoded frame data to mitigate this and to be able to send the data to the decoders on time - just before it needs to display the frame. The decoded frame data needed to draw the frame on screen can also get delayed. A decoding delay can happen when the decode request cannot be processed immediately or when the frame being decoded takes more time than usual (different frames require a different amount of processing). The player buffers future decoded frame data to mitigate such delays and prevent frame misses.
In most cases, a seek event won’t be in a buffered region. This makes it vulnerable to delays both during obtaining the frame data and decoding. We have added the following API to mitigate the delay in obtaining the frame data:
HTMLMediaElement.cohPrebufferKeyframe(double timestamp) - pre-buffers the encoded keyframe data, so a seek to that point can immediately schedule decoding. This API accepts only timestamps that are keyframes. It can be used in conjunction with cohGetKeyframeTimestamps:
// Prebuffer all keyframes
video.cohGetKeyframeTimestamps().forEach(t => video.cohPrebufferKeyframe(t));
There is also a declarative way to preload keyframes to improve seek performance by adding the “preload” attribute to the video element which when set to “auto” will preload all known keyframes.
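Because the keyframe list is empty until the video metadata is parsed, a script that runs early may need to wait before pre-buffering. The sketch below retries once on the durationchange event, on the assumption (based on that event's description) that the metadata is available by then. The helper name `prebufferKeyframesWhenReady` is hypothetical.

```javascript
// Pre-buffer every keyframe, waiting for the metadata if it is not parsed yet.
function prebufferKeyframesWhenReady(video) {
  const prebuffer = () => {
    const timestamps = video.cohGetKeyframeTimestamps();
    timestamps.forEach((t) => video.cohPrebufferKeyframe(t));
    return timestamps.length;
  };
  // Metadata already parsed: the keyframes were pre-buffered right away.
  if (prebuffer() > 0) return;
  // Otherwise retry once when the duration becomes known
  // (assumed to mean that the metadata has been parsed).
  video.addEventListener("durationchange", prebuffer, { once: true });
}
```

This is the scripted counterpart of the declarative “preload” attribute; for most pages the attribute is likely the simpler choice.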
Audio support
Prysm does not play audio by itself. All audio data is decoded, converted to pulse-code modulation (PCM) and passed to the engine for further processing. The PCM data is passed through several callbacks on the cohtml::IViewListener interface (look for the OnAudio* methods).
You can use your engine’s audio system to enqueue the PCM data in the sound buffers and get it playing. There are two reference implementations available: one based on Windows' XAudio2 and one on OpenAL. Both can be found under Modules/AudioSystem/. The Audio system module provides an abstraction over both implementations and can also be used directly in the engine by including the source file and linking to the corresponding third-party dependencies.
Take a look at the Sample_VideoPlayer sample in the distribution package for more info.
How to play a video with controls
In order to use controls, you need to include an additional JavaScript library which you can find under Samples/uiresources/VideoPlayback. After you include the library, use the custom HTML element <video-with-controls>:
<video-with-controls id="myVideo" src="my_movie.webm" width="320px" height="240px">
</video-with-controls>
To use the library you must:
- copy the files from Samples/uiresources/VideoPlayback to your UI directory
  - video_controls_images folder
  - video_controls.js
  - video_controls.css
- include video_controls.js at the beginning of your HTML file
- include video_controls.css which contains the styles for the controls
The JavaScript API is the same as the one for HTMLMediaElement:
- Supported attributes:
  - src
  - width
  - height
  - autoplay
  - loop
  - muted
- Supported properties:
  - paused
  - ended
  - loop
  - autoplay
  - currentTime
  - duration
  - volume
  - muted
  - src
- Supported methods:
  - play()
  - pause()
- Events are not supported
If the API exposed by our video container element is not sufficient for your needs, you can get the video element itself:
let video = document.getElementById("myVideo").querySelector("video");
Customizing controls
If you want to customize the way your video player looks, copy your custom images into the uiresources/VideoPlayback/video_controls_images folder and keep the file names the same.
If you want to customize the font size or another inherited property, add a CSS style to our video container element.
Media playback events
The media element supports the following standard events:
- durationchange: The metadata has loaded or changed, indicating a change in duration of the media. This is sent, for example, when the media has loaded enough that the duration is known.
- emptied: The media has become empty; for example, this event is sent if the media has already been loaded (or partially loaded), and the load() method is called to reload it.
- ended: Sent when playback completes.
- pause: Sent when the playback state is changed to paused (the paused property is true).
- play: Fired after the play() method has returned, or when the autoplay attribute has caused the playback state to change. Note that this does not necessarily mean there’s actual playback, since the network request can be delayed. See the playing event for more information.
- playing: Sent when the media has enough data to start playing, after the play event.
- seeked: Sent when a seek operation completes.
- seeking: Sent when a seek operation begins.
- timeupdate: The time indicated by the element’s currentTime attribute has changed. Note that this event is not fired during normal playback.
- volumechange: Sent when the audio volume changes (both when the volume is set and when the muted attribute is changed).
- error: Sent when there is a media error. Network errors with status code 416 Range Not Satisfiable are ignored and won’t trigger the error event.
There are a few custom events not present in the standard:
- cohplaybackstalled: Sent when a decoder is unable to keep up with the playback rate and cannot provide new frames quickly enough. This event is useful when trying to synchronize the video playback with other animations, since the frame decoding is asynchronous and not tied in any way to the View timer. Usually, pausing any animations that need to be in sync with the video on the cohplaybackstalled event and resuming them on the cohplaybackresumed event is enough.
- cohplaybackresumed: Sent when the video/audio decoder that previously stalled the playback has caught up and provided new frames.
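The pause-and-resume pattern described for these two events can be sketched as follows. How the animations are actually paused depends on how they are driven, so the sketch takes `pause` and `resume` callbacks from the caller; the helper name `syncWithVideo` and both callback names are hypothetical.

```javascript
// Pause dependent animations while the video is stalled and resume them
// once the decoder has caught up. `pause` and `resume` are supplied by the
// caller. Returns a function that removes the listeners again.
function syncWithVideo(video, { pause, resume }) {
  const onStalled = () => pause();
  const onResumed = () => resume();
  video.addEventListener("cohplaybackstalled", onStalled);
  video.addEventListener("cohplaybackresumed", onResumed);
  return () => {
    video.removeEventListener("cohplaybackstalled", onStalled);
    video.removeEventListener("cohplaybackresumed", onResumed);
  };
}
```

For CSS animations, one possible pair of callbacks toggles a class that sets `animation-play-state: paused` on the animated elements.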
Showing video preview
Currently, Prysm does not render video frames unless the video is playing or has been seeked. This means that no image of the video is shown initially. In order to show a preview, either play the video or simply seek to the desired time in seconds, e.g. videoElement.currentTime = 0;
Resource handler
When playing a video, Prysm will send range requests to the client’s implementation of cohtml::IAsyncResourceHandler::OnResourceRequest to avoid loading the entire video file in memory and potentially running out of memory on platforms with limited hardware. To see how to handle range requests, check the reference implementation in the resource::ResourceHandler class, which is used across the samples.
Responding to range requests
Prysm always sends range requests for videos by including the Range header in the request. Range requests enable streaming resources over the network or from disk instead of loading the entire resource in memory. The client implementation should read the Range header to figure out which part of the resource is requested and then respond by providing the requested data and setting the status to 206 (HTTP Partial Content). If the client implementation does not set the status to 206, the response will be interpreted as a regular response, and Prysm will assume that the whole resource data is provided and won’t send more requests.
End of stream handling
When the client implementation responds with partial content, Prysm reads the Content-Range header in the response and is specifically interested in the range-end and size directives to determine whether the end of the stream is reached. The client implementation is expected to provide a valid Content-Range header in the response; otherwise, Prysm will issue an out-of-bounds request when the end of the stream is reached.
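The header arithmetic involved in answering a range request can be sketched independently of the engine API. The actual handler is written against cohtml::IAsyncResourceHandler in native code; the JavaScript below only illustrates the logic, using the standard HTTP formats `Range: bytes=start-end` and `Content-Range: bytes start-end/size`. The function names are hypothetical, and only single ranges with an explicit start are handled, which is an assumption about the requests being sent.

```javascript
// Parse a single-range header such as "bytes=0-1023" or "bytes=1024-".
// Returns inclusive byte offsets clamped to the resource size,
// or null if the header is not understood or the range cannot be satisfied.
function parseRange(header, size) {
  const match = /^bytes=(\d+)-(\d*)$/.exec(header);
  if (!match) return null;
  const start = Number(match[1]);
  const end = match[2] === "" ? size - 1 : Math.min(Number(match[2]), size - 1);
  if (start >= size || start > end) return null;
  return { start, end };
}

// Describe the partial response for a request: status 206, a valid
// Content-Range header value and the number of bytes to provide.
function buildPartialResponse(header, size) {
  const range = parseRange(header, size);
  if (!range) return null;
  return {
    status: 206,
    contentRange: `bytes ${range.start}-${range.end}/${size}`,
    length: range.end - range.start + 1,
  };
}

console.log(buildPartialResponse("bytes=4096-", 5000));
// { status: 206, contentRange: "bytes 4096-4999/5000", length: 904 }
```

In the last response the range end (4999) is the final byte of a 5000-byte resource, which is how the end of the stream can be recognized from the header.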
Lifetime of video resources
Lifetime flow in <video>:
- Create a <video> element with a video resource specified by the src attribute.
- The element will add a reference to the resource.
- The reference will be removed when:
  - The src attribute is changed.
  - The Garbage Collector destroys the element.
You can release the video resource reference when needed by changing the src attribute.
For example, let’s say you want the resource reference to be released when the <video> element is detached from the DOM tree. In that case, you may use the <video-with-controls> custom element and unset the src attribute inside the disconnectedCallback function, like so:
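// JavaScript
class CohVideo extends HTMLElement {
    //...
    disconnectedCallback() {
        // Unset src to release the video resource reference
        // when the element is detached from the DOM tree.
        this.videoElement.setAttribute("src", "");
    }
    //...
}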