Text To Speech

Overview

To make games more accessible, Prysm provides the TextToSpeech module, which helps game developers integrate text-to-speech capabilities into the game UI. Any text from a cohtml::View can be spoken to the end user through the audio output. Text to speech consists of two steps: synthesizing and playing audio. Synthesizing converts arbitrary text into audio data (e.g. a WAVE audio stream). Playing audio sends the generated audio stream to the audio output, either directly or through the game engine's audio system.

Here are some example use-cases of what can be easily achieved with this feature:

  • When a user hovers over or focuses an element, its content is spoken.
  • Voicing messages from a game chat.
  • Reading urgent notifications instantly to the audio output.
  • Reading text from an input field while a user is typing.

How to use

In order to integrate this feature into a game you need to:

  • Integrate the SpeechAPI C++ library (implement the needed interfaces, properly handle audio and worker tasks)
  • Integrate the SpeechAPI JS library into the UI and use it to schedule/discard speech requests
  • Optionally, integrate the ARIA JS library. You can use the default plugins or extend the library with your own plugins

Read this document and examine the packaged TextToSpeech sample to understand the overall flow.

Basic flow

The Text To Speech module contains 3 major components:

  • SpeechAPI C++ library speech_api_cpp_lib - implements a cross-platform API for synthesizing and speaking a speech request to the audio output
  • SpeechAPI JS library speech_api_js_lib - implements a queue management system for speech requests with 4 priority channels
  • ARIA JS library - implements a plugin system for accessibility plugins, such as the one supporting ARIA live regions, which can be used in a game UI in a declarative way.

The basic flow looks like this:

  1. The game process initializes the Manager with needed listeners and synthesizers. Note that default ones can be included from DefaultSynthesizers and DefaultAudioListeners.
  2. When a view is ready for bindings, we enable the view to use the speech module.
  3. The view loads the ARIA and the speech API JS libraries.
  4. Plugins inside ARIA lib schedule speech requests in Speech API JS lib.
  5. The Speech API JS lib, using binding calls, notifies the controller to schedule a new speech request.
  6. The controller updates the state and notifies the client's processing listener that a new task is available.
  7. The client invokes processing on a worker thread, which is carried out by the provided synthesizer implementation.
  8. Once the request is processed, the client worker thread asks the manager to process the audio using the processor.
  9. If needed, the AudioListener callbacks are invoked from the processor. Note that on some platforms step 7 already sends the audio directly to the audio output.
  10. If needed, the AudioListener schedules the PCM/WAVE data for playback in the in-game audio system.
  11. Speech API JS lib uses the controller to manage request queues and check the state of the last scheduled speech request.

For more details, please refer to the code documentation in the TextToSpeech module and the TextToSpeech sample.

SpeechAPI C++ library

The most important entities of this system are:

Speech synthesizers manager

SpeechSynthesizerManager is the API entry point for managing synthesizing for a single View. Use it to interact with the Speech API module.

The SpeechSynthesizerController is used by the manager to handle View binding calls.

The SpeechSynthesizerProcessor is used by the manager to handle processing work like audio data processing on an audio thread or speech request processing on a worker thread.

Speech synthesizers

ISpeechSynthesizer is a cross-platform interface for text synthesizing used by the manager. Default implementations are located under DefaultSynthesizers folder.

Note that some synthesizers expect the platform environment to be initialized throughout their lifetime. This should be done by the game, because the game may need to use the environment outside of the synthesizers' lifetime.

Currently, we support GDK and PS5 platforms. Each platform defines its own ISpeechSynthesizer. For example, for GDK platforms, XSpeechSynthesizer API is used to implement the synthesizer interface.

You can implement custom synthesizers and use them on the target platforms. One valid approach is to implement a custom synthesizer backed by a third-party cloud service for synthesizing; such a synthesizer can be used on all platforms with Internet access.

Listeners

ISpeechSynthesizerProcessingListener, ISpeechSynthesizerAudioListener - these listeners are implemented by the client to process worker and audio tasks.

SpeechAPI JS library

This library maintains request channels and speaks a single request from a single channel at a time, using the bindings defined in SpeechSynthesizerController (see SPEECH_EVENT_NAMES). It provides an asynchronous API for adding, discarding, and managing speech requests across 4 priority channels.
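
A minimal sketch of how scheduling and discarding a request could look is shown below. The names used (speechAPI, speak, discard, and the priority constant) are hypothetical placeholders rather than the actual API; consult cohtml-speech-api.js for the real function names.

// A hypothetical sketch of the usage pattern; speechAPI, Priority, speak,
// and discard are placeholder names. Refer to cohtml-speech-api.js for
// the actual API surface.
const request = speechAPI.speak("Quest completed!", {
    // One of the four priority channels; higher-priority requests
    // are spoken before lower-priority ones.
    priority: speechAPI.Priority.High,
});

// Requests are asynchronous; a request that is no longer relevant
// (e.g. its notification was dismissed) can be discarded.
const dismissButton = document.getElementById("dismiss-button"); // example element
dismissButton.addEventListener("click", () => {
    speechAPI.discard(request);
});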

For more information please refer to the documentation in cohtml-speech-api.js.

ARIA JS library

This library uses SpeechAPI JS lib to implement accessibility plugins.

Several implemented plugins can be found in the plugins folder:

  • CohtmlARIAHoverReadPlugin implements speaking hovered elements with aria-label support.
  • CohtmlARIAFocusChangePlugin implements speaking the content of focused elements.
  • CohtmlARIALiveRegionsPlugin implements speaking ARIA live region changes.
  • … and more

In order to use this library, the UI needs to load library scripts and required plugins, initialize the CohtmlARIAManager with loaded plugins, and ask the ARIA manager to observe a specific DOM subtree.

<!-- Common lib files -->
<script src="../js/aria-js/cohtml-aria-utils.js"></script>
<script src="../js/aria-js/cohtml-aria-common.js"></script>
<script src="../js/aria-js/cohtml-aria-plugin.js"></script>
<!-- Include needed Plugins -->
<script src="../js/speechAPI/cohtml-speech-api.js"></script>
<script src="../js/aria-js/plugins/cohtml-aria-live-region.plugin.js"></script>
<script src="../js/aria-js/plugins/cohtml-aria-hover-read.plugin.js"></script>
<script src="../js/aria-js/plugins/cohtml-aria-focus-change.plugin.js"></script>
<!-- Entry point scripts -->
<script src="../js/aria-js/cohtml-aria-observer.js"></script>
<script src="../js/aria-js/cohtml-aria-manager.js"></script>


<script>
// Create a manager with specific plugins
const manager = new CohtmlARIAManager([
        // include only needed plugins
        new CohtmlARIALiveRegionsPlugin(),
        new CohtmlARIAFocusChangePlugin(),
        new CohtmlARIAHoverReadPlugin(),
]);
// Observe a specific document root
manager.observe(document.body);
</script>

Plugins extension

It’s possible to add custom plugins and extend the current ones. All plugins should implement CohtmlARIAPlugin from cohtml-aria-plugin.js.

Here’s an example of a simple custom plugin. For more complex plugins, check existing ones in the plugins folder.

// This plugin is an example of how to handle custom events
// Use new CustomEvent("name") and target.dispatchEvent(customEvent)
// to trigger this plugin
class CohtmlARIACustomEventPlugin extends CohtmlARIAPlugin {
    onDOMEvent(event) {
        if (event.type === "customuserdefined") {
            console.log("Fired customuserdefined event: ", event);
        }
    }

    get observedDomEvents() {
        return ["customuserdefined"];
    }
}
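
Once defined, the plugin is registered like any other plugin and triggered by dispatching the matching DOM event. A minimal sketch, assuming the library scripts shown earlier are already loaded (the #notifications element is just an example target):

// Register the custom plugin alongside any other plugins you need
const manager = new CohtmlARIAManager([
    new CohtmlARIACustomEventPlugin(),
]);
manager.observe(document.body);

// Trigger the plugin by dispatching the custom event on an observed element
const target = document.getElementById("notifications"); // example element
// bubbles in case the listener is attached on the observed root
target.dispatchEvent(new CustomEvent("customuserdefined", { bubbles: true }));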

React support

The library and its live region change support also work in a React SPA. Usage does not differ from the standard one: the root of the rendered document fragment needs to be explicitly observed, as shown in the sketch below.
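
For example, a React component can observe its own subtree once it has mounted. The sketch below is only an illustration: it assumes the ARIA library scripts are already loaded so that CohtmlARIAManager and the plugins are available as globals, and it uses React.createElement to stay free of JSX tooling.

// A minimal sketch: observe the rendered subtree after the component mounts.
// Assumes CohtmlARIAManager and CohtmlARIALiveRegionsPlugin are loaded globally.
function AccessibleRoot({ children }) {
    const rootRef = React.useRef(null);

    React.useEffect(() => {
        const manager = new CohtmlARIAManager([
            new CohtmlARIALiveRegionsPlugin(),
        ]);
        // Explicitly observe the React-rendered document fragment root
        manager.observe(rootRef.current);
    }, []);

    return React.createElement("div", { ref: rootRef }, children);
}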

Samples

Several examples of how to integrate the JS libraries into a game UI can be found in the Samples uiresources under the TextToSpeech/examples folder.

To see examples in action, refer to the SampleTextToSpeech sample. Note that it’s packaged only on supported platforms.