DX12 Rendering Backend

Gameface comes with several example rendering backends. One of them is the DX12 backend, which uses the DirectX 12 rendering API. This page explains the caveats of the DX12 backend and how to use it properly in a client application. For more general information on the rendering flow of Gameface, see Rendering Architecture.

Using and driving the DX12 backend

An application that uses the DirectX 12 rendering API usually has several frames in flight, meaning that commands for multiple frames are recorded in command lists at the same time. In addition, command lists submitted to a command queue are not executed immediately, and the GPU might continue to use them for some time. The application code therefore has to be careful not to reuse or destroy a GPU resource that might still be in use by the GPU.

For those reasons, the DX12 backend has a concept of a rendering frame. The backend accounts for multiple frames in flight (4 by default; see the comments near Dx12Backend::MAX_FRAMES_IN_FLIGHT in Dx12Backend.h). Internally, the backend rotates and tracks resources so that the same ones are not used every frame. When handling BRC_Destroy* commands, it also does not delete resources straight away but only when it is safe to do so.
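
The deferred deletion described above can be pictured with a small, self-contained sketch. It is not the backend's actual code: DeferredDeleter, its method names, and the integer resource ids are hypothetical, and the sketch assumes that fence values never decrease from one frame to the next.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical stand-in for deferred deletion; not part of Gameface.
// Resources are represented by plain integer ids for illustration.
class DeferredDeleter {
public:
    // Called when a destroy command is handled. The resource is only queued,
    // tagged with the fence value of the frame that may still reference it.
    void QueueDestroy(int resourceId, std::uint64_t frameFence) {
        // Assumption: fence values are monotonically non-decreasing.
        assert(m_Pending.empty() || m_Pending.back().fence <= frameFence);
        m_Pending.push_back({resourceId, frameFence});
    }

    // Called once the GPU is known to have reached 'completedFence'.
    // Returns the ids that are now safe to release.
    std::vector<int> Collect(std::uint64_t completedFence) {
        std::vector<int> released;
        while (!m_Pending.empty() && m_Pending.front().fence <= completedFence) {
            released.push_back(m_Pending.front().id);
            m_Pending.pop_front();
        }
        return released;
    }

    std::size_t PendingCount() const { return m_Pending.size(); }

private:
    struct Entry {
        int id;
        std::uint64_t fence;
    };
    std::deque<Entry> m_Pending;  // ordered by fence value
};
```

In this model, a resource queued during the frame with fence value 2 stays alive through Collect(1) and is released by Collect(2) or any later call.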

The safe use of GPU resources across frames in flight is achieved through the use of “fence values”. The correct flow for driving the DX12 backend is:

  1. At the beginning of a frame on the rendering thread, before all cohtml::Views are painted, call Dx12Backend::BeginFrame(list) with the command list object that will be used for this frame.
  2. At the end of the frame, when all cohtml::Views have been painted, call Dx12Backend::EndFrame(). The returned uint64_t value is the “fence value” for this frame.
  3. When the GPU has finished its work with the command list for this frame, call Dx12Backend::ListComplete(frameFenceValue) with the corresponding fence value.

The call to Dx12Backend::ListComplete signals to the backend that the GPU no longer uses any of the resources from the frame with the given fence value, so they can be destroyed if needed. Knowing when the GPU has finished work on a command list is usually achieved with DX12 fences and calls to ID3D12CommandQueue::Signal.

A typical flow for using the DX12 backend is as follows:

// Signal the completion of a frame that has been
// previously recorded. The completed value of 'BackendFence'
// is the fence value of the last frame finished by the GPU
previousFrameFence = BackendFence->GetCompletedValue();
dx12Backend->ListComplete(previousFrameFence);

list = AcquireCommandListForCurrentFrame();

// Begin the frame for the backend
// 'list' is an ID3D12GraphicsCommandList*
dx12Backend->BeginFrame(list);

// Paint all views. This records rendering commands in 'list'
cohtmlView_1->Paint(...);
...
cohtmlView_N->Paint(...);

// End the frame and get a fence value for this frame
frameFence = dx12Backend->EndFrame(list);

// Execute the command list that was recorded for this frame
ID3D12CommandList* lists[]{ list };
CommandQueue->ExecuteCommandLists(1, lists);

// Tell the GPU to update 'BackendFence' when
// it reaches this point in the command queue
CommandQueue->Signal(BackendFence, frameFence);
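
The flow above does not show how an application keeps the number of frames in flight within the configured limit. A minimal sketch of one way to track that on the CPU side follows. InFlightFrames and its methods are hypothetical names, not part of Gameface or DirectX 12, and the sketch assumes only that the fence values returned for successive frames increase.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical helper that tracks the fence values of submitted frames.
class InFlightFrames {
public:
    explicit InFlightFrames(std::size_t maxFramesInFlight)
        : m_Max(maxFramesInFlight) {
        assert(maxFramesInFlight > 0);
    }

    // Record the fence value of a frame right after it has been submitted.
    void OnSubmitted(std::uint64_t frameFence) {
        // Assumption: each frame gets a larger fence value than the last.
        assert(m_Pending.empty() || m_Pending.back() < frameFence);
        m_Pending.push_back(frameFence);
    }

    // Forget every frame the GPU has finished, given the fence's
    // currently completed value.
    void OnCompleted(std::uint64_t completedFence) {
        while (!m_Pending.empty() && m_Pending.front() <= completedFence) {
            m_Pending.pop_front();
        }
    }

    // True when another frame must not be recorded before the GPU catches up.
    bool IsFull() const { return m_Pending.size() >= m_Max; }

    // Fence value to wait for when IsFull() returns true.
    std::uint64_t OldestPending() const {
        assert(!m_Pending.empty());
        return m_Pending.front();
    }

    std::size_t Count() const { return m_Pending.size(); }

private:
    std::size_t m_Max;
    std::deque<std::uint64_t> m_Pending;  // oldest frame first
};
```

Before recording a new frame, an application could call OnCompleted with BackendFence->GetCompletedValue() and, if IsFull() still returns true, block until the fence reaches OldestPending(), for example with ID3D12Fence::SetEventOnCompletion. The completed value is then passed to Dx12Backend::ListComplete as in the flow above.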

Descriptor Heaps in the DX12 Backend

A big part of all rendering backends is the management of GPU resources. Resources are created when handling BRC_Create* backend commands and are at some point destroyed when handling BRC_Destroy* commands. The managed resources are:

  • Textures - see BRC_CreateTexture, BRC_CreateTextureWithDataPtr, and BRC_DestroyTexture
  • Depth-Stencil Textures - see BRC_CreateDSTexture and BRC_DestroyDSTexture
  • Vertex buffers - see BRC_CreateVertexBuffer and BRC_DestroyVertexBuffer
  • Index buffers - see BRC_CreateIndexBuffer and BRC_DestroyIndexBuffer
  • Constant buffers - see BRC_CreateConstantBuffer and BRC_DestroyConstantBuffer
  • Pipeline state objects - see BRC_CreatePSO and BRC_DestroyPSO

However, in DX12 there are a couple of other resource types that have to be managed: the GPU visible and the CPU staging descriptor heaps. For more information, see the official Descriptor Heaps Overview.

With regard to descriptor heap usage, the DX12 backend implements the following scheme:

  • Views of resources (shader resource views, depth-stencil views, render target views) are placed on CPU staging heaps when created. See Dx12Backend::m_SRVStagingDesc, Dx12Backend::m_RTVStagingDesc, Dx12Backend::m_DSVStagingDesc, and Dx12Backend::m_SamplersStagingDesc.
  • When views need to be bound, they are first copied to GPU visible heaps. See the logic in Dx12Backend::SetPSTextures.

GPU visible descriptor heaps are kept in a ring buffer in Dx12Backend::m_FrameDescriptors, and every frame the DX12 backend uses the next entry in the ring buffer. A ring buffer entry might hold multiple descriptor heaps, because a single heap might not be enough for one frame. When a frame begins (see Dx12Backend::BeginFrame), the DX12 backend can destroy some of the descriptor heaps in the current ring buffer entry. By default, the backend keeps up to 4 descriptor heaps per ring buffer entry. This number can be adjusted through Dx12Backend::DESCRIPTOR_HEAPS_TO_KEEP_ALIVE.
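
The rotation and trimming described above can be modelled with a short, self-contained sketch. It is an illustration, not the backend's implementation: FrameDescriptorRing and its methods are hypothetical, a "heap" is just an integer id, and the sketch assumes that surplus heaps are dropped from the end of the entry when a frame begins.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a ring buffer of per-frame descriptor heaps.
class FrameDescriptorRing {
public:
    FrameDescriptorRing(std::size_t ringEntries, std::size_t heapsToKeepAlive)
        : m_Entries(ringEntries), m_HeapsToKeepAlive(heapsToKeepAlive) {
        assert(ringEntries > 0);
    }

    // Advance to the next ring entry and trim the heaps above the
    // keep-alive limit. Returns how many heaps were destroyed.
    std::size_t BeginFrame() {
        m_Current = (m_Current + 1) % m_Entries.size();
        std::vector<int>& heaps = m_Entries[m_Current];
        std::size_t destroyed = 0;
        if (heaps.size() > m_HeapsToKeepAlive) {
            destroyed = heaps.size() - m_HeapsToKeepAlive;
            heaps.resize(m_HeapsToKeepAlive);
        }
        return destroyed;
    }

    // Called when the current frame runs out of descriptors:
    // one more heap is added to the current entry.
    void GrowCurrentEntry() { m_Entries[m_Current].push_back(m_NextHeapId++); }

    std::size_t HeapsInCurrentEntry() const {
        return m_Entries[m_Current].size();
    }

private:
    std::vector<std::vector<int>> m_Entries;  // one heap list per ring entry
    std::size_t m_HeapsToKeepAlive;
    std::size_t m_Current = 0;
    int m_NextHeapId = 0;
};
```

With a keep-alive limit of 1, an entry that grew to 3 heaps during a heavy frame is trimmed back to 1 heap the next time the ring returns to it. A larger limit avoids recreating heaps after such spikes at the cost of holding more memory.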

The GPU visible descriptor heaps can be thought of as regular GPU resources that take up a certain amount of GPU memory. Depending on the application use case, the descriptor heaps can therefore create GPU memory pressure. Things to consider:

  • Creating GPU descriptor heaps is an expensive operation and avoiding it is important.
  • Having unused GPU descriptor heaps wastes GPU memory.

For these reasons, choosing an appropriate value for DESCRIPTOR_HEAPS_TO_KEEP_ALIVE is important for a good trade-off between performance and memory usage.
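
To get a feel for the memory side of this trade-off, the retained descriptor storage can be estimated with simple arithmetic. The helper below is a hypothetical sketch. The number of ring buffer entries, the descriptors per heap, and the descriptor size are inputs the application has to supply, and the real descriptor size is hardware dependent (it can be queried with ID3D12Device::GetDescriptorHandleIncrementSize). The result is a rough upper bound and does not account for how the driver actually allocates the heaps.

```cpp
#include <cstdint>

// Hypothetical helper: upper bound on the bytes held by retained
// GPU visible descriptor heaps.
constexpr std::uint64_t EstimateDescriptorHeapBytes(
    std::uint64_t ringEntries,          // entries in the ring buffer
    std::uint64_t heapsPerEntry,        // e.g. DESCRIPTOR_HEAPS_TO_KEEP_ALIVE
    std::uint64_t descriptorsPerHeap,   // descriptors allocated per heap
    std::uint64_t descriptorSizeBytes)  // hardware-dependent increment size
{
    return ringEntries * heapsPerEntry * descriptorsPerHeap * descriptorSizeBytes;
}
```

For example, with the illustrative (not measured) values of 4 ring entries, 4 heaps per entry, 1024 descriptors per heap, and 32 bytes per descriptor, EstimateDescriptorHeapBytes(4, 4, 1024, 32) yields 524288 bytes, or 512 KiB.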

Performance tips

Achieving good performance with low-level graphics APIs like DirectX 12 requires knowledge of the concrete problem being solved. A general-purpose rendering backend like the DX12 backend goes against this philosophy. For this reason, fine-tuning the DX12 backend might be necessary when performance and GPU memory utilization are concerns. Here is a list of things to consider when using the DX12 backend:

  • Adjust the value of Dx12Backend::DESCRIPTOR_HEAPS_TO_KEEP_ALIVE. See the previous section for explanations.
  • Adjust the values of Dx12Backend::CBVSRVUAV_DESCRIPTOR_HEAP_SIZE, Dx12Backend::RTV_DESCRIPTOR_HEAP_SIZE, and Dx12Backend::DSV_DESCRIPTOR_HEAP_SIZE. Those control how big the allocated GPU descriptor heaps are.
  • Adjust the values of Dx12Backend::CBVSRVUAV_STAGING_DESCRIPTOR_HEAP_SIZE, Dx12Backend::RTV_STAGING_DESCRIPTOR_HEAP_SIZE, and Dx12Backend::DSV_STAGING_DESCRIPTOR_HEAP_SIZE. Those control how big the allocated staging CPU descriptor heaps are.
  • Adjust the value of PersistentConstantBuffersCount in Dx12Backend::FillCaps. This tells the rendering library how many constant buffers to keep alive per frame per ring buffer entry per cohtml::View. Allocating buffers in DX12 is expensive and if the rendered UI is complex, it might require more constant buffers.
  • Adjust Dx12Backend::MAX_FRAMES_IN_FLIGHT if the application uses a different number of frames in flight. More frames in flight generally imply keeping more copies of some resources. See the comments near Dx12Backend::MAX_FRAMES_IN_FLIGHT for more information.