
A Deep Dive into PSO Caching in Unreal Engine

· 7 min read
Andriy Kalysh

In modern real-time rendering, few things break immersion like a shader compilation hitch. You turn a corner, an explosion goes off, and the game freezes for 200 ms. This article explores why this happens, how Bungie and Ubisoft solved it, and how to improve PSO Precaching in Unreal Engine 5.


Part 1: The Theory – Why Do We Hitch?

To understand shader hitching, we must understand the Pipeline State Object (PSO).

What is a PSO?

A PSO is a monolithic object that describes the state of the graphics pipeline. It is not just the shader code (Vertex/Pixel shader) — it combines the compiled shader bytecode with specific render states, including:

  • Blend State (Translucency, Additive, etc.)
  • Rasterizer State (Wireframe, Culling)
  • Depth/Stencil State
  • Input Layouts

In modern APIs like DirectX 12 and Vulkan, the GPU cannot draw a single triangle until the specific PSO for that draw call is fully compiled and ready.
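
To make the "monolithic" point concrete, here is a minimal sketch in plain C++ (not Unreal or D3D12 code). It models a PSO as a key that combines shader identifiers with fixed-function state, and uses a toy cache where a miss stands in for a blocking driver compile. All type and function names (PsoKey, PsoCache, Precache, Request) are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_set>

// Hypothetical, simplified stand-ins for the state a real PSO bakes in.
enum class BlendMode : std::uint8_t { Opaque, Translucent, Additive };
enum class CullMode : std::uint8_t { None, Front, Back };

struct PsoKey {
    std::uint32_t vertexShaderId;
    std::uint32_t pixelShaderId;
    BlendMode blend;
    CullMode cull;
    bool depthWrite;

    bool operator==(const PsoKey& o) const {
        return vertexShaderId == o.vertexShaderId && pixelShaderId == o.pixelShaderId &&
               blend == o.blend && cull == o.cull && depthWrite == o.depthWrite;
    }
};

struct PsoKeyHash {
    std::size_t operator()(const PsoKey& k) const {
        // Simple hash combine over every field of the key.
        std::size_t h = std::hash<std::uint32_t>{}(k.vertexShaderId);
        auto mix = [&h](std::size_t v) { h ^= v + 0x9e3779b9u + (h << 6) + (h >> 2); };
        mix(std::hash<std::uint32_t>{}(k.pixelShaderId));
        mix(static_cast<std::size_t>(k.blend));
        mix(static_cast<std::size_t>(k.cull));
        mix(k.depthWrite ? 1u : 0u);
        return h;
    }
};

// Toy cache: a miss in Request() represents the draw-time compile that causes a hitch.
class PsoCache {
public:
    // Compiles ahead of time (for example during a loading screen); never counts as a miss.
    void Precache(const PsoKey& key) { compiled_.insert(key); }

    // Returns true if the PSO was ready, false if it had to be compiled at draw time.
    bool Request(const PsoKey& key) {
        const bool wasMissing = compiled_.insert(key).second;
        if (wasMissing) ++misses_;
        assert(misses_ <= compiled_.size());
        return !wasMissing;
    }

    std::size_t Misses() const { return misses_; }

private:
    std::unordered_set<PsoKey, PsoKeyHash> compiled_;
    std::size_t misses_ = 0;
};
```

In this model, the same vertex and pixel shader drawn once as Opaque and once as Additive produce two different keys, so each needs its own compile unless it was precached.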

The "Good Old Days" of DirectX 11

If PSOs are necessary, why didn't games hitch as much in the DX11 era?

According to AMD and Ubisoft's analysis (GDC 2017), D3D11 drivers hid significant complexity. When you called SetVertexShader or SetBlendState, the driver didn't compile immediately. It waited until draw time, checked if it had seen that combination before, and if not, JIT-compiled (Just-In-Time) the state. The drivers were incredibly optimized to handle this "lazy" compilation without stalling the game thread, effectively managing permutations automatically.

The DX12/Vulkan Reality

Modern APIs shifted control from the driver to the developer. Explicit control offers higher performance but removes the safety net. If your engine requests a specific Shader + Blend State combination that hasn't been pre-compiled (Precached), the driver must pause execution to compile it right now.

Result

A hitch.


Part 2: The Core Problem – Permutation Explosion

Why can't we just compile everything at startup? Because of Shader Permutations.

Modern material systems let artists toggle options through Static Switches, and every boolean switch doubles the number of possible variants. A shader with 10 boolean switches (Shadows, Fog, Metalness, etc.) has 2^10 (1,024) potential variants.
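
The arithmetic can be sketched in a few lines of C++. VariantCount is a hypothetical helper name, and the sketch assumes every switch is an independent boolean.

```cpp
#include <cassert>
#include <cstdint>

// Number of shader variants produced by N independent boolean static switches.
inline std::uint64_t VariantCount(unsigned switchCount) {
    assert(switchCount < 64 && "result must fit in 64 bits");
    return std::uint64_t{1} << switchCount;  // 2^N
}
```

Each extra switch doubles the total: 10 switches give 1,024 variants, while 20 switches already give 1,048,576.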

Unreal Engine historically favors Permutations over Dynamic Branching:

| Approach | GPU Performance | Disk Size | Compile Time |
| --- | --- | --- | --- |
| Permutations | ✅ Fast (dedicated variant per combo) | ❌ Large | ❌ Slow |
| Dynamic Branching | ❌ Slower (branch divergence) | ✅ Small | ✅ Fast |

Case Study: Bungie (Destiny)

In their 2017 GDC talk, Bungie revealed how they handled Destiny's 18,000+ artist-authored shaders. They avoided explosion using the TFX System:

  • Components: Encapsulated shader options within code boundaries.
  • Variant Layers: Artists created "layers" of overrides in a single file.
  • Selective Building: Crucially, they adhered to a strict rule:
Bungie's Golden Rule

"DON'T build ALL variants." Only compile the subsets actually used in game content, treating the pipeline as an offline content-bake rather than a runtime discovery.

Case Study: Ubisoft (AnvilNext Engine)

Moving Assassin's Creed to DX12, Ubisoft found their existing "granular" state management (setting states piecemeal) was incompatible with DX12's "blob" PSOs. They had to rebuild their renderer to pre-calculate these state blobs offline or during loading screens, essentially creating a database of PSOs linked to material graphs.

Honorable Mention: Naughty Dog

Naughty Dog opted for an Uber-Shader approach in The Last of Us Part II:

"The shader is about 48,000 lines of code, which doesn't include generated code."


Part 3: Unreal Engine 5

Automated PSO Precaching

As of UE 5.3, Epic enables PSO Precaching by default. This system looks at the assets being loaded and compiles the necessary PSOs on background threads before the GPU needs them.

Implementing the Loading Screen (The "Wait" Logic)

Even with background precaching, Global Shaders (Post-process, Compute) must be ready before gameplay starts. Gate your gameplay behind a loading screen that checks the compiler status.

Implementation

Check FShaderPipelineCache::NumPrecompilesRemaining(). Do not remove the loading screen until this returns 0.
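
A minimal sketch of the wait logic, written as engine-free C++ so it compiles on its own. PsoLoadingGate and its methods are hypothetical names. The counter is injected as a callback; in a real project it would presumably wrap FShaderPipelineCache::NumPrecompilesRemaining(), and the progress calculation is an assumption added for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Polls a "remaining precompiles" counter and reports when the loading
// screen may be dismissed. The counter is a callback so this sketch has
// no engine dependency.
class PsoLoadingGate {
public:
    explicit PsoLoadingGate(std::function<std::uint32_t()> remainingFn)
        : remainingFn_(std::move(remainingFn)) {
        assert(remainingFn_ && "a counter callback is required");
    }

    // Call once per frame while the loading screen is visible.
    // Returns true when no precompiles remain.
    bool Tick() {
        remaining_ = remainingFn_();
        if (remaining_ > peak_) peak_ = remaining_;
        return remaining_ == 0;
    }

    // Progress in [0, 1] for a loading bar, relative to the largest backlog seen so far.
    float Progress() const {
        if (peak_ == 0) return 1.0f;
        return 1.0f - static_cast<float>(remaining_) / static_cast<float>(peak_);
    }

private:
    std::function<std::uint32_t()> remainingFn_;
    std::uint32_t peak_ = 0;
    std::uint32_t remaining_ = 0;
};
```

The loading screen would call Tick() every frame and only hand control to gameplay once it returns true.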

Enable precaching via CVar:

r.PSOPrecaching=1

Bundled PSOs

You can also record a Bundled PSO Cache — a recording of every shader drawn during a QA playthrough, bundled into the installer.

Console Command for Diagnosis

Use stat unitgraph to visualize hitches. If the green line (GPU) or game thread spikes simultaneously with a LogRHI warning in the output log, you missed a PSO.


Step-by-Step: Creating a Bundled PSO Cache

1. Configuration

Set up your project to generate Stable Keys — identifiers that persist across builds.

DefaultEngine.ini

[DevOptions.Shaders]
NeedsShaderStableKeys=true

[ShaderPipelineCache]
LastOpenedMask=0
r.ShaderPipelineCache.StartupMode=1

DefaultGame.ini

[/Script/UnrealEd.ProjectPackagingSettings]
bShareMaterialShaderCode=True
bSharedMaterialNativeLibraries=True

2. The Clean Sweep

Critical Step

Old metadata poisons new caches. Always start clean.

  1. Delete Intermediate, Saved/Cooked, and Binaries.
  2. Perform a full Cook/Package of the game.
  3. Locate the generated .shk files in:
    Saved/Cooked/Windows/[Project]/Metadata/PipelineCaches/
    Save these files — you will need them.

3. Recording

Play the game on the target hardware.

Launch with the following arguments:

-logPSO -clearPSODriverCache

During the session:

  • Open every menu
  • Fire every weapon
  • Visit every level

Collect the .rec.upipelinecache files from Saved/CollectedPSOs/.

4. Expanding the Cache

Use the ShaderPipelineCacheTools commandlet (its expand command) to merge the recordings (.rec.upipelinecache) with the Stable Keys (.shk) and produce the Pipeline Cache (.spc).

5. Packaging

  1. Place the generated .spc file into Build\Windows\PipelineCaches\.
  2. Repackage the game.

The engine will now load this cache on startup, pre-compiling the recorded PSOs during your splash screens.


Advanced: Custom "Brute Force" Preloading System

While UE5's native precaching works well for linear games, it often falls short in Open World or Level Streaming scenarios. If the engine streams in a new biome chunk, the native precacher may not compile new foliage shaders fast enough, causing a game-thread spike.

To address this, developers such as Epic forums user S_PHIR_H have prototyped a "Brute Force" Precompilation System.

1. The Filter (Class Selection)

Instead of loading everything, the script focuses on the heaviest shader contributors.

Nodes: Get Asset Registry → Make ARFilter → Get Assets

Classes to include:

  • StaticMesh
  • SkeletalMesh
  • NiagaraSystem (VFX are a frequent source of hitches)

Note

This check runs in the editor, but can be adapted for runtime use.
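
The class filter can be sketched without the engine as a plain allow-list. AssetRecord and FilterByClass are hypothetical stand-ins for the registry entry and the Make ARFilter / Get Assets step, not Unreal APIs.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical, engine-free stand-in for an asset registry entry.
struct AssetRecord {
    std::string path;
    std::string className;
};

// Keeps only the assets whose class is in the allow-list, preserving order.
std::vector<AssetRecord> FilterByClass(const std::vector<AssetRecord>& assets,
                                       const std::unordered_set<std::string>& classes) {
    std::vector<AssetRecord> out;
    for (const AssetRecord& asset : assets) {
        if (classes.count(asset.className) != 0) {
            out.push_back(asset);
        }
    }
    assert(out.size() <= assets.size());
    return out;
}
```

Calling it with the set {"StaticMesh", "SkeletalMesh", "NiagaraSystem"} mirrors the class list above.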

2. The Grid Loop (Positioning)

Spawning thousands of assets at (0, 0, 0) causes physics collisions and debugging nightmares. The script calculates a grid position for each asset:

| Axis | Formula | Result |
| --- | --- | --- |
| X | (Index % 100) * 200 | Creates rows of 100 items |
| Y | (Index / 100) * 200 | Advances to the next row every 100 items |
| Spacing | 200 units | Prevents overlap |
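
The same grid math as a short C++ sketch. GridPosition and GridPositionForIndex are hypothetical names; the row length of 100 and the spacing of 200 units come from the formulas above, and the Y formula relies on integer division.

```cpp
#include <cassert>

struct GridPosition {
    float x;
    float y;
    float z;
};

// Maps a linear asset index to a grid cell: itemsPerRow items per row, spacing units apart.
GridPosition GridPositionForIndex(int index, int itemsPerRow = 100, float spacing = 200.0f) {
    assert(index >= 0 && itemsPerRow > 0);
    const int column = index % itemsPerRow;  // position within the current row
    const int row = index / itemsPerRow;     // integer division: new row every itemsPerRow items
    return GridPosition{column * spacing, row * spacing, 0.0f};
}
```

For example, index 99 is the last cell of the first row, and index 100 wraps back to X = 0 on the second row.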

3. Spawning the Asset

Use the Spawn Actor from Object node, passing Asset Data from the registry search. This spawns the appropriate actor representation (e.g., a StaticMeshActor for a Static Mesh) at the calculated grid location.

4. Scale Normalization (Anti-Overlap Logic)

Assets range from 1 cm screws to 100 m skyscrapers. So that the camera can see all of them without large overlaps, the script normalizes the scale of every spawned actor before the camera sweep begins.
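
The original script's exact normalization rule is not given here, so the following is only one plausible sketch: scale each actor uniformly so its largest bounding-box dimension matches a target size smaller than the 200-unit grid spacing. BoundsSize, NormalizedScale, and the 150-unit default are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical axis-aligned bounding-box size of an asset, in world units.
struct BoundsSize {
    float x;
    float y;
    float z;
};

// Uniform scale factor that makes the asset's largest dimension equal targetSize.
// Degenerate (zero-size) bounds fall back to a scale of 1.
float NormalizedScale(const BoundsSize& size, float targetSize = 150.0f) {
    assert(targetSize > 0.0f);
    const float largest = std::max({size.x, size.y, size.z});
    if (largest <= 0.0f) {
        return 1.0f;
    }
    return targetSize / largest;
}
```

Using a uniform factor keeps each asset's proportions intact, and a target below the grid spacing leaves a gap between neighbours.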


Conclusion

Shader hitching is a solvable problem, but it requires a multi-layered defense strategy:

| Layer | Approach | When |
| --- | --- | --- |
| 🔻 Reduce Variants | Minimize Static Switch usage in materials | Always |
| 🤖 Automate | Use the Brute Force system to catch edge cases | Dev / Testing |
| Wait for PSOs | Show a loading screen until precaching completes | Level transitions |
| 📦 Record | Perform a Bundled PSO recording for a smooth end-user experience | Release |

Tip

All four systems can — and should — be used together for maximum coverage.