A Deep Dive into PSO Caching in Unreal Engine
In modern real-time rendering, few things break immersion like a shader compilation hitch. You turn a corner, an explosion goes off, and the game freezes for 200ms. This article explores why this happens, how Bungie and Ubisoft solved it, and how to improve PSO Precaching in Unreal Engine 5.
Part 1: The Theory – Why Do We Hitch?
To understand shader hitching, we must understand the Pipeline State Object (PSO).
What is a PSO?
A PSO is a monolithic object that describes the state of the graphics pipeline. It is not just the shader code (Vertex/Pixel shader) — it combines the compiled shader bytecode with specific render states, including:
- Blend State (Translucency, Additive, etc.)
- Rasterizer State (Wireframe, Culling)
- Depth/Stencil State
- Input Layouts
In modern APIs like DirectX 12 and Vulkan, the GPU cannot draw a single triangle until the specific PSO for that draw call is fully compiled and ready.
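One way to picture this is to treat the PSO as a key built from the shader plus every piece of render state. The following is a minimal sketch in plain C++, not engine code: `PipelineKey`, its fields, and `countDistinctPipelines` are hypothetical names chosen for illustration, and a real PSO carries far more state than this.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <tuple>
#include <vector>

// Hypothetical, heavily simplified stand-in for a PSO description.
enum class BlendMode : std::uint8_t { Opaque, Translucent, Additive };
enum class CullMode : std::uint8_t { None, Front, Back };

struct PipelineKey {
    std::uint64_t shaderHash;     // identifies the compiled vertex/pixel shader pair
    BlendMode blend;              // blend state
    CullMode cull;                // rasterizer state
    bool depthWrite;              // depth/stencil state
    std::uint32_t inputLayoutId;  // input layout

    // Ordering over every field: changing any single state yields a different key.
    bool operator<(const PipelineKey& other) const {
        return std::tie(shaderHash, blend, cull, depthWrite, inputLayoutId) <
               std::tie(other.shaderHash, other.blend, other.cull,
                        other.depthWrite, other.inputLayoutId);
    }
};

// Counts how many distinct pipelines a list of draw calls would require.
inline std::size_t countDistinctPipelines(const std::vector<PipelineKey>& draws) {
    std::set<PipelineKey> unique(draws.begin(), draws.end());
    return unique.size();
}
```

In this model, two draws that share the same shader but differ only in blend mode count as two separate pipelines, which is the property that makes PSO counts grow so quickly.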
The "Good Old Days" of DirectX 11
If PSOs are necessary, why didn't games hitch as much in the DX11 era?
According to AMD and Ubisoft's analysis (GDC 2017), D3D11 drivers hid significant complexity. When you called SetVertexShader or SetBlendState, the driver didn't compile immediately. It waited until draw time, checked if it had seen that combination before, and if not, JIT-compiled (Just-In-Time) the state. The drivers were incredibly optimized to handle this "lazy" compilation without stalling the game thread, effectively managing permutations automatically.
The DX12/Vulkan Reality
Modern APIs shifted control from the driver to the developer. Explicit control offers higher performance but removes the safety net. If your engine requests a specific Shader + Blend State combination that hasn't been pre-compiled (Precached), the driver must pause execution to compile it right now.
A hitch.
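The difference between a precached PSO and a missed one can be modeled as a cache lookup. This is a toy model under stated assumptions, not how any driver or Unreal's RHI is implemented: `PipelineCache`, `precache`, and `draw` are hypothetical names, the PSO is reduced to a single 64-bit key, and the expensive compile is represented by a counter.

```cpp
#include <cstdint>
#include <unordered_set>

// Toy model: a PSO is identified by one 64-bit key (shader and render states hashed together).
class PipelineCache {
public:
    // Compile ahead of time, for example behind a loading screen. No hitch is recorded.
    void precache(std::uint64_t psoKey) { ready_.insert(psoKey); }

    // Called at draw time. Returns true if the draw had to stall for a compile.
    bool draw(std::uint64_t psoKey) {
        if (ready_.count(psoKey) != 0) {
            return false;       // cache hit: the PSO was already compiled
        }
        ready_.insert(psoKey);  // cache miss: compile now, on the critical path
        ++hitchCount_;
        return true;
    }

    int hitchCount() const { return hitchCount_; }

private:
    std::unordered_set<std::uint64_t> ready_;
    int hitchCount_ = 0;
};
```

Every key that reaches `draw` without a prior `precache` costs exactly one stall, so the goal of the rest of this article is to make sure `precache` sees each key first.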
Part 2: The Core Problem – Permutation Explosion
Why can't we just compile everything at startup? Because of Shader Permutations.
Modern material systems allow artists to toggle thousands of options (Static Switches). If you have a shader with 10 boolean switches (Shadows, Fog, Metalness, etc.), that creates 2^10 (1,024) potential variants.
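The arithmetic is worth making concrete, because each extra switch doubles the total. A minimal sketch follows; `variantCount` is a hypothetical helper, and it assumes every switch is independent of the others, which is the worst case.

```cpp
#include <cassert>
#include <cstdint>

// Number of shader variants produced by `switchCount` independent boolean switches.
inline std::uint64_t variantCount(unsigned switchCount) {
    assert(switchCount < 64 && "result would overflow a 64-bit integer");
    return std::uint64_t{1} << switchCount;  // 2^switchCount
}
```

`variantCount(10)` gives 1,024, while `variantCount(20)` already gives 1,048,576 variants for a single material.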
Unreal Engine historically favors Permutations over Dynamic Branching:
| Approach | GPU Performance | Disk Size | Compile Time |
|---|---|---|---|
| Permutations | ✅ Fast (dedicated variant per combo) | ❌ Large | ❌ Slow |
| Dynamic Branching | ❌ Slower (branch divergence) | ✅ Small | ✅ Fast |
Case Study: Bungie (Destiny)
In their 2017 GDC talk, Bungie revealed how they handled Destiny's 18,000+ artist-authored shaders. They avoided explosion using the TFX System:
- Components: Encapsulated shader options within code boundaries.
- Variant Layers: Artists created "layers" of overrides in a single file.
- Selective Building: Crucially, they adhered to a strict rule:
"DON'T build ALL variants." Only compile the subsets actually used in game content, treating the pipeline as an offline content-bake rather than a runtime discovery.
Case Study: Ubisoft (AnvilNext Engine)
Moving Assassin's Creed to DX12, Ubisoft found their existing "granular" state management (setting states piecemeal) was incompatible with DX12's "blob" PSOs. They had to rebuild their renderer to pre-calculate these state blobs offline or during loading screens, essentially creating a database of PSOs linked to material graphs.
Honorable Mention: Naughty Dog
Naughty Dog opted for an Uber-Shader approach in The Last of Us Part II:
"The shader is about 48,000 lines of code, which doesn't include generated code."
Part 3: Unreal Engine 5
Automated PSO Precaching
As of UE 5.3, Epic enables PSO Precaching by default. This system inspects assets as they load and compiles the PSOs they need on background threads before the GPU needs them.
Implementing the Loading Screen (The "Wait" Logic)
Even with background precaching, Global Shaders (Post-process, Compute) must be ready before gameplay starts. Gate your gameplay behind a loading screen that checks the compiler status.
Poll FShaderPipelineCache::NumPrecompilesRemaining(), which reports how many PSOs from the bundled pipeline cache are still waiting to be compiled. Do not remove the loading screen until it returns 0.
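The gate itself can be a small per-frame check. The sketch below is engine-independent C++: `LoadingScreenGate` is a hypothetical helper, and the `remaining` callback is a stand-in for a call such as `FShaderPipelineCache::NumPrecompilesRemaining()`. The `minFramesAtZero` debounce is an assumption added here to guard against the counter briefly reading zero before more work is queued; tune it or drop it as needed.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Decides, once per frame, whether the loading screen may be removed.
class LoadingScreenGate {
public:
    // `remaining` reports how many PSO precompiles are still outstanding.
    LoadingScreenGate(std::function<std::uint32_t()> remaining, int minFramesAtZero)
        : remaining_(std::move(remaining)), minFramesAtZero_(minFramesAtZero) {
        assert(minFramesAtZero_ >= 1);
    }

    // Call once per frame; returns true when it is safe to start gameplay.
    bool tick() {
        if (remaining_() == 0) {
            ++framesAtZero_;    // another consecutive frame with nothing pending
        } else {
            framesAtZero_ = 0;  // new work appeared: restart the countdown
        }
        return framesAtZero_ >= minFramesAtZero_;
    }

private:
    std::function<std::uint32_t()> remaining_;
    int minFramesAtZero_;
    int framesAtZero_ = 0;
};
```

In a real project, `tick` would presumably be called from whatever object owns the loading screen, and the screen would be removed the first time it returns true.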
Enable precaching via CVar:
r.PSOPrecaching=1
Bundled PSOs
You can also record a Bundled PSO Cache: a recording of every PSO used during a QA playthrough, bundled into the installer.
Console Command for Diagnosis
Use stat unitgraph to visualize hitches. If the green line (GPU) or game thread spikes simultaneously with a LogRHI warning in the output log, you missed a PSO.
Step-by-Step: Creating a Bundled PSO Cache
1. Configuration
Set up your project to generate Stable Keys — identifiers that persist across builds.
[DevOptions.Shaders]
NeedsShaderStableKeys=true
[ShaderPipelineCache]
LastOpenedMask=0
r.ShaderPipelineCache.StartupMode=1
[/Script/UnrealEd.ProjectPackagingSettings]
bShareMaterialShaderCode=True
bSharedMaterialNativeLibraries=True
2. The Clean Sweep
Old metadata poisons new caches. Always start clean.
- Delete Intermediate, Saved/Cooked, and Binaries.
- Perform a full Cook/Package of the game.
- Locate the generated .shk files in Saved/Cooked/Windows/[Project]/Metadata/PipelineCaches/ and save them. You will need them.
3. Recording
Play the game on the target hardware.
Launch with the following arguments:
-logPSO -clearPSODriverCache
During the session:
- Open every menu
- Fire every weapon
- Visit every level
Collect the .rec.upipelinecache files from Saved/CollectedPSOs/.
4. Expanding the Cache
Use the ShaderPipelineCacheTools commandlet (its expand command) to merge the recordings (.rec.upipelinecache) with the Stable Keys (.shk) and produce the Pipeline Cache (.spc).
5. Packaging
- Place the generated .spc file into Build\Windows\PipelineCaches\.
- Repackage the game.
The engine will now load this cache on startup, pre-compiling the recorded PSOs during your splash screens.
Advanced: Custom "Brute Force" Preloading System
While UE5's native precaching works well for linear games, it often falls short in Open World or Level Streaming scenarios. If the engine streams in a new biome chunk, the native precacher may not compile new foliage shaders fast enough, causing a game-thread spike.
To address this, a "Brute Force" Precompilation System has been prototyped by developers (such as user S_PHIR_H on the Epic forums).
1. The Filter (Class Selection)
Instead of loading everything, the script focuses on the heaviest shader contributors.
Nodes: Get Asset Registry → Make ARFilter → Get Assets
Classes to include:
- StaticMesh
- SkeletalMesh
- NiagaraSystem (VFX are a frequent source of hitches)
This check runs in the editor, but can be adapted for runtime use.
2. The Grid Loop (Positioning)
Spawning thousands of assets at (0, 0, 0) causes physics collisions and debugging nightmares. The script calculates a grid position for each asset:
| Axis | Formula | Result |
|---|---|---|
| X | (Index % 100) * 200 | Creates rows of 100 items |
| Y | (Index / 100) * 200 | Advances to the next row every 100 items |
| Spacing | 200 units | Prevents overlap |
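The table translates directly into a few lines of code. The sketch below is plain C++ rather than Blueprint; `GridPosition` and `gridPositionForIndex` are hypothetical names, and the defaults of 100 items per row and 200 units of spacing are taken from the table above. Note that the Y formula relies on integer division.

```cpp
#include <cassert>

struct GridPosition {
    float x;
    float y;
};

// Maps a flat asset index to a 2D grid location.
inline GridPosition gridPositionForIndex(int index, int itemsPerRow = 100,
                                         float spacing = 200.0f) {
    assert(index >= 0 && itemsPerRow > 0);
    const int column = index % itemsPerRow;  // wraps back to 0 every itemsPerRow items
    const int row = index / itemsPerRow;     // integer division: +1 every itemsPerRow items
    return GridPosition{column * spacing, row * spacing};
}
```

With the defaults, index 99 lands at (19800, 0) and index 100 starts the next row at (0, 200).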
3. Spawning the Asset
Use the Spawn Actor from Object node, passing Asset Data from the registry search. This spawns the appropriate actor representation (e.g., a StaticMeshActor for a Static Mesh) at the calculated grid location.
4. Scale Normalization (Anti-Overlap Logic)
Assets range from 1cm screws to 100m skyscrapers. To ensure the camera can see all of them without massive overlaps, the script normalizes scale across all spawned actors before the camera sweep begins.
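The prototype's exact normalization formula is not given here, so the following is only one plausible approach: apply a uniform scale so that each asset's largest bounding-box dimension matches a fixed target size. `Extents`, `normalizedScale`, and the 100-unit default target are assumptions made for this sketch.

```cpp
#include <algorithm>
#include <cassert>

// Full bounding-box size of an asset, in Unreal units (cm).
struct Extents {
    float x;
    float y;
    float z;
};

// Returns a uniform scale factor that makes the largest dimension equal targetSize.
inline float normalizedScale(const Extents& size, float targetSize = 100.0f) {
    assert(targetSize > 0.0f);
    const float largest = std::max({size.x, size.y, size.z});
    if (largest <= 0.0f) {
        return 1.0f;  // degenerate bounds: leave the actor unscaled
    }
    return targetSize / largest;
}
```

A target of 100 units keeps every actor inside its 200-unit grid cell, so a 1 cm screw is scaled up by 100 and a 200-unit crate is scaled down by half.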
Conclusion
Shader hitching is a solvable problem, but it requires a multi-layered defense strategy:
| Layer | Approach | When |
|---|---|---|
| 🔻 Reduce Variants | Minimize Static Switch usage in materials | Always |
| 🤖 Automate | Use the Brute Force system to catch edge cases | Dev / Testing |
| ⏳ Wait for PSOs | Show a loading screen until precaching completes | Level transitions |
| 📦 Record | Perform a Bundled PSO recording for a smooth end-user experience | Release |
All four systems can — and should — be used together for maximum coverage.