
The State of Desktop Forward Rendering in Unreal Engine 5.7

· 12 min read
Andriy Kalysh

There's a growing discontent with temporal anti-aliasing solutions in game development circles. Communities like r/FuckTAA have emerged as vocal critics of the ghosting and smearing artifacts that plague TAA and other temporal solutions.

The issue runs deeper than preference: UE's entire rendering pipeline now assumes temporal accumulation. SSGI has grown noisier since UE4, expecting TAA to clean up the result. Lumen's software ray tracing leans on temporal filtering to resolve its noisy probes. Even Nanite's virtualized geometry produces sub-pixel triangles that only resolve cleanly under temporal upscaling. Opting out of TAA means fighting the engine's core assumptions.

For certain scenarios, forward rendering with MSAA remains a compelling option. Dense foliage lit by a single dominant light source (the sun) is exactly where forward rendering can shine: one directional light and extensive vegetation, where geometric edge quality matters more than complex lighting setups.

Personal Motivation

My interest is personal: I've always wanted to create a game set in a tropical forest environment. Perhaps this documentation will prove useful to graphics programmers exploring hybrid renderer implementations, or those seeking to understand how forward rendering can coexist with modern engine features.


AA Approach Comparisons

| Configuration | Notes |
| --- | --- |
| No AA (Deferred) | Baseline |
| SMAA 1x (Deferred) | Basic morphological AA |
| SMAA 1x + Filmic Filter (Deferred) | Good result for deferred |
| MSAA 2x + SMAA S2x (Forward) | Preserved edges better than Filmic + SMAA |

I recently implemented SMAA S2x mode to work alongside MSAA 2x for my plugin, and the results demonstrate why this combination deserves attention.

A Note on the Filmic Filter

The SMAA 1x + Filmic Filter combination addresses a core limitation of pure morphological AA: lack of temporal stability. Based on Activision's Filmic SMAA research, this filter operates in two distinct modes:

Stationary pixels accumulate history aggressively (85% history weight by default), using Catmull-Rom bicubic sampling for sharp history reconstruction. When SMAA edge data is available and no significant motion is detected, the filter applies a convergence term m03 derived from subpixel positioning — this reconstructs detail between the current frame's left/right neighbors and blends it with history, effectively enhancing perceived resolution on static geometry.

Moving pixels (velocity > threshold or detected disocclusion) bypass the convergence sharpening entirely and blend toward the current frame, preventing the ghosting artifacts typical of aggressive temporal filters. Fast motion (>3.5 pixels) forces immediate fallback to the raw current frame.
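
To make the two modes concrete, here is a minimal CPU-side sketch of the history-weight decision. It is not the plugin's shader code. The 0.85 history weight and the 3.5-pixel cutoff come from the description above; the 0.5-pixel motion threshold, the linear fade between the two thresholds, the full history reject on disocclusion, and the function names are all assumptions made for illustration.

```cpp
#include <cassert>

// Values taken from the text above.
constexpr float kStationaryHistoryWeight = 0.85f;
constexpr float kFastMotionPixels        = 3.5f;
// Assumed value: the text only says "velocity > threshold".
constexpr float kMotionThresholdPixels   = 0.5f;

// Returns the fraction of the history color to keep for one pixel.
float HistoryWeight(float velocityPixels, bool disoccluded)
{
    assert(velocityPixels >= 0.0f);

    // Fast motion falls back to the raw current frame. Rejecting history
    // completely on disocclusion is an assumption of this sketch.
    if (disoccluded || velocityPixels > kFastMotionPixels)
        return 0.0f;

    // Stationary pixels accumulate history aggressively.
    if (velocityPixels <= kMotionThresholdPixels)
        return kStationaryHistoryWeight;

    // Moving pixels blend toward the current frame (assumed linear fade).
    const float t = (velocityPixels - kMotionThresholdPixels) /
                    (kFastMotionPixels - kMotionThresholdPixels);
    return kStationaryHistoryWeight * (1.0f - t);
}

// Final blend between the reprojected history and the current frame.
float BlendWithHistory(float current, float history, float historyWeight)
{
    return history * historyWeight + current * (1.0f - historyWeight);
}
```

The convergence term and the Catmull-Rom history fetch are deliberately left out; the sketch only shows how the weight collapses to zero as motion increases.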

The Result

Temporal smoothing and detail enhancement when the camera is still; instant reversion to sharp single-frame output on motion. This trades the persistent smear of traditional TAA for a "best of both worlds" approach that respects SMAA's geometric edge quality.


Why MSAA and Deferred Rendering Don't Mix

The GBuffer Problem

Deferred rendering stores material properties in multiple render targets called GBuffers. Looking at UE5's SceneTextures.cpp, we can see exactly what gets allocated:

SceneTextures.cpp — GBuffer allocation
if (Bindings.GBufferA.Index >= 0)
{
const FRDGTextureDesc Desc = FRDGTextureDesc::CreateRenderTargetTextureDesc(
Config.Extent, Bindings.GBufferA.Format, FClearValueBinding::Transparent,
Bindings.GBufferA.Flags | FlagsToAdd | GFastVRamConfig.GBufferA,
Config.bRequireMultiView, Config.MobileMultiViewRenderTargetNumLayers);
SceneTextures.GBufferA = GraphBuilder.CreateTexture(Desc, TEXT("GBufferA"));
}
// ... GBufferB, C, D, E, F follow the same pattern

UE5's deferred renderer uses up to six GBuffer textures (A through F), plus depth. With 4x MSAA, you'd need to store 4 samples per pixel for each of these textures. The memory and bandwidth explosion is already substantial, and the lighting pass would need to read all MSAA samples, perform shading calculations per-sample, and resolve. Modern engines have dozens of passes that would each need MSAA-aware variants.
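
A rough back-of-the-envelope calculation shows the scale of the problem. The sketch below assumes a 1920x1080 target, all six GBuffer textures plus depth allocated, and 4 bytes per texel for every target. Real formats vary per target and not every GBuffer is always allocated, so treat the numbers as an order-of-magnitude estimate only. `RenderTargetBytes` is a hypothetical helper, not an engine function.

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed for `targetCount` render targets of equal size and format.
std::uint64_t RenderTargetBytes(std::uint32_t width, std::uint32_t height,
                                std::uint32_t samples, std::uint32_t targetCount,
                                std::uint32_t bytesPerTexel)
{
    assert(samples >= 1);
    // Widen before multiplying so large resolutions cannot overflow 32 bits.
    return std::uint64_t{width} * height * samples * targetCount * bytesPerTexel;
}
```

Under those assumptions, `RenderTargetBytes(1920, 1080, 1, 7, 4)` is 58,060,800 bytes (about 58 MB), while the 4x MSAA equivalent, `RenderTargetBytes(1920, 1080, 4, 7, 4)`, is 232,243,200 bytes (about 232 MB), before counting the extra bandwidth of reading every sample in the lighting pass.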

Deferred MSAA Is Possible — Crysis 3's Example

Before dismissing MSAA with deferred rendering entirely, it's worth examining how Crytek made it work in Crysis 3. Tiago Sousa's GDC 2013 and SIGGRAPH 2013 presentations document their implementation in detail.

The Core Technique: Stencil-Based Edge Detection

Crytek's approach splits rendering into pixel-frequency and sample-frequency passes:

  1. During G-Buffer fill, reserve 1 bit from the stencil buffer for a sub-sample mask.
  2. A resolve pass extracts sample 0 from the G-Buffer and builds a mask indicating whether all samples in a pixel match sample 0.
  3. Pixel-frequency passes read from pre-resolved (non-multisampled) textures, using stencil to process only uniform pixels.
  4. Sample-frequency passes read from multisampled textures, indexing via SV_SAMPLEINDEX, processing only edge pixels.

This builds on Andrew Lauritzen's SIGGRAPH 2010 work at Intel on tile-based deferred shading with MSAA. The key insight: store the G-Buffer at sample frequency, but apply per-sample shading only where discontinuities exist.
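
The classification step can be illustrated on the CPU. This is a simplified sketch, not Crytek's implementation: it assumes 4x MSAA and represents each G-Buffer sample as a single packed 32-bit value (a hypothetical stand-in for the real multi-target data). A pixel counts as an edge when any sub-sample differs from sample 0.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kSamplesPerPixel = 4;  // assumed 4x MSAA
using PackedSample = std::uint32_t;          // hypothetical packed G-Buffer sample
using Pixel = std::array<PackedSample, kSamplesPerPixel>;

// True when any sub-sample differs from sample 0, i.e. the pixel needs
// sample-frequency shading. This is the bit that would go into stencil.
bool IsEdgePixel(const Pixel& pixel)
{
    for (std::size_t i = 1; i < kSamplesPerPixel; ++i)
        if (pixel[i] != pixel[0])
            return true;
    return false;
}

// Shading work for a set of pixels: one invocation for uniform pixels,
// one per sample for edge pixels.
std::size_t ShadingInvocations(const std::vector<Pixel>& pixels)
{
    std::size_t invocations = 0;
    for (const Pixel& pixel : pixels)
        invocations += IsEdgePixel(pixel) ? kSamplesPerPixel : 1;
    assert(invocations <= pixels.size() * kSamplesPerPixel);
    return invocations;
}
```

The saving comes from the ratio of uniform to edge pixels: the fewer pixels contain a discontinuity, the closer the cost gets to non-MSAA shading.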

The Performance Reality

The technique worked, but the performance cost was substantial. Forum discussions from 2013 show 30–50% frame rate drops when enabling MSAA in Crysis 3. Whether MSAA itself was the bottleneck remains debated — Threat Interactive's analysis of the Crysis 3 pipeline argues otherwise, though their methodology has drawn criticism in technical circles.

Why UE5 Doesn't Do This

Epic could theoretically implement stencil-based deferred MSAA. They haven't, likely because:

  • The engineering complexity is significant.
  • TSR/TAA solve aliasing well enough for most use cases.
  • Virtual Shadow Maps, Lumen, and Nanite all assume temporal accumulation anyway.
  • Modern hardware ray tracing changes the cost/benefit calculus.

For developers who need MSAA without temporal artifacts, UE5's forward rendering path remains the practical choice.


The Industry Shift Toward Visibility Buffers

id Software's GPC 2025 presentation on DOOM: The Dark Ages reveals they abandoned their Forward+ pipeline (used in DOOM Eternal) in favor of a visibility buffer / deferred hybrid. The core problem? Quad utilization efficiency.

When triangle density increases, forward rendering suffers disproportionately. id Software's profiling showed scenes where helper threads vastly outnumbered active threads — pixels being shaded that would never contribute to the final image. Their visibility buffer approach saved up to 25% GPU time on target hardware, with performance now scaling almost linearly with resolution.
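
The quad utilization problem is easy to reproduce on paper. GPUs shade pixels in 2x2 quads, so every quad a triangle touches launches four lanes, and the lanes outside the triangle run as helpers. The sketch below is my own illustration rather than id Software's profiling method: it counts both kinds of lanes for a coverage bitmap, with `QuadStats` and `CountQuadLanes` as hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct QuadStats
{
    std::size_t activeLanes = 0;  // lanes that produce a visible pixel
    std::size_t helperLanes = 0;  // lanes launched only to fill out a 2x2 quad
};

// coverage[y][x] is true where the triangle covers the pixel.
// All rows are expected to have the same length.
QuadStats CountQuadLanes(const std::vector<std::vector<bool>>& coverage)
{
    QuadStats stats;
    const std::size_t height = coverage.size();
    const std::size_t width = height > 0 ? coverage[0].size() : 0;
    for (const std::vector<bool>& row : coverage)
        assert(row.size() == width);

    for (std::size_t y = 0; y < height; y += 2)
    {
        for (std::size_t x = 0; x < width; x += 2)
        {
            // Count covered pixels inside this 2x2 quad.
            std::size_t covered = 0;
            for (std::size_t dy = 0; dy < 2; ++dy)
                for (std::size_t dx = 0; dx < 2; ++dx)
                    if (y + dy < height && x + dx < width && coverage[y + dy][x + dx])
                        ++covered;

            // A touched quad always costs four lanes.
            if (covered > 0)
            {
                stats.activeLanes += covered;
                stats.helperLanes += 4 - covered;
            }
        }
    }
    return stats;
}
```

A triangle that covers a single pixel yields one active lane and three helper lanes, which is the worst case that dense geometry pushes a forward renderer toward.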

Epic's trajectory tells a similar story. UE 5.7 deprecated Clustered Deferred Rendering, citing maintenance burden and low adoption — another reason may be the introduction of Megalights. Nanite already uses a visibility buffer internally. Lumen assumes temporal accumulation. The engine architecture increasingly optimizes for deferred-style pipelines with compute-based material evaluation.

The Hybrid Approach Remains Viable

Despite these trends, hybrid rendering — forward for select object types, deferred for the rest — remains a production-proven approach in 2025. DOOM: The Dark Ages itself ships as a hybrid: Forward+ is still used for transparents and remains available as a fallback path.

Many AAA studios continue to leverage hybrid pipelines precisely because different content types have different optimal rendering strategies. Foliage, hair, and particles often benefit from forward rendering's MSAA integration and simpler transparency handling, while static opaque geometry fits naturally into deferred or visibility buffer workflows.

Opinion

UE5 should preserve this flexibility. The current forward rendering path works. It integrates with MSAA. It handles masked materials without the complexity of compute-based dispatch systems. Rather than treating forward rendering as a checkbox, Epic should integrate it properly for developers who don't want to use "Unreal's Pipeline".


UE5's Forward Rendering Pipeline

Despite being named FDeferredShadingSceneRenderer, UE5's main renderer handles both deferred and forward paths. The branch happens based on project settings:

DeferredShadingRenderer.cpp
if (IsForwardShadingEnabled(ShaderPlatform))
{
// Forward-specific path
ensureMsgf(!VirtualShadowMapArray.IsEnabled(),
TEXT("Virtual shadow maps are not supported in the forward shading path"));
RenderShadowDepthMaps(GraphBuilder, InitViewTaskDatas.DynamicShadows,
InstanceCullingManager, ExternalAccessQueue);
bShadowMapsRenderedEarly = true;

// Hair strands rendering
if (bHairStrandsEnable)
{
RenderHairPrePass(GraphBuilder, Scene, SceneTextures, Views,
InstanceCullingManager, HairStrandsBookmarkParameters.CullingResults);
RenderHairBasePass(GraphBuilder, Scene, SceneTextures, Views,
InstanceCullingManager);
}

// Forward shadow projection
RenderForwardShadowProjections(GraphBuilder, SceneTextures,
ForwardScreenSpaceShadowMaskTexture, ForwardScreenSpaceShadowMaskHairTexture);

// Volumetric fog BEFORE base pass (critical ordering difference)
ComputeVolumetricFog(GraphBuilder, SceneTextures);
}

Critical Ordering Constraint

Forward rendering requires shadow maps to be rendered before the base pass. In deferred, shadows can be calculated later using GBuffer data. This ordering constraint is fundamental to understanding the pipeline.

Forward Rendering Execution Order

  1. Pre-pass / Early Z (required for forward; optional optimization for deferred)
  2. Shadow Depth Maps (rendered early in forward)
  3. Hair Strands (if enabled)
  4. Forward Shadow Projection → ForwardScreenSpaceShadowMaskTexture
  5. Volumetric Fog (before base pass in forward)
  6. Base Pass (lighting calculated inline)

The Base Pass: Where Lighting Happens

In forward rendering, the base pass does everything. From RenderBasePass:

DeferredShadingRenderer.h
static void RenderBasePass(
FDeferredShadingSceneRenderer& Renderer,
FRDGBuilder& GraphBuilder,
TArrayView<FViewInfo> InViews,
FSceneTextures& SceneTextures,
const FDBufferTextures& DBufferTextures,
FExclusiveDepthStencil::Type BasePassDepthStencilAccess,
FRDGTextureRef ForwardShadowMaskTexture,
FInstanceCullingManager& InstanceCullingManager,
bool bNaniteEnabled,
struct FNaniteShadingCommands& NaniteBasePassShadingCommands,
const TArrayView<Nanite::FRasterResults>& NaniteRasterResults);

Info

Notice ForwardShadowMaskTexture being passed directly to the base pass. Forward materials sample this during shading.

Forward Lighting: The Shader Side

The actual lighting calculations happen in ForwardLightingCommon.ush. The core function is GetForwardDirectLightingSplit:

ForwardLightingCommon.ush
FDeferredLightingSplit GetForwardDirectLightingSplit(
uint2 PixelPos,
uint GridIndex,
float3 TranslatedWorldPosition,
float3 CameraVector,
FGBufferData GBufferData, // Note: "GBufferData" is an inaccurate term in forward
float2 ScreenUV,
uint PrimitiveId,
uint EyeIndex,
float Dither,
float InDirectionalLightCloudShadow,
float3 InDirectionalLightAtmosphereTransmittance,
inout float OutDirectionalLightShadow,
bool bSeparateMainDirLightLuminance,
inout float3 SeparatedMainDirLightLuminance,
bool bSkipDirLightVirtualShadowMapEvaluation)

Directional Light Handling

ForwardLightingCommon.ush — Directional light path
BRANCH
if (DirectionalLightData.HasDirectionalLight
#if MATERIALBLENDING_ANY_TRANSLUCENT
&& DirectionalLightData.bAffectsTranslucentLighting > 0
#endif
)
{
half4 PreviewShadowMapChannelMask = 1;
uint DirLightingChannelMask = LIGHTING_CHANNEL_MASK;
FDeferredLightData LightData = ConvertToDeferredLight(DirectionalLightData,
SpecularScale, PreviewShadowMapChannelMask, DirLightingChannelMask);

// Shadow factor calculation
#if DISABLE_FORWARD_DIRECTIONAL_LIGHT_SHADOW
float4 LightAttenuation = float4(1, 1, 1, 1);
#elif ((MATERIALBLENDING_SOLID || MATERIALBLENDING_MASKED) && !MATERIAL_SHADINGMODEL_SINGLELAYERWATER)
float DynamicShadowing = dot(PreviewShadowMapChannelMask, DynamicShadowFactors);
float PerObjectShadowing = LightData.DistanceFadeMAD.y < 0.0f ? 1.0f : DynamicShadowing;
float WholeSceneShadowing = LightData.DistanceFadeMAD.y < 0.0f ? DynamicShadowing : 1.0f;
float4 LightAttenuation = float4(WholeSceneShadowing.xx, PerObjectShadowing.xx);
#else
// Translucent path - calculates shadows inline
float DynamicShadowFactor = ComputeDirectionalLightDynamicShadowing(
TranslatedWorldPosition, GBufferData.Depth, bUnused);
#endif

Local Lights: Clustered Forward

Forward rendering doesn't mean no local lights. UE5 uses clustered forward lighting:

ForwardLightingCommon.ush — Clustered local lights
#if !DISABLE_FORWARD_LOCAL_LIGHTS
const FCulledLightsGridHeader CulledLightsGridHeader = GetCulledLightsGridHeader(GridIndex);

// Safety clamp to prevent GPU hangs
const uint NumLightsInGridCell = min(CulledLightsGridHeader.NumLights, GetMaxLightsPerCell());

LOOP
for (uint GridLightListIndex = 0; GridLightListIndex < NumLightsInGridCell; GridLightListIndex++)
{
half4 PreviewShadowMapChannelMask = 1;
uint LocalLightingChannelMask = LIGHTING_CHANNEL_MASK;
const FLocalLightData LocalLight = GetLocalLightDataFromGrid(
CulledLightsGridHeader.DataStartIndex + GridLightListIndex, EyeIndex);

#if MATERIALBLENDING_ANY_TRANSLUCENT
if(UnpackAffectsTranslucentLighting(LocalLight) == 0)
{
continue; // Skip lights that don't affect translucency
}
#endif

FDeferredLightData LightData = ConvertToDeferredLight(LocalLight, SpecularScale,
PreviewShadowMapChannelMask, LocalLightingChannelMask);

The per-cell light limit is controlled by this CVar:

int32 GMaxCulledLightsPerCell = 32;
FAutoConsoleVariableRef CVarMaxCulledLightsPerCell(
TEXT("r.Forward.MaxCulledLightsPerCell"),
GMaxCulledLightsPerCell,
TEXT("Controls how much memory is allocated for each cell for light culling. "
"When r.Forward.LightLinkedListCulling is enabled, this is used to compute "
"a global max instead of a per-cell limit on culled lights."),
ECVF_Scalability | ECVF_RenderThreadSafe
);

MSAA Implementation in UE5

Scene Texture Allocation

From SceneTextures.cpp, MSAA is configured at texture creation time:

SceneTextures.cpp — MSAA CVar
static TAutoConsoleVariable<int32> CVarMSAACount(
TEXT("r.MSAACount"),
4,
TEXT("Number of MSAA samples to use with the forward renderer. "
"Only used when MSAA is enabled in the rendering project settings.\n")
TEXT("0: MSAA disabled (Temporal AA enabled)\n")
TEXT("1: MSAA disabled\n")
TEXT("2: Use 2x MSAA\n")
TEXT("4: Use 4x MSAA")
TEXT("8: Use 8x MSAA"),
ECVF_RenderThreadSafe | ECVF_Scalability
);

Depth Buffer with MSAA

FMinimalSceneTextures::InitializeViewFamily
FRDGTextureDesc Desc = FRDGTextureDesc::CreateRenderTargetTextureDesc(
SceneTextures.Config.Extent,
PF_DepthStencil,
Config.DepthClearValue,
Config.DepthCreateFlags,
Config.bRequireMultiView,
Config.MobileMultiViewRenderTargetNumLayers);
Desc.NumSamples = Config.NumSamples; // MSAA sample count

SceneTextures.Depth = GraphBuilder.CreateTexture(Desc, TEXT("SceneDepthZ"));

// MSAA requires resolve target
if (Desc.NumSamples > 1)
{
Desc.NumSamples = 1;

if ((StereoDepthRHI = FindStereoDepthTexture(Config.bSupportsXRTargetManagerDepthAlloc,
Config.Extent, ETextureCreateFlags::DepthStencilResolveTarget, Desc.NumSamples)) != nullptr)
{
// Use XR-provided resolve target
SceneTextures.Depth.Resolve = RegisterExternalTexture(GraphBuilder, StereoDepthRHI, TEXT("SceneDepthZ"));
}
else if (Config.bKeepDepthContent)
{
// Create our own resolve target
SceneTextures.Depth.Resolve = GraphBuilder.CreateTexture(Desc, TEXT("SceneDepthZ"));
}
}

Scene Color with MSAA

SceneTextures.cpp — Scene Color creation
{
const bool bIsMobilePlatform = Config.ShadingPath == EShadingPath::Mobile;
const ETextureCreateFlags sRGBFlag = (bIsMobilePlatform && IsMobileColorsRGB())
? TexCreate_SRGB : TexCreate_None;

FRDGTextureDesc Desc = FRDGTextureDesc::CreateRenderTargetTextureDesc(
Config.Extent,
Config.ColorFormat,
Config.ColorClearValue,
Config.ColorCreateFlags,
Config.bRequireMultiView,
Config.MobileMultiViewRenderTargetNumLayers);
Desc.NumSamples = Config.NumSamples;

// CreateTextureMSAA handles creating both MSAA target and resolve target
SceneTextures.Color = CreateTextureMSAA(GraphBuilder, Desc,
TEXT("SceneColorMS"), TEXT("SceneColor"),
GFastVRamConfig.SceneColor | sRGBFlag);
}

The FRDGTextureMSAA structure (used for both Color and Depth) contains two members: Target (the MSAA render target) and Resolve (the resolved single-sample texture).


Limitations of Forward Rendering in UE5

Virtual Shadow Maps: Not Supported

This is explicitly enforced in the code:

if (IsForwardShadingEnabled(ShaderPlatform))
{
ensureMsgf(!VirtualShadowMapArray.IsEnabled(),
TEXT("Virtual shadow maps are not supported in the forward shading path"));
}

Post-Process Effects Require Resolved Data

Many post-process effects operate on resolved (single-sample) data, meaning MSAA doesn't help with aliasing they introduce:

  • Screen-Space Reflections (SSR)
  • Screen-Space Ambient Occlusion (SSAO)
  • Depth of Field
  • Motion Blur

Nanite: Works, But Independently

Nanite does work with forward rendering, but operates somewhat independently:

if (bNaniteEnabled && InViews.Num() > 0)
{
RenderNanite(GraphBuilder, InViews, LocalSceneTextures, bIsEarlyDepthComplete,
InNaniteBasePassVisibility, NaniteRasterResults, PrimaryNaniteViews,
FirstStageDepthBuffer);
}

Conclusion

Forward rendering in UE5 is a fully functional path, not a legacy compatibility mode. It's designed for scenarios with simpler lighting requirements where geometric clarity matters.

What Works Well

| Feature | Notes |
| --- | --- |
| Clustered Forward Lighting | Supports multiple local lights efficiently |
| Full MSAA | Supported at the renderer level |
| Nanite | Works, albeit somewhat independently |
| Hair Strands | Benefits from MSAA coverage more than TAA temporal resolve, with less ghosting on fine strands |
| Volumetric Fog | Fully integrated |

Real Trade-offs

What You Lose
  • No Virtual Shadow Maps — the engine explicitly asserts against this.
  • No Lumen GI or Lumen Reflections — these rely on deferred-specific screen traces and the surface cache.
  • Limited screen-space effects — SSR, SSAO, DoF, and motion blur all operate on resolved data.

For the right content type — dense foliage, fine geometry, single dominant light — forward rendering with MSAA remains a genuinely superior choice to temporal solutions. The key is understanding where each pipeline's assumptions align with your rendering goals.

A Deep Dive into PSO Caching in Unreal Engine

· 7 min read
Andriy Kalysh

In modern real-time rendering, few things break immersion like a shader compilation hitch. You turn a corner, an explosion goes off, and the game freezes for 200ms. This article explores why this happens, how Bungie and Ubisoft solved it, and how to improve PSO Precaching in Unreal Engine 5.


Part 1: The Theory – Why Do We Hitch?

To understand shader hitching, we must understand the Pipeline State Object (PSO).

What is a PSO?

A PSO is a monolithic object that describes the state of the graphics pipeline. It is not just the shader code (Vertex/Pixel shader) — it combines the compiled shader bytecode with specific render states, including:

  • Blend State (Translucency, Additive, etc.)
  • Rasterizer State (Wireframe, Culling)
  • Depth/Stencil State
  • Input Layouts

In modern APIs like DirectX 12 and Vulkan, the GPU cannot draw a single triangle until the specific PSO for that draw call is fully compiled and ready.

The "Good Old Days" of DirectX 11

If PSOs are necessary, why didn't games hitch as much in the DX11 era?

According to AMD and Ubisoft's analysis (GDC 2017), D3D11 drivers hid significant complexity. When you called SetVertexShader or SetBlendState, the driver didn't compile immediately. It waited until draw time, checked if it had seen that combination before, and if not, JIT-compiled (Just-In-Time) the state. The drivers were incredibly optimized to handle this "lazy" compilation without stalling the game thread, effectively managing permutations automatically.

The DX12/Vulkan Reality

Modern APIs shifted control from the driver to the developer. Explicit control offers higher performance but removes the safety net. If your engine requests a specific Shader + Blend State combination that hasn't been pre-compiled (Precached), the driver must pause execution to compile it right now.

Result

A hitch.


Part 2: The Core Problem – Permutation Explosion

Why can't we just compile everything at startup? Because of Shader Permutations.

Modern material systems allow artists to toggle thousands of options (Static Switches). If you have a shader with 10 boolean switches (Shadows, Fog, Metalness, etc.), that creates 2^10 (1,024) potential variants.
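
The growth is exponential, which a one-line calculation makes obvious. The helper below is purely illustrative and assumes every switch is an independent boolean; `VariantCount` is a hypothetical name, not an engine function.

```cpp
#include <cassert>
#include <cstdint>

// Number of shader variants produced by `switchCount` independent
// boolean static switches: 2^switchCount.
std::uint64_t VariantCount(unsigned switchCount)
{
    assert(switchCount < 64);  // keep the shift within 64 bits
    return std::uint64_t{1} << switchCount;
}
```

Ten switches give 1,024 variants; twenty switches already give 1,048,576, and each of those variants can still be paired with several render states to form distinct PSOs.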

Unreal Engine historically favors Permutations over Dynamic Branching:

| Approach | GPU Performance | Disk Size | Compile Time |
| --- | --- | --- | --- |
| Permutations | ✅ Fast (dedicated variant per combo) | ❌ Large | ❌ Slow |
| Dynamic Branching | ❌ Slower (branch divergence) | ✅ Small | ✅ Fast |

Case Study: Bungie (Destiny)

In their 2017 GDC talk, Bungie revealed how they handled Destiny's 18,000+ artist-authored shaders. They avoided explosion using the TFX System:

  • Components: Encapsulated shader options within code boundaries.
  • Variant Layers: Artists created "layers" of overrides in a single file.
  • Selective Building: Crucially, they adhered to a strict rule:

Bungie's Golden Rule

"DON'T build ALL variants." Only compile the subsets actually used in game content, treating the pipeline as an offline content-bake rather than a runtime discovery.

Case Study: Ubisoft (AnvilNext Engine)

Moving Assassin's Creed to DX12, Ubisoft found their existing "granular" state management (setting states piecemeal) was incompatible with DX12's "blob" PSOs. They had to rebuild their renderer to pre-calculate these state blobs offline or during loading screens, essentially creating a database of PSOs linked to material graphs.

Honorable Mention: Naughty Dog

Naughty Dog opted for an Uber-Shader approach in The Last of Us Part II:

"The shader is about 48,000 lines of code, which doesn't include generated code."


Part 3: Unreal Engine 5

Automated PSO Precaching

As of UE 5.3+, Epic enabled PSO Precaching by default. This system looks at the assets being loaded and compiles the necessary PSOs on background threads before the GPU needs them.

Implementing the Loading Screen (The "Wait" Logic)

Even with background precaching, Global Shaders (Post-process, Compute) must be ready before gameplay starts. Gate your gameplay behind a loading screen that checks the compiler status.

Implementation

Check FShaderPipelineCache::NumPrecompilesRemaining(). Do not remove the loading screen until this returns 0.
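
The gate itself is just a per-frame poll. The sketch below is engine-agnostic so it can be compiled on its own: the `RemainingQuery` callback is a stand-in for a wrapper around `FShaderPipelineCache::NumPrecompilesRemaining()`, and both function names are hypothetical. The frame cap is my own addition so that a stuck compile cannot hold the loading screen forever.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Stand-in for a call that reports how many PSOs are still compiling.
using RemainingQuery = std::function<std::uint32_t()>;

// True when the loading screen may be removed this frame.
bool CanHideLoadingScreen(const RemainingQuery& remaining)
{
    assert(static_cast<bool>(remaining));
    return remaining() == 0;
}

// Simulates polling once per frame. Returns how many frames the loading
// screen stayed up, capped at maxFrames.
std::uint32_t FramesOnLoadingScreen(const RemainingQuery& remaining,
                                    std::uint32_t maxFrames)
{
    std::uint32_t frames = 0;
    while (frames < maxFrames && !CanHideLoadingScreen(remaining))
        ++frames;
    return frames;
}
```

In a real project the poll would live in the loading screen's tick rather than in a loop, but the condition is the same: keep the screen up while the count is non-zero.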

Enable precaching via CVar:

r.PSOPrecaching=1

Bundled PSOs

You can also record a Bundled PSO Cache — a recording of every shader drawn during a QA playthrough, bundled into the installer.

Console Command for Diagnosis

Use stat unitgraph to visualize hitches. If the green line (GPU) or game thread spikes simultaneously with a LogRHI warning in the output log, you missed a PSO.


Step-by-Step: Creating a Bundled PSO Cache

1. Configuration

Set up your project to generate Stable Keys — identifiers that persist across builds.

DefaultEngine.ini
[DevOptions.Shaders]
NeedsShaderStableKeys=true

[ShaderPipelineCache]
LastOpenedMask=0
r.ShaderPipelineCache.StartupMode=1

DefaultGame.ini
[/Script/UnrealEd.ProjectPackagingSettings]
bShareMaterialShaderCode=True
bSharedMaterialNativeLibraries=True

2. The Clean Sweep

Critical Step

Old metadata poisons new caches. Always start clean.

  1. Delete Intermediate, Saved/Cooked, and Binaries.
  2. Perform a full Cook/Package of the game.
  3. Locate the generated .shk files in:
    Saved/Cooked/Windows/[Project]/Metadata/PipelineCaches/
    Save these files — you will need them.

3. Recording

Play the game on the target hardware.

Launch with the following arguments:

-logPSO -clearPSODriverCache

During the session:

  • Open every menu
  • Fire every weapon
  • Visit every level

Collect the .rec.upipelinecache files from Saved/CollectedPSOs/.

4. Expanding the Cache

Use the ShaderPipelineCacheTools commandlet (expand mode) to merge the recordings (.rec.upipelinecache) with the stable keys (.shk) and produce the pipeline cache (.spc).

5. Packaging

  1. Place the generated .spc file into Build\Windows\PipelineCaches\.
  2. Repackage the game.

The engine will now load this cache on startup, pre-compiling the recorded PSOs during your splash screens.


Advanced: Custom "Brute Force" Preloading System

While UE5's native precaching works well for linear games, it often falls short in Open World or Level Streaming scenarios. If the engine streams in a new biome chunk, the native precacher may not compile new foliage shaders fast enough, causing a game-thread spike.

To address this, a "Brute Force" Precompilation System has been prototyped by developers (such as user S_PHIR_H on the Epic forums).

1. The Filter (Class Selection)

Instead of loading everything, the script focuses on the heaviest shader contributors.

Nodes: Get Asset Registry → Make ARFilter → Get Assets

Classes to include:

  • StaticMesh
  • SkeletalMesh
  • NiagaraSystem (VFX are a frequent source of hitches)

Note

This check runs in the editor, but can be adapted for runtime use.

2. The Grid Loop (Positioning)

Spawning thousands of assets at (0, 0, 0) causes physics collisions and debugging nightmares. The script calculates a grid position for each asset:

| Axis | Formula | Result |
| --- | --- | --- |
| X | (Index % 100) * 200 | Creates rows of 100 items |
| Y | (Index / 100) * 200 | Advances to the next row every 100 items |
| Spacing | 200 units | Prevents overlap |
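
The same formulas in C++ look like this. The original is a Blueprint script, so this is only an equivalent sketch; the struct and function names are hypothetical, and the division is integer division, which is what makes Y step once per 100 items.

```cpp
#include <cassert>
#include <cstdint>

struct GridPosition
{
    std::int32_t x;
    std::int32_t y;
};

constexpr std::int32_t kItemsPerRow  = 100;
constexpr std::int32_t kSpacingUnits = 200;

// Maps a flat asset index to a grid cell origin.
GridPosition GridPositionForIndex(std::int32_t index)
{
    assert(index >= 0);
    return { (index % kItemsPerRow) * kSpacingUnits,    // column within the row
             (index / kItemsPerRow) * kSpacingUnits };  // row, via integer division
}
```

Index 99 lands at (19800, 0), the last slot of the first row, and index 100 wraps to (0, 200).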

3. Spawning the Asset

Use the Spawn Actor from Object node, passing Asset Data from the registry search. This spawns the appropriate actor representation (e.g., a StaticMeshActor for a Static Mesh) at the calculated grid location.

4. Scale Normalization (Anti-Overlap Logic)

Assets range from 1cm screws to 100m skyscrapers. To ensure the camera can see all of them without massive overlaps, the script normalizes scale across all spawned actors before the camera sweep begins.
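
The script's exact normalization is not documented here, so the following is one plausible way to do it, under stated assumptions: scale each actor uniformly so that its largest bounding-box dimension matches a target size of 100 units, half the 200-unit grid spacing. The target size, `Extent3`, and `NormalizedUniformScale` are all hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Full size of an asset's bounding box along each axis, in world units.
struct Extent3
{
    float x;
    float y;
    float z;
};

// Assumed target: half the 200-unit grid spacing, so neighbors never touch.
constexpr float kTargetSizeUnits = 100.0f;

// Uniform scale that makes the largest bounds dimension equal the target size.
float NormalizedUniformScale(const Extent3& boundsSize)
{
    assert(boundsSize.x >= 0.0f && boundsSize.y >= 0.0f && boundsSize.z >= 0.0f);
    const float largest = std::max({boundsSize.x, boundsSize.y, boundsSize.z});
    if (largest <= 0.0f)
        return 1.0f;  // degenerate bounds: leave the actor unscaled
    return kTargetSizeUnits / largest;
}
```

A uniform scale keeps proportions intact, which matters less for PSO compilation than for being able to spot a missing or broken asset at a glance during the sweep.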


Conclusion

Shader hitching is a solvable problem, but it requires a multi-layered defense strategy:

| Layer | Approach | When |
| --- | --- | --- |
| 🔻 Reduce Variants | Minimize Static Switch usage in materials | Always |
| 🤖 Automate | Use the Brute Force system to catch edge cases | Dev / Testing |
| Wait for PSOs | Show a loading screen until precaching completes | Level transitions |
| 📦 Record | Perform a Bundled PSO recording for a smooth end-user experience | Release |

Tip

All four systems can — and should — be used together for maximum coverage.

PC Gaming Optimization Guide

· 4 min read

Most "optimization" guides just tell you to turn on DLSS or FSR or get new hardware. This guide focuses on the fundamentals:

  • System-level bottlenecks
  • OS issues
  • Engine-level tweaks
  • AMD- and NVIDIA-specific (Red/Green) fixes

1. The "Clean Environment" Protocol

Before tweaking settings, eliminate background interference.

Kill the Overlays

Disable the Steam, Discord, Xbox Game Bar, and NVIDIA overlays. These create "hooks" that interfere with frame pacing.

Cold Boot

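Don't just Restart. Perform a full Shutdown, let the hardware capacitors clear, and boot fresh. If Windows Fast Startup is enabled, a normal Shut down hibernates the kernel instead of resetting it, so hold Shift while clicking Shut down (or disable Fast Startup) to get a true cold boot.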

BIOS & Drivers

Keep your BIOS updated for critical CPU microcode fixes. For AMD, always use the latest chipset drivers directly from their site — not Windows Update.

Windows Update Reverting AMD Drivers

Windows Update sometimes reverts AMD drivers to older, generic versions, breaking features like Radeon Software. To fix this:

  1. Use the Show/Hide Updates troubleshooter to block the offending update.
  2. Use Display Driver Uninstaller (DDU) in Safe Mode for a clean install of the latest official AMD drivers.
  3. Disable driver updates via Device Installation Settings or Group Policy to prevent future overwrites.

Debloated OS

While older Windows iterations often feel snappier, staying on an outdated version isn't recommended for security reasons. The best option is Windows Enterprise, which receives stable, timely updates and ships with less bloat. LTSC releases have been known to cause compatibility issues.


2. Antivirus & Security (The Stutter Killers)

Windows Security can cause massive hitches by scanning game files as they are streamed.

Add AV Exclusions

Add your game's .exe and its installation folder to your antivirus Exclusions list.

Exclude Shader Caches

This is critical. Add these paths to your exclusions:

%UserProfile%\AppData\LocalLow\NVIDIA\PerDriverVersion\DXCache
%LocalAppData%\[GameName]\Saved

CFG Override

  1. Open Start and search for Exploit Protection.
  2. Go to Program Settings.
  3. Add your game's .exe.
  4. Scroll to Control Flow Guard (CFG), tick Override system settings, and set it to Off.

3. Unreal Engine 5 Manual Tuning

UE5 games often suffer from "traversal stutter." You can force better engine behaviour by editing configuration files manually.

Finding the Config File

  1. Paste %LocalAppData% into File Explorer.
  2. Find your game's developer folder.
  3. Navigate to: Saved > Config > Windows (or WindowsClient / WinGDK).
  4. Open Engine.ini.

PSO Precaching

Add the following at the bottom of Engine.ini:

[/Script/Engine.RendererSettings]
r.PSOPrecaching=1

Note

r.PSOPrecaching=1 works for games built with UE5.3 and later. This CVar forces the game to pre-compile shaders, reducing shader-compile stutter during gameplay.

You can also use this file to lower the graphics API or disable specific features on a per-game basis.


4. FastVRam: Console-Level Memory Logic

In Unreal Engine, Fast VRAM is a specialized optimization flag within the Render Dependency Graph (RDG). It acts as a "VIP pass" for GPU memory allocation, instructing the renderer to place the most performance-critical resources — such as Depth Buffers, GBuffers, and Shadow Maps — into the hardware's fastest available memory pool.

Why Use It?

On consoles, bandwidth is king. By forcing the GBuffer (world normals, color, roughness) and Depth Buffers into high-speed VRAM, you reduce the time the GPU spends waiting for data. This mimics console-level stability on PC.

Configuration

Add the following under [/Script/Engine.RendererSettings] in your Engine.ini:

r.FastVRam.DBufferA=1
r.FastVRam.DBufferB=1
r.FastVRam.DBufferC=1
r.FastVRam.DBufferMask=1
r.FastVRam.GBufferA=1
r.FastVRam.GBufferB=1
r.FastVRam.GBufferC=1
r.FastVRam.GBufferD=1
r.FastVRam.GBufferE=1
r.FastVRam.GBufferF=1
r.FastVRam.GBufferVelocity=1
r.FastVRam.SceneDepth=1

5. Hardware-Specific Tweaks

NVIDIA

In NVIDIA Control Panel → Manage 3D Settings, set CUDA - Sysmem Fallback Policy to Prefer No Sysmem Fallback. This prevents the game from falling back to slow system RAM when VRAM fills up.

Caution

This setting is volatile and may reset after driver updates.

AMD

If you have a Ryzen CPU and experience unexplained performance drops, try disabling Core Isolation in Windows Security settings.

Resizable BAR (ReBar)

| GPU Generation | Method |
| --- | --- |
| NVIDIA Turing (GTX 16 series) | Use NvStrapsReBar to enable ReBar on older cards |
| AMD Polaris / Vega | Enable SAM via the Registry by setting KMD_EnableReBarForLegacyASIC to 1 |

6. The "Nuclear" Option

If you're still experiencing stuttering after everything above:

  • DDU (Display Driver Uninstaller): Run in Safe Mode to fully wipe your GPU drivers, then reinstall the latest version from scratch.
  • Disable Overclocks: UE5.0–5.3 is extremely sensitive to instability. If you're crashing, return your CPU and GPU to stock speeds.
  • Clear Shader Caches: Manually delete the DXCache folders mentioned in Section 2 to force a clean rebuild.
  • Set Virtual Memory to 0: If you have sufficient RAM, disabling the page file can remove a source of latency. (Tip via @GameDevMicah)

Summary

True optimization isn't just about upscaling — it's about removing software friction and managing memory like a console. Fix the foundation first.