
2 posts tagged with "unreal-engine"


The State of Desktop Forward Rendering in Unreal Engine 5.7

· 12 min read
Andriy Kalysh

There's a growing discontent with temporal anti-aliasing solutions in game development circles. Communities like r/FuckTAA have emerged as vocal critics of the ghosting and smearing artifacts that plague TAA and other temporal solutions.

The issue runs deeper than preference: UE's entire rendering pipeline now assumes temporal accumulation. SSGI has grown noisier since UE4, expecting TAA to clean up the result. Lumen's software ray tracing leans on temporal filtering to resolve its noisy probes. Even Nanite's virtualized geometry produces sub-pixel triangles that only resolve cleanly under temporal upscaling. Opting out of TAA means fighting the engine's core assumptions.

For certain scenarios, forward rendering with MSAA remains a compelling option. Dense foliage lit by a single dominant light source, the sun, is exactly where forward rendering can shine: one directional light and extensive vegetation, where geometric edge quality matters more than complex lighting setups.

Personal Motivation

My interest is personal: I've always wanted to create a game set in a tropical forest environment. Perhaps this documentation will prove useful to graphics programmers exploring hybrid renderer implementations, or those seeking to understand how forward rendering can coexist with modern engine features.


AA Approach Comparisons

| Configuration | Notes |
|---|---|
| No AA (Deferred) | Baseline |
| SMAA 1x (Deferred) | Basic morphological AA |
| SMAA 1x + Filmic Filter (Deferred) | Good result for deferred |
| MSAA 2x + SMAA S2x (Forward) | Preserves edges better than SMAA 1x + Filmic Filter |

I recently implemented SMAA S2x mode to work alongside MSAA 2x for my plugin, and the results demonstrate why this combination deserves attention.

A Note on the Filmic Filter

The SMAA 1x + Filmic Filter combination addresses a core limitation of pure morphological AA: lack of temporal stability. Based on Activision's Filmic SMAA research, this filter operates in two distinct modes:

Stationary pixels accumulate history aggressively (85% history weight by default), using Catmull-Rom bicubic sampling for sharp history reconstruction. When SMAA edge data is available and no significant motion is detected, the filter applies a convergence term m03 derived from subpixel positioning — this reconstructs detail between the current frame's left/right neighbors and blends it with history, effectively enhancing perceived resolution on static geometry.

Moving pixels (velocity > threshold or detected disocclusion) bypass the convergence sharpening entirely and blend toward the current frame, preventing the ghosting artifacts typical of aggressive temporal filters. Fast motion (>3.5 pixels) forces immediate fallback to the raw current frame.

The Result

Temporal smoothing and detail enhancement when the camera is still; instant reversion to sharp single-frame output on motion. This trades the persistent smear of traditional TAA for a "best of both worlds" approach that respects SMAA's geometric edge quality.
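The following minimal C++ sketch shows the two-mode blend described above, using a single color channel per pixel. The text supplies the 85% history weight and the 3.5-pixel fast-motion cutoff. The following parts are hypothetical: the 0.5-pixel "is moving" threshold, the linear fade of the history weight between the two thresholds, and the names `FilmicPixel` and `FilmicBlend`. The sketch omits the Catmull-Rom history fetch and the m03 convergence term, so it only shows how motion chooses between accumulation and the current frame.

```cpp
#include <algorithm>
#include <cassert>

// One color channel per pixel keeps the sketch short; a real filter works on RGB.
struct FilmicPixel {
    float current;      // this frame's SMAA output
    float history;      // reprojected history (Catmull-Rom sampled in the real filter)
    float motionPixels; // screen-space velocity length, in pixels
    bool disoccluded;   // history is invalid for this pixel
};

constexpr float kHistoryWeight = 0.85f;   // default stationary weight (from the text)
constexpr float kFastMotionPixels = 3.5f; // fast-motion cutoff (from the text)
constexpr float kMotionThreshold = 0.5f;  // hypothetical "is moving" threshold

float FilmicBlend(const FilmicPixel& p) {
    assert(p.motionPixels >= 0.0f);

    // Fast motion: return the raw current frame.
    if (p.motionPixels > kFastMotionPixels) {
        return p.current;
    }

    // Moving or disoccluded: no convergence sharpening; the history weight
    // shrinks toward zero (hypothetical linear ramp), so the result moves toward current.
    if (p.disoccluded || p.motionPixels > kMotionThreshold) {
        const float t = p.disoccluded
            ? 1.0f
            : std::clamp((p.motionPixels - kMotionThreshold) /
                             (kFastMotionPixels - kMotionThreshold),
                         0.0f, 1.0f);
        const float w = kHistoryWeight * (1.0f - t);
        return w * p.history + (1.0f - w) * p.current;
    }

    // Stationary: accumulate history aggressively (convergence term omitted).
    return kHistoryWeight * p.history + (1.0f - kHistoryWeight) * p.current;
}
```

Take a static pixel whose current value is 1.0 and whose history is 0.0. The sketch returns about 0.15 for it. At 4 pixels of motion it returns the current value unchanged.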


Why MSAA and Deferred Rendering Don't Mix

The GBuffer Problem

Deferred rendering stores material properties in multiple render targets called GBuffers. Looking at UE5's SceneTextures.cpp, we can see exactly what gets allocated:

SceneTextures.cpp — GBuffer allocation
if (Bindings.GBufferA.Index >= 0)
{
    const FRDGTextureDesc Desc = FRDGTextureDesc::CreateRenderTargetTextureDesc(
        Config.Extent, Bindings.GBufferA.Format, FClearValueBinding::Transparent,
        Bindings.GBufferA.Flags | FlagsToAdd | GFastVRamConfig.GBufferA,
        Config.bRequireMultiView, Config.MobileMultiViewRenderTargetNumLayers);
    SceneTextures.GBufferA = GraphBuilder.CreateTexture(Desc, TEXT("GBufferA"));
}
// ... GBufferB, C, D, E, F follow the same pattern

UE5's deferred renderer uses up to six GBuffer textures (A through F), plus depth. With 4x MSAA, each of these textures must store 4 samples per pixel. Their memory footprint therefore grows fourfold, and bandwidth rises with it. The lighting pass would also need to read every MSAA sample, shade per sample, and resolve. Modern engines have dozens of passes that would each need MSAA-aware variants.
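The following back-of-the-envelope C++ calculation makes the scaling concrete. It assumes that every target (GBufferA through F plus depth/stencil) costs 4 bytes per sample. That is a simplification, because actual formats depend on project settings. The calculation also counts allocation only, not bandwidth.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: 4 bytes per sample for each of 7 targets (GBufferA-F + depth/stencil).
constexpr std::uint64_t kBytesPerSample = 4;
constexpr std::uint64_t kNumTargets = 7;

// Total bytes allocated for the GBuffer set at a given resolution and sample count.
std::uint64_t GBufferBytes(std::uint64_t width, std::uint64_t height,
                           std::uint64_t msaaSamples) {
    assert(msaaSamples >= 1);
    return width * height * kNumTargets * kBytesPerSample * msaaSamples;
}

// Convenience conversion for reporting.
double ToMiB(std::uint64_t bytes) {
    return static_cast<double>(bytes) / (1024.0 * 1024.0);
}
```

Under that assumption, 1920×1080 gives 58,060,800 bytes (about 55 MiB) without MSAA. At 4x it gives 232,243,200 bytes (about 221 MiB), before any per-sample lighting work.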

Deferred MSAA Is Possible — Crysis 3's Example

Before dismissing MSAA with deferred rendering entirely, it's worth examining how Crytek made it work in Crysis 3. Tiago Sousa's GDC 2013 and SIGGRAPH 2013 presentations document their implementation in detail.

The Core Technique: Stencil-Based Edge Detection

Crytek's approach splits rendering into pixel-frequency and sample-frequency passes:

  1. During G-Buffer fill, reserve 1 bit from the stencil buffer for a sub-sample mask.
  2. A resolve pass extracts sample 0 from the G-Buffer and builds a mask indicating whether all samples in a pixel match sample 0.
  3. Pixel-frequency passes read from pre-resolved (non-multisampled) textures, using stencil to process only uniform pixels.
  4. Sample-frequency passes read from multisampled textures, indexing via SV_SampleIndex, processing only edge pixels.

This builds on Andrew Lauritzen's SIGGRAPH 2010 work at Intel on tile-based deferred shading with MSAA. The key insight: store the G-Buffer at sample frequency, but apply per-sample shading only where discontinuities exist.
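The edge classification in the four steps above can be sketched on the CPU as follows. This is an illustration, not Crytek's code. The `GBufferSample` fields, the depth and normal tolerances, and the helper names are hypothetical. A real implementation would write the result into the reserved stencil bit from a full-screen pass rather than into a `std::vector`.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical G-Buffer sample: only depth and normal are compared here.
struct GBufferSample {
    float depth;
    float normalX, normalY, normalZ;
};

constexpr std::size_t kSamples = 4; // 4x MSAA
using Pixel = std::array<GBufferSample, kSamples>;

// Assumed tolerances for "this sample matches sample 0".
bool SamplesMatch(const GBufferSample& a, const GBufferSample& b) {
    const float kDepthEps = 1e-4f;
    const float kMinNormalDot = 0.99f;
    const float dot = a.normalX * b.normalX + a.normalY * b.normalY + a.normalZ * b.normalZ;
    return std::fabs(a.depth - b.depth) < kDepthEps && dot > kMinNormalDot;
}

// True when any sample differs from sample 0, so the pixel needs sample-frequency shading.
bool IsEdgePixel(const Pixel& px) {
    for (std::size_t s = 1; s < kSamples; ++s) {
        if (!SamplesMatch(px[0], px[s])) return true;
    }
    return false;
}

// Per-pixel mask standing in for the reserved stencil bit.
std::vector<bool> BuildEdgeMask(const std::vector<Pixel>& pixels) {
    std::vector<bool> mask;
    mask.reserve(pixels.size());
    for (const Pixel& px : pixels) mask.push_back(IsEdgePixel(px));
    return mask;
}

// Uniform pixels are shaded once (pixel frequency); edge pixels once per sample.
std::size_t ShadingInvocations(const std::vector<bool>& edgeMask) {
    std::size_t n = 0;
    for (bool edge : edgeMask) n += edge ? kSamples : 1;
    return n;
}
```

Take one uniform pixel and one edge pixel at 4x MSAA. The shading cost is 1 + 4 = 5 invocations instead of 8. That difference is where the savings come from when edges cover a small fraction of the screen.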

The Performance Reality

The technique worked, but the performance cost was substantial. Forum discussions from 2013 report 30–50% frame rate drops when MSAA was enabled in Crysis 3. Whether MSAA itself was the bottleneck remains debated. Threat Interactive's analysis of the Crysis 3 pipeline argues it was not, though their methodology has drawn criticism in technical circles.

Why UE5 Doesn't Do This

Epic could theoretically implement stencil-based deferred MSAA. They haven't, likely because:

  • The engineering complexity is significant.
  • TSR/TAA solve aliasing well enough for most use cases.
  • Virtual Shadow Maps, Lumen, and Nanite all assume temporal accumulation anyway.
  • Modern hardware ray tracing changes the cost/benefit calculus.

For developers who need MSAA without temporal artifacts, UE5's forward rendering path remains the practical choice.


The Industry Shift Toward Visibility Buffers

id Software's GPC 2025 presentation on DOOM: The Dark Ages reveals they abandoned their Forward+ pipeline (used in DOOM Eternal) in favor of a visibility buffer / deferred hybrid. The core problem? Quad utilization efficiency.

When triangle density increases, forward rendering suffers disproportionately. id Software's profiling showed scenes where helper threads vastly outnumbered active threads — pixels being shaded that would never contribute to the final image. Their visibility buffer approach saved up to 25% GPU time on target hardware, with performance now scaling almost linearly with resolution.
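The quad-utilization problem follows from how GPUs launch pixel shaders. Lanes run in 2×2 quads so that derivatives can be computed, and every quad a triangle touches launches all four lanes. The C++ sketch below counts that overhead for the pixels covered by one triangle. It is an illustrative model with hypothetical names, not id Software's profiling method.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

struct QuadStats {
    int coveredPixels; // lanes doing useful work
    int quads;         // 2x2 quads touched by the triangle
    int helperLanes;   // lanes launched only to supply derivatives
    double efficiency; // covered / (4 * quads)
};

// Input: the pixel coordinates covered by ONE triangle (non-negative).
QuadStats MeasureQuads(const std::vector<std::pair<int, int>>& coveredPixels) {
    // De-duplicate pixels, then map each one to the 2x2 quad that contains it.
    std::set<std::pair<int, int>> pixels(coveredPixels.begin(), coveredPixels.end());
    std::set<std::pair<int, int>> quads;
    for (const auto& [x, y] : pixels) {
        assert(x >= 0 && y >= 0);
        quads.insert({x / 2, y / 2});
    }

    QuadStats s{};
    s.coveredPixels = static_cast<int>(pixels.size());
    s.quads = static_cast<int>(quads.size());
    s.helperLanes = 4 * s.quads - s.coveredPixels;
    s.efficiency = s.quads > 0 ? static_cast<double>(s.coveredPixels) / (4.0 * s.quads) : 0.0;
    return s;
}
```

A single-pixel triangle yields one quad with three helper lanes (25% efficiency). A quad-aligned 2×2 block wastes nothing. As meshes get denser, more triangles fall into the first case.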

Epic's trajectory tells a similar story. UE 5.7 deprecated Clustered Deferred Rendering, citing maintenance burden and low adoption; the introduction of MegaLights may be another reason. Nanite already uses a visibility buffer internally. Lumen assumes temporal accumulation. The engine architecture increasingly optimizes for deferred-style pipelines with compute-based material evaluation.

The Hybrid Approach Remains Viable

Despite these trends, hybrid rendering — forward for select object types, deferred for the rest — remains a production-proven approach in 2025. DOOM: The Dark Ages itself ships as a hybrid: Forward+ is still used for transparents and remains available as a fallback path.

Many AAA studios continue to leverage hybrid pipelines precisely because different content types have different optimal rendering strategies. Foliage, hair, and particles often benefit from forward rendering's MSAA integration and simpler transparency handling, while static opaque geometry fits naturally into deferred or visibility buffer workflows.

Opinion

UE5 should preserve this flexibility. The current forward rendering path works. It integrates with MSAA. It handles masked materials without the complexity of compute-based dispatch systems. Rather than treating forward rendering as a checkbox, Epic should integrate it properly for developers who don't want to use "Unreal's Pipeline".


UE5's Forward Rendering Pipeline

Despite being named FDeferredShadingSceneRenderer, UE5's main renderer handles both deferred and forward paths. The branch happens based on project settings:

DeferredShadingRenderer.cpp
if (IsForwardShadingEnabled(ShaderPlatform))
{
    // Forward-specific path
    ensureMsgf(!VirtualShadowMapArray.IsEnabled(),
        TEXT("Virtual shadow maps are not supported in the forward shading path"));
    RenderShadowDepthMaps(GraphBuilder, InitViewTaskDatas.DynamicShadows,
        InstanceCullingManager, ExternalAccessQueue);
    bShadowMapsRenderedEarly = true;

    // Hair strands rendering
    if (bHairStrandsEnable)
    {
        RenderHairPrePass(GraphBuilder, Scene, SceneTextures, Views,
            InstanceCullingManager, HairStrandsBookmarkParameters.CullingResults);
        RenderHairBasePass(GraphBuilder, Scene, SceneTextures, Views,
            InstanceCullingManager);
    }

    // Forward shadow projection
    RenderForwardShadowProjections(GraphBuilder, SceneTextures,
        ForwardScreenSpaceShadowMaskTexture, ForwardScreenSpaceShadowMaskHairTexture);

    // Volumetric fog BEFORE base pass (critical ordering difference)
    ComputeVolumetricFog(GraphBuilder, SceneTextures);
}

Critical Ordering Constraint

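Forward rendering requires shadow maps to be rendered before the base pass, because lighting, including shadowing, is evaluated inline as each pixel is shaded. In the deferred path, shadows can be projected later, during the lighting pass, using GBuffer data. This ordering constraint is fundamental to understanding the pipeline.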

Forward Rendering Execution Order

  1. Pre-pass / Early Z (required for forward; optional optimization for deferred)
  2. Shadow Depth Maps (rendered early in forward)
  3. Hair Strands (if enabled)
  4. Forward Shadow Projection → ForwardScreenSpaceShadowMaskTexture
  5. Volumetric Fog (before base pass in forward)
  6. Base Pass (lighting calculated inline)
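The ordering list above can be written down as a small schedule and checked. The `Pass` enum and the helper functions are hypothetical labels for the listed steps, not engine types. The point is only that shadow depth, shadow projection, and volumetric fog must all come before the base pass in the forward path.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical labels for the forward-path stages listed above.
enum class Pass { PrePass, ShadowDepth, HairStrands, ForwardShadowProjection, VolumetricFog, BasePass };

// Builds the forward schedule in the documented order.
std::vector<Pass> ForwardPassOrder(bool bHairStrands) {
    std::vector<Pass> order = {Pass::PrePass, Pass::ShadowDepth};
    if (bHairStrands) order.push_back(Pass::HairStrands);
    order.insert(order.end(), {Pass::ForwardShadowProjection, Pass::VolumetricFog, Pass::BasePass});
    return order;
}

// True when both passes are scheduled and 'before' runs earlier than 'after'.
bool RunsBefore(const std::vector<Pass>& order, Pass before, Pass after) {
    assert(before != after);
    const auto a = std::find(order.begin(), order.end(), before);
    const auto b = std::find(order.begin(), order.end(), after);
    return a != order.end() && b != order.end() && a < b;
}
```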

The Base Pass: Where Lighting Happens

In forward rendering, the base pass does everything: it evaluates materials and lighting in a single pass. From the RenderBasePass declaration:

DeferredShadingRenderer.h
static void RenderBasePass(
    FDeferredShadingSceneRenderer& Renderer,
    FRDGBuilder& GraphBuilder,
    TArrayView<FViewInfo> InViews,
    FSceneTextures& SceneTextures,
    const FDBufferTextures& DBufferTextures,
    FExclusiveDepthStencil::Type BasePassDepthStencilAccess,
    FRDGTextureRef ForwardShadowMaskTexture,
    FInstanceCullingManager& InstanceCullingManager,
    bool bNaniteEnabled,
    struct FNaniteShadingCommands& NaniteBasePassShadingCommands,
    const TArrayView<Nanite::FRasterResults>& NaniteRasterResults);
info

Note that ForwardShadowMaskTexture is passed directly to the base pass; forward materials sample it during shading.

Forward Lighting: The Shader Side

The actual lighting calculations happen in ForwardLightingCommon.ush. The core function is GetForwardDirectLightingSplit:

ForwardLightingCommon.ush
FDeferredLightingSplit GetForwardDirectLightingSplit(
    uint2 PixelPos,
    uint GridIndex,
    float3 TranslatedWorldPosition,
    float3 CameraVector,
    FGBufferData GBufferData, // Note: "GBufferData" is a misnomer in the forward path, where no GBuffer is written
    float2 ScreenUV,
    uint PrimitiveId,
    uint EyeIndex,
    float Dither,
    float InDirectionalLightCloudShadow,
    float3 InDirectionalLightAtmosphereTransmittance,
    inout float OutDirectionalLightShadow,
    bool bSeparateMainDirLightLuminance,
    inout float3 SeparatedMainDirLightLuminance,
    bool bSkipDirLightVirtualShadowMapEvaluation)

Directional Light Handling

ForwardLightingCommon.ush — Directional light path
BRANCH
if (DirectionalLightData.HasDirectionalLight
#if MATERIALBLENDING_ANY_TRANSLUCENT
    && DirectionalLightData.bAffectsTranslucentLighting > 0
#endif
    )
{
    half4 PreviewShadowMapChannelMask = 1;
    uint DirLightingChannelMask = LIGHTING_CHANNEL_MASK;
    FDeferredLightData LightData = ConvertToDeferredLight(DirectionalLightData,
        SpecularScale, PreviewShadowMapChannelMask, DirLightingChannelMask);

    // Shadow factor calculation
#if DISABLE_FORWARD_DIRECTIONAL_LIGHT_SHADOW
    float4 LightAttenuation = float4(1, 1, 1, 1);
#elif ((MATERIALBLENDING_SOLID || MATERIALBLENDING_MASKED) && !MATERIAL_SHADINGMODEL_SINGLELAYERWATER)
    float DynamicShadowing = dot(PreviewShadowMapChannelMask, DynamicShadowFactors);
    float PerObjectShadowing = LightData.DistanceFadeMAD.y < 0.0f ? 1.0f : DynamicShadowing;
    float WholeSceneShadowing = LightData.DistanceFadeMAD.y < 0.0f ? DynamicShadowing : 1.0f;
    float4 LightAttenuation = float4(WholeSceneShadowing.xx, PerObjectShadowing.xx);
#else
    // Translucent path - calculates shadows inline
    float DynamicShadowFactor = ComputeDirectionalLightDynamicShadowing(
        TranslatedWorldPosition, GBufferData.Depth, bUnused);
#endif
    // ... (excerpt ends here; the branch continues)
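For reference, the opaque/masked branch of the directional light path ports directly to C++. The swizzle `float4(WholeSceneShadowing.xx, PerObjectShadowing.xx)` expands to (whole, whole, perObject, perObject). This sketch assumes that `DynamicShadowFactors` is already available as a float4 for the pixel, since the excerpt does not show its source. It uses `std::array` in place of HLSL vector types.

```cpp
#include <array>
#include <cassert>

using float4 = std::array<float, 4>;

// HLSL dot() on two float4 values.
float Dot4(const float4& a, const float4& b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

// CPU port of the MATERIALBLENDING_SOLID / MASKED branch above.
// distanceFadeMADy stands in for LightData.DistanceFadeMAD.y.
float4 OpaqueLightAttenuation(const float4& previewShadowMapChannelMask,
                              const float4& dynamicShadowFactors,
                              float distanceFadeMADy) {
    const float dynamicShadowing = Dot4(previewShadowMapChannelMask, dynamicShadowFactors);
    const float perObject = distanceFadeMADy < 0.0f ? 1.0f : dynamicShadowing;
    const float wholeScene = distanceFadeMADy < 0.0f ? dynamicShadowing : 1.0f;
    // float4(WholeSceneShadowing.xx, PerObjectShadowing.xx)
    return float4{{wholeScene, wholeScene, perObject, perObject}};
}
```

When DistanceFadeMAD.y is negative, the dynamic shadow term goes into the whole-scene slots (x, y). Otherwise it goes into the per-object slots (z, w).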

Local Lights: Clustered Forward

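Forward rendering doesn't mean giving up local lights. UE5 uses clustered forward lighting: local lights are culled into a 3D grid of screen tiles and depth slices, and each pixel loops over the light list stored in its cell: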

ForwardLightingCommon.ush — Clustered local lights
#if !DISABLE_FORWARD_LOCAL_LIGHTS
const FCulledLightsGridHeader CulledLightsGridHeader = GetCulledLightsGridHeader(GridIndex);

// Safety clamp to prevent GPU hangs
const uint NumLightsInGridCell = min(CulledLightsGridHeader.NumLights, GetMaxLightsPerCell());

LOOP
for (uint GridLightListIndex = 0; GridLightListIndex < NumLightsInGridCell; GridLightListIndex++)
{
    half4 PreviewShadowMapChannelMask = 1;
    uint LocalLightingChannelMask = LIGHTING_CHANNEL_MASK;
    const FLocalLightData LocalLight = GetLocalLightDataFromGrid(
        CulledLightsGridHeader.DataStartIndex + GridLightListIndex, EyeIndex);

#if MATERIALBLENDING_ANY_TRANSLUCENT
    if (UnpackAffectsTranslucentLighting(LocalLight) == 0)
    {
        continue; // Skip lights that don't affect translucency
    }
#endif

    FDeferredLightData LightData = ConvertToDeferredLight(LocalLight, SpecularScale,
        PreviewShadowMapChannelMask, LocalLightingChannelMask);
    // ... (excerpt ends here; the loop body continues)

The per-cell light limit is controlled by this CVar:

int32 GMaxCulledLightsPerCell = 32;
FAutoConsoleVariableRef CVarMaxCulledLightsPerCell(
    TEXT("r.Forward.MaxCulledLightsPerCell"),
    GMaxCulledLightsPerCell,
    TEXT("Controls how much memory is allocated for each cell for light culling. "
         "When r.Forward.LightLinkedListCulling is enabled, this is used to compute "
         "a global max instead of a per-cell limit on culled lights."),
    ECVF_Scalability | ECVF_RenderThreadSafe
);
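The `GridIndex` parameter of `GetForwardDirectLightingSplit` identifies a cell in a 3D grid of screen tiles and depth slices. The C++ sketch below shows one common way to form such an index, using exponential depth slicing, together with the same `min()` clamp as the shader. The `LightGrid` struct, its values, and the slicing formula are assumptions for illustration; the sketch does not reproduce UE's own grid parameterization.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical grid description; values used below are illustrative.
struct LightGrid {
    std::uint32_t tileSizePx;     // screen-space tile edge, in pixels
    std::uint32_t tilesX, tilesY; // tiles across and down the screen
    std::uint32_t slicesZ;        // number of depth slices
    float nearZ, farZ;            // view-space depth range covered by the grid
};

// Exponential slicing: slice k covers [near*(far/near)^(k/N), near*(far/near)^((k+1)/N)).
std::uint32_t DepthSlice(const LightGrid& g, float viewZ) {
    assert(g.nearZ > 0.0f && g.farZ > g.nearZ && g.slicesZ > 0);
    const float z = std::clamp(viewZ, g.nearZ, g.farZ);
    const float t = std::log(z / g.nearZ) / std::log(g.farZ / g.nearZ); // 0..1
    const auto slice = static_cast<std::uint32_t>(t * static_cast<float>(g.slicesZ));
    return std::min(slice, g.slicesZ - 1u);
}

// Flattened cell index: x varies fastest, then y, then z.
std::uint32_t CellIndex(const LightGrid& g, std::uint32_t px, std::uint32_t py, float viewZ) {
    const std::uint32_t tx = std::min(px / g.tileSizePx, g.tilesX - 1u);
    const std::uint32_t ty = std::min(py / g.tileSizePx, g.tilesY - 1u);
    return (DepthSlice(g, viewZ) * g.tilesY + ty) * g.tilesX + tx;
}

// Same idea as the shader's safety clamp: never loop past the per-cell budget.
std::uint32_t LightsToShade(std::uint32_t numLightsInCell, std::uint32_t maxLightsPerCell) {
    return std::min(numLightsInCell, maxLightsPerCell);
}
```

This example grid uses 64-pixel tiles at 1080p (30 × 17 tiles) and 32 slices, for 16,320 cells in total. The bottom-right pixel at maximum depth maps to index 16,319. With a budget of 32, `LightsToShade(40, 32)` returns 32.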

MSAA Implementation in UE5

Scene Texture Allocation

From SceneTextures.cpp, MSAA is configured at texture creation time:

SceneTextures.cpp — MSAA CVar
static TAutoConsoleVariable<int32> CVarMSAACount(
    TEXT("r.MSAACount"),
    4,
    TEXT("Number of MSAA samples to use with the forward renderer. "
         "Only used when MSAA is enabled in the rendering project settings.\n")
    TEXT("0: MSAA disabled (Temporal AA enabled)\n")
    TEXT("1: MSAA disabled\n")
    TEXT("2: Use 2x MSAA\n")
    TEXT("4: Use 4x MSAA")
    TEXT("8: Use 8x MSAA"),
    ECVF_RenderThreadSafe | ECVF_Scalability
);

Depth Buffer with MSAA

FMinimalSceneTextures::InitializeViewFamily
FRDGTextureDesc Desc = FRDGTextureDesc::CreateRenderTargetTextureDesc(
    SceneTextures.Config.Extent,
    PF_DepthStencil,
    Config.DepthClearValue,
    Config.DepthCreateFlags,
    Config.bRequireMultiView,
    Config.MobileMultiViewRenderTargetNumLayers);
Desc.NumSamples = Config.NumSamples; // MSAA sample count

SceneTextures.Depth = GraphBuilder.CreateTexture(Desc, TEXT("SceneDepthZ"));

// MSAA requires resolve target
if (Desc.NumSamples > 1)
{
    Desc.NumSamples = 1;

    if ((StereoDepthRHI = FindStereoDepthTexture(Config.bSupportsXRTargetManagerDepthAlloc,
        Config.Extent, ETextureCreateFlags::DepthStencilResolveTarget, Desc.NumSamples)) != nullptr)
    {
        // Use XR-provided resolve target
        SceneTextures.Depth.Resolve = RegisterExternalTexture(GraphBuilder, StereoDepthRHI, TEXT("SceneDepthZ"));
    }
    else if (Config.bKeepDepthContent)
    {
        // Create our own resolve target
        SceneTextures.Depth.Resolve = GraphBuilder.CreateTexture(Desc, TEXT("SceneDepthZ"));
    }
}

Scene Color with MSAA

SceneTextures.cpp — Scene Color creation
{
    const bool bIsMobilePlatform = Config.ShadingPath == EShadingPath::Mobile;
    const ETextureCreateFlags sRGBFlag = (bIsMobilePlatform && IsMobileColorsRGB())
        ? TexCreate_SRGB : TexCreate_None;

    FRDGTextureDesc Desc = FRDGTextureDesc::CreateRenderTargetTextureDesc(
        Config.Extent,
        Config.ColorFormat,
        Config.ColorClearValue,
        Config.ColorCreateFlags,
        Config.bRequireMultiView,
        Config.MobileMultiViewRenderTargetNumLayers);
    Desc.NumSamples = Config.NumSamples;

    // CreateTextureMSAA handles creating both MSAA target and resolve target
    SceneTextures.Color = CreateTextureMSAA(GraphBuilder, Desc,
        TEXT("SceneColorMS"), TEXT("SceneColor"),
        GFastVRamConfig.SceneColor | sRGBFlag);
}

The FRDGTextureMSAA structure (used for both Color and Depth) contains two members: Target (the MSAA render target) and Resolve (the resolved single-sample texture).
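The Target/Resolve split can be mirrored on the CPU to show what a resolve does. This sketch assumes a plain box-filter (average) resolve for color and takes sample 0 for depth. `MSAAColor`, `ResolveColor`, and `ResolveDepthSample0` are hypothetical names, and the sketch makes no claim about which resolve filter UE applies to each format. Depth is not averaged, because an average across a silhouette gives a depth that belongs to neither surface.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

struct Color { float r, g, b; };

// Hypothetical CPU mirror of a Target/Resolve pair:
// 'target' holds N samples per pixel, 'resolve' one value per pixel.
template <std::size_t N>
struct MSAAColor {
    std::vector<std::array<Color, N>> target;
    std::vector<Color> resolve;
};

// Box-filter resolve: average all samples of each pixel.
template <std::size_t N>
void ResolveColor(MSAAColor<N>& tex) {
    static_assert(N > 0, "need at least one sample");
    const float inv = 1.0f / static_cast<float>(N);
    tex.resolve.resize(tex.target.size());
    for (std::size_t i = 0; i < tex.target.size(); ++i) {
        Color sum{0.0f, 0.0f, 0.0f};
        for (const Color& c : tex.target[i]) {
            sum.r += c.r;
            sum.g += c.g;
            sum.b += c.b;
        }
        tex.resolve[i] = Color{sum.r * inv, sum.g * inv, sum.b * inv};
    }
}

// Depth: keep sample 0 instead of averaging (one common choice; min/max are alternatives).
template <std::size_t N>
float ResolveDepthSample0(const std::array<float, N>& samples) {
    static_assert(N > 0, "need at least one sample");
    return samples[0];
}
```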


Limitations of Forward Rendering in UE5

Virtual Shadow Maps: Not Supported

This is explicitly enforced in the code:

if (IsForwardShadingEnabled(ShaderPlatform))
{
    ensureMsgf(!VirtualShadowMapArray.IsEnabled(),
        TEXT("Virtual shadow maps are not supported in the forward shading path"));
}

Post-Process Effects Require Resolved Data

Many post-process effects operate on resolved (single-sample) data, meaning MSAA doesn't help with aliasing they introduce:

  • Screen-Space Reflections (SSR)
  • Screen-Space Ambient Occlusion (SSAO)
  • Depth of Field
  • Motion Blur

Nanite: Works, But Independently

Nanite does work with forward rendering, but operates somewhat independently:

if (bNaniteEnabled && InViews.Num() > 0)
{
    RenderNanite(GraphBuilder, InViews, LocalSceneTextures, bIsEarlyDepthComplete,
        InNaniteBasePassVisibility, NaniteRasterResults, PrimaryNaniteViews,
        FirstStageDepthBuffer);
}

Conclusion

Forward rendering in UE5 is a fully functional path, not a legacy compatibility mode. It's designed for scenarios with simpler lighting requirements where geometric clarity matters.

What Works Well

| Feature | Notes |
|---|---|
| Clustered Forward Lighting | Supports multiple local lights efficiently |
| Full MSAA | Supported at the renderer level |
| Nanite | Works, albeit somewhat independently |
| Hair Strands | Benefits more from MSAA coverage than from TAA's temporal resolve, with less ghosting on fine strands |
| Volumetric Fog | Fully integrated |

Real Trade-offs

What You Lose
  • No Virtual Shadow Maps — the engine explicitly asserts against this.
  • No Lumen GI or Lumen Reflections — these rely on deferred-specific screen traces and the surface cache.
  • Limited screen-space effects — SSR, SSAO, DoF, and motion blur all operate on resolved data.

For the right content type — dense foliage, fine geometry, single dominant light — forward rendering with MSAA remains a genuinely superior choice to temporal solutions. The key is understanding where each pipeline's assumptions align with your rendering goals.

PC Gaming Optimization Guide

· 4 min read

Most "optimization" guides just tell you to turn on DLSS or FSR or get new hardware. This guide focuses on the fundamentals:

  • System-level bottlenecks
  • OS issues
  • Engine-level tweaks
  • Red/Green specific fixes

1. The "Clean Environment" Protocol

Before tweaking settings, eliminate background interference.

Kill the Overlays

Disable the Steam, Discord, Xbox Game Bar, and NVIDIA overlays. Overlays hook into the game's rendering, which can interfere with frame pacing.

Cold Boot

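Don't just Restart. Perform a full Shutdown, let the hardware capacitors clear, and boot fresh. If Windows Fast Startup is enabled, Shutdown only hibernates the kernel session, so disable Fast Startup to make this a true cold boot.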

BIOS & Drivers

Keep your BIOS updated for critical CPU microcode fixes. For AMD, always use the latest chipset drivers directly from their site — not Windows Update.

Windows Update Reverting AMD Drivers

Windows Update sometimes reverts AMD drivers to older, generic versions, which breaks Radeon Software features. To fix this:

  1. Use the Show/Hide Updates troubleshooter to block the offending update.
  2. Use Display Driver Uninstaller (DDU) in Safe Mode for a clean install of the latest official AMD drivers.
  3. Disable driver updates via Device Installation Settings or Group Policy to prevent future overwrites.

Debloated OS

While older Windows iterations often feel snappier, staying on an outdated version isn't recommended for security reasons. The best option is Windows Enterprise, which receives stable, timely updates and ships with less bloat. LTSC releases have been known to cause compatibility issues.


2. Antivirus & Security (The Stutter Killers)

Windows Security can cause massive hitches by scanning game files as they are streamed.

Add AV Exclusions

Add your game's .exe and its installation folder to your antivirus Exclusions list.

Exclude Shader Caches

This is critical. Add these paths to your exclusions:

%USERPROFILE%\AppData\LocalLow\NVIDIA\PerDriverVersion\DXCache
%LocalAppData%\[GameName]\Saved

CFG Override

  1. Open Start and search for Exploit Protection.
  2. Go to Program Settings.
  3. Click Add program to customize and select your game's .exe.
  4. Scroll to Control Flow Guard (CFG), tick Override system settings, and set it to Off.

3. Unreal Engine 5 Manual Tuning

UE5 games often suffer from "traversal stutter." You can force better engine behaviour by editing configuration files manually.

Finding the Config File

  1. Paste %LocalAppData% into File Explorer.
  2. Find your game's developer folder.
  3. Navigate to: Saved > Config > Windows (or WindowsClient / WinGDK).
  4. Open Engine.ini.

PSO Precaching

Add the following at the bottom of Engine.ini:

[/Script/Engine.RendererSettings]
r.PSOPrecaching=1
note

r.PSOPrecaching=1 works for games built with UE5.3 and later. This CVar tells the engine to compile pipeline state objects (PSOs) ahead of first use, which reduces PSO-compilation stutter during gameplay.

You can also use this file to lower the graphics API or disable specific features on a per-game basis.


4. FastVRam: Console-Level Memory Logic

In Unreal Engine, Fast VRAM is a specialized optimization flag within the Render Dependency Graph (RDG). It acts as a "VIP pass" for GPU memory allocation, instructing the renderer to place the most performance-critical resources — such as Depth Buffers, GBuffers, and Shadow Maps — into the hardware's fastest available memory pool.

Why Use It?

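On consoles, bandwidth is king. Forcing the GBuffer (world normals, color, roughness) and Depth Buffers into high-speed VRAM reduces the time the GPU spends waiting for data. The flag was designed around console memory layouts, though. On PC its effect depends on the RHI and driver, so benchmark before and after rather than assuming console-level stability.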

Configuration

Add the following under [/Script/Engine.RendererSettings] in your Engine.ini:

r.FastVRam.DBufferA=1
r.FastVRam.DBufferB=1
r.FastVRam.DBufferC=1
r.FastVRam.DBufferMask=1
r.FastVRam.GBufferA=1
r.FastVRam.GBufferB=1
r.FastVRam.GBufferC=1
r.FastVRam.GBufferD=1
r.FastVRam.GBufferE=1
r.FastVRam.GBufferF=1
r.FastVRam.GBufferVelocity=1
r.FastVRam.SceneDepth=1

5. Hardware-Specific Tweaks

NVIDIA

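In NVIDIA Control Panel → Manage 3D Settings, set CUDA - Sysmem Fallback Policy to Prefer No Sysmem Fallback. This is meant to stop allocations from spilling into slow system RAM when VRAM fills up. NVIDIA documents it for CUDA applications, so its effect on DirectX or Vulkan games may be limited.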

caution

This setting is volatile and may reset after driver updates.

AMD

If you have a Ryzen CPU and experience unexplained performance drops, try disabling Core Isolation in Windows Security settings.

Resizable BAR (ReBar)

| GPU Generation | Method |
|---|---|
| NVIDIA Turing / GTX 1600 | Use NvStrapsReBar to enable ReBar on older cards |
| AMD Polaris / Vega | Enable SAM via the Registry by setting KMD_EnableReBarForLegacyASIC to 1 |

6. The "Nuclear" Option

If you're still experiencing stuttering after everything above:

  • DDU (Display Driver Uninstaller): Run in Safe Mode to fully wipe your GPU drivers, then reinstall the latest version from scratch.
  • Disable Overclocks: UE5.0–5.3 is extremely sensitive to instability. If you're crashing, return your CPU and GPU to stock speeds.
  • Clear Shader Caches: Manually delete the DXCache folders mentioned in Section 2 to force a clean rebuild.
  • Set Virtual Memory to 0: If you have sufficient RAM, disabling the page file can remove a source of latency. (Tip via @GameDevMicah)

Summary

True optimization isn't just about upscaling — it's about removing software friction and managing memory like a console. Fix the foundation first.