The State of Desktop Forward Rendering in Unreal Engine 5.7

· 12 min read
Andriy Kalysh

There's a growing discontent with temporal anti-aliasing solutions in game development circles. Communities like r/FuckTAA have emerged as vocal critics of the ghosting and smearing artifacts that plague TAA and other temporal solutions.

The issue runs deeper than preference: UE's entire rendering pipeline now assumes temporal accumulation. SSGI has grown noisier since UE4 and relies on TAA to clean up the result. Lumen's software ray tracing leans on temporal filtering to resolve its noisy probes. Even Nanite's virtualized geometry produces sub-pixel triangles that only resolve cleanly under temporal upscaling. Opting out of TAA means fighting the engine's core assumptions.

For certain scenarios, forward rendering with MSAA remains a compelling option. Dense foliage lit by a single dominant light source, the sun, is exactly where forward rendering can shine: one directional light and extensive vegetation, where geometric edge quality matters more than complex lighting setups.

Personal Motivation

My interest is personal: I've always wanted to create a game set in a tropical forest environment. Perhaps this documentation will prove useful to graphics programmers exploring hybrid renderer implementations, or those seeking to understand how forward rendering can coexist with modern engine features.


AA Approach Comparisons

| Configuration | Notes |
| --- | --- |
| No AA (Deferred) | Baseline |
| SMAA 1x (Deferred) | Basic morphological AA |
| SMAA 1x + Filmic Filter (Deferred) | Good result for deferred |
| MSAA 2x + SMAA S2x (Forward) | Preserved edges better than Filmic + SMAA |

I recently implemented SMAA S2x mode to work alongside MSAA 2x for my plugin, and the results demonstrate why this combination deserves attention.

A Note on the Filmic Filter

The SMAA 1x + Filmic Filter combination addresses a core limitation of pure morphological AA: lack of temporal stability. Based on Activision's Filmic SMAA research, this filter operates in two distinct modes:

Stationary pixels accumulate history aggressively (85% history weight by default), using Catmull-Rom bicubic sampling for sharp history reconstruction. When SMAA edge data is available and no significant motion is detected, the filter applies a convergence term m03 derived from subpixel positioning — this reconstructs detail between the current frame's left/right neighbors and blends it with history, effectively enhancing perceived resolution on static geometry.

Moving pixels (velocity > threshold or detected disocclusion) bypass the convergence sharpening entirely and blend toward the current frame, preventing the ghosting artifacts typical of aggressive temporal filters. Fast motion (>3.5 pixels) forces immediate fallback to the raw current frame.
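The two modes can be sketched as a small weight-selection routine. This is a minimal illustration, not the plugin's shader: only the 85% history weight and the 3.5-pixel cutoff come from the description above. The motion threshold value, the linear ramp between threshold and cutoff, the zero weight on disocclusion, and all names are assumptions made for the example.

```cpp
#include <cassert>

// Hypothetical parameter block. Only the first two defaults come from the text.
struct FilmicFilterParams
{
    float StationaryHistoryWeight = 0.85f; // 85% history weight (from the article)
    float FastMotionCutoffPx      = 3.5f;  // fast-motion fallback (from the article)
    float MotionThresholdPx       = 0.5f;  // assumed placeholder value
};

// Returns how much of the history sample to keep (0 = current frame only).
inline float ComputeHistoryWeight(float VelocityPx, bool bDisoccluded,
                                  const FilmicFilterParams& P = FilmicFilterParams())
{
    assert(VelocityPx >= 0.0f);

    // Assumption: a disoccluded pixel has no trustworthy history at all.
    if (bDisoccluded)
        return 0.0f;

    // Fast motion: fall back to the raw current frame.
    if (VelocityPx > P.FastMotionCutoffPx)
        return 0.0f;

    // Stationary: accumulate history aggressively.
    if (VelocityPx <= P.MotionThresholdPx)
        return P.StationaryHistoryWeight;

    // Moving: blend toward the current frame (assumed linear ramp).
    const float T = (VelocityPx - P.MotionThresholdPx) /
                    (P.FastMotionCutoffPx - P.MotionThresholdPx);
    return P.StationaryHistoryWeight * (1.0f - T);
}

// The convergence term is only applied on stationary pixels with SMAA edge data.
inline bool ShouldApplyConvergence(float VelocityPx, bool bDisoccluded, bool bHasSmaaEdgeData,
                                   const FilmicFilterParams& P = FilmicFilterParams())
{
    return bHasSmaaEdgeData && !bDisoccluded && VelocityPx <= P.MotionThresholdPx;
}

// Linear blend between the current frame and the history sample.
inline float BlendTowardHistory(float Current, float History, float HistoryWeight)
{
    return Current + (History - Current) * HistoryWeight;
}
```

Keeping the weight selection separate from the blend makes the fallback behavior easy to reason about: every path that distrusts history ends at a weight of zero, which is exactly the single-frame SMAA output.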

The Result

Temporal smoothing and detail enhancement when the camera is still; instant reversion to sharp single-frame output on motion. This trades the persistent smear of traditional TAA for a "best of both worlds" approach that respects SMAA's geometric edge quality.


Why MSAA and Deferred Rendering Don't Mix

The GBuffer Problem

Deferred rendering stores material properties in multiple render targets called GBuffers. Looking at UE5's SceneTextures.cpp, we can see exactly what gets allocated:

SceneTextures.cpp — GBuffer allocation
if (Bindings.GBufferA.Index >= 0)
{
    const FRDGTextureDesc Desc = FRDGTextureDesc::CreateRenderTargetTextureDesc(
        Config.Extent, Bindings.GBufferA.Format, FClearValueBinding::Transparent,
        Bindings.GBufferA.Flags | FlagsToAdd | GFastVRamConfig.GBufferA,
        Config.bRequireMultiView, Config.MobileMultiViewRenderTargetNumLayers);
    SceneTextures.GBufferA = GraphBuilder.CreateTexture(Desc, TEXT("GBufferA"));
}
// ... GBufferB, C, D, E, F follow the same pattern

UE5's deferred renderer uses up to six GBuffer textures (A through F), plus depth. With 4x MSAA, you'd need to store 4 samples per pixel for each of these textures, which roughly quadruples GBuffer memory and raises bandwidth along with it. On top of that, the lighting pass would need to read all MSAA samples, perform shading calculations per sample, and resolve. Modern engines have dozens of passes that would each need MSAA-aware variants.
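To put rough numbers on that cost, here is a back-of-the-envelope estimate. It assumes six color targets at 4 bytes per texel and a 4-byte depth-stencil target; real GBuffer formats vary per target and per project, so treat the layout struct as a hypothetical placeholder rather than UE5's actual configuration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout: the byte sizes are assumptions, not UE5's real formats.
struct GBufferLayoutSketch
{
    std::uint32_t NumColorTargets    = 6; // GBufferA..F, as described in the text
    std::uint32_t BytesPerColorTexel = 4; // assumed 32-bit formats
    std::uint32_t BytesPerDepthTexel = 4; // assumed 32-bit depth-stencil
};

// Total bytes needed to store every sample of every target.
inline std::uint64_t EstimateGBufferBytes(std::uint32_t Width, std::uint32_t Height,
                                          std::uint32_t NumSamples,
                                          const GBufferLayoutSketch& Layout = GBufferLayoutSketch())
{
    assert(NumSamples >= 1);
    const std::uint64_t BytesPerSample =
        std::uint64_t(Layout.NumColorTargets) * Layout.BytesPerColorTexel +
        Layout.BytesPerDepthTexel;
    return std::uint64_t(Width) * Height * NumSamples * BytesPerSample;
}
```

Under these assumptions, `EstimateGBufferBytes(1920, 1080, 1)` gives 58,060,800 bytes (about 55 MiB), and the 4x MSAA case gives 232,243,200 bytes (about 221 MiB), before counting any of the other per-sample intermediate targets.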

Deferred MSAA Is Possible — Crysis 3's Example

Before dismissing MSAA with deferred rendering entirely, it's worth examining how Crytek made it work in Crysis 3. Tiago Sousa's GDC 2013 and SIGGRAPH 2013 presentations document their implementation in detail.

The Core Technique: Stencil-Based Edge Detection

Crytek's approach splits rendering into pixel-frequency and sample-frequency passes:

  1. During G-Buffer fill, reserve 1 bit from the stencil buffer for a sub-sample mask.
  2. A resolve pass extracts sample 0 from the G-Buffer and builds a mask indicating whether all samples in a pixel match sample 0.
  3. Pixel-frequency passes read from pre-resolved (non-multisampled) textures, using stencil to process only uniform pixels.
  4. Sample-frequency passes read from multisampled textures, indexing via SV_SAMPLEINDEX, processing only edge pixels.

This builds on Intel's Andrew Lauritzen's SIGGRAPH 2010 research on tile-based deferred shading with MSAA. The key insight: store G-Buffer at sample frequency, but only apply per-sample shading where discontinuities exist.
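The classification step can be illustrated on the CPU. The sketch below is not Crytek's code: it assumes each G-Buffer sample has been packed into a single 32-bit value (a hypothetical simplification), flags a pixel as an edge when any sample differs from sample 0, and then counts how many shading invocations a split pixel-frequency / sample-frequency scheme would need.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical packed G-Buffer sample; a real G-Buffer spans several targets.
using PackedSample = std::uint32_t;

// Returns one flag per pixel: true means "needs sample-frequency shading".
// Samples are laid out as NumPixels consecutive groups of SamplesPerPixel.
inline std::vector<bool> BuildSubSampleMask(const std::vector<PackedSample>& Samples,
                                            std::size_t NumPixels,
                                            std::size_t SamplesPerPixel)
{
    assert(SamplesPerPixel >= 1);
    assert(Samples.size() == NumPixels * SamplesPerPixel);

    std::vector<bool> bEdge(NumPixels, false);
    for (std::size_t Pixel = 0; Pixel < NumPixels; ++Pixel)
    {
        const PackedSample First = Samples[Pixel * SamplesPerPixel];
        for (std::size_t S = 1; S < SamplesPerPixel; ++S)
        {
            if (Samples[Pixel * SamplesPerPixel + S] != First)
            {
                bEdge[Pixel] = true; // at least one sample differs from sample 0
                break;
            }
        }
    }
    return bEdge;
}

// Uniform pixels are shaded once; edge pixels are shaded once per sample.
inline std::size_t CountShadingInvocations(const std::vector<bool>& bEdge,
                                           std::size_t SamplesPerPixel)
{
    std::size_t Count = 0;
    for (bool bIsEdge : bEdge)
    {
        Count += bIsEdge ? SamplesPerPixel : 1;
    }
    return Count;
}
```

For three pixels at 4x MSAA where only the middle one straddles an edge, this yields 6 shading invocations instead of the 12 a naive per-sample pass would run. On the GPU the mask lives in the reserved stencil bit, so the two passes are separated by the stencil test rather than by a branch.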

The Performance Reality

The technique worked, but the performance cost was substantial. Forum discussions from 2013 show 30–50% frame rate drops when enabling MSAA in Crysis 3. Whether MSAA itself was the bottleneck remains debated — Threat Interactive's analysis of the Crysis 3 pipeline argues otherwise, though their methodology has drawn criticism in technical circles.

Why UE5 Doesn't Do This

Epic could theoretically implement stencil-based deferred MSAA. They haven't, likely because:

  • The engineering complexity is significant.
  • TSR/TAA solve aliasing well enough for most use cases.
  • Virtual Shadow Maps, Lumen, and Nanite all assume temporal accumulation anyway.
  • Modern hardware ray tracing changes the cost/benefit calculus.

For developers who need MSAA without temporal artifacts, UE5's forward rendering path remains the practical choice.


The Industry Shift Toward Visibility Buffers

id Software's GPC 2025 presentation on DOOM: The Dark Ages reveals they abandoned their Forward+ pipeline (used in DOOM Eternal) in favor of a visibility buffer / deferred hybrid. The core problem? Quad utilization efficiency.

When triangle density increases, forward rendering suffers disproportionately. id Software's profiling showed scenes where helper threads vastly outnumbered active threads — pixels being shaded that would never contribute to the final image. Their visibility buffer approach saved up to 25% GPU time on target hardware, with performance now scaling almost linearly with resolution.
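The quad utilization problem is easy to reproduce on paper. GPUs shade pixels in 2x2 quads, so any quad a triangle touches launches four lanes even if only one pixel is covered. The sketch below counts active and helper lanes for a coverage bitmap; it is an illustrative model with hypothetical names, not id Software's profiling method.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct QuadStats
{
    std::size_t ActiveLanes = 0; // covered pixels that contribute to the image
    std::size_t HelperLanes = 0; // lanes launched only to complete a 2x2 quad
};

// Coverage is row-major, Width x Height, true where the triangle covers a pixel.
inline QuadStats MeasureQuadUtilization(const std::vector<bool>& Coverage,
                                        std::size_t Width, std::size_t Height)
{
    assert(Coverage.size() == Width * Height);
    assert(Width % 2 == 0 && Height % 2 == 0);

    QuadStats Stats;
    for (std::size_t Y = 0; Y < Height; Y += 2)
    {
        for (std::size_t X = 0; X < Width; X += 2)
        {
            std::size_t Covered = 0;
            for (std::size_t Dy = 0; Dy < 2; ++Dy)
                for (std::size_t Dx = 0; Dx < 2; ++Dx)
                    Covered += Coverage[(Y + Dy) * Width + (X + Dx)] ? 1 : 0;

            if (Covered == 0)
                continue; // untouched quads launch nothing

            Stats.ActiveLanes += Covered;
            Stats.HelperLanes += 4 - Covered;
        }
    }
    return Stats;
}

// Fraction of launched lanes that actually produce a visible pixel.
inline double Utilization(const QuadStats& Stats)
{
    const std::size_t Total = Stats.ActiveLanes + Stats.HelperLanes;
    return Total == 0 ? 0.0 : static_cast<double>(Stats.ActiveLanes) / Total;
}
```

A triangle that covers a single pixel gets one active lane and three helper lanes, so utilization is 25%. In a forward renderer each of those helper lanes runs the full material and lighting shader, which is why small triangles hurt forward shading more than a visibility buffer pass that writes only an ID.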

Epic's trajectory tells a similar story. UE 5.7 deprecated Clustered Deferred Rendering, citing maintenance burden and low adoption; the introduction of MegaLights may be another reason. Nanite already uses a visibility buffer internally. Lumen assumes temporal accumulation. The engine architecture increasingly optimizes for deferred-style pipelines with compute-based material evaluation.

The Hybrid Approach Remains Viable

Despite these trends, hybrid rendering — forward for select object types, deferred for the rest — remains a production-proven approach in 2025. DOOM: The Dark Ages itself ships as a hybrid: Forward+ is still used for transparents and remains available as a fallback path.

Many AAA studios continue to leverage hybrid pipelines precisely because different content types have different optimal rendering strategies. Foliage, hair, and particles often benefit from forward rendering's MSAA integration and simpler transparency handling, while static opaque geometry fits naturally into deferred or visibility buffer workflows.

Opinion

UE5 should preserve this flexibility. The current forward rendering path works. It integrates with MSAA. It handles masked materials without the complexity of compute-based dispatch systems. Rather than treating forward rendering as a checkbox, Epic should integrate it properly for developers who don't want to use "Unreal's Pipeline".


UE5's Forward Rendering Pipeline

Despite being named FDeferredShadingSceneRenderer, UE5's main renderer handles both deferred and forward paths. The branch happens based on project settings:

DeferredShadingRenderer.cpp
if (IsForwardShadingEnabled(ShaderPlatform))
{
    // Forward-specific path
    ensureMsgf(!VirtualShadowMapArray.IsEnabled(),
        TEXT("Virtual shadow maps are not supported in the forward shading path"));
    RenderShadowDepthMaps(GraphBuilder, InitViewTaskDatas.DynamicShadows,
        InstanceCullingManager, ExternalAccessQueue);
    bShadowMapsRenderedEarly = true;

    // Hair strands rendering
    if (bHairStrandsEnable)
    {
        RenderHairPrePass(GraphBuilder, Scene, SceneTextures, Views,
            InstanceCullingManager, HairStrandsBookmarkParameters.CullingResults);
        RenderHairBasePass(GraphBuilder, Scene, SceneTextures, Views,
            InstanceCullingManager);
    }

    // Forward shadow projection
    RenderForwardShadowProjections(GraphBuilder, SceneTextures,
        ForwardScreenSpaceShadowMaskTexture, ForwardScreenSpaceShadowMaskHairTexture);

    // Volumetric fog BEFORE base pass (critical ordering difference)
    ComputeVolumetricFog(GraphBuilder, SceneTextures);
}

Critical Ordering Constraint

Forward rendering requires shadow maps to be rendered before the base pass. In deferred, shadows can be calculated later using GBuffer data. This ordering constraint is fundamental to understanding the pipeline.

Forward Rendering Execution Order

  1. Pre-pass / Early Z (required for forward; optional optimization for deferred)
  2. Shadow Depth Maps (rendered early in forward)
  3. Hair Strands (if enabled)
  4. Forward Shadow Projection → ForwardScreenSpaceShadowMaskTexture
  5. Volumetric Fog (before base pass in forward)
  6. Base Pass (lighting calculated inline)

The Base Pass: Where Lighting Happens

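In forward rendering, the base pass does everything: geometry is rasterized and lit in the same pass instead of being written to GBuffers for a later lighting pass. The declaration of RenderBasePass: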

DeferredShadingRenderer.h
static void RenderBasePass(
    FDeferredShadingSceneRenderer& Renderer,
    FRDGBuilder& GraphBuilder,
    TArrayView<FViewInfo> InViews,
    FSceneTextures& SceneTextures,
    const FDBufferTextures& DBufferTextures,
    FExclusiveDepthStencil::Type BasePassDepthStencilAccess,
    FRDGTextureRef ForwardShadowMaskTexture,
    FInstanceCullingManager& InstanceCullingManager,
    bool bNaniteEnabled,
    struct FNaniteShadingCommands& NaniteBasePassShadingCommands,
    const TArrayView<Nanite::FRasterResults>& NaniteRasterResults);

Notice ForwardShadowMaskTexture being passed directly to the base pass. Forward materials sample this during shading.

Forward Lighting: The Shader Side

The actual lighting calculations happen in ForwardLightingCommon.ush. The core function is GetForwardDirectLightingSplit:

ForwardLightingCommon.ush
FDeferredLightingSplit GetForwardDirectLightingSplit(
    uint2 PixelPos,
    uint GridIndex,
    float3 TranslatedWorldPosition,
    float3 CameraVector,
    FGBufferData GBufferData, // Note: "GBufferData" is an inaccurate term in forward
    float2 ScreenUV,
    uint PrimitiveId,
    uint EyeIndex,
    float Dither,
    float InDirectionalLightCloudShadow,
    float3 InDirectionalLightAtmosphereTransmittance,
    inout float OutDirectionalLightShadow,
    bool bSeparateMainDirLightLuminance,
    inout float3 SeparatedMainDirLightLuminance,
    bool bSkipDirLightVirtualShadowMapEvaluation)

Directional Light Handling

ForwardLightingCommon.ush — Directional light path
BRANCH
if (DirectionalLightData.HasDirectionalLight
#if MATERIALBLENDING_ANY_TRANSLUCENT
    && DirectionalLightData.bAffectsTranslucentLighting > 0
#endif
    )
{
    half4 PreviewShadowMapChannelMask = 1;
    uint DirLightingChannelMask = LIGHTING_CHANNEL_MASK;
    FDeferredLightData LightData = ConvertToDeferredLight(DirectionalLightData,
        SpecularScale, PreviewShadowMapChannelMask, DirLightingChannelMask);

    // Shadow factor calculation
    #if DISABLE_FORWARD_DIRECTIONAL_LIGHT_SHADOW
        float4 LightAttenuation = float4(1, 1, 1, 1);
    #elif ((MATERIALBLENDING_SOLID || MATERIALBLENDING_MASKED) && !MATERIAL_SHADINGMODEL_SINGLELAYERWATER)
        float DynamicShadowing = dot(PreviewShadowMapChannelMask, DynamicShadowFactors);
        float PerObjectShadowing = LightData.DistanceFadeMAD.y < 0.0f ? 1.0f : DynamicShadowing;
        float WholeSceneShadowing = LightData.DistanceFadeMAD.y < 0.0f ? DynamicShadowing : 1.0f;
        float4 LightAttenuation = float4(WholeSceneShadowing.xx, PerObjectShadowing.xx);
    #else
        // Translucent path - calculates shadows inline
        float DynamicShadowFactor = ComputeDirectionalLightDynamicShadowing(
            TranslatedWorldPosition, GBufferData.Depth, bUnused);
    #endif
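The opaque/masked branch of the excerpt above routes a single dynamic shadow factor into either the whole-scene or the per-object channel, depending on the sign of DistanceFadeMAD.y. A plain C++ mirror of just that selection, with hypothetical names and scalar stand-ins for the HLSL types, might look like this:

```cpp
#include <cassert>

// Scalar stand-in for the float4 LightAttenuation (each value is duplicated
// into two channels in the shader excerpt).
struct LightAttenuationSketch
{
    float WholeSceneShadowing;
    float PerObjectShadowing;
};

// Mirrors the two ternaries from the excerpt: a negative DistanceFadeMAD.y
// sends the dynamic shadow factor to the whole-scene slot, otherwise to the
// per-object slot. The unused slot is left fully lit (1.0).
inline LightAttenuationSketch RouteDynamicShadow(float DistanceFadeMADy, float DynamicShadowing)
{
    assert(DynamicShadowing >= 0.0f && DynamicShadowing <= 1.0f);

    const bool bWholeScene = DistanceFadeMADy < 0.0f;
    return {
        bWholeScene ? DynamicShadowing : 1.0f,
        bWholeScene ? 1.0f : DynamicShadowing
    };
}
```

Only the routing is reproduced here; how DynamicShadowFactors is sampled from the forward shadow mask and how DistanceFadeMAD is populated are outside what the excerpt shows.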

Local Lights: Clustered Forward

Forward rendering doesn't mean no local lights. UE5 uses clustered forward lighting:

ForwardLightingCommon.ush — Clustered local lights
#if !DISABLE_FORWARD_LOCAL_LIGHTS
    const FCulledLightsGridHeader CulledLightsGridHeader = GetCulledLightsGridHeader(GridIndex);

    // Safety clamp to prevent GPU hangs
    const uint NumLightsInGridCell = min(CulledLightsGridHeader.NumLights, GetMaxLightsPerCell());

    LOOP
    for (uint GridLightListIndex = 0; GridLightListIndex < NumLightsInGridCell; GridLightListIndex++)
    {
        half4 PreviewShadowMapChannelMask = 1;
        uint LocalLightingChannelMask = LIGHTING_CHANNEL_MASK;
        const FLocalLightData LocalLight = GetLocalLightDataFromGrid(
            CulledLightsGridHeader.DataStartIndex + GridLightListIndex, EyeIndex);

        #if MATERIALBLENDING_ANY_TRANSLUCENT
        if (UnpackAffectsTranslucentLighting(LocalLight) == 0)
        {
            continue; // Skip lights that don't affect translucency
        }
        #endif

        FDeferredLightData LightData = ConvertToDeferredLight(LocalLight, SpecularScale,
            PreviewShadowMapChannelMask, LocalLightingChannelMask);

The per-cell light limit is controlled by this CVar:

int32 GMaxCulledLightsPerCell = 32;
FAutoConsoleVariableRef CVarMaxCulledLightsPerCell(
    TEXT("r.Forward.MaxCulledLightsPerCell"),
    GMaxCulledLightsPerCell,
    TEXT("Controls how much memory is allocated for each cell for light culling. "
         "When r.Forward.LightLinkedListCulling is enabled, this is used to compute "
         "a global max instead of a per-cell limit on culled lights."),
    ECVF_Scalability | ECVF_RenderThreadSafe
);
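The per-cell lookup and clamp from the shader loop can be mirrored on the CPU to see what the limit does in practice. This is a simplified sketch: the header struct and the flat index list are hypothetical stand-ins for the engine's culled light grid buffers, and only the clamp-then-iterate pattern is taken from the excerpt above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified stand-in for a grid cell header: how many lights were culled
// into the cell, and where its run starts in the flat index list.
struct CellHeaderSketch
{
    std::uint32_t NumLights;
    std::uint32_t DataStartIndex;
};

// Returns the light indices a pixel in this cell would iterate over.
inline std::vector<std::uint32_t> GatherCellLights(const CellHeaderSketch& Header,
                                                   const std::vector<std::uint32_t>& CulledLightIndices,
                                                   std::uint32_t MaxLightsPerCell)
{
    // Same safety clamp as the shader: never loop past the per-cell maximum.
    const std::uint32_t NumLightsInCell = std::min(Header.NumLights, MaxLightsPerCell);
    assert(std::size_t(Header.DataStartIndex) + NumLightsInCell <= CulledLightIndices.size());

    std::vector<std::uint32_t> Result;
    Result.reserve(NumLightsInCell);
    for (std::uint32_t I = 0; I < NumLightsInCell; ++I)
    {
        Result.push_back(CulledLightIndices[Header.DataStartIndex + I]);
    }
    return Result;
}
```

In this model, lights beyond the limit are simply never visited for that cell. If a cell reports 4 lights and the limit is 2, only the first two entries are shaded.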

MSAA Implementation in UE5

Scene Texture Allocation

In SceneTextures.cpp, the MSAA sample count comes from the r.MSAACount CVar and is applied when the scene textures are created:

SceneTextures.cpp — MSAA CVar
static TAutoConsoleVariable<int32> CVarMSAACount(
    TEXT("r.MSAACount"),
    4,
    TEXT("Number of MSAA samples to use with the forward renderer. "
         "Only used when MSAA is enabled in the rendering project settings.\n")
    TEXT("0: MSAA disabled (Temporal AA enabled)\n")
    TEXT("1: MSAA disabled\n")
    TEXT("2: Use 2x MSAA\n")
    TEXT("4: Use 4x MSAA")
    TEXT("8: Use 8x MSAA"),
    ECVF_RenderThreadSafe | ECVF_Scalability
);

Depth Buffer with MSAA

FMinimalSceneTextures::InitializeViewFamily
FRDGTextureDesc Desc = FRDGTextureDesc::CreateRenderTargetTextureDesc(
    SceneTextures.Config.Extent,
    PF_DepthStencil,
    Config.DepthClearValue,
    Config.DepthCreateFlags,
    Config.bRequireMultiView,
    Config.MobileMultiViewRenderTargetNumLayers);
Desc.NumSamples = Config.NumSamples; // MSAA sample count

SceneTextures.Depth = GraphBuilder.CreateTexture(Desc, TEXT("SceneDepthZ"));

// MSAA requires resolve target
if (Desc.NumSamples > 1)
{
    Desc.NumSamples = 1;

    if ((StereoDepthRHI = FindStereoDepthTexture(Config.bSupportsXRTargetManagerDepthAlloc,
        Config.Extent, ETextureCreateFlags::DepthStencilResolveTarget, Desc.NumSamples)) != nullptr)
    {
        // Use XR-provided resolve target
        SceneTextures.Depth.Resolve = RegisterExternalTexture(GraphBuilder, StereoDepthRHI, TEXT("SceneDepthZ"));
    }
    else if (Config.bKeepDepthContent)
    {
        // Create our own resolve target
        SceneTextures.Depth.Resolve = GraphBuilder.CreateTexture(Desc, TEXT("SceneDepthZ"));
    }
}

Scene Color with MSAA

SceneTextures.cpp — Scene Color creation
{
    const bool bIsMobilePlatform = Config.ShadingPath == EShadingPath::Mobile;
    const ETextureCreateFlags sRGBFlag = (bIsMobilePlatform && IsMobileColorsRGB())
        ? TexCreate_SRGB : TexCreate_None;

    FRDGTextureDesc Desc = FRDGTextureDesc::CreateRenderTargetTextureDesc(
        Config.Extent,
        Config.ColorFormat,
        Config.ColorClearValue,
        Config.ColorCreateFlags,
        Config.bRequireMultiView,
        Config.MobileMultiViewRenderTargetNumLayers);
    Desc.NumSamples = Config.NumSamples;

    // CreateTextureMSAA handles creating both MSAA target and resolve target
    SceneTextures.Color = CreateTextureMSAA(GraphBuilder, Desc,
        TEXT("SceneColorMS"), TEXT("SceneColor"),
        GFastVRamConfig.SceneColor | sRGBFlag);
}

The FRDGTextureMSAA structure (used for both Color and Depth) contains two members: Target (the MSAA render target) and Resolve (the resolved single-sample texture).
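To make the Target/Resolve pairing concrete, here is a toy CPU model of a two-member MSAA texture and a box-filter resolve. It is only an illustration of the relationship between the two members: the struct is a hypothetical stand-in, single-channel floats replace real pixel formats, and the engine's actual resolve runs on the GPU and is not necessarily a plain average.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a Target/Resolve pair. Target holds NumSamples values per
// pixel, stored contiguously; Resolve holds one value per pixel.
struct TextureMSAASketch
{
    std::vector<float> Target;
    std::vector<float> Resolve;
    std::size_t NumSamples = 1;
};

// Averages the samples of each pixel into the single-sample resolve texture.
inline void ResolveBoxFilter(TextureMSAASketch& Texture)
{
    assert(Texture.NumSamples >= 1);
    assert(Texture.Target.size() % Texture.NumSamples == 0);

    const std::size_t NumPixels = Texture.Target.size() / Texture.NumSamples;
    Texture.Resolve.assign(NumPixels, 0.0f);

    for (std::size_t Pixel = 0; Pixel < NumPixels; ++Pixel)
    {
        float Sum = 0.0f;
        for (std::size_t S = 0; S < Texture.NumSamples; ++S)
        {
            Sum += Texture.Target[Pixel * Texture.NumSamples + S];
        }
        Texture.Resolve[Pixel] = Sum / static_cast<float>(Texture.NumSamples);
    }
}
```

A pixel whose four samples are half covered (0, 0, 1, 1) resolves to 0.5, while a fully covered pixel stays at 1.0. Passes that only ever see Resolve work with that already-averaged value, which matters for the post-process limitations discussed later.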


Limitations of Forward Rendering in UE5

Virtual Shadow Maps: Not Supported

This is explicitly enforced in the code:

if (IsForwardShadingEnabled(ShaderPlatform))
{
    ensureMsgf(!VirtualShadowMapArray.IsEnabled(),
        TEXT("Virtual shadow maps are not supported in the forward shading path"));
}

Post-Process Effects Require Resolved Data

Many post-process effects operate on resolved (single-sample) data, meaning MSAA doesn't help with aliasing they introduce:

  • Screen-Space Reflections (SSR)
  • Screen-Space Ambient Occlusion (SSAO)
  • Depth of Field
  • Motion Blur

Nanite: Works, But Independently

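Nanite does work with forward rendering, but it goes through its own RenderNanite call, gated on bNaniteEnabled, rather than being drawn with the rest of the base pass geometry: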

if (bNaniteEnabled && InViews.Num() > 0)
{
    RenderNanite(GraphBuilder, InViews, LocalSceneTextures, bIsEarlyDepthComplete,
        InNaniteBasePassVisibility, NaniteRasterResults, PrimaryNaniteViews,
        FirstStageDepthBuffer);
}

Conclusion

Forward rendering in UE5 is a fully functional path, not a legacy compatibility mode. It's designed for scenarios with simpler lighting requirements where geometric clarity matters.

What Works Well

| Feature | Notes |
| --- | --- |
| Clustered Forward Lighting | Supports multiple local lights efficiently |
| Full MSAA | Supported at the renderer level |
| Nanite | Works, albeit somewhat independently |
| Hair Strands | Benefits from MSAA coverage more than TAA temporal resolve; less ghosting on fine strands |
| Volumetric Fog | Fully integrated |

Real Trade-offs

What You Lose
  • No Virtual Shadow Maps — the engine explicitly asserts against this.
  • No Lumen GI or Lumen Reflections — these rely on deferred-specific screen traces and the surface cache.
  • Limited screen-space effects — SSR, SSAO, DoF, and motion blur all operate on resolved data.

For the right content type — dense foliage, fine geometry, single dominant light — forward rendering with MSAA remains a genuinely superior choice to temporal solutions. The key is understanding where each pipeline's assumptions align with your rendering goals.