
Use half precision (f16) in gsplat WGSL shaders #8466

Merged
slimbuck merged 12 commits into playcanvas:main from slimbuck:f16-dev on Feb 23, 2026

@slimbuck
Member

Summary

Use half precision types throughout the gsplat WGSL shader pipeline to reduce ALU cost and varying bandwidth on devices that support the shader-f16 WebGPU feature. On devices without f16 support, all half types automatically fall back to f32 via the engine's alias system, so there is no change in behavior.
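As a rough illustration of the device-side check this summary depends on, the sketch below requests the `shader-f16` feature only when the adapter reports it. `GPUAdapter.features` and `requestDevice({ requiredFeatures })` are standard WebGPU calls. The helper names `pickRequiredFeatures` and `requestDeviceWithOptionalF16` are hypothetical and are not taken from the engine.

```javascript
// Hypothetical helper (not engine code): choose optional features to request.
// GPUAdapter.features is a set-like object; 'shader-f16' is the standard
// WebGPU feature name for f16 support in WGSL.
function pickRequiredFeatures(adapter) {
    return adapter.features.has('shader-f16') ? ['shader-f16'] : [];
}

// Hypothetical helper: request a device, enabling f16 only where supported.
// The returned hasF16 flag is what a shader preprocessor could use to decide
// whether the half aliases map to f16 or fall back to f32.
async function requestDeviceWithOptionalF16(adapter) {
    const requiredFeatures = pickRequiredFeatures(adapter);
    const device = await adapter.requestDevice({ requiredFeatures });
    return { device, hasF16: requiredFeatures.length > 0 };
}
```

In a browser the adapter would come from `navigator.gpu.requestAdapter()`. Requesting a feature the adapter does not list makes `requestDevice` reject, which is why the check comes first.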

Changes

Fragment shader (gsplat.js frag)

  • gaussianUV and gaussianColor varyings changed from vec2f/vec4f to half2/half4, halving interpolation bandwidth
  • normExp() and alpha computation operate entirely in half precision
  • Final output color promoted to f32 only at the output assignment

Vertex shader - core (gsplat.js vert, gsplatStructs.js, gsplatSource.js, gsplatCommon.js)

  • SplatSource.cornerUV changed from vec2f to half2
  • SplatCorner.uv changed from vec2f to half2, aaFactor from f32 to half
  • Color pipeline uses half4 throughout; promoted to f32 only for the modifySplatColor user hook and clipCorner math
  • clipCorner computes clip factor in f32 (for log/sqrt range), stores result as half

Quaternion & covariance (gsplatQuatToMat3.js, gsplatCorner.js)

  • quatToMat3 now returns half3x3 and computes entirely in half precision
  • quatMul computes in half, returns f32
  • computeCovariance operates in half3x3; result promoted to f32 at the initCornerCov boundary for projection math

Spherical harmonics (gsplatEvalSH.js)

  • evalSH return type changed from vec3f to half3, avoiding an unnecessary promotion since all SH constants and intermediates were already half

SOG format (sog.js, sogSH.js)

  • getColor() dequantization (mix) and SH_C0 multiply now operate in half3
  • readSHTexel() dequantization (mix) now operates in half3 instead of f32, benefiting the 15 calls per vertex (SH band 3)
  • SH data read loop unrolled to avoid Dawn's forward progress volatile wrapper on loop-based texture reads
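The arithmetic behind the "15 calls" figure and the `mix`-based dequantization can be sketched in plain JavaScript. This assumes one `readSHTexel()` call per SH coefficient above the DC term, and it assumes an 8-bit quantized input with hypothetical `minValue`/`maxValue` range parameters. Neither detail is confirmed by the PR text.

```javascript
// Number of SH coefficients above the DC term for a given number of bands:
// bands 1..3 contribute 3 + 5 + 7 coefficients.
function shCoeffCount(bands) {
    return (bands + 1) ** 2 - 1;
}

// Scalar version of the WGSL mix() builtin: linear interpolation.
function mix(a, b, t) {
    return a * (1 - t) + b * t;
}

// Hypothetical dequantization (8-bit input is an assumption): map a
// quantized value in [0, 255] onto the range [minValue, maxValue].
function dequantize8(q, minValue, maxValue) {
    return mix(minValue, maxValue, q / 255);
}
```

Under those assumptions, each vertex performs `shCoeffCount(3)` such interpolations, so doing them in half precision touches every one of those reads.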

Technical details

All half-precision types use the engine's alias system (half, half2, half3, half4, half3x3) defined in half-types.js. When CAPS_SHADER_F16 is not defined, these resolve to their f32 equivalents (f32, vec2f, vec3f, vec4f, mat3x3f), ensuring the shaders compile and run correctly on all devices. No native WGSL f16 types (f16, vec2h, mat3x3h, etc.) are used directly.
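A minimal sketch of how such an alias preamble could be generated is shown below. It is not the contents of half-types.js. The function name `halfTypeAliases` and the `hasF16` flag are hypothetical stand-ins for whatever the engine does when `CAPS_SHADER_F16` is defined. The WGSL it emits (`enable f16;` and `alias` declarations) is standard syntax.

```javascript
// Hypothetical generator (not engine code) for the half-type alias preamble.
// With f16 support the aliases map to f16-based types; otherwise they map
// to the f32 equivalents, so shader code written against half/half2/...
// compiles either way.
function halfTypeAliases(hasF16) {
    const scalar = hasF16 ? 'f16' : 'f32';
    const lines = [
        `alias half = ${scalar};`,
        `alias half2 = vec2<${scalar}>;`,
        `alias half3 = vec3<${scalar}>;`,
        `alias half4 = vec4<${scalar}>;`,
        `alias half3x3 = mat3x3<${scalar}>;`
    ];
    // WGSL requires enable directives to precede all declarations.
    if (hasF16) lines.unshift('enable f16;');
    return lines.join('\n');
}
```

Prepending the returned string to a shader source lets the same body, for example `var c: half4 = half4(color);`, compile as f16 or f32 depending on the device.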

f32 is intentionally retained for:

  • Position dequantization and world-space calculations
  • View/projection matrix math
  • Eigenvalue decomposition in initCornerCov (precision-sensitive subtraction)
  • clipCorner log/sqrt (dynamic range)
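To make the precision and range concerns in this list concrete, the sketch below approximates f16 rounding in JavaScript. `roundToHalf` is a hypothetical helper written for this illustration. It rounds to the 11-bit significand of IEEE 754 binary16 and overflows to infinity past the largest finite half value (65504), but it uses `Math.round`, so exact ties round up rather than to even as real hardware would.

```javascript
// Hypothetical helper: approximate rounding of a number to f16 precision.
// Ties are rounded up (Math.round), not to even, so this is a sketch only.
function roundToHalf(x) {
    if (x === 0 || !Number.isFinite(x)) return x;
    const a = Math.abs(x);
    let e = Math.floor(Math.log2(a));
    // Guard against floating-point error in log2 near powers of two.
    if (2 ** e > a) e -= 1;
    else if (2 ** (e + 1) <= a) e += 1;
    // f16 normals start at 2^-14; below that the spacing stays fixed.
    e = Math.max(e, -14);
    // 10 explicit mantissa bits give a spacing of 2^(e - 10).
    const step = 2 ** (e - 10);
    const r = Math.round(a / step) * step;
    // 65504 is the largest finite f16 value.
    return Math.sign(x) * (r > 65504 ? Infinity : r);
}

// Subtracting two nearby values: the true difference is about 0.2, but
// after rounding both operands to half precision it becomes 0.5.
const halfDiff = roundToHalf(1000.3) - roundToHalf(1000.1);
```

Between 512 and 1024 adjacent half values are 0.5 apart, so a subtraction of nearly equal terms like the one in an eigenvalue computation can lose most of its significant digits. Values above 65504 have no finite half representation, which is the dynamic-range concern.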

@slimbuck slimbuck requested review from a team and Copilot February 20, 2026 13:40
@slimbuck slimbuck self-assigned this Feb 20, 2026
@slimbuck slimbuck added the area: graphics Graphics related issue label Feb 20, 2026
Contributor

Copilot AI left a comment

Pull request overview

This PR introduces half-precision (f16) floating-point types throughout the gsplat WGSL shader pipeline to reduce ALU cost and varying bandwidth on devices supporting the shader-f16 WebGPU feature. The implementation uses the engine's type alias system (half, half2, half3, half4, half3x3) that automatically falls back to f32 on devices without f16 support, ensuring compatibility across all hardware.

Changes:

  • Converted color pipeline, UV coordinates, spherical harmonics, quaternion/covariance computations, and fragment shader operations to half precision
  • Retained f32 for position calculations, view/projection matrices, eigenvalue decomposition, and precision-sensitive operations
  • Unrolled SH texture read loop in SOG format to avoid Dawn compiler issues

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| gsplatStructs.js | Changed struct fields to use half2 for UVs and half for AA factor |
| gsplatSource.js | Added half2 cast when initializing cornerUV from vertex position |
| gsplatQuatToMat3.js | Converted quaternion and matrix operations to half precision with f32 boundaries |
| gsplatEvalSH.js | Changed spherical harmonics evaluation to return half3 instead of vec3f |
| gsplatCorner.js | Implemented half-precision covariance computation with f32 promotion for projection math |
| gsplatCommon.js | Modified clipCorner to compute clip factor in f32, store as half |
| gsplat.js (vert) | Converted color pipeline to half4 with f32 conversions at user hook and output boundaries |
| sogSH.js | Unrolled SH data read loop and converted to half precision operations |
| sog.js | Changed color dequantization and SH_C0 multiply to use half3 |
| gsplat.js (frag) | Converted fragment shader to operate in half precision with f32 output |

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

Contributor

@mvaligursky mvaligursky left a comment

Seems good, but it'd be good to test on various Samsung devices - we had multiple half conversion issues on those.

@slimbuck slimbuck merged commit 10fbc63 into playcanvas:main Feb 23, 2026
6 of 8 checks passed
@slimbuck slimbuck deleted the f16-dev branch February 23, 2026 13:55