This improves scheduling with one side of a divergent branch writing to a
VGPR using VMEM/DS, and the other writing using VALU. At the merge block,
it will properly consider that the VGPR was written by a VMEM/DS.
fossil-db (navi31):
Totals from 1224 (1.53% of 79825) affected shaders:
Instrs: 5264815 -> 5267604 (+0.05%); split: -0.00%, +0.06%
CodeSize: 27406404 -> 27422132 (+0.06%); split: -0.00%, +0.06%
Latency: 48325204 -> 48293975 (-0.06%); split: -0.09%, +0.03%
InvThroughput: 8923880 -> 8919191 (-0.05%); split: -0.07%, +0.02%
fossil-db (navi21):
Totals from 1267 (1.59% of 79825) affected shaders:
Instrs: 4628583 -> 4629190 (+0.01%); split: -0.00%, +0.01%
CodeSize: 24974672 -> 24977188 (+0.01%); split: -0.00%, +0.01%
Latency: 45080476 -> 44998120 (-0.18%); split: -0.20%, +0.02%
InvThroughput: 12288202 -> 12269634 (-0.15%); split: -0.16%, +0.01%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38262>
We don't run this code before waitcnt insertion, so this isn't necessary.
This change improves accuracy in these two situations, because the waitcnt
insertion pass is more aware of divergent control flow:
v0 = valu
if (divergent) {
v0 = vmem
} else {
use(v0)
}
v0 = vmem
if (divergent) {
wait vmcnt(0)
} else {
wait vmcnt(0)
}
use(v0)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38262>
Surprisingly, this has an effect on GFX1201:
Totals from 66 (0.08% of 82405) affected shaders:
Instrs: 200725 -> 201517 (+0.39%)
CodeSize: 978676 -> 981488 (+0.29%)
Latency: 291736 -> 291760 (+0.01%)
InvThroughput: 31556 -> 31604 (+0.15%)
Copies: 11928 -> 12588 (+5.53%)
Branches: 14850 -> 15048 (+1.33%)
SALU: 68981 -> 69509 (+0.77%)
I say surprisingly, because nir_analyze_range handles nothing but
constants and bcsel for integers. Maybe rdr2 is actually
hitting some weird bcsel(a, #b, #c) == 0 case where b and c are not 0?
No, I looked at a few of those shaders, and it's just noise from changed
instruction order.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39756>
According to the ISA doc, this is needed for hang recovery.
This works by just avoiding putting temporaries in s0-3 unless they're
precolored there.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> (radv)
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39720>
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39720>
It's rarely used because a custom pixel or compute shader is almost always
faster, and we have those already.
RGB<->BGR swapping and microtile mode switching existed only for CB_RESOLVE
and are removed too.
RADV could also remove CB_RESOLVE, but it should probably use a pixel shader
until it can use ac_nir_meta_cs_blit, which is the fastest option for gfx12.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39872>
Add an option to enable HSR Prepass.
It is currently disabled by default as it might cause performance
regressions for content that:
- Has very simple fragment work.
- Already does a ZS prepass.
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39615>
dcd_flags1 was not counted as dirty in case the color attachment map was
updated. This could lead to an outdated value for render_target_mask.
Fixes: a4670a67e0 ("panvk/csf: Set the correct DCD_FLAGS_1.render_rarget_mask")
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39615>
Odd macro-tile counts in X trigger flaky rendering/readback in
parallel stress runs with macro-tiled NPOT textures (for example
piglit draw-pixel-with-texture -auto -fbo).
When a texture is macro-tiled and uses stride addressing, align the
width to two macro tiles. This keeps the stride at an even number of
macro tiles in X and avoids the corruption without disabling
macrotiling.
I was not able to find anything about this in the docs.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39882>
TexSubImage3D failures were caused by tiled multi-slice uploads for
unaligned XY box. Falls back to per-layer uploads when the 3D box
depth is > 1 and XY box is unaligned.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39897>
The previous code was copying RefPicList0 to RefPicList1 but not updating
num_ref_idx_l1_active_minus1, leaving it potentially uninitialized or zero.
This caused the hardware to see an inconsistent L1 list state.
Accordingly it sets num_ref_idx_active_override_flag if necessary.
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39884>
It's not needed to decompress DCC when formats are compatible each
other, this basically removes all decompressions on GFX11-GFX11.5.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39888>
No issue with clang or gcc-14.x (or earlier versions). The issue only
shows up since gcc-15.1. The compiler somehow fails to consider those
cs helpers dereferencing the pointer from the pNext chain for reads,
and thus has falsely optimized away the pNext store. This change works
around this with a no-op memory clobber.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13242
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39906>
lua_pushunsigned was introduced in Lua 5.2, deprecated in 5.3,
and removed in 5.4. Replace it with lua_pushinteger which has
been available since Lua 5.0 and handles the uint32_t value
safely via implicit widening to the 64-bit lua_Integer type.
This fixes the build with Lua 5.5:
../src/freedreno/decode/script.c: In function 'pushdecval':
../src/freedreno/decode/script.c:182:7: error: implicit declaration of function 'lua_pushunsigned'; did you mean 'lua_pushinteger'? [-Wimplicit-function-declaration]
182 | lua_pushunsigned(L, val.u);
| ^~~~~~~~~~~~~~~~
| lua_pushinteger
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39901>
If descriptor heap is used, there's no pipeline layout created. So we
have to patch compute and RT pipelines to allow it. Graphics pipeline
doesn't need the change because of GPL.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39762>
There're potential optimizations available for below:
- vkWriteSamplerDescriptorsEXT
- vkWriteResourceDescriptorsEXT
- vkGetPhysicalDeviceDescriptorSizeEXT
- vkRegisterCustomBorderColorEXT
...and we can revisit if there's perf hit from above for real apps.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39762>