The previous fix was incomplete because if the same graphics pipeline
and the same PS epilog are rebind after vkCmdExecuteCommands(), the PS
epilog state wouldn't be re-emitted, and it will use a wrong VA (in case
both fragment shader user SGPRs aren't similar either).
Resetting the PS epilog to NULL in the primary should prevent any
issues, but this tracking still need to be improved because it caused
two issues recently.
Fixes: 1a00587c44 ("radv: fix a GPU hang with PS epilogs and secondary command buffers")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/work_items/15176
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit a73fc90bcd)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41269>
Is a single triangle is selected, it can be the case that the next iteration
can't merge any pair with the triangle. In that case, the HW node with a
single triangle will not have the highest hw_node_index, triggering an
assert.
Fixes: c18a7d0 ("radv: Emit compressed primitive nodes on GFX12")
Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
(cherry picked from commit db38d1a98c)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40979>
Entering a loop with empty exec mask might lead to
not be able to execute the break condition and
lead to infinite loops.
Totals from 81 (0.04% of 202440) affected shaders: (Navi48)
Instrs: 3040566 -> 3040716 (+0.00%)
CodeSize: 17506768 -> 17507188 (+0.00%)
Latency: 16342966 -> 16345166 (+0.01%)
InvThroughput: 3112932 -> 3113286 (+0.01%)
Branches: 82229 -> 82365 (+0.17%)
Cc: mesa-stable
(cherry picked from commit 60b3e5b3f0)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40979>
If both src and dst are compressed formats, adjusting the extent isn't
necessary because it's required that texel block extent matches. The
previous division was also wrong because it was truncating partial
blocks causing issues in some tests.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 4e00e1c3d0)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40979>
Midpoint sorting is incompatible with how our traversal works.
Specifically, we change tMax when a hit is committed so we can skip over
BVH nodes that are guaranteed not to produce a closer hit. However,
changing tMax also changes the intersection interval of box nodes with
the ray, and thus, the midpoints of that interval. Stackless traversal
relies on getting nodes back in the exact same order as before, and if
that requirement is not met, traversal may incorrectly skip over nodes.
The likely benefit of midpoint sorting does not make up for the loss of
ability to skip over BVH nodes exceeding tMax, so simply disable
midpoint sorting.
This fixes geometry being visible behind other geometry when it
shouldn't be in various applications, including Half-Life 2 RTX.
Cc: mesa-stable
(cherry picked from commit c1a7680d93)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40979>
move_rt_instructions() only makes sense for CPS recursive shaders, where
later rt_trace_ray calls can overwrite the current shader's RT system
values.
Running it on the function-call path can hoist load_hit_attrib_amd
above merged intersection writes, which corrupts any-hit
hitAttributeEXT. Move the pass into the existing CPS-only
non-intersection branch before nir_lower_shader_calls().
Fixes: c5d796c902 ("radv/rt: Use function call structure in NIR lowering")
Closes: #15074
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
(cherry picked from commit 5a7f4c62d8)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40752>
Since the stack pointer may wrap around the stack size in overflow
cases, traversal logic calculates the real stack pointer with
nir_umod_imm(b, stack, args->stack_entries * args->stack_stride).
For ray queries, "stack" was initialized to
"stack_base + local_invocation_idx * 4". This was completely broken, as
the umod would later delete the stack base completely and overwrite the
start of LDS, which belongs to the apps' shared memory.
Instead, add the stack base as a constant offset in the load/store_stack
callback. (This should also save 1 VALU per ray query)
Also, delete radv_ray_traversal_args::stack_base since it's unused now.
Cc: mesa-stable
(cherry picked from commit b046eaf36d)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40488>
GFX12 encoding added one bit to the stack offset, doubling the limit on
the stack base offset that is possible to encode. In practice, this
always allows using bvh_stack_push* instructions on GFX12 since LDS is
still 64kB.
Cc: mesa-stable
Fixes: 59a39779 (radv/rt: Only use ds_bvh_stack_rtn if the stack base is possible to encode)
(cherry picked from commit 867d0b33b3)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40488>
It must be lowered.
This fixes
dEQP-VK.spirv_assembly.instruction.compute.compute_shader_derivatives.{mesh,task}.*.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 3c4cb16159)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
If the secondary changes the fragment output state and if the same
PS epilog used before ExecuteCommands() is re-bind immediately after
that call, the PS epilog state wouldn't be re-emitted.
Apply the same change for VS prologs, although the logic is slightly
different and the bug shouldn't occur. The whole logic of secondaries
should be completely rewritten because it's definitely not robust.
This fixes a GPU hang in Where Winds Meet, see
https://github.com/doitsujin/dxvk/issues/5436.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 1a00587c44)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
This is an old driver bug that could cause Z corruption on gfx8-11.5.
v2: handle allow_expclear differently
Cc: mesa-stable
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> (v1)
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> (v2)
(cherry picked from commit 4cfe08e583)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
0886be09 ("glsl: Allow precision mismatch on dead data with GLSL ES 1.00")
allowed precision mismatches on uniforms, however if you lower precision on
16-bit consts, then this error triggers instead.
So here we relax the type matching and just make sure we match int vs
float.
Fixes: 0886be09 ("glsl: Allow precision mismatch on dead data with GLSL ES 1.00")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5337
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 73bc604128)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
We might need to DCE users of dead instructions removed by
process_block().
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: 9e8ba10447 ("aco/vn: remove dead instructions early")
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
(cherry picked from commit 17b18496f6)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
Cache flushes should be skipped on SDMA. In practice,
radv_emit_cache_flush() should only be called on GFX/ACE.
SDMA NOP packets are emitted in barriers directly.
This fixes recent VKCTS coverage
dEQP-VK.api.command_buffers.secondary_on_transfer_queue.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit c4d5090d69)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
This should definitely be an OR operation if MRT0 and MRT1 don't write
the same channels. This also requires to set the writemask manually
because when it's 0 (in case a dual-source output is missing), the
intrinsic computes the mask itself with the number of components.
No fossils-db changes on NAVI33.
Fixes: 45d8cd037a ("ac/nir: rewrite ac_nir_lower_ps epilog to fix dual src blending with mono PS")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14878
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 2eb9420061)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
This is possible since VK_KHR_maintenance10.
This fixes new VKCTS coverage in
dEQP-VK.pipeline.*.multisample.m10_resolve.*.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit ab6147e8ef)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
Since the stride is always 32 dwords, we need to treat the workgroup
size as multiples of that value. Using MAX2() only works for cases where
the workgroup size is less than 32, which was hit by some CTS with 1x1
workgroups.
Cc: mesa-stable
(cherry picked from commit b08f9f192c)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
While reworking image resolves completely in RADV, I found a very weird
bug where the only fix was to emit caches immediately after
decompressing the source resolve image (after FMASK_DECOMPRESS).
I have been struggling this for few hours and figured that it was
something related to context rolls (ie. as long the context was rolled
out, emitting the flushes immediately was required).
It turns out this was a known hardware bug on GFX6 that was implemented
in PAL. Though PAL only applies on GFX6 but GFX7-8 are also affected
based on my testing. Note that RadeonSI flushes CB_META too.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 837078b8d5)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
Might happen with radv_emulate_rt=true.
Fixes the_great_circle/a6079328b8df7712 with polaris10.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: e006f68b11 ("aco/isel: Don't add scratch offset as gfx8- soffset if no offsets exist")
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
(cherry picked from commit 75722da909)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
The NIR_PASS macro only overwrites this when the pass actually makes
progress. If the pass doesn't make progress, the variable stays
uninitialized.
Clang correctly spots this and warns about it.
Cc: mesa-stable
(cherry picked from commit 47e4a68a83)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
The hardware only provides 13 bits for encoding the stack base (in
dwords). That translates to the stack base being required to be below
8192 dwords, or 32kB. It's possible to exceed this - LDS is 64kB after
all. Add an explicit check to make sure we don't end up with offsets
that overflow the hw's address fields. This fixes Metro Exodus Enhanced
Edition, which was using ray queries in a 1024-thread sized workgroup,
resulting in exactly 64kB of LDS being required for the stack.
This check isn't required for RT pipelines as we always use 32 or 64
wide workgroups with no other LDS used, so it's impossible to reach this
stack base limit.
Cc: mesa-stable
(cherry picked from commit 59a397793e)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
This fixes new VKCTS coverage
dEQP-VK.api.copy_and_blit.core.use_after_copy.*.
is_stencil isn't set for RadeonSI because it doesn't do SDMA copies
with Z/S.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 1be4ffdff9)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
Could do better by checking which registers are clobbered/preserved,
but that's unlikely to be useful anyway.
Backport-to: 26.0
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
(cherry picked from commit fc7b5d7eed)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
Only for partial copies because image stores don't decompress on writes
(ie. HTILE isn't updated by image stores).
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 9f5a20abde)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
In glibc 2.43 the strstr function now propagate const to the output, triggering -Wincompatible-pointer-types-discards-qualifiers
under clang/gcc with -Werror.
Fix two of these cases by adding the const qualifier.
cc: mesa-stable
(cherry picked from commit ece5f671b3)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
This could return the graphics DCC pipeline if it was created before,
and crash or potentially hang the GPU.
Found this while working on in-progress VKCTS coverage.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit ad7151f4bf)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
If the rendering state is inherited in the secondary, otherwise nothing
wait for the pending flushes after a decompression pass. One more
argument to stop delaying this.
Fixes
dEQP-VK.renderpasses.dynamic_rendering.partial_secondary_cmd_buff.local_read.*
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39678>
(cherry picked from commit 13c9e529bd)
Avoiding NaNs should have the same effect but it's good practice to not
rely on float OPs for correctness.
Fixes: 95a89f7 ("radv: Report smaller bvh sizes when possible")
Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39640>
(cherry picked from commit 24a1e3d8c2)