Since the stack pointer may wrap around the stack size in overflow
cases, traversal logic calculates the real stack pointer with
nir_umod_imm(b, stack, args->stack_entries * args->stack_stride).
For ray queries, "stack" was initialized to
"stack_base + local_invocation_idx * 4". This was completely broken, as
the umod would later delete the stack base completely and overwrite the
start of LDS, which belongs to the apps' shared memory.
Instead, add the stack base as a constant offset in the load/store_stack
callback. (This should also save 1 VALU per ray query)
Also, delete radv_ray_traversal_args::stack_base since it's unused now.
Cc: mesa-stable
(cherry picked from commit b046eaf36d)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40488>
GFX12 encoding added one bit to the stack offset, doubling the limit on
the stack base offset that is possible to encode. In practice, this
always allows using bvh_stack_push* instructions on GFX12 since LDS is
still 64kB.
Cc: mesa-stable
Fixes: 59a39779 (radv/rt: Only use ds_bvh_stack_rtn if the stack base is possible to encode)
(cherry picked from commit 867d0b33b3)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40488>
It must be lowered.
This fixes
dEQP-VK.spirv_assembly.instruction.compute.compute_shader_derivatives.{mesh,task}.*.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 3c4cb16159)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
If the secondary changes the fragment output state and if the same
PS epilog used before ExecuteCommands() is re-bind immediately after
that call, the PS epilog state wouldn't be re-emitted.
Apply the same change for VS prologs, although the logic is slightly
different and the bug shouldn't occur. The whole logic of secondaries
should be completely rewritten because it's definitely not robust.
This fixes a GPU hang in Where Winds Meet, see
https://github.com/doitsujin/dxvk/issues/5436.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 1a00587c44)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
This is an old driver bug that could cause Z corruption on gfx8-11.5.
v2: handle allow_expclear differently
Cc: mesa-stable
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> (v1)
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> (v2)
(cherry picked from commit 4cfe08e583)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
0886be09 ("glsl: Allow precision mismatch on dead data with GLSL ES 1.00")
allowed precision mismatches on uniforms, however if you lower precision on
16-bit consts, then this error triggers instead.
So here we relax the type matching and just make sure we match int vs
float.
Fixes: 0886be09 ("glsl: Allow precision mismatch on dead data with GLSL ES 1.00")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5337
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 73bc604128)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
We might need to DCE users of dead instructions removed by
process_block().
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: 9e8ba10447 ("aco/vn: remove dead instructions early")
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
(cherry picked from commit 17b18496f6)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
Cache flushes should be skipped on SDMA. In practice,
radv_emit_cache_flush() should only be called on GFX/ACE.
SDMA NOP packets are emitted in barriers directly.
This fixes recent VKCTS coverage
dEQP-VK.api.command_buffers.secondary_on_transfer_queue.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit c4d5090d69)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
This should definitely be an OR operation if MRT0 and MRT1 don't write
the same channels. This also requires to set the writemask manually
because when it's 0 (in case a dual-source output is missing), the
intrinsic computes the mask itself with the number of components.
No fossils-db changes on NAVI33.
Fixes: 45d8cd037a ("ac/nir: rewrite ac_nir_lower_ps epilog to fix dual src blending with mono PS")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14878
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 2eb9420061)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
This is possible since VK_KHR_maintenance10.
This fixes new VKCTS coverage in
dEQP-VK.pipeline.*.multisample.m10_resolve.*.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit ab6147e8ef)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
Since the stride is always 32 dwords, we need to treat the workgroup
size as multiples of that value. Using MAX2() only works for cases where
the workgroup size is less than 32, which was hit by some CTS with 1x1
workgroups.
Cc: mesa-stable
(cherry picked from commit b08f9f192c)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
While reworking image resolves completely in RADV, I found a very weird
bug where the only fix was to emit caches immediately after
decompressing the source resolve image (after FMASK_DECOMPRESS).
I have been struggling this for few hours and figured that it was
something related to context rolls (ie. as long the context was rolled
out, emitting the flushes immediately was required).
It turns out this was a known hardware bug on GFX6 that was implemented
in PAL. Though PAL only applies on GFX6 but GFX7-8 are also affected
based on my testing. Note that RadeonSI flushes CB_META too.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 837078b8d5)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
Might happen with radv_emulate_rt=true.
Fixes the_great_circle/a6079328b8df7712 with polaris10.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: e006f68b11 ("aco/isel: Don't add scratch offset as gfx8- soffset if no offsets exist")
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
(cherry picked from commit 75722da909)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
The NIR_PASS macro only overwrites this when the pass actually makes
progress. If the pass doesn't make progress, the variable stays
uninitialized.
Clang correctly spots this and warns about it.
Cc: mesa-stable
(cherry picked from commit 47e4a68a83)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
The hardware only provides 13 bits for encoding the stack base (in
dwords). That translates to the stack base being required to be below
8192 dwords, or 32kB. It's possible to exceed this - LDS is 64kB after
all. Add an explicit check to make sure we don't end up with offsets
that overflow the hw's address fields. This fixes Metro Exodus Enhanced
Edition, which was using ray queries in a 1024-thread sized workgroup,
resulting in exactly 64kB of LDS being required for the stack.
This check isn't required for RT pipelines as we always use 32 or 64
wide workgroups with no other LDS used, so it's impossible to reach this
stack base limit.
Cc: mesa-stable
(cherry picked from commit 59a397793e)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
This fixes new VKCTS coverage
dEQP-VK.api.copy_and_blit.core.use_after_copy.*.
is_stencil isn't set for RadeonSI because it doesn't do SDMA copies
with Z/S.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 1be4ffdff9)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
Could do better by checking which registers are clobbered/preserved,
but that's unlikely to be useful anyway.
Backport-to: 26.0
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
(cherry picked from commit fc7b5d7eed)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
Only for partial copies because image stores don't decompress on writes
(ie. HTILE isn't updated by image stores).
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 9f5a20abde)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
In glibc 2.43 the strstr function now propagate const to the output, triggering -Wincompatible-pointer-types-discards-qualifiers
under clang/gcc with -Werror.
Fix two of these cases by adding the const qualifier.
cc: mesa-stable
(cherry picked from commit ece5f671b3)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
This could return the graphics DCC pipeline if it was created before,
and crash or potentially hang the GPU.
Found this while working on in-progress VKCTS coverage.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit ad7151f4bf)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
If the rendering state is inherited in the secondary, otherwise nothing
wait for the pending flushes after a decompression pass. One more
argument to stop delaying this.
Fixes
dEQP-VK.renderpasses.dynamic_rendering.partial_secondary_cmd_buff.local_read.*
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39678>
(cherry picked from commit 13c9e529bd)
Avoiding NaNs should have the same effect but it's good practice to not
rely on float OPs for correctness.
Fixes: 95a89f7 ("radv: Report smaller bvh sizes when possible")
Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39640>
(cherry picked from commit 24a1e3d8c2)
It was incorrect to mark chit/miss arguments as discardable without
the equivalent in the traversal shader. Also, tail calls with modified
parameters that aren't marked discardable are incorrect.
This could lead to random corruption by clobbering parameter values
across two levels of nested calls: A Raygen shader calls traversal,
expecting e.g. the ray tMax parameter to be preserved. Traversal
overwrites the parameter's register with the hit t and tail-calls chit,
which immediately returns to raygen. Now the raygen shader still has the
clobbered tMax (which is actually the ray hit t) - if it calls traversal
multiple times, the second traversal iteration may use the previous
ray's hit t as tMax instead of the intended value.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39579>
(cherry picked from commit 3275be503c)
There were two issues here:
1. Tail calls where the tail-callee receives modified parameters are
hazardous and only work if the parameter is return or discardable.
Otherwise, the caller of the function that executes the tail-call may
not expect some of the parameters to be clobbered.
2. There was also an indexing confusion with the call instruction vs.
call signature parameters. The call instruction has not been adapted
to the new lowered signatures, where the system args are prepended. To
make things clearer, split the loop into two, one iterating over
parameters in the call signature and one for parameters of the call
instruction.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39579>
(cherry picked from commit 0d7705c206)
The original semantic of discardable parameters was "okay, nothing
actually uses this parameter, feel free to clobber it", but we were
only using it with tail calls from a function without discardable
parameters, which was broken.
Instead, slightly change the use-case and utilize the "discardable"
attribute to mark parameters that the callee will clobber in a tail
call. This makes doing tail calls safe when the tail callee receives a
modified set of parameters.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39579>
(cherry picked from commit ad23e02a28)
Also change to use H265 constant for maxDpbSlots (both values for H264 and H265
are the same).
Fixes: ee535aa039 ("radv: video: rework maxActiveReferenceSlot/MaxDpbSlots")
Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39609>
(cherry picked from commit 7607aeefa6)
This is just wrong if the secondary uses ESO because the emitted
pipelines would be NULL in the secondary, but if the app re-binds
the same pipeline in the primary it would consider it as already
emitted. A sequence like this would break:
CmdBindPipeline(compute)
CmdDispatch()
CmdExecuteCommands() --> with ESO compute
CmdBindPipeline(compute)
CmdDispatch()
This tracking is probably useless anyways because it's unlikely that
apps will rebind the same pipeline right after CmdExecuteCommands() but
let's keep it because this is a bugfix.
Fixes
dEQP-VK.api.command_buffers.pipeline_shader_object_mix_with_secondaries.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39587>
(cherry picked from commit 9ad02b5724)
It might be that the radv_pipeline_cache_lookup_nir_handle() in
radv_ray_tracing_pipeline_cache_search() fails but we will later need the
NIR. If rt_stages[i].shader was non-NULL, then we would not have created
the NIR.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Backport-to: 25.2
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38263>
(cherry picked from commit 89eefdcadb)
Some apps (old FFmpeg, contemporary CTS) send down pMi{Col,Row}Starts in
SB units, not MI units. Instead of dependening on those values which
could be unreliable, derive the tile sizes in SB using other parameters.
Cc: mesa-stable
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39492>
(cherry picked from commit c10ebb0fda)