This optimization used to optimize the allocated space for descriptors
when immutable samplers are equal. Though, this was basically broken :
- descriptor copies were broken for combiner image sampler (or sampler)
with equal immutable samplers because 96 bytes were copied instead of
64 bytes (cf. the linked ticket). This could be fixed but it's not
worth it.
- the value returned by vkGetDescriptorLayoutSupport() was broken, it
should have been 96 with no immutable samplers (or when they aren't
equal)
This optimization was also not applied for descriptor buffers which is
the default for vkd3d-proton and Zink. DXVK doesn't use db but it
doesn't use immutable samplers, so basically only native vulkan games
would be concerned.
Note that immutable samplers would still be inlined in shaders if no
indirect access which should be 99.9% of the usecase.
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11165
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34928>
(cherry picked from commit 69ff204422)
In a scenario where the viewports/scissors are a dynamic state but the
count is static (ie. updated when a graphics pipeline is bound), the
driver wasn't considering that and it was re-emitting the previous
number of viewports/scissors.
This fixes rendering issue with Blender.
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13127
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34921>
(cherry picked from commit 9a07ccbc89)
The hardware requires a power of two bpe. To do that, the driver
needs to adjust the pitch/offset/extent based on a texel scale factor
which only applies to 96-bits formats.
This fixes new VKCTS coverage.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34927>
(cherry picked from commit 4b73d7e817)
For coherent/volatile access, this would be too high for vector access.
Even when we didn't set the alignment, LLVM seemed to assume too high of
an alignment for 8/16-bit vector access.
Fixes generated_tests/cl/vload/vload-char-constant.cl
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Michel Dänzer <mdaenzer@redhat.com>
Backport-to: 25.0
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34903>
(cherry picked from commit d0a09b6ff7)
This assumes that the start of the load is 32-bit aligned.
For example, a vec3 16-bit store with align_offset=2 should split off the
first component, not the last.
This probably also fixed splitting with 8-bit stores.
Fixes arb_copy_buffer-overlap
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Michel Dänzer <mdaenzer@redhat.com>
Backport-to: 25.0
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34903>
(cherry picked from commit c1ecad2b11)
CmdTraceRays is neither a dispatch or a draw command which means it
shouldn't be affected by conditional rendering.
Fixes recent VKCTS coverage.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34868>
(cherry picked from commit 4b76d04f7f)
VK_ERROR_INITIALIZATION_FAILED will fail physical device enumeration.
Returning VK_ERROR_INCOMPATIBLE_DRIVER means that the driver can still
be used on supported GPUs when multiple GPUs are installed.
cc: mesa-stable
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34783>
(cherry picked from commit 84b9c281fe)
Previously, this function tried to swap the instruction which is not
v_mov_b32, so that it doesn't introduce any new OPY-only instructions. If
both were v_mov_b32, it swapped Y. Since this makes Y opy-only, this can't
be done if X is also opy-only.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: 408fa33c09 ("aco/gfx12: don't use second VALU for VOPD's OPX if there is a WaR")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13101
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34841>
(cherry picked from commit 9ca71b52aa)
Emitting compute dispatches on SDMA just hangs. It might be needed
to switch to gang submit for these to work but fixing the GPU hang is
more important for now.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34805>
(cherry picked from commit 0684dc5fa8)
This is now basically the same as the original VALUMaskWriteHazard, except
it now considers both VALU and SALU writes.
Now that it's a part of VALUMaskWriteHazard, differences from the original
VALU lanemask workaround are:
- it includes SALU reads after the write
- it includes VALU writes and SALU/VALU reads after the write which are
not lanemasks
- it combines s_waitcnt_depctr instructions when it's a read after both a
SALU write and a VALU write
- non-exec VALU SGPR reads reset the SGPRs read by VALU as a lanemask
- exec SGPRs are ignored
resolve_all_gfx11() is also finished.
fossil-db (navi31):
Totals from 21538 (27.13% of 79377) affected shaders:
Instrs: 27628855 -> 27552972 (-0.27%); split: -0.30%, +0.03%
CodeSize: 145968448 -> 145667616 (-0.21%); split: -0.23%, +0.02%
Latency: 209537805 -> 209509519 (-0.01%); split: -0.02%, +0.00%
InvThroughput: 36304270 -> 36301624 (-0.01%); split: -0.01%, +0.00%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12623
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11480
Backport-to: 25.0
Backport-to: 25.1
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34529>
(cherry picked from commit ce2be5ab8e)
Similar to previous generations that support compression, except that
the driver don't need to configure a meta VA because DCC is completely
transparent to the userspace.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34517>
On GFX12, DCC is transparent to the driver and there is no meta VA.
Adding a new flag to know if the SDMA surface is compressed is needed.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34517>
This wasn't technically incorrect because V_028C70_BU_NUM_xxx values
are similar to V_028C70_NUMBER_xxx but it's better to use the corect
helper.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34517>
It has no effect because num_entries is 1K, but the table shows a lot of
potential.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34432>
It's not needed since:
8b3056343f - ac/gpu_info: bump required DRM minor version to 3.42.0 (kernel 5.15+)
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34432>
CP is very slow on GFX12 and parsing the packet header is the main
bottleneck. Using paired context regs reduce the number of packet
headers and it should be more optimal.
It doesn't seem worth when only one context reg is emitted (one packet
header and same number of DWORDS) or when consecutive context regs are
emitted (would increase the number of DWORDS).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34421>