Using view3dAs2dArray changes the tiling and it's slower (-7.5% in
Silent Hill 2 Remake) than using 3D tiling. The previous implementation
was the best one regarding performance (it's also what RadeonSI does).
Sadly it seems that sampler2DViewOf3D can't really be supported without
that but nobody really needs it apparently.
Also view3dAs2array is incompatible for 2D views of sparse 3D images
because sparse 3D images requires 3D tiling.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11997
This reverts commit f5805bcb8e.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31869>
If blending is disabled or the color write mask is 0, dual-source
blending would be ignored, and this can be simplified a bit.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31681>
When the FS is unknown, this can happen with fast-link GPL or unlinked
ESO, rely on the number of VS/TES outputs which should be a good
approximation of the number of PS inputs.
This fixes a (huge?) performance regression from May 2023 because
for depth-only rendering, the FS is NULL and NGG culling wasn't
considered at all.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31830>
This has been removed few years ago by mistake but it's important for
performance. This is mostly for addrlib to determine tile_swizzle which
is used to make memory access faster with multiple render targets.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31797>
We cannot set the {X,Y}MAX_RIGHT_EXCLUSION bits
if we have a sample location at a pixel boundary.
CTS does not seem to be catching this.
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Co-authored-by: Vitaliy Triang3l Kuzmin <triang3l@yandex.ru>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31839>
RADV needs to adjust this register for user sample locations because
it seems possible to have a sample on the -8 coordinate.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31815>
It's possible to enable NGG culling with ESO if shaders are linked, or
if the VS doesn't need a prolog or if TES is used. This wasn't
supposed to be enabled but I think it worked just by luck because the
user SGPR value was probably zero and NGGC was disabled at draw time.
Found by inspection.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31829>
This is useful in order to correlate shader hashes between RGP and
Fossilize. This is because Fossilize needs to pass the capture
statistics flag for getting shader hashes and the pipeline key won't
match otherwise.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31820>
I don't think this matters for how we use CounterMap::operator==.
The BITSET_TEST() was unnecessary because of the BITSET_EQUAL above.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30478>
Setting the surface index is optimal for performance in case of
multiple MRTs because addrlib rotates tiles differently.
But this should be disabled when the image is used for descriptor
buffers capture&replay because the descriptor isn't immutable
(ie. tile_swizzle can be different).
This fixes dEQP-VK.binding_model.descriptor_buffer.capture_replay.*
on some GPUs where tile_swizzle is non-zero.
Fixes: 3b57a35ece ("radv: Enable descriptorBufferCaptureReplay.")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31818>
When only of the depth/stencil aspects is used, RADV dispatches a
compute shader to initialize the HTILE buffer. But dispatching on SDMA
just hangs and the only way to initialize the HTILE buffer is to clear
both aspects using a memory fill operation.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31803>
insert_exec_mask will no longer use s_and_saveexec if there was a previous copy
from a sgpr to exec, so this code path is no longer taken.
No Foz-DB changes.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31567>
This allows to re-use previous temporaries in case exec was restored
from a Temp, rather than having to create a new copy of exec.
Foz-DB Navi31:
Totals from 545 (0.69% of 79395) affected shaders:
Instrs: 216563 -> 215698 (-0.40%)
CodeSize: 1183536 -> 1180076 (-0.29%)
Latency: 1135269 -> 1135294 (+0.00%); split: -0.00%, +0.00%
Copies: 11933 -> 11072 (-7.22%)
SALU: 18990 -> 18129 (-4.53%)
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31567>
Exec always contains the same value as the top of stack, even if the
top of stack is a temporary/constant. So if the predecessors have different
top of stack operands, don't insert a phi and use exec as the new top of stack.
No Foz-DB changes.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31567>
insert_exec will start using this in the future, handle it the same
just without the path to save exec before the v_cmpx instruction.
No Foz-DB changes.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31567>
In rare cases, it could happen that during post-RA validation,
live-var-analysis sets needs_vcc = false after if was true
before register allocation.
Fixes: bb5eace0dc ('aco/live_var_analysis: check for isPrecolored flag rather than isFixed')
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31791>
We used to only store Temps in the stack, so undef meant exec.
Then the stack was changed to operands, and some places started storing exec
directly, drop the undef handling by replacing everything with Operand(exec, lm)
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31560>