textureGrad() has to be split into two halves on Mali: Computing the
gradient/LOD and doing the actual texture operation. On Valhal, we do
this with LOD_MODE_GRDESC but on Bifrost, we use LOD_MOD_EXPLICIT. When
converting to NIR, I missed this.
Fixes: 05a066c921 ("pan/nir: Add bifrost support to pan_nir_lower_tex()")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41513>
Trailing zeroes should be harmless, but it seems to cause issues with
latest ffmpeg (which looks like an ffmpeg bug).
The extra bytes are useless, so we can just skip them like we already
do on VCN to workaround it.
Cc: mesa-stable
Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41485>
Because SDMA doesn't support MSAA, it's possible to get there because
RADV fallback to compute queue in this case.
Some tests only pass because RDNA2 and older don't support image
stores with depth/stencil and MSAA.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41492>
allows deleting piles of moves & pressure.
simd16 results:
Totals:
Instrs: 2759547 -> 2753358 (-0.22%); split: -0.29%, +0.06%
CodeSize: 41141280 -> 41071072 (-0.17%); split: -0.23%, +0.06%
Totals from 332 (12.54% of 2647) affected shaders:
Instrs: 648080 -> 641891 (-0.95%); split: -1.23%, +0.28%
CodeSize: 9782272 -> 9712064 (-0.72%); split: -0.97%, +0.25%
simd32 is a loss because of RA being stupid. again, this is obviously the right
thing to do so we're doing it. stats are just a hint.
Totals:
Instrs: 4683556 -> 4689193 (+0.12%); split: -0.25%, +0.37%
CodeSize: 70072256 -> 70171920 (+0.14%); split: -0.23%, +0.38%
Number of spill instructions: 50320 -> 50316 (-0.01%)
Number of fill instructions: 51530 -> 51526 (-0.01%)
Totals from 351 (13.26% of 2647) affected shaders:
Instrs: 1349954 -> 1355591 (+0.42%); split: -0.86%, +1.28%
CodeSize: 20484224 -> 20583888 (+0.49%); split: -0.80%, +1.29%
Number of spill instructions: 21762 -> 21758 (-0.02%)
Number of fill instructions: 26328 -> 26324 (-0.02%)
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41510>
this is both a correctness fix (insufficient MEM registers reserved in some
cases) and a performance fix (unnecessary allocations & zeroing in the RA when
we don't spill).
fixes dEQP-VK.dgc.ext.compute.misc.scratch_space
stats are noise but positive i guess.
Totals from 35 (1.32% of 2647) affected shaders:
Instrs: 396770 -> 396690 (-0.02%)
CodeSize: 6040832 -> 6039600 (-0.02%)
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41510>
poking around, it seems branches stall the pipelines so we don't need to do any
dataflow analysis, but we do need to fall through for correctness. just keep
going across block boundaries. this isn't optimal yet but it reduces a
pile of A@1's already.
Totals from 1389 (52.47% of 2647) affected shaders:
CodeSize: 56385376 -> 56325776 (-0.11%); split: -0.13%, +0.03%
--
this also fixes issues where the first instruction of a block is a SEND that has
an unmet register dependency, since the old code was fundamentally broken. oops.
lol. fixes
dEQP-VK.compute.pipeline.workgroup_memory_explicit_layout.zero.uint8_t_array_to_uint_array_1
among many others.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41510>
Lets us use more accumulators, I think this is well motivated. Saw this in a
test shader.
Totals from 242 (9.14% of 2647) affected shaders:
Instrs: 1365060 -> 1365035 (-0.00%); split: -0.00%, +0.00%
CodeSize: 20678592 -> 20680096 (+0.01%); split: -0.01%, +0.02%
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41510>
Uses an approach based on HoneyKrisp. In the vertex shader, an
extra output writes 1 if the cull distance is >= 0, otherwise it
writes 0. In the fragment shader, if the extra outputs from the
vertex shader interpolate zero, all cull distances are < 0, so
the primitive is culled by discarding fragments.
Reviewed-by: Arcady Goldmints-Orlov <arcady@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41463>
This fixes CTS flakes in several tests, for example:
dEQP-VK.synchronization.signal_order.shared_binary_semaphore.write_copy_buffer_read_ssbo_compute_indirect.buffer_262144_opaque_fd
Fixes: 1c77a6f049 ("nvk: Don't emit MME FIFO config on Blackwell+")
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41010>
We need to remove ddx/ddy before doing the cube lowering,
otherwise we insert instructions that break dominance.
Affects Sable.
Fixes: 7d552d71e9 ("ac/nir: optimize txd(coord, ddx/ddy(coord))")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41489>
When a VRS view is used with a depth/stencil view, the driver is
expected to copy the VRS rates to the HTILE buffer of the depth/stencil
view. Though if the image uses mipmaps and the base level can't support
HTILE there is no way to copy the rates. The workaround is to force VRS
to be 1x1 which is valid in Vulkan.
This fixes old VKCTS failures on RAPHAEL just because it supports
fragmentShadingRateWithShaderDepthStencilWrites compared to other GPUs
in CI (NAVI21/VANGOGH).
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41427>
It's required with VK_KHR_maintenance11. This allows way more transfer
queue related CTS tests to run and all issues I found should already
be fixed.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41316>
This changes timestamps so they are written with their available part
directly.
This allows to save a bit of memory and just write timestamp with only
one operation.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41507>
That remove the need of a special case for timestamp and will allow some
simplification for MME copies.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41507>
We only need some counters when the user request to track certain
queries, let's not keep them enabled all the time.
Following the proprietary driver here, unsure if that will have any
impact.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41507>
Like the proprietary driver, let's clear counters at the beginning of a
query.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41507>
We are going to conditionally change SET_ZPASS_PIXEL_COUNT and
SET_STATISTICS_COUNTER in queries handling.
To prepare for that we now use the MME shadow RAM to restore the
previous state.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41507>
This avoid relying on semaphore releases when we have multiple queries
being reset.
NVIDIA proprietary driver is performing the same however it do fill the
full report instead (as it also contain the available part packed with
it)
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41507>
Instead of trying to wait for each available values to be zeroed, we now
use NV906F_SET_REFERENCE.
NV906F_SET_REFERENCE behaves like a FE WFI+MEMBAR on the command
processor itself, ensuring ordering.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41507>