This composition comes up a bunch.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38169>
in extensive testing, using no atomics at all is a bit too loose for
basic/common cases like sharing a vertex bufer between contexts, readily
leading to unintended behavior
keeping the atomics internal ensures that they remain in the same ccx,
which avoids impacting performance
it does require a little trickery to avoid an extra atomic in the buffer
decrement case, however
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38176>
This helps us to more accurately count the number of registers that
need to be spilled to keep us below the maximum.
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37188>
If the AR is loaded from a register changing that register in a loop was
resulting in a scheduling failure because the AR load was made dependend
on a later instruction. Fix the dependencies by only using dependencies on
older instruuctions in the same block.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14114
Fixes: d21054b4bc ("r600/sfn: Add pass to split addess and index register loads")
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38056>
This fixes mesh shader performance of RADV for GravityMark by stopping
the lowering of ClipDistance[64][4] indirect access for mesh shader outputs.
The perf improvement is 14% on Navi48.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38155>
Blob generates such norm_mul for glmark2:shadow benchmark on STM32MP257.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38172>
I'm not really sure why Coverity doesn't tag the `delete[]` as a
potential leak since it also happens after ASSERT macros, like it did
with the call to `fclose()`.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37744>
Coverity points out that if the asserts fail, then the file won't be
closed, and therefore wont be deleted. We'd like to avoid littering the
temp directory with useless files.
This uses GTEST's `TEST_F` feature with a custom class to manager the
creation and destruction of the tmpfile.
CID: 1666502
CID: 1666525
CID: 1666579
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37744>
Slight differences due to different optimization order.
Totals from 135 (0.17% of 79839) affected shaders: (Navi48)
Instrs: 287852 -> 287527 (-0.11%); split: -0.15%, +0.03%
CodeSize: 1522972 -> 1521764 (-0.08%); split: -0.12%, +0.04%
Latency: 1806803 -> 1825754 (+1.05%); split: -0.08%, +1.12%
InvThroughput: 242693 -> 244703 (+0.83%); split: -0.02%, +0.84%
VClause: 4092 -> 4084 (-0.20%)
SClause: 7462 -> 7478 (+0.21%)
Copies: 20509 -> 20401 (-0.53%); split: -0.74%, +0.21%
Branches: 6395 -> 6386 (-0.14%)
PreSGPRs: 7334 -> 7337 (+0.04%); split: -0.03%, +0.07%
PreVGPRs: 6375 -> 6382 (+0.11%)
VALU: 151787 -> 151595 (-0.13%); split: -0.15%, +0.02%
SALU: 52967 -> 52910 (-0.11%); split: -0.23%, +0.12%
VMEM: 6704 -> 6696 (-0.12%)
SMEM: 12099 -> 12129 (+0.25%)
Tested on a small collection of 2518 shaders from Dredge with callgrind using RADV:
baseline:
nir_opt_algebraic was called 12917 times from radv_optimize_nir()
nir_opt_cse was called 15204 times from radv_optimize_nir()
relative time spent in radv_optimize_nir(): 31.48%
total instruction fetch cost: 28,642,638,021
with nir/algebraic: ad-hoc constant-fold ALU instructions
nir_opt_algebraic was called 12797 times from radv_optimize_nir()
nir_opt_cse was called 12963 times from radv_optimize_nir()
relative time spent in radv_optimize_nir(): 30.63%
total instruction fetch cost: 28,284,386,123
=> ~1.27% improvement in total compile times
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37195>
This prevents bugged CTS tests from tripping over with the following commits.
dEQP-VK.spirv_assembly.instruction.compute.float_controls.fp32.generated_args.denorm_sstep_denorm_flush_to_zero
dEQP-VK.spirv_assembly.instruction.graphics.float_controls.fp32.generated_args.denorm_sstep_denorm_flush_to_zero_*
These tests exhibit undefined values where the result depends on the ordering
of nir_opt_algebraic and nir_opt_constant_folding.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37195>
In Gfx9+ the destination should be set to ARF null in all those cases, the
use of IP was a requirement of old versions only. The already zeroed
bits will encode ARF null, so no need to set.
Skipping the helper avoids setting unwanted bits (like hstride), which
in Gfx12+ are MBZ.
This patch adjust the expectations of the asm tests to remove the dst
type and dst stride fields -- will expect them all zeroed.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36454>
This is better than using the generic helper since will not set unwanted
bits (e.g. hstride) and it is already handling their case for Gfx12+
anyway.
There's an extra helper now for the case where src1 is not used. In
Gfx9-11 it needs to be set to ARF but with a matching type of src0.
Assembler was updated to follow the same approach.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36454>
Even though some platforms support int64 they don't support indirect
movs with 64-bit values. Effectively this is only supported for non-LP
Gfx9.
This fixes various tests in dEQP-VK.spirv_assembly.instruction.compute.untyped_pointers.*.push_constant.*64*
on BMG.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38125>
Use same approach as the other code checking for this vstride. Argument
could be made we want to reuse the same enum value for both the encoded
and decoded version, but for now follow the existing practice.
This will cause
dEQP-VK.spirv_assembly.instruction.compute.untyped_pointers.vulkan_memory_model.type_punning.load.push_constant.int64_to_uint64
and similar tests to fail validation on BMG. Later patch will fix that.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38125>
If you somehow have MESA_LOADER_DRIVER_OVERRIDE= in your environment,
you certainly weren't trying to force load the driver named "".
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38165>
New host feature allows usage of external_memory_host extension
for external memory support, mainly for software renderers which
don't implement platform specific extensions.
Also moves enablement of queue_family_foreign outside of android,
to allow it also on linux guests (ref: github PR#74), and removes
the check for VK_MVK_moltenvk extension in favor of metal mode.
Test: -gpu lavapipe -feature VulkanNativeSwapchain on windows
Reviewed-by: David Gilhooley <djgilhooley.gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38153>