The llvm::orc::ThreadSafeContext object wraps an llvm::LLVMContext and
holds a reference to it.
As the LLVMContext can no longer be extracted from a ThreadSafeContext
in LLVM 21, do not let ThreadSafeContext create the LLVMContext
implicitly for LLVM 21. Instead, create the LLVMContext explicitly and
remember it.
This also eliminates the code that created an LLVMContext that was never
disposed.
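The ownership pattern described here can be sketched without LLVM. In
the sketch below, Context, ThreadSafeWrapper, JitState and
make_jit_state are hypothetical stand-ins, not the LLVM or gallivm
names; the only point is that the context pointer is remembered before
ownership moves into a wrapper that has no accessor to hand it back.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for a context type; not the LLVM class.
struct Context {
    int id;
};

// Hypothetical stand-in for a wrapper that owns a context but offers
// no accessor to get it back out.
class ThreadSafeWrapper {
public:
    explicit ThreadSafeWrapper(std::unique_ptr<Context> ctx)
        : ctx_(std::move(ctx)) {}

private:
    std::unique_ptr<Context> ctx_;
};

// Holds the wrapper plus a non-owning pointer to the wrapped context.
struct JitState {
    Context *context;           // remembered before ownership moved
    ThreadSafeWrapper wrapper;  // owns and eventually destroys the context
};

inline JitState make_jit_state(int id) {
    auto ctx = std::make_unique<Context>(Context{id});
    Context *raw = ctx.get();  // remember the context first
    return JitState{raw, ThreadSafeWrapper(std::move(ctx))};
}
```

Because the wrapper is the sole owner, the context is destroyed exactly
once, when the wrapper goes away, and the remembered pointer must not be
used after that.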
Fixes: cd129dbf8a ("gallivm: support LLVM 21")
Signed-off-by: Icenowy Zheng <uwu@icenowy.me>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37684>
This composition comes up a bunch.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38169>
in extensive testing, using no atomics at all is a bit too loose for
basic/common cases like sharing a vertex buffer between contexts, readily
leading to unintended behavior
keeping the atomics internal ensures that they remain in the same ccx,
which avoids impacting performance
it does require a little trickery to avoid an extra atomic in the buffer
decrement case, however
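The message does not spell out the trickery, so the following is only a
sketch of one common way to keep a decrement down to a single atomic
operation, not the code from this change. Resource, resource_ref and
resource_unref are hypothetical names.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical reference-counted object; names are illustrative only.
struct Resource {
    std::atomic<int> refcount{1};
    bool destroyed = false;
};

inline void resource_ref(Resource &res) {
    // Taking a reference needs no ordering guarantees of its own.
    res.refcount.fetch_add(1, std::memory_order_relaxed);
}

// Returns true when the caller dropped the last reference.
inline bool resource_unref(Resource &res) {
    // fetch_sub returns the previous value, so one atomic operation both
    // decrements the count and reports whether this was the last
    // reference. No separate atomic load is needed.
    if (res.refcount.fetch_sub(1, std::memory_order_acq_rel) == 1) {
        res.destroyed = true;  // stand-in for the real teardown
        return true;
    }
    return false;
}
```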
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38176>
This helps us to more accurately count the number of registers that
need to be spilled to keep us below the maximum.
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37188>
If the AR is loaded from a register, changing that register in a loop
resulted in a scheduling failure because the AR load was made dependent
on a later instruction. Fix this by only using dependencies on older
instructions in the same block.
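The rule can be illustrated with a small sketch. This is not the
r600/sfn code: Instr and find_dependency are hypothetical, and the
instruction list is assumed to be in program order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified instruction record.
struct Instr {
    int block;     // id of the block the instruction belongs to
    int dest_reg;  // register written, or -1 if none
};

// Returns the index of the most recent earlier instruction in the same
// block as `user` that writes `reg`, or -1 if there is none. Later
// instructions and instructions in other blocks are never considered.
inline int find_dependency(const std::vector<Instr> &instrs,
                           std::size_t user, int reg) {
    for (std::size_t i = user; i-- > 0;) {
        if (instrs[i].block != instrs[user].block)
            continue;
        if (instrs[i].dest_reg == reg)
            return static_cast<int>(i);
    }
    return -1;
}
```

With a loop body that reads a register at its start and rewrites it at
its end, the search never reaches the later write, so the load does not
pick up a dependency on it.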
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14114
Fixes: d21054b4bc ("r600/sfn: Add pass to split addess and index register loads")
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38056>
This fixes mesh shader performance of RADV for GravityMark by stopping
the lowering of ClipDistance[64][4] indirect access for mesh shader outputs.
The perf improvement is 14% on Navi48.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38155>
The blob generates such a norm_mul for the glmark2:shadow benchmark on STM32MP257.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38172>
I'm not really sure why Coverity doesn't flag the `delete[]` as a
potential leak, as it did with the call to `fclose()`, since it also
happens after the ASSERT macros.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37744>
Coverity points out that if the asserts fail, then the file won't be
closed, and therefore won't be deleted. We'd like to avoid littering the
temp directory with useless files.
This uses GoogleTest's `TEST_F` feature with a custom class to manage the
creation and destruction of the tmpfile.
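The cleanup idea can be sketched without GoogleTest. TempFile and
file_exists below are hypothetical helpers, not the class added by this
change; a `TEST_F` fixture would do the same work in its constructor and
destructor (or in SetUp and TearDown), so the file is removed even when
an assertion ends the test body early.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical RAII guard standing in for the test fixture.
class TempFile {
public:
    explicit TempFile(const std::string &path)
        : path_(path), file_(std::fopen(path.c_str(), "w+")) {}

    // Runs on every exit path from the owning scope.
    ~TempFile() {
        if (file_)
            std::fclose(file_);
        std::remove(path_.c_str());
    }

    TempFile(const TempFile &) = delete;
    TempFile &operator=(const TempFile &) = delete;

    std::FILE *get() const { return file_; }

private:
    std::string path_;
    std::FILE *file_;
};

// Small helper: true if the file can be opened for reading.
inline bool file_exists(const std::string &path) {
    if (std::FILE *f = std::fopen(path.c_str(), "r")) {
        std::fclose(f);
        return true;
    }
    return false;
}
```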
CID: 1666502
CID: 1666525
CID: 1666579
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37744>
Slight differences due to different optimization order.
Totals from 135 (0.17% of 79839) affected shaders: (Navi48)
Instrs: 287852 -> 287527 (-0.11%); split: -0.15%, +0.03%
CodeSize: 1522972 -> 1521764 (-0.08%); split: -0.12%, +0.04%
Latency: 1806803 -> 1825754 (+1.05%); split: -0.08%, +1.12%
InvThroughput: 242693 -> 244703 (+0.83%); split: -0.02%, +0.84%
VClause: 4092 -> 4084 (-0.20%)
SClause: 7462 -> 7478 (+0.21%)
Copies: 20509 -> 20401 (-0.53%); split: -0.74%, +0.21%
Branches: 6395 -> 6386 (-0.14%)
PreSGPRs: 7334 -> 7337 (+0.04%); split: -0.03%, +0.07%
PreVGPRs: 6375 -> 6382 (+0.11%)
VALU: 151787 -> 151595 (-0.13%); split: -0.15%, +0.02%
SALU: 52967 -> 52910 (-0.11%); split: -0.23%, +0.12%
VMEM: 6704 -> 6696 (-0.12%)
SMEM: 12099 -> 12129 (+0.25%)
Tested on a small collection of 2518 shaders from Dredge with callgrind using RADV:
baseline:
nir_opt_algebraic was called 12917 times from radv_optimize_nir()
nir_opt_cse was called 15204 times from radv_optimize_nir()
relative time spent in radv_optimize_nir(): 31.48%
total instruction fetch cost: 28,642,638,021
with nir/algebraic: ad-hoc constant-fold ALU instructions
nir_opt_algebraic was called 12797 times from radv_optimize_nir()
nir_opt_cse was called 12963 times from radv_optimize_nir()
relative time spent in radv_optimize_nir(): 30.63%
total instruction fetch cost: 28,284,386,123
=> ~1.27% improvement in total compile times
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37195>
This prevents the following buggy CTS tests from failing with the subsequent commits:
dEQP-VK.spirv_assembly.instruction.compute.float_controls.fp32.generated_args.denorm_sstep_denorm_flush_to_zero
dEQP-VK.spirv_assembly.instruction.graphics.float_controls.fp32.generated_args.denorm_sstep_denorm_flush_to_zero_*
These tests exhibit undefined values where the result depends on the ordering
of nir_opt_algebraic and nir_opt_constant_folding.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37195>