Emma Anholt
ed8676dc28
nir: Rename the unit_test_*_amd intrinsics to be un-vendored.
...
We'll reuse these from the nir_opt_algebraic_pattern_test.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39076 >
2026-01-15 19:09:37 +00:00
Natalie Vock
cc81c7de23
nir,aco: Clean up useless lowering of sbt_base_amd
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29580 >
2026-01-14 14:19:07 +00:00
Natalie Vock
0a1911b220
radv,aco: Use function call structure for RT programs
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29580 >
2026-01-14 14:19:07 +00:00
Natalie Vock
06c2e90e35
aco: Note if a parameter needs to be explicitly preserved
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29580 >
2026-01-14 14:19:05 +00:00
Rhys Perry
7a09e4a740
aco: use correct addition opcodes in gfx6-8 RT prolog
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: 60dd9d797e ("aco: Swizzle ray launch IDs in the RT prolog")
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39232 >
2026-01-14 11:23:42 +00:00
Rhys Perry
da728d5a1a
aco: micro-optimize ray launch ID swizzling
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39232 >
2026-01-14 11:23:42 +00:00
Natalie Vock
0d93e8ce54
aco: Don't insert p_reload_preserved in loops
...
This can't work.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39157 >
2026-01-12 21:46:50 +00:00
Konstantin Seurer
39d58a55a7
aco: Add support to f2f16 with rtpi/rtni
...
Those rounding modes are needed when computing 16-bit bounding boxes
since the bounding box must not get smaller.
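A minimal sketch of why directed rounding keeps a 16-bit box conservative, assuming finite inputs inside the binary16 range. It emulates rtpi/rtni on top of Python's round-to-nearest-even `struct` "e" format by stepping one binary16 ULP when the nearest value lands on the wrong side. This is not ACO's implementation; every helper name here is hypothetical.

```python
import struct

# Conservative float -> binary16 rounding, modelling f2f16 with
# round-toward-+inf (rtpi) and round-toward--inf (rtni).
# Assumption: inputs are finite and within the binary16 range
# (struct raises OverflowError otherwise).

def _to_f16_bits(x: float) -> int:
    """Round-to-nearest-even conversion, returned as raw binary16 bits."""
    return struct.unpack("<H", struct.pack("<e", x))[0]

def _from_f16_bits(bits: int) -> float:
    return struct.unpack("<e", struct.pack("<H", bits))[0]

def _next_up(bits: int) -> int:
    """Bits of the next binary16 value toward +inf."""
    if bits == 0x8000:          # -0.0 -> smallest positive subnormal
        return 0x0001
    return bits - 1 if bits & 0x8000 else bits + 1

def _next_down(bits: int) -> int:
    """Bits of the next binary16 value toward -inf."""
    if bits == 0x0000:          # +0.0 -> smallest negative subnormal
        return 0x8001
    return bits + 1 if bits & 0x8000 else bits - 1

def f2f16_rtpi(x: float) -> float:
    bits = _to_f16_bits(x)
    if _from_f16_bits(bits) < x:   # nearest value fell below x: step up once
        bits = _next_up(bits)
    return _from_f16_bits(bits)

def f2f16_rtni(x: float) -> float:
    bits = _to_f16_bits(x)
    if _from_f16_bits(bits) > x:   # nearest value landed above x: step down once
        bits = _next_down(bits)
    return _from_f16_bits(bits)

def conservative_box_f16(lo, hi):
    """Round the min corner down and the max corner up so the box can only grow."""
    return [f2f16_rtni(v) for v in lo], [f2f16_rtpi(v) for v in hi]
```

For example, 0.1 rounds to nearest as 0.0999755859375, which would shrink a max bound; rtpi instead yields 0.10003662109375.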
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37883 >
2026-01-10 11:34:12 +01:00
Natalie Vock
60dd9d797e
aco: Swizzle ray launch IDs in the RT prolog
...
This converts from 1D workgroups to 2D ray launch IDs entirely via
shader ALU, including handling partial/cut-off workgroups optimally.
Doing this entirely in-shader means it Just Works(TM) with indirect
dispatches as well. Previous approaches manipulating various things on
CPU depending on the dispatch size couldn't handle indirect dispatches.
The swizzle implemented here also arranges invocations in a recursive
Z-order pattern, which should perform slightly better than arranging
them linearly within the wave.
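A rough sketch of a 1D-to-2D swizzle with a Z-order (Morton) layout, assuming 64-invocation workgroups mapped onto 8x8 tiles laid out row by row. The tile size, the tiling scheme and all function names are assumptions for illustration; the actual RT prolog, including its handling of partial workgroups, is not reproduced here.

```python
# Hypothetical sketch: map a 1D workgroup ID plus a linear invocation index
# to a 2D ray launch ID, placing invocations in a recursive Z-order (Morton)
# pattern inside an 8x8 tile. Tile size and tiling scheme are assumptions.

TILE_W = TILE_H = 8  # assumption: 64 invocations per workgroup

def _compact_even_bits(v: int) -> int:
    """Gather bits 0, 2, 4, ... of v into consecutive low bits."""
    out, bit = 0, 0
    while v:
        out |= (v & 1) << bit
        v >>= 2
        bit += 1
    return out

def morton_decode(index: int) -> tuple[int, int]:
    """Even bits of the index give x, odd bits give y."""
    return _compact_even_bits(index), _compact_even_bits(index >> 1)

def launch_id(workgroup_id: int, local_index: int, launch_width: int) -> tuple[int, int]:
    tiles_per_row = -(-launch_width // TILE_W)  # ceil division
    tile_x, tile_y = workgroup_id % tiles_per_row, workgroup_id // tiles_per_row
    x, y = morton_decode(local_index)
    return tile_x * TILE_W + x, tile_y * TILE_H + y

def is_active(launch: tuple[int, int], launch_size: tuple[int, int]) -> bool:
    """Invocations past the launch size (partial tiles) must not trace rays."""
    return launch[0] < launch_size[0] and launch[1] < launch_size[1]
```

Because everything is derived from the workgroup ID and the launch size at run time, the same arithmetic works whether the dispatch size comes from the CPU or from an indirect buffer.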
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39142 >
2026-01-08 19:49:55 +01:00
Natalie Vock
1f6ac3fa93
radv/rt,aco: Always dispatch 1D workgroups for RT
...
We will swizzle the workgroups ourselves in the next commit.
Removes the need for 1D dispatch workarounds.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39142 >
2026-01-08 19:49:54 +01:00
Georg Lehmann
eb4737a1dd
nir: add nir_alu_instr_is_exact helper
...
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39103 >
2026-01-07 09:40:57 +00:00
Daniel Schürmann
1e8d367537
amd: add and use ac_cu_info::has_vtx_format_alpha_adjust_bug
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38701 >
2025-12-22 07:34:48 +00:00
Daniel Schürmann
addd4ea59f
aco: pass aco_compiler_options to init_program()
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38701 >
2025-12-22 07:34:46 +00:00
Alyssa Rosenzweig
079e9ae606
treewide: use BITSET_*_COUNT
...
Mix of Coccinelle patch, manual fixups, sed, etc. Probably best to review the diff
as if hand-written:
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38955 >
2025-12-16 17:42:10 +00:00
Timur Kristóf
f001515c87
aco: Use only VGPR offset on buffer atomics on GFX6-7
...
SGPR offset is not included in the bounds check
according to the ISA documentation of GFX6-7 and
indeed it can trigger VM faults on OOB access.
Note that ACO already doesn't use the SGPR offset
on GFX6-7 for buffer loads and stores. This commit
just does the same for buffer atomics.
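A simplified, hypothetical model of the problem described above: only the VGPR offset and the instruction offset take part in the range check, while the SGPR offset is added to the address afterwards. It treats `num_records` as a byte size and ignores stride, swizzling and per-dword checking, so it illustrates the commit's reasoning rather than the exact GFX6-7 ISA rules.

```python
from dataclasses import dataclass

# Simplified, hypothetical model of the GFX6-7 behaviour described above:
# the bounds check covers the VGPR offset and instruction offset, but not
# the SGPR offset (soffset). Not the exact ISA semantics.

@dataclass
class BufferDesc:
    base: int
    num_records: int  # treated as a size in bytes for this sketch

def gfx6_7_model(desc: BufferDesc, voffset: int, soffset: int,
                 inst_offset: int = 0, size: int = 4) -> tuple[bool, int]:
    checked = voffset + inst_offset              # soffset is not checked
    in_bounds = checked + size <= desc.num_records
    address = desc.base + soffset + voffset + inst_offset
    return in_bounds, address
```

With a 16-byte buffer, an offset of 64 placed in `soffset` passes the check yet touches memory past the buffer; the same offset in `voffset` is correctly rejected.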
This commit mitigates a ton of VM faults that are exposed by:
24e75fea4b
Fossil DB stats on Hawaii (GFX7):
Totals from 148 (0.24% of 61818) affected shaders:
Instrs: 324004 -> 327352 (+1.03%)
CodeSize: 1556468 -> 1514100 (-2.72%); split: -2.74%, +0.02%
Latency: 1271480 -> 1276894 (+0.43%)
InvThroughput: 396850 -> 397740 (+0.22%)
VClause: 6861 -> 6858 (-0.04%)
Copies: 34083 -> 37430 (+9.82%)
PreVGPRs: 5705 -> 5706 (+0.02%)
VALU: 147529 -> 150898 (+2.28%)
SALU: 98194 -> 98172 (-0.02%)
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38958 >
2025-12-15 21:03:19 +00:00
Georg Lehmann
a2b70ce4ec
aco/isel: remove uniform reduce/scan optimization
...
This is now done in NIR, with the exception of exclusive min/max/and/or scans.
But those are not really useful, and if we ever come across them we can
optimize them in NIR using write_invocation_amd.
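A small sketch of the uniform reduce/scan folding, assuming the value `x` is identical in every active lane and the exec mask is modelled as a Python integer. Add-reductions and add-scans collapse to `x` times a popcount of (part of) the exec mask. The exclusive min case, which the commit leaves unhandled, only differs in the first active lane; the `uniform_exclusive_umin` function models the resulting values, not the `write_invocation_amd` intrinsic itself. All names are hypothetical.

```python
# Hypothetical sketch of uniform reduce/scan folding: if every active lane
# holds the same value x, add-reductions and add-scans reduce to a multiply by
# a popcount of the exec mask, and min/max/and/or just return x.

def popcount(v: int) -> int:
    return bin(v).count("1")

def uniform_reduce_add(x: int, exec_mask: int) -> int:
    return x * popcount(exec_mask)

def uniform_inclusive_add(x: int, exec_mask: int, lane: int) -> int:
    return x * popcount(exec_mask & ((2 << lane) - 1))   # lanes 0..lane

def uniform_exclusive_add(x: int, exec_mask: int, lane: int) -> int:
    return x * popcount(exec_mask & ((1 << lane) - 1))   # lanes 0..lane-1

def uniform_exclusive_umin(x: int, exec_mask: int, lane: int,
                           identity: int = 0xFFFFFFFF) -> int:
    # The first active lane sees the identity, all later lanes see x.
    first_active = (exec_mask & -exec_mask).bit_length() - 1
    return identity if lane == first_active else x

def reference_scan(values, exec_mask, op, identity, inclusive):
    """Brute-force per-lane scan over the active lanes, for comparison."""
    out, acc = {}, identity
    for lane, v in enumerate(values):
        if not (exec_mask >> lane) & 1:
            continue
        if inclusive:
            acc = op(acc, v)
            out[lane] = acc
        else:
            out[lane] = acc
            acc = op(acc, v)
    return out
```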
No Foz-DB changes on Navi21.
Acked-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38902 >
2025-12-15 12:22:32 +00:00
Georg Lehmann
072815e5cb
aco/gfx6: move mrtz writemask workaround to assembler and handle all mrt
...
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38853 >
2025-12-12 17:00:51 +00:00
Georg Lehmann
ef246aaf72
aco/isel: emit register copies for workgroup ids
...
This way, we don't overestimate SGPR pressure.
Foz-DB Navi48:
Totals from 1413 (1.45% of 97637) affected shaders:
Instrs: 3468375 -> 3468585 (+0.01%); split: -0.01%, +0.02%
CodeSize: 18643064 -> 18643520 (+0.00%); split: -0.01%, +0.01%
VGPRs: 71776 -> 71788 (+0.02%)
SpillSGPRs: 18575 -> 18561 (-0.08%)
Latency: 23207643 -> 23207998 (+0.00%); split: -0.00%, +0.01%
InvThroughput: 8116806 -> 8116541 (-0.00%); split: -0.01%, +0.00%
VClause: 75250 -> 75252 (+0.00%); split: -0.00%, +0.00%
SClause: 65274 -> 65283 (+0.01%); split: -0.02%, +0.04%
Copies: 275750 -> 275942 (+0.07%); split: -0.03%, +0.10%
PreSGPRs: 70246 -> 69072 (-1.67%)
VALU: 1892111 -> 1892092 (-0.00%); split: -0.00%, +0.00%
SALU: 523460 -> 523648 (+0.04%); split: -0.02%, +0.05%
VOPD: 41097 -> 41102 (+0.01%)
Sadly the RA noise is slightly negative for instruction count.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38830 >
2025-12-11 08:06:59 +00:00
Georg Lehmann
911e1ce168
aco/isel: emit exec copy for ballot(true)
...
Once copy propagated in the optimizer, this will allow
using nir_opt_uniform_subgroup without too many regressions.
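A minimal model of why `ballot(true)` can be lowered to a copy of exec: a ballot sets bit *i* only when lane *i* is active and its predicate holds, so a constant-true predicate yields exactly the exec mask. The wave64 size and the function name are assumptions for this sketch.

```python
# Hypothetical model of a wave-wide ballot: bit i is set when lane i is
# active (in exec) and its predicate is true. With a constant-true predicate
# the result is the exec mask itself, which is why a copy of exec suffices.

WAVE_SIZE = 64  # assumption: wave64

def ballot(predicates, exec_mask: int) -> int:
    result = 0
    for lane in range(WAVE_SIZE):
        if (exec_mask >> lane) & 1 and predicates[lane]:
            result |= 1 << lane
    return result
```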
Foz-DB Navi48:
Totals from 405 (0.41% of 97637) affected shaders:
Instrs: 3796716 -> 3796894 (+0.00%); split: -0.00%, +0.00%
CodeSize: 20116136 -> 20116652 (+0.00%); split: -0.00%, +0.00%
Latency: 18326661 -> 18327114 (+0.00%); split: -0.00%, +0.00%
InvThroughput: 3353206 -> 3353268 (+0.00%); split: -0.00%, +0.00%
Copies: 292307 -> 293830 (+0.52%)
SALU: 507523 -> 507738 (+0.04%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38830 >
2025-12-11 08:06:58 +00:00
Marek Olšák
308da55f1a
radv,radeonsi: use FRAG_RESULT_DUAL_SRC_BLEND
...
This is slightly nicer.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38604 >
2025-12-10 19:16:46 +00:00
Natalie Vock
8bc5fdef53
aco: Remove unused p_reload_preserved def
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38281 >
2025-12-08 19:12:52 +00:00
Marek Olšák
2c9995a94f
ac/nir: move aco_nir_op_supports_packed_math_16bit here
...
aco_nir_op_supports_packed_math_16bit currently can't be used by amd/common
because the tests don't link with ACO, so linking would fail. We want to
move the nir_opt_vectorize callback that uses it here.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38603 >
2025-11-28 20:16:10 +00:00
Georg Lehmann
0f7a1ce23e
aco/optimizer: some more mul opts
...
Foz-DB Navi48:
Totals from 1650 (2.00% of 82419) affected shaders:
Instrs: 975716 -> 970609 (-0.52%); split: -0.53%, +0.00%
CodeSize: 4986260 -> 4982916 (-0.07%); split: -0.09%, +0.02%
Latency: 2795394 -> 2793211 (-0.08%); split: -0.09%, +0.01%
InvThroughput: 620892 -> 620914 (+0.00%); split: -0.00%, +0.01%
VClause: 18773 -> 18729 (-0.23%)
SClause: 13219 -> 13218 (-0.01%)
Copies: 53619 -> 53620 (+0.00%); split: -0.01%, +0.01%
VALU: 592094 -> 592096 (+0.00%); split: -0.00%, +0.00%
SALU: 96586 -> 93532 (-3.16%); split: -3.17%, +0.00%
Foz-DB Navi21:
Totals from 1647 (2.00% of 82387) affected shaders:
Instrs: 1104100 -> 1100149 (-0.36%); split: -0.36%, +0.00%
CodeSize: 5631092 -> 5637668 (+0.12%); split: -0.00%, +0.12%
Latency: 3503029 -> 3501621 (-0.04%); split: -0.05%, +0.01%
InvThroughput: 1088494 -> 1088495 (+0.00%); split: -0.00%, +0.00%
VClause: 20898 -> 20885 (-0.06%)
Copies: 72641 -> 72635 (-0.01%); split: -0.02%, +0.01%
VALU: 725593 -> 725592 (-0.00%); split: -0.00%, +0.00%
SALU: 139046 -> 135175 (-2.78%)
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38530 >
2025-11-25 11:49:17 +00:00
Georg Lehmann
3a175b54a4
aco,nir: support subdword v_permlane_b16
...
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38389 >
2025-11-17 23:33:59 +00:00
Marek Olšák
e372365cf4
nir: rename nir_copy_prop -> nir_opt_copy_prop
...
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38411 >
2025-11-15 02:16:38 +00:00
Konstantin Seurer
de32f9275f
treewide: add & use parent instr helpers
...
We add a bunch of new helpers to avoid the need to touch ->parent_instr,
including the full set of:
* nir_def_is_*
* nir_def_as_*_or_null
* nir_def_as_* [assumes the right instr type]
* nir_src_is_*
* nir_src_as_*
* nir_scalar_is_*
* nir_scalar_as_*
Plus nir_def_instr() where there's no more suitable helper.
Also an existing helper is renamed to unify all the names, while we're
churning the tree:
* nir_src_as_alu_instr -> nir_src_as_alu
...and then we port the tree to use the helpers as much as possible, using
nir_def_instr() where that does not work.
Acked-by: Marek Olšák <maraeo@gmail.com>
---
To eliminate nir_def::parent_instr we need to churn the tree anyway, so I'm
taking this opportunity to clean up a lot of NIR patterns.
Co-authored-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38313 >
2025-11-12 21:22:13 +00:00
Daniel Schürmann
5682e39e6b
amd: enable load/store_shared2_amd for GFX6
...
Totals from 1509 (2.43% of 62200) affected shaders: (Pitcairn)
MaxWaves: 8078 -> 8057 (-0.26%); split: +0.09%, -0.35%
Instrs: 977182 -> 951746 (-2.60%); split: -2.62%, +0.02%
CodeSize: 4951468 -> 4758192 (-3.90%); split: -3.92%, +0.01%
SGPRs: 76704 -> 76696 (-0.01%)
VGPRs: 81092 -> 81068 (-0.03%); split: -0.34%, +0.31%
Latency: 11663237 -> 11526070 (-1.18%); split: -1.19%, +0.01%
InvThroughput: 6198904 -> 6114851 (-1.36%); split: -1.43%, +0.07%
VClause: 26656 -> 26655 (-0.00%); split: -0.05%, +0.05%
SClause: 22304 -> 22307 (+0.01%); split: -0.03%, +0.04%
Copies: 107503 -> 109564 (+1.92%); split: -0.23%, +2.15%
Branches: 22917 -> 22918 (+0.00%)
PreSGPRs: 42246 -> 42242 (-0.01%); split: -0.01%, +0.00%
PreVGPRs: 64561 -> 64761 (+0.31%); split: -0.01%, +0.32%
VALU: 600285 -> 601139 (+0.14%); split: -0.26%, +0.40%
SALU: 130622 -> 130851 (+0.18%); split: -0.16%, +0.33%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37682 >
2025-11-11 17:12:17 +00:00
Natalie Vock
f0c613765c
aco: Add preload_preserved pseudo instruction
...
These are helper instructions for the spill_preserved pass to insert
reloads for registers that are preserved by the ABI, yet
clobbered by the callee shader.
There is one p_reload_preserved instruction at the end of each block.
This allows us to insert reloads early, to alleviate the high latency of
scratch reloads.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37381 >
2025-11-06 12:09:39 +00:00
Samuel Pitoiset
a0d607bfdb
radv,aco: wait for all VMEM loads when the prolog loads large 64-bit attributes
...
Not the most optimal solution but 64-bit vertex attributes are rarely
used. Could still revisit if we find a real use case that matters.
This fixes recent VKCTS coverage:
dEQP-VK.pipeline.fast_linked_library.vertex_input.component_mismatch.r64g64b64.*_to_dvec2
dEQP-VK.pipeline.shader_object_.*.vertex_input.component_mismatch.r64g64b64.*_to_dvec2
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14243
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38237 >
2025-11-05 07:26:45 +00:00
Samuel Pitoiset
ba5bf81aa2
aco: fix reserving VGPRs for 64-bit attributes in VS prologs
...
Otherwise the fetch index would be overwritten if the attribute format
is 64-bit and more than 2 components are loaded.
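The arithmetic behind "more than 2 components" can be sketched as follows, assuming each 64-bit component occupies two 32-bit VGPRs. The commit does not say how many VGPRs were reserved before the fix; this hypothetical helper only shows that three or four 64-bit components need more than four VGPRs.

```python
# Hypothetical helper: how many 32-bit VGPRs a fetched vertex attribute
# occupies. 64-bit components take two dwords each, so more than two
# 64-bit components no longer fit in four VGPRs.

def attribute_vgprs(num_components: int, bit_size: int) -> int:
    assert bit_size in (32, 64) and 1 <= num_components <= 4
    dwords_per_component = bit_size // 32
    return num_components * dwords_per_component
```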
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14242
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38237 >
2025-11-05 07:26:45 +00:00
Georg Lehmann
0f54136730
aco/isel: emit vop2 v_lshlrev_b64 for gfx12+
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38156 >
2025-10-31 08:31:03 +00:00
Georg Lehmann
7ac67e2711
aco/isel: emit vop2 v_max_f64 for gfx12+
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38156 >
2025-10-31 08:31:03 +00:00
Georg Lehmann
8397b91934
aco/isel: emit vop2 v_min_f64 for gfx12+
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38156 >
2025-10-31 08:31:02 +00:00
Georg Lehmann
2e120d4e26
aco/isel: emit vop2 v_mul_f64 for gfx12+
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38156 >
2025-10-31 08:31:01 +00:00
Georg Lehmann
86ea462f4d
aco/isel: emit vop2 v_fadd_f64 for gfx12+
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38156 >
2025-10-31 08:31:01 +00:00
Georg Lehmann
0c8b885e21
aco/isel: emit v_mul_f64 for fp64 fsat
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38011 >
2025-10-29 17:57:52 +00:00
Georg Lehmann
9ece74ce79
aco/isel: emit v_mul_f64 with modifiers for fneg/fabs
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38011 >
2025-10-29 17:57:52 +00:00
Konstantin Seurer
47ffe2ecd4
aco: Fixup out_launch_size_y in the RT prolog for 1D dispatch
...
launch_size_y is set to ACO_RT_CONVERTED_2D_LAUNCH_SIZE for 1D
dispatches. The prolog needs to set it to 1 so that the app shader
loads the correct value.
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37974 >
2025-10-23 07:56:35 +00:00
Daniel Schürmann
eecd1c020d
amd: keep ac_shader_config::lds_size unaligned
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37577 >
2025-10-15 11:20:09 +00:00
Daniel Schürmann
fe6ff6d1ef
aco: remove DeviceInfo::lds_encoding_granule and DeviceInfo::lds_alloc_granule
...
Use utility functions instead.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37577 >
2025-10-15 11:20:08 +00:00
Daniel Schürmann
11db02d5d9
radv: calculate LDS allocation requirements independently from the compiler
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37577 >
2025-10-15 11:20:07 +00:00
Daniel Schürmann
b651234414
amd: change ac_shader_config::lds_size to bytes
...
We still keep it aligned to allocation granularity.
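A minimal sketch of keeping the LDS size in bytes and deriving the aligned allocation and an encoded register value from it. The granule values are plain parameters in this sketch, not real per-GPU numbers, and the function names are hypothetical.

```python
# Hypothetical sketch: keep the LDS size in bytes and derive the aligned
# allocation and the encoded register field from it. The granule values
# passed in are parameters here, not real per-GPU numbers.

def align_up(value: int, alignment: int) -> int:
    assert alignment > 0 and alignment & (alignment - 1) == 0, "power of two"
    return (value + alignment - 1) & ~(alignment - 1)

def lds_alloc_bytes(lds_size_bytes: int, alloc_granule: int) -> int:
    return align_up(lds_size_bytes, alloc_granule)

def lds_encoded_field(lds_size_bytes: int, alloc_granule: int,
                      encoding_granule: int) -> int:
    return lds_alloc_bytes(lds_size_bytes, alloc_granule) // encoding_granule
```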
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37577 >
2025-10-15 11:20:07 +00:00
Daniel Schürmann
d0b87a0d5f
ac/nir_flag_smem_for_loads: call divergence analysis internally
...
Also don't flag more SMEM instructions (in ACO) after the last
call to ac_nir_lower_mem_access_bit_sizes().
Totals from 75 (0.09% of 79839) affected shaders: (Navi48)
Instrs: 191246 -> 189960 (-0.67%)
CodeSize: 996840 -> 985976 (-1.09%)
Latency: 3066184 -> 2945500 (-3.94%)
InvThroughput: 355373 -> 353106 (-0.64%); split: -0.66%, +0.02%
SClause: 4848 -> 4699 (-3.07%)
Copies: 13827 -> 13925 (+0.71%); split: -0.07%, +0.78%
Branches: 5176 -> 5003 (-3.34%)
PreSGPRs: 6222 -> 6272 (+0.80%)
VALU: 108934 -> 108993 (+0.05%); split: -0.00%, +0.06%
SALU: 31679 -> 31210 (-1.48%); split: -1.51%, +0.03%
SMEM: 7158 -> 6739 (-5.85%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37843 >
2025-10-14 16:33:12 +00:00
Daniel Schürmann
8ff44f17ef
amd/lower_mem_access_bit_sizes: also use SMEM for subdword loads
...
We can simply extract from the loaded dwords as per
nir_lower_mem_access_bit_sizes() lowering.
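A minimal sketch of extracting a subdword value from dword-granular loads, assuming little-endian memory and a load unit that can only fetch whole, aligned dwords. It loads the covering dwords and shifts and masks out the requested bytes; the function name is hypothetical and this is not the NIR lowering itself.

```python
# Hypothetical sketch: emulate a subdword load on hardware that can only
# load whole, dword-aligned values by loading the covering dwords and
# shifting/masking out the requested bytes (little-endian).

def load_subdword(memory: bytes, byte_offset: int, size: int) -> int:
    assert size in (1, 2)
    start = byte_offset & ~3                        # round down to a dword
    end = (byte_offset + size + 3) & ~3             # round up to a dword
    loaded = int.from_bytes(memory[start:end], "little")
    shift = (byte_offset - start) * 8
    return (loaded >> shift) & ((1 << (size * 8)) - 1)
```

A 16-bit load at byte offset 3 straddles two dwords, so both are loaded before the shift.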
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37843 >
2025-10-14 16:33:11 +00:00
Samuel Pitoiset
bc32286e5b
radv: declare a new user SGPR for dynamic descriptors
...
To move them out of push constants.
fossils-db (GFX1201):
Totals from 20700 (25.99% of 79646) affected shaders:
Instrs: 14375624 -> 14370051 (-0.04%); split: -0.07%, +0.03%
CodeSize: 76746128 -> 76723772 (-0.03%); split: -0.05%, +0.02%
Latency: 74103586 -> 74113651 (+0.01%); split: -0.01%, +0.02%
InvThroughput: 11908817 -> 11908798 (-0.00%); split: -0.00%, +0.00%
VClause: 249605 -> 249607 (+0.00%); split: -0.00%, +0.00%
SClause: 337914 -> 337772 (-0.04%); split: -0.08%, +0.04%
Copies: 843585 -> 839233 (-0.52%); split: -0.62%, +0.10%
PreSGPRs: 836283 -> 837260 (+0.12%)
SALU: 1790713 -> 1786374 (-0.24%); split: -0.29%, +0.05%
Co-authored-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37768 >
2025-10-14 15:34:43 +00:00
Georg Lehmann
58163f65f0
aco/optimizer: rework packed fneg opt
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35272 >
2025-10-14 08:33:40 +00:00
Georg Lehmann
6eac72088c
aco/gfx10+: only work around split execution of uniform LDS in WGP mode
...
LDS instructions from one CU won't split the execution of other LDS instructions
on the same CU.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31630 >
2025-10-13 10:22:22 +00:00
Georg Lehmann
c13caa5e5f
aco: fix global_atomic_swap offset overflow check
...
Fixes: d7dcd81c77 ("aco/gfx6: allow both constant and gpr offset for global with sgpr address")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37821 >
2025-10-13 09:41:41 +00:00
Marek Olšák
3fe651f607
nir: remove load_smem_amd
...
replaced by load_global_amd + ACCESS_SMEM_AMD
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36936 >
2025-10-08 08:54:11 +00:00
Rhys Perry
20af16b4d8
aco: use MTBUF for 64-bit atomic load/store
...
A 64-bit atomic load/store should be considered entirely out-of-bounds if
any part of it is out-of-bounds. Since we implemented these as 32-bit vec2
load/store, it would have been possible for the first half to be in-bounds
while the second half is out-of-bounds.
From section 9.6.1, "Robust Buffer Access", of the Vulkan 1.4.324 specification:
> Any non-atomic access to a uniform, storage, uniform texel, or storage
> texel buffer wider than 32-bits may be treated as multiple 32-bit
> accesses that are separately bounds checked.
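A small hypothetical model of the difference: a 64-bit access near the end of a buffer, checked either as two independent 32-bit halves (the old vec2 behaviour) or as one unit. Atomics need the all-or-nothing result; the commit obtains it by switching to MTBUF, which this sketch does not attempt to model.

```python
# Hypothetical model: a 64-bit access at byte `offset` into a buffer of
# `size` bytes, checked either as two 32-bit halves or as one 64-bit unit.

def split_halves_in_bounds(offset: int, size: int) -> list[bool]:
    return [offset + 4 * half + 4 <= size for half in range(2)]

def whole_access_in_bounds(offset: int, size: int) -> bool:
    return offset + 8 <= size
```

At offset 12 in a 16-byte buffer, the split check reports the first half in bounds and the second out of bounds, which is exactly the torn result a 64-bit atomic must not produce.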
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36602 >
2025-10-07 17:41:31 +00:00