Emma Anholt
10ba7675c8
nir/uub: Use an optional max_samples from drivers for sample counts.
...
This triggers some unrolling in Fallout 4, GTAV, and Rocky Planet in my
shader-db.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38585 >
2025-12-11 14:26:11 +00:00
Marek Olšák
308da55f1a
radv,radeonsi: use FRAG_RESULT_DUAL_SRC_BLEND
...
this is slightly nicer
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38604 >
2025-12-10 19:16:46 +00:00
Arcady Goldmints-Orlov
0df8aa940c
nir: Use nir_shader_intrinsics_pass in nir_lower_io_to_scalar
...
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38816 >
2025-12-05 22:30:22 +00:00
Marek Olšák
e6499fa73e
nir/recompute_io_bases: move color input bases after all other inputs
...
This is related to the FS prolog.
It should have no effect on other drivers.
v2: make it optional via io_options
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> (v1)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38599 >
2025-11-29 05:00:40 +00:00
Marek Olšák
fa0bea5ff8
nir: remove nir_io_add_const_offset_to_base
...
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
nir_opt_constant_folding does it now.
Acked-by: Emma Anholt <emma@anholt.net>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38277 >
2025-11-29 00:16:38 +00:00
Marek Olšák
21cdbfa223
ac,radv: move opt_vectorize_callback to common code
...
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
radeonsi will use it.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38603 >
2025-11-28 20:16:10 +00:00
Marek Olšák
2c9995a94f
ac/nir: move aco_nir_op_supports_packed_math_16bit here
...
aco_nir_op_supports_packed_math_16bit currently can't be used by amd/common
because tests don't link with ACO, so linking would fail, but we want
to move the nir_opt_vectorize callback here that uses it.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38603 >
2025-11-28 20:16:10 +00:00
Marek Olšák
9e339f4b32
nir: rename nir_lower_indirect_derefs -> nir_lower_indirect_derefs_to_if_else_trees
...
This describes better what it does.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38471 >
2025-11-20 05:42:11 +00:00
Marek Olšák
e372365cf4
nir: rename nir_copy_prop -> nir_opt_copy_prop
...
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38411 >
2025-11-15 02:16:38 +00:00
Rhys Perry
00edddf542
ac/nir: add some tests for ac_nir_lower_mem_access_bit_sizes
...
These test that nothing crashes for any possible input. With print=true,
it can also be used to compare the behaviour of two different
ac_nir_lower_mem_access_bit_sizes.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37995 >
2025-11-13 15:23:20 +00:00
Konstantin Seurer
de32f9275f
treewide: add & use parent instr helpers
...
We add a bunch of new helpers to avoid the need to touch >parent_instr,
including the full set of:
* nir_def_is_*
* nir_def_as_*_or_null
* nir_def_as_* [assumes the right instr type]
* nir_src_is_*
* nir_src_as_*
* nir_scalar_is_*
* nir_scalar_as_*
Plus nir_def_instr() where there's no more suitable helper.
Also an existing helper is renamed to unify all the names, while we're
churning the tree:
* nir_src_as_alu_instr -> nir_src_as_alu
..and then we port the tree to use the helpers as much as possible, using
nir_def_instr() where that does not work.
Acked-by: Marek Olšák <maraeo@gmail.com>
---
To eliminate nir_def::parent_instr we need to churn the tree anyway, so I'm
taking this opportunity to clean up a lot of NIR patterns.
Co-authored-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38313 >
2025-11-12 21:22:13 +00:00
Timur Kristóf
7f5f8b3932
ac/nir/ngg: Use align() instead of ALIGN()
...
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38364 >
2025-11-12 13:40:55 +00:00
Timur Kristóf
8f99d736d0
ac/nir/ngg: Fix scratch space for NGG GS streamout
...
For GS streamout, we need the following LDS scratch space:
- Repacking streamout vertices takes 1 dword per 4 waves per stream
(max 16 bytes for Wave64, max 32 bytes for Wave32)
- 1 dword per stream for buffer info
(16 bytes)
- 1 dword per buffer for buffer info
(16 bytes)
Previously, the space used for buffer info aliased with the
space for repacking the output vertices in ngg_gs_finale(),
and there was no barrier in between, which caused a race
condition, resulting in random failure.
Fix this by allocating a few more LDS dwords so that aliasing
is not required, which also allows us to remove an extra
workgroup barrier.
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12705
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38364 >
2025-11-12 13:40:55 +00:00
Marek Olšák
9125e34372
amd: lower get_ssbo_size in ac_nir_lower_resinfo
...
The code for lowering get_ssbo_size will be different in future chips,
so do it in common code to reduce duplication in the future.
Lower get_ssbo_size to ssbo_descriptor_amd + nir_channel.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38097 >
2025-11-02 01:42:07 +00:00
Marek Olšák
9def0a6e5b
ac/nir: set support_indirect_inputs/outputs in common code
...
This fixes mesh shader performance of RADV for GravityMark by stopping
the lowering of ClipDistance[64][4] indirect access for mesh shader outputs.
The perf improvement is 14% on Navi48.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38155 >
2025-10-31 00:57:46 +00:00
Marek Olšák
966cb36722
amd: constify struct radeon_surf
...
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38093 >
2025-10-29 12:50:44 +00:00
Rhys Perry
f3ff2375ec
ac/nir: don't consider quads incomplete inside loops
...
We move terminates to outside loops, so this doesn't matter anymore.
fossil-db (gfx1201):
Totals from 145 (0.18% of 79839) affected shaders:
Instrs: 174693 -> 174389 (-0.17%); split: -0.18%, +0.01%
CodeSize: 917068 -> 915692 (-0.15%); split: -0.16%, +0.01%
VGPRs: 8340 -> 8184 (-1.87%)
Latency: 2528888 -> 2521006 (-0.31%); split: -0.48%, +0.16%
InvThroughput: 502383 -> 504082 (+0.34%); split: -0.44%, +0.78%
Copies: 15968 -> 15632 (-2.10%); split: -2.14%, +0.04%
PreVGPRs: 5918 -> 5858 (-1.01%)
VALU: 92802 -> 92484 (-0.34%); split: -0.35%, +0.01%
SALU: 29437 -> 29430 (-0.02%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37561 >
2025-10-23 11:22:02 +00:00
Rhys Perry
9babec1366
radv,radeonsi: use optimize_txd
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37561 >
2025-10-23 11:22:01 +00:00
Rhys Perry
7d552d71e9
ac/nir: optimize txd(coord, ddx/ddy(coord))
...
This is done in ac_nir_lower_tex so that we can optimize derivative
calculations with a different exec mask than the texture sample by using
the nir_strict_wqm_coord_amd path.
It's also more aware of divergence than nir_lower_tex is.
fossil-db (gfx1201):
Totals from 103 (0.13% of 79839) affected shaders:
MaxWaves: 2610 -> 2620 (+0.38%)
Instrs: 347283 -> 345912 (-0.39%); split: -0.40%, +0.00%
CodeSize: 1892380 -> 1883824 (-0.45%); split: -0.46%, +0.00%
VGPRs: 8028 -> 7824 (-2.54%)
Latency: 3942575 -> 3939623 (-0.07%); split: -0.08%, +0.01%
InvThroughput: 867147 -> 865281 (-0.22%); split: -0.24%, +0.02%
VClause: 6230 -> 6221 (-0.14%); split: -0.19%, +0.05%
SClause: 3910 -> 3914 (+0.10%); split: -0.26%, +0.36%
Copies: 16091 -> 15721 (-2.30%); split: -2.74%, +0.44%
PreSGPRs: 4651 -> 4658 (+0.15%)
PreVGPRs: 6389 -> 6320 (-1.08%); split: -1.17%, +0.09%
VALU: 228715 -> 227490 (-0.54%); split: -0.54%, +0.01%
SALU: 32763 -> 32767 (+0.01%); split: -0.06%, +0.07%
VMEM: 9027 -> 9024 (-0.03%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37561 >
2025-10-23 11:22:00 +00:00
Rhys Perry
309ac1f0c0
ac/nir: refactor move_coords_from_divergent_cf a bit
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37561 >
2025-10-23 11:21:59 +00:00
Rhys Perry
42bb81137e
ac/nir: stop using NIR_PASS in ac_nir_lower_ngg_nogs()
...
When NIR_DEBUG=serialize or NIR_DEBUG=clone is used, NIR_PASS recreates
nir_function_impl and nir_variable objects, causing use-after-free since
ac_nir_lower_ngg_nogs() keeps pointers to those in local variables.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13946
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37573 >
2025-10-23 10:44:38 +00:00
Rhys Perry
b18421ae3d
amd/lower_mem_access_bit_sizes: fix shared access when bytes<bit_size/8
...
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
This can happen with (for example) 32x2 loads with
align_mul=4,align_offset=2.
This patch does bit_size=min(bit_size,bytes) to prevent num_components
from being 0.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: 52cd5f7e69 ("ac/nir_lower_mem_access_bit_sizes: Split unsupported shared memory instructions")
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37953 >
2025-10-21 22:10:34 +00:00
Rhys Perry
e89b22280f
amd/lower_mem_access_bit_sizes: be more careful with 8/16-bit scratch load
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Backport-to: 25.3
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37953 >
2025-10-21 22:10:34 +00:00
Rhys Perry
8829fc3bd6
amd/lower_mem_access_bit_sizes: improve subdword/unaligned SMEM lowering
...
Summary of changes:
- handle unaligned 16-bit scalar loads when supported_dword=true
- increases the size of 8/16/32/64-bit buffer loads which are not dword
aligned, which can create less SMEM loads.
- handles when "bytes" is less than "bit_size / 8"
fossil-db (gfx1201):
Totals from 26 (0.03% of 79839) affected shaders:
Instrs: 12676 -> 12710 (+0.27%); split: -0.30%, +0.57%
CodeSize: 67272 -> 67384 (+0.17%); split: -0.24%, +0.40%
Latency: 44399 -> 44375 (-0.05%); split: -0.09%, +0.04%
SClause: 352 -> 344 (-2.27%)
SALU: 3972 -> 3992 (+0.50%)
SMEM: 554 -> 528 (-4.69%)
fossil-db (navi21):
Totals from 6 (0.01% of 79825) affected shaders:
Instrs: 2192 -> 2186 (-0.27%)
CodeSize: 12188 -> 12140 (-0.39%)
Latency: 10037 -> 10033 (-0.04%); split: -0.12%, +0.08%
SMEM: 124 -> 118 (-4.84%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: fbf0399517 ("amd/lower_mem_access_bit_sizes: lower all SMEM instructions to supported sizes")
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37953 >
2025-10-21 22:10:34 +00:00
Rhys Perry
79b2fa785d
amd/lower_mem_access_bit_sizes: don't create subdword UBO loads with LLVM
...
These are unsupported.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14127
Fixes: fbf0399517 ("amd/lower_mem_access_bit_sizes: lower all SMEM instructions to supported sizes")
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37953 >
2025-10-21 22:10:33 +00:00
Georg Lehmann
9e41a7c139
treewide: use nir_load_global alias of nir_build_load_global
...
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37959 >
2025-10-21 12:37:58 +02:00
Timur Kristóf
d20049b430
ac/nir/ngg_mesh: Lower num_subgroups to constant
...
Mesh shader workgroups always have the same amount of subgroups.
When the API workgroup size is the same as the real workgroup
size, this is a small optimization (using a constant instead of
a shader arg).
When the API workgroup size is smaller than the real workgroup
size (eg. when the number of output vertices or primitves is
greater than the API workgroup size on RDNA 2), this fixes a
potential bug because num_subgroups would return the "real"
workgroup size instead of the API one.
Cc: mesa-stable
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37947 >
2025-10-20 14:05:40 +00:00
Daniel Schürmann
eecd1c020d
amd: keep ac_shader_config::lds_size unaligned
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37577 >
2025-10-15 11:20:09 +00:00
Daniel Schürmann
6fd5766620
amd: add and use utility functions for LDS size encoding
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37577 >
2025-10-15 11:20:08 +00:00
Daniel Schürmann
b651234414
amd: change ac_shader_config::lds_size to bytes
...
We still keep it aligned to allocation granularity.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37577 >
2025-10-15 11:20:07 +00:00
Daniel Schürmann
d0b87a0d5f
ac/nir_flag_smem_for_loads: call divergence analysis internally
...
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Also don't flag more SMEM instructions (in ACO) after the last
call to ac_nir_lower_mem_access_bit_sizes().
Totals from 75 (0.09% of 79839) affected shaders: (Navi48)
Instrs: 191246 -> 189960 (-0.67%)
CodeSize: 996840 -> 985976 (-1.09%)
Latency: 3066184 -> 2945500 (-3.94%)
InvThroughput: 355373 -> 353106 (-0.64%); split: -0.66%, +0.02%
SClause: 4848 -> 4699 (-3.07%)
Copies: 13827 -> 13925 (+0.71%); split: -0.07%, +0.78%
Branches: 5176 -> 5003 (-3.34%)
PreSGPRs: 6222 -> 6272 (+0.80%)
VALU: 108934 -> 108993 (+0.05%); split: -0.00%, +0.06%
SALU: 31679 -> 31210 (-1.48%); split: -1.51%, +0.03%
SMEM: 7158 -> 6739 (-5.85%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37843 >
2025-10-14 16:33:12 +00:00
Daniel Schürmann
9b1a635bb3
amd/common: merge radv_nir_opt_access_speculate() into ac_nir_flag_smem_for_loads()
...
One shader is negatively affected, but we save 2 entire iterations over every shader.
This effect is also mitigated with the next commits.
Totals from 1 (0.00% of 79839) affected shaders: (Navi48)
Instrs: 947 -> 958 (+1.16%)
CodeSize: 4728 -> 4732 (+0.08%)
Latency: 20678 -> 20723 (+0.22%)
InvThroughput: 2697 -> 2698 (+0.04%)
SClause: 26 -> 27 (+3.85%)
Copies: 139 -> 145 (+4.32%)
Branches: 46 -> 47 (+2.17%)
VALU: 460 -> 463 (+0.65%)
SALU: 201 -> 204 (+1.49%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37843 >
2025-10-14 16:33:12 +00:00
Daniel Schürmann
8ff44f17ef
amd/lower_mem_access_bit_sizes: also use SMEM for subdword loads
...
We can simply extract from the loaded dwords as per
nir_lower_mem_access_bit_sizes() lowering.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37843 >
2025-10-14 16:33:11 +00:00
Daniel Schürmann
fbf0399517
amd/lower_mem_access_bit_sizes: lower all SMEM instructions to supported sizes
...
This creates more SMEM instruction, mostly because vec3 64bit are being split
instead of overfetched.
Totals from 442 (0.55% of 79839) affected shaders: (Navi48)
Instrs: 288998 -> 289469 (+0.16%); split: -0.04%, +0.21%
CodeSize: 1538212 -> 1541460 (+0.21%); split: -0.03%, +0.24%
Latency: 3010072 -> 3009373 (-0.02%); split: -0.04%, +0.01%
InvThroughput: 885572 -> 885564 (-0.00%); split: -0.00%, +0.00%
VClause: 6900 -> 6885 (-0.22%); split: -0.28%, +0.06%
SClause: 4457 -> 4469 (+0.27%); split: -0.18%, +0.45%
VALU: 162473 -> 162469 (-0.00%)
SALU: 42871 -> 42855 (-0.04%); split: -0.05%, +0.01%
SMEM: 6893 -> 7239 (+5.02%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37843 >
2025-10-14 16:33:11 +00:00
Georg Lehmann
e26a8be7af
ac/nir: enable nir atomic load/store opts
...
Foz-DB GFX1201:
Totals from 4 (0.00% of 80287) affected shaders:
Instrs: 2928 -> 2920 (-0.27%); split: -0.31%, +0.03%
CodeSize: 15424 -> 15392 (-0.21%); split: -0.23%, +0.03%
Latency: 835578 -> 823220 (-1.48%)
InvThroughput: 3307941 -> 3258515 (-1.49%)
Copies: 459 -> 447 (-2.61%)
VALU: 1297 -> 1291 (-0.46%)
SALU: 595 -> 589 (-1.01%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37822 >
2025-10-14 06:24:17 +00:00
Marek Olšák
3fe651f607
nir: remove load_smem_amd
...
replaced by load_global_amd + ACCESS_SMEM_AMD
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36936 >
2025-10-08 08:54:11 +00:00
Daniel Schürmann
3ae2f12eb4
ac/nir: switch load_smem_amd to use load_global
...
Totals from 24920 (31.21% of 79839) affected shaders: (Navi48)
Instrs: 22044185 -> 22413945 (+1.68%); split: -0.01%, +1.68%
CodeSize: 117211728 -> 118623656 (+1.20%); split: -0.01%, +1.21%
VGPRs: 1199008 -> 1198948 (-0.01%)
SpillSGPRs: 7421 -> 7365 (-0.75%); split: -0.78%, +0.03%
SpillVGPRs: 2177 -> 2184 (+0.32%); split: -0.09%, +0.41%
Scratch: 7037952 -> 7038208 (+0.00%)
Latency: 155140452 -> 155530877 (+0.25%); split: -0.02%, +0.27%
InvThroughput: 23601713 -> 23634131 (+0.14%); split: -0.01%, +0.15%
VClause: 458456 -> 458575 (+0.03%); split: -0.09%, +0.11%
SClause: 651928 -> 649405 (-0.39%); split: -1.26%, +0.87%
Copies: 1681110 -> 1677057 (-0.24%); split: -0.42%, +0.17%
Branches: 515419 -> 515322 (-0.02%); split: -0.02%, +0.00%
PreSGPRs: 992903 -> 990545 (-0.24%); split: -0.24%, +0.00%
VALU: 11971995 -> 11967962 (-0.03%); split: -0.04%, +0.00%
SALU: 3247576 -> 3476720 (+7.06%); split: -0.03%, +7.08%
VMEM: 821046 -> 821056 (+0.00%); split: -0.00%, +0.00%
SMEM: 988476 -> 988779 (+0.03%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36936 >
2025-10-08 08:54:11 +00:00
Daniel Schürmann
fdd6bdf03d
ac/nir_lower_global_access: don't assume pack_64_2x32 is the same as u2u64
...
It might also be the expanded base address.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36936 >
2025-10-08 08:53:58 +00:00
Daniel Schürmann
0209065229
ac/nir_lower_global_access: require no_unsigned wrap when extracting from 32-bit additions
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36936 >
2025-10-08 08:53:58 +00:00
Rhys Perry
20af16b4d8
aco: use MTBUF for 64-bit atomic load/store
...
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
A 64-bit atomic load/store should be considered entirely out-of-bounds if
any part of it is out-of-bounds. Since we implemented these as 32-bit vec2
load/store, it would have been possible for the first half to be in-bounds
while the second half is out-of-bounds.
From 9.6.1. Robust Buffer Access of Vulkan 1.4.324 specification:
> Any non-atomic access to a uniform, storage, uniform texel, or storage
> texel buffer wider than 32-bits may be treated as multiple 32-bit
> accesses that are separately bounds checked.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36602 >
2025-10-07 17:41:31 +00:00
Georg Lehmann
84f26ed117
nir: optimize atomic isub if supported
...
Foz-DB Navi48:
Totals from 1 (0.00% of 80287) affected shaders:
Instrs: 1641 -> 1637 (-0.24%)
CodeSize: 8472 -> 8456 (-0.19%)
Latency: 19132 -> 19131 (-0.01%)
InvThroughput: 9566 -> 9565 (-0.01%)
Copies: 126 -> 125 (-0.79%)
VALU: 565 -> 563 (-0.35%)
SALU: 439 -> 438 (-0.23%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37702 >
2025-10-07 14:07:56 +00:00
Daniel Schürmann
0e3bc3d8c0
nir/opt_offsets: call allow_offset_wrap() for try_fold_shared2()
...
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
This prevents applying wrapping offsets on GFX6.
Fixes: e1a692f74b ('nir/opt_offsets: allow for unsigned wraps when folding load/store_shared2_amd offsets')
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37667 >
2025-10-03 07:54:12 +00:00
Daniel Schürmann
93ce29c42e
amd: don't allow unsigned wraps for shared memory offsets on GFX6
...
Fixes: 10266e7b21 ('radv: allow for unsigned wraps for shared memory intrinsics in nir_opt_offsets')
Fixes: dd68825feb ('radeonsi: allow for unsigned wraps for shared memory intrinsics in nir_opt_offsets')
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37667 >
2025-10-03 07:54:12 +00:00
Timur Kristóf
d3579190d6
ac/nir/ngg: Fix scalarized mesh primitive indices
...
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Take the write_mask into account when storing primitive indices,
otherwise they will end up being stored in the wrong place.
Fixes: 8e24d3426d ("ac/nir/ngg: Refactor MS primitive indices for scalarized IO.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37610 >
2025-09-29 08:07:54 +00:00
Timur Kristóf
3dc9c1a91e
ac/nir/ngg: Remove dead code for 64-bit mesh shader variables
...
We already lower all 64-bit I/O to 32-bit before this pass,
and the rest of the code here already asserts that I/O variables
must be 32-bit or smaller.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37610 >
2025-09-29 08:07:54 +00:00
Daniel Schürmann
10266e7b21
radv: allow for unsigned wraps for shared memory intrinsics in nir_opt_offsets
...
Totals from 76 (0.10% of 79839) affected shaders: (Navi48)
Instrs: 237450 -> 237323 (-0.05%); split: -0.05%, +0.00%
CodeSize: 1276732 -> 1275824 (-0.07%); split: -0.07%, +0.00%
Latency: 1123467 -> 1123387 (-0.01%); split: -0.01%, +0.01%
InvThroughput: 364942 -> 364738 (-0.06%); split: -0.06%, +0.00%
Copies: 20654 -> 20636 (-0.09%); split: -0.09%, +0.00%
Branches: 7326 -> 7327 (+0.01%)
PreSGPRs: 5197 -> 5195 (-0.04%)
PreVGPRs: 3395 -> 3396 (+0.03%)
VALU: 96134 -> 96034 (-0.10%)
SALU: 48059 -> 48041 (-0.04%); split: -0.04%, +0.00%
VOPD: 10 -> 8 (-20.00%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37453 >
2025-09-24 14:28:24 +00:00
Rhys Perry
92a2ab8b64
ac/nir: fix progress reporting in ac_nir_lower_tex
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35069 >
2025-09-24 08:20:27 +00:00
Georg Lehmann
714a149396
nir: remove unsigned upper bound config
...
All config information is now either in nir->info or nir->options.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37361 >
2025-09-16 09:24:04 +00:00
Georg Lehmann
76a502d75a
ac/nir: set subgroup size for gs copy shader
...
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37294 >
2025-09-14 13:21:21 +00:00
Georg Lehmann
83326af899
nir/builder: add nir_inverse_ballot_imm
...
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37178 >
2025-09-04 14:03:56 +00:00