Commit graph

121 commits

Author SHA1 Message Date
Marek Olšák
509f0e62ad ac/nir/tess: allow passing explicit patch_offset to VMEM/LDS offset calculations
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34780>
2025-06-07 16:29:39 +00:00
Marek Olšák
a59464b6e3 radv,radeonsi: precompute and pass TCS per-vertex output stride via a user SGPR
It's a stride of 1 output, which isn't 16. It's 16 * num_threads,
aligned to 256.

tcs_offchip_layout has 5 unused bits, so let's use them.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34780>
2025-06-07 16:29:39 +00:00
Marek Olšák
534b282573 ac/nir/tess: adjust memory layout of TCS outputs to have aligned store offsets
There is a comment that explains it.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34780>
2025-06-07 16:29:38 +00:00
Marek Olšák
80236f2367 ac/nir/tess: add if/endif for HS threads in NIR instead of ACO/LLVM
This just removes the if/endif wrapping for LLVM, and hopefully the ACO
change does the same thing.

ACO had redundant code in endif_merged_wave_info, which is removed here.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34780>
2025-06-07 16:29:38 +00:00
Marek Olšák
cd366b57d9 ac/nir: implement load_subgroup_id/local_invocation_index for TCS on gfx6-10.x
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34780>
2025-06-07 16:29:38 +00:00
Marek Olšák
c3034fa82c amd: replace most u_bit_consecutive* with BITFIELD_MASK/RANGE
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35346>
2025-06-04 17:46:38 +00:00
Karol Herbst
4f5ce2d5aa ac/nir: fix unaligned single component load/stores
This fixes two problems:
1. we need to lower the bit_size according to the alignment.
2. num_components could end up being 0, so we need to round up instead.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13102
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34976>
2025-06-03 13:14:31 +00:00
Samuel Pitoiset
fe2c93a788 ac/nir: enable 64-bit lowering for bitfield_extract
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35187>
2025-05-29 08:45:41 +02:00
Marek Olšák
6e4154b7ef ac/nir: fix export_ps_outputs not preserving divergence metadata
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34492>
2025-05-14 20:19:16 +00:00
Georg Lehmann
a2209547db ac/nir: enable nir_op_bfdot2_bfadd
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34768>
2025-05-09 11:20:26 +00:00
Georg Lehmann
f364303084 ac/nir: set lower_bfloat16_conversions
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34768>
2025-05-09 11:20:26 +00:00
Rhys Perry
3b42626973 ac/nir: allow 8/16-bit smem loads
fossil-db (gfx1201):
Totals from 295 (0.37% of 79377) affected shaders:
Instrs: 314018 -> 313355 (-0.21%); split: -0.22%, +0.00%
CodeSize: 1697996 -> 1696528 (-0.09%); split: -0.11%, +0.02%
Latency: 4197719 -> 4197106 (-0.01%)
InvThroughput: 1258891 -> 1258744 (-0.01%)
PreSGPRs: 12232 -> 12230 (-0.02%)
SALU: 66762 -> 66269 (-0.74%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34162>
2025-05-08 13:30:50 +00:00
Rhys Perry
5b116c4de9 ac/nir: allow vectorization of unsupported 8/16-bit loads
These can later be lowered to a vectorized 32-bit load.

fossil-db (gfx1201):
Totals from 1259 (1.59% of 79377) affected shaders:
MaxWaves: 36821 -> 36817 (-0.01%)
Instrs: 4363702 -> 4355749 (-0.18%); split: -0.23%, +0.05%
CodeSize: 22779980 -> 22706504 (-0.32%); split: -0.37%, +0.05%
VGPRs: 69672 -> 69792 (+0.17%); split: -0.02%, +0.19%
SpillSGPRs: 675 -> 673 (-0.30%)
Latency: 26684053 -> 26663819 (-0.08%); split: -0.11%, +0.03%
InvThroughput: 5617687 -> 5614798 (-0.05%); split: -0.10%, +0.04%
VClause: 106830 -> 106654 (-0.16%); split: -0.17%, +0.00%
SClause: 75523 -> 75495 (-0.04%); split: -0.04%, +0.01%
Copies: 323199 -> 323525 (+0.10%); split: -0.10%, +0.20%
Branches: 109475 -> 109480 (+0.00%); split: -0.00%, +0.01%
PreSGPRs: 55036 -> 55040 (+0.01%)
PreVGPRs: 47538 -> 47582 (+0.09%); split: -0.12%, +0.21%
VALU: 2377777 -> 2389977 (+0.51%); split: -0.02%, +0.53%
SALU: 578272 -> 578385 (+0.02%); split: -0.02%, +0.04%
VMEM: 190065 -> 181204 (-4.66%)
SMEM: 99709 -> 99565 (-0.14%)
VOPD: 244 -> 243 (-0.41%); split: +0.41%, -0.82%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34162>
2025-05-08 13:30:50 +00:00
Rhys Perry
6dbf44ad9c ac/nir: allow less than one register of overfetch
This is to allow vectorization of 8/16-bit loads, which can later be
cheaply lowered to a 32-bit load.

fossil-db (gfx1201):
Totals from 178 (0.22% of 79377) affected shaders:
MaxWaves: 4138 -> 4102 (-0.87%)
Instrs: 619714 -> 617917 (-0.29%); split: -0.32%, +0.03%
CodeSize: 3364396 -> 3352724 (-0.35%); split: -0.38%, +0.03%
VGPRs: 12896 -> 12980 (+0.65%); split: -0.19%, +0.84%
SpillSGPRs: 546 -> 545 (-0.18%)
Latency: 7589585 -> 7406076 (-2.42%); split: -2.45%, +0.04%
InvThroughput: 1926356 -> 1879866 (-2.41%); split: -2.42%, +0.00%
VClause: 12301 -> 11750 (-4.48%)
SClause: 13614 -> 13583 (-0.23%); split: -0.45%, +0.22%
Copies: 82207 -> 82265 (+0.07%); split: -0.10%, +0.17%
Branches: 19284 -> 19266 (-0.09%)
PreSGPRs: 9525 -> 9457 (-0.71%)
PreVGPRs: 12366 -> 12421 (+0.44%)
VALU: 347928 -> 348020 (+0.03%); split: -0.01%, +0.04%
SALU: 82620 -> 82519 (-0.12%); split: -0.19%, +0.07%
VMEM: 22248 -> 21430 (-3.68%)
SMEM: 17951 -> 17843 (-0.60%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34162>
2025-05-08 13:30:50 +00:00
Rhys Perry
ddef4bddf8 ac/nir: round components when lowering 8/16-bit loads to 32-bit
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34162>
2025-05-08 13:30:50 +00:00
Marek Olšák
dfc3c1135c ac/nir/tess: don't pass nir_intrinsic_instr to hs_output_lds_offset
It will be used without intrinsics.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34863>
2025-05-08 02:54:13 +00:00
Marek Olšák
4bbe497d9b ac/nir/tess: don't pass nir_intrinsic_instr to VMEM IO calc helpers
These will be used without intrinsics.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34863>
2025-05-08 02:54:13 +00:00
Marek Olšák
360494f50d ac/nir/tess: remove unused variables
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34863>
2025-05-08 02:54:12 +00:00
Marek Olšák
f58c0cbb6a nir: split *_accessed_indirectly* bitmasks into *_read/written_indirectly*
for AMD

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34863>
2025-05-08 02:54:12 +00:00
Pierre-Eric Pelloux-Prayer
992a340eab ac/nir: init blake3 for cs blit shader
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34574>
2025-04-23 07:59:10 +00:00
Marek Olšák
d2e016c37d ac/nir: don't store tess levels for TES in TCS if no_varying is set
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34544>
2025-04-19 22:55:00 -04:00
Marek Olšák
be8977811b ac/nir: remove shader_info parameter from ac_nir_compute_tess_wg_info
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34544>
2025-04-19 22:55:00 -04:00
Marek Olšák
5fb2de9454 ac/nir: don't include TCS offchip size in LDS_SIZE
This drastically reduces LDS usage for TCS.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34544>
2025-04-19 22:55:00 -04:00
Marek Olšák
2c122d478b ac/nir: set X=0 for task->mesh shader dispatch when Y or Z is 0
The code set X=0 when Y and Z is 0, not "or".

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34432>
2025-04-16 06:08:48 +00:00
Marek Olšák
27d5be13c6 ac/nir/cull: always do frustum culling, skip only small prim culling
Only small prim culling uses the viewport state, so only that must be
disabled when there are multiple viewports.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34016>
2025-04-07 19:44:22 +00:00
Marek Olšák
0f97dc707d ac/nir/cull: rename skip_viewport_culling -> skip_viewport_state_culling
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34016>
2025-04-07 19:44:22 +00:00
Marek Olšák
1d5c42528b nir/opt_algebraic: lower 16-bit imul_high & umul_high
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34016>
2025-04-07 19:44:22 +00:00
Marek Olšák
ce716d009f ac/nir/cull: cull small prims using a point-triangle intersection test
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
This is based on Timur Kristof's code, but there are a lot of differences.
The idea is that it doesn't just compute an intersection between a point
and a triangle. It computes the *distance* between a point and a triangle
and it does so in screen space. It accurately takes the subpixel precision
of the rasterizer into account, so that it works optimally at all
resolutions, all MSAA modes, and all quant modes.

The distance computation is only approximated because it only considers
the infinite lines going through triangle edges. However, it seems to be
more than sufficient in practice because the existing rounding-based small
prim culling compensates for it.

The performance improvement is up to 10% in some geometry-bound tests,
though targeted microbenchmarks can show a lot more than that.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33361>
2025-04-01 16:12:22 +00:00
Pierre-Eric Pelloux-Prayer
785df1b980 ac/nir: fix nir_metadata value of ac_nir_lower_image_opcodes
This pass can insert new blocks so 'nir_metadata_control_flow' is not
preserved.

Fixes: eaf98b1422 ("ac/nir: implement image opcode emulation for CDNA, enable it in radeonsi")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34241>
2025-03-31 15:19:29 +02:00
Timur Kristóf
64c6930bfc ac/nir/ngg: Remove cleanup_culling_shader_after_dce.
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Not needed anymore, now that the new concept is there.

No Fossil DB changes on Navi 21.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22073>
2025-03-29 00:47:20 +00:00
Timur Kristóf
243a80be44 ac/nir/ngg: Use deferred info for compacted arguments.
This means we don't have to emit dead code anymore and can only
repack the sysvals that are actually used by the deferred part.

No Fossil DB changes on Navi 21.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22073>
2025-03-29 00:47:20 +00:00
Timur Kristóf
0b71293358 ac/nir/ngg: Gather info about what the deferred shader part uses.
Now that the deferred shader part is prepared before emitting
the non-deferred part, we can also gather info about what sysvals
it needs.

No Fossil DB changes on Navi 21.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22073>
2025-03-29 00:47:20 +00:00
Timur Kristóf
e4c91c01e3 ac/nir/ngg: Prepare deferred shader part before adding culling code.
The previous concept was to emit the non-deferred shader part
first, including the culling code, and then modify the
non-deferred part accordingly.

This caused some issues because it was really impossible to tell
which sysvals the deferred part needs after DCE, so we had to
run an additional cleanup pass afterwards.

The new concept is to prepare the deferred part first by applying
reusable variables (from the non-deferred part) and run DCE.
This opens the possibility to accurately gather info about what
the deferred part needs.

This idea is further expanded in the next commits.

Fossil DB stats on Navi 21:

Totals from 17 (0.02% of 79377) affected shaders:
Instrs: 18063 -> 18064 (+0.01%)
CodeSize: 93368 -> 93372 (+0.00%)
Latency: 49889 -> 49899 (+0.02%); split: -0.01%, +0.03%
SALU: 2416 -> 2417 (+0.04%)

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22073>
2025-03-29 00:47:20 +00:00
Timur Kristóf
e9e58fa412 ac/nir/ngg: Remove inputs_needed_by_*
This information will be collected by NIR core better,
no need to do it here. It is also currently unused.

No functional changes.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22073>
2025-03-29 00:47:20 +00:00
Timur Kristóf
1e7d28a82e ac/nir/ngg: Improve reuse of position value.
Instead of hand-rolled code, use nir_scalar and its
helper functions to reuse the position value.
Results in more copies, which are mitigated by
copy prop from the previous commit.

This helps eliminate some instructions, especially VMEM loads
from the deferred shader part of NGG culling shaders, which
can be reused from the position values calculated by the
non-deferred part.

Fossil DB stats on Navi 21:

Totals from 2472 (3.11% of 79377) affected shaders:
MaxWaves: 78748 -> 78772 (+0.03%)
Instrs: 636342 -> 633739 (-0.41%); split: -0.45%, +0.04%
CodeSize: 3444740 -> 3427172 (-0.51%); split: -0.53%, +0.02%
VGPRs: 62552 -> 62176 (-0.60%)
Latency: 2025711 -> 2019449 (-0.31%); split: -0.73%, +0.42%
InvThroughput: 221140 -> 221946 (+0.36%); split: -0.12%, +0.49%
VClause: 5443 -> 5278 (-3.03%); split: -3.20%, +0.17%
SClause: 8369 -> 8302 (-0.80%); split: -0.82%, +0.02%
Copies: 102435 -> 101652 (-0.76%); split: -0.87%, +0.11%
PreSGPRs: 63714 -> 63533 (-0.28%)
PreVGPRs: 48555 -> 48392 (-0.34%)
VALU: 242165 -> 241457 (-0.29%); split: -0.33%, +0.04%
SALU: 197656 -> 197482 (-0.09%); split: -0.10%, +0.01%
VMEM: 7746 -> 7571 (-2.26%)
SMEM: 10822 -> 10730 (-0.85%)

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22073>
2025-03-29 00:47:20 +00:00
Timur Kristóf
f7a160d501 ac/nir/ngg: Run copy propagation.
Helps eliminate needless copies caused by reusing variables.
Mitigates negative stats from the next commit.

Fossil DB stats on Navi 21:

Totals from 109 (0.14% of 79377) affected shaders:
Instrs: 124480 -> 124486 (+0.00%); split: -0.00%, +0.01%
CodeSize: 651444 -> 651468 (+0.00%); split: -0.00%, +0.00%
Latency: 754120 -> 754116 (-0.00%); split: -0.00%, +0.00%
InvThroughput: 174384 -> 174383 (-0.00%)

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22073>
2025-03-29 00:47:20 +00:00
Georg Lehmann
8648d7cf75 ac/nir: set has_mul24_relaxed
This is only used by OpenCL.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33871>
2025-03-27 06:24:16 +00:00
Georg Lehmann
09ff1c28d8 ac/nir/lower_ps_late: consider dcc decompression for null exports
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33835>
2025-03-07 15:00:37 +00:00
Marek Olšák
d2141e6751 ac/nir/ngg: add an option to skip viewport-based culling
We can do W and face culling when we have multiple viewports, but not
frustum and small prim culling because those are dependent on the viewport.
When a shader writes the viewport index, the new option allows skipping
viewport-based culling while keeping W and face culling.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33482>
2025-03-06 21:10:48 +00:00
Marek Olšák
d429e35169 ac/nir/cull: extract a helper calling accept_func
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33482>
2025-03-06 21:10:48 +00:00
Marek Olšák
177c9b173e Revert "ac/nir: clamp vertex color outputs in the right place"
This reverts commit b3fc49686e.

It was a rebase failure.

Fixes: b3fc49686e

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33482>
2025-03-06 21:10:47 +00:00
Marek Olšák
e99efe7164 ac,radeonsi: don't set num_slots/src/dest_type/write_mask when they're set automatically
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33482>
2025-03-06 21:10:47 +00:00
Georg Lehmann
975be7ac5d ac/nir/mem_access_bit_sizes: split unaligned vec3 lds access to allow more read2/write2
Foz-DB Navi21:
Totals from 77 (0.10% of 79377) affected shaders:
Instrs: 69787 -> 68745 (-1.49%); split: -1.51%, +0.02%
CodeSize: 367256 -> 360060 (-1.96%); split: -1.97%, +0.01%
VGPRs: 3896 -> 3880 (-0.41%)
Latency: 335403 -> 335297 (-0.03%); split: -0.11%, +0.08%
InvThroughput: 102766 -> 102931 (+0.16%); split: -0.09%, +0.25%
VClause: 1645 -> 1643 (-0.12%); split: -0.18%, +0.06%
SClause: 1434 -> 1433 (-0.07%)
Copies: 4280 -> 4283 (+0.07%); split: -0.56%, +0.63%
PreVGPRs: 2408 -> 2421 (+0.54%); split: -0.08%, +0.62%
VALU: 45557 -> 45646 (+0.20%); split: -0.10%, +0.29%
SALU: 6458 -> 6474 (+0.25%)

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33448>
2025-03-01 18:26:54 +00:00
Alyssa Rosenzweig
9a58a8257e treewide: Switch to nir_progress
Via the Coccinelle patch at the end of the commit message, followed by

sed -ie 's/progress = progress | /progress |=/g' $(git grep -l 'progress = prog')
ninja -C ~/mesa/build clang-format
cd ~/mesa/src/compiler/nir && clang-format -i *.c
agxfmt

    @@
    identifier prog;
    expression impl, metadata;
    @@

    -if (prog) {
    -nir_metadata_preserve(impl, metadata);
    -} else {
    -nir_metadata_preserve(impl, nir_metadata_all);
    -}
    -return prog;
    +return nir_progress(prog, impl, metadata);

    @@
    expression prog_expr, impl, metadata;
    @@

    -if (prog_expr) {
    -nir_metadata_preserve(impl, metadata);
    -return true;
    -} else {
    -nir_metadata_preserve(impl, nir_metadata_all);
    -return false;
    -}
    +bool progress = prog_expr;
    +return nir_progress(progress, impl, metadata);

    @@
    identifier prog;
    expression impl, metadata;
    @@

    -nir_metadata_preserve(impl, prog ? (metadata) : nir_metadata_all);
    -return prog;
    +return nir_progress(prog, impl, metadata);

    @@
    identifier prog;
    expression impl, metadata;
    @@

    -nir_metadata_preserve(impl, prog ? (metadata) : nir_metadata_all);
    +nir_progress(prog, impl, metadata);

    @@
    expression impl, metadata;
    @@

    -nir_metadata_preserve(impl, metadata);
    -return true;
    +return nir_progress(true, impl, metadata);

    @@
    expression impl;
    @@

    -nir_metadata_preserve(impl, nir_metadata_all);
    -return false;
    +return nir_no_progress(impl);

    @@
    identifier other_prog, prog;
    expression impl, metadata;
    @@

    -if (prog) {
    -nir_metadata_preserve(impl, metadata);
    -} else {
    -nir_metadata_preserve(impl, nir_metadata_all);
    -}
    -other_prog |= prog;
    +other_prog = other_prog | nir_progress(prog, impl, metadata);

    @@
    identifier prog;
    expression impl, metadata;
    @@

    -if (prog) {
    -nir_metadata_preserve(impl, metadata);
    -} else {
    -nir_metadata_preserve(impl, nir_metadata_all);
    -}
    +nir_progress(prog, impl, metadata);

    @@
    identifier other_prog, prog;
    expression impl, metadata;
    @@

    -if (prog) {
    -nir_metadata_preserve(impl, metadata);
    -other_prog = true;
    -} else {
    -nir_metadata_preserve(impl, nir_metadata_all);
    -}
    +other_prog = other_prog | nir_progress(prog, impl, metadata);

    @@
    expression prog_expr, impl, metadata;
    identifier prog;
    @@

    -if (prog_expr) {
    -nir_metadata_preserve(impl, metadata);
    -prog = true;
    -} else {
    -nir_metadata_preserve(impl, nir_metadata_all);
    -}
    +bool impl_progress = prog_expr;
    +prog = prog | nir_progress(impl_progress, impl, metadata);

    @@
    identifier other_prog, prog;
    expression impl, metadata;
    @@

    -if (prog) {
    -other_prog = true;
    -nir_metadata_preserve(impl, metadata);
    -} else {
    -nir_metadata_preserve(impl, nir_metadata_all);
    -}
    +other_prog = other_prog | nir_progress(prog, impl, metadata);

    @@
    expression prog_expr, impl, metadata;
    identifier prog;
    @@

    -if (prog_expr) {
    -prog = true;
    -nir_metadata_preserve(impl, metadata);
    -} else {
    -nir_metadata_preserve(impl, nir_metadata_all);
    -}
    +bool impl_progress = prog_expr;
    +prog = prog | nir_progress(impl_progress, impl, metadata);

    @@
    expression prog_expr, impl, metadata;
    @@

    -if (prog_expr) {
    -nir_metadata_preserve(impl, metadata);
    -} else {
    -nir_metadata_preserve(impl, nir_metadata_all);
    -}
    +bool impl_progress = prog_expr;
    +nir_progress(impl_progress, impl, metadata);

    @@
    identifier prog;
    expression impl, metadata;
    @@

    -nir_metadata_preserve(impl, metadata);
    -prog = true;
    +prog = nir_progress(true, impl, metadata);

    @@
    identifier prog;
    expression impl, metadata;
    @@

    -if (prog) {
    -nir_metadata_preserve(impl, metadata);
    -}
    -return prog;
    +return nir_progress(prog, impl, metadata);

    @@
    identifier prog;
    expression impl, metadata;
    @@

    -if (prog) {
    -nir_metadata_preserve(impl, metadata);
    -}
    +nir_progress(prog, impl, metadata);

    @@
    expression impl;
    @@

    -nir_metadata_preserve(impl, nir_metadata_all);
    +nir_no_progress(impl);

    @@
    expression impl, metadata;
    @@

    -nir_metadata_preserve(impl, metadata);
    +nir_progress(true, impl, metadata);

squashme! sed -ie 's/progress = progress | /progress |=/g' $(git grep -l 'progress = prog')

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33722>
2025-02-26 15:19:53 +00:00
Rhys Perry
2a3dce1b59 ac/nir: fix tess factor optimization when workgroup barriers are reduced
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: b49eab68a8 ("ac/nir: use s_sendmsg(HS_TESSFACTOR) to optimize writing tess factors for gfx11")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12632
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33645>
2025-02-24 14:07:40 +00:00
Timur Kristóf
b8797180e9 ac/nir/ngg: Add bool return value to ac_nir_lower_ngg_mesh.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33609>
2025-02-22 08:54:17 +01:00
Timur Kristóf
cd01e17e81 ac/nir/ngg: Add bool return value to ac_nir_lower_ngg_gs.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33609>
2025-02-22 08:54:17 +01:00
Timur Kristóf
25adf353cc ac/nir/ngg: Add bool return value to ac_nir_lower_ngg_nogs.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33609>
2025-02-22 08:54:17 +01:00
Timur Kristóf
fad58a99e8 ac/nir: Add bool return value to ac_nir_lower_legacy_gs.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33609>
2025-02-22 08:54:17 +01:00
Timur Kristóf
d8ad068968 ac/nir: Add bool return value to ac_nir_lower_legacy_vs.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33609>
2025-02-22 08:54:17 +01:00