Commit graph

1344 commits

Author SHA1 Message Date
Daniel Schürmann
62a92417ef aco: move instruction selection files to /compiler/instruction selection/ subfolder
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34977>
2025-05-16 11:01:19 +00:00
Daniel Schürmann
7adad4fc0e aco/isel: assert that terminate intrinsics don't appear in loops
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33479>
2025-05-09 17:20:29 +00:00
Daniel Schürmann
46f6c73d36 aco/isel: remove check for empty exec mask on uniform continues
This could only happen after terminate_if inside loops.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33479>
2025-05-09 17:20:29 +00:00
Daniel Schürmann
2b0536e921 aco: remove block_kind_continue_or_break workaround and tests
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33479>
2025-05-09 17:20:29 +00:00
Georg Lehmann
f5a5905e37 aco: support nir_op_bfdot2_bfadd
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34768>
2025-05-09 11:20:26 +00:00
Georg Lehmann
5ca98bf99e aco: support bf16 wmma
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34768>
2025-05-09 11:20:25 +00:00
Georg Lehmann
e8f5c335ff radv,aco,nir: keep the A and B base type for cmat_muladd_amd
With bfloat16, and the two fp8 formats in the future, using just the bit size
to identify the types is no longer possible.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34768>
2025-05-09 11:20:25 +00:00
Samuel Pitoiset
dae45adc9d aco: adjust an assertion in select_trap_handler_shader()
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34840>
2025-05-09 09:04:57 +00:00
Samuel Pitoiset
ae6d3df139 radv,aco: dump more SQ_WAVE registers from the trap handler on GFX12
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34840>
2025-05-09 09:04:57 +00:00
Samuel Pitoiset
50a01a6559 radv: fix save/restore SCC in the trap handler on GFX12
SQ_WAVE_STATE_PRIV now contains SCC instead of SQ_WAVE_STATUS.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34840>
2025-05-09 09:04:57 +00:00
Rhys Perry
4fa1c92862 aco/gfx12: allow 8/16-bit smem loads
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34162>
2025-05-08 13:30:50 +00:00
Rhys Perry
13b0131edc aco/gfx12: select dwordx3 smem loads
fossil-db (gfx1201):
Totals from 47814 (60.24% of 79377) affected shaders:
MaxWaves: 1436871 -> 1436855 (-0.00%); split: +0.00%, -0.00%
Instrs: 36653621 -> 36649461 (-0.01%); split: -0.07%, +0.06%
CodeSize: 194102884 -> 194076060 (-0.01%); split: -0.06%, +0.04%
VGPRs: 2267944 -> 2269648 (+0.08%); split: -0.01%, +0.08%
SpillSGPRs: 8301 -> 8295 (-0.07%); split: -0.08%, +0.01%
Latency: 249627561 -> 249631829 (+0.00%); split: -0.04%, +0.04%
InvThroughput: 40004042 -> 40003575 (-0.00%); split: -0.02%, +0.02%
VClause: 680488 -> 680429 (-0.01%); split: -0.08%, +0.07%
SClause: 1062835 -> 1066206 (+0.32%); split: -0.20%, +0.52%
Copies: 2393981 -> 2393607 (-0.02%); split: -0.23%, +0.22%
Branches: 728117 -> 728113 (-0.00%); split: -0.00%, +0.00%
VALU: 20358269 -> 20358585 (+0.00%); split: -0.01%, +0.01%
SALU: 4737317 -> 4736411 (-0.02%); split: -0.07%, +0.06%
SMEM: 1712349 -> 1710075 (-0.13%); split: -0.13%, +0.00%
VOPD: 5808 -> 5813 (+0.09%); split: +0.12%, -0.03%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34162>
2025-05-08 13:30:50 +00:00
Rhys Perry
90a5c93ea5 aco: prepare for dwordx3 smem loads
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34162>
2025-05-08 13:30:50 +00:00
Rhys Perry
208d62430f aco/gfx12: use s_load_dwordx3 to load ray launch sizes
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34162>
2025-05-08 13:30:49 +00:00
Rhys Perry
cbd718506b aco: add smem opcode helper
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34162>
2025-05-08 13:30:49 +00:00
Rhys Perry
752f5f317e aco: replace max_const_offset_plus_one with max_const_offset
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34766>
2025-05-01 09:19:02 +00:00
Rhys Perry
a85ebe16b3 aco: fix max_const_offset_plus_one overflow
smem_offset_max is UINT32_MAX on GFX7 and setting
max_const_offset_plus_one to 0 causes divisions by zero later.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: c26851b80b ("aco: increase max_const_offset_plus_one for SMEM load_global")
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Tested-by: Michel Dänzer <mdaenzer@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34766>
2025-05-01 09:19:02 +00:00
Rhys Perry
6338ed44c5 aco/gfx12: increase maximum vbuffer offset
fossil-db (gfx1201):
Totals from 301 (0.38% of 79377) affected shaders:
Instrs: 2734478 -> 2728816 (-0.21%); split: -0.21%, +0.00%
CodeSize: 14347476 -> 14306568 (-0.29%)
Latency: 15508055 -> 15502202 (-0.04%); split: -0.04%, +0.00%
InvThroughput: 2846419 -> 2842387 (-0.14%); split: -0.14%, +0.00%
VClause: 68286 -> 68101 (-0.27%); split: -0.30%, +0.03%
SClause: 49487 -> 49500 (+0.03%)
Copies: 207179 -> 206093 (-0.52%); split: -0.57%, +0.04%
Branches: 72941 -> 72942 (+0.00%); split: -0.00%, +0.00%
VALU: 1549156 -> 1544727 (-0.29%); split: -0.29%, +0.00%
SALU: 339620 -> 338989 (-0.19%); split: -0.19%, +0.00%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34730>
2025-04-29 17:44:41 +00:00
Rhys Perry
c26851b80b aco: increase max_const_offset_plus_one for SMEM load_global
fossil-db (gfx1201):
Totals from 1115 (1.40% of 79377) affected shaders:
Instrs: 1473805 -> 1467571 (-0.42%); split: -0.43%, +0.01%
CodeSize: 7852972 -> 7819656 (-0.42%); split: -0.44%, +0.02%
SpillSGPRs: 1632 -> 1460 (-10.54%); split: -11.27%, +0.74%
Latency: 11975762 -> 11971915 (-0.03%); split: -0.05%, +0.02%
InvThroughput: 2496961 -> 2496448 (-0.02%); split: -0.03%, +0.01%
VClause: 25213 -> 25218 (+0.02%); split: -0.00%, +0.02%
SClause: 28822 -> 28565 (-0.89%); split: -1.41%, +0.52%
Copies: 106377 -> 105715 (-0.62%); split: -1.23%, +0.61%
Branches: 27497 -> 27473 (-0.09%)
PreSGPRs: 52071 -> 51310 (-1.46%)
VALU: 871051 -> 870694 (-0.04%); split: -0.04%, +0.00%
SALU: 186090 -> 181811 (-2.30%); split: -2.32%, +0.02%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34730>
2025-04-29 17:44:41 +00:00
Rhys Perry
f390893a64 aco/gfx12: use s_sub_u64
fossil-db (gfx1201):
Totals from 2 (0.00% of 79377) affected shaders:
Instrs: 243999 -> 243993 (-0.00%)
CodeSize: 1288176 -> 1288152 (-0.00%)
Latency: 1894093 -> 1894091 (-0.00%)
InvThroughput: 378819 -> 378818 (-0.00%)
SALU: 33048 -> 33044 (-0.01%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34730>
2025-04-29 17:44:41 +00:00
Rhys Perry
5b4813c4f0 aco/gfx12: use s_add_u64
fossil-db (gfx1201):
Totals from 122 (0.15% of 79377) affected shaders:
Instrs: 3640138 -> 3637577 (-0.07%); split: -0.07%, +0.00%
CodeSize: 19133796 -> 19120080 (-0.07%); split: -0.08%, +0.01%
SpillSGPRs: 666 -> 650 (-2.40%)
SpillVGPRs: 2147 -> 2159 (+0.56%)
Scratch: 254208 -> 255232 (+0.40%)
Latency: 21529337 -> 21522317 (-0.03%); split: -0.04%, +0.00%
InvThroughput: 4048519 -> 4047233 (-0.03%); split: -0.03%, +0.00%
VClause: 90453 -> 90455 (+0.00%)
SClause: 67846 -> 67674 (-0.25%); split: -0.28%, +0.03%
Copies: 287449 -> 287476 (+0.01%); split: -0.04%, +0.05%
Branches: 104526 -> 104530 (+0.00%); split: -0.00%, +0.01%
PreSGPRs: 9795 -> 9723 (-0.74%); split: -0.78%, +0.04%
VALU: 2004219 -> 2003031 (-0.06%); split: -0.06%, +0.00%
SALU: 492651 -> 491737 (-0.19%); split: -0.19%, +0.00%
VMEM: 161317 -> 161341 (+0.01%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34730>
2025-04-29 17:44:41 +00:00
Georg Lehmann
8f3489f351 aco/isel: create WMMA with constant C matrix if possible
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34396>
2025-04-22 16:08:57 +00:00
Georg Lehmann
6d7e67d986 nir,amd: add neg_lo/hi modifiers to cmat_matmul_amd
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34396>
2025-04-22 16:08:55 +00:00
Georg Lehmann
b0c8f31600 aco: set opsel_hi to 1 for WMMA
This is ignored by the hardware but LLVM requires it to disassemble GFX12 WMMA.

Cc: mesa-stable
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34396>
2025-04-22 16:08:54 +00:00
Konstantin Seurer
978e9b670e aco,nir: Add support for new GFX12 ray tracing instructions
Adds image_bvh_dual_intersect_ray and image_bvh8_intersect_ray which can
handle the new BVH format. Both instructions write up to 10 VGPRs so
they need to use a vec16 definition in nir.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34273>
2025-04-17 20:20:40 +00:00
Natalie Vock
b9e506afd4 aco: Add support for multiple definitions in emit_mimg
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34273>
2025-04-17 20:20:40 +00:00
Natalie Vock
3d8db3cbbb aco: Make private_segment_buffer/scratch_offset per-resume
We need different Temps for each resume shader, because registers aren't
preserved across resume boundaries.

This was likely fine in practice because arg registers are the same for
each shader, but resulted in invalid IR and asserts.

Fixes crashes in Indiana Jones RT with assertions enabled on GFX8.

Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34114>
2025-04-09 14:21:37 +00:00
Georg Lehmann
c70dcd1451 aco/gfx9+: use d16 global/scratch/buffer loads
Full register loads are not nessecary and prevent packing optimizations.
Global/Scratch is GFX9+ so D16 loads are always supported.
We already used LDS D16 loads.

Foz-DB Navi31(mostly RA noise):
Totals from 716 (0.90% of 79789) affected shaders:
Instrs: 3854176 -> 3854238 (+0.00%); split: -0.00%, +0.00%
CodeSize: 20034440 -> 20035220 (+0.00%); split: -0.00%, +0.00%
Latency: 24410951 -> 24411120 (+0.00%)
InvThroughput: 5181276 -> 5181301 (+0.00%)
Copies: 320258 -> 320317 (+0.02%)
VALU: 2207307 -> 2207366 (+0.00%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34346>
2025-04-04 16:20:39 +00:00
Georg Lehmann
7631b10984 aco: implement mul24_relaxed
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33871>
2025-03-27 06:24:16 +00:00
Daniel Schürmann
afc605bc9b aco: Remove empty exec skipping after demote
Totals from 858 (1.08% of 79377) affected shaders: (Navi31)

Instrs: 678713 -> 677694 (-0.15%); split: -0.15%, +0.00%
CodeSize: 3732576 -> 3729104 (-0.09%); split: -0.10%, +0.01%
Latency: 4199397 -> 4198632 (-0.02%); split: -0.06%, +0.04%
InvThroughput: 691391 -> 691122 (-0.04%); split: -0.04%, +0.00%
SClause: 14593 -> 14605 (+0.08%)
Copies: 41279 -> 41288 (+0.02%); split: -0.04%, +0.06%
Branches: 13575 -> 13452 (-0.91%)
PreSGPRs: 29069 -> 29039 (-0.10%)
VALU: 426261 -> 426215 (-0.01%); split: -0.01%, +0.00%
SALU: 60458 -> 60471 (+0.02%); split: -0.02%, +0.04%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33619>
2025-03-26 08:45:12 +00:00
Daniel Schürmann
90faadae72 aco/insert_exec_mask: don't disable dead quads on demote in divergent CF
Also force-enalbe helpers in case of demote in divergent CF.

Totals from 1305 (1.64% of 79377) affected shaders: (Navi31)

Instrs: 926923 -> 922516 (-0.48%); split: -0.48%, +0.00%
CodeSize: 5045292 -> 5027408 (-0.35%); split: -0.36%, +0.00%
Latency: 6176577 -> 6174708 (-0.03%); split: -0.03%, +0.00%
InvThroughput: 931603 -> 931583 (-0.00%); split: -0.00%, +0.00%
SClause: 22816 -> 22855 (+0.17%); split: -0.17%, +0.34%
Copies: 57347 -> 55170 (-3.80%); split: -3.81%, +0.01%
Branches: 18990 -> 18974 (-0.08%)
PreSGPRs: 42734 -> 43248 (+1.20%)
SALU: 90511 -> 86153 (-4.81%); split: -4.85%, +0.04%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33619>
2025-03-26 08:45:12 +00:00
Daniel Schürmann
69dcd5be3a aco: don't assume that demote doesn't cause an empty exec mask
Totals from 188 (0.24% of 79377) affected shaders: (Navi31)
Instrs: 209239 -> 209473 (+0.11%); split: -0.01%, +0.12%
CodeSize: 1101124 -> 1101744 (+0.06%); split: -0.02%, +0.07%
Latency: 1672182 -> 1672748 (+0.03%); split: -0.11%, +0.14%
InvThroughput: 237276 -> 237546 (+0.11%); split: -0.00%, +0.12%
SClause: 5694 -> 5690 (-0.07%); split: -0.28%, +0.21%
Copies: 21685 -> 21682 (-0.01%); split: -0.12%, +0.10%
Branches: 5740 -> 5863 (+2.14%)
PreSGPRs: 7004 -> 7034 (+0.43%)
VALU: 123595 -> 123641 (+0.04%); split: -0.00%, +0.04%
SALU: 28418 -> 28411 (-0.02%); split: -0.09%, +0.06%

Fixes: f35e229fae ('aco: skip code if exec is empty')
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33619>
2025-03-26 08:45:12 +00:00
Georg Lehmann
3b5e537b09 aco/gfx11.5: remove vinterp ddx/ddy path
While the idea to take advantage of the higher throughput wasn't bad,
the hardware wasn't design with this in mind and doesn't behave like expected
with constant sources.

Fixes: bee487df48 ("aco/gfx11.5+: use vinterp for fddx/fddy")
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33969>
2025-03-12 11:31:54 +00:00
Georg Lehmann
20dd6dfa12 aco/isel: use s_mul_i32 instead of s_cselect_b32 for a ? b : 0
It doesn't require SCC and this is more consistent with b2f.

Foz-DB Navi21:
Totals from 2107 (2.64% of 79789) affected shaders:
Instrs: 6619774 -> 6619280 (-0.01%); split: -0.01%, +0.00%
CodeSize: 36754448 -> 36752396 (-0.01%); split: -0.01%, +0.00%
Latency: 62207779 -> 62206422 (-0.00%); split: -0.00%, +0.00%
InvThroughput: 13090494 -> 13090204 (-0.00%); split: -0.00%, +0.00%
VClause: 171572 -> 171573 (+0.00%)
SClause: 257528 -> 257530 (+0.00%)
Copies: 607680 -> 607204 (-0.08%); split: -0.10%, +0.02%
VALU: 4189422 -> 4189418 (-0.00%)
SALU: 1001750 -> 1001264 (-0.05%); split: -0.07%, +0.02%

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33734>
2025-03-04 21:36:17 +00:00
Ivan Avdeev
ff6504d4c0 radv: add experimental support for AMD BC-250 board
AMD BC-250 is a mining board based on an AMD APU with an integrated GPU
that kernel recognizes as Cyan Skillfish.

It is basically RDNA1/GFX10, but with added hardware ray tracing
support. LLVM calls it GFX1013, see
https://llvm.org/docs/AMDGPU/AMDGPUAsmGFX1013.html

Support for this GPU hasn't been extensively tested. Some games are
known to work, some non-trivial ray query compute and ray tracing
pipeline rendering works too. Q2RTX works.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33116>
2025-03-04 08:07:31 +00:00
Daniel Schürmann
52253da783 aco: unify get_addr_sgpr_from_waves() and get_addr_vgpr_from_waves() into one function
which returns the limit as RegisterDemand.

Also remove the unused get_extra_sgprs() from aco_ir.h.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33644>
2025-02-21 13:49:41 +00:00
Rhys Perry
539f9b4ba6 nir,aco,radv: add align_mul/offset to buffer_amd intrinsics
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29242>
2025-02-07 13:52:57 +00:00
Daniel Schürmann
1a8a643bbd aco/isel: track control flow divergence in loops more accurately
We introduce two new variables, cf_context::in_divergent_cf and
cf_context::parent_loop.has_divergent_break, in order to determine
whether there is any other invocations on a different CF path.

Totals from 1305 (1.64% of 79395) affected shaders: (Navi31)

Instrs: 659211 -> 657815 (-0.21%); split: -0.22%, +0.01%
CodeSize: 3483228 -> 3477960 (-0.15%); split: -0.16%, +0.01%
VGPRs: 68820 -> 48048 (-30.18%)
Latency: 14197750 -> 14170767 (-0.19%); split: -0.26%, +0.07%
InvThroughput: 1619103 -> 1619826 (+0.04%); split: -0.02%, +0.07%
VClause: 12384 -> 12350 (-0.27%)
SClause: 26693 -> 26844 (+0.57%); split: -0.01%, +0.57%
Copies: 44994 -> 43535 (-3.24%); split: -3.26%, +0.02%
PreSGPRs: 49007 -> 48907 (-0.20%)
PreVGPRs: 32171 -> 32121 (-0.16%)
VALU: 349984 -> 349857 (-0.04%); split: -0.04%, +0.00%
SALU: 84252 -> 83988 (-0.31%); split: -0.32%, +0.00%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33206>
2025-02-05 10:54:21 +00:00
Daniel Schürmann
583c3586fe aco/isel: remove loop nest information from exec_info
Since we never enter loops with an empty exec mask, and the
control flow is structured, we don't need to consider the
loop nest depth.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33206>
2025-02-05 10:54:21 +00:00
Daniel Schürmann
a77258346c aco/isel: fix assumptions about potential empty exec mask in nested control flow
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33206>
2025-02-05 10:54:21 +00:00
Daniel Schürmann
44216e035f aco/isel: add and use exec_info::empty() helper
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33206>
2025-02-05 10:54:21 +00:00
Daniel Schürmann
8e8398832c aco/isel: use cf_context in loop_context to restore cf information
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33206>
2025-02-05 10:54:21 +00:00
Daniel Schürmann
8b9c9fb904 aco/isel: use cf_context in if_context to restore cf information
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33206>
2025-02-05 10:54:21 +00:00
Daniel Schürmann
c2bfc05d71 aco/isel: rename cf_context::has_divergent_branch
Make it more consistent with cf_context::has_branch.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33206>
2025-02-05 10:54:21 +00:00
Daniel Schürmann
0c5a91b9f2 aco/isel: move cf_info into separate struct cf_context
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33206>
2025-02-05 10:54:21 +00:00
Daniel Schürmann
61fa007e48 aco/isel: fix empty exec tracking for uniform branches
Totals from 5 (0.01% of 79395) affected shaders: (Navi31)

Instrs: 54730 -> 54715 (-0.03%)
CodeSize: 276928 -> 276852 (-0.03%)
Latency: 215212 -> 214874 (-0.16%)
InvThroughput: 40154 -> 40150 (-0.01%)
Copies: 6824 -> 6821 (-0.04%); split: -0.06%, +0.01%
Branches: 1625 -> 1615 (-0.62%)
SALU: 5682 -> 5678 (-0.07%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33206>
2025-02-05 10:54:21 +00:00
Konstantin Seurer
60a20bcf3d nir: Stop using instructions for debug info
Annotating ssa defs without affecting compilation is impossible with
debug info instructions since referencing a nir_def from the debug info
instr will add uses.

The old approach also stops worrking if passes reorder instructions.

This patch proposes a solution which should not regress performance just
like the old approach. The difference is that this one allocates a bit
more space for debug info instead of adding a new instruction for it.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33141>
2025-01-30 20:14:01 +00:00
Marek Olšák
f98613d47c aco: implement replacement of sample_mask_in with helper_invocation in PS prolog
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33024>
2025-01-25 12:20:26 -05:00
Marek Olšák
a842f198d7 aco: simplify how broadcast_last_cbuf is implemented in PS epilog
So PS epilogs only need a single bool flag that determines whether all
enabled color buffers should be written.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33024>
2025-01-25 12:20:26 -05:00
Marek Olšák
5c4f737b84 aco: implement replacing frag_coord with pixel_coord in PS prolog
This adds an option to replace frag_coord.xy with pixel_coord when sample
shading is disabled, which is most of the time. This reduces the number of
input VGPRs.

It's already implement in ac_nir_lower_ps_early for monolithic shaders.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33024>
2025-01-25 12:20:26 -05:00