Samuel Pitoiset
f94bd67b82
aco: fix VS prologs on GFX12
...
MTBUF/MUBUF instructions must use zero for SOFFSET, use const_offset
instead.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32904 >
2025-01-07 13:44:32 +00:00
Marek Olšák
0d5b03f2b9
ac/nir: split local_invocation_ids to 3 separate VGPR inputs
...
so that we can set the upper range per VGPR.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782 >
2025-01-02 17:36:55 +00:00
Marek Olšák
ceb6f8fc32
amd: lower load_tess_rel_patch_id/primitive_id/tess_coord and overwrite.. in NIR
...
The overwrite instruction complicates it a little, which is why these
intrinsics are lowered together.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782 >
2025-01-02 17:36:55 +00:00
Marek Olšák
61bfb4fa06
amd: lower load_subgroup_invocation in NIR
...
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782 >
2025-01-02 17:36:55 +00:00
Marek Olšák
e69f47faee
amd: lower load_local_invocation_index in NIR
...
This is the last intrinsic that needed the LS VGPR bug workaround in ACO
and ac_nir_to_llvm.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782 >
2025-01-02 17:36:55 +00:00
Marek Olšák
342dcbdc8b
amd: lower load_vertex_id/instance_id and overwrite_vs_arguments in NIR
...
2 things complicate this:
- overwrite_vs_arguments_amd
- the LS VGPR bug workaround
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782 >
2025-01-02 17:36:55 +00:00
Marek Olšák
66dd70adc5
amd: lower load_gs_wave_id_amd in NIR
...
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782 >
2025-01-02 17:36:55 +00:00
Marek Olšák
923f59c971
amd: lower load_barycentric_at_offset in NIR
...
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782 >
2025-01-02 17:36:55 +00:00
Marek Olšák
16ab05fad1
amd: lower load_barycentric_pixel/centroid/sample in NIR
...
radeonsi needs to preserve interp_mode in the arg load.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782 >
2025-01-02 17:36:55 +00:00
Marek Olšák
7e83f6ca8b
amd: lower load_front_face in NIR
...
radeonsi must do this after si_lower_nir_abi, which optimizes front_face,
but doesn't lower it.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782 >
2025-01-02 17:36:55 +00:00
Marek Olšák
6ad5225b2a
amd: lower load_frag_shading_rate in NIR
...
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782 >
2025-01-02 17:36:55 +00:00
Marek Olšák
6d2e29ff6e
amd: lower load_sample_pos in NIR
...
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782 >
2025-01-02 17:36:55 +00:00
Marek Olšák
110e474b4f
amd: lower load_sample_id in NIR
...
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782 >
2025-01-02 17:36:55 +00:00
Marek Olšák
684c8da553
amd: lower load_invocation_id in NIR
...
ACO can't look for it because it's lowered there.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782 >
2025-01-02 17:36:55 +00:00
Marek Olšák
d281240c57
amd: lower load_first_vertex/base_instance/draw_id/view_index in NIR
...
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782 >
2025-01-02 17:36:55 +00:00
Marek Olšák
0d372b043b
amd: lower load_local_invocation_id in NIR
...
This is based on ACO.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782 >
2025-01-02 17:36:55 +00:00
Marek Olšák
13cb5c7b72
amd: lower load_frag_coord in NIR
...
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782 >
2025-01-02 17:36:55 +00:00
Marek Olšák
58cb155068
amd: lower load_pixel_coord in NIR
...
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782 >
2025-01-02 17:36:55 +00:00
Georg Lehmann
43fca7fffe
amd: support load_front_face_fsign
...
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32791 >
2024-12-30 22:31:35 +00:00
Georg Lehmann
aee0c7274c
amd: switch to FRONT_FACE_ALL_BITS(0)
...
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32791 >
2024-12-30 22:31:34 +00:00
Georg Lehmann
33a73203b0
aco/isel: skip and(exec) for top level demote_if/terminate_if
...
In nested control flow this is nessecary to not demote/terminate invocations
that are part of the global but not part of the local mask.
At the top level, the masks are the same and no additional invocations
can be accidentally disabled.
Foz-DB Navi21:
Totals from 2095 (2.64% of 79395) affected shaders:
Instrs: 1058326 -> 1056839 (-0.14%)
CodeSize: 5632480 -> 5626616 (-0.10%)
Latency: 12082761 -> 12080520 (-0.02%); split: -0.02%, +0.00%
InvThroughput: 2246677 -> 2246636 (-0.00%); split: -0.00%, +0.00%
Copies: 114446 -> 114433 (-0.01%)
SALU: 230585 -> 229098 (-0.64%)
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32755 >
2024-12-26 18:34:38 +00:00
Marek Olšák
de996ac481
radeonsi: kill Z and stencil PS outputs if depth or stencil is disabled
...
This adds kill_z and kill_stencil flags to the shader PS epilog key, which
removes those outputs if depth or stencil are disabled.
It must be implemented in:
* ACO PS epilog
* LLVM PS epilog
* ac_nir_lower_ps for monolithic shaders
Some of the samplemask code wasn't completely correct, but probably harmless.
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32713 >
2024-12-24 12:02:20 +00:00
Qiang Yu
dff14d102d
aco: fix voffset missing when buffer store base >=4096
...
Regression on test:
dEQP-GLES31.functional.geometry_shading.basic.output_256
voffset is missing if buffer store base >=4096, we need to
re-calculate offen after resolve_excess_vmem_const_offset().
Fixes: cdaf269924 ("aco: inline store_vmem_mubuf/emit_single_mubuf_store")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32767 >
2024-12-24 01:42:45 +00:00
Marek Olšák
85c20def94
ac,radv,radeonsi: enable TCS input reads from VGPRs for all compatible loads
...
Cross-invocation TCS input access doesn't prevent same-invocation access.
This improves shaders that use both for the same inputs.
Also, if some components of a vec4 slot only use same-invocation access and
other components only use cross-invocation access (it's possible after
compaction), this takes the VGPR path for the components with
same-invocation access, which didn't happen previously because all masks
only describe whole vec4s.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31673 >
2024-12-18 11:07:59 +00:00
Samuel Pitoiset
553eb1a3fd
radv: fix alpha-to-coverage with alpha-to-one when MRTZ is also exported
...
On AMD hardware, it's possible to export a separate alpha channel for
applying alpha-to-one after alpha-to-coverage and not before.
On GFX11+, it's already mostly supported but alpha needs to be exported
to MRTZ.a and one to MRT0.a. The hw always uses alpha for
alpha-to-coverage from MRTZ.a.
On older generations, the driver needs the same separate alpha export
but it also needs to configure the hardware with COVERAGE_TO_MASK_ENABLE
which selects alpha from MRTZ.a.
This should fix alpha-to-coverage with alpha-to-one when either
depth, stencil or samplemask are exported but it still needs a slightly
different solution without MRTZ. I will fix that later.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32523 >
2024-12-11 10:50:31 +00:00
Samuel Pitoiset
70047e6bd6
aco: export alpha to MRTZ.a and one to MRT0.a for alpha-to-one on GFX11
...
For FS epilogs.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32523 >
2024-12-11 10:50:31 +00:00
Georg Lehmann
4a977ea24f
aco/gfx11+: use v_and_b32 to extract local id 0
...
Foz-DB Navi31:
Totals from 2561 (3.23% of 79206) affected shaders:
CodeSize: 10399004 -> 10389120 (-0.10%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32532 >
2024-12-10 11:58:21 +00:00
Daniel Schürmann
b64fff7731
aco: remove definition from Pseudo branch instructions
...
They are not needed anymore.
Totals from 7019 (8.84% of 79395) affected shaders: (Navi31)
Instrs: 14805400 -> 14824196 (+0.13%); split: -0.00%, +0.13%
CodeSize: 78079972 -> 78132932 (+0.07%); split: -0.01%, +0.08%
SpillSGPRs: 4485 -> 4515 (+0.67%); split: -0.76%, +1.43%
Latency: 165862000 -> 165836134 (-0.02%); split: -0.02%, +0.00%
InvThroughput: 30061764 -> 30057781 (-0.01%); split: -0.01%, +0.00%
SClause: 392323 -> 392286 (-0.01%); split: -0.01%, +0.00%
Copies: 1012262 -> 1012234 (-0.00%); split: -0.04%, +0.04%
Branches: 365910 -> 365909 (-0.00%); split: -0.00%, +0.00%
PreSGPRs: 360167 -> 355363 (-1.33%)
VALU: 8837197 -> 8837276 (+0.00%); split: -0.00%, +0.00%
SALU: 1402593 -> 1402621 (+0.00%); split: -0.03%, +0.03%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32037 >
2024-12-06 14:34:03 +00:00
Georg Lehmann
7425e71ae0
aco/gfx12: disable vinterp ddx/ddy optimization
...
This only seems to work on gfx11 and gfx11.5, and it's only faster on gfx11.5.
We could continue to use vinterp, with constants copied to vgprs, but
whether that's beneficial depends on the shader.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: bee487df48 ("aco/gfx11.5+: use vinterp for fddx/fddy")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12250
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32495 >
2024-12-06 12:01:39 +00:00
Rhys Perry
63b0692eac
aco: don't use uniform continues if exec might be empty
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31143 >
2024-11-25 10:32:59 +00:00
Rhys Perry
f35e229fae
aco: skip code if exec is empty
...
This is safer and potentially faster.
fossil-db (navi21):
Totals from 690 (0.87% of 79395) affected shaders:
Instrs: 4534778 -> 4535916 (+0.03%)
CodeSize: 25268516 -> 25272080 (+0.01%); split: -0.00%, +0.01%
Latency: 48482721 -> 48513907 (+0.06%); split: -0.00%, +0.07%
InvThroughput: 13213965 -> 13217828 (+0.03%); split: -0.00%, +0.03%
Copies: 432307 -> 432295 (-0.00%); split: -0.05%, +0.04%
Branches: 187305 -> 188249 (+0.50%)
VALU: 2904490 -> 2904508 (+0.00%); split: -0.00%, +0.00%
SALU: 674962 -> 675133 (+0.03%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31143 >
2024-11-25 10:32:59 +00:00
Rhys Perry
f00c3a14c0
aco: require WQM after demote in control flow
...
fossil-db (navi21):
Totals from 424 (0.53% of 79395) affected shaders:
Instrs: 404496 -> 404752 (+0.06%); split: -0.07%, +0.13%
CodeSize: 2150608 -> 2151616 (+0.05%); split: -0.05%, +0.09%
Latency: 9124298 -> 9115957 (-0.09%); split: -0.12%, +0.03%
InvThroughput: 1883570 -> 1883468 (-0.01%); split: -0.01%, +0.00%
VClause: 6832 -> 6830 (-0.03%)
SClause: 13801 -> 13778 (-0.17%); split: -0.17%, +0.01%
Copies: 26758 -> 26673 (-0.32%); split: -0.44%, +0.12%
Branches: 9819 -> 9567 (-2.57%)
PreSGPRs: 17902 -> 17934 (+0.18%)
SALU: 45407 -> 45906 (+1.10%); split: -0.01%, +1.11%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31143 >
2024-11-25 10:32:59 +00:00
Rhys Perry
8a175b02bc
aco: use repair pass for LCSSA workaround
...
This makes instruction selection simpler and fixes potential issues with
allocated_vec or the optimizer moving SGPR uses out of the loop.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31143 >
2024-11-25 10:32:59 +00:00
Georg Lehmann
f3926c9d4e
aco/isel: use undef Operands for p_create_vector created from nir vecs
...
Foz-DB Navi31:
Totals from 27464 (34.59% of 79395) affected shaders:
Instrs: 9595601 -> 9535260 (-0.63%); split: -0.63%, +0.00%
CodeSize: 47900112 -> 47658648 (-0.50%); split: -0.50%, +0.00%
Latency: 43928471 -> 43918448 (-0.02%); split: -0.05%, +0.02%
InvThroughput: 4940105 -> 4903447 (-0.74%); split: -0.75%, +0.01%
Copies: 667294 -> 604603 (-9.39%); split: -9.39%, +0.00%
VALU: 5282264 -> 5219604 (-1.19%); split: -1.19%, +0.00%
VOPD: 342 -> 311 (-9.06%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32249 >
2024-11-21 14:09:52 +00:00
Samuel Pitoiset
2abbd361e2
radv,aco: dump LDS from the trap handler
...
Can be useful for debugging.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32182 >
2024-11-20 07:35:47 +00:00
Georg Lehmann
3e037ac2a9
aco/gfx8: use ds_swizzle_b32 rotate mode
...
Despite only being mentioned in the ISA docs since vega, rotate (and fft)
swizzle mode seem to exist since gfx8.
https://github.com/llvm/llvm-project/issues/28975#issuecomment-980964939
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31348 >
2024-11-14 15:34:48 +00:00
Samuel Pitoiset
b4b5f9eeb0
radv,aco: dump VGPRS from the trap handler shader
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32090 >
2024-11-13 15:27:54 +00:00
Samuel Pitoiset
c712555a9f
aco: save/restore VGPRS on GFX8 in the trap handler shader
...
This will be needed for dumping VGPRs.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32090 >
2024-11-13 15:27:54 +00:00
Samuel Pitoiset
a77af57e83
aco: use all invocations from the current wave in the trap handler
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32090 >
2024-11-13 15:27:54 +00:00
Samuel Pitoiset
034014a165
aco: restore m0/exec before exiting the trap handler
...
Dumping VGPRs will overwrite m0 and exec and they need to be restored
if we want to return to the shader.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32090 >
2024-11-13 15:27:53 +00:00
Samuel Pitoiset
f4cf6a71ed
aco: use a 64-bit mov to save exec in the trap handler shader
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32090 >
2024-11-13 15:27:53 +00:00
Rhys Perry
17cc8a5a54
aco: remove load byte_align
...
8/16-bit loads given to instruction selection now always use VMEM and
scalar load instructions unless alignment easily allows a vector load.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31904 >
2024-11-13 12:59:26 +00:00
Rhys Perry
d3ae1842a2
aco,ac/nir: flag loads to use smem in NIR
...
This pass will be re-used later.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31904 >
2024-11-13 12:59:26 +00:00
Rhys Perry
0619e4db63
nir,aco,ac/llvm: add nir_op_alignbyte_amd
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31904 >
2024-11-13 12:59:26 +00:00
Rhys Perry
db0cbb7e9b
aco: optimize nir_op_shfr with <32 src1
...
No fossil-db changes.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31904 >
2024-11-13 12:59:26 +00:00
Samuel Pitoiset
0c77469995
aco: fix saving/restoring VGPRS in the trap handler on GFX9
...
When ADD_TID_ENABLE=1, DATA_FORMAT is STRIDE[14:17], so the stride
was too large.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32109 >
2024-11-13 11:12:54 +00:00
Georg Lehmann
7e8a08ae77
aco: use nir_def_all_uses_ignore_sign_bit
...
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31844 >
2024-11-12 18:03:57 +00:00
Samuel Pitoiset
2d5df46c25
aco: emit nir_intrinsic_debug_break
...
s_trap is used to enter the trap.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32061 >
2024-11-12 16:05:17 +00:00
Samuel Pitoiset
5f79b8ea2d
radv,aco: save/restore overwritten VGPRs in the trap handler shader
...
The trap currently doesn't return to the shader but it will be needed
for example for the debug mode.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32056 >
2024-11-12 11:16:13 +00:00
Samuel Pitoiset
6ec0c85908
radv,aco: use the trap handler layout struct while compiling the shader
...
It's less error prone to rely on the layout for offsets.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32056 >
2024-11-12 11:16:13 +00:00