Commit graph

3422 commits

Author SHA1 Message Date
Samuel Pitoiset
034014a165 aco: restore m0/exec before exiting the trap handler
Dumping VGPRs will overwrite m0 and exec and they need to be restored
if we want to return to the shader.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32090>
2024-11-13 15:27:53 +00:00
Samuel Pitoiset
185a165a85 aco: fix validation for v_movrels_b32 and friends
m0 is the second operand.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32090>
2024-11-13 15:27:53 +00:00
Samuel Pitoiset
40b343bbee aco: add a new variant for vop1() with two operands
For v_movrels_b32 and friends which need a second operand for m0.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32090>
2024-11-13 15:27:53 +00:00
Samuel Pitoiset
f4cf6a71ed aco: use a 64-bit mov to save exec in the trap handler shader
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32090>
2024-11-13 15:27:53 +00:00
Rhys Perry
17cc8a5a54 aco: remove load byte_align
8/16-bit loads given to instruction selection now always use VMEM and
scalar load instructions unless alignment easily allows a vector load.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31904>
2024-11-13 12:59:26 +00:00
Rhys Perry
d3ae1842a2 aco,ac/nir: flag loads to use smem in NIR
This pass will be re-used later.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31904>
2024-11-13 12:59:26 +00:00
Rhys Perry
0619e4db63 nir,aco,ac/llvm: add nir_op_alignbyte_amd
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31904>
2024-11-13 12:59:26 +00:00
Rhys Perry
db0cbb7e9b aco: optimize nir_op_shfr with <32 src1
No fossil-db changes.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31904>
2024-11-13 12:59:26 +00:00
Samuel Pitoiset
0c77469995 aco: fix saving/restoring VGPRS in the trap handler on GFX9
When ADD_TID_ENABLE=1, DATA_FORMAT is STRIDE[14:17], so the stride
was too large.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32109>
2024-11-13 11:12:54 +00:00
Georg Lehmann
7e8a08ae77 aco: use nir_def_all_uses_ignore_sign_bit
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31844>
2024-11-12 18:03:57 +00:00
Samuel Pitoiset
2d5df46c25 aco: emit nir_intrinsic_debug_break
s_trap is used to enter the trap.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32061>
2024-11-12 16:05:17 +00:00
Samuel Pitoiset
5f79b8ea2d radv,aco: save/restore overwritten VGPRs in the trap handler shader
The trap currently doesn't return to the shader but it will be needed
for example for the debug mode.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32056>
2024-11-12 11:16:13 +00:00
Samuel Pitoiset
6ec0c85908 radv,aco: use the trap handler layout struct while compiling the shader
It's less error prone to rely on the layout for offsets.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32056>
2024-11-12 11:16:13 +00:00
Samuel Pitoiset
6bfd92123f aco: simplify postprocessing the trap handler shader
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32056>
2024-11-12 11:16:13 +00:00
Samuel Pitoiset
44dfeb4479 radv,aco: add a separate function to compile the trap handler shader
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32056>
2024-11-12 11:16:13 +00:00
Samuel Pitoiset
62e335c779 radv,aco: dump more SQ_WAVE regs from the trap handler
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32056>
2024-11-12 11:16:13 +00:00
Samuel Pitoiset
107f29c39a aco: do not reorder s_trap instructions
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32055>
2024-11-11 15:46:36 +00:00
Konstantin Seurer
69ebba82d4 aco: Pass debug information to the driver
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29298>
2024-11-11 08:39:13 +00:00
Konstantin Seurer
f8ef1afec8 aco: Handle nir_debug_info_instr
Propagated debug info using p_debug_info and Program::debug_info.
Offsets into the shader binary are gathered during assembly.
This will be usefull for mapping back the disassembled shader to
nir, glsl or spirv.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29298>
2024-11-11 08:39:13 +00:00
Samuel Pitoiset
437bd63265 radv,aco: dump m0 and exec from the trap handler
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32026>
2024-11-08 14:00:15 +00:00
Samuel Pitoiset
d1d41be43f aco: declare phys regs for tba_hi/tma_hi
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32026>
2024-11-08 14:00:15 +00:00
Samuel Pitoiset
13bab450a2 aco: fix storing SQ_WAVE_STATUS in the trap handler shader
SQ_WAVE_STATUS can change inside the trap because of SCC.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32026>
2024-11-08 14:00:14 +00:00
Samuel Pitoiset
494050d2ea aco: add a helper to dump SGPR to memory for the trap handler
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32026>
2024-11-08 14:00:14 +00:00
Samuel Pitoiset
8c6f2fef1b aco: use scalar buffer stores for dumping SGPRS from the trap on GFX8
This avoids using any VGPRs on GFX8.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32026>
2024-11-08 14:00:14 +00:00
Samuel Pitoiset
17f6b4e51e aco: save/restore SCC in the trap handler shader
SCC is only updated on GFX9+ but let's do it by default because the
trap handler shader is likely going to be more and more complex over
time.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32026>
2024-11-08 14:00:14 +00:00
Samuel Pitoiset
7b4386facd aco: cleanup using fixed registers in the trap handler shader
It's easier to read and potentially less error prone.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32026>
2024-11-08 14:00:14 +00:00
Rhys Perry
215c44c124 aco: apply extract to v_cvt_f32_ubyte0
No fossil-db changes.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31762>
2024-11-06 19:31:20 +00:00
Rhys Perry
f1a932bc29 aco: apply extract to p_extract_vector
fossil-db (navi21):
Totals from 46 (0.06% of 79395) affected shaders:
Instrs: 80126 -> 79944 (-0.23%); split: -0.27%, +0.04%
CodeSize: 486860 -> 485668 (-0.24%); split: -0.31%, +0.06%
Latency: 1615395 -> 1614218 (-0.07%); split: -0.07%, +0.00%
InvThroughput: 705479 -> 705013 (-0.07%); split: -0.07%, +0.00%
Copies: 18934 -> 18797 (-0.72%); split: -0.98%, +0.25%
VALU: 52452 -> 52268 (-0.35%); split: -0.41%, +0.06%
SALU: 17253 -> 17255 (+0.01%); split: -0.02%, +0.03%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31762>
2024-11-06 19:31:20 +00:00
Rhys Perry
6cb9d39bc2 aco: combine extracts with sub-dword definitions
fossil-db (navi21):
Totals from 23 (0.03% of 79395) affected shaders:
Instrs: 55133 -> 55099 (-0.06%)
CodeSize: 335744 -> 335512 (-0.07%)
Latency: 1709146 -> 1709031 (-0.01%)
InvThroughput: 613788 -> 613713 (-0.01%)
Copies: 14405 -> 14407 (+0.01%); split: -0.03%, +0.04%
VALU: 37038 -> 37000 (-0.10%)
SALU: 11125 -> 11131 (+0.05%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31762>
2024-11-06 19:31:20 +00:00
Rhys Perry
30af7ae44f aco: add and use apply_extract_twice helper
This will be used in the next commit.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31762>
2024-11-06 19:31:20 +00:00
Rhys Perry
05d0fa894e aco: allow applying sign-extended sel to p_extract more often
In the case of v1=p_extract(v1=p_extract(src, 0, 16, 1), 0, 32, 0).
When we apply extracts with sub-dword definitions, this will also
include v2b=p_extract(v2b=p_extract(src, 0, 8, 1), 0, 16, 0).

No fossil-db changes.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31762>
2024-11-06 19:31:20 +00:00
Rhys Perry
e47bc3e750 aco: shrink code size of some p_extract
fossil-db (navi21):
Totals from 37 (0.05% of 79395) affected shaders:
CodeSize: 2048204 -> 2047836 (-0.02%)

fossil-db (navi31):
Totals from 307 (0.39% of 79395) affected shaders:
CodeSize: 3075732 -> 3065236 (-0.34%); split: -0.34%, +0.00%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31762>
2024-11-06 19:31:20 +00:00
Rhys Perry
d285333800 aco: add a bit more p_extract/p_insert validation
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31762>
2024-11-06 19:31:20 +00:00
Rhys Perry
d3ac69f79b aco: handle SGPR limitations when applying extract
We were already doing this, but missing it in a few places.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31762>
2024-11-06 19:31:20 +00:00
Rhys Perry
07e28dad75 aco: disallow p_extract(,,32,)
Nothing uses these.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31762>
2024-11-06 19:31:20 +00:00
Rhys Perry
f528597906 aco: check for SDWA before applying extract to lshl/cvt_f32
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31762>
2024-11-06 19:31:20 +00:00
Rhys Perry
6ce51ea168 aco/gfx11: fix v1b=p_extract(src, 0, 16, 0)
This is weird, but the SDWA path supports this.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31762>
2024-11-06 19:31:20 +00:00
Rhys Perry
b318fe47e9 aco: don't byte align global VMEM loads if it might be unsafe
Using the byte align path can be unsafe even when 12 byte loads are
supported.

fossil-db (navi21):
Totals from 185 (0.23% of 79395) affected shaders:
Instrs: 391501 -> 391575 (+0.02%); split: -0.03%, +0.05%
CodeSize: 2147336 -> 2147672 (+0.02%); split: -0.03%, +0.05%
Latency: 3762613 -> 3860941 (+2.61%); split: -0.01%, +2.62%
InvThroughput: 871429 -> 888013 (+1.90%); split: -0.08%, +1.98%
VClause: 9712 -> 10210 (+5.13%)
Copies: 53775 -> 53010 (-1.42%); split: -1.46%, +0.04%
VALU: 254009 -> 252146 (-0.73%)
SALU: 56698 -> 56699 (+0.00%); split: -0.00%, +0.00%
VMEM: 18503 -> 19601 (+5.93%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Fixes: 391bf3ea30 ("aco: don't expand smem/mubuf global loads")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31807>
2024-11-06 19:07:16 +00:00
Georg Lehmann
2cd8a9fef7 amd: lower gl_FragCoord.w rcp in NIR
This allows NIR to remove the rcps if the application uses rcp(gl_FragCoord.w).
D3D provides w, not 1/w like GL/VK in the shader, so this is commonly used.

Foz-DB Navi21:
Totals from 2068 (2.61% of 79206) affected shaders:
MaxWaves: 45636 -> 45652 (+0.04%)
Instrs: 2173444 -> 2169671 (-0.17%); split: -0.18%, +0.00%
CodeSize: 11881304 -> 11867208 (-0.12%); split: -0.12%, +0.01%
VGPRs: 118000 -> 117968 (-0.03%)
Latency: 35689676 -> 35675909 (-0.04%); split: -0.06%, +0.02%
InvThroughput: 9167199 -> 9159801 (-0.08%); split: -0.08%, +0.00%
VClause: 45076 -> 45078 (+0.00%); split: -0.01%, +0.02%
SClause: 92503 -> 92366 (-0.15%); split: -0.31%, +0.17%
Copies: 140282 -> 140303 (+0.01%); split: -0.13%, +0.14%
Branches: 53347 -> 53346 (-0.00%); split: -0.01%, +0.00%
PreVGPRs: 96495 -> 96465 (-0.03%)
VALU: 1522980 -> 1519252 (-0.24%); split: -0.25%, +0.01%
SALU: 213451 -> 213460 (+0.00%); split: -0.02%, +0.02%

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31967>
2024-11-06 12:57:08 +00:00
Rhys Perry
5375d77488 aco: wait for scratch stores to complete before dealloc_vgprs
fossil-db (navi31):
Totals from 392 (0.49% of 79395) affected shaders:
Instrs: 5052043 -> 5054100 (+0.04%)
CodeSize: 26701200 -> 26709428 (+0.03%)
Latency: 43614861 -> 43615368 (+0.00%)
InvThroughput: 7353147 -> 7353216 (+0.00%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24884>
2024-11-06 09:58:05 +00:00
Rhys Perry
575f24d19f aco: don't emit early exit over dealloc_vgprs
fossil-db (navi31):
Totals from 3308 (4.17% of 79395) affected shaders:
Instrs: 387145 -> 375373 (-3.04%)
CodeSize: 2018276 -> 1964380 (-2.67%)
Latency: 6588004 -> 6549068 (-0.59%)
InvThroughput: 458792 -> 457025 (-0.39%); split: -0.39%, +0.00%
Branches: 10710 -> 7402 (-30.89%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24884>
2024-11-06 09:58:05 +00:00
Rhys Perry
295b7d606f aco: insert NOP before dealloc_vgpr in the insert_NOPs pass
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24884>
2024-11-06 09:58:05 +00:00
Rhys Perry
4dfc564669 aco: fix printing of block_kind_discard_early_exit
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24884>
2024-11-06 09:58:04 +00:00
Rhys Perry
0ad713ca9f aco: add waitcnt build helper
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24884>
2024-11-06 09:58:04 +00:00
Samuel Pitoiset
32a537b25b aco: use inlined constant offsets for storing SGPRs in the trap handler
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31976>
2024-11-05 11:55:24 +00:00
Samuel Pitoiset
9bcf17ef5a aco: add support for the trap handler shader on GFX11
This has been verified on navi31.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31960>
2024-11-05 07:58:38 +00:00
Samuel Pitoiset
6d5a2ae928 aco: clear the current wave exception in the trap handler
This is required to re-enable VALU instructions in this wave, only
float exception seem to be affected.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31960>
2024-11-05 07:58:38 +00:00
Samuel Pitoiset
e85fc0f869 aco: fix validation for VOP1 instructions without any dest/src
Like v_clrexcp.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31960>
2024-11-05 07:58:38 +00:00
Samuel Pitoiset
81f4670ed6 radv,aco: dump all SGPRS from the trap handler
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31960>
2024-11-05 07:58:38 +00:00
Georg Lehmann
a58d2b59e9 aco: implement load_pixel_coord
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31864>
2024-11-04 12:34:30 +00:00