Commit graph

3503 commits

Author SHA1 Message Date
Marek Olšák
66dd70adc5 amd: lower load_gs_wave_id_amd in NIR
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>
2025-01-02 17:36:55 +00:00
Marek Olšák
923f59c971 amd: lower load_barycentric_at_offset in NIR
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>
2025-01-02 17:36:55 +00:00
Marek Olšák
16ab05fad1 amd: lower load_barycentric_pixel/centroid/sample in NIR
radeonsi needs to preserve interp_mode in the arg load.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>
2025-01-02 17:36:55 +00:00
Marek Olšák
7e83f6ca8b amd: lower load_front_face in NIR
radeonsi must do this after si_lower_nir_abi, which optimizes front_face,
but doesn't lower it.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>
2025-01-02 17:36:55 +00:00
Marek Olšák
6ad5225b2a amd: lower load_frag_shading_rate in NIR
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>
2025-01-02 17:36:55 +00:00
Marek Olšák
6d2e29ff6e amd: lower load_sample_pos in NIR
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>
2025-01-02 17:36:55 +00:00
Marek Olšák
110e474b4f amd: lower load_sample_id in NIR
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>
2025-01-02 17:36:55 +00:00
Marek Olšák
684c8da553 amd: lower load_invocation_id in NIR
ACO can't look for it because it's lowered there.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>
2025-01-02 17:36:55 +00:00
Marek Olšák
d281240c57 amd: lower load_first_vertex/base_instance/draw_id/view_index in NIR
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>
2025-01-02 17:36:55 +00:00
Marek Olšák
0d372b043b amd: lower load_local_invocation_id in NIR
This is based on ACO.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>
2025-01-02 17:36:55 +00:00
Marek Olšák
13cb5c7b72 amd: lower load_frag_coord in NIR
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>
2025-01-02 17:36:55 +00:00
Marek Olšák
58cb155068 amd: lower load_pixel_coord in NIR
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>
2025-01-02 17:36:55 +00:00
Georg Lehmann
272ff275fa aco/insert_exec: reset top exec for p_discard_if
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12363
Fixes: 31f62a6123 ("aco/insert_exec: don't always reset top exec")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32830>
2025-01-02 15:18:48 +00:00
Georg Lehmann
3da2d96bc5 aco/optimizer: fix signed extract of sub dword temps with SDWA
If an instruction didn't already use SDWA convert_to_SDWA in apply_extract
will add ubyte0/uword0 selections for v1b/v2b operands. This loses information
that the instruction doesn't care about the high bits and makes the next
apply_extract_twice fail.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>

Fixes: 6cb9d39bc2 ("aco: combine extracts with sub-dword definitions")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32803>
2025-01-02 09:33:18 +00:00
Timur Kristóf
01bf998e17 aco: Update documentation
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32766>
2024-12-31 23:01:23 +00:00
Georg Lehmann
43fca7fffe amd: support load_front_face_fsign
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32791>
2024-12-30 22:31:35 +00:00
Georg Lehmann
aee0c7274c amd: switch to FRONT_FACE_ALL_BITS(0)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32791>
2024-12-30 22:31:34 +00:00
Georg Lehmann
6a6b26dfa5 aco: create v_cmpx with s_andn2(exec, v_cmp)
Foz-DB Navi21:
Totals from 3928 (4.95% of 79395) affected shaders:
Instrs: 1155370 -> 1151154 (-0.36%)
CodeSize: 6332192 -> 6314616 (-0.28%)
Latency: 11955231 -> 11933281 (-0.18%); split: -0.18%, +0.00%
InvThroughput: 1842283 -> 1841822 (-0.03%); split: -0.03%, +0.00%
SALU: 175431 -> 171215 (-2.40%)

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <None>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32731>
2024-12-30 13:05:23 +00:00
Georg Lehmann
42512208d8 aco/insert_exec: exit shader using exec for top level discard
Totals from 14538 (18.31% of 79395) affected shaders:
no changes

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <None>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32731>
2024-12-30 13:05:23 +00:00
Georg Lehmann
6b35d6f75b aco: allow p_exit_early_if_not with exec condition
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <None>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32731>
2024-12-30 13:05:23 +00:00
Georg Lehmann
c279e63a79 aco: rename p_early_exit_if to if_not
It exits the shaders if the condition is false, not true.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <None>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32731>
2024-12-30 13:05:23 +00:00
Georg Lehmann
33a73203b0 aco/isel: skip and(exec) for top level demote_if/terminate_if
In nested control flow this is nessecary to not demote/terminate invocations
that are part of the global but not part of the local mask.

At the top level, the masks are the same and no additional invocations
can be accidentally disabled.

Foz-DB Navi21:
Totals from 2095 (2.64% of 79395) affected shaders:
Instrs: 1058326 -> 1056839 (-0.14%)
CodeSize: 5632480 -> 5626616 (-0.10%)
Latency: 12082761 -> 12080520 (-0.02%); split: -0.02%, +0.00%
InvThroughput: 2246677 -> 2246636 (-0.00%); split: -0.00%, +0.00%
Copies: 114446 -> 114433 (-0.01%)
SALU: 230585 -> 229098 (-0.64%)

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32755>
2024-12-26 18:34:38 +00:00
Marek Olšák
de996ac481 radeonsi: kill Z and stencil PS outputs if depth or stencil is disabled
This adds kill_z and kill_stencil flags to the shader PS epilog key, which
removes those outputs if depth or stencil are disabled.

It must be implemented in:
* ACO PS epilog
* LLVM PS epilog
* ac_nir_lower_ps for monolithic shaders

Some of the samplemask code wasn't completely correct, but probably harmless.

Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32713>
2024-12-24 12:02:20 +00:00
Qiang Yu
dff14d102d aco: fix voffset missing when buffer store base >=4096
Regression on test:
  dEQP-GLES31.functional.geometry_shading.basic.output_256

voffset is missing if buffer store base >=4096, we need to
re-calculate offen after resolve_excess_vmem_const_offset().

Fixes: cdaf269924 ("aco: inline store_vmem_mubuf/emit_single_mubuf_store")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32767>
2024-12-24 01:42:45 +00:00
Marek Olšák
85c20def94 ac,radv,radeonsi: enable TCS input reads from VGPRs for all compatible loads
Cross-invocation TCS input access doesn't prevent same-invocation access.
This improves shaders that use both for the same inputs.

Also, if some components of a vec4 slot only use same-invocation access and
other components only use cross-invocation access (it's possible after
compaction), this takes the VGPR path for the components with
same-invocation access, which didn't happen previously because all masks
only describe whole vec4s.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31673>
2024-12-18 11:07:59 +00:00
Qiang Yu
d38efee8ef aco: enable gfx12 support for radeonsi
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32570>
2024-12-16 07:35:07 +00:00
Rhys Perry
53d0187bab aco: decrease max_workgroup_size
Match the limit of radeonsi and RADV.

No fossil-db changes.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32577>
2024-12-12 17:38:46 +00:00
Rhys Perry
87f2f77960 aco: fix max_workgroup_count[0]
This is necessary for radeonsi.

fossil-db (navi21):
Totals from 292 (0.37% of 79395) affected shaders:
Instrs: 305965 -> 306182 (+0.07%); split: -0.00%, +0.07%
CodeSize: 1624816 -> 1627212 (+0.15%); split: -0.00%, +0.15%
Latency: 5244652 -> 5243587 (-0.02%); split: -0.07%, +0.05%
InvThroughput: 1221089 -> 1225285 (+0.34%); split: -0.04%, +0.38%
Copies: 22712 -> 22702 (-0.04%)
PreSGPRs: 10713 -> 10712 (-0.01%)
PreVGPRs: 10918 -> 10920 (+0.02%)
VALU: 178613 -> 178836 (+0.12%)
SALU: 43490 -> 43493 (+0.01%); split: -0.02%, +0.03%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32577>
2024-12-12 17:38:46 +00:00
Daniel Schürmann
26a3038b65 aco/lower_branches: remove edges between blocks if there is no direct branch
This way, linear predecessors and successors better reflect the
actual control flow which improves wait state insertion and hazard
mitigation.

Totals from 10252 (12.91% of 79395) affected shaders: (Navi31)

Instrs: 18824540 -> 18803823 (-0.11%); split: -0.11%, +0.00%
CodeSize: 99025464 -> 98942028 (-0.08%); split: -0.08%, +0.00%
Latency: 169291854 -> 165781877 (-2.07%); split: -2.07%, +0.00%
InvThroughput: 29701086 -> 29228602 (-1.59%); split: -1.59%, +0.00%
SClause: 510587 -> 510586 (-0.00%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32389>
2024-12-12 08:46:22 +00:00
Daniel Schürmann
22ffe72022 aco: move branch lowering optimization into separate file 'aco_lower_branches.cpp'
No fossil changes.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32389>
2024-12-12 08:46:22 +00:00
Friedrich Vock
845660f2b7 aco/lower_to_hw_instr: Check the right instruction's opcode
instr is the branch instruction, its opcode won't ever be writelane. We
should check inst instead.

Found by inspection.

Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32389>
2024-12-12 08:46:21 +00:00
Daniel Schürmann
28ab7f0168 aco/jump_threading: remove branch sequence optimization
This optimization gets applied during postRA optimization, now.

No fossil changes.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32330>
2024-12-12 08:11:22 +00:00
Daniel Schürmann
fcd94a8ca7 aco: move try_optimize_branching_sequence() to postRA optimizations
Totals from 196 (0.25% of 79206) affected shaders: (Navi31)

Instrs: 534343 -> 534438 (+0.02%); split: -0.00%, +0.02%
CodeSize: 2774852 -> 2775420 (+0.02%); split: -0.00%, +0.02%
Latency: 7103512 -> 7103021 (-0.01%); split: -0.01%, +0.00%
InvThroughput: 959477 -> 959447 (-0.00%)
Copies: 42646 -> 42648 (+0.00%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32330>
2024-12-12 08:11:21 +00:00
Daniel Schürmann
95d44c7ce0 aco/optimizer_postRA: set branch()->never_taken if exec is constant non-zero
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32330>
2024-12-12 08:11:21 +00:00
Daniel Schürmann
d67932f69e aco/print_ir: don't print disconnected empty blocks
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32330>
2024-12-12 08:11:21 +00:00
Daniel Schürmann
22881712c8 aco/assembler: Don't emit target basic block index when chaining branches
This could erroneously cause an assertion to fail if the
target block index was larger than UINT16_MAX.

Fixes: cab5639a09 ('aco/assembler: chain branches instead of emitting long jumps')
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32599>
2024-12-11 23:28:55 +00:00
Georg Lehmann
65506e635b aco/ra: don't write to scc/ttmp with s_fmac
Fixes: 4bd229ac50 ("aco/gfx11.5: select SOP2 float instructions")

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32545>
2024-12-11 12:51:18 +00:00
Georg Lehmann
0b9e2a5427 aco/ra: disallow s_cmpk with scc operand
Fixes: 2d6b0a4177 ("aco/optimizer: Optimize SOPC with literal to SOPK.")

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32545>
2024-12-11 12:51:18 +00:00
Georg Lehmann
fe0c72caec aco/ra: don't write to exec/ttmp with mulk/addk/cmovk
ttmp sgprs are readonly outside of trap handlers, so the instructions were
probably skipped. RA should also never create additional exec writes.

Fixes: e06773281b ("aco/ra: Optimize some SOP2 instructions with literal to SOPK.")

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32545>
2024-12-11 12:51:18 +00:00
Georg Lehmann
576a2e798c aco/gfx12: don't assume memory operations complete in order
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32569>
2024-12-11 12:22:59 +00:00
Samuel Pitoiset
553eb1a3fd radv: fix alpha-to-coverage with alpha-to-one when MRTZ is also exported
On AMD hardware, it's possible to export a separate alpha channel for
applying alpha-to-one after alpha-to-coverage and not before.

On GFX11+, it's already mostly supported but alpha needs to be exported
to MRTZ.a and one to MRT0.a. The hw always uses alpha for
alpha-to-coverage from MRTZ.a.

On older generations, the driver needs the same separate alpha export
but it also needs to configure the hardware with COVERAGE_TO_MASK_ENABLE
which selects alpha from MRTZ.a.

This should fix alpha-to-coverage with alpha-to-one when either
depth, stencil or samplemask are exported but it still needs a slightly
different solution without MRTZ. I will fix that later.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32523>
2024-12-11 10:50:31 +00:00
Samuel Pitoiset
70047e6bd6 aco: export alpha to MRTZ.a and one to MRT0.a for alpha-to-one on GFX11
For FS epilogs.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32523>
2024-12-11 10:50:31 +00:00
Georg Lehmann
4a977ea24f aco/gfx11+: use v_and_b32 to extract local id 0
Foz-DB Navi31:
Totals from 2561 (3.23% of 79206) affected shaders:
CodeSize: 10399004 -> 10389120 (-0.10%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32532>
2024-12-10 11:58:21 +00:00
Daniel Schürmann
b64fff7731 aco: remove definition from Pseudo branch instructions
They are not needed anymore.

Totals from 7019 (8.84% of 79395) affected shaders: (Navi31)

Instrs: 14805400 -> 14824196 (+0.13%); split: -0.00%, +0.13%
CodeSize: 78079972 -> 78132932 (+0.07%); split: -0.01%, +0.08%
SpillSGPRs: 4485 -> 4515 (+0.67%); split: -0.76%, +1.43%
Latency: 165862000 -> 165836134 (-0.02%); split: -0.02%, +0.00%
InvThroughput: 30061764 -> 30057781 (-0.01%); split: -0.01%, +0.00%
SClause: 392323 -> 392286 (-0.01%); split: -0.01%, +0.00%
Copies: 1012262 -> 1012234 (-0.00%); split: -0.04%, +0.04%
Branches: 365910 -> 365909 (-0.00%); split: -0.00%, +0.00%
PreSGPRs: 360167 -> 355363 (-1.33%)
VALU: 8837197 -> 8837276 (+0.00%); split: -0.00%, +0.00%
SALU: 1402593 -> 1402621 (+0.00%); split: -0.03%, +0.03%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32037>
2024-12-06 14:34:03 +00:00
Daniel Schürmann
7e4687fd04 aco: remove definition from SOPP branch instructions
Totals from 17942 (22.60% of 79395) affected shaders: (Navi31)

Instrs: 20334063 -> 20312676 (-0.11%); split: -0.11%, +0.00%
CodeSize: 108458732 -> 108377540 (-0.07%); split: -0.08%, +0.00%
Latency: 180510540 -> 180479666 (-0.02%); split: -0.02%, +0.00%
InvThroughput: 28079325 -> 28077938 (-0.00%); split: -0.01%, +0.00%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32037>
2024-12-06 14:34:03 +00:00
Daniel Schürmann
cab5639a09 aco/assembler: chain branches instead of emitting long jumps
As regular branch instructions cannot jump further than
32768 dwords, previously we used long jumps as fallback
solution. The disadvantage of that is that an extra SGPR
pair must be provided in order to temporarily store the PC.

This patch changes that to chained branch instructions by
inserting an artificial extra block into the code to be
targeted by the original branch. This block contains a
single branch instruction jumping to the original target.
Before the block, if necessary, we insert a <branch 1>
instruction for the existing code in order to jump over
the newly inserted block.

Only a few RT shaders are affected.

Totals from 29 (0.04% of 79395) affected shaders: (Navi31)

CodeSize: 17281176 -> 17276332 (-0.03%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32037>
2024-12-06 14:34:03 +00:00
Daniel Schürmann
c3d777d8ac aco/assembler: change ctx.loop_header to uint32_t instead of Block*
We are about to add new blocks during assembly which makes
pointers into a vector unreliable.
Also, only set it if the loop has no back-edge.

Totals from 126 (0.16% of 79206) affected shaders: (Navi31)

CodeSize: 1486152 -> 1488152 (+0.13%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32037>
2024-12-06 14:34:03 +00:00
Daniel Schürmann
592f3fd994 aco/assembler: Actually insert s_inst_prefetch instructions when aligning blocks for loops
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32037>
2024-12-06 14:34:03 +00:00
Daniel Schürmann
b92afdbd28 aco/assembler: constify assembly functions
Ensure that instruction formats and special operands
are not manipulated during assembly.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32037>
2024-12-06 14:34:03 +00:00
Daniel Schürmann
3a02bbd916 aco/print_asm: allow for empty blocks with arbitrary offsets
We will add empty blocks at the end of the shader,
in order to store some branch offset information.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32037>
2024-12-06 14:34:03 +00:00