Marek Olšák
de996ac481
radeonsi: kill Z and stencil PS outputs if depth or stencil is disabled
...
This adds kill_z and kill_stencil flags to the shader PS epilog key, which
removes those outputs if depth or stencil are disabled.
It must be implemented in:
* ACO PS epilog
* LLVM PS epilog
* ac_nir_lower_ps for monolithic shaders
Some of the samplemask code wasn't completely correct, but probably harmless.
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32713 >
2024-12-24 12:02:20 +00:00
Qiang Yu
dff14d102d
aco: fix voffset missing when buffer store base >=4096
...
Regression on test:
dEQP-GLES31.functional.geometry_shading.basic.output_256
voffset is missing if buffer store base >=4096, we need to
re-calculate offen after resolve_excess_vmem_const_offset().
Fixes: cdaf269924 ("aco: inline store_vmem_mubuf/emit_single_mubuf_store")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32767 >
2024-12-24 01:42:45 +00:00
Marek Olšák
85c20def94
ac,radv,radeonsi: enable TCS input reads from VGPRs for all compatible loads
...
Cross-invocation TCS input access doesn't prevent same-invocation access.
This improves shaders that use both for the same inputs.
Also, if some components of a vec4 slot only use same-invocation access and
other components only use cross-invocation access (it's possible after
compaction), this takes the VGPR path for the components with
same-invocation access, which didn't happen previously because all masks
only describe whole vec4s.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31673 >
2024-12-18 11:07:59 +00:00
Qiang Yu
d38efee8ef
aco: enable gfx12 support for radeonsi
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32570 >
2024-12-16 07:35:07 +00:00
Rhys Perry
53d0187bab
aco: decrease max_workgroup_size
...
Match the limit of radeonsi and RADV.
No fossil-db changes.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32577 >
2024-12-12 17:38:46 +00:00
Rhys Perry
87f2f77960
aco: fix max_workgroup_count[0]
...
This is necessary for radeonsi.
fossil-db (navi21):
Totals from 292 (0.37% of 79395) affected shaders:
Instrs: 305965 -> 306182 (+0.07%); split: -0.00%, +0.07%
CodeSize: 1624816 -> 1627212 (+0.15%); split: -0.00%, +0.15%
Latency: 5244652 -> 5243587 (-0.02%); split: -0.07%, +0.05%
InvThroughput: 1221089 -> 1225285 (+0.34%); split: -0.04%, +0.38%
Copies: 22712 -> 22702 (-0.04%)
PreSGPRs: 10713 -> 10712 (-0.01%)
PreVGPRs: 10918 -> 10920 (+0.02%)
VALU: 178613 -> 178836 (+0.12%)
SALU: 43490 -> 43493 (+0.01%); split: -0.02%, +0.03%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32577 >
2024-12-12 17:38:46 +00:00
Daniel Schürmann
26a3038b65
aco/lower_branches: remove edges between blocks if there is no direct branch
...
This way, linear predecessors and successors better reflect the
actual control flow which improves wait state insertion and hazard
mitigation.
Totals from 10252 (12.91% of 79395) affected shaders: (Navi31)
Instrs: 18824540 -> 18803823 (-0.11%); split: -0.11%, +0.00%
CodeSize: 99025464 -> 98942028 (-0.08%); split: -0.08%, +0.00%
Latency: 169291854 -> 165781877 (-2.07%); split: -2.07%, +0.00%
InvThroughput: 29701086 -> 29228602 (-1.59%); split: -1.59%, +0.00%
SClause: 510587 -> 510586 (-0.00%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32389 >
2024-12-12 08:46:22 +00:00
Daniel Schürmann
22ffe72022
aco: move branch lowering optimization into separate file 'aco_lower_branches.cpp'
...
No fossil changes.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32389 >
2024-12-12 08:46:22 +00:00
Friedrich Vock
845660f2b7
aco/lower_to_hw_instr: Check the right instruction's opcode
...
instr is the branch instruction, its opcode won't ever be writelane. We
should check inst instead.
Found by inspection.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32389 >
2024-12-12 08:46:21 +00:00
Daniel Schürmann
28ab7f0168
aco/jump_threading: remove branch sequence optimization
...
This optimization gets applied during postRA optimization, now.
No fossil changes.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32330 >
2024-12-12 08:11:22 +00:00
Daniel Schürmann
fcd94a8ca7
aco: move try_optimize_branching_sequence() to postRA optimizations
...
Totals from 196 (0.25% of 79206) affected shaders: (Navi31)
Instrs: 534343 -> 534438 (+0.02%); split: -0.00%, +0.02%
CodeSize: 2774852 -> 2775420 (+0.02%); split: -0.00%, +0.02%
Latency: 7103512 -> 7103021 (-0.01%); split: -0.01%, +0.00%
InvThroughput: 959477 -> 959447 (-0.00%)
Copies: 42646 -> 42648 (+0.00%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32330 >
2024-12-12 08:11:21 +00:00
Daniel Schürmann
95d44c7ce0
aco/optimizer_postRA: set branch()->never_taken if exec is constant non-zero
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32330 >
2024-12-12 08:11:21 +00:00
Daniel Schürmann
d67932f69e
aco/print_ir: don't print disconnected empty blocks
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32330 >
2024-12-12 08:11:21 +00:00
Daniel Schürmann
22881712c8
aco/assembler: Don't emit target basic block index when chaining branches
...
This could erroneously cause an assertion to fail if the
target block index was larger than UINT16_MAX.
Fixes: cab5639a09 ('aco/assembler: chain branches instead of emitting long jumps')
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32599 >
2024-12-11 23:28:55 +00:00
Georg Lehmann
65506e635b
aco/ra: don't write to scc/ttmp with s_fmac
...
Fixes: 4bd229ac50 ("aco/gfx11.5: select SOP2 float instructions")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32545 >
2024-12-11 12:51:18 +00:00
Georg Lehmann
0b9e2a5427
aco/ra: disallow s_cmpk with scc operand
...
Fixes: 2d6b0a4177 ("aco/optimizer: Optimize SOPC with literal to SOPK.")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32545 >
2024-12-11 12:51:18 +00:00
Georg Lehmann
fe0c72caec
aco/ra: don't write to exec/ttmp with mulk/addk/cmovk
...
ttmp sgprs are readonly outside of trap handlers, so the instructions were
probably skipped. RA should also never create additional exec writes.
Fixes: e06773281b ("aco/ra: Optimize some SOP2 instructions with literal to SOPK.")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32545 >
2024-12-11 12:51:18 +00:00
Georg Lehmann
576a2e798c
aco/gfx12: don't assume memory operations complete in order
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32569 >
2024-12-11 12:22:59 +00:00
Samuel Pitoiset
553eb1a3fd
radv: fix alpha-to-coverage with alpha-to-one when MRTZ is also exported
...
On AMD hardware, it's possible to export a separate alpha channel for
applying alpha-to-one after alpha-to-coverage and not before.
On GFX11+, it's already mostly supported but alpha needs to be exported
to MRTZ.a and one to MRT0.a. The hw always uses alpha for
alpha-to-coverage from MRTZ.a.
On older generations, the driver needs the same separate alpha export
but it also needs to configure the hardware with COVERAGE_TO_MASK_ENABLE
which selects alpha from MRTZ.a.
This should fix alpha-to-coverage with alpha-to-one when either
depth, stencil or samplemask are exported but it still needs a slightly
different solution without MRTZ. I will fix that later.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32523 >
2024-12-11 10:50:31 +00:00
Samuel Pitoiset
70047e6bd6
aco: export alpha to MRTZ.a and one to MRT0.a for alpha-to-one on GFX11
...
For FS epilogs.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32523 >
2024-12-11 10:50:31 +00:00
Georg Lehmann
4a977ea24f
aco/gfx11+: use v_and_b32 to extract local id 0
...
Foz-DB Navi31:
Totals from 2561 (3.23% of 79206) affected shaders:
CodeSize: 10399004 -> 10389120 (-0.10%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32532 >
2024-12-10 11:58:21 +00:00
Daniel Schürmann
b64fff7731
aco: remove definition from Pseudo branch instructions
...
They are not needed anymore.
Totals from 7019 (8.84% of 79395) affected shaders: (Navi31)
Instrs: 14805400 -> 14824196 (+0.13%); split: -0.00%, +0.13%
CodeSize: 78079972 -> 78132932 (+0.07%); split: -0.01%, +0.08%
SpillSGPRs: 4485 -> 4515 (+0.67%); split: -0.76%, +1.43%
Latency: 165862000 -> 165836134 (-0.02%); split: -0.02%, +0.00%
InvThroughput: 30061764 -> 30057781 (-0.01%); split: -0.01%, +0.00%
SClause: 392323 -> 392286 (-0.01%); split: -0.01%, +0.00%
Copies: 1012262 -> 1012234 (-0.00%); split: -0.04%, +0.04%
Branches: 365910 -> 365909 (-0.00%); split: -0.00%, +0.00%
PreSGPRs: 360167 -> 355363 (-1.33%)
VALU: 8837197 -> 8837276 (+0.00%); split: -0.00%, +0.00%
SALU: 1402593 -> 1402621 (+0.00%); split: -0.03%, +0.03%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32037 >
2024-12-06 14:34:03 +00:00
Daniel Schürmann
7e4687fd04
aco: remove definition from SOPP branch instructions
...
Totals from 17942 (22.60% of 79395) affected shaders: (Navi31)
Instrs: 20334063 -> 20312676 (-0.11%); split: -0.11%, +0.00%
CodeSize: 108458732 -> 108377540 (-0.07%); split: -0.08%, +0.00%
Latency: 180510540 -> 180479666 (-0.02%); split: -0.02%, +0.00%
InvThroughput: 28079325 -> 28077938 (-0.00%); split: -0.01%, +0.00%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32037 >
2024-12-06 14:34:03 +00:00
Daniel Schürmann
cab5639a09
aco/assembler: chain branches instead of emitting long jumps
...
As regular branch instructions cannot jump further than
32768 dwords, previously we used long jumps as fallback
solution. The disadvantage of that is that an extra SGPR
pair must be provided in order to temporarily store the PC.
This patch changes that to chained branch instructions by
inserting an artificial extra block into the code to be
targeted by the original branch. This block contains a
single branch instruction jumping to the original target.
Before the block, if necessary, we insert a <branch 1>
instruction for the existing code in order to jump over
the newly inserted block.
Only a few RT shaders are affected.
Totals from 29 (0.04% of 79395) affected shaders: (Navi31)
CodeSize: 17281176 -> 17276332 (-0.03%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32037 >
2024-12-06 14:34:03 +00:00
Daniel Schürmann
c3d777d8ac
aco/assembler: change ctx.loop_header to uint32_t instead of Block*
...
We are about to add new blocks during assembly which makes
pointers into a vector unreliable.
Also, only set it if the loop has no back-edge.
Totals from 126 (0.16% of 79206) affected shaders: (Navi31)
CodeSize: 1486152 -> 1488152 (+0.13%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32037 >
2024-12-06 14:34:03 +00:00
Daniel Schürmann
592f3fd994
aco/assembler: Actually insert s_inst_prefetch instructions when aligning blocks for loops
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32037 >
2024-12-06 14:34:03 +00:00
Daniel Schürmann
b92afdbd28
aco/assembler: constify assembly functions
...
Ensure that instruction formats and special operands
are not manipulated during assembly.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32037 >
2024-12-06 14:34:03 +00:00
Daniel Schürmann
3a02bbd916
aco/print_asm: allow for empty blocks with arbitrary offsets
...
We will add empty blocks at the end of the shader,
in order to store some branch offset information.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32037 >
2024-12-06 14:34:03 +00:00
Rhys Perry
ab26b99c2c
aco: don't CSE p_shader_cycles_hi_lo_hi
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Fixes: fae2a85d57 ("aco/gfx12: implement subgroup shader clock")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12243
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32500 >
2024-12-06 14:06:05 +00:00
Georg Lehmann
b2464e3609
aco/gfx12+: do not use v_pack_b32_f16 to pack untyped data
...
GFX12 removed IEEE_MODE, and made its signalling NaN quieting the default.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12251
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32502 >
2024-12-06 12:33:05 +00:00
Georg Lehmann
7425e71ae0
aco/gfx12: disable vinterp ddx/ddy optimization
...
This only seems to work on gfx11 and gfx11.5, and it's only faster on gfx11.5.
We could continue to use vinterp, with constants copied to vgprs, but
whether that's beneficial depends on the shader.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: bee487df48 ("aco/gfx11.5+: use vinterp for fddx/fddy")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12250
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32495 >
2024-12-06 12:01:39 +00:00
Rhys Perry
fd19ff0b9e
aco: force linear for event_vmem_sample and event_vmem_bvh
...
I don't know if this issue affects GFX12, but workaround it anyway to be
safe.
fossil-db (gfx1200):
Totals from 3463 (4.36% of 79395) affected shaders:
Instrs: 9794280 -> 9833253 (+0.40%); split: -0.00%, +0.40%
CodeSize: 52306040 -> 52457988 (+0.29%); split: -0.01%, +0.30%
Latency: 90549385 -> 93617517 (+3.39%); split: -0.00%, +3.39%
InvThroughput: 13189030 -> 13602942 (+3.14%); split: -0.00%, +3.14%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32373 >
2024-12-02 10:13:39 +00:00
Rhys Perry
86c63b29bc
aco/gfx12: insert wait between VMEM WaW
...
https://github.com/llvm/llvm-project/pull/105549
fossil-db (gfx1200):
Totals from 1783 (2.25% of 79395) affected shaders:
Instrs: 7398391 -> 7404566 (+0.08%); split: -0.00%, +0.08%
CodeSize: 38862456 -> 38886364 (+0.06%); split: -0.00%, +0.06%
Latency: 83191513 -> 84211504 (+1.23%); split: -0.00%, +1.23%
InvThroughput: 15185936 -> 15345744 (+1.05%); split: -0.01%, +1.06%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32373 >
2024-12-02 10:13:39 +00:00
Timur Kristóf
e2b8c4a9ac
radv, aco: Consolidate num_interp + num_prim_interp into num_inputs.
...
num_inputs contains the total number of FS inputs.
Note that this also fixes a bug where some calculations in RADV
and ACO were missing the per-primitive attributes from the LDS
usage of PS.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32220 >
2024-11-28 18:14:57 +00:00
Rhys Perry
4c3809e7fc
aco: use small_vec in RegCounterMap
...
This seems to be a little faster.
insert_NOPs (navi31):
Difference at 95.0% confidence
-11.484 +/- 6.13377
-1.62767% +/- 0.860593%
(Student's t, pooled s = 5.71913)
insert_NOPs (gfx1200):
Difference at 95.0% confidence
-35.6745 +/- 4.97972
-8.1236% +/- 1.10453%
(Student's t, pooled s = 4.6431)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32374 >
2024-11-28 17:07:34 +00:00
Rhys Perry
7a500c8b22
aco: make small_vec copyable
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32374 >
2024-11-28 17:07:34 +00:00
Georg Lehmann
fd669fa69d
aco/optimizer: label fcanonicalize like a copy if there is nothing to flush
...
Allows copy propagation into non alu instructions like phis.
Foz-DB Navi21:
Totals from 138 (0.17% of 79395) affected shaders:
Instrs: 308135 -> 307792 (-0.11%); split: -0.12%, +0.01%
CodeSize: 1567924 -> 1566484 (-0.09%); split: -0.10%, +0.01%
VGPRs: 9696 -> 9720 (+0.25%)
Latency: 1162719 -> 1161663 (-0.09%); split: -0.10%, +0.00%
InvThroughput: 256944 -> 256590 (-0.14%); split: -0.15%, +0.01%
VClause: 5631 -> 5626 (-0.09%); split: -0.14%, +0.05%
Copies: 29962 -> 30028 (+0.22%); split: -0.10%, +0.32%
Branches: 8241 -> 8237 (-0.05%)
PreVGPRs: 7800 -> 7797 (-0.04%)
VALU: 216243 -> 215898 (-0.16%); split: -0.17%, +0.01%
SALU: 30768 -> 30767 (-0.00%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32355 >
2024-11-27 15:14:31 +00:00
Rhys Perry
63b0692eac
aco: don't use uniform continues if exec might be empty
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31143 >
2024-11-25 10:32:59 +00:00
Rhys Perry
aa0ede751d
aco/tests: add tests for empty exec masks
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31143 >
2024-11-25 10:32:59 +00:00
Rhys Perry
f35e229fae
aco: skip code if exec is empty
...
This is safer and potentially faster.
fossil-db (navi21):
Totals from 690 (0.87% of 79395) affected shaders:
Instrs: 4534778 -> 4535916 (+0.03%)
CodeSize: 25268516 -> 25272080 (+0.01%); split: -0.00%, +0.01%
Latency: 48482721 -> 48513907 (+0.06%); split: -0.00%, +0.07%
InvThroughput: 13213965 -> 13217828 (+0.03%); split: -0.00%, +0.03%
Copies: 432307 -> 432295 (-0.00%); split: -0.05%, +0.04%
Branches: 187305 -> 188249 (+0.50%)
VALU: 2904490 -> 2904508 (+0.00%); split: -0.00%, +0.00%
SALU: 674962 -> 675133 (+0.03%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31143 >
2024-11-25 10:32:59 +00:00
Rhys Perry
f00c3a14c0
aco: require WQM after demote in control flow
...
fossil-db (navi21):
Totals from 424 (0.53% of 79395) affected shaders:
Instrs: 404496 -> 404752 (+0.06%); split: -0.07%, +0.13%
CodeSize: 2150608 -> 2151616 (+0.05%); split: -0.05%, +0.09%
Latency: 9124298 -> 9115957 (-0.09%); split: -0.12%, +0.03%
InvThroughput: 1883570 -> 1883468 (-0.01%); split: -0.01%, +0.00%
VClause: 6832 -> 6830 (-0.03%)
SClause: 13801 -> 13778 (-0.17%); split: -0.17%, +0.01%
Copies: 26758 -> 26673 (-0.32%); split: -0.44%, +0.12%
Branches: 9819 -> 9567 (-2.57%)
PreSGPRs: 17902 -> 17934 (+0.18%)
SALU: 45407 -> 45906 (+1.10%); split: -0.01%, +1.11%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31143 >
2024-11-25 10:32:59 +00:00
Rhys Perry
8a175b02bc
aco: use repair pass for LCSSA workaround
...
This makes instruction selection simpler and fixes potential issues with
allocated_vec or the optimizer moving SGPR uses out of the loop.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31143 >
2024-11-25 10:32:59 +00:00
Rhys Perry
5de990f5a9
aco: add SSA repair pass
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31143 >
2024-11-25 10:32:58 +00:00
Daniel Schürmann
bb87832ce0
aco/insert_NOPs: add early exit to handle_valu_partial_forwarding_hazard_instr
...
No need to continue if there was already a hazard found in
a different control flow path.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32191 >
2024-11-22 08:46:32 +00:00
Daniel Schürmann
07df37ba01
aco/insert_NOPs: use RegCounterMap as replacement for the CounterMap implementation
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32191 >
2024-11-22 08:46:32 +00:00
Daniel Schürmann
fb5e5adfb3
aco/insert_NOPs: implement vector-based RegCounterMap as replacement for VGPRCounterMap
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32191 >
2024-11-22 08:46:32 +00:00
Georg Lehmann
f3926c9d4e
aco/isel: use undef Operands for p_create_vector created from nir vecs
...
Foz-DB Navi31:
Totals from 27464 (34.59% of 79395) affected shaders:
Instrs: 9595601 -> 9535260 (-0.63%); split: -0.63%, +0.00%
CodeSize: 47900112 -> 47658648 (-0.50%); split: -0.50%, +0.00%
Latency: 43928471 -> 43918448 (-0.02%); split: -0.05%, +0.02%
InvThroughput: 4940105 -> 4903447 (-0.74%); split: -0.75%, +0.01%
Copies: 667294 -> 604603 (-9.39%); split: -9.39%, +0.00%
VALU: 5282264 -> 5219604 (-1.19%); split: -1.19%, +0.00%
VOPD: 342 -> 311 (-9.06%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32249 >
2024-11-21 14:09:52 +00:00
Daniel Schürmann
1ff9a0fe80
aco: remove Pseudo_instruction::tmp_in_scc
...
This information is redundant, now.
No fossil-changes.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32217 >
2024-11-20 11:04:32 +00:00
Daniel Schürmann
a1a4a6061c
aco/ra: explicitly assign scratch SGPR for linear phis
...
We are about to remove the branch definitions which previously
served this purpose. Also remove Block::scc_live_out.
Some changes due to round-robin RA.
Totals from 939 (1.18% of 79395) affected shaders: (Navi31)
Instrs: 5038786 -> 5038611 (-0.00%); split: -0.01%, +0.00%
CodeSize: 26153412 -> 26152904 (-0.00%); split: -0.00%, +0.00%
Latency: 41649989 -> 41650120 (+0.00%); split: -0.00%, +0.00%
InvThroughput: 6447508 -> 6447536 (+0.00%); split: -0.00%, +0.00%
SClause: 131319 -> 131276 (-0.03%); split: -0.03%, +0.00%
Copies: 359362 -> 359256 (-0.03%); split: -0.05%, +0.02%
SALU: 639275 -> 639169 (-0.02%); split: -0.03%, +0.01%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32217 >
2024-11-20 11:04:32 +00:00
Daniel Schürmann
17da551133
aco/ra: use bitset for sgpr_operands_alias_defs
...
We cannot rely on SGPR Temps being fully aligned to 64 SGPRs.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32217 >
2024-11-20 11:04:32 +00:00