Georg Lehmann
46c1bd1147
aco: add a dedicated pass for better float MODE insertion
...
Foz-DB Navi48:
Totals from 14 (0.02% of 80251) affected shaders:
Instrs: 13998 -> 11684 (-16.53%)
CodeSize: 104464 -> 86260 (-17.43%)
Latency: 108722 -> 106667 (-1.89%)
InvThroughput: 100332 -> 100324 (-0.01%)
VClause: 621 -> 595 (-4.19%); split: -4.99%, +0.81%
VALU: 6875 -> 6871 (-0.06%)
SALU: 3256 -> 1015 (-68.83%)
VOPD: 1328 -> 1332 (+0.30%)
Removes the s_setreg spam in FSR4.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35746>
2025-07-10 13:48:50 +00:00
Daniel Schürmann
610a19cf31
aco/isel: allow to select SGPR defs for vectorized bcsel and logical operations
...
No fossil changes.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35784>
2025-07-09 14:10:37 +00:00
Daniel Schürmann
d7477111d2
aco: split vectorized bcsel and bitwise logic VGPR definitions
...
This has a slightly negative effect on parallel-rdp, but positively affects FSR4.
Totals from 14 (0.02% of 79839) affected shaders: (Navi48)
Instrs: 63543 -> 63646 (+0.16%); split: -0.01%, +0.17%
CodeSize: 352888 -> 353608 (+0.20%); split: -0.02%, +0.23%
Latency: 1822354 -> 1825036 (+0.15%)
InvThroughput: 364683 -> 365738 (+0.29%); split: -0.04%, +0.32%
Copies: 9299 -> 9363 (+0.69%); split: -0.11%, +0.80%
PreVGPRs: 1381 -> 1394 (+0.94%)
VALU: 34511 -> 34575 (+0.19%); split: -0.03%, +0.21%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35784>
2025-07-09 14:10:36 +00:00
Daniel Schürmann
764ee3a834
radv: don't lower subdword phis to scalar
...
Totals from 193 (0.24% of 79839) affected shaders: (Navi48)
MaxWaves: 6004 -> 6024 (+0.33%)
Instrs: 169276 -> 166784 (-1.47%); split: -3.01%, +1.53%
CodeSize: 940608 -> 915768 (-2.64%); split: -4.29%, +1.64%
VGPRs: 8012 -> 7716 (-3.69%); split: -3.99%, +0.30%
SpillVGPRs: 185 -> 0 (-inf%)
Scratch: 13568 -> 0 (-inf%)
Latency: 2159787 -> 2147084 (-0.59%); split: -2.86%, +2.28%
InvThroughput: 664022 -> 395859 (-40.38%); split: -42.59%, +2.21%
VClause: 2998 -> 2880 (-3.94%); split: -4.27%, +0.33%
SClause: 3117 -> 3120 (+0.10%)
Copies: 21290 -> 16278 (-23.54%); split: -24.74%, +1.20%
Branches: 4757 -> 4760 (+0.06%); split: -0.34%, +0.40%
PreSGPRs: 7369 -> 7378 (+0.12%); split: -0.11%, +0.23%
PreVGPRs: 4257 -> 3859 (-9.35%); split: -9.94%, +0.59%
VALU: 83173 -> 79804 (-4.05%); split: -5.68%, +1.63%
SALU: 36672 -> 37318 (+1.76%); split: -0.02%, +1.78%
VMEM: 4012 -> 3762 (-6.23%); split: -6.83%, +0.60%
SMEM: 4300 -> 4303 (+0.07%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35784>
2025-07-09 14:10:36 +00:00
Daniel Schürmann
fc2fcac04e
aco: allow vectorized nir_op_mov
...
nir_lower_phis_to_scalar() can create these with the next commit.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35784>
2025-07-09 14:10:36 +00:00
Daniel Schürmann
3f35b1329e
aco: allow subdword vector-definitions on some VALU instructions
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35784>
2025-07-09 14:10:36 +00:00
Daniel Schürmann
025306a95d
aco/isel: refactor emission of bitwise logical operations
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35784>
2025-07-09 14:10:36 +00:00
Georg Lehmann
9e8ba10447
aco/vn: remove dead instructions early
...
Dead p_create_vector/p_split_vector left behind by instruction selection slow down
the other passes and negatively affect extract labels in aco_optimizer.
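To make "remove dead instructions early" concrete, here is a minimal sketch of a generic backwards dead-code sweep. The toy IR (`struct toy_instr`, `remove_dead`) is hypothetical and is not ACO's representation or the code in aco/vn. It only shows why an unused split feeding off an unused create_vector can disappear in a single reverse pass.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy IR: instruction i defines SSA value i and reads up to two
 * earlier values (-1 = unused operand slot). This is NOT ACO's data layout. */
struct toy_instr {
   int ops[2];
   bool has_side_effects; /* e.g. stores: never removable */
   bool dead;
};

/* Marks instructions whose results are never used as dead and returns how
 * many were removed. `uses` must have room for n ints. Walking backwards
 * means a removed instruction's operands lose a use before their (earlier)
 * producers are visited, so whole dead chains go in one pass. */
static size_t remove_dead(struct toy_instr *instrs, size_t n, int *uses)
{
   for (size_t i = 0; i < n; i++) {
      uses[i] = 0;
      instrs[i].dead = false;
   }
   for (size_t i = 0; i < n; i++) {
      for (int k = 0; k < 2; k++) {
         int op = instrs[i].ops[k];
         if (op >= 0) {
            assert((size_t)op < i); /* SSA: operands are defined earlier */
            uses[op]++;
         }
      }
   }

   size_t removed = 0;
   for (size_t i = n; i-- > 0;) {
      if (instrs[i].has_side_effects || uses[i] > 0)
         continue;
      instrs[i].dead = true;
      removed++;
      for (int k = 0; k < 2; k++)
         if (instrs[i].ops[k] >= 0)
            uses[instrs[i].ops[k]]--;
   }
   return removed;
}
```

The reverse walk is the design choice that matters: since operands are always defined earlier, every user of a value has already been considered by the time its producer is visited.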
Foz-DB GFX1201:
Totals from 964 (1.20% of 80251) affected shaders:
MaxWaves: 29206 -> 29030 (-0.60%); split: +0.08%, -0.68%
Instrs: 669369 -> 668842 (-0.08%); split: -0.16%, +0.09%
CodeSize: 3385192 -> 3383216 (-0.06%); split: -0.13%, +0.07%
VGPRs: 46788 -> 46848 (+0.13%); split: -0.85%, +0.97%
Latency: 3985660 -> 3892742 (-2.33%); split: -2.54%, +0.21%
InvThroughput: 538296 -> 536761 (-0.29%); split: -0.38%, +0.10%
VClause: 8336 -> 8418 (+0.98%); split: -0.17%, +1.15%
SClause: 17111 -> 17120 (+0.05%); split: -0.20%, +0.25%
Copies: 44393 -> 44239 (-0.35%); split: -1.25%, +0.91%
PreSGPRs: 45417 -> 45419 (+0.00%)
PreVGPRs: 30401 -> 31644 (+4.09%); split: -0.00%, +4.09%
VALU: 348282 -> 348167 (-0.03%); split: -0.15%, +0.12%
SALU: 121454 -> 121410 (-0.04%); split: -0.04%, +0.01%
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35825>
2025-07-09 07:23:09 +00:00
Georg Lehmann
82af226690
aco: remove unused swap_srcs from emit_vop3p_instruction
...
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35825>
2025-07-09 07:23:09 +00:00
Georg Lehmann
96793fb0c1
aco/isel: implement 16bit vec2 shifts
...
The source bit size mismatch is a bit annoying, but it's still worth it to
vectorize these.
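A sketch of what a vectorized 16-bit shift computes, assuming the "bit size mismatch" refers to NIR shift counts being 32-bit while the packed operation wants two 16-bit counts in one register. The helper names are hypothetical, and taking each count modulo 16 follows NIR's usual shift semantics (count masked to the bit size minus one). This is not ACO's isel code.

```c
#include <assert.h>
#include <stdint.h>

/* Packed 16-bit left shift: each 16-bit half of `value` is shifted by the
 * matching half of `amounts`, using only the low 4 bits of each count. */
static uint32_t pk_shl_u16(uint32_t value, uint32_t amounts)
{
   uint32_t result = 0;
   for (int lane = 0; lane < 2; lane++) {
      uint16_t v = (uint16_t)(value >> (16 * lane));
      unsigned s = (amounts >> (16 * lane)) & 0xfu;
      uint16_t r = (uint16_t)(v << s); /* bits shifted past bit 15 are lost */
      result |= (uint32_t)r << (16 * lane);
   }
   return result;
}

/* The two 32-bit NIR shift counts have to be narrowed and packed first. */
static uint32_t pack_shift_amounts(uint32_t x_amount, uint32_t y_amount)
{
   return (x_amount & 0xffffu) | ((y_amount & 0xffffu) << 16);
}
```

The extra packing step is the cost of the mismatch; vectorizing still replaces two separate 16-bit shifts with one packed operation.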
Foz-DB Navi48:
Totals from 85 (0.11% of 80251) affected shaders:
Instrs: 119073 -> 118827 (-0.21%); split: -0.21%, +0.00%
CodeSize: 669604 -> 667552 (-0.31%); split: -0.31%, +0.00%
VGPRs: 4796 -> 4736 (-1.25%)
Latency: 1907685 -> 1901983 (-0.30%); split: -0.32%, +0.02%
InvThroughput: 642603 -> 640680 (-0.30%); split: -0.33%, +0.03%
VClause: 2088 -> 2091 (+0.14%)
Copies: 18300 -> 18394 (+0.51%); split: -0.01%, +0.52%
Branches: 3452 -> 3440 (-0.35%)
VALU: 63378 -> 63144 (-0.37%); split: -0.37%, +0.00%
SALU: 23065 -> 23076 (+0.05%); split: -0.00%, +0.05%
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35825>
2025-07-09 07:23:08 +00:00
Daniel Schürmann
2c51a8870d
nir: add nir_vectorize_cb callback parameter to nir_lower_phis_to_scalar()
...
Similar to nir_lower_alu_width(), the callback can return the
desired number of components for a phi, or 0 for no lowering.
The previous behavior of nir_lower_phis_to_scalar() with lower_all=true
can be elicited via nir_lower_all_phis_to_scalar() while the previous
behavior with lower_all=false now corresponds to nir_lower_phis_to_scalar()
with NULL callback.
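A minimal sketch of the callback contract described above. `toy_phi`, `toy_vectorize_cb`, `keep_16bit_pairs` and `phis_after_lowering` are hypothetical stand-ins. A real callback would have the nir_vectorize_cb shape (an instruction plus an opaque data pointer, returning a component count) and inspect a nir_phi_instr. The NULL-callback behaviour is deliberately not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a phi; real code would look at a nir_phi_instr. */
struct toy_phi {
   unsigned bit_size;
   unsigned num_components;
};

/* Desired number of components per resulting phi, or 0 for no lowering. */
typedef uint8_t (*toy_vectorize_cb)(const struct toy_phi *phi, const void *data);

/* Example policy: keep 16-bit phis as vec2 (one 32-bit register), scalarize
 * everything else. */
static uint8_t keep_16bit_pairs(const struct toy_phi *phi, const void *data)
{
   (void)data;
   return phi->bit_size == 16 ? 2 : 1;
}

/* How many phis a vector phi becomes under the callback's answer. */
static unsigned phis_after_lowering(const struct toy_phi *phi, toy_vectorize_cb cb,
                                    const void *data)
{
   assert(cb != NULL);
   unsigned width = cb(phi, data);
   if (width == 0 || width >= phi->num_components)
      return 1; /* left as a single phi */
   return (phi->num_components + width - 1) / width;
}
```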
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35783>
2025-07-08 15:33:59 +00:00
Rhys Perry
34f1a8f707
aco: handle FPAtomicToDenormModeHazard
...
This is quite unlikely to happen, but I guess it might be possible and
it's relatively simple to work around.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35884>
2025-07-07 13:02:43 +00:00
Marek Olšák
4263b49778
ac/nir: remove ngg_scratch LDS ABI, allocate it in the lowering pass
...
This is a cleanup.
Old gs LDS layout: [es outputs][gs outputs][scratch]
Old nogs LDS layout: [xfb/cull][scratch]
New gs LDS layout: [es outputs][scratch|gs outputs]
New nogs LDS layout: [scratch|xfb/cull]
The LDS scratch is moved to the beginning of the preceding buffer in LDS,
while the addresses in that LDS buffer are offset by the scratch size.
It effectively merges the LDS scratch with the preceding buffer in LDS.
Thanks to that, we no longer need the ngg_scratch ABI and the offset
in a user SGPR.
The lowering passes now return the LDS scratch size, which is used
by the drivers to determine the final LDS size.
The ngg_lds_layout SGPR is now unused without GS in RADV.
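The new layouts can be illustrated with a small offset calculation. This is only a sketch: `layout_ngg_lds` and its struct are hypothetical, alignment requirements are ignored, and the sizes are arbitrary. It shows the scratch area taking the start of the buffer it used to follow, that buffer's addresses shifting by the scratch size, and the total a driver would size LDS from.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_lds_layout {
   unsigned es_outputs;  /* offset of ES outputs (only meaningful with GS) */
   unsigned scratch;     /* offset of the NGG scratch area */
   unsigned main_buffer; /* offset of GS outputs, or xfb/cull without GS */
   unsigned total;       /* final LDS size */
};

/* gs:   [es outputs][scratch|gs outputs]
 * nogs: [scratch|xfb/cull] */
static struct toy_lds_layout layout_ngg_lds(bool has_gs, unsigned es_size,
                                            unsigned scratch_size, unsigned buffer_size)
{
   struct toy_lds_layout l = {0};
   unsigned base = has_gs ? es_size : 0;
   l.es_outputs = 0;
   l.scratch = base;                    /* scratch takes the start of the buffer */
   l.main_buffer = base + scratch_size; /* buffer addresses shift by the scratch size */
   l.total = l.main_buffer + buffer_size;
   return l;
}
```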
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35352>
2025-07-02 20:27:41 +00:00
Rhys Perry
dce1d4ad4c
aco/ra: fix repeated compact_linear_vgprs() in get_reg()
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: b7738de4f9 ("aco/ra: rework linear VGPR allocation")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13431
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35838>
2025-07-02 09:26:04 +00:00
Rhys Perry
21c4400278
aco: update ctx.block when inserting discard block
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13432
Backport-to: 25.1
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35833>
2025-07-01 14:31:11 +00:00
Alyssa Rosenzweig
67237b6f1b
treewide: use nir_break_if
...
Via Coccinelle patch:
@@
expression builder, condition;
@@
-nir_push_if(builder, condition);
-{
-nir_jump(builder, nir_jump_break);
-}
-nir_pop_if(builder, NULL);
+nir_break_if(builder, condition);
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35794>
2025-06-30 14:51:24 -04:00
Natalie Vock
af86cc37d5
aco/spill: Don't spill scratch_rsrc-related temps
...
These temps are used to create the scratch_rsrc. Spilling them will
never benefit anything, because assign_spill_slots will insert code
that keeps them live. Since the spiller assumes that all spilled
variables are dead, this can cause more variables to be live than
intended and make spilling fail.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35031>
2025-06-26 11:02:53 +00:00
Natalie Vock
acf29e403a
aco/spill: Add a null scratch offset if no scratch_offset arg exists
...
Function callees' scratch_rsrc comes with the scratch offset
pre-applied.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35031>
2025-06-26 11:02:53 +00:00
Natalie Vock
630913e1b4
aco: Introduce static_scratch_rsrc program member
...
Function callees get their scratch resource as a parameter instead of
generating it on-the-fly.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35031>
2025-06-26 11:02:53 +00:00
Natalie Vock
e006f68b11
aco/isel: Don't add scratch offset as gfx8- soffset if no offsets exist
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35031>
2025-06-26 11:02:53 +00:00
Natalie Vock
a5eba11657
aco/isel: Use stack pointer parameter in load/store_scratch
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35031>
2025-06-26 11:02:53 +00:00
Natalie Vock
4a62b342f3
aco: Add common utility to load scratch descriptor
...
Also modifies the scratch descriptor to take the stack pointer into
account.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35031>
2025-06-26 11:02:52 +00:00
Natalie Vock
cd2caa5e2b
aco/spill: Use scratch stack pointer
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35031>
2025-06-26 11:02:52 +00:00
Natalie Vock
22624d6f12
aco: Add scratch stack pointer
...
Function callees shouldn't overwrite their callers' stacks.
Track where to write scratch data with a stack pointer.
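A sketch of the stack-pointer idea under stated assumptions: scratch slots are 4 bytes, addresses are relative to a per-invocation stack pointer, and a callee's stack pointer is the caller's stack pointer plus the caller's frame size rounded up to a hypothetical alignment. None of these names come from ACO. The point is only that caller and callee frames cannot overlap.

```c
#include <assert.h>
#include <stdint.h>

/* Scratch address of a 4-byte spill slot relative to a stack pointer. */
static uint32_t scratch_addr(uint32_t stack_pointer, uint32_t slot)
{
   return stack_pointer + slot * 4u;
}

/* The callee's stack starts after the caller's frame, rounded up to
 * `align` (assumed to be a power of two). */
static uint32_t callee_stack_pointer(uint32_t caller_sp, uint32_t caller_frame_bytes,
                                     uint32_t align)
{
   assert(align != 0 && (align & (align - 1)) == 0);
   uint32_t end = caller_sp + caller_frame_bytes;
   return (end + align - 1) & ~(align - 1);
}
```

With a 20-byte caller frame starting at 0 and 16-byte alignment, the callee's slots start at 32, past the caller's last slot at bytes 16..19.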
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35031>
2025-06-26 11:02:52 +00:00
Natalie Vock
be89c02be5
aco: Add pseudo instr to calculate a function callee's stack pointer
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35031>
2025-06-26 11:02:52 +00:00
Daniel Schürmann
7620957193
aco/ra: always set fill_operands=true when handling operands
...
This makes the behavior consistent and less prone to error.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35735>
2025-06-26 10:05:07 +00:00
Daniel Schürmann
ee8424d839
aco/ra: always fill moved operands when handling vector-operands
...
update_renames() assumes that killed operands are already removed from
the register file, except for precolored and copy-kill operands.
When dealing with vector-operands, however, unrelated operands might
also be moved, in order to make space.
Fixes: fb689f133e ('aco/ra: handle register assignment of vector-aligned operands')
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35735>
2025-06-26 10:05:07 +00:00
Samuel Pitoiset
e91029c82d
aco: consider that nir_tex_src_{coord,ddx} can be the first source
...
Only -1 means the source was not found; index 0 is still valid.
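The bug class is easy to reproduce in isolation. A minimal sketch, assuming a lookup with the same convention as nir_tex_instr_src_index() (the index of the source, or -1). `find_src`, `has_src` and `has_src_wrong` are hypothetical helpers, and the `> 0` check illustrates the mistake rather than quoting the original ACO code.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_src_type { TOY_SRC_COORD, TOY_SRC_DDX, TOY_SRC_LOD };

/* Index of `type` in `srcs`, or -1 if it is absent. */
static int find_src(const enum toy_src_type *srcs, int num_srcs, enum toy_src_type type)
{
   for (int i = 0; i < num_srcs; i++)
      if (srcs[i] == type)
         return i;
   return -1;
}

/* Correct: only -1 means "not found"; 0 is the first source. */
static bool has_src(int idx)
{
   return idx >= 0;
}

/* Illustrative mistake: silently ignores a source stored at index 0. */
static bool has_src_wrong(int idx)
{
   return idx > 0;
}
```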
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35736>
2025-06-25 17:20:02 +00:00
Georg Lehmann
01d20680e2
aco/optimizer: generalize p_create_vector of split vector opt
...
Foz-DB Navi48:
Totals from 116 (0.14% of 80251) affected shaders:
MaxWaves: 2965 -> 2972 (+0.24%)
Instrs: 145933 -> 144632 (-0.89%); split: -0.91%, +0.02%
CodeSize: 815968 -> 806512 (-1.16%); split: -1.20%, +0.04%
VGPRs: 7240 -> 7144 (-1.33%); split: -1.66%, +0.33%
Latency: 3065858 -> 3063802 (-0.07%); split: -0.11%, +0.05%
InvThroughput: 745395 -> 743506 (-0.25%); split: -0.26%, +0.01%
VClause: 3702 -> 3694 (-0.22%); split: -0.65%, +0.43%
SClause: 3187 -> 3191 (+0.13%)
Copies: 12716 -> 11804 (-7.17%); split: -7.42%, +0.25%
Branches: 3501 -> 3503 (+0.06%)
PreVGPRs: 5400 -> 5327 (-1.35%); split: -1.41%, +0.06%
VALU: 76455 -> 75492 (-1.26%); split: -1.30%, +0.04%
SALU: 23594 -> 23595 (+0.00%); split: -0.00%, +0.01%
VOPD: 1478 -> 1527 (+3.32%); split: +4.67%, -1.35%
Mostly helps FSR4.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35674>
2025-06-25 11:03:30 +00:00
Georg Lehmann
001cd632ee
aco: select float8 to fp32 conversions
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35434>
2025-06-23 07:59:27 +00:00
Georg Lehmann
19ca4be6b0
aco/isel: fix get_alu_src with 8bit vec2 source
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35434>
2025-06-23 07:59:27 +00:00
Georg Lehmann
f047a67fba
nir,aco: optimize FP16_OVFL pattern created by vkd3d-proton
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35434>
2025-06-23 07:59:27 +00:00
Georg Lehmann
9e6adcbca0
aco: select fp32 to float8 conversions
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35434>
2025-06-23 07:59:26 +00:00
Georg Lehmann
3a45802514
aco/lower_to_hw: support saturating fp8 conversions
...
Sadly, AMD only made this behavior controllable through global state.
We add a new pseudo opcode for this purpose and change FP16_OVFL
around each instruction. Ideally we would only do it once per clause
and after ILP scheduling, but this can be improved in the future.
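To illustrate why toggling per instruction is wasteful, here is a sketch comparing it with writing the mode only when it actually changes. It assumes a naive scheme of one write before and one restore after each saturating instruction, and a default mode of off. The counts stand in for s_setreg-style writes; the function names are hypothetical and the numbers are not ACO measurements.

```c
#include <assert.h>
#include <stddef.h>

/* needs[i]: 1 = instruction wants FP16_OVFL on (saturating fp8 conversion),
 * 0 = wants it off, -1 = doesn't care. */

/* Naive: set the mode before every saturating instruction and restore the
 * default right after it (two writes each). */
static size_t naive_writes(const int *needs, size_t n)
{
   size_t writes = 0;
   for (size_t i = 0; i < n; i++)
      if (needs[i] == 1)
         writes += 2;
   return writes;
}

/* Tracked: remember the current mode, write only on an actual change, and
 * restore the default (off) once at the end if needed. */
static size_t tracked_writes(const int *needs, size_t n)
{
   size_t writes = 0;
   int cur = 0;
   for (size_t i = 0; i < n; i++) {
      if (needs[i] < 0 || needs[i] == cur)
         continue;
      cur = needs[i];
      writes++;
   }
   if (cur != 0)
      writes++;
   return writes;
}
```

For a clause of four saturating conversions this drops from eight writes to two; interleaving conversions with mode-sensitive instructions erodes the gain, which is one reason scheduling order matters here.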
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35434>
2025-06-23 07:59:25 +00:00
Georg Lehmann
65650cfef8
aco: emit float8 wmma
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35434>
2025-06-23 07:59:25 +00:00
Rhys Perry
325dfd809a
radv,aco: switch to shader statistics framework
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Gitlab: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12756
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35583>
2025-06-20 09:26:58 +00:00
Rhys Perry
2cfd2d3b1d
aco/tests: add lower_branches tests
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35202>
2025-06-19 10:58:39 +00:00
Rhys Perry
c45482e652
aco: validate that preds/succs match
...
This isn't done in validate_cfg() because that's called less frequently.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35202>
2025-06-19 10:58:39 +00:00
Rhys Perry
85db025cd7
aco: continue when try_remove_simple_block can't remove a predecessor
...
We should update linear_preds so that the predecessors we can remove are
actually removed.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35202>
2025-06-19 10:58:38 +00:00
Rhys Perry
5344abbc56
aco/lower_branches: keep blocks with multiple logical successors
...
It might be the case that both the branch and exec mask write in a
divergent branch block are removed. try_remove_simple_block() might then
try to remove it, but fail because it has multiple logical successors.
Instead, just skip these blocks.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Backport-to: 25.1
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35202>
2025-06-19 10:58:38 +00:00
Georg Lehmann
001fe8c236
aco: optimize boolean phi with empty else block
...
We can keep the else empty by handling the phi in the "then" block.
Foz-DB Navi21:
Totals from 921 (1.15% of 80065) affected shaders:
Instrs: 4532598 -> 4527309 (-0.12%); split: -0.12%, +0.00%
CodeSize: 24498484 -> 24481780 (-0.07%); split: -0.08%, +0.01%
Latency: 41016915 -> 41020477 (+0.01%); split: -0.10%, +0.11%
InvThroughput: 9998405 -> 9991873 (-0.07%); split: -0.08%, +0.02%
SClause: 128261 -> 128267 (+0.00%)
Copies: 409949 -> 408585 (-0.33%); split: -0.36%, +0.02%
Branches: 169740 -> 169222 (-0.31%); split: -0.58%, +0.27%
PreSGPRs: 64408 -> 64398 (-0.02%)
VALU: 2972521 -> 2972518 (-0.00%)
SALU: 673844 -> 668973 (-0.72%); split: -0.72%, +0.00%
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35165>
2025-06-19 07:32:43 +00:00
Georg Lehmann
88753ddd1d
aco: allow nir divergence to be printed again
...
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32990>
2025-06-19 07:02:20 +00:00
Samuel Pitoiset
d23de4918e
aco: add support for image f32 atomic add
...
It's supported on GFX12.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35493>
2025-06-13 08:47:59 +00:00
Pierre-Eric Pelloux-Prayer
3bcbd11a33
aco/isel: fix visit_tex handling of is_sparse
...
For cases where fewer than 4 components are read, the original code
would compute an incorrect dmask. E.g. with a single component + is_sparse,
the dmask was 0x13:
- 0x03 = coming from nir_def_components_read
- 0x10 = the sparse bit
while it should have had only 2 bits set (1 for the color/depth, 1 for tfe).
This caused a problem when expand_vector() used the dmask to generate
the final results, because the value for the sparse component was
read from the wrong index.
So after the call to emit_mimg(), dmask needs to be adjusted,
because the components are stored in order: if the mask is 0x11,
the tfe value would be stored at invalid index=5 (while it should
be at index=1).
This fixes KHR-GL46.sparse_texture_clamp_tests.SparseTextureClampLookupResidency_texture_2d_depth_component16
and KHR-GL46.sparse_texture2_tests.SparseTexture2Lookup_texture_2d_depth_component16
with ACO.
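A small model of the corrected dmask and TFE index computation. It assumes, as the example above implies, that a single-component sparse fetch has a two-component result with the residency code last, and that the enabled channels are written in order followed by the TFE value. The helper names are hypothetical, not ACO functions.

```c
#include <assert.h>

/* dmask for the real channels only: `read_mask` covers the whole result,
 * whose last component (index `num_channels`) is the residency code. */
static unsigned channel_dmask(unsigned read_mask, unsigned num_channels)
{
   assert(num_channels >= 1 && num_channels <= 4);
   return read_mask & ((1u << num_channels) - 1u);
}

/* Channels come back packed in order with the TFE value after them, so
 * the TFE value's index is the number of enabled channels. */
static unsigned tfe_index(unsigned dmask)
{
   unsigned count = 0;
   for (unsigned bits = dmask & 0xfu; bits; bits &= bits - 1u)
      count++;
   return count;
}
```

For the single-component case, a read mask of 0x3 yields a channel dmask of 0x1 and a TFE index of 1, matching the expected index above.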
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35206>
2025-06-11 12:11:28 +00:00
Georg Lehmann
f36ac8434c
aco: add a readme entry for v_pk_cvt_u8_f32
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35391>
2025-06-10 07:32:05 +00:00
Georg Lehmann
94c191e6d9
aco: remove p_v_cvt_pk_u8_f32
...
Now unused.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35391>
2025-06-10 07:32:04 +00:00
Georg Lehmann
d95e90ab5f
aco: do not use v_cvt_pk_u8_f32 for f2u8
...
The ISA docs don't mention this, but instead of always truncating
like other integer conversions, this opcode actually uses the single
precision rounding mode.
We could continue to use the opcode and set the rounding mode to rtz
in lower_to_hw_instrs, but I think I should just concede that f2u8
isn't worth the effort.
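The difference is easy to see on a value like 2.7. This sketch contrasts truncation with round-to-nearest-even. That the relevant single precision rounding mode is the default round-to-nearest-even is an assumption, and both helpers are hypothetical and only valid for in-range inputs.

```c
#include <assert.h>
#include <stdint.h>

/* f2u8 with truncation toward zero, like other float-to-int conversions. */
static uint8_t f2u8_trunc(float x)
{
   assert(x >= 0.0f && x < 256.0f);
   return (uint8_t)x;
}

/* The same conversion under a round-to-nearest-even mode instead. */
static uint8_t f2u8_round_nearest_even(float x)
{
   assert(x >= 0.0f && x < 255.5f);
   uint32_t r = (uint32_t)x; /* floor, since x >= 0 */
   float frac = x - (float)r;
   if (frac > 0.5f || (frac == 0.5f && (r & 1u)))
      r++;
   return (uint8_t)r;
}
```

Forcing RTZ around the instruction would make the two agree, which is the option the commit judged not worth the effort for f2u8.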
Fixes: 9bb10b58 ("aco: use v_cvt_pk_u8_f32 for f2u8")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35391>
2025-06-10 07:32:04 +00:00
Natalie Vock
a28515f096
aco/opt: Rename loop header phis
...
Fossil stats on top of !35269:
Totals from 133 (0.16% of 81077) affected shaders:
Instrs: 4328456 -> 4327891 (-0.01%)
CodeSize: 22890004 -> 22887732 (-0.01%); split: -0.01%, +0.00%
Latency: 28406452 -> 28404732 (-0.01%)
InvThroughput: 5361458 -> 5361153 (-0.01%)
Copies: 376788 -> 376222 (-0.15%)
VALU: 2429210 -> 2428645 (-0.02%)
VOPD: 57 -> 56 (-1.75%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35270>
2025-06-09 14:36:44 +00:00
Rhys Perry
00dd0d0dd1
aco: update VALUReadSGPRHazard comment
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35387>
2025-06-09 10:12:25 +00:00
Rhys Perry
a714a19e16
aco/gfx12: fix VALUReadSGPRHazard with carry-out
...
fossil-db (gfx1201):
Totals from 370 (0.46% of 79653) affected shaders:
Instrs: 3933639 -> 3935914 (+0.06%)
CodeSize: 20743448 -> 20752068 (+0.04%); split: -0.00%, +0.04%
Latency: 26261246 -> 26261921 (+0.00%); split: -0.00%, +0.00%
InvThroughput: 5363675 -> 5363760 (+0.00%); split: -0.00%, +0.00%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: 65f95ae74e ("aco/insert_NOPs: implement VALU -> VALU case for VALUReadSGPRHazard on GFX12")
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35387>
2025-06-09 10:12:25 +00:00