Qiang Yu
9c63138ae3
aco: stop emit s_endpgm for first stage of merged shader
...
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25631 >
2023-10-26 02:01:49 +00:00
Qiang Yu
14022a3a0e
aco: move end program handling to select_shader
...
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25631 >
2023-10-26 02:01:49 +00:00
Qiang Yu
f3f2311d69
aco: extend max operands in a instruction to 128
...
We get more than 64 operands in p_end_with_regs when radeonsi.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25631 >
2023-10-26 02:01:49 +00:00
Qiang Yu
e2af0b0b3f
aco: add create_end_for_merged_shader
...
For radeonsi merged shader LS/ES part to pass args to next
stage.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25631 >
2023-10-26 02:01:49 +00:00
Qiang Yu
71fd3c2a35
aco: do not fix_exports when separately compiled ngg vs or es
...
For radeonsi not abort when this case, as it does not jump at
last.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25631 >
2023-10-26 02:01:49 +00:00
Rhys Perry
4c3677094e
aco,nir: add export_row_amd intrinsic
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25040 >
2023-10-24 21:36:06 +00:00
Bas Nieuwenhuizen
d8458c0559
aco: Make RA understand WMMA instructions.
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24683 >
2023-10-24 13:24:18 +00:00
Bas Nieuwenhuizen
5e7c828c0e
aco: Add WMMA instructions.
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24683 >
2023-10-24 13:24:18 +00:00
Samuel Pitoiset
f0abdaea9f
amd/llvm,aco,radv: implement NGG streamout with GDS_STRMOUT registers on GFX11
...
According to RadeonSI, this is required for preemption, user queues,
and we only have to wait for VS after streamout which should be more
performant.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25284 >
2023-10-12 16:58:22 +00:00
Rhys Perry
15a3515d0b
aco/tests: test that hazards are resolved at the end of shader parts
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25374 >
2023-10-11 15:14:04 +00:00
Rhys Perry
690cb0b211
aco: resolve all possible hazards at the end of shader parts
...
fossil-db (vega10):
Totals from 1266 (2.01% of 63055) affected shaders:
Instrs: 707116 -> 708382 (+0.18%)
CodeSize: 3512452 -> 3517516 (+0.14%)
Latency: 6661724 -> 6666788 (+0.08%)
InvThroughput: 4393626 -> 4393904 (+0.01%); split: -0.00%, +0.01%
fossil-db (navi10):
Totals from 1305 (2.07% of 63015) affected shaders:
Instrs: 719699 -> 722009 (+0.32%)
CodeSize: 3650836 -> 3660076 (+0.25%)
Latency: 5691633 -> 5693933 (+0.04%)
InvThroughput: 1532010 -> 1532024 (+0.00%); split: -0.00%, +0.00%
fossil-db (navi31):
Totals from 1580 (1.99% of 79332) affected shaders:
Instrs: 1678242 -> 1679879 (+0.10%)
CodeSize: 8463464 -> 8470168 (+0.08%)
Latency: 14273661 -> 14275298 (+0.01%)
InvThroughput: 3668049 -> 3668080 (+0.00%); split: -0.00%, +0.00%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25374 >
2023-10-11 15:14:04 +00:00
Rhys Perry
e4842c0270
aco: consider exec_hi in reads_exec()
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25374 >
2023-10-11 15:14:04 +00:00
Rhys Perry
ed3ca5b781
aco: fix s_setreg hazards
...
s_setreg doesn't have any definitions.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25374 >
2023-10-11 15:14:04 +00:00
Rhys Perry
ce27875f09
aco: only mitigate VcmpxExecWARHazard when necessary
...
fossil-db (navi10):
Totals from 5059 (8.03% of 63015) affected shaders:
Instrs: 7384947 -> 7351196 (-0.46%)
CodeSize: 39393180 -> 39299196 (-0.24%); split: -0.28%, +0.04%
Latency: 119683018 -> 119585224 (-0.08%); split: -0.08%, +0.00%
InvThroughput: 29647188 -> 29623895 (-0.08%); split: -0.08%, +0.00%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25374 >
2023-10-11 15:14:04 +00:00
Rhys Perry
a73f76750b
aco: fix LdsDirectVMEMHazard WaW with the wrong waitcnt
...
Seems we missed this case.
fossil-db (navi31):
Totals from 24 (0.03% of 79332) affected shaders:
Instrs: 3562 -> 3538 (-0.67%)
CodeSize: 18740 -> 18644 (-0.51%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: 2cdb3e4b6b ("aco: add VMEMtoScalarWriteHazard tests")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25374 >
2023-10-11 15:14:04 +00:00
Alyssa Rosenzweig
c39896b17b
nir: Use getters for nir_src::parent_*
...
First, we need to give the parent_instr field a unique name to be able to
replace with a helper. We have parent_instr fields for both nir_src and
nir_def, so let's rename nir_src::parent_instr in preparation for rework.
This was done with a combination of sed and manual fix-ups.
Then we use semantic patches plus manual fixups:
@@
expression s;
@@
-s->renamed_parent_instr
+nir_src_parent_instr(s)
@@
expression s;
@@
-s.renamed_parent_instr
+nir_src_parent_instr(&s)
@@
expression s;
@@
-s->parent_if
+nir_src_parent_if(s)
@@
expression s;
@@
-s.renamed_parent_if
+nir_src_parent_if(&s)
@@
expression s;
@@
-s->is_if
+nir_src_is_if(s)
@@
expression s;
@@
-s.is_if
+nir_src_is_if(&s)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24671 >
2023-10-10 04:58:05 -04:00
Qiang Yu
5ef7c54829
aco: wait memory ops done before go to next shader part
...
Next part don't know whether p_end_with_regs args are loaded from
memory ops or not, need to wait it's done here.
Other memory load needs to be waited too like:
a = load_mem()
b = ...
if (...) {
wait_mem(a)
store_mem(a)
}
p_end_with_regs(b)
"a" still needs to be waited, otherwise next shader part regs may
be overwritten by unfinished memory loads.
Memory stores are waited too. When >=gfx10 and last VGT has no
parameter export, we need to wait all memeory stores done before
pos export (see ac_nir_export_position). So when merged shader
(ES+GS or VS+GS) is partially built, first stage needs to wait
all memory stores done, otherwise second stage don't know if
any memory stores pending before.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Signe-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24973 >
2023-10-10 02:36:34 +00:00
Qiang Yu
5ba68f92b4
aco: create exit block for p_end_with_regs to branch to
...
To handle ps discard in radeonsi part mode shader.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24973 >
2023-10-10 02:36:34 +00:00
Qiang Yu
bf25a7f59b
aco: fix assertion fail when program contains empty block
...
radeonsi may generate empty main shader or an empty exit block
for p_end_with_regs to jump to.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24973 >
2023-10-10 02:36:34 +00:00
Qiang Yu
6eb0910d45
aco: do not fix_exports when program has epilog
...
PS with epilog does not need to fix_exports. And radeonsi use
p_end_with_regs so does not have jump instruction at last.
radeonsi may also have exec restore instruction, so may break
before reach to p_end_with_regs.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24973 >
2023-10-10 02:36:34 +00:00
Qiang Yu
f97a701d89
aco,radv,radeonsi: pass spi ps input ena and addr
...
radeonsi may pass different ena and addr when part mode shader.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24973 >
2023-10-10 02:36:34 +00:00
Qiang Yu
c9c18d3da5
aco: compact ps expilog color export for radeonsi
...
radeonsi need to compact color export for ps epilog while radv does not.
radv will fill empty color slot, so won't affected by this change.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24973 >
2023-10-10 02:36:34 +00:00
Qiang Yu
1517aa7a8a
aco,radv: add radeonsi spec ps epilog code
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24973 >
2023-10-10 02:36:34 +00:00
Qiang Yu
9972a385fb
aco: simplify export_fs_mrt_color
...
It's now used by ps epilog only.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24973 >
2023-10-10 02:36:34 +00:00
Qiang Yu
77d5966661
aco,radv: rename ps epilog info inputs to colors
...
Will add other mrtz args for radeonsi.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24973 >
2023-10-10 02:36:34 +00:00
Qiang Yu
d8fa106c17
aco,radv: remove unused ps epilog info fields
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24973 >
2023-10-10 02:36:34 +00:00
Qiang Yu
57b0f19582
aco: add create_fs_end_for_epilog for radeonsi
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24973 >
2023-10-10 02:36:33 +00:00
Qiang Yu
90c901a987
aco: handle ps outputs from radeonsi
...
radeonsi will keep outputs <FRAG_RESULT_DATA0.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24973 >
2023-10-10 02:36:33 +00:00
Qiang Yu
49250f9fc5
aco: add ps prolog generation for radeonsi
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24973 >
2023-10-10 02:36:33 +00:00
Qiang Yu
67244fc88a
aco: remove p_end_with_regs from needs_exact()
...
ps needs to handle wqm:
1. main part may compute with args from prolog in wqm mode, so
prolog need to compute these args in wqm mode too.
2. prolog and main part need to end with exact exec, so next
shader part which inherit previous shader part's exec won't
do valid job for helper threads
1 need p_end_with_regs to operate in wqm mode and itself can't
be exact, otherwise some move instruction added by it won't be
in wqm mode so helper threads' compute result is not passed to
next shader part as args.
2 is done by p_end_wqm added by finish_program automatically
after p_end_with_regs.
Piglit tests can trigger the problem:
1. gl-2.1-polygon-stipple-fs
a. ps prolog call discard_if
b. ps main pass wqm exec to epilog
c. ps epilog export color for discarded pixel
2. fs-fwidth-color.shader_test
a. ps prolog need to pass args computed in wqm mode
b. set p_end_with_regs to exact will end wqm mode before
the move instructions, so helper threads's result is not
passed to next shader part
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24973 >
2023-10-10 02:36:33 +00:00
Qiang Yu
80728a2e71
aco: do not eliminate final exec write when p_end_with_regs block
...
p_end_with_regs just partially end the program, next part need
exec mask to be set correctly. For example p_end_wqm will generate
a exec restore from WQM mode after p_end_with_regs.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24973 >
2023-10-10 02:36:33 +00:00
Georg Lehmann
7b4f0e714c
aco/gfx11: support vinterp as fma_mix
...
Totals from 718 (0.94% of 76572) affected shaders:
Instrs: 657897 -> 654219 (-0.56%)
CodeSize: 3471668 -> 3457352 (-0.41%); split: -0.41%, +0.00%
VGPRs: 34200 -> 34164 (-0.11%)
Latency: 11687698 -> 11677030 (-0.09%); split: -0.10%, +0.00%
InvThroughput: 1455371 -> 1451537 (-0.26%); split: -0.26%, +0.00%
VClause: 7598 -> 7600 (+0.03%)
SClause: 18293 -> 18241 (-0.28%); split: -0.44%, +0.15%
Copies: 34641 -> 34644 (+0.01%); split: -0.05%, +0.06%
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25220 >
2023-10-05 20:02:53 +00:00
Georg Lehmann
7d7657ef74
aco: support v_fma_f32_dpp as fma_mix
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25220 >
2023-10-05 20:02:53 +00:00
Georg Lehmann
5e9fad48bf
aco/gfx11: apply clamp/omod to vinterp
...
Totals from 2504 (3.27% of 76572) affected shaders:
MaxWaves: 74098 -> 74106 (+0.01%)
Instrs: 1829278 -> 1823427 (-0.32%); split: -0.32%, +0.00%
CodeSize: 9775908 -> 9759308 (-0.17%); split: -0.18%, +0.01%
Latency: 13494107 -> 13485390 (-0.06%); split: -0.10%, +0.04%
InvThroughput: 2052428 -> 2048724 (-0.18%); split: -0.18%, +0.00%
VClause: 26637 -> 26640 (+0.01%); split: -0.04%, +0.05%
SClause: 62027 -> 61988 (-0.06%); split: -0.14%, +0.08%
Copies: 73776 -> 73815 (+0.05%); split: -0.07%, +0.12%
PreVGPRs: 84403 -> 84397 (-0.01%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25220 >
2023-10-05 20:02:53 +00:00
Georg Lehmann
34d8fa6185
aco/gfx11: optimize dual source export
...
We can combine dpp with the v_cndmask_b32.
Foz-DB Navi31:
Totals from 222 (0.28% of 79330) affected shaders:
Instrs: 564392 -> 563373 (-0.18%); split: -0.19%, +0.01%
CodeSize: 2867040 -> 2864728 (-0.08%); split: -0.09%, +0.01%
Latency: 4278957 -> 4275199 (-0.09%); split: -0.09%, +0.00%
InvThroughput: 586636 -> 585824 (-0.14%); split: -0.14%, +0.00%
SClause: 20210 -> 20211 (+0.00%); split: -0.02%, +0.02%
Copies: 39763 -> 39778 (+0.04%); split: -0.13%, +0.17%
PreVGPRs: 13924 -> 13922 (-0.01%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25541 >
2023-10-05 10:37:34 +00:00
Rhys Perry
1a268dc59d
aco: disable FI for quad/masked swizzle
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8330
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25525 >
2023-10-04 18:53:43 +00:00
Rhys Perry
0e79f76aa5
aco: add fetch_inactive field to DPP instructions
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25525 >
2023-10-04 18:53:43 +00:00
Rhys Perry
26fce534b5
aco: shrink DPP8_instruction
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25525 >
2023-10-04 18:53:43 +00:00
Georg Lehmann
0e819465b3
aco: print final ir instead if printing asm is unsupported
...
Not a perfect replacement, but it's better than nothing.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25522 >
2023-10-04 08:35:48 +00:00
Georg Lehmann
73605d46dd
aco: assume newer generation will use GFX11 wait_imm packing
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25522 >
2023-10-04 08:35:48 +00:00
Georg Lehmann
73e85c6691
aco: assume new generations are unsupported by clrx
...
clrx hasn't seen any changes since 2021. I guess the only reson to keep it is
GFX6 support.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25522 >
2023-10-04 08:35:48 +00:00
Georg Lehmann
79b767f4c0
aco: remove -0.0 for 32 bit fsign with mul_legacy/omod when denorms are flushed
...
v_mul_legacy_f32 and omod return +0.0 if any operand is +0.0/-0.0.
Foz-DB Navi21:
Totals from 4289 (5.60% of 76572) affected shaders:
Instrs: 8100571 -> 8099319 (-0.02%); split: -0.02%, +0.00%
CodeSize: 43433200 -> 43435088 (+0.00%); split: -0.01%, +0.01%
Latency: 88151566 -> 88147232 (-0.00%); split: -0.00%, +0.00%
InvThroughput: 22966705 -> 22965192 (-0.01%); split: -0.01%, +0.00%
VClause: 190010 -> 190009 (-0.00%)
SClause: 269697 -> 269689 (-0.00%)
Copies: 687294 -> 687296 (+0.00%); split: -0.00%, +0.00%
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25347 >
2023-10-02 14:02:49 +00:00
Georg Lehmann
9508cadadb
aco/optimizer: copy propagate to output modifier instructions
...
Foz-DB Navi21:
Totals from 847 (1.11% of 76572) affected shaders:
Instrs: 2331245 -> 2330335 (-0.04%); split: -0.04%, +0.00%
CodeSize: 12451040 -> 12451736 (+0.01%); split: -0.00%, +0.01%
Latency: 26230953 -> 26229153 (-0.01%); split: -0.01%, +0.00%
InvThroughput: 6297802 -> 6296788 (-0.02%); split: -0.02%, +0.00%
VClause: 64527 -> 64528 (+0.00%); split: -0.00%, +0.01%
SClause: 73150 -> 73121 (-0.04%); split: -0.06%, +0.02%
Copies: 180083 -> 179172 (-0.51%); split: -0.53%, +0.02%
PreSGPRs: 62311 -> 62316 (+0.01%)
PreVGPRs: 51720 -> 51710 (-0.02%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25347 >
2023-10-02 14:02:49 +00:00
Georg Lehmann
89f3a5ea37
aco/optimizer: check if we can use omod before labeling it
...
Allows to use omod for v_mul_legacy_f32 regardless of signedZeroInfNaNPreserve
Foz-DB Navi21:
Totals from 15 (0.02% of 76572) affected shaders:
Instrs: 20131 -> 20113 (-0.09%)
CodeSize: 107100 -> 107144 (+0.04%)
Latency: 400789 -> 400470 (-0.08%)
InvThroughput: 62342 -> 62278 (-0.10%)
Copies: 1194 -> 1176 (-1.51%)
PreVGPRs: 787 -> 785 (-0.25%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25347 >
2023-10-02 14:02:49 +00:00
Rhys Perry
558738e3c5
aco: remove zero offset optimization
...
This is done in nir_opt_constant_folding now.
No fossil-db changes on navi31.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25477 >
2023-10-02 10:11:37 +00:00
Rhys Perry
c3a894fb47
aco: disable zero offset optimization for strict WQM coords
...
If we try to do this, we end up using {undef,coordx} as the coordinates
for an image_sample instruction, because we can't shrink the linear VGPR.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9767
Fixes: 859e059aa9 ("radv: use fix_derivs_in_divergent_cf")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25477 >
2023-10-02 10:11:37 +00:00
Marek Olšák
6b29c16db8
amd: rename GFX110x to NAVI31-33
...
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25492 >
2023-09-30 23:08:47 +00:00
Rhys Perry
6518d09601
aco: don't combine DPP into v_cmpx
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25471 >
2023-09-29 18:23:21 +00:00
Rhys Perry
ea633c128c
aco/optimizer_postRA: don't combine DPP across exec on GFX8/9
...
GFX8/9 seem to use FI=0 behaviour.
fossil-db (vega10):
Totals from 1 (0.00% of 63053) affected shaders:
Instrs: 542 -> 570 (+5.17%)
CodeSize: 2928 -> 3040 (+3.83%)
Latency: 2087 -> 2118 (+1.49%)
InvThroughput: 1103 -> 1143 (+3.63%)
Affected shader is from Cyberpunk 2077 fossil.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Cc: 23.2 <mesa-stable>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9784
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25471 >
2023-09-29 18:23:21 +00:00
Georg Lehmann
b91616e800
aco: implement 64bit div find_lsb
...
This can be selected for divergent subgroupBallotFindLSB.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25407 >
2023-09-27 14:47:42 +00:00