Instead, we can rely on the fact that subdword definitions
must preserve the unused bits while dword definitions either
pad or sign-extend.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12640>
This commit introduces a new struct SubdwordSel
in order to ease and clean up the usage of SDWA
selections. This includes removing the distinction
between register-allocated and fixed SDWA selections.
Instead, SDWA selections can now also access the high
bits of subdword variables. Alignment and sizes are
validated accordingly. Size, offset and sign_extend
can be evaluated via helper methods.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12640>
Implement elect using a pseudo-op which is lowered during the
insert_exec_mask pass. This makes it possible to emit a more
optimal sequence when the exec mask is constant.
Fossil DB results on Sienna Cichlid:
Totals from 211 (0.16% of 128647) affected shaders:
CodeSize: 2254356 -> 2240468 (-0.62%); split: -0.62%, +0.00%
Instrs: 438471 -> 434996 (-0.79%); split: -0.80%, +0.01%
Latency: 2717082 -> 2709400 (-0.28%); split: -0.28%, +0.00%
InvThroughput: 566987 -> 566342 (-0.11%); split: -0.11%, +0.00%
Copies: 40058 -> 40162 (+0.26%)
Branches: 31209 -> 31211 (+0.01%)
PreSGPRs: 9927 -> 10125 (+1.99%)
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11458>
Fix up the operand size for v_sad instructions, and implement
the new NIR horizontal add. There is no viable way to do this
in SALU, so let's always use a VGPR destination.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11072>
This allows partial writes and writes to the upper half of the destination.
fossil-db (Sienna Cichlid):
Totals from 135 (0.09% of 149839) affected shaders:
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11113>
These will let us make the SDWA optimizer much simpler than if we were to
recognize combinations of shift/and/bfe.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3151>
This allows them to be scheduled properly and simplifies the assembler a
little.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8994>
These opcodes are never used and they always write the carry-out
according to the GCN3 ISA documentation.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7569>
For non-arithmetic opcodes such as buffer_load_dword and buffer_load_short,
default to a definition size of 32.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7276>
And add some "tests" to test and document currently unused features of the
framework.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3521>
SMEM does the addition with 64-bits, not 32. So if the original code
relied on wrapping around (for example, for subtraction), it would break.
Apparently swizzled MUBUF accesses also have issues with combining
additions that could overflow. Normal MUBUF accesses seem fine.
fossil-db (Navi):
Totals from 27219 (20.02% of 135946) affected shaders:
CodeSize: 128303256 -> 131062756 (+2.15%); split: -0.00%, +2.15%
Instrs: 24818911 -> 25280558 (+1.86%); split: -0.01%, +1.87%
VMEM: 162311926 -> 177226874 (+9.19%); split: +9.36%, -0.17%
SMEM: 18182559 -> 20218734 (+11.20%); split: +11.53%, -0.34%
VClause: 423635 -> 424398 (+0.18%); split: -0.02%, +0.20%
SClause: 865384 -> 1104986 (+27.69%); split: -0.00%, +27.69%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2748
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2720>
fossil-db (Pitcairn):
Totals from 2172 (1.58% of 137414) affected shaders:
CodeSize: 7109080 -> 7100100 (-0.13%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5623>
The emulated GFX10 wave64 bpermute no longer needs a linear_vgpr,
so we don't consider it a reduction anymore. Additionally, the
code is slightly reorganized in preparation for the GFX6 emulated
bpermute.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5223>
The optimizer isn't yet updated to handle this, since lower_to_hw_instr
will be the only user for now.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4469>
This will make it easier to skip unused channels at the start and to split
unaligned loads on GFX10.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3086>
v2: implement GFX10
v3: rebase
v7: rebase after shader args MR
v8: fix gs_vtx_offset usage on GFX9/GFX10
v8: use unreachable() instead of printing intrinsic
v8: rename output_state to ge_output_state
v8: fix formatting around nir_foreach_variable()
v8: rename some helpers in the scheduler
v8: rename p_memory_barrier_all to p_memory_barrier_common
v8: fix assertion comparing ctx.stage against vertex_geometry_gs
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2421>
According to the different ISA docs (and to LLVM), this bit seems
to only exists on GFX6-GFX7.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3432>
We don't create atomics with definitions if they are not used in NIR, but
our own DCE can remove the uses if an export turns out to be null.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: 93c8ebfa78 ('aco: Initial commit of independent AMD compiler')
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3081>