On GFX6-7, the 8 and 16-bit integer add reductions use the 32-bit v_add
instruction, which clobbers the VCC register.
Cc: mesa-stable
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10346>
We can just check whether tex_instr is NULL instead.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10036>
Make sure to preserve signed zeroes.
Fixes dEQP-VK.spirv_assembly.instruction.compute.opquantize.flush_to_zero
on GFX6 (Pitcairn). Untested on GFX7.
Fixes: 54a09545ec ("aco: optimize a*0.0")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10319>
The user-set priority of shaders matters very little, but we hope
this might still help speed up VS input loads especially.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10106>
We learned that the gs_alloc_req is not actually when the export
space allocation happens. So it makes no sense to prioritize it.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10106>
Previously, every wave had multiple active lanes read the LDS, and
the data was processed by VALU DPP instructions.
Now, only the first lane reads the LDS in order to avoid bank
conflicts, and the results are processed by SALU.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10155>
This shouldn't sign-extend.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10081>
This allows to force the VRS rates via RADV_FORCE_VRS, the supported
values are 2x2, 1x2 and 2x1. This supports the primitive shading rate
mode for non GUI elements.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7794>
The upper 32 bits are truncated before converting, which still produces
correct results since they never meaningfully contribute to the result.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9597>
The previous code produced incorrect results for inputs outside the
range [INT16_MIN, INT16_MAX].
A problematic case is e.g. i2f16 32768, which previously would be
converted to -32768.0 instead of returning the exactly representable
floating point result.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9597>
This function has evolved to be a generic helper function used throughout
the file, so having those assumptions written down explicitly and document
unsupported edge cases should help prevent incorrect use.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9597>
This doesn't change semantics but allows us to reject this potentially
ambiguous configuration in convert_int in a later change.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9597>
Previously, inputs such as 0x100000000 would have their upper 32-bits
ignored despite being representable by 32-bit floats.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9597>
It used to be that this intrinsic was never created and texture
instructions were always used.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Fixes: 50881d59e6 ("compiler/spirv: fix image sample queries")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9686>
This allows them to be scheduled properly and simplifies the assembler a
little.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8994>
Keep track of the current loop depth in Program and set the depth inside
Program::insert_block() instead of repeating it every time we insert one.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8994>
Obviously this didn't affect correctness. Not sure about performance.
It also changes enabled_channels to match radeonsi.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: f29c81f863 ("aco: use VOP2 for v_cvt_pkrtz_f16_f32 if possible")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9459>
A Hitman 2 shader does: read64(local_invocation_index() * 4 - 4). This was
likely emitting a ds_read2_b32 on GFX6. For local_invocation_index()=0,
because the first dword was out-of-bounds, the second was likely also
considered out-of-bounds (even though it's not, at offset 0).
Likely fixes https://gitlab.freedesktop.org/mesa/mesa/-/issues/3882
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: 57e6886f98 ("aco: refactor load_lds to use new helpers")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9332>
This reverts commit 1a0b0e8460.
The bounds checking behaviour of ds_read_b64, ds_read_b96 and ds_read_b128
make this feature very difficult to use safely.
This fixes a blocking artifact in Hitman 2. Previously, it contained:
ds_read_b64(local_invocation_index() * 4 - 4)
For local_invocation_index()=0, the second dword would be considered
out-of-bounds, even though it's at offset 0.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9332>
Since value-numbering no longer works across loops, we no longer need to
use v_readfirstlane_b32.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9288>
This becomes possible as long as we do
val = s_and_b32/64 exec, val
before any subgroup operations.
This precautional instruction can be removed by the
optimizer if 'val' was computed by a VOPC instruction
using the same exec mask.
Totals from 59 (0.04% of 146267) affected shaders (Navi10):
VGPRs: 2808 -> 2816 (+0.28%)
CodeSize: 340888 -> 340852 (-0.01%); split: -0.20%, +0.19%
Instrs: 61733 -> 61625 (-0.17%); split: -0.18%, +0.01%
Cycles: 470636 -> 469112 (-0.32%); split: -0.33%, +0.01%
VMEM: 8091 -> 7993 (-1.21%)
SMEM: 2736 -> 2719 (-0.62%); split: +0.29%, -0.91%
VClause: 1745 -> 1741 (-0.23%)
SClause: 2394 -> 2392 (-0.08%); split: -0.25%, +0.17%
Copies: 3249 -> 3253 (+0.12%); split: -0.62%, +0.74%
Branches: 1210 -> 1206 (-0.33%)
PreSGPRs: 3126 -> 3176 (+1.60%); split: -0.16%, +1.76%
Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9195>