These offsets were wrong and didn't match the old behaviour.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: e8220e106b ("aco: optimize AC_FETCH_FORMAT_SNORM alpha adjust")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8935>
This looks like it was copied from LLVM, which didn't have a fmax
intrinsic.
fossil-db (GFX8):
Totals from 43 (0.03% of 140385) affected shaders:
CodeSize: 49660 -> 49488 (-0.35%)
Instrs: 10434 -> 10348 (-0.82%)
Cycles: 41736 -> 41392 (-0.82%)
VMEM: 13793 -> 13719 (-0.54%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8918>
We were incorrectly shifting the input VGPRs for the instance ID
for chips affected by the LS VGPR init bug (ie. Vega10 and Raven).
When there is no HS threads, the hardware loads the LS VGPR
starting from VGPR 0, so they should be shifted by two.
This fixes some sort of vertex explosion with Squad, Visage, Barn
Finders and probably more titles that use tessellation. Note that
only Vega10 and Raven were affected by this bug.
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4129
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3311
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Diego Viola <diego.viola@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8694>
Previously, we might not have required all coordinates to be in WQM if
there were other args before them. We should probably also require that
the offset is in WQM.
fossil-db (GFX10.3):
Totals from 10053 (7.21% of 139391) affected shaders:
SGPRs: 911032 -> 911048 (+0.00%); split: -0.00%, +0.00%
VGPRs: 689856 -> 688412 (-0.21%); split: -0.26%, +0.05%
CodeSize: 84151460 -> 84140396 (-0.01%); split: -0.02%, +0.01%
MaxWaves: 77526 -> 77527 (+0.00%)
Instrs: 15972106 -> 15971521 (-0.00%); split: -0.01%, +0.01%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4153
Fixes: 4015b3651a ("aco: only require texture coordinates to be in WQM if NSA is used")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8693>
Fixes several dEQP-VK.robustness.robustness2.* tests on GFX8. Generations
other than GFX8 don't fail the tests because bounds-checking is done using
the index (making it per-vertex).
fossil-db (Polaris):
Totals from 1387 (0.99% of 140385) affected shaders:
(no statistics affected)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Fixes: 03a0d39366 ("aco: use MUBUF in some situations instead of splitting vertex fetches")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7834>
From comment in emit_mimg():
We don't need the bias, sample index, compare value or offset to be
computed in WQM but if the p_create_vector copies the coordinates, then it
needs to be in WQM.
fossil-db (GFX10.3):
Totals from 1778 (1.28% of 139391) affected shaders:
SGPRs: 105080 -> 105072 (-0.01%); split: -0.02%, +0.01%
VGPRs: 96800 -> 96776 (-0.02%); split: -0.07%, +0.05%
CodeSize: 10001120 -> 10001384 (+0.00%); split: -0.04%, +0.04%
MaxWaves: 18164 -> 18163 (-0.01%)
Instrs: 1883750 -> 1883598 (-0.01%); split: -0.06%, +0.05%
Cycles: 34800176 -> 34767840 (-0.09%); split: -0.10%, +0.01%
We don't have a p_create_vector if we use NSA.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8523>
Could still be improved a little. For example, 8-bit pack without
constants could be:
(s_pack_ll(x, z) & 0x00ff00ff) | ((s_pack_ll(y, w) & 0x00ff00ff) << 8)
fossil-db (Sienna):
Totals from 136 (0.10% of 139391) affected shaders:
CodeSize: 279776 -> 278144 (-0.58%)
Instrs: 50742 -> 50470 (-0.54%)
Cycles: 211560 -> 210472 (-0.51%)
SMEM: 3607 -> 3557 (-1.39%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8421>
Since sparse fetch/load uses vec5 destinations, it may be possible that we
encounter nir_op_vec5.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7775>
We will want both a VDATA operand and a sampler for some TFE/LWE MIMG
instructions.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7775>
Compute the number of components of the destination vector from the
bitsize when eg. a 16-bit vec2 vertex fetches is splitted. This is
because the dst will be a v1, so the p_create_vector should be created
from two v2b fro both sizes to match.
This prevents a regression from the next change which will split
typed vertex buffer loads on GFX6 and GFX10+.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8363>
Without it, FragCoord.z will have the value of one of the fine pixels
instead of the center of the coarse pixel.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7837>
When VS outputs are known to be never stored in LDS, there is no
reason for HS waves to wait for all LS waves to complete. So, the
s_barrier between the LS and HS can be safely skipped.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7727>
avoids errors seen when building on OpenBSD/amd64
../src/amd/compiler/aco_instruction_selection.cpp:1677:62: error: ambiguous conversion for functional-style cast from 'unsigned long' to 'aco::Operand'
bld.vop3(aco_opcode::v_mul_f64, Definition(dst), Operand(0x3FF0000000000000lu), tmp);
^~~~~~~~~~~~~~~~~~~~~~~~~~~
glibc uses unsigned long for uint64_t on LP64 archs and unsigned long long for
uint64_t on ILP32 archs. On OpenBSD unsigned long long is used for uint64_t
on all archs.
The Operand constructors are uint8_t uint16_t uint32_t uint64_t
use UINT64_C so lu or llu suffix will be used as needed.
Fixes: df645fa369 ("aco: implement VK_KHR_shader_float_controls")
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7944>
During NIR linking, constant varyings might be moved to the next
stage and the sample qualifier removed.
shader_info::uses_sample_shading remembers if the sample qualifier
was used before optimizations.
No fossils-db changes on Sienna Cichlid.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7892>
With NGG, ngg_gs_known_vtxcnt[0] would be false and ngg_gs_finale() would
assert.
With legacy GS, I don't know why the assertion was there.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7576>