The implementations here just clone i2b32 and i2b1. This means that
b2b32 doesn't technically generate true NIR 0/-1 booleans but it should
be fine as it's only ever generated for shared variable writes which
will always be consumed by something which will then run it through an
i2b again.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4338>
We know that in this case, the LS and HS invocations are working
on the exact same vertex, so it's safe to skip the LDS.
Totals:
VGPRS: 3960744 -> 3961844 (0.03 %)
Code Size: 254824300 -> 254764624 (-0.02 %) bytes
Max Waves: 1053748 -> 1053574 (-0.02 %)
Totals from affected shaders:
VGPRS: 26152 -> 27252 (4.21 %)
Code Size: 1496600 -> 1436924 (-3.99 %) bytes
Max Waves: 4860 -> 4686 (-3.58 %)
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>
Clear the workgroup size for all supported shader stages.
Also, unify the workgroup size calculation accross various places.
As a result, insert_waitcnt can use the proper workgroup size
which means that some waits can be dropped from tessellation
shaders. Also, in cases where the previous calculation was wrong,
we now insert s_barrier instructions.
Totals from affected shaders (GFX10):
Code Size: 340116 -> 338484 (-0.48 %) bytes
Fixes: a8d15ab6da
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>
It can be further optimized in the future, but
the new sequence already has a few advantages:
* Uses fewer instructions
* Uses even fewer instructions in wave32 mode
* Doesn't use the VALU at all
Totals from affected shaders (GFX10):
VGPRS: 43504 -> 43496 (-0.02 %)
Code Size: 2436000 -> 2423688 (-0.51 %) bytes
Max Waves: 8704 -> 8705 (0.01 %)
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>
When TCS has an equal number of input and output, it means that the
number of VS and TCS invocations (LS and HS) are the same; and that
the HS invocations operate on the same vertices as the LS.
When this is the case, this commit removes the else-if between
the merged VS and TCS halves, making it possible to schedule
and optimize the code accross the two halves.
Totals:
SGPRS: 5577367 -> 5581735 (0.08 %)
VGPRS: 3958592 -> 3960752 (0.05 %)
Code Size: 254867144 -> 254838244 (-0.01 %) bytes
Max Waves: 1053887 -> 1053747 (-0.01 %)
Totals from affected shaders:
SGPRS: 29032 -> 33400 (15.05 %)
VGPRS: 35664 -> 37824 (6.06 %)
Code Size: 1979028 -> 1950128 (-1.46 %) bytes
Max Waves: 7310 -> 7170 (-1.92 %)
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>
Previously the tess factors were loaded individually, but now they can
be loaded using a single LDS load instruction.
Note that the inner and outer tess factors are not yet combined.
Totals (GFX10):
Code Size: 254896008 -> 254879212 (-0.01 %) bytes
Totals from affected shaders (GFX10):
Code Size: 2028352 -> 2011556 (-0.83 %) bytes
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>
Some copypasta may have stuck in the code.
This was left on false by mistake.
Totals (GFX10):
Code Size: 254939248 -> 254896008 (-0.02 %) bytes
Totals from affected shaders (GFX10):
VGPRS: 16196 -> 16212 (0.10 %)
Code Size: 1126332 -> 1083092 (-3.84 %) bytes
Max Waves: 2336 -> 2334 (-0.09 %)
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>
There is no need to check whether they are written using indirect
indices, because all tess factors should be written to VMEM only
at the end of the shader.
No pipeline db changes.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>
This allows the passes after isel to assume that the exports are
always correct, and also allows to schedule these null exports later.
Additionally, it ensures that the correct exec mask is used for
these exports.
Totals from affected shaders (GFX10):
SGPRS: 84224 -> 84344 (0.14 %)
VGPRS: 23088 -> 23076 (-0.05 %)
Code Size: 882892 -> 894368 (1.30 %) bytes
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>
Let's make it clear what includes are being added everywhere, so that
they can be cleaned up.
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4360>
Enabling a Vulkan extension doesn't mean that all features need
to be implemented. DOOM Eternal crashes at launch if that ext
is not supported but it doesn't matter if the features are enabled
or not.
Let's enable it like we did for VK_KHR_16bit_storage.
Cc: 19.3 20.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4299>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4299>
64-bit VGPR constant copies can happen because of 64-bit constant copy
propagation. Since this optimization is beneficial and more annoying to
deal with in the optimizer, I've implemented 64-bit VGPR constant copies
in handle_operands().
This also sets copy_operation::size correctly for 64-bit constant copies.
Cc: 20.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4260>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4260>
The old code would have previously caught:
loop {
...
break
}
when it was meant to just catch:
loop {
if (...)
break
else
break
}
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3658>
NIR removes most of this but undef instructions for loop header phis can
remain. These were harmless because ACO would DCE them itself.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3658>
Usually a loop ends with a uniform continue. If it doesn't and we end up
adding our own continue edges (because of continue_or_break or divergent
breaks at the end), we have to add extra operands to the loop header phis.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3658>
bpermute only exists on GFX8+ and only with Wave32 on GFX10. Instead
we have to use readlane with a waterfall loop to defeat the LLVM
backend.
This fixes DOOM Eternal which requires subgroup shuffle.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4284>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4284>
The Vulkan spec 1.2.135 says:
"pSizes is an optional array of buffer sizes, specifying the maximum
number of bytes to capture to the corresponding transform feedback
buffer. If pSizes is NULL, or the value of the pSizes array element
is VK_WHOLE_SIZE, then the maximum bytes captured will be the size
of the corresponding buffer minus the buffer offset."
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2650
Fixes: b4eb029062 ("radv: implement VK_EXT_transform_feedback")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4232>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4232>
Change the MESA_ENABLE_LLVM checks in Android.mk
files in order to get mesa3d to build w/ AOSP
using mmma.
This tries to re-create a change that was introduced
in the following merge in the AOSP branch:
69f2c0128d2b Merge branch 'aosp/upstream-18.0'
Acked-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Mauro Rossi <issor.oruam@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4175>
The shader module name is used to compute the pipeline key. The
driver used to load the wrong pipelines because the shader names
were similar.
This should fix random failures of
dEQP-VK.pipeline.depth_range_unrestricted.*
Fixes: f11ea22666 ("radv: fix a performance regression with graphics depth/stencil clears")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4216>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4216>
piglit/bin/arb_bindless_texture-limit -auto -fbo:
Needed to deal with non-NULL dynamic_index without deref in tex instructions.
piglit/bin/shader_runner tests/spec/arb_bindless_texture/execution/images/multiple-resident-images-reading.shader_test -auto:
Need to deal with non-deref images in enter_waterfall_imae.
Fixes: b83c9aca4a "amd/llvm: Fix divergent descriptor indexing. (v3)"
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4191>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4191>
If compute shaders require a specific subgroup size (ie. Wave32),
we have to return the correct one.
Fixes dEQP-VK.subgroups.size_control.compute.required_subgroup_size_*.
Fixes: fb07fd4e6c ("radv: implement VK_EXT_subgroup_size_control")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4215>