v_frexp_exp_i16_f16 returns the two's complement for negative
exponents. For example, with 0.333252 it returns 0.666504 for
the mantissa and 65535 for the exponent (-1 in decimal).
RADV/LLVM and AMDVLK do a v_bfe_i32 and AMDGPU-PRO uses SDWA with
the sign extension bit set. The latter is probably what we want to
do in long term but for now RA doesn't support changing non-SDWA
instructions to SDWA if useful/needed.
Fixes dEQP-VK.glsl.builtin.precision_fp16_storage16b.frexp.compute.*.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4546>
Because 16-bit constants are represented with the s1 RegClass, we
have to extract the low half.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4509>
When NGG is used, vertex and tess eval shaders are executed on the
hardware NGG geometry stage. There is a series of steps they
must perform:
* Request GS space using GS_ALLOC_REQ
* Export the primitive
* Finally, export the normal VS outputs
In this commit, two modes are implemented:
* "late" which matches what the RADV LLVM backend currently does
* "early" which is an optimized version as seen in radeonsi
Vulkan doesn't allow the shader to write the edge flags, so we can
currently always use the "early" mode.
Exporting the primitive ID is also supported by having the GS threads
write that into LDS and reading them from LDS in the ES threads.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3576>
Currently we only use this at the beginning of merged shader parts,
but we are going to need to use it with some NGG code as well.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3576>
Currently we only use this for uniform ifs that come from NIR,
but we are going to need to use it with some NGG parts as well.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3576>
Currently without alignment check, so that
we can only use the _byte and _short versions
and multi-component stores are split.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4002>
The implementations here just clone i2b32 and i2b1. This means that
b2b32 doesn't technically generate true NIR 0/-1 booleans but it should
be fine as it's only ever generated for shared variable writes which
will always be consumed by something which will then run it through an
i2b again.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4338>
We know that in this case, the LS and HS invocations are working
on the exact same vertex, so it's safe to skip the LDS.
Totals:
VGPRS: 3960744 -> 3961844 (0.03 %)
Code Size: 254824300 -> 254764624 (-0.02 %) bytes
Max Waves: 1053748 -> 1053574 (-0.02 %)
Totals from affected shaders:
VGPRS: 26152 -> 27252 (4.21 %)
Code Size: 1496600 -> 1436924 (-3.99 %) bytes
Max Waves: 4860 -> 4686 (-3.58 %)
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>
Clear the workgroup size for all supported shader stages.
Also, unify the workgroup size calculation accross various places.
As a result, insert_waitcnt can use the proper workgroup size
which means that some waits can be dropped from tessellation
shaders. Also, in cases where the previous calculation was wrong,
we now insert s_barrier instructions.
Totals from affected shaders (GFX10):
Code Size: 340116 -> 338484 (-0.48 %) bytes
Fixes: a8d15ab6da
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>
It can be further optimized in the future, but
the new sequence already has a few advantages:
* Uses fewer instructions
* Uses even fewer instructions in wave32 mode
* Doesn't use the VALU at all
Totals from affected shaders (GFX10):
VGPRS: 43504 -> 43496 (-0.02 %)
Code Size: 2436000 -> 2423688 (-0.51 %) bytes
Max Waves: 8704 -> 8705 (0.01 %)
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>
When TCS has an equal number of input and output, it means that the
number of VS and TCS invocations (LS and HS) are the same; and that
the HS invocations operate on the same vertices as the LS.
When this is the case, this commit removes the else-if between
the merged VS and TCS halves, making it possible to schedule
and optimize the code accross the two halves.
Totals:
SGPRS: 5577367 -> 5581735 (0.08 %)
VGPRS: 3958592 -> 3960752 (0.05 %)
Code Size: 254867144 -> 254838244 (-0.01 %) bytes
Max Waves: 1053887 -> 1053747 (-0.01 %)
Totals from affected shaders:
SGPRS: 29032 -> 33400 (15.05 %)
VGPRS: 35664 -> 37824 (6.06 %)
Code Size: 1979028 -> 1950128 (-1.46 %) bytes
Max Waves: 7310 -> 7170 (-1.92 %)
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>
Previously the tess factors were loaded individually, but now they can
be loaded using a single LDS load instruction.
Note that the inner and outer tess factors are not yet combined.
Totals (GFX10):
Code Size: 254896008 -> 254879212 (-0.01 %) bytes
Totals from affected shaders (GFX10):
Code Size: 2028352 -> 2011556 (-0.83 %) bytes
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>