If this nir_if contains continues or other breaks, we can't move it
outside the loop.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: 6b4b044739 ("nir/opt_loop: add loop peeling optimization")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31002>
Inverse_ballot result is undefined if the input is not dynamically uniform.
And sinking out of loops might make the input divergent.
Fixes: 18a0ff137f ("nir: sink/move inverse_ballot like moves")
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30906>
Same reason as for load_ubo.
Fixes: d199d65c3a ("nir/nir_opt_move,sink: Include load_ubo_vec4 as a load_ubo instr.")
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30906>
Prevent the accidental passing of 0 components or bits, as it makes no sense.
Cc: mesa-stable
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Suggested-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: David Heidelberg <david@ixit.cz>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31103>
this existed due to the min/max, per the comment. now that we don't do min/max,
the whole routine is NaN correct so the fixup is pointless.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Suggested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30934>
everybody has abs on fmul, not everyone has abs on bcsel. should help agx and
bifrost.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30934>
this does two things:
* ignores sign of negative numbers which let us play fast and loose later in th
series
* avoids an expensive fsign instruction in favour of a cheap bitwise op
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30934>
worse in terms of NIR instruction count but lets the fabs fold easier. (on agx,
which has fabs on comparisons and fmul but not on bcsel. should be no worse if
ISA has fabs on all 3.)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30934>
just implement what the comment says, don't be clever. the clever thing is worse
on all architectures i'm familiar with, because the fdiv will turn into
fmul+frcp.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30934>
When the non_uniform flag is not set, the result is never divergent.
v2: Remove redundant assignment to is_divergent. Suggested by Caio.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30251>
The comment says, "This expands to (b & 3) & ~0xc which is (b & 3) &
3." This is not correct. ~0xc is actually 0xfffffff3.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Closes: #11695
Fixes: 1c7e35d4e0 ("nir/algebraic: Optimize some bit operation nonsense observed in some shaders")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30913>
Fixes the following CTS tests on NVK:
dEQP-VK.spirv_assembly.instruction.*.float_controls.fp16.generated_args.signed_zero_sub_var_preserve*
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30859>
This is needed for cl_khr_mipmap_image, specifically the OpenCL C
function get_image_num_mip_levels.
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30834>
Prints the shader to a string and assigns source locations based on
that.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18903>
Adds a new instruction type that stores metadata that might be useful
for debugging purposes. Passes must ignore these instructions when
making decisions.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18903>
If we don't do that and the source is not 32-bit we end up with a
bit_size mismatch when doing the ior operation.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29333>
A loop that looks like:
loop {
do_work_1();
if (cond) {
break;
} else {
}
do_work_2();
break;
}
We can't pull that break ahead of do_work_1() after hoisting the initial
do_work_1() out of the loop. So bail in this case.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11711
Fixes: 6b4b044739 ("nir/opt_loop: add loop peeling optimization")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30702>
load_global_constant_uniform_block_intel is equivalent in terms of
loading, then for the predicate we just do a bcsel afterward in places
where that is required.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30659>
The static assert used in encode deref modes used the fact there was
less than 16 modes that we wanted to compress as an opportunity to reuse
MODE_ENC_GENERIC_BIT as it just happened to represent 16. However if we
add more than 16 modes i.e need to compress to 6 bits not 5 bits then
MODE_ENC_GENERIC_BIT becomes 32 and the logic in the assert breaks.
Instead we more precisely make sure MODE_ENC_GENERIC_BIT is large
enough to fit all but the last 4 generic modes and that the last 4 modes
defined in the enum are in fact the 4 generic modes.
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30654>
Similar to commit 6cef804067 ("nir/opt_if: fix opt_if_merge when
destination branch has a jump"), we shouldn't combine if statements when
the second if-then-else has a block that ends in a jump.
This fixes a case where opt_if_merge combines
if (cond) {
[then-block-1]
} else {
[else-block-1]
}
if (cond) {
[then-block-2]
} else {
[else-block-2]
}
where `then-block-2` or `else-block-2` ends in a jump. The phi nodes
following the control flow will be incorrectly updated to have an input
from a block that is not a predecessor.
Fixes: 4d3f6cb973 ("nir: merge some basic consecutive ifs")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30629>