When the non_uniform flag is not set, the result is never divergent.
v2: Remove redundant assignment to is_divergent. Suggested by Caio.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30251>
The comment says, "This expands to (b & 3) & ~0xc which is (b & 3) &
3." This is not correct. ~0xc is actually 0xfffffff3.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Closes: #11695
Fixes: 1c7e35d4e0 ("nir/algebraic: Optimize some bit operation nonsense observed in some shaders")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30913>
Fixes the following CTS tests on NVK:
dEQP-VK.spirv_assembly.instruction.*.float_controls.fp16.generated_args.signed_zero_sub_var_preserve*
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30859>
This is needed for cl_khr_mipmap_image, specifically the OpenCL C
function get_image_num_mip_levels.
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30834>
Prints the shader to a string and assigns source locations based on
that.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18903>
Adds a new instruction type that stores metadata that might be useful
for debugging purposes. Passes must ignore these instructions when
making decisions.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18903>
If we don't do that and the source is not 32-bit we end up with a
bit_size mismatch when doing the ior operation.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29333>
A loop that looks like:
loop {
do_work_1();
if (cond) {
break;
} else {
}
do_work_2();
break;
}
We can't pull that break ahead of do_work_1() after hoisting the initial
do_work_1() out of the loop. So bail in this case.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11711
Fixes: 6b4b044739 ("nir/opt_loop: add loop peeling optimization")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30702>
load_global_constant_uniform_block_intel is equivalent in terms of
loading, then for the predicate we just do a bcsel afterward in places
where that is required.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30659>
The static assert used in encode deref modes used the fact there was
less than 16 modes that we wanted to compress as an opportunity to reuse
MODE_ENC_GENERIC_BIT as it just happened to represent 16. However if we
add more than 16 modes i.e need to compress to 6 bits not 5 bits then
MODE_ENC_GENERIC_BIT becomes 32 and the logic in the assert breaks.
Instead we more precisely make sure MODE_ENC_GENERIC_BIT is large
enough to fit all but the last 4 generic modes and that the last 4 modes
defined in the enum are in fact the 4 generic modes.
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30654>
Similar to commit 6cef804067 ("nir/opt_if: fix opt_if_merge when
destination branch has a jump"), we shouldn't combine if statements when
the second if-then-else has a block that ends in a jump.
This fixes a case where opt_if_merge combines
if (cond) {
[then-block-1]
} else {
[else-block-1]
}
if (cond) {
[then-block-2]
} else {
[else-block-2]
}
where `then-block-2` or `else-block-2` ends in a jump. The phi nodes
following the control flow will be incorrectly updated to have an input
from a block that is not a predecessor.
Fixes: 4d3f6cb973 ("nir: merge some basic consecutive ifs")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30629>
Some store intrinsics (e.g., store_const_ir3) don't have a wrmask so
don't assume it always exists.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28341>
Some load/store intrinsics (e.g., load/store_const_ir3) use offsets in
units other than bytes. Currently, byte offsets were assumed in multiple
places.
This patch adds a new offset_scale field to intrinsic_info and uses it
were needed.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28341>
Phi nodes are mostly handled the same way as ALU instructions: if all
sources point to the same def (which happens if they are scalar or have
been previously vectorized), combine them into a single vectorized phi
node.
There is one case where this doesn't work, however: sources that come
from a loop back-edge. Since their defs haven't been processed yet, they
are generally not the same. We could simply refuse to vectorize such
phi nodes but this could leave many values used in loops unnecessarily
scalarized.
Instead, this patch implements a simple heuristic: if all defs coming
from a back-edge have the same instructions type and, in case of ALU,
the same operation, assume they will be vectorized later. Since we
require that normal edges are vectorized already, chances are that the
back-edge can also be vectorized.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28341>
To handle phi nodes, it's important that all sources have been processed
before processing the phi node itself. The current traversal order
(depth-first on dom_children) does not guarantee this. This patch
rewrites the pass to visit blocks in source-code order.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28341>
Dispatch to different functions inside instr_try_combine. To prepare for
upcoming support for phi nodes.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28341>
rewrote most of the impl but shrug.
regresses code gen for mediump but I'm not too bothered given the lackluster
perf of fp16 on bifrost :(
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30567>
I don't know what Apple calls these, so we're using the name "explicit
coordinates".
AGX has instructions for loading/stores register <---> tilebuffer ---> storage
images. Usually these are used in the fragment shader and end-of-tile shader to
implement colour attachments, with implicitly specified coordinates based on the
shader stage. However they can also be used in compute shaders with explicitly
specified coordinates ("imageblocks" in Apple parlance). Model this in NIR.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30633>
after the smashing success of nir_shader_intrinsics_pass, let's add the ALU
version to help the odd non-algebraic ALU lowering pass.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30582>