Reduces the work that other shader passes have to do to look at dead code,
and possibly extra rounds around the optimization loop if dce wasn't the
last pass in it.
shader-db runtime -1.12919% +/- 0.264337% (n=49) on SKL.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11628>
ifind_msb_rev was introduced in a5747f8ab3.
ifind_msb_rev guards against src0 being both 0 or -1 at the same time.
That is always true. This patch changes it to check for those values
individually.
Spotted from a compile warning.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Fixes: a5747f8ab3 (\"nir: add opcodes for *find_msb_rev and lowering\")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11630>
For images, variable data includes the format. For samplers, variable
data is used for OpenCL inline samplers. When converting a variable
from one to the other, zero out the data so we don't accidentally
interpret a converted image as an inline sampler.
Fixes: fa677c86 ("nir_lower_readonly_images_to_tex: Support non-CL semantics")
Acked-by: Enrico Galli <enrico.galli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11674>
This should be a subtract, not an add. The comment's proof is correct,
but the (wrong) expression we actually use isn't what it's in the
comment! Correct the discrepancy.
The lowering in nir_opt_algebraic was correctly typed.
Fixes: 272e927d0e ("nir/spirv: initial handling of OpenCL.std extension opcodes")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11671>
In addition to register pressure benefits from getting more fp16/int16,
this avoids i2imp's from standing in the way of loop unrolling.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11545>
Compiling with clang warns about an unused variable in
nir_lower_packing.
Tracking progress was added to nir_lower_packing in
adb157ddfd but the function
will ignore the progress from impl calls and always return
false.
This patch changes it to return the progress. It fixes the
warning and should enable validation calls in NIR_PASS when
progress is made.
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Fixes: adb157ddfd "nir: Return progress from nir_lower_64bit_pack()"
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11615>
Static struct bitset was renamed to brw_bitset as a struct bitset
is defined in sys/_bitset.h included by pthread_np.h on FreeBSD that
is indirectly included by src/intel/compiler/brw_nir_lower_shader_calls.c
Signed-off-by: Eleni Maria Stea <elene.mst@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11203>
Sometimes you might want to find a constant source without going through
all the copy prop and constant folding to make your source be a
load_const.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11613>
We could do better if we knew the nir_address_format to obtain
addition_bits, but the only affected driver (Turnip) probably won't
benefit because it doesn't vectorize across vec4.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: 2e7bceb220 ("nir/load_store_vectorizer: fix check_for_robustness() with indirect loads")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4922
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11382>
Some infinite loop cases were already covered by other
restrictions (e.g. if the loop had a body), but the case with a single
block in the loop body wasn't yet.
This prevents an infinite loop when optimizing the shader in
dEQP-VK.reconvergence.subgroup_uniform_control_flow_ballot.compute.nesting2.3.2
and various others reconvergence tests.
Fixes: 0881e90c09 ("nir: Split ALU instructions in loops that read phis")
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> [v1]
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11476>
This is now 100% equivalent to the new rt_resume intrinsic.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8637>
Really copying Jason's pass.
Changes:
- Instead of all the intel lowering introduce rt_{execute_callable,trace_ray,resume}
- Add the ability to use scratch intrinsics directly.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10339>
About half or more of the text here is actually from Connor Abbot. I've
edited it a bit to bring it up-to-date and make a few things more clear.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11438>
Long ago, the semantics of bcsel were such that it took a single boolean
value and selected between whole vectors. These days, it takes a vector
boolean with the assumption that if you want the old behavior you can
just use a .xxxx swizzle. There currently are no opcodes which use a
output_size of 0 but have a scalar or fixed-vector input. Let's
disallow it for now to force us to think through the semantics again if
this ever comes up as something someone actually wants.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11438>
For debug on Android, it's useful to be able to print shaders to the
android log interface, since you don't usually have stdout/stderr.
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9262>
limit==0 is the signal for "don't peephole anything but a move that will
be optimized aways." limit > 0 is "up to N alu instructions may be moved
out." nir-to-tgsi uses ~0 as the indicator of "No, we really need to
eliminate all if instructions" on hardware like i915 that doesn't have
control flow.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11329>
We need to make get it updated after we may have nir_instr_remove()d an
instruction, and when we cross blocks. This didn't really matter before
because the only builder usage was idiv, which other users of
lower_int_to_float were probably never hitting.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11329>
Use a single set and ensure dominance by checking after a equivalent
instruction is found.
Besides removing the need to copy a set, this also lets us resize the set
at the start of the pass in the next commit.
ministat (CSE only):
Difference at 95.0% confidence
-984.956 +/- 28.8559
-6.90075% +/- 0.190231%
(Student's t, pooled s = 26.9052)
ministat (entire run):
Difference at 95.0% confidence
-1246.1 +/- 257.253
-0.998972% +/- 0.205094%
(Student's t, pooled s = 239.863)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Co-authored-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6390>
Now that there's a common NIR pass, there's no point in us doing this in
the back-end anymore. In order to use this pass in i965, we do have to
make one tiny change. Gallium runs the pass after assigning input and
output locations and so needs the pass to respect those locations and
num_inputs. i965, however, runs it before any location assignment or
I/O lowering so we don't care. We do, however, need the pass to succeed
with num_inputs == 0 because we set that later.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11313>
In theory you can rerun the info gather pass, but in practice that
doesn't always end well. Be consistent inside this pass and update the
info.
While we're here, change the inputs read to use VERT_BIT_EDGEFLAG.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11313>
The v_mbcnt instructions can take an extra source that they add to
the result. This is not exposed in SPIR-V but we now expose it in NIR.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11072>
These map directly to v_perm_b32 and v_permlane_b32.
Unfortunately there is no corresponding NIR opcode or
intrinsics, and it's too tedious to puzzle these things
together from the existing NIR instructions.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11072>
NIR currently doesn't have any intrinsics for a horizontal packed add,
so this one is modeled after AMD's v_sad_u8.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11072>
The helpers will be reused for per-primitive variables that are also
arrayed, so use a more general name.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11252>
At best, this is an extra instruction for NIR to optimize out. At worst,
depending on pass ordering nir_load_output could sneak into the final
NIR, even on drivers that don't support fbfetch.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11255>
if we have
if ... {
return;
} else {
// block X
}
// block Y
phi(X: ...)
then nir_lower_returns tries to move block Y into the else body,
except nir_cf_extract doesn't move the phi. As the return is removed
in the then-body the phi suddenly has the wrong number of arguments
(and the phi doesn't dominate its uses anymore).
In this case we know that the phi has to be single arg, so we can just
rewrite the users of the phis and drop them.
Hit this in my RT adventures, not sure if this is actually reachable
right now, as single arg phis tend to be kind of exceptional outside
of CSSA and we typically call nir_lower_returns pretty early.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11207>
Found in some sottr shaders (originally iand(ishr(a, 16), 0xffff))
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3151>
Be consistent with other usages in Vulkan and SPIR-V, and the recently
added workgroup_size field.
Acked-by: Emma Anholt <emma@anholt.net>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11190>