On Intel platforms, the uclz lowering if ufind_msb is either one
instruction better (Gfx7 and newer) or two instructions better (all
older platforms) than the ifind_msb implementations.
On platforms that use lower_find_msb_to_reverse, there should be no
difference.
All Haswell and newer Intel platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 19938662 -> 19938634 (<.01%)
instructions in affected programs: 850 -> 822 (-3.29%)
helped: 2 / HURT: 0
total cycles in shared programs: 858467067 -> 858465538 (<.01%)
cycles in affected programs: 10080 -> 8551 (-15.17%)
helped: 2 / HURT: 0
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19042>
4d802df3aa loosened the type restrictions
on these opcodes to enable support for 64-bit ballot operations. In
doing so, it enabled 8-bit and 16-bit sizes as well.
It's impossible to get these sizes through GLSL or SPIR-V. None of the
lowering in nir_opt_algebraic can handle non-32-bit sizes. Almost no
drivers can handle non-32-bit sizes.
It doesn't seem possible to enforce anything other than "one bit size"
or "all bit sizes" in nir_opcodes.py. The only way it seems possible to
enforce this is in nir_validate. This is not ideal, but it be what it
be.
v2: Remove restriction on find_lsb. It is acutally possible to get this
via GLSL by doing findLSB() on a lowp value. findMSB declares its
parameter as highp, so that path is still impossible.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19042>
Fossil-db results:
All Intel platforms had similar results. (Ice Lake shown)
Cycles in all programs: 9098346105 -> 9098333765 (-0.0%)
Cycles helped: 6
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19042>
The 31-ufind_msb_rev(x) lowering only produces the correct result for
32-bit sources. ufind_msb_rev can also have 64-bit sources, and most
platforms are expected to lower this to 32-bit instructions with extra
logic operations.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19042>
Unlike ufind_msb, ifind_msb is only defined in NIR for 32-bit values, so
no @32 annotation is required.
No shader-db or fossil-db changes on any Intel platform.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19042>
Both GLSL & SPIRV have undefined values for shift > bitsize. But SM5
says :
"This instruction performs a component-wise shift of each 32-bit
value in src0 left by an unsigned integer bit count provided by
the LSB 5 bits (0-31 range) in src1, inserting 0."
Better to not hard code the wrong behavior in NIR.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: e227bb9fd5 ("nir/builder: add ishl_imm helper")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@colllabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21720>
For GLSL, we want to optimize code like
memoryBarrierBuffer();
controlBarrier();
into a single scoped_barrier intrinsic for the backend to consume. Now that
backends can get scoped_barriers everywhere, what's left is enabling backends to
combine these barriers together. We already have an Intel-specific pass for
combining memory barriers; it just needs a teensy bit of generalization to allow
combining all sorts of barriers together.
This avoids code quality regression on Asahi when switching to purely scoped
barriers. It's probably useful for other backends too.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21661>
Some backends can handle a constant texture index or a dynamic texture index but
not a constant texture index plus a dynamic texture offset. Add a nir_lower_tex
option to lower to one of these options.
v2: Use more straightforward code proposed by Faith.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Emma Anholt <emma@anholt.net> [v1]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21546>
It can sometimes be useful to also print the shaders that are marked as
internal, so let's add a flag that lets us do that.
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21681>
This NIR pass lowers stores in fragment shaders to:
if (!gl_HelperInvocaton) {
store();
}
This implements the API requirement that helper invocations do not have visible
side effects, and the lowering is required on any hardware that cannot directly
mask helper invocation's side effects. The pass was originally written for
Midgard (which has this issue) but is also needed for Asahi. Let's share the
code, and fix it while we're at it.
Changes from the Midgard pass:
1. Add an option to only lower atomics.
AGX hardware can mask helper invocations for "plain" stores but not for
atomics. Accordingly, the AGX compiler wants this lowering for atomics but
not store_global. By contrast, Midgard cannot mask any stores and needs the
lowering for all store intrinsics. Add an option to the common pass to
accommodate both cases.
This is an optimization for AGX. It is not required for correctness, this
lowering is always legal.
2. Fix dominance issues.
It's invalid to have NIR like
if ... {
ssa_1 = ...
}
foo ssa_1
Instead we need to rewrite as
if ... {
ssa_1 = ...
} else {
ssa_2 = undef
}
ssa_3 = phi ssa_1, ssa_2
foo ssa_3
By default, neither nir_validate nor the backends check this, so this doesn't
currently fix a (known) real bug. But it's still invalid and fails validation
with NIR_DEBUG=validate_ssa_dominance.
Fix this in lower_helper_writes for intrinsics that return data (atomics).
3. Assert that the pass is run only for fragment shaders. This encourages
backends to be judicious about which passes they call instead of just
throwing everything in a giant lower everything spaghetti.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Italo Nicola <italonicola@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21413>
The pass can't handle it just like the other unpack opcodes and generates
invalid NIR.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19399>
This should make things more clear than changing the value from earlier
in the loop. Also, rename chunk_offset to load_offset so they match.
Reviewed-by: M Henning <drawoc@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21524>
Checking against align_mul is insufficient if align_offset > 0. We need
to check against the combined alignment instead.
Fixes: 2e2d7803c7 ("nir: Add a load/store bit size lowering pass")
Reviewed-by: M Henning <drawoc@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21524>
The offset handling should already work for flattening to our split vars,
just need to make sure we have enough (or any!) array elements.
Fixes: #7154
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13288>
We had all these nice fdot opts to drop individual channels that were 0,
but nothing handling it being entirely 0! Avoids r300g regression when
dropping them from GLSL.
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21475>
If a dual source blend colour is never written, src1 will be null and it will be
invalid to dereference it. src1 is dereferenced both for the f2fN instruction
but also if a dual blend factor is used... even if the latter isn't strictly
valid, segfaulting in the NIR pass seems a lot meaner than blending with zero.
The referenced commit hosed Asahi, causing anything that used blending to crash.
Panfrost is unaffected since it always supplies a dual colour due to our crude
construction of blend shaders.
Fixes: 8313016543 ("nir/lower_blend: Consume dual stores")
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21544>
Add a new texture opcode that returns the LOD bias of the sampler. This will be
used on AGX to lower sampler LOD bias to txb and friends. This needs to be a
texture op (and not a new intrinsic) to handle both bindless and bindful
samplers across GL and Vulkan in a uniform way.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21276>
Now that we're working on lowered I/O, passing in the dual source blend colour
via a sideband doesn't make any sense. The primary source blend colours are
implicitly passed in as the sources of store_output intrinsics; likewise, we
should get dual source blend colours from their respective stores. And since
dual colours are only needed by blending, we can delete the stores as we go.
That means nir_lower_blend now provides an all-in-one software lowering of dual
source blending with no driver support needed! It even works for 8 dual-src
render targets, but I don't have a use case for that.
The only tricky bit here is making sure we are robust against different orders
of store_output within the exit block. In particular, if we naively lower
x = ...
primary color = x
y = ...
dual color = y
we end up emitting uses of y before it has been defined, something like
x = ...
primary color = blend(x, y)
y = ...
Instead, we remove dual stores and sink blend stores to the bottom of the block,
so we end up with the correct
x = ...
y = ...
primary color = blend(x, y)
lower_io_to_temporaries ensures that the stores will be in the same (exit)
block, so we don't need to sink further than that ourselves.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21426>
This will emulate VGT_ESGS_RING_ITEMSIZE, which does the multiplication
for us. It's beneficial to stop setting VGT_ESGS_RING_ITEMSIZE to reduce
context rolls, and also the register will be removed in the future.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21525>
The scaling needs to be ubo * MAX_INLINABLE_UNIFORMS, not
ubo * PIPE_MAX_CONSTANT_BUFFERS, otherwise accesses beyond buffer size
will result for ubo >= 4 (and we'd also access the wrong values later
for other non-zero ubo indices).
Fixes: a7696a4d98 ("lavapipe: Fix bad array index scale factor in lvp_inline_uniforms pass")
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21506>
The result might be used in a deref_ptr_as_array, which requires a proper
stride within lower_explicit_io. If we'd lose that information or end up
with a different stride don't execute this optimization.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8289
Fixes: b779baa9bf ("nir/deref: fix struct wrapper casts. (v3)")
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21458>