Just drop the store. Written while debugging
dEQP-VK.pipeline.monolithic.logic_op.r8_uint.no_op.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Italo Nicola <italonicola@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24252>
nir_const_value_for_int asserts signed bounds on the input, but we pass in an
unsigned value that would be out-of-bounds for 32-bit channels, causing the
assert to fail for 32-bit channel formats.
Fixes dEQP-VK.pipeline.monolithic.logic_op.r32_uint.* on AGXV (and probably
PanVK).
Fixes: dbd0615e7a ("nir/lower_blend: Avoid useless iand with logic ops")
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Italo Nicola <italonicola@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24252>
This avoids the silly compiler versions. Some bits are slightly more
complicated, because they have to account for inverted enum values (rather than
a separate invert bit), but this is a LOT friendlier to drivers using the pass
and it makes the pass itself more readable.
The conversion functions in panfrost/panvk will go away momentarily.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Italo Nicola <italonicola@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24076>
While debugging KHR-GLES31.core.draw_buffers_indexed.color_masks, the noise from
piles of store_output(load_output) instructions got in the way. Optimize it out.
This does not fix the test, but if this case ever happened in a real app it
would improve performance. This is only load bearing on Asahi (and PanVK?),
since Panfrost wouldn't call nir_lower_blend at all in this case.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23836>
Loading output require per-sample blending, so enable per-sample execution of
the shader as a whole so the right sample values are blended. Affects:
dEQP-GLES31.functional.multisample.default_framebuffer.sample_mask_sum_of_inverses
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22385>
Only insert a load_output if we're going to use it, don't rely on it getting
DCE'd since that will mess up the shader info. This does require a bit of logic
to figure out whether we do need it.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22385>
If a dual source blend colour is never written, src1 will be null and it will be
invalid to dereference it. src1 is dereferenced both for the f2fN instruction
but also if a dual blend factor is used... even if the latter isn't strictly
valid, segfaulting in the NIR pass seems a lot meaner than blending with zero.
The referenced commit hosed Asahi, causing anything that used blending to crash.
Panfrost is unaffected since it always supplies a dual colour due to our crude
construction of blend shaders.
Fixes: 8313016543 ("nir/lower_blend: Consume dual stores")
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21544>
Now that we're working on lowered I/O, passing in the dual source blend colour
via a sideband doesn't make any sense. The primary source blend colours are
implicitly passed in as the sources of store_output intrinsics; likewise, we
should get dual source blend colours from their respective stores. And since
dual colours are only needed by blending, we can delete the stores as we go.
That means nir_lower_blend now provides an all-in-one software lowering of dual
source blending with no driver support needed! It even works for 8 dual-src
render targets, but I don't have a use case for that.
The only tricky bit here is making sure we are robust against different orders
of store_output within the exit block. In particular, if we naively lower
x = ...
primary color = x
y = ...
dual color = y
we end up emitting uses of y before it has been defined, something like
x = ...
primary color = blend(x, y)
y = ...
Instead, we remove dual stores and sink blend stores to the bottom of the block,
so we end up with the correct
x = ...
y = ...
primary color = blend(x, y)
lower_io_to_temporaries ensures that the stores will be in the same (exit)
block, so we don't need to sink further than that ourselves.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21426>
This is one step towards lowering I/O during shader preprocess rather than at
variant create time, which helps mitigate shader variant jank. It's also a lot
simpler.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> [v1]
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20836>
In OpenGL, FRAG_RESULT_COLOR implicitly broadcasts to every render target. Our
existing lower_blend code (somewhat arbitrarily) aliases to the the first render
target's format and blend settings. That said, I don't think that works if
different render targets have different settings -- or blend with their
different destinations -- though I don't have relevant spec text right now.
The actual reason this works is that all users of this pass either call
nir_lower_fragcolor first (panfrost, asahi) or don't have FRAG_RESULT_COLOR as
part of their API (panvk, soon agxv). Unless/until we actually have a use case
for nir_lower_blend with gl_FragColor, assert that gl_FragColor is lowered first
so we don't need to worry about this imaginary case.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20836>
Stores don't have destinations, and if they did, it would be invalid to change
their ssa_def's num_components without also changing the SSA def. Remove the
nonsensical (but harmless) assignment.
This fixes 25249e8be2 ("nir/lower_blend: Expand or shrink output variables as
needed"), but as the bug is harmless in practice, it does not need to be
backported.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Suggested-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Italo Nicola <italonicola@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20836>
In this usual case, do a quick check to avoid generating 5 useless instructions
(mov/vec4 instructions). They'll get copypropped but that creates more work for
the optimizer and nir/lower_blend runs in a hot variant path on both Asahi and
Panfrost.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20016>
nir/lower_blend asserts:
assert(nir_intrinsic_write_mask(store) ==
nir_component_mask(store->num_components));
For the special blend shaders used in Panfrost, this holds. But for arbitrary
shaders coming out of GLSL-to-NIR (as used with Asahi), this does not hold. In
particular, after nir_opt_undef runs, undefined components can be trimmed.
Concretely, if we have the shader:
gl_FragColor.xyz = foo;
Then this becomes in NIR
gl_FragColor = vec4(foo.xyz, undef);
and then opt_undef will give the store_deref a wrmask of xyz but 4 components.
Then lower_blend asserts out.
Found in a gfxbench shader on asahi.
Closes: #6982
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20016>
Per the spec.
Fixes arb_color_buffer_float-render on both Panfrost and Asahi (before/after
reproduced on Mali-T860 and AGX G13 respectively). Without that patch, that test
fails the assertion:
arb_color_buffer_float-render: ../src/compiler/nir/nir_lower_blend.c:259: nir_blend_logicop: Assertion `util_format_is_pure_integer(format)' failed.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20016>
The upper bits start correctly, there's no need to clear them as long as we keep
them zero'ed by using ixor with a valid bit mask instead of inot.
Makes the code generated for logic op slightly less ridiculous. I'm joking. It's
still ridiculous but I'm not in the mood to fix up the Midgard compiler and it's
just a little ALU for a feature almost nothing uses.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20016>
We need to sign extend. Incidentally this means the iand above is useless for
SNORM.
Fixes arb_color_buffer_float-render with GL_RGBA8_SNORM.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20016>
Particularly constant colours, but also (more obscurely) SNORM.
Fixes arb_color_buffer_float-render with SNORM framebuffers. Issue affects both
Asahi and Panfrost (the latter after we start advertising EXT_render_snorm).
v2: Check the blend factor to avoid unnecessary clamps. This avoids regressing
blend shader code quality on Panfrost.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com> [v1]
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20016>
In this case we have 4 components but the value of the fourth component
is undefined. Apply the fixup we already have.
Fixes
dEQP-GLES3.functional.draw_buffers_indexed.random.max_implementation_draw_buffers.0
on Asahi. That test blend with DST_ALPHA with its RGB565 attachment,
which is fine if RGB565 is preserved, but Asahi is demoting that
format to RGBX8 which means -- after lowering the tilebuffer access --
we blend with an ssa_undef.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20016>
The option struct passed to nir_lower_blend doesn't have a "blending
disabled" flag. Unless blending is skipped due to logic ops or
framebuffer formats, nir_lower_blend always blends, even if the blend
mode is "replace" (corresponding to the API level blend disable).
That's mostly okay, since NIR can optimize out the code, at the expense
of a little compile time. However, there's a catch: nir_lower_blend
emits fsat at the start of the shader (for UNORM framebuffers, or
fsat_signed for SNORM). We can expect hardware to saturate the input to
store_output itself, so these operations are redundant, but it's tricky
to optimize these instructions out otherwise. Don't even try: detect the
replace blend mode and don't call nir_blend in that case. Colour masking
is still applied as usual.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18535>
Because we pull the RT from the variable location and use that to look
up formats, we need a constant RT index. To deal with arrays (possibly
of arrays), we would either need to handle array derefs (we don't today)
or we need to require the variables to be split into one variable per
RT. Given that we have to lower indirect derefs anyway (to get constant
indices), we may as well require the client to split output variables by
calling nir_lower_io_arrays_to_elements_no_indirect().
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16309>
The BITFIELD_MASK() macro is intended for using with actual bitfields,
not with nir_component_mask_t. This means we do some extra work to
handle values that are invalid for nir_component_mask_t in the first
place.
This eliminates some warnings on Clang, where the compiler complains
about casting UINT32_MAX to UINT16_MAX.
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15547>
nir_lower_blend was written against the OpenGL ES 3.2 specification,
which does not support blending SNORM render targets. The ES spec
says that non-floating point buffers get clamped to [0, 1] before
blending. The story is not so simple: SNORM buffers are blendable in
OpenGL and must clamped to [-1, 1] rather than [0, 1]. Handle this case.
NIR does have the fsat_signed_mali instruction to clamp to [-1, 1], but
it is only implemented in Panfrost, and this pass is in common code.
Open code it instead. Panfrost optimizes the open coded version, so this
is good enough.
Fixes SNORM subtests of Piglit arb_texture_view-rendering-formats.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13499>
Make sure the new and old sources have the same number of components,
otherwise the NIR validation pass complains.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13060>
SCALED formats are interpreted as floats, but not in the usual [0, 1]
or [-1, 1] range, meaning that the blend lowering logic can't directly
apply to those. Assert that the format being passed is not a scaled
format.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13060>
The caller doesn't necessarily want to lower blend operations on all
render targets since some of them might be supported natively (panvk
will be in that case). Let's just skip RTs that have a format set to
PIPE_FORMAT_NONE to allow that.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13060>
nir_ssa_for_src() is not supposed to pad the src vector if
dst->num_components > src->num_components. Let's pad things explicitly
with nir_pad_vector().
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13060>
This pass was originally developed for Panfrost, where it passes the
relevant dEQP tests. Upstreaming so it can be extended and then shared
with:
* Asahi, for blending
* Zink, for logic ops
* Lavapipe, for advanced blending
Note that using this with MRT in a fragment shader (as non-panfrost
drivers will) has not yet been tested. Logic ops with integer
framebuffers are probably todo. It's been enough for Panfrost, will
suffice for ES2 on Asahi, and provides an upstream base for kusma's work
on advanced blending, so overall the merge is a net benefit.
v2: Remove bogus assert that the format layout is PLAIN. We need to
render R11G11B10, which Mesa reports as layout OTHER. The code is still
correct.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com> [v1]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10601>