These would only have worked in GCC and Clang, which so far wasn't an
issue, but let's clean it up anyway.
Cc: mesa-stable
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18190>
There are a bunch of subtle details of how 32-bit sources are
zero-extended to 64-bit, how their swizzles work, how 64-bit
destinations are shrunk to 32-bit, and how those two interact. This
fixes the interactions... mostly.
Fixes umul_high, all such tests should be passing now. Unblocks idiv
lowering that depends on umul_high.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17860>
This works around issue packing 32-bit scalar swizzles zero-extended to
64-bit, seen with the umul_high implementation. I tried for a while
figuring out the root cause (even rewrote a big chunk of disassembler)
but am still a bit lost. Nevertheless this is a safe workaround with no
performance impact (and avoids relying on NIR undefined behaviour to
implement GPU undefined behaviour), so let's do this for now to fix
umul_high.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17860>
Even if the driver doesn't *use* trivial blend shaders, building and compiling
blend shaders is expensive. We shouldn't be building blend shaders that should
never be used.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17841>
Helpful to disambiguate blend shaders with different colour masks used for the
same format/replace operation.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17841>
We don't need blending in the blitter. That means blend shaders are only needed
on Midgard. Simplify accordingly.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17841>
On Midgard, we need a "blend" shader even if blending is disabled, if the format
isn't blendable. This is inefficient. Bifrost solves this by decoupling the
format conversion from the blending, allowing opaque (unblended) output to any
format without a blend shader or fragment key.
Unfortunately, our blend code is from the Midgard era -- I wrote an early
version of nir_lower_blend when I was still in high school! -- so we've been
using blend shaders for opaque output even on Bifrost. Whoops!
In SuperTuxKart, reduces blend shader calls by 30%, translating to a 15%
reduction in i-cache misses.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17841>
..on Bifrost and later, where the conversion hardware makes this reasonable.
This saves us from inserting a pile of conversions in the compiler to lower away
the 8-bit input/output. This also generates substantially better code.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17841>
It's unnecessary since the hardware already does the conversion for us.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17841>
The type of the output variable will propagate through the store_output
intrinsic's src_type field to the BLEND instruction's register format field. On
Valhall, the register format for a BLEND comes from the instruction -- the
register format specified in the conversion descriptor (used on Bifrost) is
ignored. That means it has to match.
Previously, we always used a blend shader for integer rendering. Since blend
shaders ignore the register format of the BLEND instruction, that masked this
issue. That also means we don't need to backport this.
Will prevent a regression from the following commit.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17841>
For untyped_color_outputs, we need to ignore the type of the colour output in
the shader and instead use the type from the format. We have all the information
to do this at blend descriptor pack time, but not at shader compile time. This
means we need a (somewhat expensive) fixup in this edge case to ingest
NIR-to-TGSI. This will prevent a regression from the rest of the series.
Although the register_format field is also present on Valhall blend descriptors,
it is ignored so we don't need the fixup there.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17841>
Colour outputs in TGSI are untyped so we have to ignore the register format on
the store_output. Luckily, Valhall gives us the auto32 escape hatch to deal with
it, and Bifrost doesn't care anyway. This will avoid regressions from native int
output without corrupting the compiler for GLSL and SPIR-V shaders that are
well-typed.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17841>
Clause scheduler edition of db2bdc1dc3 ("pan/bi: Require ATEST coverage mask
input in R60"). ATEST wants to read r60, which can't work if its input isn't
even in a register.
When per-sample shading isn't in use, prevents regressions in:
KHR-GLES31.core.sample_variables.mask.*
These tests previously passed because per-sample shading was forced. It's
not clear whether the bug addressed in this patch is possible to hit "in the
wild", i.e. without the optimizations in this series that allow us to use
per-pixel shading in more cases.
No shader-db changes.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17841>
Fixes flaking in
dEQP-GLES31.functional.image_load_store.cube.qualifiers.volatile_r32i due to
image reads being moved past a BARRIER.
To make this more robust/optimal, we probably need scheduling information
(coherent/volatile/etc) added to instructions like ACO does. That's left for a
future extension, for now I just want the test to stop flaking.
Fixes: 569e5dc745 ("pan/bi: Schedule for pressure pre-RA")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17841>
This takes advantage of the .i1 modifier on the comparison to get b2i32 "for
free" in typical circumstances, saving an instruction. Will help with an instr
count regression from lower_idiv.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17266>
If we don't recognize the model, dev->model will be NULL. In that case, we can't
dereference dev->model to get the tilebuffer size. If we do, we'll segfault,
instead of gracefully refusing to probe and loading the swrast instead.
Fixes: 96d65b47c7 ("panfrost: Use implementation-specific tile size")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18115>
It's noisy since Bifrost was introduced, unnecessary since we converted to
per-arch GenXML, and wrong since Valhall was added.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18094>
Architecturally, these only work for Midgard, and even on Midgard didn't turn
out to be too useful. While we're removing pandecode cruft, let's remove the
stats that just add noise to Bifrost and Valhall (and largely just noise to
Midgard too).
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18094>
It's the same core logic. Unify and let GenXML do its thing.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18094>
Eliminate some #ifdef by grouping v5 and v6 state separately.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18094>
Remove unsued width/height properties, and use cleaner C syntax to build the
return value.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18094>
There are a lot of problems with passing job_index around:
* Almost entirely unused
* Not particularly helpful even when used
* Mostly ignored for Valhall already
* Doesn't extend to CSF
It only really exists due to the early days of pandecode generating valid C code
as the trace format. With GenXML instead, that's not applicable.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18094>
It hasn't had a consistent semantic meaning since we've switched decoding over
to GenXML.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18094>
The hardware doesn't care what BO a given buffer resides in, only what GPU
address it's at. It's simpler to fetch from a GPU address, rather than the pair
of a GPU address and a backing allocation. This cleans up a lot of cruft in
pandecode.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18094>
../src/panfrost/lib/tests/test-earlyzs.cpp: In function 'void test(pan_earlyzs, pan_earlyzs, uint32_t)':
../src/panfrost/lib/tests/test-earlyzs.cpp:59:4: error: 'pan_shader_info::<unnamed union>' has no non-static data member named 'can_discard'
59 | };
| ^
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18024>
source_root function is deprecated in Meson version 0.56.0, so let's use
instead a current_source_dir() function, available in all Meson
versions. This also allows to deduplicate some code by declaring
commonly used string at the top meson.build file.
Signed-off-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17974>
This includes the new timeout fixes so that tests that throw lots of debug
don't delay the timeout triggering, and the fraction vs shuffling behavior
change so that "--fraction 2" doesn't just skip every other test as it
appears in the caselist (every vertex shader variant, for example).
The fraction vs shuffling change does mean we see some different fails on
some drivers now.
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17876>
PAN_MESA_DEBUG=overflow will place objects as close as possible to a
protected region at the end of the buffer, so that overflows segfault.
Caught the bugs in all four of the preceding commits.
v2: memset the BO to 0xbb to catch code expecting zeroed allocations.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17447>
create_vertex_elements_state is sometimes called with a too large
num_elements argument, for example with util_blitter, which causes a
buffer overflow.
There is no documentation to forbid this practice, so don't rely on
so->num_elements being correct and instead use the vertex shader
attribute count, which matches the value used to allocate the
descriptors.
Use attributes_read_count rather than attribute_count because the
latter also includes images and PAN_VERTEX_ID/PAN_INSTANCE_ID.
Fixes: 76de3e691c ("panfrost: Merge attribute packing routines")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17447>
We're about to make it so that the compiler warns/errors if you use the
wrong iterator macro. Fix up a bunch of places where someone used the
wrong one before we break anything.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17630>
Remove the previous compile-time early-ZS implementation and replace it with the
decoupled early-ZS implementation. This uses more efficient settings in some
cases (depth/stencil tests always passes or do not write), and fixes the
settings used in another case (alpha-to-coverage enabled with an otherwise
early-ZS shader.)
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Closes: #6206
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17428>
The new early-ZS helpers are pure functions, leaf nodes of the call graph, and
implemented with a different algorithm from the "oracle" table of correct values
for various combinations of states. Further, incorrect settings often still pass
CTS while causing game bugs or inefficiencies. That combination makes the
helpers an excellent candidate for unit tests. Add some.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17428>
Bifrost (and Valhall) separate early-ZS configuration into two fields: when does
the depth/stencil buffer update happen? and when are pixels killed by the
depth/stencil tests? The driver separately configures these to occur early
(before the shader executes) or late (after the ATEST instruction executes at
the end of the shader). Early tests are generally more efficient, but various
combinations of API state and fragment shader properties can require late
updates and/or late kills for correctness. Determining how to configure these
fields is nontrivial.
Our current implementation (on Bifrost) configures these fields at fragment
shader compile time and bakes the settings into the RSD. This is both wrong
(using early testing when late testing is required) and suboptimal (using late
testing when early testing would suffice). We need to defer this configuration
until draw time, when we know rasterizer and Z/S state.
Reclassifying at draw time (as we currently do on Valhall) would be expensive,
especially with the extra terms added in here. To cope, decouple the shader
classification from the draw-time configuration. Since there are only a few bits
of draw state involved, this implementation just calculates all possible states.
Then the draw time classification is just indexing into a lookup table.
The actual algorithm used to classify is written with correctness and clarity in
mind. Unlike the current classification algorithm (which tries to match what the
DDK does, poorly), this algorithm embeds its proofs of correctness.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17428>
In theory this wait is required for correct behaviour of discarded threads with
ATEST. Mesa usually waits before the instruction after ATEST, so this wait will
get optimized out by va_merge_flow, but as our scheduler gets more sophisticated
this could become an issue.
Let's stay on the safe side and insert the recommended wait.
No shader-db changes.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17428>
In theory, ATEST can take any combination of registers for inputs.
Experimentally, however, ATEST requires the coverage mask in R60. This avoids
regressing the following dEQP tests, which write their coverage mask with
pixel-frequency-shading but without writing to the depth/stencil buffer.
dEQP-GLES31.functional.shaders.sample_variables.sample_mask.discard_half_per_pixel.*
This issue is known to affect both Mali-G52 (v7) and Mali-G57 (v9). I am unsure
if this is a silicon bug or just an obscure implementation detail.
No shader-db changes.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17428>