f4812dc1 introduced optimizations that turn ior into bcsel. The MSL
compiler miscompiles the shader internally when bcsel is used, leading
to incorrect outputs. This commit adds a workaround that tricks the MSL
compiler into compiling the shader correctly.
Reviewed-by: squidbus <squidbus@proton.me>
Signed-off-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41548>
Metal provides device properties for the recommended maximum memory usage and
the current amount of memory used. These can be used to provide an estimate
of heap usage and calculate a budget of memory usage by the application before
performance may degrade.
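A minimal sketch of the estimate described above, assuming the two Metal
values (presumably `recommendedMaxWorkingSetSize` and
`currentAllocatedSize` on `MTLDevice`) have already been read into plain
integers. The struct and function names are hypothetical and not taken
from the driver, and clamping to the heap size is an assumption:

```c
#include <stdint.h>

/* Hypothetical container for the two values queried from Metal. */
struct mem_info {
   uint64_t recommended_max; /* recommended maximum memory usage */
   uint64_t current_used;    /* memory currently allocated */
};

/* Hypothetical result: estimated usage and budget for one heap. */
struct heap_estimate {
   uint64_t usage;
   uint64_t budget;
};

static uint64_t
min_u64(uint64_t a, uint64_t b)
{
   return a < b ? a : b;
}

/* Assumption: both values are clamped so they never exceed the heap size. */
static struct heap_estimate
estimate_heap(struct mem_info info, uint64_t heap_size)
{
   struct heap_estimate est;
   est.usage = min_u64(info.current_used, heap_size);
   est.budget = min_u64(info.recommended_max, heap_size);
   return est;
}
```

With a heap of 800 units, a recommended maximum of 1000 and 300 in use,
this reports a usage of 300 and a budget of 800.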
Reviewed-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41523>
Also, re-title things to make it clear that the current text is about
implementing OpenGL[ES] extensions.
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41400>
Unfortunately we have to disable concurrent binning by default
because it hurts performance in a number of desktop games without
any case where we know it helps.
There are fewer vertex fetch resources available in BV than in BR, so
when binning runs in BV with many attribute-heavy vertices, BV performs
much worse than BR, sometimes more than 50% worse.
Even the worse performance wouldn't be a problem if concurrent binning
actually overlapped with other workloads in those cases, but in desktop
games there is almost never a chance for overlap.
However it's impossible to statically find out if binning on BV would
be much slower than on BR, and we also cannot statically predict if
there is enough overlap (if any) to cover for the performance penalty.
Given the above, I don't see a way out but to make concurrent binning
opt-in via the `tu_allow_concurrent_binning` driconf toggle.
Still allow concurrent binning in CI to catch issues early.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41394>
We can determine used components earlier.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41501>
Usually I'm able to run a B580 capture on LNL, but in some cases
oversubscription on replay leads to allocation failures.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41501>
I've pulled in a pile of changes to reduce the overhead (runtime and
memory) when sharding for deqp-runner, along with a bunch of fixes for
KHR_display testing that we recently enabled, plus a few others that
affect our drivers.
The big new set of failures looks like it's from more complete coverage of
blitting between formats.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41243>
We're regularly hitting 13 minutes of deqp-runner runtime on our jobs,
which is too long. Once we uprev the CTS, one of them reaches 14 minutes
and triggers the existing job timeout.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41243>
This makes adding workgroup scope easier. It just creates the
split_box, moves things into it, and adds some helpers.
This also rewrites some loops over r/c into loops over i that
calculate r/c.
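The loop rewrite mentioned above follows a common pattern, sketched here
under assumptions: instead of nesting a row loop inside a column loop, a
single index `i` is iterated and `r`/`c` are derived from it. The
function name, the row-major order, and the `cols` parameter are
illustrative only and not taken from the pass:

```c
/* Walk a rows x cols grid with one flat index and derive r/c from it.
 * Each element is labelled r * 100 + c so the derived values are visible. */
static void
label_elements(unsigned rows, unsigned cols, unsigned *out)
{
   for (unsigned i = 0; i < rows * cols; i++) {
      unsigned r = i / cols; /* row derived from the flat index */
      unsigned c = i % cols; /* column derived from the flat index */
      out[i] = r * 100 + c;
   }
}
```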
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41500>
SEND operands don't have regions or types, and hardware doesn't use
those bits except for possibly an old workaround. So from the
perspective of the assembler, we shouldn't need to add them. For now the
brw_asm grammar requires at least a type, so normalize to UD.
This will make it easier to swap the parser syntax and code later.
Assisted-by: Pi coding agent (opus-4.7)
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41456>
From the perspective of the assembler, regions and types for ARF null
are not relevant -- so ignore them. We still have some validation
relying on the byte-stride of the destination, so keep those for now.
In the long run, if a certain Gfx version HW requires some specific
matching, the encoder (or the parser) should take care of it.
This change will make it easier to swap the parser syntax and code later.
Assisted-by: Pi coding agent (opus-4.7)
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41456>
This is a less obtuse error message for why things break.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41535>
The hardware expects it to be present for every colour target.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41535>
When lowering tg4 sparse testing to a non-gather opcode, we were adding
an explicit LOD 0 parameter. But we might already have a LOD or bias.
Fixes tests like:
dEQP-VK.glsl.texture_gather.basic.2d.rgba8.base_level.sparse_level_1_amd_lod
dEQP-VK.glsl.texture_gather.basic.2d.rgba8.base_level.sparse_level_1_amd_bias
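A rough sketch of the intended check, under stated assumptions: the
explicit LOD 0 is presumably only added when the instruction carries
neither an LOD nor a bias. The struct and helper below are hypothetical
stand-ins, not the actual lowering code:

```c
#include <stdbool.h>

/* Hypothetical stand-in for the sources a texture instruction may carry. */
struct tex_srcs {
   bool has_lod;
   bool has_bias;
   float lod;
};

/* Add an explicit LOD 0 only when neither an LOD nor a bias is present.
 * Returns true if the default LOD was added. */
static bool
add_default_lod(struct tex_srcs *srcs)
{
   if (srcs->has_lod || srcs->has_bias)
      return false; /* keep the existing LOD or bias untouched */

   srcs->has_lod = true;
   srcs->lod = 0.0f;
   return true;
}
```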
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41535>
Move the VecPair<A, B> data structure from NAK's ir.rs to the shared
compiler Rust crate so it can be reused by other backends.
The fields are private, and NAK's ir.rs (now in a different crate)
needs to read and mutate the inner Vecs. Add a_as_slice(..),
a_as_mut_slice(..), b_as_slice(..) and b_as_mut_slice(..), and update
NAK's SrcsAsSlice and DstsAsSlice impls to call them. Returning slices
keeps callers from changing the length of one side without the other,
which is what VecPair is built to prevent.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41435>
These are not usable by applications until we advertise GS, but the
implementation is effectively independent of the rest of the GS
implementation.
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41478>
There is hardware support for adjacency primitives on v9 and later.
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41478>
Previously, if only non-bindless accesses were present, we would end up
emitting an empty preamble.
Also avoid emitting non-bindless textures.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40309>