This injects a MRTZ export with only the alpha channel to select it
with COVERAGE_TO_MASK_ENABLE for alpha-to-coverage.
Co-Authored-by: Rhys Perry <pendingchaos02@gmail.com>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32583>
Displayable DCC should also be disabled, otherwise it's asserting
somewhere in ac_surface.c
Fixes: e3d1f27b31 ("radv: add radv_disable_dcc_stores and enable for Indiana Jones: The Great Circle")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32584>
Borderlands 3 (both DX11 and DX12 renderers) have a common pattern
across many shaders:
con 32x4 %510 = (uint32)txf %2 (handle), %1191 (0x10) (coord), %1 (0x0) (lod), 0 (texture)
con 32x4 %512 = (uint32)txf %2 (handle), %1511 (0x11) (coord), %1 (0x0) (lod), 0 (texture)
...
con 32x4 %550 = (uint32)txf %2 (handle), %1549 (0x25) (coord), %1 (0x0) (lod), 0 (texture)
con 32x4 %552 = (uint32)txf %2 (handle), %1551 (0x26) (coord), %1 (0x0) (lod), 0 (texture)
A single basic block contains piles of texelFetches from a 1D buffer
texture, with constant coordinates. In most cases, only the .x channel
of the result is read. So we have something on the order of 28 sampler
messages, each asking for...a single uint32_t scalar value. Because our
sampler doesn't have any support for convergent block loads (like the
untyped LSC transpose messages for SSBOs)...this means we were emitting
SIMD8/16 (or SIMD16/32 on Xe2) sampler messages for every single scalar,
replicating what's effectively a SIMD1 value to the entire register.
This is hugely wasteful, both in terms of register pressure, and also in
back-and-forth sending and receiving memory messages.
The good news is we can take advantage of our explicit SIMD model to
handle this more efficiently. This patch adds a new optimization pass
that detects a series of SHADER_OPCODE_TXF_LOGICAL, in the same basic
block, with constant offsets, from the same texture. It constructs a
new divergent coordinate where each channel is one of the constants
(i.e <10, 11, 12, ..., 26> in the above example). It issues a new
NoMask divergent texel fetch which loads N useful channels in one go,
and replaces the rest with expansion MOVs that splat the SIMD1 result
back to the full SIMD width. (These get copy propagated away.)
We can pick the SIMD size of the load independently of the native shader
width as well. On Xe2, those 28 convergent loads become a single SIMD32
ld message. On earlier hardware, we use 2 SIMD16 messages. Or we can
use a smaller size when there aren't many to combine.
In fossil-db, this cuts 27% of send messages in affected shaders, 3-6%
of cycles, 2-3% of instructions, and 8-12% of live registers. On A770,
this improves performance of Borderlands 3 by roughly 2.5-3.5%.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32573>
On a single runner, this job currently times out due to taking over 5
hours. The estimate from dEQP runner itself suggests a full run might
take over 8 hours with the current configuration. We can't really work
with that long runs, even if they are manual.
We currently have 7 vim3 runners, so we can actually afford to
parallelize the run a bit, to make this a bit more manageable. If we
choose 4, we take up a bit more than half of the runners, but we leave
two runners (plus a spare) for the pre-merge CI.
With this, a each job takes about 2.5 hours. We leave the timeout at 3
hours for now, to have some headroom for new tests being enabled.
Acked-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32591>
While the nr_channels is defined with 3 bits, which allows up to 7
channels, actually the number of channels is less or equal to 4.
This adds an assertion that helps static analyzers to avoid several
false positives related with this.
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32589>
This updates entry for 14017823839 which fixes issues on BMG with:
dEQP-VK.compute.pipeline.zero_initialize_workgroup_memory.max_workgroup_memory.1
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32550>
We know we have a broken Vulkan driver, so it's debatable whether it's
a broken Vulkan 1.0 or broken 1.1. Advertising 1.1 lets us run more
tests, and this patch does this. We also bump the instance version id
to 1.4, which seems appropriate since the overall Vulkan infrastructure
within Mesa is at that level.
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32464>
We were using the same routine to find the device and instance
version numbers. This isn't correct; the device version may
vary based on the physical hardware we are using, but the
instance version should always be the same.
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32464>
Like mad, it's sometimes useful to swap the srcs of sad since not all
flags are allowed on all srcs. However, unlike mad, sad is 3-src
commutative so more srcs can be swapped.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32501>
In preparation for supporting sad (which like mad may benefit from
swapping some of it srcs), extract the swapping from
try_swap_mad_two_srcs so that it can be reused for sad. This is
necessary since, unlike mad, sad might also benefit from swapping srcs
1->2 (instead of only 2->1) or 3->2.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32501>
We would mark mad srcs as swapped once we tried swapping them, even if
it would not succeed. However, it might happen (especially after running
ir3_shared_folding) that a new opportunity for swapping comes up later.
Therefore, we should only mark the srcs as swapped when it actually
succeeded.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32501>
Turns out that sad is just iadd3. I assume it's an acronym for "Sum of
Absolute Differences" which may make sense since its 2nd src supports
(neg) which would allow SAD to be implemented using this instruction.
NIR already supports algebraic patterns for selecting iadd3 so adding
codegen support in ir3 is trivial. However, sad seems to have the same
hardware limitation as mad and doesn't support the scalar ALU so we have
to make sure to disable it when emitting iadd3.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32501>
The override used for the immed encoding in #cat3-src-const-or-immed
used a pattern which isn't supported in overrides by isaspec. The
pattern in the base bitset (10) was too strict for immediates since it
didn't allow the most significant bit to be 1.
Fix this by making the base pattern 1 and adding an assert for the next
bit to be 0 in the non-immed case.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Fixes: 1c6c200c0d ("ir3: add newly found shlg.b16 instruction")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32549>
This is the format that drivers will want to use for NV16
without YUV conversion (if they support this natively).
Previously we had NV16 working but it was always emulated
with R8 + GR88.
Fixes: 440b69210a ("dri, mesa: fix NV16 texture format")
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32524>
These can be lowered to ALU and load_subgroup_invocation, all of which are
reorderable.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32512>
This missed dpp16_shift_amd, lane_permute_16_amd, last_invocation and
ballot_relaxed.
Instead, list the non-reorderable intrinsics which are allowed to be moved
after discards.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32512>
This can't be moved to after demote, so it's not reorderable.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32512>
This just works on Mali, nothing fancy needed.
Unfortunately, this triggers a lot of timeouts, presumably due to
uncached CPU access to memory. So lots of extra skips here.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32562>
ttmp sgprs are readonly outside of trap handlers, so the instructions were
probably skipped. RA should also never create additional exec writes.
Fixes: e06773281b ("aco/ra: Optimize some SOP2 instructions with literal to SOPK.")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32545>
When a timestamped present is not used (MAILBOX or the very first present),
it's possible that the very last queued present ID won't complete in finite time.
Similar to frame callback based workaround, apply a timeout to present
waits when they target the very last submitted presentID.
Only apply the workaround when we're not guaranteed forward progress.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Cc: mesa-stable
Reviewed-by: Autumn Ashton <misyl@froggi.es>
Reviewed-by: Derek Foreman <derek.foreman@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32556>
When transitioning from FIFO to MAILBOX with swapchain_maintenance1,
we must make sure that the first MAILBOX after FIFO observes the wait
barrier. This was done implicitly in the timestamp path, but not for
the non-commit-timing path.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Cc: mesa-stable
Reviewed-by: Autumn Ashton <misyl@froggi.es>
Reviewed-by: Derek Foreman <derek.foreman@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32556>
When commit-timing was not supported, but FIFO was we would end
up in a situation with throttling on FIFO barrier and legacy fence.
At that point, the entire point of FIFO falls flat.
There are some caveats with this approach, but it's not expected
that compositors will only support FIFO, and not commit-timing long
term.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Fixes: c26ab1aee1 ("vulkan/wsi/wayland: Pace frames with commit-timing-v1")
Reviewed-by: Autumn Ashton <misyl@froggi.es>
Reviewed-by: Derek Foreman <derek.foreman@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32556>