This one is a bit more complex in that we need to handle 3-source
commutative opcodes. But it's also quite useful:
fossil-db results on Alchemist (A770):
Instrs: 151659750 -> 150164959 (-0.99%); split: -0.99%, +0.01%
Cycles: 12822686329 -> 12574996669 (-1.93%); split: -2.05%, +0.12%
Subgroup size: 7589608 -> 7589592 (-0.00%)
Send messages: 7375047 -> 7375053 (+0.00%); split: -0.00%, +0.00%
Loop count: 46313 -> 46315 (+0.00%); split: -0.01%, +0.01%
Spill count: 110184 -> 54670 (-50.38%); split: -50.79%, +0.41%
Fill count: 213724 -> 104802 (-50.96%); split: -51.43%, +0.47%
Scratch Memory Size: 9406464 -> 3375104 (-64.12%); split: -64.35%, +0.23%
Our older Shadow of the Tomb Raider fossil is particularly helped with
over a 90% reduction in scratch access (spills, fills, and scratch
size). However, benchmarking in the actual game shows no change in
performance. We're thinking the game's shaders have been updated since
our capture.
Ian noted that there was a bug here where we'd accidentally CSE two ADD3
instructions with null destinations and different src[2] that couldn't
be dead code eliminated due to conditional mods. However, this is only
a bug in the new cse_defs pass so we don't need to nominate this for
stable branches.
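As a rough illustration of the 3-source commutative case (not the code
in this change), the sketch below compares the operands of two such
instructions as unordered sets, so that all three sources, including
src[2], take part in the match. The src_id type and the
commutative3_srcs_match() helper are hypothetical names made up for
this example.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for an IR source operand: just a value number. */
typedef int src_id;

static int
cmp_src(const void *a, const void *b)
{
   src_id x = *(const src_id *)a;
   src_id y = *(const src_id *)b;
   return (x > y) - (x < y);
}

/* Returns true if two 3-source commutative instructions (e.g. ADD3) read
 * the same operands in any order.  Both operand lists are sorted into a
 * canonical order first, then compared element by element, so a
 * difference in any single source (src[2] included) is a mismatch.
 */
static bool
commutative3_srcs_match(const src_id a[3], const src_id b[3])
{
   src_id sa[3] = { a[0], a[1], a[2] };
   src_id sb[3] = { b[0], b[1], b[2] };

   qsort(sa, 3, sizeof(sa[0]), cmp_src);
   qsort(sb, 3, sizeof(sb[0]), cmp_src);

   return sa[0] == sb[0] && sa[1] == sb[1] && sa[2] == sb[2];
}
```

With this shape, {1, 2, 3} matches {3, 1, 2}, while {1, 2, 3} and
{1, 2, 4} are kept as distinct instructions.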
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29848>
Fix defect reported by Coverity Scan.
Evaluation order violation (EVALUATION_ORDER)
write_write_typo: In src_idx = src_idx = binding_layout->desc_idx + i * desc_stride + subdesc_idx,
src_idx is written twice with the same value.
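For reference, a minimal sketch of the corrected assignment, assuming
the index is computed from plain integer inputs. The desc_src_idx()
helper and its parameters are hypothetical and only mirror the
expression quoted by Coverity.

```c
#include <stdint.h>

/* Hypothetical helper mirroring the expression quoted by Coverity. */
static uint32_t
desc_src_idx(uint32_t desc_idx, uint32_t i, uint32_t desc_stride,
             uint32_t subdesc_idx)
{
   /* Before: src_idx = src_idx = desc_idx + i * desc_stride + subdesc_idx;
    * The duplicated "src_idx =" is dropped; the computed value is unchanged.
    */
   uint32_t src_idx = desc_idx + i * desc_stride + subdesc_idx;

   return src_idx;
}
```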
Fixes: 7bea6f8612 ("panvk: Overhaul the Bifrost descriptor set implementation")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29865>
This optimizes things by splitting the position and vertex
processing in two, allowing primitives to be discarded before
the varying shader is executed.
This optimization is even more important if we throw
layered rendering into the mix, because layered rendering on
Bifrost is implemented with N IDVS/fragment jobs (N being the
number of layers), with primitives not targeting a given
layer being artificially culled in the vertex shader by
issuing a position outside the render area.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29450>
Right now we assume the fragment job chain contains only one job, but
with multilayer/multiview rendering, we want to submit fragment jobs
for all layers at once.
Turn pan_emit_fragment_job() into pan_emit_fragment_job_payload() and
delegate the job header packing to the caller.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29450>
Right now, we always emit a framebuffer descriptor for the first layer
in the RT views. Extend the logic so we can emit one FBD per layer we're
supposed to render to without having to manually modify the RT views.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29450>
For task shaders, RADV will need to prepare two command buffers in the
DGC prepare shader. The preprocess buffer will be split into two
parts, one for GFX and one for ACE.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29814>
Adds a bit set to llvmpipe_resource where each bit stores the residency
of a 64KB tile. The sampling code is adjusted to make use of said table
and return a residency code for sparse texture operations.
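The idea of one residency bit per 64KB tile can be sketched as below.
This is only an illustration under assumed names: struct
residency_table, residency_set() and residency_test() are hypothetical
and are not the llvmpipe data structures.

```c
#include <stdbool.h>
#include <stdint.h>

#define TILE_SIZE (64u * 1024u) /* 64KB tiles, as described above */

/* Hypothetical residency table: one bit per 64KB tile. */
struct residency_table {
   uint32_t *bits;     /* caller-allocated, (num_tiles + 31) / 32 words */
   uint32_t num_tiles;
};

/* Marks the tile containing byte offset `offset` as resident or not. */
static void
residency_set(struct residency_table *t, uint64_t offset, bool resident)
{
   uint32_t tile = (uint32_t)(offset / TILE_SIZE);

   if (tile >= t->num_tiles)
      return;

   if (resident)
      t->bits[tile / 32] |= 1u << (tile % 32);
   else
      t->bits[tile / 32] &= ~(1u << (tile % 32));
}

/* Returns true if the tile containing byte offset `offset` is resident.
 * Offsets past the end of the table are reported as non-resident.
 */
static bool
residency_test(const struct residency_table *t, uint64_t offset)
{
   uint32_t tile = (uint32_t)(offset / TILE_SIZE);

   if (tile >= t->num_tiles)
      return false;

   return (t->bits[tile / 32] >> (tile % 32)) & 1u;
}
```

A sampler built on such a table would look up the tile for each
fetched texel and fold the result into the residency code it returns.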
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29408>
These were only applied in emit_apply_pipe_flushes but in theory could
be required for some other individually shot pipe controls.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29897>
Prevents regressions when removing input modifiers from a == 0.0.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29467>
This fixes corruption of push constants on Xe2 due to a mismatch in
the uniform layout implemented by the compiler and assumed by the
driver. To fix it we need to align the push constant ranges computed
by the Vulkan driver to a multiple of the GRF size of the platform.
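The alignment itself is a plain round-up to the register size. A
minimal sketch, assuming the GRF size is a power of two passed in by
the caller; align_to_grf() is a hypothetical helper, not the driver's
function, and the sizes used with it here are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Rounds a push constant range size up to a multiple of the GRF size.
 * grf_size_bytes must be a non-zero power of two.
 */
static uint32_t
align_to_grf(uint32_t size_bytes, uint32_t grf_size_bytes)
{
   assert(grf_size_bytes != 0 &&
          (grf_size_bytes & (grf_size_bytes - 1)) == 0);

   return (size_bytes + grf_size_bytes - 1) & ~(grf_size_bytes - 1);
}
```

For example, a 40-byte range rounds up to 64 bytes with a 64-byte GRF,
while a range that is already a multiple of the GRF size is unchanged.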
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29926>
This is inspired by the MR below, but done in a corrected way:
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26767
The requirements used to look up struct extensions are missing the alias
check for promoted extensions. This change adds that check so that the
condition is now correct.
We can land this now as all drivers have migrated to use the common
properties, which has now also been mandated.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29846>
We always emit multop+umul24 to implement integer multiply and
this is the only scenario in which we use multop, so if we decide
to DCE umul24 we should also DCE the previous multop.
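A simplified sketch of the pairing, assuming the multop sits
immediately before its umul24 in a flat instruction array. The enum,
struct and dce_paired_multop() are hypothetical and much simpler than
the compiler's real instruction representation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified instruction representation. */
enum op { OP_MULTOP, OP_UMUL24, OP_OTHER };

struct inst {
   enum op op;
   bool live;
};

/* For every umul24 already marked dead, also marks the multop right
 * before it as dead.  Returns the number of multop instructions removed.
 */
static size_t
dce_paired_multop(struct inst *insts, size_t count)
{
   size_t removed = 0;

   for (size_t i = 1; i < count; i++) {
      if (insts[i].op == OP_UMUL24 && !insts[i].live &&
          insts[i - 1].op == OP_MULTOP && insts[i - 1].live) {
         insts[i - 1].live = false;
         removed++;
      }
   }

   return removed;
}
```

A multop whose umul24 is still live is left untouched, since the
multiply result is still needed.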
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29909>