No games are using task shaders with DGC at the moment but this is
supposed to work.
This fixes test_amplification_shader_execute_indirect from vkd3d.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29935>
We need to make sure that the DGC ACE IB will wait for the DGC
prepare shader before the execution starts. When DGC is preprocessed
the synchronization is already correct.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29935>
When the DGC prepare shader is conditionally executed on the graphics
queue, the generated IBs might be uninitialized. It's fine for the
DGC GFX IB because the INDIRECT_PACKET would also be conditionally
skipped but it's not possible to do that for the DGC ACE IB
(ie. no IB2 on compute).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29935>
Looks like some of the tests uses the bias which does not fit into half
float parameter, so it's better to use float param for sample_b.
If we have cube arrays, we anyway combine BIAS and array index properly
so we don't have to worry about the first parameter.
This fixes: GTF-GL46.gtf21.GL3Tests.texture_lod_bias.texture_lod_bias_clamp_m_g_M
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29533>
This has no impact on the generated shaders, but does have a small
(positive) impact on the amount of time spent in shader compilation.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29126>
Following https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28957,
some Xe2 code paths started triggering asserts.
In the cases fixed by this patch, it was because of the assert added
to brw_type_larger_of() in cf8ed9925f ("intel/brw: Make a helper for
finding the largest of two types"), and then brw_type_larger_of() is
used in 674e89953f. (For example, the assert was triggering when the
SHL types differed between D and UD.)
Fixes: 674e89953f ("intel/brw: Use new builder helpers that allocate a VGRF destination")
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29925>
Function anv_physical_device_try_create() creates the devinfo variable
and then at some point it copies its contents to device->info:
device->info = devinfo;
Much much later we're calling:
intel_common_update_device_info(fd, &devinfo);
... which is updating devinfo but not device->info. As a consequence,
we're only creating one queue, as engine_class_supported_count[klass]
is zero for everybody.
Fixes: 5b8b4f7878 ("intel/dev: Add engine_class_supported_count to intel_device_info")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29927>
When the destination of both instructions is NULL and the conditional
modifier matches, operands_match (by way of instructions_match) will
only test the first two operands. This can result in bad CSE
happening.
This is a very, very narrow edge case.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29848>
This one is a bit more complex in that we need to handle 3-source
commutative opcodes. But it's also quite useful:
fossil-db results on Alchemist (A770):
Instrs: 151659750 -> 150164959 (-0.99%); split: -0.99%, +0.01%
Cycles: 12822686329 -> 12574996669 (-1.93%); split: -2.05%, +0.12%
Subgroup size: 7589608 -> 7589592 (-0.00%)
Send messages: 7375047 -> 7375053 (+0.00%); split: -0.00%, +0.00%
Loop count: 46313 -> 46315 (+0.00%); split: -0.01%, +0.01%
Spill count: 110184 -> 54670 (-50.38%); split: -50.79%, +0.41%
Fill count: 213724 -> 104802 (-50.96%); split: -51.43%, +0.47%
Scratch Memory Size: 9406464 -> 3375104 (-64.12%); split: -64.35%, +0.23%
Our older Shadow of the Tomb Raider fossil is particularly helped with
over a 90% reduction in scratch access (spills, fills, and scratch
size). However, benchmarking in the actual game shows no change in
performance. We're thinking the game's shaders have been updated since
our capture.
Ian noted that there was a bug here where we'd accidentally CSE two ADD3
instructions with null destinations and different src[2] that couldn't
be dead code eliminated due to conditional mods. However, this is only
a bug in the new cse_defs pass so we don't need to nominate this for
stable branches.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29848>
Fix defect reported by Coverity Scan.
Evaluation order violation (EVALUATION_ORDER)
write_write_typo: In src_idx = src_idx = binding_layout->desc_idx + i * desc_stride + subdesc_idx,
src_idx is written twice with the same value.
Fixes: 7bea6f8612 ("panvk: Overhaul the Bifrost descriptor set implementation")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29865>
This optimizes things by splitting the position and vertex
processing in two, allowing primitives to be discarded before
the varying shader is executed.
This optimization is even more important if we throw
layered rendering into the mix, because layered rendering on
Bifrost is implemented with N IDVS/fragment jobs (N being the
number of layers), with primitives not targetting a given
layer being artificially culled in the vertex shader by
issuing a position outside the render area.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29450>
Right now we assume the fragment job chain contains only one job, but
with multilayer/multiview rendering, we want to submit fragment jobs
for all layers at once.
Turn pan_emit_fragment_job() into pan_emit_fragment_job_payload() and
delegate the job header packing to the caller.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29450>
Right now, we always emit a framebuffer descriptor for the first layer
in the RT views. Extend the logic so we can emit one FBD per layer we're
supposed to render to without having to manually modify the RT views.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29450>
For task shaders, RADV will need to prepare two command buffers in the
DGC prepare shader. The preprocess buffer will be splitted in two
parts, one for GFX and one for ACE.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29814>