This optimization changed the rendered result for 11 pixels, all with
less than 1% change. Neither the old nor the new is obviously more
correct than the other, and the CTS is fine. So let's assume this change
is unproblematic, and accept the new result.
Fixes: 3d304d5647 ("nir/opt_algebraic: remove is_used_once on outer instruction")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40321>
This is required to make sure that conformant_trunc_coord is correctly
enabled/disabled. Otherwise, it might be disabled on GFX11 GPUs with
drm-shim.
Bumping the minor version shouldn't have any other effects.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40313>
Add a nightly job running Cuttlefish with Venus on Turnip.
Similar to the existing Venus-on-ANV jobs, this uses Cuttlefish's
'venus_guest_angle' mode to run deqp-vk and deqp-egl with ANGLE and
Venus inside the Android guest, with Turnip on the host.
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39993>
Useful for smaller/larger loads. Also there is no reason to be bitsize
specific here if we use an signed constant.
Foz-DB Navi48:
Totals from 8 (0.01% of 114655) affected shaders:
Instrs: 7629 -> 7612 (-0.22%)
CodeSize: 40772 -> 40692 (-0.20%)
Latency: 54880 -> 54944 (+0.12%)
InvThroughput: 8879 -> 8880 (+0.01%); split: -0.08%, +0.09%
VALU: 4029 -> 4027 (-0.05%); split: -0.15%, +0.10%
SALU: 1260 -> 1249 (-0.87%)
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40292>
From what I understand, use of const_index[] by the driver is
dangerous and should be avoided, as commits such as a6330ed4d0
("nir: add ACCESS to load_uniforms") may result in the indexes
changing, breaking the driver. Switch to using the parameter names in
order to make the code more future-proof.
For elk_fs_nir.cpp and elk_vec4_tes.cpp we can verify in the generated
nir_intrinsics.c that the wanted value is actually
nir_intrinsic_base().
For elk_nir.c, according to Caio Oliveira:
"The code is checking for certain load/store via the is_input() and
is_output() checks a few lines above. I've checked all them have
BASE at 0."
Thanks to Ian Romanick for his guidance regarding this patch.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39438>
This is new in SM75 (Turing). Let's use it because it allows us to get rid
of the if/else around bound checked global loads.
There are some changes in fossils, but it seems that's mostly due to CFG
optimizations doing things a bit differently?
Totals:
CodeSize: 9442152688 -> 9442133184 (-0.00%); split: -0.00%, +0.00%
Static cycle count: 6120910991 -> 6120907718 (-0.00%); split: -0.00%, +0.00%
Spills to reg: 184789 -> 184810 (+0.01%)
Fills from reg: 223831 -> 223860 (+0.01%); split: -0.00%, +0.01%
Totals from 334 (0.03% of 1163204) affected shaders:
CodeSize: 22020752 -> 22001248 (-0.09%); split: -0.10%, +0.01%
Static cycle count: 26582978 -> 26579705 (-0.01%); split: -0.01%, +0.00%
Spills to reg: 3110 -> 3131 (+0.68%)
Fills from reg: 3401 -> 3430 (+0.85%); split: -0.03%, +0.88%
Reviewed-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Acked-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40272>
legalize_ext_instr wasn't doing anything besides lowering uniform sources
and panicing on a bunch of Source types.
Having a common helper looping over all sources doesn't make much sense,
because all the instructions are widly different in regards to UGPRs. The
panics will be hit while emitting the sources as well, so this helper
provided little help and wasn't flexible enough for what we need.
Furthermore some instructions like LDG also take an additional input
predicate that legalize_ext_instr can't handle.
Reviewed-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Acked-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40272>
We'll start to lower load_global_bounded there and that will invalidate
loop analysis, because the amount of instructions will change within a
block.
Reviewed-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Acked-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40272>
Filter out video decode/encode format features when the DRM modifier
doesn't support video operations. Along with a CTS fix, this will fix
dEQP-VK.video.formats.* on UVD/VCN1 (which do not support swizzled
input).
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40203>
If multiview is enabled on the render pass, baseLayer and layerCount
will be 0 and 1 respectively and throw us off.
We can still fast clear if view_mask == 1, but anything else hits the
BLORP_BATCH_NO_EMIT_DEPTH_STENCIL restriction.
Fixes: e488773b29 ("anv: Fast clear depth/stencil surface in vkCmdClearAttachments")
Signed-off-by: Iván Briano <ivan.briano@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40229>
We can't guarantee that skipping the BVH build would let the BVH memory
all zero. So explicitly set it to zero when running things with
BVH_NO_BUILD option.
This will help us to narrow down isuse if it's in BVH encoding or
application shader. Leaving uninitialized blob of memory would hit
intermittent hangs and would lead us to nowhere.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40276>
INTEL_DEBUG=bat is no longer supported on release drivers, instead
using a stub decoder. Update stub decoder warning message to
mention this.
Signed-off-by: Felix DeGrood <felix.j.degrood@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40306>
This is not caused by the new kernel, these tests have occasionally
timed out over the last couple of weeks.
Running them single-threaded didn't help.
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40159>
With `pres->format == PIPE_FORMAT_L8_UNORM` and
`state->format == PIPE_FORMAT_L8_SRGB` the assert is triggered.
We should be comparing the linear version of `state->format` since
we're only concerned about the physical memory layout here.
Signed-off-by: Karmjit Mahil <karmjit.mahil@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31717>
With the slab, anv_device_lookup_bo() will have anv_bo::map = NULL
while the seen_bbos will not and we want a host pointer for decoding.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40294>
It is legal to have a cluster size larger than the subgroup/ballot size,
but our lowering would blow up in this case due to the nir_ishl_imm
overflowing in the lowering. Fortunately, this is easy to handle.
Fixes sub_group_clustered_reduce_logical_and()
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40224>