make sure we can fold the f2f away. alternatively f2fmp would work
here but details.
Elden Ring fossils:
Totals from 137 (4.27% of 3206) affected shaders:
Instrs: 485455 -> 484904 (-0.11%)
CodeSize: 3218638 -> 3215338 (-0.10%)
ALU: 308071 -> 307520 (-0.18%)
FSCIB: 308071 -> 307520 (-0.18%)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35909>
for drivers where we need to lower a base_workgroup_id but not global IDs.
rather than lowering the whole global ID to stick the base workgroup ID in
there, just add the workgroup offset to the final thread position.
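The arithmetic behind this can be sketched in one dimension. This is only an illustration, assuming "final thread position" means the global invocation ID; all function names below are hypothetical and none of this is the actual NIR lowering code.

```c
#include <stdint.h>

/* Hypothetical helper: the global ID as computed without any knowledge of
 * the base workgroup ID. */
static uint32_t
example_hw_global_id(uint32_t workgroup_id, uint32_t workgroup_size,
                     uint32_t local_id)
{
   return workgroup_id * workgroup_size + local_id;
}

/* Full lowering: rebuild the global ID with the base workgroup ID folded
 * into the workgroup ID. */
static uint32_t
example_global_id_full(uint32_t base_workgroup_id, uint32_t workgroup_id,
                       uint32_t workgroup_size, uint32_t local_id)
{
   return (base_workgroup_id + workgroup_id) * workgroup_size + local_id;
}

/* Cheaper form: keep the existing global ID and add the workgroup offset
 * (base workgroup ID scaled by the workgroup size) at the end. */
static uint32_t
example_global_id_offset(uint32_t base_workgroup_id, uint32_t workgroup_id,
                         uint32_t workgroup_size, uint32_t local_id)
{
   return example_hw_global_id(workgroup_id, workgroup_size, local_id) +
          base_workgroup_id * workgroup_size;
}
```

Both forms give the same result by distributivity; the second leaves the existing global ID computation untouched and only adds one offset.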
Elden Ring fossils:
Totals from 52 (1.62% of 3206) affected shaders:
Instrs: 48355 -> 48233 (-0.25%); split: -0.31%, +0.06%
CodeSize: 331912 -> 331148 (-0.23%); split: -0.28%, +0.05%
ALU: 30853 -> 30674 (-0.58%); split: -0.70%, +0.12%
FSCIB: 30853 -> 30674 (-0.58%); split: -0.70%, +0.12%
IC: 9054 -> 8958 (-1.06%)
GPRs: 4184 -> 4216 (+0.76%)
Uniforms: 6703 -> 6677 (-0.39%); split: -1.61%, +1.22%
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35909>
The Midgard compiler only deals with sized NIR types for image loads and
stores. Since we already have nir_get_nir_type_for_glsl_base_type()
which can provide us with the corresponding sized type, let's just use
that, and drop the extra table.
This fixes the following piglits on Mali-T760:
- spec/ext_texture_compression_s3tc/getteximage-targets 2d s3tc
- spec/ext_texture_compression_s3tc/getteximage-targets cube s3tc
Fixes: 9123ee0f18 ("st/mesa/pbo: Set src type on image_store")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35882>
For emulated multiplanar formats like PIPE_FORMAT_IYUV /
DRM_FORMAT_YUV420, resource->format is set to one of the
sub-formats, PIPE_FORMAT_R8_UNORM in this case. As this has only
one plane, users like `dri2_from_planar()` fail for queries
where plane > 0. This in turn breaks queries like
gbm_dri_bo_get_handle_for_plane(), which is commonly used
by Wayland compositors to offload client-supplied DMABufs
to KMS, using drmModeAddFB2WithModifiers().
Follow the example of most other drivers and return the number
of resources instead.
Signed-off-by: Robert Mader <robert.mader@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35915>
os_mkdir() is a simple wrapper around mkdir() or _mkdir().
Remove a bunch of unneeded #includes in dd_util.h. Tested by
compiling llvmpipe and radeonsi (the only user of the dd_util.h
header).
There are a few other mkdir() callsites that I haven't touched but
could be updated to os_mkdir().
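For illustration, a wrapper of this kind might look roughly like the sketch below. The name example_mkdir() and the 0777 mode are assumptions made for the example; this is not the actual os_mkdir() source.

```c
#ifdef _WIN32
#include <direct.h>
#else
#include <sys/stat.h>
#include <sys/types.h>
#endif

/* Hypothetical portable wrapper: returns 0 on success and -1 on failure,
 * like the underlying calls. */
static int
example_mkdir(const char *path)
{
#ifdef _WIN32
   /* The Windows CRT variant takes no permission argument. */
   return _mkdir(path);
#else
   /* The 0777 mode is an assumption; the process umask still applies. */
   return mkdir(path, 0777);
#endif
}
```

Callers then need no platform #ifdefs of their own, which is what makes the extra #includes in a header like dd_util.h unnecessary.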
Signed-off-by: Brian Paul <brian.paul@broadcom.com>
Tested-by: Mauro Rossi <issor.oruam@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35841>
SUSE Linux Enterprise 15 is still on g++ 7.5, so std::filesystem
does not exist. Just use mkdir() instead.
Signed-off-by: Brian Paul <brian.paul@broadcom.com>
Tested-by: Mauro Rossi <issor.oruam@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35841>
Wa_16018063123 is not a workaround that depends on stepping, so we
can use the INTEL_WA_16018063123_GFX_VER macro to reduce the code
generated for non-affected platforms.
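The general idea of a compile-time workaround predicate can be sketched as follows. Every name here is hypothetical, and the version threshold is purely illustrative; it says nothing about which platforms the real workaround covers or how the real macro is defined.

```c
/* Hypothetical per-generation build constant (assumption for this sketch). */
#ifndef EXAMPLE_GFX_VERx10
#define EXAMPLE_GFX_VERx10 120
#endif

/* Hypothetical compile-time predicate; the 125 threshold is illustrative. */
#define EXAMPLE_WA_GFX_VER (EXAMPLE_GFX_VERx10 >= 125)

static int example_emit_count;

/* Stand-in for the workaround's extra commands. */
static void
example_emit_dummy_blit(void)
{
   example_emit_count++;
}

static void
example_flush(void)
{
   /* The condition is a constant expression, so the compiler can drop the
    * call entirely when building for a generation that is not affected. */
   if (EXAMPLE_WA_GFX_VER)
      example_emit_dummy_blit();
}
```

A stepping-dependent workaround would instead need a runtime check against device info, which keeps the code in every build.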
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35700>
Wa_16018063123 is not a workaround that depends on stepping, so we
can use the INTEL_WA_16018063123_GFX_VER macro to reduce the code
generated for non-affected platforms.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35700>
Wa_16018063123 doesn't apply to the video engine. Also, the video
engine doesn't support XY_FAST_COLOR_BLT.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Fixes: ec43c20182 ("anv: implement dummy blit for Wa_16018063123")
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35700>
For 3D or GPGPU modes, the same render engine should be used. The CCS
register should only be used when using the compute engine.
Fixes: 46f5359238 ("anv: Invalidate aux map for copy/video engine")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35700>
In the following commits, as we split up a6xx.xml, the #include sequence
gets a bit more complicated. Let's keep it in one place.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35899>
This was added on the kernel side in commit 9d78f0250322
("drm/msm/a6xx+: Don't let IB_SIZE overflow"), but didn't
end up in mesa.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35899>
ac_estimate_size() triggers an assertion because the block size isn't
aligned to a power of two for ASTC formats.
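As a small illustration of why ASTC is special here, assuming "block size" refers to the block footprint in texels: ASTC footprints such as 5x4 have dimensions that are not powers of two, unlike the 4x4 blocks of other compressed formats. The helper name below is hypothetical.

```c
#include <stdbool.h>

/* Hypothetical helper: true if v is a non-zero power of two. */
static bool
example_is_pot(unsigned v)
{
   return v != 0 && (v & (v - 1)) == 0;
}
```

With this check, a block width of 4 passes while a block width of 5 does not, so code that assumes power-of-two block dimensions needs a separate path for such formats.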
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35879>
FEX is a 64-bit process potentially running an x86 (32-bit) binary, in
which case the automatic user MMIO offset detection doesn't work, so
let's explicitly set the user MMIO offset when we can.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Acked-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34573>
Enforce strictness while we still can. We can't enforce strictness
on AFBC(RGB) because the driver might be used by a compositor that
imports buffers from clients that were relying on the old behavior.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35762>
AFBC(YUV) was introduced after the stricter import rules. Let's
make them strict by default now, so we don't encourage exporters to pass
funky WSI pitch values.
We add a driconf option to allow relaxing this strictness on a per-app
basis.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35762>
Add all newly defined tracepoints, including meta, render,
dispatch/dispatch_indirect, barrier, and sync_wait.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32693>
Define tracepoints for these high-level API calls:
- meta
- render
- dispatch and dispatch_indirect
- barrier
and these low-level CS commands:
- sync_wait
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32693>
We can rely on panvk_per_arch(queue_check_status) to detect device lost.
Because we no longer emit cs_sync32_add from finish_cs to increment
debug syncobj, if an instr between the last draw/dispatch and
end-of-stream causes a CS error, the CS error is ignored. This is fine
because the instr should have no side effect and the kernel emits
ERROR_BARRIER to recover from the CS error.
If that is undesirable, we can restore the old behavior by emitting
cs_sync64_add from finish_cs to increment regular syncobj (and fix
cs_progress_seqno_reg) when PANVK_DEBUG is set.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35606>
If panvk_cs_sync64::error is non-zero at the end of a primary cmdbuf,
copy the error to panvk_cs_subqueue_context::last_error.
Update panvk_per_arch(queue_check_status) to check the last_error and
treat it as device lost if non-zero.
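The latch-and-check flow can be sketched on the CPU side as below. The structs are simplified, hypothetical stand-ins for the panvk ones named above, and the real copy is presumably emitted as command-stream instructions rather than executed as C.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the structures named above. */
struct example_sync64 {
   uint64_t seqno;
   uint32_t error;
};

struct example_subqueue_context {
   uint32_t last_error;
};

/* End of a primary command buffer: latch a non-zero error so it survives
 * after the sync object is reused. */
static void
example_end_primary(const struct example_sync64 *sync,
                    struct example_subqueue_context *ctx)
{
   if (sync->error != 0)
      ctx->last_error = sync->error;
}

/* Status check: any latched error on any subqueue means device lost. */
static bool
example_queue_is_lost(const struct example_subqueue_context *ctxs,
                      unsigned count)
{
   for (unsigned i = 0; i < count; i++) {
      if (ctxs[i].last_error != 0)
         return true;
   }
   return false;
}
```

Latching only non-zero values means a later, error-free command buffer cannot clear an earlier failure.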
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35606>
The chances of this happening are near zero with the way we do surface
ops today but I have seen it in the wild and this is apparently a rule.
The hardware throws an illegal instruction encoding if it sees 255.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35895>
As the lowering mentioned there got renamed twice:
commit b085016f94
Author: Rob Clark <robclark@freedesktop.org>
Date: Fri Mar 25 13:52:26 2016 -0400
nir: rename lower_outputs_to_temporaries -> lower_io_to_temporaries
Since it will gain support to lower inputs, give it a more generic name.
commit 1754507d49
Author: Marek Olšák <maraeo@gmail.com>
Date: Wed Jun 25 19:05:19 2025 -0400
nir: rename nir_lower_io_to_temporaries -> nir_lower_io_vars_to_temporaries
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35760>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35855>
This changes legacy GS outputs to use the same logic as NGG GS.
It enables the same optimizations that NGG has such as forwarding
constant GS output components to the GS copy shader at compile time.
ac_nir_gs_output_info is removed.
GS output info is no longer passed to ac_nir_lower_legacy_gs and
ac_nir_create_gs_copy_shader separately.
ac_nir_lower_legacy_gs now gathers ac_nir_prerast_out, generates GSVS ring
stores, and also generates the GS copy shader with GSVS ring loads.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35352>
This way we won't have to pass output info between the two functions.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35352>