Commit graph

215045 commits

Author SHA1 Message Date
Samuel Pitoiset
168a8d0b52 radv: fix RB+ for depth-only with unused attachments
When there are no color outputs in the rendering state, but color write
enable/write aren't masked out (which seems legal with
VK_EXT_dynamic_rendering_unused_attachments), the driver must emit
CB_DISABLE to disable CB rendering completely.

Otherwise, if there is also a depth/stencil attachment in the rendering
state, CB0 is always set to 32_R for RB+. That means, the pixel shader
would still export fragments but to the previously bound color
attachment.

VKCTS is missing coverage.

Fixes: 4580293ab2 ("radv: implement RB+ depth-only rendering for better perf")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14319
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38509>
2025-11-20 07:37:17 +00:00
Eric Engestrom
5db246a5ba perfetto: use the new upstream repo
The old one was abandoned without so much as a README note.

This will also allow using newer releases than 47; the current one being
53, but this MR doesn't address that, as it aims to be a simple no-op
change.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38525>
2025-11-20 07:07:42 +00:00
Yiwei Zhang
829bd406c0 venus: fix racy semaphore feedback counter update
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Previously, we update the sfb dst slot upon vn_SignalSemaphore so that
vn_GetSemaphoreCounterValue can poll just the feedback slot itself.
However, that can race with pending sfb cmds that are going to update
the slot value, ending up with stuck sync progression.

This change fixes it by disallowing vn_SignalSemaphore to touch the sfb
dst slot. To ensure counter query being monotonic, vn_GetSemaphoreCounterValue
now takes the greater of signaled counter and the sfb counter read.

Test with dEQP-VK.synchronization* group:
- w/o this: stuck shows up within 2 min with 8 parallel deqp runs
- with this: no stuck for multiple full runs of the same

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14304
Fixes: 5c7e60362c ("venus: enable timeline semaphore feedback")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38516>
2025-11-20 06:29:16 +00:00
Faith Ekstrand
0bd5734349 panvk: Set primitive_index_override when prim ID is written by IDVS
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Olivia Lee <olivia.lee@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38504>
2025-11-20 06:09:16 +00:00
Faith Ekstrand
333ca9133a pan/genxml: Rename Primitive Index Override
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Olivia Lee <olivia.lee@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38504>
2025-11-20 06:09:16 +00:00
Faith Ekstrand
652a5f41bf pan/bi: Add support for writing gl_PrimitiveID from IDVS
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Olivia Lee <olivia.lee@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38504>
2025-11-20 06:09:16 +00:00
Faith Ekstrand
5700c87db6 pan/bi: Add some helpers an an info field for needing the extended FIFO
The logic here is a bit scattered around and is about to get more
complicated.  This adds a helper which better documents the interactions
as well as an info field to make the driver's life easier.

Reviewed-by: Olivia Lee <olivia.lee@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38504>
2025-11-20 06:09:16 +00:00
Marek Olšák
9e339f4b32 nir: rename nir_lower_indirect_derefs -> nir_lower_indirect_derefs_to_if_else_trees
This describes better what it does.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38471>
2025-11-20 05:42:11 +00:00
Marek Olšák
22871fb8bd nir: for nir_shift_channels, fill undefined components with undef instead of .x
This potentially results in better code because we don't add def uses where
undef is allowed.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38468>
2025-11-20 04:26:55 +00:00
Marek Olšák
65837d8289 ac,radeonsi: remove gfx11 FW-based MCBP
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
It's too slow to be usable. User queues could replace it.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38338>
2025-11-20 03:31:47 +00:00
Sagar Ghuge
f0aad5bd7e anv: Convert indirect to direct dispatch
Saves unncessary PC and stall during encode phase.

Thanks to Felix for pointing out that CCS always needs a CS stall once
we add a pipe control, that will kill the performance for BVH
construction.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38513>
2025-11-20 03:11:55 +00:00
Felix DeGrood
15ffe6c524 anv/perfetto: include all pc reasons
Up to 4 reasons can be saved and displayed. Previously, we were
only displaying one reason for Perfetto.

Co-authored-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38500>
2025-11-20 02:53:53 +00:00
Connor Abbott
5ccbcf8a8b tu: Support softfloat64
DOOM Eternal uses fp64 without checking. Don't expose it, but as a
workaround lower the fp64 operations to software so that we don't choke
on them, similar to anv.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38088>
2025-11-20 02:14:50 +00:00
Connor Abbott
3b3954e2b8 util/glsl2spirv: Use better glslang flag for -Olib
--create-unlinked also creates entrypoints for the functions, and
obviates the need to create a dummy entrypoint. This is one step closer
to removing glsl2spirv and aligns us with other users of glslang.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38088>
2025-11-20 02:14:50 +00:00
Connor Abbott
9e3bc1f123 tu: Make softfloat shader compiled on demand
The vast majority of users will not need this, so do not pay the runtime
and memory cost of compiling the shader to NIR until it's needed.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38088>
2025-11-20 02:14:50 +00:00
Yonggang Luo
6400de124c docs: Update the minimal MSVC version requirements
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
As now __typeof__ is used in util/macros.h

The minimal required version is Visual Studio 17.9 or later, or cl.exe version 19.39.33428 or later.
According to https://learn.microsoft.com/en-us/cpp/c-language/typeof-c?view=msvc-170#requirements

Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38534>
2025-11-20 01:16:17 +00:00
Yonggang Luo
ca364a9551 ci: update image tags for windows container
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38534>
2025-11-20 01:16:17 +00:00
Yonggang Luo
11a775539f ci: MSVC 2019 is not support anymore, remove it.
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38534>
2025-11-20 01:16:17 +00:00
Yonggang Luo
84cce2a739 ci/microsoft: Downgrading WinFlexBison.win_flex_bison to version 2.5.24
This is achieved by provide version parameter to winget-cli

Fixes: 9592686ca0 ("ci/windows: Use winget to install packages and install Microsoft.WindowsWDK.10.0.26100")

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13968

Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38534>
2025-11-20 01:16:17 +00:00
Aitor Camacho
4b4061fa71 kk: Fix image to image copy
This change forces image->buffer->image copy path for pretty much
all the cases now.

Metal's image to image copy only allows same format and sample
count. Previously we were only taking the image->buffer->image
path for compressed formats. This just seemed to work, but we may
run into issues in the future. Metal does not report any
validation layer error.

Acked-by: Arcady Goldmints-Orlov <arcady@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38459>
2025-11-20 00:57:29 +00:00
Faith Ekstrand
6d9f563960 spirv: Assume variable workgroup size unless it's set
This fixes an issue a bunch of different components were all working
around themselves where sometimes we don't have a workgroup size but
workgroup_size_variable is false.  This also fixes asahi_clc, which
didn't have the workaround and was assuming zero (but not variable!)
workgroup sizes everywhere.

LoLed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Acked-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38538>
2025-11-20 00:02:42 +00:00
Mel Henning
80db8171de zink: Lock around screen_debug_marker_{begin,end}
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
vkQueueBeginDebugUtilsLabelEXT and vkQueueEndDebugUtilsLabelEXT
require queue to be externally synchronized, which means these functions
require the lock. Unfortunately, there's no guarantee that the debug
markers will be matched in the multithreaded case, but I suppose this is
better than crashing.

Fixes: 015eda4a41 ("zink: deduplicate VkDevice and VkInstance")
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38414>
2025-11-19 23:43:45 +00:00
Mel Henning
018178842e zink: Lock queue_lock in zink_destroy_screen
Fixes: 015eda4a41 ("zink: deduplicate VkDevice and VkInstance")
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38414>
2025-11-19 23:43:45 +00:00
Mel Henning
9acce36652 zink: Create one queue lock per device
We currently only create one queue per queue family on the device. The
device can be shared between multiple zink_screens, so having one lock
per screen can still lead to multiple locks per queue. Fix this by
allocating queue_lock along with the device.

This fixes an issue that was causing crashes with nvk+zink and
QtWebEngine with QTWEBENGINE_FORCE_USE_GBM=1 This can be reproduced by
resizing the window in either:

 * anki - https://apps.ankiweb.net/ or
 * Qt's simplebrowser example
   https://doc.qt.io/qt-6/qtwebengine-webenginewidgets-simplebrowser-example.html

which would then cause this dmesg error:

    nouveau 0000:01:00.0: anki[92007]: Failed to find syncobj (-> in): handle=40

along with a context loss.
With VK_LOADER_LAYERS_ENABLE=VK_LAYER_KHRONOS_validation we would additionally
get warnings like:

    Validation Error: [ UNASSIGNED-Threading-MultipleThreads-Write ] | MessageID = 0xa05b236e
    vkQueueSubmit(): THREADING ERROR : object of type VkQueue is simultaneously used in current thread 139824449189568 and thread 139823901816512
    Objects: 1
        [0] VkQueue 0x557a666783e0

Fixes: 015eda4a41 ("zink: deduplicate VkDevice and VkInstance")
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38414>
2025-11-19 23:43:44 +00:00
Mel Henning
dff1b9d4e9 zink: Make screen->queue_lock a pointer
Fixes: 015eda4a41 ("zink: deduplicate VkDevice and VkInstance")
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38414>
2025-11-19 23:43:44 +00:00
Mel Henning
f0dc8c0224 zink: Return zink_device in create_logical_device
Fixes: 015eda4a41 ("zink: deduplicate VkDevice and VkInstance")
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38414>
2025-11-19 23:43:43 +00:00
Aitor Camacho
abc719f01f kk: Add multiViewport and EXT_shader_viewport_index_layer support
Reviewed-by: Arcady Goldmints-Orlov <arcady@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38518>
2025-11-19 23:29:00 +00:00
Aitor Camacho
15f170e369 kk: Merge io type modifying passes into one
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38518>
2025-11-19 23:29:00 +00:00
Faith Ekstrand
cbd0c9eb3b panvk: Add a panvk_common_sysvals struct
For geometry shaders, we're going to need to compile various graphics
shaders down to compute shaders.  This means that they'll look like
compute shaders to much of the compile pipeline but ultimately get
executed as graphics shaders.  Most of the time, the compiler will just
happily take whatever offset you give and try to load the sysval from
there so you can load a graphics sysval from a compute shader just fine.
However, for the common ones, we switch on the shader stage and load
from a different offset for 3D vs. compute.  This breaks the moment you
have a compute shader that's going to actually load from a 3D sysval
space.

The solution here is to ensure that any common sysvals (currently just
the push uniforms address and the printf buffer) are at exactly the same
offset in both.  This is done by adding a panvk_common_sysvals struct,
some static asserts, and a bit of macro magic to keep things eurgonamic.
This also changes push uniform upload to just swap in the push uniform
address instead of writing it to the command buffer on every iteration.

Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38508>
2025-11-19 23:10:41 +00:00
Eric Engestrom
3ebabe9e43 docs/release-calendar: add 26.0 branchpoint and release candidates
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38539>
2025-11-19 23:04:46 +00:00
Lionel Landwerlin
6fe2035065 anv: bump maxTessellationControlTotalOutputComponents
Our backend compiler explains the limits as :

   32 bytes for the patch header (tessellation factors)
  480 bytes for per-patch varyings (a varying component is 4 bytes and
            gl_MaxTessPatchComponents = 120)
16384 bytes for per-vertex varyings (a varying component is 4 bytes,
            gl_MaxPatchVertices = 32 and
            gl_MaxTessControlOutputComponents = 128)

In all that's :
  * 32 patches * 128 components (counting tessellation factors)
  * 32 vertices * 128 components

8192 total components.

I'm not sure why the limit was set so low, maybe leftover from older platforms?

Bump the limit to something like competition.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38523>
2025-11-19 22:44:54 +00:00
Eric R. Smith
65ba14519e pan: fix a bifrost disassembly assert failure
We were overflowing an array during bifrost disassembly. This was
only a problem if the user explicitly set an environment variable,
so unlikely to occur in casual use, and also only could be triggered
in very specific, dense code. But we still should get this right!

The specific CTS test that caused the assert is:

'dEQP-VK.graphicsfuzz.stable-quicksort-for-loop-with-injection'

with environment variable `BIFROST_MESA_DEBUG=shaders`. One of the
shaders has a clause with 6 constants (the maximum) and this overflowed
the array because we assume we always have an extra slot (used for
modifier processing).

Cc: mesa-stable
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38501>
2025-11-19 22:10:21 +00:00
Dmitry Baryshkov
7a3bfd1f79 rocket: drop file names from the generated file
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Having file names and dates in the generated file affects
reproducibility. Build systems (like OE) error out on the gen_header.py
output, because it can contain full paths. Drop file list from the
generated file.

Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38528>
2025-11-19 16:27:32 +00:00
Dmitry Baryshkov
cdb6468c53 ethosu: drop file names from the generated file
Having file names and dates in the generated file affects
reproducibility. Build systems (like OE) error out on the gen_header.py
output, because it can contain full paths. Drop file list from the
generated file.

Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38528>
2025-11-19 16:27:32 +00:00
Hyunjun Ko
9a9342e4aa anv/video: handling segmentations features for vp9 decoding
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38418>
2025-11-19 15:54:47 +00:00
Hyunjun Ko
1479e1ef82 anv/video: rework for handling alternative quantizer for vp9 decoding.
including prep-work for handling segmentation features.

Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38418>
2025-11-19 15:54:47 +00:00
Danylo Piliaiev
8827123fef tu: Disable FLAG_WAIT_FOR_BR sync when CB is disabled
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Skip TU_CMD_FLAG_WAIT_FOR_BR wait whenever concurrent binning is disabled.
Without CB there is nothing to wait for, so the sync only adds overhead,
and in workloads with thousands of tiny renderpasses the cumulative overhead
becomes too big.

In one real-world workload I saw the following timings:
- 99.20 ms without disabling TU_CMD_FLAG_WAIT_FOR_BR
- 65.15 ms with TU_CMD_FLAG_WAIT_FOR_BR disabled
- 64.92 ms with TU_DEBUG=nocb

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38378>
2025-11-19 14:35:33 +00:00
Danylo Piliaiev
9370bdc61e tu: Disable by default CB running alongside renderpasses
Disable concurrent binning by default so regular renderpasses have access
to all vertex fetch resources. When a renderpass can actually enable CB,
walk back to the CB barrier at submission time and re-enable CB for all
patchpoints between CB barrier and the renderpass.
Because we expect at most one or two renderpasses with CB per frame,
the number of patches stays small.

The reduced vertex fetch resources resulted in up to 10% performance loss
seen in targeted benchmark and in a few game captures.

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38378>
2025-11-19 14:35:33 +00:00
Danylo Piliaiev
5d2b171886 tu/cs: Helpers to create a region that can be easily enabled/disabled
To mitigate CB perf impact we'd need to be able to eaily toggle CB
related IB regions.

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38378>
2025-11-19 14:35:32 +00:00
Danylo Piliaiev
a7f63a5dbb tu: Do not WAIT_FOR_BR if concurrent binning is disabled
The sync emitted on TU_CMD_FLAG_WAIT_FOR_BR didn't disable CB
when CB was previously disabled for the renderpass, this resulted
in less resources vertex processing resources available for BR.

We can just not emit the sync instead, since next time CB is enabled
it will force the sync.

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38378>
2025-11-19 14:35:32 +00:00
Danylo Piliaiev
f2fb8ad422 tu: Don't CONCURRENT_BIN_DISABLE when there is no depth image
We have to disable CB when lrz fast-clear is disabled, but if there
is no depth image at all, we can keep it enabled. This means that
RP without depth won't effectively be a CB barrier.

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38378>
2025-11-19 14:35:32 +00:00
Danylo Piliaiev
ee4f375bfd tu: Fix CB barrier description
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38378>
2025-11-19 14:35:32 +00:00
Janne Grunau
1f144081ec meson: Add asahi to aarch64's auto-generated drivers
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Since the Apple silicon M1 and M2 series of SoCs support only aarch64
split the lists for 'arm' and 'aarch64'.

Signed-off-by: Janne Grunau <j@jannau.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38341>
2025-11-19 11:16:53 +00:00
Georg Lehmann
fa66b670d4 aco/optimizer: reduce max alu_opt_info stack operands to 4
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
ALU instructions typically have a maximum of 3 operands, and even when combining
instructions, the peak count will not go above 4.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38150>
2025-11-19 10:51:43 +00:00
Georg Lehmann
4da74eed96 aco/tests: test packed fma opts
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38150>
2025-11-19 10:51:43 +00:00
Georg Lehmann
1f0293be0d aco/optimizer: use new helpers for packed fma
Foz-DB Navi48:
Totals from 374 (0.45% of 82419) affected shaders:
MaxWaves: 5476 -> 5480 (+0.07%)
Instrs: 2786653 -> 2784061 (-0.09%); split: -0.11%, +0.01%
CodeSize: 15163340 -> 15153460 (-0.07%); split: -0.08%, +0.01%
VGPRs: 46884 -> 46860 (-0.05%)
SpillVGPRs: 188 -> 189 (+0.53%)
Scratch: 3207936 -> 3208192 (+0.01%)
Latency: 27352681 -> 27350006 (-0.01%); split: -0.02%, +0.01%
InvThroughput: 5933554 -> 5932632 (-0.02%); split: -0.02%, +0.01%
VClause: 62355 -> 62359 (+0.01%); split: -0.03%, +0.04%
Copies: 290221 -> 289786 (-0.15%); split: -0.21%, +0.06%
Branches: 108566 -> 108569 (+0.00%); split: -0.01%, +0.01%
PreVGPRs: 40172 -> 40157 (-0.04%)
VALU: 1355753 -> 1353329 (-0.18%); split: -0.19%, +0.01%
SALU: 524836 -> 524831 (-0.00%); split: -0.01%, +0.01%
VMEM: 90948 -> 90950 (+0.00%)
VOPD: 10489 -> 10490 (+0.01%); split: +0.98%, -0.97%

Foz-DB Navi21:
Totals from 374 (0.45% of 82387) affected shaders:
MaxWaves: 4339 -> 4348 (+0.21%)
Instrs: 2255741 -> 2253554 (-0.10%); split: -0.10%, +0.00%
CodeSize: 12755276 -> 12744184 (-0.09%); split: -0.09%, +0.01%
VGPRs: 40376 -> 40352 (-0.06%)
Latency: 27357012 -> 27348737 (-0.03%); split: -0.07%, +0.04%
InvThroughput: 7213578 -> 7211136 (-0.03%); split: -0.07%, +0.04%
VClause: 62154 -> 62172 (+0.03%); split: -0.01%, +0.04%
Copies: 268204 -> 268048 (-0.06%); split: -0.22%, +0.16%
Branches: 107067 -> 107066 (-0.00%)
PreVGPRs: 37615 -> 37599 (-0.04%)
VALU: 1423326 -> 1421187 (-0.15%); split: -0.16%, +0.01%
SALU: 383388 -> 383390 (+0.00%); split: -0.00%, +0.00%

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38150>
2025-11-19 10:51:43 +00:00
Georg Lehmann
fec10ea3ea aco/optimizer: use new helpers for add16 opts
Foz-DB Navi48:
Totals from 164 (0.20% of 82419) affected shaders:
Instrs: 145304 -> 145335 (+0.02%); split: -0.00%, +0.02%
CodeSize: 794156 -> 794280 (+0.02%); split: -0.00%, +0.02%
Latency: 1884349 -> 1884227 (-0.01%); split: -0.01%, +0.00%
InvThroughput: 350403 -> 350393 (-0.00%)

Foz-DB Navi21:
Totals from 164 (0.20% of 82387) affected shaders:
Instrs: 117416 -> 117414 (-0.00%)
CodeSize: 673328 -> 673312 (-0.00%)
Latency: 1896952 -> 1897094 (+0.01%); split: -0.00%, +0.01%
InvThroughput: 638536 -> 638556 (+0.00%); split: -0.01%, +0.01%
Copies: 14579 -> 14577 (-0.01%)
VALU: 65895 -> 65893 (-0.00%)

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38150>
2025-11-19 10:51:42 +00:00
Georg Lehmann
e8f5b9374b aco/optimizer: use new helpers to optimize mul(b2f(a), b)
Foz-DB Navi48:
Totals from 979 (1.19% of 82419) affected shaders:
Instrs: 3630560 -> 3629463 (-0.03%); split: -0.03%, +0.00%
CodeSize: 19154176 -> 19147124 (-0.04%); split: -0.04%, +0.00%
Latency: 17700546 -> 17699505 (-0.01%); split: -0.01%, +0.01%
InvThroughput: 3143808 -> 3143254 (-0.02%); split: -0.02%, +0.01%
SClause: 76410 -> 76405 (-0.01%); split: -0.01%, +0.00%
Copies: 256544 -> 256554 (+0.00%); split: -0.02%, +0.02%
PreVGPRs: 40868 -> 40835 (-0.08%)
VALU: 2003291 -> 2002466 (-0.04%); split: -0.04%, +0.00%
SALU: 514000 -> 514006 (+0.00%)
VOPD: 3254 -> 3256 (+0.06%); split: +0.12%, -0.06%

Foz-DB Navi21:
Totals from 926 (1.12% of 82387) affected shaders:
MaxWaves: 21538 -> 21542 (+0.02%)
Instrs: 2984216 -> 2983187 (-0.03%); split: -0.04%, +0.00%
CodeSize: 16104112 -> 16097272 (-0.04%); split: -0.05%, +0.00%
VGPRs: 46864 -> 46848 (-0.03%)
Latency: 15678064 -> 15677099 (-0.01%); split: -0.01%, +0.00%
InvThroughput: 3779550 -> 3778230 (-0.03%); split: -0.04%, +0.01%
VClause: 81590 -> 81598 (+0.01%)
SClause: 70753 -> 70751 (-0.00%); split: -0.01%, +0.00%
Copies: 240446 -> 240466 (+0.01%); split: -0.01%, +0.02%
PreSGPRs: 51121 -> 51062 (-0.12%)
PreVGPRs: 38538 -> 38505 (-0.09%)
VALU: 1978847 -> 1977777 (-0.05%); split: -0.06%, +0.00%
SALU: 439184 -> 439212 (+0.01%)

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38150>
2025-11-19 10:51:42 +00:00
Georg Lehmann
f0e24284f5 aco/optimizer: create max3/min3/med3 with salu min/max
Foz-DB Navi48:
Totals from 175 (0.21% of 82419) affected shaders:
Instrs: 465863 -> 465260 (-0.13%); split: -0.13%, +0.00%
CodeSize: 2362264 -> 2360744 (-0.06%); split: -0.07%, +0.00%
Latency: 1548501 -> 1548371 (-0.01%); split: -0.01%, +0.00%
InvThroughput: 227683 -> 227630 (-0.02%); split: -0.08%, +0.06%
Copies: 33646 -> 33648 (+0.01%)
PreSGPRs: 9996 -> 10004 (+0.08%)
VALU: 175836 -> 175850 (+0.01%)
SALU: 122094 -> 121621 (-0.39%); split: -0.39%, +0.00%

Foz-DB Navi21:
Totals from 1 (0.00% of 82387) affected shaders:
InvThroughput: 74 -> 76 (+2.70%)
VALU: 57 -> 58 (+1.75%)
SALU: 61 -> 60 (-1.64%)

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38150>
2025-11-19 10:51:42 +00:00
Georg Lehmann
d21734e024 aco/optimizer: use new helper functions to create med3
Foz-DB Navi48:
Totals from 9659 (11.72% of 82419) affected shaders:
Instrs: 17301747 -> 17301735 (-0.00%); split: -0.00%, +0.00%
CodeSize: 93378108 -> 93378184 (+0.00%); split: -0.00%, +0.00%
Latency: 145441784 -> 145441791 (+0.00%); split: -0.00%, +0.00%
InvThroughput: 25768777 -> 25768778 (+0.00%)
Copies: 1370123 -> 1370124 (+0.00%)
VALU: 9705655 -> 9705656 (+0.00%)

Foz-DB Navi21:
Totals from 22 (0.03% of 82387) affected shaders:
Instrs: 27433 -> 27406 (-0.10%)
CodeSize: 146440 -> 146352 (-0.06%); split: -0.06%, +0.00%
Latency: 305857 -> 305806 (-0.02%); split: -0.02%, +0.00%
InvThroughput: 63634 -> 63580 (-0.08%)
VALU: 19109 -> 19082 (-0.14%)

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38150>
2025-11-19 10:51:42 +00:00