Commit graph

220765 commits

Author SHA1 Message Date
Valentine Burley
994ead31bd ci: Disable Collabora's farm due to network issues
Cambridge office has lost internet connection.

Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40824>
2026-04-07 14:30:35 +00:00
Mary Guillemard
6d700284ac nvk: Use SET_PRIMITIVE_TOPOLOGY instead of MME scratch
Instead of keeping track of the topology with some scratch value in MME,
we can rely on SET_PRIMITIVE_TOPOLOGY to directly set it.

This simplifies some of the MME codegen but does not seem to have any
impact on performance in general.

Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40749>
2026-04-07 14:11:16 +00:00
Valentine Burley
bbed00ac81 ci/freedreno: Move remaining lazor a618 jobs, retire device type
The sc7180-trogdor-lazor-limozeen devices have been dying off over the
past few weeks, so move the last two jobs to sc7180-trogdor-kingoftown
and retire the device type.

Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40818>
2026-04-07 11:55:59 +00:00
Natalie Vock
fded5e321d aco: Nuke ACO-side prolog selection
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40008>
2026-04-07 11:28:05 +00:00
Natalie Vock
afe519406b radv: Rewrite the RT prolog in NIR
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40008>
2026-04-07 11:28:05 +00:00
Natalie Vock
b53dc3f052 aco/lower_to_hw_instr: Run p_init_scratch if the program has a call
Callees may use scratch even if the caller doesn't.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40008>
2026-04-07 11:28:05 +00:00
Natalie Vock
378c9536de aco/isel: Fix stack_ptr synthesis
info.stack_ptr.is_reg is always true. We have a stack pointer to use
if and only if the program is a callee.

Also, apply_scratch_offset needs to be true in a few more places.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40008>
2026-04-07 11:28:05 +00:00
Natalie Vock
31e08322d7 aco/spill_preserved: Only compute preserved registers if in a callee
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40008>
2026-04-07 11:28:05 +00:00
Xianzhong Li
248b0b47b7 panfrost: Fix GEM handle refcount leak in panfrost_bo_import
panfrost_bo_import() calls drmPrimeFDToHandle() then pan_kmod_bo_import(),
which also calls drmPrimeFDToHandle() internally. This double import causes
GEM handle refcount leaks because each drmPrimeFDToHandle() increments the
kernel's GEM handle refcount, but only one drmCloseBufferHandle() is called
during cleanup by panfrost_kmod_bo_free() (or panthor_kmod_bo_free()).

Fix by removing the redundant drmPrimeFDToHandle() and using
pan_kmod_bo_import() directly. On re-import of existing buffers, properly
release the extra pan_kmod_bo reference with pan_kmod_bo_put().

This ensures GEM handle refcount, pan_kmod_bo refcount, and panfrost_bo
refcount are all properly balanced.

Fixes: 5089a758df ("panfrost: Back panfrost_bo with pan_kmod_bo object")

Signed-off-by: Xianzhong Li <xianzhong.li@nxp.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40778>
2026-04-07 11:06:34 +00:00
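The imbalance described in 248b0b47b7 can be illustrated with a toy model. This is only a sketch: `ToyGemTable`, `import_buggy`, `import_fixed` and `leaked_refs` are hypothetical names, and the class merely imitates the per-handle refcounting that the commit message attributes to drmPrimeFDToHandle() and drmCloseBufferHandle(). It does not call any real DRM API.

```python
# Toy model of kernel GEM handle refcounting; not the real DRM API.
class ToyGemTable:
    def __init__(self):
        self.refs = {}        # handle -> refcount
        self.by_buffer = {}   # underlying buffer id -> handle

    def prime_fd_to_handle(self, buffer_id):
        """Stand-in for drmPrimeFDToHandle(): same buffer, same handle, +1 ref."""
        handle = self.by_buffer.setdefault(buffer_id, len(self.by_buffer) + 1)
        self.refs[handle] = self.refs.get(handle, 0) + 1
        return handle

    def close_handle(self, handle):
        """Stand-in for drmCloseBufferHandle(): -1 ref, entry dropped at zero."""
        self.refs[handle] -= 1
        if self.refs[handle] == 0:
            del self.refs[handle]


def import_buggy(table, buffer_id):
    table.prime_fd_to_handle(buffer_id)         # redundant import in the caller
    return table.prime_fd_to_handle(buffer_id)  # import inside the kmod layer


def import_fixed(table, buffer_id):
    return table.prime_fd_to_handle(buffer_id)  # single import via the kmod layer


def leaked_refs(import_fn):
    """Import once, free once, and report how many references remain."""
    table = ToyGemTable()
    handle = import_fn(table, buffer_id="dmabuf-A")
    table.close_handle(handle)                  # the one close done at free time
    return table.refs.get(handle, 0)
```

In this model the double-import path leaves one reference behind after the single close, while the single-import path returns to zero.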
Natalie Vock
436acc321a radv: Disable RADV_DEBUG=llvm in release builds
The LLVM backend is unmaintained. Let's not encourage users to swap out
entire parts of the driver with an unsupported codepath. Enabling this
option is a footgun nowadays anyway, given that it disables many
features and thus may trigger bigger changes in behavior than intended.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40815>
2026-04-07 09:55:25 +00:00
Daniel Schürmann
58390ceb98 radv: increase limit for peephole_select in radv_optimize_nir_algebraic_early()
Totals from 4868 (2.40% of 202440) affected shaders: (Navi48)

MaxWaves: 128008 -> 128004 (-0.00%); split: +0.04%, -0.05%
Instrs: 10006725 -> 9978721 (-0.28%); split: -0.31%, +0.03%
CodeSize: 54085500 -> 54018184 (-0.12%); split: -0.19%, +0.07%
VGPRs: 299524 -> 299584 (+0.02%); split: -0.10%, +0.12%
SpillSGPRs: 8707 -> 8669 (-0.44%); split: -0.48%, +0.05%
Latency: 79101292 -> 79243875 (+0.18%); split: -0.55%, +0.73%
InvThroughput: 13645193 -> 13731338 (+0.63%); split: -0.08%, +0.71%
VClause: 181709 -> 181485 (-0.12%); split: -0.23%, +0.10%
SClause: 222587 -> 221191 (-0.63%); split: -1.26%, +0.63%
Copies: 708979 -> 690992 (-2.54%); split: -2.71%, +0.17%
Branches: 232868 -> 223146 (-4.17%)
PreSGPRs: 275370 -> 274818 (-0.20%); split: -0.25%, +0.05%
PreVGPRs: 238859 -> 238907 (+0.02%); split: -0.01%, +0.03%
VALU: 5291185 -> 5291617 (+0.01%); split: -0.08%, +0.09%
SALU: 1610496 -> 1604458 (-0.37%); split: -0.68%, +0.30%
VMEM: 303401 -> 303037 (-0.12%)
SMEM: 358335 -> 357964 (-0.10%)
VOPD: 377180 -> 376374 (-0.21%); split: +0.05%, -0.27%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40708>
2026-04-07 08:00:04 +00:00
Samuel Pitoiset
71b6db06e1 ac/nir: add descriptor heap support to opt_flip_if_for_mem_loads()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40702>
2026-04-07 06:15:24 +00:00
Samuel Pitoiset
1184610de4 ac/nir: add descriptor heap support to ac_nir_lower_image_tex()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40702>
2026-04-07 06:15:24 +00:00
Samuel Pitoiset
d2132ae011 ac/nir: adjust lowering of query size for descriptor heap
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40702>
2026-04-07 06:15:24 +00:00
Mel Henning
001de6d71b nak: Fix mufu's f16 bit on sm90+
Fixes multiple CTS tests on Blackwell, for example:
dEQP-VK.spirv_assembly.instruction.graphics.float16.arithmetic_2.opfdiv_tessc

Fixes: d031365f7c ("nak: support MUFU.F16")
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40804>
2026-04-07 05:10:16 +00:00
Faith Ekstrand
0d5cae97b7 pan/bi: Vectorize 8-bit ops up to v4i8
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40720>
2026-04-06 21:39:25 +00:00
Faith Ekstrand
15d5675e8e pan/bi: Pack 8-bit vec2s
We used to splat out 8-bit vec2s to 16-bit by repeating both 8-bit
halves twice with the B0011 swizzle.  I think the original idea here was
that 16-bit swizzles were more widely available in the hardware and that
this would make swizzling things easier.  The problem is that nothing
actually knows that the value is half-repeated like this so nothing
knows it can upgrade a swizzle from B0022 to B0123 (H01).  So instead we
get a bunch of B0022 swizzles, which nothing supports.

We can shave a lot of instructions if we just stop trying to be so
clever and instead repeat the whole thing with a B0101 swizzle.

The only real issue here is that v2[fiu]8_to_v2[fiu]16 needs a B0011
swizzle, which we have to apply on-the-fly.  Fortunately, any swizzle
can be composed with B0011.

Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40720>
2026-04-06 21:39:25 +00:00
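The swizzle arithmetic behind 15d5675e8e can be sketched on plain byte indices. This is a sketch under assumptions: a swizzle is modelled as a tuple of four source-byte indices (so B0011 is `(0, 0, 1, 1)`), and `apply_swizzle` and `compose` are hypothetical helpers. They are not the driver's bi_try_compose_swizzles() and say nothing about which results the hardware can encode.

```python
# Byte swizzles as tuples: entry i names the source byte that lands in byte i.
B0123 = (0, 1, 2, 3)   # identity (H01)
B0011 = (0, 0, 1, 1)   # widen two bytes so each fills a 16-bit half
B0101 = (0, 1, 0, 1)   # repeat the low 16 bits in both halves


def apply_swizzle(swz, word):
    """Return the 4-byte word produced by reading `word` through `swz`."""
    return tuple(word[i] for i in swz)


def compose(outer, inner):
    """Single swizzle equivalent to applying `inner` first, then `outer`."""
    return tuple(inner[i] for i in outer)


word = ("a", "b", "c", "d")
already_swizzled = (2, 3, 0, 1)            # some swizzle already on the source
combined = compose(B0011, already_swizzled)
```

Here `combined` is `(2, 2, 3, 3)`: index-wise, B0011 can always be folded into whatever swizzle the source already carries, which is the property the commit relies on when converting v2x8 to v2x16 on the fly.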
Faith Ekstrand
db8cb73b34 pan/bi: Add bytewise copy propagation
This adds a new bytewise copy propagation pass which chews through MKVEC
and SWZ instructions.  The word-based copy propagation pass only existed
to chew through SPLIT/COLLECT but MKVEC is COLLECT for bytes and we had
nothing to help with that.

This is actually two passes in one: Byte propagation and swizzle
propagation. Any time we see a MKVEC, we look at its sources only as
bytes and chase individual bytes back, through other MKVEC and SWZ, to
their generating instruction and make the MKVEC only consume the
original bytes.  If the MKVEC happens to construct something that's just
a swizzle of another def (this is fairly common), we record that as
well. The idea here is that a lot of MKVEC just consume other MKVEC and
we can get rid of the intermediate ones or even the whole chain if it
just ends up being a swizzle in the end.

For SWZ instructions, we first look at them like a MKVEC of the
individual bytes they consume.  If that doesn't yield a single swizzled
word, we then crawl through the words table, just accumulating swizzles.
This gives us the best (closest to the generating instructions) coherent
word.  We could also replace SWZ with MKVEC and just do byte propagation
but MKVEC is often 2 instructions whereas SWZ is often one (or folded
into a source) so this is probably the better balance.

Finally, we not only replace the MKVEC and SWZ instructions but we also
attempt to propagate swizzles into individual ALU op sources.  For v4i8
ops, this often fails since the full generality isn't always available
but for fp16, we can almost always fold the swizzle into the consuming
instruction.

Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40720>
2026-04-06 21:39:25 +00:00
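The byte-chasing idea in db8cb73b34 can be sketched as follows. This is a simplified model, not the pass itself: the `defs` table, the tuple encoding of instructions, and the `chase_byte` and `propagate` names are all assumptions made for illustration, and the word-table fallback and ALU source folding described in the commit are left out.

```python
# Each def is ("mkvec", [(src, byte), ...]) or ("swz", src, swizzle) or ("alu",).
def chase_byte(defs, name, byte):
    """Follow one byte back through MKVEC/SWZ to its generating instruction."""
    while True:
        instr = defs[name]
        if instr[0] == "mkvec":
            name, byte = instr[1][byte]
        elif instr[0] == "swz":
            name, byte = instr[1], instr[2][byte]
        else:
            return name, byte


def propagate(defs, name):
    """Rewrite a MKVEC/SWZ in terms of original bytes; collapse to a swizzle if possible."""
    sources = [chase_byte(defs, name, b) for b in range(4)]
    roots = {src for src, _ in sources}
    if len(roots) == 1:
        # Every byte comes from one def, so the whole chain is just a swizzle.
        return ("swz", roots.pop(), tuple(b for _, b in sources))
    return ("mkvec", sources)


defs = {
    "x": ("alu",),
    "y": ("alu",),
    "t0": ("mkvec", [("x", 0), ("x", 1), ("y", 0), ("y", 1)]),
    "t1": ("swz", "t0", (2, 3, 0, 1)),
    "t2": ("mkvec", [("t1", 2), ("t1", 3), ("t0", 0), ("t0", 1)]),
}
```

In this example `t2` is built from two intermediate defs, yet all of its bytes trace back to `x`, so it collapses to a single swizzle of `x` and the intermediate MKVEC and SWZ are no longer needed for it.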
Faith Ekstrand
a4e9002660 pan/bi: Emit MKVEC directly
Now that we have bi_lower_mkvec_swz(), there's no need to be so careful
in the NIR -> bi translation.  We can just emit MKVEC and move on.  The
lowering pass will sort out the details.

Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40720>
2026-04-06 21:39:25 +00:00
Faith Ekstrand
b9e33c7897 pan/bi: Stop lowering swizzles on mkvec and swz
The new lowering can handle all the swizzle cases and is generally
better at it than swizzle lowering.

Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40720>
2026-04-06 21:39:25 +00:00
Faith Ekstrand
ed83d46d4e pan/bi: Always use SWZ.v4i8 in bi_lower_swizzle()
Now that we lower it, there's no advantage to one over the other at the
time this pass runs.  Also, the is_8bit check was technically wrong
since it checks destination sizes, not source sizes.  It's a lot safer
to just use SWZ.v4i8 and let the lowering pass do the right thing.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40720>
2026-04-06 21:39:25 +00:00
Faith Ekstrand
bc7053a976 pan/bi: Add a lowering pass for MKVEC and SWZ
Instead of trying very carefully in the bifrost emit code to only
generate valid MKVEC for the target hardware, this adds a lowering pass
which is capable of lowering any MKVEC or SWZ we can throw at it.  Even
if the swizzle isn't supported or if it's a MKVEC.v4i8 on Valhall, we'll
lower it to something that does work on that platform.  This frees up
the rest of the compiler so we can add and modify MKVEC and SWZ at-will
and never have to worry about hardware generation details.

Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40720>
2026-04-06 21:39:24 +00:00
Faith Ekstrand
0edceaf383 pan/bi: Add a bi_op_supports_swizzle() helper
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40720>
2026-04-06 21:39:24 +00:00
Faith Ekstrand
a8879daf9c pan/bi: Add a bi_try_compose_swizzles() helper
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40720>
2026-04-06 21:39:24 +00:00
Faith Ekstrand
3b728cb613 pan/bi: Add a bi_swizzle_from_byte_channels() helper
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40720>
2026-04-06 21:39:24 +00:00
Faith Ekstrand
4912bda122 pan/bi: Return void from bi_swizzle_to_byte_channels()
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40720>
2026-04-06 21:39:24 +00:00
Faith Ekstrand
e637130794 pan/bi: Use bi_half() for texture MS indices
It feeds into a v2i16 so it needs to be 16-bit.

Fixes: ae79f6765a ("pan/bi: Emit Valhall texture instructions")
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40720>
2026-04-06 21:39:23 +00:00
Faith Ekstrand
77f9cbd0c2 pan/bi: Compose swizzles in bi_half() and bi_byte()
At least bi_half() has the decency to assert if the swizzle isn't
BI_SWIZZLE_H01 to start with but bi_byte() did an irrelevant assert
and then overwrote the swizzle with BI_SWIZZLE_B<lane> regardless of
what was there before.  In a lot of cases, this doesn't matter but we
use both in translating NIR to BI on things that may have already been
swizzled so we need to do the composition.

Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40720>
2026-04-06 21:39:23 +00:00
Faith Ekstrand
342e9ac7e8 pan/bi: Add a bi_swizzle_from_half() helper
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40720>
2026-04-06 21:39:23 +00:00
Faith Ekstrand
05c5e52054 pan/bi/ra: Allow offsets on tied sources
The only real requirement here is that the destination offset is zero
and that the destination is big enough to hold the source.  The source
offset doesn't matter.

Fixes: bc17288697 ("pan/bi: Lower split/collect before RA")
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40720>
2026-04-06 21:39:23 +00:00
Faith Ekstrand
538b5c411e pan/bi: Delete a few instruction encodings
The non-trivial non-replicate swizzles on IADD.v4x8 and ISUB.v4x8 are
either documented wrong or broken in hardware.  Instead of swizzling
b0101 and b2323, they swizzle b0011 and b2233 on G52.  This is either a
hardware bug or an issue with documentation.  In either case, it's
probably best not to trust it.  Those swizzles aren't all that useful
anyway.  We also weren't using any of them before (or they'd have
broken) so this isn't a performance regression.

Cc: mesa-stable
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40720>
2026-04-06 21:39:23 +00:00
Faith Ekstrand
3fffcf4338 pan/bi: Support more swizzle aliases in the bifrost pack code
Fixes: 82328a5245 ("pan/bi: Generate instruction packer for new IR")
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40720>
2026-04-06 21:39:23 +00:00
Karol Herbst
f015600c89 docs: add AI disclosure requirements
Reviewed-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Martin Roukala <martin.roukala@mupuf.org>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenz.ca>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40617>
2026-04-06 21:34:11 +00:00
Karol Herbst
90d3ddfc80 docs: clarify the use of autonomously acting tooling
Reviewed-by: Martin Roukala <martin.roukala@mupuf.org>
Reviewed-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenz.ca>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40617>
2026-04-06 21:34:11 +00:00
Silvio Vilerino
9f3b3f039f mediafoundation: Remove unnecessary staging variable in ProcessSliceBitstreamZeroCopy
Reviewed-by: Pohsiang (John) Hsu <pohhsu@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40805>
2026-04-06 21:19:05 +00:00
Silvio Vilerino
7ae2fe285f mediafoundation: Pre-create all MFSamples to avoid per slice COM allocation in the hot loop
Reviewed-by: Pohsiang (John) Hsu <pohhsu@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40805>
2026-04-06 21:19:05 +00:00
Silvio Vilerino
3467763ab5 mediafoundation: Prefetch the slice fence handles before the waits
Reviewed-by: Pohsiang (John) Hsu <pohhsu@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40805>
2026-04-06 21:19:04 +00:00
Sagar Ghuge
f0ae58df12 intel/compiler: Handle TerminateOnFirstHit in ray query execution
Once an AABB or triangle intersection has been found and committed,
terminate the traversal if the TerminateOnFirstHit ray flag is present.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40773>
2026-04-06 10:00:05 -07:00
Gurchetan Singh
c4cecd9d19 gfxstream: cereal: fix 'None' in gfxstream codegen
Commit e27e41a842 ("vulkan,spirv: update headers") exposed a
flaw in the cerealgenerator.

It modified -- among other things -- the VkDeviceCreateInfo
struct in vk.xml.

In the update, the len="enabledLayerCount,null-terminated"
attribute was removed from the ppEnabledLayerNames member.

When the gfxstream code generator processes ppEnabledLayerNames
(which is a const char* const*), it identifies it as an "array of
strings". However, because the len attribute is now missing,
vulkanType.getLengthExpression() returns None.

This leads to errors like:

gfxstream_guest_vk_autogen_impl/gen/goldfish_vk_counting_guest.cpp:642:30:
error: use of undeclared identifier 'None'

     642 |     for (uint32_t i = 0; i < None; ++i)
         |                              ^~~~
   1 error generated.

This patch adds various length access checks to prevent this from
happening.

TEST=m vulkan.ranchu

Reviewed-by: David Gilhooley <djgilhooley@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40785>
2026-04-06 08:19:04 -07:00
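The failure mode in c4cecd9d19 is easy to reproduce in isolation: formatting a Python None into generated C++ yields the literal identifier `None`. The sketch below shows that, plus one possible guard. Both function names are hypothetical, `length_expr` only models the value returned by vulkanType.getLengthExpression(), and skipping the loop is an assumption; the actual patch may handle a missing length differently.

```python
def emit_unchecked(member, length_expr):
    """The failure mode: None is formatted straight into the generated source."""
    return [f"for (uint32_t i = 0; i < {length_expr}; ++i)"]


def emit_string_array_loop(member, length_expr):
    """Return C++ lines that walk an array of strings, or [] if the length is unknown."""
    if length_expr is None:
        # Assumption: skip the loop. The real patch may do something else here.
        return []
    return [
        f"for (uint32_t i = 0; i < {length_expr}; ++i)",
        "{",
        f"    /* process {member}[i] */",
        "}",
    ]
```

With `length_expr=None`, `emit_unchecked` produces the same `i < None` loop header seen in the compiler error, while the guarded version emits nothing.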
Dhruv Mark Collins
77835f6c21 zink+turnip/ci: Add failures uncovered by new autotune
These failures turned out to be triggered by the new autotune
causing rendering mode transitions (such as GMEM -> SYSMEM). The
affected tests tend to work as expected when either GMEM or SYSMEM
is forced for all RPs, but the specific transitions caused by the
autotuner lead them to fail.

Signed-off-by: Dhruv Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37802>
2026-04-06 14:19:51 +00:00
Dhruv Mark Collins
152b9c8db3 freedreno/fdperf: Detect when counter values are invalid
The latency-sensitive autotuner uses two CP counters, which affects
the operation of fdperf. This detects when counters have selectors
that have been changed and marks them as invalid with corresponding
UI cues.

This also seems to detect selector values being dropped while the
GPU is in sleep states and tends to be useful to catch that too.

Signed-off-by: Dhruv Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37802>
2026-04-06 14:19:29 +00:00
Dhruv Mark Collins
da089bf741 tu/autotune: Only lock RPs sustain certain mode for 30s
Many games have short periods where a certain mode might win
consistently but this trend doesn't hold after that. Only allowing
locking to occur on RPs where a certain mode consistently stays
winning for 30s allows us to partially mitigate these bad locks.

Signed-off-by: Dhruv Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37802>
2026-04-06 14:19:29 +00:00
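The 30-second rule in da089bf741 can be sketched as a small state tracker. This is an illustration only: `LockTracker`, its fields, and the choice to restart the timer whenever the winning mode changes are assumptions, not the driver's actual bookkeeping.

```python
LOCK_AFTER_SECONDS = 30.0  # from the commit message


class LockTracker:
    """Lock a render pass to a mode only after it has won for 30s in a row."""

    def __init__(self):
        self.winner = None   # mode currently on a winning streak
        self.since = None    # time at which that streak started
        self.locked = None   # mode the RP is locked to, if any

    def observe(self, now, winner):
        if self.locked is not None:
            return self.locked
        if winner != self.winner:
            # Streak broken: restart the clock for the new winner.
            self.winner, self.since = winner, now
        elif now - self.since >= LOCK_AFTER_SECONDS:
            self.locked = winner
        return self.locked


tracker = LockTracker()
history = [tracker.observe(t, mode) for t, mode in [
    (0, "GMEM"), (20, "GMEM"), (25, "SYSMEM"), (50, "SYSMEM"), (55, "SYSMEM"),
]]
```

The short GMEM streak at the start never reaches 30 seconds, so no lock happens until SYSMEM has been winning continuously from t=25 to t=55.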
Dhruv Mark Collins
c725f2aea3 tu/autotune: Allow 99% max probability in profiled mode
The maximum probability was limited to 95% earlier due to the step
delta of 5% (95+5=100% which we wanted to avoid). This introduces a
new slower step delta of 1% above 95%, up to 99%, which is
significantly better at eliminating the performance loss or stuttering
that occurs when there is a large difference between the modes.

Signed-off-by: Dhruv Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37802>
2026-04-06 14:19:29 +00:00
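The two-rate stepping from c725f2aea3 can be written down directly. A minimal sketch, assuming the probability is tracked as an integer percentage; `step_up` is a hypothetical name and the driver's real representation may differ.

```python
COARSE_STEP, COARSE_LIMIT = 5, 95   # 5% steps up to 95%
FINE_STEP, FINE_LIMIT = 1, 99       # then 1% steps up to 99%


def step_up(prob_percent):
    """Raise the probability by one step without ever reaching 100%."""
    if prob_percent < COARSE_LIMIT:
        return min(prob_percent + COARSE_STEP, COARSE_LIMIT)
    return min(prob_percent + FINE_STEP, FINE_LIMIT)


# Walk up from 85% until the value stops changing.
trace = [85]
while step_up(trace[-1]) != trace[-1]:
    trace.append(step_up(trace[-1]))
```

The trace climbs in 5% steps to 95%, then in 1% steps, and stops at 99%, so the less-favoured mode always keeps a small chance of being sampled.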
Dhruv Mark Collins
3b3ae477f3 tu/autotune: Add render mode locking to PROFILED algorithm
There are certain scenarios where even switching to another render
mode has significant negative implications for performance even
when done for a single invocation. Now we try to heuristically
pick out these cases and lock them into the optimal mode. At the
moment the heuristic is fairly conservative, but it manages to lock
RPs in under a minute in most cases.

Signed-off-by: Dhruv Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37802>
2026-04-06 14:19:29 +00:00
Dhruv Mark Collins
3002d77dfd tu+util: Prefer SYSMEM for DXVK/VKD3D
PC games tend to almost always run far better in SYSMEM due to the
high FS complexity, and so preferring SYSMEM tends to be a winning
policy until profiled mode reaches a state where it can surpass it.

Signed-off-by: Dhruv Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37802>
2026-04-06 14:19:29 +00:00
Dhruv Mark Collins
ed643d1766 tu+util: Allow setting autotune mode from driconf
Allows for setting an override for the default autotune mode using
driconf, allowing for setting policy on a per-app basis.

Signed-off-by: Dhruv Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37802>
2026-04-06 14:19:29 +00:00
Dhruv Mark Collins
180c0de746 tu/autotune: Add prefer SYSMEM/GMEM mode
Certain games tend to use rendering patterns that strongly prefer
one mode over the other, and thus we're better off not bothering
with profiling them.

Signed-off-by: Dhruv Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37802>
2026-04-06 14:19:29 +00:00
Dhruv Mark Collins
3fcec4762f tu/autotune: Add "Preempt Optimize" mode
This introduces a new option that makes autotune optimize for low
preemption latency which is crucial to ensure responsiveness on
systems with GPU-based composition. A large enough draw can entirely
block the compositor from running with draw-level preemption. This can
be mitigated by preferring GMEM, which breaks up the draw into smaller
pieces and generally has a lower preemption latency.

As a further mitigation, tiles in GMEM are then divided into smaller
and smaller pieces which lowers the non-preemptible duration. There
are static checks in place to avoid doing this when it would incur a
cost that is too large.

Uses performance counters read during ambles to detect preemption
latency events while rendering in SYSMEM. This approach is superior
to using RBBM draw time thresholds which could be imprecise as only
the average was calculated rather than true maximum draw time.

However, converting the preemption latency performance counter value
from CP ticks to wall clock is based on the average GPU frequency of
the whole period from the start of the RP until the switch-away amble
while the preemption latency starts counting from the request. Thus, if
the GPU frequency shifts rapidly throughout the RP, it may cause the
estimated wall clock time to be inaccurate, but it should be good enough
in the vast majority of cases.

Signed-off-by: Dhruv Mark Collins <mark@igalia.com>
Co-authored-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37802>
2026-04-06 14:19:29 +00:00
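The tick-to-wall-clock caveat in 3fcec4762f can be made concrete with a small calculation. This is a sketch under assumptions: `estimate_latency_us` is a hypothetical helper, the average frequency is taken as total ticks divided by the wall-clock length of the period, and the numbers are invented for illustration.

```python
def estimate_latency_us(latency_ticks, period_ticks, period_us):
    """Convert latency ticks to microseconds using the period's average frequency."""
    avg_ticks_per_us = period_ticks / period_us
    return latency_ticks / avg_ticks_per_us


# Steady clock: 2000 us at 500 ticks/us, with 500 us of real preemption latency.
steady = estimate_latency_us(latency_ticks=250_000,
                             period_ticks=1_000_000, period_us=2000)

# Shifting clock: 1000 us at 200 ticks/us, then 1000 us at 800 ticks/us.
# The 500 us of real latency falls entirely in the fast half (400000 ticks).
shifting = estimate_latency_us(latency_ticks=400_000,
                               period_ticks=200_000 + 800_000, period_us=2000)
```

With a steady clock the estimate matches the real 500 us. When the frequency shifts mid-period the same real latency is estimated as 800 us, which is the kind of inaccuracy the commit message accepts as rare.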
Dhruv Mark Collins
bf2777c013 tu/autotune: Disable autotuning for small renderpasses by default
Tuning these small renderpasses is difficult due to their high
variability across command buffers and low impact on overall performance
in most cases. This change disables autotuning for renderpasses with 5
or fewer draw calls unless the TUNE_SMALL modifier flag is explicitly
set.

Signed-off-by: Dhruv Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37802>
2026-04-06 14:19:29 +00:00
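The threshold in bf2777c013 amounts to a one-line predicate. A minimal sketch: `should_autotune` and the `tune_small` parameter are hypothetical stand-ins for however the driver checks the TUNE_SMALL modifier flag.

```python
SMALL_RP_MAX_DRAWS = 5  # "5 or fewer draw calls" per the commit message


def should_autotune(draw_count, tune_small=False):
    """Skip small render passes unless TUNE_SMALL-style tuning is requested."""
    return tune_small or draw_count > SMALL_RP_MAX_DRAWS
```

A render pass with exactly 5 draws is skipped by default, one with 6 is tuned, and setting the flag re-enables tuning regardless of size.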
Dhruv Mark Collins
8e1fe9da20 tu/autotune: Prefer SYSMEM when only SW binning is possible
In cases where only SW binning is possible and where there would be
a performance impact from not using HW binning (i.e. > 2 tiles), it
is preferable to default to SYSMEM as the performance impact of
using GMEM is almost definitely not going to be worth it.

Signed-off-by: Dhruv Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37802>
2026-04-06 14:19:29 +00:00