fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-07 07:08:04 +02:00

Author	SHA1	Message	Date
Valentine Burley	bbed00ac81	ci/freedreno: Move remaining lazor a618 jobs, retire device type The sc7180-trogdor-lazor-limozeen devices have been dying off over the past few weeks, so move the last two jobs to sc7180-trogdor-kingoftown and retire the device type. Signed-off-by: Valentine Burley <valentine.burley@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40818>	2026-04-07 11:55:59 +00:00
Natalie Vock	fded5e321d	aco: Nuke ACO-side prolog selection Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40008>	2026-04-07 11:28:05 +00:00
Natalie Vock	afe519406b	radv: Rewrite the RT prolog in NIR Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40008>	2026-04-07 11:28:05 +00:00
Natalie Vock	b53dc3f052	aco/lower_to_hw_instr: Run p_init_scratch if the program has a call Callees may use scratch even if the caller doesn't. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40008>	2026-04-07 11:28:05 +00:00
Natalie Vock	378c9536de	aco/isel: Fix stack_ptr synthesis info.stack_ptr.is_reg is always true. We have a stack pointer to use if and only if the program is a callee. Also, apply_scratch_offset needs to be true in a few more places. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40008>	2026-04-07 11:28:05 +00:00
Natalie Vock	31e08322d7	aco/spill_preserved: Only compute preserved registers if in a callee Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40008>	2026-04-07 11:28:05 +00:00
Xianzhong Li	248b0b47b7	panfrost: Fix GEM handle refcount leak in panfrost_bo_import panfrost_bo_import() calls drmPrimeFDToHandle() then pan_kmod_bo_import(), which also calls drmPrimeFDToHandle() internally. This double import causes GEM handle refcount leaks because each drmPrimeFDToHandle() increments the kernel's GEM handle refcount, but only one drmCloseBufferHandle() is called during cleanup by panfrost_kmod_bo_free (or panthor_kmod_bo_free). Fix by removing the redundant drmPrimeFDToHandle() and using pan_kmod_bo_import() directly. On re-import of existing buffers, properly release the extra pan_kmod_bo reference with pan_kmod_bo_put(). This ensures GEM handle refcount, pan_kmod_bo refcount, and panfrost_bo refcount are all properly balanced. Fixes: `5089a758df` ("panfrost: Back panfrost_bo with pan_kmod_bo object") Signed-off-by: Xianzhong Li <xianzhong.li@nxp.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Eric R. Smith <eric.smith@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40778>	2026-04-07 11:06:34 +00:00
Natalie Vock	436acc321a	radv: Disable RADV_DEBUG=llvm in release builds The LLVM backend is unmaintained. Let's not encourage users to swap out entire parts of the driver with an unsupported codepath. Enabling this option is a footgun nowadays anyway, given that it disables many features and thus may trigger bigger changes in behavior than intended. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40815>	2026-04-07 09:55:25 +00:00
Daniel Schürmann	58390ceb98	radv: increase limit for peephole_select in radv_optimize_nir_algebraic_early() Totals from 4868 (2.40% of 202440) affected shaders: (Navi48) MaxWaves: 128008 -> 128004 (-0.00%); split: +0.04%, -0.05% Instrs: 10006725 -> 9978721 (-0.28%); split: -0.31%, +0.03% CodeSize: 54085500 -> 54018184 (-0.12%); split: -0.19%, +0.07% VGPRs: 299524 -> 299584 (+0.02%); split: -0.10%, +0.12% SpillSGPRs: 8707 -> 8669 (-0.44%); split: -0.48%, +0.05% Latency: 79101292 -> 79243875 (+0.18%); split: -0.55%, +0.73% InvThroughput: 13645193 -> 13731338 (+0.63%); split: -0.08%, +0.71% VClause: 181709 -> 181485 (-0.12%); split: -0.23%, +0.10% SClause: 222587 -> 221191 (-0.63%); split: -1.26%, +0.63% Copies: 708979 -> 690992 (-2.54%); split: -2.71%, +0.17% Branches: 232868 -> 223146 (-4.17%) PreSGPRs: 275370 -> 274818 (-0.20%); split: -0.25%, +0.05% PreVGPRs: 238859 -> 238907 (+0.02%); split: -0.01%, +0.03% VALU: 5291185 -> 5291617 (+0.01%); split: -0.08%, +0.09% SALU: 1610496 -> 1604458 (-0.37%); split: -0.68%, +0.30% VMEM: 303401 -> 303037 (-0.12%) SMEM: 358335 -> 357964 (-0.10%) VOPD: 377180 -> 376374 (-0.21%); split: +0.05%, -0.27% Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40708>	2026-04-07 08:00:04 +00:00
Samuel Pitoiset	71b6db06e1	ac/nir: add descriptor heap support to opt_flip_if_for_mem_loads() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40702>	2026-04-07 06:15:24 +00:00
Samuel Pitoiset	1184610de4	ac/nir: add descriptor heap support to ac_nir_lower_image_tex() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40702>	2026-04-07 06:15:24 +00:00
Samuel Pitoiset	d2132ae011	ac/nir: adjust lowering of query size for descriptor heap Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40702>	2026-04-07 06:15:24 +00:00
Mel Henning	001de6d71b	nak: Fix mufu's f16 bit on sm90+ Fixes multiple CTS tests on Blackwell, including e.g. dEQP-VK.spirv_assembly.instruction.graphics.float16.arithmetic_2.opfdiv_tessc Fixes: `d031365f7c` ("nak: support MUFU.F16") Reviewed-by: Karol Herbst <kherbst@redhat.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40804>	2026-04-07 05:10:16 +00:00
Faith Ekstrand	0d5cae97b7	pan/bi: Vectorize 8-bit ops up to v4i8 Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40720>	2026-04-06 21:39:25 +00:00
Faith Ekstrand	15d5675e8e	pan/bi: Pack 8-bit vec2s We used to splat out 8-bit vec2s to 16-bit by repeating both 8-bit halves twice with the B0011 swizzle. I think the original idea here was that 16-bit swizzles were more widely available in the hardware and that this would make swizzling things easier. The problem is that nothing actually knows that the value is half-repeated like this so nothing knows it can upgrade a swizzle from B0022 to B0123 (H01). So instead we get a bunch of B0022 swizzles, which nothing supports. We can shave a lot of instructions if we just stop trying to be so clever and instead repeat the whole thing with a B0101 swizzle. The only real issue here is that v2[fiu]8_to_v2[fiu]16 needs a B0011 swizzle, which we have to apply on-the-fly. Fortunately, any swizzle can be composed with B0011. Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40720>	2026-04-06 21:39:25 +00:00
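To illustrate why any swizzle can be composed with B0011, here is a minimal C sketch. It assumes a hypothetical representation in which a byte swizzle lists, for each destination byte, the source byte it reads (B0011 is {0, 0, 1, 1}, B0101 is {0, 1, 0, 1}); `byte_swizzle`, `apply_swizzle()` and `compose_swizzles()` are illustrative names, not the pan/bi API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: byte[i] names the source byte (0-3) read by destination byte i. */
typedef struct { uint8_t byte[4]; } byte_swizzle;

/* Apply a swizzle to a 32-bit word, with byte 0 as the least significant byte. */
static uint32_t
apply_swizzle(uint32_t word, byte_swizzle s)
{
   uint32_t out = 0;
   for (int i = 0; i < 4; i++) {
      assert(s.byte[i] < 4);
      uint32_t b = (word >> (8 * s.byte[i])) & 0xff;
      out |= b << (8 * i);
   }
   return out;
}

/* Compose two swizzles so that applying the result equals applying
 * `inner` first and then `outer`: result[i] = inner[outer[i]]. */
static byte_swizzle
compose_swizzles(byte_swizzle outer, byte_swizzle inner)
{
   byte_swizzle r;
   for (int i = 0; i < 4; i++)
      r.byte[i] = inner.byte[outer.byte[i]];
   return r;
}
```

With an 8-bit vec2 (a, b) packed as B0101, i.e. (a, b, a, b), applying B0011 yields (a, a, b, b), which is exactly what the single composed swizzle produces from the original value. Because composition only indexes one table with another, the B0011 needed by v2[fiu]8_to_v2[fiu]16 can be folded into whatever swizzle the source already carries.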
Faith Ekstrand	db8cb73b34	pan/bi: Add bytewise copy propagation This adds a new bytewise copy propagation pass which chews through MKVEC and SWZ instructions. The word-based copy propagation pass only existed to chew through SPLIT/COLLECT but MKVEC is COLLECT for bytes and we had nothing to help with that. This is actually two passes in one: Byte propagation and swizzle propagation. Any time we see a MKVEC, we look at its sources only as bytes and chase individual bytes back, through other MKVEC and SWZ, to their generating instruction and make the MKVEC only consume the original bytes. If the MKVEC happens to construct something that's just a swizzle of another def (this is fairly common), we record that as well. The idea here is that a lot of MKVEC just consume other MKVEC and we can get rid of the intermediate ones or even the whole chain if it just ends up being a swizzle in the end. For SWZ instructions, we first look at them like a MKVEC of the individual bytes they consume. If that doesn't yield a single swizzled word, we then crawl through the words table, just accumulating swizzles. This gives us the best (closest to the generating instructions) coherent word. We could also replace SWZ with MKVEC and just do byte propagation but MKVEC is often 2 instructions whereas SWZ is often one (or folded into a source) so this is probably the better balance. Finally, we not only replace the MKVEC and SWZ instructions but we also attempt to propagate swizzles into individual ALU op sources. For v4i8 ops, this often fails since the full generality isn't always available but for fp16, we can almost always fold the swizzle into the consuming instruction. Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40720>	2026-04-06 21:39:25 +00:00
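The byte-propagation half of this pass can be sketched in C under heavy simplifying assumptions: the IR is reduced to a hypothetical array of `toy_instr` entries, each either an opaque generating instruction or a MKVEC whose four result bytes each name a (def, byte) pair. SWZ handling, the words table and folding swizzles into ALU sources are omitted; `chase_byte()` and `propagate_mkvec()` are illustrative names only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy IR: a reference to one byte of one SSA def. */
typedef struct { int def; int byte; } byte_ref;

typedef struct {
   bool is_mkvec;     /* false: opaque generating instruction */
   byte_ref src[4];   /* MKVEC sources, one per result byte */
} toy_instr;

/* Follow one byte back through chains of MKVEC to the instruction that
 * actually produced it (SSA form guarantees there are no cycles). */
static byte_ref
chase_byte(const toy_instr *defs, byte_ref r)
{
   while (defs[r.def].is_mkvec)
      r = defs[r.def].src[r.byte];
   return r;
}

/* Rewrite a MKVEC so it consumes only original bytes. Returns true and
 * stores the def in *swz_def when all four bytes come from a single def,
 * meaning the MKVEC is just a swizzle of that def. */
static bool
propagate_mkvec(toy_instr *defs, int idx, int *swz_def)
{
   toy_instr *mk = &defs[idx];
   assert(mk->is_mkvec);
   for (int i = 0; i < 4; i++)
      mk->src[i] = chase_byte(defs, mk->src[i]);
   for (int i = 1; i < 4; i++)
      if (mk->src[i].def != mk->src[0].def)
         return false;
   *swz_def = mk->src[0].def;
   return true;
}
```

After propagation, intermediate MKVECs that no longer have users become dead code, which is how a whole chain can collapse into a single swizzle.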
Faith Ekstrand	a4e9002660	pan/bi: Emit MKVEC directly Now that we have bi_lower_mkvec_swz(), there's no need to be so careful in the NIR -> bi translation. We can just emit MKVEC and move on. The lowering pass will sort out the details. Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40720>	2026-04-06 21:39:25 +00:00
Faith Ekstrand	b9e33c7897	pan/bi: Stop lowering swizzles on mkvec and swz The new lowering can handle all the swizzle cases and is generally better at it than swizzle lowering. Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40720>	2026-04-06 21:39:25 +00:00
Faith Ekstrand	ed83d46d4e	pan/bi: Always use SWZ.v4i8 in bi_lower_swizzle() Now that we lower it, there's no advantage to one over the other at the time this pass runs. Also, the is_8bit check was technically wrong since it checks destination sizes, not source sizes. It's a lot safer to just use SWZ.v4i8 and let the lowering pass do the right thing. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40720>	2026-04-06 21:39:25 +00:00
Faith Ekstrand	bc7053a976	pan/bi: Add a lowering pass for MKVEC and SWZ Instead of trying very carefully in the bifrost emit code to only generate valid MKVEC for the target hardware, this adds a lowering pass which is capable of lowering any MKVEC or SWZ we can throw at it. Even if the swizzle isn't supported or if it's a MKVEC.v4i8 on Valhall, we'll lower it to something that does work on that platform. This frees up the rest of the compiler so we can add and modify MKVEC and SWZ at-will and never have to worry about hardware generation details. Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40720>	2026-04-06 21:39:24 +00:00
Faith Ekstrand	0edceaf383	pan/bi: Add a bi_op_supports_swizzle() helper Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40720>	2026-04-06 21:39:24 +00:00
Faith Ekstrand	a8879daf9c	pan/bi: Add a bi_try_compose_swizzles() helper Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40720>	2026-04-06 21:39:24 +00:00
Faith Ekstrand	3b728cb613	pan/bi: Add a bi_swizzle_from_byte_channels() helper Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40720>	2026-04-06 21:39:24 +00:00
Faith Ekstrand	4912bda122	pan/bi: Return void from bi_swizzle_to_byte_channels() Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40720>	2026-04-06 21:39:24 +00:00
Faith Ekstrand	e637130794	pan/bi: Use bi_half() for texture MS indices It feeds into a v2i16 so it needs to be 16-bit. Fixes: `ae79f6765a` ("pan/bi: Emit Valhall texture instructions") Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40720>	2026-04-06 21:39:23 +00:00
Faith Ekstrand	77f9cbd0c2	pan/bi: Compose swizzles in bi_half() and bi_byte() At least bi_half() has the decency to assert if the swizzle isn't BI_SWIZZLE_H01 to start with but bi_byte() did an irrelevant assert and then overwrote the swizzle with BI_SWIZZLE_B<lane> regardless of what was there before. In a lot of cases, this doesn't matter but we use both in translating NIR to BI on things that may have already been swizzled so we need to do the composition. Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40720>	2026-04-06 21:39:23 +00:00
Faith Ekstrand	342e9ac7e8	pan/bi: Add a bi_swizzle_from_half() helper Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40720>	2026-04-06 21:39:23 +00:00
Faith Ekstrand	05c5e52054	pan/bi/ra: Allow offsets on tied sources The only real requirement here is that the destination offset is zero and that the destination is big enough to hold the source. The source offset doesn't matter. Fixes: `bc17288697` ("pan/bi: Lower split/collect before RA") Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40720>	2026-04-06 21:39:23 +00:00
Faith Ekstrand	538b5c411e	pan/bi: Delete a few instruction encodings The non-trivial non-replicate swizzles on IADD.v4x8 and ISUB.v4x8 are either documented wrong or broken in hardware. Instead of swizzling b0101 and b2323, they swizzle b0011 and b2233 on G52. This is either a hardware bug or an issue with documentation. In either case, it's probably best not to trust it. Those swizzles aren't all that useful anyway. We also weren't using any of them before (or they'd have broken) so this isn't a performance regression. Cc: mesa-stable Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40720>	2026-04-06 21:39:23 +00:00
Faith Ekstrand	3fffcf4338	pan/bi: Support more swizzle aliases in the bifrost pack code Fixes: `82328a5245` ("pan/bi: Generate instruction packer for new IR") Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40720>	2026-04-06 21:39:23 +00:00
Karol Herbst	f015600c89	docs: add AI disclosure requirements Reviewed-by: Mary Guillemard <mary@mary.zone> Reviewed-by: Martin Roukala <martin.roukala@mupuf.org> Reviewed-by: Eric Engestrom <eric@igalia.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenz.ca> Reviewed-by: Adam Jackson <ajax@redhat.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40617>	2026-04-06 21:34:11 +00:00
Karol Herbst	90d3ddfc80	docs: clarify the use of autonomously acting tooling Reviewed-by: Martin Roukala <martin.roukala@mupuf.org> Reviewed-by: Mary Guillemard <mary@mary.zone> Reviewed-by: Eric Engestrom <eric@igalia.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenz.ca> Reviewed-by: Adam Jackson <ajax@redhat.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40617>	2026-04-06 21:34:11 +00:00
Silvio Vilerino	9f3b3f039f	mediafoundation: Remove unnecessary staging variable in ProcessSliceBitstreamZeroCopy Reviewed-by: Pohsiang (John) Hsu <pohhsu@microsoft.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40805>	2026-04-06 21:19:05 +00:00
Silvio Vilerino	7ae2fe285f	mediafoundation: Pre-create all MFSamples to avoid per slice COM allocation in the hot loop Reviewed-by: Pohsiang (John) Hsu <pohhsu@microsoft.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40805>	2026-04-06 21:19:05 +00:00
Silvio Vilerino	3467763ab5	mediafoundation: Prefetch the slice fence handles before the waits Reviewed-by: Pohsiang (John) Hsu <pohhsu@microsoft.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40805>	2026-04-06 21:19:04 +00:00
Sagar Ghuge	f0ae58df12	intel/compiler: Handle TerminateOnFirstHit in ray query execution Once an AABB or triangle intersection has been committed, terminate the traversal if the TerminateOnFirstHit ray flag is present. Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40773>	2026-04-06 10:00:05 -07:00
Gurchetan Singh	c4cecd9d19	gfxstream: cereal: fix 'None' in gfxstream codegen Commit `e27e41a842` ("vulkan,spirv: update headers") exposed a flaw in the cerealgenerator. It modified -- among other things -- the VkDeviceCreateInfo struct in vk.xml. In the update, the len="enabledLayerCount,null-terminated" attribute was removed from the ppEnabledLayerNames member. When the gfxstream code generator processes ppEnabledLayerNames (which is a const char* const*), it identifies it as an "array of strings". However, because the len attribute is now missing, vulkanType.getLengthExpression() returns None. This leads to errors like: gfxstream_guest_vk_autogen_impl/gen/goldfish_vk_counting_guest.cpp:642:30: error: use of undeclared identifier 'None' 642 | for (uint32_t i = 0; i < None; ++i) | ^~~~ 1 error generated. This patch adds various length access checks to prevent this from happening. TEST=m vulkan.ranchu Reviewed-by: David Gilhooley <djgilhooley@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40785>	2026-04-06 08:19:04 -07:00
Dhruv Mark Collins	77835f6c21	zink+turnip/ci: Add failures uncovered by new autotune These failures turned out to be triggered by the new autotune causing rendering mode transitions (such as GMEM -> SYSMEM), which uncovered a new set of failures. They tend to work as expected when either GMEM or SYSMEM is forced for all RPs, but the specific transitions caused by the autotuner lead them to fail. Signed-off-by: Dhruv Mark Collins <mark@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37802>	2026-04-06 14:19:51 +00:00
Dhruv Mark Collins	152b9c8db3	freedreno/fdperf: Detect when counter values are invalid The usage of two CP counters by the latency-sensitive autotuner will affect the operation of fdperf, so this detects when counters have selectors that have been changed and marks them as invalid with corresponding UI cues. This also seems to detect selector values being dropped while the GPU is in sleep states, which tends to be useful to catch too. Signed-off-by: Dhruv Mark Collins <mark@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37802>	2026-04-06 14:19:29 +00:00
Dhruv Mark Collins	da089bf741	tu/autotune: Only lock RPs sustain certain mode for 30s Many games have short periods where a certain mode might win consistently but this trend doesn't hold after that. Only allowing locking to occur on RPs where a certain mode consistently stays winning for 30s allows us to partially mitigate these bad locks. Signed-off-by: Dhruv Mark Collins <mark@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37802>	2026-04-06 14:19:29 +00:00
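A minimal C sketch of the 30-second rule, assuming the autotuner can report which mode won each comparison for a given RP together with a timestamp. `rp_lock_state`, `rp_lock_observe()` and this winner-tracking model are hypothetical, not the turnip implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LOCK_WINDOW_NS (30ull * 1000 * 1000 * 1000) /* 30 s */

typedef enum { MODE_UNKNOWN, MODE_SYSMEM, MODE_GMEM } render_mode;

typedef struct {
   render_mode winner;   /* mode that won the most recent comparison */
   uint64_t since_ns;    /* start of that mode's current winning streak */
   bool locked;          /* once set, the RP stays in `winner` */
} rp_lock_state;

/* Record which mode won for this RP at time now_ns. Any change of winner
 * restarts the streak; a streak lasting at least 30 s locks the RP. */
static void
rp_lock_observe(rp_lock_state *s, render_mode winner, uint64_t now_ns)
{
   assert(winner != MODE_UNKNOWN);
   if (s->locked)
      return;
   if (winner != s->winner) {
      s->winner = winner;
      s->since_ns = now_ns;
      return;
   }
   if (now_ns - s->since_ns >= LOCK_WINDOW_NS)
      s->locked = true;
}
```

A short burst where one mode wins, followed by a change of winner, only resets the streak, so it never produces a lock.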
Dhruv Mark Collins	c725f2aea3	tu/autotune: Allow 99% max probability in profiled mode The maximum probability was limited to 95% earlier due to the step delta of 5% (95+5=100% which we wanted to avoid). This introduces a new slower step delta after 95% which steps at 1% up to 99% which is significantly better in terms of eliminating the performance loss or stuttering from when there is a large difference between the modes. Signed-off-by: Dhruv Mark Collins <mark@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37802>	2026-04-06 14:19:29 +00:00
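A minimal C sketch of this stepping combined with the original 5% delta of the Profiled algorithm (fac705ab8a). It assumes the probability is stored as an integer percentage of picking GMEM and that the fine 1% steps apply symmetrically at the SYSMEM end (down to 1%); `step_probability()` is a hypothetical helper, not the turnip code.

```c
#include <assert.h>
#include <stdbool.h>

/* p: hypothetical probability (in percent) of picking GMEM for an RP;
 * 100 - p is the probability of picking SYSMEM. */
static int
step_probability(int p, bool gmem_won)
{
   assert(p >= 1 && p <= 99);
   if (gmem_won) {
      if (p < 95)
         p = p + 5 > 95 ? 95 : p + 5;  /* coarse 5% steps up to 95% */
      else if (p < 99)
         p += 1;                       /* fine 1% steps up to 99% */
   } else {
      if (p > 5)
         p = p - 5 < 5 ? 5 : p - 5;    /* coarse 5% steps down to 5% */
      else if (p > 1)
         p -= 1;                       /* fine 1% steps down to 1% */
   }
   return p;
}
```

Starting from 50%, nine consecutive GMEM wins reach 95% and four more reach the 99% cap, so the losing mode is still sampled occasionally but far less often than with the old 95% limit.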
Dhruv Mark Collins	3b3ae477f3	tu/autotune: Add render mode locking to PROFILED algorithm There are certain scenarios where even switching to another render mode has significant negative implications for performance, even when done for a single invocation. Now we try to heuristically pick out these cases and lock them into the optimal mode. At the moment, the heuristic is fairly conservative, but it manages to lock RPs in under a minute in most cases. Signed-off-by: Dhruv Mark Collins <mark@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37802>	2026-04-06 14:19:29 +00:00
Dhruv Mark Collins	3002d77dfd	tu+util: Prefer SYSMEM for DXVK/VKD3D PC games tend to almost always run far better in SYSMEM due to the high FS complexity, and so preferring SYSMEM tends to be a winning policy until profiled mode reaches a state where it can surpass it. Signed-off-by: Dhruv Mark Collins <mark@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37802>	2026-04-06 14:19:29 +00:00
Dhruv Mark Collins	ed643d1766	tu+util: Allow setting autotune mode from driconf Allows for setting an override for the default autotune mode using driconf, allowing for setting policy on a per-app basis. Signed-off-by: Dhruv Mark Collins <mark@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37802>	2026-04-06 14:19:29 +00:00
Dhruv Mark Collins	180c0de746	tu/autotune: Add prefer SYSMEM/GMEM mode Certain games tend to use rendering patterns that strongly prefer one mode over the other, and thus we're better off not bothering with profiling them. Signed-off-by: Dhruv Mark Collins <mark@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37802>	2026-04-06 14:19:29 +00:00
Dhruv Mark Collins	3fcec4762f	tu/autotune: Add "Preempt Optimize" mode This introduces a new option that makes autotune optimize for low preemption latency, which is crucial to ensure responsiveness on systems with GPU-based composition. With draw-level preemption, a large enough draw can entirely block the compositor from running. This can be mitigated by preferring to use GMEM, which breaks up the draw into smaller pieces and generally has a lower latency for preemption. As a further mitigation, tiles in GMEM are then divided into smaller and smaller pieces, which lowers the non-preemptible duration. There are static checks in place to avoid doing this when it would incur a cost that is too large. Uses performance counters read during ambles to detect preemption latency events while rendering in SYSMEM. This approach is superior to using RBBM draw time thresholds, which could be imprecise as only the average was calculated rather than the true maximum draw time. However, converting the preemption latency performance counter value from CP ticks to wall clock time is based on the average GPU frequency of the whole period from the start of the RP until the switch-away amble, while the preemption latency starts counting from the request. Thus, if the GPU frequency shifts rapidly throughout the RP, the estimated wall clock time may be inaccurate, but it should be good enough in the vast majority of cases. Signed-off-by: Dhruv Mark Collins <mark@igalia.com> Co-authored-by: Danylo Piliaiev <dpiliaiev@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37802>	2026-04-06 14:19:29 +00:00
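The tick-to-wall-clock conversion described above amounts to dividing a tick count by an average frequency measured over a longer interval. A minimal C sketch, assuming both the interval's tick count and its wall-clock duration are known; `ticks_to_ns()` is a hypothetical helper and does not guard against overflow for very large inputs.

```c
#include <assert.h>
#include <stdint.h>

/* Estimate the wall-clock duration of latency_ticks using the average
 * frequency over an interval: avg_freq = interval_ticks / interval_ns,
 * so latency_ns = latency_ticks / avg_freq. Multiplying before dividing
 * keeps integer precision. */
static uint64_t
ticks_to_ns(uint64_t latency_ticks, uint64_t interval_ticks, uint64_t interval_ns)
{
   assert(interval_ticks != 0);
   return latency_ticks * interval_ns / interval_ticks;
}
```

If the clock changes during the interval, the average no longer matches the frequency at the moment of the preemption request, which is the inaccuracy the commit message acknowledges.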
Dhruv Mark Collins	bf2777c013	tu/autotune: Disable autotuning for small renderpasses by default Tuning these small renderpasses is difficult due to their high variability across command buffers and low impact on overall performance in most cases. This change disables autotuning for renderpasses with 5 or fewer draw calls unless the TUNE_SMALL modifier flag is explicitly set. Signed-off-by: Dhruv Mark Collins <mark@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37802>	2026-04-06 14:19:29 +00:00
Dhruv Mark Collins	8e1fe9da20	tu/autotune: Prefer SYSMEM when only SW binning is possible In cases where only SW binning is possible and where there would be a performance impact from not using HW binning (i.e. > 2 tiles), it is preferable to default to SYSMEM as the performance impact of using GMEM is almost definitely not going to be worth it. Signed-off-by: Dhruv Mark Collins <mark@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37802>	2026-04-06 14:19:29 +00:00
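The two default policies from the commits above (skip autotuning for RPs with 5 or fewer draw calls unless TUNE_SMALL is set, and prefer SYSMEM when only SW binning is possible and there would be more than 2 tiles) can be sketched as a small decision function. The enum, the flag value, the order in which the rules are checked and `default_policy()` itself are all hypothetical; the real turnip logic considers more inputs.

```c
#include <assert.h>
#include <stdbool.h>

#define TUNE_SMALL (1u << 0) /* hypothetical flag bit */

typedef enum {
   POLICY_AUTOTUNE,      /* let the autotuner decide */
   POLICY_NO_TUNE,       /* RP too small: fall back to the non-tuned default */
   POLICY_PREFER_SYSMEM, /* GMEM would need SW binning across >2 tiles */
} rp_policy;

static rp_policy
default_policy(unsigned draw_count, unsigned tile_count,
               bool hw_binning_possible, unsigned flags)
{
   assert(tile_count > 0);
   if (!hw_binning_possible && tile_count > 2)
      return POLICY_PREFER_SYSMEM;
   if (draw_count <= 5 && !(flags & TUNE_SMALL))
      return POLICY_NO_TUNE;
   return POLICY_AUTOTUNE;
}
```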
Dhruv Mark Collins	dde478ce98	util/math: Add ROUND_DOWN_TO_NPOT The default ROUND_DOWN_TO only handles POT alignment values, so an additional variant was added which handles NPOT alignment too. Signed-off-by: Dhruv Mark Collins <mark@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37802>	2026-04-06 14:19:29 +00:00
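The difference between the two variants can be sketched in C. The POT form clears low bits with a mask, which is only correct when the alignment is a power of two; the NPOT form uses integer division instead. These are hypothetical inline functions for illustration, not the actual Mesa macros, whose exact definitions may differ.

```c
#include <assert.h>
#include <stdint.h>

/* POT-only: clear the low bits. Wrong for NPOT alignments. */
static inline uint64_t
round_down_to_pot(uint64_t x, uint64_t align)
{
   assert(align != 0 && (align & (align - 1)) == 0);
   return x & ~(align - 1);
}

/* Any non-zero alignment: unsigned division truncates, so multiplying
 * back gives the largest multiple of align that is <= x. */
static inline uint64_t
round_down_to_npot(uint64_t x, uint64_t align)
{
   assert(align != 0);
   return (x / align) * align;
}
```

For example, with an alignment of 48 the mask form would compute 100 & ~47 = 64, while the division form correctly returns 96.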
Dhruv Mark Collins	fac705ab8a	tu/autotune: Add "Profiled" algorithm This algo measures the time taken by each RP as a whole, and uses that to move a probability distribution of whether to use GMEM or SYSMEM for that RP. This is done with a delta of 5% per run, and the probability is clamped to 5% and 95% to avoid getting stuck when conditions change. Additionally, this adds an "immediate resolve" variant, which tries to work off a single data point in SYSMEM and GMEM and then immediately resolves to the faster path. This is useful for CI, which runs a single frame multiple times and where performance doesn't vary from frame to frame. Signed-off-by: Dhruv Mark Collins <mark@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37802>	2026-04-06 14:19:29 +00:00


220763 commits