fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-29 08:00:12 +01:00

Author	SHA1	Message	Date
Samuel Pitoiset	168a8d0b52	radv: fix RB+ for depth-only with unused attachments When there are no color outputs in the rendering state, but color write enable/write aren't masked out (which seems legal with VK_EXT_dynamic_rendering_unused_attachments), the driver must emit CB_DISABLE to disable CB rendering completely. Otherwise, if there is also a depth/stencil attachment in the rendering state, CB0 is always set to 32_R for RB+. That means, the pixel shader would still export fragments but to the previously bound color attachment. VKCTS is missing coverage. Fixes: `4580293ab2` ("radv: implement RB+ depth-only rendering for better perf") Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14319 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38509>	2025-11-20 07:37:17 +00:00
Eric Engestrom	5db246a5ba	perfetto: use the new upstream repo The old one was abandoned without so much as a README note. This will also allow using newer releases than 47; the current one being 53, but this MR doesn't address that, as it aims to be a simple no-op change. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38525>	2025-11-20 07:07:42 +00:00
Yiwei Zhang	829bd406c0	venus: fix racy semaphore feedback counter update Previously, we update the sfb dst slot upon vn_SignalSemaphore so that vn_GetSemaphoreCounterValue can poll just the feedback slot itself. However, that can race with pending sfb cmds that are going to update the slot value, ending up with stuck sync progression. This change fixes it by disallowing vn_SignalSemaphore to touch the sfb dst slot. To ensure counter query being monotonic, vn_GetSemaphoreCounterValue now takes the greater of signaled counter and the sfb counter read. Test with dEQP-VK.synchronization* group: - w/o this: stuck shows up within 2 min with 8 parallel deqp runs - with this: no stuck for multiple full runs of the same Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14304 Fixes: `5c7e60362c` ("venus: enable timeline semaphore feedback") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38516>	2025-11-20 06:29:16 +00:00
Faith Ekstrand	0bd5734349	panvk: Set primitive_index_override when prim ID is written by IDVS Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com> Reviewed-by: Olivia Lee <olivia.lee@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38504>	2025-11-20 06:09:16 +00:00
Faith Ekstrand	333ca9133a	pan/genxml: Rename Primitive Index Override Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com> Reviewed-by: Olivia Lee <olivia.lee@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38504>	2025-11-20 06:09:16 +00:00
Faith Ekstrand	652a5f41bf	pan/bi: Add support for writing gl_PrimitiveID from IDVS Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com> Reviewed-by: Olivia Lee <olivia.lee@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38504>	2025-11-20 06:09:16 +00:00
Faith Ekstrand	5700c87db6	pan/bi: Add some helpers and an info field for needing the extended FIFO The logic here is a bit scattered around and is about to get more complicated. This adds a helper which better documents the interactions as well as an info field to make the driver's life easier. Reviewed-by: Olivia Lee <olivia.lee@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38504>	2025-11-20 06:09:16 +00:00
Marek Olšák	9e339f4b32	nir: rename nir_lower_indirect_derefs -> nir_lower_indirect_derefs_to_if_else_trees This describes better what it does. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Acked-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38471>	2025-11-20 05:42:11 +00:00
Marek Olšák	22871fb8bd	nir: for nir_shift_channels, fill undefined components with undef instead of .x This potentially results in better code because we don't add def uses where undef is allowed. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38468>	2025-11-20 04:26:55 +00:00
Marek Olšák	65837d8289	ac,radeonsi: remove gfx11 FW-based MCBP It's too slow to be usable. User queues could replace it. Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38338>	2025-11-20 03:31:47 +00:00
Sagar Ghuge	f0aad5bd7e	anv: Convert indirect to direct dispatch Saves unnecessary PC and stall during encode phase. Thanks to Felix for pointing out that CCS always needs a CS stall once we add a pipe control, that will kill the performance for BVH construction. Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38513>	2025-11-20 03:11:55 +00:00
Felix DeGrood	15ffe6c524	anv/perfetto: include all pc reasons Up to 4 reasons can be saved and displayed. Previously, we were only displaying one reason for Perfetto. Co-authored-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38500>	2025-11-20 02:53:53 +00:00
Connor Abbott	5ccbcf8a8b	tu: Support softfloat64 DOOM Eternal uses fp64 without checking. Don't expose it, but as a workaround lower the fp64 operations to software so that we don't choke on them, similar to anv. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38088>	2025-11-20 02:14:50 +00:00
Connor Abbott	3b3954e2b8	util/glsl2spirv: Use better glslang flag for -Olib --create-unlinked also creates entrypoints for the functions, and obviates the need to create a dummy entrypoint. This is one step closer to removing glsl2spirv and aligns us with other users of glslang. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38088>	2025-11-20 02:14:50 +00:00
Connor Abbott	9e3bc1f123	tu: Make softfloat shader compiled on demand The vast majority of users will not need this, so do not pay the runtime and memory cost of compiling the shader to NIR until it's needed. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38088>	2025-11-20 02:14:50 +00:00
Yonggang Luo	6400de124c	docs: Update the minimal MSVC version requirements As now __typeof__ is used in util/macros.h The minimal required version is Visual Studio 17.9 or later, or cl.exe version 19.39.33428 or later. According to https://learn.microsoft.com/en-us/cpp/c-language/typeof-c?view=msvc-170#requirements Signed-off-by: Yonggang Luo <luoyonggang@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38534>	2025-11-20 01:16:17 +00:00
Yonggang Luo	ca364a9551	ci: update image tags for windows container Signed-off-by: Yonggang Luo <luoyonggang@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38534>	2025-11-20 01:16:17 +00:00
Yonggang Luo	11a775539f	ci: MSVC 2019 is not supported anymore, remove it. Signed-off-by: Yonggang Luo <luoyonggang@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38534>	2025-11-20 01:16:17 +00:00
Yonggang Luo	84cce2a739	ci/microsoft: Downgrading WinFlexBison.win_flex_bison to version 2.5.24 This is achieved by providing a version parameter to winget-cli Fixes: `9592686ca0` ("ci/windows: Use winget to install packages and install Microsoft.WindowsWDK.10.0.26100") Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13968 Signed-off-by: Yonggang Luo <luoyonggang@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38534>	2025-11-20 01:16:17 +00:00
Aitor Camacho	4b4061fa71	kk: Fix image to image copy This change forces image->buffer->image copy path for pretty much all the cases now. Metal's image to image copy only allows same format and sample count. Previously we were only taking the image->buffer->image path for compressed formats. This just seemed to work, but we may run into issues in the future. Metal does not report any validation layer error. Acked-by: Arcady Goldmints-Orlov <arcady@lunarg.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38459>	2025-11-20 00:57:29 +00:00
Faith Ekstrand	6d9f563960	spirv: Assume variable workgroup size unless it's set This fixes an issue a bunch of different components were all working around themselves where sometimes we don't have a workgroup size but workgroup_size_variable is false. This also fixes asahi_clc, which didn't have the workaround and was assuming zero (but not variable!) workgroup sizes everywhere. LoLed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Acked-by: Mel Henning <mhenning@darkrefraction.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38538>	2025-11-20 00:02:42 +00:00
Mel Henning	80db8171de	zink: Lock around screen_debug_marker_{begin,end} vkQueueBeginDebugUtilsLabelEXT and vkQueueEndDebugUtilsLabelEXT require queue to be externally synchronized, which means these functions require the lock. Unfortunately, there's no guarantee that the debug markers will be matched in the multithreaded case, but I suppose this is better than crashing. Fixes: `015eda4a41` ("zink: deduplicate VkDevice and VkInstance") Reviewed-by: Dave Airlie <airlied@redhat.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38414>	2025-11-19 23:43:45 +00:00
Mel Henning	018178842e	zink: Lock queue_lock in zink_destroy_screen Fixes: `015eda4a41` ("zink: deduplicate VkDevice and VkInstance") Reviewed-by: Dave Airlie <airlied@redhat.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38414>	2025-11-19 23:43:45 +00:00
Mel Henning	9acce36652	zink: Create one queue lock per device We currently only create one queue per queue family on the device. The device can be shared between multiple zink_screens, so having one lock per screen can still lead to multiple locks per queue. Fix this by allocating queue_lock along with the device. This fixes an issue that was causing crashes with nvk+zink and QtWebEngine with QTWEBENGINE_FORCE_USE_GBM=1 This can be reproduced by resizing the window in either: * anki - https://apps.ankiweb.net/ or * Qt's simplebrowser example https://doc.qt.io/qt-6/qtwebengine-webenginewidgets-simplebrowser-example.html which would then cause this dmesg error: nouveau 0000:01:00.0: anki[92007]: Failed to find syncobj (-> in): handle=40 along with a context loss. With VK_LOADER_LAYERS_ENABLE=VK_LAYER_KHRONOS_validation we would additionally get warnings like: Validation Error: [ UNASSIGNED-Threading-MultipleThreads-Write ] \| MessageID = 0xa05b236e vkQueueSubmit(): THREADING ERROR : object of type VkQueue is simultaneously used in current thread 139824449189568 and thread 139823901816512 Objects: 1 [0] VkQueue 0x557a666783e0 Fixes: `015eda4a41` ("zink: deduplicate VkDevice and VkInstance") Reviewed-by: Dave Airlie <airlied@redhat.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38414>	2025-11-19 23:43:44 +00:00
Mel Henning	dff1b9d4e9	zink: Make screen->queue_lock a pointer Fixes: `015eda4a41` ("zink: deduplicate VkDevice and VkInstance") Reviewed-by: Dave Airlie <airlied@redhat.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38414>	2025-11-19 23:43:44 +00:00
Mel Henning	f0dc8c0224	zink: Return zink_device in create_logical_device Fixes: `015eda4a41` ("zink: deduplicate VkDevice and VkInstance") Reviewed-by: Dave Airlie <airlied@redhat.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38414>	2025-11-19 23:43:43 +00:00
Aitor Camacho	abc719f01f	kk: Add multiViewport and EXT_shader_viewport_index_layer support Reviewed-by: Arcady Goldmints-Orlov <arcady@lunarg.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38518>	2025-11-19 23:29:00 +00:00
Aitor Camacho	15f170e369	kk: Merge io type modifying passes into one Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38518>	2025-11-19 23:29:00 +00:00
Faith Ekstrand	cbd0c9eb3b	panvk: Add a panvk_common_sysvals struct For geometry shaders, we're going to need to compile various graphics shaders down to compute shaders. This means that they'll look like compute shaders to much of the compile pipeline but ultimately get executed as graphics shaders. Most of the time, the compiler will just happily take whatever offset you give and try to load the sysval from there so you can load a graphics sysval from a compute shader just fine. However, for the common ones, we switch on the shader stage and load from a different offset for 3D vs. compute. This breaks the moment you have a compute shader that's going to actually load from a 3D sysval space. The solution here is to ensure that any common sysvals (currently just the push uniforms address and the printf buffer) are at exactly the same offset in both. This is done by adding a panvk_common_sysvals struct, some static asserts, and a bit of macro magic to keep things ergonomic. This also changes push uniform upload to just swap in the push uniform address instead of writing it to the command buffer on every iteration. Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38508>	2025-11-19 23:10:41 +00:00
Eric Engestrom	3ebabe9e43	docs/release-calendar: add 26.0 branchpoint and release candidates Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38539>	2025-11-19 23:04:46 +00:00
Lionel Landwerlin	6fe2035065	anv: bump maxTessellationControlTotalOutputComponents Our backend compiler explains the limits as : 32 bytes for the patch header (tessellation factors) 480 bytes for per-patch varyings (a varying component is 4 bytes and gl_MaxTessPatchComponents = 120) 16384 bytes for per-vertex varyings (a varying component is 4 bytes, gl_MaxPatchVertices = 32 and gl_MaxTessControlOutputComponents = 128) In all that's : * 32 patches * 128 components (counting tessellation factors) * 32 vertices * 128 components 8192 total components. I'm not sure why the limit was set so low, maybe leftover from older platforms? Bump the limit to something like competition. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38523>	2025-11-19 22:44:54 +00:00
Eric R. Smith	65ba14519e	pan: fix a bifrost disassembly assert failure We were overflowing an array during bifrost disassembly. This was only a problem if the user explicitly set an environment variable, so unlikely to occur in casual use, and also only could be triggered in very specific, dense code. But we still should get this right! The specific CTS test that caused the assert is: 'dEQP-VK.graphicsfuzz.stable-quicksort-for-loop-with-injection' with environment variable `BIFROST_MESA_DEBUG=shaders`. One of the shaders has a clause with 6 constants (the maximum) and this overflowed the array because we assume we always have an extra slot (used for modifier processing). Cc: mesa-stable Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38501>	2025-11-19 22:10:21 +00:00
Dmitry Baryshkov	7a3bfd1f79	rocket: drop file names from the generated file Having file names and dates in the generated file affects reproducibility. Build systems (like OE) error out on the gen_header.py output, because it can contain full paths. Drop file list from the generated file. Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38528>	2025-11-19 16:27:32 +00:00
Dmitry Baryshkov	cdb6468c53	ethosu: drop file names from the generated file Having file names and dates in the generated file affects reproducibility. Build systems (like OE) error out on the gen_header.py output, because it can contain full paths. Drop file list from the generated file. Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38528>	2025-11-19 16:27:32 +00:00
Hyunjun Ko	9a9342e4aa	anv/video: handling segmentations features for vp9 decoding Signed-off-by: Hyunjun Ko <zzoon@igalia.com> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38418>	2025-11-19 15:54:47 +00:00
Hyunjun Ko	1479e1ef82	anv/video: rework for handling alternative quantizer for vp9 decoding. including prep-work for handling segmentation features. Signed-off-by: Hyunjun Ko <zzoon@igalia.com> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38418>	2025-11-19 15:54:47 +00:00
Danylo Piliaiev	8827123fef	tu: Disable FLAG_WAIT_FOR_BR sync when CB is disabled Skip TU_CMD_FLAG_WAIT_FOR_BR wait whenever concurrent binning is disabled. Without CB there is nothing to wait for, so the sync only adds overhead, and in workloads with thousands of tiny renderpasses the cumulative overhead becomes too big. In one real-world workload I saw the following timings: - 99.20 ms without disabling TU_CMD_FLAG_WAIT_FOR_BR - 65.15 ms with TU_CMD_FLAG_WAIT_FOR_BR disabled - 64.92 ms with TU_DEBUG=nocb Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38378>	2025-11-19 14:35:33 +00:00
Danylo Piliaiev	9370bdc61e	tu: Disable by default CB running alongside renderpasses Disable concurrent binning by default so regular renderpasses have access to all vertex fetch resources. When a renderpass can actually enable CB, walk back to the CB barrier at submission time and re-enable CB for all patchpoints between CB barrier and the renderpass. Because we expect at most one or two renderpasses with CB per frame, the number of patches stays small. The reduced vertex fetch resources resulted in up to 10% performance loss seen in targeted benchmark and in a few game captures. Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38378>	2025-11-19 14:35:33 +00:00
Danylo Piliaiev	5d2b171886	tu/cs: Helpers to create a region that can be easily enabled/disabled To mitigate CB perf impact we'd need to be able to easily toggle CB related IB regions. Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38378>	2025-11-19 14:35:32 +00:00
Danylo Piliaiev	a7f63a5dbb	tu: Do not WAIT_FOR_BR if concurrent binning is disabled The sync emitted on TU_CMD_FLAG_WAIT_FOR_BR didn't disable CB when CB was previously disabled for the renderpass, this resulted in fewer vertex processing resources available for BR. We can just not emit the sync instead, since next time CB is enabled it will force the sync. Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38378>	2025-11-19 14:35:32 +00:00
Danylo Piliaiev	f2fb8ad422	tu: Don't CONCURRENT_BIN_DISABLE when there is no depth image We have to disable CB when lrz fast-clear is disabled, but if there is no depth image at all, we can keep it enabled. This means that RP without depth won't effectively be a CB barrier. Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38378>	2025-11-19 14:35:32 +00:00
Danylo Piliaiev	ee4f375bfd	tu: Fix CB barrier description Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38378>	2025-11-19 14:35:32 +00:00
Janne Grunau	1f144081ec	meson: Add asahi to aarch64's auto-generated drivers Since the Apple silicon M1 and M2 series of SoCs support only aarch64 split the lists for 'arm' and 'aarch64'. Signed-off-by: Janne Grunau <j@jannau.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38341>	2025-11-19 11:16:53 +00:00
Georg Lehmann	fa66b670d4	aco/optimizer: reduce max alu_opt_info stack operands to 4 ALU instructions typically have a maximum of 3 operands, and even when combining instructions, the peak count will not go above 4. Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38150>	2025-11-19 10:51:43 +00:00
Georg Lehmann	4da74eed96	aco/tests: test packed fma opts Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38150>	2025-11-19 10:51:43 +00:00
Georg Lehmann	1f0293be0d	aco/optimizer: use new helpers for packed fma Foz-DB Navi48: Totals from 374 (0.45% of 82419) affected shaders: MaxWaves: 5476 -> 5480 (+0.07%) Instrs: 2786653 -> 2784061 (-0.09%); split: -0.11%, +0.01% CodeSize: 15163340 -> 15153460 (-0.07%); split: -0.08%, +0.01% VGPRs: 46884 -> 46860 (-0.05%) SpillVGPRs: 188 -> 189 (+0.53%) Scratch: 3207936 -> 3208192 (+0.01%) Latency: 27352681 -> 27350006 (-0.01%); split: -0.02%, +0.01% InvThroughput: 5933554 -> 5932632 (-0.02%); split: -0.02%, +0.01% VClause: 62355 -> 62359 (+0.01%); split: -0.03%, +0.04% Copies: 290221 -> 289786 (-0.15%); split: -0.21%, +0.06% Branches: 108566 -> 108569 (+0.00%); split: -0.01%, +0.01% PreVGPRs: 40172 -> 40157 (-0.04%) VALU: 1355753 -> 1353329 (-0.18%); split: -0.19%, +0.01% SALU: 524836 -> 524831 (-0.00%); split: -0.01%, +0.01% VMEM: 90948 -> 90950 (+0.00%) VOPD: 10489 -> 10490 (+0.01%); split: +0.98%, -0.97% Foz-DB Navi21: Totals from 374 (0.45% of 82387) affected shaders: MaxWaves: 4339 -> 4348 (+0.21%) Instrs: 2255741 -> 2253554 (-0.10%); split: -0.10%, +0.00% CodeSize: 12755276 -> 12744184 (-0.09%); split: -0.09%, +0.01% VGPRs: 40376 -> 40352 (-0.06%) Latency: 27357012 -> 27348737 (-0.03%); split: -0.07%, +0.04% InvThroughput: 7213578 -> 7211136 (-0.03%); split: -0.07%, +0.04% VClause: 62154 -> 62172 (+0.03%); split: -0.01%, +0.04% Copies: 268204 -> 268048 (-0.06%); split: -0.22%, +0.16% Branches: 107067 -> 107066 (-0.00%) PreVGPRs: 37615 -> 37599 (-0.04%) VALU: 1423326 -> 1421187 (-0.15%); split: -0.16%, +0.01% SALU: 383388 -> 383390 (+0.00%); split: -0.00%, +0.00% Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38150>	2025-11-19 10:51:43 +00:00
Georg Lehmann	fec10ea3ea	aco/optimizer: use new helpers for add16 opts Foz-DB Navi48: Totals from 164 (0.20% of 82419) affected shaders: Instrs: 145304 -> 145335 (+0.02%); split: -0.00%, +0.02% CodeSize: 794156 -> 794280 (+0.02%); split: -0.00%, +0.02% Latency: 1884349 -> 1884227 (-0.01%); split: -0.01%, +0.00% InvThroughput: 350403 -> 350393 (-0.00%) Foz-DB Navi21: Totals from 164 (0.20% of 82387) affected shaders: Instrs: 117416 -> 117414 (-0.00%) CodeSize: 673328 -> 673312 (-0.00%) Latency: 1896952 -> 1897094 (+0.01%); split: -0.00%, +0.01% InvThroughput: 638536 -> 638556 (+0.00%); split: -0.01%, +0.01% Copies: 14579 -> 14577 (-0.01%) VALU: 65895 -> 65893 (-0.00%) Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38150>	2025-11-19 10:51:42 +00:00
Georg Lehmann	e8f5b9374b	aco/optimizer: use new helpers to optimize mul(b2f(a), b) Foz-DB Navi48: Totals from 979 (1.19% of 82419) affected shaders: Instrs: 3630560 -> 3629463 (-0.03%); split: -0.03%, +0.00% CodeSize: 19154176 -> 19147124 (-0.04%); split: -0.04%, +0.00% Latency: 17700546 -> 17699505 (-0.01%); split: -0.01%, +0.01% InvThroughput: 3143808 -> 3143254 (-0.02%); split: -0.02%, +0.01% SClause: 76410 -> 76405 (-0.01%); split: -0.01%, +0.00% Copies: 256544 -> 256554 (+0.00%); split: -0.02%, +0.02% PreVGPRs: 40868 -> 40835 (-0.08%) VALU: 2003291 -> 2002466 (-0.04%); split: -0.04%, +0.00% SALU: 514000 -> 514006 (+0.00%) VOPD: 3254 -> 3256 (+0.06%); split: +0.12%, -0.06% Foz-DB Navi21: Totals from 926 (1.12% of 82387) affected shaders: MaxWaves: 21538 -> 21542 (+0.02%) Instrs: 2984216 -> 2983187 (-0.03%); split: -0.04%, +0.00% CodeSize: 16104112 -> 16097272 (-0.04%); split: -0.05%, +0.00% VGPRs: 46864 -> 46848 (-0.03%) Latency: 15678064 -> 15677099 (-0.01%); split: -0.01%, +0.00% InvThroughput: 3779550 -> 3778230 (-0.03%); split: -0.04%, +0.01% VClause: 81590 -> 81598 (+0.01%) SClause: 70753 -> 70751 (-0.00%); split: -0.01%, +0.00% Copies: 240446 -> 240466 (+0.01%); split: -0.01%, +0.02% PreSGPRs: 51121 -> 51062 (-0.12%) PreVGPRs: 38538 -> 38505 (-0.09%) VALU: 1978847 -> 1977777 (-0.05%); split: -0.06%, +0.00% SALU: 439184 -> 439212 (+0.01%) Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38150>	2025-11-19 10:51:42 +00:00
Georg Lehmann	f0e24284f5	aco/optimizer: create max3/min3/med3 with salu min/max Foz-DB Navi48: Totals from 175 (0.21% of 82419) affected shaders: Instrs: 465863 -> 465260 (-0.13%); split: -0.13%, +0.00% CodeSize: 2362264 -> 2360744 (-0.06%); split: -0.07%, +0.00% Latency: 1548501 -> 1548371 (-0.01%); split: -0.01%, +0.00% InvThroughput: 227683 -> 227630 (-0.02%); split: -0.08%, +0.06% Copies: 33646 -> 33648 (+0.01%) PreSGPRs: 9996 -> 10004 (+0.08%) VALU: 175836 -> 175850 (+0.01%) SALU: 122094 -> 121621 (-0.39%); split: -0.39%, +0.00% Foz-DB Navi21: Totals from 1 (0.00% of 82387) affected shaders: InvThroughput: 74 -> 76 (+2.70%) VALU: 57 -> 58 (+1.75%) SALU: 61 -> 60 (-1.64%) Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38150>	2025-11-19 10:51:42 +00:00
Georg Lehmann	d21734e024	aco/optimizer: use new helper functions to create med3 Foz-DB Navi48: Totals from 9659 (11.72% of 82419) affected shaders: Instrs: 17301747 -> 17301735 (-0.00%); split: -0.00%, +0.00% CodeSize: 93378108 -> 93378184 (+0.00%); split: -0.00%, +0.00% Latency: 145441784 -> 145441791 (+0.00%); split: -0.00%, +0.00% InvThroughput: 25768777 -> 25768778 (+0.00%) Copies: 1370123 -> 1370124 (+0.00%) VALU: 9705655 -> 9705656 (+0.00%) Foz-DB Navi21: Totals from 22 (0.03% of 82387) affected shaders: Instrs: 27433 -> 27406 (-0.10%) CodeSize: 146440 -> 146352 (-0.06%); split: -0.06%, +0.00% Latency: 305857 -> 305806 (-0.02%); split: -0.02%, +0.00% InvThroughput: 63634 -> 63580 (-0.08%) VALU: 19109 -> 19082 (-0.14%) Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38150>	2025-11-19 10:51:42 +00:00

215045 commits