fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-04 22:49:13 +02:00

Author	SHA1	Message	Date
Alyssa Rosenzweig	74ed2b78e8	asahi,hk: optimize no-op FS Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36399>	2025-08-03 14:40:53 -04:00
Alyssa Rosenzweig	626fa80c1b	asahi: optimize pass type with depth-only passes Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36399>	2025-08-03 14:40:53 -04:00
Alyssa Rosenzweig	7f2a6cdd26	hk: only enable image view min LOD for dx12 I don't really want random Vulkan apps using this. fixes Steam shader precaching via fossilize. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36399>	2025-08-03 14:40:53 -04:00
Alyssa Rosenzweig	a0a18c084e	hk: kill psiz writes via topology, not feature this regresses DXVK fast link shaders, I guess, but fixes Proton shader precompiles. per discussion with Hans-Kristian Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36399>	2025-08-03 14:40:53 -04:00
Alyssa Rosenzweig	9c987ee75e	asahi: use native colour masking seems to work now. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36399>	2025-08-03 14:40:53 -04:00
Alyssa Rosenzweig	562377f01d	agx: try to rematerialize to improve occupancy we already have a perfectly good spiller and SSA... use it when it helps. yes, this costs a bit of CPU time, but it's guarded behind enough checks that the average time should be fine. this was prompted by a shadertoy where we were losing waves due to way too many constants pooled at the start of a chunky shader. in GL shader-db, only affected shaders are in blender: instrs HURT: shaders/blender/1020.shader_test FS: 3125 -> 3178 (1.70%) instrs HURT: shaders/blender/981.shader_test FS: 3125 -> 3178 (1.70%) instrs HURT: shaders/blender/729.shader_test FS: 3086 -> 3154 (2.20%) instrs HURT: shaders/blender/1023.shader_test FS: 3085 -> 3153 (2.20%) instrs HURT: shaders/blender/424.shader_test FS: 3085 -> 3153 (2.20%) threads helped: shaders/blender/1020.shader_test FS: 576 -> 640 (11.11%) threads helped: shaders/blender/1023.shader_test FS: 576 -> 640 (11.11%) threads helped: shaders/blender/424.shader_test FS: 576 -> 640 (11.11%) threads helped: shaders/blender/729.shader_test FS: 576 -> 640 (11.11%) threads helped: shaders/blender/981.shader_test FS: 576 -> 640 (11.11%) in VK fossils, there's a lot more high pressure shaders that benefit: Totals from 113 (0.21% of 54019) affected shaders: MaxWaves: 64448 -> 73088 (+13.41%) Instrs: 388529 -> 391646 (+0.80%); split: -0.00%, +0.80% CodeSize: 2750064 -> 2769106 (+0.69%); split: -0.00%, +0.69% ALU: 292960 -> 295863 (+0.99%); split: -0.00%, +0.99% FSCIB: 292960 -> 295863 (+0.99%); split: -0.00%, +0.99% GPRs: 21297 -> 19289 (-9.43%) Preamble instrs: 75703 -> 75911 (+0.27%) notable improvement in Far Cry 5. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36399>	2025-08-03 14:40:53 -04:00
Alyssa Rosenzweig	6544a4f1ae	asahi: drop sink/move in GS code this is asking for trouble, since divergence analysis doesn't handle stuff we lower quickly. this fixes geometry shaders blowing up since the cited commit, but since I was the one who r-b'd that change, I don't have anyone to blame but myself C: Fixes: `d61edf079b` ("nir: add nir_move_only_convergent/divergent") Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36399>	2025-08-03 14:40:53 -04:00
Antonino Maniscalco	e4584c9470	tu: Add support for realtime vk priority The kernel creates 4 rings, so it is possible to map each of Vulkan's priorities to its own ring. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36172>	2025-08-03 12:46:17 +00:00
LingMan	8227283d58	nak: Drop include paths for `size_of` and `size_of_val` They have been added to the prelude with Rust 1.80. Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Reviewed-by: Karol Herbst <kherbst@redhat.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36526>	2025-08-03 10:16:21 +00:00
LingMan	8376ecd842	rusticl: Use std::mem::offset_of!() Support for nested fields got stabilized with Rust 1.82. Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Reviewed-by: Karol Herbst <kherbst@redhat.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36526>	2025-08-03 10:16:21 +00:00
LingMan	0631b4fd7e	rusticl: Drop include paths for `size_of`, `size_of_val`, and `align_of` They have been added to the prelude with Rust 1.80. Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Reviewed-by: Karol Herbst <kherbst@redhat.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36526>	2025-08-03 10:16:21 +00:00
LingMan	d4a7811519	rusticl: Use `is_aligned` from std It got stabilized with Rust 1.79. Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Reviewed-by: Karol Herbst <kherbst@redhat.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36526>	2025-08-03 10:16:20 +00:00
LingMan	6c7084357d	mesa: Bump required Rust version to 1.82 Firefox ESR requires Rust 1.82 since version 140. Thus, this update is in line with our Rust update policy. Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Reviewed-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Eric Engestrom <eric@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36526>	2025-08-03 10:16:20 +00:00
LingMan	eda7043025	docs/rusticl: Update documented version requirements for meson and bindgen The requirements bump a few weeks ago forgot to update the docs. Fixes: `1a698c75ae` ("build: Rust: Bump minimum Meson and bindgen version") Reviewed-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Eric Engestrom <eric@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36526>	2025-08-03 10:16:20 +00:00
LingMan	b364732502	ci/rust: Drop date from Rust release channel selection For stable Rust, specifying the patch version already uniquely identifies a toolchain build. Specifying the date would only be required for nightly releases. Reviewed-by: Eric Engestrom <eric@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36526>	2025-08-03 10:16:19 +00:00
Job Noorman	b101aecb03	ir3: add shader bisect debug tool When debugging a problem in a trace, CTS test,... that is caused by a known compiler feature, the first step is usually to find which shader causes the problem. This is often non-trivial as the number of shaders in a trace can be huge. This commit adds a debugging tool to help with this. The idea behind this tool is to assign every shader a deterministic (pre-compilation) ID that can be used to order shaders. Once we have this, we can use it to bisect which shader causes the problem. This obviously only works if the problem can be traced back to a single shader. In my experience, this is often the case. This tool reuses the shader cache key as deterministic ID. It is concatenated with the variant ID to distinguish the different variants of a shader. In practice, bisecting the shaders in a test run works like this: - Gate the problematic compiler feature using ir3_shader_bisect_select; E.g., if (ir3_shader_bisect_select(v)) IR3_PASS(...); - Run test with IR3_SHADER_BISECT_DUMP_IDS_PATH=ids.txt - Sort ids.txt - Bisect the shader IDs using IR3_SHADER_BISECT_LO/IR3_SHADER_BISECT_HI. - Dump the problematic shader using IR3_SHADER_BISECT_DISASM. A Python script is provided to make all this easier: - ir3_shader_bisect.py dump-ids -o ids.txt 'test args' - ir3_shader_bisect.py bisect -i ids.txt 'test args' Signed-off-by: Job Noorman <jnoorman@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33602>	2025-08-03 09:30:49 +00:00
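The bisection step in the ir3 commit above can be modelled as an ordinary binary search over the sorted shader IDs. This is a hypothetical Python model, not the actual ir3_shader_bisect.py script: it assumes that setting IR3_SHADER_BISECT_LO/IR3_SHADER_BISECT_HI enables the gated feature only for shaders whose ID lies in [lo, hi], that exactly one shader is at fault, and that `test_passes` stands in for re-running the test with that range.

```python
from typing import Callable, List


def bisect_bad_shader(ids: List[int], test_passes: Callable[[int, int], bool]) -> int:
    """Return the single shader ID whose gated feature breaks the test.

    ids must be sorted. test_passes(lo, hi) runs the test with the feature
    enabled only for shaders whose ID is in [lo, hi] (assumed semantics).
    """
    lo, hi = 0, len(ids) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if test_passes(ids[lo], ids[mid]):
            lo = mid + 1  # lower half is clean: the culprit is above mid
        else:
            hi = mid      # the culprit is within ids[lo..mid]
    return ids[lo]


# Usage with a simulated test run where shader 0x5c is the culprit.
shader_ids = sorted([0x3A, 0x07, 0x91, 0x5C, 0x22, 0xE0])
culprit = 0x5C
print(hex(bisect_bad_shader(shader_ids, lambda lo, hi: not (lo <= culprit <= hi))))  # prints 0x5c
```

Each probe halves the candidate range, so even a trace with tens of thousands of shaders needs only a handful of test runs.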
Job Noorman	0a123ce68b	ir3: add pointer from ir3_shader_variant to ir3_shader Needed in the next commit to get the shader key for a variant. Signed-off-by: Job Noorman <jnoorman@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33602>	2025-08-03 09:30:49 +00:00
Job Noorman	d36594f7f0	ir3/ra: fix file start wraparound The initial wraparound was calculated in a way I do not fully understand. However, it could lead to not starting from register 0 when a wraparound occurs. This, in turn, could lead to some unnecessary gaps. Fix this by explicitly setting start to 0 when a wraparound occurs. Totals from 16452 (10.00% of 164575) affected shaders: Instrs: 16456187 -> 16449330 (-0.04%); split: -0.14%, +0.10% CodeSize: 32357818 -> 32345432 (-0.04%); split: -0.14%, +0.10% NOPs: 3411778 -> 3410810 (-0.03%); split: -0.43%, +0.40% MOVs: 603559 -> 603199 (-0.06%); split: -0.81%, +0.75% COVs: 262804 -> 262761 (-0.02%); split: -0.13%, +0.11% Full: 279264 -> 279179 (-0.03%); split: -0.04%, +0.01% (ss): 422887 -> 422739 (-0.03%); split: -0.81%, +0.77% (sy): 188298 -> 188513 (+0.11%); split: -0.53%, +0.65% (ss)-stall: 1685300 -> 1679865 (-0.32%); split: -0.99%, +0.67% (sy)-stall: 5797450 -> 5788564 (-0.15%); split: -0.74%, +0.58% STPs: 18359 -> 18341 (-0.10%); split: -0.14%, +0.04% LDPs: 32825 -> 32833 (+0.02%); split: -0.22%, +0.24% Preamble Instrs: 3307822 -> 3308388 (+0.02%); split: -0.31%, +0.33% Early Preamble: 5853 -> 5852 (-0.02%) Last helper: 4154632 -> 4164580 (+0.24%); split: -0.34%, +0.58% Cat0: 3760257 -> 3759249 (-0.03%); split: -0.39%, +0.36% Cat1: 968587 -> 963086 (-0.57%); split: -0.99%, +0.43% Cat2: 6133128 -> 6133532 (+0.01%); split: -0.03%, +0.03% Cat6: 183289 -> 183275 (-0.01%); split: -0.05%, +0.05% Cat7: 684028 -> 683290 (-0.11%); split: -0.35%, +0.25% Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36374>	2025-08-03 08:58:29 +00:00
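The register-file wraparound fix above can be illustrated with a toy free-range search. This is a hypothetical model (a list of per-register free flags), not ir3's actual allocator: when the scan runs past the end of the file, it restarts from register 0 rather than from some derived offset, so no low registers are skipped.

```python
from typing import List, Optional


def find_free_range(free: List[bool], size: int, start: int) -> Optional[int]:
    """Find `size` consecutive free registers, scanning from `start`.

    If the scan runs off the end of the file, it wraps around exactly once
    and restarts from register 0. Returns the first register of the range,
    or None if no range fits.
    """
    n = len(free)
    candidate = start
    wrapped = False
    while True:
        if candidate + size > n:
            if wrapped:
                return None
            candidate = 0  # explicit wraparound to register 0
            wrapped = True
            continue
        if all(free[candidate:candidate + size]):
            return candidate
        candidate += 1
```

For example, with free = [True, True, False, False, True, False], a two-register request starting at 3 finds nothing before the end of the file and wraps around to return 0.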
sarbes	0a12ff6f45	lima: move RSW packing/unpacking to genxml This MR removes most magic values of the affected code paths, and makes the code more readable. Parsing of the RSW words is now done by genxml. v2: - Renamed varying types - Removed unnecessary whitespaces Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com> Reviewed-by: Erico Nunes <nunes.erico@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36401>	2025-08-02 18:26:55 +00:00
Iván Briano	bf8ebb6a7d	intel: Re-disable ray tracing on 32 bits We had this disabled before moving to the common framework for BVH building and lost it along the way. Fixes: `f0e18c475b` ("intel: remove GRL/intel-clc") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36522>	2025-08-02 00:12:44 +00:00
Yiwei Zhang	83b9c13b6f	Revert "android: moving HMI symbol to separate file" This reverts commit `6c7f7e4953`. The original change wasn't properly reviewed and the rationale was obscure. Meanwhile, it was for gfxstream Android frontend which was not built in upstream mesa at all. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36532>	2025-08-01 23:44:49 +00:00
Paulo Zanoni	4c7254d105	zink: new expected failures for sparse depth buffers Anv removed support for sparse depth buffers, but some glcts tests try to use them without first asking if we support them. We'll have to fix this in the VK-GL-CTS codebase. In the meantime, keep Marge happy. Reviewed-by: Iván Briano <ivan.briano@intel.com> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35524>	2025-08-01 14:51:10 -07:00
Paulo Zanoni	ea9f19ac7b	anv/sparse: call sparse_image_check_support from get_image_format_properties Function anv_get_image_format_properties() can get called from two different Vulkan entry points: - anv_GetPhysicalDeviceImageFormatProperties2 - anv_GetPhysicalDeviceSparseImageFormatProperties2 While there is a sparse-named function aimed specifically at sparse images, you can call vkGetPhysicalDeviceImageFormatProperties2 passing sparse flags in VkPhysicalDeviceImageFormatInfo2::flags. And when that happens, we need to detect it and properly either return VK_ERROR_FORMAT_NOT_SUPPORTED or properly set props->imageFormatProperties->sampleCounts with a value that matches the sparse usage. This change affects our behavior in 3 types of cases: color MSAA cases, depth/stencil MSAA cases and atomic_emulated cases. The previous patches should have covered these cases, so everything should be passing now. v2: Rebase. v3: Reword the commit message. v4: Rebase and reword the commit message. Testcase: dEQP-VK.api.info.sparse_image_format_properties2.2d.optimal.r16g16_unorm Testcase: dEQP-VK.api.info.image_format_properties.2d.optimal.d16_unorm Testcase: dEQP-VK.api.info.image_format_properties.2d.optimal.r64_uint Reviewed-by: Iván Briano <ivan.briano@intel.com> (v1) Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35524>	2025-08-01 14:51:10 -07:00
Paulo Zanoni	a1628aba1f	anv/sparse: we can support R64 and other atomics emulated formats We set sparseImageInt64Atomics to false on these formats, so there's no need for the software detiling. Thus, we can leave the flag unset, which will make ISL pick Tile64 for these formats, and things will work. Thanks to Lionel for pointing out the fix here. Testcase: dEQP-VK.api.info.image_format_properties.d.optimal.r64_int Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Iván Briano <ivan.briano@intel.com> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35524>	2025-08-01 14:51:10 -07:00
Paulo Zanoni	d5da6980d3	anv/sparse: don't support depth/stencil with sparse We can't support multi-sampling with depth/stencil, only 1x and only with 2D and sometimes 3D formats. Claim everything as not supported, since games don't seem to be affected. This will be noticeable once we fix anv_GetPhysicalDeviceImageFormatProperties2() to stop (accidentally) lying about what we support: without this patch we'll get failures. It seems CTS expects that, if we do support the format, we have to support it with multi-sampling as well. Testcase: dEQP-VK.api.info.image_format_properties.2d.optimal.s8_uint (and 5 others) Reviewed-by: Iván Briano <ivan.briano@intel.com> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35524>	2025-08-01 14:51:10 -07:00
Paulo Zanoni	420cda4798	anv/sparse: allow multiple sample bits in anv_sparse_image_check_support Prepare this function in a way where the caller is able to pass multiple sample bits as the 'samples' argument, and add an output to the function where we return the subset of 'samples' that is actually valid, when it's valid. For now, neither of the two callers uses the new argument, but this will be changed in the next patch. Reviewed-by: Iván Briano <ivan.briano@intel.com> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35524>	2025-08-01 14:51:10 -07:00
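The new contract of the sample check in the commit above can be pictured as a bitmask intersection over VkSampleCountFlagBits. The function name, the returned tuple and the supported mask below are hypothetical; only the flag values come from the Vulkan headers.

```python
from typing import Tuple

# VkSampleCountFlagBits values from the Vulkan headers.
VK_SAMPLE_COUNT_1_BIT = 0x01
VK_SAMPLE_COUNT_2_BIT = 0x02
VK_SAMPLE_COUNT_4_BIT = 0x04
VK_SAMPLE_COUNT_8_BIT = 0x08
VK_SAMPLE_COUNT_16_BIT = 0x10


def check_sample_support(samples: int, supported: int) -> Tuple[bool, int]:
    """Hypothetical model: `samples` may contain several bits.

    Returns (is_supported, valid_samples), where valid_samples is the
    subset of the requested bits that the format/tiling can handle.
    """
    valid = samples & supported
    return valid != 0, valid


# Assume (hypothetically) that only 1x, 2x and 4x work for some format.
supported_mask = VK_SAMPLE_COUNT_1_BIT | VK_SAMPLE_COUNT_2_BIT | VK_SAMPLE_COUNT_4_BIT
requested = VK_SAMPLE_COUNT_1_BIT | VK_SAMPLE_COUNT_4_BIT | VK_SAMPLE_COUNT_16_BIT
ok, valid = check_sample_support(requested, supported_mask)
```

Returning the subset lets a caller such as a format-properties query report exactly the usable sample counts instead of a single yes/no answer.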
Paulo Zanoni	1797337efc	anv/sparse: declare sparse MSAA block shapes as standard before Xe2 Only Xe2 and newer contain non-standard block shapes for sparse MSAA images. Reviewed-by: Iván Briano <ivan.briano@intel.com> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36523>	2025-08-01 21:32:04 +00:00
Paulo Zanoni	c6f832e849	anv/sparse: don't claim Xe2's non-standard MSAA shapes as unsupported We already advertise residencyStandard2DMultisampleBlockShape to be false, there's no need to claim these as not supported. Reviewed-by: Iván Briano <ivan.briano@intel.com> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36523>	2025-08-01 21:32:04 +00:00
Alyssa Rosenzweig	aca4948997	clc: force exact! across libclc libclc seems to have piles of bugs where it relies on precise floating point behaviours to meet CL precision requirements but doesn't actually disable fast math in its own spir-v. I am tired of playing this whack-a-mole game. Let's just assume that the math in CLC is right and should not be optimized in unsafe ways, and force the exact bit across libclc. This works around a large class of libclc bugs that keep cropping up from innocuous NIR changes. This does not force the exact bit for application shaders using libclc, just for the calculations inside of libclc itself. This seems like the right tradeoff all considered, anything "fast" bypasses libclc anyway. Fixes generated_tests/cl/builtin/math/builtin-float-pow-1.0.generated.cl on drivers using nir_opt_reassociate, and probably other stuff. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Jesse Natalie <jenatali@microsoft.com> Reviewed-by: Karol Herbst <kherbst@redhat.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36527>	2025-08-01 21:00:47 +00:00
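As background on why reassociation (for example by nir_opt_reassociate) is unsafe for code that depends on exact floating-point behaviour, here is a generic example, not taken from libclc, showing that IEEE double-precision addition is not associative:

```python
a, b, c = 1e16, -1e16, 1.0

left = (a + b) + c   # 1e16 - 1e16 is exactly 0.0, then 0.0 + 1.0 = 1.0
right = a + (b + c)  # -1e16 + 1.0 rounds back to -1e16, so the sum is 0.0

print(left, right)   # prints: 1.0 0.0
```

Code that was written with one evaluation order in mind can therefore lose precision once an optimizer is allowed to regroup its operations, which is what the exact bit forbids.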
Georg Lehmann	cfd5fbfde1	nir/opt_algebraic: make fmin/fmax(a, #b) 16bit if only used by f2f16 Foz-DB Navi31: Totals from 11 out of 14 FSR4 shaders: Instrs: 58298 -> 58374 (+0.13%); split: -0.08%, +0.21% CodeSize: 397836 -> 398108 (+0.07%); split: -0.08%, +0.15% Latency: 209634 -> 211438 (+0.86%); split: -0.14%, +1.00% InvThroughput: 229152 -> 229314 (+0.07%); split: -0.03%, +0.10% VClause: 826 -> 847 (+2.54%); split: -0.36%, +2.91% Copies: 2954 -> 3040 (+2.91%); split: -1.56%, +4.47% VALU: 49637 -> 49711 (+0.15%); split: -0.06%, +0.21% VOPD: 1916 -> 1400 (-26.93%) These stats look bad, but it's actually just unlucky RA. Replacing 1 VOPD (two v_dual_max_f32) with 1 VOP3P (v_pk_max_f16) should still be a win from a register bandwidth perspective. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36468>	2025-08-01 20:29:30 +00:00
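The identity such a rewrite relies on is that rounding to half precision is monotonic, so it commutes with max and min: f2f16(fmax(a, b)) equals fmax(f2f16(a), f2f16(b)). A quick numeric check of that identity, using the standard `struct` 'e' (IEEE binary16) format and ignoring NaN inputs:

```python
import struct


def to_f16(x: float) -> float:
    """Round a Python float to the nearest IEEE binary16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def max_min_commute(a: float, b: float) -> bool:
    # Rounding is monotonic, so it commutes with max and min.
    return (to_f16(max(a, b)) == max(to_f16(a), to_f16(b))
            and to_f16(min(a, b)) == min(to_f16(a), to_f16(b)))


samples = [-3.14159, -1e-3, 0.0, 1e-8, 0.1, 0.5, 1.0001, 2.71828, 1000.3]
print(all(max_min_commute(a, b) for a in samples for b in samples))  # True
```

Because of this, the max can be done after narrowing, as a packed 16-bit operation, without changing the value the f2f16 user sees.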
Georg Lehmann	3168ebe2c5	nir/range_analysis: look through vec2 Foz-DB Navi31: Totals from 11 out of 14 FSR4 shaders: Instrs: 58987 -> 58298 (-1.17%) CodeSize: 402844 -> 397836 (-1.24%) Latency: 209630 -> 209634 (+0.00%); split: -0.66%, +0.66% InvThroughput: 230240 -> 229152 (-0.47%); split: -0.48%, +0.00% VClause: 838 -> 826 (-1.43%); split: -1.55%, +0.12% Copies: 3019 -> 2954 (-2.15%); split: -2.82%, +0.66% VALU: 50196 -> 49637 (-1.11%) VOPD: 1950 -> 1916 (-1.74%); split: +0.72%, -2.46% Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36468>	2025-08-01 20:29:29 +00:00
Georg Lehmann	caf89c97de	nir/range_analysis: look through f2f Foz-DB Navi31: Totals from 93 (0.12% of 80273) affected shaders: Instrs: 123927 -> 121073 (-2.30%); split: -2.30%, +0.00% CodeSize: 670832 -> 653332 (-2.61%); split: -2.61%, +0.00% Latency: 337678 -> 322803 (-4.41%); split: -4.41%, +0.00% InvThroughput: 63277 -> 61083 (-3.47%) VClause: 460 -> 373 (-18.91%) SClause: 2178 -> 2100 (-3.58%) Copies: 7637 -> 7744 (+1.40%) PreSGPRs: 4414 -> 4287 (-2.88%) PreVGPRs: 4229 -> 4230 (+0.02%) VALU: 77375 -> 75693 (-2.17%) SALU: 16497 -> 16383 (-0.69%); split: -0.73%, +0.04% VMEM: 561 -> 477 (-14.97%) SMEM: 3197 -> 3113 (-2.63%) Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36468>	2025-08-01 20:29:28 +00:00
Georg Lehmann	261239a492	nir/opt_algebraic: use range analysis to detect no-op fmin/fmax Foz-DB Navi31: Totals from 418 (0.52% of 80273) affected shaders: Instrs: 564550 -> 564387 (-0.03%); split: -0.04%, +0.01% CodeSize: 2983860 -> 2982684 (-0.04%); split: -0.05%, +0.01% Latency: 4387264 -> 4386397 (-0.02%); split: -0.02%, +0.00% InvThroughput: 717464 -> 716874 (-0.08%); split: -0.08%, +0.00% Copies: 40126 -> 40125 (-0.00%) VALU: 352128 -> 352003 (-0.04%); split: -0.04%, +0.01% SALU: 50290 -> 50283 (-0.01%) Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36468>	2025-08-01 20:29:28 +00:00
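A minimal sketch of the no-op detection in the commit above, assuming each value's range is known as a numeric interval (NIR's range analysis may represent ranges differently, for example as coarse sign classes): when every possible value of one operand is at least every possible value of the other, the fmax is a no-op. NaN handling is ignored here, and fmin is symmetric.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    lo: float
    hi: float


def simplify_fmax(a: Range, b: Range) -> str:
    """Return 'a' or 'b' if fmax(a, b) always equals that operand, else 'fmax'."""
    if a.lo >= b.hi:
        return "a"     # every possible a is >= every possible b
    if b.lo >= a.hi:
        return "b"
    return "fmax"      # ranges overlap: keep the instruction


# e.g. a = fsat(x) lies in [0, 1]; b = -1.0 is a constant
print(simplify_fmax(Range(0.0, 1.0), Range(-1.0, -1.0)))  # a
```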
Georg Lehmann	a0665e79e9	nir/opt_algebraic: push fsat into bcsel with constant bcsel doesn't have a free clamp modifier on AMD hardware, but what's inside might have free clamp. Foz-DB Navi31: Totals from 873 (1.09% of 80273) affected shaders: MaxWaves: 22008 -> 21968 (-0.18%) Instrs: 4624956 -> 4623950 (-0.02%); split: -0.04%, +0.02% CodeSize: 24152780 -> 24142884 (-0.04%); split: -0.05%, +0.01% VGPRs: 57900 -> 57960 (+0.10%) Latency: 28762622 -> 28749889 (-0.04%); split: -0.06%, +0.02% InvThroughput: 5320810 -> 5320145 (-0.01%); split: -0.02%, +0.00% VClause: 115879 -> 115929 (+0.04%); split: -0.10%, +0.14% SClause: 93058 -> 93059 (+0.00%); split: -0.01%, +0.02% Copies: 335674 -> 335845 (+0.05%); split: -0.05%, +0.10% PreSGPRs: 53819 -> 53843 (+0.04%); split: -0.01%, +0.05% PreVGPRs: 50908 -> 50939 (+0.06%); split: -0.02%, +0.08% VALU: 2816395 -> 2815514 (-0.03%); split: -0.04%, +0.01% SALU: 509988 -> 509987 (-0.00%); split: -0.02%, +0.02% Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36468>	2025-08-01 20:29:27 +00:00
Georg Lehmann	e9e5146848	nir/opt_algebraic: optimize fsat(fmax(a, b)) where b is not positive Foz-DB Navi31: Totals from 946 (1.18% of 80273) affected shaders: Instrs: 4986082 -> 4983988 (-0.04%); split: -0.04%, +0.00% CodeSize: 25998700 -> 25989796 (-0.03%); split: -0.04%, +0.00% Latency: 45514742 -> 45510330 (-0.01%); split: -0.01%, +0.00% InvThroughput: 8163529 -> 8162325 (-0.01%); split: -0.02%, +0.00% VClause: 112105 -> 112104 (-0.00%); split: -0.00%, +0.00% SClause: 109694 -> 109688 (-0.01%) Copies: 372356 -> 372284 (-0.02%); split: -0.03%, +0.01% Branches: 132636 -> 132633 (-0.00%) PreVGPRs: 58997 -> 58979 (-0.03%); split: -0.03%, +0.00% VALU: 3025662 -> 3024191 (-0.05%); split: -0.05%, +0.00% SALU: 551712 -> 551714 (+0.00%); split: -0.00%, +0.00% Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36468>	2025-08-01 20:29:27 +00:00
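The identity behind this rewrite, presumably fsat(fmax(a, b)) == fsat(a) whenever b <= 0, can be checked numerically: fmax(a, b) only differs from a when a < b <= 0, and fsat sends both of those values to 0. A small Python check, ignoring NaN and signed-zero subtleties:

```python
def fsat(x: float) -> float:
    """Clamp x to [0, 1]."""
    return min(max(x, 0.0), 1.0)


values = [-5.0, -1.0, -0.25, 0.0, 0.3, 1.0, 1.5, 7.0]
nonpositive = [v for v in values if v <= 0.0]

holds = all(fsat(max(a, b)) == fsat(a) for a in values for b in nonpositive)
print(holds)  # True
```

The condition on b matters: with a positive b such as 0.5, fsat(fmax(0.1, 0.5)) is 0.5 while fsat(0.1) is 0.1.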
Rob Clark	898fa317dd	util: Optimize MESA_TRACE_FUNC() Avoiding the vsnprintf speeds up drawoverhead -test 3 by 60+% !! Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36492>	2025-08-01 19:58:24 +00:00
Rob Clark	b833bb2df4	freedreno/registers: Fix DBGC_CFG_DBGBUS_SEL_D definition Offset is the same, but bitfields change between a6xx and a7xx. Syncing the change from https://patchwork.freedesktop.org/series/152200/ Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36426>	2025-08-01 19:33:28 +00:00
Rob Clark	a05b6e293c	freedreno/crashdec: Add option to export a snapshot Add support to convert into the "snapshot" format used by internal tooling. Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36426>	2025-08-01 19:33:28 +00:00
Rob Clark	08b9d771e3	freedreno/crashdec: Sanitize index-regs section names Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36426>	2025-08-01 19:33:28 +00:00
Rob Clark	d8840db682	freedreno/decode: Add enum value decoding Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36426>	2025-08-01 19:33:27 +00:00
Job Noorman	c8f9990733	ir3/legalize: prevent infinite loop when inserting (ss)nop We need to insert a (ss)nop when an instruction that doesn't support (ss) needs it. However, when this happens in a block that needs to be legalized more than once (e.g., because it is in a loop), the (ss)nop would be inserted every iteration, causing an infinite loop. Fix this by checking if the previous instruction is a nop and applying (ss) there. Signed-off-by: Job Noorman <jnoorman@igalia.com> Fixes: `5993723471` ("freedreno/a3xx/compiler: scheduling/legalize fixes") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36440>	2025-08-01 19:08:23 +00:00
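A toy model of the fixed-point problem described above, with hypothetical instruction dicts rather than real ir3 structures: a block inside a loop is legalized repeatedly until nothing changes. Always inserting a fresh (ss)nop means the pass never converges, while reusing a preceding nop lets it converge on the second pass.

```python
from typing import Dict, List, Optional

Instr = Dict[str, object]  # hypothetical instruction representation


def legalize_block(block: List[Instr], reuse_prev_nop: bool) -> bool:
    """One toy legalize pass; returns True if the block changed."""
    changed = False
    i = 0
    while i < len(block):
        instr = block[i]
        if instr.get("needs_ss") and not instr.get("can_encode_ss"):
            prev = block[i - 1] if i > 0 else None
            if reuse_prev_nop and prev is not None and prev["op"] == "nop":
                if not prev.get("ss"):
                    prev["ss"] = True  # put (ss) on the existing nop
                    changed = True
            else:
                block.insert(i, {"op": "nop", "ss": True})
                changed = True
                i += 1  # step over the nop we just inserted
        i += 1
    return changed


def legalize_until_stable(block: List[Instr], reuse_prev_nop: bool = True,
                          max_iters: int = 10) -> Optional[int]:
    """Repeat passes as a loop would; return the pass count, or None."""
    for n in range(1, max_iters + 1):
        if not legalize_block(block, reuse_prev_nop):
            return n
    return None
```

With reuse enabled, a block of [load, use] settles to [load, (ss)nop, use] after two passes; with reuse disabled, every pass adds another nop and the loop is only stopped by the iteration cap.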
Paulo Zanoni	257e1515e3	brw: null-tile sends don't need to skip L3 on Xe2 and newer Despite the information in "Overview of Memory Access" (57046), the L3 seems to be smarter on Xe2+. See `4aa3b2d3ad` ("anv: LNL+ doesn't need the special flush for sparse"). The behavior is the same both with vm_bind and TR-TT. v2: Add some comments (Caio). Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36150>	2025-08-01 18:47:37 +00:00
Paulo Zanoni	80f01c03ba	brw: remove unnecessary casts to unsigned after calling LSC_CACHE() The macro already casts the values to unsigned. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36150>	2025-08-01 18:47:37 +00:00
Paulo Zanoni	c845b30a21	brw: adjust comment pasted from a commit message The comment was pasted from the commit message that added it. Remove the parts that only make sense in the commit message, not in the final code. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36150>	2025-08-01 18:47:37 +00:00
Paulo Zanoni	4bb41156b9	brw: mark 'volatile' sends as uncached on LSC messages The residencyNonResidentStrict property requires that writes to unbound memory be ignored and reads return zero. We need this property, otherwise vkd3d will claim we don't support DX12. If a shader writes to a variable associated with an unbound memory region (i.e., mapped to a null tile), reads it back (in the same shader) and expects the value to be 0 instead of what was written, it has to use the 'volatile' access qualifier on the variable associated with the access, otherwise the compiler will be allowed to optimize things and use the non-zero value. This is explained in the "Accessing Unbound Regions" section of the Vulkan spec. Our hardware adds an extra problem on top of the above. BSpec page "Overview of Memory Access" (47630, 57046) says: "If a read from a Null tile gets a cache-hit in a virtually-addressed GPU cache, then the read may not return zeroes." So, when we detect this type of access, we have to turn off the caching. There's a proposed Vulkan CTS test that does exactly the above. No shaders on shader_db seem to be using 'volatile'. v2: - Reorder commit order - Rewrite commit message v3: - Rework the patch after Caio pointed out the interaction with 'coherent'. - Remove previous R-B tags due to the patch differences. v4: - Rework the patch and commit message again after further discussions. v5: - Check for atomic first so we don't regress DG2 atomic tests. Fixes future test: dEQP-VK.sparse_resources.buffer.ssbo.read_write.sparse_residency_non_resident_strict Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36150>	2025-08-01 18:47:37 +00:00
Paulo Zanoni	f7581e4a38	brw: consider 'volatile' memory access when doing CSE The GLSL spec says (among other things): "When a volatile variable is read, its value must be re-fetched from the underlying memory, even if the shader invocation performing the read had previously fetched its value from the same memory. When a volatile variable is written, its value must be written to the underlying memory, even if the compiler can conclusively determine that its value will be overwritten by a subsequent write." The SPIR-V spec says (among other things): "Accesses to volatile memory cannot be eliminated, duplicated, or combined with other accesses." So in this commit we make sure that both writes and reads marked as volatile can't be affected by CSE. v2: Reorder patches in the series. Credits-to: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> (v1) Reviewed-by: Iván Briano <ivan.briano@intel.com> (v1) Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36150>	2025-08-01 18:47:36 +00:00
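A toy block-local CSE showing the rule from the commit above: identical non-volatile operations are merged, volatile accesses never are. The tuple format is hypothetical, and stores conservatively invalidate every cached load, since a real pass must also respect memory dependencies.

```python
from typing import Dict, List, Tuple

# (dest, op, args, is_volatile): a hypothetical instruction format.
Instr = Tuple[str, str, Tuple[str, ...], bool]


def local_cse(instrs: List[Instr]) -> Tuple[List[Instr], Dict[str, str]]:
    """Merge repeated non-volatile ops; never touch volatile ones or stores."""
    seen: Dict[Tuple[str, Tuple[str, ...]], str] = {}
    rename: Dict[str, str] = {}
    out: List[Instr] = []
    for dest, op, args, is_volatile in instrs:
        args = tuple(rename.get(a, a) for a in args)
        if op == "store":
            # Conservatively forget cached loads: the store may alias them.
            seen = {k: v for k, v in seen.items() if k[0] != "load"}
        key = (op, args)
        if op != "store" and not is_volatile:
            if key in seen:
                rename[dest] = seen[key]  # reuse the earlier value
                continue
            seen[key] = dest
        out.append((dest, op, args, is_volatile))
    return out, rename


program = [
    ("v1", "load", ("p",), False),
    ("v2", "load", ("p",), False),  # same as v1: eliminated
    ("v3", "load", ("q",), True),
    ("v4", "load", ("q",), True),   # volatile: must be re-fetched
    ("v5", "add", ("v2", "v4"), False),
]
optimized, renames = local_cse(program)
```

Here the duplicate non-volatile load v2 is folded into v1, while both volatile loads of q survive, as the SPIR-V rule quoted in the commit requires.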
Paulo Zanoni	8e1e3ba152	brw: store 'volatile' GLSL/SPIR-V access in MEMORY_LOGICAL_FLAGS We seem to be ignoring the 'volatile' keyword coming from the shaders. Record this in MEMORY_LOGICAL_FLAGS so we can use it later. Credits-to: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Iván Briano <ivan.briano@intel.com> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36150>	2025-08-01 18:47:36 +00:00
Paulo Zanoni	670cd08c68	brw: remove unnecessary <vector> inclusions Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Iván Briano <ivan.briano@intel.com> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36150>	2025-08-01 18:47:35 +00:00
Jeongik Cha	3e39c09aa0	gfxstream: Generate goldfish dispatch code for AHB extension Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36510>	2025-08-01 18:34:15 +00:00
Daniel Schürmann	4ca3cc5a1a	aco/ra: propagate precolor affinities through parallelcopies and tied definitions Totals from 214 (0.27% of 79839) affected shaders: (Navi48) Instrs: 65339 -> 65311 (-0.04%); split: -0.05%, +0.00% CodeSize: 352616 -> 350952 (-0.47%); split: -0.55%, +0.07% VGPRs: 9984 -> 9960 (-0.24%) Latency: 207556 -> 207508 (-0.02%); split: -0.03%, +0.01% InvThroughput: 40422 -> 40397 (-0.06%) Copies: 3180 -> 3155 (-0.79%) VALU: 38347 -> 38322 (-0.07%) Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36345>	2025-08-01 17:15:54 +00:00

209614 commits