fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-24 04:30:10 +01:00

Author	SHA1	Message	Date
Jose Maria Casanova Crespo	0bcb82048c	v3dv: avoid TFU reading unmapped pages beyond the end of the buffers The TFU unit does a readahead of 64 bytes. This is causing invalid read MMU errors that can be observed at the nightly full Vulkan runs on Broadcom devices. 04:13:59.969: [ 85.623205] v3d 1002000000.v3d: MMU error from client TLB (3) at 0x4869000, pte invalid 04:14:05.408: [ 91.019321] v3d 1002000000.v3d: MMU error from client TLB (3) at 0x5209000, pte invalid 04:14:05.413: [ 91.031662] v3d 1002000000.v3d: MMU error from client TLB (3) at 0x7521000, pte invalid Although the log reports the TLB, the real culprit is the TFU. A fix to the kernel was submitted to fix AXI ID on V3D 4.2 and 7.1. So doing an over-allocation of 64 bytes at v3dv_AllocateMemory is the simplest method to make these MMU errors disappear. Running ./deqp-vk for an hour, we can see that ~40% of allocations would need an extra page (4096 bytes) to accommodate this 64-byte padding. Fixes: `ca330f7f04` ("v3dv: implement VK_EXT_memory_budget") Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34475>	2025-04-15 00:17:11 +02:00
Caio Oliveira	fafdd24285	intel/executor: Update bfloat example Elaborate on the packed/unpack restrictions, use ADD(x, 0.0f) as a workaround for F->BF conversion. Reviewed-by: Rohan Garg <rohan.garg@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34506>	2025-04-14 18:23:43 +00:00
Caio Oliveira	fbe5d559bd	brw: Update EU validation to allow packed BF mixed with packed F Reviewed-by: Rohan Garg <rohan.garg@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34506>	2025-04-14 18:23:43 +00:00
Caio Oliveira	d1dd088ede	brw: Allow DPAS with BF on Gfx125 MTL doesn't support, but both ACM and ARL-H do. Fixes: `e384ccde28` ("brw: Expand EU validation for DPAS") Reviewed-by: Rohan Garg <rohan.garg@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34506>	2025-04-14 18:23:43 +00:00
Caio Oliveira	050acb9def	intel: Disable has_bfloat16 for MTL Not supported. Some operations do work, but proper support was removed since it also doesn't support DPAS. Fixes: `9916cc1050` ("brw: Add BRW_TYPE_BF for bfloat16") Reviewed-by: Rohan Garg <rohan.garg@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34506>	2025-04-14 18:23:43 +00:00
Caio Oliveira	adfab666a4	intel: Add intel_device_info::has_systolic Gfx125+ has systolic, with the exception of MTL and some ARL variants. Update code and tests to use it. Reviewed-by: Rohan Garg <rohan.garg@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34506>	2025-04-14 18:23:43 +00:00
Mike Blumenkrantz	bf5273dd38	ci: update VVL to current week Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33651>	2025-04-14 17:51:05 +00:00
Mike Blumenkrantz	0b7611824a	zink: use implicit stride in ntv for temp vars APPARENTLY explicit stride is illegal for temp vars because they should just be using the element stride implicitly this makes total sense and is very obvious Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33651>	2025-04-14 17:51:05 +00:00
Mike Blumenkrantz	b4e3535650	zink: stop setting ArrayStride on image arrays this is illegal cc: mesa-stable Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33651>	2025-04-14 17:51:05 +00:00
Mike Blumenkrantz	1c0de360bc	zink: don't set shared block stride without KHR_workgroup_memory_explicit_layout this is illegal cc: mesa-stable Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33651>	2025-04-14 17:51:05 +00:00
Connor Abbott	74531094cb	ir3: Vectorize shared memory loads/stores This drastically helps a Path of Exile 2 compute dispatch, going from 4.6ms to 2.7ms. Totals from 969 (0.59% of 164134) affected shaders: MaxWaves: 9586 -> 9560 (-0.27%); split: +0.02%, -0.29% Instrs: 1252433 -> 1234724 (-1.41%); split: -1.47%, +0.05% CodeSize: 2237424 -> 2195238 (-1.89%); split: -1.91%, +0.03% NOPs: 362213 -> 360913 (-0.36%); split: -0.92%, +0.56% MOVs: 58879 -> 59591 (+1.21%); split: -0.62%, +1.83% Full: 15817 -> 15867 (+0.32%); split: -0.04%, +0.36% (ss): 35671 -> 35434 (-0.66%); split: -1.80%, +1.14% (sy): 23953 -> 23964 (+0.05%); split: -0.38%, +0.43% (ss)-stall: 127807 -> 124930 (-2.25%); split: -3.43%, +1.18% (sy)-stall: 583947 -> 585886 (+0.33%); split: -0.61%, +0.94% Early-preamble: 317 -> 316 (-0.32%) Cat0: 394577 -> 393316 (-0.32%); split: -0.85%, +0.53% Cat1: 100335 -> 101057 (+0.72%); split: -0.36%, +1.08% Cat2: 415880 -> 415835 (-0.01%); split: -0.05%, +0.04% Cat3: 187928 -> 187929 (+0.00%); split: -0.00%, +0.00% Cat5: 19143 -> 19148 (+0.03%) Cat6: 69630 -> 52523 (-24.57%) Cat7: 47160 -> 47136 (-0.05%); split: -0.56%, +0.51% Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34441>	2025-04-14 17:22:47 +00:00
Connor Abbott	9977c4d682	ir3: Move load/store vectorization to finalize Some frontends such as rusticl and turnip call the optimization loop before choosing the shared memory layout, in order to be able to delete variables that turn out to be unused. This means that we can't vectorize them until after the first run of the optimization loop. Other drivers also seem to do something similar. This also has the benefit that by delaying vectorization of UBOs until after they are lowered from derefs, we don't insert casts which remove the ability of nir_lower_explicit_io to insert a range, which was blocking the pushing of vectorized indirect UBO loads. This has a significant positive impact on fossil-db: Only doing vectorization later exposes a bug where vectorization could change the bitsize after we used it to determine which descriptor to use. It happened to work before because vectorization was usually done early. To fix it, move adjusting the descriptor to a new pass that happens after finalizing. Totals: MaxWaves: 2249140 -> 2281068 (+1.42%); split: +1.43%, -0.01% Instrs: 49624230 -> 49143117 (-0.97%); split: -1.14%, +0.17% CodeSize: 103796862 -> 104143744 (+0.33%); split: -0.98%, +1.31% NOPs: 8489860 -> 8512218 (+0.26%); split: -1.55%, +1.81% MOVs: 1531650 -> 1574911 (+2.82%); split: -1.37%, +4.20% Full: 1814334 -> 1748906 (-3.61%); split: -3.64%, +0.03% (ss): 1155395 -> 1128249 (-2.35%); split: -3.48%, +1.13% (sy): 608650 -> 567972 (-6.68%); split: -7.32%, +0.64% (ss)-stall: 4352550 -> 4340473 (-0.28%); split: -2.08%, +1.80% (sy)-stall: 17852259 -> 16943647 (-5.09%); split: -6.25%, +1.16% STPs: 24568 -> 24215 (-1.44%) LDPs: 37799 -> 37468 (-0.88%) Early-preamble: 115698 -> 113694 (-1.73%); split: +0.17%, -1.90% Cat0: 9345228 -> 9367782 (+0.24%); split: -1.41%, +1.65% Cat1: 2445265 -> 2549122 (+4.25%); split: -0.81%, +5.06% Cat2: 18704736 -> 18377519 (-1.75%); split: -1.76%, +0.01% Cat3: 14210303 -> 14130558 (-0.56%); split: -0.56%, +0.00% Cat4: 1346895 -> 1346462 (-0.03%); split: -0.03%, +0.00% Cat5: 1420418 -> 1420417 (-0.00%); split: -0.07%, +0.07% Cat6: 745590 -> 549358 (-26.32%); split: -26.66%, +0.34% Cat7: 1405795 -> 1401899 (-0.28%); split: -0.96%, +0.68% Totals from 79089 (48.19% of 164134) affected shaders: MaxWaves: 947648 -> 979576 (+3.37%); split: +3.40%, -0.03% Instrs: 38664140 -> 38183027 (-1.24%); split: -1.47%, +0.22% CodeSize: 80179110 -> 80525992 (+0.43%); split: -1.27%, +1.70% NOPs: 6880907 -> 6903265 (+0.32%); split: -1.91%, +2.23% MOVs: 1183855 -> 1227116 (+3.65%); split: -1.78%, +5.43% Full: 1107056 -> 1041628 (-5.91%); split: -5.96%, +0.05% (ss): 939342 -> 912196 (-2.89%); split: -4.28%, +1.39% (sy): 457959 -> 417281 (-8.88%); split: -9.73%, +0.85% (ss)-stall: 3664495 -> 3652418 (-0.33%); split: -2.47%, +2.14% (sy)-stall: 12266805 -> 11358193 (-7.41%); split: -9.10%, +1.69% STPs: 7494 -> 7141 (-4.71%) LDPs: 7050 -> 6719 (-4.70%) Early-preamble: 46339 -> 44335 (-4.32%); split: +0.43%, -4.75% Cat0: 7548630 -> 7571184 (+0.30%); split: -1.75%, +2.05% Cat1: 1823872 -> 1927729 (+5.69%); split: -1.09%, +6.78% Cat2: 14767716 -> 14440499 (-2.22%); split: -2.22%, +0.01% Cat3: 10630582 -> 10550837 (-0.75%); split: -0.75%, +0.00% Cat4: 1150090 -> 1149657 (-0.04%); split: -0.04%, +0.00% Cat5: 1068913 -> 1068912 (-0.00%); split: -0.09%, +0.09% Cat6: 554910 -> 358678 (-35.36%); split: -35.82%, +0.45% Cat7: 1119427 -> 1115531 (-0.35%); split: -1.20%, +0.86% Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34441>	2025-04-14 17:22:46 +00:00
Connor Abbott	2f93137308	nir/opt_preamble: Handle load_global_ir3 fossil-db results with turnip: Totals from 994 (0.60% of 165023) affected shaders: MaxWaves: 10720 -> 11528 (+7.54%); split: +7.57%, -0.04% Instrs: 1032004 -> 972314 (-5.78%); split: -5.99%, +0.21% CodeSize: 1847536 -> 1942472 (+5.14%); split: -0.11%, +5.25% NOPs: 261089 -> 233279 (-10.65%); split: -10.89%, +0.23% MOVs: 57217 -> 51434 (-10.11%); split: -14.11%, +4.00% Full: 16412 -> 14647 (-10.75%); split: -10.96%, +0.21% (ss): 23330 -> 25594 (+9.70%); split: -5.51%, +15.21% (sy): 17803 -> 15711 (-11.75%); split: -11.93%, +0.18% (ss)-stall: 96387 -> 107976 (+12.02%); split: -5.14%, +17.17% (sy)-stall: 952952 -> 765754 (-19.64%); split: -19.84%, +0.19% STPs: 494 -> 327 (-33.81%) LDPs: 1447 -> 1163 (-19.63%) Early-preamble: 668 -> 22 (-96.71%) Cat0: 280935 -> 251779 (-10.38%); split: -10.60%, +0.22% Cat1: 93400 -> 84766 (-9.24%); split: -11.79%, +2.55% Cat2: 343880 -> 337270 (-1.92%); split: -3.20%, +1.28% Cat3: 189311 -> 180918 (-4.43%) Cat4: 21008 -> 19920 (-5.18%) Cat5: 17788 -> 17783 (-0.03%) Cat6: 45786 -> 39531 (-13.66%) Cat7: 39896 -> 40347 (+1.13%); split: -0.43%, +1.56% Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34483>	2025-04-14 16:53:34 +00:00
Connor Abbott	ec780eb0e7	ir3: Pass through access flags when lowering global accesses This will let us do optimizations such as moving loads to a preamble. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34483>	2025-04-14 16:53:34 +00:00
Boris Brezillon	b7ff9dddd4	pan/earlyzs: Fix the read-only ZS optimization Read-only ZS optimization can only happen if the ZS tile buffer is not written, which can only be known when the fixed-function settings are set. Change pan_earlyzs_get() to take an enum instead of a boolean and differentiate ZS-read and ZS-read-with-readonly-optimization-allowed. Fixes: 25a993731087 ("pan/earlyzs: Support the shader ZS read-only case and its optimization on v10+") Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34480>	2025-04-14 15:20:06 +00:00
Eric R. Smith	69a6db4b2b	panfrost: fix transaction elimination crc valid calculation The setting of the clean_pixel_write_enable flag in pan_prepare_rt was not consistent with the crc valid calculations in pan_emit_fbd. This caused the crc_valid flag to not be accurate, causing transaction elimination to fail. Fixes: `eac8f1d460` ("Revert "panfrost: Disable CRC by default"") Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34408>	2025-04-14 14:56:35 +00:00
Adam Jackson	c4b305079d	meson: Simplify the power8 optimization logic If it compiles, it works. And there's not a particularly good reason to disable it, so don't let people disable it. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Acked-by: Dylan Baker <dylan.c.baker@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34239>	2025-04-14 14:12:30 +00:00
Maíra Canal	3122df666e	broadcom/simulator: Fix Indirect CSD jobs for V3D 7.1.6+ Signed-off-by: Maíra Canal <mcanal@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34465>	2025-04-14 12:13:30 +00:00
Maíra Canal	d3ad4e3465	broadcom/simulator: Expose V3D revision number in the simulator interface Signed-off-by: Maíra Canal <mcanal@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34465>	2025-04-14 12:13:30 +00:00
Erik Faye-Lund	1d5da22dfd	nir/lower_tex: avoid undefined-behavior When texture_index and sampler_index are over 32, we can't really check for them in a single 32-bit word. This happens among other things when Panfrost uses preload shaders on v9 and later. Otherwise, we trigger undefined behavior. We're already doing this for textures in one case, let's be consistent. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Eric R. Smith <eric.smith@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34365>	2025-04-14 11:22:43 +00:00
Erik Faye-Lund	41b136f674	nir/lower_tex: use texture_mask instead of shifting on use In commit `292ac71a4a` ("nir/lower_tex: handle deref casts"), we avoided using texture_index when a texture instruction contained a variable deref. There's no good reason why this should be done to some of the lowering, but not all. So let's fix up code-paths that were added after this change to do the same. The first two patches here crossed paths with the commit that introduced texture_mask, so it's not strange that the change was missed. The last one seems to have just copied what was done around it, propagating the issue. Fixes: `880b00dc59` ("nir/lower_tex: Add support for lowering YUYV formats") Fixes: `1358d93650` ("nir/lower_tex: Add support for lowering Y41x formats") Fixes: `65d6f5aed2` ("nir: add options to lower y_vu, yv_yu, yx_xvxu and xy_vxux") Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34365>	2025-04-14 11:22:43 +00:00
Vignesh Raman	8e069e1ef9	ci: Uprev kernel to 6.14 Move to 6.14 for all mesa-ci jobs using gfx-ci/linux, except anv-jsl and Raven. Signed-off-by: Vignesh Raman <vignesh.raman@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34401>	2025-04-14 10:53:50 +00:00
Philipp Zabel	39855a8fd1	teflon: Log (un)supported operations Log all operations with the information used to decide whether they are supported or unsupported. Include tensor data types, conv2d fused activation and dilation parameters in the debug output. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34472>	2025-04-14 10:33:38 +00:00
Philipp Zabel	f23b376e84	etnaviv/ml: Fix padding input/output tensor zero points For tensors that were converted from signed 8-bit tensors to unsigned 8-bit tensors with offset zero point, use the offset zero point also for the TP pad operation. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34474>	2025-04-14 09:16:29 +00:00
Philipp Zabel	13a120d13c	etnaviv/ml: Drop duplicated function reorder_for_hw_depthwise() This function is unused, remove it. An identical copy is found (and used) in etnaviv_ml_nn.c. Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34471>	2025-04-14 08:59:15 +00:00
Samuel Pitoiset	8ea46b14fa	ci: update VKCTS main to 76c1572eaba42d7ddd9bb8eb5788e52dd932068e RADV is the only driver using VKCTS main. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34299>	2025-04-14 08:24:14 +00:00
Samuel Pitoiset	410f7f9f6e	radv: only enable DCC for invisible VRAM on GFX12 DCC should only be allowed on invisible VRAM, otherwise the CPU could read the data and it will read garbage if it's compressed. This also caused GPU hangs after suspend/resume probably because some buffers were compressed when moved back from GTT to VRAM. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12962 Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12922 Fixes: `9af11bf306` ("radv: add initial DCC support on GFX12") Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34347>	2025-04-14 07:39:33 +00:00
Samuel Pitoiset	75be860eec	radv: use paired context regs when optimal on GFX12 CP is very slow on GFX12 and parsing the packet header is the main bottleneck. Using paired context regs reduces the number of packet headers, so it should be more optimal. It doesn't seem worth it when only one context reg is emitted (one packet header and the same number of DWORDS) or when consecutive context regs are emitted (it would increase the number of DWORDS). Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34421>	2025-04-14 06:18:13 +00:00
Samuel Pitoiset	f92f50c58a	radv: add macros for paired context registers on GFX12 Imported from RadeonSI. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34421>	2025-04-14 06:18:13 +00:00
Job Noorman	35ec960f6f	ir3: run cp after ir3_imm_const_to_preamble Now that ir3_cp has an option to not lower immediates to const registers, we can use it after ir3_imm_const_to_preamble instead of manually propagating immediates. This fixes a lot of missed opportunities for early-preamble as we didn't propagate the mova1 immediate, which caused a GPR to be used in many preambles. Totals: Instrs: 49704517 -> 49703700 (-0.00%); split: -0.16%, +0.16% CodeSize: 103917968 -> 103187072 (-0.70%); split: -0.82%, +0.11% NOPs: 8516944 -> 8511764 (-0.06%); split: -0.78%, +0.72% MOVs: 1534023 -> 1536385 (+0.15%); split: -1.12%, +1.27% Full: 1816517 -> 1816548 (+0.00%); split: -0.05%, +0.06% (ss): 1162108 -> 1161490 (-0.05%); split: -1.03%, +0.98% (sy): 611398 -> 610311 (-0.18%); split: -0.80%, +0.62% (ss)-stall: 4384529 -> 4388096 (+0.08%); split: -1.22%, +1.30% (sy)-stall: 17858701 -> 17837101 (-0.12%); split: -0.87%, +0.74% STPs: 25096 -> 25491 (+1.57%); split: -0.05%, +1.63% LDPs: 37635 -> 38030 (+1.05%); split: -0.03%, +1.08% Preamble Instrs: 12589113 -> 11391946 (-9.51%); split: -9.75%, +0.24% Early Preamble: 115946 -> 122893 (+5.99%); split: +6.05%, -0.06% Cat0: 9374513 -> 9370393 (-0.04%); split: -0.71%, +0.67% Cat1: 2443348 -> 2446546 (+0.13%); split: -0.82%, +0.95% Cat2: 18731502 -> 18731478 (-0.00%); split: -0.00%, +0.00% Cat7: 1410092 -> 1410221 (+0.01%); split: -0.61%, +0.62% Totals from 39189 (23.81% of 164575) affected shaders: Instrs: 30656115 -> 30655298 (-0.00%); split: -0.26%, +0.26% CodeSize: 61714230 -> 60983334 (-1.18%); split: -1.37%, +0.19% NOPs: 6074700 -> 6069520 (-0.09%); split: -1.10%, +1.01% MOVs: 1010392 -> 1012754 (+0.23%); split: -1.70%, +1.93% Full: 617108 -> 617139 (+0.01%); split: -0.16%, +0.16% (ss): 778842 -> 778224 (-0.08%); split: -1.54%, +1.46% (sy): 362803 -> 361716 (-0.30%); split: -1.35%, +1.05% (ss)-stall: 3203827 -> 3207394 (+0.11%); split: -1.67%, +1.78% (sy)-stall: 9507680 -> 9486080 (-0.23%); split: -1.63%, +1.40% STPs: 23004 -> 23399 (+1.72%); split: -0.06%, +1.77% LDPs: 33942 -> 34337 (+1.16%); split: -0.04%, +1.20% Preamble Instrs: 8090918 -> 6893751 (-14.80%); split: -15.18%, +0.38% Early Preamble: 12246 -> 19193 (+56.73%); split: +57.25%, -0.52% Cat0: 6656706 -> 6652586 (-0.06%); split: -1.00%, +0.94% Cat1: 1546399 -> 1549597 (+0.21%); split: -1.30%, +1.50% Cat2: 11642214 -> 11642190 (-0.00%); split: -0.00%, +0.00% Cat7: 943911 -> 944040 (+0.01%); split: -0.91%, +0.92% Signed-off-by: Job Noorman <jnoorman@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34397>	2025-04-14 04:37:28 +00:00
Job Noorman	226ec669d8	ir3/cp: ignore alias sources for sam.s2en ir3_cp asserts that the first source of a sam.s2en is a collect which isn't necessarily true after creating alias registers. Signed-off-by: Job Noorman <jnoorman@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34397>	2025-04-14 04:37:28 +00:00
Job Noorman	1618c2495b	ir3/cp: add option to disable immediate to const lowering This will allow it to be used after ir3_imm_const_to_preamble so that we don't have to do the propagation of immediates manually there. Signed-off-by: Job Noorman <jnoorman@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34397>	2025-04-14 04:37:27 +00:00
Job Noorman	6546a40225	ir3: remove spaces in shader stats The shaderdb scripts don't like them. Signed-off-by: Job Noorman <jnoorman@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34397>	2025-04-14 04:37:27 +00:00
Trigger Huang	1e709dbea3	radeonsi: Change program sequence for perf counters Based on the sample usage described in https://registry.khronos.org/OpenGL/extensions/AMD/AMD_performance_monitor.txt , the value read from SQ_0004 is always 0, while other counters can be read successfully. This patch will sync the program sequence with the following link https://github.com/GPUOpen-Drivers/AMDVLK/releases/tag/v-2023.Q3.3 With it, SQ_0004 and also other counters can be read successfully. Signed-off-by: Trigger Huang <Trigger.Huang@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34360>	2025-04-14 10:23:46 +08:00
Karol Herbst	fc7badeac0	zink: don't apply the map_offset when mapping a staging resource in zink_buffer_map Fixes regressions in the OpenCL CTS allocation tests. Fixes: `5d46e2bf3c` ("zink: implement unsynchronized staging uploads for buffers") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34494>	2025-04-12 17:42:53 +00:00
Faith Ekstrand	fadac25b0c	nil: Multiply by array_stride_B instead of adding Fixes: `5577128c83` ("nil: Rewrite the TIC code in Rust") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34495>	2025-04-12 17:04:40 +00:00
Faith Ekstrand	5c81b3546f	nvk/nvkmd: Check the correct flag for the Kepler GART workaround Fixes: `1db57bb414` ("nvk/nvkmd: Rework memory placement flags") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34495>	2025-04-12 17:04:40 +00:00
Konstantin Seurer	985f5e0875	lavapipe: Do not emit aabb handling if no isec shader is used Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34003>	2025-04-12 17:22:50 +02:00
Konstantin Seurer	7113620625	lavapipe: pre-load tmax tmax is lowered to scratch with ray tracing pipelines. Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34003>	2025-04-12 17:22:44 +02:00
Konstantin Seurer	c1a620ae19	lavapipe: Run nir optimizations on ray tracing pipelines Improves performance by 10%. Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34003>	2025-04-12 17:22:37 +02:00
Konstantin Seurer	cdb2e3d2b5	lavapipe: Prefetch 56 bytes of node data during ray traversal Almost all node types need around 56 bytes of data. This patch fetches this data in a less divergent block. Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34003>	2025-04-12 17:22:27 +02:00
Konstantin Seurer	676e26aed5	radv: Fix rayTracingPositionFetch with multiple geometries The fix adds more indirections to avoid increasing register pressure by tracking the primitive address. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34460>	2025-04-11 22:26:08 +00:00
Aleksi Sapon	77eb58baad	draw: fix gl_PrimitiveID in tessellation Acked-by: Konstantin Seurer <konstantin.seurer@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33415>	2025-04-11 22:01:05 +00:00
Konstantin Seurer	cb31b5a958	clc,libcl: Clean up CL includes This patch does a couple of things to make CL integration with drivers as seamless as possible: - We pull in opencl-c.h and opencl-c-base.h to stop relying on system headers. - Parts of libcl.h are moved to new headers that are incomplete CL-safe variants of libc headers. - A couple of util headers are changed to remove now unnecessary __OPENCL_VERSION__ guards and make more headers CL safe. - Drivers now include src/compiler/libcl and use headers like macros.h,u_math.h instead of libcl.h. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33576>	2025-04-11 21:27:37 +00:00
Konstantin Seurer	a80fab3e87	clc: Allow bitfields Bitfields are not officially supported by OpenCL, but there is a clang extension that adds support. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33576>	2025-04-11 21:27:37 +00:00
Konstantin Seurer	ed07aab147	clc: Print errors when initializing clang fails It's nice to know what actually went wrong. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33576>	2025-04-11 21:27:37 +00:00
Dmitry Baryshkov	b9c6afd3a7	meson: disable SIMD blake optimisations on x32 host On X.org startup libgallium crashes on x32 hosts inside blake3_hash_many_sse41(), most likely because of the different pointer size. Disable SIMD blake implementation if x32 is detected. Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34453>	2025-04-11 20:57:38 +00:00
Kenneth Graunke	eb1ec9cf8e	brw: Don't assert about MAX_VGRF_SIZE in brw_opt_split_virtual_grfs() This allows us to create temporary VGRFs that are larger than MAX_VGRF_SIZE(devinfo), which will be split eventually. They may not be split on the initial pass, because we may need LOAD_PAYLOAD lowering, copy propagation, and so on to occur first. So we allow registers to exceed that size initially. The "Register allocation relies on split_virtual_grfs()" assertion in brw_reg_allocate.cpp still asserts that all VGRFs which reach the register allocator have been properly split. One case where this is useful is for vectorizing convergent block loads. We create temporaries to splat the SIMD1 values out to SIMD(N), which can lead to some very large temporaries. However, copy propagation and so on ultimately eliminate these and they'll get split down to proper sizes or elided entirely in the end. (Note: both this and the prior commits from this merge request are needed to close the linked issue.) Cc: mesa-stable Reviewed-by: Matt Turner <mattst88@gmail.com> Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12324 Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34461>	2025-04-11 20:34:51 +00:00
Kenneth Graunke	a45583f078	brw: Use live->max_vgrf_size in pre-RA scheduling Post-RA scheduling doesn't use liveness analysis, so we continue using MAX_VGRF_SIZE(devinfo). But for pre-RA scheduling, we now use live->max_vgrf_size. This helps get us to a place where we can emit arbitrarily large VGRFs early on in compilation, but which will be split and cleaned up prior to register allocation. It may also allocate smaller arrays in practice since MAX_VGRF_SIZE(devinfo) assumes the worst case scenario for things we actually could need to allocate. Cc: mesa-stable Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34461>	2025-04-11 20:34:51 +00:00
Kenneth Graunke	4b27b5895c	brw: Use live->max_vgrf_size in register coalescing We already require liveness, so just use the actual maximum size we saw instead of a hardcoded pessimal size. Cc: mesa-stable Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34461>	2025-04-11 20:34:51 +00:00


204373 commits