The spec says
VUID-VkQueryPoolCreateInfo-queryCount-02763
queryCount must be greater than 0
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32697>
The spec says
After query pool creation, each query is in an uninitialized state and
must be reset before it is used.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32697>
Format re-interpretation is no longer a problem with texture views. The
clear color address now points to a clear color that is in the expected
format.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31374>
On gfx12+, the pre-amble and post-amble flushes contain the stalls
necessary to ensure the prior operation is complete. Remove the extra
uses of ANV_PIPE_END_OF_PIPE_SYNC_BIT in post-amble flushes. Also do
this for the pre-amble flushes, though that has no impact, as the
flush application function implicitly adds the bit anyway.
For A750, this improves the TWWH3 trace in the performance CI by 0.52%
(n=2).
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31600>
On gfx12+, the post-amble flushes contain the stalls necessary to ensure
the prior operation is complete. Remove the extra uses
of iris_emit_end_of_pipe_sync().
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31600>
Each reg may store a list of conflicting regs. This was handled with a
util_dynarray, but each of those holds an extra pointer to the ra_regs
(which serves as its mem_ctx). Since the usage here is very simple,
handle the array growth manually.
The initial size remains the same as before.
The mem_ctx of each ra_reg was being used to identify the case
in which the list wasn't used. Change to use a bool in the
ra_regs struct instead.
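As a rough illustration of the manual-growth scheme (field names are
illustrative, not the actual ra_regs/ra_reg layout):

    #include <stdlib.h>

    /* Illustrative sketch only. */
    struct conflict_list {
       unsigned *regs;    /* indices of conflicting regs */
       unsigned count;
       unsigned cap;
    };

    static void add_conflict(struct conflict_list *c, unsigned reg)
    {
       if (c->count == c->cap) {
          c->cap = c->cap ? c->cap * 2 : 16;  /* initial size chosen for illustration */
          c->regs = realloc(c->regs, c->cap * sizeof(*c->regs));
       }
       c->regs[c->count++] = reg;
    }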
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25744>
For Intel, looking at a few fossils, the majority of nodes
have more than 32 entries in the list. I'd expect other backends
to have similar numbers.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25744>
Each node stores a list of adjacent nodes. This was handled with a
util_dynarray, but each of those holds an extra pointer to the ra_graph
(which serves as its mem_ctx). Since the usage here is very simple,
handle the array growth manually.
For now, keep using the same initial size the dynarray was using.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25744>
Create a parallel array to hold them. In particular, `spill_cost` is
used at a completely different time than the main node data.
Reduces the `struct ra_node` size to 40 bytes.
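Roughly, the layout change looks like this (illustrative names only, not
the real ra structures):

    /* Illustrative sketch only: hot per-node data stays in one array, the
     * rarely-used spill data moves to a parallel array indexed by the same
     * node number.
     */
    struct node     { unsigned reg_class; unsigned adj_count; };
    struct node_aux { float spill_cost; };

    struct graph {
       struct node     *nodes; /* nodes[i] and aux[i] describe node i */
       struct node_aux *aux;
       unsigned count;
    };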
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25744>
Fast-clears require expensive flushes beforehand and afterwards. The
cost of these flushes is reduced in a series of back-to-back fast-clears,
as no extra fast-clear flushes are required between them. If the ratio
of a command buffer's recorded back-to-back fast-clears to independent
fast-clears falls below 1/2, prevent that command buffer from recording
any further fast-clears.
Averaging two runs of our Factorio trace on an A750 shows a +14.37%
improvement in FPS.
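The heuristic boils down to something like this (counter names are
illustrative, not the actual anv fields):

    #include <stdbool.h>

    /* Illustrative sketch only. */
    static bool allow_more_fast_clears(unsigned back_to_back, unsigned independent)
    {
       /* Once back-to-back / independent drops below 1/2, the flush overhead
        * outweighs the benefit, so stop recording fast-clears in this
        * command buffer.
        */
       return back_to_back * 2 >= independent;
    }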
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32984>
RBBM_PRIMCTR registers are used for different pipeline statistics that can
be queried, but current usage was wrong in some cases. Comments in the
register file are updated, and the per-statistic register index mapping is
updated accordingly.
Fixes on a750:
test_query_pipeline_statistics in vkd3d-proton
arb_query_buffer_object failures in piglit (zink)
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32900>
If we don't have a surface descriptor, treat this as a hint from the
application that it will export the surface later.
This matches the Intel driver's behavior.
Reviewed-by: Leo Liu <leo.liu@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32970>
The `sad` instruction (used for iadd3) doesn't support the scalar ALU.
This means we might fall back to non-earlypreamble whenever we use it in
the preamble. Prevent this by emitting it as two adds instead.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32943>
If a polygon offset is set via glPolygonOffset, but the functionality
isn't enabled via glEnable(GL_POLYGON_OFFSET_FILL), the offset must not
be taken into account when computing the sample depth. As the Vivante
hardware does not have a separate enable state, the offset units and
scale must both be set to 0 to keep the sample depth unchanged.
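The resulting state selection is roughly (illustrative helper, not the
actual etnaviv code):

    #include <stdbool.h>

    /* Illustrative sketch only: with no separate hardware enable, force the
     * offset to zero when GL_POLYGON_OFFSET_FILL is disabled.
     */
    static void resolve_polygon_offset(bool offset_fill_enabled,
                                       float app_units, float app_scale,
                                       float *hw_units, float *hw_scale)
    {
       *hw_units = offset_fill_enabled ? app_units : 0.0f;
       *hw_scale = offset_fill_enabled ? app_scale : 0.0f;
    }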
Fixes dEQP-GLES2.functional.polygon_offset.default_enable
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32982>
The spec requires that a cl_kernel doesn't hold references to
its bound kernel arguments.
There is a CL CTS test verifying this, but because the arguments were not
used in the test kernel, a reference was never taken. This will change
with SVM and BDA as we need to know all bound memory objects even if they
aren't directly used in kernels.
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: @LingMan
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32961>
Turnip already had a pipeline stat to indicate whether we were using
early-preamble or not, but there was no way to tell whether there was a
preamble at all. Adding a preamble instruction count tells us not only
whether there is a preamble, but also how big it is.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32977>
This limits the address register to simple cases inside a block.
Validation ensures that the address register is only written once and
read once.
Instruction scheduling makes sure that instructions using the address
register in the generator are not scheduled while there is a use of
the register in the IR.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28199>
We want to reuse the brw::nr field as a virtual address register
identifier, so we can't use brw::file=ARF brw::nr=ADDRESS.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28199>
force_gl_names_reuse is changed to an integer:
-1 means default (currently disabled), 0 means disabled, 1 means enabled.
The names reuse initialization is moved to _mesa_alloc_shared_state ->
_mesa_InitHashTable instead of _mesa_HashEnableNameReuse.
It will be enabled by default.
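The tri-state resolves roughly like this (illustrative helper, not the
exact Mesa code):

    #include <stdbool.h>

    /* Illustrative sketch only: -1 = built-in default, 0 = force off,
     * 1 = force on.
     */
    static bool resolve_names_reuse(int force_gl_names_reuse, bool default_on)
    {
       if (force_gl_names_reuse < 0)
          return default_on;
       return force_gl_names_reuse != 0;
    }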
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32715>
Moving output file creation to coincide with swapchain creation ensures
that only the rendering thread will create/destroy the log file.
Previously, non-rendering processes were stomping on the log file.
Reviewed-by: Caleb Callaway <caleb.callaway@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32814>
Rather than emitting FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD to do block
loads that were cacheline aligned, loading entire cachelines at a time,
we now rely on NIR passes to group, CSE, and vectorize things into
appropriately sized blocks. This means that we'll usually still load
a cacheline, but we may load only 32B if we don't actually need anything
from the full 64B. Prior to Xe2, this saves us registers, and it ought
to save us some bandwidth as well, since the response length can be lowered.
The cacheline-aligning hack was the main reason not to simply call
fs_nir_emit_memory_access(), so now we do that instead, porting yet
one more thing to the common memory opcode framework.
We unfortunately still emit the old FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD
opcode for non-block intrinsics. We'd have to clean up 16-bit handling
among other things in order to eliminate this, but we should in the
future.
fossil-db results on Alchemist for this and the previous patch together:
Instrs: 161481888 -> 161297588 (-0.11%); split: -0.12%, +0.01%
Subgroup size: 8102976 -> 8103000 (+0.00%)
Send messages: 7895489 -> 7846178 (-0.62%); split: -0.67%, +0.05%
Cycle count: 16583127302 -> 16703162264 (+0.72%); split: -0.57%, +1.29%
Spill count: 72316 -> 67212 (-7.06%); split: -7.25%, +0.19%
Fill count: 134457 -> 125970 (-6.31%); split: -6.83%, +0.52%
Scratch Memory Size: 4093952 -> 3787776 (-7.48%); split: -7.53%, +0.05%
Max live registers: 33037765 -> 32947425 (-0.27%); split: -0.28%, +0.00%
Max dispatch width: 5780288 -> 5778536 (-0.03%); split: +0.17%, -0.20%
Non SSA regs after NIR: 177862542 -> 178816944 (+0.54%); split: -0.06%, +0.60%
In particular, several titles see incredible reductions in spill/fills:
Shadow of the Tomb Raider: -65.96% / -65.44%
Batman: Arkham City GOTY: -53.49% / -28.57%
Witcher 3: -16.33% / -14.29%
Total War: Warhammer III: -9.60% / -10.14%
Assassins Creed Odyssey: -6.50% / -9.92%
Red Dead Redemption 2: -6.77% / -8.88%
Far Cry: New Dawn: -7.97% / -4.53%
Improves performance in many games on Arc A750:
Cyberpunk 2077: 5.8%
Witcher 3: 4%
Shadow of the Tomb Raider: 3.3%
Assassins Creed: Valhalla: 3%
Spiderman Remastered: 2.75%
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32888>
The hope here is to replace our backend handling for loading whole
cachelines at a time from UBOs with NIR-based handling, which plays
nicely with the NIR load/store vectorizer.
Rounding down offsets to multiples of 64B allows us to globally CSE
UBO loads across basic blocks. This is really useful. However, blindly
rounding down the offset to a multiple of 64B can trigger anti-patterns
where a single unaligned memory load could have fetched all the necessary
data, but rounding it down splits it into two loads.
By moving this to NIR, we gain more control of the interplay between
nir_opt_load_store_vectorize and this rebasing and CSE'ing. The backend
can then simply load between nir_def_{first,last}_component_read() and
trust that our NIR has the loads blockified appropriately.
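The rebasing itself is roughly this computation (illustrative helper,
not the actual pass):

    /* Illustrative sketch only: rebase a constant UBO byte offset onto a
     * 64B (cacheline) boundary so loads with the same base can be CSE'd,
     * and remember which component of that block the original load starts at.
     */
    static void rebase_ubo_offset(unsigned byte_offset, unsigned *base,
                                  unsigned *first_comp)
    {
       *base = byte_offset & ~63u;              /* round down to 64B */
       *first_comp = (byte_offset - *base) / 4; /* assuming 32-bit components */
    }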
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32888>
This will translate to HDC Constant Cache loads or LSC UGM loads.
On LSC, MEMORY_MODE_UNTYPED would be fine, but for HDC we need to
distinguish between the regular and constant cache data ports.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32888>
The NIR vectorizer may produce block loads with unread trailing
components. Upcoming passes may produce unread leading components
as well. With a bit of finesse, we can skip loading those, and only
bother with the ones we actually need. This can sometimes save us on
loads and MOVs.
v2: Skip this for SLM reads on pre-LSC platforms (caught by Lionel).
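The trimming amounts to this (illustrative helper; the real code derives
the range from nir_def_{first,last}_component_read()):

    /* Illustrative sketch only: given the first and last components actually
     * read, load just that range instead of the full vector.
     */
    static void trim_components(unsigned first_read, unsigned last_read,
                                unsigned *skip_comps, unsigned *num_comps)
    {
       *skip_comps = first_read;                 /* unread leading components */
       *num_comps  = last_read - first_read + 1; /* drop unread trailing ones */
    }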
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32888>
If we pass an immediate, just trivially return that immediate.
This preserves the property that if x was an IMM, emit_uniformize(x)
will also be an IMM, without the need for optimizations to eliminate
unnecessary operations. That way, you can call emit_uniformize() on
a value and still check whether it's constant afterwards.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32888>
We return an immediate for 32-bit constant values, but fall back to
calling get_nir_src() for other values, as 64-bit and even 8-bit
immediates have odd restrictions. We could probably support 16-bit
here without too many issues, but we leave it be for now.
This makes it usable for cases where we'd like to get constants for
32-bit values but where the value may be a different bit-size too.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32888>
We were already skipping unread trailing components, but now we skip
them on both ends.
About -3.5% spills on Shadow of the Tomb Raider on Alchemist (mostly a
wash elsewhere, but it will help additional shaders with later patches).
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32888>
HDC doesn't support block loads/stores with sub-DWord (<4B) aligned
offsets, and shared local memory has to use the Aligned OWord Block
messages which require OWord (16B) alignment.
Make the validator detect this case and say no. Also make the lowering
code assert that the alignment is valid as a second line of defense.
LSC has no such restrictions.
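The rule the validator enforces is roughly (illustrative helper, not the
actual brw validation code):

    #include <stdbool.h>

    /* Illustrative sketch only: HDC block messages need at least DWord (4B)
     * aligned offsets, and HDC SLM block access needs OWord (16B) alignment;
     * LSC has no such restriction.
     */
    static bool block_alignment_ok(bool has_lsc, bool is_slm, unsigned align)
    {
       if (has_lsc)
          return true;
       const unsigned required = is_slm ? 16 : 4;
       return align >= required && align % required == 0;
    }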
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32888>