fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-18 02:58:06 +02:00

Author	SHA1	Message	Date
Matt Turner	a4f0a96dda	brw: Avoid reading past the end of `p->store` On the last iteration of the loop, `offset` will point to the location just beyond the last instruction in the program. If the program exactly fills `p->store` then calling `next_offset()` will read out of bounds. Instead just let the inner while loop call `next_offset()` one additional time. Fixes: `a35b9cb625` ("i965: Add annotation data structure and support code.") Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12486 Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33101>	2025-01-21 22:58:55 +00:00
Caio Oliveira	fb09dac988	intel/brw: Remove 'fs' prefix from reg alloc code Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33112>	2025-01-21 07:33:49 -08:00
Caio Oliveira	62dd470d0a	intel/brw: Rename brw_fs_reg_allocate.cpp to brw_reg_allocate.cpp Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33112>	2025-01-21 07:33:49 -08:00
Pierre-Eric Pelloux-Prayer	b307951648	glx: fix glx-create-context-invalid-es-version * GLES3.x is only valid for x <= 2 * The expected error is GLXBadProfileARB, not BadValue cc: mesa-stable Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Eric Engestrom <eric@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33036>	2025-01-21 14:33:13 +00:00
Caio Oliveira	793cba0e6f	intel/brw: Apply conventions to lower_src_modifiers helper Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33110>	2025-01-19 08:24:09 -08:00
Caio Oliveira	d7d210fed4	intel/brw: Move shuffle_from_32bit_read implementation to brw_builder Make it a member function for convenience -- since another member function uses it. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33108>	2025-01-18 20:48:57 +00:00
Caio Oliveira	b3001e4946	intel/brw: Move a few builder helpers to brw_builder.h/cpp Add brw prefix when necessary. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33108>	2025-01-18 20:48:57 +00:00
Lionel Landwerlin	10a4dc529f	blorp: disable PS shaders with depth/stencil HiZ ops Found on simulation, complaining about SIMD32 shaders enabled when using MSAA 16x. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: mesa-stable Reviewed-by: Nanley Chery <nanley.g.chery@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30753>	2025-01-18 17:52:19 +00:00
Caio Oliveira	1043187ec6	intel/brw: Stop using namespace for brw_builder Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33076>	2025-01-18 16:12:56 +00:00
Caio Oliveira	5ac82efd35	intel/brw: Rename fs_builder to brw_builder Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33076>	2025-01-18 16:12:55 +00:00
Caio Oliveira	f2d4c9db92	intel/brw: Rename brw_fs_builder.h to brw_builder.h Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33076>	2025-01-18 16:12:54 +00:00
Caio Oliveira	f0fe0026c0	intel/brw: Remove extra wrapping around fs_visitor in tests Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33100>	2025-01-18 07:41:35 -08:00
Caio Oliveira	94fa449318	intel/brw: Add missing cases to flags_written() These virtual opcodes will write the whole flag set, either directly (via brw_fill_flag()) or indirectly by using LOAD_LIVE_CHANNELS. Issue was found when analysing a hang that would disappear if the lowering of those opcodes was pulled all the way up right before brw_opt_cmod_propagation (which uses the flags_written). Fixes: `019770f026` ("intel/brw: Add SHADER_OPCODE_VOTE_*") Fixes: `2bd7592b0b` ("intel/brw: Add SHADER_OPCODE_BALLOT") Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12347 Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12479 Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33085>	2025-01-18 05:30:23 +00:00
Lionel Landwerlin	f96e95fcc9	anv: remove print lowering This is handled by the back compiler. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33067>	2025-01-17 18:09:46 +00:00
Lionel Landwerlin	e1074f5bd4	anv: update debug printf example code Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33067>	2025-01-17 18:09:45 +00:00
Lionel Landwerlin	58a3ef4160	anv: handle printf buffer size relocations Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33067>	2025-01-17 18:09:45 +00:00
Lionel Landwerlin	d63b5fc8c5	brw: handle load_printf_buffer_size intrinsic Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33067>	2025-01-17 18:09:45 +00:00
Alyssa Rosenzweig	e1368f0a30	nir,util: move printf serializing into util there's nothing NIR specific here and these routines will be useful otherwise. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Jesse Natalie <jenatali@microsoft.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33067>	2025-01-17 18:09:45 +00:00
Alyssa Rosenzweig	e7a1d704d0	intel: set max_buffer_size to nir_lower_printf instead of relying on an implicit value which doesn't make much sense. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Jesse Natalie <jenatali@microsoft.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33067>	2025-01-17 18:09:45 +00:00
Caio Oliveira	0b310ae4d8	intel/brw: Rename fs_generator to brw_generator Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32844>	2025-01-17 00:04:41 +00:00
Caio Oliveira	3659934862	intel/brw: Add brw_generator.h header Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32844>	2025-01-17 00:04:41 +00:00
Caio Oliveira	a5a9f42a39	intel/brw: Rename brw_fs_generator.cpp to brw_generator.cpp Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32844>	2025-01-17 00:04:41 +00:00
Vignesh Raman	9e7ca3b86a	ci: update expectation files Update expectation files for the test runs with kernel 6.13-rc4. Signed-off-by: Vignesh Raman <vignesh.raman@collabora.com> Reviewed-by: David Heidelberg <None> Reviewed-by: Sergi Blanch Torné <None> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32788>	2025-01-16 22:57:52 +00:00
Vignesh Raman	af8ab2bb3e	ci: Uprev kernel to 6.13 Move to 6.13-rc4 for all mesa-ci jobs except anv-jsl. Signed-off-by: Vignesh Raman <vignesh.raman@collabora.com> Reviewed-by: David Heidelberg <None> Reviewed-by: Sergi Blanch Torné <None> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32788>	2025-01-16 22:57:52 +00:00
Lionel Landwerlin	2774fb32e6	brw: fix coarse_z computation on Xe2+ The payload format changed and we forgot to update this path. Putting a Fixes: commit that is kind of related but probably not the source of the issue. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12031 Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11871 Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12042 Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12339 Fixes: `4672fcbc76` ("intel/fs: Fix PS thread payload setup for depth_w_coef_reg.") Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33029>	2025-01-16 07:19:57 +00:00
Felix DeGrood	0ff8534008	intel/perf: add new perf consts to support more metrics Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32909>	2025-01-16 00:01:56 +00:00
Collabora's Gfx CI Team	3f6f55e891	Uprev Piglit to 631b72944f56e688f56a08d26c8a9f3988801a08 `4c0fd15fd9...631b72944f` Reviewed-by: Sergi Blanch Torné <None> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32642>	2025-01-15 22:24:33 +00:00
Nanley Chery	15e23f3781	anv: Limit slow clear heuristic to ACM and prior It hasn't been tuned for Xe2. Fixes: `052d7e1a9c` ("anv: Slow clear if fast-clear cost is not mitigated") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33035>	2025-01-15 15:43:19 +00:00
Nanley Chery	caf007ff27	anv: Drop can_fast_clear_with_non_zero_color() This got dropped during a rebase. Fixes: `35f02d8f36` ("anv: Inline can_fast_clear_with_non_zero_color") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33035>	2025-01-15 15:43:19 +00:00
Matthew Brost	2a053b2e60	anv/xe: Bind queue per anv_queue The Xe uAPI is designed to use bind queues such that binds without input dependencies (sync objects) do not block on binds with input dependencies. For example: - Bind A (sparse) is submitted with a list of input dependencies. - Bind B (immediate) is subsequently submitted without a list of input dependencies. If Bind A and Bind B share a single bind queue, Bind B will not be scheduled until Bind A completes. Using individual bind queues decouples Bind A and Bind B, allowing Bind B to make immediate progress. This change creates a separate bind queue for each ANV queue, enabling support for sparse bindings that may have input dependencies. v2: - Bail on bind queue creation failure (Lionel) - Only create bind queue if VK_QUEUE_SPARSE_BINDING_BIT is set (Jose) v3: - Add comment around submit->queue usage (Jose) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32873>	2025-01-14 14:39:53 +00:00
Nanley Chery	cd8e120b97	anv: Allow more single subresource fast-clears with FCV Format re-interpretation is no longer a problem with texture views. The clear color address now points to a clear color that is in the expected format. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31374>	2025-01-14 03:43:55 +00:00
Nanley Chery	35f02d8f36	anv: Inline can_fast_clear_with_non_zero_color Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31374>	2025-01-14 03:43:55 +00:00
Nanley Chery	5549cb921d	Revert "anv: turn off non zero fast clears for CCS_E" This reverts commit `25a232238f`. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11110 Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11325 Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31374>	2025-01-14 03:43:55 +00:00
Nanley Chery	3e62401df3	anv: Drop bpc check for non-zero fast clears Use the matching clear color address for an image view format to support any clear color. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31374>	2025-01-14 03:43:55 +00:00
Nanley Chery	83cd73385a	anv: Use L3 Fabric flush in fast-clear post-amble on TGL Replace the Tile Cache flush with an L3 Fabric flush. According to HSD 1604687438, this should be faster. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31600>	2025-01-14 03:14:00 +00:00
Nanley Chery	cec086a074	anv: Reduce fast-clear post-amble synchronization On gfx12+, the pre-amble and post-amble flushes contain the stalls necessary to ensure the prior operation is complete. Remove the extra uses of ANV_PIPE_END_OF_PIPE_SYNC_BIT in post-amble flushes. Also do this for the pre-amble flushes, but this doesn't have any impact. The flush application function will implicitly add the bit. For A750, this improves the TWWH3 trace in the performance CI by 0.52% (n=2). Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31600>	2025-01-14 03:14:00 +00:00
Caio Oliveira	634daf2827	intel/brw: Rename brw_fs_validate to brw_validate Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32843>	2025-01-13 23:56:22 +00:00
Nanley Chery	052d7e1a9c	anv: Slow clear if fast-clear cost is not mitigated Fast-clears require expensive flushes beforehand and afterwards. The cost of flushes is decreased in a series of back-to-back fast-clears as no extra fast-clear flushes are required in between them. If the ratio of a command buffer's recorded back-to-back fast clears over independent fast-clears falls below 1/2, prevent that command buffer from recording any further fast-clears. Averaging two runs of our Factorio trace on an A750 shows a +14.37% improvement in FPS. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32984>	2025-01-13 20:42:31 +00:00
Kenneth Graunke	894393470a	brw: Fix Xe2 spilling code to limit to SIMD32 rather than SIMD16 LSC can do native SIMD32 messages on Xe2. Cuts spill/fills on Lunarlake: - q2rtx-rt-pipeline: -20.83% / -16.85% - Borderlands 3 DX12: -18.26% / -2.09% - Cyberpunk 2077: -2.18% / -0.11% Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32986>	2025-01-11 09:33:09 +00:00
Lionel Landwerlin	8ac7802ac8	brw: move final send lowering up into the IR Because we emit the final send message form in code generation, a lot of emissions look like this: add(8) vgrf0, u0, 0x100 mov(1) a0.1, vgrf0 # emitted by the generator send(8) ..., a0.1 By moving address register manipulation into the IR, we can get this down to: add(1) a0.1, u0, 0x100 send(8) ..., a0.1 This reduces register pressure around some send messages by 1 vgrf. All lost shaders in the results below are fragment SIMD32, due to the throughput estimator. If turned off, we lose no SIMD32 shaders with this change. DG2 results: Assassin's Creed Valhalla: Totals from 2044 (96.87% of 2110) affected shaders: Instrs: 852879 -> 832044 (-2.44%); split: -2.45%, +0.00% Subgroup size: 23832 -> 23824 (-0.03%) Cycle count: 53345742 -> 52144277 (-2.25%); split: -5.08%, +2.82% Spill count: 729 -> 554 (-24.01%); split: -28.40%, +4.39% Fill count: 2005 -> 1256 (-37.36%) Scratch Memory Size: 25600 -> 19456 (-24.00%); split: -32.00%, +8.00% Max live registers: 116765 -> 115058 (-1.46%) Max dispatch width: 19152 -> 18872 (-1.46%); split: +0.21%, -1.67% Cyberpunk 2077: Totals from 1181 (93.43% of 1264) affected shaders: Instrs: 667192 -> 663615 (-0.54%); split: -0.55%, +0.01% Subgroup size: 13016 -> 13032 (+0.12%) Cycle count: 17383539 -> 17986073 (+3.47%); split: -0.93%, +4.39% Spill count: 12 -> 8 (-33.33%) Fill count: 9 -> 6 (-33.33%) Dota2: Totals from 173 (11.59% of 1493) affected shaders: Cycle count: 274403 -> 280817 (+2.34%); split: -0.01%, +2.34% Max live registers: 5787 -> 5779 (-0.14%) Max dispatch width: 1344 -> 1152 (-14.29%) Hitman3: Totals from 5072 (95.39% of 5317) affected shaders: Instrs: 2879952 -> 2841804 (-1.32%); split: -1.32%, +0.00% Cycle count: 153208505 -> 165860401 (+8.26%); split: -2.22%, +10.48% Spill count: 3942 -> 3200 (-18.82%) Fill count: 10158 -> 8846 (-12.92%) Scratch Memory Size: 257024 -> 223232 (-13.15%) Max live registers: 328467 -> 324631 (-1.17%) Max dispatch
width: 43928 -> 42768 (-2.64%); split: +0.09%, -2.73% Fortnite: Totals from 360 (4.82% of 7472) affected shaders: Instrs: 778068 -> 777925 (-0.02%) Subgroup size: 3128 -> 3136 (+0.26%) Cycle count: 38684183 -> 38734579 (+0.13%); split: -0.06%, +0.19% Max live registers: 50689 -> 50658 (-0.06%) Hogwarts Legacy: Totals from 1376 (84.00% of 1638) affected shaders: Instrs: 758810 -> 749727 (-1.20%); split: -1.23%, +0.03% Cycle count: 27778983 -> 28805469 (+3.70%); split: -1.42%, +5.12% Spill count: 2475 -> 2299 (-7.11%); split: -7.47%, +0.36% Fill count: 2677 -> 2445 (-8.67%); split: -9.90%, +1.23% Scratch Memory Size: 99328 -> 89088 (-10.31%) Max live registers: 84969 -> 84671 (-0.35%); split: -0.58%, +0.23% Max dispatch width: 11848 -> 11920 (+0.61%) Metro Exodus: Totals from 92 (0.21% of 43072) affected shaders: Instrs: 262995 -> 262968 (-0.01%) Cycle count: 13818007 -> 13851266 (+0.24%); split: -0.01%, +0.25% Max live registers: 11152 -> 11140 (-0.11%) Red Dead Redemption 2 : Totals from 451 (7.71% of 5847) affected shaders: Instrs: 754178 -> 753811 (-0.05%); split: -0.05%, +0.00% Cycle count: 3484078523 -> 3484111965 (+0.00%); split: -0.00%, +0.00% Max live registers: 42294 -> 42185 (-0.26%) Spiderman Remastered: Totals from 6820 (98.02% of 6958) affected shaders: Instrs: 6921500 -> 6747933 (-2.51%); split: -4.16%, +1.65% Cycle count: 234400692460 -> 236846720707 (+1.04%); split: -0.20%, +1.25% Spill count: 72971 -> 72622 (-0.48%); split: -8.08%, +7.61% Fill count: 212921 -> 198483 (-6.78%); split: -12.37%, +5.58% Scratch Memory Size: 3491840 -> 3410944 (-2.32%); split: -12.05%, +9.74% Max live registers: 493149 -> 487458 (-1.15%) Max dispatch width: 56936 -> 56856 (-0.14%); split: +0.06%, -0.20% Strange Brigade: Totals from 3769 (91.21% of 4132) affected shaders: Instrs: 1354476 -> 1321474 (-2.44%) Cycle count: 25351530 -> 25339190 (-0.05%); split: -1.64%, +1.59% Max live registers: 199057 -> 193656 (-2.71%) Max dispatch width: 30272 -> 30240 (-0.11%) Witcher 3: 
Totals from 25 (2.40% of 1041) affected shaders: Instrs: 24621 -> 24606 (-0.06%) Cycle count: 2218793 -> 2217503 (-0.06%); split: -0.11%, +0.05% Max live registers: 1963 -> 1955 (-0.41%) LNL results: Assassin's Creed Valhalla: Totals from 1928 (98.02% of 1967) affected shaders: Instrs: 856107 -> 835756 (-2.38%); split: -2.48%, +0.11% Subgroup size: 41264 -> 41280 (+0.04%) Cycle count: 64606590 -> 62371700 (-3.46%); split: -5.57%, +2.11% Spill count: 915 -> 669 (-26.89%); split: -32.79%, +5.90% Fill count: 2414 -> 1617 (-33.02%); split: -36.62%, +3.60% Scratch Memory Size: 62464 -> 44032 (-29.51%); split: -36.07%, +6.56% Max live registers: 205483 -> 202192 (-1.60%) Cyberpunk 2077: Totals from 1177 (96.40% of 1221) affected shaders: Instrs: 682237 -> 678931 (-0.48%); split: -0.51%, +0.03% Subgroup size: 24912 -> 24944 (+0.13%) Cycle count: 24355928 -> 25089292 (+3.01%); split: -0.80%, +3.81% Spill count: 8 -> 3 (-62.50%) Fill count: 6 -> 3 (-50.00%) Max live registers: 126922 -> 125472 (-1.14%) Dota2: Totals from 428 (32.47% of 1318) affected shaders: Instrs: 89355 -> 89740 (+0.43%) Cycle count: 1152412 -> 1152706 (+0.03%); split: -0.52%, +0.55% Max live registers: 32863 -> 32847 (-0.05%) Fortnite: Totals from 5354 (81.72% of 6552) affected shaders: Instrs: 4135059 -> 4239015 (+2.51%); split: -0.01%, +2.53% Cycle count: 132557506 -> 132427302 (-0.10%); split: -0.75%, +0.65% Spill count: 7144 -> 7234 (+1.26%); split: -0.46%, +1.72% Fill count: 12086 -> 12403 (+2.62%); split: -0.73%, +3.35% Scratch Memory Size: 600064 -> 604160 (+0.68%); split: -1.02%, +1.71% Hitman3: Totals from 4912 (97.09% of 5059) affected shaders: Instrs: 2952124 -> 2916824 (-1.20%); split: -1.20%, +0.00% Cycle count: 179985656 -> 189175250 (+5.11%); split: -2.44%, +7.55% Spill count: 3739 -> 3136 (-16.13%) Fill count: 10657 -> 9564 (-10.26%) Scratch Memory Size: 373760 -> 318464 (-14.79%) Max live registers: 597566 -> 589460 (-1.36%) Hogwarts Legacy: Totals from 1471 (96.33% of 1527) affected 
shaders: Instrs: 748749 -> 766214 (+2.33%); split: -0.71%, +3.05% Cycle count: 33301528 -> 34426308 (+3.38%); split: -1.30%, +4.68% Spill count: 3278 -> 3070 (-6.35%); split: -8.30%, +1.95% Fill count: 4553 -> 4097 (-10.02%); split: -10.85%, +0.83% Scratch Memory Size: 251904 -> 217088 (-13.82%) Max live registers: 168911 -> 168106 (-0.48%); split: -0.59%, +0.12% Metro Exodus: Totals from 18356 (49.81% of 36854) affected shaders: Instrs: 7559386 -> 7621591 (+0.82%); split: -0.01%, +0.83% Cycle count: 195240612 -> 196455186 (+0.62%); split: -1.22%, +1.84% Spill count: 595 -> 546 (-8.24%) Fill count: 1604 -> 1408 (-12.22%) Max live registers: 2086937 -> 2086933 (-0.00%) Red Dead Redemption 2: Totals from 4171 (79.31% of 5259) affected shaders: Instrs: 2619392 -> 2719587 (+3.83%); split: -0.00%, +3.83% Subgroup size: 86416 -> 86432 (+0.02%) Cycle count: 8542836160 -> 8531976886 (-0.13%); split: -0.65%, +0.53% Fill count: 12949 -> 12970 (+0.16%); split: -0.43%, +0.59% Scratch Memory Size: 401408 -> 385024 (-4.08%) Spiderman Remastered: Totals from 6639 (98.94% of 6710) affected shaders: Instrs: 6877980 -> 6800592 (-1.13%); split: -3.11%, +1.98% Cycle count: 282183352210 -> 282100051824 (-0.03%); split: -0.62%, +0.59% Spill count: 63147 -> 64218 (+1.70%); split: -7.12%, +8.82% Fill count: 184931 -> 175591 (-5.05%); split: -10.81%, +5.76% Scratch Memory Size: 5318656 -> 5970944 (+12.26%); split: -5.91%, +18.17% Max live registers: 918240 -> 906604 (-1.27%) Strange Brigade: Totals from 3675 (92.24% of 3984) affected shaders: Instrs: 1462231 -> 1429345 (-2.25%); split: -2.25%, +0.00% Cycle count: 37404050 -> 37345292 (-0.16%); split: -1.25%, +1.09% Max live registers: 361849 -> 351265 (-2.92%) Witcher 3: Totals from 13 (46.43% of 28) affected shaders: Instrs: 593 -> 660 (+11.30%) Cycle count: 28302 -> 28714 (+1.46%) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Caio Oliveira 
<caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28199>	2025-01-11 08:41:42 +00:00
Lionel Landwerlin	a27d98e933	brw: avoid having the scratch surface handle partially written Allows it to be visible through the def_analysis. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28199>	2025-01-11 08:41:42 +00:00
Lionel Landwerlin	aac906c16c	brw: add scheduler support for address registers Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28199>	2025-01-11 08:41:42 +00:00
Lionel Landwerlin	0a5bdf1199	brw: add infra to make use of the address register in the IR This limits the address register to simple cases inside a block. Validation ensures that the address register is only written once and read once. Instruction scheduling makes sure that instructions using the address register in the generator are not scheduled while there is a use of the register in the IR. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28199>	2025-01-11 08:41:42 +00:00
Lionel Landwerlin	c9fa235c28	brw: split validation iteration into blocks No functional change. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28199>	2025-01-11 08:41:42 +00:00
Lionel Landwerlin	9b73a73a6e	brw: use phys_nr() more in generation Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28199>	2025-01-11 08:41:42 +00:00
Lionel Landwerlin	b110b06447	brw: introduce a new register type for the address register We want to reuse the brw::nr field as a virtual address register identifier. So we can't use brw::file=ARF brw::nr=ADDRESS. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28199>	2025-01-11 08:41:42 +00:00
Kenneth Graunke	de1eaa4019	brw: Always use MEMORY_LOAD for load_ubo_uniform_block_intel intrinsics Rather than emitting FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD to do block loads that were cacheline aligned, loading entire cachelines at a time, we now rely on NIR passes to group, CSE, and vectorize things into appropriately sized blocks. This means that we'll usually still load a cacheline, but we may load only 32B if we don't actually need anything from the full 64B. Prior to Xe2, this saves us registers, and it ought to save us some bandwidth as well as the response length can be lowered. The cacheline-aligning hack was the main reason not to simply call fs_nir_emit_memory_access(), so now we do that instead, porting yet one more thing to the common memory opcode framework. We unfortunately still emit the old FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD opcode for non-block intrinsics. We'd have to clean up 16-bit handling among other things in order to eliminate this, but we should in the future. 
fossil-db results on Alchemist for this and the previous patch together: Instrs: 161481888 -> 161297588 (-0.11%); split: -0.12%, +0.01% Subgroup size: 8102976 -> 8103000 (+0.00%) Send messages: 7895489 -> 7846178 (-0.62%); split: -0.67%, +0.05% Cycle count: 16583127302 -> 16703162264 (+0.72%); split: -0.57%, +1.29% Spill count: 72316 -> 67212 (-7.06%); split: -7.25%, +0.19% Fill count: 134457 -> 125970 (-6.31%); split: -6.83%, +0.52% Scratch Memory Size: 4093952 -> 3787776 (-7.48%); split: -7.53%, +0.05% Max live registers: 33037765 -> 32947425 (-0.27%); split: -0.28%, +0.00% Max dispatch width: 5780288 -> 5778536 (-0.03%); split: +0.17%, -0.20% Non SSA regs after NIR: 177862542 -> 178816944 (+0.54%); split: -0.06%, +0.60% In particular, several titles see incredible reductions in spill/fills: Shadow of the Tomb Raider: -65.96% / -65.44% Batman: Arkham City GOTY: -53.49% / -28.57% Witcher 3: -16.33% / -14.29% Total War: Warhammer III: -9.60% / -10.14% Assassins Creed Odyssey: -6.50% / -9.92% Red Dead Redemption 2: -6.77% / -8.88% Far Cry: New Dawn: -7.97% / -4.53% Improves performance in many games on Arc A750: Cyberpunk 2077: 5.8% Witcher 3: 4% Shadow of the Tomb Raider: 3.3% Assassins Creed: Valhalla: 3% Spiderman Remastered: 2.75% Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32888>	2025-01-10 22:44:09 +00:00
Kenneth Graunke	21636ff9fa	brw: Align and combine constant-offset UBO loads in NIR The hope here is to replace our backend handling for loading whole cachelines at a time from UBOs into NIR-based handling, which plays nicely with the NIR load/store vectorizer. Rounding down offsets to multiples of 64B allows us to globally CSE UBO loads across basic blocks. This is really useful. However, blindly rounding down the offset to a multiple of 64B can trigger anti-patterns where...a single unaligned memory load could have hit all the necessary data, but rounding it down split it into two loads. By moving this to NIR, we gain more control of the interplay between nir_opt_load_store_vectorize and this rebasing and CSE'ing. The backend can then simply load between nir_def_{first,last}_component_read() and trust that our NIR has the loads blockified appropriately. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32888>	2025-01-10 22:44:09 +00:00
Kenneth Graunke	36d0485ae4	brw: Allow CSE of MEMORY_MODE_CONSTANT loads This matches the behavior of FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32888>	2025-01-10 22:44:09 +00:00
Kenneth Graunke	7ce66e2b61	brw: Add a new MEMORY_MODE_CONSTANT option This will translate to HDC Constant Cache loads or LSC UGM loads. On LSC, MEMORY_MODE_UNTYPED would be fine, but for HDC we need to distinguish between the regular and constant cache data ports. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32888>	2025-01-10 22:44:09 +00:00
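
Commit 21636ff9fa above describes rounding constant UBO offsets down to multiples of 64B, and notes that doing so blindly can split one unaligned load into two. The following is a minimal sketch of that arithmetic only, not the NIR pass itself. The names `CACHELINE_BYTES`, `align_down_64`, and `blocks_touched` are hypothetical and do not come from Mesa.

```c
#include <assert.h>
#include <stdint.h>

/* Block size named in the commit message (64B). Hypothetical macro name. */
#define CACHELINE_BYTES 64u

/* Round a constant byte offset down to the start of its 64B block. */
uint32_t align_down_64(uint32_t offset)
{
   return offset & ~(CACHELINE_BYTES - 1u);
}

/* Number of 64B-aligned blocks needed to cover [offset, offset + size).
 * A result greater than 1 is the case where rebasing to 64B boundaries
 * turns a single unaligned load into several aligned ones. */
uint32_t blocks_touched(uint32_t offset, uint32_t size)
{
   assert(size > 0);
   uint32_t first = align_down_64(offset);
   uint32_t last = align_down_64(offset + size - 1u);
   return (last - first) / CACHELINE_BYTES + 1u;
}
```

For example, a 32B load at offset 48 covers bytes 48 through 79, so `blocks_touched(48, 32)` is 2, while the same load at offset 64 stays within one block.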


13354 commits
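
The slow-clear heuristic in commit 052d7e1a9c is stated as a ratio test: once back-to-back fast clears divided by independent fast clears falls below 1/2, the command buffer stops recording fast clears. A rough sketch of that comparison follows. The struct, its field names, and the function name are hypothetical, and how the driver actually counts clears or treats the empty case is an assumption here, not something the log states.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-command-buffer counters; not anv's actual fields. */
struct clear_stats {
   uint32_t back_to_back; /* fast clears that directly followed another one */
   uint32_t independent;  /* fast clears that needed their own flushes */
};

/* Returns true while back_to_back / independent >= 1/2.
 * Written as a cross-multiplication to avoid division and rounding. */
bool fast_clears_still_worthwhile(const struct clear_stats *s)
{
   /* Assumption: with no independent clears recorded, keep allowing them. */
   if (s->independent == 0)
      return true;
   return (uint64_t)2 * s->back_to_back >= s->independent;
}
```

With counters `{1, 2}` the ratio is exactly 1/2, which is not below the threshold, so fast clears stay enabled. With `{1, 3}` the ratio drops below 1/2 and the sketch returns false.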