fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-23 13:20:14 +01:00

Author	SHA1	Message	Date
Ian Romanick	07dc1d4043	brw/algebraic: Clear condition modifier on optimized SEL instruction The condition modifier on SEL means something completely different than it means on MOV. On MOV it means to modify the flags based on the value written to the destination. On SEL it means to compare the sources using that mode and pick the result (i.e., as min() or max()) without modifying the flags. The resulting MOV should not have a condition modifier for the same reason it (already) doesn't have a predicate. This bug was found by inspection, so I added a unit test. No shader-db or fossil-db changes on any Intel platform. Fixes: `fab92fa1cb` ("i965/fs: Optimize SEL with the same sources into a MOV.") Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34192>	2025-04-15 23:59:31 +00:00
Caio Oliveira	fbe5d559bd	brw: Update EU validation to allow packed BF mixed with packed F Reviewed-by: Rohan Garg <rohan.garg@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34506>	2025-04-14 18:23:43 +00:00
Caio Oliveira	d1dd088ede	brw: Allow DPAS with BF on Gfx125 MTL doesn't support it, but both ACM and ARL-H do. Fixes: `e384ccde28` ("brw: Expand EU validation for DPAS") Reviewed-by: Rohan Garg <rohan.garg@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34506>	2025-04-14 18:23:43 +00:00
Caio Oliveira	adfab666a4	intel: Add intel_device_info::has_systolic Gfx125+ has systolic, with the exception of MTL and some ARL variants. Update code and tests to use it. Reviewed-by: Rohan Garg <rohan.garg@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34506>	2025-04-14 18:23:43 +00:00
Kenneth Graunke	eb1ec9cf8e	brw: Don't assert about MAX_VGRF_SIZE in brw_opt_split_virtual_grfs() This allows us to create temporary VGRFs that are larger than MAX_VGRF_SIZE(devinfo), which will be split eventually. They may not be split on the initial pass, because we may need LOAD_PAYLOAD lowering, copy propagation, and so on to occur first. So we allow registers to exceed that size initially. The "Register allocation relies on split_virtual_grfs()" assertion in brw_reg_allocate.cpp still asserts that all VGRFs which reach the register allocator have been properly split. One case where this is useful is for vectorizing convergent block loads. We create temporaries to splat the SIMD1 values out to SIMD(N), which can lead to some very large temporaries. However, copy propagation and so on ultimately eliminate these and they'll get split down to proper sizes or elided entirely in the end. (Note: both this and the prior commits from this merge request are needed to close the linked issue.) Cc: mesa-stable Reviewed-by: Matt Turner <mattst88@gmail.com> Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12324 Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34461>	2025-04-11 20:34:51 +00:00
Kenneth Graunke	a45583f078	brw: Use live->max_vgrf_size in pre-RA scheduling Post-RA scheduling doesn't use liveness analysis, so we continue using MAX_VGRF_SIZE(devinfo). But for pre-RA scheduling, we now use live->max_vgrf_size. This helps get us to a place where we can emit arbitrarily large VGRFs early on in compilation, but which will be split and cleaned up prior to register allocation. It may also allocate smaller arrays in practice since MAX_VGRF_SIZE(devinfo) assumes the worst case scenario for things we actually could need to allocate. Cc: mesa-stable Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34461>	2025-04-11 20:34:51 +00:00
Kenneth Graunke	4b27b5895c	brw: Use live->max_vgrf_size in register coalescing We already require liveness, so just use the actual maximum size we saw instead of a hardcoded pessimal size. Cc: mesa-stable Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34461>	2025-04-11 20:34:51 +00:00
Kenneth Graunke	ea468412f6	brw: Track the largest VGRF size in liveness analysis We're already looking at this data to calculate the per-component vars_from_vgrf[] and vgrf_from_vars[] mappings, so just record the largest VGRF size while we're here. This will allow passes to size arrays based on the actual size needed, rather than hardcoding some fixed size. In many cases, MAX_VGRF_SIZE(devinfo) is larger than necessary, because e.g. vec5 sparse sampling results aren't used. Not hardcoding this means we can also temporarily handle very large VGRFs which we know will be split eventually, without having to increase the maximum which is ultimately used for RA classes. Cc: mesa-stable Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34461>	2025-04-11 20:34:51 +00:00
Lionel Landwerlin	06ad9a25e5	brw: fix Wa_22013689345 emission 2 problems : - not detecting null destination correctly - applied too late using SHADER_OPCODE_MEMORY_FENCE, when lowering already happened Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: mesa-stable Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34319>	2025-04-10 16:44:28 +00:00
Ian Romanick	cb69d019cf	brw/nir: Use offset() for all uses of offs in emit_pixel_interpolater_alu_at_offset This is necessary to appropriately uniformize the first component access of a convergent vector. Without this, this is produced: load_payload(16) %18:D, 0d, 0d NoMask group0 add(32) %21:F, %18+0.0:F, 0.5f add(32) %22:F, %18+2.0<0>:F, 0.5f This is the correct code: load_payload(16) %18:D, 0d, 0d NoMask group0 add(32) %21:F, %18+0.0<0>:F, 0.5f add(32) %22:F, %18+2.0<0>:F, 0.5f Without `38b58e286f`, the code generated was more incorrect, but happened to work for this test case: load_payload(16) %18:D, 0d, 0d NoMask group0 add(32) %21:F, %18+0.0<0>:F, 0.5f add(32) %22:F, %18+0.4<0>:F, 0.5f Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `38b58e286f` ("brw/nir: Fix source handling of nir_intrinsic_load_barycentric_at_offset") Closes: #12969 Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34427>	2025-04-09 22:21:18 +00:00
Caio Oliveira	7457c4ecfd	brw: Make brw_range use half-open ranges Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34253>	2025-04-09 19:06:49 +00:00
Caio Oliveira	6509f8139d	brw: Use brw_range::last() to explicitly get the last valid IP This is a preparation to change what is stored in brw_range::end. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34253>	2025-04-09 19:06:49 +00:00
Caio Oliveira	596bbb2c95	brw: Use brw_range to store Vars ranges Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34253>	2025-04-09 19:06:49 +00:00
Caio Oliveira	0b4a3c0ff6	brw: Use brw_range to store VGRF ranges Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34253>	2025-04-09 19:06:49 +00:00
Caio Oliveira	e644b42e59	brw: Use brw_range when operating with live ranges Makes the intention of some comparisons clearer by using the named helper functions. Add commentary when the straightforward range is not the one used, e.g. VGRF interference. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34253>	2025-04-09 19:06:49 +00:00
Caio Oliveira	f56a5cf1eb	brw: Use brw_range in IP ranges analysis Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34253>	2025-04-09 19:06:49 +00:00
Caio Oliveira	fb50461220	brw: Add brw_range struct Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34253>	2025-04-09 19:06:48 +00:00
Caio Oliveira	8d9155e34d	brw: Clean up saturate propagation after non-defs version removal Remove now unused analysis and no need to walk blocks in reverse after the non-defs version of the pass was removed. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34253>	2025-04-09 19:06:48 +00:00
Caio Oliveira	cfc4067b0e	brw: Add a few basic tests for register coalesce Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34253>	2025-04-09 19:06:48 +00:00
Lionel Landwerlin	19e4dda9a2	brw: fix shuffle with scalar/uniform index The fixes commit isn't actually the source of the bug but likely the biggest enabler because it creates scalar values that more easily end up in the shuffle operations. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `1b24612c57` ("brw/nir: Treat load_*_uniform_block_intel as convergent") Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12927 Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12688 Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12570 Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12905 Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12734 Reviewed-by: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34393>	2025-04-08 20:14:11 +00:00
Felix DeGrood	7a3de9e877	intel/brw: support for dumping shader line numbers Add support for dumping shader asm containing instruction line numbers matching offsets within instruction state pool buffer. Offsets should match values collected from eu stall sampling. This is required to match eu stall data with individual shader instructions. Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30142>	2025-04-08 19:39:53 +00:00
Faith Ekstrand	436f175187	intel/compiler: Use nir_split_conversions() Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34266>	2025-04-07 17:45:21 -05:00
Caio Oliveira	bf9ad36f2d	brw: Properly handle cooperative matrices created with constants Expand constant sources to cover the region read by DPAS, and also use NULL register as accumulator when possible. Reviewed-by: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34373>	2025-04-07 14:27:43 -07:00
Ian Romanick	f33faa4648	brw/nir: Allow b2f(not(X)) optimization on Gfx12.5+ Since there are no type conversions, no restrictions are violated. No shader-db or fossil-db changes on any Gfx12 or older Intel platforms. shader-db: Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown) total instructions in shared programs: 16956077 -> 16944933 (-0.07%) instructions in affected programs: 1957573 -> 1946429 (-0.57%) helped: 4629 / HURT: 35 total cycles in shared programs: 915668518 -> 915684808 (<.01%) cycles in affected programs: 341925598 -> 341941888 (<.01%) helped: 3040 / HURT: 1305 helped stats (abs) min: 2 max: 23034 x̄: 205.36 x̃: 16 helped stats (rel) min: <.01% max: 41.21% x̄: 1.28% x̃: 0.48% HURT stats (abs) min: 2 max: 68820 x̄: 490.88 x̃: 22 HURT stats (rel) min: <.01% max: 103.69% x̄: 2.29% x̃: 0.37% 95% mean confidence interval for cycles value: -50.28 57.78 95% mean confidence interval for cycles %-change: -0.35% -0.07% Inconclusive result (value mean confidence interval includes 0). LOST: 40 GAINED: 42 fossil-db: Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown) Totals: Instrs: 209828027 -> 209790349 (-0.02%); split: -0.03%, +0.01% Cycle count: 30504938008 -> 30514045408 (+0.03%); split: -0.06%, +0.09% Spill count: 512182 -> 512168 (-0.00%) Fill count: 623432 -> 623426 (-0.00%); split: -0.00%, +0.00% Max live registers: 65465029 -> 65464959 (-0.00%) Totals from 57895 (8.19% of 706589) affected shaders: Instrs: 50144907 -> 50107229 (-0.08%); split: -0.11%, +0.03% Cycle count: 7549692606 -> 7558800006 (+0.12%); split: -0.25%, +0.37% Spill count: 58834 -> 58820 (-0.02%) Fill count: 102324 -> 102318 (-0.01%); split: -0.01%, +0.01% Max live registers: 9129045 -> 9128975 (-0.00%) Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33931>	2025-04-07 17:42:05 +00:00
Ian Romanick	853ead2073	brw/nir: Optimize b2f(not(X)) using logical operations instead of arithmetic Funny story... this is how regular b2f was implemented before Curro implemented the `MOV dst:F -src:D` method 9 years ago (see `3ee2daf23d`). Eliminating the type conversion in the arithmetic operation enables the next commit. No shader-db or fossil-db changes on any Intel platform. Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33931>	2025-04-07 17:42:05 +00:00
Ian Romanick	3d23496fd9	brw/copy: Copy prop -X into Y&1 This commit prevents code quality regressions in the next commit. Without this, some fragment shaders in Batman: Arkham Origins have code like: shr(8) g51<1>UW g1.28<1,8,0>UB 0x76543210V ... and(8) g52<1>UD ~g51<8,8,1>UW 0x0001UW ... add(8) g56<1>D -g52<8,8,1>D 1D transformed to shr(8) g51<1>UW g1.28<1,8,0>UB 0x76543210V ... and(8) g52<1>UD ~g51<8,8,1>UW 0x0001UW ... mov(8) g56<1>D -g52<8,8,1>D ... and(8) g57<1>UD ~g56<8,8,1>D 0x00000001UD Propagating through the negation allows the added MOV to be deleted. shader-db: All Intel platforms had similar results. (Lunar Lake shown) total instructions in shared programs: 16968020 -> 16968019 (<.01%) instructions in affected programs: 281 -> 280 (-0.36%) helped: 1 / HURT: 0 total cycles in shared programs: 914598850 -> 914598832 (<.01%) cycles in affected programs: 5398 -> 5380 (-0.33%) helped: 1 / HURT: 0 A single Blender vertex shader was affected. fossil-db: Lunar Lake, Tiger Lake, Ice Lake, and Skylake had similar results. (Lunar Lake shown) Totals: Instrs: 209894650 -> 209894651 (+0.00%) Cycle count: 30545958586 -> 30545952860 (-0.00%) Totals from 2 (0.00% of 706657) affected shaders: Instrs: 3582 -> 3583 (+0.03%) Cycle count: 1875100 -> 1869374 (-0.31%) Meteor Lake and DG2 had similar results. (Meteor Lake shown) Totals: Subgroup size: 9906400 -> 9906416 (+0.00%) Totals from 2 (0.00% of 805770) affected shaders: Subgroup size: 16 -> 32 (+100.00%) Two compute shaders in Hogwarts Legacy were affected. Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33931>	2025-04-07 17:42:05 +00:00
Ian Romanick	e82464e6e0	brw/copy: Refactor source modifier type checking This simplifies the next commit. Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33931>	2025-04-07 17:42:05 +00:00
Ian Romanick	dee49f4206	brw/algebraic: Optimize derivative of convergent value This is mostly defensive. If a convergent value ever ended up as a source of a DDX or DDY, the eu_emit code will ignore the stride. This will result in bad code being generated. No shader-db or fossil-db changes on any Intel platform. v2: DDX and DDY will always be float, but brw_imm_for_type only works with integer types. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Suggested-by: Ken Fixes: `d5d7ae22ae` ("brw/nir: Fix up handling of sources that might be convergent vectors") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33007>	2025-04-07 17:16:34 +00:00
Ian Romanick	5656682344	brw/nir: Eliminate default parameter to get_nir_src The vast majority of the callers want channel = 0. During the development process, using this default parameter value saved a lot of pain in rebasing. However, it seems to be more trouble than it's worth. Issue #12464 occurred because LNL was merged while this code was in review. As a result, one caller of get_nir_src that wanted channel = -1 was not inspected closely, and it got the default channel = 0 instead. To prevent this happening in the future (with possible branches still yet to be merged, for example), remove the default parameter. This will force the inspection of any callers that don't have an explicit channel parameter. Hopefully that will prevent more problems. Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33007>	2025-04-07 17:16:34 +00:00
Ian Romanick	38b58e286f	brw/nir: Fix source handling of nir_intrinsic_load_barycentric_at_offset The source of nir_intrinsic_load_barycentric_at_offset is a vector, so -1 should be passed to get_nir_src. This is also done for texture sampling intrinsics. I skimmed the other users of get_nir_src, and I believe they are correct. This one was just missed as LNL support landed and many, many rebases of the original MR occurred. v2: Fix another get_nir_src call. Suggested by Lionel. Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> [v1] Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Fixes: `d5d7ae22ae` ("brw/nir: Fix up handling of sources that might be convergent vectors") Closes: #12464 Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33007>	2025-04-07 17:16:34 +00:00
Caio Oliveira	9845693912	brw: Fix memory leak in EU validation tests Fixes: `62323a934b` ("brw: Add BRW_TYPE_BF validation") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34395>	2025-04-06 06:26:03 +00:00
Caio Oliveira	c33ee4adae	brw: Fix invalid memory access in scoreboard test Fixes: `03aca2d248` ("brw: Use new bld/exp style in scoreboard tests") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34394>	2025-04-05 22:58:23 -07:00
Caio Oliveira	7ae638c0fe	brw: Add brw_builder::uniform() Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34355>	2025-04-04 23:07:21 +00:00
Caio Oliveira	f33d93da11	brw: Remove HSW specific code from brw_compile_cs.cpp Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34355>	2025-04-04 23:07:21 +00:00
Caio Oliveira	03aca2d248	brw: Use new bld/exp style in scoreboard tests Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34354>	2025-04-04 20:14:53 +00:00
Caio Oliveira	7ee673c195	brw: Add parser of SWSB annotations to use in tests Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34354>	2025-04-04 20:14:53 +00:00
Caio Oliveira	81dd3e1527	brw: Return actual progress in brw_lower_scoreboard This will be useful later for tests to be used in conjunction with the EXPECT_PROGRESS / EXPECT_NO_PROGRESS helpers. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34354>	2025-04-04 20:14:53 +00:00
Caio Oliveira	3e727000dd	brw: Stop setting SFID in scoreboard tests They won't affect the scoreboard, and will get in the way of a later change. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34354>	2025-04-04 20:14:53 +00:00
Caio Oliveira	bcea076aca	brw: Use SIMD16 shaders in scoreboard tests for Xe2+ Some tests changed to avoid unintended overlap between operands which would change the SWSB assigned. In some cases also changed the Gfx12 matching test so they remain equal. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34354>	2025-04-04 20:14:52 +00:00
Caio Oliveira	cd486cda48	brw: Use control flow helpers in scoreboard tests Also update WHILE to optionally take a predicate (default to NONE). And make the predicate in the IF optional (default to NORMAL). Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34354>	2025-04-04 20:14:52 +00:00
Ian Romanick	20cce95ce5	brw/opt: Don't call brw_opt_copy_propagation before brw_lower_load_reg On a 36c/72t Xeon system, performance of replaying hogwarts_legacy.dx12vk-ultra.foz was improved 1.3% +/- 0.77% (n=10). I picked MTL for the fossil-db results because it was the most negative. shader-db: All Intel platforms had fairly similar results. (Lunar Lake) total instructions in shared programs: 16964217 -> 16964216 (<.01%) instructions in affected programs: 51777 -> 51776 (<.01%) helped: 20 / HURT: 27 total cycles in shared programs: 892934916 -> 893041912 (0.01%) cycles in affected programs: 51245298 -> 51352294 (0.21%) helped: 96 / HURT: 78 fossil-db: All Intel platforms had similar results. (Meteor Lake shown) Totals: Instrs: 233678547 -> 233678944 (+0.00%); split: -0.00%, +0.00% Cycle count: 24398049850 -> 24400490877 (+0.01%); split: -0.01%, +0.02% Max live registers: 42145052 -> 42145038 (-0.00%); split: -0.00%, +0.00% Totals from 1141 (0.14% of 805934) affected shaders: Instrs: 1546001 -> 1546398 (+0.03%); split: -0.01%, +0.03% Cycle count: 1201746062 -> 1204187089 (+0.20%); split: -0.14%, +0.34% Max live registers: 84247 -> 84233 (-0.02%); split: -0.03%, +0.01% Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31497>	2025-04-04 06:45:02 +00:00
Ian Romanick	991a2f510b	brw/sat: Eliminate non-defs saturate propagation The intervening_saturating_copy test is removed. The defs version of the pass does not handle this case. It should not occur often in practice anyway. Copy propagation and brw_nir_opt_fsat should prevent this scenario from happening. No shader-db changes on any Intel platform. fossil-db: All Intel platforms had similar results. (Lunar Lake shown) Totals: Instrs: 212677275 -> 212677278 (+0.00%) Cycle count: 30466062848 -> 30466056040 (-0.00%) Totals from 1 (0.00% of 706300) affected shaders: Instrs: 1343 -> 1346 (+0.22%) Cycle count: 411664 -> 404856 (-1.65%) v2: Stop counting ip. The non-defs part of the pass was the only thing that used it. v3: Also delete "if (block != def->block) continue;" code. I noticed this while working on some other changes to this function. It's the last thing in the loop, so it's totally useless. Delete some other spurious continues too. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> [v2] Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31497>	2025-04-04 06:45:02 +00:00
Ian Romanick	cc5a6a5ae8	brw/sat: Convert tests to use load_reg This is in preparation for a commit that removes the non-defs version of the pass. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31497>	2025-04-04 06:45:02 +00:00
Ian Romanick	2d13acf9d9	brw: Add passes to generate and lower load_reg v2: Add support for WE_all instructions... this already just worked, so I only had to delete the check and the FINISHME comment. v3: Use logic more like def_analysis::update_for_reads to determine when to not insert LOAD_REG instructions. Based on a suggestion by Ken. v4: Eliminate "store" from all the names since STORE_REG does not exist anymore. Fold insert_load_reg into brw_insert_load_reg. Eliminate extra call to s.def_analysis.require() after progress. Pull a loop-invariant check out of the inst->sources loop. Drop call to brw_opt_split_virtual_grfs after lowering load_reg. All suggested by Caio. v5: Assert that LOAD_REG doesn't already exist in brw_insert_load_reg. Update comment before fully_defines. Both suggested by Caio. v6: Don't explicitly special-case SHADER_OPCODE_MEMORY_STORE_LOGICAL. Move the inst->dst.file != VGRF check earlier to avoid the loop over sources. Both suggested by Ken. Move the call to brw_insert_load_reg a little bit later, and explain why it's at that location. Suggested by Caio. v7: Many changes to the for-each-source loop in brw_insert_load_reg. Removes incorrect multiplication of s.alloc.sizes with reg_unit. Adds checks for matching SIMD size and NoMask in the search for pre-existing LOAD_REG of same value. v8: Add some unit tests. Suggested by Caio.
shader-db: Lunar Lake total instructions in shared programs: 16923237 -> 16921895 (<.01%) instructions in affected programs: 450565 -> 449223 (-0.30%) helped: 251 / HURT: 377 total cycles in shared programs: 910428418 -> 889920590 (-2.25%) cycles in affected programs: 719248184 -> 698740356 (-2.85%) helped: 9076 / HURT: 9082 total fills in shared programs: 2242 -> 2218 (-1.07%) fills in affected programs: 116 -> 92 (-20.69%) helped: 2 / HURT: 0 total sends in shared programs: 848635 -> 848421 (-0.03%) sends in affected programs: 810 -> 596 (-26.42%) helped: 10 / HURT: 0 LOST: 82 GAINED: 78 Meteor Lake and DG2 had similar results. (Meteor Lake shown) total instructions in shared programs: 19875784 -> 19871694 (-0.02%) instructions in affected programs: 1050091 -> 1046001 (-0.39%) helped: 251 / HURT: 2403 total cycles in shared programs: 905328238 -> 882446458 (-2.53%) cycles in affected programs: 682736344 -> 659854564 (-3.35%) helped: 7869 / HURT: 7911 total spills in shared programs: 5512 -> 5032 (-8.71%) spills in affected programs: 1830 -> 1350 (-26.23%) helped: 8 / HURT: 0 total fills in shared programs: 5648 -> 4782 (-15.33%) fills in affected programs: 3312 -> 2446 (-26.15%) helped: 8 / HURT: 0 total sends in shared programs: 1032942 -> 1032722 (-0.02%) sends in affected programs: 572 -> 352 (-38.46%) helped: 10 / HURT: 0 LOST: 138 GAINED: 53 Tiger Lake total instructions in shared programs: 19711930 -> 19715591 (0.02%) instructions in affected programs: 1040623 -> 1044284 (0.35%) helped: 317 / HURT: 2474 total cycles in shared programs: 862988990 -> 860573870 (-0.28%) cycles in affected programs: 612392461 -> 609977341 (-0.39%) helped: 7447 / HURT: 7686 total sends in shared programs: 1034763 -> 1034555 (-0.02%) sends in affected programs: 784 -> 576 (-26.53%) helped: 8 / HURT: 0 LOST: 56 GAINED: 143 Ice Lake and Skylake had similar results. 
(Ice Lake shown) total instructions in shared programs: 20545461 -> 20545220 (<.01%) instructions in affected programs: 422405 -> 422164 (-0.06%) helped: 180 / HURT: 459 total cycles in shared programs: 872697345 -> 866874523 (-0.67%) cycles in affected programs: 573117917 -> 567295095 (-1.02%) helped: 6783 / HURT: 6980 total spills in shared programs: 4335 -> 4336 (0.02%) spills in affected programs: 90 -> 91 (1.11%) helped: 1 / HURT: 2 total fills in shared programs: 4194 -> 4196 (0.05%) fills in affected programs: 463 -> 465 (0.43%) helped: 1 / HURT: 2 total sends in shared programs: 1079446 -> 1079238 (-0.02%) sends in affected programs: 784 -> 576 (-26.53%) helped: 8 / HURT: 0 LOST: 117 GAINED: 37 fossil-db: All Intel platforms had similar results. (Lunar Lake shown) Totals: Instrs: 209708136 -> 209695617 (-0.01%); split: -0.02%, +0.01% Send messages: 10927753 -> 10927640 (-0.00%) Cycle count: 30540172048 -> 30427084732 (-0.37%); split: -0.99%, +0.62% Spill count: 511621 -> 510932 (-0.13%); split: -0.22%, +0.08% Fill count: 621166 -> 618440 (-0.44%); split: -0.56%, +0.12% Scratch Memory Size: 35574784 -> 35648512 (+0.21%); split: -0.06%, +0.26% Max live registers: 65453860 -> 65453140 (-0.00%); split: -0.00%, +0.00% Non SSA regs after NIR: 75374990 -> 35195764 (-53.31%) Totals from 503284 (71.25% of 706391) affected shaders: Instrs: 180203778 -> 180191259 (-0.01%); split: -0.02%, +0.01% Send messages: 9699732 -> 9699619 (-0.00%) Cycle count: 30080349592 -> 29967262276 (-0.38%); split: -1.01%, +0.63% Spill count: 511584 -> 510895 (-0.13%); split: -0.22%, +0.08% Fill count: 621120 -> 618394 (-0.44%); split: -0.56%, +0.12% Scratch Memory Size: 35443712 -> 35517440 (+0.21%); split: -0.06%, +0.27% Max live registers: 52566092 -> 52565372 (-0.00%); split: -0.01%, +0.00% Non SSA regs after NIR: 70110949 -> 29931723 (-57.31%) Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31497>	2025-04-04 06:45:02 +00:00
Ian Romanick	8b2be206f3	brw/algebraic: Constant folding for BROADCAST and SHUFFLE This prevents assertion failures in brw_eu_emit in a later commit in this MR. Even though they have not been previously observed, these assertion failures could happen even without that commit. No shader-db or fossil-db changes on any Intel platform. Fixes: `04e1783278` ("brw: Call brw_fs_opt_algebraic less often") v2: Add SHUFFLE. Suggested by Ken. Fixed indentation. v3: Update BROADCAST exec_size after rebasing on "brw/build: Use SIMD8 temporaries in emit_uniformize". v4: Explain why munging the exec_size is correct. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31497>	2025-04-04 06:45:02 +00:00
Ian Romanick	1b997c7bcc	brw/coalesce: Prepare brw_opt_register_coalesce for load_reg v2: Explain the problematic situation a little better in the comment. Suggested by Caio. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31497>	2025-04-04 06:45:02 +00:00
Ian Romanick	15637334ce	brw/copy: Prepare copy_propagation for load_reg The changes to try_copy_propagate will be removed later in the series. v2: Fix up some comments to note that offset != 0 is allowed only when stride == 0. Apply same offset=0 restriction in try_copy_propagate_def too. Allow copy propagation if the source is either a def or UNIFORM. Don't copy prop a load_reg through a non-def value. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31497>	2025-04-04 06:45:02 +00:00
Ian Romanick	cfc50390fb	brw: Add basic infrastructure for load_reg pseudo op load_reg is something like load_payload except it has a single source. It copies the entire source to the destination. Its purpose is to convert a non-SSA VGRF into an SSA value. This copy is marked as volatile so that it will act as a scheduling barrier. v2: Fix some typos in the commit message. Eliminate the brw_builder::LOAD_REG overload that returns a brw_inst*. This is unlikely to ever be used. Add some checks to brw_validate. All suggested by Caio. v3: Force the source and destination types of the LOAD_REG to be integer. This will (eventually) simplify the creation of unit tests for the pass that adds LOAD_REG instructions. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31497>	2025-04-04 06:45:02 +00:00
Ian Romanick	b9656d51c0	brw/opt: Move non-SSA register accounting after first brw_opt_split_virtual_grfs v2: Move to immediately before the main optimization loop. Most importantly, this is after the first call to DCE. fossil-db: All Intel platforms had similar results. (Lunar Lake shown) Totals: Non SSA regs after NIR: 237045283 -> 100183460 (-57.74%); split: -58.12%, +0.39% Totals from 701423 (99.26% of 706657) affected shaders: Non SSA regs after NIR: 236868848 -> 100007025 (-57.78%); split: -58.17%, +0.39% Suggested-by: Ken Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31497>	2025-04-04 06:45:02 +00:00
Caleb Callaway	5ad00bae8b	intel/compiler: fix lingering i965 references Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34351>	2025-04-03 03:17:25 +00:00
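
Several commits in this listing (fb50461220, 6509f8139d, 7457c4ecfd) introduce a brw_range struct, add a last() helper for the last valid IP, and then switch the struct to half-open ranges. The sketch below illustrates the general half-open convention only. It is not the Mesa code: the name ip_range, its fields, and every helper other than the idea of last() are assumptions made for illustration.

```cpp
// Hypothetical illustration of a half-open instruction range [start, end).
// This is NOT the real brw_range from Mesa; names and helpers are assumed.
struct ip_range {
   int start;  // first IP covered by the range
   int end;    // one past the last covered IP

   constexpr bool is_empty() const { return end <= start; }
   constexpr int len() const { return is_empty() ? 0 : end - start; }

   // Last valid IP. Only meaningful when the range is non-empty.
   constexpr int last() const { return end - 1; }

   constexpr bool contains(int ip) const { return ip >= start && ip < end; }
};

// Two non-empty half-open ranges overlap when each starts before the
// other ends. Ranges that merely touch (a.end == b.start) do not overlap.
constexpr bool overlaps(ip_range a, ip_range b)
{
   return !a.is_empty() && !b.is_empty() &&
          a.start < b.end && b.start < a.end;
}

// Smallest range covering both inputs (assumes both are non-empty).
constexpr ip_range merge(ip_range a, ip_range b)
{
   return ip_range{ a.start < b.start ? a.start : b.start,
                    a.end > b.end ? a.end : b.end };
}

// Compile-time checks of the convention.
static_assert(ip_range{2, 5}.len() == 3, "length is end - start");
static_assert(ip_range{2, 5}.last() == 4, "last valid IP is end - 1");
static_assert(!overlaps(ip_range{0, 3}, ip_range{3, 6}), "touching only");
```

With this convention the length is simply end - start and an empty range needs no sentinel value, which is one common reason to prefer half-open ranges over storing the last valid index directly.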

4262 commits