fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-23 13:20:14 +01:00

Author	SHA1	Message	Date
Caio Oliveira	e7179232c9	intel/brw: Move encoding of Gfx11 3-src inside the inst helpers Create specific helper for register file encoding and handle it there. Use ad-hoc structs to let the macro take optional named arguments. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30704>	2024-08-25 22:08:14 +00:00
Caio Oliveira	d31c8bfb6f	intel/brw: Remove more uses of variable length arrays In these cases there's a clear bound we can use. In C++ this is a compiler extension and not compatible with zero initializing a regular struct -- which will happen in a later change. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30704>	2024-08-25 22:08:14 +00:00
Caio Oliveira	86c20e2910	intel/brw: Use a helper for common VEC pattern In the helper, instead of using a variable length array, use a fixed-size array sized to NIR_MAX_VEC_COMPONENTS. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30704>	2024-08-25 22:08:14 +00:00
Caio Oliveira	abc535a3b4	intel/brw: Remove unused variable Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30704>	2024-08-25 22:08:13 +00:00
Kenneth Graunke	b97e10208c	intel/brw: Add a file parameter to idom_tree::dump() The other dump methods in this file also take a file parameter, defaulting to stderr. Dumping dot files to stdout is probably not what anybody really wanted. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30530>	2024-08-22 22:54:45 +00:00
Kenneth Graunke	bb4f05005e	intel/brw: Print blocks in brw_print_instructions_to_file() Useful when examining the control flow graph. For some reason, we printed this for the final assembly but not the IR. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30530>	2024-08-22 22:54:45 +00:00
Kenneth Graunke	2d73e42333	intel/brw: Fix OOB reads when printing instructions post-reg-alloc Post-register allocation, but before brw_fs_lower_vgrfs_to_fixed_grfs, we have registers with the VGRF file but they are actually fixed GRFs. brw_print_instructions_to_file() was seeing VGRFs and trying to access their size, but using bogus register numbers that could be out-of-bounds. Detect when we're post-RA and avoid doing this. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30530>	2024-08-22 22:54:45 +00:00
Lionel Landwerlin	d9406658ed	brw: remove unused prog_data field Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30713>	2024-08-22 19:44:40 +00:00
Kenneth Graunke	6a292c2699	intel: Fix bad align_offset on global_constant_uniform_block_intel We were specifying align_offset = 64 and align_mul = 64, which is invalid. nir_combined_align() asserts that align_offset < align_mul. Our intention here is to perform cacheline-aligned (64B-aligned) block loads, so we should set align_mul = 64 and can leave align_offset = 0. Fixes: `fbafa9cabd` ("intel/nir: remove load_global_const_block_intel intrinsic") Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30755>	2024-08-21 20:44:57 +00:00
Ian Romanick	c96ceb50d0	intel/brw/xe2: Allow int64 conversions As far as I can tell from looking at the Bspec, MOV between integers of all sizes appears to be supported. shader-db: total instructions in shared programs: 17480631 -> 17480535 (<.01%) instructions in affected programs: 26284 -> 26188 (-0.37%) helped: 21 / HURT: 13 total cycles in shared programs: 897601907 -> 897664293 (<.01%) cycles in affected programs: 10929664 -> 10992050 (0.57%) helped: 48 / HURT: 45 fossil-db: Totals: Instrs: 140686824 -> 140686155 (-0.00%); split: -0.00%, +0.00% Cycle count: 21525129188 -> 21524717729 (-0.00%); split: -0.01%, +0.00% Spill count: 70778 -> 70776 (-0.00%) Fill count: 139172 -> 139168 (-0.00%) Max live registers: 47513859 -> 47513795 (-0.00%) Totals from 612 (0.11% of 549272) affected shaders: Instrs: 964441 -> 963772 (-0.07%); split: -0.09%, +0.02% Cycle count: 1215564312 -> 1215152853 (-0.03%); split: -0.09%, +0.06% Spill count: 16172 -> 16170 (-0.01%) Fill count: 37962 -> 37958 (-0.01%) Max live registers: 70749 -> 70685 (-0.09%) Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30700>	2024-08-21 20:16:00 +00:00
Francisco Jerez	71ca8529c5	intel/brw/gfx12.5+: Fix IR of sub-dword atomic LSC operations. We were emitting logical atomic instructions with a packed destination region for sub-dword LSC atomics, along the lines of: > untyped_atomic_logical(32) dst<1>:HF, ... However, these instructions use an LSC data size D16U32, which means that the 16b data on the return payload is expanded to 32b by the LSC shared function, so we were lying to the compiler about the location of the individual channels on the return payload, its execution masking, etc. This is why the hacks that manually set the 'inst->size_written' of the instruction were required. In some cases this worked, but any non-trivial manipulation of the instruction destination by lowering or optimization passes could have led to corruption, as has been reproduced in deqp-vk during lower_simd_width() for shaders that use 16-bit atomics in SIMD32 dispatch mode. Note that LSC sub-dword reads aren't affected by this because they use raw UD destinations and specify the actual bit size of the operation datatype as the immediate SURFACE_LOGICAL_SRC_IMM_ARG, which doesn't work for atomic operations since that immediate specifies the atomic opcode. Instead, have the logical operation implement the behavior of 16-bit destinations correctly rather than silently replacing the 16-bit region with an inconsistent 32-bit region. This is done by emitting the MOV instructions used to pack the data from the UD temporary into the packed destination from the lower_logical_sends() pass instead of from the NIR translation pass. Fixes: `43169dbbe5` ("intel/compiler: Support 16 bit float ops") Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30683>	2024-08-21 02:33:12 +00:00
Kenneth Graunke	d22d6d814d	intel/brw: Fix Xe2+ SWSB encoding/decoding for DPAS instructions SBID SET can only be used on SEND, SENDC, or DPAS instructions. The existing code was handling SET for SEND/SENDC, but was using the wrong encoding for DPAS. Add a new case to handle that and make it clear that the existing code is only for SEND/SENDC. While here, rewrite the encoder to use 2-bit binary immediates shifted up into the mode [9:8] field, rather than pre-shifted hex values. This matches the documentation better and is a little easier to follow. On the decode side, we were incorrectly decoding MATH instructions. Because they're marked is_unordered, we were hitting the SEND/SENDC decoding, which is incorrect for MATH. Fixes 22 cooperative matrix tests on Lunar Lake. Huge thanks to Paulo Zanoni for bisecting failures to one of my commits, then analyzing shaders and experimenting to discover that the failure was really an unrelated bug, just being provoked by different choices of registers. His work narrowing the problem down made it much easier to discover and fix this bug. Backport-to: 24.2 Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30705>	2024-08-20 19:09:37 +00:00
Kenneth Graunke	89f9a6e10b	intel/brw: Pass opcode to brw_swsb_encode/decode We're going to need to handle encoding/decoding differently for DPAS vs. SEND/SENDC vs. other instructions. Pass the opcode so we can figure out the encodings for each type of instruction. Backport-to: 24.2 Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30705>	2024-08-20 19:09:37 +00:00
Caio Oliveira	40f77b6936	intel/brw: Avoid modifying the shader in assign_curb_setup if not needed If there are no uniforms to push, don't emit the AND or invalidate the shader analysis. This affects only compute shaders. Not a significant impact since lots of shaders end up pushing uniforms. Fossil-db numbers (restricted to compute pipelines only) for DG2 ``` Totals: Instrs: 3071016 -> 3070894 (-0.00%) Cycle count: 8320268863 -> 8320264519 (-0.00%) Totals from 122 (2.70% of 4520) affected shaders: Instrs: 10675 -> 10553 (-1.14%) Cycle count: 2060003 -> 2055659 (-0.21%) ``` Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30631>	2024-08-17 16:25:01 -07:00
Sagar Ghuge	c4f2a8d984	intel/compiler: Fix indirect offset in GS input read for Xe2+ Make sure to take new GRF size into consideration and adjust the indirect offset according to new size so that when we do the indirect load with address register, we load right values. This helps pass the following tests: - dEQP-VK.binding_model.descriptor_buffer.mutable_descriptor.geom - dEQP-VK.ray_query.geometry_shader. Backport-to: 24.2 Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30679>	2024-08-16 18:40:13 +00:00
Ian Romanick	c8038643b8	intel/brw: Make ifind_msb SSA friendly No shader-db changes on any Intel platform. v2: Use negate(tmp) instead of creating a new temporary. Suggested by Ken. fossil-db: Meteor Lake, DG2, and Skylake had similar results. (Meteor Lake shown) Totals: Instrs: 152535897 -> 152535883 (-0.00%); split: -0.00%, +0.00% Cycle count: 17112329592 -> 17112406110 (+0.00%); split: -0.06%, +0.06% Totals from 40 (0.01% of 633223) affected shaders: Instrs: 458813 -> 458799 (-0.00%); split: -0.01%, +0.00% Cycle count: 4358016282 -> 4358092800 (+0.00%); split: -0.23%, +0.24% Tiger Lake and Ice Lake had similar results. (Tiger Lake shown) Totals: Instrs: 150560511 -> 150560465 (-0.00%); split: -0.00%, +0.00% Cycle count: 15484534441 -> 15482372893 (-0.01%); split: -0.12%, +0.11% Spill count: 59795 -> 59794 (-0.00%) Fill count: 103513 -> 103509 (-0.00%) Totals from 40 (0.01% of 632445) affected shaders: Instrs: 368877 -> 368831 (-0.01%); split: -0.01%, +0.00% Cycle count: 3918398264 -> 3916236716 (-0.06%); split: -0.49%, +0.43% Spill count: 16896 -> 16895 (-0.01%) Fill count: 27819 -> 27815 (-0.01%) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30650>	2024-08-16 14:52:04 +00:00
Ian Romanick	e9c151fde6	intel/brw: Make 16-bit ishl, ishr, and ushr SSA friendly No shader-db changes on any Intel platform. fossil-db: All Intel platforms had similar results. (Meteor Lake shown) Totals: Instrs: 152536266 -> 152535897 (-0.00%); split: -0.00%, +0.00% Cycle count: 17124901233 -> 17112329592 (-0.07%); split: -0.07%, +0.00% Spill count: 78571 -> 78525 (-0.06%) Fill count: 148178 -> 148132 (-0.03%) Totals from 210 (0.03% of 633223) affected shaders: Instrs: 514525 -> 514156 (-0.07%); split: -0.16%, +0.08% Cycle count: 4003540698 -> 3990969057 (-0.31%); split: -0.32%, +0.00% Spill count: 15632 -> 15586 (-0.29%) Fill count: 26241 -> 26195 (-0.18%) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30650>	2024-08-16 14:52:04 +00:00
Lionel Landwerlin	fbafa9cabd	intel/nir: remove load_global_const_block_intel intrinsic load_global_constant_uniform_block_intel is equivalent in terms of loading, then for the predicate we just do a bcsel afterward in places where that is required. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30659>	2024-08-16 11:12:39 +00:00
Caio Oliveira	6267585778	intel/brw: Also return the size of the assembled shader Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30062>	2024-08-14 03:03:46 +00:00
Sagar Ghuge	83c2524124	intel/compiler: Adjust trace ray control field on Xe2 Bspec 64643: Structure_TraceRayPayload::Trace Ray Control Bit field moved from 9-8 to 10-8 on Xe2. Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30600>	2024-08-13 20:02:24 +00:00
Sagar Ghuge	c3c62e493f	intel/compiler: Ray query requires write-back register Bspec 57508: Structure_SIMD16TraceRayMessage:: RayQuery Enable "When this bit is set in the header, Trace Ray Message behaves like a Ray Query. This message requires a write-back message indicating RayQuery for all valid Rays (SIMD lanes) have completed." If we don't pass the write-back register, the message somehow steps on the R0 register, which can mess up the scratch space accesses and could potentially lead to a GPU hang. This can be seen when running under a simulator trace: send.rta (16|M0) null r124 r126:1 0x0 0x02000100 {$15} // wr:1+1, rd:0; simd16 trace ray R0 = 00000001 00000000 00000000 00000001 00000000 00000000 00000001 00000000 00000000 00000001 00000000 00000000 00000001 00000000 00000000 00000001 Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Suggested-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30600>	2024-08-13 20:02:24 +00:00
Alyssa Rosenzweig	5f437aa24d	elk: fix compute shader derivatives Derivatives are not FS-only, so move them to be with the rest of the subgroup ops. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11674 Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30634>	2024-08-13 12:19:30 +00:00
Lionel Landwerlin	aaff191356	brw/rt: fix ray_object_(direction|origin) for closest-hit shaders When the closest-hit shader is called, the BVH object-level brw_nir_rt_load_mem_ray origin/direction is 0. We should instead use the ray origin/direction and apply the transform of the current instance. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `9ba7d459a3` ("intel/rt: Implement the new ray-tracing system values") Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30578>	2024-08-13 10:28:50 +00:00
Ian Romanick	119801e647	intel/brw: Move fsat instructions closer to the source Intel GPUs have a saturate destination modifier, and brw_fs_opt_saturate_propagation tries to replace explicit saturate operations with this destination modifier. That pass is limited in several ways. If the source of the explicit saturate is in a different block or if the source of the explicit saturate is live after the explicit saturate, brw_fs_opt_saturate_propagation will be unable to make progress. This optimization exists to help brw_fs_opt_saturate_propagation make more progress. It tries to move NIR fsat instructions to the same block that contains the definition of its source. It does this only in cases where it will not create additional live values. It also attempts to do this only in cases where the explicit saturate will ultimately be converted to a destination modifier. v2: Fix metadata_preserve when there's no progress and use nir_metadata_control_flow when there is progress. All suggested by Alyssa. v3: Fix a typo in the file header comment. Noticed by Ken. Don't require nir_metadata_instr_index. Use nir_def_rewrite_uses_after instead of open-coding something slightly more specific. Both suggested by Ken. shader-db: All Intel platforms had similar results. (Meteor Lake shown) total instructions in shared programs: 19733645 -> 19733028 (<.01%) instructions in affected programs: 193300 -> 192683 (-0.32%) helped: 246 HURT: 1 helped stats (abs) min: 2 max: 48 x̄: 2.51 x̃: 2 helped stats (rel) min: 0.18% max: 0.39% x̄: 0.33% x̃: 0.34% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.31% max: 0.31% x̄: 0.31% x̃: 0.31% 95% mean confidence interval for instructions value: -2.87 -2.13 95% mean confidence interval for instructions %-change: -0.34% -0.32% Instructions are helped. total cycles in shared programs: 916180971 -> 916264656 (<.01%) cycles in affected programs: 30197180 -> 30280865 (0.28%) helped: 194 HURT: 142 helped stats (abs) min: 1 max: 21251 x̄: 872.75 x̃: 19 helped stats (rel) min: <.01% max: 23.17% x̄: 2.59% x̃: 0.23% HURT stats (abs) min: 1 max: 28058 x̄: 1781.68 x̃: 399 HURT stats (rel) min: <.01% max: 37.21% x̄: 4.85% x̃: 1.63% 95% mean confidence interval for cycles value: -196.84 694.97 95% mean confidence interval for cycles %-change: -0.17% 1.27% Inconclusive result (value mean confidence interval includes 0). fossil-db: Meteor Lake, DG2, and Tiger Lake had similar results. (Meteor Lake shown) Totals: Instrs: 151512021 -> 151511351 (-0.00%); split: -0.00%, +0.00% Cycle count: 17209013596 -> 17209840995 (+0.00%); split: -0.02%, +0.02% Max live registers: 32013312 -> 32013549 (+0.00%) Max dispatch width: 5512304 -> 5512136 (-0.00%) Totals from 774 (0.12% of 630172) affected shaders: Instrs: 1559285 -> 1558615 (-0.04%); split: -0.05%, +0.01% Cycle count: 1312656268 -> 1313483667 (+0.06%); split: -0.24%, +0.30% Max live registers: 82195 -> 82432 (+0.29%) Max dispatch width: 6664 -> 6496 (-2.52%) Ice Lake Totals: Instrs: 151416791 -> 151416137 (-0.00%); split: -0.00%, +0.00% Cycle count: 15162468885 -> 15163298824 (+0.01%); split: -0.00%, +0.01% Max live registers: 32471367 -> 32471603 (+0.00%) Max dispatch width: 5623752 -> 5623712 (-0.00%) Totals from 733 (0.12% of 635598) affected shaders: Instrs: 877965 -> 877311 (-0.07%); split: -0.09%, +0.01% Cycle count: 190763628 -> 191593567 (+0.44%); split: -0.21%, +0.64% Max live registers: 72067 -> 72303 (+0.33%) Max dispatch width: 6216 -> 6176 (-0.64%) Skylake Totals: Instrs: 140794845 -> 140794075 (-0.00%); split: -0.00%, +0.00% Cycle count: 14665159301 -> 14665320514 (+0.00%); split: -0.00%, +0.01% Max live registers: 31783341 -> 31783662 (+0.00%); split: -0.00%, +0.00% Totals from 659 (0.11% of 625670) affected shaders: Instrs: 829061 -> 828291 (-0.09%); split: -0.09%, +0.00% Cycle count: 185478478 -> 185639691 (+0.09%); split: -0.33%, +0.41% Max live registers: 67491 -> 67812 (+0.48%); split: -0.01%, +0.48% Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29774>	2024-08-09 14:26:10 -07:00
Ian Romanick	f5815a003e	intel/brw: Use def analysis for simple cases of saturate propagation I had hoped this would improve compilation performance too. I tried several different long running fossils, and there was no difference. Fossil-db results are all over the place from platform to platform. All of the Tiger Lake shaders hurt for spills and fills are fragment shaders in rdr2. shader-db: All Intel platforms had similar results. (Meteor Lake shown) total instructions in shared programs: 19734088 -> 19733645 (<.01%) instructions in affected programs: 71200 -> 70757 (-0.62%) helped: 186 HURT: 0 helped stats (abs) min: 1 max: 7 x̄: 2.38 x̃: 1 helped stats (rel) min: 0.06% max: 2.79% x̄: 0.83% x̃: 0.48% 95% mean confidence interval for instructions value: -2.69 -2.07 95% mean confidence interval for instructions %-change: -0.93% -0.72% Instructions are helped. total cycles in shared programs: 916290473 -> 916180971 (-0.01%) cycles in affected programs: 3403719 -> 3294217 (-3.22%) helped: 89 HURT: 88 helped stats (abs) min: 1 max: 36685 x̄: 1424.13 x̃: 10 helped stats (rel) min: <.01% max: 26.75% x̄: 1.66% x̃: 0.46% HURT stats (abs) min: 1 max: 8750 x̄: 195.98 x̃: 7 HURT stats (rel) min: <.01% max: 17.12% x̄: 1.57% x̃: 0.19% 95% mean confidence interval for cycles value: -1199.88 -37.43 95% mean confidence interval for cycles %-change: -0.66% 0.56% Inconclusive result (%-change mean confidence interval includes 0). fossil-db: Meteor Lake and DG2 had similar results. (Meteor Lake shown) Totals: Instrs: 151458346 -> 151457413 (-0.00%) Cycle count: 17202426472 -> 17202406469 (-0.00%); split: -0.00%, +0.00% Max live registers: 31989626 -> 31989959 (+0.00%); split: -0.00%, +0.00% Max dispatch width: 5500560 -> 5500384 (-0.00%) Totals from 479 (0.08% of 628970) affected shaders: Instrs: 398836 -> 397903 (-0.23%) Cycle count: 18064565 -> 18044562 (-0.11%); split: -0.40%, +0.29% Max live registers: 36663 -> 36996 (+0.91%); split: -0.02%, +0.92% Max dispatch width: 4392 -> 4216 (-4.01%) Tiger Lake Totals: Instrs: 149913036 -> 149912182 (-0.00%); split: -0.00%, +0.00% Cycle count: 15560086488 -> 15560135139 (+0.00%); split: -0.00%, +0.00% Spill count: 61241 -> 61251 (+0.02%) Fill count: 107304 -> 107314 (+0.01%) Max live registers: 31964752 -> 31965119 (+0.00%); split: -0.00%, +0.00% Max dispatch width: 5517568 -> 5517248 (-0.01%) Totals from 486 (0.08% of 628673) affected shaders: Instrs: 396065 -> 395211 (-0.22%); split: -0.23%, +0.01% Cycle count: 17677691 -> 17726342 (+0.28%); split: -0.23%, +0.51% Spill count: 1302 -> 1312 (+0.77%) Fill count: 3746 -> 3756 (+0.27%) Max live registers: 37538 -> 37905 (+0.98%); split: -0.02%, +0.99% Max dispatch width: 4576 -> 4256 (-6.99%) Ice Lake Totals: Instrs: 151348422 -> 151347463 (-0.00%) Cycle count: 15155678386 -> 15155691726 (+0.00%); split: -0.00%, +0.00% Fill count: 108114 -> 108111 (-0.00%) Max live registers: 32444479 -> 32444814 (+0.00%); split: -0.00%, +0.00% Max dispatch width: 5611288 -> 5611256 (-0.00%) Totals from 483 (0.08% of 634352) affected shaders: Instrs: 393333 -> 392374 (-0.24%) Cycle count: 16706439 -> 16719779 (+0.08%); split: -0.14%, +0.22% Fill count: 3654 -> 3651 (-0.08%) Max live registers: 37246 -> 37581 (+0.90%); split: -0.02%, +0.92% Max dispatch width: 4312 -> 4280 (-0.74%) Skylake Totals: Instrs: 140741190 -> 140734481 (-0.00%); split: -0.00%, +0.00% Cycle count: 14659096516 -> 14659116346 (+0.00%); split: -0.00%, +0.00% Max live registers: 31757558 -> 31757725 (+0.00%) Max dispatch width: 5470040 -> 5469920 (-0.00%) Totals from 3542 (0.57% of 624449) affected shaders: Instrs: 3081309 -> 3074600 (-0.22%); split: -0.22%, +0.00% Cycle count: 228843073 -> 228862903 (+0.01%); split: -0.11%, +0.12% Max live registers: 304531 -> 304698 (+0.05%) Max dispatch width: 31016 -> 30896 (-0.39%) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29774>	2024-08-09 14:26:05 -07:00
Ian Romanick	adcce2bba4	intel/brw: Small code refactor in brw_fs_opt_saturate_propagation This bit of code will have a second use in the next commit. v2: Fix some broken indentation. Noticed by Ken. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29774>	2024-08-09 14:26:03 -07:00
Ian Romanick	9125b7c1b4	intel/elk: Don't propagate saturate to an instruction that writes flags There are two problems. 1. This is not NaN safe. 'add.le.sat dst F, Inf F, -Inf F' has a different result than 'add dst F, Inf F, -Inf F; cmp.le null, dst F, 0F'. 2. Ignoring the first problem, this only produces the desired flags for LE and G. All other cases can produce the wrong result. shader-db: All Intel platforms had similar results. (Broadwell shown) total instructions in shared programs: 18282314 -> 18282316 (<.01%) instructions in affected programs: 78 -> 80 (2.56%) helped: 0 HURT: 2 total cycles in shared programs: 952924234 -> 952924252 (<.01%) cycles in affected programs: 584 -> 602 (3.08%) helped: 0 HURT: 2 Fixes: `e6022281f2` ("intel/elk: Rename files to use elk prefix") Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29774>	2024-08-09 14:26:01 -07:00
Ian Romanick	3d8fea0e09	intel/brw: Don't propagate saturate to an instruction that writes flags There are two problems. 1. This is not NaN safe. 'add.le.sat dst F, Inf F, -Inf F' has a different result than 'add dst F, Inf F, -Inf F; cmp.le null, dst F, 0F'. 2. Ignoring the first problem, this only produces the desired flags for LE and G. All other cases can produce the wrong result. For example, batman_arkham_city_goty.foz 6a63c4caacaa0dae has the following code: mad.ge.f0.0(8) g51<1>F g50<8,8,1>F g46<8,8,1>F g11<1,1,1>F mov.sat(8) g52<1>F g51<1,1,0>F ... (+f0.0) sel(8) g54<1>UD g53<8,8,1>UD 0x3f000000UD Without this commit, the saturate is incorrectly propagated to the MAD. A similar case exists in witcher_3_dxvk_g2.foz 5b03243be667a275. There are even worse cases like total_war_warhammer3.dx12vk-g6.foz 78328466761ef7ab and ee920491573860fc. The former has the following code (and the latter has very similar code): mad.l.f0.0(16) g95<1>F g93<8,8,1>F g62<8,8,1>F g68<1,1,1>F ... mov.sat(16) g109<1>F -g95<1,1,0>F ... (+f0.0) sel(16) g68<1>UD g111<1,1,0>UD g54<1,1,0>UD (+f0.0) sel(16) g70<1>UD g113<1,1,0>UD g56<1,1,0>UD (+f0.0) sel(16) g72<1>UD g115<1,1,0>UD g58<1,1,0>UD Saturate propagation makes a hash of this code: mad.sat.l.f0.0(16) g106<1>F -g93<8,8,1>F -g62<8,8,1>F g68<1,1,1>F ... (+f0.0) sel(16) g70<1>UD g110<1,1,0>UD g56<1,1,0>UD (+f0.0) sel(16) g72<1>UD g112<1,1,0>UD g58<1,1,0>UD (+f0.0) sel(16) g68<1>UD g108<1,1,0>UD g54<1,1,0>UD Not only is the saturate incorrectly applied to the MAD, but the MAD result is negated without changing the conditional modifier to G! NOTE: Backports of this commit to stable branches may need to be more like the following commit to elk. shader-db: All Intel platforms had similar results. (Meteor Lake shown) total instructions in shared programs: 19729375 -> 19729377 (<.01%) instructions in affected programs: 112 -> 114 (1.79%) helped: 0 HURT: 2 total cycles in shared programs: 916234266 -> 916234288 (<.01%) cycles in affected programs: 636 -> 658 (3.46%) helped: 0 HURT: 2 fossil-db: All Intel platforms had similar results. (Meteor Lake shown) Totals: Instrs: 151531594 -> 151531601 (+0.00%) Cycle count: 17209107419 -> 17209107474 (+0.00%); split: -0.00%, +0.00% Totals from 6 (0.00% of 630198) affected shaders: Instrs: 4550 -> 4557 (+0.15%) Cycle count: 194629 -> 194684 (+0.03%); split: -0.00%, +0.03% Fixes: `947c828d5c` ("i965/fs: Add a saturation propagation optimization pass.") Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29774>	2024-08-09 14:25:57 -07:00
Ian Romanick	6da4649191	intel/brw: Eliminate dead flag writes This prevents a couple small regressions in the next commit. The only changes in shader-db or fossil-db were on Skylake. This seems to eliminate an unused flags write that doesn't exist on other platforms. With that flag write eliminated, a later CMP can be scheduled better. I did not investigate this further. v2: Clean up some unnecessary bits and add some comments to can_elminate_conditional_mod. Suggested by Ken and Matt. Skylake Totals: Cycle count: 14665454524 -> 14665454444 (-0.00%) Totals from 10 (0.00% of 625685) affected shaders: Cycle count: 38630 -> 38550 (-0.21%) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29774>	2024-08-09 14:25:54 -07:00
Alyssa Rosenzweig	bf9a17e2d5	elk: switch to derivative intrinsics Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30566>	2024-08-09 17:07:59 +00:00
Alyssa Rosenzweig	eec02246f8	brw: switch to derivative intrinsics Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30566>	2024-08-09 17:07:59 +00:00
Kenneth Graunke	b6f4f64b43	intel/brw: Drop image_{load,store}_raw_intel handling Gfx8 required us to emulate image load store with untyped messages, whereas Gfx9 just has typed message support for everything. brw no longer supports Gfx8, so all of this code is effectively dead. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30576>	2024-08-09 07:20:08 +00:00
Caio Oliveira	2e2b83f72d	intel/brw: Use CSE for LOAD_SUBGROUP_INVOCATION Instead of emitting a single one at the top, and making reference to it, emit the virtual instruction as needed and let CSE do its job. Since load_subgroup_invocation now can appear not at the start of the shader, use UNDEF in all cases to ensure that the liveness of the destination doesn't extend to the first partial write done here (it was being used only for SIMD > 8 before). Note this option was considered in the past `6132992cdb` but at the time dismissed. The difference now is that the lowering of the virtual instruction happens earlier than the scheduling. The motivation for this change is to allow passes other than the NIR conversion to use this value. The alternative of storing a `brw_reg` in the shader (instead of NIR state) gets complicated by passes like compact_vgrfs, that move VGRFs around (and update the instructions). This and maybe other passes would have to care about the brw_reg. Fossil-db numbers, TGL ``` * Shaders only in 'after' results are ignored: steam-native/shadow_of_the_tomb_raider/c683ea5067ee157d/fs.32/0, steam-native/shadow_of_the_tomb_raider/f4df450c3cef40b4/fs.32/0, steam-native/shadow_of_the_tomb_raider/94b708fb8e3d9597/fs.32/0, steam-native/shadow_of_the_tomb_raider/19d44c328edabd30/fs.32/0, steam-native/shadow_of_the_tomb_raider/8a7dcbd5a74a19bf/fs.32/0, and 366 more from 4 apps: steam-dxvk/alan_wake, steam-dxvk/batman_arkham_city_goty, steam-dxvk/batman_arkham_origins, steam-native/shadow_of_the_tomb_raider * Shaders only in 'before' results are ignored: steam-dxvk/octopath_traveler/aaa3d10acb726906/fs.32/0, steam-dxvk/batman_arkham_origins/e6872ae23569c35f/fs.32/0, steam-dxvk/octopath_traveler/fd33a99fa5c271a8/fs.32/0, steam-dxvk/octopath_traveler/9a077cdc16f24520/fs.32/0, steam-dxvk/batman_arkham_city_goty/fac7b438ad52f622/fs.32/0, and 12 more from 4 apps: steam-dxvk/batman_arkham_city_goty, steam-dxvk/batman_arkham_origins, steam-dxvk/octopath_traveler, steam-native/shadow_of_the_tomb_raider Totals: Instrs: 149752381 -> 149751337 (-0.00%); split: -0.00%, +0.00% Cycle count: 11553609349 -> 11549970294 (-0.03%); split: -0.06%, +0.03% Spill count: 42763 -> 42764 (+0.00%); split: -0.01%, +0.01% Fill count: 75650 -> 75651 (+0.00%); split: -0.00%, +0.01% Max live registers: 31725096 -> 31671792 (-0.17%) Max dispatch width: 5546008 -> 5551672 (+0.10%); split: +0.11%, -0.00% Totals from 52574 (8.34% of 630441) affected shaders: Instrs: 9535159 -> 9534115 (-0.01%); split: -0.03%, +0.02% Cycle count: 1006627109 -> 1002988054 (-0.36%); split: -0.65%, +0.29% Spill count: 11588 -> 11589 (+0.01%); split: -0.03%, +0.03% Fill count: 21057 -> 21058 (+0.00%); split: -0.01%, +0.02% Max live registers: 1992493 -> 1939189 (-2.68%) Max dispatch width: 559696 -> 565360 (+1.01%); split: +1.06%, -0.05% ``` and DG2 ``` * Shaders only in 'after' results are ignored: steam-native/shadow_of_the_tomb_raider/1f95a9d3db21df85/fs.32/0, steam-native/shadow_of_the_tomb_raider/56b87c4a46613a2a/fs.32/0, steam-native/shadow_of_the_tomb_raider/a74b4137f85dbbd3/fs.32/0, steam-native/shadow_of_the_tomb_raider/e07e38d3f48e8402/fs.32/0, steam-native/shadow_of_the_tomb_raider/206336789c48996c/fs.32/0, and 268 more from 4 apps: steam-dxvk/alan_wake, steam-dxvk/batman_arkham_city_goty, steam-dxvk/batman_arkham_origins, steam-native/shadow_of_the_tomb_raider * Shaders only in 'before' results are ignored: steam-native/shadow_of_the_tomb_raider/0420d7c3a2ea99ec/fs.32/0, steam-native/shadow_of_the_tomb_raider/2ff39f8bf7d24abb/fs.32/0, steam-native/shadow_of_the_tomb_raider/92d7be2824bd9659/fs.32/0, steam-native/shadow_of_the_tomb_raider/f09ca6d2ecf18015/fs.32/0, steam-native/shadow_of_the_tomb_raider/490f8ffd59e52949/fs.32/0, and 205 more from 3 apps: steam-dxvk/batman_arkham_city_goty, steam-dxvk/batman_arkham_origins, steam-native/shadow_of_the_tomb_raider Totals: Instrs: 151597619 -> 151599914 (+0.00%); split: -0.00%, +0.00% Subgroup size: 7699776 -> 7699784 (+0.00%) Cycle count: 12738501989 -> 12739841170 (+0.01%); split: -0.01%, +0.02% Spill count: 61283 -> 61274 (-0.01%) Fill count: 119886 -> 119849 (-0.03%) Max live registers: 31810432 -> 31758920 (-0.16%) Max dispatch width: 5540128 -> 5541136 (+0.02%); split: +0.08%, -0.06% Totals from 49286 (7.81% of 631231) affected shaders: Instrs: 8607753 -> 8610048 (+0.03%); split: -0.01%, +0.04% Subgroup size: 857752 -> 857760 (+0.00%) Cycle count: 305939495 -> 307278676 (+0.44%); split: -0.28%, +0.72% Spill count: 6339 -> 6330 (-0.14%) Fill count: 12571 -> 12534 (-0.29%) Max live registers: 1788346 -> 1736834 (-2.88%) Max dispatch width: 510920 -> 511928 (+0.20%); split: +0.85%, -0.66% ``` Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30489>	2024-08-08 18:20:49 +00:00
Lionel Landwerlin	0bd96e868c	intel-clc: missing printf lowering Useful for printf() debugging in our opencl shader snippets. Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30539>	2024-08-06 17:55:18 +00:00
Kenneth Graunke	32cce2f397	intel/brw: Set appropriate types for 16-bit sampler trailing components 16-bit SIMD8 sampler writeback messages come with a bit of padding in them, requiring us to emit a LOAD_PAYLOAD to reorganize the data into the padding-free format expected by NIR. Additionally, we may reduce the response length on the sampler messages based on which components of the (always vec4) NIR destination are actually in use. When we do that, dest_size > read_size, and the trailing components are all empty BAD_FILE registers, indicating the contents are undefined. Unfortunately, we can't ignore those trailing components entirely. In the past, we left them default-initialized, giving us a BAD_FILE register with UD type (which didn't matter, since all sampler returns were 32-bit). But with 16-bit, this was confusing the LOAD_PAYLOAD. For example, writing RGB and skipping A (without sparse) would produce read_size = 3 and dest_size = 4 and nir_dest[5] containing: nir_dest[] = <R:hf, G:hf, B:hf, blank-A:ud, blank-sparse:ud> We'd then call LOAD_PAYLOAD on the first 4 sources, causing it to see 3 HFs and a UD, and try to copy the full 32-bit value at the end, instead of the 16 bits of padding we intended. This meant it would overflow the destination register's size, triggering validation errors. Thanks to Ian Romanick for noticing this, writing a test, and also coming up with a nearly identical fix. Fixes: `0116430d39` ("intel/brw: Handle 16-bit sampler return payloads") Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11617 References: https://gitlab.freedesktop.org/mesa/crucible/-/merge_requests/152 Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30529>	2024-08-06 17:26:05 +00:00
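The size mismatch above can be shown with a small C model. It assumes a LOAD_PAYLOAD-like copy that moves every source at its own type size, blank ones included. The struct and function names are hypothetical. They sketch the idea behind the fix (giving the blank trailing components the same 16-bit type as the data) and are not the code from the commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of a LOAD_PAYLOAD source (not Mesa's
 * register type). Only the per-channel type size matters here. */
struct payload_src {
   bool is_blank;      /* true for an undefined (BAD_FILE-like) component */
   unsigned type_size; /* bytes per channel: 2 for HF, 4 for UD */
};

/* Bytes per channel written when every source, blank or not, is copied
 * at its own type size. */
static unsigned payload_bytes(const struct payload_src *srcs, size_t n)
{
   unsigned total = 0;
   for (size_t i = 0; i < n; i++)
      total += srcs[i].type_size;
   return total;
}

/* Give the trailing blank components [read_size, dest_size) the same
 * type size as the real data, so each one only accounts for padding of
 * that size instead of a full 32-bit value. */
static void type_trailing_blanks(struct payload_src *srcs, size_t read_size,
                                 size_t dest_size, unsigned type_size)
{
   assert(read_size <= dest_size);
   for (size_t i = read_size; i < dest_size; i++) {
      srcs[i].is_blank = true;
      srcs[i].type_size = type_size;
   }
}
```

Consider RGB read as HF with A skipped. A default UD blank makes the copy 3 × 2 + 4 = 10 bytes per channel, but four HF components only need 8. Typing the blank as HF brings the total back to 8.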
Alyssa Rosenzweig	d99c2ef059	nir/opt_uniform_atomics: add fs atomics predicated? flag on agx (and mali), we predicate atomics on "if (!helper)", so doing so again in this pass is redundant. and would cause a problem since we'd then have to lower the "is helper inv?" flag late. so just skip the extra lowering code. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Acked-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30488>	2024-08-06 11:48:17 -04:00
Kenneth Graunke	c19e5a0a75	intel/brw: Replace predicated break optimization with a simple peephole We can achieve most of what brw_fs_opt_predicated_break() does with simple peepholes at NIR -> BRW conversion time. For predicated break and continue, we can simply look at an IF ... ENDIF sequence after emitting it. If there's a single instruction between the two, and it's a BREAK or CONTINUE, then we can move the predicate from the IF onto the jump, and delete the IF/ENDIF. Because we haven't built the CFG at this stage, we only need to remove them from the linked list of instructions, which is trivial to do. For the predicated while optimization, we can rely on the fact that we already did the predicated break optimization, and simply look for a predicated BREAK just before the WHILE. If so, we move the predicate onto the WHILE, invert it, and remove the BREAK. There are a few cases where this approach does a worse job than the old one: nir_convert_from_ssa may introduce load_reg and store_reg in blocks containing break, and nir_trivialize_registers may decide it needs to insert movs into those blocks. So, at NIR -> BRW time, we'll actually emit some MOVs there, which might have been possible to copy propagate out after later optimizations. However, the fossil-db results show that it's still pretty competitive. For instructions, 1017 shaders were helped (average -1.87 instructions), while only 62 were hurt (average +2.19 instructions). In affected shaders, it was -0.08% for instructions. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30498>	2024-08-05 19:17:55 -07:00
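Both peepholes can be sketched in C over a doubly linked instruction list. The `struct inst` type, the opcode names and the helper functions are hypothetical simplifications and not Mesa's IR. Only the predicate bookkeeping described in the commit is modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified IR; not Mesa's actual instruction classes. */
enum opcode { OP_MOV, OP_IF, OP_ENDIF, OP_BREAK, OP_CONTINUE, OP_WHILE };

struct inst {
   enum opcode op;
   bool predicated;
   bool predicate_inverse;
   struct inst *prev, *next;
};

/* No CFG exists yet, so deleting an instruction is just a list unlink. */
static void unlink_inst(struct inst *i)
{
   if (i->prev)
      i->prev->next = i->next;
   if (i->next)
      i->next->prev = i->prev;
   i->prev = i->next = NULL;
}

/* Run right after emitting ENDIF: if the IF body is a single unpredicated
 * BREAK or CONTINUE, move the IF's predicate onto the jump and drop the
 * IF/ENDIF pair. */
static bool opt_predicated_jump(struct inst *endif)
{
   assert(endif->op == OP_ENDIF);
   struct inst *jump = endif->prev;
   if (!jump || jump->predicated ||
       (jump->op != OP_BREAK && jump->op != OP_CONTINUE))
      return false;
   struct inst *if_inst = jump->prev;
   if (!if_inst || if_inst->op != OP_IF || !if_inst->predicated)
      return false;

   jump->predicated = true;
   jump->predicate_inverse = if_inst->predicate_inverse;
   unlink_inst(if_inst);
   unlink_inst(endif);
   return true;
}

/* Run right after emitting WHILE: a predicated BREAK just before it is
 * folded into the WHILE with the predicate inverted, so the loop only
 * continues while the break condition is false. */
static bool opt_predicated_while(struct inst *while_inst)
{
   assert(while_inst->op == OP_WHILE);
   struct inst *brk = while_inst->prev;
   if (while_inst->predicated || !brk || brk->op != OP_BREAK ||
       !brk->predicated)
      return false;

   while_inst->predicated = true;
   while_inst->predicate_inverse = !brk->predicate_inverse;
   unlink_inst(brk);
   return true;
}
```

Each function only inspects the immediate neighbors in emission order. This matches the commit's point that no CFG is needed at this stage.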
Kenneth Graunke	fad63d6483	intel/brw: Delete the brw_fs_opt_dead_control_flow_eliminate() pass With the select peephole gone, this no longer does much of anything. No instruction changes in fossil-db on Alchemist. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30498>	2024-08-05 19:17:55 -07:00
Kenneth Graunke	06e8335e11	intel/brw: Delete the brw_fs_opt_peephole_select() pass Now that we can handle load_ubo in NIR's peephole select pass, the backend pass isn't really useful anymore. fossil-db results on Alchemist show almost no impact: Totals: Instrs: 150646561 -> 150647106 (+0.00%); split: -0.00%, +0.00% Cycles: 12633748945 -> 12633760459 (+0.00%) Totals from 261 (0.04% of 630008) affected shaders: Instrs: 404946 -> 405491 (+0.13%); split: -0.00%, +0.14% Cycles: 23947172 -> 23958686 (+0.05%) Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30498>	2024-08-05 19:17:55 -07:00
Kenneth Graunke	7c579f448f	intel/brw: Mark all UBO access with a direct buffer index as speculative UBO loads with a non-indirect buffer index should be safe to perform speculatively. With a direct offset, we may sometimes turn them into push constants, at which point it's just reading a register with no cost at all. Otherwise, we access them via messages that use surface state, which automatically perform bounds checking. So we shouldn't have any issues with reading out of bounds and page faulting, for example. This allows nir_opt_peephole_sel() to operate on load_ubo intrinsics, so we can turn simple ifs with loads on both sides into bcsels. In some cases this can collapse a surprising amount of control flow, allowing other optimizations to work better. The i965 OpenGL driver used load_uniform intrinsics, which are allowed in NIR's peephole select pass. But iris uses the Gallium NIR pass that translates uniforms to loads from UBO 0, so we haven't been able to take advantage of NIR's peephole select pass there. The backend pass was still able to handle this to some extent, however. fossil-db results on Alchemist: Totals: Instrs: 150656329 -> 150645307 (-0.01%); split: -0.01%, +0.00% Cycles: 12635230179 -> 12633696811 (-0.01%); split: -0.02%, +0.00% Send messages: 7416330 -> 7416261 (-0.00%) Spill count: 52471 -> 52473 (+0.00%) Fill count: 100818 -> 100803 (-0.01%); split: -0.02%, +0.00% Scratch Memory Size: 3197952 -> 3198976 (+0.03%) Totals from 1848 (0.29% of 630003) affected shaders: Instrs: 1412300 -> 1401278 (-0.78%); split: -0.80%, +0.02% Cycles: 1809789567 -> 1808256199 (-0.08%); split: -0.11%, +0.03% Send messages: 59829 -> 59760 (-0.12%) Spill count: 3870 -> 3872 (+0.05%) Fill count: 9693 -> 9678 (-0.15%); split: -0.18%, +0.02% Scratch Memory Size: 174080 -> 175104 (+0.59%) Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30498>	2024-08-05 19:17:55 -07:00
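A small C model shows why speculation is safe here. The bounds-checked load that returns 0 out of range is an assumption, standing in for surface-state bounds checking. The function names are hypothetical. `load_bcsel` shows the shape of the transform: both loads are hoisted, then a select picks one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed stand-in for a surface-state-checked buffer read: out-of-range
 * indices return 0 instead of faulting. */
static unsigned bounded_load(const unsigned *buf, size_t len, size_t idx)
{
   assert(buf != NULL || len == 0);
   return idx < len ? buf[idx] : 0u;
}

/* Before: each load sits on its own side of an if/else. */
static unsigned load_if(const unsigned *buf, size_t len, bool c,
                        size_t i, size_t j)
{
   if (c)
      return bounded_load(buf, len, i);
   else
      return bounded_load(buf, len, j);
}

/* After: both loads run unconditionally, then a select (bcsel) picks one.
 * This is only legal because neither load can fault or has side effects. */
static unsigned load_bcsel(const unsigned *buf, size_t len, bool c,
                           size_t i, size_t j)
{
   unsigned x = bounded_load(buf, len, i);
   unsigned y = bounded_load(buf, len, j);
   return c ? x : y;
}
```

Even when the untaken side reads out of bounds, the selected result matches the branchy version.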
Iván Briano	f8553f56ac	intel/rt: fix terminateOnFirstHit handling If TraceRay() is called with the TerminateOnFirstHit flag, we need to terminate the ray on the first confirmed intersection. This is handled by the lowering of accept_ray_intersection and it's working fine for the case of multiple instances of the intersection shader being called. But if the shader calls reportIntersection() more than once, we were handling them all and accepting the closest one regardless of the flag. Check for the flag on every confirmed intersection and, if set, accept it right there. The subsequent lowering will take care of handling the ray termination if necessary. Fixes new test dEQP-VK.ray_tracing_pipeline.amber.flags-accept-first Cc: mesa-stable Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30418>	2024-08-05 21:43:36 +00:00
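A simplified C model of the fixed behavior follows. It assumes that every reported hit is already confirmed and is identified only by its hit distance `t`. The function name and the array-of-distances input are hypothetical. The actual fix lives in Mesa's ray-tracing intrinsic lowering, not in code like this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hits reported by one intersection-shader invocation, in report order.
 * Returns the index of the accepted hit, or -1 if there is none. */
static int accept_hit(const float *t, size_t n, bool terminate_on_first_hit)
{
   assert(t != NULL || n == 0);
   int best = -1;
   for (size_t i = 0; i < n; i++) {
      if (terminate_on_first_hit)
         return (int)i;  /* flag set: accept this hit right away */
      if (best < 0 || t[i] < t[(size_t)best])
         best = (int)i;  /* otherwise keep the closest hit so far */
   }
   return best;
}
```

Before the fix, the behavior corresponded to always taking the second path, which picks the closest hit even when the flag is set.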
Caio Oliveira	ba3fd5dc57	intel/brw: Don't retype load_subgroup_invocation result to signed The values are small unsigned integers, so their signed representation will be the same -- the sign conversion is not needed. As a result the extra MOV can be elided by the optimizations. Fossil-db results for DG2 ``` Totals: Instrs: 151779000 -> 151761591 (-0.01%) Cycle count: 12743968649 -> 12742826024 (-0.01%); split: -0.01%, +0.00% Max live registers: 31834993 -> 31834996 (+0.00%) Totals from 17018 (2.70% of 631450) affected shaders: Instrs: 2381740 -> 2364331 (-0.73%) Cycle count: 76798588 -> 75655963 (-1.49%); split: -1.70%, +0.22% Max live registers: 378921 -> 378924 (+0.00%) ``` and TGL ``` Totals: Instrs: 149812033 -> 149794080 (-0.01%); split: -0.01%, +0.00% Cycle count: 11534727002 -> 11534929834 (+0.00%); split: -0.01%, +0.01% Spill count: 42510 -> 42511 (+0.00%); split: -0.00%, +0.01% Fill count: 75100 -> 75101 (+0.00%); split: -0.00%, +0.00% Max live registers: 31727318 -> 31727321 (+0.00%) Totals from 17421 (2.76% of 630458) affected shaders: Instrs: 3092614 -> 3074661 (-0.58%); split: -0.58%, +0.00% Cycle count: 286061417 -> 286264249 (+0.07%); split: -0.32%, +0.39% Spill count: 11538 -> 11539 (+0.01%); split: -0.02%, +0.03% Fill count: 21359 -> 21360 (+0.00%); split: -0.01%, +0.01% Max live registers: 418954 -> 418957 (+0.00%) ``` Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30508>	2024-08-05 18:05:45 +00:00
Jordan Justen	58469620d3	intel/brw/validate: Convert access mask to be grf based Our validation code doesn't need to know which bytes are accessed. It only needs to know which grfs were accessed by an element. This also helps to easily handle the Xe2 register size change. Backport-to: 24.2 Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28479>	2024-08-02 22:18:51 +00:00
Jordan Justen	e62606b2ec	intel/brw/validate: Update dst grf crossing check for Xe2 Rework: * Update grf_size_shift calculation (s-b Ken) Backport-to: 24.2 Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28479>	2024-08-02 22:18:51 +00:00
Jordan Justen	f2800deacb	intel/brw/validate: Simplify grf span validation check by not using a mask Previously this check would create a mask of the bytes used in the grf, and then shift the mask. This worked well when there were 32 bytes in the register because a uint64_t could easily detect that bytes were used in the next register. (The next register was the high 32 bits of the `access_mask` variable.) With Xe2, the register size becomes 64 bytes, meaning this strategy doesn't work. Instead of a mask, we can just check to see if more than one GRF is used during each loop iteration. (Suggested by Ken.) This will make it easier to extend for Xe2 in a follow-on commit. Verified this with dEQP-VK.subgroups.arithmetic.compute.subgroupexclusivemul_u64vec4_requiredsubgroupsize on Xe2, which otherwise would cause the program to fail to validate because it assumed a grf was 32 bytes. Backport-to: 24.2 Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28479>	2024-08-02 22:18:51 +00:00
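The per-element check can be sketched in C. The sketch assumes the validation loop visits one element per iteration and knows that element's byte offset from the start of the region. The function name is hypothetical. The point is that comparing the first and last GRF index works for any register size, whereas a fixed 64-bit byte mask only covers two 32-byte registers.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true if an element of elem_size bytes starting at byte_offset
 * straddles a GRF boundary. grf_size is 32 bytes before Xe2 and 64 bytes
 * on Xe2. */
static bool element_crosses_grf(unsigned byte_offset, unsigned elem_size,
                                unsigned grf_size)
{
   assert(elem_size > 0);
   assert(grf_size == 32 || grf_size == 64);
   unsigned first_grf = byte_offset / grf_size;
   unsigned last_grf = (byte_offset + elem_size - 1) / grf_size;
   return first_grf != last_grf;
}
```

For example, an 8-byte element at byte 28 crosses a GRF boundary with 32-byte registers but not with 64-byte registers.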
Kenneth Graunke	8bca7e520c	intel/brw: Only force g0's liveness to be the whole program if spilling We don't actually need to extend g0's live range to the EOT message in general; most messages that end a shader are headerless. The main implicit use of g0 is for constructing scratch headers. With the last two patches, we now consider scratch access that may exist in the IR and already extend the liveness appropriately. There is one remaining problem: spilling. The register allocator will create new scratch messages when spilling a register, which need to create scratch headers, which need g0. So, every new spill or fill might extend the live range of g0, which would create new interference, altering the graph. This can be problematic. However, when compiling SIMD16 or SIMD32 fragment shaders, we don't allow spilling anyway. So, why not allow using g0? Also, when trying various scheduling modes, we first try allocation without spilling. If it works, great; if not, we try a (hopefully) less aggressive schedule, and only allow spilling on the lowest-pressure schedule. So, even for regular SIMD8 shaders, we can potentially gain the use of g0 on the first few tries at scheduling+allocation. Once we try to allocate with spilling, we go back to reserving g0 for the entire program, so that we can construct scratch headers at any point. We could possibly do better here, but this is simple and reliable with some benefit. Thanks to Ian Romanick for suggesting I try this approach. fossil-db on Alchemist shows some more spill/fill improvements: Totals: Instrs: 149062395 -> 149053010 (-0.01%); split: -0.01%, +0.00% Cycles: 12609496913 -> 12611652181 (+0.02%); split: -0.45%, +0.47% Spill count: 52891 -> 52471 (-0.79%) Fill count: 101599 -> 100818 (-0.77%) Scratch Memory Size: 3292160 -> 3197952 (-2.86%) Totals from 416541 (66.59% of 625484) affected shaders: Instrs: 124058587 -> 124049202 (-0.01%); split: -0.01%, +0.01% Cycles: 3567164271 -> 3569319539 (+0.06%); split: -1.61%, +1.67% Spill count: 420 -> 0 (-inf%) Fill count: 781 -> 0 (-inf%) Scratch Memory Size: 94208 -> 0 (-inf%) Witcher 3 shows a 33% reduction in scratch memory size, for example. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30319>	2024-08-01 16:37:34 -07:00
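The retry policy in this commit can be sketched in C. `try_alloc_fn`, `allocate_with_schedules` and the fake callback are hypothetical names for illustration. This is not Mesa's actual scheduling and allocation loop. It only captures the rule that g0 is reserved for the whole program exactly when spilling is allowed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical register-allocation attempt: returns true on success. */
typedef bool (*try_alloc_fn)(unsigned schedule, bool allow_spilling,
                             bool reserve_g0, void *ctx);

/* Try each pre-RA schedule in order, from most to least aggressive.
 * Spilling is only allowed on the last (lowest-pressure) schedule, and
 * only if the shader may spill at all (not, e.g., SIMD16/SIMD32 fragment
 * shaders). g0 is kept live for the whole program only when spilling is
 * allowed, because spill/fill code needs g0 to build scratch headers.
 * Returns the index of the schedule that allocated, or -1. */
static int allocate_with_schedules(unsigned num_schedules,
                                   bool spilling_supported,
                                   try_alloc_fn try_alloc, void *ctx)
{
   assert(num_schedules > 0);
   for (unsigned s = 0; s < num_schedules; s++) {
      bool allow_spilling = spilling_supported && s == num_schedules - 1;
      bool reserve_g0 = allow_spilling;
      if (try_alloc(s, allow_spilling, reserve_g0, ctx))
         return (int)s;
   }
   return -1;
}

/* Example callback for illustration: allocation only succeeds once
 * spilling is allowed, and it records whether g0 was ever reserved
 * without spilling. */
struct fake_ra {
   bool g0_reserved_without_spilling;
};

static bool fake_alloc_needs_spill(unsigned schedule, bool allow_spilling,
                                   bool reserve_g0, void *ctx)
{
   struct fake_ra *ra = ctx;
   (void)schedule;
   if (reserve_g0 && !allow_spilling)
      ra->g0_reserved_without_spilling = true;
   return allow_spilling;
}
```

With three schedules and spilling supported, the fake allocator succeeds only on the last attempt, and g0 stays allocatable on the first two. When spilling is unsupported, every attempt runs without reserving g0.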
Kenneth Graunke	4ca4b064cf	intel/brw: Record g0 as live for sends with send_ex_desc_scratch set brw_send_indirect_split_message() implicitly reads g0 to construct the extended message descriptor for certain send messages when this is set. Record that liveness explicitly. Thanks to Francisco Jerez for reminding me about this use of g0. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30319>	2024-08-01 16:37:32 -07:00
Kenneth Graunke	9200fb966c	intel/brw: Record that SHADER_OPCODE_SCRATCH_HEADER uses g0 The generator code for emitting legacy scratch headers was implicitly using g0 as a source. But the IR wasn't indicating any usage of g0, which means the liveness isn't properly tracked at the IR level. It works because we reserve g0 as permanently live for the whole program. In order to stop doing that, we need to record it properly. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30319>	2024-08-01 16:37:31 -07:00
Kenneth Graunke	545f20419f	intel/brw: Delete fs_reg_alloc::discard_interference_graph() Unused since commit `50519598ff`. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30319>	2024-08-01 16:37:28 -07:00
Sushma Venkatesh Reddy	0116430d39	intel/brw: Handle 16-bit sampler return payloads API requires samplers to return 32-bit even though hardware can handle 16-bit floating point, so we detect that case and make more efficient use of memory BW. This is helping improve performance of encode and decode tokens during LLM by at least 5% across multiple platforms. Thank you Kenneth Graunke for suggesting and guiding me throughout this implementation. Signed-off-by: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30447>	2024-07-31 21:26:46 +00:00


3839 commits