fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-23 09:00:10 +01:00

Author	SHA1	Message	Date
Caio Oliveira	793cba0e6f	intel/brw: Apply conventions to lower_src_modifiers helper Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33110>	2025-01-19 08:24:09 -08:00
Caio Oliveira	d7d210fed4	intel/brw: Move shuffle_from_32bit_read implementation to brw_builder Make it a member function for convenience -- since another member function uses it. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33108>	2025-01-18 20:48:57 +00:00
Caio Oliveira	b3001e4946	intel/brw: Move a few builder helpers to brw_builder.h/cpp Add brw prefix when necessary. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33108>	2025-01-18 20:48:57 +00:00
Caio Oliveira	1043187ec6	intel/brw: Stop using namespace for brw_builder Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33076>	2025-01-18 16:12:56 +00:00
Caio Oliveira	5ac82efd35	intel/brw: Rename fs_builder to brw_builder Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33076>	2025-01-18 16:12:55 +00:00
Caio Oliveira	3659934862	intel/brw: Add brw_generator.h header Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32844>	2025-01-17 00:04:41 +00:00
Caio Oliveira	634daf2827	intel/brw: Rename brw_fs_validate to brw_validate Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32843>	2025-01-13 23:56:22 +00:00
Lionel Landwerlin	8ac7802ac8	brw: move final send lowering up into the IR Because we do emit the final send message form in code generation, a lot of emissions look like this : add(8) vgrf0, u0, 0x100 mov(1) a0.1, vgrf0 # emitted by the generator send(8) ..., a0.1 By moving address register manipulation in the IR, we can get this down to : add(1) a0.1, u0, 0x100 send(8) ..., a0.1 This reduce register pressure around some send messages by 1 vgrf. All lost shaders in the below results are fragment SIMD32, due to the throughput estimator. If turned off, we loose no SIMD32 shaders with this change. DG2 results: Assassin's Creed Valhalla: Totals from 2044 (96.87% of 2110) affected shaders: Instrs: 852879 -> 832044 (-2.44%); split: -2.45%, +0.00% Subgroup size: 23832 -> 23824 (-0.03%) Cycle count: 53345742 -> 52144277 (-2.25%); split: -5.08%, +2.82% Spill count: 729 -> 554 (-24.01%); split: -28.40%, +4.39% Fill count: 2005 -> 1256 (-37.36%) Scratch Memory Size: 25600 -> 19456 (-24.00%); split: -32.00%, +8.00% Max live registers: 116765 -> 115058 (-1.46%) Max dispatch width: 19152 -> 18872 (-1.46%); split: +0.21%, -1.67% Cyberpunk 2077: Totals from 1181 (93.43% of 1264) affected shaders: Instrs: 667192 -> 663615 (-0.54%); split: -0.55%, +0.01% Subgroup size: 13016 -> 13032 (+0.12%) Cycle count: 17383539 -> 17986073 (+3.47%); split: -0.93%, +4.39% Spill count: 12 -> 8 (-33.33%) Fill count: 9 -> 6 (-33.33%) Dota2: Totals from 173 (11.59% of 1493) affected shaders: Cycle count: 274403 -> 280817 (+2.34%); split: -0.01%, +2.34% Max live registers: 5787 -> 5779 (-0.14%) Max dispatch width: 1344 -> 1152 (-14.29%) Hitman3: Totals from 5072 (95.39% of 5317) affected shaders: Instrs: 2879952 -> 2841804 (-1.32%); split: -1.32%, +0.00% Cycle count: 153208505 -> 165860401 (+8.26%); split: -2.22%, +10.48% Spill count: 3942 -> 3200 (-18.82%) Fill count: 10158 -> 8846 (-12.92%) Scratch Memory Size: 257024 -> 223232 (-13.15%) Max live registers: 328467 -> 324631 (-1.17%) Max dispatch width: 43928 -> 42768 (-2.64%); split: +0.09%, -2.73% Fortnite: Totals from 360 (4.82% of 7472) affected shaders: Instrs: 778068 -> 777925 (-0.02%) Subgroup size: 3128 -> 3136 (+0.26%) Cycle count: 38684183 -> 38734579 (+0.13%); split: -0.06%, +0.19% Max live registers: 50689 -> 50658 (-0.06%) Hogwarts Legacy: Totals from 1376 (84.00% of 1638) affected shaders: Instrs: 758810 -> 749727 (-1.20%); split: -1.23%, +0.03% Cycle count: 27778983 -> 28805469 (+3.70%); split: -1.42%, +5.12% Spill count: 2475 -> 2299 (-7.11%); split: -7.47%, +0.36% Fill count: 2677 -> 2445 (-8.67%); split: -9.90%, +1.23% Scratch Memory Size: 99328 -> 89088 (-10.31%) Max live registers: 84969 -> 84671 (-0.35%); split: -0.58%, +0.23% Max dispatch width: 11848 -> 11920 (+0.61%) Metro Exodus: Totals from 92 (0.21% of 43072) affected shaders: Instrs: 262995 -> 262968 (-0.01%) Cycle count: 13818007 -> 13851266 (+0.24%); split: -0.01%, +0.25% Max live registers: 11152 -> 11140 (-0.11%) Red Dead Redemption 2 : Totals from 451 (7.71% of 5847) affected shaders: Instrs: 754178 -> 753811 (-0.05%); split: -0.05%, +0.00% Cycle count: 3484078523 -> 3484111965 (+0.00%); split: -0.00%, +0.00% Max live registers: 42294 -> 42185 (-0.26%) Spiderman Remastered: Totals from 6820 (98.02% of 6958) affected shaders: Instrs: 6921500 -> 6747933 (-2.51%); split: -4.16%, +1.65% Cycle count: 234400692460 -> 236846720707 (+1.04%); split: -0.20%, +1.25% Spill count: 72971 -> 72622 (-0.48%); split: -8.08%, +7.61% Fill count: 212921 -> 198483 (-6.78%); split: -12.37%, +5.58% Scratch Memory Size: 3491840 -> 3410944 (-2.32%); split: -12.05%, +9.74% Max live registers: 493149 -> 487458 (-1.15%) Max dispatch width: 56936 -> 56856 (-0.14%); split: +0.06%, -0.20% Strange Brigade: Totals from 3769 (91.21% of 4132) affected shaders: Instrs: 1354476 -> 1321474 (-2.44%) Cycle count: 25351530 -> 25339190 (-0.05%); split: -1.64%, +1.59% Max live registers: 199057 -> 193656 (-2.71%) Max dispatch width: 30272 -> 30240 (-0.11%) Witcher 3: Totals from 25 (2.40% of 1041) affected shaders: Instrs: 24621 -> 24606 (-0.06%) Cycle count: 2218793 -> 2217503 (-0.06%); split: -0.11%, +0.05% Max live registers: 1963 -> 1955 (-0.41%) LNL results: Assassin's Creed Valhalla: Totals from 1928 (98.02% of 1967) affected shaders: Instrs: 856107 -> 835756 (-2.38%); split: -2.48%, +0.11% Subgroup size: 41264 -> 41280 (+0.04%) Cycle count: 64606590 -> 62371700 (-3.46%); split: -5.57%, +2.11% Spill count: 915 -> 669 (-26.89%); split: -32.79%, +5.90% Fill count: 2414 -> 1617 (-33.02%); split: -36.62%, +3.60% Scratch Memory Size: 62464 -> 44032 (-29.51%); split: -36.07%, +6.56% Max live registers: 205483 -> 202192 (-1.60%) Cyberpunk 2077: Totals from 1177 (96.40% of 1221) affected shaders: Instrs: 682237 -> 678931 (-0.48%); split: -0.51%, +0.03% Subgroup size: 24912 -> 24944 (+0.13%) Cycle count: 24355928 -> 25089292 (+3.01%); split: -0.80%, +3.81% Spill count: 8 -> 3 (-62.50%) Fill count: 6 -> 3 (-50.00%) Max live registers: 126922 -> 125472 (-1.14%) Dota2: Totals from 428 (32.47% of 1318) affected shaders: Instrs: 89355 -> 89740 (+0.43%) Cycle count: 1152412 -> 1152706 (+0.03%); split: -0.52%, +0.55% Max live registers: 32863 -> 32847 (-0.05%) Fortnite: Totals from 5354 (81.72% of 6552) affected shaders: Instrs: 4135059 -> 4239015 (+2.51%); split: -0.01%, +2.53% Cycle count: 132557506 -> 132427302 (-0.10%); split: -0.75%, +0.65% Spill count: 7144 -> 7234 (+1.26%); split: -0.46%, +1.72% Fill count: 12086 -> 12403 (+2.62%); split: -0.73%, +3.35% Scratch Memory Size: 600064 -> 604160 (+0.68%); split: -1.02%, +1.71% Hitman3: Totals from 4912 (97.09% of 5059) affected shaders: Instrs: 2952124 -> 2916824 (-1.20%); split: -1.20%, +0.00% Cycle count: 179985656 -> 189175250 (+5.11%); split: -2.44%, +7.55% Spill count: 3739 -> 3136 (-16.13%) Fill count: 10657 -> 9564 (-10.26%) Scratch Memory Size: 373760 -> 318464 (-14.79%) Max live registers: 597566 -> 589460 (-1.36%) Hogwarts Legacy: Totals from 1471 (96.33% of 1527) affected shaders: Instrs: 748749 -> 766214 (+2.33%); split: -0.71%, +3.05% Cycle count: 33301528 -> 34426308 (+3.38%); split: -1.30%, +4.68% Spill count: 3278 -> 3070 (-6.35%); split: -8.30%, +1.95% Fill count: 4553 -> 4097 (-10.02%); split: -10.85%, +0.83% Scratch Memory Size: 251904 -> 217088 (-13.82%) Max live registers: 168911 -> 168106 (-0.48%); split: -0.59%, +0.12% Metro Exodus: Totals from 18356 (49.81% of 36854) affected shaders: Instrs: 7559386 -> 7621591 (+0.82%); split: -0.01%, +0.83% Cycle count: 195240612 -> 196455186 (+0.62%); split: -1.22%, +1.84% Spill count: 595 -> 546 (-8.24%) Fill count: 1604 -> 1408 (-12.22%) Max live registers: 2086937 -> 2086933 (-0.00%) Red Dead Redemption 2: Totals from 4171 (79.31% of 5259) affected shaders: Instrs: 2619392 -> 2719587 (+3.83%); split: -0.00%, +3.83% Subgroup size: 86416 -> 86432 (+0.02%) Cycle count: 8542836160 -> 8531976886 (-0.13%); split: -0.65%, +0.53% Fill count: 12949 -> 12970 (+0.16%); split: -0.43%, +0.59% Scratch Memory Size: 401408 -> 385024 (-4.08%) Spiderman Remastered: Totals from 6639 (98.94% of 6710) affected shaders: Instrs: 6877980 -> 6800592 (-1.13%); split: -3.11%, +1.98% Cycle count: 282183352210 -> 282100051824 (-0.03%); split: -0.62%, +0.59% Spill count: 63147 -> 64218 (+1.70%); split: -7.12%, +8.82% Fill count: 184931 -> 175591 (-5.05%); split: -10.81%, +5.76% Scratch Memory Size: 5318656 -> 5970944 (+12.26%); split: -5.91%, +18.17% Max live registers: 918240 -> 906604 (-1.27%) Strange Brigade: Totals from 3675 (92.24% of 3984) affected shaders: Instrs: 1462231 -> 1429345 (-2.25%); split: -2.25%, +0.00% Cycle count: 37404050 -> 37345292 (-0.16%); split: -1.25%, +1.09% Max live registers: 361849 -> 351265 (-2.92%) Witcher 3: Totals from 13 (46.43% of 28) affected shaders: Instrs: 593 -> 660 (+11.30%) Cycle count: 28302 -> 28714 (+1.46%) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28199>	2025-01-11 08:41:42 +00:00
Lionel Landwerlin	0a5bdf1199	brw: add infra to make use of the address register in the IR This limits the address register to simple cases inside a block. Validation ensures that the address register is only written once and read once. Instruction scheduling makes sure that instructions using the address register in the generator are not scheduled while there is an usage of the register in the IR. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28199>	2025-01-11 08:41:42 +00:00
Kenneth Graunke	4ab04799ee	brw: Delete assign_constant_locations and push_constant_loc[] The push_constant_loc[] array is always an identity mapping these days, so it's kind of pointless. Just use the original uniform number and skip the unnecessary "remap" step. With that gone, and shrinking UBO ranges gone, assign_constant_locations() is now empty and can be removed as well. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32841>	2025-01-06 12:45:47 +00:00
Kenneth Graunke	93e186e1a4	brw: Delete pull constant lowering Now that we never shrink ranges in the backend, we never lower push constants to pull constants late in the backend either. get_pull_loc will never return true, and so all of brw_lower_constant_loads becomes a noop. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32841>	2025-01-06 12:45:47 +00:00
Caio Oliveira	e1aebf8a0c	intel/brw: Remove 'fs' prefix from passes and related functions Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32813>	2025-01-02 18:11:05 +00:00
Caio Oliveira	056b14b882	intel/brw: Move two NIR passes to brw_nir.c Reviewed-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32799>	2024-12-30 20:18:23 +00:00
Ian Romanick	007c92b2ac	brw/lower: Adjust source stride on DF is_scalar sources to MAD on Gfx9 This commit used to be "brw/emit: Allow scalar sources to 64-bit 3-source instructions". These instructions were fixed up in brw_eu_emit. There seems to be some conflict with the <0,1,0> stride an post-RA scheduling. The only difference between the passing code generated by this commit and the failing code generated by the older commit is some post-RA scheduling. v2: Change the stride of a MAD even if the instruction isn't lowered. MAD instructions that are already SIMD8 have to follow the same rules. 🤦 v3: Pull the lowering out to its own pass. Update the comment in brw_fs_validate. Suggested by Ken. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>	2024-12-24 18:09:58 -08:00
Caio Oliveira	c8f6d8154f	intel/brw: Remove overloads for brw_print_instruction/s functions Almost all cases now handled with default arguments. The only real extra work that was being done was pushed to the client code in debug_optimizer(). Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32596>	2024-12-12 22:01:48 +00:00
Kenneth Graunke	6341b3cd87	brw: Combine convergent texture buffer fetches into fewer loads Borderlands 3 (both DX11 and DX12 renderers) have a common pattern across many shaders: con 32x4 %510 = (uint32)txf %2 (handle), %1191 (0x10) (coord), %1 (0x0) (lod), 0 (texture) con 32x4 %512 = (uint32)txf %2 (handle), %1511 (0x11) (coord), %1 (0x0) (lod), 0 (texture) ... con 32x4 %550 = (uint32)txf %2 (handle), %1549 (0x25) (coord), %1 (0x0) (lod), 0 (texture) con 32x4 %552 = (uint32)txf %2 (handle), %1551 (0x26) (coord), %1 (0x0) (lod), 0 (texture) A single basic block contains piles of texelFetches from a 1D buffer texture, with constant coordinates. In most cases, only the .x channel of the result is read. So we have something on the order of 28 sampler messages, each asking for...a single uint32_t scalar value. Because our sampler doesn't have any support for convergent block loads (like the untyped LSC transpose messages for SSBOs)...this means we were emitting SIMD8/16 (or SIMD16/32 on Xe2) sampler messages for every single scalar, replicating what's effectively a SIMD1 value to the entire register. This is hugely wasteful, both in terms of register pressure, and also in back-and-forth sending and receiving memory messages. The good news is we can take advantage of our explicit SIMD model to handle this more efficiently. This patch adds a new optimization pass that detects a series of SHADER_OPCODE_TXF_LOGICAL, in the same basic block, with constant offsets, from the same texture. It constructs a new divergent coordinate where each channel is one of the constants (i.e <10, 11, 12, ..., 26> in the above example). It issues a new NoMask divergent texel fetch which loads N useful channels in one go, and replaces the rest with expansion MOVs that splat the SIMD1 result back to the full SIMD width. (These get copy propagated away.) We can pick the SIMD size of the load independently of the native shader width as well. On Xe2, those 28 convergent loads become a single SIMD32 ld message. On earlier hardware, we use 2 SIMD16 messages. Or we can use a smaller size when there aren't many to combine. In fossil-db, this cuts 27% of send messages in affected shaders, 3-6% of cycles, 2-3% of instructions, and 8-12% of live registers. On A770, this improves performance of Borderlands 3 by roughly 2.5-3.5%. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32573>	2024-12-12 00:05:42 +00:00
Caio Oliveira	abe41b1d2c	intel/compiler: Use #pragma once instead of header guards Acked-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32534>	2024-12-11 19:47:44 +00:00
Lionel Landwerlin	ba3ff8b3bb	brw: move barycentric_mode enum to intel_shader_enums.h Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32329>	2024-11-26 13:05:30 +00:00
Ian Romanick	2cc1575a31	brw/algebraic: Refactor constant folding out of brw_fs_opt_algebraic Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31729>	2024-10-25 23:39:36 +00:00
Sviatoslav Peleshko	2a4efe21c5	intel/brw/gfx9: Implement WaClearArfDependenciesBeforeEot Cc: mesa-stable Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11928 Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31746>	2024-10-23 15:02:27 +00:00
Lionel Landwerlin	97b17aa0b1	brw/nir: rework inline_data_intel to work with compute This intrinsic was initially dedicated to mesh/task shaders, but the mechanism it exposes also exists in the compute shaders on Gfx12.5+. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31508>	2024-10-17 19:35:59 +00:00
Caio Oliveira	9537b62759	intel/brw: Add SHADER_OPCODE_REDUCE Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30496>	2024-10-11 06:40:29 +00:00
Caio Oliveira	bf9456753d	intel/brw: Validate some instructions exists only up until some phases Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30496>	2024-10-11 06:40:29 +00:00
Caio Oliveira	affa7567c2	intel/brw: Add phases to backend The general idea is to be able to validate that certain instructions were lowered and certain restrictions were already handled. Passes can now assert their expectations, i.e. if a pass is mean to run after certain lowerings or not. The actual phases are a initial stab and as we re-organized the passes, we may remove/add phases. This commit just add some phase steps, later commits will make use of them. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30496>	2024-10-11 06:40:29 +00:00
Caio Oliveira	2811cb2923	intel: Add statistic for Non SSA registers after NIR to BRW This is going to be useful while we convert the NIR to BRW to produce SSA definitions. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30496>	2024-10-11 06:40:29 +00:00
Caio Oliveira	6db7d1af16	intel/compiler: Rename shader_stats structs Add the `brw_` and `elk_` prefixes to the structs to avoid compilation failure building with LTO ("violates the C++ One Definition Rule") when the structs diverge. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30496>	2024-10-11 06:40:29 +00:00
Kenneth Graunke	c19e5a0a75	intel/brw: Replace predicated break optimization with a simple peephole We can achieve most of what brw_fs_opt_predicated_break() does with simple peepholes at NIR -> BRW conversion time. For predicated break and continue, we can simply look at an IF ... ENDIF sequence after emitting it. If there's a single instruction between the two, and it's a BREAK or CONTINUE, then we can move the predicate from the IF onto the jump, and delete the IF/ENDIF. Because we haven't built the CFG at this stage, we only need to remove them from the linked list of instructions, which is trivial to do. For the predicated while optimization, we can rely on the fact that we already did the predicated break optimization, and simply look for a predicated BREAK just before the WHILE. If so, we move the predicate onto the WHILE, invert it, and remove the BREAK. There are a few cases where this approach does a worse job than the old one: nir_convert_from_ssa may introduce load_reg and store_reg in blocks containing break, and nir_trivialize_registers may decide it needs to insert movs into those blocks. So, at NIR -> BRW time, we'll actually emit some MOVs there, which might have been possible to copy propagate out after later optimizations. However, the fossil-db results show that it's still pretty competitive. For instructions, 1017 shaders were helped (average -1.87 instructions), while only 62 were hurt (average +2.19 instructions). In affected shaders, it was -0.08% for instructions. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30498>	2024-08-05 19:17:55 -07:00
Kenneth Graunke	fad63d6483	intel/brw: Delete the brw_fs_opt_dead_control_flow_eliminate() pass With the select peephole gone, this no longer does much of anything. No instruction changes in fossil-db on Alchemist. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30498>	2024-08-05 19:17:55 -07:00
Kenneth Graunke	06e8335e11	intel/brw: Delete the brw_fs_opt_peephole_select() pass Now that we can handle load_ubo in NIR's peephole select pass, the backend pass isn't really useful anymore. fossil-db results on Alchemist show almost no impact: Totals: Instrs: 150646561 -> 150647106 (+0.00%); split: -0.00%, +0.00% Cycles: 12633748945 -> 12633760459 (+0.00%) Totals from 261 (0.04% of 630008) affected shaders: Instrs: 404946 -> 405491 (+0.13%); split: -0.00%, +0.14% Cycles: 23947172 -> 23958686 (+0.05%) Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30498>	2024-08-05 19:17:55 -07:00
Kenneth Graunke	8bca7e520c	intel/brw: Only force g0's liveness to be the whole program if spilling We don't actually need to extend g0's live range to the EOT message generally - most messages that end a shader are headerless. The main implicit use of g0 is for constructing scratch headers. With the last two patches, we now consider scratch access that may exist in the IR and already extend the liveness appropriately. There is one remaining problem: spilling. The register allocator will create new scratch messages when spilling a register, which need to create scratch headers, which need g0. So, every new spill or fill might extend the live range of g0, which would create new interference, altering the graph. This can be problematic. However, when compiling SIMD16 or SIMD32 fragment shaders, we don't allow spilling anyway. So, why not use allow g0? Also, when trying various scheduling modes, we first try allocation without spilling. If it works, great, if not, we try a (hopefully) less aggressive schedule, and only allow spilling on the lowest-pressure schedule. So, even for regular SIMD8 shaders, we can potentially gain the use of g0 on the first few tries at scheduling+allocation. Once we try to allocate with spilling, we go back to reserving g0 for the entire program, so that we can construct scratch headers at any point. We could possibly do better here, but this is simple and reliable with some benefit. Thanks to Ian Romanick for suggesting I try this approach. fossil-db on Alchemist shows some more spill/fill improvements: Totals: Instrs: 149062395 -> 149053010 (-0.01%); split: -0.01%, +0.00% Cycles: 12609496913 -> 12611652181 (+0.02%); split: -0.45%, +0.47% Spill count: 52891 -> 52471 (-0.79%) Fill count: 101599 -> 100818 (-0.77%) Scratch Memory Size: 3292160 -> 3197952 (-2.86%) Totals from 416541 (66.59% of 625484) affected shaders: Instrs: 124058587 -> 124049202 (-0.01%); split: -0.01%, +0.01% Cycles: 3567164271 -> 3569319539 (+0.06%); split: -1.61%, +1.67% Spill count: 420 -> 0 (-inf%) Fill count: 781 -> 0 (-inf%) Scratch Memory Size: 94208 -> 0 (-inf%) Witcher 3 shows a 33% reduction in scratch memory size, for example. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30319>	2024-08-01 16:37:34 -07:00
Kenneth Graunke	9200fb966c	intel/brw: Record that SHADER_OPCODE_SCRATCH_HEADER uses g0 The generator code for emitting legacy scratch headers was implicitly using g0 as a source. But the IR wasn't indicating any usage of g0, which means the liveness isn't properly tracked at the IR level. It works because we reserve g0 as permanently live for the whole program. In order to stop doing that, we need to record it properly. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30319>	2024-08-01 16:37:31 -07:00
Caio Oliveira	23b0798551	intel/brw: Move interp_reg and per_primitive_reg out of fs_visitor Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>	2024-07-25 15:37:13 +00:00
Caio Oliveira	a5cc8c4807	intel/brw: Move VARYING_PULL_CONSTANT_LOAD from fs_visitor to fs_builder Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>	2024-07-25 15:37:13 +00:00
Caio Oliveira	8a39231e4f	intel/brw: Move calculate_cfg out of fs_visitor Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>	2024-07-25 15:37:13 +00:00
Caio Oliveira	b98930c770	intel/brw: Move regalloc and scheduling functions out of fs_visitor Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>	2024-07-25 15:37:13 +00:00
Caio Oliveira	5cb1f46fd1	intel/brw: Remove workgroup_size() helper from fs_visitor Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>	2024-07-25 15:37:13 +00:00
Caio Oliveira	17b7e49089	intel/brw: Move out of fs_visitor and rename print instructions They use the brw_print prefix now. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>	2024-07-25 15:37:13 +00:00
Caio Oliveira	cdbee4156e	intel/brw: Reduce scope of some MESH specific functions Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>	2024-07-25 15:37:13 +00:00
Caio Oliveira	67ead4edff	intel/brw: Reduce scope of some TES specific functions Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>	2024-07-25 15:37:13 +00:00
Caio Oliveira	f9ddf51b70	intel/brw: Reduce scope of some TCS specific functions Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>	2024-07-25 15:37:13 +00:00
Caio Oliveira	47b9dc9070	intel/brw: Reduce scope of some GS specific functions Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>	2024-07-25 15:37:13 +00:00
Caio Oliveira	28858b3ad1	intel/brw: Reduce scope of some FS specific functions Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>	2024-07-25 15:37:13 +00:00
Caio Oliveira	a8b4b9dd51	intel/brw: Reduce scope of some VS specific functions Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>	2024-07-25 15:37:13 +00:00
Caio Oliveira	fdb029fe1b	intel/brw: Move and reduce scope of run_*() functions Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>	2024-07-25 15:37:13 +00:00
Caio Oliveira	3670c24740	intel/brw: Replace uses of fs_reg with brw_reg And remove the fs_reg alias. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29791>	2024-07-03 02:53:19 +00:00
Caio Oliveira	d00329e821	intel/brw: Replace some fs_reg constructors with functions Create three helper functions for ATTR, UNIFORM and VGRF creation. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29791>	2024-07-03 02:53:18 +00:00
Sagar Ghuge	99ce8b5a07	intel/compiler: Add indirect mov lowering pass Indirect addressing(vx1 and vxh) not supported with UB/B datatype for src0, so we need to change the data type for both dest and src0. This fixes following tests cases on Xe2+ - dEQP-VK.spirv_assembly.instruction.compute.8bit_storage.push_constant_8_to_16* - dEQP-VK.spirv_assembly.instruction.compute.8bit_storage.push_constant_8_to_32* Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29316>	2024-07-01 19:06:31 +00:00
Kenneth Graunke	1e69ec3b8d	intel/brw: Add a lower_csel pass and allow building it for all types We can do CSEL on F, HF, W, and D on Gfx11+. Gfx9 can only do F. We can lower unsupported types to CMP+CSEL, allowing us to use CSEL in the IR and not worry about the limitations. Rework: (Sagar) - Update validation pass for CSEL Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29316>	2024-07-01 19:06:31 +00:00
Francisco Jerez	95eec5a0dd	intel/fs/xe2+: Add ALU-based implementation of barycentric interpolation at a per-channel offset. This implements a replacement for the previous implementation of nir_intrinsic_load_barycentric_at_offset that relied on the Pixel Interpolator shared function, since it's going to be removed from the hardware from Xe2 onwards. That's okay since we can get all the primitive setup information needed for interpolation at an arbitrary coordinate: We use the X/Y offset relative to the "X/Y Start" coordinates from the thread payload order to evaluate the plane equations also provided in the thread payload for each barycentric coordinate of each polygon. The evaluation of the barycentric plane equations (and the RHW plane equation for perspective-correct interpolation) uses the accumulator and MAD/MAC for ALU efficiency, but that means we need to manually split instructions to fit the width of the accumulator. The division and scaling for perspective-correct interpolation is also now done in the shader if necessary. Note that even though this is only immediately useful on Xe2+, the thread payload numbers are filled out for older platforms, and the EU restrictions of previous Xe platforms are taken into account, mostly for the purposes of testing and performance evaluation. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29847>	2024-06-27 00:18:00 +00:00
Caio Oliveira	b59ea3d63f	intel/brw: Print SWSB information when dumping instructions These were only being shown before as part of disassemble. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29738>	2024-06-23 08:09:56 -07:00

1 2 3 4 5 ...

359 commits