fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-22 00:08:09 +02:00

Author	SHA1	Message	Date
Mel Henning	17876a00af	nir: Add a faster lowest common ancestor algorithm On a fossil from the blender 4.5.0 vulkan backend, this improves compile times in nak by about 17%. Compile time of other shaders improves by a more modest 1.2%. No stat changes on shader-db. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36184>	2025-09-08 23:03:13 +00:00
Caio Oliveira	f37c9c873c	brw: Fix printing of blocks in disassembly when BRW is available When disassembling and BRW IR is available (which happens in the generator), there will be pointers to the BRW's basic block structures that are used to print the block numbers and predecessor/successors in the output. There are two challenges: - Because DO and FLOW instructions are not real instructions, they are not emitted in the output but would still cause the output to contain empty blocks. Previous code accounted for DO but still had problems. - DO blocks have special physical links that don't make sense when the DO is not emitted at the end, but they would be shown even if that block was omitted. These issues can be seen here (edited to remove non-essential bits) ``` START B0 (2 cycles) mov(8) g126<1>UD 0x3f800000UD END B0 ->B1 START B2 <-B1 <-B4 (0 cycles) END B2 ->B3 START B3 <-B2 (260 cycles) LABEL1: mov(8) g1<1>D 0D cmp.ge.f0.0(8) null<1>D g2<0,1,0>D 10D sync nop(1) null<0,1,0>UB send(1) g0UD g1UD nullUD (+f0.0) break(8) JIP: LABEL0 UIP: LABEL0 END B3 ->B1 ->B5 ->B4 START B4 <-B3 (1000 cycles) sync nop(1) null<0,1,0>UB mov(8) g126<1>UD g0<0,1,0>UD LABEL0: while(8) JIP: LABEL1 END B4 ->B2 START B5 <-B1 <-B3 (20 cycles) ``` For example: - Block 1 is missing (a skipped DO block) - Block 2 is empty (it was a FLOW block) - Block 3 ends with a link to Block 1 (the special links involving DO blocks). Two key changes were made to fix this. First, skip the DO and FLOW blocks completely. The use_tail ensures that the instruction group is reused to avoid empty blocks. Second, when printing the successors and predecessors, walk through the skipped blocks. And finally, don't print the special blocks. With the fix, here's the output. Note the blocks retain their original BRW IR number. 
``` START B0 (2 cycles) mov(8) g127<1>UD 0x3f800000UD END B0 ->B3 START B3 <-B0 <-B4 (260 cycles) LABEL1: mov(8) g1<1>D 0D cmp.ge.f0.0(8) null<1>D g2<0,1,0>D 10D sync nop(1) null<0,1,0>UB send(1) g0UD g1UD nullUD (+f0.0) break(8) JIP: LABEL0 UIP: LABEL0 END B3 ->B5 ->B4 START B4 <-B3 (1000 cycles) sync nop(1) null<0,1,0>UB mov(8) g127<1>UD g0<0,1,0>UD LABEL0: while(8) JIP: LABEL1 END B4 ->B3 START B5 <-B3 (20 cycles) ``` Issue was spotted by Ken. Fixes: `d2c39b1779` ("intel/brw: Always have a (non-DO) block after a DO in the CFG") Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36226>	2025-09-06 16:42:05 +00:00
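The "walk through the skipped blocks" step described in the commit above can be sketched roughly as follows. This is a Python illustration and not the Mesa C++ code: the `Block` class, its `skipped` flag, and `visible_successors` are hypothetical names, and only the ordinary links from the example are modelled (the special DO links are left out, as the commit says they are not printed).

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    num: int
    skipped: bool = False  # hypothetical flag: True for DO/FLOW blocks that emit nothing
    successors: list = field(default_factory=list)


def visible_successors(block):
    """Return the numbers of the first non-skipped blocks reachable from `block`."""
    result = []
    seen = set()
    stack = list(reversed(block.successors))
    while stack:
        b = stack.pop()
        if b.num in seen:
            continue
        seen.add(b.num)
        if b.skipped:
            # Look through the skipped block to whatever follows it.
            stack.extend(reversed(b.successors))
        else:
            result.append(b.num)
    return result


# CFG loosely modelled on the example: B1 is a DO block, B2 a FLOW block.
b0, b1, b2 = Block(0), Block(1, skipped=True), Block(2, skipped=True)
b3, b4, b5 = Block(3), Block(4), Block(5)
b0.successors = [b1]
b1.successors = [b2]
b2.successors = [b3]
b3.successors = [b5, b4]
b4.successors = [b2]
```

With this model, `visible_successors(b0)` looks through B1 and B2 and reports B3, which corresponds to the `END B0 ->B3` line in the fixed output.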
Lionel Landwerlin	a91e0e0d61	brw: add support for separate tessellation shader compilation Tessellation factors have to be written dynamically (based on the next shader primitive topology) and the builtins read using a dynamic offset (based on the preceding shader's VUE). Anv is updated to use this new infrastructure for dynamic patch_control_points. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34872>	2025-09-05 07:46:17 +00:00
Lionel Landwerlin	a18835a9ca	anv/brw/iris: move VS VUE computation to backend Drivers can provide the inputs required for the backend to call the compute function. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ivan Briano <ivan.briano@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34872>	2025-09-05 07:46:16 +00:00
Lionel Landwerlin	8dee4813b0	brw: add ability to compute VUE map for separate tcs/tes Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34872>	2025-09-05 07:46:16 +00:00
Ian Romanick	1ce90ad5e1	elk: Use nir_opt_sink and more nir_opt_move I spent a bunch of time playing around with the various enable bits, and this was the best I could come up with. Enabling any of nir_move_comparisons or nir_move_load_ubo in nir_opt_sink helped instructions quite a bit, but it also caused a large pile of added spills and fills. shader-db: Broadwell total instructions in shared programs: 18428980 -> 18427957 (<.01%) instructions in affected programs: 425245 -> 424222 (-0.24%) helped: 1522 / HURT: 405 total cycles in shared programs: 954756705 -> 953755695 (-0.10%) cycles in affected programs: 623470486 -> 622469476 (-0.16%) helped: 17989 / HURT: 21175 total spills in shared programs: 8349 -> 8356 (0.08%) spills in affected programs: 285 -> 292 (2.46%) helped: 7 / HURT: 13 total fills in shared programs: 10426 -> 10192 (-2.24%) fills in affected programs: 675 -> 441 (-34.67%) helped: 25 / HURT: 1 LOST: 346 GAINED: 554 Haswell total instructions in shared programs: 16809730 -> 16801634 (-0.05%) instructions in affected programs: 772251 -> 764155 (-1.05%) helped: 3055 / HURT: 840 total cycles in shared programs: 945179935 -> 944315696 (-0.09%) cycles in affected programs: 549177588 -> 548313349 (-0.16%) helped: 34143 / HURT: 23605 total spills in shared programs: 7699 -> 7666 (-0.43%) spills in affected programs: 353 -> 320 (-9.35%) helped: 10 / HURT: 16 total fills in shared programs: 8184 -> 7671 (-6.27%) fills in affected programs: 1006 -> 493 (-50.99%) helped: 30 / HURT: 2 total sends in shared programs: 1016676 -> 1016682 (<.01%) sends in affected programs: 49 -> 55 (12.24%) helped: 0 / HURT: 6 LOST: 415 GAINED: 441 Ivy Bridge total instructions in shared programs: 15764955 -> 15757178 (-0.05%) instructions in affected programs: 707453 -> 699676 (-1.10%) helped: 2893 / HURT: 547 total cycles in shared programs: 
430017934 -> 429720104 (-0.07%) cycles in affected programs: 251816726 -> 251518896 (-0.12%) helped: 33110 / HURT: 22056 total spills in shared programs: 1537 -> 1525 (-0.78%) spills in affected programs: 18 -> 6 (-66.67%) helped: 6 / HURT: 0 total fills in shared programs: 926 -> 905 (-2.27%) fills in affected programs: 24 -> 3 (-87.50%) helped: 6 / HURT: 0 total sends in shared programs: 816646 -> 816652 (<.01%) sends in affected programs: 49 -> 55 (12.24%) helped: 0 / HURT: 6 LOST: 332 GAINED: 417 Sandy Bridge total instructions in shared programs: 14055229 -> 14045281 (-0.07%) instructions in affected programs: 1436142 -> 1426194 (-0.69%) helped: 5858 / HURT: 757 total cycles in shared programs: 772123170 -> 813543451 (5.36%) cycles in affected programs: 521342483 -> 562762764 (7.94%) helped: 27928 / HURT: 35923 total spills in shared programs: 1742 -> 1741 (-0.06%) spills in affected programs: 66 -> 65 (-1.52%) helped: 1 / HURT: 0 total fills in shared programs: 970 -> 967 (-0.31%) fills in affected programs: 93 -> 90 (-3.23%) helped: 1 / HURT: 0 total sends in shared programs: 1239222 -> 1238992 (-0.02%) sends in affected programs: 6137 -> 5907 (-3.75%) helped: 342 / HURT: 112 LOST: 244 GAINED: 434 Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 8366385 -> 8363954 (-0.03%) instructions in affected programs: 162761 -> 160330 (-1.49%) helped: 600 / HURT: 195 total cycles in shared programs: 248992618 -> 252119334 (1.26%) cycles in affected programs: 50774708 -> 53901424 (6.16%) helped: 3435 / HURT: 5131 total sends in shared programs: 623693 -> 623681 (<.01%) sends in affected programs: 351 -> 339 (-3.42%) helped: 12 / HURT: 0 LOST: 0 GAINED: 6 Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25463>	2025-09-04 15:01:18 -07:00
Ian Romanick	6f30cf71fe	brw: Use nir_opt_sink and more nir_opt_move The shader-db results on most platforms are pretty mixed. However, this seems to be a decent improvement in fossil-db. shader-db:: Lunar Lake total instructions in shared programs: 17019147 -> 17023017 (0.02%) instructions in affected programs: 1200847 -> 1204717 (0.32%) helped: 814 / HURT: 2458 total cycles in shared programs: 880532116 -> 880406462 (-0.01%) cycles in affected programs: 798253846 -> 798128192 (-0.02%) helped: 30064 / HURT: 33008 total spills in shared programs: 3262 -> 3260 (-0.06%) spills in affected programs: 66 -> 64 (-3.03%) helped: 1 / HURT: 2 total fills in shared programs: 1616 -> 1637 (1.30%) fills in affected programs: 89 -> 110 (23.60%) helped: 1 / HURT: 2 LOST: 241 GAINED: 356 Meteor Lake, DG2, and Tiger Lake had similar results. (Meteor Lake shown) total instructions in shared programs: 19859724 -> 19865383 (0.03%) instructions in affected programs: 2166810 -> 2172469 (0.26%) helped: 942 / HURT: 3563 total cycles in shared programs: 879095859 -> 878616086 (-0.05%) cycles in affected programs: 753840990 -> 753361217 (-0.06%) helped: 33442 / HURT: 35053 total spills in shared programs: 4679 -> 4677 (-0.04%) spills in affected programs: 80 -> 78 (-2.50%) helped: 1 / HURT: 2 total fills in shared programs: 4113 -> 4175 (1.51%) fills in affected programs: 87 -> 149 (71.26%) helped: 1 / HURT: 2 LOST: 706 GAINED: 563 Ice Lake and Skylake had similar results. 
(Ice Lake shown) total instructions in shared programs: 20610947 -> 20615741 (0.02%) instructions in affected programs: 2138334 -> 2143128 (0.22%) helped: 979 / HURT: 3635 total cycles in shared programs: 863103771 -> 862153697 (-0.11%) cycles in affected programs: 731626072 -> 730675998 (-0.13%) helped: 34060 / HURT: 34256 total spills in shared programs: 3992 -> 3949 (-1.08%) spills in affected programs: 504 -> 461 (-8.53%) helped: 8 / HURT: 6 total fills in shared programs: 3640 -> 3573 (-1.84%) fills in affected programs: 1505 -> 1438 (-4.45%) helped: 8 / HURT: 5 LOST: 622 GAINED: 1018 fossil-db: All Intel platforms had similar results. (Lunar Lake shown) Totals: Instrs: 232649299 -> 232485503 (-0.07%); split: -0.16%, +0.09% Subgroup size: 15932144 -> 15933056 (+0.01%); split: +0.01%, -0.00% Loop count: 137431 -> 137430 (-0.00%) Cycle count: 32619860020 -> 32714539770 (+0.29%); split: -0.80%, +1.09% Spill count: 540835 -> 519861 (-3.88%); split: -4.79%, +0.91% Fill count: 700278 -> 663650 (-5.23%); split: -6.46%, +1.23% Scratch Memory Size: 37258240 -> 35654656 (-4.30%); split: -5.24%, +0.94% Max live registers: 72561256 -> 71501759 (-1.46%); split: -1.62%, +0.16% Non SSA regs after NIR: 67682385 -> 67692495 (+0.01%); split: -0.00%, +0.02% Totals from 617432 (78.20% of 789594) affected shaders: Instrs: 217754449 -> 217590653 (-0.08%); split: -0.17%, +0.10% Subgroup size: 12656912 -> 12657824 (+0.01%); split: +0.01%, -0.00% Loop count: 133283 -> 133282 (-0.00%) Cycle count: 32367979192 -> 32462658942 (+0.29%); split: -0.81%, +1.10% Spill count: 540770 -> 519796 (-3.88%); split: -4.79%, +0.91% Fill count: 700277 -> 663649 (-5.23%); split: -6.46%, +1.23% Scratch Memory Size: 37182464 -> 35578880 (-4.31%); split: -5.25%, +0.94% Max live registers: 64912683 -> 63853186 (-1.63%); split: -1.81%, +0.18% Non SSA regs after NIR: 60158776 -> 60168886 (+0.02%); split: -0.00%, +0.02% Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: 
<https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25463>	2025-09-04 15:01:18 -07:00
Caio Oliveira	4e253184de	brw: Run validation as soon as we have the CFG around Fixes: `affa7567c2` ("intel/brw: Add phases to backend") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37148>	2025-09-03 20:42:05 +00:00
Lionel Landwerlin	23a4aef14a	Revert "brw: move texture offset packing to NIR" This reverts commit `4346210ae6`. Fixes: `4346210ae6` ("brw: move texture offset packing to NIR") Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37050>	2025-08-29 06:29:14 +00:00
Ian Romanick	49141ad5f2	brw: Strategically place flags initialization to help cmod prop v2: Rebase on `ac2b072312` ("brw: Add more specific brw_builder helpers"), and fix a bug that caused the new instruction to possibly be put in the wrong place. No shader-db changes on any Intel platform. fossil-db: All Intel platforms had similar results. (Lunar Lake shown) Totals: Instrs: 233675305 -> 233641585 (-0.01%) Cycle count: 32593658094 -> 32591467794 (-0.01%); split: -0.01%, +0.00% Totals from 33513 (4.25% of 789264) affected shaders: Instrs: 5200332 -> 5166612 (-0.65%) Cycle count: 1499831128 -> 1497640828 (-0.15%); split: -0.15%, +0.00% Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35444>	2025-08-28 22:08:20 +00:00
Ian Romanick	3018849535	brw: Don't emit redundant flags initialization for subgroup op lowering No shader-db changes on any Intel platform. fossil-db: All Intel platforms had similar results. (Lunar Lake shown) Totals: Instrs: 233676039 -> 233675305 (-0.00%) Cycle count: 32594097814 -> 32593658094 (-0.00%); split: -0.00%, +0.00% Totals from 325 (0.04% of 789264) affected shaders: Instrs: 104491 -> 103757 (-0.70%) Cycle count: 1183870034 -> 1183430314 (-0.04%); split: -0.04%, +0.00% Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35444>	2025-08-28 22:08:20 +00:00
Ian Romanick	4a238f461d	brw: Do cmod prop again after brw_lower_subgroup_ops shader-db: All Intel platforms had similar results. (Lunar Lake shown) total instructions in shared programs: 17114300 -> 17114294 (<.01%) instructions in affected programs: 3617 -> 3611 (-0.17%) helped: 6 / HURT: 0 total cycles in shared programs: 886397556 -> 886397454 (<.01%) cycles in affected programs: 511400 -> 511298 (-0.02%) helped: 6 / HURT: 0 fossil-db: Lunar Lake Totals: Instrs: 233683694 -> 233676039 (-0.00%); split: -0.00%, +0.00% Cycle count: 32602038466 -> 32594097814 (-0.02%); split: -0.03%, +0.01% Spill count: 540908 -> 540704 (-0.04%) Fill count: 700935 -> 700258 (-0.10%) Totals from 2200 (0.28% of 789264) affected shaders: Instrs: 2062360 -> 2054705 (-0.37%); split: -0.37%, +0.00% Cycle count: 2506073282 -> 2498132630 (-0.32%); split: -0.41%, +0.09% Spill count: 14423 -> 14219 (-1.41%) Fill count: 34219 -> 33542 (-1.98%) Meteor Lake and DG2 had similar results. (Meteor Lake shown) Totals: Instrs: 263545171 -> 263543341 (-0.00%); split: -0.00%, +0.00% Cycle count: 26480835985 -> 26484748317 (+0.01%); split: -0.01%, +0.03% Spill count: 554335 -> 554338 (+0.00%) Fill count: 645486 -> 645498 (+0.00%) Totals from 610 (0.07% of 903944) affected shaders: Instrs: 1139871 -> 1138041 (-0.16%); split: -0.17%, +0.01% Cycle count: 2274612327 -> 2278524659 (+0.17%); split: -0.15%, +0.33% Spill count: 15153 -> 15156 (+0.02%) Fill count: 36831 -> 36843 (+0.03%) Tiger Lake, Ice Lake, and Skylake had similar results. 
(Tiger Lake shown) Totals: Instrs: 268713723 -> 268712817 (-0.00%); split: -0.00%, +0.00% Cycle count: 24653238085 -> 24652269669 (-0.00%); split: -0.00%, +0.00% Fill count: 671369 -> 671361 (-0.00%) Totals from 666 (0.07% of 899711) affected shaders: Instrs: 924423 -> 923517 (-0.10%); split: -0.11%, +0.01% Cycle count: 840380565 -> 839412149 (-0.12%); split: -0.13%, +0.02% Fill count: 13006 -> 12998 (-0.06%) Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35444>	2025-08-28 22:08:20 +00:00
Caio Oliveira	84963d6833	intel/brw: Take shader in the brw_generator::generate_code() parameters Simplify the calls in all the stage compile functions. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33541>	2025-08-28 00:06:20 +00:00
Caio Oliveira	c19a4150b5	intel/brw: Simplify variant tracking in brw_compile_fs Remove the cfg variables and use the shader pointers directly. Reset the variant pointer if a shader failed or will not be used. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33541>	2025-08-28 00:06:20 +00:00
Caio Oliveira	834e30d244	intel/brw: Simplify tracking of dispatch_width_limit in brw_compile_fs Keep it in a variable, that way don't need to check which shader to look for the limit. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33541>	2025-08-28 00:06:20 +00:00
Caio Oliveira	9d53e27579	intel/brw: Remove brw_shader::import_uniforms() The brw_shader::uniforms now is derived from the nir_shader. The only exception is compute shaders for older Gfx versions, so we move the adjust logic for that. The benefit here is untangling the code for compilation variants, that before needed to keep track of the first that compiled to, in most cases, copy an integer. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33541>	2025-08-28 00:06:19 +00:00
Caio Oliveira	b8a35a8a27	brw: Pass per_primitive_offset in brw_shader_params Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33541>	2025-08-28 00:06:19 +00:00
Caio Oliveira	6ca9021758	brw: Add brw_shader_params And unify the initialization code for brw_shader. Avoid passing brw_compile_params since for a single compilation we might have multiple shaders (the case for BS stage). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33541>	2025-08-28 00:06:18 +00:00
Caio Oliveira	1c933b6511	brw: Fix checking sources of wrong instruction in opt_address_reg_load Fixes: `8ac7802ac8` ("brw: move final send lowering up into the IR") Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37019>	2025-08-27 22:50:23 +00:00
Lionel Landwerlin	93996c07e2	brw: fix broadcast opcode The problem with the current code is that there is a disconnect between : - the virtual register size allocated - the dispatch size - the size_written value Only the last 2 are in sync and this confuses the spiller that only looks at the destination register allocation & dispatch size to figure out how much to spill. The solution in this change is to make BROADCAST more like MOV_INDIRECT, so that you can do a BROADCAST(8) that actually reads a SIMD32 register. We put the size of the register read into src2. Now the spiller sees correct read/write sizes just looking at the destination register & dispatch size. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `662339a2ff` ("brw/build: Use SIMD8 temporaries in emit_uniformize") Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13614 Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36564>	2025-08-28 00:23:44 +03:00
Lionel Landwerlin	e6ca709a4e	brw: fix INTEL_DEBUG=spill_fs We need to dirty the instruction BRW_DEPENDENCY_INSTRUCTIONS & BRW_DEPENDENCY_VARIABLES if anything was spilled. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `a6b0783375` ("brw: Use brw_ip_ranges in scheduling / regalloc") Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13233 Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36925>	2025-08-27 15:08:35 +00:00
Lionel Landwerlin	3362b8dcb5	brw: use a scalar builder for the load_payload on transpose loads I noticed SIMD32 shaders have that kind of pattern : mov(32) g94<1>D 0D { align1 WE_all }; send(1) g15UD g94UD nullUD 0x6210d500 0x02010000 ugm MsgDesc: ( load, a32, d32, V16, transpose, L1STATE_L3MOCS dst_len = 1, src0_len = 1, src1_len = 0 bti ) BTI 2 base_offset 16 { align1 WE_all 1N I@5 $1 }; Why use a 32 wide register for a SEND that is only going to read the first lane? We can stick a single physical register and reduce register pressure. DG2 fossils-db results : Totals: Instrs: 157417515 -> 157417796 (+0.00%); split: -0.00%, +0.00% Cycle count: 15362185116 -> 15363086774 (+0.01%); split: -0.05%, +0.05% Max live registers: 29059141 -> 29051166 (-0.03%) Max dispatch width: 5071256 -> 5075720 (+0.09%); split: +0.33%, -0.24% Totals from 82132 (14.43% of 569221) affected shaders: Instrs: 26564632 -> 26564913 (+0.00%); split: -0.00%, +0.00% Cycle count: 4630907475 -> 4631809133 (+0.02%); split: -0.16%, +0.18% Max live registers: 5425037 -> 5417062 (-0.15%) Max dispatch width: 128384 -> 132848 (+3.48%); split: +12.92%, -9.45% LNL fossils-db results : Totals: Instrs: 141870413 -> 141870745 (+0.00%); split: -0.00%, +0.00% Cycle count: 20176018818 -> 20191262632 (+0.08%); split: -0.07%, +0.14% Max live registers: 44858167 -> 44838370 (-0.04%) Totals from 51859 (10.55% of 491590) affected shaders: Instrs: 16834547 -> 16834879 (+0.00%); split: -0.00%, +0.00% Cycle count: 5761980106 -> 5777223920 (+0.26%); split: -0.24%, +0.50% Max live registers: 5893878 -> 5874081 (-0.34%) Perf A/B testing only reported a 0.5% improvement on DG2 on one trace, no changes on BMG. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36958>	2025-08-26 12:03:22 +00:00
Lionel Landwerlin	27c69acb6a	brw: remove uniform from opt_offsets Those are for push constants; there is no point in doing that because: - there are no HW constant offsets in push constants (payload delivery), it's just register offset calculation - if we have a dynamic value it's already using MOV_INDIRECT Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `e103afe7be` ("brw: run the nir_opt_offsets pass and set the maximum offset size") Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36958>	2025-08-26 12:03:22 +00:00
Konstantin Seurer	9df7b48d2f	nir: Use nir_def_as_* in more places Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36746>	2025-08-24 14:03:09 +00:00
Caio Oliveira	74a4e7dd4b	brw: Fix folding case for MAD instruction with all immediates Fixes: `b605f76b2a` ("brw/algebraic: Constant fold multiplicands of MAD") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36867>	2025-08-21 17:19:18 +00:00
Caio Oliveira	eec64c865f	brw: Add disabled test for MAD constant folding Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36867>	2025-08-21 17:19:18 +00:00
Calder Young	c7e48f79b7	brw,anv: Reduce UBO robustness size alignment to 16 bytes Instead of being encoded as a contiguous 64-bit mask of individual registers, the robustness information is now encoded as a vector of up to 4 bytes that represent the limits of each of the pushed UBO ranges in 16 byte units. Some buggy Direct3D workloads are known to depend on a robustness alignment as low as 16 bytes to work properly. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36455>	2025-08-21 09:04:55 +00:00
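The per-range encoding described in the commit above might look something like the following Python sketch. It is an illustration under stated assumptions, not the driver's code: the function names are hypothetical, and both the round-up of partial 16-byte units and the clamp to 255 are assumptions about how a limit would fit in one byte.

```python
ROBUSTNESS_UNIT = 16  # bytes per unit, per the commit message
MAX_PUSHED_RANGES = 4  # "a vector of up to 4 bytes"


def encode_limits(bound_sizes_bytes):
    """Encode each pushed UBO range's bound size as one byte of 16-byte units."""
    if len(bound_sizes_bytes) > MAX_PUSHED_RANGES:
        raise ValueError("at most 4 pushed UBO ranges")
    limits = []
    for size in bound_sizes_bytes:
        # Assumption: a partially covered unit still counts as in bounds.
        units = (size + ROBUSTNESS_UNIT - 1) // ROBUSTNESS_UNIT
        # Assumption: clamp so the limit fits in a single byte.
        limits.append(min(units, 255))
    return bytes(limits)


def in_bounds(limits, range_index, byte_offset):
    """Check whether a byte offset falls below the encoded limit of a range."""
    return byte_offset // ROBUSTNESS_UNIT < limits[range_index]


limits = encode_limits([64, 20])
```

Here a 64-byte range encodes as 4 units and a 20-byte range as 2 units, so the check works at 16-byte granularity as the commit title describes.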
Lionel Landwerlin	2281e88381	brw: make assign_curb_setup visible in optimizer debug Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36455>	2025-08-21 09:04:54 +00:00
Lionel Landwerlin	df37c7ca74	brw: fix analysis dirtying with pulled constants Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `5c17299084` ("brw: enable A64 pulling of push constants") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36455>	2025-08-21 09:04:53 +00:00
Marek Olšák	c601308615	nir: convert nir_instr_worklist to init/fini semantics w/out allocation This removes the malloc overhead. Reviewed-by: Gert Wollny <gert.wollny@collabora.com> Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36728>	2025-08-21 06:13:49 +00:00
Marek Olšák	3aadae22ad	nir: make nir_block::predecessors & dom_frontier sets non-malloc'd We can just place the set structures inside nir_block. This reduces the number of ralloc calls by 6.7% when compiling Heaven shaders with radeonsi+ACO using a release build (i.e. not including nir_validate set allocations, which are also removed). Reviewed-by: Gert Wollny <gert.wollny@collabora.com> Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36728>	2025-08-21 06:13:48 +00:00
Lionel Landwerlin	fe38fb858c	brw: workaround broken indirect RT messages on Gfx11 Unfortunately we cannot use the indirect descriptor on Gfx11, it appears to just drop writes. Other platforms appear to be fine. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: mesa-stable Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36883>	2025-08-20 15:01:50 +00:00
Lionel Landwerlin	a0844458b8	brw: enable opt_register_coalesce to work with multiple EOT blocks Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Cc: mesa-stable Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36883>	2025-08-20 15:01:50 +00:00
Lionel Landwerlin	c4c7ff3f8f	brw: enable register allocation to deal with multiple EOTs Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Cc: mesa-stable Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36883>	2025-08-20 15:01:50 +00:00
Caio Oliveira	4fda724fd4	brw: Avoid invalid access when compacting out-of-bounds JIP/UIP Usually JIP will be valid, but as part of other changes, it will be possible to have a shader that has multiple EOT messages and ends with an ENDIF instruction. Its JIP will point after the program ends. This is fine but was tripping up the compaction code. Change compaction to not read its internal structures beyond the last instruction. Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36822>	2025-08-20 00:54:41 +00:00
Caio Oliveira	148063670d	brw: If the instruction is already a SEND, no need to resize sources Kept an assert as a placeholder in case we had something odd going on that this code was protecting. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36817>	2025-08-19 13:54:43 +00:00
Caio Oliveira	cebac156c4	brw: Only access valid sources in lower_btd_logical_send() Only the SHADER_OPCODE_BTD_SPAWN_LOGICAL has sources, so only reach for them when handling that instruction. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36817>	2025-08-19 13:54:43 +00:00
Caio Oliveira	dc960936fc	brw: Move resize_sources() earlier when lowering FIND_LIVE_CHANNELS Move it before the new source is used. This currently works because all instructions have a minimum amount of sources allocated, but a later commit will change that. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36817>	2025-08-19 13:54:43 +00:00
Caio Oliveira	fe2e2fabcd	brw: Make sure copied instructions don't copy the list pointers Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36817>	2025-08-19 13:54:43 +00:00
Caio Oliveira	5a34f676a5	brw: Define order for fixes in 3-src operand fix Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36817>	2025-08-19 13:54:43 +00:00
Sagar Ghuge	49b917baaf	intel/compiler: Fix ray geometry index We have only a 24-bit wide geometry index, not a 28-bit wide one. Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Iván Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36796>	2025-08-19 09:32:55 +00:00
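To illustrate why the field width in the commit above matters, here is a small Python sketch comparing a 24-bit and a 28-bit mask. The assumption that the geometry index occupies the low bits of a 32-bit dword, and the helper name `extract_low_bits`, are hypothetical and only serve the example.

```python
GEOMETRY_INDEX_BITS = 24  # width stated in the commit message


def extract_low_bits(dword, bits):
    """Keep only the low `bits` bits of a 32-bit value."""
    return dword & ((1 << bits) - 1)


# Hypothetical dword whose bits 24..27 hold unrelated data (0xF).
dword = 0x0F123456
correct = extract_low_bits(dword, GEOMETRY_INDEX_BITS)
too_wide = extract_low_bits(dword, 28)
```

With the 28-bit mask the unrelated upper nibble leaks into the result, which is the kind of error a too-wide field extraction produces.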
Matt Turner	6fd4dc353c	elk/algebraic: Protect SHUFFLE from OOB indices Akin to `b67230de63` ("intel/fs: Protect opt_algebraic from OOB BROADCAST indices"), we need to protect SHUFFLE as well. Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36779>	2025-08-19 09:15:19 +00:00
Matt Turner	b4b692c486	brw/algebraic: Protect SHUFFLE from OOB indices Akin to `b67230de63` ("intel/fs: Protect opt_algebraic from OOB BROADCAST indices"), we need to protect SHUFFLE as well. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13351 Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36779>	2025-08-19 09:15:19 +00:00
Lionel Landwerlin	c871a62a75	brw: move URB channel mask shifting to the lowering pass For example Xe2 uses the LSC and doesn't need the shifting, so let's just apply it where it's needed. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36757>	2025-08-13 12:01:49 +00:00
Lionel Landwerlin	68838d7001	brw: reorder reloc enums to leave embedded samplers at the end So that the driver can allocate an array of relocations using BRW_SHADER_RELOC_EMBEDDED_SAMPLER_HANDLE + number_of_embedded_samplers Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36757>	2025-08-13 12:01:49 +00:00
Lionel Landwerlin	46c16f854e	brw: compute consistent clip/cull distance masks with VUE We can optimize the VUE layout in cases where all shaders are compiled together and some outputs are unused. So we need to have consistent clip/cull_distance_mask with the VUE. Previously we could have a VUE without ClipDistance present in the header and yet have a non zero clip_distance_mask. This would trip the HW into taking into account a VUE field that doesn't exist. Here we set the clip/cull_distance_mask to 0 if the associated output is not written by the shader. The written outputs are always consistent with what's in the VUE. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `2d396f6085` ("intel: prepare VUE layout for more than 2 layouts") Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13685 Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36734>	2025-08-13 06:24:44 +00:00
Sagar Ghuge	cac3b4f404	anv: Mask off excessive invocations For unaligned invocations, don't launch two COMPUTE_WALKERs; instead we can mask off excessive invocations in the shader itself at the NIR level and launch one additional workgroup. Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36245>	2025-08-12 23:17:02 +00:00
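The idea in the commit above, launching one extra workgroup and masking off the surplus invocations, can be sketched in Python as below. This is a simplified 1D model, not the anv or NIR implementation; `plan_dispatch` and `invocation_is_active` are hypothetical names.

```python
def plan_dispatch(total_invocations, workgroup_size):
    """Return (full_workgroups, launched_workgroups) for a 1D dispatch."""
    full, remainder = divmod(total_invocations, workgroup_size)
    # One additional workgroup covers the unaligned tail, if there is one.
    launched = full + (1 if remainder else 0)
    return full, launched


def invocation_is_active(global_id, total_invocations):
    """Stand-in for the bounds check a shader would do to mask off extras."""
    return global_id < total_invocations


full, launched = plan_dispatch(70, 32)
active = sum(invocation_is_active(g, 70) for g in range(launched * 32))
```

For 70 invocations with a workgroup size of 32, a single dispatch of 3 workgroups launches 96 invocations, and the bounds check leaves exactly 70 of them active.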
Kenneth Graunke	5e9de5317e	brw: Validate that send payloads can't be imms or have source mods To ensure we haven't missed resolving these things. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34040>	2025-08-08 22:12:11 +00:00
Kenneth Graunke	22165defb5	brw: Drop interlock and memory fence logical opcodes from is_payload() These are lowered to sends prior to any callers of this helper. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34040>	2025-08-08 22:12:11 +00:00
Kenneth Graunke	ed4fadbb16	brw: Drop INTERPOLATE_AT_* opcodes from is_payload() These are lowered to sends prior to any callers of this helper. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34040>	2025-08-08 22:12:10 +00:00

4526 commits