fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-20 11:18:11 +02:00

Author	SHA1	Message	Date
Ian Romanick	df704bd38e	elk: Call nir_opt_algebraic_late in elk_postprocess_nir Make sure that lowering undone in elk_nir_optimize are reapplied. No shader-db or fossil-db changes on any Intel platform. This is most likely to impact either Gfx8 on ANV or Gfx7.5 on HASVK. I don't fossil-db test either of those platforms. I tried doing a similar thing here as is done in BRW (previous commit), but that caused a couple Haswell shaders to fall off a performance cliff: total spills in shared programs: 8247 -> 8311 (0.78%) spills in affected programs: 6 -> 70 (1066.67%) helped: 0 / HURT: 2 total fills in shared programs: 8558 -> 8910 (4.11%) fills in affected programs: 6 -> 358 (5866.67%) helped: 0 / HURT: 2 Fixes: `442daeb54a` ("nir/opt_algebraic: use fcanonicalize") Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39567>	2026-02-14 02:06:59 +00:00
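The percentages in the Haswell spill/fill report of `df704bd38e` follow the usual relative-change formula. A minimal C sketch of that arithmetic, assuming a hypothetical helper named `percent_change` that is not part of shader-db or Mesa:

```c
#include <assert.h>

/* Hypothetical helper (not from shader-db or Mesa): relative change of a
 * statistic, expressed in percent of the "before" value. */
static double percent_change(double before, double after)
{
   assert(before != 0.0);   /* undefined for a zero baseline */
   return (after - before) / before * 100.0;
}
```

With the figures from the commit message, `percent_change(6, 70)` is about 1066.67 (spills in affected programs) and `percent_change(6, 358)` is about 5866.67 (fills in affected programs).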
Ian Romanick	11b96a84b0	brw: Call nir_opt_algebraic_late later in brw_postprocess_nir_opts Move the call to nir_opt_algebraic_late after the last time brw_nir_optimize might be called. nir_opt_algebraic_distribute_src_mods works together with the late algebraic optimizations, so move it also. shader-db: Lunar Lake total instructions in shared programs: 17081222 -> 17080842 (<.01%) instructions in affected programs: 419931 -> 419551 (-0.09%) helped: 545 / HURT: 826 total cycles in shared programs: 878437752 -> 879236226 (0.09%) cycles in affected programs: 506003142 -> 506801616 (0.16%) helped: 3091 / HURT: 3189 LOST: 18 GAINED: 16 Meteor Lake and DG2 had similar results. (Meteor Lake shown) total instructions in shared programs: 19994270 -> 19993231 (<.01%) instructions in affected programs: 490499 -> 489460 (-0.21%) helped: 660 / HURT: 800 total cycles in shared programs: 882498776 -> 882834186 (0.04%) cycles in affected programs: 477858602 -> 478194012 (0.07%) helped: 3458 / HURT: 3564 total fills in shared programs: 4371 -> 4370 (-0.02%) fills in affected programs: 7 -> 6 (-14.29%) helped: 1 / HURT: 0 LOST: 28 GAINED: 10 Tiger Lake, Ice Lake, and Skylake had similar results. 
(Tiger Lake shown) total instructions in shared programs: 19943849 -> 19942782 (<.01%) instructions in affected programs: 467384 -> 466317 (-0.23%) helped: 655 / HURT: 796 total cycles in shared programs: 860085674 -> 861410289 (0.15%) cycles in affected programs: 426900998 -> 428225613 (0.31%) helped: 3250 / HURT: 3441 LOST: 19 GAINED: 14 fossil-db: Lunar Lake Totals: Instrs: 926472091 -> 926204838 (-0.03%); split: -0.04%, +0.01% CodeSize: 14845921056 -> 14842776112 (-0.02%); split: -0.10%, +0.08% Send messages: 41459570 -> 41459574 (+0.00%); split: -0.00%, +0.00% Cycle count: 104481085069 -> 104583692712 (+0.10%); split: -0.14%, +0.24% Spill count: 3454651 -> 3457340 (+0.08%); split: -0.15%, +0.23% Fill count: 4958779 -> 4958487 (-0.01%); split: -0.46%, +0.45% Max live registers: 193805970 -> 193839002 (+0.02%); split: -0.00%, +0.02% Max dispatch width: 49114416 -> 49113776 (-0.00%); split: +0.01%, -0.01% Non SSA regs after NIR: 142953905 -> 142800740 (-0.11%); split: -0.12%, +0.01% Totals from 420256 (20.80% of 2020128) affected shaders: Instrs: 448571327 -> 448304074 (-0.06%); split: -0.09%, +0.03% CodeSize: 7312002800 -> 7308857856 (-0.04%); split: -0.21%, +0.17% Send messages: 17716494 -> 17716498 (+0.00%); split: -0.00%, +0.00% Cycle count: 52178854998 -> 52281462641 (+0.20%); split: -0.28%, +0.48% Spill count: 2945654 -> 2948343 (+0.09%); split: -0.17%, +0.26% Fill count: 4404768 -> 4404476 (-0.01%); split: -0.51%, +0.51% Max live registers: 60875448 -> 60908480 (+0.05%); split: -0.01%, +0.06% Max dispatch width: 9455280 -> 9454640 (-0.01%); split: +0.04%, -0.04% Non SSA regs after NIR: 60542740 -> 60389575 (-0.25%); split: -0.28%, +0.02% Meteor Lake and DG2 had similar results. 
(Meteor Lake shown) Totals: Instrs: 1000081384 -> 999726726 (-0.04%); split: -0.05%, +0.01% CodeSize: 16764458080 -> 16761624256 (-0.02%); split: -0.09%, +0.07% Subgroup size: 27599528 -> 27599544 (+0.00%) Send messages: 45538933 -> 45538951 (+0.00%); split: -0.00%, +0.00% Cycle count: 93303830912 -> 93370118192 (+0.07%); split: -0.19%, +0.26% Spill count: 3739306 -> 3739719 (+0.01%); split: -0.22%, +0.23% Fill count: 5089719 -> 5083626 (-0.12%); split: -0.56%, +0.44% Max live registers: 122041364 -> 122055848 (+0.01%); split: -0.00%, +0.01% Max dispatch width: 38117296 -> 38127200 (+0.03%); split: +0.06%, -0.03% Non SSA regs after NIR: 164296197 -> 164299306 (+0.00%); split: -0.01%, +0.01% Totals from 338754 (14.82% of 2285730) affected shaders: Instrs: 452723479 -> 452368821 (-0.08%); split: -0.10%, +0.03% CodeSize: 7861878032 -> 7859044208 (-0.04%); split: -0.19%, +0.16% Subgroup size: 16 -> 32 (+100.00%) Send messages: 17050010 -> 17050028 (+0.00%); split: -0.00%, +0.00% Cycle count: 52881801997 -> 52948089277 (+0.13%); split: -0.33%, +0.46% Spill count: 3271458 -> 3271871 (+0.01%); split: -0.25%, +0.26% Fill count: 4628422 -> 4622329 (-0.13%); split: -0.61%, +0.48% Max live registers: 30738902 -> 30753386 (+0.05%); split: -0.01%, +0.06% Max dispatch width: 4787264 -> 4797168 (+0.21%); split: +0.47%, -0.26% Non SSA regs after NIR: 61748026 -> 61751135 (+0.01%); split: -0.03%, +0.03% Tiger Lake Totals: Instrs: 1011068379 -> 1010977290 (-0.01%); split: -0.03%, +0.02% CodeSize: 14197751744 -> 14197683040 (-0.00%); split: -0.07%, +0.07% Send messages: 46431228 -> 46431220 (-0.00%); split: -0.00%, +0.00% Cycle count: 85066526419 -> 85085088071 (+0.02%); split: -0.16%, +0.18% Spill count: 3853750 -> 3855185 (+0.04%); split: -0.15%, +0.19% Fill count: 6716746 -> 6719594 (+0.04%); split: -0.25%, +0.29% Max live registers: 122307387 -> 122326083 (+0.02%); split: -0.00%, +0.02% Max dispatch width: 38009632 -> 38003280 (-0.02%); split: +0.03%, -0.05% Non SSA regs after 
NIR: 158403572 -> 158415390 (+0.01%); split: -0.01%, +0.02% Totals from 277728 (12.17% of 2281577) affected shaders: Instrs: 349206856 -> 349115767 (-0.03%); split: -0.07%, +0.05% CodeSize: 5042621104 -> 5042552400 (-0.00%); split: -0.20%, +0.20% Send messages: 13132243 -> 13132235 (-0.00%); split: -0.00%, +0.00% Cycle count: 36183327716 -> 36201889368 (+0.05%); split: -0.38%, +0.43% Spill count: 2210072 -> 2211507 (+0.06%); split: -0.26%, +0.33% Fill count: 4188439 -> 4191287 (+0.07%); split: -0.39%, +0.46% Max live registers: 24956695 -> 24975391 (+0.07%); split: -0.02%, +0.09% Max dispatch width: 3948832 -> 3942480 (-0.16%); split: +0.32%, -0.48% Non SSA regs after NIR: 45616425 -> 45628243 (+0.03%); split: -0.04%, +0.06% Ice Lake Totals: Instrs: 1009584306 -> 1009411757 (-0.02%); split: -0.02%, +0.01% CodeSize: 12593466880 -> 12592958096 (-0.00%); split: -0.01%, +0.01% Send messages: 47274203 -> 47274171 (-0.00%); split: -0.00%, +0.00% Cycle count: 84920281455 -> 84914027301 (-0.01%); split: -0.05%, +0.04% Spill count: 2988523 -> 2986191 (-0.08%); split: -0.14%, +0.07% Fill count: 5296078 -> 5288737 (-0.14%); split: -0.21%, +0.07% Max live registers: 125429384 -> 125444786 (+0.01%); split: -0.00%, +0.02% Max dispatch width: 41269072 -> 41267312 (-0.00%); split: +0.03%, -0.03% Non SSA regs after NIR: 163223895 -> 163236623 (+0.01%); split: -0.01%, +0.02% Totals from 243818 (10.45% of 2334244) affected shaders: Instrs: 296953759 -> 296781210 (-0.06%); split: -0.08%, +0.02% CodeSize: 3643224480 -> 3642715696 (-0.01%); split: -0.04%, +0.03% Send messages: 11518671 -> 11518639 (-0.00%); split: -0.00%, +0.00% Cycle count: 33065548412 -> 33059294258 (-0.02%); split: -0.13%, +0.11% Spill count: 1346515 -> 1344183 (-0.17%); split: -0.32%, +0.15% Fill count: 2537906 -> 2530565 (-0.29%); split: -0.43%, +0.14% Max live registers: 21476776 -> 21492178 (+0.07%); split: -0.02%, +0.09% Max dispatch width: 3727288 -> 3725528 (-0.05%); split: +0.31%, -0.35% Non SSA regs after 
NIR: 41050474 -> 41063202 (+0.03%); split: -0.04%, +0.07% Skylake Totals: Instrs: 513573157 -> 513462971 (-0.02%); split: -0.02%, +0.00% CodeSize: 5950280672 -> 5950001392 (-0.00%); split: -0.01%, +0.00% Send messages: 24909757 -> 24909758 (+0.00%); split: -0.00%, +0.00% Cycle count: 57636102242 -> 57634726342 (-0.00%); split: -0.03%, +0.03% Spill count: 627286 -> 627241 (-0.01%); split: -0.01%, +0.00% Fill count: 837888 -> 837804 (-0.01%); split: -0.01%, +0.00% Max live registers: 87272271 -> 87284192 (+0.01%); split: -0.00%, +0.02% Max dispatch width: 32278832 -> 32271800 (-0.02%); split: +0.02%, -0.04% Non SSA regs after NIR: 87387713 -> 87387614 (-0.00%); split: -0.00%, +0.00% Totals from 177432 (10.30% of 1722906) affected shaders: Instrs: 127170648 -> 127060462 (-0.09%); split: -0.10%, +0.01% CodeSize: 1443406368 -> 1443127088 (-0.02%); split: -0.03%, +0.01% Send messages: 5444220 -> 5444221 (+0.00%); split: -0.00%, +0.00% Cycle count: 15423028495 -> 15421652595 (-0.01%); split: -0.10%, +0.10% Spill count: 235844 -> 235799 (-0.02%); split: -0.03%, +0.01% Fill count: 333783 -> 333699 (-0.03%); split: -0.03%, +0.01% Max live registers: 13765573 -> 13777494 (+0.09%); split: -0.01%, +0.10% Max dispatch width: 3086880 -> 3079848 (-0.23%); split: +0.24%, -0.47% Non SSA regs after NIR: 17623772 -> 17623673 (-0.00%); split: -0.00%, +0.00% Fixes: `442daeb54a` ("nir/opt_algebraic: use fcanonicalize") Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39567>	2026-02-14 02:06:59 +00:00
Ian Romanick	5af0b8bd09	brw: Call nir_opt_algebraic_late in brw_nir_create_raygen_trampoline Make sure that lowering undone in brw_nir_optimize are reapplied. No shader-db changes on any Intel platform. Why are there fossil-db changes on platforms that don't support ray tracing? Lunar Lake Totals: Instrs: 926636441 -> 926636313 (-0.00%); split: -0.00%, +0.00% Send messages: 41510729 -> 41510723 (-0.00%); split: -0.00%, +0.00% Cycle count: 104509492613 -> 104509490569 (-0.00%); split: -0.00%, +0.00% Max live registers: 193792922 -> 193792890 (-0.00%); split: -0.00%, +0.00% Non SSA regs after NIR: 150091934 -> 150092170 (+0.00%); split: -0.00%, +0.00% Totals from 10 (0.00% of 2020428) affected shaders: Instrs: 8142 -> 8014 (-1.57%); split: -3.14%, +1.57% Send messages: 192 -> 186 (-3.12%); split: -7.29%, +4.17% Cycle count: 131892 -> 129848 (-1.55%); split: -6.93%, +5.38% Max live registers: 1442 -> 1410 (-2.22%); split: -3.05%, +0.83% Non SSA regs after NIR: 950 -> 1186 (+24.84%); split: -26.95%, +51.79% Meteor Lake Totals: Instrs: 1000805547 -> 1000805543 (-0.00%); split: -0.00%, +0.00% Cycle count: 93131592265 -> 93131619619 (+0.00%); split: -0.00%, +0.00% Max live registers: 122081268 -> 122081244 (-0.00%); split: -0.00%, +0.00% Totals from 16 (0.00% of 2286241) affected shaders: Instrs: 18652 -> 18648 (-0.02%); split: -1.39%, +1.37% Cycle count: 369520 -> 396874 (+7.40%); split: -2.94%, +10.34% Max live registers: 1350 -> 1326 (-1.78%); split: -4.15%, +2.37% DG2 Totals: Instrs: 999834626 -> 999834651 (+0.00%); split: -0.00%, +0.00% Send messages: 45719398 -> 45719403 (+0.00%); split: -0.00%, +0.00% Cycle count: 93118238139 -> 93118269557 (+0.00%); split: -0.00%, +0.00% Max live registers: 122098944 -> 122098936 (-0.00%); split: -0.00%, +0.00% Non SSA regs after NIR: 169413734 -> 169413661 (-0.00%); split: -0.00%, +0.00% Totals from 13 (0.00% of 2286795) affected shaders: Instrs: 18799 -> 18824 (+0.13%); split: -1.04%, +1.18% Send messages: 492 -> 497 (+1.02%); 
split: -2.44%, +3.46% Cycle count: 352838 -> 384256 (+8.90%); split: -1.08%, +9.98% Max live registers: 1237 -> 1229 (-0.65%); split: -2.91%, +2.26% Non SSA regs after NIR: 2191 -> 2118 (-3.33%); split: -20.86%, +17.53% Tiger Lake Totals: Instrs: 1011816778 -> 1011816714 (-0.00%); split: -0.00%, +0.00% Send messages: 46515289 -> 46515285 (-0.00%); split: -0.00%, +0.00% Cycle count: 85148902406 -> 85148894668 (-0.00%); split: -0.00%, +0.00% Max live registers: 122362180 -> 122362172 (-0.00%); split: -0.00%, +0.00% Max dispatch width: 38036160 -> 38036176 (+0.00%) Non SSA regs after NIR: 160317521 -> 160317649 (+0.00%); split: -0.00%, +0.00% Totals from 6 (0.00% of 2282318) affected shaders: Instrs: 9204 -> 9140 (-0.70%); split: -1.43%, +0.74% Send messages: 258 -> 254 (-1.55%); split: -3.10%, +1.55% Cycle count: 287652 -> 279914 (-2.69%); split: -3.29%, +0.60% Max live registers: 552 -> 544 (-1.45%); split: -2.90%, +1.45% Max dispatch width: 48 -> 64 (+33.33%) Non SSA regs after NIR: 914 -> 1042 (+14.00%); split: -14.00%, +28.01% Ice Lake Totals: Instrs: 1012203285 -> 1012203249 (-0.00%); split: -0.00%, +0.00% Send messages: 47358859 -> 47358858 (-0.00%); split: -0.00%, +0.00% Cycle count: 85112165276 -> 85112171905 (+0.00%); split: -0.00%, +0.00% Max live registers: 125545002 -> 125544992 (-0.00%); split: -0.00%, +0.00% Max dispatch width: 41335696 -> 41335656 (-0.00%) Non SSA regs after NIR: 166448597 -> 166448602 (+0.00%); split: -0.00%, +0.00% Totals from 13 (0.00% of 2335519) affected shaders: Instrs: 16486 -> 16450 (-0.22%); split: -1.67%, +1.46% Send messages: 368 -> 367 (-0.27%); split: -4.89%, +4.62% Cycle count: 347643 -> 354272 (+1.91%); split: -1.34%, +3.25% Max live registers: 1104 -> 1094 (-0.91%); split: -3.80%, +2.90% Max dispatch width: 192 -> 152 (-20.83%) Non SSA regs after NIR: 2100 -> 2105 (+0.24%); split: -21.76%, +22.00% Skylake Totals: Instrs: 504548665 -> 504548057 (-0.00%); split: -0.00%, +0.00% Send messages: 24479148 -> 24479118 (-0.00%); 
split: -0.00%, +0.00% Cycle count: 57575198140 -> 57575179256 (-0.00%); split: -0.00%, +0.00% Max live registers: 85570671 -> 85570575 (-0.00%); split: -0.00%, +0.00% Non SSA regs after NIR: 85097646 -> 85098486 (+0.00%); split: -0.00%, +0.00% Totals from 22 (0.00% of 1703671) affected shaders: Instrs: 19866 -> 19258 (-3.06%); split: -3.72%, +0.66% Send messages: 464 -> 434 (-6.47%); split: -8.19%, +1.72% Cycle count: 250854 -> 231970 (-7.53%); split: -9.23%, +1.70% Max live registers: 2024 -> 1928 (-4.74%); split: -5.53%, +0.79% Non SSA regs after NIR: 2498 -> 3338 (+33.63%); split: -8.33%, +41.95% Fixes: `442daeb54a` ("nir/opt_algebraic: use fcanonicalize") Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39567>	2026-02-14 02:06:59 +00:00
Ian Romanick	fd29183901	elk: Use F16TO32 for nir_op_f2f32 of float16 source This matches the behavior of nir_op_unpack_half_2x16_split_x. Gfx7 uses a special opcode for this conversion. Fixes numerous assertion failures in shader-db on Ivy Bridge and Haswell. I am not sure why this was never encountered previously. Fixes: `609c46cf23` ("nir/lower_alu_width: emit f2f32 for unpack_half_2x16") Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39567>	2026-02-14 02:06:59 +00:00
Alyssa Rosenzweig	bd5ebbb2f8	brw: drop buggy SLM optimization This was incorrect for OpenCL due to the possibility of variable shared memory existing despite shared_size == 0. Fortunately the optimization it was trying to do should be done in NIR via nir_opt_barrier_modes so we can just drop the brw code and move on with our merry lives. Fixes OpenCL tests on Iris: non_uniform_work_group non_uniform_3d_barriers basic async_strided_copy_local_to_global Cc: mesa-stable Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39795>	2026-02-13 20:28:28 +00:00
Lionel Landwerlin	1f1f484570	brw/iris: move ubo range analysis pass to iris Anv isn't using this pass anymore. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35160>	2026-02-12 16:45:26 +00:00
Lionel Landwerlin	d1a1e98e4e	brw: handle non-GRF aligned pushed UBO masking Right now all the drivers align push data to GRF (32B pre Xe2, 64B post Xe2) but the push constant delivery mechanism can actually pack 32B ranges so alignment is not required. Of course we need the push UBO masking to deal with unaligned pushed ranges. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Calder Young <cgiacun@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35160>	2026-02-12 16:45:25 +00:00
Lionel Landwerlin	c1c9048dbf	anv: add a couple of surfaces to read descriptors Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35160>	2026-02-12 16:45:25 +00:00
Sagar Ghuge	1fb8435b77	nir: Add nir_resource_intel_internal entry Will use the load/store_ssbo with nir_resource_intel_internal later in this series. Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35160>	2026-02-12 16:45:22 +00:00
Lionel Landwerlin	2ef29502ed	brw: enable ex_bso for LSC_SS Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35160>	2026-02-12 16:45:22 +00:00
Lionel Landwerlin	9bb152c9a9	brw: make PULL_CONSTANT opcodes more like MEMORY opcodes Using binding & binding_type sources. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35160>	2026-02-12 16:45:22 +00:00
Matt Turner	14c65322e8	elk/cse: use copies in `operands_match` instead of in-place modification `operands_match` was modifying instruction source operands in-place (through the `elk_fs_reg *src` pointer member) and relying on a save/restore pattern to undo the modifications. Work on local copies instead, which is simpler and avoids mutating shared state in a comparison function. Fixes: `47c4b38540` ("i965/fs: Allow CSE to handle MULs with negated arguments.") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39814>	2026-02-11 18:43:03 +00:00
Matt Turner	93f39f87c4	elk/cse: fix `operands_match` corrupting non-IMM register data The MUL case in `operands_match` was reading and writing the `.f` union member unconditionally, even when the register's `.file != IMM`. In that case `.f` aliases the struct containing `.nr`/`.swizzle`/etc, so the `fabsf()` call could corrupt the `.nr` by clearing bit 31. Guard all `.f` accesses with `.file == IMM` checks. Fixes: `47c4b38540` ("i965/fs: Allow CSE to handle MULs with negated arguments.") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39814>	2026-02-11 18:43:03 +00:00
Matt Turner	b302faad8b	brw/cse: use copies in `operands_match` instead of in-place modification `operands_match` was modifying instruction source operands in-place (through the `brw_reg *src` pointer member) and relying on a save/restore pattern to undo the modifications. Work on local copies instead, which is simpler and avoids mutating shared state in a comparison function. Fixes: `47c4b38540` ("i965/fs: Allow CSE to handle MULs with negated arguments.") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39814>	2026-02-11 18:43:02 +00:00
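The "work on local copies" approach described for `operands_match` can be sketched in C as follows. This is an illustration only, not the Mesa code: `toy_src`, `toy_inst`, `same_reg` and `toy_mul_operands_match` are hypothetical names, and the matching rule (two MULs match when the product sign and the un-negated operands agree in either order) is a simplifying assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for a register source and an
 * instruction; the real brw/elk types carry far more state. */
struct toy_src  { int nr; bool negate; };
struct toy_inst { struct toy_src src[2]; };

static bool same_reg(struct toy_src a, struct toy_src b)
{
   return a.nr == b.nr && a.negate == b.negate;
}

/* Compare two MUL-like instructions. Taking const pointers makes it
 * explicit that the comparison never mutates shared instruction state:
 * all normalization happens on the local copies xs/ys. */
static bool toy_mul_operands_match(const struct toy_inst *a,
                                   const struct toy_inst *b)
{
   assert(a != NULL && b != NULL);

   struct toy_src xs[2] = { a->src[0], a->src[1] };
   struct toy_src ys[2] = { b->src[0], b->src[1] };

   /* The sign of a product is the XOR of the operand negations. */
   bool x_neg = xs[0].negate != xs[1].negate;
   bool y_neg = ys[0].negate != ys[1].negate;

   /* Strip negation from the copies only; nothing needs restoring. */
   xs[0].negate = xs[1].negate = false;
   ys[0].negate = ys[1].negate = false;

   /* MUL is commutative, so accept either operand order. */
   return x_neg == y_neg &&
          ((same_reg(xs[0], ys[0]) && same_reg(xs[1], ys[1])) ||
           (same_reg(xs[0], ys[1]) && same_reg(xs[1], ys[0])));
}
```

Because nothing is written through the instruction pointers, there is no save/restore step that an early return could skip.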
Matt Turner	f5e0f63216	brw/cse: fix `operands_match` corrupting non-IMM register data The MUL case in `operands_match` was reading and writing the `.f` union member unconditionally, even when the register's `.file != IMM`. In that case `.f` aliases the struct containing `.nr`/`.swizzle`/etc, so the `fabsf()` call could corrupt the `.nr` by clearing bit 31. Guard all `.f` accesses with `.file == IMM` checks. Fixes: `47c4b38540` ("i965/fs: Allow CSE to handle MULs with negated arguments.") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39814>	2026-02-11 18:43:02 +00:00
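The union-aliasing hazard described for the MUL case can be reproduced with a toy register type. A minimal C sketch, assuming a hypothetical `toy_reg` layout in which `.f` and `.nr` share storage; the real `brw_reg`/`elk_fs_reg` layouts differ, and only the guard pattern (`.file == IMM` before any `.f` access) is taken from the commit message:

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

static_assert(sizeof(float) == sizeof(uint32_t), "sketch assumes 32-bit float");

/* Hypothetical register files and register type, for illustration only. */
enum toy_file { TOY_VGRF, TOY_IMM };

struct toy_reg {
   enum toy_file file;
   union {
      float f;       /* meaningful only when file == TOY_IMM */
      uint32_t nr;   /* meaningful only when file != TOY_IMM */
   };
};

/* Buggy pattern: treats the storage as a float regardless of the file.
 * For a non-immediate register this rewrites the bits of .nr. */
static void toy_abs_unguarded(struct toy_reg *r)
{
   r->f = fabsf(r->f);
}

/* Fixed pattern: only touch .f when the register really is an immediate. */
static void toy_abs_guarded(struct toy_reg *r)
{
   if (r->file == TOY_IMM)
      r->f = fabsf(r->f);
}
```

With `nr = 0xC0000000` (the bit pattern of `-2.0f`) on a non-immediate register, the unguarded version leaves `nr == 0x40000000`, that is, bit 31 cleared, while the guarded version leaves the register number untouched.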
Georg Lehmann	5926209996	brw/nir_lower_fsign: try to fix NaN correctness Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39641>	2026-02-10 18:42:03 +00:00
Kenneth Graunke	05ed18a37b	elk: Delete mesh shader remnants This compiler does not support mesh shaders. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Acked-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39791>	2026-02-09 21:56:05 +00:00
Kenneth Graunke	3b4af8907f	brw: Delete wm_prog_data::urb_setup_channel[] The entire array is always initialized to zero and never modified. Cuts the size of brw_wm_prog_data by 32%. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39791>	2026-02-09 21:56:04 +00:00
Caio Oliveira	6b0e29bc77	brw: Fix cooperative matrix constant sources other than src0 Code was wrongly using src0 to pick the constant value. Fixes: `bf9ad36f2d` ("brw: Properly handle cooperative matrices created with constants") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39769>	2026-02-09 19:52:16 +00:00
Kenneth Graunke	c5859b2d40	intel: Rename wm_prog_key to fs_prog_key This is the shader key for the fragment shader. Nobody even knows what the windowizer/masker unit is or does anymore. Even on Gen4-6, "fs" is still clearer. This makes the codebase easier to read. This is only about 15 years overdue. Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39748>	2026-02-06 20:52:01 -08:00
Kenneth Graunke	56e638be81	intel: Rename wm_prog_data to fs_prog_data This is the program data for the fragment shader. Nobody even knows what the windowizer/masker unit is or does anymore. Even on Gen4-6, "fs" is still clearer. This makes the codebase easier to read. This is only about 15 years overdue. Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39748>	2026-02-06 20:51:59 -08:00
Kenneth Graunke	beb4b78fe7	intel: Rename intel_msaa_flags to intel_fs_config This started out as dynamic configuration for MSAA related state, but has since expanded to cover many dynamic fragment shader options. We rename it to intel_fs_config, similar to intel_tess_config, to better indicate its purpose. Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39748>	2026-02-06 20:51:43 -08:00
Georg Lehmann	d71db17e53	elk: remove unpack_half support Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39511>	2026-02-06 06:12:36 +00:00
Georg Lehmann	d8391d70fe	elk/lower_storage_image: use f2f32 instead of unpack_half Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39511>	2026-02-06 06:12:36 +00:00
Georg Lehmann	e5f1e08f3e	brw: remove unpack_half support Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39511>	2026-02-06 06:12:36 +00:00
Georg Lehmann	caf982218d	brw/lower_storage_image: use f2f32 instead of unpack_half Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39511>	2026-02-06 06:12:36 +00:00
Caio Oliveira	06251fcc24	brw/print: Don't print extra space at the end Reviewed-by: Caleb Callaway <caleb.callaway@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39597>	2026-02-06 01:00:31 +00:00
Kenneth Graunke	6fbe201a12	brw: Convert VS/TES/GS outputs to URB intrinsics. For VS/TES/GS, we lower all outputs to temporaries and emit copies at the end of the shader (or for GS, at each EmitVertex() call) from those temporaries back to real outputs. We use vec8 URB writes without writemasking, since our output area's contents are undefined anyhow. This is simpler than what TCS and Mesh do, which allow for output variables to be read/written at a per-component level at any time, with the output memory being used for cross-thread communication. Rather than using the complicated TCS/Mesh handling and relying on vectorization, we port the emit_urb_writes() approach to NIR. This also takes care of emitting the VUE header with default values when fields aren't explicitly written by the shader. We also handle multiview in the process. It simplifies things, and also drops another case of non-semantic IO in brw. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39666>	2026-02-03 19:11:21 +00:00
Kenneth Graunke	52341b8b9c	brw: Split EOT handling out of emit_urb_writes() The TES workaround code is still going to be needed even after we rework URB output handling for VS/TES/GS to use NIR intrinsics. For VS, we know at least one URB write will have been emitted at the end of the program, so we can just tag it. GS already handles EOT via emit_gs_thread_end(). Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39666>	2026-02-03 19:11:21 +00:00
Kenneth Graunke	1f0773e951	brw: Add VUE header varyings to io_component() This is needed for VS/TES/GS outputs. Mesh takes a different path because those are per-primitive. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39666>	2026-02-03 19:11:21 +00:00
Kenneth Graunke	54def4020c	brw: Set a valid varying_to_slot for VUE header fields other than PSIZ This lets us look up things in varying_to_slot[] without having to special case VIEWPORT, LAYER, and PRIMITIVE_SHADING_RATE. All of them map to the same slot as PSIZ, slot 0, the VUE header. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39666>	2026-02-03 19:11:20 +00:00
Kenneth Graunke	076a183b8f	brw: Move TES VUE map calculation before lowering outputs We'll need the VUE map when we convert to using URB intrinsics. Prepare for that by reordering VUE map setup before IO lowering. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39666>	2026-02-03 19:11:20 +00:00
Kenneth Graunke	2af44670ed	brw: Implement load_urb_output_handle_intel for VS/GS stages Simply get the payload field. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39666>	2026-02-03 19:11:20 +00:00
Kenneth Graunke	0cbf49aa8f	brw: Drop urb_handle parameter from store_urb() We always store to outputs, never inputs. Just use the output handle. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39666>	2026-02-03 19:11:19 +00:00
Alyssa Rosenzweig	bc69e4364f	intel: report code size in shader stats This is missing from ANV's statistics currently. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39633>	2026-02-02 23:30:24 +00:00
Alyssa Rosenzweig	fc53da9c39	intel: simplify shader stats names This brings what ANV reports closer to what Iris reports, and is mostly dropping redundancies. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39633>	2026-02-02 23:30:24 +00:00
Alyssa Rosenzweig	3d5170c705	intel: add scheduling mode statistic This is for parity with what we do in the current GL shader-db path. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39633>	2026-02-02 23:30:24 +00:00
José Roberto de Souza	1f61b1c367	intel/brw: Add BRW_DEPENDENCY_INSTRUCTIONS invalidation when instructions are added or removed in brw_opt_split_virtual_grfs() This fixes the brw_ip_ranges shader analysis, which fails because there is a different number of instructions than expected after the brw_opt_split_virtual_grfs() optimization. Reproduced in Piglit test spec@arb_sample_shading@builtin-gl-sample-mask 0: arb_sample_shading-builtin-gl-sample-mask: ../src/intel/compiler/brw_analysis.h:150: T& brw_analysis<T, C>::require() [with T = brw_ip_ranges; C = brw_shader]: Assertion `p->validate(c)' failed. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39629>	2026-02-02 14:46:50 +00:00
Caio Oliveira	db4bc5407f	brw: Print "GRF registers" in INTEL_DEBUG=shaders output Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39601>	2026-01-29 20:16:48 +00:00
Caio Oliveira	0d19fc8256	brw: Fix "GRF registers" stats output Pick the value from the brw_shader instead of from the prog_data, since when there are multiple variants, the prog_data one will have the maximum value. Picking the wrong value also caused compute shaders that had a single variant to report 0 GRFs since the prog_data was being filled after the generate_code() call. Issue spotted by Felix DeGrood. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39601>	2026-01-29 20:16:48 +00:00
Kenneth Graunke	bfca9d32d3	brw: Fix geometry shaders with non-constant vertex indices Geometry shaders load from separate handles for each vertex, so they don't incorporate the vertex index in the URB offset like tessellation shaders do. This means we can have a constant offset (within a vertex's section) but not have a constant vertex index. Prior to `41d7debcfe` we were emitting non-folded ALU so we thought the offset was non-constant at this point. Now we can properly detect constant offsets...but still don't want to use push inputs for non-constant vertex indices. Fixes: `41d7debcfe` ("brw: Use nir_imul_imm in per-vertex/per-primitive offset calculation") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39603>	2026-01-29 00:18:20 +00:00
Caio Oliveira	cc06e1ebe2	brw: Remove outdated comment about remove_dead_variables This now also removes dead variables created by split_array_vars, and in the future it is reasonable for other optimizations inside the optimization loop to make temp variables dead. Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39596>	2026-01-28 22:26:43 +00:00
Caio Oliveira	354dbbe3ae	brw: Use the "early break" loop macros when possible This macro will stop the loop early if there's no chance to make further progress. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39504>	2026-01-28 19:52:02 +00:00
Caio Oliveira	da80122257	brw: Include backend NIR passes in mda files Add a pass tracker struct that can live for the whole lifetime of brw_compile() functions. It keeps track of the debug_archiver and also stores some metadata that allows us to name the passes. With that, we can also embed the loop tracking in the same struct, so any loop is free to use the "early break" optimization. There are other brw_nir_* passes that are called in the pre-processing phase. These are not yet included in the mda. They will be handled when we hook debug_archiver or similar to the runtime/driver. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39504>	2026-01-28 19:52:02 +00:00
Iván Briano	5b48805b42	brw: fix local_invocation_index with quad derivaties on mesh/task shaders For mesh/task shaders, the thread payload provides a local invocation index, but it's always linear so it doesn't give the correct value when quad derivatives are in use. The lowering pass where all of this is done correctly for compute shaders assumes load_local_invocation_index will be lowered in the backend for mesh/task. It calculates the values for the quads correctly but then avoids replacing the original intrinsic, so we are left with the wrong results. Add an Intel-specific intrinsic and always lower the generic one to that (or whatever else was calculated) to avoid ambiguities and fix the value for quad derivatives. Fixes future CTS tests using mesh/task shaders under: dEQP-VK.spirv_assembly.instruction.compute.compute_shader_derivatives.* Fixes: `d89bfb1ff7` ("intel/brw: Reorganize lowering of LocalID/Index to handle Mesh/Task") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39276>	2026-01-27 22:28:19 +00:00
Kenneth Graunke	41d7debcfe	brw: Use nir_imul_imm in per-vertex/per-primitive offset calculation This avoids generating some useless math that would need to be cleaned up later, without complicating things too much. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39250>	2026-01-27 16:08:36 +00:00
Kenneth Graunke	24c66d3871	brw: Vectorize URB intrinsics using nir_opt_load_store_vectorize This helps cut down URB messages on tessellation and mesh shaders significantly. fossil-db results on Battlemage: Instrs: 505172392 -> 505207187 (+0.01%); split: -0.00%, +0.01% Send messages: 23678197 -> 23656126 (-0.09%); split: -0.09%, +0.00% Cycle count: 63150470088 -> 63147482640 (-0.00%); split: -0.01%, +0.00% Spill count: 576554 -> 576616 (+0.01%) Fill count: 545304 -> 545413 (+0.02%) Max live registers: 141099192 -> 141150675 (+0.04%); split: -0.00%, +0.04% Max dispatch width: 39856192 -> 39856208 (+0.00%) Totals from 4231 (0.27% of 1583648) affected shaders: Instrs: 1620161 -> 1654956 (+2.15%); split: -0.25%, +2.40% Send messages: 128652 -> 106581 (-17.16%); split: -17.18%, +0.03% Cycle count: 24650700 -> 21663252 (-12.12%); split: -12.82%, +0.70% Spill count: 378 -> 440 (+16.40%) Fill count: 1308 -> 1417 (+8.33%) Max live registers: 364676 -> 416159 (+14.12%); split: -0.24%, +14.36% Max dispatch width: 67952 -> 67968 (+0.02%) There are several reasons we didn't go with nir_opt_vectorize_io: 1. nir_opt_vectorize_io appears to work on the slot location level. We want to be able to vectorize based on the URB offsets, especially for cases like point size, layer, and viewport which have different VARYING_SLOT_* values but live in the same vec4 in a URB entry. 2. We want vec8 stores, and nir_opt_vectorize_io only seems to vectorize within a single 32-bit vec4. It does handle 8 components, but that's only for packing 16-bit values into a 32-bit vec4. Improves performance of Sascha Willems' tessellation demo by around 4% on Meteorlake. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39250>	2026-01-27 16:08:36 +00:00
Kenneth Graunke	aafe8967fd	brw: Avoid using URB global offset with per-slot offsets on <= Icelake Both the URB Global Offset and Per-Slot Offsets are specified to be unsigned numbers. The URB Global Offset is only 11 bits, and so is limited to be between [0, 2047]. While the per-slot offsets are given as U32 values, it would appear that adding the two offsets does not handle 32-bit overflow/unsigned wrap correctly. This pops up in Piglit's TCS variable-indexing tests, which end up performing loads from offset (x - 16) and a base of 18, and at an offset (x) with a base of 2. These should be equivalent, but when x <= 15, the per-slot offset calculated in the shader is negative (0xfffffff[0-f]) and adding the base of 18 is not wrapping around correctly to [2, 17]. To work around this, avoid using the global offset when the per-slot offset is present, and just add the two in the shader where unsigned wrap works correctly. Tigerlake and later don't seem to have this issue. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39250>	2026-01-27 16:08:36 +00:00
Kenneth Graunke	07ac0e3463	brw: Skip vec8 store_urb_vec4_intel noop writemasks as well We were checking for 0xf which is fine for vec4, but vec8 gets 0xff. Either way, nothing is writemasked, so we can skip sending the mask. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39250>	2026-01-27 16:08:36 +00:00
Kenneth Graunke	dbb24ff56b	brw: Assert that urb_vec4_intel stores only have 4/8 components vec1-3, 5-7, and 9+ are not supported. Only vec4 and vec8. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39250>	2026-01-27 16:08:36 +00:00
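
The offset arithmetic described in `aafe8967fd` can be checked with a small standalone sketch. This is not code from Mesa: the helper names below are hypothetical, and the "non-wrapping" model is only an assumption used to show how a sum that fails to wrap modulo 2^32 would disagree with the expected result. The commit message does not say how the hardware actually computes the sum.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the sum a correct 32-bit unsigned adder produces.
 * Unsigned arithmetic in C wraps modulo 2^32, as the shader add does. */
static uint32_t urb_offset_wrapping(uint32_t per_slot, uint32_t base)
{
    return per_slot + base;
}

/* Hypothetical model (an assumption, not documented hardware behavior):
 * an adder that keeps the carry instead of wrapping at 32 bits. */
static uint64_t urb_offset_no_wrap(uint32_t per_slot, uint32_t base)
{
    return (uint64_t)per_slot + (uint64_t)base;
}

/* Returns 1 if, for every x in [0, 15], "offset (x - 16), base 18"
 * matches "offset x, base 2" under wrapping addition, and the
 * non-wrapping model gives a different value. */
static int check_equivalence(void)
{
    for (uint32_t x = 0; x <= 15; x++) {
        uint32_t per_slot = x - 16u; /* 0xfffffff0 .. 0xffffffff */
        if (urb_offset_wrapping(per_slot, 18u) != x + 2u)
            return 0;
        if (urb_offset_no_wrap(per_slot, 18u) == (uint64_t)(x + 2u))
            return 0;
    }
    return 1;
}
```

With wrapping addition the two forms agree over the whole range [2, 17], which is why doing the add in the shader, as the commit does, sidesteps the problem.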

4950 commits