Commit graph

4526 commits

Author SHA1 Message Date
Mel Henning
17876a00af nir: Add a faster lowest common ancestor algorithm
On a fossil from the blender 4.5.0 vulkan backend, this improves compile
times in nak by about 17%. Compile time of other shaders improves by a
more modest 1.2%.

No stat changes on shader-db.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36184>
2025-09-08 23:03:13 +00:00
Caio Oliveira
f37c9c873c brw: Fix printing of blocks in disassembly when BRW is available
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
When disassembling and BRW IR is available (which happens in the
generator), there will be pointers to the BRW's basic block structures
that are used to print the block numbers and predecessor/successors
in the output.

There are two challenges:

- Because DO and FLOW instructions are not real instructions, they are
  not emitted in the output but would still cause the output to contain
  empty blocks.  Previous code accounted for DO but still had problems.

- DO blocks have special physical links that don't make sense when the
  DO is not emitted at the end, but they would be shown even if that
  block was omitted.

These issues can be seen here (edited to remove non-essential bits)

```
   START B0 (2 cycles)
mov(8)          g126<1>UD       0x3f800000UD
   END B0 ->B1
   START B2 <-B1 <-B4 (0 cycles)
   END B2 ->B3
   START B3 <-B2 (260 cycles)

LABEL1:
mov(8)          g1<1>D          0D
cmp.ge.f0.0(8)  null<1>D        g2<0,1,0>D      10D
sync nop(1)                     null<0,1,0>UB
send(1)         g0UD            g1UD            nullUD
(+f0.0) break(8) JIP:  LABEL0         UIP:  LABEL0
   END B3 ->B1 ->B5 ->B4
   START B4 <-B3 (1000 cycles)
sync nop(1)                     null<0,1,0>UB
mov(8)          g126<1>UD       g0<0,1,0>UD

LABEL0:
while(8)        JIP:  LABEL1
   END B4 ->B2
   START B5 <-B1 <-B3 (20 cycles)
```

For example:
- Block 1 is missing (a skipped DO block)
- Block 2 is empty (it was a FLOW block)
- Block 3 ends with a link to Block 1 (the special links involving DO
  blocks).

Two key changes were made to fix this.  First, skip the DO and FLOW
blocks completely.  The use_tail ensures that the instruction group is
reused to avoid empty blocks.  Second, when printing, the successors and
predecessors, walk through the skipped blocks.  And finally, don't print
the special blocks.

With the fix, here's the output.  Note the blocks retain their original
BRW IR number.

```
   START B0 (2 cycles)
mov(8)          g127<1>UD       0x3f800000UD
   END B0 ->B3
   START B3 <-B0 <-B4 (260 cycles)

LABEL1:
mov(8)          g1<1>D          0D
cmp.ge.f0.0(8)  null<1>D        g2<0,1,0>D      10D
sync nop(1)                     null<0,1,0>UB
send(1)         g0UD            g1UD            nullUD
(+f0.0) break(8) JIP:  LABEL0         UIP:  LABEL0
   END B3 ->B5 ->B4
   START B4 <-B3 (1000 cycles)
sync nop(1)                     null<0,1,0>UB
mov(8)          g127<1>UD       g0<0,1,0>UD

LABEL0:
while(8)        JIP:  LABEL1
   END B4 ->B3
   START B5 <-B3 (20 cycles)
```

Issue was spotted by Ken.

Fixes: d2c39b1779 ("intel/brw: Always have a (non-DO) block after a DO in the CFG")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36226>
2025-09-06 16:42:05 +00:00
Lionel Landwerlin
a91e0e0d61 brw: add support for separate tessellation shader compilation
Tessellation factors have to be written dynamically (based on the next
shader primitive topology) and the builtins read using a dynamic
offset (based on the preceeding shader's VUE).

Anv is updated to use this new infrastructure for dynamic
patch_control_points.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34872>
2025-09-05 07:46:17 +00:00
Lionel Landwerlin
a18835a9ca anv/brw/iris: move VS VUE computation to backend
Drivers can provide the inputs required for the backend to call the
compute function.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34872>
2025-09-05 07:46:16 +00:00
Lionel Landwerlin
8dee4813b0 brw: add ability to compute VUE map for separate tcs/tes
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34872>
2025-09-05 07:46:16 +00:00
Ian Romanick
1ce90ad5e1 elk: Use nir_opt_sink and more nir_opt_move
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
I spent a bunch of time playing around with the various enable bits, and
this was the best I could come up with. Enabling any of
nir_move_comparisons or nir_move_load_ubo in nir_opt_sink helped
instructions quite a bit, but it also caused a large pile of added
spills and fills.

shader-db:

Broadwell
total instructions in shared programs: 18428980 -> 18427957 (<.01%)
instructions in affected programs: 425245 -> 424222 (-0.24%)
helped: 1522 / HURT: 405

total cycles in shared programs: 954756705 -> 953755695 (-0.10%)
cycles in affected programs: 623470486 -> 622469476 (-0.16%)
helped: 17989 / HURT: 21175

total spills in shared programs: 8349 -> 8356 (0.08%)
spills in affected programs: 285 -> 292 (2.46%)
helped: 7 / HURT: 13

total fills in shared programs: 10426 -> 10192 (-2.24%)
fills in affected programs: 675 -> 441 (-34.67%)
helped: 25 / HURT: 1

LOST:   346
GAINED: 554

Haswell
total instructions in shared programs: 16809730 -> 16801634 (-0.05%)
instructions in affected programs: 772251 -> 764155 (-1.05%)
helped: 3055 / HURT: 840

total cycles in shared programs: 945179935 -> 944315696 (-0.09%)
cycles in affected programs: 549177588 -> 548313349 (-0.16%)
helped: 34143 / HURT: 23605

total spills in shared programs: 7699 -> 7666 (-0.43%)
spills in affected programs: 353 -> 320 (-9.35%)
helped: 10 / HURT: 16

total fills in shared programs: 8184 -> 7671 (-6.27%)
fills in affected programs: 1006 -> 493 (-50.99%)
helped: 30 / HURT: 2

total sends in shared programs: 1016676 -> 1016682 (<.01%)
sends in affected programs: 49 -> 55 (12.24%)
helped: 0 / HURT: 6

LOST:   415
GAINED: 441

Ivy Bridge
total instructions in shared programs: 15764955 -> 15757178 (-0.05%)
instructions in affected programs: 707453 -> 699676 (-1.10%)
helped: 2893 / HURT: 547

total cycles in shared programs: 430017934 -> 429720104 (-0.07%)
cycles in affected programs: 251816726 -> 251518896 (-0.12%)
helped: 33110 / HURT: 22056

total spills in shared programs: 1537 -> 1525 (-0.78%)
spills in affected programs: 18 -> 6 (-66.67%)
helped: 6 / HURT: 0

total fills in shared programs: 926 -> 905 (-2.27%)
fills in affected programs: 24 -> 3 (-87.50%)
helped: 6 / HURT: 0

total sends in shared programs: 816646 -> 816652 (<.01%)
sends in affected programs: 49 -> 55 (12.24%)
helped: 0 / HURT: 6

LOST:   332
GAINED: 417

Sandy Bridge
total instructions in shared programs: 14055229 -> 14045281 (-0.07%)
instructions in affected programs: 1436142 -> 1426194 (-0.69%)
helped: 5858 / HURT: 757

total cycles in shared programs: 772123170 -> 813543451 (5.36%)
cycles in affected programs: 521342483 -> 562762764 (7.94%)
helped: 27928 / HURT: 35923

total spills in shared programs: 1742 -> 1741 (-0.06%)
spills in affected programs: 66 -> 65 (-1.52%)
helped: 1 / HURT: 0

total fills in shared programs: 970 -> 967 (-0.31%)
fills in affected programs: 93 -> 90 (-3.23%)
helped: 1 / HURT: 0

total sends in shared programs: 1239222 -> 1238992 (-0.02%)
sends in affected programs: 6137 -> 5907 (-3.75%)
helped: 342 / HURT: 112

LOST:   244
GAINED: 434

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8366385 -> 8363954 (-0.03%)
instructions in affected programs: 162761 -> 160330 (-1.49%)
helped: 600 / HURT: 195

total cycles in shared programs: 248992618 -> 252119334 (1.26%)
cycles in affected programs: 50774708 -> 53901424 (6.16%)
helped: 3435 / HURT: 5131

total sends in shared programs: 623693 -> 623681 (<.01%)
sends in affected programs: 351 -> 339 (-3.42%)
helped: 12 / HURT: 0

LOST: 0
GAINED: 6

Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25463>
2025-09-04 15:01:18 -07:00
Ian Romanick
6f30cf71fe brw: Use nir_opt_sink and more nir_opt_move
The shader-db results on most platforms are pretty mixed. However, this
seems to be a decent improvement in fossil-db.

shader-db::

Lunar Lake
total instructions in shared programs: 17019147 -> 17023017 (0.02%)
instructions in affected programs: 1200847 -> 1204717 (0.32%)
helped: 814 / HURT: 2458

total cycles in shared programs: 880532116 -> 880406462 (-0.01%)
cycles in affected programs: 798253846 -> 798128192 (-0.02%)
helped: 30064 / HURT: 33008

total spills in shared programs: 3262 -> 3260 (-0.06%)
spills in affected programs: 66 -> 64 (-3.03%)
helped: 1 / HURT: 2

total fills in shared programs: 1616 -> 1637 (1.30%)
fills in affected programs: 89 -> 110 (23.60%)
helped: 1 / HURT: 2

LOST:   241
GAINED: 356

Meteor Lake, DG2, and Tiger Lake had similar results. (Meteor Lake shown)
total instructions in shared programs: 19859724 -> 19865383 (0.03%)
instructions in affected programs: 2166810 -> 2172469 (0.26%)
helped: 942 / HURT: 3563

total cycles in shared programs: 879095859 -> 878616086 (-0.05%)
cycles in affected programs: 753840990 -> 753361217 (-0.06%)
helped: 33442 / HURT: 35053

total spills in shared programs: 4679 -> 4677 (-0.04%)
spills in affected programs: 80 -> 78 (-2.50%)
helped: 1 / HURT: 2

total fills in shared programs: 4113 -> 4175 (1.51%)
fills in affected programs: 87 -> 149 (71.26%)
helped: 1 / HURT: 2

LOST:   706
GAINED: 563

Ice Lake and Skylake had similar results. (Ice Lake shown)
total instructions in shared programs: 20610947 -> 20615741 (0.02%)
instructions in affected programs: 2138334 -> 2143128 (0.22%)
helped: 979 / HURT: 3635

total cycles in shared programs: 863103771 -> 862153697 (-0.11%)
cycles in affected programs: 731626072 -> 730675998 (-0.13%)
helped: 34060 / HURT: 34256

total spills in shared programs: 3992 -> 3949 (-1.08%)
spills in affected programs: 504 -> 461 (-8.53%)
helped: 8 / HURT: 6

total fills in shared programs: 3640 -> 3573 (-1.84%)
fills in affected programs: 1505 -> 1438 (-4.45%)
helped: 8 / HURT: 5

LOST:   622
GAINED: 1018

fossil-db:

All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 232649299 -> 232485503 (-0.07%); split: -0.16%, +0.09%
Subgroup size: 15932144 -> 15933056 (+0.01%); split: +0.01%, -0.00%
Loop count: 137431 -> 137430 (-0.00%)
Cycle count: 32619860020 -> 32714539770 (+0.29%); split: -0.80%, +1.09%
Spill count: 540835 -> 519861 (-3.88%); split: -4.79%, +0.91%
Fill count: 700278 -> 663650 (-5.23%); split: -6.46%, +1.23%
Scratch Memory Size: 37258240 -> 35654656 (-4.30%); split: -5.24%, +0.94%
Max live registers: 72561256 -> 71501759 (-1.46%); split: -1.62%, +0.16%
Non SSA regs after NIR: 67682385 -> 67692495 (+0.01%); split: -0.00%, +0.02%

Totals from 617432 (78.20% of 789594) affected shaders:
Instrs: 217754449 -> 217590653 (-0.08%); split: -0.17%, +0.10%
Subgroup size: 12656912 -> 12657824 (+0.01%); split: +0.01%, -0.00%
Loop count: 133283 -> 133282 (-0.00%)
Cycle count: 32367979192 -> 32462658942 (+0.29%); split: -0.81%, +1.10%
Spill count: 540770 -> 519796 (-3.88%); split: -4.79%, +0.91%
Fill count: 700277 -> 663649 (-5.23%); split: -6.46%, +1.23%
Scratch Memory Size: 37182464 -> 35578880 (-4.31%); split: -5.25%, +0.94%
Max live registers: 64912683 -> 63853186 (-1.63%); split: -1.81%, +0.18%
Non SSA regs after NIR: 60158776 -> 60168886 (+0.02%); split: -0.00%, +0.02%

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25463>
2025-09-04 15:01:18 -07:00
Caio Oliveira
4e253184de brw: Run validation as soon as we have the CFG around
Fixes: affa7567c2 ("intel/brw: Add phases to backend")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37148>
2025-09-03 20:42:05 +00:00
Lionel Landwerlin
23a4aef14a Revert "brw: move texture offset packing to NIR"
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
This reverts commit 4346210ae6.

Fixes: 4346210ae6 ("brw: move texture offset packing to NIR")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37050>
2025-08-29 06:29:14 +00:00
Ian Romanick
49141ad5f2 brw: Strategically place flags initialization to help cmod prop
v2: Rebase on ac2b072312 ("brw: Add more specific brw_builder
helpers"), and fix a bug that caused the new instruction to possibly be
put in the wrong place.

No shader-db changes on any Intel platform.

fossil-db:

All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 233675305 -> 233641585 (-0.01%)
Cycle count: 32593658094 -> 32591467794 (-0.01%); split: -0.01%, +0.00%

Totals from 33513 (4.25% of 789264) affected shaders:
Instrs: 5200332 -> 5166612 (-0.65%)
Cycle count: 1499831128 -> 1497640828 (-0.15%); split: -0.15%, +0.00%

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35444>
2025-08-28 22:08:20 +00:00
Ian Romanick
3018849535 brw: Don't emit redundant flags initialization for subgroup op lowering
No shader-db changes on any Intel platform.

fossil-db:

All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 233676039 -> 233675305 (-0.00%)
Cycle count: 32594097814 -> 32593658094 (-0.00%); split: -0.00%, +0.00%

Totals from 325 (0.04% of 789264) affected shaders:
Instrs: 104491 -> 103757 (-0.70%)
Cycle count: 1183870034 -> 1183430314 (-0.04%); split: -0.04%, +0.00%

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35444>
2025-08-28 22:08:20 +00:00
Ian Romanick
4a238f461d brw: Do cmod prop again after brw_lower_subgroup_ops
shader-db:

All Intel platforms had similar results. (Lunar Lake shown)
total instructions in shared programs: 17114300 -> 17114294 (<.01%)
instructions in affected programs: 3617 -> 3611 (-0.17%)
helped: 6 / HURT: 0

total cycles in shared programs: 886397556 -> 886397454 (<.01%)
cycles in affected programs: 511400 -> 511298 (-0.02%)
helped: 6 / HURT: 0

fossil-db:

Lunar Lake
Totals:
Instrs: 233683694 -> 233676039 (-0.00%); split: -0.00%, +0.00%
Cycle count: 32602038466 -> 32594097814 (-0.02%); split: -0.03%, +0.01%
Spill count: 540908 -> 540704 (-0.04%)
Fill count: 700935 -> 700258 (-0.10%)

Totals from 2200 (0.28% of 789264) affected shaders:
Instrs: 2062360 -> 2054705 (-0.37%); split: -0.37%, +0.00%
Cycle count: 2506073282 -> 2498132630 (-0.32%); split: -0.41%, +0.09%
Spill count: 14423 -> 14219 (-1.41%)
Fill count: 34219 -> 33542 (-1.98%)

Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Instrs: 263545171 -> 263543341 (-0.00%); split: -0.00%, +0.00%
Cycle count: 26480835985 -> 26484748317 (+0.01%); split: -0.01%, +0.03%
Spill count: 554335 -> 554338 (+0.00%)
Fill count: 645486 -> 645498 (+0.00%)

Totals from 610 (0.07% of 903944) affected shaders:
Instrs: 1139871 -> 1138041 (-0.16%); split: -0.17%, +0.01%
Cycle count: 2274612327 -> 2278524659 (+0.17%); split: -0.15%, +0.33%
Spill count: 15153 -> 15156 (+0.02%)
Fill count: 36831 -> 36843 (+0.03%)

Tiger Lake, Ice Lake, and Skylake had similar results. (Tiger Lake shown)
Totals:
Instrs: 268713723 -> 268712817 (-0.00%); split: -0.00%, +0.00%
Cycle count: 24653238085 -> 24652269669 (-0.00%); split: -0.00%, +0.00%
Fill count: 671369 -> 671361 (-0.00%)

Totals from 666 (0.07% of 899711) affected shaders:
Instrs: 924423 -> 923517 (-0.10%); split: -0.11%, +0.01%
Cycle count: 840380565 -> 839412149 (-0.12%); split: -0.13%, +0.02%
Fill count: 13006 -> 12998 (-0.06%)

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35444>
2025-08-28 22:08:20 +00:00
Caio Oliveira
84963d6833 intel/brw: Take shader in the brw_generator::generate_code() parameters
Simplify the calls in all the stage compile functions.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33541>
2025-08-28 00:06:20 +00:00
Caio Oliveira
c19a4150b5 intel/brw: Simplify variant tracking in brw_compile_fs
Remove the cfg variables and use the shader pointers directly.  Reset
the variant pointer if a shader failed or will not be used.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33541>
2025-08-28 00:06:20 +00:00
Caio Oliveira
834e30d244 intel/brw: Simplify tracking of dispatch_width_limit in brw_compile_fs
Keep it in a variable, that way don't need to check which shader to look
for the limit.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33541>
2025-08-28 00:06:20 +00:00
Caio Oliveira
9d53e27579 intel/brw: Remove brw_shader::import_uniforms()
The brw_shader::uniforms now is derived from the nir_shader.  The
only exception is compute shaders for older Gfx versions, so we
move the adjust logic for that.

The benefit here is untangling the code for compilation variants,
that before needed to keep track of the first that compiled to,
in most cases, copy an integer.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33541>
2025-08-28 00:06:19 +00:00
Caio Oliveira
b8a35a8a27 brw: Pass per_primitive_offset in brw_shader_params
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33541>
2025-08-28 00:06:19 +00:00
Caio Oliveira
6ca9021758 brw: Add brw_shader_params
And unify the initialization code for brw_shader.  Avoid passing
brw_compile_params since for a single compilation we might have
multiple shaders (the case for BS stage).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33541>
2025-08-28 00:06:18 +00:00
Caio Oliveira
1c933b6511 brw: Fix checking sources of wrong instruction in opt_address_reg_load
Fixes: 8ac7802ac8 ("brw: move final send lowering up into the IR")
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37019>
2025-08-27 22:50:23 +00:00
Lionel Landwerlin
93996c07e2 brw: fix broadcast opcode
The problem with the current code is that there is a disconnect between :
   - the virtual register size allocated
   - the dispatch size
   - the size_written value

Only the last 2 are in sync and this confuses the spiller that only
looks at the destination register allocation & dispatch size to figure
out how much to spill.

The solution in this change is to make BROADCAST more like
MOV_INDIRECT, so that you can do a BROADCAST(8) that actually reads a
SIMD32 register. We put the size of the register read into src2.

Now the spiller sees correct read/write sizes just looking at the
destination register & dispatch size.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 662339a2ff ("brw/build: Use SIMD8 temporaries in emit_uniformize")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13614
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36564>
2025-08-28 00:23:44 +03:00
Lionel Landwerlin
e6ca709a4e brw: fix INTEL_DEBUG=spill_fs
We need to dirty the instruction BRW_DEPENDENCY_INSTRUCTIONS &
BRW_DEPENDENCY_VARIABLES if anything was spilled.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: a6b0783375 ("brw: Use brw_ip_ranges in scheduling / regalloc")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13233
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36925>
2025-08-27 15:08:35 +00:00
Lionel Landwerlin
3362b8dcb5 brw: use a scalar builder for the load_payload on transpose loads
I noticed SIMD32 shaders have that kind of pattern :

mov(32)         g94<1>D         0D                              { align1 WE_all };
send(1)         g15UD           g94UD           nullUD          0x6210d500                0x02010000
                ugm MsgDesc: ( load, a32, d32, V16, transpose, L1STATE_L3MOCS dst_len = 1, src0_len = 1, src1_len = 0 bti )  BTI 2  base_offset 16  { align1 WE_all 1N I@5 $1 };

Why use a 32 wide register for a SEND that is only going to read the first lane?

We can stick a single physical register and reduce register pressure.

DG2 fossils-db results :

Totals:
Instrs: 157417515 -> 157417796 (+0.00%); split: -0.00%, +0.00%
Cycle count: 15362185116 -> 15363086774 (+0.01%); split: -0.05%, +0.05%
Max live registers: 29059141 -> 29051166 (-0.03%)
Max dispatch width: 5071256 -> 5075720 (+0.09%); split: +0.33%, -0.24%

Totals from 82132 (14.43% of 569221) affected shaders:
Instrs: 26564632 -> 26564913 (+0.00%); split: -0.00%, +0.00%
Cycle count: 4630907475 -> 4631809133 (+0.02%); split: -0.16%, +0.18%
Max live registers: 5425037 -> 5417062 (-0.15%)
Max dispatch width: 128384 -> 132848 (+3.48%); split: +12.92%, -9.45%

LNL fossils-db results :

Totals:
Instrs: 141870413 -> 141870745 (+0.00%); split: -0.00%, +0.00%
Cycle count: 20176018818 -> 20191262632 (+0.08%); split: -0.07%, +0.14%
Max live registers: 44858167 -> 44838370 (-0.04%)

Totals from 51859 (10.55% of 491590) affected shaders:
Instrs: 16834547 -> 16834879 (+0.00%); split: -0.00%, +0.00%
Cycle count: 5761980106 -> 5777223920 (+0.26%); split: -0.24%, +0.50%
Max live registers: 5893878 -> 5874081 (-0.34%)

Perf A/B testing only reported a 0.5% improvement on DG2 on one trace, no changes on BMG.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36958>
2025-08-26 12:03:22 +00:00
Lionel Landwerlin
27c69acb6a brw: remove uniform from opt_offsets
Those are for push constants, no point in doing that because :
   - there is no HW constant offsets in push constants (payload
     delivery), it's just register offset calculation
   - if we have an dynamic value it's already using MOV_INDIRECT

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: e103afe7be ("brw: run the nir_opt_offsets pass and set the maximum offset size")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36958>
2025-08-26 12:03:22 +00:00
Konstantin Seurer
9df7b48d2f nir: Use nir_def_as_* in more places
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36746>
2025-08-24 14:03:09 +00:00
Caio Oliveira
74a4e7dd4b brw: Fix folding case for MAD instruction with all immediates
Fixes: b605f76b2a ("brw/algebraic: Constant fold multiplicands of MAD")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36867>
2025-08-21 17:19:18 +00:00
Caio Oliveira
eec64c865f brw: Add disabled test for MAD constant folding
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36867>
2025-08-21 17:19:18 +00:00
Calder Young
c7e48f79b7 brw,anv: Reduce UBO robustness size alignment to 16 bytes
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Instead of being encoded as a contiguous 64-bit mask of individual registers,
the robustness information is now encoded as a vector of up to 4 bytes that
represent the limits of each of the pushed UBO ranges in 16 byte units.
Some buggy Direct3D workloads are known to depend on a robustness alignment
as low as 16 bytes to work properly.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36455>
2025-08-21 09:04:55 +00:00
Lionel Landwerlin
2281e88381 brw: make assign_curb_setup visible in optimizer debug
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36455>
2025-08-21 09:04:54 +00:00
Lionel Landwerlin
df37c7ca74 brw: fix analysis dirtying with pulled constants
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 5c17299084 ("brw: enable A64 pulling of push constants")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36455>
2025-08-21 09:04:53 +00:00
Marek Olšák
c601308615 nir: convert nir_instr_worklist to init/fini semantics w/out allocation
This removes the malloc overhead.

Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36728>
2025-08-21 06:13:49 +00:00
Marek Olšák
3aadae22ad nir: make nir_block::predecessors & dom_frontier sets non-malloc'd
We can just place the set structures inside nir_block.

This reduces the number of ralloc calls by 6.7% when compiling Heaven
shaders with radeonsi+ACO using a release build (i.e. not including
nir_validate set allocations, which are also removed).

Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36728>
2025-08-21 06:13:48 +00:00
Lionel Landwerlin
fe38fb858c brw: workaround broken indirect RT messages on Gfx11
Unfortunately we cannot use the indirect descriptor on Gfx11, it
appears to just drop writes. Other platforms appear to be fine.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36883>
2025-08-20 15:01:50 +00:00
Lionel Landwerlin
a0844458b8 brw: enable opt_register_coalesce to work with multiple EOT blocks
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36883>
2025-08-20 15:01:50 +00:00
Lionel Landwerlin
c4c7ff3f8f brw: enable register allocation to deal with multiple EOTs
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36883>
2025-08-20 15:01:50 +00:00
Caio Oliveira
4fda724fd4 brw: Avoid invalid access when compacting out-of-bounds JIP/UIP
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Usually JIP will be valid, but as part of other changes, it will be
possible to have a shader that have multiple EOT messages and end with
and ENDIF instruction.  Its JIP will point after the program ends.
This is fine but was tripping up the compaction code.

Change compaction to not read its internal structures beyond the last
instruction.

Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36822>
2025-08-20 00:54:41 +00:00
Caio Oliveira
148063670d brw: If the instruction is already a SEND, no need to resize sources
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Kept an assert as a placeholder in case we had something odd going on
that this code was protecting.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36817>
2025-08-19 13:54:43 +00:00
Caio Oliveira
cebac156c4 brw: Only access valid sources in lower_btd_logical_send()
Only the SHADER_OPCODE_BTD_SPAWN_LOGICAL has sources, so
only reach for them when handling that instruction.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36817>
2025-08-19 13:54:43 +00:00
Caio Oliveira
dc960936fc brw: Move resize_sources() earlier when lowering FIND_LIVE_CHANNELS
Move it before the new source is used.  This currently works because all
instructions have a minimum amount of sources allocated, but a later commit
will change that.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36817>
2025-08-19 13:54:43 +00:00
Caio Oliveira
fe2e2fabcd brw: Make sure copied instruction don't copy the list pointers
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36817>
2025-08-19 13:54:43 +00:00
Caio Oliveira
5a34f676a5 brw: Define order for fixes in 3-src operand fix
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36817>
2025-08-19 13:54:43 +00:00
Sagar Ghuge
49b917baaf intel/compiler: Fix ray geometry index
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
We have only 24-bit wide geometry index, not the 28-bit wide.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Iván Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36796>
2025-08-19 09:32:55 +00:00
Matt Turner
6fd4dc353c elk/algebraic: Protect SHUFFLE from OOB indices
Akin to b67230de63 ("intel/fs: Protect opt_algebraic from OOB BROADCAST
indices"), we need to protect SHUFFLE as well.

Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36779>
2025-08-19 09:15:19 +00:00
Matt Turner
b4b692c486 brw/algebraic: Protect SHUFFLE from OOB indices
Akin to b67230de63 ("intel/fs: Protect opt_algebraic from OOB BROADCAST
indices"), we need to protect SHUFFLE as well.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13351
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36779>
2025-08-19 09:15:19 +00:00
Lionel Landwerlin
c871a62a75 brw: move URB channel mask shifting to the lowering pass
For example Xe2 uses the LSC and doesn´t need the shifting, so let's
just apply it where it's needed.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36757>
2025-08-13 12:01:49 +00:00
Lionel Landwerlin
68838d7001 brw: reorder reloc enums to leave embedded samplers at the end
So that the driver can allocate an array of relocations using
BRW_SHADER_RELOC_EMBEDDED_SAMPLER_HANDLE + number_of_embedded_samplers

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36757>
2025-08-13 12:01:49 +00:00
Lionel Landwerlin
46c16f854e brw: compute consistent clip/cull distance masks with VUE
We can optimize the VUE layout in cases where all shaders are compiled
together and some outputs are unused. So we need to have consistent
clip/cull_distance_mask with the VUE.

Previously we could have a VUE without ClipDistance present in the
header and yet have a non zero clip_distance_mask. This would trip the
HW into taking into account a VUE field that doesn't exist.

Here we set the clip/cull_distance_mask to 0 if the associated output
is not written by the shader. The written outputs are always
consistent with what's in the VUE.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 2d396f6085 ("intel: prepare VUE layout for more than 2 layouts")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13685
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36734>
2025-08-13 06:24:44 +00:00
Sagar Ghuge
cac3b4f404 anv: Mask off excessive invocations
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
For unaligned invocations, don't launch two COMPUTE_WALKER, instead we
can mask off excessive invocations in the shader itself at nir level and
launch one additional workgroup.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36245>
2025-08-12 23:17:02 +00:00
Kenneth Graunke
5e9de5317e brw: Validate that send payloads can't be imms or have source mods
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
To ensure we haven't missed resolving these things.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34040>
2025-08-08 22:12:11 +00:00
Kenneth Graunke
22165defb5 brw: Drop interlock and memory fence logical opcodes from is_payload()
These are lowered to sends prior to any callers of this helper.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34040>
2025-08-08 22:12:11 +00:00
Kenneth Graunke
ed4fadbb16 brw: Drop INTERPOLATE_AT_* opcodes from is_payload()
These are lowered to sends prior to any callers of this helper.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34040>
2025-08-08 22:12:10 +00:00