Commit graph

4838 commits

Author SHA1 Message Date
Lionel Landwerlin
ec456e99f2 brw: add a pass to lower ubo to push constant data
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38975>
2026-01-09 14:19:49 +00:00
Lionel Landwerlin
2c7254c131 brw: invert condition to reduce code nesting
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38975>
2026-01-09 14:19:48 +00:00
Caio Oliveira
dcefa0e6b3 brw: Rework UIP and JIP setting code
The current code walks the instructions, and when needed,
it will scan to find the next "end of scope" and sometimes
the next "end of block".  It also has a separate patching
logic for HALTs.

The new code collects the necessary scope information up front,
then walks the instructions backwards, avoiding the need
to scan for the end of scope.  It will also walk only the
relevant instructions that were previously collected.  It also
replaces the previous HALT-specific patching logic.

With this change, many cases that were jumping to
intermediate HALTs will now jump straight to the end of
scope (or the "end of the program" section).  E.g. in

```
   if
      ...
      (...) HALT
      ...
      (...) HALT
   endif
```

both HALTs now will jump to the end of the scope, instead of the
first HALT jumping into the second one.
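
The backward walk described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the brw implementation: instructions are plain opcode strings, the only scopes are `if`/`endif`, and `halt_targets` is a hypothetical helper name.

```python
def halt_targets(ops):
    """Map each HALT index to the index of the end of its enclosing scope.

    Toy model: scopes are if/endif pairs, and a HALT outside any scope
    targets the "end of the program", modelled here as len(ops).
    """
    end_of_program = len(ops)
    scope_ends = []  # stack of indices of the enclosing "endif"s
    targets = {}
    # Walking backwards means the end of the current scope is always
    # already known, so no forward scan is needed.
    for ip in range(len(ops) - 1, -1, -1):
        op = ops[ip]
        if op == "endif":
            scope_ends.append(ip)
        elif op == "if":
            scope_ends.pop()
        elif op == "halt":
            targets[ip] = scope_ends[-1] if scope_ends else end_of_program
    return targets


ops = ["if", "add", "halt", "mul", "halt", "endif", "halt"]
targets = halt_targets(ops)
```

In this model both HALTs inside the `if` resolve to the `endif` at index 5, rather than the first HALT chaining into the second.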

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38914>
2026-01-08 22:01:45 +00:00
Caio Oliveira
c939744d2d brw: Consolidate generator code for emitting "regular" instructions
Most instructions follow the basic formats (1, 2 and 3 src), so
consolidate their emission code in the generator.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38878>
2026-01-08 16:47:02 +00:00
Caio Oliveira
e1e055f23f brw: Move LRP related validation
Move validation, noting that LRP only supports BRW_TYPE_F -- the
previous assert had DF because it also was used by MAD in the past.
With that change, ALU3F can be replaced by ALU3 for LRP.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38878>
2026-01-08 16:47:02 +00:00
Caio Oliveira
68e1a07181 brw: Move normalization of 3-src instructions swizzles to a single place
When repctrl is used, the swizzle/chansel is ignored.  Instead of setting
a swizzle that has all zeros and encoding that, don't encode anything.

For context see e7598c5a62 ("intel/compiler: Set swizzle to BRW_SWIZZLE_XXXX
for scalar region").

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38878>
2026-01-08 16:47:01 +00:00
José Roberto de Souza
0cc73385e6 intel/brw: Document UBO_START
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39175>
2026-01-07 14:25:42 +00:00
José Roberto de Souza
961ca451e0 intel/brw: Add comment to ubo_ranges
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39175>
2026-01-07 14:25:42 +00:00
Georg Lehmann
eb4737a1dd nir: add nir_alu_instr_is_exact helper
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39103>
2026-01-07 09:40:57 +00:00
Marek Olšák
1912a00a91 ALL: use SHA1_DIGEST_LENGTH etc. instead of hardcoding the numbers
Only build_id is switched to use the literal 20 instead of SHA1_DIGEST_LENGTH,
because we will increase SHA1_DIGEST_LENGTH to BLAKE3_KEY_LEN.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39110>
2026-01-07 08:32:33 +00:00
José Roberto de Souza
6f031a98e0 intel/brw: Nuke brw_inst::is_volatile()
There are no users of that function. is_volatile is only used in
is_expression() in brw_opt_cse.cpp, which accesses the information through
the brw_send_inst struct.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39104>
2026-01-05 14:11:47 +00:00
Georg Lehmann
f3290219ab nir: use a seperate enum for per alu floating point math control
We don't need one bit per bitsize per instruction if only one actually
matters in the end.

First step towards moving NIR in the direction of full float_controls2
only.

Also rename this from fp_fast_math, because that name implied that 0 is
the no fast math mode, while the opposite was the case.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39026>
2025-12-29 10:57:05 +00:00
Sushma Venkatesh Reddy
d1d4e3d530 brw: Add EU assembler support for float8
Decode logic in Gfx12+ has become complex with the new types, so Caio
suggested that we move to the table like other gens.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39007>
2025-12-19 00:09:53 +00:00
Jordan Justen
0088aae481 intel/brw: Add new encode/decode for use with brw_data_type_float/int
Rework:
 * Sushma: Add BF in brw_data_type_encode, brw_data_type_decode

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39007>
2025-12-19 00:09:53 +00:00
Jordan Justen
46e843f76e intel/brw: Add brw_data_type_float/brw_data_type_int
These type encodings were first used in dpas instructions, but
continue to be used in more places.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39007>
2025-12-19 00:09:52 +00:00
Sushma Venkatesh Reddy
54accefed2 brw: Add BRW_TYPE_BF8 and BRW_TYPE_HF8 for float8
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39007>
2025-12-19 00:09:52 +00:00
Ian Romanick
b967942b64 brw: Do cmod prop again after scheduling
After selecting the scheduling mode, do cmod prop again. It's possible
that doing cmod prop between performing a schedule and trying to
register allocate would cause a different scheduling mode to be
selected. However, this would require fully restoring the pre-schedule
set of instructions (via cloning). I have tried to implement this, and
it's harder than it looks. :(

v2: Delete unused variable `progress`. Noticed by Marge.

shader-db:

All Intel platforms had similar results. (Meteor Lake shown)
total instructions in shared programs: 19967018 -> 19967006 (<.01%)
instructions in affected programs: 10652 -> 10640 (-0.11%)
helped: 4 / HURT: 0

total cycles in shared programs: 884129990 -> 884139590 (<.01%)
cycles in affected programs: 20334512 -> 20344112 (0.05%)
helped: 0 / HURT: 4

fossil-db:

Lunar Lake
Totals:
Instrs: 924967191 -> 924963460 (-0.00%); split: -0.00%, +0.00%
Cycle count: 105962414958 -> 105961925594 (-0.00%); split: -0.00%, +0.00%
Spill count: 3423582 -> 3423564 (-0.00%); split: -0.00%, +0.00%
Fill count: 4877121 -> 4876955 (-0.00%); split: -0.00%, +0.00%

Totals from 2511 (0.12% of 2018786) affected shaders:
Instrs: 12541707 -> 12537976 (-0.03%); split: -0.03%, +0.00%
Cycle count: 4816359238 -> 4815869874 (-0.01%); split: -0.01%, +0.00%
Spill count: 179536 -> 179518 (-0.01%); split: -0.03%, +0.02%
Fill count: 279407 -> 279241 (-0.06%); split: -0.07%, +0.01%

Meteor Lake, DG2, Tiger Lake, Ice Lake, and Skylake had similar results. (Meteor Lake shown)
Totals:
Instrs: 980252404 -> 980237686 (-0.00%); split: -0.00%, +0.00%
Cycle count: 91758669556 -> 91764028404 (+0.01%); split: -0.00%, +0.01%
Spill count: 3664771 -> 3664744 (-0.00%); split: -0.00%, +0.00%
Fill count: 4962078 -> 4960482 (-0.03%); split: -0.04%, +0.01%

Totals from 8472 (0.38% of 2251522) affected shaders:
Instrs: 34977623 -> 34962905 (-0.04%); split: -0.04%, +0.00%
Cycle count: 6251857553 -> 6257216401 (+0.09%); split: -0.04%, +0.13%
Spill count: 480251 -> 480224 (-0.01%); split: -0.01%, +0.00%
Fill count: 676539 -> 674943 (-0.24%); split: -0.28%, +0.05%

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38315>
2025-12-18 15:15:20 -08:00
Ian Romanick
09450faf6a brw: Do cmod prop again after post-RA scheduling
shader-db:

All Intel platforms had similar results. (Meteor Lake shown)
total instructions in shared programs: 19968728 -> 19963825 (-0.02%)
instructions in affected programs: 788014 -> 783111 (-0.62%)
helped: 2503 / HURT: 0

total cycles in shared programs: 884112912 -> 884093268 (<.01%)
cycles in affected programs: 20017168 -> 19997524 (-0.10%)
helped: 1830 / HURT: 52

LOST:   0
GAINED: 6

fossil-db:

All Intel platforms had similar results. (Meteor Lake shown)
Totals:
Instrs: 980768016 -> 980172179 (-0.06%)
Cycle count: 91762351767 -> 91757280093 (-0.01%); split: -0.01%, +0.00%
Max dispatch width: 37602592 -> 37608768 (+0.02%)

Totals from 157150 (6.98% of 2251329) affected shaders:
Instrs: 107323207 -> 106727370 (-0.56%)
Cycle count: 12696754006 -> 12691682332 (-0.04%); split: -0.04%, +0.00%
Max dispatch width: 3708584 -> 3714760 (+0.17%)

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38315>
2025-12-18 15:15:20 -08:00
Ian Romanick
08d71730ca brw/cmod: Propagate to an instruction with same source
Detect cases like

    mov.nz.f0.0(8)  null<1>D       g66<8,8,1>D
    (+f0.0) sel(8)  g123<1>UD      g87<8,8,1>UD   g84<8,8,1>UD
    mov.nz.f0.0(8)  null<1>D       g66<8,8,1>D
    (+f0.0) sel(8)  g124<1>UD      g88<8,8,1>UD   g85<8,8,1>UD

Either MOV instruction could also be an equivalent CMP.

v2: Require no predicate, groups match, and flags written match.
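
A simplified sketch of the pattern being detected follows. It is an illustration only, not the pass itself: `Inst`, `is_redundant`, and `drop_redundant_flag_writes` are hypothetical names, registers are plain strings, and only byte-for-byte identical flag-writing instructions are matched (the equivalent-CMP case mentioned above is not modelled).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Inst:
    op: str
    dst: Optional[str]            # None models the null register
    srcs: Tuple[str, ...]
    cmod: Optional[str] = None    # conditional modifier, e.g. "nz"
    flag: Optional[str] = None    # flag register written when cmod is set
    predicated: bool = False
    group: int = 0


def writes_flag(inst, flag):
    return inst.cmod is not None and inst.flag == flag


def is_redundant(kept, inst):
    """True if an identical flag-only instruction is still in effect."""
    # Only unpredicated instructions that write a flag and nothing else.
    if inst.cmod is None or inst.dst is not None or inst.predicated:
        return False
    for prev in reversed(kept):
        # Dataclass equality also requires group and flag to match.
        if prev == inst:
            return True
        # Any other flag write, or a write to our source, blocks the match.
        if writes_flag(prev, inst.flag) or prev.dst in inst.srcs:
            return False
    return False


def drop_redundant_flag_writes(insts):
    kept = []
    for inst in insts:
        if not is_redundant(kept, inst):
            kept.append(inst)
    return kept


shader = [
    Inst("mov", None, ("g66",), cmod="nz", flag="f0.0"),
    Inst("sel", "g123", ("g87", "g84"), predicated=True),
    Inst("mov", None, ("g66",), cmod="nz", flag="f0.0"),
    Inst("sel", "g124", ("g88", "g85"), predicated=True),
]
optimized = drop_redundant_flag_writes(shader)
```

In this model the second `mov.nz` is dropped because nothing between the two MOVs writes `f0.0` or `g66`. Changing the group of either MOV keeps both.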

v3: Add some more unit tests. Suggested by Caio.

shader-db:

All Intel platforms had similar results. (Lunar Lake shown)
total instructions in shared programs: 17203627 -> 17203590 (<.01%)
instructions in affected programs: 51432 -> 51395 (-0.07%)
helped: 37 / HURT: 0

total cycles in shared programs: 879884982 -> 879884670 (<.01%)
cycles in affected programs: 6014730 -> 6014418 (<.01%)
helped: 25 / HURT: 4

fossil-db:
Lunar Lake
Totals:
Instrs: 925092938 -> 925071952 (-0.00%); split: -0.00%, +0.00%
Cycle count: 105972157149 -> 105966120894 (-0.01%); split: -0.01%, +0.00%
Spill count: 3423592 -> 3423582 (-0.00%)
Fill count: 4876743 -> 4877121 (+0.01%); split: -0.00%, +0.01%
Max live registers: 193525293 -> 193525251 (-0.00%)
Max dispatch width: 49047056 -> 49047088 (+0.00%); split: +0.00%, -0.00%

Totals from 17714 (0.88% of 2018791) affected shaders:
Instrs: 56708169 -> 56687183 (-0.04%); split: -0.04%, +0.00%
Cycle count: 4560530879 -> 4554494624 (-0.13%); split: -0.15%, +0.01%
Spill count: 434846 -> 434836 (-0.00%)
Fill count: 807443 -> 807821 (+0.05%); split: -0.02%, +0.07%
Max live registers: 4332542 -> 4332500 (-0.00%)
Max dispatch width: 295248 -> 295280 (+0.01%); split: +0.02%, -0.01%

Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Instrs: 995075628 -> 995051291 (-0.00%); split: -0.00%, +0.00%
Cycle count: 92060967154 -> 92059311640 (-0.00%); split: -0.00%, +0.00%
Spill count: 3664664 -> 3664675 (+0.00%); split: -0.00%, +0.00%
Fill count: 4961929 -> 4961874 (-0.00%); split: -0.00%, +0.00%
Max live registers: 121480292 -> 121480184 (-0.00%)
Max dispatch width: 37947528 -> 37947496 (-0.00%)

Totals from 20569 (0.90% of 2278279) affected shaders:
Instrs: 57437989 -> 57413652 (-0.04%); split: -0.04%, +0.00%
Cycle count: 4297505238 -> 4295849724 (-0.04%); split: -0.06%, +0.03%
Spill count: 487508 -> 487519 (+0.00%); split: -0.00%, +0.00%
Fill count: 869228 -> 869173 (-0.01%); split: -0.01%, +0.00%
Max live registers: 2413028 -> 2412920 (-0.00%)
Max dispatch width: 239280 -> 239248 (-0.01%)

Tiger Lake and Ice Lake had similar results. (Tiger Lake shown)
Totals:
Instrs: 1012570598 -> 1012546137 (-0.00%); split: -0.00%, +0.00%
Cycle count: 85579989052 -> 85589116671 (+0.01%); split: -0.00%, +0.01%
Spill count: 3901755 -> 3901748 (-0.00%)
Fill count: 6799383 -> 6799367 (-0.00%)
Max live registers: 122288761 -> 122288658 (-0.00%)

Totals from 20595 (0.90% of 2280449) affected shaders:
Instrs: 57764192 -> 57739731 (-0.04%); split: -0.04%, +0.00%
Cycle count: 3899898675 -> 3909026294 (+0.23%); split: -0.04%, +0.27%
Spill count: 481262 -> 481255 (-0.00%)
Fill count: 1057996 -> 1057980 (-0.00%)
Max live registers: 2412395 -> 2412292 (-0.00%)

Skylake
Totals:
Instrs: 516619178 -> 516617390 (-0.00%)
Cycle count: 57593545602 -> 57592502019 (-0.00%); split: -0.00%, +0.00%
Fill count: 860403 -> 860402 (-0.00%)
Max live registers: 87553761 -> 87553649 (-0.00%)

Totals from 1357 (0.08% of 1730068) affected shaders:
Instrs: 3575640 -> 3573852 (-0.05%)
Cycle count: 1772148559 -> 1771104976 (-0.06%); split: -0.06%, +0.00%
Fill count: 68917 -> 68916 (-0.00%)
Max live registers: 131237 -> 131125 (-0.09%)

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38315>
2025-12-18 15:15:20 -08:00
Ian Romanick
50f2cd7366 brw/dce: Don't generate more NULL destinations after brw_lower_3src_null_dest
Later commits will call DCE after lowering has been performed. Creating
more things that would need lowering is problematic.

No shader-db or fossil-db changes on any Intel platform.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38315>
2025-12-18 15:15:20 -08:00
Ian Romanick
24cd8aa3b8 brw/cmod: Allow FIXED_GRF
Later commits will call cmod prop after register allocation. At that
time, there is only FIXED_GRF.

No shader-db or fossil-db changes on any Intel platform.

v2: FIXED_GRF uses subnr instead of offset. Add a unit test to
demonstrate the issue.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> [v1]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38315>
2025-12-18 15:15:20 -08:00
Ian Romanick
d7227b11a1 brw: elk: Disable can_do_cmod for MACH
PRMs for G35 (Gfx4) through Ivy Bridge (Gfx7) all say that conditional
modifiers are allowed for MACH. Starting with Haswell (Gfx7.5), this
seems to be removed. This function doesn't have any way to know the
platform, so false is returned for all platforms.

No shader-db or fossil-db changes on any Intel platform.

Prevents a failure in "brw: Do cmod prop again after post-RA scheduling"
in piglit's builtin-uint-mad_sat-1.0.generated.cl.

Cc: stable
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38315>
2025-12-18 15:15:20 -08:00
Ian Romanick
ba30794847 brw/cmod: Don't propagate between instructions in different groups
The group implicitly selects which flags the instruction can write. This
was discovered while working on another set of changes that could change
some logical operations into predicated MOV instructions.

Prevents regressions later in the series in
dEQP-VK.graphicsfuzz.cov-loop-fragcoord-identical-condition.

No shader-db or fossil-db changes on any Intel platform.

v2: Update the comment in the test case. Suggested by Caio.

Fixes: 95ac3b1dae ("i965/fs: don't propagate cmod when the exec sizes differ")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38315>
2025-12-18 15:15:20 -08:00
Ian Romanick
c0fb93506b brw: Add brw_reg::is_grf
v2: Add a function comment. Suggested by Caio.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38315>
2025-12-18 15:15:20 -08:00
Alyssa Rosenzweig
61dc9201a1 brw: constant fold before texture lowering
This ensures we don't need dynamic stuff. Noticed when debugging weird
regressions around the mcs lowering. ARL:

total instructions in shared programs: 19857061 -> 19854964 (-0.01%)
instructions in affected programs: 91768 -> 89671 (-2.29%)
helped: 154
HURT: 0
helped stats (abs) min: 9.0 max: 33.0 x̄: 13.62 x̃: 13
helped stats (rel) min: 0.51% max: 40.91% x̄: 4.66% x̃: 3.36%
95% mean confidence interval for instructions value: -14.04 -13.19
95% mean confidence interval for instructions %-change: -5.49% -3.84%
Instructions are helped.

total cycles in shared programs: 884538769 -> 884485530 (<.01%)
cycles in affected programs: 10508994 -> 10455755 (-0.51%)
helped: 116
HURT: 38
helped stats (abs) min: 4.0 max: 15238.0 x̄: 666.22 x̃: 148
helped stats (rel) min: 0.01% max: 34.53% x̄: 2.58% x̃: 1.07%
HURT stats (abs)   min: 4.0 max: 4027.0 x̄: 632.68 x̃: 302
HURT stats (rel)   min: 0.01% max: 32.75% x̄: 3.46% x̃: 0.59%
95% mean confidence interval for cycles value: -631.32 -60.09
95% mean confidence interval for cycles %-change: -2.06% -0.12%
Cycles are helped.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39023>
2025-12-18 17:55:29 +00:00
Kenneth Graunke
d83c699045 brw: Convert GS pulled inputs to use URB intrinsics
We leave GS pushed inputs using load_per_vertex_input for now - they're
relatively simple, and using load_attribute_payload doesn't work well
since it's assumed to be convergent (for TES, FS inputs) while GS inputs
are divergent.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38990>
2025-12-18 06:39:02 +00:00
Kenneth Graunke
eae3bd19d4 brw: Move GS URB Read Length limiting to brw_nir_lower_gs_inputs()
We're going to be deciding on push vs. pull in the NIR lowering pass
soon, so move the code to limit our register usage from brw's thread
payload code to brw_nir_lower_gs_inputs().

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38990>
2025-12-18 06:39:02 +00:00
Kenneth Graunke
8889802271 brw: Make max_push_bytes a parameter to URB lowering data
This allows us to program something other than a stage-based constant.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38990>
2025-12-18 06:39:02 +00:00
Kenneth Graunke
f62f7d80e2 brw: Update try_load_push_input to handle dword-unit offsets too
We don't need this case today, but it's trivial to handle.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38990>
2025-12-18 06:39:01 +00:00
Caio Oliveira
9c16bbd023 brw: Perform mark_last_urb_write_with_eot optimization after CFG
Avoid using exec_node::remove() and the initial "main list of
instructions", and instead use the existing helpers like other
passes.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37146>
2025-12-16 17:02:58 +00:00
Caio Oliveira
e53576a559 brw: Move MATH related validation
Moved existing checks to EU validation and added a few more
based on instruction description in the various PRMs / BSpec.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38877>
2025-12-16 01:34:46 +00:00
Caio Oliveira
55863c1267 brw: Add EU validation for ROR/ROL
And remove asserts() in generator.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38877>
2025-12-16 01:34:46 +00:00
Caio Oliveira
47d8ed1177 brw: Move PLN/LINE normalization
Add validation for Source 0 and move the normalization into
the code producing the instruction.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38877>
2025-12-16 01:34:44 +00:00
Caio Oliveira
3f436bdc6e brw: Make LINE normalization into validation
Add validation for Source 0.  Should not cause problems
since this instruction is not used by the compiler anymore.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38877>
2025-12-16 01:34:43 +00:00
Caio Oliveira
75cf20f0eb brw: Remove LINE from brw_builder and brw_generator
Gfx9 only instruction that is not used anymore.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38877>
2025-12-16 01:34:42 +00:00
Caio Oliveira
cd3e3dd0d3 brw: Drop asserts for brw_SRND
These are already covered by the EU validation.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38877>
2025-12-16 01:34:41 +00:00
Caio Oliveira
68190499df brw: Move ADD related validation
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38877>
2025-12-16 01:34:40 +00:00
Caio Oliveira
6ae92d3372 brw: Move AVG related validation
Couldn't find in the docs a reference for the types needing to match,
and simulator + MTL seem fine with mixing UD and UW, so not adding
a replacement for the removed assertions.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38877>
2025-12-16 01:34:38 +00:00
Caio Oliveira
6d8d733d4d brw: Move MUL related validation
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38877>
2025-12-16 01:34:34 +00:00
Kenneth Graunke
26523bedec brw: Call nir_opt_offsets for mesh shaders
Most stages call this as part of brw_nir_postprocess_opts() but mesh
lowers to URB intrinsics after that since it needs bit-sizes lowered.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>
2025-12-16 00:58:46 +00:00
Kenneth Graunke
d831f38d11 brw: Delete all the old backend mesh/task URB handling code
This has all been replaced by NIR lowering to URB intrinsics.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>
2025-12-16 00:58:46 +00:00
Kenneth Graunke
d0dc45955d brw: Lower task shader payload access in NIR
We keep this separate from the other lowering infrastructure because
there's no semantic IO involved here, just byte offsets.  Also, it needs
to run after nir_lower_mem_access_bit_sizes, which means it needs to be
run from brw_postprocess_opts.  But we can't do the mesh URB lowering
there because that doesn't have the MUE map.

It's not that much code as a separate pass, though.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>
2025-12-16 00:58:46 +00:00
Kenneth Graunke
bd0c173595 brw: Lower mesh shader outputs in NIR
With all the infrastructure in place, this is largely a matter of
calling the lowering passes with the appropriate data from the MUE map.

MUE initialization is now done with semantic IO instead of raw offsets.

This drops another case of non-standard NIR IO usage (and no_validate).

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>
2025-12-16 00:58:44 +00:00
Kenneth Graunke
6e5cc63a3a brw: Extend URB lowering infrastructure to handle mesh shader outputs
Mesh shaders introduce per-primitive outputs, and also our MUE layout
has per-vertex data starting at an offset.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>
2025-12-16 00:58:43 +00:00
Lionel Landwerlin
60db7f20c9 brw: move MUE initialization out of the SIMD loop
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>
2025-12-16 00:58:42 +00:00
Lionel Landwerlin
d3053fb3d2 brw: Implement URB handle intrinsics for task/mesh stages
(Split by Ken from a larger patch originally written by Lionel.)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>
2025-12-16 00:58:40 +00:00
Kenneth Graunke
d18423b116 brw: Make lower_{inputs,outputs}_to_urb_intrinsics non-static
I want to reuse these in brw_compile_mesh.cpp.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>
2025-12-16 00:58:40 +00:00
Kenneth Graunke
788c49ecc6 brw: Extend load_urb/store_urb to handle 32-bit non-vec4-aligned access
(Based on the original implementation by Lionel Landwerlin, but adapted
to my respun URB lowering framework.)

The mesh shader URB payload requires reading and writing fields at
arbitrary DWord offsets.  For example, the Primitive Indices array
starts at DWord 1, and it can be a vec1[], vec2[], or vec3[] array,
leading to very unaligned and sometimes double-parked elements.

Still, most fields are conveniently vec4-aligned.

To handle this, we add a new cb_data::vec4_access flag.  If set, access
remains in vec4 units, with vec4 alignment.  We use this for non-mesh
stages.  When unset, offset is in 32-bit units, allowing unaligned
DWord access.

This is trivial to support on Xe2, where the LSC URB messages support
arbitrary byte-aligned addressing.  On older platforms, we have to
convert this to vec4 aligned offsets plus a component offset (either
returning a subset of the channels loaded, or using component masking
to store a subset of a vec4/vec8).

Thankfully, since the OWord URB messages support accessing a vec8 at
a time, this means we can do any vec4 access in one message, even if
it's double-parked.  We use mod-analysis to see if we can statically
determine the sub-vec4 component offset required (we often can).  If
not, we use the ability to have dynamic writemasks to sort it out.
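
The offset conversion for older platforms can be sketched as below. This is a rough illustration under assumptions, not the lowering code: `split_dword_offset` is a hypothetical helper, the offset is assumed to be a compile-time constant, and the channel mask is modelled as 8 bits covering one vec8.

```python
def split_dword_offset(dword_offset, num_components):
    """Split a DWord offset into (vec4 offset, first component, vec8 mask).

    The mask has one bit per DWord of the vec8 starting at the vec4
    offset, so an access that straddles two vec4s still fits in one
    message.
    """
    assert 1 <= num_components <= 4
    vec4_offset, first_component = divmod(dword_offset, 4)
    channel_mask = ((1 << num_components) - 1) << first_component
    return vec4_offset, first_component, channel_mask


# A vec3[] array starting at DWord 1: element i lives at DWord 1 + 3 * i.
element0 = split_dword_offset(1 + 3 * 0, 3)
element2 = split_dword_offset(1 + 3 * 2, 3)
```

Element 2 starts at DWord 7, the last component of vec4 1, so its mask reaches into the next vec4. This is the double-parked case that the vec8-wide messages cover.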

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>
2025-12-16 00:58:38 +00:00
Kenneth Graunke
2b700f6bfd brw: Delete attr_desc struct
Unused since commit 18bbcf9a63.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>
2025-12-16 00:58:37 +00:00
Kenneth Graunke
8177695403 brw: Add missed access to store_urb_lsc_intel intrinsics
I forgot to copy this over in the LSC case.  This meant we were missing
reorderability, which in turn meant we were missing out on CSE.

fossil-db results on Battlemage:

   Instrs: 231471427 -> 231363032 (-0.05%)
   Send messages: 12077759 -> 12019628 (-0.48%)
   Cycle count: 34058451430.0 -> 34057005552.0 (-0.00%); split: -0.01%, +0.00%
   Spill count: 520387 -> 520135 (-0.05%)
   Fill count: 470812 -> 470722 (-0.02%)
   Max live registers: 72111834 -> 71873886 (-0.33%)

   Totals from 2898 (0.37% of 788851) affected shaders:
   Instrs: 1223836 -> 1115441 (-8.86%)
   Send messages: 148633 -> 90502 (-39.11%)
   Cycle count: 17732554.0 -> 16286676.0 (-8.15%); split: -10.65%, +2.49%
   Spill count: 252 -> 0 (-inf%)
   Fill count: 90 -> 0 (-inf%)
   Max live registers: 491684 -> 253736 (-48.39%)
   Non SSA regs after NIR: 255397 -> 255402 (+0.00%)

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>
2025-12-16 00:58:36 +00:00