Commit graph

4434 commits

Author SHA1 Message Date
Alyssa Rosenzweig
cc6e3b84cb treewide: use nir_def_as_*
Via Coccinelle patch:

    @@
    expression definition;
    @@

    -nir_instr_as_alu(definition->parent_instr)
    +nir_def_as_alu(definition)

    @@
    expression definition;
    @@

    -nir_instr_as_intrinsic(definition->parent_instr)
    +nir_def_as_intrinsic(definition)

    @@
    expression definition;
    @@

    -nir_instr_as_phi(definition->parent_instr)
    +nir_def_as_phi(definition)

    @@
    expression definition;
    @@

    -nir_instr_as_load_const(definition->parent_instr)
    +nir_def_as_load_const(definition)

    @@
    expression definition;
    @@

    -nir_instr_as_deref(definition->parent_instr)
    +nir_def_as_deref(definition)

    @@
    expression definition;
    @@

    -nir_instr_as_tex(definition->parent_instr)
    +nir_def_as_tex(definition)

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Acked-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36489>
2025-08-01 15:34:24 +00:00
Lionel Landwerlin
cea714329c brw: make more passes printable through NIR_DEBUG
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36512>
2025-08-01 11:35:00 +00:00
Marek Olšák
db26597f8d intel: fork exec_node/list -> brw_exec_node/list as a private Intel utility
NIR is going to use exec_node/list without the C++ code, and may switch to
a different linked list implementation in the future.

GLSL is going to use ir_exec_node/list, which we want to keep private
for GLSL, so that we can change it easily.

Thus, it's better to fork the C++ version of list.h for Intel.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36425>
2025-07-31 20:23:02 +00:00
Caio Oliveira
f222b16f92 brw: Remove extra iteration on instructions from brw_opt_address_reg_load
The helper function already iterate instructions.

Fixes: 8ac7802ac8 ("brw: move final send lowering up into the IR")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36478>
2025-07-31 19:45:16 +00:00
Antonio Ospite
ddf2aa3a4d build: avoid redefining unreachable() which is standard in C23
In the C23 standard unreachable() is now a predefined function-like
macro in <stddef.h>

See https://android.googlesource.com/platform/bionic/+/HEAD/docs/c23.md#is-now-a-predefined-function_like-macro-in

And this causes build errors when building for C23:

-----------------------------------------------------------------------
In file included from ../src/util/log.h:30,
                 from ../src/util/log.c:30:
../src/util/macros.h:123:9: warning: "unreachable" redefined
  123 | #define unreachable(str)    \
      |         ^~~~~~~~~~~
In file included from ../src/util/macros.h:31:
/usr/lib/gcc/x86_64-linux-gnu/14/include/stddef.h:456:9: note: this is the location of the previous definition
  456 | #define unreachable() (__builtin_unreachable ())
      |         ^~~~~~~~~~~
-----------------------------------------------------------------------

So don't redefine it with the same name, but use the name UNREACHABLE()
to also signify it's a macro.

Using a different name also makes sense because the behavior of the
macro was extending the one of __builtin_unreachable() anyway, and it
also had a different signature, accepting one argument, compared to the
standard unreachable() with no arguments.

This change improves the chances of building mesa with the C23 standard,
which for instance is the default in recent AOSP versions.

All the instances of the macro, including the definition, were updated
with the following command line:

  git grep -l '[^_]unreachable(' -- "src/**" | sort | uniq | \
  while read file; \
  do \
    sed -e 's/\([^_]\)unreachable(/\1UNREACHABLE(/g' -i "$file"; \
  done && \
  sed -e 's/#undef unreachable/#undef UNREACHABLE/g' -i src/intel/isl/isl_aux_info.c

Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36437>
2025-07-31 17:49:42 +00:00
Caio Oliveira
f2a49081de brw: Use ralloc helpers for string handling in brw_eu_validate
Acked-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36339>
2025-07-30 17:59:26 +00:00
Lionel Landwerlin
60932e8fae brw: always ensure coarse pixel is disabled on Gfx9
No HW support there.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36457>
2025-07-30 07:57:19 +00:00
Lionel Landwerlin
aa6810b706 brw: consider LOAD_PAYLOAD fully defined
It's mostly used for SEND messages and fully defines the register data
(that's its purpose after all).

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36457>
2025-07-30 07:57:19 +00:00
Lionel Landwerlin
9371e8d370 brw: fixup coarse_z computation
The delivered values in the coarse pixel size are 0 when coarse pixel
dispatch is disabled and that is screwing up our half pixel offset
adjustment.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36457>
2025-07-30 07:57:19 +00:00
Lionel Landwerlin
9dac7dda87 brw: fixup source depth enabling with coarse pixel shading
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36457>
2025-07-30 07:57:18 +00:00
Lionel Landwerlin
68c50d129e brw: fix NIR metadata invalidation with closest-hit shaders
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36457>
2025-07-30 07:57:18 +00:00
José Roberto de Souza
07f5b53dd7 intel/brw: Remove duplicated implementation of brw_imm_uq/brw_imm_u64()
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36448>
2025-07-29 16:05:54 +00:00
José Roberto de Souza
14386eb7e5 intel/brw: Add comment to reg_unit()
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36448>
2025-07-29 16:05:54 +00:00
José Roberto de Souza
7981a18df2 intel/brw: Nuke unused brw_message_desc_header_present()
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36448>
2025-07-29 16:05:53 +00:00
Ian Romanick
fa74c31b22 brw: Allow additional flags registers on Xe2+
Xe2 adds two more flags registers. We barely use the second flags
register on previous platforms, so the omission was not previously
noticed.

There are several efforts in progress that will add using of more flags
registers.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35415>
2025-07-24 23:08:08 +00:00
Ian Romanick
1279f12c84 brw: Implement Wa_22012725308 for flags via SWSB too
At this point, using the per-register granularity will only help in
conjuction with fragment shader discard (which is implemented using f1).

v2: Loop restructuring and code cleanups. Suggested by Curro.

v3: Only apply Wa on Gfx12.5+. Suggested by Curro.

v4: Also apply to implicit flag reads. Suggested by Curro. This version
affects a *lot* more shaders (10,936 on Meteor Lake shader-db versus
4,482 before). The results are still very much in the 🤷 territory.

v5: Add missing dependency. I thought I got them all the previous
time. :( Noticed by Curro.

shader-db:

Lunar Lake
total cycles in shared programs: 886315282 -> 886391040 (<.01%)
cycles in affected programs: 204907250 -> 204983008 (0.04%)
helped: 1 / HURT: 6716

LOST:   0
GAINED: 1

Meteor Lake and DG2 had similar results. (Meteor Lake shown)
total cycles in shared programs: 883774789 -> 883921507 (0.02%)
cycles in affected programs: 481836784 -> 481983502 (0.03%)
helped: 4 / HURT: 10936

LOST:   3
GAINED: 7

fossil-db:

Lunar Lake
Totals:
Cycle count: 32600441334 -> 32601862658 (+0.00%); split: -0.00%, +0.00%

Totals from 90283 (11.44% of 789260) affected shaders:
Cycle count: 17265933202 -> 17267354526 (+0.01%); split: -0.00%, +0.01%

Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Cycle count: 26477292677 -> 26480321805 (+0.01%); split: -0.00%, +0.01%
Max dispatch width: 8010440 -> 8010984 (+0.01%)

Totals from 132952 (14.71% of 903925) affected shaders:
Cycle count: 15349555348 -> 15352584476 (+0.02%); split: -0.00%, +0.02%
Max dispatch width: 1085416 -> 1085960 (+0.05%)

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35415>
2025-07-24 23:08:07 +00:00
Ian Romanick
1fdcc9039b brw: Add and use brw_reg_is_arf to test for a specific ARF
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35415>
2025-07-24 23:08:07 +00:00
Alyssa Rosenzweig
ecfca8ec6f util: crib SWAP macro from freedreno
we have a bunch of copies across the tree, unify them.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36257>
2025-07-21 11:42:18 +00:00
Caio Oliveira
3c7dd0ccf1 brw: Make brw_builder() shader constructor use CFG if available
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Properly pick the end of the last block as a cursor.  Also remove the
default constructor since is not needed anymore.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34681>
2025-07-19 17:49:48 +00:00
Caio Oliveira
ab8af62745 brw: Use a builder to track position in lower_simd
Removes brw_builder::at() since it is now unused, replaced by various
other helpers.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34681>
2025-07-19 17:49:48 +00:00
Caio Oliveira
8826b1e680 brw: Use a more specific builder helper in combine constants
Also remove commentary about older Gfx versions that don't apply
anymore.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34681>
2025-07-19 17:49:47 +00:00
Caio Oliveira
ac2b072312 brw: Add more specific brw_builder helpers
Replace uses of brw_builder::at() with various more descriptive
variants.  Use block pointer from instruction when possible.

A couple of special cases remained and will be handled in separate patches.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34681>
2025-07-19 17:49:47 +00:00
Caio Oliveira
6c5132ec9a brw: Move insert/remove code to the block
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34681>
2025-07-19 17:49:46 +00:00
Caio Oliveira
2dfd4dcbc5 brw: Fix cmat conversion between bfloat16 and non-float32
The HW only supports converting BRW_TYPE_BF values to/from BRW_TYPE_F,
so intermediate conversion is needed.  Move the intermediate conversion
to the implementation of `@convert_cmat_intel` and simplify the
brw_nir_lower_cooperative_matrix pass.  This has two positive effects

- Fixes conversion between BF and integer type cooperative matrices,
  that was still using the old emit_alu1 approach instead of the new
  code for `@convert_cmat_intel`.

- Guarantee the intermediate conversion will result in a valid layout
  for conversions involved USE_B matrices.  If we instead used the
  intrinsic twice in brw_nir_lower_cooperative_matrix.c, a matrix with
  invalid layout would be visible at NIR level and we wouldn't be able
  to keep the current assertion for USE_B case.

Due to the configurations we have exposed, we still don't need to
write a more complex USE_B conversion -- they are all between same
size types (and, consequently, packing factors), so no shuffling of
data is needed to respect the USE_B layout.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36185>
2025-07-18 21:55:43 +00:00
Ian Romanick
2594fcadd4 brw: Split virtual GRFs again at the end of optimizations
Logical sends and load_payload can have large VGRFs that cannot be
split. Once all of the lowering passes and optimization passes that
might eliminate any of those instructions have completed, try to split
larger VGRFs one last time.

Register allocation can only handle VGRFs up to a certain size, so this
is the last opportunity to prevent later failures due to VGRFs that are
too large.

Closes: #13239

shader-db:

Lunar Lake, Meteor Lake, DG2, and Tiger Lake had similar results. (Lunar Lake shown)
total instructions in shared programs: 17114494 -> 17114496 (<.01%)
instructions in affected programs: 2790 -> 2792 (0.07%)
helped: 2 / HURT: 4

total cycles in shared programs: 886617364 -> 886315282 (-0.03%)
cycles in affected programs: 4067540 -> 3765458 (-7.43%)
helped: 48 / HURT: 9

Ice Lake and Skylake had similar restuls. (Ice Lake shown)
total instructions in shared programs: 20799801 -> 20799691 (<.01%)
instructions in affected programs: 1210 -> 1100 (-9.09%)
helped: 1 / HURT: 0

total cycles in shared programs: 865495386 -> 865498990 (<.01%)
cycles in affected programs: 60132 -> 63736 (5.99%)
helped: 2 / HURT: 1

total spills in shared programs: 3987 -> 3981 (-0.15%)
spills in affected programs: 24 -> 18 (-25.00%)
helped: 1 / HURT: 0

total fills in shared programs: 3535 -> 3519 (-0.45%)
fills in affected programs: 36 -> 20 (-44.44%)
helped: 1 / HURT: 0

fossil-db:

All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 208647246 -> 208646499 (-0.00%); split: -0.00%, +0.00%
Cycle count: 31257819536 -> 31263957016 (+0.02%); split: -0.02%, +0.04%
Max live registers: 66160877 -> 66155728 (-0.01%)

Totals from 34703 (4.91% of 707053) affected shaders:
Instrs: 13766639 -> 13765892 (-0.01%); split: -0.02%, +0.01%
Cycle count: 3693572086 -> 3699709566 (+0.17%); split: -0.15%, +0.32%
Max live registers: 4843852 -> 4838703 (-0.11%)

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36202>
2025-07-18 19:04:01 +00:00
Ian Romanick
f6da6399d7 brw/reg_allocate: Don't access out of bounds in non-debug builds
In debug builds, the assertion should be preferred as it will highlight
the actual problem. In non-debug builds, it is possible to fail register
allocation more gracefully. If the problem only occurs in, for example,
a SIMD32 version of a shader, the application may even continue to
function.

Closes: #13239
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36202>
2025-07-18 19:04:01 +00:00
Ian Romanick
b57bad1fd7 brw/reg_allocate: Check source / destination hazard for all larger SIMD
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
All platforms needs this check for SIMD32. Xe2+ do not need this for
SIMD16.

Also... delete some really stale comments about Gfx4/Gfx5. This compiler
doesn't even support those platforms.

No shader-db changes on any pre-Xe2 Intel platforms:

shader-db:

Lunar Lake
total instructions in shared programs: 17108867 -> 17108855 (<.01%)
instructions in affected programs: 35211 -> 35199 (-0.03%)
helped: 19 / HURT: 6

total cycles in shared programs: 885026794 -> 885805580 (0.09%)
cycles in affected programs: 140449880 -> 141228666 (0.55%)
helped: 903 / HURT: 1142

LOST:   0
GAINED: 25

fossil-db:

Lunar Lake
Totals:
Instrs: 208578317 -> 208574097 (-0.00%); split: -0.00%, +0.00%
Cycle count: 31268800798 -> 31259914590 (-0.03%); split: -0.10%, +0.07%
Spill count: 504472 -> 504102 (-0.07%); split: -0.09%, +0.02%
Fill count: 606581 -> 606079 (-0.08%); split: -0.13%, +0.05%
Scratch Memory Size: 35001344 -> 34957312 (-0.13%)

Totals from 60714 (8.59% of 706970) affected shaders:
Instrs: 48923370 -> 48919150 (-0.01%); split: -0.01%, +0.01%
Cycle count: 11830486210 -> 11821600002 (-0.08%); split: -0.27%, +0.20%
Spill count: 397150 -> 396780 (-0.09%); split: -0.12%, +0.02%
Fill count: 469651 -> 469149 (-0.11%); split: -0.17%, +0.06%
Scratch Memory Size: 25971712 -> 25927680 (-0.17%)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35903>
2025-07-15 19:35:44 +00:00
Ian Romanick
7e98ca89f2 brw/reg_allocate: Adjust source / destination hazard conditions for broadcast
Broadcast selects one lane from the source to write to all the lanes
of the destination. This makes it possible for the first half to
overwrite the source used by the second half.

No shader-db changes on any Intel platform.

fossil-db:

Lunar Lake
Totals:
Instrs: 208705405 -> 208705374 (-0.00%); split: -0.00%, +0.00%
Cycle count: 31274597098 -> 31273711544 (-0.00%); split: -0.00%, +0.00%

Totals from 77 (0.01% of 707133) affected shaders:
Instrs: 220177 -> 220146 (-0.01%); split: -0.02%, +0.00%
Cycle count: 461694212 -> 460808658 (-0.19%); split: -0.33%, +0.14%

No fossil-db changes on any other Intel platforms.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35903>
2025-07-15 19:35:44 +00:00
Ian Romanick
67dc02acc2 brw/reg_allocate: Only add interference for the source with the hazard
shader-db:

Lunar Lake
total instructions in shared programs: 17105892 -> 17105732 (<.01%)
instructions in affected programs: 55720 -> 55560 (-0.29%)
helped: 29 / HURT: 24

total cycles in shared programs: 884342344 -> 884663448 (0.04%)
cycles in affected programs: 154776382 -> 155097486 (0.21%)
helped: 719 / HURT: 761

total spills in shared programs: 3278 -> 3262 (-0.49%)
spills in affected programs: 320 -> 304 (-5.00%)
helped: 4 /HURT: 0

total fills in shared programs: 1632 -> 1616 (-0.98%)
fills in affected programs: 368 -> 352 (-4.35%)
helped: 4 / HURT: 0

LOST:   3
GAINED: 4

No shader-db changes on any other Intel platforms.

fossil-db:

Lunar Lake
Totals:
Instrs: 208696275 -> 208692511 (-0.00%); split: -0.00%, +0.00%
Cycle count: 31325252074 -> 31274118190 (-0.16%); split: -0.27%, +0.11%
Spill count: 504809 -> 504472 (-0.07%); split: -0.07%, +0.01%
Fill count: 607047 -> 606581 (-0.08%); split: -0.08%, +0.01%
Scratch Memory Size: 35037184 -> 35001344 (-0.10%); split: -0.11%, +0.01%

Totals from 44135 (6.24% of 707112) affected shaders:
Instrs: 39570465 -> 39566701 (-0.01%); split: -0.01%, +0.00%
Cycle count: 11140437886 -> 11089304002 (-0.46%); split: -0.76%, +0.30%
Spill count: 279756 -> 279419 (-0.12%); split: -0.13%, +0.01%
Fill count: 354706 -> 354240 (-0.13%); split: -0.14%, +0.01%
Scratch Memory Size: 18758656 -> 18722816 (-0.19%); split: -0.20%, +0.01%

Meteor Lake, DG2, Tiger Lake, Ice Lake, and Skylake had similar results. (Meteor Lake shown)
Totals:
Cycle count: 25377247343 -> 25377246251 (-0.00%); split: -0.00%, +0.00%

Totals from 11 (0.00% of 806166) affected shaders:
Cycle count: 899080 -> 897988 (-0.12%); split: -0.48%, +0.36%

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35903>
2025-07-15 19:35:43 +00:00
Ian Romanick
4e05de7c3d brw/reg_allocate: Require SIMD32 for destination / source interference on Xe2
No platforms other than Lunar Lake were affected in shader-db or
fossil-db for obvious reasons.

shader-db:

Lunar Lake
total instructions in shared programs: 17070074 -> 17069908 (<.01%)
instructions in affected programs: 151939 -> 151773 (-0.11%)
helped: 61 / HURT: 60

total cycles in shared programs: 891338314 -> 880188516 (-1.25%)
cycles in affected programs: 550482120 -> 539332322 (-2.03%)
helped: 8053 / HURT: 7183

total spills in shared programs: 3294 -> 3278 (-0.49%)
spills in affected programs: 138 -> 122 (-11.59%)
helped: 8 / HURT: 0

total fills in shared programs: 1653 -> 1632 (-1.27%)
fills in affected programs: 212 -> 191 (-9.91%)
helped: 8 / HURT: 0

LOST:   96
GAINED: 70

fossil-db:

Lunar Lake
Totals:
Instrs: 208555066 -> 208509387 (-0.02%); split: -0.03%, +0.00%
Cycle count: 31487691872 -> 31318442816 (-0.54%); split: -0.88%, +0.34%
Spill count: 508701 -> 504809 (-0.77%); split: -0.86%, +0.10%
Fill count: 612583 -> 607047 (-0.90%); split: -1.03%, +0.13%
Scratch Memory Size: 35311616 -> 35037184 (-0.78%); split: -0.81%, +0.04%

Totals from 214417 (30.33% of 706852) affected shaders:
Instrs: 123732970 -> 123687291 (-0.04%); split: -0.04%, +0.01%
Cycle count: 27410928904 -> 27241679848 (-0.62%); split: -1.01%, +0.39%
Spill count: 452458 -> 448566 (-0.86%); split: -0.97%, +0.11%
Fill count: 550991 -> 545455 (-1.00%); split: -1.15%, +0.14%
Scratch Memory Size: 31138816 -> 30864384 (-0.88%); split: -0.92%, +0.04%

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35903>
2025-07-15 19:35:43 +00:00
Ian Romanick
e9ae997ffc brw: Only apply GRF 127 send workaround to Gfx9
The portion of the Bspec dedicated to Gfx6-Gfx11 says that this
workaround applies to "Pre-CNL" (with CNL being Gfx10). There is no
mention of this workaround in the sections for Xe or Xe2.

No shader-db or fossil-db changes on Skylake or older Intel platforms.

shader-db:

Lunar Lake, Meteor Lake, DG2, Tiger Lake, and Ice Lake (Lunar Lake shown)
total instructions in shared programs: 17107031 -> 17107027 (<.01%)
instructions in affected programs: 32182 -> 32178 (-0.01%)
helped: 16 / HURT: 14

total cycles in shared programs: 895016760 -> 894975410 (<.01%)
cycles in affected programs: 312774834 -> 312733484 (-0.01%)
helped: 9279 / HURT: 8091

LOST:   40
GAINED: 33

The pre-Xe2 platforms had a lot more lost / gained shaders. This appears
to be due to churn in the cycle counts and the SIMD32 heuristic.

fossil-db:

Lunar Lake
Totals:
Instrs: 208667436 -> 208671853 (+0.00%); split: -0.00%, +0.01%
Subgroup size: 14241168 -> 14241200 (+0.00%)
Cycle count: 31495149690 -> 31481397970 (-0.04%); split: -0.17%, +0.13%
Spill count: 508467 -> 508701 (+0.05%); split: -0.10%, +0.14%
Fill count: 611979 -> 612583 (+0.10%); split: -0.07%, +0.17%
Scratch Memory Size: 35288064 -> 35311616 (+0.07%); split: -0.07%, +0.14%

Totals from 205773 (29.10% of 707019) affected shaders:
Instrs: 103153541 -> 103157958 (+0.00%); split: -0.01%, +0.01%
Subgroup size: 4563584 -> 4563616 (+0.00%)
Cycle count: 12979963010 -> 12966211290 (-0.11%); split: -0.42%, +0.32%
Spill count: 494741 -> 494975 (+0.05%); split: -0.10%, +0.15%
Fill count: 597988 -> 598592 (+0.10%); split: -0.07%, +0.17%
Scratch Memory Size: 33351680 -> 33375232 (+0.07%); split: -0.08%, +0.15%

Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Instrs: 233063764 -> 233057897 (-0.00%); split: -0.01%, +0.00%
Subgroup size: 9892840 -> 9892856 (+0.00%)
Cycle count: 25387597341 -> 25373885583 (-0.05%); split: -0.36%, +0.31%
Spill count: 518469 -> 517940 (-0.10%); split: -0.19%, +0.09%
Fill count: 559444 -> 558537 (-0.16%); split: -0.29%, +0.13%
Scratch Memory Size: 19694592 -> 19658752 (-0.18%); split: -0.21%, +0.03%
Max dispatch width: 7135248 -> 7131672 (-0.05%); split: +0.13%, -0.18%

Totals from 301996 (37.49% of 805603) affected shaders:
Instrs: 144535999 -> 144530132 (-0.00%); split: -0.01%, +0.01%
Subgroup size: 3768528 -> 3768544 (+0.00%)
Cycle count: 18687102311 -> 18673390553 (-0.07%); split: -0.50%, +0.42%
Spill count: 515687 -> 515158 (-0.10%); split: -0.20%, +0.09%
Fill count: 557638 -> 556731 (-0.16%); split: -0.29%, +0.13%
Scratch Memory Size: 18662400 -> 18626560 (-0.19%); split: -0.22%, +0.03%
Max dispatch width: 2029872 -> 2026296 (-0.18%); split: +0.44%, -0.62%

Tiger Lake
Totals:
Instrs: 238813279 -> 238766482 (-0.02%); split: -0.04%, +0.02%
Subgroup size: 9851320 -> 9851328 (+0.00%)
Cycle count: 23668877036 -> 23646286421 (-0.10%); split: -0.51%, +0.42%
Spill count: 559060 -> 554241 (-0.86%); split: -1.12%, +0.26%
Fill count: 595926 -> 591843 (-0.69%); split: -1.46%, +0.78%
Scratch Memory Size: 19929088 -> 19764224 (-0.83%); split: -1.19%, +0.36%
Max dispatch width: 7102184 -> 7101840 (-0.00%); split: +0.13%, -0.13%

Totals from 284125 (35.42% of 802235) affected shaders:
Instrs: 144695094 -> 144648297 (-0.03%); split: -0.06%, +0.03%
Subgroup size: 3567312 -> 3567320 (+0.00%)
Cycle count: 11303753658 -> 11281163043 (-0.20%); split: -1.07%, +0.87%
Spill count: 554624 -> 549805 (-0.87%); split: -1.13%, +0.26%
Fill count: 592252 -> 588169 (-0.69%); split: -1.47%, +0.78%
Scratch Memory Size: 19553280 -> 19388416 (-0.84%); split: -1.21%, +0.37%
Max dispatch width: 1895488 -> 1895144 (-0.02%); split: +0.48%, -0.50%

Ice Lake
Totals:
Instrs: 239034316 -> 239049108 (+0.01%); split: -0.03%, +0.04%
Subgroup size: 9926440 -> 9926448 (+0.00%)
Cycle count: 24944253156 -> 24919967386 (-0.10%); split: -0.25%, +0.15%
Spill count: 575498 -> 571612 (-0.68%); split: -1.18%, +0.51%
Fill count: 709760 -> 716665 (+0.97%); split: -1.31%, +2.28%
Scratch Memory Size: 20699136 -> 20599808 (-0.48%); split: -1.45%, +0.97%
Max dispatch width: 7140856 -> 7143568 (+0.04%); split: +0.15%, -0.12%

Totals from 233451 (29.01% of 804669) affected shaders:
Instrs: 127440610 -> 127455402 (+0.01%); split: -0.07%, +0.08%
Subgroup size: 2835784 -> 2835792 (+0.00%)
Cycle count: 11818511030 -> 11794225260 (-0.21%); split: -0.53%, +0.32%
Spill count: 559557 -> 555671 (-0.69%); split: -1.22%, +0.52%
Fill count: 694460 -> 701365 (+0.99%); split: -1.34%, +2.33%
Scratch Memory Size: 19774464 -> 19675136 (-0.50%); split: -1.52%, +1.02%
Max dispatch width: 1602736 -> 1605448 (+0.17%); split: +0.69%, -0.52%

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35903>
2025-07-15 19:35:42 +00:00
Caio Oliveira
f8db53ccae brw: Fix comparison with unordered_mode when making baked dependency
The unordered mode stored in dependencies might be a bitmask and not
only a single mode.  In practice, only the "stronger" mode will stick.
Make sure that the code testing for the mode uses "&" instead of "==",
to avoid prevent some valid combinations to happen, e.g.

```
   // ...
   add(16)         g104<1>F        g94<1,1,0>F     g34<1,1,0>F     { align1 1H @7 $7.dst compacted };
```

which without the fix ends up as

```
   // ...
   sync nop(1)                     null<0,1,0>UB                   { align1 WE_all 1N F@7 };
   add(16)         g104<1>F        g94<1,1,0>F     g34<1,1,0>F     { align1 1H $7.dst compacted };
```

Enables two tests for the scoreboard pass that illustrate this case.

For measuring the effect, re-enabled the sync.nop accounting on total of
instructions and got the following results.

```
   Totals:
   Instrs: 322041261 -> 321748285 (-0.09%)
   Cycle count: 22864587567 -> 22863073741 (-0.01%)
   Max dispatch width: 7989040 -> 7989024 (-0.00%); split: +0.00%, -0.00%

   Totals from 88212 (9.78% of 902056) affected shaders:
   Instrs: 102282050 -> 101989074 (-0.29%)
   Cycle count: 12787629859 -> 12786116033 (-0.01%)
   Max dispatch width: 525336 -> 525320 (-0.00%); split: +0.01%, -0.01%
```

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36096>
2025-07-14 20:28:54 +00:00
Caio Oliveira
1e18a2d1a8 brw: Add scoreboard test for edge case involving baked dependency
This is disable because it is adding a `sync.nop` instead of baking
together both "@3 $0.dst".

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36096>
2025-07-14 20:28:54 +00:00
jhananit
debd903a00 intel: Update all NIR_PASS_V to NIR_PASS
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35889>
2025-07-14 19:25:52 +00:00
Sagar Ghuge
36172c41dc intel/compiler: Drop unused param from set_memory_address
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36092>
2025-07-14 03:46:21 +00:00
Caio Oliveira
887642b0f2 intel: Add INTEL_DEBUG=no-vrt
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Add support for disabling the VRT (Variable Register Thread) feature.
The strategy here is to force the old BRW_MAX_GRF limit for the
register allocator (locks the upper limit) and make sure
ptl_register_blocks() always return that amount of blocks (locks
the lower limit).

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35781>
2025-07-13 21:11:02 +00:00
Ian Romanick
5adab50283 brw/nir: Use nir_opt_reassociate_matrix_mul
This needs to be called before intel_nir_opt_peephole_ffma, so I
arbitrarilly decided to call it right before.

All Intel platforms had similar results. (Lunar Lake shown)
total instructions in shared programs: 17120227 -> 17118227 (-0.01%)
instructions in affected programs: 5854 -> 3854 (-34.16%)
helped: 51 / HURT: 0

total cycles in shared programs: 895497762 -> 894733940 (-0.09%)
cycles in affected programs: 4603518 -> 3839696 (-16.59%)
helped: 95 / HURT: 21

LOST:   1
GAINED: 0

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35925>
2025-07-09 19:28:49 +00:00
Sviatoslav Peleshko
8d22eb960b brw/disasm: Fix Gfx11 3src-instructions dst register disassembly
The conversion from bit value to register file type is already done
by the brw_eu_inst_3src_a1_dst_reg_file in the FFC macro now, so doing it
again produced incorrect results.

Fixes: e7179232 ("intel/brw: Move encoding of Gfx11 3-src inside the inst helpers")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13141
Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35960>
2025-07-08 19:49:09 +00:00
Daniel Schürmann
2c51a8870d nir: add nir_vectorize_cb callback parameter to nir_lower_phis_to_scalar()
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Similar to nir_lower_alu_width(), the callback can return the
desired number of components for a phi, or 0 for no lowering.

The previous behavior of nir_lower_phis_to_scalar() with lower_all=true
can be elicited via nir_lower_all_phis_to_scalar() while the previous
behavior with lower_all=false now corresponds to nir_lower_phis_to_scalar()
with NULL callback.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35783>
2025-07-08 15:33:59 +00:00
Marek Olšák
8def3f865d agx,freedreno,intel,lima,panfrost,svga,virgl,zink: fix supports_indirect_inputs
The GLSL compiler always lowers inputs to temps for VS and GS, so exclude
them from driver support because the GLSL compiler will no longer do that
unconditionally. Thus, indirect VS and GS inputs are completely untested
and broken in a lot of drivers.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35945>
2025-07-08 06:11:42 +00:00
Alyssa Rosenzweig
d31cb824df treewide: use VARYING_BIT_*
Some checks failed
macOS-CI / macOS-CI (dri) (push) Has been cancelled
macOS-CI / macOS-CI (xlib) (push) Has been cancelled
Via Coccinelle patch generated by the following Python:

  varys = [ "POS", "COL0", "COL1", "FOGC", "TEX0", "TEX1", "TEX2", "TEX3", "TEX4",
           "TEX5", "TEX6", "TEX7", "PSIZ", "BFC0", "BFC1", "EDGE", "CLIP_VERTEX",
           "CLIP_DIST0", "CLIP_DIST1", "CULL_DIST0", "CULL_DIST1", "PRIMITIVE_ID",
           "PRIMITIVE_COUNT", "LAYER", "VIEWPORT", "FACE",
           "PRIMITIVE_SHADING_RATE", "PNTC", "TESS_LEVEL_OUTER",
           "TESS_LEVEL_INNER", "PRIMITIVE_INDICES", "BOUNDING_BOX0",
           "BOUNDING_BOX1", "VIEWPORT_MASK", "CULL_PRIMITIVE" ]
  t = """
  @@
  @@

  -(1 << VARYING_SLOT_${V})
  +VARYING_BIT_${V}

  @@
  @@

  -BITFIELD_BIT(VARYING_SLOT_${V})
  +VARYING_BIT_${V}

  @@
  @@

  -(1ull << VARYING_SLOT_${V})
  +VARYING_BIT_${V}

  @@
  @@

  -BITFIELD64_BIT(VARYING_SLOT_${V})
  +VARYING_BIT_${V}

  """
  for v in varys:
      from mako.template import Template
      print(Template(t).render(V = v))

Closes: #13453
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com> [panfrost, common]
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> [broadcom]
Reviewed-by: Corentin Noël <corentin.noel@collabora.com> [virgl]
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com> [zink]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35917>
2025-07-04 19:01:04 +00:00
Matt Turner
e6242fb958 brw: Handle bfloat16 dest and src0 operands for DPAS
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35320>
2025-07-02 20:06:59 +00:00
Caio Oliveira
c006bee22d brw: Don't use simd_select for BS shaders
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Since there's only one possible SIMD, don't need to use
the helpers to decide which one to compile.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35799>
2025-07-02 19:48:04 +00:00
Caio Oliveira
c733f07378 brw: Use the right width in brw_nir_apply_key for BS shaders
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Fixes: 23c7142cd6 ("anv: disable SIMD16 for RT shaders")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35798>
2025-07-02 15:32:23 +00:00
Lionel Landwerlin
343f3dd3c1 brw: fix non constant BTI accesses with offsets
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: e103afe7be ("brw: run the nir_opt_offsets pass and set the maximum offset size")
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35822>
2025-07-02 01:04:06 +03:00
Lionel Landwerlin
89f3ee4cb2 brw: remove debug printf
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Fixes: fcf4401824 ("brw: handle wa_18019110168 with independent shader compilation")
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35815>
2025-06-29 12:39:03 +03:00
Lionel Landwerlin
a742b859bd anv: add support for handling wa_18019110168 with gfx-libs
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35103>
2025-06-28 05:55:35 +00:00
Lionel Landwerlin
fcf4401824 brw: handle wa_18019110168 with independent shader compilation
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35103>
2025-06-28 05:55:35 +00:00
Lionel Landwerlin
bc8d18aee2 brw: make a helper for vertex attribute offset computation
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35103>
2025-06-28 05:55:34 +00:00
Lionel Landwerlin
8fabcd754f brw: move primitive_id_index field in fs_msaa
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35103>
2025-06-28 05:55:34 +00:00