Commit graph

4808 commits

Author SHA1 Message Date
Caio Oliveira
e53576a559 brw: Move MATH related validation
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Moved existing checks to EU validation and added a few more
based on instruction description in the various PRMs / BSpec.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38877>
2025-12-16 01:34:46 +00:00
Caio Oliveira
55863c1267 brw: Add EU validation for ROR/ROL
And remove asserts() in generator.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38877>
2025-12-16 01:34:46 +00:00
Caio Oliveira
47d8ed1177 brw: Move PLN/LINE normalization
Add validation for Source 0 and move the normalization into
the code producing the instruction.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38877>
2025-12-16 01:34:44 +00:00
Caio Oliveira
3f436bdc6e brw: Make LINE normalization into validation
Add validation for Source 0.  Should not cause problems
since this instruction is not used by the compiler anymore.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38877>
2025-12-16 01:34:43 +00:00
Caio Oliveira
75cf20f0eb brw: Remove LINE from brw_builder and brw_generator
Gfx9 only instruction that is not used anymore.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38877>
2025-12-16 01:34:42 +00:00
Caio Oliveira
cd3e3dd0d3 brw: Drop asserts for brw_SRND
These are already covered by the EU validation.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38877>
2025-12-16 01:34:41 +00:00
Caio Oliveira
68190499df brw: Move ADD related validation
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38877>
2025-12-16 01:34:40 +00:00
Caio Oliveira
6ae92d3372 brw: Move AVG related validation
Couldn't find in the docs a reference for the types needing to match,
and simulator + MTL seem fine with mixing UD and UW, so not adding
a replacement for the removed assertions.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38877>
2025-12-16 01:34:38 +00:00
Caio Oliveira
6d8d733d4d brw: Move MUL related validation
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38877>
2025-12-16 01:34:34 +00:00
Kenneth Graunke
26523bedec brw: Call nir_opt_offsets for mesh shaders
Most stages call this as part of brw_nir_postprocess_opts() but mesh
lowers to URB intrinsics after that since it needs bit-sizes lowered.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>
2025-12-16 00:58:46 +00:00
Kenneth Graunke
d831f38d11 brw: Delete all the old backend mesh/task URB handling code
This has all been replaced by NIR lowering to URB intrinsics.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>
2025-12-16 00:58:46 +00:00
Kenneth Graunke
d0dc45955d brw: Lower task shader payload access in NIR
We keep this separate from the other lowering infrastructure because
there's no semantic IO involved here, just byte offsets.  Also, it needs
to run after nir_lower_mem_access_bit_sizes, which means it needs to be
run from brw_postprocess_opts.  But we can't do the mesh URB lowering
there because that doesn't have the MUE map.

It's not that much code as a separate pass, though.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>
2025-12-16 00:58:46 +00:00
Kenneth Graunke
bd0c173595 brw: Lower mesh shader outputs in NIR
With all the infrastructure in place, this is largely a matter of
calling the lowering passes with the appropriate data from the MUE map.

MUE initialization is now done with semantic IO instead of raw offsets.

This drops another case of non-standard NIR IO usage (and no_validate).

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>
2025-12-16 00:58:44 +00:00
Kenneth Graunke
6e5cc63a3a brw: Extend URB lowering infrastructure to handle mesh shader outputs
Mesh shaders introduce per-primitive outputs, and also our MUE layout
has per-vertex data starting at an offset.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>
2025-12-16 00:58:43 +00:00
Lionel Landwerlin
60db7f20c9 brw: move MUE initialization out of the SIMD loop
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>
2025-12-16 00:58:42 +00:00
Lionel Landwerlin
d3053fb3d2 brw: Implement URB handle intrinsics for task/mesh stages
(Split by Ken from a larger patch originally written by Lionel.)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>
2025-12-16 00:58:40 +00:00
Kenneth Graunke
d18423b116 brw: Make lower_{inputs,outputs}_to_urb_intrinsics non-static
I want to reuse these in brw_compile_mesh.cpp.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>
2025-12-16 00:58:40 +00:00
Kenneth Graunke
788c49ecc6 brw: Extend load_urb/store_urb to handle 32-bit non-vec4-aligned access
(Based on the original implementation by Lionel Landwerlin, but adapted
to my respun URB lowering framework.)

The mesh shader URB payload requires reading and writing fields at
arbitrary DWord offsets.  For example, the Primitive Indices array
starts at DWord 1, and it can be a vec1[], vec2[], or vec3[] array,
leading to very unaligned and sometimes double-parked elements.

Still, most fields are still conveniently vec4-aligned.

To handle this, we add a new cb_data::vec4_access flag.  If set, access
remains in vec4 units, with vec4 alignment.  We use this for non-mesh
stages.  When unset, offset is in 32-bit units, allowing unaligned
DWord access.

This is trivial to support on Xe2, where the LSC URB messages support
arbitrary byte-aligned addressing.  On older platforms, we have to
convert this to vec4 aligned offsets plus a component offset (either
returning a subset of the channels loaded, or using component masking
to store a subset of a vec4/vec8).

Thankfully, since the OWord URB messages support accessing a vec8 at
a time, this means we can do any vec4 access in one message, even if
it's double-parked.  We use mod-analysis to see if we can statically
determine the sub-vec4 component offset required (we often can).  If
not, we use the ability to have dynamic writemasks to sort it out.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>
2025-12-16 00:58:38 +00:00
Kenneth Graunke
2b700f6bfd brw: Delete attr_desc struct
Unused since commit 18bbcf9a63.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>
2025-12-16 00:58:37 +00:00
Kenneth Graunke
8177695403 brw: Add missed access to store_urb_lsc_intel intrinsics
I forgot to copy this over in the LSC case.  This meant we were missing
reorderability which meant that we were missing out on CSE.

fossil-db results on Battlemage:

   Instrs: 231471427 -> 231363032 (-0.05%)
   Send messages: 12077759 -> 12019628 (-0.48%)
   Cycle count: 34058451430.0 -> 34057005552.0 (-0.00%); split: -0.01%, +0.00%
   Spill count: 520387 -> 520135 (-0.05%)
   Fill count: 470812 -> 470722 (-0.02%)
   Max live registers: 72111834 -> 71873886 (-0.33%)

   Totals from 2898 (0.37% of 788851) affected shaders:
   Instrs: 1223836 -> 1115441 (-8.86%)
   Send messages: 148633 -> 90502 (-39.11%)
   Cycle count: 17732554.0 -> 16286676.0 (-8.15%); split: -10.65%, +2.49%
   Spill count: 252 -> 0 (-inf%)
   Fill count: 90 -> 0 (-inf%)
   Max live registers: 491684 -> 253736 (-48.39%)
   Non SSA regs after NIR: 255397 -> 255402 (+0.00%)

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>
2025-12-16 00:58:36 +00:00
Kenneth Graunke
87c63b4725 brw: Rename brw_nir_lower_vue_inputs to brw_nir_lower_gs_inputs
The other stages don't use this anymore.  Geometry should stop too,
but that's for a future MR.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>
2025-12-16 00:58:34 +00:00
Kenneth Graunke
d525d2456a brw: Calculate tessellation URB offsets when lowering to URB intrinsics
This now lowers IO intrinsics to URB intrinsics in a single step,
rather than modifying IO intrinsics to have non-standard meanings
temporarily.  We are able to drop one "no_validate" flag.

For example, remap_patch_urb_offsets had added (vertex * stride) to
(offset) for per-vertex IO intrinsics, but left them as per-vertex
intrinsics.  Now we just have an urb_offset() function to calculate
that when doing the lowering.

This also provides a central location for calculating URB offsets,
which we should be able to extend for other uses (per-view lowering,
mesh per-primitive lowering) in future patches.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>
2025-12-16 00:58:34 +00:00
Kenneth Graunke
b02a01c636 intel: Replace signed char with int8_t
Let's use modern stdint types.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>
2025-12-16 00:58:32 +00:00
José Roberto de Souza
518705a4fe intel/brw: Split to a function the code that calculate sampler channels that should be written
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
This block of code will be re-used in a future patch, also it reduces a bit the
size and complexity of lower_sampler_logical_send().
No changes in behavior intended here.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38792>
2025-12-15 06:57:15 -08:00
Kenneth Graunke
bdacab49c7 brw: Use LSC extended descriptor offsets for Xe2 URB messages
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
URB messages on Xe2 are LSC messages with FLAT addressing.  We can
specify a S19 immediate offset in the extended message descriptor,
which should be more than adequate to hold any offsets we need.

We wrote the original URB code before implementing that, and never
doubled back to take advantage of it.  But doing so can drop ADDs
near every URB access.

fossil-db results on Battlemage:

   Totals:
   Instrs: 232239759 -> 231432254 (-0.35%)
   Cycle count: 34044435848.0 -> 34055507100.0 (+0.03%); split: -0.00%, +0.04%
   Spill count: 520370 -> 520362 (-0.00%); split: -0.00%, +0.00%
   Fill count: 470790 -> 470803 (+0.00%); split: -0.00%, +0.00%
   Max live registers: 72111853 -> 72111369 (-0.00%); split: -0.00%, +0.00%

   Totals from 227920 (28.89% of 788851) affected shaders:
   Instrs: 59841897 -> 59034392 (-1.35%)
   Cycle count: 683385208.0 -> 694456460.0 (+1.62%); split: -0.14%, +1.76%
   Spill count: 17278 -> 17270 (-0.05%); split: -0.10%, +0.06%
   Fill count: 17481 -> 17494 (+0.07%); split: -0.03%, +0.10%
   Max live registers: 23052652 -> 23052168 (-0.00%); split: -0.00%, +0.00%

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38899>
2025-12-12 00:12:03 +00:00
Kenneth Graunke
9482d392a1 brw: Fix outdated comments about urb->offset units
I recently converted urb->offset to be in bytes on Xe2, but neglected to
update these comments that still said OWord.

Fixes: 9ffae42975 ("brw: Store brw_urb_inst::offset in bytes on Xe2")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38899>
2025-12-12 00:12:03 +00:00
Iván Briano
094f8f041f anv: enable fragmentShadingRateWithShaderSampleMask on Xe2+
Before DG2, the value the HW gives us seems to be backwards, but
since DG2 this is supposed to be supported just fine.
However, due to Wa_22012766191, enable it only for Xe2 and up.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38641>
2025-12-11 22:50:10 +00:00
Iván Briano
a7280ab590 nir: add nir_lower_single_sampled::lower_sample_mask_in option
GLSL defines gl_SampleMaskIn as :
   "a fragment language that indicates the set of samples covered
    by the primitive generating the fragment during multisample
    rasterization"

when variable rate shading is enabled, a single invocation might cover
multiple samples. The lowering done in nir_lower_single_sampled() does
not account for that case, so add an option to selectively disable it.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38641>
2025-12-11 22:50:10 +00:00
Emma Anholt
10ba7675c8 nir/uub: Use an optional max_samples from drivers for sample counts.
This triggers some unrolling in Fallout 4, GTAV, and Rocky Planet in my
shader-db.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38585>
2025-12-11 14:26:11 +00:00
Mel Henning
dc44c0f32b treewide: Use nir_deref_instr_is_arr()
Via coccinelle and some manual fixups.

@@
expression e1;
@@
- e1->deref_type == nir_deref_type_array || e1->deref_type == nir_deref_type_ptr_as_array
+ nir_deref_instr_is_arr(e1)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38856>
2025-12-10 22:07:45 +00:00
Caio Oliveira
7bd238fa5a brw: Properly set 'desc as register' for SEND in assembler
The non-split SEND case was missing setting this.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38876>
2025-12-10 19:46:52 +00:00
Lionel Landwerlin
6e92720ece anv/brw: drop cs_prog_key::lower_unaligned_dispatch usage
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38837>
2025-12-10 07:44:31 +00:00
Lionel Landwerlin
86419dd519 brw: remove driver specific load_num_workgroup lowering
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38735>
2025-12-02 22:44:05 +00:00
Ian Romanick
d64ce23b08 elk: only lower flrp once
No shader-db changes on any Intel platform. Both Iris and Crocus use
st_nir_opts, which calls nir_lower_flrp before brw_nir_optimize. The
call still needs to exist for hasvk, but I don't collect fossil-db data
for hasvk.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12526>
2025-12-02 21:28:05 +00:00
Alyssa Rosenzweig
e4b8b758b1 brw: only lower flrp once
No shader-db changes on any Intel platform.

fossil-db:

Lunar Lake
Totals:
Instrs: 926275147 -> 926273376 (-0.00%); split: -0.00%, +0.00%
Cycle count: 106012190597 -> 106011255305 (-0.00%); split: -0.00%, +0.00%
Spill count: 3424180 -> 3424168 (-0.00%)
Fill count: 4877035 -> 4877017 (-0.00%)
Max live registers: 193918196 -> 193918122 (-0.00%); split: -0.00%, +0.00%
Max dispatch width: 49106544 -> 49106448 (-0.00%); split: +0.00%, -0.00%
Non SSA regs after NIR: 231281721 -> 231281719 (-0.00%)

Totals from 1705 (0.08% of 2020028) affected shaders:
Instrs: 926974 -> 925203 (-0.19%); split: -0.28%, +0.09%
Cycle count: 39024288 -> 38088996 (-2.40%); split: -2.77%, +0.37%
Spill count: 2229 -> 2217 (-0.54%)
Fill count: 2977 -> 2959 (-0.60%)
Max live registers: 183056 -> 182982 (-0.04%); split: -0.20%, +0.16%
Max dispatch width: 46880 -> 46784 (-0.20%); split: +0.07%, -0.27%
Non SSA regs after NIR: 263520 -> 263518 (-0.00%)

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12526>
2025-12-02 21:28:05 +00:00
Lionel Landwerlin
a4e9e660d4 brw/iris: remove fs key for coherent_fb_fetch
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38737>
2025-12-02 12:44:35 +00:00
Kenneth Graunke
320f91a5ab intel/elk: Also disable output constant offset src folding
Same fix from brw.

Fixes: 9a56672f56 ("nir: add shader_info::disable_input/output_offset_src_constant_folding")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38758>
2025-12-01 20:10:37 +00:00
Marek Olšák
fa0bea5ff8 nir: remove nir_io_add_const_offset_to_base
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
nir_opt_constant_folding does it now.

Acked-by: Emma Anholt <emma@anholt.net>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38277>
2025-11-29 00:16:38 +00:00
Marek Olšák
9a56672f56 nir: add shader_info::disable_input/output_offset_src_constant_folding
and set it where needed to prevent nir_opt_constant_folding from breaking
those drivers.

Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38277>
2025-11-29 00:16:38 +00:00
Lionel Landwerlin
515d8f8e3a brw: fix sample mask flag emission
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
It's also used for testing helper invocations.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: e3328dfa2f ("brw: only initialize sample mask flag if needed")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38699>
2025-11-27 15:59:35 +00:00
Ian Romanick
0c089a5c32 brw: Eliminate duplicate fills
When the register allocator decides to spill a value, all reads of that
value are filled. This can result in cases where the same value is
filled many times in a single block. In those cases, the result of an
earlier fill may still be available when a later fill occurs.

This optimization replaces the later fill with a move from the result of
the earlier fill.

v2: Use FIXED_GRF for register overlap tests. Since this is after
register allocation, the VGRF values will not tell the whole truth.

v3: Use brw_transform_inst. Suggested by Caio. Add
brw_scratch_inst::offset instead of storing it as a source. Suggested by
Lionel.

v4: In intervening spill to the same location also invalidates the
value. 🤦

v5: Don't eliminate a fill if its destination partially overlaps the
preceeding fill destination. Fixes failures in cooperative matrix CTS.

shader-db:

Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown)
total instructions in shared programs: 17249903 -> 17249653 (<.01%)
instructions in affected programs: 35550 -> 35300 (-0.70%)
helped: 20 / HURT: 0

total cycles in shared programs: 893092398 -> 893101836 (<.01%)
cycles in affected programs: 2501720 -> 2511158 (0.38%)
helped: 6 / HURT: 14

total fills in shared programs: 1901 -> 1776 (-6.58%)
fills in affected programs: 1757 -> 1632 (-7.11%)
helped: 20 / HURT: 0

fossil-db:

Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown)
Totals:
Instrs: 929949528 -> 926770338 (-0.34%)
Cycle count: 105126671329 -> 104851299099 (-0.26%); split: -0.28%, +0.02%
Fill count: 6520785 -> 5021518 (-22.99%)

Totals from 54281 (2.69% of 2018922) affected shaders:
Instrs: 239616289 -> 236437099 (-1.33%)
Cycle count: 22051883404 -> 21776511174 (-1.25%); split: -1.33%, +0.08%
Fill count: 6406295 -> 4907028 (-23.40%)

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37827>
2025-11-26 17:20:13 +00:00
Ian Romanick
d2e3707ecc brw: Eliminate redundant fills and spills
When the register allocator decides to spill a value, all writes to that
value are spilled and all reads are filled. In regions where there is
not high register pressure, a spill of a value may be followed by a fill
of that same file while the spilled register is still live. This
optimization pass finds these cases, and it converts the fill to a move
from the still-live register.

The restriction that the spill and the fill must have matching NoMask
really hampers this optimization. With the restriction removed, the pass
was more than 2x helpful.

v2: Require force_writemask_all to be the same for the spill and the fill.

v3: Use FIXED_GRF for register overlap tests. Since this is after
register allocation, the VGRF values will not tell the whole truth.

v4: Use brw_transform_inst. Suggested by Caio. The allows two of the
loops to be merged. Add brw_scratch_inst::offset instead of storing it
as a source. Suggested by Lionel.

v5: Add no-fill-opt debug option to disable optimizations. Suggested by
Lionel.

v6: Move a calculation outside a loop. Suggested by Lionel.

v7: Check that spill ranges overlap instead of just checking initial
offset. Zero shaders in fossil-db were affected, but some CTS with
spill_fs were fixed (e.g.,
dEQP-VK.subgroups.arithmetic.compute.subgroupmin_uint64_t_requiredsubgroupsize).
Suggested by Lionel.

v8: Add DEBUG_NO_FILL_OPT to debug_bits in
brw_get_compiler_config_value(). Noticed by Lionel.

shader-db:

Lunar Lake
total instructions in shared programs: 17249907 -> 17249903 (<.01%)
instructions in affected programs: 10684 -> 10680 (-0.04%)
helped: 2 / HURT: 0

total cycles in shared programs: 893092630 -> 893092398 (<.01%)
cycles in affected programs: 237320 -> 237088 (-0.10%)
helped: 2 / HURT: 0

total fills in shared programs: 1903 -> 1901 (-0.11%)
fills in affected programs: 110 -> 108 (-1.82%)
helped: 2 / HURT: 0

Meteor Lake and DG2 had similar results. (Meteor Lake shown)
total instructions in shared programs: 19968898 -> 19968778 (<.01%)
instructions in affected programs: 33020 -> 32900 (-0.36%)
helped: 10 / HURT: 0

total cycles in shared programs: 885157211 -> 884925015 (-0.03%)
cycles in affected programs: 39944544 -> 39712348 (-0.58%)
helped: 8 / HURT: 2

total fills in shared programs: 4454 -> 4394 (-1.35%)
fills in affected programs: 2678 -> 2618 (-2.24%)
helped: 10 / HURT: 0

fossil-db:

Lunar Lake
Totals:
Instrs: 930445228 -> 929949528 (-0.05%)
Cycle count: 105195579417 -> 105126671329 (-0.07%); split: -0.07%, +0.00%
Spill count: 3495279 -> 3494400 (-0.03%)
Fill count: 6767063 -> 6520785 (-3.64%)

Totals from 43844 (2.17% of 2018922) affected shaders:
Instrs: 212614840 -> 212119140 (-0.23%)
Cycle count: 19151130510 -> 19082222422 (-0.36%); split: -0.39%, +0.03%
Spill count: 2831100 -> 2830221 (-0.03%)
Fill count: 6128316 -> 5882038 (-4.02%)

Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Instrs: 1001375893 -> 1001113407 (-0.03%)
Cycle count: 92746180943 -> 92679877883 (-0.07%); split: -0.08%, +0.01%
Spill count: 3729157 -> 3728585 (-0.02%)
Fill count: 6697296 -> 6566874 (-1.95%)

Totals from 35062 (1.53% of 2284674) affected shaders:
Instrs: 179819265 -> 179556779 (-0.15%)
Cycle count: 18111194752 -> 18044891692 (-0.37%); split: -0.41%, +0.04%
Spill count: 2453752 -> 2453180 (-0.02%)
Fill count: 5279259 -> 5148837 (-2.47%)

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37827>
2025-11-26 17:20:13 +00:00
Ian Romanick
b7f5285ad3 brw: Add fill and spill opcodes for LSC platforms
These opcodes are emitted during register allocation instead of the
scratch reads and writes that were previously emitted. These
instructions contain additional information (i.e., the instruction
encodes the scratch offset) that enable optimizations to be added
later.

The fill and spill opcodes are lowered to scratch reads and writes
shortly after register allocation. Eventually this lower may have some
optimizations (e.g., reuse previous address calculations for successive
spills).

v2: Add brw_scratch_inst::offset instead of storing it as a
source. Suggested by Lionel.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37827>
2025-11-26 17:20:12 +00:00
Ian Romanick
2215003d95 brw: Add OPT macro to brw_shader.cpp like brw_opt.cpp
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37827>
2025-11-26 17:20:11 +00:00
Ian Romanick
1f42ff530c brw: Return the new register from brw_lower_vgrf_to_fixed_grf
...and make the function public.

v2: s/struct brw_reg/brw_reg/. Suggested by Lionel.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37827>
2025-11-26 17:20:11 +00:00
Ian Romanick
243a3a4ca7 brw: Don't pass compressed to brw_lower_vgrf_to_fixed_grf
The parameter is never used. It's recalculated in the function.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37827>
2025-11-26 17:20:10 +00:00
Ian Romanick
1fc2f52d36 brw: Force allow_spilling when spill_all is set
This ensures that g0 is reserved for spilling since there is going to be
spilling.

Fixes: 8bca7e520c ("intel/brw: Only force g0's liveness to be the whole program if spilling")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37827>
2025-11-26 17:20:09 +00:00
Ian Romanick
042417a72e brw: Don't spill_all on internal shaders
Basically all of the internal shaders (e.g., from blorp) will fail
assertions if there is any scratch space used.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37827>
2025-11-26 17:20:09 +00:00
Alyssa Rosenzweig
e3328dfa2f brw: only initialize sample mask flag if needed
This is a refinement of 7c129d9365 ("intel/brw/xe2+: Keep PS sample mask in the
f1.0 register whether or not kill is used."). Rather than always insert this
move, do so only when we'll actually read the register: for memory writes and
for discards. This deletes an instruction from piles of fragment shaders.

shader-db on LNL:

total instructions in shared programs: 17134031 -> 17042706 (-0.53%)
instructions in affected programs: 9065743 -> 8974418 (-1.01%)
helped: 65045
HURT: 0
helped stats (abs) min: 1.0 max: 3.0 x̄: 1.40 x̃: 1
helped stats (rel) min: <.01% max: 50.00% x̄: 3.06% x̃: 1.64%
95% mean confidence interval for instructions value: -1.41 -1.40
95% mean confidence interval for instructions %-change: -3.10% -3.03%
Instructions are helped.

total cycles in shared programs: 885172098 -> 884835306 (-0.04%)
cycles in affected programs: 590294230 -> 589957438 (-0.06%)
helped: 53636
HURT: 4500
helped stats (abs) min: 2.0 max: 1126.0 x̄: 8.02 x̃: 4
helped stats (rel) min: <.01% max: 50.00% x̄: 1.24% x̃: 0.24%
HURT stats (abs)   min: 2.0 max: 7706.0 x̄: 20.77 x̃: 6
HURT stats (rel)   min: <.01% max: 82.06% x̄: 1.09% x̃: 0.54%
95% mean confidence interval for cycles value: -6.15 -5.43
95% mean confidence interval for cycles %-change: -1.10% -1.02%
Cycles are helped.

LOST:   385
GAINED: 47

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38665>
2025-11-26 16:53:36 +00:00
Kenneth Graunke
3182deaae1 brw: Combine output stores for TCS outputs even when unlinked
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Otherwise we get a lot of individual x/y/z stores to tesslevels when
we should really just be storing the whole thing at once.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38482>
2025-11-25 22:44:03 +00:00