Commit graph

5040 commits

Author SHA1 Message Date
Alyssa Rosenzweig
2ed6ff728a brw: explicitly pad tgl_swsb
This lets us treat it as a packed data structure without worrying about garbage.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40829>
2026-04-07 19:32:15 +00:00
Sagar Ghuge
f0ae58df12 intel/compiler: Handle TerminateOnFirstHit in ray query execution
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Once commited and have AABB or triangle intersection found, terminate
the traversal if TerminateOnFirstHit ray flag is present.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40773>
2026-04-06 10:00:05 -07:00
Arkady Shlykov
7f7ba20cca brw: Implement divergent atomics fusion optimization (single message approach)
For an atomic with a divergent addr generates a CFG grouping the same addrs
values together and emits a single atomic with fused data covering
the subgroup. Lanes with other addr values perform a default atomic.

Co-authored-by: Jhanani Thiagarajan <jhanani.thiagarajan@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40631>
2026-04-03 12:17:01 +00:00
Lionel Landwerlin
fab6f84126 brw: make the program key available on pass_tracker
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40631>
2026-04-03 12:17:01 +00:00
Caio Oliveira
0bf3aaedb1 brw: Always use split send in generator
Instead of generating special single source send in some cases, always
use the split send (called SENDS pre-Xe, and the only option in Xe).
Having code-path for single source was relevant for old Gfx versions,
but for Gfx9+ split send is always available.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40755>
2026-04-02 18:31:02 +00:00
Kenneth Graunke
ca3cabd2f8 brw: Use nir_texop_resinfo_intel for query_levels and txs
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
This eliminates the need to special case query_levels.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40451>
2026-03-29 12:53:10 +00:00
Lionel Landwerlin
fa523aedd0 brw: fence SLM writes between workgroups
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
On LSC platforms the SLM writes are unfenced between workgroups. This
means a workgroup W1 finishing might have uncompleted SLM writes.
Another workgroup W2 dispatched after W1 which gets allocated an
overlapping SLM location might have writes that race with the previous
W1 operations.

The solution to this is fence all write operations (store & atomics)
of a workgroup before ending the threads. We do this by emitting a
single SLM fence either at the end of the shader or if there is only a
single unfenced right, at the end of that block.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13924
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40430>
2026-03-26 22:38:55 +00:00
Georg Lehmann
eef0fa22e0 brw: preserve fp_math_ctrl when lowering cmat alu
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40630>
2026-03-26 13:15:50 +00:00
Kenneth Graunke
204af7e09f intel/nir: Replace tg4 with txl/txb/tex when splitting texture residency
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
textureGather() returns the four taps that would have been filtered
together to produce the value that ordinary texturing operations
would return.  As such, it should access the same data, so we can use
either interchangeably when we're only checking for residency and not
returning the actual data.

This allows us to mask out some unneeded registers.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40590>
2026-03-24 16:06:29 +00:00
Kenneth Graunke
605ef577b3 intel/nir: Generalize lower_tex_compare to split_tex_residency
This splits a single texture-with-residency operation into two halves,
one which returns texture data, and another which queries residency.

We're currently using this only for a shadow sampling workaround,
but the technique is more broadly applicable, if we ever wanted.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40590>
2026-03-24 16:06:29 +00:00
Kenneth Graunke
dc760104ba intel/nir: Set new image intrinsic parameters via builder helpers
A bit less code.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40590>
2026-03-24 16:06:28 +00:00
Kenneth Graunke
9d07e85287 intel/nir: Use txf builder in intel_nir_lower_sparse
Newer helpers make NIR easier to write.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40590>
2026-03-24 16:06:28 +00:00
Tapani Pälli
c75256b2ab intel/compiler: move validation assert after brw_shader_debug_log
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
When validation fails we print instructions to use INTEL_DEBUG=shaders
but that will not help if we assert before dumping shader debug log.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40529>
2026-03-24 04:54:31 +00:00
Ian Romanick
b5e023777c brw: Change the flags written by some CMP
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
One frustrating thing about the CMP and CMPN instructions is that they
always write the flags. Sometimes, however, it is desirable to generate
the comparison result without modifying the flags. This would,
theoretically, reduce false dependencies that restrict the scheduler's
ability to rearrange code, create more opportunities for cmod
propagation, save a kitten from a tree, and make a rainbow.

Consider this sequence:

           cmp.ge.f0.0(8)  g103<1>F        g101<8,8,1>F    g39<8,8,1>F
           cmp.nz.f0.0(8)  null<1>D        g81<8,8,1>D     0D
   (+f0.0) if(8)   JIP:  LABEL19         UIP:  LABEL19

It would be advantageous to put the first CMP between the second CMP and
the IF, but this cannot be done since the IF depends on the flags generated
by the second CMP.

This pass enables this rescheduling by changing the first CMP to write
to a different flags register.

           cmp.ge.f1.0(8)  g103<1>F        g101<8,8,1>F    g39<8,8,1>F
           cmp.nz.f0.0(8)  null<1>D        g81<8,8,1>D     0D
   (+f0.0) if(8)   JIP:  LABEL19         UIP:  LABEL19

Sometimes this is also possible by using a different instruction.  For
example, consider

           cmp.l.f0.0(8)   g103<1>D        g101<8,8,1>D    0D

This produces 0xffffffff when g101 negative and zero otherwise. This
instruction, which does not modifiy the flag, also produces these results:

           asr(8)          g103<1>D        g101<8,8,1>D    31D

Gfx9 platforms take a hit on instructions due to the instruction added
at the end of short shaders by brw_workaround_source_arf_before_eot.

shader-db:

Lunar Lake, Meteor Lake, DG2, Tiger Lake, and Ice Lake had similar results. (Lunar Lake shown)
total instructions in shared programs: 17089451 -> 17088766 (<.01%)
instructions in affected programs: 766613 -> 765928 (-0.09%)
helped: 653 / HURT: 0

total cycles in shared programs: 888832986 -> 887873068 (-0.11%)
cycles in affected programs: 549441852 -> 548481934 (-0.17%)
helped: 10474 / HURT: 130

LOST:   9
GAINED: 0

Skylake
total instructions in shared programs: 19037976 -> 19049719 (0.06%)
instructions in affected programs: 3979914 -> 3991657 (0.30%)
helped: 503 / HURT: 12303

total cycles in shared programs: 867918242 -> 866930801 (-0.11%)
cycles in affected programs: 512773919 -> 511786478 (-0.19%)
helped: 13858 / HURT: 66

LOST:   32
GAINED: 0

fossil-db:

Lunar Lake
Totals:
Instrs: 925023504 -> 924950382 (-0.01%); split: -0.01%, +0.00%
Cycle count: 106348432916 -> 106116809009 (-0.22%); split: -0.22%, +0.00%
Spill count: 3423988 -> 3423930 (-0.00%); split: -0.00%, +0.00%
Fill count: 4877087 -> 4876960 (-0.00%); split: -0.01%, +0.00%
Max dispatch width: 49087552 -> 49078448 (-0.02%); split: +0.00%, -0.02%

Totals from 1099332 (54.44% of 2019443) affected shaders:
Instrs: 742670473 -> 742597351 (-0.01%); split: -0.01%, +0.00%
Cycle count: 100455549635 -> 100223925728 (-0.23%); split: -0.23%, +0.00%
Spill count: 3384366 -> 3384308 (-0.00%); split: -0.00%, +0.00%
Fill count: 4837434 -> 4837307 (-0.00%); split: -0.01%, +0.00%
Max dispatch width: 26725152 -> 26716048 (-0.03%); split: +0.00%, -0.03%

Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Instrs: 997603774 -> 997529238 (-0.01%); split: -0.01%, +0.00%
Cycle count: 93904012762 -> 93646730006 (-0.27%); split: -0.28%, +0.00%
Spill count: 3710155 -> 3710125 (-0.00%); split: -0.00%, +0.00%
Fill count: 5032908 -> 5032819 (-0.00%); split: -0.01%, +0.00%
Max dispatch width: 37929640 -> 37811560 (-0.31%)

Totals from 1334920 (58.52% of 2281134) affected shaders:
Instrs: 817377787 -> 817303251 (-0.01%); split: -0.01%, +0.00%
Cycle count: 88468851658 -> 88211568902 (-0.29%); split: -0.29%, +0.00%
Spill count: 3663353 -> 3663323 (-0.00%); split: -0.00%, +0.00%
Fill count: 4991629 -> 4991540 (-0.00%); split: -0.01%, +0.00%
Max dispatch width: 20245832 -> 20127752 (-0.58%)

Tiger Lake and Ice Lake had similar results. (Tiger Lake shown)
Totals:
Instrs: 1013433769 -> 1013363273 (-0.01%); split: -0.01%, +0.00%
Cycle count: 85766921182 -> 85509316620 (-0.30%); split: -0.31%, +0.00%
Spill count: 3903923 -> 3903944 (+0.00%); split: -0.00%, +0.00%
Fill count: 6801983 -> 6801948 (-0.00%); split: -0.00%, +0.00%
Max dispatch width: 37896320 -> 37805320 (-0.24%); split: +0.00%, -0.24%

Totals from 1333814 (58.54% of 2278396) affected shaders:
Instrs: 830200531 -> 830130035 (-0.01%); split: -0.01%, +0.00%
Cycle count: 80746184101 -> 80488579539 (-0.32%); split: -0.32%, +0.01%
Spill count: 3855771 -> 3855792 (+0.00%); split: -0.00%, +0.00%
Fill count: 6755513 -> 6755478 (-0.00%); split: -0.00%, +0.00%
Max dispatch width: 20301456 -> 20210456 (-0.45%); split: +0.00%, -0.45%

Skylake
Totals:
Instrs: 519389758 -> 519874108 (+0.09%); split: -0.00%, +0.10%
Cycle count: 57932316132 -> 57789433956 (-0.25%); split: -0.25%, +0.00%
Spill count: 636741 -> 636715 (-0.00%); split: -0.01%, +0.00%
Fill count: 860470 -> 860357 (-0.01%); split: -0.02%, +0.00%
Max dispatch width: 32527800 -> 32481792 (-0.14%); split: +0.00%, -0.14%

Totals from 1080380 (62.25% of 1735462) affected shaders:
Instrs: 411976399 -> 412460749 (+0.12%); split: -0.00%, +0.12%
Cycle count: 54291447615 -> 54148565439 (-0.26%); split: -0.27%, +0.00%
Spill count: 602993 -> 602967 (-0.00%); split: -0.01%, +0.00%
Fill count: 734459 -> 734346 (-0.02%); split: -0.02%, +0.00%
Max dispatch width: 18626096 -> 18580088 (-0.25%); split: +0.00%, -0.25%

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38978>
2026-03-24 01:31:26 +00:00
Ian Romanick
31de96d321 brw/lower_regioning: Allow integer conversions in SEL
The Bspec says that SEL sources and destination can be any mix of *B,
*W, and *D. We should allow those. Specifically, without this change,
this instruction

    sel.sat.l(8) v548:UD, v899:D, 255d

gets unnecessarily split into two instructions.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38978>
2026-03-24 01:31:26 +00:00
Ian Romanick
dff1e8ae28 brw: Handle scalars and swizzles correctly in is_const_zero
v2: Massive simplification based on feedback from Ken.

Fixes: 96cde9cc01 ("intel/fs: Emit better code for bfi(..., 0)")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38978>
2026-03-24 01:31:25 +00:00
Ian Romanick
985ace332b brw/algebraic: Allow mixed types in saturate constant folding
Prevents assertion failures in func.shader-ballot.basic.q0 and other
tests starting with "nir/algebraic: Optimize some b2f of integer
comparison".

Vector immediates, bfloat, and 8-bit floats are still not supported.

v2: Almost complete re-write based on suggestions from Ken.

v3: Don't retype() on a brw_imm_f value.

Fixes: f8e54d02f7 ("intel/compiler: Relax mixed type restriction for saturating immediates")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38978>
2026-03-24 01:31:25 +00:00
Marek Olšák
fa5175023b Final rename of sha1 names to blake3
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40383>
2026-03-23 07:03:28 +00:00
Marek Olšák
ae9ea27e0d Rename *_sha1 names to *_blake3
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40383>
2026-03-23 07:03:28 +00:00
Marek Olšák
102d41799b Rename more sha and sha1 names to blake3
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40383>
2026-03-23 07:03:28 +00:00
Marek Olšák
d4831aaf5f Rename sha1_* and sha_* names to blake3_*
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40383>
2026-03-23 07:03:28 +00:00
Marek Olšák
c0ac992a2a Remove mesa-sha1.h
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40383>
2026-03-23 07:03:27 +00:00
Marek Olšák
53c64973e8 Inline _mesa_sha1_compute/format, remove the other unused ones
_mesa_sha1_format has a few remaining uses, so it's moved to build_id.c,
which is its last user.

Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40383>
2026-03-23 07:03:27 +00:00
Marek Olšák
699f9d7066 Inline _mesa_sha1_init/update/final functions
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40383>
2026-03-23 07:03:27 +00:00
Marek Olšák
a965ada6ee Inline mesa_sha1, SHA1_CTX
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40383>
2026-03-23 07:03:27 +00:00
Marek Olšák
0da88d237a Inline SHA1_DIGEST_STRING_LENGTH
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40383>
2026-03-23 07:03:27 +00:00
Marek Olšák
110632f702 Inline SHA1_DIGEST_LENGTH
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40383>
2026-03-23 07:03:27 +00:00
Georg Lehmann
ec331cc48a nir: replace lower_ldexp with has_ldexp
I can be bothered to fix all the backends that don't set lower_ldexp,
and only two backends have ldexp anyway.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33900>
2026-03-20 08:15:08 +00:00
Iván Briano
fd556e54f6 brw: do not omit RT writes if dual_src_blend is on
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Dual source blending when one of the sources is not written to leaves
those values undefined, but the other should still be valid.
By omitting unwritten outputs, we ended up not writing anything at all
for the case that OUT1 is written to but OUT0 is undefined.

Fixes new CTS tests: dEQP-VK.pipeline.*.blend.dual_source.undefined_output.first*

Cc: mesa-stable
Signed-off-by: Iván Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40357>
2026-03-19 23:38:40 +00:00
Caio Oliveira
dcba49d7ef intel/compiler: Handle shuffle_*_intel intrinsics in bit size lowering
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40376>
2026-03-17 17:21:52 +00:00
Kenneth Graunke
9f77991751 brw: Simplify mark_last_urb_write_with_eot()
Just tag the last instruction, drop useless dead code elimination.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40328>
2026-03-12 21:40:37 +00:00
Kenneth Graunke
4bfa7a602c brw: Don't emit HALT_TARGET for VS/TCS/TES/GS
This isn't needed and will allow simplifications in the next patch.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40328>
2026-03-12 21:40:37 +00:00
Kenneth Graunke
2b6c6f8130 brw: Lower TCS single patch invocation ID calculations in NIR
This is a bit less code and also drops one more TCS-specific thing
from the "run" function.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40328>
2026-03-12 21:40:37 +00:00
Kenneth Graunke
66fbfe7bf3 brw: Fix single patch thread dispatch masks in NIR
Arguably a little more code but it brings us a bit closer to not
needing separate per-stage "run" functions.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40328>
2026-03-12 21:40:37 +00:00
Kenneth Graunke
4a9aa3ecc4 brw: Combine brw_assign_*_urb_setup() into one function
They all do exactly the same thing, except that GS multiplies by an
extra factor, and TCS has urb_read_length == 0 so it skips one line.

No need for four copies.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40328>
2026-03-12 21:40:37 +00:00
Kenneth Graunke
7d463a45f7 brw: Simplify GS load_invocation_id handling
Just return the register instead of having multiple functions stash the
register in an array of registers.  Way too much hoopla here.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40328>
2026-03-12 21:40:37 +00:00
Kenneth Graunke
9933882182 brw: Purge source_depth_to_render_target
This was used for Gfx4-5.  Since then, we're just passing around a
boolean that nobody wants.  Even if someone did, a better plan is to
just check nir->info directly.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40328>
2026-03-12 21:40:37 +00:00
Sagar Ghuge
cb423ee636 anv: Fix Wa_14021821874, Wa_14018813551, Wa_14026600921
WA states that we need to allocate maximum number of stackIDs per DSS
from RT_DISPATCH_GLOBALS to 2048.

We can still throttle/control the CFE_STATE::StackID to be in range
specified by the field.

This does impact performance having CFE_STATE::stackIDs capped to 2K
by default. More the outstanding ray queries, larger the working set and
have more impact on cache hit rate.

This affect performance on Xe2+ onwards:
* Boundary Benchmark:            36.2%
* Solar Bay extreme:             9.8%
* Hitman world of assassination: 3.9%

Fixes: c1a44e8d43 ("anv: force StackIDControl value for Wa_14021821874")
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40310>
2026-03-10 22:41:54 +00:00
Lionel Landwerlin
f508c6acbb brw/nir: improve shader_indirect_data_intel handling
Use is_scalar to know if we can do transpose loading.

Also enable vectorization if 2 intrinsics share the same source (it
means the only difference is the base).

Fixes: e14d6b535c ("brw/nir: add new intrinsics to load data from the indirect address")
Tested-by: Felix DeGrood <felix.j.degrood@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40308>
2026-03-10 18:24:04 +00:00
Paulo Zanoni
85751506ab elk: don't use instr->const_index[] directly
From what I understand, use of const_index[] by the driver is
dangerous and should be avoided, as commits such as a6330ed4d0
("nir: add ACCESS to load_uniforms") may result in the indexes
changing, breaking the driver. Switch to using the parameter names in
order to make the code more future-proof.

For elk_fs_nir.cpp and elk_vec4_tes.cpp we can verify in the generated
nir_intrinsics.c that the wanted value is actually
nir_intrinsic_base().

For elk_nir.c, according to Caio Oliveira:

  "The code is checking for certain load/store via the is_input() and
   is_output() checks a few lines above. I've checked all them have
   BASE at 0."

Thanks to Ian Romanick for his guidance regarding this patch.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39438>
2026-03-10 01:03:42 +00:00
Ian Romanick
ffd4497e48 brw/asm: Don't drop accumulator number in the assembler
Previously "acc1" or "acc2" would be stored as acc0.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40226>
2026-03-09 19:21:39 +00:00
Ian Romanick
1ae7a82811 brw: Fix encoding of accumulator sources of 3-source instructions
Previously the accumulator was always forced to be acc0.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40226>
2026-03-09 19:21:39 +00:00
Ian Romanick
6531c425a0 brw/emit: Src1 can be accumulator on Gfx12.5 and newer
v2: Add Bspec reference number. Suggested by Caio.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40226>
2026-03-09 19:21:39 +00:00
Ian Romanick
c3a5b62c08 brw/validate: Perform more 3-src validation in brw_validate instead of brw_eu_emit
v2: s/Lake/Ice Lake/ in a comment. Noticed by Caio. Add a missing Xe2
Bspec reference number. Suggested by Caio.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40226>
2026-03-09 19:21:39 +00:00
Ian Romanick
1f45e33072 brw/validate: Implicit read of accumulator cannot also have explicit read
v2: Add Bspec reference number. Suggested by Caio.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40226>
2026-03-09 19:21:38 +00:00
Ian Romanick
8a6de2d973 brw/validate: Eliminate duplicate integer multiply validation
I think two MRs must have crossed in the mail so to speak. Keep Caio's
formatting and error message, and keep my PRM quote.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40226>
2026-03-09 19:21:38 +00:00
Ian Romanick
64c60582b5 elk/algebraic: Don't optimize SEL.L.SAT or SEL.G.SAT
shader-db:

Broadwell
total instructions in shared programs: 18607516 -> 18607530 (<.01%)
instructions in affected programs: 2095 -> 2109 (0.67%)
helped: 0 / HURT: 8

total cycles in shared programs: 955704436 -> 955702925 (<.01%)
cycles in affected programs: 34299 -> 32788 (-4.41%)
helped: 2 / HURT: 6

All Haswell and older platforms had similar results. (Haswell shown)
total instructions in shared programs: 16989200 -> 16989201 (<.01%)
instructions in affected programs: 461 -> 462 (0.22%)
helped: 0 / HURT: 1

total cycles in shared programs: 946537070 -> 946537035 (<.01%)
cycles in affected programs: 16378 -> 16343 (-0.21%)
helped: 1 / HURT: 0

Test: piglit!1100
Reported-by: Georg Lehmann
Fixes: ca675b73d3 ("i965/fs: Optimize saturating SEL.L(E) with imm val >= 1.0.")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40284>
2026-03-09 18:41:55 +00:00
Ian Romanick
6c6c6ce054 brw/algebraic: Don't optimize SEL.L.SAT or SEL.G.SAT
This optimization was added in October 2013, and the error was only just
now discovered. Removing the SEL.G.SAT optimization affected zero
shader-db shaders, and it affected 9 fossil-db shaders for instruction
size only.

I haven't checked to see if any of the hurt shaders are helped by
!39987.

shader-db:

All Intel platforms had similar results. (Lunar Lake shown)
total instructions in shared programs: 17093041 -> 17093055 (<.01%)
instructions in affected programs: 2072 -> 2086 (0.68%)
helped: 0 / HURT: 8

total cycles in shared programs: 876739578 -> 876739154 (<.01%)
cycles in affected programs: 18946 -> 18522 (-2.24%)
helped: 2 / HURT: 6

fossil-db:

Lunar Lake
Totals:
Instrs: 906230557 -> 906240487 (+0.00%); split: -0.00%, +0.00%
CodeSize: 14498856128 -> 14499003168 (+0.00%); split: -0.00%, +0.00%
Send messages: 40667184 -> 40667205 (+0.00%); split: -0.00%, +0.00%
Cycle count: 104068494103 -> 104068561943 (+0.00%); split: -0.00%, +0.00%
Max live registers: 189570192 -> 189570204 (+0.00%); split: -0.00%, +0.00%
Max dispatch width: 48157648 -> 48157552 (-0.00%)
Non SSA regs after NIR: 139823587 -> 139823016 (-0.00%); split: -0.00%, +0.00%

Totals from 9172 (0.46% of 1985212) affected shaders:
Instrs: 10774709 -> 10784639 (+0.09%); split: -0.00%, +0.09%
CodeSize: 177868384 -> 178015424 (+0.08%); split: -0.08%, +0.17%
Send messages: 311154 -> 311175 (+0.01%); split: -0.00%, +0.01%
Cycle count: 232471392 -> 232539232 (+0.03%); split: -0.15%, +0.18%
Max live registers: 1243549 -> 1243561 (+0.00%); split: -0.00%, +0.01%
Max dispatch width: 196672 -> 196576 (-0.05%)
Non SSA regs after NIR: 509663 -> 509092 (-0.11%); split: -0.19%, +0.08%

Test: piglit!1100
Reported-by: Georg Lehmann
Fixes: ca675b73d3 ("i965/fs: Optimize saturating SEL.L(E) with imm val >= 1.0.")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40284>
2026-03-09 18:41:55 +00:00
Lionel Landwerlin
9f2215b480 anv/brw: remove push constant load emulation from the backend compiler
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Anv is responsible for much of how the data is accessed (where is the
push constant base pointer located, etc...), so move the memory load
there.

Fossildb on LNL :
Totals from 135931 (8.65% of 1572134) affected shaders:
Instrs: 68518228 -> 67142101 (-2.01%); split: -2.05%, +0.05%
CodeSize: 1123507040 -> 1092022560 (-2.80%); split: -2.88%, +0.08%
Subgroup size: 32 -> 16 (-50.00%)
Send messages: 4401584 -> 4402565 (+0.02%); split: -0.02%, +0.04%
Cycle count: 4626573038 -> 4619434858 (-0.15%); split: -0.89%, +0.74%
Spill count: 451759 -> 452407 (+0.14%); split: -0.43%, +0.57%
Fill count: 374513 -> 377440 (+0.78%); split: -0.76%, +1.54%
Max live registers: 15788042 -> 15791399 (+0.02%); split: -0.05%, +0.08%
Max dispatch width: 3349408 -> 3346192 (-0.10%); split: +0.09%, -0.19%
Non SSA regs after NIR: 9477038 -> 9498328 (+0.22%); split: -0.27%, +0.50%

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40174>
2026-03-06 06:34:43 +00:00
Lionel Landwerlin
e14d6b535c brw/nir: add new intrinsics to load data from the indirect address
This address is delivered on Gfx12.5+ in compute/mesh/task shaders
from the command stream instruction.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40174>
2026-03-06 06:34:43 +00:00