Commit graph

87 commits

Author SHA1 Message Date
Kenneth Graunke
afb97ff2af brw: Switch FS outputs to semantic IO and FRAG_RESULT_DUAL_SRC_BLEND
The new FRAG_RESULT_DUAL_SRC_BLEND option is easier to work with than
looking for FRAG_RESULT_DATA0 with an index of 1.  This also means we
no longer care about the dual source blend index, and can just use the
FRAG_RESULT location.  That cascades to meaning we no longer have to
store a tuple in driver_location.  And, if we just need location, we
can avoid populating that at all and use nir_io_semantics to get it.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41122>
2026-05-07 08:29:40 +00:00
Kenneth Graunke
fbaa5ad0c3 iris: Implement force_dual_color_blend_by_location via NIR
We can just have iris look at its own program key and change the
fragment shader output variable's location/index in the NIR.  By
doing this before lowering fragment shader outputs, the rest of
the output lowering does the right thing, and the backend no longer
has to consider hacks for broken OpenGL apps.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41122>
2026-05-07 08:29:40 +00:00
Lionel Landwerlin
c30a4d4fdb anv/brw/nir: fix wa_18019110168
Several things were wrong :
  - incorrect offset in the FS push constant data
  - incorrect encoding of the 32bit values with 2 fields (remap table offset & provoking vertex)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31384>
2026-05-06 09:49:41 +00:00
Calder Young
3ac6233655 brw: Avoid rounding every convergent block load up to a full register
To simplify things, our backend rounds convergent block loads up to a full
register. This causes page faults with the scratch page disabled since the
address is not always aligned to a register size. Loading smaller blocks is
slightly more difficult because the SEND instruction can only write back a
multiple of full registers, even if the actual data is smaller.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40149>
2026-05-01 19:51:41 +00:00
Caio Oliveira
1ebc14bcb9 brw: Stop tracking inline parameter usage in prog_key/prog_data
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Since inline parameter is the last field of the thread payload, the
backend can always assume they may exist.  They won't affect the
position of other payload fields and the register allocator will
reuse any unused space.

In Anv, also update EmitInlineParameter for Task/Mesh/CS to reflect
previous changes in inline parameter setup.  Remove/Update some stale
comments since we are here.

Finally, remove the prog_key/prog_data bits that tracked whether inline
data or a push address was needed.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41230>
2026-04-30 16:39:22 +00:00
Sagar Ghuge
620835926d brw: Pass write back register for ray query messages
For DG2 (Bspec 47937) has the same programming note as of Xe2+,

   "When this bit is set in the header, Trace Ray Message behaves like a
   Ray Query. This message requires a write-back message indicating
   RayQuery for all valid Rays (SIMD lanes) have completed."

So this patch is just passing a write back destination register when we
have ray query message.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41039>
2026-04-21 23:16:09 +00:00
José Roberto de Souza
64bc538f5e intel/brw: Explicitly upcast UB to UW for SHR with vector immediates
HW does not allow instructions with vector immediates to cross a GRF boundary if
it has a stride.

Under register pressure, the register allocator may place a temporary register
across such a boundary.

To resolve this, we now explicitly emit a MOV to upcast the UB payload into a
UW VGRF.
This ensures the SHR instruction operates on a dense, well-aligned region that
satisfies hardware alignment constraints.

Below is the portion of the shader exhibiting this issue:

Native code for unnamed fragment shader GLSL6 (src_hash 0x9c84a007) (sha1 48745e7dae90d08f8a9bbe4dbf837de23440c841f0344e669cb8af9df79bce58)
SIMD32 shader: 44 instructions. 0 loops. 354 cycles. 0:0 spills:fills, 2 sends, scheduled with mode latency-sensitive. Promoted 0 constants. GRF registers: 22. Non-SSA regs (after NIR): 11. Compacted 800 to 800 bytes (0%)
mov(1)          f1<1>UW         g0.30<0,1,0>UW                  { align1 WE_all 1N };
mov(1)          f1.1<1>UW       g1.30<0,1,0>UW                  { align1 WE_all 1N I@1 };
mov(32)         g2<2>UW         g0.20<2,8,0>UW                  { align1 WE_all };
mov(32)         g4<2>UW         g0.21<2,8,0>UW                  { align1 WE_all };
mov(32)         g8<2>UW         g1.20<2,8,0>UW                  { align1 WE_all };
mov(32)         g10<2>UW        g1.21<2,8,0>UW                  { align1 WE_all };
mov(16)         g12<4>UB        g0.60<1,8,0>UB                  { align1 1H };
mov(16)         g13<4>UB        g1.60<1,8,0>UB                  { align1 2H };
add(32)         g0<1>UW         g2<16,8,2>UW    0x01000100V     { align1 WE_all I@6 };
add(32)         g1<1>UW         g4<16,8,2>UW    0x01010000V     { align1 WE_all I@6 };
add(32)         g2<1>UW         g8<16,8,2>UW    0x01000100V     { align1 WE_all I@6 };
add(32)         g3<1>UW         g10<16,8,2>UW   0x01010000V     { align1 WE_all I@6 };
shr(16)         g4<1>UW         g12<32,8,4>UB   0x76543210V     { align1 1H I@6 };
mov(16)         g14.32<4>UB     g13<32,8,4>UB                   { align1 2H I@6 };
sync nop(1)                     null<0,1,0>UB                   { align1 WE_all 1N I@6 };
mov(16)         g5<1>UW         g0<16,8,2>UW                    { align1 1H };
sync nop(1)                     null<0,1,0>UB                   { align1 WE_all 1N I@6 };
mov(16)         g0<1>UW         g1<16,8,2>UW                    { align1 1H };
sync nop(1)                     null<0,1,0>UB                   { align1 WE_all 5N I@6 };
mov(16)         g5.16<1>UW      g2<16,8,2>UW                    { align1 2H };
sync nop(1)                     null<0,1,0>UB                   { align1 WE_all 5N I@6 };
mov(16)         g0.16<1>UW      g3<16,8,2>UW                    { align1 2H };
shr(16)         g4.16<1>UW      g14.32<32,8,4>UB 0x76543210V    { align1 2H I@5 };
    ERROR: Invalid register region for source 0.  See special restrictions section.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40856>
2026-04-21 22:51:45 +00:00
Lionel Landwerlin
a84c12414c brw: don't support frontfacing ternary optimization on != 32bit
Fix shader compilation on Crimson Desert :

  16    %1995 = b32csel %1992, %1993, %1994

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40931>
2026-04-13 20:32:06 +00:00
Kenneth Graunke
0b99c88337 nir, brw: lower scratch in NIR
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
This will let us share a common scratch swizzling between brw and jay.

Changes by Ken:
- Use an immediate SIMD width when known so we don't need to re-lower
- Switch to load_simd_width_intel because it may not match
  info->api_subgroup_size on Vulkan without VK_EXT_subgroup_size_control
- Stop using DWord Scattered Write messages for scratch.  These take an
  offset in DWords, and our offsets are now always in bytes.  This also
  means that we no longer create MEMORY_OPCODE_* IR with inconsistent
  units of either bytes or dwords.  Yikes.  We use byte scattered
  messages now.

fossil-db stats on Battlemage:

   Instrs: 500477504 -> 500450056 (-0.01%); split: -0.01%, +0.00%
   CodeSize: 7807432368 -> 7806786192 (-0.01%); split: -0.01%, +0.00%
   Cycle count: 62404008370 -> 62398437734 (-0.01%); split: -0.01%, +0.00%
   Fill count: 546690 -> 546695 (+0.00%); split: -0.00%, +0.00%
   Max live registers: 141257956 -> 141258100 (+0.00%); split: -0.00%, +0.00%
   Non SSA regs after NIR: 72350283 -> 72336544 (-0.02%)

   Totals from 99 (0.01% of 1581969) affected shaders:
   Instrs: 366593 -> 339145 (-7.49%); split: -7.58%, +0.09%
   CodeSize: 6425936 -> 5779760 (-10.06%); split: -10.06%, +0.00%
   Cycle count: 2412009876 -> 2406439240 (-0.23%); split: -0.26%, +0.03%
   Fill count: 19675 -> 19680 (+0.03%); split: -0.02%, +0.04%
   Max live registers: 17600 -> 17744 (+0.82%); split: -0.09%, +0.91%
   Non SSA regs after NIR: 37894 -> 24155 (-36.26%)

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40843>
2026-04-09 21:02:16 +00:00
Kenneth Graunke
d7d2d7aceb brw: Support load_simd_width_intel for fragment shaders
This lets us emit NIR code based on the SIMD size.  For non-fragment
stages, we'll replace it with a constant and optimize, but for FS,
we delay it until the backend.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40843>
2026-04-09 21:02:16 +00:00
Kenneth Graunke
ca3cabd2f8 brw: Use nir_texop_resinfo_intel for query_levels and txs
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
This eliminates the need to special case query_levels.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40451>
2026-03-29 12:53:10 +00:00
Ian Romanick
dff1e8ae28 brw: Handle scalars and swizzles correctly in is_const_zero
v2: Massive simplification based on feedback from Ken.

Fixes: 96cde9cc01 ("intel/fs: Emit better code for bfi(..., 0)")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38978>
2026-03-24 01:31:25 +00:00
Kenneth Graunke
4bfa7a602c brw: Don't emit HALT_TARGET for VS/TCS/TES/GS
This isn't needed and will allow simplifications in the next patch.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40328>
2026-03-12 21:40:37 +00:00
Kenneth Graunke
2b6c6f8130 brw: Lower TCS single patch invocation ID calculations in NIR
This is a bit less code and also drops one more TCS-specific thing
from the "run" function.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40328>
2026-03-12 21:40:37 +00:00
Kenneth Graunke
7d463a45f7 brw: Simplify GS load_invocation_id handling
Just return the register instead of having multiple functions stash the
register in an array of registers.  Way too much hoopla here.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40328>
2026-03-12 21:40:37 +00:00
Lionel Landwerlin
f508c6acbb brw/nir: improve shader_indirect_data_intel handling
Use is_scalar to know if we can do transpose loading.

Also enable vectorization if 2 intrinsics share the same source (it
means the only difference is the base).

Fixes: e14d6b535c ("brw/nir: add new intrinsics to load data from the indirect address")
Tested-by: Felix DeGrood <felix.j.degrood@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40308>
2026-03-10 18:24:04 +00:00
Lionel Landwerlin
e14d6b535c brw/nir: add new intrinsics to load data from the indirect address
This address is delivered on Gfx12.5+ in compute/mesh/task shaders
from the command stream instruction.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40174>
2026-03-06 06:34:43 +00:00
Lionel Landwerlin
d7c64af78e brw: use scalar build for immediate offsets
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40174>
2026-03-06 06:34:43 +00:00
Caio Oliveira
df4042371f anv: Set PIPELINE_SELECT systolic mode based on shader usage
For Gfx125 workloads that use systolic mode, this might mean
an extra PIPELINE_SELECT when flipping between a compute shader
that use the mode and another that doesn't use the mode
(or vice-versa).

Reviewed-by: Iván Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40014>
2026-02-26 19:05:56 +00:00
Lionel Landwerlin
7f19814414 brw/nir: handle inline_data_intel more like push_data_intel
It's pretty much the same mechanism, except it's a different register
location.

With this change we gain indirect loading support.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39405>
2026-02-25 10:44:09 +00:00
José Roberto de Souza
39ec9e3448 intel/brw: Add and call brw_lsc_supports_base_offset() in places that checks for support of this feature
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39817>
2026-02-19 16:53:03 +00:00
Alyssa Rosenzweig
5386e93865 brw: use data helper
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39939>
2026-02-19 14:47:11 +00:00
Kenneth Graunke
4bdef9824a anv, brw: Consolidate ex_bso bits to a static devinfo inline
If we have extended bindless surface offset (ExBSO) support, we want to
use it.  Consolidate the anv_physical_device and brw_compiler bits into
a single static inline that take devinfo.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39839>
2026-02-16 21:33:47 +00:00
Alyssa Rosenzweig
bd5ebbb2f8 brw: drop buggy SLM optimization
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
This was incorrect for OpenCL due to the possibility of variable shared memory
existing despite shared_size == 0. Fortunately the optimization it was trying to
do should be done in NIR via nir_opt_barrier_modes so we can just drop the brw
code and move on with our merry lives. Fixes OpenCL tests on Iris:

non_uniform_work_group non_uniform_3d_barriers
basic async_strided_copy_local_to_global

Cc: mesa-stable
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39795>
2026-02-13 20:28:28 +00:00
Sagar Ghuge
1fb8435b77 nir: Add nir_resource_intel_internal entry
Will use the load/store_ssbo with nir_resource_intel_internal later in
this series.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35160>
2026-02-12 16:45:22 +00:00
Lionel Landwerlin
2ef29502ed brw: enable ex_bso for LSC_SS
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35160>
2026-02-12 16:45:22 +00:00
Lionel Landwerlin
9bb152c9a9 brw: make PULL_CONSTANT opcodes more like MEMORY opcodes
Using binding & binding_type sources.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35160>
2026-02-12 16:45:22 +00:00
Kenneth Graunke
3b4af8907f brw: Delete wm_prog_data::urb_setup_channel[]
The entire array is always initialized to zero and never modified.

Cuts the size of brw_wm_prog_data by 32%.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39791>
2026-02-09 21:56:04 +00:00
Caio Oliveira
6b0e29bc77 brw: Fix cooperative matrix constant sources other than src0
Code was wrongly using src0 to pick the constant value.

Fixes: bf9ad36f2d ("brw: Properly handle cooperative matrices created with constants")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39769>
2026-02-09 19:52:16 +00:00
Kenneth Graunke
c5859b2d40 intel: Rename wm_prog_key to fs_prog_key
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
This is the shader key for the fragment shader.  Nobody even knows
what the windowizer/masker unit is or does anymore.  Even on Gen4-6,
"fs" is still clearer.  This makes the codebase easier to read.

This is only about 15 years overdue.

Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39748>
2026-02-06 20:52:01 -08:00
Kenneth Graunke
56e638be81 intel: Rename wm_prog_data to fs_prog_data
This is the program data for the fragment shader.  Nobody even knows
what the windowizer/masker unit is or does anymore.  Even on Gen4-6,
"fs" is still clearer.  This makes the codebase easier to read.

This is only about 15 years overdue.

Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39748>
2026-02-06 20:51:59 -08:00
Kenneth Graunke
beb4b78fe7 intel: Rename intel_msaa_flags to intel_fs_config
This started out as dynamic configuration for MSAA related state, but
has since expanded to cover many dynamic fragment shader options.

We rename it to intel_fs_config, similar to intel_tess_config, to
better indicate its purpose.

Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39748>
2026-02-06 20:51:43 -08:00
Georg Lehmann
e5f1e08f3e brw: remove unpack_half support
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39511>
2026-02-06 06:12:36 +00:00
Kenneth Graunke
6fbe201a12 brw: Convert VS/TES/GS outputs to URB intrinsics.
For VS/TES/GS, we lower all outputs to temporaries and emit copies at the
end of the shader (or for GS, at each EmitVertex() call) from those
temporaries back to real outputs.  We use vec8 URB writes without
writemasking, since our output area's contents are undefined anyhow.

This is simpler than what TCS and Mesh do, which allow for output
variables to be read/written at a per-component level at any time,
with the output memory being used for cross-thread communication.

Rather than using the complicated TCS/Mesh handling and relying on
vectorization, we port the emit_urb_writes() approach to NIR.  This
also takes care of emitting the VUE header with default values when
fields aren't explicitly written by the shader.

We also handle multiview in the process.  It simplifies things, and
also drops another case of non-semantic IO in brw.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39666>
2026-02-03 19:11:21 +00:00
Kenneth Graunke
2af44670ed brw: Implement load_urb_output_handle_intel for VS/GS stages
Simply get the payload field.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39666>
2026-02-03 19:11:20 +00:00
Iván Briano
5b48805b42 brw: fix local_invocation_index with quad derivaties on mesh/task shaders
For mesh/task shaders, the thread payload provides a local invocation
index, but it's always linear so it doesn't give the correct value when
quad derivatives are in use.
The lowering pass where all of this is done correctly for compute
shaders assumes load_local_invocation_index will be lowered in the
backend for mesh/task, calculates the values for the quads correctly but
then avoid replacing the original intrinsic and we remain with the wrong
results.

Add an intel specific intrinsic and always lower the generic one to that
(or whatever else was calculated) to avoid ambiguities and fix the value
for quad derivatives.

Fixes future CTS tests using mesh/task shaders under:
dEQP-VK.spirv_assembly.instruction.compute.compute_shader_derivatives.*

Fixes: d89bfb1ff7 ("intel/brw: Reorganize lowering of LocalID/Index to handle Mesh/Task")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39276>
2026-01-27 22:28:19 +00:00
Kenneth Graunke
07ac0e3463 brw: Skip vec8 store_urb_vec4_intel noop writemasks as well
We were checking for 0xf which is fine for vec4, but vec8 gets 0xff.
Either way, nothing is writemasked, so we can skip sending the mask.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39250>
2026-01-27 16:08:36 +00:00
Kenneth Graunke
dbb24ff56b brw: Assert that urb_vec4_intel stores only have 4/8 components
vec1-3, 5-7, and 9+ are not supported.  Only vec4 and vec8.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39250>
2026-01-27 16:08:36 +00:00
Caio Oliveira
74f1d4f47b intel/compiler: Use SPDX annotations
Minor adjustments to formatting of the copyright line, but keep
dates and holders.  "Authors" entries that could be
obtained via Git logs were also removed.

The license in brw_disasm.c and elk_disasm.c don't match directly
any SPDX pattern I could find, so kept as is.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39503>
2026-01-24 20:37:31 +00:00
Caio Oliveira
dc352f3d7c brw: Don't increment block loads addresses unless needed
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39447>
2026-01-24 04:38:23 +00:00
Lionel Landwerlin
a19e949824 brw: move coarse_z computation to NIR
So that we can print it easily with debug printfs

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38996>
2026-01-21 16:00:52 +00:00
Lionel Landwerlin
3d2a696763 brw: treat inline parameters like UNIFORM
Makes a bunch of copy propagation and other passes work much better.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39382>
2026-01-20 21:25:53 +00:00
Caio Oliveira
b542ac4ca0 brw: Fix and properly use increment_a64_address()
Since the move to MEMORY_*_LOGICAL the result value was being ignored, so
change to use that.

Since the conversion to use new registers, some issues were introduced:
- Even with `has_64bit_int` ADD with 64-bit immediate value is not supported;
- `dst_high` was not being filled if there was no overflow;
- Only `dst_low` returned.

Found when writing some new code involving large block loads.

Fixes: b79e85a93f ("brw: always use new registers for load address increments")
Fixes: b55f77161d ("intel/brw: Switch to emitting MEMORY_*_LOGICAL opcodes")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39282>
2026-01-15 19:47:23 +00:00
Lionel Landwerlin
fd744b0c8a brw: switch buffer/image size intrinsics lowering to NIR
Fossil-db DG2:

Totals from 127 (0.01% of 1799288) affected shaders:
Instrs: 60593 -> 60508 (-0.14%); split: -0.15%, +0.01%
Cycle count: 7099635 -> 7116148 (+0.23%); split: -0.12%, +0.35%
Spill count: 468 -> 466 (-0.43%)
Fill count: 224 -> 222 (-0.89%)
Max live registers: 6418 -> 6424 (+0.09%); split: -0.06%, +0.16%
Non SSA regs after NIR: 11228 -> 11220 (-0.07%); split: -0.20%, +0.12%

Fossil-db LNL:

Totals from 135 (0.01% of 1573226) affected shaders:
Instrs: 55173 -> 55143 (-0.05%); split: -0.07%, +0.01%
Cycle count: 9178338 -> 9157052 (-0.23%); split: -0.32%, +0.09%
Spill count: 454 -> 452 (-0.44%)
Fill count: 181 -> 179 (-1.10%)
Max live registers: 12915 -> 12919 (+0.03%); split: -0.06%, +0.09%
Non SSA regs after NIR: 10860 -> 10852 (-0.07%); split: -0.20%, +0.13%

shader-db LNL:

total instructions in shared programs: 16911578 -> 16911566 (<.01%)
instructions in affected programs: 1602 -> 1590 (-0.75%)
helped: 7
HURT: 0
helped stats (abs) min: 1.0 max: 2.0 x̄: 1.71 x̃: 2
helped stats (rel) min: 0.48% max: 1.10% x̄: 0.75% x̃: 0.74%
95% mean confidence interval for instructions value: -2.17 -1.26
95% mean confidence interval for instructions %-change: -0.96% -0.55%
Instructions are helped.

total loops in shared programs: 5168 -> 5168 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0

total cycles in shared programs: 848964184 -> 848955094 (<.01%)
cycles in affected programs: 1528020 -> 1518930 (-0.59%)
helped: 9
HURT: 6
helped stats (abs) min: 2.0 max: 8484.0 x̄: 1212.89 x̃: 20
helped stats (rel) min: 0.02% max: 3.23% x̄: 0.57% x̃: 0.11%
HURT stats (abs)   min: 2.0 max: 1608.0 x̄: 304.33 x̃: 15
HURT stats (rel)   min: <.01% max: 0.59% x̄: 0.19% x̃: 0.07%
95% mean confidence interval for cycles value: -1875.18 663.18
95% mean confidence interval for cycles %-change: -0.75% 0.23%
Inconclusive result (value mean confidence interval includes 0).

total spills in shared programs: 3345 -> 3345 (0.00%)
spills in affected programs: 0 -> 0
helped: 0
HURT: 0

total fills in shared programs: 1777 -> 1777 (0.00%)
fills in affected programs: 0 -> 0
helped: 0
HURT: 0

total sends in shared programs: 869299 -> 869299 (0.00%)
sends in affected programs: 0 -> 0
helped: 0
HURT: 0

LOST:   0
GAINED: 0

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39258>
2026-01-14 10:37:32 +00:00
Lionel Landwerlin
c3bd1a1688 brw: handle layer_id only through system value
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39259>
2026-01-12 19:53:36 +00:00
Lionel Landwerlin
081c5bc6a5 brw: fix derivatives on non 32bit floats
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14600
Meh'd-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39226>
2026-01-12 15:18:46 +00:00
Lionel Landwerlin
b996b03f21 brw: enable topology opcodes in SIMD32
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36181>
2026-01-12 12:19:21 +00:00
Lionel Landwerlin
faa857a061 intel: rework push constant handling
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
nr_params & params array are gone.

brw_ubo_range is not stored on the prog_data structure anymore (Anv
already stored a copy of that with its own additional information)

The backend now only deals with load_push_data_intel. load_uniform &
load_push_constant have to be lowered by the driver.

Pre Gfx12.5 platforms have to provide a subgroup_id_param to specify
where the subgroup_id value is located in the push constants.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38975>
2026-01-09 14:19:52 +00:00
Kenneth Graunke
d83c699045 brw: Convert GS pulled inputs to use URB intrinsics
We leave GS pushed inputs using load_per_vertex_input for now - they're
relatively simple, and using load_attribute_payload doesn't work well
since it's assumed to be convergent (for TES, FS inputs) while GS inputs
are divergent.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38990>
2025-12-18 06:39:02 +00:00
Caio Oliveira
47d8ed1177 brw: Move PLN/LINE normalization
Add validation for Source 0 and move the normalization into
the code producing the instruction.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38877>
2025-12-16 01:34:44 +00:00