Commit graph

5217 commits

Author SHA1 Message Date
Alyssa Rosenzweig
3d94ba1d20 jay: make indirect push data blow up more obviously
fail to crash:

dEQP-VK.spirv_assembly.instruction.compute.untyped_pointers.glsl_memory_model.basic_usecase.load.push_constant.int32

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41510>
2026-05-12 22:46:31 +00:00
Alyssa Rosenzweig
b10c0d95a8 jay: optimize pack_32_2x16_split(#0, x)
Kinda pointless but whatever.

Totals from 10 (0.38% of 2647) affected shaders:
Instrs: 6846 -> 6830 (-0.23%)
CodeSize: 95728 -> 95520 (-0.22%)

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41510>
2026-05-12 22:46:31 +00:00
Alyssa Rosenzweig
5ebf0c9161 jay: elide atomic dests
simd16 results. kinda noisy but obviously the right thing to do.

Totals from 45 (1.70% of 2647) affected shaders:
Instrs: 59182 -> 59194 (+0.02%); split: -0.11%, +0.14%
CodeSize: 905200 -> 904752 (-0.05%); split: -0.17%, +0.12%

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41510>
2026-05-12 22:46:31 +00:00
Alyssa Rosenzweig
b3fe01e2c1 jay: fix bfn with 0xffff constant
awkward.

Totals from 128 (4.84% of 2647) affected shaders:
Instrs: 258121 -> 257970 (-0.06%); split: -0.07%, +0.01%
CodeSize: 3662400 -> 3661792 (-0.02%); split: -0.14%, +0.12%

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41510>
2026-05-12 22:46:30 +00:00
Alyssa Rosenzweig
c5cee5d973 jay: add JAY_DEBUG=noacc option
can help when debugging RA.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41510>
2026-05-12 22:46:30 +00:00
Alyssa Rosenzweig
9dbaaecb74 jay: swap predication/acc pass order
Lets us use more accumulators, I think this is well motivated. Saw this in a
test shader.

Totals from 242 (9.14% of 2647) affected shaders:
Instrs: 1365060 -> 1365035 (-0.00%); split: -0.00%, +0.00%
CodeSize: 20678592 -> 20680096 (+0.01%); split: -0.01%, +0.02%

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41510>
2026-05-12 22:46:30 +00:00
Ian Romanick
907cc49c32 brw: Calcuate divergence before brw_from_nir
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
We were previously assuming that potentially stale divergence data was
valid. On some paths the register pressure estimator would recalculate
this, but, as is obvious from the results, not always.

v2: Add an assertion in brw_from_nir_emit_impl to ensure we don't end
up in this situation again.

v3: Call nir_divergence_analysis from
brw_nir_lower_deferred_urb_writes. This fixes assertion failures (the
assertion added in v2) in basically every graphics shader. The
altnerative was to call it from brw_compile_vs, brw_compile_gs, and
brw_compile_tes.

shader-db:

All Intel platformms had similar results. (Lunar Lake shown)
total instructions in shared programs: 17050403 -> 17054033 (0.02%)
instructions in affected programs: 296344 -> 299974 (1.22%)
helped: 0 / HURT: 376

total cycles in shared programs: 876063126 -> 875817316 (-0.03%)
cycles in affected programs: 78627328 -> 78381518 (-0.31%)
helped: 91 / HURT: 276

LOST:   1
GAINED: 10

fossil-db:

All Intel platformms had similar results. (Lunar Lake shown)
Totals:
Instrs: 913770429 -> 916075391 (+0.25%); split: -0.00%, +0.26%
CodeSize: 14647414640 -> 14726176320 (+0.54%); split: -0.02%, +0.56%
Cycle count: 102308091527 -> 102290664775 (-0.02%); split: -0.26%, +0.24%
Spill count: 3469632 -> 3469124 (-0.01%); split: -0.08%, +0.07%
Fill count: 5007038 -> 4998674 (-0.17%); split: -0.51%, +0.34%
Max live registers: 192568853 -> 192595355 (+0.01%); split: -0.00%, +0.02%
Max dispatch width: 48713168 -> 48712880 (-0.00%); split: +0.00%, -0.00%
Non SSA regs after NIR: 140252767 -> 140253718 (+0.00%)

Totals from 223099 (11.11% of 2007586) affected shaders:
Instrs: 314077245 -> 316382207 (+0.73%); split: -0.01%, +0.75%
CodeSize: 5335583824 -> 5414345504 (+1.48%); split: -0.06%, +1.54%
Cycle count: 45868025821 -> 45850599069 (-0.04%); split: -0.58%, +0.54%
Spill count: 2062649 -> 2062141 (-0.02%); split: -0.14%, +0.11%
Fill count: 3343019 -> 3334655 (-0.25%); split: -0.76%, +0.51%
Max live registers: 36762498 -> 36789000 (+0.07%); split: -0.02%, +0.09%
Max dispatch width: 5542224 -> 5541936 (-0.01%); split: +0.03%, -0.03%
Non SSA regs after NIR: 43727142 -> 43728093 (+0.00%)

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> [v1]
Fixes: 1bff4f93ca ("brw: Basic infrastructure to store convergent values as scalars")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41370>
2026-05-11 21:03:19 +00:00
Caio Oliveira
d08d345686 brw: Remove references to SIMD4x2
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
In Gfx9 the enum value was changed to mean SIMD8 double precision, so
drop the old unused enum.  At least on Gfx9 there is an extension bit
to set to use the old SIMD4x2 mode, we can recover if we ever need this
in the future.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41457>
2026-05-11 20:16:02 +00:00
Iván Briano
2ad92e3ea4 anv/brw: handle FullyCoveredEXT
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Caleb Callaway <caleb.callaway@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38879>
2026-05-11 18:15:50 +00:00
Iván Briano
58006eaaa4 anv/brw: add conservative raster on/off to FS_CONFIG
FullyCovered will need to know if conservative rasterization is enabled,
so pass it on to the shader.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Caleb Callaway <caleb.callaway@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38879>
2026-05-11 18:15:50 +00:00
Iván Briano
fea8830946 intel/brw: add load_frag_shading_rate_intel
Add a new intrinsic to read the raw shading rate provided to the FS
payload, and lower load_frag_shading_rate in NIR using it.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Caleb Callaway <caleb.callaway@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38879>
2026-05-11 18:15:49 +00:00
Iván Briano
5383afadbf intel/brw: add load_msaa_rate_intel intrinsic
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Caleb Callaway <caleb.callaway@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38879>
2026-05-11 18:15:49 +00:00
Iván Briano
3448f3ce4a intel/brw: add load_coverage_mask_intel intrinsic
We'll need the raw coverage mask provided to the fragment shader in a
future patch.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Caleb Callaway <caleb.callaway@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38879>
2026-05-11 18:15:49 +00:00
Caio Oliveira
46cd7b6e28 brw: Move brw_prog_data_init to a different file
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
The generator code will be reworked, remove this unrelated
function from there.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41458>
2026-05-10 00:07:15 +00:00
Caio Oliveira
2273533504 brw: Fix some indentation in brw_generator.cpp
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Will reduce noise in later changes.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41459>
2026-05-09 16:40:32 -07:00
Paulo Zanoni
7eab94d542 intel/nir: fix sparse shadow comparison for BRW
While Jay overwrites sparse_tex->op with the newer opcodes that only
return red and the sparse stuff, BRW keeps using the original opcode
of the cloned instruction, so it can't change def->num_components.

This was not previously detectable since we did not have sparse
enabled for depth/stencil on Anv for a while. A patch to re-enable
that was proposed a while ago (MR !37423), never merged, but then a
recent attempt to try to merge it (by me) detected this regression.
Let's fix the regression first, then we can finally re-enable sparse
depth/stencil support in Anv, hopefully.

Fixes: 7468261d3d ("intel/nir: Make intel_nir_lower_sparse work for either brw or jay")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37423>
2026-05-07 23:47:51 +00:00
Kenneth Graunke
2729b1608f brw: Limit SIMD width based on NIR rather than first backend compile
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
I originally added this mechanism to have the first (SIMD8) compile
note that certain features were in use which would prevent SIMD16/32
from compiling, so we could skip the work of trying those.

But these days, there aren't many cases, and the ones we have are
easily detectable based on the NIR.  We can detect it earlier without
even having to do the SIMD8 compile.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41122>
2026-05-07 08:29:40 +00:00
Kenneth Graunke
c5928d40ae brw: Drop dead code from dispatch limit check for dual source blending
We checked that ver is 11 or 12.  It can't be >= 20.  This is dead code.

Dual source blending on Xe2 does not have native SIMD32 RT write message
support, but SIMD splitting is currently lowering it to low/high SIMD16
message pairs when using SIMD32 dispatch.  I'm not aware of any of the
hardware errata from previous platform still applying.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41122>
2026-05-07 08:29:40 +00:00
Kenneth Graunke
599d26db00 brw: Set prog_data::dual_src_blend from NIR outputs written bitfield
Simpler and set earlier.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41122>
2026-05-07 08:29:40 +00:00
Kenneth Graunke
afb97ff2af brw: Switch FS outputs to semantic IO and FRAG_RESULT_DUAL_SRC_BLEND
The new FRAG_RESULT_DUAL_SRC_BLEND option is easier to work with than
looking for FRAG_RESULT_DATA0 with an index of 1.  This also means we
no longer care about the dual source blend index, and can just use the
FRAG_RESULT location.  That cascades to meaning we no longer have to
store a tuple in driver_location.  And, if we just need location, we
can avoid populating that at all and use nir_io_semantics to get it.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41122>
2026-05-07 08:29:40 +00:00
Kenneth Graunke
fbaa5ad0c3 iris: Implement force_dual_color_blend_by_location via NIR
We can just have iris look at its own program key and change the
fragment shader output variable's location/index in the NIR.  By
doing this before lowering fragment shader outputs, the rest of
the output lowering does the right thing, and the backend no longer
has to consider hacks for broken OpenGL apps.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41122>
2026-05-07 08:29:40 +00:00
Alyssa Rosenzweig
5636a57f60 jay/lower_scoreboard: use SYNC.allrd/allwr
This collapses piles of silliness.

Totals:
CodeSize: 71626288 -> 70710000 (-1.28%)

Totals from 1634 (61.73% of 2647) affected shaders:
CodeSize: 66319376 -> 65403088 (-1.38%)

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41398>
2026-05-06 23:25:26 +00:00
Alyssa Rosenzweig
c1dc9d3b1a jay/lower_scoreboard: be the sole emitter of SYNC
this gets closer to something we can schedule and avoids some pointless syncs.

Totals from 491 (18.55% of 2647) affected shaders:
Instrs: 602994 -> 602946 (-0.01%)
CodeSize: 9063888 -> 9015904 (-0.53%)

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41398>
2026-05-06 23:25:26 +00:00
Alyssa Rosenzweig
0885ed10f5 jay/lower_scoreboard: use .src annotations
This is less heavy handed, avoiding unnecessary stalls after SENDs in a
bunch of common cases. The stats (SIMD32) are:

Totals:
CodeSize: 70345392 -> 71674272 (+1.89%)

Totals from 1774 (67.02% of 2647) affected shaders:
CodeSize: 67359248 -> 68688128 (+1.97%)

What's happening here is we are inserting extra SYNC.nop instructions in a
bunch of cases for the .src preceding the eventual .dst. However, putting aside
the i-cache impact for a moment, this is showing the optimization doing what it
should (deferring dst syncs and inserting cheaper src syncs first). So this
should be positive in reality despite the negative stat impact.

The most hurt shaders are pooling up SYNC.nop's at the end of blocks due to
local-only SWSB and lack of SYNC.allwr optimization. The latter is added later
in this MR. The former is planned.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41398>
2026-05-06 23:25:25 +00:00
Alyssa Rosenzweig
130e724d5e jay/lower_scoreboard: refactor SYNC.nop insertion
for next commit

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41398>
2026-05-06 23:25:25 +00:00
Alyssa Rosenzweig
1ecd75a397 jay/lower_scoreboard: fix tracking for A@* and *@7
update the tracking with what we actually waited on, not what we ideally wanted
to wait on. reduces extra annotations in some cases.

SIMD32:

Totals from 194 (7.33% of 2647) affected shaders:
CodeSize: 14473840 -> 14469088 (-0.03%)

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41398>
2026-05-06 23:25:25 +00:00
Alyssa Rosenzweig
93edf9a3fd jay/lower_scoreboard: refactor wait pipe code
for next commit.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41398>
2026-05-06 23:25:25 +00:00
Alyssa Rosenzweig
18e09858eb jay/lower_scoreboard: elide more dependencies
IGC does these optimizations and I think they should be safe given my mental
model. Given a sequence like:

   r0 = add.f32 r1, r2
   r1 = add.f32 r3, r4

Each ALU pipe is pipelined but in-order. Therefore, the second add cannot
possibly complete before the first add, so it cannot write r1 before the first
add reads r1, so we can elide the write-after-read dependency. That in term
avoids a pipeline bubble between the two instructions. Ditto for
write-after-write.

Similarly if the distance is too great within an in-order pipe since there is a
maximum pipeline length, it's not infinite.

Note that if there was cross-pipe dependencies we do need the annotation since
the pipes themselves are parallel.

SIMD32:

Totals from 58 (2.19% of 2647) affected shaders:
CodeSize: 3316592 -> 3315056 (-0.05%); split: -0.05%, +0.00%

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41398>
2026-05-06 23:25:25 +00:00
Alyssa Rosenzweig
e4dc161277 jay: assign accumulators post-RA
Greedy post-RA substitution pass, similar to IGC's AccSubstitution pass.
Stats together with the previous commits.

SIMD16:

   Totals from 2209 (83.45% of 2647) affected shaders:
   Instrs: 2701029 -> 2696350 (-0.17%)
   CodeSize: 39166720 -> 40372272 (+3.08%); split: -0.36%, +3.44%

SIMD32:

   Totals from 2211 (83.53% of 2647) affected shaders:
   Instrs: 4691165 -> 4641188 (-1.07%)
   CodeSize: 69365792 -> 69341616 (-0.03%); split: -0.50%, +0.47%

The instruction count reduction is from RA shuffle code getting coalesced via
accumulators. The code size changes are from:

* Fewer moves from the instr count reduction (helped)
* Smaller MADs encoded as MACs (helped)
* Fewer SYNC.nop due to fewer scoreboarding annotations (helped)
* Less compaction due to explicit accumulator operands (hurt)

I expect significant cycle count changes from this but we don't have a cycle
model wired up yet, so reading the assembly will have to do.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41398>
2026-05-06 23:25:25 +00:00
Alyssa Rosenzweig
8b324591d1 jay: move simd32 deswizzling to float pipe
for more accumulator usage.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41398>
2026-05-06 23:25:25 +00:00
Alyssa Rosenzweig
712719a2ae jay: do moves on the float pipe where possible
this allows us to use accumulators more.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41398>
2026-05-06 23:25:25 +00:00
Alyssa Rosenzweig
6f2b1cece6 jay: model MAC
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41398>
2026-05-06 23:25:25 +00:00
Alyssa Rosenzweig
b6e88ab904 jay/to_binary: fix packing of simd-split accumulators
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41398>
2026-05-06 23:25:25 +00:00
Rhys Perry
ec59b59b97 nir: rename nir_src_parent_instr to nir_src_use_instr
sed -i "s/nir_src_parent_instr/nir_src_use_instr/" `find ./ -type f`
sed -i "s/nir_src_parent_if/nir_src_use_if/" `find ./ -type f`
sed -i "s/nir_src_set_parent/nir_src_set_use/" `find ./ -type f`

There are two kinds of "parent" in relation to a src/def:
- the instruction where the def or src's def is defined
- the instruction which the src is a part of and where the def is used

Clarify that the parent here is where the src's def is used, not where
it's defined.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41344>
2026-05-06 17:09:22 +00:00
Lionel Landwerlin
6f5d30c0a2 anv: add apply_layout support for device bindable shaders/pipelines
We consider them like bindless stages (no binding table) as much as
possible.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31384>
2026-05-06 09:49:44 +00:00
Lionel Landwerlin
1281e2b9a0 anv/intel: add device generated commands shaders
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31384>
2026-05-06 09:49:43 +00:00
Lionel Landwerlin
c30a4d4fdb anv/brw/nir: fix wa_18019110168
Several things were wrong :
  - incorrect offset in the FS push constant data
  - incorrect encoding of the 32bit values with 2 fields (remap table offset & provoking vertex)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31384>
2026-05-06 09:49:41 +00:00
Lionel Landwerlin
f309f0b1a0 intel: add resource intrinsic support for heaps
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39478>
2026-05-05 18:21:16 +00:00
Lionel Landwerlin
25bc517ef5 brw: add heap support to brw_lower_storage_image
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39478>
2026-05-05 18:21:16 +00:00
Lionel Landwerlin
5ec7d31e20 brw/lower_texel_address: add heap support
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39478>
2026-05-05 18:21:16 +00:00
Calder Young
4120ae4963 brw: Avoid vectorizing loads in NIR if it could extend into a different page
Took inspiration from RADV to make nir_opt_load_store_vectorize robust against
page faults, by checking the align_offset and align_mul to see if any extra
components could be overlapping into a different page.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40149>
2026-05-01 19:51:41 +00:00
Calder Young
3ac6233655 brw: Avoid rounding every convergent block load up to a full register
To simplify things, our backend rounds convergent block loads up to a full
register. This causes page faults with the scratch page disabled since the
address is not always aligned to a register size. Loading smaller blocks is
slightly more difficult because the SEND instruction can only write back a
multiple of full registers, even if the actual data is smaller.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40149>
2026-05-01 19:51:41 +00:00
Calder Young
8ce98fedc4 anv: Make sure robust UBO access does not fault
We can just conditionally replace the address with an address to a zero
initialized cacheline if the read is going to go out of bounds.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40149>
2026-05-01 19:51:41 +00:00
Caio Oliveira
1ebc14bcb9 brw: Stop tracking inline parameter usage in prog_key/prog_data
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Since inline parameter is the last field of the thread payload, the
backend can always assume they may exist.  They won't affect the
position of other payload fields and the register allocator will
reuse any unused space.

In Anv, also update EmitInlineParameter for Task/Mesh/CS to reflect
previous changes in inline parameter setup.  Remove/Update some stale
comments since we are here.

Finally, remove the prog_key/prog_data bits that tracked whether inline
data or a push address was needed.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41230>
2026-04-30 16:39:22 +00:00
Caio Oliveira
e1745e0bd9 brw: Fix max_dispatch_width collection for CS with variable size
The intention of the original commit was to make all the shaders report
the same max_dispatch_width.  When CS has multiple variants, this was
not happening as expected.

Fixes: 2acc2f18ea ("intel/compiler: report max dispatch width statistic")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41209>
2026-04-29 15:52:04 +00:00
Alyssa Rosenzweig
a78634ccb0 jay/to_binary: rename grf -> phys_reg
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
since it covers accumulators to

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41215>
2026-04-28 23:13:50 +00:00
Alyssa Rosenzweig
ab87a035c9 jay: drop a bunch of stale TODO and XXX
These are either done, or never going to be done, or otherwise stale or
silly or unnecessary. Drop a bunch.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41215>
2026-04-28 23:13:50 +00:00
Alyssa Rosenzweig
70d09d97ef jay: predicate NoMask instructions in uniform IF's
Totals:
Instrs: 4742391 -> 4742257 (-0.00%)
CodeSize: 70245120 -> 70243520 (-0.00%); split: -0.00%, +0.00%

Totals from 81 (3.06% of 2647) affected shaders:
Instrs: 337727 -> 337593 (-0.04%)
CodeSize: 4992992 -> 4991392 (-0.03%); split: -0.03%, +0.00%

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41215>
2026-04-28 23:13:50 +00:00
Alyssa Rosenzweig
f199f00564 jay: adjust flag replication
Now instructions still read/write UFLAG, which preserves the information about
lane 0 we need for proper predication etc.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41215>
2026-04-28 23:13:50 +00:00
Alyssa Rosenzweig
930d36b54a jay: smarten predication pass
Merge the empty else optimization, the then-block predication, and the
break-while fusion into a unified "try to predicate each side of an if, peephole
optimizing control flow" optimization. This is simpler and more general.

Totals:
Instrs: 4783809 -> 4775647 (-0.17%)
CodeSize: 70766656 -> 70674064 (-0.13%); split: -0.13%, +0.00%

Totals from 1109 (41.90% of 2647) affected shaders:
Instrs: 4130644 -> 4122482 (-0.20%)
CodeSize: 61180848 -> 61088256 (-0.15%); split: -0.15%, +0.00%

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41215>
2026-04-28 23:13:50 +00:00