Commit graph

16201 commits

Author SHA1 Message Date
Calder Young
bec5d3fff5 anv: Add workaround for vertex explosions in Split Fiction
The game tries to use anisotropic filtering deep in some control flow
while updating a procedural displacement map. Our sampling hardware
does not check the channel enable mask before calculating the
derivatives for each subspan, which causes it to get garbage for any
subspans that have partially disabled lanes.

This workaround converts any sample messages in fragment shaders that
have divergent control flow into a sample_d message with the derivatives
zeroed by software if some of the lanes are disabled.

Closes: #12796
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41716>
2026-05-26 21:21:55 +00:00
Calder Young
abe41f3acf brw: Add workaround pass for shaders using derivatives in control flow
Using derivatives in control flow that is not uniform across a subspan
will produce "undefined behavior" in GLSL.

On Intel hardware, this means the sampler will always compute the
derivatives from whatever values are in each lane of a subspan in the
raw payload, regardless of whether some lanes have been disabled and
contain garbage.

Unfortunately, some applications seem to expect the sampler to ignore
disabled lanes in these cases instead of computing their derivatives
from garbage, so for those we need a pass that finds any sample
messages in divergent control flow and converts them to a sample_d with
the derivatives zeroed by software if one or more lanes required to
calculate them have been disabled.
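
The idea can be sketched in Python. This is not the brw pass itself: it models a single 2x2 subspan, assumes coarse derivatives taken against the top-left lane, and conservatively zeroes the derivatives whenever any of the four lanes is disabled. The function name and the lane ordering are assumptions made for illustration.

```python
def subspan_derivatives(values, enabled):
    """Return (ddx, ddy) for one 2x2 subspan.

    values  -- per-lane inputs ordered [top-left, top-right,
               bottom-left, bottom-right] (assumed ordering)
    enabled -- per-lane channel enables in the same order
    """
    # Disabled lanes may hold garbage, so do not difference against them.
    # Zero derivatives stand in for what the explicit-derivative message
    # would be given in that case.
    if not all(enabled):
        return (0.0, 0.0)

    ddx = values[1] - values[0]  # horizontal neighbor minus top-left
    ddy = values[2] - values[0]  # vertical neighbor minus top-left
    return (ddx, ddy)
```

With all lanes enabled the sketch returns the ordinary finite differences. With any lane disabled it returns zeros rather than a difference against an undefined value.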

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41716>
2026-05-26 21:21:55 +00:00
Caio Oliveira
c88e30f0e4 anv, brw: Use previous shader VUE map for FS input layout when available
The FS compilation needs the VUE map from the previous stage when the FS
has more inputs than SBE_SWIZ can remap.  Recomputing a VUE map just
from FS inputs loses certain slots like multiview slots and extra
position slots (for primitive replication), so high-numbered attributes
can read the wrong source.

When available, pass the previous stage VUE map to the FS compilation
and use it.  Make sure that the payload is sized based on what is read,
in case the previous stage has more outputs than FS reads.

Bugs did not surface when there were just 16 or fewer varying inputs,
because the driver can program SBE_SWIZ to translate the positions in
the previous shader VUE into what the FS wants.  For more inputs, this
mapping is not used, and the FS must use the exact same slots.
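
To illustrate why a recomputed map can disagree with the previous stage, here is a toy model in Python. A VUE map is represented as a plain list of slot names. The slot names, the extra slots, and `recompute_from_fs_inputs` are all hypothetical and do not reflect the driver's real data structures.

```python
def slot_of(vue_map, varying):
    """Index of a varying within a VUE map modeled as a list of names."""
    return vue_map.index(varying)


def recompute_from_fs_inputs(fs_inputs):
    """Hypothetical recomputation that only sees what the FS reads."""
    return ["position"] + list(fs_inputs)


varyings = [f"var{i}" for i in range(20)]  # more than 16 inputs

# The previous stage also wrote slots that the FS never names explicitly.
prev_stage_vue = ["position", "position_replica", "view_index"] + varyings
recomputed_vue = recompute_from_fs_inputs(varyings)

# Same varying, two different slots: with no remapping in between,
# the FS would read the wrong source.
print(slot_of(prev_stage_vue, "var17"), slot_of(recomputed_vue, "var17"))
```

In this model the two extra slots shift every later varying by two, which is the kind of mismatch that passing the previous stage's map avoids.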

Note this is not a problem for pipeline libraries because they use
a different fixed layout.  This is also not an issue with
EXT_shader_object because multiview draws are not allowed with that
extension.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41747>
2026-05-23 05:17:37 +00:00
Nanley Chery
d74a03a70d anv: Flush previous aux-mode changes
If the aux-usage changes, we need to flush out the previous mode from
the cache (see iris's flush_previous_aux_mode()).
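
A rough Python sketch of the kind of tracking this implies, assuming one remembered aux usage per image. The class and method names are hypothetical and are not taken from anv or iris.

```python
class AuxModeTracker:
    """Remembers the last aux usage seen for each image (hypothetical helper)."""

    def __init__(self):
        self._last_aux = {}

    def needs_flush(self, image, aux_usage):
        """Record aux_usage for image.

        Returns True when the image was previously accessed with a
        different aux usage, meaning the old mode should be flushed first.
        """
        previous = self._last_aux.get(image)
        self._last_aux[image] = aux_usage
        return previous is not None and previous != aux_usage
```

The first access to an image never asks for a flush. Only a change from one recorded mode to another does.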

I ran into this while testing layout-based compression toggling with the
Hogwarts Legacy trace on DG2. The trace exhibited graphical corruption
unless the DATA_CACHE was flushed.

On an unmodified driver, this currently only affects transitions from
AUX_NONE->AUX_CCS_D.

Backport-to: *
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41034>
2026-05-22 20:51:27 +00:00
Nanley Chery
c49674188e anv: Move storage check out of CCS-compat helper
Provides clearer reasons for INTEL_DEBUG=perf.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41034>
2026-05-22 20:51:27 +00:00
Nanley Chery
4b9561c733 anv: Allow CCS on more storage images for gfx12.5
To do this, we use a heuristic that depends on the image format and size
(see HSD 18014810884).

Average of two runs on an A750 from the performance CI:

* Naraka    +0.89%
* TWWH3     +0.45%
* Control   +0.37%
* Cyberpunk +0.35%

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41034>
2026-05-22 20:51:27 +00:00
Nanley Chery
df446326f8 anv: Avoid aux-disabling paths for block-compression
We don't support CCS on block-compressed textures prior to Xe2. On Xe2,
CCS is enabled on every image.

Improves INTEL_DEBUG=perf outputs. For example, in the Naraka trace on
DG2, we now report that r32_uint is CCS_E-incompatible instead of
bptc_rgba. This incompatibility is due to the storage usage flag and
will be clarified in future commits.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41034>
2026-05-22 20:51:27 +00:00
Nanley Chery
8e736f4d0d anv: Improve the CCS_E-incompatible perf-warn
Print the image format which is incompatible (or has an incompatible
list). On gfx12+, the format list shouldn't impact CCS_E-compatibility.
So, not printing the entire list should be sufficient on those
platforms.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41034>
2026-05-22 20:51:27 +00:00
Nanley Chery
ab9a1b4c92 anv: Improve the fast clear layout perf-warn
anv emits performance warnings earlier about compression being disabled,
so no need to emit this for AUX_NONE. Do provide the tiling however as
Xe2+ supports compressed linear surfaces.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41034>
2026-05-22 20:51:27 +00:00
Sagar Ghuge
04fe65e2bb brw/rt: Use BLAS(Object) level to get the ray address
Intersection shaders work on custom procedural geometries, which are
present only at the BLAS (Object) level, not at the TLAS (World) level.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41739>
2026-05-22 19:26:59 +00:00
Lionel Landwerlin
fd11e4b4d3 intel: switch shader hash to 64bit value
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41748>
2026-05-22 15:05:30 +00:00
Lionel Landwerlin
c09f00d339 anv: use shader source hash rather than cmd_buffer fields
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41748>
2026-05-22 15:05:28 +00:00
Lionel Landwerlin
294644643e brw: avoid requiring a valid render target for empty fragment shaders
Dishonored 2 or DXVK is creating pipelines with empty fragment
shaders. With alpha-to-coverage as a dynamic state, we currently assume
a render target is needed, but if the shader is not writing
anything, it is not.

This change only considers the color output writes as it's the alpha
channel there that is used for coverage computation.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41711>
2026-05-22 09:53:33 +00:00
Lionel Landwerlin
f34dd96ab5 anv: fix render target remapping tracking at the beginning of render passes
At the beginning of render passes we need to consider all entries as
unknown because it's all new color outputs.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: d2f7b6d5a7 ("anv: implement VK_KHR_dynamic_rendering_local_read")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/work_items/15475
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41711>
2026-05-22 09:53:33 +00:00
Lionel Landwerlin
f35a0f3ba5 anv: fix missing bindless flag hashing
It got dropped in a rebase it seems...

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41711>
2026-05-22 09:53:33 +00:00
Caio Oliveira
ffa4bc7d6a anv: Simplify code that calls brw/jay
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Iván Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41633>
2026-05-22 00:57:20 -07:00
Caio Oliveira
33475c0cce brw: Move key and prog_data to base compile params
Reviewed-by: Iván Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41633>
2026-05-22 00:57:20 -07:00
Caio Oliveira
7893eefa3b brw: Use a single brw_compile entrypoint
Reviewed-by: Iván Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41633>
2026-05-22 00:57:20 -07:00
Valentine Burley
190ce8280f meson: Add Soong compatibility compiler flags to Vulkan drivers
Suggested by @gurchetansingh.

Android's Soong build system treats several compiler warnings as errors
by default: https://android.googlesource.com/platform/build/soong/+/27f57506/cc/config/global.go/#218

To catch these issues in Mesa, introduce `soong_compat_c_args`
and `soong_compat_cpp_args` with the following flags treated as errors:
 -D_LIBCPP_ENABLE_THREAD_SAFETY_ANNOTATIONS
 -Werror=date-time
 -Werror=gnu-alignof-expression
 -Werror=ignored-qualifiers
 -Werror=implicit-fallthrough
 -Werror=int-conversion
 -Werror=missing-prototypes
 -Werror=pragma-pack
 -Werror=pragma-pack-suspicious-include
 -Werror=sizeof-array-div
 -Werror=string-plus-int
 -Werror=unreachable-code-loop-increment

These compatibility flags are added to the meson configurations
for ANV, Gfxstream, Lavapipe, PanVK, Turnip, and Venus.

Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Acked-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Gurchetan Singh <gurchetan.singh.foss@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41644>
2026-05-22 07:09:49 +00:00
Lionel Landwerlin
dd41fde91d anv: use the new generation script for drirc
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41664>
2026-05-22 06:32:39 +00:00
Lionel Landwerlin
83ed74b5df hasvk: add a driver section for drirc
Only adding the workarounds that have an actual effect on that driver.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41664>
2026-05-22 06:32:39 +00:00
Lionel Landwerlin
af88ba317d hasvk: rename a couple of drirc options
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41664>
2026-05-22 06:32:39 +00:00
Sagar Ghuge
73382c8126 brw/rt: Update committed hit leaf type properly
We want to extract the leaf type from the potential hit and assign it
to the committed hit.

Instead, we were simply assigning leaf type 0x7 to the committed hit.

This patch masks out the leaf type with nir_iand_imm and also updates
the incorrect field comment.
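
The difference between assigning the mask and applying it can be shown in Python. This sketch assumes the leaf type is a 3-bit field in the low bits of the value. The real bit position in the hit structure is not stated here, and both function names are made up.

```python
LEAF_TYPE_MASK = 0x7  # 3-bit field; its bit position is an assumption


def committed_leaf_type_buggy(potential_hit_bits):
    """Old behavior as described: the mask itself is used as the value."""
    return LEAF_TYPE_MASK


def committed_leaf_type_fixed(potential_hit_bits):
    """New behavior as described: AND with the mask to extract the field."""
    return potential_hit_bits & LEAF_TYPE_MASK
```

For an input of `0b101010`, the fixed version yields `0b010`, while the buggy one yields `0x7` for every input.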

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Iván Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41667>
2026-05-22 00:47:39 +00:00
Caio Oliveira
2c64e12462 intel/executor: Add performance counter support
Add optional OA performance counter collection around each execute()
call. Examples:

```
  # List all profiles and counters, with descriptions.
  $ executor --oa list

  # Collect all counters from a profile.
  $ executor --oa ComputeBasic file.lua

  # Collect a subset of counters from a profile, separated by comma.
  $ executor --oa ComputeBasic:GpuTime,AvgGpuCoreFrequency file.lua

  # By default use ComputeBasic profile, so counter names only also work.
  $ executor --oa GpuTime file.lua
```

The selected counters are printed to stdout after the script finishes,
or written to a file specified by --oa-csv FILENAME.
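
One way the three argument forms above could be told apart, sketched in Python. This is not the executor's parser: `parse_oa_spec`, the `known_profiles` argument, and the return shape are assumptions, and the special `list` argument is not handled.

```python
DEFAULT_PROFILE = "ComputeBasic"  # default profile named in the message


def parse_oa_spec(spec, known_profiles):
    """Return (profile, counters); counters is None for 'all counters'."""
    if ":" in spec:
        # "Profile:CounterA,CounterB" selects a subset from that profile.
        profile, names = spec.split(":", 1)
        return profile, names.split(",")
    if spec in known_profiles:
        # A bare profile name selects every counter in it.
        return spec, None
    # Otherwise treat the argument as counter names from the default profile.
    return DEFAULT_PROFILE, spec.split(",")
```

A bare name is ambiguous on its own, so the sketch checks it against the set of known profiles before falling back to the default profile.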

Assisted-by: Pi coding agent (GPT-5.5)
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41610>
2026-05-21 16:46:35 -07:00
Caio Oliveira
8d237b5408 intel/executor: Add an overflow check for alloc function
Assisted-by: Pi coding agent (GPT-5.5)
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41610>
2026-05-21 16:46:35 -07:00
Caio Oliveira
0dda43819e intel/compiler: Move bison command to shared meson.build
It is used by both brw and elk.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41738>
2026-05-21 22:15:00 +00:00
Sagar Ghuge
7f1defa5ef brw/rt: Commit hit even if we are skipping closest hit shader
It's not about the memory traffic but about updating the Tmax
value/distance, so that on the next intersection we compare against the
updated Tmax value/distance instead of the original distance.
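
A small Python model of why the commit matters for later intersections. It only tracks distances. The function and its `commit_updates_tmax` switch are hypothetical and stand in for the traversal logic.

```python
def traverse(candidate_distances, t_max, commit_updates_tmax=True):
    """Return the hit distances accepted during traversal, in order."""
    accepted = []
    for t in candidate_distances:
        if t < t_max:
            accepted.append(t)
            if commit_updates_tmax:
                # Committing the hit shrinks the search interval.
                t_max = t
    return accepted
```

With candidates `[5.0, 8.0, 3.0]` and an initial Tmax of `10.0`, the farther hit at `8.0` is rejected only when the earlier commit has updated Tmax.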

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Iván Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41709>
2026-05-21 20:45:39 +00:00
Sagar Ghuge
17f7e7f96b anv: Set execution mask based on SIMD size
The execution mask gets applied to the last thread in the threadgroup to
mask off SIMD lanes. But with BTD enabled, we are seeing that only the
last 4 components have valid stack IDs and the upper 4 components of the
register are zero.

Changing execution mask somehow populates the stack IDs properly.

This is on simulator, before changing the execution mask:
00000000 00000000 00000000 00000000  000F000E 000D000C 000B000A 00090008  00000000 00000000 00000000 00000000  000F000E 000D000C 000B000A 00090008  r1

After changing execution mask:
000F000E 000D000C 000B000A 00090008  00070006 00050004 00030002 00010000  000F000E 000D000C 000B000A 00090008  00070006 00050004 00030002 00010000  r1

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41409>
2026-05-21 20:25:46 +00:00
Caio Oliveira
e2402f6a07 brw: Bound register coalesce rewrites by live range
When updating a register after successfully finding a pair to coalesce,
use the live range of the source register to walk only the instructions
that might use it.  Depending on the shader this allows skipping a bunch
of blocks -- and also terminating early.
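
The bounded walk can be sketched in Python on a flat instruction list. The real pass works on blocks of brw instructions. The dictionary layout, `rewrite_uses`, and the inclusive `[live_start, live_end]` range are assumptions for illustration.

```python
def rewrite_uses(instructions, old_reg, new_reg, live_start, live_end):
    """Rename old_reg to new_reg in sources, visiting only the live range.

    Returns the number of instructions visited.
    """
    visited = 0
    for ip in range(live_start, live_end + 1):
        visited += 1
        srcs = instructions[ip]["srcs"]
        instructions[ip]["srcs"] = [new_reg if s == old_reg else s for s in srcs]
    return visited
```

Instructions outside the live range cannot reference the register, so skipping them changes how much work is done but not the result.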

Below are fossil compilation times on an MTL machine compiling shaders
for a BMG GPU. The big win here was for Cyberpunk 2077.

```
// Differences at 95.0% confidence.

// Rise of the Tomb Raider (n=20)
   -0.0095 +/- 0.00706877
   -1.90572% +/- 1.40609%

// Alan Wake (n=20)
   -0.031 +/- 0.0172806
   -0.93599% +/- 0.51952%

// Borderlands 3 (n=15)
   -0.353333 +/- 0.118679
   -2.44307% +/- 0.80787%

// Oblivion Remastered (n=15)
   -0.134 +/- 0.026008
   -2.76898% +/- 0.531637%

// Baldur's Gate 3 (n=15)
   -0.954286 +/- 0.163625
   -2.21713% +/- 0.377562%

// Cyberpunk 2077 (n=20)
   -2.8665 +/- 0.228489
   -8.08661% +/- 0.621779%
```

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41495>
2026-05-21 18:32:36 +00:00
Caio Oliveira
821a812c7d brw: Don't directly use regs_read/regs_written/size_read as bound for non-trivial loops
Instead save to a local variable and use that.  In various cases the
compiler is not able to pull it out of the loop, since there are other
not inlined function calls as part of the loop's body, resulting in
repeated unnecessary calls to either size_read() or its pieces that
get inlined.
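
The effect can be modeled in Python by counting calls. Python re-evaluates a `while` condition on every iteration, which mirrors the case where the C compiler cannot hoist the call. The `Inst` class and both walker functions are hypothetical.

```python
class Inst:
    """Stand-in instruction that counts how often size_read() is called."""

    def __init__(self, size):
        self.size = size
        self.calls = 0

    def size_read(self):
        self.calls += 1
        return self.size


def walk_recomputing_bound(inst):
    i = 0
    while i < inst.size_read():  # bound re-evaluated on every check
        i += 1
    return inst.calls


def walk_with_saved_bound(inst):
    n = inst.size_read()  # evaluated once, kept in a local
    i = 0
    while i < n:
        i += 1
    return inst.calls
```

For a size of 4, the first walker calls `size_read()` five times (four passing checks plus the failing one), and the second calls it once.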

Below are fossil compilation times on an MTL machine compiling shaders
for a BMG GPU:

```
// Differences at 95.0% confidence.

// Rise of the Tomb Raider (n=20)
   -0.017 +/- 0.00724575
   -3.45177665% +/- 1.45084%

// Alan Wake (n=20)
   -0.153 +/- 0.00960067
   -4.99265786% +/- 0.303695%

// Borderlands 3 (n=14)
   -0.486428571 +/- 0.15354
   -3.51248195% +/- 1.0835%

// Oblivion Remastered (n=14)
   -0.143571429 +/- 0.0357991
   -3.05749924% +/- 0.747872%

// Baldur's Gate 3 (n=14)
   -1.68928571 +/- 0.151598
   -4.12128605% +/- 0.364259%
```

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41496>
2026-05-21 18:04:14 +00:00
Caio Oliveira
3f71aab327 brw: Pass VGRF numbers to liveness helpers
Compute var_from_reg() once in setup_def_use() and pass the variable
number to setup_one_read() and setup_one_write().  This lets the loops walk
consecutive variable numbers directly instead of mutating a brw_reg offset.

Also: setup_one_write() is only called for VGRFs, so remove the check
for VGRF there.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41496>
2026-05-21 18:04:14 +00:00
Caio Oliveira
9975a35f43 brw: Avoid unnecessary calls to size_read() in flags_read()
Only ARF sources are relevant in this case, so check the file
before calling size_read().

Below are fossil compilation times on an MTL machine compiling shaders
for a BMG GPU:

```
// Differences at 95.0% confidence.

// Rise of the Tomb Raider (n=20)
   No difference proven

// Alan Wake (n=20)
   -0.0725 +/- 0.0139437
   -2.30965276% +/- 0.438787%

// Borderlands 3 (n=14)
   -0.248571429 +/- 0.135107
   -1.76946153% +/- 0.954171%

// Oblivion Remastered (n=14)
   -0.0735714286 +/- 0.0235712
   -1.54770849% +/- 0.492117%

// Baldur's Gate 3 (n=14)
   -0.832142857 +/- 0.23095
   -1.98028217% +/- 0.545648%
```

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41496>
2026-05-21 18:04:13 +00:00
Caio Oliveira
bb8d8a2141 brw: Call size_read() once in regs_read()
regs_read() itself gets inlined, but size_read() does not.  In GCC
release builds this results in three calls to size_read() at each site,
one of them due to how MIN2 is expanded.  Use a local variable to store
the result.

Below are fossil compilation times on an MTL machine compiling shaders
for a BMG GPU:

```
// Differences at 95.0% confidence.

// Rise of the Tomb Raider (n=20)
   -0.013 +/- 0.00596452
   -2.56410256% +/- 1.15623%

// Alan Wake (n=20)
   -0.1755 +/- 0.0144896
   -5.29491628% +/- 0.425556%

// Borderlands 3 (n=14)
   -0.562142857 +/- 0.129678
   -3.84765816% +/- 0.870239%

// Oblivion Remastered (n=14)
   -0.0821428571 +/- 0.0262485
   -1.69867061% +/- 0.537247%

// Baldur's Gate 3 (n=14)
   -1.61357143 +/- 0.21693
   -3.69788342% +/- 0.486462%
```

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41496>
2026-05-21 18:04:13 +00:00
Caio Oliveira
3850922b78 brw: Save original regs_written() value in register coalesce
The instruction may get transformed, modifying the destination before
the loop index gets incremented.  So save the original regs_written
value to be used in the loop increment.
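
A toy Python model of the hazard. The dictionaries, the function name, and the forced rewrite to `1` are stand-ins for a transform that changes the destination. None of it is taken from the pass.

```python
def total_registers_advanced(insts, save_original):
    """Sum the per-instruction step used to advance a register index."""
    advanced = 0
    for inst in insts:
        original = inst["regs_written"]
        # Stand-in for a transform that rewrites the destination mid-loop.
        inst["regs_written"] = 1
        # Reading the field again after the transform yields the new value.
        advanced += original if save_original else inst["regs_written"]
    return advanced
```

Stepping by the value read after the transform advances too little in this model, while the saved value preserves the intended stride.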

While we are here, assert that all the slots in mov[] are filled
at this point in the code.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41496>
2026-05-21 18:04:13 +00:00
Michael Cheng
ec778a297f brw: Fix ordered dependency exec_all handling on Xe2+
On Xe2+ the Wa_1407528679 NoMask workaround is disabled, so
baked_ordered_dependency_mode() should treat all instructions as
exec_all, matching the logic in gather_inst_dependencies() and
emit_inst_dependencies().

Without this, ordered RegDist dependencies from uniform/WE_all
producers (e.g. 'mov s0, imm') are not found during baking and
fall through as separate WE_all SYNC NOPs. Real shaders pile up
dozens of these in front of masked sends.

v2(Caio): Fix existing scalar_register test expectations

Signed-off-by: Michael Cheng <michael.cheng@intel.com>
Fixes: 47a6ef3fef ("brw/scoreboard: Use a predicate helper for the nomask workaround")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41713>
2026-05-21 16:50:50 +00:00
Caio Oliveira
26e832d069 brw/scoreboard: Add disabled tests for RegDist baking on Xe2+
Add two tests verifying that ordered RegDist dependencies from
uniform/WE_all producers are baked into the consumer's SWSB on Xe2+.
Disabled for now since they fail on current main.

Reviewed-by: Michael Cheng <michael.cheng@intel.com>
Assisted-by: Pi coding agent (Opus-4.7)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41713>
2026-05-21 16:50:50 +00:00
Alyssa Rosenzweig
3a447b4065 jay: use new fs payload variable more
blow up harder if we try to load stuff in the wrong stage

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>
2026-05-21 15:34:46 +00:00
Kenneth Graunke
ababf12b04 jay: add a hack until we munge barycentrics dynamically
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>
2026-05-21 15:34:46 +00:00
Kenneth Graunke
a56aa9547b jay: Call constant folding before collecting FS outputs
Fixes "multiple stores to the same location" assertions in tests like
dEQP-VK.pipeline.monolithic.color_write_enable_maxa.cwe_after_bind.attachments3_more0

In that case, the stores were actually to different locations, but some
constant additions hadn't been folded into the location field yet.
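
A minimal Python illustration of how unfolded constants can produce a false duplicate. Stores are modeled as `(base_location, pending_constant_offset)` pairs. That representation and the function name are assumptions, not how jay or NIR represent stores.

```python
def find_duplicate_locations(stores, fold_first):
    """Return locations that appear to be stored more than once."""
    seen = set()
    duplicates = []
    for base, offset in stores:
        # Folding adds the pending constant into the location first.
        location = base + offset if fold_first else base
        if location in seen:
            duplicates.append(location)
        seen.add(location)
    return duplicates
```

Two stores with the same base but different pending offsets look identical until the offset is folded in.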

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>
2026-05-21 15:34:46 +00:00
Kenneth Graunke
23884ee02c jay: Prohibit JAY_STRIDE_8 for EXPAND_QUAD
No idea why we're getting a stride 8 here, but we can't handle it.
Fixes baldurs_gate_3.vk.foz --graphics-pipeline-range 2248 2249.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>
2026-05-21 15:34:46 +00:00
Alyssa Rosenzweig
a9525f4b44 jay: hack for sample position
Adding this to the list of design constraints for the next RA rework.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>
2026-05-21 15:34:46 +00:00
Alyssa Rosenzweig
1e31be0e52 jay: fix omask on single sample
dEQP-GLES31.functional.shaders.sample_variables.sample_mask.discard_half_per_pixel.singlesample_rbo

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>
2026-05-21 15:34:46 +00:00
Kenneth Graunke
6a02e228bc jay: Implement load_fs_config_intel
We could lower this to load_push_data_intel in NIR, but it's trivial,
and probably less code to just implement it directly.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>
2026-05-21 15:34:46 +00:00
Kenneth Graunke
3d91cb9d1e jay: Implement coverage mask
This is the actual MSAA coverage mask.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>
2026-05-21 15:34:46 +00:00
Kenneth Graunke
35622f165f jay, nir: Make a dispatch_mask_intel intrinsic
jay is trying to use the fragment shader dispatch mask for helper
invocation lowering, but it was using load_sample_mask_in for that
(now load_coverage_mask_intel).  But this isn't the MSAA coverage
mask, the two are different payload fields.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>
2026-05-21 15:34:46 +00:00
Kenneth Graunke
0f3a311591 jay: Implement sample position
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>
2026-05-21 15:34:46 +00:00
Kenneth Graunke
a590500802 jay: Add a GPR_FROM_UGPRS opcode
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>
2026-05-21 15:34:46 +00:00
Kenneth Graunke
4555cd23c6 jay: Set Dispatch GRF Start Register in jay_setup_payload()
We want it to be set to wherever the push constants ended up.
Setting it close to the setup_payload_push() call makes this easier.

We'll also be adding some extra UGPRs for the fragment shader payload
soon, and the partitioning code will just have one big UGPR partition
for payload fields, push constants, and general purpose UGPRs, so it
really won't know how to do this very well without duplicating a bunch
of information.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>
2026-05-21 15:34:46 +00:00
Kenneth Graunke
0670b40013 jay: Add comments summarizing the PS thread payload layout
The documentation is large and hard to follow due to all the optional
fields and the SIMD16 vs. SIMD32 split for barycentrics.  This quick
summary helps clarify what fields exist, which are split for SIMD32
or kept together, and which pairs of registers are involved for splits.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>
2026-05-21 15:34:46 +00:00
Kenneth Graunke
6c142f7edc jay: Implement sample mask writes
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>
2026-05-21 15:34:46 +00:00