Commit graph

217839 commits

Author SHA1 Message Date
Olivia Lee
4959f45e99 Revert "panvk: advertise VK_EXT_primitives_generated_query on v10+"
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
This reverts commit 6eadcaa851.

VK_EXT_primitives_generated_query has a dependency on
VK_EXT_transform_feedback, which we do not implement yet. This is
breaking the android CTS. It will be reenabled once transform feedback
is in.

Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39547>
2026-01-27 23:34:19 +00:00
Georg Lehmann
1240444e63 spirv: assert fp_math_ctrl was reset after use
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39460>
2026-01-27 23:01:44 +00:00
Georg Lehmann
3deb57b654 spirv: remove vtn_builder::exact
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39460>
2026-01-27 23:01:44 +00:00
Georg Lehmann
51d30d0f96 spirv: consider both source and dest type for fast math
This matters for conversions and and comparisons.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39460>
2026-01-27 23:01:44 +00:00
Georg Lehmann
46a617884e spirv: use base type instead of bit size to determine fp_math_ctrl
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39460>
2026-01-27 23:01:44 +00:00
Georg Lehmann
565f37b98c spirv: handle fast_math for opencl opcodes
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39460>
2026-01-27 23:01:42 +00:00
Georg Lehmann
836efa8c3c spirv: move NoContraction handling into vtn_handle_fp_fast_math
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39460>
2026-01-27 23:01:42 +00:00
Iván Briano
5b48805b42 brw: fix local_invocation_index with quad derivaties on mesh/task shaders
For mesh/task shaders, the thread payload provides a local invocation
index, but it's always linear so it doesn't give the correct value when
quad derivatives are in use.
The lowering pass where all of this is done correctly for compute
shaders assumes load_local_invocation_index will be lowered in the
backend for mesh/task, calculates the values for the quads correctly but
then avoid replacing the original intrinsic and we remain with the wrong
results.

Add an intel specific intrinsic and always lower the generic one to that
(or whatever else was calculated) to avoid ambiguities and fix the value
for quad derivatives.

Fixes future CTS tests using mesh/task shaders under:
dEQP-VK.spirv_assembly.instruction.compute.compute_shader_derivatives.*

Fixes: d89bfb1ff7 ("intel/brw: Reorganize lowering of LocalID/Index to handle Mesh/Task")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39276>
2026-01-27 22:28:19 +00:00
Emma Anholt
eb990cd81e nir: Bump test timeouts.
nir_opt_algebraic_tests has been pushing our qemu-ed tests over the line.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39563>
2026-01-27 21:31:14 +00:00
Christian Gmeiner
d19460ffc4 etnaviv: Emit alpha_to_coverage dither table only when needed
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Instead of unconditionally emitting the dither table during GPU state
reset, only emit it when alpha_to_coverage is actually enabled in
the blend state. A tracking flag avoids redundant re-emission until the
next GPU state reset.

Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39557>
2026-01-27 21:14:33 +00:00
Georg Lehmann
2d38da94d4 aco: allow v_cmpx with DPP
The wording in the RDNA3 ISA doc was since clarified, v_cmpx with DPP
behaves exactly like one would expect:
FI controls whether the source value can be read from inactive lanes,
but inactive lanes always write a 0 bit. The same applies to v_cmp with DPP.

Foz-DB Navi48:
Totals from 987 (1.20% of 82405) affected shaders:
Instrs: 517003 -> 516445 (-0.11%); split: -0.11%, +0.00%
CodeSize: 2782688 -> 2780508 (-0.08%); split: -0.08%, +0.00%
Latency: 2059169 -> 2056327 (-0.14%); split: -0.14%, +0.00%
InvThroughput: 365374 -> 365328 (-0.01%); split: -0.03%, +0.01%
Copies: 64669 -> 65616 (+1.46%)
SALU: 70693 -> 70652 (-0.06%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39516>
2026-01-27 20:42:51 +00:00
Georg Lehmann
1c1bd9d090 aco: only apply DPP with 3 or less uses
Creating many new DPP instructions increases code size and decreases throughput.

Foz-DB Navi48:
Totals from 2196 (2.67% of 82179) affected shaders:
MaxWaves: 59930 -> 59960 (+0.05%); split: +0.08%, -0.03%
Instrs: 3718514 -> 3718298 (-0.01%); split: -0.08%, +0.07%
CodeSize: 20593544 -> 20507660 (-0.42%); split: -0.43%, +0.02%
VGPRs: 135924 -> 135744 (-0.13%); split: -0.17%, +0.04%
Latency: 33174704 -> 33163001 (-0.04%); split: -0.07%, +0.04%
InvThroughput: 6500723 -> 6491382 (-0.14%); split: -0.15%, +0.01%
VClause: 72348 -> 72343 (-0.01%); split: -0.06%, +0.05%
SClause: 83160 -> 83165 (+0.01%); split: -0.03%, +0.04%
Copies: 286592 -> 285575 (-0.35%); split: -0.45%, +0.09%
Branches: 99970 -> 99971 (+0.00%); split: -0.00%, +0.00%
PreSGPRs: 103280 -> 103279 (-0.00%)
PreVGPRs: 95590 -> 95440 (-0.16%); split: -0.30%, +0.14%
VALU: 1931369 -> 1931725 (+0.02%); split: -0.08%, +0.09%
SALU: 637663 -> 636780 (-0.14%); split: -0.15%, +0.01%
VOPD: 65236 -> 65589 (+0.54%); split: +0.91%, -0.37%

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39516>
2026-01-27 20:42:51 +00:00
Georg Lehmann
bb6a3e2891 aco/optimizer: rework how dpp is applied
Using the common helpers means we can use VINTERP instead of DPP,
which has higher throughput and smaller CodeSize.

Foz-DB Navi48:
Totals from 986 (1.20% of 82405) affected shaders:
Instrs: 1985282 -> 1985545 (+0.01%); split: -0.01%, +0.02%
CodeSize: 11179700 -> 11151780 (-0.25%); split: -0.26%, +0.01%
Latency: 19899190 -> 19897694 (-0.01%); split: -0.01%, +0.01%
InvThroughput: 4110650 -> 4104911 (-0.14%)
VClause: 44143 -> 44139 (-0.01%); split: -0.03%, +0.02%
Copies: 164340 -> 164344 (+0.00%); split: -0.02%, +0.02%
VALU: 1061904 -> 1061908 (+0.00%); split: -0.00%, +0.00%
SALU: 305980 -> 305974 (-0.00%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39516>
2026-01-27 20:42:51 +00:00
Georg Lehmann
228cb29dae aco/optimizer: allow DPP with scalar src1 in alu_opt_info_is_valid
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39516>
2026-01-27 20:42:51 +00:00
Georg Lehmann
d4c0318f48 aco: apply DPP with scalar src1 on gfx11.5+
Foz-DB Navi48:
Totals from 6261 (7.62% of 82179) affected shaders:
MaxWaves: 176284 -> 176236 (-0.03%); split: +0.01%, -0.03%
Instrs: 5850185 -> 5828451 (-0.37%); split: -0.41%, +0.04%
CodeSize: 31363324 -> 31419904 (+0.18%); split: -0.08%, +0.26%
VGPRs: 328284 -> 328200 (-0.03%); split: -0.07%, +0.05%
SpillSGPRs: 2268 -> 2256 (-0.53%)
Latency: 50235516 -> 50218816 (-0.03%); split: -0.06%, +0.03%
InvThroughput: 8256243 -> 8242036 (-0.17%); split: -0.22%, +0.05%
VClause: 81000 -> 80975 (-0.03%); split: -0.11%, +0.08%
SClause: 136376 -> 136387 (+0.01%); split: -0.11%, +0.11%
Copies: 414021 -> 417894 (+0.94%); split: -0.13%, +1.07%
Branches: 105301 -> 105298 (-0.00%); split: -0.00%, +0.00%
PreSGPRs: 291360 -> 291432 (+0.02%)
PreVGPRs: 238593 -> 238729 (+0.06%); split: -0.02%, +0.08%
VALU: 3425446 -> 3403463 (-0.64%); split: -0.65%, +0.01%
SALU: 815505 -> 819372 (+0.47%); split: -0.02%, +0.50%

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39516>
2026-01-27 20:42:51 +00:00
Georg Lehmann
3fe329b3d0 aco/ra: don't move sgpr into v_fmac_f32_dpp src0
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39516>
2026-01-27 20:42:50 +00:00
Georg Lehmann
903d940fa9 aco: don't convert VOP3P to VOP3 when applying DPP
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39516>
2026-01-27 20:42:50 +00:00
Georg Lehmann
8ac7b9fc37 aco: undo operand swap if applying DPP fails
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39516>
2026-01-27 20:42:50 +00:00
Georg Lehmann
531228159f aco/validate: allow dpp with scalar src1 on gfx11.5+
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39516>
2026-01-27 20:42:50 +00:00
Georg Lehmann
140ca3bb50 aco: disable DPP for rev integer subs and shifts
It is not documented anywhere, but at least on gfx12 and gfx10.3
DPP is applied to src1 instead of src0.
This might be useful for shifts, but to be safe just disable DPP
completely for now.

Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14739

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39516>
2026-01-27 20:42:49 +00:00
Georg Lehmann
510dbbae7f aco/optimizer: use opcode_supports_dpp
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39516>
2026-01-27 20:42:49 +00:00
Georg Lehmann
8e99bf5380 aco: add a helper function for non supported DPP opcodes
Cc: mesa-stable

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39516>
2026-01-27 20:42:49 +00:00
Eric Engestrom
d12e3454e6 nir/meson: fix cpp_args of nir_opt_algebraic_pattern_tests
Fixes: 4c30c44b75 ("nir: Generate unit tests for nir_opt_algebraic")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39550>
2026-01-27 20:03:16 +00:00
Nanley Chery
4512d81559 intel/blorp: Bump pitch when clearing unaligned bottom rows
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
This might be faster if the layer starts at a 64KB offset. No
performance benefits found in the performance CI.

Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>
2026-01-27 18:46:55 +00:00
Nanley Chery
3e331e4fe9 intel/blorp: Optimize non-zero-layer fast-clears
Allow surface redescription when fast-clearing a layer > 0. This affects
at least five traces in the performance CI, but the CI doesn't report
any performance benefit from this. We already had code to handle unaligned
rows at the bottom of an image. Now that this handles the misalignment at
the top of the image range, we gain some symmetry.

Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>
2026-01-27 18:46:55 +00:00
Nanley Chery
ba63883692 intel/blorp: Avoid unused surface redescription calc
Suggested-by: Jianxun Zhang <jianxun.zhang@intel.com>
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>
2026-01-27 18:46:54 +00:00
Nanley Chery
e42b2a5d70 anv: Don't partial resolve LOD1+ for non-FCV CCS
We don't allow fast-clears in this case.

Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>
2026-01-27 18:46:54 +00:00
Nanley Chery
21d187b7f5 anv: Support fast clears on more layers
On Xe2+, support multi-layer and non-zero-layer CCS fast-clears. To do
this in a simple manner, drop the code which splits multi-layer clears
into fast clears and slow clears. The performance CI reports no
regressions nor improvements on BMG.

For MCS on all platforms and for CCS on prior platforms, use a new
heuristic. Instead of only allowing fast clears on the first
slice/layer, do the following:

For 3D images, only fast-clear if all slices are cleared. Enables
fast-clearing every slice of 3D textures in:

   * Terminator Resistance - 480x270x128.
   * Ghostrunner 2 - 320x180x128.

For 2D arrays, match the Xe2+ behavior and allow clearing to any layer.
This is possible because we only allow fast-clearing if the clear color
matches the default value. Enables fast-clearing every layer of 2D array
textures in:

  * Assassin's Creed - 128x128, 6-layers.
  * Blackops 3 - 1024x1024, 6-layers.
  * Borderlands 3 - 128x128, 6-layers.
  * Cyberpunk - 1024x1024, 10-layers.
  * Unigine Superposition - 4K, 2-layers.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11893
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>
2026-01-27 18:46:54 +00:00
Nanley Chery
b8f6ad9060 anv: Use variable default value for some images using CLEAR
A future commit will enable clearing to more than the first layer of 2D
array images. To ensure consistency for the clear color, require the
ANV_FAST_CLEAR_DEFAULT_VALUE for such images if they make use of
ISL_AUX_STATE_CLEAR. Also, use a non-zero default value for some image
formats.

I tested the majority of workloads in the performance CI. This will
cause those which clear to 2D array layers to gain clears on more than
just the first layer. At the moment, we still only support clearing the
first layer, so there should be no change in performance. Affected games
are documented in the code.

Acked-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>
2026-01-27 18:46:53 +00:00
Nanley Chery
811c413f98 anv: Don't return the Xe2+ fast-clear type early
Don't return early from anv_layout_to_fast_clear_type() for Xe2+. We'll
need to make more use of the function for some MCS changes in later
commits.

Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>
2026-01-27 18:46:53 +00:00
Nanley Chery
7bb7b63b96 anv: Line wrap anv_CmdClearColorImage
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>
2026-01-27 18:46:52 +00:00
Nanley Chery
390c9e3fda anv: Inline the CCS/MCS predicated resolve functions
Now we can see the MI writes performed before and after the resolves in
transition_color_buffer().

Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>
2026-01-27 18:46:52 +00:00
Nanley Chery
4d8c71ab1f anv: Delete conversion of CCS_D partial resolve
Now that hasvk is the driver for supporting HSW and BDW, we no longer
need to convert CCS_D partial resolves to full resolves to avoid an
assert-failure in BLORP.

Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>
2026-01-27 18:46:51 +00:00
Nanley Chery
b1db1179c2 anv: Set compressed bit separately from fast-clear type
This will make handling fast-clears on multiple layers simpler by saving
us from having to pass more parameters into fast-clear state setting
functions.

It also allows us to set more complex fast-clear state for FCV_CCS_E
without marking the image as compressed.

Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>
2026-01-27 18:46:50 +00:00
Nanley Chery
c054d4fe2f anv: Support partial resolves on any level/layer
Enables more support for FCV_CCS_E partial resolves if we ever need it.
Also enables support for multiple layers being fast cleared and needing
resolves. Support for that will arrive in several commits.

Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>
2026-01-27 18:46:50 +00:00
Nanley Chery
0a8ab13b9d anv: Reset fast-clear type in transition_color_buffer()
Moving the code here will simplify the task of supporting fast-clears on
multiple array layers and depth slices.

Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>
2026-01-27 18:46:49 +00:00
Nanley Chery
ce196c9de5 anv: Fix the fast clear type for FCV writes
We started allowing non-default clear colors with FCV in commit
cd8e120b97. When rendering to an image with FCV, set the fast-clear
type to ANV_FAST_CLEAR_ANY if the image properties allow such
fast-clears.

Fixes: cd8e120b97 ("anv: Allow more single subresource fast-clears with FCV")
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>
2026-01-27 18:46:49 +00:00
Nanley Chery
e7854d06a5 anv: Update predicated resolve documentation
* Don't mention gfx7-8 due to the hasvk split.
* Account for the array of clear colors.

Fixes: 0e6b132a75 ("anv: Access more colors in fast_clear_memory_range")
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>
2026-01-27 18:46:48 +00:00
Nanley Chery
6c6b2d8f30 iris: Use the CLEAR state on Xe2+ for MCS
On Xe2+, HSD 14011946253 and the related documents explain that MCS
still only supports a single clear color.

Fixes: df006bba02 ("iris: Update aux state for color fast clears (xe2)")
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>
2026-01-27 18:46:48 +00:00
Nanley Chery
3b642f7456 iris: Set missing flags on clear color changes
When changing the clear color without a fast clear, use dirty bits to
ensure that surfaces with inline clear colors are updated and that
partial resolves are done as needed.

Remove the flags at the bottom of fast_clear_color() as
blorp_fast_clear() already sets them for us.

Fixes: 64d861b700 ("iris: Skip some fast-clears even on color changes")
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>
2026-01-27 18:46:48 +00:00
Nanley Chery
eb4a581e44 intel/isl: Fix QPitch of arrayed MCS
From RENDER_SURFACE_STATE::AuxiliarySurfaceQPitch on BDW+,

   This field must be set to an integer multiple of the Surface
   Vertical Alignment

Accomplish this by aligning the height of each MCS layer to main
surface's vertical alignment. Prevents the following test group from
failing on Xe2 when a future commit enables multi-layer fast-clears in
anv:

   dEQP-VK.api.image_clearing.*.
   clear_color_attachment.multiple_layers.
   *_clamp_input_sample_count_*

The main test I used to debug this:

   dEQP-VK.api.image_clearing.core.
   clear_color_attachment.multiple_layers.
   a8b8g8r8_unorm_pack32_64x11_clamp_input_sample_count_2

Backport-to: 25.3
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>
2026-01-27 18:46:47 +00:00
Connor Abbott
b95839b9c9 ir3: Fix barrier error case calculation
Make waves_per_wg actually be the waves per workgroup, so that the
condition below for when we should error out actually is correct.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39463>
2026-01-27 18:15:06 +00:00
Mel Henning
f3c53cf66b nvk: Disable large pages for now
Reviewed-by: Mary Guillemard <mary@mary.zone>
Fixes: cabfdb4404 ("nvk: Enable compression")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39364>
2026-01-27 17:58:40 +00:00
Georg Lehmann
4b1996b1c7 aco: fix demote in header of single iteration loop
The control is not divergent before a divergent break in a single iteration loop,
but we already pushed the loop mask on the stack.

Fixes: 90faadae72 ("aco/insert_exec_mask: don't disable dead quads on demote in divergent CF")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14733
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39528>
2026-01-27 17:39:05 +00:00
Duncan Brawley
b8889f5eaa pvr: add basic support for shader statistics framework
Mesa now has a statistics framework. This adds support for emitting
additional statistics about PowerVR shaders for the Rogue architecture.

Add support for emitting the following statistics: Code size, scratch
size, spill count, temp count, loop count, number of inst groups, number
of main inst groups, number of bitwise inst groups and number of control
inst groups.
Add support for new PCO_DEBUG_PRINT option "stats" to emit shader stats.

Signed-off-by: Duncan Brawley <duncan.brawley@imgtec.com>
Reviewed-by: Simon Perretta <simon.perretta@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39523>
2026-01-27 16:58:30 +00:00
Kenneth Graunke
41d7debcfe brw: Use nir_imul_imm in per-vertex/per-primitive offset calculation
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
This avoids generating some useless math that would need to be cleaned
up later, without complicating things too much.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39250>
2026-01-27 16:08:36 +00:00
Kenneth Graunke
24c66d3871 brw: Vectorize URB intrinsics using nir_opt_load_store_vectorize
This helps cut down URB messages on tessellation and mesh shaders
significantly.  fossil-db results on Battlemage:

   Instrs: 505172392 -> 505207187 (+0.01%); split: -0.00%, +0.01%
   Send messages: 23678197 -> 23656126 (-0.09%); split: -0.09%, +0.00%
   Cycle count: 63150470088 -> 63147482640 (-0.00%); split: -0.01%, +0.00%
   Spill count: 576554 -> 576616 (+0.01%)
   Fill count: 545304 -> 545413 (+0.02%)
   Max live registers: 141099192 -> 141150675 (+0.04%); split: -0.00%, +0.04%
   Max dispatch width: 39856192 -> 39856208 (+0.00%)

   Totals from 4231 (0.27% of 1583648) affected shaders:
   Instrs: 1620161 -> 1654956 (+2.15%); split: -0.25%, +2.40%
   Send messages: 128652 -> 106581 (-17.16%); split: -17.18%, +0.03%
   Cycle count: 24650700 -> 21663252 (-12.12%); split: -12.82%, +0.70%
   Spill count: 378 -> 440 (+16.40%)
   Fill count: 1308 -> 1417 (+8.33%)
   Max live registers: 364676 -> 416159 (+14.12%); split: -0.24%, +14.36%
   Max dispatch width: 67952 -> 67968 (+0.02%)

There are several reasons we didn't go with nir_opt_vectorize_io:

1. nir_opt_vectorize_io appears to work on the slot location level.
   We want to be able to vectorize based on the URB offsets, especially
   for cases like point size, layer, and viewport which have different
   VARYING_SLOT_* values but live in the same vec4 in a URB entry.

2. We want vec8 stores, and nir_opt_vectorize_io only seems to vectorize
   within a single 32-bit vec4.  It does handle 8 components, but that's
   only for packing 16-bit values into a 32-bit vec4.

Improves performance of Sascha Willems' tessellation demo by around 4%
on Meteorlake.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39250>
2026-01-27 16:08:36 +00:00
Kenneth Graunke
aafe8967fd brw: Avoid using URB global offset with per-slot offsets on <= Icelake
Both the URB Global Offset and Per-Slot Offsets are specified to be
unsigned numbers.  The URB Global Offset is only 11 bits, and so is
limited to be between [0, 2047].  While the per-slot offsets are
given as U32 values, it would appear that adding the two offsets
does not handle 32-bit overflow/unsigned wrap correctly.

This pops up in Piglit's TCS variable-indexing tests, which ends up
performing loads from offset (x - 16) and a base of 18, and at an offset
(x) with a base of 2.  These should be equivalent, but when x <= 15, the
per-slot offset calculated in the shader is negative (0xfffffff[0-f])
and adding the base of 18 is not wrapping around correctly to [2, 17].

To work around this, avoid using the global offset when the per-slot
offset is present, and just add the two in the shader where unsigned
wrap works correctly.

Tigerlake and later don't seem to have this issue.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39250>
2026-01-27 16:08:36 +00:00
Kenneth Graunke
07ac0e3463 brw: Skip vec8 store_urb_vec4_intel noop writemasks as well
We were checking for 0xf which is fine for vec4, but vec8 gets 0xff.
Either way, nothing is writemasked, so we can skip sending the mask.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39250>
2026-01-27 16:08:36 +00:00
Kenneth Graunke
dbb24ff56b brw: Assert that urb_vec4_intel stores only have 4/8 components
vec1-3, 5-7, and 9+ are not supported.  Only vec4 and vec8.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39250>
2026-01-27 16:08:36 +00:00