Commit graph

213310 commits

Author SHA1 Message Date
Ian Romanick
d2077e24f6 brw/disasm: Pretty print the BFN equation as an annotation
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
2025-10-10 17:25:09 +00:00
Ian Romanick
fdb01f2a5a brw/disasm: Fix BFN disassembly of src1 and src2
The negate and abs bits of src1 and src2 are repurposed for some of the
function control value bits.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
2025-10-10 17:25:09 +00:00
Zach Battleman
ca2a067469 brw: Initial bits of BFN support
v2 (idr): So much rebasing. Deleted a bunch of code that we're not
going to need yet.

v3 (Ken): bfn inst encoding fix

v4 (idr): Add BFN to brw_get_lowered_simd_width.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
2025-10-10 17:25:09 +00:00
Ian Romanick
f7939f2fdc nir/range_analysis: Handle bfi and bitfield_select in get_alu_uub
I noticed some things related to this while implementing support for
bitfield_select / BFN in BRW.

shader-db:

Lunar Lake
total instructions in shared programs: 17183140 -> 17183128 (<.01%)
instructions in affected programs: 3830 -> 3818 (-0.31%)
helped: 6 / HURT: 0

total cycles in shared programs: 889936934 -> 889936056 (<.01%)
cycles in affected programs: 253758 -> 252880 (-0.35%)
helped: 4 / HURT: 2

No shader-db changes on any other Intel platform.

fossil-db:

Lunar Lake
Totals:
Instrs: 233285343 -> 233284796 (-0.00%); split: -0.00%, +0.00%
Cycle count: 32756777978 -> 32756399804 (-0.00%); split: -0.00%, +0.00%
Max live registers: 71738646 -> 71738626 (-0.00%)
Non SSA regs after NIR: 67837900 -> 67837902 (+0.00%)

Totals from 177 (0.02% of 790723) affected shaders:
Instrs: 389849 -> 389302 (-0.14%); split: -0.14%, +0.00%
Cycle count: 356341872 -> 355963698 (-0.11%); split: -0.11%, +0.01%
Max live registers: 39364 -> 39344 (-0.05%)
Non SSA regs after NIR: 70453 -> 70455 (+0.00%)

Meteor Lake, DG2, and Ice Lake had similar results. (Meteor Lake shown)
Totals:
Instrs: 264095611 -> 264095358 (-0.00%)
Cycle count: 26555705299 -> 26554303407 (-0.01%); split: -0.01%, +0.00%
Fill count: 613233 -> 613231 (-0.00%)

Totals from 123 (0.01% of 905547) affected shaders:
Instrs: 334830 -> 334577 (-0.08%)
Cycle count: 326531667 -> 325129775 (-0.43%); split: -0.65%, +0.22%
Fill count: 4145 -> 4143 (-0.05%)

Tiger Lake and Skylake had similar results. (Tiger Lake shown)
Totals:
Instrs: 269733849 -> 269733590 (-0.00%)
Cycle count: 25240548036 -> 25241435039 (+0.00%); split: -0.00%, +0.01%

Totals from 123 (0.01% of 903812) affected shaders:
Instrs: 338617 -> 338358 (-0.08%)
Cycle count: 326605644 -> 327492647 (+0.27%); split: -0.13%, +0.40%

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
2025-10-10 17:25:08 +00:00
Ian Romanick
aa53735b66 nir/algebraic: Prefer bfi over bitfield_select for bitfield_insert
Intel platforms will soon implement both bfi and bitfield_select. bfi is
more efficient for bitfield_insert.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
2025-10-10 17:25:08 +00:00
Ian Romanick
08ec408061 nir/algebraic: Optimize f2u of negative value to zero
The eliminated SENDs are from a single app that has a bunch of
fragment shaders with a sequence like:

    con 32    %495 = fmul! %203.i, %1 (0.000000)
    con 32    %496 = ffma! %203.j, %1 (0.000000), %495
    con 32    %497 = ffma! %203.k, %1 (0.000000), %496
    con 32    %498 = ffma! %203.l, %1 (0.000000), %497
    con 32    %499 = @load_reloc_const_intel (param_idx=1, base=0)
    con 32    %500 = @load_reloc_const_intel (param_idx=0, base=0)
    con 32    %501 = f2u32 %498
    con 32    %502 = umin %501, %172 (0x4)
    con 32    %503 = ishl %502, %172 (0x4)
    con 32    %504 = load_const (0x00000040 = 64)
    con 32    %505 = umin %503, %504 (0x40)
    con 32    %506 = iadd %500, %505

The `f2u` is replaced with 0, and that makes the `ffma` dot-product
sequence be unused. Since it is unused, most of the preceeding block
gets eliminated. A lot of instructions after the `f2u` are also
eliminated by other algebraic optimizations. Most importantly, %203 is
the result of a `load_ubo_uniform_block_intel` that is eliminated.

No shader-db changes on any Intel platform.

fossil-db:

All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 919895603 -> 919804051 (-0.01%); split: -0.01%, +0.00%
Send messages: 40892036 -> 40887569 (-0.01%)
Cycle count: 99176770712 -> 99174971806 (-0.00%); split: -0.00%, +0.00%
Max live registers: 190030365 -> 190030367 (+0.00%)
Max dispatch width: 47415040 -> 47415024 (-0.00%)
Non SSA regs after NIR: 228872538 -> 228863608 (-0.00%); split: -0.00%, +0.00%

Totals from 2234 (0.11% of 1955134) affected shaders:
Instrs: 1989743 -> 1898191 (-4.60%); split: -4.60%, +0.00%
Send messages: 44179 -> 39712 (-10.11%)
Cycle count: 25416114 -> 23617208 (-7.08%); split: -7.08%, +0.00%
Max live registers: 367357 -> 367359 (+0.00%)
Max dispatch width: 39184 -> 39168 (-0.04%)
Non SSA regs after NIR: 471173 -> 462243 (-1.90%); split: -1.90%, +0.00%

Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
2025-10-10 17:25:08 +00:00
Ian Romanick
5667459ff1 nir/algebraic: Don't introduce undefined behavior in f2u conversion
If the source -1.0 < x < 0.0, simply removing the ftrun will introduce
undefined behavior. By chance of how at least Intel and NVIDIA GPUs
implement f2u, this has Just Worked.

No shader-db changes on any Intel platform.

fossil-db:

Lunar Lake
Totals:
Instrs: 913264354 -> 913264366 (+0.00%)
Cycle count: 104953995530 -> 104953996854 (+0.00%)
Max live registers: 189266026 -> 189266058 (+0.00%)
Non SSA regs after NIR: 227779417 -> 227779369 (-0.00%)

Totals from 24 (0.00% of 1984794) affected shaders:
Instrs: 4669 -> 4681 (+0.26%)
Cycle count: 50610 -> 51934 (+2.62%)
Max live registers: 1222 -> 1254 (+2.62%)
Non SSA regs after NIR: 1174 -> 1126 (-4.09%)

Meteor Lake, DG2, Tiger Lake, and Ice Lake had similar results. (Meteor Lake shown)
Totals:
Instrs: 1001288026 -> 1001288038 (+0.00%)
Cycle count: 92813392671 -> 92813392791 (+0.00%)
Max live registers: 121935383 -> 121935399 (+0.00%)
Max dispatch width: 19949928 -> 19949912 (-0.00%)

Totals from 2 (0.00% of 2284670) affected shaders:
Instrs: 1380 -> 1392 (+0.87%)
Cycle count: 18940 -> 19060 (+0.63%)
Max live registers: 136 -> 152 (+11.76%)
Max dispatch width: 32 -> 16 (-50.00%)

No fossil-db changes on Skylake.

Suggested-by: Georg Lehmann
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
2025-10-10 17:25:07 +00:00
Ian Romanick
4338f7d033 nir/algebraic: Remove useless ftrunc inside f2i/f2u
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
2025-10-10 17:25:07 +00:00
Ian Romanick
c49d6e0480 nir/algebraic: Elide range clamping of f2u sources
There are no shader-db changes on ELK platforms because those platforms
don't support 8- or 16-bit integer types.

v2: Restrict patterns generated such that the integer limits are exactly
representable in the specified floating point format. With the exception
of the value 0, this requires that float_sz > int_sz. This had no impact
on shader-db or fossil-db on any Intel platform. Noticed by Georg.

v3: Add a missing is_a_number.

shader-db:

All Intel platforms had similar results. (Lunar Lake shown)
total cycles in shared programs: 889936056 -> 889934082 (<.01%)
cycles in affected programs: 65806 -> 63832 (-3.00%)
helped: 2 / HURT: 0

fossil-db:

Lunar Lake
Totals:
Instrs: 233284796 -> 233282917 (-0.00%); split: -0.00%, +0.00%
Cycle count: 32756399804 -> 32754972188 (-0.00%); split: -0.01%, +0.00%
Spill count: 519861 -> 519813 (-0.01%)
Fill count: 663650 -> 663626 (-0.00%); split: -0.01%, +0.01%
Max live registers: 71738626 -> 71738696 (+0.00%)
Non SSA regs after NIR: 67837902 -> 67837648 (-0.00%)

Totals from 1236 (0.16% of 790723) affected shaders:
Instrs: 2134504 -> 2132625 (-0.09%); split: -0.09%, +0.01%
Cycle count: 604922278 -> 603494662 (-0.24%); split: -0.48%, +0.25%
Spill count: 16509 -> 16461 (-0.29%)
Fill count: 32760 -> 32736 (-0.07%); split: -0.22%, +0.15%
Max live registers: 250112 -> 250182 (+0.03%)
Non SSA regs after NIR: 302368 -> 302114 (-0.08%)

Meteor Lake, DG2, and Tiger Lake had similar results. (Meteor Lake shown)
Totals:
Instrs: 264095370 -> 264094056 (-0.00%); split: -0.00%, +0.00%
Cycle count: 26554146277 -> 26553027268 (-0.00%); split: -0.01%, +0.01%
Spill count: 530603 -> 530615 (+0.00%)
Fill count: 613231 -> 613273 (+0.01%)
Max live registers: 46559041 -> 46559087 (+0.00%)

Totals from 1237 (0.14% of 905547) affected shaders:
Instrs: 2262517 -> 2261203 (-0.06%); split: -0.07%, +0.01%
Cycle count: 518219799 -> 517100790 (-0.22%); split: -0.59%, +0.37%
Spill count: 17518 -> 17530 (+0.07%)
Fill count: 32273 -> 32315 (+0.13%)
Max live registers: 128360 -> 128406 (+0.04%)

Ice Lake and Skylake had similar results. (Ice Lake shown)
Totals:
Instrs: 269849640 -> 269848198 (-0.00%); split: -0.00%, +0.00%
Cycle count: 26718329643 -> 26718289020 (-0.00%); split: -0.00%, +0.00%
Max live registers: 46878430 -> 46878462 (+0.00%)

Totals from 1233 (0.14% of 905427) affected shaders:
Instrs: 2324225 -> 2322783 (-0.06%); split: -0.06%, +0.00%
Cycle count: 531467501 -> 531426878 (-0.01%); split: -0.11%, +0.10%
Max live registers: 130782 -> 130814 (+0.02%)

Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
2025-10-10 17:25:07 +00:00
Ian Romanick
073ffceef6 elk: Enable saturating float to integer conversion opcodes
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
2025-10-10 17:25:06 +00:00
Ian Romanick
65e8220180 brw: Enable saturating float to integer conversion opcodes
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
2025-10-10 17:25:06 +00:00
Ian Romanick
986086c846 nir: Add saturating float to integer conversion opcodes
v2: Add a comment around has_f2[ui]_sat explaining which opcodes it
enables. Suggested by Georg. Cast u_uintN_max and friends to double in
nir_opcodes.py. This ensures that an exact conversion is made.
Eliminate duplicate conversions from half float to double. Both noticed
by Georg.

v3: Apply "NaN should be zero" fix suggested by Georg.

Co-authored-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
2025-10-10 17:25:05 +00:00
Pohsiang (John) Hsu
5ce8b34a10 mediafoundation: update version to 1.07
Reviewed-by: Yubo Xie <yuboxie@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37820>
2025-10-10 09:36:53 -07:00
Pohsiang (John) Hsu
03baa8ac72 mediafoundation: remove extra ';'
Reviewed-by: Yubo Xie <yuboxie@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37820>
2025-10-10 09:36:44 -07:00
Pohsiang (John) Hsu
eb088e339f mediafoundation: periodic clang format - no code changes
Reviewed-by: Yubo Xie <yuboxie@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37820>
2025-10-10 09:36:13 -07:00
Pohsiang (John) Hsu
d35735b32d mediafoundation: create sample allocator for SW input sample on demand to save video memory
Reviewed-by: Yubo Xie <yuboxie@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37820>
2025-10-10 09:35:58 -07:00
Silvio Vilerino
5061b7ba1a mediafoundation: mftransform async slices parsing, avoid heap allocation inside loop
Reviewed-by: Pohsiang (John) Hsu <pohhsu@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37820>
2025-10-10 09:34:39 -07:00
Ashish Chauhan
6e189ba6c1 pvr: Drop '-experimental' suffix from the 'imagination' build option
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
The imagination-experimental flag has been replaced with the imagination flag,
as the driver is now Vulkan conformant.

Signed-off-by: Ashish Chauhan <Ashish.Chauhan@imgtec.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37761>
2025-10-10 15:29:04 +00:00
Ashish Chauhan
1143363a4f pvr: Drop broken driver environment variable check for BXS-4-64
The Imagination PowerVR Vulkan driver is now conformant on BXS-4-64, so drop the
PVR_I_WANT_A_BROKEN_VULKAN_DRIVER runtime check for this GPU. This eliminates
the need for the user to explicitly opt in via an environment variable.

Signed-off-by: Ashish Chauhan <Ashish.Chauhan@imgtec.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37761>
2025-10-10 15:29:04 +00:00
Frank Binns
206bef1560 docs/features: claim vk 1.2 for pvr
Although the PowerVR driver isn't passing Vulkan 1.2 conformance yet, all the
required support has been implemented and it's very close to passing all the
tests now.

Signed-off-by: Frank Binns <frank.binns@imgtec.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37761>
2025-10-10 15:29:03 +00:00
Ashley Smith
a8fb3671e8 panfrost,mesa: Fix versions for EXT_shader_clock
ES version was missed from extension table

Fixes: 2ce20170 ("mesa: Add support for GL_EXT_shader_clock")
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Signed-off-by: Ashley Smith <ashley.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37794>
2025-10-10 14:58:34 +00:00
Ashley Smith
09d86f9863 panfrost,mesa: Fix versions for EXT_shader_realtime_clock
ES version was missed from extension table

Fixes: c5500cd1 ("mesa: Add support for GL_EXT_shader_realtime_clock")
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Signed-off-by: Ashley Smith <ashley.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37794>
2025-10-10 14:58:34 +00:00
Hans-Kristian Arntzen
2848901722 radv: Actually fail custom border color sampler creation.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Fixes: a52483d9e7 ("radv: fix capture/replay with sampler border color")
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37787>
2025-10-10 14:25:54 +00:00
Samuel Pitoiset
183ed8046c radv: allow VK_FORMAT_S8_UINT with host image copy
Depth/stencil formats still need to be properly implemented.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37748>
2025-10-10 13:46:51 +00:00
Samuel Pitoiset
ef900e93fc ac/surface: fix host image copies with stencil-only
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37748>
2025-10-10 13:46:51 +00:00
Samuel Pitoiset
9a7f1401d8 ac/surface: fix host image copies with 96-bits formats
Fixes dEQP-VK.image.host_image_copy.simple.r32g32b32_* with
RADV_PERFTEST=hic on RADV.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37748>
2025-10-10 13:46:51 +00:00
Samuel Pitoiset
d063072182 radv: rename radv_mark_descriptor_sets_dirty()
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Descriptor heaps will be marked as dirty in this function too.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37786>
2025-10-10 13:22:05 +00:00
Samuel Pitoiset
34b3dae3b6 radv: make radv_descriptor_get_va() a static function
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37786>
2025-10-10 13:22:05 +00:00
Samuel Pitoiset
08dbab0600 radv: rename shader arg descriptor_sets to descriptors
It's more generic and descriptor heaps will use it too.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37786>
2025-10-10 13:22:03 +00:00
Samuel Pitoiset
609ae4e647 radv: rename indirect_descriptor_sets to indirect_descriptors
With descriptor heap the driver will also have to emit indirect
descriptor heaps in some cases.

Rename couple of things to make them more generic.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37786>
2025-10-10 13:22:03 +00:00
Samuel Pitoiset
0ff1ce4ac5 radv: use force_indirect_desc_sets when creating RT prologs
This is cleaner and this field has been added exactly for that.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37786>
2025-10-10 13:22:02 +00:00
Samuel Pitoiset
055b10a75c radv: do not initialize HiZ on transfer queue on RDNA4
Emitting compute dispatches on SDMA would just hang.

This fixes pending depth/stencil copy tests on transfer queue with
RADV_PERFTEST=transfer_queue.

Fixes: e6c485afb0 ("radv: initialize HiZ metadata during image layout transitions")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37795>
2025-10-10 12:50:02 +00:00
Martin Roukala (né Peres)
0fbd9e3894 zink/ci: run the a750 job in pre-merge
In order to fit within the time budget, we parallelize the job.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37758>
2025-10-10 11:48:51 +00:00
Martin Roukala (né Peres)
33d7be0d9f turnip/ci: squeeze a750-vk into 4 jobs
The drop in coverage should allow us enable to pre-merge testing on
zink.

While we are at it, I used the result of a previous stress test to
prune the flakes list.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37758>
2025-10-10 11:48:51 +00:00
Lionel Landwerlin
b8ae4ede60 brw: add serialize send stats
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Alyssa Anne Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37394>
2025-10-10 11:19:39 +00:00
Lionel Landwerlin
37a9c5411f brw: serialize messages on Gfx12.x if required
The Intel EU fusion feature needs to be disabled on SEND messages
where either the texture handle, sampler handle, sampler header is not
identical on fused threads.

This is the case in particular with accesses on non-uniform
texture/sampler handles but could also strike with dynamic
programmable offsets (currently disabled).

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Anne Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37394>
2025-10-10 11:19:39 +00:00
Lionel Landwerlin
301b71a19f compiler: add an access flag for intel EU fusion
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Alyssa Anne Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37394>
2025-10-10 11:19:39 +00:00
Lionel Landwerlin
c7ac46a1d8 nir/lower_io: add get_io_index_src_number support for image intrinsics
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alyssa Anne Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37394>
2025-10-10 11:19:39 +00:00
Lionel Landwerlin
ca1533cd03 nir/divergence: add a new mode to cover fused threads on Intel HW
The Intel Gfx12.x generation of GPU has an architecture feature called
EU fusion in which 2 subgroups run lock step. A typical case where
this happens is a compute shader with 1x1x1 local workgroup size and a
dispatch command of 2x1x1. In that case 2 threads will be run in lock
step for each of the workgroup.

This has been the sources of some troubles in the backend because one
subgroup can run with all lanes disabled, requiring care for SEND
messages using the NoMask flag (execution regardless of the lane mask).

We found out that other things are happening when 2 subgroups run
together :
  - the HW will use the surface/sampler handle from only one subgroup
  - the HW will use the sampler header from only one subgroup

So one of the fused subgroup can access the wrong surface/sampler if
the value is different between the 2 subgroups and that can happen
even with subgroup uniform values.

Fortunately we can flag SEND instructions to disable the fusion
behavior (most likely at a performance cost).

This change introduce a new divergence mode that tries to compute
things divergent between subgroups so that we can flag instructions
accordingly.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37394>
2025-10-10 11:19:39 +00:00
Simon Perretta
79923115e7 nir/unlower_io_to_vars: keep io bases intact when keeping intrinsics
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
nir_recompute_io_bases will modify i/o intrinsics, which is not the
expected behaviour when the keep_intrinsics flag is set.

Fixes: 83aecc8f3f ("mesa/st, nir: commonize unlower_io_to_vars pass")
Signed-off-by: Simon Perretta <simon.perretta@imgtec.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37725>
2025-10-10 11:53:24 +01:00
Kenneth Graunke
dd9e002129 brw: Fix mesh shader asserts in clip/cull distance setting
mesh doesn't use brw_vue_prog_data.  Also, I had been catching TCS
shaders here, and shouldn't.

Fixes: bf76e86bc8 ("brw: Refactor clip/cull distance mask setting into a helper")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37809>
2025-10-10 09:51:26 +00:00
Juan A. Suarez Romero
9f45f09b86 glsl: use array element type to validate assignment
When comparing an vec3 and a vec4 array, scalar type is the same for
both (float). Instead use the array element type to compare (that is,
vec3 vs vec4).

Fixes
spec@glsl-1.20@compiler@invalid-vec4-array-to-vec3-array-conversion.vert
piglit test.

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37783>
2025-10-10 09:19:55 +00:00
Job Noorman
6d59a3e3e7 nir/lower_alu: use Knuth's Algorithm M for [iu]mul_high
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
This significantly simplifies the handling of signed numbers as the same
code path can handle signed and unsigned numbers by simply using ishr
instead of ushr for some of the shifts. For both cases, the number of
additions and shifts are also reduced.

Note that LLVM uses the same algorithm.

fossil-db stats for Turnip:

Totals from 4849 (2.94% of 164705) affected shaders:
MaxWaves: 52318 -> 52332 (+0.03%); split: +0.04%, -0.02%
Instrs: 5262458 -> 5218922 (-0.83%); split: -0.87%, +0.05%
CodeSize: 10831900 -> 10655170 (-1.63%); split: -1.64%, +0.01%
NOPs: 829481 -> 836010 (+0.79%); split: -0.95%, +1.74%
MOVs: 176187 -> 173788 (-1.36%); split: -3.27%, +1.91%
COVs: 104096 -> 86543 (-16.86%); split: -16.87%, +0.01%
Full: 90434 -> 90158 (-0.31%); split: -0.33%, +0.03%
(ss): 131091 -> 130866 (-0.17%); split: -0.87%, +0.70%
(sy): 55550 -> 55769 (+0.39%); split: -0.92%, +1.32%
(ss)-stall: 406003 -> 407194 (+0.29%); split: -1.10%, +1.39%
(sy)-stall: 1668213 -> 1678082 (+0.59%); split: -1.31%, +1.90%
Preamble Instrs: 1105270 -> 1067290 (-3.44%); split: -3.50%, +0.06%
Constlen: 423776 -> 423560 (-0.05%)
Last helper: 1038202 -> 1035540 (-0.26%); split: -0.42%, +0.16%
Last baryf: 38908 -> 38632 (-0.71%)
Subgroup size: 336640 -> 336832 (+0.06%)
Cat0: 916209 -> 922848 (+0.72%); split: -0.87%, +1.59%
Cat1: 282813 -> 262845 (-7.06%); split: -7.49%, +0.43%
Cat2: 2198715 -> 2183012 (-0.71%); split: -0.72%, +0.01%
Cat3: 1390914 -> 1376421 (-1.04%)
Cat7: 123127 -> 123116 (-0.01%); split: -0.24%, +0.23%

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37793>
2025-10-10 05:31:17 +00:00
Job Noorman
18f69890d1 nir: add nir_shr builder
Sometimes we need to select between ishr/ushr based some condition; this
builder makes this less verbose.

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37793>
2025-10-10 05:31:17 +00:00
Olivia Lee
10a8defecc util/macros: coerce likely/unlikely to bool even without __builtin_expect
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Coercing the argument to a bool when we have __builtin_expect but
leaving it unmodified otherwise is a recipe for really subtle bugs. I
don't know if any bugs like that exist currently, but I almost
introduced one in panfrost.

Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37801>
2025-10-10 01:37:28 +00:00
Lionel Landwerlin
196c7903b9 anv: fix companion usage for emulated image
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
We need to return true if we need the companion batch.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: e60416b4e4 ("anv: use companion batch for operations with HIZ/STC_CCS destination")
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lucas Fryzek <lfryzek@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37797>
2025-10-09 21:38:33 +00:00
Kenneth Graunke
bb096b0f12 brw: Use BITFIELD_{MASK,RANGE} in clip/cull distance mask handling code
Suggested by Alyssa.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37784>
2025-10-09 13:20:04 -07:00
Kenneth Graunke
bf76e86bc8 brw: Refactor clip/cull distance mask setting into a helper
This was copy pasted between 4 different stages.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37784>
2025-10-09 13:20:03 -07:00
Kenneth Graunke
b3c511592a brw: Replace type_size_xvec4 with glsl_count_attribute_slots
This is nearly identical, except for bindless sampler/texture/image
handling.  But we only use it for inputs/outputs, not uniforms, where
there are no bindless handles to worry about.

Deletes a lot of mostly-duplicated code.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37784>
2025-10-09 13:20:00 -07:00
Kenneth Graunke
a12f117cef brw: Stop using type_size_dvec4 for fragment shader outputs
There are no 64-bit renderable formats so we can't have FS outputs that
are dvecs.  This dates back to 2016 and a ton of the backend has been
rewritten, so I think whatever this was trying to solve is no longer a
problem.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37784>
2025-10-09 13:19:56 -07:00