Commit graph

213345 commits

Author SHA1 Message Date
Mel Henning
e9432eb3e0 nvk/cmd_copy: Use PIPELINED for user transfers
Vulkan requires applications to insert any necessary pipeline barriers.

Reviewed-by: Mohamed Ahmed <mohamedahmedegypt2001@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37671>
2025-10-11 16:58:24 +00:00
Mel Henning
08861bad46 nvk: WFI on the most recent subc
This should be a bit faster. It also matches what the proprietary driver
generates, based on the reverse engineering done here:
https://gitlab.freedesktop.org/mhenning/re/-/tree/main/vk_test_overlap_exec

Reviewed-by: Mohamed Ahmed <mohamedahmedegypt2001@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37671>
2025-10-11 16:58:24 +00:00
Mel Henning
8447dba5b3 nvk: INVALIDATE_SHADER_CACHES on most recent subc
This should be a bit faster.

Reviewed-by: Mohamed Ahmed <mohamedahmedegypt2001@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37671>
2025-10-11 16:58:24 +00:00
Mohamed Ahmed
7a0e7d24bb nvk: Use the compute MME for compute dispatch
Switching from compute to 3D and vice versa leads to a long stall which
destroys compute performance. This switches to the compute MME on Ampere
onwards (which was where it was added) for compute dispatches which eliminates
stalling from sub-channel switching in these cases.

Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37671>
2025-10-11 16:58:24 +00:00
Mohamed Ahmed
146a64524d nouveau/mme: Add unit tests for sharing between compute and 3D scratch registers
Co-developed-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Tested-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37671>
2025-10-11 16:58:24 +00:00
Faith Ekstrand
0bfe27553d nvk: Actually reserve 1/2 for FALCON
In 03f785083f ("nvk: Reserve MME scratch area for communicating with
FALCON"), we said we reserved these but actually only reserved 0.  Only
0 is actually used today but if we're going to claim to reserve
registers we should actually do it.

Reviewed-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mohamed Ahmed <mohamedahmedegypt2001@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37671>
2025-10-11 16:58:24 +00:00
Mohamed Ahmed
17ab1d463f nouveau/headers: Add AMPERE_B compute subchannel definition
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37671>
2025-10-11 16:58:24 +00:00
Mel Henning
0e3781df7f vulkan: Drop vk_pipeline_stage_flags2_has_*_shader
These are no longer used anywhere. Moreover, it's not clear that they
can be used for a correct implementation of pipeline barriers since a
correct implementation cannot ignore execution deps in non-shader
stages.

Reviewed-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mohamed Ahmed <mohamedahmedegypt2001@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37671>
2025-10-11 16:58:24 +00:00
Mel Henning
2eeef34e35 nvk/cmd_buffer: Remove redundant tests for access
In each of these cases, the spec mandates that apps pair a memory barrier
specified with access with a relevant exec barrrier specified by stages.
We therefore don't need to wfi based on access - the tests on stage are
sufficient.

Acked-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mohamed Ahmed <mohamedahmedegypt2001@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37671>
2025-10-11 16:58:24 +00:00
Mel Henning
515793d5bb nvk: Fix execution deps in pipeline barriers
We were under-synchronizing before. In particular, `stages` form
execution barriers even in the absence of a memory barrier in the
`access` flags.

The particular issue that prompted this was one where we weren't waiting
on a pipeline barrier in Baldur's Gate 3 with:

    srcStageMask == VK_PIPELINE_STAGE_2_FRAGMENT_SHADER_BIT &&
    srcAccessMask == VK_ACCESS_2_SHADER_READ_BIT &&
    dstStageMask == (VK_PIPELINE_STAGE_2_EARLY_FRAGMENT_TESTS_BIT |
                     VK_PIPELINE_STAGE_2_LATE_FRAGMENT_TESTS_BIT) &&
    dstAccessMask == (VK_ACCESS_2_DEPTH_STENCIL_ATTACHMENT_READ_BIT |
                      VK_ACCESS_2_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT)

Based on the spec and discussion in
https://github.com/KhronosGroup/Vulkan-Docs/issues/131 the read bit in
srcAccessMask doesn't really matter here - what matters is that there's
an execution barrier on the fragment stage which needs to prevent the
fragment shader exection from overlapping with the later call's
fragment tests (which write to the depth attachment).

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13909
Reviewed-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mohamed Ahmed <mohamedahmedegypt2001@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37671>
2025-10-11 16:58:24 +00:00
Mel Henning
895bbb7601 nvk: Combine BARRIER_{COMPUTE,RENDER}_WFI
When we want to WFI, we only need to do so on a single channel. The
others will implicitly get a WFI from the channel switch.

Reviewed-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mohamed Ahmed <mohamedahmedegypt2001@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37671>
2025-10-11 16:58:24 +00:00
Mel Henning
6c44390e80 nvk: Only run one INVALIDATE_SHADER_CACHES
This is presumably the same cache across compute and 3d, so we only need
to run one of these, not two.

Reviewed-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mohamed Ahmed <mohamedahmedegypt2001@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37671>
2025-10-11 16:58:23 +00:00
Lorenzo Rossi
b56b5b90f7 nvk: Fix QMD buffer length on upload
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Current code allocates the maximum QMD data for all generations and
uploads everything, even on generations where a smaller QMD buffer
suffices. This is not only wasteful, but actually crashes Kepler GPUs
due to complications with the QMD queue.

Only upload the useful bytes of the QMD buffer.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14070
Fixes: 0e268dad00 ("nvk: Allow for larger QMDs")
Signed-off-by: Lorenzo Rossi <git@rossilorenzo.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37815>
2025-10-11 08:20:22 +00:00
Surafel Assefa
a219308867 wsi: Implements scaling controls for DRI3 presentation.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30701>
2025-10-11 06:59:37 +00:00
Caio Oliveira
74859c19fb intel/executor: Add a matrix multiplication example
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37805>
2025-10-11 01:02:45 +00:00
Caio Oliveira
1e0ee84841 intel/executor: Add DPAS examples for HF/F, UB/UD and BF/F
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37805>
2025-10-11 01:02:45 +00:00
Caio Oliveira
62f07dc5e3 intel/executor: Add script directory to package.path
In Lua, modules (i.e. files with lua code) are loaded by using
the standard library require(), e.g.

```
local mylib = require("mylib")

mylib.do_something()
```

The require() will decide where to look by peeking at `package.path`
table.  By default it doesn't include the scripts directory, so running
executor from the script directory vs. from the root of the repo would
yield different results (require works vs. require fail to find the
module).  This patch includes the script directory to avoid this issue.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37805>
2025-10-11 01:02:45 +00:00
Caio Oliveira
86947062e9 intel/executor: Expose a devinfo table
So we can pull other values from devinfo struct.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37805>
2025-10-11 01:02:44 +00:00
Caio Oliveira
5987269750 intel/executor: Drop check_ver and check_verx10 functions
Favor explicit version checks, that can use different types of
comparisons other than equality on a list.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37805>
2025-10-11 01:02:44 +00:00
Emma Anholt
f8729ee920 ir3: Use bitset range operations.
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
This sped up the debugoptimized compile of a fossil I was looking at by
7%.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37777>
2025-10-10 23:13:04 +00:00
Emma Anholt
aa85e3331f ir3/parser: Make sure relative accesses have a size set.
This will avoid assertion failures about a size==0 in the upcoming change
to regmask bitset handling, when collect_info() usees them to track
references into the current alias table.  We know that relative accesses
won't go to the alias table, but that code doesn't.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37777>
2025-10-10 23:13:04 +00:00
Emma Anholt
30b7772ae4 ir3: Move the big block of C support code out of the parser .y file.
This way you get nice syntax highlighting and clang-formatting and all
that when trying to edit the C code.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37777>
2025-10-10 23:13:04 +00:00
Lionel Landwerlin
febac6d9bd anv: fix query copy with shaders
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
First this is only possible on RCS or CCS engines.

Second if on CCS, we need to use a compute shader, 3D won't work.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37818>
2025-10-10 21:31:09 +00:00
Jesse Natalie
c2d288bf97 microsoft/compiler: Respect write masks when lowering unaligned loads and stores
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37778>
2025-10-10 19:53:15 +00:00
Jesse Natalie
b3242516ad microsoft/compiler: Use lower_mem_access_bit_sizes for scratch/shared
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37778>
2025-10-10 19:53:15 +00:00
Emma Anholt
f7cbc7b1c5 radv: Allocate BOs as implicit sync even if the WSI is doing implicit sync.
As noted, the flag we allocate with controls whether *anyone* can implicit
sync on the BO through amdgpu interfaces, not just whether our fd does.
This restores radv to the behavior before the regressing commit.

Fixes: 4dcf32c56e ("wsi/drm: Don't request implicit sync if we're doing implicit sync ourselves.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37772>
2025-10-10 19:17:04 +00:00
Emma Anholt
38ac55ebff radv: Restore marking WSI image's mem->buffer as uncached.
Prior to 4dcf32c56e, radv was getting a request for implicit sync, even
when we were doing the work to do implicit sync in the WSI.  Once that was
turned off, we incidentally dropped flagging WSI's mem->buffer as
uncached, due to it being under the wrong condition.

Fixes: 4dcf32c56e ("wsi/drm: Don't request implicit sync if we're doing implicit sync ourselves.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37772>
2025-10-10 19:17:04 +00:00
Ian Romanick
ca493b5c45 brw: elk: Fix name of function in comment
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Trivial.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
2025-10-10 17:25:11 +00:00
Ian Romanick
1e691e68e2 nir/algebraic: Optimize bfi with odd-valued mask to bitfield_select
shader-db:

Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown)
total instructions in shared programs: 17181254 -> 17181046 (<.01%)
instructions in affected programs: 35834 -> 35626 (-0.58%)
helped: 130 / HURT: 2

total cycles in shared programs: 888543370 -> 888554248 (<.01%)
cycles in affected programs: 7443984 -> 7454862 (0.15%)
helped: 95 / HURT: 87

fossil-db:

Lunar Lake
Totals:
Instrs: 233260196 -> 233259474 (-0.00%); split: -0.00%, +0.00%
Cycle count: 32754567116 -> 32754515890 (-0.00%); split: -0.00%, +0.00%
Max live registers: 71738442 -> 71738398 (-0.00%); split: -0.00%, +0.00%

Totals from 6842 (0.87% of 790721) affected shaders:
Instrs: 5566926 -> 5566204 (-0.01%); split: -0.01%, +0.00%
Cycle count: 512487046 -> 512435820 (-0.01%); split: -0.20%, +0.19%
Max live registers: 1100656 -> 1100612 (-0.00%); split: -0.00%, +0.00%

Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Instrs: 264071212 -> 264066944 (-0.00%); split: -0.00%, +0.00%
Cycle count: 26552458051 -> 26553286277 (+0.00%); split: -0.00%, +0.01%
Spill count: 530380 -> 530084 (-0.06%)
Fill count: 613416 -> 612900 (-0.08%)
Scratch Memory Size: 20089856 -> 20075520 (-0.07%)
Max live registers: 46558852 -> 46558811 (-0.00%); split: -0.00%, +0.00%
Max dispatch width: 8034616 -> 8034584 (-0.00%)

Totals from 6653 (0.73% of 905545) affected shaders:
Instrs: 5750844 -> 5746576 (-0.07%); split: -0.08%, +0.00%
Cycle count: 416414845 -> 417243071 (+0.20%); split: -0.20%, +0.40%
Spill count: 1953 -> 1657 (-15.16%)
Fill count: 3556 -> 3040 (-14.51%)
Scratch Memory Size: 92160 -> 77824 (-15.56%)
Max live registers: 566003 -> 565962 (-0.01%); split: -0.01%, +0.00%
Max dispatch width: 55768 -> 55736 (-0.06%)

No shader-db or fossil-db changes on any previous Intel platforms.

Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
2025-10-10 17:25:11 +00:00
Ian Romanick
b948e6d503 brw: Use BFN to implement nir_opt_bitfield_select
shader-db:

Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown)
total instructions in shared programs: 17181559 -> 17181254 (<.01%)
instructions in affected programs: 250921 -> 250616 (-0.12%)
helped: 303 / HURT: 0

total cycles in shared programs: 888542568 -> 888543370 (<.01%)
cycles in affected programs: 49861772 -> 49862574 (<.01%)
helped: 181 / HURT: 110

fossil-db:

Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown)
Totals:
Instrs: 233260591 -> 233260196 (-0.00%); split: -0.00%, +0.00%
Cycle count: 32754501248 -> 32754567116 (+0.00%); split: -0.00%, +0.00%
Max live registers: 71738476 -> 71738442 (-0.00%)
Non SSA regs after NIR: 67837262 -> 67837108 (-0.00%); split: -0.00%, +0.00%

Totals from 226 (0.03% of 790721) affected shaders:
Instrs: 382227 -> 381832 (-0.10%); split: -0.15%, +0.05%
Cycle count: 72863878 -> 72929746 (+0.09%); split: -0.65%, +0.74%
Max live registers: 36557 -> 36523 (-0.09%)
Non SSA regs after NIR: 60427 -> 60273 (-0.25%); split: -0.26%, +0.00%

No shader-db or fossil-db changes on any previous Intel platforms.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
2025-10-10 17:25:11 +00:00
Ian Romanick
4193895145 brw/cmod: Enable limited cmod propagation for BFN
cmod propagation needs more work. Since the result type is always UD,
BRW_CONDITION_G should be able to substitute for NZ. Either that or
users of the condition could be rewritten to use an inverted condition.

v2: Add a couple more unit tests. Suggested by Matt.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
2025-10-10 17:25:11 +00:00
Ian Romanick
fb193ac190 brw/builder: Add BFN
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
2025-10-10 17:25:10 +00:00
Ian Romanick
a947e0c4db brw: Constant propagation and constant combining support for BFN
v2: Commute immediate values out of src[1].

Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
2025-10-10 17:25:10 +00:00
Ian Romanick
8a71f5e672 brw: BFN does not support source modifiers
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
2025-10-10 17:25:10 +00:00
Ian Romanick
60c07e500d brw: Basic validation for BFN
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
2025-10-10 17:25:10 +00:00
Ian Romanick
d2077e24f6 brw/disasm: Pretty print the BFN equation as an annotation
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
2025-10-10 17:25:09 +00:00
Ian Romanick
fdb01f2a5a brw/disasm: Fix BFN disassembly of src1 and src2
The negate and abs bits of src1 and src2 are repurposed for some of the
function control value bits.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
2025-10-10 17:25:09 +00:00
Zach Battleman
ca2a067469 brw: Initial bits of BFN support
v2 (idr): So much rebasing. Deleted a bunch of code that we're not
going to need yet.

v3 (Ken): bfn inst encoding fix

v4 (idr): Add BFN to brw_get_lowered_simd_width.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
2025-10-10 17:25:09 +00:00
Ian Romanick
f7939f2fdc nir/range_analysis: Handle bfi and bitfield_select in get_alu_uub
I noticed some things related to this while implementing support for
bitfield_select / BFN in BRW.

shader-db:

Lunar Lake
total instructions in shared programs: 17183140 -> 17183128 (<.01%)
instructions in affected programs: 3830 -> 3818 (-0.31%)
helped: 6 / HURT: 0

total cycles in shared programs: 889936934 -> 889936056 (<.01%)
cycles in affected programs: 253758 -> 252880 (-0.35%)
helped: 4 / HURT: 2

No shader-db changes on any other Intel platform.

fossil-db:

Lunar Lake
Totals:
Instrs: 233285343 -> 233284796 (-0.00%); split: -0.00%, +0.00%
Cycle count: 32756777978 -> 32756399804 (-0.00%); split: -0.00%, +0.00%
Max live registers: 71738646 -> 71738626 (-0.00%)
Non SSA regs after NIR: 67837900 -> 67837902 (+0.00%)

Totals from 177 (0.02% of 790723) affected shaders:
Instrs: 389849 -> 389302 (-0.14%); split: -0.14%, +0.00%
Cycle count: 356341872 -> 355963698 (-0.11%); split: -0.11%, +0.01%
Max live registers: 39364 -> 39344 (-0.05%)
Non SSA regs after NIR: 70453 -> 70455 (+0.00%)

Meteor Lake, DG2, and Ice Lake had similar results. (Meteor Lake shown)
Totals:
Instrs: 264095611 -> 264095358 (-0.00%)
Cycle count: 26555705299 -> 26554303407 (-0.01%); split: -0.01%, +0.00%
Fill count: 613233 -> 613231 (-0.00%)

Totals from 123 (0.01% of 905547) affected shaders:
Instrs: 334830 -> 334577 (-0.08%)
Cycle count: 326531667 -> 325129775 (-0.43%); split: -0.65%, +0.22%
Fill count: 4145 -> 4143 (-0.05%)

Tiger Lake and Skylake had similar results. (Tiger Lake shown)
Totals:
Instrs: 269733849 -> 269733590 (-0.00%)
Cycle count: 25240548036 -> 25241435039 (+0.00%); split: -0.00%, +0.01%

Totals from 123 (0.01% of 903812) affected shaders:
Instrs: 338617 -> 338358 (-0.08%)
Cycle count: 326605644 -> 327492647 (+0.27%); split: -0.13%, +0.40%

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
2025-10-10 17:25:08 +00:00
Ian Romanick
aa53735b66 nir/algebraic: Prefer bfi over bitfield_select for bitfield_insert
Intel platforms will soon implement both bfi and bitfield_select. bfi is
more efficient for bitfield_insert.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
2025-10-10 17:25:08 +00:00
Ian Romanick
08ec408061 nir/algebraic: Optimize f2u of negative value to zero
The eliminated SENDs are from a single app that has a bunch of
fragment shaders with a sequence like:

    con 32    %495 = fmul! %203.i, %1 (0.000000)
    con 32    %496 = ffma! %203.j, %1 (0.000000), %495
    con 32    %497 = ffma! %203.k, %1 (0.000000), %496
    con 32    %498 = ffma! %203.l, %1 (0.000000), %497
    con 32    %499 = @load_reloc_const_intel (param_idx=1, base=0)
    con 32    %500 = @load_reloc_const_intel (param_idx=0, base=0)
    con 32    %501 = f2u32 %498
    con 32    %502 = umin %501, %172 (0x4)
    con 32    %503 = ishl %502, %172 (0x4)
    con 32    %504 = load_const (0x00000040 = 64)
    con 32    %505 = umin %503, %504 (0x40)
    con 32    %506 = iadd %500, %505

The `f2u` is replaced with 0, and that makes the `ffma` dot-product
sequence be unused. Since it is unused, most of the preceeding block
gets eliminated. A lot of instructions after the `f2u` are also
eliminated by other algebraic optimizations. Most importantly, %203 is
the result of a `load_ubo_uniform_block_intel` that is eliminated.

No shader-db changes on any Intel platform.

fossil-db:

All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 919895603 -> 919804051 (-0.01%); split: -0.01%, +0.00%
Send messages: 40892036 -> 40887569 (-0.01%)
Cycle count: 99176770712 -> 99174971806 (-0.00%); split: -0.00%, +0.00%
Max live registers: 190030365 -> 190030367 (+0.00%)
Max dispatch width: 47415040 -> 47415024 (-0.00%)
Non SSA regs after NIR: 228872538 -> 228863608 (-0.00%); split: -0.00%, +0.00%

Totals from 2234 (0.11% of 1955134) affected shaders:
Instrs: 1989743 -> 1898191 (-4.60%); split: -4.60%, +0.00%
Send messages: 44179 -> 39712 (-10.11%)
Cycle count: 25416114 -> 23617208 (-7.08%); split: -7.08%, +0.00%
Max live registers: 367357 -> 367359 (+0.00%)
Max dispatch width: 39184 -> 39168 (-0.04%)
Non SSA regs after NIR: 471173 -> 462243 (-1.90%); split: -1.90%, +0.00%

Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
2025-10-10 17:25:08 +00:00
Ian Romanick
5667459ff1 nir/algebraic: Don't introduce undefined behavior in f2u conversion
If the source -1.0 < x < 0.0, simply removing the ftrun will introduce
undefined behavior. By chance of how at least Intel and NVIDIA GPUs
implement f2u, this has Just Worked.

No shader-db changes on any Intel platform.

fossil-db:

Lunar Lake
Totals:
Instrs: 913264354 -> 913264366 (+0.00%)
Cycle count: 104953995530 -> 104953996854 (+0.00%)
Max live registers: 189266026 -> 189266058 (+0.00%)
Non SSA regs after NIR: 227779417 -> 227779369 (-0.00%)

Totals from 24 (0.00% of 1984794) affected shaders:
Instrs: 4669 -> 4681 (+0.26%)
Cycle count: 50610 -> 51934 (+2.62%)
Max live registers: 1222 -> 1254 (+2.62%)
Non SSA regs after NIR: 1174 -> 1126 (-4.09%)

Meteor Lake, DG2, Tiger Lake, and Ice Lake had similar results. (Meteor Lake shown)
Totals:
Instrs: 1001288026 -> 1001288038 (+0.00%)
Cycle count: 92813392671 -> 92813392791 (+0.00%)
Max live registers: 121935383 -> 121935399 (+0.00%)
Max dispatch width: 19949928 -> 19949912 (-0.00%)

Totals from 2 (0.00% of 2284670) affected shaders:
Instrs: 1380 -> 1392 (+0.87%)
Cycle count: 18940 -> 19060 (+0.63%)
Max live registers: 136 -> 152 (+11.76%)
Max dispatch width: 32 -> 16 (-50.00%)

No fossil-db changes on Skylake.

Suggested-by: Georg Lehmann
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
2025-10-10 17:25:07 +00:00
Ian Romanick
4338f7d033 nir/algebraic: Remove useless ftrunc inside f2i/f2u
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
2025-10-10 17:25:07 +00:00
Ian Romanick
c49d6e0480 nir/algebraic: Elide range clamping of f2u sources
There are no shader-db changes on ELK platforms because those platforms
don't support 8- or 16-bit integer types.

v2: Restrict patterns generated such that the integer limits are exactly
representable in the specified floating point format. With the exception
of the value 0, this requires that float_sz > int_sz. This had no impact
on shader-db or fossil-db on any Intel platform. Noticed by Georg.

v3: Add a missing is_a_number.

shader-db:

All Intel platforms had similar results. (Lunar Lake shown)
total cycles in shared programs: 889936056 -> 889934082 (<.01%)
cycles in affected programs: 65806 -> 63832 (-3.00%)
helped: 2 / HURT: 0

fossil-db:

Lunar Lake
Totals:
Instrs: 233284796 -> 233282917 (-0.00%); split: -0.00%, +0.00%
Cycle count: 32756399804 -> 32754972188 (-0.00%); split: -0.01%, +0.00%
Spill count: 519861 -> 519813 (-0.01%)
Fill count: 663650 -> 663626 (-0.00%); split: -0.01%, +0.01%
Max live registers: 71738626 -> 71738696 (+0.00%)
Non SSA regs after NIR: 67837902 -> 67837648 (-0.00%)

Totals from 1236 (0.16% of 790723) affected shaders:
Instrs: 2134504 -> 2132625 (-0.09%); split: -0.09%, +0.01%
Cycle count: 604922278 -> 603494662 (-0.24%); split: -0.48%, +0.25%
Spill count: 16509 -> 16461 (-0.29%)
Fill count: 32760 -> 32736 (-0.07%); split: -0.22%, +0.15%
Max live registers: 250112 -> 250182 (+0.03%)
Non SSA regs after NIR: 302368 -> 302114 (-0.08%)

Meteor Lake, DG2, and Tiger Lake had similar results. (Meteor Lake shown)
Totals:
Instrs: 264095370 -> 264094056 (-0.00%); split: -0.00%, +0.00%
Cycle count: 26554146277 -> 26553027268 (-0.00%); split: -0.01%, +0.01%
Spill count: 530603 -> 530615 (+0.00%)
Fill count: 613231 -> 613273 (+0.01%)
Max live registers: 46559041 -> 46559087 (+0.00%)

Totals from 1237 (0.14% of 905547) affected shaders:
Instrs: 2262517 -> 2261203 (-0.06%); split: -0.07%, +0.01%
Cycle count: 518219799 -> 517100790 (-0.22%); split: -0.59%, +0.37%
Spill count: 17518 -> 17530 (+0.07%)
Fill count: 32273 -> 32315 (+0.13%)
Max live registers: 128360 -> 128406 (+0.04%)

Ice Lake and Skylake had similar results. (Ice Lake shown)
Totals:
Instrs: 269849640 -> 269848198 (-0.00%); split: -0.00%, +0.00%
Cycle count: 26718329643 -> 26718289020 (-0.00%); split: -0.00%, +0.00%
Max live registers: 46878430 -> 46878462 (+0.00%)

Totals from 1233 (0.14% of 905427) affected shaders:
Instrs: 2324225 -> 2322783 (-0.06%); split: -0.06%, +0.00%
Cycle count: 531467501 -> 531426878 (-0.01%); split: -0.11%, +0.10%
Max live registers: 130782 -> 130814 (+0.02%)

Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
2025-10-10 17:25:07 +00:00
Ian Romanick
073ffceef6 elk: Enable saturating float to integer conversion opcodes
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
2025-10-10 17:25:06 +00:00
Ian Romanick
65e8220180 brw: Enable saturating float to integer conversion opcodes
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
2025-10-10 17:25:06 +00:00
Ian Romanick
986086c846 nir: Add saturating float to integer conversion opcodes
v2: Add a comment around has_f2[ui]_sat explaining which opcodes it
enables. Suggested by Georg. Cast u_uintN_max and friends to double in
nir_opcodes.py. This ensures that an exact conversion is made.
Eliminate duplicate conversions from half float to double. Both noticed
by Georg.

v3: Apply "NaN should be zero" fix suggested by Georg.

Co-authored-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
2025-10-10 17:25:05 +00:00
Pohsiang (John) Hsu
5ce8b34a10 mediafoundation: update version to 1.07
Reviewed-by: Yubo Xie <yuboxie@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37820>
2025-10-10 09:36:53 -07:00
Pohsiang (John) Hsu
03baa8ac72 mediafoundation: remove extra ';'
Reviewed-by: Yubo Xie <yuboxie@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37820>
2025-10-10 09:36:44 -07:00
Pohsiang (John) Hsu
eb088e339f mediafoundation: periodic clang format - no code changes
Reviewed-by: Yubo Xie <yuboxie@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37820>
2025-10-10 09:36:13 -07:00