Commit graph

11289 commits

Author SHA1 Message Date
Alyssa Rosenzweig
84d8e6824b treewide: don't check before free
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
This was something that came up in the slop MR. Not sure it's actually a
good idea or not but kind of curious what people think, given we have a
sound tool (Coccinelle) to do the transform. Saves a redundant branch
but means extra noninlined function calls.. likely no actual perf impact
but saves some code.

Via Coccinelle patches:

    @@
    expression ptr;
    @@

    -if (ptr) {
    -free(ptr);
    -}
    +free(ptr);

    @@
    expression ptr;
    @@

    -if (ptr) {
    -FREE(ptr);
    -}
    +FREE(ptr);

    @@
    expression ptr;
    @@

    -if (ptr) {
    -ralloc_free(ptr);
    -}
    +ralloc_free(ptr);

    @@
    expression ptr;
    @@

    -if (ptr != NULL) {
    -free(ptr);
    -}
    -
    +free(ptr);

    @@
    expression ptr;
    @@

    -if (ptr != NULL) {
    -FREE(ptr);
    -}
    -
    +FREE(ptr);

    @@
    expression ptr;
    @@

    -if (ptr != NULL) {
    -ralloc_free(ptr);
    -}
    -
    +ralloc_free(ptr);

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> [v3d]
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org> [venus]
Reviewed-by: Frank Binns <frank.binns@imgtec.com> [powervr]
Reviewed-by: Janne Grunau <j@jannau.net> [asahi]
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> [radv]
Reviewed-by: Job Noorman <jnoorman@igalia.com> [ir3]
Acked-by: Marek Olšák <maraeo@gmail.com>
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Job Noorman <jnoorman@igalia.com>
Acked-by: Yonggang Luo <luoyonggang@gmail.com>
Acked-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37892>
2025-10-15 23:01:33 +00:00
Dave Airlie
543c9be87a nir/coopmat: fix non square load/store lowering for flexible dimensions
This shouldn't affect radv, but we should do the calculations correctly for
when non-square matters.

Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37879>
2025-10-16 07:19:28 +10:00
Timothy Arceri
45a8cce5d6 util/range_remap: switch to using sorted array
Here we switch to using a sorted arrays for the binary search which
significantly speeds up the lookups. For a shader with ~8000
uniforms its up to 10x faster and the godot-tps-gles3-high.trace
in issue #13894 returns to its original runtime length before we
switched to using range remap in e052254066

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37754>
2025-10-15 02:16:36 +00:00
Timothy Arceri
9536de6eac util/range_remap: split list node from range entry
This will allow us to create an array of entries without the
node member in a following patch.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37754>
2025-10-15 02:16:34 +00:00
Timothy Arceri
754c6d0f54 glsl/util: update util_range_remap to use range_remap struct
This will allow us to use the linked list for validation of inserts
during linking then switch to using an array for fast binary searches.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37754>
2025-10-15 02:16:34 +00:00
Timothy Arceri
35d5e903ac util/range_remap: dont overwrite entry if ptr is NULL
If the ptr value is NULL and we match an existing entry during an
insert call we now just return the existing value. This allows us
to drop an extra lookup, this will become important as the
following patches change `util_range_remap()` lookups to use a
sorted array that is not created until after we have added all
entries to our linked list.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37754>
2025-10-15 02:16:33 +00:00
Faith Ekstrand
f91f86dded compiler/rust: Implement dfs() non-recursively
Recursion in Rust can be annoyingly expensive.  The recursive DFS
implementation we have now is probably not terrible but these things
bloat in debug builds more than you'd expect.  This replaceas it with a
non-recursive implementation which uses a Vec instead.

Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Reviewed-by: Lorenzo Rossi <git@rossilorenzo.dev>
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37536>
2025-10-14 22:47:40 +00:00
Faith Ekstrand
97febfde28 compiler/rust/cfg: Use DepthFirstSearch for finding reaches sets
Reviewed-by: Lorenzo Rossi <git@rossilorenzo.dev>
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37536>
2025-10-14 22:47:40 +00:00
Faith Ekstrand
76bba3ffba compiler/rust/cfg: Use DepthFirstSearch for find_back_edges()
Reviewed-by: Lorenzo Rossi <git@rossilorenzo.dev>
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37536>
2025-10-14 22:47:40 +00:00
Faith Ekstrand
030005a772 compiler/rust/cfg: Use DepthFirstSearch for calc_dominance()
Reviewed-by: Lorenzo Rossi <git@rossilorenzo.dev>
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37536>
2025-10-14 22:47:40 +00:00
Faith Ekstrand
7f73a027ae compiler/rust/cfg: Use DepthFirstSearch for rev_post_order_sort()
Reviewed-by: Lorenzo Rossi <git@rossilorenzo.dev>
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37536>
2025-10-14 22:47:39 +00:00
Faith Ekstrand
993f5c9e30 compiler/rust: Add a DepthFirstSearch trait
There are four depth first searches in cfg.rs.  This adds a DFS
abstraction which we can later optimize.

Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Reviewed-by: Lorenzo Rossi <git@rossilorenzo.dev>
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37536>
2025-10-14 22:47:38 +00:00
Daniel Schürmann
fad10b91a6 nir/divergence: don't assume that load_sample_positions_amd is always uniform
Sample positions aren't uniform when the sample id is divergent.
This was a regression when we started lowering fragment shader
barycentrics in NIR.

Fixes: 7f444fc72c ("nir: add nir_intrinsic_load_sample_positions_amd")
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37843>
2025-10-14 16:33:10 +00:00
Job Noorman
0b82b803d9 nir,ir3: rename umul_low to umul_16x16
This is more in line with similar opcodes like umul_32x16.

Also change its const expr: the masking based on bit size was
unnecessary as it is only defined for 32 bits. Use simple casts instead.

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37863>
2025-10-14 12:54:54 +00:00
Georg Lehmann
92d670021a nir/opt_intrinsics: optimize atomics to atomic load/store
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37822>
2025-10-14 06:24:17 +00:00
Georg Lehmann
e98b218ddd nir/opt_intrinsics: don't pass nir options around
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37822>
2025-10-14 06:24:17 +00:00
Georg Lehmann
142079759b nir/opt_uniform_atomics: optimize xchg with uniform address and data
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37822>
2025-10-14 06:24:16 +00:00
Georg Lehmann
e08911dff4 nir: fix nir_get_io_offset_src for global_atomic_swap_amd
Fixes: 354df09c88 ("nir: add global_amd to nir_get_io_offset_src/nir_get_io_index_src")

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37821>
2025-10-13 09:41:41 +00:00
Ian Romanick
1e691e68e2 nir/algebraic: Optimize bfi with odd-valued mask to bitfield_select
shader-db:

Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown)
total instructions in shared programs: 17181254 -> 17181046 (<.01%)
instructions in affected programs: 35834 -> 35626 (-0.58%)
helped: 130 / HURT: 2

total cycles in shared programs: 888543370 -> 888554248 (<.01%)
cycles in affected programs: 7443984 -> 7454862 (0.15%)
helped: 95 / HURT: 87

fossil-db:

Lunar Lake
Totals:
Instrs: 233260196 -> 233259474 (-0.00%); split: -0.00%, +0.00%
Cycle count: 32754567116 -> 32754515890 (-0.00%); split: -0.00%, +0.00%
Max live registers: 71738442 -> 71738398 (-0.00%); split: -0.00%, +0.00%

Totals from 6842 (0.87% of 790721) affected shaders:
Instrs: 5566926 -> 5566204 (-0.01%); split: -0.01%, +0.00%
Cycle count: 512487046 -> 512435820 (-0.01%); split: -0.20%, +0.19%
Max live registers: 1100656 -> 1100612 (-0.00%); split: -0.00%, +0.00%

Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Instrs: 264071212 -> 264066944 (-0.00%); split: -0.00%, +0.00%
Cycle count: 26552458051 -> 26553286277 (+0.00%); split: -0.00%, +0.01%
Spill count: 530380 -> 530084 (-0.06%)
Fill count: 613416 -> 612900 (-0.08%)
Scratch Memory Size: 20089856 -> 20075520 (-0.07%)
Max live registers: 46558852 -> 46558811 (-0.00%); split: -0.00%, +0.00%
Max dispatch width: 8034616 -> 8034584 (-0.00%)

Totals from 6653 (0.73% of 905545) affected shaders:
Instrs: 5750844 -> 5746576 (-0.07%); split: -0.08%, +0.00%
Cycle count: 416414845 -> 417243071 (+0.20%); split: -0.20%, +0.40%
Spill count: 1953 -> 1657 (-15.16%)
Fill count: 3556 -> 3040 (-14.51%)
Scratch Memory Size: 92160 -> 77824 (-15.56%)
Max live registers: 566003 -> 565962 (-0.01%); split: -0.01%, +0.00%
Max dispatch width: 55768 -> 55736 (-0.06%)

No shader-db or fossil-db changes on any previous Intel platforms.

Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
2025-10-10 17:25:11 +00:00
Ian Romanick
f7939f2fdc nir/range_analysis: Handle bfi and bitfield_select in get_alu_uub
I noticed some things related to this while implementing support for
bitfield_select / BFN in BRW.

shader-db:

Lunar Lake
total instructions in shared programs: 17183140 -> 17183128 (<.01%)
instructions in affected programs: 3830 -> 3818 (-0.31%)
helped: 6 / HURT: 0

total cycles in shared programs: 889936934 -> 889936056 (<.01%)
cycles in affected programs: 253758 -> 252880 (-0.35%)
helped: 4 / HURT: 2

No shader-db changes on any other Intel platform.

fossil-db:

Lunar Lake
Totals:
Instrs: 233285343 -> 233284796 (-0.00%); split: -0.00%, +0.00%
Cycle count: 32756777978 -> 32756399804 (-0.00%); split: -0.00%, +0.00%
Max live registers: 71738646 -> 71738626 (-0.00%)
Non SSA regs after NIR: 67837900 -> 67837902 (+0.00%)

Totals from 177 (0.02% of 790723) affected shaders:
Instrs: 389849 -> 389302 (-0.14%); split: -0.14%, +0.00%
Cycle count: 356341872 -> 355963698 (-0.11%); split: -0.11%, +0.01%
Max live registers: 39364 -> 39344 (-0.05%)
Non SSA regs after NIR: 70453 -> 70455 (+0.00%)

Meteor Lake, DG2, and Ice Lake had similar results. (Meteor Lake shown)
Totals:
Instrs: 264095611 -> 264095358 (-0.00%)
Cycle count: 26555705299 -> 26554303407 (-0.01%); split: -0.01%, +0.00%
Fill count: 613233 -> 613231 (-0.00%)

Totals from 123 (0.01% of 905547) affected shaders:
Instrs: 334830 -> 334577 (-0.08%)
Cycle count: 326531667 -> 325129775 (-0.43%); split: -0.65%, +0.22%
Fill count: 4145 -> 4143 (-0.05%)

Tiger Lake and Skylake had similar results. (Tiger Lake shown)
Totals:
Instrs: 269733849 -> 269733590 (-0.00%)
Cycle count: 25240548036 -> 25241435039 (+0.00%); split: -0.00%, +0.01%

Totals from 123 (0.01% of 903812) affected shaders:
Instrs: 338617 -> 338358 (-0.08%)
Cycle count: 326605644 -> 327492647 (+0.27%); split: -0.13%, +0.40%

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
2025-10-10 17:25:08 +00:00
Ian Romanick
aa53735b66 nir/algebraic: Prefer bfi over bitfield_select for bitfield_insert
Intel platforms will soon implement both bfi and bitfield_select. bfi is
more efficient for bitfield_insert.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
2025-10-10 17:25:08 +00:00
Ian Romanick
08ec408061 nir/algebraic: Optimize f2u of negative value to zero
The eliminated SENDs are from a single app that has a bunch of
fragment shaders with a sequence like:

    con 32    %495 = fmul! %203.i, %1 (0.000000)
    con 32    %496 = ffma! %203.j, %1 (0.000000), %495
    con 32    %497 = ffma! %203.k, %1 (0.000000), %496
    con 32    %498 = ffma! %203.l, %1 (0.000000), %497
    con 32    %499 = @load_reloc_const_intel (param_idx=1, base=0)
    con 32    %500 = @load_reloc_const_intel (param_idx=0, base=0)
    con 32    %501 = f2u32 %498
    con 32    %502 = umin %501, %172 (0x4)
    con 32    %503 = ishl %502, %172 (0x4)
    con 32    %504 = load_const (0x00000040 = 64)
    con 32    %505 = umin %503, %504 (0x40)
    con 32    %506 = iadd %500, %505

The `f2u` is replaced with 0, and that makes the `ffma` dot-product
sequence be unused. Since it is unused, most of the preceeding block
gets eliminated. A lot of instructions after the `f2u` are also
eliminated by other algebraic optimizations. Most importantly, %203 is
the result of a `load_ubo_uniform_block_intel` that is eliminated.

No shader-db changes on any Intel platform.

fossil-db:

All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 919895603 -> 919804051 (-0.01%); split: -0.01%, +0.00%
Send messages: 40892036 -> 40887569 (-0.01%)
Cycle count: 99176770712 -> 99174971806 (-0.00%); split: -0.00%, +0.00%
Max live registers: 190030365 -> 190030367 (+0.00%)
Max dispatch width: 47415040 -> 47415024 (-0.00%)
Non SSA regs after NIR: 228872538 -> 228863608 (-0.00%); split: -0.00%, +0.00%

Totals from 2234 (0.11% of 1955134) affected shaders:
Instrs: 1989743 -> 1898191 (-4.60%); split: -4.60%, +0.00%
Send messages: 44179 -> 39712 (-10.11%)
Cycle count: 25416114 -> 23617208 (-7.08%); split: -7.08%, +0.00%
Max live registers: 367357 -> 367359 (+0.00%)
Max dispatch width: 39184 -> 39168 (-0.04%)
Non SSA regs after NIR: 471173 -> 462243 (-1.90%); split: -1.90%, +0.00%

Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
2025-10-10 17:25:08 +00:00
Ian Romanick
5667459ff1 nir/algebraic: Don't introduce undefined behavior in f2u conversion
If the source -1.0 < x < 0.0, simply removing the ftrun will introduce
undefined behavior. By chance of how at least Intel and NVIDIA GPUs
implement f2u, this has Just Worked.

No shader-db changes on any Intel platform.

fossil-db:

Lunar Lake
Totals:
Instrs: 913264354 -> 913264366 (+0.00%)
Cycle count: 104953995530 -> 104953996854 (+0.00%)
Max live registers: 189266026 -> 189266058 (+0.00%)
Non SSA regs after NIR: 227779417 -> 227779369 (-0.00%)

Totals from 24 (0.00% of 1984794) affected shaders:
Instrs: 4669 -> 4681 (+0.26%)
Cycle count: 50610 -> 51934 (+2.62%)
Max live registers: 1222 -> 1254 (+2.62%)
Non SSA regs after NIR: 1174 -> 1126 (-4.09%)

Meteor Lake, DG2, Tiger Lake, and Ice Lake had similar results. (Meteor Lake shown)
Totals:
Instrs: 1001288026 -> 1001288038 (+0.00%)
Cycle count: 92813392671 -> 92813392791 (+0.00%)
Max live registers: 121935383 -> 121935399 (+0.00%)
Max dispatch width: 19949928 -> 19949912 (-0.00%)

Totals from 2 (0.00% of 2284670) affected shaders:
Instrs: 1380 -> 1392 (+0.87%)
Cycle count: 18940 -> 19060 (+0.63%)
Max live registers: 136 -> 152 (+11.76%)
Max dispatch width: 32 -> 16 (-50.00%)

No fossil-db changes on Skylake.

Suggested-by: Georg Lehmann
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
2025-10-10 17:25:07 +00:00
Ian Romanick
4338f7d033 nir/algebraic: Remove useless ftrunc inside f2i/f2u
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
2025-10-10 17:25:07 +00:00
Ian Romanick
c49d6e0480 nir/algebraic: Elide range clamping of f2u sources
There are no shader-db changes on ELK platforms because those platforms
don't support 8- or 16-bit integer types.

v2: Restrict patterns generated such that the integer limits are exactly
representable in the specified floating point format. With the exception
of the value 0, this requires that float_sz > int_sz. This had no impact
on shader-db or fossil-db on any Intel platform. Noticed by Georg.

v3: Add a missing is_a_number.

shader-db:

All Intel platforms had similar results. (Lunar Lake shown)
total cycles in shared programs: 889936056 -> 889934082 (<.01%)
cycles in affected programs: 65806 -> 63832 (-3.00%)
helped: 2 / HURT: 0

fossil-db:

Lunar Lake
Totals:
Instrs: 233284796 -> 233282917 (-0.00%); split: -0.00%, +0.00%
Cycle count: 32756399804 -> 32754972188 (-0.00%); split: -0.01%, +0.00%
Spill count: 519861 -> 519813 (-0.01%)
Fill count: 663650 -> 663626 (-0.00%); split: -0.01%, +0.01%
Max live registers: 71738626 -> 71738696 (+0.00%)
Non SSA regs after NIR: 67837902 -> 67837648 (-0.00%)

Totals from 1236 (0.16% of 790723) affected shaders:
Instrs: 2134504 -> 2132625 (-0.09%); split: -0.09%, +0.01%
Cycle count: 604922278 -> 603494662 (-0.24%); split: -0.48%, +0.25%
Spill count: 16509 -> 16461 (-0.29%)
Fill count: 32760 -> 32736 (-0.07%); split: -0.22%, +0.15%
Max live registers: 250112 -> 250182 (+0.03%)
Non SSA regs after NIR: 302368 -> 302114 (-0.08%)

Meteor Lake, DG2, and Tiger Lake had similar results. (Meteor Lake shown)
Totals:
Instrs: 264095370 -> 264094056 (-0.00%); split: -0.00%, +0.00%
Cycle count: 26554146277 -> 26553027268 (-0.00%); split: -0.01%, +0.01%
Spill count: 530603 -> 530615 (+0.00%)
Fill count: 613231 -> 613273 (+0.01%)
Max live registers: 46559041 -> 46559087 (+0.00%)

Totals from 1237 (0.14% of 905547) affected shaders:
Instrs: 2262517 -> 2261203 (-0.06%); split: -0.07%, +0.01%
Cycle count: 518219799 -> 517100790 (-0.22%); split: -0.59%, +0.37%
Spill count: 17518 -> 17530 (+0.07%)
Fill count: 32273 -> 32315 (+0.13%)
Max live registers: 128360 -> 128406 (+0.04%)

Ice Lake and Skylake had similar results. (Ice Lake shown)
Totals:
Instrs: 269849640 -> 269848198 (-0.00%); split: -0.00%, +0.00%
Cycle count: 26718329643 -> 26718289020 (-0.00%); split: -0.00%, +0.00%
Max live registers: 46878430 -> 46878462 (+0.00%)

Totals from 1233 (0.14% of 905427) affected shaders:
Instrs: 2324225 -> 2322783 (-0.06%); split: -0.06%, +0.00%
Cycle count: 531467501 -> 531426878 (-0.01%); split: -0.11%, +0.10%
Max live registers: 130782 -> 130814 (+0.02%)

Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
2025-10-10 17:25:07 +00:00
Ian Romanick
986086c846 nir: Add saturating float to integer conversion opcodes
v2: Add a comment around has_f2[ui]_sat explaining which opcodes it
enables. Suggested by Georg. Cast u_uintN_max and friends to double in
nir_opcodes.py. This ensures that an exact conversion is made.
Eliminate duplicate conversions from half float to double. Both noticed
by Georg.

v3: Apply "NaN should be zero" fix suggested by Georg.

Co-authored-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
2025-10-10 17:25:05 +00:00
Lionel Landwerlin
301b71a19f compiler: add an access flag for intel EU fusion
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Alyssa Anne Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37394>
2025-10-10 11:19:39 +00:00
Lionel Landwerlin
c7ac46a1d8 nir/lower_io: add get_io_index_src_number support for image intrinsics
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alyssa Anne Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37394>
2025-10-10 11:19:39 +00:00
Lionel Landwerlin
ca1533cd03 nir/divergence: add a new mode to cover fused threads on Intel HW
The Intel Gfx12.x generation of GPU has an architecture feature called
EU fusion in which 2 subgroups run lock step. A typical case where
this happens is a compute shader with 1x1x1 local workgroup size and a
dispatch command of 2x1x1. In that case 2 threads will be run in lock
step for each of the workgroup.

This has been the sources of some troubles in the backend because one
subgroup can run with all lanes disabled, requiring care for SEND
messages using the NoMask flag (execution regardless of the lane mask).

We found out that other things are happening when 2 subgroups run
together :
  - the HW will use the surface/sampler handle from only one subgroup
  - the HW will use the sampler header from only one subgroup

So one of the fused subgroup can access the wrong surface/sampler if
the value is different between the 2 subgroups and that can happen
even with subgroup uniform values.

Fortunately we can flag SEND instructions to disable the fusion
behavior (most likely at a performance cost).

This change introduce a new divergence mode that tries to compute
things divergent between subgroups so that we can flag instructions
accordingly.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37394>
2025-10-10 11:19:39 +00:00
Simon Perretta
79923115e7 nir/unlower_io_to_vars: keep io bases intact when keeping intrinsics
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
nir_recompute_io_bases will modify i/o intrinsics, which is not the
expected behaviour when the keep_intrinsics flag is set.

Fixes: 83aecc8f3f ("mesa/st, nir: commonize unlower_io_to_vars pass")
Signed-off-by: Simon Perretta <simon.perretta@imgtec.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37725>
2025-10-10 11:53:24 +01:00
Juan A. Suarez Romero
9f45f09b86 glsl: use array element type to validate assignment
When comparing an vec3 and a vec4 array, scalar type is the same for
both (float). Instead use the array element type to compare (that is,
vec3 vs vec4).

Fixes
spec@glsl-1.20@compiler@invalid-vec4-array-to-vec3-array-conversion.vert
piglit test.

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37783>
2025-10-10 09:19:55 +00:00
Job Noorman
6d59a3e3e7 nir/lower_alu: use Knuth's Algorithm M for [iu]mul_high
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
This significantly simplifies the handling of signed numbers as the same
code path can handle signed and unsigned numbers by simply using ishr
instead of ushr for some of the shifts. For both cases, the number of
additions and shifts are also reduced.

Note that LLVM uses the same algorithm.

fossil-db stats for Turnip:

Totals from 4849 (2.94% of 164705) affected shaders:
MaxWaves: 52318 -> 52332 (+0.03%); split: +0.04%, -0.02%
Instrs: 5262458 -> 5218922 (-0.83%); split: -0.87%, +0.05%
CodeSize: 10831900 -> 10655170 (-1.63%); split: -1.64%, +0.01%
NOPs: 829481 -> 836010 (+0.79%); split: -0.95%, +1.74%
MOVs: 176187 -> 173788 (-1.36%); split: -3.27%, +1.91%
COVs: 104096 -> 86543 (-16.86%); split: -16.87%, +0.01%
Full: 90434 -> 90158 (-0.31%); split: -0.33%, +0.03%
(ss): 131091 -> 130866 (-0.17%); split: -0.87%, +0.70%
(sy): 55550 -> 55769 (+0.39%); split: -0.92%, +1.32%
(ss)-stall: 406003 -> 407194 (+0.29%); split: -1.10%, +1.39%
(sy)-stall: 1668213 -> 1678082 (+0.59%); split: -1.31%, +1.90%
Preamble Instrs: 1105270 -> 1067290 (-3.44%); split: -3.50%, +0.06%
Constlen: 423776 -> 423560 (-0.05%)
Last helper: 1038202 -> 1035540 (-0.26%); split: -0.42%, +0.16%
Last baryf: 38908 -> 38632 (-0.71%)
Subgroup size: 336640 -> 336832 (+0.06%)
Cat0: 916209 -> 922848 (+0.72%); split: -0.87%, +1.59%
Cat1: 282813 -> 262845 (-7.06%); split: -7.49%, +0.43%
Cat2: 2198715 -> 2183012 (-0.71%); split: -0.72%, +0.01%
Cat3: 1390914 -> 1376421 (-1.04%)
Cat7: 123127 -> 123116 (-0.01%); split: -0.24%, +0.23%

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37793>
2025-10-10 05:31:17 +00:00
Job Noorman
18f69890d1 nir: add nir_shr builder
Sometimes we need to select between ishr/ushr based some condition; this
builder makes this less verbose.

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37793>
2025-10-10 05:31:17 +00:00
Olivia Lee
d600cd4c91 vtn_bindgen2: use anonymous namespace to avoid name collisions
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
All vtn_bindgen2-generated files use the same 'vtn_bindgen_dummy' struct
name. When linking more than one file (like in panfrost), the
constructor and destructor symbols collide and every instance ends up
running the same initialization. In panfrost, this results in us
dropping any printf format strings that don't occur in v6.

Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Fixes: b7447a94c8 ("vtn: add vtn_bindgen2 tool")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37798>
2025-10-09 19:18:58 +00:00
Emma Anholt
d01aae2fb1 nir: Add a shader bisect tool.
When you're trying to figure out what shader some NIR pass broke, use
nir_shader_bisect_select() to decide between NIR pass behaviors, and then
nir_shader_bisect.py will help you automatically bisect down to which
source_blake3 is at fault.  Once it's identified, it prints you a C call
you can use for selecting that shader specifically, which you can use for
continuing on in your debugging.

On a test I was looking at, this took 10 steps to bisect 134 shaders down
to the source_blake3 of the NIR shader in question.

This idea is heavily lifted from Job Noorman's ir3_shader_bisect.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37468>
2025-10-09 17:56:30 +00:00
Alyssa Rosenzweig
c1d75c6e51 treewide: use BITSET_CALLOC
Via Coccinelle patch:

    @@
    expression count;
    type T;
    @@

    -calloc(BITSET_WORDS(count), sizeof(T))
    +BITSET_CALLOC(count)

    @@
    expression count;
    type T;
    @@

    -calloc(sizeof(T), BITSET_WORDS(count))
    +BITSET_CALLOC(count)

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37779>
2025-10-09 12:29:55 +00:00
Romaric Jodin
cb86341829 meson: remove '--outdir' argument in script
Usage of '--outdir' argument in python scripts makes it very
complicated for tools like ninja-to-soong to generate the Android
equivalent build file.
This is because the option is less clear on what will be generated.

Instead, change it for '--out' where we give the full path of the file
to generate. This has the good point of deduplicating the locations of
the file name to have it only in 'meson.build'.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37741>
2025-10-08 20:51:20 +00:00
Mike Blumenkrantz
8f0ac427b3 glsl: fix gl_ViewID_OVR type to uint
the spec defines this as a uint, and having it as an int breaks anyone
trying to actually use it

cc: mesa-stable

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37766>
2025-10-08 16:58:52 +00:00
Marek Olšák
3fe651f607 nir: remove load_smem_amd
replaced by load_global_amd + ACCESS_SMEM_AMD

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36936>
2025-10-08 08:54:11 +00:00
Daniel Schürmann
7593667b0a nir/divergence_analysis: check ACCESS_SMEM_AMD
Revert "nir/divergence: make smem load_global_amd uniform"

This reverts commit 2d0f93631c.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36936>
2025-10-08 08:53:55 +00:00
Daniel Schürmann
cacb390ec9 nir/load_store_vectorize: Fix parsing offsets through u2u64
Fixes: cfba417316 ('nir/load_store_vectorize: optimize accesses with u2u64(ishl.nuw(iadd))')
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36936>
2025-10-08 08:53:51 +00:00
Rhys Perry
8fba196164 nir: assume non-atomic loads don't tear
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36602>
2025-10-07 17:41:30 +00:00
Rhys Perry
21187c4662 nir,vtn: add shader_info::assume_no_data_races
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> (nir)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36602>
2025-10-07 17:41:30 +00:00
Rhys Perry
e9d72ea65d vtn: set ACCESS_ATOMIC
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36602>
2025-10-07 17:41:30 +00:00
Rhys Perry
0dd09a292b nir: add ACCESS_ATOMIC
This is so that passes and backends can tell if a coherent load/store is
atomic or not, instead of having to assume it could be either.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36602>
2025-10-07 17:41:30 +00:00
Samuel Pitoiset
e868e8d946 nir: adjust nir_tex_instr_need_sampler() for AMD FMASK instructions
These instructions don't need a sampler.

This doesn't fix anything now because this helper isn't unused yet, but
it will help for descriptor heap.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37720>
2025-10-07 15:22:47 +00:00
Georg Lehmann
84f26ed117 nir: optimize atomic isub if supported
Foz-DB Navi48:
Totals from 1 (0.00% of 80287) affected shaders:
Instrs: 1641 -> 1637 (-0.24%)
CodeSize: 8472 -> 8456 (-0.19%)
Latency: 19132 -> 19131 (-0.01%)
InvThroughput: 9566 -> 9565 (-0.01%)
Copies: 126 -> 125 (-0.79%)
VALU: 565 -> 563 (-0.35%)
SALU: 439 -> 438 (-0.23%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37702>
2025-10-07 14:07:56 +00:00
Georg Lehmann
b0d3db3733 nir: add atomic isub
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37702>
2025-10-07 14:07:56 +00:00
Lionel Landwerlin
94f8d0072d nir: add pass to propagate image format to intrinsics
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Anne Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36773>
2025-10-07 08:54:26 +00:00
Lionel Landwerlin
0922a0dd50 nir/lower_tex: remove unused options
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Anne Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37692>
2025-10-03 20:19:03 +00:00