Commit graph

11261 commits

Author SHA1 Message Date
Samuel Pitoiset
4c2207e76b spirv: Update the JSON and headers
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38042>
2025-10-24 16:00:55 +00:00
Ian Romanick
f1bbc3d4e4 nir/algebraic: Don't generate integer min or max that will need to be lowered
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
In !35844, there was some discussion about allowing 64-bit bcsel that
would be lowered in the driver. One challenge there would be if a 64-bit
bcsel was transformed into integer min or max by an algebraic
optimization. I believe these were the only algebraic patterns that
could create new integer min or max that would not be immediately
constant folded.

There were no shader-db or fossil-db changes on any Intel platform.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38033>
2025-10-23 22:35:27 +00:00
Rhys Perry
92beca9aa5 nir/lower_tex: optimize txd(coord, ddx/ddy(coord))
fossil-db (gfx1201):
Totals from 73 (0.09% of 79839) affected shaders:
MaxWaves: 1668 -> 1670 (+0.12%)
Instrs: 352537 -> 347991 (-1.29%); split: -1.29%, +0.00%
CodeSize: 1924140 -> 1887660 (-1.90%); split: -1.90%, +0.00%
VGPRs: 6360 -> 6324 (-0.57%)
Latency: 3891330 -> 3888192 (-0.08%); split: -0.10%, +0.02%
InvThroughput: 789998 -> 783583 (-0.81%); split: -0.84%, +0.03%
VClause: 6409 -> 6408 (-0.02%); split: -0.06%, +0.05%
SClause: 4071 -> 4102 (+0.76%); split: -0.10%, +0.86%
Copies: 16756 -> 16316 (-2.63%); split: -2.94%, +0.32%
PreVGPRs: 5456 -> 5432 (-0.44%); split: -0.57%, +0.13%
VALU: 232982 -> 228117 (-2.09%)
SALU: 32853 -> 32848 (-0.02%); split: -0.05%, +0.03%
VMEM: 9234 -> 9237 (+0.03%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37561>
2025-10-23 11:21:59 +00:00
Rhys Perry
8e7ea4a882 nir/lower_shader_calls: reobtain impl after NIR_PASS
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37573>
2025-10-23 10:44:38 +00:00
Lionel Landwerlin
aa929ea706 nir/lower_io: add missing levels intrinsics to get_io_index_src_number
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: c7ac46a1d8 ("nir/lower_io: add get_io_index_src_number support for image intrinsics")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38012>
2025-10-22 21:21:47 +00:00
Simon Perretta
ff51e6dc9e nir: commonize barycentric intrinsic opt pass
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Introduces an opt pass that attempts to optimize
load_barycentric_at_{sample,offset} with simpler load_barycentric_*
equivalents where possible, and optionally lowers
load_barycentric_at_sample to load_barycentric_at_offset with a position
derived from the sample ID instead.

Signed-off-by: Simon Perretta <simon.perretta@imgtec.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37658>
2025-10-22 16:48:01 +00:00
Olivia Lee
a410d90fd2 panfrost: fix cl_local_size for precompiled shaders
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
nir_lower_compute_system_values will attempt to lower
load_workgroup_size unless workgroup_size_variable is set. For precomp
shaders, the workgroup size is set statically for each entrypoint by
nir_precompiled_build_variant. Because we call
lower_compute_system_values early, it sets the workgroup size to zero.
Temporarily setting workgroup_size_variable while we are still
processing all the entrypoints together inhibits this.

Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Fixes: 20970bcd96 ("panfrost: Add base of OpenCL C infrastructure")
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37799>
2025-10-22 00:15:49 +00:00
Rhys Perry
64ec757688 nir/lower_mem_access_bit_sizes: increase chunk limit
Not sure about creating u64vec16 loads, but creating unaligned loads is
possible with opt_if_rewrite_uniform_uses.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37953>
2025-10-21 22:10:34 +00:00
Georg Lehmann
cf4ab485ea nir: remove manual nir_load_global_constant
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37959>
2025-10-21 12:39:53 +02:00
Georg Lehmann
654bd74c60 treewide: use nir_store_global alias of nir_build_store_global
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37959>
2025-10-21 12:37:58 +02:00
Georg Lehmann
2306cba65b nir: remove manual nir_store_global
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37959>
2025-10-21 12:37:58 +02:00
Georg Lehmann
9e41a7c139 treewide: use nir_load_global alias of nir_build_load_global
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37959>
2025-10-21 12:37:58 +02:00
Georg Lehmann
77540cac8c nir: remove manual nir_load_global
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37959>
2025-10-21 12:37:58 +02:00
Lionel Landwerlin
255d1e883d nir/divergence: fix handling of intel uniform block load
Those are normally uniform always, but for the purpose of fused
threads handling, we need to check their sources.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: ca1533cd03 ("nir/divergence: add a new mode to cover fused threads on Intel HW")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37929>
2025-10-21 06:13:10 +00:00
Emma Anholt
0781edc30f nir/copy_prop_vars: Mask out no-op writes to variables.
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
The pass previously supported removing complete no-op writes, but we can
do better by noticing if any channel being written is the current
channel's value and masking off those writes.  I noticed this happening in
Stray's fragment shader, where it looked like some translation layer had
turned var[x].zw = vec2(a, b) into var[x] = vec4(var[x].x, var[x].y, a,
b).  This in turn lets nir_shrink_vec_array_vars be more effective.

Totals:
MaxWaves: 22158876 -> 22156696 (-0.01%); split: +0.00%, -0.01%
Instrs: 401167243 -> 401007996 (-0.04%); split: -0.04%, +0.00%
CodeSize: 1004397302 -> 1004133728 (-0.03%); split: -0.03%, +0.00%
STPs: 369810 -> 234618 (-36.56%)
LDPs: 209430 -> 172011 (-17.87%)

Totals from 1884 (0.12% of 1560230) affected shaders:
MaxWaves: 12686 -> 10506 (-17.18%); split: +6.97%, -24.15%
Instrs: 2099486 -> 1940239 (-7.59%); split: -7.64%, +0.06%
CodeSize: 4570472 -> 4306898 (-5.77%); split: -5.81%, +0.05%
NOPs: 334399 -> 270881 (-18.99%); split: -20.58%, +1.58%
MOVs: 131003 -> 148034 (+13.00%); split: -11.59%, +24.59%
COVs: 14512 -> 16921 (+16.60%); split: -0.23%, +16.83%
Full: 58120 -> 72399 (+24.57%); split: -6.75%, +31.31%
(ss): 79215 -> 45331 (-42.77%); split: -48.46%, +5.68%
(sy): 33081 -> 11119 (-66.39%); split: -66.56%, +0.18%
(ss)-stall: 302152 -> 115528 (-61.76%); split: -64.34%, +2.57%
(sy)-stall: 2706110 -> 498998 (-81.56%); split: -81.68%, +0.12%
STPs: 212045 -> 76853 (-63.76%)
LDPs: 47337 -> 9918 (-79.05%)
Preamble Instrs: 413954 -> 413630 (-0.08%); split: -0.21%, +0.13%

Cat0: 370362 -> 306844 (-17.15%); split: -18.58%, +1.43%
Cat1: 145629 -> 165003 (+13.30%); split: -10.51%, +23.81%
Cat2: 687947 -> 683992 (-0.57%); split: -0.61%, +0.04%
Cat3: 362919 -> 360690 (-0.61%); split: -0.72%, +0.11%
Cat6: 461411 -> 352375 (-23.63%)
Cat7: 16857 -> 16974 (+0.69%); split: -0.35%, +1.04%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37313>
2025-10-20 19:24:45 +00:00
Emma Anholt
537cc4e0ff nir/shrink_stores: Don't shrink stores to an invalid num_components.
Avoids a regression in the CL CTS on the next commit.

Fixes: 2dba7e6056 ("nir: split nir_opt_shrink_stores from nir_opt_shrink_vectors")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37313>
2025-10-20 19:24:45 +00:00
Emma Anholt
d8690f9c60 nir/link_opt_varyings: Make it participate in NIR_DEBUG=print.
It's a pass with major effects on shaders, and it's otherwise weird to see
your varying disappear between two passes that shouldn't affect them.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37313>
2025-10-20 19:24:45 +00:00
Aitor Camacho
f711c3afed nir: Add KosmicKrisp required utilities
Reviewed-by: Alyssa Anne Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37520>
2025-10-20 16:22:00 +00:00
Job Noorman
ad421cdf2e nir: mark fneg distribution through fadd/ffma as nsz
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
df1876f615 ("nir: Mark negative re-distribution on fadd as imprecise")
fixed the fadd case by marking it as imprecise. This commit fixes the
ffma case for the same reason.

However, "imprecise" isn't necessary and nowadays we have "nsz" which is
more accurate here. Use that for both fadd and ffma.

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Fixes: 62795475e8 ("nir/algebraic: Distribute source modifiers into instructions")
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37930>
2025-10-17 08:58:59 +00:00
Mary Guillemard
6f73533094 asahi,nir: Stop relying on zero and scratch page in GS/TESS code
Introduce new NIR intrinsics to handle getting a "sink" read-only
address and another intrinsic to handle conversion of address to
read-write (allowing implementation to replace the "sink" read-only with
another address like required for Asahi)

Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37914>
2025-10-16 19:25:35 +00:00
Mary Guillemard
71c4943c37 compiler: rename vs.tes_agx bit to vs.tes_poly
Preparing to move AGX's GS/TESS lowering code.

Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37914>
2025-10-16 19:25:35 +00:00
Mary Guillemard
1e0c18d6cf nir: Rename stat_query_address_agx to stat_query_address_poly
This is used by the geometry lowering that we are going to move to
common code.

Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37914>
2025-10-16 19:25:35 +00:00
Alyssa Rosenzweig
84d8e6824b treewide: don't check before free
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
This was something that came up in the slop MR. Not sure it's actually a
good idea or not but kind of curious what people think, given we have a
sound tool (Coccinelle) to do the transform. Saves a redundant branch
but means extra noninlined function calls.. likely no actual perf impact
but saves some code.

Via Coccinelle patches:

    @@
    expression ptr;
    @@

    -if (ptr) {
    -free(ptr);
    -}
    +free(ptr);

    @@
    expression ptr;
    @@

    -if (ptr) {
    -FREE(ptr);
    -}
    +FREE(ptr);

    @@
    expression ptr;
    @@

    -if (ptr) {
    -ralloc_free(ptr);
    -}
    +ralloc_free(ptr);

    @@
    expression ptr;
    @@

    -if (ptr != NULL) {
    -free(ptr);
    -}
    -
    +free(ptr);

    @@
    expression ptr;
    @@

    -if (ptr != NULL) {
    -FREE(ptr);
    -}
    -
    +FREE(ptr);

    @@
    expression ptr;
    @@

    -if (ptr != NULL) {
    -ralloc_free(ptr);
    -}
    -
    +ralloc_free(ptr);

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> [v3d]
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org> [venus]
Reviewed-by: Frank Binns <frank.binns@imgtec.com> [powervr]
Reviewed-by: Janne Grunau <j@jannau.net> [asahi]
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> [radv]
Reviewed-by: Job Noorman <jnoorman@igalia.com> [ir3]
Acked-by: Marek Olšák <maraeo@gmail.com>
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Job Noorman <jnoorman@igalia.com>
Acked-by: Yonggang Luo <luoyonggang@gmail.com>
Acked-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37892>
2025-10-15 23:01:33 +00:00
Dave Airlie
543c9be87a nir/coopmat: fix non square load/store lowering for flexible dimensions
This shouldn't affect radv, but we should do the calculations correctly for
when non-square matters.

Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37879>
2025-10-16 07:19:28 +10:00
Timothy Arceri
45a8cce5d6 util/range_remap: switch to using sorted array
Here we switch to using a sorted arrays for the binary search which
significantly speeds up the lookups. For a shader with ~8000
uniforms its up to 10x faster and the godot-tps-gles3-high.trace
in issue #13894 returns to its original runtime length before we
switched to using range remap in e052254066

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37754>
2025-10-15 02:16:36 +00:00
Timothy Arceri
9536de6eac util/range_remap: split list node from range entry
This will allow us to create an array of entries without the
node member in a following patch.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37754>
2025-10-15 02:16:34 +00:00
Timothy Arceri
754c6d0f54 glsl/util: update util_range_remap to use range_remap struct
This will allow us to use the linked list for validation of inserts
during linking then switch to using an array for fast binary searches.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37754>
2025-10-15 02:16:34 +00:00
Timothy Arceri
35d5e903ac util/range_remap: dont overwrite entry if ptr is NULL
If the ptr value is NULL and we match an existing entry during an
insert call we now just return the existing value. This allows us
to drop an extra lookup, this will become important as the
following patches change `util_range_remap()` lookups to use a
sorted array that is not created until after we have added all
entries to our linked list.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37754>
2025-10-15 02:16:33 +00:00
Faith Ekstrand
f91f86dded compiler/rust: Implement dfs() non-recursively
Recursion in Rust can be annoyingly expensive.  The recursive DFS
implementation we have now is probably not terrible but these things
bloat in debug builds more than you'd expect.  This replaceas it with a
non-recursive implementation which uses a Vec instead.

Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Reviewed-by: Lorenzo Rossi <git@rossilorenzo.dev>
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37536>
2025-10-14 22:47:40 +00:00
Faith Ekstrand
97febfde28 compiler/rust/cfg: Use DepthFirstSearch for finding reaches sets
Reviewed-by: Lorenzo Rossi <git@rossilorenzo.dev>
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37536>
2025-10-14 22:47:40 +00:00
Faith Ekstrand
76bba3ffba compiler/rust/cfg: Use DepthFirstSearch for find_back_edges()
Reviewed-by: Lorenzo Rossi <git@rossilorenzo.dev>
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37536>
2025-10-14 22:47:40 +00:00
Faith Ekstrand
030005a772 compiler/rust/cfg: Use DepthFirstSearch for calc_dominance()
Reviewed-by: Lorenzo Rossi <git@rossilorenzo.dev>
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37536>
2025-10-14 22:47:40 +00:00
Faith Ekstrand
7f73a027ae compiler/rust/cfg: Use DepthFirstSearch for rev_post_order_sort()
Reviewed-by: Lorenzo Rossi <git@rossilorenzo.dev>
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37536>
2025-10-14 22:47:39 +00:00
Faith Ekstrand
993f5c9e30 compiler/rust: Add a DepthFirstSearch trait
There are four depth first searches in cfg.rs.  This adds a DFS
abstraction which we can later optimize.

Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Reviewed-by: Lorenzo Rossi <git@rossilorenzo.dev>
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37536>
2025-10-14 22:47:38 +00:00
Daniel Schürmann
fad10b91a6 nir/divergence: don't assume that load_sample_positions_amd is always uniform
Sample positions aren't uniform when the sample id is divergent.
This was a regression when we started lowering fragment shader
barycentrics in NIR.

Fixes: 7f444fc72c ("nir: add nir_intrinsic_load_sample_positions_amd")
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37843>
2025-10-14 16:33:10 +00:00
Job Noorman
0b82b803d9 nir,ir3: rename umul_low to umul_16x16
This is more in line with similar opcodes like umul_32x16.

Also change its const expr: the masking based on bit size was
unnecessary as it is only defined for 32 bits. Use simple casts instead.

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37863>
2025-10-14 12:54:54 +00:00
Georg Lehmann
92d670021a nir/opt_intrinsics: optimize atomics to atomic load/store
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37822>
2025-10-14 06:24:17 +00:00
Georg Lehmann
e98b218ddd nir/opt_intrinsics: don't pass nir options around
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37822>
2025-10-14 06:24:17 +00:00
Georg Lehmann
142079759b nir/opt_uniform_atomics: optimize xchg with uniform address and data
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37822>
2025-10-14 06:24:16 +00:00
Georg Lehmann
e08911dff4 nir: fix nir_get_io_offset_src for global_atomic_swap_amd
Fixes: 354df09c88 ("nir: add global_amd to nir_get_io_offset_src/nir_get_io_index_src")

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37821>
2025-10-13 09:41:41 +00:00
Ian Romanick
1e691e68e2 nir/algebraic: Optimize bfi with odd-valued mask to bitfield_select
shader-db:

Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown)
total instructions in shared programs: 17181254 -> 17181046 (<.01%)
instructions in affected programs: 35834 -> 35626 (-0.58%)
helped: 130 / HURT: 2

total cycles in shared programs: 888543370 -> 888554248 (<.01%)
cycles in affected programs: 7443984 -> 7454862 (0.15%)
helped: 95 / HURT: 87

fossil-db:

Lunar Lake
Totals:
Instrs: 233260196 -> 233259474 (-0.00%); split: -0.00%, +0.00%
Cycle count: 32754567116 -> 32754515890 (-0.00%); split: -0.00%, +0.00%
Max live registers: 71738442 -> 71738398 (-0.00%); split: -0.00%, +0.00%

Totals from 6842 (0.87% of 790721) affected shaders:
Instrs: 5566926 -> 5566204 (-0.01%); split: -0.01%, +0.00%
Cycle count: 512487046 -> 512435820 (-0.01%); split: -0.20%, +0.19%
Max live registers: 1100656 -> 1100612 (-0.00%); split: -0.00%, +0.00%

Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Instrs: 264071212 -> 264066944 (-0.00%); split: -0.00%, +0.00%
Cycle count: 26552458051 -> 26553286277 (+0.00%); split: -0.00%, +0.01%
Spill count: 530380 -> 530084 (-0.06%)
Fill count: 613416 -> 612900 (-0.08%)
Scratch Memory Size: 20089856 -> 20075520 (-0.07%)
Max live registers: 46558852 -> 46558811 (-0.00%); split: -0.00%, +0.00%
Max dispatch width: 8034616 -> 8034584 (-0.00%)

Totals from 6653 (0.73% of 905545) affected shaders:
Instrs: 5750844 -> 5746576 (-0.07%); split: -0.08%, +0.00%
Cycle count: 416414845 -> 417243071 (+0.20%); split: -0.20%, +0.40%
Spill count: 1953 -> 1657 (-15.16%)
Fill count: 3556 -> 3040 (-14.51%)
Scratch Memory Size: 92160 -> 77824 (-15.56%)
Max live registers: 566003 -> 565962 (-0.01%); split: -0.01%, +0.00%
Max dispatch width: 55768 -> 55736 (-0.06%)

No shader-db or fossil-db changes on any previous Intel platforms.

Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
2025-10-10 17:25:11 +00:00
Ian Romanick
f7939f2fdc nir/range_analysis: Handle bfi and bitfield_select in get_alu_uub
I noticed some things related to this while implementing support for
bitfield_select / BFN in BRW.

shader-db:

Lunar Lake
total instructions in shared programs: 17183140 -> 17183128 (<.01%)
instructions in affected programs: 3830 -> 3818 (-0.31%)
helped: 6 / HURT: 0

total cycles in shared programs: 889936934 -> 889936056 (<.01%)
cycles in affected programs: 253758 -> 252880 (-0.35%)
helped: 4 / HURT: 2

No shader-db changes on any other Intel platform.

fossil-db:

Lunar Lake
Totals:
Instrs: 233285343 -> 233284796 (-0.00%); split: -0.00%, +0.00%
Cycle count: 32756777978 -> 32756399804 (-0.00%); split: -0.00%, +0.00%
Max live registers: 71738646 -> 71738626 (-0.00%)
Non SSA regs after NIR: 67837900 -> 67837902 (+0.00%)

Totals from 177 (0.02% of 790723) affected shaders:
Instrs: 389849 -> 389302 (-0.14%); split: -0.14%, +0.00%
Cycle count: 356341872 -> 355963698 (-0.11%); split: -0.11%, +0.01%
Max live registers: 39364 -> 39344 (-0.05%)
Non SSA regs after NIR: 70453 -> 70455 (+0.00%)

Meteor Lake, DG2, and Ice Lake had similar results. (Meteor Lake shown)
Totals:
Instrs: 264095611 -> 264095358 (-0.00%)
Cycle count: 26555705299 -> 26554303407 (-0.01%); split: -0.01%, +0.00%
Fill count: 613233 -> 613231 (-0.00%)

Totals from 123 (0.01% of 905547) affected shaders:
Instrs: 334830 -> 334577 (-0.08%)
Cycle count: 326531667 -> 325129775 (-0.43%); split: -0.65%, +0.22%
Fill count: 4145 -> 4143 (-0.05%)

Tiger Lake and Skylake had similar results. (Tiger Lake shown)
Totals:
Instrs: 269733849 -> 269733590 (-0.00%)
Cycle count: 25240548036 -> 25241435039 (+0.00%); split: -0.00%, +0.01%

Totals from 123 (0.01% of 903812) affected shaders:
Instrs: 338617 -> 338358 (-0.08%)
Cycle count: 326605644 -> 327492647 (+0.27%); split: -0.13%, +0.40%

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
2025-10-10 17:25:08 +00:00
Ian Romanick
aa53735b66 nir/algebraic: Prefer bfi over bitfield_select for bitfield_insert
Intel platforms will soon implement both bfi and bitfield_select. bfi is
more efficient for bitfield_insert.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
2025-10-10 17:25:08 +00:00
Ian Romanick
08ec408061 nir/algebraic: Optimize f2u of negative value to zero
The eliminated SENDs are from a single app that has a bunch of
fragment shaders with a sequence like:

    con 32    %495 = fmul! %203.i, %1 (0.000000)
    con 32    %496 = ffma! %203.j, %1 (0.000000), %495
    con 32    %497 = ffma! %203.k, %1 (0.000000), %496
    con 32    %498 = ffma! %203.l, %1 (0.000000), %497
    con 32    %499 = @load_reloc_const_intel (param_idx=1, base=0)
    con 32    %500 = @load_reloc_const_intel (param_idx=0, base=0)
    con 32    %501 = f2u32 %498
    con 32    %502 = umin %501, %172 (0x4)
    con 32    %503 = ishl %502, %172 (0x4)
    con 32    %504 = load_const (0x00000040 = 64)
    con 32    %505 = umin %503, %504 (0x40)
    con 32    %506 = iadd %500, %505

The `f2u` is replaced with 0, and that makes the `ffma` dot-product
sequence be unused. Since it is unused, most of the preceeding block
gets eliminated. A lot of instructions after the `f2u` are also
eliminated by other algebraic optimizations. Most importantly, %203 is
the result of a `load_ubo_uniform_block_intel` that is eliminated.

No shader-db changes on any Intel platform.

fossil-db:

All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 919895603 -> 919804051 (-0.01%); split: -0.01%, +0.00%
Send messages: 40892036 -> 40887569 (-0.01%)
Cycle count: 99176770712 -> 99174971806 (-0.00%); split: -0.00%, +0.00%
Max live registers: 190030365 -> 190030367 (+0.00%)
Max dispatch width: 47415040 -> 47415024 (-0.00%)
Non SSA regs after NIR: 228872538 -> 228863608 (-0.00%); split: -0.00%, +0.00%

Totals from 2234 (0.11% of 1955134) affected shaders:
Instrs: 1989743 -> 1898191 (-4.60%); split: -4.60%, +0.00%
Send messages: 44179 -> 39712 (-10.11%)
Cycle count: 25416114 -> 23617208 (-7.08%); split: -7.08%, +0.00%
Max live registers: 367357 -> 367359 (+0.00%)
Max dispatch width: 39184 -> 39168 (-0.04%)
Non SSA regs after NIR: 471173 -> 462243 (-1.90%); split: -1.90%, +0.00%

Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
2025-10-10 17:25:08 +00:00
Ian Romanick
5667459ff1 nir/algebraic: Don't introduce undefined behavior in f2u conversion
If the source -1.0 < x < 0.0, simply removing the ftrun will introduce
undefined behavior. By chance of how at least Intel and NVIDIA GPUs
implement f2u, this has Just Worked.

No shader-db changes on any Intel platform.

fossil-db:

Lunar Lake
Totals:
Instrs: 913264354 -> 913264366 (+0.00%)
Cycle count: 104953995530 -> 104953996854 (+0.00%)
Max live registers: 189266026 -> 189266058 (+0.00%)
Non SSA regs after NIR: 227779417 -> 227779369 (-0.00%)

Totals from 24 (0.00% of 1984794) affected shaders:
Instrs: 4669 -> 4681 (+0.26%)
Cycle count: 50610 -> 51934 (+2.62%)
Max live registers: 1222 -> 1254 (+2.62%)
Non SSA regs after NIR: 1174 -> 1126 (-4.09%)

Meteor Lake, DG2, Tiger Lake, and Ice Lake had similar results. (Meteor Lake shown)
Totals:
Instrs: 1001288026 -> 1001288038 (+0.00%)
Cycle count: 92813392671 -> 92813392791 (+0.00%)
Max live registers: 121935383 -> 121935399 (+0.00%)
Max dispatch width: 19949928 -> 19949912 (-0.00%)

Totals from 2 (0.00% of 2284670) affected shaders:
Instrs: 1380 -> 1392 (+0.87%)
Cycle count: 18940 -> 19060 (+0.63%)
Max live registers: 136 -> 152 (+11.76%)
Max dispatch width: 32 -> 16 (-50.00%)

No fossil-db changes on Skylake.

Suggested-by: Georg Lehmann
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
2025-10-10 17:25:07 +00:00
Ian Romanick
4338f7d033 nir/algebraic: Remove useless ftrunc inside f2i/f2u
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
2025-10-10 17:25:07 +00:00
Ian Romanick
c49d6e0480 nir/algebraic: Elide range clamping of f2u sources
There are no shader-db changes on ELK platforms because those platforms
don't support 8- or 16-bit integer types.

v2: Restrict patterns generated such that the integer limits are exactly
representable in the specified floating point format. With the exception
of the value 0, this requires that float_sz > int_sz. This had no impact
on shader-db or fossil-db on any Intel platform. Noticed by Georg.

v3: Add a missing is_a_number.

shader-db:

All Intel platforms had similar results. (Lunar Lake shown)
total cycles in shared programs: 889936056 -> 889934082 (<.01%)
cycles in affected programs: 65806 -> 63832 (-3.00%)
helped: 2 / HURT: 0

fossil-db:

Lunar Lake
Totals:
Instrs: 233284796 -> 233282917 (-0.00%); split: -0.00%, +0.00%
Cycle count: 32756399804 -> 32754972188 (-0.00%); split: -0.01%, +0.00%
Spill count: 519861 -> 519813 (-0.01%)
Fill count: 663650 -> 663626 (-0.00%); split: -0.01%, +0.01%
Max live registers: 71738626 -> 71738696 (+0.00%)
Non SSA regs after NIR: 67837902 -> 67837648 (-0.00%)

Totals from 1236 (0.16% of 790723) affected shaders:
Instrs: 2134504 -> 2132625 (-0.09%); split: -0.09%, +0.01%
Cycle count: 604922278 -> 603494662 (-0.24%); split: -0.48%, +0.25%
Spill count: 16509 -> 16461 (-0.29%)
Fill count: 32760 -> 32736 (-0.07%); split: -0.22%, +0.15%
Max live registers: 250112 -> 250182 (+0.03%)
Non SSA regs after NIR: 302368 -> 302114 (-0.08%)

Meteor Lake, DG2, and Tiger Lake had similar results. (Meteor Lake shown)
Totals:
Instrs: 264095370 -> 264094056 (-0.00%); split: -0.00%, +0.00%
Cycle count: 26554146277 -> 26553027268 (-0.00%); split: -0.01%, +0.01%
Spill count: 530603 -> 530615 (+0.00%)
Fill count: 613231 -> 613273 (+0.01%)
Max live registers: 46559041 -> 46559087 (+0.00%)

Totals from 1237 (0.14% of 905547) affected shaders:
Instrs: 2262517 -> 2261203 (-0.06%); split: -0.07%, +0.01%
Cycle count: 518219799 -> 517100790 (-0.22%); split: -0.59%, +0.37%
Spill count: 17518 -> 17530 (+0.07%)
Fill count: 32273 -> 32315 (+0.13%)
Max live registers: 128360 -> 128406 (+0.04%)

Ice Lake and Skylake had similar results. (Ice Lake shown)
Totals:
Instrs: 269849640 -> 269848198 (-0.00%); split: -0.00%, +0.00%
Cycle count: 26718329643 -> 26718289020 (-0.00%); split: -0.00%, +0.00%
Max live registers: 46878430 -> 46878462 (+0.00%)

Totals from 1233 (0.14% of 905427) affected shaders:
Instrs: 2324225 -> 2322783 (-0.06%); split: -0.06%, +0.00%
Cycle count: 531467501 -> 531426878 (-0.01%); split: -0.11%, +0.10%
Max live registers: 130782 -> 130814 (+0.02%)

Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
2025-10-10 17:25:07 +00:00
Ian Romanick
986086c846 nir: Add saturating float to integer conversion opcodes
v2: Add a comment around has_f2[ui]_sat explaining which opcodes it
enables. Suggested by Georg. Cast u_uintN_max and friends to double in
nir_opcodes.py. This ensures that an exact conversion is made.
Eliminate duplicate conversions from half float to double. Both noticed
by Georg.

v3: Apply "NaN should be zero" fix suggested by Georg.

Co-authored-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
2025-10-10 17:25:05 +00:00
Lionel Landwerlin
301b71a19f compiler: add an access flag for intel EU fusion
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Alyssa Anne Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37394>
2025-10-10 11:19:39 +00:00
Lionel Landwerlin
c7ac46a1d8 nir/lower_io: add get_io_index_src_number support for image intrinsics
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alyssa Anne Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37394>
2025-10-10 11:19:39 +00:00