Commit graph

6746 commits

Author SHA1 Message Date
Daniel Schürmann
9039e24751 nir/lower_flrp: ad-hoc constant-fold ALU instructions
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37195>
2025-10-30 19:28:07 +00:00
Daniel Schürmann
f61cd64af8 nir/builder: add option to immediately constant-fold ALU instructions upon insertion
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37195>
2025-10-30 19:28:07 +00:00
Daniel Schürmann
870616af34 nir/constant_folding: switch to nir_shader_lower_instructions()
Small differences due to implicit DCE.
Totals from 76 (0.10% of 79839) affected shaders: (Navi48)

Instrs: 168051 -> 168044 (-0.00%); split: -0.01%, +0.01%
CodeSize: 893284 -> 893256 (-0.00%); split: -0.01%, +0.01%
Latency: 1082007 -> 1082027 (+0.00%); split: -0.00%, +0.00%
InvThroughput: 155100 -> 155105 (+0.00%)
Copies: 9649 -> 9654 (+0.05%)
VALU: 92504 -> 92509 (+0.01%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37195>
2025-10-30 19:28:07 +00:00
Daniel Schürmann
d1f2f1222e nir: guard nir_def_as_alu()
We will potentially create load_const_instr instead of ALU.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37195>
2025-10-30 19:28:06 +00:00
Daniel Schürmann
3180656bbc nir: don't use nir_build_alu() with incomplete sources
Ideally we'd have a version that takes nir_scalar arguments.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37195>
2025-10-30 19:28:06 +00:00
Daniel Schürmann
ef9ecc4058 nir: add nir_imul_nuw() and nir_imul_imm_nuw() helpers
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37195>
2025-10-30 19:28:06 +00:00
Alyssa Rosenzweig
b82044c31b nir/lower_two_sided_color: cleanup
while in the area. no functional change

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38124>
2025-10-29 15:52:27 +00:00
Job Noorman
32b646c597 nir: print in_bounds info for deref_type(_ptr_as)_array
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38110>
2025-10-28 14:21:01 +00:00
Natalie Vock
50e65dac79 nir/lower_shader_calls: Repair SSA after wrap_instrs
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Wrapping jump instructions that are located inside ifs can break SSA
invariants because the else block no longer dominates the merge block.
Repair the SSA to make the validator happy again.

Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37957>
2025-10-26 11:38:51 +00:00
Alyssa Rosenzweig
b824ef83ab util/dynarray: infer type in append
Most of the time, we can infer the type to append in
util_dynarray_append using __typeof__, which is standardized in C23 and
support in Jesse's MSMSVCV. This patch drops the type argument most of
the time, making util_dynarray a little more ergonomic to use.

This is done in four steps.

First, rename util_dynarray_append -> util_dynarray_append_typed

    bash -c "find . -type f -exec sed -i -e 's/util_dynarray_append(/util_dynarray_append_typed(/g' \{} \;"

Then, add a new append that infers the type. This is much more ergonomic
for what you want most of the time.

Next, use type-inferred append as much as possible, via Coccinelle
patch (plus manual fixup):

    @@
    expression dynarray, element;
    type type;
    @@

    -util_dynarray_append_typed(dynarray, type, element);
    +util_dynarray_append(dynarray, element);

Finally, hand fixup cases that Coccinelle missed or incorrectly
translated, of which there were several because we can't used the
untyped append with a literal (since the sizeof won't do what you want).

All four steps are squashed to produce a single patch changing every
util_dynarray_append call site in tree to either drop a type parameter
(if possible) or insert a _typed suffix (if we can't infer). As such,
the final patch is best reviewed by hand even though it was
tool-assisted.

No Long Linguine Meals were involved in the making of this patch.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38038>
2025-10-24 18:32:07 +00:00
Ian Romanick
f1bbc3d4e4 nir/algebraic: Don't generate integer min or max that will need to be lowered
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
In !35844, there was some discussion about allowing 64-bit bcsel that
would be lowered in the driver. One challenge there would be if a 64-bit
bcsel was transformed into integer min or max by an algebraic
optimization. I believe these were the only algebraic patterns that
could create new integer min or max that would not be immediately
constant folded.

There were no shader-db or fossil-db changes on any Intel platform.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38033>
2025-10-23 22:35:27 +00:00
Rhys Perry
92beca9aa5 nir/lower_tex: optimize txd(coord, ddx/ddy(coord))
fossil-db (gfx1201):
Totals from 73 (0.09% of 79839) affected shaders:
MaxWaves: 1668 -> 1670 (+0.12%)
Instrs: 352537 -> 347991 (-1.29%); split: -1.29%, +0.00%
CodeSize: 1924140 -> 1887660 (-1.90%); split: -1.90%, +0.00%
VGPRs: 6360 -> 6324 (-0.57%)
Latency: 3891330 -> 3888192 (-0.08%); split: -0.10%, +0.02%
InvThroughput: 789998 -> 783583 (-0.81%); split: -0.84%, +0.03%
VClause: 6409 -> 6408 (-0.02%); split: -0.06%, +0.05%
SClause: 4071 -> 4102 (+0.76%); split: -0.10%, +0.86%
Copies: 16756 -> 16316 (-2.63%); split: -2.94%, +0.32%
PreVGPRs: 5456 -> 5432 (-0.44%); split: -0.57%, +0.13%
VALU: 232982 -> 228117 (-2.09%)
SALU: 32853 -> 32848 (-0.02%); split: -0.05%, +0.03%
VMEM: 9234 -> 9237 (+0.03%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37561>
2025-10-23 11:21:59 +00:00
Rhys Perry
8e7ea4a882 nir/lower_shader_calls: reobtain impl after NIR_PASS
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37573>
2025-10-23 10:44:38 +00:00
Lionel Landwerlin
aa929ea706 nir/lower_io: add missing levels intrinsics to get_io_index_src_number
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: c7ac46a1d8 ("nir/lower_io: add get_io_index_src_number support for image intrinsics")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38012>
2025-10-22 21:21:47 +00:00
Simon Perretta
ff51e6dc9e nir: commonize barycentric intrinsic opt pass
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Introduces an opt pass that attempts to optimize
load_barycentric_at_{sample,offset} with simpler load_barycentric_*
equivalents where possible, and optionally lowers
load_barycentric_at_sample to load_barycentric_at_offset with a position
derived from the sample ID instead.

Signed-off-by: Simon Perretta <simon.perretta@imgtec.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37658>
2025-10-22 16:48:01 +00:00
Olivia Lee
a410d90fd2 panfrost: fix cl_local_size for precompiled shaders
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
nir_lower_compute_system_values will attempt to lower
load_workgroup_size unless workgroup_size_variable is set. For precomp
shaders, the workgroup size is set statically for each entrypoint by
nir_precompiled_build_variant. Because we call
lower_compute_system_values early, it sets the workgroup size to zero.
Temporarily setting workgroup_size_variable while we are still
processing all the entrypoints together inhibits this.

Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Fixes: 20970bcd96 ("panfrost: Add base of OpenCL C infrastructure")
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37799>
2025-10-22 00:15:49 +00:00
Rhys Perry
64ec757688 nir/lower_mem_access_bit_sizes: increase chunk limit
Not sure about creating u64vec16 loads, but creating unaligned loads is
possible with opt_if_rewrite_uniform_uses.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37953>
2025-10-21 22:10:34 +00:00
Georg Lehmann
cf4ab485ea nir: remove manual nir_load_global_constant
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37959>
2025-10-21 12:39:53 +02:00
Georg Lehmann
654bd74c60 treewide: use nir_store_global alias of nir_build_store_global
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37959>
2025-10-21 12:37:58 +02:00
Georg Lehmann
2306cba65b nir: remove manual nir_store_global
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37959>
2025-10-21 12:37:58 +02:00
Georg Lehmann
9e41a7c139 treewide: use nir_load_global alias of nir_build_load_global
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37959>
2025-10-21 12:37:58 +02:00
Georg Lehmann
77540cac8c nir: remove manual nir_load_global
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37959>
2025-10-21 12:37:58 +02:00
Lionel Landwerlin
255d1e883d nir/divergence: fix handling of intel uniform block load
Those are normally uniform always, but for the purpose of fused
threads handling, we need to check their sources.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: ca1533cd03 ("nir/divergence: add a new mode to cover fused threads on Intel HW")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37929>
2025-10-21 06:13:10 +00:00
Emma Anholt
0781edc30f nir/copy_prop_vars: Mask out no-op writes to variables.
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
The pass previously supported removing complete no-op writes, but we can
do better by noticing if any channel being written is the current
channel's value and masking off those writes.  I noticed this happening in
Stray's fragment shader, where it looked like some translation layer had
turned var[x].zw = vec2(a, b) into var[x] = vec4(var[x].x, var[x].y, a,
b).  This in turn lets nir_shrink_vec_array_vars be more effective.

Totals:
MaxWaves: 22158876 -> 22156696 (-0.01%); split: +0.00%, -0.01%
Instrs: 401167243 -> 401007996 (-0.04%); split: -0.04%, +0.00%
CodeSize: 1004397302 -> 1004133728 (-0.03%); split: -0.03%, +0.00%
STPs: 369810 -> 234618 (-36.56%)
LDPs: 209430 -> 172011 (-17.87%)

Totals from 1884 (0.12% of 1560230) affected shaders:
MaxWaves: 12686 -> 10506 (-17.18%); split: +6.97%, -24.15%
Instrs: 2099486 -> 1940239 (-7.59%); split: -7.64%, +0.06%
CodeSize: 4570472 -> 4306898 (-5.77%); split: -5.81%, +0.05%
NOPs: 334399 -> 270881 (-18.99%); split: -20.58%, +1.58%
MOVs: 131003 -> 148034 (+13.00%); split: -11.59%, +24.59%
COVs: 14512 -> 16921 (+16.60%); split: -0.23%, +16.83%
Full: 58120 -> 72399 (+24.57%); split: -6.75%, +31.31%
(ss): 79215 -> 45331 (-42.77%); split: -48.46%, +5.68%
(sy): 33081 -> 11119 (-66.39%); split: -66.56%, +0.18%
(ss)-stall: 302152 -> 115528 (-61.76%); split: -64.34%, +2.57%
(sy)-stall: 2706110 -> 498998 (-81.56%); split: -81.68%, +0.12%
STPs: 212045 -> 76853 (-63.76%)
LDPs: 47337 -> 9918 (-79.05%)
Preamble Instrs: 413954 -> 413630 (-0.08%); split: -0.21%, +0.13%

Cat0: 370362 -> 306844 (-17.15%); split: -18.58%, +1.43%
Cat1: 145629 -> 165003 (+13.30%); split: -10.51%, +23.81%
Cat2: 687947 -> 683992 (-0.57%); split: -0.61%, +0.04%
Cat3: 362919 -> 360690 (-0.61%); split: -0.72%, +0.11%
Cat6: 461411 -> 352375 (-23.63%)
Cat7: 16857 -> 16974 (+0.69%); split: -0.35%, +1.04%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37313>
2025-10-20 19:24:45 +00:00
Emma Anholt
537cc4e0ff nir/shrink_stores: Don't shrink stores to an invalid num_components.
Avoids a regression in the CL CTS on the next commit.

Fixes: 2dba7e6056 ("nir: split nir_opt_shrink_stores from nir_opt_shrink_vectors")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37313>
2025-10-20 19:24:45 +00:00
Emma Anholt
d8690f9c60 nir/link_opt_varyings: Make it participate in NIR_DEBUG=print.
It's a pass with major effects on shaders, and it's otherwise weird to see
your varying disappear between two passes that shouldn't affect them.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37313>
2025-10-20 19:24:45 +00:00
Aitor Camacho
f711c3afed nir: Add KosmicKrisp required utilities
Reviewed-by: Alyssa Anne Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37520>
2025-10-20 16:22:00 +00:00
Job Noorman
ad421cdf2e nir: mark fneg distribution through fadd/ffma as nsz
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
df1876f615 ("nir: Mark negative re-distribution on fadd as imprecise")
fixed the fadd case by marking it as imprecise. This commit fixes the
ffma case for the same reason.

However, "imprecise" isn't necessary and nowadays we have "nsz" which is
more accurate here. Use that for both fadd and ffma.

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Fixes: 62795475e8 ("nir/algebraic: Distribute source modifiers into instructions")
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37930>
2025-10-17 08:58:59 +00:00
Mary Guillemard
6f73533094 asahi,nir: Stop relying on zero and scratch page in GS/TESS code
Introduce new NIR intrinsics to handle getting a "sink" read-only
address and another intrinsic to handle conversion of address to
read-write (allowing implementation to replace the "sink" read-only with
another address like required for Asahi)

Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37914>
2025-10-16 19:25:35 +00:00
Mary Guillemard
1e0c18d6cf nir: Rename stat_query_address_agx to stat_query_address_poly
This is used by the geometry lowering that we are going to move to
common code.

Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37914>
2025-10-16 19:25:35 +00:00
Alyssa Rosenzweig
84d8e6824b treewide: don't check before free
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
This was something that came up in the slop MR. Not sure it's actually a
good idea or not but kind of curious what people think, given we have a
sound tool (Coccinelle) to do the transform. Saves a redundant branch
but means extra noninlined function calls.. likely no actual perf impact
but saves some code.

Via Coccinelle patches:

    @@
    expression ptr;
    @@

    -if (ptr) {
    -free(ptr);
    -}
    +free(ptr);

    @@
    expression ptr;
    @@

    -if (ptr) {
    -FREE(ptr);
    -}
    +FREE(ptr);

    @@
    expression ptr;
    @@

    -if (ptr) {
    -ralloc_free(ptr);
    -}
    +ralloc_free(ptr);

    @@
    expression ptr;
    @@

    -if (ptr != NULL) {
    -free(ptr);
    -}
    -
    +free(ptr);

    @@
    expression ptr;
    @@

    -if (ptr != NULL) {
    -FREE(ptr);
    -}
    -
    +FREE(ptr);

    @@
    expression ptr;
    @@

    -if (ptr != NULL) {
    -ralloc_free(ptr);
    -}
    -
    +ralloc_free(ptr);

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> [v3d]
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org> [venus]
Reviewed-by: Frank Binns <frank.binns@imgtec.com> [powervr]
Reviewed-by: Janne Grunau <j@jannau.net> [asahi]
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> [radv]
Reviewed-by: Job Noorman <jnoorman@igalia.com> [ir3]
Acked-by: Marek Olšák <maraeo@gmail.com>
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Job Noorman <jnoorman@igalia.com>
Acked-by: Yonggang Luo <luoyonggang@gmail.com>
Acked-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37892>
2025-10-15 23:01:33 +00:00
Dave Airlie
543c9be87a nir/coopmat: fix non square load/store lowering for flexible dimensions
This shouldn't affect radv, but we should do the calculations correctly for
when non-square matters.

Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37879>
2025-10-16 07:19:28 +10:00
Daniel Schürmann
fad10b91a6 nir/divergence: don't assume that load_sample_positions_amd is always uniform
Sample positions aren't uniform when the sample id is divergent.
This was a regression when we started lowering fragment shader
barycentrics in NIR.

Fixes: 7f444fc72c ("nir: add nir_intrinsic_load_sample_positions_amd")
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37843>
2025-10-14 16:33:10 +00:00
Job Noorman
0b82b803d9 nir,ir3: rename umul_low to umul_16x16
This is more in line with similar opcodes like umul_32x16.

Also change its const expr: the masking based on bit size was
unnecessary as it is only defined for 32 bits. Use simple casts instead.

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37863>
2025-10-14 12:54:54 +00:00
Georg Lehmann
92d670021a nir/opt_intrinsics: optimize atomics to atomic load/store
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37822>
2025-10-14 06:24:17 +00:00
Georg Lehmann
e98b218ddd nir/opt_intrinsics: don't pass nir options around
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37822>
2025-10-14 06:24:17 +00:00
Georg Lehmann
142079759b nir/opt_uniform_atomics: optimize xchg with uniform address and data
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37822>
2025-10-14 06:24:16 +00:00
Georg Lehmann
e08911dff4 nir: fix nir_get_io_offset_src for global_atomic_swap_amd
Fixes: 354df09c88 ("nir: add global_amd to nir_get_io_offset_src/nir_get_io_index_src")

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37821>
2025-10-13 09:41:41 +00:00
Ian Romanick
1e691e68e2 nir/algebraic: Optimize bfi with odd-valued mask to bitfield_select
shader-db:

Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown)
total instructions in shared programs: 17181254 -> 17181046 (<.01%)
instructions in affected programs: 35834 -> 35626 (-0.58%)
helped: 130 / HURT: 2

total cycles in shared programs: 888543370 -> 888554248 (<.01%)
cycles in affected programs: 7443984 -> 7454862 (0.15%)
helped: 95 / HURT: 87

fossil-db:

Lunar Lake
Totals:
Instrs: 233260196 -> 233259474 (-0.00%); split: -0.00%, +0.00%
Cycle count: 32754567116 -> 32754515890 (-0.00%); split: -0.00%, +0.00%
Max live registers: 71738442 -> 71738398 (-0.00%); split: -0.00%, +0.00%

Totals from 6842 (0.87% of 790721) affected shaders:
Instrs: 5566926 -> 5566204 (-0.01%); split: -0.01%, +0.00%
Cycle count: 512487046 -> 512435820 (-0.01%); split: -0.20%, +0.19%
Max live registers: 1100656 -> 1100612 (-0.00%); split: -0.00%, +0.00%

Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Instrs: 264071212 -> 264066944 (-0.00%); split: -0.00%, +0.00%
Cycle count: 26552458051 -> 26553286277 (+0.00%); split: -0.00%, +0.01%
Spill count: 530380 -> 530084 (-0.06%)
Fill count: 613416 -> 612900 (-0.08%)
Scratch Memory Size: 20089856 -> 20075520 (-0.07%)
Max live registers: 46558852 -> 46558811 (-0.00%); split: -0.00%, +0.00%
Max dispatch width: 8034616 -> 8034584 (-0.00%)

Totals from 6653 (0.73% of 905545) affected shaders:
Instrs: 5750844 -> 5746576 (-0.07%); split: -0.08%, +0.00%
Cycle count: 416414845 -> 417243071 (+0.20%); split: -0.20%, +0.40%
Spill count: 1953 -> 1657 (-15.16%)
Fill count: 3556 -> 3040 (-14.51%)
Scratch Memory Size: 92160 -> 77824 (-15.56%)
Max live registers: 566003 -> 565962 (-0.01%); split: -0.01%, +0.00%
Max dispatch width: 55768 -> 55736 (-0.06%)

No shader-db or fossil-db changes on any previous Intel platforms.

Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
2025-10-10 17:25:11 +00:00
Ian Romanick
f7939f2fdc nir/range_analysis: Handle bfi and bitfield_select in get_alu_uub
I noticed some things related to this while implementing support for
bitfield_select / BFN in BRW.

shader-db:

Lunar Lake
total instructions in shared programs: 17183140 -> 17183128 (<.01%)
instructions in affected programs: 3830 -> 3818 (-0.31%)
helped: 6 / HURT: 0

total cycles in shared programs: 889936934 -> 889936056 (<.01%)
cycles in affected programs: 253758 -> 252880 (-0.35%)
helped: 4 / HURT: 2

No shader-db changes on any other Intel platform.

fossil-db:

Lunar Lake
Totals:
Instrs: 233285343 -> 233284796 (-0.00%); split: -0.00%, +0.00%
Cycle count: 32756777978 -> 32756399804 (-0.00%); split: -0.00%, +0.00%
Max live registers: 71738646 -> 71738626 (-0.00%)
Non SSA regs after NIR: 67837900 -> 67837902 (+0.00%)

Totals from 177 (0.02% of 790723) affected shaders:
Instrs: 389849 -> 389302 (-0.14%); split: -0.14%, +0.00%
Cycle count: 356341872 -> 355963698 (-0.11%); split: -0.11%, +0.01%
Max live registers: 39364 -> 39344 (-0.05%)
Non SSA regs after NIR: 70453 -> 70455 (+0.00%)

Meteor Lake, DG2, and Ice Lake had similar results. (Meteor Lake shown)
Totals:
Instrs: 264095611 -> 264095358 (-0.00%)
Cycle count: 26555705299 -> 26554303407 (-0.01%); split: -0.01%, +0.00%
Fill count: 613233 -> 613231 (-0.00%)

Totals from 123 (0.01% of 905547) affected shaders:
Instrs: 334830 -> 334577 (-0.08%)
Cycle count: 326531667 -> 325129775 (-0.43%); split: -0.65%, +0.22%
Fill count: 4145 -> 4143 (-0.05%)

Tiger Lake and Skylake had similar results. (Tiger Lake shown)
Totals:
Instrs: 269733849 -> 269733590 (-0.00%)
Cycle count: 25240548036 -> 25241435039 (+0.00%); split: -0.00%, +0.01%

Totals from 123 (0.01% of 903812) affected shaders:
Instrs: 338617 -> 338358 (-0.08%)
Cycle count: 326605644 -> 327492647 (+0.27%); split: -0.13%, +0.40%

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
2025-10-10 17:25:08 +00:00
Ian Romanick
aa53735b66 nir/algebraic: Prefer bfi over bitfield_select for bitfield_insert
Intel platforms will soon implement both bfi and bitfield_select. bfi is
more efficient for bitfield_insert.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
2025-10-10 17:25:08 +00:00
Ian Romanick
08ec408061 nir/algebraic: Optimize f2u of negative value to zero
The eliminated SENDs are from a single app that has a bunch of
fragment shaders with a sequence like:

    con 32    %495 = fmul! %203.i, %1 (0.000000)
    con 32    %496 = ffma! %203.j, %1 (0.000000), %495
    con 32    %497 = ffma! %203.k, %1 (0.000000), %496
    con 32    %498 = ffma! %203.l, %1 (0.000000), %497
    con 32    %499 = @load_reloc_const_intel (param_idx=1, base=0)
    con 32    %500 = @load_reloc_const_intel (param_idx=0, base=0)
    con 32    %501 = f2u32 %498
    con 32    %502 = umin %501, %172 (0x4)
    con 32    %503 = ishl %502, %172 (0x4)
    con 32    %504 = load_const (0x00000040 = 64)
    con 32    %505 = umin %503, %504 (0x40)
    con 32    %506 = iadd %500, %505

The `f2u` is replaced with 0, and that makes the `ffma` dot-product
sequence be unused. Since it is unused, most of the preceeding block
gets eliminated. A lot of instructions after the `f2u` are also
eliminated by other algebraic optimizations. Most importantly, %203 is
the result of a `load_ubo_uniform_block_intel` that is eliminated.

No shader-db changes on any Intel platform.

fossil-db:

All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 919895603 -> 919804051 (-0.01%); split: -0.01%, +0.00%
Send messages: 40892036 -> 40887569 (-0.01%)
Cycle count: 99176770712 -> 99174971806 (-0.00%); split: -0.00%, +0.00%
Max live registers: 190030365 -> 190030367 (+0.00%)
Max dispatch width: 47415040 -> 47415024 (-0.00%)
Non SSA regs after NIR: 228872538 -> 228863608 (-0.00%); split: -0.00%, +0.00%

Totals from 2234 (0.11% of 1955134) affected shaders:
Instrs: 1989743 -> 1898191 (-4.60%); split: -4.60%, +0.00%
Send messages: 44179 -> 39712 (-10.11%)
Cycle count: 25416114 -> 23617208 (-7.08%); split: -7.08%, +0.00%
Max live registers: 367357 -> 367359 (+0.00%)
Max dispatch width: 39184 -> 39168 (-0.04%)
Non SSA regs after NIR: 471173 -> 462243 (-1.90%); split: -1.90%, +0.00%

Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
2025-10-10 17:25:08 +00:00
Ian Romanick
5667459ff1 nir/algebraic: Don't introduce undefined behavior in f2u conversion
If the source -1.0 < x < 0.0, simply removing the ftrun will introduce
undefined behavior. By chance of how at least Intel and NVIDIA GPUs
implement f2u, this has Just Worked.

No shader-db changes on any Intel platform.

fossil-db:

Lunar Lake
Totals:
Instrs: 913264354 -> 913264366 (+0.00%)
Cycle count: 104953995530 -> 104953996854 (+0.00%)
Max live registers: 189266026 -> 189266058 (+0.00%)
Non SSA regs after NIR: 227779417 -> 227779369 (-0.00%)

Totals from 24 (0.00% of 1984794) affected shaders:
Instrs: 4669 -> 4681 (+0.26%)
Cycle count: 50610 -> 51934 (+2.62%)
Max live registers: 1222 -> 1254 (+2.62%)
Non SSA regs after NIR: 1174 -> 1126 (-4.09%)

Meteor Lake, DG2, Tiger Lake, and Ice Lake had similar results. (Meteor Lake shown)
Totals:
Instrs: 1001288026 -> 1001288038 (+0.00%)
Cycle count: 92813392671 -> 92813392791 (+0.00%)
Max live registers: 121935383 -> 121935399 (+0.00%)
Max dispatch width: 19949928 -> 19949912 (-0.00%)

Totals from 2 (0.00% of 2284670) affected shaders:
Instrs: 1380 -> 1392 (+0.87%)
Cycle count: 18940 -> 19060 (+0.63%)
Max live registers: 136 -> 152 (+11.76%)
Max dispatch width: 32 -> 16 (-50.00%)

No fossil-db changes on Skylake.

Suggested-by: Georg Lehmann
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
2025-10-10 17:25:07 +00:00
Ian Romanick
4338f7d033 nir/algebraic: Remove useless ftrunc inside f2i/f2u
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
2025-10-10 17:25:07 +00:00
Ian Romanick
c49d6e0480 nir/algebraic: Elide range clamping of f2u sources
There are no shader-db changes on ELK platforms because those platforms
don't support 8- or 16-bit integer types.

v2: Restrict patterns generated such that the integer limits are exactly
representable in the specified floating point format. With the exception
of the value 0, this requires that float_sz > int_sz. This had no impact
on shader-db or fossil-db on any Intel platform. Noticed by Georg.

v3: Add a missing is_a_number.

shader-db:

All Intel platforms had similar results. (Lunar Lake shown)
total cycles in shared programs: 889936056 -> 889934082 (<.01%)
cycles in affected programs: 65806 -> 63832 (-3.00%)
helped: 2 / HURT: 0

fossil-db:

Lunar Lake
Totals:
Instrs: 233284796 -> 233282917 (-0.00%); split: -0.00%, +0.00%
Cycle count: 32756399804 -> 32754972188 (-0.00%); split: -0.01%, +0.00%
Spill count: 519861 -> 519813 (-0.01%)
Fill count: 663650 -> 663626 (-0.00%); split: -0.01%, +0.01%
Max live registers: 71738626 -> 71738696 (+0.00%)
Non SSA regs after NIR: 67837902 -> 67837648 (-0.00%)

Totals from 1236 (0.16% of 790723) affected shaders:
Instrs: 2134504 -> 2132625 (-0.09%); split: -0.09%, +0.01%
Cycle count: 604922278 -> 603494662 (-0.24%); split: -0.48%, +0.25%
Spill count: 16509 -> 16461 (-0.29%)
Fill count: 32760 -> 32736 (-0.07%); split: -0.22%, +0.15%
Max live registers: 250112 -> 250182 (+0.03%)
Non SSA regs after NIR: 302368 -> 302114 (-0.08%)

Meteor Lake, DG2, and Tiger Lake had similar results. (Meteor Lake shown)
Totals:
Instrs: 264095370 -> 264094056 (-0.00%); split: -0.00%, +0.00%
Cycle count: 26554146277 -> 26553027268 (-0.00%); split: -0.01%, +0.01%
Spill count: 530603 -> 530615 (+0.00%)
Fill count: 613231 -> 613273 (+0.01%)
Max live registers: 46559041 -> 46559087 (+0.00%)

Totals from 1237 (0.14% of 905547) affected shaders:
Instrs: 2262517 -> 2261203 (-0.06%); split: -0.07%, +0.01%
Cycle count: 518219799 -> 517100790 (-0.22%); split: -0.59%, +0.37%
Spill count: 17518 -> 17530 (+0.07%)
Fill count: 32273 -> 32315 (+0.13%)
Max live registers: 128360 -> 128406 (+0.04%)

Ice Lake and Skylake had similar results. (Ice Lake shown)
Totals:
Instrs: 269849640 -> 269848198 (-0.00%); split: -0.00%, +0.00%
Cycle count: 26718329643 -> 26718289020 (-0.00%); split: -0.00%, +0.00%
Max live registers: 46878430 -> 46878462 (+0.00%)

Totals from 1233 (0.14% of 905427) affected shaders:
Instrs: 2324225 -> 2322783 (-0.06%); split: -0.06%, +0.00%
Cycle count: 531467501 -> 531426878 (-0.01%); split: -0.11%, +0.10%
Max live registers: 130782 -> 130814 (+0.02%)

Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
2025-10-10 17:25:07 +00:00
Ian Romanick
986086c846 nir: Add saturating float to integer conversion opcodes
v2: Add a comment around has_f2[ui]_sat explaining which opcodes it
enables. Suggested by Georg. Cast u_uintN_max and friends to double in
nir_opcodes.py. This ensures that an exact conversion is made.
Eliminate duplicate conversions from half float to double. Both noticed
by Georg.

v3: Apply "NaN should be zero" fix suggested by Georg.

Co-authored-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
2025-10-10 17:25:05 +00:00
Lionel Landwerlin
301b71a19f compiler: add an access flag for intel EU fusion
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Alyssa Anne Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37394>
2025-10-10 11:19:39 +00:00
Lionel Landwerlin
c7ac46a1d8 nir/lower_io: add get_io_index_src_number support for image intrinsics
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alyssa Anne Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37394>
2025-10-10 11:19:39 +00:00
Lionel Landwerlin
ca1533cd03 nir/divergence: add a new mode to cover fused threads on Intel HW
The Intel Gfx12.x generation of GPU has an architecture feature called
EU fusion in which 2 subgroups run lock step. A typical case where
this happens is a compute shader with 1x1x1 local workgroup size and a
dispatch command of 2x1x1. In that case 2 threads will be run in lock
step for each of the workgroup.

This has been the sources of some troubles in the backend because one
subgroup can run with all lanes disabled, requiring care for SEND
messages using the NoMask flag (execution regardless of the lane mask).

We found out that other things are happening when 2 subgroups run
together :
  - the HW will use the surface/sampler handle from only one subgroup
  - the HW will use the sampler header from only one subgroup

So one of the fused subgroup can access the wrong surface/sampler if
the value is different between the 2 subgroups and that can happen
even with subgroup uniform values.

Fortunately we can flag SEND instructions to disable the fusion
behavior (most likely at a performance cost).

This change introduce a new divergence mode that tries to compute
things divergent between subgroups so that we can flag instructions
accordingly.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37394>
2025-10-10 11:19:39 +00:00
Simon Perretta
79923115e7 nir/unlower_io_to_vars: keep io bases intact when keeping intrinsics
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
nir_recompute_io_bases will modify i/o intrinsics, which is not the
expected behaviour when the keep_intrinsics flag is set.

Fixes: 83aecc8f3f ("mesa/st, nir: commonize unlower_io_to_vars pass")
Signed-off-by: Simon Perretta <simon.perretta@imgtec.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37725>
2025-10-10 11:53:24 +01:00