Commit graph

7304 commits

Author SHA1 Message Date
Pavel Ondračka
0b39b5ea63 nir/opt_algebraic: improve dot product narrowing
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
The issue is that the current narrowing patterns are not working in a lot
of cases, for example
(('fdot3', ('vec3', a, 0.0, 0.0), b), ('fmul', a, b)),
is missing patterns like this:

32x3   %1 = load_const (0x3f800000, 0x00000000, 0x00000000) = (1.000000, 0.000000, 0.000000)
32x4   %7 = vec4 %6, %2 (0x0), %2 (0x0), %2 (0x0)
32    %19 = fdot3 %1 (1.000000, 0.000000, 0.000000), %7.xyz

or after some later transforms:

32x2   %0 = load_const (0x3f800000, 0x00000000) = (1.000000, 0.000000)
32x2   %6 = vec2 %5, %1 (0x0)
32    %18 = fdot3 %0 (1.000000, 0.000000).xyy, %6.xyy

This patch is heavily based on old branch from Ian Romanick from 2019.

r300 RV530 shader-db:
total instructions in shared programs: 128900 -> 128882 (-0.01%)
instructions in affected programs: 621 -> 603 (-2.90%)
helped: 10
HURT: 1
total cycles in shared programs: 191837 -> 191828 (<.01%)
cycles in affected programs: 799 -> 790 (-1.13%)
helped: 7
HURT: 1

Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39068>
2026-01-02 16:07:10 +01:00
Timur Kristóf
2b62738b9b nir: Add new nir_remove_outputs pass
Introduce a new NIR pass called nir_remove_outputs which works on
lowered I/O intrinsics and can remove any output varying or sysval.
This is meant to replace custom solutions in drivers,
such as radv_remove_varyings and similar.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33928>
2026-01-01 21:25:42 -06:00
Timur Kristóf
1981b9836b nir/opt_vectorize_io: Fix allow_holes option
Only allow holes between the first and last used component.
Do not load unused components before the first used component.

This fixes test failures with a bunch of VK CTS tests
with allow_holes enabled on RADV:
dEQP-VK.tessellation.tess_io.max_in_out.with_f16.*

Fixes: 6286c1c66f
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33979>
2026-01-01 17:38:01 -06:00
Marek Olšák
99a42bdd4b nir,radeonsi: simplify load_color0 & load_color1 intrinsics and shader_info
We don't need the shader_info fields anymore. sample and centroid fields
are unused. The interp field is already available from
si_shader_info::color_interpolate.

The loads don't need to be sysvals. Add also the _amd suffix.

Don't handle it in st_nir_lower_drawpixels either because the intrinsics
are created much later in compilation now.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38802>
2026-01-01 18:30:28 +00:00
Georg Lehmann
369a3b22b4 nir/opt_uniform_subgroup: optimize uniform ddx/ddy
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
We can't just use 0.0 as the replacement because of NaN/Inf.
But turning the intrinsic into a simple fsub should still be better
or at least equal.

Foz-DB Navi48:
Totals from 128 (0.10% of 125402) affected shaders:
MaxWaves: 3684 -> 3708 (+0.65%)
Instrs: 111150 -> 111055 (-0.09%); split: -0.20%, +0.11%
CodeSize: 587176 -> 590800 (+0.62%); split: -0.01%, +0.63%
VGPRs: 6540 -> 6480 (-0.92%)
Latency: 382775 -> 383332 (+0.15%); split: -0.15%, +0.29%
InvThroughput: 80909 -> 80530 (-0.47%); split: -0.51%, +0.04%
VClause: 1433 -> 1430 (-0.21%)
SClause: 1834 -> 1841 (+0.38%); split: -0.11%, +0.49%
Copies: 6130 -> 6096 (-0.55%); split: -1.29%, +0.73%
PreSGPRs: 7352 -> 7356 (+0.05%)
PreVGPRs: 4797 -> 4721 (-1.58%)
VALU: 71892 -> 71435 (-0.64%); split: -0.64%, +0.01%
SALU: 12665 -> 13056 (+3.09%); split: -0.06%, +3.14%

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39112>
2026-01-01 08:43:55 +00:00
Sviatoslav Peleshko
f3eb98ec57 nir/normalize_cubemap_coords: Handle the projector before the normalization
Applying the projector after the normalization breaks the coordinates, so
apply it early. Usually it's not even necessary for the cubemaps anyway,
but ARB_fragment_program and TGSI allow it.

Fixes: 52e71809 ("nir: Add a cubemap normalizing pass")
Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39087>
2025-12-30 16:25:09 +00:00
Georg Lehmann
5e8cc19a3b nir: remove per shader float fast math flags
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
These were redundant with the per alu fast math flags.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39026>
2025-12-29 10:57:06 +00:00
Georg Lehmann
6e67267045 nir/opt_varyings: use per instruction nan flag for promoting to flat
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39026>
2025-12-29 10:57:06 +00:00
Georg Lehmann
4f5a29ec32 nir/opt_varyings: use per instruction inf/nan flag for moving past interp
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39026>
2025-12-29 10:57:06 +00:00
Georg Lehmann
f3290219ab nir: use a seperate enum for per alu floating point math control
We don't need one bit per bitsize per instruction if only one actually
matters in the end.

First step towards moving NIR in the direction of full float_controls2
only.

Also rename this from fp_fast_math, because that name implied that 0 is
the no fast math mode, while the opposite was the case.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39026>
2025-12-29 10:57:05 +00:00
Georg Lehmann
71f0c0d6a6 nir/opt_uniform_subgroup: optimize add/xor reduce of bcsel(div, con, con)
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Foz-DB Navi48:
Totals from 12 (0.01% of 97623) affected shaders:
Instrs: 9207 -> 8973 (-2.54%)
CodeSize: 54192 -> 52832 (-2.51%)
VGPRs: 768 -> 480 (-37.50%)
Latency: 39516 -> 38507 (-2.55%)
InvThroughput: 10155 -> 9859 (-2.91%)
PreSGPRs: 329 -> 332 (+0.91%)
PreVGPRs: 268 -> 263 (-1.87%)
VALU: 4393 -> 4257 (-3.10%)
SALU: 1037 -> 1019 (-1.74%)
VOPD: 602 -> 599 (-0.50%)

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38974>
2025-12-19 20:23:23 +00:00
Georg Lehmann
0e5e1cb9b0 nir/opt_uniform_subgroup: optimize min/max/and/or reduce of bcsel(div, con, con)
Foz-DB Navi48:
Totals from 1 (0.00% of 97397) affected shaders:
Instrs: 1848 -> 1834 (-0.76%)
CodeSize: 9996 -> 9908 (-0.88%)
VGPRs: 96 -> 72 (-25.00%)
Latency: 17371 -> 17358 (-0.07%)
Copies: 190 -> 191 (+0.53%)
PreVGPRs: 43 -> 41 (-4.65%)
VALU: 657 -> 648 (-1.37%)

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38974>
2025-12-19 20:23:23 +00:00
Georg Lehmann
4d8cc7d82e nir/divergence: add nir_def_is_divergent_at_use_block helper
For cases where the block we are interested in is not the immediate block
of the nir_src.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38974>
2025-12-19 20:23:23 +00:00
Lionel Landwerlin
252e55a1bb nir/printf-helpers: set writes_memory at printf emission
Those helpers can be called late (since it's mostly for debug
purposes). This can avoid surprises in the backend and also avoids
rerunning gather_info.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38995>
2025-12-19 10:31:08 +00:00
Emma Anholt
5a09abe890 nir: Introduce nir_lower_vars_to_scratch_global().
This lets the driver make a more informed decision about which vars to
lower to scratch based on the vars available to spill.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37245>
2025-12-17 19:50:28 +00:00
Emma Anholt
059d301c79 nir: Drop the mode argument of nir_lower_vars_to_scratch().
It only makes sense for function temps, and that's the only way it's been
used.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37245>
2025-12-17 19:50:28 +00:00
Ian Romanick
66fd4d72fd nir/algebraic: Mask with shifted constant instead of shift-then-mask
shader-db:

All Intel platforms had similar results. (Lunar Lake shown)
total instructions in shared programs: 17088766 -> 17088765 (<.01%)
instructions in affected programs: 1375 -> 1374 (-0.07%)
helped: 1 / HURT: 1

total cycles in shared programs: 887873068 -> 887871748 (<.01%)
cycles in affected programs: 136402 -> 135082 (-0.97%)
helped: 2 / HURT: 0

fossil-db:

Lunar Lake
Totals:
Instrs: 924954240 -> 924939317 (-0.00%); split: -0.00%, +0.00%
Subgroup size: 40937696 -> 40937728 (+0.00%)
Cycle count: 106116946509 -> 106116637903 (-0.00%); split: -0.00%, +0.00%
Spill count: 3423930 -> 3423250 (-0.02%); split: -0.02%, +0.00%
Fill count: 4876960 -> 4876045 (-0.02%); split: -0.03%, +0.01%
Max live registers: 193882457 -> 193881816 (-0.00%); split: -0.00%, +0.00%
Max dispatch width: 49078640 -> 49078656 (+0.00%)
Non SSA regs after NIR: 231314214 -> 231314219 (+0.00%); split: -0.00%, +0.00%

Totals from 13809 (0.68% of 2019450) affected shaders:
Instrs: 25433084 -> 25418161 (-0.06%); split: -0.08%, +0.02%
Subgroup size: 32 -> 64 (+100.00%)
Cycle count: 1483550606 -> 1483242000 (-0.02%); split: -0.27%, +0.25%
Spill count: 41466 -> 40786 (-1.64%); split: -1.88%, +0.24%
Fill count: 74195 -> 73280 (-1.23%); split: -2.12%, +0.88%
Max live registers: 2326365 -> 2325724 (-0.03%); split: -0.05%, +0.02%
Max dispatch width: 234848 -> 234864 (+0.01%)
Non SSA regs after NIR: 3394104 -> 3394109 (+0.00%); split: -0.00%, +0.00%

Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Instrs: 997527742 -> 997524495 (-0.00%); split: -0.00%, +0.00%
Subgroup size: 27452928 -> 27452944 (+0.00%)
Cycle count: 93646717070 -> 93649738060 (+0.00%); split: -0.00%, +0.01%
Spill count: 3710125 -> 3709784 (-0.01%); split: -0.03%, +0.02%
Fill count: 5032819 -> 5033191 (+0.01%); split: -0.04%, +0.05%
Max live registers: 121648838 -> 121648528 (-0.00%); split: -0.00%, +0.00%
Max dispatch width: 37811544 -> 37811584 (+0.00%)
Non SSA regs after NIR: 255562054 -> 255565914 (+0.00%); split: -0.00%, +0.00%

Totals from 14438 (0.63% of 2281134) affected shaders:
Instrs: 25974222 -> 25970975 (-0.01%); split: -0.08%, +0.06%
Subgroup size: 16 -> 32 (+100.00%)
Cycle count: 1149710820 -> 1152731810 (+0.26%); split: -0.29%, +0.55%
Spill count: 44445 -> 44104 (-0.77%); split: -2.23%, +1.46%
Fill count: 76172 -> 76544 (+0.49%); split: -2.89%, +3.37%
Max live registers: 1237997 -> 1237687 (-0.03%); split: -0.04%, +0.02%
Max dispatch width: 123528 -> 123568 (+0.03%)
Non SSA regs after NIR: 3490757 -> 3494617 (+0.11%); split: -0.03%, +0.14%

Tiger Lake, Ice Lake, and Skylake had similar results. (Tiger Lake shown)
Totals:
Instrs: 1013364485 -> 1013342384 (-0.00%); split: -0.00%, +0.00%
Cycle count: 85509342602 -> 85500105656 (-0.01%); split: -0.02%, +0.01%
Spill count: 3903944 -> 3903350 (-0.02%); split: -0.02%, +0.01%
Fill count: 6801948 -> 6799368 (-0.04%); split: -0.05%, +0.01%
Max live registers: 122212165 -> 122211859 (-0.00%); split: -0.00%, +0.00%
Max dispatch width: 37805336 -> 37805472 (+0.00%)
Non SSA regs after NIR: 244624956 -> 244628603 (+0.00%); split: -0.00%, +0.00%

Totals from 14835 (0.65% of 2278397) affected shaders:
Instrs: 27522570 -> 27500469 (-0.08%); split: -0.10%, +0.02%
Cycle count: 1128820972 -> 1119584026 (-0.82%); split: -1.53%, +0.71%
Spill count: 46408 -> 45814 (-1.28%); split: -2.04%, +0.76%
Fill count: 99071 -> 96491 (-2.60%); split: -3.14%, +0.54%
Max live registers: 1287967 -> 1287661 (-0.02%); split: -0.04%, +0.02%
Max dispatch width: 126600 -> 126736 (+0.11%)
Non SSA regs after NIR: 3438628 -> 3442275 (+0.11%); split: -0.03%, +0.14%

Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38979>
2025-12-17 18:38:55 +00:00
Alyssa Rosenzweig
079e9ae606 treewide: use BITSET_*_COUNT
Mix of Coccinelle patch, manual fix ups, sed, etc. Probably best to review the diff
as-if hand written:

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38955>
2025-12-16 17:42:10 +00:00
Caio Oliveira
a4e84c9244 nir/gcm: Consider dead code elimination done by GCM as progress
This will also fix NIR_DEBUG=extended_validation complaining about
invalid loop analysis.  GCM will invalidate loop analysis if progress
was made, and depending on the removed instruction it will affect the
instr_cost.

Cc: mesa-stable
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38932>
2025-12-16 16:19:21 +00:00
Kenneth Graunke
88d46605bd nir: Support Intel URB intrinsics in nir_opt_offsets
We don't bother with maximums or wrapping because it shouldn't come up
for IO intrinsics anyway.

fossil-db results on Battlemage:

   Instrs: 231363032 -> 231359554 (-0.00%)
   Cycle count: 34057005552.0 -> 34057236190.0 (+0.00%); split: -0.00%, +0.00%
   Max live registers: 71873886 -> 71870438 (-0.00%)
   Non SSA regs after NIR: 67159408 -> 67159523 (+0.00%)

   Totals from 1779 (0.23% of 788851) affected shaders:
   Instrs: 774359 -> 770881 (-0.45%)
   Cycle count: 10551280.0 -> 10781918.0 (+2.19%); split: -0.32%, +2.51%
   Max live registers: 158193 -> 154745 (-2.18%)
   Non SSA regs after NIR: 180104 -> 180219 (+0.06%)

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>
2025-12-16 00:58:46 +00:00
Kenneth Graunke
97857d3224 nir: Fix mod analysis of ishl to shift the recursive result
When considering ((x << y) % divisor), we recursed to calculate
mod = (x % (divisor << y)) but incorrectly returned mod directly,
rather than the correct value, (mod << y).

(Note that we require divisor to be a power-of-two.)

As an example of this going wrong, (x << 1) % 4 was returning (x % 2)
which is 0 or 1, but x << 1 is 2x, which is always an even number so
the result mod 4 can only be 0 or 2.

Unit test suggested by Caio Oliveira during review.

Fixes: 2255375c4d ("nir: add nir_mod_analysis & its tests")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>
2025-12-16 00:58:37 +00:00
Marek Olšák
d17d1f53bd nir/opt_cse: update potential future plans merging copy propagation with CSE
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
This matches my current understanding of nir_opt_copy_prop, including that
nir_opt_copy_prop always replaces movs with vecN.

Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38916>
2025-12-13 06:41:59 +00:00
Marek Olšák
9ac8e643d6 nir/lower_io: explain properly how nir_lower_io_lower_64bit_to_32* options work
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38916>
2025-12-13 06:41:59 +00:00
Marek Olšák
41d127b9e8 nir/lower_io: remove unused option nir_lower_io_lower_64bit_float_to_32
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38916>
2025-12-13 06:41:59 +00:00
Marek Olšák
09b2325877 nir/print: print tex->sampler_dim
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38916>
2025-12-13 06:41:58 +00:00
Marek Olšák
4d976a5787 nir: fix the value of nir_io_use_frag_result_dual_src_blend
Due to rebasing not recognizing it as a conflict, it ended up having
the same value as nir_io_assign_color_input_bases_after_all_other_inputs.

Fixes: 9a2f1be814 - nir: add FRAG_RESULT_DUAL_SRC_BLEND and an option to use it

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38916>
2025-12-13 06:41:58 +00:00
Iván Briano
a7280ab590 nir: add nir_lower_single_sampled::lower_sample_mask_in option
GLSL defines gl_SampleMaskIn as :
   "a fragment language that indicates the set of samples covered
    by the primitive generating the fragment during multisample
    rasterization"

when variable rate shading is enabled, a single invocation might cover
multiple samples. The lowering done in nir_lower_single_sampled() does
not account for that case, so add an option to selectively disable it.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38641>
2025-12-11 22:50:10 +00:00
Iván Briano
ef31f07077 nir: clear SAMPLE_MASK_IN if we lowered it
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38641>
2025-12-11 22:50:10 +00:00
Konstantin Seurer
034f58c7e3 nir: Ignore ray query ranges that don't start with rq_initialize
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Handles is a rare edge case where the ray query is used "before"
there is a rq_initialize.

cc: mesa-stable

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38850>
2025-12-11 15:56:29 +00:00
Konstantin Seurer
5e03d09eb5 nir: Fix typo in nir_opt_ray_query_ranges
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38850>
2025-12-11 15:56:28 +00:00
Emma Anholt
1a2d0d3f31 nir: Optimistically unroll loops using induction var as a sample id.
On the assumption that nobody will use a sample id greater than the sample
count, have loop unrolling guess based on the driver's max sample count.
This unrolls a simple resolve shader with a uniform max samples on ir3 to:

  value = vec4(0);
  if (max_samples > 0) {
    value += txf_ms(coord, 0);
    if (max_samples > 1 {
      value += txf_ms(coord, 1);
      if (max_samples > 2){
        value += txf_ms(coord, 2);
        if (max_samples > 3) {
          value += txf_ms(coord, 3);
          for (i = 4; i < max_samples; i++)
            value += txf_ms(coord, i);
          }
        }
     }
   }
   ...

This is only worth a 1% win on our microbenchmark as-is, but if we could
flatten those ifs out and pull the fadds out to the end, avoiding syncs
per load would be a big win.  This seems like a first step.

I've taken a shot at updating drivers to set the value, and tried to leave
notes in places that drivers might update, and want to follow up with
updating the compiler option.

This affects over half the DX11 apps in shader-db-private.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38585>
2025-12-11 14:26:11 +00:00
Emma Anholt
10ba7675c8 nir/uub: Use an optional max_samples from drivers for sample counts.
This triggers some unrolling in Fallout 4, GTAV, and Rocky Planet in my
shader-db.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38585>
2025-12-11 14:26:11 +00:00
Emma Anholt
dc30e1a128 nir/loop_analyze: Use nir_unsigned_upper_bound for loop trip limits.
This triggers some unrolling in Monster Hunter World, Total War:
Warhammer, and Planet Zoo.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38585>
2025-12-11 14:26:10 +00:00
Mel Henning
2fab8fc297 nir: Use instr_clone in rematerialize_deref_in_block
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
The previous implementation seems to predate nir_instr_clone() and
duplicates a lot of the deref cloning code. This also makes the pass
preserve deref->arr.in_bounds correctly.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38856>
2025-12-10 22:07:45 +00:00
Mel Henning
dc44c0f32b treewide: Use nir_deref_instr_is_arr()
Via coccinelle and some manual fixups.

@@
expression e1;
@@
- e1->deref_type == nir_deref_type_array || e1->deref_type == nir_deref_type_ptr_as_array
+ nir_deref_instr_is_arr(e1)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38856>
2025-12-10 22:07:45 +00:00
Mel Henning
263a82f49b nir: Add nir_deref_instr_is_arr() helper
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38856>
2025-12-10 22:07:44 +00:00
Marek Olšák
9a2f1be814 nir: add FRAG_RESULT_DUAL_SRC_BLEND and an option to use it
This is potentially nicer for some drivers. AMD drivers will use it.

mesa_frag_result_get_color_index will be used often.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38604>
2025-12-10 19:16:46 +00:00
Georg Lehmann
621465e417 nir/opt_uniform_subgroup: handle more trivial shuffles/votes
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38867>
2025-12-10 13:32:08 +00:00
Georg Lehmann
e648e551c1 nir/opt_uniform_subgroup: wire up mbcnt_amd path
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38867>
2025-12-10 13:32:08 +00:00
Georg Lehmann
5778436e99 nir/opt_uniform_subgroup: use nir_shader_intrinsics_pass
Nothing here needs the recursion of the full lower_instructions pass.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38867>
2025-12-10 13:32:08 +00:00
Georg Lehmann
5f28bb72a7 nir/divergence_analysis: fix swizzle_amd without fetch inactive
Fixes: ad5be40303 ("nir: add fetch inactive index to quad_swizzle_amd/masked_swizzle_amd")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38867>
2025-12-10 13:32:08 +00:00
Georg Lehmann
1fc38d8539 nir/opt_uniform_subgroup: fix swizzle_amd without fetch_inactive
Fixes: ad5be40303 ("nir: add fetch inactive index to quad_swizzle_amd/masked_swizzle_amd")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38867>
2025-12-10 13:32:08 +00:00
Georg Lehmann
e11d7f06d0 nir/opt_uniform_subgroup: don't try to optimize non trivial clustered reduce
Fixes: 535caaf3e0 ("nir: Optimize uniform iadd, fadd, and ixor reduction operations")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38867>
2025-12-10 13:32:08 +00:00
Marek Olšák
0c400fbed9 nir: give nir_lower_clip_cull_distance_array_vars a better name
also rename the file

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38465>
2025-12-10 05:16:34 +00:00
Marek Olšák
74995eb64d nir: split gathering array sizes from nir_lower_clip_cull_distance_array_vars
nir_lower_clip_cull_distance_array_vars was sneakily updating
shader_info::clip/cull_distance_array_size.

This moves the gathering into a new function
nir_gather_clip_cull_distance_sizes_from_vars.

v2: remove assertions that prevented nir_lower_clip_cull_distance_array_vars
    from being used with non-compact arrays

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> (v1)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38465>
2025-12-10 05:16:34 +00:00
Marek Olšák
bdcb7bc674 nir/gather_info: clear clip/cull_distance_array_size if the IO is not present
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38465>
2025-12-10 05:16:33 +00:00
Alyssa Rosenzweig
5ced623fdf nir: print nir_tex_instr::backend_flags if present
I was wondering where this was disappearing to.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38789>
2025-12-09 20:44:15 +00:00
Arcady Goldmints-Orlov
68bb5d9e49 kk: enable shaderClipDistance
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Since Metal doesn't pass clip distance into the fragment shader, we have to
do it ourselves. The CLIP_DIST0/1 varying slots are used to represent the
user-defined varyings we use to pass them from vertex to fragment and
a new intrinsic is added to represent the write to the built-in
clip_distance variable. Since the CLIP_DIST0/1 varying slots are not affected
by opt_varyings, there can be potential interface mismatches so the machinery
in msl_iomap.c is refactored to allow them to be output as a series of scalars
rather than vectors.

Reviewed-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38839>
2025-12-08 23:09:53 -05:00
Connor Abbott
ad84ae2719 tu: Implement VK_QCOM_subpass_shader_resolve
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38451>
2025-12-08 20:44:46 +00:00
Connor Abbott
bd821b9a17 nir, tu: Add and use load_frag_coord_gmem_ir3
We used load_frag_coord_unscaled_ir3 for loading the fragment coord for
input attachments in GMEM, where the normal scaling for gl_FragCoord
shouldn't be used. However with custom resolve a different scaling will
apply to attachments in GMEM. Separate "unscaled" from "gmem" and rename
the NIR options, in preparation for this.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38451>
2025-12-08 20:44:45 +00:00