Commit graph

6764 commits

Author SHA1 Message Date
Job Noorman
3908a228bd nir: add opt_uub pass
Add a pass that uses nir_unsigned_upper_bound to simplify some ALU
operations:
- iand src, mask: if mask is constant with N least significant bits set
  and uub(src) < 2^N, the iand does nothing and can be removed.
- ult src, const: if uub(src) < cmp -> true
- uge src, const: if uub(src) < cmp -> false
- ilt src, const: if uub(src) >= 0 && cmp <  0 -> false
-                 if uub(src) >= 0 && cmp >= 0 -> ult src, const
- ige src, const: if uub(src) >= 0 && cmp <  0 -> true
-                 if uub(src) >= 0 && cmp >= 0 -> uge src, const
- umin src, const: if uub(src) <= const -> src
- umax src, const: if uub(src) <= const -> const
- imin src, const: if uub(src) >= 0 && const <  0 -> const
-                  if uub(src) >= 0 && const >= 0 -> umin src, const
- imax src, const: if uub(src) >= 0 && const <  0 -> src
-                  if uub(src) >= 0 && const >= 0 -> umax src, const
- imul src0, src1: if uub(srci) < UINT16_MAX -> umul_16x16 src0, src1
- imul src0, src1: if uub(srci) < UINT24_MAX -> umul24 src0, src1
- imul src0, src1: if uub(srci) < UINT23_MAX -> imul24 src0, src1

The imul optimization needs to be explicitly enabled using a pass
option. This is useful since 1) most backends don't support umul_16x16,
and 2) some passes (e.g., nir_opt_load_store_vectorize) need to analyze
imuls so lowering them before running such a pass makes their job more
difficult.

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37869>
2025-11-07 10:23:29 +00:00
Job Noorman
0b348fb375 nir: add has_umul_16x16 option
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37869>
2025-11-07 10:23:29 +00:00
Natalie Vock
0cb1fca8fa nir: Use sparse bitset for liveness information
Some shaders, especially RTPSO shaders that have parts of the PSO
inlined, can become absolutely huge. Using a sparse bitset avoids
quadratic complexity in memory consumption for the liveness information.

This reduces peak memory usage in worst-case tests (hammering
compilation of many huge RTPSOs on 32 threads concurrently) by ~60%,
from 43GB to 18GB.

CPU time (seconds) differences for a workload with mostly small shaders:
Difference at 95.0% confidence
        -5.27 +/- 1.08963
        -0.88811% +/- 0.183626%
        (Student's t, pooled s = 0.629735)

Peak resident set usage for the mostly-small workload:
Difference at 95.0% confidence
        30809 +/- 13394.3
        1.59276% +/- 0.69246%
        (Student's t, pooled s = 7741.09)

CPU time for the heavy workload did not show any difference.

Co-authored-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37908>
2025-11-06 21:34:33 +00:00
Faith Ekstrand
0ccadf7a86 nir: Check the deref mode in lower_point_size()
This is more robust because it ensures that we only ever check the
location on something that we know is an outupt.  Also, if it's an
output then we know (thanks, validation!) that it's a variable.

Reviewed-by: Olivia Lee <olivia.lee@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38265>
2025-11-06 14:57:31 +00:00
Faith Ekstrand
5ed35866c2 nir: Handle lowered I/O in lower_viewport_transform()
While we're here, make the variable handling a little more robust by
checking the deref mode before assuming there's a reachable variable.

Reviewed-by: Olivia Lee <olivia.lee@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38265>
2025-11-06 14:57:31 +00:00
Antonio Ospite
222b85328e mesa: replace most occurrences of getenv() with os_get_option()
The standard way to query options in mesa is `os_get_option()` which
abstracts platform-specific mechanisms to get config variables.

However in quite a few places `getenv()` is still used and this may
preclude controlling some options on some systems.

For instance it is not generally possible to use `MESA_DEBUG` on
Android.

So replace most `getenv()` occurrences with  `os_get_option()` to
support configuration options more consistently across different
platforms.

Do the same with `secure_getenv()` replacing it with
`os_get_option_secure()`.

The bulk of the proposed changes are mechanically performed by the
following script:

-----------------------------------------------------------------------
  #!/bin/sh

  set -e

  replace() {

    # Don't replace in some files, for example where `os_get_option` is defined,
    # or in external files
    EXCLUDE_FILES_PATTERN='(src/util/os_misc.c|src/util/u_debug.h|src/gtest/include/gtest/internal/gtest-port.h)'

    # Don't replace some "system" variables
    EXCLUDE_VARS_PATTERN='("XDG|"DISPLAY|"HOME|"TMPDIR|"POSIXLY_CORRECT)'

    git grep "[=!( ]$1(" -- src/ | cut -d ':' -f 1 | sort | uniq | \
      grep -v -E "$EXCLUDE_FILES_PATTERN" | \
      while read -r file;
      do
        # Don't replace usages of XDG_* variables or HOME
        sed -E -e "/$EXCLUDE_VARS_PATTERN/!s/([=!\( ])$1\(/\1$2\(/g" -i "$file";
      done
  }

  # Add const to os_get_option results, to avoid warning about discarded qualifier:
  #   warning: initialization discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
  # but also errors in some cases:
  #   error: invalid conversion from ‘const char*’ to ‘char*’ [-fpermissive]
  add_const_results() {
    git grep -l -P '(?<!const )char.*os_get_option' | \
      while read -r file;
      do
        sed -e '/^\s*const/! s/\(char.*os_get_option\)/const \1/g' -i "$file"
      done
  }

  replace 'secure_getenv' 'os_get_option_secure'

  replace 'getenv' 'os_get_option'

  add_const_results
-----------------------------------------------------------------------

After this, the `#include "util/os_misc.h"` is also added in files where
`os_get_option()` was not used before.

And since the replacements from the script above generated some new
`-Wdiscarded-qualifiers` warnings, those have been addressed as well,
generally by declaring `os_get_option()` results as `const char *` and
adjusting some function declarations.

Finally some replacements caused new errors like:

-----------------------------------------------------------------------
../src/gallium/auxiliary/gallivm/lp_bld_misc.cpp:127:31: error: no matching function for call to 'strtok'
  127 |          for (n = 0, option = strtok(env_llc_options, " "); option; n++, option = strtok(NULL, " ")) {
      |                               ^~~~~~
/android-ndk-r27c/toolchains/llvm/prebuilt/linux-x86_64/bin/../sysroot/usr/include/string.h:124:17: note: candidate function not viable: 1st argument ('const char *') would lose const qualifier
  124 | char* _Nullable strtok(char* _Nullable __s, const char* _Nonnull __delimiter);
      |                 ^      ~~~~~~~~~~~~~~~~~~~
-----------------------------------------------------------------------

Those have been addressed too, copying the const string returned by
`os_get_option()` so that it could be modified.

In particular, the error above has been fixed  by copying the `const
char *env_llc_options` variable in
`src/gallium/auxiliary/gallivm/lp_bld_misc.cpp` to a `char *` which can
be tokenized using `strtok()`.

Reviewed-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38128>
2025-11-06 04:36:13 +00:00
Alyssa Rosenzweig
9c2a2deee6 treewide: use BITSET_BYTES, BITSET_RZALLOC
Via Coccinelle patches:

    @@
    expression bits;
    typedef BITSET_WORD;
    @@

    -BITSET_WORDS(bits) * sizeof(BITSET_WORD)
    +BITSET_BYTES(bits)

    @@
    expression memctx, bits;
    typedef BITSET_WORD;
    @@

    -rzalloc_array(memctx, BITSET_WORD, BITSET_WORDS(bits))
    +BITSET_RZALLOC(memctx, bits)

     @@
     expression memctx, bits;
     @@

     -rzalloc_size(memctx, BITSET_BYTES(bits))
     +BITSET_RZALLOC(memctx, bits)

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Acked-by: Yonggang Luo <luoyonggang@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38245>
2025-11-05 18:44:23 +00:00
Konstantin Seurer
b962063d72 nir: Remove nir_parallel_copy_instr
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36483>
2025-11-04 18:51:51 +00:00
Konstantin Seurer
3f3faa82b8 nir/from_ssa: Stop using nir_parallel_copy_instr
nir_parallel_copy_instr can be emulated using an intrinsic for each
entry and an array of arrays that is used by the pass to remember which
copies belong together.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36483>
2025-11-04 18:51:50 +00:00
Konstantin Seurer
b20fd0ef48 nir: Remove parallel copy handling from rewrite_uses_to_load_reg
Parallel copies are only created by nir_convert_from_ssa which does not
use the helper.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36483>
2025-11-04 18:51:50 +00:00
Ian Romanick
67a6fc0160 nir/opt_if: See through inot
Consider

    if (!x) {
       if (x) {
          ...
       }
    }

The inner use of `x` must be false, but so far only instances of `!x`
would have been replaced with a constant. See through the `inot` to
replace instances of `x` as well.

shader-db:

Lunar Lake
total instructions in shared programs: 17205147 -> 17204908 (<.01%)
instructions in affected programs: 56037 -> 55798 (-0.43%)
helped: 79 / HURT: 79

total cycles in shared programs: 879847886 -> 879992944 (0.02%)
cycles in affected programs: 5244138 -> 5389196 (2.77%)
helped: 141 / HURT: 125

Meteor Lake, DG2, and Tiger Lake had similar results. (Meteor Lake shown)
total instructions in shared programs: 19968312 -> 19968069 (<.01%)
instructions in affected programs: 65698 -> 65455 (-0.37%)
helped: 88 / HURT: 104

total cycles in shared programs: 884331007 -> 884469865 (0.02%)
cycles in affected programs: 4839695 -> 4978553 (2.87%)
helped: 172 / HURT: 136

LOST:   3
GAINED: 0

Ice Lake, Skylake, and Broadwell had similar results. (Ice Lake shown)
total instructions in shared programs: 20809765 -> 20809473 (<.01%)
instructions in affected programs: 65976 -> 65684 (-0.44%)
helped: 89 / HURT: 102

total cycles in shared programs: 872466849 -> 872433762 (<.01%)
cycles in affected programs: 5452888 -> 5419801 (-0.61%)
helped: 157 / HURT: 133

total spills in shared programs: 4014 -> 4010 (-0.10%)
spills in affected programs: 30 -> 26 (-13.33%)
helped: 1 / HURT: 0

total fills in shared programs: 3769 -> 3765 (-0.11%)
fills in affected programs: 50 -> 46 (-8.00%)
helped: 1 / HURT: 0

LOST:   3
GAINED: 1

fossil-db:

All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 910122459 -> 910097570 (-0.00%); split: -0.00%, +0.00%
Subgroup size: 40045664 -> 40046176 (+0.00%)
Send messages: 40724361 -> 40724036 (-0.00%)
Loop count: 970500 -> 970054 (-0.05%)
Cycle count: 105785543442 -> 105794147978 (+0.01%); split: -0.02%, +0.02%
Spill count: 3426093 -> 3426032 (-0.00%); split: -0.00%, +0.00%
Fill count: 6525296 -> 6525210 (-0.00%); split: -0.00%, +0.00%
Max live registers: 188561553 -> 188519064 (-0.02%); split: -0.02%, +0.00%
Max dispatch width: 47958304 -> 47958496 (+0.00%); split: +0.00%, -0.00%
Non SSA regs after NIR: 227303232 -> 227296055 (-0.00%); split: -0.00%, +0.00%

Totals from 15417 (0.78% of 1977988) affected shaders:
Instrs: 16984488 -> 16959599 (-0.15%); split: -0.20%, +0.05%
Subgroup size: 512 -> 1024 (+100.00%)
Send messages: 900193 -> 899868 (-0.04%)
Loop count: 23059 -> 22613 (-1.93%)
Cycle count: 1200149390 -> 1208753926 (+0.72%); split: -1.48%, +2.20%
Spill count: 25838 -> 25777 (-0.24%); split: -0.29%, +0.06%
Fill count: 43627 -> 43541 (-0.20%); split: -0.28%, +0.08%
Max live registers: 2550741 -> 2508252 (-1.67%); split: -1.75%, +0.08%
Max dispatch width: 296736 -> 296928 (+0.06%); split: +0.08%, -0.02%
Non SSA regs after NIR: 3264670 -> 3257493 (-0.22%); split: -0.25%, +0.03%

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38196>
2025-11-04 18:04:00 +00:00
Alyssa Rosenzweig
17355f716b treewide: use UTIL_DYNARRAY_INIT
Instead of util_dynarray_init(&dynarray, NULL), just use
UTIL_DYNARRAY_INIT instead. This is more ergonomic.

Via Coccinelle patch:

    @@
    identifier dynarray;
    @@

    -struct util_dynarray dynarray = {0};
    -util_dynarray_init(&dynarray, NULL);
    +struct util_dynarray dynarray = UTIL_DYNARRAY_INIT;

    @@
    identifier dynarray;
    @@

    -struct util_dynarray dynarray;
    -util_dynarray_init(&dynarray, NULL);
    +struct util_dynarray dynarray = UTIL_DYNARRAY_INIT;

    @@
    expression dynarray;
    @@

    -util_dynarray_init(&(dynarray), NULL);
    +dynarray = UTIL_DYNARRAY_INIT;

    @@
    expression dynarray;
    @@

    -util_dynarray_init(dynarray, NULL);
    +(*dynarray) = UTIL_DYNARRAY_INIT;

Followed by sed:

    bash -c "find . -type f -exec sed -i -e 's/util_dynarray_init(&\(.*\), NULL)/\1 = UTIL_DYNARRAY_INIT/g' \{} \;"
    bash -c "find . -type f -exec sed -i -e 's/util_dynarray_init( &\(.*\), NULL )/\1 = UTIL_DYNARRAY_INIT/g' \{} \;"
    bash -c "find . -type f -exec sed -i -e 's/util_dynarray_init(\(.*\), NULL)/*\1 = UTIL_DYNARRAY_INIT/g' \{} \;"

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38189>
2025-11-04 13:39:48 +00:00
Marek Olšák
2f6b4803ab nir/validate: expand IO intrinsic validation with nir_io_semantics
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
There are many workarounds.

v2: add more validation

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> (v1)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38113>
2025-11-02 02:21:46 +00:00
Marek Olšák
390023f9fd nir/lower_io: force src offset=0 for any indirect access with num_slots == 1
This reduces indirect indexing of 1-element arrays to indexing with 0.
Without this, we fail an assertion later.

Discovered when writing a test.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38113>
2025-11-02 02:21:46 +00:00
Marek Olšák
3e2c11597a nir: add nir_intrinsic_ssbo_descriptor_amd for lowering get_ssbo_size
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38097>
2025-11-02 01:42:07 +00:00
Alyssa Rosenzweig
a014daea8f nir: use alignment helpers more
Coccinelle + filtering hunks manually +

    @@
    expression pt, pot;
    typedef uintptr_t;
    @@

    -util_is_aligned((uintptr_t)(pt), pot)
    +util_ptr_is_aligned(pt, pot)

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Acked-by: Yonggang Luo <luoyonggang@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38169>
2025-10-31 15:03:57 +00:00
Marek Olšák
86dd74aaeb nir/lower_indirect_derefs: don't lower compact arrays unconditionally to fix perf
This fixes bad mesh shader performance. See the comment.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38155>
2025-10-31 00:57:46 +00:00
Daniel Schürmann
b3615e5d6f nir/algebraic: ad-hoc constant-fold ALU instructions
Slight differences due to different optimization order.
Totals from 135 (0.17% of 79839) affected shaders: (Navi48)
Instrs: 287852 -> 287527 (-0.11%); split: -0.15%, +0.03%
CodeSize: 1522972 -> 1521764 (-0.08%); split: -0.12%, +0.04%
Latency: 1806803 -> 1825754 (+1.05%); split: -0.08%, +1.12%
InvThroughput: 242693 -> 244703 (+0.83%); split: -0.02%, +0.84%
VClause: 4092 -> 4084 (-0.20%)
SClause: 7462 -> 7478 (+0.21%)
Copies: 20509 -> 20401 (-0.53%); split: -0.74%, +0.21%
Branches: 6395 -> 6386 (-0.14%)
PreSGPRs: 7334 -> 7337 (+0.04%); split: -0.03%, +0.07%
PreVGPRs: 6375 -> 6382 (+0.11%)
VALU: 151787 -> 151595 (-0.13%); split: -0.15%, +0.02%
SALU: 52967 -> 52910 (-0.11%); split: -0.23%, +0.12%
VMEM: 6704 -> 6696 (-0.12%)
SMEM: 12099 -> 12129 (+0.25%)

Tested on a small collection of 2518 shaders from Dredge with callgrind using RADV:
baseline:
  nir_opt_algebraic was called 12917 times from radv_optimize_nir()
  nir_opt_cse was called 15204 times from radv_optimize_nir()
  relative time spent in radv_optimize_nir(): 31.48%
  total instruction fetch cost: 28,642,638,021

with nir/algebraic: ad-hoc constant-fold ALU instructions
  nir_opt_algebraic was called 12797 times from radv_optimize_nir()
  nir_opt_cse was called 12963 times from radv_optimize_nir()
  relative time spent in radv_optimize_nir(): 30.63%
  total instruction fetch cost: 28,284,386,123

=> ~1.27% improvement in total compile times

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37195>
2025-10-30 19:28:07 +00:00
Daniel Schürmann
9039e24751 nir/lower_flrp: ad-hoc constant-fold ALU instructions
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37195>
2025-10-30 19:28:07 +00:00
Daniel Schürmann
f61cd64af8 nir/builder: add option to immediately constant-fold ALU instructions upon insertion
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37195>
2025-10-30 19:28:07 +00:00
Daniel Schürmann
870616af34 nir/constant_folding: switch to nir_shader_lower_instructions()
Small differences due to implicit DCE.
Totals from 76 (0.10% of 79839) affected shaders: (Navi48)

Instrs: 168051 -> 168044 (-0.00%); split: -0.01%, +0.01%
CodeSize: 893284 -> 893256 (-0.00%); split: -0.01%, +0.01%
Latency: 1082007 -> 1082027 (+0.00%); split: -0.00%, +0.00%
InvThroughput: 155100 -> 155105 (+0.00%)
Copies: 9649 -> 9654 (+0.05%)
VALU: 92504 -> 92509 (+0.01%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37195>
2025-10-30 19:28:07 +00:00
Daniel Schürmann
d1f2f1222e nir: guard nir_def_as_alu()
We will potentially create load_const_instr instead of ALU.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37195>
2025-10-30 19:28:06 +00:00
Daniel Schürmann
3180656bbc nir: don't use nir_build_alu() with incomplete sources
Ideally we'd have a version that takes nir_scalar arguments.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37195>
2025-10-30 19:28:06 +00:00
Daniel Schürmann
ef9ecc4058 nir: add nir_imul_nuw() and nir_imul_imm_nuw() helpers
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37195>
2025-10-30 19:28:06 +00:00
Alyssa Rosenzweig
b82044c31b nir/lower_two_sided_color: cleanup
while in the area. no functional change

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38124>
2025-10-29 15:52:27 +00:00
Job Noorman
32b646c597 nir: print in_bounds info for deref_type(_ptr_as)_array
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38110>
2025-10-28 14:21:01 +00:00
Natalie Vock
50e65dac79 nir/lower_shader_calls: Repair SSA after wrap_instrs
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Wrapping jump instructions that are located inside ifs can break SSA
invariants because the else block no longer dominates the merge block.
Repair the SSA to make the validator happy again.

Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37957>
2025-10-26 11:38:51 +00:00
Alyssa Rosenzweig
b824ef83ab util/dynarray: infer type in append
Most of the time, we can infer the type to append in
util_dynarray_append using __typeof__, which is standardized in C23 and
support in Jesse's MSMSVCV. This patch drops the type argument most of
the time, making util_dynarray a little more ergonomic to use.

This is done in four steps.

First, rename util_dynarray_append -> util_dynarray_append_typed

    bash -c "find . -type f -exec sed -i -e 's/util_dynarray_append(/util_dynarray_append_typed(/g' \{} \;"

Then, add a new append that infers the type. This is much more ergonomic
for what you want most of the time.

Next, use type-inferred append as much as possible, via Coccinelle
patch (plus manual fixup):

    @@
    expression dynarray, element;
    type type;
    @@

    -util_dynarray_append_typed(dynarray, type, element);
    +util_dynarray_append(dynarray, element);

Finally, hand fixup cases that Coccinelle missed or incorrectly
translated, of which there were several because we can't used the
untyped append with a literal (since the sizeof won't do what you want).

All four steps are squashed to produce a single patch changing every
util_dynarray_append call site in tree to either drop a type parameter
(if possible) or insert a _typed suffix (if we can't infer). As such,
the final patch is best reviewed by hand even though it was
tool-assisted.

No Long Linguine Meals were involved in the making of this patch.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38038>
2025-10-24 18:32:07 +00:00
Ian Romanick
f1bbc3d4e4 nir/algebraic: Don't generate integer min or max that will need to be lowered
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
In !35844, there was some discussion about allowing 64-bit bcsel that
would be lowered in the driver. One challenge there would be if a 64-bit
bcsel was transformed into integer min or max by an algebraic
optimization. I believe these were the only algebraic patterns that
could create new integer min or max that would not be immediately
constant folded.

There were no shader-db or fossil-db changes on any Intel platform.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38033>
2025-10-23 22:35:27 +00:00
Rhys Perry
92beca9aa5 nir/lower_tex: optimize txd(coord, ddx/ddy(coord))
fossil-db (gfx1201):
Totals from 73 (0.09% of 79839) affected shaders:
MaxWaves: 1668 -> 1670 (+0.12%)
Instrs: 352537 -> 347991 (-1.29%); split: -1.29%, +0.00%
CodeSize: 1924140 -> 1887660 (-1.90%); split: -1.90%, +0.00%
VGPRs: 6360 -> 6324 (-0.57%)
Latency: 3891330 -> 3888192 (-0.08%); split: -0.10%, +0.02%
InvThroughput: 789998 -> 783583 (-0.81%); split: -0.84%, +0.03%
VClause: 6409 -> 6408 (-0.02%); split: -0.06%, +0.05%
SClause: 4071 -> 4102 (+0.76%); split: -0.10%, +0.86%
Copies: 16756 -> 16316 (-2.63%); split: -2.94%, +0.32%
PreVGPRs: 5456 -> 5432 (-0.44%); split: -0.57%, +0.13%
VALU: 232982 -> 228117 (-2.09%)
SALU: 32853 -> 32848 (-0.02%); split: -0.05%, +0.03%
VMEM: 9234 -> 9237 (+0.03%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37561>
2025-10-23 11:21:59 +00:00
Rhys Perry
8e7ea4a882 nir/lower_shader_calls: reobtain impl after NIR_PASS
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37573>
2025-10-23 10:44:38 +00:00
Lionel Landwerlin
aa929ea706 nir/lower_io: add missing levels intrinsics to get_io_index_src_number
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: c7ac46a1d8 ("nir/lower_io: add get_io_index_src_number support for image intrinsics")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38012>
2025-10-22 21:21:47 +00:00
Simon Perretta
ff51e6dc9e nir: commonize barycentric intrinsic opt pass
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Introduces an opt pass that attempts to optimize
load_barycentric_at_{sample,offset} with simpler load_barycentric_*
equivalents where possible, and optionally lowers
load_barycentric_at_sample to load_barycentric_at_offset with a position
derived from the sample ID instead.

Signed-off-by: Simon Perretta <simon.perretta@imgtec.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37658>
2025-10-22 16:48:01 +00:00
Olivia Lee
a410d90fd2 panfrost: fix cl_local_size for precompiled shaders
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
nir_lower_compute_system_values will attempt to lower
load_workgroup_size unless workgroup_size_variable is set. For precomp
shaders, the workgroup size is set statically for each entrypoint by
nir_precompiled_build_variant. Because we call
lower_compute_system_values early, it sets the workgroup size to zero.
Temporarily setting workgroup_size_variable while we are still
processing all the entrypoints together inhibits this.

Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Fixes: 20970bcd96 ("panfrost: Add base of OpenCL C infrastructure")
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37799>
2025-10-22 00:15:49 +00:00
Rhys Perry
64ec757688 nir/lower_mem_access_bit_sizes: increase chunk limit
Not sure about creating u64vec16 loads, but creating unaligned loads is
possible with opt_if_rewrite_uniform_uses.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37953>
2025-10-21 22:10:34 +00:00
Georg Lehmann
cf4ab485ea nir: remove manual nir_load_global_constant
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37959>
2025-10-21 12:39:53 +02:00
Georg Lehmann
654bd74c60 treewide: use nir_store_global alias of nir_build_store_global
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37959>
2025-10-21 12:37:58 +02:00
Georg Lehmann
2306cba65b nir: remove manual nir_store_global
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37959>
2025-10-21 12:37:58 +02:00
Georg Lehmann
9e41a7c139 treewide: use nir_load_global alias of nir_build_load_global
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37959>
2025-10-21 12:37:58 +02:00
Georg Lehmann
77540cac8c nir: remove manual nir_load_global
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37959>
2025-10-21 12:37:58 +02:00
Lionel Landwerlin
255d1e883d nir/divergence: fix handling of intel uniform block load
Those are normally uniform always, but for the purpose of fused
threads handling, we need to check their sources.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: ca1533cd03 ("nir/divergence: add a new mode to cover fused threads on Intel HW")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37929>
2025-10-21 06:13:10 +00:00
Emma Anholt
0781edc30f nir/copy_prop_vars: Mask out no-op writes to variables.
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
The pass previously supported removing complete no-op writes, but we can
do better by noticing if any channel being written is the current
channel's value and masking off those writes.  I noticed this happening in
Stray's fragment shader, where it looked like some translation layer had
turned var[x].zw = vec2(a, b) into var[x] = vec4(var[x].x, var[x].y, a,
b).  This in turn lets nir_shrink_vec_array_vars be more effective.

Totals:
MaxWaves: 22158876 -> 22156696 (-0.01%); split: +0.00%, -0.01%
Instrs: 401167243 -> 401007996 (-0.04%); split: -0.04%, +0.00%
CodeSize: 1004397302 -> 1004133728 (-0.03%); split: -0.03%, +0.00%
STPs: 369810 -> 234618 (-36.56%)
LDPs: 209430 -> 172011 (-17.87%)

Totals from 1884 (0.12% of 1560230) affected shaders:
MaxWaves: 12686 -> 10506 (-17.18%); split: +6.97%, -24.15%
Instrs: 2099486 -> 1940239 (-7.59%); split: -7.64%, +0.06%
CodeSize: 4570472 -> 4306898 (-5.77%); split: -5.81%, +0.05%
NOPs: 334399 -> 270881 (-18.99%); split: -20.58%, +1.58%
MOVs: 131003 -> 148034 (+13.00%); split: -11.59%, +24.59%
COVs: 14512 -> 16921 (+16.60%); split: -0.23%, +16.83%
Full: 58120 -> 72399 (+24.57%); split: -6.75%, +31.31%
(ss): 79215 -> 45331 (-42.77%); split: -48.46%, +5.68%
(sy): 33081 -> 11119 (-66.39%); split: -66.56%, +0.18%
(ss)-stall: 302152 -> 115528 (-61.76%); split: -64.34%, +2.57%
(sy)-stall: 2706110 -> 498998 (-81.56%); split: -81.68%, +0.12%
STPs: 212045 -> 76853 (-63.76%)
LDPs: 47337 -> 9918 (-79.05%)
Preamble Instrs: 413954 -> 413630 (-0.08%); split: -0.21%, +0.13%

Cat0: 370362 -> 306844 (-17.15%); split: -18.58%, +1.43%
Cat1: 145629 -> 165003 (+13.30%); split: -10.51%, +23.81%
Cat2: 687947 -> 683992 (-0.57%); split: -0.61%, +0.04%
Cat3: 362919 -> 360690 (-0.61%); split: -0.72%, +0.11%
Cat6: 461411 -> 352375 (-23.63%)
Cat7: 16857 -> 16974 (+0.69%); split: -0.35%, +1.04%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37313>
2025-10-20 19:24:45 +00:00
Emma Anholt
537cc4e0ff nir/shrink_stores: Don't shrink stores to an invalid num_components.
Avoids a regression in the CL CTS on the next commit.

Fixes: 2dba7e6056 ("nir: split nir_opt_shrink_stores from nir_opt_shrink_vectors")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37313>
2025-10-20 19:24:45 +00:00
Emma Anholt
d8690f9c60 nir/link_opt_varyings: Make it participate in NIR_DEBUG=print.
It's a pass with major effects on shaders, and it's otherwise weird to see
your varying disappear between two passes that shouldn't affect them.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37313>
2025-10-20 19:24:45 +00:00
Aitor Camacho
f711c3afed nir: Add KosmicKrisp required utilities
Reviewed-by: Alyssa Anne Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37520>
2025-10-20 16:22:00 +00:00
Job Noorman
ad421cdf2e nir: mark fneg distribution through fadd/ffma as nsz
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
df1876f615 ("nir: Mark negative re-distribution on fadd as imprecise")
fixed the fadd case by marking it as imprecise. This commit fixes the
ffma case for the same reason.

However, "imprecise" isn't necessary and nowadays we have "nsz" which is
more accurate here. Use that for both fadd and ffma.

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Fixes: 62795475e8 ("nir/algebraic: Distribute source modifiers into instructions")
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37930>
2025-10-17 08:58:59 +00:00
Mary Guillemard
6f73533094 asahi,nir: Stop relying on zero and scratch page in GS/TESS code
Introduce new NIR intrinsics to handle getting a "sink" read-only
address and another intrinsic to handle conversion of address to
read-write (allowing implementation to replace the "sink" read-only with
another address like required for Asahi)

Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37914>
2025-10-16 19:25:35 +00:00
Mary Guillemard
1e0c18d6cf nir: Rename stat_query_address_agx to stat_query_address_poly
This is used by the geometry lowering that we are going to move to
common code.

Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37914>
2025-10-16 19:25:35 +00:00
Alyssa Rosenzweig
84d8e6824b treewide: don't check before free
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
This was something that came up in the slop MR. Not sure it's actually a
good idea or not but kind of curious what people think, given we have a
sound tool (Coccinelle) to do the transform. Saves a redundant branch
but means extra noninlined function calls.. likely no actual perf impact
but saves some code.

Via Coccinelle patches:

    @@
    expression ptr;
    @@

    -if (ptr) {
    -free(ptr);
    -}
    +free(ptr);

    @@
    expression ptr;
    @@

    -if (ptr) {
    -FREE(ptr);
    -}
    +FREE(ptr);

    @@
    expression ptr;
    @@

    -if (ptr) {
    -ralloc_free(ptr);
    -}
    +ralloc_free(ptr);

    @@
    expression ptr;
    @@

    -if (ptr != NULL) {
    -free(ptr);
    -}
    -
    +free(ptr);

    @@
    expression ptr;
    @@

    -if (ptr != NULL) {
    -FREE(ptr);
    -}
    -
    +FREE(ptr);

    @@
    expression ptr;
    @@

    -if (ptr != NULL) {
    -ralloc_free(ptr);
    -}
    -
    +ralloc_free(ptr);

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> [v3d]
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org> [venus]
Reviewed-by: Frank Binns <frank.binns@imgtec.com> [powervr]
Reviewed-by: Janne Grunau <j@jannau.net> [asahi]
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> [radv]
Reviewed-by: Job Noorman <jnoorman@igalia.com> [ir3]
Acked-by: Marek Olšák <maraeo@gmail.com>
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Job Noorman <jnoorman@igalia.com>
Acked-by: Yonggang Luo <luoyonggang@gmail.com>
Acked-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37892>
2025-10-15 23:01:33 +00:00
Dave Airlie
543c9be87a nir/coopmat: fix non square load/store lowering for flexible dimensions
This shouldn't affect radv, but we should do the calculations correctly for
when non-square matters.

Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37879>
2025-10-16 07:19:28 +10:00