fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-19 17:58:09 +02:00

Author	SHA1	Message	Date
Job Noorman	3908a228bd	nir: add opt_uub pass Add a pass that uses nir_unsigned_upper_bound to simplify some ALU operations: - iand src, mask: if mask is constant with N least significant bits set and uub(src) < 2^N, the iand does nothing and can be removed. - ult src, const: if uub(src) < cmp -> true - uge src, const: if uub(src) < cmp -> false - ilt src, const: if uub(src) >= 0 && cmp < 0 -> false - if uub(src) >= 0 && cmp >= 0 -> ult src, const - ige src, const: if uub(src) >= 0 && cmp < 0 -> true - if uub(src) >= 0 && cmp >= 0 -> uge src, const - umin src, const: if uub(src) <= const -> src - umax src, const: if uub(src) <= const -> const - imin src, const: if uub(src) >= 0 && const < 0 -> const - if uub(src) >= 0 && const >= 0 -> umin src, const - imax src, const: if uub(src) >= 0 && const < 0 -> src - if uub(src) >= 0 && const >= 0 -> umax src, const - imul src0, src1: if uub(srci) < UINT16_MAX -> umul_16x16 src0, src1 - imul src0, src1: if uub(srci) < UINT24_MAX -> umul24 src0, src1 - imul src0, src1: if uub(srci) < UINT23_MAX -> imul24 src0, src1 The imul optimization needs to be explicitly enabled using a pass option. This is useful since 1) most backends don't support umul_16x16, and 2) some passes (e.g., nir_opt_load_store_vectorize) need to analyze imuls so lowering them before running such a pass makes their job more difficult. Signed-off-by: Job Noorman <jnoorman@igalia.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37869>	2025-11-07 10:23:29 +00:00
Job Noorman	0b348fb375	nir: add has_umul_16x16 option Signed-off-by: Job Noorman <jnoorman@igalia.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37869>	2025-11-07 10:23:29 +00:00
Natalie Vock	0cb1fca8fa	nir: Use sparse bitset for liveness information Some shaders, especially RTPSO shaders that have parts of the PSO inlined, can become absolutely huge. Using a sparse bitset avoids quadratic complexity in memory consumption for the liveness information. This reduces peak memory usage in worst-case tests (hammering compilation of many huge RTPSOs on 32 threads concurrently) by ~60%, from 43GB to 18GB. CPU time (seconds) differences for a workload with mostly small shaders: Difference at 95.0% confidence -5.27 +/- 1.08963 -0.88811% +/- 0.183626% (Student's t, pooled s = 0.629735) Peak resident set usage for the mostly-small workload: Difference at 95.0% confidence 30809 +/- 13394.3 1.59276% +/- 0.69246% (Student's t, pooled s = 7741.09) CPU time for the heavy workload did not show any difference. Co-authored-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37908>	2025-11-06 21:34:33 +00:00
Faith Ekstrand	0ccadf7a86	nir: Check the deref mode in lower_point_size() This is more robust because it ensures that we only ever check the location on something that we know is an outupt. Also, if it's an output then we know (thanks, validation!) that it's a variable. Reviewed-by: Olivia Lee <olivia.lee@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38265>	2025-11-06 14:57:31 +00:00
Faith Ekstrand	5ed35866c2	nir: Handle lowered I/O in lower_viewport_transform() While we're here, make the variable handling a little more robust by checking the deref mode before assuming there's a reachable variable. Reviewed-by: Olivia Lee <olivia.lee@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38265>	2025-11-06 14:57:31 +00:00
Antonio Ospite	222b85328e	mesa: replace most occurrences of getenv() with os_get_option() The standard way to query options in mesa is `os_get_option()` which abstracts platform-specific mechanisms to get config variables. However in quite a few places `getenv()` is still used and this may preclude controlling some options on some systems. For instance it is not generally possible to use `MESA_DEBUG` on Android. So replace most `getenv()` occurrences with `os_get_option()` to support configuration options more consistently across different platforms. Do the same with `secure_getenv()` replacing it with `os_get_option_secure()`. The bulk of the proposed changes are mechanically performed by the following script: ----------------------------------------------------------------------- #!/bin/sh set -e replace() { # Don't replace in some files, for example where `os_get_option` is defined, # or in external files EXCLUDE_FILES_PATTERN='(src/util/os_misc.c\|src/util/u_debug.h\|src/gtest/include/gtest/internal/gtest-port.h)' # Don't replace some "system" variables EXCLUDE_VARS_PATTERN='("XDG\|"DISPLAY\|"HOME\|"TMPDIR\|"POSIXLY_CORRECT)' git grep "[=!( ]$1(" -- src/ \| cut -d ':' -f 1 \| sort \| uniq \| \ grep -v -E "$EXCLUDE_FILES_PATTERN" \| \ while read -r file; do # Don't replace usages of XDG_* variables or HOME sed -E -e "/$EXCLUDE_VARS_PATTERN/!s/([=!$ ])$1\(/\1$2\(/g" -i "$file"; done } # Add const to os_get_option results, to avoid warning about discarded qualifier: # warning: initialization discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers] # but also errors in some cases: # error: invalid conversion from ‘const char’ to ‘char’ [-fpermissive] add_const_results() { git grep -l -P '(?<!const )char.os_get_option' \| \ while read -r file; do sed -e '/^\sconst/! s/\(char.os_get_option$/const \1/g' -i "$file" done } replace 'secure_getenv' 'os_get_option_secure' replace 'getenv' 'os_get_option' add_const_results ----------------------------------------------------------------------- After this, the `#include "util/os_misc.h"` is also added in files where `os_get_option()` was not used before. And since the replacements from the script above generated some new `-Wdiscarded-qualifiers` warnings, those have been addressed as well, generally by declaring `os_get_option()` results as `const char ` and adjusting some function declarations. Finally some replacements caused new errors like: ----------------------------------------------------------------------- ../src/gallium/auxiliary/gallivm/lp_bld_misc.cpp:127:31: error: no matching function for call to 'strtok' 127 \| for (n = 0, option = strtok(env_llc_options, " "); option; n++, option = strtok(NULL, " ")) { \| ^~~~~~ /android-ndk-r27c/toolchains/llvm/prebuilt/linux-x86_64/bin/../sysroot/usr/include/string.h:124:17: note: candidate function not viable: 1st argument ('const char ') would lose const qualifier 124 \| char _Nullable strtok(char* _Nullable __s, const char* _Nonnull __delimiter); \| ^ ~~~~~~~~~~~~~~~~~~~ ----------------------------------------------------------------------- Those have been addressed too, copying the const string returned by `os_get_option()` so that it could be modified. In particular, the error above has been fixed by copying the `const char env_llc_options` variable in `src/gallium/auxiliary/gallivm/lp_bld_misc.cpp` to a `char ` which can be tokenized using `strtok()`. Reviewed-by: Eric Engestrom <eric@igalia.com> Reviewed-by: Yonggang Luo <luoyonggang@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38128>	2025-11-06 04:36:13 +00:00
Alyssa Rosenzweig	9c2a2deee6	treewide: use BITSET_BYTES, BITSET_RZALLOC Via Coccinelle patches: @@ expression bits; typedef BITSET_WORD; @@ -BITSET_WORDS(bits) * sizeof(BITSET_WORD) +BITSET_BYTES(bits) @@ expression memctx, bits; typedef BITSET_WORD; @@ -rzalloc_array(memctx, BITSET_WORD, BITSET_WORDS(bits)) +BITSET_RZALLOC(memctx, bits) @@ expression memctx, bits; @@ -rzalloc_size(memctx, BITSET_BYTES(bits)) +BITSET_RZALLOC(memctx, bits) Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Acked-by: Yonggang Luo <luoyonggang@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38245>	2025-11-05 18:44:23 +00:00
Konstantin Seurer	b962063d72	nir: Remove nir_parallel_copy_instr Some checks are pending macOS-CI / macOS-CI (dri) (push) Waiting to run Details macOS-CI / macOS-CI (xlib) (push) Waiting to run Details Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36483>	2025-11-04 18:51:51 +00:00
Konstantin Seurer	3f3faa82b8	nir/from_ssa: Stop using nir_parallel_copy_instr nir_parallel_copy_instr can be emulated using an intrinsic for each entry and an array of arrays that is used by the pass to remember which copies belong together. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36483>	2025-11-04 18:51:50 +00:00
Konstantin Seurer	b20fd0ef48	nir: Remove parallel copy handling from rewrite_uses_to_load_reg Parallel copies are only created by nir_convert_from_ssa which does not use the helper. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36483>	2025-11-04 18:51:50 +00:00
Ian Romanick	67a6fc0160	nir/opt_if: See through inot Consider if (!x) { if (x) { ... } } The inner use of `x` must be false, but so far only instances of `!x` would have been replaced with a constant. See through the `inot` to replace instances of `x` as well. shader-db: Lunar Lake total instructions in shared programs: 17205147 -> 17204908 (<.01%) instructions in affected programs: 56037 -> 55798 (-0.43%) helped: 79 / HURT: 79 total cycles in shared programs: 879847886 -> 879992944 (0.02%) cycles in affected programs: 5244138 -> 5389196 (2.77%) helped: 141 / HURT: 125 Meteor Lake, DG2, and Tiger Lake had similar results. (Meteor Lake shown) total instructions in shared programs: 19968312 -> 19968069 (<.01%) instructions in affected programs: 65698 -> 65455 (-0.37%) helped: 88 / HURT: 104 total cycles in shared programs: 884331007 -> 884469865 (0.02%) cycles in affected programs: 4839695 -> 4978553 (2.87%) helped: 172 / HURT: 136 LOST: 3 GAINED: 0 Ice Lake, Skylake, and Broadwell had similar results. (Ice Lake shown) total instructions in shared programs: 20809765 -> 20809473 (<.01%) instructions in affected programs: 65976 -> 65684 (-0.44%) helped: 89 / HURT: 102 total cycles in shared programs: 872466849 -> 872433762 (<.01%) cycles in affected programs: 5452888 -> 5419801 (-0.61%) helped: 157 / HURT: 133 total spills in shared programs: 4014 -> 4010 (-0.10%) spills in affected programs: 30 -> 26 (-13.33%) helped: 1 / HURT: 0 total fills in shared programs: 3769 -> 3765 (-0.11%) fills in affected programs: 50 -> 46 (-8.00%) helped: 1 / HURT: 0 LOST: 3 GAINED: 1 fossil-db: All Intel platforms had similar results. (Lunar Lake shown) Totals: Instrs: 910122459 -> 910097570 (-0.00%); split: -0.00%, +0.00% Subgroup size: 40045664 -> 40046176 (+0.00%) Send messages: 40724361 -> 40724036 (-0.00%) Loop count: 970500 -> 970054 (-0.05%) Cycle count: 105785543442 -> 105794147978 (+0.01%); split: -0.02%, +0.02% Spill count: 3426093 -> 3426032 (-0.00%); split: -0.00%, +0.00% Fill count: 6525296 -> 6525210 (-0.00%); split: -0.00%, +0.00% Max live registers: 188561553 -> 188519064 (-0.02%); split: -0.02%, +0.00% Max dispatch width: 47958304 -> 47958496 (+0.00%); split: +0.00%, -0.00% Non SSA regs after NIR: 227303232 -> 227296055 (-0.00%); split: -0.00%, +0.00% Totals from 15417 (0.78% of 1977988) affected shaders: Instrs: 16984488 -> 16959599 (-0.15%); split: -0.20%, +0.05% Subgroup size: 512 -> 1024 (+100.00%) Send messages: 900193 -> 899868 (-0.04%) Loop count: 23059 -> 22613 (-1.93%) Cycle count: 1200149390 -> 1208753926 (+0.72%); split: -1.48%, +2.20% Spill count: 25838 -> 25777 (-0.24%); split: -0.29%, +0.06% Fill count: 43627 -> 43541 (-0.20%); split: -0.28%, +0.08% Max live registers: 2550741 -> 2508252 (-1.67%); split: -1.75%, +0.08% Max dispatch width: 296736 -> 296928 (+0.06%); split: +0.08%, -0.02% Non SSA regs after NIR: 3264670 -> 3257493 (-0.22%); split: -0.25%, +0.03% Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38196>	2025-11-04 18:04:00 +00:00
Alyssa Rosenzweig	17355f716b	treewide: use UTIL_DYNARRAY_INIT Instead of util_dynarray_init(&dynarray, NULL), just use UTIL_DYNARRAY_INIT instead. This is more ergonomic. Via Coccinelle patch: @@ identifier dynarray; @@ -struct util_dynarray dynarray = {0}; -util_dynarray_init(&dynarray, NULL); +struct util_dynarray dynarray = UTIL_DYNARRAY_INIT; @@ identifier dynarray; @@ -struct util_dynarray dynarray; -util_dynarray_init(&dynarray, NULL); +struct util_dynarray dynarray = UTIL_DYNARRAY_INIT; @@ expression dynarray; @@ -util_dynarray_init(&(dynarray), NULL); +dynarray = UTIL_DYNARRAY_INIT; @@ expression dynarray; @@ -util_dynarray_init(dynarray, NULL); +(dynarray) = UTIL_DYNARRAY_INIT; Followed by sed: bash -c "find . -type f -exec sed -i -e 's/util_dynarray_init(&$.$, NULL)/\1 = UTIL_DYNARRAY_INIT/g' \{} \;" bash -c "find . -type f -exec sed -i -e 's/util_dynarray_init( &$.$, NULL )/\1 = UTIL_DYNARRAY_INIT/g' \{} \;" bash -c "find . -type f -exec sed -i -e 's/util_dynarray_init($.$, NULL)/*\1 = UTIL_DYNARRAY_INIT/g' \{} \;" Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Mel Henning <mhenning@darkrefraction.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38189>	2025-11-04 13:39:48 +00:00
Marek Olšák	2f6b4803ab	nir/validate: expand IO intrinsic validation with nir_io_semantics Some checks are pending macOS-CI / macOS-CI (dri) (push) Waiting to run Details macOS-CI / macOS-CI (xlib) (push) Waiting to run Details There are many workarounds. v2: add more validation Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> (v1) Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38113>	2025-11-02 02:21:46 +00:00
Marek Olšák	390023f9fd	nir/lower_io: force src offset=0 for any indirect access with num_slots == 1 This reduces indirect indexing of 1-element arrays to indexing with 0. Without this, we fail an assertion later. Discovered when writing a test. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38113>	2025-11-02 02:21:46 +00:00
Marek Olšák	3e2c11597a	nir: add nir_intrinsic_ssbo_descriptor_amd for lowering get_ssbo_size Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38097>	2025-11-02 01:42:07 +00:00
Alyssa Rosenzweig	a014daea8f	nir: use alignment helpers more Coccinelle + filtering hunks manually + @@ expression pt, pot; typedef uintptr_t; @@ -util_is_aligned((uintptr_t)(pt), pot) +util_ptr_is_aligned(pt, pot) Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com> Acked-by: Yonggang Luo <luoyonggang@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38169>	2025-10-31 15:03:57 +00:00
Marek Olšák	86dd74aaeb	nir/lower_indirect_derefs: don't lower compact arrays unconditionally to fix perf This fixes bad mesh shader performance. See the comment. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38155>	2025-10-31 00:57:46 +00:00
Daniel Schürmann	b3615e5d6f	nir/algebraic: ad-hoc constant-fold ALU instructions Slight differences due to different optimization order. Totals from 135 (0.17% of 79839) affected shaders: (Navi48) Instrs: 287852 -> 287527 (-0.11%); split: -0.15%, +0.03% CodeSize: 1522972 -> 1521764 (-0.08%); split: -0.12%, +0.04% Latency: 1806803 -> 1825754 (+1.05%); split: -0.08%, +1.12% InvThroughput: 242693 -> 244703 (+0.83%); split: -0.02%, +0.84% VClause: 4092 -> 4084 (-0.20%) SClause: 7462 -> 7478 (+0.21%) Copies: 20509 -> 20401 (-0.53%); split: -0.74%, +0.21% Branches: 6395 -> 6386 (-0.14%) PreSGPRs: 7334 -> 7337 (+0.04%); split: -0.03%, +0.07% PreVGPRs: 6375 -> 6382 (+0.11%) VALU: 151787 -> 151595 (-0.13%); split: -0.15%, +0.02% SALU: 52967 -> 52910 (-0.11%); split: -0.23%, +0.12% VMEM: 6704 -> 6696 (-0.12%) SMEM: 12099 -> 12129 (+0.25%) Tested on a small collection of 2518 shaders from Dredge with callgrind using RADV: baseline: nir_opt_algebraic was called 12917 times from radv_optimize_nir() nir_opt_cse was called 15204 times from radv_optimize_nir() relative time spent in radv_optimize_nir(): 31.48% total instruction fetch cost: 28,642,638,021 with nir/algebraic: ad-hoc constant-fold ALU instructions nir_opt_algebraic was called 12797 times from radv_optimize_nir() nir_opt_cse was called 12963 times from radv_optimize_nir() relative time spent in radv_optimize_nir(): 30.63% total instruction fetch cost: 28,284,386,123 => ~1.27% improvement in total compile times Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37195>	2025-10-30 19:28:07 +00:00
Daniel Schürmann	9039e24751	nir/lower_flrp: ad-hoc constant-fold ALU instructions Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37195>	2025-10-30 19:28:07 +00:00
Daniel Schürmann	f61cd64af8	nir/builder: add option to immediately constant-fold ALU instructions upon insertion Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37195>	2025-10-30 19:28:07 +00:00
Daniel Schürmann	870616af34	nir/constant_folding: switch to nir_shader_lower_instructions() Small differences due to implicit DCE. Totals from 76 (0.10% of 79839) affected shaders: (Navi48) Instrs: 168051 -> 168044 (-0.00%); split: -0.01%, +0.01% CodeSize: 893284 -> 893256 (-0.00%); split: -0.01%, +0.01% Latency: 1082007 -> 1082027 (+0.00%); split: -0.00%, +0.00% InvThroughput: 155100 -> 155105 (+0.00%) Copies: 9649 -> 9654 (+0.05%) VALU: 92504 -> 92509 (+0.01%) Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37195>	2025-10-30 19:28:07 +00:00
Daniel Schürmann	d1f2f1222e	nir: guard nir_def_as_alu() We will potentially create load_const_instr instead of ALU. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37195>	2025-10-30 19:28:06 +00:00
Daniel Schürmann	3180656bbc	nir: don't use nir_build_alu() with incomplete sources Ideally we'd have a version that takes nir_scalar arguments. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37195>	2025-10-30 19:28:06 +00:00
Daniel Schürmann	ef9ecc4058	nir: add nir_imul_nuw() and nir_imul_imm_nuw() helpers Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37195>	2025-10-30 19:28:06 +00:00
Alyssa Rosenzweig	b82044c31b	nir/lower_two_sided_color: cleanup while in the area. no functional change Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Mel Henning <mhenning@darkrefraction.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38124>	2025-10-29 15:52:27 +00:00
Job Noorman	32b646c597	nir: print in_bounds info for deref_type(_ptr_as)_array Some checks are pending macOS-CI / macOS-CI (dri) (push) Waiting to run Details macOS-CI / macOS-CI (xlib) (push) Waiting to run Details Signed-off-by: Job Noorman <jnoorman@igalia.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38110>	2025-10-28 14:21:01 +00:00
Natalie Vock	50e65dac79	nir/lower_shader_calls: Repair SSA after wrap_instrs Some checks are pending macOS-CI / macOS-CI (dri) (push) Waiting to run Details macOS-CI / macOS-CI (xlib) (push) Waiting to run Details Wrapping jump instructions that are located inside ifs can break SSA invariants because the else block no longer dominates the merge block. Repair the SSA to make the validator happy again. Cc: mesa-stable Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37957>	2025-10-26 11:38:51 +00:00
Alyssa Rosenzweig	b824ef83ab	util/dynarray: infer type in append Most of the time, we can infer the type to append in util_dynarray_append using __typeof__, which is standardized in C23 and support in Jesse's MSMSVCV. This patch drops the type argument most of the time, making util_dynarray a little more ergonomic to use. This is done in four steps. First, rename util_dynarray_append -> util_dynarray_append_typed bash -c "find . -type f -exec sed -i -e 's/util_dynarray_append(/util_dynarray_append_typed(/g' \{} \;" Then, add a new append that infers the type. This is much more ergonomic for what you want most of the time. Next, use type-inferred append as much as possible, via Coccinelle patch (plus manual fixup): @@ expression dynarray, element; type type; @@ -util_dynarray_append_typed(dynarray, type, element); +util_dynarray_append(dynarray, element); Finally, hand fixup cases that Coccinelle missed or incorrectly translated, of which there were several because we can't used the untyped append with a literal (since the sizeof won't do what you want). All four steps are squashed to produce a single patch changing every util_dynarray_append call site in tree to either drop a type parameter (if possible) or insert a _typed suffix (if we can't infer). As such, the final patch is best reviewed by hand even though it was tool-assisted. No Long Linguine Meals were involved in the making of this patch. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38038>	2025-10-24 18:32:07 +00:00
Ian Romanick	f1bbc3d4e4	nir/algebraic: Don't generate integer min or max that will need to be lowered Some checks are pending macOS-CI / macOS-CI (dri) (push) Waiting to run Details macOS-CI / macOS-CI (xlib) (push) Waiting to run Details In !35844, there was some discussion about allowing 64-bit bcsel that would be lowered in the driver. One challenge there would be if a 64-bit bcsel was transformed into integer min or max by an algebraic optimization. I believe these were the only algebraic patterns that could create new integer min or max that would not be immediately constant folded. There were no shader-db or fossil-db changes on any Intel platform. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38033>	2025-10-23 22:35:27 +00:00
Rhys Perry	92beca9aa5	nir/lower_tex: optimize txd(coord, ddx/ddy(coord)) fossil-db (gfx1201): Totals from 73 (0.09% of 79839) affected shaders: MaxWaves: 1668 -> 1670 (+0.12%) Instrs: 352537 -> 347991 (-1.29%); split: -1.29%, +0.00% CodeSize: 1924140 -> 1887660 (-1.90%); split: -1.90%, +0.00% VGPRs: 6360 -> 6324 (-0.57%) Latency: 3891330 -> 3888192 (-0.08%); split: -0.10%, +0.02% InvThroughput: 789998 -> 783583 (-0.81%); split: -0.84%, +0.03% VClause: 6409 -> 6408 (-0.02%); split: -0.06%, +0.05% SClause: 4071 -> 4102 (+0.76%); split: -0.10%, +0.86% Copies: 16756 -> 16316 (-2.63%); split: -2.94%, +0.32% PreVGPRs: 5456 -> 5432 (-0.44%); split: -0.57%, +0.13% VALU: 232982 -> 228117 (-2.09%) SALU: 32853 -> 32848 (-0.02%); split: -0.05%, +0.03% VMEM: 9234 -> 9237 (+0.03%) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37561>	2025-10-23 11:21:59 +00:00
Rhys Perry	8e7ea4a882	nir/lower_shader_calls: reobtain impl after NIR_PASS Some checks are pending macOS-CI / macOS-CI (dri) (push) Waiting to run Details macOS-CI / macOS-CI (xlib) (push) Waiting to run Details Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Marek Olšák <maraeo@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37573>	2025-10-23 10:44:38 +00:00
Lionel Landwerlin	aa929ea706	nir/lower_io: add missing levels intrinsics to get_io_index_src_number Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `c7ac46a1d8` ("nir/lower_io: add get_io_index_src_number support for image intrinsics") Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38012>	2025-10-22 21:21:47 +00:00
Simon Perretta	ff51e6dc9e	nir: commonize barycentric intrinsic opt pass Some checks are pending macOS-CI / macOS-CI (dri) (push) Waiting to run Details macOS-CI / macOS-CI (xlib) (push) Waiting to run Details Introduces an opt pass that attempts to optimize load_barycentric_at_{sample,offset} with simpler load_barycentric_* equivalents where possible, and optionally lowers load_barycentric_at_sample to load_barycentric_at_offset with a position derived from the sample ID instead. Signed-off-by: Simon Perretta <simon.perretta@imgtec.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37658>	2025-10-22 16:48:01 +00:00
Olivia Lee	a410d90fd2	panfrost: fix cl_local_size for precompiled shaders Some checks are pending macOS-CI / macOS-CI (dri) (push) Waiting to run Details macOS-CI / macOS-CI (xlib) (push) Waiting to run Details nir_lower_compute_system_values will attempt to lower load_workgroup_size unless workgroup_size_variable is set. For precomp shaders, the workgroup size is set statically for each entrypoint by nir_precompiled_build_variant. Because we call lower_compute_system_values early, it sets the workgroup size to zero. Temporarily setting workgroup_size_variable while we are still processing all the entrypoints together inhibits this. Signed-off-by: Olivia Lee <olivia.lee@collabora.com> Fixes: `20970bcd96` ("panfrost: Add base of OpenCL C infrastructure") Reviewed-by: Eric R. Smith <eric.smith@collabora.com> Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37799>	2025-10-22 00:15:49 +00:00
Rhys Perry	64ec757688	nir/lower_mem_access_bit_sizes: increase chunk limit Not sure about creating u64vec16 loads, but creating unaligned loads is possible with opt_if_rewrite_uniform_uses. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37953>	2025-10-21 22:10:34 +00:00
Georg Lehmann	cf4ab485ea	nir: remove manual nir_load_global_constant Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37959>	2025-10-21 12:39:53 +02:00
Georg Lehmann	654bd74c60	treewide: use nir_store_global alias of nir_build_store_global Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37959>	2025-10-21 12:37:58 +02:00
Georg Lehmann	2306cba65b	nir: remove manual nir_store_global Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37959>	2025-10-21 12:37:58 +02:00
Georg Lehmann	9e41a7c139	treewide: use nir_load_global alias of nir_build_load_global Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37959>	2025-10-21 12:37:58 +02:00
Georg Lehmann	77540cac8c	nir: remove manual nir_load_global Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37959>	2025-10-21 12:37:58 +02:00
Lionel Landwerlin	255d1e883d	nir/divergence: fix handling of intel uniform block load Those are normally uniform always, but for the purpose of fused threads handling, we need to check their sources. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `ca1533cd03` ("nir/divergence: add a new mode to cover fused threads on Intel HW") Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37929>	2025-10-21 06:13:10 +00:00
Emma Anholt	0781edc30f	nir/copy_prop_vars: Mask out no-op writes to variables. Some checks are pending macOS-CI / macOS-CI (dri) (push) Waiting to run Details macOS-CI / macOS-CI (xlib) (push) Waiting to run Details The pass previously supported removing complete no-op writes, but we can do better by noticing if any channel being written is the current channel's value and masking off those writes. I noticed this happening in Stray's fragment shader, where it looked like some translation layer had turned var[x].zw = vec2(a, b) into var[x] = vec4(var[x].x, var[x].y, a, b). This in turn lets nir_shrink_vec_array_vars be more effective. Totals: MaxWaves: 22158876 -> 22156696 (-0.01%); split: +0.00%, -0.01% Instrs: 401167243 -> 401007996 (-0.04%); split: -0.04%, +0.00% CodeSize: 1004397302 -> 1004133728 (-0.03%); split: -0.03%, +0.00% STPs: 369810 -> 234618 (-36.56%) LDPs: 209430 -> 172011 (-17.87%) Totals from 1884 (0.12% of 1560230) affected shaders: MaxWaves: 12686 -> 10506 (-17.18%); split: +6.97%, -24.15% Instrs: 2099486 -> 1940239 (-7.59%); split: -7.64%, +0.06% CodeSize: 4570472 -> 4306898 (-5.77%); split: -5.81%, +0.05% NOPs: 334399 -> 270881 (-18.99%); split: -20.58%, +1.58% MOVs: 131003 -> 148034 (+13.00%); split: -11.59%, +24.59% COVs: 14512 -> 16921 (+16.60%); split: -0.23%, +16.83% Full: 58120 -> 72399 (+24.57%); split: -6.75%, +31.31% (ss): 79215 -> 45331 (-42.77%); split: -48.46%, +5.68% (sy): 33081 -> 11119 (-66.39%); split: -66.56%, +0.18% (ss)-stall: 302152 -> 115528 (-61.76%); split: -64.34%, +2.57% (sy)-stall: 2706110 -> 498998 (-81.56%); split: -81.68%, +0.12% STPs: 212045 -> 76853 (-63.76%) LDPs: 47337 -> 9918 (-79.05%) Preamble Instrs: 413954 -> 413630 (-0.08%); split: -0.21%, +0.13% Cat0: 370362 -> 306844 (-17.15%); split: -18.58%, +1.43% Cat1: 145629 -> 165003 (+13.30%); split: -10.51%, +23.81% Cat2: 687947 -> 683992 (-0.57%); split: -0.61%, +0.04% Cat3: 362919 -> 360690 (-0.61%); split: -0.72%, +0.11% Cat6: 461411 -> 352375 (-23.63%) Cat7: 16857 -> 16974 (+0.69%); split: -0.35%, +1.04% Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37313>	2025-10-20 19:24:45 +00:00
Emma Anholt	537cc4e0ff	nir/shrink_stores: Don't shrink stores to an invalid num_components. Avoids a regression in the CL CTS on the next commit. Fixes: `2dba7e6056` ("nir: split nir_opt_shrink_stores from nir_opt_shrink_vectors") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37313>	2025-10-20 19:24:45 +00:00
Emma Anholt	d8690f9c60	nir/link_opt_varyings: Make it participate in NIR_DEBUG=print. It's a pass with major effects on shaders, and it's otherwise weird to see your varying disappear between two passes that shouldn't affect them. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37313>	2025-10-20 19:24:45 +00:00
Aitor Camacho	f711c3afed	nir: Add KosmicKrisp required utilities Reviewed-by: Alyssa Anne Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37520>	2025-10-20 16:22:00 +00:00
Job Noorman	ad421cdf2e	nir: mark fneg distribution through fadd/ffma as nsz Some checks are pending macOS-CI / macOS-CI (dri) (push) Waiting to run Details macOS-CI / macOS-CI (xlib) (push) Waiting to run Details `df1876f615` ("nir: Mark negative re-distribution on fadd as imprecise") fixed the fadd case by marking it as imprecise. This commit fixes the ffma case for the same reason. However, "imprecise" isn't necessary and nowadays we have "nsz" which is more accurate here. Use that for both fadd and ffma. Signed-off-by: Job Noorman <jnoorman@igalia.com> Fixes: `62795475e8` ("nir/algebraic: Distribute source modifiers into instructions") Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37930>	2025-10-17 08:58:59 +00:00
Mary Guillemard	6f73533094	asahi,nir: Stop relying on zero and scratch page in GS/TESS code Introduce new NIR intrinsics to handle getting a "sink" read-only address and another intrinsic to handle conversion of address to read-write (allowing implementation to replace the "sink" read-only with another address like required for Asahi) Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37914>	2025-10-16 19:25:35 +00:00
Mary Guillemard	1e0c18d6cf	nir: Rename stat_query_address_agx to stat_query_address_poly This is used by the geometry lowering that we are going to move to common code. Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37914>	2025-10-16 19:25:35 +00:00
Alyssa Rosenzweig	84d8e6824b	treewide: don't check before free Some checks are pending macOS-CI / macOS-CI (dri) (push) Waiting to run Details macOS-CI / macOS-CI (xlib) (push) Waiting to run Details This was something that came up in the slop MR. Not sure it's actually a good idea or not but kind of curious what people think, given we have a sound tool (Coccinelle) to do the transform. Saves a redundant branch but means extra noninlined function calls.. likely no actual perf impact but saves some code. Via Coccinelle patches: @@ expression ptr; @@ -if (ptr) { -free(ptr); -} +free(ptr); @@ expression ptr; @@ -if (ptr) { -FREE(ptr); -} +FREE(ptr); @@ expression ptr; @@ -if (ptr) { -ralloc_free(ptr); -} +ralloc_free(ptr); @@ expression ptr; @@ -if (ptr != NULL) { -free(ptr); -} - +free(ptr); @@ expression ptr; @@ -if (ptr != NULL) { -FREE(ptr); -} - +FREE(ptr); @@ expression ptr; @@ -if (ptr != NULL) { -ralloc_free(ptr); -} - +ralloc_free(ptr); Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> [v3d] Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org> [venus] Reviewed-by: Frank Binns <frank.binns@imgtec.com> [powervr] Reviewed-by: Janne Grunau <j@jannau.net> [asahi] Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> [radv] Reviewed-by: Job Noorman <jnoorman@igalia.com> [ir3] Acked-by: Marek Olšák <maraeo@gmail.com> Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com> Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Acked-by: Job Noorman <jnoorman@igalia.com> Acked-by: Yonggang Luo <luoyonggang@gmail.com> Acked-by: Christian Gmeiner <cgmeiner@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37892>	2025-10-15 23:01:33 +00:00
Dave Airlie	543c9be87a	nir/coopmat: fix non square load/store lowering for flexible dimensions This shouldn't affect radv, but we should do the calculations correctly for when non-square matters. Reviewed-by: Karol Herbst <kherbst@redhat.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37879>	2025-10-16 07:19:28 +10:00

1 2 3 4 5 ...

6764 commits