fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-01-30 15:50:32 +01:00

Author	SHA1	Message	Date
Yonggang Luo	ecb0ccf603	treewide: Replace calling to function ALIGN with align This is done by grep ALIGN( to align( docs,*.xml,blake3 is excluded Signed-off-by: Yonggang Luo <luoyonggang@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38365>	2025-11-12 21:58:40 +00:00
Konstantin Seurer	b241b26d11	nir: Remove nir_def::parent_instr This reduces the footprint of nir_def by 8B on 64-bit systems. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Acked-by: Marek Olšák <maraeo@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38313>	2025-11-12 21:22:13 +00:00
Konstantin Seurer	de32f9275f	treewide: add & use parent instr helpers We add a bunch of new helpers to avoid the need to touch ->parent_instr, including the full set of: * nir_def_is_* * nir_def_as_*_or_null * nir_def_as_* [assumes the right instr type] * nir_src_is_* * nir_src_as_* * nir_scalar_is_* * nir_scalar_as_* Plus nir_def_instr() where there's no more suitable helper. Also an existing helper is renamed to unify all the names, while we're churning the tree: * nir_src_as_alu_instr -> nir_src_as_alu ..and then we port the tree to use the helpers as much as possible, using nir_def_instr() where that does not work. Acked-by: Marek Olšák <maraeo@gmail.com> --- To eliminate nir_def::parent_instr we need to churn the tree anyway, so I'm taking this opportunity to clean up a lot of NIR patterns. Co-authored-by: Konstantin Seurer <konstantin.seurer@gmail.com> Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38313>	2025-11-12 21:22:13 +00:00
Yonggang Luo	34e7fa2fe6	nir: Disable gcc warning -Wstringop-overflow for nir_intrinsic_set_* for latter commit gcc has a false positive here, silenced with the pragmas. Use a separate commit so it can be easily reverted later once gcc fixes it. Signed-off-by: Yonggang Luo <luoyonggang@gmail.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38313>	2025-11-12 21:22:13 +00:00
Konstantin Seurer	e231aec0c9	nir: Move nir_def directly after nir_instr This way, all instruction types have the nir_def at the same offset. Acked-by: Marek Olšák <maraeo@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38313>	2025-11-12 21:22:13 +00:00
Faith Ekstrand	6ee4ea5ea3	nir: Add a type parameter to nir_lower_point_size() On Mali, we need not only clamp but also convert to float16 on Valhall+. We could have a separate pass for this but it fits in nicely with the rest of nir_lower_point_size() so we might as well put it there. Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38379>	2025-11-12 01:34:36 +00:00
Faith Ekstrand	0e9fcb33c3	nir: Add a couple panfrost sysvals to divergence analysis Fixes: `2af6e4beeb` ("pan: Don't pretend we support load_{vertex_id_zero_base,first_vertex}") Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com> Reviewed-by: Christoph Pillmayer <christoph.pillmayern@arm.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38334>	2025-11-11 17:38:36 +00:00
Daniel Schürmann	9abbcbc00e	nir/opt_load_store_vectorize: don't add negative offsets to load/store_shared2_amd By hoisting the low address instead, we can make use of these instructions on GFX6. Totals from 3 (0.00% of 79839) affected shaders: (Navi48) Instrs: 3768 -> 3776 (+0.21%); split: -0.03%, +0.24% CodeSize: 20024 -> 20048 (+0.12%); split: -0.04%, +0.16% Latency: 16093 -> 16198 (+0.65%) InvThroughput: 3868 -> 3864 (-0.10%) VClause: 97 -> 93 (-4.12%) VALU: 2333 -> 2331 (-0.09%) Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37682>	2025-11-11 17:12:15 +00:00
Kenneth Graunke	6151eb4372	nir: Drop writemask from all Intel memory store intrinsics The backend has been fully ignoring all writemasks for a long time, so it really doesn't make sense to have them on our custom intrinsics. I'm not sure they even make sense for some of the block intrinsics. Also, the store_ssbo -> store_ssbo_intel pass was not setting writemask at all, leaving it at the default value of 0 (aka write nothing, if it had been respected...) Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38343>	2025-11-11 10:55:41 +00:00
Georg Lehmann	9ef0c96f26	nir/opt_algebraic: optimize open coded pack_32_2x16 Foz-DB Navi48: Totals from 4 (0.00% of 80287) affected shaders: Instrs: 6231 -> 6101 (-2.09%) CodeSize: 35916 -> 35156 (-2.12%) Latency: 72190 -> 71317 (-1.21%) InvThroughput: 20817 -> 19962 (-4.11%) VALU: 3145 -> 3029 (-3.69%) VOPD: 310 -> 312 (+0.65%) Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37937>	2025-11-10 19:04:32 +00:00
Ian Romanick	d9bed33c11	nir/opt_if: Both parts of logic-joined conditions can be evaluated For cases like 'if (X && Y)', both X and Y must be true in the then branch. Their values are unknown in the else branch. Similarly, 'if (X || Y)' must have both X and Y false in the else branch. The shader-db results are pretty bad, especially on Skylake. Ouch. The fossil-db results are good enough that they make up for it. v2: s/alu/alu_src/ in nir_src_parent_instr(use_src) != &alu_src->instr. Noticed by Rhys. shader-db: Lunar Lake total instructions in shared programs: 17203905 -> 17196251 (-0.04%) instructions in affected programs: 668828 -> 661174 (-1.14%) helped: 352 / HURT: 2 total cycles in shared programs: 879896264 -> 888462774 (0.97%) cycles in affected programs: 330523984 -> 339090494 (2.59%) helped: 187 / HURT: 167 total spills in shared programs: 3318 -> 3329 (0.33%) spills in affected programs: 4 -> 15 (275.00%) helped: 0 / HURT: 4 total fills in shared programs: 1903 -> 1917 (0.74%) fills in affected programs: 7 -> 21 (200.00%) helped: 0 / HURT: 4 Meteor Lake and DG2 had similar results. (Meteor Lake shown) total instructions in shared programs: 19969129 -> 19961439 (-0.04%) instructions in affected programs: 665860 -> 658170 (-1.15%) helped: 354 / HURT: 0 total cycles in shared programs: 884509249 -> 887353784 (0.32%) cycles in affected programs: 323242817 -> 326087352 (0.88%) helped: 208 / HURT: 146 total spills in shared programs: 4801 -> 4808 (0.15%) spills in affected programs: 14 -> 21 (50.00%) helped: 0 / HURT: 6 total fills in shared programs: 4454 -> 4467 (0.29%) fills in affected programs: 17 -> 30 (76.47%) helped: 0 / HURT: 6 Tiger Lake and Ice Lake had similar results.
(Tiger Lake shown) total instructions in shared programs: 19913774 -> 19906147 (-0.04%) instructions in affected programs: 667348 -> 659721 (-1.14%) helped: 351 / HURT: 3 total cycles in shared programs: 861253468 -> 864535803 (0.38%) cycles in affected programs: 325577148 -> 328859483 (1.01%) helped: 180 / HURT: 174 total spills in shared programs: 3440 -> 3455 (0.44%) spills in affected programs: 18 -> 33 (83.33%) helped: 0 / HURT: 8 total fills in shared programs: 1946 -> 1961 (0.77%) fills in affected programs: 18 -> 33 (83.33%) helped: 0 / HURT: 8 Skylake total instructions in shared programs: 19031768 -> 19023604 (-0.04%) instructions in affected programs: 671633 -> 663469 (-1.22%) helped: 347 / HURT: 7 total cycles in shared programs: 868474831 -> 868132073 (-0.04%) cycles in affected programs: 320499758 -> 320157000 (-0.11%) helped: 246 / HURT: 108 total spills in shared programs: 4024 -> 4063 (0.97%) spills in affected programs: 28 -> 67 (139.29%) helped: 0 / HURT: 18 total fills in shared programs: 3722 -> 3746 (0.64%) fills in affected programs: 34 -> 58 (70.59%) helped: 0 / HURT: 18 fossil-db: Lunar Lake Totals: Instrs: 928574038 -> 928568364 (-0.00%); split: -0.00%, +0.00% Subgroup size: 40916656 -> 40916672 (+0.00%) Send messages: 41467974 -> 41467909 (-0.00%); split: -0.00%, +0.00% Loop count: 970202 -> 970191 (-0.00%) Cycle count: 106297789925 -> 106301305901 (+0.00%); split: -0.00%, +0.01% Spill count: 3424464 -> 3424452 (-0.00%); split: -0.00%, +0.00% Fill count: 6525458 -> 6525119 (-0.01%); split: -0.01%, +0.00% Max live registers: 193525368 -> 193524886 (-0.00%); split: -0.00%, +0.00% Non SSA regs after NIR: 232027347 -> 232026610 (-0.00%); split: -0.00%, +0.00% Totals from 1130 (0.06% of 2018793) affected shaders: Instrs: 2662692 -> 2657018 (-0.21%); split: -0.27%, +0.06% Subgroup size: 16 -> 32 (+100.00%) Send messages: 112689 -> 112624 (-0.06%); split: -0.07%, +0.01% Loop count: 5723 -> 5712 (-0.19%) Cycle count: 1176696438 -> 1180212414 
(+0.30%); split: -0.33%, +0.63% Spill count: 9895 -> 9883 (-0.12%); split: -0.13%, +0.01% Fill count: 26892 -> 26553 (-1.26%); split: -1.26%, +0.00% Max live registers: 215462 -> 214980 (-0.22%); split: -0.30%, +0.08% Non SSA regs after NIR: 398940 -> 398203 (-0.18%); split: -0.21%, +0.03% Meteor Lake, DG2, Tiger Lake, Ice Lake, and Skylake had similar results. (Meteor Lake shown) Totals: Instrs: 1000318839 -> 1000314218 (-0.00%); split: -0.00%, +0.00% Send messages: 45548952 -> 45548887 (-0.00%); split: -0.00%, +0.00% Loop count: 1026441 -> 1026430 (-0.00%) Cycle count: 92411461807 -> 92395024225 (-0.02%); split: -0.02%, +0.00% Spill count: 3665265 -> 3665221 (-0.00%); split: -0.00%, +0.00% Fill count: 6504830 -> 6504801 (-0.00%); split: -0.00%, +0.00% Max live registers: 121790079 -> 121789811 (-0.00%); split: -0.00%, +0.00% Max dispatch width: 38062488 -> 38062648 (+0.00%) Non SSA regs after NIR: 256900770 -> 256900038 (-0.00%); split: -0.00%, +0.00% Totals from 1124 (0.05% of 2284852) affected shaders: Instrs: 2724110 -> 2719489 (-0.17%); split: -0.24%, +0.07% Send messages: 112096 -> 112031 (-0.06%); split: -0.07%, +0.01% Loop count: 5697 -> 5686 (-0.19%) Cycle count: 960659254 -> 944221672 (-1.71%); split: -1.91%, +0.20% Spill count: 13791 -> 13747 (-0.32%); split: -0.40%, +0.08% Fill count: 43216 -> 43187 (-0.07%); split: -0.14%, +0.08% Max live registers: 114877 -> 114609 (-0.23%); split: -0.31%, +0.07% Max dispatch width: 12768 -> 12928 (+1.25%) Non SSA regs after NIR: 412320 -> 411588 (-0.18%); split: -0.20%, +0.03% Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38321>	2025-11-10 18:30:42 +00:00
Ian Romanick	3e0c9ad316	nir/opt_if: Conditionally do not propagate constants through bcsel In some cases propagating through a bcsel may be harmful. If the bcsel uses are unlikely to be eliminated in both branch of an if statement, propagating through it may result in extra moves for phi instructions and extended live ranges. v2: Fix missing parameter in call. Noticed by Rhys. I fixed this on the test machine, but I must have forgotten to propagate the change back to my dev machine. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38321>	2025-11-10 18:30:41 +00:00
Ian Romanick	a3b6d05a3b	nir/opt_if: Specify which branches are valid for evaluate_if_condition Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38321>	2025-11-10 18:30:41 +00:00
Marek Olšák	0216f09e45	nir/lower_interpolation: check IO location correctly Vangogh timed out. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38337>	2025-11-10 16:44:36 +00:00
Lars-Ivar Hesselberg Simonsen	b3b6fba548	nir: Add pan intrinsics for texel buffer access Will be used by panfrost to access texel buffers. Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37007>	2025-11-07 17:03:53 +00:00
Faith Ekstrand	35cdddf632	nir: Simplify assign_io_var_locations() The size and stage parameters are left-overs from history. Originally, the function acted on a list and so it needed an explicit stage and size output. Now that it takes a NIR shader and a mode, we can just take the stage from the shader and set num_(in|out)puts. The one caller that actually used the explicit output parameter was turnip. However, given that the helper sorts and re-numbers all the I/O variables, it's not like changing num_(in|out)puts instead of writing it to some other location is that big of a deal. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38297>	2025-11-07 16:29:56 +00:00
Job Noorman	3908a228bd	nir: add opt_uub pass Add a pass that uses nir_unsigned_upper_bound to simplify some ALU operations: - iand src, mask: if mask is constant with N least significant bits set and uub(src) < 2^N, the iand does nothing and can be removed. - ult src, const: if uub(src) < cmp -> true - uge src, const: if uub(src) < cmp -> false - ilt src, const: if uub(src) >= 0 && cmp < 0 -> false - if uub(src) >= 0 && cmp >= 0 -> ult src, const - ige src, const: if uub(src) >= 0 && cmp < 0 -> true - if uub(src) >= 0 && cmp >= 0 -> uge src, const - umin src, const: if uub(src) <= const -> src - umax src, const: if uub(src) <= const -> const - imin src, const: if uub(src) >= 0 && const < 0 -> const - if uub(src) >= 0 && const >= 0 -> umin src, const - imax src, const: if uub(src) >= 0 && const < 0 -> src - if uub(src) >= 0 && const >= 0 -> umax src, const - imul src0, src1: if uub(srci) < UINT16_MAX -> umul_16x16 src0, src1 - imul src0, src1: if uub(srci) < UINT24_MAX -> umul24 src0, src1 - imul src0, src1: if uub(srci) < UINT23_MAX -> imul24 src0, src1 The imul optimization needs to be explicitly enabled using a pass option. This is useful since 1) most backends don't support umul_16x16, and 2) some passes (e.g., nir_opt_load_store_vectorize) need to analyze imuls so lowering them before running such a pass makes their job more difficult. Signed-off-by: Job Noorman <jnoorman@igalia.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37869>	2025-11-07 10:23:29 +00:00
Job Noorman	0b348fb375	nir: add has_umul_16x16 option Signed-off-by: Job Noorman <jnoorman@igalia.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37869>	2025-11-07 10:23:29 +00:00
Natalie Vock	0cb1fca8fa	nir: Use sparse bitset for liveness information Some shaders, especially RTPSO shaders that have parts of the PSO inlined, can become absolutely huge. Using a sparse bitset avoids quadratic complexity in memory consumption for the liveness information. This reduces peak memory usage in worst-case tests (hammering compilation of many huge RTPSOs on 32 threads concurrently) by ~60%, from 43GB to 18GB. CPU time (seconds) differences for a workload with mostly small shaders: Difference at 95.0% confidence -5.27 +/- 1.08963 -0.88811% +/- 0.183626% (Student's t, pooled s = 0.629735) Peak resident set usage for the mostly-small workload: Difference at 95.0% confidence 30809 +/- 13394.3 1.59276% +/- 0.69246% (Student's t, pooled s = 7741.09) CPU time for the heavy workload did not show any difference. Co-authored-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37908>	2025-11-06 21:34:33 +00:00
Faith Ekstrand	0ccadf7a86	nir: Check the deref mode in lower_point_size() This is more robust because it ensures that we only ever check the location on something that we know is an output. Also, if it's an output then we know (thanks, validation!) that it's a variable. Reviewed-by: Olivia Lee <olivia.lee@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38265>	2025-11-06 14:57:31 +00:00
Faith Ekstrand	5ed35866c2	nir: Handle lowered I/O in lower_viewport_transform() While we're here, make the variable handling a little more robust by checking the deref mode before assuming there's a reachable variable. Reviewed-by: Olivia Lee <olivia.lee@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38265>	2025-11-06 14:57:31 +00:00
Antonio Ospite	222b85328e	mesa: replace most occurrences of getenv() with os_get_option() The standard way to query options in mesa is `os_get_option()` which abstracts platform-specific mechanisms to get config variables. However in quite a few places `getenv()` is still used and this may preclude controlling some options on some systems. For instance it is not generally possible to use `MESA_DEBUG` on Android. So replace most `getenv()` occurrences with `os_get_option()` to support configuration options more consistently across different platforms. Do the same with `secure_getenv()` replacing it with `os_get_option_secure()`. The bulk of the proposed changes are mechanically performed by the following script: ----------------------------------------------------------------------- #!/bin/sh set -e replace() { # Don't replace in some files, for example where `os_get_option` is defined, # or in external files EXCLUDE_FILES_PATTERN='(src/util/os_misc.c\|src/util/u_debug.h\|src/gtest/include/gtest/internal/gtest-port.h)' # Don't replace some "system" variables EXCLUDE_VARS_PATTERN='("XDG\|"DISPLAY\|"HOME\|"TMPDIR\|"POSIXLY_CORRECT)' git grep "[=!( ]$1(" -- src/ \| cut -d ':' -f 1 \| sort \| uniq \| \ grep -v -E "$EXCLUDE_FILES_PATTERN" \| \ while read -r file; do # Don't replace usages of XDG_* variables or HOME sed -E -e "/$EXCLUDE_VARS_PATTERN/!s/([=!$ ])$1\(/\1$2\(/g" -i "$file"; done } # Add const to os_get_option results, to avoid warning about discarded qualifier: # warning: initialization discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers] # but also errors in some cases: # error: invalid conversion from ‘const char’ to ‘char’ [-fpermissive] add_const_results() { git grep -l -P '(?<!const )char.os_get_option' \| \ while read -r file; do sed -e '/^\sconst/! 
s/\(char.os_get_option$/const \1/g' -i "$file" done } replace 'secure_getenv' 'os_get_option_secure' replace 'getenv' 'os_get_option' add_const_results ----------------------------------------------------------------------- After this, the `#include "util/os_misc.h"` is also added in files where `os_get_option()` was not used before. And since the replacements from the script above generated some new `-Wdiscarded-qualifiers` warnings, those have been addressed as well, generally by declaring `os_get_option()` results as `const char ` and adjusting some function declarations. Finally some replacements caused new errors like: ----------------------------------------------------------------------- ../src/gallium/auxiliary/gallivm/lp_bld_misc.cpp:127:31: error: no matching function for call to 'strtok' 127 \| for (n = 0, option = strtok(env_llc_options, " "); option; n++, option = strtok(NULL, " ")) { \| ^~~~~~ /android-ndk-r27c/toolchains/llvm/prebuilt/linux-x86_64/bin/../sysroot/usr/include/string.h:124:17: note: candidate function not viable: 1st argument ('const char ') would lose const qualifier 124 \| char _Nullable strtok(char* _Nullable __s, const char* _Nonnull __delimiter); \| ^ ~~~~~~~~~~~~~~~~~~~ ----------------------------------------------------------------------- Those have been addressed too, copying the const string returned by `os_get_option()` so that it could be modified. In particular, the error above has been fixed by copying the `const char env_llc_options` variable in `src/gallium/auxiliary/gallivm/lp_bld_misc.cpp` to a `char ` which can be tokenized using `strtok()`. Reviewed-by: Eric Engestrom <eric@igalia.com> Reviewed-by: Yonggang Luo <luoyonggang@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38128>	2025-11-06 04:36:13 +00:00
Alyssa Rosenzweig	9c2a2deee6	treewide: use BITSET_BYTES, BITSET_RZALLOC Via Coccinelle patches: @@ expression bits; typedef BITSET_WORD; @@ -BITSET_WORDS(bits) * sizeof(BITSET_WORD) +BITSET_BYTES(bits) @@ expression memctx, bits; typedef BITSET_WORD; @@ -rzalloc_array(memctx, BITSET_WORD, BITSET_WORDS(bits)) +BITSET_RZALLOC(memctx, bits) @@ expression memctx, bits; @@ -rzalloc_size(memctx, BITSET_BYTES(bits)) +BITSET_RZALLOC(memctx, bits) Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Acked-by: Yonggang Luo <luoyonggang@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38245>	2025-11-05 18:44:23 +00:00
Konstantin Seurer	b962063d72	nir: Remove nir_parallel_copy_instr Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36483>	2025-11-04 18:51:51 +00:00
Konstantin Seurer	3f3faa82b8	nir/from_ssa: Stop using nir_parallel_copy_instr nir_parallel_copy_instr can be emulated using an intrinsic for each entry and an array of arrays that is used by the pass to remember which copies belong together. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36483>	2025-11-04 18:51:50 +00:00
Konstantin Seurer	b20fd0ef48	nir: Remove parallel copy handling from rewrite_uses_to_load_reg Parallel copies are only created by nir_convert_from_ssa which does not use the helper. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36483>	2025-11-04 18:51:50 +00:00
Ian Romanick	67a6fc0160	nir/opt_if: See through inot Consider if (!x) { if (x) { ... } } The inner use of `x` must be false, but so far only instances of `!x` would have been replaced with a constant. See through the `inot` to replace instances of `x` as well. shader-db: Lunar Lake total instructions in shared programs: 17205147 -> 17204908 (<.01%) instructions in affected programs: 56037 -> 55798 (-0.43%) helped: 79 / HURT: 79 total cycles in shared programs: 879847886 -> 879992944 (0.02%) cycles in affected programs: 5244138 -> 5389196 (2.77%) helped: 141 / HURT: 125 Meteor Lake, DG2, and Tiger Lake had similar results. (Meteor Lake shown) total instructions in shared programs: 19968312 -> 19968069 (<.01%) instructions in affected programs: 65698 -> 65455 (-0.37%) helped: 88 / HURT: 104 total cycles in shared programs: 884331007 -> 884469865 (0.02%) cycles in affected programs: 4839695 -> 4978553 (2.87%) helped: 172 / HURT: 136 LOST: 3 GAINED: 0 Ice Lake, Skylake, and Broadwell had similar results. (Ice Lake shown) total instructions in shared programs: 20809765 -> 20809473 (<.01%) instructions in affected programs: 65976 -> 65684 (-0.44%) helped: 89 / HURT: 102 total cycles in shared programs: 872466849 -> 872433762 (<.01%) cycles in affected programs: 5452888 -> 5419801 (-0.61%) helped: 157 / HURT: 133 total spills in shared programs: 4014 -> 4010 (-0.10%) spills in affected programs: 30 -> 26 (-13.33%) helped: 1 / HURT: 0 total fills in shared programs: 3769 -> 3765 (-0.11%) fills in affected programs: 50 -> 46 (-8.00%) helped: 1 / HURT: 0 LOST: 3 GAINED: 1 fossil-db: All Intel platforms had similar results. 
(Lunar Lake shown) Totals: Instrs: 910122459 -> 910097570 (-0.00%); split: -0.00%, +0.00% Subgroup size: 40045664 -> 40046176 (+0.00%) Send messages: 40724361 -> 40724036 (-0.00%) Loop count: 970500 -> 970054 (-0.05%) Cycle count: 105785543442 -> 105794147978 (+0.01%); split: -0.02%, +0.02% Spill count: 3426093 -> 3426032 (-0.00%); split: -0.00%, +0.00% Fill count: 6525296 -> 6525210 (-0.00%); split: -0.00%, +0.00% Max live registers: 188561553 -> 188519064 (-0.02%); split: -0.02%, +0.00% Max dispatch width: 47958304 -> 47958496 (+0.00%); split: +0.00%, -0.00% Non SSA regs after NIR: 227303232 -> 227296055 (-0.00%); split: -0.00%, +0.00% Totals from 15417 (0.78% of 1977988) affected shaders: Instrs: 16984488 -> 16959599 (-0.15%); split: -0.20%, +0.05% Subgroup size: 512 -> 1024 (+100.00%) Send messages: 900193 -> 899868 (-0.04%) Loop count: 23059 -> 22613 (-1.93%) Cycle count: 1200149390 -> 1208753926 (+0.72%); split: -1.48%, +2.20% Spill count: 25838 -> 25777 (-0.24%); split: -0.29%, +0.06% Fill count: 43627 -> 43541 (-0.20%); split: -0.28%, +0.08% Max live registers: 2550741 -> 2508252 (-1.67%); split: -1.75%, +0.08% Max dispatch width: 296736 -> 296928 (+0.06%); split: +0.08%, -0.02% Non SSA regs after NIR: 3264670 -> 3257493 (-0.22%); split: -0.25%, +0.03% Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38196>	2025-11-04 18:04:00 +00:00
Alyssa Rosenzweig	17355f716b	treewide: use UTIL_DYNARRAY_INIT Instead of util_dynarray_init(&dynarray, NULL), just use UTIL_DYNARRAY_INIT instead. This is more ergonomic. Via Coccinelle patch: @@ identifier dynarray; @@ -struct util_dynarray dynarray = {0}; -util_dynarray_init(&dynarray, NULL); +struct util_dynarray dynarray = UTIL_DYNARRAY_INIT; @@ identifier dynarray; @@ -struct util_dynarray dynarray; -util_dynarray_init(&dynarray, NULL); +struct util_dynarray dynarray = UTIL_DYNARRAY_INIT; @@ expression dynarray; @@ -util_dynarray_init(&(dynarray), NULL); +dynarray = UTIL_DYNARRAY_INIT; @@ expression dynarray; @@ -util_dynarray_init(dynarray, NULL); +(dynarray) = UTIL_DYNARRAY_INIT; Followed by sed: bash -c "find . -type f -exec sed -i -e 's/util_dynarray_init(&\(.*\), NULL)/\1 = UTIL_DYNARRAY_INIT/g' \{} \;" bash -c "find . -type f -exec sed -i -e 's/util_dynarray_init( &\(.*\), NULL )/\1 = UTIL_DYNARRAY_INIT/g' \{} \;" bash -c "find . -type f -exec sed -i -e 's/util_dynarray_init(\(.*\), NULL)/*\1 = UTIL_DYNARRAY_INIT/g' \{} \;" Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Mel Henning <mhenning@darkrefraction.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38189>	2025-11-04 13:39:48 +00:00
Marek Olšák	2f6b4803ab	nir/validate: expand IO intrinsic validation with nir_io_semantics There are many workarounds. v2: add more validation Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> (v1) Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38113>	2025-11-02 02:21:46 +00:00
Marek Olšák	390023f9fd	nir/lower_io: force src offset=0 for any indirect access with num_slots == 1 This reduces indirect indexing of 1-element arrays to indexing with 0. Without this, we fail an assertion later. Discovered when writing a test. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38113>	2025-11-02 02:21:46 +00:00
Marek Olšák	3e2c11597a	nir: add nir_intrinsic_ssbo_descriptor_amd for lowering get_ssbo_size Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38097>	2025-11-02 01:42:07 +00:00
Alyssa Rosenzweig	a014daea8f	nir: use alignment helpers more Coccinelle + filtering hunks manually + @@ expression pt, pot; typedef uintptr_t; @@ -util_is_aligned((uintptr_t)(pt), pot) +util_ptr_is_aligned(pt, pot) Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com> Acked-by: Yonggang Luo <luoyonggang@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38169>	2025-10-31 15:03:57 +00:00
Marek Olšák	86dd74aaeb	nir/lower_indirect_derefs: don't lower compact arrays unconditionally to fix perf This fixes bad mesh shader performance. See the comment. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38155>	2025-10-31 00:57:46 +00:00
Daniel Schürmann	b3615e5d6f	nir/algebraic: ad-hoc constant-fold ALU instructions Slight differences due to different optimization order. Totals from 135 (0.17% of 79839) affected shaders: (Navi48) Instrs: 287852 -> 287527 (-0.11%); split: -0.15%, +0.03% CodeSize: 1522972 -> 1521764 (-0.08%); split: -0.12%, +0.04% Latency: 1806803 -> 1825754 (+1.05%); split: -0.08%, +1.12% InvThroughput: 242693 -> 244703 (+0.83%); split: -0.02%, +0.84% VClause: 4092 -> 4084 (-0.20%) SClause: 7462 -> 7478 (+0.21%) Copies: 20509 -> 20401 (-0.53%); split: -0.74%, +0.21% Branches: 6395 -> 6386 (-0.14%) PreSGPRs: 7334 -> 7337 (+0.04%); split: -0.03%, +0.07% PreVGPRs: 6375 -> 6382 (+0.11%) VALU: 151787 -> 151595 (-0.13%); split: -0.15%, +0.02% SALU: 52967 -> 52910 (-0.11%); split: -0.23%, +0.12% VMEM: 6704 -> 6696 (-0.12%) SMEM: 12099 -> 12129 (+0.25%) Tested on a small collection of 2518 shaders from Dredge with callgrind using RADV: baseline: nir_opt_algebraic was called 12917 times from radv_optimize_nir() nir_opt_cse was called 15204 times from radv_optimize_nir() relative time spent in radv_optimize_nir(): 31.48% total instruction fetch cost: 28,642,638,021 with nir/algebraic: ad-hoc constant-fold ALU instructions nir_opt_algebraic was called 12797 times from radv_optimize_nir() nir_opt_cse was called 12963 times from radv_optimize_nir() relative time spent in radv_optimize_nir(): 30.63% total instruction fetch cost: 28,284,386,123 => ~1.27% improvement in total compile times Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37195>	2025-10-30 19:28:07 +00:00
Daniel Schürmann	9039e24751	nir/lower_flrp: ad-hoc constant-fold ALU instructions Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37195>	2025-10-30 19:28:07 +00:00
Daniel Schürmann	f61cd64af8	nir/builder: add option to immediately constant-fold ALU instructions upon insertion Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37195>	2025-10-30 19:28:07 +00:00
Daniel Schürmann	870616af34	nir/constant_folding: switch to nir_shader_lower_instructions() Small differences due to implicit DCE. Totals from 76 (0.10% of 79839) affected shaders: (Navi48) Instrs: 168051 -> 168044 (-0.00%); split: -0.01%, +0.01% CodeSize: 893284 -> 893256 (-0.00%); split: -0.01%, +0.01% Latency: 1082007 -> 1082027 (+0.00%); split: -0.00%, +0.00% InvThroughput: 155100 -> 155105 (+0.00%) Copies: 9649 -> 9654 (+0.05%) VALU: 92504 -> 92509 (+0.01%) Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37195>	2025-10-30 19:28:07 +00:00
Daniel Schürmann	d1f2f1222e	nir: guard nir_def_as_alu() We will potentially create load_const_instr instead of ALU. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37195>	2025-10-30 19:28:06 +00:00
Daniel Schürmann	3180656bbc	nir: don't use nir_build_alu() with incomplete sources Ideally we'd have a version that takes nir_scalar arguments. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37195>	2025-10-30 19:28:06 +00:00
Daniel Schürmann	ef9ecc4058	nir: add nir_imul_nuw() and nir_imul_imm_nuw() helpers Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37195>	2025-10-30 19:28:06 +00:00
Alyssa Rosenzweig	b82044c31b	nir/lower_two_sided_color: cleanup while in the area. no functional change Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Mel Henning <mhenning@darkrefraction.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38124>	2025-10-29 15:52:27 +00:00
Job Noorman	32b646c597	nir: print in_bounds info for deref_type(_ptr_as)_array Signed-off-by: Job Noorman <jnoorman@igalia.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38110>	2025-10-28 14:21:01 +00:00
Natalie Vock	50e65dac79	nir/lower_shader_calls: Repair SSA after wrap_instrs Wrapping jump instructions that are located inside ifs can break SSA invariants because the else block no longer dominates the merge block. Repair the SSA to make the validator happy again. Cc: mesa-stable Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37957>	2025-10-26 11:38:51 +00:00
Alyssa Rosenzweig	b824ef83ab	util/dynarray: infer type in append Most of the time, we can infer the type to append in util_dynarray_append using __typeof__, which is standardized in C23 and supported in Jesse's MSMSVCV. This patch drops the type argument most of the time, making util_dynarray a little more ergonomic to use. This is done in four steps. First, rename util_dynarray_append -> util_dynarray_append_typed bash -c "find . -type f -exec sed -i -e 's/util_dynarray_append(/util_dynarray_append_typed(/g' \{} \;" Then, add a new append that infers the type. This is much more ergonomic for what you want most of the time. Next, use type-inferred append as much as possible, via Coccinelle patch (plus manual fixup): @@ expression dynarray, element; type type; @@ -util_dynarray_append_typed(dynarray, type, element); +util_dynarray_append(dynarray, element); Finally, hand fixup cases that Coccinelle missed or incorrectly translated, of which there were several because we can't use the untyped append with a literal (since the sizeof won't do what you want). All four steps are squashed to produce a single patch changing every util_dynarray_append call site in tree to either drop a type parameter (if possible) or insert a _typed suffix (if we can't infer). As such, the final patch is best reviewed by hand even though it was tool-assisted. No Long Linguine Meals were involved in the making of this patch. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38038>	2025-10-24 18:32:07 +00:00
Ian Romanick	f1bbc3d4e4	nir/algebraic: Don't generate integer min or max that will need to be lowered In !35844, there was some discussion about allowing 64-bit bcsel that would be lowered in the driver. One challenge there would be if a 64-bit bcsel was transformed into integer min or max by an algebraic optimization. I believe these were the only algebraic patterns that could create new integer min or max that would not be immediately constant folded. There were no shader-db or fossil-db changes on any Intel platform. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38033>	2025-10-23 22:35:27 +00:00
Rhys Perry	92beca9aa5	nir/lower_tex: optimize txd(coord, ddx/ddy(coord)) fossil-db (gfx1201): Totals from 73 (0.09% of 79839) affected shaders: MaxWaves: 1668 -> 1670 (+0.12%) Instrs: 352537 -> 347991 (-1.29%); split: -1.29%, +0.00% CodeSize: 1924140 -> 1887660 (-1.90%); split: -1.90%, +0.00% VGPRs: 6360 -> 6324 (-0.57%) Latency: 3891330 -> 3888192 (-0.08%); split: -0.10%, +0.02% InvThroughput: 789998 -> 783583 (-0.81%); split: -0.84%, +0.03% VClause: 6409 -> 6408 (-0.02%); split: -0.06%, +0.05% SClause: 4071 -> 4102 (+0.76%); split: -0.10%, +0.86% Copies: 16756 -> 16316 (-2.63%); split: -2.94%, +0.32% PreVGPRs: 5456 -> 5432 (-0.44%); split: -0.57%, +0.13% VALU: 232982 -> 228117 (-2.09%) SALU: 32853 -> 32848 (-0.02%); split: -0.05%, +0.03% VMEM: 9234 -> 9237 (+0.03%) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37561>	2025-10-23 11:21:59 +00:00
Rhys Perry	8e7ea4a882	nir/lower_shader_calls: reobtain impl after NIR_PASS Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Marek Olšák <maraeo@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37573>	2025-10-23 10:44:38 +00:00
Lionel Landwerlin	aa929ea706	nir/lower_io: add missing levels intrinsics to get_io_index_src_number Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `c7ac46a1d8` ("nir/lower_io: add get_io_index_src_number support for image intrinsics") Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38012>	2025-10-22 21:21:47 +00:00
Simon Perretta	ff51e6dc9e	nir: commonize barycentric intrinsic opt pass Introduces an opt pass that attempts to optimize load_barycentric_at_{sample,offset} with simpler load_barycentric_* equivalents where possible, and optionally lowers load_barycentric_at_sample to load_barycentric_at_offset with a position derived from the sample ID instead. Signed-off-by: Simon Perretta <simon.perretta@imgtec.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37658>	2025-10-22 16:48:01 +00:00
Olivia Lee	a410d90fd2	panfrost: fix cl_local_size for precompiled shaders nir_lower_compute_system_values will attempt to lower load_workgroup_size unless workgroup_size_variable is set. For precomp shaders, the workgroup size is set statically for each entrypoint by nir_precompiled_build_variant. Because we call lower_compute_system_values early, it sets the workgroup size to zero. Temporarily setting workgroup_size_variable while we are still processing all the entrypoints together inhibits this. Signed-off-by: Olivia Lee <olivia.lee@collabora.com> Fixes: `20970bcd96` ("panfrost: Add base of OpenCL C infrastructure") Reviewed-by: Eric R. Smith <eric.smith@collabora.com> Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37799>	2025-10-22 00:15:49 +00:00

6780 commits