fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-20 11:18:11 +02:00

Author	SHA1	Message	Date
Konstantin Seurer	a8224e3e00	nir/opt_algebraic: Do not emit patterns for 64bit booleans Avoids assertion failures trying to constant-evaluate the pattern with the new nir_opt_algebraic_pattern_tests. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39184>	2026-01-06 21:27:48 +00:00
Konstantin Seurer	211c7db8e3	nir/opt_algebraic: Remove a pattern for 8bit floats Avoids assertion failures trying to constant-evaluate the pattern with the new nir_opt_algebraic_pattern_tests. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39184>	2026-01-06 21:27:48 +00:00
Emma Anholt	afece95101	nir/opt_algebraic: Fix return type of fdot(vec(a, 0.0, ...), b). The replace pattern was generating a vector when it should have been scalar. Fixes validation failures with the new algebraic unit tests. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39184>	2026-01-06 21:27:47 +00:00
Georg Lehmann	9c6d294111	nir/opcodes: use util_max_num/util_min_num for fmin/fmax constant folding. Hopefully, this is easier to read. The SPIR-V behavior has also since been clarified to require associativity. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39137>	2026-01-06 10:55:03 +00:00
Georg Lehmann	026d4cd200	nir/opcodes: fix fsat signed zero correctness fsat(-0.0) must return +0.0. Cc: mesa-stable Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39137>	2026-01-06 10:55:03 +00:00
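The fsat fix hinges on an IEEE 754 detail: -0.0 compares equal to 0.0, so a clamp built from ordinary comparisons can pass -0.0 through unchanged. The C sketch below is a hypothetical reference (`fsat_ref` and `fsat_naive` are illustrative names, not Mesa's constant-folding code) that contrasts a clamp which leaks -0.0 with one that always returns +0.0. Its NaN behavior (NaN maps to +0.0) is an artifact of the comparison order chosen here, not a claim about NIR's definition of fsat.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical reference for fsat(x) = clamp(x, 0.0, 1.0) that never
 * produces -0.0. Only strictly positive inputs take the first branch;
 * everything else, including -0.0, returns an explicit +0.0f. In this
 * sketch NaN also yields +0.0f, because NaN > 0.0f is false. */
static float fsat_ref(float x)
{
   if (x > 0.0f)
      return x < 1.0f ? x : 1.0f;
   return 0.0f;
}

/* A naive clamp: -0.0f < 0.0f and -0.0f > 1.0f are both false, so -0.0f
 * falls through and is returned with its sign bit still set. */
static float fsat_naive(float x)
{
   if (x < 0.0f)
      return 0.0f;
   if (x > 1.0f)
      return 1.0f;
   return x;
}
```

For example, `signbit(fsat_naive(-0.0f))` is nonzero while `signbit(fsat_ref(-0.0f))` is zero.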
Marek Olšák	86b74563a0	nir/clip_cull_distance_utils: add more assertions validating the type & sizes Reviewed-by: Qiang Yu <yuq825@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39146>	2026-01-05 21:24:10 +00:00
Marek Olšák	bba2536bb0	nir/clip_cull_distance_utils: fix assertion failures with GL_EXT_mesh_shader Those outputs are never compact in GLSL mesh shaders. The assertions might not be needed. Cc: mesa-stable Reviewed-by: Qiang Yu <yuq825@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39146>	2026-01-05 21:24:10 +00:00
Alyssa Rosenzweig	347a0ac212	panfrost,nir: drop my lonely Authors tags We all know who wrote a bunch of Panfrost code. No need to repeat this in a million places, the copyright line is plenty. In cases where there's a joint me & Italo/Eric/.. tag, I've left it alone to respect others' potential wishes. $ find . -type f -exec perl -i -p0e 's/ \\s+\ Author[^\n]+\s+\\s+Alyssa[^\n]+\n \\// \*\//' \{} \; v2: delete more tags (Boris). Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Acked-by: Boris Brezillon <boris.brezillon@collabora.com> Acked-by: Eric R. Smith <eric.smith@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39136>	2026-01-05 17:47:52 +00:00
Georg Lehmann	c8ce0df2d2	nir/opt_algebraic: replace is_negative_zero with constant -0.0 Now that nir_search respects the sign of zero, we don't need a manual helper for this. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39123>	2026-01-03 12:42:23 +00:00
Georg Lehmann	0d255011ae	nir/search: respect sign of zero when comparing floats Floating point comparison treats -0.0 and 0.0 as equal, but doing this in nir_search makes patterns incorrect with respect to signed zero. Foz-DB Navi21: Totals from 1460 (1.16% of 125360) affected shaders: MaxWaves: 33704 -> 33710 (+0.02%) Instrs: 2559362 -> 2558823 (-0.02%); split: -0.02%, +0.00% CodeSize: 14502684 -> 14496352 (-0.04%); split: -0.05%, +0.00% VGPRs: 71800 -> 71776 (-0.03%) Latency: 19274782 -> 19274267 (-0.00%); split: -0.01%, +0.00% InvThroughput: 3307870 -> 3299091 (-0.27%); split: -0.27%, +0.00% SClause: 158698 -> 158703 (+0.00%); split: -0.00%, +0.00% Copies: 240291 -> 241003 (+0.30%); split: -0.03%, +0.32% PreSGPRs: 73203 -> 73206 (+0.00%); split: -0.00%, +0.01% PreVGPRs: 62515 -> 62508 (-0.01%) VALU: 1564970 -> 1564331 (-0.04%); split: -0.04%, +0.00% SALU: 378546 -> 378654 (+0.03%); split: -0.00%, +0.03% This difference is surprisingly positive; the only affected patterns were ones that previously did a signed-zero-incorrect bcsel -> b2f. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39123>	2026-01-03 12:42:23 +00:00
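A minimal sketch of a signed-zero-aware constant comparison, assuming the matcher compares a shader constant against a pattern constant as doubles. `pattern_const_matches` is a hypothetical name; nir_search's actual implementation is not shown in this log. Plain `==` is not enough because it equates -0.0 and 0.0, so the sketch also requires the sign bits to agree.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical matcher helper: does a constant from the shader match a
 * float constant written in a search pattern? IEEE 754 equality alone
 * would let a 0.0 pattern fire on -0.0 (and vice versa), so the sign bits
 * must agree as well. NaN never matches, since NaN == NaN is false. */
static bool pattern_const_matches(double shader_val, double pattern_val)
{
   return shader_val == pattern_val &&
          (signbit(shader_val) != 0) == (signbit(pattern_val) != 0);
}
```

With this helper, `pattern_const_matches(-0.0, 0.0)` is false even though `-0.0 == 0.0` is true.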
Georg Lehmann	7d2a946730	nir/opt_algebraic: canonicalize scmp with -0.0 We already do this for non fused comparisons. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39123>	2026-01-03 12:42:23 +00:00
Georg Lehmann	2824c12252	nir/opt_algebraic: explicitly add some -0.0 variants of patterns Foz-DB Navi21: Totals from 5 (0.00% of 125360) affected shaders: CodeSize: 28812 -> 28744 (-0.24%) Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39123>	2026-01-03 12:42:23 +00:00
Timur Kristóf	2ecb7a9e18	nir: Add pass to lower workgroup size Lowers a shader to use a smaller workgroup to do the same work, while it will still appear as a bigger workgroup to applications. To achieve this, the pass augments the CF of the shader so that each real subgroup will execute two or more logical subgroups. A logical subgroup represents what the application can observe as a subgroup. The size of a logical subgroup is the same as a real subgroup. Only one logical subgroup may be executed per real subgroup at the same time. This ensures that all subgroup operations keep working and the subgroup invocation ID stays the same. - When the CF contains barriers, we can't just repeat the code and we need to augment each CF node individually so that they are aware of logical subgroups. - In case parts of the CF don't contain any barriers, we can simply repeat and predicate that CF for each logical subgroup. It is technically not necessary to implement this strategy, but in practice it helps reduce the amount of branches in the shader and therefore improves compile times. The pass is mainly intended for working around HW limitations, for example when the HW has an upper limit on the workgroup size or doesn't support workgroups at all, but the API requires a certain minimum. Notes: - Only applicable to shader stages that use workgroups - Hits an assertion when called on smaller workgroups - Always flattens workgroup size to 1D - Creates local variables - Does not change subgroup size - Variable workgroup size not supported yet, maybe later Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Anna Maniscalco <anna.maniscalco2000@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37985>	2026-01-02 13:33:54 -06:00
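The index bookkeeping described in the workgroup-lowering commit can be modeled in isolation. The sketch below assumes one particular mapping, `logical id = iteration * real workgroup size + real id`, which is an illustration and not necessarily what the pass emits; `logical_invocation_id` and `mapping_is_valid` are hypothetical helpers. It checks the two properties the commit message relies on: every logical invocation runs exactly once, and the subgroup invocation ID is preserved whenever the real workgroup size is a multiple of the subgroup size.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the index math only; the real pass rewrites NIR
 * control flow. A flattened (1D) logical workgroup of logical_size
 * invocations runs on a real workgroup of real_size invocations, so each
 * real invocation takes logical_size / real_size turns. */
static unsigned logical_invocation_id(unsigned real_id, unsigned iter,
                                      unsigned real_size)
{
   return iter * real_size + real_id;
}

/* Returns true if the mapping hits every logical invocation exactly once
 * and keeps id % subgroup_size equal to the real invocation's value. */
static bool mapping_is_valid(unsigned logical_size, unsigned real_size,
                             unsigned subgroup_size)
{
   bool seen[1024] = {false};

   assert(real_size != 0 && subgroup_size != 0);
   assert(logical_size <= 1024 && logical_size % real_size == 0);

   for (unsigned iter = 0; iter < logical_size / real_size; iter++) {
      for (unsigned real_id = 0; real_id < real_size; real_id++) {
         unsigned id = logical_invocation_id(real_id, iter, real_size);
         if (seen[id] || id % subgroup_size != real_id % subgroup_size)
            return false;
         seen[id] = true;
      }
   }
   /* logical_size distinct ids in [0, logical_size): full coverage. */
   return true;
}
```

For instance, a 1024-invocation logical workgroup on a 256-invocation real workgroup with 64-wide subgroups passes, while a 48-invocation real workgroup with 32-wide subgroups does not, because 48 is not a multiple of 32.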
Pavel Ondračka	0b39b5ea63	nir/opt_algebraic: improve dot product narrowing The issue is that the current narrowing patterns are not working in a lot of cases. For example, (('fdot3', ('vec3', a, 0.0, 0.0), b), ('fmul', a, b)) misses patterns like this: 32x3 %1 = load_const (0x3f800000, 0x00000000, 0x00000000) = (1.000000, 0.000000, 0.000000) 32x4 %7 = vec4 %6, %2 (0x0), %2 (0x0), %2 (0x0) 32 %19 = fdot3 %1 (1.000000, 0.000000, 0.000000), %7.xyz or after some later transforms: 32x2 %0 = load_const (0x3f800000, 0x00000000) = (1.000000, 0.000000) 32x2 %6 = vec2 %5, %1 (0x0) 32 %18 = fdot3 %0 (1.000000, 0.000000).xyy, %6.xyy This patch is heavily based on an old branch from Ian Romanick from 2019. r300 RV530 shader-db: total instructions in shared programs: 128900 -> 128882 (-0.01%) instructions in affected programs: 621 -> 603 (-2.90%) helped: 10 HURT: 1 total cycles in shared programs: 191837 -> 191828 (<.01%) cycles in affected programs: 799 -> 790 (-1.13%) helped: 7 HURT: 1 Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39068>	2026-01-02 16:07:10 +01:00
Timur Kristóf	2b62738b9b	nir: Add new nir_remove_outputs pass Introduce a new NIR pass called nir_remove_outputs which works on lowered I/O intrinsics and can remove any output varying or sysval. This is meant to replace custom solutions in drivers, such as radv_remove_varyings and similar. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33928>	2026-01-01 21:25:42 -06:00
Timur Kristóf	1981b9836b	nir/opt_vectorize_io: Fix allow_holes option Only allow holes between the first and last used component. Do not load unused components before the first used component. This fixes test failures with a bunch of VK CTS tests with allow_holes enabled on RADV: dEQP-VK.tessellation.tess_io.max_in_out.with_f16.* Fixes: `6286c1c66f` Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33979>	2026-01-01 17:38:01 -06:00
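The corrected allow_holes rule can be expressed as a small range computation over a component mask. This C sketch is hypothetical (`load_range_for_mask` and `struct load_range` are illustrative, not nir_opt_vectorize_io internals): the load starts at the lowest used component and ends at the highest, so holes are permitted only in between.

```c
#include <assert.h>

/* Hypothetical description of one vectorized load. */
struct load_range {
   unsigned first;     /* lowest used component */
   unsigned num_comps; /* components read, including holes in between */
};

/* Bit i of mask set means component i is used. Unused components in
 * front of the first used one are never loaded; unused components between
 * the first and last used ones (the "holes") are loaded. */
static struct load_range load_range_for_mask(unsigned mask)
{
   struct load_range r = {0, 0};
   unsigned last;

   assert(mask != 0 && mask <= 0xffffu); /* small component masks only */

   while (!(mask & (1u << r.first)))
      r.first++;

   last = r.first;
   for (unsigned i = r.first; (mask >> i) != 0; i++) {
      if (mask & (1u << i))
         last = i;
   }

   r.num_comps = last - r.first + 1;
   return r;
}
```

For a mask of 0b1010 (y and w used), the sketch yields `first = 1` and `num_comps = 3`: y, z, and w are loaded, and x is not.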
Marek Olšák	99a42bdd4b	nir,radeonsi: simplify load_color0 & load_color1 intrinsics and shader_info We don't need the shader_info fields anymore. sample and centroid fields are unused. The interp field is already available from si_shader_info::color_interpolate. The loads don't need to be sysvals. Add also the _amd suffix. Don't handle it in st_nir_lower_drawpixels either because the intrinsics are created much later in compilation now. Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38802>	2026-01-01 18:30:28 +00:00
Georg Lehmann	369a3b22b4	nir/opt_uniform_subgroup: optimize uniform ddx/ddy We can't just use 0.0 as the replacement because of NaN/Inf. But turning the intrinsic into a simple fsub should still be better or at least equal. Foz-DB Navi48: Totals from 128 (0.10% of 125402) affected shaders: MaxWaves: 3684 -> 3708 (+0.65%) Instrs: 111150 -> 111055 (-0.09%); split: -0.20%, +0.11% CodeSize: 587176 -> 590800 (+0.62%); split: -0.01%, +0.63% VGPRs: 6540 -> 6480 (-0.92%) Latency: 382775 -> 383332 (+0.15%); split: -0.15%, +0.29% InvThroughput: 80909 -> 80530 (-0.47%); split: -0.51%, +0.04% VClause: 1433 -> 1430 (-0.21%) SClause: 1834 -> 1841 (+0.38%); split: -0.11%, +0.49% Copies: 6130 -> 6096 (-0.55%); split: -1.29%, +0.73% PreSGPRs: 7352 -> 7356 (+0.05%) PreVGPRs: 4797 -> 4721 (-1.58%) VALU: 71892 -> 71435 (-0.64%); split: -0.64%, +0.01% SALU: 12665 -> 13056 (+3.09%); split: -0.06%, +3.14% Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39112>	2026-01-01 08:43:55 +00:00
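Why 0.0 is not a valid replacement for a uniform derivative: when a value is identical across the quad, the screen-space difference reduces to x - x, and IEEE 754 defines Inf - Inf and NaN - NaN as NaN. The sketch below (`uniform_ddx` is a hypothetical name) only illustrates that arithmetic; it assumes the fsub mentioned in the commit message subtracts the value from itself, which the log does not spell out.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper: for a value that is the same in every invocation
 * of a quad, the derivative is just x - x. For finite x this is +0.0, but
 * Inf - Inf and NaN - NaN are NaN, so folding to a constant 0.0 would
 * change results for those inputs. */
static float uniform_ddx(float x)
{
   return x - x;
}
```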
Sviatoslav Peleshko	f3eb98ec57	nir/normalize_cubemap_coords: Handle the projector before the normalization Applying the projector after the normalization breaks the coordinates, so apply it early. Usually it's not even necessary for the cubemaps anyway, but ARB_fragment_program and TGSI allow it. Fixes: `52e71809` ("nir: Add a cubemap normalizing pass") Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39087>	2025-12-30 16:25:09 +00:00
Georg Lehmann	5e8cc19a3b	nir: remove per shader float fast math flags These were redundant with the per alu fast math flags. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39026>	2025-12-29 10:57:06 +00:00
Georg Lehmann	6e67267045	nir/opt_varyings: use per instruction nan flag for promoting to flat Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39026>	2025-12-29 10:57:06 +00:00
Georg Lehmann	4f5a29ec32	nir/opt_varyings: use per instruction inf/nan flag for moving past interp Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39026>	2025-12-29 10:57:06 +00:00
Georg Lehmann	f3290219ab	nir: use a separate enum for per alu floating point math control We don't need one bit per bitsize per instruction if only one actually matters in the end. First step towards moving NIR in the direction of full float_controls2 only. Also rename this from fp_fast_math, because that name implied that 0 is the no fast math mode, while the opposite was the case. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39026>	2025-12-29 10:57:05 +00:00
Georg Lehmann	71f0c0d6a6	nir/opt_uniform_subgroup: optimize add/xor reduce of bcsel(div, con, con) Foz-DB Navi48: Totals from 12 (0.01% of 97623) affected shaders: Instrs: 9207 -> 8973 (-2.54%) CodeSize: 54192 -> 52832 (-2.51%) VGPRs: 768 -> 480 (-37.50%) Latency: 39516 -> 38507 (-2.55%) InvThroughput: 10155 -> 9859 (-2.91%) PreSGPRs: 329 -> 332 (+0.91%) PreVGPRs: 268 -> 263 (-1.87%) VALU: 4393 -> 4257 (-3.10%) SALU: 1037 -> 1019 (-1.74%) VOPD: 602 -> 599 (-0.50%) Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38974>	2025-12-19 20:23:23 +00:00
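The algebra behind optimizing an add or xor reduction of bcsel(div, con, con) can be checked in plain C: if both selected values are uniform and only the condition diverges, the reduction depends only on how many active invocations took each side. The functions below are hypothetical illustrations, not the opt_uniform_subgroup code; they compare the counting shortcut with an element-by-element reduction, using 32-bit unsigned wraparound for the integer add.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Number of active invocations whose divergent condition is true. */
static unsigned count_true(const bool *cond, unsigned n)
{
   unsigned count = 0;
   for (unsigned i = 0; i < n; i++)
      count += cond[i] ? 1u : 0u;
   return count;
}

/* iadd reduction of bcsel(cond, a, b) with uniform a and b: a is summed
 * count times and b the remaining n - count times (mod 2^32). */
static uint32_t iadd_reduce_bcsel(const bool *cond, unsigned n,
                                  uint32_t a, uint32_t b)
{
   unsigned count = count_true(cond, n);
   return a * count + b * (n - count);
}

/* ixor reduction: x ^ x == 0, so each value contributes only when it
 * appears an odd number of times. */
static uint32_t ixor_reduce_bcsel(const bool *cond, unsigned n,
                                  uint32_t a, uint32_t b)
{
   unsigned count = count_true(cond, n);
   return ((count & 1u) ? a : 0u) ^ (((n - count) & 1u) ? b : 0u);
}

/* Reference reductions that visit every invocation. */
static uint32_t iadd_reduce_ref(const bool *cond, unsigned n,
                                uint32_t a, uint32_t b)
{
   uint32_t sum = 0;
   for (unsigned i = 0; i < n; i++)
      sum += cond[i] ? a : b;
   return sum;
}

static uint32_t ixor_reduce_ref(const bool *cond, unsigned n,
                                uint32_t a, uint32_t b)
{
   uint32_t acc = 0;
   for (unsigned i = 0; i < n; i++)
      acc ^= cond[i] ? a : b;
   return acc;
}
```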
Georg Lehmann	0e5e1cb9b0	nir/opt_uniform_subgroup: optimize min/max/and/or reduce of bcsel(div, con, con) Foz-DB Navi48: Totals from 1 (0.00% of 97397) affected shaders: Instrs: 1848 -> 1834 (-0.76%) CodeSize: 9996 -> 9908 (-0.88%) VGPRs: 96 -> 72 (-25.00%) Latency: 17371 -> 17358 (-0.07%) Copies: 190 -> 191 (+0.53%) PreVGPRs: 43 -> 41 (-4.65%) VALU: 657 -> 648 (-1.37%) Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38974>	2025-12-19 20:23:23 +00:00
Georg Lehmann	4d8cc7d82e	nir/divergence: add nir_def_is_divergent_at_use_block helper For cases where the block we are interested in is not the immediate block of the nir_src. Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38974>	2025-12-19 20:23:23 +00:00
Lionel Landwerlin	252e55a1bb	nir/printf-helpers: set writes_memory at printf emission Those helpers can be called late (since it's mostly for debug purposes). This can avoid surprises in the backend and also avoids rerunning gather_info. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38995>	2025-12-19 10:31:08 +00:00
Emma Anholt	5a09abe890	nir: Introduce nir_lower_vars_to_scratch_global(). This lets the driver make a more informed decision about which vars to lower to scratch based on the vars available to spill. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37245>	2025-12-17 19:50:28 +00:00
Emma Anholt	059d301c79	nir: Drop the mode argument of nir_lower_vars_to_scratch(). It only makes sense for function temps, and that's the only way it's been used. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37245>	2025-12-17 19:50:28 +00:00
Ian Romanick	66fd4d72fd	nir/algebraic: Mask with shifted constant instead of shift-then-mask shader-db: All Intel platforms had similar results. (Lunar Lake shown) total instructions in shared programs: 17088766 -> 17088765 (<.01%) instructions in affected programs: 1375 -> 1374 (-0.07%) helped: 1 / HURT: 1 total cycles in shared programs: 887873068 -> 887871748 (<.01%) cycles in affected programs: 136402 -> 135082 (-0.97%) helped: 2 / HURT: 0 fossil-db: Lunar Lake Totals: Instrs: 924954240 -> 924939317 (-0.00%); split: -0.00%, +0.00% Subgroup size: 40937696 -> 40937728 (+0.00%) Cycle count: 106116946509 -> 106116637903 (-0.00%); split: -0.00%, +0.00% Spill count: 3423930 -> 3423250 (-0.02%); split: -0.02%, +0.00% Fill count: 4876960 -> 4876045 (-0.02%); split: -0.03%, +0.01% Max live registers: 193882457 -> 193881816 (-0.00%); split: -0.00%, +0.00% Max dispatch width: 49078640 -> 49078656 (+0.00%) Non SSA regs after NIR: 231314214 -> 231314219 (+0.00%); split: -0.00%, +0.00% Totals from 13809 (0.68% of 2019450) affected shaders: Instrs: 25433084 -> 25418161 (-0.06%); split: -0.08%, +0.02% Subgroup size: 32 -> 64 (+100.00%) Cycle count: 1483550606 -> 1483242000 (-0.02%); split: -0.27%, +0.25% Spill count: 41466 -> 40786 (-1.64%); split: -1.88%, +0.24% Fill count: 74195 -> 73280 (-1.23%); split: -2.12%, +0.88% Max live registers: 2326365 -> 2325724 (-0.03%); split: -0.05%, +0.02% Max dispatch width: 234848 -> 234864 (+0.01%) Non SSA regs after NIR: 3394104 -> 3394109 (+0.00%); split: -0.00%, +0.00% Meteor Lake and DG2 had similar results. 
(Meteor Lake shown) Totals: Instrs: 997527742 -> 997524495 (-0.00%); split: -0.00%, +0.00% Subgroup size: 27452928 -> 27452944 (+0.00%) Cycle count: 93646717070 -> 93649738060 (+0.00%); split: -0.00%, +0.01% Spill count: 3710125 -> 3709784 (-0.01%); split: -0.03%, +0.02% Fill count: 5032819 -> 5033191 (+0.01%); split: -0.04%, +0.05% Max live registers: 121648838 -> 121648528 (-0.00%); split: -0.00%, +0.00% Max dispatch width: 37811544 -> 37811584 (+0.00%) Non SSA regs after NIR: 255562054 -> 255565914 (+0.00%); split: -0.00%, +0.00% Totals from 14438 (0.63% of 2281134) affected shaders: Instrs: 25974222 -> 25970975 (-0.01%); split: -0.08%, +0.06% Subgroup size: 16 -> 32 (+100.00%) Cycle count: 1149710820 -> 1152731810 (+0.26%); split: -0.29%, +0.55% Spill count: 44445 -> 44104 (-0.77%); split: -2.23%, +1.46% Fill count: 76172 -> 76544 (+0.49%); split: -2.89%, +3.37% Max live registers: 1237997 -> 1237687 (-0.03%); split: -0.04%, +0.02% Max dispatch width: 123528 -> 123568 (+0.03%) Non SSA regs after NIR: 3490757 -> 3494617 (+0.11%); split: -0.03%, +0.14% Tiger Lake, Ice Lake, and Skylake had similar results. 
(Tiger Lake shown) Totals: Instrs: 1013364485 -> 1013342384 (-0.00%); split: -0.00%, +0.00% Cycle count: 85509342602 -> 85500105656 (-0.01%); split: -0.02%, +0.01% Spill count: 3903944 -> 3903350 (-0.02%); split: -0.02%, +0.01% Fill count: 6801948 -> 6799368 (-0.04%); split: -0.05%, +0.01% Max live registers: 122212165 -> 122211859 (-0.00%); split: -0.00%, +0.00% Max dispatch width: 37805336 -> 37805472 (+0.00%) Non SSA regs after NIR: 244624956 -> 244628603 (+0.00%); split: -0.00%, +0.00% Totals from 14835 (0.65% of 2278397) affected shaders: Instrs: 27522570 -> 27500469 (-0.08%); split: -0.10%, +0.02% Cycle count: 1128820972 -> 1119584026 (-0.82%); split: -1.53%, +0.71% Spill count: 46408 -> 45814 (-1.28%); split: -2.04%, +0.76% Fill count: 99071 -> 96491 (-2.60%); split: -3.14%, +0.54% Max live registers: 1287967 -> 1287661 (-0.02%); split: -0.04%, +0.02% Max dispatch width: 126600 -> 126736 (+0.11%) Non SSA regs after NIR: 3438628 -> 3442275 (+0.11%); split: -0.03%, +0.14% Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38979>	2025-12-17 18:38:55 +00:00
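For unsigned 32-bit values and a shift amount s below 32, a right shift followed by a mask can be rewritten as a mask by a shifted constant followed by the shift. The exact patterns the commit adds are not visible in this log, so the C sketch below only demonstrates this one identity; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Identity sketched here, for 32-bit unsigned x and 0 <= s < 32:
 *     (x >> s) & m  ==  (x & (m << s)) >> s
 * Bit i of both sides is bit (i + s) of x ANDed with bit i of m. The bits
 * of m that m << s pushes out would select bits of x above bit 31, which
 * x >> s never supplies either. */
static uint32_t shift_then_mask(uint32_t x, unsigned s, uint32_t m)
{
   assert(s < 32);
   return (x >> s) & m;
}

static uint32_t mask_then_shift(uint32_t x, unsigned s, uint32_t m)
{
   assert(s < 32);
   return (x & (m << s)) >> s;
}
```

For example, both forms extract byte 1 of `0xDEADBEEF` as `0xBE` when called with `s = 8` and `m = 0xFF`.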
Alyssa Rosenzweig	079e9ae606	treewide: use BITSET_*_COUNT Mix of Coccinelle patch, manual fix ups, sed, etc. Probably best to review the diff as-if hand written: Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Mel Henning <mhenning@darkrefraction.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38955>	2025-12-16 17:42:10 +00:00
Caio Oliveira	a4e84c9244	nir/gcm: Consider dead code elimination done by GCM as progress This will also fix NIR_DEBUG=extended_validation complaining about invalid loop analysis. GCM will invalidate loop analysis if progress was made, and depending on the removed instruction it will affect the instr_cost. Cc: mesa-stable Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38932>	2025-12-16 16:19:21 +00:00
Kenneth Graunke	88d46605bd	nir: Support Intel URB intrinsics in nir_opt_offsets We don't bother with maximums or wrapping because it shouldn't come up for IO intrinsics anyway. fossil-db results on Battlemage: Instrs: 231363032 -> 231359554 (-0.00%) Cycle count: 34057005552.0 -> 34057236190.0 (+0.00%); split: -0.00%, +0.00% Max live registers: 71873886 -> 71870438 (-0.00%) Non SSA regs after NIR: 67159408 -> 67159523 (+0.00%) Totals from 1779 (0.23% of 788851) affected shaders: Instrs: 774359 -> 770881 (-0.45%) Cycle count: 10551280.0 -> 10781918.0 (+2.19%); split: -0.32%, +2.51% Max live registers: 158193 -> 154745 (-2.18%) Non SSA regs after NIR: 180104 -> 180219 (+0.06%) Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>	2025-12-16 00:58:46 +00:00
Kenneth Graunke	97857d3224	nir: Fix mod analysis of ishl to shift the recursive result When considering ((x << y) % divisor), we recursed to calculate mod = (x % (divisor << y)) but incorrectly returned mod directly, rather than the correct value, (mod << y). (Note that we require divisor to be a power-of-two.) As an example of this going wrong, (x << 1) % 4 was returning (x % 2) which is 0 or 1, but x << 1 is 2x, which is always an even number so the result mod 4 can only be 0 or 2. Unit test suggested by Caio Oliveira during review. Fixes: `2255375c4d` ("nir: add nir_mod_analysis & its tests") Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>	2025-12-16 00:58:37 +00:00
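The corrected arithmetic can be sketched for the case the commit message relies on, a power-of-two divisor. This is an illustration under that assumption, not nir_mod_analysis itself, and `shl_mod_pow2` and `shl_mod_pow2_buggy` are hypothetical names: the remainder of `x << y` modulo `d` comes from reducing `x` modulo `d >> y` and shifting the result back left by `y`. The buggy variant reproduces the reported symptom, returning `x % 2` for `(x << 1) % 4`.

```c
#include <assert.h>
#include <stdint.h>

/* (x << y) % d for a power-of-two d. If d <= 1 << y, every bit of x << y
 * below d is zero, so the remainder is 0. Otherwise reduce x by d >> y
 * and shift the remainder back into place. */
static uint32_t shl_mod_pow2(uint32_t x, unsigned y, uint32_t d)
{
   assert(d != 0 && (d & (d - 1)) == 0 && y < 32);
   if (d <= (1u << y))
      return 0;
   return (x % (d >> y)) << y;
}

/* The buggy variant: forgets to shift the recursive result. */
static uint32_t shl_mod_pow2_buggy(uint32_t x, unsigned y, uint32_t d)
{
   assert(d != 0 && (d & (d - 1)) == 0 && y < 32);
   if (d <= (1u << y))
      return 0;
   return x % (d >> y);
}
```

With `x = 3`, `(3 << 1) % 4` is 2; the fixed sketch returns 2, while the buggy one returns 1.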
Marek Olšák	d17d1f53bd	nir/opt_cse: update potential future plans merging copy propagation with CSE This matches my current understanding of nir_opt_copy_prop, including that nir_opt_copy_prop always replaces movs with vecN. Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38916>	2025-12-13 06:41:59 +00:00
Marek Olšák	9ac8e643d6	nir/lower_io: explain properly how nir_lower_io_lower_64bit_to_32* options work Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38916>	2025-12-13 06:41:59 +00:00
Marek Olšák	41d127b9e8	nir/lower_io: remove unused option nir_lower_io_lower_64bit_float_to_32 Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38916>	2025-12-13 06:41:59 +00:00
Marek Olšák	09b2325877	nir/print: print tex->sampler_dim Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38916>	2025-12-13 06:41:58 +00:00
Marek Olšák	4d976a5787	nir: fix the value of nir_io_use_frag_result_dual_src_blend Due to rebasing not recognizing it as a conflict, it ended up having the same value as nir_io_assign_color_input_bases_after_all_other_inputs. Fixes: `9a2f1be814` - nir: add FRAG_RESULT_DUAL_SRC_BLEND and an option to use it Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38916>	2025-12-13 06:41:58 +00:00
Iván Briano	a7280ab590	nir: add nir_lower_single_sampled::lower_sample_mask_in option GLSL defines gl_SampleMaskIn as: "a fragment language that indicates the set of samples covered by the primitive generating the fragment during multisample rasterization". When variable rate shading is enabled, a single invocation might cover multiple samples. The lowering done in nir_lower_single_sampled() does not account for that case, so add an option to selectively disable it. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38641>	2025-12-11 22:50:10 +00:00
Iván Briano	ef31f07077	nir: clear SAMPLE_MASK_IN if we lowered it Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38641>	2025-12-11 22:50:10 +00:00
Konstantin Seurer	034f58c7e3	nir: Ignore ray query ranges that don't start with rq_initialize Handles a rare edge case where the ray query is used "before" there is a rq_initialize. cc: mesa-stable Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38850>	2025-12-11 15:56:29 +00:00
Konstantin Seurer	5e03d09eb5	nir: Fix typo in nir_opt_ray_query_ranges Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38850>	2025-12-11 15:56:28 +00:00
Emma Anholt	1a2d0d3f31	nir: Optimistically unroll loops using induction var as a sample id. On the assumption that nobody will use a sample id greater than the sample count, have loop unrolling guess based on the driver's max sample count. This unrolls a simple resolve shader with a uniform max samples on ir3 to: value = vec4(0); if (max_samples > 0) { value += txf_ms(coord, 0); if (max_samples > 1) { value += txf_ms(coord, 1); if (max_samples > 2) { value += txf_ms(coord, 2); if (max_samples > 3) { value += txf_ms(coord, 3); for (i = 4; i < max_samples; i++) value += txf_ms(coord, i); } } } } ... This is only worth a 1% win on our microbenchmark as-is, but if we could flatten those ifs out and pull the fadds out to the end, avoiding syncs per load would be a big win. This seems like a first step. I've taken a shot at updating drivers to set the value, and tried to leave notes in places that drivers might update, and want to follow up with updating the compiler option. This affects over half the DX11 apps in shader-db-private. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38585>	2025-12-11 14:26:11 +00:00
Emma Anholt	10ba7675c8	nir/uub: Use an optional max_samples from drivers for sample counts. This triggers some unrolling in Fallout 4, GTAV, and Rocky Planet in my shader-db. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38585>	2025-12-11 14:26:11 +00:00
Emma Anholt	dc30e1a128	nir/loop_analyze: Use nir_unsigned_upper_bound for loop trip limits. This triggers some unrolling in Monster Hunter World, Total War: Warhammer, and Planet Zoo. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38585>	2025-12-11 14:26:10 +00:00
Mel Henning	2fab8fc297	nir: Use instr_clone in rematerialize_deref_in_block The previous implementation seems to predate nir_instr_clone() and duplicates a lot of the deref cloning code. This also makes the pass preserve deref->arr.in_bounds correctly. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38856>	2025-12-10 22:07:45 +00:00
Mel Henning	dc44c0f32b	treewide: Use nir_deref_instr_is_arr() Via coccinelle and some manual fixups. @@ expression e1; @@ - e1->deref_type == nir_deref_type_array || e1->deref_type == nir_deref_type_ptr_as_array + nir_deref_instr_is_arr(e1) Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38856>	2025-12-10 22:07:45 +00:00
Mel Henning	263a82f49b	nir: Add nir_deref_instr_is_arr() helper Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38856>	2025-12-10 22:07:44 +00:00
Marek Olšák	9a2f1be814	nir: add FRAG_RESULT_DUAL_SRC_BLEND and an option to use it This is potentially nicer for some drivers. AMD drivers will use it. mesa_frag_result_get_color_index will be used often. Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38604>	2025-12-10 19:16:46 +00:00


6917 commits