fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-20 15:38:19 +02:00

Author	SHA1	Message	Date
Pavel Ondračka	0b39b5ea63	nir/opt_algebraic: improve dot product narrowing The issue is that the current narrowing patterns are not working in a lot of cases, for example (('fdot3', ('vec3', a, 0.0, 0.0), b), ('fmul', a, b)), is missing patterns like this: 32x3 %1 = load_const (0x3f800000, 0x00000000, 0x00000000) = (1.000000, 0.000000, 0.000000) 32x4 %7 = vec4 %6, %2 (0x0), %2 (0x0), %2 (0x0) 32 %19 = fdot3 %1 (1.000000, 0.000000, 0.000000), %7.xyz or after some later transforms: 32x2 %0 = load_const (0x3f800000, 0x00000000) = (1.000000, 0.000000) 32x2 %6 = vec2 %5, %1 (0x0) 32 %18 = fdot3 %0 (1.000000, 0.000000).xyy, %6.xyy This patch is heavily based on old branch from Ian Romanick from 2019. r300 RV530 shader-db: total instructions in shared programs: 128900 -> 128882 (-0.01%) instructions in affected programs: 621 -> 603 (-2.90%) helped: 10 HURT: 1 total cycles in shared programs: 191837 -> 191828 (<.01%) cycles in affected programs: 799 -> 790 (-1.13%) helped: 7 HURT: 1 Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39068>	2026-01-02 16:07:10 +01:00
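
The narrowing in 0b39b5ea63 rests on a simple identity: when one operand of a dot product is zero in all but one component, the dot product collapses to a single multiply. A minimal Python sketch of that identity follows. `fdot3` and `narrowed` are illustrative helpers, not NIR code, and the equality is only claimed for finite inputs (with Inf or NaN in the dropped components, 0 * Inf would yield NaN).

```python
def fdot3(u, v):
    """Plain three-component dot product."""
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def narrowed(a, b):
    """What fdot3((a, 0, 0), b) reduces to: one multiply with b's first component."""
    return a * b[0]


b = (3.0, 5.0, 7.0)
print(fdot3((2.0, 0.0, 0.0), b), narrowed(2.0, b))
```

The load_const case quoted in the message is the same situation with a = 1.0, so the result is just the first component of the other operand.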
Timur Kristóf	2b62738b9b	nir: Add new nir_remove_outputs pass Introduce a new NIR pass called nir_remove_outputs which works on lowered I/O intrinsics and can remove any output varying or sysval. This is meant to replace custom solutions in drivers, such as radv_remove_varyings and similar. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33928>	2026-01-01 21:25:42 -06:00
Timur Kristóf	1981b9836b	nir/opt_vectorize_io: Fix allow_holes option Only allow holes between the first and last used component. Do not load unused components before the first used component. This fixes test failures with a bunch of VK CTS tests with allow_holes enabled on RADV: dEQP-VK.tessellation.tess_io.max_in_out.with_f16.* Fixes: `6286c1c66f` Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33979>	2026-01-01 17:38:01 -06:00
Marek Olšák	99a42bdd4b	nir,radeonsi: simplify load_color0 & load_color1 intrinsics and shader_info We don't need the shader_info fields anymore. sample and centroid fields are unused. The interp field is already available from si_shader_info::color_interpolate. The loads don't need to be sysvals. Add also the _amd suffix. Don't handle it in st_nir_lower_drawpixels either because the intrinsics are created much later in compilation now. Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38802>	2026-01-01 18:30:28 +00:00
Georg Lehmann	369a3b22b4	nir/opt_uniform_subgroup: optimize uniform ddx/ddy We can't just use 0.0 as the replacement because of NaN/Inf. But turning the intrinsic into a simple fsub should still be better or at least equal. Foz-DB Navi48: Totals from 128 (0.10% of 125402) affected shaders: MaxWaves: 3684 -> 3708 (+0.65%) Instrs: 111150 -> 111055 (-0.09%); split: -0.20%, +0.11% CodeSize: 587176 -> 590800 (+0.62%); split: -0.01%, +0.63% VGPRs: 6540 -> 6480 (-0.92%) Latency: 382775 -> 383332 (+0.15%); split: -0.15%, +0.29% InvThroughput: 80909 -> 80530 (-0.47%); split: -0.51%, +0.04% VClause: 1433 -> 1430 (-0.21%) SClause: 1834 -> 1841 (+0.38%); split: -0.11%, +0.49% Copies: 6130 -> 6096 (-0.55%); split: -1.29%, +0.73% PreSGPRs: 7352 -> 7356 (+0.05%) PreVGPRs: 4797 -> 4721 (-1.58%) VALU: 71892 -> 71435 (-0.64%); split: -0.64%, +0.01% SALU: 12665 -> 13056 (+3.09%); split: -0.06%, +3.14% Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39112>	2025-12-31 08:43:55 +00:00
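
The NaN/Inf remark in 369a3b22b4 can be illustrated with ordinary IEEE arithmetic. The sketch below assumes a derivative of a uniform value behaves like `v - v` (neighbour minus self, with both holding the same value). `uniform_ddx` is a hypothetical stand-in, not the NIR intrinsic.

```python
import math


def uniform_ddx(v):
    """Derivative of a value that is identical in every invocation: v - v."""
    return v - v


print(uniform_ddx(1.5))        # finite input: the difference is zero
print(uniform_ddx(math.inf))   # Inf - Inf is NaN, so a constant 0.0 would be wrong
```

This is why the replacement is a subtraction rather than a literal zero: the subtraction preserves the NaN result for non-finite inputs.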
Sviatoslav Peleshko	f3eb98ec57	nir/normalize_cubemap_coords: Handle the projector before the normalization Applying the projector after the normalization breaks the coordinates, so apply it early. Usually it's not even necessary for the cubemaps anyway, but ARB_fragment_program and TGSI allow it. Fixes: `52e71809` ("nir: Add a cubemap normalizing pass") Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39087>	2025-12-30 16:25:09 +00:00
Georg Lehmann	5e8cc19a3b	nir: remove per shader float fast math flags These were redundant with the per alu fast math flags. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39026>	2025-12-29 10:57:06 +00:00
Georg Lehmann	6e67267045	nir/opt_varyings: use per instruction nan flag for promoting to flat Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39026>	2025-12-29 10:57:06 +00:00
Georg Lehmann	4f5a29ec32	nir/opt_varyings: use per instruction inf/nan flag for moving past interp Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39026>	2025-12-29 10:57:06 +00:00
Georg Lehmann	f3290219ab	nir: use a seperate enum for per alu floating point math control We don't need one bit per bitsize per instruction if only one actually matters in the end. First step towards moving NIR in the direction of full float_controls2 only. Also rename this from fp_fast_math, because that name implied that 0 is the no fast math mode, while the opposite was the case. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39026>	2025-12-29 10:57:05 +00:00
Georg Lehmann	71f0c0d6a6	nir/opt_uniform_subgroup: optimize add/xor reduce of bcsel(div, con, con) Foz-DB Navi48: Totals from 12 (0.01% of 97623) affected shaders: Instrs: 9207 -> 8973 (-2.54%) CodeSize: 54192 -> 52832 (-2.51%) VGPRs: 768 -> 480 (-37.50%) Latency: 39516 -> 38507 (-2.55%) InvThroughput: 10155 -> 9859 (-2.91%) PreSGPRs: 329 -> 332 (+0.91%) PreVGPRs: 268 -> 263 (-1.87%) VALU: 4393 -> 4257 (-3.10%) SALU: 1037 -> 1019 (-1.74%) VOPD: 602 -> 599 (-0.50%) Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38974>	2025-12-19 20:23:23 +00:00
Georg Lehmann	0e5e1cb9b0	nir/opt_uniform_subgroup: optimize min/max/and/or reduce of bcsel(div, con, con) Foz-DB Navi48: Totals from 1 (0.00% of 97397) affected shaders: Instrs: 1848 -> 1834 (-0.76%) CodeSize: 9996 -> 9908 (-0.88%) VGPRs: 96 -> 72 (-25.00%) Latency: 17371 -> 17358 (-0.07%) Copies: 190 -> 191 (+0.53%) PreVGPRs: 43 -> 41 (-4.65%) VALU: 657 -> 648 (-1.37%) Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38974>	2025-12-19 20:23:23 +00:00
Georg Lehmann	4d8cc7d82e	nir/divergence: add nir_def_is_divergent_at_use_block helper For cases where the block we are interested in is not the immediate block of the nir_src. Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38974>	2025-12-19 20:23:23 +00:00
Lionel Landwerlin	252e55a1bb	nir/printf-helpers: set writes_memory at printf emission Those helpers can be called late (since it's mostly for debug purposes). This can avoid surprises in the backend and also avoids rerunning gather_info. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38995>	2025-12-19 10:31:08 +00:00
Emma Anholt	5a09abe890	nir: Introduce nir_lower_vars_to_scratch_global(). This lets the driver make a more informed decision about which vars to lower to scratch based on the vars available to spill. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37245>	2025-12-17 19:50:28 +00:00
Emma Anholt	059d301c79	nir: Drop the mode argument of nir_lower_vars_to_scratch(). It only makes sense for function temps, and that's the only way it's been used. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37245>	2025-12-17 19:50:28 +00:00
Ian Romanick	66fd4d72fd	nir/algebraic: Mask with shifted constant instead of shift-then-mask shader-db: All Intel platforms had similar results. (Lunar Lake shown) total instructions in shared programs: 17088766 -> 17088765 (<.01%) instructions in affected programs: 1375 -> 1374 (-0.07%) helped: 1 / HURT: 1 total cycles in shared programs: 887873068 -> 887871748 (<.01%) cycles in affected programs: 136402 -> 135082 (-0.97%) helped: 2 / HURT: 0 fossil-db: Lunar Lake Totals: Instrs: 924954240 -> 924939317 (-0.00%); split: -0.00%, +0.00% Subgroup size: 40937696 -> 40937728 (+0.00%) Cycle count: 106116946509 -> 106116637903 (-0.00%); split: -0.00%, +0.00% Spill count: 3423930 -> 3423250 (-0.02%); split: -0.02%, +0.00% Fill count: 4876960 -> 4876045 (-0.02%); split: -0.03%, +0.01% Max live registers: 193882457 -> 193881816 (-0.00%); split: -0.00%, +0.00% Max dispatch width: 49078640 -> 49078656 (+0.00%) Non SSA regs after NIR: 231314214 -> 231314219 (+0.00%); split: -0.00%, +0.00% Totals from 13809 (0.68% of 2019450) affected shaders: Instrs: 25433084 -> 25418161 (-0.06%); split: -0.08%, +0.02% Subgroup size: 32 -> 64 (+100.00%) Cycle count: 1483550606 -> 1483242000 (-0.02%); split: -0.27%, +0.25% Spill count: 41466 -> 40786 (-1.64%); split: -1.88%, +0.24% Fill count: 74195 -> 73280 (-1.23%); split: -2.12%, +0.88% Max live registers: 2326365 -> 2325724 (-0.03%); split: -0.05%, +0.02% Max dispatch width: 234848 -> 234864 (+0.01%) Non SSA regs after NIR: 3394104 -> 3394109 (+0.00%); split: -0.00%, +0.00% Meteor Lake and DG2 had similar results. (Meteor Lake shown) Totals: Instrs: 997527742 -> 997524495 (-0.00%); split: -0.00%, +0.00% Subgroup size: 27452928 -> 27452944 (+0.00%) Cycle count: 93646717070 -> 93649738060 (+0.00%); split: -0.00%, +0.01% Spill count: 3710125 -> 3709784 (-0.01%); split: -0.03%, +0.02% Fill count: 5032819 -> 5033191 (+0.01%); split: -0.04%, +0.05% Max live registers: 121648838 -> 121648528 (-0.00%); split: -0.00%, +0.00% Max dispatch width: 37811544 -> 37811584 (+0.00%) Non SSA regs after NIR: 255562054 -> 255565914 (+0.00%); split: -0.00%, +0.00% Totals from 14438 (0.63% of 2281134) affected shaders: Instrs: 25974222 -> 25970975 (-0.01%); split: -0.08%, +0.06% Subgroup size: 16 -> 32 (+100.00%) Cycle count: 1149710820 -> 1152731810 (+0.26%); split: -0.29%, +0.55% Spill count: 44445 -> 44104 (-0.77%); split: -2.23%, +1.46% Fill count: 76172 -> 76544 (+0.49%); split: -2.89%, +3.37% Max live registers: 1237997 -> 1237687 (-0.03%); split: -0.04%, +0.02% Max dispatch width: 123528 -> 123568 (+0.03%) Non SSA regs after NIR: 3490757 -> 3494617 (+0.11%); split: -0.03%, +0.14% Tiger Lake, Ice Lake, and Skylake had similar results. (Tiger Lake shown) Totals: Instrs: 1013364485 -> 1013342384 (-0.00%); split: -0.00%, +0.00% Cycle count: 85509342602 -> 85500105656 (-0.01%); split: -0.02%, +0.01% Spill count: 3903944 -> 3903350 (-0.02%); split: -0.02%, +0.01% Fill count: 6801948 -> 6799368 (-0.04%); split: -0.05%, +0.01% Max live registers: 122212165 -> 122211859 (-0.00%); split: -0.00%, +0.00% Max dispatch width: 37805336 -> 37805472 (+0.00%) Non SSA regs after NIR: 244624956 -> 244628603 (+0.00%); split: -0.00%, +0.00% Totals from 14835 (0.65% of 2278397) affected shaders: Instrs: 27522570 -> 27500469 (-0.08%); split: -0.10%, +0.02% Cycle count: 1128820972 -> 1119584026 (-0.82%); split: -1.53%, +0.71% Spill count: 46408 -> 45814 (-1.28%); split: -2.04%, +0.76% Fill count: 99071 -> 96491 (-2.60%); split: -3.14%, +0.54% Max live registers: 1287967 -> 1287661 (-0.02%); split: -0.04%, +0.02% Max dispatch width: 126600 -> 126736 (+0.11%) Non SSA regs after NIR: 3438628 -> 3442275 (+0.11%); split: -0.03%, +0.14% Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38979>	2025-12-17 18:38:55 +00:00
Alyssa Rosenzweig	079e9ae606	treewide: use BITSET_*_COUNT Mix of Coccinelle patch, manual fix ups, sed, etc. Probably best to review the diff as-if hand written: Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Mel Henning <mhenning@darkrefraction.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38955>	2025-12-16 17:42:10 +00:00
Caio Oliveira	a4e84c9244	nir/gcm: Consider dead code elimination done by GCM as progress This will also fix NIR_DEBUG=extended_validation complaining about invalid loop analysis. GCM will invalidate loop analysis if progress was made, and depending on the removed instruction it will affect the instr_cost. Cc: mesa-stable Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38932>	2025-12-16 16:19:21 +00:00
Kenneth Graunke	88d46605bd	nir: Support Intel URB intrinsics in nir_opt_offsets We don't bother with maximums or wrapping because it shouldn't come up for IO intrinsics anyway. fossil-db results on Battlemage: Instrs: 231363032 -> 231359554 (-0.00%) Cycle count: 34057005552.0 -> 34057236190.0 (+0.00%); split: -0.00%, +0.00% Max live registers: 71873886 -> 71870438 (-0.00%) Non SSA regs after NIR: 67159408 -> 67159523 (+0.00%) Totals from 1779 (0.23% of 788851) affected shaders: Instrs: 774359 -> 770881 (-0.45%) Cycle count: 10551280.0 -> 10781918.0 (+2.19%); split: -0.32%, +2.51% Max live registers: 158193 -> 154745 (-2.18%) Non SSA regs after NIR: 180104 -> 180219 (+0.06%) Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>	2025-12-16 00:58:46 +00:00
Kenneth Graunke	97857d3224	nir: Fix mod analysis of ishl to shift the recursive result When considering ((x << y) % divisor), we recursed to calculate mod = (x % (divisor << y)) but incorrectly returned mod directly, rather than the correct value, (mod << y). (Note that we require divisor to be a power-of-two.) As an example of this going wrong, (x << 1) % 4 was returning (x % 2) which is 0 or 1, but x << 1 is 2x, which is always an even number so the result mod 4 can only be 0 or 2. Unit test suggested by Caio Oliveira during review. Fixes: `2255375c4d` ("nir: add nir_mod_analysis & its tests") Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>	2025-12-16 00:58:37 +00:00
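
The fix in 97857d3224 can be checked with plain integer arithmetic. The sketch below is not the `nir_mod_analysis` code; `shl_mod` is a hypothetical helper. It takes the recursive divisor to be `divisor >> y`, which is what the worked example in the message implies ((x << 1) % 4 recursing to x % 2); that reading is an assumption.

```python
def shl_mod(x, y, divisor):
    """Compute (x << y) % divisor from a modulus of the unshifted operand.

    Assumes divisor is a power of two and at least (1 << y).
    """
    assert divisor & (divisor - 1) == 0 and divisor >= (1 << y)
    mod = x % (divisor >> y)  # recursive result on x alone
    return mod << y           # the fix: shift the recursive result back up


# (x << 1) % 4 can only be 0 or 2, never 1.
print(sorted({shl_mod(x, 1, 4) for x in range(16)}))
```

Returning `mod` without the final shift would give 0 or 1 for this case, which is the bug the message describes.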
Marek Olšák	d17d1f53bd	nir/opt_cse: update potential future plans merging copy propagation with CSE This matches my current understanding of nir_opt_copy_prop, including that nir_opt_copy_prop always replaces movs with vecN. Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38916>	2025-12-13 06:41:59 +00:00
Marek Olšák	9ac8e643d6	nir/lower_io: explain properly how nir_lower_io_lower_64bit_to_32* options work Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38916>	2025-12-13 06:41:59 +00:00
Marek Olšák	41d127b9e8	nir/lower_io: remove unused option nir_lower_io_lower_64bit_float_to_32 Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38916>	2025-12-13 06:41:59 +00:00
Marek Olšák	09b2325877	nir/print: print tex->sampler_dim Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38916>	2025-12-13 06:41:58 +00:00
Marek Olšák	4d976a5787	nir: fix the value of nir_io_use_frag_result_dual_src_blend Due to rebasing not recognizing it as a conflict, it ended up having the same value as nir_io_assign_color_input_bases_after_all_other_inputs. Fixes: `9a2f1be814` - nir: add FRAG_RESULT_DUAL_SRC_BLEND and an option to use it Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38916>	2025-12-13 06:41:58 +00:00
Iván Briano	a7280ab590	nir: add nir_lower_single_sampled::lower_sample_mask_in option GLSL defines gl_SampleMaskIn as : "a fragment language that indicates the set of samples covered by the primitive generating the fragment during multisample rasterization" when variable rate shading is enabled, a single invocation might cover multiple samples. The lowering done in nir_lower_single_sampled() does not account for that case, so add an option to selectively disable it. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38641>	2025-12-11 22:50:10 +00:00
Iván Briano	ef31f07077	nir: clear SAMPLE_MASK_IN if we lowered it Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38641>	2025-12-11 22:50:10 +00:00
Konstantin Seurer	034f58c7e3	nir: Ignore ray query ranges that don't start with rq_initialize Handles is a rare edge case where the ray query is used "before" there is a rq_initialize. cc: mesa-stable Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38850>	2025-12-11 15:56:29 +00:00
Konstantin Seurer	5e03d09eb5	nir: Fix typo in nir_opt_ray_query_ranges Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38850>	2025-12-11 15:56:28 +00:00
Emma Anholt	1a2d0d3f31	nir: Optimistically unroll loops using induction var as a sample id. On the assumption that nobody will use a sample id greater than the sample count, have loop unrolling guess based on the driver's max sample count. This unrolls a simple resolve shader with a uniform max samples on ir3 to: value = vec4(0); if (max_samples > 0) { value += txf_ms(coord, 0); if (max_samples > 1 { value += txf_ms(coord, 1); if (max_samples > 2){ value += txf_ms(coord, 2); if (max_samples > 3) { value += txf_ms(coord, 3); for (i = 4; i < max_samples; i++) value += txf_ms(coord, i); } } } } ... This is only worth a 1% win on our microbenchmark as-is, but if we could flatten those ifs out and pull the fadds out to the end, avoiding syncs per load would be a big win. This seems like a first step. I've taken a shot at updating drivers to set the value, and tried to leave notes in places that drivers might update, and want to follow up with updating the compiler option. This affects over half the DX11 apps in shader-db-private. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38585>	2025-12-11 14:26:11 +00:00
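
The unrolled shape quoted in 1a2d0d3f31 is easier to read with its nesting restored. The Python sketch below mirrors it and compares it with the plain loop. `txf_ms` is passed in as a hypothetical fetch callback, and the peel count of 4 simply follows the example in the message.

```python
def resolve_loop(txf_ms, coord, max_samples):
    """Reference form: accumulate every sample in a single loop."""
    value = 0
    for i in range(max_samples):
        value += txf_ms(coord, i)
    return value


def resolve_unrolled(txf_ms, coord, max_samples):
    """Shape after optimistic unrolling: four peeled iterations, then the remainder loop."""
    value = 0
    if max_samples > 0:
        value += txf_ms(coord, 0)
        if max_samples > 1:
            value += txf_ms(coord, 1)
            if max_samples > 2:
                value += txf_ms(coord, 2)
                if max_samples > 3:
                    value += txf_ms(coord, 3)
                    for i in range(4, max_samples):
                        value += txf_ms(coord, i)
    return value


def fake_fetch(coord, sample):
    """Stand-in texel fetch that returns a distinct integer per sample."""
    return coord * 10 + sample


print(resolve_loop(fake_fetch, 1, 6), resolve_unrolled(fake_fetch, 1, 6))
```

Both forms visit the samples in the same order, so they agree for any sample count; the remainder loop keeps the unrolled form correct when the guessed bound is exceeded.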
Emma Anholt	10ba7675c8	nir/uub: Use an optional max_samples from drivers for sample counts. This triggers some unrolling in Fallout 4, GTAV, and Rocky Planet in my shader-db. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38585>	2025-12-11 14:26:11 +00:00
Emma Anholt	dc30e1a128	nir/loop_analyze: Use nir_unsigned_upper_bound for loop trip limits. This triggers some unrolling in Monster Hunter World, Total War: Warhammer, and Planet Zoo. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38585>	2025-12-11 14:26:10 +00:00
Mel Henning	2fab8fc297	nir: Use instr_clone in rematerialize_deref_in_block The previous implementation seems to predate nir_instr_clone() and duplicates a lot of the deref cloning code. This also makes the pass preserve deref->arr.in_bounds correctly. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38856>	2025-12-10 22:07:45 +00:00
Mel Henning	dc44c0f32b	treewide: Use nir_deref_instr_is_arr() Via coccinelle and some manual fixups. @@ expression e1; @@ - e1->deref_type == nir_deref_type_array || e1->deref_type == nir_deref_type_ptr_as_array + nir_deref_instr_is_arr(e1) Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38856>	2025-12-10 22:07:45 +00:00
Mel Henning	263a82f49b	nir: Add nir_deref_instr_is_arr() helper Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38856>	2025-12-10 22:07:44 +00:00
Marek Olšák	9a2f1be814	nir: add FRAG_RESULT_DUAL_SRC_BLEND and an option to use it This is potentially nicer for some drivers. AMD drivers will use it. mesa_frag_result_get_color_index will be used often. Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38604>	2025-12-10 19:16:46 +00:00
Georg Lehmann	621465e417	nir/opt_uniform_subgroup: handle more trivial shuffles/votes Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38867>	2025-12-10 13:32:08 +00:00
Georg Lehmann	e648e551c1	nir/opt_uniform_subgroup: wire up mbcnt_amd path Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38867>	2025-12-10 13:32:08 +00:00
Georg Lehmann	5778436e99	nir/opt_uniform_subgroup: use nir_shader_intrinsics_pass Nothing here needs the recursion of the full lower_instructions pass. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38867>	2025-12-10 13:32:08 +00:00
Georg Lehmann	5f28bb72a7	nir/divergence_analysis: fix swizzle_amd without fetch inactive Fixes: `ad5be40303` ("nir: add fetch inactive index to quad_swizzle_amd/masked_swizzle_amd") Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38867>	2025-12-10 13:32:08 +00:00
Georg Lehmann	1fc38d8539	nir/opt_uniform_subgroup: fix swizzle_amd without fetch_inactive Fixes: `ad5be40303` ("nir: add fetch inactive index to quad_swizzle_amd/masked_swizzle_amd") Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38867>	2025-12-10 13:32:08 +00:00
Georg Lehmann	e11d7f06d0	nir/opt_uniform_subgroup: don't try to optimize non trivial clustered reduce Fixes: `535caaf3e0` ("nir: Optimize uniform iadd, fadd, and ixor reduction operations") Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38867>	2025-12-10 13:32:08 +00:00
Marek Olšák	0c400fbed9	nir: give nir_lower_clip_cull_distance_array_vars a better name also rename the file Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38465>	2025-12-10 05:16:34 +00:00
Marek Olšák	74995eb64d	nir: split gathering array sizes from nir_lower_clip_cull_distance_array_vars nir_lower_clip_cull_distance_array_vars was sneakily updating shader_info::clip/cull_distance_array_size. This moves the gathering into a new function nir_gather_clip_cull_distance_sizes_from_vars. v2: remove assertions that prevented nir_lower_clip_cull_distance_array_vars from being used with non-compact arrays Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> (v1) Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38465>	2025-12-10 05:16:34 +00:00
Marek Olšák	bdcb7bc674	nir/gather_info: clear clip/cull_distance_array_size if the IO is not present Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38465>	2025-12-10 05:16:33 +00:00
Alyssa Rosenzweig	5ced623fdf	nir: print nir_tex_instr::backend_flags if present I was wondering where this was disappearing to. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38789>	2025-12-09 20:44:15 +00:00
Arcady Goldmints-Orlov	68bb5d9e49	kk: enable shaderClipDistance Since Metal doesn't pass clip distance into the fragment shader, we have to do it ourselves. The CLIP_DIST0/1 varying slots are used to represent the user-defined varyings we use to pass them from vertex to fragment and a new intrinsic is added to represent the write to the built-in clip_distance variable. Since the CLIP_DIST0/1 varying slots are not affected by opt_varyings, there can be potential interface mismatches so the machinery in msl_iomap.c is refactored to allow them to be output as a series of scalars rather than vectors. Reviewed-by: Aitor Camacho <aitor@lunarg.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38839>	2025-12-08 23:09:53 -05:00
Connor Abbott	ad84ae2719	tu: Implement VK_QCOM_subpass_shader_resolve Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38451>	2025-12-08 20:44:46 +00:00
Connor Abbott	bd821b9a17	nir, tu: Add and use load_frag_coord_gmem_ir3 We used load_frag_coord_unscaled_ir3 for loading the fragment coord for input attachments in GMEM, where the normal scaling for gl_FragCoord shouldn't be used. However with custom resolve a different scaling will apply to attachments in GMEM. Separate "unscaled" from "gmem" and rename the NIR options, in preparation for this. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38451>	2025-12-08 20:44:45 +00:00


7304 commits