fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-04-19 14:30:39 +02:00

Author	SHA1	Message	Date
Georg Lehmann	71f0c0d6a6	nir/opt_uniform_subgroup: optimize add/xor reduce of bcsel(div, con, con) Foz-DB Navi48: Totals from 12 (0.01% of 97623) affected shaders: Instrs: 9207 -> 8973 (-2.54%) CodeSize: 54192 -> 52832 (-2.51%) VGPRs: 768 -> 480 (-37.50%) Latency: 39516 -> 38507 (-2.55%) InvThroughput: 10155 -> 9859 (-2.91%) PreSGPRs: 329 -> 332 (+0.91%) PreVGPRs: 268 -> 263 (-1.87%) VALU: 4393 -> 4257 (-3.10%) SALU: 1037 -> 1019 (-1.74%) VOPD: 602 -> 599 (-0.50%) Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38974>	2025-12-19 20:23:23 +00:00
Georg Lehmann	0e5e1cb9b0	nir/opt_uniform_subgroup: optimize min/max/and/or reduce of bcsel(div, con, con) Foz-DB Navi48: Totals from 1 (0.00% of 97397) affected shaders: Instrs: 1848 -> 1834 (-0.76%) CodeSize: 9996 -> 9908 (-0.88%) VGPRs: 96 -> 72 (-25.00%) Latency: 17371 -> 17358 (-0.07%) Copies: 190 -> 191 (+0.53%) PreVGPRs: 43 -> 41 (-4.65%) VALU: 657 -> 648 (-1.37%) Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38974>	2025-12-19 20:23:23 +00:00
Georg Lehmann	4d8cc7d82e	nir/divergence: add nir_def_is_divergent_at_use_block helper For cases where the block we are interested in is not the immediate block of the nir_src. Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38974>	2025-12-19 20:23:23 +00:00
Lionel Landwerlin	252e55a1bb	nir/printf-helpers: set writes_memory at printf emission Those helpers can be called late (since it's mostly for debug purposes). This can avoid surprises in the backend and also avoids rerunning gather_info. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38995>	2025-12-19 10:31:08 +00:00
Emma Anholt	5a09abe890	nir: Introduce nir_lower_vars_to_scratch_global(). This lets the driver make a more informed decision about which vars to lower to scratch based on the vars available to spill. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37245>	2025-12-17 19:50:28 +00:00
Emma Anholt	059d301c79	nir: Drop the mode argument of nir_lower_vars_to_scratch(). It only makes sense for function temps, and that's the only way it's been used. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37245>	2025-12-17 19:50:28 +00:00
Ian Romanick	66fd4d72fd	nir/algebraic: Mask with shifted constant instead of shift-then-mask shader-db: All Intel platforms had similar results. (Lunar Lake shown) total instructions in shared programs: 17088766 -> 17088765 (<.01%) instructions in affected programs: 1375 -> 1374 (-0.07%) helped: 1 / HURT: 1 total cycles in shared programs: 887873068 -> 887871748 (<.01%) cycles in affected programs: 136402 -> 135082 (-0.97%) helped: 2 / HURT: 0 fossil-db: Lunar Lake Totals: Instrs: 924954240 -> 924939317 (-0.00%); split: -0.00%, +0.00% Subgroup size: 40937696 -> 40937728 (+0.00%) Cycle count: 106116946509 -> 106116637903 (-0.00%); split: -0.00%, +0.00% Spill count: 3423930 -> 3423250 (-0.02%); split: -0.02%, +0.00% Fill count: 4876960 -> 4876045 (-0.02%); split: -0.03%, +0.01% Max live registers: 193882457 -> 193881816 (-0.00%); split: -0.00%, +0.00% Max dispatch width: 49078640 -> 49078656 (+0.00%) Non SSA regs after NIR: 231314214 -> 231314219 (+0.00%); split: -0.00%, +0.00% Totals from 13809 (0.68% of 2019450) affected shaders: Instrs: 25433084 -> 25418161 (-0.06%); split: -0.08%, +0.02% Subgroup size: 32 -> 64 (+100.00%) Cycle count: 1483550606 -> 1483242000 (-0.02%); split: -0.27%, +0.25% Spill count: 41466 -> 40786 (-1.64%); split: -1.88%, +0.24% Fill count: 74195 -> 73280 (-1.23%); split: -2.12%, +0.88% Max live registers: 2326365 -> 2325724 (-0.03%); split: -0.05%, +0.02% Max dispatch width: 234848 -> 234864 (+0.01%) Non SSA regs after NIR: 3394104 -> 3394109 (+0.00%); split: -0.00%, +0.00% Meteor Lake and DG2 had similar results. 
(Meteor Lake shown) Totals: Instrs: 997527742 -> 997524495 (-0.00%); split: -0.00%, +0.00% Subgroup size: 27452928 -> 27452944 (+0.00%) Cycle count: 93646717070 -> 93649738060 (+0.00%); split: -0.00%, +0.01% Spill count: 3710125 -> 3709784 (-0.01%); split: -0.03%, +0.02% Fill count: 5032819 -> 5033191 (+0.01%); split: -0.04%, +0.05% Max live registers: 121648838 -> 121648528 (-0.00%); split: -0.00%, +0.00% Max dispatch width: 37811544 -> 37811584 (+0.00%) Non SSA regs after NIR: 255562054 -> 255565914 (+0.00%); split: -0.00%, +0.00% Totals from 14438 (0.63% of 2281134) affected shaders: Instrs: 25974222 -> 25970975 (-0.01%); split: -0.08%, +0.06% Subgroup size: 16 -> 32 (+100.00%) Cycle count: 1149710820 -> 1152731810 (+0.26%); split: -0.29%, +0.55% Spill count: 44445 -> 44104 (-0.77%); split: -2.23%, +1.46% Fill count: 76172 -> 76544 (+0.49%); split: -2.89%, +3.37% Max live registers: 1237997 -> 1237687 (-0.03%); split: -0.04%, +0.02% Max dispatch width: 123528 -> 123568 (+0.03%) Non SSA regs after NIR: 3490757 -> 3494617 (+0.11%); split: -0.03%, +0.14% Tiger Lake, Ice Lake, and Skylake had similar results. 
(Tiger Lake shown) Totals: Instrs: 1013364485 -> 1013342384 (-0.00%); split: -0.00%, +0.00% Cycle count: 85509342602 -> 85500105656 (-0.01%); split: -0.02%, +0.01% Spill count: 3903944 -> 3903350 (-0.02%); split: -0.02%, +0.01% Fill count: 6801948 -> 6799368 (-0.04%); split: -0.05%, +0.01% Max live registers: 122212165 -> 122211859 (-0.00%); split: -0.00%, +0.00% Max dispatch width: 37805336 -> 37805472 (+0.00%) Non SSA regs after NIR: 244624956 -> 244628603 (+0.00%); split: -0.00%, +0.00% Totals from 14835 (0.65% of 2278397) affected shaders: Instrs: 27522570 -> 27500469 (-0.08%); split: -0.10%, +0.02% Cycle count: 1128820972 -> 1119584026 (-0.82%); split: -1.53%, +0.71% Spill count: 46408 -> 45814 (-1.28%); split: -2.04%, +0.76% Fill count: 99071 -> 96491 (-2.60%); split: -3.14%, +0.54% Max live registers: 1287967 -> 1287661 (-0.02%); split: -0.04%, +0.02% Max dispatch width: 126600 -> 126736 (+0.11%) Non SSA regs after NIR: 3438628 -> 3442275 (+0.11%); split: -0.03%, +0.14% Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38979>	2025-12-17 18:38:55 +00:00
Alyssa Rosenzweig	079e9ae606	treewide: use BITSET_*_COUNT Mix of Coccinelle patch, manual fix ups, sed, etc. Probably best to review the diff as-if hand written: Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Mel Henning <mhenning@darkrefraction.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38955>	2025-12-16 17:42:10 +00:00
Caio Oliveira	a4e84c9244	nir/gcm: Consider dead code elimination done by GCM as progress This will also fix NIR_DEBUG=extended_validation complaining about invalid loop analysis. GCM will invalidate loop analysis if progress was made, and depending on the removed instruction it will affect the instr_cost. Cc: mesa-stable Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38932>	2025-12-16 16:19:21 +00:00
Kenneth Graunke	88d46605bd	nir: Support Intel URB intrinsics in nir_opt_offsets We don't bother with maximums or wrapping because it shouldn't come up for IO intrinsics anyway. fossil-db results on Battlemage: Instrs: 231363032 -> 231359554 (-0.00%) Cycle count: 34057005552.0 -> 34057236190.0 (+0.00%); split: -0.00%, +0.00% Max live registers: 71873886 -> 71870438 (-0.00%) Non SSA regs after NIR: 67159408 -> 67159523 (+0.00%) Totals from 1779 (0.23% of 788851) affected shaders: Instrs: 774359 -> 770881 (-0.45%) Cycle count: 10551280.0 -> 10781918.0 (+2.19%); split: -0.32%, +2.51% Max live registers: 158193 -> 154745 (-2.18%) Non SSA regs after NIR: 180104 -> 180219 (+0.06%) Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>	2025-12-16 00:58:46 +00:00
Kenneth Graunke	97857d3224	nir: Fix mod analysis of ishl to shift the recursive result When considering ((x << y) % divisor), we recursed to calculate mod = (x % (divisor >> y)) but incorrectly returned mod directly, rather than the correct value, (mod << y). (Note that we require divisor to be a power-of-two.) As an example of this going wrong, (x << 1) % 4 was returning (x % 2) which is 0 or 1, but x << 1 is 2x, which is always an even number so the result mod 4 can only be 0 or 2. Unit test suggested by Caio Oliveira during review. Fixes: `2255375c4d` ("nir: add nir_mod_analysis & its tests") Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>	2025-12-16 00:58:37 +00:00
Marek Olšák	d17d1f53bd	nir/opt_cse: update potential future plans merging copy propagation with CSE This matches my current understanding of nir_opt_copy_prop, including that nir_opt_copy_prop always replaces movs with vecN. Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38916>	2025-12-13 06:41:59 +00:00
Marek Olšák	9ac8e643d6	nir/lower_io: explain properly how nir_lower_io_lower_64bit_to_32* options work Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38916>	2025-12-13 06:41:59 +00:00
Marek Olšák	41d127b9e8	nir/lower_io: remove unused option nir_lower_io_lower_64bit_float_to_32 Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38916>	2025-12-13 06:41:59 +00:00
Marek Olšák	09b2325877	nir/print: print tex->sampler_dim Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38916>	2025-12-13 06:41:58 +00:00
Marek Olšák	4d976a5787	nir: fix the value of nir_io_use_frag_result_dual_src_blend Due to rebasing not recognizing it as a conflict, it ended up having the same value as nir_io_assign_color_input_bases_after_all_other_inputs. Fixes: `9a2f1be814` - nir: add FRAG_RESULT_DUAL_SRC_BLEND and an option to use it Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38916>	2025-12-13 06:41:58 +00:00
Iván Briano	a7280ab590	nir: add nir_lower_single_sampled::lower_sample_mask_in option GLSL defines gl_SampleMaskIn as: "a fragment language that indicates the set of samples covered by the primitive generating the fragment during multisample rasterization". When variable rate shading is enabled, a single invocation might cover multiple samples. The lowering done in nir_lower_single_sampled() does not account for that case, so add an option to selectively disable it. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38641>	2025-12-11 22:50:10 +00:00
Iván Briano	ef31f07077	nir: clear SAMPLE_MASK_IN if we lowered it Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38641>	2025-12-11 22:50:10 +00:00
Konstantin Seurer	034f58c7e3	nir: Ignore ray query ranges that don't start with rq_initialize Handles a rare edge case where the ray query is used "before" there is a rq_initialize. cc: mesa-stable Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38850>	2025-12-11 15:56:29 +00:00
Konstantin Seurer	5e03d09eb5	nir: Fix typo in nir_opt_ray_query_ranges Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38850>	2025-12-11 15:56:28 +00:00
Emma Anholt	1a2d0d3f31	nir: Optimistically unroll loops using induction var as a sample id. On the assumption that nobody will use a sample id greater than the sample count, have loop unrolling guess based on the driver's max sample count. This unrolls a simple resolve shader with a uniform max samples on ir3 to: value = vec4(0); if (max_samples > 0) { value += txf_ms(coord, 0); if (max_samples > 1) { value += txf_ms(coord, 1); if (max_samples > 2) { value += txf_ms(coord, 2); if (max_samples > 3) { value += txf_ms(coord, 3); for (i = 4; i < max_samples; i++) value += txf_ms(coord, i); } } } } ... This is only worth a 1% win on our microbenchmark as-is, but if we could flatten those ifs out and pull the fadds out to the end, avoiding syncs per load would be a big win. This seems like a first step. I've taken a shot at updating drivers to set the value, and tried to leave notes in places that drivers might update, and want to follow up with updating the compiler option. This affects over half the DX11 apps in shader-db-private. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38585>	2025-12-11 14:26:11 +00:00
Emma Anholt	10ba7675c8	nir/uub: Use an optional max_samples from drivers for sample counts. This triggers some unrolling in Fallout 4, GTAV, and Rocky Planet in my shader-db. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38585>	2025-12-11 14:26:11 +00:00
Emma Anholt	dc30e1a128	nir/loop_analyze: Use nir_unsigned_upper_bound for loop trip limits. This triggers some unrolling in Monster Hunter World, Total War: Warhammer, and Planet Zoo. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38585>	2025-12-11 14:26:10 +00:00
Mel Henning	2fab8fc297	nir: Use instr_clone in rematerialize_deref_in_block The previous implementation seems to predate nir_instr_clone() and duplicates a lot of the deref cloning code. This also makes the pass preserve deref->arr.in_bounds correctly. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38856>	2025-12-10 22:07:45 +00:00
Mel Henning	dc44c0f32b	treewide: Use nir_deref_instr_is_arr() Via coccinelle and some manual fixups. @@ expression e1; @@ - e1->deref_type == nir_deref_type_array || e1->deref_type == nir_deref_type_ptr_as_array + nir_deref_instr_is_arr(e1) Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38856>	2025-12-10 22:07:45 +00:00
Mel Henning	263a82f49b	nir: Add nir_deref_instr_is_arr() helper Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38856>	2025-12-10 22:07:44 +00:00
Marek Olšák	9a2f1be814	nir: add FRAG_RESULT_DUAL_SRC_BLEND and an option to use it This is potentially nicer for some drivers. AMD drivers will use it. mesa_frag_result_get_color_index will be used often. Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38604>	2025-12-10 19:16:46 +00:00
Georg Lehmann	621465e417	nir/opt_uniform_subgroup: handle more trivial shuffles/votes Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38867>	2025-12-10 13:32:08 +00:00
Georg Lehmann	e648e551c1	nir/opt_uniform_subgroup: wire up mbcnt_amd path Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38867>	2025-12-10 13:32:08 +00:00
Georg Lehmann	5778436e99	nir/opt_uniform_subgroup: use nir_shader_intrinsics_pass Nothing here needs the recursion of the full lower_instructions pass. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38867>	2025-12-10 13:32:08 +00:00
Georg Lehmann	5f28bb72a7	nir/divergence_analysis: fix swizzle_amd without fetch inactive Fixes: `ad5be40303` ("nir: add fetch inactive index to quad_swizzle_amd/masked_swizzle_amd") Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38867>	2025-12-10 13:32:08 +00:00
Georg Lehmann	1fc38d8539	nir/opt_uniform_subgroup: fix swizzle_amd without fetch_inactive Fixes: `ad5be40303` ("nir: add fetch inactive index to quad_swizzle_amd/masked_swizzle_amd") Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38867>	2025-12-10 13:32:08 +00:00
Georg Lehmann	e11d7f06d0	nir/opt_uniform_subgroup: don't try to optimize non trivial clustered reduce Fixes: `535caaf3e0` ("nir: Optimize uniform iadd, fadd, and ixor reduction operations") Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38867>	2025-12-10 13:32:08 +00:00
Marek Olšák	0c400fbed9	nir: give nir_lower_clip_cull_distance_array_vars a better name also rename the file Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38465>	2025-12-10 05:16:34 +00:00
Marek Olšák	74995eb64d	nir: split gathering array sizes from nir_lower_clip_cull_distance_array_vars nir_lower_clip_cull_distance_array_vars was sneakily updating shader_info::clip/cull_distance_array_size. This moves the gathering into a new function nir_gather_clip_cull_distance_sizes_from_vars. v2: remove assertions that prevented nir_lower_clip_cull_distance_array_vars from being used with non-compact arrays Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> (v1) Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38465>	2025-12-10 05:16:34 +00:00
Marek Olšák	bdcb7bc674	nir/gather_info: clear clip/cull_distance_array_size if the IO is not present Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38465>	2025-12-10 05:16:33 +00:00
Alyssa Rosenzweig	5ced623fdf	nir: print nir_tex_instr::backend_flags if present I was wondering where this was disappearing to. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38789>	2025-12-09 20:44:15 +00:00
Arcady Goldmints-Orlov	68bb5d9e49	kk: enable shaderClipDistance Since Metal doesn't pass clip distance into the fragment shader, we have to do it ourselves. The CLIP_DIST0/1 varying slots are used to represent the user-defined varyings we use to pass them from vertex to fragment and a new intrinsic is added to represent the write to the built-in clip_distance variable. Since the CLIP_DIST0/1 varying slots are not affected by opt_varyings, there can be potential interface mismatches so the machinery in msl_iomap.c is refactored to allow them to be output as a series of scalars rather than vectors. Reviewed-by: Aitor Camacho <aitor@lunarg.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38839>	2025-12-08 23:09:53 -05:00
Connor Abbott	ad84ae2719	tu: Implement VK_QCOM_subpass_shader_resolve Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38451>	2025-12-08 20:44:46 +00:00
Connor Abbott	bd821b9a17	nir, tu: Add and use load_frag_coord_gmem_ir3 We used load_frag_coord_unscaled_ir3 for loading the fragment coord for input attachments in GMEM, where the normal scaling for gl_FragCoord shouldn't be used. However with custom resolve a different scaling will apply to attachments in GMEM. Separate "unscaled" from "gmem" and rename the NIR options, in preparation for this. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38451>	2025-12-08 20:44:45 +00:00
Yiwei Zhang	2de8981351	nir: suppress clang warnings for cooperative matrix lowering This suppresses below compile warnings: - warning: variable 'idx' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized] Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38835>	2025-12-08 19:36:05 +00:00
Georg Lehmann	7f6bd8b003	nir/peephole_select: allow mbcnt_amd It's just alu, so handle it like alu. Foz-DB Navi21: Totals from 3 (0.00% of 97591) affected shaders: Instrs: 433 -> 426 (-1.62%) CodeSize: 2408 -> 2388 (-0.83%) Latency: 7520 -> 7925 (+5.39%) InvThroughput: 857 -> 1009 (+17.74%) Copies: 55 -> 43 (-21.82%) Branches: 21 -> 17 (-19.05%) SALU: 79 -> 76 (-3.80%) Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38828>	2025-12-08 10:52:48 +00:00
Georg Lehmann	005cc4110c	nir/peephole_select: allow ballot We can allow collapsing control flow around ballot if we update the ballot condition like we do for discards. ballot_relaxed needs no condition update, as the result bits are undefined for inactive invocations. Foz-DB Navi21: Totals from 27 (0.03% of 97591) affected shaders: Instrs: 2554506 -> 2554469 (-0.00%); split: -0.00%, +0.00% CodeSize: 13765636 -> 13765684 (+0.00%); split: -0.00%, +0.00% Latency: 14186667 -> 14186861 (+0.00%); split: -0.00%, +0.00% InvThroughput: 3542516 -> 3542595 (+0.00%); split: -0.00%, +0.00% SClause: 52038 -> 52030 (-0.02%) Copies: 209410 -> 208763 (-0.31%) Branches: 83716 -> 83399 (-0.38%) PreSGPRs: 2372 -> 2386 (+0.59%); split: -0.17%, +0.76% VALU: 1701458 -> 1701482 (+0.00%) SALU: 369884 -> 370107 (+0.06%); split: -0.00%, +0.07% SMEM: 67643 -> 67634 (-0.01%) Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38828>	2025-12-08 10:52:48 +00:00
Georg Lehmann	077b654cc7	nir: don't sink alu that uses ballot(true) Don't sink alu that uses ballot(true), as that can be a local system value and moving the alu then requires a new mov in the old location. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38829>	2025-12-08 09:07:54 +00:00
Marek Olšák	a051d4ee6b	nir/lower_io_vars: don't insert output stores for unrelated streams before emits Before every emit_vertex(stream_id = n), we would insert stores for all outputs, including outputs that are not meant for that stream. Those stores would end up having no effect while potentially reducing performance. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38100>	2025-12-06 02:27:46 +00:00
Arcady Goldmints-Orlov	0df8aa940c	nir: Use nir_shader_intrinsics_pass in nir_lower_io_to_scalar Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38816>	2025-12-05 22:30:22 +00:00
Emma Anholt	66b157095c	nir/shader_bisect: Allow passing in a --lo / --hi to continue a run. Sometimes you fumble an answer, and would like to not restart from the beginning (or just want to see the behavior of the script late in the run if you're debugging it). Pass in the last bad range, and you can keep going. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38760>	2025-12-04 22:47:25 +00:00
Emma Anholt	4287bb761e	nir/shader_bisect: Fix C code printing after review feedback changes. When I added in the printed-shader and env var value both being tracked in shaders[], it broke the C printing. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38760>	2025-12-04 22:47:25 +00:00
Karol Herbst	a255e2ca56	nir: add ACCESS to shared_uniform_block_intel intel_nir_blockify_uniform_loads simply overwrites the intrinsic for load_shared, which leads to messed up indices, e.g.: "base=0, access=volatile, align_mul=4, align_offset=0" became: "base=0, align_mul=4, align_offset=4" Fixes: `0dd09a292b` ("nir: add ACCESS_ATOMIC") Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38801>	2025-12-04 10:01:52 +00:00
Marek Olšák	e14f8ee0e4	nir/has_divergent_loop: require divergence metadata, check all function impls instead of forcing callers to call nir_divergence_analysis Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38597>	2025-12-03 20:14:18 +00:00
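
The arithmetic behind the ishl mod-analysis fix (97857d3224) can be checked numerically. The sketch below is plain Python, not NIR code: both function names are hypothetical, and it assumes the recursion uses the divisor reduced by the shift amount, which is what the (x << 1) % 4 example in the commit message implies. The divisor is assumed to be a power of two, as the commit requires.

```python
def shl_mod_unshifted(x, y, divisor):
    """Pre-fix behaviour as described in the message: the recursive
    result x % (divisor >> y) is returned without shifting it back."""
    reduced = divisor >> y
    if reduced == 0:
        return 0
    return x % reduced


def shl_mod_fixed(x, y, divisor):
    """Post-fix behaviour: shift the recursive result left by y."""
    reduced = divisor >> y
    if reduced == 0:
        # Every bit below the divisor is zero after the shift.
        return 0
    return (x % reduced) << y


def matches_reference(fn, max_x=64, max_y=4, divisors=(1, 2, 4, 8, 16)):
    """Compare fn against the direct computation (x << y) % divisor."""
    return all(
        fn(x, y, d) == (x << y) % d
        for x in range(max_x)
        for y in range(max_y)
        for d in divisors
    )


if __name__ == "__main__":
    print(shl_mod_unshifted(3, 1, 4), shl_mod_fixed(3, 1, 4), (3 << 1) % 4)
    print(matches_reference(shl_mod_fixed))
```

For x = 3, y = 1, divisor = 4 the unshifted variant gives 1, while the direct computation and the shifted variant both give 2, matching the "0 or 2" observation in the message.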


6894 commits
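
As a rough illustration of why the add/xor reduce of bcsel(div, con, con) in 71f0c0d6a6 can be cheap, the sketch below models a subgroup as a Python list of per-lane booleans. It is not taken from the commit: the function names are hypothetical, and it assumes "div" means a divergent condition and "con" a value that is the same on every lane. Under that assumption, the reduction only needs the number of lanes where the condition is true.

```python
from functools import reduce
from operator import xor


def reduce_add_per_lane(conds, a, b):
    """Reference: select a or b on each lane, then sum across lanes."""
    return sum(a if c else b for c in conds)


def reduce_add_counted(conds, a, b):
    """Same result from the count of true lanes alone."""
    true_lanes = sum(1 for c in conds if c)
    false_lanes = len(conds) - true_lanes
    return true_lanes * a + false_lanes * b


def reduce_xor_per_lane(conds, a, b):
    """Reference: select a or b on each lane, then xor across lanes."""
    return reduce(xor, (a if c else b for c in conds), 0)


def reduce_xor_counted(conds, a, b):
    """xor of k copies of v is v when k is odd and 0 when k is even."""
    true_lanes = sum(1 for c in conds if c)
    false_lanes = len(conds) - true_lanes
    return (a if true_lanes % 2 else 0) ^ (b if false_lanes % 2 else 0)


if __name__ == "__main__":
    lanes = [True, False, True, True]
    print(reduce_add_per_lane(lanes, 5, 2), reduce_add_counted(lanes, 5, 2))
    print(reduce_xor_per_lane(lanes, 5, 2), reduce_xor_counted(lanes, 5, 2))
```

In a real shader the count would presumably come from a ballot plus a bit count instead of a loop over lanes; that mapping is an assumption here, not something the log states.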