fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-20 02:38:07 +02:00

Author	SHA1	Message	Date
Daniel Schürmann	9039e24751	nir/lower_flrp: ad-hoc constant-fold ALU instructions Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37195>	2025-10-30 19:28:07 +00:00
Daniel Schürmann	f61cd64af8	nir/builder: add option to immediately constant-fold ALU instructions upon insertion Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37195>	2025-10-30 19:28:07 +00:00
Daniel Schürmann	870616af34	nir/constant_folding: switch to nir_shader_lower_instructions() Small differences due to implicit DCE. Totals from 76 (0.10% of 79839) affected shaders: (Navi48) Instrs: 168051 -> 168044 (-0.00%); split: -0.01%, +0.01% CodeSize: 893284 -> 893256 (-0.00%); split: -0.01%, +0.01% Latency: 1082007 -> 1082027 (+0.00%); split: -0.00%, +0.00% InvThroughput: 155100 -> 155105 (+0.00%) Copies: 9649 -> 9654 (+0.05%) VALU: 92504 -> 92509 (+0.01%) Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37195>	2025-10-30 19:28:07 +00:00
Daniel Schürmann	d1f2f1222e	nir: guard nir_def_as_alu() We will potentially create load_const_instr instead of ALU. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37195>	2025-10-30 19:28:06 +00:00
Daniel Schürmann	3180656bbc	nir: don't use nir_build_alu() with incomplete sources Ideally we'd have a version that takes nir_scalar arguments. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37195>	2025-10-30 19:28:06 +00:00
Daniel Schürmann	ef9ecc4058	nir: add nir_imul_nuw() and nir_imul_imm_nuw() helpers Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37195>	2025-10-30 19:28:06 +00:00
Alyssa Rosenzweig	b82044c31b	nir/lower_two_sided_color: cleanup while in the area. no functional change Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Mel Henning <mhenning@darkrefraction.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38124>	2025-10-29 15:52:27 +00:00
Job Noorman	32b646c597	nir: print in_bounds info for deref_type(_ptr_as)_array Some checks are pending macOS-CI / macOS-CI (dri) (push) Waiting to run Details macOS-CI / macOS-CI (xlib) (push) Waiting to run Details Signed-off-by: Job Noorman <jnoorman@igalia.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38110>	2025-10-28 14:21:01 +00:00
Natalie Vock	50e65dac79	nir/lower_shader_calls: Repair SSA after wrap_instrs Some checks are pending macOS-CI / macOS-CI (dri) (push) Waiting to run Details macOS-CI / macOS-CI (xlib) (push) Waiting to run Details Wrapping jump instructions that are located inside ifs can break SSA invariants because the else block no longer dominates the merge block. Repair the SSA to make the validator happy again. Cc: mesa-stable Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37957>	2025-10-26 11:38:51 +00:00
Alyssa Rosenzweig	b824ef83ab	util/dynarray: infer type in append Most of the time, we can infer the type to append in util_dynarray_append using __typeof__, which is standardized in C23 and support in Jesse's MSMSVCV. This patch drops the type argument most of the time, making util_dynarray a little more ergonomic to use. This is done in four steps. First, rename util_dynarray_append -> util_dynarray_append_typed bash -c "find . -type f -exec sed -i -e 's/util_dynarray_append(/util_dynarray_append_typed(/g' \{} \;" Then, add a new append that infers the type. This is much more ergonomic for what you want most of the time. Next, use type-inferred append as much as possible, via Coccinelle patch (plus manual fixup): @@ expression dynarray, element; type type; @@ -util_dynarray_append_typed(dynarray, type, element); +util_dynarray_append(dynarray, element); Finally, hand fixup cases that Coccinelle missed or incorrectly translated, of which there were several because we can't used the untyped append with a literal (since the sizeof won't do what you want). All four steps are squashed to produce a single patch changing every util_dynarray_append call site in tree to either drop a type parameter (if possible) or insert a _typed suffix (if we can't infer). As such, the final patch is best reviewed by hand even though it was tool-assisted. No Long Linguine Meals were involved in the making of this patch. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38038>	2025-10-24 18:32:07 +00:00
Ian Romanick	f1bbc3d4e4	nir/algebraic: Don't generate integer min or max that will need to be lowered Some checks are pending macOS-CI / macOS-CI (dri) (push) Waiting to run Details macOS-CI / macOS-CI (xlib) (push) Waiting to run Details In !35844, there was some discussion about allowing 64-bit bcsel that would be lowered in the driver. One challenge there would be if a 64-bit bcsel was transformed into integer min or max by an algebraic optimization. I believe these were the only algebraic patterns that could create new integer min or max that would not be immediately constant folded. There were no shader-db or fossil-db changes on any Intel platform. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38033>	2025-10-23 22:35:27 +00:00
Rhys Perry	92beca9aa5	nir/lower_tex: optimize txd(coord, ddx/ddy(coord)) fossil-db (gfx1201): Totals from 73 (0.09% of 79839) affected shaders: MaxWaves: 1668 -> 1670 (+0.12%) Instrs: 352537 -> 347991 (-1.29%); split: -1.29%, +0.00% CodeSize: 1924140 -> 1887660 (-1.90%); split: -1.90%, +0.00% VGPRs: 6360 -> 6324 (-0.57%) Latency: 3891330 -> 3888192 (-0.08%); split: -0.10%, +0.02% InvThroughput: 789998 -> 783583 (-0.81%); split: -0.84%, +0.03% VClause: 6409 -> 6408 (-0.02%); split: -0.06%, +0.05% SClause: 4071 -> 4102 (+0.76%); split: -0.10%, +0.86% Copies: 16756 -> 16316 (-2.63%); split: -2.94%, +0.32% PreVGPRs: 5456 -> 5432 (-0.44%); split: -0.57%, +0.13% VALU: 232982 -> 228117 (-2.09%) SALU: 32853 -> 32848 (-0.02%); split: -0.05%, +0.03% VMEM: 9234 -> 9237 (+0.03%) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37561>	2025-10-23 11:21:59 +00:00
Rhys Perry	8e7ea4a882	nir/lower_shader_calls: reobtain impl after NIR_PASS Some checks are pending macOS-CI / macOS-CI (dri) (push) Waiting to run Details macOS-CI / macOS-CI (xlib) (push) Waiting to run Details Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Marek Olšák <maraeo@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37573>	2025-10-23 10:44:38 +00:00
Lionel Landwerlin	aa929ea706	nir/lower_io: add missing levels intrinsics to get_io_index_src_number Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `c7ac46a1d8` ("nir/lower_io: add get_io_index_src_number support for image intrinsics") Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38012>	2025-10-22 21:21:47 +00:00
Simon Perretta	ff51e6dc9e	nir: commonize barycentric intrinsic opt pass Some checks are pending macOS-CI / macOS-CI (dri) (push) Waiting to run Details macOS-CI / macOS-CI (xlib) (push) Waiting to run Details Introduces an opt pass that attempts to optimize load_barycentric_at_{sample,offset} with simpler load_barycentric_* equivalents where possible, and optionally lowers load_barycentric_at_sample to load_barycentric_at_offset with a position derived from the sample ID instead. Signed-off-by: Simon Perretta <simon.perretta@imgtec.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37658>	2025-10-22 16:48:01 +00:00
Olivia Lee	a410d90fd2	panfrost: fix cl_local_size for precompiled shaders Some checks are pending macOS-CI / macOS-CI (dri) (push) Waiting to run Details macOS-CI / macOS-CI (xlib) (push) Waiting to run Details nir_lower_compute_system_values will attempt to lower load_workgroup_size unless workgroup_size_variable is set. For precomp shaders, the workgroup size is set statically for each entrypoint by nir_precompiled_build_variant. Because we call lower_compute_system_values early, it sets the workgroup size to zero. Temporarily setting workgroup_size_variable while we are still processing all the entrypoints together inhibits this. Signed-off-by: Olivia Lee <olivia.lee@collabora.com> Fixes: `20970bcd96` ("panfrost: Add base of OpenCL C infrastructure") Reviewed-by: Eric R. Smith <eric.smith@collabora.com> Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37799>	2025-10-22 00:15:49 +00:00
Rhys Perry	64ec757688	nir/lower_mem_access_bit_sizes: increase chunk limit Not sure about creating u64vec16 loads, but creating unaligned loads is possible with opt_if_rewrite_uniform_uses. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37953>	2025-10-21 22:10:34 +00:00
Georg Lehmann	cf4ab485ea	nir: remove manual nir_load_global_constant Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37959>	2025-10-21 12:39:53 +02:00
Georg Lehmann	654bd74c60	treewide: use nir_store_global alias of nir_build_store_global Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37959>	2025-10-21 12:37:58 +02:00
Georg Lehmann	2306cba65b	nir: remove manual nir_store_global Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37959>	2025-10-21 12:37:58 +02:00
Georg Lehmann	9e41a7c139	treewide: use nir_load_global alias of nir_build_load_global Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37959>	2025-10-21 12:37:58 +02:00
Georg Lehmann	77540cac8c	nir: remove manual nir_load_global Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37959>	2025-10-21 12:37:58 +02:00
Lionel Landwerlin	255d1e883d	nir/divergence: fix handling of intel uniform block load Those are normally uniform always, but for the purpose of fused threads handling, we need to check their sources. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `ca1533cd03` ("nir/divergence: add a new mode to cover fused threads on Intel HW") Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37929>	2025-10-21 06:13:10 +00:00
Emma Anholt	0781edc30f	nir/copy_prop_vars: Mask out no-op writes to variables. Some checks are pending macOS-CI / macOS-CI (dri) (push) Waiting to run Details macOS-CI / macOS-CI (xlib) (push) Waiting to run Details The pass previously supported removing complete no-op writes, but we can do better by noticing if any channel being written is the current channel's value and masking off those writes. I noticed this happening in Stray's fragment shader, where it looked like some translation layer had turned var[x].zw = vec2(a, b) into var[x] = vec4(var[x].x, var[x].y, a, b). This in turn lets nir_shrink_vec_array_vars be more effective. Totals: MaxWaves: 22158876 -> 22156696 (-0.01%); split: +0.00%, -0.01% Instrs: 401167243 -> 401007996 (-0.04%); split: -0.04%, +0.00% CodeSize: 1004397302 -> 1004133728 (-0.03%); split: -0.03%, +0.00% STPs: 369810 -> 234618 (-36.56%) LDPs: 209430 -> 172011 (-17.87%) Totals from 1884 (0.12% of 1560230) affected shaders: MaxWaves: 12686 -> 10506 (-17.18%); split: +6.97%, -24.15% Instrs: 2099486 -> 1940239 (-7.59%); split: -7.64%, +0.06% CodeSize: 4570472 -> 4306898 (-5.77%); split: -5.81%, +0.05% NOPs: 334399 -> 270881 (-18.99%); split: -20.58%, +1.58% MOVs: 131003 -> 148034 (+13.00%); split: -11.59%, +24.59% COVs: 14512 -> 16921 (+16.60%); split: -0.23%, +16.83% Full: 58120 -> 72399 (+24.57%); split: -6.75%, +31.31% (ss): 79215 -> 45331 (-42.77%); split: -48.46%, +5.68% (sy): 33081 -> 11119 (-66.39%); split: -66.56%, +0.18% (ss)-stall: 302152 -> 115528 (-61.76%); split: -64.34%, +2.57% (sy)-stall: 2706110 -> 498998 (-81.56%); split: -81.68%, +0.12% STPs: 212045 -> 76853 (-63.76%) LDPs: 47337 -> 9918 (-79.05%) Preamble Instrs: 413954 -> 413630 (-0.08%); split: -0.21%, +0.13% Cat0: 370362 -> 306844 (-17.15%); split: -18.58%, +1.43% Cat1: 145629 -> 165003 (+13.30%); split: -10.51%, +23.81% Cat2: 687947 -> 683992 (-0.57%); split: -0.61%, +0.04% Cat3: 362919 -> 360690 (-0.61%); split: -0.72%, +0.11% Cat6: 461411 -> 352375 (-23.63%) Cat7: 16857 -> 16974 (+0.69%); split: -0.35%, +1.04% Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37313>	2025-10-20 19:24:45 +00:00
Emma Anholt	537cc4e0ff	nir/shrink_stores: Don't shrink stores to an invalid num_components. Avoids a regression in the CL CTS on the next commit. Fixes: `2dba7e6056` ("nir: split nir_opt_shrink_stores from nir_opt_shrink_vectors") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37313>	2025-10-20 19:24:45 +00:00
Emma Anholt	d8690f9c60	nir/link_opt_varyings: Make it participate in NIR_DEBUG=print. It's a pass with major effects on shaders, and it's otherwise weird to see your varying disappear between two passes that shouldn't affect them. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37313>	2025-10-20 19:24:45 +00:00
Aitor Camacho	f711c3afed	nir: Add KosmicKrisp required utilities Reviewed-by: Alyssa Anne Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37520>	2025-10-20 16:22:00 +00:00
Job Noorman	ad421cdf2e	nir: mark fneg distribution through fadd/ffma as nsz Some checks are pending macOS-CI / macOS-CI (dri) (push) Waiting to run Details macOS-CI / macOS-CI (xlib) (push) Waiting to run Details `df1876f615` ("nir: Mark negative re-distribution on fadd as imprecise") fixed the fadd case by marking it as imprecise. This commit fixes the ffma case for the same reason. However, "imprecise" isn't necessary and nowadays we have "nsz" which is more accurate here. Use that for both fadd and ffma. Signed-off-by: Job Noorman <jnoorman@igalia.com> Fixes: `62795475e8` ("nir/algebraic: Distribute source modifiers into instructions") Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37930>	2025-10-17 08:58:59 +00:00
Mary Guillemard	6f73533094	asahi,nir: Stop relying on zero and scratch page in GS/TESS code Introduce new NIR intrinsics to handle getting a "sink" read-only address and another intrinsic to handle conversion of address to read-write (allowing implementation to replace the "sink" read-only with another address like required for Asahi) Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37914>	2025-10-16 19:25:35 +00:00
Mary Guillemard	1e0c18d6cf	nir: Rename stat_query_address_agx to stat_query_address_poly This is used by the geometry lowering that we are going to move to common code. Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37914>	2025-10-16 19:25:35 +00:00
Alyssa Rosenzweig	84d8e6824b	treewide: don't check before free Some checks are pending macOS-CI / macOS-CI (dri) (push) Waiting to run Details macOS-CI / macOS-CI (xlib) (push) Waiting to run Details This was something that came up in the slop MR. Not sure it's actually a good idea or not but kind of curious what people think, given we have a sound tool (Coccinelle) to do the transform. Saves a redundant branch but means extra noninlined function calls.. likely no actual perf impact but saves some code. Via Coccinelle patches: @@ expression ptr; @@ -if (ptr) { -free(ptr); -} +free(ptr); @@ expression ptr; @@ -if (ptr) { -FREE(ptr); -} +FREE(ptr); @@ expression ptr; @@ -if (ptr) { -ralloc_free(ptr); -} +ralloc_free(ptr); @@ expression ptr; @@ -if (ptr != NULL) { -free(ptr); -} - +free(ptr); @@ expression ptr; @@ -if (ptr != NULL) { -FREE(ptr); -} - +FREE(ptr); @@ expression ptr; @@ -if (ptr != NULL) { -ralloc_free(ptr); -} - +ralloc_free(ptr); Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> [v3d] Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org> [venus] Reviewed-by: Frank Binns <frank.binns@imgtec.com> [powervr] Reviewed-by: Janne Grunau <j@jannau.net> [asahi] Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> [radv] Reviewed-by: Job Noorman <jnoorman@igalia.com> [ir3] Acked-by: Marek Olšák <maraeo@gmail.com> Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com> Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Acked-by: Job Noorman <jnoorman@igalia.com> Acked-by: Yonggang Luo <luoyonggang@gmail.com> Acked-by: Christian Gmeiner <cgmeiner@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37892>	2025-10-15 23:01:33 +00:00
Dave Airlie	543c9be87a	nir/coopmat: fix non square load/store lowering for flexible dimensions This shouldn't affect radv, but we should do the calculations correctly for when non-square matters. Reviewed-by: Karol Herbst <kherbst@redhat.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37879>	2025-10-16 07:19:28 +10:00
Daniel Schürmann	fad10b91a6	nir/divergence: don't assume that load_sample_positions_amd is always uniform Sample positions aren't uniform when the sample id is divergent. This was a regression when we started lowering fragment shader barycentrics in NIR. Fixes: `7f444fc72c` ("nir: add nir_intrinsic_load_sample_positions_amd") Cc: mesa-stable Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37843>	2025-10-14 16:33:10 +00:00
Job Noorman	0b82b803d9	nir,ir3: rename umul_low to umul_16x16 This is more in line with similar opcodes like umul_32x16. Also change its const expr: the masking based on bit size was unnecessary as it is only defined for 32 bits. Use simple casts instead. Signed-off-by: Job Noorman <jnoorman@igalia.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37863>	2025-10-14 12:54:54 +00:00
Georg Lehmann	92d670021a	nir/opt_intrinsics: optimize atomics to atomic load/store Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37822>	2025-10-14 06:24:17 +00:00
Georg Lehmann	e98b218ddd	nir/opt_intrinsics: don't pass nir options around Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37822>	2025-10-14 06:24:17 +00:00
Georg Lehmann	142079759b	nir/opt_uniform_atomics: optimize xchg with uniform address and data Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37822>	2025-10-14 06:24:16 +00:00
Georg Lehmann	e08911dff4	nir: fix nir_get_io_offset_src for global_atomic_swap_amd Fixes: `354df09c88` ("nir: add global_amd to nir_get_io_offset_src/nir_get_io_index_src") Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37821>	2025-10-13 09:41:41 +00:00
Ian Romanick	1e691e68e2	nir/algebraic: Optimize bfi with odd-valued mask to bitfield_select shader-db: Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown) total instructions in shared programs: 17181254 -> 17181046 (<.01%) instructions in affected programs: 35834 -> 35626 (-0.58%) helped: 130 / HURT: 2 total cycles in shared programs: 888543370 -> 888554248 (<.01%) cycles in affected programs: 7443984 -> 7454862 (0.15%) helped: 95 / HURT: 87 fossil-db: Lunar Lake Totals: Instrs: 233260196 -> 233259474 (-0.00%); split: -0.00%, +0.00% Cycle count: 32754567116 -> 32754515890 (-0.00%); split: -0.00%, +0.00% Max live registers: 71738442 -> 71738398 (-0.00%); split: -0.00%, +0.00% Totals from 6842 (0.87% of 790721) affected shaders: Instrs: 5566926 -> 5566204 (-0.01%); split: -0.01%, +0.00% Cycle count: 512487046 -> 512435820 (-0.01%); split: -0.20%, +0.19% Max live registers: 1100656 -> 1100612 (-0.00%); split: -0.00%, +0.00% Meteor Lake and DG2 had similar results. (Meteor Lake shown) Totals: Instrs: 264071212 -> 264066944 (-0.00%); split: -0.00%, +0.00% Cycle count: 26552458051 -> 26553286277 (+0.00%); split: -0.00%, +0.01% Spill count: 530380 -> 530084 (-0.06%) Fill count: 613416 -> 612900 (-0.08%) Scratch Memory Size: 20089856 -> 20075520 (-0.07%) Max live registers: 46558852 -> 46558811 (-0.00%); split: -0.00%, +0.00% Max dispatch width: 8034616 -> 8034584 (-0.00%) Totals from 6653 (0.73% of 905545) affected shaders: Instrs: 5750844 -> 5746576 (-0.07%); split: -0.08%, +0.00% Cycle count: 416414845 -> 417243071 (+0.20%); split: -0.20%, +0.40% Spill count: 1953 -> 1657 (-15.16%) Fill count: 3556 -> 3040 (-14.51%) Scratch Memory Size: 92160 -> 77824 (-15.56%) Max live registers: 566003 -> 565962 (-0.01%); split: -0.01%, +0.00% Max dispatch width: 55768 -> 55736 (-0.06%) No shader-db or fossil-db changes on any previous Intel platforms. Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>	2025-10-10 17:25:11 +00:00
Ian Romanick	f7939f2fdc	nir/range_analysis: Handle bfi and bitfield_select in get_alu_uub I noticed some things related to this while implementing support for bitfield_select / BFN in BRW. shader-db: Lunar Lake total instructions in shared programs: 17183140 -> 17183128 (<.01%) instructions in affected programs: 3830 -> 3818 (-0.31%) helped: 6 / HURT: 0 total cycles in shared programs: 889936934 -> 889936056 (<.01%) cycles in affected programs: 253758 -> 252880 (-0.35%) helped: 4 / HURT: 2 No shader-db changes on any other Intel platform. fossil-db: Lunar Lake Totals: Instrs: 233285343 -> 233284796 (-0.00%); split: -0.00%, +0.00% Cycle count: 32756777978 -> 32756399804 (-0.00%); split: -0.00%, +0.00% Max live registers: 71738646 -> 71738626 (-0.00%) Non SSA regs after NIR: 67837900 -> 67837902 (+0.00%) Totals from 177 (0.02% of 790723) affected shaders: Instrs: 389849 -> 389302 (-0.14%); split: -0.14%, +0.00% Cycle count: 356341872 -> 355963698 (-0.11%); split: -0.11%, +0.01% Max live registers: 39364 -> 39344 (-0.05%) Non SSA regs after NIR: 70453 -> 70455 (+0.00%) Meteor Lake, DG2, and Ice Lake had similar results. (Meteor Lake shown) Totals: Instrs: 264095611 -> 264095358 (-0.00%) Cycle count: 26555705299 -> 26554303407 (-0.01%); split: -0.01%, +0.00% Fill count: 613233 -> 613231 (-0.00%) Totals from 123 (0.01% of 905547) affected shaders: Instrs: 334830 -> 334577 (-0.08%) Cycle count: 326531667 -> 325129775 (-0.43%); split: -0.65%, +0.22% Fill count: 4145 -> 4143 (-0.05%) Tiger Lake and Skylake had similar results. (Tiger Lake shown) Totals: Instrs: 269733849 -> 269733590 (-0.00%) Cycle count: 25240548036 -> 25241435039 (+0.00%); split: -0.00%, +0.01% Totals from 123 (0.01% of 903812) affected shaders: Instrs: 338617 -> 338358 (-0.08%) Cycle count: 326605644 -> 327492647 (+0.27%); split: -0.13%, +0.40% Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>	2025-10-10 17:25:08 +00:00
Ian Romanick	aa53735b66	nir/algebraic: Prefer bfi over bitfield_select for bitfield_insert Intel platforms will soon implement both bfi and bitfield_select. bfi is more efficient for bitfield_insert. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>	2025-10-10 17:25:08 +00:00
Ian Romanick	08ec408061	nir/algebraic: Optimize f2u of negative value to zero The eliminated SENDs are from a single app that has a bunch of fragment shaders with a sequence like: con 32 %495 = fmul! %203.i, %1 (0.000000) con 32 %496 = ffma! %203.j, %1 (0.000000), %495 con 32 %497 = ffma! %203.k, %1 (0.000000), %496 con 32 %498 = ffma! %203.l, %1 (0.000000), %497 con 32 %499 = @load_reloc_const_intel (param_idx=1, base=0) con 32 %500 = @load_reloc_const_intel (param_idx=0, base=0) con 32 %501 = f2u32 %498 con 32 %502 = umin %501, %172 (0x4) con 32 %503 = ishl %502, %172 (0x4) con 32 %504 = load_const (0x00000040 = 64) con 32 %505 = umin %503, %504 (0x40) con 32 %506 = iadd %500, %505 The `f2u` is replaced with 0, and that makes the `ffma` dot-product sequence be unused. Since it is unused, most of the preceeding block gets eliminated. A lot of instructions after the `f2u` are also eliminated by other algebraic optimizations. Most importantly, %203 is the result of a `load_ubo_uniform_block_intel` that is eliminated. No shader-db changes on any Intel platform. fossil-db: All Intel platforms had similar results. (Lunar Lake shown) Totals: Instrs: 919895603 -> 919804051 (-0.01%); split: -0.01%, +0.00% Send messages: 40892036 -> 40887569 (-0.01%) Cycle count: 99176770712 -> 99174971806 (-0.00%); split: -0.00%, +0.00% Max live registers: 190030365 -> 190030367 (+0.00%) Max dispatch width: 47415040 -> 47415024 (-0.00%) Non SSA regs after NIR: 228872538 -> 228863608 (-0.00%); split: -0.00%, +0.00% Totals from 2234 (0.11% of 1955134) affected shaders: Instrs: 1989743 -> 1898191 (-4.60%); split: -4.60%, +0.00% Send messages: 44179 -> 39712 (-10.11%) Cycle count: 25416114 -> 23617208 (-7.08%); split: -7.08%, +0.00% Max live registers: 367357 -> 367359 (+0.00%) Max dispatch width: 39184 -> 39168 (-0.04%) Non SSA regs after NIR: 471173 -> 462243 (-1.90%); split: -1.90%, +0.00% Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>	2025-10-10 17:25:08 +00:00
Ian Romanick	5667459ff1	nir/algebraic: Don't introduce undefined behavior in f2u conversion If the source -1.0 < x < 0.0, simply removing the ftrun will introduce undefined behavior. By chance of how at least Intel and NVIDIA GPUs implement f2u, this has Just Worked. No shader-db changes on any Intel platform. fossil-db: Lunar Lake Totals: Instrs: 913264354 -> 913264366 (+0.00%) Cycle count: 104953995530 -> 104953996854 (+0.00%) Max live registers: 189266026 -> 189266058 (+0.00%) Non SSA regs after NIR: 227779417 -> 227779369 (-0.00%) Totals from 24 (0.00% of 1984794) affected shaders: Instrs: 4669 -> 4681 (+0.26%) Cycle count: 50610 -> 51934 (+2.62%) Max live registers: 1222 -> 1254 (+2.62%) Non SSA regs after NIR: 1174 -> 1126 (-4.09%) Meteor Lake, DG2, Tiger Lake, and Ice Lake had similar results. (Meteor Lake shown) Totals: Instrs: 1001288026 -> 1001288038 (+0.00%) Cycle count: 92813392671 -> 92813392791 (+0.00%) Max live registers: 121935383 -> 121935399 (+0.00%) Max dispatch width: 19949928 -> 19949912 (-0.00%) Totals from 2 (0.00% of 2284670) affected shaders: Instrs: 1380 -> 1392 (+0.87%) Cycle count: 18940 -> 19060 (+0.63%) Max live registers: 136 -> 152 (+11.76%) Max dispatch width: 32 -> 16 (-50.00%) No fossil-db changes on Skylake. Suggested-by: Georg Lehmann Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>	2025-10-10 17:25:07 +00:00
Ian Romanick	4338f7d033	nir/algebraic: Remove useless ftrunc inside f2i/f2u Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>	2025-10-10 17:25:07 +00:00
Ian Romanick	c49d6e0480	nir/algebraic: Elide range clamping of f2u sources There are no shader-db changes on ELK platforms because those platforms don't support 8- or 16-bit integer types. v2: Restrict patterns generated such that the integer limits are exactly representable in the specified floating point format. With the exception of the value 0, this requires that float_sz > int_sz. This had no impact on shader-db or fossil-db on any Intel platform. Noticed by Georg. v3: Add a missing is_a_number. shader-db: All Intel platforms had similar results. (Lunar Lake shown) total cycles in shared programs: 889936056 -> 889934082 (<.01%) cycles in affected programs: 65806 -> 63832 (-3.00%) helped: 2 / HURT: 0 fossil-db: Lunar Lake Totals: Instrs: 233284796 -> 233282917 (-0.00%); split: -0.00%, +0.00% Cycle count: 32756399804 -> 32754972188 (-0.00%); split: -0.01%, +0.00% Spill count: 519861 -> 519813 (-0.01%) Fill count: 663650 -> 663626 (-0.00%); split: -0.01%, +0.01% Max live registers: 71738626 -> 71738696 (+0.00%) Non SSA regs after NIR: 67837902 -> 67837648 (-0.00%) Totals from 1236 (0.16% of 790723) affected shaders: Instrs: 2134504 -> 2132625 (-0.09%); split: -0.09%, +0.01% Cycle count: 604922278 -> 603494662 (-0.24%); split: -0.48%, +0.25% Spill count: 16509 -> 16461 (-0.29%) Fill count: 32760 -> 32736 (-0.07%); split: -0.22%, +0.15% Max live registers: 250112 -> 250182 (+0.03%) Non SSA regs after NIR: 302368 -> 302114 (-0.08%) Meteor Lake, DG2, and Tiger Lake had similar results. (Meteor Lake shown) Totals: Instrs: 264095370 -> 264094056 (-0.00%); split: -0.00%, +0.00% Cycle count: 26554146277 -> 26553027268 (-0.00%); split: -0.01%, +0.01% Spill count: 530603 -> 530615 (+0.00%) Fill count: 613231 -> 613273 (+0.01%) Max live registers: 46559041 -> 46559087 (+0.00%) Totals from 1237 (0.14% of 905547) affected shaders: Instrs: 2262517 -> 2261203 (-0.06%); split: -0.07%, +0.01% Cycle count: 518219799 -> 517100790 (-0.22%); split: -0.59%, +0.37% Spill count: 17518 -> 17530 (+0.07%) Fill count: 32273 -> 32315 (+0.13%) Max live registers: 128360 -> 128406 (+0.04%) Ice Lake and Skylake had similar results. (Ice Lake shown) Totals: Instrs: 269849640 -> 269848198 (-0.00%); split: -0.00%, +0.00% Cycle count: 26718329643 -> 26718289020 (-0.00%); split: -0.00%, +0.00% Max live registers: 46878430 -> 46878462 (+0.00%) Totals from 1233 (0.14% of 905427) affected shaders: Instrs: 2324225 -> 2322783 (-0.06%); split: -0.06%, +0.00% Cycle count: 531467501 -> 531426878 (-0.01%); split: -0.11%, +0.10% Max live registers: 130782 -> 130814 (+0.02%) Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>	2025-10-10 17:25:07 +00:00
Ian Romanick	986086c846	nir: Add saturating float to integer conversion opcodes v2: Add a comment around has_f2[ui]_sat explaining which opcodes it enables. Suggested by Georg. Cast u_uintN_max and friends to double in nir_opcodes.py. This ensures that an exact conversion is made. Eliminate duplicate conversions from half float to double. Both noticed by Georg. v3: Apply "NaN should be zero" fix suggested by Georg. Co-authored-by: Georg Lehmann <dadschoorse@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>	2025-10-10 17:25:05 +00:00
Lionel Landwerlin	301b71a19f	compiler: add an access flag for intel EU fusion Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Alyssa Anne Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37394>	2025-10-10 11:19:39 +00:00
Lionel Landwerlin	c7ac46a1d8	nir/lower_io: add get_io_index_src_number support for image intrinsics Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Alyssa Anne Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37394>	2025-10-10 11:19:39 +00:00
Lionel Landwerlin	ca1533cd03	nir/divergence: add a new mode to cover fused threads on Intel HW The Intel Gfx12.x generation of GPU has an architecture feature called EU fusion in which 2 subgroups run lock step. A typical case where this happens is a compute shader with 1x1x1 local workgroup size and a dispatch command of 2x1x1. In that case 2 threads will be run in lock step for each of the workgroup. This has been the sources of some troubles in the backend because one subgroup can run with all lanes disabled, requiring care for SEND messages using the NoMask flag (execution regardless of the lane mask). We found out that other things are happening when 2 subgroups run together : - the HW will use the surface/sampler handle from only one subgroup - the HW will use the sampler header from only one subgroup So one of the fused subgroup can access the wrong surface/sampler if the value is different between the 2 subgroups and that can happen even with subgroup uniform values. Fortunately we can flag SEND instructions to disable the fusion behavior (most likely at a performance cost). This change introduce a new divergence mode that tries to compute things divergent between subgroups so that we can flag instructions accordingly. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37394>	2025-10-10 11:19:39 +00:00
Simon Perretta	79923115e7	nir/unlower_io_to_vars: keep io bases intact when keeping intrinsics Some checks are pending macOS-CI / macOS-CI (dri) (push) Waiting to run Details macOS-CI / macOS-CI (xlib) (push) Waiting to run Details nir_recompute_io_bases will modify i/o intrinsics, which is not the expected behaviour when the keep_intrinsics flag is set. Fixes: `83aecc8f3f` ("mesa/st, nir: commonize unlower_io_to_vars pass") Signed-off-by: Simon Perretta <simon.perretta@imgtec.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37725>	2025-10-10 11:53:24 +01:00

1 2 3 4 5 ...

6746 commits