fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-01-31 07:00:27 +01:00

Author	SHA1	Message	Date
Lionel Landwerlin	aa929ea706	nir/lower_io: add missing levels intrinsics to get_io_index_src_number Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `c7ac46a1d8` ("nir/lower_io: add get_io_index_src_number support for image intrinsics") Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38012>	2025-10-22 21:21:47 +00:00
Simon Perretta	ff51e6dc9e	nir: commonize barycentric intrinsic opt pass Some checks are pending macOS-CI / macOS-CI (dri) (push) Waiting to run Details macOS-CI / macOS-CI (xlib) (push) Waiting to run Details Introduces an opt pass that attempts to optimize load_barycentric_at_{sample,offset} with simpler load_barycentric_* equivalents where possible, and optionally lowers load_barycentric_at_sample to load_barycentric_at_offset with a position derived from the sample ID instead. Signed-off-by: Simon Perretta <simon.perretta@imgtec.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37658>	2025-10-22 16:48:01 +00:00
Olivia Lee	a410d90fd2	panfrost: fix cl_local_size for precompiled shaders Some checks are pending macOS-CI / macOS-CI (dri) (push) Waiting to run Details macOS-CI / macOS-CI (xlib) (push) Waiting to run Details nir_lower_compute_system_values will attempt to lower load_workgroup_size unless workgroup_size_variable is set. For precomp shaders, the workgroup size is set statically for each entrypoint by nir_precompiled_build_variant. Because we call lower_compute_system_values early, it sets the workgroup size to zero. Temporarily setting workgroup_size_variable while we are still processing all the entrypoints together inhibits this. Signed-off-by: Olivia Lee <olivia.lee@collabora.com> Fixes: `20970bcd96` ("panfrost: Add base of OpenCL C infrastructure") Reviewed-by: Eric R. Smith <eric.smith@collabora.com> Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37799>	2025-10-22 00:15:49 +00:00
Rhys Perry	64ec757688	nir/lower_mem_access_bit_sizes: increase chunk limit Not sure about creating u64vec16 loads, but creating unaligned loads is possible with opt_if_rewrite_uniform_uses. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37953>	2025-10-21 22:10:34 +00:00
Georg Lehmann	cf4ab485ea	nir: remove manual nir_load_global_constant Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37959>	2025-10-21 12:39:53 +02:00
Georg Lehmann	654bd74c60	treewide: use nir_store_global alias of nir_build_store_global Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37959>	2025-10-21 12:37:58 +02:00
Georg Lehmann	2306cba65b	nir: remove manual nir_store_global Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37959>	2025-10-21 12:37:58 +02:00
Georg Lehmann	9e41a7c139	treewide: use nir_load_global alias of nir_build_load_global Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37959>	2025-10-21 12:37:58 +02:00
Georg Lehmann	77540cac8c	nir: remove manual nir_load_global Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37959>	2025-10-21 12:37:58 +02:00
Lionel Landwerlin	255d1e883d	nir/divergence: fix handling of intel uniform block load Those are normally uniform always, but for the purpose of fused threads handling, we need to check their sources. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `ca1533cd03` ("nir/divergence: add a new mode to cover fused threads on Intel HW") Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37929>	2025-10-21 06:13:10 +00:00
Emma Anholt	0781edc30f	nir/copy_prop_vars: Mask out no-op writes to variables. Some checks are pending macOS-CI / macOS-CI (dri) (push) Waiting to run Details macOS-CI / macOS-CI (xlib) (push) Waiting to run Details The pass previously supported removing complete no-op writes, but we can do better by noticing if any channel being written is the current channel's value and masking off those writes. I noticed this happening in Stray's fragment shader, where it looked like some translation layer had turned var[x].zw = vec2(a, b) into var[x] = vec4(var[x].x, var[x].y, a, b). This in turn lets nir_shrink_vec_array_vars be more effective. Totals: MaxWaves: 22158876 -> 22156696 (-0.01%); split: +0.00%, -0.01% Instrs: 401167243 -> 401007996 (-0.04%); split: -0.04%, +0.00% CodeSize: 1004397302 -> 1004133728 (-0.03%); split: -0.03%, +0.00% STPs: 369810 -> 234618 (-36.56%) LDPs: 209430 -> 172011 (-17.87%) Totals from 1884 (0.12% of 1560230) affected shaders: MaxWaves: 12686 -> 10506 (-17.18%); split: +6.97%, -24.15% Instrs: 2099486 -> 1940239 (-7.59%); split: -7.64%, +0.06% CodeSize: 4570472 -> 4306898 (-5.77%); split: -5.81%, +0.05% NOPs: 334399 -> 270881 (-18.99%); split: -20.58%, +1.58% MOVs: 131003 -> 148034 (+13.00%); split: -11.59%, +24.59% COVs: 14512 -> 16921 (+16.60%); split: -0.23%, +16.83% Full: 58120 -> 72399 (+24.57%); split: -6.75%, +31.31% (ss): 79215 -> 45331 (-42.77%); split: -48.46%, +5.68% (sy): 33081 -> 11119 (-66.39%); split: -66.56%, +0.18% (ss)-stall: 302152 -> 115528 (-61.76%); split: -64.34%, +2.57% (sy)-stall: 2706110 -> 498998 (-81.56%); split: -81.68%, +0.12% STPs: 212045 -> 76853 (-63.76%) LDPs: 47337 -> 9918 (-79.05%) Preamble Instrs: 413954 -> 413630 (-0.08%); split: -0.21%, +0.13% Cat0: 370362 -> 306844 (-17.15%); split: -18.58%, +1.43% Cat1: 145629 -> 165003 (+13.30%); split: -10.51%, +23.81% Cat2: 687947 -> 683992 (-0.57%); split: -0.61%, +0.04% Cat3: 362919 -> 360690 (-0.61%); split: -0.72%, +0.11% Cat6: 461411 -> 352375 (-23.63%) Cat7: 16857 -> 16974 (+0.69%); split: -0.35%, +1.04% Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37313>	2025-10-20 19:24:45 +00:00
Emma Anholt	537cc4e0ff	nir/shrink_stores: Don't shrink stores to an invalid num_components. Avoids a regression in the CL CTS on the next commit. Fixes: `2dba7e6056` ("nir: split nir_opt_shrink_stores from nir_opt_shrink_vectors") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37313>	2025-10-20 19:24:45 +00:00
Emma Anholt	d8690f9c60	nir/link_opt_varyings: Make it participate in NIR_DEBUG=print. It's a pass with major effects on shaders, and it's otherwise weird to see your varying disappear between two passes that shouldn't affect them. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37313>	2025-10-20 19:24:45 +00:00
Aitor Camacho	f711c3afed	nir: Add KosmicKrisp required utilities Reviewed-by: Alyssa Anne Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37520>	2025-10-20 16:22:00 +00:00
Job Noorman	ad421cdf2e	nir: mark fneg distribution through fadd/ffma as nsz Some checks are pending macOS-CI / macOS-CI (dri) (push) Waiting to run Details macOS-CI / macOS-CI (xlib) (push) Waiting to run Details `df1876f615` ("nir: Mark negative re-distribution on fadd as imprecise") fixed the fadd case by marking it as imprecise. This commit fixes the ffma case for the same reason. However, "imprecise" isn't necessary and nowadays we have "nsz" which is more accurate here. Use that for both fadd and ffma. Signed-off-by: Job Noorman <jnoorman@igalia.com> Fixes: `62795475e8` ("nir/algebraic: Distribute source modifiers into instructions") Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37930>	2025-10-17 08:58:59 +00:00
Mary Guillemard	6f73533094	asahi,nir: Stop relying on zero and scratch page in GS/TESS code Introduce new NIR intrinsics to handle getting a "sink" read-only address and another intrinsic to handle conversion of address to read-write (allowing implementation to replace the "sink" read-only with another address like required for Asahi) Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37914>	2025-10-16 19:25:35 +00:00
Mary Guillemard	1e0c18d6cf	nir: Rename stat_query_address_agx to stat_query_address_poly This is used by the geometry lowering that we are going to move to common code. Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37914>	2025-10-16 19:25:35 +00:00
Alyssa Rosenzweig	84d8e6824b	treewide: don't check before free Some checks are pending macOS-CI / macOS-CI (dri) (push) Waiting to run Details macOS-CI / macOS-CI (xlib) (push) Waiting to run Details This was something that came up in the slop MR. Not sure it's actually a good idea or not but kind of curious what people think, given we have a sound tool (Coccinelle) to do the transform. Saves a redundant branch but means extra noninlined function calls.. likely no actual perf impact but saves some code. Via Coccinelle patches: @@ expression ptr; @@ -if (ptr) { -free(ptr); -} +free(ptr); @@ expression ptr; @@ -if (ptr) { -FREE(ptr); -} +FREE(ptr); @@ expression ptr; @@ -if (ptr) { -ralloc_free(ptr); -} +ralloc_free(ptr); @@ expression ptr; @@ -if (ptr != NULL) { -free(ptr); -} - +free(ptr); @@ expression ptr; @@ -if (ptr != NULL) { -FREE(ptr); -} - +FREE(ptr); @@ expression ptr; @@ -if (ptr != NULL) { -ralloc_free(ptr); -} - +ralloc_free(ptr); Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> [v3d] Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org> [venus] Reviewed-by: Frank Binns <frank.binns@imgtec.com> [powervr] Reviewed-by: Janne Grunau <j@jannau.net> [asahi] Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> [radv] Reviewed-by: Job Noorman <jnoorman@igalia.com> [ir3] Acked-by: Marek Olšák <maraeo@gmail.com> Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com> Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Acked-by: Job Noorman <jnoorman@igalia.com> Acked-by: Yonggang Luo <luoyonggang@gmail.com> Acked-by: Christian Gmeiner <cgmeiner@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37892>	2025-10-15 23:01:33 +00:00
Dave Airlie	543c9be87a	nir/coopmat: fix non square load/store lowering for flexible dimensions This shouldn't affect radv, but we should do the calculations correctly for when non-square matters. Reviewed-by: Karol Herbst <kherbst@redhat.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37879>	2025-10-16 07:19:28 +10:00
Daniel Schürmann	fad10b91a6	nir/divergence: don't assume that load_sample_positions_amd is always uniform Sample positions aren't uniform when the sample id is divergent. This was a regression when we started lowering fragment shader barycentrics in NIR. Fixes: `7f444fc72c` ("nir: add nir_intrinsic_load_sample_positions_amd") Cc: mesa-stable Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37843>	2025-10-14 16:33:10 +00:00
Job Noorman	0b82b803d9	nir,ir3: rename umul_low to umul_16x16 This is more in line with similar opcodes like umul_32x16. Also change its const expr: the masking based on bit size was unnecessary as it is only defined for 32 bits. Use simple casts instead. Signed-off-by: Job Noorman <jnoorman@igalia.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37863>	2025-10-14 12:54:54 +00:00
Georg Lehmann	92d670021a	nir/opt_intrinsics: optimize atomics to atomic load/store Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37822>	2025-10-14 06:24:17 +00:00
Georg Lehmann	e98b218ddd	nir/opt_intrinsics: don't pass nir options around Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37822>	2025-10-14 06:24:17 +00:00
Georg Lehmann	142079759b	nir/opt_uniform_atomics: optimize xchg with uniform address and data Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37822>	2025-10-14 06:24:16 +00:00
Georg Lehmann	e08911dff4	nir: fix nir_get_io_offset_src for global_atomic_swap_amd Fixes: `354df09c88` ("nir: add global_amd to nir_get_io_offset_src/nir_get_io_index_src") Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37821>	2025-10-13 09:41:41 +00:00
Ian Romanick	1e691e68e2	nir/algebraic: Optimize bfi with odd-valued mask to bitfield_select shader-db: Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown) total instructions in shared programs: 17181254 -> 17181046 (<.01%) instructions in affected programs: 35834 -> 35626 (-0.58%) helped: 130 / HURT: 2 total cycles in shared programs: 888543370 -> 888554248 (<.01%) cycles in affected programs: 7443984 -> 7454862 (0.15%) helped: 95 / HURT: 87 fossil-db: Lunar Lake Totals: Instrs: 233260196 -> 233259474 (-0.00%); split: -0.00%, +0.00% Cycle count: 32754567116 -> 32754515890 (-0.00%); split: -0.00%, +0.00% Max live registers: 71738442 -> 71738398 (-0.00%); split: -0.00%, +0.00% Totals from 6842 (0.87% of 790721) affected shaders: Instrs: 5566926 -> 5566204 (-0.01%); split: -0.01%, +0.00% Cycle count: 512487046 -> 512435820 (-0.01%); split: -0.20%, +0.19% Max live registers: 1100656 -> 1100612 (-0.00%); split: -0.00%, +0.00% Meteor Lake and DG2 had similar results. (Meteor Lake shown) Totals: Instrs: 264071212 -> 264066944 (-0.00%); split: -0.00%, +0.00% Cycle count: 26552458051 -> 26553286277 (+0.00%); split: -0.00%, +0.01% Spill count: 530380 -> 530084 (-0.06%) Fill count: 613416 -> 612900 (-0.08%) Scratch Memory Size: 20089856 -> 20075520 (-0.07%) Max live registers: 46558852 -> 46558811 (-0.00%); split: -0.00%, +0.00% Max dispatch width: 8034616 -> 8034584 (-0.00%) Totals from 6653 (0.73% of 905545) affected shaders: Instrs: 5750844 -> 5746576 (-0.07%); split: -0.08%, +0.00% Cycle count: 416414845 -> 417243071 (+0.20%); split: -0.20%, +0.40% Spill count: 1953 -> 1657 (-15.16%) Fill count: 3556 -> 3040 (-14.51%) Scratch Memory Size: 92160 -> 77824 (-15.56%) Max live registers: 566003 -> 565962 (-0.01%); split: -0.01%, +0.00% Max dispatch width: 55768 -> 55736 (-0.06%) No shader-db or fossil-db changes on any previous Intel platforms. Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>	2025-10-10 17:25:11 +00:00
Ian Romanick	f7939f2fdc	nir/range_analysis: Handle bfi and bitfield_select in get_alu_uub I noticed some things related to this while implementing support for bitfield_select / BFN in BRW. shader-db: Lunar Lake total instructions in shared programs: 17183140 -> 17183128 (<.01%) instructions in affected programs: 3830 -> 3818 (-0.31%) helped: 6 / HURT: 0 total cycles in shared programs: 889936934 -> 889936056 (<.01%) cycles in affected programs: 253758 -> 252880 (-0.35%) helped: 4 / HURT: 2 No shader-db changes on any other Intel platform. fossil-db: Lunar Lake Totals: Instrs: 233285343 -> 233284796 (-0.00%); split: -0.00%, +0.00% Cycle count: 32756777978 -> 32756399804 (-0.00%); split: -0.00%, +0.00% Max live registers: 71738646 -> 71738626 (-0.00%) Non SSA regs after NIR: 67837900 -> 67837902 (+0.00%) Totals from 177 (0.02% of 790723) affected shaders: Instrs: 389849 -> 389302 (-0.14%); split: -0.14%, +0.00% Cycle count: 356341872 -> 355963698 (-0.11%); split: -0.11%, +0.01% Max live registers: 39364 -> 39344 (-0.05%) Non SSA regs after NIR: 70453 -> 70455 (+0.00%) Meteor Lake, DG2, and Ice Lake had similar results. (Meteor Lake shown) Totals: Instrs: 264095611 -> 264095358 (-0.00%) Cycle count: 26555705299 -> 26554303407 (-0.01%); split: -0.01%, +0.00% Fill count: 613233 -> 613231 (-0.00%) Totals from 123 (0.01% of 905547) affected shaders: Instrs: 334830 -> 334577 (-0.08%) Cycle count: 326531667 -> 325129775 (-0.43%); split: -0.65%, +0.22% Fill count: 4145 -> 4143 (-0.05%) Tiger Lake and Skylake had similar results. (Tiger Lake shown) Totals: Instrs: 269733849 -> 269733590 (-0.00%) Cycle count: 25240548036 -> 25241435039 (+0.00%); split: -0.00%, +0.01% Totals from 123 (0.01% of 903812) affected shaders: Instrs: 338617 -> 338358 (-0.08%) Cycle count: 326605644 -> 327492647 (+0.27%); split: -0.13%, +0.40% Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>	2025-10-10 17:25:08 +00:00
Ian Romanick	aa53735b66	nir/algebraic: Prefer bfi over bitfield_select for bitfield_insert Intel platforms will soon implement both bfi and bitfield_select. bfi is more efficient for bitfield_insert. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>	2025-10-10 17:25:08 +00:00
Ian Romanick	08ec408061	nir/algebraic: Optimize f2u of negative value to zero The eliminated SENDs are from a single app that has a bunch of fragment shaders with a sequence like: con 32 %495 = fmul! %203.i, %1 (0.000000) con 32 %496 = ffma! %203.j, %1 (0.000000), %495 con 32 %497 = ffma! %203.k, %1 (0.000000), %496 con 32 %498 = ffma! %203.l, %1 (0.000000), %497 con 32 %499 = @load_reloc_const_intel (param_idx=1, base=0) con 32 %500 = @load_reloc_const_intel (param_idx=0, base=0) con 32 %501 = f2u32 %498 con 32 %502 = umin %501, %172 (0x4) con 32 %503 = ishl %502, %172 (0x4) con 32 %504 = load_const (0x00000040 = 64) con 32 %505 = umin %503, %504 (0x40) con 32 %506 = iadd %500, %505 The `f2u` is replaced with 0, and that makes the `ffma` dot-product sequence be unused. Since it is unused, most of the preceeding block gets eliminated. A lot of instructions after the `f2u` are also eliminated by other algebraic optimizations. Most importantly, %203 is the result of a `load_ubo_uniform_block_intel` that is eliminated. No shader-db changes on any Intel platform. fossil-db: All Intel platforms had similar results. (Lunar Lake shown) Totals: Instrs: 919895603 -> 919804051 (-0.01%); split: -0.01%, +0.00% Send messages: 40892036 -> 40887569 (-0.01%) Cycle count: 99176770712 -> 99174971806 (-0.00%); split: -0.00%, +0.00% Max live registers: 190030365 -> 190030367 (+0.00%) Max dispatch width: 47415040 -> 47415024 (-0.00%) Non SSA regs after NIR: 228872538 -> 228863608 (-0.00%); split: -0.00%, +0.00% Totals from 2234 (0.11% of 1955134) affected shaders: Instrs: 1989743 -> 1898191 (-4.60%); split: -4.60%, +0.00% Send messages: 44179 -> 39712 (-10.11%) Cycle count: 25416114 -> 23617208 (-7.08%); split: -7.08%, +0.00% Max live registers: 367357 -> 367359 (+0.00%) Max dispatch width: 39184 -> 39168 (-0.04%) Non SSA regs after NIR: 471173 -> 462243 (-1.90%); split: -1.90%, +0.00% Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>	2025-10-10 17:25:08 +00:00
Ian Romanick	5667459ff1	nir/algebraic: Don't introduce undefined behavior in f2u conversion If the source -1.0 < x < 0.0, simply removing the ftrun will introduce undefined behavior. By chance of how at least Intel and NVIDIA GPUs implement f2u, this has Just Worked. No shader-db changes on any Intel platform. fossil-db: Lunar Lake Totals: Instrs: 913264354 -> 913264366 (+0.00%) Cycle count: 104953995530 -> 104953996854 (+0.00%) Max live registers: 189266026 -> 189266058 (+0.00%) Non SSA regs after NIR: 227779417 -> 227779369 (-0.00%) Totals from 24 (0.00% of 1984794) affected shaders: Instrs: 4669 -> 4681 (+0.26%) Cycle count: 50610 -> 51934 (+2.62%) Max live registers: 1222 -> 1254 (+2.62%) Non SSA regs after NIR: 1174 -> 1126 (-4.09%) Meteor Lake, DG2, Tiger Lake, and Ice Lake had similar results. (Meteor Lake shown) Totals: Instrs: 1001288026 -> 1001288038 (+0.00%) Cycle count: 92813392671 -> 92813392791 (+0.00%) Max live registers: 121935383 -> 121935399 (+0.00%) Max dispatch width: 19949928 -> 19949912 (-0.00%) Totals from 2 (0.00% of 2284670) affected shaders: Instrs: 1380 -> 1392 (+0.87%) Cycle count: 18940 -> 19060 (+0.63%) Max live registers: 136 -> 152 (+11.76%) Max dispatch width: 32 -> 16 (-50.00%) No fossil-db changes on Skylake. Suggested-by: Georg Lehmann Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>	2025-10-10 17:25:07 +00:00
Ian Romanick	4338f7d033	nir/algebraic: Remove useless ftrunc inside f2i/f2u Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>	2025-10-10 17:25:07 +00:00
Ian Romanick	c49d6e0480	nir/algebraic: Elide range clamping of f2u sources There are no shader-db changes on ELK platforms because those platforms don't support 8- or 16-bit integer types. v2: Restrict patterns generated such that the integer limits are exactly representable in the specified floating point format. With the exception of the value 0, this requires that float_sz > int_sz. This had no impact on shader-db or fossil-db on any Intel platform. Noticed by Georg. v3: Add a missing is_a_number. shader-db: All Intel platforms had similar results. (Lunar Lake shown) total cycles in shared programs: 889936056 -> 889934082 (<.01%) cycles in affected programs: 65806 -> 63832 (-3.00%) helped: 2 / HURT: 0 fossil-db: Lunar Lake Totals: Instrs: 233284796 -> 233282917 (-0.00%); split: -0.00%, +0.00% Cycle count: 32756399804 -> 32754972188 (-0.00%); split: -0.01%, +0.00% Spill count: 519861 -> 519813 (-0.01%) Fill count: 663650 -> 663626 (-0.00%); split: -0.01%, +0.01% Max live registers: 71738626 -> 71738696 (+0.00%) Non SSA regs after NIR: 67837902 -> 67837648 (-0.00%) Totals from 1236 (0.16% of 790723) affected shaders: Instrs: 2134504 -> 2132625 (-0.09%); split: -0.09%, +0.01% Cycle count: 604922278 -> 603494662 (-0.24%); split: -0.48%, +0.25% Spill count: 16509 -> 16461 (-0.29%) Fill count: 32760 -> 32736 (-0.07%); split: -0.22%, +0.15% Max live registers: 250112 -> 250182 (+0.03%) Non SSA regs after NIR: 302368 -> 302114 (-0.08%) Meteor Lake, DG2, and Tiger Lake had similar results. (Meteor Lake shown) Totals: Instrs: 264095370 -> 264094056 (-0.00%); split: -0.00%, +0.00% Cycle count: 26554146277 -> 26553027268 (-0.00%); split: -0.01%, +0.01% Spill count: 530603 -> 530615 (+0.00%) Fill count: 613231 -> 613273 (+0.01%) Max live registers: 46559041 -> 46559087 (+0.00%) Totals from 1237 (0.14% of 905547) affected shaders: Instrs: 2262517 -> 2261203 (-0.06%); split: -0.07%, +0.01% Cycle count: 518219799 -> 517100790 (-0.22%); split: -0.59%, +0.37% Spill count: 17518 -> 17530 (+0.07%) Fill count: 32273 -> 32315 (+0.13%) Max live registers: 128360 -> 128406 (+0.04%) Ice Lake and Skylake had similar results. (Ice Lake shown) Totals: Instrs: 269849640 -> 269848198 (-0.00%); split: -0.00%, +0.00% Cycle count: 26718329643 -> 26718289020 (-0.00%); split: -0.00%, +0.00% Max live registers: 46878430 -> 46878462 (+0.00%) Totals from 1233 (0.14% of 905427) affected shaders: Instrs: 2324225 -> 2322783 (-0.06%); split: -0.06%, +0.00% Cycle count: 531467501 -> 531426878 (-0.01%); split: -0.11%, +0.10% Max live registers: 130782 -> 130814 (+0.02%) Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>	2025-10-10 17:25:07 +00:00
Ian Romanick	986086c846	nir: Add saturating float to integer conversion opcodes v2: Add a comment around has_f2[ui]_sat explaining which opcodes it enables. Suggested by Georg. Cast u_uintN_max and friends to double in nir_opcodes.py. This ensures that an exact conversion is made. Eliminate duplicate conversions from half float to double. Both noticed by Georg. v3: Apply "NaN should be zero" fix suggested by Georg. Co-authored-by: Georg Lehmann <dadschoorse@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>	2025-10-10 17:25:05 +00:00
Lionel Landwerlin	301b71a19f	compiler: add an access flag for intel EU fusion Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Alyssa Anne Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37394>	2025-10-10 11:19:39 +00:00
Lionel Landwerlin	c7ac46a1d8	nir/lower_io: add get_io_index_src_number support for image intrinsics Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Alyssa Anne Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37394>	2025-10-10 11:19:39 +00:00
Lionel Landwerlin	ca1533cd03	nir/divergence: add a new mode to cover fused threads on Intel HW The Intel Gfx12.x generation of GPU has an architecture feature called EU fusion in which 2 subgroups run lock step. A typical case where this happens is a compute shader with 1x1x1 local workgroup size and a dispatch command of 2x1x1. In that case 2 threads will be run in lock step for each of the workgroup. This has been the sources of some troubles in the backend because one subgroup can run with all lanes disabled, requiring care for SEND messages using the NoMask flag (execution regardless of the lane mask). We found out that other things are happening when 2 subgroups run together : - the HW will use the surface/sampler handle from only one subgroup - the HW will use the sampler header from only one subgroup So one of the fused subgroup can access the wrong surface/sampler if the value is different between the 2 subgroups and that can happen even with subgroup uniform values. Fortunately we can flag SEND instructions to disable the fusion behavior (most likely at a performance cost). This change introduce a new divergence mode that tries to compute things divergent between subgroups so that we can flag instructions accordingly. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37394>	2025-10-10 11:19:39 +00:00
Simon Perretta	79923115e7	nir/unlower_io_to_vars: keep io bases intact when keeping intrinsics Some checks are pending macOS-CI / macOS-CI (dri) (push) Waiting to run Details macOS-CI / macOS-CI (xlib) (push) Waiting to run Details nir_recompute_io_bases will modify i/o intrinsics, which is not the expected behaviour when the keep_intrinsics flag is set. Fixes: `83aecc8f3f` ("mesa/st, nir: commonize unlower_io_to_vars pass") Signed-off-by: Simon Perretta <simon.perretta@imgtec.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37725>	2025-10-10 11:53:24 +01:00
Job Noorman	6d59a3e3e7	nir/lower_alu: use Knuth's Algorithm M for [iu]mul_high Some checks are pending macOS-CI / macOS-CI (dri) (push) Waiting to run Details macOS-CI / macOS-CI (xlib) (push) Waiting to run Details This significantly simplifies the handling of signed numbers as the same code path can handle signed and unsigned numbers by simply using ishr instead of ushr for some of the shifts. For both cases, the number of additions and shifts are also reduced. Note that LLVM uses the same algorithm. fossil-db stats for Turnip: Totals from 4849 (2.94% of 164705) affected shaders: MaxWaves: 52318 -> 52332 (+0.03%); split: +0.04%, -0.02% Instrs: 5262458 -> 5218922 (-0.83%); split: -0.87%, +0.05% CodeSize: 10831900 -> 10655170 (-1.63%); split: -1.64%, +0.01% NOPs: 829481 -> 836010 (+0.79%); split: -0.95%, +1.74% MOVs: 176187 -> 173788 (-1.36%); split: -3.27%, +1.91% COVs: 104096 -> 86543 (-16.86%); split: -16.87%, +0.01% Full: 90434 -> 90158 (-0.31%); split: -0.33%, +0.03% (ss): 131091 -> 130866 (-0.17%); split: -0.87%, +0.70% (sy): 55550 -> 55769 (+0.39%); split: -0.92%, +1.32% (ss)-stall: 406003 -> 407194 (+0.29%); split: -1.10%, +1.39% (sy)-stall: 1668213 -> 1678082 (+0.59%); split: -1.31%, +1.90% Preamble Instrs: 1105270 -> 1067290 (-3.44%); split: -3.50%, +0.06% Constlen: 423776 -> 423560 (-0.05%) Last helper: 1038202 -> 1035540 (-0.26%); split: -0.42%, +0.16% Last baryf: 38908 -> 38632 (-0.71%) Subgroup size: 336640 -> 336832 (+0.06%) Cat0: 916209 -> 922848 (+0.72%); split: -0.87%, +1.59% Cat1: 282813 -> 262845 (-7.06%); split: -7.49%, +0.43% Cat2: 2198715 -> 2183012 (-0.71%); split: -0.72%, +0.01% Cat3: 1390914 -> 1376421 (-1.04%) Cat7: 123127 -> 123116 (-0.01%); split: -0.24%, +0.23% Signed-off-by: Job Noorman <jnoorman@igalia.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37793>	2025-10-10 05:31:17 +00:00
Job Noorman	18f69890d1	nir: add nir_shr builder Sometimes we need to select between ishr/ushr based some condition; this builder makes this less verbose. Signed-off-by: Job Noorman <jnoorman@igalia.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37793>	2025-10-10 05:31:17 +00:00
Emma Anholt	d01aae2fb1	nir: Add a shader bisect tool. When you're trying to figure out what shader some NIR pass broke, use nir_shader_bisect_select() to decide between NIR pass behaviors, and then nir_shader_bisect.py will help you automatically bisect down to which source_blake3 is at fault. Once it's identified, it prints you a C call you can use for selecting that shader specifically, which you can use for continuing on in your debugging. On a test I was looking at, this took 10 steps to bisect 134 shaders down to the source_blake3 of the NIR shader in question. This idea is heavily lifted from Job Noorman's ir3_shader_bisect. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37468>	2025-10-09 17:56:30 +00:00
Alyssa Rosenzweig	c1d75c6e51	treewide: use BITSET_CALLOC Via Coccinelle patch: @@ expression count; type T; @@ -calloc(BITSET_WORDS(count), sizeof(T)) +BITSET_CALLOC(count) @@ expression count; type T; @@ -calloc(sizeof(T), BITSET_WORDS(count)) +BITSET_CALLOC(count) Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37779>	2025-10-09 12:29:55 +00:00
Romaric Jodin	cb86341829	meson: remove '--outdir' argument in script Usage of '--outdir' argument in python scripts makes it very complicated for tools like ninja-to-soong to generate the Android equivalent build file. This is because the option is less clear on what will be generated. Instead, change it for '--out' where we give the full path of the file to generate. This has the good point of deduplicating the locations of the file name to have it only in 'meson.build'. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37741>	2025-10-08 20:51:20 +00:00
Marek Olšák	3fe651f607	nir: remove load_smem_amd replaced by load_global_amd + ACCESS_SMEM_AMD Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36936>	2025-10-08 08:54:11 +00:00
Daniel Schürmann	7593667b0a	nir/divergence_analysis: check ACCESS_SMEM_AMD Revert "nir/divergence: make smem load_global_amd uniform" This reverts commit `2d0f93631c`. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36936>	2025-10-08 08:53:55 +00:00
Daniel Schürmann	cacb390ec9	nir/load_store_vectorize: Fix parsing offsets through u2u64 Fixes: `cfba417316` ('nir/load_store_vectorize: optimize accesses with u2u64(ishl.nuw(iadd))') Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36936>	2025-10-08 08:53:51 +00:00
Rhys Perry	8fba196164	nir: assume non-atomic loads don't tear Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36602>	2025-10-07 17:41:30 +00:00
Rhys Perry	0dd09a292b	nir: add ACCESS_ATOMIC This is so that passes and backends can tell if a coherent load/store is atomic or not, instead of having to assume it could be either. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36602>	2025-10-07 17:41:30 +00:00
Samuel Pitoiset	e868e8d946	nir: adjust nir_tex_instr_need_sampler() for AMD FMASK instructions These instructions don't need a sampler. This doesn't fix anything now because this helper isn't unused yet, but it will help for descriptor heap. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37720>	2025-10-07 15:22:47 +00:00
Georg Lehmann	84f26ed117	nir: optimize atomic isub if supported Foz-DB Navi48: Totals from 1 (0.00% of 80287) affected shaders: Instrs: 1641 -> 1637 (-0.24%) CodeSize: 8472 -> 8456 (-0.19%) Latency: 19132 -> 19131 (-0.01%) InvThroughput: 9566 -> 9565 (-0.01%) Copies: 126 -> 125 (-0.79%) VALU: 565 -> 563 (-0.35%) SALU: 439 -> 438 (-0.23%) Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37702>	2025-10-07 14:07:56 +00:00
Georg Lehmann	b0d3db3733	nir: add atomic isub Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37702>	2025-10-07 14:07:56 +00:00

1 2 3 4 5 ...

6733 commits