fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-20 02:38:07 +02:00

Author	SHA1	Message	Date
Georg Lehmann	442daeb54a	nir/opt_algebraic: use fcanonicalize Mostly optimizations, some minor fixes but I don't think they are worth backporting. Foz-DB Navi21: Totals from 7570 (9.21% of 82151) affected shaders: MaxWaves: 204288 -> 204476 (+0.09%); split: +0.09%, -0.00% Instrs: 4511439 -> 4500261 (-0.25%); split: -0.25%, +0.00% CodeSize: 23727088 -> 23644388 (-0.35%); split: -0.35%, +0.00% VGPRs: 290944 -> 290616 (-0.11%); split: -0.12%, +0.01% SpillSGPRs: 1256 -> 1251 (-0.40%) Latency: 16738072 -> 16726717 (-0.07%); split: -0.10%, +0.04% InvThroughput: 3736856 -> 3716631 (-0.54%); split: -0.55%, +0.01% VClause: 66150 -> 66156 (+0.01%); split: -0.05%, +0.06% SClause: 93644 -> 93631 (-0.01%); split: -0.02%, +0.01% Copies: 448816 -> 458584 (+2.18%); split: -0.05%, +2.22% Branches: 139817 -> 139775 (-0.03%); split: -0.03%, +0.00% PreSGPRs: 321922 -> 321900 (-0.01%); split: -0.01%, +0.00% PreVGPRs: 239709 -> 238856 (-0.36%); split: -0.39%, +0.03% VALU: 2595164 -> 2584250 (-0.42%); split: -0.43%, +0.01% SALU: 839038 -> 838965 (-0.01%); split: -0.02%, +0.01% VMEM: 137584 -> 137583 (-0.00%) Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39180>	2026-01-19 16:11:29 +00:00
Rhys Perry	625afb0d29	nir: add fcanonicalize v2(Georg Lehmann): Always remove fcanonicalize if denorms must be neither flushed nor preserved. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39180>	2026-01-19 16:11:29 +00:00
Eric Engestrom	30c2e6dbf2	nir/meson: drop redundant --build-tests in favour of just checking if --out-tests is set Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39350>	2026-01-16 16:55:21 +00:00
Eric Engestrom	246095da49	nir/meson: only try to generate the nir_opt_algebraic tests when requested Anything listed in a meson target's `output` is expected to exist once the command has run. If it's missing, meson/ninja will run the command again to try to generate it, resulting in a ton of files getting re-generated/re-compiled for no reason. Fixes: `4c30c44b75` ("nir: Generate unit tests for nir_opt_algebraic") Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14667 Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39350>	2026-01-16 16:55:21 +00:00
Konstantin Seurer	4c30c44b75	nir: Generate unit tests for nir_opt_algebraic This catches a number of bugs in the current NIR algebraic optimizations or opcodes implementations (as fixed in this series, or documented in the XFAIL tests), and should prevent many future bugs from landing. This required bumping the test timeout, because s390x is very slow to emulate in CI. Closes: #3338 Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39076>	2026-01-15 19:09:43 +00:00
Emma Anholt	df215cc3cd	nir/opt_algebraic_tests: Mark patterns as unsupported or xfails. This way as a pattern author/editor you can immediately see whether it's getting test coverage and if there are known issues with the pattern. This will also give us clear outcomes from testing as we fix failing patterns. Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39076>	2026-01-15 19:09:43 +00:00
Georg Lehmann	93d05cdfd8	nir/opt_algebraic: move fsat last for fsqrt(fsat(a)) This should be exact, even for all special values: fsqrt(NaN) -> NaN fsqrt(-0.0) -> 0.0 fsqrt(-Inf) -> NaN fsqrt(negative finite) -> NaN So all of these get saturated to +0.0 All numbers >= 1.0 will have a square root >= 1.0, which will be saturated to 1.0 Moving the fsat guarantees that it can use an output modifier for hardware that has those, and shouldn't harm other hardware either. Foz-DB Navi21: Totals from 255 (0.31% of 82151) affected shaders: Instrs: 664906 -> 664194 (-0.11%) CodeSize: 3623500 -> 3619188 (-0.12%) Latency: 11336397 -> 11335688 (-0.01%); split: -0.01%, +0.00% InvThroughput: 2716430 -> 2715726 (-0.03%); split: -0.03%, +0.00% VALU: 442603 -> 441891 (-0.16%) Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39202>	2026-01-09 07:34:46 +00:00
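The equivalence claimed for fsqrt(fsat(a)) and fsat(fsqrt(a)) can be spot-checked with a small Python model. This is only a sketch: `fsat` and `fsqrt` below are hypothetical stand-ins written for illustration (NaN saturating to 0.0, square roots of negative inputs yielding NaN), not the NIR implementations, and the comparison treats -0.0 and +0.0 as equal.

```python
import math

def fsat(x):
    # Assumed model of saturate: NaN becomes 0.0, everything else is clamped to [0, 1].
    if math.isnan(x):
        return 0.0
    return min(max(x, 0.0), 1.0)

def fsqrt(x):
    # Assumed model of square root: NaN for NaN or negative inputs (math.sqrt would raise).
    if math.isnan(x) or x < 0.0:
        return math.nan
    return math.sqrt(x)

# Special values from the commit message plus a few ordinary ones.
SAMPLES = [math.nan, -math.inf, -2.5, -0.0, 0.0, 0.25, 1.0, 4.0, math.inf]

def same_result(a):
    # Compares the original order with the reordered form; -0.0 == 0.0 counts as equal.
    return fsqrt(fsat(a)) == fsat(fsqrt(a))

if __name__ == "__main__":
    for a in SAMPLES:
        print(a, same_result(a))
```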
Ian Romanick	d4a87e85b3	nir/algebraic: Add missing f on F-strings Without this, nir_algebraic.py was treating "f2i{int_sz}_sat" as the literal opcode name when it should have been "f2i8_sat" or similar. Fixes: `c49d6e0480` ("nir/algebraic: Elide range clamping of f2u sources") Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39031>	2026-01-08 13:19:35 -08:00
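The effect of the missing f prefix is easy to reproduce in plain Python. The variable name `int_sz` comes from the commit message; the rest of this snippet is an illustrative sketch, not code from nir_algebraic.py.

```python
int_sz = 8

# Without the f prefix the braces are kept literally, so the "opcode name" is wrong.
without_prefix = "f2i{int_sz}_sat"

# With the f prefix the placeholder is substituted.
with_prefix = f"f2i{int_sz}_sat"

if __name__ == "__main__":
    print(without_prefix)
    print(with_prefix)
```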
Konstantin Seurer	a8224e3e00	nir/opt_algebraic: Do not emit patterns for 64bit booleans Avoids assertion failures trying to constant-evaluate the pattern with the new nir_opt_algebraic_pattern_tests. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39184>	2026-01-06 21:27:48 +00:00
Konstantin Seurer	211c7db8e3	nir/opt_algebraic: Remove a pattern for 8bit floats Avoids assertion failures trying to constant-evaluate the pattern with the new nir_opt_algebraic_pattern_tests. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39184>	2026-01-06 21:27:48 +00:00
Emma Anholt	afece95101	nir/opt_algebraic: Fix return type of fdot(vec(a, 0.0, ...), b). The replace pattern was generating a vector when it should have been scalar. Fixes validation failures with the new algebraic unit tests. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39184>	2026-01-06 21:27:47 +00:00
Georg Lehmann	c8ce0df2d2	nir/opt_algebraic: replace is_negative_zero with constant -0.0 Now that nir_search respects the sign of zero, we don't need a manual helper for this. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39123>	2026-01-03 12:42:23 +00:00
Georg Lehmann	7d2a946730	nir/opt_algebraic: canonicalize scmp with -0.0 We already do this for non fused comparisons. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39123>	2026-01-03 12:42:23 +00:00
Georg Lehmann	2824c12252	nir/opt_algebraic: explicitly add some -0.0 variants of patterns Foz-DB Navi21: Totals from 5 (0.00% of 125360) affected shaders: CodeSize: 28812 -> 28744 (-0.24%) Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39123>	2026-01-03 12:42:23 +00:00
Pavel Ondračka	0b39b5ea63	nir/opt_algebraic: improve dot product narrowing The issue is that the current narrowing patterns are not working in a lot of cases, for example (('fdot3', ('vec3', a, 0.0, 0.0), b), ('fmul', a, b)), is missing patterns like this: 32x3 %1 = load_const (0x3f800000, 0x00000000, 0x00000000) = (1.000000, 0.000000, 0.000000) 32x4 %7 = vec4 %6, %2 (0x0), %2 (0x0), %2 (0x0) 32 %19 = fdot3 %1 (1.000000, 0.000000, 0.000000), %7.xyz or after some later transforms: 32x2 %0 = load_const (0x3f800000, 0x00000000) = (1.000000, 0.000000) 32x2 %6 = vec2 %5, %1 (0x0) 32 %18 = fdot3 %0 (1.000000, 0.000000).xyy, %6.xyy This patch is heavily based on an old branch from Ian Romanick from 2019. r300 RV530 shader-db: total instructions in shared programs: 128900 -> 128882 (-0.01%) instructions in affected programs: 621 -> 603 (-2.90%) helped: 10 HURT: 1 total cycles in shared programs: 191837 -> 191828 (<.01%) cycles in affected programs: 799 -> 790 (-1.13%) helped: 7 HURT: 1 Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39068>	2026-01-02 16:07:10 +01:00
Ian Romanick	66fd4d72fd	nir/algebraic: Mask with shifted constant instead of shift-then-mask shader-db: All Intel platforms had similar results. (Lunar Lake shown) total instructions in shared programs: 17088766 -> 17088765 (<.01%) instructions in affected programs: 1375 -> 1374 (-0.07%) helped: 1 / HURT: 1 total cycles in shared programs: 887873068 -> 887871748 (<.01%) cycles in affected programs: 136402 -> 135082 (-0.97%) helped: 2 / HURT: 0 fossil-db: Lunar Lake Totals: Instrs: 924954240 -> 924939317 (-0.00%); split: -0.00%, +0.00% Subgroup size: 40937696 -> 40937728 (+0.00%) Cycle count: 106116946509 -> 106116637903 (-0.00%); split: -0.00%, +0.00% Spill count: 3423930 -> 3423250 (-0.02%); split: -0.02%, +0.00% Fill count: 4876960 -> 4876045 (-0.02%); split: -0.03%, +0.01% Max live registers: 193882457 -> 193881816 (-0.00%); split: -0.00%, +0.00% Max dispatch width: 49078640 -> 49078656 (+0.00%) Non SSA regs after NIR: 231314214 -> 231314219 (+0.00%); split: -0.00%, +0.00% Totals from 13809 (0.68% of 2019450) affected shaders: Instrs: 25433084 -> 25418161 (-0.06%); split: -0.08%, +0.02% Subgroup size: 32 -> 64 (+100.00%) Cycle count: 1483550606 -> 1483242000 (-0.02%); split: -0.27%, +0.25% Spill count: 41466 -> 40786 (-1.64%); split: -1.88%, +0.24% Fill count: 74195 -> 73280 (-1.23%); split: -2.12%, +0.88% Max live registers: 2326365 -> 2325724 (-0.03%); split: -0.05%, +0.02% Max dispatch width: 234848 -> 234864 (+0.01%) Non SSA regs after NIR: 3394104 -> 3394109 (+0.00%); split: -0.00%, +0.00% Meteor Lake and DG2 had similar results. 
(Meteor Lake shown) Totals: Instrs: 997527742 -> 997524495 (-0.00%); split: -0.00%, +0.00% Subgroup size: 27452928 -> 27452944 (+0.00%) Cycle count: 93646717070 -> 93649738060 (+0.00%); split: -0.00%, +0.01% Spill count: 3710125 -> 3709784 (-0.01%); split: -0.03%, +0.02% Fill count: 5032819 -> 5033191 (+0.01%); split: -0.04%, +0.05% Max live registers: 121648838 -> 121648528 (-0.00%); split: -0.00%, +0.00% Max dispatch width: 37811544 -> 37811584 (+0.00%) Non SSA regs after NIR: 255562054 -> 255565914 (+0.00%); split: -0.00%, +0.00% Totals from 14438 (0.63% of 2281134) affected shaders: Instrs: 25974222 -> 25970975 (-0.01%); split: -0.08%, +0.06% Subgroup size: 16 -> 32 (+100.00%) Cycle count: 1149710820 -> 1152731810 (+0.26%); split: -0.29%, +0.55% Spill count: 44445 -> 44104 (-0.77%); split: -2.23%, +1.46% Fill count: 76172 -> 76544 (+0.49%); split: -2.89%, +3.37% Max live registers: 1237997 -> 1237687 (-0.03%); split: -0.04%, +0.02% Max dispatch width: 123528 -> 123568 (+0.03%) Non SSA regs after NIR: 3490757 -> 3494617 (+0.11%); split: -0.03%, +0.14% Tiger Lake, Ice Lake, and Skylake had similar results. 
(Tiger Lake shown) Totals: Instrs: 1013364485 -> 1013342384 (-0.00%); split: -0.00%, +0.00% Cycle count: 85509342602 -> 85500105656 (-0.01%); split: -0.02%, +0.01% Spill count: 3903944 -> 3903350 (-0.02%); split: -0.02%, +0.01% Fill count: 6801948 -> 6799368 (-0.04%); split: -0.05%, +0.01% Max live registers: 122212165 -> 122211859 (-0.00%); split: -0.00%, +0.00% Max dispatch width: 37805336 -> 37805472 (+0.00%) Non SSA regs after NIR: 244624956 -> 244628603 (+0.00%); split: -0.00%, +0.00% Totals from 14835 (0.65% of 2278397) affected shaders: Instrs: 27522570 -> 27500469 (-0.08%); split: -0.10%, +0.02% Cycle count: 1128820972 -> 1119584026 (-0.82%); split: -1.53%, +0.71% Spill count: 46408 -> 45814 (-1.28%); split: -2.04%, +0.76% Fill count: 99071 -> 96491 (-2.60%); split: -3.14%, +0.54% Max live registers: 1287967 -> 1287661 (-0.02%); split: -0.04%, +0.02% Max dispatch width: 126600 -> 126736 (+0.11%) Non SSA regs after NIR: 3438628 -> 3442275 (+0.11%); split: -0.03%, +0.14% Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38979>	2025-12-17 18:38:55 +00:00
Georg Lehmann	653716b745	nir/opt_algebraic: create more bit test Helps backends with has_bit_test more (i.e. ACO), but it shouldn't hurt others either. Foz-DB Navi21: Totals from 1138 (1.17% of 97591) affected shaders: Instrs: 5478747 -> 5476055 (-0.05%); split: -0.05%, +0.00% CodeSize: 29850188 -> 29853140 (+0.01%); split: -0.04%, +0.05% SpillSGPRs: 1406 -> 1401 (-0.36%) Latency: 42324245 -> 42325921 (+0.00%); split: -0.01%, +0.01% InvThroughput: 11396940 -> 11394048 (-0.03%); split: -0.04%, +0.01% VClause: 142294 -> 142309 (+0.01%); split: -0.00%, +0.01% SClause: 124412 -> 124411 (-0.00%); split: -0.00%, +0.00% Copies: 572696 -> 572749 (+0.01%); split: -0.02%, +0.03% Branches: 199932 -> 199929 (-0.00%) PreSGPRs: 73372 -> 74970 (+2.18%) PreVGPRs: 79514 -> 79511 (-0.00%) VALU: 3628764 -> 3625744 (-0.08%); split: -0.08%, +0.00% SALU: 818258 -> 818475 (+0.03%); split: -0.03%, +0.06% Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38700>	2025-11-28 13:25:24 +00:00
Georg Lehmann	9ef0c96f26	nir/opt_algebraic: optimize open coded pack_32_2x16 Foz-DB Navi48: Totals from 4 (0.00% of 80287) affected shaders: Instrs: 6231 -> 6101 (-2.09%) CodeSize: 35916 -> 35156 (-2.12%) Latency: 72190 -> 71317 (-1.21%) InvThroughput: 20817 -> 19962 (-4.11%) VALU: 3145 -> 3029 (-3.69%) VOPD: 310 -> 312 (+0.65%) Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37937>	2025-11-10 19:04:32 +00:00
Ian Romanick	f1bbc3d4e4	nir/algebraic: Don't generate integer min or max that will need to be lowered In !35844, there was some discussion about allowing 64-bit bcsel that would be lowered in the driver. One challenge there would be if a 64-bit bcsel was transformed into integer min or max by an algebraic optimization. I believe these were the only algebraic patterns that could create new integer min or max that would not be immediately constant folded. There were no shader-db or fossil-db changes on any Intel platform. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38033>	2025-10-23 22:35:27 +00:00
Job Noorman	ad421cdf2e	nir: mark fneg distribution through fadd/ffma as nsz `df1876f615` ("nir: Mark negative re-distribution on fadd as imprecise") fixed the fadd case by marking it as imprecise. This commit fixes the ffma case for the same reason. However, "imprecise" isn't necessary and nowadays we have "nsz" which is more accurate here. Use that for both fadd and ffma. Signed-off-by: Job Noorman <jnoorman@igalia.com> Fixes: `62795475e8` ("nir/algebraic: Distribute source modifiers into instructions") Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37930>	2025-10-17 08:58:59 +00:00
Job Noorman	0b82b803d9	nir,ir3: rename umul_low to umul_16x16 This is more in line with similar opcodes like umul_32x16. Also change its const expr: the masking based on bit size was unnecessary as it is only defined for 32 bits. Use simple casts instead. Signed-off-by: Job Noorman <jnoorman@igalia.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37863>	2025-10-14 12:54:54 +00:00
Ian Romanick	1e691e68e2	nir/algebraic: Optimize bfi with odd-valued mask to bitfield_select shader-db: Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown) total instructions in shared programs: 17181254 -> 17181046 (<.01%) instructions in affected programs: 35834 -> 35626 (-0.58%) helped: 130 / HURT: 2 total cycles in shared programs: 888543370 -> 888554248 (<.01%) cycles in affected programs: 7443984 -> 7454862 (0.15%) helped: 95 / HURT: 87 fossil-db: Lunar Lake Totals: Instrs: 233260196 -> 233259474 (-0.00%); split: -0.00%, +0.00% Cycle count: 32754567116 -> 32754515890 (-0.00%); split: -0.00%, +0.00% Max live registers: 71738442 -> 71738398 (-0.00%); split: -0.00%, +0.00% Totals from 6842 (0.87% of 790721) affected shaders: Instrs: 5566926 -> 5566204 (-0.01%); split: -0.01%, +0.00% Cycle count: 512487046 -> 512435820 (-0.01%); split: -0.20%, +0.19% Max live registers: 1100656 -> 1100612 (-0.00%); split: -0.00%, +0.00% Meteor Lake and DG2 had similar results. (Meteor Lake shown) Totals: Instrs: 264071212 -> 264066944 (-0.00%); split: -0.00%, +0.00% Cycle count: 26552458051 -> 26553286277 (+0.00%); split: -0.00%, +0.01% Spill count: 530380 -> 530084 (-0.06%) Fill count: 613416 -> 612900 (-0.08%) Scratch Memory Size: 20089856 -> 20075520 (-0.07%) Max live registers: 46558852 -> 46558811 (-0.00%); split: -0.00%, +0.00% Max dispatch width: 8034616 -> 8034584 (-0.00%) Totals from 6653 (0.73% of 905545) affected shaders: Instrs: 5750844 -> 5746576 (-0.07%); split: -0.08%, +0.00% Cycle count: 416414845 -> 417243071 (+0.20%); split: -0.20%, +0.40% Spill count: 1953 -> 1657 (-15.16%) Fill count: 3556 -> 3040 (-14.51%) Scratch Memory Size: 92160 -> 77824 (-15.56%) Max live registers: 566003 -> 565962 (-0.01%); split: -0.01%, +0.00% Max dispatch width: 55768 -> 55736 (-0.06%) No shader-db or fossil-db changes on any previous Intel platforms. 
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>	2025-10-10 17:25:11 +00:00
Ian Romanick	aa53735b66	nir/algebraic: Prefer bfi over bitfield_select for bitfield_insert Intel platforms will soon implement both bfi and bitfield_select. bfi is more efficient for bitfield_insert. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>	2025-10-10 17:25:08 +00:00
Ian Romanick	08ec408061	nir/algebraic: Optimize f2u of negative value to zero The eliminated SENDs are from a single app that has a bunch of fragment shaders with a sequence like: con 32 %495 = fmul! %203.i, %1 (0.000000) con 32 %496 = ffma! %203.j, %1 (0.000000), %495 con 32 %497 = ffma! %203.k, %1 (0.000000), %496 con 32 %498 = ffma! %203.l, %1 (0.000000), %497 con 32 %499 = @load_reloc_const_intel (param_idx=1, base=0) con 32 %500 = @load_reloc_const_intel (param_idx=0, base=0) con 32 %501 = f2u32 %498 con 32 %502 = umin %501, %172 (0x4) con 32 %503 = ishl %502, %172 (0x4) con 32 %504 = load_const (0x00000040 = 64) con 32 %505 = umin %503, %504 (0x40) con 32 %506 = iadd %500, %505 The `f2u` is replaced with 0, and that makes the `ffma` dot-product sequence be unused. Since it is unused, most of the preceding block gets eliminated. A lot of instructions after the `f2u` are also eliminated by other algebraic optimizations. Most importantly, %203 is the result of a `load_ubo_uniform_block_intel` that is eliminated. No shader-db changes on any Intel platform. fossil-db: All Intel platforms had similar results. 
(Lunar Lake shown) Totals: Instrs: 919895603 -> 919804051 (-0.01%); split: -0.01%, +0.00% Send messages: 40892036 -> 40887569 (-0.01%) Cycle count: 99176770712 -> 99174971806 (-0.00%); split: -0.00%, +0.00% Max live registers: 190030365 -> 190030367 (+0.00%) Max dispatch width: 47415040 -> 47415024 (-0.00%) Non SSA regs after NIR: 228872538 -> 228863608 (-0.00%); split: -0.00%, +0.00% Totals from 2234 (0.11% of 1955134) affected shaders: Instrs: 1989743 -> 1898191 (-4.60%); split: -4.60%, +0.00% Send messages: 44179 -> 39712 (-10.11%) Cycle count: 25416114 -> 23617208 (-7.08%); split: -7.08%, +0.00% Max live registers: 367357 -> 367359 (+0.00%) Max dispatch width: 39184 -> 39168 (-0.04%) Non SSA regs after NIR: 471173 -> 462243 (-1.90%); split: -1.90%, +0.00% Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>	2025-10-10 17:25:08 +00:00
Ian Romanick	5667459ff1	nir/algebraic: Don't introduce undefined behavior in f2u conversion If the source -1.0 < x < 0.0, simply removing the ftrunc will introduce undefined behavior. By chance of how at least Intel and NVIDIA GPUs implement f2u, this has Just Worked. No shader-db changes on any Intel platform. fossil-db: Lunar Lake Totals: Instrs: 913264354 -> 913264366 (+0.00%) Cycle count: 104953995530 -> 104953996854 (+0.00%) Max live registers: 189266026 -> 189266058 (+0.00%) Non SSA regs after NIR: 227779417 -> 227779369 (-0.00%) Totals from 24 (0.00% of 1984794) affected shaders: Instrs: 4669 -> 4681 (+0.26%) Cycle count: 50610 -> 51934 (+2.62%) Max live registers: 1222 -> 1254 (+2.62%) Non SSA regs after NIR: 1174 -> 1126 (-4.09%) Meteor Lake, DG2, Tiger Lake, and Ice Lake had similar results. (Meteor Lake shown) Totals: Instrs: 1001288026 -> 1001288038 (+0.00%) Cycle count: 92813392671 -> 92813392791 (+0.00%) Max live registers: 121935383 -> 121935399 (+0.00%) Max dispatch width: 19949928 -> 19949912 (-0.00%) Totals from 2 (0.00% of 2284670) affected shaders: Instrs: 1380 -> 1392 (+0.87%) Cycle count: 18940 -> 19060 (+0.63%) Max live registers: 136 -> 152 (+11.76%) Max dispatch width: 32 -> 16 (-50.00%) No fossil-db changes on Skylake. Suggested-by: Georg Lehmann Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>	2025-10-10 17:25:07 +00:00
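Why the truncation matters for sources in (-1.0, 0.0) can be illustrated with a toy model. Everything here is an assumption made for illustration: `f2u32_model` treats any source below zero (or out of range, or NaN) as undefined, which is one reading of the commit message and not a statement of the NIR opcode's documented semantics; `ftrunc` is a hypothetical helper for finite inputs only.

```python
import math

def ftrunc(x):
    # Truncate toward zero, keeping the sign so that -0.5 becomes -0.0 (finite inputs only).
    return math.copysign(float(math.trunc(x)), x)

def f2u32_model(x):
    # Hypothetical model: negative, NaN, or too-large sources are treated as undefined.
    if math.isnan(x) or x < 0.0 or x >= 2.0 ** 32:
        raise ValueError("undefined conversion in this model")
    return int(x)

def is_defined(x):
    # True when the modelled conversion accepts x.
    try:
        f2u32_model(x)
        return True
    except ValueError:
        return False

if __name__ == "__main__":
    x = -0.5
    print(is_defined(ftrunc(x)))  # the truncated source is -0.0, which is not below zero
    print(is_defined(x))          # the raw source is below zero
```

In this model the truncated source converts to 0, while the raw source falls into the undefined range, which matches the reason given for not simply dropping the ftrunc.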
Ian Romanick	4338f7d033	nir/algebraic: Remove useless ftrunc inside f2i/f2u Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>	2025-10-10 17:25:07 +00:00
Ian Romanick	c49d6e0480	nir/algebraic: Elide range clamping of f2u sources There are no shader-db changes on ELK platforms because those platforms don't support 8- or 16-bit integer types. v2: Restrict patterns generated such that the integer limits are exactly representable in the specified floating point format. With the exception of the value 0, this requires that float_sz > int_sz. This had no impact on shader-db or fossil-db on any Intel platform. Noticed by Georg. v3: Add a missing is_a_number. shader-db: All Intel platforms had similar results. (Lunar Lake shown) total cycles in shared programs: 889936056 -> 889934082 (<.01%) cycles in affected programs: 65806 -> 63832 (-3.00%) helped: 2 / HURT: 0 fossil-db: Lunar Lake Totals: Instrs: 233284796 -> 233282917 (-0.00%); split: -0.00%, +0.00% Cycle count: 32756399804 -> 32754972188 (-0.00%); split: -0.01%, +0.00% Spill count: 519861 -> 519813 (-0.01%) Fill count: 663650 -> 663626 (-0.00%); split: -0.01%, +0.01% Max live registers: 71738626 -> 71738696 (+0.00%) Non SSA regs after NIR: 67837902 -> 67837648 (-0.00%) Totals from 1236 (0.16% of 790723) affected shaders: Instrs: 2134504 -> 2132625 (-0.09%); split: -0.09%, +0.01% Cycle count: 604922278 -> 603494662 (-0.24%); split: -0.48%, +0.25% Spill count: 16509 -> 16461 (-0.29%) Fill count: 32760 -> 32736 (-0.07%); split: -0.22%, +0.15% Max live registers: 250112 -> 250182 (+0.03%) Non SSA regs after NIR: 302368 -> 302114 (-0.08%) Meteor Lake, DG2, and Tiger Lake had similar results. 
(Meteor Lake shown) Totals: Instrs: 264095370 -> 264094056 (-0.00%); split: -0.00%, +0.00% Cycle count: 26554146277 -> 26553027268 (-0.00%); split: -0.01%, +0.01% Spill count: 530603 -> 530615 (+0.00%) Fill count: 613231 -> 613273 (+0.01%) Max live registers: 46559041 -> 46559087 (+0.00%) Totals from 1237 (0.14% of 905547) affected shaders: Instrs: 2262517 -> 2261203 (-0.06%); split: -0.07%, +0.01% Cycle count: 518219799 -> 517100790 (-0.22%); split: -0.59%, +0.37% Spill count: 17518 -> 17530 (+0.07%) Fill count: 32273 -> 32315 (+0.13%) Max live registers: 128360 -> 128406 (+0.04%) Ice Lake and Skylake had similar results. (Ice Lake shown) Totals: Instrs: 269849640 -> 269848198 (-0.00%); split: -0.00%, +0.00% Cycle count: 26718329643 -> 26718289020 (-0.00%); split: -0.00%, +0.00% Max live registers: 46878430 -> 46878462 (+0.00%) Totals from 1233 (0.14% of 905427) affected shaders: Instrs: 2324225 -> 2322783 (-0.06%); split: -0.06%, +0.00% Cycle count: 531467501 -> 531426878 (-0.01%); split: -0.11%, +0.10% Max live registers: 130782 -> 130814 (+0.02%) Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>	2025-10-10 17:25:07 +00:00
Gert Wollny	3b3c3ccf56	nir+r600: add option to avoid contracting fabs into ffma On r600 ternary operations can't use the fabs source modifier, so converting "fadd(fabs(fmul(a, b)), c)" to "ffma(fabs(a), fabs(b), c)" adds one more instruction in the backend, hence avoid this. Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37440>	2025-09-17 21:03:58 +00:00
Christian Gmeiner	a7d2570296	nir/opt_algebraic: optimize f2i32(fround_even(x)) to f2i32_rtne(x) Add late optimization to fuse f2i32 and fround_even operations into a single f2i32_rtne instruction when the intermediate fround_even result is only used once. This eliminates redundant rounding since f2i32_rtne performs round-to-nearest-even conversion directly. Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com> Tested-by: Simon Perretta <simon.perretta@imgtec.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37426>	2025-09-17 20:31:59 +00:00
Karmjit Mahil	9c6183604f	nir, ir3: Add `lower_fmulz_with_abs_min` backend option This commit adds the `lower_fmulz_with_abs_min` which lowers `fmulz` -> `min(abs(a), abs(b)) == 0.0 ? 0.0 : a * b` `ffmaz` -> `min(abs(a), abs(b)) == 0.0 ? c : ffma(a, b, c)` This is useful for ISAs which have `abs` for free on `min` such as ir3. Adreno A750 Benchmark of 10 runs of 5 DX9 single frame trimmed captures looped 2048 times using u_trace measuring `start_render_pass` to `end_render_pass` results: sysmem: -1.91156%, -2.21791%, -2.02533%, -2.21666%, -2.33272%, -2.67349%, -1.75278%, -2.05923%, -2.26892%, -2.10506% Avg: ~ -2.16% ST.S: ~ 0.25% gmem: -3.61496%, -3.66682%, -3.80901%, -3.51198%, -3.72950%, -3.71413%, -3.64467%, -3.67092%, -3.90640%, -3.83888% Avg: ~ -3.71% ST.S: ~ 0.12% Signed-off-by: Karmjit Mahil <karmjit.mahil@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31479>	2025-09-17 15:02:50 +00:00
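The two lowerings quoted above can be written out as a short Python sketch. The function names are hypothetical, `fmin` is assumed to return the non-NaN operand when exactly one operand is NaN, and the multiply-add is computed unfused, so this models the shape of the lowering and not its exact hardware rounding.

```python
import math

def fmin(a, b):
    # Assumed min semantics: a NaN operand is ignored in favour of the other one.
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return min(a, b)

def fmulz_lowered(a, b):
    # min(abs(a), abs(b)) == 0.0 ? 0.0 : a * b
    return 0.0 if fmin(abs(a), abs(b)) == 0.0 else a * b

def ffmaz_lowered(a, b, c):
    # min(abs(a), abs(b)) == 0.0 ? c : ffma(a, b, c), with an unfused a * b + c stand-in
    return c if fmin(abs(a), abs(b)) == 0.0 else a * b + c

if __name__ == "__main__":
    print(fmulz_lowered(math.inf, 0.0))       # zero wins over infinity
    print(fmulz_lowered(2.0, 3.0))            # ordinary multiply
    print(ffmaz_lowered(0.0, math.inf, 5.0))  # falls back to the addend
```

A single comparison against the minimum of the absolute values covers both "a is zero" and "b is zero", which is why the option is attractive where `abs` is free on `min`.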
Karmjit Mahil	8d19ffef0a	nir: Add more matches for `fmulz` In some cases after other passes, `(a == 0.0 ? 0 : b)` can be turned into `(a != 0.0 ? b : 0)`, so let's match those cases too. Also matching `min(abs(a), abs(b)) == 0.0 ? 0.0 : a * b`. Signed-off-by: Karmjit Mahil <karmjit.mahil@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31479>	2025-09-17 15:02:50 +00:00
Daniel Schürmann	c78f1d516c	nir/algebraic: add pattern for (a << #b) * #c => a * (#c << #b) Totals from 2545 (3.19% of 79839) affected shaders: (Navi48) Instrs: 6371003 -> 6364130 (-0.11%); split: -0.12%, +0.01% CodeSize: 33827548 -> 33812244 (-0.05%); split: -0.06%, +0.01% Latency: 47451755 -> 47430108 (-0.05%); split: -0.05%, +0.00% InvThroughput: 10442450 -> 10437159 (-0.05%); split: -0.05%, +0.00% SClause: 159829 -> 159874 (+0.03%); split: -0.01%, +0.04% Copies: 500725 -> 500721 (-0.00%); split: -0.01%, +0.01% PreSGPRs: 110482 -> 110478 (-0.00%); split: -0.00%, +0.00% PreVGPRs: 147289 -> 147287 (-0.00%); split: -0.00%, +0.00% VALU: 3456135 -> 3454241 (-0.05%); split: -0.06%, +0.01% SALU: 925982 -> 923616 (-0.26%) VOPD: 1243 -> 1212 (-2.49%) Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37173>	2025-09-06 10:18:42 +00:00
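The identity (a << #b) * #c == a * (#c << #b) holds under wrapping integer arithmetic, which the following sketch checks for 32-bit values. The helper names are made up for this example, and a 32-bit width with shift counts taken modulo 32 is an assumption; the shifted constant on the right-hand side can be folded at compile time, leaving a single multiply.

```python
MASK32 = 0xFFFFFFFF

def ishl(x, n):
    # 32-bit left shift with wraparound; the shift count is taken modulo 32 (assumed).
    return (x << (n & 31)) & MASK32

def imul(x, y):
    # 32-bit multiply with wraparound.
    return (x * y) & MASK32

def before(a, b, c):
    # (a << b) * c
    return imul(ishl(a, b), c)

def after(a, b, c):
    # a * (c << b)
    return imul(a, ishl(c, b))

# A few fixed sample values, including ones that overflow 32 bits when shifted.
A_VALUES = [0, 1, 7, 0x12345678, 0x80000001, MASK32]
C_VALUES = [0, 1, 3, 0x0000FFFF, 0xDEADBEEF]

def all_match():
    return all(before(a, b, c) == after(a, b, c)
               for a in A_VALUES for b in range(32) for c in C_VALUES)

if __name__ == "__main__":
    print(all_match())
```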
Christoph Pillmayer	f81f3c85e2	nir/opt_algebraic: Convert a + b + a to b + 2a This allows fusing into one FMA later. Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37113>	2025-09-05 11:39:51 +00:00
Georg Lehmann	3b06824e4c	nir/opt_algebraic: optimize some post peephole select patterns Foz-DB GFX1201: Totals from 208 (0.26% of 80287) affected shaders: Instrs: 427684 -> 426834 (-0.20%); split: -0.22%, +0.02% CodeSize: 2232616 -> 2228816 (-0.17%); split: -0.20%, +0.03% Latency: 3993934 -> 3992726 (-0.03%); split: -0.04%, +0.01% InvThroughput: 569055 -> 568622 (-0.08%); split: -0.09%, +0.01% SClause: 12932 -> 12927 (-0.04%) Copies: 22567 -> 22604 (+0.16%); split: -0.47%, +0.63% Branches: 7671 -> 7658 (-0.17%) VALU: 222047 -> 221625 (-0.19%) SALU: 83954 -> 83815 (-0.17%); split: -0.29%, +0.13% Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36938>	2025-08-27 09:45:19 +00:00
Rhys Perry	46da666205	nir/algebraic: allow non-const for iand(iadd()) -> iadd(iand()) fossil-db (gfx1201): Totals from 596 (0.75% of 79839) affected shaders: Instrs: 691926 -> 691819 (-0.02%); split: -0.11%, +0.09% CodeSize: 3675216 -> 3675180 (-0.00%); split: -0.08%, +0.08% VGPRs: 37464 -> 37452 (-0.03%) Latency: 8566849 -> 8563162 (-0.04%); split: -0.09%, +0.05% InvThroughput: 1068038 -> 1063279 (-0.45%); split: -0.46%, +0.01% VClause: 17859 -> 17897 (+0.21%); split: -0.01%, +0.22% SClause: 16704 -> 16735 (+0.19%); split: -0.07%, +0.26% Copies: 45422 -> 45395 (-0.06%); split: -0.15%, +0.09% PreSGPRs: 24345 -> 24351 (+0.02%) PreVGPRs: 29121 -> 29128 (+0.02%) VALU: 349959 -> 348117 (-0.53%); split: -0.54%, +0.01% SALU: 105926 -> 107576 (+1.56%); split: -0.02%, +1.58% VOPD: 252 -> 234 (-7.14%) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36760>	2025-08-22 15:45:55 +00:00
Rhys Perry	4f83059ac5	nir/algebraic: improve is_unsigned_multiple_of_4 and use it more fossil-db (gfx1201): Totals from 160 (0.20% of 79839) affected shaders: MaxWaves: 4008 -> 3952 (-1.40%) Instrs: 390073 -> 379834 (-2.62%); split: -2.63%, +0.00% CodeSize: 2126020 -> 2053740 (-3.40%); split: -3.40%, +0.00% VGPRs: 9492 -> 9612 (+1.26%) Latency: 6746019 -> 6723893 (-0.33%); split: -0.33%, +0.00% InvThroughput: 849571 -> 848942 (-0.07%); split: -0.42%, +0.35% VClause: 11977 -> 11983 (+0.05%); split: -0.20%, +0.25% SClause: 11828 -> 11824 (-0.03%); split: -0.14%, +0.11% Copies: 30003 -> 30938 (+3.12%); split: -0.09%, +3.20% PreSGPRs: 8914 -> 8938 (+0.27%) PreVGPRs: 7352 -> 7514 (+2.20%); split: -0.04%, +2.24% VALU: 171829 -> 168829 (-1.75%); split: -1.76%, +0.01% SALU: 66503 -> 66543 (+0.06%); split: -0.01%, +0.07% VMEM: 29365 -> 25327 (-13.75%) VOPD: 864 -> 1013 (+17.25%) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36760>	2025-08-22 15:45:55 +00:00
Georg Lehmann	1d885fab9c	nir/opt_algebraic: optimize pack_half_rtz of b2f Foz-DB Navi21: Totals from 13 (0.02% of 80255) affected shaders: Instrs: 2313 -> 2306 (-0.30%); split: -0.35%, +0.04% CodeSize: 13452 -> 13480 (+0.21%) Latency: 12066 -> 12013 (-0.44%); split: -0.45%, +0.01% InvThroughput: 2172 -> 2163 (-0.41%) Copies: 112 -> 114 (+1.79%) VALU: 1480 -> 1472 (-0.54%) SALU: 154 -> 155 (+0.65%) Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36535>	2025-08-04 19:42:22 +00:00
Georg Lehmann	bc3b09c5dd	nir/opt_algebraic: optimize pack_half_rtz of bcsel with constant Foz-DB Navi21: Totals from 448 (0.56% of 80255) affected shaders: Instrs: 345474 -> 344791 (-0.20%); split: -0.20%, +0.00% CodeSize: 1917784 -> 1913324 (-0.23%); split: -0.25%, +0.02% VGPRs: 22344 -> 22416 (+0.32%) Latency: 2320847 -> 2318161 (-0.12%); split: -0.13%, +0.01% InvThroughput: 543008 -> 541722 (-0.24%) SClause: 11450 -> 11459 (+0.08%) Copies: 19991 -> 19949 (-0.21%); split: -0.23%, +0.02% PreSGPRs: 19129 -> 19114 (-0.08%) PreVGPRs: 19695 -> 19696 (+0.01%); split: -0.01%, +0.01% VALU: 257627 -> 256948 (-0.26%) SALU: 30432 -> 30422 (-0.03%) Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36535>	2025-08-04 19:42:22 +00:00
Georg Lehmann	8512479097	nir/opt_algebraic: create 16bit fmin/fmax if only used by pack_half_2x16_rtz_split Foz-DB Navi21: Totals from 1842 (2.30% of 80066) affected shaders: Instrs: 869152 -> 866751 (-0.28%) CodeSize: 4687316 -> 4682496 (-0.10%); split: -0.14%, +0.03% VGPRs: 75216 -> 75312 (+0.13%) Latency: 7297749 -> 7297929 (+0.00%); split: -0.01%, +0.02% InvThroughput: 1864933 -> 1860706 (-0.23%); split: -0.23%, +0.00% Copies: 52679 -> 52463 (-0.41%) VALU: 665076 -> 662890 (-0.33%) SALU: 56226 -> 56010 (-0.38%) Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36535>	2025-08-04 19:42:22 +00:00
Georg Lehmann	22afe83473	nir/opt_algebraic: remove fneg around fmin/fmax Foz-DB Navi21: Totals from 282 (0.35% of 80255) affected shaders: Instrs: 310515 -> 309755 (-0.24%) CodeSize: 1721236 -> 1714540 (-0.39%) Latency: 1366446 -> 1365141 (-0.10%); split: -0.10%, +0.00% InvThroughput: 352528 -> 351097 (-0.41%); split: -0.41%, +0.00% Copies: 24623 -> 24630 (+0.03%) VALU: 231716 -> 230951 (-0.33%) SALU: 28774 -> 28779 (+0.02%) Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36535>	2025-08-04 19:42:22 +00:00
Georg Lehmann	cfd5fbfde1	nir/opt_algebraic: make fmin/fmax(a, #b) 16bit if only used by f2f16 Foz-DB Navi31: Totals from 11 out of 14 FSR4 shaders: Instrs: 58298 -> 58374 (+0.13%); split: -0.08%, +0.21% CodeSize: 397836 -> 398108 (+0.07%); split: -0.08%, +0.15% Latency: 209634 -> 211438 (+0.86%); split: -0.14%, +1.00% InvThroughput: 229152 -> 229314 (+0.07%); split: -0.03%, +0.10% VClause: 826 -> 847 (+2.54%); split: -0.36%, +2.91% Copies: 2954 -> 3040 (+2.91%); split: -1.56%, +4.47% VALU: 49637 -> 49711 (+0.15%); split: -0.06%, +0.21% VOPD: 1916 -> 1400 (-26.93%) These stats looks bad, but it's actually just unlucky RA. Replacing 1 VOPD (two v_dual_max_f32) with 1 VOP3P (v_pk_max_f16) should still be a win from a register bandwidth perspective. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36468>	2025-08-01 20:29:30 +00:00
Georg Lehmann	261239a492	nir/opt_algebraic: use range analysis to detect no-op fmin/fmax Foz-DB Navi31: Totals from 418 (0.52% of 80273) affected shaders: Instrs: 564550 -> 564387 (-0.03%); split: -0.04%, +0.01% CodeSize: 2983860 -> 2982684 (-0.04%); split: -0.05%, +0.01% Latency: 4387264 -> 4386397 (-0.02%); split: -0.02%, +0.00% InvThroughput: 717464 -> 716874 (-0.08%); split: -0.08%, +0.00% Copies: 40126 -> 40125 (-0.00%) VALU: 352128 -> 352003 (-0.04%); split: -0.04%, +0.01% SALU: 50290 -> 50283 (-0.01%) Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36468>	2025-08-01 20:29:28 +00:00
Georg Lehmann	a0665e79e9	nir/opt_algebraic: push fsat into bcsel with constant bcsel doesn't have a free clamp modifier on AMD hardware, but what's inside might have free clamp. Foz-DB Navi31: Totals from 873 (1.09% of 80273) affected shaders: MaxWaves: 22008 -> 21968 (-0.18%) Instrs: 4624956 -> 4623950 (-0.02%); split: -0.04%, +0.02% CodeSize: 24152780 -> 24142884 (-0.04%); split: -0.05%, +0.01% VGPRs: 57900 -> 57960 (+0.10%) Latency: 28762622 -> 28749889 (-0.04%); split: -0.06%, +0.02% InvThroughput: 5320810 -> 5320145 (-0.01%); split: -0.02%, +0.00% VClause: 115879 -> 115929 (+0.04%); split: -0.10%, +0.14% SClause: 93058 -> 93059 (+0.00%); split: -0.01%, +0.02% Copies: 335674 -> 335845 (+0.05%); split: -0.05%, +0.10% PreSGPRs: 53819 -> 53843 (+0.04%); split: -0.01%, +0.05% PreVGPRs: 50908 -> 50939 (+0.06%); split: -0.02%, +0.08% VALU: 2816395 -> 2815514 (-0.03%); split: -0.04%, +0.01% SALU: 509988 -> 509987 (-0.00%); split: -0.02%, +0.02% Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36468>	2025-08-01 20:29:27 +00:00
Georg Lehmann	e9e5146848	nir/opt_algebraic: optimize fsat(fmax(a, b)) where b is not positive Foz-DB Navi31: Totals from 946 (1.18% of 80273) affected shaders: Instrs: 4986082 -> 4983988 (-0.04%); split: -0.04%, +0.00% CodeSize: 25998700 -> 25989796 (-0.03%); split: -0.04%, +0.00% Latency: 45514742 -> 45510330 (-0.01%); split: -0.01%, +0.00% InvThroughput: 8163529 -> 8162325 (-0.01%); split: -0.02%, +0.00% VClause: 112105 -> 112104 (-0.00%); split: -0.00%, +0.00% SClause: 109694 -> 109688 (-0.01%) Copies: 372356 -> 372284 (-0.02%); split: -0.03%, +0.01% Branches: 132636 -> 132633 (-0.00%) PreVGPRs: 58997 -> 58979 (-0.03%); split: -0.03%, +0.00% VALU: 3025662 -> 3024191 (-0.05%); split: -0.05%, +0.00% SALU: 551712 -> 551714 (+0.00%); split: -0.00%, +0.00% Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36468>	2025-08-01 20:29:27 +00:00
Georg Lehmann	e43ef6533b	nir/opt_algebraic: remove 8bit roundtrip when vectorizing i2i16(unpack_4x8(a).zw) Explicit 16bit instructions are nicer to vectorize. Helps FSR4 on GFX11 marginally. Foz-DB Navi31: Totals from 10 out of 14 FSR4 shaders: Instrs: 59781 -> 58518 (-2.11%) CodeSize: 413428 -> 404156 (-2.24%) Latency: 193770 -> 190768 (-1.55%) InvThroughput: 226274 -> 221628 (-2.05%) VClause: 796 -> 793 (-0.38%); split: -1.01%, +0.63% Copies: 3342 -> 3008 (-9.99%); split: -11.01%, +1.02% PreSGPRs: 312 -> 305 (-2.24%) VALU: 51448 -> 50213 (-2.40%) SALU: 1074 -> 1048 (-2.42%) VOPD: 1783 -> 1718 (-3.65%); split: +0.95%, -4.60% Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36117>	2025-07-30 07:25:51 +00:00
Georg Lehmann	037c2532ab	nir/opt_algebraic: create non 32bit bitfield_select Foz-DB Navi21: Totals from 68 (0.08% of 80255) affected shaders: Instrs: 197878 -> 197709 (-0.09%); split: -0.09%, +0.00% CodeSize: 1060700 -> 1060472 (-0.02%); split: -0.02%, +0.00% Latency: 659865 -> 659673 (-0.03%); split: -0.03%, +0.00% InvThroughput: 117010 -> 116985 (-0.02%); split: -0.03%, +0.00% VClause: 3781 -> 3779 (-0.05%) Copies: 15317 -> 15265 (-0.34%); split: -0.35%, +0.01% PreVGPRs: 3251 -> 3250 (-0.03%) VALU: 96800 -> 96799 (-0.00%); split: -0.00%, +0.00% SALU: 57006 -> 56836 (-0.30%); split: -0.30%, +0.00% Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36141>	2025-07-21 20:42:32 +00:00
Alyssa Rosenzweig	421d0e0953	nir: mark exact fmul in ldexp lowering this chain of fmul is deliberately chosen for floating point precision reasons, it needs to be exact, or else we might try to reassociate it and break subnormal handling. avoids regressing dEQP-VK.glsl.builtin.precision.ldexp_subnormals.* Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Mel Henning <mhenning@darkrefraction.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36257>	2025-07-21 11:42:18 +00:00
Alyssa Rosenzweig	042adf3cc5	nir/opt_algebraic: optimize signed pow in Control used in a post-processing shader which goes 896 instrs -> 749 instrs. In my Control fossil: Totals from 2 (0.63% of 319) affected shaders: Instrs: 2078 -> 1841 (-11.41%) CodeSize: 14540 -> 12800 (-11.97%) ALU: 1779 -> 1626 (-8.60%) FSCIB: 1779 -> 1626 (-8.60%) Uniforms: 370 -> 372 (+0.54%) In radv_fossils, there are affected shaders in Dredge. Totals from 4 (0.01% of 54019) affected shaders: Instrs: 2306 -> 2294 (-0.52%) CodeSize: 16594 -> 16534 (-0.36%) ALU: 2010 -> 2004 (-0.30%) FSCIB: 2010 -> 2004 (-0.30%) Uniforms: 1138 -> 1146 (+0.70%) Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35989>	2025-07-08 17:09:16 +00:00
Alyssa Rosenzweig	2765017553	nir: fuse ffma even with float controls The fmul+fadd -> fma rules in nir_opt_algebraic are marked imprecise, because they are a contraction. However, they respect signed zero/Inf/NaN rules. As such, it is legal to do this fusion with shader float controls as long as the exact bit is not set (mapping to SPIR-V NoContract). Unfortunately, NIR's imprecise rules do not distinguish between contraction issues versus float special case issues, forcing nir_search to skip all imprecise rules when any shader float control modes are used. This notably affects DXVK, which sets shader float controls to get D3D11 float behaviour and hence loses FMA fusing. Therefore, we plumb in the exact bit to express NoContract independent of the float controls, and weaken the requirement for fma fusion to allowable contraction. For fma splitting, it's a similar issue, as inexact GLSL fma in SPIR-V is just a multiply add that we're allowed to contract rather than the real deal. Drivers that use their own FMA fusing passes (notably, Intel and AMD) are unaffected, but DXVK-capable drivers using fuse_ffma should like this. 
Results on hk shown: Totals from 2194 (4.06% of 54019) affected shaders: MaxWaves: 2174272 -> 2175936 (+0.08%); split: +0.08%, -0.01% Instrs: 1173283 -> 1131494 (-3.56%); split: -3.57%, +0.01% CodeSize: 8568168 -> 8381724 (-2.18%); split: -2.18%, +0.01% Spills: 1094 -> 747 (-31.72%) Fills: 988 -> 681 (-31.07%) Scratch: 4444 -> 3820 (-14.04%) ALU: 953032 -> 913149 (-4.18%); split: -4.19%, +0.01% FSCIB: 953032 -> 913149 (-4.18%); split: -4.19%, +0.01% IC: 215398 -> 215274 (-0.06%) GPRs: 139865 -> 139032 (-0.60%); split: -1.56%, +0.96% Uniforms: 414886 -> 414466 (-0.10%); split: -0.14%, +0.04% Preamble instrs: 646398 -> 644017 (-0.37%); split: -0.43%, +0.07% Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35989>	2025-07-08 17:09:16 +00:00
Georg Lehmann	045ddb992a	nir/opt_algebraic: optimize 16bit vec2 comparison followed by b2i16 using usub_sat Helps vectorized emulated fp16 -> fp8 conversions No Foz-DB changes. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35876>	2025-07-03 20:08:39 +00:00


681 commits