fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-24 17:08:20 +02:00

Author	SHA1	Message	Date
Rhys Perry	ec59b59b97	nir: rename nir_src_parent_instr to nir_src_use_instr sed -i "s/nir_src_parent_instr/nir_src_use_instr/" `find ./ -type f` sed -i "s/nir_src_parent_if/nir_src_use_if/" `find ./ -type f` sed -i "s/nir_src_set_parent/nir_src_set_use/" `find ./ -type f` There are two kinds of "parent" in relation to a src/def: - the instruction where the def or src's def is defined - the instruction which the src is a part of and where the def is used Clarify that the parent here is where the src's def is used, not where it's defined. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Mel Henning <mhenning@darkrefraction.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Acked-by: Ian Romanick <ian.d.romanick@intel.com> Acked-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41344>	2026-05-06 17:09:22 +00:00
Lorenzo Rossi	2a7d817591	nir/opt_algebraic: optimize fadd/fmul with 16-bit source and constant Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com> Reviewed-by: Eric R. Smith <eric.smith@collabora.com> Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41096>	2026-04-30 17:33:09 +00:00
Karol Herbst	2e09b4ac68	nir: handle fmul_rtz in a couple of places Reviewed-by: Mel Henning <mhenning@darkrefraction.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41179>	2026-04-30 15:42:40 +00:00
Brandon Jones	d1dd65d425	nir/opt_algebraic: fix fabs optimization This fixes a regression found in Blender's unit testing, which called fabs(-0.0) and invoked an NIR optimization that was not valid for the parameter -0.0. IEEE 754 requires that abs clear the sign bit for the value -0.0. Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41060>	2026-04-21 04:10:29 +00:00
Georg Lehmann	643dd510d4	nir/opt_algebraic: optimize b2f(a) * b When the multiplication is only used by fadd, it's not a clear win because of potential fma fusion. Totals from 8015 (6.99% of 114655) affected shaders: MaxWaves: 199394 -> 199466 (+0.04%); split: +0.04%, -0.01% Instrs: 17461518 -> 17451076 (-0.06%); split: -0.10%, +0.04% CodeSize: 94779552 -> 94769828 (-0.01%); split: -0.07%, +0.06% VGPRs: 526012 -> 525532 (-0.09%); split: -0.10%, +0.01% SpillSGPRs: 12466 -> 12517 (+0.41%); split: -0.09%, +0.50% Latency: 191274766 -> 191297394 (+0.01%); split: -0.03%, +0.04% InvThroughput: 31465968 -> 31456785 (-0.03%); split: -0.07%, +0.04% VClause: 312081 -> 312073 (-0.00%); split: -0.10%, +0.09% SClause: 366914 -> 366906 (-0.00%); split: -0.02%, +0.01% Copies: 1222482 -> 1221933 (-0.04%); split: -0.20%, +0.15% Branches: 376651 -> 376577 (-0.02%); split: -0.03%, +0.01% PreSGPRs: 442974 -> 443240 (+0.06%); split: -0.01%, +0.07% PreVGPRs: 415964 -> 415668 (-0.07%); split: -0.09%, +0.02% VALU: 9403517 -> 9393916 (-0.10%); split: -0.12%, +0.02% SALU: 2799420 -> 2800430 (+0.04%); split: -0.13%, +0.16% VOPD: 472826 -> 472347 (-0.10%); split: +0.09%, -0.19% Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40399>	2026-03-20 08:50:41 +00:00
Georg Lehmann	cadc74b5e2	nir/search_helpers: assume float sources without preserve flag can't be inf/nan For example, this should let us avoid needing one pattern with is_a_number and one with nnan. Foz-DB Navi48: Totals from 3564 (3.11% of 114655) affected shaders: Instrs: 8256755 -> 8255042 (-0.02%); split: -0.02%, +0.00% CodeSize: 43143184 -> 43123192 (-0.05%); split: -0.05%, +0.00% VGPRs: 268252 -> 268240 (-0.00%) Latency: 218890225 -> 218881157 (-0.00%); split: -0.00%, +0.00% InvThroughput: 31044516 -> 31042297 (-0.01%); split: -0.01%, +0.00% VClause: 96074 -> 96067 (-0.01%); split: -0.01%, +0.00% SClause: 218042 -> 218037 (-0.00%); split: -0.00%, +0.00% Copies: 508677 -> 508661 (-0.00%); split: -0.01%, +0.01% Branches: 148570 -> 148569 (-0.00%) PreSGPRs: 228110 -> 228082 (-0.01%); split: -0.01%, +0.00% PreVGPRs: 231996 -> 231982 (-0.01%) VALU: 4516327 -> 4515321 (-0.02%); split: -0.02%, +0.00% SALU: 1353696 -> 1353590 (-0.01%); split: -0.01%, +0.00% VMEM: 182189 -> 182179 (-0.01%) SMEM: 344771 -> 344756 (-0.00%) VOPD: 29463 -> 29438 (-0.08%) Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40291>	2026-03-13 07:13:10 +00:00
Georg Lehmann	4885e5cf3a	nir: remove more fsat using range analysis Foz-DB Navi48: Totals from 3018 (3.65% of 82636) affected shaders: MaxWaves: 69274 -> 69280 (+0.01%) Instrs: 7165414 -> 7157581 (-0.11%); split: -0.12%, +0.01% CodeSize: 38890212 -> 38823132 (-0.17%); split: -0.18%, +0.00% VGPRs: 228672 -> 228624 (-0.02%) Latency: 64789026 -> 64784877 (-0.01%); split: -0.01%, +0.00% InvThroughput: 11805156 -> 11802642 (-0.02%); split: -0.02%, +0.00% VClause: 136900 -> 136886 (-0.01%); split: -0.03%, +0.02% SClause: 150135 -> 150130 (-0.00%); split: -0.01%, +0.01% Copies: 574690 -> 574894 (+0.04%); split: -0.03%, +0.06% Branches: 187169 -> 187086 (-0.04%); split: -0.04%, +0.00% PreSGPRs: 190074 -> 190067 (-0.00%); split: -0.00%, +0.00% PreVGPRs: 189564 -> 189538 (-0.01%); split: -0.02%, +0.00% VALU: 3955188 -> 3949411 (-0.15%); split: -0.15%, +0.00% SALU: 1114659 -> 1114729 (+0.01%); split: -0.02%, +0.03% SMEM: 231080 -> 231077 (-0.00%); split: -0.00%, +0.00% VOPD: 116150 -> 116180 (+0.03%); split: +0.04%, -0.02% Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39987>	2026-03-07 05:01:45 +00:00
Georg Lehmann	506bb5a609	nir/search_helpers: use fp class analysis more Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39987>	2026-03-07 05:01:45 +00:00
Georg Lehmann	eb431efc19	nir/search_helpers: switch to fp class analysis Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39987>	2026-03-07 05:01:44 +00:00
Georg Lehmann	32b5719a9f	nir/opt_algebraic: add is_not_uint_zero for b2i16(uge) pattern More fallout from `f2a59fdea6`. is_not_zero now always returns whether the result is a floating point zero. When combined with the fp denorm handling that will be added to floating point range analysis, this is false for many sensible integer values. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39987>	2026-03-07 05:01:44 +00:00
Georg Lehmann	bca5aab2be	nir: let nir_analyze_fp_range take a nir_def This is mildly worse for vector constants, but so much simpler. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39756>	2026-02-16 18:08:53 +00:00
Georg Lehmann	474af815ff	nir: rename nir_analyze_range because it's float only Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39756>	2026-02-16 18:08:53 +00:00
Georg Lehmann	486ea54184	nir/opt_algebraic: make bcsel(fcmp(b, a), b, a) -> fmin/fmax patterns exact These patterns need is_only_used_as_float because fmin/fmax might change NaN patterns, while bcsel is bit exact. For the same reason, the replacement must not add undefined results, so make the replacement NaN/inf preserving. It's impossible to make them signed zero correct (-0.0 == +0.0), so it's also important that the user alu doesn't care. Otherwise, the only thing that matters is whether a is NaN. Foz-DB Navi48: Totals from 453 (0.55% of 82405) affected shaders: MaxWaves: 8242 -> 8270 (+0.34%) Instrs: 2382059 -> 2380094 (-0.08%); split: -0.09%, +0.00% CodeSize: 13197208 -> 13179488 (-0.13%); split: -0.14%, +0.00% VGPRs: 44688 -> 44604 (-0.19%) Latency: 22839894 -> 22838985 (-0.00%); split: -0.01%, +0.00% InvThroughput: 4873352 -> 4872924 (-0.01%) VClause: 50862 -> 50883 (+0.04%); split: -0.02%, +0.06% SClause: 54000 -> 53993 (-0.01%) Copies: 250215 -> 250233 (+0.01%); split: -0.00%, +0.01% PreVGPRs: 39694 -> 39620 (-0.19%) VALU: 1116881 -> 1116073 (-0.07%); split: -0.07%, +0.00% SALU: 492799 -> 492139 (-0.13%); split: -0.14%, +0.00% VOPD: 85457 -> 85461 (+0.00%) Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39641>	2026-02-10 18:42:03 +00:00
Georg Lehmann	b2d9615000	nir/opt_algebraic: optimize bcsel to hi 16bits with undef lo Reviewed-by: Marek Olšák <marek.olsak@amd.com> Acked-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39412>	2026-01-26 10:54:20 +00:00
Rhys Perry	625afb0d29	nir: add fcanonicalize v2(Georg Lehmann): Always remove fcanonicalize if denorms must be neither flushed nor preserved. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39180>	2026-01-19 16:11:29 +00:00
Emma Anholt	8ebe630a13	nir/search_helpers: Avoid UB in is_2x_16_bits()/is_neg2x_16_bits(). Same trick we do for nir_imul evaluation -- do the multiply in unsigned to get defined behavior from C. Fixes UBSan failures with nir_opt_algebraic_pattern_tests. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39076>	2026-01-15 19:09:37 +00:00
Georg Lehmann	93d05cdfd8	nir/opt_algebraic: move fsat last for fsqrt(fsat(a)) This should be exact, even for all special values: fsqrt(NaN) -> NaN fsqrt(-0.0) -> 0.0 fsqrt(-Inf) -> NaN fsqrt(negative finite) -> NaN So all of these get saturated to +0.0 All numbers >= 1.0 will have a square root >= 1.0, which will be saturated to 1.0 Moving the fsat guarantees that it can use an output modifier for hardware that has those, and shouldn't harm other hardware either. Foz-DB Navi21: Totals from 255 (0.31% of 82151) affected shaders: Instrs: 664906 -> 664194 (-0.11%) CodeSize: 3623500 -> 3619188 (-0.12%) Latency: 11336397 -> 11335688 (-0.01%); split: -0.01%, +0.00% InvThroughput: 2716430 -> 2715726 (-0.03%); split: -0.03%, +0.00% VALU: 442603 -> 441891 (-0.16%) Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39202>	2026-01-09 07:34:46 +00:00
Georg Lehmann	c8ce0df2d2	nir/opt_algebraic: replace is_negative_zero with constant -0.0 Now that nir_search respects the sign of zero, we don't need a manual helper for this. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39123>	2026-01-03 12:42:23 +00:00
Pavel Ondračka	0b39b5ea63	nir/opt_algebraic: improve dot product narrowing The issue is that the current narrowing patterns are not working in a lot of cases, for example (('fdot3', ('vec3', a, 0.0, 0.0), b), ('fmul', a, b)), is missing patterns like this: 32x3 %1 = load_const (0x3f800000, 0x00000000, 0x00000000) = (1.000000, 0.000000, 0.000000) 32x4 %7 = vec4 %6, %2 (0x0), %2 (0x0), %2 (0x0) 32 %19 = fdot3 %1 (1.000000, 0.000000, 0.000000), %7.xyz or after some later transforms: 32x2 %0 = load_const (0x3f800000, 0x00000000) = (1.000000, 0.000000) 32x2 %6 = vec2 %5, %1 (0x0) 32 %18 = fdot3 %0 (1.000000, 0.000000).xyy, %6.xyy This patch is heavily based on old branch from Ian Romanick from 2019. r300 RV530 shader-db: total instructions in shared programs: 128900 -> 128882 (-0.01%) instructions in affected programs: 621 -> 603 (-2.90%) helped: 10 HURT: 1 total cycles in shared programs: 191837 -> 191828 (<.01%) cycles in affected programs: 799 -> 790 (-1.13%) helped: 7 HURT: 1 Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39068>	2026-01-02 16:07:10 +01:00
Konstantin Seurer	de32f9275f	treewide: add & use parent instr helpers We add a bunch of new helpers to avoid the need to touch >parent_instr, including the full set of: * nir_def_is_* * nir_def_as_*_or_null * nir_def_as_* [assumes the right instr type] * nir_src_is_* * nir_src_as_* * nir_scalar_is_* * nir_scalar_as_* Plus nir_def_instr() where there's no more suitable helper. Also an existing helper is renamed to unify all the names, while we're churning the tree: * nir_src_as_alu_instr -> nir_src_as_alu ..and then we port the tree to use the helpers as much as possible, using nir_def_instr() where that does not work. Acked-by: Marek Olšák <maraeo@gmail.com> --- To eliminate nir_def::parent_instr we need to churn the tree anyway, so I'm taking this opportunity to clean up a lot of NIR patterns. Co-authored-by: Konstantin Seurer <konstantin.seurer@gmail.com> Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38313>	2025-11-12 21:22:13 +00:00
Rhys Perry	4f83059ac5	nir/algebraic: improve is_unsigned_multiple_of_4 and use it more fossil-db (gfx1201): Totals from 160 (0.20% of 79839) affected shaders: MaxWaves: 4008 -> 3952 (-1.40%) Instrs: 390073 -> 379834 (-2.62%); split: -2.63%, +0.00% CodeSize: 2126020 -> 2053740 (-3.40%); split: -3.40%, +0.00% VGPRs: 9492 -> 9612 (+1.26%) Latency: 6746019 -> 6723893 (-0.33%); split: -0.33%, +0.00% InvThroughput: 849571 -> 848942 (-0.07%); split: -0.42%, +0.35% VClause: 11977 -> 11983 (+0.05%); split: -0.20%, +0.25% SClause: 11828 -> 11824 (-0.03%); split: -0.14%, +0.11% Copies: 30003 -> 30938 (+3.12%); split: -0.09%, +3.20% PreSGPRs: 8914 -> 8938 (+0.27%) PreVGPRs: 7352 -> 7514 (+2.20%); split: -0.04%, +2.24% VALU: 171829 -> 168829 (-1.75%); split: -1.76%, +0.01% SALU: 66503 -> 66543 (+0.06%); split: -0.01%, +0.07% VMEM: 29365 -> 25327 (-13.75%) VOPD: 864 -> 1013 (+17.25%) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36760>	2025-08-22 15:45:55 +00:00
Rhys Perry	2a12624532	nir/search: add nir_search_state A future commit will add another hash table. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36760>	2025-08-22 15:45:55 +00:00
Georg Lehmann	e43ef6533b	nir/opt_algebraic: remove 8bit roundtrip when vectorizing i2i16(unpack_4x8(a).zw) Explicit 16bit instructions are nicer to vectorize. Helps FSR4 on GFX11 marginally. Foz-DB Navi31: Totals from 10 out of 14 FSR4 shaders: Instrs: 59781 -> 58518 (-2.11%) CodeSize: 413428 -> 404156 (-2.24%) Latency: 193770 -> 190768 (-1.55%) InvThroughput: 226274 -> 221628 (-2.05%) VClause: 796 -> 793 (-0.38%); split: -1.01%, +0.63% Copies: 3342 -> 3008 (-9.99%); split: -11.01%, +1.02% PreSGPRs: 312 -> 305 (-2.24%) VALU: 51448 -> 50213 (-2.40%) SALU: 1074 -> 1048 (-2.42%) VOPD: 1783 -> 1718 (-3.65%); split: +0.95%, -4.60% Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36117>	2025-07-30 07:25:51 +00:00
Georg Lehmann	045ddb992a	nir/opt_algebraic: optimize 16bit vec2 comparison followed by b2i16 using usub_sat Helps vectorized emulated fp16 -> fp8 conversions No Foz-DB changes. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35876>	2025-07-03 20:08:39 +00:00
Alyssa Rosenzweig	6efe557718	nir/search_helpers: add has_multiple_uses helper heuristic for the next patch. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Mel Henning <mhenning@darkrefraction.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35720>	2025-06-26 16:41:55 +00:00
Marek Olšák	c3034fa82c	amd: replace most u_bit_consecutive* with BITFIELD_MASK/RANGE Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35346>	2025-06-04 17:46:38 +00:00
Georg Lehmann	b386659588	nir/opt_algebraic: create ubfe from (a & mask) >> c Foz-DB Navi21: Totals from 917 (1.16% of 79188) affected shaders: Instrs: 2549482 -> 2544997 (-0.18%); split: -0.18%, +0.00% CodeSize: 13781648 -> 13763616 (-0.13%); split: -0.13%, +0.00% Latency: 24832087 -> 24825199 (-0.03%); split: -0.04%, +0.01% InvThroughput: 5921339 -> 5914799 (-0.11%); split: -0.12%, +0.01% VClause: 59910 -> 59898 (-0.02%); split: -0.02%, +0.00% SClause: 62294 -> 62293 (-0.00%) Copies: 221015 -> 220988 (-0.01%); split: -0.02%, +0.01% VALU: 1717280 -> 1713332 (-0.23%); split: -0.23%, +0.00% SALU: 359390 -> 358910 (-0.13%) VMEM: 101966 -> 101924 (-0.04%) Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33455>	2025-03-14 11:15:04 +00:00
Georg Lehmann	5da76df4cd	nir/search_helpers: check tex source type in is_only_used_as_float Foz-DB Navi21: Totals from 164 (0.21% of 79377) affected shaders: Instrs: 197477 -> 197035 (-0.22%); split: -0.23%, +0.01% CodeSize: 1052944 -> 1051140 (-0.17%); split: -0.18%, +0.01% VGPRs: 8104 -> 8080 (-0.30%) Latency: 1115663 -> 1115567 (-0.01%); split: -0.06%, +0.05% InvThroughput: 265822 -> 265158 (-0.25%); split: -0.26%, +0.01% VClause: 3792 -> 3789 (-0.08%); split: -0.11%, +0.03% SClause: 5738 -> 5744 (+0.10%); split: -0.02%, +0.12% Copies: 12223 -> 12200 (-0.19%); split: -0.53%, +0.34% PreVGPRs: 6807 -> 6801 (-0.09%); split: -0.15%, +0.06% VALU: 139206 -> 138785 (-0.30%); split: -0.31%, +0.01% SALU: 27852 -> 27853 (+0.00%) Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33674>	2025-02-24 16:34:53 +00:00
Georg Lehmann	3d8585e4fc	nir/search_helpers: look through vecs in is_only_used_as_float Will be useful with the next commit, or for backends that don't lower alu to scalar. No changes on Navi21. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33674>	2025-02-24 16:34:53 +00:00
Mel Henning	0470643047	nak,nir: Add 32-bit nir_op_lea_nv and use it Changes code size by -0.80% on shaderdb. Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32517>	2025-02-13 17:36:41 +00:00
Alyssa Rosenzweig	be049e1c14	nir/search_helpers: handle bcsel in is_only_used_as_float this lets algebraic see through chains of instructions. v2: Limit recursion depth (Georg). Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Marek Olšák <marek.olsak@amd.com> [v1] Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32398>	2024-12-05 10:58:51 +00:00
Job Noorman	1333af5d77	nir/search: add is_only_used_by_{iand,ior} helpers Signed-off-by: Job Noorman <jnoorman@igalia.com> Reviewed-by: Rob Clark <robclark@freedesktop.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32181>	2024-11-28 06:19:59 +00:00
Job Noorman	a8c947df9a	nir/search: make is_only_used_by_iadd reusable The algorithm is exactly the same for other opcodes so we don't have to copy-paste it. Signed-off-by: Job Noorman <jnoorman@igalia.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Rob Clark <robclark@freedesktop.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32181>	2024-11-28 06:19:59 +00:00
Rhys Perry	b8c8482dbb	nir/algebraic: add ddxy to is_only_used_as_float The sources for these intrinsics are floating point. fossil-db (navi21): Totals from 67 (0.08% of 79395) affected shaders: MaxWaves: 1128 -> 1116 (-1.06%) Instrs: 113552 -> 113319 (-0.21%); split: -0.21%, +0.01% CodeSize: 595248 -> 593360 (-0.32%) VGPRs: 4344 -> 4392 (+1.10%) Latency: 578158 -> 577526 (-0.11%); split: -0.18%, +0.07% InvThroughput: 170150 -> 169908 (-0.14%); split: -0.23%, +0.09% SClause: 3787 -> 3780 (-0.18%) Copies: 4305 -> 4294 (-0.26%); split: -0.51%, +0.26% PreVGPRs: 3883 -> 3925 (+1.08%) VALU: 90007 -> 89774 (-0.26%); split: -0.27%, +0.01% Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32145>	2024-11-21 14:50:45 +00:00
Timur Kristóf	be68aeafdc	nir/opt_algebraic: Add various bitfield extract patterns. v2 (Georg Lehmann): - fixed incorrect imin in ubfe_ubfe - simplified outer_bits of ushr((ubfe, ...), ...) opt - added is_used_once to iand(ushr(), ...) opt to improve stats Foz-DB Navi21: Totals from 3309 (4.18% of 79206) affected shaders: Instrs: 5295291 -> 5282128 (-0.25%); split: -0.28%, +0.03% CodeSize: 28299320 -> 28298456 (-0.00%); split: -0.07%, +0.06% Latency: 51566173 -> 51521923 (-0.09%); split: -0.09%, +0.01% InvThroughput: 13222050 -> 13204557 (-0.13%); split: -0.14%, +0.01% VClause: 116451 -> 116458 (+0.01%); split: -0.02%, +0.02% SClause: 160356 -> 160324 (-0.02%); split: -0.03%, +0.01% Copies: 424152 -> 423670 (-0.11%); split: -0.20%, +0.09% Branches: 156701 -> 156192 (-0.32%); split: -0.33%, +0.01% PreSGPRs: 168507 -> 168500 (-0.00%); split: -0.02%, +0.01% PreVGPRs: 151477 -> 151474 (-0.00%) VALU: 3486077 -> 3476675 (-0.27%); split: -0.31%, +0.04% SALU: 786467 -> 783109 (-0.43%); split: -0.45%, +0.03% VMEM: 188035 -> 188060 (+0.01%) SMEM: 259632 -> 259630 (-0.00%) Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31852>	2024-10-29 10:51:09 +00:00
Christian Gmeiner	87786a7a7e	nak: Move imad late optimization to nir It is more or less just a code move, but I touched is_only_used_by_iadd(..) to match the style of the other functions in that file. Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30099>	2024-07-12 05:54:46 +00:00
Georg Lehmann	98cc57bccb	nir/optimize cmp(a, -0.0) +0.0 can use an inline constant for AMD hardware, -0.0 needs a literal. Foz-DB Navi21: Totals from 1014 (1.28% of 79395) affected shaders: Instrs: 3037490 -> 3036849 (-0.02%); split: -0.02%, +0.00% CodeSize: 17060228 -> 17051276 (-0.05%); split: -0.05%, +0.00% Latency: 45916788 -> 45916600 (-0.00%); split: -0.00%, +0.00% InvThroughput: 12982201 -> 12982187 (-0.00%); split: -0.00%, +0.00% VClause: 79475 -> 79478 (+0.00%) SClause: 119935 -> 119934 (-0.00%); split: -0.00%, +0.00% Copies: 301641 -> 300964 (-0.22%); split: -0.23%, +0.00% PreSGPRs: 59155 -> 59144 (-0.02%) VALU: 2032016 -> 2032034 (+0.00%) SALU: 386424 -> 385729 (-0.18%) Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29467>	2024-06-27 08:12:30 +00:00
Ian Romanick	4834df82e2	nir/algebraic: More patterns to generate iadd3 I noticed some shaders with patterns similar to these while working on cooperative matrix lowering. Meteor Lake and DG2 are the only platforms that support iadd3, so there were no shader-db or fossil-db changes on any other platforms. shader-db: Meteor Lake and DG2 had similar results. (Meteor Lake shown) total instructions in shared programs: 19869445 -> 19868343 (<.01%) instructions in affected programs: 419426 -> 418324 (-0.26%) helped: 913 / HURT: 2 total cycles in shared programs: 936010029 -> 935909811 (-0.01%) cycles in affected programs: 31746523 -> 31646305 (-0.32%) helped: 495 / HURT: 356 LOST: 10 GAINED: 12 fossil-db: Meteor Lake and DG2 had similar results. (Meteor Lake shown) Totals: Instrs: 154514596 -> 154505466 (-0.01%); split: -0.01%, +0.00% Cycle count: 17540226067 -> 17436266198 (-0.59%); split: -0.63%, +0.04% Spill count: 146887 -> 146886 (-0.00%) Fill count: 272499 -> 272489 (-0.00%); split: -0.01%, +0.00% Max live registers: 32634290 -> 32634739 (+0.00%); split: -0.00%, +0.00% Max dispatch width: 5550128 -> 5550368 (+0.00%) Totals from 4401 (0.70% of 632560) affected shaders: Instrs: 3095239 -> 3086109 (-0.29%); split: -0.30%, +0.00% Cycle count: 7327352564 -> 7223392695 (-1.42%); split: -1.51%, +0.10% Spill count: 28105 -> 28104 (-0.00%) Fill count: 45830 -> 45820 (-0.02%); split: -0.04%, +0.02% Max live registers: 264376 -> 264825 (+0.17%); split: -0.05%, +0.22% Max dispatch width: 43768 -> 44008 (+0.55%) Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29148>	2024-05-31 09:13:23 -07:00
Ian Romanick	f1b941aaec	nir/search: Refactor is_16_bits Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Suggested-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29148>	2024-05-31 09:13:23 -07:00
Ian Romanick	6e53be2a0a	nir/search: Fix is_16_bits for vectors Require that all elements of a vector be representable as either int16_t or uint16_t. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Fixes: `7ef45e661f` ("intel/fs: Add constant propagation for ADD3") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29148>	2024-05-31 09:13:23 -07:00
Job Noorman	96c2fe3e1a	nir: add search helper is_only_used_by_if Signed-off-by: Job Noorman <jnoorman@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27411>	2024-03-01 13:45:11 +00:00
Alyssa Rosenzweig	c39896b17b	nir: Use getters for nir_src::parent_* First, we need to give the parent_instr field a unique name to be able to replace with a helper. We have parent_instr fields for both nir_src and nir_def, so let's rename nir_src::parent_instr in preparation for rework. This was done with a combination of sed and manual fix-ups. Then we use semantic patches plus manual fixups: @@ expression s; @@ -s->renamed_parent_instr +nir_src_parent_instr(s) @@ expression s; @@ -s.renamed_parent_instr +nir_src_parent_instr(&s) @@ expression s; @@ -s->parent_if +nir_src_parent_if(s) @@ expression s; @@ -s.renamed_parent_if +nir_src_parent_if(&s) @@ expression s; @@ -s->is_if +nir_src_is_if(s) @@ expression s; @@ -s.is_if +nir_src_is_if(&s) Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24671>	2023-10-10 04:58:05 -04:00
Marek Olšák	1ac379c4a0	nir/algebraic: collapse ALU opcodes sourcing NaN Undef will be replaced by NaN whenever it leads to elimination of FP instructions. This implements the elimination part. Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24792>	2023-08-19 14:18:52 -04:00
Faith Ekstrand	6c1d32581a	nir: Drop nir_alu_dest Instead, we replace it directly with nir_def. We could replace it with nir_dest but the next commit gets rid of that so this avoids unnecessary churn. Most of this commit was generated by sed: sed -i -e 's/dest.dest.ssa/def/g' src/*/.h src/*/.c src/*/.cpp There were a few manual fixups required in the nir_legacy.c and nir_from_ssa.c as nir_legacy_reg and nir_parallel_copy_entry both have a similar pattern. Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24674>	2023-08-14 21:22:53 +00:00
Alyssa Rosenzweig	09d31922de	nir: Drop "SSA" from NIR language Everything is SSA now. sed -e 's/nir_ssa_def/nir_def/g' \ -e 's/nir_ssa_undef/nir_undef/g' \ -e 's/nir_ssa_scalar/nir_scalar/g' \ -e 's/nir_src_rewrite_ssa/nir_src_rewrite/g' \ -e 's/nir_gather_ssa_types/nir_gather_types/g' \ -i $(git grep -l nir \| grep -v relnotes) git mv src/compiler/nir/nir_gather_ssa_types.c \ src/compiler/nir/nir_gather_types.c ninja -C build/ clang-format cd src/compiler/nir && find .c .h -type f -exec clang-format -i \{} \; Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com> Acked-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24585>	2023-08-12 16:44:41 -04:00
Faith Ekstrand	777d336b1f	nir: clang-format src/compiler/nir/*.[ch] Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24382>	2023-08-12 19:27:28 +00:00
Ian Romanick	de60b463d7	nir/algebraic: Simplify various trivial bfi These are mostly just obvious patterns that somebody will eventually want to add. DG2, Tiger Lake, Ice Lake, Skylake, Broadwell, and Haswell had similar results (Ice Lake shown) total instructions in shared programs: 20570033 -> 20570026 (<.01%) instructions in affected programs: 7363 -> 7356 (-0.10%) helped: 6 / HURT: 0 total cycles in shared programs: 902118781 -> 902118854 (<.01%) cycles in affected programs: 419132 -> 419205 (0.02%) helped: 4 / HURT: 2 DG2, Tiger Lake, Ice Lake, and Skylake had similar results (Ice Lake shown) Totals: Instrs: 152819500 -> 152819380 (-0.00%) Cycles: 15014627187 -> 15014624437 (-0.00%) Totals from 115 (0.02% of 662497) affected shaders: Instrs: 28963 -> 28843 (-0.41%) Cycles: 404582 -> 401832 (-0.68%) Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19968>	2023-06-14 18:49:53 +00:00
Ian Romanick	7ef45e661f	intel/fs: Add constant propagation for ADD3 v2: Require that the constant value be representable as either uint16_t or int16_t. Suggested by Matt. v3: Remove redundant patterns. Noticed by Matt. shader-db: DG2 total instructions in shared programs: 23103767 -> 23103577 (<.01%) instructions in affected programs: 51822 -> 51632 (-0.37%) helped: 98 / HURT: 15 total cycles in shared programs: 842347714 -> 842380017 (<.01%) cycles in affected programs: 1942595 -> 1974898 (1.66%) helped: 97 / HURT: 32 Nearly all of the affected shaders (around 9,900) are shaders in Cyberpunk 2077. It's about an even split between vertex and fragment shaders. The majority of the remaining affected shaders (3,600) are from Strange Brigade. This was also a nearly even split between fragment and vertex. All but two of the lost shaders are SIMD32 fragment shaders in Cyberpunk 2077. The other two are SIMD32 fragment shaders in Dota2. fossil-db: DG2 Instructions in all programs: 196379107 -> 196248608 (-0.1%) helped: 13467 / HURT: 1210 Cycles in all programs: 13931355281 -> 13929955971 (-0.0%) helped: 11801 / HURT: 2922 Lost: 90 Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23262>	2023-06-06 06:10:53 +00:00
Alyssa Rosenzweig	7f6491b76d	nir: Combine if_uses with instruction uses Every nir_ssa_def is part of a chain of uses, implemented with doubly linked lists. That means each requires 2 * 64-bit = 16 bytes per def, which is memory intensive. Together they require 32 bytes per def. Not cool. To cut that memory use in half, we can combine the two linked lists into a single use list that contains both regular instruction uses and if-uses. To do this, we augment the nir_src with a boolean "is_if", and reimplement the abstract if-uses operations on top of that list. That boolean should fit into the padding already in nir_src so should not actually affect memory use, and in the future we sneak it into the bottom bit of a pointer. However, this creates a new inefficiency: now iterating over regular uses separate from if-uses is (nominally) more expensive. It turns out virtually every caller of nir_foreach_if_use(_safe) also calls nir_foreach_use(_safe) immediately before, so we rewrite most of the callers to instead call a new single `nir_foreach_use_including_if(_safe)` which predicates the logic based on `src->is_if`. This should mitigate the performance difference. There's a bit of churn, but this is largely a mechanical set of changes. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22343>	2023-04-07 23:48:03 +00:00
Rhys Perry	368be87255	nir/algebraic: shrink 64-bit bitwise operations with 0/-1 constant half fossil-db (navi21): Totals from 457 (0.34% of 135636) affected shaders: Instrs: 259349 -> 250383 (-3.46%) CodeSize: 1411976 -> 1369136 (-3.03%) Latency: 2175961 -> 2148158 (-1.28%) InvThroughput: 502206 -> 490244 (-2.38%) Copies: 15238 -> 15232 (-0.04%); split: -0.07%, +0.03% Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19748>	2022-11-21 17:34:46 +00:00

95 commits