fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-18 18:08:15 +02:00

Author	SHA1	Message	Date
Georg Lehmann	045ddb992a	nir/opt_algebraic: optimize 16bit vec2 comparison followed by b2i16 using usub_sat Helps vectorized emulated fp16 -> fp8 conversions No Foz-DB changes. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35876>	2025-07-03 20:08:39 +00:00
Alyssa Rosenzweig	6efe557718	nir/search_helpers: add has_multiple_uses helper heuristic for the next patch. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Mel Henning <mhenning@darkrefraction.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35720>	2025-06-26 16:41:55 +00:00
Marek Olšák	c3034fa82c	amd: replace most u_bit_consecutive* with BITFIELD_MASK/RANGE Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35346>	2025-06-04 17:46:38 +00:00
Georg Lehmann	b386659588	nir/opt_algebraic: create ubfe from (a & mask) >> c Foz-DB Navi21: Totals from 917 (1.16% of 79188) affected shaders: Instrs: 2549482 -> 2544997 (-0.18%); split: -0.18%, +0.00% CodeSize: 13781648 -> 13763616 (-0.13%); split: -0.13%, +0.00% Latency: 24832087 -> 24825199 (-0.03%); split: -0.04%, +0.01% InvThroughput: 5921339 -> 5914799 (-0.11%); split: -0.12%, +0.01% VClause: 59910 -> 59898 (-0.02%); split: -0.02%, +0.00% SClause: 62294 -> 62293 (-0.00%) Copies: 221015 -> 220988 (-0.01%); split: -0.02%, +0.01% VALU: 1717280 -> 1713332 (-0.23%); split: -0.23%, +0.00% SALU: 359390 -> 358910 (-0.13%) VMEM: 101966 -> 101924 (-0.04%) Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33455>	2025-03-14 11:15:04 +00:00
Georg Lehmann	5da76df4cd	nir/search_helpers: check tex source type in is_only_used_as_float Foz-DB Navi21: Totals from 164 (0.21% of 79377) affected shaders: Instrs: 197477 -> 197035 (-0.22%); split: -0.23%, +0.01% CodeSize: 1052944 -> 1051140 (-0.17%); split: -0.18%, +0.01% VGPRs: 8104 -> 8080 (-0.30%) Latency: 1115663 -> 1115567 (-0.01%); split: -0.06%, +0.05% InvThroughput: 265822 -> 265158 (-0.25%); split: -0.26%, +0.01% VClause: 3792 -> 3789 (-0.08%); split: -0.11%, +0.03% SClause: 5738 -> 5744 (+0.10%); split: -0.02%, +0.12% Copies: 12223 -> 12200 (-0.19%); split: -0.53%, +0.34% PreVGPRs: 6807 -> 6801 (-0.09%); split: -0.15%, +0.06% VALU: 139206 -> 138785 (-0.30%); split: -0.31%, +0.01% SALU: 27852 -> 27853 (+0.00%) Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33674>	2025-02-24 16:34:53 +00:00
Georg Lehmann	3d8585e4fc	nir/search_helpers: look through vecs in is_only_used_as_float Will be useful with the next commit, or for backends that don't lower alu to scalar. No changes on Navi21. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33674>	2025-02-24 16:34:53 +00:00
Mel Henning	0470643047	nak,nir: Add 32-bit nir_op_lea_nv and use it Changes code size by -0.80% on shaderdb. Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32517>	2025-02-13 17:36:41 +00:00
Alyssa Rosenzweig	be049e1c14	nir/search_helpers: handle bcsel in is_only_used_as_float this lets algebraic see through chains of instructions. v2: Limit recursion depth (Georg). Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Marek Olšák <marek.olsak@amd.com> [v1] Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32398>	2024-12-05 10:58:51 +00:00
Job Noorman	1333af5d77	nir/search: add is_only_used_by_{iand,ior} helpers Signed-off-by: Job Noorman <jnoorman@igalia.com> Reviewed-by: Rob Clark <robclark@freedesktop.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32181>	2024-11-28 06:19:59 +00:00
Job Noorman	a8c947df9a	nir/search: make is_only_used_by_iadd reusable The algorithm is exactly the same for other opcodes so we don't have to copy-paste it. Signed-off-by: Job Noorman <jnoorman@igalia.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Rob Clark <robclark@freedesktop.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32181>	2024-11-28 06:19:59 +00:00
Rhys Perry	b8c8482dbb	nir/algebraic: add ddxy to is_only_used_as_float The sources for these intrinsics are floating point. fossil-db (navi21): Totals from 67 (0.08% of 79395) affected shaders: MaxWaves: 1128 -> 1116 (-1.06%) Instrs: 113552 -> 113319 (-0.21%); split: -0.21%, +0.01% CodeSize: 595248 -> 593360 (-0.32%) VGPRs: 4344 -> 4392 (+1.10%) Latency: 578158 -> 577526 (-0.11%); split: -0.18%, +0.07% InvThroughput: 170150 -> 169908 (-0.14%); split: -0.23%, +0.09% SClause: 3787 -> 3780 (-0.18%) Copies: 4305 -> 4294 (-0.26%); split: -0.51%, +0.26% PreVGPRs: 3883 -> 3925 (+1.08%) VALU: 90007 -> 89774 (-0.26%); split: -0.27%, +0.01% Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32145>	2024-11-21 14:50:45 +00:00
Timur Kristóf	be68aeafdc	nir/opt_algebraic: Add various bitfield extract patterns. v2 (Georg Lehmann): - fixed incorrect imin in ubfe_ubfe - simplified outer_bits of ushr((ubfe, ...), ...) opt - added is_used_once to iand(ushr(), ...) opt to improve stats Foz-DB Navi21: Totals from 3309 (4.18% of 79206) affected shaders: Instrs: 5295291 -> 5282128 (-0.25%); split: -0.28%, +0.03% CodeSize: 28299320 -> 28298456 (-0.00%); split: -0.07%, +0.06% Latency: 51566173 -> 51521923 (-0.09%); split: -0.09%, +0.01% InvThroughput: 13222050 -> 13204557 (-0.13%); split: -0.14%, +0.01% VClause: 116451 -> 116458 (+0.01%); split: -0.02%, +0.02% SClause: 160356 -> 160324 (-0.02%); split: -0.03%, +0.01% Copies: 424152 -> 423670 (-0.11%); split: -0.20%, +0.09% Branches: 156701 -> 156192 (-0.32%); split: -0.33%, +0.01% PreSGPRs: 168507 -> 168500 (-0.00%); split: -0.02%, +0.01% PreVGPRs: 151477 -> 151474 (-0.00%) VALU: 3486077 -> 3476675 (-0.27%); split: -0.31%, +0.04% SALU: 786467 -> 783109 (-0.43%); split: -0.45%, +0.03% VMEM: 188035 -> 188060 (+0.01%) SMEM: 259632 -> 259630 (-0.00%) Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31852>	2024-10-29 10:51:09 +00:00
Christian Gmeiner	87786a7a7e	nak: Move imad late optimization to nir It is more or less just a code move, but I touched is_only_used_by_iadd(..) to match the style of the other functions in that file. Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30099>	2024-07-12 05:54:46 +00:00
Georg Lehmann	98cc57bccb	nir/optimize cmp(a, -0.0) +0.0 can use an inline constant for AMD hardware, -0.0 needs a literal. Foz-DB Navi21: Totals from 1014 (1.28% of 79395) affected shaders: Instrs: 3037490 -> 3036849 (-0.02%); split: -0.02%, +0.00% CodeSize: 17060228 -> 17051276 (-0.05%); split: -0.05%, +0.00% Latency: 45916788 -> 45916600 (-0.00%); split: -0.00%, +0.00% InvThroughput: 12982201 -> 12982187 (-0.00%); split: -0.00%, +0.00% VClause: 79475 -> 79478 (+0.00%) SClause: 119935 -> 119934 (-0.00%); split: -0.00%, +0.00% Copies: 301641 -> 300964 (-0.22%); split: -0.23%, +0.00% PreSGPRs: 59155 -> 59144 (-0.02%) VALU: 2032016 -> 2032034 (+0.00%) SALU: 386424 -> 385729 (-0.18%) Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29467>	2024-06-27 08:12:30 +00:00
Ian Romanick	4834df82e2	nir/algebraic: More patterns to generate iadd3 I noticed some shaders with patterns similar to these while working on cooperative matrix lowering. Meteor Lake and DG2 are the only platforms that support iadd3, so there were no shader-db or fossil-db changes on any other platforms. shader-db: Meteor Lake and DG2 had similar results. (Meteor Lake shown) total instructions in shared programs: 19869445 -> 19868343 (<.01%) instructions in affected programs: 419426 -> 418324 (-0.26%) helped: 913 / HURT: 2 total cycles in shared programs: 936010029 -> 935909811 (-0.01%) cycles in affected programs: 31746523 -> 31646305 (-0.32%) helped: 495 / HURT: 356 LOST: 10 GAINED: 12 fossil-db: Meteor Lake and DG2 had similar results. (Meteor Lake shown) Totals: Instrs: 154514596 -> 154505466 (-0.01%); split: -0.01%, +0.00% Cycle count: 17540226067 -> 17436266198 (-0.59%); split: -0.63%, +0.04% Spill count: 146887 -> 146886 (-0.00%) Fill count: 272499 -> 272489 (-0.00%); split: -0.01%, +0.00% Max live registers: 32634290 -> 32634739 (+0.00%); split: -0.00%, +0.00% Max dispatch width: 5550128 -> 5550368 (+0.00%) Totals from 4401 (0.70% of 632560) affected shaders: Instrs: 3095239 -> 3086109 (-0.29%); split: -0.30%, +0.00% Cycle count: 7327352564 -> 7223392695 (-1.42%); split: -1.51%, +0.10% Spill count: 28105 -> 28104 (-0.00%) Fill count: 45830 -> 45820 (-0.02%); split: -0.04%, +0.02% Max live registers: 264376 -> 264825 (+0.17%); split: -0.05%, +0.22% Max dispatch width: 43768 -> 44008 (+0.55%) Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29148>	2024-05-31 09:13:23 -07:00
Ian Romanick	f1b941aaec	nir/search: Refactor is_16_bits Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Suggested-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29148>	2024-05-31 09:13:23 -07:00
Ian Romanick	6e53be2a0a	nir/search: Fix is_16_bits for vectors Require that all elements of a vector be representable as either int16_t or uint16_t. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Fixes: `7ef45e661f` ("intel/fs: Add constant propagation for ADD3") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29148>	2024-05-31 09:13:23 -07:00
Job Noorman	96c2fe3e1a	nir: add search helper is_only_used_by_if Signed-off-by: Job Noorman <jnoorman@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27411>	2024-03-01 13:45:11 +00:00
Alyssa Rosenzweig	c39896b17b	nir: Use getters for nir_src::parent_* First, we need to give the parent_instr field a unique name to be able to replace with a helper. We have parent_instr fields for both nir_src and nir_def, so let's rename nir_src::parent_instr in preparation for rework. This was done with a combination of sed and manual fix-ups. Then we use semantic patches plus manual fixups: @@ expression s; @@ -s->renamed_parent_instr +nir_src_parent_instr(s) @@ expression s; @@ -s.renamed_parent_instr +nir_src_parent_instr(&s) @@ expression s; @@ -s->parent_if +nir_src_parent_if(s) @@ expression s; @@ -s.renamed_parent_if +nir_src_parent_if(&s) @@ expression s; @@ -s->is_if +nir_src_is_if(s) @@ expression s; @@ -s.is_if +nir_src_is_if(&s) Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24671>	2023-10-10 04:58:05 -04:00
Marek Olšák	1ac379c4a0	nir/algebraic: collapse ALU opcodes sourcing NaN Undef will be replaced by NaN whenever it leads to elimination of FP instructions. This implements the elimination part. Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24792>	2023-08-19 14:18:52 -04:00
Faith Ekstrand	6c1d32581a	nir: Drop nir_alu_dest Instead, we replace it directly with nir_def. We could replace it with nir_dest but the next commit gets rid of that so this avoids unnecessary churn. Most of this commit was generated by sed: sed -i -e 's/dest.dest.ssa/def/g' src/*/.h src/*/.c src/*/.cpp There were a few manual fixups required in the nir_legacy.c and nir_from_ssa.c as nir_legacy_reg and nir_parallel_copy_entry both have a similar pattern. Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24674>	2023-08-14 21:22:53 +00:00
Alyssa Rosenzweig	09d31922de	nir: Drop "SSA" from NIR language Everything is SSA now. sed -e 's/nir_ssa_def/nir_def/g' \ -e 's/nir_ssa_undef/nir_undef/g' \ -e 's/nir_ssa_scalar/nir_scalar/g' \ -e 's/nir_src_rewrite_ssa/nir_src_rewrite/g' \ -e 's/nir_gather_ssa_types/nir_gather_types/g' \ -i $(git grep -l nir \| grep -v relnotes) git mv src/compiler/nir/nir_gather_ssa_types.c \ src/compiler/nir/nir_gather_types.c ninja -C build/ clang-format cd src/compiler/nir && find .c .h -type f -exec clang-format -i \{} \; Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com> Acked-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24585>	2023-08-12 16:44:41 -04:00
Faith Ekstrand	777d336b1f	nir: clang-format src/compiler/nir/*.[ch] Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24382>	2023-08-12 19:27:28 +00:00
Ian Romanick	de60b463d7	nir/algebraic: Simplify various trivial bfi These are mostly just obvious patterns that somebody will eventually want to add. DG2, Tiger Lake, Ice Lake, Skylake, Broadwell, and Haswell had similar results (Ice Lake shown) total instructions in shared programs: 20570033 -> 20570026 (<.01%) instructions in affected programs: 7363 -> 7356 (-0.10%) helped: 6 / HURT: 0 total cycles in shared programs: 902118781 -> 902118854 (<.01%) cycles in affected programs: 419132 -> 419205 (0.02%) helped: 4 / HURT: 2 DG2, Tiger Lake, Ice Lake, and Skylake had similar results (Ice Lake shown) Totals: Instrs: 152819500 -> 152819380 (-0.00%) Cycles: 15014627187 -> 15014624437 (-0.00%) Totals from 115 (0.02% of 662497) affected shaders: Instrs: 28963 -> 28843 (-0.41%) Cycles: 404582 -> 401832 (-0.68%) Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19968>	2023-06-14 18:49:53 +00:00
Ian Romanick	7ef45e661f	intel/fs: Add constant propagation for ADD3 v2: Require that the constant value be representable as either uint16_t or int16_t. Suggested by Matt. v3: Remove redundant patterns. Noticed by Matt. shader-db: DG2 total instructions in shared programs: 23103767 -> 23103577 (<.01%) instructions in affected programs: 51822 -> 51632 (-0.37%) helped: 98 / HURT: 15 total cycles in shared programs: 842347714 -> 842380017 (<.01%) cycles in affected programs: 1942595 -> 1974898 (1.66%) helped: 97 / HURT: 32 Nearly all of the affected shaders (around 9,900) are shaders in Cyberpunk 2077. It's about an even split between vertex and fragment shaders. The majority of the remaining affected shaders (3,600) are from Strange Brigade. This was also a nearly even split between fragment and vertex. All but two of the lost shaders are SIMD32 fragment shaders in Cyberpunk 2077. The other two are SIMD32 fragment shaders in Dota2. fossil-db: DG2 Instructions in all programs: 196379107 -> 196248608 (-0.1%) helped: 13467 / HURT: 1210 Cycles in all programs: 13931355281 -> 13929955971 (-0.0%) helped: 11801 / HURT: 2922 Lost: 90 Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23262>	2023-06-06 06:10:53 +00:00
Alyssa Rosenzweig	7f6491b76d	nir: Combine if_uses with instruction uses Every nir_ssa_def is part of a chain of uses, implemented with doubly linked lists. That means each requires 2 * 64-bit = 16 bytes per def, which is memory intensive. Together they require 32 bytes per def. Not cool. To cut that memory use in half, we can combine the two linked lists into a single use list that contains both regular instruction uses and if-uses. To do this, we augment the nir_src with a boolean "is_if", and reimplement the abstract if-uses operations on top of that list. That boolean should fit into the padding already in nir_src so should not actually affect memory use, and in the future we sneak it into the bottom bit of a pointer. However, this creates a new inefficiency: now iterating over regular uses separate from if-uses is (nominally) more expensive. It turns out virtually every caller of nir_foreach_if_use(_safe) also calls nir_foreach_use(_safe) immediately before, so we rewrite most of the callers to instead call a new single `nir_foreach_use_including_if(_safe)` which predicates the logic based on `src->is_if`. This should mitigate the performance difference. There's a bit of churn, but this is largely a mechanical set of changes. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22343>	2023-04-07 23:48:03 +00:00
Rhys Perry	368be87255	nir/algebraic: shrink 64-bit bitwise operations with 0/-1 constant half fossil-db (navi21): Totals from 457 (0.34% of 135636) affected shaders: Instrs: 259349 -> 250383 (-3.46%) CodeSize: 1411976 -> 1369136 (-3.03%) Latency: 2175961 -> 2148158 (-1.28%) InvThroughput: 502206 -> 490244 (-2.38%) Copies: 15238 -> 15232 (-0.04%); split: -0.07%, +0.03% Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19748>	2022-11-21 17:34:46 +00:00
Alyssa Rosenzweig	45a111c21c	nir/opt_algebraic: Fuse c - a * b to FMA Algebraically it is clear that -(a * b) + c = (-a) * b + c = fma(-a, b, c) But this is not clear from the NIR ('fadd', ('fneg', ('fmul', a, b)), c) Add rules to handle this case specially. Note we don't necessarily want to solve this by pushing fneg into fmul, because the rule opt_algebraic (not the late part where FMA fusing happens) specifically pulls fneg out of fmul to push fneg up multiplication chains. Noticed in the big glmark2 "terrain" shader, which has a cycle count reduced by 22% on Mali-G57 thanks to having this pattern a ton and being FMA bound. BEFORE: 1249 inst, 16.015625 cycles, 16.015625 fma, ... 632 quadwords AFTER: 997 inst, 12.437500 cycles, .... 504 quadwords Results on the same shader on AGX are also quite dramatic: BEFORE: 1294 inst, 8600 bytes, 50 halfregs, ... AFTER: 1154 inst, 8040 bytes, 50 halfregs, ... Similar rules apply for fabs. v2: Use a loop over the bit sizes (suggested by Emma). shader-db on Valhall (open + small subset of closed), results on Bifrost are similar: total instructions in shared programs: 167975 -> 164970 (-1.79%) instructions in affected programs: 92642 -> 89637 (-3.24%) helped: 492 HURT: 25 helped stats (abs) min: 1.0 max: 252.0 x̄: 6.25 x̃: 3 helped stats (rel) min: 0.30% max: 20.18% x̄: 3.21% x̃: 2.91% HURT stats (abs) min: 1.0 max: 5.0 x̄: 2.80 x̃: 3 HURT stats (rel) min: 0.46% max: 9.09% x̄: 3.89% x̃: 3.37% 95% mean confidence interval for instructions value: -6.95 -4.68 95% mean confidence interval for instructions %-change: -3.08% -2.65% Instructions are helped. 
total cycles in shared programs: 10556.89 -> 10538.98 (-0.17%) cycles in affected programs: 265.56 -> 247.66 (-6.74%) helped: 88 HURT: 2 helped stats (abs) min: 0.015625 max: 3.578125 x̄: 0.20 x̃: 0 helped stats (rel) min: 0.65% max: 22.34% x̄: 5.65% x̃: 4.25% HURT stats (abs) min: 0.0625 max: 0.0625 x̄: 0.06 x̃: 0 HURT stats (rel) min: 8.33% max: 12.50% x̄: 10.42% x̃: 10.42% 95% mean confidence interval for cycles value: -0.28 -0.12 95% mean confidence interval for cycles %-change: -6.30% -4.30% Cycles are helped. total fma in shared programs: 1582.42 -> 1535.06 (-2.99%) fma in affected programs: 871.58 -> 824.22 (-5.43%) helped: 502 HURT: 9 helped stats (abs) min: 0.015625 max: 3.578125 x̄: 0.09 x̃: 0 helped stats (rel) min: 0.60% max: 25.00% x̄: 5.46% x̃: 4.82% HURT stats (abs) min: 0.015625 max: 0.0625 x̄: 0.03 x̃: 0 HURT stats (rel) min: 4.35% max: 12.50% x̄: 6.22% x̃: 4.35% 95% mean confidence interval for fma value: -0.11 -0.08 95% mean confidence interval for fma %-change: -5.58% -4.93% Fma are helped. total cvt in shared programs: 665.55 -> 665.95 (0.06%) cvt in affected programs: 61.72 -> 62.12 (0.66%) helped: 33 HURT: 43 helped stats (abs) min: 0.015625 max: 0.359375 x̄: 0.04 x̃: 0 helped stats (rel) min: 1.01% max: 25.00% x̄: 6.68% x̃: 4.35% HURT stats (abs) min: 0.015625 max: 0.109375 x̄: 0.04 x̃: 0 HURT stats (rel) min: 0.78% max: 38.46% x̄: 10.85% x̃: 6.90% 95% mean confidence interval for cvt value: -0.01 0.02 95% mean confidence interval for cvt %-change: 0.23% 6.24% Inconclusive result (value mean confidence interval includes 0). 
total quadwords in shared programs: 93376 -> 91736 (-1.76%) quadwords in affected programs: 25376 -> 23736 (-6.46%) helped: 169 HURT: 1 helped stats (abs) min: 8.0 max: 128.0 x̄: 9.75 x̃: 8 helped stats (rel) min: 1.52% max: 33.33% x̄: 8.35% x̃: 8.00% HURT stats (abs) min: 8.0 max: 8.0 x̄: 8.00 x̃: 8 HURT stats (rel) min: 25.00% max: 25.00% x̄: 25.00% x̃: 25.00% 95% mean confidence interval for quadwords value: -11.18 -8.11 95% mean confidence interval for quadwords %-change: -8.95% -7.36% Quadwords are helped. total threads in shared programs: 4697 -> 4701 (0.09%) threads in affected programs: 4 -> 8 (100.00%) helped: 4 HURT: 0 helped stats (abs) min: 1.0 max: 1.0 x̄: 1.00 x̃: 1 helped stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00% 95% mean confidence interval for threads value: 1.00 1.00 95% mean confidence interval for threads %-change: 100.00% 100.00% Threads are helped. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Karol Herbst <kherbst@redhat.com> [v1] Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19312>	2022-11-01 22:39:45 -04:00
Alyssa Rosenzweig	ac2964dfbd	nir: Be smarter fusing ffma If there is a single use of fmul, and that single use is fadd, it makes sense to fuse ffma, as we already do. However, if there are multiple uses, fusing may impede code gen. Consider the source fragment: a = fmul(x, y) b = fadd(a, z) c = fmin(a, t) d = fmax(b, c) The fmul has two uses. The current ffma fusing is greedy and will produce the following "optimized" code. a = fmul(x, y) b = ffma(x, y, z) c = fmin(a, t) d = fmax(b, c) Actually, this code is worse! Instead of 1 fmul + 1 fadd, we now have 1 fmul + 1 ffma. In effect, two multiplies (and a fused add) instead of one multiply and an add. Depending on the ISA, that could impede scheduling or increase code size. It can also increase register pressure, extending the live range. It's tempting to gate on is_used_once, but that would hurt in cases where we really do fuse everything, e.g.: a = fmul(x, y) b = fadd(a, z) c = fadd(a, t) For ISAs that fuse ffma, we expect that 2 ffma is faster than 1 fmul + 2 fadd. So what we really want is to fuse ffma iff the fmul will get deleted. That occurs iff all uses of the fmul are fadd and will themselves get fused to ffma, leaving fmul to get dead code eliminated. That's easy to implement with a new NIR search helper, checking that all uses are fadd. 
shader-db results on Mali-G57 [open shader-db + subset of closed]: total instructions in shared programs: 179491 -> 178991 (-0.28%) instructions in affected programs: 36862 -> 36362 (-1.36%) helped: 190 HURT: 27 total cycles in shared programs: 10573.20 -> 10571.75 (-0.01%) cycles in affected programs: 72.02 -> 70.56 (-2.02%) helped: 28 HURT: 1 total fma in shared programs: 1590.47 -> 1582.61 (-0.49%) fma in affected programs: 319.95 -> 312.09 (-2.46%) helped: 194 HURT: 1 total cvt in shared programs: 812.98 -> 813.03 (<.01%) cvt in affected programs: 118.53 -> 118.58 (0.04%) helped: 65 HURT: 81 total quadwords in shared programs: 98968 -> 98840 (-0.13%) quadwords in affected programs: 2960 -> 2832 (-4.32%) helped: 20 HURT: 4 total threads in shared programs: 4693 -> 4697 (0.09%) threads in affected programs: 4 -> 8 (100.00%) helped: 4 HURT: 0 v2: Update trace checksums for virgl due to numerical differences. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18814>	2022-10-15 17:47:31 +00:00
Rhys Perry	c23411a970	nir/algebraic: optimize bits=umin(bits, 32-(offset&0x1f)) Optimizes patterns which are created by recent versions of vkd3d-proton, when constant folding doesn't eliminate it entirely: - ubitfield_extract(value, offset, umin(bits, 32-(offset&0x1f))) - ibitfield_extract(value, offset, umin(bits, 32-(offset&0x1f))) - bitfield_insert(base, insert, offset, umin(bits, 32-(offset&0x1f))) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13225>	2022-09-13 20:36:06 +00:00
Ian Romanick	97ce3a56bd	nir/search: Constify instr parameter to nir_search_expression::cond Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13999>	2022-02-10 18:15:39 +00:00
Rhys Perry	495debebad	nir/algebraic: optimize expressions using fmulz/ffmaz Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13436>	2022-01-20 22:54:42 +00:00
Rhys Perry	14b8227083	nir: add some missing nir_alu_type_get_base_type Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13436>	2022-01-20 22:54:42 +00:00
Rhys Perry	403ae3b48e	nir/algebraic: optimize more 64-bit imul with constant source Two 64-bit shifts and an addition are usually faster than the several multiplications nir_lower_int64 creates. No fossil-db changes. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14227>	2021-12-17 18:51:24 +00:00
Rhys Perry	a2d8c5b26d	nir/algebraic: optimize a*#b & -4 fossil-db (Sienna Cichlid): Totals from 611 (0.47% of 128647) affected shaders: CodeSize: 3096680 -> 3090976 (-0.18%) Instrs: 570494 -> 569249 (-0.22%) Latency: 5765865 -> 5759619 (-0.11%) InvThroughput: 969840 -> 967608 (-0.23%) VClause: 9690 -> 9688 (-0.02%) Copies: 42884 -> 42894 (+0.02%); split: -0.01%, +0.03% PreVGPRs: 28290 -> 28288 (-0.01%) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13752>	2021-12-03 13:41:07 +00:00
Ian Romanick	839495efc6	nir/algebraic: Add lowering for dot_4x8 instructions v2: Fix copy-and-paste bugs in lowering patterns. v3: Add has_sudot_4x8 flag. Requested by Rhys. v4: Since the names of the opcodes changed from dp4 to dot_4x8, also change the names of the lowering helpers. Suggested by Jason. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12142>	2021-08-24 19:58:57 +00:00
Rhys Perry	2bb49e4587	nir/search: don't consider INT_MIN a negative power-of-two ineg(INT_MIN)/iabs(INT_MIN) won't work as expected. No fossil-db changes. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12039>	2021-08-09 11:00:39 +00:00
Ian Romanick	49177b9e2f	nir/algebraic: Tautology replacements require sources be numbers It seems worth the small amount of damage to give an extra cushion of not having to debug problems later. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> All Intel platforms had similar results. (Tiger Lake shown) total instructions in shared programs: 21043197 -> 21043359 (<.01%) instructions in affected programs: 4409 -> 4571 (3.67%) helped: 0 HURT: 25 HURT stats (abs) min: 1 max: 16 x̄: 6.48 x̃: 5 HURT stats (rel) min: 0.39% max: 15.38% x̄: 4.59% x̃: 4.40% 95% mean confidence interval for instructions value: 4.37 8.59 95% mean confidence interval for instructions %-change: 2.93% 6.26% Instructions are HURT. total cycles in shared programs: 856175986 -> 856176921 (<.01%) cycles in affected programs: 58908 -> 59843 (1.59%) helped: 0 HURT: 25 HURT stats (abs) min: 7 max: 70 x̄: 37.40 x̃: 38 HURT stats (rel) min: 0.27% max: 5.63% x̄: 1.87% x̃: 1.39% 95% mean confidence interval for cycles value: 31.11 43.69 95% mean confidence interval for cycles %-change: 1.35% 2.39% Cycles are HURT. No fossil-db changes on any Intel platform. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10012>	2021-05-20 01:39:35 +00:00
Ian Romanick	7019cd84c0	nir/search: Use range analysis for is_finite There are only a couple patterns that use is_finite, so the changes aren't huge. Mostly shaders from Batman Arkham City and a few shaders from Shadow of the Tomb Raider were affected. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Tiger Lake Instructions in all programs: 160902591 -> 160902489 (-0.0%) SENDs in all programs: 6812270 -> 6812270 (+0.0%) Loops in all programs: 38225 -> 38225 (+0.0%) Cycles in all programs: 7429003266 -> 7428992369 (-0.0%) Spills in all programs: 192582 -> 192582 (+0.0%) Fills in all programs: 304539 -> 304539 (+0.0%) Ice Lake Instructions in all programs: 145301634 -> 145301460 (-0.0%) SENDs in all programs: 6863890 -> 6863890 (+0.0%) Loops in all programs: 38219 -> 38219 (+0.0%) Cycles in all programs: 8798589772 -> 8798575869 (-0.0%) Spills in all programs: 216880 -> 216880 (+0.0%) Fills in all programs: 334250 -> 334250 (+0.0%) Skylake Instructions in all programs: 135892010 -> 135891836 (-0.0%) SENDs in all programs: 6802916 -> 6802916 (+0.0%) Loops in all programs: 38216 -> 38216 (+0.0%) Cycles in all programs: 8442597324 -> 8442583202 (-0.0%) Spills in all programs: 194839 -> 194839 (+0.0%) Fills in all programs: 301116 -> 301116 (+0.0%) Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9108>	2021-03-11 22:00:30 +00:00
Ian Romanick	aa5d38decd	nir/range_analysis: Add "is a number" range analysis tracking This commit is necessary to support "nir/range_analysis: Fix analysis of fmin and fmax with NaN". No shader-db or fossil-db changes on any Intel platform. v2: Pack and unpack is_a_number. v3: Don't set is_a_number of integer constants. The bit pattern might be NaN. v4: Update handling of b2i32. intBitsToFloat(int(true)) is 1.401298464324817e-45. Return a value consistent with that. Fixes: `405de7ccb6` ("nir/range-analysis: Rudimentary value range analysis pass") Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9108>	2021-03-11 22:00:30 +00:00
Ian Romanick	c393ae9d84	nir/search: Constify instruction parameter to search helpers The search helpers must never modify the instruction passed in, so let the compiler enforce this. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9378>	2021-03-03 18:32:14 +00:00
Jason Ekstrand	f9b3be09e1	nir/algebraic: Clean up up-cast of down-cast when we can There are a bunch of cases where we can pretty quickly determine that the high bits don't matter. In these cases, delete the casts. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8872>	2021-02-16 16:36:31 +00:00
Rhys Perry	2849f0b5aa	nir/algebraic: optimize out exact a*1.0 if it's used only as a float fossil-db (GFX10): Totals from 10180 (7.30% of 139391) affected shaders: SGPRs: 549392 -> 549448 (+0.01%); split: -0.00%, +0.01% VGPRs: 243228 -> 243008 (-0.09%); split: -0.11%, +0.02% CodeSize: 12939080 -> 12603996 (-2.59%); split: -2.59%, +0.00% MaxWaves: 186948 -> 186976 (+0.01%) Instrs: 2497266 -> 2414648 (-3.31%) fossil-db (GFX10.3): Totals from 10180 (7.30% of 139391) affected shaders: SGPRs: 549672 -> 549280 (-0.07%); split: -0.23%, +0.16% VGPRs: 289296 -> 283672 (-1.94%); split: -2.83%, +0.88% CodeSize: 13920180 -> 13255560 (-4.77%); split: -4.77%, +0.00% MaxWaves: 151789 -> 153165 (+0.91%) Instrs: 2756978 -> 2671517 (-3.10%); split: -3.10%, +0.00% Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5523>	2021-01-26 11:36:13 +00:00
Ian Romanick	55621c6d1c	nir/algebraic: Add some compare-with-zero optimizations that are exact This prevents some fossil-db regressions in "spir-v: Mark floating point comparisons exact". v2: Note that the patterns and replacements produce the same value when isnan(b). Suggested by Caio. v3: Use C99 isfinite() instead of (obsolete) BSD finite(). Fixes various Windows builds. No fossil-db changes on any Intel platform, Vega, or Polaris10. All Intel platforms had similar results. (Tiger Lake shown) total instructions in shared programs: 20908670 -> 20908672 (<.01%) instructions in affected programs: 69 -> 71 (2.90%) helped: 0 HURT: 1 total cycles in shared programs: 473515288 -> 473513940 (<.01%) cycles in affected programs: 4942 -> 3594 (-27.28%) helped: 2 HURT: 0 Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6358>	2021-01-05 02:07:09 +00:00
Rhys Perry	89c4bba8bc	nir/algebraic: better propagate constants up fadd chains Make the optimization create more mad-friendly code if the order of the fadd's operands is unlucky. fossil-db (Navi): Totals from 9259 (8.07% of 114665) affected shaders: SGPRs: 615991 -> 616191 (+0.03%); split: -0.05%, +0.08% VGPRs: 442184 -> 443568 (+0.31%); split: -0.10%, +0.41% CodeSize: 32674876 -> 32625572 (-0.15%); split: -0.17%, +0.02% MaxWaves: 108560 -> 108152 (-0.38%); split: +0.07%, -0.44% Instrs: 6126473 -> 6120463 (-0.10%); split: -0.13%, +0.03% Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5631>	2020-11-03 14:56:00 +00:00
Rhys Perry	5476d18183	nir/algebraic: add patterns for a >> #b << #b Fixes compilation of a Battlefront 2 shader with ACO by removing VGPR spilling. The reassociation makes it worse on LLVM though. pipeline-db (ACO): Totals from affected shaders: SGPRS: 10704 -> 10688 (-0.15 %) VGPRS: 18736 -> 18528 (-1.11 %) Spilled SGPRs: 70 -> 70 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 909696 -> 885796 (-2.63 %) bytes LDS: 225 -> 225 (0.00 %) blocks Max Waves: 1115 -> 1129 (1.26 %) pipeline-db (LLVM): Totals from affected shaders: SGPRS: 8472 -> 8424 (-0.57 %) VGPRS: 14284 -> 14368 (0.59 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 442 -> 503 (13.80 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 268 -> 396 (47.76 %) dwords per thread Code Size: 862568 -> 853028 (-1.11 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 971 -> 964 (-0.72 %) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2271>	2020-01-29 14:30:33 +00:00
Timothy Arceri	7f106a2b5d	util: rename list_empty() to list_is_empty() This makes it clear that it's a boolean test and not an action (eg. "empty the list"). Reviewed-by: Eric Engestrom <eric@engestrom.ch>	2019-10-28 11:24:38 +00:00
Rob Clark	ad8167c1e0	nir/search: fix the PoT helpers Otherwise, if the base type is (for example) uint32, we would incorrectly think that PoT optimizations could not apply. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>	2019-10-18 15:08:54 -07:00
Ian Romanick	050e4e28bf	nir/search: Fix possible NULL dereference in is_fsign Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Fixes: `09705747d7` ("nir/algebraic: Reassociate fadd into fmul in DPH-like pattern")	2019-10-17 15:07:01 -07:00
Eric Anholt	c23db0df18	nir: Keep the range analysis HT around intra-pass until we make a change. This lets us memoize range analysis work across instructions. Reduces runtime of shader-db on Intel by -30.0288% +/- 2.1693% (n=3). Fixes: `405de7ccb6` ("nir/range-analysis: Rudimentary value range analysis pass") Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-10-04 19:15:01 +00:00

72 commits