fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-22 04:50:11 +01:00

Author	SHA1	Message	Date
Marek Olšák	ecfefe823e	nir/opt_algebraic: use fmulz for fpow lowering to fix incorrect rendering The original implementation in all radeon drivers had this behavior. Fixes: `9bc1fb4c07` - ac/llvm,radeonsi: lower nir_fpow for aco and llvm Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11464 Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30069>	2024-07-23 15:23:27 +00:00
Ian Romanick	faee9426ab	nir/algebraic: Optimize some masking of extract_u8 operations I observed this pattern in several Red Dead Redemption 2 shaders. No shader-db changes on any Intel platform. v2: Remove duplicated patterns. Noticed by Georg. fossil-db: All Intel platforms had similar results. (Meteor Lake shown) Totals: Instrs: 151519393 -> 151507192 (-0.01%); split: -0.01%, +0.00% Cycle count: 17208246858 -> 17177437340 (-0.18%); split: -0.25%, +0.07% Spill count: 80830 -> 80759 (-0.09%); split: -0.09%, +0.00% Fill count: 152754 -> 152179 (-0.38%); split: -0.40%, +0.02% Totals from 7531 (1.20% of 630198) affected shaders: Instrs: 12606141 -> 12593940 (-0.10%); split: -0.10%, +0.00% Cycle count: 5466605514 -> 5435795996 (-0.56%); split: -0.79%, +0.22% Spill count: 25251 -> 25180 (-0.28%); split: -0.29%, +0.01% Fill count: 45143 -> 44568 (-1.27%); split: -1.36%, +0.08% Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30158>	2024-07-20 00:19:05 +00:00
Ian Romanick	1c7e35d4e0	nir/algebraic: Optimize some bit operation nonsense observed in some shaders In updates (not posted at the time of this writing) to !29884, a change caused many spill and fill regressions in a shader for OpenGL Tomb Raider. While looking at that shader, I noticed some odd patterns. I initially added these patterns to counteract the regressions caused by the other change, but I had no luck. On Ice Lake... this cuts 99 instructions from the shader. shader-db: All Intel platforms had similar results. (Meteor Lake shown) total instructions in shared programs: 19732341 -> 19732295 (<.01%) instructions in affected programs: 1744 -> 1698 (-2.64%) helped: 1 / HURT: 0 total cycles in shared programs: 916273716 -> 916273068 (<.01%) cycles in affected programs: 14266 -> 13618 (-4.54%) helped: 1 / HURT: 0 fossil-db: All Intel platforms had similar results. (Meteor Lake shown) Totals: Instrs: 151519575 -> 151519393 (-0.00%) Cycle count: 17208402120 -> 17208246858 (-0.00%); split: -0.00%, +0.00% Totals from 159 (0.03% of 630198) affected shaders: Instrs: 51970 -> 51788 (-0.35%) Cycle count: 11474176 -> 11318914 (-1.35%); split: -1.36%, +0.01% Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30158>	2024-07-20 00:19:05 +00:00
Christian Gmeiner	87786a7a7e	nak: Move imad late optimization to nir It is more or less just a code move, but I touched is_only_used_by_iadd(..) to match the style of the other functions in that file. Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30099>	2024-07-12 05:54:46 +00:00
Konstantin Seurer	d9e41e8a8c	nir: Stop using "capture : true" for nir_opt_algebraic "capture : true" is suboptimal and prevents the script from writing multiple files in one go. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30041>	2024-07-06 15:51:06 +00:00
Georg Lehmann	3e86d2452f	nir/opt_algebraic: add various unordered/ordered patterns from aco Foz-DB Navi21: Totals from 6747 (8.50% of 79395) affected shaders: MaxWaves: 134646 -> 134642 (-0.00%) Instrs: 7830299 -> 7828851 (-0.02%); split: -0.03%, +0.01% CodeSize: 43045532 -> 43010260 (-0.08%); split: -0.09%, +0.00% VGPRs: 378960 -> 378968 (+0.00%) SpillSGPRs: 1209 -> 1208 (-0.08%) Latency: 74667977 -> 74670405 (+0.00%); split: -0.02%, +0.02% InvThroughput: 20124981 -> 20124768 (-0.00%); split: -0.02%, +0.02% VClause: 162870 -> 162868 (-0.00%); split: -0.00%, +0.00% SClause: 277280 -> 277315 (+0.01%); split: -0.00%, +0.02% Copies: 528627 -> 528667 (+0.01%); split: -0.00%, +0.01% PreSGPRs: 319526 -> 319508 (-0.01%) PreVGPRs: 334264 -> 334265 (+0.00%); split: -0.00%, +0.00% VALU: 5485412 -> 5485408 (-0.00%); split: -0.02%, +0.02% SALU: 743882 -> 742301 (-0.21%); split: -0.21%, +0.00% Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29467>	2024-06-27 08:12:30 +00:00
Georg Lehmann	434dfb51ca	nir/opt_algebraic: optimize cmp(fneg(a), #b) and feq with fabs Foz-DB Navi21: Totals from 2483 (3.13% of 79395) affected shaders: Instrs: 4067533 -> 4067756 (+0.01%); split: -0.00%, +0.01% CodeSize: 22525156 -> 22499904 (-0.11%); split: -0.12%, +0.01% Latency: 51967223 -> 51963654 (-0.01%); split: -0.01%, +0.00% InvThroughput: 16685020 -> 16683045 (-0.01%); split: -0.01%, +0.00% SClause: 131890 -> 131907 (+0.01%) Copies: 402557 -> 402510 (-0.01%); split: -0.01%, +0.00% Branches: 146962 -> 146958 (-0.00%) PreSGPRs: 118404 -> 118401 (-0.00%) PreVGPRs: 123791 -> 123787 (-0.00%) VALU: 2709846 -> 2710174 (+0.01%); split: -0.00%, +0.01% SALU: 565883 -> 565786 (-0.02%) Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29467>	2024-06-27 08:12:30 +00:00
Georg Lehmann	98cc57bccb	nir/opt_algebraic: optimize cmp(a, -0.0) +0.0 can use an inline constant for AMD hardware, -0.0 needs a literal. Foz-DB Navi21: Totals from 1014 (1.28% of 79395) affected shaders: Instrs: 3037490 -> 3036849 (-0.02%); split: -0.02%, +0.00% CodeSize: 17060228 -> 17051276 (-0.05%); split: -0.05%, +0.00% Latency: 45916788 -> 45916600 (-0.00%); split: -0.00%, +0.00% InvThroughput: 12982201 -> 12982187 (-0.00%); split: -0.00%, +0.00% VClause: 79475 -> 79478 (+0.00%) SClause: 119935 -> 119934 (-0.00%); split: -0.00%, +0.00% Copies: 301641 -> 300964 (-0.22%); split: -0.23%, +0.00% PreSGPRs: 59155 -> 59144 (-0.02%) VALU: 2032016 -> 2032034 (+0.00%) SALU: 386424 -> 385729 (-0.18%) Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29467>	2024-06-27 08:12:30 +00:00
Georg Lehmann	8e6bf596cb	nir/opt_algebraic: look through fabs/fneg when matching fmulz/ffmaz Prevents regressions when removing input modifiers from a == 0.0. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29467>	2024-06-27 08:12:30 +00:00
Georg Lehmann	75b1fa9263	nir/opt_algebraic: alternative 8bit pack_[us]norm_4x8 lowering Foz-DB Navi21: Totals from 42 (0.05% of 79395) affected shaders: Instrs: 2709529 -> 2705848 (-0.14%) CodeSize: 14720732 -> 14711384 (-0.06%); split: -0.06%, +0.00% VGPRs: 4096 -> 4104 (+0.20%) Latency: 17907612 -> 17904468 (-0.02%); split: -0.02%, +0.00% InvThroughput: 4723551 -> 4722649 (-0.02%); split: -0.02%, +0.00% Copies: 223516 -> 219819 (-1.65%) Branches: 109578 -> 109594 (+0.01%); split: -0.00%, +0.02% VALU: 1730848 -> 1727151 (-0.21%) Tested-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28882>	2024-06-04 17:00:29 +00:00
Ian Romanick	7b7e5cf5d4	nir/algebraic: intel/fs: Optimize some patterns before lowering 64-bit integers v2: Add some comments explaining some of the nuance of the shift optimizations. Fix a bug in the shift count calculation of the upper 32-bits. Move the @64 from the variable to the opcode. All suggested by Jordan. No shader-db changes on any Intel platform. fossil-db: Meteor Lake and DG2 had similar results. (Meteor Lake shown) Totals: Instrs: 154507026 -> 154506576 (-0.00%) Cycle count: 17436298868 -> 17436295016 (-0.00%) Max live registers: 32635309 -> 32635297 (-0.00%) Totals from 42 (0.01% of 632575) affected shaders: Instrs: 5616 -> 5166 (-8.01%) Cycle count: 133680 -> 129828 (-2.88%) Max live registers: 1158 -> 1146 (-1.04%) No fossil-db changes on any other Intel platform. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29148>	2024-05-31 09:13:23 -07:00
Ian Romanick	4834df82e2	nir/algebraic: More patterns to generate iadd3 I noticed some shaders with patterns similar to these while working on cooperative matrix lowering. Meteor Lake and DG2 are the only platforms that support iadd3, so there were no shader-db or fossil-db changes on any other platforms. shader-db: Meteor Lake and DG2 had similar results. (Meteor Lake shown) total instructions in shared programs: 19869445 -> 19868343 (<.01%) instructions in affected programs: 419426 -> 418324 (-0.26%) helped: 913 / HURT: 2 total cycles in shared programs: 936010029 -> 935909811 (-0.01%) cycles in affected programs: 31746523 -> 31646305 (-0.32%) helped: 495 / HURT: 356 LOST: 10 GAINED: 12 fossil-db: Meteor Lake and DG2 had similar results. (Meteor Lake shown) Totals: Instrs: 154514596 -> 154505466 (-0.01%); split: -0.01%, +0.00% Cycle count: 17540226067 -> 17436266198 (-0.59%); split: -0.63%, +0.04% Spill count: 146887 -> 146886 (-0.00%) Fill count: 272499 -> 272489 (-0.00%); split: -0.01%, +0.00% Max live registers: 32634290 -> 32634739 (+0.00%); split: -0.00%, +0.00% Max dispatch width: 5550128 -> 5550368 (+0.00%) Totals from 4401 (0.70% of 632560) affected shaders: Instrs: 3095239 -> 3086109 (-0.29%); split: -0.30%, +0.00% Cycle count: 7327352564 -> 7223392695 (-1.42%); split: -1.51%, +0.10% Spill count: 28105 -> 28104 (-0.00%) Fill count: 45830 -> 45820 (-0.02%); split: -0.04%, +0.02% Max live registers: 264376 -> 264825 (+0.17%); split: -0.05%, +0.22% Max dispatch width: 43768 -> 44008 (+0.55%) Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29148>	2024-05-31 09:13:23 -07:00
Ian Romanick	22095c60bc	nir/algebraic: Add nir_lower_int64_options::nir_lower_iadd3_64 This allows us to not generate 64-bit iadd3 on Intel but continue generating it for NVIDIA. No shader-db or fossil-db changes. v2: Add nir_lower_iadd3_64 flag so we can continue to generate 64-bit iadd3 on NVIDIA platforms. v3: s/bit_size == 64/s == 64/. This cut-and-paste bug prevented any of the optimizations from ever occurring. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29148>	2024-05-31 09:13:23 -07:00
Georg Lehmann	dcab408a6c	nir: remove unpack_half_flush_to_zero It doesn't make sense to have two sets of opcodes for this when all backends that support the flush_to_zero variant just rely on the global floating point mode anyway. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29433>	2024-05-31 09:46:35 +00:00
Marek Olšák	b4bd380704	nir/algebraic: eliminate pack+unpack and unpack+pack pairs A new NIR shader for AMD drivers will need this. Reviewed-by: Gert Wollny <gert.wollny@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29233>	2024-05-17 22:04:00 +00:00
Francisco Jerez	15a10786e3	nir: Add option to lower 64-bit uadd_sat. C.f. `16be909936`. Intel Xe2 won't support saturation for 64-bit integer addition, regardless of signedness. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28283>	2024-05-15 17:16:51 +00:00
Ian Romanick	1b8cf06fc7	nir/algebraic: Optimize some extract_* expressions v2: Add missing '!options->lower_extract_byte' to the last two patterns. Every driver except Asahi sets both or neither. shader-db: All Intel platforms had similar results. (DG2 shown) total instructions in shared programs: 19659360 -> 19659356 (<.01%) instructions in affected programs: 44 -> 40 (-9.09%) helped: 2 / HURT: 0 total cycles in shared programs: 823432524 -> 823432520 (<.01%) cycles in affected programs: 1722 -> 1718 (-0.23%) helped: 2 / HURT: 0 fossil-db: All Intel platforms had similar results. (DG2 shown) Totals: Instrs: 153989787 -> 153989617 (-0.00%) Cycle count: 17562079230 -> 17562079493 (+0.00%); split: -0.00%, +0.00% Totals from 24 (0.00% of 631369) affected shaders: Instrs: 13733 -> 13563 (-1.24%) Cycle count: 341392 -> 341655 (+0.08%); split: -0.25%, +0.33% Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> [v1] Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27891>	2024-05-03 15:01:43 -07:00
Jesse Natalie	894f7f4387	nir_opt_algebraic: Add a couple optimizations for lowered unpack(pack()) I noticed some unnecessary 64-bit ints in shaders that were using doubles. Perhaps there's a different missing optimization that should run on the actual pack/unpack instructions before they're lowered, or maybe I'm just lowering them too early, but these seem simple enough that we might want them even for hand-rolled pack/unpack pairs. Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27314>	2024-05-01 21:55:20 +00:00
Connor Abbott	32308fe9f1	ir3/nir: Fix imadsh_mix16 definition The constant-folding definition and comments say that it takes the high 16 bits of the first source and low 16 bits of the second source, but actually it's the opposite. The algebraic optimization, which actually happens and needs to be correct, was correct but the comment above it was wrong. Note that in the way we use it when lowering multiplications, the ordering doesn't matter. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22075>	2024-04-26 12:55:14 +00:00
Iván Briano	7f97fa6df0	nir/algebraic: move float control conditions to be per instruction Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27281>	2024-04-25 12:13:41 +00:00
Iván Briano	8c4cd3e74e	nir/algebraic: support float controls conditions per instruction v?: - Make the Python not awful (Dylan) Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27281>	2024-04-25 12:13:41 +00:00
Iván Briano	5218cff34b	nir/algebraic: avoid double lowering of some fp64 operations The ffloor@64 case, which lowers to use ffract, is already ignored if nir_lower_dfract is set. Do the same thing for ftrunc@64 and ffract@64 and let nir_lower_doubles take care of them directly instead. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28702>	2024-04-16 13:34:36 -07:00
Rhys Perry	08903bbe89	nir: add mqsad_4x8, shfr and nir_opt_mqsad Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26251>	2024-04-05 11:01:39 +00:00
Rhys Perry	b37804c8de	nir/algebraic: optimize 64-bit comparisons with zero'd halves to 32-bit These expect nir_lower_int64 to replace u2u64 to pack_64_2x32_split(, 0). fossil-db (navi31): Totals from 149 (0.19% of 79242) affected shaders: Instrs: 433095 -> 431830 (-0.29%); split: -0.29%, +0.00% CodeSize: 2165980 -> 2160284 (-0.26%); split: -0.27%, +0.00% SpillSGPRs: 689 -> 688 (-0.15%) Latency: 3801497 -> 3799901 (-0.04%); split: -0.05%, +0.01% InvThroughput: 1547916 -> 1546567 (-0.09%); split: -0.09%, +0.01% VClause: 4698 -> 4693 (-0.11%) SClause: 9981 -> 9977 (-0.04%); split: -0.05%, +0.01% Copies: 66148 -> 65431 (-1.08%); split: -1.09%, +0.01% PreSGPRs: 6732 -> 6729 (-0.04%) PreVGPRs: 7976 -> 7945 (-0.39%) VALU: 252936 -> 252336 (-0.24%) SALU: 51794 -> 51274 (-1.00%); split: -1.03%, +0.02% Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27335>	2024-03-06 15:23:18 +00:00
Rhys Perry	417eb390c6	nir/algebraic: remove duplicated iand(ine, ine)/ior(ieq, ieq) patterns These don't seem useful, since they're already done in the early optimizations. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27335>	2024-03-06 15:23:18 +00:00
Rhys Perry	6952bb359c	nir/algebraic: don't create 64-bit min/max/ior if lowered fossil-db (navi31): Totals from 58 (0.07% of 79242) affected shaders: Instrs: 11692 -> 11304 (-3.32%) CodeSize: 65836 -> 62412 (-5.20%) VGPRs: 1320 -> 1344 (+1.82%) Latency: 51712 -> 50234 (-2.86%) InvThroughput: 10190 -> 10160 (-0.29%) Copies: 460 -> 688 (+49.57%) VALU: 6130 -> 5897 (-3.80%) SALU: 1231 -> 1284 (+4.31%); split: -0.32%, +4.63% Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27335>	2024-03-06 15:23:18 +00:00
Karol Herbst	f2b7c4ce29	nir: rework and fix rotate lowering No driver supports urol/uror on all bit sizes. Intel gen11+ only for 16 and 32 bit, Nvidia GV100+ only for 32 bit. Etnaviv can support it on 8, 16 and 32 bit. Also turn the `lower` into a `has` option as only two drivers actually support `uror` and `urol` at this moment. Fixes crashes with CL integer_rotate on iris and nouveau since we emit urol for `rotate`. v2: always lower 64 bit Fixes: `fe0965afa6` ("spirv: Don't use libclc for rotate") Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by (Intel and nir): Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: David Heidelberg <david.heidelberg@collabora.com> Acked-by: Yonggang Luo <luoyonggang@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27090>	2024-01-22 10:27:44 +00:00
Rhys Perry	e86ab8173b	nir/algebraic: optimize vkd3d-proton's MSAD Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26907>	2024-01-05 18:55:22 +00:00
Konstantin Seurer	b88ac6b381	nir: Optimize fpow with small constant exponents They would be turned into exp(log(a)*b) instead, which is slow. Totals from 2146 (2.52% of 85071) affected shaders: MaxWaves: 35769 -> 35779 (+0.03%); split: +0.03%, -0.01% Instrs: 6476835 -> 6465494 (-0.18%); split: -0.18%, +0.00% CodeSize: 35382288 -> 35347092 (-0.10%); split: -0.10%, +0.00% SpillSGPRs: 1055 -> 1017 (-3.60%) Latency: 75211743 -> 75063623 (-0.20%); split: -0.20%, +0.00% InvThroughput: 17525115 -> 17501745 (-0.13%); split: -0.14%, +0.00% VClause: 200089 -> 200077 (-0.01%); split: -0.01%, +0.01% SClause: 293566 -> 293480 (-0.03%); split: -0.03%, +0.00% Copies: 649631 -> 640516 (-1.40%); split: -1.44%, +0.03% Branches: 268441 -> 268325 (-0.04%) PreSGPRs: 146868 -> 146045 (-0.56%) PreVGPRs: 134125 -> 134128 (+0.00%); split: -0.00%, +0.01% Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26727>	2024-01-02 11:16:14 +01:00
Faith Ekstrand	aac1e3f595	nir: Add a new has_fmulz_no_denorms flag Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26569>	2023-12-11 15:29:17 +00:00
Faith Ekstrand	d2ffcb6092	nir: Lower [su]dot_4x8_[ui]add_sat to [su]dot_4x8_[ui]add Since nir_opt_algebraic runs on its own results, if the driver doesn't have [su]dot_4x8_[ui]add then the [su]dot_4x8_[ui]add lowering rules will kick in and lower that to what we had originally. Reviewed-by: Jesse Natalie <jenatali@microsoft.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26533>	2023-12-06 23:15:33 +00:00
Faith Ekstrand	09fc5e1c4d	nir: Split has_[su]dot_4x8 bits into regular and _sat versions Reviewed-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jesse Natalie <jenatali@microsoft.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26533>	2023-12-06 23:15:33 +00:00
Qiang Yu	7e4aac46ad	nir: add force_f2f16_rtz option to lower f2f16 to f2f16_rtz Used by OpenGL driver like radeonsi which has undefined rounding mode. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Qiang Yu <yuq825@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25990>	2023-11-20 02:20:17 +00:00
Faith Ekstrand	3af5af429e	nir: Optimize boolean ieq/ine with an immediate Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26120>	2023-11-10 21:46:55 +00:00
Eric Anholt	b416248cb5	nir: Add nir_lower_dsign as 64-bit fsign lowering. Right now some drivers are doing dsign lowering in GLSL and haven't had to have a NIR path due to there not being a corresponding vulkan driver. We want this in NIR now so that we can retire that batch of GLSL lowering code. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25777>	2023-10-24 00:16:30 +00:00
Marek Olšák	8ff4847b64	nir/algebraic: use only signed_zero_preserve_* for addition by 0 patterns, etc. Some GLSL versions will set inf_preserve but not the other flags. Additions by 0 only affect signed zeros. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25392>	2023-10-17 17:27:12 +00:00
Alyssa Rosenzweig	be0ab37bac	nir/opt_algebraic: Optimize LLVM booleans Helps OpenCL kernels. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25687>	2023-10-13 02:55:48 +00:00
Alyssa Rosenzweig	8df8d1e2f2	nir/opt_algebraic: Reduce int64 If we just want the bottom 32-bits we don't need a full 64-bit operation. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Karol Herbst <kherbst@redhat.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25625>	2023-10-12 21:03:31 +00:00
Rhys Perry	65afc8bebf	nir/algebraic: optimize u2u32(a >> 32) fossil-db (navi21): Totals from 352 (0.44% of 79330) affected shaders: Instrs: 271816 -> 271240 (-0.21%); split: -0.28%, +0.07% CodeSize: 1546520 -> 1544448 (-0.13%); split: -0.23%, +0.09% SpillVGPRs: 832 -> 827 (-0.60%); split: -1.08%, +0.48% Latency: 4037120 -> 4021748 (-0.38%); split: -0.41%, +0.03% InvThroughput: 1369540 -> 1362066 (-0.55%); split: -0.59%, +0.04% VClause: 6476 -> 6471 (-0.08%); split: -0.12%, +0.05% SClause: 6798 -> 6794 (-0.06%) Copies: 44828 -> 44630 (-0.44%); split: -0.89%, +0.45% Branches: 8845 -> 8844 (-0.01%); split: -0.05%, +0.03% PreSGPRs: 14684 -> 14659 (-0.17%) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25409>	2023-09-27 22:13:01 +00:00
Georg Lehmann	136a698251	nir/opt_algebraic: remove broken fddx/fddy patterns These patterns are broken in the following scenario: %1 = f2fmp %0 %2 = fddx %1 %3 = ... // non quad uniform if %3 { %4 = f2f32 %2 ... } Which would turn into %3 = ... if %3 { %4 = fddx %0 ... } Yet another example that shows why derivative instructions should be intrinsics, not alu. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Emma Anholt <emma@anholt.net> Cc: mesa-stable Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25014>	2023-09-08 14:14:47 +00:00
Ian Romanick	5ce6e09ffc	nir/algebraic: Remove redundant pack / unpack lowering patterns No shader-db or fossil-db changes on any Intel platform. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24900>	2023-08-25 14:54:11 -07:00
Georg Lehmann	9cf6984200	nir: unify lower_find_msb with has_{find_msb_rev,uclz} Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24662>	2023-08-22 12:08:37 +00:00
Georg Lehmann	2ac7e6614a	nir: unify lower_bitfield_extract with has_bfe Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24662>	2023-08-22 12:08:37 +00:00
Georg Lehmann	34c3f81614	nir: unify lower_bitfield_insert with has_{bfm,bfi,bitfield_select} Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24662>	2023-08-22 12:08:37 +00:00
Marek Olšák	1ac379c4a0	nir/algebraic: collapse ALU opcodes sourcing NaN Undef will be replaced by NaN whenever it leads to elimination of FP instructions. This implements the elimination part. Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24792>	2023-08-19 14:18:52 -04:00
Alyssa Rosenzweig	a257e2daad	nir: Lower fquantize2f16 Passes dEQP-VK.spirv_assembly.opquantize. Unlike the DXIL lowering, this should correctly handle NaNs. (I believe Dozen has a bug here that is masked by running constant folding early and poor CTS coverage.) It is also faster than the DXIL lowering for hardware that supports f2f16 conversions natively. It is not as good as a backend implementation that could flush-to-zero in hardware... but for a debug instruction it should be more than good enough. It might be slightly better to multiply with 0.0 to get the appropriate zero, but NIR really likes optimizing that out ... Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24616>	2023-08-18 22:20:02 +00:00
Georg Lehmann	44d0b785cc	nir/opt_algebraic: combine bitz/bitnz Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23298>	2023-06-29 13:39:30 +00:00
Pavel Ondračka	b4ca45911d	nir_opt_algebraic: don't use i32csel without native integer support Otherwise nir_lower_int_to_float (or specifically nir_gather_ssa_types) will fail to recognize we already have float constants and convert them again. Example from spec/glsl-1.10/execution/vs-loop-array-index-unroll.shader_test with r300 driver (after enabling has_fused_comp_and_csel). impl main { block block_0: /* preds: */ vec1 32 ssa_0 = load_const (0x00000000 = 0.000000) vec4 32 ssa_1 = intrinsic load_input (ssa_0) (base=0, component=0, dest_type=float32, io location=VERT_ATTRIB_POS slots=1) /* gl_Vertex */ vec3 32 ssa_2 = load_const (0x00000000, 0x3e800000, 0x3f800000) = (0.000000, 0.250000, 1.000000) vec3 32 ssa_3 = load_const (0x00000000, 0x3f000000, 0x3f800000) = (0.000000, 0.500000, 1.000000) vec3 32 ssa_4 = load_const (0x00000000, 0x3f400000, 0x3f800000) = (0.000000, 0.750000, 1.000000) vec2 32 ssa_5 = load_const (0x00000000, 0x3f800000) = (0.000000, 1.000000) vec1 32 ssa_6 = load_const (0x3f800000 = 1.000000) vec1 32 ssa_7 = intrinsic load_ubo_vec4 (ssa_0, ssa_0) (access=0, base=0, component=0) vec4 32 ssa_8 = load_const (0x00000000, 0x00000001, 0x00000002, 0x00000003) = (0.000000, 0.000000, 0.000000, 0.000000) vec4 1 ssa_9 = ilt ssa_8, ssa_7.xxxx vec3 32 ssa_10 = bcsel ssa_9.www, ssa_5.xyy, ssa_4 vec3 32 ssa_11 = bcsel ssa_9.zzz, ssa_10, ssa_3 vec3 32 ssa_12 = bcsel ssa_9.yyy, ssa_11, ssa_2 vec3 32 ssa_15 = i32csel_gt ssa_7.xxx, ssa_12, ssa_6.xxx vec4 32 ssa_14 = fsat ssa_15.xyxz intrinsic store_output (ssa_14, ssa_0) (base=1, wrmask=xyzw, component=0, src_type=float32, io location=VARYING_SLOT_COL0 slots=1, xfb(), xfb2()) /* gl_FrontColor */ intrinsic store_output (ssa_1, ssa_0) (base=0, wrmask=xyzw, component=0, src_type=float32, io location=VARYING_SLOT_POS slots=1, xfb(), xfb2()) /* gl_Position */ /* succs: block_1 */ block block_1: } and after nir_lower_int_to_float impl main { block block_0: /* preds: */ vec1 32 ssa_0 = load_const (0x00000000 = 0.000000) vec4 32 ssa_1 = intrinsic load_input (ssa_0) (base=0, component=0, dest_type=float32, io location=VERT_ATTRIB_POS slots=1) /* gl_Vertex */ vec3 32 ssa_2 = load_const (0x00000000, 0x4e7a0000, 0x4e7e0000) = (0.000000, 1048576000.000000, 1065353216.000000) vec3 32 ssa_3 = load_const (0x00000000, 0x4e7c0000, 0x4e7e0000) = (0.000000, 1056964608.000000, 1065353216.000000) vec3 32 ssa_4 = load_const (0x00000000, 0x4e7d0000, 0x4e7e0000) = (0.000000, 1061158912.000000, 1065353216.000000) vec2 32 ssa_5 = load_const (0x00000000, 0x4e7e0000) = (0.000000, 1065353216.000000) vec1 32 ssa_6 = load_const (0x4e7e0000 = 1065353216.000000) vec1 32 ssa_7 = intrinsic load_ubo_vec4 (ssa_0, ssa_0) (access=0, base=0, component=0) vec4 32 ssa_8 = load_const (0x00000000, 0x3f800000, 0x40000000, 0x40400000) = (0.000000, 1.000000, 2.000000, 3.000000) vec4 1 ssa_9 = flt ssa_8, ssa_7.xxxx vec3 32 ssa_10 = bcsel ssa_9.www, ssa_5.xyy, ssa_4 vec3 32 ssa_11 = bcsel ssa_9.zzz, ssa_10, ssa_3 vec3 32 ssa_12 = bcsel ssa_9.yyy, ssa_11, ssa_2 vec3 32 ssa_13 = fcsel_gt ssa_7.xxx, ssa_12, ssa_6.xxx vec4 32 ssa_14 = fsat ssa_13.xyxz intrinsic store_output (ssa_14, ssa_0) (base=1, wrmask=xyzw, component=0, src_type=float32, io location=VARYING_SLOT_COL0 slots=1, xfb(), xfb2()) /* gl_FrontColor */ intrinsic store_output (ssa_1, ssa_0) (base=0, wrmask=xyzw, component=0, src_type=float32, io location=VARYING_SLOT_POS slots=1, xfb(), xfb2()) /* gl_Position */ /* succs: block_1 */ block block_1: } Reviewed-by: Gert Wollny <gert.wollny@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23704>	2023-06-22 07:25:44 +00:00
Ian Romanick	de60b463d7	nir/algebraic: Simplify various trivial bfi These are mostly just obvious patterns that somebody will eventually want to add. DG2, Tiger Lake, Ice Lake, Skylake, Broadwell, and Haswell had similar results (Ice Lake shown) total instructions in shared programs: 20570033 -> 20570026 (<.01%) instructions in affected programs: 7363 -> 7356 (-0.10%) helped: 6 / HURT: 0 total cycles in shared programs: 902118781 -> 902118854 (<.01%) cycles in affected programs: 419132 -> 419205 (0.02%) helped: 4 / HURT: 2 DG2, Tiger Lake, Ice Lake, and Skylake had similar results (Ice Lake shown) Totals: Instrs: 152819500 -> 152819380 (-0.00%) Cycles: 15014627187 -> 15014624437 (-0.00%) Totals from 115 (0.02% of 662497) affected shaders: Instrs: 28963 -> 28843 (-0.41%) Cycles: 404582 -> 401832 (-0.68%) Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19968>	2023-06-14 18:49:53 +00:00
Ian Romanick	541e7eb389	nir/algebraic: Optimize some u2f of bfi v2: Fix a copy-and-paste bug s/('find_lsb', a)/a/ in the patterns. See piglit!819. DG2, Tiger Lake, Ice Lake, Skylake, and Broadwell had similar results (Ice Lake shown) total instructions in shared programs: 20570063 -> 20570033 (<.01%) instructions in affected programs: 452 -> 422 (-6.64%) helped: 30 / HURT: 0 total cycles in shared programs: 902118723 -> 902118781 (<.01%) cycles in affected programs: 1762 -> 1820 (3.29%) helped: 0 / HURT: 29 DG2, Tiger Lake, Ice Lake, and Skylake had similar results (Ice Lake shown) Totals: Instrs: 152819969 -> 152819500 (-0.00%) Cycles: 15014628652 -> 15014627187 (-0.00%); split: -0.00%, +0.00% Totals from 469 (0.07% of 662497) affected shaders: Instrs: 7644 -> 7175 (-6.14%) Cycles: 31787 -> 30322 (-4.61%); split: -4.90%, +0.29% Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19968>	2023-06-14 18:49:53 +00:00


540 commits