fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-21 20:10:14 +01:00

Author	SHA1	Message	Date
Isabella Basso	a27bcd63d0	nir/algebraic: extend mediump patterns Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Suggested-by: Italo Nicola <italonicola@collabora.com> Signed-off-by: Isabella Basso <isabellabdoamaral@usp.br> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20965>	2023-03-11 17:21:37 +00:00
Isabella Basso	b3685f3ba7	nir/algebraic: insert patterns inside optimizations list Some patterns were outside the list of optimizations. Fixes: `b86305bb` ("nir/algebraic: collapse conversion opcodes (many patterns)") Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Signed-off-by: Isabella Basso <isabellabdoamaral@usp.br> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20965>	2023-03-11 17:21:37 +00:00
Ian Romanick	831f9d3f61	nir/algebraic: Optimize some ifind_msb to ufind_msb On Intel platforms, the uclz lowering of ufind_msb is either one instruction better (Gfx7 and newer) or two instructions better (all older platforms) than the ifind_msb implementations. On platforms that use lower_find_msb_to_reverse, there should be no difference. All Haswell and newer Intel platforms had similar results. (Ice Lake shown) total instructions in shared programs: 19938662 -> 19938634 (<.01%) instructions in affected programs: 850 -> 822 (-3.29%) helped: 2 / HURT: 0 total cycles in shared programs: 858467067 -> 858465538 (<.01%) cycles in affected programs: 10080 -> 8551 (-15.17%) helped: 2 / HURT: 0 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19042>	2023-03-10 15:27:17 +00:00
Ian Romanick	2d6f48f6ef	nir/algebraic: Do not generate 8- or 16-bit find_msb The next commit will add validation to restrict this instruction (and others) to only 32-bit or 64-bit sources. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19042>	2023-03-10 15:27:17 +00:00
Ian Romanick	28311f9d02	nir: intel/compiler: Move ufind_msb lowering to NIR Fossil-db results: All Intel platforms had similar results. (Ice Lake shown) Cycles in all programs: 9098346105 -> 9098333765 (-0.0%) Cycles helped: 6 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19042>	2023-03-10 15:27:17 +00:00
Ian Romanick	a4052e70ea	nir/algebraic: Only lower ufind_msb with 32-bit sources The 31-ufind_msb_rev(x) lowering only produces the correct result for 32-bit sources. ufind_msb_rev can also have 64-bit sources, and most platforms are expected to lower this to 32-bit instructions with extra logic operations. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19042>	2023-03-10 15:27:17 +00:00
Ian Romanick	0cc7bf63b7	nir: intel/compiler: Move ifind_msb lowering to NIR Unlike ufind_msb, ifind_msb is only defined in NIR for 32-bit values, so no @32 annotation is required. No shader-db or fossil-db changes on any Intel platform. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19042>	2023-03-10 15:27:17 +00:00
Georg Lehmann	aeb68c29b4	nir/opt_algebraic: add patterns for iand/ior of feq/fneu with 0 Foz-DB Navi21: Totals from 1245 (0.92% of 134913) affected shaders: VGPRs: 66232 -> 66248 (+0.02%); split: -0.01%, +0.04% CodeSize: 5874976 -> 5868168 (-0.12%); split: -0.17%, +0.05% MaxWaves: 25278 -> 25274 (-0.02%); split: +0.01%, -0.02% Instrs: 1087502 -> 1085267 (-0.21%); split: -0.21%, +0.00% Latency: 6531489 -> 6531672 (+0.00%); split: -0.04%, +0.05% InvThroughput: 1531774 -> 1532327 (+0.04%); split: -0.02%, +0.05% VClause: 22218 -> 22202 (-0.07%); split: -0.08%, +0.00% SClause: 45906 -> 45873 (-0.07%); split: -0.08%, +0.01% Copies: 64004 -> 64102 (+0.15%); split: -0.24%, +0.39% Branches: 21529 -> 21534 (+0.02%); split: -0.00%, +0.03% PreSGPRs: 51936 -> 51850 (-0.17%) PreVGPRs: 55393 -> 55398 (+0.01%); split: -0.02%, +0.03% Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21576>	2023-03-01 11:24:43 +00:00
Emma Anholt	6d52e6fd2c	nir: Port a floor->truncate algebraic opt pattern from GLSL. Prevents regression when dropping code from the GLSL optimizer. Acked-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21475>	2023-02-28 03:36:09 +00:00
Emma Anholt	ef02581590	nir: Add optimization for fdot(x, 0) -> 0. We had all these nice fdot opts to drop individual channels that were 0, but nothing handling it being entirely 0! Avoids r300g regression when dropping them from GLSL. Acked-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21475>	2023-02-28 03:36:08 +00:00
Ian Romanick	ea413e826b	nir: Eliminate nir_op_f2b Builds on the work of !15121. This gets to delete even more code because many drivers shared a lot of code for i2b and f2b. No shader-db or fossil-db changes on any Intel platform. v2: Rebase on `1a35acd8d9`. v3: Update a comment in nir_opcodes_c.py. Suggested by Konstantin. v4: Another rebase. Remove f2b stuff from Midgard. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20509>	2023-02-03 22:39:57 +00:00
Timur Kristóf	1244506c15	nir/opt_algebraic: Add optimization for ieq/ine and right-shift. Fossil DB stats on GFX11: Totals from 1343 (1.00% of 134913) affected shaders: SpillSGPRs: 7145 -> 7137 (-0.11%) CodeSize: 20737744 -> 20739148 (+0.01%); split: -0.02%, +0.03% Instrs: 4010443 -> 4008449 (-0.05%); split: -0.05%, +0.00% Latency: 50021520 -> 50021105 (-0.00%); split: -0.00%, +0.00% InvThroughput: 6354371 -> 6354112 (-0.00%); split: -0.00%, +0.00% VClause: 63035 -> 63038 (+0.00%); split: -0.01%, +0.01% SClause: 121162 -> 121166 (+0.00%) Copies: 251354 -> 251058 (-0.12%); split: -0.18%, +0.06% PreSGPRs: 137283 -> 137299 (+0.01%) Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20936>	2023-02-02 03:08:19 +00:00
Timur Kristóf	65a917cb6e	nir: Add algebraic optimization for VKD3D-Proton fp32->fp16 conversion. VKD3D-Proton DXBC f32 to f16 conversion implements a float conversion using PackHalf2x16. Because the spec does not specify a rounding mode, it emits a sequence to ensure D3D-like behaviour for infinity. When we know the current backend has pack_half_2x16_rtz_split, we can eliminate the extra sequence. Fossil DB stats on GFX11: Totals from 835 (0.62% of 134913) affected shaders: VGPRs: 49368 -> 49224 (-0.29%) CodeSize: 5341956 -> 5124564 (-4.07%) Instrs: 1024062 -> 987041 (-3.62%) Latency: 6530956 -> 6465120 (-1.01%); split: -1.01%, +0.00% InvThroughput: 908189 -> 870253 (-4.18%) VClause: 18704 -> 18702 (-0.01%); split: -0.02%, +0.01% SClause: 33406 -> 33284 (-0.37%); split: -0.38%, +0.01% Copies: 67440 -> 65992 (-2.15%); split: -2.15%, +0.00% Branches: 18498 -> 18465 (-0.18%) PreSGPRs: 38409 -> 38331 (-0.20%) PreVGPRs: 44089 -> 43834 (-0.58%) Note, some fossils are from before this pattern was added to VKD3D-Proton, so the above may not reflect real-world impact. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15838>	2023-01-26 12:24:24 +00:00
Timur Kristóf	7985933a6d	nir: Lower pack_half_2x16_split to RTZ if available. Constant folding always uses RTNE for pack_half_2x16_split, but some backends implement it with RTZ. Lowering to RTZ when available ensures that the behaviour will be consistent between constant folding and the backend. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15838>	2023-01-26 12:24:24 +00:00
Alyssa Rosenzweig	c3839bd540	nir: Optimize vendored sin/cos the same way As we've done for the AMD one, to prevent any codegen regression from switching the Midgard lowering. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Italo Nicola <italonicola@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19350>	2023-01-16 22:20:43 +00:00
Ian Romanick	eb76cee9f8	nir: Eliminate nir_op_i2b There are a lot of optimizations in opt_algebraic that match ('ine', a, 0), but there are almost none that match i2b. Instead of adding a huge pile of additional patterns (including variations that include both ine and i2b), always lower i2b to a != 0. At this point in the series, it should be impossible for anything to generate i2b, so there /should not/ be any changes. The failing test on d3d12 is a pre-existing bug that is triggered by this change. I talked to Jesse about it, and, after some analysis, he suggested just adding it to the list of known failures. v2: Don't rematerialize i2b instructions in dxil_nir_lower_x2b. v3: Don't rematerialize i2b instructions in zink_nir_algebraic.py. v4: Fix zink-on-TGL CI failures by calling nir_opt_algebraic after nir_lower_doubles makes progress. The latter can generate b2i instructions, but nir_lower_int64 can't handle them (anymore). v5: Add back most of the hunk at line 2125 of nir_opt_algebraic.py. I had accidentally removed the f2b(bf2(x)) optimization. v6: Just eliminate the i2b instruction. v7: Remove missed i2b32 in midgard_compile.c. Remove (now unused) emit_alu_i2orf2_b1 function from sfn_instr_alu.cpp. Previously this function was still used. 🤷 No shader-db changes on any Intel platform. All Intel platforms had similar results. (Ice Lake shown) Instructions in all programs: 141165875 -> 141165873 (-0.0%) Instructions helped: 2 Cycles in all programs: 9098956382 -> 9098956350 (-0.0%) Cycles helped: 2 The two Vulkan shaders are helped because of the "new" (('b2i32', ('ine', ('ubfe', a, b, 1), 0)), ('ubfe', a, b, 1)) algebraic pattern. Acked-by: Jesse Natalie <jenatali@microsoft.com> [earlier version] Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Tested-by: Daniel Schürmann <daniel@schuermann.dev> [earlier version] Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>	2022-12-14 06:23:21 +00:00
Ian Romanick	b60b2f2add	nir/algebraic: Optimize some b2i involved in masking operations v2: Remove the ineg from the b2i in the ior pattern. Suggested by Jason. All Ivy Bridge and newer Intel platforms had similar results. (Ice Lake shown) total instructions in shared programs: 19914441 -> 19914369 (<.01%) instructions in affected programs: 63507 -> 63435 (-0.11%) helped: 24 / HURT: 0 total cycles in shared programs: 853869766 -> 853851470 (<.01%) cycles in affected programs: 10551542 -> 10533246 (-0.17%) helped: 24 / HURT: 0 All Intel platforms had similar results. (Ice Lake shown) Instructions in all programs: 141163061 -> 141092683 (-0.0%) Instructions helped: 14103 Instructions hurt: 55 Cycles in all programs: 9132376195 -> 9133183045 (+0.0%) Cycles helped: 13775 Cycles hurt: 380 Spills in all programs: 18286 -> 18284 (-0.0%) Spills helped: 1 Fills in all programs: 30647 -> 30643 (-0.0%) Fills helped: 1 Gained: 133 Lost: 130 Acked-by: Jesse Natalie <jenatali@microsoft.com> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Tested-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>	2022-12-14 06:23:21 +00:00
Ian Romanick	ba0b248ac2	nir/algebraic: Eliminate unary op on src of integer comparison w/ zero This helps because it enables cmod propagation to do more. The removed patterns involving b2i will be handled by other existing patterns after the unary operations are removed. All Intel platforms had similar results. (Ice Lake shown) total instructions in shared programs: 19914458 -> 19914441 (<.01%) instructions in affected programs: 5456 -> 5439 (-0.31%) helped: 17 / HURT: 0 total cycles in shared programs: 855302118 -> 853869766 (-0.17%) cycles in affected programs: 327354347 -> 325921995 (-0.44%) helped: 291 / HURT: 81 All Intel platforms had similar results. (Ice Lake shown) Instructions in all programs: 141205979 -> 141205961 (-0.0%) Instructions helped: 4 Instructions hurt: 3 SENDs in all programs: 7466919 -> 7466913 (-0.0%) SENDs helped: 1 Cycles in all programs: 9133387327 -> 9133384475 (-0.0%) Cycles helped: 3 Cycles hurt: 12 In the shader that was helped for sends, it appears that a NIR pass that moves code out of loops was able to move 3 send operations outside a loop after this change. I did not investigate further. Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com> Acked-by: Jesse Natalie <jenatali@microsoft.com> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Tested-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>	2022-12-14 06:23:20 +00:00
Ian Romanick	ee15d89322	nir/algebraic: Simplify min and max of b2i This prevents ~400 shader-db regressions and a handful of fossil-db regressions after i2b is always lowered. All Ivy Bridge and newer Intel platforms had similar results. (Ice Lake shown) total cycles in shared programs: 855301494 -> 855302118 (<.01%) cycles in affected programs: 52787 -> 53411 (1.18%) helped: 4 / HURT: 5 All Intel platforms had similar results. (Ice Lake shown) Instructions in all programs: 141206055 -> 141205979 (-0.0%) Instructions helped: 14 Cycles in all programs: 9133376616 -> 9133387327 (+0.0%) Cycles helped: 13 Cycles hurt: 3 Acked-by: Jesse Natalie <jenatali@microsoft.com> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Tested-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>	2022-12-14 06:23:20 +00:00
Ian Romanick	19222867e4	nir/algebraic: Reassociate some iand to eliminate an operation No shader-db changes on any Intel platform. All of the helped shaders were presumably regressed by `4676b3d3dd` (nir: Use nir_test_mask instead of i2b(iand)). v2: Add some comments explaining why specific replacements are used. In the umin pattern, only markup the first usage of 'b' in the source pattern. Tiger Lake, Ice Lake, and Skylake had similar results. (Ice Lake shown) Instructions in all programs: 141384970 -> 141200966 (-0.1%) Instructions helped: 45842 Cycles in all programs: 9133648977 -> 9133282672 (-0.0%) Cycles helped: 26812 Cycles hurt: 6025 Gained: 23 Lost: 135 Acked-by: Jesse Natalie <jenatali@microsoft.com> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Tested-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>	2022-12-14 06:23:20 +00:00
Ian Romanick	d48ce1f47d	nir/algebraic: Remove redundant i2b(b2i(x)) patterns A loop below already adds all the permutations... including the 1-bit version that isn't included in this group. No shader-db or fossil-db changes on any Intel platform. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com> Acked-by: Jesse Natalie <jenatali@microsoft.com> Tested-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>	2022-12-14 06:23:20 +00:00
Ian Romanick	14a9bb04e4	nir/algebraic: Remove redundant i2b(-x) pattern The exact same pattern appears later (around line 1323). No shader-db or fossil-db changes on any Intel platform. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com> Acked-by: Jesse Natalie <jenatali@microsoft.com> Tested-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>	2022-12-14 06:23:20 +00:00
Georg Lehmann	4dff3ff005	nir/opt_algebraic: Optimize open coded bfm. Foz-DB Navi21: Totals from 1553 (1.15% of 134913) affected shaders: SpillVGPRs: 2246 -> 2223 (-1.02%); split: -1.42%, +0.40% CodeSize: 10409156 -> 10410720 (+0.02%); split: -0.03%, +0.04% Instrs: 1899725 -> 1898773 (-0.05%); split: -0.07%, +0.02% Latency: 71225814 -> 71118314 (-0.15%); split: -0.21%, +0.06% InvThroughput: 13384926 -> 13330369 (-0.41%); split: -0.47%, +0.06% VClause: 38309 -> 38284 (-0.07%); split: -0.17%, +0.11% SClause: 70743 -> 70706 (-0.05%) Copies: 167296 -> 167230 (-0.04%); split: -0.28%, +0.24% Branches: 42446 -> 42444 (-0.00%); split: -0.01%, +0.00% PreVGPRs: 95191 -> 95188 (-0.00%) Some minor instructions count regressions in parallel-rdp because v_bfm_b32 can't use SDWA, but overall an improvement. Signed-off-by: Georg Lehmann <dadschoorse@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18887>	2022-12-09 14:59:16 +00:00
Rhys Perry	368be87255	nir/algebraic: shrink 64-bit bitwise operations with 0/-1 constant half fossil-db (navi21): Totals from 457 (0.34% of 135636) affected shaders: Instrs: 259349 -> 250383 (-3.46%) CodeSize: 1411976 -> 1369136 (-3.03%) Latency: 2175961 -> 2148158 (-1.28%) InvThroughput: 502206 -> 490244 (-2.38%) Copies: 15238 -> 15232 (-0.04%); split: -0.07%, +0.03% Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19748>	2022-11-21 17:34:46 +00:00
Rhys Perry	e19584db2b	nir/algebraic: optimize open-coded uadd_sat/usub_sat fossil-db (navi21): Totals from 19 (0.01% of 135636) affected shaders: Instrs: 40730 -> 40688 (-0.10%) CodeSize: 217708 -> 217568 (-0.06%) Latency: 261466 -> 261373 (-0.04%) InvThroughput: 74944 -> 74896 (-0.06%) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19473>	2022-11-18 18:31:32 +00:00
Timothy Arceri	63c4849e8b	nir: add another common ffract -> ffloor pattern shader-db results (BDW): total instructions in shared programs: 17527053 -> 17526931 (<.01%) instructions in affected programs: 5116 -> 4994 (-2.38%) helped: 25 HURT: 0 helped stats (abs) min: 2 max: 15 x̄: 4.88 x̃: 3 helped stats (rel) min: 0.25% max: 5.34% x̄: 3.39% x̃: 3.90% 95% mean confidence interval for instructions value: -6.19 -3.57 95% mean confidence interval for instructions %-change: -3.98% -2.81% Instructions are helped. total cycles in shared programs: 856680230 -> 856682009 (<.01%) cycles in affected programs: 6583780 -> 6585559 (0.03%) helped: 117 HURT: 77 helped stats (abs) min: 1 max: 854 x̄: 68.56 x̃: 16 helped stats (rel) min: <.01% max: 35.34% x̄: 2.12% x̃: 0.76% HURT stats (abs) min: 1 max: 2188 x̄: 127.27 x̃: 18 HURT stats (rel) min: 0.01% max: 22.66% x̄: 1.86% x̃: 0.67% 95% mean confidence interval for cycles value: -30.07 48.41 95% mean confidence interval for cycles %-change: -1.28% 0.19% Inconclusive result (value mean confidence interval includes 0). LOST: 3 GAINED: 1 Reviewed-by: Gert Wollny <gert.wollny@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19666>	2022-11-14 09:50:11 +11:00
Timothy Arceri	34c52d8cb9	nir: fix typo in lower_double options handling Seems the intention was to check that both flags were not enabled; instead, we were checking that the floor flag was both set and not set, so the result would always be false. Fixes: `3749a6ecd2` ("nir: honor lower_double options for ffloor and ffract") Reviewed-by: Gert Wollny <gert.wollny@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19642>	2022-11-11 14:36:00 +00:00
Gert Wollny	917d992b32	nir/algeraic_opt: use double options too for lowering ftrunc@64 ftrunc@64 also might need lowering on fp64 only, especially now that it might be introduced by nir_lower_int64. Fixes: `29da985682` nir/lower_int64: Enable lowering of 64-bit float to 64-bit integer conversions. Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19657>	2022-11-11 09:29:31 +00:00
Alyssa Rosenzweig	45a111c21c	nir/opt_algebraic: Fuse c - a * b to FMA Algebraically it is clear that -(a * b) + c = (-a) * b + c = fma(-a, b, c) But this is not clear from the NIR ('fadd', ('fneg', ('fmul', a, b)), c) Add rules to handle this case specially. Note we don't necessarily want to solve this by pushing fneg into fmul, because the rule opt_algebraic (not the late part where FMA fusing happens) specifically pulls fneg out of fmul to push fneg up multiplication chains. Noticed in the big glmark2 "terrain" shader, which has a cycle count reduced by 22% on Mali-G57 thanks to having this pattern a ton and being FMA bound. BEFORE: 1249 inst, 16.015625 cycles, 16.015625 fma, ... 632 quadwords AFTER: 997 inst, 12.437500 cycles, .... 504 quadwords Results on the same shader on AGX are also quite dramatic: BEFORE: 1294 inst, 8600 bytes, 50 halfregs, ... AFTER: 1154 inst, 8040 bytes, 50 halfregs, ... Similar rules apply for fabs. v2: Use a loop over the bit sizes (suggested by Emma). shader-db on Valhall (open + small subset of closed), results on Bifrost are similar: total instructions in shared programs: 167975 -> 164970 (-1.79%) instructions in affected programs: 92642 -> 89637 (-3.24%) helped: 492 HURT: 25 helped stats (abs) min: 1.0 max: 252.0 x̄: 6.25 x̃: 3 helped stats (rel) min: 0.30% max: 20.18% x̄: 3.21% x̃: 2.91% HURT stats (abs) min: 1.0 max: 5.0 x̄: 2.80 x̃: 3 HURT stats (rel) min: 0.46% max: 9.09% x̄: 3.89% x̃: 3.37% 95% mean confidence interval for instructions value: -6.95 -4.68 95% mean confidence interval for instructions %-change: -3.08% -2.65% Instructions are helped. 
total cycles in shared programs: 10556.89 -> 10538.98 (-0.17%) cycles in affected programs: 265.56 -> 247.66 (-6.74%) helped: 88 HURT: 2 helped stats (abs) min: 0.015625 max: 3.578125 x̄: 0.20 x̃: 0 helped stats (rel) min: 0.65% max: 22.34% x̄: 5.65% x̃: 4.25% HURT stats (abs) min: 0.0625 max: 0.0625 x̄: 0.06 x̃: 0 HURT stats (rel) min: 8.33% max: 12.50% x̄: 10.42% x̃: 10.42% 95% mean confidence interval for cycles value: -0.28 -0.12 95% mean confidence interval for cycles %-change: -6.30% -4.30% Cycles are helped. total fma in shared programs: 1582.42 -> 1535.06 (-2.99%) fma in affected programs: 871.58 -> 824.22 (-5.43%) helped: 502 HURT: 9 helped stats (abs) min: 0.015625 max: 3.578125 x̄: 0.09 x̃: 0 helped stats (rel) min: 0.60% max: 25.00% x̄: 5.46% x̃: 4.82% HURT stats (abs) min: 0.015625 max: 0.0625 x̄: 0.03 x̃: 0 HURT stats (rel) min: 4.35% max: 12.50% x̄: 6.22% x̃: 4.35% 95% mean confidence interval for fma value: -0.11 -0.08 95% mean confidence interval for fma %-change: -5.58% -4.93% Fma are helped. total cvt in shared programs: 665.55 -> 665.95 (0.06%) cvt in affected programs: 61.72 -> 62.12 (0.66%) helped: 33 HURT: 43 helped stats (abs) min: 0.015625 max: 0.359375 x̄: 0.04 x̃: 0 helped stats (rel) min: 1.01% max: 25.00% x̄: 6.68% x̃: 4.35% HURT stats (abs) min: 0.015625 max: 0.109375 x̄: 0.04 x̃: 0 HURT stats (rel) min: 0.78% max: 38.46% x̄: 10.85% x̃: 6.90% 95% mean confidence interval for cvt value: -0.01 0.02 95% mean confidence interval for cvt %-change: 0.23% 6.24% Inconclusive result (value mean confidence interval includes 0). 
total quadwords in shared programs: 93376 -> 91736 (-1.76%) quadwords in affected programs: 25376 -> 23736 (-6.46%) helped: 169 HURT: 1 helped stats (abs) min: 8.0 max: 128.0 x̄: 9.75 x̃: 8 helped stats (rel) min: 1.52% max: 33.33% x̄: 8.35% x̃: 8.00% HURT stats (abs) min: 8.0 max: 8.0 x̄: 8.00 x̃: 8 HURT stats (rel) min: 25.00% max: 25.00% x̄: 25.00% x̃: 25.00% 95% mean confidence interval for quadwords value: -11.18 -8.11 95% mean confidence interval for quadwords %-change: -8.95% -7.36% Quadwords are helped. total threads in shared programs: 4697 -> 4701 (0.09%) threads in affected programs: 4 -> 8 (100.00%) helped: 4 HURT: 0 helped stats (abs) min: 1.0 max: 1.0 x̄: 1.00 x̃: 1 helped stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00% 95% mean confidence interval for threads value: 1.00 1.00 95% mean confidence interval for threads %-change: 100.00% 100.00% Threads are helped. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Karol Herbst <kherbst@redhat.com> [v1] Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19312>	2022-11-01 22:39:45 -04:00
Karol Herbst	e58c004870	nir/algebraic: add vec8/16 cmp lowering Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19150>	2022-10-29 10:31:39 +00:00
Karol Herbst	5efbef833a	nir/algebraic: generalize vector_cmp lowering Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19150>	2022-10-29 10:31:39 +00:00
Karol Herbst	1d6014f267	nir/algebraic: add 8 and 64 bit urol and uror lowering Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jesse Natalie <jenatali@microsoft.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19150>	2022-10-29 10:31:39 +00:00
Georg Lehmann	125741dbae	nir/opt_algebraic: Optimize various find_msb_rev patterns. From dxvk, dxil-spirv, fxc, dxc and others. Totals from 177 (0.13% of 134913) affected shaders: CodeSize: 1079504 -> 1059872 (-1.82%) Instrs: 195381 -> 192269 (-1.59%) Latency: 3664137 -> 3631951 (-0.88%) InvThroughput: 599479 -> 585675 (-2.30%) Signed-off-by: Georg Lehmann <dadschoorse@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18951>	2022-10-22 11:57:33 +02:00
Georg Lehmann	7505be3497	nir/opt_algebraic: Add an option to lower uclz. Signed-off-by: Georg Lehmann <dadschoorse@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18951>	2022-10-22 11:57:10 +02:00
Georg Lehmann	1e552b9c95	nir/opt_algebraic: Mirror optimizations for find_msb_rev. Signed-off-by: Georg Lehmann <dadschoorse@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18951>	2022-10-22 11:56:44 +02:00
Rhys Perry	1ae73bc076	nir/algebraic: optimize b<<a + c<<a fossil-db (navi21): Totals from 248 (0.18% of 135636) affected shaders: Instrs: 85836 -> 85611 (-0.26%); split: -0.27%, +0.00% CodeSize: 481304 -> 480332 (-0.20%); split: -0.21%, +0.00% Latency: 9596559 -> 9596152 (-0.00%); split: -0.00%, +0.00% InvThroughput: 1423707 -> 1423670 (-0.00%) SClause: 3872 -> 3874 (+0.05%) PreSGPRs: 5034 -> 5038 (+0.08%) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19137>	2022-10-20 18:57:23 +00:00
Alyssa Rosenzweig	ac2964dfbd	nir: Be smarter fusing ffma If there is a single use of fmul, and that single use is fadd, it makes sense to fuse ffma, as we already do. However, if there are multiple uses, fusing may impede code gen. Consider the source fragment: a = fmul(x, y) b = fadd(a, z) c = fmin(a, t) d = fmax(b, c) The fmul has two uses. The current ffma fusing is greedy and will produce the following "optimized" code. a = fmul(x, y) b = ffma(x, y, z) c = fmin(a, t) d = fmax(b, c) Actually, this code is worse! Instead of 1 fmul + 1 fadd, we now have 1 fmul + 1 ffma. In effect, two multiplies (and a fused add) instead of one multiply and an add. Depending on the ISA, that could impede scheduling or increase code size. It can also increase register pressure, extending the live range. It's tempting to gate on is_used_once, but that would hurt in cases where we really do fuse everything, e.g.: a = fmul(x, y) b = fadd(a, z) c = fadd(a, t) For ISAs that fuse ffma, we expect that 2 ffma is faster than 1 fmul + 2 fadd. So what we really want is to fuse ffma iff the fmul will get deleted. That occurs iff all uses of the fmul are fadd and will themselves get fused to ffma, leaving fmul to get dead code eliminated. That's easy to implement with a new NIR search helper, checking that all uses are fadd. 
shader-db results on Mali-G57 [open shader-db + subset of closed]: total instructions in shared programs: 179491 -> 178991 (-0.28%) instructions in affected programs: 36862 -> 36362 (-1.36%) helped: 190 HURT: 27 total cycles in shared programs: 10573.20 -> 10571.75 (-0.01%) cycles in affected programs: 72.02 -> 70.56 (-2.02%) helped: 28 HURT: 1 total fma in shared programs: 1590.47 -> 1582.61 (-0.49%) fma in affected programs: 319.95 -> 312.09 (-2.46%) helped: 194 HURT: 1 total cvt in shared programs: 812.98 -> 813.03 (<.01%) cvt in affected programs: 118.53 -> 118.58 (0.04%) helped: 65 HURT: 81 total quadwords in shared programs: 98968 -> 98840 (-0.13%) quadwords in affected programs: 2960 -> 2832 (-4.32%) helped: 20 HURT: 4 total threads in shared programs: 4693 -> 4697 (0.09%) threads in affected programs: 4 -> 8 (100.00%) helped: 4 HURT: 0 v2: Update trace checksums for virgl due to numerical differences. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18814>	2022-10-15 17:47:31 +00:00
Gert Wollny	2e50bf19cd	nir: move fusing csel and comparisons to opt_late_algebraic With that, simple comparisons are cleaned up properly. This helps with some tessellation shaders on r600. Shader-db stats R600/Cayman: -------------------------------------------------------------- total dw in shared programs: 1621806 -> 1620884 (-0.06%) dw in affected programs: 41650 -> 40728 (-2.21%) helped: 211 HURT: 4 helped stats (abs) min: 2 max: 26 x̄: 4.46 x̃: 4 helped stats (rel) min: 0.30% max: 9.68% x̄: 2.87% x̃: 2.52% HURT stats (abs) min: 2 max: 8 x̄: 5.00 x̃: 5 HURT stats (rel) min: 0.23% max: 1.67% x̄: 1.02% x̃: 1.09% 95% mean confidence interval for dw value: -4.81 -3.77 95% mean confidence interval for dw %-change: -3.03% -2.57% Dw are helped. total gprs in shared programs: 41192 -> 41182 (-0.02%) gprs in affected programs: 731 -> 721 (-1.37%) helped: 53 HURT: 45 helped stats (abs) min: 1 max: 3 x̄: 1.23 x̃: 1 helped stats (rel) min: 5.88% max: 40.00% x̄: 16.56% x̃: 14.29% HURT stats (abs) min: 1 max: 2 x̄: 1.22 x̃: 1 HURT stats (rel) min: 7.69% max: 40.00% x̄: 19.42% x̃: 20.00% 95% mean confidence interval for gprs value: -0.37 0.16 95% mean confidence interval for gprs %-change: -3.92% 3.85% Inconclusive result (value mean confidence interval includes 0). total alu_groups in shared programs: 203677 -> 203632 (-0.02%) alu_groups in affected programs: 2876 -> 2831 (-1.56%) helped: 68 HURT: 30 helped stats (abs) min: 1 max: 4 x̄: 1.46 x̃: 1 helped stats (rel) min: 0.84% max: 25.00% x̄: 7.48% x̃: 5.41% HURT stats (abs) min: 1 max: 6 x̄: 1.80 x̃: 1 HURT stats (rel) min: 1.98% max: 33.33% x̄: 10.09% x̃: 5.61% 95% mean confidence interval for alu_groups value: -0.81 -0.11 95% mean confidence interval for alu_groups %-change: -4.20% <.01% Alu_groups are helped. total loops in shared programs: 72 -> 72 (0.00%) loops in affected programs: 0 -> 0 helped: 0 HURT: 0 total cf in shared programs: 88230 -> 88233 (<.01%) cf in affected programs: 71 -> 74 (4.23%) helped: 1 HURT: 4 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 33.33% max: 33.33% x̄: 33.33% x̃: 33.33% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 1.89% max: 33.33% x̄: 17.14% x̃: 16.67% 95% mean confidence interval for cf value: -0.51 1.71 95% mean confidence interval for cf %-change: -24.20% 38.29% Inconclusive result (value mean confidence interval includes 0). total stack in shared programs: 3827 -> 3827 (0.00%) stack in affected programs: 0 -> 0 helped: 0 HURT: 0 LOST: 0 GAINED: 0 Total CPU time (seconds): 45.32 -> 41.69 (-8.01%) -------------------------------------------------------------- v2: Simplify replacement pattern (Rhys Perry) v3: fix ws (Alexander Orzechowski) v4: move the original lowering to opt_late_algebraic and drop cleanup code (Alyssa) v5: Add shader-db stats (Alyssa) Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18970>	2022-10-14 13:08:15 +00:00
Georg Lehmann	bfb12a3b6a	nir/opt_algebraic: Optimize more (a cmp b ? a : b) to min/max. Foz-DB Navi21: Totals from 112 (0.08% of 134913) affected shaders: CodeSize: 1618384 -> 1618172 (-0.01%); split: -0.06%, +0.04% Instrs: 307695 -> 307535 (-0.05%); split: -0.05%, +0.00% Latency: 3590228 -> 3589658 (-0.02%); split: -0.02%, +0.00% InvThroughput: 563692 -> 563447 (-0.04%); split: -0.05%, +0.01% Copies: 24541 -> 24519 (-0.09%); split: -0.10%, +0.01% Branches: 13480 -> 13468 (-0.09%) Signed-off-by: Georg Lehmann <dadschoorse@gmail.com> Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18548>	2022-09-30 11:10:52 +00:00
Rhys Perry	b301c33f65	nir/algebraic: optimize fabs(bcsel(b, fneg(a), a)) fossil-db (Sienna Cichlid): Totals from 207 (0.15% of 134913) affected shaders: VGPRs: 7152 -> 6928 (-3.13%) CodeSize: 762404 -> 752888 (-1.25%) MaxWaves: 6138 -> 6146 (+0.13%) Instrs: 144031 -> 142184 (-1.28%) Latency: 817783 -> 807286 (-1.28%) InvThroughput: 151031 -> 147497 (-2.34%) VClause: 1490 -> 1453 (-2.48%) SClause: 3357 -> 3331 (-0.77%); split: -0.92%, +0.15% Copies: 9632 -> 9555 (-0.80%); split: -0.81%, +0.01% Branches: 4306 -> 4270 (-0.84%) PreSGPRs: 11232 -> 11218 (-0.12%); split: -0.15%, +0.03% PreVGPRs: 6307 -> 6121 (-2.95%) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14772>	2022-09-14 12:16:07 +00:00
Rhys Perry	c23411a970	nir/algebraic: optimize bits=umin(bits, 32-(offset&0x1f)) Optimizes patterns which are created by recent versions of vkd3d-proton, when constant folding doesn't eliminate it entirely: - ubitfield_extract(value, offset, umin(bits, 32-(offset&0x1f))) - ibitfield_extract(value, offset, umin(bits, 32-(offset&0x1f))) - bitfield_insert(base, insert, offset, umin(bits, 32-(offset&0x1f))) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13225>	2022-09-13 20:36:06 +00:00
Georg Lehmann	4d7fe94f3a	nir/opt_algebraic: Optimize unpacking of upcasts to 64bit integers. Foz-DB Navi21: Totals from 7 (0.01% of 134913) affected shaders: CodeSize: 213364 -> 213028 (-0.16%) Instrs: 38347 -> 38319 (-0.07%) Latency: 780148 -> 779776 (-0.05%) InvThroughput: 520098 -> 519851 (-0.05%) Signed-off-by: Georg Lehmann <dadschoorse@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18435>	2022-09-08 14:37:56 +00:00
Georg Lehmann	6eb4dfca23	nir/opt_algebraic: Optimize d3d9 pow with fmulz. Foz-DB Navi21: Totals from 69 (0.05% of 134913) affected shaders: CodeSize: 255684 -> 253788 (-0.74%); split: -0.74%, +0.00% Instrs: 46307 -> 46052 (-0.55%); split: -0.55%, +0.00% Latency: 533255 -> 530742 (-0.47%); split: -0.48%, +0.01% InvThroughput: 110001 -> 109156 (-0.77%) VClause: 839 -> 844 (+0.60%); split: -1.19%, +1.79% SClause: 1411 -> 1395 (-1.13%) Copies: 1828 -> 1816 (-0.66%); split: -1.09%, +0.44% PreSGPRs: 2243 -> 2232 (-0.49%) PreVGPRs: 2213 -> 2192 (-0.95%) Signed-off-by: Georg Lehmann <dadschoorse@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18145>	2022-08-31 17:07:24 +00:00
Georg Lehmann	9c2c47884d	nir/opt_algebraic: Optimize check for single bit. Foz-DB Navi21: Totals from 3239 (2.40% of 134913) affected shaders: SpillSGPRs: 110 -> 102 (-7.27%) CodeSize: 17426512 -> 17344808 (-0.47%); split: -0.48%, +0.01% Instrs: 3194264 -> 3179366 (-0.47%) Latency: 20498012 -> 20481419 (-0.08%); split: -0.08%, +0.00% InvThroughput: 3311738 -> 3311282 (-0.01%); split: -0.02%, +0.00% SClause: 145810 -> 145690 (-0.08%) Copies: 171748 -> 169009 (-1.59%); split: -1.63%, +0.03% Branches: 86610 -> 86370 (-0.28%) PreSGPRs: 138036 -> 137104 (-0.68%) PreVGPRs: 138540 -> 138545 (+0.00%) Signed-off-by: Georg Lehmann <dadschoorse@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17429>	2022-08-31 18:36:33 +02:00
Daniel Schürmann	9b843f8e4a	nir/opt_algebraic: a & ~a -> 0 Also re-ordered some optimizations for better readability. Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18250>	2022-08-30 14:10:22 +00:00
Emma Anholt	f6c5b1d6c6	nir: Split usub_sat lowering flag from uadd_sat. Intel vec4 would like to do uadd_sat, but use lowering for usub_sat. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17637>	2022-07-22 17:54:28 +00:00
Georg Lehmann	aac8ddae2f	nir/opt_algebraic: Optimize [ui](add|sub)_sat with 0. Signed-off-by: Georg Lehmann <dadschoorse@gmail.com> Reviewed-by: Emma Anholt <emma@anholt.net> Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17468>	2022-07-13 07:34:09 +00:00
Rhys Perry	bc1ea2fda9	nir/algebraic: optimize bcsel(c, fsin/cos_amd(a), fsin/cos_amd(b)) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10587>	2022-07-07 22:18:08 +00:00
Ian Romanick	a2a2fbc510	nir/algebraic: Fix NaN-unsafe fcsel patterns For example, the proof for this pattern (('bcsel', ('flt', 'a@32', 0), 'b@32', 'c@32'), ('fcsel_ge', a, c, b)), would be bcsel(a < 0, b, c) bcsel(!(a < 0), c, b) bcsel(a >= 0, c, b) fcsel_ge(a, c, b) However, !(a < 0) => (a >= 0) is well known to produce different results if `a` is NaN. Instead of that replacement, use this replacement: bcsel(a < 0, b, c) bcsel(-0 < -a, b, c) bcsel(0 < -a, b, c) fcsel_gt(-a, b, c) This is NaN-safe and exact. Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com> Fixes: `0f5b3c37c5` ("nir: Add opcodes for fused comp + csel and optimizations") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17048>	2022-06-22 19:26:59 +00:00
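The NaN argument in a2a2fbc510 can be checked numerically. The following is a minimal sketch in plain Python, not Mesa code: `bcsel`, `fcsel_ge`, `fcsel_gt` and `evaluate` are hypothetical stand-ins, and the semantics assumed for the fused opcodes (select the first value when the comparison of `a` against zero holds, otherwise the second) are inferred from the proof in the commit message.

```python
import math

def bcsel(cond, x, y):
    """Stand-in for bcsel: x when cond is true, otherwise y."""
    return x if cond else y

def fcsel_ge(a, x, y):
    """Assumed semantics: x when a >= 0, otherwise y."""
    return x if a >= 0.0 else y

def fcsel_gt(a, x, y):
    """Assumed semantics: x when a > 0, otherwise y."""
    return x if a > 0.0 else y

def evaluate(a, b, c):
    """Return (original, old replacement, new replacement) for one input."""
    original = bcsel(a < 0.0, b, c)   # bcsel(a < 0, b, c)
    unsafe = fcsel_ge(a, c, b)        # old replacement, wrong for NaN
    safe = fcsel_gt(-a, b, c)         # new replacement from the commit
    return original, unsafe, safe

if __name__ == "__main__":
    for a in (-math.inf, -1.0, -0.0, 0.0, 1.0, math.inf, math.nan):
        original, unsafe, safe = evaluate(a, "b", "c")
        print(f"a={a!r:>5}: original={original} fcsel_ge={unsafe} fcsel_gt={safe}")
```

Every comparison involving NaN is false, so the original expression selects `c`, while the `fcsel_ge` form selects `b`. The `fcsel_gt(-a, b, c)` form agrees with the original for every input in the loop, including both signed zeros.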
Georg Lehmann	bfc25d6ec9	nir: Add optional lowering for mul_32x16. Signed-off-by: Georg Lehmann <dadschoorse@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13895>	2022-06-01 17:09:25 +00:00

478 commits