Commit graph

478 commits

Author SHA1 Message Date
Isabella Basso
a27bcd63d0 nir/algebraic: extend mediump patterns
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Suggested-by: Italo Nicola <italonicola@collabora.com>
Signed-off-by: Isabella Basso <isabellabdoamaral@usp.br>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20965>
2023-03-11 17:21:37 +00:00
Isabella Basso
b3685f3ba7 nir/algebraic: insert patterns inside optimizations list
Some patterns were outside the list of optimizations.

Fixes: b86305bb ("nir/algebraic: collapse conversion opcodes (many patterns)")

Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Signed-off-by: Isabella Basso <isabellabdoamaral@usp.br>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20965>
2023-03-11 17:21:37 +00:00
Ian Romanick
831f9d3f61 nir/algebraic: Optimize some ifind_msb to ufind_msb
On Intel platforms, the uclz lowering if ufind_msb is either one
instruction better (Gfx7 and newer) or two instructions better (all
older platforms) than the ifind_msb implementations.

On platforms that use lower_find_msb_to_reverse, there should be no
difference.

All Haswell and newer Intel platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 19938662 -> 19938634 (<.01%)
instructions in affected programs: 850 -> 822 (-3.29%)
helped: 2 / HURT: 0

total cycles in shared programs: 858467067 -> 858465538 (<.01%)
cycles in affected programs: 10080 -> 8551 (-15.17%)
helped: 2 / HURT: 0

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19042>
2023-03-10 15:27:17 +00:00
Ian Romanick
2d6f48f6ef nir/algebraic: Do not generate 8- or 16-bit find_msb
The next commit will add validation to restrict this instruction (and
others) to only 32-bit or 64-bit sources.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19042>
2023-03-10 15:27:17 +00:00
Ian Romanick
28311f9d02 nir: intel/compiler: Move ufind_msb lowering to NIR
Fossil-db results:

All Intel platforms had similar results. (Ice Lake shown)
Cycles in all programs: 9098346105 -> 9098333765 (-0.0%)
Cycles helped: 6

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19042>
2023-03-10 15:27:17 +00:00
Ian Romanick
a4052e70ea nir/algebraic: Only lower ufind_msb with 32-bit sources
The 31-ufind_msb_rev(x) lowering only produces the correct result for
32-bit sources. ufind_msb_rev can also have 64-bit sources, and most
platforms are expected to lower this to 32-bit instructions with extra
logic operations.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19042>
2023-03-10 15:27:17 +00:00
Ian Romanick
0cc7bf63b7 nir: intel/compiler: Move ifind_msb lowering to NIR
Unlike ufind_msb, ifind_msb is only defined in NIR for 32-bit values, so
no @32 annotation is required.

No shader-db or fossil-db changes on any Intel platform.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19042>
2023-03-10 15:27:17 +00:00
Georg Lehmann
aeb68c29b4 nir/opt_algebraic: add patterns for iand/ior of feq/fneu with 0
Foz-DB Navi21:
Totals from 1245 (0.92% of 134913) affected shaders:
VGPRs: 66232 -> 66248 (+0.02%); split: -0.01%, +0.04%
CodeSize: 5874976 -> 5868168 (-0.12%); split: -0.17%, +0.05%
MaxWaves: 25278 -> 25274 (-0.02%); split: +0.01%, -0.02%
Instrs: 1087502 -> 1085267 (-0.21%); split: -0.21%, +0.00%
Latency: 6531489 -> 6531672 (+0.00%); split: -0.04%, +0.05%
InvThroughput: 1531774 -> 1532327 (+0.04%); split: -0.02%, +0.05%
VClause: 22218 -> 22202 (-0.07%); split: -0.08%, +0.00%
SClause: 45906 -> 45873 (-0.07%); split: -0.08%, +0.01%
Copies: 64004 -> 64102 (+0.15%); split: -0.24%, +0.39%
Branches: 21529 -> 21534 (+0.02%); split: -0.00%, +0.03%
PreSGPRs: 51936 -> 51850 (-0.17%)
PreVGPRs: 55393 -> 55398 (+0.01%); split: -0.02%, +0.03%

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21576>
2023-03-01 11:24:43 +00:00
Emma Anholt
6d52e6fd2c nir: Port a floor->truncate algebraic opt pattern from GLSL.
Prevents regression when dropping code from the GLSL optimizer.

Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21475>
2023-02-28 03:36:09 +00:00
Emma Anholt
ef02581590 nir: Add optimization for fdot(x, 0) -> 0.
We had all these nice fdot opts to drop individual channels that were 0,
but nothing handling it being entirely 0!  Avoids r300g regression when
dropping them from GLSL.

Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21475>
2023-02-28 03:36:08 +00:00
Ian Romanick
ea413e826b nir: Eliminate nir_op_f2b
Builds on the work of !15121.  This gets to delete even more code
because many drivers shared a lot of code for i2b and f2b.

No shader-db or fossil-db changes on any Intel platform.

v2: Rebase on 1a35acd8d9.

v3: Update a comment in nir_opcodes_c.py. Suggested by Konstantin.

v4: Another rebase. Remove f2b stuff from Midgard.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20509>
2023-02-03 22:39:57 +00:00
Timur Kristóf
1244506c15 nir/opt_algebraic: Add optimization for ieq/ine and right-shift.
Fossil DB stats on GFX11:
Totals from 1343 (1.00% of 134913) affected shaders:
SpillSGPRs: 7145 -> 7137 (-0.11%)
CodeSize: 20737744 -> 20739148 (+0.01%); split: -0.02%, +0.03%
Instrs: 4010443 -> 4008449 (-0.05%); split: -0.05%, +0.00%
Latency: 50021520 -> 50021105 (-0.00%); split: -0.00%, +0.00%
InvThroughput: 6354371 -> 6354112 (-0.00%); split: -0.00%, +0.00%
VClause: 63035 -> 63038 (+0.00%); split: -0.01%, +0.01%
SClause: 121162 -> 121166 (+0.00%)
Copies: 251354 -> 251058 (-0.12%); split: -0.18%, +0.06%
PreSGPRs: 137283 -> 137299 (+0.01%)

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20936>
2023-02-02 03:08:19 +00:00
Timur Kristóf
65a917cb6e nir: Add algebraic optimization for VKD3D-Proton fp32->fp16 conversion.
VKD3D-Proton DXBC f32 to f16 conversion implements a float conversion using PackHalf2x16.
Because the spec does not specify a rounding mode, it emits a sequence to ensure
D3D-like behaviour for infinity.

When we know the current backend has pack_half_2x16_rtz_split,
we can eliminate the extra sequence.

Fossil DB stats on GFX11:
Totals from 835 (0.62% of 134913) affected shaders:
VGPRs: 49368 -> 49224 (-0.29%)
CodeSize: 5341956 -> 5124564 (-4.07%)
Instrs: 1024062 -> 987041 (-3.62%)
Latency: 6530956 -> 6465120 (-1.01%); split: -1.01%, +0.00%
InvThroughput: 908189 -> 870253 (-4.18%)
VClause: 18704 -> 18702 (-0.01%); split: -0.02%, +0.01%
SClause: 33406 -> 33284 (-0.37%); split: -0.38%, +0.01%
Copies: 67440 -> 65992 (-2.15%); split: -2.15%, +0.00%
Branches: 18498 -> 18465 (-0.18%)
PreSGPRs: 38409 -> 38331 (-0.20%)
PreVGPRs: 44089 -> 43834 (-0.58%)

Note, some fossils are from before this pattern was added to VKD3D-Proton,
so the above may not reflect real-world impact.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15838>
2023-01-26 12:24:24 +00:00
Timur Kristóf
7985933a6d nir: Lower pack_half_2x16_split to RTZ if available.
Constant folding always uses RTNE for pack_half_2x16_split, but some
backends implement it with RTZ.

Lowering to RTZ when available ensures that the behaviour will be
consistent between constant folding and the backend.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15838>
2023-01-26 12:24:24 +00:00
Alyssa Rosenzweig
c3839bd540 nir: Optimize vendored sin/cos the same way
As we've done for the AMD one, to prevent any codegen regression from switching
the Midgard lowering.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Italo Nicola <italonicola@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19350>
2023-01-16 22:20:43 +00:00
Ian Romanick
eb76cee9f8 nir: Eliminate nir_op_i2b
There are a lot of optimizations in opt_algebraic that match ('ine', a,
0), but there are almost none that match i2b.  Instead of adding a huge
pile of additional patterns (including variations that include both ine
and i2b), always lower i2b to a != 0.

At this point in the series, it should be impossible for anything to
generate i2b, so there /should not/ be any changes.

The failing test on d3d12 is a pre-existing bug that is triggered by
this change.  I talked to Jesse about it, and, after some analysis, he
suggested just adding it to the list of known failures.

v2: Don't rematerialize i2b instructions in dxil_nir_lower_x2b.

v3: Don't rematerialize i2b instructions in zink_nir_algebraic.py.

v4: Fix zink-on-TGL CI failures by calling nir_opt_algebraic after
nir_lower_doubles makes progress.  The latter can generate b2i
instructions, but nir_lower_int64 can't handle them (anymore).

v5: Add back most of the hunk at line 2125 of nir_opt_algebraic.py. I
had accidentally removed the f2b(bf2(x)) optimization.

v6: Just eliminate the i2b instruction.

v7: Remove missed i2b32 in midgard_compile.c. Remove (now unused)
emit_alu_i2orf2_b1 function from sfn_instr_alu.cpp. Previously this
function was still used. 🤷

No shader-db changes on any Intel platform.

All Intel platforms had similar results. (Ice Lake shown)
Instructions in all programs: 141165875 -> 141165873 (-0.0%)
Instructions helped: 2

Cycles in all programs: 9098956382 -> 9098956350 (-0.0%)
Cycles helped: 2

The two Vulkan shaders are helped because of the "new" (('b2i32',
('ine', ('ubfe', a, b, 1), 0)), ('ubfe', a, b, 1)) algebraic pattern.

Acked-by: Jesse Natalie <jenatali@microsoft.com> [earlier version]
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Daniel Schürmann <daniel@schuermann.dev> [earlier version]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>
2022-12-14 06:23:21 +00:00
Ian Romanick
b60b2f2add nir/algebraic: Optimize some b2i involved in masking operations
v2: Remove the ineg from the b2i in the ior pattern.  Suggested by
Jason.

All Ivy Bridge and newer Intel platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 19914441 -> 19914369 (<.01%)
instructions in affected programs: 63507 -> 63435 (-0.11%)
helped: 24 / HURT: 0

total cycles in shared programs: 853869766 -> 853851470 (<.01%)
cycles in affected programs: 10551542 -> 10533246 (-0.17%)
helped: 24 / HURT: 0

All Intel platforms had similar results. (Ice Lake shown)
Instructions in all programs: 141163061 -> 141092683 (-0.0%)
Instructions helped: 14103
Instructions hurt: 55

Cycles in all programs: 9132376195 -> 9133183045 (+0.0%)
Cycles helped: 13775
Cycles hurt: 380

Spills in all programs: 18286 -> 18284 (-0.0%)
Spills helped: 1

Fills in all programs: 30647 -> 30643 (-0.0%)
Fills helped: 1

Gained: 133
Lost: 130

Acked-by: Jesse Natalie <jenatali@microsoft.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>
2022-12-14 06:23:21 +00:00
Ian Romanick
ba0b248ac2 nir/algebraic: Eliminate unary op on src of integer comparison w/ zero
This helps because it enables cmod propagation to do more.

The removed patterns involving b2i will be handled by other existing
patterns after the unary operations are removed.

All Intel platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 19914458 -> 19914441 (<.01%)
instructions in affected programs: 5456 -> 5439 (-0.31%)
helped: 17 / HURT: 0

total cycles in shared programs: 855302118 -> 853869766 (-0.17%)
cycles in affected programs: 327354347 -> 325921995 (-0.44%)
helped: 291 / HURT: 81

All Intel platforms had similar results. (Ice Lake shown)
Instructions in all programs: 141205979 -> 141205961 (-0.0%)
Instructions helped: 4
Instructions hurt: 3

SENDs in all programs: 7466919 -> 7466913 (-0.0%)
SENDs helped: 1

Cycles in all programs: 9133387327 -> 9133384475 (-0.0%)
Cycles helped: 3
Cycles hurt: 12

In the shader that was helped for sends, it appears that a NIR pass that
moves code out of loops was able to move 3 send operations outside a
loop after this change.  I did not investigate further.

Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Acked-by: Jesse Natalie <jenatali@microsoft.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>
2022-12-14 06:23:20 +00:00
Ian Romanick
ee15d89322 nir/algebraic: Simplify min and max of b2i
This prevents ~400 shader-db regresssions and a handful of fossil-db
regressions after i2b is always lowered.

All Ivy Bridge and newer Intel platforms had similar results. (Ice Lake shown)
total cycles in shared programs: 855301494 -> 855302118 (<.01%)
cycles in affected programs: 52787 -> 53411 (1.18%)
helped: 4 / HURT: 5

All Intel platforms had similar results. (Ice Lake shown)
Instructions in all programs: 141206055 -> 141205979 (-0.0%)
Instructions helped: 14

Cycles in all programs: 9133376616 -> 9133387327 (+0.0%)
Cycles helped: 13
Cycles hurt: 3

Acked-by: Jesse Natalie <jenatali@microsoft.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>
2022-12-14 06:23:20 +00:00
Ian Romanick
19222867e4 nir/algebraic: Reassociate some iand to eliminate an operation
No shader-db changes on any Intel platform.

All of the helped shaders were presumably regressed by 4676b3d3dd (nir:
Use nir_test_mask instead of i2b(iand)).

v2: Add some comments explaining why specific replacements are used.  In
the umin pattern, only markup the first usage of 'b' in the source
pattern.

Tiger Lake, Ice Lake, and Skylake had similar results. (Ice Lake shown)
Instructions in all programs: 141384970 -> 141200966 (-0.1%)
Instructions helped: 45842

Cycles in all programs: 9133648977 -> 9133282672 (-0.0%)
Cycles helped: 26812
Cycles hurt: 6025

Gained: 23
Lost: 135

Acked-by: Jesse Natalie <jenatali@microsoft.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>
2022-12-14 06:23:20 +00:00
Ian Romanick
d48ce1f47d nir/algebraic: Remove redundant i2b(b2i(x)) patterns
A loop below already adds all the permutations... including the 1-bit
version that isn't included in this group.

No shader-db or fossil-db changes on any Intel platform.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Acked-by: Jesse Natalie <jenatali@microsoft.com>
Tested-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>
2022-12-14 06:23:20 +00:00
Ian Romanick
14a9bb04e4 nir/algebraic: Remove redundant i2b(-x) pattern
The exact same pattern appears later (around line 1323).

No shader-db or fossil-db changes on any Intel platform.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Acked-by: Jesse Natalie <jenatali@microsoft.com>
Tested-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>
2022-12-14 06:23:20 +00:00
Georg Lehmann
4dff3ff005 nir/opt_algebraic: Optimize open coded bfm.
Foz-DB Navi21:
Totals from 1553 (1.15% of 134913) affected shaders:
SpillVGPRs: 2246 -> 2223 (-1.02%); split: -1.42%, +0.40%
CodeSize: 10409156 -> 10410720 (+0.02%); split: -0.03%, +0.04%
Instrs: 1899725 -> 1898773 (-0.05%); split: -0.07%, +0.02%
Latency: 71225814 -> 71118314 (-0.15%); split: -0.21%, +0.06%
InvThroughput: 13384926 -> 13330369 (-0.41%); split: -0.47%, +0.06%
VClause: 38309 -> 38284 (-0.07%); split: -0.17%, +0.11%
SClause: 70743 -> 70706 (-0.05%)
Copies: 167296 -> 167230 (-0.04%); split: -0.28%, +0.24%
Branches: 42446 -> 42444 (-0.00%); split: -0.01%, +0.00%
PreVGPRs: 95191 -> 95188 (-0.00%)

Some minor instructions count regressions in parallel-rdp
because v_bfm_b32 can't use SDWA, but overall an improvement.

Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18887>
2022-12-09 14:59:16 +00:00
Rhys Perry
368be87255 nir/algebraic: shrink 64-bit bitwise operations with 0/-1 constant half
fossil-db (navi21):
Totals from 457 (0.34% of 135636) affected shaders:
Instrs: 259349 -> 250383 (-3.46%)
CodeSize: 1411976 -> 1369136 (-3.03%)
Latency: 2175961 -> 2148158 (-1.28%)
InvThroughput: 502206 -> 490244 (-2.38%)
Copies: 15238 -> 15232 (-0.04%); split: -0.07%, +0.03%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19748>
2022-11-21 17:34:46 +00:00
Rhys Perry
e19584db2b nir/algebraic: optimize open-coded uadd_sat/usub_sat
fossil-db (navi21):
Totals from 19 (0.01% of 135636) affected shaders:
Instrs: 40730 -> 40688 (-0.10%)
CodeSize: 217708 -> 217568 (-0.06%)
Latency: 261466 -> 261373 (-0.04%)
InvThroughput: 74944 -> 74896 (-0.06%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19473>
2022-11-18 18:31:32 +00:00
Timothy Arceri
63c4849e8b nir: add another common ffract -> ffloor pattern
shader-db results (BDW):

total instructions in shared programs: 17527053 -> 17526931 (<.01%)
instructions in affected programs: 5116 -> 4994 (-2.38%)
helped: 25
HURT: 0
helped stats (abs) min: 2 max: 15 x̄: 4.88 x̃: 3
helped stats (rel) min: 0.25% max: 5.34% x̄: 3.39% x̃: 3.90%
95% mean confidence interval for instructions value: -6.19 -3.57
95% mean confidence interval for instructions %-change: -3.98% -2.81%
Instructions are helped.

total cycles in shared programs: 856680230 -> 856682009 (<.01%)
cycles in affected programs: 6583780 -> 6585559 (0.03%)
helped: 117
HURT: 77
helped stats (abs) min: 1 max: 854 x̄: 68.56 x̃: 16
helped stats (rel) min: <.01% max: 35.34% x̄: 2.12% x̃: 0.76%
HURT stats (abs)   min: 1 max: 2188 x̄: 127.27 x̃: 18
HURT stats (rel)   min: 0.01% max: 22.66% x̄: 1.86% x̃: 0.67%
95% mean confidence interval for cycles value: -30.07 48.41
95% mean confidence interval for cycles %-change: -1.28% 0.19%
Inconclusive result (value mean confidence interval includes 0).

LOST:   3
GAINED: 1

Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19666>
2022-11-14 09:50:11 +11:00
Timothy Arceri
34c52d8cb9 nir: fix typo in lower_double options handling
Seems the intention was to check that both flags were not enabled
instead we were checking that the floor flag was both set and not
set so the result would always be false.

Fixes: 3749a6ecd2 ("nir: honor lower_double options for ffloor and ffract")

Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19642>
2022-11-11 14:36:00 +00:00
Gert Wollny
917d992b32 nir/algeraic_opt: use double options too for lowering ftrunc@64
ftrunc@64 also might need lowering on fp64 only, especially now
that it might be introduced by nir_lower_int64.

Fixes: 29da985682
   nir/lower_int64: Enable lowering of 64-bit float to 64-bit integer conversions.

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19657>
2022-11-11 09:29:31 +00:00
Alyssa Rosenzweig
45a111c21c nir/opt_algebraic: Fuse c - a * b to FMA
Algebraically it is clear that

   -(a * b) + c = (-a) * b + c = fma(-a, b, c)

But this is not clear from the NIR

   ('fadd', ('fneg', ('fmul', a, b)), c)

Add rules to handle this case specially. Note we don't necessarily want
to  solve this by pushing fneg into fmul, because the rule opt_algebraic
(not the late part where FMA fusing happens) specifically pulls fneg out
of fmul to push fneg up multiplication chains.

Noticed in the big glmark2 "terrain" shader, which has a cycle count
reduced by 22% on Mali-G57 thanks to having this pattern a ton and being
FMA bound.

BEFORE: 1249 inst, 16.015625 cycles, 16.015625 fma, ... 632 quadwords
AFTER: 997 inst, 12.437500 cycles, .... 504 quadwords

Results on the same shader on AGX are also quite dramatic:

BEFORE: 1294 inst, 8600 bytes, 50 halfregs, ...
AFTER: 1154 inst, 8040 bytes, 50 halfregs, ...

Similar rules apply for fabs.

v2: Use a loop over the bit sizes (suggested by Emma).

shader-db on Valhall (open + small subset of closed), results on Bifrost
are similar:

total instructions in shared programs: 167975 -> 164970 (-1.79%)
instructions in affected programs: 92642 -> 89637 (-3.24%)
helped: 492
HURT: 25
helped stats (abs) min: 1.0 max: 252.0 x̄: 6.25 x̃: 3
helped stats (rel) min: 0.30% max: 20.18% x̄: 3.21% x̃: 2.91%
HURT stats (abs)   min: 1.0 max: 5.0 x̄: 2.80 x̃: 3
HURT stats (rel)   min: 0.46% max: 9.09% x̄: 3.89% x̃: 3.37%
95% mean confidence interval for instructions value: -6.95 -4.68
95% mean confidence interval for instructions %-change: -3.08% -2.65%
Instructions are helped.

total cycles in shared programs: 10556.89 -> 10538.98 (-0.17%)
cycles in affected programs: 265.56 -> 247.66 (-6.74%)
helped: 88
HURT: 2
helped stats (abs) min: 0.015625 max: 3.578125 x̄: 0.20 x̃: 0
helped stats (rel) min: 0.65% max: 22.34% x̄: 5.65% x̃: 4.25%
HURT stats (abs)   min: 0.0625 max: 0.0625 x̄: 0.06 x̃: 0
HURT stats (rel)   min: 8.33% max: 12.50% x̄: 10.42% x̃: 10.42%
95% mean confidence interval for cycles value: -0.28 -0.12
95% mean confidence interval for cycles %-change: -6.30% -4.30%
Cycles are helped.

total fma in shared programs: 1582.42 -> 1535.06 (-2.99%)
fma in affected programs: 871.58 -> 824.22 (-5.43%)
helped: 502
HURT: 9
helped stats (abs) min: 0.015625 max: 3.578125 x̄: 0.09 x̃: 0
helped stats (rel) min: 0.60% max: 25.00% x̄: 5.46% x̃: 4.82%
HURT stats (abs)   min: 0.015625 max: 0.0625 x̄: 0.03 x̃: 0
HURT stats (rel)   min: 4.35% max: 12.50% x̄: 6.22% x̃: 4.35%
95% mean confidence interval for fma value: -0.11 -0.08
95% mean confidence interval for fma %-change: -5.58% -4.93%
Fma are helped.

total cvt in shared programs: 665.55 -> 665.95 (0.06%)
cvt in affected programs: 61.72 -> 62.12 (0.66%)
helped: 33
HURT: 43
helped stats (abs) min: 0.015625 max: 0.359375 x̄: 0.04 x̃: 0
helped stats (rel) min: 1.01% max: 25.00% x̄: 6.68% x̃: 4.35%
HURT stats (abs)   min: 0.015625 max: 0.109375 x̄: 0.04 x̃: 0
HURT stats (rel)   min: 0.78% max: 38.46% x̄: 10.85% x̃: 6.90%
95% mean confidence interval for cvt value: -0.01 0.02
95% mean confidence interval for cvt %-change: 0.23% 6.24%
Inconclusive result (value mean confidence interval includes 0).

total quadwords in shared programs: 93376 -> 91736 (-1.76%)
quadwords in affected programs: 25376 -> 23736 (-6.46%)
helped: 169
HURT: 1
helped stats (abs) min: 8.0 max: 128.0 x̄: 9.75 x̃: 8
helped stats (rel) min: 1.52% max: 33.33% x̄: 8.35% x̃: 8.00%
HURT stats (abs)   min: 8.0 max: 8.0 x̄: 8.00 x̃: 8
HURT stats (rel)   min: 25.00% max: 25.00% x̄: 25.00% x̃: 25.00%
95% mean confidence interval for quadwords value: -11.18 -8.11
95% mean confidence interval for quadwords %-change: -8.95% -7.36%
Quadwords are helped.

total threads in shared programs: 4697 -> 4701 (0.09%)
threads in affected programs: 4 -> 8 (100.00%)
helped: 4
HURT: 0
helped stats (abs) min: 1.0 max: 1.0 x̄: 1.00 x̃: 1
helped stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00%
95% mean confidence interval for threads value: 1.00 1.00
95% mean confidence interval for threads %-change: 100.00% 100.00%
Threads are helped.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Marek Ol<C5><A1><C3><A1>k <marek.olsak@amd.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com> [v1]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19312>
2022-11-01 22:39:45 -04:00
Karol Herbst
e58c004870 nir/algebraic: add vec8/16 cmp lowering
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19150>
2022-10-29 10:31:39 +00:00
Karol Herbst
5efbef833a nir/algebraic: generalize vector_cmp lowering
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19150>
2022-10-29 10:31:39 +00:00
Karol Herbst
1d6014f267 nir/algebraic: add 8 and 64 bit urol and uror lowering
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19150>
2022-10-29 10:31:39 +00:00
Georg Lehmann
125741dbae nir/opt_algebraic: Optimize various find_msb_rev patterns.
From dxvk, dxil-spirv, fxc, dxc and others.

Totals from 177 (0.13% of 134913) affected shaders:
CodeSize: 1079504 -> 1059872 (-1.82%)
Instrs: 195381 -> 192269 (-1.59%)
Latency: 3664137 -> 3631951 (-0.88%)
InvThroughput: 599479 -> 585675 (-2.30%)

Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18951>
2022-10-22 11:57:33 +02:00
Georg Lehmann
7505be3497 nir/opt_algebraic: Add an option to lower uclz.
Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18951>
2022-10-22 11:57:10 +02:00
Georg Lehmann
1e552b9c95 nir/opt_algebraic: Mirror optimizations for find_msb_rev.
Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18951>
2022-10-22 11:56:44 +02:00
Rhys Perry
1ae73bc076 nir/algebraic: optimize b<<a + c<<a
fossil-db (navi21):
Totals from 248 (0.18% of 135636) affected shaders:
Instrs: 85836 -> 85611 (-0.26%); split: -0.27%, +0.00%
CodeSize: 481304 -> 480332 (-0.20%); split: -0.21%, +0.00%
Latency: 9596559 -> 9596152 (-0.00%); split: -0.00%, +0.00%
InvThroughput: 1423707 -> 1423670 (-0.00%)
SClause: 3872 -> 3874 (+0.05%)
PreSGPRs: 5034 -> 5038 (+0.08%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19137>
2022-10-20 18:57:23 +00:00
Alyssa Rosenzweig
ac2964dfbd nir: Be smarter fusing ffma
If there is a single use of fmul, and that single use is fadd, it makes
sense to fuse ffma, as we already do. However, if there are multiple
uses, fusing may impede code gen. Consider the source fragment:

   a = fmul(x, y)
   b = fadd(a, z)
   c = fmin(a, t)
   d = fmax(b, c)

The fmul has two uses. The current ffma fusing is greedy and will
produce the following "optimized" code.

   a = fmul(x, y)
   b = ffma(x, y, z)
   c = fmin(a, t)
   d = fmax(b, c)

Actually, this code is worse! Instead of 1 fmul + 1 fadd, we now have 1
fmul + 1 ffma. In effect, two multiplies (and a fused add) instead of
one multiply and an add. Depending on the ISA, that could impede
scheduling or increase code size. It can also increase register
pressure, extending the live range.

It's tempting to gate on is_used_once, but that would hurt in cases
where we really do fuse everything, e.g.:

   a = fmul(x, y)
   b = fadd(a, z)
   c = fadd(a, t)

For ISAs that fuse ffma, we expect that 2 ffma is faster than 1 fmul + 2
fadd. So what we really want is to fuse ffma iff the fmul will get
deleted. That occurs iff all uses of the fmul are fadd and will
themselves get fused to ffma, leaving fmul to get dead code eliminated.
That's easy to implement with a new NIR search helper, checking that all
uses are fadd.

shader-db results on Mali-G57 [open shader-db + subset of closed]:

total instructions in shared programs: 179491 -> 178991 (-0.28%)
instructions in affected programs: 36862 -> 36362 (-1.36%)
helped: 190
HURT: 27

total cycles in shared programs: 10573.20 -> 10571.75 (-0.01%)
cycles in affected programs: 72.02 -> 70.56 (-2.02%)
helped: 28
HURT: 1

total fma in shared programs: 1590.47 -> 1582.61 (-0.49%)
fma in affected programs: 319.95 -> 312.09 (-2.46%)
helped: 194
HURT: 1

total cvt in shared programs: 812.98 -> 813.03 (<.01%)
cvt in affected programs: 118.53 -> 118.58 (0.04%)
helped: 65
HURT: 81

total quadwords in shared programs: 98968 -> 98840 (-0.13%)
quadwords in affected programs: 2960 -> 2832 (-4.32%)
helped: 20
HURT: 4

total threads in shared programs: 4693 -> 4697 (0.09%)
threads in affected programs: 4 -> 8 (100.00%)
helped: 4
HURT: 0

v2: Update trace checksums for virgl due to numerical differences.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18814>
2022-10-15 17:47:31 +00:00
Gert Wollny
2e50bf19cd nir: move fusing csel and comparisons to opt_late_algebraic
With that simple comparisons are cleaned up properly.

This helps with some tesselation shaders on r600.

Shader-db stats R600/Cayman:

--------------------------------------------------------------
total dw in shared programs: 1621806 -> 1620884 (-0.06%)
dw in affected programs: 41650 -> 40728 (-2.21%)
helped: 211
HURT: 4
helped stats (abs) min: 2 max: 26 x̄: 4.46 x̃: 4
helped stats (rel) min: 0.30% max: 9.68% x̄: 2.87% x̃: 2.52%
HURT stats (abs)   min: 2 max: 8 x̄: 5.00 x̃: 5
HURT stats (rel)   min: 0.23% max: 1.67% x̄: 1.02% x̃: 1.09%
95% mean confidence interval for dw value: -4.81 -3.77
95% mean confidence interval for dw %-change: -3.03% -2.57%
Dw are helped.

total gprs in shared programs: 41192 -> 41182 (-0.02%)
gprs in affected programs: 731 -> 721 (-1.37%)
helped: 53
HURT: 45
helped stats (abs) min: 1 max: 3 x̄: 1.23 x̃: 1
helped stats (rel) min: 5.88% max: 40.00% x̄: 16.56% x̃: 14.29%
HURT stats (abs)   min: 1 max: 2 x̄: 1.22 x̃: 1
HURT stats (rel)   min: 7.69% max: 40.00% x̄: 19.42% x̃: 20.00%
95% mean confidence interval for gprs value: -0.37 0.16
95% mean confidence interval for gprs %-change: -3.92% 3.85%
Inconclusive result (value mean confidence interval includes 0).

total alu_groups in shared programs: 203677 -> 203632 (-0.02%)
alu_groups in affected programs: 2876 -> 2831 (-1.56%)
helped: 68
HURT: 30
helped stats (abs) min: 1 max: 4 x̄: 1.46 x̃: 1
helped stats (rel) min: 0.84% max: 25.00% x̄: 7.48% x̃: 5.41%
HURT stats (abs)   min: 1 max: 6 x̄: 1.80 x̃: 1
HURT stats (rel)   min: 1.98% max: 33.33% x̄: 10.09% x̃: 5.61%
95% mean confidence interval for alu_groups value: -0.81 -0.11
95% mean confidence interval for alu_groups %-change: -4.20% <.01%
Alu_groups are helped.

total loops in shared programs: 72 -> 72 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0

total cf in shared programs: 88230 -> 88233 (<.01%)
cf in affected programs: 71 -> 74 (4.23%)
helped: 1
HURT: 4
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 33.33% max: 33.33% x̄: 33.33% x̃: 33.33%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 1.89% max: 33.33% x̄: 17.14% x̃: 16.67%
95% mean confidence interval for cf value: -0.51 1.71
95% mean confidence interval for cf %-change: -24.20% 38.29%
Inconclusive result (value mean confidence interval includes 0).

total stack in shared programs: 3827 -> 3827 (0.00%)
stack in affected programs: 0 -> 0
helped: 0
HURT: 0

LOST:   0
GAINED: 0

Total CPU time (seconds): 45.32 -> 41.69 (-8.01%)
--------------------------------------------------------------

v2: Simplify replacement pattern (Rhys Perry)
v3: fix ws (Alexander Orzechowski)
v4: move the original lowering to opt_late_algebraic and
    drop cleanup code (Alyssa)
v5: Add shader-sb stats (Alyssa)

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18970>
2022-10-14 13:08:15 +00:00
Georg Lehmann
bfb12a3b6a nir/opt_algebraic: Optimize more (a cmp b ? a : b) to min/max.
Foz-DB Navi21:
Totals from 112 (0.08% of 134913) affected shaders:
CodeSize: 1618384 -> 1618172 (-0.01%); split: -0.06%, +0.04%
Instrs: 307695 -> 307535 (-0.05%); split: -0.05%, +0.00%
Latency: 3590228 -> 3589658 (-0.02%); split: -0.02%, +0.00%
InvThroughput: 563692 -> 563447 (-0.04%); split: -0.05%, +0.01%
Copies: 24541 -> 24519 (-0.09%); split: -0.10%, +0.01%
Branches: 13480 -> 13468 (-0.09%)

Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18548>
2022-09-30 11:10:52 +00:00
Rhys Perry
b301c33f65 nir/algebraic: optimize fabs(bcsel(b, fneg(a), a))
fossil-db (Sienna Cichlid):
Totals from 207 (0.15% of 134913) affected shaders:
VGPRs: 7152 -> 6928 (-3.13%)
CodeSize: 762404 -> 752888 (-1.25%)
MaxWaves: 6138 -> 6146 (+0.13%)
Instrs: 144031 -> 142184 (-1.28%)
Latency: 817783 -> 807286 (-1.28%)
InvThroughput: 151031 -> 147497 (-2.34%)
VClause: 1490 -> 1453 (-2.48%)
SClause: 3357 -> 3331 (-0.77%); split: -0.92%, +0.15%
Copies: 9632 -> 9555 (-0.80%); split: -0.81%, +0.01%
Branches: 4306 -> 4270 (-0.84%)
PreSGPRs: 11232 -> 11218 (-0.12%); split: -0.15%, +0.03%
PreVGPRs: 6307 -> 6121 (-2.95%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14772>
2022-09-14 12:16:07 +00:00
Rhys Perry
c23411a970 nir/algebraic: optimize bits=umin(bits, 32-(offset&0x1f))
Optimizes patterns which are created by recent versions of vkd3d-proton,
when constant folding doesn't eliminate it entirely:
- ubitfield_extract(value, offset, umin(bits, 32-(offset&0x1f)))
- ibitfield_extract(value, offset, umin(bits, 32-(offset&0x1f)))
- bitfield_insert(base, insert, offset, umin(bits, 32-(offset&0x1f)))

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13225>
2022-09-13 20:36:06 +00:00
Georg Lehmann
4d7fe94f3a nir/opt_algebraic: Optimize unpacking of upcasts to 64bit integers.
Foz-DB Navi21:
Totals from 7 (0.01% of 134913) affected shaders:
CodeSize: 213364 -> 213028 (-0.16%)
Instrs: 38347 -> 38319 (-0.07%)
Latency: 780148 -> 779776 (-0.05%)
InvThroughput: 520098 -> 519851 (-0.05%)

Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18435>
2022-09-08 14:37:56 +00:00
Georg Lehmann
6eb4dfca23 nir/opt_algebraic: Optimize d3d9 pow with fmulz.
Foz-DB Navi21:
Totals from 69 (0.05% of 134913) affected shaders:
CodeSize: 255684 -> 253788 (-0.74%); split: -0.74%, +0.00%
Instrs: 46307 -> 46052 (-0.55%); split: -0.55%, +0.00%
Latency: 533255 -> 530742 (-0.47%); split: -0.48%, +0.01%
InvThroughput: 110001 -> 109156 (-0.77%)
VClause: 839 -> 844 (+0.60%); split: -1.19%, +1.79%
SClause: 1411 -> 1395 (-1.13%)
Copies: 1828 -> 1816 (-0.66%); split: -1.09%, +0.44%
PreSGPRs: 2243 -> 2232 (-0.49%)
PreVGPRs: 2213 -> 2192 (-0.95%)

Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18145>
2022-08-31 17:07:24 +00:00
Georg Lehmann
9c2c47884d nir/opt_algebraic: Optimize check for single bit.
Foz-DB Navi21:
Totals from 3239 (2.40% of 134913) affected shaders:
SpillSGPRs: 110 -> 102 (-7.27%)
CodeSize: 17426512 -> 17344808 (-0.47%); split: -0.48%, +0.01%
Instrs: 3194264 -> 3179366 (-0.47%)
Latency: 20498012 -> 20481419 (-0.08%); split: -0.08%, +0.00%
InvThroughput: 3311738 -> 3311282 (-0.01%); split: -0.02%, +0.00%
SClause: 145810 -> 145690 (-0.08%)
Copies: 171748 -> 169009 (-1.59%); split: -1.63%, +0.03%
Branches: 86610 -> 86370 (-0.28%)
PreSGPRs: 138036 -> 137104 (-0.68%)
PreVGPRs: 138540 -> 138545 (+0.00%)

Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17429>
2022-08-31 18:36:33 +02:00
Daniel Schürmann
9b843f8e4a nir/opt_algebraic: a & ~a -> 0
Also re-ordered some optimizations for better readability.

Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18250>
2022-08-30 14:10:22 +00:00
Emma Anholt
f6c5b1d6c6 nir: Split usub_sat lowering flag from uadd_sat.
Intel vec4 would like to do uadd_sat, but use lowering for usub_sat.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17637>
2022-07-22 17:54:28 +00:00
Georg Lehmann
aac8ddae2f nir/opt_algebraic: Optimize [ui](add|sub)_sat with 0.
Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17468>
2022-07-13 07:34:09 +00:00
Rhys Perry
bc1ea2fda9 nir/algebraic: optimize bcsel(c, fsin/cos_amd(a), fsin/cos_amd(b))
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10587>
2022-07-07 22:18:08 +00:00
Ian Romanick
a2a2fbc510 nir/algebraic: Fix NaN-unsafe fcsel patterns
For example, the proof for this pattern

    (('bcsel', ('flt', 'a@32', 0), 'b@32', 'c@32'), ('fcsel_ge', a, c, b)),

would be

    bcsel(a < 0, b, c)
    bcsel(!(a < 0), c, b)
    bcsel(a >= 0, c, b)
    fcsel_ge(a, c, b)

However, !(a < 0) => (a >= 0) is well known to produce different
results if `a` is NaN.

Instead of that replacement, use this replacement:

    bcsel(a < 0, b, c)
    bcsel(-0 < -a, b, c)
    bcsel(0 < -a, b, c)
    fcsel_gt(-a, b, c)

This is NaN-safe and exact.

Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Fixes: 0f5b3c37c5 ("nir: Add opcodes for fused comp + csel and optimizations")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17048>
2022-06-22 19:26:59 +00:00
Georg Lehmann
bfc25d6ec9 nir: Add optional lowering for mul_32x16.
Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13895>
2022-06-01 17:09:25 +00:00