Commit graph

552 commits

Author SHA1 Message Date
Timur Kristóf
be68aeafdc nir/opt_algebraic: Add various bitfield extract patterns.
v2 (Georg Lehmann):
- fixed incorrect imin in ubfe_ubfe
- simplied outer_bits of ushr((ubfe, ...), ...) opt
- added is_used_once to iand(ushr(), ...) opt to improve stats

For-DB Navi21:
Totals from 3309 (4.18% of 79206) affected shaders:
Instrs: 5295291 -> 5282128 (-0.25%); split: -0.28%, +0.03%
CodeSize: 28299320 -> 28298456 (-0.00%); split: -0.07%, +0.06%
Latency: 51566173 -> 51521923 (-0.09%); split: -0.09%, +0.01%
InvThroughput: 13222050 -> 13204557 (-0.13%); split: -0.14%, +0.01%
VClause: 116451 -> 116458 (+0.01%); split: -0.02%, +0.02%
SClause: 160356 -> 160324 (-0.02%); split: -0.03%, +0.01%
Copies: 424152 -> 423670 (-0.11%); split: -0.20%, +0.09%
Branches: 156701 -> 156192 (-0.32%); split: -0.33%, +0.01%
PreSGPRs: 168507 -> 168500 (-0.00%); split: -0.02%, +0.01%
PreVGPRs: 151477 -> 151474 (-0.00%)
VALU: 3486077 -> 3476675 (-0.27%); split: -0.31%, +0.04%
SALU: 786467 -> 783109 (-0.43%); split: -0.45%, +0.03%
VMEM: 188035 -> 188060 (+0.01%)
SMEM: 259632 -> 259630 (-0.00%)

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31852>
2024-10-29 10:51:09 +00:00
Rhys Perry
8efc765a3d nir/algebraic: fix shfr optimization with zero src2
No fossil-db changes.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Fixes: 08903bbe89 ("nir: add mqsad_4x8, shfr and nir_opt_mqsad")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31808>
2024-10-25 09:59:40 +00:00
Georg Lehmann
1f9b82bb2a nir/opt_algebraic: optimize -0.0 + a
Foz-DB Navi21:
Totals from 428 (0.54% of 79395) affected shaders:
MaxWaves: 8510 -> 8512 (+0.02%)
Instrs: 731062 -> 729665 (-0.19%); split: -0.19%, +0.00%
CodeSize: 3735788 -> 3728324 (-0.20%); split: -0.20%, +0.00%
VGPRs: 27328 -> 27336 (+0.03%); split: -0.03%, +0.06%
SpillSGPRs: 315 -> 314 (-0.32%)
Latency: 3872986 -> 3873236 (+0.01%); split: -0.08%, +0.09%
InvThroughput: 971001 -> 970056 (-0.10%); split: -0.17%, +0.08%
VClause: 11954 -> 11956 (+0.02%); split: -0.02%, +0.03%
SClause: 17361 -> 17358 (-0.02%)
Copies: 59038 -> 59045 (+0.01%); split: -0.22%, +0.24%
Branches: 17685 -> 17656 (-0.16%)
PreSGPRs: 26103 -> 26102 (-0.00%)
PreVGPRs: 23220 -> 23206 (-0.06%)
VALU: 515293 -> 513963 (-0.26%); split: -0.26%, +0.00%
SALU: 91591 -> 91544 (-0.05%)

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31770>
2024-10-23 08:58:34 +00:00
Georg Lehmann
f9d2aad7a3 nir: remove alu ddx/ddy
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31014>
2024-10-17 09:50:19 +00:00
Gert Wollny
f19f1ec17b nir/opt_algebraic: Allow two-step lowering of ftrunc@64 to use ffract@64
If ftrunc@64 is lowered by nir_lower_doubles it is turned into a
comparable long series of 32 bit operations. If the hardware
supports ffract@64 then nir_opt_algebraic can first lower ftrunc@64
to use some combinations with ffloor@64. They can then be turned
into a combination of fsub@64 and ffract@64 resulting in less
all-over instructions.

Fixes: 5218cff34b
   nir/algebraic: avoid double lowering of some fp64 operations

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29281>
2024-09-30 23:51:02 +00:00
Ian Romanick
057c7c9f53 nir/algebraic: Recognize open-coded bitfield_reverse in XCOM 2
The XCOM 2 shaders in my shader-db use iadd instead of ior.

No fossil-db changes on any Intel platform.

shader-db:

All Intel platforms had similar results. (Meteor Lake shown)
total instructions in shared programs: 19787210 -> 19787034 (<.01%)
instructions in affected programs: 1187 -> 1011 (-14.83%)
helped: 6 / HURT: 0

total cycles in shared programs: 906024436 -> 906012612 (<.01%)
cycles in affected programs: 72978 -> 61154 (-16.20%)
helped: 6 / HURT: 0

Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31006>
2024-09-13 00:21:00 +00:00
Ian Romanick
a780305818 nir/algebraic: Optimize more comparisons with b2f
shader-db:

All Intel platforms had similar results. (Meteor Lake shown)
total instructions in shared programs: 19781108 -> 19772614 (-0.04%)
instructions in affected programs: 372638 -> 364144 (-2.28%)
helped: 2915 / HURT: 0

total cycles in shared programs: 905907644 -> 905822682 (<.01%)
cycles in affected programs: 5573453 -> 5488491 (-1.52%)
helped: 2363 / HURT: 234

LOST:   42
GAINED: 16

fossil-db:

All Intel platforms had similar results. (Meteor Lake shown)
Totals:
Instrs: 152519634 -> 152519610 (-0.00%)
Cycle count: 17122707642 -> 17122710974 (+0.00%); split: -0.00%, +0.00%

Totals from 5 (0.00% of 633222) affected shaders:
Instrs: 2827 -> 2803 (-0.85%)
Cycle count: 83089 -> 86421 (+4.01%); split: -0.12%, +4.13%

Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31068>
2024-09-10 04:15:58 +00:00
Georg Lehmann
6378bbaa82 nir/opt_algebraic: reassociate constants in ior(iand) chains
Mostly affects one F1_23 shader that packs bitfields bit by bit.

Totals from 3 (0.00% of 79395) affected shaders:
Instrs: 5004 -> 4202 (-16.03%)
CodeSize: 30992 -> 23952 (-22.72%)
Latency: 28894 -> 28464 (-1.49%)
InvThroughput: 4095 -> 3934 (-3.93%)
Copies: 363 -> 376 (+3.58%)
PreVGPRs: 110 -> 109 (-0.91%)
VALU: 3035 -> 2504 (-17.50%)
SALU: 463 -> 459 (-0.86%)

Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31009>
2024-09-05 22:04:05 +00:00
Caio Oliveira
74be809237 compiler: Allow derivative_group to be used for all stages in shader_info
These will now also be used by stages that have workgroups.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30950>
2024-09-03 20:03:18 +00:00
Ian Romanick
f11a414645 nir/algebraic: Remove incorrect bfi of iand pattern
The comment says, "This expands to (b & 3) & ~0xc which is (b & 3) &
3." This is not correct. ~0xc is actually 0xfffffff3.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Closes: #11695
Fixes: 1c7e35d4e0 ("nir/algebraic: Optimize some bit operation nonsense observed in some shaders")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30913>
2024-08-29 22:21:55 +00:00
Georg Lehmann
ef970c5a9d nir: optimize pack_uint_2x16 of pack_half(a, 0)
Foz-DB Navi31:
Totals from 31 (0.04% of 79395) affected shaders:
Instrs: 6157 -> 6065 (-1.49%)
CodeSize: 35676 -> 34936 (-2.07%)
Latency: 23979 -> 23805 (-0.73%); split: -0.79%, +0.07%
InvThroughput: 5248 -> 5124 (-2.36%)
VALU: 3224 -> 3162 (-1.92%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30855>
2024-08-28 07:16:55 +00:00
Ian Romanick
198d8d9c03 nir/algebraic: Improve some find_lsb and ifind_msb patterns
These patterns were observed in shaders from parallel-rdp.

No shader-db changes on any Intel platform.

fossil-db:

Meteor Lake, DG2, Ice Lake had Skylake similar results. (Meteor Lake shown)
Totals:
Instrs: 152535883 -> 152535673 (-0.00%); split: -0.00%, +0.00%
Cycle count: 17112406110 -> 17122827810 (+0.06%); split: -0.01%, +0.07%
Spill count: 78525 -> 78523 (-0.00%)
Fill count: 148132 -> 148127 (-0.00%); split: -0.01%, +0.00%
Max live registers: 31855320 -> 31855314 (-0.00%)

Totals from 206 (0.03% of 633223) affected shaders:
Instrs: 797124 -> 796914 (-0.03%); split: -0.03%, +0.00%
Cycle count: 4716743323 -> 4727165023 (+0.22%); split: -0.05%, +0.27%
Spill count: 18781 -> 18779 (-0.01%)
Fill count: 31381 -> 31376 (-0.02%); split: -0.03%, +0.01%
Max live registers: 31872 -> 31866 (-0.02%)

Tiger Lake
Totals:
Instrs: 150560465 -> 150560343 (-0.00%); split: -0.00%, +0.00%
Cycle count: 15482372893 -> 15479328542 (-0.02%); split: -0.02%, +0.00%
Fill count: 103509 -> 103512 (+0.00%)
Max live registers: 31760378 -> 31760374 (-0.00%)

Totals from 199 (0.03% of 632445) affected shaders:
Instrs: 679513 -> 679391 (-0.02%); split: -0.02%, +0.00%
Cycle count: 4258406125 -> 4255361774 (-0.07%); split: -0.09%, +0.02%
Fill count: 30609 -> 30612 (+0.01%)
Max live registers: 30502 -> 30498 (-0.01%)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30650>
2024-08-16 14:52:04 +00:00
Marek Olšák
ecfefe823e nir/opt_algebraic: use fmulz for fpow lowering to fix incorrect rendering
The original implementation in all radeon drivers had this behavior.

Fixes: 9bc1fb4c07 - ac/llvm,radeonsi: lower nir_fpow for aco and llvm
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11464

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30069>
2024-07-23 15:23:27 +00:00
Ian Romanick
faee9426ab nir/algebraic: Optimize some masking of extract_u8 operations
I observed this pattern in several Red Dead Redemption 2 shaders.

No shader-db changes on any Intel platform.

v2: Remove duplicated patterns. Noticed by Georg.

fossil-db:

All Intel platforms had similar results. (Meteor Lake shown)
Totals:
Instrs: 151519393 -> 151507192 (-0.01%); split: -0.01%, +0.00%
Cycle count: 17208246858 -> 17177437340 (-0.18%); split: -0.25%, +0.07%
Spill count: 80830 -> 80759 (-0.09%); split: -0.09%, +0.00%
Fill count: 152754 -> 152179 (-0.38%); split: -0.40%, +0.02%

Totals from 7531 (1.20% of 630198) affected shaders:
Instrs: 12606141 -> 12593940 (-0.10%); split: -0.10%, +0.00%
Cycle count: 5466605514 -> 5435795996 (-0.56%); split: -0.79%, +0.22%
Spill count: 25251 -> 25180 (-0.28%); split: -0.29%, +0.01%
Fill count: 45143 -> 44568 (-1.27%); split: -1.36%, +0.08%

Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30158>
2024-07-20 00:19:05 +00:00
Ian Romanick
1c7e35d4e0 nir/algebraic: Optimize some bit operation nonsense observed in some shaders
In updates (not post at the time of this writing) to !29884, a change
caused many spill and fill regressions shader for OpenGL Tomb
Raider. While looking at that shader, I noticed some odd patterns. I
initially added these patterns to counteract the regressions caused by
the other change, but I had no luck. On Ice Lake... this cuts 99
instructions from the shader.

shader-db:

All Intel platforms had simliar results. (Meteor Lake shown)
total instructions in shared programs: 19732341 -> 19732295 (<.01%)
instructions in affected programs: 1744 -> 1698 (-2.64%)
helped: 1 / HURT: 0

total cycles in shared programs: 916273716 -> 916273068 (<.01%)
cycles in affected programs: 14266 -> 13618 (-4.54%)
helped: 1 / HURT: 0

fossil-db:

All Intel platforms had similar results. (Meteor Lake shown)
Totals:
Instrs: 151519575 -> 151519393 (-0.00%)
Cycle count: 17208402120 -> 17208246858 (-0.00%); split: -0.00%, +0.00%

Totals from 159 (0.03% of 630198) affected shaders:
Instrs: 51970 -> 51788 (-0.35%)
Cycle count: 11474176 -> 11318914 (-1.35%); split: -1.36%, +0.01%

Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30158>
2024-07-20 00:19:05 +00:00
Christian Gmeiner
87786a7a7e nak: Move imad late optimization to nir
It is more or less just a code move, but I touched
is_only_used_by_iadd(..) to match the style of the other functions in
that file.

Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30099>
2024-07-12 05:54:46 +00:00
Konstantin Seurer
d9e41e8a8c nir: Stop using "capture : true" for nir_opt_algebraic
"calture : true" is suboptimal and and prevents the script from writing
multiple files in one go.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30041>
2024-07-06 15:51:06 +00:00
Georg Lehmann
3e86d2452f nir/opt_algebraic: add various unordered/ordered patterns from aco
Foz-DB Navi21:
Totals from 6747 (8.50% of 79395) affected shaders:
MaxWaves: 134646 -> 134642 (-0.00%)
Instrs: 7830299 -> 7828851 (-0.02%); split: -0.03%, +0.01%
CodeSize: 43045532 -> 43010260 (-0.08%); split: -0.09%, +0.00%
VGPRs: 378960 -> 378968 (+0.00%)
SpillSGPRs: 1209 -> 1208 (-0.08%)
Latency: 74667977 -> 74670405 (+0.00%); split: -0.02%, +0.02%
InvThroughput: 20124981 -> 20124768 (-0.00%); split: -0.02%, +0.02%
VClause: 162870 -> 162868 (-0.00%); split: -0.00%, +0.00%
SClause: 277280 -> 277315 (+0.01%); split: -0.00%, +0.02%
Copies: 528627 -> 528667 (+0.01%); split: -0.00%, +0.01%
PreSGPRs: 319526 -> 319508 (-0.01%)
PreVGPRs: 334264 -> 334265 (+0.00%); split: -0.00%, +0.00%
VALU: 5485412 -> 5485408 (-0.00%); split: -0.02%, +0.02%
SALU: 743882 -> 742301 (-0.21%); split: -0.21%, +0.00%

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29467>
2024-06-27 08:12:30 +00:00
Georg Lehmann
434dfb51ca nir/opt_algebraic: optimize cmp(fneg(a), #b) and feq with fabs
Foz-DB Navi21:
Totals from 2483 (3.13% of 79395) affected shaders:
Instrs: 4067533 -> 4067756 (+0.01%); split: -0.00%, +0.01%
CodeSize: 22525156 -> 22499904 (-0.11%); split: -0.12%, +0.01%
Latency: 51967223 -> 51963654 (-0.01%); split: -0.01%, +0.00%
InvThroughput: 16685020 -> 16683045 (-0.01%); split: -0.01%, +0.00%
SClause: 131890 -> 131907 (+0.01%)
Copies: 402557 -> 402510 (-0.01%); split: -0.01%, +0.00%
Branches: 146962 -> 146958 (-0.00%)
PreSGPRs: 118404 -> 118401 (-0.00%)
PreVGPRs: 123791 -> 123787 (-0.00%)
VALU: 2709846 -> 2710174 (+0.01%); split: -0.00%, +0.01%
SALU: 565883 -> 565786 (-0.02%)

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29467>
2024-06-27 08:12:30 +00:00
Georg Lehmann
98cc57bccb nir/optimize cmp(a, -0.0)
+0.0 can use an inline constant for AMD hardware,
-0.0 needs a literal.

Foz-DB Navi21:
Totals from 1014 (1.28% of 79395) affected shaders:
Instrs: 3037490 -> 3036849 (-0.02%); split: -0.02%, +0.00%
CodeSize: 17060228 -> 17051276 (-0.05%); split: -0.05%, +0.00%
Latency: 45916788 -> 45916600 (-0.00%); split: -0.00%, +0.00%
InvThroughput: 12982201 -> 12982187 (-0.00%); split: -0.00%, +0.00%
VClause: 79475 -> 79478 (+0.00%)
SClause: 119935 -> 119934 (-0.00%); split: -0.00%, +0.00%
Copies: 301641 -> 300964 (-0.22%); split: -0.23%, +0.00%
PreSGPRs: 59155 -> 59144 (-0.02%)
VALU: 2032016 -> 2032034 (+0.00%)
SALU: 386424 -> 385729 (-0.18%)

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29467>
2024-06-27 08:12:30 +00:00
Georg Lehmann
8e6bf596cb nir/opt_algebraic: look through fabs/fneg when matching fmulz/ffmaz
Prevents regressions when removing input modifiers from a == 0.0.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29467>
2024-06-27 08:12:30 +00:00
Georg Lehmann
75b1fa9263 nir/opt_algebraic: alternative 8bit pack_[us]norm_4x8 lowering
Foz-DB Navi21:
Totals from 42 (0.05% of 79395) affected shaders:
Instrs: 2709529 -> 2705848 (-0.14%)
CodeSize: 14720732 -> 14711384 (-0.06%); split: -0.06%, +0.00%
VGPRs: 4096 -> 4104 (+0.20%)
Latency: 17907612 -> 17904468 (-0.02%); split: -0.02%, +0.00%
InvThroughput: 4723551 -> 4722649 (-0.02%); split: -0.02%, +0.00%
Copies: 223516 -> 219819 (-1.65%)
Branches: 109578 -> 109594 (+0.01%); split: -0.00%, +0.02%
VALU: 1730848 -> 1727151 (-0.21%)

Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28882>
2024-06-04 17:00:29 +00:00
Ian Romanick
7b7e5cf5d4 nir/algebraic: intel/fs: Optimize some patterns before lowering 64-bit integers
v2: Add some comments explaining some of the nuance of the shift
optimizations. Fix a bug in the shift count calculation of the upper
32-bits. Move the @64 from the variable to the opcode. All suggested
by Jordan.

No shader-db changes on any Intel platform.

fossil-db:

Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Instrs: 154507026 -> 154506576 (-0.00%)
Cycle count: 17436298868 -> 17436295016 (-0.00%)
Max live registers: 32635309 -> 32635297 (-0.00%)

Totals from 42 (0.01% of 632575) affected shaders:
Instrs: 5616 -> 5166 (-8.01%)
Cycle count: 133680 -> 129828 (-2.88%)
Max live registers: 1158 -> 1146 (-1.04%)

No fossil-db changes on any other Intel platform.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29148>
2024-05-31 09:13:23 -07:00
Ian Romanick
4834df82e2 nir/algebraic: More patterns to generate iadd3
I noticed some shaders with patterns similar to these while working on
cooperative matrix lowering.

Meteor Lake and DG2 are the only platforms that support iadd3, so there
were no shader-db or fossil-db changes on any other platforms.

shader-db:

Meteor Lake and DG2 had similar results. (Meteor Lake shown)
total instructions in shared programs: 19869445 -> 19868343 (<.01%)
instructions in affected programs: 419426 -> 418324 (-0.26%)
helped: 913 / HURT: 2

total cycles in shared programs: 936010029 -> 935909811 (-0.01%)
cycles in affected programs: 31746523 -> 31646305 (-0.32%)
helped: 495 / HURT: 356

LOST:   10
GAINED: 12

fossil-db:

Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Instrs: 154514596 -> 154505466 (-0.01%); split: -0.01%, +0.00%
Cycle count: 17540226067 -> 17436266198 (-0.59%); split: -0.63%, +0.04%
Spill count: 146887 -> 146886 (-0.00%)
Fill count: 272499 -> 272489 (-0.00%); split: -0.01%, +0.00%
Max live registers: 32634290 -> 32634739 (+0.00%); split: -0.00%, +0.00%
Max dispatch width: 5550128 -> 5550368 (+0.00%)

Totals from 4401 (0.70% of 632560) affected shaders:
Instrs: 3095239 -> 3086109 (-0.29%); split: -0.30%, +0.00%
Cycle count: 7327352564 -> 7223392695 (-1.42%); split: -1.51%, +0.10%
Spill count: 28105 -> 28104 (-0.00%)
Fill count: 45830 -> 45820 (-0.02%); split: -0.04%, +0.02%
Max live registers: 264376 -> 264825 (+0.17%); split: -0.05%, +0.22%
Max dispatch width: 43768 -> 44008 (+0.55%)

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29148>
2024-05-31 09:13:23 -07:00
Ian Romanick
22095c60bc nir/algebraic: Add nir_lower_int64_options::nir_lower_iadd3_64
This allows us to not generate 64-bit iadd3 on Intel but continue
generating it for NVIDIA.

No shader-db or fossil-db changes.

v2: Add nir_lower_iadd3_64 flag so we can continue to generate 64-bit
iadd3 on NVIDIA platforms.

v3: s/bit_size == 64/s == 64/. This cut-and-paste bug prevented any of
the optimizations from ever occuring.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29148>
2024-05-31 09:13:23 -07:00
Georg Lehmann
dcab408a6c nir: remove unpack_half_flush_to_zero
It doesn't make sense to have two sets of opcodes for this when all backends
that support the flush_to_zero variant just rely on the global floating point
mode anyway.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29433>
2024-05-31 09:46:35 +00:00
Marek Olšák
b4bd380704 nir/algebraic: eliminate pack+unpack and unpack+pack pairs
A new NIR shader for AMD drivers will need this.

Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29233>
2024-05-17 22:04:00 +00:00
Francisco Jerez
15a10786e3 nir: Add option to lower 64-bit uadd_sat.
C.f. 16be909936.  Intel Xe2 won't
support saturation for 64-bit integer addition, regardless of
signedness.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28283>
2024-05-15 17:16:51 +00:00
Ian Romanick
1b8cf06fc7 nir/algebraic: Optimize some extract_* expressions
v2: Add missing '!options->lower_extract_byte' to the last two
patterns. Every driver except Asahi sets both or neither.

shader-db:

All Intel platforms had similar results. (DG2 shown)
total instructions in shared programs: 19659360 -> 19659356 (<.01%)
instructions in affected programs: 44 -> 40 (-9.09%)
helped: 2 / HURT: 0

total cycles in shared programs: 823432524 -> 823432520 (<.01%)
cycles in affected programs: 1722 -> 1718 (-0.23%)
helped: 2 / HURT: 0

fossil-db:

All Intel platforms had similar results. (DG2 shown)
Totals:
Instrs: 153989787 -> 153989617 (-0.00%)
Cycle count: 17562079230 -> 17562079493 (+0.00%); split: -0.00%, +0.00%

Totals from 24 (0.00% of 631369) affected shaders:
Instrs: 13733 -> 13563 (-1.24%)
Cycle count: 341392 -> 341655 (+0.08%); split: -0.25%, +0.33%

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> [v1]
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27891>
2024-05-03 15:01:43 -07:00
Jesse Natalie
894f7f4387 nir_opt_algebraic: Add a couple optimizations for lowered unpack(pack())
I noticed some unnecessary 64-bit ints in shaders that were using doubles.
Perhaps there's a different missing optimization that should run on the
actual pack/unpack instructions before they're lowered, or maybe I'm just
lowering them too early, but these seem simple enough that we might want
them even for hand-rolled pack/unpack pairs.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27314>
2024-05-01 21:55:20 +00:00
Connor Abbott
32308fe9f1 ir3/nir: Fix imadsh_mix16 definition
The constant-folding definition and comments say that it takes the high
16 bits of the first source and low 16 bits of the second source, but
actually it's the opposite. The algebraic optimization, which actually
happens and needs to be correct, was correct but the comment above it
was wrong.

Note that in the way we use it when lowering multiplications, the
ordering doesn't matter.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22075>
2024-04-26 12:55:14 +00:00
Iván Briano
7f97fa6df0 nir/algebraic: move float control conditions to be per instruction
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27281>
2024-04-25 12:13:41 +00:00
Iván Briano
8c4cd3e74e nir/algebraic: support float controls conditions per instruction
v?:
 - Make the Python not awful (Dylan)

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27281>
2024-04-25 12:13:41 +00:00
Iván Briano
5218cff34b nir/algebraic: avoid double lowering of some fp64 operations
The ffloor@64 case, which lowers to use ffract, is already ignored if
nir_lower_dfract is set. Do the same thing for ftrunc@64 and ffract@64
and let nir_lower_doubles take care of them directly instead.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28702>
2024-04-16 13:34:36 -07:00
Rhys Perry
08903bbe89 nir: add mqsad_4x8, shfr and nir_opt_mqsad
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26251>
2024-04-05 11:01:39 +00:00
Rhys Perry
b37804c8de nir/algebraic: optimize 64-bit comparisons with zero'd halves to 32-bit
These expect nir_lower_int64 to replace u2u64 to pack_64_2x32_split(, 0).

fossil-db (navi31):
Totals from 149 (0.19% of 79242) affected shaders:
Instrs: 433095 -> 431830 (-0.29%); split: -0.29%, +0.00%
CodeSize: 2165980 -> 2160284 (-0.26%); split: -0.27%, +0.00%
SpillSGPRs: 689 -> 688 (-0.15%)
Latency: 3801497 -> 3799901 (-0.04%); split: -0.05%, +0.01%
InvThroughput: 1547916 -> 1546567 (-0.09%); split: -0.09%, +0.01%
VClause: 4698 -> 4693 (-0.11%)
SClause: 9981 -> 9977 (-0.04%); split: -0.05%, +0.01%
Copies: 66148 -> 65431 (-1.08%); split: -1.09%, +0.01%
PreSGPRs: 6732 -> 6729 (-0.04%)
PreVGPRs: 7976 -> 7945 (-0.39%)
VALU: 252936 -> 252336 (-0.24%)
SALU: 51794 -> 51274 (-1.00%); split: -1.03%, +0.02%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27335>
2024-03-06 15:23:18 +00:00
Rhys Perry
417eb390c6 nir/algebraic: remove duplicated iand(ien, ine)/ior(ieq, ieq) patterns
These don't seem useful, since they're already done in the early optimizations.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27335>
2024-03-06 15:23:18 +00:00
Rhys Perry
6952bb359c nir/algebraic: don't create 64-bit min/max/ior if lowered
fossil-db (navi31):
Totals from 58 (0.07% of 79242) affected shaders:
Instrs: 11692 -> 11304 (-3.32%)
CodeSize: 65836 -> 62412 (-5.20%)
VGPRs: 1320 -> 1344 (+1.82%)
Latency: 51712 -> 50234 (-2.86%)
InvThroughput: 10190 -> 10160 (-0.29%)
Copies: 460 -> 688 (+49.57%)
VALU: 6130 -> 5897 (-3.80%)
SALU: 1231 -> 1284 (+4.31%); split: -0.32%, +4.63%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27335>
2024-03-06 15:23:18 +00:00
Karol Herbst
f2b7c4ce29 nir: rework and fix rotate lowering
No driver supports urol/uror on all bit sizes. Intel gen11+ only for 16
and 32 bit, Nvidia GV100+ only for 32 bit. Etnaviv can support it on 8,
16 and 32 bit.

Also turn the `lower` into a `has` option as only two drivers actually
support `uror` and `urol` at this momemt.

Fixes crashes with CL integer_rotate on iris and nouveau since we emit
urol for `rotate`.

v2: always lower 64 bit

Fixes: fe0965afa6 ("spirv: Don't use libclc for rotate")
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by (Intel and nir): Ian Romanick <ian.d.romanick@intel.com>

Reviewed-by: David Heidelberg <david.heidelberg@collabora.com>
Acked-by: Yonggang Luo <luoyonggang@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27090>
2024-01-22 10:27:44 +00:00
Rhys Perry
e86ab8173b nir/algebraic: optimize vkd3d-proton's MSAD
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26907>
2024-01-05 18:55:22 +00:00
Konstantin Seurer
b88ac6b381 nir: Optimize fpow with small constant exponents
They would be turned into exp(log(a)*b) instead, which is slow.

Totals from 2146 (2.52% of 85071) affected shaders:
MaxWaves: 35769 -> 35779 (+0.03%); split: +0.03%, -0.01%
Instrs: 6476835 -> 6465494 (-0.18%); split: -0.18%, +0.00%
CodeSize: 35382288 -> 35347092 (-0.10%); split: -0.10%, +0.00%
SpillSGPRs: 1055 -> 1017 (-3.60%)
Latency: 75211743 -> 75063623 (-0.20%); split: -0.20%, +0.00%
InvThroughput: 17525115 -> 17501745 (-0.13%); split: -0.14%, +0.00%
VClause: 200089 -> 200077 (-0.01%); split: -0.01%, +0.01%
SClause: 293566 -> 293480 (-0.03%); split: -0.03%, +0.00%
Copies: 649631 -> 640516 (-1.40%); split: -1.44%, +0.03%
Branches: 268441 -> 268325 (-0.04%)
PreSGPRs: 146868 -> 146045 (-0.56%)
PreVGPRs: 134125 -> 134128 (+0.00%); split: -0.00%, +0.01%

Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26727>
2024-01-02 11:16:14 +01:00
Faith Ekstrand
aac1e3f595 nir: Add a new has_fmulz_no_denorms flag
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26569>
2023-12-11 15:29:17 +00:00
Faith Ekstrand
d2ffcb6092 nir: Lower [su]dot_4x8_[ui]add_sat to [su]dot_4x8_[ui]add
Since nir_opt_algebraic runs on its own results, if the driver doesn't
have [su]dot_4x8_[ui]add then the [su]dot_4x8_[ui]add lowering rules
will kick in and lower that to what we had originally.

Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26533>
2023-12-06 23:15:33 +00:00
Faith Ekstrand
09fc5e1c4d nir: Split has_[su]dot_4x8 bits into regular and _sat versions
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26533>
2023-12-06 23:15:33 +00:00
Qiang Yu
7e4aac46ad nir: add force_f2f16_rtz option to lower f2f16 to f2f16_rtz
Used by OpenGL driver like radeonsi which has undefined rounding mode.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25990>
2023-11-20 02:20:17 +00:00
Faith Ekstrand
3af5af429e nir: Optimize boolean ieq/ine with an immediate
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26120>
2023-11-10 21:46:55 +00:00
Eric Anholt
b416248cb5 nir: Add nir_lower_dsign as 64-bit fsign lowering.
Right now some drivers are doing dsign lowering in GLSL and haven't had to
have a NIR path due to there not being a corresponding vulkan driver.  We
want this in NIR now so that we can retire that batch of GLSL lowering
code.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25777>
2023-10-24 00:16:30 +00:00
Marek Olšák
8ff4847b64 nir/algebraic: use only signed_zero_preserve_* for addition by 0 patterns, etc.
Some GLSL versions will set inf_preserve but not the other flags.
Additions by 0 only affect signed zeros.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25392>
2023-10-17 17:27:12 +00:00
Alyssa Rosenzweig
be0ab37bac nir/opt_algebraic: Optimize LLVM booleans
Helps OpenCL kernels.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25687>
2023-10-13 02:55:48 +00:00
Alyssa Rosenzweig
8df8d1e2f2 nir/opt_algebraic: Reduce int64
If we just want the bottom 32-bits we don't need a full 64-bit operation.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25625>
2023-10-12 21:03:31 +00:00