fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-21 22:20:14 +01:00

Author	SHA1	Message	Date
Marek Olšák	8ff4847b64	nir/algebraic: use only signed_zero_preserve_* for addition by 0 patterns, etc. Some GLSL versions will set inf_preserve but not the other flags. Additions by 0 only affect signed zeros. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25392>	2023-10-17 17:27:12 +00:00
Alyssa Rosenzweig	be0ab37bac	nir/opt_algebraic: Optimize LLVM booleans Helps OpenCL kernels. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25687>	2023-10-13 02:55:48 +00:00
Alyssa Rosenzweig	8df8d1e2f2	nir/opt_algebraic: Reduce int64 If we just want the bottom 32-bits we don't need a full 64-bit operation. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Karol Herbst <kherbst@redhat.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25625>	2023-10-12 21:03:31 +00:00
Rhys Perry	65afc8bebf	nir/algebraic: optimize u2u32(a >> 32) fossil-db (navi21): Totals from 352 (0.44% of 79330) affected shaders: Instrs: 271816 -> 271240 (-0.21%); split: -0.28%, +0.07% CodeSize: 1546520 -> 1544448 (-0.13%); split: -0.23%, +0.09% SpillVGPRs: 832 -> 827 (-0.60%); split: -1.08%, +0.48% Latency: 4037120 -> 4021748 (-0.38%); split: -0.41%, +0.03% InvThroughput: 1369540 -> 1362066 (-0.55%); split: -0.59%, +0.04% VClause: 6476 -> 6471 (-0.08%); split: -0.12%, +0.05% SClause: 6798 -> 6794 (-0.06%) Copies: 44828 -> 44630 (-0.44%); split: -0.89%, +0.45% Branches: 8845 -> 8844 (-0.01%); split: -0.05%, +0.03% PreSGPRs: 14684 -> 14659 (-0.17%) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25409>	2023-09-27 22:13:01 +00:00
Georg Lehmann	136a698251	nir/opt_algebraic: remove broken fddx/fddy patterns These patterns are broken in the following scenario: %1 = f2fmp %0 %2 = fddx %1 %3 = ... // non quad uniform if %3 { %4 = f2f32 %2 ... } Which would turn into %3 = ... if %3 { %4 = fddx %0 ... } Yet another example that shows why derivative instructions should be intrinsics, not alu. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Emma Anholt <emma@anholt.net> Cc: mesa-stable Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25014>	2023-09-08 14:14:47 +00:00
Ian Romanick	5ce6e09ffc	nir/algebraic: Remove redundant pack / unpack lowering patterns No shader-db or fossil-db changes on any Intel platform. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24900>	2023-08-25 14:54:11 -07:00
Georg Lehmann	9cf6984200	nir: unify lower_find_msb with has_{find_msb_rev,uclz} Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24662>	2023-08-22 12:08:37 +00:00
Georg Lehmann	2ac7e6614a	nir: unify lower_bitfield_extract with has_bfe Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24662>	2023-08-22 12:08:37 +00:00
Georg Lehmann	34c3f81614	nir: unify lower_bitfield_insert with has_{bfm,bfi,bitfield_select} Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24662>	2023-08-22 12:08:37 +00:00
Marek Olšák	1ac379c4a0	nir/algebraic: collapse ALU opcodes sourcing NaN Undef will be replaced by NaN whenever it leads to elimination of FP instructions. This implements the elimination part. Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24792>	2023-08-19 14:18:52 -04:00
Alyssa Rosenzweig	a257e2daad	nir: Lower fquantize2f16 Passes dEQP-VK.spirv_assembly.opquantize. Unlike the DXIL lowering, this should correctly handle NaNs. (I believe Dozen has a bug here that is masked by running constant folding early and poor CTS coverage.) It is also faster than the DXIL lowering for hardware that supports f2f16 conversions natively. It is not as good as a backend implementation that could flush-to-zero in hardware... but for a debug instruction it should be more than good enough. It might be slightly better to multiply with 0.0 to get the appropriate zero, but NIR really likes optimizing that out ... Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24616>	2023-08-18 22:20:02 +00:00
Georg Lehmann	44d0b785cc	nir/opt_algebraic: combine bitz/bitnz Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23298>	2023-06-29 13:39:30 +00:00
Pavel Ondračka	b4ca45911d	nir_opt_algebraic: don't use i32csel without native integer support Otherwise nir_lower_int_to_float (or specifically nir_gather_ssa_types) will fail to recognize we already have float constants and converts them again. Example from spec/glsl-1.10/execution/vs-loop-array-index-unroll.shader_test with r300 driver (after enabling has_fused_comp_and_csel). impl main { block block_0: /* preds: */ vec1 32 ssa_0 = load_const (0x00000000 = 0.000000) vec4 32 ssa_1 = intrinsic load_input (ssa_0) (base=0, component=0, dest_type=float32, io location=VERT_ATTRIB_POS slots=1) /* gl_Vertex */ vec3 32 ssa_2 = load_const (0x00000000, 0x3e800000, 0x3f800000) = (0.000000, 0.250000, 1.000000) vec3 32 ssa_3 = load_const (0x00000000, 0x3f000000, 0x3f800000) = (0.000000, 0.500000, 1.000000) vec3 32 ssa_4 = load_const (0x00000000, 0x3f400000, 0x3f800000) = (0.000000, 0.750000, 1.000000) vec2 32 ssa_5 = load_const (0x00000000, 0x3f800000) = (0.000000, 1.000000) vec1 32 ssa_6 = load_const (0x3f800000 = 1.000000) vec1 32 ssa_7 = intrinsic load_ubo_vec4 (ssa_0, ssa_0) (access=0, base=0, component=0) vec4 32 ssa_8 = load_const (0x00000000, 0x00000001, 0x00000002, 0x00000003) = (0.000000, 0.000000, 0.000000, 0.000000) vec4 1 ssa_9 = ilt ssa_8, ssa_7.xxxx vec3 32 ssa_10 = bcsel ssa_9.www, ssa_5.xyy, ssa_4 vec3 32 ssa_11 = bcsel ssa_9.zzz, ssa_10, ssa_3 vec3 32 ssa_12 = bcsel ssa_9.yyy, ssa_11, ssa_2 vec3 32 ssa_15 = i32csel_gt ssa_7.xxx, ssa_12, ssa_6.xxx vec4 32 ssa_14 = fsat ssa_15.xyxz intrinsic store_output (ssa_14, ssa_0) (base=1, wrmask=xyzw, component=0, src_type=float32, io location=VARYING_SLOT_COL0 slots=1, xfb(), xfb2()) /* gl_FrontColor */ intrinsic store_output (ssa_1, ssa_0) (base=0, wrmask=xyzw, component=0, src_type=float32, io location=VARYING_SLOT_POS slots=1, xfb(), xfb2()) /* gl_Position */ /* succs: block_1 */ block block_1: } and after nir_lower_int_to_float impl main { block block_0: /* preds: */ vec1 32 ssa_0 = load_const (0x00000000 = 0.000000) vec4 32 ssa_1 = intrinsic load_input (ssa_0) (base=0, component=0, dest_type=float32, io location=VERT_ATTRIB_POS slots=1) /* gl_Vertex */ vec3 32 ssa_2 = load_const (0x00000000, 0x4e7a0000, 0x4e7e0000) = (0.000000, 1048576000.000000, 1065353216.000000) vec3 32 ssa_3 = load_const (0x00000000, 0x4e7c0000, 0x4e7e0000) = (0.000000, 1056964608.000000, 1065353216.000000) vec3 32 ssa_4 = load_const (0x00000000, 0x4e7d0000, 0x4e7e0000) = (0.000000, 1061158912.000000, 1065353216.000000) vec2 32 ssa_5 = load_const (0x00000000, 0x4e7e0000) = (0.000000, 1065353216.000000) vec1 32 ssa_6 = load_const (0x4e7e0000 = 1065353216.000000) vec1 32 ssa_7 = intrinsic load_ubo_vec4 (ssa_0, ssa_0) (access=0, base=0, component=0) vec4 32 ssa_8 = load_const (0x00000000, 0x3f800000, 0x40000000, 0x40400000) = (0.000000, 1.000000, 2.000000, 3.000000) vec4 1 ssa_9 = flt ssa_8, ssa_7.xxxx vec3 32 ssa_10 = bcsel ssa_9.www, ssa_5.xyy, ssa_4 vec3 32 ssa_11 = bcsel ssa_9.zzz, ssa_10, ssa_3 vec3 32 ssa_12 = bcsel ssa_9.yyy, ssa_11, ssa_2 vec3 32 ssa_13 = fcsel_gt ssa_7.xxx, ssa_12, ssa_6.xxx vec4 32 ssa_14 = fsat ssa_13.xyxz intrinsic store_output (ssa_14, ssa_0) (base=1, wrmask=xyzw, component=0, src_type=float32, io location=VARYING_SLOT_COL0 slots=1, xfb(), xfb2()) /* gl_FrontColor */ intrinsic store_output (ssa_1, ssa_0) (base=0, wrmask=xyzw, component=0, src_type=float32, io location=VARYING_SLOT_POS slots=1, xfb(), xfb2()) /* gl_Position */ /* succs: block_1 */ block block_1: } Reviewed-by: Gert Wollny <gert.wollny@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23704>	2023-06-22 07:25:44 +00:00
Ian Romanick	de60b463d7	nir/algebraic: Simplify various trivial bfi These are mostly just obvious patterns that somebody will eventually want to add. DG2, Tiger Lake, Ice Lake, Skylake, Broadwell, and Haswell had similar results (Ice Lake shown) total instructions in shared programs: 20570033 -> 20570026 (<.01%) instructions in affected programs: 7363 -> 7356 (-0.10%) helped: 6 / HURT: 0 total cycles in shared programs: 902118781 -> 902118854 (<.01%) cycles in affected programs: 419132 -> 419205 (0.02%) helped: 4 / HURT: 2 DG2, Tiger Lake, Ice Lake, and Skylake had similar results (Ice Lake shown) Totals: Instrs: 152819500 -> 152819380 (-0.00%) Cycles: 15014627187 -> 15014624437 (-0.00%) Totals from 115 (0.02% of 662497) affected shaders: Instrs: 28963 -> 28843 (-0.41%) Cycles: 404582 -> 401832 (-0.68%) Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19968>	2023-06-14 18:49:53 +00:00
Ian Romanick	541e7eb389	nir/algebraic: Optimize some u2f of bfi v2: Fix a copy-and-paste bug s/('find_lsb', a)/a/ in the patterns. See piglit!819. DG2, Tiger Lake, Ice Lake, Skylake, and Broadwell had similar results (Ice Lake shown) total instructions in shared programs: 20570063 -> 20570033 (<.01%) instructions in affected programs: 452 -> 422 (-6.64%) helped: 30 / HURT: 0 total cycles in shared programs: 902118723 -> 902118781 (<.01%) cycles in affected programs: 1762 -> 1820 (3.29%) helped: 0 / HURT: 29 DG2, Tiger Lake, Ice Lake, and Skylake had similar results (Ice Lake shown) Totals: Instrs: 152819969 -> 152819500 (-0.00%) Cycles: 15014628652 -> 15014627187 (-0.00%); split: -0.00%, +0.00% Totals from 469 (0.07% of 662497) affected shaders: Instrs: 7644 -> 7175 (-6.14%) Cycles: 31787 -> 30322 (-4.61%); split: -4.90%, +0.29% Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19968>	2023-06-14 18:49:53 +00:00
Ian Romanick	6603948a7a	nir/algebraic: Lower some bfi with two constant sources All Haswell and newer Intel platforms had similar results. (Ice Lake shown) total instructions in shared programs: 19907054 -> 19906882 (<.01%) instructions in affected programs: 8103 -> 7931 (-2.12%) helped: 52 / HURT: 0 total cycles in shared programs: 855779334 -> 855781791 (<.01%) cycles in affected programs: 724201 -> 726658 (0.34%) helped: 38 / HURT: 7 total sends in shared programs: 1039308 -> 1039302 (<.01%) sends in affected programs: 162 -> 156 (-3.70%) helped: 2 / HURT: 0 No shader-db changes on any older Intel platforms. All Intel platforms had similar results. (Ice Lake shown) Totals: Instrs: 153117340 -> 152825222 (-0.19%); split: -0.19%, +0.00% Cycles: 15011904351 -> 15014072944 (+0.01%); split: -0.04%, +0.05% Send messages: 7711509 -> 7711421 (-0.00%) Spill count: 100745 -> 99907 (-0.83%); split: -0.85%, +0.02% Fill count: 203684 -> 202459 (-0.60%); split: -0.62%, +0.02% Scratch Memory Size: 4403200 -> 4376576 (-0.60%) Totals from 18603 (2.81% of 662496) affected shaders: Instrs: 5258303 -> 4966185 (-5.56%); split: -5.56%, +0.00% Cycles: 447391388 -> 449559981 (+0.48%); split: -1.29%, +1.77% Send messages: 559231 -> 559143 (-0.02%) Spill count: 5009 -> 4171 (-16.73%); split: -17.17%, +0.44% Fill count: 8769 -> 7544 (-13.97%); split: -14.33%, +0.36% Scratch Memory Size: 194560 -> 167936 (-13.68%) Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19968>	2023-06-14 18:49:53 +00:00
Jesse Natalie	fba82797d7	nir: Optimize unpacking 16 bit values that were originally packed I was seeing u2u64 still in my final shader after pack/unpack were lowered, which sounds to me like some other optimizations are missing for detecting the post-lowering pack/unpack patterns, but let's at least add some patterns for the simple cases. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23173>	2023-06-13 00:43:36 +00:00
Ian Romanick	7ef45e661f	intel/fs: Add constant propagation for ADD3 v2: Require that the constant value be representable as either uint16_t or int16_t. Suggested by Matt. v3: Remove redundant patterns. Noticed by Matt. shader-db: DG2 total instructions in shared programs: 23103767 -> 23103577 (<.01%) instructions in affected programs: 51822 -> 51632 (-0.37%) helped: 98 / HURT: 15 total cycles in shared programs: 842347714 -> 842380017 (<.01%) cycles in affected programs: 1942595 -> 1974898 (1.66%) helped: 97 / HURT: 32 Nearly all of the affected shaders (around 9,900) are shaders in Cyberpunk 2077. It's about an even split between vertex and fragment shaders. The majority of the remaining affected shaders (3,600) are from Strange Brigade. This was also a nearly even split between fragment and vertex. All but two of the lost shaders are SIMD32 fragment shaders in Cyberpunk 2077. The other two are SIMD32 fragment shaders in Dota2. fossil-db: DG2 Instructions in all programs: 196379107 -> 196248608 (-0.1%) helped: 13467 / HURT: 1210 Cycles in all programs: 13931355281 -> 13929955971 (-0.0%) helped: 11801 / HURT: 2922 Lost: 90 Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23262>	2023-06-06 06:10:53 +00:00
Ian Romanick	7b34808649	nir/algebraic: Fixup iadd3 related patterns There should not be any isub at this point due to lowerings that happened ages before getting to late algebraic. shader-db: DG2 total instructions in shared programs: 23103769 -> 23103767 (<.01%) instructions in affected programs: 65 -> 63 (-3.08%) helped: 1 / HURT: 0 total cycles in shared programs: 842348074 -> 842347714 (<.01%) cycles in affected programs: 28572 -> 28212 (-1.26%) helped: 3 / HURT: 0 One compute shader in Assassin's Creed Odyssey was affected. fossil-db: DG2 Instructions in all programs: 196400668 -> 196400676 (+0.0%) helped: 8 / HURT: 5 Cycles in all programs: 13931740724 -> 13931758003 (+0.0%) helped: 8 / HURT: 7 Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23262>	2023-06-06 06:10:53 +00:00
Jesse Natalie	2d3fbb44f4	nir: Add preserve_mediump as a shader compiler option The DXIL backend would like to distinguish between casts to 16-bit that must cast, vs those that may. If a shader only ever produces 16-bit types from mediump casts and ALU ops on those values, then the resulting shader can be annotated with DXIL's min-precision qualifier, basically telling the driver to use 16-bit precision if it's faster for them. If it uses concrete 16-bit casts, or loads/ stores to externally-visible memory, then it must use the "native" 16-bit flag, which is not supported on all hardware. Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23344>	2023-06-01 23:01:04 +00:00
Jesse Natalie	6c62eaf22d	nir_opt_algebraic: Don't shrink 64-bit bitwise ops if pack_split is going to be lowered Otherwise this can cause optimizations to fight resulting in infinite optimization loops with opt_algebraic, constant_folding, and copy_prop. Fixes: `368be872` ("nir/algebraic: shrink 64-bit bitwise operations with 0/-1 constant half") Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23192>	2023-06-01 00:36:10 +00:00
Ian Romanick	71e5530c07	nir/algebraic: Undistribute fsat from fmax To be helpful, the thing inside the fsat has to be used with and without the fsat. Otherwise it just moves a saturate destination modifier around. To not be harmful, the fsat has to only be used by the bcsel. All Broadwell and newer Intel platforms had similar results. (Ice Lake shown) total instructions in shared programs: 20174475 -> 20174449 (<.01%) instructions in affected programs: 3913 -> 3887 (-0.66%) helped: 13 / HURT: 0 total cycles in shared programs: 866844832 -> 866844719 (<.01%) cycles in affected programs: 46037 -> 45924 (-0.25%) helped: 10 / HURT: 1 All Intel platforms had similar results. (Ice Lake shown) Instructions in all programs: 161491468 -> 161491372 (-0.0%) helped: 31 / HURT: 8 Cycles in all programs: 10933090736 -> 10933024716 (-0.0%) helped: 32 / HURT: 18 Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22169>	2023-03-29 23:48:19 +00:00
Faith Ekstrand	01275a1a95	nir: Drop a bunch of Authors tags This is what git blame is for. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22120>	2023-03-26 00:16:25 +00:00
Georg Lehmann	cec04adcee	nir: optimize i2f(f2i(fsign)) Foz-DB Navi10: Totals from 3013 (2.23% of 134906) affected shaders: VGPRs: 138068 -> 136964 (-0.80%); split: -0.80%, +0.00% CodeSize: 10476416 -> 10391800 (-0.81%) MaxWaves: 79118 -> 80088 (+1.23%) Instrs: 1963227 -> 1945003 (-0.93%) Latency: 24734883 -> 24649279 (-0.35%); split: -0.39%, +0.05% InvThroughput: 6366777 -> 6334735 (-0.50%); split: -0.50%, +0.00% VClause: 36845 -> 36882 (+0.10%); split: -0.26%, +0.36% SClause: 59249 -> 59273 (+0.04%); split: -0.25%, +0.29% Copies: 108570 -> 108501 (-0.06%); split: -0.19%, +0.13% PreSGPRs: 105371 -> 105862 (+0.47%) PreVGPRs: 117675 -> 116625 (-0.89%); split: -0.89%, +0.00% Reviewed-by: Emma Anholt <emma@anholt.net> Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22003>	2023-03-22 05:34:55 +00:00
Isabella Basso	59fea8af3a	nir/algebraic: remove duplicate bool conversion lowerings While [1] added some boolean conversion lowering patterns, those were already dealt with on [2]. [1] - `b86305bb` ("nir/algebraic: collapse conversion opcodes (many patterns)") [2] - `d7e0d47b` ("nir/algebraic: nir: Add a bunch of b2[if] optimizations") Fixes: `b86305bb` ("nir/algebraic: collapse conversion opcodes (many patterns)") Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Signed-off-by: Isabella Basso <isabellabdoamaral@usp.br> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20965>	2023-03-11 17:21:38 +00:00
Isabella Basso	a553d3cd29	nir/algebraic: make patterns for float conversion lowerings imprecise As noted on [1], lowering patterns of the form floatS -> floatB -> floatS ==> floatS cannot require precision since this may cause flush denorming. [1] `3f779013` ("nir: Add an algebraic optimization for float->double->float") Fixes: `b86305bb` ("nir/algebraic: collapse conversion opcodes (many patterns)") Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Signed-off-by: Isabella Basso <isabellabdoamaral@usp.br> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20965>	2023-03-11 17:21:37 +00:00
Isabella Basso	79c94ef52e	nir/algebraic: extend lowering patterns for conversions on smaller bit sizes Conversions on smaller bit sizes should also be collapsed when composed. This also adds more patterns on the intS -> intB -> floatB ==> intS -> floatB lowering so as to deal with any int size C > B instead of a fixed intB. Closes: #7776 Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Signed-off-by: Isabella Basso <isabellabdoamaral@usp.br> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20965>	2023-03-11 17:21:37 +00:00
Isabella Basso	a27bcd63d0	nir/algebraic: extend mediump patterns Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Suggested-by: Italo Nicola <italonicola@collabora.com> Signed-off-by: Isabella Basso <isabellabdoamaral@usp.br> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20965>	2023-03-11 17:21:37 +00:00
Isabella Basso	b3685f3ba7	nir/algebraic: insert patterns inside optimizations list Some patterns were outside the list of optimizations. Fixes: `b86305bb` ("nir/algebraic: collapse conversion opcodes (many patterns)") Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Signed-off-by: Isabella Basso <isabellabdoamaral@usp.br> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20965>	2023-03-11 17:21:37 +00:00
Ian Romanick	831f9d3f61	nir/algebraic: Optimize some ifind_msb to ufind_msb On Intel platforms, the uclz lowering of ufind_msb is either one instruction better (Gfx7 and newer) or two instructions better (all older platforms) than the ifind_msb implementations. On platforms that use lower_find_msb_to_reverse, there should be no difference. All Haswell and newer Intel platforms had similar results. (Ice Lake shown) total instructions in shared programs: 19938662 -> 19938634 (<.01%) instructions in affected programs: 850 -> 822 (-3.29%) helped: 2 / HURT: 0 total cycles in shared programs: 858467067 -> 858465538 (<.01%) cycles in affected programs: 10080 -> 8551 (-15.17%) helped: 2 / HURT: 0 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19042>	2023-03-10 15:27:17 +00:00
Ian Romanick	2d6f48f6ef	nir/algebraic: Do not generate 8- or 16-bit find_msb The next commit will add validation to restrict this instruction (and others) to only 32-bit or 64-bit sources. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19042>	2023-03-10 15:27:17 +00:00
Ian Romanick	28311f9d02	nir: intel/compiler: Move ufind_msb lowering to NIR Fossil-db results: All Intel platforms had similar results. (Ice Lake shown) Cycles in all programs: 9098346105 -> 9098333765 (-0.0%) Cycles helped: 6 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19042>	2023-03-10 15:27:17 +00:00
Ian Romanick	a4052e70ea	nir/algebraic: Only lower ufind_msb with 32-bit sources The 31-ufind_msb_rev(x) lowering only produces the correct result for 32-bit sources. ufind_msb_rev can also have 64-bit sources, and most platforms are expected to lower this to 32-bit instructions with extra logic operations. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19042>	2023-03-10 15:27:17 +00:00
Ian Romanick	0cc7bf63b7	nir: intel/compiler: Move ifind_msb lowering to NIR Unlike ufind_msb, ifind_msb is only defined in NIR for 32-bit values, so no @32 annotation is required. No shader-db or fossil-db changes on any Intel platform. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19042>	2023-03-10 15:27:17 +00:00
Georg Lehmann	aeb68c29b4	nir/opt_algebraic: add patterns for iand/ior of feq/fneu with 0 Foz-DB Navi21: Totals from 1245 (0.92% of 134913) affected shaders: VGPRs: 66232 -> 66248 (+0.02%); split: -0.01%, +0.04% CodeSize: 5874976 -> 5868168 (-0.12%); split: -0.17%, +0.05% MaxWaves: 25278 -> 25274 (-0.02%); split: +0.01%, -0.02% Instrs: 1087502 -> 1085267 (-0.21%); split: -0.21%, +0.00% Latency: 6531489 -> 6531672 (+0.00%); split: -0.04%, +0.05% InvThroughput: 1531774 -> 1532327 (+0.04%); split: -0.02%, +0.05% VClause: 22218 -> 22202 (-0.07%); split: -0.08%, +0.00% SClause: 45906 -> 45873 (-0.07%); split: -0.08%, +0.01% Copies: 64004 -> 64102 (+0.15%); split: -0.24%, +0.39% Branches: 21529 -> 21534 (+0.02%); split: -0.00%, +0.03% PreSGPRs: 51936 -> 51850 (-0.17%) PreVGPRs: 55393 -> 55398 (+0.01%); split: -0.02%, +0.03% Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21576>	2023-03-01 11:24:43 +00:00
Emma Anholt	6d52e6fd2c	nir: Port a floor->truncate algebraic opt pattern from GLSL. Prevents regression when dropping code from the GLSL optimizer. Acked-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21475>	2023-02-28 03:36:09 +00:00
Emma Anholt	ef02581590	nir: Add optimization for fdot(x, 0) -> 0. We had all these nice fdot opts to drop individual channels that were 0, but nothing handling it being entirely 0! Avoids r300g regression when dropping them from GLSL. Acked-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21475>	2023-02-28 03:36:08 +00:00
Ian Romanick	ea413e826b	nir: Eliminate nir_op_f2b Builds on the work of !15121. This gets to delete even more code because many drivers shared a lot of code for i2b and f2b. No shader-db or fossil-db changes on any Intel platform. v2: Rebase on `1a35acd8d9`. v3: Update a comment in nir_opcodes_c.py. Suggested by Konstantin. v4: Another rebase. Remove f2b stuff from Midgard. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20509>	2023-02-03 22:39:57 +00:00
Timur Kristóf	1244506c15	nir/opt_algebraic: Add optimization for ieq/ine and right-shift. Fossil DB stats on GFX11: Totals from 1343 (1.00% of 134913) affected shaders: SpillSGPRs: 7145 -> 7137 (-0.11%) CodeSize: 20737744 -> 20739148 (+0.01%); split: -0.02%, +0.03% Instrs: 4010443 -> 4008449 (-0.05%); split: -0.05%, +0.00% Latency: 50021520 -> 50021105 (-0.00%); split: -0.00%, +0.00% InvThroughput: 6354371 -> 6354112 (-0.00%); split: -0.00%, +0.00% VClause: 63035 -> 63038 (+0.00%); split: -0.01%, +0.01% SClause: 121162 -> 121166 (+0.00%) Copies: 251354 -> 251058 (-0.12%); split: -0.18%, +0.06% PreSGPRs: 137283 -> 137299 (+0.01%) Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20936>	2023-02-02 03:08:19 +00:00
Timur Kristóf	65a917cb6e	nir: Add algebraic optimization for VKD3D-Proton fp32->fp16 conversion. VKD3D-Proton DXBC f32 to f16 conversion implements a float conversion using PackHalf2x16. Because the spec does not specify a rounding mode, it emits a sequence to ensure D3D-like behaviour for infinity. When we know the current backend has pack_half_2x16_rtz_split, we can eliminate the extra sequence. Fossil DB stats on GFX11: Totals from 835 (0.62% of 134913) affected shaders: VGPRs: 49368 -> 49224 (-0.29%) CodeSize: 5341956 -> 5124564 (-4.07%) Instrs: 1024062 -> 987041 (-3.62%) Latency: 6530956 -> 6465120 (-1.01%); split: -1.01%, +0.00% InvThroughput: 908189 -> 870253 (-4.18%) VClause: 18704 -> 18702 (-0.01%); split: -0.02%, +0.01% SClause: 33406 -> 33284 (-0.37%); split: -0.38%, +0.01% Copies: 67440 -> 65992 (-2.15%); split: -2.15%, +0.00% Branches: 18498 -> 18465 (-0.18%) PreSGPRs: 38409 -> 38331 (-0.20%) PreVGPRs: 44089 -> 43834 (-0.58%) Note, some fossils are from before this pattern was added to VKD3D-Proton, so the above may not reflect real-world impact. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15838>	2023-01-26 12:24:24 +00:00
Timur Kristóf	7985933a6d	nir: Lower pack_half_2x16_split to RTZ if available. Constant folding always uses RTNE for pack_half_2x16_split, but some backends implement it with RTZ. Lowering to RTZ when available ensures that the behaviour will be consistent between constant folding and the backend. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15838>	2023-01-26 12:24:24 +00:00
Alyssa Rosenzweig	c3839bd540	nir: Optimize vendored sin/cos the same way As we've done for the AMD one, to prevent any codegen regression from switching the Midgard lowering. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Italo Nicola <italonicola@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19350>	2023-01-16 22:20:43 +00:00
Ian Romanick	eb76cee9f8	nir: Eliminate nir_op_i2b There are a lot of optimizations in opt_algebraic that match ('ine', a, 0), but there are almost none that match i2b. Instead of adding a huge pile of additional patterns (including variations that include both ine and i2b), always lower i2b to a != 0. At this point in the series, it should be impossible for anything to generate i2b, so there /should not/ be any changes. The failing test on d3d12 is a pre-existing bug that is triggered by this change. I talked to Jesse about it, and, after some analysis, he suggested just adding it to the list of known failures. v2: Don't rematerialize i2b instructions in dxil_nir_lower_x2b. v3: Don't rematerialize i2b instructions in zink_nir_algebraic.py. v4: Fix zink-on-TGL CI failures by calling nir_opt_algebraic after nir_lower_doubles makes progress. The latter can generate b2i instructions, but nir_lower_int64 can't handle them (anymore). v5: Add back most of the hunk at line 2125 of nir_opt_algebraic.py. I had accidentally removed the f2b(b2f(x)) optimization. v6: Just eliminate the i2b instruction. v7: Remove missed i2b32 in midgard_compile.c. Remove (now unused) emit_alu_i2orf2_b1 function from sfn_instr_alu.cpp. Previously this function was still used. 🤷 No shader-db changes on any Intel platform. All Intel platforms had similar results. (Ice Lake shown) Instructions in all programs: 141165875 -> 141165873 (-0.0%) Instructions helped: 2 Cycles in all programs: 9098956382 -> 9098956350 (-0.0%) Cycles helped: 2 The two Vulkan shaders are helped because of the "new" (('b2i32', ('ine', ('ubfe', a, b, 1), 0)), ('ubfe', a, b, 1)) algebraic pattern. Acked-by: Jesse Natalie <jenatali@microsoft.com> [earlier version] Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Tested-by: Daniel Schürmann <daniel@schuermann.dev> [earlier version] Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>	2022-12-14 06:23:21 +00:00
Ian Romanick	b60b2f2add	nir/algebraic: Optimize some b2i involved in masking operations v2: Remove the ineg from the b2i in the ior pattern. Suggested by Jason. All Ivy Bridge and newer Intel platforms had similar results. (Ice Lake shown) total instructions in shared programs: 19914441 -> 19914369 (<.01%) instructions in affected programs: 63507 -> 63435 (-0.11%) helped: 24 / HURT: 0 total cycles in shared programs: 853869766 -> 853851470 (<.01%) cycles in affected programs: 10551542 -> 10533246 (-0.17%) helped: 24 / HURT: 0 All Intel platforms had similar results. (Ice Lake shown) Instructions in all programs: 141163061 -> 141092683 (-0.0%) Instructions helped: 14103 Instructions hurt: 55 Cycles in all programs: 9132376195 -> 9133183045 (+0.0%) Cycles helped: 13775 Cycles hurt: 380 Spills in all programs: 18286 -> 18284 (-0.0%) Spills helped: 1 Fills in all programs: 30647 -> 30643 (-0.0%) Fills helped: 1 Gained: 133 Lost: 130 Acked-by: Jesse Natalie <jenatali@microsoft.com> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Tested-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>	2022-12-14 06:23:21 +00:00
Ian Romanick	ba0b248ac2	nir/algebraic: Eliminate unary op on src of integer comparison w/ zero This helps because it enables cmod propagation to do more. The removed patterns involving b2i will be handled by other existing patterns after the unary operations are removed. All Intel platforms had similar results. (Ice Lake shown) total instructions in shared programs: 19914458 -> 19914441 (<.01%) instructions in affected programs: 5456 -> 5439 (-0.31%) helped: 17 / HURT: 0 total cycles in shared programs: 855302118 -> 853869766 (-0.17%) cycles in affected programs: 327354347 -> 325921995 (-0.44%) helped: 291 / HURT: 81 All Intel platforms had similar results. (Ice Lake shown) Instructions in all programs: 141205979 -> 141205961 (-0.0%) Instructions helped: 4 Instructions hurt: 3 SENDs in all programs: 7466919 -> 7466913 (-0.0%) SENDs helped: 1 Cycles in all programs: 9133387327 -> 9133384475 (-0.0%) Cycles helped: 3 Cycles hurt: 12 In the shader that was helped for sends, it appears that a NIR pass that moves code out of loops was able to move 3 send operations outside a loop after this change. I did not investigate further. Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com> Acked-by: Jesse Natalie <jenatali@microsoft.com> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Tested-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>	2022-12-14 06:23:20 +00:00
Ian Romanick	ee15d89322	nir/algebraic: Simplify min and max of b2i This prevents ~400 shader-db regressions and a handful of fossil-db regressions after i2b is always lowered. All Ivy Bridge and newer Intel platforms had similar results. (Ice Lake shown) total cycles in shared programs: 855301494 -> 855302118 (<.01%) cycles in affected programs: 52787 -> 53411 (1.18%) helped: 4 / HURT: 5 All Intel platforms had similar results. (Ice Lake shown) Instructions in all programs: 141206055 -> 141205979 (-0.0%) Instructions helped: 14 Cycles in all programs: 9133376616 -> 9133387327 (+0.0%) Cycles helped: 13 Cycles hurt: 3 Acked-by: Jesse Natalie <jenatali@microsoft.com> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Tested-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>	2022-12-14 06:23:20 +00:00
Ian Romanick	19222867e4	nir/algebraic: Reassociate some iand to eliminate an operation No shader-db changes on any Intel platform. All of the helped shaders were presumably regressed by `4676b3d3dd` (nir: Use nir_test_mask instead of i2b(iand)). v2: Add some comments explaining why specific replacements are used. In the umin pattern, only markup the first usage of 'b' in the source pattern. Tiger Lake, Ice Lake, and Skylake had similar results. (Ice Lake shown) Instructions in all programs: 141384970 -> 141200966 (-0.1%) Instructions helped: 45842 Cycles in all programs: 9133648977 -> 9133282672 (-0.0%) Cycles helped: 26812 Cycles hurt: 6025 Gained: 23 Lost: 135 Acked-by: Jesse Natalie <jenatali@microsoft.com> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Tested-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>	2022-12-14 06:23:20 +00:00
Ian Romanick	d48ce1f47d	nir/algebraic: Remove redundant i2b(b2i(x)) patterns A loop below already adds all the permutations... including the 1-bit version that isn't included in this group. No shader-db or fossil-db changes on any Intel platform. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com> Acked-by: Jesse Natalie <jenatali@microsoft.com> Tested-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>	2022-12-14 06:23:20 +00:00
Ian Romanick	14a9bb04e4	nir/algebraic: Remove redundant i2b(-x) pattern The exact same pattern appears later (around line 1323). No shader-db or fossil-db changes on any Intel platform. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com> Acked-by: Jesse Natalie <jenatali@microsoft.com> Tested-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>	2022-12-14 06:23:20 +00:00
Georg Lehmann	4dff3ff005	nir/opt_algebraic: Optimize open coded bfm. Foz-DB Navi21: Totals from 1553 (1.15% of 134913) affected shaders: SpillVGPRs: 2246 -> 2223 (-1.02%); split: -1.42%, +0.40% CodeSize: 10409156 -> 10410720 (+0.02%); split: -0.03%, +0.04% Instrs: 1899725 -> 1898773 (-0.05%); split: -0.07%, +0.02% Latency: 71225814 -> 71118314 (-0.15%); split: -0.21%, +0.06% InvThroughput: 13384926 -> 13330369 (-0.41%); split: -0.47%, +0.06% VClause: 38309 -> 38284 (-0.07%); split: -0.17%, +0.11% SClause: 70743 -> 70706 (-0.05%) Copies: 167296 -> 167230 (-0.04%); split: -0.28%, +0.24% Branches: 42446 -> 42444 (-0.00%); split: -0.01%, +0.00% PreVGPRs: 95191 -> 95188 (-0.00%) Some minor instructions count regressions in parallel-rdp because v_bfm_b32 can't use SDWA, but overall an improvement. Signed-off-by: Georg Lehmann <dadschoorse@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18887>	2022-12-09 14:59:16 +00:00

505 commits