Commit graph

517 commits

Author SHA1 Message Date
Rhys Perry
b37804c8de nir/algebraic: optimize 64-bit comparisons with zero'd halves to 32-bit
These expect nir_lower_int64 to replace u2u64 to pack_64_2x32_split(, 0).

fossil-db (navi31):
Totals from 149 (0.19% of 79242) affected shaders:
Instrs: 433095 -> 431830 (-0.29%); split: -0.29%, +0.00%
CodeSize: 2165980 -> 2160284 (-0.26%); split: -0.27%, +0.00%
SpillSGPRs: 689 -> 688 (-0.15%)
Latency: 3801497 -> 3799901 (-0.04%); split: -0.05%, +0.01%
InvThroughput: 1547916 -> 1546567 (-0.09%); split: -0.09%, +0.01%
VClause: 4698 -> 4693 (-0.11%)
SClause: 9981 -> 9977 (-0.04%); split: -0.05%, +0.01%
Copies: 66148 -> 65431 (-1.08%); split: -1.09%, +0.01%
PreSGPRs: 6732 -> 6729 (-0.04%)
PreVGPRs: 7976 -> 7945 (-0.39%)
VALU: 252936 -> 252336 (-0.24%)
SALU: 51794 -> 51274 (-1.00%); split: -1.03%, +0.02%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27335>
2024-03-06 15:23:18 +00:00
Rhys Perry
417eb390c6 nir/algebraic: remove duplicated iand(ien, ine)/ior(ieq, ieq) patterns
These don't seem useful, since they're already done in the early optimizations.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27335>
2024-03-06 15:23:18 +00:00
Rhys Perry
6952bb359c nir/algebraic: don't create 64-bit min/max/ior if lowered
fossil-db (navi31):
Totals from 58 (0.07% of 79242) affected shaders:
Instrs: 11692 -> 11304 (-3.32%)
CodeSize: 65836 -> 62412 (-5.20%)
VGPRs: 1320 -> 1344 (+1.82%)
Latency: 51712 -> 50234 (-2.86%)
InvThroughput: 10190 -> 10160 (-0.29%)
Copies: 460 -> 688 (+49.57%)
VALU: 6130 -> 5897 (-3.80%)
SALU: 1231 -> 1284 (+4.31%); split: -0.32%, +4.63%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27335>
2024-03-06 15:23:18 +00:00
Karol Herbst
f2b7c4ce29 nir: rework and fix rotate lowering
No driver supports urol/uror on all bit sizes. Intel gen11+ only for 16
and 32 bit, Nvidia GV100+ only for 32 bit. Etnaviv can support it on 8,
16 and 32 bit.

Also turn the `lower` into a `has` option as only two drivers actually
support `uror` and `urol` at this momemt.

Fixes crashes with CL integer_rotate on iris and nouveau since we emit
urol for `rotate`.

v2: always lower 64 bit

Fixes: fe0965afa6 ("spirv: Don't use libclc for rotate")
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by (Intel and nir): Ian Romanick <ian.d.romanick@intel.com>

Reviewed-by: David Heidelberg <david.heidelberg@collabora.com>
Acked-by: Yonggang Luo <luoyonggang@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27090>
2024-01-22 10:27:44 +00:00
Rhys Perry
e86ab8173b nir/algebraic: optimize vkd3d-proton's MSAD
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26907>
2024-01-05 18:55:22 +00:00
Konstantin Seurer
b88ac6b381 nir: Optimize fpow with small constant exponents
They would be turned into exp(log(a)*b) instead, which is slow.

Totals from 2146 (2.52% of 85071) affected shaders:
MaxWaves: 35769 -> 35779 (+0.03%); split: +0.03%, -0.01%
Instrs: 6476835 -> 6465494 (-0.18%); split: -0.18%, +0.00%
CodeSize: 35382288 -> 35347092 (-0.10%); split: -0.10%, +0.00%
SpillSGPRs: 1055 -> 1017 (-3.60%)
Latency: 75211743 -> 75063623 (-0.20%); split: -0.20%, +0.00%
InvThroughput: 17525115 -> 17501745 (-0.13%); split: -0.14%, +0.00%
VClause: 200089 -> 200077 (-0.01%); split: -0.01%, +0.01%
SClause: 293566 -> 293480 (-0.03%); split: -0.03%, +0.00%
Copies: 649631 -> 640516 (-1.40%); split: -1.44%, +0.03%
Branches: 268441 -> 268325 (-0.04%)
PreSGPRs: 146868 -> 146045 (-0.56%)
PreVGPRs: 134125 -> 134128 (+0.00%); split: -0.00%, +0.01%

Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26727>
2024-01-02 11:16:14 +01:00
Faith Ekstrand
aac1e3f595 nir: Add a new has_fmulz_no_denorms flag
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26569>
2023-12-11 15:29:17 +00:00
Faith Ekstrand
d2ffcb6092 nir: Lower [su]dot_4x8_[ui]add_sat to [su]dot_4x8_[ui]add
Since nir_opt_algebraic runs on its own results, if the driver doesn't
have [su]dot_4x8_[ui]add then the [su]dot_4x8_[ui]add lowering rules
will kick in and lower that to what we had originally.

Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26533>
2023-12-06 23:15:33 +00:00
Faith Ekstrand
09fc5e1c4d nir: Split has_[su]dot_4x8 bits into regular and _sat versions
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26533>
2023-12-06 23:15:33 +00:00
Qiang Yu
7e4aac46ad nir: add force_f2f16_rtz option to lower f2f16 to f2f16_rtz
Used by OpenGL driver like radeonsi which has undefined rounding mode.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25990>
2023-11-20 02:20:17 +00:00
Faith Ekstrand
3af5af429e nir: Optimize boolean ieq/ine with an immediate
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26120>
2023-11-10 21:46:55 +00:00
Eric Anholt
b416248cb5 nir: Add nir_lower_dsign as 64-bit fsign lowering.
Right now some drivers are doing dsign lowering in GLSL and haven't had to
have a NIR path due to there not being a corresponding vulkan driver.  We
want this in NIR now so that we can retire that batch of GLSL lowering
code.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25777>
2023-10-24 00:16:30 +00:00
Marek Olšák
8ff4847b64 nir/algebraic: use only signed_zero_preserve_* for addition by 0 patterns, etc.
Some GLSL versions will set inf_preserve but not the other flags.
Additions by 0 only affect signed zeros.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25392>
2023-10-17 17:27:12 +00:00
Alyssa Rosenzweig
be0ab37bac nir/opt_algebraic: Optimize LLVM booleans
Helps OpenCL kernels.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25687>
2023-10-13 02:55:48 +00:00
Alyssa Rosenzweig
8df8d1e2f2 nir/opt_algebraic: Reduce int64
If we just want the bottom 32-bits we don't need a full 64-bit operation.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25625>
2023-10-12 21:03:31 +00:00
Rhys Perry
65afc8bebf nir/algebraic: optimize u2u32(a >> 32)
fossil-db (navi21):
Totals from 352 (0.44% of 79330) affected shaders:
Instrs: 271816 -> 271240 (-0.21%); split: -0.28%, +0.07%
CodeSize: 1546520 -> 1544448 (-0.13%); split: -0.23%, +0.09%
SpillVGPRs: 832 -> 827 (-0.60%); split: -1.08%, +0.48%
Latency: 4037120 -> 4021748 (-0.38%); split: -0.41%, +0.03%
InvThroughput: 1369540 -> 1362066 (-0.55%); split: -0.59%, +0.04%
VClause: 6476 -> 6471 (-0.08%); split: -0.12%, +0.05%
SClause: 6798 -> 6794 (-0.06%)
Copies: 44828 -> 44630 (-0.44%); split: -0.89%, +0.45%
Branches: 8845 -> 8844 (-0.01%); split: -0.05%, +0.03%
PreSGPRs: 14684 -> 14659 (-0.17%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25409>
2023-09-27 22:13:01 +00:00
Georg Lehmann
136a698251 nir/opt_algebraic: remove broken fddx/fddy patterns
These patterns are broken in the following scenario:

%1 = f2fmp %0
%2 = fddx %1
%3 = ... // non quad uniform
if %3 {
   %4 = f2f32 %2
   ...
}

Which would turn into

%3 = ...
if %3 {
   %4 = fddx %0
   ...
}

Yet another example that shows why derivative instructions should be
be intrinsics, not alu.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25014>
2023-09-08 14:14:47 +00:00
Ian Romanick
5ce6e09ffc nir/algebraic: Remove redundant pack / unpack lowering patterns
No shader-db or fossil-db changes on any Intel platform.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24900>
2023-08-25 14:54:11 -07:00
Georg Lehmann
9cf6984200 nir: unify lower_find_msb with has_{find_msb_rev,uclz}
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24662>
2023-08-22 12:08:37 +00:00
Georg Lehmann
2ac7e6614a nir: unify lower_bitfield_extract with has_bfe
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24662>
2023-08-22 12:08:37 +00:00
Georg Lehmann
34c3f81614 nir: unify lower_bitfield_insert with has_{bfm,bfi,bitfield_select}
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24662>
2023-08-22 12:08:37 +00:00
Marek Olšák
1ac379c4a0 nir/algebraic: collapse ALU opcodes sourcing NaN
Undef will be replaced by NaN whenever it leads to elimination of FP
instructions. This implements the elimination part.

Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24792>
2023-08-19 14:18:52 -04:00
Alyssa Rosenzweig
a257e2daad nir: Lower fquantize2f16
Passes dEQP-VK.spirv_assembly.*opquantize*.

Unlike the DXIL lowering, this should correctly handle NaNs. (I belive Dozen has
a bug here that is masked by running constant folding early and poor CTS
coverage.) It is also faster than the DXIL lowering for hardware that supports
f2f16 conversions natively. It is not as good as a backend implementation that
could flush-to-zero in hardware... but for a debug instruction it should be more
than good enough.

It might be slightly better to multiply with 0.0 to get the appropriate zero,
but NIR really likes optimizing that out ...

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24616>
2023-08-18 22:20:02 +00:00
Georg Lehmann
44d0b785cc nir/opt_algebraic: combine bitz/bitnz
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23298>
2023-06-29 13:39:30 +00:00
Pavel Ondračka
b4ca45911d nir_opt_algebraic: don't use i32csel without native integer support
Otherwise nir_lower_int_to_float (or specifically nir_gather_ssa_types)
will fail to recognize we already have float constants and converts them
again.

Example from spec/glsl-1.10/execution/vs-loop-array-index-unroll.shader_test
with r300 driver (after enabling has_fused_comp_and_csel).

impl main {
        block block_0:
        /* preds: */
        vec1 32 ssa_0 = load_const (0x00000000 = 0.000000)
        vec4 32 ssa_1 = intrinsic load_input (ssa_0) (base=0, component=0, dest_type=float32, io location=VERT_ATTRIB_POS slots=1)      /* gl_Vertex */
        vec3 32 ssa_2 = load_const (0x00000000, 0x3e800000, 0x3f800000) = (0.000000, 0.250000, 1.000000)
        vec3 32 ssa_3 = load_const (0x00000000, 0x3f000000, 0x3f800000) = (0.000000, 0.500000, 1.000000)
        vec3 32 ssa_4 = load_const (0x00000000, 0x3f400000, 0x3f800000) = (0.000000, 0.750000, 1.000000)
        vec2 32 ssa_5 = load_const (0x00000000, 0x3f800000) = (0.000000, 1.000000)
        vec1 32 ssa_6 = load_const (0x3f800000 = 1.000000)
        vec1 32 ssa_7 = intrinsic load_ubo_vec4 (ssa_0, ssa_0) (access=0, base=0, component=0)
        vec4 32 ssa_8 = load_const (0x00000000, 0x00000001, 0x00000002, 0x00000003) = (0.000000, 0.000000, 0.000000, 0.000000)
        vec4  1 ssa_9 = ilt ssa_8, ssa_7.xxxx
        vec3 32 ssa_10 = bcsel ssa_9.www, ssa_5.xyy, ssa_4
        vec3 32 ssa_11 = bcsel ssa_9.zzz, ssa_10, ssa_3
        vec3 32 ssa_12 = bcsel ssa_9.yyy, ssa_11, ssa_2
        vec3 32 ssa_15 = i32csel_gt ssa_7.xxx, ssa_12, ssa_6.xxx
        vec4 32 ssa_14 = fsat ssa_15.xyxz
        intrinsic store_output (ssa_14, ssa_0) (base=1, wrmask=xyzw, component=0, src_type=float32, io location=VARYING_SLOT_COL0 slots=1, xfb(), xfb2())       /* gl_FrontColor */
        intrinsic store_output (ssa_1, ssa_0) (base=0, wrmask=xyzw, component=0, src_type=float32, io location=VARYING_SLOT_POS slots=1, xfb(), xfb2()) /* gl_Position */
        /* succs: block_1 */
        block block_1:
}

and after nir_lower_int_to_float

impl main {
        block block_0:
        /* preds: */
        vec1 32 ssa_0 = load_const (0x00000000 = 0.000000)
        vec4 32 ssa_1 = intrinsic load_input (ssa_0) (base=0, component=0, dest_type=float32, io location=VERT_ATTRIB_POS slots=1)      /* gl_Vertex */
        vec3 32 ssa_2 = load_const (0x00000000, 0x4e7a0000, 0x4e7e0000) = (0.000000, 1048576000.000000, 1065353216.000000)
        vec3 32 ssa_3 = load_const (0x00000000, 0x4e7c0000, 0x4e7e0000) = (0.000000, 1056964608.000000, 1065353216.000000)
        vec3 32 ssa_4 = load_const (0x00000000, 0x4e7d0000, 0x4e7e0000) = (0.000000, 1061158912.000000, 1065353216.000000)
        vec2 32 ssa_5 = load_const (0x00000000, 0x4e7e0000) = (0.000000, 1065353216.000000)
        vec1 32 ssa_6 = load_const (0x4e7e0000 = 1065353216.000000)
        vec1 32 ssa_7 = intrinsic load_ubo_vec4 (ssa_0, ssa_0) (access=0, base=0, component=0)
        vec4 32 ssa_8 = load_const (0x00000000, 0x3f800000, 0x40000000, 0x40400000) = (0.000000, 1.000000, 2.000000, 3.000000)
        vec4  1 ssa_9 = flt ssa_8, ssa_7.xxxx
        vec3 32 ssa_10 = bcsel ssa_9.www, ssa_5.xyy, ssa_4
        vec3 32 ssa_11 = bcsel ssa_9.zzz, ssa_10, ssa_3
        vec3 32 ssa_12 = bcsel ssa_9.yyy, ssa_11, ssa_2
        vec3 32 ssa_13 = fcsel_gt ssa_7.xxx, ssa_12, ssa_6.xxx
        vec4 32 ssa_14 = fsat ssa_13.xyxz
        intrinsic store_output (ssa_14, ssa_0) (base=1, wrmask=xyzw, component=0, src_type=float32, io location=VARYING_SLOT_COL0 slots=1, xfb(), xfb2())       /* gl_FrontColor */
        intrinsic store_output (ssa_1, ssa_0) (base=0, wrmask=xyzw, component=0, src_type=float32, io location=VARYING_SLOT_POS slots=1, xfb(), xfb2()) /* gl_Position */
        /* succs: block_1 */
        block block_1:
}

Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23704>
2023-06-22 07:25:44 +00:00
Ian Romanick
de60b463d7 nir/algebraic: Simplify various trivial bfi
These are mostly just obvious patterns that somebody will eventually
want to add.

DG2, Tiger Lake, Ice Lake, Skylake, Broadwell, and Haswell had similar
results (Ice Lake shown)
total instructions in shared programs: 20570033 -> 20570026 (<.01%)
instructions in affected programs: 7363 -> 7356 (-0.10%)
helped: 6 / HURT: 0

total cycles in shared programs: 902118781 -> 902118854 (<.01%)
cycles in affected programs: 419132 -> 419205 (0.02%)
helped: 4 / HURT: 2

DG2, Tiger Lake, Ice Lake, and Skylake had similar results (Ice Lake shown)
Totals:
Instrs: 152819500 -> 152819380 (-0.00%)
Cycles: 15014627187 -> 15014624437 (-0.00%)

Totals from 115 (0.02% of 662497) affected shaders:
Instrs: 28963 -> 28843 (-0.41%)
Cycles: 404582 -> 401832 (-0.68%)

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19968>
2023-06-14 18:49:53 +00:00
Ian Romanick
541e7eb389 nir/algebraic: Optimize some u2f of bfi
v2: Fix a copy-and-paste bug s/('find_lsb', a)/a/ in the patterns. See
piglit!819.

DG2, Tiger Lake, Ice Lake, Skylake, and Broadwell had similar results (Ice Lake shown)
total instructions in shared programs: 20570063 -> 20570033 (<.01%)
instructions in affected programs: 452 -> 422 (-6.64%)
helped: 30 / HURT: 0

total cycles in shared programs: 902118723 -> 902118781 (<.01%)
cycles in affected programs: 1762 -> 1820 (3.29%)
helped: 0 / HURT: 29

DG2, Tiger Lake, Ice Lake, and Skylake had similar results (Ice Lake shown)
Totals:
Instrs: 152819969 -> 152819500 (-0.00%)
Cycles: 15014628652 -> 15014627187 (-0.00%); split: -0.00%, +0.00%

Totals from 469 (0.07% of 662497) affected shaders:
Instrs: 7644 -> 7175 (-6.14%)
Cycles: 31787 -> 30322 (-4.61%); split: -4.90%, +0.29%

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19968>
2023-06-14 18:49:53 +00:00
Ian Romanick
6603948a7a nir/algebraic: Lower some bfi with two constant sources
All Haswell and newer Intel platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 19907054 -> 19906882 (<.01%)
instructions in affected programs: 8103 -> 7931 (-2.12%)
helped: 52 / HURT: 0

total cycles in shared programs: 855779334 -> 855781791 (<.01%)
cycles in affected programs: 724201 -> 726658 (0.34%)
helped: 38 / HURT: 7

total sends in shared programs: 1039308 -> 1039302 (<.01%)
sends in affected programs: 162 -> 156 (-3.70%)
helped: 2 / HURT: 0

No shader-db changes on any older Intel platforms.

All Intel platforms had similar restuls. (Ice Lake shown)
Totals:
Instrs: 153117340 -> 152825222 (-0.19%); split: -0.19%, +0.00%
Cycles: 15011904351 -> 15014072944 (+0.01%); split: -0.04%, +0.05%
Send messages: 7711509 -> 7711421 (-0.00%)
Spill count: 100745 -> 99907 (-0.83%); split: -0.85%, +0.02%
Fill count: 203684 -> 202459 (-0.60%); split: -0.62%, +0.02%
Scratch Memory Size: 4403200 -> 4376576 (-0.60%)

Totals from 18603 (2.81% of 662496) affected shaders:
Instrs: 5258303 -> 4966185 (-5.56%); split: -5.56%, +0.00%
Cycles: 447391388 -> 449559981 (+0.48%); split: -1.29%, +1.77%
Send messages: 559231 -> 559143 (-0.02%)
Spill count: 5009 -> 4171 (-16.73%); split: -17.17%, +0.44%
Fill count: 8769 -> 7544 (-13.97%); split: -14.33%, +0.36%
Scratch Memory Size: 194560 -> 167936 (-13.68%)

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19968>
2023-06-14 18:49:53 +00:00
Jesse Natalie
fba82797d7 nir: Optimize unpacking 16 bit values that were originally packed
I was seeing u2u64 still in my final shader after pack/unpack were
lowered, which sounds to me like some other optimizations are missing
for detecting the post-lowering pack/unpack patterns, but let's at
least add some patterns for the simple cases.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23173>
2023-06-13 00:43:36 +00:00
Ian Romanick
7ef45e661f intel/fs: Add constant propagation for ADD3
v2: Require that the constant value be representable as either uint16_t
or int16_t. Suggested by Matt.

v3: Remove redundant patterns. Noticed by Matt.

shader-db:

DG2
total instructions in shared programs: 23103767 -> 23103577 (<.01%)
instructions in affected programs: 51822 -> 51632 (-0.37%)
helped: 98 / HURT: 15

total cycles in shared programs: 842347714 -> 842380017 (<.01%)
cycles in affected programs: 1942595 -> 1974898 (1.66%)
helped: 97 / HURT: 32

Nearly all of the affected shaders (around 9,900) are shaders in
Cyberpunk 2077. It's about an even split between vertex and fragment
shaders. The majority of the remaining affected shaders (3,600) are
from Strange Brigade. This was also a nearly even split between
fragment and vertex.

All but two of the lost shaders are SIMD32 fragment shaders in
Cyberpunk 2077. The other two are SIMD32 fragment shaders in Dota2.

fossil-db:

DG2
Instructions in all programs: 196379107 -> 196248608 (-0.1%)
helped: 13467 / HURT: 1210

Cycles in all programs: 13931355281 -> 13929955971 (-0.0%)
helped: 11801 / HURT: 2922

Lost: 90

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23262>
2023-06-06 06:10:53 +00:00
Ian Romanick
7b34808649 nir/algebraic: Fixup iadd3 related patterns
There should not be any isub at this point due to lowerings that happened
ages before getting to late algebraic.

shader-db:

DG2
total instructions in shared programs: 23103769 -> 23103767 (<.01%)
instructions in affected programs: 65 -> 63 (-3.08%)
helped: 1 / HURT: 0

total cycles in shared programs: 842348074 -> 842347714 (<.01%)
cycles in affected programs: 28572 -> 28212 (-1.26%)
helped: 3 / HURT: 0

One compute shader in Assassin's Creed Odyssey was affected.

fossil-db:

DG2
Instructions in all programs: 196400668 -> 196400676 (+0.0%)
helped: 8 / HURT: 5

Cycles in all programs: 13931740724 -> 13931758003 (+0.0%)
helped: 8 / HURT: 7

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23262>
2023-06-06 06:10:53 +00:00
Jesse Natalie
2d3fbb44f4 nir: Add preserve_mediump as a shader compiler option
The DXIL backend would like to distinguish between casts to 16-bit
that must cast, vs those that may. If a shader only ever produces
16-bit types from mediump casts and ALU ops on those values, then
the resulting shader can be annotated with DXIL's min-precision
qualifier, basically telling the driver to use 16-bit precision if
it's faster for them. If it uses concrete 16-bit casts, or loads/
stores to externally-visible memory, then it must use the "native"
16-bit flag, which is not supported on all hardware.

Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23344>
2023-06-01 23:01:04 +00:00
Jesse Natalie
6c62eaf22d nir_opt_algebraic: Don't shrink 64-bit bitwise ops if pack_split is going to be lowered
Otherwise this can cause optimizations to fight resulting in infinite
optimization loops with opt_algebraic, constant_folding, and copy_prop.

Fixes: 368be872 ("nir/algebraic: shrink 64-bit bitwise operations with 0/-1 constant half")
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23192>
2023-06-01 00:36:10 +00:00
Ian Romanick
71e5530c07 nir/algebraic: Undistribute fsat from fmax
To be helpful, the thing inside the fsat has to be used with and without
the fsat. Otherwise it just moves a saturate destination modifier
around. To not be harmful, the fsat has to only be used by the bcsel.

All Broadwell and newer Intel platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 20174475 -> 20174449 (<.01%)
instructions in affected programs: 3913 -> 3887 (-0.66%)
helped: 13 / HURT: 0

total cycles in shared programs: 866844832 -> 866844719 (<.01%)
cycles in affected programs: 46037 -> 45924 (-0.25%)
helped: 10 / HURT: 1

All Intel platforms had similar results. (Ice Lake shown)
Instructions in all programs: 161491468 -> 161491372 (-0.0%)
helped: 31 / HURT: 8

Cycles in all programs: 10933090736 -> 10933024716 (-0.0%)
helped: 32 / HURT: 18

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22169>
2023-03-29 23:48:19 +00:00
Faith Ekstrand
01275a1a95 nir: Drop a bunch of Authors tags
This is what git blame is for.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22120>
2023-03-26 00:16:25 +00:00
Georg Lehmann
cec04adcee nir: optimize i2f(f2i(fsign))
Foz-DB Navi10:
Totals from 3013 (2.23% of 134906) affected shaders:
VGPRs: 138068 -> 136964 (-0.80%); split: -0.80%, +0.00%
CodeSize: 10476416 -> 10391800 (-0.81%)
MaxWaves: 79118 -> 80088 (+1.23%)
Instrs: 1963227 -> 1945003 (-0.93%)
Latency: 24734883 -> 24649279 (-0.35%); split: -0.39%, +0.05%
InvThroughput: 6366777 -> 6334735 (-0.50%); split: -0.50%, +0.00%
VClause: 36845 -> 36882 (+0.10%); split: -0.26%, +0.36%
SClause: 59249 -> 59273 (+0.04%); split: -0.25%, +0.29%
Copies: 108570 -> 108501 (-0.06%); split: -0.19%, +0.13%
PreSGPRs: 105371 -> 105862 (+0.47%)
PreVGPRs: 117675 -> 116625 (-0.89%); split: -0.89%, +0.00%

Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22003>
2023-03-22 05:34:55 +00:00
Isabella Basso
59fea8af3a nir/algebraic: remove duplicate bool conversion lowerings
While [1] added some boolean conversion lowering patterns, those were
already dealt with on [2].

[1] - b86305bb ("nir/algebraic: collapse conversion opcodes (many patterns)")
[2] - d7e0d47b ("nir/algebraic: nir: Add a bunch of b2[if] optimizations")

Fixes: b86305bb ("nir/algebraic: collapse conversion opcodes (many patterns)")
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Signed-off-by: Isabella Basso <isabellabdoamaral@usp.br>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20965>
2023-03-11 17:21:38 +00:00
Isabella Basso
a553d3cd29 nir/algebraic: make patterns for float conversion lowerings imprecise
As noted on [1], lowering patterns of the form
floatS -> floatB -> floatS ==> floatS
cannot require precision since this may cause flush denorming.

[1] 3f779013 ("nir: Add an algebraic optimization for float->double->float")

Fixes: b86305bb ("nir/algebraic: collapse conversion opcodes (many patterns)")
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Signed-off-by: Isabella Basso <isabellabdoamaral@usp.br>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20965>
2023-03-11 17:21:37 +00:00
Isabella Basso
79c94ef52e nir/algebraic: extend lowering patterns for conversions on smaller bit sizes
Conversions on smaller bit sizes should also be collapsed when composed.

This also adds more patterns on the
intS -> intB -> floatB ==> intS -> floatB
lowering so as to deal with any int size C > B instead of a fixed intB.

Closes: #7776
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Signed-off-by: Isabella Basso <isabellabdoamaral@usp.br>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20965>
2023-03-11 17:21:37 +00:00
Isabella Basso
a27bcd63d0 nir/algebraic: extend mediump patterns
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Suggested-by: Italo Nicola <italonicola@collabora.com>
Signed-off-by: Isabella Basso <isabellabdoamaral@usp.br>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20965>
2023-03-11 17:21:37 +00:00
Isabella Basso
b3685f3ba7 nir/algebraic: insert patterns inside optimizations list
Some patterns were outside the list of optimizations.

Fixes: b86305bb ("nir/algebraic: collapse conversion opcodes (many patterns)")

Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Signed-off-by: Isabella Basso <isabellabdoamaral@usp.br>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20965>
2023-03-11 17:21:37 +00:00
Ian Romanick
831f9d3f61 nir/algebraic: Optimize some ifind_msb to ufind_msb
On Intel platforms, the uclz lowering if ufind_msb is either one
instruction better (Gfx7 and newer) or two instructions better (all
older platforms) than the ifind_msb implementations.

On platforms that use lower_find_msb_to_reverse, there should be no
difference.

All Haswell and newer Intel platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 19938662 -> 19938634 (<.01%)
instructions in affected programs: 850 -> 822 (-3.29%)
helped: 2 / HURT: 0

total cycles in shared programs: 858467067 -> 858465538 (<.01%)
cycles in affected programs: 10080 -> 8551 (-15.17%)
helped: 2 / HURT: 0

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19042>
2023-03-10 15:27:17 +00:00
Ian Romanick
2d6f48f6ef nir/algebraic: Do not generate 8- or 16-bit find_msb
The next commit will add validation to restrict this instruction (and
others) to only 32-bit or 64-bit sources.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19042>
2023-03-10 15:27:17 +00:00
Ian Romanick
28311f9d02 nir: intel/compiler: Move ufind_msb lowering to NIR
Fossil-db results:

All Intel platforms had similar results. (Ice Lake shown)
Cycles in all programs: 9098346105 -> 9098333765 (-0.0%)
Cycles helped: 6

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19042>
2023-03-10 15:27:17 +00:00
Ian Romanick
a4052e70ea nir/algebraic: Only lower ufind_msb with 32-bit sources
The 31-ufind_msb_rev(x) lowering only produces the correct result for
32-bit sources. ufind_msb_rev can also have 64-bit sources, and most
platforms are expected to lower this to 32-bit instructions with extra
logic operations.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19042>
2023-03-10 15:27:17 +00:00
Ian Romanick
0cc7bf63b7 nir: intel/compiler: Move ifind_msb lowering to NIR
Unlike ufind_msb, ifind_msb is only defined in NIR for 32-bit values, so
no @32 annotation is required.

No shader-db or fossil-db changes on any Intel platform.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19042>
2023-03-10 15:27:17 +00:00
Georg Lehmann
aeb68c29b4 nir/opt_algebraic: add patterns for iand/ior of feq/fneu with 0
Foz-DB Navi21:
Totals from 1245 (0.92% of 134913) affected shaders:
VGPRs: 66232 -> 66248 (+0.02%); split: -0.01%, +0.04%
CodeSize: 5874976 -> 5868168 (-0.12%); split: -0.17%, +0.05%
MaxWaves: 25278 -> 25274 (-0.02%); split: +0.01%, -0.02%
Instrs: 1087502 -> 1085267 (-0.21%); split: -0.21%, +0.00%
Latency: 6531489 -> 6531672 (+0.00%); split: -0.04%, +0.05%
InvThroughput: 1531774 -> 1532327 (+0.04%); split: -0.02%, +0.05%
VClause: 22218 -> 22202 (-0.07%); split: -0.08%, +0.00%
SClause: 45906 -> 45873 (-0.07%); split: -0.08%, +0.01%
Copies: 64004 -> 64102 (+0.15%); split: -0.24%, +0.39%
Branches: 21529 -> 21534 (+0.02%); split: -0.00%, +0.03%
PreSGPRs: 51936 -> 51850 (-0.17%)
PreVGPRs: 55393 -> 55398 (+0.01%); split: -0.02%, +0.03%

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21576>
2023-03-01 11:24:43 +00:00
Emma Anholt
6d52e6fd2c nir: Port a floor->truncate algebraic opt pattern from GLSL.
Prevents regression when dropping code from the GLSL optimizer.

Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21475>
2023-02-28 03:36:09 +00:00
Emma Anholt
ef02581590 nir: Add optimization for fdot(x, 0) -> 0.
We had all these nice fdot opts to drop individual channels that were 0,
but nothing handling it being entirely 0!  Avoids r300g regression when
dropping them from GLSL.

Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21475>
2023-02-28 03:36:08 +00:00
Ian Romanick
ea413e826b nir: Eliminate nir_op_f2b
Builds on the work of !15121.  This gets to delete even more code
because many drivers shared a lot of code for i2b and f2b.

No shader-db or fossil-db changes on any Intel platform.

v2: Rebase on 1a35acd8d9.

v3: Update a comment in nir_opcodes_c.py. Suggested by Konstantin.

v4: Another rebase. Remove f2b stuff from Midgard.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20509>
2023-02-03 22:39:57 +00:00