fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-21 02:28:07 +02:00

Author	SHA1	Message	Date
Gert Wollny	3b3c3ccf56	nir+r600: add option to avoid contracting fabs into ffma On r600 ternary operations can't use the fabs source modifier, so converting "fadd(fabs(fmul(a, b)), c)" to "ffma(fabs(a), fabs(b), c)" adds one more instruction in the backend, hence avoid this. Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37440>	2025-09-17 21:03:58 +00:00
Christian Gmeiner	a7d2570296	nir/opt_algebraic: optimize f2i32(fround_even(x)) to f2i32_rtne(x) Add late optimization to fuse f2i32 and fround_even operations into a single f2i32_rtne instruction when the intermediate fround_even result is only used once. This eliminates redundant rounding since f2i32_rtne performs round-to-nearest-even conversion directly. Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com> Tested-by: Simon Perretta <simon.perretta@imgtec.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37426>	2025-09-17 20:31:59 +00:00
Karmjit Mahil	9c6183604f	nir, ir3: Add `lower_fmulz_with_abs_min` backend option This commit adds the `lower_fmulz_with_abs_min` option, which lowers `fmulz` -> `min(abs(a), abs(b)) == 0.0 ? 0.0 : a * b` and `ffmaz` -> `min(abs(a), abs(b)) == 0.0 ? c : ffma(a, b, c)`. This is useful for ISAs which have `abs` for free on `min` such as ir3. Adreno A750 Benchmark of 10 runs of 5 DX9 single frame trimmed captures looped 2048 times using u_trace measuring `start_render_pass` to `end_render_pass` results: sysmem: -1.91156%, -2.21791%, -2.02533%, -2.21666%, -2.33272%, -2.67349%, -1.75278%, -2.05923%, -2.26892%, -2.10506% Avg: ~ -2.16% ST.S: ~ 0.25% gmem: -3.61496%, -3.66682%, -3.80901%, -3.51198%, -3.72950%, -3.71413%, -3.64467%, -3.67092%, -3.90640%, -3.83888% Avg: ~ -3.71% ST.S: ~ 0.12% Signed-off-by: Karmjit Mahil <karmjit.mahil@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31479>	2025-09-17 15:02:50 +00:00
Karmjit Mahil	8d19ffef0a	nir: Add more matches for `fmulz` In some cases after other passes, `(a == 0.0 ? 0 : b)` can be turned into `(a != 0.0 ? b : 0)`, so let's match those cases too. Also matching `min(abs(a), abs(b)) == 0.0 ? 0.0 : a * b`. Signed-off-by: Karmjit Mahil <karmjit.mahil@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31479>	2025-09-17 15:02:50 +00:00
Daniel Schürmann	c78f1d516c	nir/algebraic: add pattern for (a << #b) * #c => a * (#c << #b) Totals from 2545 (3.19% of 79839) affected shaders: (Navi48) Instrs: 6371003 -> 6364130 (-0.11%); split: -0.12%, +0.01% CodeSize: 33827548 -> 33812244 (-0.05%); split: -0.06%, +0.01% Latency: 47451755 -> 47430108 (-0.05%); split: -0.05%, +0.00% InvThroughput: 10442450 -> 10437159 (-0.05%); split: -0.05%, +0.00% SClause: 159829 -> 159874 (+0.03%); split: -0.01%, +0.04% Copies: 500725 -> 500721 (-0.00%); split: -0.01%, +0.01% PreSGPRs: 110482 -> 110478 (-0.00%); split: -0.00%, +0.00% PreVGPRs: 147289 -> 147287 (-0.00%); split: -0.00%, +0.00% VALU: 3456135 -> 3454241 (-0.05%); split: -0.06%, +0.01% SALU: 925982 -> 923616 (-0.26%) VOPD: 1243 -> 1212 (-2.49%) Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37173>	2025-09-06 10:18:42 +00:00
Christoph Pillmayer	f81f3c85e2	nir/opt_algebraic: Convert a + b + a to b + 2a This allows fusing into one FMA later. Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37113>	2025-09-05 11:39:51 +00:00
Georg Lehmann	3b06824e4c	nir/opt_algebraic: optimize some post peephole select patterns Foz-DB GFX1201: Totals from 208 (0.26% of 80287) affected shaders: Instrs: 427684 -> 426834 (-0.20%); split: -0.22%, +0.02% CodeSize: 2232616 -> 2228816 (-0.17%); split: -0.20%, +0.03% Latency: 3993934 -> 3992726 (-0.03%); split: -0.04%, +0.01% InvThroughput: 569055 -> 568622 (-0.08%); split: -0.09%, +0.01% SClause: 12932 -> 12927 (-0.04%) Copies: 22567 -> 22604 (+0.16%); split: -0.47%, +0.63% Branches: 7671 -> 7658 (-0.17%) VALU: 222047 -> 221625 (-0.19%) SALU: 83954 -> 83815 (-0.17%); split: -0.29%, +0.13% Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36938>	2025-08-27 09:45:19 +00:00
Rhys Perry	46da666205	nir/algebraic: allow non-const for iand(iadd()) -> iadd(iand()) fossil-db (gfx1201): Totals from 596 (0.75% of 79839) affected shaders: Instrs: 691926 -> 691819 (-0.02%); split: -0.11%, +0.09% CodeSize: 3675216 -> 3675180 (-0.00%); split: -0.08%, +0.08% VGPRs: 37464 -> 37452 (-0.03%) Latency: 8566849 -> 8563162 (-0.04%); split: -0.09%, +0.05% InvThroughput: 1068038 -> 1063279 (-0.45%); split: -0.46%, +0.01% VClause: 17859 -> 17897 (+0.21%); split: -0.01%, +0.22% SClause: 16704 -> 16735 (+0.19%); split: -0.07%, +0.26% Copies: 45422 -> 45395 (-0.06%); split: -0.15%, +0.09% PreSGPRs: 24345 -> 24351 (+0.02%) PreVGPRs: 29121 -> 29128 (+0.02%) VALU: 349959 -> 348117 (-0.53%); split: -0.54%, +0.01% SALU: 105926 -> 107576 (+1.56%); split: -0.02%, +1.58% VOPD: 252 -> 234 (-7.14%) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36760>	2025-08-22 15:45:55 +00:00
Rhys Perry	4f83059ac5	nir/algebraic: improve is_unsigned_multiple_of_4 and use it more fossil-db (gfx1201): Totals from 160 (0.20% of 79839) affected shaders: MaxWaves: 4008 -> 3952 (-1.40%) Instrs: 390073 -> 379834 (-2.62%); split: -2.63%, +0.00% CodeSize: 2126020 -> 2053740 (-3.40%); split: -3.40%, +0.00% VGPRs: 9492 -> 9612 (+1.26%) Latency: 6746019 -> 6723893 (-0.33%); split: -0.33%, +0.00% InvThroughput: 849571 -> 848942 (-0.07%); split: -0.42%, +0.35% VClause: 11977 -> 11983 (+0.05%); split: -0.20%, +0.25% SClause: 11828 -> 11824 (-0.03%); split: -0.14%, +0.11% Copies: 30003 -> 30938 (+3.12%); split: -0.09%, +3.20% PreSGPRs: 8914 -> 8938 (+0.27%) PreVGPRs: 7352 -> 7514 (+2.20%); split: -0.04%, +2.24% VALU: 171829 -> 168829 (-1.75%); split: -1.76%, +0.01% SALU: 66503 -> 66543 (+0.06%); split: -0.01%, +0.07% VMEM: 29365 -> 25327 (-13.75%) VOPD: 864 -> 1013 (+17.25%) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36760>	2025-08-22 15:45:55 +00:00
Georg Lehmann	1d885fab9c	nir/opt_algebraic: optimize pack_half_rtz of b2f Foz-DB Navi21: Totals from 13 (0.02% of 80255) affected shaders: Instrs: 2313 -> 2306 (-0.30%); split: -0.35%, +0.04% CodeSize: 13452 -> 13480 (+0.21%) Latency: 12066 -> 12013 (-0.44%); split: -0.45%, +0.01% InvThroughput: 2172 -> 2163 (-0.41%) Copies: 112 -> 114 (+1.79%) VALU: 1480 -> 1472 (-0.54%) SALU: 154 -> 155 (+0.65%) Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36535>	2025-08-04 19:42:22 +00:00
Georg Lehmann	bc3b09c5dd	nir/opt_algebraic: optimize pack_half_rtz of bcsel with constant Foz-DB Navi21: Totals from 448 (0.56% of 80255) affected shaders: Instrs: 345474 -> 344791 (-0.20%); split: -0.20%, +0.00% CodeSize: 1917784 -> 1913324 (-0.23%); split: -0.25%, +0.02% VGPRs: 22344 -> 22416 (+0.32%) Latency: 2320847 -> 2318161 (-0.12%); split: -0.13%, +0.01% InvThroughput: 543008 -> 541722 (-0.24%) SClause: 11450 -> 11459 (+0.08%) Copies: 19991 -> 19949 (-0.21%); split: -0.23%, +0.02% PreSGPRs: 19129 -> 19114 (-0.08%) PreVGPRs: 19695 -> 19696 (+0.01%); split: -0.01%, +0.01% VALU: 257627 -> 256948 (-0.26%) SALU: 30432 -> 30422 (-0.03%) Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36535>	2025-08-04 19:42:22 +00:00
Georg Lehmann	8512479097	nir/opt_algebraic: create 16bit fmin/fmax if only used by pack_half_2x16_rtz_split Foz-DB Navi21: Totals from 1842 (2.30% of 80066) affected shaders: Instrs: 869152 -> 866751 (-0.28%) CodeSize: 4687316 -> 4682496 (-0.10%); split: -0.14%, +0.03% VGPRs: 75216 -> 75312 (+0.13%) Latency: 7297749 -> 7297929 (+0.00%); split: -0.01%, +0.02% InvThroughput: 1864933 -> 1860706 (-0.23%); split: -0.23%, +0.00% Copies: 52679 -> 52463 (-0.41%) VALU: 665076 -> 662890 (-0.33%) SALU: 56226 -> 56010 (-0.38%) Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36535>	2025-08-04 19:42:22 +00:00
Georg Lehmann	22afe83473	nir/opt_algebraic: remove fneg around fmin/fmax Foz-DB Navi21: Totals from 282 (0.35% of 80255) affected shaders: Instrs: 310515 -> 309755 (-0.24%) CodeSize: 1721236 -> 1714540 (-0.39%) Latency: 1366446 -> 1365141 (-0.10%); split: -0.10%, +0.00% InvThroughput: 352528 -> 351097 (-0.41%); split: -0.41%, +0.00% Copies: 24623 -> 24630 (+0.03%) VALU: 231716 -> 230951 (-0.33%) SALU: 28774 -> 28779 (+0.02%) Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36535>	2025-08-04 19:42:22 +00:00
Georg Lehmann	cfd5fbfde1	nir/opt_algebraic: make fmin/fmax(a, #b) 16bit if only used by f2f16 Foz-DB Navi31: Totals from 11 out of 14 FSR4 shaders: Instrs: 58298 -> 58374 (+0.13%); split: -0.08%, +0.21% CodeSize: 397836 -> 398108 (+0.07%); split: -0.08%, +0.15% Latency: 209634 -> 211438 (+0.86%); split: -0.14%, +1.00% InvThroughput: 229152 -> 229314 (+0.07%); split: -0.03%, +0.10% VClause: 826 -> 847 (+2.54%); split: -0.36%, +2.91% Copies: 2954 -> 3040 (+2.91%); split: -1.56%, +4.47% VALU: 49637 -> 49711 (+0.15%); split: -0.06%, +0.21% VOPD: 1916 -> 1400 (-26.93%) These stats look bad, but it's actually just unlucky RA. Replacing 1 VOPD (two v_dual_max_f32) with 1 VOP3P (v_pk_max_f16) should still be a win from a register bandwidth perspective. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36468>	2025-08-01 20:29:30 +00:00
Georg Lehmann	261239a492	nir/opt_algebraic: use range analysis to detect no-op fmin/fmax Foz-DB Navi31: Totals from 418 (0.52% of 80273) affected shaders: Instrs: 564550 -> 564387 (-0.03%); split: -0.04%, +0.01% CodeSize: 2983860 -> 2982684 (-0.04%); split: -0.05%, +0.01% Latency: 4387264 -> 4386397 (-0.02%); split: -0.02%, +0.00% InvThroughput: 717464 -> 716874 (-0.08%); split: -0.08%, +0.00% Copies: 40126 -> 40125 (-0.00%) VALU: 352128 -> 352003 (-0.04%); split: -0.04%, +0.01% SALU: 50290 -> 50283 (-0.01%) Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36468>	2025-08-01 20:29:28 +00:00
Georg Lehmann	a0665e79e9	nir/opt_algebraic: push fsat into bcsel with constant bcsel doesn't have a free clamp modifier on AMD hardware, but what's inside might have free clamp. Foz-DB Navi31: Totals from 873 (1.09% of 80273) affected shaders: MaxWaves: 22008 -> 21968 (-0.18%) Instrs: 4624956 -> 4623950 (-0.02%); split: -0.04%, +0.02% CodeSize: 24152780 -> 24142884 (-0.04%); split: -0.05%, +0.01% VGPRs: 57900 -> 57960 (+0.10%) Latency: 28762622 -> 28749889 (-0.04%); split: -0.06%, +0.02% InvThroughput: 5320810 -> 5320145 (-0.01%); split: -0.02%, +0.00% VClause: 115879 -> 115929 (+0.04%); split: -0.10%, +0.14% SClause: 93058 -> 93059 (+0.00%); split: -0.01%, +0.02% Copies: 335674 -> 335845 (+0.05%); split: -0.05%, +0.10% PreSGPRs: 53819 -> 53843 (+0.04%); split: -0.01%, +0.05% PreVGPRs: 50908 -> 50939 (+0.06%); split: -0.02%, +0.08% VALU: 2816395 -> 2815514 (-0.03%); split: -0.04%, +0.01% SALU: 509988 -> 509987 (-0.00%); split: -0.02%, +0.02% Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36468>	2025-08-01 20:29:27 +00:00
Georg Lehmann	e9e5146848	nir/opt_algebraic: optimize fsat(fmax(a, b)) where b is not positive Foz-DB Navi31: Totals from 946 (1.18% of 80273) affected shaders: Instrs: 4986082 -> 4983988 (-0.04%); split: -0.04%, +0.00% CodeSize: 25998700 -> 25989796 (-0.03%); split: -0.04%, +0.00% Latency: 45514742 -> 45510330 (-0.01%); split: -0.01%, +0.00% InvThroughput: 8163529 -> 8162325 (-0.01%); split: -0.02%, +0.00% VClause: 112105 -> 112104 (-0.00%); split: -0.00%, +0.00% SClause: 109694 -> 109688 (-0.01%) Copies: 372356 -> 372284 (-0.02%); split: -0.03%, +0.01% Branches: 132636 -> 132633 (-0.00%) PreVGPRs: 58997 -> 58979 (-0.03%); split: -0.03%, +0.00% VALU: 3025662 -> 3024191 (-0.05%); split: -0.05%, +0.00% SALU: 551712 -> 551714 (+0.00%); split: -0.00%, +0.00% Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36468>	2025-08-01 20:29:27 +00:00
Georg Lehmann	e43ef6533b	nir/opt_algebraic: remove 8bit roundtrip when vectorizing i2i16(unpack_4x8(a).zw) Explicit 16bit instructions are nicer to vectorize. Helps FSR4 on GFX11 marginally. Foz-DB Navi31: Totals from 10 out of 14 FSR4 shaders: Instrs: 59781 -> 58518 (-2.11%) CodeSize: 413428 -> 404156 (-2.24%) Latency: 193770 -> 190768 (-1.55%) InvThroughput: 226274 -> 221628 (-2.05%) VClause: 796 -> 793 (-0.38%); split: -1.01%, +0.63% Copies: 3342 -> 3008 (-9.99%); split: -11.01%, +1.02% PreSGPRs: 312 -> 305 (-2.24%) VALU: 51448 -> 50213 (-2.40%) SALU: 1074 -> 1048 (-2.42%) VOPD: 1783 -> 1718 (-3.65%); split: +0.95%, -4.60% Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36117>	2025-07-30 07:25:51 +00:00
Georg Lehmann	037c2532ab	nir/opt_algebraic: create non 32bit bitfield_select Foz-DB Navi21: Totals from 68 (0.08% of 80255) affected shaders: Instrs: 197878 -> 197709 (-0.09%); split: -0.09%, +0.00% CodeSize: 1060700 -> 1060472 (-0.02%); split: -0.02%, +0.00% Latency: 659865 -> 659673 (-0.03%); split: -0.03%, +0.00% InvThroughput: 117010 -> 116985 (-0.02%); split: -0.03%, +0.00% VClause: 3781 -> 3779 (-0.05%) Copies: 15317 -> 15265 (-0.34%); split: -0.35%, +0.01% PreVGPRs: 3251 -> 3250 (-0.03%) VALU: 96800 -> 96799 (-0.00%); split: -0.00%, +0.00% SALU: 57006 -> 56836 (-0.30%); split: -0.30%, +0.00% Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36141>	2025-07-21 20:42:32 +00:00
Alyssa Rosenzweig	421d0e0953	nir: mark exact fmul in ldexp lowering this chain of fmul is deliberately chosen for floating point precision reasons, it needs to be exact, or else we might try to reassociate it and break subnormal handling. avoids regressing dEQP-VK.glsl.builtin.precision.ldexp_subnormals.* Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Mel Henning <mhenning@darkrefraction.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36257>	2025-07-21 11:42:18 +00:00
Alyssa Rosenzweig	042adf3cc5	nir/opt_algebraic: optimize signed pow in Control used in a post-processing shader which goes 896 instrs -> 749 instrs. In my Control fossil: Totals from 2 (0.63% of 319) affected shaders: Instrs: 2078 -> 1841 (-11.41%) CodeSize: 14540 -> 12800 (-11.97%) ALU: 1779 -> 1626 (-8.60%) FSCIB: 1779 -> 1626 (-8.60%) Uniforms: 370 -> 372 (+0.54%) In radv_fossils, there are affected shaders in Dredge. Totals from 4 (0.01% of 54019) affected shaders: Instrs: 2306 -> 2294 (-0.52%) CodeSize: 16594 -> 16534 (-0.36%) ALU: 2010 -> 2004 (-0.30%) FSCIB: 2010 -> 2004 (-0.30%) Uniforms: 1138 -> 1146 (+0.70%) Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35989>	2025-07-08 17:09:16 +00:00
Alyssa Rosenzweig	2765017553	nir: fuse ffma even with float controls The fmul+fadd -> fma rules in nir_opt_algebraic are marked imprecise, because they are a contraction. However, they respect signed zero/Inf/NaN rules. As such, it is legal to do this fusion with shader float controls as long as the exact bit is not set (mapping to SPIR-V NoContract). Unfortunately, NIR's imprecise rules do not distinguish between contraction issues versus float special case issues, forcing nir_search to skip all imprecise rules when any shader float control modes are used. This notably affects DXVK, which sets shader float controls to get D3D11 float behaviour and hence loses FMA fusing. Therefore, we plumb in the exact bit to express NoContract independent of the float controls, and weaken the requirement for fma fusion to allowable contraction. For fma splitting, it's a similar issue, as inexact GLSL fma in SPIR-V is just a multiply add that we're allowed to contract rather than the real deal. Drivers that use their own FMA fusing passes (notably, Intel and AMD) are unaffected, but DXVK-capable drivers using fuse_ffma should like this. 
Results on hk shown: Totals from 2194 (4.06% of 54019) affected shaders: MaxWaves: 2174272 -> 2175936 (+0.08%); split: +0.08%, -0.01% Instrs: 1173283 -> 1131494 (-3.56%); split: -3.57%, +0.01% CodeSize: 8568168 -> 8381724 (-2.18%); split: -2.18%, +0.01% Spills: 1094 -> 747 (-31.72%) Fills: 988 -> 681 (-31.07%) Scratch: 4444 -> 3820 (-14.04%) ALU: 953032 -> 913149 (-4.18%); split: -4.19%, +0.01% FSCIB: 953032 -> 913149 (-4.18%); split: -4.19%, +0.01% IC: 215398 -> 215274 (-0.06%) GPRs: 139865 -> 139032 (-0.60%); split: -1.56%, +0.96% Uniforms: 414886 -> 414466 (-0.10%); split: -0.14%, +0.04% Preamble instrs: 646398 -> 644017 (-0.37%); split: -0.43%, +0.07% Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35989>	2025-07-08 17:09:16 +00:00
Georg Lehmann	045ddb992a	nir/opt_algebraic: optimize 16bit vec2 comparison followed by b2i16 using usub_sat Helps vectorized emulated fp16 -> fp8 conversions No Foz-DB changes. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35876>	2025-07-03 20:08:39 +00:00
Alyssa Rosenzweig	4f7cae5e61	nir/opt_algebraic: add trichotomy identity In https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35802 we will significantly rework geometry shaders & transform feedback. In the new approach, transform feedback is executed as part of the hardware vertex shader, meaning the vertex shader needs to write out all the "copies" of the same value into different parts of the XFB buffer. In the general case of a GS writing triangle strips, we get 0-3 copies. This is good and lets us parallelize XFB better with GS. In the case of a VS alone with XFB, we insert a passthrough GS. In that special case, we can only get at most 1 copy, so if we can prove the length of the output strip is 3 we can delete 2/3 of the shader. Anyway, the only thing preventing NIR from doing that optimization is failing to see through some conditionals, fixed by optimizing with the law of trichotomy. We could add other variants of this pattern (signed vs unsigned, iand vs ior/ixor) if we expect anything else to hit this other than my boutique use case. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Marek Olšák <maraeo@gmail.com> Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35802>	2025-06-30 16:24:04 +00:00
Emma Anholt	bc8994cb48	nir: Add a pass to reassociate multiplication of matmatvec. The typical case of mat4mat4vec4 is 80 scalar multiplications, but mat4(mat4vec4) is only 32. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35622>	2025-06-23 17:49:51 +00:00
Georg Lehmann	f047a67fba	nir,aco: optimize FP16_OFVL pattern created by vkd3d-proton Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35434>	2025-06-23 07:59:27 +00:00
Georg Lehmann	ad80b554f4	spirv: use feq for OpIsInf This effectively reverts `fcca6a83cd` because feq was clarified to be ordered when used with exact and without fast math flags. It's common for HW to only have free abs for floating point instructions. Foz-DB Navi21: Totals from 63 (0.08% of 80065) affected shaders: Instrs: 337027 -> 336667 (-0.11%); split: -0.12%, +0.02% CodeSize: 1846752 -> 1845000 (-0.09%); split: -0.13%, +0.03% Latency: 3401087 -> 3400633 (-0.01%); split: -0.04%, +0.03% InvThroughput: 847299 -> 845939 (-0.16%); split: -0.19%, +0.03% VClause: 7693 -> 7694 (+0.01%) Copies: 45175 -> 45240 (+0.14%); split: -0.12%, +0.27% PreSGPRs: 3555 -> 3553 (-0.06%) PreVGPRs: 4565 -> 4564 (-0.02%) VALU: 225473 -> 225245 (-0.10%); split: -0.13%, +0.03% SALU: 44735 -> 44625 (-0.25%) Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35437>	2025-06-11 18:34:21 +00:00
Lionel Landwerlin	978933c015	nir/opt_algebraic: extend lowering for (i\|u)bitfield_extract Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35334>	2025-06-04 16:28:39 +00:00
Samuel Pitoiset	226b0e28db	nir: generalize bitfield insert/extract sizes Original patch from Alyssa Rosenzweig Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35209>	2025-06-04 09:37:53 +00:00
Alyssa Rosenzweig	759dc70bde	nir: generalize bitfield_reverse bit size No reason we can't reverse other bit sizes, we just need to generalize the constant folding & bit size lowering. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35198>	2025-05-28 16:29:30 +00:00
Ian Romanick	37ee91679a	nir/algebraic: Generalize an existing bfi(a, 0, ...) pattern No shader-db changes on any Intel platform. fossil-db: All Intel platforms had similar results. (Lunar Lake shown) Totals: Instrs: 210561118 -> 210560921 (-0.00%) Send messages: 10979615 -> 10979613 (-0.00%) Cycle count: 31576352808 -> 31576347218 (-0.00%); split: -0.00%, +0.00% Max live registers: 66068161 -> 66068157 (-0.00%) Non SSA regs after NIR: 60230775 -> 60230949 (+0.00%) Totals from 180 (0.03% of 707082) affected shaders: Instrs: 68035 -> 67838 (-0.29%) Send messages: 3190 -> 3188 (-0.06%) Cycle count: 3979496 -> 3973906 (-0.14%); split: -0.14%, +0.00% Max live registers: 11812 -> 11808 (-0.03%) Non SSA regs after NIR: 18878 -> 19052 (+0.92%) Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34905>	2025-05-16 14:49:25 -07:00
Ian Romanick	464955bbdd	nir/algebraic: Optimize some open-coded extract_i8 These were initially observed in Hogwarts Legacy while working on something else entirely. Two compute shaders in that app are helped for spills and fills. On Skylake, one of the shaders benefits from this change, and the other is hurt pretty significantly. About 40 vertex shaders in Shadow of the Tomb Raider were helped for instructions. v2: Use ~0xff instead of 0xffffff00 to ensure the patterns will work properly with all bit sizes. Noticed by Georg. v3: No, really, fix the various errors to ensure the patterns will work properly with all bit sizes. Noticed by Georg. No shader-db changes on any Intel platform. fossil-db: Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake) Totals: Instrs: 210566294 -> 210561118 (-0.00%) Cycle count: 31582309052 -> 31576352808 (-0.02%); split: -0.02%, +0.00% Spill count: 519300 -> 519280 (-0.00%) Fill count: 625181 -> 625161 (-0.00%) Scratch Memory Size: 36289536 -> 36281344 (-0.02%) Max live registers: 66068413 -> 66068161 (-0.00%) Non SSA regs after NIR: 60230773 -> 60230775 (+0.00%) Totals from 1662 (0.24% of 707082) affected shaders: Instrs: 635064 -> 629888 (-0.82%) Cycle count: 36549632 -> 30593388 (-16.30%); split: -16.43%, +0.14% Spill count: 246 -> 226 (-8.13%) Fill count: 280 -> 260 (-7.14%) Scratch Memory Size: 16384 -> 8192 (-50.00%) Max live registers: 178491 -> 178239 (-0.14%) Non SSA regs after NIR: 169552 -> 169554 (+0.00%) Tiger Lake Totals: Instrs: 238544730 -> 238539407 (-0.00%) Cycle count: 23679446097 -> 23673238578 (-0.03%); split: -0.03%, +0.00% Max live registers: 42494925 -> 42494799 (-0.00%) Non SSA regs after NIR: 63639071 -> 63639074 (+0.00%) Totals from 1662 (0.21% of 802704) affected shaders: Instrs: 626604 -> 621281 (-0.85%) Cycle count: 26444363 -> 20236844 (-23.47%); split: -23.50%, +0.02% Max live registers: 95405 -> 95279 (-0.13%) Non SSA regs after NIR: 181150 -> 181153 (+0.00%) Ice Lake Totals: Instrs: 
238855310 -> 238826534 (-0.01%) Cycle count: 24952257277 -> 24944589398 (-0.03%); split: -0.03%, +0.00% Spill count: 575510 -> 575117 (-0.07%) Fill count: 713007 -> 708632 (-0.61%) Max live registers: 42499556 -> 42499432 (-0.00%) Non SSA regs after NIR: 64388747 -> 64388750 (+0.00%) Totals from 1662 (0.21% of 805149) affected shaders: Instrs: 926887 -> 898111 (-3.10%) Cycle count: 67025583 -> 59357704 (-11.44%); split: -11.45%, +0.01% Spill count: 5168 -> 4775 (-7.60%) Fill count: 32883 -> 28508 (-13.30%) Max live registers: 95614 -> 95490 (-0.13%) Non SSA regs after NIR: 181150 -> 181153 (+0.00%) Skylake Totals: Instrs: 161904416 -> 161895239 (-0.01%); split: -0.01%, +0.00% Cycle count: 20098067714 -> 20090767583 (-0.04%); split: -0.04%, +0.00% Spill count: 525546 -> 525789 (+0.05%); split: -0.04%, +0.09% Fill count: 603369 -> 602276 (-0.18%); split: -0.28%, +0.10% Max live registers: 33895714 -> 33895590 (-0.00%) Non SSA regs after NIR: 57348729 -> 57348730 (+0.00%) Totals from 1655 (0.25% of 653734) affected shaders: Instrs: 769979 -> 760802 (-1.19%); split: -1.83%, +0.64% Cycle count: 51365416 -> 44065285 (-14.21%); split: -14.22%, +0.01% Spill count: 4186 -> 4429 (+5.81%); split: -4.90%, +10.70% Fill count: 16356 -> 15263 (-6.68%); split: -10.50%, +3.82% Max live registers: 95115 -> 94991 (-0.13%) Non SSA regs after NIR: 180797 -> 180798 (+0.00%) Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34905>	2025-05-16 14:49:05 -07:00
Georg Lehmann	0a30611c10	nir/opt_algebraic: some bitfield_select optimizations Foz-DB Navi21: Totals from 47 (0.06% of 79789) affected shaders: Instrs: 69536 -> 69363 (-0.25%) CodeSize: 370624 -> 369388 (-0.33%) Latency: 383505 -> 383298 (-0.05%) InvThroughput: 72924 -> 72727 (-0.27%) PreSGPRs: 2618 -> 2610 (-0.31%) VALU: 43261 -> 43091 (-0.39%) SALU: 13065 -> 13063 (-0.02%) Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34739>	2025-05-13 10:59:09 +00:00
Karol Herbst	f0fa2209a8	nir: add nir_opt_algebraic_integer_promotion This handles basic operations where clang promotes integers to 32 bits according to the C99 spec in OpenCL C source code. This is its own opt_algebraic pass, because we don't want to fight with nir_lower_bit_size. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34641>	2025-05-12 09:29:20 +00:00
Georg Lehmann	02e743c99e	nir: add an option to lower bf2f and f2bf Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34768>	2025-05-09 11:20:25 +00:00
Rhys Perry	f538cae743	nir/algebraic: optimize ior(unpack_4x8, unpack_4x8<<8) to unpack_32_2x16 No fossil-db changes. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34162>	2025-05-08 13:30:50 +00:00
Christian Gmeiner	f17d350001	lima: Move fdot lowering from NIR to lima This change relocates the fdot lowering from generic NIR to lima, since lima is the only consumer of this particular lowering. This avoids potential conflicts with the similar fdot lowering already present in nir_lower_alu_width. Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com> Reviewed-by: Erico Nunes <nunes.erico@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34757>	2025-04-30 17:33:38 +00:00
Georg Lehmann	3e26fc4498	nir/opt_algebraic: disable fsat(a + 1.0) opt if a can be NaN Foz-DB Navi21: Totals from 9 (0.01% of 79789) affected shaders: Instrs: 6782 -> 6796 (+0.21%); split: -0.03%, +0.24% CodeSize: 40020 -> 40108 (+0.22%); split: -0.04%, +0.26% Latency: 23764 -> 23758 (-0.03%) InvThroughput: 6424 -> 6431 (+0.11%); split: -0.08%, +0.19% SClause: 273 -> 275 (+0.73%) Copies: 338 -> 339 (+0.30%) VALU: 5138 -> 5147 (+0.18%); split: -0.06%, +0.23% SALU: 349 -> 350 (+0.29%) SMEM: 498 -> 500 (+0.40%) Fixes: `a4a3487aae` ("nir/opt_algebraic: optimize patterns from Skia") Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34125>	2025-04-22 14:23:05 +00:00
Georg Lehmann	8ad695195e	nir/opt_algebraic: turn exact fmin(1.0, a) into fsat if a is not NaN and not negative Foz-DB Navi21: Totals from 2456 (3.08% of 79789) affected shaders: Instrs: 3415398 -> 3413352 (-0.06%); split: -0.06%, +0.00% CodeSize: 18781096 -> 18776092 (-0.03%); split: -0.03%, +0.00% VGPRs: 158512 -> 158528 (+0.01%) Latency: 39528900 -> 39526687 (-0.01%); split: -0.01%, +0.00% InvThroughput: 10612237 -> 10609296 (-0.03%); split: -0.03%, +0.00% VClause: 71028 -> 71034 (+0.01%) SClause: 93971 -> 93975 (+0.00%); split: -0.00%, +0.01% Copies: 257525 -> 257521 (-0.00%); split: -0.01%, +0.01% VALU: 2483374 -> 2481325 (-0.08%); split: -0.09%, +0.00% SALU: 348207 -> 348211 (+0.00%) Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34125>	2025-04-22 14:23:04 +00:00
Georg Lehmann	18a0de1834	nir/opt_algebraic: optimize fmax(ffma(a, b, c), 0.0) to fsat Foz-DB Navi21: Totals from 2621 (3.28% of 79789) affected shaders: MaxWaves: 55744 -> 55736 (-0.01%) Instrs: 2840180 -> 2832647 (-0.27%); split: -0.27%, +0.00% CodeSize: 15497364 -> 15464692 (-0.21%); split: -0.21%, +0.00% VGPRs: 138448 -> 138456 (+0.01%) Latency: 22319512 -> 22307018 (-0.06%); split: -0.06%, +0.01% InvThroughput: 5745108 -> 5729197 (-0.28%); split: -0.28%, +0.00% Copies: 110279 -> 110268 (-0.01%); split: -0.04%, +0.03% VALU: 2210578 -> 2203211 (-0.33%); split: -0.33%, +0.00% SALU: 169014 -> 168841 (-0.10%) Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34125>	2025-04-22 14:23:04 +00:00
Georg Lehmann	f71fc26393	nir/opt_algebraic: generalize fmax(fadd(a, b), 0.0) to fsat by not requiring fneg Not a large effect, but it's positive and makes the pattern simpler. Foz-DB Navi21: Totals from 1 (0.00% of 79789) affected shaders: Instrs: 145 -> 138 (-4.83%) CodeSize: 784 -> 756 (-3.57%) Latency: 1495 -> 1487 (-0.54%) InvThroughput: 210 -> 196 (-6.67%) VALU: 103 -> 96 (-6.80%) Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34125>	2025-04-22 14:23:04 +00:00
Ian Romanick	1d2ebeca17	nir/algebraic: Allow fmin(a,a) optimization when flush denorm to zero is not set I was surprised this had any effect on Intel GPUs because we have been unconditionally performing this optimization in the backend since June 2014. Once that error is fixed (later in this MR), this change prevents a couple dozen regressions in shader-db and around 90 regressions in fossil-db. Many of the regressions in fossil-db were loss of SIMD32, and that can be a big deal. v2: Add 64-bit too. Suggested by Alyssa. shader-db: All Intel platforms had similar results. (Lunar Lake shown) total instructions in shared programs: 16970141 -> 16970139 (<.01%) instructions in affected programs: 40 -> 38 (-5.00%) helped: 2 / HURT: 0 total cycles in shared programs: 914617580 -> 914617548 (<.01%) cycles in affected programs: 3428 -> 3396 (-0.93%) helped: 2 / HURT: 0 fossil-db: All Intel platforms had similar results. (Lunar Lake shown) Totals: Cycle count: 30546028462 -> 30546025224 (-0.00%); split: -0.00%, +0.00% Non SSA regs after NIR: 237017827 -> 237017731 (-0.00%) Totals from 83 (0.01% of 706657) affected shaders: Cycle count: 3042978 -> 3039740 (-0.11%); split: -0.13%, +0.02% Non SSA regs after NIR: 78997 -> 78901 (-0.12%) Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34192>	2025-04-15 23:59:31 +00:00
Georg Lehmann	d046ecf95a	nir/opt_algebraic: optimize open coded ffract Foz-DB Navi21: Totals from 274 (0.34% of 79789) affected shaders: Instrs: 522630 -> 522181 (-0.09%); split: -0.09%, +0.01% CodeSize: 2880668 -> 2878940 (-0.06%); split: -0.07%, +0.01% VGPRs: 14488 -> 14464 (-0.17%) Latency: 4092358 -> 4091243 (-0.03%); split: -0.04%, +0.01% InvThroughput: 1014148 -> 1013471 (-0.07%); split: -0.07%, +0.00% VClause: 11646 -> 11639 (-0.06%) SClause: 18614 -> 18611 (-0.02%) Copies: 56248 -> 56309 (+0.11%); split: -0.05%, +0.16% PreVGPRs: 13649 -> 13647 (-0.01%) VALU: 359733 -> 359285 (-0.12%); split: -0.13%, +0.01% SALU: 59719 -> 59720 (+0.00%) Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33369>	2025-04-11 12:36:02 +00:00
Marek Olšák	1d5c42528b	nir/opt_algebraic: lower 16-bit imul_high & umul_high Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34016>	2025-04-07 19:44:22 +00:00
Georg Lehmann	2b1fc1a7fe	nir: add option to keep mul24_relaxed Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33871>	2025-03-27 06:24:15 +00:00
Georg Lehmann	b386659588	nir/opt_algebraic: create ubfe from (a & mask) >> c Foz-DB Navi21: Totals from 917 (1.16% of 79188) affected shaders: Instrs: 2549482 -> 2544997 (-0.18%); split: -0.18%, +0.00% CodeSize: 13781648 -> 13763616 (-0.13%); split: -0.13%, +0.00% Latency: 24832087 -> 24825199 (-0.03%); split: -0.04%, +0.01% InvThroughput: 5921339 -> 5914799 (-0.11%); split: -0.12%, +0.01% VClause: 59910 -> 59898 (-0.02%); split: -0.02%, +0.00% SClause: 62294 -> 62293 (-0.00%) Copies: 221015 -> 220988 (-0.01%); split: -0.02%, +0.01% VALU: 1717280 -> 1713332 (-0.23%); split: -0.23%, +0.00% SALU: 359390 -> 358910 (-0.13%) VMEM: 101966 -> 101924 (-0.04%) Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33455>	2025-03-14 11:15:04 +00:00
Georg Lehmann	d272a6e261	nir/opt_algebraic: optimize d3d a ? b : 0 Foz-DB Navi21: Totals from 3466 (4.34% of 79789) affected shaders: MaxWaves: 73163 -> 73161 (-0.00%); split: +0.02%, -0.02% Instrs: 3993862 -> 3987633 (-0.16%); split: -0.19%, +0.04% CodeSize: 21747420 -> 21725620 (-0.10%); split: -0.15%, +0.05% VGPRs: 190736 -> 190728 (-0.00%); split: -0.04%, +0.03% SpillSGPRs: 489 -> 478 (-2.25%); split: -2.86%, +0.61% Latency: 48169718 -> 48159068 (-0.02%); split: -0.05%, +0.02% InvThroughput: 12132999 -> 12128721 (-0.04%); split: -0.05%, +0.01% VClause: 78063 -> 78052 (-0.01%); split: -0.09%, +0.08% SClause: 109095 -> 108996 (-0.09%); split: -0.13%, +0.04% Copies: 265784 -> 264530 (-0.47%); split: -0.72%, +0.25% Branches: 84533 -> 84553 (+0.02%) PreSGPRs: 172577 -> 172531 (-0.03%); split: -0.19%, +0.16% PreVGPRs: 165776 -> 165825 (+0.03%); split: -0.06%, +0.09% VALU: 2851544 -> 2850426 (-0.04%); split: -0.08%, +0.04% SALU: 413543 -> 408408 (-1.24%); split: -1.45%, +0.21% VMEM: 139890 -> 139887 (-0.00%) Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33761>	2025-03-01 07:49:28 +00:00
Georg Lehmann	2e7f34af6b	nir/opt_algebraic: optimize more ine/ieq(umin(b2i, ), 0) Foz-DB Navi21: Totals from 76 (0.10% of 79789) affected shaders: MaxWaves: 1050 -> 1062 (+1.14%) Instrs: 113754 -> 113691 (-0.06%); split: -0.11%, +0.06% CodeSize: 605096 -> 605216 (+0.02%); split: -0.03%, +0.05% VGPRs: 6024 -> 5976 (-0.80%) Latency: 1776501 -> 1777519 (+0.06%); split: -0.06%, +0.12% InvThroughput: 379644 -> 376751 (-0.76%) SClause: 2132 -> 2134 (+0.09%) Copies: 4131 -> 4128 (-0.07%); split: -1.77%, +1.69% PreSGPRs: 4275 -> 4270 (-0.12%) PreVGPRs: 5568 -> 5526 (-0.75%) VALU: 86732 -> 86581 (-0.17%); split: -0.24%, +0.07% SALU: 7112 -> 7198 (+1.21%) Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33761>	2025-03-01 07:49:28 +00:00
Georg Lehmann	7bc3062a3b	nir/opt_algebraic: push comparisons with constants into bcsel with constant Foz-DB Navi21: Totals from 1657 (2.08% of 79789) affected shaders: MaxWaves: 30275 -> 30261 (-0.05%); split: +0.01%, -0.05% Instrs: 3316251 -> 3315701 (-0.02%); split: -0.04%, +0.02% CodeSize: 17831924 -> 17832020 (+0.00%); split: -0.06%, +0.06% SpillSGPRs: 815 -> 859 (+5.40%) SpillVGPRs: 3335 -> 3293 (-1.26%) Scratch: 231424 -> 230400 (-0.44%) Latency: 33413310 -> 33402751 (-0.03%); split: -0.04%, +0.01% InvThroughput: 9116062 -> 9112904 (-0.03%); split: -0.04%, +0.00% VClause: 65587 -> 65560 (-0.04%); split: -0.05%, +0.01% SClause: 86208 -> 86261 (+0.06%); split: -0.02%, +0.08% Copies: 356158 -> 356439 (+0.08%); split: -0.07%, +0.15% PreSGPRs: 101710 -> 101806 (+0.09%); split: -0.01%, +0.11% PreVGPRs: 89293 -> 89286 (-0.01%); split: -0.04%, +0.04% VALU: 2220900 -> 2218839 (-0.09%); split: -0.11%, +0.01% SALU: 472988 -> 474567 (+0.33%); split: -0.08%, +0.42% VMEM: 118401 -> 118347 (-0.05%) SMEM: 123597 -> 123592 (-0.00%) Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33761>	2025-03-01 07:49:27 +00:00
Georg Lehmann	3837bc6d16	nir/opt_algebraic: optimize ~a == ~b and ~a == #b Foz-DB Navi21: Totals from 2 (0.00% of 79789) affected shaders: Instrs: 8343 -> 8323 (-0.24%) CodeSize: 43884 -> 43764 (-0.27%) Latency: 19390 -> 19363 (-0.14%) InvThroughput: 3380 -> 3356 (-0.71%) VALU: 5413 -> 5393 (-0.37%) Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33761>	2025-03-01 07:49:27 +00:00

654 commits