fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-21 22:20:14 +01:00

Author	SHA1	Message	Date
Marek Olšák	3670d42c74	nir/algebraic: optimize (a \| b) \| (a \| c) ==> (a \| b) \| c shader-db with ACO: 3 shaders have -0.11% average decrease in the code size Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32449>	2024-12-03 01:24:27 +00:00
Marek Olšák	978ad93375	nir/algebraic: optimize (a & b) & (a & c) ==> (a & b) & c shader-db with ACO: 3 shaders have -0.57% average decrease in the code size Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32449>	2024-12-03 01:24:27 +00:00
Marek Olšák	83b093f95e	nir/algebraic: use is_used_once in a few iand/ior patterns shader-db with ACO: 1 shader has -4 decrease in the code size Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32449>	2024-12-03 01:24:27 +00:00
Kenneth Graunke	92797c6878	nir/algebraic: Reassociate fadd into fmul in DP4-like pattern This extends the optimization from commit `09705747d7` ("nir/algebraic: Reassociate fadd into fmul in DPH-like pattern") to a chain of 4 ffmas for a DP4-style pattern. Moving the add to the other end of the sequence allows it to be fused into an FMA. fossil-db results from Alchemist: Totals: Instrs: 158544142 -> 158490516 (-0.03%); split: -0.04%, +0.00% Subgroup size: 7808912 -> 7808920 (+0.00%); split: +0.00%, -0.00% Cycle count: 17859550672 -> 17859491966 (-0.00%); split: -0.01%, +0.01% Spill count: 84652 -> 84494 (-0.19%); split: -0.37%, +0.18% Fill count: 160728 -> 160623 (-0.07%); split: -0.29%, +0.23% Scratch Memory Size: 4278272 -> 4272128 (-0.14%); split: -0.29%, +0.14% Max live registers: 32411695 -> 32409789 (-0.01%); split: -0.01%, +0.00% Max dispatch width: 5627856 -> 5627920 (+0.00%); split: +0.00%, -0.00% Non SSA regs after NIR: 185359099 -> 185307703 (-0.03%); split: -0.03%, +0.00% Totals from 16378 (2.56% of 640872) affected shaders: Instrs: 9818723 -> 9765097 (-0.55%); split: -0.58%, +0.04% Subgroup size: 194056 -> 194064 (+0.00%); split: +0.01%, -0.01% Cycle count: 294967108 -> 294908402 (-0.02%); split: -0.58%, +0.56% Spill count: 10088 -> 9930 (-1.57%); split: -3.09%, +1.53% Fill count: 24738 -> 24633 (-0.42%); split: -1.90%, +1.48% Scratch Memory Size: 439296 -> 433152 (-1.40%); split: -2.80%, +1.40% Max live registers: 1297204 -> 1295298 (-0.15%); split: -0.22%, +0.07% Max dispatch width: 133232 -> 133296 (+0.05%); split: +0.14%, -0.10% Non SSA regs after NIR: 11999084 -> 11947688 (-0.43%); split: -0.43%, +0.00% Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32197>	2024-12-02 13:15:16 +00:00
Rhys Perry	4c7d6e9437	nir/algebraic: optimize more bcsel(, bcsel()) This inot should be pretty optimizable. fossil-db (navi21); Totals from 2361 (2.97% of 79395) affected shaders: MaxWaves: 50808 -> 50890 (+0.16%) Instrs: 4168195 -> 4167332 (-0.02%); split: -0.05%, +0.03% CodeSize: 22727496 -> 22708088 (-0.09%); split: -0.12%, +0.03% VGPRs: 135160 -> 134824 (-0.25%) SpillSGPRs: 723 -> 725 (+0.28%) Latency: 37498671 -> 37479794 (-0.05%); split: -0.07%, +0.02% InvThroughput: 10468406 -> 10453028 (-0.15%); split: -0.16%, +0.01% VClause: 98258 -> 98283 (+0.03%); split: -0.04%, +0.07% SClause: 111281 -> 111323 (+0.04%); split: -0.06%, +0.09% Copies: 299281 -> 300155 (+0.29%); split: -0.17%, +0.46% Branches: 115951 -> 116111 (+0.14%); split: -0.00%, +0.14% PreSGPRs: 109404 -> 109462 (+0.05%); split: -0.14%, +0.19% PreVGPRs: 114558 -> 114421 (-0.12%) VALU: 2876823 -> 2869990 (-0.24%); split: -0.24%, +0.00% SALU: 500286 -> 506124 (+1.17%); split: -0.03%, +1.20% Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32145>	2024-11-21 14:50:45 +00:00
Rhys Perry	7ef1585fd6	nir/algebraic: add is_used_once to bcsel(, bcsel()) opts fossil-db (navi21): Totals from 888 (1.12% of 79395) affected shaders: MaxWaves: 18034 -> 18046 (+0.07%) Instrs: 3422053 -> 3418446 (-0.11%); split: -0.11%, +0.01% CodeSize: 18520912 -> 18500604 (-0.11%); split: -0.12%, +0.01% VGPRs: 53200 -> 53176 (-0.05%) Latency: 27739575 -> 27735200 (-0.02%); split: -0.06%, +0.04% InvThroughput: 6784257 -> 6782188 (-0.03%); split: -0.06%, +0.03% VClause: 83188 -> 83199 (+0.01%); split: -0.00%, +0.02% SClause: 91350 -> 91362 (+0.01%); split: -0.00%, +0.02% Copies: 263277 -> 262638 (-0.24%); split: -0.29%, +0.05% PreSGPRs: 52478 -> 51940 (-1.03%); split: -1.03%, +0.01% PreVGPRs: 47418 -> 47397 (-0.04%); split: -0.06%, +0.02% VALU: 2235368 -> 2234513 (-0.04%); split: -0.05%, +0.01% SALU: 547587 -> 544839 (-0.50%); split: -0.51%, +0.00% VMEM: 142861 -> 142871 (+0.01%) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32145>	2024-11-21 14:50:45 +00:00
Alyssa Rosenzweig	61862b209e	nir/opt_algebraic: optimize convert_uint_sat(ulong) I wrote this in my query copy shader, it didn't get the codegen I expected, so I investigated. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32208>	2024-11-20 16:53:50 +00:00
Rhys Perry	327e5465fc	nir/algebraic: check bit sizes in lowered unpack(pack()) optimization Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Fixes: `894f7f4387` ("nir_opt_algebraic: Add a couple optimizations for lowered unpack(pack())") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32157>	2024-11-19 18:17:18 +00:00
Rhys Perry	ecd6ae12fb	nir/algebraic: fix iabs(ishr(iabs(a), b)) optimization iabs(a) is not positive if "a" is the minimum signed value, so this is incorrect in that case for some values of "b". Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Fixes: `2b76de9b5d` ("nir/algebraic: Add a couple optimizations for iabs and ishr") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32157>	2024-11-19 18:17:17 +00:00
Rhys Perry	0c7830eb85	nir/algebraic: optimize ushr(a, ishl(iand(b, 3), 3)) nir_lower_mem_access_bit_sizes creates this. No fossil-db changes. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31904>	2024-11-13 12:59:26 +00:00
Rhys Perry	e95a3364b8	nir/algebraic: optimize bcsel(ieq(b, 0), a, shift(a, b)) nir_lower_mem_access_bit_sizes can create this. No fossil-db changes. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31904>	2024-11-13 12:59:26 +00:00
Alyssa Rosenzweig	fc460e7f20	nir/opt_algebraic: don't lower amul if requested Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Karol Herbst <kherbst@redhat.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31964>	2024-11-08 21:15:42 -04:00
Alyssa Rosenzweig	227026b7ad	nir/opt_algebraic: add another 64-bit pattern clpeak Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Karol Herbst <kherbst@redhat.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31964>	2024-11-08 21:15:42 -04:00
Alyssa Rosenzweig	2a3f133fd0	nir/opt_algebraic: add more 64-bit patterns Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Karol Herbst <kherbst@redhat.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31964>	2024-11-08 21:15:41 -04:00
Alyssa Rosenzweig	a4a3487aae	nir/opt_algebraic: optimize patterns from Skia shaders/skia/1567.shader_test relies on algebraic + constant folding; subtle changes in the input compiling flow can cause it to balloon. these patterns fix that. annoying! shader-db results aren't amazing, but they avert a major stats regression for that one Skia shader. total instructions in shared programs: 2751399 -> 2751295 (<.01%) instructions in affected programs: 6509 -> 6405 (-1.60%) helped: 21 HURT: 1 helped stats (abs) min: 1 max: 14 x̄: 5.62 x̃: 6 helped stats (rel) min: 0.53% max: 13.73% x̄: 3.57% x̃: 1.62% HURT stats (abs) min: 14 max: 14 x̄: 14.00 x̃: 14 HURT stats (rel) min: 2.45% max: 2.45% x̄: 2.45% x̃: 2.45% 95% mean confidence interval for instructions value: -7.09 -2.36 95% mean confidence interval for instructions %-change: -5.14% -1.45% Instructions are helped. total alu in shared programs: 2274577 -> 2274468 (<.01%) alu in affected programs: 6178 -> 6069 (-1.76%) helped: 21 HURT: 1 helped stats (abs) min: 1 max: 14 x̄: 5.86 x̃: 7 helped stats (rel) min: 0.55% max: 16.47% x̄: 3.93% x̃: 1.72% HURT stats (abs) min: 14 max: 14 x̄: 14.00 x̃: 14 HURT stats (rel) min: 2.83% max: 2.83% x̄: 2.83% x̃: 2.83% 95% mean confidence interval for alu value: -7.35 -2.56 95% mean confidence interval for alu %-change: -5.67% -1.57% Alu are helped. total fscib in shared programs: 2272894 -> 2272785 (<.01%) fscib in affected programs: 6178 -> 6069 (-1.76%) helped: 21 HURT: 1 helped stats (abs) min: 1 max: 14 x̄: 5.86 x̃: 7 helped stats (rel) min: 0.55% max: 16.47% x̄: 3.93% x̃: 1.72% HURT stats (abs) min: 14 max: 14 x̄: 14.00 x̃: 14 HURT stats (rel) min: 2.83% max: 2.83% x̄: 2.83% x̃: 2.83% 95% mean confidence interval for fscib value: -7.35 -2.56 95% mean confidence interval for fscib %-change: -5.67% -1.57% Fscib are helped. total bytes in shared programs: 21489352 -> 21488668 (<.01%) bytes in affected programs: 53362 -> 52678 (-1.28%) helped: 21 HURT: 2 helped stats (abs) min: 6 max: 98 x̄: 35.52 x̃: 40 helped stats (rel) min: 0.39% max: 10.63% x̄: 2.27% x̃: 1.27% HURT stats (abs) min: 2 max: 60 x̄: 31.00 x̃: 31 HURT stats (rel) min: 0.08% max: 1.40% x̄: 0.74% x̃: 0.74% 95% mean confidence interval for bytes value: -42.73 -16.74 95% mean confidence interval for bytes %-change: -3.13% -0.89% Bytes are helped. total regs in shared programs: 865162 -> 865148 (<.01%) regs in affected programs: 509 -> 495 (-2.75%) helped: 4 HURT: 5 helped stats (abs) min: 2 max: 14 x̄: 6.00 x̃: 4 helped stats (rel) min: 3.17% max: 35.90% x̄: 14.01% x̃: 8.48% HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 HURT stats (rel) min: 3.17% max: 3.17% x̄: 3.17% x̃: 3.17% 95% mean confidence interval for regs value: -5.75 2.64 95% mean confidence interval for regs %-change: -14.31% 5.39% Inconclusive result (value mean confidence interval includes 0). total uniforms in shared programs: 2120731 -> 2120735 (<.01%) uniforms in affected programs: 358 -> 362 (1.12%) helped: 1 HURT: 2 helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 helped stats (rel) min: 2.94% max: 2.94% x̄: 2.94% x̃: 2.94% HURT stats (abs) min: 2 max: 4 x̄: 3.00 x̃: 3 HURT stats (rel) min: 1.05% max: 4.00% x̄: 2.53% x̃: 2.53% Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Karol Herbst <kherbst@redhat.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31964>	2024-11-08 21:15:41 -04:00
Rhys Perry	da5c5a3edd	nir/algebraic: add bit-size check to extract_u8 pattern This only worked when "a" was 16-bit because a pattern above replaced the shift. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31762>	2024-11-06 19:31:20 +00:00
Alyssa Rosenzweig	33299354e0	nir/opt_algebraic: optimize patterns hit with OpenCL These patterns were all found in the AGX quads tessellator, a medium-sized OpenCL kernel. LLVM generates a lot of garbage around booleans which we need to chew through. Though there's nothing AGX or really OpenCL specific here, so some of this could help graphics shaders too. Together, their effect is significant for that kernel instr count & occupancy: before: 2966 inst, 2310 alu, 2310 fscib, 1216 ic, 23148 bytes, 239 regs, 384 threads after: 2848 inst, 2246 alu, 2246 fscib, 1000 ic, 22260 bytes, 231 regs, 448 threads No significant changes on GL shaderdb (a single godot shader regressed 1 instruction, 1344->1345). Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Reviewed-by: Eric R. Smith <eric.smith@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31892>	2024-10-30 12:59:10 +00:00
Georg Lehmann	d6535f2602	nir/opt_algebraic: create ubfe with non constant mask Foz-DB Navi21: Totals from 278 (0.35% of 79395) affected shaders: MaxWaves: 7444 -> 7448 (+0.05%) Instrs: 316069 -> 314584 (-0.47%); split: -0.47%, +0.00% CodeSize: 1608064 -> 1593204 (-0.92%) VGPRs: 11128 -> 11120 (-0.07%) Latency: 796599 -> 797786 (+0.15%); split: -0.19%, +0.34% InvThroughput: 141195 -> 139472 (-1.22%); split: -1.22%, +0.00% Copies: 28565 -> 29796 (+4.31%); split: -0.15%, +4.46% PreSGPRs: 14335 -> 14336 (+0.01%) VALU: 161342 -> 159426 (-1.19%) SALU: 87794 -> 88305 (+0.58%); split: -0.03%, +0.61% Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31852>	2024-10-29 10:51:10 +00:00
Timur Kristóf	be68aeafdc	nir/opt_algebraic: Add various bitfield extract patterns. v2 (Georg Lehmann): - fixed incorrect imin in ubfe_ubfe - simplified outer_bits of ushr((ubfe, ...), ...) opt - added is_used_once to iand(ushr(), ...) opt to improve stats Foz-DB Navi21: Totals from 3309 (4.18% of 79206) affected shaders: Instrs: 5295291 -> 5282128 (-0.25%); split: -0.28%, +0.03% CodeSize: 28299320 -> 28298456 (-0.00%); split: -0.07%, +0.06% Latency: 51566173 -> 51521923 (-0.09%); split: -0.09%, +0.01% InvThroughput: 13222050 -> 13204557 (-0.13%); split: -0.14%, +0.01% VClause: 116451 -> 116458 (+0.01%); split: -0.02%, +0.02% SClause: 160356 -> 160324 (-0.02%); split: -0.03%, +0.01% Copies: 424152 -> 423670 (-0.11%); split: -0.20%, +0.09% Branches: 156701 -> 156192 (-0.32%); split: -0.33%, +0.01% PreSGPRs: 168507 -> 168500 (-0.00%); split: -0.02%, +0.01% PreVGPRs: 151477 -> 151474 (-0.00%) VALU: 3486077 -> 3476675 (-0.27%); split: -0.31%, +0.04% SALU: 786467 -> 783109 (-0.43%); split: -0.45%, +0.03% VMEM: 188035 -> 188060 (+0.01%) SMEM: 259632 -> 259630 (-0.00%) Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31852>	2024-10-29 10:51:09 +00:00
Rhys Perry	8efc765a3d	nir/algebraic: fix shfr optimization with zero src2 No fossil-db changes. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Fixes: `08903bbe89` ("nir: add mqsad_4x8, shfr and nir_opt_mqsad") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31808>	2024-10-25 09:59:40 +00:00
Georg Lehmann	1f9b82bb2a	nir/opt_algebraic: optimize -0.0 + a Foz-DB Navi21: Totals from 428 (0.54% of 79395) affected shaders: MaxWaves: 8510 -> 8512 (+0.02%) Instrs: 731062 -> 729665 (-0.19%); split: -0.19%, +0.00% CodeSize: 3735788 -> 3728324 (-0.20%); split: -0.20%, +0.00% VGPRs: 27328 -> 27336 (+0.03%); split: -0.03%, +0.06% SpillSGPRs: 315 -> 314 (-0.32%) Latency: 3872986 -> 3873236 (+0.01%); split: -0.08%, +0.09% InvThroughput: 971001 -> 970056 (-0.10%); split: -0.17%, +0.08% VClause: 11954 -> 11956 (+0.02%); split: -0.02%, +0.03% SClause: 17361 -> 17358 (-0.02%) Copies: 59038 -> 59045 (+0.01%); split: -0.22%, +0.24% Branches: 17685 -> 17656 (-0.16%) PreSGPRs: 26103 -> 26102 (-0.00%) PreVGPRs: 23220 -> 23206 (-0.06%) VALU: 515293 -> 513963 (-0.26%); split: -0.26%, +0.00% SALU: 91591 -> 91544 (-0.05%) Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31770>	2024-10-23 08:58:34 +00:00
Georg Lehmann	f9d2aad7a3	nir: remove alu ddx/ddy Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31014>	2024-10-17 09:50:19 +00:00
Gert Wollny	f19f1ec17b	nir/opt_algebraic: Allow two-step lowering of ftrunc@64 to use ffract@64 If ftrunc@64 is lowered by nir_lower_doubles it is turned into a comparably long series of 32-bit operations. If the hardware supports ffract@64 then nir_opt_algebraic can first lower ftrunc@64 to use some combinations with ffloor@64. They can then be turned into a combination of fsub@64 and ffract@64, resulting in fewer instructions overall. Fixes: `5218cff34b` nir/algebraic: avoid double lowering of some fp64 operations Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29281>	2024-09-30 23:51:02 +00:00
Ian Romanick	057c7c9f53	nir/algebraic: Recognize open-coded bitfield_reverse in XCOM 2 The XCOM 2 shaders in my shader-db use iadd instead of ior. No fossil-db changes on any Intel platform. shader-db: All Intel platforms had similar results. (Meteor Lake shown) total instructions in shared programs: 19787210 -> 19787034 (<.01%) instructions in affected programs: 1187 -> 1011 (-14.83%) helped: 6 / HURT: 0 total cycles in shared programs: 906024436 -> 906012612 (<.01%) cycles in affected programs: 72978 -> 61154 (-16.20%) helped: 6 / HURT: 0 Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31006>	2024-09-13 00:21:00 +00:00
Ian Romanick	a780305818	nir/algebraic: Optimize more comparisons with b2f shader-db: All Intel platforms had similar results. (Meteor Lake shown) total instructions in shared programs: 19781108 -> 19772614 (-0.04%) instructions in affected programs: 372638 -> 364144 (-2.28%) helped: 2915 / HURT: 0 total cycles in shared programs: 905907644 -> 905822682 (<.01%) cycles in affected programs: 5573453 -> 5488491 (-1.52%) helped: 2363 / HURT: 234 LOST: 42 GAINED: 16 fossil-db: All Intel platforms had similar results. (Meteor Lake shown) Totals: Instrs: 152519634 -> 152519610 (-0.00%) Cycle count: 17122707642 -> 17122710974 (+0.00%); split: -0.00%, +0.00% Totals from 5 (0.00% of 633222) affected shaders: Instrs: 2827 -> 2803 (-0.85%) Cycle count: 83089 -> 86421 (+4.01%); split: -0.12%, +4.13% Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31068>	2024-09-10 04:15:58 +00:00
Georg Lehmann	6378bbaa82	nir/opt_algebraic: reassociate constants in ior(iand) chains Mostly affects one F1_23 shader that packs bitfields bit by bit. Totals from 3 (0.00% of 79395) affected shaders: Instrs: 5004 -> 4202 (-16.03%) CodeSize: 30992 -> 23952 (-22.72%) Latency: 28894 -> 28464 (-1.49%) InvThroughput: 4095 -> 3934 (-3.93%) Copies: 363 -> 376 (+3.58%) PreVGPRs: 110 -> 109 (-0.91%) VALU: 3035 -> 2504 (-17.50%) SALU: 463 -> 459 (-0.86%) Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31009>	2024-09-05 22:04:05 +00:00
Caio Oliveira	74be809237	compiler: Allow derivative_group to be used for all stages in shader_info These will now also be used by stages that have workgroups. Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30950>	2024-09-03 20:03:18 +00:00
Ian Romanick	f11a414645	nir/algebraic: Remove incorrect bfi of iand pattern The comment says, "This expands to (b & 3) & ~0xc which is (b & 3) & 3." This is not correct. ~0xc is actually 0xfffffff3. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Closes: #11695 Fixes: `1c7e35d4e0` ("nir/algebraic: Optimize some bit operation nonsense observed in some shaders") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30913>	2024-08-29 22:21:55 +00:00
Georg Lehmann	ef970c5a9d	nir: optimize pack_uint_2x16 of pack_half(a, 0) Foz-DB Navi31: Totals from 31 (0.04% of 79395) affected shaders: Instrs: 6157 -> 6065 (-1.49%) CodeSize: 35676 -> 34936 (-2.07%) Latency: 23979 -> 23805 (-0.73%); split: -0.79%, +0.07% InvThroughput: 5248 -> 5124 (-2.36%) VALU: 3224 -> 3162 (-1.92%) Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30855>	2024-08-28 07:16:55 +00:00
Ian Romanick	198d8d9c03	nir/algebraic: Improve some find_lsb and ifind_msb patterns These patterns were observed in shaders from parallel-rdp. No shader-db changes on any Intel platform. fossil-db: Meteor Lake, DG2, Ice Lake, and Skylake had similar results. (Meteor Lake shown) Totals: Instrs: 152535883 -> 152535673 (-0.00%); split: -0.00%, +0.00% Cycle count: 17112406110 -> 17122827810 (+0.06%); split: -0.01%, +0.07% Spill count: 78525 -> 78523 (-0.00%) Fill count: 148132 -> 148127 (-0.00%); split: -0.01%, +0.00% Max live registers: 31855320 -> 31855314 (-0.00%) Totals from 206 (0.03% of 633223) affected shaders: Instrs: 797124 -> 796914 (-0.03%); split: -0.03%, +0.00% Cycle count: 4716743323 -> 4727165023 (+0.22%); split: -0.05%, +0.27% Spill count: 18781 -> 18779 (-0.01%) Fill count: 31381 -> 31376 (-0.02%); split: -0.03%, +0.01% Max live registers: 31872 -> 31866 (-0.02%) Tiger Lake Totals: Instrs: 150560465 -> 150560343 (-0.00%); split: -0.00%, +0.00% Cycle count: 15482372893 -> 15479328542 (-0.02%); split: -0.02%, +0.00% Fill count: 103509 -> 103512 (+0.00%) Max live registers: 31760378 -> 31760374 (-0.00%) Totals from 199 (0.03% of 632445) affected shaders: Instrs: 679513 -> 679391 (-0.02%); split: -0.02%, +0.00% Cycle count: 4258406125 -> 4255361774 (-0.07%); split: -0.09%, +0.02% Fill count: 30609 -> 30612 (+0.01%) Max live registers: 30502 -> 30498 (-0.01%) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30650>	2024-08-16 14:52:04 +00:00
Marek Olšák	ecfefe823e	nir/opt_algebraic: use fmulz for fpow lowering to fix incorrect rendering The original implementation in all radeon drivers had this behavior. Fixes: `9bc1fb4c07` - ac/llvm,radeonsi: lower nir_fpow for aco and llvm Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11464 Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30069>	2024-07-23 15:23:27 +00:00
Ian Romanick	faee9426ab	nir/algebraic: Optimize some masking of extract_u8 operations I observed this pattern in several Red Dead Redemption 2 shaders. No shader-db changes on any Intel platform. v2: Remove duplicated patterns. Noticed by Georg. fossil-db: All Intel platforms had similar results. (Meteor Lake shown) Totals: Instrs: 151519393 -> 151507192 (-0.01%); split: -0.01%, +0.00% Cycle count: 17208246858 -> 17177437340 (-0.18%); split: -0.25%, +0.07% Spill count: 80830 -> 80759 (-0.09%); split: -0.09%, +0.00% Fill count: 152754 -> 152179 (-0.38%); split: -0.40%, +0.02% Totals from 7531 (1.20% of 630198) affected shaders: Instrs: 12606141 -> 12593940 (-0.10%); split: -0.10%, +0.00% Cycle count: 5466605514 -> 5435795996 (-0.56%); split: -0.79%, +0.22% Spill count: 25251 -> 25180 (-0.28%); split: -0.29%, +0.01% Fill count: 45143 -> 44568 (-1.27%); split: -1.36%, +0.08% Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30158>	2024-07-20 00:19:05 +00:00
Ian Romanick	1c7e35d4e0	nir/algebraic: Optimize some bit operation nonsense observed in some shaders In updates (not posted at the time of this writing) to !29884, a change caused many spill and fill regressions in a shader for OpenGL Tomb Raider. While looking at that shader, I noticed some odd patterns. I initially added these patterns to counteract the regressions caused by the other change, but I had no luck. On Ice Lake... this cuts 99 instructions from the shader. shader-db: All Intel platforms had similar results. (Meteor Lake shown) total instructions in shared programs: 19732341 -> 19732295 (<.01%) instructions in affected programs: 1744 -> 1698 (-2.64%) helped: 1 / HURT: 0 total cycles in shared programs: 916273716 -> 916273068 (<.01%) cycles in affected programs: 14266 -> 13618 (-4.54%) helped: 1 / HURT: 0 fossil-db: All Intel platforms had similar results. (Meteor Lake shown) Totals: Instrs: 151519575 -> 151519393 (-0.00%) Cycle count: 17208402120 -> 17208246858 (-0.00%); split: -0.00%, +0.00% Totals from 159 (0.03% of 630198) affected shaders: Instrs: 51970 -> 51788 (-0.35%) Cycle count: 11474176 -> 11318914 (-1.35%); split: -1.36%, +0.01% Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30158>	2024-07-20 00:19:05 +00:00
Christian Gmeiner	87786a7a7e	nak: Move imad late optimization to nir It is more or less just a code move, but I touched is_only_used_by_iadd(..) to match the style of the other functions in that file. Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30099>	2024-07-12 05:54:46 +00:00
Konstantin Seurer	d9e41e8a8c	nir: Stop using "capture : true" for nir_opt_algebraic "capture : true" is suboptimal and prevents the script from writing multiple files in one go. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30041>	2024-07-06 15:51:06 +00:00
Georg Lehmann	3e86d2452f	nir/opt_algebraic: add various unordered/ordered patterns from aco Foz-DB Navi21: Totals from 6747 (8.50% of 79395) affected shaders: MaxWaves: 134646 -> 134642 (-0.00%) Instrs: 7830299 -> 7828851 (-0.02%); split: -0.03%, +0.01% CodeSize: 43045532 -> 43010260 (-0.08%); split: -0.09%, +0.00% VGPRs: 378960 -> 378968 (+0.00%) SpillSGPRs: 1209 -> 1208 (-0.08%) Latency: 74667977 -> 74670405 (+0.00%); split: -0.02%, +0.02% InvThroughput: 20124981 -> 20124768 (-0.00%); split: -0.02%, +0.02% VClause: 162870 -> 162868 (-0.00%); split: -0.00%, +0.00% SClause: 277280 -> 277315 (+0.01%); split: -0.00%, +0.02% Copies: 528627 -> 528667 (+0.01%); split: -0.00%, +0.01% PreSGPRs: 319526 -> 319508 (-0.01%) PreVGPRs: 334264 -> 334265 (+0.00%); split: -0.00%, +0.00% VALU: 5485412 -> 5485408 (-0.00%); split: -0.02%, +0.02% SALU: 743882 -> 742301 (-0.21%); split: -0.21%, +0.00% Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29467>	2024-06-27 08:12:30 +00:00
Georg Lehmann	434dfb51ca	nir/opt_algebraic: optimize cmp(fneg(a), #b) and feq with fabs Foz-DB Navi21: Totals from 2483 (3.13% of 79395) affected shaders: Instrs: 4067533 -> 4067756 (+0.01%); split: -0.00%, +0.01% CodeSize: 22525156 -> 22499904 (-0.11%); split: -0.12%, +0.01% Latency: 51967223 -> 51963654 (-0.01%); split: -0.01%, +0.00% InvThroughput: 16685020 -> 16683045 (-0.01%); split: -0.01%, +0.00% SClause: 131890 -> 131907 (+0.01%) Copies: 402557 -> 402510 (-0.01%); split: -0.01%, +0.00% Branches: 146962 -> 146958 (-0.00%) PreSGPRs: 118404 -> 118401 (-0.00%) PreVGPRs: 123791 -> 123787 (-0.00%) VALU: 2709846 -> 2710174 (+0.01%); split: -0.00%, +0.01% SALU: 565883 -> 565786 (-0.02%) Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29467>	2024-06-27 08:12:30 +00:00
Georg Lehmann	98cc57bccb	nir/optimize cmp(a, -0.0) +0.0 can use an inline constant for AMD hardware, -0.0 needs a literal. Foz-DB Navi21: Totals from 1014 (1.28% of 79395) affected shaders: Instrs: 3037490 -> 3036849 (-0.02%); split: -0.02%, +0.00% CodeSize: 17060228 -> 17051276 (-0.05%); split: -0.05%, +0.00% Latency: 45916788 -> 45916600 (-0.00%); split: -0.00%, +0.00% InvThroughput: 12982201 -> 12982187 (-0.00%); split: -0.00%, +0.00% VClause: 79475 -> 79478 (+0.00%) SClause: 119935 -> 119934 (-0.00%); split: -0.00%, +0.00% Copies: 301641 -> 300964 (-0.22%); split: -0.23%, +0.00% PreSGPRs: 59155 -> 59144 (-0.02%) VALU: 2032016 -> 2032034 (+0.00%) SALU: 386424 -> 385729 (-0.18%) Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29467>	2024-06-27 08:12:30 +00:00
Georg Lehmann	8e6bf596cb	nir/opt_algebraic: look through fabs/fneg when matching fmulz/ffmaz Prevents regressions when removing input modifiers from a == 0.0. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29467>	2024-06-27 08:12:30 +00:00
Georg Lehmann	75b1fa9263	nir/opt_algebraic: alternative 8bit pack_[us]norm_4x8 lowering Foz-DB Navi21: Totals from 42 (0.05% of 79395) affected shaders: Instrs: 2709529 -> 2705848 (-0.14%) CodeSize: 14720732 -> 14711384 (-0.06%); split: -0.06%, +0.00% VGPRs: 4096 -> 4104 (+0.20%) Latency: 17907612 -> 17904468 (-0.02%); split: -0.02%, +0.00% InvThroughput: 4723551 -> 4722649 (-0.02%); split: -0.02%, +0.00% Copies: 223516 -> 219819 (-1.65%) Branches: 109578 -> 109594 (+0.01%); split: -0.00%, +0.02% VALU: 1730848 -> 1727151 (-0.21%) Tested-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28882>	2024-06-04 17:00:29 +00:00
Ian Romanick	7b7e5cf5d4	nir/algebraic: intel/fs: Optimize some patterns before lowering 64-bit integers v2: Add some comments explaining some of the nuance of the shift optimizations. Fix a bug in the shift count calculation of the upper 32-bits. Move the @64 from the variable to the opcode. All suggested by Jordan. No shader-db changes on any Intel platform. fossil-db: Meteor Lake and DG2 had similar results. (Meteor Lake shown) Totals: Instrs: 154507026 -> 154506576 (-0.00%) Cycle count: 17436298868 -> 17436295016 (-0.00%) Max live registers: 32635309 -> 32635297 (-0.00%) Totals from 42 (0.01% of 632575) affected shaders: Instrs: 5616 -> 5166 (-8.01%) Cycle count: 133680 -> 129828 (-2.88%) Max live registers: 1158 -> 1146 (-1.04%) No fossil-db changes on any other Intel platform. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29148>	2024-05-31 09:13:23 -07:00
Ian Romanick	4834df82e2	nir/algebraic: More patterns to generate iadd3 I noticed some shaders with patterns similar to these while working on cooperative matrix lowering. Meteor Lake and DG2 are the only platforms that support iadd3, so there were no shader-db or fossil-db changes on any other platforms. shader-db: Meteor Lake and DG2 had similar results. (Meteor Lake shown) total instructions in shared programs: 19869445 -> 19868343 (<.01%) instructions in affected programs: 419426 -> 418324 (-0.26%) helped: 913 / HURT: 2 total cycles in shared programs: 936010029 -> 935909811 (-0.01%) cycles in affected programs: 31746523 -> 31646305 (-0.32%) helped: 495 / HURT: 356 LOST: 10 GAINED: 12 fossil-db: Meteor Lake and DG2 had similar results. (Meteor Lake shown) Totals: Instrs: 154514596 -> 154505466 (-0.01%); split: -0.01%, +0.00% Cycle count: 17540226067 -> 17436266198 (-0.59%); split: -0.63%, +0.04% Spill count: 146887 -> 146886 (-0.00%) Fill count: 272499 -> 272489 (-0.00%); split: -0.01%, +0.00% Max live registers: 32634290 -> 32634739 (+0.00%); split: -0.00%, +0.00% Max dispatch width: 5550128 -> 5550368 (+0.00%) Totals from 4401 (0.70% of 632560) affected shaders: Instrs: 3095239 -> 3086109 (-0.29%); split: -0.30%, +0.00% Cycle count: 7327352564 -> 7223392695 (-1.42%); split: -1.51%, +0.10% Spill count: 28105 -> 28104 (-0.00%) Fill count: 45830 -> 45820 (-0.02%); split: -0.04%, +0.02% Max live registers: 264376 -> 264825 (+0.17%); split: -0.05%, +0.22% Max dispatch width: 43768 -> 44008 (+0.55%) Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29148>	2024-05-31 09:13:23 -07:00
Ian Romanick	22095c60bc	nir/algebraic: Add nir_lower_int64_options::nir_lower_iadd3_64 This allows us to not generate 64-bit iadd3 on Intel but continue generating it for NVIDIA. No shader-db or fossil-db changes. v2: Add nir_lower_iadd3_64 flag so we can continue to generate 64-bit iadd3 on NVIDIA platforms. v3: s/bit_size == 64/s == 64/. This cut-and-paste bug prevented any of the optimizations from ever occurring. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29148>	2024-05-31 09:13:23 -07:00
Georg Lehmann	dcab408a6c	nir: remove unpack_half_flush_to_zero It doesn't make sense to have two sets of opcodes for this when all backends that support the flush_to_zero variant just rely on the global floating point mode anyway. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29433>	2024-05-31 09:46:35 +00:00
Marek Olšák	b4bd380704	nir/algebraic: eliminate pack+unpack and unpack+pack pairs A new NIR shader for AMD drivers will need this. Reviewed-by: Gert Wollny <gert.wollny@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29233>	2024-05-17 22:04:00 +00:00
Francisco Jerez	15a10786e3	nir: Add option to lower 64-bit uadd_sat. C.f. `16be909936`. Intel Xe2 won't support saturation for 64-bit integer addition, regardless of signedness. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28283>	2024-05-15 17:16:51 +00:00
Ian Romanick	1b8cf06fc7	nir/algebraic: Optimize some extract_* expressions v2: Add missing '!options->lower_extract_byte' to the last two patterns. Every driver except Asahi sets both or neither. shader-db: All Intel platforms had similar results. (DG2 shown) total instructions in shared programs: 19659360 -> 19659356 (<.01%) instructions in affected programs: 44 -> 40 (-9.09%) helped: 2 / HURT: 0 total cycles in shared programs: 823432524 -> 823432520 (<.01%) cycles in affected programs: 1722 -> 1718 (-0.23%) helped: 2 / HURT: 0 fossil-db: All Intel platforms had similar results. (DG2 shown) Totals: Instrs: 153989787 -> 153989617 (-0.00%) Cycle count: 17562079230 -> 17562079493 (+0.00%); split: -0.00%, +0.00% Totals from 24 (0.00% of 631369) affected shaders: Instrs: 13733 -> 13563 (-1.24%) Cycle count: 341392 -> 341655 (+0.08%); split: -0.25%, +0.33% Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> [v1] Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27891>	2024-05-03 15:01:43 -07:00
Jesse Natalie	894f7f4387	nir_opt_algebraic: Add a couple optimizations for lowered unpack(pack()) I noticed some unnecessary 64-bit ints in shaders that were using doubles. Perhaps there's a different missing optimization that should run on the actual pack/unpack instructions before they're lowered, or maybe I'm just lowering them too early, but these seem simple enough that we might want them even for hand-rolled pack/unpack pairs. Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27314>	2024-05-01 21:55:20 +00:00
Connor Abbott	32308fe9f1	ir3/nir: Fix imadsh_mix16 definition The constant-folding definition and comments say that it takes the high 16 bits of the first source and low 16 bits of the second source, but actually it's the opposite. The algebraic optimization, which actually happens and needs to be correct, was correct but the comment above it was wrong. Note that in the way we use it when lowering multiplications, the ordering doesn't matter. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22075>	2024-04-26 12:55:14 +00:00
Iván Briano	7f97fa6df0	nir/algebraic: move float control conditions to be per instruction Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27281>	2024-04-25 12:13:41 +00:00

570 commits