rather than always early killing and then hitting pathological shuffle
situations, only early-kill when we can prove that we won't need to shuffle. it
turns out that's most of the time.
even with this heuristic, we still get hurt bad in shader-db due to extra moves.
but hopefully, the #s here are small enough that we can move on with our lives
and fix this source of known unsoundness.
this is tagged for backport as it's needed to avoid a perf regression with the
previous patch.
combined stats from this commit and the previous commit:
total instrs in shared programs: 2846065 -> 2852257 (0.22%)
instrs in affected programs: 618734 -> 624926 (1.00%)
total alu in shared programs: 2329477 -> 2335534 (0.26%)
alu in affected programs: 508119 -> 514176 (1.19%)
total gprs in shared programs: 894762 -> 901327 (0.73%)
gprs in affected programs: 36946 -> 43511 (17.77%)
Backport-to: 25.1
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34595>
(cherry picked from commit b1e86b3eae)
shader-db stats combined with next commit. this is the rip off the bandaid, next
is the optimize. split to enable bisecting.
the code we have to shuffle clobbered killed sources is broken and, after
thinking about that for a Long time, I don't see a reasonable way to fix it. But
if we late-kill sources - or model our calculations as-if we were late-killing
souces - we never have to shuffle onto a killed source and the problem goes away
entirely.
this is similar in spirit to what NAK does. it's not "optimal", but it's sane.
Backport-to: 25.1
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34595>
(cherry picked from commit b88fe9b0c5)
This hurts us in two ways:
* slightly more spilling (not actually a big problem)
* slightly worse occupancy (the shaders that are "helped" here are from trying
less hard to fit at higher occupancy levels)
However, in exchange we get a LOT more flexibility in the RA.
total instrs in shared programs: 2847015 -> 2846065 (-0.03%)
instrs in affected programs: 84134 -> 83184 (-1.13%)
total alu in shared programs: 2330406 -> 2329477 (-0.04%)
alu in affected programs: 62305 -> 61376 (-1.49%)
total code size in shared programs: 20497326 -> 20491690 (-0.03%)
code size in affected programs: 586664 -> 581028 (-0.96%)
total gprs in shared programs: 894202 -> 894762 (0.06%)
gprs in affected programs: 8900 -> 9460 (6.29%)
total scratch in shared programs: 13292 -> 13304 (0.09%)
scratch in affected programs: 2924 -> 2936 (0.41%)
total threads in shared programs: 27819712 -> 27814272 (-0.02%)
threads in affected programs: 55296 -> 49856 (-9.84%)
total spills in shared programs: 907 -> 914 (0.77%)
spills in affected programs: 419 -> 426 (1.67%)
total fills in shared programs: 857 -> 862 (0.58%)
fills in affected programs: 389 -> 394 (1.29%)
Backport-to: 25.1
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34595>
(cherry picked from commit 7fad96d194)
This patch does a couple of things to make CL integration with drivers
as seamless as possible:
- We pull in opencl-c.h and opencl-c-base.h to stop relying on system
headers.
- Parts of libcl.h are moved to new headers that are incomplete CL-safe
variants of libc headers.
- A couple of util headers are changed to remove now unnecessary
__OPENCL_VERSION__ guards and make more headers CL safe.
- Drivers now include src/compiler/libcl and use headers like
macros.h,u_math.h instead of libcl.h.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33576>
logicop_func=COPY is different from logicop_enable due to overriding blending.
maintain the info we need to implement properly. fixes
dEQP-VK.pipeline.shader_object_unlinked_binary.logic_op_na_formats.r32g32b32a32_sfloat.copy_blend
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34426>
../src/asahi/compiler/agx_nir_lower_address.c:82:30: runtime error: passing zero to ctz(), which is not a valid argument
#0 0x56175dca5684 in pass ../src/asahi/compiler/agx_nir_lower_address.c:82
#1 0x56175dca2eda in nir_function_intrinsics_pass ../src/compiler/nir/nir_builder.h:164
#2 0x56175dca308c in nir_shader_intrinsics_pass ../src/compiler/nir/nir_builder.h:191
#3 0x56175dca68ae in agx_nir_lower_address ../src/asahi/compiler/agx_nir_lower_address.c:167
#4 0x56175dc885c0 in agx_optimize_nir ../src/asahi/compiler/agx_compile.c:3063
#5 0x56175dc9650d in agx_compile_shader_nir ../src/asahi/compiler/agx_compile.c:3827
#6 0x56175dc52148 in main ../src/asahi/clc/asahi_clc.c:359
#7 0x7fdf1c343249 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
#8 0x7fdf1c343304 in __libc_start_main_impl ../csu/libc-start.c:360
#9 0x56175dc44760 in _start (/builds/mesa/mesa/_build/src/asahi/clc/asahi_clc+0xbd0760)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33685>
This currently treats coarse and fine derivatives the same, but Qualcomm
needs to know whether just coarse derivatives are used or fine
derivatives/quad ops are also used. Rename this to
needs_coarse_quad_helper_invocations make clear the difference from the
new field, needs_full_quad_helper_invocations.
Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Fixes: 264d8a6766 ("ir3: Set need_full_quad depending on info.fs.require_full_quads")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33862>