ssa_to_reg is necessarily sparse, and since it's allocated per block, it's
tremendously memory intensive for shaders with thousands of blocks (which
can easily happen with if-ladders)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28483>
spilling and SSA repair will generate piles of dead SSA defs. add a reindexing
pass to keep memory usag emanagable on large shaders.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28483>
This massively improves our coalescing of phis by considering not just single
phi instructions but entire webs of phi-related SSA values. We do this with a
union-find data structure, which is effectively constant time thanks to union by
rank and path compression. Phi related SSA values are unioned and we try to
assign the same register to everything in the union. Boissinot might be better
but this is delightfully simple.
total instructions in shared programs: 2910655 -> 2883792 (-0.92%)
instructions in affected programs: 1295671 -> 1268808 (-2.07%)
helped: 1129
HURT: 34
Instructions are helped.
total bytes in shared programs: 19417970 -> 19255234 (-0.84%)
bytes in affected programs: 8790112 -> 8627376 (-1.85%)
helped: 1129
HURT: 34
Bytes are helped.
total halfregs in shared programs: 517813 -> 517867 (0.01%)
halfregs in affected programs: 751 -> 805 (7.19%)
helped: 2
HURT: 15
Halfregs are HURT.
total spills in shared programs: 135918 -> 134070 (-1.36%)
spills in affected programs: 135918 -> 134070 (-1.36%)
helped: 6
HURT: 0
total fills in shared programs: 343204 -> 341356 (-0.54%)
fills in affected programs: 343204 -> 341356 (-0.54%)
helped: 6
HURT: 0
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28483>
this requires some special handling but closes the last soundness gap (I hope)
in our RA. with later patches in this series, we actually hit this (50+ tests on
the CTS even) so I can be sure this actually works ^^
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28483>
ASAN saves the day! Was stuck on this for hours.
==37495==ERROR: AddressSanitizer: stack-buffer-overflow on address 0xfffff29ecdbc at pc 0xffff7c0751f4 bp 0xfffff29eca30 sp 0xfffff29eca48
READ of size 4 at 0xfffff29ecdbc thread T0
#0 0xffff7c0751f0 in __bitset_set_range ../src/util/bitset.h:249
#1 0xffff7c0751f0 in find_regs ../src/asahi/compiler/agx_register_allocate.c:642
#2 0xffff7c077d2c in pick_regs ../src/asahi/compiler/agx_register_allocate.c:1008
#3 0xffff7c077d2c in agx_ra_assign_local ../src/asahi/compiler/agx_register_allocate.c:1096
#4 0xffff7c077d2c in agx_ra ../src/asahi/compiler/agx_register_allocate.c:1353
#5 0xffff7c03b6c4 in agx_compile_function_nir ../src/asahi/compiler/agx_compile.c:2840
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28483>
unfortunately, shader stage merging is bogus when coherent images are used, so
we need an unmerged path. i'd rather not maintain two paths, so let's just
stop merging. as a bonus this makes ESO a lot easier, and lets us reuse the same
VS for both VS->GS and VS->TCS.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28483>
we need to apply depth bias for tris with point/line poly mode (according to
offset_point/offset_line), but we need to NOT apply depth bias for API-level
points/lines. weirdly the hw is sensitive to that last part.
fixes z-fighting with FreeCAD.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28483>
we need to gather tex masks / lower mediump io before lowering textures for our
detection to work ... also we want driver-side i/o soon lowering for Marek's
thing anyway. do some code motion / pass reordering to make this doable.
in doing so, we get rid of agx_uncompiled_shader_info which is probably good.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28483>
Blender needs more samplers to render the "wanderer" scene. While our binding
table is limited, we can get additional samplers by downshifting to the bindless
sampler heap, at a performance penalty. That mechanism is already in place for
merged VS/TCS, so we can reuse it for this.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28483>
Initially I left it in place because I didn't want to disturb the table
beyond what was in nv50_formats.c. However, we've moved past that now
and these formats are permanently broken so it's easiest to just remove
them from the table rather than have them in the table and then have C
code which disables them.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28453>