ASAN saves the day! Was stuck on this for hours.
==37495==ERROR: AddressSanitizer: stack-buffer-overflow on address 0xfffff29ecdbc at pc 0xffff7c0751f4 bp 0xfffff29eca30 sp 0xfffff29eca48
READ of size 4 at 0xfffff29ecdbc thread T0
#0 0xffff7c0751f0 in __bitset_set_range ../src/util/bitset.h:249
#1 0xffff7c0751f0 in find_regs ../src/asahi/compiler/agx_register_allocate.c:642
#2 0xffff7c077d2c in pick_regs ../src/asahi/compiler/agx_register_allocate.c:1008
#3 0xffff7c077d2c in agx_ra_assign_local ../src/asahi/compiler/agx_register_allocate.c:1096
#4 0xffff7c077d2c in agx_ra ../src/asahi/compiler/agx_register_allocate.c:1353
#5 0xffff7c03b6c4 in agx_compile_function_nir ../src/asahi/compiler/agx_compile.c:2840
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28483>
unfortunately, shader stage merging is bogus when coherent images are used, so
we need an unmerged path. i'd rather not maintain two paths, so let's just
stop merging. as a bonus this makes ESO a lot easier, and lets us reuse the same
VS for both VS->GS and VS->TCS.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28483>
we need to apply depth bias for tris with point/line poly mode (according to
offset_point/offset_line), but we need to NOT apply depth bias for API-level
points/lines. weirdly the hw is sensitive to that last part.
fixes z-fighting with FreeCAD.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28483>
we need to gather tex masks / lower mediump io before lowering textures for our
detection to work ... also we want driver-side i/o soon lowering for Marek's
thing anyway. do some code motion / pass reordering to make this doable.
in doing so, we get rid of agx_uncompiled_shader_info which is probably good.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28483>
Blender needs more samplers to render the "wanderer" scene. While our binding
table is limited, we can get additional samplers by downshifting to the bindless
sampler heap, at a performance penalty. That mechanism is already in place for
merged VS/TCS, so we can reuse it for this.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28483>
Initially I left it in place because I didn't want to disturb the table
beyond what was in nv50_formats.c. However, we've moved past that now
and these formats are permanently broken so it's easiest to just remove
them from the table rather than have them in the table and then have C
code which disables them.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28453>
This is much more readable than the macro mess we had before. Most
notably because we no longer have this mess of #defines for format
support flags and instead have a character per flag and every every
supported flag is there every time. This makes it way easier to figure
out what we're actually claiming a format can do.
This conversion was tested with a patch that did a memcmp() of this
table against the old one to ensure that the two tables are bit-for-bit
identical. Later commits may modify the table in various ways but this
conversion to a CSV file is lossless.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28453>
In nir_lower_terminate_to_demote(), we were deleting the rest of the
block contents when we added a halt instruction but left any subsequent
CF nodes in the list. While this may be technically okay, that much
dead code makes the rest of NIR pretty grumpy. It's better to delete
everything to the end of the CF list, not just everything to the end of
the block.
Fixes: 75861c64b8 ("nir: Add a lower_terminate_to_demote pass")
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28456>
In the common case, fs_inst will have up to 4 sources (the HW
instructions have up to 3, and our representation of SENDs have 4).
Embed such array into the fs_inst, and use it whenever applicable
instead of allocating a new array.
Also change the code to reuse the allocated src array when resizing to
a smaller length.
Between the changes above and the reduced amount of initializing
fs_regs, this reduces fossil-db time by around 2% for Borderlands 3
and Rise of the Tomb Raider, and around 1.5% for Total War Warhammer 3.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28379>