mesa/src/asahi/libagx
Alyssa Rosenzweig 3eb7575679
asahi: do not use "Null" layout
This is the most serious bug we've had in a long time, caused by a fundamental
misunderstanding of the hardware (due to incomplete reverse-engineering). It
caught me off guard.

The texture descriptor has "mode" bits which configure two aspects of how the
address pointer is interpreted:

* whether it is indirected, pointing to a secondary page table for sparse
* whether it writes texture access counters (for Metal's idea of sparse).

...Neither of these is a "null texture" mode.
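As a mental model, the mode can be treated as two independent flags, with "normal" being the value where both are cleared. Nothing in the model means "null". This is a hypothetical encoding: the flag names and bit positions below are illustrative assumptions, not the real AGX descriptor layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values for illustration only; the real descriptor
 * encoding is not described in this commit message. */
enum sketch_tex_mode {
   SKETCH_MODE_NORMAL   = 0,
   SKETCH_MODE_INDIRECT = 1u << 0, /* address points at a secondary page table (sparse) */
   SKETCH_MODE_COUNTERS = 1u << 1, /* GPU writes texture access counters */
};

/* True if the address is interpreted through a secondary page table. */
static bool mode_is_indirect(uint32_t mode)
{
   return (mode & SKETCH_MODE_INDIRECT) != 0;
}

/* True if the GPU will write access counters relative to the address. */
static bool mode_writes_counters(uint32_t mode)
{
   return (mode & SKETCH_MODE_COUNTERS) != 0;
}
```

In this model, any non-normal mode either redirects the address or causes the GPU to write memory relative to it. Neither behavior is harmless for a texture that is supposed to be "null".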

So why did I see Apple's blob using a non-normal mode for null textures, and why
did I copy those settings?

1. Because the hardware texture access counters provide a cheap way to detect
   null texture accesses after the fact, which I think their GPU debug tools
   use. I'm not sure why release builds of the driver do/did that, but whatever.

2. Because I assumed Cupertino knew best and I didn't bother looking too closely.

We can't use those modes here (without doing extra memory allocations), since
the GPU will then increment access counters. And since our null texture address
used to just be a pointer into the command buffer, that meant the GPU would trash
whatever memory happened to be 0x400 bytes after the start of the null texture
descriptor. The symptom was random faults.
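To make the failure concrete, the sketch below models the command buffer as a byte array, puts the null texture's address at the descriptor's own offset, and applies one counter write 0x400 bytes past that address. The single 32-bit increment and the function name are assumptions for illustration. The real counter format is not described here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Offset from the texture address at which counters land, per the text above. */
#define COUNTER_OFFSET 0x400u

/* Hypothetical model: in a counter-writing mode, the GPU increments a 32-bit
 * counter located COUNTER_OFFSET bytes past the texture's base address. */
static void simulate_counter_write(uint8_t *mem, size_t mem_size, size_t tex_addr)
{
   size_t where = tex_addr + COUNTER_OFFSET;
   if (where + sizeof(uint32_t) > mem_size)
      return; /* outside the modeled buffer; real hardware would fault */

   uint32_t count;
   memcpy(&count, mem + where, sizeof(count)); /* unaligned-safe load */
   count++;
   memcpy(mem + where, &count, sizeof(count)); /* clobbers unrelated data */
}
```

If the descriptor sits at offset 0x100, the four bytes at 0x500 are silently modified, even though they belong to whatever else was recorded in the command buffer.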

This bug was caught when trying to use the zero-page instead, which raised a
permission fault when the GPU tried to write counts. Then I remembered the
sparse mechanism and had a bit of a eureka moment, immediately followed by an
"oh, f#$&" moment as I realized how many random bugs could potentially be
root-caused to this.

The fix is two-fold:

1. Use normal layout instead.
2. Set the address to the zero-page (which is a fixed VA) and detect null
   textures by checking the address, instead of the mode.

The latter is a good idea anyway, but both parts need to be done atomically to
maintain bisectability.
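A minimal sketch of the fixed scheme uses a simplified descriptor with only a mode and an address. `SKETCH_ZERO_PAGE_VA` is a placeholder, not the driver's actual zero-page address, and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder value: the real fixed zero-page VA is defined by the driver. */
#define SKETCH_ZERO_PAGE_VA 0x100000000ull
#define SKETCH_MODE_NORMAL  0u

/* Hypothetical, heavily simplified texture descriptor. */
struct sketch_tex_desc {
   uint32_t mode;    /* 0 = normal layout: no indirection, no counters */
   uint64_t address; /* GPU virtual address of the texture data */
};

/* Fix parts 1 and 2: normal mode, address pointing at the zero page. */
static struct sketch_tex_desc make_null_texture(void)
{
   struct sketch_tex_desc d = {
      .mode = SKETCH_MODE_NORMAL,
      .address = SKETCH_ZERO_PAGE_VA,
   };
   return d;
}

/* Null detection keys off the address, not the mode. */
static bool is_null_texture(const struct sketch_tex_desc *d)
{
   return d->address == SKETCH_ZERO_PAGE_VA;
}
```

This relies on the assumption that no real texture is ever placed at the zero-page VA. It also shows why the two parts are coupled. Once null textures use the normal mode, a mode-based check could no longer tell them apart from ordinary textures, so the detection has to move to the address in the same commit.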

Backport-to: 25.1
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34703>
2025-04-24 19:05:07 +00:00
compression.cl nir: Rename in-bounds-agx to in-bounds 2025-02-19 09:54:11 +00:00
compression.h libagx: port to common libcl.h 2024-12-12 21:16:12 +00:00
copy.cl libagx: assert alignment for copies 2025-04-01 17:42:50 +00:00
draws.cl libagx: fix wraparound issue with robust draw kernel 2025-02-22 02:24:28 +00:00
geometry.cl libagx: rename agx_geometry_state to agx_heap 2025-04-23 16:20:59 +00:00
geometry.h libagx: use common heap alloc for tessellator 2025-04-23 16:20:59 +00:00
helper.cl libagx: port to common libcl.h 2024-12-12 21:16:12 +00:00
helper.h compiler: use libcl.h for CL 2024-12-12 21:16:12 +00:00
libagx_dgc.h hk: pass cmdbuf, not control stream, into precomp dispatch 2025-02-22 02:24:29 +00:00
libagx_intrinsics.h libagx: port to common libcl.h 2024-12-12 21:16:12 +00:00
meson.build clc,libcl: Clean up CL includes 2025-04-11 21:27:37 +00:00
query.cl libagx: clean up query copy; bug fix 2025-04-01 17:42:50 +00:00
query.h asahi: clang-format 2025-04-01 17:42:51 +00:00
tessellation.cl libagx: use common heap allocs 2025-04-01 17:42:51 +00:00
tessellator.cl libagx: use common heap alloc for tessellator 2025-04-23 16:20:59 +00:00
tessellator.h libagx: rename agx_geometry_state to agx_heap 2025-04-23 16:20:59 +00:00
texture.cl asahi: do not use "Null" layout 2025-04-24 19:05:07 +00:00