This is the most serious bug we've had in a long time due to a fundamental
misunderstanding of the hardware (due to incomplete reverse-engineering). It
caught me off guard.
The texture descriptor has "mode" bits which configure two aspects of how the
address pointer is interpreted:
* whether it is indirected, pointing to a secondary page table for sparse
* whether it writes texture access counters (for Metal's idea of sparse).
...Neither of these is a "null texture" mode.
So why did I see Apple's blob using a non-normal mode for null textures, and why
did I copy those settings?
1. Because the hardware texture access counters provide a cheap way to detect
null texture accesses after the fact, which I think their GPU debug tools
use. I'm not sure why release builds of the driver do/did that, but whatever.
2. Because I assumed Cupertino knew best and I didn't bother looking too close.
We can't use those modes for null textures (without doing extra memory
allocations), since then the GPU will increment access counters. And since our
null texture address used to just be a pointer in the command buffer, that means
the GPU will trash whatever memory happened to be 0x400 bytes after the start of
the null texture descriptor. The symptom is random faults.
This bug was caught when trying to use the zero-page instead, which raised a
permission fault when the GPU tried to write counts. Then I remembered the
sparse mechanism and had a bit of a eureka moment. Immediately followed by an
"oh, f#$&" moment as I realized how many random bugs could potentially be root
caused to this.
The fix is two-fold:
1. Use normal layout instead.
2. Set the address to the zero-page (which is a fixed VA) and detect null
textures by checking the address, instead of the mode.
The latter is a good idea anyway, but both parts need to be done atomically to
maintain bisectability.
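
A rough sketch of the two-part fix, for illustration only. The struct layout,
the mode names and the zero-page VA below are made up for the example (all
prefixed example_/EXAMPLE_); the real descriptor packing and the real fixed VA
live in the driver and are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed VA of the zero-page, chosen only for this sketch. */
#define EXAMPLE_ZERO_PAGE_VA 0x100000ull

/* Hypothetical names for the "mode" bits described above. */
enum example_tex_mode {
   EXAMPLE_TEX_MODE_NORMAL,   /* address is used directly, no side effects */
   EXAMPLE_TEX_MODE_INDIRECT, /* address points to a secondary page table */
   EXAMPLE_TEX_MODE_COUNTED,  /* GPU writes texture access counters */
};

/* Simplified stand-in for a texture descriptor. */
struct example_tex_desc {
   enum example_tex_mode mode;
   uint64_t address;
};

/* Part 1 + 2: a null texture uses the normal mode and the zero-page address,
 * so the GPU never writes counters next to the descriptor. */
static inline struct example_tex_desc
example_null_texture(void)
{
   return (struct example_tex_desc){
      .mode = EXAMPLE_TEX_MODE_NORMAL,
      .address = EXAMPLE_ZERO_PAGE_VA,
   };
}

/* Null textures are detected by address, never by mode. */
static inline bool
example_is_null_texture(const struct example_tex_desc *desc)
{
   assert(desc != NULL);
   return desc->address == EXAMPLE_ZERO_PAGE_VA;
}
```

Since the zero-page is a fixed VA, the check is a single compare and does not
depend on any mode encoding.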
Backport-to: 25.1
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34703>
the only dynamic allocation left for geometry shaders is all done in the setup
indirect kernel. so just pass the heap to that kernel directly, so we don't
reserve a heap for direct draws with GS (including pure-VS XFB). this should
reduce our memory footprint a lot in certain apps.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34661>
use 8-bit index buffer instead of 32-bit to significantly decrease the size of
serialized geometry shaders (agx_gs_info is not dynamic).
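
To show where the saving comes from, here is a sketch with two fixed-size info
structs. Both structs and EXAMPLE_MAX_INDICES are hypothetical and are not the
real agx_gs_info layout; the point is only that a statically sized index array
shrinks 4x when its element type goes from 32-bit to 8-bit, provided every
index fits in 8 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical upper bound on the number of indices, for this sketch only. */
#define EXAMPLE_MAX_INDICES 64

/* Fixed-size info with 32-bit indices: the array alone is 256 bytes. */
struct example_gs_info_u32 {
   uint32_t count;
   uint32_t indices[EXAMPLE_MAX_INDICES];
};

/* Same information with 8-bit indices: the array is 64 bytes. */
struct example_gs_info_u8 {
   uint8_t count;
   uint8_t indices[EXAMPLE_MAX_INDICES];
};

/* Narrow the wide form into the compact one, asserting that nothing is lost. */
static inline void
example_narrow_indices(struct example_gs_info_u8 *dst,
                       const struct example_gs_info_u32 *src)
{
   assert(src->count <= EXAMPLE_MAX_INDICES);
   dst->count = (uint8_t)src->count;

   for (uint32_t i = 0; i < src->count; ++i) {
      assert(src->indices[i] <= UINT8_MAX);
      dst->indices[i] = (uint8_t)src->indices[i];
   }
}
```

Because the struct is not dynamically sized, the full array is serialized
regardless of how many indices are in use, which is why the element width
matters.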
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34661>
rather than a bunch of subtle booleans telling the driver how to invoke the GS
rast shader, collect everything into a common enum, and provide (CL safe)
helpers to do the appropriate calculations rather than duplicating across
GL/VK/indirects.
this fixes suboptimal handling of instancing with list topologies.
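
A sketch of the general idea, under heavy assumptions: the shape names, the
struct and the helper below are invented for illustration and are not the
enum or helpers added by this change. It only shows how one enum plus one
shared helper can replace several booleans and per-API copies of the same
arithmetic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ways of invoking the rasterization shader. */
enum example_gs_shape {
   /* Index buffer written at run time; everything goes in one draw. */
   EXAMPLE_GS_SHAPE_DYNAMIC_INDEXED,
   /* Fixed index buffer; one hardware instance per input primitive. */
   EXAMPLE_GS_SHAPE_STATIC_INDEXED,
   /* List topology without indices; all primitives packed in one draw. */
   EXAMPLE_GS_SHAPE_STATIC_LIST,
};

/* Parameters of the resulting hardware draw. */
struct example_gs_draw {
   uint32_t count;     /* vertices or indices per hardware instance */
   uint32_t instances; /* hardware instance count */
   bool indexed;
};

/* Single helper shared by every caller (overflow is ignored in this sketch). */
static inline struct example_gs_draw
example_gs_rast_draw(enum example_gs_shape shape, uint32_t max_per_prim,
                     uint32_t input_prims, uint32_t api_instances)
{
   uint32_t total_prims = input_prims * api_instances;

   switch (shape) {
   case EXAMPLE_GS_SHAPE_DYNAMIC_INDEXED:
      return (struct example_gs_draw){max_per_prim * total_prims, 1, true};
   case EXAMPLE_GS_SHAPE_STATIC_INDEXED:
      return (struct example_gs_draw){max_per_prim, total_prims, true};
   case EXAMPLE_GS_SHAPE_STATIC_LIST:
      return (struct example_gs_draw){max_per_prim * total_prims, 1, false};
   }

   assert(0 && "invalid shape");
   return (struct example_gs_draw){0, 0, false};
}
```

In this sketch, the list case folds API instances into a single non-instanced
draw, which is one plausible way to treat instancing with list topologies.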
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34661>
this gets us good workgroup sizes even for indirect draws with GS.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34638>
This patch does a couple of things to make CL integration with drivers
as seamless as possible:
- We pull in opencl-c.h and opencl-c-base.h to stop relying on system
headers.
- Parts of libcl.h are moved to new headers that are incomplete CL-safe
variants of libc headers.
- A couple of util headers are changed to remove now unnecessary
__OPENCL_VERSION__ guards and make more headers CL safe.
- Drivers now include src/compiler/libcl and use headers like
macros.h,u_math.h instead of libcl.h.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33576>
for indirect GS, do it in the indirect kernel (not the pre-GS)
for direct, do it on the host (not the pre-GS)
we don't want pre-GS.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33901>
this will let us unify behaviour across drivers a bit more.
no functional change here. (intel is specifically excluded to avoid a functional
change.)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Dylan Baker <dylan.c.baker@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33879>
just accept the arguments as-is. this matches how gcc/clang actually work
and simplifies the meson.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33242>
reduces a bit of boilerplate.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33242>