We need these fencing intrinsics because our image caches aren't coherent with
memory. Furthermore, we need some sync intrinsics for imageblocks (which are
spicy images). These are a stub of what the final fragment shader interlock
implementation will look like, or what a real Metal-grade imageblock
implementation needs, but this is good enough for handling the sync requirements
with spilled render targets.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24258>
Texture loads can be reordered freely but image loads can't be (since there
could be writes). Implement image_load natively to avoid subtle problems with
CSE and scheduling.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24258>
This is equivalent to texture_load but cannot be reordered, since it might be
writeable.
It also sets bit 43. This needs more investigation, but it fixes
KHR-GLES31.core.shader_image_load_store.basic-glsl-misc-fs. Some sort of cache
control bit.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24258>
Bizarrely, the clamps/wrap modes are respected so we need to set them
appropriately for correct out-of-bounds behaviour (returning all zero). That in
turn means we can't use whatever sampler is already there, instead we need to
allocate a dedicated sampler just for txf. Good news is we have an extra sampler
state register available for the purpose.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24258>
For implementing image_texel_address, when there's no point in creating an
internal texture instruction just to lower immediately.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24258>
The mapping of 1D -> 2D coordinates for indexing into buffer textures (lowered
to fixed-width 2D images) will be shared between both texture load and image
store code paths, so pull it out.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24258>
Simply doing nothing fixes
dEQP-GLES31.functional.image_load_store.early_fragment_tests.*. However, we need
to actually insert the sample_mask instruction to make sure the shader runs at
all (I think), doing that fixes:
KHR-GLES31.core.shader_image_load_store.basic-glsl-earlyFragTests
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24258>
Or cache flushes or whatever these actually are. Probably could be optimized
once we understand what the 4 individual instructions are actually doing. Fixes
dEQP-GLES31.functional.image_load_store.2d.qualifiers.*.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24258>
We still need to insert our lowering code, though this case could probably be
optimized somehow. Fixes a massive number of KHR-GLES3 and KHR-GLES31 tests,
including
KHR-GLES31.core.shader_atomic_counters.advanced-usage-many-draw-calls2 and lots
of PBO tests.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24258>
We need to support arrayed images and sRGB images, which are hardware. For
atomics, we need to pack the augmented software data structure. Finally, we need
to support buffer images. Like their texture counterparts, these get lowered to
2D images.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24258>
For implementing image atomics (and multisample image writes), we need
information about the image layout in the shader. It's a lot nicer to determine
the image layouts on the CPU (where we have ail) and stash the results in the
PBE descriptor, where we have a convenient hole to do so, rather than trying to
do all the layout calculations on the GPU on the fly. Add a data structure that
the driver will fill out and the image atomic lowering will consider as part of
the hardware.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24258>
Add a simple API so that decode can be used as a shared library by the
Python hypervisor. Note that this is not thread-safe. If we ever want to
use this in other contexts with thread safety, it will need a refactor
(along with the core decode code anyway).
Signed-off-by: Asahi Lina <lina@asahilina.net>
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24258>
We want to plug this library into the hypervisor, but there we don't
have all GPU memory already mapped in our address space. Refactor the
GPU mem read function to always allocate local buffers and copy in the
data there.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24258>
Needed for some Metal demos that end up creating multiple queues.
This is still definitely broken/not fully correct, but it at least
gets things working for those.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24258>
Sooner or later we were going to need divergent codepaths in decode, and
it looks like now is the time. Add a `params` typedef and pass it
through all the decoder callbacks. This is an alias for
drm_asahi_params_global, but use a typedef so we can change that later
without changing dozens of instances.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24258>
Often not needed and makes the NIR harder to read.
shader-db is noise.
total instructions in shared programs: 1752712 -> 1752688 (<.01%)
instructions in affected programs: 8338 -> 8314 (-0.29%)
helped: 21
HURT: 8
Inconclusive result (%-change mean confidence interval includes 0).
total bytes in shared programs: 11943572 -> 11943434 (<.01%)
bytes in affected programs: 56716 -> 56578 (-0.24%)
helped: 21
HURT: 8
Inconclusive result (%-change mean confidence interval includes 0).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24258>
If we have two 16-bit copies to/from adjacent 16-bit registers, we can instead
use a single 32-bit copy from the 32-bit register pair. Since 32-bit integer
arithmetic is (almost) as efficient as 16-bit on AGX, this (almost) doubles
performance of affected parallel copies.
total instructions in shared programs: 1788606 -> 1788301 (-0.02%)
instructions in affected programs: 17057 -> 16752 (-1.79%)
helped: 150
HURT: 0
Instructions are helped.
total bytes in shared programs: 12196492 -> 12194662 (-0.02%)
bytes in affected programs: 122894 -> 121064 (-1.49%)
helped: 150
HURT: 0
Bytes are helped.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24258>
Not meaningfully using more registers since this is just about how we assign
registers after fixing the maximum # of registers used (note that thread count
is unaffected).
total instructions in shared programs: 1790901 -> 1788666 (-0.12%)
instructions in affected programs: 230680 -> 228445 (-0.97%)
helped: 681
HURT: 2
Instructions are helped.
total bytes in shared programs: 12210266 -> 12196852 (-0.11%)
bytes in affected programs: 1634100 -> 1620686 (-0.82%)
helped: 682
HURT: 2
Bytes are helped.
total halfregs in shared programs: 532130 -> 532218 (0.02%)
halfregs in affected programs: 848 -> 936 (10.38%)
helped: 3
HURT: 13
Halfregs are HURT.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24258>
All shaders affected for thread count are in pubg... by chance the allocation
before used fewer registers than the calculated register demand (I guess because
we're conservative with our vector handling) and so got lucky and got higher
thread count. That shader is also helped massively for instructions.
The halfreg change doesn't matter -- we're not actually increasing register
demand, we're just being more choosy about our registers.
total instructions in shared programs: 1799738 -> 1790901 (-0.49%)
instructions in affected programs: 306081 -> 297244 (-2.89%)
helped: 889
HURT: 14
Instructions are helped.
total bytes in shared programs: 12263290 -> 12210266 (-0.43%)
bytes in affected programs: 2150966 -> 2097942 (-2.47%)
helped: 889
HURT: 14
Bytes are helped.
total halfregs in shared programs: 531981 -> 532130 (0.03%)
halfregs in affected programs: 1925 -> 2074 (7.74%)
helped: 0
HURT: 26
Halfregs are HURT.
total threads in shared programs: 18885184 -> 18884224 (<.01%)
threads in affected programs: 13440 -> 12480 (-7.14%)
helped: 0
HURT: 15
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24258>
Common logic the next few patches will use to try to assign something to the
same register as something else. "If it's already been assigned a register and
that register is free now, use it, otherwise bail."
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24258>
We now emulate an infinitely large binding table with bindless, so the sky is
the limit for this CAP. Note we still have the limit for samplers, so this
probably doesn't do anything for OpenGL.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24258>
For sRGB render targets, we encode sRGB when writing pixels into the tilebuffer
(in the fragment shader), not when writing out the image. When we actually write
out the tilebuffer to the image, we don't use the PBE's sRGB conversion, we just
bind it as a UNORM 8 image and blit the pre-transformed pixels.
We're about to add real sRGB support for the PBE, so make this linearization
explicit.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24258>
Both textures and images share a unified indexing scheme in AGX. When binding
tables are used, they can be mapped to texture state registers. Otherwise, there
is bindless access available.
It would be nice to map OpenGL's binding table based textures and images to AGX
texture state registers 1:1. The problem is that OpenGL allows more combined
textures and images than we necessarily have texture state registers. So, we use
as many texture state registers as we can, and then we fallback on an internal
bindless scheme mapping an extended binding table.
Add and use a lowering pass to map all of the API-level texture/image indices to
either texture state registers or bindless handles as required.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24258>