We track buffers written by batches, but this gets messy when we end up
with an empty batch that is never submitted, since then it might have
taken over writer state from a prior already submitted batch (for its
framebuffers).
Instead of trying to track two tiers of resource writers, let's just
defer initializing batch state until we know we have a draw (or compute
launch, or clear). This means that if a batch is marked as a writer for
a buffer, we know it will not be an empty batch.
This should be a small performance win for empty batches (no need to
emit initial VDM state or run the writer code), but more impontantly it
eliminates the empty batch writer state revert corner case.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22971>
Core and opfifo stuff from the compute helper blob, vm_slot because it
was the only one changing when I poked around yesterday and it hit me
what it was ^^
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22971>
This seems to flake some dEQPs due to some kind of race/UB (which
doesn't even always cause the dEQPs to fail due to leeway in the image
comparison, since the problem is usually just a few pixels, but it's
there).
I spent a bunch of time trying other flags/things, and almost everything
changed the bad pixel pattern randomly but nothing fixed it. Let's
revisit this one later, since it looks like a pretty deep rabbit hole.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22971>
Framebuffer fetch is coherent, so there is no need for barriers here.
This avoids pointless flushing if an app calls glBlendBarrier().
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22971>
Apparently we were still missing some fence stuff, and it started
crashing Firefox in apitrace? I'm not sure why we never noticed this
before, but it's trivial enough. Cargo culted from Panfrost.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22971>
With the shader cache enabled, intel_clc attempts to write to ~/.cache.
Many distributions' build systems limit file-system access, and will
kill the process thus causing the build to fail.
Fixes: 639665053f ("anv/grl: Build OpenCL kernels")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22968>
VK_EXTERNAL_MEMORY_FEATURE_DEDICATED_ONLY_BIT should always be set, as
required by the spec.
VK_EXTERNAL_MEMORY_FEATURE_EXPORTABLE_BIT should be set when
radv_ahb_format_for_vk_format knowns the format. That is,
radv_create_ahb_memory should at least know how to call
AHardwareBuffer_allocate.
VK_EXTERNAL_MEMORY_FEATURE_IMPORTABLE_BIT is always set. We can't know
if gralloc can allocate the format/flags/usage combo or not (gralloc
might use a private format for the combo).
Fixed
dEQP-VK.api.external.memory.android_hardware_buffer.image_formats.*.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22619>
anv_ahb_format_for_vk_format needs to know the format at least. There
is no guarantee that AHardwareBuffer_allocate will succeed, but we are
reluctant to check with AHardwareBuffer_isSupported which may
test-allocate internally and is expensive.
v2: add anv_ahb_format_for_vk_format to anv_android_stubs.c
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22619>
When allocating a VkDeviceMemory exportable as AHB, it seems incorrect
to fall back to AHARDWAREBUFFER_FORMAT_BLOB when the image has no known
AHB format. We should fail the allocation instead.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22619>
The base nir options were assuming all bit sizes were supported at
shader model 6.2. Multiple callers were then changing properties
based on actual support.
Standardize behavior by providing the majority of things that can
impact nir options when getting them. Some callers (e.g. meta blit
shaders or libclc) don't bother, because they are known to have
contents that are unaffected by these options. Other callers might
munge more properties afterwards, but this minimizes that.
Note that lower_helper_invocation was incorrectly being turned off
for SM6.6+ by some callers, despite load_helper_invocation being
unimplemented by the backend.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22952>
If we're compiling shaders in shader-db, with shader-db's ./run and
ZINK_DEBUG=shaderdb, we won't get much state set on the graphics pipeline, since
shader-db doesn't actually do any rendering. For a driver like RADV, that is
*almost* ok... Since we use dynamic vertex input, we don't need to make up any
state for vertex inputs; since we use dynamic rendering, we don't need to make
up any render attachments. All of that being said, we *do* need to make up a
blend state to ensure that the Vulkan driver doesn't optimize away all of
store_derefs in the fragment shader (and in turn, optimize the entire fragment
shader away, if there are no image/SSBO writes.) So set the obvious blend state,
fixing fragment shaders in shader-db with zink + radv.
I don't know why other people would want to use Zink with shader-db, but for me
it's an easy way to test ACO, at least until radeonsi gains aco support.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22948>
Due the lack of APIs to set mmap modes, Xe KMD can't support the same
memory types as i915.
So here adding a i915 and Xe function to set memory types supported
by each KMD.
Iris function iris_xe_bo_flags_to_mmap_mode() has a table with all the
mmaps modes of each type of placement.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22906>
We have an imad instruction and our iadd has a small immediate shift on the
second source. Together, these allow expressing lots of integer multiplies more
efficiently. Add some rules to optimize these now that the backend compiler can
ingest the optimized forms.
Half-register changes are from load_const scheduling changing in some vertex
shaders.
total instructions in shared programs: 1539092 -> 1537949 (-0.07%)
instructions in affected programs: 167896 -> 166753 (-0.68%)
total bytes in shared programs: 10543012 -> 10533866 (-0.09%)
bytes in affected programs: 1218068 -> 1208922 (-0.75%)
total halfregs in shared programs: 483180 -> 483448 (0.06%)
halfregs in affected programs: 1942 -> 2210 (13.80%)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22695>
Models `(a * b) + (c << d)` in general, as implemented in various forms on AGX.
This will be fused with backend NIR opt algebraic rules, both for the literal
pattern as well as to strength reduce certain multiplications, e.g. replacing
a * 5 with `a + (a << 2)` expressed as imadshl_agx(a, 1, a, 2).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22695>
We have a few ALU instructions that take a constant source. Technically, they
have a swizzle so you can't just nir_src_as_uint them, even though a bunch of
backends do. To help backends do the right thing, add a helper that's just as
easy to use that will chase the swizzle properly.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22695>
We're missing it for the memcpy with streamout
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 5cc4075f95 ("anv, iris: Add Wa_16011411144 for DG2")
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22930>
This can not happen because the post-RA optimizer doesn't support sub dword
writes at the moment, but everytime I look at this I wonder if there might
be a bug here.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22821>
This reverts commit 5489033fa8.
The problem I was trying to address is that we were programming the
3DSTATE_PS::PositionXYOffsetSelect bit differently with GPL (CENTROID)
than without (NONE).
I failed to understand that this bit also impacts the thread payload
layout. GPL fragment shaders don't know ahead of time if pos_offset is
going to be used. It'll be choosen at runtime base on push constant
bits. So we need to program this bit different just to have a payload
matching the compiled shader code.
This fixes the freedoom replay with GPL FS shader in SIMD32.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22938>