When a new vertex shader is bound, the VS prolog needs to be
re-emitted, and this allows us to avoid tracking if the pipeline is
dirty.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22944>
Up until now, we have been initializing MCS with fast clears. This is
mostly safe, but there's a corner case that can be an issue.
The issue is with a workaround for MCS that requires the sampler not see
any fast-cleared blocks for certain surfaces (14013111325). Even though
we have been initializing MCS with fast clears, we expect most
applications to be safe because we expect that they would only sample
the samples they've rendered to previously (and the render would've
removed the fast-cleared blocks). In other words we don't expect that
apps would transition from VK_IMAGE_LAYOUT_UNDEFINED to
VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL and start sampling immediately.
If an application took the unexpected path of sampling undefined
samples, it's possible they'd hit the issue described in the workaround.
Fix this corner case by using an ambiguate to initialize MCS.
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22545>
The comment above the warning explains that not all bit patterns are
necessarily valid. While we're at it, fix a typo in that comment.
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22545>
Add support for using BLORP's ambiguate pass to initialize MCS instead
of mapping and memsetting it on the CPU. Note that this won't be used if
the first operation on the MSAA layer is a fast clear.
Since we're no longer mapping, this removes a blocker towards getting
MCS_CCS enabled in small-BAR mode.
This functionality is difficult to test because of the way iris is set
up. It always tries to compress writes. So, a test would only read the
ambiguated MCS element if it tries to read from undefined samples.
To test this, I locally disabled fast clears and rendering with MCS (via
iris_resource_render_aux_usage). I continued to allow sampling with MCS
in iris_resource_texture_aux_usage. So, writes go directly to the main
surface and reads go through the ambiguated MCS surface.
When I then ran the test group, dEQP-GLES3.functional.multisample.*, all
48/64 supported tests passed on my Ice Lake. If I slightly changed
BLORP's ambiguate pass, I observed several tests failing.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22545>
Implement the ambiguate operation for MCS. This clears MCS layers with a
sample-dependent "uncompressed" value that tells the sampler to go look
at the main surface.
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22545>
Select a horizontal alignment value that matches the main MSAA surface.
We need a valid horizontal alignment to perform MCS ambiguates. The
halign value doesn't actually affect test behavior, but it is validated
by isl_surf_fill_state. We currently have an invalid halign for gfx125.
This patch fixes that.
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22545>
This tests the following matrix:
- Format: RGBA8Unorm, RGBA16Unorm, RGBA32Float
- Samples: 2 or 4
- Layers: 1 or 2
- Width: Interesting values 1..4097
- Height: Interesting values 1..4097
Compression is based on the dimensions (that is, everything that can be
compressed is). This test compares both the total texture size and the
compression metadata offset.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22971>
We need to support up to 16 bytes/sample * 4 samples/pixel = 64 bytes/pixel for
multisampling to work with formats like RGBA32F.
Fixes dEQP-GLES3.functional.fbo.msaa.4_samples.rgba32f
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22971>
For multisampled textures, the decision about whether to compress or not
is based on the effective width and height in samples, not pixels.
Introduce ail_can_compress() to encode this logic in ail, so the driver
can use it to decide whether to compress or not before the full layout
is determined.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22971>
BOs can be written from several contexts, so writing to this member is
racy. We only care about this for the purposes of exporting BOs after a
submission (and if the app is racing writers/submissions at that point
all bets are off), so just keeping track of the last written value is
sufficient.
Switch to atomic operations to eliminate the race, and drop the assert
in the batch cleanup path that no longer holds when the BO might have
been written to from another context.
Fixes: asahi/mesa#20
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22971>
We track buffers written by batches, but this gets messy when we end up
with an empty batch that is never submitted, since then it might have
taken over writer state from a prior already submitted batch (for its
framebuffers).
Instead of trying to track two tiers of resource writers, let's just
defer initializing batch state until we know we have a draw (or compute
launch, or clear). This means that if a batch is marked as a writer for
a buffer, we know it will not be an empty batch.
This should be a small performance win for empty batches (no need to
emit initial VDM state or run the writer code), but more impontantly it
eliminates the empty batch writer state revert corner case.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22971>
Core and opfifo stuff from the compute helper blob, vm_slot because it
was the only one changing when I poked around yesterday and it hit me
what it was ^^
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22971>
This seems to flake some dEQPs due to some kind of race/UB (which
doesn't even always cause the dEQPs to fail due to leeway in the image
comparison, since the problem is usually just a few pixels, but it's
there).
I spent a bunch of time trying other flags/things, and almost everything
changed the bad pixel pattern randomly but nothing fixed it. Let's
revisit this one later, since it looks like a pretty deep rabbit hole.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22971>
Framebuffer fetch is coherent, so there is no need for barriers here.
This avoids pointless flushing if an app calls glBlendBarrier().
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22971>
Apparently we were still missing some fence stuff, and it started
crashing Firefox in apitrace? I'm not sure why we never noticed this
before, but it's trivial enough. Cargo culted from Panfrost.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22971>
With the shader cache enabled, intel_clc attempts to write to ~/.cache.
Many distributions' build systems limit file-system access, and will
kill the process thus causing the build to fail.
Fixes: 639665053f ("anv/grl: Build OpenCL kernels")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22968>
VK_EXTERNAL_MEMORY_FEATURE_DEDICATED_ONLY_BIT should always be set, as
required by the spec.
VK_EXTERNAL_MEMORY_FEATURE_EXPORTABLE_BIT should be set when
radv_ahb_format_for_vk_format knowns the format. That is,
radv_create_ahb_memory should at least know how to call
AHardwareBuffer_allocate.
VK_EXTERNAL_MEMORY_FEATURE_IMPORTABLE_BIT is always set. We can't know
if gralloc can allocate the format/flags/usage combo or not (gralloc
might use a private format for the combo).
Fixed
dEQP-VK.api.external.memory.android_hardware_buffer.image_formats.*.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22619>
anv_ahb_format_for_vk_format needs to know the format at least. There
is no guarantee that AHardwareBuffer_allocate will succeed, but we are
reluctant to check with AHardwareBuffer_isSupported which may
test-allocate internally and is expensive.
v2: add anv_ahb_format_for_vk_format to anv_android_stubs.c
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22619>
When allocating a VkDeviceMemory exportable as AHB, it seems incorrect
to fall back to AHARDWAREBUFFER_FORMAT_BLOB when the image has no known
AHB format. We should fail the allocation instead.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22619>
The base nir options were assuming all bit sizes were supported at
shader model 6.2. Multiple callers were then changing properties
based on actual support.
Standardize behavior by providing the majority of things that can
impact nir options when getting them. Some callers (e.g. meta blit
shaders or libclc) don't bother, because they are known to have
contents that are unaffected by these options. Other callers might
munge more properties afterwards, but this minimizes that.
Note that lower_helper_invocation was incorrectly being turned off
for SM6.6+ by some callers, despite load_helper_invocation being
unimplemented by the backend.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22952>
If we're compiling shaders in shader-db, with shader-db's ./run and
ZINK_DEBUG=shaderdb, we won't get much state set on the graphics pipeline, since
shader-db doesn't actually do any rendering. For a driver like RADV, that is
*almost* ok... Since we use dynamic vertex input, we don't need to make up any
state for vertex inputs; since we use dynamic rendering, we don't need to make
up any render attachments. All of that being said, we *do* need to make up a
blend state to ensure that the Vulkan driver doesn't optimize away all of
store_derefs in the fragment shader (and in turn, optimize the entire fragment
shader away, if there are no image/SSBO writes.) So set the obvious blend state,
fixing fragment shaders in shader-db with zink + radv.
I don't know why other people would want to use Zink with shader-db, but for me
it's an easy way to test ACO, at least until radeonsi gains aco support.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22948>