Use ACCESS_* flags in call sites instead of GLC/DLC/SLC.
ACCESS_* flags are extended to describe other aspects of memory instructions
like load/store/atomic/smem.
Then add a function that converts the access flags to GLC, DLC, SLC.
The new functions are also usable by ACO.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22770>
Refactor framebuffer to renderpass to mirror previous INTEL_MEASURE
changes.
We dump hashes/pointers for shaders and framebuffer/renderpass.
Reduce from 64bit to 32bit pointers. We don't benefit from the
extra precision and reduced output size is convenient.
Reviewed-by: Mark Janes <markjanes@swizzler.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22723>
This is a huge win because gallivm duplicated the translations in a zillion
places.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22914>
This fixes vaon12 driver file name to be consistent with libva
expectation - vaon12_drv_video.dll - without lib prefix.
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22995>
If either dest or src is an invalid or null pointer, the behavior is
undefined, even if count is zero.
Cc: mesa-stable
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22979>
Add support for using BLORP's ambiguate pass to initialize MCS instead
of mapping and memsetting it on the CPU. Note that this won't be used if
the first operation on the MSAA layer is a fast clear.
Since we're no longer mapping, this removes a blocker towards getting
MCS_CCS enabled in small-BAR mode.
This functionality is difficult to test because of the way iris is set
up. It always tries to compress writes. So, a test would only read the
ambiguated MCS element if it tries to read from undefined samples.
To test this, I locally disabled fast clears and rendering with MCS (via
iris_resource_render_aux_usage). I continued to allow sampling with MCS
in iris_resource_texture_aux_usage. So, writes go directly to the main
surface and reads go through the ambiguated MCS surface.
When I then ran the test group, dEQP-GLES3.functional.multisample.*, all
48/64 supported tests passed on my Ice Lake. If I slightly changed
BLORP's ambiguate pass, I observed several tests failing.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22545>
BOs can be written from several contexts, so writing to this member is
racy. We only care about this for the purposes of exporting BOs after a
submission (and if the app is racing writers/submissions at that point
all bets are off), so just keeping track of the last written value is
sufficient.
Switch to atomic operations to eliminate the race, and drop the assert
in the batch cleanup path that no longer holds when the BO might have
been written to from another context.
Fixes: asahi/mesa#20
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22971>
We track buffers written by batches, but this gets messy when we end up
with an empty batch that is never submitted, since then it might have
taken over writer state from a prior already submitted batch (for its
framebuffers).
Instead of trying to track two tiers of resource writers, let's just
defer initializing batch state until we know we have a draw (or compute
launch, or clear). This means that if a batch is marked as a writer for
a buffer, we know it will not be an empty batch.
This should be a small performance win for empty batches (no need to
emit initial VDM state or run the writer code), but more impontantly it
eliminates the empty batch writer state revert corner case.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22971>
This seems to flake some dEQPs due to some kind of race/UB (which
doesn't even always cause the dEQPs to fail due to leeway in the image
comparison, since the problem is usually just a few pixels, but it's
there).
I spent a bunch of time trying other flags/things, and almost everything
changed the bad pixel pattern randomly but nothing fixed it. Let's
revisit this one later, since it looks like a pretty deep rabbit hole.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22971>
Framebuffer fetch is coherent, so there is no need for barriers here.
This avoids pointless flushing if an app calls glBlendBarrier().
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22971>
Apparently we were still missing some fence stuff, and it started
crashing Firefox in apitrace? I'm not sure why we never noticed this
before, but it's trivial enough. Cargo culted from Panfrost.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22971>
The base nir options were assuming all bit sizes were supported at
shader model 6.2. Multiple callers were then changing properties
based on actual support.
Standardize behavior by providing the majority of things that can
impact nir options when getting them. Some callers (e.g. meta blit
shaders or libclc) don't bother, because they are known to have
contents that are unaffected by these options. Other callers might
munge more properties afterwards, but this minimizes that.
Note that lower_helper_invocation was incorrectly being turned off
for SM6.6+ by some callers, despite load_helper_invocation being
unimplemented by the backend.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22952>
If we're compiling shaders in shader-db, with shader-db's ./run and
ZINK_DEBUG=shaderdb, we won't get much state set on the graphics pipeline, since
shader-db doesn't actually do any rendering. For a driver like RADV, that is
*almost* ok... Since we use dynamic vertex input, we don't need to make up any
state for vertex inputs; since we use dynamic rendering, we don't need to make
up any render attachments. All of that being said, we *do* need to make up a
blend state to ensure that the Vulkan driver doesn't optimize away all of
store_derefs in the fragment shader (and in turn, optimize the entire fragment
shader away, if there are no image/SSBO writes.) So set the obvious blend state,
fixing fragment shaders in shader-db with zink + radv.
I don't know why other people would want to use Zink with shader-db, but for me
it's an easy way to test ACO, at least until radeonsi gains aco support.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22948>
This reverts commit 5489033fa8.
The problem I was trying to address is that we were programming the
3DSTATE_PS::PositionXYOffsetSelect bit differently with GPL (CENTROID)
than without (NONE).
I failed to understand that this bit also impacts the thread payload
layout. GPL fragment shaders don't know ahead of time if pos_offset is
going to be used. It'll be choosen at runtime base on push constant
bits. So we need to program this bit different just to have a payload
matching the compiled shader code.
This fixes the freedoom replay with GPL FS shader in SIMD32.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22938>
It is very common in games to read just the .x channel of a vec4 shadow
result (since GL defaults to either LUMINANCE or RED depth mode depending
on context). So, we can avoid shader recompiles to handle the other
components, in that case.
Fixes some recompiles in CS:GO.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22912>
If we have multiple LRZ clears, emit them all at once. This also avoids
redundant LRZ clears if app does multiple clears in sequence.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22895>
Just the scaffolding for now, nothing actually creates multiple sub-
passes yet. For now, only planning to use this for a6xx, as other
gens are doing clears on 3d.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22895>
This will give better visibility in perfetto, and prepares for the next
commit where we could have per-subpass clears.
While we are at it, start adopting vulkan terms for tile load/store. No
need to be pointlessly different.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22895>
batch->fast_cleared will be per-subpass. But we can use the cleared
bitmask instead in the few places where we just need to know if there
was a clear in any subpass. For the conditional-ib it is even
preferable since we know a clear touched the contents of the tile so
we know what the result of the conditional would be.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22895>