Freedreno needs to know when an image has volatile or coherent access
flags in the shader.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20612>
Implement the get_decoder_fence vfunc. Note that the waiting for
completion in this driver happens in the end_frame vfunc itself.
Signed-off-by: Sil Vilerino <sivileri@microsoft.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20133>
Implement the get_decoder_fence vfunc by waiting on the fence
previously passed in the end_frame vfunc.
Signed-off-by: Daniel Almeida <daniel.almeida@collabora.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20133>
Implement the get_decoder_fence vfunc by waiting on the fence
previously passed in the end_frame vfunc.
Signed-off-by: Daniel Almeida <daniel.almeida@collabora.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20133>
Implement the get_decoder_fence vfunc by waiting on the fence
previously passed in picture->fence in the end_frame vfunc.
Signed-off-by: Daniel Almeida <daniel.almeida@collabora.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20133>
Implement the get_decoder_fence vfunc by waiting on the fence
previously passed in picture->fence in the end_frame vfunc.
Signed-off-by: Daniel Almeida <daniel.almeida@collabora.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20133>
Use the new get_decoder_fence vfunc to implement
vaQuerySurfaceStatus and vaSyncSurface in the va state tracker.
A pointer to the surface's fence is passed to the codecs before the
end_frame vfunc and the codec is responsible for allocating a fence on
command stream submission.
This fence is then queried on vaQuerySurfaceStatus and waited on in
vaSyncSurface.
Notably both functions were not implemented as per the VA-API docs for
PIPE_VIDEO_ENTRYPOINT_BITSTREAM.
Signed-off-by: Daniel Almeida <daniel.almeida@collabora.com>
Reviewed-by: Sil Vilerino <sivileri@microsoft.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20133>
Add PIPE_DEFAULT_DECODER_FEEDBACK_TIMEOUT_NS as a way to control
how much to wait for decoders if this is supported.
Signed-off-by: Daniel Almeida <daniel.almeida@collabora.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20133>
Add a get_decoder_fence vfunc that can be used to query the status
of the previous decode job denoted by 'fence' given 'timeout'.
A pointer to a fence pointer can be passed to the codecs before the
end_frame vfunc and the codec should then be responsible for allocating
a fence on command stream submission.
Signed-off-by: Daniel Almeida <daniel.almeida@collabora.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20133>
Only construct the key on-demand if the PROG state is dirty. The newly
added "virtual" PROG_KEY state is used to know when other state that the
shader key depends on changes. Worth ~13% at drawoverhead test 0.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20572>
A pretty significant amount of time spent in fd6_draw_vbo is calling
memset to zero init the on-stack struct. And a big part of the size
of the struct is fd6_state, of which we only need to initialize
num_groups to zero. This is worth a 15% improvement in drawoverhead
test 0 ("no state change").
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20572>
We already overwrote the entire descriptor in patch_fb_read_sysmem().
Doing the same in patch_fb_read_gmem() will simplify things for moving
the fb_read descriptor to the FS's bindless descriptor set.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20572>
Split out the build-up of CP_SET_DRAW_STATE packet, as we are going to
want to re-use this for compute state later when we switch to bindless
IBO descriptors.
While we are at it, drop the enable_mask param, as this is determined
solely by the group_id, and it is easier to maintain a table for the
handful of exceptions to ENABLE_ALL. The compiler should be able to
optimize away the table lookup.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20572>
If a shader's sampler state is dirty often, the sampler descriptor heap
can get used up quickly, forcing flushing. If that happens quickly, we
run out of batches and have to wait for batches to finish on the GPU.
When this happens, it is often because the sampler state is switching,
not because it's truly unique. This change hashes and saves sampler
descriptor tables that can be reused in subsequent draws in the same
batch, instead of re-copying the same descriptors and consuming the
heap.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20618>
The maximum number of mipmap levels supported for cubemap can be
determined from the maximum 2D texture size. There is no need
to limit the max to 12.
This fixes a regression in creating GL4.1 and up context since
commit 2658d02516 is now explicitly checking for
MaxCubeTextureLevels >= 15 for GL4.1 context.
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20600>
We are now extremelly careful when copy propagating a mov that uses
relative addressing. The search for readers will trigger abort when it
sees any other instruction using a relative addressing, irrespective of
the actual used registers or whether an address register load was seen.
Additionally, since ntt switch all movs using the relative addressing are
actually used only once right on the next line, and are result of ntt converting
vec4 32 ssa_10 = intrinsic load_ubo_vec4 (ssa_0, ssa_9) (access=0, base=11, component=0)
into
5: ARL ADDR[0].x, TEMP[0].xxxx
6: MOV TEMP[2], CONST[0][ADDR[0].x+11]
RV530 shader-db:
total instructions in shared programs: 132966 -> 131904 (-0.80%)
instructions in affected programs: 29896 -> 28834 (-3.55%)
helped: 234
HURT: 2
total temps in shared programs: 16969 -> 16905 (-0.38%)
temps in affected programs: 604 -> 540 (-10.60%)
helped: 68
HURT: 12
Partial fix for: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7723
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Filip Gawin <filip@gawin.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20577>
We can use the new functionality in the draw-helper to implement
stippled smooth lines instead of what we currently do, which is aliased
stipping on smooth lines.
Reviewed-by: Soroush Kashani <soroush.kashani@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20134>
When computing line smoothing, we can also do something similar to
compute the line stippling. This can be useful for some drivers, who
can't easily split the lines before rasterizing them.
This does lead to slightly inaccurate stippling, because the
line-smoothing extends the line-length by a small amount. That leads to
the line-stippling pattern being over-stretched over the line-segment by
a fraction of a pixel in lenght. For short lines, this can be quite a
lot of error.
Reviewed-by: Soroush Kashani <soroush.kashani@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20134>
I'm convinced that u_blitter interactions with fbfetch can't be handled
in si_update_ps_colorbuf0_slot alone, so it has to be force-disabled
by si_blitter_begin. Another reason why it has to be disabled for u_blitter
and not ignored is because FBFETCH with MSAA enables sample shading
regardless of context states, and we don't want that for u_blitter.
Also, si_update_ps_colorbuf0_slot now disables FBFETCH explicitly before
its own DCC and CMASK decompression because even though u_blitter can't do
anything (due to blitter_running), si_blitter_end calls it too.
The result is that no recursion can occur thanks to the blitter_running
and suppress_update_ps_colorbuf0_slot flags, and FBFETCH is always
force-disabled before those flags are set, which is the state we want
to be in.
Fixes: bc6d22b920 ("radeonsi: fix ps_uses_fbfetch value")
Acked-by: Yogesh Mohan Marimuthu <yogesh.mohanmarimuthu@amd.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20318>
TS sharing isn't fully stable yet. There are some fixes pending, but they
don't take care of all reported issues. Hide TS sharing behind a debug
switch until all the known issues are resolved.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20606>
Comparing the 64bit clear value to the lower half 32bit clear state is
obviously wrong and results in a lot of false positives.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20586>
Instead of replicating the lower half of the clear value, properly
use the upper half to program the second clear value BLT state.
CC: mesa-stable
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20586>
Align the sampler view creation condition with the image and buffer
creation usage which maps to PIPE_BIND_SAMPLER_VIEW, which fixes the spam
of "Illegal sampler view creation without bind flag". Also fix the
PIPE_BIND_SHADER_IMAGE assignment for image usage bits and avoid setting
the image view struct if without PIPE_BIND_SHADER_IMAGE.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20580>
Rather than a grab-bag of random values, return 'Mesa' as the GL_VENDOR
string for all community-supported drivers.
Drivers which are primarily developed/maintained by the hardware vendor
retain that vendor's name as the GL_VENDOR string.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16064>
if the render area exceeds the attachment size, this is not only illegal,
it will crash later
cc: mesa-stable
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20583>