when compute shaders can be precompiled, they can be precompiled asynchronously
which allows the implementation of the parallel shader compile hooks
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18197>
for compute shaders that don't need spec constants or cube lowering,
precompiles are possible and can be performed immediately after disk
cache lookup completes
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18197>
if there are no inline uniforms, nonseamless cubes, or local size use,
then this is the "base" pipeline object that can be reused without checking
the hash table
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18197>
this simplifies the whole compute shader/program architecture and
also compiles compute shaders when apps maybe expect them to be compiled
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18197>
It's busted.
Pushing directly. There are 6 MRs in the Marge queue and we don't have that
kind of time to wait for them to time out.
Acked-by: Rob Clark on IRC
These functions are no longer used since the introduction of common
dispatch.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18284>
Reports d3d12 support for mapping all the contiguous planes.
This will be used by vaDeriveImage in the VA frontend
Fixes: a585d95803 ("radeonsi/vcn: WA 10bit encoding crash in vaapi")
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18300>
vaDeriveImage should check if the underlying gallium driver can map contiguous planes
before skipping with disallowlist.
Fixes: a585d95803 ("radeonsi/vcn: WA 10bit encoding crash in vaapi")
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18300>
Only re-emit the scissor state if viewports or scissors change, and
only re-emit the guardband state if viewports, line width or the
current rasterized primitive change.
This should reduce the number of emitted packets when only the line
width or the rasterized primitive change.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18247>
The guardband state (part of scissors) needs to be re-emitted when
the line width changes. Given this is a dynamic state, it's not
necessary to look at the pipeline line width.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18247>
Color write enable can change CB_TARGET_MASK and emitting a BREAK_BATCH
seems needed for binning. Though, this was broken if this enable bit
changed dynamically for the same pipeline. Split the function to not
increase CPU overhead.
Found by inspection.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18233>
We weren't setting LOCAL, so unless freedreno GL had set it since the GPU
woke up, we wouldn't get it.
This requires moving the GLOBAL unsetting out of tile_store's IB, since it
would never be executed when it mattered, anyway.
No perf difference detected on gfxbench vk-5-normal, or ANGLE minecraft,
genshin, and pubg.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18187>
It's use extern struct util_cpu_caps_t util_cpu_caps that's violate the
restriction that we can not directly access util_cpu_caps
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17803>
The main intention is remove usage of extern struct util_cpu_caps_t util_cpu_caps
so we can mark util_cpu_caps to be static latter
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17803>
This is the prepare for following changes:
* Handling GALLIUM_NOSSE in u_cpu_detect.c
* Handling LP_FORCE_SSE2 and LP_NATIVE_VECTOR_WIDTH in u_cpu_detect.c
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17803>
Add comment about _util_cpu_detect_local that it's can only be called by util_get_cpu_caps
Add comment about util_cpu_caps that it's can only by accessed by util_get_cpu_caps
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17803>
If task redistribution is enabled, then some mesh shaders read
garbage from task payload.
It may be a hardware bug, or it may be our bug. Who knows :(
This change will probably negatively affect performance of task
shader-enabled workloads on multi-slice GPUs, because mesh shaders
will be executed only on the slice where task shader was spawned.
Fixes: ef04caea9b ("anv: Implement Mesh Shading pipeline")
Acked-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16197>