For drivers already using vk_common_ResetCommandBuffer(), it now only
calls the driver's reset hook if the command buffer is not in the
INITIAL state. Pulled this trick from the PowerVR driver.
v2 (Jason Ekstrand):
- Rename from "status" to "state" since that's what's in the spec
- Add vk_command_buffer_begin/end instead of drivers setting it all
manually
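As a rough illustration of the reset behaviour described above, here is
a toy model in Python. The class, the hook signature and the reduced
set of states are all assumptions made for the sketch; they are not the
actual vk_command_buffer code.

```python
from enum import Enum, auto


class State(Enum):
    # Only a subset of the spec's command buffer states, for brevity.
    INITIAL = auto()
    RECORDING = auto()
    EXECUTABLE = auto()


class CommandBuffer:
    """Toy model; names are illustrative, not Mesa's."""

    def __init__(self, reset_hook):
        self.state = State.INITIAL
        self.reset_hook = reset_hook

    def begin(self):
        self.state = State.RECORDING

    def end(self):
        self.state = State.EXECUTABLE

    def reset(self):
        # Skip the driver hook when there is nothing to undo.
        if self.state != State.INITIAL:
            self.reset_hook(self)
            self.state = State.INITIAL


calls = []
cmd = CommandBuffer(reset_hook=lambda cb: calls.append("reset"))
cmd.reset()   # still INITIAL: the hook is not called
cmd.begin()
cmd.end()
cmd.reset()   # EXECUTABLE: the hook runs once
```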
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16922>
We need this info for gfx11 param export soon and nir vertex
export lowering in the future.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19429>
It's going to be called by si_get_nir_shader.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19429>
Clamp vertex color in nir. Now only the GS copy shader uses
si_vertex_color_clamping, so move it there. It will be
completely removed after we switch to nir GS copy shader.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19429>
We know cmdstream buffers are immediately mmap'd, which is both
expensive on the host, and breaks the pipelining as guest is forced
to stall waiting for the host. So pre-allocate some cmdstream
buffers, so that we have something that is (hopefully) already
allocated and mapped to guest's physical memory before we need it.
The older buffer from the head of the prealloc list replaces the
newly allocated buffer which is pushed to the tail of the prealloc
list.
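The head/tail rotation can be sketched roughly as below. This is a
simplified Python model, not the driver code: PreallocPool, its depth
parameter and the string "buffers" are made up for illustration.

```python
from collections import deque


class PreallocPool:
    """Hypothetical pool that keeps `depth` buffers allocated ahead of use."""

    def __init__(self, allocate, depth=2):
        self._allocate = allocate
        # Allocate up front, so these have time to get mapped before use.
        self._ready = deque(allocate() for _ in range(depth))

    def get(self):
        # The newly allocated buffer goes to the tail of the list...
        self._ready.append(self._allocate())
        # ...and the caller gets the oldest one from the head instead.
        return self._ready.popleft()


counter = iter(range(100))
pool = PreallocPool(allocate=lambda: f"bo{next(counter)}", depth=2)
first = pool.get()
second = pool.get()
```

The caller never receives the buffer it just triggered the allocation
of, which is what gives the host time to finish the mapping.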
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19656>
Using the same size that we suballoc from for suballoc'd streaming and
long-lived stateobjs should help improve bo cache usage, by making more
of the backing BOs the same size and interchangeable.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19656>
- consistently use list.extend instead of list +=, which has gotchas
- condense list extension calls when possible
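One well-known `+=` gotcha (not necessarily the one that motivated this
change) is that `+=` is an assignment, so inside a nested function it
makes the name local. A minimal sketch with made-up helper names:

```python
def make_collector_with_extend():
    items = []

    def add(more):
        items.extend(more)  # mutates the enclosing list in place
        return items

    return add


def make_collector_with_iadd():
    items = []

    def add(more):
        items += more  # assignment makes `items` local to add()
        return items

    return add


ok = make_collector_with_extend()(["-V"])

try:
    make_collector_with_iadd()(["-V"])
    failed = False
except UnboundLocalError:
    failed = True
```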
Reviewed-by: Luis Felipe Strano Moraes <luis.strano@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19449>
Since we're not doing anything fancy, we can just use `subprocess.run`.
I've also removed the custom error class; we're not going to catch it,
so just printing and exiting is fine.
v2:
- Print stdout as well as stderr in case of a glslang failure
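A minimal sketch of the pattern, assuming a helper called run_tool
(hypothetical name) and using the current Python interpreter as a
stand-in for glslang so that it runs anywhere:

```python
import subprocess
import sys


def run_tool(cmd):
    """Run cmd; on failure print both streams and exit."""
    ret = subprocess.run(cmd, stdout=subprocess.PIPE,
                         stderr=subprocess.PIPE, text=True)
    if ret.returncode != 0:
        print(ret.stdout)
        print(ret.stderr, file=sys.stderr)
        sys.exit(1)
    return ret.stdout


# Stand-in commands; the real script would invoke glslang here.
out = run_tool([sys.executable, "-c", "print('ok')"])

try:
    run_tool([sys.executable, "-c", "import sys; sys.exit(3)"])
    exited = False
except SystemExit:
    exited = True
```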
Reviewed-by: Luis Felipe Strano Moraes <luis.strano@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19449>
We expect that convert_to_static_variable and override_version will find
and replace something, so let's fail loudly if they don't.
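One way to get that behaviour, sketched with `re.subn`, which returns
the number of substitutions made. The helper name, the pattern and the
input string are illustrative assumptions, not the script's real code.

```python
import re


def replace_or_fail(pattern, repl, text):
    """Substitute, raising if the pattern did not match anything."""
    new_text, count = re.subn(pattern, repl, text)
    if count == 0:
        raise RuntimeError(f"pattern {pattern!r} matched nothing")
    return new_text


src = "const uint32_t shader[] = {0};"
out = replace_or_fail(r"^const", "static const", src)

try:
    replace_or_fail(r"^nonexistent", "x", src)
    raised = False
except RuntimeError:
    raised = True
```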
Reviewed-by: Luis Felipe Strano Moraes <luis.strano@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19449>
I'm not 100% sure whether it's right to make --vn required, or to avoid
the static conversion, but this seems correct. Mypy (type checking
coming soon) points out that if --vn is None then the
convert_to_static_variable function will fail. Our one use of this sets
--vn, so there is no change there. Making --vn required
ensures that it will never be None, avoiding the problem.
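For reference, this is how a required option behaves in argparse. The
parser below is a stripped-down stand-in with a made-up help string,
not the script's actual argument list.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--vn", dest="vn", required=True,
                    help="variable name (description is illustrative)")

# With the option present, args.vn is a str and never None.
args = parser.parse_args(["--vn", "my_shader"])

# Without it, argparse prints a usage error and exits.
try:
    parser.parse_args([])
    missing_ok = True
except SystemExit:
    missing_ok = False
```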
Reviewed-by: Luis Felipe Strano Moraes <luis.strano@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19449>
The variable is called `extra`, but what's written is `extra - flags`,
and `flags` is undefined, so if the variable was ever passed there would
be an uncaught exception.
fixes: 9786d9ef2a
Reviewed-by: Luis Felipe Strano Moraes <luis.strano@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19449>
In Python 3 (the only python we support) `io.open` is an alias of the
builtin `open` function, so it's not getting us anything, and we're not
using it consistently.
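This is easy to confirm from an interpreter:

```python
import io

# In Python 3 both names refer to the same function object.
same = io.open is open
```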
Reviewed-by: Luis Felipe Strano Moraes <luis.strano@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19449>
args.Olib is set to `store_true`, which means it will always be `True`
or `False`. This means that we always, unconditionally, add
`--keep-uncalled` to the command line.
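A sketch of the pitfall, assuming the old test was of the
`is not None` kind (the exact expression is not quoted here, so treat
that as a guess). The command list is a placeholder.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--Olib", action="store_true")

args = parser.parse_args([])  # flag not passed

# store_true defaults to False, never None, so this is always True.
always_true = args.Olib is not None

# Testing the boolean itself only adds the flag when it was requested.
cmd = ["glslang"]
if args.Olib:
    cmd.append("--keep-uncalled")
```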
fixes: 9786d9ef2a
Reviewed-by: Luis Felipe Strano Moraes <luis.strano@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19449>
aztec_ruins under ANGLE was getting LRZ writes disabled because 0xf out of
the 0x3 mask was enabled. The goal was to see if there are partial writes
being done, though. This caused a 2-3% performance regression.
Fixes: 85d0205db1 ("tu: Implement extendedDynamicState3ColorWriteMask")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19635>
The nir_opt_algebraic() call to clean up nir_lower_imul's split up mul
operations (stuff like "the top 16 bits were 0, no need to mul and add
that part") would trigger the options->fuse_ffma_* early ffma splitting,
so you need to call nir_opt_algebraic_late() again after that (which, in
turn, requires a DCE).
Gets us a lot more ffmas in Aztec Ruins high under zink/angle, but doesn't
seem to change perf.
shader-db highlights:
total instructions in shared programs: 11574843 -> 10999629 (-4.97%)
instructions in affected programs: 3308870 -> 2733656 (-17.38%)
total dwords in shared programs: 24344722 -> 23230122 (-4.58%)
dwords in affected programs: 6569568 -> 5454968 (-16.97%)
total full in shared programs: 762616 -> 762224 (-0.05%)
full in affected programs: 15505 -> 15113 (-2.53%)
total stp in shared programs: 4046 -> 4050 (0.10%)
stp in affected programs: 3372 -> 3376 (0.12%)
total ldp in shared programs: 2166 -> 2170 (0.18%)
ldp in affected programs: 1716 -> 1720 (0.23%)
total (ss) in shared programs: 219541 -> 216261 (-1.49%)
(ss) in affected programs: 23227 -> 19947 (-14.12%)
total (sy) in shared programs: 101633 -> 101927 (0.29%)
(sy) in affected programs: 8611 -> 8905 (3.41%)
total waves in shared programs: 1501942 -> 1501772 (-0.01%)
waves in affected programs: 1880 -> 1710 (-9.04%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18946>
If we're in handle_collect()'s dst allocation and are part of a merge set
near the end of the file, our check for reg_elem_size(reg) would let us
use the preferred reg when that would immediately lead to
allocate_dst_fixed() creating an interval extending through reg_size(reg)
that overflows the file.
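Reduced to plain integers, the difference between the two checks looks
roughly like this. The function names and the numbers are invented for
the sketch and ignore everything else the allocator tracks.

```python
def fits_elem_only(reg, elem_size, file_size):
    # Only asks whether a single element fits at `reg`.
    return reg + elem_size <= file_size


def fits_whole(reg, size, file_size):
    # Asks whether the whole interval starting at `reg` fits.
    return reg + size <= file_size


FILE_SIZE = 8
preferred, elem_size, reg_size = 6, 1, 4

element_check = fits_elem_only(preferred, elem_size, FILE_SIZE)
interval_check = fits_whole(preferred, reg_size, FILE_SIZE)
```

Here the element-sized check accepts the preferred reg even though the
full interval would run past the end of the file.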
Avoids a regression on gfxbench5/gl_5_high_off/17.shader_test in the next
commit. No change on shader-db.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18946>
While Panfrost allocates linear images with strides that are a multiple of 64
bytes, other dma-buf producers on the system may not satisfy this requirement.
However, at least on v7 and newer, any image with a regular format must have a
stride that is a multiple of 64 bytes.
This fixes a real bug in an application that created a linear R8_UNORM image
with stride 480 bytes, imported it as an EGL_image, and then tried to texture
from it with the GPU. Previously, the driver allowed this situation but it
resulted in an imprecise fault from the GPU. This patch corrects the driver to
reject the import as invalid due to the unaligned stride, ensuring we never
attempt to texture from such a resource.
To implement, we add some new layout queries to centralize knowledge about the
stride alignment requirements, and we sprinkle in asserts to show how the
invariant is upheld throughout the lifecycle of image creation to texturing.
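The arithmetic for the failing case, as a small Python check. The
constant and function names are illustrative, not the names of the new
layout queries.

```python
STRIDE_ALIGN = 64  # bytes, per the requirement described above


def stride_is_valid(stride):
    """True if a linear image stride meets the 64-byte alignment."""
    return stride % STRIDE_ALIGN == 0


remainder = 480 % STRIDE_ALIGN       # 480 = 7 * 64 + 32
rejected = not stride_is_valid(480)  # the import from the bug report
accepted = stride_is_valid(512)      # next aligned stride up
```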
Cc: mesa-stable
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19620>
For 2D UI workloads and even most 3D workloads, the indirect dispatch shader
won't actually be needed, but we currently compile it during eglInitialize() on
every v7 application. That hurts app start-up time, especially given that this
shader doesn't hit the disk cache. We can instead defer compiling this shader
until it's actually needed, when glDispatchComputeIndirect() gets called.
The tradeoff is that the first glDispatchComputeIndirect() call will be (much)
slower than successive calls, since we need to build and compile this internal
shader. I'm unconvinced that's a problem in practice.
An app would need to call glDispatchComputeIndirect for the first time in the
middle of a scene. 2D apps never would call that, OpenCL doesn't have that, and
GL compute will have the same costs just moved around. So it's down to a 3D
GLES3.1 app that indirectly dispatches compute for the first time in the
middle of a scene. Which, meh? It's not entirely implausible but we have bigger
fish to fry, and this fixes a real problem (about 5% of eglInitialize time spent
building this shader that won't actually get used).
es2_info starts slightly faster with this change.
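The deferral is ordinary lazy initialization. A toy Python model, with
a hypothetical Device class and compile callback standing in for the
real context and shader build:

```python
class Device:
    """Toy model: the internal shader is built on first use only."""

    def __init__(self, compile_fn):
        self._compile = compile_fn
        self._indirect_shader = None  # nothing is compiled at init

    def dispatch_indirect(self):
        # The first call pays the compile cost; later calls reuse it.
        if self._indirect_shader is None:
            self._indirect_shader = self._compile()
        return self._indirect_shader


compiles = []
dev = Device(compile_fn=lambda: compiles.append(1) or "shader")
at_init = len(compiles)
dev.dispatch_indirect()
dev.dispatch_indirect()
after_two_calls = len(compiles)
```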
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19622>
This also removes the pvr_finishme(), as this is an improvement rather than
something we must do.
Signed-off-by: Frank Binns <frank.binns@imgtec.com>
Reviewed-by: Karmjit Mahil <Karmjit.Mahil@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19578>
* The pvr_finishme() in pvr_CreateImage() was added before vk_image_create() was
being used and is no longer relevant.
* There's nothing special we need to do for the graphics pipeline flags and
we don't currently store anything in the pipeline cache, so there's nothing
to finish here.
* The firmware interface now uses fixed sized structures, so remove related
FIXME.
Signed-off-by: Frank Binns <frank.binns@imgtec.com>
Reviewed-by: Karmjit Mahil <Karmjit.Mahil@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19578>
- All the PDS programs set up in the pipeline are necessary. We
can attempt optimisations later on.
- No need to call pvr_pds_program_program_create_and_upload() in
a loop.
Signed-off-by: Karmjit Mahil <Karmjit.Mahil@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19523>