This is useful across drivers for maint5 semantics on mobile hw.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34762>
When copying between buffers, find the biggest possible block size usable
for all copy regions. A common block size is used since using different
block sizes can require additional flushing between different blocks.
Besides the single-byte and 4-byte block sizes, also allow for a 16-byte
block size and the corresponding format. Using a bigger block size when
possible can reduce the number of required CP_BLIT operations. In the
Crucible benchmarks this improves throughput by up to 3x, especially for
larger copy regions.
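
A minimal sketch of the block size selection described above, not the
actual turnip code. It assumes a block size is usable when every offset
and size of every region is a multiple of it; the CopyRegion type and
the common_block_size name are hypothetical.

```rust
/// Hypothetical copy region; the fields mirror VkBufferCopy but are
/// an assumption made for this sketch.
#[derive(Clone, Copy)]
struct CopyRegion {
    src_offset: u64,
    dst_offset: u64,
    size: u64,
}

/// Candidate block sizes in bytes, largest first.
const BLOCK_SIZES: [u64; 3] = [16, 4, 1];

/// Returns the biggest candidate block size that divides every offset
/// and size of every region.
fn common_block_size(regions: &[CopyRegion]) -> u64 {
    // OR all values together: a power-of-two block size divides every
    // value exactly when it divides the combined bits.
    let bits = regions
        .iter()
        .fold(0u64, |acc, r| acc | r.src_offset | r.dst_offset | r.size);
    BLOCK_SIZES
        .iter()
        .copied()
        .find(|&b| bits % b == 0)
        .unwrap_or(1)
}
```

With this criterion, a single region that is only 4-byte aligned forces
the 4-byte block size on all regions of the same copy.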
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34587>
Some intrinsics are implemented by reading memory locations that could
be overwritten by further tracing calls. So we need to move those
reads before the tracing operations in the shaders.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8979
Tested-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34214>
This will be useful if an app doesn't specify the depth direction correctly.
E.g. the capture of "Sons of The Forest" I have has a shader
where `gl_FragDepth` has `layout(depth_less)`, but the output for different
fragments is actually sometimes less, sometimes more than the original depth
by a tiny margin.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34423>
Specifying the depth write direction in the shader may help us. E.g.
if the depth test is GREATER and the FS specifies FRAG_DEPTH_LAYOUT_LESS,
LRZ won't kill any fragment that shouldn't be killed. In other words,
the FS can only reduce the depth value, which can only make a fragment
NOT pass the GREATER depth test. We just have to enable the late Z test.
The same concept exists in D3D11 and is seen e.g. in the game "Stray".
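
The reasoning above can be sketched as a small decision function. This
is an illustration, not the driver's code: all type and function names
are hypothetical, the GREATER_OR_EQUAL case and the mirrored LESS case
are added by symmetry as an assumption, and every other combination is
conservatively treated as "disable LRZ".

```rust
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum DepthCompare { Less, LessOrEqual, Greater, GreaterOrEqual, Other }

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum FragDepthLayout { Any, Less, Greater }

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum LrzDecision {
    /// LRZ stays on, but the real depth test must run late.
    KeepWithLateZ,
    /// Conservative fallback (an assumption of this sketch).
    Disable,
}

fn lrz_with_frag_depth(cmp: DepthCompare, layout: FragDepthLayout) -> LrzDecision {
    use DepthCompare::*;
    match (cmp, layout) {
        // The FS can only lower depth, so a fragment that LRZ rejects
        // with the interpolated depth would also fail with the final one.
        (Greater | GreaterOrEqual, FragDepthLayout::Less) => LrzDecision::KeepWithLateZ,
        // Mirrored case, assumed by symmetry.
        (Less | LessOrEqual, FragDepthLayout::Greater) => LrzDecision::KeepWithLateZ,
        _ => LrzDecision::Disable,
    }
}
```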
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34423>
Add a debug option which checks whether a command buffer has completed
execution on the GPU before it's reset or destroyed.
This works by getting the GPU to write to a specific memory address
on vkBeginCommandBuffer and vkEndCommandBuffer, so the CPU can check
that the GPU has written TU_CMD_BUFFER_STATUS_IDLE to the slot
before actually resetting or freeing the command buffer.
This can help in debugging sync issues within the driver.
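
A CPU-only model of the status slot idea, as a rough sketch. The GPU
writes are simulated with an atomic, STATUS_IDLE stands in for
TU_CMD_BUFFER_STATUS_IDLE, and STATUS_ACTIVE plus all type and method
names are hypothetical.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

const STATUS_IDLE: u32 = 0; // stands in for TU_CMD_BUFFER_STATUS_IDLE
const STATUS_ACTIVE: u32 = 1; // hypothetical "still executing" value

/// One status slot per command buffer, living in GPU-visible memory in
/// the real driver; modeled here as a plain atomic.
struct StatusSlot(AtomicU32);

impl StatusSlot {
    fn new() -> Self {
        StatusSlot(AtomicU32::new(STATUS_IDLE))
    }

    /// Models the write recorded at the start of the command stream.
    fn gpu_begin(&self) {
        self.0.store(STATUS_ACTIVE, Ordering::Release);
    }

    /// Models the write recorded at the end of the command stream.
    fn gpu_end(&self) {
        self.0.store(STATUS_IDLE, Ordering::Release);
    }

    /// CPU-side check performed before a reset or free.
    fn safe_to_reset(&self) -> bool {
        self.0.load(Ordering::Acquire) == STATUS_IDLE
    }
}
```

A reset or destroy path would check safe_to_reset() first and report a
sync bug if the slot is not idle.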
Signed-off-by: Karmjit Mahil <karmjit.mahil@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34383>
Consider the flag from PPS when setting tc/beta offset.
This fixes some artifacts when decoding an HEVC video,
hevc_scaling_list4.mkv from Lynne.
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34782>
This is getting complicated and depends on some inter-linked details for
safety such as values being in-range. It's safer if the rest of the IR
is forced to use public interfaces.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34794>
wherever we check that src_mod is none.
This commit simply does:
s/src_mod.is_none()/is_unmodified()/
across all of nak except the definition of is_unmodified() itself.
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Fixes: bad23ddb48 ("nak: Add F16 and F16v2 sources")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34794>
This lets us store up to 16 SSAValues in an SSARef, while keeping the
common case of 4-or-fewer SSAValues allocation-free.
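
A rough sketch of the "inline up to 4, heap up to 16" storage scheme,
for illustration only. This is not NAK's SSARef: the SmallRef type, its
layout, and the use of plain u32 values in place of SSAValue are all
assumptions.

```rust
const INLINE_CAP: usize = 4;
const MAX_LEN: usize = 16;

/// Box is not Copy, so an enum with a heap variant cannot be Copy either.
#[derive(Clone, Debug, PartialEq)]
enum SmallRef {
    /// Common case: values stored in place, no allocation.
    Inline { len: u8, vals: [u32; INLINE_CAP] },
    /// Rare case: more than INLINE_CAP values, stored on the heap.
    Heap(Box<[u32]>),
}

impl SmallRef {
    fn new(vals: &[u32]) -> Self {
        assert!(!vals.is_empty() && vals.len() <= MAX_LEN);
        if vals.len() <= INLINE_CAP {
            let mut buf = [0u32; INLINE_CAP];
            buf[..vals.len()].copy_from_slice(vals);
            SmallRef::Inline { len: vals.len() as u8, vals: buf }
        } else {
            SmallRef::Heap(vals.into())
        }
    }

    fn as_slice(&self) -> &[u32] {
        match self {
            SmallRef::Inline { len, vals } => &vals[..usize::from(*len)],
            SmallRef::Heap(b) => &b[..],
        }
    }

    fn is_inline(&self) -> bool {
        matches!(self, SmallRef::Inline { .. })
    }
}
```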
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34794>
SSABuilder::alloc_ssa() is now only for scalars. We introduce
SSABuilder::alloc_ssa_vec() which handles the vector case the way
alloc_ssa previously did. This matches the split used in SSAValueAllocator.
We're about to drop Copy from SSARef, which makes it a lot more annoying
to deal with. SSAValue will remain Copy though, so we want to start
using it instead of SSARef where possible.
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34794>
Code that might have an invalid SSAValue is encouraged to use
Option<SSAValue> instead of NONE. Option<SSAValue> is now the same size
as a u32 and provides more type safety.
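
One way an Option of a 32-bit value can stay 32 bits wide is Rust's
niche optimization, sketched below. Whether NAK's SSAValue is built on
NonZeroU32 is an assumption here, and the Value type and its methods
are hypothetical.

```rust
use std::mem::size_of;
use std::num::NonZeroU32;

/// Hypothetical stand-in for SSAValue. Zero is never a valid payload,
/// so the compiler can use it to represent None.
#[repr(transparent)]
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct Value(NonZeroU32);

impl Value {
    /// Returns None for the invalid (zero) encoding.
    fn new(packed: u32) -> Option<Value> {
        NonZeroU32::new(packed).map(Value)
    }

    fn get(self) -> u32 {
        self.0.get()
    }
}

/// True when wrapping Value in Option costs no extra space.
fn option_is_free() -> bool {
    size_of::<Option<Value>>() == size_of::<u32>()
}
```

Callers then match on Some/None instead of comparing against a sentinel,
so a missing check becomes a type error rather than a silent bug.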
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34794>
The extensions required to support wsi were relaxed earlier for the sw
blit path. So renderer extension enablement is fixed to be passive,
based on what the renderer side supports.
Test: emulate dropping dma-buf and modifier support from the host anv
driver, and confirm that wsi via venus works with the sw blit fallback
and that device creation no longer returns VK_ERROR_FEATURE_NOT_PRESENT.
Fixes: 06f5d1a105 ("venus: expose WSI on renderer without dma-buf support")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34827>
Without a type, we can't really interpret the data. Currently, it just
returns None in the presence of modifiers so it's okay. Also, all of
the callers of this helper today do so on the source of an OpPrmt which
doesn't support source modifiers.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34818>
This makes more sense as we don't want any complex logic around source
modifiers or any of that. We just want to handle Zero and Imm32 in the
same case. Also, explicitly assert that modifiers are None, which is
clearer anyway.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34818>
Those platforms require an aux map with 1MB alignment. For slab, that
means that any buffer needs to have a size that is a multiple of 1MB,
which wastes a lot of memory and can lead to running out of memory
when running multiple GPU applications.
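
To make the waste concrete, a small arithmetic sketch. The helper names
are hypothetical and the 1MB figure is taken from the text above; this
is not anv code.

```rust
/// 1MB aux map alignment, as stated above.
const AUX_MAP_ALIGN: u64 = 1 << 20;

/// Rounds size up to the next multiple of align (a power of two).
fn align_up(size: u64, align: u64) -> u64 {
    debug_assert!(align.is_power_of_two());
    (size + align - 1) & !(align - 1)
}

/// Bytes lost when a request is padded to the aux map alignment.
fn wasted_bytes(requested: u64) -> u64 {
    align_up(requested, AUX_MAP_ALIGN) - requested
}
```

For a 4096-byte request, wasted_bytes returns 1044480, so almost the
whole 1MB allocation is padding.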
Fixes: ea18572ff2 ("anv: Add support for ANV_BO_ALLOC_AUX_CCS in anv_slab_bo")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34817>
v10 has 96 and v12+ has 128, not the opposite.
Fixes: 811525b543 ("pan/genxml: Build libpanfrost_decode for v12")
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34815>