It is a pretty common pattern to allocate a non-zero sequence # for
lightweight checking if an object is the same, changed, for use in cache
keys, etc. (And also pretty common to forget to handle the rollover
zero case.) Add a helper for this.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21274>
MESA_VK_ABORT_ON_DEVICE_LOSS=1 \
TU_DEBUG_STALE_REGS_RANGE=0x00000c00,0x0000be01 \
TU_DEBUG_STALE_REGS_FLAGS=cmdbuf,renderpass \
./app
To pinpoint the reg causing a failure reducing regs range could be
used for bisection. Some failures may be caused by multi-reg combination,
in such case set 'inverse' flag which would change the meaning of reg
range to "do not stomp these regs".
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21226>
On blob v512.490, using WRAP_GPU_ID to fake GPU versions, I see 0x41 used
everywhere, except for BLIT_OP_SCALE on a630. Define the magic number in
dev info so it can be reused in the two places that set the
non-BLIT_OP_SCALE value.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19794>
The blob driver does not support
VK_FORMAT_FEATURE_SAMPLED_IMAGE_YCBCR_CONVERSION_SEPARATE_RECONSTRUCTION_FILTER_BIT
before a6xx_gen3. It still sets CHROMA_LINEAR bit according to
chromaFilter, but the bit has no effect before a6xx_gen3 (confirmed on
a618 with blob version 512.490.0).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19609>
Fixes the following error I noticed when building against aarch64 with
musl libc:
In file included from ../src/freedreno/decode/crashdec.h:38,
from ../src/freedreno/decode/crashdec.c:40:
../src/freedreno/common/freedreno_pm4.h:104:15: error: unknown type name 'uint'
104 | static inline uint
| ^~~~
../src/freedreno/common/freedreno_pm4.h:105:25: error: unknown type name 'uint'; did you mean 'int'?
105 | pm4_calc_odd_parity_bit(uint val)
| ^~~~
| int
Signed-off-by: Jami Kettunen <jami.kettunen@protonmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19665>
Adds support for emitting utrace markers into the CS, this allows
for useful debug information that can be decoded from a recorded
command stream.
Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18271>
In turnip we were using this a lot with the dynamic state enum, and
we're running out of space there because we're needing to add more and
more dynamic states that don't correspond to draw states. Make it
64-bit-safe so we don't need to rewrite everything in turnip. In the
case where the thing being operated on is 32-bit the compiler can
usually optimize it away, as can be seen with the release build size
before and after:
before:
text data bss dec hex filename
5404913 293592 22744 5721249 574ca1 /home/cwabbott/build/mesa-release/lib64/libvulkan_freedreno.so
text data bss dec hex filename
13981320 498550 205000 14684870 e012c6 /home/cwabbott/build/mesa-release/lib64/dri/msm_dri.so
after:
text data bss dec hex filename
5404969 293592 22744 5721305 574cd9 /home/cwabbott/build/mesa-release/lib64/libvulkan_freedreno.so
text data bss dec hex filename
13981320 498550 205000 14684870 e012c6 /home/cwabbott/build/mesa-release/lib64/dri/msm_dri.so
In the end the only changes is an additional ~50 bytes of text in
turnip.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18912>
There are more magic regs which have different values between GPU
subgenerations than we specified.
The updated list and values where obtained by using libwrapfake
with v631 blob and dEQP-VK.draw.renderpass.basic_draw.draw.triangle_list.1
vk cts test.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18229>
LRZ fast-clear works on all gens, however blob disables it on
gen1 and gen2. We also elect to disable fast-clear on these gens
because for close to none gains it adds complexity and seem to work
a bit differently from gen3+. Which creates at least one edge case:
if first draw which uses LRZ fast-clear doesn't lock LRZ direction
the fast-clear value is undefined.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6829
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17599>
On-GPU LRZ direction tracking allows LRZ to support secondary cmdbufs,
reusing LRZ between renderpasses, and in future to support LRZ when
VK_KHR_dynamic_rendering is used.
With on-gpu tracking we have to be careful keeping LRZ state in sync
with underlying depth image, which means we should invalidate LRZ
when underlying image is changed or the view of image is different
from previous renderpass.
All of this resulted in LRZ logic being thinly spread through the code,
making it hard to understand. So most of it was moved to tu_lrz.c.
For more details on past and new LRZ features see comment at the
top of tu_lrz.c.
Note about blob:
- Blob is much more happy to do LRZ_FLUSH, it flushes at the start
of the renderpass, after binning, and at the end of the renderpass.
- Blob seem not to care about changes in depth image done via
vkCmdCopyImage.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6347
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16251>
It seems that a660 has the same bug. Without the workaround there
are a lot of flakes with depth-stencil tests, e.g. in:
dEQP-VK.pipeline.extended_dynamic_state.*
dEQP-VK.renderpass.depth_stencil_write_conditions.*
dEQP-VK.pipeline.stencil.format.d24_unorm_s8_uint.states.*
Or guaranteed failures like of:
dEQP-VK.pipeline.render_to_image.core.2d.huge.width.r8g8b8a8_unorm_d32_sfloat_s8_uint
Enabling the workaround fixes all of them.
cc: mesa-stable
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15548>
A future kernel update will add fuse-id in the upper bits of the
chip_id. Do avoid breaking device matching, add a way to include
a wildcard/fallback fuse-id. (Note that this only effects un-
released devices.)
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14506>
We're going to need to add a couple more cases. Let's split up the
existing two cases first, rather than piling on more logic to a single
expression.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14506>
We don't really treat the two arguments identically, so rename them to
make it clear which one is the device id coming from kernel, and which
one is the reference id from the fd_dev_recs table.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14506>
- gen4 - has dp4acc and dp2acc, dp4acc is used to implement
4x8 dot product.
- gen3 - has dp2acc, in OpenCL blob uses dp2acc for dot product
on both get3 and gen4.
- gen2 - unknown, lower everything.
- gen1 - no dp2acc, lower everything. OpenCL blob doesn't advertise
cl_qcom_dot_product8 but still generates code for it.
The assembly is more verbose and uses yet to be documented
mad32.u16 instruction.
Passes:
dEQP-VK.spirv_assembly.instruction.compute.opsdotkhr.*
dEQP-VK.spirv_assembly.instruction.compute.opudotkhr.*
dEQP-VK.spirv_assembly.instruction.compute.opsudotkhr.*
dEQP-VK.spirv_assembly.instruction.compute.opsdotaccsatkhr.*
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.*
dEQP-VK.spirv_assembly.instruction.compute.opsudotaccsatkhr.*
Only packed 4x8 unsigned and mixed versions are accelerated.
However in theory we should be able to do better for signed version
than current NIR lowering.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13986>
Inferring from blob's cmdstream the size of shader instruction
cache for:
- a630 is 64
- a650 is 128
- a660 is 128
On a650 and a660 gpu could hang if we exceed the limit. Though
it is not reproducible with computerator or a single amber
test. Also while blob limits the size to 128 - Turnip still
hangs with it but does not hang with the limit of 127.
On a630 there seem to be no hang when limit is exceeded.
Fixes the hang of compute shader in Alien Isolation on a650/a660.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14044>
While it doesn't immediately hang like on a660, it seems to be buggy and
the blob disables it.
This fixes a bunch of r8_* dEQP-VK tests, which seem to pass
individually but don't work when run after other tests. For example this
fixes failures running dEQP-VK.pipeline.sampler.*.r8_uint*.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12830>
On some GPUs when:
- depth bounds test is enabled
- depth test is disabled
- depth attachment uses UBWC in sysmem mode
GPU hangs. As a workaround we should enable z test. That's what blob
is doing for a630. And since we enable z test we should make it always pass.
Blob doesn't emit this workaround on a650 and a660. Untested on a640.
Fixes:
dEQP-VK.pipeline.extended_dynamic_state.two_draws_static.depth_bounds_test_disable
dEQP-VK.pipeline.extended_dynamic_state.two_draws_dynamic.depth_bounds_test_disable
dEQP-VK.dynamic_state.ds_state.depth_bounds_1
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12407>
Move away from using gpu_id as the primary means to identify which
adreno we are running on, as future GPUs (starting with 7c3) stop
providing a gpu_id as a new naming scheme is introduced.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12159>
We only need it in a single .c file, so we can make the device table
static. Also rename the struct for device table entries, as I want
to re-use the name 'fd_dev_id'
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12159>
Newer a6xx devices drop this packet from the sqe firmware, and use
direct (pkt4) register writes instead for the few cases that previously
used CP_REG_WRITE.
The turnip part was adapted from Jonathans patch on !10892
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11790>