There is nothing to do for
VK_PIPELINE_CACHE_CREATE_INTERNALLY_SYNCHRONIZED_MERGE_BIT_KHR because
the vulkan/runtime code already locks the dstCache unconditionally.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33091>
The opposite is already supported. Note that only one aspect (depth or
stencil) is supported when it's a copy<->depth/stencil copy, and
multiplanar images aren't supported.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33091>
Indeed, has_streamout is not yet properly initialized at the time
of the call of r600_init_screen_caps(). This change fixes this
issue.
Here is the issue visible on palm at the glxinfo level; the right column is affected:
Preferred profile: core (0x1) | Preferred profile: compat (0x2)
Max core profile version: 4.5 | Max core profile version: 0.0
Max compat profile version: 4.5 | Max compat profile version: 2.1
Max GLES1 profile version: 1.1 Max GLES1 profile version: 1.1
Max GLES[23] profile version: 3.1 | Max GLES[23] profile version: 2.0
Fixes: 7cd606f01b ("r600: add r600_init_screen_caps")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33106>
This is pure guess but I think GFX12 now uses 48-bits VAs for
configuring the SQTT buffer. This isn't yet enough to generate a
capture because it's missing some info I don't know, but it's a start.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33068>
It's not rare when only last few draws in a big renderpass disable
LRZ, we shouldn't bail out in such case.
If LRZ is disabled in dir tracking bit during binning - LRZ would
be disabled for the whole IB in the tiling step, so we should avoid
disabling via dir tracking bit and track the state inside the driver.
This doesn't work with secondary command buffers (and renderpass
resume/suspend), in such cases we have to disable LRZ via dir tracking
bit, if LRZ is not valid.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32868>
Aside from displaying in a tracepoint, it would be useful in order
to decide whether to disable LRZ for the whole renderpass.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32868>
If blend LRZ state was already calculated from static info,
re-calculating it with dynamic state would bring stale values
and therefor result in a wrong calculations.
This resulted in LRZ being disabled when it should have not in
native VK titles.
CC: mesa-stable
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32868>
Currently, when allocating the dst of shared collects, the registers of
its srcs would only be reused if they are killed. This results in a lot
of needless moves due to suboptimal register allocation.
This commit addresses this generically: allow unavailable registers to
be used for a new dst iff it shares a merge set with, and has the same
offset as, the currently live value.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32021>
Detecting non-trivial collects after the fact, i.e., once they are
created by a register assignment of their dst, does not work as this may
cause two different intervals to share a physreg. For example:
_meta:collect sssa_361:150(r50.x) (wrmask=0xf),
sssa_19:12(r50.x), sssa_103:13(r50.y),
sssa_355:102(r51.z), sssa_356:103(r51.w)
This is a non-trivial collect with a partial overlap with one of its
child intervals. After moving its dst to a new interval, it will have
the same physreg as the existing interval for sssa_19, causing all sorts
of trouble for RA.
Prevent this by detecting that a future collect may become non-trivial
at the moment one of its sources gets a register assignment that does
not correspond with it merge set's preferred reg and allocating a new
interval for this component.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Fixes: b36a7ce0f1 ("ir3/ra: prevent moving source intervals for shared collects")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32021>
Need to use mfence to strictly order mqd wptr update and ringing doorbell
in cpu. If the compiler or cpu re-orders it, commands will be missed.
Suggested-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32700>
The userq packets are added using _pkt_begin(), _pkt_add(), _pkt_end()
functions. As of now _pkt_being() and _pkt_add() is called once. It
is not advisible to update wptr value in mqd multiple times. Hence use
next_wptr as cache in the macros and update mqd mptr before job submission
only once.
Suggested-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32700>
The signal ioctl should only be called after guaranteeing that the hardware
started working on the submissions and that is only after doorbell is ringed.
Otherwise it can in theory happen that the application creates the fence and
is then interrupted before ringing the doorbell. That can result in a GPU
reset because the fence times out.
Suggested-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32700>
Determine whether the device has hardware raytracing support early, and
then use this result where needed, instead of checking for `gfx_level`
every time.
This is a prerequisite for CYAN_SKILLFISH chip enablement. This chip is
still GFX10, not GFX10_3, but has hardware support for accelerated
`image_bvh{,64}_intersect_ray` instructions. Just checking for `gfx_level`
is insufficient for it.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33109>
This should let us lower the RT query stack to registers instead of
scratch by getting rid of the rest of the members of the ray query
struct. This gives a 24% decrease in total time for 3DMark InVitro.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28447>
Remove the redundant error that will never be hit in practice (because
if fd_dev_name() succeeds then so will fd_dev_info()) and move it up so
that we can use the info when generating the name.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28447>
There is native support for D3D-style untyped UAVs, which are an unsized
array of "records."
This will be needed for acceleration structures, because normal SSBO
descriptors aren't large enough to cover all the 128-byte instance
descriptors for the maximum number of instances (2**24).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28447>
Make sure that struct vk_bvh_geometry_data is defined before
vk_fill_geometry_data(), to fix this error:
In file included from ../src/freedreno/vulkan/tu_acceleration_structure.cc:29:
../src/vulkan/runtime/vk_acceleration_structure.h:138:1: error: 'vk_fill_geometry_data' has C-linkage specified, but returns incomplete type 'struct vk_bvh_geometry_data' which could be incompatible with C [-Werror,-Wreturn-type-c-linkage]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28447>