The BROADCOM_SAND128 modifier is usually used with an extra parameter
to pass in the stride via a side channel. Quoting from drm_fourcc.h:
> The pitch between the start of each column is set to optimally
> switch between SDRAM banks. This is passed as the number of lines
> of column width in the modifier (we can't use the stride value due
> to various core checks that look at it , so you should set the
> stride to width*cpp).
So apparently this is just a workaround for limitations in some kernel
APIs. DRM modifiers, however, are arguably a bad fit for extra
parameters that aren't known in advance. In the Wayland/KMS ecosystem
many components depend on being able to treat modifiers as opaque, e.g.
for negotiations etc. In practice the current approach requires various
software components to manually use the
`DRM_FORMAT_MOD_BROADCOM_SAND128_COL_HEIGHT()` macro - using the
`DRM_FORMAT_MOD_BROADCOM_SAND128` modifier directly with formats like
`NV12` results in a rejection in the KMS driver and corrupted output
in Mesa (because we'd bail out early in `v3d_sand8_blit()`).
Fortunately the stride check limitations mentioned above don't seem to
apply to Mesa though. Thus we can just add support for the base modifier
and stride (coming from V4L2), allowing various toolkits, Wayland
compositors and V4L2 decoder implementations to support e.g.
`NV12` + `DRM_FORMAT_MOD_BROADCOM_SAND128` (`NC12` in V4L2) in a generic
way.
Notes:
1. Wayland compositors trying to offload composition to KMS will still
fail when doing a test commit.
2. There is another limitation - in the V4L2 MPLANE API - that
requires userspace to know the correct offset of the second plane. That's
a known API limitation though and only affects V4L2 decoder implementations.
Cc: mesa-stable
Signed-off-by: Robert Mader <robert.mader@collabora.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32033>
This fixes a corner case of the LNL sub-dword integer restrictions
that wasn't being detected by has_subdword_integer_region_restriction(),
specifically:
> if(Src.Type==Byte && Dst.Type==Byte && Dst.Stride==1 && W!=2) {
> // ...
> if(Src.Stride == 2) && (Src.UniformStride) && (Dst.SubReg%32 == Src.SubReg/2 ) { Allowed }
> // ...
> }
All the other restrictions that require agreement between the SubReg
number of source and destination only affect sources with a stride
greater than a dword, which is why
has_subdword_integer_region_restriction() was returning false except
when "byte_stride(srcs[i]) >= 4" evaluated to true, but as implied by
the pseudocode above, in the particular case of a packed byte
destination, the restriction applies for source strides as narrow as
2B.
The form of the equation that relates the subreg numbers is consistent
with the existing calculations in brw_fs_lower_regioning (see
required_src_byte_offset()), we just need to enable lowering for this
corner case, and change lower_dst_region() to call lower_instruction()
recursively, since some of the cases where we break this restriction
are copy instructions introduced by brw_fs_lower_regioning() itself
trying to lower other instructions with byte destinations.
This fixes some Vulkan CTS test-cases that were hitting these
restrictions with byte data types.
Fixes: 217d412360 ("intel/fs/gfx20+: Implement sub-dword integer regioning restrictions.")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30630>
Relax this assert based on x/y offsets for GFX_VERx10 >= 200.
This is getting hit when running gfxbench5 on LNL/BMG.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32128>
Since the shader parameters are passed as inline data, push constants
are no longer used and so, not actually set on dispatch. But the
nr_params = 4 was still making the shader emit the code to load them,
causing page faults on simulation, and would also on HW if we didn't
always have a scratch page set.
The uses_inline_data parameter will be set from brw_compile_cs(), called
shortly after this point, so we don't need it here.
The subgroup_size is misleading, as we don't actually require that size
and the code that checks for it isn't even running for this shader.
Fixes: 97b17aa0b1 ("brw/nir: rework inline_data_intel to work with compute")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12152
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32150>
From the perspective of the gpu, host read or host write has the same
implication (gpu cache flush) in the dst access flags. We should
include host write in the dst access flags.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32102>
For the host-to-device domain operation, it is possible that
wait_sb_mask is empty but there is a cache invalidaton,
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32074>
Fragments are processed in rasterization order within a fragment job.
The fragment subqueue self-wait is nop in most cases. The only
exception is when there is a feedback loop.
When there is a feedback loop, because we lower subpassLoad to
texelFetch, we have to split the render pass.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32074>
IDVS jobs within a render pass use the same scoreboard slot. There is
no need to wait.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32074>
We don't emit the fragment job until the end of a render pass. There is
nothing to wait.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32074>
The fragment subqueue always waits for the tiler subqueue. There is no
need to emit additional waits for barriers.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32074>
src_stages and dst_stages together define an execution dependency. Both
of them should be considered at the same time.
Add a low-level helper, add_execution_dependency, to translate pipeline
stages to subqueue wait masks. The subqueue wait masks only specify
which subqueues should wait for which. The callers will decide how the
waits are performed exactly.
Update collect_cs_deps to call add_execution_dependency and use the
subqueue wait masks to initialize panvk_cs_deps.
The main difference is that barriers such as
.srcStageMask = VK_PIPELINE_STAGE_2_ALL_COMMANDS_BIT,
.dstStageMask = VK_PIPELINE_STAGE_2_NONE,
are ignored.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32074>
src_access defines the availability op and the host-to-device domain op.
dst_access defines the visibility op and the device-to-host domain op.
They should be treated separately.
Add a low-level helper, add_memory_dependency, to translate access flags
to panvk_cache_flush_info.
Update collect_cache_flush_info to use add_memory_dependency. Also
replace the custom subqueue access flag mappings by
vk_filter_{src,dst}_access_flags2.
The main difference is that barriers such as
.srcAccessMask = VK_ACCESS_2_MEMORY_WRITE_BIT,
.dstAccessMask = VK_ACCESS_2_NONE,
or
.srcAccessMask = VK_ACCESS_2_NONE,
.dstAccessMask = VK_ACCESS_2_MEMORY_READ_BIT,
are no longer ignored.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32074>
Add and use panvk_memory_mmap and panvk_memory_munmap.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32125>
Adds an optional region selection, based off percentages of the
starting/ending of an image's X & Y values.
This is intended as a performance enhancement tradeoff for smaller
images to be created.
With a smaller image size, the screenshotting layer will change the
region boundaries on the GPU side, which will decrease the amount of
time it takes to copy the image over to CPU-accessible memory.
Using vkcube as an example, the original image size is 500x500:
mesa-screenshot: DEBUG: Screenshot Authorized!
mesa-screenshot: DEBUG: Needs 2 steps
mesa-screenshot: DEBUG: Time to copy: 123530 nanoseconds
Then, by cropping the area to a 100x100 image, we get the following:
mesa-screenshot: DEBUG: Screenshot Authorized!
mesa-screenshot: DEBUG: Using region: startX = 40% (200), startY = 40% (200), endX = 60% (300), endY = 60% (300)
mesa-screenshot: DEBUG: Needs 2 steps
mesa-screenshot: DEBUG: Time to copy: 12679 nanoseconds
For this example, this is a ~90% time reduction improvement!
Overall, this option reduces the copy time to a point where it can
become negligible, relative to the frame time of the application.
Signed-off-by: Casey Bowman <casey.g.bowman@intel.com>
Reviewed-by: Felix DeGrood felix.j.degrood@intel.com
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32016>
Lots of tests are hitting the assert, one in particular :
dEQP-VK.binding_model.mutable_descriptor.single.switches.sampler_combined_image_sampler.update_copy.nonmutable_source.normal_source.pool_same_types.pre_update.no_array.comp
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: b6d11ba5b4 ("anv: Protect memcpy/memset/qsort calls against NULL arguments")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32133>
When (off_const > max) there is a wrap around uint when calling
try_extract_const_addition.
Exit early since folding doesn't make sense in this case.
Cc: mesa-stable
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32118>
All of the spilling code should work with physical register units
because for example SEND messages will expect a physical register as
destination.
So always allocate a full physical register for the spilled/unspilled
values and adjust the offsets of the registers to physical sizes too.
Cc: mesa-stable
Fixes: aa494cba ("brw: align spilling offsets to physical register sizes")
Closes: mesa/mesa#11967
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Found-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32124>
The g610-vk jobs are just too unstable to be pre-merge jobs. Let's keep
them as post-merge so people can still execute them manually if they
care.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32129>