Starting with e99081e76d, we don't re-construct liveness information
every time we spill a register. Instead, we carefully track which
instructions are spill instructions and don't count them toward the IP
count, so that we can continue to use the old liveness information
even though instructions have been added. This commit adds an assert
that sanity-checks that we count the same number of instructions as our
liveness information is based on.
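A minimal sketch of the check described above. The `struct inst` type and
the helper names are hypothetical, not the real backend types: spill
instructions are skipped when counting IPs, and the total is asserted to
match the count the liveness analysis was built from.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical instruction record; the real backend type differs. */
struct inst {
   bool is_spill; /* true for instructions added by the spiller */
};

/* Count instructions, skipping those added by spilling. */
int
count_non_spill_ips(const struct inst *insts, size_t n)
{
   int ip = 0;
   for (size_t i = 0; i < n; i++) {
      if (!insts[i].is_spill)
         ip++;
   }
   return ip;
}

/* Sanity check: the count must match what liveness was computed from. */
void
check_liveness_ips(const struct inst *insts, size_t n, int live_num_ips)
{
   assert(count_non_spill_ips(insts, n) == live_num_ips);
}
```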
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7084>
This opcode is responsible for setting up the buffer base address and
per-thread scratch space fields of a scratch message header. For the
most part, it's a copy of g0 but some messages need us to zero out g0.2
and the bottom bits of g0.5.
This may actually fix a bug when nir_load/store_scratch is used. The
docs say that the DWORD scattered messages respect the per-thread
scratch size specified in gN.3[3:0] in the message header but we've been
leaving it zero. This may mean that we've been ignoring any scratch
reads/writes from a load/store_scratch intrinsic above the 1KB mark.
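As a rough illustration of why a zero field matters, the sketch below
assumes the field encodes a power-of-two size starting at 1KB (0 means
1KB, 1 means 2KB, and so on). That encoding and the function names are
assumptions for illustration and should be checked against the hardware
docs.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed encoding: field value n means (1KB << n) of scratch per thread. */
uint32_t
encode_per_thread_scratch(uint32_t size_bytes)
{
   /* Must be a power of two of at least 1KB. */
   assert(size_bytes >= 1024 && (size_bytes & (size_bytes - 1)) == 0);

   uint32_t field = 0;
   while ((1024u << field) < size_bytes)
      field++;
   assert(field <= 0xf); /* must fit in bits [3:0] */
   return field;
}

/* Write the field into bits [3:0] of header dword 3, keeping other bits. */
uint32_t
set_header_scratch_size(uint32_t dw3, uint32_t size_bytes)
{
   return (dw3 & ~0xfu) | encode_per_thread_scratch(size_bytes);
}
```

Under this assumed encoding, a header that leaves the field at zero
describes only 1KB of scratch per thread, whatever the shader actually
needs.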
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7084>
In theory, this fixes a bug where we were dropping the PTSS bound on the
floor. The hardware docs claim that the A32 DWORD and BYTE scattered
read/write messages do a PTSS bounds check. However, in practice, it
seems that the hardware ignores the bounds check so this doesn't
actually matter. I verified this with the following couple of piglit
tests:
https://gitlab.freedesktop.org/mesa/piglit/-/merge_requests/399
In practice, this prevents the next commit from making a subtle
behavioral change.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7084>
Because we hard-coded the default to vec4, any platform that doesn't
have a "Dispatch Mode" field gets vec4 by default. This includes Gen11+
where vec4 is no longer a thing. Change the default so it works on
newer hardware.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7084>
The aliasing we were using was not always correct. Particularly,
for 3D images, the simulator would complain about image strides
not being large enough in some cases.
This patch fixes this by aliasing both src and dst images and
carefully choosing the alias dimensions taking into account the
format chosen for the copy and the ratio of block sizes between
both images.
Playing a bit with the image dimensions used by the relevant CTS
tests we confirmed this works well for all tile layouts (lineartile,
ublinear1/2 and UIF).
This fixes all CTS tests involving 3D image copies from compressed
formats without needing to force UIF layout for all compressed
images (which would actually not work for all image sizes either).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This patch addresses various issues, mostly from secondary command buffers
that recorded pipeline barriers that are not consumed in the secondary itself,
so they need to be applied to jobs that come right after the execution of the
secondary in a primary command buffer.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
If a subpass clears one aspect of Depth/Stencil but loads the other,
the clear might get lost. Fix this by emitting the clear as a draw
call instead of relying on the TLB clear.
Fixes:
dEQP-VK.renderpass.suballocation.attachment.3.307
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
So far, V3DV_ENABLE_DEFAULT_PIPELINE_CACHE only allowed turning
pipeline caching on or off.
With this change we can be more detailed. The envvar is no longer a
boolean. Allowed values:
* "off": no pipeline cache at all. PipelineCache objects behave as
no-op objects.
* "no-default-cache": user PipelineCache caches nir/variants, but we
don't provide a default cache in case the user doesn't provide a
PipelineCache object, nor for internal pipelines.
* "full" (default): we provide a default PipelineCache, used when
the user doesn't provide one when creating a Pipeline, and for
internal Pipelines.
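A possible way to parse the three values listed above, as a sketch only.
The enum and function names are hypothetical, and treating unknown
strings as "full" is an assumption rather than something the commit
states.

```c
#include <string.h>

/* Hypothetical enum; names are illustrative only. */
enum cache_mode {
   CACHE_MODE_OFF,
   CACHE_MODE_NO_DEFAULT,
   CACHE_MODE_FULL,
};

/* Map the envvar string to a mode. Unset values use the documented
 * default ("full"); unknown values also fall back to it (assumption).
 */
enum cache_mode
parse_cache_mode(const char *value)
{
   if (value == NULL)
      return CACHE_MODE_FULL;
   if (strcmp(value, "off") == 0)
      return CACHE_MODE_OFF;
   if (strcmp(value, "no-default-cache") == 0)
      return CACHE_MODE_NO_DEFAULT;
   return CACHE_MODE_FULL;
}
```

A caller would typically pass in the result of
getenv("V3DV_ENABLE_DEFAULT_PIPELINE_CACHE").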
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We don't want to let the default pipeline cache grow without limit. We
choose a maximum number of entries that should work for all real world
applications. CTS will exceed that limit, but that is okay, as it will
prevent us from running out of memory.
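One simple way to enforce such a limit is sketched below. The limit
value, the struct and the function name are hypothetical, and skipping
new entries once the cache is full is an assumed policy, not necessarily
what the driver does.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limit; the value chosen by the driver is not stated here. */
#define MAX_DEFAULT_CACHE_ENTRIES 4096

/* Hypothetical bookkeeping for the default cache. */
struct default_cache {
   uint32_t num_entries;
};

/* Returns true if the entry was accounted for, false if the cache is full. */
bool
default_cache_try_add(struct default_cache *cache)
{
   if (cache->num_entries >= MAX_DEFAULT_CACHE_ENTRIES)
      return false; /* skip caching instead of growing without bound */
   cache->num_entries++;
   return true;
}
```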
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Some shaders that need to spill hundreds of registers can take very long times
to compile as each allocation attempt spills a single register and restarts
the allocation process. We can significantly cut down these times if we allow
the compiler to spill in batches, which should be possible if we are spilling
uniforms, which is in fact the kind of spills that we do first because they
have lower cost than TMU spills.
Doing this could cause us to slightly over-spill in some cases (depending on
the chosen batch size) leading to slightly worse performance, so we only
enable this behavior after we have started to spill over a certain threshold,
at which point we assume that performance won't be good and we want to
favor compilation speed instead.
v2:
- Keep it simple and just try to spill a fixed number of registers in a
batch instead of trying to compute this dynamically based on accumulated
spills and current register pressure. (Eric).
v3:
- Check if the node is valid before doing anything with it.
- Drop the environment variable to select batch size and just fix it to 20.
With this we can take this CTS test from 35 minutes down to about 3 minutes:
dEQP-VK.ssbo.layout.random.all_shared_buffer.5
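The batching policy described above could look roughly like this. The
helper name and the spill_threshold parameter are hypothetical (the
message does not state the threshold value); only the batch size of 20
comes from the v3 note.

```c
#include <stdbool.h>

#define SPILL_BATCH_SIZE 20 /* fixed batch size, per the v3 note */

/* Hypothetical helper: how many registers to try to spill in this round.
 * spill_threshold is an assumed parameter, not a value from the commit.
 */
int
spill_batch_size(int spills_so_far, int spill_threshold, bool is_uniform)
{
   /* Only uniform spills are batched, and only past the threshold. */
   if (is_uniform && spills_so_far > spill_threshold)
      return SPILL_BATCH_SIZE;
   return 1;
}
```

Below the threshold this keeps the one-register-per-attempt behavior, so
shaders that spill only a little are not affected by over-spilling.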
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We had some code in blit_tfu to handle 3D images, but it was wrong. For
example, it executed a copy on the 3D image regardless of what the copy
required along the depth component. This was not detected until
vk-gl-cts 1.2.4 introduced more 1D and 3D blitting tests.
Also add checks to fall back to blit_shader when needed, for example
when mirroring on the depth component.
Fixes the following tests:
dEQP-VK.api.copy_and_blit.core.blit_image.simple_tests.mirror_z_3d.nearest
dEQP-VK.api.copy_and_blit.core.blit_image.simple_tests.whole_3d.nearest
dEQP-VK.api.copy_and_blit.dedicated_allocation.blit_image.simple_tests.mirror_z_3d.nearest
dEQP-VK.api.copy_and_blit.dedicated_allocation.blit_image.simple_tests.whole_3d.nearest
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
When sampling the stencil aspect we want to reinterpret the D24S8 format
as RGBA8 and read stencil values from the R component.
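As an illustration of the reinterpretation, the sketch below extracts the
byte that an RGBA8 view would return as R. It assumes stencil occupies
the low 8 bits of the packed 32-bit texel (little-endian byte order); the
function name is hypothetical.

```c
#include <stdint.h>

/* Assumes stencil sits in the low 8 bits of the packed 32-bit texel,
 * which is the byte an RGBA8 view returns as R in little-endian order.
 */
uint8_t
stencil_from_d24s8_as_rgba8(uint32_t texel)
{
   return (uint8_t)(texel & 0xff);
}
```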
Fixes:
dEQP-VK.renderpass.suballocation.formats.d24_unorm_s8_uint.input.*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Gets tests like the following one properly skipped:
dEQP-VK.api.copy_and_blit.core.image_to_image.all_formats.color.1d.etc2_r8g8b8a8_unorm_block.etc2_r8g8b8a8_unorm_block.optimal_general
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
So far we have only been exposing linear for WSI formats and UIF on
everything else, but we should instead expose linear or UIF based
on whether the underlying format supports any features for the given
layout.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
When negotiating DRM modifiers, applications may use this to validate the
features that are supported with a particular modifier. The WSI code in
Mesa relies on this to validate its modifiers.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This is done by basing the tex_coord on the max layer instead of the
min layer (similar to what we do when mirroring x/y).
This avoids all crashes and gets most of the following tests to pass:
dEQP-VK.api.copy_and_blit.core.blit_image.simple_tests.mirror_z_3d.*
The only one still failing is:
dEQP-VK.api.copy_and_blit.core.blit_image.simple_tests.mirror_z_3d.nearest
but the root cause looks to be different, as other 3D nearest tests are
failing too.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Command buffer object destruction callbacks take 64-bit object
handles, but we defined the color clear pipeline callback to take
a 32-bit argument.
Should fix recent crash regressions with some CTS tests on Rpi4.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This is the same as nir_get_buffer_size but geared towards UBOs instead
of SSBOs. The new intrinsic is useful in Vulkan backends that need to
add bounds checks on buffer accesses to honor the robust buffer access
feature.
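To show how a backend might use the UBO size, here is a sketch of a
bounds-checked load. The function names are hypothetical, offsets are
assumed to be 4-byte aligned, and returning zero for out-of-bounds reads
is one outcome robust buffer access permits, chosen here for simplicity.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if [offset, offset + size) lies inside a UBO of ubo_size bytes. */
bool
ubo_access_in_bounds(uint32_t offset, uint32_t size, uint32_t ubo_size)
{
   /* Written to avoid overflow in offset + size. */
   return size <= ubo_size && offset <= ubo_size - size;
}

/* Robust load: out-of-bounds reads return zero instead of touching memory.
 * Assumes offset is a multiple of 4.
 */
uint32_t
robust_ubo_load_u32(const uint32_t *ubo, uint32_t ubo_size, uint32_t offset)
{
   if (!ubo_access_in_bounds(offset, 4, ubo_size))
      return 0;
   return ubo[offset / 4];
}
```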
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Subpass color clear pipelines are those used to emit partial attachment
clears as draw calls inside the render pass currently bound by the
application in the command buffer, leading to a huge performance improvement
compared to the case where we emit them in their own render pass.
Unfortunately, because the pipeline references the render pass
object in which it is used and the render pass object is owned by the
application (and can be destroyed at any point), we can't cache these
pipelines (unless we implement a refcounting mechanism or other
similar strategy).
Performance impact looks negligible based on experiments with vkQuake3,
probably because the underlying pipeline cache is preventing the
redundant shader recompiles.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Specifically, we should select the slice to blit from on the source
image to be in the middle of the depth step.
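A sketch of the slice selection, assuming the source depth range is
divided evenly among the destination slices. The helper name and its
parameters are hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: source Z coordinate to sample for destination
 * slice i, when dst_slices slices cover [src_z_start, src_z_end).
 */
float
src_slice_center(float src_z_start, float src_z_end, int dst_slices, int i)
{
   assert(dst_slices > 0 && i >= 0 && i < dst_slices);

   /* Source depth covered by one destination slice. */
   float step = (src_z_end - src_z_start) / (float)dst_slices;

   /* Sample at the middle of the step rather than at its start. */
   return src_z_start + ((float)i + 0.5f) * step;
}
```

With a source depth of 8 blitted to 4 slices, the step is 2 and the
sampled coordinates are 1, 3, 5 and 7 instead of 0, 2, 4 and 6.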
This issue was only raised recently after the CTS improved the 3D
blitting tests.
Fixes:
dEQP-VK.api.copy_and_blit.core.blit_image.all_formats.*.3d.*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Originally, copies between buffers and images required a buffer offset
that was a multiple of 4 bytes. However, the spec was later fixed to
relax this rule and only require offsets with texel alignment.
Our implementation of image to buffer copies using the blit path needs
to bind the destination buffer as a linear image and be able to bind
the requested buffer memory at the required offset, so for that to work
we need to change the alignment requirements for linear images to match
the relaxed texel alignment requirement.
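The difference between the two rules can be sketched as follows. The
function names are hypothetical, and texel_size stands for the size in
bytes of one texel of the image format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Old rule: buffer offset must be a multiple of 4 bytes. */
bool
offset_ok_strict(uint64_t offset)
{
   return offset % 4 == 0;
}

/* Relaxed rule: buffer offset only needs to be a multiple of the
 * texel size.
 */
bool
offset_ok_relaxed(uint64_t offset, uint32_t texel_size)
{
   assert(texel_size > 0);
   return offset % texel_size == 0;
}
```

For a format with 2-byte texels, an offset of 2 fails the old rule but is
valid under the relaxed one, which is the kind of case the linear image
alignment has to accept.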
Fixes new tests in Vulkan CTS 1.2.4:
dEQP-VK.api.copy_and_blit.core.image_to_buffer.buffer_offset_relaxed
dEQP-VK.api.copy_and_blit.dedicated_allocation.image_to_buffer.buffer_offset_relaxed
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
The lowering will get all the interpolateAt() functions from GLSL lowered to
the corresponding intrinsics we have just implemented in the compiler backend,
which was the last piece we needed to enable the feature.
This gets us to pass all the relevant tests in:
dEQP-VK.pipeline.multisample_interpolation.*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
The option use_interpolated_input_intrinsics will lower these as well
as regular input loads. This is inconvenient for V3D, where we can
produce optimal code for regular input loads based on the input
variable layout qualifiers, so this change adds an option to only
lower instances of interpolateAt().
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>