This is all supported by the common nir code, no changes needed on our
end.
Signed-off-by: Benjamin Lee <benjamin.lee@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33660>
Valhall supports all combinations of ftz/preserve denorm behavior
between FP16 and FP32 except FP16=ftz, FP32=preserve. Because of this,
we can't advertise independent denorm behavior.
Even with INDEPENDENCE_NONE, it is still possible for shaders to set
denorm behavior for one size and leave the other size unspecified.
Previously we were defaulting to preserve for any unspecified size, but
with FP16=ftz, we need to default unspecified FP32 to preserve.
When advertising INDEPENDENCE_NONE, the CTS checks that the
shaderDenormFlushToZeroFloat* and shaderDenormPreserveFloat* features
are equal for all sizes, so we need to advertise the same supported
denorm behavior for FP64 even though we don't support FP64 at all.
Signed-off-by: Benjamin Lee <benjamin.lee@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33660>
On bifrost independent float controls are implementable, just
potentially expensive because it requires scheduling FP16 and FP32
instructions in separate clauses.
Signed-off-by: Benjamin Lee <benjamin.lee@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33660>
This allows more efficient scheduling by putting a 16-bit int
instruction in the same clause as a 32-bit float instruction even when
the 16-bit and 32-bit float controls are different.
Signed-off-by: Benjamin Lee <benjamin.lee@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33660>
The current behavior is identical, but we can express that some
instructions may be packed in either FTZ and no-FTZ clauses in the
future.
Signed-off-by: Benjamin Lee <benjamin.lee@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33660>
Using 'FADD.f32 x, +0' for f32->f16 conversions strips signed zero,
which we can't do if we advertise shaderSignedZeroInfNanPreserveFloat16.
Adding -0 instead preserves the original sign.
Signed-off-by: Benjamin Lee <benjamin.lee@collabora.com>
Fixes: b63ef74e73 ("pan/bi: Stop using V2F32_TO_V2F16 on Valhall")
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33660>
Many float instructions do not have a rounding mode modifier, but all of
the operations that are listed as requiring correct rounding in the
vulkan spec are supported in hardware.
Signed-off-by: Benjamin Lee <benjamin.lee@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33660>
These are needed to implement VK_KHR_shader_float_controls rounding
mode.
Signed-off-by: Benjamin Lee <benjamin.lee@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33660>
With vkCmdPipelineBarrier, it's possible to specify a barrier with
pipeline stages but without any memory barriers. These might not be
practical, but are legal Vulkan code.
Barriers like this are currently ignored in mesa, as we only convert
barriers with passed memory barriers into vkCmdPipelineBarrier2.
This commit adds handling of execution only barriers by converting them
into a memory barrier without access masks.
Fixes: 97f0a4494b ("vulkan: implement legacy entrypoints on top of VK_KHR_synchronization2")
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34187>
Fix the nginx cache snippets - I'd missed the file nesting somehow.
Tested on a debian:bookworm image with nginx-full installed, checked
that we could pull an arbitrary external site, as well as S3, as well
as GitLab artifacts.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34341>
Get flush_id once per command buffer in the submit and use it for all
subqueues instead of getting a new flush_id for every subqueue.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34448>
Fixes some upcoming CTS tests for texture clears.
* some drivers will attempt to issue clears with zero range
and hit asserts/crashes (spec clarification for negative
values)
* fix error thrown with negative values to match spec
* fix cases for clearing generic compressed formats
* fix negative case of using color format while having
depth/stencil internalformat and vice versa
Cc: mesa-stable
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34428>
Fixes upcoming CTS test checking for clamping.
Cc: mesa-stable
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34428>
When the color/input attachment map is known at compile time, we can
determine the set of read-only render targets and replace .wait by
.wait_resource flows, in order to avoid read-after-read serialization.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32540>
On Valhall we can optimize lower waits, which waits for both readers and
writers, into resource_waits which only wait for writers, allowing
threads accessing read-only resources to execute concurrently.
Let's use that on LD_TILE instructions so we can optmize the read-only
case.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32540>
Now that we support local reads we can safely advertise
KHR_dynamic_rendering_local_read.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32540>
If we are in a render pass, the intra-draw synchronization happens
through the FPK parameters, shader waits and draw dependencies, so we
can safely skip the barrier in that case.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32540>
When we know the input attachment is also an active color attachment
we can load the value from the tile buffer instead of going back to
the texture.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32540>
In order to dynamically load the content of the tile buffer, we need
to know the target (color, depth or stencil) and the conversion to
apply. Let's define the load_input_attachment_{target,conv}_pan
intrinsics so we can dissociate the logic lowering input attachment
loads into load_converted_output_pan, and the part optimizing the shader
when input attachment map is passed at compile time.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32540>
We take the color attachment remapping into account when emitting
blend descriptors, and we make sure we re-emit those when this color
attachment map is dirty.
We also need to take the remapping into account when checking the
render targets written by the fragment shader, hence the addition of
a color_attachment_written_mask() helper.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32540>
The only sysval that changes is the layer_id, so let's call
cmd_prepare_draw_sysvals() outside of the layer loop, and manually
update the sysval there.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32540>
We are about to allow ZS tile buffer reads in panvk in order to support
VK_KHR_dynamic_rendering_local_read, and this requires dealing with
a new case in the early ZS logic.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32540>
Do what the gallium driver does and generate the LUT when creating the
shader to avoid regenerating this LUT in the draw path.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32540>
We are about to add FS specific info there, so let's make sure all the
per-stage bits are part of a union and are conditionally
filled/[de]serialized based on the shader type.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32540>
Re-order things in panvk_deserialize_shader() to avoid declaring local
variables for stuff we feed the panvk_shader with. The only exception
is pan_shader_info, because we need to know the shader stage to call
vk_shader_zalloc(), which if part of pan_shader_info.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32540>
Secondary command buffers don't necessarily inherit their render
context. If we flush draws, we should only set invalidate_inherited_ctx
when the render context is inherited, otherwise is messes up with the
primary command buffer state.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32540>
Needed if we want to lower multisample input attachment loads to
tile buffer loads.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32540>
This allows us to pass a dynamic render target which will be needed
to support VK_KHR_dynamic_rendering_local_read.
While at it, we also enable support for depth/stencil tile loads.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32540>
LD_TILE has a .z_stencil modifier we can use to access the depth/stencil
tile buffer.
This will be needed for native depth/stencil input attachments support in
panvk.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32540>
We should only force a preload after a batch split if the batch we
flush had draws, otherwise we might lose the effect of clears asked
by the user.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32540>
When no input attachment location info is provided, the depth/stencil
attachment are supposed to be NO_INDEX, not UNUSED, and we should also
set the color_attachment_count to UNKNOWN.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32540>
Some upcoming changes in the runtime will make it impossible to rely
on the pipeline or runtime information to know whether a fragment
shader has input attachments.
Instead we gather that information at compile time and store it in our
shader bind_map.
At runtime we check whether the fragment shader has input attachments
and whether those map to the runtime depth/stencil input attachments
to set the 3DSTATE_PS_EXTRA::PixelShaderKillsPixel.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: d2f7b6d5a7 ("anv: implement VK_KHR_dynamic_rendering_local_read")
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32540>
For drivers using the render pass emulation provided by the
runtime, it's important to express the mapping between
depth/stencil/color attachments and input attachments using
VkRenderingInputAttachmentIndexInfoKHR, otherwise those drivers
have to special-case emulated render passes in their
CmdBeginRendering() implementation.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32540>
vk_dynamic_graphics_state_copy() is not copying the input attachment
map, and color_attachment_count is not initialized in
vk_dynamic_graphics_state_init_ial().
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Acked-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32540>
We just opened the GEM handle a few line before (or used drmPrimeFDToHandle to
acquire it), on failure it is just better to close it.
Signed-off-by: Corentin Noël <corentin.noel@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34437>
Panfrost supports pushing uniforms to hardware uniform registers (RMU/FAU for
Midgard/Bifrost respectively). Since OpenGL uniforms are lowered to UBO #0, it
does this with a pass that pushes UBOs. That's good!
The pass also pushes 'true' OpenGL UBOs, since they look the same in the backend
at this point. This is where the trouble comes in:
- True UBOs are allocated in GPU BOs, not CPU allocated buffers. That means it's
write-combine memory, which we cannot read from efficiently (at least
depending on coherency details that were never plumbed through panfrost.ko and
unlikely to be replumbed now that panthor is the new hot stuff). So, pushing
true UBOs reduces GPU overhead at the cost of tremendous CPU overhead. This is
dubious... When I benchmarked this on MT8192 in early 2023, this pushing
improved FPS in SuperTuxKart but hurt FPS in Dolphin.
- True UBOs can be written on the GPU. In OpenGL, we have batch tracking
infrastructure to sort this mess out in theory. What this means is that
pushing UBOs requires us to flush writers AND STALL at draw-time. If this is
ever hit, our performance is utterly trashed. But it gets worse.
- True UBOs can be written in the same batch that reads them. For example, we
could bind a buffer as a transform feedback buffer, do a draw with XFB, then
rebind as a UBO and do a draw reading. This is where we collapse -- our logic
will flush the writer, which is the same batch we were in the middle of
enqueueing a draw to. When we try to push words, we'll crash with theatrics.
This could be solved by smartening the batch tracking logic but it's not
trivial by any means.
So, pushing true UBOs on the CPU is broken and can hurt performance. Stop doing
it!
Long term, the solution will be to push on the GPU instead. This avoids all of
these issues. This can be done with a compute kernel or with CSF instructions.
The Vulkan driver will likely have to do this for performance, since pushing
UBOs from the CPU is utterly broken in Vulkan for the above reasons.
I have a branch somewhere doing this on v9 but I'm doing this on NIR time to
unblock a core change that was crashing piglit due to this pile of unsoundness.
Let's fix the correctness issues first, then someone can look at recovering
performance later when we're not blocking unrelated work.
Fixes corruption in Piglit test
gles-3.0-transform-feedback-uniform-buffer-object, which writes a UBO with
transform feedback. (I suspect the test still doesn't pass for the same reason
it's broken on other tilers. But that's a better place to be than oodles of
memory corruption.)
According to CI, fixes spec@arb_uniform_buffer_object@rendering{-dsa}-offset.
Cc: mesa-stable
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34193>
Currently nr_uniform_buffers will be whatever the previous draw set
for its vertex shader, which is not what the XFB shader usually
expects.
Fixes: c246af0d ("panfrost: Only upload UBOs when needed")
Cc: mesa-stable
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34193>
ss->info.ubo_mask includes the push+sysval UBO so if there's a user
UBO bound at the same index as the push+sysval UBO, without this
change we end up writing a descriptor for the user UBO at that index.
Fixes: 3b3cd59f ("panfrost: Launch transform feedback shaders")
Cc: mesa-stable
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34193>
only the GL driver actually wants this, neither panvk nor internal shaders do.
Cc'd as a prereq to the next patch
Cc: mesa-stable
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34193>