If a loop has only one break case, Metal appears to re-order it to after
the loop ends, which goes against the expected behavior for reconvergence.
Work around this by putting the break statement into a trivial, always-true
runtime conditional, when maximal reconvergence is requested.
Fixes dEQP-VK.reconvergence.maximal.compute.nesting*
Reviewed-by: Arcady Goldmints-Orlov <arcady@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41229>
Either the rendering begins in the primary and the information is
already known, or if it begins in the secondary, non-coherent RBs
should already be flushed in EndCommandBuffer().
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41136>
This shouldn't have any effects because there is a dirty bit and
binding the same graphics pipeline doesn't trigger it.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41136>
Checking if there is a bound graphics pipeline should be enough to
reset the appropriate states. Also clear the dirty pipeline bit only
for graphics pipelines because it's not used for compute pipelines.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41136>
This is used to declare barrier dependencies for an addr range
(because no VkBuffer with DAC).
This fixes new dEQP-VK.api.device_address.misc.memory_range_barrier.
Fixes: a97c889a7b ("radv: implement VK_KHR_device_address_commands")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41198>
Different HW units (R2D blit engine, 3D pipeline, etc.) apply subtly
different F32->UNORM16 rounding to the same float clear value, causing
cleared pixels to fail subsequent depth comparisons.
Pre-quantize D16 clear values to exact UNORM16 precision before passing
to any HW path. The GMEM path is unaffected as it already converts to
integer in pack_blit_event_clear_value().
Fixes dEQP-EGL.functional.image.modify.renderbuffer_depth16_renderbuffer_clear_depth
with zink and ANGLE when sysmem is used.
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41196>
This checks that the subsampling is specified in the name of the
format.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41151>
This applies to all subsampled layouts, including the planar ones.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41151>
Instead of parsing the format name, let's make the subsampling explicit
in the format table.
This fixes the reported subsampling for Y8U8V8_420_UNORM_PACKED and
Y10U10V10_420_UNORM_PACKED, which didn't match because of the _PACKED
suffix.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41151>
The definiton of Y8_UNORM and Y8_400_UNORM are already identical, with
the only real difference being that pipe_format_to_chroma_format() does
not return R8 for Y8_UNORM. But Y8_UNORM is only used by freedreno,
the intel drivers and virgl, none of which doesn't even call
pipe_format_to_chroma_format() in the first place, so this shouldn't
matter.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41151>
Enables the shaderImageGatherExtended feature and sets the
{min,max}TexelGatherOffset physical device properties.
The properties are queried via Zink and are expected to be non-zero.
Signed-off-by: Nick Hamilton <nick.hamilton@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40687>
Rather than always emitting and swizzling 16 components for raw samples,
scale it by the number actually needed as defined by the selected tg4
channel/components.
Signed-off-by: Simon Perretta <simon.perretta@imgtec.com>
Acked-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40687>
The array index value is a signed integer but the compiler was using
the unsigned version of the clamp helper function meaning the value
was not been clamped to 0 when its value was < 0.
Fix the following deqp test cases when shaderImageGatherExtended is enabled
dEQP-VK.glsl.texture_gather.basic.2d_array.*
dEQP-VK.glsl.texture_gather.offset.*.2d_array.*
dEQP-VK.glsl.texture_gather.offset_dynamic.*.2d_array.*
dEQP-VK.glsl.texture_gather.offsets.*.2d_array.*
Fixes: 854563f0f8 ("pco: fully switch over to common smp emission code")
Signed-off-by: Nick Hamilton <nick.hamilton@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40687>
Use lower_tg4_offsets to take care of explicit offsets, and just swizzle
the texels in the order defined by textureGather*
Fixes: 46c9239c11 ("pvr, pco: initial texture gather support with gather sampler")
Signed-off-by: Simon Perretta <simon.perretta@imgtec.com>
Acked-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40687>
Increase past the minimum required by the Vulkan Spec to fix tests. This
was needed due to Zink requirements which splits
`maxPerStageDescriptorStorageBuffers` between atomic buffers and
`MaxShaderStorageBlocks`.
Fixes the following GLES conformance tests:
KHR-GLES31.core.compute_shader.resources-max
KHR-GLES31.core.draw_indirect.advanced-twoPass-Compute-arrays
KHR-GLES31.core.shader_image_load_store.advanced-sync-vertexArray
KHR-GLES31.core.shader_image_load_store.basic-allTargets-store-cs
KHR-GLES31.core.shader_image_load_store.basic-allTargets-store-fs
KHR-GLES31.core.shader_storage_buffer_object.advanced-unsizedArrayLength-cs-int
KHR-GLES31.core.shader_storage_buffer_object.basic-stdLayout_UBO_SSBO-case1-cs
KHR-GLES31.core.shader_storage_buffer_object.basic-stdLayout_UBO_SSBO-case2-cs
dEQP-GLES31.functional.draw_indirect.compute_interop.combined.drawelements_compute_cmd_and_data_and_indices
dEQP-GLES31.functional.synchronization.in_invocation.ssbo_alias_overwrite
dEQP-GLES31.functional.synchronization.in_invocation.ssbo_alias_write
dEQP-GLES31.functional.synchronization.in_invocation.ssbo_atomic_alias_overwrite
dEQP-GLES31.functional.synchronization.in_invocation.ssbo_atomic_alias_write
dEQP-GLES31.functional.synchronization.inter_call.with_memory_barrier.ssbo_atomic_multiple_write_read
dEQP-GLES31.functional.synchronization.inter_call.with_memory_barrier.ssbo_multiple_write_read
dEQP-GLES31.functional.synchronization.inter_invocation.ssbo_alias_overwrite
dEQP-GLES31.functional.synchronization.inter_invocation.ssbo_alias_write
dEQP-GLES31.functional.synchronization.inter_invocation.ssbo_atomic_alias_overwrite
dEQP-GLES31.functional.synchronization.inter_invocation.ssbo_atomic_alias_write
Backport-to: 26.0
Signed-off-by: Arjob Mukherjee <arjob.mukherjee@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41156>
Mali >= v11 has a Conservative Rast Mode field in DCD Flags 0 with
values Disabled and Over Estimate. Wire it to vk_runtime's
rasterization state and expose the extension on PAN_ARCH >= 11, with
caps restricted to overestimate only — HW has no underestimate value
and no overestimation-size granularity.
On v11-v13, degenerate triangles produce a wrong fragment w when
overestimate is enabled, so cull_zero_area is forced on alongside
the mode bit and degenerateTrianglesRasterized is reported as false.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41189>
The structure will grow in later commits.
The major change is that the preheader and exit blocks are replaced
by tracking just the innermost optimized nir_loop * and getting the
predecessor and successor blocks out of it.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41220>
ac_nir_optimize_outputs() might pack user varyings into the color
built-ins. If this happens we skip adding clamping to the
components that contain the user varying.
This change also fixes a second bug where a color built-in can be
packed into a non-color slot and was no longer being clamped.
Fixes: 3777a5d7 ("radeonsi: assign param export indices before compilation")
Closes: #14443
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40594>
No intended functional change.
This prevents possible breakage due to DCE removing input loads followed
by nir_shader_gather_info updating input masks and changing the result of
ac_nir_get_io_driver_location after PS input register contents are already
determined.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41175>
This is similar to
75ede9d9bc ("intel/brw: track last successful pass and leave the loop early")
except that it uses the common nir helpers.
Note that I've also marked nir_opt_peephole_select as NOT_IDEMPOTENT
because I'm skeptical that it actually is idempotent. This differs from
both brw and radv.
I'm also marking gcm as not idempotent because it isn't idempotent in
practice on one of the shaders in my shader-db:
2bf4ba7133/fossils/blender
pipeline hash 0e972f8e349af903
This is about a 4% geomean compile time speedup on my local collection
of shaders.
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41118>
This moves from deprecated stage_id/hw_queue_id to per-context
stage_iid/hw_queue_iid, which leads to separate timelines per app.
There are several benefits to this:
- Different driver versions could be used by different apps and perfetto
won't confuse tracepoints.
- Tracepoints from different apps may not align perfectly, so previously
we got a fair amount of weird vertical ordering of tracepoints.
The downside is that info is spread across several timelines multiplied
by queues, but I think that's better since it is easier to understand
which tracepoints correspond to which app.
The changes are mostly copied from radeon/intel perfetto integration.
This also fixes app_event emission along the way, previously
debug_marker_stage was called _before_ SEQ_INCREMENTAL_STATE_CLEARED.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41105>
Add latest maintenance extensions required by Android Vulkan
requirements, except VK_KHR_maintenance5 which enables dynamic rendering
on ANGLE and causes issues with some cuttlefish targets.
Test: CI
Reviewed-by: Aaron Ruby <aruby@qnx.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41210>
On my Arc A380 (DG2), this more than doubles the performance of Jeff
Bolz's cooperative matrix benchmark. With llama.cpp modified to use
cooperative matrix on DG2, performance is improved by 37%.
Closes: #15311
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Matt Corallo <git@bluematt.me>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41172>
The next commit will cause some very specific phis to not be lowered to
scalar, and that's the reason the callback is used instead of
nir_lower_all_phis_to_scalar.
It's worth noting that the comment in nir_lower_phis_to_scalar.c
specifically calls out Deus Ex as the reason some phis should not be
lowered. At least on current BRW, zero shaders from Deus Ex trace were
affected for spills or fills on any Intel platform.
shader-db:
All Intel platforms had similar results. (Lunar Lake shown)
total instructions in shared programs: 17050005 -> 17051449 (<.01%)
instructions in affected programs: 41032 -> 42476 (3.52%)
helped: 29 / HURT: 159
total cycles in shared programs: 876411976 -> 876433702 (<.01%)
cycles in affected programs: 1455550 -> 1477276 (1.49%)
helped: 40 / HURT: 150
fossil-db:
All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 916599633 -> 916694854 (+0.01%); split: -0.00%, +0.01%
CodeSize: 14705971792 -> 14708302384 (+0.02%); split: -0.00%, +0.02%
Send messages: 40870114 -> 40870113 (-0.00%)
Cycle count: 102360965889 -> 102364169753 (+0.00%); split: -0.00%, +0.01%
Spill count: 3460669 -> 3460240 (-0.01%)
Fill count: 4988325 -> 4987891 (-0.01%)
Max live registers: 192914542 -> 192918153 (+0.00%); split: -0.00%, +0.00%
Max dispatch width: 48848112 -> 48848128 (+0.00%)
Non SSA regs after NIR: 141633613 -> 141671589 (+0.03%); split: -0.00%, +0.03%
Totals from 5713 (0.28% of 2010434) affected shaders:
Instrs: 5215921 -> 5311142 (+1.83%); split: -0.09%, +1.91%
CodeSize: 88940784 -> 91271376 (+2.62%); split: -0.20%, +2.82%
Send messages: 284751 -> 284750 (-0.00%)
Cycle count: 275671864 -> 278875728 (+1.16%); split: -0.74%, +1.90%
Spill count: 857 -> 428 (-50.06%)
Fill count: 845 -> 411 (-51.36%)
Max live registers: 667776 -> 671387 (+0.54%); split: -0.86%, +1.40%
Max dispatch width: 160416 -> 160432 (+0.01%)
Non SSA regs after NIR: 1127904 -> 1165880 (+3.37%); split: -0.10%, +3.47%
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Matt Corallo <git@bluematt.me>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41172>
Enable hardware decode/encode capabilities for VCN 5.3.0 by
configuring the supported codec list. This allows vainfo to
properly enumerate available codec capabilities.
Signed-off-by: Suresh Guttula <suresh.guttula@amd.com>
Reviewed-by: David Rosca <david.rosca@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41202>
Enable VK_KHR_performance_query on GFX11 (RDNA3 / RDNA3.5) now that the
selector tables and packet emission are in place.
Tested on Strix Halo with dEQP-VK.query_pool.performance_query.* (6 pass,
6 not-supported for the allowCommandBufferQueryCopies cases).
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41157>
GFX11 reorganizes the shader perfcounter blocks: wave counts move from
SQ to the SQG registers (still mapped as the SQ block in ac/), while
per-instruction counters move from SQ to the new SQ_WGP block.
Add GFX11-specific selector enums using the new block assignments and
branch radv_query_perfcounter_descs to select them on GFX11+. GL2C,
GL1C, and TCP selectors are unchanged between GFX10.3 and GFX11.
The "Instructions" (total count) counter is dropped on GFX11 as there
is no direct SQ_WGP equivalent for INSTS_ALL.
Selector indices verified against gpu_performance_api's
gpa_hw_counter_gfx11.cc.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41157>
Some perfcounter blocks (e.g. SQ_WGP on GFX11) define num_spm_modules
but have no select1 register array. Skip the select1 loop when the
array is NULL.
This is a prerequisite for enabling performance queries on GFX11.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41157>
Casting a negative float to uint64_t is undefined behavior. GCC 15 with
-O2 produces 0xFFFFFFFFFFFFFFFF for (uint64_t)(-32767.5f), causing snorm
clear values to be packed incorrectly (e.g. 0xFFFF instead of 0x8001 for
snorm16 -1.0). This results in wrong DCC comp-to-single clear colors and
~966 CTS snorm multisample_resolve test failures.
Fix by casting through int64_t first, which is well-defined (truncation
toward zero) and preserves the two's complement bit pattern.
Fixes: 585c25be1e ("radv: fix color conversions for normalized uint/sint formats")
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41164>