nir_opt_vectorize could replace swizzled movs with vectorized movs in a
different block. If this happens with swizzled movs in a then block, it
could leave this block empty. ir3 assumes only the else block can be
empty (e.g., when lowering predicates) so make sure ifs are in that
canonical form again.
This fixes empty predication blocks in some shaders, for example:
    predt
    predf
    ...
    prede
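As a rough illustration of what "canonical form" means here, the following sketch inverts the condition and swaps the branches whenever only the then block is empty. The `If` class and `canonicalize_if` are hypothetical stand-ins written for this example, not the NIR or ir3 data structures.

```python
from dataclasses import dataclass, field


@dataclass
class If:
    """Toy stand-in for an if node: a condition plus two instruction lists."""
    cond: str
    then_block: list = field(default_factory=list)
    else_block: list = field(default_factory=list)


def canonicalize_if(node: If) -> If:
    """If only the then block is empty, invert the condition and swap branches."""
    if not node.then_block and node.else_block:
        return If(cond=f"!({node.cond})",
                  then_block=node.else_block,
                  else_block=[])
    # Already canonical: the then block has code, or both blocks are empty.
    return node
```

After this rewrite only the else block can be empty, which is the shape the text says ir3 expects.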
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34272>
At least on all a6xx/a7xx, mad.f32 and mad.f16 are not fused. This means
that when the sources of a NIR ffma are all uniform we can split it in
two to execute it on the scalar ALU. This is important to reduce
register pressure and to allow more preambles to be executed early.
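The split can be sketched on a toy instruction list as follows. The tuple-based instruction format, the `is_uniform` predicate and the temporary name are assumptions made for this example; they are not the NIR API.

```python
def split_uniform_ffma(instr, is_uniform):
    """Rewrite ("ffma", dst, a, b, c) into fmul + fadd when all sources are uniform.

    `instr` is a hypothetical tuple-based instruction and `is_uniform` is a
    predicate over source names. The rewrite is only valid where mad is not
    fused, because the split rounds the product before the add.
    """
    op, dst, *srcs = instr
    if op != "ffma" or not all(is_uniform(s) for s in srcs):
        return [instr]
    a, b, c = srcs
    tmp = f"{dst}_mul"  # hypothetical temporary name
    return [("fmul", tmp, a, b), ("fadd", dst, tmp, c)]
```

An ffma with any non-uniform source is returned unchanged, so it still maps to a single mad.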
On fossil-db the statistics are mostly a wash as expected, but with
early preambles increasing dramatically:
Totals:
MaxWaves: 2249180 -> 2249230 (+0.00%); split: +0.01%, -0.01%
Instrs: 49668884 -> 49662951 (-0.01%); split: -0.12%, +0.11%
CodeSize: 103662656 -> 103831154 (+0.16%); split: -0.22%, +0.38%
NOPs: 8502571 -> 8495568 (-0.08%); split: -0.61%, +0.53%
MOVs: 1554442 -> 1538804 (-1.01%); split: -2.01%, +1.01%
Full: 1820906 -> 1814292 (-0.36%); split: -0.39%, +0.03%
(ss): 1168628 -> 1165868 (-0.24%); split: -1.01%, +0.78%
(sy): 616751 -> 616521 (-0.04%); split: -0.52%, +0.49%
(ss)-stall: 4384397 -> 4361662 (-0.52%); split: -1.44%, +0.93%
(sy)-stall: 17850227 -> 17858949 (+0.05%); split: -0.58%, +0.63%
Early-preamble: 102262 -> 115702 (+13.14%)
Cat0: 9375820 -> 9367978 (-0.08%); split: -0.57%, +0.48%
Cat1: 2470212 -> 2454318 (-0.64%); split: -1.28%, +0.64%
Cat2: 18673655 -> 18707106 (+0.18%)
Cat3: 14227810 -> 14211106 (-0.12%)
Cat5: 1424184 -> 1424150 (-0.00%)
Cat7: 1404718 -> 1405808 (+0.08%); split: -0.39%, +0.47%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34115>
wsi_configure_image() is already called with the same info by
configure_image() in wsi_swapchain_init(), so this second call is
unnecessary. Furthermore, the second call leaked the queue family
indices array.
Fixes: d4a2c0fc ("vulkan/wsi: add a headless swapchain implementation/option")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12811
Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34194>
This was handled in other instances in a previous patch, but this
instance remains, as the zlib decompression routine is slightly
different.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34118>
There are three copies of this function, and all of them have the same
memory leak. Instead of fixing them one by one, just use a common
implementation for all three, since they already share a helper lib.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34118>
With "classic" renderpasses, the VkFramebuffer's layerCount must be 1 if
multiview is enabled. We accidentally rely on this to not disable GMEM
for multiview, and possibly for other things too. Apparently the dynamic
rendering equivalent, VkRenderingInfo::layerCount, can be anything when
multiview is enabled, and some CTS tests set it to the number of views.
Sanitize it when constructing the internal framebuffer for dynamic
rendering.
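A minimal sketch of the sanitization, assuming that a non-zero view mask is what signals multiview and that "sanitize" means forcing the count to 1 in that case. The function name is hypothetical.

```python
def sanitize_layer_count(layer_count: int, view_mask: int) -> int:
    """Return the layer count to store in the internal framebuffer."""
    # Assumption: a non-zero view mask means multiview is enabled.
    if view_mask != 0:
        # Match the "classic" renderpass rule: layerCount is 1 with multiview.
        return 1
    return layer_count
```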
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34080>
We were a bit too conservative and fully disabled LRZ when stencil
or blending was involved. There is no need to fully disable LRZ
in those cases; only LRZ writes should be disabled.
The final rules are:
LRZ is DISABLED until depth attachment is cleared when:
- Depth Write + changing direction of depth test
  e.g. from OP_GREATER to OP_LESS;
- Depth Write + OP_ALWAYS or OP_NOT_EQUAL;
- Clearing depth with vkCmdClearAttachments;
- Depth image is a target of blit commands;
- (pre-a650) Not clearing depth attachment with LOAD_OP_CLEAR;
- (pre-a650) Using secondary command buffers;
LRZ WRITE is DISABLED until depth attachment is cleared when:
- Depth Write + blending (color blend, logic ops, partial color mask, etc.);
- Fragment may be killed by stencil;
LRZ is disabled for CURRENT draw when:
- Fragment shader side-effects (writing to SSBOs, atomic operations, etc);
- Fragment shader writes depth or stencil;
LRZ WRITE is DISABLED (via LATE_Z) for CURRENT draw when:
- Fragment may be killed via alpha-to-coverage, discard, sample coverage;
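The per-draw part of these rules can be sketched as a small decision function. The `Draw` fields and the effect names are hypothetical and exist only for this example. The command-level rules (vkCmdClearAttachments, blits, the pre-a650 cases) are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Draw:
    """Hypothetical per-draw state; field names are illustrative only."""
    depth_write: bool = False
    depth_direction_changed: bool = False
    depth_op: str = "LESS"
    blending: bool = False
    stencil_may_kill: bool = False
    fs_side_effects: bool = False
    fs_writes_depth_or_stencil: bool = False
    fs_may_kill: bool = False  # alpha-to-coverage, discard, sample coverage


def lrz_effects(d: Draw) -> set:
    """Return which of the four rule groups above a draw triggers."""
    effects = set()
    if d.depth_write and (d.depth_direction_changed
                          or d.depth_op in ("ALWAYS", "NOT_EQUAL")):
        effects.add("lrz_disabled_until_clear")
    if (d.depth_write and d.blending) or d.stencil_may_kill:
        effects.add("lrz_write_disabled_until_clear")
    if d.fs_side_effects or d.fs_writes_depth_or_stencil:
        effects.add("lrz_disabled_for_draw")
    if d.fs_may_kill:
        effects.add("lrz_write_disabled_for_draw")
    return effects
```

With this shape, a draw that writes depth with blending only loses LRZ writes; LRZ testing stays available.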
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33851>
When creating the image view in the texel buffer shader copy function,
take into account that the region to copy can start at a Z-offset other
than 0.
This fixes several failing dEQP-VK.image.concurrent_copy.* tests.
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34112>
As we will be creating an image view that covers the region to copy,
batch all the regions that share the same depth offset and depth extent.
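The batching criterion can be sketched as a plain grouping step. The dictionary keys `z_offset` and `depth` are hypothetical names for the region's depth offset and depth extent; this is not the driver's data layout.

```python
from collections import defaultdict


def batch_regions_by_depth(regions):
    """Group copy regions by (z offset, depth extent), preserving first-seen order."""
    batches = defaultdict(list)
    for region in regions:
        key = (region["z_offset"], region["depth"])
        batches[key].append(region)
    return list(batches.values())
```

Each resulting batch can then share one image view that covers its depth range.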
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34112>
The source format is not involved at all in creating the blit render
pass, so remove it from the function call.
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34112>
Remapping was missing for the format description, which made these
formats effectively unsupported because zero format features were reported.
Fixes: 0098f8ef35 ("radv: Remap 10 and 12 bit formats to 16 bit formats")
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34274>
VK_MESA_image_alignment_control is used by vkd3d-proton to set
optimal alignments for images. However, the preferred alignment was
only applied to the surface (or the stencil aspect) and not to the HiZ
surface, due to the NULL check.
This caused rendering issues because swizzle modes didn't match.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12831
Fixes: 079f55d405 ("radv: advertise VK_MESA_image_alignment_control on GFX12")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34322>
This adds the latency information provided by NVIDIA. It is copied
from Excel spreadsheets provided to Red Hat.
This fully passes CTS on Turing TU104 with no regressions.
I'm sure future use of some instructions like IMAD may require some
changes to this, but it should be functionally complete.
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33573>
Delays greater than 15 need to be encoded into a nop following the
instruction. These delays will start to happen once we add accurate
latency handling, and with certain instructions.
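One way to picture the split is the sketch below. It assumes that the instruction keeps the maximum delay of 15 and that the remainder is carried by trailing nops, each also limited to 15. The nop limit and the use of more than one nop are assumptions made for this example, not taken from the hardware documentation.

```python
MAX_INSTR_DELAY = 15  # largest delay the instruction itself can carry, per the text


def split_delay(delay: int):
    """Return (delay on the instruction, list of delays for trailing nops)."""
    if delay <= MAX_INSTR_DELAY:
        return delay, []
    nops = []
    remaining = delay - MAX_INSTR_DELAY
    while remaining > 0:
        # Assumption: a nop can carry at most the same delay as an instruction.
        step = min(remaining, MAX_INSTR_DELAY)
        nops.append(step)
        remaining -= step
    return MAX_INSTR_DELAY, nops
```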
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33573>
Otherwise we schedule this sort of thing wrong:
    r0 = iadd3 r0 c[0x0][0x0] rZ
    r0 = shf.l.w.i32 r0 rZ 0x2
    r0 p0 = iadd3 r0 c[0x1][0x0] rZ
RAW latencies are more important than WAW, but we end up using a WAW
for the first two instructions instead of a RAW, which would be correct.
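The underlying idea, that a pair of instructions can have both hazards at once and the larger latency must win, can be sketched like this. The instruction dictionaries and the latency numbers are made up for the example and do not reflect NAK's real tables.

```python
def required_delay(prev, cur, raw_latency, waw_latency):
    """Cycles `cur` must wait after `prev`, given hypothetical latencies.

    Instructions are dicts with "dsts" and "srcs" sets of register names.
    """
    delay = 0
    if prev["dsts"] & cur["srcs"]:  # read-after-write hazard
        delay = max(delay, raw_latency)
    if prev["dsts"] & cur["dsts"]:  # write-after-write hazard
        delay = max(delay, waw_latency)
    return delay
```

For the first two instructions in the example, both read and write r0, so both checks fire and the RAW latency is the one that has to be honored when it is the larger of the two.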
Fixes: 2d4e445099 ("nak/calc_instr_deps: Rewrite calc_delays() again")
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33573>
These fail on Polaris10 too.
Are these tests even valid? Is this a Zink bug?
Vulkan CTS is happy with our implementation.
Signed-off-by: Autumn Ashton <misyl@froggi.es>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28237>
We need to expose this, as we support it.
Otherwise 1x1 is assumed and we fail some CTS tests.
Signed-off-by: Autumn Ashton <misyl@froggi.es>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28237>
When we are using compute resolve, we can get
values the CTS does not expect due to the value
we end up writing for UNORM in
`nir_image_deref_store`.
Make the compute resolve rounding path match with
the output of the fragment shader resolve path,
by going through the same FP16 RTZ conversion as
we do for UNORM/SNORM formats.
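To show how round-toward-zero differs from the usual round-to-nearest when converting to FP16, here is a small sketch. It works on Python floats (doubles) rather than real FP32 inputs and only handles non-negative values below the FP16 maximum. It is not the shader code, and `f32_to_f16_rtz` is a hypothetical helper name.

```python
import struct


def f32_to_f16_rtz(x: float) -> float:
    """Round a non-negative finite value below the FP16 maximum toward zero."""
    if not 0.0 <= x < 65504.0:
        raise ValueError("sketch only handles 0 <= x < 65504")
    # struct's 'e' format converts with round-to-nearest-even.
    (bits,) = struct.unpack("<H", struct.pack("<e", x))
    (nearest,) = struct.unpack("<e", struct.pack("<H", bits))
    if nearest > x:
        # Nearest rounding went up; step back one representable value.
        bits -= 1
    (result,) = struct.unpack("<e", struct.pack("<H", bits))
    return result
```

For an input such as 0.3 the two rounding modes give different FP16 values. A difference of that kind is what can make two resolve paths disagree in the final UNORM result.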
This is why VK_EXT_sample_locations CTS was
failing on > GFX9.
On <= GFX9, I am assuming we are falling back to
RESOLVE_FRAGMENT, due to DCC stuff, which is why
it works there.
I tested a handful of images from the Vulkan CTS
for the sample locations and resolve tests for
different UNORM formats, taken from the qpa file,
both forcing FRAGMENT and with this change.
With this change, the compute resolve path now
produces the same SHA as the fragment path for the
images I compared with ImageMagick `identify`.
CTS passes for: *resolve*, *image_clearing* and
*sample_locations* on RX 7900XTX.
Signed-off-by: Autumn Ashton <misyl@froggi.es>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28237>
Taken from OpenGL-Registry commit ca62982097eb
("Remove plural bindings in GL_ARB_shader_texture_image_samples (#637)")
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33356>