This message has been confusing users, especially now that
popular toolkits such as Gtk started using a Vulkan renderer.
Printing a message on non-conformant implementations is also
actually not required. So let's remove it.
We haven't fully finished the GFX12 implementation yet, but on
all other hardware, RADV should work just fine, and is definitely
not meant for "testing use only".
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12314
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32930>
If there is a 4-byte hole between 2 loads, they are vectorized. Example:
load 4 + hole 4 + load 8 -> load 16
This helps GLSL uniform loads, which are often sparse. See the code for more
info.
RADV could get better code by vectorizing later.
radeonsi+ACO - TOTALS FROM AFFECTED SHADERS (45482/58355)
Spilled SGPRs: 841 -> 747 (-11.18 %)
Code Size: 67552396 -> 65291092 (-3.35 %) bytes
Max Waves: 714439 -> 714520 (0.01 %)
This should have no effect on LLVM because ac_build_buffer_load scalarizes
SMEM, but it's improved for some reason:
radeonsi+LLVM - TOTALS FROM AFFECTED SHADERS (4673/58355)
Spilled SGPRs: 1450 -> 1282 (-11.59 %)
Spilled VGPRs: 106 -> 107 (0.94 %)
Scratch size: 101 -> 102 (0.99 %) dwords per thread
Code Size: 14994624 -> 14956316 (-0.26 %) bytes
Max Waves: 66679 -> 66735 (0.08 %)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29399>
There is nothing preventing ACO from generating loads with unused
components. This happens often with GLSL uniforms. Some of those loads
are partially re-vectorized after this.
radeonsi+ACO:
TOTALS FROM AFFECTED SHADERS (19564/58918)
VGPRs: 732900 -> 728448 (-0.61 %)
Spilled SGPRs: 429 -> 433 (0.93 %)
Code Size: 38446004 -> 38485612 (0.10 %) bytes
Max Waves: 305440 -> 305549 (0.04 %)
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29399>
Instead of using a different voffset VGPR per streamout vertex,
point voffset to the first vertex for all 3 vertices because
the stride and vertex index are constant and can be in the immediate
offset.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32686>
Walk the whole vertex stride thanks to XFB info sorted by offset, gather
individual components from same or different outputs, and once we have
gathered 4, store them as vec4.
It also removes the memory_modes field from VMEM stores because I don't
think it's needed.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32686>
Walk the whole vertex stride thanks to XFB info sorted by offset, gather
individual components from same or different outputs, and once we have
gathered 4, store them as vec4.
It also removes the COHERENT flag from VMEM stores because NGG streamout
doesn't use it either and I don't think it's needed.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32686>
This is lowered in backend compilers (LLVM or ACO) because it needs
to access ttmp registers which aren't exposed to NIR.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32940>
If the remaining character check fails, we should try a later line if
skip_lines=True. So the check has to be done earlier.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32902>
The loop checking if exec is overwritten didn't check for NULL
instructions, and didn't fix up reg write indices after inserting
instructions.
Fixes: fcd94a8c ("aco: move try_optimize_branching_sequence() to postRA optimizations")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32746>
This is similar to what link_intrastage_shaders is doing and it
fixes the following test:
KHR-Single-GL46.subgroups.builtin_var.compute.subgroupsize_compute
Which was failing with SPIRV but passing with GLSL, the diff being:
- SPIRV: "subgroup_size: 1"
- GLSL: "subgroup_size: 2"
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32698>
The blitter VS expects coords to fit in a signed int16. When this
is not the case, use util_blitter_draw_rectangle instead.
Since util_blitter_draw_rectangle sets vertex elements, we need
to make sure they're properly restored.
The alternative to this fallback would be to pass coordinates
unpacked (so 4 SGPRs instead of 2), but this doesn't fix the
fbo-blit-check-limits test because of uv interpolation precision
issue.
Using 2 triangles instead of a rectangle + disabling
window_space_position helps but then this breaks some GLES3 tests,
like dEQP-GLES3.functional.fbo.blit.rect.nearest_consistency_mag_reverse_src_x
(which doesn't pass either if u_blitter is used for all cases).
Using a single triangle covering the whole rectangles fixes all
cases but it then requires to setup scissors to not write too
much pixels...
So, instead of adding so much complexity, let's use u_blitter
for the "large coordinates" fallback, and keep the rectangle blit
for the other cases.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32698>
GFX11 allowed only one swizzle mode for the VRS image but GFX12 allows
all 2D non-linear swizzle modes and PC_SC_VRS_INFO needs to be
configured.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32914>
Now that ACO has become the default on pre-RDNA GPUs, all pre-merge CI
coverage of radeonsi+LLVM has disapeared. Let's fix this by making
our post-merge glcts-vangogh-valve job run inpre-merge pipelines.
However, we are limited in vangogh capacity, so rather than running the
full glcts/piglit test suites we run a fraction of it to stay under 15
minutes of execution time on a single Steam Deck.
Suggested-by: Marek Olšák <maraeo@gmail.com>
Signed-off-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32865>
This caused regression by using higher pitch than needed on compute-only
devices, resulting in video decode errors.
Fixes: 308bae950f ("ac/surface: Add RADEON_SURF_VIDEO_REFERENCE")
Tested-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32863>
The Vulkan spec says:
"The application can enable a logical operation between the
fragment’s color values and the existing value in the framebuffer
attachment. This logical operation is applied prior to updating
the framebuffer attachment. Logical operations are applied only
for signed and unsigned integer and normalized integer
framebuffers. Logical operations are not applied to floating-point
or sRGB format color attachments."
Missing VKCTS coverage has been reported.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12345
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32826>
Video decode target needs custom height alignment, but tex descriptor
still needs to be set to the original size the image was created with.
This makes the descriptor wrong for layer > 0, so we need to calculate
the layer offset and add it to bo address for this case.
Fixes: 5deb476095 ("radv: align video images internal width/height inside the driver.")
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32069>
ac_nir_lower_intrinsics_to_args will lower most system values.
I have to keep the divergence analysis in ACO, otherwise it goes haywire.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>