The progressive YUV shader used in vl_compositor_yuv_deint_full
does the same thing as util_compute_blit, but it also supports rotation.
Remove vlVaPostProcBlit and instead move the code to vlVaPostProcCompositor.
Reviewed-by: Thong Thai <thong.thai@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32919>
Copy sampler view for last component to remaining components.
Fixes sampling from Y8_400 luma-only format.
Fixes: 8a20e634ce ("gallium/vl: Add plane order for Y8_400 format")
Reviewed-by: Thong Thai <thong.thai@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32919>
Ensure we've read all the relevant NIR state before freeing it for the
current shader.
Also ensure we free the shaders in the same order we compile them.
Fixes: d93f9d6d1a ("panvk: use static noperspective when statically linking VS and FS")
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33011>
We're already limiting the maximum texture-size based on the available
system memory, so we shouldn't really need to limit this here. In
addition, the state-tracker also limits the max framebuffer size to
16384, so we don't have to worry about limiting this to the framebuffer
size either.
So I don't think we have a good reason to artificially limit the texture
size here. This allows us to support larger textures than 8192, which is
especially useful to support OpenCL images with RustiCL.
Unfortunately, while the HW supports up to 64k texture sizes, Gallium
currently caps us at 32k. So let's stick with that as the new limit for
now.
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32866>
We don't want huge textures to take up large amounts of system memory.
On a system with 1GB of RAM, this would limit us to 256 MB, which is the
same memory usage as a texture of 8192 x 8192 with 4 bytes per
component, which is what we currently limit max texture size to anyway.
The goal here is really to allow using larger textures on systems with
more memory, but that bit comes in a later patch in this series.
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32866>
We're doing a better job at selecting the tiler hierarchy mask in PanVK,
so let's move that to common code and reuse it for the Gallium driver as
well.
The logic to disable the first level for large tile-sizes has been left
at the call-sites, because this is specific to V10 GPUs and later, so it
doesn't apply to the JM code-paths.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32866>
We can handle invalidation of the FBO attachments at job
creation. It solves that we were skipping invalidations in jobs
that had been created by a clear call, as before this change
invalidations were only taken into account the first draw calls
of the job. In these cases where there is a clear after FB
invalidation the resource attachment was tracked as invalidated
for more time than expected. So the stores of the job with the
clear were not being loaded by the next job attaching it because
of the not correct application of the invalidation.
Fixes: 6c46890325 ("v3d: avoid load/store of tile buffer on invalidated framebuffer")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12456
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33148>
There are no calls anymore. It's just a loop with a switch now, which should
be implemented as a jump table by the compiler. The only calls are to
the pipe_context functions.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33087>
gcc should generate a jump table for the switch, so it should be faster than
indirect calls after we inline the calls in the next commit.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33087>
When a 2D_ARRAY view of a 3D image is created, the layer count of the 2D
view should be based on the depth of the 3D image. When
VK_REMAINING_ARRAY_LAYERS is used, we were incorrectly calculating them
from the layer count of the base image.
Fixes future tests: dEQP-VK.renderpass*.remaining_array_layers.*
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33075>
On the last iteration of the loop, `offset` will point to the location
just beyond the last instruction in the program. If the program exactly
fills `p->store` then calling `next_offset()` will read out of bounds.
Instead just let the inner while loop call `next_offset()` one
additional time.
Fixes: a35b9cb625 ("i965: Add annotation data structure and support code.")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12486
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33101>
On the last iteration of the loop, `offset` will point to the location
just beyond the last instruction in the program. If the program exactly
fills `p->store` then calling `next_offset()` will read out of bounds.
Instead just let the inner while loop call `next_offset()` one
additional time.
Fixes: a35b9cb625 ("i965: Add annotation data structure and support code.")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12486
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33101>
Another drm master may have previously set these to non-zero values, which
can change the image in undesired ways.
Signed-off-by: Xaver Hugl <xaver.hugl@kde.org>
Backport-to: 24.3
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32670>
On AMD, this is a clear win. 2.0 is a free constant,
the multiplication can be fused into fma, or it can
be done as a free output modifier. Otherwise, fmul
and fadd have the same throughput/latency, with the only
possible downside being potentially power consumption.
For other hardware this might not be as clear,
but we should at least choose one option for NIR because
it allows more CSE.
Foz-DB Navi21:
Totals from 12231 (15.41% of 79395) affected shaders:
MaxWaves: 309068 -> 309032 (-0.01%)
Instrs: 11826395 -> 11790132 (-0.31%); split: -0.31%, +0.00%
CodeSize: 63531496 -> 63512868 (-0.03%); split: -0.03%, +0.00%
VGPRs: 551256 -> 551328 (+0.01%); split: -0.00%, +0.02%
SpillSGPRs: 984 -> 979 (-0.51%)
Latency: 88486492 -> 88394296 (-0.10%); split: -0.11%, +0.01%
InvThroughput: 22360595 -> 22300790 (-0.27%); split: -0.27%, +0.00%
VClause: 226267 -> 226273 (+0.00%); split: -0.01%, +0.01%
SClause: 293820 -> 293783 (-0.01%); split: -0.02%, +0.00%
Copies: 727187 -> 727106 (-0.01%); split: -0.03%, +0.02%
PreSGPRs: 539623 -> 539625 (+0.00%)
PreVGPRs: 440843 -> 440946 (+0.02%); split: -0.00%, +0.03%
VALU: 8324962 -> 8288809 (-0.43%); split: -0.43%, +0.00%
SALU: 1278550 -> 1278538 (-0.00%); split: -0.00%, +0.00%
VMEM: 440600 -> 440599 (-0.00%)
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32989>
The drm device file is closed and re-opened between perfetto traces. So
we either need to re-create the fd_device, or use fd_device_new_dup() so
we hold on to our own fd. The former is preferred so that the kernel
can realize when we are no longer reading the perfcntrs.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33073>