Structured tagging captures a checksum of the component we think we're
building, and verifies this through the chain.
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34813>
Bumping DEBIAN_BASE_TAG already triggers a rebuild of the test-gl and
test-vk containers, so their tags don’t need to be updated manually.
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34813>
The zink-venus-lvp job wasn’t sourcing setup-test-env.sh like the other
Xvfb jobs, which will be necessary for an upcoming change.
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34813>
Instead of reuploading them to a fresh BO every time the blend state
changes. This allows us to drop the separate blend shader cache for the
fb preload shaders.
This improves gfxbench gl_driver FPS on G610 from 42.39 to 61.94,
which is now slightly faster than the DDK (57.76).
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Acked-by: Ryan Mckeever <ryan.mckeever@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34666>
This improves blend shader performance slightly over loading the
constants from the sysval UBO. On bifrost, we have 128 FAU words, so
reserving 4 words for blend constants is not a significant cost. On
midgard, register mapped uniforms share space with working registers.
With high working register pressure, we only allocate 32 uniform
registers, and so would lose 1/8 of available space to blend constants
if we implemented the same optimization.
This improves gfxbench gl_driver FPS on G610 from 40.48 to 42.39. I did
not measure any regressions on benchmarks I tested that did not use
blend shaders.
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Acked-by: Ryan Mckeever <ryan.mckeever@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34666>
This is similar to the approach in panvk, where we pass blend constants
to the blend shader in fixed FAU slots instead of specializing the
shader on blend constants. TODO: explain midgard stuff
This eliminates the blend shader variant cache, which performed very
badly when the working set of blend constants in an application was >32
(the maximum number of variants stored). Just increasing the cache size
like we did in f1f39fa645 ("panfrost: Increase the limit for blend
shader variants") would help for applications with a larger static set,
but we would still have cache thrashing on applications which change the
blend constants dynamically.
For gfxbench gl_driver, which uses 386 blend constant values, this
improves FPS on a G610 from 6.06 to 40.48. Most applications are
unaffected, because they don't use enough constant values to cause
thrashing.
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Acked-by: Ryan Mckeever <ryan.mckeever@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34666>
We want the sysval UBO at a fixed index in order to pass dynamic blend
constants to blend shaders, which share UBOs with the fragment shader.
This requires shifting the existing UBOs to make space for a new UBO at
a low index. This remapping happens late, once we know whether sysvals
are actually used by a variant, so applications still use the original
indices.
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Acked-by: Ryan Mckeever <ryan.mckeever@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34666>
The blend shader cache from pan_blend.h is not used in panvk, which has
it's own blend shader cache and compilation entrypoint. Moving this
allows us to use gallium-specific things in the cache.
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Acked-by: Ryan Mckeever <ryan.mckeever@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34666>
VK_ERROR_INITIALIZATION_FAILED will fail physical device enumeration.
Returning VK_ERROR_INCOMPATIBLE_DRIVER means that the driver can still
be used on supported GPUs when multiple GPUs are installed.
cc: mesa-stable
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34783>
This warning message has been annoying people for a long time. Let's
just get rid of it. I don't think it's really helping anyone with
anything and it's just burning logging space.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34853>
We represent vectors as packed anyway so all these ops are just data
motion that we already know how to do. Calling into NIR for these
doesn't really help. It also avoids potential optimization loops in NIR
where pack op lowering conflicts with itself. It's simpler just to
handle it all in the back-end and trust our prmt optimization and copy
propagation to clean it all up.
Reviewed-by: Mel Henning <drawoc@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34849>
Otherwise, these can cause infinite loops in optimization because there
aren't _split variants and the optimizer tries to combine and split
things infinitely.
Reviewed-by: Mel Henning <drawoc@darkrefraction.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34849>
The vdrm_vpipe_connect() prototype defined in vdrm.c doesn't match
vdrm_vpipe_connect() defined in vdrm_vpipe.c. This leads to a compilation
warning about the wrong proto when Mesa linked with enabled LTO. Fix the
vdrm_vpipe_connect() definition.
Fixes: bf0e3d6274 ("virtio/vdrm: Add vtest backend")
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34819>
Previously, this function tried to swap the instruction which is not
v_mov_b32, so that it doesn't introduce any new OPY-only instructions. If
both were v_mov_b32, it swapped Y. Since this makes Y opy-only, this can't
be done if X is also opy-only.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: 408fa33c09 ("aco/gfx12: don't use second VALU for VOPD's OPX if there is a WaR")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13101
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34841>
this is useful across drivers for maint5 semantics on mobile hw.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34762>
When copying between buffers, find the biggest possible block size usable
for all copy regions. A common block size is used since using different
block sizes can require additional flushing between different blocks.
Besides the single-byte and 4-byte block sizes, also allow for 16-byte
block size and the appropriate corresponding format. Using bigger block
size when possible helps potentially reduce the number of required
CP_BLIT operations. Tested on the Crucible benchmarks, especially for
larger copy regions this can improve throughput up to 3x.
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34587>
Some intrinsics are implemented by reading memory location that could
be rewritten by a further tracing calls. So we need to move those
reads prior to tracing operations in the shaders.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8979
Tested-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34214>
Will be useful if app doesn't specify depth direction correctly.
E.g. the capture of "Sons of The Forest" I have has a shader
where `gl_FragDepth` has `layout(depth_less)`, but the output for different
fragments is actually sometimes less, sometimes more than the original depth
by a tiny margin.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34423>
Specifying depth write direction in shader may help us. E.g.
If depth test is GREATER and FS specifies FRAG_DEPTH_LAYOUT_LESS
it means that LRZ won't kill any fragment that shouldn't be killed,
in other words, FS can only reduce the depth value which could
make fragment to NOT pass with GREATER depth test. We just have to
enable late Z test.
There is the same concept in D3D11 and it is seen e.g. in "Stray" game.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34423>
Add a debug option which checks the status of a command buffer, if
it has completed execution on the GPU, before it's reset or
destroyed.
This works by getting the GPU to write to a specific memory address
on vkBeginCommandBuffer and vkEndCommandBuffer, so the CPU can check
that the GPU has written the TU_CMD_BUFFER_STATUS_IDLE to the slot
before actually resetting or freeing the command buffer.
This can to help in debugging sync issues within the driver.
Signed-off-by: Karmjit Mahil <karmjit.mahil@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34383>
Consider the flag from PPS when setting tc/beta offset.
This fixes some artifacts when decoding a hevc video,
hevc_scaling_list4.mkv from Lynne.
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34782>
This is getting complicated and depends on some inter-linked details for
safety such as values being in-range. It's safer if the rest of the IR
is forced to use public interfaces.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34794>
wherever we check that src_mod is none.
This commit simply does:
s/src_mod.is_none()/is_unmodified()/
across all of nak except the definition of is_unmodified() itself.
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Fixes: bad23ddb48 ("nak: Add F16 and F16v2 sources")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34794>
This lets us store up to 16 SSAValues in an SSARef, while keeping the
common case of 4-or-fewer SSAValues allocation-free.
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34794>