We can't invalidate CCU if there is any dirty data that hasn't been
flushed yet. In the case where we clear depth, we know that the depth
attachment itself isn't dirty but there may be dirty data from other
renderpasses. Therefore we need to flush before invalidating depth.
Fixes: 487aa80 ("tu: Rewrite flushing to use barriers")
Closes: #6987
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17940>
(cherry picked from commit a7e64ab63c)
For example, "DS -> branch -> VMEM -> branch -> DS".
fossil-db (navi10):
Totals from 639 (0.40% of 161220) affected shaders:
Instrs: 629090 -> 628254 (-0.13%); split: -0.19%, +0.06%
CodeSize: 3410164 -> 3406748 (-0.10%); split: -0.14%, +0.04%
Latency: 7834755 -> 7821011 (-0.18%); split: -0.70%, +0.52%
InvThroughput: 1369698 -> 1374495 (+0.35%); split: -0.12%, +0.47%
A lot of the fossil-db changes are noise.
threekingdoms.8db138826c386a62.1.foz/0b222ed175eebad0 is an example of a
shader that actually has this issue.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Fixes: c037ba1bb7 ("aco/gfx10: Mitigate LdsBranchVmemWARHazard.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17697>
(cherry picked from commit b17e59a03b)
This might break out-of-order rasterization on GFX8-GFX9 because it
relies on the stencil write mask which can be dynamic.
Found by inspection.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17673>
(cherry picked from commit 2012246075)
This reverts commit 96fa23bca5.
The correct fix to the problem was a1bc152340, making this
change obsolete as the pass skips any vars marked with
always_active_io. There was no real advantage to allowing these
vars to be split because they can't be removed anyway. Also there
is no way to split varying arrays gracefully here due to the xfb
layout rules, and this change didn't handle arrays at all.
Removing this obsolete code also fixes an assert in the new CTS
test KHR-Single-GL45.enhanced_layouts.xfb_all_stages. The test
was legally adding xfb offsets to all vertex stages but since
we only mark the varyings in the final vertex stage with the
always_active_io flag the other stages were correctly lowering
to scalars but when an array with an offset hit this code it
asserted since it couldn't handle it.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Fixes: a1bc152340 ("spirv: mark variables decorated with XfbBuffer as always active")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6928
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17878>
(cherry picked from commit 8bffd601ed)
Typo in the handwritten packing code, oof!
Fixes incorrectly repeated shadows in Neverball (among many other bugs,
I assume). Huge thanks to Lina for the idea that this was the
bug -- fixing it was a breeze from there :-)
Fixes: 9f55538834 ("agx: Pack texture ops")
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Suggested-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17198>
(cherry picked from commit 47a3f1226c)
OpenGL API calls like glClearBufferData() result in mapping/unmapping
of a given buffer by Mesa and unmapping of a host blob fails in
virglrenderer because VirGL driver uses command that is intended for
unmapping of a guest buffer. In particular this causes problem for the
"Total War: Warhammer" game that gets GL_OUT_OF_MEMORY error due to the
failed unmapping command. Fix this by setting the mapping usage flag in
accordance to the resource flags, allowing virgl_buffer_transfer_unmap()
to differentiate host buffer from guest.
Fixes: 3b54e5837a ("virgl: support PIPE_CAP_BUFFER_MAP_PERSISTENT_COHERENT")
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17914>
(cherry picked from commit 46396e97be)
Fix typo in calculation of position of start of a row of tiles. This
could otherwise cause an out-of-bounds access in the next patch.
Fixes: 81d85be9a5 freedreno/gmem: Reverse order of alternative tile rows
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17888>
(cherry picked from commit 2497741a1b)
This extension introduces a new layout which allows applications
to both render and sample from the same image inside the same draw
(aka. feedback loops).
Previously, the GENERAL layout was used and this introduced some
rendering artifacts because the hw can't read&write DCC/HTILE for
the same image, and we try to keep it compressed on GFX10+.
This helps fixing corruption with D3D9 and RPCS3 games which
are candidate for feedback loops.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4411
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17883>
Just got into a midnight discussion with a hw guy.
TC_ACTION_ENA apparently doesn't invalidate L1, so don't clear
the INV_VCACHE flag.
Fixes: 4056e953fe - radeonsi: move emit_cache_flush functions into si_gfx_cs.c
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17902>
(cherry picked from commit 279315fd73)
We're missing a condition that is currently papered over by having
ANV_PIPE_HDC_PIPELINE_FLUSH_BIT in the invalidate bits.
v2: rework with simplication (Caio)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16905>
(cherry picked from commit 5e21f47428)
the shader is already getting a -0.5,-0.5 bias, but the viewport also
needs to be shifted by 0.5 to match
cc: mesa-stable
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17775>
(cherry picked from commit 55a4a6b8dc)
Conflicts:
src/gallium/drivers/zink/zink_state.c
nine uses this to pass unscaled units for depth bias, which means
the units must be scaled based on the format of the depth buffer
cc: mesa-stable
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17775>
(cherry picked from commit 8a8edb310d)
Conflicts:
src/gallium/drivers/zink/zink_screen.h
I've noticed this before but never tracked it down, but it's annoying.
The printf hooks would crash with debug shaders when they were loaded
from the cache. This was because nothing was initing the printf hook
in the cached path so the global was never set.
No problems just always creating this afaics.
Fixes: 333ee94285 ("gallivm: rework debug printf hook to use global mapping.")
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17867>
(cherry picked from commit 4c0a7a169d)
if a non-kopper swapchain image supports srgb, add a VkImageFormatListCreateInfo
to permit srgb mutability and avoid violating spec
cc: mesa-stable
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17773>
(cherry picked from commit 28ee911ad6)
Since the CCU only gets used for unaligned attachment stores or resolves
with the wrong formats, we can use that space for attachments in many
cases.
This gets two more of vk-5-normal's main renderpass's attachments to fit
in the next gmem_pixels increment, leaving 1 to go. Other renderpasses do
get better gmem_pixels, and a few get better tile sizes as a result, but
the fps increase from those looks to be <.2% at least.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16921>
We now choose between two (equal as of this commit) layouts based on
whether the renderpass's stores will use the CCU space, and assert that we
always know the chosen layout when we go using the gmem offsets.
This required making vkCmdClearAttachments in a secondary take the 3D path
instead of gmem blits, since secondaries only have to be compatible with
the primary's renderpass, rather than equal.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16921>
Wa_14010455700 is dependent on the format and sample count, but our
code to track whether or not it had been applied was only dependent on
the format.
As a result, we failed to enable the workaround when an app used a D16
2xMSAA buffer, then a D16 1xMSAA buffer right afterwards.
Make the workaround tracking code sample-dependent to fix this.
Cc: mesa-stable
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17859>
Wa_14010455700 is dependent on the format and sample count, but our
code to track whether or not it had been applied was only dependent on
the format.
As a result, we failed to enable the workaround when an app used a D16
2xMSAA buffer, then a D16 1xMSAA buffer right afterwards.
Make the workaround tracking code sample-dependent to fix this.
Cc: mesa-stable
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17859>
I'm sorry to whoever wrote this, but
(x - (int) (x < 0)) ^ -((int) (x < 0))
is not an acceptable way to write iabs.
Shader-db results on Intel Tiger Lake with lower_idiv enabled:
total instructions in shared programs: 21122548 -> 21122570 (<.01%)
instructions in affected programs: 2369 -> 2391 (0.93%)
helped: 2
HURT: 8
total cycles in shared programs: 791609360 -> 791608062 (<.01%)
cycles in affected programs: 114106 -> 112808 (-1.14%)
helped: 9
HURT: 1
If we make the Intel back-end less stupid, we get to 9/1 helped/HURT for
instructions as well but that's for a different MR.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17845>
the bindless and push sets don't have update templates stored to
the program, so merging these loops avoids trying to destroy them
cc: mesa-stable
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17866>
if the tcs was generated, then the prgram was added to the non-tcs cache,
which means deleting it from the tcs+tes cache will fail and then
context_destroy will explode
Fixes: 4123ee3c71 ("zink: invoke descriptor_program_deinit for programs on context destroy")
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17866>
insert_dst checked whether dst is unused, however for precolored
inputs we always want to reserve a reg for them. Input could be
unused only if we explicitly want it.
Suggested-by: Connor Abbott <cwabbott0@gmail.com>
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17771>
to put it next to its only use and remove the structure fields
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17864>