Previously, we would set WGP_MODE on GFX10+ and then only on GFX10.
Because we used bitwise or, the result was WGP_MODE being set on GFX10+.
We also set the wrong bit, S_00B848_WGP_MODE instead of S_00B228_WGP_MODE.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8811>
(cherry picked from commit 2338e4ad36)
Vertex attribute bounds checking is supposed to be done per-attribute:
is_oob = index * stride + attrib_offset + attrib_size > buffer_size
but we were obtaining num_records by dividing the buffer size by the
stride, making it per-vertex:
is_oob = index * stride + (stride - 1) >= buffer_size
An example from Dead Cells (Wine) is:
attribute bindings: 0, 1, 2
attribute formats: r32g32, r32g32, r32g32b32a32
attribute offsets: 0, 0, 0
binding buffers: all the same buffer
binding offsets: 0, 8, 16
binding sizes: 128, 120, 112
binding strides: 32, 32, 32
Workaround this issue without switching to per-attribute descriptors by
rounding up the division. This is still incorrect, but it should now no
longer consider in-bounds attributes out-of-bounds.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3796
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4199
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8835>
(cherry picked from commit 56cd79b63d)
This is needed to use the new dispatch layer code. While we're here, we
clean up the context on the error path.
Fixes: 9b1138e3f0 "radv: implement VK_EXT_private_data"
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8676>
(cherry picked from commit f695957421)
On Fiji, the CTS image can cause a hang when these are enabled.
Let's enable them for Polaris and newer only, for now.
Gitlab: #4136
Fixes: 9f43b44bf0
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8646>
(cherry picked from commit 3c03fa5801)
It could happen that due to inconsistent copy-propagation
v1 = p_parallelcopy v2b
instructions were left after optimization on GFX8.
Cc: 20.3
Cc: 21.0
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8260>
(cherry picked from commit 856fd4750d)
D and linear are both DISPLAY micro tiling according to ac_surface
but don't work together. This fixes an issue with GFX9+.
This fixes the SkQP WritePixelsNonTexture_Gpu test.
Fixes: 69ea473eeb ("amd/addrlib: update to the latest version")
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8665>
(cherry picked from commit 12ce72fcfc)
Conflicts:
src/amd/vulkan/radv_meta_resolve.c
We were incorrectly shifting the input VGPRs for the instance ID
for chips affected by the LS VGPR init bug (ie. Vega10 and Raven).
When there is no HS threads, the hardware loads the LS VGPR
starting from VGPR 0, so they should be shifted by two.
This fixes some sort of vertex explosion with Squad, Visage, Barn
Finders and probably more titles that use tessellation. Note that
only Vega10 and Raven were affected by this bug.
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4129
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3311
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Diego Viola <diego.viola@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8694>
(cherry picked from commit bb8f87088c)
We used to select the stencil layout even if we should have selected
the depth/stencil one.
Fixes: e4c8491bdf ("radv: implement VK_KHR_separate_depth_stencil_layouts")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8552>
(cherry picked from commit 3ef89b245e)
With RADV_THREAD_TRACE_BUFFER_SIZE=1073741824, the computed size
will overflow and be 4096 instead of 4294967296.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8616>
(cherry picked from commit c40ea24ee0)
For example:
s2: %688:s[32-33] = p_linear_phi %3:s[10-11], %688:s[32-33]
would have been considered trivial.
This might happen due to parallelcopies when assigning phi registers.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: 69b6069dd2 ("aco: refactor try_remove_trivial_phi() in RA")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8645>
(cherry picked from commit 824eba2148)
When NGG is used, the hw can't know the number of geometry shader
primitives. To fix that, the NGG geometry shader accumulates itself
the number of primitives by using an atomic operation directly to GDS.
Then, begin/query copy the start/stop values from GDS to the
query pool buffer using a PS_DONE event. This was actually wrong
because PS_DONE is completely asynchronous to everything and executed
when the preceding draws finish pixel shaders.
Fix this by using a COPY_DATA packet which is synced with CP. This
fixes random failures on Sienna Cichlid with
dEQP-VK.query_pool.statistics_query.*.geometry_shader_primitives.*.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8590>
(cherry picked from commit 085e2ce3d4)
Fixes several dEQP-VK.robustness.robustness2.* tests on GFX8. Generations
other than GFX8 don't fail the tests because bounds-checking is done using
the index (making it per-vertex).
fossil-db (Polaris):
Totals from 1387 (0.99% of 140385) affected shaders:
(no statistics affected)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Fixes: 03a0d39366 ("aco: use MUBUF in some situations instead of splitting vertex fetches")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7834>
(cherry picked from commit 914c61d6c0)
In some rare cases, L2 needs to be flushed if an image is affected
by the pipe misaligned issue. This is roughly based on AMDVLK.
I confirmed that disabling TC-compat HTILE, and respectively DCC,
for the relevant images also fixes the regressions below.
This fixes some regressions introduced with L2 coherency for
dEQP-VK.renderpass2.depth_stencil_resolve.image_2d_* and for
dEQP-VK.renderpass2.suballocation.multisample_resolve.*.
Fixes: 4a783a3c78 ("radv: Use L2 coherency on GFX9+.")
Co-Authored-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8557>
(cherry picked from commit 4c99d6ff54)
The driver used to invalidate the vector cache for meta operations
but this has been removed and I think it should be restored to fix
a bunch of regressions on GFX8.
This probably needs to be cleaned up but this is a hotfix.
This fixes a bunch of regressions and flakes on GFX8 like
dEQP-VK.pipeline.multisample.sample_locations_ext.draw.color.samples_4.*.
Fixes: 8f8d72af55 ("radv: Use access helpers for flushing with meta operations.")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8573>
(cherry picked from commit 8882abe47e)
v_or_b32 with a v2b definition should use SDWA if is_partial=true.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: 56345b8c61 ("aco: allow reading/writing upper halves/bytes when possible")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8577>
(cherry picked from commit fcda9b6737)
This restores the previous logic because L2 coherency was fully
implemented. It appears that flushing L2 metadata with a CS_DONE
event hangs.
This fixes GPU hangs with Monster Hunter World.
Fixes: 4a783a3c ("radv: Use L2 coherency on GFX9+.")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8566>
(cherry picked from commit c3ac6f7cd7)
The flush VA space was only allocated for command buffers on the
graphics queue. Also, the ZPASS_DONE event should never be emitted
on compute queues because it hangs.
Invalidating the L2 metadata cache is only required for coherency
between the RBs and L2, so only on the graphics queue.
The L2 cache is invalidated at beginning of any IBs and that should
also invalidate the L2 metadata cache for compute anyways.
Fixes: 4a783a3c ("radv: Use L2 coherency on GFX9+.")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8494>
(cherry picked from commit c6849f9687)
When both operands of a v_sub (same apply for v_add) are mul and one
already uses clamp/omod, pick the other operand to get a chance to
combine to a MAD.
No fossils-db changes.
Co-authored-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6680>
(cherry picked from commit 01134b0bfe)
It should work fine now.
This gives +1-2% improvements with Control MSAA (2x and 4x)
on Sienna.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8413>
Especially on GFX10 we can avoid pretty much all L2 flushes.
However, instead of that we have to do L2_METADATA invalidations. We
do that every time we could possibly be reading new DCC/HTILE info
from the L2 cache in shaders.
Benchmark results, basemark on high preset with a navi10 on profile_standard
(which is slower than a navi10 on default settings, please don't compare
to random navi10 results you find)
before:
5932
5928
5937
after:
6011
6013
6009
So this looks like a >1% increase.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7202>
This way we're properly using the vulkan barrier paradigm instead
of adhoc guessing what caches need to be flushed. This is more robust
for cache policy changes as we now don't have to revisit all the meta
operations all the time.
Note that a barrier has both a src and dst part though. So
barrier:
flush src
meta op
flush dst
becomes
barrier:
flush barrier src
flush meta op dst
meta op
flush meta op src
flush barrier dst
And there are some places where we've been able to replace a CB flush
with a shader flush because that is what we'd need according to vulkan rules
(and it turns out that in the cases the CB flush mattered the app will set the
bit in one of the relevant flushes or it was needed as a result of an optimization
that we counter-acted in the previous patch.)
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7202>
To cancel the optimization in radv_dst_access_flush if these helpers
get used by meta operations.
We could also remove that optimization but I think this triggers less
often as all SHADER_WRITE flushes on images not supporting STORAGE should
be meta
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7202>