Otherwise, Yakuza hangs the GPU with DXVK. We don't know if
linetrip and pointlist are affected, so my point is to do that
only for triangle strips.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Feral games aren't affected because they don't decompress DCC.
F1 2018 has one DCC decompression per frame, but I don't see
any performance improvements. This new predicate will be
probably more useful for DCC/MSAA.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
It's somehow similar to the FCE predicate.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
After going through the spec changelog, it looks like RADV
is up to date. Note that ANV also reports 1.1.90.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The Vulkan working group recently discovered that we made a mistake in
assuming that PCI domains are 16-bit even though they can potentially be
32-bit values. To fix this, the next spec update will change the types
in the VK_EXT_pci_bus_info struct to be 32 bits which will be a
backwards-incompatible change. Normally, Khronos tries very hard to
never make backwards incompatible changes to specs. Hopefully, the
extension is new enough (2 months) that there are no shipping apps which
use the extension so this should be safe.
This commit disables the extension for both anv and radv in mesa and
should be back-ported to 18.3 ASAP so we avoid any potential issues with
new apps running on old drivers. I'll send out a commit (which we can
also back-port to 18.3 if we really care) to re-enable the extension in
both drivers once this week's spec update ships. The one known use of
this extension is internal to mesa and will continue working with the
extension disabled and will naturally update when we get a new header.
Cc: "18.3" <mesa-stable@lists.freedesktop.org>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
User are encouraged to switch to LLVM 7.0 released in September 2018.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Instead of a single i2b and b2i, we now have i2b32 and b2iN where N is
one if 8, 16, 32, or 64. This leads to having a few more opcodes but
now everything is consistent and booleans aren't a weird special case
anymore.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Nothing to do, the compiler already handles that.
All new dEQP.VK.ubo.* and dEQP.VK.ssbo.* pass, except some
16-bit tests that are quite related to fdo bug #108114.
Only enable the extension on CIK+ because it might not work on SI.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
In case we are unlucky if the low part is 0xffffffff.
Fixes: 5d6a560a29 ("radv: do not use the availability bit for timestamp queries")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
If the driver used a compute shader for resetting a query pool,
it should be completed when caches are flushed.
This might reduce the number of stalls if operations are done
between vkCmdResetQueryPool() and vkCmdBeginQuery()
(or vkCmdWriteTimestamp()).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Alex Smith <asmith@feralinteractive.com>
As done for vkCmdBeginQuery() already. Prevents timestamps from being
overwritten by previous vkCmdResetQueryPool() calls if the shader path
was used to do the reset.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108925
Fixes: a41e2e9cf5 ("radv: allow to use a compute shader for resetting the query pool")
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
After investigating on this, it appears that COND_WRITE doesn't
work correctly in some situations. I don't know exactly why does
it fail to update DB_Z_INFO.ZRANGE_PRECISION, but as AMDVLK
also uses COND_EXEC I think there is a reason.
Now the driver stores a new metadata value in order to reflect
the last fast depth clear state. If a TC-compat HTILE is fast cleared
with 0.0f, we have to update ZRANGE_PRECISION to 0 in order to
work around that hardware bug.
This fixes rendering issues with The Forest and DXVK and doesn't
seem to introduce any regressions.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108914
Fixes: 68dead112e ("radv: update the ZRANGE_PRECISION value for the TC-compat bug")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This fixes some crucible 3d miptree tests I've been working on
when executed using the compute shader path.
Fixes: d08f267814 (radv/gfx9: fix 3d image to image transfers on compute queues.)
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
These days we don't always allocate scanout compatible textures anymore.
That does mean we have to fix the radv android WSI though.
Fixes: b1444c9ccb "radv: Implement VK_ANDROID_native_buffer."
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Update to the internal master as of 2018-11-15.
This has a lot of gratuitous whitespace change, but on the plus
side it's built using the same tooling that's used for AMDVLK,
which should help going forward.
Our choices here are simply redundant as long as sin.flags is set
correctly.
(v2:
- remove unused function parameter)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Viewport/scissor don't need to be updated for array textures.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
If all layers are bound we can perform a fast color or depth clear
instead of iterating over all layers. This has the advantage
to avoid trashing the framebuffer for nothing if you we end up by
doing a fast clear when calling radv_clear_image_layer(), and
clearing all layers in one shot is obviously faster.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Currently only true if RADV_PERFTEST=dccmsaa is set.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The driver doesn't support DCC/CMASK for mipmapped textures.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Improves performance in Talos by about 15% (and significant improvements
in RotR and possibly other but did not bench with final patch) on
kernel 4.19 and earlier.
On 4.20+ a similar effect comes from
433ca054949a "drm/amdgpu: try allocating VRAM as power of two"
v2: Do not impact the alignment of the physical memory.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
CC: <mesa-stable@lists.freedesktop.org>
Mirrors AMDVLK. Looks like if we go over the alignment of height
we actually start to change the addressing. Seems like the extra
miplevels actually work with this.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108245
Fixes: f6cc15dccd "radv/gfx9: fix block compression texture views. (v2)"
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
We used the layer count which results in an off by one error.
Not sure this really affects anything.
Fixes: f4e499ec79 "radv: add initial non-conformant radv vulkan driver"
Reviewed-by: Dave Airlie <airlied@redhat.com>
We don't need to flush anything before these two commands as well.
This is because they have to be externally synchronized, so the
app should have called CmdPipelineBarrier() prior to that and the
driver should have flushed the caches.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
'post_flush' is only set to NULL for the normal clear path
(ie. only vkCmdClearColorImage() and vkCmdClearDepthStencilImage()
are affected commands).
Because these two operations have to be externally synchronized
with VK_PIPELINE_STAGE_TRANSFER_BIT and VK_ACCESS_TRANSFER_WRITE_BIT,
it's useless to set those flags internallY.
VK_PIPELINE_STAGE_TRANSFER_BIT will wait for compute to be idle,
while VK_ACCESS_TRANSFER_WRITE_BIT will invalidate both L1 vector
caches and L2. RADV_CMD_FLAG_WRITEBACK_GLOBAL_L2 will be superseded
by RADV_CMD_FLAG_INV_GLOBAL_L2.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
CP DMA can only be busy when the driver copies buffers. The
only affected Vulkan commands are vkCmdCopyBuffer() and
vkCmdUpdateBuffer() (because we fallback to a copy depending on
a threshold). Clear operations are currently not concerned
because the driver always syncs after the last DMA operation.
Per the spec, these two operations have to be externally
synchronized with VK_PIPELINE_STAGE_TRANSFER_BIT.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Unnecessary as they allow the app to call vkCmdPipelineBarrier()
inside the render pass.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
We won't have a var to load from, so don't try to the processing
required if we don't need it.
This avoids crashes in:
dEQP-VK.spirv_assembly.instruction.compute.variable_pointers.compute.workgroup_two_buffers
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
For variable pointers we really don't want to case the pointers to int
without a good reason, just add a wrapper for bcsel loading and result
storing.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Fixes an assertion in SoTTR.
Fixes: dd0172e865 ("radv: Use structured intrinsics instead of indexing workaround for GFX9.")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This allows to fast clear the depth part (or the stencil part)
of a depth+stencil surface when HTILE is enabled. I didn't test
on GFX8, so it's disabled currently.
This gives a very nice boost, for example when clearing the depth
aspect of a 4096x4096 D32_SFLOAT_S8_UINT image (18x faster).
BEFORE: 235 us
AFTER: 13 us
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>