I recently changed the slow depth/stencil clear path to make sure
depth values are explicitly exported by the fragment shader. This
is actually only useful when VK_EXT_depth_range_unrestricted is
enabled.
While this path is correct, it introduced a performance regression
with Heroes of the Storm, Shadow of Mordor (Vulkan beta) and
probably more titles. This is because it prevents the hardware
to do some optimizations like discarding fragments.
This commit re-introduces the previous (a bit faster) slow
depth/stencil clear path and it selects the unrestricted path
only if VK_EXT_depth_range_unrestricted is enabled.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/863
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
descriptorCount is the number of bytes into the descriptor, so
it shouldn't be used as an index. srcArrayElement/dstArrayElement
specify the starting byte offset within the binding to copy from/to.
This fixes new CTS tests:
dEQP-VK.binding_model.descriptor_copy.*.inline_uniform_block_*
dEQP-VK.binding_model.descriptor_copy.*.mix_3
dEQP-VK.binding_model.descriptor_copy.*.mix_array1
Fixes: 8d2654a419 ("radv: Support VK_EXT_inline_uniform_block.")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
GFX10 does act like GFX9 actually.
This fixes
dEQP-VK.glsl.texture_functions.query.texturesize.*sampler3d_*.
Cc: 19.2 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
It used to cause weird issues on GFX10 in the past with vkmark and
Wreckfest, and they can't be reproduced now. Shadow Of Mordor
(Vulkan beta) hits that path and it works fine.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Fixes some crashes with dEQP-VK.geometry.layered.*.secondary_cmd_buffer
on Raven and other chips that allow rbplus.
This just prevents a crash and rbplus probaby needs more work.
Cc: 19.2 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The driver only supports up to 8 samples, so it's useless to
create more pipelines than needed.
This fixes a conditional jump reported by Valgrind on GFX10:
==194282== Conditional jump or move depends on uninitialised value(s)
==194282== at 0xDBF925A: radv_gfx10_compute_bin_size (radv_pipeline.c:3242)
==194282== by 0xDBF95A6: radv_pipeline_generate_binning_state (radv_pipeline.c:3334)
==194282== by 0xDBFC1A0: radv_pipeline_generate_pm4 (radv_pipeline.c:4440)
==194282== by 0xDBFD15E: radv_pipeline_init (radv_pipeline.c:4764)
==194282== by 0xDBFD23E: radv_graphics_pipeline_create (radv_pipeline.c:4788)
==194282== by 0xDBB95A3: create_pipeline (radv_meta_clear.c:114)
==194282== by 0xDBB9AC5: create_color_pipeline (radv_meta_clear.c:297)
==194282== by 0xDBBCF05: radv_device_init_meta_clear_state (radv_meta_clear.c:1277)
==194282== by 0xDB9ACD9: radv_device_init_meta (radv_meta.c:363)
==194282== by 0xDB7FE3A: radv_CreateDevice (radv_device.c:2080
This is caused by an out of bound access of 'fmask_array' (ie. index
is 4 as for 16 samples).
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
On GFX9, the driver is able to do an optimized fast depth/stencil
clear with only one aspect (ie. clear the stencil part of a
depth/stencil image). When this happens, the driver should only
update the clear values of the given aspect.
Note that it's currently only supported on GFX9 but I have some
local patches that extend this optimized path for other gens.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1967
Cc: 19.2 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
If a pipeline has both graphics and compute, descriptors are same.
While we are at it, use queue->device for simplicity.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
To make sure a trace file is generated in case the driver crashes
during the hang report generation (which happens sometimes).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This information has never been useful. All descriptors are
already dumped with colors etc, and it's more useful.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Disable 16-bit features because fp16 isn't exposed on these chips.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This exposes what's required for DX and this is what we already
configure. The driver flushes denorms for FP32 and preserves them
for FP16/FP64. Note that we can't allow both preserving and
flushing denorms because this won't work for merged shaders. This
will require LLVM to update the float mode register to make it work.
Only enabled on GFX8+ with the LLVM path because it's untested on
previous chips and ACO doesn't support it.
This extension is required for SPIRV 1.4.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
e.g. a VERTEX only flush with tess on Vega should look at the TCS
to see which bits are needed.
CC: <mesa-stable@lists.freedesktop.org>
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1953
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This reverts commit 2ca8629fa9.
This was initially ported from RadeonSI, but in the meantime it has
been reverted because it might hang. Be conservative and re-introduce
this packet emission.
Unfortunately this doesn't fix anything known.
Cc: 19.2 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This fixes a rendering issue with DiRT 4 on GFX10. Only GFX10 was
affected because intensity formats are different.
Cc: 19.2 <mesa-stable@lists.freedesktop.org>
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1923
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Note that radv_device.c still has to be modified to use ACO with Navi.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
This is a security feature to disallow malicious apps from passing
a buffer that is too small.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
To abstract things a bit, this adds a helper function in radv_android.c.
However, this means we have to link in radv_android.c on non-android as
well, which means some scaffolding changes.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Since we really cannot share them ever.
Also remove an unused switch.
Fixes: b70829708a "radv: Implement VK_KHR_external_memory"
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Derived from the Intel code.
For the internal format we just use the internal Vulkan format,
as we have Vulkan formats for all android formats we care about.
For the ycbcr properties we just do something. I do not have a real
clue what would be recommended.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
The minigbm comment really says it all. We should
fix minigbm as well, but for now this is the more
robust solution.
Note that this only changes width and height for
the surface creation, not for the image and hence
also not for the sampler, where it would wreak
havoc due to the normalized coords.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
We want this flexibility because in GFX10 we lose any stride fields,
so we have to make sure our width/height are in alignment with
the external image we import.
Furthermore, we need the ability to inject tiling modifiers on import
time which is strictly after create time for Android. So, with the
layout & patch functions being fully independent of pCreateInfo, we
can delay it until import/bind time.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
When the timestamp is not ready (ie. UINT64_MAX), the availabily bit
should be zero. The previous code used to copy the timestamp value
as the availabily bit and that's completely wrong.
Because it's not that simple to emit a conditional with the CP, the
driver now uses a compute shader for copying timestamp query results.
Fixes dEQP-VK.pipeline.timestamp.misc_tests.reset_query_before_copy.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Otherwise, the GPU might write timestamp queries after the reset
operation. This is similar to other query operations.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The spec has probably been misinterpreted during RADV bringup.
This fixes GPU hangs with dEQP-VK.binding_model.*offset_nonzero*.
Fixes: f4e499ec79 ("radv: add initial non-conformant radv vulkan driver")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This commit is a step towards the goal of being able to build RADV
without LLVM. In the future we would like to offer the option to
use RADV solely with ACO. There is still a need for the common AMD
code located in amd/common but the LLVM specific parts need to be
separated.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This simplifies ACO and allows the lowered code to be optimized (in
particular, constant folded).
Totals from affected shaders:
SGPRS: 1776 -> 1776 (0.00 %)
VGPRS: 1436 -> 1436 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 203452 -> 203564 (0.06 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 103 -> 103 (0.00 %)
At least some of the code size increase seems to be from literals being
applied to instructions as a result of constant folding.
v2: remove fmod/frem handling in init_context()
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
This lowers fmod and frem at NIR level like RadeonSI. fmod is
already lowered directly in NIR->LLVM, and frem will be lowered by
LLVM anyways.
This fixes a LLVM crash with:
dEQP-VK.glsl.builtin.precision_fp16_storage32b.frem.compute.scalar.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
uintptr_t is 32 bits in a 32-bits build, resulting in shifting out
of bounds.
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>