Commit graph

3030 commits

Author SHA1 Message Date
Rhys Perry
7453c1adff radv: round vgprs/sgprs before calculating max_waves
Note that ACO doesn't correctly round SGPR counts on GFX8/GFX9.

pipeline-db (ACO/Vega):
SGPRS: 11000 -> 11000 (0.00 %)
VGPRS: 3120 -> 3120 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 164328 -> 164328 (0.00 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 1125 -> 1000 (-11.11 %)

v2: consider wave32

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-23 19:11:20 +01:00
Samuel Pitoiset
f11ea22666 radv: fix a performance regression with graphics depth/stencil clears
I recently changed the slow depth/stencil clear path to make sure
depth values are explicitly exported by the fragment shader. This
is actually only useful when VK_EXT_depth_range_unrestricted is
enabled.

While this path is correct, it introduced a performance regression
with Heroes of the Storm, Shadow of Mordor (Vulkan beta) and
probably more titles. This is because it prevents the hardware
to do some optimizations like discarding fragments.

This commit re-introduces the previous (a bit faster) slow
depth/stencil clear path and it selects the unrestricted path
only if VK_EXT_depth_range_unrestricted is enabled.

Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/863
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-10-23 10:23:47 +02:00
Samuel Pitoiset
7562a2cbe3 radv: fix vkUpdateDescriptorSets with inline uniform blocks
descriptorCount is the number of bytes into the descriptor, so
it shouldn't be used as an index. srcArrayElement/dstArrayElement
specify the starting byte offset within the binding to copy from/to.

This fixes new CTS tests:
dEQP-VK.binding_model.descriptor_copy.*.inline_uniform_block_*
dEQP-VK.binding_model.descriptor_copy.*.mix_3
dEQP-VK.binding_model.descriptor_copy.*.mix_array1

Fixes: 8d2654a419 ("radv: Support VK_EXT_inline_uniform_block.")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-10-23 09:59:22 +02:00
Samuel Pitoiset
9c92a21fe5 radv/gfx10: fix 3D images
GFX10 does act like GFX9 actually.

This fixes
dEQP-VK.glsl.texture_functions.query.texturesize.*sampler3d_*.

Cc: 19.2 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-10-23 09:45:49 +02:00
Samuel Pitoiset
41ace1d939 radv/gfx10: re-enable fast depth/stencil clears with separate aspects
It used to cause weird issues on GFX10 in the past with vkmark and
Wreckfest, and they can't be reproduced now. Shadow Of Mordor
(Vulkan beta) hits that path and it works fine.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-10-23 09:18:06 +02:00
Samuel Pitoiset
956d825ed8 radv: do not emit rbplus if attachments are undefined
Fixes some crashes with dEQP-VK.geometry.layered.*.secondary_cmd_buffer
on Raven and other chips that allow rbplus.

This just prevents a crash and rbplus probaby needs more work.

Cc: 19.2 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-10-23 08:57:31 +02:00
Samuel Pitoiset
411ad8e7c5 radv: add an assertion in radv_gfx10_compute_bin_size()
To prevent out of bounds access.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-10-23 08:33:12 +02:00
Samuel Pitoiset
f4ab58c1a0 radv: do not create meta pipelines with 16 samples
The driver only supports up to 8 samples, so it's useless to
create more pipelines than needed.

This fixes a conditional jump reported by Valgrind on GFX10:

==194282== Conditional jump or move depends on uninitialised value(s)
==194282==    at 0xDBF925A: radv_gfx10_compute_bin_size (radv_pipeline.c:3242)
==194282==    by 0xDBF95A6: radv_pipeline_generate_binning_state (radv_pipeline.c:3334)
==194282==    by 0xDBFC1A0: radv_pipeline_generate_pm4 (radv_pipeline.c:4440)
==194282==    by 0xDBFD15E: radv_pipeline_init (radv_pipeline.c:4764)
==194282==    by 0xDBFD23E: radv_graphics_pipeline_create (radv_pipeline.c:4788)
==194282==    by 0xDBB95A3: create_pipeline (radv_meta_clear.c:114)
==194282==    by 0xDBB9AC5: create_color_pipeline (radv_meta_clear.c:297)
==194282==    by 0xDBBCF05: radv_device_init_meta_clear_state (radv_meta_clear.c:1277)
==194282==    by 0xDB9ACD9: radv_device_init_meta (radv_meta.c:363)
==194282==    by 0xDB7FE3A: radv_CreateDevice (radv_device.c:2080

This is caused by an out of bound access of 'fmask_array' (ie. index
is 4 as for 16 samples).

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-10-23 08:33:08 +02:00
Samuel Pitoiset
a13320370e radv: fix updating bound fast ds clear values with different aspects
On GFX9, the driver is able to do an optimized fast depth/stencil
clear with only one aspect (ie. clear the stencil part of a
depth/stencil image). When this happens, the driver should only
update the clear values of the given aspect.

Note that it's currently only supported on GFX9 but I have some
local patches that extend this optimized path for other gens.

Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1967
Cc: 19.2 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-10-22 11:16:13 +02:00
Samuel Pitoiset
b72205a4c1 radv: advertise VK_KHR_spirv_1_4
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-10-21 09:21:40 +02:00
Samuel Pitoiset
b139198b06 radv: do not dump descriptors twice in hang reports
If a pipeline has both graphics and compute, descriptors are same.
While we are at it, use queue->device for simplicity.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-10-21 08:50:39 +02:00
Samuel Pitoiset
cf5e55558e radv: dump trace files earlier if a GPU hang is detected
To make sure a trace file is generated in case the driver crashes
during the hang report generation (which happens sometimes).

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-10-21 08:50:39 +02:00
Samuel Pitoiset
bc2319deb2 radv: print which ring is dumped in hang reports
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-10-21 08:50:39 +02:00
Samuel Pitoiset
076f9dce7c radv: do not print useless descriptors info in hang reports
This information has never been useful. All descriptors are
already dumped with colors etc, and it's more useful.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-10-21 08:50:39 +02:00
Samuel Pitoiset
9da94e510c radv: enable VK_KHR_shader_float_controls on GFX6-GFX7
Disable 16-bit features because fp16 isn't exposed on these chips.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-10-21 08:47:28 +02:00
Samuel Pitoiset
7c50214aab radv: implement VK_KHR_shader_float_controls
This exposes what's required for DX and this is what we already
configure. The driver flushes denorms for FP32 and preserves them
for FP16/FP64. Note that we can't allow both preserving and
flushing denorms because this won't work for merged shaders. This
will require LLVM to update the float mode register to make it work.

Only enabled on GFX8+ with the LLVM path because it's untested on
previous chips and ACO doesn't support it.

This extension is required for SPIRV 1.4.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-10-18 16:55:58 +02:00
Bas Nieuwenhuizen
fd21ee8b52 radv: Fix single stage constant flush with merged shaders.
e.g. a VERTEX only flush with tess on Vega should look at the TCS
to see which bits are needed.

CC: <mesa-stable@lists.freedesktop.org>
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1953
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-10-18 10:49:29 +00:00
Samuel Pitoiset
c644644c65 radv: fix DCC fast clear code for intensity formats (correctly)
Previous fix was pretty bogus.

This fixes a rendering regression with Nier (minimap too large).

Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1943
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1952
Fixes: ea92273cea ("radv: fix DCC fast clear code for intensity formats")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-10-17 15:29:43 +02:00
Samuel Pitoiset
4a3bdc6d22 Revert "radv: do not emit PKT3_CONTEXT_CONTROL with AMDGPU 3.6.0+"
This reverts commit 2ca8629fa9.

This was initially ported from RadeonSI, but in the meantime it has
been reverted because it might hang. Be conservative and re-introduce
this packet emission.

Unfortunately this doesn't fix anything known.

Cc: 19.2 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-10-15 15:58:34 +02:00
Samuel Pitoiset
50c8c4144b radv: rename VK_KHR_shader_float16_int8 structs/constants
Trivial change.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-10-15 12:13:53 +02:00
Samuel Pitoiset
ea92273cea radv: fix DCC fast clear code for intensity formats
This fixes a rendering issue with DiRT 4 on GFX10. Only GFX10 was
affected because intensity formats are different.

Cc: 19.2 <mesa-stable@lists.freedesktop.org>
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1923
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-10-14 08:36:14 +02:00
Eric Engestrom
48289d8853 radv: add exported symbols check
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-10-13 17:40:54 +01:00
Rhys Perry
ba71be228f radv/aco: disable NGG when ACO is used
Note that radv_device.c still has to be modified to use ACO with Navi.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
2019-10-10 20:02:36 +00:00
Bas Nieuwenhuizen
e6986bcb73 radv: Enable VK_ANDROID_external_memory_android_hardware_buffer.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-10-10 17:02:34 +00:00
Bas Nieuwenhuizen
e92b9c5f4f radv: Check the size of the imported buffer.
This is a security feature to disallow malicious apps from passing
a buffer that is too small.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-10-10 17:02:34 +00:00
Bas Nieuwenhuizen
dad047a56a radv: Expose image handle compat types for Android handles.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-10-10 17:02:34 +00:00
Bas Nieuwenhuizen
1b0ceba925 radv: Allow Android image binding.
Using delayed layout of images.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-10-10 17:02:34 +00:00
Bas Nieuwenhuizen
83a012b603 radv/android: Add android hardware buffer import/export.
Support does not include images yet.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-10-10 17:02:34 +00:00
Bas Nieuwenhuizen
adad61239c radv: Deal with Android external formats.
To abstract things a bit, this adds a helper function in radv_android.c.
However, this means we have to link in radv_android.c on non-android as
well, which means some scaffolding changes.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-10-10 17:02:34 +00:00
Bas Nieuwenhuizen
041fc7beb8 radv: Derive android usage from create flags.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-10-10 17:02:34 +00:00
Bas Nieuwenhuizen
53b1372571 radv: Disallow sparse shared images.
Since we really cannot share them ever.

Also remove an unused switch.

Fixes: b70829708a "radv: Implement VK_KHR_external_memory"
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-10-10 17:02:34 +00:00
Bas Nieuwenhuizen
f36b52740a radv/android: Add android hardware buffer queries.
Derived from the Intel code.

For the internal format we just use the internal Vulkan format,
as we have Vulkan formats for all android formats we care about.

For the ycbcr properties we just do something. I do not have a real
clue what would be recommended.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-10-10 17:02:34 +00:00
Bas Nieuwenhuizen
a34e4dd0d2 radv/android: Add android hardware buffer field to device memory.
You cannot go from BO to Android hardware buffer, so for export we
have to remember it.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-10-10 17:02:34 +00:00
Bas Nieuwenhuizen
9ea72b5337 radv: Add VK_ANDROID_external_memory_android_hardware_buffer.
Still disabled but now we can add entrypoints.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-10-10 17:02:34 +00:00
Bas Nieuwenhuizen
4a495e1a85 radv: Unset vk_info in radv_image_create_layout.
For better test coverage of this corner case.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-10-10 17:02:34 +00:00
Bas Nieuwenhuizen
64768111c3 radv: Handle slightly different image dimensions.
The minigbm comment really says it all. We should
fix minigbm as well, but for now this is the more
robust solution.

Note that this only changes width and height for
the surface creation, not for the image and hence
also not for the sampler, where it would wreak
havoc due to the normalized coords.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-10-10 17:02:34 +00:00
Bas Nieuwenhuizen
852c64ca65 radv: Delay patching for imported images until layout time.
We want this flexibility because in GFX10 we lose any stride fields,
so we have to make sure our width/height are in alignment with
the external image we import.

Furthermore, we need the ability to inject tiling modifiers on import
time which is strictly after create time for Android. So, with the
layout & patch functions being fully independent of pCreateInfo, we
can delay it until import/bind time.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-10-10 17:02:34 +00:00
Bas Nieuwenhuizen
2ab4d418f9 radv: Split out layout code from image creation.
So we can delay the layout until later in some import cases.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-10-10 17:02:34 +00:00
Bas Nieuwenhuizen
825ddfee59 radv: Handle device memory alloc failure with normal free.
Less duplication/complexity.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-10-10 17:02:34 +00:00
Bas Nieuwenhuizen
e1469c02cf radv: Cleanup buffer_from_fd.
Unused stride/offset args.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-10-10 17:02:34 +00:00
Bas Nieuwenhuizen
a9687c4e05 radv: Implement & enable VK_EXT_texel_buffer_alignment.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-10-10 13:24:16 +00:00
Samuel Pitoiset
9d17d97ee4 radv: use a compute shader for copying timestamp query results
When the timestamp is not ready (ie. UINT64_MAX), the availabily bit
should be zero. The previous code used to copy the timestamp value
as the availabily bit and that's completely wrong.

Because it's not that simple to emit a conditional with the CP, the
driver now uses a compute shader for copying timestamp query results.

Fixes dEQP-VK.pipeline.timestamp.misc_tests.reset_query_before_copy.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-10-10 13:23:22 +02:00
Samuel Pitoiset
dad80eadb2 radv: sync before resetting query pools if timestamps have been written
Otherwise, the GPU might write timestamp queries after the reset
operation. This is similar to other query operations.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-10-10 13:23:20 +02:00
Samuel Pitoiset
42b2d1119a radv: get the device name from radeon_info::name
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-10-10 08:15:41 +02:00
Samuel Pitoiset
030e67fac3 radv: bump minTexelBufferOffsetAlignment to 4
The spec has probably been misinterpreted during RADV bringup.

This fixes GPU hangs with dEQP-VK.binding_model.*offset_nonzero*.

Fixes: f4e499ec79 ("radv: add initial non-conformant radv vulkan driver")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-10-09 11:22:58 +00:00
Samuel Pitoiset
cbd6f0a0c2 radv: implement VK_KHR_shader_clock
NIR->LLVM and ACO already support nir_intrinsic_shader_clock.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-10-09 08:43:14 +02:00
Timur Kristóf
3a08110d43 amd: Move all amd/common code that depends on LLVM to amd/llvm.
This commit is a step towards the goal of being able to build RADV
without LLVM. In the future we would like to offer the option to
use RADV solely with ACO. There is still a need for the common AMD
code located in amd/common but the LLVM specific parts need to be
separated.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-10-08 00:44:08 +00:00
Rhys Perry
a87b0f5141 radv/aco,aco: set lower_fmod
This simplifies ACO and allows the lowered code to be optimized (in
particular, constant folded).

Totals from affected shaders:
SGPRS: 1776 -> 1776 (0.00 %)
VGPRS: 1436 -> 1436 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 203452 -> 203564 (0.06 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 103 -> 103 (0.00 %)

At least some of the code size increase seems to be from literals being
applied to instructions as a result of constant folding.

v2: remove fmod/frem handling in init_context()

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-04 14:00:46 +00:00
Samuel Pitoiset
5ebe1a17e9 radv: enable lower_fmod for the LLVM path
This lowers fmod and frem at NIR level like RadeonSI. fmod is
already lowered directly in NIR->LLVM, and frem will be lowered by
LLVM anyways.

This fixes a LLVM crash with:
dEQP-VK.glsl.builtin.precision_fp16_storage32b.frem.compute.scalar.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-10-03 18:15:14 +02:00
Bas Nieuwenhuizen
c837872fba radv: Fix warning in 32-bit build.
uintptr_t is 32 bits in a 32-bits build, resulting in shifting out
of bounds.

Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-10-03 13:06:08 +00:00