Commit graph

2274 commits

Author SHA1 Message Date
Bas Nieuwenhuizen
e1df849c3c radv: Mark GTT memory as device local for APUs.
Otherwise a lot of games complain about not having enough memory,
and it is sort of local so this seems reasonable to me.

CC: 18.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-04-20 18:16:16 +02:00
Samuel Pitoiset
fedd0a4215 radv/winsys: allow to submit up to 4 IBs for chips without chaining
The SI family doesn't support chaining which means the maximum
size in dwords per CS is limited. When that limit was reached
we failed to submit the CS and the application crashed.

This patch allows to submit up to 4 IBs which is currently the
limit, but recent amdgpu supports more than that.

Please note that we can reach the limit of 4 IBs per submit
but currently we can't improve that. The only solution is to
upgrade libdrm. That will be improved later but for now this
should fix crashes on SI or when using RADV_DEBUG=noibs.

Fixes: 36cb5508e8 ("radv/winsys: Fail early on overgrown cs.")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105775
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-20 18:12:26 +02:00
Samuel Pitoiset
dd069e9b41 ac/nir: handle nir_intrinsic_load_first_vertex like base_vertex
This fixes a ton of CTS crashes.

Fixes: c366f422f0 ("nir: Offset vertex_id by first_vertex instead of base_vertex")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-20 17:07:38 +02:00
Samuel Pitoiset
b21a4efb55 radv/winsys: allow local BOs on APUs
Ported from RadeonSI.

Local BOs ignore BO priorities, and we don't need those on APUs.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-20 16:18:24 +02:00
Samuel Pitoiset
5c1233ed62 radv: use a global BO list only for VK_EXT_descriptor_indexing
Maintaining two different paths is annoying but this gets
rid of the performance regression introduced by the global
BO list.

We might find a better solution in the future, but for now
just keeps two paths.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-20 16:18:18 +02:00
Samuel Pitoiset
7bd5367546 Revert "radv: Don't store buffer references in the descriptor set."
In order to reduce a performance regression introduced by
4b13fe55a4 ("radv: Keep a global BO list for VkMemory."),
we are going to maintain two different paths.

One when VK_EXT_descriptor_indexing is enabled by the
application because we need to have a global BO list, and
one (the old one) when it's not enabled.

With Talos on Polaris, the global BO list reduces performance
by 10% which is too much for me.

This reverts commit ab6cadd3ec.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-20 16:18:13 +02:00
Nicolai Hähnle
24fb3e6aa1 ac/nir: use ac_build_image_opcode for image intrinsics
So that we'll use the dimension-aware intrinsics in the future.

Acked-by: Marek Olšák <marek.olsak@amd.com>
2018-04-20 09:30:07 +02:00
Nicolai Hähnle
74063431f1 radeonsi: generate image load/store/atomic ops using ac_build_image_opcode
In preparation of dimension-aware LLVM image intrinsics.

Acked-by: Marek Olšák <marek.olsak@amd.com>
2018-04-20 09:29:57 +02:00
Nicolai Hähnle
625dcbbc45 amd/common: pass address components individually to ac_build_image_intrinsic
This is in preparation for the new image intrinsics.

Acked-by: Marek Olšák <marek.olsak@amd.com>
2018-04-20 09:23:52 +02:00
Nicolai Hähnle
f931583828 amd/common: pass new enum ac_image_dim to ac_build_image_opcode
This is in preparation for the new, dimension-aware LLVM image
intrinsics.

Acked-by: Marek Olšák <marek.olsak@amd.com>
2018-04-20 09:23:40 +02:00
Nicolai Hähnle
a807a9b215 ac/nir: fix atomic compare-and-swap
The LLVM instruction returns { i32, i1 }, where the i1 indicates success.
We're only interested in the first part, which is the loaded value.

Fixes dEQP-GLES31.functional.compute.shared_var.atomic.compswap.*

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-04-20 09:21:40 +02:00
Bas Nieuwenhuizen
dffdef6737 radv: Add Vega M support.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-04-19 16:36:21 +02:00
Bas Nieuwenhuizen
d1ce31d36c radv: Add bound checking workaround for dynamic buffers.
I have seen a few applications and games do the dynamic buffer bounds incorrectly, this
make it easier to work around, e.g. for debugging.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-04-19 16:13:25 +02:00
Samuel Pitoiset
2f63b3dd09 radv: enable DCC for MSAA 2x textures on VI under an option
This can be enabled with RADV_PERFTEST=dccmsaa.

DCC for MSAA textures is actually not as easy to implement. It
looks like there is some corner cases. I will improve support
incrementally.

Vega support, as well as Polaris improvements, will be added later.

No CTS changes on Polaris using RADV_DEBUG=zerovram and
RADV_PERFTEST=dccmsaa.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-19 09:10:55 +02:00
Samuel Pitoiset
dc3d39771f radv: decompress DCC for multisampled source images before resolving
Multisampled source images (ie. color attachments) can be now
DCC compressed, so the driver needs to perform a DCC decompression
pass before resolving

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-19 09:10:52 +02:00
Samuel Pitoiset
1aefb62f1e radv: add a workaround for fast clears with DCC and MSAA textures
This should be fixed at some point in order to improve
performance.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-19 09:10:50 +02:00
Samuel Pitoiset
373fa0b599 radv: allocate CMASK for DCC fast clear with MSAA
CMASK is required because it should be cleared to
0xCCCCCCCC for MSAA textures.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-19 09:10:48 +02:00
Samuel Pitoiset
255506c4e0 radv: implement fast color clear for DCC with MSAA
When DCC is enabled with MSAA textures, CMASK should be
cleared to 0xCCCCCCCC.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-19 09:10:45 +02:00
Samuel Pitoiset
796b6f4aab radv: make sure to sync after resolving using the compute path
This fixes some random CTS failures:

dEQP-VK.renderpass.multisample.*.

Performing a fast-clear eliminate is still useless, but it
seems that we need to sync.

Found while running CTS with RADV_DEBUG=zerovram.

Fixes: 56a171a499 ("radv: don't fast-clear eliminate after resolving a subpass with compute")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-19 09:09:55 +02:00
Samuel Pitoiset
4a698660ae radv: dump the SHA1 of SPIRV in the hang report
Might be useful for debugging purposes, especially when we
want to replace a shader on the fly.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-19 09:09:52 +02:00
Bas Nieuwenhuizen
0e10790558 radv: Enable VK_EXT_descriptor_indexing.
This adds everything except non-uniform indexing, which needs a bit
more work and testing.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-04-18 22:56:54 +02:00
Bas Nieuwenhuizen
b5e04e9217 radv: Support allocating variable size descriptor sets.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-04-18 22:56:54 +02:00
Bas Nieuwenhuizen
78c54acbe8 radv: Add support for variable descriptor set layouts.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-04-18 22:56:54 +02:00
Bas Nieuwenhuizen
082c11e8a5 radv: Fix GetDescriptorSetLayoutSupport.
The continue means we do alignment differently than during creation,
making the buffer smaller than expected.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-04-18 22:56:54 +02:00
Bas Nieuwenhuizen
d02bbde1a8 radv: Use sorted bindings for set layout creation.
Previously we did not care about havin the set storage in order,
but for variable descriptor count we want the highest binding
at the end of the storage.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-04-18 22:56:54 +02:00
Bas Nieuwenhuizen
ab6cadd3ec radv: Don't store buffer references in the descriptor set.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-04-18 22:56:54 +02:00
Bas Nieuwenhuizen
4b13fe55a4 radv: Keep a global BO list for VkMemory.
With update after bind we can't attach bo's to the command buffer
from the descriptor set anymore, so we have to have a global BO
list.

I am somewhat surprised this works really well even though we have
implicit synchronization in the WSI based on the bo list associations
and with the new behavior every command buffer is associated with
every swapchain image. But I could not find slowdowns in games because
of it.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-04-18 22:56:54 +02:00
Marek Olšák
c6f1d36019 radeonsi: add support for VegaM
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-04-18 14:45:33 -04:00
Marek Olšák
d6a66bc8db amd/addrlib: add support for VegaM
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-04-18 14:45:32 -04:00
Samuel Pitoiset
893e19efb7 radv: fix scissor computation when using half-pixel viewport offset
'scale[i]' can be non-integer.

Original patch by Philip Rebohle.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106074
Fixes: 0f3de89a56 ("radv: Use the guard band.")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Niuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-17 22:12:14 +02:00
Jason Ekstrand
72ab499c9f anv,radv: Drop XML workarounds for VK_ANDROID_native_buffer
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-04-16 07:59:25 -07:00
Samuel Pitoiset
62510846b6 radv: clean up radv_decompress_resolve_subpass_src()
To handle the source color image transitions in the same place.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Niuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-16 14:21:05 +02:00
Samuel Pitoiset
56a171a499 radv: don't fast-clear eliminate after resolving a subpass with compute
That looks useless, and I think radv_handle_image_transition()
will do a fast-clear eliminate because it's called after the
resolve.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Niuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-16 14:21:02 +02:00
Samuel Pitoiset
7e84d69861 radv: handle CMASK/FMASK transitions only if DCC is disabled
DCC implies a fast-clear eliminate, so I think this sounds
reasonable.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Niuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-16 14:20:59 +02:00
Samuel Pitoiset
584d1f2711 radv: merge radv_handle_{dcc,cmask}_image_transition() functions
Into radv_handle_color_image_transition().

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Niuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-16 14:20:56 +02:00
Samuel Pitoiset
d5812b900b radv: add radv_init_color_image_metadata() helper
In order to separate initialization from decompression. In the
future, that will allow us to init DCC/FMASK/CMASK in one shot.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Niuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-16 14:20:54 +02:00
Samuel Pitoiset
fde7b90ecf radv: make radv_initialise_cmask() static
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Niuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-16 14:20:51 +02:00
Samuel Pitoiset
790f6e4718 radv: clean up radv_handle_image_transition() a bit
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Niuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-16 14:20:49 +02:00
Samuel Pitoiset
6967d32beb radv: add radv_handle_color_image_transition() helper
To handle CMASK, FMASK and DCC transitions in the same place.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Niuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-16 14:20:45 +02:00
Samuel Pitoiset
c6b1f1c97a radv: handle DCC image transitions before CMASK/FMASK transitions
Mostly because DCC implies a fast-clear eliminate and we
should be able to skip some DCC decompressions by setting
a predicate like for CMASK and FMASK.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Niuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-16 14:20:42 +02:00
Samuel Pitoiset
79c87a45b6 radv: disable prediction only if it has been enabled
When decompressing DCC we don't enable it, so it's useless
to disable it. This reduces the number of prediction packets
sent to the GPU when performing color decompression passes.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Niuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-16 14:20:39 +02:00
Bas Nieuwenhuizen
b0e3a9b19f ac/nir: Make the GFX9 buffer size fix apply to image loads/atomics too.
No clue how I missed those ...

Fixes: 4503ff760c "ac/nir: Add workaround for GFX9 buffer views."
CC: <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105320
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-04-16 11:55:48 +02:00
Daniel Schürmann
f2c6a55061 radv: enable subgroup capabilities
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-14 01:03:15 +02:00
Daniel Schürmann
4b0616e533 ac: handle subgroup intrinsics
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-14 01:03:15 +02:00
Daniel Schürmann
d5f7ebda3e ac: add LLVM build functions for subgroup instrinsics
Co-authored-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-14 01:03:09 +02:00
Daniel Schürmann
d19f20e793 ac: make ballot and umsb capable of 64bit inputs
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-14 00:52:22 +02:00
Marek Olšák
a3b785be4d winsys/amdgpu: allow local BOs on APUs
Local BOs ignore BO priorities, and we don't need those on APUs.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-04-13 12:31:04 -04:00
Marek Olšák
43d66c8c2d mesa: include mtypes.h less
- remove mtypes.h from most header files
- add main/menums.h for often used definitions
- remove main/core.h

v2: fix radv build

Reviewed-by: Brian Paul <brianp@vmware.com>
2018-04-12 19:31:30 -04:00
Bas Nieuwenhuizen
6ff98dbf7c radv: Implement VK_EXT_vertex_attribute_divisor.
Pretty straight forward, just pass the divisors through the shader
key and then do a LLVM divide.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-04-12 22:57:23 +02:00
Bas Nieuwenhuizen
7eff8d7d35 ac/surface: Allow S swizzle for displayable surfaces.
For dcn1 && < 64 bpp displayable surfaces, addrlib only accepts
S swizzles.

At the same time addrlib prefers D swizzles is allowed, so we can
just allow S swizzles as fallback.

Fixes: b64b712558 "ac/surface/gfx9: request desired micro tile mode explicitly"
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-04-12 21:24:55 +02:00