Commit graph

3177 commits

Author SHA1 Message Date
Samuel Pitoiset
fd4041987b ac: add ac_build_load_helper_invocation() helper
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-04-12 17:30:55 +02:00
Samuel Pitoiset
590a4c8981 ac: add ac_build_ddxy_interp() helper
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-04-12 17:30:55 +02:00
Samuel Pitoiset
4cb13e9462 ac: add ac_build_umax() and use it where possible
This changes the predicate from LessThan to Equal.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-04-12 17:30:55 +02:00
Samuel Pitoiset
cf88bfa75a ac/nir: make use of ac_build_umin() where possible
This changes the predicate from LessThan to Equal.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-04-12 17:30:54 +02:00
Samuel Pitoiset
15dd81913f ac/nir: make use of ac_build_imin() where possible
This changes the predicate from LessThan to Equal.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-04-12 17:30:54 +02:00
Samuel Pitoiset
d7a0c0d53b ac/nir: make use of ac_build_imax() where possible
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-04-12 17:30:54 +02:00
Timothy Arceri
d62d434fe9 ac/nir_to_llvm: add image bindless support
With this all piglit bindless image tests pass on radeonsi.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-04-12 09:02:59 +02:00
Timothy Arceri
55fb93b586 ac/nir_to_llvm: make get_sampler_desc() more generic and pass it the image intrinsic
This will be required by the bindless support in the following patches.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-04-12 09:02:59 +02:00
Karol Herbst
d7bbb3caf1 glsl_to_nir: handle bindless textures
v2: add support for AMD

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1)
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-04-12 09:02:59 +02:00
Samuel Pitoiset
09b4049be3 radv: enable VK_AMD_gpu_shader_half_float
Should be safe to enable as all instructions seem to support 16-bit.
Unfortunately, there is no CTS test.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-04-10 09:07:17 +02:00
Rhys Perry
fd1fc255d9 ac: add 16-bit support to ac_build_ddxy()
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-04-10 09:05:58 +02:00
Samuel Pitoiset
bc6d486c78 ac/nir: fix nir_op_b2f16
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-04-10 09:05:55 +02:00
Bas Nieuwenhuizen
028ce52739 radv: Add non-uniform indexing lowering.
This patch does it as late as possible so the potential extra
basic blocks don't inhibit other optimizations.

Big thanks to Jason for writing the lowering pass.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-04-10 02:04:13 +02:00
Timothy Arceri
e30804c602 nir/radv: remove restrictions on opt_if_loop_last_continue()
When I implemented opt_if_loop_last_continue() I had restricted
this pass from moving other if-statements inside the branch opposite
the continue. At the time it was causing a bunch of spilling in
shader-db for i965.

However Samuel Pitoiset noticed that making this pass more aggressive
significantly improved the performance of Doom on RADV. Below are
the statistics he gathered.

28717 shaders in 14931 tests
Totals:
SGPRS: 1267317 -> 1267549 (0.02 %)
VGPRS: 896876 -> 895920 (-0.11 %)
Spilled SGPRs: 24701 -> 26367 (6.74 %)
Code Size: 48379452 -> 48507880 (0.27 %) bytes
Max Waves: 241159 -> 241190 (0.01 %)

Totals from affected shaders:
SGPRS: 23584 -> 23816 (0.98 %)
VGPRS: 25908 -> 24952 (-3.69 %)
Spilled SGPRs: 503 -> 2169 (331.21 %)
Code Size: 2471392 -> 2599820 (5.20 %) bytes
Max Waves: 586 -> 617 (5.29 %)

The codesize increases is related to Wolfenstein II it seems largely
due to an increase in phis rather than the existing jumps.

This gives +10% FPS with Doom on my Vega56.

Rhys Perry also benchmarked Doom on his VEGA64:

Before: 72.53 FPS
After:  80.77 FPS

v2: disable pass on non-AMD drivers

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-04-09 11:29:41 +10:00
Samuel Pitoiset
775191cd99 radv: fix getting the vertex strides if the bindings aren't contiguous
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110349
Fixes: a66b186beb ("radv: use typed buffer loads for vertex input fetches")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-04-08 21:17:15 +02:00
Samuel Pitoiset
27b8f3ecc3 ac/nir: fix intrinsic names for atomic operations with LLVM 9+
This fixes the following LLVM error when using RADV_DEBUG=checkir:
Intrinsic name not mangled correctly for type arguments! Should be: llvm.amdgcn.buffer.atomic.add.i32
i32 (i32, <4 x i32>, i32, i32, i1)* @llvm.amdgcn.buffer.atomic.add

The cmpswap operation still uses the old intrinsic.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-04-08 13:16:50 +02:00
Eric Engestrom
05b114e526 simplify LLVM version string printing
Figure it out once in the build system, then just use that all over the place.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-04-04 16:08:11 +00:00
Marek Olšák
b563460b49 radeonsi: enable displayable DCC on Ravens 2019-04-04 09:53:24 -04:00
Marek Olšák
1f21396431 radeonsi: add support for displayable DCC for multi-RB chips
A compute shader is used to reorder DCC data from aligned to unaligned.
2019-04-04 09:53:24 -04:00
Marek Olšák
2c09eb4122 radeonsi: add support for displayable DCC for 1 RB chips
This is the simpler codepath - just disable RB and pipe alignment for DCC.
2019-04-04 09:53:24 -04:00
Marek Olšák
e457454cb6 amd/addrlib: fix uninitialized values for Addr2ComputeDccAddrFromCoord
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-04-04 09:30:40 -04:00
Samuel Pitoiset
c25f63872b radv: partially enable VK_KHR_shader_float16_int8
Only 8-bit integers for now, float16 requires a bit more work.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-04-01 18:53:59 +02:00
Samuel Pitoiset
d099bc5829 ac: add 8-bit and 64-bit support to ac_build_bitfield_reverse()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-04-01 18:53:57 +02:00
Samuel Pitoiset
2cecf6c5cc ac: add 8-bit support to ac_build_umsb()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-04-01 18:53:55 +02:00
Samuel Pitoiset
a45d9e3e8d ac: add 8-bit support to ac_find_lsb()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-04-01 18:53:53 +02:00
Samuel Pitoiset
89cf8ca0ae ac: add 8-bit support to ac_build_bit_count()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-04-01 18:53:52 +02:00
Samuel Pitoiset
869af0464a ac/nir: add support for nir_op_b2i8
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-04-01 18:53:49 +02:00
Rhys Perry
0af95f0ffc radv: lower 16-bit flrp
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-04-01 09:58:48 +02:00
Samuel Pitoiset
4d5fce29c3 ac: fix ac_build_umsb() for 16-bit integer type
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-04-01 09:51:56 +02:00
Samuel Pitoiset
7a088d1ac8 ac: fix ac_find_lsb() for 16-bit integer type
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-04-01 09:51:54 +02:00
Samuel Pitoiset
b16dffff23 ac: fix ac_build_bitfield_reverse() for 16-bit integer type
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-04-01 09:51:52 +02:00
Samuel Pitoiset
9d13b9e53e ac: fix ac_build_bit_count() for 16-bit integer type
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-04-01 09:51:49 +02:00
Samuel Pitoiset
e39a6b940f ac/nir: fix nir_op_b2i16
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-04-01 09:51:47 +02:00
Timothy Arceri
4478c5374b Revert "ac/nir: use new LLVM 8 intrinsics for SSBO atomic operations"
This reverts commit 29132af234.

It seems the new intrinsic causes a hang on radeonsi (VEGA) when running the
piglit test:

tests/spec/arb_shader_storage_buffer_object/execution/ssbo-atomicCompSwap-int.shader_test
2019-03-29 21:04:01 +11:00
Samuel Pitoiset
cc752dea61 ac: fix return type for llvm.amdgcn.frexp.exp.i32.64
This fixes the following piglit with RadeonSI
tests/spec/arb_gpu_shader_fp64/execution/built-in-functions/fs-frexp-dvec4.shader_test

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-03-29 09:18:24 +01:00
Samuel Pitoiset
62a9d757e6 radv: do not always initialize HTILE in compressed state
Especially when performing a transtion from UNDEFINED->GENERAL,
the driver shouldn't initialize HTILE metadata in compressed
state because it doesn't decompress when the src layout is
GENERAL.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110259
Fixes: 3a2e93147f ("radv: always initialize HTILE when the src layout is UNDEFINED")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-03-29 08:28:18 +01:00
Samuel Pitoiset
6596eb2b30 radv: skip updating depth/color metadata for conditional rendering
I don't think we should update metadata when conditional rendering
is enabled. For some reasons, some CTS breaks only on SI.

This fixes the following CTS on SI:
dEQP-VK.conditional_rendering.draw_clear.clear.depth.*

Cc: 19.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-03-28 17:37:12 +01:00
Samuel Pitoiset
227b191206 radv: enable VK_AMD_gpu_shader_int16
This extension allows 16-bit support to Frexp/FrexpStruct.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-03-28 13:02:53 +01:00
Samuel Pitoiset
8a6e61cc52 radv: do not lower frexp_exp and frexp_sig
Hardware has two instructions.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-03-28 13:02:51 +01:00
Samuel Pitoiset
52c02d921f ac: add ac_build_frex_exp() helper ans 16-bit/32-bit support
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-03-28 13:02:48 +01:00
Samuel Pitoiset
1bf9311c59 ac: add ac_build_frexp_mant() helper and 16-bit/32-bit support
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-03-28 13:02:46 +01:00
Jason Ekstrand
ce47999cee Revert "anv/radv: release memory allocated by glsl types during spirv_to_nir"
This reverts commit 4e1bbb000c.  It turns
out that some DXVK apps due to some implementation detail of DXVK or
other create and destroy instances in an interleaved way.  Freeing the
glsl_type memory without being a bit more careful causes use-after-free
issues.  Looks like we need to try again.
2019-03-27 11:24:58 -05:00
Samuel Pitoiset
d6a07732c9 ac: use llvm.amdgcn.fmed3 intrinsic for nir_op_fmed3
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-03-27 14:45:52 +01:00
Nicolai Hähnle
e16ac33f37 amd/surface: provide firstMipIdInTail for metadata surface calculations
This field was added in a recent addrlib update, and while there
currently seems to be no issue with skipping it, we will have to
set it correctly in the future.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-03-26 10:00:55 +01:00
Bas Nieuwenhuizen
82075e3c42 ac/nir: Return frag_coord as integer.
To preserve the invariant that nir ssa defs are integers or pointers
in LLVM.

CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-03-26 09:41:15 +01:00
Samuel Iglesias Gonsálvez
01cf390035 radv: write availability status vkGetQueryPoolResults() when the data is not available
If VK_QUERY_RESULT_WITH_AVAILABILY_BIT is set and
VK_QUERY_RESULT_WAIT_BIT and VK_QUERY_RESULT_PARTIAL_BIT are both not
set, we need return to VK_NOT_READY only and set the availability
status field for each query.

From Vulkan spec:

"If VK_QUERY_RESULT_WAIT_BIT and VK_QUERY_RESULT_PARTIAL_BIT are both
not set then no result values are written to pData for queries that are
in the unavailable state at the time of the call, and
vkGetQueryPoolResults returns VK_NOT_READY. However, availability state
is still written to pData for those queries if
VK_QUERY_RESULT_WITH_AVAILABILITY_BIT is set."

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-03-25 08:21:22 +01:00
Samuel Iglesias Gonsálvez
cb3ea50ec2 radv: don't overwrite results in VkGetQueryPoolResults() when queries are not available
If the query is not available and VK_QUERY_RESULT_WAIT_BIT and
VK_QUERY_RESULT_PARTIAL_BIT are both not set, the spec doesn't
allow to modify its result.

From Vulkan spec:

"If VK_QUERY_RESULT_WAIT_BIT and VK_QUERY_RESULT_PARTIAL_BIT are
both not set then no result values are written to pData for queries
that are in the unavailable state at the time of the call, and
vkGetQueryPoolResults returns VK_NOT_READY. However, availability state
is still written to pData for those queries
if VK_QUERY_RESULT_WITH_AVAILABILITY_BIT is set."

v2:
- Move VK_NOT_READY change to next patch (Samuel Pitoiset)

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-03-25 08:21:22 +01:00
Samuel Pitoiset
23d30f4099 spirv,nir: lower frexp_exp/frexp_sig inside a new NIR pass
This lowering isn't needed for RADV because AMDGCN has two
instructions. It will be disabled for RADV in an upcoming series.

While we are at it, factorize a little bit.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-22 19:41:46 +01:00
Rhys Perry
f736250ab4 ac/nir: implement 16-bit pack/unpack opcodes
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-03-22 12:50:16 +01:00
Józef Kucia
c077d5d7de radv: Fix driverUUID
Fixes: 14cad8786a ("radv: generate the same driver UUID as radeonsi")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-03-22 08:57:16 +01:00