Commit graph

4340 commits

Author SHA1 Message Date
Samuel Pitoiset
f6770b9726 ac/llvm: fix warning in ac_build_canonicalize()
../src/amd/llvm/ac_llvm_build.c: In function ‘ac_build_canonicalize’:
../src/amd/llvm/ac_llvm_build.c:4567:9: warning: ‘intr’ may be used uninitialized in this function [-Wmaybe-uninitialized]
 4567 |  return ac_build_intrinsic(ctx, intr, type, params, 1,
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 4568 |       AC_FUNC_ATTR_READNONE);
      |       ~~~~~~~~~~~~~~~~~~~~~~
../src/amd/llvm/ac_llvm_build.c:4567:9: warning: ‘type’ may be used uninitialized in this function [-Wmaybe-uninitialized]

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-11-26 08:35:10 +01:00
Marek Olšák
b02e0d2604 radeonsi/nir: don't run si_nir_opts again if there is no change
0.3% less overhead

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-11-25 16:48:27 -05:00
Marek Olšák
f671cc4d95 ac: set swizzled bit in cache policy as a hint not to merge loads/stores
LLVM now merges loads and stores for all opcodes, so this must be set.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-11-25 16:48:27 -05:00
Samuel Pitoiset
2af39c719e radv: select the depth decompress path based on the aspect mask
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-11-25 16:29:23 +01:00
Samuel Pitoiset
905c005561 radv: create decompress pipelines for separate depth/stencil layouts
No functional changes as the driver still uses the depth+stencil
pipeline.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-11-25 16:29:21 +01:00
Samuel Pitoiset
faa58201f3 radv: rework creation of decompress/resummarize meta pipelines
This refactoring will help for creating more decompress pipelines.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-11-25 16:29:18 +01:00
Samuel Pitoiset
8f0fb38825 radv: set the image view aspect mask before resolves
No functional changes, but it will be used to decompress
separate depth/stencil aspects.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-11-25 16:29:16 +01:00
Samuel Pitoiset
9dec90b7bc radv: set the image view aspect mask during subpass transitions
No functional changes because the aspect mask is still not used
during image transitions but it will be needed for the separate
depth/stencil aspects logic.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-11-25 16:29:13 +01:00
Rhys Perry
459bc77763 aco: enable load/store vectorizer
Totals from affected shaders:
SGPRS: 1890373 -> 1900772 (0.55 %)
VGPRS: 1210024 -> 1215244 (0.43 %)
Spilled SGPRs: 828 -> 828 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 252 -> 252 (0.00 %) dwords per thread
Code Size: 81937504 -> 74608304 (-8.94 %) bytes
LDS: 746 -> 746 (0.00 %) blocks
Max Waves: 230491 -> 230158 (-0.14 %)

In NeiR:Automata and GTA V, the code decrease is especially large: -13.79%
and -15.32%, respectively.

v9: rework the callback function
v10: handle load_shared/store_shared in the callback

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> (v9)
2019-11-25 13:59:11 +00:00
Rhys Perry
b3a3e4d1d2 radv: set alignment for load_ssbo/store_ssbo in meta shaders
Otherwise, nir_intrinsic_align() will assert when called on the intrinsics

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-11-25 13:59:11 +00:00
Connor Abbott
01eb6ef870 aco: Make unused workgroup id's 0
It shouldn't matter, but the 1 was leftover from when it was handled
together with workgroup_size and num_work_groups.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-11-25 14:17:51 +01:00
Connor Abbott
bb78f9b4e4 aco: Use common argument handling
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-11-25 14:17:51 +01:00
Connor Abbott
e7f4cadd02 radv: Replace supports_spill with explict_scratch_args
The former was always true and hence dead code. We will want to
explicitly declare the ring offset register with ACO, but we also want
to declare the scratch offset too, and we can't try to disable it since
ACO also supports spilling and the determination of whether spilling has
to happen occurs well after setting up registers. So replace
supports_spill with something that will actually be used for ACO.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-11-25 14:17:51 +01:00
Connor Abbott
4d6676d78a aco: Make num_workgroups and local_invocation_ids one argument each
To match the LLVM argument setup code.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-11-25 14:17:51 +01:00
Connor Abbott
a7f1c63442 aco: Split vector arguments at the beginning
Due to how LLVM works we have to make some of the FS inputs become
vectors, and therefore have to split them early so that they don't take
up extra register pressure due to how RA currently works.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-11-25 14:17:51 +01:00
Connor Abbott
b45c54ff8d aco: Use radv_shader_args in aco_compile_shader()
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-11-25 14:17:51 +01:00
Connor Abbott
680b086db1 aco: Constify radv_nir_compiler_options in isel
It's already const for everything else.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-11-25 14:17:51 +01:00
Connor Abbott
66c703b3e8 radv: Move argument declaration out of nir_to_llvm
Now it's executed for ACO too.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-11-25 14:17:51 +01:00
Connor Abbott
3b143369a5 ac/nir, radv, radeonsi: Switch to using ac_shader_args
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
2019-11-25 14:17:10 +01:00
Connor Abbott
9885af3bdf ac: Add a shared interface between radv, radeonsi, LLVM and ACO
ac_shader_args will be similar to ac_shader_abi, except for being free
from LLVM-specific concepts and therefore capable of being shared
between LLVM and ACO. This will help us accomplish a few different
things:

- Decouple setting up SGPR and VGPR arguments from translating to LLVM,
so that we can reference these arguments in NIR lowering passes, which
will let us lower e.g. descriptor sets in NIR.

- Stop using radv-specific structures for things like determining the
chip generation in ACO.

In the end, we should replace ac_shader_abi with this structure +
driver-specific lowering passes.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-11-25 14:12:46 +01:00
Connor Abbott
43da33c169 radv: Rename ac_arg_regfile
We'll duplicate this in a header file in the next commit, and then
remove the original enum. Just rename it temporarily so that things
keep building.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-11-25 14:12:46 +01:00
Samuel Pitoiset
bfb307aea9 ac/llvm: fix the local invocation index for wave32
Fixes dEQP-VK.compute.builtin_var.local_invocation_index with
RADV_PERFTEST=cswave32.

My initial fix was to lower it but Rhys suggested the shift-right
and it's much better like this.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-11-25 07:25:48 +00:00
Samuel Pitoiset
b99295fb33 radv: disable subgroup shuffle operations on GFX10
They are broken like on GFX6-GFX7. It seems better to disable them
instead of enabling a broken feature.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-11-25 08:03:24 +01:00
Timothy Arceri
f54c4e85ce radv: create a fresh fork for each pipeline compile
In order to prevent a potential malicious pipeline tainting our
secure compile process and interfering with successive pipelines
we want to create a fresh fork for each pipeline compile.

Benchmarking has shown that simply forking on each pipeline
creation doubles the total time it takes to compile a fossilize db
collection. So instead here we fork the process at device creation
so that we have a slim copy of the device and then fork this
otherwise idle and untainted process each time we compile a
pipeline. Forking this slim copy of the device results in only a
20% increase in compile time vs a 100% increase.

Fixes: cff53da3 ("radv: enable secure compile support")
2019-11-25 10:10:14 +11:00
Timothy Arceri
1663bb1f77 radv: add a secure_compile_open_fifo_fds() helper
This will be used to create a communication pipe between the user
facing device and a freshly forked (per pipeline compile) slim copy
of that device.

We can't use pipe() here because the fork will not be a direct fork
of the user facing process. Instead we use a previously forked
copy of the process that was forked at device creation in order to
reduce the resources required for the fork and avoid performance
issues.

Fixes: cff53da374 ("radv: enable secure compile support")
2019-11-25 10:10:14 +11:00
Timothy Arceri
ef54f15da9 radv: add some infrastructure for fresh forks for each secure compile
In the following commits we want to be able to fork an existing lightweight
fork created at device creation time. In order for the user facing process
to communicate with this new fresh fork we create some members here to hold
FIFO file descriptors and a unique id.

Here we also add a new fork enum that we use to tell the lightweight
process to create a fresh fork.

For more information on why we create a fresh fork see the following
commits.
2019-11-25 10:10:14 +11:00
Rhys Perry
517728477c aco: fix waitcnts for barriers at block ends
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: d1b9deee ('aco: improve waitcnt insertion around loops')
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-11-22 19:56:31 +00:00
Rhys Perry
29d131d619 aco: fix copy+paste error
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-11-21 20:28:57 +00:00
Rhys Perry
d1b9deeea8 aco: improve waitcnt insertion around loops
Do this by repeating processing of loops until no progress is made.

Totals from affected shaders:
SGPRS: 162576 -> 162576 (0.00 %)
VGPRS: 145228 -> 145228 (0.00 %)
Spilled SGPRs: 668 -> 668 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 15778640 -> 15771336 (-0.05 %) bytes
LDS: 146 -> 146 (0.00 %) blocks
Max Waves: 6087 -> 6087 (0.00 %)

v2: use block_kind_loop_header/block_kind_loop_exit to repeat at the end
    of loops instead of at each continue

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-11-21 20:28:57 +00:00
Daniel Schürmann
8d7621a53f radv: Enable Subgroup Arithmetic and Clustered for SI
This patch also allows to enable VK_AMD_shader_ballot on SI.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-11-20 20:31:45 +00:00
Daniel Schürmann
0cbcfc071e amd/llvm: Add Subgroup Scan functions for SI
The idea of this implementation is taken from the ROCm Device Libs:
https://github.com/RadeonOpenCompute/ROCm-Device-Libs/blob/master/ockl/src/wfredscan.cl

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-11-20 20:31:45 +00:00
Samuel Pitoiset
7ecd8a3471 radv: enable VK_KHR_shader_subgroup_extended_types on GFX6-GFX7
Most of DEQP-VK.subgroups are skipped because 16-bit float aren't
supported but others pass.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-11-20 11:09:58 +00:00
Bas Nieuwenhuizen
4eb2a1dc6f radv: Do not change scratch settings while shaders are active.
When the scratch ringbuffer settings are changed, the shader unit has
to be idle or we will have shaders using old and new settings.

That combination is not supported on the HW (likely the offset is
ringbuffer idx * WAVESIZE * 1024).

CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-11-20 01:18:36 +00:00
Marek Olšák
e7fb9c73a7 ac: fill num_rings for remaining IPs
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-11-19 18:31:53 -05:00
Marek Olšák
e9cc4f670f ac: add radeon_info::num_rings and move ring_type to amd_family.h
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-11-19 18:31:53 -05:00
Marek Olšák
ebe7579655 nir: move data.image.access to data.access
The size of the data structure doesn't change.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-11-19 18:20:05 -05:00
Rhys Perry
7eb7969213 radv/aco: enable VK_KHR_shader_subgroup_extended_types
We could enable it on GFX10 if LLVM wasn't used as a fallback for
unsupported stages. Note that the CTS only tests it if
VK_KHR_shader_float16_int8 is enabled, even though it's not a
requirement.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-11-19 18:58:04 +00:00
Rhys Perry
56c06c79fc aco: implement 64-bit integer reductions
The multiplication reduction is larger than it could be, but it should be
easier to implement this way.

No failures with dEQP-VK.subgroups.*int64* except those caused by LLVM
being used for other stages.

v2: don't call setFixed() for v_add carry-out, since setHint sets physReg
v3: add and use emit_vadd32() helper
v4: use num_opcodes instead of last_opcode

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> (v3)
2019-11-19 18:58:04 +00:00
Rhys Perry
33277bd66e aco: refactor reduction lowering helpers
Should make 64-bit integer reductions easier to implement.

v4: use num_opcodes instead of last_opcode

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> (v3)
2019-11-19 18:56:21 +00:00
Samuel Pitoiset
c93f2cefd5 radv: advertise VK_KHR_shader_subgroup_extended_types on GFX8-GFX9
This extension allows to use subgroup operations with 8 and 16-bits

Untested on GFX6-GFX7, and most of subgroup operations are broken
on GFX10, so don't enable it for now. Not enabled on ACO because
it's still doesn't support 8-bits/16-bits.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-11-19 18:01:13 +00:00
Samuel Pitoiset
80c71cbbd8 ac: add 16-bit float support to ac_build_alu_op()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-11-19 18:01:13 +00:00
Samuel Pitoiset
670aa24c69 ac: add 8-bit and 16-bit supports to ac_build_optimization_barrier()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-11-19 18:01:13 +00:00
Samuel Pitoiset
21a9243f5e ac: add 8-bit and 16-bit supports to ac_build_wwm()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-11-19 18:01:13 +00:00
Samuel Pitoiset
ef352a2466 ac: add 8-bit and 16-bit supports to get_reduction_identity()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-11-19 18:01:13 +00:00
Samuel Pitoiset
c8af1d51d4 ac: add 8-bit and 16-bit supports to ac_build_swizzle()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-11-19 18:01:13 +00:00
Samuel Pitoiset
1565118d8f ac: add 8-bit and 16-bit supports to ac_build_dpp()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-11-19 18:01:13 +00:00
Samuel Pitoiset
2113867f0c ac: add 8-bit and 16-bit supports to ac_build_set_inactive()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-11-19 18:01:13 +00:00
Samuel Pitoiset
c29514bd22 ac: add 8-bit and 16-bit supports to ac_build_readlane()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-11-19 18:01:13 +00:00
Samuel Pitoiset
58d5ab98a3 ac: add 8-bit and 16-bit supports to ac_build_shuffle()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-11-19 18:01:13 +00:00
Samuel Pitoiset
204cf54b70 ac: remove useless cast in ac_build_set_inactive()
The return type is always the src type (32 or 64 bits).

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-11-19 18:01:13 +00:00