1250 commits

Author SHA1 Message Date
Marek Olšák
8a71f60194 ac: replace glc,slc with cache_policy for loads
cosmetic change

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-07-04 15:38:56 -04:00
Marek Olšák
a29e781961 ac: replace glc,slc with cache_policy for stores
cosmetic change

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-07-04 15:38:54 -04:00
Nicolai Hähnle
cb07f91489 amd/common: move ac_shader_{binary,reloc} into r600 and rename
They are no longer used by radeonsi or radv.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-07-04 10:52:26 +00:00
Nicolai Hähnle
510e74ff48 amd/common: removed unused ac_shader_binary functions
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-07-04 10:52:26 +00:00
Nicolai Hähnle
b398230e6d amd/common: remove unused ac_compile_module_to_binary
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-07-04 10:52:26 +00:00
Marek Olšák
969e5176c2 ac: rework ac_build_waitcnt for gfx10
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Marek Olšák
3203a74dcb radeonsi/gfx10: set PA_SC_TILE_STEERING_OVERRIDE
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Nicolai Hähnle
76898a8062 amd/common/gfx10: set DLC for llvm.amdgcn.s.buffer.load
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Marek Olšák
4bdf44724f radeonsi/gfx10: set DLC for loads when GLC is set
This fixes L1 shader array cache coherency.

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Nicolai Hähnle
1666ee183e radeonsi/gfx10: implement hardware MSAA resolve
MSAA is only supported for 64KB_{R,Z}_X modes, so the micro tile
optimization that we use on gfx9 and earlier does not work.

Be very explicit about how the swizzle mode of the temporary surface is
selected.

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Nicolai Hähnle
016a465d7d radeonsi/gfx10: implement gfx10_shader_ngg
For pipelines without API GS. We will later expand this to cover NGG
geometry shaders as well.

Note that the vtx offset passed into the GS part is just the
vertex index multiplied by VGT_ESGS_RING_ITEMSIZE.
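
As a rough illustration of the note above, the offset is a single multiplication. This is a sketch in plain C, not the code the driver emits; the helper name `esgs_vertex_offset` and its parameter names are hypothetical, and the result is assumed to be in whatever units the item size is programmed in.

```c
#include <stdint.h>

/* Hypothetical helper: offset of a vertex in the ESGS ring, computed as
 * the vertex index multiplied by the ring item size (the value programmed
 * into VGT_ESGS_RING_ITEMSIZE). The result uses the same units as
 * esgs_itemsize. */
static uint32_t esgs_vertex_offset(uint32_t vertex_index, uint32_t esgs_itemsize)
{
    return vertex_index * esgs_itemsize;
}
```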

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
84e7ee421f ac/surface/gfx10: allow "rotated" micro mode
Standard mode does not support DCC.

The R is retconned to "render target" on gfx10.

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
a66be784c3 ac/surface/gfx10: DCC is only supported with SW_64KB_{Z,R}_X modes
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
6d416ac7e1 amd/common/gfx10: print gfx10 registers in debug dumps
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
70fd27d1e3 amd/common/gfx10: CMASK is only used for FMASK
All regular color compression is done via DCC.

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
b52bf8f12a amd/common/gfx10: support new tbuffer encoding
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
c067aaa580 amd/common/gfx10: pad shader buffers for instruction prefetch
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
227c29a80d amd/common/gfx10: implement scan & reduce operations
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
7ba80c1d19 amd/common/gfx10: add GS_ALLOC_REQ message define
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
4c364c89e2 amd/common/gfx10: print out GCR_CNTL as part of {ACQUIRE,RELEASE}_MEM
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
74a26af913 amd/common/gfx10: add register JSON
A small number of fields now need new disambiguation.

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
536782b0b7 amd/common: add GFX10 chips
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Marek Olšák
78cdf9a99f amd/addrlib: add gfx10 support
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Samuel Pitoiset
83297baf2d ac: compute the DCC fast clear size per slice on GFX8
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-02 09:37:44 +02:00
Samuel Pitoiset
6517d226ac ac: compute the size of one DCC slice on GFX8
Addrlib doesn't provide this info. Because DCC is linear, at least
on GFX8, it's easy to compute the size of one slice.
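
A minimal sketch of what "easy to compute" could look like, assuming every slice occupies an equally sized, contiguous range of the DCC buffer. The function and parameter names are hypothetical and are not taken from the patch; the real code may also have to account for alignment and mip levels.

```c
#include <stdint.h>

/* Hypothetical sketch: with a linear layout in which all slices have the
 * same size and are stored back to back, the per-slice size is the total
 * size divided by the number of slices. Returns 0 if num_slices is 0. */
static uint64_t dcc_slice_size(uint64_t dcc_total_size, uint32_t num_slices)
{
    if (num_slices == 0)
        return 0;
    return dcc_total_size / num_slices;
}

/* Byte offset of one slice under the same assumption. */
static uint64_t dcc_slice_offset(uint64_t dcc_total_size, uint32_t num_slices,
                                 uint32_t slice)
{
    return (uint64_t)slice * dcc_slice_size(dcc_total_size, num_slices);
}
```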

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-02 09:37:41 +02:00
Emil Velikov
4ec32413f3 ac: change ac_query_gpu_info() signature
Currently libdrm_amdgpu provides a typedef of the various handles. While
the goal was to make those opaque, it effectively became part of the API.

To the best of my knowledge there are two ways to have opaque handles:
 - "typedef void *foo;" - rather messy IMHO
 - "struct foo;" and use "struct foo *" through the API

In our case amdgpu_device_handle is used only internally, plus
respective code is not used or applicable for r300 and r600. Hence we
copied the typedef.

Seemingly this will be a problem since libdrm_amdgpu wants to change the
API, while not updating the code(?).

Either way, we can safely s/amdgpu_device_handle/void */ and carry on.
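
The two styles of opaque handle mentioned above can be sketched in C as follows. Only the name `foo` comes from the message; `foo_void_handle`, `foo_create`, `foo_value`, and `foo_destroy` are hypothetical names used purely for illustration.

```c
#include <stdlib.h>

/* Variant 1: a void pointer typedef. Any object pointer converts to it
 * silently, so the compiler cannot catch a wrong handle being passed. */
typedef void *foo_void_handle;

/* Variant 2: an incomplete struct type. Users only ever hold a
 * "struct foo *" and cannot inspect or allocate one themselves. */
struct foo;
struct foo *foo_create(int value);
int foo_value(const struct foo *f);
void foo_destroy(struct foo *f);

/* The definition would normally live in a private .c file. */
struct foo {
    int value;
};

struct foo *foo_create(int value)
{
    struct foo *f = malloc(sizeof(*f));
    if (f)
        f->value = value;
    return f;
}

int foo_value(const struct foo *f)
{
    return f->value;
}

void foo_destroy(struct foo *f)
{
    free(f);
}
```

With the second variant, passing an unrelated pointer type where a `struct foo *` is expected produces a compiler diagnostic, which the `void *` typedef cannot offer.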

Cc: Michel Dänzer <michel@daenzer.net>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-28 17:49:32 +01:00
Samuel Pitoiset
34bef8a0d7 radv: clear CMASK layers instead of the whole buffer on GFX8
This reduces the size of fill operations needed to clear CMASK
for layered color textures.

GFX9 unsupported for now.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-25 16:36:28 +02:00
Samuel Pitoiset
476b907a3b radv: clear FMASK layers instead of the whole buffer on GFX8
This reduces the size of fill operations needed to clear FMASK
for layered color textures.

GFX9 unsupported for now.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-25 16:36:25 +02:00
Marek Olšák
ac4b1e2f0a radeonsi: set the calling convention for inlined function calls
otherwise the behavior is undefined

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2019-06-24 21:04:10 -04:00
Nicolai Hähnle
bd3a3fd25a amd/rtld: update the ELF representation of LDS symbols
The initial prototype used a processor-specific symbol type, but
feedback suggests that an approach using processor-specific section
name that encodes the alignment analogous to SHN_COMMON symbols is
preferred.

This patch keeps both variants around for now to reduce problems
with LLVM compatibility as we switch branches around.

This also cleans up the error reporting in this function.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-24 21:04:10 -04:00
Marek Olšák
0032f6b8a0 ac/surface: remove addrlib_family_rev_id
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-24 21:04:10 -04:00
Daniel Schürmann
0daeb1d127 amd/common: lower bitfield_extract to ubfe/ibfe.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-06-24 18:42:20 +02:00
Daniel Schürmann
48a75e7af0 amd/common: lower bitfield_insert to bfm & bitfield_select
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-06-24 18:42:20 +02:00
Nicolai Hähnle
21dd881416 ac/rtld: report better error messages for LDS overallocation
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2019-06-19 20:30:32 -04:00
Marek Olšák
b64bd5887e ac/rtld: check correct LDS max size
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2019-06-19 20:30:32 -04:00
Nicolai Hähnle
1ee0f0d315 radeonsi: add s_sethalt to shaders for debugging
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2019-06-19 20:30:32 -04:00
Nicolai Hähnle
87182200c7 ac/rtld: fix sorting of LDS symbols by alignment
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2019-06-19 20:30:32 -04:00
Connor Abbott
53a7649e5d ac/nir: Set speculatable for buffer loads where allowed
This brings the nir path in line with the TGSI path.

Totals from affected shaders:
SGPRS: 2984 -> 2984 (0.00 %)
VGPRS: 2792 -> 2652 (-5.01 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 247380 -> 248072 (0.28 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 121 -> 132 (9.09 %)
Wait states: 0 -> 0 (0.00 %)

Most of the change came from DiRT: Showdown, and came from sinking SSBO
loads.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-06-19 14:08:28 +02:00
Connor Abbott
3bf8981c51 ac,radeonsi: Always mark buffer stores as inaccessiblememonly
inaccessiblememonly means that it doesn't modify memory accessible via
normal LLVM pointers. This lets LLVM's dead store elimination, memcpy
forwarding, etc. ignore functions with this attribute. We don't
represent descriptors as pointers, so this property is always true of
buffer and image stores. There are plans to represent descriptors via
pointers, but this just means that now nothing is inaccessiblememonly,
as LLVM will then understand loads/stores via its usual alias analysis.

Radeonsi was mistakenly only setting it if the driver could prove that
there were no reads, and then it was cargo-culted into ac_llvm_build
and ac_llvm_to_nir. Rip it out of everything.

statistics with nir enabled:

Totals from affected shaders:
SGPRS: 152 -> 152 (0.00 %)
VGPRS: 128 -> 132 (3.12 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 9324 -> 9244 (-0.86 %) bytes
LDS: 2 -> 2 (0.00 %) blocks
Max Waves: 17 -> 17 (0.00 %)
Wait states: 0 -> 0 (0.00 %)

The only difference was a manhattan31 shader.

Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-19 14:08:27 +02:00
Samuel Pitoiset
4c7ef1b02e ac: make ac_compute_cmask() a static function
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-17 11:30:47 +02:00
Samuel Pitoiset
b5012a0518 ac: update llvm.amdgcn.icmp intrinsic name for LLVM 9+
LLVM r363339 changed llvm.amdgcn.icmp.i* to llvm.amdgcn.icmp.i64.i*.
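
A sketch of how a caller might pick the name by LLVM version, based only on the rename described above. The helper `amdgcn_icmp_name` and its parameters are hypothetical and are not the code in this patch.

```c
#include <stdio.h>

/* Hypothetical helper: builds the llvm.amdgcn.icmp intrinsic name for a
 * given operand bit width. Per the commit message, LLVM 9+ (r363339)
 * inserts ".i64" before the operand type suffix. Returns the snprintf
 * result, i.e. the length the full name needs. */
static int amdgcn_icmp_name(char *buf, size_t size, unsigned llvm_major,
                            unsigned operand_bits)
{
    if (llvm_major >= 9)
        return snprintf(buf, size, "llvm.amdgcn.icmp.i64.i%u", operand_bits);
    return snprintf(buf, size, "llvm.amdgcn.icmp.i%u", operand_bits);
}
```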

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-17 08:58:33 +02:00
Marek Olšák
abe9a51d27 ac: add radeon_info::is_amdgpu instead of checking drm_major == 3
and clean up

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-06-14 13:31:18 -04:00
Daniel Schürmann
deedc0b31d amd/common: add support for AMD_shader_ballot functions
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-06-13 12:44:23 +00:00
Nicolai Hähnle
f8315ae04b amd/rtld: layout and relocate LDS symbols
Upcoming changes to LLVM will emit LDS objects as symbols in the ELF
symbol table, with relocations that will be resolved with this change.

Callers will also be able to define LDS symbols that are shared between
shader parts. This will be used by radeonsi for the ESGS ring in gfx9+
merged shaders.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-12 20:28:23 -04:00
Nicolai Hähnle
1ff2440eee amd/common: use ARRAY_SIZE for the LLVM command line options
This is more convenient for changing it around during debug.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-12 20:28:23 -04:00
Nicolai Hähnle
3c958d924a amd/common: add ac_compile_module_to_elf
A new variant of ac_compile_module_to_binary that allows us to
keep the entire ELF around.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-12 20:28:23 -04:00
Nicolai Hähnle
77b05cc42d radeonsi: use ac_shader_config
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-12 20:28:23 -04:00
Nicolai Hähnle
b3be346c68 amd/common: add a more powerful runtime linker
Using an explicit linker instead of just concatenating .text
sections will allow us to start using .rodata sections and
explicit descriptions of data on LDS that is shared between
stages.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-12 20:28:23 -04:00
Nicolai Hähnle
c129cb3861 amd/common: clarify ac_shader_binary::lds_size
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-12 18:33:21 -04:00
Nicolai Hähnle
2e96c01073 amd/common: extract ac_parse_shader_binary_config
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-12 18:33:08 -04:00