Marek Olšák
1e39c21c23
radeonsi/gfx10: fix a possible hang with exp pos0 with done=0 and exec=0
...
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Marek Olšák
683cf11b81
radeonsi/gfx10: prefetch HW GS when NGG is used
...
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Nicolai Hähnle
76898a8062
amd/common/gfx10: set DLC for llvm.amdgcn.s.buffer.load
...
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Marek Olšák
7f71579064
radeonsi/gfx10: fix PS exports for SPI_SHADER_32_AR
...
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Marek Olšák
4bdf44724f
radeonsi/gfx10: set DLC for loads when GLC is set
...
This fixes L1 shader array cache coherency.
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Marek Olšák
f81aa6b0c8
radeonsi/gfx10: fix shader images
...
Don't promote 2D image instructions to 3D, and don't set z=BASE_ARRAY.
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Marek Olšák
7c805a7c67
radeonsi/gfx10: set the DCC constant encoding flag
...
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Marek Olšák
6eb219e963
radeonsi/gfx10: fix intensity formats
...
move the ALPHA_IS_ON_MSB fixup into vi_alpha_is_on_msb
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Marek Olšák
6944f99176
radeonsi/gfx10: allocate GDS BOs for streamout
...
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Marek Olšák
395185912d
radeonsi/gfx10: make sure GDS is idle between IBs
...
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Nicolai Hähnle
5ff3aff0d6
radeonsi/gfx10: implement streamout
...
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Nicolai Hähnle
792a638b03
radeonsi/gfx10: implement streamout-related queries
...
The NGG hardware pipeline doesn't track these statistics automatically,
and in fact *cannot* track them automatically when API geometry shaders
are involved, so we accumulate statistics in the shader using atomic
adds.
This implementation accumulates statistics via the memory system and
the RW buffer descriptor setup. We could use GDS, but since these
atomics aren't latency-sensitive, that basically just trades off
L2$ bandwidth vs. export bus bandwidth. One single memory transaction
per shader workgroup doesn't seem too bad. The result ring buffer in
memory is needed either way to avoid pipeline stalls.
The shader code contains the atomic unconditionally, though the
GFX10_GS_QUERY_BUF is a null buffer when no queries are active. The
atomic is simply discarded by the shader hardware in that case.
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Nicolai Hähnle
bcd2d2e194
radeonsi/gfx10: enable the workaround for unaligned vertex fetch
...
Yes, really. Note that non-format buffer loads are unaffected and work
just fine with unaligned pointers (as long as SH_MEM_CONFIG is setup
correctly, which amdgpu ensures).
Fixes e.g. KHR-GL45.vertex_attrib_64bit.vao
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Nicolai Hähnle
22b85bfc02
radeonsi/gfx10: re-order the initialization order in si_compile_tgsi_main
...
It's useful to be able to access gs_ngg_scratch before creating the
main wrapping branch.
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Nicolai Hähnle
3aa622aab1
radeonsi/gfx10: apply DCC MSAA blend workaround
...
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Nicolai Hähnle
bc25ccfe22
radeonsi/gfx10: implement si_emit_global_shader_pointers
...
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Nicolai Hähnle
6bcc273de8
radeonsi/gfx10: implement si_init_tess_factor_ring
...
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Nicolai Hähnle
2492cfde66
radeonsi/gfx10: initialize EXEC for TES-as-NGG (without geometry shader)
...
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Nicolai Hähnle
591537c7fa
radeonsi/gfx10: use correct VGPR for instance ID in LS shader
...
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Nicolai Hähnle
f3b9a37278
radeonsi/gfx10: implement si_shader_hs
...
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Nicolai Hähnle
e4d6b4daae
radeonsi/gfx10: implement si_create_sampler_state
...
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Nicolai Hähnle
0bf3e6fae7
radeonsi/gfx10: double the number of tessellation offchip buffers per SE
...
Each gfx10 shader engine corresponds to two gfx9 shader engines, so scale
the number of offchip buffers accordingly.
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Nicolai Hähnle
2afd3c421d
radeonsi/gfx10: implement get_tess_ring_descriptor
...
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Nicolai Hähnle
d028440f57
radeonsi/gfx10: mask DCC tile swizzle by alignment
...
DCC alignment can be less than the alignment of the main surface. In that
case, the DCC tile swizzle needs to be masked accordingly. Should have no
impact on pre-gfx10.
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Nicolai Hähnle
1666ee183e
radeonsi/gfx10: implement hardware MSAA resolve
...
MSAA is only supported for 64KB_{R,Z}_X modes, so the micro tile
optimization that we use on gfx9 and earlier does not work.
Be very explicit about how the swizzle mode of the temporary surface is
selected.
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Nicolai Hähnle
69c41fb8ff
radeonsi/gfx10: fix binding on si_update_scratch_relocs
...
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
fd8758366b
radeonsi/gfx10: set llvm_has_working_vgpr_indexing
...
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
48810ad02d
radeonsi/gfx10: implement load_const_buffer_desc_fast_path
...
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
1b11fb148c
radeonsi/gfx10: take PRIMID from the correct output when exported by GS
...
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
8060339278
radeonsi/gfx10: change location of instance ID shader input
...
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
ccdf792910
radeonsi/gfx10: set USER_DATA_ADDR offset for geometry shaders
...
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
00707922d4
radeonsi/gfx10: implement si_emit_derived_tess_state
...
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
e0c2a4d58c
radeonsi/gfx10: implement si_shader_gs
...
This is only used in the legacy, non-NGG path.
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
2864d53deb
radeonsi/gfx10: implement preload_ring_buffers
...
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
56cab3e996
radeonsi/gfx10: implement si_set_ring_buffer
...
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
3c1aeb834f
radeonsi/gfx10: allow rectangle outputs from NGG primitive shader
...
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
77e715541c
radeonsi/gfx10: emit VGT_GS_OUT_PRIM_TYPE from draw and add it to VS_STATE
...
With NGG, the VGT_GS_OUT_PRIM_TYPE can change without a shader change.
The VS_STATE is required for both streamout and culling from a vertex
shader without pre-compiling outprim-specific variants.
We could consider compiling specialized variants in the future. We
could also consider compiling the NGG logic as an epilog.
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
4ecc39e1aa
radeonsi/gfx10: NGG geometry shader PM4 and upload
...
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
a04aa4be2b
radeonsi/gfx10: generate geometry shaders for NGG
...
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
efe1cd4859
radeonsi/gfx10: use the correct register for image descriptor dumping
...
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
1ce52c1e37
radeonsi/gfx10: emit GE_CNTL instead of IA_MULTI_VGT_PARAM for legacy mode
...
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
77c0f9e7ba
radeonsi/gfx10: initialize GE_{MAX,MIN}_VTX_INDX/INDX_OFFSET
...
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
47c9505a92
radeonsi/gfx10: setup registers for OpenGL compute
...
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
b8d3fd46d6
radeonsi/gfx10: set user data base registers
...
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
016a465d7d
radeonsi/gfx10: implement gfx10_shader_ngg
...
For pipelines without API GS. We will later expand this to cover NGG
geometry shaders as well.
Note that the vtx offset passed into the GS part is just the
vertex index multiplied by VGT_ESGS_RING_ITEMSIZE.
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
d0c204a1e0
radeonsi/gfx10: add NGG registers to si_init_config
...
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
ae00cae0b7
radeonsi/gfx10: update shader-related fields in si_init_config
...
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
1dee01ee13
radeonsi/gfx10: implement si_shader_ps
...
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
612489bd5d
radeonsi/gfx10: generate VS and TES as NGG merged ESGS shaders
...
This does not support geometry shading yet. Also missing are streamout
and NGG-specific optimizations.
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
e86256c512
radeonsi/gfx10: distinguish between merged shaders and multi-part shaders
...
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00