Commit graph

3986 commits

Author SHA1 Message Date
Bas Nieuwenhuizen
8ad3d8b178 radv: Fix condition for skipping the continue CS.
We need the continue CS for referencing the tess/GDS/sample position BOs.

Fixes: 46e52df34d "radv: add tessellation ring allocation support. (v2)"
Fixes: e1dc3ab753 "radv/gfx10: allocate GDS/OA buffer objects for NGG streamout"
Fixes: 1171b304f3 "radv: overhaul fragment shader sample positions."
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-10-03 13:02:07 +00:00
Samuel Pitoiset
a2a68d551c radv/gfx10: fix the ESGS ring size symbol
Random hangs no longer happen, I'm actually not sure if they were
related to this.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-10-02 21:50:40 +02:00
Samuel Pitoiset
34be977f80 radv: fix build
Forgot to amend the commit before updating the MR.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-10-02 20:37:43 +02:00
Samuel Pitoiset
4304162744 Revert "radv: disable viewport clamping even if FS doesn't write Z"
This was actually the wrong fix.

This reverts commit 0a313cc285.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-10-02 19:40:39 +02:00
Samuel Pitoiset
b8fe6189a9 radv: rework the slow depthstencil clear to write depth from PS
Make sure to export the expected clear values to the depth
stencil attachment.

This fixes dEQP-VK.pipeline.depth_range_unrestricted.* on GFX10.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-10-02 19:31:51 +02:00
Samuel Pitoiset
e19d1ee2d1 radv/gfx10: fix NGG streamout with triangle strips for VS
The number of vertices has to be adjusted with the output primitive
type.

This fixes dEQP-VK.transform_feedback.simple.triangle_strip_*.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-10-02 18:09:35 +02:00
Samuel Pitoiset
08ab13d340 radv/gfx10: fix storing/loading NGG stream outputs for GS
The GS outputs are stored differently in the LDS storage, they
are indexed by out_idx which is incremented for each stored DWORD.
Thus, we need a different path for exporting the stream outputs.

This fixes a bunch of CTS failures when NGG GS is force enabled.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-10-02 18:09:32 +02:00
Samuel Pitoiset
3be21b5ab1 radv/gfx10: use the component mask when storing/loading NGG stream outputs
It's unnecessary to store/load more components that needed.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-10-02 18:09:30 +02:00
Samuel Pitoiset
60f8224171 radv/gfx10: fix storing/loading NGG stream outputs for VS and TES
The LDS storage allocated for stream outputs is 4 * N, where N
is the number of outputs. So, we have to store/load with N as index
and not with the output location as index.

This doesn't fix anything known but it should fix out-of-bounds
access and it also reduces the number of outputs written to the
LDS storage.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-10-02 18:09:27 +02:00
Samuel Pitoiset
56e1b1ff0c radv/gfx10: add missing counter buffer to the BO list
The buffer isn't necessarily used before.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-10-02 18:09:25 +02:00
Samuel Pitoiset
683c5e27c7 radv/gfx10: add radv_device::use_ngg
Trivial.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-10-02 18:06:01 +02:00
Marek Olšák
a1545af079 ac/nir: fix GLSL imageSamples()
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-09-30 14:21:42 -04:00
Marek Olšák
0cc233e3dc ac: add ac_build_image_get_sample_count from radeonsi
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-09-30 14:21:42 -04:00
Marek Olšák
39e638c14e ac/surface: don't allocate FMASK if there is no graphics
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-09-30 14:21:42 -04:00
Marek Olšák
4a0d2e2880 ac: reorder and print all radeon_info fields
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-09-30 13:36:21 -04:00
Marek Olšák
e8b1538587 ac: set the number of SDPs same as the number of TCCs
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-09-30 13:36:21 -04:00
Marek Olšák
b7c2f7c5a6 ac: fix num_good_cu_per_sh for harvested chips
Cc: 19.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-09-30 13:36:20 -04:00
Marek Olšák
8cbe83445b ac: add radeon_info::tcc_harvested
Cc: 19.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-09-30 13:36:20 -04:00
Marek Olšák
7d97013294 ac: fix incorrect vram_size reported by the kernel
Cc: 19.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-09-30 13:36:20 -04:00
Daniel Schürmann
1d29895e5b aco: call nir_opt_algebraic_late() exhaustively
57559 shaders in 28980 tests
Totals:
SGPRS: 2963407 -> 2959935 (-0.12 %)
VGPRS: 2014812 -> 2016328 (0.08 %)
Spilled SGPRs: 1077 -> 1077 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 10348 -> 10348 (0.00 %) dwords per thread
Code Size: 114545436 -> 114498084 (-0.04 %) bytes
LDS: 933 -> 933 (0.00 %) blocks
Max Waves: 375997 -> 375866 (-0.03 %)

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-09-30 09:44:10 +00:00
Daniel Schürmann
0fb27f1e5a radv/aco: Don't lower subtractions
40228 shaders in 20236 tests
Totals:
SGPRS: 2045512 -> 2046496 (0.05 %)
VGPRS: 1430856 -> 1430464 (-0.03 %)
Spilled SGPRs: 1077 -> 1077 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 10348 -> 10348 (0.00 %) dwords per thread
Code Size: 77202840 -> 77151832 (-0.07 %) bytes
LDS: 863 -> 863 (0.00 %) blocks
Max Waves: 260729 -> 260754 (0.01 %)

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-09-30 09:44:10 +00:00
Mauro Rossi
411e50a8fd android: aco: add support for libmesa_aco
Android building rules are added in src/amd/Android.compiler.mk
libmesa_aco static library is built conditionally to radeonsi
as done for vulkan.radv module

This will prevent Android build errors for non x86 systems

filter-out compiler/aco_instruction_selection_setup.cpp source,
as already included by compiler/aco_instruction_selection.cpp
and would cause several multiple definition linker errors

NOTE: libLLVM requires AMDGPU Disassembler to build radv with aco

Fixes: 93c8ebf ("aco: Initial commit of independent AMD compiler")
Fixes: a70a998 ("radv/aco: Setup alternate path in RADV to support the experimental ACO compiler")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
2019-09-28 15:56:34 +02:00
Mauro Rossi
c24ad565ae android: aco: fix undefined template 'std::__1::array' build errors
Fixes a few building errors similar to the following:

In file included from external/mesa/src/amd/compiler/aco_instruction_selection.cpp:26:
In file included from external/libcxx/include/algorithm:639:
external/libcxx/include/utility:321:9:
error: implicit instantiation of undefined template 'std::__1::array<aco::Temp, 4>'
    _T2 second;
        ^

Fixes: 93c8ebf ("aco: Initial commit of independent AMD compiler")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
2019-09-28 15:56:23 +02:00
Rhys Perry
1f2813e103 aco: don't remove the loop exec mask in transition_to_Exact()
No pipeline-db changes.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-09-27 10:57:03 +01:00
Rhys Perry
b711e62e61 aco: set loop_info::has_discard for demotes
We need the loop header phis for the outer exec masks. Needed for
dEQP-VK.glsl.demote.dynamic_loop_texture

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-09-27 10:57:03 +01:00
Timur Kristóf
c372dc762d radv: Fix L2 cache rinse programming.
According to radeonsi, GLM doesn't support WB alone, so
we have to set INV too when WB is set.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-09-26 22:18:16 +00:00
Timur Kristóf
30f0c0ea7d radv: Add debug option to dump meta shaders.
This new option can help debug shader compiler problems when
there are issues with the meta shaders.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-09-26 13:36:49 +00:00
Timur Kristóf
a4fd8ba7e3 amd/common: Introduce ac_get_fs_input_vgpr_cnt.
Add a function called ac_get_fs_input_vgpr_cnt which will return
the number of input VGPRs used by an AMD shader. Previously,
radv and radeonsi had the same code duplicated, but this commit also
allows them to share this code.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-09-26 13:36:49 +00:00
Timur Kristóf
83eebdb507 radv: Set shared VGPR count in radv_postprocess_config.
This commit allows RADV to set the shared VGPR count according to
the shader config.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-09-26 13:36:49 +00:00
Timur Kristóf
7bde4ddaf7 amd/common: Add num_shared_vgprs to ac_shader_config for GFX10.
In GFX10 wave64 mode, shared VGPRs allow the two wave halves to
share some data with each other.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-09-26 13:36:49 +00:00
Timur Kristóf
db1fddcf0f amd/common: Extract some helper functions to ac_shader_util.
This commit moves ac_get_tbuffer_format, ac_get_sampler_dim and
ac_get_image_dim into ac_shader_util, thus enabling them to be used
by compilers other than LLVM.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-09-26 13:36:49 +00:00
Timur Kristóf
d8b46f8964 amd/common: Move ac_export_mrt_z to ac_llvm_build.
The aim of this commit is to keep ac_shader_util LLVM-free,
since we would like to use it in ACO later.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-09-26 13:36:49 +00:00
Rhys Perry
06ea3325c3 aco: CSE readlane/readfirstlane/permute/reduce with the same exec mask
v2: rename pass_temp to pass_flags
v2: also CSE reductions
v3: add ds_swizzle_b32 support
v3: check gds/offset0/offset1 fields

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-09-26 13:19:51 +01:00
Rhys Perry
86ecf92c23 aco: don't CSE v_readlane_b32/v_readfirstlane_b32
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-09-26 13:19:51 +01:00
Rhys Perry
3c966fd688 aco,radv: rename record_llvm_ir/llvm_ir_string to record_ir/ir_string
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-09-26 11:08:47 +01:00
Rhys Perry
ec8ced9123 radv/aco: return a correct name and description for the backend IR
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-09-26 11:08:43 +01:00
Rhys Perry
15ea1c5cff aco: store printed backend IR in binary
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-09-26 11:08:31 +01:00
Rhys Perry
6613b81327 aco,radv/aco: get dissassembly for release builds if requested
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-09-26 11:08:09 +01:00
Rhys Perry
0aef1a230e radv/aco: actually disable ACO when unsupported
We were setting this twice. The second time, we weren't later disabling
it if unsupported.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-09-26 11:04:45 +01:00
Rhys Perry
db2ca45102 aco: check for duplicate opcode numbers
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-09-25 15:28:44 +00:00
Rhys Perry
101f47fdd7 aco: fix opcode for s_mul_hi_i32
Fixes dEQP-VK.glsl.builtin.function.integer.imulextended.*_compute

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-09-25 15:28:44 +00:00
Rhys Perry
2faaf04c62 aco: fix v_subrev_co_u32_e64 opcode
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-09-25 15:28:44 +00:00
Rhys Perry
00aa413bae aco: fix GFX9 opcode for v_xad_u32
Fixes various dEQP-VK.image.store.* tests.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-09-25 15:28:44 +00:00
Rhys Perry
b125dc4839 aco: implement 64-bit ineg
We currently lower them, but nir_opt_algebraic() can add new ones because
lower_sub=true.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-09-25 15:27:48 +00:00
Rhys Perry
641eac953c aco: run nir_lower_int64() before nir_lower_idiv()
nir_lower_idiv() asserts on 64-bit integers.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-09-25 15:27:48 +00:00
Eric Engestrom
30f639c181 radv: fix s/load/store/ copy-paste typo
Fixes: cdc6efddf9 ("radv: implement all depth/stencil resolve modes using graphics")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-09-24 19:18:54 +01:00
Bas Nieuwenhuizen
780182f0a0 radv: Add workaround for hang in The Surge 2.
Released today and hangs on RADV. We don't have the root cause yet,
but this should unblock people playing the game.

No drirc because the radv debugflags are not usable from drirc and
I want this backported.

CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-09-24 09:51:40 +00:00
Marek Olšák
05d32850ff ac/nir: force unnormalized coordinates for RECT
This fixes VAAPI.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-09-23 15:34:54 -04:00
Marek Olšák
500181b2ba ac/nir: port Z compare value clamping from radeonsi
This fixes some dEQP tests.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-09-23 15:34:54 -04:00
Marek Olšák
9429714233 ac: stop using PCI IDs for chip identification
PCI IDs for amdgpu will be removed from Mesa.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-09-23 15:14:11 -04:00