Commit graph

1788 commits

Author SHA1 Message Date
Bas Nieuwenhuizen
5db0bf9994 radv: Implement VK_EXT_discard_rectangles.
Tested with a modified deferred demo and no regressions in a 1.0.2
mustpass run.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-01-10 13:26:22 +01:00
Bas Nieuwenhuizen
11b9cdd2d7 radv: Add mapping between dynamic state mask and external enum.
The EXT values are really large, e.g.
VK_DYNAMIC_STATE_DISCARD_RECTANGLE_EXT = 1000099000, so 1 << value
is not going to fit into a 32-bit mask.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-01-10 13:24:31 +01:00
Samuel Pitoiset
7145b20afb amd/common: bump the number of available user SGPRS to 32 on GFX9
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-01-10 12:35:08 +01:00
Samuel Pitoiset
a1f1f708c0 radv: remove radv_pipeline_layout::push_constant_stages field
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-01-10 12:31:57 +01:00
Samuel Pitoiset
d43f50c00b amd/common: do not rely on the pipeline for the push constants logic
It makes more sense to rely on nir_intrinsic_load_push_constant
instead of the pipeline layout.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-01-10 12:31:54 +01:00
Samuel Pitoiset
4e701cf75c radv/gfx9: calculate the number of ES VGPRs for merged shaders
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-01-10 12:31:53 +01:00
Samuel Pitoiset
232c418af5 radv/gfx9: enable LDS for GS only if the ES type is TES
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-01-10 12:31:51 +01:00
Samuel Pitoiset
9e2395faf5 amd/common: determine the ES type (VS or TES) for the GS on GFX9
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-01-10 12:31:49 +01:00
Timothy Arceri
f04d2ca0d9 ac: rework emit_barrier() to not segfault on radeonsi
nir_to_llvm_context will always be NULL for radeonsi so we need
work around this.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-01-09 10:21:32 +11:00
Timothy Arceri
19f3141e6a ac: add load_tess_level() to the abi
Fixes the following piglit tests in radeonsi:

vs-tcs-tes-tessinner-tessouter-inputs-quads.shader_test
vs-tcs-tes-tessinner-tessouter-inputs-tris.shader_test
vs-tes-tessinner-tessouter-inputs-quads.shader_test
vs-tes-tessinner-tessouter-inputs-tris.shader_test

v2: make use of si_shader_io_get_unique_index_patch()
    via the helper in the previous patch rather than
    shader_io_get_unique_index()

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-01-09 10:21:32 +11:00
Samuel Pitoiset
08a5f4412a radv: get InstanceID from VGPR1 (or VGPR2 for tess) instead of VGPR3
VGPR1 = InstanceID / StepRate0; // StepRate0 can be set to 1

Ported from RadeonSI.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-01-08 21:30:01 +01:00
Samuel Pitoiset
be16bbe1d3 radv: avoid PS partial flushes when viewports/scissors don't change
For Vega10 and Raven that need a special workaround for the
scissor bug.

This seems to give a minor boost for Talos and Dota 2, at least.

To reduce the cost of memcmp, the driver checks if it's
really useful to do the comparison.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-01-08 21:24:58 +01:00
Samuel Pitoiset
b09b3f8834 radv: add has_scissor_bug for Vega10 and Raven
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-01-08 21:24:56 +01:00
Samuel Pitoiset
b462ceb482 radv/gfx9: do not load VGPR1 when GS uses points or lines
VGPR1 is only needed for topology that needs 3 offsets like
triangles or quads.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-01-08 21:24:53 +01:00
Samuel Pitoiset
a3c2a86757 radv: make shader BOs read-only for the GPU
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-01-08 21:24:51 +01:00
Samuel Pitoiset
6e3459eaf4 radv: make descriptor BOs read-only for the GPU
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-01-08 21:24:49 +01:00
Samuel Pitoiset
e4f2ad403f radv: make the indirect GFX config BO read-only for the GPU
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-01-08 21:24:47 +01:00
Samuel Pitoiset
0e84fc2e2b radv/winsys: make IBs read-only for the GPU
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-01-08 21:24:45 +01:00
Samuel Pitoiset
a3aaa03624 radv/winsys: add RADEON_FLAG_READ_ONLY
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-01-08 21:24:43 +01:00
Samuel Pitoiset
2dab5e96ec radv/winsys: rework radv_amdgpu_bo_va_op()
Needed for the following commit.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-01-08 21:24:41 +01:00
Marek Olšák
a140aeb619 ac: add ac_build_fmin/fmax helpers
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-01-06 09:51:43 +01:00
Samuel Pitoiset
87efa71001 radv: remove unused radv_color_buffer_info::cb_clear_valueX
Found by inspection.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-01-05 17:26:51 +01:00
Samuel Pitoiset
ec63ab39be radv: enable denorms for 64-bit and 16-bit floats
Similar to RadeonSI.

This fixes:
dEQP-VK.image.texel_view_compatible.graphic.basic.attachment_read.bc*r16g16b16a16_sfloat
dEQP-VK.image.extended_usage_bit.attachment_write.r16_sfloat

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-01-05 09:51:33 +01:00
Samuel Pitoiset
7643c71527 amd/common: correctly detect if we need ring buffers
When allocate_user_sgprs() was called, ctx->stage was actually
unset and 0 is for the vertex shader. This doesn't change
anything for now because of the spill support thing.

Though, the number of user SGPRs has to be fixed for merged
shaders on GFX9. It was broken before anyway.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-01-05 09:49:51 +01:00
Samuel Pitoiset
50cfad0298 amd/common: use ac_image_load when lod is zero
This might decrease VGPR spilling, because we no longer
have to use v4i32 for 2D fetches when level == 0. We now
use v2i32 for those cases.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-01-05 09:49:45 +01:00
Samuel Pitoiset
85769759bf radv: limit the scissor bug workaround to Vega 10 and Raven
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-01-05 09:47:49 +01:00
Timothy Arceri
4a0c24f2dd ac: rework ac_llvm_extract_elem()
Simplifies the logic a little and asserts index is 0.

Suggested-by: Nicolai Hähnle <nhaehnle@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-01-05 12:20:38 +11:00
Timothy Arceri
14adf7853a ac/radeonsi: add load_tess_coord() to the abi
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-01-05 11:58:55 +11:00
Timothy Arceri
9e1a3caf32 ac/radeonsi: add tcs_rel_ids to the abi
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-01-05 11:58:55 +11:00
Timothy Arceri
f93740efc1 ac: add {tcs,tes}_patch_id to the abi
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-01-05 11:58:55 +11:00
Timothy Arceri
b99ebaa4fd ac: move some helpers to ac_llvm_build.c
We will call these from the radeonsi NIR backend.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-01-05 11:58:55 +11:00
Timothy Arceri
2deb822075 ac: add store_tcs_outputs() to the abi
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-01-05 11:58:55 +11:00
Timothy Arceri
b104e7e172 ac: call load_tcs_input() via the abi
This also enables some code sharing with tes.

V2: drop type param and just use ctx->i32

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-01-05 11:58:55 +11:00
Timothy Arceri
b09a3196e0 ac: add load_tes_inputs() to the abi
V2: drop type param and just use ctx->i32

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-01-05 11:58:55 +11:00
Bas Nieuwenhuizen
76daa30e4a radv: Use correct flush bits for flushing L2 during CB/DB flushes.
Copied from radeonsi.

Putting in the correct metadata flush commands for eventually not
flushing L2 on CB/DB switch.

Does not remove the need for V_028A90_CACHE_FLUSH_AND_INV_TS_EVENT
at the moment.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-01-04 19:35:36 +01:00
Bas Nieuwenhuizen
f2c9f13ec2 radv: Invalidate L1 for VK_ACCESS_VERTEX_ATTRIBUTE_READ_BIT.
These are just shaders reads, so we need to invalidate L1.

Fixes: 6dbb0eaccc "radv: handle subpass cache flushes"
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-01-04 19:35:36 +01:00
Samuel Pitoiset
2670ebb584 radv/gfx9: reduce the number of input VGPRs for the GS stage
This can still be improved, but let's start with this.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-01-04 18:43:25 +01:00
Samuel Pitoiset
a4d2782664 amd/common: scan if gl_PrimitiveID is used before translating to LLVM
It makes more sense to move all scan stuff in the same place.
Also, we don't really need to duplicate the uses_primid field
for each stages.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-01-04 18:43:09 +01:00
Samuel Pitoiset
3b2cb2f99a amd/common: scan if gl_InvocationID is used
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-01-04 18:43:07 +01:00
Bas Nieuwenhuizen
79724c89f8 ac: rename has_sync_file to has_fence_to_handle.
sync_files are in linux since 4.7, while the amdgpu fence_to_handle
ioctl is only in 4.15.

In particular we don't need it for sync_file in radv, because
everything happens via syncobjs, which got support earlier than
fence_to_handle.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-01-04 01:12:09 +01:00
Bas Nieuwenhuizen
c99426ea83 ac/nir: Handle loading data from compact arrays.
Fixes: f4e499ec79 "radv: add initial non-conformant radv vulkan driver"
Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-01-04 00:14:23 +01:00
Bas Nieuwenhuizen
1c78e4f053 radv: Allow writing 0 scissors.
When rasterization is disabled we can have that few.

Fixes: 76603aa90b "radv: Drop the default viewport when 0 viewports are given."
Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-01-04 00:14:19 +01:00
Bas Nieuwenhuizen
5158603182 radv: Use correct HTILE expanded words.
Seems like users are actually hitting 0xFFFFFFFF actually making
things broken for them, and the mad max regression is fixed, so
lets put this in once more.

v2: Use 0xf for depth-only htile. (Dave)

Fixes: af2844116f "radv: Revert HTILE reset word to 0xFFFFFFFF."
Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-01-04 00:14:03 +01:00
Marek Olšák
4f19cc82f9 ac: rename has_syncobj_wait -> has_syncobj_wait_for_submit
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-01-04 00:07:45 +01:00
Bas Nieuwenhuizen
6a36bfc64d radv: Implement binning on GFX9.
Overall it does not really help or hurt. The deferred demo gets 1%
improvement and some games a 3% decrease, so I don't think this
should be enabled by default.

But with the code upstream it is easier to experiment with it.

v2: Remove initializing the registers from si_emit_config.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-12-31 15:07:07 +01:00
Bas Nieuwenhuizen
b0d17270ad radv: Add flag for enabling binning.
Letting it be disabled by default.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-12-31 13:47:51 +01:00
Bas Nieuwenhuizen
b0a6fd0274 radv: Also set DCC params for sampling for input attachment usage.
Those are implemented as texture sampling, so we need to make the
texture TC-compatible too.

Fixes: 34d23e82ca "radv: set some dcc parameters depending on if texture will be sampled"
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
2017-12-29 23:42:30 +01:00
Bas Nieuwenhuizen
ab957243e1 radv: Enable DCC with transfers.
Before this DCC was in practice disabled for most games. This
enables practical DCC use. Expect a 5-10% perf increase on a
bunch of games on vega @ 4k.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-12-29 12:22:02 +01:00
Bas Nieuwenhuizen
eb9a4c3464 radv: Decompress copy destination if formats are incompatible.
If both source and destination are DCC compressed, and their formats
are not compatible, we need to decompress one of them to make
sure we can do reinterpretation (which needs src format == dst format)
.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-12-29 12:21:58 +01:00
Bas Nieuwenhuizen
44fcf58744 radv: Disable DCC for GENERAL layout and compute transfer dest.
Apps can use this for render feedback loops, where things are
defined if they render each pixel only once. However, DCC fails
here, as the level of coherence is a block not a pixel, so disable it.

This is also going to help implementing other stuff.

Even if we optimize this later to only happen if there actually is
a loop (if possible at all ...), then the machinery is still useful
to exclude images accessible by the SDMA queue when that is implemented.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-12-29 12:21:53 +01:00