Tested with a modified deferred demo and no regressions in a 1.0.2
mustpass run.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
The EXT values are really large, e.g.
VK_DYNAMIC_STATE_DISCARD_RECTANGLE_EXT = 1000099000, so 1 << value
is not going to fit into a 32-bit mask.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
It makes more sense to rely on nir_intrinsic_load_push_constant
instead of the pipeline layout.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
nir_to_llvm_context will always be NULL for radeonsi so we need
work around this.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Fixes the following piglit tests in radeonsi:
vs-tcs-tes-tessinner-tessouter-inputs-quads.shader_test
vs-tcs-tes-tessinner-tessouter-inputs-tris.shader_test
vs-tes-tessinner-tessouter-inputs-quads.shader_test
vs-tes-tessinner-tessouter-inputs-tris.shader_test
v2: make use of si_shader_io_get_unique_index_patch()
via the helper in the previous patch rather than
shader_io_get_unique_index()
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
VGPR1 = InstanceID / StepRate0; // StepRate0 can be set to 1
Ported from RadeonSI.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
For Vega10 and Raven that need a special workaround for the
scissor bug.
This seems to give a minor boost for Talos and Dota 2, at least.
To reduce the cost of memcmp, the driver checks if it's
really useful to do the comparison.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
VGPR1 is only needed for topology that needs 3 offsets like
triangles or quads.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Similar to RadeonSI.
This fixes:
dEQP-VK.image.texel_view_compatible.graphic.basic.attachment_read.bc*r16g16b16a16_sfloat
dEQP-VK.image.extended_usage_bit.attachment_write.r16_sfloat
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
When allocate_user_sgprs() was called, ctx->stage was actually
unset and 0 is for the vertex shader. This doesn't change
anything for now because of the spill support thing.
Though, the number of user SGPRs has to be fixed for merged
shaders on GFX9. It was broken before anyway.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This might decrease VGPR spilling, because we no longer
have to use v4i32 for 2D fetches when level == 0. We now
use v2i32 for those cases.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Copied from radeonsi.
Putting in the correct metadata flush commands for eventually not
flushing L2 on CB/DB switch.
Does not remove the need for V_028A90_CACHE_FLUSH_AND_INV_TS_EVENT
at the moment.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
These are just shaders reads, so we need to invalidate L1.
Fixes: 6dbb0eaccc "radv: handle subpass cache flushes"
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This can still be improved, but let's start with this.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
It makes more sense to move all scan stuff in the same place.
Also, we don't really need to duplicate the uses_primid field
for each stages.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
sync_files are in linux since 4.7, while the amdgpu fence_to_handle
ioctl is only in 4.15.
In particular we don't need it for sync_file in radv, because
everything happens via syncobjs, which got support earlier than
fence_to_handle.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
When rasterization is disabled we can have that few.
Fixes: 76603aa90b "radv: Drop the default viewport when 0 viewports are given."
Reviewed-by: Dave Airlie <airlied@redhat.com>
Seems like users are actually hitting 0xFFFFFFFF actually making
things broken for them, and the mad max regression is fixed, so
lets put this in once more.
v2: Use 0xf for depth-only htile. (Dave)
Fixes: af2844116f "radv: Revert HTILE reset word to 0xFFFFFFFF."
Reviewed-by: Dave Airlie <airlied@redhat.com>
Overall it does not really help or hurt. The deferred demo gets 1%
improvement and some games a 3% decrease, so I don't think this
should be enabled by default.
But with the code upstream it is easier to experiment with it.
v2: Remove initializing the registers from si_emit_config.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Those are implemented as texture sampling, so we need to make the
texture TC-compatible too.
Fixes: 34d23e82ca "radv: set some dcc parameters depending on if texture will be sampled"
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Before this DCC was in practice disabled for most games. This
enables practical DCC use. Expect a 5-10% perf increase on a
bunch of games on vega @ 4k.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
If both source and destination are DCC compressed, and their formats
are not compatible, we need to decompress one of them to make
sure we can do reinterpretation (which needs src format == dst format)
.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Apps can use this for render feedback loops, where things are
defined if they render each pixel only once. However, DCC fails
here, as the level of coherence is a block not a pixel, so disable it.
This is also going to help implementing other stuff.
Even if we optimize this later to only happen if there actually is
a loop (if possible at all ...), then the machinery is still useful
to exclude images accessible by the SDMA queue when that is implemented.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>