Commit graph

315 commits

Author SHA1 Message Date
Marek Olšák
f8dd2f5bac radeonsi: fold info->indirect conditionals into the last one in draw_vbo
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 17:29:36 +01:00
Marek Olšák
408f9a1584 radeonsi: atomize the scratch buffer state
The update frequency is very low.

Difference: Only account for the size when allocating a new one and when
            starting a new IB, and check for NULL. (v3)

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 17:29:36 +01:00
Marek Olšák
5f99c49008 radeonsi: precompute IA_MULTI_VGT_PARAM values into a table
The perf difference is very small: 0.99% -> 0.40% for the time spent
in si_get_ia_multi_vgt_param when si_draw_vbo is 20%. Pretty much nothing.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 13:27:14 +01:00
Marek Olšák
c78177fc64 radeonsi: move VGT_VERTEX_REUSE_BLOCK_CNTL into shader states for Polaris
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 13:27:14 +01:00
Marek Olšák
ac059f1c23 radeonsi: use a bitmask for looping over dirty PM4 states
also move it to draw_vbo, because it should be 0 in most cases

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 13:27:14 +01:00
Marek Olšák
802fcdc0d2 radeonsi: atomize L2 prefetches
to move the big conditional statement out of draw_vbo

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 13:27:14 +01:00
Marek Olšák
879c73fac8 radeonsi: update dirty_level_mask only after the first draw after FB change
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 13:27:14 +01:00
Marek Olšák
573bf0940a radeonsi: always set the TCL1_ACTION_ENA when invalidating L2
Some CIK-VI docs say this is the default behavior on SI. That doesn't
answer whether it's also the default behavior on CIK-VI.

Cc: 17.0 13.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-23 23:43:38 +01:00
Marek Olšák
cf248929bf radeonsi: use a global dirty mask for shader pointers
Only vertex buffers use a separate bool flag.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-18 19:51:31 +01:00
Bas Nieuwenhuizen
0ef1b4d5b1 ac/debug: Move IB decode to common code.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-01-09 21:43:59 +01:00
Marek Olšák
ece6e1f658 radeonsi: add TC L2 prefetch for shaders and VBO descriptors
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-06 21:05:48 +01:00
Marek Olšák
5871ebd7f1 radeonsi: add HUD queries for cache flush stats
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-06 21:05:48 +01:00
Marek Olšák
a816c7fe07 radeonsi: add a tess+GS hang workaround for VI dGPUs
ported from Vulkan

Cc: 13.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-12-01 02:16:51 +01:00
Marek Olšák
78c4528ae7 radeonsi: apply a tessellation bug workaround for SI
Cc: 13.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-12-01 02:16:51 +01:00
Marek Olšák
72d48fcd8e radeonsi: apply a multi-wave workgroup SPI bug workaround to affected CIK chips
All codepaths are handled except for clover.

Cc: 13.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-12-01 02:16:51 +01:00
Marek Olšák
fa476e0566 radeonsi: fast exit si_emit_derived_tess_state early
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-21 21:44:35 +01:00
Nicolai Hähnle
908f92ad1f radeonsi: generate GS prolog to (partially) fix triangle strip adjacency rotation
Fixes GL45-CTS.geometry_shader.adjacency.adjacency_indiced_triangle_strip and
others.

This leaves the case of triangle strips with adjacency and primitive restarts
open. It seems that the only thing that cares about that is a piglit test.
Fixing this efficiently would be really involved, and I don't want to use the
hammer of degrading to software handling of indices because there may well
be software that uses this draw mode (without caring about the precise
rotation of triangles).

v2:
- skip the GS prolog entirely if workaround is not needed
- only check for TES (TES is always non-null when tessellation is used)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 10:11:24 +01:00
Marek Olšák
dc6bbe2dd0 gallium/radeon: use r600_gfx_write_event_eop everywhere
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-26 13:02:58 +02:00
Marek Olšák
d4d9ec55c5 radeonsi: implement TC-compatible HTILE
so that decompress blits aren't needed and depth texturing needs less
memory bandwidth.

Z16 and Z24 are promoted to Z32_FLOAT by the driver, because TC-compatible
HTILE only supports Z32_FLOAT. This doubles memory footprint for Z16.
The format promotion is not visible to state trackers.

This is part of TC-compatible renderbuffer compression, which has 3 parts:
DCC, HTILE, FMASK. Only TC-compatible FMASK compression is missing now.

I don't see a measurable increase in performance though.

(I tested Talos Principle and DiRT: Showdown, the latter is improved by
 0.5%, which is almost noise, and it originally used layered Z16,
 so at least we know that Z16 promoted to Z32F isn't slower now)

Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-13 19:00:51 +02:00
Marek Olšák
40e1f7e09b radeonsi: use TC write-back instead of full cache invalidation
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-12 18:29:40 +02:00
Marek Olšák
8cdce30cc2 radeonsi: implement TC L2 write-back (flush) without cache invalidation
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-12 18:29:40 +02:00
Marek Olšák
8c6ea5a6ff radeonsi: remove unnecessary #includes
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:12:07 +02:00
Marek Olšák
82e51e8188 radeonsi: separate IA_MULTI_VGT_PARAM and VGT_PRIMITIVE_TYPE emission
We want to emit IA_MULTI_VGT_PARAM less often because it's a context reg.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:11:56 +02:00
Marek Olšák
3ee9be42ac radeonsi: move VGT_LS_HS_CONFIG to derived tess_state
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:11:53 +02:00
James Legg
e33f31d61f radeonsi: Fix primitive restart when index changes
If primitive restart is enabled for two consecutive draws which use
different primitive restart indices, then the first draw's primitive
restart index was incorrectly used for the second draw.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98025

Cc: 11.1 11.2 12.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2016-10-04 15:57:37 +02:00
Marek Olšák
a5a2cc530c radeonsi: fix the VGT performance tweak for small instances
Based on the VGT spec.

The Vulkan driver doesn't do it optimally and they plan to fix it.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-09 22:45:06 +02:00
Marek Olšák
a67d81580b radeonsi: remove the cache_flush atom
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-09 22:45:06 +02:00
Marek Olšák
fe40a65fb6 radeonsi: skip redundant INDEX_TYPE writes
Ported from Vulkan.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-07 11:13:13 +02:00
Marek Olšák
bdf767dac4 radeonsi: add more unlikely() uses into si_draw_vbo
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-07 11:13:13 +02:00
Marek Olšák
a8e7ea6abc radeonsi: skip draws with instance_count == 0
loosely ported from Vulkan

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-07 11:13:13 +02:00
Marek Olšák
22cb5aecbe radeonsi: fix variable naming in si_emit_cache_flush
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-05 18:01:15 +02:00
Marek Olšák
911202817d radeonsi: don't emit CS_PARTIAL_FLUSH if compute is not used
for less noise in the HUD

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-05 18:01:15 +02:00
Marek Olšák
addca75f4e radeonsi: add HUD queries for counting VS/PS/CS partial flushes
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-05 18:01:15 +02:00
Marek Olšák
1469c70c2a radeonsi: fix a badly implemented GS bug workaround
Limit it to geometry shaders and Hawaii.

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-05 18:01:15 +02:00
Nicolai Hähnle
6d7177f01b radeonsi: program additional multi draw parameters
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-09 15:56:04 +02:00
Nicolai Hähnle
b6c71d37c7 radeonsi: program the DRAWID SGPR
Note that for indirect draws, the new MULTI firmware packets are required.

There's also no need to reset last_{start_instance,sh_base_reg}, since
resetting last_base_vertex is sufficient.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-09 15:56:04 +02:00
Nicolai Hähnle
d34292a77f radeonsi: remove an incorrect assertion
Byte indices don't need any alignment, so remove this assertion (it got moved
into a path where a piglit test hit it during the refactoring of
commit 64ff23a58c).

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-09 15:56:03 +02:00
Nicolai Hähnle
2852dedaa0 radeonsi: flush TC L2 cache for indirect draw data
This fixes a bug when indirect draw data is generated by transform
feedback.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-09 15:56:03 +02:00
Nicolai Hähnle
96bbb620a5 radeonsi: add has_draw_indirect_multi flag
Prefer to use DRAW_(INDEX)_INDIRECT_MULTI when available in the firmware.

Versions for SI and CI already added as provided by the firmware team, but
keep in mind that they won't currently be used since the radeon kernel module
has no interface to query the firmware version.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-08 12:53:06 +02:00
Nicolai Hähnle
5c343cce0f radeonsi: transpose indirect/index draw dispatch
This allows better code sharing for indirect draw calls.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-08 12:53:04 +02:00
Nicolai Hähnle
64ff23a58c radeonsi: move index buffer calculations in si_emit_draw_packets up
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-08 12:53:02 +02:00
Nicolai Hähnle
cf7d18b75c radeonsi: unify emitting PKT3_SET_BASE for indirect draws
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-08 12:52:59 +02:00
Marek Olšák
d82cfab84c radeonsi: deal with high vertex buffer memory usage correctly
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-06 13:56:14 +02:00
Marek Olšák
c56ecb68e7 radeonsi: take scratch buffer and draw indirect memory usage into account
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-06 13:56:14 +02:00
Marek Olšák
c15a9dec29 radeonsi: skip unnecessary si_update_shaders calls
Small decrease in draw call overhead.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-03 17:46:46 +02:00
Nicolai Hähnle
6f73c7595f radeonsi: remove the DRAW_PREAMBLE packet
According to firmware guys, the new sequence that we added for Polaris should
work on all CIK parts, and should actually be faster on some parts.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-07-16 13:02:37 +02:00
Marek Olšák
49e3c74cdd gallium/radeon: add a heuristic enabling DCC for scanout surfaces (v2)
DCC for displayable surfaces is allocated in a separate buffer and is
enabled or disabled based on PS invocations from 2 frames ago (to let
queries go idle) and the number of slow clears from the current frame.

At least an equivalent of 5 fullscreen draws or slow clears must be done
to enable DCC. (PS invocations / (width * height) + num_slow_clears >= 5)

Pipeline statistic queries are always active if a color buffer that can
have separate DCC is bound, even if separate DCC is disabled. That means
the window color buffer is always monitored and DCC is enabled only when
the situation is right.

The tracking of per-texture queries in r600_common_context is quite ugly,
but I don't see a better way.

The first fast clear always enables DCC. DCC decompression can disable it.
A later fast clear can enable it again. Enable/disable typically happens
only once per frame.

The impact is expected to be negligible because games usually don't have
a high level of overdraw. DCC usually activates when too much blending
is happening (smoke rendering) or when testing glClear performance and
CMASK isn't supported (Stoney).

v2: rename stuff, add assertions

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-29 20:12:00 +02:00
Marek Olšák
eff81cbc81 radeonsi: enable distributed tess on multi-SE parts only
ported from Vulkan

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-29 16:34:22 +02:00
Marek Olšák
dd56d04568 radeonsi: set optimal VGT_HS_OFFCHIP_PARAM
ported from Vulkan

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-29 16:34:22 +02:00
Marek Olšák
4b11ef23b4 radeonsi: use conformant line rasterization
AA lines are not completely correct (see TODO), but everything else
should be.

+ 3 linestipple piglits

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-29 16:34:22 +02:00