Commit graph

7942 commits

Author SHA1 Message Date
James Park
a64b36ecaf ac/surface: Move drm_fourcc.h to common header
Useful for including from RADV without copy/paste.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9709>
2021-04-22 08:16:11 +00:00
Rhys Perry
5760386654 radv: only set robust_modes if robustBufferAccess2 is enabled
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10384>
2021-04-21 20:26:58 +00:00
Rhys Perry
8408d0312f radv: improve vectorization callback for small bit sizes
More accurately reflect the hardware's capabilities for byte and short
aligned VMEM operations.

fossil-db (GFX10.3):
Totals from 65 (0.05% of 139391) affected shaders:
SGPRs: 4296 -> 4200 (-2.23%)
CodeSize: 1000984 -> 1000368 (-0.06%); split: -0.13%, +0.07%
Instrs: 177504 -> 177380 (-0.07%); split: -0.17%, +0.10%
Cycles: 36820596 -> 36812792 (-0.02%); split: -0.15%, +0.13%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10384>
2021-04-21 20:26:58 +00:00
Rhys Perry
776ba40115 aco: add and use Program::progress
This is used when printing the program and to avoid updating register
demand during post-RA liveness analysis.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10315>
2021-04-21 11:09:33 +00:00
Rhys Perry
2d36232e62 aco: allow SDWA sels smaller than the operand size
p_extract_vector copy-propagation can create byte sels for v2b operands.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10315>
2021-04-21 11:09:33 +00:00
Rhys Perry
655ba1e3a9 aco: don't update register demand during RA validation
It isn't intended to be accurate after RA, so num_waves can become zero,
breaking the sgpr_limit calculation.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10315>
2021-04-21 11:09:33 +00:00
Samuel Pitoiset
1cf39001cd radv: allow concurrent MSAA images to be FMASK compressed
DCC decompress/FMASK expand are supported on compute queues. Since
the driver doesn't perform fast clears with concurrent images, we
don't need to perform a FCE on compute (we can't anyways).

This fixes a performance regression with Control and
VKD3D_CONFIG=multi_queue.

One more optimization (as discussed with Bas) is to implement FCE
on compute to allow fast clears.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10323>
2021-04-21 08:19:14 +00:00
Rhys Perry
0eaa5dfac0 aco: remove image parameter from get_sampler_desc()
We can just check whether tex_instr is NULL instead.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10036>
2021-04-20 17:42:21 +00:00
Rhys Perry
6a7b89c89d ac/nir: set TRUNC_COORD=0 for nir_texop_tg4
Fixes black squares in Assassin's Creed: Valhalla and rendering of
FidelityFX-CACAO demo.

shader-db (sienna cichlid):
Totals:
SGPRS: 2977068 -> 2977220 (0.01 %)
VGPRS: 1929624 -> 1929616 (-0.00 %)
Spilled SGPRs: 5769 -> 5769 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 61423092 -> 61424672 (0.00 %) bytes
Max Waves: 895765 -> 895766 (0.00 %)

Totals from affected shaders:
SGPRS: 9520 -> 9672 (1.60 %)
VGPRS: 7464 -> 7456 (-0.11 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 527432 -> 529012 (0.30 %) bytes
Max Waves: 1819 -> 1820 (0.05 %)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Fixes: 58f25098a0 ("radv: Use TRUNC_COORD on samplers")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10036>
2021-04-20 17:42:21 +00:00
Rhys Perry
3cbe9894f7 aco: set TRUNC_COORD=0 for nir_texop_tg4
Fixes black squares in Assassin's Creed: Valhalla and rendering of
FidelityFX-CACAO demo.

fossil-db (sienna cichlid):
Totals from 3052 (2.09% of 146267) affected shaders:
SpillSGPRs: 8437 -> 8646 (+2.48%)
CodeSize: 30993832 -> 31116916 (+0.40%); split: -0.00%, +0.40%
Instrs: 5869934 -> 5886783 (+0.29%); split: -0.00%, +0.29%
Latency: 250330521 -> 250463770 (+0.05%); split: -0.00%, +0.05%
InvThroughput: 59797617 -> 59814584 (+0.03%); split: -0.00%, +0.03%
VClause: 92114 -> 92132 (+0.02%)
SClause: 197373 -> 197338 (-0.02%); split: -0.02%, +0.01%
Copies: 479482 -> 482394 (+0.61%); split: -0.01%, +0.61%
Branches: 219629 -> 219635 (+0.00%)
PreSGPRs: 248970 -> 249366 (+0.16%)

fossil-db (polaris10):
Totals from 3050 (2.06% of 147787) affected shaders:
SGPRs: 282864 -> 282912 (+0.02%); split: -0.01%, +0.02%
VGPRs: 242572 -> 242612 (+0.02%)
SpillSGPRs: 10387 -> 10675 (+2.77%)
CodeSize: 31872460 -> 31996128 (+0.39%)
MaxWaves: 10924 -> 10925 (+0.01%)
Instrs: 6222217 -> 6239072 (+0.27%)
Latency: 317482545 -> 317773685 (+0.09%); split: -0.00%, +0.09%
InvThroughput: 156149624 -> 156242072 (+0.06%); split: -0.00%, +0.06%
VClause: 92295 -> 92254 (-0.04%); split: -0.05%, +0.01%
SClause: 243342 -> 243321 (-0.01%); split: -0.01%, +0.00%
Copies: 678902 -> 681700 (+0.41%); split: -0.00%, +0.41%
Branches: 219698 -> 219703 (+0.00%)
PreSGPRs: 244251 -> 244644 (+0.16%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Fixes: 58f25098a0 ("radv: Use TRUNC_COORD on samplers")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3110
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10036>
2021-04-20 17:42:21 +00:00
Samuel Pitoiset
6aaa325f89 radv: remove radv_image_iview::multiplane_planes
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10218>
2021-04-20 12:16:33 +00:00
Samuel Pitoiset
8198aeac8d radv: remove radv_image_iview::bo
This saves one 64-bit pointer.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10218>
2021-04-20 12:16:33 +00:00
Bas Nieuwenhuizen
9da4590df8 amd/common: Use cap to test kernel modifier support.
Turns out both kernel v5.10 and v5.11 have the same amdgpu driver
version and only one has modifiers ... In addition the version check
is kinda annoying for backports.

So lets use the cap. Since the cap is technically about ADDFB2 I
tested that this works on rendernodes (and reading the code there
is no distinction from what kind of node this is called).

Fixes: 9a937330ef ("radeonsi: Only set modifier creation function for GFX9+ & with kernel support.")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10337>
2021-04-20 11:59:43 +00:00
Samuel Pitoiset
1d3542694b radv: fix emitting depth bias when beginning a command buffer
If depth bias is enabled but zero values used, they were never
emitted to the command buffer because they are equal to the default
values.

Previously, they were always emitted when the bound DS attachment
changed.

This should fix some sort of Z fighting with Dota2 on all GPUs.
This also fixes a different issue (ie. some occlusion queries failures)
on GFX6 because CLEAR_STATE is not used on that chip.

Fixes: 8a47422d97 ("radv: do not scale the depth bias for D16_UNORM depth surfaces")
Cc: 21.1 mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10310>
2021-04-20 09:09:38 +00:00
Samuel Pitoiset
e4c0724dc6 radv: fix fast clearing depth-only or stencil-only aspects with HTILE
DB isn't coherent with L2 on GFX6-8. This is needed when the
clear HTILE mask path is selected.

This fixes an issue with avatars in Heroes of The Storm.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3362
Cc: 21.1 mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10277>
2021-04-20 08:20:49 +00:00
Bas Nieuwenhuizen
a144fa608d radv: Fix memory leak on descriptor pool reset with layout_size=0.
Gotta track those sets too to free them. Alse changed the search
on destroy to check for set instead of offset since offset is not
necessarily unique anymore.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4652
CC: mesa-stable
Reviewed-by: Joshua Ashton <joshua@froggi.es>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10317>
2021-04-19 19:19:58 +00:00
Samuel Pitoiset
9434675d60 aco: fix opquantize2f16 on GFX6-7
Make sure to preserve signed zeroes.

Fixes dEQP-VK.spirv_assembly.instruction.compute.opquantize.flush_to_zero
on GFX6 (Pitcairn). Untested on GFX7.

Fixes: 54a09545ec ("aco: optimize a*0.0")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10319>
2021-04-19 16:33:37 +00:00
Marek Olšák
ec1ddb976a amd/registers: rename IMG_FORMAT to GFX10_FORMAT to disambiguate the meaning
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10261>
2021-04-17 02:37:49 +00:00
Marek Olšák
3e0ce4af4f amd/registers: clean up gfx103.json
because gfx103.json is automatically generated and can't be changed
manually. This fixes the file generator without changing the generated
header.

Missing registers must be in registers-manually-defined.json, and
missing fields must be in parse_kernel_headers.py.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10261>
2021-04-17 02:37:49 +00:00
Marek Olšák
a142925b7a amd/registers: fix the kernel header parser with latest headers
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10261>
2021-04-17 02:37:49 +00:00
Rhys Perry
86d903e88d radv: fix clearing DCC-compressed e5b9g9r9 images
Fixes
dEQP-VK.api.image_clearing.core.clear_color_image.2d.optimal.single_layer.e5b9g9r9_ufloat_pack32_33x128
with RADV_DEBUG=forcecompress on GFX10.3.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: 21.1 <mesa-stable>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10176>
2021-04-16 11:56:32 +01:00
Samuel Pitoiset
66e1b42d06 radv: keep DCC compressed for clears on compute with image stores
Without image stores, DCC is always decompressed on compute.

Cc: 21.1 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10168>
2021-04-16 09:47:52 +00:00
Marek Olšák
84895dba7f amd: remove some references to older LLVM versions in comments
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10199>
2021-04-16 09:25:19 +00:00
Marek Olšák
b878444c3a amd: drop support for LLVM 10
It doesn't support RDNA 2.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10199>
2021-04-16 09:25:19 +00:00
Marek Olšák
2747332723 amd: drop support for LLVM 9
This would be easy to support except that it doesn't support RDNA 2.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10199>
2021-04-16 09:25:19 +00:00
Samuel Pitoiset
936b58378c amd: drop support for LLVM 8
It doesn't support Navi1x and the removal enables this nice code cleanup.

v2: rebase - mareko

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> (v1)
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10199>
2021-04-16 09:25:19 +00:00
Michel Dänzer
d200f45875 Use explicit break instead of fall-through to break-only case
clang generates a warning if there's no explicit break or fall-through
annotation. The latter would be kind of silly in this case, and not
robust against any future changes turning the fall-through invalid.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10220>
2021-04-15 16:01:22 +00:00
Michel Dänzer
2928c21eb7 Convert most remaining free-form fall-through comments to FALLTHROUGH
One exception is src/amd/addrlib/, for which -Wimplicit-fallthrough is
explicitly disabled.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10220>
2021-04-15 16:01:22 +00:00
Rhys Perry
ec70882238 radv: fix barrier in radv_decompress_dcc_compute shader
ACO doesn't create a waitcnt for barriers between texture samples and
image stores because texture samples are supposed to use read-only
memory. It could also schedule the barrier to above the texture sample.
We also have use a larger memory scope to avoid an ACO optimization.

Tested on GFX8 with Sachsa Willems deferred sample. With some DCC
decompressions and the compute path forced.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: 21.1 <mesa-stable>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9496>
2021-04-15 12:02:36 +00:00
Hans-Kristian Arntzen
08fdaec473 radv: Allocate buffer list for MUTABLE descriptor types as well.
Fixes: 86644b84b9 ("radv: Implement VK_VALVE_mutable_descriptor_type.")
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10132>
2021-04-15 11:51:33 +00:00
Hans-Kristian Arntzen
b60bc59180 radv: Take image alignment into account when allocating MUTABLE pool.
Allocating a descriptor set is aligned to 32 bytes, so just like the
other buffer types, bump the descriptor size to 32 bytes when allocating
MUTABLE descriptor types from a pool.

Fixes: 86644b84b9 ("radv: Implement VK_VALVE_mutable_descriptor_type.")
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10132>
2021-04-15 11:51:33 +00:00
Simon Ser
35e25ea1d0 ac/surface: allow non-DCC modifiers for YUV on GFX9+
Accept non-linear tiling for multi-planar formats on GFX9+, as long
as DCC is disabled. DCC support is possible in theory, but untested
for now.

GFX8 is still restricted to linear tiling because it's not yet clear
how modifiers should be handled on these chips for multi-planar
formats. Each plane may need a different modifier.

Signed-off-by: Simon Ser <contact@emersion.fr>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Leo Liu <leo.liu@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10134>
2021-04-15 09:43:17 +00:00
Simon Ser
19378dfe3c ac/surface: use blocksizebits instead of blocksize
util_format_get_blocksize asserts that the blocksize isn't zero.
However the blocksize will be zero if the format's channel encoding
is unspecified. The channel encoding is only meaningful for the
plain u_format layout, so util_format_get_blocksize can't be used
for formats with another layout. For example, YUV formats don't have
the channel encoding specified.

Use util_format_get_blocksizebits, which just returns zero without
an assertion for formats which don't have a channel encoding.

Signed-off-by: Simon Ser <contact@emersion.fr>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10134>
2021-04-15 09:43:17 +00:00
Rhys Perry
5b8a4516e6 aco/ra: remove live-in temporary from live_out_per_block when moving it
Otherwise, handle_loop_phis() might pass it to handle_live_in() and then
we could have two phis for this variable.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: 7c64623e94 ("aco/ra: refactor SSA repairing during register allocation")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10236>
2021-04-14 19:04:08 +00:00
Rhys Perry
11fde1247c aco/ra: use original names when renaming loop carried phi operands
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: 7c64623e94 ("aco/ra: refactor SSA repairing during register allocation")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10236>
2021-04-14 19:04:08 +00:00
Samuel Pitoiset
97e7b21c42 ac: add missing BUF_DATA_FORMAT_10_11_11 vertex format on GFX10+
This format is supported by the driver.

Fixes vertex explosion in Dirt 5.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4635
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10226>
2021-04-14 18:07:41 +00:00
Bas Nieuwenhuizen
8ddbac0377 radv/winsys: Remove use_local_bos
Now that perftest is stored in the winsys.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10198>
2021-04-14 15:16:17 +00:00
Bas Nieuwenhuizen
284bc57a49 radv: Use VRAM cmdbuffers in more situations.
In most games I tested we use 32 MiB of cmdbuffers+cmd upload buffers
at most. Especially since we have mutable descriptors it seems
somewhat unlikely anything else will eat it up so be a bit more
aggressive allocating them in VRAM.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10198>
2021-04-14 15:16:17 +00:00
Bas Nieuwenhuizen
057ec395a4 radv: Refactor cs_domain to be a winsys function.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10198>
2021-04-14 15:16:17 +00:00
Timur Kristóf
f3e004cb56 aco: Add a simple heuristic to decide early or late primitive export.
Late export is theoretically better if used with LATE_ALLOC,
but in practice, the early export has an advantage of
lower register usage, therefore more concurrent waves.

The idea of this commit is that "small" shaders benefit from early
primitive export more, due to being able to launch much more waves.

Let's consider a NIR shader "small" when it has only 1 block.
This yields both better performance, and better stats, than always
using late export.

Fossil DB on Sienna:

Totals from 12807 (8.76% of 146265) affected shaders:
VGPRs: 609128 -> 620216 (+1.82%); split: -0.01%, +1.83%
SpillSGPRs: 1458 -> 1538 (+5.49%)
CodeSize: 37028204 -> 37019320 (-0.02%); split: -0.17%, +0.14%
MaxWaves: 282902 -> 278516 (-1.55%)
Instrs: 7163142 -> 7162925 (-0.00%); split: -0.18%, +0.18%
VClause: 169285 -> 169547 (+0.15%); split: -1.15%, +1.30%
SClause: 267373 -> 267151 (-0.08%); split: -0.24%, +0.16%
Copies: 446442 -> 444567 (-0.42%); split: -2.68%, +2.26%
Branches: 156245 -> 156195 (-0.03%); split: -0.30%, +0.26%
PreSGPRs: 434701 -> 447396 (+2.92%)
PreVGPRs: 527783 -> 540527 (+2.41%)

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10106>
2021-04-14 14:25:10 +00:00
Timur Kristóf
5dbab03a80 aco: Emit fewer branches for NGG VS/TES with late primitive export.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10106>
2021-04-14 14:25:10 +00:00
Timur Kristóf
af7d5f5b86 aco: Set block_kind_export_end in create_vs/fs_exports.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10106>
2021-04-14 14:25:10 +00:00
Timur Kristóf
2b312a4fd7 aco: Extract ngg_nogs_export_prim_id to a separate function.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10106>
2021-04-14 14:25:10 +00:00
Timur Kristóf
231ef14b3d aco: Use s_setprio 3 at the beginning of every VS and TES.
The user-set priority of shaders matters very little, but we hope
this might still help speed up VS input loads especially.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10106>
2021-04-14 14:25:10 +00:00
Timur Kristóf
4c86c7aa15 aco: Remove useless s_setprio near gs_alloc_req.
We learned that the gs_alloc_req is not actually when the export
space allocation happens. So it makes no sense to prioritize it.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10106>
2021-04-14 14:25:10 +00:00
Timur Kristóf
75cd43741a aco: Align NGG scratch size to 16 so a single ds_read can always read it.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10155>
2021-04-14 14:05:24 +00:00
Timur Kristóf
c1346e5c22 aco: Optimize workgroup exclusive scan to better avoid bank conflicts.
Previously, every wave had multiple active lanes read the LDS, and
the data was processed by VALU DPP instructions.

Now, only the first lane reads the LDS in order to avoid bank
conflicts, and the results are processed by SALU.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10155>
2021-04-14 14:05:24 +00:00
Rhys Perry
e3ebc1ca4b radv: fix conditions for running nir_opt_vectorize
No fossil-db changes, probably because all fp16 shaders have at least one
16-bit mov or vec2 somehwere.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10227>
2021-04-14 12:27:06 +00:00
Samuel Pitoiset
e24049da63 radv: advertise attachmentFragmentShadingRate on GFX10.3
Layered VRS attachments is for later.
The CTS failures are similar to the existing ones, I will investigate
soon.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10187>
2021-04-14 09:31:13 +00:00
Samuel Pitoiset
ee77dde396 radv: configure the VRS combiners when an attachment is used
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10187>
2021-04-14 09:31:13 +00:00