Commit graph

7470 commits

Author SHA1 Message Date
Bas Nieuwenhuizen
d51a4b4c4b radv: Add initial CPU BVH building.
The algorithm used for the BVH:

1) first create 1 leaf per primitive (triangle/aabb/instance)
2) Then create internal layers from the bottom up until we are left with
   1 node in the top layer. Node i in the layer will have children
   (i*4+0) ... (i*4+3) in the previous layer.

This results in a very naive algorithm but it is also very simple to implement.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11078>
2021-06-18 22:16:27 +00:00
Bas Nieuwenhuizen
67e949a8f8 radv: Use the global BO list for acceleration structures.
We have nested structures so tracking this from the descriptor
set is going to be a mess.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11078>
2021-06-18 22:16:27 +00:00
Samuel Pitoiset
977355c6e5 radv: fix dynamic culling and depth/stencil related dynamic states
To avoid overwriting previous dynamic state with default state from
the pipeline.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4926
Cc: 21.1 mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11375>
2021-06-18 16:27:57 +00:00
Mike Blumenkrantz
651c6b16ff radv: move pipe_misaligned and l2_coherent image checks to flags set on init
this should save 4-5% cpu in some cases

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11462>
2021-06-18 16:02:26 +00:00
Samuel Pitoiset
60348360a2 radv: create only one pipeline for decompressing depth/stencil images
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11263>
2021-06-18 14:15:30 +02:00
Samuel Pitoiset
213c4c5f44 radv: always decompress both aspects of a depth/stencil image
If compressed rendering is only used for the depth aspect of a
depth/stencil image, stencil might also be compressed and it needs
to be decompressed. This only happens for non-TC compatible images.

As long as the driver needs to decompress the depth aspect, I don't
think that decompressing the stencil aspect introduces extra cost.

Fixes dEQP-VK.renderpass*late_fragment_tests*.d32_sfloat_s8_uint for
chips that don't support TC-compat HTILE.

Cc: 21.1 mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11263>
2021-06-18 14:15:30 +02:00
Samuel Pitoiset
50233d0daa radv: reject binding buffer/image when the device memory is too small
From the Vulkan spec 1.2.181:
    "The difference of the size of memory and memoryOffset must be
     greater than or equal to the size member of the
     VkMemoryRequirements structure returned from a call to
     vkGetImageMemoryRequirements with the same image"

This is invalid usage but adding a check in the driver is safe and
might avoid spurious failures.

This is a workaround for the inventory GPU hang with Cyberpunk 2077
which is actually a game bug. Luckily the game handles this error
gracefully.

Since the addrlib change from March, addrlib now selects a better
swizzle mode (4KB instead of 64KB) which reduces image size. Though,
the game assumes that an image with 2 mips is always smaller than the
same image but with 6 mips. This is not always true if the swizzle mode
is different. Then, it creates a D312 heap that is too small for the 2
mips image and the GPU hang with a memory violation, ugh...

Note that next vkd3d-proton release should also reject this but
fixing both sides is fine.

Cc: 21.1 mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4823
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4593
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11448>
2021-06-18 08:04:29 +00:00
Yiwei Zhang
ec1968dcc9 radv: fix build errors after commit 8b7ff784
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Roman Stratiienko <r.stratiienko@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11373>
2021-06-16 19:55:48 +00:00
Daniel Stone
a8c1155209 ci/bare-metal: Set CPU and GPU governors to max, disable GPU runtime PM
Give us a bit more predictable performance by making sure we always run
at full tilt.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Acked-by: Martin Peres <martin.peres@mupuf.org>
Acked-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11337>
2021-06-15 14:02:44 +02:00
Daniel Stone
0fcb53e8f4 ci/lava: Use HWCI_KERNEL_MODULES to load modules
One fewer difference to bare-metal.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Acked-by: Martin Peres <martin.peres@mupuf.org>
Acked-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11337>
2021-06-15 14:02:44 +02:00
Rhys Perry
bc1c527834 aco/lower_phis: don't allocate unused temporary ids
The excessive number of temporary IDs caused #4872's live-out sets to be
extremely large and expensive to iterate.

With this change, #4872's shader is much faster to compile and uses much
less memory.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4872
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11300>
2021-06-14 16:48:38 +00:00
Rhys Perry
ecc0353af7 aco/lower_phis: fix undef_operands initialization with >32 predecessors
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11300>
2021-06-14 16:48:38 +00:00
Samuel Pitoiset
16d5939ff5 radv: fix dynamic rasterizer discard enable state
If a pipeline enables rasterizerDiscardEnable statically we have to
properly initialize the value, otherwise it won't be updated when a
new pipeline is bound.

Fixes few dEQP-VK.pipeline.extended_dynamic_state.*disable_raster.

Fixes: dd19bf9d7d ("radv: implement dynamic rasterizer discard enable")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11242>
2021-06-14 16:31:14 +00:00
Rhys Perry
d64f5a3f9d aco: move VMEM instructions below descriptor loads
This is to prevent sequences like:
   a = descriptor_load()
   vmem(a)
   b = descriptor_load()
   vmem(b)
and instead create:
   a = descriptor_load()
   b = descriptor_load()
   vmem(a)
   vmem(b)

fossil-db (GFX10.3):
Totals from 114521 (78.30% of 146267) affected shaders:
VGPRs: 4540352 -> 4540216 (-0.00%); split: -0.03%, +0.02%
CodeSize: 289864228 -> 289114652 (-0.26%); split: -0.29%, +0.03%
MaxWaves: 2940234 -> 2940338 (+0.00%); split: +0.00%, -0.00%
Instrs: 55112418 -> 54919910 (-0.35%); split: -0.38%, +0.03%
Latency: 956528393 -> 954682011 (-0.19%); split: -0.24%, +0.05%
InvThroughput: 229280830 -> 229238107 (-0.02%); split: -0.04%, +0.02%
VClause: 1141832 -> 1139002 (-0.25%); split: -0.63%, +0.38%
SClause: 2357840 -> 2225008 (-5.63%); split: -6.01%, +0.38%
Copies: 3316040 -> 3331519 (+0.47%); split: -0.31%, +0.77%
Branches: 1187212 -> 1186919 (-0.02%); split: -0.03%, +0.01%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6489>
2021-06-14 15:47:37 +00:00
Rhys Perry
bc71222cd9 aco: don't move descriptor loads below buffer loads
fossil-db (GFX10.3):
Totals from 52870 (36.15% of 146267) affected shaders:
VGPRs: 2109936 -> 2110056 (+0.01%); split: -0.01%, +0.01%
CodeSize: 134898056 -> 134812748 (-0.06%); split: -0.08%, +0.02%
MaxWaves: 1347354 -> 1347346 (-0.00%)
Instrs: 25598063 -> 25575415 (-0.09%); split: -0.11%, +0.02%
Latency: 432491613 -> 432047723 (-0.10%); split: -0.12%, +0.02%
InvThroughput: 90940977 -> 90927545 (-0.01%); split: -0.03%, +0.01%
VClause: 570039 -> 570019 (-0.00%); split: -0.05%, +0.04%
SClause: 1145076 -> 1139040 (-0.53%); split: -0.60%, +0.07%
Copies: 1513949 -> 1513102 (-0.06%); split: -0.32%, +0.26%
Branches: 524279 -> 524275 (-0.00%); split: -0.03%, +0.03%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6489>
2021-06-14 15:47:37 +00:00
Rhys Perry
f8bf6b9e0a aco/ra: use adjust_max_used_regs() in compact_relocate_vars()
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6489>
2021-06-14 15:47:37 +00:00
Samuel Pitoiset
44e7057304 radv/winsys: remove useless errno.h includes
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11269>
2021-06-14 15:52:48 +02:00
Samuel Pitoiset
ec7f7a7e33 radv/winsys: adjust some error messages
Report the return code from libdrm instead of errno. While we are at it,
fix the function name in radv_amdgpu_wait_timeline_syncobj().

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11269>
2021-06-14 15:52:45 +02:00
Bas Nieuwenhuizen
720ee494e5 radv: Allow DCC images to be compressed with foreign queues.
Otherwise we would always decompress when transitioning to the
foreign queue.

Fixes: 8b9033ad0a ("radv: Support DCC modifiers fully.")
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10802>
2021-06-14 11:20:59 +00:00
Bas Nieuwenhuizen
f44a6c6a54 radv: Actually return correct value for read-only DCC compressedness.
Most stuff that depends on the value wouldn't be triggered anyway but
...

Fixes: b5ecf0748a ("radv: Ensure we never decompress or FCE read-only textures.")
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10802>
2021-06-14 11:20:59 +00:00
Bas Nieuwenhuizen
f7c622307d radv: Don't skip barriers that only change queues.
We depend on the queue mask for some decisions ...

CC: mesa-stable
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10802>
2021-06-14 11:20:59 +00:00
Rhys Perry
1d50ef9ca6 aco: adjust the condition for expanding vertex fetch data format
Instead of avoiding out-of-bounds access, avoid creating a load larger
than the original attribute. This should work just as well, since the only
situations expending a load helped was because we shrunk it first.

Also fixes a bug where a 3 component load (4 components with the first
component skipped) would be incorrectly expanded to 4 components because
the stride check would never be performed. Maybe we should avoid skipping
the first component in some situations, but I'm not sure if it's worth
the VGPR cost.

fossil-db (vega10):
Totals from 583 (0.39% of 149974) affected shaders:
CodeSize: 1496848 -> 1500868 (+0.27%); split: -0.03%, +0.30%
Instrs: 286155 -> 286575 (+0.15%); split: -0.07%, +0.22%
Latency: 2947101 -> 2946865 (-0.01%); split: -0.23%, +0.22%
InvThroughput: 797396 -> 797127 (-0.03%); split: -0.08%, +0.04%

fossil-db (polaris10):
Totals from 583 (0.39% of 151365) affected shaders:
SGPRs: 38880 -> 39216 (+0.86%)
VGPRs: 24440 -> 24356 (-0.34%)
CodeSize: 1506808 -> 1510876 (+0.27%); split: -0.01%, +0.28%
Instrs: 288735 -> 289167 (+0.15%); split: -0.06%, +0.21%
Latency: 2963263 -> 2961884 (-0.05%); split: -0.24%, +0.19%
InvThroughput: 802351 -> 801665 (-0.09%); split: -0.12%, +0.04%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9007>
2021-06-14 09:48:32 +00:00
Rhys Perry
91f8f82806 radv,aco: use all attributes in a binding to obtain an alignment for fetch
Instead of assuming scalar alignment for an attribute, we can use the
required alignment of other attributes in a binding to expect a higher
one.

This uses the alignment of all attributes in the pipeline, not just the
ones loaded. This can create slightly better code, but could break
pipelines which relied on unused (and unaligned) attributes no being
loaded. I don't think such pipelines are allowed by the spec.

fossil-db (Sienna Cichlid):
Totals from 44350 (30.32% of 146267) affected shaders:
VGPRs: 1694464 -> 1700616 (+0.36%); split: -0.08%, +0.44%
CodeSize: 60207184 -> 58093836 (-3.51%); split: -3.51%, +0.00%
MaxWaves: 1175998 -> 1174948 (-0.09%); split: +0.02%, -0.11%
Instrs: 11763444 -> 11458952 (-2.59%); split: -2.60%, +0.01%
Latency: 70679612 -> 67062215 (-5.12%); split: -5.27%, +0.15%
InvThroughput: 11482495 -> 11362911 (-1.04%); split: -1.20%, +0.16%
VClause: 359459 -> 343248 (-4.51%); split: -6.36%, +1.85%
SClause: 422404 -> 419229 (-0.75%); split: -1.17%, +0.42%
Copies: 754384 -> 764368 (+1.32%); split: -1.74%, +3.06%
Branches: 197472 -> 197474 (+0.00%); split: -0.03%, +0.03%
PreVGPRs: 1215348 -> 1215503 (+0.01%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9007>
2021-06-14 09:48:32 +00:00
Daniel Stone
d0e5203855 ci/lava: Use per-job rootfs overlay for environment
Trying to get arbitrary strings suitably quoted for shell, embedded in a
YAML file, processed by Python templating, is like seven bad ideas all
embedded into one big can of bees.

Reuse the same script we use for bare-metal to generate the environment,
tar that up into a per-job overlay which is added to the
inter-pipeline-reusable rootfs built by the container jobs and the
intra-pipeline-reusable overlay built by the build jobs.

@anholt wrote a chunk of this - replacing the $ENV_VARS GitLab CI
variable with a Python loop across the POSIX job environment - in
!11192, but this still had YAML quoting nightmares, and was more
needless duplication between LAVA and bare-metal.

The diff is large and annoying, but is mostly a sed job to get
ENV_VARS="FOO=bar BAZ=quux" into FOO: bar\nBAZ: quux.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Co-authored-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11309>
2021-06-11 12:13:00 +00:00
Daniel Schürmann
bb1c06343d aco/ra: refactor register assignment for vector operands
No functional changes.

Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8764>
2021-06-11 12:35:46 +02:00
Daniel Schürmann
09b99f1b7c aco/ra: refactor affinity coalescing
Also adds v_interp_p2_f32 to the list of
affinity-related instructions.

Totals from 68 (0.05% of 149839) affected shaders (GFX10.3):
CodeSize: 792928 -> 792056 (-0.11%)
Instrs: 152843 -> 152625 (-0.14%)
Latency: 1235353 -> 1235278 (-0.01%)
InvThroughput: 224087 -> 224049 (-0.02%)
Copies: 9218 -> 9000 (-2.36%)

Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8764>
2021-06-11 12:35:31 +02:00
Daniel Schürmann
3a98f484d1 aco/ra: only create phi-affinities for killed operands
If a phi-operand is not killed, it must be copied anyway.
The additional affinity would only overwrite any potential
better affinity that was already created

Totals from 1067 (0.71% of 149839) affected shaders (GFX10.3):
VGPRs: 68072 -> 68064 (-0.01%)
CodeSize: 8252588 -> 8245220 (-0.09%); split: -0.12%, +0.03%
Instrs: 1596146 -> 1593941 (-0.14%); split: -0.16%, +0.02%
Latency: 18828176 -> 18823914 (-0.02%); split: -0.08%, +0.06%
InvThroughput: 3575063 -> 3574787 (-0.01%); split: -0.05%, +0.04%
VClause: 24345 -> 24325 (-0.08%); split: -0.16%, +0.07%
Copies: 88712 -> 87398 (-1.48%); split: -1.77%, +0.29%
Branches: 52067 -> 51364 (-1.35%); split: -1.38%, +0.03%

Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8764>
2021-06-11 12:35:12 +02:00
Georg Lehmann
d3f735a249 ac: Enable 32bit predication on gfx9 with fw feature version 52.
Amdvlk does this as well and it passes the vulkan CTS on renoir.

Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11297>
2021-06-11 06:07:10 +00:00
Georg Lehmann
fc437ef944 ac: Enable 32bit predication on gfx10.
Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11297>
2021-06-11 06:07:10 +00:00
Georg Lehmann
a41ba20cbd ac: Check me_fw_feature for 32bit predication on gfx10.3
Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11297>
2021-06-11 06:07:10 +00:00
Samuel Pitoiset
4026a07e74 radv: fix aligning the image offset by using align64()
This doesn't fix anything known. Found by inspection.

Cc: 21.1 mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11302>
2021-06-11 07:35:32 +02:00
Rhys Perry
6204e17b44 radv: increase maxComputeSharedMemorySize
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11262>
2021-06-10 12:55:53 +00:00
Rhys Perry
9162963f0a aco: fix emit_mbcnt() with a VGPR mask
Found by inspection. Should be possible with nir_intrinsic_mbcnt_amd.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11295>
2021-06-10 11:21:47 +00:00
Timur Kristóf
18337fbcf2 aco: Use as_vgpr for the second source of mbcnt_amd.
Fixes: 1e49018ced
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11292>
2021-06-10 10:13:02 +00:00
Samuel Pitoiset
3a643a9ce1 ci: add expected list of failures for Bonaire (RADV)
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11239>
2021-06-10 09:36:55 +00:00
Samuel Pitoiset
cfe7e81214 radv/winsys: add a small comment explaining the CHAIN bit
Without it the hardware launches an IB2 which might hang in some
rare situations.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11214>
2021-06-10 08:31:11 +02:00
Samuel Pitoiset
a234840e60 radv: do not launch an IB2 for secondary cmdbuf with INDIRECT_MULTI on GFX7
It's illegal to emit DRAW_{INDEX}_INDIRECT_MULTI from an IB2 on GFX7.

PAL applies this workaround for indirect dispatches and also on
GFX8-9 but it doesn't seem needed.

This fixes various GPU hangs on Bonaire (GFX7).

Cc: 21.1 mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11214>
2021-06-10 08:31:08 +02:00
Timur Kristóf
1e49018ced amd: Add extra source to the mbcnt_amd NIR intrinsic.
The v_mbcnt instructions can take an extra source that they add to
the result. This is not exposed in SPIR-V but we now expose it in NIR.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11072>
2021-06-09 16:48:51 +00:00
Timur Kristóf
f6b2db298f ac/nir: Refactor and optimize the repacking sequence.
According to feedback, the terminology with "exclusive scan"
and "reduction" is difficult. Change it to use "repack" instead,
which better fits what this sequence is actually used for.

The new sequence stores only 1 byte / wave to LDS, and uses packed
instructions to produce the results. This has lower latency and
fewer instructions than what we previously had.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11072>
2021-06-09 16:48:51 +00:00
Timur Kristóf
b4e22eb482 aco: Keep VGPR destinations for uniform shared loads when beneficial.
When the result of these loads is only used by cross-lane instructions,
it is beneficial to use a VGPR destination. This is because this allows
to put the s_waitcnt further down, which decreases latency.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11072>
2021-06-09 16:48:51 +00:00
Timur Kristóf
ce141e4c5f aco: Implement byte and lane permute intrinsics.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11072>
2021-06-09 16:48:51 +00:00
Timur Kristóf
5713e059ea aco: Add validation for v_permlane instructions.
Previously there hasn't been any validation for these instructions,
but after shooting myself in the leg with it a few times, I decided
to add the validation now.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11072>
2021-06-09 16:48:51 +00:00
Timur Kristóf
fd6605367d aco: Implement nir_op_sad_u8x4.
Fix up the operand size for v_sad instructions, and implement
the new NIR horizontal add. There is no viable way to do this
in SALU, so let's always use a VGPR destination.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11072>
2021-06-09 16:48:51 +00:00
Timur Kristóf
228169c87c aco: Add note about v_alignbyte in the ISA README.
We tried to use this instruction for a more optimal sequence,
but it turned out that it doesn't exactly work as it was
supposed to. This note is to help others who want to use it.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11072>
2021-06-09 16:48:51 +00:00
Rhys Perry
c129ede523 aco: use ds_read_{u8,u16}_d16
This allows partial writes and writes to the upper half of the destination.

fossil-db (Sienna Cichlid):
Totals from 135 (0.09% of 149839) affected shaders:

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11113>
2021-06-09 12:06:50 +00:00
Rhys Perry
6334d73fc9 aco: don't ever widen 8/16-bit sgpr load_shared
Doesn't seem to create incorrect code, but it is suboptimal.

No fossil-db changes.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11113>
2021-06-09 12:06:50 +00:00
Rhys Perry
d2b9c7e982 radv: improve LDS alignment check for load/store vectorization
Previously, this could vectorize two scalar 16-bit loads into a u8vec4
load.

No fossil-db changes.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11113>
2021-06-09 12:06:50 +00:00
Rhys Perry
4870d7d829 aco: use v1b/v2b for ds_read_u8/ds_read_u16
The p_extract_vector isn't necessary.

For ds_read_u8 and ds_read_u16, we used a 32-bit regclass, but did't load
32 bits, and used dst_hint for vector loads when we shouldn't have.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4863
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11113>
2021-06-09 12:06:50 +00:00
Samuel Pitoiset
2fb436e92a ci: update list of expected failures for Pitcairn/Oland (RADV)
The robustness2 failures were a mistake because they are actually
not supported (no VK_EXT_scalar_block_layout on GFX6).

The sparse related failures are no longer supported since sparse
is only enabled for Polaris10+.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11243>
2021-06-09 11:27:44 +00:00
Samuel Pitoiset
d169dad393 aco: fix emitting literal offsets with SMEM on GFX7
When the offset is negative, reg() isn't 255. Fix this by splitting
SGPR and literal emission. While we are at it, adjust a comment
saying that literals are also accepted on GFX6 which is wrong.

Fixes another batch of robustness tests.

Cc: 21.1 mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11247>
2021-06-09 11:10:38 +00:00