Commit graph

2399 commits

Author SHA1 Message Date
Alex Smith
7ca0167ae9 radv: Consolidate GFX9 merged shader lookup logic
This was being handled in a few different places, consolidate it into a
single radv_get_shader() function.

Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Cc: "18.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-01 08:53:31 +01:00
Alex Smith
0fa51bfdbe radv: Set active_stages the same whether or not shaders were cached
With GFX9 merged shaders, active_stages would be set to the original
stages specified if shaders were not cached, but to the stages still
present after merging if they were.

Be consistent and use the original stages.

Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Cc: "18.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-01 08:53:01 +01:00
Bas Nieuwenhuizen
b9fb2c266a radv: Add startup debug option.
This adds a RADV_DEBUG=startup option to dump more info about
instance creation and device enumeration.

A common question end users have is why the direver is not loading
for them, and this has two common reasons:
1) They did not install the driver.
2) AMDGPU is not used for the card in the kernel.

This adds some info messages so we can easily get a some useful
output from end users.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-05-31 11:51:23 +02:00
Bas Nieuwenhuizen
38933c1151 radv: Add option to print errors even in optimized builds.
Errors are not that common of a case so we can eat a slight perf
hit in having to call a function and do a runtime check.

In turn this makes debugging random errors happening for end users
easier, because they don't have to have a debug build on hand.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-05-31 11:51:23 +02:00
Bas Nieuwenhuizen
729f7373de radv: Make the sem_info allocate/free functions static.
They are only used in 1 file.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-05-31 11:51:23 +02:00
Bas Nieuwenhuizen
c2799574eb radv: Only expose subgroup shuffles on VI+.
The current implementation depends on bpermute, which
is VI+.

Fixes: f2c6a55061 "radv: enable subgroup capabilities"
Reviewed-by: Daniel Schürmann <daniel.schuermann@campus.tu-berlin.de>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-05-30 13:49:46 +02:00
Samuel Pitoiset
02c7916298 radv: fix emitting descriptor pointers with LLVM < 7
This was terribly wrong, I forced use of 32-bit pointers when
emitting shader descriptor pointers. This fixes GPU hangs with
LLVM 5&6 because 32-bit pointers are only supported with LLVM 7.

Fixes: 88d1ed0f81 ("radv: emit shader descriptor pointers consecutively")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-30 11:38:54 +02:00
Samuel Pitoiset
88d1ed0f81 radv: emit shader descriptor pointers consecutively
This reduces the number of SET_SH_REG packets which are emitted
for applications that use more than one descriptor set per stage.

We should be able to emit more SET_SH_REG packets consecutively
(like push constants and vertex buffers for the vertex stage),
but this will be improved later.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-29 10:07:18 +02:00
Samuel Pitoiset
21baf33a94 radv: allow radv_emit_shader_pointer_head() to emit more pointers
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-29 10:07:16 +02:00
Samuel Pitoiset
288fe7ec71 radv: split radv_emit_shader_pointer()
This will allow to emit consecutive shader pointers for
reducing the number of emitted SET_SH_REG packets, which
is recommended.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-29 10:07:13 +02:00
Bas Nieuwenhuizen
a29bc043ae radv: Implement VK_KHR_draw_indirect_count.
Literally the same as the AMD ext.

Passes *indirect_draw_count* CTS tests.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-05-28 12:08:26 +02:00
Bas Nieuwenhuizen
6914d5a2c0 radv: Implement alternate GFX9 scissor workaround.
This improves dota2 performance for me by 11% when I force the
GPU DPM level to low (otherwise dota2 is CPU limited for 4k on my
threadripper), which should be a large part of the radv-amdvlk gap.
(For me with that was radv 60.3 -> 66.6, while AMDVLK does about 68
fps)

It looks like dota2 rendered the GUI with a bunch of draws with
a SetScissors before almost each draw, causing a lot of pipeline
stalls.

I'm not really happy with the duplication of code, but overriding
radeon_set_context_reg would also be messy since we have the
pre-recorded pipelines and a bunch of si_cmd_buffer code, as well
as some memory->context reg loads for which things would be more
complicated.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-05-28 12:04:25 +02:00
Samuel Pitoiset
45eb24fedf radv: run the EarlyCSEMemSSA LLVM pass
It's recommended by the instruction combining pass, and
RadeonSI also runs it.

This pass used to segfault with one shader of F12017 in the
past, but it no longer crashes. Maybe the LLVM IR generated
by RADV has changed.

Polaris10:
Totals from affected shaders:
SGPRS: 441352 -> 441648 (0.07 %)
VGPRS: 310888 -> 300784 (-3.25 %)
Spilled SGPRs: 13576 -> 12983 (-4.37 %)
Code Size: 22560328 -> 22420544 (-0.62 %) bytes
Max Waves: 40755 -> 41366 (1.50 %)

Vega10:
Totals from affected shaders:
SGPRS: 442848 -> 442000 (-0.19 %)
VGPRS: 310396 -> 300460 (-3.20 %)
Spilled SGPRs: 13708 -> 12906 (-5.85 %)
Code Size: 22479428 -> 22336216 (-0.64 %) bytes
Max Waves: 45783 -> 46506 (1.58 %)

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-25 14:24:14 +02:00
Samuel Pitoiset
66e38654c9 radv: fix dumping compute shader on the graphics queue
The graphics pipeline can be NULL.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-25 11:58:07 +02:00
Samuel Pitoiset
de06dfa9ea radv: add radv_dump_pipeline_state() helper
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-25 11:58:05 +02:00
Samuel Pitoiset
6f0530ecfe radv: rework how shaders are dumped when generating a hang report
Use a flag for the active stages instead.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-25 11:58:03 +02:00
Samuel Pitoiset
8c406f0b4d radv: remove unused parameter in radv_dump_annotated_shader()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-25 11:57:59 +02:00
Marek Olšák
25cdf754e4 radeonsi: remove some old gfx 9.x registers
Leftover from bring up.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-05-24 13:41:56 -04:00
Marek Olšák
8c1c451a90 ac/surface/gfx6: don't overallocate mipmapped HTILE
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-05-24 13:41:56 -04:00
Samuel Pitoiset
38a8c5903b radv: call nir_lower_io_to_temporaries for VS, GS, TES and FS
Do not lower FS inputs because this moves all load_var
instructions at beginning of shaders and because
interp_var_at_sample (and friends) seem broken. That might
be eventually enabled later on if we really want to preload
all FS inputs at beginning.

Polaris10:
Totals from affected shaders:
SGPRS: 54072 -> 54264 (0.36 %)
VGPRS: 38580 -> 38124 (-1.18 %)
Spilled SGPRs: 652 -> 652 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Code Size: 2128116 -> 2127380 (-0.03 %) bytes
Max Waves: 8048 -> 8086 (0.47 %)

Vega10:
Totals from affected shaders:
SGPRS: 52616 -> 52656 (0.08 %)
VGPRS: 37536 -> 37116 (-1.12 %)
Spilled SGPRs: 828 -> 828 (0.00 %)
Code Size: 2043756 -> 2042672 (-0.05 %) bytes
Max Waves: 9176 -> 9254 (0.85 %)

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-24 09:18:57 +02:00
Samuel Pitoiset
ded1509587 radv: call nir_split_var_copies() before nir_lower_var_copies()
This doesn't nothing special currently because we don't create
any copy_var instructions, but this is needed for the next patch.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-24 09:18:54 +02:00
Bas Nieuwenhuizen
699e1f5aac ac: Use DPP for build_ddxy where possible.
WQM is pretty reliable now on LLVM 7, so let us just use
DPP + WQM.

This gives approximately a 1.5% performance increase on the
vrcompositor built-in benchmark.

v2: Use ac_build_quad_swizzle.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-05-23 21:02:45 +02:00
Bas Nieuwenhuizen
047438287c ac/surface/gfx6: Don't force a tile index for fmask.
The bpe of the fmask often differs from the bpe of the main
surface. On SI that means it has to get a different tile
index.

addrlib is capable of figuring this out itself, so just pass
-1 instead to let it know that it is not preset.

Fixes: 9bf3570fed "ac/surface/gfx6: compute FMASK together with the color surface"
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106511
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106499
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-05-23 02:23:03 +02:00
Samuel Pitoiset
75e919c045 radv: fix computation of user sgprs for 32-bit pointers
With 32-bit pointers we only need one user SGPR per desc set.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-22 15:53:29 +02:00
Samuel Pitoiset
c5536fc813 radv: drop user_sgpr_info::sgpr_count
It's only used inside allocate_user_sgprs().

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-22 15:53:26 +02:00
Samuel Pitoiset
36a4d6d081 radv: add support for 32-bit pointers in user data SGPRs
We still use 64-bit GPU pointers for all ring buffers because
llvm.amdgcn.implicit.buffer.ptr doesn't seem to support 32-bit
GPU pointers for now. This can be improved later anyways.

Vega10:
Totals from affected shaders:
SGPRS: 1008722 -> 1026710 (1.78 %)
VGPRS: 706580 -> 707136 (0.08 %)
Spilled SGPRs: 22555 -> 22209 (-1.53 %)
Spilled VGPRs: 75 -> 75 (0.00 %)
Code Size: 34819208 -> 35202140 (1.10 %) bytes
Max Waves: 175423 -> 175086 (-0.19 %)

Polaris10:
Totals from affected shaders:
SGPRS: 1029849 -> 1036517 (0.65 %)
VGPRS: 709984 -> 708872 (-0.16 %)
Spilled SGPRs: 22672 -> 22309 (-1.60 %)
Spilled VGPRs: 82 -> 66 (-19.51 %)
Scratch size: 76 -> 60 (-21.05 %) dwords per thread
Code Size: 34915336 -> 35309752 (1.13 %) bytes
Max Waves: 151221 -> 151677 (0.30 %)

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-22 15:53:22 +02:00
Samuel Pitoiset
b654ef5808 radv: add set_loc_shader_ptr() helper
This helper will hep for switching to 32-bit GPU pointers.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-22 15:53:20 +02:00
Samuel Pitoiset
14a7547c08 radv: allocate descriptor BOs in the 32-bit addr space
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-22 15:53:18 +02:00
Samuel Pitoiset
0d1406ad12 radv: allocate the upload BO in the 32-bit addr space
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-22 15:53:17 +02:00
Samuel Pitoiset
d8a61d3232 radv: set amdgpu-32bit-address-high-bits LLVM attribute
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-22 15:53:15 +02:00
Samuel Pitoiset
fe2649d3ad radv/winsys: allow to allocate BOs in the 32-bit addr space
This introduces a new flag called RADEON_FLAG_32BIT.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-22 15:53:13 +02:00
Samuel Pitoiset
b60e0ee789 radv/winsys: request high address
This is needed for 32-bit GPU pointers. Ported from RadeonSI.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-22 15:53:09 +02:00
Samuel Pitoiset
73df16dcee radv: fix centroid interpolation
It's legal to set the centroid and sample interpolation modes
when MSAA disabled. So, we have to initialize the centroid
inputs because the hardware doesn't.

This fixes rendering issues with DXVK and The Witness, World of
Warcraft, Trackmania and probably more games.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106315
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102390
CC: 18.0 18.1 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-21 13:57:46 +02:00
Bas Nieuwenhuizen
f26b008e28 radv: Cleanup unused prime blit path.
Since we have the common WSI code, we use vkCmdCopyImageToBuffer
instead.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-21 10:33:41 +02:00
Bas Nieuwenhuizen
a63a0960e3 radv: Fix SRGB compute copies.
SRGB stores are broken. We had compensation code in the
resolve path but none in the copy path. Since we don't
want any conversion and it does not matter for DCC,
just make everything UNORM instead.

This happened to cause wrong colors for the PRIME path, as
that uses image->buffer copies which always use the compute
path.

CC: 18.0 18.1 <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106587
Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-05-21 10:33:41 +02:00
Christoph Haag
549e54270b radv: fix VK_EXT_descriptor_indexing
GetPhysicalDeviceProperties2KHR() was crashing because features was null

Fixes: 0e10790558 "radv: Enable VK_EXT_descriptor_indexing."
CC: 18.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-20 13:36:07 +02:00
Bas Nieuwenhuizen
a1c87235a9 ac/surface: Only align linear power of two fmt textures.
We're not sharing 32_32_32 formats between different GPUs, so we
do not have to align for vega on pre-vega cards.

Fixes: e361970ed7 "radv: Add support for IMG_DATA_FORMAT_32_32_32."
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-05-20 11:57:59 +02:00
Bas Nieuwenhuizen
62e0e089d7 amd/addrlib: Use defines in autotools build.
Otherwise stuff like NDEBUG would not be passed through.

CC: <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106479
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-05-20 11:57:59 +02:00
Samuel Pitoiset
03c4816093 radv: pass radv_nir_compiler_options directly to create_llvm_function()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-05-18 11:07:01 +02:00
Samuel Pitoiset
fcba3934fc radv: add radv_emit_shader_pointer() helper
For future work (support for 32-bit GPU pointers).

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-17 21:28:59 +02:00
Samuel Pitoiset
9b2c310a70 radv: add some helpers for cleaning up radv_get_preamble_cs()
Because this function looks a bit ugly to me.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-17 21:28:57 +02:00
Marek Olšák
f9eb1ef870 amd: remove support for LLVM 4.0
It doesn't support GFX9.

Acked-by: Dave Airlie <airlied@redhat.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-05-17 14:54:41 -04:00
Samuel Pitoiset
1fba2e10b3 radv: only declare the ESGS rings for pre GFX9 chips
GFX9 uses LDS instead.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-17 14:14:20 +02:00
Samuel Pitoiset
d349d4bd24 radv: allow to print GPU info with RADV_DEBUG=info
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-17 14:14:17 +02:00
Samuel Pitoiset
56d53ed1d6 radv: do not emit unnecessary ES output stores
GFX9:
Totals from affected shaders:
SGPRS: 472 -> 464 (-1.69 %)
VGPRS: 576 -> 584 (1.39 %)
Code Size: 45432 -> 44324 (-2.44 %) bytes
Max Waves: 40 -> 40 (0.00 %)

VI:
SGPRS: 720 -> 720 (0.00 %)
VGPRS: 728 -> 728 (0.00 %)
Code Size: 45348 -> 43992 (-2.99 %) bytes
Max Waves: 120 -> 120 (0.00 %)

This affects Rise of Tomb Raider and the three Vulkan demos
that use a geometry shader (geometryshader, deferredshadows
and viewportarray).

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-17 14:14:13 +02:00
Samuel Pitoiset
a6e44d1271 radv: do not emit unnecessary GS output stores
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-17 14:14:11 +02:00
Samuel Pitoiset
507402ada6 radv: only pass the global BO list at submit time if enabled
That way the winsys might use a faster path when the global
BO list is NULL.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-17 13:48:27 +02:00
Samuel Pitoiset
6211799aff radv: remove the radv_finishme() when compiling shaders
Having an entrypoint different than "main" doesn't mean we
have multiple shaders per module.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-17 13:48:24 +02:00
Samuel Pitoiset
1e86eaf7d8 radv: remove radv_device::llvm_supports_spill
It's always true.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-17 13:48:21 +02:00
Dave Airlie
eba4cf797c ac/llvm: use amdgcn.tbuffer.store instead of SI.tbuffer.store intrinsic
Drop the use of the old intrinsic.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-05-17 11:46:53 +10:00