Commit graph

3341 commits

Author SHA1 Message Date
Samuel Pitoiset
92fa6264cb radv: do not resolve all image layers with compute inside a subpass
When resolving inside a subpass, we should rely on the framebuffer
layer count instead of resolving all images layers. This should
improve performance of layered resolves a bit.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-11 08:06:28 +02:00
Bas Nieuwenhuizen
e0d12f79c5 radv: Handle UNDEFINED format in image format list.
Was watching a presentation on YT where this was used and it turns
out it is not invalid.

The only case it is actually valid as format in the creation of an
image or image view is with Android Hardware Buffers which have
their format specified externally.

So we can just ignore all entries with VK_FORMAT_UNDEFINED.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-06-10 22:21:16 +00:00
Bas Nieuwenhuizen
39c71e0025 radv: Prevent out of bound shift on 32-bit builds.
uintptr_t is 32-bits then and shifting it by 32 bits results in undefined
behavior IIRC.

Fixes: b3c8de1c55 "radv: save all descriptor pointers into the trace BO"
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-06-10 22:18:51 +00:00
Samuel Pitoiset
e9316fdfd4 radv: fix setting CB_SHADER_MASK for dual source blending
CB_SHADER_MASK was computed without the second color buffer
format which looks totally wrong to me.

While we are at it, copy a comment from RadeonSI.

Cc: 19.0 19.1 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-10 17:21:56 +02:00
Samuel Pitoiset
91aa25f462 radv: fix alpha-to-coverage when there is unused color attachments
When alphaToCoverage is enabled, we should always write the alpha
channel of MRT0 if it's unused. This now matches RadeonSI.

This fixes the new CTS:
dEQP-VK.pipeline.multisample.alpha_to_coverage_unused_attachment.samples_*.alpha_invisible

Cc: 19.0 19.1 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl
2019-06-10 09:23:41 +02:00
Samuel Pitoiset
0905189a25 radv: enable VK_EXT_sample_locations
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-07 13:11:17 +02:00
Samuel Pitoiset
05f5fa661f radv: enable HTILE for images that might need variable sample locations
This is now supported.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-07 13:11:14 +02:00
Samuel Pitoiset
e7677a697b radv: handle sample locations during automatic layout transitions
From the Vulkan spec 1.1.109:

   "Some implementations may need to evaluate depth image values
    while performing image layout transitions. To accommodate this,
    instances of the VkSampleLocationsInfoEXT structure can be
    specified for each situation where an explicit or automatic
    layout transition has to take place. [...] and
    VkRenderPassSampleLocationsBeginInfoEXT can be chained from
    VkRenderPassBeginInfo to provide sample locations for layout
    transitions performed implicitly by a render pass instance."

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-07 13:11:11 +02:00
Samuel Pitoiset
d0d41e58c3 radv: determine the first subpass id for every attachments
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-07 13:11:08 +02:00
Samuel Pitoiset
f58e9f6d69 radv: handle sample locations during explicit depth/stencil transitions
From the Vulkan spec 1.1.109,

   "Some implementations may need to evaluate depth image values
    while performing image layout transitions. To accommodate this,
    instances of the VkSampleLocationsInfoEXT structure can be
    specified for each situation where an explicit or automatic
    layout transition has to take place. VkSampleLocationsInfoEXT
    can be chained from VkImageMemoryBarrier structures to provide
    sample locations for layout transitions performed by
    vkCmdWaitEvents and vkCmdPipelineBarrier calls."

This handles explicit depth/stencil layout transitions performed
with CmdWaitEvents() or CmdPipelineBarrier().

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-07 13:11:01 +02:00
Samuel Pitoiset
a20925f2a9 radv: allow the depth decompress pass to emit dynamic sample locations
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-07 13:11:00 +02:00
Samuel Pitoiset
2dd8dfd913 radv: allow to set dynamic sample locations to the depth decompress pass
If VK_EXT_sample_locations is used, the driver might need to emit
the sample locations specified during layout transitions.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-07 13:10:55 +02:00
Samuel Pitoiset
d78990c174 radv: allow to save/restore sample locations during meta operations
This will be used for the depth decompress pass that might need
to emit variable sample locations during layout transitions.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-07 13:10:50 +02:00
Connor Abbott
9d93d2a404 ac/nir: Remove stale TODO
While we're here, copy the comment explaining this from radeonsi.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-06-06 17:14:28 +02:00
Samuel Pitoiset
b9d3a6b656 radv: set the subpass before any initial subpass transitions
This might fix initial subpass transitions when multiview is used.
Noticed while implementing sample locations during layout transitions.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-06 10:00:29 +02:00
Samuel Pitoiset
8a31eaa4e2 radv: use only one descriptor in the fmask expand pass
This removes one useless SMEM load operations which pointed to
the same descriptor anyway.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-05 20:50:58 +02:00
Samuel Pitoiset
7664eb8f2b radv: set ACCESS_NON_READABLE on the fmask expand pass output image
The driver will emit GLC=1.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-05 20:50:56 +02:00
Samuel Pitoiset
8206390546 radv: remove one useless image type in the fmask expand shader
Both input and output images use the same type.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-05 20:50:53 +02:00
Marek Olšák
ff63b99531 ac: rename LLVM <= 7 helpers for readability
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-06-04 18:53:46 -04:00
Marek Olšák
c9b64b58de ac: fix a typo in ac_build_wg_scan_bottom
Cc: 19.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-06-04 18:53:46 -04:00
Bas Nieuwenhuizen
9701cb1034 radv: Use bo metadata for imported image tiling on Android.
This way we handle linear images etc. correctly.

Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-06-04 18:32:45 +00:00
Rhys Perry
73dda85512 ac/nir: mark some texture intrinsics as convergent
Otherwise LLVM can sink them and their texture coordinate calculations
into divergent branches.

v2: simplify the conditions on which the intrinsic is marked as convergent
v3: only mark as convergent in FS and CS with derivative groups

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-04 17:30:53 +01:00
Rhys Perry
d4a2f8b33b radv: fix some compiler warnings
Fixes -Woverflow warnings with GCC 9.1.1

v2: use a cast instead of a bitwise and

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-04 17:30:53 +01:00
Samuel Pitoiset
8a35eb0602 radv: do not use gfx fast depth clears for layered depth/stencil images
The driver should only fast depth clears with the graphics path
when the view covers all image layers, otherwise this might
corrupt layers when HTILE is enabled.

Cc: 19.0 19.1 mesa-stable@lists.freedesktop.org
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-04 08:55:32 +02:00
Samuel Pitoiset
33f4e04d5a ac,radv: do not emit vec3 for raw load/store on SI
It's unsupported, only load/store format with vec3 are supported.

Fixes: 6970a9a6ca ("ac,radv: remove the vec3 restriction with LLVM 9+")"
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-04 08:47:26 +02:00
Marek Olšák
b2bbd1a27b ac/registers: don't use the si, cik, vi names, use gfxN
trivial
2019-06-03 20:06:41 -04:00
Nicolai Hähnle
f480b8aaa4 amd/common: use generated register header 2019-06-03 20:05:20 -04:00
Nicolai Hähnle
853ef5ccba amd/common: use SH{0,1}_CU_EN definitions only of COMPUTE_STATIC_THREAD_MGMT_SE0
The automatic header generation unifies identical registers in a series
and only emits definitions for the first one. This is mostly to avoid
emitting excessive definitions for CB registers, but special-casing
an exception for this family of registers doesn't seem worth it.
2019-06-03 20:05:20 -04:00
Nicolai Hähnle
cf51009ad2 amd/common: unify PITCH_GFX6 and PITCH_GFX9
The definition of the fields differs, but PITCH_GFX9 is a mere extension
of PITCH_GFX6 that does not conflict with any other fields.

This aligns the definitions with what will be generated from the
register JSON.

The information about how large the fields really are is preserved in
the register database.
2019-06-03 20:05:20 -04:00
Nicolai Hähnle
e04215815e amd/common: rename R_3F2_CONTROL to IB_CONTROL for disambiguation
This "register" name collides with R_370_CONTROL.

This aligns the definitions with what will be generated from the
register JSON.
2019-06-03 20:05:20 -04:00
Nicolai Hähnle
cd247cf456 amd/common: cleanup DATA_FORMAT/NUM_FORMAT field names
The field layout wasn't actually changed in gfx9, so having the suffix
isn't very useful. The field *contents* were changed, but this is
reflected in the V_xxx_xxx definitions and is taken into account by
the ac_debug logic based on the register JSON.

This aligns the definitions with what will be generated from the
register JSON.
2019-06-03 20:05:20 -04:00
Nicolai Hähnle
ef6ef098af amd/common: derive ac_debug tables from register JSON 2019-06-03 20:05:20 -04:00
Nicolai Hähnle
d02286c753 amd/registers: add JSON description of packet3 fields 2019-06-03 20:05:20 -04:00
Nicolai Hähnle
67702e3319 amd/registers: add JSON descriptions of registers
The descriptions are mostly derived from parsing the existing
register headers.
2019-06-03 20:05:20 -04:00
Nicolai Hähnle
e6184b0892 amd/registers: scripts for processing register descriptions in JSON
We will derive both the debugging tables and (the majority of) the
register headers from descriptions in JSON, instead of deriving the
debugging tables from an awkward parsing of the register headers.

Some of the scripts are useful for maintaining the register database
itself. The scripts are designed to output reasonably readable JSON
by default.
2019-06-03 20:05:20 -04:00
Marek Olšák
486bc1e17e ac: use amdgpu-flat-work-group-size
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-03 14:32:47 -04:00
Samuel Pitoiset
445098916a radv: flush pending query reset caches before copying results
From the Vulkan spec 1.1.108:
   "vkCmdCopyQueryPoolResults is guaranteed to see the effect of
    previous uses of vkCmdResetQueryPool in the same queue, without any
    additional synchronization."

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-03 16:05:46 +02:00
Samuel Pitoiset
6970a9a6ca ac,radv: remove the vec3 restriction with LLVM 9+
This changes requires LLVM r356755.

32706 shaders in 16744 tests
Totals:
SGPRS: 1448848 -> 1455984 (0.49 %)
VGPRS: 1016684 -> 1016220 (-0.05 %)
Spilled SGPRs: 25871 -> 25815 (-0.22 %)
Spilled VGPRs: 122 -> 122 (0.00 %)
Scratch size: 11964 -> 11956 (-0.07 %) dwords per thread
Code Size: 55324500 -> 55301152 (-0.04 %) bytes
Max Waves: 235660 -> 235586 (-0.03 %)

Totals from affected shaders:
SGPRS: 293704 -> 300840 (2.43 %)
VGPRS: 246716 -> 246252 (-0.19 %)
Spilled SGPRs: 159 -> 103 (-35.22 %)
Scratch size: 188 -> 180 (-4.26 %) dwords per thread
Code Size: 8653664 -> 8630316 (-0.27 %) bytes
Max Waves: 60811 -> 60737 (-0.12 %)

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-03 11:30:08 +02:00
Samuel Pitoiset
9178076a46 radv: use RADV_CMD_DIRTY_DYNAMIC_* when restoring viewport/scissor
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-31 08:50:16 +02:00
Samuel Pitoiset
0e7b029d00 radv: use CmdPushConstants when restoring constants after meta operations
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-31 08:50:13 +02:00
Samuel Pitoiset
43cc3dc9c0 radv: enable transformFeedbackStreamsLinesTriangles
The driver should already support this without any changes.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-30 15:42:36 +02:00
Samuel Pitoiset
da26013eb7 radv: implement VK_EXT_sample_locations and disable it
Basically, this extension allows applications to use custom
sample locations. It doesn't support variable sample locations
during subpass. Note that we don't have to upload the user
sample locations because the spec doesn't allow this.

The extension is currently disabled because the driver needs to
support variable sample locations during layout transitions. The
depth decompress needs to know them and that's a bit invasive.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-30 09:52:16 +02:00
Caio Marcelo de Oliveira Filho
e45bf01940 spirv: Change spirv_to_nir() to return a nir_shader
spirv_to_nir() returned the nir_function corresponding to the
entrypoint, as a way to identify it.  There's now a bool is_entrypoint
in nir_function and also a helper function to get the entry_point from
a nir_shader.

The return type reflects better what the function name suggests.  It
also helps drivers avoid the mistake of reusing internal shader
references after running NIR_PASS on it.  When using NIR_TEST_CLONE or
NIR_TEST_SERIALIZE, those would be invalidated right in the first pass
executed.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-29 10:34:35 -07:00
Caio Marcelo de Oliveira Filho
a3bfdacb6c radv: Don't re-use entry_point pointer from spirv_to_nir
Replace its uses with checking for is_entrypoint and calling
nir_shader_get_entrypoint().

This is a preparation to change spirv_to_nir() return type.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-29 10:34:35 -07:00
Samuel Pitoiset
d3771ccaa3 radv: use view format when selecting the resolve path for subpasses
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-29 08:53:48 +02:00
Samuel Pitoiset
017170a785 radv: always use view format when performing subpass resolves
It makes sense to use the image view formats when resolving
inside subpasses, while we have to use the image formats for
normal resolves.

Original patch by Philip Rebohle.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110348
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-29 08:53:46 +02:00
Samuel Pitoiset
eaeaad25f7 radv: sync before resetting a pool if there is active pending queries
Make sure to sync all previous work if the given command buffer
has pending active queries. Otherwise the GPU might write queries
data after the reset operation.

This fixes a bunch of new dEQP-VK.query_pool.* CTS failures.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-29 08:47:54 +02:00
Samuel Pitoiset
47a10edefb radv: allocate more space in the CS when emitting events
If the driver waits for CP DMA to be idle and emit an EOP event
we need more space.

This fixes a crash with Quake Champions.

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-28 16:56:17 +02:00
Samuel Pitoiset
15cb19ed6f radv add radv_get_resolve_pipeline() in the compute path
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-28 08:17:26 +02:00
Samuel Pitoiset
469258c3b1 radv: cleanup the compute resolve path for subpass
This makes use of radv_meta_resolve_compute_image() by filling
a VkImageResolve region instead of duplicating code.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-28 08:17:23 +02:00