Commit graph

1877 commits

Author SHA1 Message Date
Samuel Pitoiset
49ef890733 radv: expose VK_EXT_scalar_block_layout
Nothing to do, the compiler already handles that.

All new dEQP.VK.ubo.* and dEQP.VK.ssbo.* pass, except some
16-bit tests that are quite related to fdo bug #108114.

Only enable the extension on CIK+ because it might not work on SI.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-12-05 17:38:20 +01:00
Samuel Pitoiset
c7ada4901a radv: wait on the high 32 bits of timestamp queries
In case we are unlucky if the low part is 0xffffffff.

Fixes: 5d6a560a29 ("radv: do not use the availability bit for timestamp queries")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-12-05 13:05:58 +01:00
Samuel Pitoiset
e899728769 radv: reset pending_reset_query when flushing caches
If the driver used a compute shader for resetting a query pool,
it should be completed when caches are flushed.

This might reduce the number of stalls if operations are done
between vkCmdResetQueryPool() and vkCmdBeginQuery()
(or vkCmdWriteTimestamp()).

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Alex Smith <asmith@feralinteractive.com>
2018-12-05 13:05:55 +01:00
Alex Smith
c1b6cb068c radv: Flush before vkCmdWriteTimestamp() if needed
As done for vkCmdBeginQuery() already. Prevents timestamps from being
overwritten by previous vkCmdResetQueryPool() calls if the shader path
was used to do the reset.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108925
Fixes: a41e2e9cf5 ("radv: allow to use a compute shader for resetting the query pool")
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-12-05 10:52:48 +00:00
Samuel Pitoiset
824cfc1ee5 radv: rework the TC-compat HTILE hardware bug with COND_EXEC
After investigating on this, it appears that COND_WRITE doesn't
work correctly in some situations. I don't know exactly why does
it fail to update DB_Z_INFO.ZRANGE_PRECISION, but as AMDVLK
also uses COND_EXEC I think there is a reason.

Now the driver stores a new metadata value in order to reflect
the last fast depth clear state. If a TC-compat HTILE is fast cleared
with 0.0f, we have to update ZRANGE_PRECISION to 0 in order to
work around that hardware bug.

This fixes rendering issues with The Forest and DXVK and doesn't
seem to introduce any regressions.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108914
Fixes: 68dead112e ("radv: update the ZRANGE_PRECISION value for the TC-compat bug")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-12-05 09:26:31 +01:00
Dave Airlie
1363a47c9c radv: use 3d shader for gfx9 copies if dst is 3d
This fixes some crucible 3d miptree tests I've been working on
when executed using the compute shader path.

Fixes: d08f267814 (radv/gfx9: fix 3d image to image transfers on compute queues.)
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-12-04 10:42:31 +10:00
Bas Nieuwenhuizen
12e35a64c0 radv: Check for shareable images in central place.
One place to put the logic makes things easier to change.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-12-04 01:21:38 +01:00
Bas Nieuwenhuizen
3bf48741e1 radv/android: Use buffer metadata to determine scanout compat.
These days we don't always allocate scanout compatible textures anymore.
That does mean we have to fix the radv android WSI though.

Fixes: b1444c9ccb "radv: Implement VK_ANDROID_native_buffer."
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-12-04 01:21:38 +01:00
Bas Nieuwenhuizen
51091b3e1f radv/android: Mark android WSI image as shareable.
Fixes: b1444c9ccb "radv: Implement VK_ANDROID_native_buffer."
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-12-04 01:21:38 +01:00
Tobias Klausmann
9401a2f2e6 amd/vulkan: meson build - use radv_deps for libvulkan_radeon
Without this the build breaks with:

FAILED: src/amd/vulkan/src@amd@vulkan@@vulkan_radeon@sha/radv_pipeline.c.o
cc -Isrc/amd/vulkan/src@amd@vulkan@@vulkan_radeon@sha -Isrc/amd/vulkan
-I../src/amd/vulkan -Isrc/../include -I../src/../include -Isrc -I../src
-Isrc/mapi -I../src/mapi -Isrc/mesa -I../src/mesa -I../src/gallium/include
-Isrc/gallium/auxiliary -I../src/gallium/auxiliary -Isrc/amd -I../src/amd
-Isrc/amd/common -I../src/amd/common -Isrc/compiler -I../src/compiler
-Isrc/vulkan/util -I../src/vulkan/util -Isrc/vulkan/wsi -I../src/vulkan/wsi
-Isrc/compiler/nir -I../src/compiler/nir -I/usr/include -I/usr/include/libdrm
-fdiagnostics-color=always -pipe -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch
-std=c99 -O2 -g '-DVERSION="18.3.0-rc5"' -DPACKAGE_VERSION=VERSION
'-DPACKAGE_BUGREPORT="https://bugs.freedesktop.org/enter_bug.cgi?product=Mesa"'
-DGLX_USE_TLS -DHAVE_ST_VDPAU -DENABLE_ST_OMX_BELLAGIO=0
-DENABLE_ST_OMX_TIZONIA=0 -DHAVE_X11_PLATFORM -DGLX_INDIRECT_RENDERING
-DGLX_DIRECT_RENDERING -DGLX_USE_DRM -DHAVE_DRM_PLATFORM -DENABLE_SHADER_CACHE
-DHAVE___BUILTIN_BSWAP32 -DHAVE___BUILTIN_BSWAP64 -DHAVE___BUILTIN_CLZ
-DHAVE___BUILTIN_CLZLL -DHAVE___BUILTIN_CTZ -DHAVE___BUILTIN_EXPECT
-DHAVE___BUILTIN_FFS -DHAVE___BUILTIN_FFSLL -DHAVE___BUILTIN_POPCOUNT
-DHAVE___BUILTIN_POPCOUNTLL -DHAVE___BUILTIN_UNREACHABLE
-DHAVE_FUNC_ATTRIBUTE_CONST -DHAVE_FUNC_ATTRIBUTE_FLATTEN
-DHAVE_FUNC_ATTRIBUTE_MALLOC -DHAVE_FUNC_ATTRIBUTE_PURE
-DHAVE_FUNC_ATTRIBUTE_UNUSED -DHAVE_FUNC_ATTRIBUTE_WARN_UNUSED_RESULT
-DHAVE_FUNC_ATTRIBUTE_WEAK -DHAVE_FUNC_ATTRIBUTE_FORMAT
-DHAVE_FUNC_ATTRIBUTE_PACKED -DHAVE_FUNC_ATTRIBUTE_RETURNS_NONNULL
-DHAVE_FUNC_ATTRIBUTE_VISIBILITY -DHAVE_FUNC_ATTRIBUTE_ALIAS
-DHAVE_FUNC_ATTRIBUTE_NORETURN -DUSE_SSE41 -DUSE_GCC_ATOMIC_BUILTINS
-DUSE_X86_64_ASM -DMAJOR_IN_SYSMACROS -DHAVE_SYS_SYSCTL_H -DHAVE_LINUX_FUTEX_H
-DHAVE_ENDIAN_H -DHAVE_DLFCN_H -DHAVE_STRTOF -DHAVE_MKOSTEMP
-DHAVE_POSIX_MEMALIGN -DHAVE_TIMESPEC_GET -DHAVE_MEMFD_CREATE -DHAVE_STRTOD_L
-DHAVE_DLADDR -DHAVE_DL_ITERATE_PHDR -DHAVE_ZLIB -DHAVE_PTHREAD
-DHAVE_PTHREAD_SETAFFINITY -DHAVE_LIBDRM -DHAVE_LLVM=0x0600
-DMESA_LLVM_VERSION_PATCH=1 -DHAVE_WAYLAND_PLATFORM -DWL_HIDE_DEPRECATED
-DHAVE_DRI3 -DHAVE_DRI3_MODIFIERS -Werror=implicit-function-declaration
-Werror=missing-prototypes -Werror=return-type -fno-math-errno
-fno-trapping-math -Wno-missing-field-initializers -Wno-format-truncation -O2
-Wall -D_FORTIFY_SOURCE=2 -fstack-protector-strong -funwind-tables
-fasynchronous-unwind-tables -fstack-clash-protection -DNDEBUG -fPIC -pthread
-D__STDC_FORMAT_MACROS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS
-D__STDC_LIMIT_MACROS -fvisibility=hidden -Wno-override-init
-DVK_USE_PLATFORM_XCB_KHR -DVK_USE_PLATFORM_XLIB_KHR
-DVK_USE_PLATFORM_WAYLAND_KHR -DVK_USE_PLATFORM_DISPLAY_KHR
-DVK_USE_PLATFORM_XLIB_XRANDR_EXT  -MD -MQ
'src/amd/vulkan/src@amd@vulkan@@vulkan_radeon@sha/radv_pipeline.c.o' -MF
'src/amd/vulkan/src@amd@vulkan@@vulkan_radeon@sha/radv_pipeline.c.o.d' -o
'src/amd/vulkan/src@amd@vulkan@@vulkan_radeon@sha/radv_pipeline.c.o' -c
../src/amd/vulkan/radv_pipeline.c
In file included from ../src/vulkan/util/vk_alloc.h:29,
                 from ../src/amd/vulkan/radv_private.h:52,
                 from ../src/amd/vulkan/radv_debug.h:27,
                 from ../src/amd/vulkan/radv_pipeline.c:30:
../src/../include/vulkan/vulkan.h:54:10: fatal error: wayland-client.h: Datei
oder Verzeichnis nicht gefunden
 #include <wayland-client.h>
          ^~~~~~~~~~~~~~~~~~
compilation terminated.

The above command misses the include directory for wayland:
    -I/usr/include/wayland

The missing include is contained in the (until now) unused radv_deps:

if with_platform_wayland
  radv_deps += dep_wayland_client
  radv_flags += '-DVK_USE_PLATFORM_WAYLAND_KHR'
  libradv_files += files('radv_wsi_wayland.c')
endif

Fixes: 673dda8330 "meson: build "radv" vulkan driver for radeon hardware"
Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2018-12-03 09:18:48 -08:00
Nicolai Hähnle
776b911365 amd/addrlib: update Mesa's copy of addrlib
Update to the internal master as of 2018-11-15.

This has a lot of gratuitous whitespace change, but on the plus
side it's built using the same tooling that's used for AMDVLK,
which should help going forward.
2018-11-29 13:18:24 +01:00
Nicolai Hähnle
729ebdf07e radv: remove dependency on addrlib gfx9_enum.h
v2:
- use SI_CONTEXT_REG_OFFSET

Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-11-29 13:18:23 +01:00
Samuel Pitoiset
cc7deb749c radv: drop few useless state changes when doing color/depth decompressions
Viewport/scissor don't need to be updated for array textures.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-11-29 10:18:55 +01:00
Samuel Pitoiset
6d4f65deea radv: remove unused pending_clears param in the transition path
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-11-29 10:18:53 +01:00
Samuel Pitoiset
4b9df824f7 radv: optimize CmdClear{Color,DepthStencil}Image() for layered textures
If all layers are bound we can perform a fast color or depth clear
instead of iterating over all layers. This has the advantage
to avoid trashing the framebuffer for nothing if you we end up by
doing a fast clear when calling radv_clear_image_layer(), and
clearing all layers in one shot is obviously faster.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-11-29 10:18:42 +01:00
Samuel Pitoiset
7484bc894b radv: refactor the fast clear path for better re-use
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-11-29 10:18:42 +01:00
Samuel Pitoiset
f78ee19702 radv: simplify a check in emit_fast_color_clear()
Currently only true if RADV_PERFTEST=dccmsaa is set.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-11-29 10:18:42 +01:00
Samuel Pitoiset
eca931a726 radv: add radv_can_fast_clear_{color,depth}() helpers
For further optimisations.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-11-29 10:18:42 +01:00
Samuel Pitoiset
93f5ce8fa7 radv: add radv_image_view_can_fast_clear() helper
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-11-29 10:18:42 +01:00
Samuel Pitoiset
aeaf8dbd09 radv: add radv_image_can_fast_clear() helper
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-11-29 10:18:42 +01:00
Samuel Pitoiset
3e718db1ff radv: remove useless check in emit_fast_color_clear()
The driver doesn't support DCC/CMASK for mipmapped textures.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-11-29 10:18:42 +01:00
Bas Nieuwenhuizen
6569644bb6 radv: Align large buffers to the fragment size.
Improves performance in Talos by about 15% (and significant improvements
in RotR and possibly other but did not bench with final patch) on
kernel 4.19 and earlier.

On 4.20+ a similar effect comes from

433ca054949a "drm/amdgpu: try allocating VRAM as power of two"

v2: Do not impact the alignment of the physical memory.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
CC: <mesa-stable@lists.freedesktop.org>
2018-11-27 22:17:42 +01:00
Bas Nieuwenhuizen
08ea6b9d9b radv: Clamp gfx9 image view extents to the allocated image extents.
Mirrors AMDVLK. Looks like if we go over the alignment of height
we actually start to change the addressing. Seems like the extra
miplevels actually work with this.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108245
Fixes: f6cc15dccd "radv/gfx9: fix block compression texture views. (v2)"
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-11-27 10:19:52 +01:00
Bas Nieuwenhuizen
3c96a1e3a9 radv: Fix opaque metadata descriptor last layer.
We used the layer count which results in an off by one error.

Not sure this really affects anything.

Fixes: f4e499ec79 "radv: add initial non-conformant radv vulkan driver"
Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-11-26 09:29:39 +01:00
Samuel Pitoiset
9fc1ce258c radv: ignore subpass self-dependencies for CreateRenderPass() too
We really need to refactor this...

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-11-23 11:59:11 +01:00
Samuel Pitoiset
2951a766bd radv: remove useless sync before CmdClear{Color,DepthStencil}Image()
We don't need to flush anything before these two commands as well.
This is because they have to be externally synchronized, so the
app should have called CmdPipelineBarrier() prior to that and the
driver should have flushed the caches.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-11-23 11:59:08 +01:00
Samuel Pitoiset
4ff4af3d91 radv: remove useless sync after CmdClear{Color,DepthStencil}Image()
'post_flush' is only set to NULL for the normal clear path
(ie. only vkCmdClearColorImage() and vkCmdClearDepthStencilImage()
are affected commands).

Because these two operations have to be externally synchronized
with VK_PIPELINE_STAGE_TRANSFER_BIT and VK_ACCESS_TRANSFER_WRITE_BIT,
it's useless to set those flags internallY.

VK_PIPELINE_STAGE_TRANSFER_BIT will wait for compute to be idle,
while VK_ACCESS_TRANSFER_WRITE_BIT will invalidate both L1 vector
caches and L2. RADV_CMD_FLAG_WRITEBACK_GLOBAL_L2 will be superseded
by RADV_CMD_FLAG_INV_GLOBAL_L2.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-11-22 08:56:36 +01:00
Samuel Pitoiset
4b9bc4791b radv: only sync CP DMA for transfer operations or bottom pipe
CP DMA can only be busy when the driver copies buffers. The
only affected Vulkan commands are vkCmdCopyBuffer() and
vkCmdUpdateBuffer() (because we fallback to a copy depending on
a threshold). Clear operations are currently not concerned
because the driver always syncs after the last DMA operation.

Per the spec, these two operations have to be externally
synchronized with VK_PIPELINE_STAGE_TRANSFER_BIT.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-11-21 10:03:01 +01:00
Samuel Pitoiset
457ac6ce1e radv: ignore subpass self-dependencies
Unnecessary as they allow the app to call vkCmdPipelineBarrier()
inside the render pass.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-11-21 10:02:59 +01:00
Bas Nieuwenhuizen
dd0172e865 radv: Use structured intrinsics instead of indexing workaround for GFX9.
These force the index to be used in the instruction so we don't need the
workaround.

Totals:
SGPRS: 1321642 -> 1321802 (0.01 %)
VGPRS: 943664 -> 943788 (0.01 %)
Spilled SGPRs: 28468 -> 28480 (0.04 %)
Spilled VGPRs: 88 -> 89 (1.14 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 80 -> 80 (0.00 %) dwords per thread
Code Size: 52415292 -> 52338932 (-0.15 %) bytes
LDS: 400 -> 400 (0.00 %) blocks
Max Waves: 233903 -> 233803 (-0.04 %)
Wait states: 0 -> 0 (0.00 %)

Totals from affected shaders:
SGPRS: 238344 -> 238504 (0.07 %)
VGPRS: 232732 -> 232856 (0.05 %)
Spilled SGPRs: 13125 -> 13137 (0.09 %)
Spilled VGPRs: 88 -> 89 (1.14 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 80 -> 80 (0.00 %) dwords per thread
Code Size: 15752712 -> 15676352 (-0.48 %) bytes
LDS: 139 -> 139 (0.00 %) blocks
Max Waves: 31680 -> 31580 (-0.32 %)
Wait states: 0 -> 0 (0.00 %)

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-11-19 23:36:00 +01:00
Samuel Pitoiset
724107553c radv: implement fast HTILE clears for depth or stencil only on GFX9
This allows to fast clear the depth part (or the stencil part)
of a depth+stencil surface when HTILE is enabled. I didn't test
on GFX8, so it's disabled currently.

This gives a very nice boost, for example when clearing the depth
aspect of a 4096x4096 D32_SFLOAT_S8_UINT image (18x faster).

BEFORE: 235 us
AFTER: 13 us

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-11-19 16:32:18 +01:00
Samuel Pitoiset
7dcddbe54d radv: rewrite the condition that checks allowed depth/stencil values
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-11-19 16:32:16 +01:00
Samuel Pitoiset
9133bbf186 radv: check allowed fast HTILE clears a bit earlier
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-11-19 16:32:14 +01:00
Samuel Pitoiset
193ad4748b radv: add radv_is_fast_clear_{depth,stencil}_allowed() helpers
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-11-19 16:32:12 +01:00
Samuel Pitoiset
c7e142ed78 radv: add radv_get_htile_fast_clear_value() helper
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-11-19 16:32:10 +01:00
Samuel Pitoiset
6f3fbcc041 radv: remove unnecessary goto in the fast clear paths
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-11-19 16:32:08 +01:00
Samuel Pitoiset
36006e3cec radv/winsys: remove the max IBs per submit limit for the sysmem path
This path will be eventually improved later but as it's only
used on SI (or with RADV_DEBUG=noibs), I'm not sure if that
matters much.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-11-19 16:32:06 +01:00
Samuel Pitoiset
4d30f2c6f4 radv/winsys: remove the max IBs per submit limit for the fallback path
The chained submission is the fastest path and it should now
be used more often than before. This removes some EOP events.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-11-19 16:32:04 +01:00
Samuel Pitoiset
55c75d2b49 radv: always clear the FCE predicate after DCC/FMASK/CMASK decompressions
DCC and FMASK also imply a fast-clear eliminate, so it should be
safe to reset the predicate unconditionally. We still only skip
FMASK or CMASK decompressions for now.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-11-19 14:05:35 +01:00
Samuel Pitoiset
483a28bfd4 radv: tidy up radv_set_dcc_need_cmask_elim_pred()
This is just a small cleanup.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-11-19 14:05:33 +01:00
Samuel Pitoiset
d031d5c999 radv: enable primitive binning by default
After doing a bunch of benchmarks, primitive binning helps
some games like The Talos Principle (+5%) or Serious Sam 2017
(+3%). For other titles, either it doesn't change anything or
it hurts very few (less than 1%).

This only affects GFX9.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-11-16 17:51:15 +01:00
Samuel Pitoiset
afd834b62e radv: add a debug option for disabling primitive binning
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-11-16 17:51:12 +01:00
Connor Abbott
ba94a00c7c Revert "radv: disable VK_SUBGROUP_FEATURE_VOTE_BIT"
This reverts commit 647c2b90e9. There was
one recently-introduced bug in ac for dvec3 loads, but the other test
failures were actually bugs in the tests. See
9429e621c4

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-11-16 10:32:03 +01:00
Karol Herbst
099728b115 nir: replace nir_load_system_value calls with appropiate builder functions
this helps reduce the overall code changes when a bit_size parameter is
added to nir_load_system_value

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
2018-11-14 02:09:11 +01:00
Timothy Arceri
95b513c937 radv: make use of nir_move_out_const_to_consumer()
vkpipeline-db results:

Totals from affected shaders:
SGPRS: 28400 -> 28576 (0.62 %)
VGPRS: 27916 -> 27692 (-0.80 %)
Spilled SGPRs: 140 -> 138 (-1.43 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 1534456 -> 1520560 (-0.91 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 3541 -> 3582 (1.16 %)
Wait states: 0 -> 0 (0.00 %)

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-11-14 09:41:50 +11:00
Samuel Pitoiset
90d68858ed radv: set optimal OVERWRITE_COMBINER_WATERMARK on GFX9
Ported from RadeonSI.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-11-13 10:24:36 +01:00
Samuel Pitoiset
f70c5d31cd radv: set PA.SC_CONSERVATIVE_RASTERIZATION.NULL_SQUAD_AA_MASK_ENABLE
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-11-13 10:24:33 +01:00
Samuel Pitoiset
b5f213bb1d radv: binding streamout buffers doesn't change context regs
Cc: 18.3 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-11-13 10:24:31 +01:00
Samuel Pitoiset
97fb1a02fd radv: make use of num_good_cu_per_sh in si_emit_graphics() too
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-11-12 09:35:46 +01:00
Samuel Pitoiset
d9d14346c2 radv: clean up setting partial_es_wave for distributed tess on VI
Only needed when the pipeline actually uses tessellation. I don't
think that changes anything, except improving readability.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-11-12 09:35:44 +01:00