Commit graph

13547 commits

Author SHA1 Message Date
Samuel Pitoiset
f882c62218 radv: add radv_clear_{cmask,dcc} helpers
They will help for DCC MSAA textures and if we support mipmaps
in the future.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-09 11:21:05 +02:00
Samuel Pitoiset
8f9f62c2db radv: don't pass the pipeline to radv_flush_constants()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-06 19:46:27 +02:00
Samuel Pitoiset
2bd50cceff radv: rename radv_cmd_buffer_update_vertex_descriptors()
... to radv_flush_vertex_descriptors().

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-04-06 19:46:23 +02:00
Samuel Pitoiset
e829a0cc1e radv: do not try to skip draw calls when VBOs upload failed
This is unnecessary because we record an error which should
be returned by vkEndCommandBuffer(), and the app shouldn't
submit a command buffer when this happens.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-06 19:46:21 +02:00
Samuel Pitoiset
f1d7c16e85 radv: fix prefetching compute shaders on CIK and older chips
Because the check was moved to radv_emit_prefetch_L2().

Fixes: 4ad7595f35 ("radv: rename radv_emit_prefetch() to radv_emit_prefetch_L2()")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-06 19:46:18 +02:00
Samuel Pitoiset
7fe586f6fb radv: only enable PERFECT_ZPASS_COUNTS for precision occlusion queries
This unnecessary when the precision bit flag is not set, and this
might hurt performance. The Vulkan explains that not setting
VK_QUERY_CONTROL_PRECISE_BIT might be more efficient on some
implementations.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-06 09:07:34 +02:00
Samuel Pitoiset
d53dff3bfc radv: enable the Polaris small primitive filter control
Enable it directly in the preamble, but do not enable line
on Polaris10/11/12 because there is a hw bug.

There is possibly an issue when MSAA is off, but this doesn't
regress any CTS and AMDVLK doesn't have a workaround as well.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-06 09:07:31 +02:00
Samuel Pitoiset
942fdfe357 radv: implement a fast prefetch path for the vertex stage
This allows to start draws as soon as possible.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-04-05 10:03:48 +02:00
Samuel Pitoiset
4ad7595f35 radv: rename radv_emit_prefetch() to radv_emit_prefetch_L2()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-04-05 10:03:45 +02:00
Samuel Pitoiset
a8a696a38f radv: use a mask for VBOs and shaders prefetching
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-04-05 10:03:42 +02:00
Samuel Pitoiset
922cd38172 radv: implement out-of-order rasterization when it's safe on VI+
Disabled by default for now, it can be enabled with
RADV_PERFTEST=outoforder.

No CTS regressions on Polaris, and all Vulkan games I tested
look good as well.

Expect small performance improvements for applications where
out-of-order rasterization can be enabled by the driver.

Loosely based on RadeonSI.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-04 13:32:00 +02:00
Samuel Pitoiset
d6709c91a6 radv: change blend_enable field to use four bits per CB
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-04 13:32:00 +02:00
Samuel Pitoiset
a8818d1af2 radv: scan which color blend attachments are enabled
With cb_target_enabled_4bit in order to have four bits per CB.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-04 13:32:00 +02:00
Samuel Pitoiset
ac456d0d1b radv: put more fields in radv_blend_state
Some will be used for further optimizations (ie. out-of-order rast).

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-04 13:32:00 +02:00
Samuel Pitoiset
e4976ca33b radv: do not always disable dual quad mode when chip has RbPlus
For GFX9+ only, RadeonSI does this too.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-04 13:32:00 +02:00
Samuel Pitoiset
b8c06a961c radv: don't use the SPI barrier management bug workaround
Ported from RadeonSI.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-04 13:32:00 +02:00
Samuel Pitoiset
ab147cba77 radv: mask out high VM address bits in registers where needed
Ported from RadeonSI.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-04 13:32:00 +02:00
Samuel Pitoiset
acf60abc54 radv: enable VK_EXT_shader_viewport_index_layer
The driver already supports exporting the Layer and ViewportIndex
built-ins from vertex or tessellation shaders.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-03 14:05:46 +02:00
Mike Lothian
7e144ace95 ac/nir: Fix include for LLVMAddPromoteMemoryToRegisterPass
Include llvm-c/Transforms/Utils.h with the newest LLVM 7

Signed-of-by: Mike Lothian <mike@fireburn.co.uk>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2018-04-02 14:27:29 -04:00
Marek Olšák
dc04e4bba2 radeonsi: move FMASK shader logic to shared code
We'll need it for FBFETCH in both TGSI and NIR paths.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-04-02 13:55:22 -04:00
Marek Olšák
5d91c2ccea ac/gpu_info: print GB_ADDR_CONFIG 2018-04-02 13:10:37 -04:00
Marek Olšák
b1f33086ec ac/gpu_info: reorder the fields and print them nicely 2018-04-02 13:10:37 -04:00
Marek Olšák
a0a96819e1 ac/gpu_info: rename has_virtual_memory -> r600_has_virtual_memory 2018-04-02 13:10:37 -04:00
Marek Olšák
32b3932de1 ac/gpu_info: don't print irrelevant fields 2018-04-02 13:10:37 -04:00
Samuel Pitoiset
2a329f4ada radv: set SAMPLE_RATE to the number of samples of the current fb
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-03-30 17:32:15 +02:00
Ian Romanick
4925347ec5 util: Include bitscan.h directly
Previously bitset.h would include u_math.h to get bitscan.h.  u_math.h
lives in src/gallium/auxiliary/util while both bitset.h and bitscan.h
live in src/util.  Having the one file directly include another file
that lives in the same directory makes much more sense.

As a side-effect, several files need to directly include standard header
files that were previously indirectly included.

v2: Fix build break in src/amd/common/ac_nir_to_llvm.c.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2018-03-29 14:09:30 -07:00
Ian Romanick
d76c204d05 util: Move util_is_power_of_two to bitscan.h and rename to util_is_power_of_two_or_zero
The new name make the zero-input behavior more obvious.  The next
patch adds a new function with different zero-input behavior.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Suggested-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2018-03-29 14:09:23 -07:00
Samuel Pitoiset
e45fe0ed66 radv: fix scanning output_usage_mask with structs
To fix a regression in:
dEQP-VK.spirv_assembly.instruction.graphics.variable_init.output.struct

And the following regressions (Polaris only):
dEQP-VK.glsl.indexing.varying_array.*

Fixes: f3275ca01c ("ac/nir: only enable used channels when exporting parameters")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-03-29 10:22:10 +02:00
Daniel Schürmann
b91cd5dba4 radv: enable VK_AMD_shader_trinary_minmax extension
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-03-29 01:29:39 +02:00
Daniel Schürmann
d00fb7ce54 ac: add support for trinary_minmax instructions
v2: Add missing break (Bas)

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-03-29 01:29:35 +02:00
Bas Nieuwenhuizen
4503ff760c ac/nir: Add workaround for GFX9 buffer views.
On GFX9 whether the buffer size is interpreted as elements or bytes
depends on whether IDXEN is enabled in the instruction. If the index
is a constant zero, LLVM optimizes IDXEN to 0.

Now the size in elements is interpreted in bytes which of course
results in out of bounds accesses.

The correct fix is most likely to disable the LLVM optimization,
but we need something to work with LLVM <= 6.0.

radeonsi does the max between stride and element count on the CPU
but that results in the size intrinsics returning the wrong size
for the buffer. This would cause CTS errors for radv.

v2: Also include the store changes.

Fixes: e38685cc62 'Revert "radv: disable support for VEGA for now."'
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-03-29 00:03:03 +02:00
Marek Olšák
4f96747530 ac/surface: set AddrSurfInfoIn.format = ADDR_FMT_8 for stencil, add assertions
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105738

Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-03-28 17:23:41 -04:00
Samuel Pitoiset
1c4fdcf444 radv: enable VK_EXT_sampler_filter_minmax
Only enable for CIK+ because it's buggy on SI.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-03-28 22:55:48 +02:00
Samuel Pitoiset
413d77e7f9 radv: add support for VK_EXT_sampler_filter_minmax
The driver only supports the required formats for now.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-03-28 22:55:48 +02:00
Samuel Pitoiset
99b52aa1da radv: rename VEGA10 device name
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-03-28 20:15:17 +02:00
Samuel Pitoiset
4d2c46dda3 radv: add support for Vega12
Based on RadeonSI. Untested.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-03-28 20:15:14 +02:00
Marek Olšák
20eb44ad65 radeonsi: add support for Vega12
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2018-03-28 11:37:43 -04:00
Marek Olšák
5425d32fcf amd/addrlib: update to the latest version for Vega12
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2018-03-28 11:37:43 -04:00
Timothy Arceri
92fa89a08d ac/radeonsi: pass bindless bool to load_sampler_desc()
We also fix the base_index for bindless by using the driver
location.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-03-28 12:56:16 +11:00
Timothy Arceri
51f175028d ac/nir_to_llvm: fix component packing for double outputs
We need to wait until after the writemask is widened before we
adjust it for component packing.

Together with the previous patch this fixes a number of
arb_enhanced_layouts component layout piglit tests.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-03-28 09:59:37 +11:00
Marek Olšák
769603564e radeonsi: don't reallocate on DMABUF export if local BOs are disabled 2018-03-26 19:22:12 -04:00
Samuel Pitoiset
ccc64f3133 radv: enable TC-compat HTILE for 16-bit depth surfaces on GFX8
The hardware only supports 32-bit depth surfaces, but we can
enable TC-compat HTILE for 16-bit depth surfaces if no Z planes
are compressed.

The main benefit is to reduce the number of depth decompression
passes. Also, we don't need to implement DB->CB copies which is
fine.

This improves Serious Sam 2017 by +4%. Talos and F12017 are also
affected but I don't see a performance difference.

This also improves the shadowmapping Vulkan demo by 10-15%
(FPS is now similar to AMDVLK).

No CTS regressions on Polaris10.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-03-23 10:05:57 +01:00
Samuel Pitoiset
5ae9772245 radv: add radv_calc_decompress_on_z_planes() helper
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-03-23 10:05:55 +01:00
Samuel Pitoiset
9b8e75bee3 radv: add radv_image_is_tc_compat_htile() helper
Instead of that huge conditional that's going to be crazy.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-03-23 10:05:54 +01:00
Jason Ekstrand
884d27bcf6 nir: Rename image intrinsics to image_var
Generated with

git grep -l nir_intrinsic_image | xargs \
sed -i 's/nir_intrinsic_image/nir_intrinsic_image_var/g'

and some manual fixing in nir_intrinsics.h

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-03-23 13:48:11 +11:00
Juan A. Suarez Romero
0bf1274883 radv: autotools: add radv_extensions.h in the generated VULKAN list
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2018-03-22 18:25:39 +01:00
Juan A. Suarez Romero
13459c637a anv/radv: autotools: include vulkan_*.h headers
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2018-03-22 18:25:39 +01:00
Samuel Pitoiset
52fba3f45d radv: remove unused radv_pipeline::needs_data_cache variable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-03-22 14:30:37 +01:00
Timothy Arceri
c135316555 ac/nir_to_llvm: add frexp support
Fixes CTS tests:
KHR-GL40.gpu_shader_fp64.builtin.frexp_double
KHR-GL40.gpu_shader_fp64.builtin.frexp_dvec2
KHR-GL40.gpu_shader_fp64.builtin.frexp_dvec3
KHR-GL40.gpu_shader_fp64.builtin.frexp_dvec4

And piglit test:
tests/spec/arb_gpu_shader_fp64/execution/built-in-functions/fs-frexp-dvec4.shader_test

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-03-22 12:42:34 +11:00
Marek Olšák
f7ffa504a0 ac/surface: compute tile swizzle for GFX9
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-03-21 13:40:06 -04:00