Commit graph

118295 commits

Author SHA1 Message Date
Vasily Khoruzhick
01a451b04d lima: handle DRM_FORMAT_MOD_INVALID in resource_from_handle()
Assume that resource is tiled if we get DRM_FORMAT_MOD_INVALID
in resource_from_handle() and we don't have RO.

Fixes: 8c12f4e5f2 ("lima: enable tiling")
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-12-09 15:21:55 +00:00
Jonathan Marek
9d78cf4584 turnip: add hw binning
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-12-09 08:22:18 -05:00
Samuel Pitoiset
86dfe92bd0 radv: do not use VK_TRUE/VK_FALSE
For consistency.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-12-09 09:21:26 +01:00
Dave Airlie
d7dc14628a gallivm: add bitfield reverse and ufind_msb
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Krzysztof Raszkowski <krzysztof.raszkowski@intel.com>
2019-12-09 06:05:02 +10:00
Roland Scheidegger
1c7693e3bd gallium/scons: fix graw_gdi build
Fixes: 44a6b0107b (gallivm: add nir->llvm translation (v2))
Reviewed-by: Dave Airlie <Airlied@redhat.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2019-12-07 17:50:53 +01:00
Daniel Schürmann
8259c97b2d aco: propagate temporaries into expanded vectors
Gives a very slight decrease in code size:
Totals from affected shaders:
Code Size: 1708488 -> 1702768 (-0.33 %) bytes
Max Waves: 2858 -> 2855 (-0.10 %)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
df3e674fb3 aco: improve readfirstlane after uniform ssbo loads on GFX7
pipeline-db changes for GFX7:

80310 shaders in 40472 tests
Totals:
SGPRS: 3655900 -> 3643916 (-0.33 %)
VGPRS: 2678324 -> 2686324 (0.30 %)
Spilled SGPRs: 1730 -> 1634 (-5.55 %)
Spilled VGPRs: 14 -> 21 (50.00 %)
Scratch size: 15540 -> 15536 (-0.03 %) dwords per thread
Code Size: 136106120 -> 135457616 (-0.48 %) bytes
LDS: 1259 -> 1259 (0.00 %) blocks
Max Waves: 601014 -> 600206 (-0.13 %)

Totals from affected shaders:
SGPRS: 307832 -> 295848 (-3.89 %)
VGPRS: 267864 -> 275864 (2.99 %)
Spilled SGPRs: 770 -> 674 (-12.47 %)
Spilled VGPRs: 14 -> 21 (50.00 %)
Scratch size: 16 -> 12 (-25.00 %) dwords per thread
Code Size: 22007488 -> 21358984 (-2.95 %) bytes
LDS: 65 -> 65 (0.00 %) blocks
Max Waves: 28668 -> 27860 (-2.82 %)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
0837471463 aco: use soffset for MUBUF instructions on SI/CI
pipeline-db changes for GFX7:

80310 shaders in 40472 tests
Totals:
SGPRS: 3655300 -> 3655900 (0.02 %)
VGPRS: 2677732 -> 2678324 (0.02 %)
Spilled SGPRs: 1730 -> 1730 (0.00 %)
Spilled VGPRs: 14 -> 14 (0.00 %)
Scratch size: 15540 -> 15540 (0.00 %) dwords per thread
Code Size: 136488364 -> 136106120 (-0.28 %) bytes
LDS: 1259 -> 1259 (0.00 %) blocks
Max Waves: 601039 -> 601014 (-0.00 %)

Totals from affected shaders:
SGPRS: 316312 -> 316912 (0.19 %)
VGPRS: 273844 -> 274436 (0.22 %)
Spilled SGPRs: 770 -> 770 (0.00 %)
Spilled VGPRs: 14 -> 14 (0.00 %)
Scratch size: 16 -> 16 (0.00 %) dwords per thread
Code Size: 22724904 -> 22342660 (-1.68 %) bytes
LDS: 114 -> 114 (0.00 %) blocks
Max Waves: 30861 -> 30836 (-0.08 %)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
7b38d95b32 radv: Enable ACO on GFX7 (Sea Islands)
This patch also disables AMD_shader_ballot on GFX7 by default if ACO is used.
Note that shader_ballot works correctly, but performance seems inferior.
To enable shader_ballot use RADV_PERFTEST=shader_ballot.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
28c95cc402 aco: return to loop_active mask at continue_or_break blocks
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
0f9447ccb0 radv: disable Youngblood app profile if ACO is used
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
746165e540 aco: implement exclusive scan for SI/CI
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
7ae227effd aco: implement inclusive_scan for SI/CI
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
f895a8b1df aco: implement (clustered) reductions for SI/CI
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
9254fb4fc7 aco: don't use a scalar temporary for reductions on GFX10
This patch also adds the scalar temporary for scans on SI/CI

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
8ad43d8838 aco: flush denorms after fmin/fmax on pre-GFX9
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
21f67a3bdc radv: only flush scalar cache for SSBO writes with ACO on GFX8+
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
79ce6c1b33 aco: disable disassembly for SI/CI due to lack of support by LLVM
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
1c4afe38f2 aco: implement 64bit ine/ieq for SI/CI
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
1e1356b2ad aco: implement 64bit i2b for SI /CI
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
da7ff58835 aco: make 1/2*PI a literal constant on SI/CI
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
90fad7360d aco: implement 64bit VGPR shifts for SI/CI
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
6a586a6006 aco: split read/writelane opcode into VOP2/VOP3 version for SI/CI
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
23319add93 aco: fix disassembly of writelane instructions.
ACO writes an unused 3rd operand for internal usage
which makes LLVM recoginize it as illegal instruction.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
6fc9ddfef8 aco: recognize SI/CI SMRD hazards
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
3eed4d2be5 aco: implement quad swizzles for SI/CI
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
bde9c1e3a1 aco: move buffer_store data to VGPR if needed
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
a8195bdf2e aco: implement nir_op_isign on SI/CI
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
b8783973cd aco: only use scalar loads for readonly buffers on SI/CI
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
f27783a667 aco: implement nir_op_fquantize2f16 for SI/CI
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
caea4bbfdc aco: fix SMEM offsets for SI/CI
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
8aab92b393 aco: SI/CI - fix sampler aniso
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Dave Airlie
9b533a2ca3 aco: handle gfx7 int8/10 clamping on exports
Co-authored-by: Daniel Schürmann <daniel@schuermann.dev>

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
0d42e4d7a0 aco: Initial GFX7 Support
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
3177346bfc aco: refactor visit_store_fs_output() to use the Builder
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Jason Ekstrand
0f60aa4037 anv: Re-emit all compute state on pipeline switch
It's a very odd case to hit in the real world.  However, there are some
CTS tests which switch back and forth between dispatch and clear without
changing the pipeline.

Fixes: bc612536eb "anv: Emit a dummy MEDIA_VFE_STATE before switching..."
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-12-07 04:03:35 +00:00
Jason Ekstrand
bce1c3c668 anv: Re-capture all batch and state buffers
When we moved from allocating BOs directly to using the BO cache, we
lost the EXEC_OBJECT_CAPTURE flag on all our state buffers.

Fixes: 3119b96bdf "anv: Allocate block pool BOs from the cache"
Fixes: ee77938733 "anv: Allocate batch and fence buffers from..."
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-12-07 04:03:35 +00:00
Jason Ekstrand
865ffe4e02 anv: Return VK_ERROR_OUT_OF_DEVICE_MEMORY for too-large buffers
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-12-06 22:32:05 +00:00
Eric Anholt
e3b249f166 freedreno: Enable texture upload memory throttling.
Fixes oom-killer during streaming-texture-upload, which I found while
trying to enable piglit in CI.

Reviewed-by: Rob Clark <robdclark@chromium.org>
2019-12-06 14:03:50 -08:00
Fritz Koenig
c496d44284 freedreno: reorder format check
With the addition of the planar formats helper, the
planar formats no longer have a valid block.bits field.
Calling util_format_get_blocksize therefore asserts.

Reorder the check to see if the format is supported
before doing the query to get the blocksize.

Fixes: 20f132e5ef ("gallium/util: add planar format layouts and helpers")

Signed-off-by: Fritz Koenig <frkoenig@google.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
2019-12-06 21:27:10 +00:00
Nanley Chery
21376cffb3 iris: Fix import of multi-planar surfaces with modifiers
Multi-planar surfaces are allowed to have modifiers. Don't require
DRM_FORMAT_MOD_INVALID in order to create a surface for each plane
defined by the format.

Fixes: 246eebba4a ("iris: Export and import surfaces with modifiers that have aux data")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-12-06 20:31:48 +00:00
Nanley Chery
51ee8fff9b gallium: Store the image format in winsys_handle
This format will be used to properly handle planar images with modifiers
in iris.

Fixes: 246eebba4a ("iris: Export and import surfaces with modifiers that have aux data")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-12-06 20:31:48 +00:00
Nanley Chery
d5c857837a gallium/dri2: Fix creation of multi-planar modifier images
The commit noted below assumed and enforced that DRM_MOD_INVALID was the
only valid modifier for multi-planar imported images. Due to that, it
required that modifier on multi-planar images to:

   1. Allow multiple planes.
   2. Perform YUV format lowering and extent adjustments.
   3. Use buffer_index to correctly map the given planes.

Fix these issues by removing or updating the code built on that
assumption.

Fixes: 2066966c10 ("gallium/dri2: Support creating multi-planar modifier images")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-12-06 20:31:48 +00:00
Kenneth Graunke
ab016a6a2d meson: Include iris in default gallium-drivers for x86/x86_64
We build i965 by default on x86/x86_64 platforms; let's build iris too.

Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2019-12-06 12:27:26 -08:00
Jason Ekstrand
f9a3d9738b anv: Use BO fences/semaphores for AcquireNextImage
Instead of doing a dummy submit on the command buffer for the fence or a
dummy semaphore and trusting in implicit sync, this commit moves us to
take advantage of implicit sync and just use the WSI image BO as the
fence.  Both semaphores and fences require a tiny bit of extra plumbing
to do this but the result is that we can get rid of a bunch of the extra
synchronization we're doing today.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-12-06 19:58:07 +00:00
Jason Ekstrand
ecc119a96e anv: Add a fence_reset_reset_temporary helper
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-12-06 19:58:07 +00:00
Jason Ekstrand
ccb7d606f1 anv: Use submit-time implicit sync instead of allocate-time
In 83b943cc2f, we started making all VkDeviceMemory BOs resident all
the time.  One unfortunate side-effect of this is that every
vkQueueSubmit sets EXEC_OBJECT_WRITE on every WSI memory object which
means that X server or Wayland compositor, instead of waiting on the
last vkQueueSubmit to actually write the buffer, now waits on the last
vkQueueSubmit to from that driver instance relative to whenever the
compositor's GL driver instance calls execbuf.  This potentially leads
to a lot of extra synchronization that we didn't intend to have.

Instead, this commit makes it so that we leave WSI memory objects with
EXEC_OBJECT_ASYNC most of the time and only unset EXEC_OBJECT_ASYNC and
set EXEC_OBJECT_WRITE in the dummy execbuf that we do as part of
vkQueuePresent.  This should hopefully result in tighter integration
with the compositor, lower latency, and better performance.

Testing with DOOM 2016, this seems to reduce latency by at least a frame
if not two and makes the game much more responsive.  Testing was,
however, subjective, so we don't have any hard data on that.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-12-06 19:58:07 +00:00
Jason Ekstrand
6ebf677cfd anv: Always add in EXEC_OBJECT_WRITE when specified in extra_flags
Otherwise, we're trusting in the execbuf_add_bo which sets
EXEC_OBJECT_WRITE to to always be the first one that gets called.  This
is likely true for fences but it seems somewhat fragile.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-12-06 19:58:07 +00:00
Jason Ekstrand
778b51f491 vulkan/wsi: Add a hooks for signaling semaphores and fences
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-12-06 19:58:07 +00:00
Jason Ekstrand
48e23a6406 vulkan/wsi: Provide the implicitly synchronized BO to vkQueueSubmit
This lets us treat the implicit synchronization that we need for X11 and
Wayland like a semaphore.  Instead of trusting the driver to somehow
figure out when that memory object needs to be signaled, we provide an
explicit point where the driver can set EXEC_OBJECT_WRITE and signal the
dma_fence on the BO.  Without this, we have to somehow track inside the
driver when WSI buffers are actually used to avoid extra synchronization
dependencies.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-12-06 19:58:06 +00:00