Commit graph

117072 commits

Author SHA1 Message Date
Boris Brezillon
5375d009be panfrost: Pass referenced BOs to the SUBMIT ioctls
Instead of manually adding the BOs from the various SLAB pools plus
the one backing the color FB, we insert them in the BO set attached
to the job and let panfrost_drm_submit_job() pass all BOs from this set
to the SUBMIT ioctl.
This means we are now passing all referenced BOs and let the scheduler
wait on referenced BO fences if needed.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
2019-07-02 15:00:21 +02:00
Boris Brezillon
3557746e0d panfrost: Make SLAB pool creation rely on BO helpers
There's no point duplicating the code, and it will help us simplify
the bo_handles[] filling logic in panfrost_drm_submit_job().

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
2019-07-02 14:59:28 +02:00
Boris Brezillon
c684a79669 panfrost: Add the panfrost_drm_{create,release}_bo() helpers
To avoid the panfrost_memory <-> panfrost_bo dance done in
panfrost_resource_create_bo() and panfrost_bo_unreference().

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
2019-07-02 14:58:51 +02:00
Boris Brezillon
948fddfc42 panfrost: Move the mmap BO logic out of panfrost_drm_import_bo()
So we can re-use it for the panfrost_drm_create_bo() function we are
about to introduce.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
2019-07-02 14:58:51 +02:00
Boris Brezillon
8d4afcdacc panfrost: Avoid passing winsys handles to import/export BO funcs
Let's keep a clear split between ioctl wrappers and the rest of the
driver. All the import BO function need is a dmabuf FD and the screen
object, and the export one should only take care of generating a dmabuf
FD out of a BO object. Winsys handle manipulation should stay in the
resource.c file.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
2019-07-02 14:58:51 +02:00
Boris Brezillon
aa5bc35f31 panfrost: Move BO meta-data out of panfrost_bo
That's what most (all?) implementation seem to do, and my understanding
is that a BO is just a bunch of memory that can be used for anything GPU
related, not only texture/FB resources.

Let's move those meta data in panfrost_resource so we can use
panfrost_bo for all kind of memory allocation and make BO allocation
more consistent.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
2019-07-02 14:58:51 +02:00
Boris Brezillon
c4f4193ad4 panfrost: Stop exposing internal panfrost_drm_*() functions
panfrost_drm_submit_job() and panfrost_fence_create() are not used
outside of pan_drm.c.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
2019-07-02 14:58:51 +02:00
Boris Brezillon
6ba61324f0 panfrost: Get rid of the "free imported BO" logic
bo->imported was never set to true which means this path was never taken.
Moreover, panfrost_drm_free_imported_bo() is doing missing the munmap()
call which seems wrong because the import BO function calls mmap().

Let's just kill this function along with the ->imported field.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
2019-07-02 14:57:35 +02:00
Boris Brezillon
079aaa9c6d panfrost: Get rid of the panfrost_driver abstraction leftovers
Commit 5f81669d88 ("panfrost: Remove the panfrost_driver abstraction")
left a few things behind, remove them now.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
2019-07-02 14:57:35 +02:00
Boris Brezillon
6608642d21 panfrost: Move scanout res creation out of panfrost_resource_create()
Which improves readability and help us avoid a memory leak.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
2019-07-02 14:57:35 +02:00
Boris Brezillon
873b7b93e8 panfrost: Add the sampled texture BO to the job
Otherwise we get random use-after-{free,unmap} errors.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
---
Changes in v2:
- Move the panfrost_job_add_bo() call out of the loop
2019-07-02 14:57:35 +02:00
Samuel Pitoiset
6cc213b3c1 radv: enable DCC for layers on GFX8
It's currently only enabled if dcc_slice_size is equal to
dcc_slice_fast_clear_size because the driver assumes that
portions of multiple layers are contiguous but it's not
always true.

Still not supported on GFX9.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-02 09:38:02 +02:00
Samuel Pitoiset
233224c7f7 radv: do not enable DCC for mipmapped arrays because performance is worse
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-02 09:38:00 +02:00
Samuel Pitoiset
e41e575e24 radv: implement clearing DCC layers on GFX8
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-02 09:37:56 +02:00
Samuel Pitoiset
e47c68b7b0 radv: merge radv_dcc_clear_level() into radv_clear_dcc()
This will help for clearing DCC arrays because we need to know
the subresource range.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-02 09:37:51 +02:00
Samuel Pitoiset
f772fe6a11 radv: add support for decompressing DCC layers with compute
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-02 09:37:49 +02:00
Samuel Pitoiset
83297baf2d ac: compute the DCC fast clear size per slice on GFX8
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-02 09:37:44 +02:00
Samuel Pitoiset
6517d226ac ac: compute the size of one DCC slice on GFX8
Addrlib doesn't provide this info. Because DCC is linear, at least
on GFX8, it's easy to compute the size of one slice.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-02 09:37:41 +02:00
Kenneth Graunke
457a55716e iris: Defer closing and freeing VMA until buffers are idle.
There will unfortunately be circumstances where we cannot re-use a
virtual memory address until it's no longer active on the GPU.  To
facilitate this, we instead move BOs to a "dead" list, and defer
closing them and returning their VMA until they are idle.  We
periodically sweep these away in cleanup_bo_cache, which triggers
every time a new object's refcount hits zero.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Tested-by: Jordan Justen <jordan.l.justen@intel.com>
2019-07-02 07:23:55 +00:00
Kenneth Graunke
07f3455664 iris: Add an explicit alignment parameter to iris_bo_alloc_tiled().
In the future, some images will need to be aligned to a larger value
than 4096.  Most buffers, however, don't have any such requirement,
so for now we only add the parameter to iris_bo_alloc_tiled() and
leave the others with the simpler interface.

v2: Fix missing alignment in vma_alloc, caught by Caio!

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Tested-by: Jordan Justen <jordan.l.justen@intel.com>
2019-07-02 07:23:55 +00:00
Iago Toral Quiroga
042aeffd5b v3d: do not flush jobs that are synced with 'Wait for transform feedback'
Generally, we achieve this by skipping the flush on calls to
v3d_flush_jobs_writing_resource() when we detect that the resource is written
in the current job from a transform feedback write.

The exception to this is the case where the caller is about to map the
resource, in which case we need to flush immediately since we can only emit
'Wait for transform feedback' commands on rendering jobs. We add a parameter
to the function so the caller can identify that scenario.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-02 08:57:20 +02:00
Iago Toral Quiroga
88cbc4f7f6 v3d: emit 'Wait for transform feedback' commands when needed
The hardware can flush transform feedback writes before reads in the same
job by inserting this command.

This patch detects when the rendering state for the current draw call reads
resources that had been previously written by transform feedback in the
same job and inserts the 'Wait for transform feedback' command before
emitting the new draw.

v2 (Eric):
  - this was intended to look at job->tf_write_prscs for TF jobs.
  - clear job->tf_write_prscs after we emit the TF flush.
  - can skip flushes for fragment shader reads from TF.

v3 (Eric):
  - all resources in job->tf_write_prscs are resources written by TF so
   we don't need to check if they are bound to PIPE_BIND_STREAM_OUTPUT.
  - documented optimization opportunity for geometry stages.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-02 08:57:20 +02:00
Iago Toral Quiroga
c7dff0e614 v3d: keep track of resources written by transform feedback
The hardware provides a feature to sync reads from previous transform feedback
writes in the same job so if we use this mechanism we no longer have to flush
the job.

In order to identify this scenario we need a mechanism to identify resources
that are written by transform feedback.

v2: use _mesa_pointer_set_create (Eric)

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-02 08:57:20 +02:00
Mike Blumenkrantz
c8dcc308cc st/dri: fix typo in format table for GR1616 format
the dri image format here should match the fourcc format

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-07-01 15:17:10 -07:00
Mike Blumenkrantz
08fc14a979 st/dri: pass dri2_format_mapping directly to dri2_create_image_from_winsys
this makes the entire struct available for use here

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-07-01 15:16:56 -07:00
Mike Blumenkrantz
2cc85670a7 mesa/st: simplify format usage in st_bind_egl_image
the formats handled in the switch statement will always return an
unknown mesa format, so process them directly and leave the default
case for other/unknown formats

no functional changes

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-07-01 15:16:43 -07:00
Kenneth Graunke
9b1b971491 iris: Use MI_COPY_MEM_MEM for tiny resource_copy_region calls.
If our resource_copy_region size is a small number of DWords, then
instead of firing up BLORP, we can simply use MI_COPY_MEM_MEM (after
a CS stall).  We also try and select the optimal batch.

Improves performance in Shadow of Mordor on Low settings at 1920x1080
on Skylake GT4e by 0.689096% +/- 0.473968% (n=4).  It tries to copy
4 bytes of data to a buffer which was most recently used as a writable
compute shader SSBO.  Previously we were switching from compute to the
render pipeline, then firing up all of blorp_buffer_copy...for 4 bytes.

I arbitrarily decided to support 4/8/12/16 bytes.  Jason thinks this
is about the right threshold where it's cheaper to use MI_COPY_MEM_MEM.
2019-07-01 13:59:49 -07:00
Bas Nieuwenhuizen
d7e6541cc7 radv: Only allocate supplied number of descriptors when variable.
Fixes: b5e04e9217 "radv: Support allocating variable size descriptor sets."
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111019
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-01 20:53:33 +02:00
Eric Engestrom
177c35bf13 egl: simplify loop
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Sagar Ghuge<sagar.ghuge@intel.com>
2019-07-01 19:35:22 +01:00
Eric Anholt
67ffb853f0 sparc: Reuse m_vector_asm.h.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-07-01 11:14:29 -07:00
Eric Anholt
20294dceeb mesa: Enable asm unconditionally, now that gen_matypes is gone.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-01 11:14:10 -07:00
Eric Anholt
52a39a332f mesa: Replace gen_matypes with a simple header for V4F/mat layout.
We can greatly simplify our builds by just hardcoding GLvector4f and
GLmatrix's layouts.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-07-01 11:12:15 -07:00
Eric Anholt
1738b38ce8 matypes: Drop some unused defines.
Most of these haven't been used since the conversion from checked-in
matypes to generation.  By cutting down the generated contents, this
should clarify why the file is generated: we need
architecture-specific offsets to the V4F fields in the asm that uses
it.

v2: Keep matrix offsets to prevent x86 build breakage..

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-07-01 11:09:26 -07:00
Eric Engestrom
1835f30097 meson: drop duplicate source & inc_dir
These two are already pulled from `idep_vulkan_util_headers`.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2019-07-01 18:53:57 +01:00
Eric Engestrom
04e0ac59b1 swrast: simplify function pointer calls
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
2019-07-01 18:51:49 +01:00
Eric Engestrom
fbf7c38da3 egl/wayland: use bitset.h for formats bit set
Currently only 7 formats are supported, but we don't want the 16 limit
(it's an `unsigned`) to hit us by surprise :]

Let's use bitset.h's BITSET magic to allow us to have any number of
formats, with a static assert to make sure we don't forget to update it.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-07-01 18:35:54 +01:00
Sagar Ghuge
d5f63990b4 intel/tools: Add assembler unit tests for ROL/ROR instructions
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-01 10:14:22 -07:00
Sagar Ghuge
e9c35dd7cc intel/tools: Add ROL/ROR support in assembler
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-01 10:14:22 -07:00
Sagar Ghuge
456557a837 nir: Add lower_rotate flag and set to true in all drivers
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Suggested-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-01 10:14:22 -07:00
Sagar Ghuge
1e92e83856 intel/compiler: Emit ROR and ROL instruction
v2: Reorder patch (Matt Turner)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-01 10:14:22 -07:00
Sagar Ghuge
80117117bd nir: Add optimization to use ROR/ROL instructions
v2: 1) Add more optimization rules for ROL/ROR (Matt Turner)
    2) Add lowering rules for ROL/ROR (Matt Turner)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-01 10:14:22 -07:00
Sagar Ghuge
81d342e2a1 nir: Add urol and uror opcodes
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-01 10:14:22 -07:00
Sagar Ghuge
83fdec0f0d intel/compiler: Enable the emission of ROR/ROL instructions
v2: 1) Drop changes for vec4 backend as on Gen11+ we don't support
       align16 mode (Matt Turner)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-01 10:14:22 -07:00
Alyssa Rosenzweig
8d74749f81 panfrost: Implement instanced rendering
We implement GLES3.0 instanced rendering with full support for instanced
arrays (via instance divisors). To do so, we use the new invocation
helpers to invoke a triplet of (1, vertex_count, instance_count), rather
than simply (1, vertex_count, 1). We rewrite the attribute handling code
into a new pan_instancing.c file which handles both the simple LINEAR
case for non-instanced as well as each of the new instancing cases:
MODULO (for per-vertex attributes), POT and NPOT divisors.

As a side effect, we rework how vertex buffers are handled, duplicating
them to be 1:1 with vertex descriptors to simplify instancing code paths
dramatically. This might be a performance regression, but this remains
to be seen; if so, we can always deduplicate later with some added logic
in pan_instancing.c

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-01 07:50:57 -07:00
Alyssa Rosenzweig
e9e22546ff panfrost/decode: Compute padded_num_vertices for MODULO
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-01 07:49:18 -07:00
Alyssa Rosenzweig
9b97ed1250 panfrost/midgard: Emit type appropriate ld_vary
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-01 07:42:56 -07:00
Alyssa Rosenzweig
aa333ac6ad panfrost/midgard: Add unsigned ld/st ops
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-01 07:42:55 -07:00
Alyssa Rosenzweig
bbc050b82e panfrost/midgard: Use the appropriate ld_attr type
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-01 07:42:55 -07:00
Alyssa Rosenzweig
c9b164f9b5 panfrost: Implement dispatch helpers
Rather than open-coding workgroups_shift_* type fields, we include a
general routine for packing the vertex/tiler/compute descriptor based on
the provided dispatch parameters.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-01 07:42:55 -07:00
Alyssa Rosenzweig
8fd748de3d panfrost: Remove ancient comment
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-01 07:42:55 -07:00