Commit graph

1947 commits

Author SHA1 Message Date
Jason Ekstrand
292ceb297c v3dv: Enable VK_EXT_debug_utils
It's implemented in common code as long as you use vk_command_buffer.

Acked-by: Emma Anholt <emma@anholt.net>
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15560>
2022-04-06 01:18:23 +00:00
Omar Akkila
4208895175 ci: bump VK-GL-CTS to 1.3.1.1
Signed-off-by: Omar Akkila <omar.akkila@collabora.com>
Acked-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15668>
2022-04-04 23:04:33 +00:00
Iago Toral Quiroga
827ef5fba9 v3dv: fix limits for inline uniform blocks
We don't support 'Update After Bind', however, the limits for this
model also include the ones without it. See the with or without remark
in the spec below:

"maxPerStageDescriptorUpdateAfterBindInlineUniformBlocks is similar to
 maxPerStageDescriptorInlineUniformBlocks but counts descriptor bindings
 from descriptor sets created with or without the
 VK_DESCRIPTOR_SET_LAYOUT_CREATE_UPDATE_AFTER_BIND_POOL_BIT bit set."

Fixes:
dEQP-VK.api.info.vulkan1p2_limits_validation.ext_inline_uniform_block

Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15732>
2022-04-04 09:28:55 +00:00
Iago Toral Quiroga
597560e27c broadcom/compiler: always enable per-quad on spill operations
This ensures that any channels used for helper invocations are
also spilled/filled correctly.

Alternatively, we could recursively track all temps that get
involved in computing values that are then used in explicit
(dfdx,dfdy) or implicit (texture coordinates for mipmap or
anisotropic filtering, etc) derivatives, and only enable
per-quad on these (or disable spilling of any of these
values).

Fixes:
dEQP-VK.graphicsfuzz.cov-dfdx-dfdy-after-nested-loops

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15705>
2022-04-01 08:53:50 +00:00
Jason Ekstrand
688d478045 v3dv/queue: Rework multisync_free
Thix fixes two bugs.  First, we stop leaking in/out fences with
multisync.  Because the in_syncs and out_syncs parameters to
set_multisync were arrays and not pointers to arrays, the caller's
in_syncs and out_syncs pointers never got set and remained NULL so
multisync_free() always sees to NULL pointers and does nothing, leaking
both arrays.  Not sure how this isn't showing up in the dEQP leak check
tests.

Second, the struct drm_v3d_multi_sync was in the scope of the then
clause of the `if (device->pdevice->caps.multisync)` so it goes out of
scope before the ioctl.  This is, effectively, a use-after-free and,
depending on stack allocation details, may result in the multisync
extension struct getting stompped before the ioctl.

Fixes: ff8586c345 ("v3dv: enable multiple semaphores on cl submission")
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15512>
2022-03-29 14:38:41 +00:00
Iago Toral Quiroga
7f6ecb8667 v3dv: add reference counting for descriptor set layouts
The spec states that descriptor set layouts can be destroyed almost
at any time:

   "VkDescriptorSetLayout objects may be accessed by commands that
    operate on descriptor sets allocated using that layout, and those
    descriptor sets must not be updated with vkUpdateDescriptorSets
    after the descriptor set layout has been destroyed. Otherwise,
    descriptor set layouts can be destroyed any time they are not in
    use by an API command."

Based on a similar fix for RADV.

Gitlab: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5893
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15634>
2022-03-29 11:28:39 +00:00
Iago Toral Quiroga
ca861bd6f4 v3dv: drop unnecessary memset
We are already zeroing when we allocate the descriptor set layout
memory with vk_object_zalloc.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15634>
2022-03-29 11:28:39 +00:00
Iago Toral Quiroga
591eed30b2 v3dv: fix sampler array addressing in v3dv_descriptor_set_layout
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15634>
2022-03-29 11:28:39 +00:00
Alejandro Piñeiro
81039feda4 broadcom: update language on V3D_DEBUG options
Some typos, and bad grammar.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15593>
2022-03-28 19:21:48 +00:00
Iago Toral Quiroga
ce849032a4 broadcom/compiler: allow ldunifa with indirect uniform loads
We handle uniforms by copying them into the uniform stream to be
consumed with ldunif when they have a constant offset. Otherwise
we fallback to general TMU access, which has more latency.

However, just like we did for UBOs and read-only SSBOs, we can
also try to use the unifa mechanism to handle indirect accesses
in certain cases instead of the TMU fallback.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15575>
2022-03-28 10:44:13 +00:00
Iago Toral Quiroga
ea3223e7a4 v3dv: implement VK_EXT_inline_uniform_block
Inline uniform blocks store their contents in pool memory rather
than a separate buffer, and are intended to provide a way in which
some platforms may provide more efficient access to the uniform
data, similar to push constants but with more flexible size
constraints.

We implement these in a similar way as push constants: for constant
access we copy the data in the uniform stream (using the new
QUNIFORM_UNIFORM_UBO_*) enums to identify the inline buffer from
which we need to copy and for indirect access we fallback to
regular UBO access.

Because at NIR level there is no distinction between inline and
regular UBOs and the compiler isn't aware of Vulkan descriptor
sets, we use the UBO index on UBO load intrinsics to identify
inline UBOs, just like we do for push constants. Particularly,
we reserve indices 1..MAX_INLINE_UNIFORM_BUFFERS for this,
however, unlike push constants, inline buffers are accessed
through descriptor sets, and therefore we need to make sure
they are located in the first slots of the UBO descriptor map.
This means we store them in the first MAX_INLINE_UNIFORM_BUFFERS
slots of the map, with regular UBOs always coming after these
slots.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15575>
2022-03-28 10:44:13 +00:00
Boris Brezillon
56a2ccf058 v3dv: Stop using VK_OUTARRAY_MAKE()
We're trying to replace VK_OUTARRAY_MAKE() by VK_OUTARRAY_MAKE_TYPED()
so people don't get tempted to use it and make things incompatible with
MSVC (which doesn't support typeof()).

Suggested-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15522>
2022-03-25 11:00:02 +00:00
Jason Ekstrand
19f56e3fc4 v3dv: Drop GetPhysicalDeviceQueueFamilyProperties
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15459>
2022-03-18 11:19:14 -05:00
Iago Toral Quiroga
4f284254e4 v3dv: support importing external semaphores
This was waiting for multisync support in our kernel interface so
we can wait on the actual imported payload of a semaphore rather
than the last job we submitted.

Reviewed-by: Melissa Wen <mwen@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15342>
2022-03-18 13:17:58 +00:00
Iago Toral Quiroga
fa1b10f36d v3dv: lock around noop job submits
Any thread we create may end up creating/submitting at least a
noop job, which is a shared object. Before multisync, this was
an issue only for the creation of the job itself, but with
multisync we can also modify parameters of the noop job
every time it is used (for signaling and serialization
configuration).

This change adds a noop mutex that all threads (main, wait and
master) take before submitting a noop job to ensure concurrent
access is not an issue.

Fixes flakyness observed with multisync with the following test:
dEQP-VK.api.command_buffers.secondary_execute_twice

Reviewed-by: Melissa Wen <mwen@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15342>
2022-03-18 13:17:58 +00:00
Iago Toral Quiroga
daa865fb2c v3dv: fix semaphore wait from CPU job
If a CPU job comes first in a command buffer with a semaphore wait operation
we need to wait on the CPU for the semaphore to be signaled before we process
the job.

We have been doing this with a WaitForIdle operation, but that only works
if the semaphore has been submitted for signaling from the same instance
of the driver. If we have an imported payload from another instance in our
semaphore however, waitForIdle may return too early since the submission
to signal the semaphore may have been submitted by a different instance
of the driver as well, and our wait for idle checks only know about this
instance submissions.

To fix this, we always submit a noop job from our instance that waits on
the semaphores on the GPU and follow up with WaitForIdle to wait for that
to complete.

Fixes test failures and/or assert crashes in:
dEQP-VK.synchronization.cross_instance.*
(when enabling support for semaphore imports)

Reviewed-by: Melissa Wen <mwen@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15342>
2022-03-18 13:17:58 +00:00
Iago Toral Quiroga
3b8ab8a9ce v3dv: don't signal semaphores/fences from a wait thread
When we have a wait thread we can't ensure that the last job in the last
command buffer will be the one to signal semaphores because in this case
there is no gurantee that jobs from command buffers in the batch will be
submitted to the GPU in order, as those put in a wait thread will be
submitted later when the event wait operation is completed.

Instead, we need to wait for all outstanding wait threads to complete
and only then we should signal any semaphores or fences.

This also fixes a bug where the wait for events was the last job in
the command buffer. In this case, once the event wait is completed
we have no additional jobs to submit and thus would never try to
signal semaphores or fences.

Reviewed-by: Melissa Wen <mwen@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15342>
2022-03-18 13:17:58 +00:00
Iago Toral Quiroga
03840bfcd1 v3dv: fix temporary imports of semaphores and fences with multisync
This is preparatory work to expose support for importing semaphores, which
was waiting on kernel multisync support.

When we implemented user-space multisync support we didn't handle
temporary fence/semaphore payload imports at all, so we fix that here.

Also, we add a has_temp boolean flag to identify the case where we have
a temporary payload in a fence/sempahore instead of just checking if
temp_sync is not 0. This is necessary to support semaphore imports
(for which we are not exposing support yet) because these need to drop
the temporary payload when they are used as wait semaphores in a submit,
but we can't destroy the underlying temp_sync at that point because it
needs to survive at least until the submit is finished, so instead
we use a flag to tell if we have an active temporary payload or not,
and we simply destroy any temp_sync on a semaphore destroy or any new
import on the same semaphore. We only strictly need this flag for
semaphores because fences drop the temporary payload when they are
reset, which happens in the CPU and can only be done if the GPU is not
using the fence, but we add the same flag for the fence for consistency.

Reviewed-by: Melissa Wen <mwen@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15342>
2022-03-18 13:17:58 +00:00
Iago Toral Quiroga
5a11a2fb6c v3dv: don't expose image load/store features for linear images
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15342>
2022-03-18 13:17:58 +00:00
Iago Toral Quiroga
0590ce1362 v3dv: return early on image to buffer blit copies if image is linear
This path uses a shader blit to implement the copy which is only
supported for tiled images (except 1D). While blit_shader() already
checks for this, this path does a lot of heavy lifting to prepare for
the blit_shader call so we rather avoid that if possible when we know
blit_shader won't be able to implement the blit.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15342>
2022-03-18 13:17:58 +00:00
Iago Toral Quiroga
397f4963ed v3dv: TFU destination must be UIF
We had some code that considered the possibility that the destination
might be linear when configuring TFU jobs, but we never actually allow
for this to happen since we avoid hitting these paths in that case, as
the TFU always produces UIF results. Instead, add an assert when
producing the TFU packet to ensure we are expecting a UIF result.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15342>
2022-03-18 13:17:58 +00:00
Alejandro Piñeiro
e3d905ec39 v3dv/pipeline: use new helper vk_shader_module_to_nir
In addition to use the helper, we also remove some of the lowering we
had at preprocess_nir, as they are called now by the helper.

As we are here we also move the call to nir_lower_sysvals_to_varyings,
that for some reason we were calling it before preprocess_nir.

It is worth to note that with this change we lose the ability to debug
the NIR just after spirv_to_nir using V3D_DEBUG, as now this is done
on vk_spirv_to_nir, and as mentioned that includes several lowerings
now. The workaround to that is to use NIR_DEBUG.

We also needed to change how to check the entrypoint on the broadcom
compiler, checking just if it is an entrypoint, instead of assuming
that the name will be "main".

v2: tweak comment, squash v3dv and compiler change (Iago)

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15449>
2022-03-18 11:05:11 +00:00
Juan A. Suarez Romero
730a294b90 v3dv: implement VK_EXT_line_rasterization
Allow to choose the line rasterization algorithm. It supports
rectangular and Bresenham-style line rasterization.

v2 (Iago):
 - Update documentation.

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15407>
2022-03-18 09:38:38 +00:00
Juan A. Suarez Romero
22759e9174 v3dv: add subpixel precision definition
Move number of bits for subpixel precision in rasterizer to a define.

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15407>
2022-03-18 09:38:38 +00:00
Juan A. Suarez Romero
b53dda6da8 broadcom: add line rasterization mode to packet definition
Add the supported line rasterization modes as enums in the XML packet
definition.

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15407>
2022-03-18 09:38:38 +00:00
Juan A. Suarez Romero
102ae4bdc8 broadcom: add on-disk cache debug option
Add support for`V3D_DEBUG=cache`, which prints on-disk cache events.

v2:
 - Use same debug format for v3d and v3dv (Alejandro)

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15380>
2022-03-18 08:58:01 +00:00
Iago Toral Quiroga
5c1302f47c v3dv: expose VK_EXT_image_drm_format_modifier
This has been implemented for a while but we could not expose it on
Vulkan 1.0 because the extension declares a dependency on
VK_KHR_sampler_ycbcr_conversion, which we don't implement, and
CTS would complain.

On Vulkan 1.1 however, VK_KHR_sampler_ycbcr_conversion was promoted
to core as an optional feature, and this is enough for the the
dependency to be satisfied, even if the feature is not supported,
meaning that we can now expose the extension.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15426>
2022-03-18 06:42:06 +00:00
Juan A. Suarez Romero
c432bfe74b broadcom/ci: Update flake list
Some of the tests marked as flake didn't show up as flakes for a long
time (more than 3 months). So likely they are already fixed.

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Acked-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15411>
2022-03-17 13:56:41 +00:00
Juan A. Suarez Romero
dfb6438392 v3dv: change MESA_GLSL_CACHE envvar reference
This was renamed to MESA_SHADER_CACHE.

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15390>
2022-03-17 11:16:45 +01:00
Juan A. Suarez Romero
000b935c50 v3dv/ci: add test to skip list
Add test that it is a timeout in the CI, but otherwise it passes.

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15374>
2022-03-14 18:55:13 +00:00
Tapani Pälli
adea096029 ci: update various ci result files
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12936>
2022-03-11 09:58:28 +00:00
Iago Toral Quiroga
49b5431197 broadcom/compiler: remove unused functions
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15302>
2022-03-10 07:25:37 +00:00
Iago Toral Quiroga
44feff93c2 broadcom/compiler: don't always assign r5 if available
Instead, only favor assigning r5 if we have first decided to
assign an accumulator. This helps with assining r5 to short
lived uniforms, favoring accumulator rotation to facilitate
QPU merges.

total instructions in shared programs: 12656164 -> 12628339 (-0.22%)
instructions in affected programs: 5368373 -> 5340548 (-0.52%)
helped: 17420
HURT: 9996

total uniforms in shared programs: 3704776 -> 3704863 (<.01%)
uniforms in affected programs: 12247 -> 12334 (0.71%)
helped: 23
HURT: 78

total max-temps in shared programs: 2153505 -> 2152684 (-0.04%)
max-temps in affected programs: 26468 -> 25647 (-3.10%)
helped: 569
HURT: 328

total fills in shared programs: 4656 -> 4657 (0.02%)
fills in affected programs: 43 -> 44 (2.33%)
helped: 0
HURT: 1

total sfu-stalls in shared programs: 34728 -> 34403 (-0.94%)
sfu-stalls in affected programs: 3411 -> 3086 (-9.53%)
helped: 842
HURT: 534

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15276>
2022-03-09 15:53:04 +00:00
Iago Toral Quiroga
77f58b46d9 broadcom/compiler: add comment on why we don't use r5 with ldunifa
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15276>
2022-03-09 15:53:04 +00:00
Iago Toral Quiroga
5b140428b0 broadcom/compiler: adjust register threshold for 2-thread compiles
We have twice the registers in this case so it makes sense to double
this as well. While this causes slight regressions in shader-db
stats (due to additional register pressure), it helps us hide latency
of memory reads better on 2-thread compiles, where the thread switch
mechanism will be less effective. This shows a ~3% performance
improvement on the UE4 SunTemple demo.

total instructions in shared programs: 12642413 -> 12656164 (0.11%)
instructions in affected programs: 2272652 -> 2286403 (0.61%)
helped: 2924
HURT: 3389

total uniforms in shared programs: 3703861 -> 3704776 (0.02%)
uniforms in affected programs: 213729 -> 214644 (0.43%)
helped: 823
HURT: 1272

total max-temps in shared programs: 2150686 -> 2153505 (0.13%)
max-temps in affected programs: 191332 -> 194151 (1.47%)
helped: 1900
HURT: 1891

total spills in shared programs: 3255 -> 3274 (0.58%)
spills in affected programs: 166 -> 185 (11.45%)
helped: 3
HURT: 6

total fills in shared programs: 4630 -> 4656 (0.56%)
fills in affected programs: 367 -> 393 (7.08%)
helped: 7
HURT: 15

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15276>
2022-03-09 15:53:04 +00:00
Iago Toral Quiroga
a35b47a0b1 broadcom/compiler: add a strategy to disable scheduling of general TMU reads
This can add quite a bit of register pressure so it makes sense to disable it
to prevent us from dropping to 2 threads or increase spills:

total instructions in shared programs: 12672813 -> 12642413 (-0.24%)
instructions in affected programs: 256721 -> 226321 (-11.84%)
helped: 719
HURT: 77

total threads in shared programs: 415534 -> 416322 (0.19%)
threads in affected programs: 788 -> 1576 (100.00%)
helped: 394
HURT: 0

total uniforms in shared programs: 3711370 -> 3703861 (-0.20%)
uniforms in affected programs: 28859 -> 21350 (-26.02%)
helped: 204
HURT: 455

total max-temps in shared programs: 2159439 -> 2150686 (-0.41%)
max-temps in affected programs: 32945 -> 24192 (-26.57%)
helped: 585
HURT: 47

total spills in shared programs: 5966 -> 3255 (-45.44%)
spills in affected programs: 2933 -> 222 (-92.43%)
helped: 192
HURT: 4

total fills in shared programs: 9328 -> 4630 (-50.36%)
fills in affected programs: 5184 -> 486 (-90.62%)
helped: 196
HURT: 0

Compared to the stats before adding scheduling of non-filtered
memory reads we see we that we have now gotten back all that was
lost and then some:

total instructions in shared programs: 12663186 -> 12642413 (-0.16%)
instructions in affected programs: 2051803 -> 2031030 (-1.01%)
helped: 4885
HURT: 3338

total threads in shared programs: 415870 -> 416322 (0.11%)
threads in affected programs: 896 -> 1348 (50.45%)
helped: 300
HURT: 74

total uniforms in shared programs: 3711629 -> 3703861 (-0.21%)
uniforms in affected programs: 158766 -> 150998 (-4.89%)
helped: 1973
HURT: 499

total max-temps in shared programs: 2138857 -> 2150686 (0.55%)
max-temps in affected programs: 177920 -> 189749 (6.65%)
helped: 2666
HURT: 2035

total spills in shared programs: 3860 -> 3255 (-15.67%)
spills in affected programs: 2653 -> 2048 (-22.80%)
helped: 77
HURT: 21

total fills in shared programs: 5573 -> 4630 (-16.92%)
fills in affected programs: 3839 -> 2896 (-24.56%)
helped: 81
HURT: 15

total sfu-stalls in shared programs: 39583 -> 38154 (-3.61%)
sfu-stalls in affected programs: 8993 -> 7564 (-15.89%)
helped: 1808
HURT: 1038

total nops in shared programs: 324894 -> 323685 (-0.37%)
nops in affected programs: 30362 -> 29153 (-3.98%)
helped: 2513
HURT: 2077

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15276>
2022-03-09 15:53:04 +00:00
Iago Toral Quiroga
f783bd0d2a broadcom/compiler: define v3d-specific delays for NIR instructions
We do a few changes over NIR's defaults:

1. Lower delay for texture reads. Empirically, we don't observe any
   benefits with delays over 50 and since this delay value is still
   used by the scheduler in the "favor register pressure" case it is
   benefitial to avoid overestimating it too much.

2. Adjust delay for non-filtered TMU reads to the delay selected for
   texture reads.

3. In our case, UBO reads from dynamically uniform addresses don't
   use the TMU and have a latency of 1 instruction in the best case
   scenario or 4 at worse, so we go with 1 so we don't try to move
   this early.

This helps us get back some of what we lost when updating the
default scheduler configuration to add a delay for non-filtered
memory reads:

total instructions in shared programs: 13126587 -> 12671765 (-3.46%)
instructions in affected programs: 3764097 -> 3309275 (-12.08%)
helped: 14664
HURT: 4244

total threads in shared programs: 407208 -> 415522 (2.04%)
threads in affected programs: 8716 -> 17030 (95.39%)
helped: 4224
HURT: 67

total uniforms in shared programs: 3812698 -> 3711224 (-2.66%)
uniforms in affected programs: 335170 -> 233696 (-30.28%)
helped: 2816
HURT: 3551

total max-temps in shared programs: 2318430 -> 2159345 (-6.86%)
max-temps in affected programs: 539991 -> 380906 (-29.46%)
helped: 13173
HURT: 1440

total spills in shared programs: 49086 -> 5966 (-87.85%)
spills in affected programs: 48306 -> 5186 (-89.26%)
helped: 1655
HURT: 28

total fills in shared programs: 55810 -> 9328 (-83.29%)
fills in affected programs: 54821 -> 8339 (-84.79%)
helped: 1659
HURT: 22

LOST:   0
GAINED: 3

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15276>
2022-03-09 15:53:04 +00:00
Iago Toral Quiroga
e7a4e97076 nir/schedule: use larger delay for non-filtered memory reads
This has been pending for a long time. It is not very consistent to
add a significant delay for textures and not do it for UBOs, etc
The reason we have not been doing this so far is the accumulated effect
on register pressure for V3D as shown by shader-db results below, but
from the point of view of a generic scheduler it makes sense to do this.

Later patches will address V3D specific issues with register pressure
derived from this by letting the driver control its instruction delay
settings.

total instructions in shared programs: 12662138 -> 13126587 (3.67%)
instructions in affected programs: 1813091 -> 2277540 (25.62%)
helped: 2410
HURT: 10499

total threads in shared programs: 415858 -> 407208 (-2.08%)
threads in affected programs: 17348 -> 8698 (-49.86%)
helped: 8
HURT: 4333

total uniforms in shared programs: 3711483 -> 3812698 (2.73%)
uniforms in affected programs: 128012 -> 229227 (79.07%)
helped: 3474
HURT: 2143

total max-temps in shared programs: 2138763 -> 2318430 (8.40%)
max-temps in affected programs: 318780 -> 498447 (56.36%)
helped: 588
HURT: 11997

total spills in shared programs: 3860 -> 49086 (1171.66%)
spills in affected programs: 709 -> 45935 (6378.84%)
helped: 23
HURT: 1595

total fills in shared programs: 5573 -> 55810 (901.44%)
fills in affected programs: 1067 -> 51304 (4708.25%)
helped: 23
HURT: 1595

LOST:   3
GAINED: 0

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15276>
2022-03-09 15:53:04 +00:00
Iago Toral Quiroga
9ef499b315 broadcom/compiler: stop moving UBO loads before NIR scheduling
This doesn't have any significant impact shader-db stats and would
reduce our capacity to hide latency from the loads, so it is probably
undesirable:

total instructions in shared programs: 12663189 -> 12663186 (<.01%)
instructions in affected programs: 4222 -> 4219 (-0.07%)
helped: 9
HURT: 4

total uniforms in shared programs: 3711624 -> 3711629 (<.01%)
uniforms in affected programs: 186 -> 191 (2.69%)
helped: 0
HURT: 2

total max-temps in shared programs: 2138822 -> 2138857 (<.01%)
max-temps in affected programs: 569 -> 604 (6.15%)
helped: 1
HURT: 9

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15276>
2022-03-09 15:53:03 +00:00
Timur Kristóf
64acec0ef9 nir: Fix lowering terminology of compute system values: "from"->"to".
This is to match other NIR terminology.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15103>
2022-03-08 17:36:31 +00:00
Iago Toral Quiroga
f761f8fd9e broadcom/compiler: simplify node/temp translation during register allocation
Now that we don't sort our nodes we can arrange them so we can
easily translate between nodes and temps without a mapping table,
just applying an offset.

To do this we have a single array of nodes where twe put first the nodes
for accumulators and then the nodes for temps. With this setup we can
ensure that for any given temp T, its node is always T + ACC_COUNT.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15168>
2022-03-02 08:09:11 +00:00
Iago Toral Quiroga
871b0a7f6a broadcom/compiler: don't sort nodes for register allocation
Nodes are allocated in order to registers so initially sorting
was used to ensure that nodes with smaller life ranges would
be assigned first and therefore be more likely to get
accumulators.

However, since d81a6e5f1d now we don't rely on order to make
decisions about accumulators and instead we make policy decisions
based on actual liveness, so sorting is no longer strictly
relevant to this decision.

Furthermore, we are not re-sorting nodes after each spill either,
since that would probably require that we rebuild the interference
graph after each spill (the graph identifies nodes by their index).

Shader-db results show a significant improvement in instruction
counts, due to more optimal accumulator assignments. The reason for
this is that we use a round-robin policy for choosing the next
accumulator to assign. The idea behind this is preventing nearby
temps to be assigned to the same accumulator so that QPU scheduling
is more flexible, but if we  sort our nodes, we are basically not
assigning temps in program order any more and the round-robin policy
becomes less effective:

total instructions in shared programs: 13000420 -> 12663189 (-2.59%)
instructions in affected programs: 11791267 -> 11454036 (-2.86%)
helped: 62890
HURT: 19987

total threads in shared programs: 415874 -> 415870 (<.01%)
threads in affected programs: 20 -> 16 (-20.00%)
helped: 2
HURT: 4

total uniforms in shared programs: 3711652 -> 3711624 (<.01%)
uniforms in affected programs: 43430 -> 43402 (-0.06%)
helped: 134
HURT: 173

total max-temps in shared programs: 2144876 -> 2138822 (-0.28%)
max-temps in affected programs: 123334 -> 117280 (-4.91%)
helped: 4112
HURT: 1195

total spills in shared programs: 3870 -> 3860 (-0.26%)
spills in affected programs: 1013 -> 1003 (-0.99%)
helped: 14
HURT: 12

total fills in shared programs: 5560 -> 5573 (0.23%)
fills in affected programs: 1765 -> 1778 (0.74%)
helped: 14
HURT: 17

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15168>
2022-03-02 08:09:11 +00:00
Iago Toral Quiroga
4483cd24af broadcom/compiler: sink uniform loads
total instructions in shared programs: 13014428 -> 13000420 (-0.11%)
instructions in affected programs: 743624 -> 729616 (-1.88%)
helped: 1392
HURT: 611

total threads in shared programs: 415858 -> 415874 (<.01%)
threads in affected programs: 16 -> 32 (100.00%)
helped: 8
HURT: 0

total uniforms in shared programs: 3720410 -> 3711652 (-0.24%)
uniforms in affected programs: 113442 -> 104684 (-7.72%)
helped: 635
HURT: 29

total max-temps in shared programs: 2154268 -> 2144876 (-0.44%)
max-temps in affected programs: 61279 -> 51887 (-15.33%)
helped: 1124
HURT: 187

total spills in shared programs: 4002 -> 3870 (-3.30%)
spills in affected programs: 265 -> 133 (-49.81%)
helped: 6
HURT: 0

total fills in shared programs: 5788 -> 5560 (-3.94%)
fills in affected programs: 603 -> 375 (-37.81%)
helped: 6
HURT: 0

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15168>
2022-03-02 08:09:11 +00:00
Iago Toral Quiroga
e228642cf5 broadcom/compiler: move constants before their first user
For us they are basically uniforms too so we want to make their
lifespans short to facilitate allocating them to accumulators.

total instructions in shared programs: 13043585 -> 13015385 (-0.22%)
instructions in affected programs: 8326040 -> 8297840 (-0.34%)
helped: 24939
HURT: 19894

total threads in shared programs: 415860 -> 415858 (<.01%)
threads in affected programs: 4 -> 2 (-50.00%)
helped: 0
HURT: 1

total uniforms in shared programs: 3721953 -> 3720451 (-0.04%)
uniforms in affected programs: 96134 -> 94632 (-1.56%)
helped: 744
HURT: 435

total max-temps in shared programs: 2173431 -> 2154260 (-0.88%)
max-temps in affected programs: 264598 -> 245427 (-7.25%)
helped: 10858
HURT: 841

total spills in shared programs: 4005 -> 4010 (0.12%)
spills in affected programs: 700 -> 705 (0.71%)
helped: 5
HURT: 10

total fills in shared programs: 5801 -> 5817 (0.28%)
fills in affected programs: 1346 -> 1362 (1.19%)
helped: 6
HURT: 11

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15168>
2022-03-02 08:09:11 +00:00
Iago Toral Quiroga
a1998a9f43 broadcom/compiler: disallow TMU spills if max tmu spills is 0
If we are compiling with a strategy that does not allow TMU spills
we should not allow spilling anything that is not a uniform.
Otherwise the RA cost/benefit algorithm may choose to spill a
temp that is not uniform and that will cause us to immediately
fail the strategy and fallback to the next one, even if we
could've instead chosen to spill more uniforms to compile the
program successfully with that strategy.

Some relevant shader-db stats:

total instructions in shared programs: 13040711 -> 13043585 (0.02%)
instructions in affected programs: 234238 -> 237112 (1.23%)
helped: 73
HURT: 172

total threads in shared programs: 415664 -> 415860 (0.05%)
threads in affected programs: 196 -> 392 (100.00%)
helped: 98
HURT: 0

total uniforms in shared programs: 3717266 -> 3721953 (0.13%)
uniforms in affected programs: 12831 -> 17518 (36.53%)
helped: 6
HURT: 100

total max-temps in shared programs: 2174177 -> 2173431 (-0.03%)
max-temps in affected programs: 4597 -> 3851 (-16.23%)
helped: 79
HURT: 21

total spills in shared programs: 4010 -> 4005 (-0.12%)
spills in affected programs: 55 -> 50 (-9.09%)
helped: 5
HURT: 0

total fills in shared programs: 5820 -> 5801 (-0.33%)
fills in affected programs: 186 -> 167 (-10.22%)
helped: 5
HURT: 0

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15168>
2022-03-02 08:09:11 +00:00
Iago Toral Quiroga
cbb4d0dded broadcom/compiler: increase cost of TMU spills to 10
Our cost was 5 which matches the number of instructions we have to
add for a TMU spill (a fill is 4 instructions).

Uniform spills on the other hand add an extra instruction for each
fill and remove one instruction for the spill itself. These have
a cost of 1.

Therefore, if we have a single spill+fill, we end up with +9
instructions if it is a TMU spill and +0 instructions with a uniform
spill, so making the former only 5 times more costly is probably
not a good idea, and this is without even considering the added
latency of the TMU accesses.

Relevant shader-db changes show this causes as a marginal instruction
count increase in a few shaders but better thread counts and lower
TMU spilling overall:

total instructions in shared programs: 13037315 -> 13040711 (0.03%)
instructions in affected programs: 370106 -> 373502 (0.92%)
helped: 187
HURT: 321

total threads in shared programs: 415090 -> 415664 (0.14%)
threads in affected programs: 574 -> 1148 (100.00%)
helped: 287
HURT: 0

total uniforms in shared programs: 3706674 -> 3717266 (0.29%)
uniforms in affected programs: 63075 -> 73667 (16.79%)
helped: 40
HURT: 395

total max-temps in shared programs: 2176080 -> 2174177 (-0.09%)
max-temps in affected programs: 15838 -> 13935 (-12.02%)
helped: 316
HURT: 34

total spills in shared programs: 4247 -> 4010 (-5.58%)
spills in affected programs: 2599 -> 2362 (-9.12%)
helped: 107
HURT: 14

total fills in shared programs: 6121 -> 5820 (-4.92%)
fills in affected programs: 3622 -> 3321 (-8.31%)
helped: 108
HURT: 13

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15168>
2022-03-02 08:09:11 +00:00
Guilherme Gallo
d1c6185b5a ci: skqp: Add Vulkan support for a630_skqp job
This commit adds support for Vulkan backend on a630_skqp job.

= Needed changes
- Needed to install libvulkan-dev package on system
- Refactored the way the available skqp reports are printed
  tested in development builds with skia tools

Piglit expectations had to be updated in various drivers due to !14750 not
having bumped the tags when it tried to uprev.

Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14686>
2022-02-25 05:50:06 +00:00
Iago Toral Quiroga
cf99584f51 broadcom/compiler: move uniforms right before their first use after scheduling
On V3D the quality of the code we generate is significantly affected by
how we decide to assign accumulators during register allocation, which
is determined by liveness, favoring short-lived temps.

There are many shaders that end up doing a whole lot of uniform loads
first, and using them later, which is very inconvenient for our register
allocation process because this increases uniform liveness and causes
us to use accumulators less efficientely, leading to significant churn.

To fix this, we move uniforms right before their first use in the same
block, but we need to do this after NIR scheduling, which means we are
doing it in non-SSA form, since the scheduler has a tendency to undo
this optimization and it is not easy to modify it to avoid it, since it
works in more abstract terms, using instruction dependencies, estimated
register pressure and instruction delay information to do its work,
which are very different concepts.

total instructions in shared programs: 13316738 -> 13033613 (-2.13%)
instructions in affected programs: 10389172 -> 10106047 (-2.73%)
helped: 55442
HURT: 16144

total threads in shared programs: 413722 -> 415048 (0.32%)
threads in affected programs: 1428 -> 2754 (92.86%)
helped: 680
HURT: 17

total loops in shared programs: 1716 -> 1690 (-1.52%)
loops in affected programs: 26 -> 0
helped: 26
HURT: 0

total uniforms in shared programs: 3704313 -> 3705181 (0.02%)
uniforms in affected programs: 687730 -> 688598 (0.13%)
helped: 2920
HURT: 7384

total max-temps in shared programs: 2364785 -> 2175190 (-8.02%)
max-temps in affected programs: 1215387 -> 1025792 (-15.60%)
helped: 49667
HURT: 1556

total spills in shared programs: 4241 -> 4248 (0.17%)
spills in affected programs: 642 -> 649 (1.09%)
helped: 11
HURT: 19

total fills in shared programs: 6115 -> 6125 (0.16%)
fills in affected programs: 1276 -> 1286 (0.78%)
helped: 11
HURT: 21

total sfu-stalls in shared programs: 34381 -> 36578 (6.39%)
sfu-stalls in affected programs: 16055 -> 18252 (13.68%)
helped: 3647
HURT: 5206

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15056>
2022-02-24 11:36:00 +00:00
Iago Toral Quiroga
c4a78a2d2a broadcom/compiler: fix register class patching for postponed spills
If we have a postponed spill, the temp we create at ip is no longer
the spilled temp and therefore is affected by the thrsw injection.

Fixes corruption in the additive blending animation demo from
Three.js.

Fixes: f3c3228522 ('broadcom/compiler: do not rebuild the interference graph after each spill')
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15112>
2022-02-22 11:17:10 +00:00
Erik Faye-Lund
25a37fabb7 vulkan/wsi: untangle buffer-images from prime
Not all Vulkan implementations allows rendering to linear images, so in
order to support scanning out from these on Windows we might have to copy
through a buffer like we do in the PRIME path.

To avoid reimplementing the same, let's instead generalize the code a
bit so it doesn't have to specfy any PRIME-specific details.

Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12210>
2022-02-22 10:04:34 +00:00