Commit graph

1372 commits

Author SHA1 Message Date
Alejandro Piñeiro
74785346b4 v3dv: Add support for the on-disk shader cache
Quoting Jason's commit message (afa8f5892), that also applies here:

"The Vulkan API provides a mechanism for applications to cache their
own shaders and manage on-disk pipeline caching themselves.
Generally, this is what I would recommend to application developers
and I've resisted implementing driver-side transparent caching in the
Vulkan driver for a long time.  However, not all applications do this
and, for some use-cases, it's just not practical."

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9403>
2021-03-22 17:10:47 +00:00
Alejandro Piñeiro
cf71280d74 v3dv/device: avoid unused-result warning with asprintf
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9403>
2021-03-22 17:10:47 +00:00
Alejandro Piñeiro
2bee6ffec3 v3dv/pipeline: compute sha1 for no-op fragment shaders correctly
We should use the nir shader, as with internal vkShaderModule, instead
of just the name.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9403>
2021-03-22 17:10:47 +00:00
Alejandro Piñeiro
9a4099858b v3dv/pipeline: don't create a variant if compilation failed
Also return the proper Vulkan result for this case, that is somewhat
tricky. Technically Create[Graphics/Compute]Pipeline only allow OOM
errors. So for this case, there is only the alternative of the generic
VK_ERROR_UNKNOWN, even if we known the cause of the error. From spec:

 "VK_ERROR_UNKNOWN will be returned by an implementation when an
  unexpected error occurs that cannot be attributed to valid behavior
  of the application and implementation. Under these conditions, it
  may be returned from any command returning a VkResult"

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9403>
2021-03-22 17:10:47 +00:00
Alejandro Piñeiro
e354c52801 v3dv/pipeline: try to get the shader variant directly from the cache
Until now we were always doing a two-step cache lookup, as we were
using the NIR shaders to fill up the key to lookup for the compiled
shaders. But since we were already generating the sha1 key with the
original SPIR-V shader (or its internal NIR representation) any info
we were collecting from from NIR is already implicit in the original
shader, so we can avoid using the NIR in most cases.

Because the v3d_key that is used to compile a shader is populated with
data coming directly from the NIR shader or produced during NIR
lowerings, we can't use it directly as part of the pipeline cache
entry. We could split them, but that would be confusing, so we add a
new struct, v3dv_pipeline_key used specifically to search for the
compiled shaders on the pipeline cache. v3d_key would be still used to
compile the shaders.

As we are using the same sha1 key for all compiled shaders in a
pipeline, we can also group all of them in the same cache entry, so we
don't need a lookup for each stage. This also allows to cache pipeline
data shared by all the stages (like the descriptor maps).

While we are here, we also create a single BO to store the assembly
for all the pipeline stages.

Finally, we remove the link to the variant on the pipeline stage
struct, to avoid the confusion of having two links to the same
data. This mostly means that we stop to use the pipeline stage
structures after the pipeline is created, so we can freed them.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9403>
2021-03-22 17:10:47 +00:00
Alejandro Piñeiro
6afb8a9fec v3dv/pipeline: use broadcom_shader_stage as pipeline/variant stage type
So we could avoid using gl_shader_stage plus a is_coord boolean, that
only applies to VERTEX.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9403>
2021-03-22 17:10:47 +00:00
Alejandro Piñeiro
0b98f20310 v3dv: define broadcom shader stages
Mostly the same that main mesa gl_shader_stage, but including the
coordinate shader. This would allow to loop over all the available
stages (for example if we need to free them, compute the max spill
size, etc).

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9403>
2021-03-22 17:10:47 +00:00
Alejandro Piñeiro
d7f4038374 v3dv/pipeline: remove v3d_key from shader_variant and pipeline stage
We stopped to re-use them after pippeline creation long ago, so let's
reduce the size of both structs, and avoid serialize/deserialize for
the variant case.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9403>
2021-03-22 17:10:47 +00:00
Alejandro Piñeiro
b8c73c512a v3dv/pipeline: remove compiled_variant_count field
We are not really compiling several variants, or at least not on the
same pipeline.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9403>
2021-03-22 17:10:47 +00:00
Alejandro Piñeiro
ebb2297a91 v3dv/pipeline: move topology to pipeline
So now we only store it once per pipeline, instead of once per
pipeline stage.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9403>
2021-03-22 17:10:47 +00:00
Alejandro Piñeiro
dd72c99d77 v3dv/pipeline: use driver_location_map instead of nir utilities
If we were able to get a shader variant from the pipeline cache, we
will not have the nir shader available.

Note that this is what we were doing on the driver before the nir io
helpers were available.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9403>
2021-03-22 17:10:47 +00:00
Alejandro Piñeiro
b71fd5587e broadcom/compiler: add driver_location_map at vs prog data
This maps the nir shader data.location to its final
data.driver_location. In general we are using the driver location as
index (like vattr_sizes on the same struct), so having this map is
useful if what we have is the data.location, and we don't have
available the original nir shader.

v2: use memset instead of for loop, and nir_foreach_shader_in_variable
    instead of nir_foreach_variable_with_modes (Iago)

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9403>
2021-03-22 17:10:47 +00:00
Alejandro Piñeiro
2be0c36775 broadcom/compiler: add local_size in v3d_compute_prog_data
As we plan to try to get directly the compiled variant from the cache,
it would be possible to not have available the nir shaders, so we add
this info on prog data.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9403>
2021-03-22 17:10:47 +00:00
Alejandro Piñeiro
ab252d73a9 v3dv/pipeline: remove pipeline->use_push_constants
In the past we used this boolean for several things, it is really
superfluous right now.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9403>
2021-03-22 17:10:47 +00:00
Alejandro Piñeiro
f276efb2f8 v3dv/pipeline: remove pregenerate_variant
Right now we were not pre-generating several variants, but we decided
to let this method, just in case we need that idea back. This ended
being a bad idea. Several months have passed without that need, so
having that method just adds confusion. Also, if we need to add a
multiple-variant in the future, perhaps we would need to do it
different, so let's not template in advance.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9403>
2021-03-22 17:10:47 +00:00
Alejandro Piñeiro
098816fc9a v3dv/pipeline_cache: add more details when dumping debug info
We tweak a little some of the individual messages, and add a new
option to dump the stats when the pipeline destroy.

As we are here we also we also tweak the names of the global options
to make it more clear.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9403>
2021-03-22 17:10:47 +00:00
Mike Blumenkrantz
ad241b15a9 vk: consolidate dynamic descriptor binding sorting
this code was duplicated across several drivers

Reviewed-by: Adam Jackson <ajax@redhat.com>
turnip changes Reviewed-by: Hyunjun Ko <zzoon@igalia.com>

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9480>
2021-03-22 16:51:55 +00:00
Iago Toral Quiroga
cbe24a0e9c broadcom/compiler: use nir_lower_undef_to_zero
total instructions in shared programs: 13731663 -> 13721549 (-0.07%)
instructions in affected programs: 98242 -> 88128 (-10.29%)
helped: 191
HURT: 131
Instructions are helped.

total threads in shared programs: 412272 -> 412296 (<.01%)
threads in affected programs: 24 -> 48 (100.00%)
helped: 12
HURT: 0
Threads are helped.

total uniforms in shared programs: 3780693 -> 3779137 (-0.04%)
uniforms in affected programs: 10564 -> 9008 (-14.73%)
helped: 114
HURT: 7
Uniforms are helped.

total max-temps in shared programs: 2319942 -> 2319528 (-0.02%)
max-temps in affected programs: 4191 -> 3777 (-9.88%)
helped: 113
HURT: 22
Max-temps are helped.

total sfu-stalls in shared programs: 31584 -> 31616 (0.10%)
sfu-stalls in affected programs: 217 -> 249 (14.75%)
helped: 51
HURT: 54
Inconclusive result (value mean confidence interval includes 0).

total inst-and-stalls in shared programs: 13763247 -> 13753165 (-0.07%)
inst-and-stalls in affected programs: 98719 -> 88637 (-10.21%)
helped: 187
HURT: 134
Inst-and-stalls are helped.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9681>
2021-03-22 12:17:13 +00:00
Iago Toral Quiroga
1c987f5db3 broadcom/compiler: optimize constant vfpack
total instructions in shared programs: 13733627 -> 13731663 (-0.01%)
instructions in affected programs: 174140 -> 172176 (-1.13%)
helped: 1597
HURT: 310
Instructions are helped.

total uniforms in shared programs: 3784601 -> 3780693 (-0.10%)
uniforms in affected programs: 58678 -> 54770 (-6.66%)
helped: 2886
HURT: 3
Uniforms are helped.

total max-temps in shared programs: 2322714 -> 2319942 (-0.12%)
max-temps in affected programs: 15729 -> 12957 (-17.62%)
helped: 2189
HURT: 1
Max-temps are helped.

total spills in shared programs: 6010 -> 6012 (0.03%)
spills in affected programs: 61 -> 63 (3.28%)
helped: 0
HURT: 1

total fills in shared programs: 13494 -> 13497 (0.02%)
fills in affected programs: 89 -> 92 (3.37%)
helped: 0
HURT: 1

total sfu-stalls in shared programs: 31521 -> 31584 (0.20%)
sfu-stalls in affected programs: 328 -> 391 (19.21%)
helped: 30
HURT: 94
Inconclusive result (%-change mean confidence interval includes 0).

total inst-and-stalls in shared programs: 13765148 -> 13763247 (-0.01%)
inst-and-stalls in affected programs: 174237 -> 172336 (-1.09%)
helped: 1551
HURT: 316
Inst-and-stalls are helped.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9681>
2021-03-22 12:17:13 +00:00
Iago Toral Quiroga
b189409a46 broadcom/compiler: handle implicit uniform loads when optimizing constant alu
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9681>
2021-03-22 12:17:13 +00:00
Juan A. Suarez Romero
8ab6d2b4c4 ci/broadcom: use new piglit runner
Switch from the old piglit to the new piglit-runner executor.

Acked-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9680>
2021-03-18 16:31:45 +00:00
Juan A. Suarez Romero
727eadc76a ci/vc4/v3d: run piglit testsuite against Xorg
This increases the coverage adding tests that require an X server to
run.

Update also the expected results, and skip some flake tests.

Reviewed-by: Andres Gomez <agomez@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9562>
2021-03-17 11:40:58 +00:00
Juan A. Suarez Romero
dc2e3d6ff2 ci/v3dv: add flaky test in the skip list
Reviewed-by: Andres Gomez <agomez@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9562>
2021-03-17 11:40:58 +00:00
Alejandro Piñeiro
ec4c79c2b0 v3dv: avoid some maybe-uninitialized warnings
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9640>
2021-03-17 10:05:07 +00:00
Alejandro Piñeiro
c373b24369 v3dv/descriptor_set: don't free individual set if not allowed
If we have a host_memory_base pointer it means that we are handling
the pool memory as a whole, so each set is a sub-slice of the memory
pool. In this case we can't just free the individual set. In other
words, VK_DESCRIPTOR_POOL_CREATE_FREE_DESCRIPTOR_SET_BIT is not set.

Note tha at that point we were able to sub-allocate an set, and we are
failing to sub-allocate the pool bo for the descripto bo. So
technically we could try to return that set to the pool (so undo the
change on host_memory_ptr before), that I assume was my intention when
(wrongly) calling vk_free there. But in practice, at that point we are
already on a out of descriptor pool situation, so in the end it
doesn't even matter.

This fixes a double free crash with the UE4 VehicleGame demo.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9640>
2021-03-17 10:05:07 +00:00
Iago Toral Quiroga
aefac60741 broadcom/compiler: use nir_lower_wrmasks to simplify TMU general stores
This pass splits writemaks with non-consecutive bits into multiple
store operations ensuring that each store only has consecutive
writemask bits set.

We can use this to simplify writemask handling in our backend removing
a loop solely intended to handle this case.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9619>
2021-03-17 09:35:19 +00:00
Iago Toral Quiroga
51a263530f broadcom/compiler: use nir_opt_load_store_vectorize
This will make it so we pack consecutive scalar operations into a vector
operation, reducing the amount of load/store operations in the NIR program.
Our backend can handle vector load/stores, and doing so may be more efficient
since we don't need to setup individual load/stores all the time.

A pathological case is:
dEQP-VK.spirv_assembly.instruction.compute.opcopymemory.array
which goes from 862 instructions to only 573 by converting
all scalar SSBO load/store operations to vec4 operations.

total instructions in shared programs: 13752607 -> 13733627 (-0.14%)
instructions in affected programs: 367117 -> 348137 (-5.17%)
helped: 1168
HURT: 371
Instructions are helped.

total threads in shared programs: 412230 -> 412272 (0.01%)
threads in affected programs: 54 -> 96 (77.78%)
helped: 23
HURT: 2
Threads are helped.

total uniforms in shared programs: 3790248 -> 3784601 (-0.15%)
uniforms in affected programs: 57417 -> 51770 (-9.84%)
helped: 1420
HURT: 19
Uniforms are helped.

total max-temps in shared programs: 2322170 -> 2322714 (0.02%)
max-temps in affected programs: 14353 -> 14897 (3.79%)
helped: 185
HURT: 306
Max-temps are HURT.

total spills in shared programs: 5940 -> 6010 (1.18%)
spills in affected programs: 65 -> 135 (107.69%)
helped: 0
HURT: 11

total fills in shared programs: 13372 -> 13494 (0.91%)
fills in affected programs: 75 -> 197 (162.67%)
helped: 0
HURT: 11

total sfu-stalls in shared programs: 31505 -> 31521 (0.05%)
sfu-stalls in affected programs: 751 -> 767 (2.13%)
helped: 210
HURT: 246
Inconclusive result (value mean confidence interval includes 0).

total inst-and-stalls in shared programs: 13784112 -> 13765148 (-0.14%)
inst-and-stalls in affected programs: 360283 -> 341319 (-5.26%)
helped: 1125
HURT: 366
Inst-and-stalls are helped.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9619>
2021-03-17 09:35:19 +00:00
Iago Toral Quiroga
3db322f305 broadcom/compiler: fix end of tmu sequence detection
TMUWT always terminates a TMU sequence.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9619>
2021-03-17 09:35:19 +00:00
Iago Toral Quiroga
1e4abf1fe3 vulkan/util: call glsl_type_singleton_init_or_ref from vk_instance_init
v2: link libvulkan_util with libglsl so it can find the glsl singleton symbols.
v3: link with libcompiler instead of libglsl (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> for the v3dv bits.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> for the turnip bits.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> for the radv bits.
Acked-by: Dave Airlie <airlied@redhat.com> for the lvp bits.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9457>
2021-03-17 08:15:36 +01:00
Lukas Feller
164a51c80f v3dv: fix stride in buffer copy
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9402>
2021-03-17 06:42:34 +00:00
Lukas Feller
99a11f25b2 v3dv: fix assertion in job_compute_frame_tiling
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9402>
2021-03-17 06:42:34 +00:00
Mike Blumenkrantz
07c9dc54dd v3dv: use common interfaces for shader modules
squashed changes from Alejandro Piñeiro <apinheiro@igalia.com>:

Add call to vk_object_base_init on internal shader_module: we have
some cases where internally we have some shader modules that we don't
create through CreateShaderModule, so in this case we need to manually
call base_init. Not sure why this wasn't needed before.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9508>
2021-03-15 21:47:44 +00:00
Iago Toral Quiroga
177dcd4b68 broadcom/compiler: be more flexible scheduling TMU writes
V3D 4.x allows more flexibility, so take advantage of that. Generally,
we can reorder any writes in the same sequence, so long as they are
not the sequence terminator (which must always be last, since it is
the one triggering the operation), and TMUD writes, since these must
be ordered with respect to each other.

total instructions in shared programs: 13735183 -> 13731927 (-0.02%)
instructions in affected programs: 903057 -> 899801 (-0.36%)
helped: 2358
HURT: 746
Instructions are helped.

total max-temps in shared programs: 2322020 -> 2322009 (<.01%)
max-temps in affected programs: 619 -> 608 (-1.78%)
helped: 19
HURT: 11
Inconclusive result (value mean confidence interval includes 0).

total sfu-stalls in shared programs: 31494 -> 31489 (-0.02%)
sfu-stalls in affected programs: 182 -> 177 (-2.75%)
helped: 40
HURT: 40
Inconclusive result (value mean confidence interval includes 0).

total inst-and-stalls in shared programs: 13766677 -> 13763416 (-0.02%)
inst-and-stalls in affected programs: 901343 -> 898082 (-0.36%)
helped: 2349
HURT: 746
Inst-and-stalls are helped.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9555>
2021-03-15 08:03:28 +01:00
Iago Toral Quiroga
87ed614c47 broadcom/compiler: flag wrtmuc with a read dependency on last_tmu_config
Instead of using a write depdency. We use last_tmu_config to ensure ordering
of instructions participating in different TMU sequences. To this end,
all sequence terminators flag a write dependency on last_tmu_config, but
wrtmuc is not a sequence terminator, so we can be more flexible by flagging
it as a read depedency. This would prevent it to be moved into a previous
sequence (since it cannot be moved past the previous sequence terminator due
to the read depedency), but it allows it to be reordered with instructions in
the same sequence, which allows us to pair it up more effectively. Particularly,
it allows to pair up a wrtmuc with the sequence terminator of the same sequence,
turning code like this:

nop                  ; mov  tmut, r0     ; thrsw; wrtmuc (tex[0].p0 | 0x3)
nop                  ; nop               ; wrtmuc (tex[0].p1 | 0x0)
nop                  ; mov  tmus, r1

Into this:

nop                  ; mov  tmut, r0     ; thrsw; wrtmuc (tex[0].p0 | 0x3)
nop                  ; mov  tmus, r1     ; wrtmuc (tex[0].p1 | 0x0)

total instructions in shared programs: 13755738 -> 13735183 (-0.15%)
instructions in affected programs: 2510921 -> 2490366 (-0.82%)
helped: 10963
HURT: 485
Instructions are helped.

total max-temps in shared programs: 2322828 -> 2322020 (-0.03%)
max-temps in affected programs: 11303 -> 10495 (-7.15%)
helped: 608
HURT: 19
Max-temps are helped.

total sfu-stalls in shared programs: 31545 -> 31494 (-0.16%)
sfu-stalls in affected programs: 235 -> 184 (-21.70%)
helped: 62
HURT: 11
Sfu-stalls are helped.

total inst-and-stalls in shared programs: 13787283 -> 13766677 (-0.15%)
inst-and-stalls in affected programs: 2525187 -> 2504581 (-0.82%)
helped: 10989
HURT: 477
Inst-and-stalls are helped.

v2: add a comment explaining the read depdency (Piñeiro).

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9555>
2021-03-15 08:03:28 +01:00
Juan A. Suarez Romero
3f1c375581 ci/broadcom: allow custom kernels
So far, testing VC4 and V3D/V3DV requires the CI runners having access
to a Raspberry Pi 3/4 kernel, and the correspondent modules and
bootloader files. If a different kernel must be used, it means touching
the runners to provide them.

This commit adds the option to define an URL pointing to a (compressed)
tarball containing such files, without requiring dealing with the
runners. This link is provided through the `BM_BOOTFS` job variable.

The tarball must contain two directories in the root: a `/boot`
directory (containing the kernel, DTBs and bootloader files), and a
`/lib/modules` (or `/usr/lib/modules`) with the kernel modules.

Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9527>
2021-03-12 11:03:17 +00:00
Jason Ekstrand
4fb6c051c9 anv: Move vk_format helpers to common code
The Android ones we put in anv_android.c.  Maybe one day we'll want a
vk_android.h to put some common Android stuff but, for now, let's keep
it contained to ANV's android code.

Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8857>
2021-03-10 18:17:31 +00:00
Iago Toral Quiroga
8525cb1c53 v3dv: call util_cpu_detect() when initializing the instance
Fixes this assert in debug builds:

in __GI___assert_fail (assertion=0x7ffff731f66b "util_cpu_caps.nr_cpus >= 1", file=0x7ffff731f650 "../src/util/u_cpu_detect.h", line=116,
  function=0x7ffff7323280 <__PRETTY_FUNCTION__.11654> "util_get_cpu_caps") at assert.c:101
in util_get_cpu_caps () at ../src/util/u_cpu_detect.h:116
in _mesa_float_to_float16_rtz (val=0) at ../src/util/half_float.h:93
in util_format_r16g16b16a16_float_pack_rgba_float (dst_row=0x7fffffffbdc0 "", dst_stride=0, src_row=0x7fffffffbf90, src_stride=0, width=1, height=1)
   at src/util/format/u_format_table.c:13459
in util_format_pack_rgba (format=PIPE_FORMAT_R16G16B16A16_FLOAT, dst=0x7fffffffbdc0, src=0x7fffffffbf90, w=1) at ../src/util/format/u_format.h:1525
in util_pack_color (rgba=0x7fffffffbf90, format=PIPE_FORMAT_R16G16B16A16_FLOAT, uc=0x7fffffffbdc0) at ../src/gallium/auxiliary/util/u_pack_color.h:432
in v3dv_get_hw_clear_color (color=0x7fffffffbf90, internal_type=6, internal_size=8, hw_color=0x7fffffffbf10) at ../src/broadcom/vulkan/v3dv_cmd_buffer.c:1241

v2: move call from physical device to instance init.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9408>
2021-03-10 11:44:01 +01:00
Iago Toral Quiroga
c057a1211b broadcom/compiler: disallow ldunif during ldvary sequences if possible
This restores many of the hurt shaders from the previous patch at the
expense of re-adding ldvary tracking in the scheduler.

total instructions in shared programs: 13760415 -> 13755738 (-0.03%)
instructions in affected programs: 1207560 -> 1202883 (-0.39%)
helped: 5080
HURT: 1731
Instructions are helped.

total max-temps in shared programs: 2322991 -> 2322828 (<.01%)
max-temps in affected programs: 5063 -> 4900 (-3.22%)
helped: 229
HURT: 108
Max-temps are helped.

total sfu-stalls in shared programs: 31827 -> 31545 (-0.89%)
sfu-stalls in affected programs: 478 -> 196 (-59.00%)
helped: 304
HURT: 21
Sfu-stalls are helped.

total inst-and-stalls in shared programs: 13792242 -> 13787283 (-0.04%)
inst-and-stalls in affected programs: 1220856 -> 1215897 (-0.41%)
helped: 5162
HURT: 1697
Inst-and-stalls are helped.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9471>
2021-03-10 07:52:22 +00:00
Iago Toral Quiroga
947e9e42cc broadcom/compiler: simplify ldvary pipelining
We get optimal ldvary pipelining by doing the following:

1) Carefully merge a paired ldvary into the previous instruction when
   possible.
2) When the above succeeds, flag the ldvary as scheduled immediately so
   we can merge one of its children into the current instruction.
3) When scheduling ldvary sequences, only pick up instructions that are
   part of the sequence to avoid picking up something that prevents
   successful pipelining.

This patch skips 3) assuming some hurt shaders in exchange for better
scheduling flexibility during ldvary sequences. Besides eliminating most
of the code dedicated to special handling ldvary sequences, this also
usually allows us to produce better code by merging instructions that are
unrelated to ldvary sequences into the ldvary sequences, which is
particularly effective to fill up the gaps produced when scheduling the
first and last ldvary sequences as well as the gaps produced by flat
and noperspective varyings sequences that don't have both mul and add
instructions.

Notice that there are some hurt shaders, because some times the extra
scheduler flexibility can lead to picking up instructions that will
break a sequence without compensating for that, typically an ldunif
that prevents us from doing the fixup for a follow-up ldvary. We will
try to correct some of these cases with the next patch.

total instructions in shared programs: 13786037 -> 13760415 (-0.19%)
instructions in affected programs: 3201387 -> 3175765 (-0.80%)
helped: 16155
HURT: 4146
Instructions are helped.

total max-temps in shared programs: 2324834 -> 2322991 (-0.08%)
max-temps in affected programs: 22160 -> 20317 (-8.32%)
helped: 1340
HURT: 103
Max-temps are helped.

total sfu-stalls in shared programs: 30685 -> 31827 (3.72%)
sfu-stalls in affected programs: 782 -> 1924 (146.04%)
helped: 253
HURT: 1416
Inconclusive result.

total inst-and-stalls in shared programs: 13816722 -> 13792242 (-0.18%)
inst-and-stalls in affected programs: 3171642 -> 3147162 (-0.77%)
helped: 15331
HURT: 4179
Inst-and-stalls are helped.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9471>
2021-03-10 07:52:22 +00:00
Iago Toral Quiroga
d37241bdc4 broadcom/compiler: move code block around
These checks depend on prev_inst being set, so move them down below
with all the other checks with the same requirement.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9471>
2021-03-10 07:52:22 +00:00
Iago Toral Quiroga
8bcda472a0 broadcom/compiler: add an additional sanity check assert to the ldvary fixup
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9471>
2021-03-10 07:52:22 +00:00
Jason Ekstrand
e20e85f01e nir: Make nir_ssa_def_rewrite_uses_after take an SSA value
This replaces the new_src parameter of nir_ssa_def_rewrite_uses_after()
with an SSA def, and rewrites all the users as needed.

Acked-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9383>
2021-03-08 16:59:55 +00:00
Jason Ekstrand
117668b811 nir: Make nir_ssa_def_rewrite_uses take an SSA value
This commit replaces the new_src parameter of nir_ssa_def_rewrite_uses()
with an SSA def, removes nir_ssa_def_rewrite_uses_ssa(), and rewrites
all the users as needed.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa@collabora.com>
Acked-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9383>
2021-03-08 16:59:55 +00:00
Ilia Mirkin
fd017458bc mesa: fix fbo attachment size check for RBs, make it trigger in ES2
Makes dEQP-GLES2.functional.fbo.completeness.size.distinct pass.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9441>
2021-03-06 20:29:41 +00:00
Iago Toral Quiroga
6e6e71ddf9 broadcom/compiler: fix flags check for ldvary merge
We were checking that the previous instruction doesn't write flags,
but we also need to check it doesn't read them.

Fixes: 1784dd22a3 ('broadcom/compiler: pipeline smooth ldvary sequences')
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9431>
2021-03-05 12:55:47 +00:00
Iago Toral Quiroga
21c1853c55 broadcom/compiler: ldvary doesn't implicitly write to r3 since V3D 4.1
total instructions in shared programs: 13805979 -> 13786037 (-0.14%)
instructions in affected programs: 2263244 -> 2243302 (-0.88%)
helped: 10646
HURT: 1508
Instructions are helped.

total threads in shared programs: 412220 -> 412242 (<.01%)
threads in affected programs: 58 -> 80 (37.93%)
helped: 17
HURT: 6
Threads are helped.

total uniforms in shared programs: 3793200 -> 3790401 (-0.07%)
uniforms in affected programs: 131281 -> 128482 (-2.13%)
helped: 1547
HURT: 281
Uniforms are helped.

total max-temps in shared programs: 2326309 -> 2324834 (-0.06%)
max-temps in affected programs: 31836 -> 30361 (-4.63%)
helped: 1139
HURT: 153
Max-temps are helped.

total spills in shared programs: 5932 -> 5940 (0.13%)
spills in affected programs: 80 -> 88 (10.00%)
helped: 2
HURT: 3

total fills in shared programs: 13370 -> 13372 (0.01%)
fills in affected programs: 480 -> 482 (0.42%)
helped: 2
HURT: 3

total sfu-stalls in shared programs: 30829 -> 30685 (-0.47%)
sfu-stalls in affected programs: 2190 -> 2046 (-6.58%)
helped: 570
HURT: 533
Sfu-stalls are helped.

total inst-and-stalls in shared programs: 13836808 -> 13816722 (-0.15%)
inst-and-stalls in affected programs: 2276152 -> 2256066 (-0.88%)
helped: 10643
HURT: 1525
Inst-and-stalls are helped.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9430>
2021-03-05 13:37:39 +01:00
Iago Toral Quiroga
839007e490 broadcom/compiler: always restart ldvary pipelining when scheduling ldvary
When we were only able to pipeline smooth varyings, if we had to disable
ldvary pipelining in the middle of a sequence it would stay disabled for
the rest of the program, to prevent us from prioritizing scheduling of
ldvary instructions that we would not be able to pipeline effectively.
Now that we can pipeline all ldvary sequences we can change this.

This change re-enables ldvary pipelining upon finding the next
ldvary in the program in the hopes that we can continue pipelining
succesfully. To do this, we track the number of ldvary instructions we
emitted so far and compare that to the number of inputs in the fragment
shader we are scheduling. This also allows us to simplify our ldvary
tracking at nir to vir time, since that is all now handled in the QPU
scheduler.

total instructions in shared programs: 13817048 -> 13810783 (-0.05%)
instructions in affected programs: 810114 -> 803849 (-0.77%)
helped: 4843
HURT: 591
Instructions are helped.

total max-temps in shared programs: 2326612 -> 2326300 (-0.01%)
max-temps in affected programs: 4689 -> 4377 (-6.65%)
helped: 285
HURT: 7
Max-temps are helped.

total sfu-stalls in shared programs: 30942 -> 30865 (-0.25%)
sfu-stalls in affected programs: 207 -> 130 (-37.20%)
helped: 120
HURT: 42
Sfu-stalls are helped.

total inst-and-stalls in shared programs: 13847990 -> 13841648 (-0.05%)
inst-and-stalls in affected programs: 825378 -> 819036 (-0.77%)
helped: 4899
HURT: 590
Inst-and-stalls are helped.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9404>
2021-03-05 10:32:19 +01:00
Juan A. Suarez Romero
7b3b8524ef ci: Bump deqp to vk-gl-cts 1.2.5.2
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9369>
2021-03-04 11:09:35 +00:00
Iago Toral Quiroga
c3732ac0d0 broadcom/compiler: be more aggressive skipping unifa writes
We had an optimization in place to skip a unifa write if the address
happens to be right after the last ldunifa read address, but we can
take this further and update the unifa address by emitting ldunifa
instructions if needed to skip a unifa write that is close enough.
This is because a unifa write involves 4 cycles: 1 for the write
and 3 delay slots before we can emit the first ldunifa.

So if we have code like this:

unifa addr + 0
ldunifa.r0
unifa addr + 12
ldunifa.r1

In practice we end up with QPU like this:

unifa addr + 0
nop
nop
nop
ldunifa.r0
unifa addr + 12
nop
nop
nop
ldunifa.r1

And with this patch we get:

unifa addr + 0
nop
nop
nop
ldunifa.r0  <--- reads offset 0
ldunifa.-   <--- reads offset 4
ldunifa.-   <--- reads offset 8
ldunifa.r1  <--- reads offset 12

Of course, QPU scheduling might find ways to fill the NOPs to some
extent and remove some of the gains, but generally speaking, this is
still usually a win.

Going by shader-db results, allowing the next unifa address to be up
to 12 bytes after the address resulting from the last ldunifa read
shows the best results:

total instructions in shared programs: 13817048 -> 13812202 (-0.04%)
instructions in affected programs: 602701 -> 597855 (-0.80%)
helped: 1750
HURT: 760
Instructions are helped.

total uniforms in shared programs: 3795485 -> 3793200 (-0.06%)
uniforms in affected programs: 43930 -> 41645 (-5.20%)
helped: 898
HURT: 0
Uniforms are helped.

total max-temps in shared programs: 2326612 -> 2326621 (<.01%)
max-temps in affected programs: 651 -> 660 (1.38%)
helped: 10
HURT: 21
Inconclusive result (value mean confidence interval includes 0).

total sfu-stalls in shared programs: 30942 -> 30906 (-0.12%)
sfu-stalls in affected programs: 627 -> 591 (-5.74%)
helped: 186
HURT: 158
Inconclusive result (value mean confidence interval includes 0).

total inst-and-stalls in shared programs: 13847990 -> 13843108 (-0.04%)
inst-and-stalls in affected programs: 601404 -> 596522 (-0.81%)
helped: 1747
HURT: 757
Inst-and-stalls are helped.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9384>
2021-03-04 09:00:15 +01:00
Iago Toral Quiroga
2897a83ff8 broadcom/compiler: drop the destination for unused ldunifa
We can't remove unused ldunifa that are not the first or last
in a sequence, but we can still ignore their destination
to reduce register pressure.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9384>
2021-03-04 09:00:15 +01:00