fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-20 11:18:11 +02:00

Author	SHA1	Message	Date
Juan A. Suarez Romero	3f1c375581	ci/broadcom: allow custom kernels So far, testing VC4 and V3D/V3DV has required the CI runners to have access to a Raspberry Pi 3/4 kernel and the corresponding modules and bootloader files. If a different kernel must be used, the runners have to be modified to provide it. This commit adds the option to define a URL pointing to a (compressed) tarball containing such files, without having to touch the runners. This link is provided through the `BM_BOOTFS` job variable. The tarball must contain two directories in the root: a `/boot` directory (containing the kernel, DTBs and bootloader files), and a `/lib/modules` (or `/usr/lib/modules`) directory with the kernel modules. Reviewed-by: Eric Anholt <eric@anholt.net> Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9527>	2021-03-12 11:03:17 +00:00
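The tarball layout described in the commit above can be checked before pointing `BM_BOOTFS` at it. The following is a minimal sketch, not part of the Mesa CI scripts: the function names `has_bootfs_layout` and `check_tarball` are hypothetical, and it only verifies that the two required top-level directories are present.

```python
import posixpath
import tarfile


def has_bootfs_layout(names):
    """Check member names for boot/ plus lib/modules or usr/lib/modules.

    `names` is an iterable of archive member paths. This helper is a
    hypothetical illustration, not something the CI itself provides.
    """
    # Normalize "./boot/x" and "/boot/x" to "boot/x".
    norm = [posixpath.normpath(n).lstrip("/") for n in names]

    def has_dir(d):
        return any(n == d or n.startswith(d + "/") for n in norm)

    return has_dir("boot") and (has_dir("lib/modules") or has_dir("usr/lib/modules"))


def check_tarball(path):
    """Open a (possibly compressed) tarball and check its layout."""
    # The default read mode lets tarfile detect the compression itself.
    with tarfile.open(path) as tf:
        return has_bootfs_layout(tf.getnames())
```

Keeping the layout check separate from the file handling makes it easy to run against a plain list of paths.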
Jason Ekstrand	4fb6c051c9	anv: Move vk_format helpers to common code The Android ones we put in anv_android.c. Maybe one day we'll want a vk_android.h to put some common Android stuff but, for now, let's keep it contained to ANV's android code. Reviewed-by: Eric Anholt <eric@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8857>	2021-03-10 18:17:31 +00:00
Iago Toral Quiroga	8525cb1c53	v3dv: call util_cpu_detect() when initializing the instance Fixes this assert in debug builds: in __GI___assert_fail (assertion=0x7ffff731f66b "util_cpu_caps.nr_cpus >= 1", file=0x7ffff731f650 "../src/util/u_cpu_detect.h", line=116, function=0x7ffff7323280 <__PRETTY_FUNCTION__.11654> "util_get_cpu_caps") at assert.c:101 in util_get_cpu_caps () at ../src/util/u_cpu_detect.h:116 in _mesa_float_to_float16_rtz (val=0) at ../src/util/half_float.h:93 in util_format_r16g16b16a16_float_pack_rgba_float (dst_row=0x7fffffffbdc0 "", dst_stride=0, src_row=0x7fffffffbf90, src_stride=0, width=1, height=1) at src/util/format/u_format_table.c:13459 in util_format_pack_rgba (format=PIPE_FORMAT_R16G16B16A16_FLOAT, dst=0x7fffffffbdc0, src=0x7fffffffbf90, w=1) at ../src/util/format/u_format.h:1525 in util_pack_color (rgba=0x7fffffffbf90, format=PIPE_FORMAT_R16G16B16A16_FLOAT, uc=0x7fffffffbdc0) at ../src/gallium/auxiliary/util/u_pack_color.h:432 in v3dv_get_hw_clear_color (color=0x7fffffffbf90, internal_type=6, internal_size=8, hw_color=0x7fffffffbf10) at ../src/broadcom/vulkan/v3dv_cmd_buffer.c:1241 v2: move call from physical device to instance init. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9408>	2021-03-10 11:44:01 +01:00
Iago Toral Quiroga	c057a1211b	broadcom/compiler: disallow ldunif during ldvary sequences if possible This restores many of the hurt shaders from the previous patch at the expense of re-adding ldvary tracking in the scheduler. total instructions in shared programs: 13760415 -> 13755738 (-0.03%) instructions in affected programs: 1207560 -> 1202883 (-0.39%) helped: 5080 HURT: 1731 Instructions are helped. total max-temps in shared programs: 2322991 -> 2322828 (<.01%) max-temps in affected programs: 5063 -> 4900 (-3.22%) helped: 229 HURT: 108 Max-temps are helped. total sfu-stalls in shared programs: 31827 -> 31545 (-0.89%) sfu-stalls in affected programs: 478 -> 196 (-59.00%) helped: 304 HURT: 21 Sfu-stalls are helped. total inst-and-stalls in shared programs: 13792242 -> 13787283 (-0.04%) inst-and-stalls in affected programs: 1220856 -> 1215897 (-0.41%) helped: 5162 HURT: 1697 Inst-and-stalls are helped. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9471>	2021-03-10 07:52:22 +00:00
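The shader-db lines in these commit messages pair a before -> after count with a relative change. As a quick sanity check, the percentages can be reproduced as below. This assumes the figure is the plain relative difference against the "before" value, which agrees with the numbers quoted above; `pct_change` is a hypothetical helper, not part of shader-db.

```python
def pct_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Figures taken from the "instructions" lines of the commit above.
shared = pct_change(13760415, 13755738)
affected = pct_change(1207560, 1202883)
print(f"{shared:.2f}% {affected:.2f}%")
```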
Iago Toral Quiroga	947e9e42cc	broadcom/compiler: simplify ldvary pipelining We get optimal ldvary pipelining by doing the following: 1) Carefully merge a paired ldvary into the previous instruction when possible. 2) When the above succeeds, flag the ldvary as scheduled immediately so we can merge one of its children into the current instruction. 3) When scheduling ldvary sequences, only pick up instructions that are part of the sequence to avoid picking up something that prevents successful pipelining. This patch skips 3), accepting some hurt shaders in exchange for better scheduling flexibility during ldvary sequences. Besides eliminating most of the code dedicated to special handling of ldvary sequences, this also usually allows us to produce better code by merging instructions that are unrelated to ldvary sequences into the ldvary sequences, which is particularly effective at filling the gaps produced when scheduling the first and last ldvary sequences as well as the gaps produced by flat and noperspective varying sequences that don't have both mul and add instructions. Notice that there are some hurt shaders, because sometimes the extra scheduler flexibility can lead to picking up instructions that will break a sequence without compensating for that, typically an ldunif that prevents us from doing the fixup for a follow-up ldvary. We will try to correct some of these cases with the next patch. total instructions in shared programs: 13786037 -> 13760415 (-0.19%) instructions in affected programs: 3201387 -> 3175765 (-0.80%) helped: 16155 HURT: 4146 Instructions are helped. total max-temps in shared programs: 2324834 -> 2322991 (-0.08%) max-temps in affected programs: 22160 -> 20317 (-8.32%) helped: 1340 HURT: 103 Max-temps are helped. total sfu-stalls in shared programs: 30685 -> 31827 (3.72%) sfu-stalls in affected programs: 782 -> 1924 (146.04%) helped: 253 HURT: 1416 Inconclusive result. 
total inst-and-stalls in shared programs: 13816722 -> 13792242 (-0.18%) inst-and-stalls in affected programs: 3171642 -> 3147162 (-0.77%) helped: 15331 HURT: 4179 Inst-and-stalls are helped. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9471>	2021-03-10 07:52:22 +00:00
Iago Toral Quiroga	d37241bdc4	broadcom/compiler: move code block around These checks depend on prev_inst being set, so move them down below with all the other checks with the same requirement. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9471>	2021-03-10 07:52:22 +00:00
Iago Toral Quiroga	8bcda472a0	broadcom/compiler: add an additional sanity check assert to the ldvary fixup Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9471>	2021-03-10 07:52:22 +00:00
Jason Ekstrand	e20e85f01e	nir: Make nir_ssa_def_rewrite_uses_after take an SSA value This replaces the new_src parameter of nir_ssa_def_rewrite_uses_after() with an SSA def, and rewrites all the users as needed. Acked-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9383>	2021-03-08 16:59:55 +00:00
Jason Ekstrand	117668b811	nir: Make nir_ssa_def_rewrite_uses take an SSA value This commit replaces the new_src parameter of nir_ssa_def_rewrite_uses() with an SSA def, removes nir_ssa_def_rewrite_uses_ssa(), and rewrites all the users as needed. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Acked-by: Alyssa Rosenzweig <alyssa@collabora.com> Acked-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9383>	2021-03-08 16:59:55 +00:00
Ilia Mirkin	fd017458bc	mesa: fix fbo attachment size check for RBs, make it trigger in ES2 Makes dEQP-GLES2.functional.fbo.completeness.size.distinct pass. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9441>	2021-03-06 20:29:41 +00:00
Iago Toral Quiroga	6e6e71ddf9	broadcom/compiler: fix flags check for ldvary merge We were checking that the previous instruction doesn't write flags, but we also need to check it doesn't read them. Fixes: `1784dd22a3` ('broadcom/compiler: pipeline smooth ldvary sequences') Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9431>	2021-03-05 12:55:47 +00:00
Iago Toral Quiroga	21c1853c55	broadcom/compiler: ldvary doesn't implicitly write to r3 since V3D 4.1 total instructions in shared programs: 13805979 -> 13786037 (-0.14%) instructions in affected programs: 2263244 -> 2243302 (-0.88%) helped: 10646 HURT: 1508 Instructions are helped. total threads in shared programs: 412220 -> 412242 (<.01%) threads in affected programs: 58 -> 80 (37.93%) helped: 17 HURT: 6 Threads are helped. total uniforms in shared programs: 3793200 -> 3790401 (-0.07%) uniforms in affected programs: 131281 -> 128482 (-2.13%) helped: 1547 HURT: 281 Uniforms are helped. total max-temps in shared programs: 2326309 -> 2324834 (-0.06%) max-temps in affected programs: 31836 -> 30361 (-4.63%) helped: 1139 HURT: 153 Max-temps are helped. total spills in shared programs: 5932 -> 5940 (0.13%) spills in affected programs: 80 -> 88 (10.00%) helped: 2 HURT: 3 total fills in shared programs: 13370 -> 13372 (0.01%) fills in affected programs: 480 -> 482 (0.42%) helped: 2 HURT: 3 total sfu-stalls in shared programs: 30829 -> 30685 (-0.47%) sfu-stalls in affected programs: 2190 -> 2046 (-6.58%) helped: 570 HURT: 533 Sfu-stalls are helped. total inst-and-stalls in shared programs: 13836808 -> 13816722 (-0.15%) inst-and-stalls in affected programs: 2276152 -> 2256066 (-0.88%) helped: 10643 HURT: 1525 Inst-and-stalls are helped. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9430>	2021-03-05 13:37:39 +01:00
Iago Toral Quiroga	839007e490	broadcom/compiler: always restart ldvary pipelining when scheduling ldvary When we were only able to pipeline smooth varyings, if we had to disable ldvary pipelining in the middle of a sequence it would stay disabled for the rest of the program, to prevent us from prioritizing scheduling of ldvary instructions that we would not be able to pipeline effectively. Now that we can pipeline all ldvary sequences we can change this. This change re-enables ldvary pipelining upon finding the next ldvary in the program in the hopes that we can continue pipelining successfully. To do this, we track the number of ldvary instructions we emitted so far and compare that to the number of inputs in the fragment shader we are scheduling. This also allows us to simplify our ldvary tracking at nir to vir time, since that is all now handled in the QPU scheduler. total instructions in shared programs: 13817048 -> 13810783 (-0.05%) instructions in affected programs: 810114 -> 803849 (-0.77%) helped: 4843 HURT: 591 Instructions are helped. total max-temps in shared programs: 2326612 -> 2326300 (-0.01%) max-temps in affected programs: 4689 -> 4377 (-6.65%) helped: 285 HURT: 7 Max-temps are helped. total sfu-stalls in shared programs: 30942 -> 30865 (-0.25%) sfu-stalls in affected programs: 207 -> 130 (-37.20%) helped: 120 HURT: 42 Sfu-stalls are helped. total inst-and-stalls in shared programs: 13847990 -> 13841648 (-0.05%) inst-and-stalls in affected programs: 825378 -> 819036 (-0.77%) helped: 4899 HURT: 590 Inst-and-stalls are helped. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9404>	2021-03-05 10:32:19 +01:00
Juan A. Suarez Romero	7b3b8524ef	ci: Bump deqp to vk-gl-cts 1.2.5.2 Reviewed-by: Eric Anholt <eric@anholt.net> Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9369>	2021-03-04 11:09:35 +00:00
Iago Toral Quiroga	c3732ac0d0	broadcom/compiler: be more aggressive skipping unifa writes We had an optimization in place to skip a unifa write if the address happens to be right after the last ldunifa read address, but we can take this further and update the unifa address by emitting ldunifa instructions if needed to skip a unifa write that is close enough. This is because a unifa write involves 4 cycles: 1 for the write and 3 delay slots before we can emit the first ldunifa. So if we have code like this: unifa addr + 0 ldunifa.r0 unifa addr + 12 ldunifa.r1 In practice we end up with QPU like this: unifa addr + 0 nop nop nop ldunifa.r0 unifa addr + 12 nop nop nop ldunifa.r1 And with this patch we get: unifa addr + 0 nop nop nop ldunifa.r0 <--- reads offset 0 ldunifa.- <--- reads offset 4 ldunifa.- <--- reads offset 8 ldunifa.r1 <--- reads offset 12 Of course, QPU scheduling might find ways to fill the NOPs to some extent and remove some of the gains, but generally speaking, this is still usually a win. Going by shader-db results, allowing the next unifa address to be up to 12 bytes after the address resulting from the last ldunifa read shows the best results: total instructions in shared programs: 13817048 -> 13812202 (-0.04%) instructions in affected programs: 602701 -> 597855 (-0.80%) helped: 1750 HURT: 760 Instructions are helped. total uniforms in shared programs: 3795485 -> 3793200 (-0.06%) uniforms in affected programs: 43930 -> 41645 (-5.20%) helped: 898 HURT: 0 Uniforms are helped. total max-temps in shared programs: 2326612 -> 2326621 (<.01%) max-temps in affected programs: 651 -> 660 (1.38%) helped: 10 HURT: 21 Inconclusive result (value mean confidence interval includes 0). total sfu-stalls in shared programs: 30942 -> 30906 (-0.12%) sfu-stalls in affected programs: 627 -> 591 (-5.74%) helped: 186 HURT: 158 Inconclusive result (value mean confidence interval includes 0). 
total inst-and-stalls in shared programs: 13847990 -> 13843108 (-0.04%) inst-and-stalls in affected programs: 601404 -> 596522 (-0.81%) helped: 1747 HURT: 757 Inst-and-stalls are helped. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9384>	2021-03-04 09:00:15 +01:00
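The trade-off described in the commit above (a unifa write costs 4 cycles, while each skipped word costs one dummy ldunifa) can be illustrated with a toy emitter. This is a minimal sketch in Python, not the compiler's C implementation: `emit_loads` and its constants are hypothetical names, the 4-byte ldunifa stride is inferred from the offsets in the example, and QPU scheduling (which may fill some NOPs) is ignored.

```python
UNIFA_DELAY_SLOTS = 3   # delay slots after a unifa write, per the commit message
LDUNIFA_BYTES = 4       # assumed: each ldunifa reads one 32-bit word
MAX_SKIP_BYTES = 12     # threshold the commit message reports as best


def emit_loads(offsets, max_skip=MAX_SKIP_BYTES):
    """Emit a toy instruction list for 32-bit loads at the given byte offsets."""
    ops = []
    next_addr = None  # address the next ldunifa would read, if known
    for off in offsets:
        gap = None if next_addr is None else off - next_addr
        if gap is not None and 0 <= gap <= max_skip and gap % LDUNIFA_BYTES == 0:
            # Close enough: advance the address with result-less ldunifa
            # reads instead of paying for a new unifa write.
            ops.extend(["ldunifa.-"] * (gap // LDUNIFA_BYTES))
        else:
            ops.append(f"unifa addr + {off}")
            ops.extend(["nop"] * UNIFA_DELAY_SLOTS)
        ops.append("ldunifa")
        next_addr = off + LDUNIFA_BYTES
    return ops


print(len(emit_loads([0, 12], max_skip=0)))  # every load gets its own unifa write
print(len(emit_loads([0, 12])))              # second unifa write skipped
```

With `max_skip=0` the sketch only skips a unifa write when the next address directly follows the previous read, which corresponds to the earlier optimization the commit message refers to.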
Iago Toral Quiroga	2897a83ff8	broadcom/compiler: drop the destination for unused ldunifa We can't remove unused ldunifa that are not the first or last in a sequence, but we can still ignore their destination to reduce register pressure. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9384>	2021-03-04 09:00:15 +01:00
Iago Toral Quiroga	acbd4881c2	broadcom/compiler: ldvary pipelining tracking and documentation clean-ups Now that we can pipeline all varyings we should not be referring specifically to smooth varyings anywhere. Also, rename the instruction field 'ldvary_pipelining' to 'is_ldvary_sequence', which is more appropriate, since we always set this for any instruction involved with varying setups, independently of whether they end up being pipelined or not. This also does some other minor edits which intend to slightly simplify the code and make it a bit more compact. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9363>	2021-03-02 13:54:14 +01:00
Iago Toral Quiroga	05f8efbc2c	broadcom/compiler: allow pipelining of flat and noperspective varyings These end up having a NOP between the ldvary and the next instruction in the sequence (a MOV for flat and an FADD for noperspective): nop ; nop ; ldvary.r0 nop ; nop fadd rf6, r0, r5 ; nop ; ldvary.r1 nop ; nop fadd rf5, r1, r5 ; nop ; ldvary.r2 nop ; nop fadd rf4, r2, r5 ; nop ; ldvary.r3 To pipeline these, we can reuse the same infrastructure we have in place for smooth varyings but we need to avoid breaking the sequence due to the NOP instruction. We do that by testing if dropping the sequence when we failed to pick up the next instruction also fails to choose an instruction. This is not perfect, because we may be able to choose an instruction outside the sequence such as an ldunif, and use that to break a sequence that we could otherwise continue after scheduling the NOP instruction, but it is still better than nothing. total instructions in shared programs: 13820690 -> 13819774 (<.01%) instructions in affected programs: 64026 -> 63110 (-1.43%) helped: 479 HURT: 62 Instructions are helped. total max-temps in shared programs: 2326435 -> 2326423 (<.01%) max-temps in affected programs: 102 -> 90 (-11.76%) helped: 7 HURT: 0 Max-temps are helped. total sfu-stalls in shared programs: 30683 -> 30710 (0.09%) sfu-stalls in affected programs: 13 -> 40 (207.69%) helped: 2 HURT: 24 Sfu-stalls are HURT. total inst-and-stalls in shared programs: 13851373 -> 13850484 (<.01%) inst-and-stalls in affected programs: 62818 -> 61929 (-1.42%) helped: 466 HURT: 65 Inst-and-stalls are helped. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9304>	2021-03-02 07:56:00 +01:00
Iago Toral Quiroga	1784dd22a3	broadcom/compiler: pipeline smooth ldvary sequences Typically, we would schedule smooth varyings like this: nop ; nop ; ldvary.r4 nop ; fmul r0, r4, rf0 fadd rf13, r0, r5 ; nop ; ldvary.r1 nop ; fmul r2, r1, rf0 fadd rf12, r2, r5 ; nop ; ldvary.r3 nop ; fmul r4, r3, rf0 fadd rf11, r4, r5 ; nop ; ldvary.r0 where we pair up an ldvary with the fadd of the previous sequence instead of the previous fmul. This is because ldvary has an implicit write to r5 which is read by the fadd of the previous sequence, so our dependency tracking doesn't allow us to move the ldvary before the fadd, however, the r5 write of the ldvary instruction happens in the instruction after it is emitted so we can actually move it to the fmul and the r5 write would still happen in the same instruction as the fadd, which is fine. This patch allows us to pipeline these sequences optimally. For that, after merging an ldvary into a previous instruction in the middle of a pipelineable ldvary sequence, we check if we can manually move it to the last scheduled instruction instead (the one before the instruction we are currently scheduling). If we are successful at moving the ldvary to the previous instruction, then we flag the ldvary as scheduled immediately, which may promote its children (the follow-up fmul instruction for that ldvary) to DAG heads and continue the merge loop so that fmul can be picked and merged into the final fadd of the previous sequence (where we had originally merged the ldvary). This leads to a result that looks like this: nop ; nop ; ldvary.r4 nop ; fmul r0, r4, rf0 ; ldvary.r1 fadd rf13, r0, r5 ; fmul r2, r1, rf0 ; ldvary.r3 fadd rf12, r2, r5 ; fmul r4, r3, rf0 ; ldvary.r0 Shader-db results: total instructions in shared programs: 14071591 -> 13820690 (-1.78%) instructions in affected programs: 7809692 -> 7558791 (-3.21%) helped: 41209 HURT: 4528 Instructions are helped. 
total max-temps in shared programs: 2335784 -> 2326435 (-0.40%) max-temps in affected programs: 84302 -> 74953 (-11.09%) helped: 4561 HURT: 293 Max-temps are helped. total sfu-stalls in shared programs: 31537 -> 30683 (-2.71%) sfu-stalls in affected programs: 3551 -> 2697 (-24.05%) helped: 1713 HURT: 750 Sfu-stalls are helped. total inst-and-stalls in shared programs: 14103128 -> 13851373 (-1.79%) inst-and-stalls in affected programs: 7820726 -> 7568971 (-3.22%) helped: 41411 HURT: 4535 Inst-and-stalls are helped. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9304>	2021-03-02 07:56:00 +01:00
Iago Toral Quiroga	1d021539a2	broadcom/compiler: track pipelineable ldvary sequences If we have two (or more) smooth varyings like this: nop t3; ldvary.rf0 fmul t5, t3, t0 fadd t6, t5, r5 nop t7; ldvary.rf0 fmul t9, t7, t0 fadd t10, t9, r5 nop t11; ldvary.rf0 fmul t13, t11, t0 fadd t14, t13, r5 We may be able to pipeline them like this: nop ; nop ; ldvary.r4 nop ; fmul r0, r4, rf0 ; ldvary.r1 fadd rf13, r0, r5 ; fmul r2, r1, rf0 ; ldvary.r3 fadd rf12, r2, r5 ; fmul r4, r3, rf0 ; ldvary.r0 But in order to do this, we will need to manually tweak the QPU scheduling. This patch tracks information about ldvary sequences that are good candidates for pipelining, and a follow-up patch will use this information to pipeline them when we emit the QPU code. v2 (apinheiro): - Rename the v3d_compile fields to avoid confusion with the qinst fields. - Assert that a sequence's start instruction is not the same as the end. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9304>	2021-03-02 07:56:00 +01:00
Iago Toral Quiroga	c2c2cdc3d3	broadcom/compiler: fix indentation style Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9304>	2021-03-02 07:56:00 +01:00
Iago Toral Quiroga	b41edee879	broadcom/compiler: fix DAG pre-remove for merged instructions When selecting an instruction to merge, we want to pre-remove that instruction from the DAG, not the one we are merging it in, which we had already pre-removed right before. The reason this was not causing problems before is that the consequence of this bug is we will choose the same instruction again in the merge loop and trying to merge that instruction twice will fail and we would break out of the merge loop and move on. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9304>	2021-03-02 07:56:00 +01:00
Iago Toral Quiroga	8a60bde0cf	v3dv: fix branching to large secondaries with more than one BCL buffer. Fixes: dEQP-VK.api.command_buffers.record_many_draws_secondary_* Tested-by: Juan A. Suarez <jasuarez@igalia.com> Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9333>	2021-03-01 15:16:45 +01:00
Andreas Bergmeier	b4772d15ab	v3dv: Output a message if file open fails in physical_device_init In the caller, this error simply gets mapped to VK_ERROR_INIT[...]. For users in particular, it is very valuable to know what the driver tried and what kind of failure occurred. Thus, just log straight to stderr. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9317>	2021-03-01 09:25:21 +00:00
Eric Anholt	bcea453d4a	ci/piglit: Stop including the test counts at the end of expectations. It's just a ton of fuss for driver developers fixing piglit tests. This makes the trace expectation files pretty silly (empty expectation, but you'll get a diff to a non-empty result when something fails) Acked-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Acked-by: Adam Jackson <ajax@redhat.com> Acked-by: Andres Gomez <agomez@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9226>	2021-02-24 18:55:02 +00:00
Eric Anholt	60573b443b	v3d: Replace driver lowering of GL_CLAMP with mesa/st's. Mesa core can do this logic for us now. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9228>	2021-02-24 18:03:46 +00:00
Mike Blumenkrantz	e89f158b82	v3dv: remove for_each_bit() macro this was unused Reviewed-by: Rob Clark <robclark@freedesktop.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9191>	2021-02-24 17:11:44 +00:00
Juan A. Suarez Romero	15e1979c51	ci/vc4/v3d: Parallelize piglit jobs Split the piglit jobs in multiple parallel executions to speed up the runtime. v2: - Set parallel in V3D piglit jobs. Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> Reviewed-by: Andres Gomez <agomez@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9022>	2021-02-24 09:41:45 +01:00
Juan A. Suarez Romero	e814e23f59	ci/piglit: allow parallel piglit jobs This allows splitting a piglit job into several parallel jobs, to speed up the execution. Due to piglit restrictions, this only works for single profiles. Otherwise an error will be shown in the runner. Also, a new gitlab job variable `PIGLIT_TESTS` is introduced that contains the excluded/included tests with `-x` or `-n`. The rest of the piglit options go to `PIGLIT_OPTIONS` (like `--timeout n`). v2 (Andres): - Replay profile is supported in parallel jobs. - Bail out immediately if parallel jobs are tried with multiple profiles. - Use testlist only when doing parallel jobs. - Do not drop pass tests when filtering executed tests. - Get rid of PIGLIT_FRACTION. v4: - uncommit unrelated change (Andres). Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> Reviewed-by: Andres Gomez <agomez@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9022>	2021-02-24 09:41:33 +01:00
Iago Toral Quiroga	b17ec53c81	broadcom/compiler: use nir_opt_sink total instructions in shared programs: 14072341 -> 14062334 (-0.07%) instructions in affected programs: 1996685 -> 1986678 (-0.50%) helped: 3038 HURT: 2432 Instructions are helped. total uniforms in shared programs: 3797720 -> 3794523 (-0.08%) uniforms in affected programs: 191711 -> 188514 (-1.67%) helped: 831 HURT: 449 Uniforms are helped. total max-temps in shared programs: 2340632 -> 2335124 (-0.24%) max-temps in affected programs: 113632 -> 108124 (-4.85%) helped: 2728 HURT: 436 Max-temps are helped. total spills in shared programs: 6050 -> 5931 (-1.97%) spills in affected programs: 2869 -> 2750 (-4.15%) helped: 14 HURT: 4 total fills in shared programs: 13970 -> 13371 (-4.29%) fills in affected programs: 8831 -> 8232 (-6.78%) helped: 14 HURT: 4 total inst-and-stalls in shared programs: 14103668 -> 14093712 (-0.07%) inst-and-stalls in affected programs: 2004035 -> 1994079 (-0.50%) helped: 3009 HURT: 2426 Inst-and-stalls are helped. LOST: 0 GAINED: 10 Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9209>	2021-02-24 08:02:00 +01:00
Juan A. Suarez Romero	4675121ea6	ci/v3d: Update expected results for piglit Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> Reviewed-by: Andres Gomez <agomez@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9224>	2021-02-23 18:14:51 +00:00
Iago Toral Quiroga	54c17e45ae	broadcom/compiler: skip unnecessary unifa writes If a new UBO load happens to read exactly at the offset right after the previous UBO load (something that is fairly common, for example when reading a matrix), we can skip the unifa write (with its 3 delay slots) and just continue to call ldunifa to continue reading consecutive addresses. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9128>	2021-02-23 08:08:01 +00:00
Iago Toral Quiroga	e1cf2406da	broadcom/compiler: add a constant alu optimization pass Currently this is useful to clean up after DCEing leading ldunifa instructions, but it can be expanded to handle more cases, which may allow us to simplify the compiler code in places where we have been trying to optimize manually for similar cases. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9128>	2021-02-23 08:08:01 +00:00
Iago Toral Quiroga	89de085055	broadcom/compiler: remove unused leading ldunifa This requires that we go back to the unifa write and update the address to jump over the unused leading component. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9128>	2021-02-23 08:08:01 +00:00
Iago Toral Quiroga	9d16d2d0be	broadcom/compiler: allow dead code elimination of unused trailing ldunifa If a ldunifa is the last in a sequence and is not used, we can safely eliminate it. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9128>	2021-02-23 08:08:01 +00:00
Iago Toral Quiroga	e20ae14978	broadcom/compiler: fix ldunif optimization When we look back for a previous uniform definition we want to start looking from the current position of the cursor, not the end of the current block. The latter only works when translating from NIR, since in that case both always match, but any optimization pass may rewrite code and emit uniforms at any place in the middle of the program. Also, ntq_store_dest expects result to be written by the last instruction to handle the case where it is stored to a NIR register. That won't be the case if the result comes from an optimized uniform, so in that case we need to insert a MOV, like we do in non-uniform control flow. v2: fix ntq_store_dest for optimized uniforms. Fixes: `14af7b3085` ('broadcom/compiler: don't emit redundant ldunif') Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Acked-by: Eric Anholt <eric@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9128>	2021-02-23 08:08:01 +00:00
Eric Anholt	60d413b894	ci: Move the piglit expectations lists to the per-driver CI dirs. Now changing piglit expectations won't retest everyone else's drivers. Acked-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9161>	2021-02-22 23:02:43 +00:00
Eric Anholt	ad77170b85	ci: Move the dEQP and traces expectations to the per-driver CI dirs. This means less custom test-source-dep stuff for these drivers, though it means that touching the CI expects files will cause a bit more retesting: - broadcom drivers retest as a group (but Igalia requested that organization of CI files) - radv+radeonsi retest as a group - lvp+llvmpipe retest as a group Acked-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9161>	2021-02-22 23:02:42 +00:00
Eric Anholt	dab845d457	ci: Move specific driver testing to separate files in separate dirs. The top-level gitlab-ci.yml is big and unwieldy when one wants to work on CI for a single driver. Move the drivers to separate include files for ease of finding all your driver's tests, and also to pave the way for work on a single driver's CI to not retest all other drivers. Reviewed-by: Andres Gomez <agomez@igalia.com> Reviewed-by: Juan A. Suarez <jasuarez@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9139>	2021-02-19 17:30:36 +00:00
Eric Anholt	a687e71afd	v3d/qpu: Avoid leaking memory in the QPU disasm test. Required to run this test under ASan, as we'll be soon doing for building ARM drivers with asan testing. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9070>	2021-02-18 00:49:00 +00:00
Iago Toral Quiroga	064b846949	broadcom/compiler: don't dump shader-db stats for failed shaders Shaders that fail register allocation were dumped with an instruction count of 0, so getting them to compile would show up as an instruction count regression. Also, the LOST/GAINED stats depend on us not dumping data for failed shaders, which is why we were always seeing 0/0 there. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Eric Anholt <eric@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9077>	2021-02-17 09:01:02 +01:00
Iago Toral Quiroga	df6c19c1fd	broadcom/compiler: use a helper function to decide on TMU spilling As we add more compiler optimizations that can increase register pressure we may decide to disallow TMU spilling in more cases so it is probably better to move this to its own helper function. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Eric Anholt <eric@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9077>	2021-02-17 09:01:02 +01:00
Iago Toral Quiroga	14af7b3085	broadcom/compiler: don't emit redundant ldunif If we emit a new uniform and that uniform has already been emitted in the same block we can just reuse that. There is a balancing game here between reducing ldunif instructions and not increasing register pressure too much though, so we put a limit to how far back we are willing to look for a previous definition of the uniform. Based on shader-db results, 20 instructions produces best results. total instructions in shared programs: 14928266 -> 14907432 (-0.14%) instructions in affected programs: 6431841 -> 6411007 (-0.32%) helped: 15270 HURT: 10772 Instructions are helped. total uniforms in shared programs: 3944672 -> 3840276 (-2.65%) uniforms in affected programs: 1827184 -> 1722788 (-5.71%) helped: 30423 HURT: 845 Uniforms are helped. total inst-and-stalls in shared programs: 14957813 -> 14936873 (-0.14%) inst-and-stalls in affected programs: 6475349 -> 6454409 (-0.32%) helped: 15287 HURT: 10852 Inst-and-stalls are helped. v2 (Eric): - consider ldunifrf too - check that no other instruction writes to the register Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Eric Anholt <eric@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9077>	2021-02-17 09:01:01 +01:00
Arcady Goldmints-Orlov	7f61ff7b4d	broadcom/compiler: Merge instructions more efficiently Instructions are allowed to access up to two rf registers, or one rf register and a small immediate. This change allows qpu_merge_inst to take full advantage of this by allowing the merging of two instructions if they have no more than two different rf registers between them, or one rf register and one small immediate. qpu_merge_inst rewrites the instructions as needed to pack everything into raddr_a and raddr_b in the merged instruction. shader-db stats: total instructions in shared programs: 19938769 -> 18929664 (-5.06%) instructions in affected programs: 17929438 -> 16920333 (-5.63%) helped: 95008 HURT: 242 helped stats (abs) min: 1 max: 785 x̄: 10.62 x̃: 7 helped stats (rel) min: 0.30% max: 21.25% x̄: 5.37% x̃: 4.98% HURT stats (abs) min: 1 max: 2 x̄: 1.10 x̃: 1 HURT stats (rel) min: 0.30% max: 3.12% x̄: 1.62% x̃: 1.54% 95% mean confidence interval for instructions value: -10.67 -10.52 95% mean confidence interval for instructions %-change: -5.37% -5.33% Instructions are helped. total max-temps in shared programs: 3122664 -> 3112446 (-0.33%) max-temps in affected programs: 124881 -> 114663 (-8.18%) helped: 5445 HURT: 0 helped stats (abs) min: 1 max: 15 x̄: 1.88 x̃: 1 helped stats (rel) min: 1.49% max: 40.54% x̄: 8.97% x̃: 6.67% 95% mean confidence interval for max-temps value: -1.91 -1.84 95% mean confidence interval for max-temps %-change: -9.12% -8.81% Max-temps are helped.
total sfu-stalls in shared programs: 38028 -> 41231 (8.42%) sfu-stalls in affected programs: 6053 -> 9256 (52.92%) helped: 664 HURT: 3380 helped stats (abs) min: 1 max: 2 x̄: 1.04 x̃: 1 helped stats (rel) min: 9.09% max: 100.00% x̄: 70.81% x̃: 100.00% HURT stats (abs) min: 1 max: 4 x̄: 1.15 x̃: 1 HURT stats (rel) min: 0.00% max: 300.00% x̄: 46.39% x̃: 25.00% 95% mean confidence interval for sfu-stalls value: 0.76 0.82 95% mean confidence interval for sfu-stalls %-change: 25.03% 29.26% Sfu-stalls are HURT. total inst-and-stalls in shared programs: 19976797 -> 18970895 (-5.04%) inst-and-stalls in affected programs: 17963129 -> 16957227 (-5.60%) helped: 95017 HURT: 245 helped stats (abs) min: 1 max: 785 x̄: 10.59 x̃: 7 helped stats (rel) min: 0.30% max: 21.25% x̄: 5.35% x̃: 4.95% HURT stats (abs) min: 1 max: 2 x̄: 1.09 x̃: 1 HURT stats (rel) min: 0.30% max: 3.12% x̄: 1.61% x̃: 1.54% 95% mean confidence interval for inst-and-stalls value: -10.64 -10.48 95% mean confidence interval for inst-and-stalls %-change: -5.35% -5.31% Inst-and-stalls are helped. v2 (Iago): - moved early return for naddrs > 2 even earlier. - only update {add,mul}.b mux if instruction has more than one operand. - don't OR b->raddr_{a,b} if we are not merging add/mul instructions. - don't initialize packed to 0. - minor style fixes. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9026>	2021-02-16 11:46:31 +00:00
Alejandro Piñeiro	3f614c6f7c	v3dv/meta_copy: get tlb compatible BC compressed formats for copies So we can use the tlb path for several operations (copy image, clear, copy buffer to image, etc). Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8929>	2021-02-12 22:04:13 +00:00
Alejandro Piñeiro	6fdf375a90	v3dv/formats: expose support for BC1-3 compressed formats Even though we can't expose textureCompressedBC as the hw doesn't support all the formats, we can expose individual formats as supported. This gets ~850 CTS tests going from skip to pass, with patterns like: * dEQP-VK.texture.compressed.bc* * dEQP-VK.api.copy_and_blit.core.image_to_image.all_formats.color.2dbc * dEQP-VK.api.copy_and_blit.core.image_to_image.all_formats.color.3dbc * dEQP-VK.api.info.image_format_propertiesbc * etc v2: BC1-3 formats are texture filterable (Iago) Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8929>	2021-02-12 22:04:13 +00:00
Alejandro Piñeiro	fcb229cbe0	v3dv/device: clarify that we can't expose textureCompressionBC From spec: "textureCompressionBC specifies whether all of the BC compressed texture formats are supported. If this feature is enabled" Note the all. v3d hw supports BC1, BC2, and BC3, but not BC4 through BC7. Let's clarify that we can't expose textureCompressionBC even if we support some of them. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8929>	2021-02-12 22:04:13 +00:00
Iago Toral Quiroga	82981ccbb1	broadcom/compiler: use unifa for UBO loads from uniform addresses This basically processes UBO loads as uniform loads by writing the load address to the unifa register and reading sequential values with ldunifa. This process is faster than going through the TMU, but we can only use it when the address we are reading from is uniform across all channels, since we are basically reading from the UBO address as if it was a uniform stream. This leads to better performance in the UE4 Shooter demo. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>	2021-02-12 08:24:22 +00:00
Iago Toral Quiroga	878555976e	broadcom/compiler: emit ldunifarf when needed Just like ldunif and ldunifrf, ldunifa writes to the r5 accumulator and ldunifarf writes to the register file. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>	2021-02-12 08:24:21 +00:00
Iago Toral Quiroga	c2a04aca48	broadcom/compiler: do not DCE ldunifa ldunifa reads a uniform from the unifa address and updates the unifa address implicitly, so if we dead-code-eliminate one, a follow-up ldunifa will not read from the appropriate address. We could avoid this if the compiler ensures that every ldunifa is paired with an explicit unifa, so for example if we are reading a vec4, we could emit: unifa (addr) ldunifa unifa (addr+4) ldunifa unifa (addr+8) ldunifa unifa (addr+12) ldunifa instead of: unifa (addr) ldunifa ldunifa ldunifa ldunifa But since each unifa has a 3 delay slot before we can do ldunifa, that would end up being quite expensive. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>	2021-02-12 08:24:21 +00:00


1338 commits