1318 commits

Author SHA1 Message Date
Iago Toral Quiroga
c2c2cdc3d3 broadcom/compiler: fix indentation style
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9304>
2021-03-02 07:56:00 +01:00
Iago Toral Quiroga
b41edee879 broadcom/compiler: fix DAG pre-remove for merged instructions
When selecting an instruction to merge, we want to pre-remove that
instruction from the DAG, not the one we are merging it in, which
we had already pre-removed right before.

This was not causing problems before because the only consequence
of the bug was that we would choose the same instruction again in
the merge loop; trying to merge that instruction twice fails, so we
would break out of the merge loop and move on.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9304>
2021-03-02 07:56:00 +01:00
Iago Toral Quiroga
8a60bde0cf v3dv: fix branching to large secondaries with more than one BCL buffer.
Fixes:
dEQP-VK.api.command_buffers.record_many_draws_secondary_*

Tested-by: Juan A. Suarez <jasuarez@igalia.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9333>
2021-03-01 15:16:45 +01:00
Andreas Bergmeier
b4772d15ab v3dv: Output a message if file open fails in physical_device_init
In the caller, this error simply gets mapped to VK_ERROR_INIT[...].
Especially for users it is very valuable to know what the driver
tried and what kind of failure occurred. Thus just log straight
to stderr.
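
A minimal sketch of this kind of logging, assuming a plain POSIX open() call. The helper name open_device_logged and the message format are hypothetical and only illustrate the idea; they are not taken from the driver.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not the driver's code): try to open a device node
 * and, on failure, tell the user which path was tried and why it failed. */
static int
open_device_logged(const char *path)
{
   assert(path != NULL);

   int fd = open(path, O_RDWR);
   if (fd < 0) {
      /* errno is still the value set by the failed open() here. */
      fprintf(stderr, "Opening %s failed: %s\n", path, strerror(errno));
      return -1;
   }
   return fd;
}
```

A caller would still map the -1 result to its own error code; the only difference is that the path and the errno text reach the user first.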

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9317>
2021-03-01 09:25:21 +00:00
Eric Anholt
bcea453d4a ci/piglit: Stop including the test counts at the end of expectations.
It's just a ton of fuss for driver developers fixing piglit tests.  This
makes the trace expectation files pretty silly (empty expectation, but
you'll get a diff to a non-empty result when something fails)

Acked-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Acked-by: Adam Jackson <ajax@redhat.com>
Acked-by: Andres Gomez <agomez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9226>
2021-02-24 18:55:02 +00:00
Eric Anholt
60573b443b v3d: Replace driver lowering of GL_CLAMP with mesa/st's.
Mesa core can do this logic for us now.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9228>
2021-02-24 18:03:46 +00:00
Mike Blumenkrantz
e89f158b82 v3dv: remove for_each_bit() macro
this was unused

Reviewed-by: Rob Clark <robclark@freedesktop.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9191>
2021-02-24 17:11:44 +00:00
Juan A. Suarez Romero
15e1979c51 ci/vc4/v3d: Parallelize piglit jobs
Split the piglit jobs in multiple parallel executions to speed up the
runtime.

v2:
 - Set parallel in V3D piglit jobs.

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9022>
2021-02-24 09:41:45 +01:00
Juan A. Suarez Romero
e814e23f59 ci/piglit: allow parallel piglit jobs
This allows splitting a piglit job into several parallel jobs, to speed up
the execution.

Due to piglit restrictions, this only works for single profiles. Otherwise
an error will be shown in the runner.

Also, a new gitlab job variable `PIGLIT_TESTS` is introduced that
contains the excluded/included tests with `-x` or `-n`. The rest of the
piglit options go to `PIGLIT_OPTIONS` (like `--timeout n`).

v2 (Andres):
 - Replay profile is supported in parallel jobs.
 - Bail out immediately if parallel jobs are tried with multiple
profiles.
 - Use testlist only when doing parallel jobs.
 - Do not drop pass tests when filtering executed tests.
 - Get rid of PIGLIT_FRACTION.

v4:
 - uncommit unrelated change (Andres).

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9022>
2021-02-24 09:41:33 +01:00
Iago Toral Quiroga
b17ec53c81 broadcom/compiler: use nir_opt_sink
total instructions in shared programs: 14072341 -> 14062334 (-0.07%)
instructions in affected programs: 1996685 -> 1986678 (-0.50%)
helped: 3038
HURT: 2432
Instructions are helped.

total uniforms in shared programs: 3797720 -> 3794523 (-0.08%)
uniforms in affected programs: 191711 -> 188514 (-1.67%)
helped: 831
HURT: 449
Uniforms are helped.

total max-temps in shared programs: 2340632 -> 2335124 (-0.24%)
max-temps in affected programs: 113632 -> 108124 (-4.85%)
helped: 2728
HURT: 436
Max-temps are helped.

total spills in shared programs: 6050 -> 5931 (-1.97%)
spills in affected programs: 2869 -> 2750 (-4.15%)
helped: 14
HURT: 4

total fills in shared programs: 13970 -> 13371 (-4.29%)
fills in affected programs: 8831 -> 8232 (-6.78%)
helped: 14
HURT: 4

total inst-and-stalls in shared programs: 14103668 -> 14093712 (-0.07%)
inst-and-stalls in affected programs: 2004035 -> 1994079 (-0.50%)
helped: 3009
HURT: 2426
Inst-and-stalls are helped.

LOST:   0
GAINED: 10

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9209>
2021-02-24 08:02:00 +01:00
Juan A. Suarez Romero
4675121ea6 ci/v3d: Update expected results for piglit
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9224>
2021-02-23 18:14:51 +00:00
Iago Toral Quiroga
54c17e45ae broadcom/compiler: skip unnecessary unifa writes
If a new UBO load happens to read exactly at the offset right after the
previous UBO load (something that is fairly common, for example when
reading a matrix), we can skip the unifa write (with its 3 delay slots)
and just continue to call ldunifa to continue reading consecutive addresses.
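
The check described here can be sketched roughly as follows. This is an illustration under assumptions, not the compiler's code: struct unifa_tracker and unifa_write_needed are hypothetical names, and the sketch ignores when the tracked state would have to be reset (for example at block boundaries).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tracking state; the real compiler keeps its own bookkeeping. */
struct unifa_tracker {
   bool valid;           /* false until a unifa write has been emitted */
   uint32_t ubo_index;   /* UBO the stream currently points into */
   uint32_t next_offset; /* byte offset the next ldunifa would read */
};

/* Records a load of num_components 32-bit values and returns true if a new
 * unifa write has to be emitted first, or false if the load simply continues
 * the current stream with more ldunifa. */
static bool
unifa_write_needed(struct unifa_tracker *t, uint32_t ubo_index,
                   uint32_t offset, uint32_t num_components)
{
   assert(num_components > 0);

   bool continues = t->valid && t->ubo_index == ubo_index &&
                    t->next_offset == offset;

   t->valid = true;
   t->ubo_index = ubo_index;
   t->next_offset = offset + 4 * num_components;

   return !continues;
}
```

Reading a 4x4 matrix as four consecutive vec4 loads would then need a unifa write only for the first column in this model.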

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9128>
2021-02-23 08:08:01 +00:00
Iago Toral Quiroga
e1cf2406da broadcom/compiler: add a constant alu optimization pass
Currently this is useful to clean up after DCEing leading ldunifa
instructions, but it can be expanded to handle more cases, which
may allow us to simplify the compiler code in places where we have
been trying to optimize manually for similar cases.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9128>
2021-02-23 08:08:01 +00:00
Iago Toral Quiroga
89de085055 broadcom/compiler: remove unused leading ldunifa
This requires that we go back to the unifa write and update the address
to jump over the unused leading component.
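
As an illustration of the address update, the arithmetic could look like the sketch below. It assumes 32-bit components and a bitmask of used components; skip_unused_leading and its parameters are hypothetical and do not come from the compiler.

```c
#include <assert.h>
#include <stdint.h>

/* Given the start address of a load, the number of 32-bit components it
 * covers and a bitmask of the components that are actually used, returns
 * the address the unifa write should use once the unused leading components
 * are dropped. *num_loads receives how many ldunifa remain from that address
 * to the end of the sequence. */
static uint32_t
skip_unused_leading(uint32_t addr, uint32_t num_components,
                    uint32_t used_mask, uint32_t *num_loads)
{
   assert(num_components <= 32);

   uint32_t skipped = 0;
   while (skipped < num_components && !(used_mask & (1u << skipped)))
      skipped++;

   *num_loads = num_components - skipped;
   return addr + 4 * skipped;
}
```

For a vec4 at address 0x100 where only the last two components are used, this yields address 0x108 and two remaining loads.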

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9128>
2021-02-23 08:08:01 +00:00
Iago Toral Quiroga
9d16d2d0be broadcom/compiler: allow dead code elimination of unused trailing ldunifa
If a ldunifa is the last in a sequence and is not used, we can safely
eliminate it.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9128>
2021-02-23 08:08:01 +00:00
Iago Toral Quiroga
e20ae14978 broadcom/compiler: fix ldunif optimization
When we look back for a previous uniform definition we want to
start looking from the current position of the cursor, not the
end of the current block. The latter only works when translating
from NIR, since in that case both always match, but any optimization
pass may rewrite code and emit uniforms at any place in the middle of
the program.

Also, ntq_store_dest expects result to be written by the last instruction
to handle the case where it is stored to a NIR register. That won't be
the case if the result comes from an optimized uniform, so in that case
we need to insert a MOV, like we do in non-uniform control flow.

v2: fix ntq_store_dest for optimized uniforms.

Fixes: 14af7b3085 ('broadcom/compiler: don't emit redundant ldunif')
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Acked-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9128>
2021-02-23 08:08:01 +00:00
Eric Anholt
60d413b894 ci: Move the piglit expectations lists to the per-driver CI dirs.
Now changing piglit expectations won't retest everyone else's drivers.

Acked-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9161>
2021-02-22 23:02:43 +00:00
Eric Anholt
ad77170b85 ci: Move the dEQP and traces expectations to the per-driver CI dirs.
This means less custom test-source-dep stuff for these drivers, though it
means that touching the CI expects files will cause a bit more retesting:

- broadcom drivers retest as a group (but Igalia requested that
  organization of CI files)
- radv+radeonsi retest as a group
- lvp+llvmpipe retest as a group

Acked-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9161>
2021-02-22 23:02:42 +00:00
Eric Anholt
dab845d457 ci: Move specific driver testing to separate files in separate dirs.
The top-level gitlab-ci.yml is big and unwieldy when one wants to work on
CI for a single driver.  Move the drivers to separate include files for
ease of finding all your driver's tests, and also to pave the way for work
on a single driver's CI to not retest all other drivers.

Reviewed-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9139>
2021-02-19 17:30:36 +00:00
Eric Anholt
a687e71afd v3d/qpu: Avoid leaking memory in the QPU disasm test.
Required to run this test under ASan, as we'll soon be doing when building
ARM drivers with ASan testing.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9070>
2021-02-18 00:49:00 +00:00
Iago Toral Quiroga
064b846949 broadcom/compiler: don't dump shader-db stats for failed shaders
Shaders that fail register allocation were dumped with an instruction
count of 0, so getting them to compile would show up as an instruction
count regression. Also, the LOST/GAINED stats depend on us not dumping
data for failed shaders, which is why we were always seeing 0/0 there.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9077>
2021-02-17 09:01:02 +01:00
Iago Toral Quiroga
df6c19c1fd broadcom/compiler: use a helper function to decide on TMU spilling
As we add more compiler optimizations that can increase register pressure
we may decide to disallow TMU spilling in more cases so it is probably
better to move this to its own helper function.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9077>
2021-02-17 09:01:02 +01:00
Iago Toral Quiroga
14af7b3085 broadcom/compiler: don't emit redundant ldunif
If we emit a new uniform and that uniform has already been emitted
in the same block we can just reuse that.

There is a balancing game here between reducing ldunif instructions
and not increasing register pressure too much though, so we put
a limit to how far back we are willing to look for a previous
definition of the uniform. Based on shader-db results, 20 instructions
produces best results.
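
A rough sketch of such a bounded look-back, assuming a flat array of instructions. struct sketch_inst and find_reusable_ldunif are hypothetical names; only the limit of 20 instructions and the "no other instruction writes to the register" check come from this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_LOOKBACK 20 /* limit quoted in the commit message */

/* Simplified stand-in for a compiler instruction. */
struct sketch_inst {
   bool is_ldunif;   /* true if this instruction loads a uniform */
   uint32_t uniform; /* which uniform it loads (if is_ldunif) */
   int dst;          /* register/temp written by the instruction */
};

/* Looks back from 'cursor' (the index where a new instruction would be
 * emitted) for an earlier load of 'uniform' whose destination has not been
 * overwritten since. Returns that destination, or -1 if none is found
 * within MAX_LOOKBACK instructions. */
static int
find_reusable_ldunif(const struct sketch_inst *insts, int cursor,
                     uint32_t uniform)
{
   assert(cursor >= 0);
   int first = cursor > MAX_LOOKBACK ? cursor - MAX_LOOKBACK : 0;

   for (int i = cursor - 1; i >= first; i--) {
      if (!insts[i].is_ldunif || insts[i].uniform != uniform)
         continue;

      /* Reject the candidate if a later instruction wrote the same dst. */
      bool clobbered = false;
      for (int j = i + 1; j < cursor; j++) {
         if (insts[j].dst == insts[i].dst) {
            clobbered = true;
            break;
         }
      }
      if (!clobbered)
         return insts[i].dst;
   }
   return -1;
}
```

A larger MAX_LOOKBACK would find more reuse candidates at the cost of keeping the loaded value live for longer, which is the register pressure trade-off mentioned above.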

total instructions in shared programs: 14928266 -> 14907432 (-0.14%)
instructions in affected programs: 6431841 -> 6411007 (-0.32%)
helped: 15270
HURT: 10772
Instructions are helped.

total uniforms in shared programs: 3944672 -> 3840276 (-2.65%)
uniforms in affected programs: 1827184 -> 1722788 (-5.71%)
helped: 30423
HURT: 845
Uniforms are helped.

total inst-and-stalls in shared programs: 14957813 -> 14936873 (-0.14%)
inst-and-stalls in affected programs: 6475349 -> 6454409 (-0.32%)
helped: 15287
HURT: 10852
Inst-and-stalls are helped.

v2 (Eric):
 - consider ldunifrf too
 - check that no other instruction writes to the register

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9077>
2021-02-17 09:01:01 +01:00
Arcady Goldmints-Orlov
7f61ff7b4d broadcom/compiler: Merge instructions more efficiently
Instructions are allowed to access up to two rf registers, or one rf
register and a small immediate. This change allows qpu_merge_inst to
take full advantage of this by allowing the merging of two instructions
if they have no more than two different rf registers between them,
or one rf register and one small immediate. qpu_merge_inst rewrites
the instructions as needed to pack everything into raddr_a and raddr_b
in the merged instruction.
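
The merge condition can be sketched as a count of distinct read addresses. This is only a model of the rule stated above: struct raddr_use and raddrs_fit_when_merged are hypothetical, and the real qpu_merge_inst also has to rewrite the operand muxes, which is not shown.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of what one instruction reads. */
struct raddr_use {
   int rf[2];    /* register file indices read */
   int num_rf;   /* how many entries of rf[] are valid (0..2) */
   bool has_imm; /* true if the instruction uses a small immediate */
   int imm;      /* the small immediate, if has_imm */
};

/* Adds value to set[] unless already present; returns the new count. */
static int
add_unique(int *set, int count, int value)
{
   for (int i = 0; i < count; i++) {
      if (set[i] == value)
         return count;
   }
   set[count] = value;
   return count + 1;
}

/* True if the combined reads fit in two rf registers, or in one rf register
 * plus one small immediate. */
static bool
raddrs_fit_when_merged(const struct raddr_use *a, const struct raddr_use *b)
{
   assert(a->num_rf <= 2 && b->num_rf <= 2);

   int rfs[4];
   int num_rfs = 0;
   for (int i = 0; i < a->num_rf; i++)
      num_rfs = add_unique(rfs, num_rfs, a->rf[i]);
   for (int i = 0; i < b->num_rf; i++)
      num_rfs = add_unique(rfs, num_rfs, b->rf[i]);

   /* Two different small immediates can never share one slot. */
   if (a->has_imm && b->has_imm && a->imm != b->imm)
      return false;

   if (a->has_imm || b->has_imm)
      return num_rfs <= 1;

   return num_rfs <= 2;
}
```

Two instructions that both read rf3 and rf5 pass this check even though they hold four register operands in total, because only two distinct registers are involved.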

shader-db stats:
total instructions in shared programs: 19938769 -> 18929664 (-5.06%)
instructions in affected programs: 17929438 -> 16920333 (-5.63%)
helped: 95008
HURT: 242
helped stats (abs) min: 1 max: 785 x̄: 10.62 x̃: 7
helped stats (rel) min: 0.30% max: 21.25% x̄: 5.37% x̃: 4.98%
HURT stats (abs)   min: 1 max: 2 x̄: 1.10 x̃: 1
HURT stats (rel)   min: 0.30% max: 3.12% x̄: 1.62% x̃: 1.54%
95% mean confidence interval for instructions value: -10.67 -10.52
95% mean confidence interval for instructions %-change: -5.37% -5.33%
Instructions are helped.

total max-temps in shared programs: 3122664 -> 3112446 (-0.33%)
max-temps in affected programs: 124881 -> 114663 (-8.18%)
helped: 5445
HURT: 0
helped stats (abs) min: 1 max: 15 x̄: 1.88 x̃: 1
helped stats (rel) min: 1.49% max: 40.54% x̄: 8.97% x̃: 6.67%
95% mean confidence interval for max-temps value: -1.91 -1.84
95% mean confidence interval for max-temps %-change: -9.12% -8.81%
Max-temps are helped.

total sfu-stalls in shared programs: 38028 -> 41231 (8.42%)
sfu-stalls in affected programs: 6053 -> 9256 (52.92%)
helped: 664
HURT: 3380
helped stats (abs) min: 1 max: 2 x̄: 1.04 x̃: 1
helped stats (rel) min: 9.09% max: 100.00% x̄: 70.81% x̃: 100.00%
HURT stats (abs)   min: 1 max: 4 x̄: 1.15 x̃: 1
HURT stats (rel)   min: 0.00% max: 300.00% x̄: 46.39% x̃: 25.00%
95% mean confidence interval for sfu-stalls value: 0.76 0.82
95% mean confidence interval for sfu-stalls %-change: 25.03% 29.26%
Sfu-stalls are HURT.

total inst-and-stalls in shared programs: 19976797 -> 18970895 (-5.04%)
inst-and-stalls in affected programs: 17963129 -> 16957227 (-5.60%)
helped: 95017
HURT: 245
helped stats (abs) min: 1 max: 785 x̄: 10.59 x̃: 7
helped stats (rel) min: 0.30% max: 21.25% x̄: 5.35% x̃: 4.95%
HURT stats (abs)   min: 1 max: 2 x̄: 1.09 x̃: 1
HURT stats (rel)   min: 0.30% max: 3.12% x̄: 1.61% x̃: 1.54%
95% mean confidence interval for inst-and-stalls value: -10.64 -10.48
95% mean confidence interval for inst-and-stalls %-change: -5.35% -5.31%
Inst-and-stalls are helped.

v2 (Iago):
 - moved early return for naddrs > 2 even earlier.
 - only update {add,mul}.b mux if instruction has more than one operand.
 - don't OR b->raddr_{a,b} if we are not merging add/mul instructions.
 - don't initialize packed to 0.
 - minor style fixes.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9026>
2021-02-16 11:46:31 +00:00
Alejandro Piñeiro
3f614c6f7c v3dv/meta_copy: get tlb compatible BC compressed formats for copies
So we can use the tlb path for several operations (copy image, clear,
copy buffer to image, etc).

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8929>
2021-02-12 22:04:13 +00:00
Alejandro Piñeiro
6fdf375a90 v3dv/formats: expose support for BC1-3 compressed formats
Even though we can't expose textureCompressionBC as the hw doesn't
support all the formats, we can expose individual formats as
supported.

This gets ~850 CTS tests going from skip to pass, with
patterns like:

  * dEQP-VK.texture.compressed.bc*
  * dEQP-VK.api.copy_and_blit.core.image_to_image.all_formats.color.2d*bc*
  * dEQP-VK.api.copy_and_blit.core.image_to_image.all_formats.color.3d*bc*
  * dEQP-VK.api.info.image_format_properties*bc*
  * etc

v2: BC1-3 formats are texture filterable (Iago)

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8929>
2021-02-12 22:04:13 +00:00
Alejandro Piñeiro
fcb229cbe0 v3dv/device: clarify that we can't expose textureCompressionBC
From spec:

"textureCompressionBC specifies whether all of the BC compressed
 texture formats are supported. If this feature is enabled"

Note the *all*. v3d hw supports BC1, BC2, and BC3, but not BC4 through
BC7.

Let's clarify that we can't expose textureCompressionBC even if we
support some of them.
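
The all-or-nothing nature of the feature bit can be illustrated with a small sketch. The enum and both function names are hypothetical; the only facts taken from this message are that the hw supports BC1, BC2 and BC3 but not BC4 through BC7.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical enumeration of the BC format families. */
enum bc_format { BC1, BC2, BC3, BC4, BC5, BC6H, BC7, BC_FORMAT_COUNT };

/* Per the commit message: BC1-3 are supported, BC4 through BC7 are not. */
static bool
hw_supports_bc(enum bc_format f)
{
   return f == BC1 || f == BC2 || f == BC3;
}

/* The feature may only be exposed if every BC format is supported. */
static bool
can_expose_texture_compression_bc(void)
{
   for (int f = BC1; f < BC_FORMAT_COUNT; f++) {
      if (!hw_supports_bc((enum bc_format)f))
         return false;
   }
   return true;
}
```

With this model the feature query returns false, while per-format queries for BC1, BC2 and BC3 can still report support.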

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8929>
2021-02-12 22:04:13 +00:00
Iago Toral Quiroga
82981ccbb1 broadcom/compiler: use unifa for UBO loads from uniform addresses
This basically processes UBO loads as uniform loads by writing
the load address to the unifa register and reading sequential
values with ldunifa.

This process is faster than going through the TMU, but we can only
use it when the address we are reading from is uniform across all
channels, since we are basically reading from the UBO address
as if it was a uniform stream.

This leads to better performance in the UE4 Shooter demo.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>
2021-02-12 08:24:22 +00:00
Iago Toral Quiroga
878555976e broadcom/compiler: emit ldunifarf when needed
Just like ldunif and ldunifrf, ldunifa writes to the r5 accumulator
and ldunifarf writes to the register file.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>
2021-02-12 08:24:21 +00:00
Iago Toral Quiroga
c2a04aca48 broadcom/compiler: do not DCE ldunifa
ldunifa reads a uniform from the unifa address and updates the unifa
address implicitly, so if we dead-code-eliminate one a follow-up
ldunifa will not read from the appropriate address.

We could avoid this if the compiler ensures that every ldunifa is
paired with an explicit unifa, so for example if we are reading a
vec4, we could emit:

unifa (addr)
ldunifa
unifa (addr+4)
ldunifa
unifa (addr+8)
ldunifa
unifa (addr+12)
ldunifa

instead of:

unifa (addr)
ldunifa
ldunifa
ldunifa
ldunifa

But since each unifa needs 3 delay slots before we can do ldunifa,
that would end up being quite expensive.
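
A toy model of the implicit address update may help show why a single ldunifa cannot simply be dropped. It assumes memory is an array of 32-bit words; struct unifa_stream and the two functions are hypothetical stand-ins for the hardware behavior described above, not compiler code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of the unifa address register and the memory it reads. */
struct unifa_stream {
   const uint32_t *mem; /* memory viewed as 32-bit words */
   size_t num_words;    /* number of words in mem */
   uint32_t addr;       /* current byte address */
};

/* Models a write to unifa: sets the address of the next ldunifa. */
static void
unifa(struct unifa_stream *s, uint32_t addr)
{
   s->addr = addr;
}

/* Models ldunifa: reads one word and implicitly advances the address. */
static uint32_t
ldunifa(struct unifa_stream *s)
{
   assert(s->addr % 4 == 0 && s->addr / 4 < s->num_words);

   uint32_t value = s->mem[s->addr / 4];
   s->addr += 4;
   return value;
}
```

In this model the second ldunifa after a unifa write returns the second word only because the first ldunifa advanced the address; if the first one were removed, the remaining load would return the first word instead.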

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>
2021-02-12 08:24:21 +00:00
Iago Toral Quiroga
efc75e13ea broadcom/compiler: disallow reading two uniforms in the same instruction
The simulator asserts on this, which can happen if we merge a ldunif
(or any other instruction that reads a uniform implicitly) and
ldunifa in the same instruction.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>
2021-02-12 08:24:21 +00:00
Iago Toral Quiroga
e8e4bdae8d broadcom/compiler: ensure 3-slot delay between unifa and ldunifa
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>
2021-02-12 08:24:21 +00:00
Iago Toral Quiroga
42880fdf5d broadcom/compiler: preserve ordering of unifa/ldunifa sequences
unifa writes the address from which follow-up ldunifa loads,
and each ldunifa increments the unifa address by 32 bits, so the
loads need to be ordered too.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>
2021-02-12 08:24:21 +00:00
Iago Toral Quiroga
97c078488f broadcom/compiler: disallow unifa overlap with thread switch/end
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>
2021-02-12 08:24:21 +00:00
Iago Toral Quiroga
24db1a5112 broadcom/compiler: add a helper to check if an instruction writes unifa
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>
2021-02-12 08:24:21 +00:00
Iago Toral Quiroga
4b929ae9f0 broadcom/compiler: don't check for GFXH-1633 on V3D 4.2.x
This has been fixed since V3D 4.2.14 (Rpi4), which is the hardware
we are targeting. Our version resolution doesn't allow us to check
for 4.2 versions lower than .14, but that is okay because the
simulator would still validate this in any case.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>
2021-02-12 08:24:21 +00:00
Iago Toral Quiroga
457ed5aa01 broadcom/compiler: name registers correctly based on V3D version
So we can differentiate between TMU for V3D 3.x and UNIFA for V3D 4.x,
which are aliased.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>
2021-02-12 08:24:21 +00:00
Iago Toral Quiroga
f85fcaa494 broadcom/compiler: pass a devinfo to check if an instruction writes to TMU
V3D 3.x has V3D_QPU_WADDR_TMU which in V3D 4.x is V3D_QPU_WADDR_UNIFA
(which isn't a TMU write address). This change passes a devinfo to
any functions that need to do these checks so we can account for the
target V3D version correctly.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>
2021-02-12 08:24:21 +00:00
Iago Toral Quiroga
449af48f42 broadcom/compiler: add V3D_QPU_WADDR_UNIFA
This only exists in V3D 4.x and aliases V3D_QPU_WADDR_TMU from V3D 3.x.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>
2021-02-12 08:24:21 +00:00
Arcady Goldmints-Orlov
9909fe6bac broadcom/compiler: Skip bool_to_cond where possible
This change keeps track of when a boolean temp is loaded into the flags
by a comparison instruction and uses that information to skip emitting
instructions to set the flags in ntq_emit_bool_to_cond when the flags
already have the right contents.
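
The tracking idea can be sketched as follows, assuming temps are identified by non-negative integers. struct flags_tracker and the three functions are hypothetical; in particular, when the real compiler invalidates its tracking is not described here and is left to the caller in this sketch.

```c
#include <assert.h>
#include <stdbool.h>

#define NO_TEMP (-1)

/* Hypothetical record of which boolean temp the flags currently reflect. */
struct flags_tracker {
   int temp_in_flags; /* temp index, or NO_TEMP if unknown */
};

/* Call when a comparison instruction loads 'temp' into the flags. */
static void
flags_loaded_by_compare(struct flags_tracker *t, int temp)
{
   assert(temp >= 0);
   t->temp_in_flags = temp;
}

/* Call when any other instruction overwrites the flags. */
static void
flags_clobbered(struct flags_tracker *t)
{
   t->temp_in_flags = NO_TEMP;
}

/* True if bool_to_cond for 'temp' can skip emitting flag-setting code. */
static bool
bool_to_cond_can_skip(const struct flags_tracker *t, int temp)
{
   assert(temp >= 0);
   return t->temp_in_flags == temp;
}
```

The skip only applies while nothing has touched the flags since the comparison, so any flag write in between has to clear the tracked temp.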

total instructions in shared programs: 11116502 -> 11112225 (-0.04%)
instructions in affected programs: 631691 -> 627414 (-0.68%)
helped: 1591
HURT: 754
helped stats (abs) min: 1 max: 94 x̄: 4.14 x̃: 3
helped stats (rel) min: 0.11% max: 13.46% x̄: 2.10% x̃: 1.58%
HURT stats (abs)   min: 1 max: 19 x̄: 3.07 x̃: 2
HURT stats (rel)   min: 0.13% max: 19.67% x̄: 1.88% x̃: 1.15%
95% mean confidence interval for instructions value: -2.02 -1.63
95% mean confidence interval for instructions %-change: -0.94% -0.71%
Instructions are helped.

total uniforms in shared programs: 3281555 -> 3281513 (<.01%)
uniforms in affected programs: 1754 -> 1712 (-2.39%)
helped: 10
HURT: 5
helped stats (abs) min: 1 max: 19 x̄: 7.90 x̃: 5
helped stats (rel) min: 0.56% max: 11.11% x̄: 7.37% x̃: 11.05%
HURT stats (abs)   min: 1 max: 15 x̄: 7.40 x̃: 3
HURT stats (rel)   min: 0.64% max: 9.55% x̄: 5.31% x̃: 3.41%
95% mean confidence interval for uniforms value: -8.57 2.97
95% mean confidence interval for uniforms %-change: -7.35% 1.07%
Inconclusive result (value mean confidence interval includes 0).

total max-temps in shared programs: 1758419 -> 1758174 (-0.01%)
max-temps in affected programs: 7006 -> 6761 (-3.50%)
helped: 290
HURT: 14
helped stats (abs) min: 1 max: 8 x̄: 1.13 x̃: 1
helped stats (rel) min: 0.79% max: 22.86% x̄: 6.61% x̃: 4.88%
HURT stats (abs)   min: 1 max: 13 x̄: 6.00 x̃: 3
HURT stats (rel)   min: 1.54% max: 54.17% x̄: 23.99% x̃: 9.12%
95% mean confidence interval for max-temps value: -1.03 -0.58
95% mean confidence interval for max-temps %-change: -6.24% -4.16%
Max-temps are helped.

total sfu-stalls in shared programs: 23676 -> 23610 (-0.28%)
sfu-stalls in affected programs: 1578 -> 1512 (-4.18%)
helped: 257
HURT: 252
helped stats (abs) min: 1 max: 3 x̄: 1.37 x̃: 1
helped stats (rel) min: 11.11% max: 100.00% x̄: 46.70% x̃: 40.00%
HURT stats (abs)   min: 1 max: 2 x̄: 1.14 x̃: 1
HURT stats (rel)   min: 0.00% max: 200.00% x̄: 41.65% x̃: 25.00%
95% mean confidence interval for sfu-stalls value: -0.25 -0.01
95% mean confidence interval for sfu-stalls %-change: -8.24% 2.33%
Inconclusive result (%-change mean confidence interval includes 0).

total inst-and-stalls in shared programs: 11140178 -> 11135835 (-0.04%)
inst-and-stalls in affected programs: 633972 -> 629629 (-0.69%)
helped: 1581
HURT: 755
helped stats (abs) min: 1 max: 94 x̄: 4.26 x̃: 3
helped stats (rel) min: 0.11% max: 13.46% x̄: 2.12% x̃: 1.59%
HURT stats (abs)   min: 1 max: 17 x̄: 3.17 x̃: 2
HURT stats (rel)   min: 0.05% max: 19.67% x̄: 1.93% x̃: 1.20%
95% mean confidence interval for inst-and-stalls value: -2.06 -1.66
95% mean confidence interval for inst-and-stalls %-change: -0.93% -0.70%
Inst-and-stalls are helped.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8933>
2021-02-12 07:05:33 +00:00
Arcady Goldmints-Orlov
8762f29e9c broadcom/compiler: Add a v3d_compile argument to vir_set_[pu]f
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8933>
2021-02-12 07:05:33 +00:00
Iago Toral Quiroga
bd0ef080d0 v3d/compiler: fix QPU scheduler TMU sequence shuffling
The QPU scheduler allows moving certain TMU instructions around and
since we enabled pipelining, we need to protect against the case where
doing this might break a TMU sequence. For example, this test:

dEQP-VK.rasterization.line_continuity.line-strip

Was generating this VIR:

mov tmud, t187
mov.pushz null, t176
mov.ifa tmua, t9
nop null; wrtmuc (img[0].p0 | 0x0)
mov tmut, t185
mov tmud, t180
mov.ifa tmusf, t183
nop null; thrsw

where we have a general TMU access (tmud,tmua) followed by an image
access (wrtmuc, tmut, tmud, tmusf), which the QPU scheduler was turning
into:

nop            ; nop               ; ldunifrf.rf22 (0xffffff00 / -nan)
nop            ; nop               ; wrtmuc (img[0].p0 | 0x0)
nop            ; nop               ; ldtmu.r2
add  r0, r2, 1 ; nop               ; ldtmu.r3
nop            ; nop               ; ldtmu.r4
nop            ; mov  tmud, r0
nop            ; mov.ifa  tmua, rf15
nop            ; mov  tmut, r4     ; thrsw
nop            ; mov  tmud, rf22
nop            ; mov.ifa  tmusf, r3

where it allowed the wrtmuc to move up and before the general TMU access,
leading to an incorrect TMU sequence.

Fix this by flagging TMUA writes (which are the sequence terminators for
general TMU accesses) as writing new TMU configuration, like we do for all
other TMU sequence terminators for textures and images.

Fixes: 197090a3fc ('broadcom/compiler: implement pipelining for general TMU operations')

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8954>
2021-02-10 13:18:25 +00:00
Alejandro Piñeiro
f758b1a25b v3dv: support for depthBiasClamp
Gets tests like the following working:
dEQP-VK.dynamic_state.rs_state.depth_bias_clamp

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8928>
2021-02-10 10:29:09 +00:00
Eric Anholt
bcb5f9f94a v3d: Stop advertising support for flat shading.
The GL frontend can lower this weird GL feature away for us.  This should
fix redeclaration of the gl_Color/SecondaryColor as centroid, since that
case had been missed in the !flat special case here.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8601>
2021-02-09 20:06:48 -08:00
Eric Anholt
ff805f8ac7 v3d: Stop advertising support for PIPE_CAP_*_COLOR_CLAMPED.
The GL frontend can lower away this deprecated GL feature for us.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8601>
2021-02-09 20:06:48 -08:00
Eric Anholt
2992dc7386 v3d: Stop advertising support for PIPE_CAP_TWO_SIDED_COLOR.
The GL frontend can lower away this deprecated GL feature for us.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8601>
2021-02-09 20:06:48 -08:00
Eric Anholt
5ddc2f916f v3d: Clean up vestiges of alpha test lowering.
We had an unnecessary case in our uniforms upload switch statement, since
we no longer advertise the cap.

Fixes: 8ad931808e ("v3d: do not report alpha-test as supported")
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8601>
2021-02-09 20:06:48 -08:00
Arcady Goldmints-Orlov
9e1aa23448 v3dv: initialize render_fd at the top of physical_device_init
This fixes an uninitialized variable warning.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8902>
2021-02-09 06:45:41 +00:00
Iago Toral Quiroga
8eeb61a3bf v3dv: add a perf trace when a device is created with robust buffer access
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8913>
2021-02-08 13:00:16 +00:00
Iago Toral Quiroga
e6f8202749 v3dv: serialize pipeline compilation when debugging shaders
It is possible to compile pipelines in multiple threads, but when we
are dumping debug information for shaders, we want all the outputs
serialized so we can make sense of it.
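
One way to picture this is a global mutex that is only taken when shader debugging is enabled. The sketch below assumes POSIX threads; compile_serialized, the debug flag and the callback are hypothetical, and count_calls merely stands in for the real compile step.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* One lock shared by all compiling threads. */
static pthread_mutex_t compile_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Runs compile(data), holding the global lock only when shader debug output
 * is enabled, so dumps from different threads do not interleave. */
static int
compile_serialized(bool debug_shaders, int (*compile)(void *), void *data)
{
   assert(compile != NULL);

   if (debug_shaders)
      pthread_mutex_lock(&compile_mutex);

   int result = compile(data);

   if (debug_shaders)
      pthread_mutex_unlock(&compile_mutex);

   return result;
}

/* Stand-in for a real compile step: counts how often it was called. */
static int
count_calls(void *data)
{
   int *calls = data;
   (*calls)++;
   return 0;
}
```

When debugging is off the lock is never touched, so normal multi-threaded compilation keeps its parallelism.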

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8913>
2021-02-08 13:00:16 +00:00