Commit graph

1304 commits

Author SHA1 Message Date
Iago Toral Quiroga
9d16d2d0be broadcom/compiler: allow dead code elimination of unused trailing ldunifa
If a ldunifa is the last in a sequence and is not used, we can safely
eliminate it.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9128>
2021-02-23 08:08:01 +00:00
Iago Toral Quiroga
e20ae14978 broadcom/compiler: fix ldunif optimization
When we look back for a previous uniform definition we want to
start looking from the current position of the cursor, not the
end of the current block. The latter only works when translating
from NIR, since in that case both always match, but any optimization
pass may rewrite code and emit uniforms at any place in the middle of
the program.

Also, ntq_store_dest expects result to be written by the last instruction
to handle the case where it is stored to a NIR register. That won't be
the case if the result comes from an optimized uniform, so in that case
we need to insert a MOV, like we do in non-uniform control flow.

v2: fix ntq_store_dest for optimized uniforms.

Fixes: 14af7b3085 ('broadcom/compiler: don't emit redundant ldunif')
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Acked-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9128>
2021-02-23 08:08:01 +00:00
Eric Anholt
60d413b894 ci: Move the piglit expectations lists to the per-driver CI dirs.
Now changing piglit expectations won't retest everyone else's drivers.

Acked-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9161>
2021-02-22 23:02:43 +00:00
Eric Anholt
ad77170b85 ci: Move the dEQP and traces expectations to the per-driver CI dirs.
This means less custom test-source-dep stuff for these drivers, though it
means that touching the CI expects files will cause a bit more retesting:

- broadcom drivers retest as a group (but Igalia requested that
  organization of CI files)
- radv+radeonsi retest as a group
- lvp+llvmpipe retest as a group

Acked-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9161>
2021-02-22 23:02:42 +00:00
Eric Anholt
dab845d457 ci: Move specific driver testing to separate files in separate dirs.
The top-level gitlab-ci.yml is big and unwieldy when one wants to work on
CI for a single driver.  Move the drivers to separate include files for
ease of finding all your driver's tests, and also to pave the way for work
on a single driver's CI to not retest all other drivers.

Reviewed-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9139>
2021-02-19 17:30:36 +00:00
Eric Anholt
a687e71afd v3d/qpu: Avoid leaking memory in the QPU disasm test.
Required to run this test under ASan, as we'll be soon doing for building
ARM drivers with asan testing.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9070>
2021-02-18 00:49:00 +00:00
Iago Toral Quiroga
064b846949 broadcom/compiler: don't dump shader-db stats for failed shaders
Shaders that fail register allocation were dumped with an instruction
count of 0, so getting them to compile would show up as an instruction
count regression. Also, the LOST/GAINED stats depend on us not dumping
data for failed shaders, which is why we were always seeing 0/0 there.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9077>
2021-02-17 09:01:02 +01:00
Iago Toral Quiroga
df6c19c1fd broadcom/compiler: use a helper function to decide on TMU spilling
As we add more compiler optimizations that can increase register pressure
we may decide to disallow TMU spilling in more cases so it is probably
better to move this to its own helper function.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9077>
2021-02-17 09:01:02 +01:00
Iago Toral Quiroga
14af7b3085 broadcom/compiler: don't emit redundant ldunif
If we emit a new uniform and that uniform has already been emitted
in the same block we can just reuse that.

There is a balancing game here between reducing ldunif instructions
and not increasing register pressure too much though, so we put
a limit to how far back we are willing to look for a previous
definition of the uniform. Based on shader-db results, 20 instructions
produces best results.

total instructions in shared programs: 14928266 -> 14907432 (-0.14%)
instructions in affected programs: 6431841 -> 6411007 (-0.32%)
helped: 15270
HURT: 10772
Instructions are helped.

total uniforms in shared programs: 3944672 -> 3840276 (-2.65%)
uniforms in affected programs: 1827184 -> 1722788 (-5.71%)
helped: 30423
HURT: 845
Uniforms are helped.

total inst-and-stalls in shared programs: 14957813 -> 14936873 (-0.14%)
inst-and-stalls in affected programs: 6475349 -> 6454409 (-0.32%)
helped: 15287
HURT: 10852
Inst-and-stalls are helped.

v2 (Eric):
 - consider ldunifrf too
 - check that no other instruction writes to the register

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9077>
2021-02-17 09:01:01 +01:00
Arcady Goldmints-Orlov
7f61ff7b4d broadcom/compiler: Merge instructions more efficiently
Instructions are allowed to access up to two rf registers, or one rf
register and a small immediate. This change allows qpu_merge_inst to
take full advantage of this by allowint the merging of two instructions
if they have no more than two different rf registers between them,
or one rf register and one small immediate. qpu_merge_inst rewrites
the instructions as needed to pack everything into raddr_a and raddr_b
in the merged instruction.

shader-db stats:
total instructions in shared programs: 19938769 -> 18929664 (-5.06%)
instructions in affected programs: 17929438 -> 16920333 (-5.63%)
helped: 95008
HURT: 242
helped stats (abs) min: 1 max: 785 x̄: 10.62 x̃: 7
helped stats (rel) min: 0.30% max: 21.25% x̄: 5.37% x̃: 4.98%
HURT stats (abs)   min: 1 max: 2 x̄: 1.10 x̃: 1
HURT stats (rel)   min: 0.30% max: 3.12% x̄: 1.62% x̃: 1.54%
95% mean confidence interval for instructions value: -10.67 -10.52
95% mean confidence interval for instructions %-change: -5.37% -5.33%
Instructions are helped.

total max-temps in shared programs: 3122664 -> 3112446 (-0.33%)
max-temps in affected programs: 124881 -> 114663 (-8.18%)
helped: 5445
HURT: 0
helped stats (abs) min: 1 max: 15 x̄: 1.88 x̃: 1
helped stats (rel) min: 1.49% max: 40.54% x̄: 8.97% x̃: 6.67%
95% mean confidence interval for max-temps value: -1.91 -1.84
95% mean confidence interval for max-temps %-change: -9.12% -8.81%
Max-temps are helped.

total sfu-stalls in shared programs: 38028 -> 41231 (8.42%)
sfu-stalls in affected programs: 6053 -> 9256 (52.92%)
helped: 664
HURT: 3380
helped stats (abs) min: 1 max: 2 x̄: 1.04 x̃: 1
helped stats (rel) min: 9.09% max: 100.00% x̄: 70.81% x̃: 100.00%
HURT stats (abs)   min: 1 max: 4 x̄: 1.15 x̃: 1
HURT stats (rel)   min: 0.00% max: 300.00% x̄: 46.39% x̃: 25.00%
95% mean confidence interval for sfu-stalls value: 0.76 0.82
95% mean confidence interval for sfu-stalls %-change: 25.03% 29.26%
Sfu-stalls are HURT.

total inst-and-stalls in shared programs: 19976797 -> 18970895 (-5.04%)
inst-and-stalls in affected programs: 17963129 -> 16957227 (-5.60%)
helped: 95017
HURT: 245
helped stats (abs) min: 1 max: 785 x̄: 10.59 x̃: 7
helped stats (rel) min: 0.30% max: 21.25% x̄: 5.35% x̃: 4.95%
HURT stats (abs)   min: 1 max: 2 x̄: 1.09 x̃: 1
HURT stats (rel)   min: 0.30% max: 3.12% x̄: 1.61% x̃: 1.54%
95% mean confidence interval for inst-and-stalls value: -10.64 -10.48
95% mean confidence interval for inst-and-stalls %-change: -5.35% -5.31%
Inst-and-stalls are helped.

v2 (Iago):
 - moved early return for naddrs > 2 even earlier.
 - only update {add,mul}.b mux if instruction has more than one operand.
 - don't OR b->raddr_{a,b} if we are not merging add/mul instructions.
 - don't initialize packed to 0.
 - minor style fixes.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9026>
2021-02-16 11:46:31 +00:00
Alejandro Piñeiro
3f614c6f7c v3dv/meta_copy: get tlb compatible BC compressed formats for copies
So we can use the tlb path for several operations (copy image, clear,
copy buffer to image, etc).

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8929>
2021-02-12 22:04:13 +00:00
Alejandro Piñeiro
6fdf375a90 v3dv/formats: expose support for BC1-3 compressed formats
Even though we can't expose textureCompressedBC as the hw doesn't
support all the formats, we can expose as supported individual
formats.

This gets several ~850 CTS tests going from skip to pass, with
patterns like:

  * dEQP-VK.texture.compressed.bc*
  * dEQP-VK.api.copy_and_blit.core.image_to_image.all_formats.color.2d*bc*
  * dEQP-VK.api.copy_and_blit.core.image_to_image.all_formats.color.3d*bc*
  * dEQP-VK.api.info.image_format_properties*bc*
  * etc

v2: BC1-3 formats are texture filterable (Iago)

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8929>
2021-02-12 22:04:13 +00:00
Alejandro Piñeiro
fcb229cbe0 v3dv/device: clarify that we can't expose textureCompressionBC
From spec:

"textureCompressionBC specifies whether all of the BC compressed
 texture formats are supported. If this feature is enabled"

Note the *all*. v3d hw supports BC1, BC2, and BC3, but not BC4 through
BC7.

Let's clarify that we can't expose textureCompressionBC even if we
support some of them.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8929>
2021-02-12 22:04:13 +00:00
Iago Toral Quiroga
82981ccbb1 broadcom/compiler: use unifa for UBO loads from uniform addresses
This basically processes UBO loads as uniform loads by writing
the load address to the unifa register and reading sequential
values with ldunifa.

This process is faster than going through the TMU, but we can only
use it when the address we are reading from is uniform across all
channels, since we are basically reading from the UBO address
as if it was a uniform stream.

This leads to better performance in the UE4 Shooter demo.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>
2021-02-12 08:24:22 +00:00
Iago Toral Quiroga
878555976e broadcom/compiler: emit ldunifarf when needed
Just like ldunif and ldunifrf, ldunifa writes to the r5 accumulator
and ldunifarf writes to the register file.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>
2021-02-12 08:24:21 +00:00
Iago Toral Quiroga
c2a04aca48 broadcom/compiler: do not DCE ldunifa
ldunifa reads a uniform from the unifa address and updates the unifa
address implicitly, so if we dead-code-eliminate one a follow-up
ldunifa will not read from the appropriate address.

We could avoid this if the compiler ensures that every ldunifa is
paired with an explicit unifa, so for example if we are reading a
vec4, we could emit:

unifa (addrr)
ldunifa
unifa (addr+4)
ldunifa
unifa (addr+8)
ldunifa
unifa (addr+12)
ldunifa

instead of:

unifa (addr)
ldunifa
ldunifa
ldunifa
ldunifa

But since each unifa has a 3 delay slot before we can do ldunifa,
that would end up being quite expensive.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>
2021-02-12 08:24:21 +00:00
Iago Toral Quiroga
efc75e13ea broadcom/compiler: disallow reading two uniforms in the same instruction
The simulator asserts on this, which can happen if we merge a ldunif
(or any other instruction that reads a uniform implicitly) and
ldunifa in the same instruction.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>
2021-02-12 08:24:21 +00:00
Iago Toral Quiroga
e8e4bdae8d broadcom/compiler: ensure 3-slot delay between unifa and ldunifa
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>
2021-02-12 08:24:21 +00:00
Iago Toral Quiroga
42880fdf5d broadcom/compiler: preserve ordering of unifa/ldunifa sequences
unifa writes the addresss from which follow-up ldunifa loads,
and each ldunifa increments the unifa addeess by 32-bit so the
loads need to be ordered too.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>
2021-02-12 08:24:21 +00:00
Iago Toral Quiroga
97c078488f broadcom/compiler: disallow unifa overlap with thread switch/end
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>
2021-02-12 08:24:21 +00:00
Iago Toral Quiroga
24db1a5112 broadcom/compiler: add a helper to check if an instruction writes unifa
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>
2021-02-12 08:24:21 +00:00
Iago Toral Quiroga
4b929ae9f0 broadcom/compiler: don't check for GFXH-1633 on V3D 4.2.x
This has been fixed since V3D 4.2.14 (Rpi4), which is the hardware
we are targetting. Our version resolution doesn't allow us to check
for 4.2 versions lower than .14, but that is okay because the
simulator would still validate this in any case.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>
2021-02-12 08:24:21 +00:00
Iago Toral Quiroga
457ed5aa01 broadcom/compiler: name registers correctly based on V3D version
So we can differentiate between TMU for V3D 4.x and UNIFA for V3D 4.x,
which are aliased.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>
2021-02-12 08:24:21 +00:00
Iago Toral Quiroga
f85fcaa494 broadcom/compiler: pass a devinfo to check if an instruction writes to TMU
V3D 3.x has V3D_QPU_WADDR_TMU which in V3D 4.x is V3D_QPU_WADDR_UNIFA
(which isn't a TMU write address). This change passes a devinfo to
any functions that need to do these checks so we can account for the
target V3D version correctly.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>
2021-02-12 08:24:21 +00:00
Iago Toral Quiroga
449af48f42 broadcom/compiler: add V3D_QPU_WADDR_UNIFA
This only exists in V3D 4.x and aliases V3D_QPU_WADDR_TMU from V3D 3.x.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>
2021-02-12 08:24:21 +00:00
Arcady Goldmints-Orlov
9909fe6bac broadcom/compiler: Skip bool_to_cond where possible
This change keeps track of when a boolean temp is loaded into the flags
by a comparison instruction and uses that information to skip emitting
instructions to set the flags in ntq_emit_bool_to_cond when the flags
already have the right contents.

total instructions in shared programs: 11116502 -> 11112225 (-0.04%)
instructions in affected programs: 631691 -> 627414 (-0.68%)
helped: 1591
HURT: 754
helped stats (abs) min: 1 max: 94 x̄: 4.14 x̃: 3
helped stats (rel) min: 0.11% max: 13.46% x̄: 2.10% x̃: 1.58%
HURT stats (abs)   min: 1 max: 19 x̄: 3.07 x̃: 2
HURT stats (rel)   min: 0.13% max: 19.67% x̄: 1.88% x̃: 1.15%
95% mean confidence interval for instructions value: -2.02 -1.63
95% mean confidence interval for instructions %-change: -0.94% -0.71%
Instructions are helped.

total uniforms in shared programs: 3281555 -> 3281513 (<.01%)
uniforms in affected programs: 1754 -> 1712 (-2.39%)
helped: 10
HURT: 5
helped stats (abs) min: 1 max: 19 x̄: 7.90 x̃: 5
helped stats (rel) min: 0.56% max: 11.11% x̄: 7.37% x̃: 11.05%
HURT stats (abs)   min: 1 max: 15 x̄: 7.40 x̃: 3
HURT stats (rel)   min: 0.64% max: 9.55% x̄: 5.31% x̃: 3.41%
95% mean confidence interval for uniforms value: -8.57 2.97
95% mean confidence interval for uniforms %-change: -7.35% 1.07%
Inconclusive result (value mean confidence interval includes 0).

total max-temps in shared programs: 1758419 -> 1758174 (-0.01%)
max-temps in affected programs: 7006 -> 6761 (-3.50%)
helped: 290
HURT: 14
helped stats (abs) min: 1 max: 8 x̄: 1.13 x̃: 1
helped stats (rel) min: 0.79% max: 22.86% x̄: 6.61% x̃: 4.88%
HURT stats (abs)   min: 1 max: 13 x̄: 6.00 x̃: 3
HURT stats (rel)   min: 1.54% max: 54.17% x̄: 23.99% x̃: 9.12%
95% mean confidence interval for max-temps value: -1.03 -0.58
95% mean confidence interval for max-temps %-change: -6.24% -4.16%
Max-temps are helped.

total sfu-stalls in shared programs: 23676 -> 23610 (-0.28%)
sfu-stalls in affected programs: 1578 -> 1512 (-4.18%)
helped: 257
HURT: 252
helped stats (abs) min: 1 max: 3 x̄: 1.37 x̃: 1
helped stats (rel) min: 11.11% max: 100.00% x̄: 46.70% x̃: 40.00%
HURT stats (abs)   min: 1 max: 2 x̄: 1.14 x̃: 1
HURT stats (rel)   min: 0.00% max: 200.00% x̄: 41.65% x̃: 25.00%
95% mean confidence interval for sfu-stalls value: -0.25 -0.01
95% mean confidence interval for sfu-stalls %-change: -8.24% 2.33%
Inconclusive result (%-change mean confidence interval includes 0).

total inst-and-stalls in shared programs: 11140178 -> 11135835 (-0.04%)
inst-and-stalls in affected programs: 633972 -> 629629 (-0.69%)
helped: 1581
HURT: 755
helped stats (abs) min: 1 max: 94 x̄: 4.26 x̃: 3
helped stats (rel) min: 0.11% max: 13.46% x̄: 2.12% x̃: 1.59%
HURT stats (abs)   min: 1 max: 17 x̄: 3.17 x̃: 2
HURT stats (rel)   min: 0.05% max: 19.67% x̄: 1.93% x̃: 1.20%
95% mean confidence interval for inst-and-stalls value: -2.06 -1.66
95% mean confidence interval for inst-and-stalls %-change: -0.93% -0.70%
Inst-and-stalls are helped.

Reviewed-by: Iago Toral Quioroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8933>
2021-02-12 07:05:33 +00:00
Arcady Goldmints-Orlov
8762f29e9c broadcom/compiler: Add a v3d_compile argument to vir_set_[pu]f
Reviewed-by: Iago Toral Quioroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8933>
2021-02-12 07:05:33 +00:00
Iago Toral Quiroga
bd0ef080d0 v3d/compiler: fix QPU scheduler TMU sequence shuffling
The QPU scheduler allows to move certain TMU instructions around and
since we enabled pipelining, we need to protect against the case where
doing this might break a TMU sequence. For example, this test:

dEQP-VK.rasterization.line_continuity.line-strip

Was generating this VIR:

mov tmud, t187
mov.pushz null, t176
mov.ifa tmua, t9
nop null; wrtmuc (img[0].p0 | 0x0)
mov tmut, t185
mov tmud, t180
mov.ifa tmusf, t183
nop null; thrsw

where we have a general TMU access (tmud,tmua) followed by an image
access (wrtmuc, tmut, tmud, tmusf), which the QPU scheduler was turning
into:

nop            ; nop               ; ldunifrf.rf22 (0xffffff00 / -nan)
nop            ; nop               ; wrtmuc (img[0].p0 | 0x0)
nop            ; nop               ; ldtmu.r2
add  r0, r2, 1 ; nop               ; ldtmu.r3
nop            ; nop               ; ldtmu.r4
nop            ; mov  tmud, r0
nop            ; mov.ifa  tmua, rf15
nop            ; mov  tmut, r4     ; thrsw
nop            ; mov  tmud, rf22
nop            ; mov.ifa  tmusf, r3

where it allowed the wrtmuc to move up and before the general TMU access,
leading to an incorrect TMU sequence.

Fix this by flagging TMUA writes (which are the sequence terminators for
general TMU accessess) as writing new TMU configuration, like we do for all
other TMU sequence terminators for textures and images.

Fixes: 197090a3fc ('broadcom/compiler: implement pipelining for general TMU operations')

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8954>
2021-02-10 13:18:25 +00:00
Alejandro Piñeiro
f758b1a25b v3dv: support for depthBiasClamp
Gets tests like the following working:
dEQP-VK.dynamic_state.rs_state.depth_bias_clamp

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8928>
2021-02-10 10:29:09 +00:00
Eric Anholt
bcb5f9f94a v3d: Stop advertising support for flat shading.
The GL frontend can lower this weird GL feature away for us.  This should
fix redeclaration of the gl_Color/SecondaryColor as centroid, since that
case had been missed in the !flat special case here.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8601>
2021-02-09 20:06:48 -08:00
Eric Anholt
ff805f8ac7 v3d: Stop advertising support for PIPE_CAP_*_COLOR_CLAMPED.
The GL frontend can lower away this deprecated GL feature for us.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8601>
2021-02-09 20:06:48 -08:00
Eric Anholt
2992dc7386 v3d: Stop advertising support for PIPE_CAP_TWO_SIDED_COLOR.
The GL frontend can lower away this deprecated GL feature for us.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8601>
2021-02-09 20:06:48 -08:00
Eric Anholt
5ddc2f916f v3d: Clean up vestiges of alpha test lowering.
We had an unnecessary case in our uniforms upload switch statement, since
we no longer advertise the cap.

Fixes: 8ad931808e ("v3d: do not report alpha-test as supported")
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8601>
2021-02-09 20:06:48 -08:00
Arcady Goldmints-Orlov
9e1aa23448 v3dv: initialize render_fd at the top of physical_device_init
This fixes an uninitialized variable warning.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8902>
2021-02-09 06:45:41 +00:00
Iago Toral Quiroga
8eeb61a3bf v3dv: add a perf trace when a device is created with robust buffer access
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8913>
2021-02-08 13:00:16 +00:00
Iago Toral Quiroga
e6f8202749 v3dv: serialize pipeline compilation when debugging shaders
It is possible to compile pipelines in multiple threads, but when we
are dumping debug information for shaders, we want all the outputs
serialized so we can make sense of it.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8913>
2021-02-08 13:00:16 +00:00
Iago Toral Quiroga
44dcc4c24d v3d/common: use spaces instead of TABs
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8913>
2021-02-08 13:00:16 +00:00
Arcady Goldmints-Orlov
0b29a8a206 Revert "broadcom/compiler: improve generation of if conditions"
This reverts commit 93f8f83a95.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8903>
2021-02-08 06:52:59 +00:00
Iago Toral Quiroga
c72d99550c v3dv: allow a component swizzle in copy_buffer_to_image_shader
This is trivial because this path relies on our blit_shader interface
which supports this already, so it just needs to pass it along.

I don't think this is ever triggered practice, since we should be
able to handle any case that could require this with the texel buffer
path, but at least it allows us to simplify a bit the code.

Tested by  manually disabling the priority paths to ensure we exercise
component swizzles with this path.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8875>
2021-02-05 13:31:25 +01:00
Iago Toral Quiroga
4d4a0797ce v3dv: batch copies in the copy_buffer_to_image_blit path
This path is very memory hungry and batching allows us to reduce
this by allocating memory just once and reuse it for all regions
in the batch instead of allocating once per region.

v2: document return value for this function (apinheiro).

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8875>
2021-02-05 13:31:25 +01:00
Iago Toral Quiroga
7aa04ad065 v3dv: handle D/S buffer to image copies with the texel buffer path
We do this by converting them to a compatible color copy and using a
destination color mask as well as a source component swizzle to handle
D24 format semantics according to the V3D hardware requirements,
similar to what we do with our blit shader interface.

This path is faster than the terrible copy_buffer_to_image_blit,
which requires to copy the source buffer to a tiled image first
and should be avoided as much as possible, since it is slow and
can also quickly increase device memory usage.

This fixes occasional OOM errors when loading traces in renderdoc.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8875>
2021-02-05 13:31:25 +01:00
Jason Ekstrand
0260b4a7e7 vulkan: Add a common helper for enumerating instance extension properties
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8792>
2021-02-04 20:02:12 +00:00
Iago Toral Quiroga
6630825dcf broadcom/compiler: let QPUs stall on TMU input/config overflows
We have been trying to avoid this by tracking fifo usages in the driver and
flushing all outstanding TMU sequences if we overflowed any of these, however,
this is actually not the most efficient strategy. Instead, we would like to
flush only enough operations to get things going again, which is better for
pipelining. Doing that in the driver would require some additional work, but
thankfully, it is not required, since this seems to be what the hardware does
automatically, so we can just remove overflow tracking for these two fifos
and enjoy the benefits.

This also further improves shader-db stats:

total instructions in shared programs: 8975062 -> 8955145 (-0.22%)
instructions in affected programs: 1637624 -> 1617707 (-1.22%)
helped: 4050
HURT: 2241
Instructions are helped.

total threads in shared programs: 236802 -> 237042 (0.10%)
threads in affected programs: 252 -> 492 (95.24%)
helped: 122
HURT: 2
Threads are helped.

total sfu-stalls in shared programs: 19901 -> 19592 (-1.55%)
sfu-stalls in affected programs: 4744 -> 4435 (-6.51%)
helped: 1248
HURT: 1051
Sfu-stalls are helped.

total inst-and-stalls in shared programs: 8994963 -> 8974737 (-0.22%)
inst-and-stalls in affected programs: 1636184 -> 1615958 (-1.24%)
helped: 4050
HURT: 2239
Inst-and-stalls are helped.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8825>
2021-02-04 10:33:10 +00:00
Iago Toral Quiroga
d57a358128 broadcom/compiler: log spilling shaders to perf output
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8825>
2021-02-04 10:33:10 +00:00
Iago Toral Quiroga
0f90b729fb broadcom/compiler: disallow spilling if TMU pipelining was enabled
TMU pipelining makes TMU spilling difficult and can easily lead to
doing large amounts of spills to compile a shader. It is best to
only use pipelining if we can compile without spilling.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8825>
2021-02-04 10:33:10 +00:00
Iago Toral Quiroga
e18d6bbf2f broadcom/compiler: disable TMU pipelining if we fail to register allocate
TMU pipelining can severely reduce our capacity to emit TMU spills,
causing us to fail to compile a shader we may otherwise be able to
compile. This is because pipelining extends the liveness of TMU
sequences by posponing the thread switch and LDTMU until a result
is needed, and we can't emit TMU spills while in the middle of a
TMU sequence.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8825>
2021-02-04 10:33:10 +00:00
Iago Toral Quiroga
ecd654bf00 broadcom/compiler: support pipelining of image load/store instructions
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8825>
2021-02-04 10:33:10 +00:00
Iago Toral Quiroga
0bdc6dca6c broadcom/compiler: refactor image load/store TMU emission code
This mostly moves code around to group together the code involved with
actually emitting a TMU sequence. This will make it a bit easier to
then implement pipelining while reusing this code, similar to how we
handled other cases of TMU pipelining.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8825>
2021-02-04 10:33:10 +00:00
Iago Toral Quiroga
be45960d3e broadcom/compiler: support pipelining of tex instructions
This follows the same idea as for TMU general instructions of reusing
the existing infrastructure to first count required register writes and
flush outstanding TMU dependencies, and then emit the actual writes, which
requires that we split the code that decides about register writes to
a helper.

We also need to start using a component mask instead of the number
of components that we need to read with a particular TMU operation.

v2: update tmu_writes for V3D_QPU_WADDR_TMUOFF

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8825>
2021-02-04 10:33:10 +00:00
Iago Toral Quiroga
197090a3fc broadcom/compiler: implement pipelining for general TMU operations
This creates the basic infrastructure to implement TMU pipelining and
applies it to general TMU. Follow-up patches will expand this
to texture and image/load store operations.

TMU pipelining means that we don't immediately end TMU sequences,
and instead, we postpone the thread switch and LDTMU (for loads)
or TMUWT (for stores) until we really need to do them.

For loads, we may need to flush them if another instruction reads
the result of a load operation. We can detect this because in that
case ntq_get_src() will not find the definition for that ssa/reg
(since we have not emitted the LDTMU instructions for it yet), so
when that happens, we flush all pending TMU operations and then
try again to find the definition for the source.

We also need to flush pending TMU operations when we reach the end
of a control flow block, to prevent the case where we emit a TMU
operation in a block, but then we read the result in another block
possibly under control flow.

It is also required to flush across barriers and discards to honor
their semantics.

Since this change doesn't implement pipelining for texture and
image load/store, we also need to flush outstanding TMU operations
if we ever have to emit one of these. This will be corrected with
follow-up patches.

Finally, the TMU has 3 fifos where it can queue TMU operations.
These fifos have limited capacity, depending on the number of threads
used to compile the shader, so we also need to ensure that we
don't have too many outstanding TMU requests and flush pending
TMU operations if a new TMU operation would overflow any of these
fifos. While overflowing the Input and Config fifos only leads
to stalls (which we want to avoid anyway), overflowing the Output
fifo is incorrect and would end up with a broken shader. This means
that we need to know how many TMU register writes are required
to emit a TMU operation and use that information to decide if we need
to flush pending TMU operations before we emit any register
writes for the new TMU operation.

v2: fix TMU flushing for NIR registers reads (jasuarez)

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8825>
2021-02-04 10:33:10 +00:00