Fix defect reported by Coverity Scan.
Evaluation order violation (EVALUATION_ORDER)
write_write_typo: In opcode = opcode = desc->opcode_first, opcode is
written twice with the same value.
Fixes: 3b20208f03 ("broadcom/qpu: add pack/unpack support for v71")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25735>
Add a wrapper to allow calling the right simulator function based on
the hardware under simulation.
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25450>
Some of the counters need to be defined correctly.
v2: Remove references to extended performance counters. The hw does
not support them.
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25450>
In v3d 4.x we handle formats that are reversed or R/B swapped by
applying a format swizzle. This doesn't work on border colors though,
and for that there is a specific bit to reverse the border color in
the texture shader state.
In v3d 7.x we have new reverse and swap R/B bits and we no longer have
a bit to reverse the border color because the new reverse bit applies
to border texels too. Because of this, we absolutely need to use these
new bits in order to get correct border colors in all cases with these
formats.
When we enable the reverse and/or swap R/B bits, we are effectively
applying the format swizzle through them, so in these cases we need to
make sure the swizzle we program in the texture shader state is the
view swizzle provided by the API and not the composition of the format
swizzle with the view swizzle like we do in 4.x for all formats. The
same applies to custom border colors: we must not apply the format
swizzle to them for formats that are reversed or R/B swapped, because
again, this format swizzle is already applied through these new bits.
While we are doing this, we also fully adopt the texture shader state
spec from v3d 7.1.5 for v3d 7.x instead of using a description from
7.1.2 which is incompatible and required the driver to manually pack
some of the bits.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25450>
We can use the actual bpp of each color attachment to compute real
tile memory requirements, which may allow us to choose a larger tile
size configuration than in V3D 4.2 in certain scenarios.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25450>
When the Z scale is too small guardband clipping may not clip
correctly, so disable it, which is a new option in V3D 7.x.
This fixes this test in V3D 7.x without needing any workarounds:
dEQP-VK.draw.renderpass.inverted_depth_ranges.nodepthclamp_deltazero
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25450>
On V3D 7.1 there is not a flag on the Shader State Record to specify
if we are using shared or separate segments. This is done by setting
the vpm input size to 0 (so we need to ensure that the output would be
the max needed for input/output).
We were already doing the latter on the prog_data_vs, so we just need
to use those values, instead of assigning default values.
As we are here, we also add some comments on the compiler part.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25450>
There are some new fields for YCbCr with pointers for the various
planes in multi-planar formats. These need to match the base address
pointer in the texture state, or the hardware will assume this is a
multi-planar texture.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25450>
It is likely that we would need more changes, as this packet changed,
but this is enough to get basic tests running. Any additional support
will be handled with new commits.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25450>
Content, structure and size would depend on the generation. Even if it
is needed at all.
So let's move it to the v3dvx files.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25450>
As the packet CLIPPER_XY scaling, this needs to be computed on 1/64ths
of pixel, instead of 1/256ths of pixels.
As this is the usual values that we get from macros, we add manually a
v42 and v71 macro, and define a new helper (V3DV_X) to get the value
for the current hw version.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25450>
Starting point for v71 version inclusion.
This just adds it as one of the versions to be compiled (on meson),
updates the v3dX/v3dv_X macros, and update the code enough to get it
compiling when building using the two versions. For any packet not
available on v71 we just provide a generic asserted placeholder of
generation not supported.
Any real v71 support will be implemented on following commits.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25450>
So we can use it for ldunif(a) and avoid generating ldunif(a)rf which
can't be paired with conditional instructions.
shader-db (pi5):
total instructions in shared programs: 11357802 -> 11338883 (-0.17%)
instructions in affected programs: 7117889 -> 7098970 (-0.27%)
helped: 24264
HURT: 17574
Instructions are helped.
total uniforms in shared programs: 3857808 -> 3857815 (<.01%)
uniforms in affected programs: 92 -> 99 (7.61%)
helped: 0
HURT: 1
total max-temps in shared programs: 2230904 -> 2230199 (-0.03%)
max-temps in affected programs: 52309 -> 51604 (-1.35%)
helped: 1219
HURT: 725
Max-temps are helped.
total sfu-stalls in shared programs: 15021 -> 15236 (1.43%)
sfu-stalls in affected programs: 6848 -> 7063 (3.14%)
helped: 1866
HURT: 1704
Inconclusive result
total inst-and-stalls in shared programs: 11372823 -> 11354119 (-0.16%)
inst-and-stalls in affected programs: 7149177 -> 7130473 (-0.26%)
helped: 24315
HURT: 17561
Inst-and-stalls are helped.
total nops in shared programs: 273624 -> 273711 (0.03%)
nops in affected programs: 31562 -> 31649 (0.28%)
helped: 1619
HURT: 1854
Inconclusive result (value mean confidence interval includes 0).
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25450>
In programs with a lot of unused temps, if we don't do this, we may
end up recycling previously used rfs more often, which can be
detrimental to instruction pairing.
total instructions in shared programs: 11464335 -> 11444136 (-0.18%)
instructions in affected programs: 8976743 -> 8956544 (-0.23%)
helped: 33196
HURT: 33778
Inconclusive result
total max-temps in shared programs: 2230150 -> 2229445 (-0.03%)
max-temps in affected programs: 86413 -> 85708 (-0.82%)
helped: 2217
HURT: 1523
Max-temps are helped.
total sfu-stalls in shared programs: 18077 -> 17104 (-5.38%)
sfu-stalls in affected programs: 8669 -> 7696 (-11.22%)
helped: 2657
HURT: 2182
Sfu-stalls are helped.
total inst-and-stalls in shared programs: 11482412 -> 11461240 (-0.18%)
inst-and-stalls in affected programs: 8995697 -> 8974525 (-0.24%)
helped: 33319
HURT: 33708
Inconclusive result
total nops in shared programs: 298140 -> 296185 (-0.66%)
nops in affected programs: 52805 -> 50850 (-3.70%)
helped: 3797
HURT: 2662
Inconclusive result
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25450>
The last 3 instructions can't use specific registers so flag all the
nodes for temps used in the last program instructions and try to
avoid assigning any of these. This may help us avoid injecting nops
for the last thread switch instruction.
Because regisster allocation needs to happen before QPU scheduling
and instruction merging we can't tell exactly what the last 3
instructions will be, so we do this for a few more instructions than
just 3.
We only do this for fragment shaders because other shader stages
always end with VPM store instructions that take an small immediate
and therefore will never allow us to merge the final thread switch
earlier, so limiting allocation for these shaders will never improve
anything and might instead be detrimental.
total instructions in shared programs: 11471389 -> 11464335 (-0.06%)
instructions in affected programs: 582908 -> 575854 (-1.21%)
helped: 4669
HURT: 578
Instructions are helped.
total max-temps in shared programs: 2230497 -> 2230150 (-0.02%)
max-temps in affected programs: 5662 -> 5315 (-6.13%)
helped: 344
HURT: 44
Max-temps are helped.
total sfu-stalls in shared programs: 18068 -> 18077 (0.05%)
sfu-stalls in affected programs: 264 -> 273 (3.41%)
helped: 37
HURT: 48
Inconclusive result (value mean confidence interval includes 0).
total inst-and-stalls in shared programs: 11489457 -> 11482412 (-0.06%)
inst-and-stalls in affected programs: 585180 -> 578135 (-1.20%)
helped: 4659
HURT: 588
Inst-and-stalls are helped.
total nops in shared programs: 301738 -> 298140 (-1.19%)
nops in affected programs: 14680 -> 11082 (-24.51%)
helped: 3252
HURT: 108
Nops are helped.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25450>
This commits adds the qpu definitions for several new v71
instructions.
Packing:
* vpack does a 2x32 to 2x16 bit integer pack
* v8pack: Pack 2 x 2x16 bit integers into 4x8 bits
* v10pack packs parts of 2 2x16 bit integer into r10g10b10a2.
* v11fpack packs parts of 2 2x16 bit float into r11g11b10 rounding
to nearest
Conversion to unorm/snorm:
* vftounorm8/vftosnorm8: converts from 2x16-bit floating point
to 2x8 bit unorm/snorm.
* ftounorm16/ftosnorm16: converts floating point to 16-bit
unorm/snorm
* vftounorm10lo: Convert 2x16-bit floating point to 2x10-bit unorm
* vftounorm10hi: Convert 2x16-bit floating point to one 2-bit and one 10-bit unorm
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25450>