Add new traces, remove old ones, and add more information about the
unsupported and crashing ones.
Reviewed-by: David Heidelberg <david.heidelberg@collabora.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23319>
This is in preparation for removing _MTX_INITIALIZER_NP.
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Acked-by: David Heidelberg <david.heidelberg@collabora.com>
Acked-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21284>
Piglit tests for v3d highlighted issues with the padding
computation when allocating memory for slices. This change
moves the fixes from v3d to v3dv.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23260>
So far we have 12 jobs for v3d-gl (OpenGL/ES and piglit), 1 job for
v3d-traces, and 10 jobs for v3dv-vulkan, but we only have 21 rpi4
devices for testing.
So let's reduce v3d-gl from 12 to 10 jobs, so that all 21 jobs can run
simultaneously.
Also, as the goal is for each job to take no more than 15 minutes,
let's slightly increase the fraction for v3dv and add a fraction for
v3d-gl as well, so that all jobs ideally stay under the time limit.
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23285>
This implementation reinterprets the stencil data as a RGBA8888 texture.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23136>
Besides blitting color buffers, we can also use the tile buffers to blit
depth and stencil buffers.
This also fixes several piglit tests.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23136>
Now that GLSL 1.30 is exposed and we can read clipdistance arrays
in the fragment shader, let's enable this capability.
It fixes
`spec@glsl-1.30@execution@clipping@fs-clip-distance-interpolated,Crash`.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23232>
Fixes hundreds of dEQP-VK.api.copy_and_blit.* tests when including the
assert that the alignment in align() is valid, as added in !20153.
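As background on why such an assert matters, here is a minimal sketch, assuming "valid" means a non-zero power of two (the requirement of a mask-based round-up). The helper names are hypothetical and this is not the actual fix, only an illustration of the constraint:

```c
#include <assert.h>
#include <stdint.h>

/* Mask-based round-up. Only correct when alignment is a non-zero
 * power of two, which is what the assert checks (assumption). */
static inline uint32_t
align_pot(uint32_t value, uint32_t alignment)
{
   assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
   return (value + alignment - 1) & ~(alignment - 1);
}

/* Division-based round-up. Works for any non-zero alignment. */
static inline uint32_t
align_any(uint32_t value, uint32_t alignment)
{
   assert(alignment != 0);
   return (value + alignment - 1) / alignment * alignment;
}
```

With an alignment of 12, the mask formula would turn 13 into 16, which is not a multiple of 12, while the division-based variant yields 24.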
Fixes: 3ba839bf73 ("v3dv: align compressed image regions to block size")
Signed-off-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23224>
Already in hard-freeze, so we don't have to worry about breaking changes.
Significant changes:
- LLVM 15 is used instead of 11 or 13
- /dev/shm has to be manually mounted
- Debian 12 uses libdrm 2.4.114
- reworked rootfs creation, from debootstrap to mmdebstrap
- split `create-rootfs.sh` into `lava_build.sh`, `setup-rootfs.sh`, and `strip-rootfs.sh`
- dropped winehq repository for now (Debian wine is up-to-date enough)
- we use wine now, no need to explicitly call wine64
- bumped libasan from version 6 to 8
Signed-off-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21977>
We have been relying on NIR's gather info pass for this,
but that is not safe unless we are certain we always call
it after any other pass that may emit a control barrier.
As it stands, nir_zero_initialize_shared_memory can emit a
control barrier and we don't call the gather info pass after
it, which is problematic. The only reason this is not a
problem right now is that for non-scoped barriers (which
is what we currently use) it doesn't emit a scoped barrier, just
a regular memory barrier (which is probably a bug in the pass!).
As soon as we move to scoped barriers this is going to be a
problem, since we need to know when we emit a control
barrier to ensure supergroup calculations prevent deadlocks at
the barrier op.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23228>
Now that we have nir_fsub_imm, let's use it to save some typing!
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23179>
This simplifies things a bit. Note that in some cases, the arguments are
swapped, because multiplications are commutative, and nir_fmul_imm only
allows the second operand to be an immediate.
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23179>
disable_tmu_pipelining has been recently set to false on two
strategies that should set it to true.
Fixes the following CTS test:
dEQP-VK.graphicsfuzz.spv-stable-maze-flatten-copy-composite
Fixes: c950098ab ("broadcom/compiler: move buffer loads to lower register pressure")
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23207>
Right now, if we fail to register allocate, we return the qpu_insts
that we had at that point, even if the driver can't really use them.
Also, v3dv_pipeline was already assuming that NULL would be returned on
failure, returning VK_ERROR_UNKNOWN in that case.
This allows CTS tests with a lot of register pressure, which now and
then regress to the point of failing to allocate, to finish with an
error instead of blocking forever. For example:
dEQP-VK.graphicsfuzz.spv-stable-maze-flatten-copy-composite
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23203>
This fixes an assert crash in UE4 when forcing the blit path for
image copies, caused by an image copy of a small miplevel whose
pixel size is smaller than a single compressed block, leading to
an empty blit region.
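The failure mode can be illustrated with a minimal sketch, assuming a 4x4 block format. The helper names are hypothetical and not the driver's code: converting a pixel extent to block units with truncating division gives zero for a miplevel smaller than one block, while rounding up always keeps at least one block.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pixels -> blocks, truncating. A 2-pixel-wide
 * miplevel with 4-pixel blocks becomes 0 blocks (an empty region). */
static inline uint32_t
blocks_truncated(uint32_t pixels, uint32_t block_size)
{
   assert(block_size > 0);
   return pixels / block_size;
}

/* Hypothetical helper: pixels -> blocks, rounding up, so any non-empty
 * pixel extent covers at least one block. */
static inline uint32_t
blocks_rounded_up(uint32_t pixels, uint32_t block_size)
{
   assert(block_size > 0);
   return (pixels + block_size - 1) / block_size;
}
```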
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23180>
We had a check to ensure we were copying full slices, but the
size check was done against the base mip level, so in practice
we were only using the TFU for mip 0.
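A minimal sketch of a per-level check, with hypothetical names (the real code has more conditions). The region is compared against the dimensions of the mip level being copied rather than those of the base level:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Size of a dimension at a given mip level, never below 1
 * (the same idea as Mesa's u_minify()). */
static inline uint32_t
minify(uint32_t base, uint32_t level)
{
   assert(level < 32);
   uint32_t v = base >> level;
   return v > 0 ? v : 1;
}

/* Hypothetical check: does the copy region cover the whole slice of
 * the given mip level? Comparing against the base size instead would
 * only ever succeed for level 0. */
static inline bool
copies_full_slice(uint32_t region_w, uint32_t region_h,
                  uint32_t base_w, uint32_t base_h, uint32_t level)
{
   return region_w == minify(base_w, level) &&
          region_h == minify(base_h, level);
}
```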
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23180>
Compressed textures require their width and height padding to be
calculated based on the number of blocks in the image. This change ensures
that the number of blocks in the texture is a POT for mip levels > 1.
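As a rough illustration only: the sketch below is simplified, uses hypothetical names, and is not the driver's layout code. It shows padding done in block units, where the block count is rounded up to a power of two before minifying for levels above 1:

```c
#include <assert.h>
#include <stdint.h>

/* Smallest power of two >= v, for 1 <= v <= 2^31. */
static inline uint32_t
next_pot(uint32_t v)
{
   assert(v >= 1 && v <= (1u << 31));
   uint32_t p = 1;
   while (p < v)
      p <<= 1;
   return p;
}

/* Hypothetical helper: extent of a mip level in blocks. The pixel
 * size is first converted to blocks, and for levels > 1 the block
 * count (not the pixel count) is padded to a power of two. */
static inline uint32_t
padded_blocks(uint32_t size_px, uint32_t block_size, uint32_t level)
{
   assert(size_px > 0 && block_size > 0 && level < 32);
   uint32_t blocks = (size_px + block_size - 1) / block_size;
   if (level > 1)
      blocks = next_pot(blocks);
   uint32_t minified = blocks >> level;
   return minified > 0 ? minified : 1;
}
```

For a 36-pixel-wide texture with 4-pixel blocks, the base level has 9 blocks. Level 2 is computed from 16 blocks (the next power of two) and ends up 4 blocks wide.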
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23133>
Similar to other drivers, let's always run the traces tests.
Acked-by: David Heidelberg <david.heidelberg@collabora.com>
Acked-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23135>
Instead of running all the tests, run only the GPU-related ones, which
should make the CI faster.
Acked-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23135>
We have an optimization for non-uniform if/else where if all channels meet the
jump condition we emit a branch to jump straight to the ELSE block. Similarly,
if at the end of the THEN block we don't have any channels that would execute
the ELSE block, we emit a branch to jump straight to the AFTER block.
This optimization has a cost though: we need to emit the condition for the
branch and a branch instruction (which also comes with 3 delay slots), so for
very small blocks (just a couple of ALU instructions, for example) emitting the
branch instruction is typically worse. Further, if the condition for the branch
is not met, we still pay the cost for no benefit at all.
Here is an example:
nop ; fmul.ifa rf26, 0x3e800000, rf54
xor.pushz -, rf52, 2 ; nop
bu.alla 32, r:unif (0x00000000 / 0.000000)
nop ; nop
nop ; nop
nop ; nop
xor.pushz -, rf52, 3 ; nop
nop ; mov.ifa rf52, 0
nop ; mov.pushz -, rf52
nop ; mov.ifa rf26, 0x3f800000
The bu instruction here is set up to jump over the following 4 instructions
(the last 4 instructions in there). To do this, we pay the price of the xor
to generate the condition, the bu instruction, and the 3 delay slots right
after it. So we end up paying 6 instructions to skip over 4, and we always
pay them, even if the branch is not taken and we still have to execute those
4 instructions. With this change, we produce:
nop ; fmul.ifa rf56, 0x3e800000, rf28
xor.pushz -, rf9, 3 ; nop
nop ; mov.ifa rf9, 0
nop ; mov.pushz -, rf9
nop ; mov.ifa rf56, 0x3f800000
Now we don't try to skip the small block, ever. At worst, if all channels
would have met the branch condition, we only pay the cost of the 4
instructions instead of 6. At best, if any channel wouldn't take the
branch, we save ourselves 5 cycles for the branch condition, the branch
instruction and its 3 delay slots.
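The trade-off can be written down as a small cost model. This is a sketch under simplifying assumptions: every instruction costs one cycle, and the branch overhead is taken to be the 5 cycles itemized above (condition, branch, 3 delay slots). The names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum {
   BRANCH_OVERHEAD = 1 /* condition */ + 1 /* branch */ + 3 /* delay slots */
};
static_assert(BRANCH_OVERHEAD == 5, "overhead as itemized in the text");

/* Cycles spent on a small block guarded by a skip branch: the overhead
 * is always paid, and the block itself too if the branch is not taken. */
static inline uint32_t
cycles_with_branch(uint32_t block_insts, bool branch_taken)
{
   return BRANCH_OVERHEAD + (branch_taken ? 0 : block_insts);
}

/* Cycles when the block is always executed under predication. */
static inline uint32_t
cycles_without_branch(uint32_t block_insts)
{
   return block_insts;
}
```

For the 4-instruction block in the example, this model gives 5 cycles when the branch is taken and 9 when it is not, against a constant 4 without the branch.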
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23161>
Since 624e799cc3 ("nir: Drop nir_ssa_def::name and nir_register::name"), SSA
defs don't have names, making the name argument unused. Drop it from the
signature and fix the call sites. This was done with the help of the following
Coccinelle semantic patch:
@@
expression A, B, C, D, E;
@@
-nir_ssa_dest_init(A, B, C, D, E);
+nir_ssa_dest_init(A, B, C, D);
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23078>
There are no more producers of legacy atomics so these calls are inert.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23036>
Some values, like the transform feedback offset or the number of output
vertices in the VS, can be computed from the number of vertices and the
primitive type used in the draw call.
But when primitive restart is enabled, doing this is considerably more
complex, as we would have to parse the vertex buffer to find where the
restart values are, and so on.
In this case, delay the computation until after the draw call has
executed, by querying the GPU for these values.
Similarly, this delay is also applied to compute the transform feedback
buffer offsets when there is a geometry shader, as we don't know
beforehand how many vertices it is going to output.
This fixes `spec@!opengl 3.1@primitive-restart-xfb flush` and
`spec@!opengl 3.1@primitive-restart-xfb generated`.
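To show why restart makes the CPU-side computation harder, here is a minimal sketch for triangle strips only, with hypothetical names. Without restart the primitive count follows directly from the vertex count. With restart, the indices of the draw have to be scanned to split them into separate strips.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Triangles produced by a triangle strip of n vertices, no restart. */
static inline uint32_t
tri_strip_prims(uint32_t n)
{
   return n >= 3 ? n - 2 : 0;
}

/* With primitive restart every index must be inspected: each restart
 * index ends the current strip and starts a new one. */
static inline uint32_t
tri_strip_prims_with_restart(const uint32_t *indices, size_t count,
                             uint32_t restart_index)
{
   assert(indices != NULL || count == 0);
   uint32_t prims = 0, run = 0;
   for (size_t i = 0; i < count; i++) {
      if (indices[i] == restart_index) {
         prims += tri_strip_prims(run);
         run = 0;
      } else {
         run++;
      }
   }
   return prims + tri_strip_prims(run);
}
```

Querying the GPU after the draw avoids this scan entirely, at the price of knowing the values later.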
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22716>
This helps by reducing the number of branches with their corresponding
delay slots, at the expense of additional register pressure. It also helps
a lot with SFU stalls, probably because removing control-flow blocks
gives us more QPU scheduling flexibility to hide them.
Shader-db results below correspond to the "closed shaders" set, since the
full set is heavily dominated by the massive impact this change has on Skia's
shaders (for the better), so this is probably more representative of the real
impact:
total instructions in shared programs: 11887255 -> 11854898 (-0.27%)
instructions in affected programs: 538170 -> 505813 (-6.01%)
helped: 1653
HURT: 43
Instructions are helped.
total threads in shared programs: 385924 -> 385872 (-0.01%)
threads in affected programs: 236 -> 184 (-22.03%)
helped: 22
HURT: 48
Inconclusive result (%-change mean confidence interval includes 0).
total uniforms in shared programs: 3552808 -> 3547894 (-0.14%)
uniforms in affected programs: 157486 -> 152572 (-3.12%)
helped: 1673
HURT: 35
Uniforms are helped.
total max-temps in shared programs: 2062403 -> 2064720 (0.11%)
max-temps in affected programs: 18209 -> 20526 (12.72%)
helped: 168
HURT: 369
Max-temps are HURT.
total spills in shared programs: 1937 -> 1994 (2.94%)
spills in affected programs: 79 -> 136 (72.15%)
helped: 0
HURT: 1
total fills in shared programs: 2652 -> 2717 (2.45%)
fills in affected programs: 115 -> 180 (56.52%)
helped: 0
HURT: 1
total sfu-stalls in shared programs: 19349 -> 18010 (-6.92%)
sfu-stalls in affected programs: 2321 -> 982 (-57.69%)
helped: 674
HURT: 74
Sfu-stalls are helped.
total inst-and-stalls in shared programs: 11906604 -> 11872908 (-0.28%)
inst-and-stalls in affected programs: 541339 -> 507643 (-6.22%)
helped: 1656
HURT: 43
Inst-and-stalls are helped.
total nops in shared programs: 245740 -> 238085 (-3.12%)
nops in affected programs: 19282 -> 11627 (-39.70%)
helped: 1335
HURT: 76
Nops are helped.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22922>
Ensure the render target values are in the proper range.
This fixes `spec@!opengl 3.0@render-integer`.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22733>
Seems the only thing that really needs this is fpow(0, 0), which should
return NaN, but then gets multiplied with zero. Let's fix that by doing
a bcsel instead of fmul to select the result here. While we're at it,
get rid of the fabs for stop, which isn't needed.
This fixes a piglit failure for most (if not all?) drivers that don't
support legacy math rules.
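The difference between the two selection styles can be reproduced in plain C. This is a sketch with hypothetical function names that models the arithmetic only, not the NIR code: multiplying by zero does not discard a NaN, whereas a conditional select does.

```c
#include <math.h>
#include <stdbool.h>

/* Select by multiplying with 0.0 or 1.0: a NaN value leaks through
 * even when the condition is false, because NaN * 0.0 is NaN. */
static inline float
select_with_fmul(bool cond, float value)
{
   return (cond ? 1.0f : 0.0f) * value;
}

/* Select with a conditional (what bcsel does): the unselected value
 * is never used arithmetically, so a NaN cannot leak. */
static inline float
select_with_bcsel(bool cond, float value)
{
   return cond ? value : 0.0f;
}
```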
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22789>
If we are trying to lower register pressure this can make a big
difference in some cases. To avoid adding even more strategies,
merge this with disabling ubo load sorting, since they are basically
trying to do the same thing.
total instructions in shared programs: 12848024 -> 12844510 (-0.03%)
instructions in affected programs: 236537 -> 233023 (-1.49%)
helped: 195
HURT: 87
Instructions are helped.
total uniforms in shared programs: 3815601 -> 3814932 (-0.02%)
uniforms in affected programs: 31773 -> 31104 (-2.11%)
helped: 67
HURT: 115
Inconclusive result (value mean confidence interval includes 0).
total max-temps in shared programs: 2210803 -> 2210622 (<.01%)
max-temps in affected programs: 9362 -> 9181 (-1.93%)
helped: 114
HURT: 34
Max-temps are helped.
total spills in shared programs: 2556 -> 2330 (-8.84%)
spills in affected programs: 1391 -> 1165 (-16.25%)
helped: 39
HURT: 9
total fills in shared programs: 3840 -> 3317 (-13.62%)
fills in affected programs: 2379 -> 1856 (-21.98%)
helped: 39
HURT: 23
total sfu-stalls in shared programs: 21965 -> 21978 (0.06%)
sfu-stalls in affected programs: 2618 -> 2631 (0.50%)
helped: 45
HURT: 81
Inconclusive result (value mean confidence interval includes 0).
total inst-and-stalls in shared programs: 12869989 -> 12866488 (-0.03%)
inst-and-stalls in affected programs: 238771 -> 235270 (-1.47%)
helped: 193
HURT: 87
Inst-and-stalls are helped.
total nops in shared programs: 303501 -> 303274 (-0.07%)
nops in affected programs: 4159 -> 3932 (-5.46%)
helped: 87
HURT: 105
Inconclusive result (value mean confidence interval includes 0).
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22824>
rf0 is affected by restrictions in some scenarios, so we would rather use
a register that does not cause conflicts for scheduling.
total instructions in shared programs: 12850958 -> 12848024 (-0.02%)
instructions in affected programs: 331974 -> 329040 (-0.88%)
helped: 2559
HURT: 201
Instructions are helped.
total max-temps in shared programs: 2210893 -> 2210803 (<.01%)
max-temps in affected programs: 1486 -> 1396 (-6.06%)
helped: 96
HURT: 7
Max-temps are helped.
total sfu-stalls in shared programs: 21975 -> 21965 (-0.05%)
sfu-stalls in affected programs: 32 -> 22 (-31.25%)
helped: 16
HURT: 6
Sfu-stalls are helped.
total inst-and-stalls in shared programs: 12872933 -> 12869989 (-0.02%)
inst-and-stalls in affected programs: 332036 -> 329092 (-0.89%)
helped: 2560
HURT: 189
Inst-and-stalls are helped.
total nops in shared programs: 305911 -> 303501 (-0.79%)
nops in affected programs: 11215 -> 8805 (-21.49%)
helped: 2131
HURT: 3
Nops are helped.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22797>
1D texture miplevels are aligned to 64b, but this should also include
texture arrays.
Fixes
`spec@glsl-1.30@execution@texelfetchoffset@vs-texelfetch-usampler1darray`
and several other piglit tests.
CC: mesa-stable
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22775>