Use an unsigned int type in the loop to avoid unintended sign extensions.
Fixes CID#1414500 (Unintended sign extension [SIGN_EXTENSION]).
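A hypothetical illustration of the SIGN_EXTENSION defect class follows. It is not the actual loop that was fixed, which this message does not show, and the helper names are made up. Converting a signed 32-bit value to a 64-bit unsigned type sign-extends it, while an unsigned source is zero-extended, which is why an unsigned loop index avoids the issue.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers, not v3dv code. A negative int32_t converted to
 * uint64_t is sign-extended: the upper 32 bits become all ones. */
static uint64_t
widen_signed(int32_t v)
{
   return (uint64_t)v;
}

/* An unsigned 32-bit source is zero-extended instead. */
static uint64_t
widen_unsigned(uint32_t v)
{
   return (uint64_t)v;
}

/* With an unsigned loop index, i is converted to uint64_t when it is
 * multiplied by the 64-bit stride, so no sign extension can happen. */
static uint64_t
sum_offsets(uint32_t count, uint64_t stride)
{
   uint64_t total = 0;
   for (unsigned i = 0; i < count; i++)
      total += i * stride;
   return total;
}
```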
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10060>
This handles the case of bindingCount being zero better and more
clearly. For Anvil and Turnip, this avoids allocating an unneeded
binding when bindingCount is zero.
Inspired by radv, which was already doing this.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4526
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Hyunjun Ko <zzoon@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9905>
A break/continue in a loop is typically emitted like this:
   if (cond) {
      break/continue;
   } else {
   }
If cond is uniform, we'll emit code for a uniform if statement and
that will emit a branch right before the if to jump directly to the
else (or the block after the else in this case, since the else is
empty) in case cond evaluates to false. This means we end up emitting
two consecutive branch instructions, one before the if and one for the
THEN block right after:
branch(!cond) -> jump to else (or after else) if cond is false
nop
nop
nop
branch -> unconditional jump to break/continue
nop
nop
nop
Instead, if we are in this scenario, we can do better by emitting the
conditional jump directly and avoiding the "jump to else" case:
branch(cond) -> jump to break/continue if cond is true
nop
nop
nop
We need to be careful when emitting the break/continue for the case
where all lanes are disabled to avoid infinite loops: if we have a
break we always want to take the jump, but we don't want to take it
if it is a continue.
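The intended semantics of the uniform break/continue jump can be modelled as a small predicate. This is a behavioral sketch, not the compiler code that emits the branch; `enum jump_kind` and `take_uniform_jump()` are hypothetical names, and the lane mask is a stand-in for however disabled lanes are actually tracked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for this sketch only. */
enum jump_kind {
   JUMP_BREAK,
   JUMP_CONTINUE,
};

/* Behavioral model of the uniform break/continue branch: returns whether
 * the conditional jump should be taken. */
static bool
take_uniform_jump(enum jump_kind kind, uint32_t active_lanes, bool cond)
{
   /* With every lane disabled, a break must always leave the loop, while
    * a continue must not jump back to the loop header, or the loop could
    * never terminate. */
   if (active_lanes == 0)
      return kind == JUMP_BREAK;

   /* Otherwise the condition is uniform, so a single branch on it is
    * enough: no separate "jump to else" branch is needed. */
   return cond;
}
```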
total instructions in shared programs: 13563672 -> 13557348 (-0.05%)
instructions in affected programs: 348034 -> 341710 (-1.82%)
helped: 1158
HURT: 10
Instructions are helped.
total uniforms in shared programs: 3779137 -> 3777535 (-0.04%)
uniforms in affected programs: 90583 -> 88981 (-1.77%)
helped: 1169
HURT: 0
Uniforms are helped.
total max-temps in shared programs: 2317670 -> 2317575 (<.01%)
max-temps in affected programs: 1943 -> 1848 (-4.89%)
helped: 85
HURT: 4
Max-temps are helped.
total sfu-stalls in shared programs: 32247 -> 32247 (0.00%)
sfu-stalls in affected programs: 69 -> 69 (0.00%)
helped: 7
HURT: 9
Inconclusive result (value mean confidence interval includes 0).
total inst-and-stalls in shared programs: 13595919 -> 13589595 (-0.05%)
inst-and-stalls in affected programs: 350674 -> 344350 (-1.80%)
helped: 1154
HURT: 11
Inst-and-stalls are helped.
total nops in shared programs: 358202 -> 354325 (-1.08%)
nops in affected programs: 17367 -> 13490 (-22.32%)
helped: 1168
HURT: 1
Nops are helped.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9948>
Across every driver...
v2: Add casts to appease -fpermissive used on CI.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9477>
The merged image contains kernels & rootfs for both arm64 & armhf
baremetal test jobs, and is smaller than either of the previous
arm{64,hf}_test images.
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9955>
Doing so in an x86 container via qemu was slow, and started failing
recently after updating to a newer qemu version.
This also results in smaller arm*_test* docker images, since we need to
install fewer Debian packages in them.
As a bonus, this turns some piglit tests from fail to pass (or maybe
they'll turn out to be flakes? They've passed at least 3 times in a
row).
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9955>
Fix defect reported by Coverity Scan.
Logically dead code (DEADCODE)
dead_error_line: Execution cannot reach this statement: return;.
Fixes: bdf93f4e3b ("v3dv/cmd_buffer: return early for draw commands if there is nothing to draw")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9890>
If we have an unconditional branch then we can try to fill up its
delay slots with the initial instructions of its successor block by
copying them into the delay slots and adjusting the branch offset to
skip the copied instructions.
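A minimal sketch of the delay-slot filling, assuming a simplified instruction model (`struct instr`, `struct branch` and `fill_branch_delay_slots()` are hypothetical) and 3 branch delay slots. Only a prefix of the successor block can be copied, since the copies must execute in their original order.

```c
#include <assert.h>
#include <stdbool.h>

#define BRANCH_DELAY_SLOTS 3  /* assumption for this sketch */

/* Hypothetical instruction model: id 0 represents a NOP. */
struct instr {
   int id;
   bool fits_in_delay_slot;  /* e.g. not itself a branch */
};

struct branch {
   int delay_slots[BRANCH_DELAY_SLOTS];  /* ids of delay-slot instrs */
   int target_offset;  /* instrs to skip at the start of the target block */
};

/* Copy the leading instructions of the successor block into the delay
 * slots of an unconditional branch, then bump the target offset so the
 * copied instructions are not executed twice. Returns how many were
 * copied. */
static unsigned
fill_branch_delay_slots(struct branch *br, const struct instr *succ,
                        unsigned succ_len)
{
   unsigned n = 0;

   /* Only a prefix can be copied: execution order must be preserved. */
   while (n < BRANCH_DELAY_SLOTS && n < succ_len &&
          succ[n].fits_in_delay_slot) {
      br->delay_slots[n] = succ[n].id;
      n++;
   }

   br->target_offset += n;
   return n;
}
```

The original instructions remain in the successor block because they are copied, not moved, so other paths into that block still execute them.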
total nops in shared programs: 365640 -> 364471 (-0.32%)
nops in affected programs: 15416 -> 14247 (-7.58%)
helped: 462
HURT: 0
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9918>
For this we do something similar to what we do with thrsw where we try to
move the branch instruction earlier so the previous instructions execute
in the delay slots of the branch.
Generally, we can do this with any instruction except:
- If the instruction reads a uniform, since our branches do as well and
uniforms come from an ordered FIFO stream.
- If the instruction writes flags, since our branch instruction will
probably read them.
- If the instruction is in the delay slots of another thread switch,
branch, or unifa write, which is disallowed.
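The exclusion rules above can be sketched as a predicate plus a backwards walk over the instructions preceding the branch. `struct instr_info` and both functions are hypothetical, and the real scheduler checks considerably more state than these three flags.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-instruction summary for this sketch. */
struct instr_info {
   bool reads_uniform;
   bool writes_flags;
   bool in_other_delay_slot;  /* of a thrsw, branch or unifa write */
};

static bool
can_fill_branch_delay_slot(const struct instr_info *inst)
{
   /* Branches read a uniform too, and uniforms come from an ordered
    * FIFO stream, so their relative order must be kept. */
   if (inst->reads_uniform)
      return false;
   /* The branch will probably read the flags this instruction writes. */
   if (inst->writes_flags)
      return false;
   /* Disallowed per the rules above. */
   if (inst->in_other_delay_slot)
      return false;
   return true;
}

/* Count how many of the last instructions before the branch could run
 * in its delay slots instead, walking backwards from the end. */
static unsigned
count_movable_before_branch(const struct instr_info *insts, unsigned n,
                            unsigned delay_slots)
{
   unsigned count = 0;
   while (count < delay_slots && count < n &&
          can_fill_branch_delay_slot(&insts[n - 1 - count]))
      count++;
   return count;
}
```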
total instructions in shared programs: 13648140 -> 13613972 (-0.25%)
instructions in affected programs: 2209552 -> 2175384 (-1.55%)
helped: 6765
HURT: 0
Instructions are helped.
total max-temps in shared programs: 2318687 -> 2318436 (-0.01%)
max-temps in affected programs: 5046 -> 4795 (-4.97%)
helped: 152
HURT: 0
Max-temps are helped.
total inst-and-stalls in shared programs: 13680494 -> 13646326 (-0.25%)
inst-and-stalls in affected programs: 2220394 -> 2186226 (-1.54%)
helped: 6765
HURT: 0
Inst-and-stalls are helped.
total nops in shared programs: 399818 -> 365640 (-8.55%)
nops in affected programs: 127311 -> 93133 (-26.85%)
helped: 6765
HURT: 0
Nops are helped.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9918>
Do not assign to a variable that won't be used.
Fixes CID#1468098 "Unused value (UNUSED_VALUE)".
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9910>
Do not assign to a variable that won't be used.
Fixes CID#1451708 and CID#1451710 "Unused value (UNUSED_VALUE)".
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9910>
There is enough margin in the time budget to run the full GLES3 and
GLES31 CTS instead of only 50% of it.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9899>
MAX2(count * struct size, 1) evaluates to 1 (byte) for count=0, not to the size of a struct.
Since this MAX only seems to exist so we can keep using NULL for error reporting,
just refactor the function to return a VkResult.
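A minimal sketch of the refactor, using stand-in types so it compiles without vulkan.h (`sketch_result`, `struct sketch_binding` and `create_sorted_bindings()` are hypothetical, not the actual Mesa helper). With the result code carrying errors, a NULL output for count == 0 is unambiguous, so no MAX2(size, 1) allocation is needed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-ins so the sketch compiles without vulkan.h; real code would use
 * VkResult and VkDescriptorSetLayoutBinding. */
typedef enum {
   SKETCH_SUCCESS = 0,
   SKETCH_ERROR_OUT_OF_HOST_MEMORY = -1,
} sketch_result;

struct sketch_binding {
   unsigned binding;
   unsigned descriptor_count;
};

static int
compare_bindings(const void *a, const void *b)
{
   const struct sketch_binding *x = a, *y = b;
   return (x->binding > y->binding) - (x->binding < y->binding);
}

/* Hypothetical refactored helper: errors are reported via the result,
 * so count == 0 can simply yield NULL with success. */
static sketch_result
create_sorted_bindings(const struct sketch_binding *bindings, unsigned count,
                       struct sketch_binding **sorted_out)
{
   if (count == 0) {
      *sorted_out = NULL;
      return SKETCH_SUCCESS;
   }

   struct sketch_binding *sorted = malloc(count * sizeof(*sorted));
   if (!sorted)
      return SKETCH_ERROR_OUT_OF_HOST_MEMORY;

   memcpy(sorted, bindings, count * sizeof(*sorted));
   qsort(sorted, count, sizeof(*sorted), compare_bindings);
   *sorted_out = sorted;
   return SKETCH_SUCCESS;
}
```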
Fixes: ad241b15a9 ("vk: consolidate dynamic descriptor binding sorting")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4522
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9880>
We were using a write dependency to ensure ordering since LDTMU sequences
are ordered, but by using a write dependency with TMU config we were also
preserving ordering with TMU config writes that are not a sequence
terminator, which is not required and reduces scheduling flexibility.
Instead, use a write dependency to ensure strict ordering of TMU reads,
but only a read dependency with TMU config.
With this change, we also need to give CS barriers a write dependency
with TMU reads, to ensure that we don't move TMU reads across CS
barriers.
total instructions in shared programs: 13602500 -> 13597851 (-0.03%)
instructions in affected programs: 2681428 -> 2676779 (-0.17%)
helped: 6567
HURT: 4960
Instructions are helped.
total max-temps in shared programs: 2317927 -> 2317914 (<.01%)
max-temps in affected programs: 13861 -> 13848 (-0.09%)
helped: 355
HURT: 300
Inconclusive result (value mean confidence interval includes 0).
total sfu-stalls in shared programs: 32074 -> 32247 (0.54%)
sfu-stalls in affected programs: 848 -> 1021 (20.40%)
helped: 160
HURT: 327
Inconclusive result (%-change mean confidence interval includes 0).
total inst-and-stalls in shared programs: 13634574 -> 13630098 (-0.03%)
inst-and-stalls in affected programs: 2703041 -> 2698565 (-0.17%)
helped: 6558
HURT: 5020
Inst-and-stalls are helped.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9856>
Instead of last TMU write. According to the documentation, the entries
in the output FIFO are pushed with the *final* input write for the
lookup, which is the one terminating the sequence. We flag these
with last_tmu_config.
This will allow us to move all TMU register writes for a lookup except
the last one ahead of the LDTMUs for the previous lookup, possibly
allowing us to pair up these writes with the wrtmuc instructions for the
same lookup, turning code like this:
nop ; nop ; wrtmuc (tex[0].p0 | 0x3)
nop ; nop ; wrtmuc (tex[2].p1 | 0x1)
nop ; nop ; ldunif (ubo[2]+0xe0)
fadd r4, rf33, rf51 ; mov unifa, r5 ; ldunif (ubo[2]+0x110)
fmax rf34, 0, r4 ; nop
nop ; mov tmut, rf11
nop ; mov tmus, rf0
into:
nop ; mov tmut, rf11 ; wrtmuc (tex[0].p0 | 0x3)
nop ; nop ; wrtmuc (tex[2].p1 | 0x1)
nop ; nop ; ldunif (ubo[2]+0xe0)
fadd r4, rf33, rf51 ; mov unifa, r5 ; ldunif (ubo[2]+0x110)
fmax rf34, 0, r4 ; nop
nop ; mov tmus, rf0
total instructions in shared programs: 13648140 -> 13602500 (-0.33%)
instructions in affected programs: 3497402 -> 3451762 (-1.30%)
helped: 12044
HURT: 3484
Instructions are helped.
total max-temps in shared programs: 2318687 -> 2317927 (-0.03%)
max-temps in affected programs: 17234 -> 16474 (-4.41%)
helped: 615
HURT: 198
Max-temps are helped.
total sfu-stalls in shared programs: 32354 -> 32074 (-0.87%)
sfu-stalls in affected programs: 1462 -> 1182 (-19.15%)
helped: 461
HURT: 188
Sfu-stalls are helped.
total inst-and-stalls in shared programs: 13680494 -> 13634574 (-0.34%)
inst-and-stalls in affected programs: 3514405 -> 3468485 (-1.31%)
helped: 12062
HURT: 3486
Inst-and-stalls are helped.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9856>
We are providing a BO with the default attribute values for the
GL_SHADER_STATE_RECORD, which contains 16 vec4s. The default value for
each vec4 is (0, 0, 0, 1). As the attribute format could be int or
float, the "1" value needs to take the attribute format into account.
But in practice, the most common case is all floats. So we create
one default attribute values BO assuming that all attributes will be
floats, store it in v3dv_device, and only create a new one if an
int format type is defined. That reduces the number of BOs
needed.
Note that we could still try to reduce the number of BOs used by the
pipelines by creating a bigger BO and just playing with the
offsets. But as mentioned, that is not the usual case, and it would add
extra complexity, so it is not a priority right now.
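A sketch of how the default values could be built, assuming the "1" is stored as the raw 32-bit pattern of either integer 1 or 1.0f. `fill_default_attribute_values()`, `needs_own_default_values_bo()` and the flat per-attribute `is_int` array are hypothetical; the real code derives the int/float distinction from the attribute formats.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DEFAULT_ATTRIB_COUNT 16  /* 16 vec4, per the text above */

/* Hypothetical helper: writes (0, 0, 0, 1) for each vec4. The "1" is
 * integer 1 for int attributes and the bit pattern of 1.0f otherwise.
 * A NULL is_int means "all floats", the layout of the shared BO. */
static void
fill_default_attribute_values(uint32_t values[DEFAULT_ATTRIB_COUNT][4],
                              const bool *is_int)
{
   const float one = 1.0f;
   uint32_t one_bits;
   memcpy(&one_bits, &one, sizeof(one_bits));

   for (unsigned i = 0; i < DEFAULT_ATTRIB_COUNT; i++) {
      values[i][0] = values[i][1] = values[i][2] = 0;
      values[i][3] = (is_int && is_int[i]) ? 1u : one_bits;
   }
}

/* A pipeline only needs its own BO if some attribute has an int format;
 * otherwise it can reuse the device-level all-float BO. */
static bool
needs_own_default_values_bo(const bool *is_int)
{
   for (unsigned i = 0; i < DEFAULT_ATTRIB_COUNT; i++) {
      if (is_int[i])
         return true;
   }
   return false;
}
```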
This makes the following test pass when pipeline cache support is
disabled:
dEQP-VK.api.object_management.max_concurrent.graphics_pipeline
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9845>
The way we handle thrsw instructions is that we try to merge them
back into previously scheduled instructions to fill up their delay
slots. This is generally safe, because the thrsw won't happen until
after the delay slots, so we are not really changing the execution
order of the instructions and we just need to make sure we don't
violate a few specific restrictions.
If we have not managed to fill up all delay slots after doing this,
then we emit as many NOPs as needed to fill them. This is to ensure
that we don't schedule an instruction that needs to execute after the
thread switch before the thread switch happens. However, doing this
can lead to inefficient code, since sometimes the instructions we
schedule after a thrsw are independent of the thrsw and could be safely
executed in its delay slots.
This change removes the fixed NOP emission after a thrsw to fill
delay slots and instead adds code to ensure that our instruction
scheduling is aware of when it is scheduling instructions in the
delay slots of a previous thrsw to avoid selecting conflicting
instructions.
The only case where we still emit fixed NOPs is the thread end that
we emit to terminate the program after scheduling all instructions,
because we can't end the instruction stream before the thread end
is properly executed.
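The scheduler side of this change can be sketched as a counter of pending thrsw delay slots that filters out conflicting candidates. All names are hypothetical, the number of delay slots is passed in rather than assumed, and `conflicts_with_thrsw` stands in for whatever restrictions the real scheduler checks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical scheduler state: delay slots of the last thrsw that still
 * have to be filled. */
struct sched_state {
   unsigned thrsw_slots_left;
};

/* Hypothetical candidate summary: true if the instruction must not run
 * until the thread switch has actually happened. */
struct candidate {
   bool conflicts_with_thrsw;
};

static void
note_thrsw_scheduled(struct sched_state *s, unsigned delay_slots)
{
   s->thrsw_slots_left = delay_slots;
}

/* Pick the next instruction. Inside the delay slots of a thrsw, skip
 * conflicting candidates instead of emitting fixed NOPs up front.
 * Returns the candidate index, or -1 if a NOP has to be emitted. */
static int
pick_instruction(struct sched_state *s, const struct candidate *cands,
                 unsigned n)
{
   int chosen = -1;

   for (unsigned i = 0; i < n; i++) {
      if (s->thrsw_slots_left > 0 && cands[i].conflicts_with_thrsw)
         continue;
      chosen = (int)i;
      break;
   }

   /* Whatever we emit (instruction or NOP) fills one delay slot. */
   if (s->thrsw_slots_left > 0)
      s->thrsw_slots_left--;

   return chosen;
}
```

The thread end at the end of the program remains the exception described above: its delay slots are still padded with fixed NOPs.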
total instructions in shared programs: 13691004 -> 13648140 (-0.31%)
instructions in affected programs: 4345951 -> 4303087 (-0.99%)
helped: 19645
HURT: 652
Instructions are helped.
total max-temps in shared programs: 2319317 -> 2318687 (-0.03%)
max-temps in affected programs: 10510 -> 9880 (-5.99%)
helped: 532
HURT: 9
Max-temps are helped.
total sfu-stalls in shared programs: 31752 -> 32354 (1.90%)
sfu-stalls in affected programs: 840 -> 1442 (71.67%)
helped: 7
HURT: 467
Sfu-stalls are HURT.
total inst-and-stalls in shared programs: 13722756 -> 13680494 (-0.31%)
inst-and-stalls in affected programs: 4335590 -> 4293328 (-0.97%)
helped: 19453
HURT: 758
Inst-and-stalls are helped.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9825>
We have helpers to check if an instruction writes to specific
accumulators. This one will check if it writes any of the general
purpose accumulators, which will come in handy in a follow-up
patch.
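A sketch of such a helper built on top of per-accumulator checks. The instruction model is hypothetical, and it assumes six accumulators r0..r5 and that a magic write address n names accumulator rn.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_NUM_ACCUMULATORS 6  /* assumption: accumulators r0..r5 */

/* Hypothetical instruction model. For a magic write, waddr n is assumed
 * to name accumulator rn; non-magic writes go to the register file. */
struct alu_write {
   bool magic;
   unsigned waddr;
};

struct qpu_instr_sketch {
   struct alu_write add, mul;
   unsigned implicit_accum_mask;  /* bit n: rn written by a signal */
};

/* Per-accumulator check, in the spirit of the existing helpers. */
static bool
writes_accum_n(const struct qpu_instr_sketch *inst, unsigned n)
{
   if (inst->implicit_accum_mask & (1u << n))
      return true;
   if (inst->add.magic && inst->add.waddr == n)
      return true;
   if (inst->mul.magic && inst->mul.waddr == n)
      return true;
   return false;
}

/* New helper: does the instruction write any accumulator? */
static bool
writes_any_accum(const struct qpu_instr_sketch *inst)
{
   for (unsigned n = 0; n < SKETCH_NUM_ACCUMULATORS; n++) {
      if (writes_accum_n(inst, n))
         return true;
   }
   return false;
}
```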
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9825>
Integer add/sub can be implemented as either an add or a mul instruction
but we always emit them as add instructions at VIR level. We can use this
flexibility to improve our QPU scheduling so we can be more effective
at instruction merging by converting these to mul instructions when we
are attempting to merge them with another add instruction.
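A simplified sketch of the merge decision. The opcodes, `struct qpu_pair` and `try_merge()` are hypothetical, and a real merge must also check signals, register file read ports and other restrictions that are omitted here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical opcodes for this sketch. */
enum op_sketch {
   OP_NONE,
   OP_IADD,
   OP_ISUB,
   OP_FADD,
   OP_FMUL,
};

/* One QPU instruction: an add-ALU op and a mul-ALU op. */
struct qpu_pair {
   enum op_sketch add_op;
   enum op_sketch mul_op;
};

static bool
can_move_to_mul_alu(enum op_sketch op)
{
   /* Integer add/sub can be executed by either ALU. */
   return op == OP_IADD || op == OP_ISUB;
}

/* Try to merge b into a. If both use the add ALU and the mul ALU is
 * free, move an integer add/sub over to the mul ALU. */
static bool
try_merge(const struct qpu_pair *a, const struct qpu_pair *b,
          struct qpu_pair *out)
{
   if (a->mul_op != OP_NONE && b->mul_op != OP_NONE)
      return false;

   if (a->add_op != OP_NONE && b->add_op != OP_NONE) {
      if (a->mul_op != OP_NONE || b->mul_op != OP_NONE)
         return false;
      if (can_move_to_mul_alu(b->add_op)) {
         *out = (struct qpu_pair){ a->add_op, b->add_op };
         return true;
      }
      if (can_move_to_mul_alu(a->add_op)) {
         *out = (struct qpu_pair){ b->add_op, a->add_op };
         return true;
      }
      return false;
   }

   out->add_op = a->add_op != OP_NONE ? a->add_op : b->add_op;
   out->mul_op = a->mul_op != OP_NONE ? a->mul_op : b->mul_op;
   return true;
}
```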
total instructions in shared programs: 13721549 -> 13691004 (-0.22%)
instructions in affected programs: 3340493 -> 3309948 (-0.91%)
helped: 12805
HURT: 1656
Instructions are helped.
total max-temps in shared programs: 2319528 -> 2319317 (<.01%)
max-temps in affected programs: 5285 -> 5074 (-3.99%)
helped: 195
HURT: 3
Max-temps are helped.
total sfu-stalls in shared programs: 31616 -> 31752 (0.43%)
sfu-stalls in affected programs: 469 -> 605 (29.00%)
helped: 52
HURT: 161
Sfu-stalls are HURT.
total inst-and-stalls in shared programs: 13753165 -> 13722756 (-0.22%)
inst-and-stalls in affected programs: 3340383 -> 3309974 (-0.91%)
helped: 12782
HURT: 1666
Inst-and-stalls are helped.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9769>
For example, in v3dv_CmdDrawIndexed we can return early if
instanceCount is 0.
This fixes failures when running tests matching the following pattern
on the simulator:
dEQP-VK.draw.instanced.draw_indexed_vk_primitive_topology*
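A minimal sketch of the early return, using a hypothetical command buffer stand-in that only counts emitted draws. The parameters mirror vkCmdDrawIndexed; treating indexCount == 0 as "nothing to draw" as well is an assumption of the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a command buffer: it only counts the draws
 * that would have been emitted. */
struct cmd_buffer_sketch {
   unsigned draws_emitted;
};

/* Same parameters as vkCmdDrawIndexed, minus the command buffer handle. */
static void
cmd_draw_indexed_sketch(struct cmd_buffer_sketch *cmd, uint32_t indexCount,
                        uint32_t instanceCount, uint32_t firstIndex,
                        int32_t vertexOffset, uint32_t firstInstance)
{
   /* Nothing to draw: return before emitting any state or draw packets. */
   if (indexCount == 0 || instanceCount == 0)
      return;

   (void)firstIndex;
   (void)vertexOffset;
   (void)firstInstance;
   cmd->draws_emitted++;
}
```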
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9820>
Given how niche a developer tool CLIF dumps are, it makes no sense to
require libexpat just for that.
Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9764>
Quoting Jason's commit message (afa8f5892), that also applies here:
"The Vulkan API provides a mechanism for applications to cache their
own shaders and manage on-disk pipeline caching themselves.
Generally, this is what I would recommend to application developers
and I've resisted implementing driver-side transparent caching in the
Vulkan driver for a long time. However, not all applications do this
and, for some use-cases, it's just not practical."
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9403>
Also return the proper Vulkan result for this case, which is somewhat
tricky. Technically, Create[Graphics/Compute]Pipeline only allow OOM
errors, so for this case the only alternative is the generic
VK_ERROR_UNKNOWN, even if we know the cause of the error. From the spec:
"VK_ERROR_UNKNOWN will be returned by an implementation when an
unexpected error occurs that cannot be attributed to valid behavior
of the application and implementation. Under these conditions, it
may be returned from any command returning a VkResult"
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9403>
Until now we were always doing a two-step cache lookup, as we were
using the NIR shaders to fill up the key used to look up the compiled
shaders. But since we were already generating the sha1 key from the
original SPIR-V shader (or its internal NIR representation), any info
we were collecting from NIR is already implicit in the original
shader, so we can avoid using the NIR in most cases.
Because the v3d_key that is used to compile a shader is populated with
data coming directly from the NIR shader or produced during NIR
lowerings, we can't use it directly as part of the pipeline cache
entry. We could split them, but that would be confusing, so we add a
new struct, v3dv_pipeline_key, used specifically to search for the
compiled shaders in the pipeline cache. v3d_key is still used to
compile the shaders.
As we are using the same sha1 key for all compiled shaders in a
pipeline, we can also group all of them in the same cache entry, so we
don't need a lookup for each stage. This also allows us to cache pipeline
data shared by all the stages (like the descriptor maps).
While we are here, we also create a single BO to store the assembly
for all the pipeline stages.
Finally, we remove the link to the variant from the pipeline stage
struct, to avoid the confusion of having two links to the same
data. This mostly means that we stop using the pipeline stage
structures after the pipeline is created, so we can free them.
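A sketch of the single-entry layout: one entry per pipeline, keyed by the sha1, holding every stage's variant as an offset into one shared assembly BO. The struct and function names, the stage count and the linear search are all hypothetical; computing the sha1 is assumed to happen elsewhere.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SHA1_SIZE 20
#define SKETCH_STAGE_COUNT 4      /* hypothetical number of stages */
#define SKETCH_CACHE_CAPACITY 16

/* Where one stage's assembly lives inside the pipeline's shared BO. */
struct variant_sketch {
   unsigned assembly_offset;
   unsigned assembly_size;
};

/* One entry per pipeline: all stages share the same sha1 key. */
struct cache_entry_sketch {
   unsigned char sha1[SHA1_SIZE];
   struct variant_sketch variants[SKETCH_STAGE_COUNT];
   /* data shared by all stages (e.g. descriptor maps) would go here */
};

struct cache_sketch {
   struct cache_entry_sketch entries[SKETCH_CACHE_CAPACITY];
   unsigned count;
};

/* A single lookup returns the compiled variants for every stage. */
static const struct cache_entry_sketch *
cache_lookup(const struct cache_sketch *cache,
             const unsigned char sha1[SHA1_SIZE])
{
   for (unsigned i = 0; i < cache->count; i++) {
      if (memcmp(cache->entries[i].sha1, sha1, SHA1_SIZE) == 0)
         return &cache->entries[i];
   }
   return NULL;
}

static bool
cache_insert(struct cache_sketch *cache, const struct cache_entry_sketch *e)
{
   if (cache->count == SKETCH_CACHE_CAPACITY)
      return false;
   cache->entries[cache->count++] = *e;
   return true;
}
```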
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9403>
This is mostly the same as the main Mesa gl_shader_stage, but including
the coordinate shader. This allows us to loop over all the available
stages (for example, if we need to free them, compute the max spill
size, etc.).
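A sketch of such an enum and of the kind of loop it enables. The enumerator names and the exact stage list are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stage enum: like gl_shader_stage, plus the coordinate
 * shader (the binning variant of the vertex shader). */
enum sketch_stage {
   SKETCH_STAGE_VERTEX_BIN,
   SKETCH_STAGE_VERTEX,
   SKETCH_STAGE_FRAGMENT,
   SKETCH_STAGE_COMPUTE,
   SKETCH_STAGE_COUNT,
};

struct stage_info {
   bool present;
   unsigned spill_size;
};

/* Loop over every stage, coordinate shader included. */
static unsigned
max_spill_size(const struct stage_info stages[SKETCH_STAGE_COUNT])
{
   unsigned max = 0;
   for (int s = 0; s < SKETCH_STAGE_COUNT; s++) {
      if (stages[s].present && stages[s].spill_size > max)
         max = stages[s].spill_size;
   }
   return max;
}
```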
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9403>
We stopped reusing them after pipeline creation long ago, so let's
reduce the size of both structs, and avoid serializing/deserializing
them for the variant case.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9403>
If we were able to get a shader variant from the pipeline cache, we
will not have the nir shader available.
Note that this is what we were doing in the driver before the nir io
helpers were available.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9403>
This maps the nir shader data.location to its final
data.driver_location. In general we use the driver location as the
index (like vattr_sizes in the same struct), so having this map is
useful when all we have is the data.location and the original nir
shader is not available.
v2: use memset instead of for loop, and nir_foreach_shader_in_variable
instead of nir_foreach_variable_with_modes (Iago)
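A sketch of building the map, with a plain array loop standing in for nir_foreach_shader_in_variable. `struct input_var`, `SKETCH_MAX_LOCATIONS` and `build_driver_location_map()` are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_MAX_LOCATIONS 32  /* hypothetical size */

/* Hypothetical summary of a shader input variable. */
struct input_var {
   int location;         /* nir data.location */
   int driver_location;  /* nir data.driver_location */
};

/* Build a location -> driver_location map usable when only the
 * data.location is known and the nir shader is gone. Unused slots are -1:
 * memset with -1 sets every byte to 0xff, which is -1 for each int. */
static void
build_driver_location_map(int map[SKETCH_MAX_LOCATIONS],
                          const struct input_var *vars, unsigned count)
{
   memset(map, -1, SKETCH_MAX_LOCATIONS * sizeof(int));
   for (unsigned i = 0; i < count; i++) {
      assert(vars[i].location >= 0 &&
             vars[i].location < SKETCH_MAX_LOCATIONS);
      map[vars[i].location] = vars[i].driver_location;
   }
}
```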
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9403>
As we plan to get the compiled variant directly from the cache, the
nir shaders might not be available, so we add this info to the prog
data.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9403>
We were not pre-generating several variants, but we decided to keep
this method just in case we needed that idea back. This ended up
being a bad idea: several months have passed without that need, so
having that method just adds confusion. Also, if we need to support
multiple variants in the future, we would perhaps need to do it
differently, so let's not template in advance.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9403>
We tweak some of the individual messages a little, and add a new
option to dump the stats when the pipeline is destroyed.
While we are here, we also tweak the names of the global options
to make them clearer.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9403>