In these cases we know that the BO has not been added to the job
before, so we can skip the usual process for adding the BO where
we check if we had already added it before to avoid duplicates.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10210>
Among other things, this gets us GCC 10 (was 6).
Requires some changes to third party components we use:
* Install apitrace (& waffle) from Debian; was hitting issues with the
local build, and it's the same version 9.0 anyway.
* Update Fossilize to a newer commit which builds with GCC 10.
* apt.llvm.org repositories are no longer needed.
* Use an SPIRV-LLVM-Translator commit which builds with LLVM 11.0.1.
* Install XCB packages from Debian, 1.13 fails to build with Python 3.9.
* Install wayland-protocols from Debian, 1.12 is too old for
libgtk-3-dev in bullseye.
LLVM 7/8 packages are no longer available.
Also adapt expected test results to Xvfb now exposing multi-samle
GLXFBConfigs.
v2:
* Install clang instead of clang-11.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3124
Reviewed-by: Eric Anholt <eric@anholt.net> # v1
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9833>
64 was a temporary and conservative "big enough" value, but we can do
better.
Note that as mentioned on the FIXME, we could be even more detailed,
adding a descriptor map allocate method based on the descriptor
type. That would mean more individual allocations, and slightly more
complexity.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10207>
We had some cases were we have defined a value on v3dv_limits but
using other when setting it at GetPhysicalDeviceProperties (like
dynamic storage buffers).
Also we do a cleanup. So far we were adding on v3dv_limits only the
limits that were used on more that one place. But then we had the
definition of several limits on different places. It is clearer to
have a common place for those, even if it is used on just one place.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10207>
There were two problems here:
* We were multiplying by 6, when for graphics pipelines, we only
support 2.
* Right now we are tracking descriptors through the descriptor
maps, and we have one per pipeline. So in practice there is no
difference between per-stage and per-pipeline limits. So far this
was not a problem, we could revisit in the future.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10207>
v2:
- Bump up MESA_ROOTFS_TAG instead of arm_build (Michel)
Acked-by: Michel Dänzer <mdaenzer@redhat.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10136>
Somewhat terrifyingly, we never sent this for direct contexts, which
means the server never knew the context/drawable bindings. To handle
this sanely, pull the request code up out of the indirect backend, and
rewrite the context switch path to call it as appropriate. This
attempts to preserve the existing behavior of not calling unbind() on
the context if its refcount would not drop to zero.
Of course, you can't just do this indiscriminately, because this is GLX
and extant X servers have bugs and everything is terrible. To wit:
- For 1.20.x prior to 1.20.6, you can bind a direct context once, but
the second time you try to modify the context's binding you will get
GLXBadContextTag. This includes unbinding the context. And "deleting"
the context will leak memory, because it will still appear to be
current.
- For 1.19 and earlier, glXMakeCurrent(dpy, None, ctx) should be legal
for GL 3.0+ contexts, but the server will throw BadMatch.
To guard against this, we only send the request for indirect contexts
unless the server is known good, and only mention one context at a time
in such a request; if switching between contexts, we first unbind the
old, and then bind the new. Note that the second VendorRelease() version
is to catch XFree86 4.x and Xorg [67].x, which almost certainly have the
above bugs. Other servers might report different version numbers here,
but we can't do direct rendering against them, so this should be safe.
Fixes: mesa/mesa#4418
Acked-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9992>
Ensure subpass_idx has a valid value; we use "-1" as invalid one.
Fixes CID#1468096 "Macro compares unsigned to 0 (NO_EFFECT)"
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10203>
Vertex Shader has a store_out lowering pass that converts gallium driver
locations in offsets inside the VPM.
One of the consequences is that these offsets are consecutives; that is,
if the VS stores VARYING_SLOT_VAR0.xyz and VARYING_SLOT_VAR1.xyzw, there
isn't a hole in the VPM offsets for the un-stored VARYING_SLOT_VAR0.w.
Thus we need to change how the VPM offset is computed in the Geometry
Shader when loading the inputs.
This bug is exposed by !9050.
v2 (Iago):
- Include explanatory comment.
- Use assert.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10129>
`qpu.raddr_b` is an unsigned int, so it is always positive, even after
casting to signed int.
Fixes CID#1438117 "Operands don't affect result
(CONSTANT_EXPRESSION_RESULT)":
"result_independent_of_operands: (int)inst->qpu.raddr_b >= -16 is
always true regardless of the values of its operands. This occurs as
the logical first operand of "&&".
v2:
- Use signed pointers (Iago)
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10131>
The term 'last' may be misleading because the offset represents
the current unifa offset, which is the offset used by the last
load plus 4 bytes, so rename these to use the term 'current'
instead.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10100>
This implements a NIR pass that groups together constant UBO loads
for the same UBO index in order of increasing offset when the distance
between them is small enough that it enables the "skip unifa write"
optimization.
This may increase register pressure because it can move UBO loads
earlier, so we also add a compiler strategy fallback to disable the
optimization if we need to drop thread count to compile the shader
with this optimization enabled.
total instructions in shared programs: 13557555 -> 13550300 (-0.05%)
instructions in affected programs: 814684 -> 807429 (-0.89%)
helped: 4485
HURT: 2377
Instructions are helped.
total uniforms in shared programs: 3777243 -> 3760990 (-0.43%)
uniforms in affected programs: 112554 -> 96301 (-14.44%)
helped: 7226
HURT: 36
Uniforms are helped.
total max-temps in shared programs: 2318133 -> 2333761 (0.67%)
max-temps in affected programs: 63230 -> 78858 (24.72%)
helped: 23
HURT: 3044
Max-temps are HURT.
total sfu-stalls in shared programs: 32245 -> 32567 (1.00%)
sfu-stalls in affected programs: 389 -> 711 (82.78%)
helped: 139
HURT: 451
Inconclusive result.
total inst-and-stalls in shared programs: 13589800 -> 13582867 (-0.05%)
inst-and-stalls in affected programs: 817738 -> 810805 (-0.85%)
helped: 4478
HURT: 2395
Inst-and-stalls are helped.
total nops in shared programs: 354365 -> 342202 (-3.43%)
nops in affected programs: 31000 -> 18837 (-39.24%)
helped: 4405
HURT: 265
Nops are helped.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10100>
This adds a minimum thread count parameter to each compilation strategy with
the intention to limit the minimum allowed thread count that can be used to
register allocate with that strategy.
For now all strategies allow the minimum thread count supported by the
hardware, but we will be using this infrastructure to impose a more
strict limit in an upcoming optimization.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10100>
We will be using this distance to setup another optimization in a
follow-up patch.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
x# Please enter the commit message for your changes. Lines starting
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10100>
This can be called outside a render pass so we should not expect to have
a job available. Also, we should not be emitting state here, instead we
should do in the pre-draw handler with all the other draw call state.
Fixes cases of crashes in RenderDoc when selecting elements in the
Event Browser.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10130>
first_component is an uint, and thus if it takes value 0 we can't know
if it is because writemask has its first bit to 1, or all bits to 0.
As we want to ensure that at least one bit is set, apply the assertion
in writemask.
Fixes CID#1472829 "Macro compares unsigned to 0 (NO_EFFECT)".
v2:
- Restore "first_component <= last_component" assertion (Iago)
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10103>
The meson.build was unaware of transitive dependencies introduced by
Python imports.
Android still needs fixing. But I did not update the Android files lest
I break the build.
Ideally, we would fix this by using a Python runner that generates
a depfile, similar to how meson creates depfiles for C files by passing
flags -MD -MQ -MF to gcc. But this patch gets the job done, without
stalling on the ideal general solution, by manually tracking the Python
imports in new 'foo_depend_files' variables.
CC: mesa-stable@lists.freedesktop.org
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1466>
Use a unsigned int type in the loop to avoid unintended sign extensions.
Fixes CID#1414500 (Unintended sign extension [SIGN_EXTENSION]).
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10060>
As that handles better, and more clear, the case of bindingCount being
zero. For the case of Anvil and Turnip, this avoids allocating a
non-needed binding when bindingCount is zero.
Inspired on radv, that was what it was doing so far.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4526
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Hyunjun Ko <zzoon@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9905>
A break/continue in a loop is typically emitted like this:
if (cond) {
break/continue;
} else {
}
If cond is uniform, we'll emit code for a uniform if statement and
that will emit a branch right before the if to jump directly to the
else (or the block after the else in this case, since the else is
empty) in case cond evaluates to false. This means we end up emitting
two consecutive branch instructions, one before the if and one for the
THEN block right after:
branch(!cond) -> jump to else (or after else) if cond is false
nop
nop
nop
branch -> unconditional jump to break/continue
nop
nop
nop
Instead, if we are in this scenario, we can do better by emitting the
conditional jump directly and avoiding the "jump to else" case:
branch(cond) -> jump to break/continue if cond is true
nop
nop
nop
We need to be careful when emitting the break/continue for the case
where all lanes are disabled to avoid infinite loops: if we have a
break we always want to take the jump, but we don't want to take it
if it is a continue.
total instructions in shared programs: 13563672 -> 13557348 (-0.05%)
instructions in affected programs: 348034 -> 341710 (-1.82%)
helped: 1158
HURT: 10
Instructions are helped.
total uniforms in shared programs: 3779137 -> 3777535 (-0.04%)
uniforms in affected programs: 90583 -> 88981 (-1.77%)
helped: 1169
HURT: 0
Uniforms are helped.
total max-temps in shared programs: 2317670 -> 2317575 (<.01%)
max-temps in affected programs: 1943 -> 1848 (-4.89%)
helped: 85
HURT: 4
Max-temps are helped.
total sfu-stalls in shared programs: 32247 -> 32247 (0.00%)
sfu-stalls in affected programs: 69 -> 69 (0.00%)
helped: 7
HURT: 9
Inconclusive result (value mean confidence interval includes 0).
total inst-and-stalls in shared programs: 13595919 -> 13589595 (-0.05%)
inst-and-stalls in affected programs: 350674 -> 344350 (-1.80%)
helped: 1154
HURT: 11
Inst-and-stalls are helped.
total nops in shared programs: 358202 -> 354325 (-1.08%)
nops in affected programs: 17367 -> 13490 (-22.32%)
helped: 1168
HURT: 1
Nops are helped.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9948>
Across every driver...
v2: Add casts to appease -fpermissive used on CI.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9477>
The merged image contains kernels & rootfs for both arm64 & armhf
baremetal test jobs, and is smaller than either arm{64,hf}_test image
before.
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9955>
Doing so in an x86 container via qemu was slow, and started failing
recently after updating to a newer qemu version.
This also results in smaller arm*_test* docker images, since we need to
install fewer Debian packages in them.
As a bonus, this turns some piglit tests from fail to pass (Or maybe
they'll turn out to be flakes? They've passed at least 3 times in a
row).
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9955>
Fix defect reported by Coverity Scan.
Logically dead code (DEADCODE)
dead_error_line: Execution cannot reach this statement: return;.
Fixes: bdf93f4e3b ("v3dv/cmd_buffer: return early for draw commands if there is nothing to draw")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9890>
If we have an unconditional branch then we can try to fill up its
delay slots with the initial instructions of its successor block by
copying them into the delay slots and adjusting the branch offset to
skip the copied instructions.
total nops in shared programs: 365640 -> 364471 (-0.32%)
nops in affected programs: 15416 -> 14247 (-7.58%)
helped: 462
HURT: 0
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9918>
For this we do something similar to what we do with thrsw where we try to
move the branch instruction earlier so the previous instructions execute
in the delay slots of the branch.
Generally, we can do this with any instruction except:
- If the instruction reads a uniform: since our branches do as well and
uniforms come from an ordered FIFO stream.
- If the instruction writes flags, since our branch instruction will
probably read them.
- If the instruction is in the delay slots of another thread switch,
branch, or unifa write, which is disallowed.
total instructions in shared programs: 13648140 -> 13613972 (-0.25%)
instructions in affected programs: 2209552 -> 2175384 (-1.55%)
helped: 6765
HURT: 0
Instructions are helped.
total max-temps in shared programs: 2318687 -> 2318436 (-0.01%)
max-temps in affected programs: 5046 -> 4795 (-4.97%)
helped: 152
HURT: 0
Max-temps are helped.
total inst-and-stalls in shared programs: 13680494 -> 13646326 (-0.25%)
inst-and-stalls in affected programs: 2220394 -> 2186226 (-1.54%)
helped: 6765
HURT: 0
Inst-and-stalls are helped.
total nops in shared programs: 399818 -> 365640 (-8.55%)
nops in affected programs: 127311 -> 93133 (-26.85%)
helped: 6765
HURT: 0
Nops are helped.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9918>
Do not assign to a variable that won't be used.
Fixes CID#1468098 "Unused value (UNUSED_VALUE)".
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9910>
Do not assign to a variable that won't be used.
Fixes CID#1451708 and CID#1451710 "Unused value (UNUSED_VALUE)".
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9910>
There is margin in the time budget to run the full GLES3 and GLES31 CTS
instead of only 50%.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9899>
MAX2(count * struct size, 1) results in 1 for count=0, not the size of a struct.
Since this MAX only seems to exist so we can keep using NULL for error reporting,
just refactor to return a VkResult.
Fixes: ad241b15a9 ("vk: consolidate dynamic descriptor binding sorting")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4522
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9880>
We were using a write dependency to ensure ordering since LDTMUs sequences
are ordered, but by using a write dependency with TMU config we were also
preserving ordering with TMU config writes that are not a sequence
terminator, which is not required and reduces scheduling flexibility.
Instead, use a write dependency to ensure strict ordering of TMU reads,
but only a read depdency with TMU config.
With this change we also need to update CS barriers to also have a write
dependency with TMU reads to ensure that we don't move TMU reads around
CS barriers.
total instructions in shared programs: 13602500 -> 13597851 (-0.03%)
instructions in affected programs: 2681428 -> 2676779 (-0.17%)
helped: 6567
HURT: 4960
Instructions are helped.
total max-temps in shared programs: 2317927 -> 2317914 (<.01%)
max-temps in affected programs: 13861 -> 13848 (-0.09%)
helped: 355
HURT: 300
Inconclusive result (value mean confidence interval includes 0).
total sfu-stalls in shared programs: 32074 -> 32247 (0.54%)
sfu-stalls in affected programs: 848 -> 1021 (20.40%)
helped: 160
HURT: 327
Inconclusive result (%-change mean confidence interval includes 0).
total inst-and-stalls in shared programs: 13634574 -> 13630098 (-0.03%)
inst-and-stalls in affected programs: 2703041 -> 2698565 (-0.17%)
helped: 6558
HURT: 5020
Inst-and-stalls are helped.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9856>
Instead of last TMU write. According to the documentation, the entries
in the output FIFO are pushed with the *final* input write for the
lookup, which is the one terminating the sequence. We flag these
with last_tmu_config.
This will allow us to move all TMU register writes for a lookup except
the last one ahead of the LDTMUs for the previous lookup, possibly
allowing us to pair up these writes the wrtmuc instructions for the
same lookup, turning code like this:
nop ; nop ; wrtmuc (tex[0].p0 | 0x3)
nop ; nop ; wrtmuc (tex[2].p1 | 0x1)
nop ; nop ; ldunif (ubo[2]+0xe0)
fadd r4, rf33, rf51 ; mov unifa, r5 ; ldunif (ubo[2]+0x110)
fmax rf34, 0, r4 ; nop
nop ; mov tmut, rf11
nop ; mov tmus, rf0
into:
nop ; mov tmut, rf11 ; wrtmuc (tex[0].p0 | 0x3)
nop ; nop ; wrtmuc (tex[2].p1 | 0x1)
nop ; nop ; ldunif (ubo[2]+0xe0)
fadd r4, rf33, rf51 ; mov unifa, r5 ; ldunif (ubo[2]+0x110)
fmax rf34, 0, r4 ; nop
nop ; mov tmus, rf0
total instructions in shared programs: 13648140 -> 13602500 (-0.33%)
instructions in affected programs: 3497402 -> 3451762 (-1.30%)
helped: 12044
HURT: 3484
Instructions are helped.
total max-temps in shared programs: 2318687 -> 2317927 (-0.03%)
max-temps in affected programs: 17234 -> 16474 (-4.41%)
helped: 615
HURT: 198
Max-temps are helped.
total sfu-stalls in shared programs: 32354 -> 32074 (-0.87%)
sfu-stalls in affected programs: 1462 -> 1182 (-19.15%)
helped: 461
HURT: 188
Sfu-stalls are helped.
total inst-and-stalls in shared programs: 13680494 -> 13634574 (-0.34%)
inst-and-stalls in affected programs: 3514405 -> 3468485 (-1.31%)
helped: 12062
HURT: 3486
Inst-and-stalls are helped.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9856>
We are providing a BO with the default attribute values for the
GL_SHADER_STATE_RECORD, that contains 16 vec4. Such default value for
each vec4 is (0, 0, 0, 1). As the attribute format could be int or
float, the "1" value needs to take into account the attribute format.
But in the practice, the most common case is all floats. So we create
one default attribute values BO assuming that all attributes will be
floats, and we store it at v3dv_device and only create a new one if a
int format type is defined. That allows to reduce the amount of BOs
needed.
Note that we could still try to reduce the amount of BOs used by the
pipelines if we create a bigger BO, and we just play with the
offsets. But as mentioned, that's not the usual, and would add an
extra complexity,so it is not a priority right now.
This makes the following test passing when disabling the pipeline
cache support:
dEQP-VK.api.object_management.max_concurrent.graphics_pipeline
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9845>
The way we handle thrsw instructions is that we try to merge them
back into previously scheduled instructions to fill up its delay
slots. This is generally safe, because the thrsw won't happen until
after the delay slots, so we are not really changing the execution
order of the instructions and we just need to make sure we don't
violate a few specific restrictions.
If we have not managed to fill up all delay slots after doing this,
then we emit as many NOPs as needed to fill them. This is to ensure
that we don't schedule an instruction that needs to execute after the
thread switch before the thread switch happens. However, doing this
can lead to inefficient code, since some times the instructions we
schedule after a thrsw are indepdent of the thrsw and could be safely
executed in its delay slots.
This change removes the fixed NOP emission after a thrsw to fill
delay slots and instead adds code to ensure that our instruction
scheduling is aware of when it is scheduling instructions in the
delay slots of a previous thrsw to avoid selecting conflicting
instructions.
The only case were we still emit fixed NOPs is for the thread end that
we emit to terminate the program after scheduling all instructions
because we can't end the instruction stream before the thread end
is properly executed.
total instructions in shared programs: 13691004 -> 13648140 (-0.31%)
instructions in affected programs: 4345951 -> 4303087 (-0.99%)
helped: 19645
HURT: 652
Instructions are helped.
total max-temps in shared programs: 2319317 -> 2318687 (-0.03%)
max-temps in affected programs: 10510 -> 9880 (-5.99%)
helped: 532
HURT: 9
Max-temps are helped.
total sfu-stalls in shared programs: 31752 -> 32354 (1.90%)
sfu-stalls in affected programs: 840 -> 1442 (71.67%)
helped: 7
HURT: 467
Sfu-stalls are HURT.
total inst-and-stalls in shared programs: 13722756 -> 13680494 (-0.31%)
inst-and-stalls in affected programs: 4335590 -> 4293328 (-0.97%)
helped: 19453
HURT: 758
Inst-and-stalls are helped.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9825>