Remove the TU_DEBUG_FLUSHALL option that was force-enabled for a8xx chips.
The problematic CTS cases that required it were failing due to indirect
draw commands sourcing draw data from buffers whose content was prepared
by compute tasks.
Up until a8xx, firmware managed an implicit wait before any indirect
draw parameters were read, with a delayed CP_WAIT_FOR_ME emitted only when
necessary or on devices enabling indirect_draw_wfm_quirk due to buggy
firmware. That implicit wait is gone on a8xx, so CP_WAIT_FOR_ME should be
emitted immediately, which also matches the behavior of the proprietary driver.
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40550>
With autotune allocating counters low-to-high, a conflict with
PERFORMANCE_QUERY_KHR happens if any CP-based counters are used.
This is a temporary workaround which simply makes the first two CP
counters unavailable for performance queries.
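As a rough illustration of the workaround, a minimal sketch follows. It
assumes counters are addressed by a group name and an index within the
group; RESERVED_CP_COUNTERS and perf_query_counter_usable() are
hypothetical names, not the driver's.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical: number of low CP counters left for autotune to use. */
#define RESERVED_CP_COUNTERS 2

/* Returns true if counter `index` of group `group_name` may be exposed
 * through performance queries. Only the CP group is restricted. */
static bool
perf_query_counter_usable(const char *group_name, unsigned index)
{
   assert(group_name != NULL);
   if (strcmp(group_name, "CP") == 0)
      return index >= RESERVED_CP_COUNTERS;
   return true;
}
```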
Cc: mesa-stable
Signed-off-by: Dhruv Mark Collins <mark@igalia.com>
Assisted-by: OpenAI Codex (GPT-5.4)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40949>
This is more consistent with the newly established pattern of the
UMD allocating all locally used performance counters low-to-high
instead of the prior high-to-low order.
Cc: mesa-stable
Signed-off-by: Dhruv Mark Collins <mark@igalia.com>
Assisted-by: OpenAI Codex (GPT-5.4)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40949>
The UMD will be switching to allocating counters low-to-high, so to
avoid conflicts with this new policy the PPS driver now allocates
high-to-low. Additionally, this future-proofs it for the MSM-DRM uAPI
for performance counters, which will similarly allocate high-to-low.
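The reasoning can be sketched as below: two clients that take counters
from opposite ends of a group only collide once the group is
oversubscribed. All names here are hypothetical and the model ignores
how the real clients track their allocations.

```c
#include <assert.h>
#include <stdbool.h>

/* Index of the n-th counter taken by a low-to-high allocator. */
static unsigned
low_to_high_index(unsigned n)
{
   return n;
}

/* Index of the n-th counter taken by a high-to-low allocator in a
 * group of `total` counters. */
static unsigned
high_to_low_index(unsigned total, unsigned n)
{
   assert(n < total);
   return total - 1 - n;
}

/* The two allocators overlap only when together they want more
 * counters than the group has. */
static bool
allocations_overlap(unsigned total, unsigned low_count, unsigned high_count)
{
   return low_count + high_count > total;
}
```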
Cc: mesa-stable
Signed-off-by: Dhruv Mark Collins <mark@igalia.com>
Assisted-by: OpenAI Codex (GPT-5.4)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40949>
When preemption optimization is supported but the necessary CP
counters are missing, device initialization fails. This is unnecessary,
as the optimization can simply be disabled instead to fail more
gracefully. This also fixes A8XX, which doesn't have performance
counters hooked up yet.
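A minimal sketch of the intended behavior, assuming a hypothetical
device struct and init helper (the real driver's structures and names
differ): missing counters clear the enable flag rather than failing
initialization.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device state, for illustration only. */
struct device {
   bool preemption_opt_supported; /* capability reported for the device */
   bool preemption_opt_enabled;   /* what the driver will actually use */
};

/* `cp_counters` is NULL when the CP counter group could not be found. */
static bool
device_init_preemption_opt(struct device *dev, const void *cp_counters)
{
   assert(dev != NULL);
   dev->preemption_opt_enabled =
      dev->preemption_opt_supported && cp_counters != NULL;
   /* Missing counters only disable the optimization; init still succeeds. */
   return true;
}
```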
Cc: mesa-stable
Signed-off-by: Dhruv Mark Collins <mark@igalia.com>
Assisted-by: OpenAI Codex (GPT-5.4)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40949>
A future kernel API for perfcounter management will likely be required
for a8xx and onwards. For a7xx and earlier, cmdstream-based selector and
counter register management is still supported.
Cc: mesa-stable
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40949>
"size" is the allocated size of the array, not the number of immediates
actually used. We could wind up returning a too-large constlen, larger
than 512, and since the binning variant uses the non-binning variant's
constlen as its max_const, we could make binning variants use c512.x and
crash when encoding.
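The difference can be sketched as follows. The struct and function are
hypothetical, and the sketch assumes immediates are counted in dwords
(four per vec4) and placed directly after the other consts.

```c
#include <assert.h>

/* Hypothetical immediate array bookkeeping, in dwords. */
struct imm_array {
   unsigned count;    /* immediates actually used */
   unsigned capacity; /* allocated size of the array */
};

/* constlen in vec4 units: consts before the immediates, plus the
 * immediates actually used, rounded up to whole vec4s. Using
 * `capacity` here instead would overestimate the result. */
static unsigned
constlen_vec4(unsigned imm_base_vec4, const struct imm_array *imms)
{
   assert(imms->count <= imms->capacity);
   return imm_base_vec4 + (imms->count + 3) / 4;
}
```

With a base of 510 vec4s, 5 used immediates and a capacity of 64, the
count-based result is 512, whereas the capacity would give 526.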
Fixes: 86f3c0c4c2 ("ir3: simplify constlen calculation")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40961>
We need to know the immediate count even after lowering, to compute the
overall const size. Previously we were using the capacity field, but
that's unreliable and won't be available once we switch to a real
dynamic array container instead of (poorly) reinventing one.
Fixes: 86f3c0c4c2 ("ir3: simplify constlen calculation")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40961>
ldg.k can copy up to 256 vec4s at once but we currently emit one ldg.k
per vec4. Fix this by using the load size field of ldg.k.
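A minimal sketch of the chunking this implies, taking the 256-vec4
limit from above. split_ldg_k() and LDG_K_MAX_VEC4 are hypothetical
names, and the sketch only computes per-instruction sizes rather than
emitting anything.

```c
#include <assert.h>

/* Maximum number of vec4s a single ldg.k can copy. */
#define LDG_K_MAX_VEC4 256

/* Split a copy of `num_vec4` vec4s into per-instruction sizes.
 * Returns the number of instructions needed; `sizes` must have room
 * for at least that many entries (`max_chunks`). */
static unsigned
split_ldg_k(unsigned num_vec4, unsigned *sizes, unsigned max_chunks)
{
   unsigned n = 0;
   while (num_vec4 > 0) {
      unsigned chunk =
         num_vec4 < LDG_K_MAX_VEC4 ? num_vec4 : LDG_K_MAX_VEC4;
      assert(n < max_chunks);
      sizes[n++] = chunk;
      num_vec4 -= chunk;
   }
   return n;
}
```

For example, 600 vec4s need three instructions (256, 256 and 88)
instead of 600.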
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40947>
The load size field starts at b23 instead of b24 and is 8 bits in size.
b23 makes the blob disassembler select between interpreting the load
size as an immediate or a GPR. However, using a GPR doesn't work as the
HW still seems to interpret the field as an immediate. We copy the
blob's behavior here for consistency.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40947>
We assumed a1.x addressing doesn't work. However, it turns out it
actually does work, but instead of taking the offset's high bits from
a1.x and adding an immediate to the low bits, the full offset is stored
in a1.x and the immediate offset is ignored.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40947>
Constlen cannot always be derived from the usage of @const et al., for
example when using ldc.k/ldg.k. Add a @constlen header to explicitly set
it.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40940>
Since we no longer set constlen based on static const reg usage,
computerator was broken. Fix this by setting constlen for @const et al.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Fixes: 4e456ebde4 ("ir3/collect_info: remove max_const calculation")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40940>
remove_trivial_phi() mostly does nothing for non-array phis, but it
rewrites sources if their defining instructions are trivial phis.
In the case of trivial phis in the loop continue block (for loops with
divergent non-trivial continues), we might need to keep those if they
write a shared register, because the source of the trivial phi will not be
reachable from the loop header phi.
In this example, the predecessors of the continue block should be block2,
but the physical predecessors are block2 and block3, requiring a phi in
the continue block which will then be lowered by ir3_lower_shared_phis.
   loop {
      block1:
      a = phi 0, b
      if (divergent) {
         block2:
         b = a + 1
         continue;
      }
      block3:
      break;
   }
Fixes RA validation error when compiling blackmythwukong/5645a84e669a6179
from radv_fossils.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Backport-to: 26.0
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40480>
Instead of inferring constlen from the usage of const registers by
various instructions, we can calculate it directly from the const file
allocations. This greatly simplifies the calculation of constlen.
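Roughly, the calculation reduces to taking the highest end of any
allocation. The sketch below assumes a hypothetical const_alloc record
with offsets and sizes in vec4 units; it is not the actual ir3 data
structure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical const file allocation, in vec4 units. */
struct const_alloc {
   unsigned offset_vec4;
   unsigned size_vec4;
};

/* constlen is the end of the highest non-empty allocation. */
static unsigned
constlen_from_allocs(const struct const_alloc *allocs, size_t num_allocs)
{
   unsigned constlen = 0;
   assert(allocs != NULL || num_allocs == 0);
   for (size_t i = 0; i < num_allocs; i++) {
      if (allocs[i].size_vec4 == 0)
         continue; /* empty allocations don't occupy const space */
      unsigned end = allocs[i].offset_vec4 + allocs[i].size_vec4;
      if (end > constlen)
         constlen = end;
   }
   return constlen;
}
```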
Note that the increase in constlen comes from a few binning variants.
This doesn't matter as the constlen of the corresponding non-binning
variant is used for those anyway.
Totals from 73 (0.04% of 176258) affected shaders:
Constlen: 3428 -> 3720 (+8.52%)
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40929>
Required for SM6.6 in vkd3d-proton and used in a number of UE5 titles.
On the descriptor side, R64 images are R32G32_UINT. To get a
storage_descriptor for them, the early return taken when the format
doesn't support rendering has to move after the storage_descriptor setup.
Passes vkd3d-proton test:
test_shader_sm66_64bit_atomics
CTS tests:
dEQP-VK.image.atomic_operations.*.r64*
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39932>
Remove unused includes, and replace heavy includes (e.g. `tu_common.h`)
where lighter ones suffice.
iwyu was used to find these cases.
Signed-off-by: Karmjit Mahil <karmjit.mahil@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40853>
Add missing folder patterns, and make the `^<vulkan/` pattern
apply to system includes too, so that all system includes are
in one group.
Signed-off-by: Karmjit Mahil <karmjit.mahil@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40853>
This avoids wasting CI time by catching the error early. We do
still need the meson test to catch these issues locally when
rebuilding from an already configured build directory though.
Signed-off-by: Karmjit Mahil <karmjit.mahil@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40853>
Add a test to ensure that we're always using one of the wrapper
files instead of including the XML generated headers directly.
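The kind of check involved can be sketched as below. This is not the
test added here; it assumes generated headers can be recognized by a
`.xml.h` suffix, and is_direct_generated_include() is a hypothetical
helper operating on one source line.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Skip leading spaces and tabs. */
static const char *
skip_blank(const char *s)
{
   while (*s == ' ' || *s == '\t')
      s++;
   return s;
}

/* Returns true if `line` is an #include of a header whose name ends in
 * ".xml.h" (assumed here to mark an XML-generated header). */
static bool
is_direct_generated_include(const char *line)
{
   assert(line != NULL);
   line = skip_blank(line);
   if (*line != '#')
      return false;
   line = skip_blank(line + 1);
   if (strncmp(line, "include", 7) != 0)
      return false;
   return strstr(line, ".xml.h\"") != NULL ||
          strstr(line, ".xml.h>") != NULL;
}
```

A real check would also need to exempt the wrapper files themselves,
which are the one place allowed to include the generated headers.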
Assisted-by: Opencode (MiniMax M2.7)
Signed-off-by: Karmjit Mahil <karmjit.mahil@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40853>
Add some wrapper header files so that we always include everything
that's needed by the generated header. This is in preparation for
setting up a script which enforces using these instead of including
the XML generated headers directly.
Signed-off-by: Karmjit Mahil <karmjit.mahil@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40853>