For a src to be killed, not only does its SSA value need to be killed,
it also shouldn't be part of or contain an interval that isn't killed
yet.
Fixes an RA assert in Windrose: "reg pressure calculation was wrong!".
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41154>
BV_RB and BV_CCU are supported on some devices (for example knp, but not
glymur or pakala); we don't have a way to deal with that yet.
This doesn't yet _expose_ gen8 perfcntrs. That small patch will come
after the PERFCNTR_CONFIG ioctl is supported, to ensure that everything gen8
and later supports the new kernel-based counter collection/reservation
(so that backwards compat of old userspace on new kernel is limited to
a7xx and earlier).
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40522>
With concurrent binning, some counter reads or SEL reg programming needs
to happen explicitly on the BR or BV ring. For the most part, if there
is a "BV_FOO" counter group, it should be accessed on the BV ring and the
corresponding "FOO" group on the BR ring. There are a few exceptions,
like "CP" vs "BV_CP", which have different SEL reg offsets for BR vs BV
rather than the same offsets accessed via the appropriate aperture.
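As a rough illustration of the naming convention above, the sketch below
picks a ring from a group name by checking for a "BV_" prefix. The
`enum ring` type and `group_ring()` helper are hypothetical, and groups like
"CP"/"BV_CP", which use distinct SEL reg offsets instead of a shared
aperture, would still need their own handling.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical ring identifiers, for illustration only. */
enum ring { RING_BR, RING_BV };

/* Assumed convention: a "BV_"-prefixed counter group is programmed and
 * read on the BV ring; every other group goes to the BR ring. */
static enum ring
group_ring(const char *group_name)
{
   assert(group_name != NULL);
   return strncmp(group_name, "BV_", 3) == 0 ? RING_BV : RING_BR;
}
```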
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40522>
1) only use "ull" for reg64, which avoids some compiler warnings on the
kernel side.
2) use "ull" for booleans as well, if reg64
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40522>
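A minimal sketch of why the suffix matters for booleans in a reg64, assuming
a boolean field that sits above bit 31 (the macro and function names are
hypothetical, not the generated ones): `1u << 40` would be undefined on a
32-bit unsigned, while `1ull << 40` yields the intended bit. For 32-bit regs
the plain suffix is kept.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical boolean field at bit 40 of a 64-bit register. The "ull"
 * suffix keeps the shift in 64-bit arithmetic. */
#define REG64_FOO_ENABLE_SHIFT 40
#define REG64_FOO_ENABLE (1ull << REG64_FOO_ENABLE_SHIFT)

/* Hypothetical boolean field in a 32-bit register: no "ull" suffix. */
#define REG32_BAR_ENABLE (1u << 3)

static uint64_t
pack_reg64_foo(int enable)
{
   return enable ? REG64_FOO_ENABLE : 0;
}
```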
To generate the perfctr tables we need a bit more information than what
is in the .xml, such as which groups of SELECT regs correspond to which
sets of COUNTER regs, the enum type of the countables (i.e. possible
SELECT reg values), etc.
It would be awkward to shoehorn this into an xml schema that is based on
describing registers, but json is easy to consume.
Field description:
- chip: variant enum used for generating correct reg offsets
- groups: array of entries for each group of counters/countables:
  - name: group name
  - num: the number of counters
  - reserved: array of counter indices reserved for KMD use
  - select_offset: offset of the first selector reg, used in cases
    where the same bank of selectors is used for both BR and BV
  - select: the selector reg name
  - counter: counter name if <reg64>, otherwise use counter_lo and
    counter_hi
  - countable_type: name of <enum> that defines selector reg values
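To make the fields concrete, here is a hypothetical C representation of one
parsed "groups" entry. The struct name, field types and the fixed-size
reserved array are assumptions for illustration, not taken from the actual
generator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_RESERVED 4

/* Hypothetical in-memory form of one "groups" entry. */
struct perfcntr_group_desc {
   const char *name;                /* group name */
   unsigned num;                    /* number of counters */
   unsigned reserved[MAX_RESERVED]; /* counter indices reserved for KMD */
   unsigned num_reserved;
   bool has_select_offset;          /* only for banks shared by BR and BV */
   uint32_t select_offset;
   const char *select;              /* selector reg name */
   const char *counter;             /* set for <reg64> counters, ... */
   const char *counter_lo;          /* ... otherwise counter_lo/counter_hi */
   const char *counter_hi;
   const char *countable_type;      /* <enum> naming selector reg values */
};

/* Returns true if counter index idx is reserved for kernel use. */
static bool
counter_is_reserved(const struct perfcntr_group_desc *g, unsigned idx)
{
   for (unsigned i = 0; i < g->num_reserved; i++) {
      if (g->reserved[i] == idx)
         return true;
   }
   return false;
}
```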
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40522>
We could generate the rest of the tables, other than these fields. But
they are all "UINT64, AVERAGE" (for the non-derived counters), so just
drop them.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40522>
Based on 075d78115e ("panvk: implement deferred image creation"),
8aa2f1a94f ("panvk: add panvk_android_get_wsi_memory for AHB spec v8+"),
and 66bbd9eec8 ("panvk: implement AHB image deferred init and memory alloc").
Defer image initialization for both ANB alias images (gralloc v8+)
and AHB-backed images using vk_android_init_deferred_image() to
deep-copy the VkImageCreateInfo at vkCreateImage time.
For ANB alias images, tu_image_init() and tu_image_update_layout()
run at vkBindImageMemory2 time via tu_android_get_wsi_memory() when
the native buffer arrives.
For AHB images, tu_image_init() and tu_image_update_layout() run at
vkAllocateMemory time when the AHardwareBuffer handle is available
via dedicated allocation.
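A simplified sketch of the deferred-initialization pattern described above.
Every type and function here is a hypothetical stand-in; the real code uses
vk_android_init_deferred_image() and tu_image_init(), whose signatures are
not reproduced. The idea: deep-copy the create info at vkCreateImage time and
only finish the layout once the backing buffer is known.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, heavily reduced create info with one pointer member,
 * which is why a deep copy rather than a struct copy is needed. */
struct fake_image_create_info {
   uint32_t width, height;
   uint32_t queue_family_count;
   const uint32_t *queue_families;
};

struct deferred_image {
   bool initialized;
   struct fake_image_create_info info; /* owned deep copy */
   uint32_t *owned_families;
   uint64_t row_stride; /* filled in by the deferred step */
};

/* vkCreateImage time: keep the create info, don't lay out yet. */
static bool
deferred_image_create(struct deferred_image *img,
                      const struct fake_image_create_info *ci)
{
   memset(img, 0, sizeof(*img));
   img->info = *ci;
   if (ci->queue_family_count) {
      size_t sz = ci->queue_family_count * sizeof(uint32_t);
      img->owned_families = malloc(sz);
      if (!img->owned_families)
         return false;
      memcpy(img->owned_families, ci->queue_families, sz);
      img->info.queue_families = img->owned_families;
   }
   return true;
}

/* Bind/allocate time: the buffer (assumed here to give us a stride) is
 * finally known, so the layout can be computed. */
static void
deferred_image_finish(struct deferred_image *img, uint64_t buffer_stride)
{
   assert(!img->initialized);
   img->row_stride = buffer_stride;
   img->initialized = true;
}

static void
deferred_image_destroy(struct deferred_image *img)
{
   free(img->owned_families);
}
```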
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40635>
ret was read after the timeout check, so breaking on timeout returned 0
instead of the actual fence status, potentially reporting a signaled
fence when it was still pending.
Fixes: 441f01e778 ("freedreno/drm/virtio: Drop blocking in host")
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41108>
Fixes two bugs in the WAIT_FENCE polling loop:
1. Break on timeout returned VK_SUCCESS because ret was read too late.
2. UINT64_MAX timeout_ns overflowed end_time, causing immediate exit.
Fix by reading rsp->ret before the timeout check and using
OS_TIMEOUT_INFINITE (like virtio_pipe_wait in freedreno) to avoid
overflow.
This prevents premature BO teardown during host-side fault recovery.
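The corrected loop ordering can be sketched as follows. The fake fence,
poll_fence_once() and FAKE_TIMEOUT_INFINITE (standing in for
OS_TIMEOUT_INFINITE) are hypothetical; the point is to latch the status
before checking the deadline, and to never add an infinite timeout to the
current time.

```c
#include <assert.h>
#include <stdint.h>

#define FAKE_TIMEOUT_INFINITE UINT64_MAX

/* Hypothetical fence standing in for the host response. */
struct fake_fence {
   int polls_until_signaled;
   uint64_t clock_ns; /* fake monotonic clock */
};

/* Returns 0 once signaled, nonzero while still pending. */
static int
poll_fence_once(struct fake_fence *f)
{
   f->clock_ns += 1000; /* each poll takes 1us of fake time */
   if (f->polls_until_signaled > 0) {
      f->polls_until_signaled--;
      return 1;
   }
   return 0;
}

static int
wait_fence(struct fake_fence *f, uint64_t timeout_ns)
{
   /* Only compute a deadline for finite timeouts, so UINT64_MAX never
    * overflows when added to the current time. */
   uint64_t end = 0;
   if (timeout_ns != FAKE_TIMEOUT_INFINITE)
      end = f->clock_ns + timeout_ns;

   int ret;
   for (;;) {
      /* Read the status first, so a timeout break reports the real state. */
      ret = poll_fence_once(f);
      if (ret == 0)
         break;
      if (timeout_ns != FAKE_TIMEOUT_INFINITE && f->clock_ns >= end)
         break;
   }
   return ret;
}
```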
Fixes: f17c5297d7 ("tu: Add virtgpu support")
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41108>
This covers the DX8/DX9 single-frame apitrace collection from
traces-db-private, and the job will appear for anyone in the group with
access to restricted traces. Like other restricted traces jobs, it's set
to allow-failure, so that regressions in the job from changes by
developers not in the group don't block merging by developers with access,
but hopefully the increased visibility lets us catch rendering bugs faster
or avoid merging them in the first place.
The actual runtime for all of our dx8/9 trace collection is about 2:30,
and the whole job is about 7:30.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40959>
With SPV_KHR_constant_data, it's allowed to specialize arrays of
constants.
RustiCL changes are from Karol Herbst <kherbst@redhat.com>.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41046>
Remove the TU_DEBUG_FLUSHALL option that was force-enabled for a8xx chips.
The problematic CTS cases that required it were failing due to indirect
draw commands sourcing draw data from buffers whose content was prepared
by compute tasks.
Up until a8xx, firmware was managing an implicit wait before any indirect
draw parameters were read, with a delayed CP_WAIT_FOR_ME emitted only when
necessary or on devices enabling indirect_draw_wfm_quirk due to bugged
firmware. That implicit wait is gone on a8xx, so CP_WAIT_FOR_ME should be
emitted immediately, which also matches the behavior of the proprietary driver.
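The resulting policy can be summarized with a hypothetical helper; the
generation number, the wait_needed flag and the returned enum are
illustrative assumptions, not turnip's actual interfaces, while
indirect_draw_wfm_quirk is the quirk named above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of when CP_WAIT_FOR_ME precedes an indirect draw. */
enum wfm_mode {
   WFM_NONE,      /* rely on the firmware's implicit wait */
   WFM_DEFERRED,  /* emit a delayed CP_WAIT_FOR_ME */
   WFM_IMMEDIATE, /* emit CP_WAIT_FOR_ME right away */
};

static enum wfm_mode
indirect_draw_wfm_mode(unsigned gen, bool indirect_draw_wfm_quirk,
                       bool wait_needed)
{
   assert(gen >= 6);
   /* a8xx firmware no longer waits implicitly before reading the
    * indirect draw parameters. */
   if (gen >= 8)
      return WFM_IMMEDIATE;
   if (wait_needed || indirect_draw_wfm_quirk)
      return WFM_DEFERRED;
   return WFM_NONE;
}
```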
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40550>
With autotune allocating counters low-to-high, the conflict with
PERFORMANCE_QUERY_KHR will happen if any CP-based counters are
used. This is a temporary workaround which just drops the first
two CP counters from being usable for performance queries.
Cc: mesa-stable
Signed-off-by: Dhruv Mark Collins <mark@igalia.com>
Assisted-by: OpenAI Codex (GPT-5.4)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40949>
This is more consistent with the newly established pattern of the
UMD allocating all locally used performance counters low-to-high
instead of the prior high-to-low order.
Cc: mesa-stable
Signed-off-by: Dhruv Mark Collins <mark@igalia.com>
Assisted-by: OpenAI Codex (GPT-5.4)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40949>
The UMD will be switching to allocating counters from low-to-high,
so to avoid the chances of conflict with this new policy the PPS
driver now allocates the other way around. Additionally, this will
future-proof it for the MSM-DRM uAPI for performance counters which
will similarly allocate from high-to-low.
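A minimal sketch of the two allocation directions, assuming a per-group
bitmask of in-use counters (the mask type and helper names are
hypothetical). With the UMD taking counters from the bottom and the PPS
driver from the top, the two only collide once the group is nearly full.

```c
#include <assert.h>
#include <stdint.h>

/* Bit i set means counter i of the group is in use. */
typedef uint32_t counter_mask;

/* UMD policy: take the lowest free counter, or -1 if none is free. */
static int
alloc_low_to_high(counter_mask *used, unsigned num_counters)
{
   assert(num_counters <= 32);
   for (unsigned i = 0; i < num_counters; i++) {
      if (!(*used & (1u << i))) {
         *used |= 1u << i;
         return (int)i;
      }
   }
   return -1;
}

/* PPS policy: take the highest free counter, or -1 if none is free. */
static int
alloc_high_to_low(counter_mask *used, unsigned num_counters)
{
   assert(num_counters <= 32);
   for (unsigned i = num_counters; i-- > 0;) {
      if (!(*used & (1u << i))) {
         *used |= 1u << i;
         return (int)i;
      }
   }
   return -1;
}
```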
Cc: mesa-stable
Signed-off-by: Dhruv Mark Collins <mark@igalia.com>
Assisted-by: OpenAI Codex (GPT-5.4)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40949>
When the preemption optimization is supported, missing the necessary CP
counters causes a device initialization error. That is unnecessary, as
support can simply be disabled instead, allowing a more graceful failure.
This also fixes A8XX, which doesn't have performance counters hooked up
yet.
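Sketched with hypothetical names (struct fake_device,
required_cp_counters), the change turns a hard initialization failure into
a feature toggle:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device state for illustration. */
struct fake_device {
   unsigned available_cp_counters;
   bool preemption_opt_supported;
   bool preemption_opt_enabled;
};

/* Missing counters disable the optimization instead of failing device
 * initialization. */
static void
init_preemption_opt(struct fake_device *dev, unsigned required_cp_counters)
{
   assert(dev != NULL);
   dev->preemption_opt_enabled = false;
   if (!dev->preemption_opt_supported)
      return;
   if (dev->available_cp_counters < required_cp_counters)
      return; /* graceful fallback: continue without the optimization */
   dev->preemption_opt_enabled = true;
}
```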
Cc: mesa-stable
Signed-off-by: Dhruv Mark Collins <mark@igalia.com>
Assisted-by: OpenAI Codex (GPT-5.4)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40949>
A future kernel API for perfcounter management will likely be required for
a8xx and onwards. For a7xx and earlier, cmdstream-based selector and
counter register management is still supported.
Cc: mesa-stable
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40949>
"size" is the allocated size of the array, not the number of immediates
actually used. We could wind up returning a too-large constlen, larger
than 512, and since the binning variant uses the non-binning variant's
constlen as its max_const we could make binning variants use c512.x and
crash when encoding.
Fixes: 86f3c0c4c2 ("ir3: simplify constlen calculation")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40961>
We need to know the immediate count even after lowering, to compute the
overall const size. Previously we were using the capacity field, but
that's unreliable and won't be available once we switch to a real
dynamic array container instead of (poorly) reinventing one.
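The distinction between the two fields can be sketched like this; the
struct, the four-scalars-per-vec4 packing and constlen_for_immediates() are
assumptions for illustration, not ir3's actual data structures. The const
size must come from how many immediates are used, not from how much storage
happens to be allocated.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical immediate table: `count` entries are in use, `capacity`
 * entries are allocated (capacity may grow in larger steps). */
struct imm_table {
   uint32_t *values;
   unsigned count;
   unsigned capacity;
};

/* Number of vec4 const slots needed, assuming scalar immediates are
 * packed four per vec4. */
static unsigned
constlen_for_immediates(const struct imm_table *imms)
{
   assert(imms->count <= imms->capacity);
   return (imms->count + 3) / 4; /* round up to whole vec4s */
}
```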
Fixes: 86f3c0c4c2 ("ir3: simplify constlen calculation")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40961>
ldg.k can copy up to 256 vec4s at once but we currently emit one ldg.k
per vec4. Fix this by using the load size field of ldg.k.
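A minimal sketch of the chunking arithmetic: one ldg.k per chunk of up to
256 vec4s instead of one per vec4. emit_ldg_k() is a hypothetical callback
that only records what would be emitted; how the size is encoded in the
instruction is not modeled.

```c
#include <assert.h>

#define LDG_K_MAX_VEC4S 256

/* Hypothetical emitter log. */
struct ldg_k_log {
   unsigned num_emitted;
   unsigned last_size;
};

static void
emit_ldg_k(struct ldg_k_log *log, unsigned dst_vec4, unsigned src_vec4,
           unsigned size_vec4)
{
   (void)dst_vec4;
   (void)src_vec4;
   assert(size_vec4 >= 1 && size_vec4 <= LDG_K_MAX_VEC4S);
   log->num_emitted++;
   log->last_size = size_vec4;
}

/* Copy `total` vec4s using as few ldg.k instructions as possible. */
static void
emit_const_copy(struct ldg_k_log *log, unsigned dst, unsigned src,
                unsigned total)
{
   for (unsigned done = 0; done < total;) {
      unsigned chunk = total - done;
      if (chunk > LDG_K_MAX_VEC4S)
         chunk = LDG_K_MAX_VEC4S;
      emit_ldg_k(log, dst + done, src + done, chunk);
      done += chunk;
   }
}
```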
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40947>
The load size field starts at b23 instead of b24 and is 8 bits in size.
b23 makes the blob disassembler select between interpreting the load
size as an immediate or a GPR. However, using a GPR doesn't work as the
HW still seems to interpret the field as an immediate. We copy the
blob's behavior here for consistency.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40947>
We assumed a1.x addressing doesn't work. However, it turns out it
actually does work, but instead of taking the offset's high bits from
a1.x and adding an immediate to the low bits, the full offset is stored
in a1.x and the immediate offset is ignored.
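The two interpretations can be contrasted with a small sketch. These
functions are hypothetical models of the behavior described above, not
hardware documentation; in particular the high/low split point (LOW_BITS)
is an assumed value.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed width of the immediate (low) part of the offset. */
#define LOW_BITS 8

/* Previous assumption: a1.x supplies the high bits of the offset and the
 * immediate supplies the low bits. */
static uint32_t
offset_assumed(uint32_t a1x, uint32_t imm)
{
   return (a1x << LOW_BITS) + imm;
}

/* Observed behavior: a1.x holds the full offset; the immediate is ignored. */
static uint32_t
offset_observed(uint32_t a1x, uint32_t imm)
{
   (void)imm;
   return a1x;
}
```

Under the observed model, the whole offset has to be placed in a1.x rather
than split between a1.x and the immediate.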
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40947>
Constlen cannot always be derived from the usage of @const et al., for
example when using ldc.k/ldg.k. Add a @constlen header to set it
explicitly.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40940>
Since we no longer set constlen based on static const reg usage,
computerator was broken. Fix this by setting constlen from @const et al.
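Roughly, computerator now has two sources for constlen: the explicit
@constlen header and whatever @const et al. declare. A hypothetical sketch
of combining them; the field names and the choice of taking the maximum are
assumptions, not necessarily what the tool does.

```c
#include <assert.h>

/* Hypothetical parse state; constlen values are in vec4 units. */
struct asm_consts {
   unsigned explicit_constlen; /* from @constlen, 0 if absent */
   unsigned max_const_vec4;    /* highest vec4 used by @const et al., + 1 */
};

static unsigned
final_constlen(const struct asm_consts *c)
{
   /* Take the larger value so neither source ends up under-allocated. */
   return c->explicit_constlen > c->max_const_vec4 ? c->explicit_constlen
                                                   : c->max_const_vec4;
}
```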
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Fixes: 4e456ebde4 ("ir3/collect_info: remove max_const calculation")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40940>
remove_trivial_phi() mostly does nothing for non-array phis, but it
rewrites sources if their defining instructions are trivial phis.
In the case of trivial phis in the loop continue block (for loops with
divergent non-trivial continues), we might need to keep those if they
write a shared register, because the source of the trivial phi will not be
reachable from the loop header phi.
In this example, the only logical predecessor of the continue block is
block2, but the physical predecessors are block2 and block3, requiring a
phi in the continue block which will then be lowered by
ir3_lower_shared_phis.
   loop {
   block1:
      a = phi 0, b
      if (divergent) {
      block2:
         b = a + 1
         continue;
      }
   block3:
      break;
   }
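A simplified model of the check, with hypothetical types: a phi is trivial
when every source is either one single value or the phi itself, and a
trivial phi in a loop continue block that writes a shared register is kept
rather than bypassed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical SSA value/phi representation for illustration. */
struct value { int id; };

struct phi {
   struct value *def;      /* value defined by the phi */
   struct value **srcs;
   size_t num_srcs;
   bool in_continue_block; /* phi lives in a loop continue block */
   bool writes_shared_reg; /* destination is a shared register */
};

/* Returns the single value a trivial phi forwards, or NULL if the phi
 * merges distinct values. Self-references are ignored. */
static struct value *
trivial_phi_value(const struct phi *phi)
{
   struct value *same = NULL;
   for (size_t i = 0; i < phi->num_srcs; i++) {
      struct value *src = phi->srcs[i];
      if (src == phi->def || src == same)
         continue;
      if (same)
         return NULL;
      same = src;
   }
   return same;
}

/* Users may be rewritten to bypass a trivial phi unless it is a shared-reg
 * phi in a continue block, whose source is not reachable from the loop
 * header phi. */
static bool
can_bypass_trivial_phi(const struct phi *phi)
{
   if (!trivial_phi_value(phi))
      return false;
   return !(phi->in_continue_block && phi->writes_shared_reg);
}
```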
Fixes RA validation error when compiling blackmythwukong/5645a84e669a6179
from radv_fossils.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Backport-to: 26.0
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40480>