Add the ability to deduplicate hoisted expressions, which will be
necessary to avoid repeatedly hoisting the same descriptors and blowing
our budget. The offset calculation may have itself been hoisted into the
preamble, so we also have to be able to hoist a bindless_resource_ir3
referencing a load_preamble and connect it to the source of the
corresponding store_preamble.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29873>
We have to be careful here because r63.x is also used to mean "register
not assigned yet," but dummy destinations for prefetches will use r63.x
and we don't want them to count as GPRs when determining whether to
enable early preamble.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29873>
Don't multiply by 4 twice. While we're here, fix the in-bounds check on
a6xx, since we need to account for the multiplying by 4. This was done
correctly on a7xx but the commit below didn't correctly port it to a6xx
when adding the multiply on a6xx.
Fixes: 01bac643f6 ("freedreno/ir3: Fix ldg/stg offset")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30046>
We use the method used by the blob, which sets the USE_CPU_MAP flag,
originally intended for SVM, to allocate from a separate address range
and to control the address by passing a preferred address to mmap().
With this we can capture and replay gfxreconstruct traces on kgsl for
apps that use BDA, and we can replay them on msm with a small hack to
increase the address space size:
echo 274877906944 > /sys/module/msm/parameters/address_space_size
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29251>
Neither freedreno nor turnip make use of the display-related source
files. With the XML files being imported to the kernel, drop them from
Mesa to prevent possible confusion and/or deviation between those files.
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27787>
The OFFSET_LO in #instruction-cat6-a6xx-ibo-load-store aliased with
opcode of other instructions, resolve this by being less lax in some
instruction definitions.
A proper way to solve this would probably be to reconstruct instructions
hierarchy, but it's a much more complex task.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30018>
In tu_get_image_format_properties(), image usage flags are matched against
sets of required format feature flags for the specified image format.
These mappings are defined in the Vulkan 1.3 spec in section 49.3.3.,
"Format Feature Dependent Usage Flags".
Handling for the VK_IMAGE_USAGE_INPUT_ATTACHMENT_BIT flag was missing. When
specified, at least one of color attachment or depth-stencil attachment
format feature flags should be present for the given format.
Fixes 110 new Vulkan CTS tests:
- dEQP-VK.api.info.unsupported_image_usage.linear.*
- dEQP-VK.api.info.unsupported_image_usage.optimal.*
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30013>
Add basic KHR_8bit_storage support for Adreno 750 devices, for now enabling
the storageBuffer8BitAccess feature. A separate descriptor is provided for
8-bit storage access. The descriptor index is adjusted appropriately for
8-bit SSBO loads and stores.
The 8-bit SSBO loads cannot go through isam since that instruction isn't
able to handle those. The ldib and stib instruction encodings are a bit
peculiar but they match the blob's image buffer access through VK_FORMAT_R8
and the dedicated descriptor. These loads and stores do not work in
vectorized form, so they have to be scalarized. Additionally stores of
8-bit values have to clear up higher bits of those values.
8-bit truncation can leave higher bits as undefined. Zero-extension of
8-bit values has to use masking since the corresponding cov instruction
doesn't function as intended. 8-bit sign extension through cov from a
non-shared to a shared register also doesn't work, so an exception is
applied to avoid it.
Conversion of 8-bit values to and from floating-point values also doesn't
work with a straightforward cov instruction, instead the conversion has
to go through a 16-bit value.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9979
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28254>
ir3's TYPE_S8 isn't actually a signed 8-bit type in a half-register, it's
in fact an unsigned 8-bit type in a full register. Only actual uses of
TYPE_S8 are in conversion operations, but those can be trivially replaced
with TYPE_U8, either truncating down to 8 bits or zero-extending or
sign-extending from 8 bits upward.
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28254>
Support 8-bit loads and stores in the preamble alongside 16-bit ones by
employing the correct conversion variants. Promotions to float remain
specific to 16-bit loads and stores.
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28254>
Until now, if the 16-bit storage functionality is supported by the
hardware, two separate descriptors were set up, with isam loads and stores
piping through the descriptor of the corresponding size and other storage
access using the 16-bit descriptor.
These changes keep separate descriptors on a650, but leverage post-a650
isam.v functionality that enables use of 16-bit descriptors for 32-bit
loads, removing the need for the separate 32-bit descriptor.
Storage buffer descriptors are set up according to 16-bit storage support
and the indicated isam.v support, using those descriptors for 32-bit isam
loads as well if the latter is present.
Dynamic offset application in tu_CmdBindDescriptorSets is modified to
determine the offset shift value based on the descriptor's format and not
on the descriptor's position in the layout binding.
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28254>
Move all the logic into tu_CmdBegin/EndQueryIndexedEXT. CmdBegin/EndQuery in
the common runtime is a wrapper that calls tu_CmdBegin/EndQueryIndexedEXT with
index 0.
Signed-off-by: Valentine Burley <valentine.burley@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29441>
Match the structure of the common Vulkan runtime and NVK.
Additionally update a comment to reflect the current state.
Signed-off-by: Valentine Burley <valentine.burley@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29441>
For preambles, we don't actually care which invocation we get, so we
don't have to enable helper invocations when the preamble uses "getone."
Introduce a new intrinsic with the right semantics and plumb it through.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29914>
We set LRZ_FEEDBACK_EARLY_LRZ_LATE_Z mask for rendering pass after
HW binning because:
- Draws with EARLY_Z contributed to depth buffer in BINNING stage;
- Draws with LATE_Z is what usually disables LRZ.
- Draws with EARLY_LRZ_LATE_Z are the ones we want because they
represent the common case of FS with "discard".
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25345>
Some draws do write depth but cannot contribute to LRZ during the BINNING pass
e.g. when fragment shader has "discard" in it, however they can contribute to
LRZ during the RENDERING pass via LRZ feedback meachanism. This may allow the
draws that follow to depth test against the updated LRZ, this is especially
important if such "bad" draws were at the start of the renderpass.
LRZ feedback happens during the RENDERING pass when LRZ_FEEDBACK_ZMODE_MASK
is set, if draw has a6xx_ztest_mode that has corresponding flag set in
LRZ_FEEDBACK_ZMODE_MASK - its depth values would be used for feedback.
LRZ feedback alongside with LRZ testing also works during sysmem rendering.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25345>
The ir3_info is reset by ir3_collect_shader_info() on the expectation
that all info is collected inside that function. This meant that we were
accidentally disabling early preamble. Re-enable it.
We keep a copy in ir3_info for shader statistics in the next commit.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29903>
nir_opt_preamble sometimes adds useless expressions, in which case we
may have stc instructions and no corresponding use of the constant.
Things can go sideways when these aren't included in the constlen, so
far only observed when earlypreamble is enabled.
Fixes: ccc64b7e00 ("ir3: Plumb through store_uniform_ir3 intrinsic")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29903>