Required for SM6.6 in vkd3d-proton and used in a number of UE5 titles.
From descriptor side R64 images are R32G32_UINT, and to get storage_descriptor
we have to move early-return if format doesn't support rendering after
storage_descriptor setup.
Passes vkd3d-proton test:
test_shader_sm66_64bit_atomics
CTS tests:
dEQP-VK.image.atomic_operations.*.r64*
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39932>
Add some wrapper header files so that we always include everything
that's needed by the generated header. This is in preparation for
setting up a script which enforces using these instead of importing
the xml generated headers directly.
Signed-off-by: Karmjit Mahil <karmjit.mahil@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40853>
This introduces a new option that makes autotune optimize for low
preemption latency which is crucial to ensure responsiveness on
systems with GPU-based composition. A large enough draw can entirely
block the compositor from running with draw-level preemption, this can
be mitigated by preferring to use GMEM which breaks up the draw into
smaller pieces and generally has a lower latency for preemption.
As a further mitigation, tiles in GMEM are then divided into smaller
and smaller pieces which lowers the non-preemptible duration. There
are static checks in place to avoid doing this when it would incur a
cost that is too large.
Uses performance counters read during ambles to detect preemption
latency events while rendering in SYSMEM. This approach is superior
to using RBBM draw time thresholds which could be imprecise as only
the average was calculated rather than true maximum draw time.
However, converting the preemption latency performance counter value
from CP ticks to wall clock is based on the average GPU frequency of
the whole period from the start of the RP until the switch-away amble
while the preemption latency stars counting from the request. Thus, if
the GPU frequency shifts rapidly throughout the RP, it may cause the
estimated wall clock time to be inaccurate, but it should be good enough
in the vast majority of cases.
Signed-off-by: Dhruv Mark Collins <mark@igalia.com>
Co-authored-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37802>
Initially I'd added the offset to make things match up to blob driver on
x2-85/a840. But this gets in the way on parts with smaller GMEM.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40655>
Allow adjusting the location of RD dumps and trigger file through the
FD_RD_DUMP_PATH environment variable. When not present, the existing
defaults will be used.
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40532>
Add a way to generate the table of gpu-ids that nvtop uses, to simplify
syncing nvtop with mesa when new gpu-ids are added. For example:
python3 src/freedreno/common/freedreno_devices.py -p ./$builddir/src/freedreno/registers/adreno/ --nvtop
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40283>
When these are correctly mapped to draw usage, we can't rely on them
being globally initialized in tu_init_hw(). They need to be re-
initialized after rp stomping.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39819>
Adjust or fill out various properties for the a830 GPU, setting up a gen1
base. So far these mostly mirror the gen2 properties, except for gmem
config layouts, and they will probably further diverge down the line.
A new GPU ID for a830 is also added, Turnip there runs on top of KGSL.
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39874>
With a8xx a lot of chicken bit and other device-specific magic register
handling has moved into the kernel, which leaves a list of register writes
that could be more commonly shared between all a8xx devices.
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39874>
Extract stats about the descriptors used by a shader. A later commit
will use this information to help filter the used descriptors and types.
Unfortunately a1.x usage throws a bit of a wrench into the gears, since
(like s2en) we don't know which tex/samp or even in some cases bindless
base is used. This could perhaps be improved to detect the commmon case
of an immed value loaded into a1.x.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39636>
With dual-wave dispatch, the hw needs (jp)s to also reflect where
co-issued waves might reconverge. We should also consider this for
branchstack, but we do not need to consider it while generating
physical edges for uGPR (shared register) allocation.
Handle this by calling the reconvergence pass twice on devices with
dual-wave dispatch. The first pass ignores the information we get
from nir about uniformity of branch conditions (because nir does not
distinguish between shader and warp level uniformity). In this
first pass, we skip setting up physical edges, which is handled in
the second pass (since co-issued waves have their own uGPR resources).
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39442>
Rather than copying an ever growing list of params from fd_dev_info to
ir3_compiler, just store the info pointer in the compiler and use that
directly.
Mechanical change. But deletes code and removes an extra step from
adding compiler related dev info props.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39442>
Some newer gen8 devices (like a840/kaanapali, but not x2-85 which is
otherwise similar) flip the hw shading rate value around to match
vulkan/gl instead of DX.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39167>
Narrowing integer conversions on SALU with GPR src do not behave as one
would expect on gen8, so avoid them. This does not apply to uGPR srcs
or float conversions.
See, for example:
dEQP-VK.glsl.builtin.function.integer.bitcount.int_highp_compute
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39167>
At least on gen8 (unsure if hw or fw change) this makes things pissy and
results in writes to undefined offsets.
Non-zero chance that this was even UB on older gens, but ended up
writing somewhere harmless(ish).
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39254>
Move the enabled storage_8bit property toggle into the base a7xx GPUProps
class. This enables storageBuffer8BitAccess Vulkan feature on all a7xx
hardware, much like the proprietary driver does. It's also a required
feature with Vulkan 1.4.
Fixes: dEQP-VK.info.device_mandatory_features on pre-a750 a7xx hardware.
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39124>
Add astc hdr (float) formats, those get treated identically as the ldr
formats as the blocks have enough metadata to be decoded as float.
Reviewed-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38859>
The non-variant reg packers were removed in commit fd6489c026 ("tu:
Drop emitting of deprecated packing."), along with
FD_NO_DEPRECATED_PACK.
Add support to mark the even older reg builders as deprecated, and
re-introduce FD_NO_DEPRECATED_PACK to control this.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39029>
Add support for A840 and X2-85. Despite slice count, differences in
memory bus and clks, they are architecturally similar from the PoV of
the UMD.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38450>