I'm not sure exactly why it didn't work, since
TPL1_A2D_SRC_TEXTURE_SIZE seemingly has a (1 << 15) width
limit. However, tests have shown that it doesn't work.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36939>
The bottom right corner of the copy exceeded the maximum allowed
value in GRAS_A2D_DEST_BR.x.
To fix this, we have to do a second copy, per line, of
the last texels.
Fixes asserts in:
dEQP-GLES31.functional.copy_image.*
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36939>
Turnip crashes under drm-shim when enabling VM_BIND. We don't care about
VM_BIND for shader compilation so just disable it.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Fixes: 4efbfa1441 ("tu/drm: Enable VM_BIND")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37000>
FdMemoryDataSource was being registered as a Perfetto data source
unconditionally, so anything calling fd_device_new(...) attempted
the registration even when Perfetto might not have been initialized.
That initialization is done as part of util_perfetto_init, and without
it, trying to register the data source causes a SEGFAULT.
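A minimal sketch of guarded registration, assuming a simple "initialized"
flag. The names perfetto_initialized, perfetto_init() and
maybe_register_memory_data_source() are hypothetical stand-ins, not the
actual util_perfetto or FdMemoryDataSource code:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the Perfetto state and the registration. */
static bool perfetto_initialized;
static int registered_data_sources;

static void
perfetto_init(void)
{
   perfetto_initialized = true;
}

/* Register the memory data source only if Perfetto was initialized;
 * otherwise skip it instead of touching uninitialized tracing state. */
static bool
maybe_register_memory_data_source(void)
{
   if (!perfetto_initialized)
      return false;
   if (registered_data_sources == 0)
      registered_data_sources++; /* register only once per process */
   return true;
}
```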
Fixes: c7045e3e63 ("perfetto: unify init")
Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36993>
We are still in the process of moving our kernels to gfx-ci/linux, but
we got the request to uprev the kernel a month ago, just as I started my
holiday, so let's not delay it any further. Anyway, it is better to change
only one variable at a time, so no harm done.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32533>
UCHE and CCU use virtually-tagged addresses, so whenever an alias may have
changed, we always have to flush and invalidate everything. We detect
this through the sparse memory aliasing flag on the buffer/image or, for
plain memory barriers, through whether the feature is enabled.
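The decision can be sketched as below. The flag bits, struct fields and
function name are hypothetical, and we assume "the feature" is
sparseResidencyAliased. Only the rule itself (an aliased resource, or the
feature being enabled for plain memory barriers, means flush and invalidate
everything) comes from the description:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits, not the real turnip flush flags. */
#define FLUSH_UCHE      (1u << 0)
#define FLUSH_CCU       (1u << 1)
#define INVALIDATE_UCHE (1u << 2)
#define INVALIDATE_CCU  (1u << 3)
#define FLUSH_INVALIDATE_ALL \
   (FLUSH_UCHE | FLUSH_CCU | INVALIDATE_UCHE | INVALIDATE_CCU)

struct barrier_info {
   bool has_resource;         /* buffer/image barrier vs. plain memory barrier */
   bool resource_aliased;     /* created with the SPARSE_ALIASED create flag */
   bool sparse_alias_feature; /* sparseResidencyAliased enabled (assumed) */
   uint32_t normal_flags;     /* flushes the barrier needs anyway */
};

static uint32_t
barrier_flush_flags(const struct barrier_info *b)
{
   bool may_alias = b->has_resource ? b->resource_aliased
                                    : b->sparse_alias_feature;
   /* Virtually-tagged caches can't see that two VAs now hit the same
    * physical page, so be conservative when aliasing is possible. */
   if (may_alias)
      return b->normal_flags | FLUSH_INVALIDATE_ALL;
   return b->normal_flags;
}
```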
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32533>
Plumb through support for a sparse queue and enable sparse binding using
the kernel interfaces we added earlier. We also support sparse residency
for buffers, which is straightforward, but sparse residency for images
is much more complicated so it will be enabled later.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32533>
Add a "sparse VMA" abstraction, along with functions for creating and
destroying them and for submitting commands to map and unmap BOs into them.
This mirrors the Vulkan API, but with image offsets resolved to page offsets.
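For the simplest opaque case, a VkSparseMemoryBind-style range given in bytes
as resourceOffset, memoryOffset and size, resolving to page offsets amounts
to dividing by the page size. This sketch assumes 4 KiB pages, assumes the
size is already padded to whole pages, and uses hypothetical names. Image
binds would additionally need the image layout to find the byte offsets
first:

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ull /* assumed page size */

/* A bind expressed in pages instead of bytes (hypothetical layout). */
struct sketch_page_bind {
   uint64_t vma_page;  /* first page inside the sparse VMA */
   uint64_t bo_page;   /* first page inside the BO being mapped */
   uint64_t num_pages; /* number of pages to map */
};

static struct sketch_page_bind
resolve_opaque_bind(uint64_t resource_offset, uint64_t memory_offset,
                    uint64_t size)
{
   /* Valid usage requires aligned binds; we assume that alignment is a
    * multiple of the page size and that size is padded to whole pages. */
   assert(resource_offset % SKETCH_PAGE_SIZE == 0);
   assert(memory_offset % SKETCH_PAGE_SIZE == 0);
   assert(size % SKETCH_PAGE_SIZE == 0);

   struct sketch_page_bind bind = {
      .vma_page = resource_offset / SKETCH_PAGE_SIZE,
      .bo_page = memory_offset / SKETCH_PAGE_SIZE,
      .num_pages = size / SKETCH_PAGE_SIZE,
   };
   return bind;
}
```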
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32533>
Use a new driver-internal VM_BIND submit queue for mapping and unmapping
"normal" BOs. This will be required for sparse, because we can't mix
the old and new interface, but it should also allow us to stop using
"zombie" VMAs and the bo list.
Also use MSM_BO_NO_SHARE, which we assume is available when VM_BIND is.
This should significantly reduce kernel submit overhead, complementing
the cut in userspace submit overhead from using VM_BIND.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32533>
According to the spec and as implemented by other drivers, this should
use the size of the buffer instead of the size of the VkDeviceMemory
it's bound to when VK_WHOLE_SIZE is specified or pSizes is NULL. The
current behavior doesn't make sense at all for sparse buffers which are
not bound to a single VkDeviceMemory. Just use the common helper that
already does the right thing, copied from anv.
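A sketch of the range resolution. As far as we know, Mesa's common
vk_buffer_range() helper has the same shape. VkDeviceSize and VK_WHOLE_SIZE
are redefined locally so the snippet compiles without the Vulkan headers,
and struct sketch_buffer is hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins so the sketch compiles without the Vulkan headers. */
typedef uint64_t VkDeviceSize;
#define VK_WHOLE_SIZE (~0ULL)

struct sketch_buffer {
   VkDeviceSize size;        /* VkBufferCreateInfo::size */
   VkDeviceSize memory_size; /* size of the bound VkDeviceMemory: NOT used */
};

/* Resolve an (offset, range) pair against the buffer itself. */
static VkDeviceSize
sketch_buffer_range(const struct sketch_buffer *buf, VkDeviceSize offset,
                    VkDeviceSize range)
{
   assert(offset <= buf->size);
   if (range == VK_WHOLE_SIZE)
      return buf->size - offset;
   assert(range <= buf->size - offset);
   return range;
}
```

When pSizes is NULL, the caller would pass VK_WHOLE_SIZE for every binding.
This way, sparse buffers, which have no single VkDeviceMemory, are handled
the same way as ordinary buffers.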
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32533>
The kernel was rounding the size up for us, but it doesn't like a
non-aligned map size, so just sanitize the size here.
tu_cs was relying on the size not being rounded to keep the maximum size
2^20-1 or less, so fix that by using the initial unrounded size.
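A sketch of keeping both sizes, assuming a 4 KiB page size and hypothetical
names. It also shows why tu_cs has to use the unrounded size: rounding
2^20-1 up to a page boundary gives 2^20, which is over the limit:

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u                  /* assumed page size */
#define SKETCH_CS_MAX_SIZE ((1u << 20) - 1)     /* limit tu_cs relies on */

static uint64_t
align_to_page(uint64_t size)
{
   return (size + SKETCH_PAGE_SIZE - 1) & ~(uint64_t)(SKETCH_PAGE_SIZE - 1);
}

/* Hypothetical BO that remembers both sizes. */
struct sketch_bo {
   uint64_t size;      /* page-aligned size handed to the kernel */
   uint64_t orig_size; /* unrounded size the caller asked for */
};

static struct sketch_bo
sketch_bo_init(uint64_t requested)
{
   assert(requested > 0);
   struct sketch_bo bo = { align_to_page(requested), requested };
   return bo;
}
```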
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32533>
For VM_BIND, BO deletion will have to be implemented differently in
native drm and virtio. We already have a somewhat awkward situation with
native-specific code in the common BO deletion helper, which we only get
away with because it's for kernels without SET_IOVA in which case virtio
isn't supported. Add a few common helpers for some of the guts, and move
the guts into backend-specific functions.
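The resulting structure can be sketched as a small per-backend function
table that uses shared helpers. All names here are hypothetical. The sketch
only illustrates the split between common guts and backend-specific
deletion, not the actual freedreno drm code:

```c
#include <assert.h>
#include <stdint.h>

struct sketch_bo;

/* Backend-specific hooks (hypothetical). */
struct sketch_bo_funcs {
   void (*destroy)(struct sketch_bo *bo);
};

enum sketch_backend { BACKEND_NONE, BACKEND_NATIVE, BACKEND_VIRTIO };

struct sketch_bo {
   const struct sketch_bo_funcs *funcs;
   uint32_t handle;
   enum sketch_backend destroyed_by; /* records which backend ran */
};

/* Common guts shared by every backend. */
static void
bo_common_finish(struct sketch_bo *bo)
{
   bo->handle = 0;
}

static void
native_bo_destroy(struct sketch_bo *bo)
{
   /* native-drm-specific teardown would go here */
   bo_common_finish(bo);
   bo->destroyed_by = BACKEND_NATIVE;
}

static void
virtio_bo_destroy(struct sketch_bo *bo)
{
   /* virtio-specific teardown would go here */
   bo_common_finish(bo);
   bo->destroyed_by = BACKEND_VIRTIO;
}

static const struct sketch_bo_funcs native_funcs = { native_bo_destroy };
static const struct sketch_bo_funcs virtio_funcs = { virtio_bo_destroy };

/* The common entry point no longer contains backend-specific code. */
static void
sketch_bo_del(struct sketch_bo *bo)
{
   assert(bo->funcs && bo->funcs->destroy);
   bo->funcs->destroy(bo);
}
```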
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32533>
This avoids having to hardcode the proxy in the traces `download-url` or
jobs setting `PIGLIT_REPLAY_EXTRA_ARGS` and accidentally overriding the
default args when the author meant to append.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36955>
We pass the tests for exchange, load, and store on R32_SFLOAT, including
shared memory (which the proprietary driver does not advertise). The blob
does not support add operations either.
Passes:
dEQP-VK.glsl.atomic_operations.exchange_float*
dEQP-VK.image.atomic_operations.exchange*r32f*
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36907>
Descriptor prefetches may be generated for instructions in control flow.
This means we cannot simply emit prefetches at the end of the preamble,
because the end of the preamble may not be dominated by all of their
sources. This commit uses
the helpers introduced by e7ac1094f6 ("ir3: rematerialize preamble defs
in block dominated by sources") to find the correct block to insert
prefetches.
Fixes NIR validation errors in Dying Light 2.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Fixes: 4e2a0a5ad0 ("ir3: Add descriptor prefetching optimization on a7xx")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36885>
Name the register, which is actually an array, and initialize it
programmatically using the same table as the per-primitive case. This
should produce the same value as the old hardcoded constant.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36892>
We had an extra 16 entries in the VK-to-HW table that were clearly
unnecessary because Vulkan does not allow values of 16 or greater for the
primitive shading rate. This appears to be an extra debug/test thing
added by the blob. Similarly there were unused entries in the HW-to-VK
table that shouldn't be necessary. Delete them.
The HW-to-VK table was also inconsistent about whether invalid values
should be 0 or 11; fix that too.
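The 16-entry bound follows from how the primitive shading rate is encoded.
In the SPIR-V FragmentShadingRate mask used by PrimitiveShadingRateKHR,
bits 0-1 select a 2- or 4-pixel height and bits 2-3 select a 2- or 4-pixel
width, so every valid value fits in 4 bits. A minimal decoding sketch (the
function name is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* bit 0: Vertical2Pixels    bit 1: Vertical4Pixels
 * bit 2: Horizontal2Pixels  bit 3: Horizontal4Pixels */
static void
decode_primitive_shading_rate(uint32_t rate, uint32_t *width, uint32_t *height)
{
   assert(rate < 16); /* only 4 bits are meaningful */
   *width = 1u << ((rate >> 2) & 3);
   *height = 1u << (rate & 3);
}
```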
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36892>
A tu_bo object can be in the process of being dumped during queue submit
while also being destroyed on a separate thread. During destruction, the
tu_bo should be removed from the device's dump_bo_list before it is
unmapped; this way, the mapping of a given tu_bo won't disappear while
it's being dumped.
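A simplified sketch of the ordering, assuming the dumping thread holds the
same lock while it reads a BO's mapping. The structs, lock and functions
are hypothetical and much smaller than the real tu_bo and tu_device:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical, heavily simplified BO and device. */
struct sketch_bo {
   void *map;
   struct sketch_bo *next_dump;
};

struct sketch_device {
   pthread_mutex_t dump_lock; /* also held by the dumper while reading maps */
   struct sketch_bo *dump_list;
};

static void
dump_list_remove_locked(struct sketch_device *dev, struct sketch_bo *bo)
{
   for (struct sketch_bo **p = &dev->dump_list; *p; p = &(*p)->next_dump) {
      if (*p == bo) {
         *p = bo->next_dump;
         bo->next_dump = NULL;
         return;
      }
   }
}

static void
sketch_bo_destroy(struct sketch_device *dev, struct sketch_bo *bo)
{
   /* Unpublish first: once the BO is off the list, a dumper holding
    * dump_lock can no longer find it... */
   pthread_mutex_lock(&dev->dump_lock);
   dump_list_remove_locked(dev, bo);
   pthread_mutex_unlock(&dev->dump_lock);

   /* ...so only now is it safe to tear down the CPU mapping. */
   bo->map = NULL; /* stand-in for munmap() */
}
```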
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Reviewed-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36904>
This has gotten complicated enough that we need somewhere outside of the
driver itself to give an overall flow of how the feature is implemented.
This includes a few things that are enabled in the subsequent commits,
specifically the LRZ parts.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36475>
wsi_common_vk_instance_supports_present_wait returns true for all
supported wsi platforms here, so we can unconditionally advertise them
behind TU_USE_WSI_PLATFORM like the other wsi extensions (also to not
tangle with Android).
Acked-by: Daniel Stone <daniels@collabora.com>
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36835>
We can just place the set structures inside nir_block.
This reduces the number of ralloc calls by 6.7% when compiling Heaven
shaders with radeonsi+ACO using a release build (i.e. not including
nir_validate set allocations, which are also removed).
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36728>
Our SSBO access instructions expect offsets in units of the accessed
type's size. However, we were ingesting SSBO intrinsics that use byte
addresses. We were fixing this up in ir3_nir_lower_io_offsets by
inserting a ushr or, if possible, propagating this shift into another
shift that's part of the address calculation.
Having to insert a ushr is unfortunate, as for most accesses it should
be possible to extract this shift directly from the access chain, because
the array strides and struct offsets would be properly aligned. It also
prevents nir_opt_offsets from finding constant additions to extract, as they
would be hidden behind a ushr that often cannot be optimized away.
57ea689273 ("ir3: optimize SSBO offset shifts for nir_opt_offsets")
tried to overcome the latter problem somewhat by pushing a ushr into
additions. This turned out to be unsound because even though SSBO
offsets are unsigned, intermediate results in the offset calculation
might be negative values which means we should use ishr in those cases.
Unfortunately, we cannot know when to use ushr or ishr.
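A small worked example of why pushing the shift into the addition is
unsound. It assumes 4-byte elements (a shift of 2) and 32-bit offsets where
both terms are multiples of 4. With a negative intermediate (delta = -4),
only ishr gives the right answer. With a large unsigned delta (0x80000000),
only ushr does, so neither choice is correct in general. ishr() is written
portably because right-shifting a negative signed value is
implementation-defined in C:

```c
#include <assert.h>
#include <stdint.h>

static uint32_t
ushr(uint32_t v, unsigned s)
{
   return v >> s;
}

/* Arithmetic shift of the two's-complement value, without relying on
 * implementation-defined signed shifts. */
static uint32_t
ishr(uint32_t v, unsigned s)
{
   return (v & 0x80000000u) ? ~(~v >> s) : v >> s;
}

/* Reference: shift the final byte offset. */
static uint32_t
shift_after_add(uint32_t base, uint32_t delta, unsigned s)
{
   return ushr(base + delta, s);
}

/* The two possible ways of pushing the shift into the addition. */
static uint32_t
ushr_pushed_into_add(uint32_t base, uint32_t delta, unsigned s)
{
   return ushr(base, s) + ushr(delta, s);
}

static uint32_t
ishr_pushed_into_add(uint32_t base, uint32_t delta, unsigned s)
{
   return ushr(base, s) + ishr(delta, s);
}
```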
This commit switches ir3 to the newly introduced offset_shift index for
SSBO intrinsics. This allows the shift to be extracted when lowering
derefs in nir_lower_explicit_io. In some cases, we might still have to add an
extra shift to make sure the offset uses the correct units. It turns out
that this is very rare and using offset_shift greatly improves the
shader stats:
Totals from 33267 (20.20% of 164705) affected shaders:
MaxWaves: 440368 -> 455258 (+3.38%); split: +3.40%, -0.01%
Instrs: 22974358 -> 21844188 (-4.92%); split: -4.98%, +0.06%
CodeSize: 45456418 -> 43099334 (-5.19%); split: -5.22%, +0.03%
NOPs: 4612549 -> 4524353 (-1.91%); split: -2.97%, +1.05%
MOVs: 802018 -> 817547 (+1.94%); split: -3.29%, +5.23%
COVs: 381987 -> 382061 (+0.02%); split: -0.03%, +0.05%
Full: 514078 -> 477339 (-7.15%); split: -7.18%, +0.04%
(ss): 544419 -> 502332 (-7.73%); split: -9.12%, +1.39%
(sy): 292099 -> 304697 (+4.31%); split: -3.19%, +7.50%
(ss)-stall: 2106134 -> 2104011 (-0.10%); split: -1.82%, +1.71%
(sy)-stall: 9704720 -> 10324864 (+6.39%); split: -4.64%, +11.03%
STPs: 11301 -> 10074 (-10.86%)
LDPs: 18654 -> 17202 (-7.78%)
Preamble Instrs: 4652214 -> 4580289 (-1.55%); split: -1.59%, +0.04%
Early Preamble: 13977 -> 13978 (+0.01%)
Constlen: 1881764 -> 1881304 (-0.02%); split: -0.03%, +0.01%
Last helper: 5157587 -> 5074042 (-1.62%); split: -1.86%, +0.24%
Subgroup size: 2262976 -> 2263232 (+0.01%)
Cat0: 5065452 -> 4976324 (-1.76%); split: -2.73%, +0.97%
Cat1: 1241085 -> 1251974 (+0.88%); split: -2.52%, +3.40%
Cat2: 8462897 -> 7723367 (-8.74%); split: -8.74%, +0.01%
Cat3: 5738382 -> 5735312 (-0.05%); split: -0.06%, +0.00%
Cat5: 761945 -> 763017 (+0.14%); split: -0.00%, +0.14%
Cat6: 199819 -> 197766 (-1.03%); split: -1.34%, +0.31%
Cat7: 890192 -> 581842 (-34.64%); split: -35.20%, +0.57%
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35092>
Values are taken from minStorageBufferOffsetAlignment and
minUniformBufferOffsetAlignment.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35092>
Predicate registers can be written from the scalar ALU by using a
special cat2 encoding: if the dst is encoded as a0.c, the instruction
will execute on the scalar ALU and write to p0.c.
This commit follows the blob and disassembles scalar predicates as
up0.c. The "u" presumably stands for "uniform".
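A sketch of the disassembler side. It assumes ir3's usual register
numbering, where a register id is (num << 2) | component and a0 and p0 are
registers 61 and 62. The macro and function names are hypothetical, and
this is not the actual ir3 disassembler code:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_REG_A0 61 /* assumed, following ir3's numbering */
#define SKETCH_REG_P0 62

/* Format the dst of a cat2 instruction. A dst encoded as a0.c is the
 * special scalar-ALU predicate write, so print it as up0.c. */
static void
format_cat2_dst(unsigned regid, char *buf, size_t size)
{
   unsigned num = regid >> 2, comp = regid & 3;
   char c = "xyzw"[comp];

   if (num == SKETCH_REG_A0)
      snprintf(buf, size, "up0.%c", c);
   else if (num == SKETCH_REG_P0)
      snprintf(buf, size, "p0.%c", c);
   else
      snprintf(buf, size, "r%u.%c", num, c);
}
```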
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36614>
Predicate registers can be written from the scalar ALU by using a
special cat2 encoding: if the dst is encoded as a0.c, the instruction
will execute on the scalar ALU and write to p0.c.
This commit makes the ir3 backend aware of scalar predicates. A new
register flag (IR3_REG_UNIFORM) is added that can be used to mark
predicate dsts as being written by the scalar ALU. For such dsts, the
same synchronization rules apply as for shared registers written by the
scalar ALU (e.g., (ss) is needed to read them from the vector ALU).
Scalar predicates can be used in the early preamble, which makes control
flow available there.
In many ways, the backend treats IR3_REG_UNIFORM the same as
IR3_REG_SHARED. A new flag was added because IR3_REG_SHARED is mainly
used to denote a separate register file, not as a flag to indicate usage
by the scalar ALU. Scalar predicates still use the normal predicate
register file but allow it to be written from the scalar ALU.
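The shared synchronization rule can be sketched as a predicate over
destination flags. The flag values and function name are hypothetical, and
the sketch simplifies by treating every shared or uniform destination as
written by the scalar ALU. It only answers whether a vector-ALU read needs
(ss):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values; only the names follow the commit message. */
#define SKETCH_REG_SHARED  (1u << 0)
#define SKETCH_REG_UNIFORM (1u << 1)

/* Simplification: any shared or uniform dst is assumed to come from the
 * scalar ALU, so a vector-ALU read of it must wait with (ss). */
static bool
vector_read_needs_ss(unsigned producer_dst_flags)
{
   return (producer_dst_flags & (SKETCH_REG_SHARED | SKETCH_REG_UNIFORM)) != 0;
}
```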
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36614>