It doesn't change the reported values, but it will allow us to easily
advertise real hardware limits in the future.
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37135>
We can allocate more than 48k of shared memory, but the limits differ
across hardware, so we need to take it all into account to create the
shared memory splits the hardware can accept.
This does change behavior on Turing, but the assumption is, that the
hardware has simply rounded up. Might need performance testing on Turing
to verify nothing regresses here.
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37135>
It's a bit of a disaster, but each generation supports a different set of
shared memory configurations.
Knowing the maximum is important for compute shader performance, knowing
all the legal sizes for QMD generation.
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37135>
On hw we have up to 228k of available Shared memory so a 16 bit int isn't
enough for that.
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37135>
Now that we're guaranteed the upper 32 bits are zero initialized,
there's no reason we need to do a 64-bit write here.
This is a 0.3% performance improvement on the Sascha Willems
conditionalrender demo with all rendering disabled (638 fps -> 640 fps)
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37187>
We can initialize this just once from the CPU side instead of
overwriting it each time using the copy engine.
This is a 5% performance improvement on the Sascha Willems
conditionalrender demo with all rendering disabled (607 fps -> 638 fps)
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37187>
Within a single command buffer, we know that our operations will happen
sequentially so we don't need to allocate a unique address per
vkCmdBeginConditionalRenderingEXT - we can re-use the same address
instead.
Improves perf on the Sascha Willems conditionalrender demo with all
rendering disabled by about 2% (595 fps -> 607 fps)
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37187>
This is a 41% performance improvement on the Sascha Willems
conditionalrender demo with all rendering disabled (422 fps -> 595 fps)
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37187>
The tricky thing here is that we have to emulate the 64k "standard"
tile sizes in terms of the native 4k macrotiles. We do this by
manipulating which 4k pages get mapped, dividing the 64k tile into 4k
macrotiles and mapping each tile in such a way that, when viewed in
terms of the final swizzled image coordinates, the 4k tiles linearly
tile the image region that's supposed to be mapped to the 64k "tile".
Supporting the standard block sizes allows emulation layers to claim D3D
Tiled Resources Tier 2, which is required for the 12.0 feature level.
It's also required for ARB_sparse_texture2.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32671>
Compute the Vulkan "sparse miptail," add support for padding the array
stride in order to make sure that the sparse miptail is large enough as
mandated by the Vulkan spec, and add a function to compute the standard
sparse block size.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32671>
For sparse, we will need to handle bank swizzling and macrotiles when
mapping sparse textures. However the functions for handling this were
leaking internal tiled_memcpy implementation details, like the concept
of a 256-byte "block" that doesn't really exist in the tiling (instead
everyone else deals with UBWC blocks, which may be 256 bytes or smaller,
and 4K macrotiles). Rewrite them to work in terms of macrotiles, and
take an fdl_layout.
In order to avoid having to pass an fdl_layout everywhere, pass around
the computed bank_mask and bank_swizzle everywhere. This also means that
we don't recompute several times.
Finally, expose a function to compute the macrotile size, which will
also be needed to work with bank swizzling.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32671>
it's otherwise possible for this to race and hit the draw before
precompile finishes without ever waiting on the fence
I guess this just worked coincidentally before?
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37197>
This was added on kernel side in commit 13ed0a1af263 ("drm/msm: Fix a7xx
debugbus read"), but mesa copy of the registers was updated from an
earlier revision of that patch which did not have A7XX_CX_DBGC.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37192>
This is supported and must be enabled when
descriptorBinding*UpdateAfterBind is active.
Fixes the following VVL error:
Validation Error: [ VUID-VkDeviceCreateInfo-robustBufferAccess-10247 ]
vkCreateDevice(): robustBufferAccessUpdateAfterBind is false, but both
robustBufferAccess and a descriptorBinding*UpdateAfterBind feature are
enabled.
Fixes: d9fcf5de55 ("turnip: Enable nonuniform descriptor indexing")
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36787>
MEC probably doesn't support EVENT_WRITE_EOP.
Both PAL and RadeonSI use RELEASE_MEM.
RADV used RELEASE_MEM too but "is_gfx8_mec" was very misleading.
This commit just cleans that up. No functional changes.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37121>
EOS events are buggy on GFX7 and can cause hangs when used
together in the same IB with CP DMA packets that use L2.
While we don't use the L2 for CP DMA copies, we still use it
with CP DMA prefetches, so the issue needs to be mitigated.
As a mitigation, avoid using EVENT_WRITE_EOS and prefer to use
the BOTTOM_OF_PIPE event instead of PS_DONE/CS_DONE, which should
be close enough.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37121>
GFX6 actually supports IB2, but doesn't support chaining between
chunks inside the IB2. See WaCpIb2ChainingUnsupported in PAL.
Disable IB2 on GFX6 for now.
The proper fix will be to disable use_ib in just secondary
command buffers on GFX6 and emit multiple IB2 packets in the
main command buffer. This will be implemented later.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37121>
After a refactor last year, the noibs option stopped working
because it hits an assertion when empty IBs are submitted.
Emit a single large NOP packet to avoid submitting empty IBs.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37121>