The HW only supports converting BRW_TYPE_BF values to/from BRW_TYPE_F,
so intermediate conversion is needed. Move the intermediate conversion
to the implementation of `@convert_cmat_intel` and simplify the
brw_nir_lower_cooperative_matrix pass. This has two positive effects:
- Fixes conversion between BF and integer type cooperative matrices,
which was still using the old emit_alu1 approach instead of the new
code for `@convert_cmat_intel`.
- Guarantees the intermediate conversion will result in a valid layout
for conversions involving USE_B matrices. If we instead used the
intrinsic twice in brw_nir_lower_cooperative_matrix.c, a matrix with
invalid layout would be visible at NIR level and we wouldn't be able
to keep the current assertion for USE_B case.
Due to the configurations we have exposed, we still don't need to
write a more complex USE_B conversion -- they are all between same
size types (and, consequently, packing factors), so no shuffling of
data is needed to respect the USE_B layout.
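As a rough illustration of the two-step conversion (plain C, not the brw
implementation), the sketch below converts between integer and BF by way of
float. The helper names are hypothetical, and round-to-nearest-even without
NaN handling is an assumption made only for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: narrow a 32-bit float to bfloat16 by keeping the top
 * 16 bits, rounding to nearest even. NaN handling is omitted for brevity. */
static uint16_t float_to_bf16(float f)
{
   uint32_t bits;
   assert(sizeof(float) == sizeof(uint32_t));
   memcpy(&bits, &f, sizeof(bits));
   bits += 0x7fffu + ((bits >> 16) & 1u);
   return (uint16_t)(bits >> 16);
}

/* Hypothetical helper: widen bfloat16 to float by appending 16 zero bits. */
static float bf16_to_float(uint16_t h)
{
   uint32_t bits = (uint32_t)h << 16;
   float f;
   memcpy(&f, &bits, sizeof(f));
   return f;
}

/* Integer <-> BF has no direct path, so float is the intermediate type. */
static uint16_t int_to_bf16(int32_t i)
{
   return float_to_bf16((float)i);
}

static int32_t bf16_to_int(uint16_t h)
{
   return (int32_t)bf16_to_float(h);
}
```

Values that need more than 8 significant bits lose precision in the BF
step, e.g. 257 round-trips to 256 in this sketch.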
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36185>
We were missing a couple bits from hash and a bunch of stuff from the
comparison. This puts most of nir_tex_instr into a single pack_tex
helper that's used by both and grabs everything we were missing.
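The idea of sharing one packing helper between hash and comparison can be
sketched as follows. The struct is a hypothetical, much-reduced stand-in for
nir_tex_instr, and the field layout and FNV-1a hash are assumptions for
illustration, not the actual nir_instr_set code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, reduced stand-in for a texture instruction. */
struct tex_instr {
   unsigned op;
   unsigned sampler_dim;
   bool is_array;
   bool is_shadow;
   unsigned texture_index;
};

/* Fixed-size key made only of uint32_t words, so it has no padding. */
struct tex_key {
   uint32_t words[2];
};

/* Single place that decides which fields matter for both hash and equality. */
static struct tex_key pack_tex(const struct tex_instr *t)
{
   struct tex_key k;
   assert(t->op < 256 && t->sampler_dim < 256);
   k.words[0] = t->op | (t->sampler_dim << 8) |
                ((uint32_t)t->is_array << 16) |
                ((uint32_t)t->is_shadow << 17);
   k.words[1] = t->texture_index;
   return k;
}

static uint32_t hash_tex(const struct tex_instr *t)
{
   struct tex_key k = pack_tex(t);
   uint32_t h = 2166136261u; /* FNV-1a style mix over the packed words */
   for (unsigned i = 0; i < 2; i++) {
      h ^= k.words[i];
      h *= 16777619u;
   }
   return h;
}

static bool tex_equal(const struct tex_instr *a, const struct tex_instr *b)
{
   struct tex_key ka = pack_tex(a), kb = pack_tex(b);
   return memcmp(&ka, &kb, sizeof(ka)) == 0;
}
```

Because both functions go through pack_tex, a field added there is picked up
by hashing and comparison at the same time.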
Cc: mesa-stable
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36234>
Logical sends and load_payload can have large VGRFs that cannot be
split. Once all of the lowering passes and optimization passes that
might eliminate any of those instructions have completed, try to split
larger VGRFs one last time.
Register allocation can only handle VGRFs up to a certain size, so this
is the last opportunity to prevent later failures due to VGRFs that are
too large.
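A simplified way to picture the problem: a VGRF can only be split at
boundaries that no instruction spans, so the largest remaining piece is what
register allocation has to cope with. The function below is a hypothetical
sketch of that bookkeeping, not the brw splitting pass; split_ok is an
assumed input marking the boundaries where a split is allowed.

```c
#include <assert.h>
#include <stdbool.h>

/* split_ok[i] is true when the VGRF may be split between register i - 1 and
 * register i (entry 0 is ignored). Returns the size, in registers, of the
 * largest piece left after splitting at every allowed boundary. */
static unsigned largest_piece(const bool *split_ok, unsigned size)
{
   unsigned largest = 0;
   unsigned current = 1;

   assert(size > 0);

   for (unsigned i = 1; i < size; i++) {
      if (split_ok[i]) {
         if (current > largest)
            largest = current;
         current = 1;
      } else {
         current++;
      }
   }

   return current > largest ? current : largest;
}
```

Once the instructions that pin a boundary are gone, more entries of split_ok
become true and the largest piece shrinks.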
Closes: #13239
shader-db:
Lunar Lake, Meteor Lake, DG2, and Tiger Lake had similar results. (Lunar Lake shown)
total instructions in shared programs: 17114494 -> 17114496 (<.01%)
instructions in affected programs: 2790 -> 2792 (0.07%)
helped: 2 / HURT: 4
total cycles in shared programs: 886617364 -> 886315282 (-0.03%)
cycles in affected programs: 4067540 -> 3765458 (-7.43%)
helped: 48 / HURT: 9
Ice Lake and Skylake had similar results. (Ice Lake shown)
total instructions in shared programs: 20799801 -> 20799691 (<.01%)
instructions in affected programs: 1210 -> 1100 (-9.09%)
helped: 1 / HURT: 0
total cycles in shared programs: 865495386 -> 865498990 (<.01%)
cycles in affected programs: 60132 -> 63736 (5.99%)
helped: 2 / HURT: 1
total spills in shared programs: 3987 -> 3981 (-0.15%)
spills in affected programs: 24 -> 18 (-25.00%)
helped: 1 / HURT: 0
total fills in shared programs: 3535 -> 3519 (-0.45%)
fills in affected programs: 36 -> 20 (-44.44%)
helped: 1 / HURT: 0
fossil-db:
All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 208647246 -> 208646499 (-0.00%); split: -0.00%, +0.00%
Cycle count: 31257819536 -> 31263957016 (+0.02%); split: -0.02%, +0.04%
Max live registers: 66160877 -> 66155728 (-0.01%)
Totals from 34703 (4.91% of 707053) affected shaders:
Instrs: 13766639 -> 13765892 (-0.01%); split: -0.02%, +0.01%
Cycle count: 3693572086 -> 3699709566 (+0.17%); split: -0.15%, +0.32%
Max live registers: 4843852 -> 4838703 (-0.11%)
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36202>
In debug builds, the assertion should be preferred as it will highlight
the actual problem. In non-debug builds, it is possible to fail register
allocation more gracefully. If the problem only occurs in, for example,
a SIMD32 version of a shader, the application may even continue to
function.
Closes: #13239
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36202>
... and defer getDeviceQueue impl to vk_common and trim down impls in
gfxstream.
gfxstream advertises, and selects from, the queues/queueFamilies that the
real device on the host advertises. During createDevice(), it needs to
allocate the queue objects to support this.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36227>
... And rename LinuxVirtGpu* -> DrmVirtGpu*
The characteristic of this virtgpu implementation is that it works
through the DRI from Linux. Yes, this is traditionally "Linux" specific,
but some platforms, such as QNX, have started to incorporate parts of the
"DRM framework" on a platform that otherwise still is not "Linux". This
is just a more generally applicable name for this implementation.
Reviewed-by: Gurchetan Singh <gurchetansingh@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36229>
VkDeviceMemory is always 64-bit, and %p on 32-bit is, well, 32-bit,
breaking the build.
There doesn't seem to be a good way to printf a Vulkan handle
cross-platform-ly, and it's unlikely to actually be useful,
so just don't print it at all.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36231>
The var_nodes size is 4x the nir def count, since we need to track a node
for each individual channel of a register write. We don't need that for
SSA, but we used non-shifted indices for SSA, which made the compiler
rely on reg nir def indices starting after all the SSA indices.
That has changed with 7b70b419b528 ("nir: always index SSA defs before
printing").
Fix that by shifting the SSA index as well, so that we no longer rely on
any assumptions about nir def indices.
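A minimal sketch of the indexing scheme, assuming four nodes per def; the
function names are hypothetical and only illustrate why the shift removes
the dependency on def ordering.

```c
#include <assert.h>

/* Node slot for one channel of a register def: four slots per def. */
static unsigned reg_node_index(unsigned def_index, unsigned channel)
{
   assert(channel < 4);
   return (def_index << 2) + channel;
}

/* Node slot for an SSA def, shifted the same way so it can never land
 * inside the four slots owned by a different def. */
static unsigned ssa_node_index(unsigned def_index)
{
   return def_index << 2;
}
```

With an unshifted SSA index, SSA def 1 would map to slot 1, which is also
channel 1 of register def 0. With the shift it maps to slot 4, past all four
slots of def 0.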
Backport-to: 25.2
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36206>
In VCN5, the AV1 context buffer is bigger than in VCN4. This fixes
an AV1 decoding issue on VCN5.
Cc: mesa-stable
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36208>
Per spec VUID-VkMemoryAllocateInfo-pNext-01874:
If the parameters do not define an import operation, and the pNext chain
includes a VkExportMemoryAllocateInfo structure with
VK_EXTERNAL_MEMORY_HANDLE_TYPE_ANDROID_HARDWARE_BUFFER_BIT_ANDROID
included in its handleTypes member, and the pNext chain includes a
VkMemoryDedicatedAllocateInfo structure with image not equal to
VK_NULL_HANDLE, then allocationSize must be 0
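The condition in the quoted VUID can be restated as a small predicate. The
struct below is a hypothetical flattening of the relevant pNext information,
not a Vulkan or lavapipe type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of a VkMemoryAllocateInfo and its pNext chain. */
struct alloc_request {
   bool is_import;           /* parameters define an import operation */
   bool exports_ahb;         /* export handleTypes include the AHB bit */
   bool has_dedicated_image; /* dedicated allocate info with a non-null image */
   uint64_t allocation_size;
};

/* True when the request does not violate the rule quoted above. Requests
 * the rule does not apply to are reported as satisfying it. */
static bool vuid_01874_satisfied(const struct alloc_request *r)
{
   assert(r);
   if (!r->is_import && r->exports_ahb && r->has_dedicated_image)
      return r->allocation_size == 0;
   return true;
}
```

A driver therefore has to accept a zero allocationSize in this case and
derive the real size from the dedicated image.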
- before: total 116, skip 66, pass 36, fail 14
- after: total 116, skip 66, pass 50, fail 0
Fixes: cebb2bf266 ("lavapipe: Add AHB extension")
Reviewed-by: Lucas Fryzek <lfryzek@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36204>
The implementation only takes ownership of the fd after a successful import.
On import failure, the caller is going to handle the fd. Meanwhile,
add a missing error code on an error path.
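The ownership rule can be sketched like this. The types and the
simulate_failure parameter are hypothetical and exist only to exercise both
paths; this is not the lavapipe import code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical memory object that may own an imported fd (-1 means none). */
struct memory {
   int fd;
};

/* Returns true on success. The fd is stored, and therefore owned by the
 * memory object, only on success. On failure mem is left untouched and the
 * caller remains responsible for the fd. */
static bool import_fd(struct memory *mem, int fd, bool simulate_failure)
{
   assert(fd >= 0);

   if (simulate_failure)
      return false; /* do not close or keep the fd here */

   mem->fd = fd;
   return true;
}
```

On the failure path the function neither closes nor stores the fd, so the
caller can still close it exactly once.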
Fixes: 895d3399f7 ("lavapipe: add support for KHR_external_memory_fd")
Reviewed-by: Lucas Fryzek <lfryzek@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36204>
lvp doesn't use the common device memory object, and it allocates and
imports AHBs on its own. Thus it has to implement the AHB export API itself.
- before: total 116, skip 66, pass 24, fail 26
- after: total 116, skip 66, pass 36, fail 14
Fixes: cebb2bf266 ("lavapipe: Add AHB extension")
Reviewed-by: Lucas Fryzek <lfryzek@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36204>
Currently tile swizzle can only be non-zero for single plane
formats; for multi plane formats we always set PIPE_BIND_SHARED.
Luma only (Y400) JPG decode and encode with RGB input surface (EFC)
are the only two cases where we can get a surface with tile swizzle,
and ignoring it would result in corrupted output.
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13346
Acked-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35647>
There is no need to have our own copy.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Karmjit Mahil <karmjit.mahil@igalia.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36218>
While the API spec does describe which flags _may_ be passed in, the
overall CL working group agreement is that implementations should expect
random flags to be passed in as other implementations _may_ use them to
further restrict or allow image formats.
Also fix validation for importing GL objects while at it.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36216>
This unblocks the main worker thread to keep submitting work to the driver
while we still have something waiting on the completion of batches sent to
the hardware and to signal completion to the attached events.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36158>
The runtime vk_android.h header has proper Android detection inside, so
there is no need to wrap it with redundant Android detection. Meanwhile,
the enum VK_EXTERNAL_MEMORY_HANDLE_TYPE_ANDROID_HARDWARE_BUFFER_BIT_ANDROID
is defined in vulkan_core.h, so it needs no Android wrapping either.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36151>
Otherwise we can have a case where the binning VS uses more consts than
the full VS (when the safe variant is used for the full VS), which will result in
a rendering issue because SP_VS_CONST_CONFIG.CONSTLEN is shared between
full and binning VS in PROGRAM_CONFIG state and gets the value from the
full VS.
There are two alternative solutions that can allow binning VS to always
use maximum constlen:
- Move constlen emission to per-XS config. This interferes with the
PROGRAM_CONFIG state which uploads consts and does SP_UPDATE_CNTL.
Consts would need to be uploaded after constlen is defined, while
SP_UPDATE_CNTL must be done before per-XS state is emitted.
Also having SP_UPDATE_CNTL in a draw state that is always DIRTY
isn't great.
Something didn't work out on A6XX, so this idea was dropped.
- Emit constlen again in VS_BINNING draw state. This seems to work
but is also likely undefined behaviour since constlen is changed
after some consts are uploaded.
Cc: mesa-stable
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36203>