Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34623>
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34623>
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34623>
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34623>
The hw sm query code declared some input space, but wasn't actually using
it, so this is all dead code since clover got removed.
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34623>
Not entirely sure if it's actually required, but this makes it consistence
with r600_resource_create also calling r600_compute_global_buffer_create
for global memory buffers.
Cc: mesa-stable
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Patrick Lerda <patrick9876@free.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34623>
intel_decoder_init() initializes intel_batch_decode_ctx so later
we can call decode functions but it depends on data stored in
brw/elk_isa_info but that was being allocated in stack
of intel_decoder_init() then when the decode functions were executed
it was accessing garbage at the brw/elk_isa_info memory.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: ec2d20a70d ("intel/tools: Add helpers for decoder_init/disasm")
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34776>
Or current render target cache setting is to key on the binding table
index, meaning the HW associates a number in the range [0, 7] to a
RENDER_SURFACE_STATE description. If you want change the render target
0 between 2 draw calls, you need to insert a PIPE_CONTROL in between
the 2 draw calls with pb-stall + rt-flush in order to flush an writes
to a previous RENDER_SURFACE_STATE that has now becomed disassociated
with the [0, 7] number.
This PIPE_CONTROL taking care of the flush is dealt with in
cmd_buffer_maybe_flush_rt_writes(). This function diffs the current
BTI setup for render targets (first 0 to 7 BTIs) with what the next
fragment shader wants.
The issue here is we might have a render pass with 0 color attachments
and yet in 98cdb9349a we added one pointing to the render target 0,
but in the emit_binding_table() when we finally program the BTI, we
check the render pass color count and program a null surface state
instead of an actual surface state. And this leads to hangs because
the render target cache will end up with inconsistent state data.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 98cdb9349a ("anv: ensure null-rt bit in compiler isn't used when there is ds attachment")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12955
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34603>
I'm not sure how this was supposed to ensure padding was zero, and it
doesn't seem to work for me (GCC 15.0.1).
Fixes a NIR validation failure with dEQP-VK.glsl.bfloat16.constant.compute
and RADV.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: 90e1b12890 ("spirv: Add bfloat16 support to SpecConstantOp")
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34769>
smem_offset_max is UINT32_MAX on GFX7 and setting
max_const_offset_plus_one to 0 causes divisions by zero later.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: c26851b80b ("aco: increase max_const_offset_plus_one for SMEM load_global")
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Tested-by: Michel Dänzer <mdaenzer@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34766>
ANV_BO_ALLOC_MAPPED are internal allocated bos that need mmap() but as
internally we don't do any cflush() we need to make sure those are also
ANV_BO_ALLOC_HOST_COHERENT.
Checking for ANV_BO_ALLOC_HOST_CACHED could lead a cached+uncoherent
bo being allocated internally with ANV_BO_ALLOC_MAPPED.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34778>
It was failing in the first assert of anv_device_alloc_bo() because
it has ANV_BO_ALLOC_MAPPED but it don't have ANV_BO_ALLOC_HOST_COHERENT or
ANV_BO_ALLOC_HOST_CACHED(this second one is wrong and fixed in the next
patch).
LMEM is always write-combine, even SMEM on discrete GPU is always
write-back + coherent because the PCI bus protocol snooping at CPU
caches and that behavior can't be disabled.
So we can add this coherent flag without any side effects.
The ANV_BO_ALLOC_MAPPED is needed for ANV_BO_SLAB_HEAP_LMEM_SMEM
because to trigger SMEM+LMEM in anv_device_alloc_bo() we need
ANV_BO_ALLOC_MAPPED or ANV_BO_ALLOC_LOCAL_MEM_CPU_VISIBLE but the
second one is mostly used with small PCI bar discrete GPUs.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: dabb012423 ("anv: Implement anv_slab_bo and enable memory pool")
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34778>
On a7xx, the max constlen for compute is increased to 512 vec4s or 8KB,
however the size of the LB was not increased beyond 40KB. A quick
calculation shows that 8KB of consts multiplied by 2 banks plus the
API maximum of 32KB shared memory would exceed 40KB. This means that
we can't always use a constlen of 512, and sometimes have to fall back
to 256 when a lot of shared memory is in use.
In the future, we can use similar calculations to figure out how much
"extra" shared memory is available for the backend to spill to, but we
currently don't support spilling to shared memory.
Fixes: 5879eaac18 ("ir3: Increase compute const size on a7xx")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34746>
This change relocates the fdot lowering from the generic NIR to the lima,
since lima is the only consumer of this particular lowering. This avoids
potential conflicts with the similar fdot lowering already present in
nir_lower_alu_width.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34757>
Since RMV 1.9 pcie_family_id is checked to verify whether a
capture is supported.
Signed-off-by: Karmjit Mahil <karmjit.mahil@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34763>
This allows building tools for cross-compiling without building gallium
or vulkan drivers unnecessarily.
Backport-to: 25.1
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Janne Grunau <j@jannau.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34693>
This allows building tools for cross-compiling without building gallium
or vulkan drivers unnecessarily.
Backport-to: 25.1
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Signed-off-by: Janne Grunau <j@jannau.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34693>
Round up allocations to nearest 2MB interval if this increases
the allocation by no more than 1.33x. This reduces page count but
at the cost of extra memory consumption. Optimization only applied
to MTL(Xe KMD only)/LNL platforms, which are particularly impacted by
page misses.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Felix DeGrood <felix.j.degrood@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33558>
BOs larger than 1MB don't go memory pool due the size but applications
tend to use a lot of VkMemory with size larger than 1MB so to reduce
the number of pages and improve performance here I'm aligning the size
of BOs larger than 1MB to 64kb, this allows 64kb pages to be used at
least on Xe KMD.
This bring substantial perfomance benefit in exchange of a small
memory waste.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33558>
MTL and newer integrated platforms has a performance gain when using
transparent huge pages, because of the fixed address requirement
we can't use slab for this case but we can change the initial pool
size to 2MB so all allocations get the transparent huge page
optimization.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33558>
I can't think in any case where that would be false, so lets drop it.
While at it, also making some variables const.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33558>
Because of the ANV_BO_ALLOC_CAPTURE flag, batch buffers were not
allowed to use memory pool.
So to workaround that here adding a new anv_bo_slab_heap heap for
cached+coherent+capture buffers with the main goal to get batch
buffer to memory pool but other buffers will as well.
For now that will only work in Xe KMD as i915 requires more changes
to support it.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33558>
The whole purpose of anv_bo_pool is to reduce the number of
gem_create/destroy calls in command buffers that is something with
a short life span.
But slab_bo/memory pool does the same with even other benefits like
doing 2MB allocations to enable THP.
So here skipping the meat of anv_bo_pool_free() to directly return
the bo to slab_bo. This change is also necessary because the way
anv_bo_pool stores freed buffers it requires that all bos has a unique
gem handle, what not true of buffer allocated by anv_slab.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Suggested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33558>
This flag was not supported in anv_slab_bo because it is set together
with ANV_BO_ALLOC_CAPTURE and more important it has a specific VMA
range.
We can support it by adding a custom heap and allocating all bos in
the heap with all necessary flags, but because application can also
allocate those with vkAllocateMemory() here the ANV_BO_ALLOC_CAPTURE
is appended to the vkAllocateMemory() path for integrated gpu and
anv_slab_bo check if all the alloc_flags matches, because application
could choose to allocate it in a cached but not coherent memory type
for example.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33558>
This flag was not supported in anv_slab_bo because it is set together
with ANV_BO_ALLOC_CAPTURE and more important it has a specific VMA
range.
But we can easily support it by adding a custom heap with it and
allocating all bos in the heap with all necessary flags.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33558>
This changes allow us to support memory pool of bos with
ANV_BO_ALLOC_AUX_CCS set.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33558>
This is implementing the functions in anv_slab_bo and actually
enabling memory pool.
This is heavily based on Iris memory pool implementation, the main
difference is that the concept of heaps only exist in anv_slab_bo, we
have function that takes the anv_bo_alloc_flags and decides what heap
to place that bo.
Some anv_bo_alloc_flags blocks memory pool, we can relax and remove
some flags from this denied list later.
This feature can be disabled in runtime by setting
ANV_DISABLE_SLAB=true, this can help us to easily check if bugs
are due to this feature or not.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33558>
Allocating larger buffers allows KMD/HW to enable optimizations
that makes access to memory faster, also because of minimum alignment
required in some cases we allocate 4k or 64k long buffers for
usages that only needs a few bytes, wasting a lot of memory.
Memory pool takes care of both of those things and here I'm
adding the base infrastruture to implement this feature.
The next patch will implement the functions in anv_slab_bo.c, spliting
it in two to make review easier.
The idea here is take the same approach as Iris and use pb_slab.h.
In 99% of the places it will be transparent that anv_bo is actually
a slab of a larger and real anv_bo, the remaning 1% of the places are
handled here.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33558>