Turing and newer Nvidia cards can work with up to 8 discard rectangles
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Signed-off-by: Lorenzo Rossi <git@rossilorenzo.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33476>
This can happen with (for example) 32x2 loads with
align_mul=4,align_offset=2.
This patch does bit_size=min(bit_size,bytes) to prevent num_components
from being 0.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: 52cd5f7e69 ("ac/nir_lower_mem_access_bit_sizes: Split unsupported shared memory instructions")
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37953>
Not sure about creating u64vec16 loads, but creating unaligned loads is
possible with opt_if_rewrite_uniform_uses.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37953>
The actual chances of this happening seem dubious, but the cleaned up
code seems nice. printf returns a value >= 0 on success, which is the
number of characters it writes a return < 0 means that an error
occurred, and then errno is set. Which negative value doesn't seem to be
specified, but it also seems unlikely that any implementation would
return `-MAX_INT`...
Anyway, this is fixed by converting the generic `print_repeated` to a
`print_separator` that avoids the need to do arithmetic at all by just
stopping the loop at 1 instead of 0, and then printing a newline.
CID: 1666497
CID: 1666256
CID: 1666531
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37746>
Coverity is pointing out that we should check this, and in reality if
this isn't what we expect the rest of the test is probably invalid
anyway.
CID: 1666504
CID: 1666544
CID: 1666552
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37750>
This was missed in the original maintenance9 MR.
Fixes the flakes in test
dEQP-VK.synchronization2.op.single_queue.event.write_ssbo_compute_read_ssbo_compute.buffer_16384_maintenance9
Fixes: 7692d3c0 ("nvk: Advertise VK_KHR_maintenance9")
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37964>
With untyped pointer we might get a deref_cast with a 0 ptr_stride. But we
were supposed to ignore the stride information on the pointer anyway, so
let's do that properly now.
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Fixes: 05dca16143 ("nak: extract nir_intrinsic_cmat_load lowering into a function")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37941>
The normal encode pass writes batches to a section in build scratch
memory. Those batches contain information about the internal node and
the primitive nodes. The encoder is split to avoid the register
pressure of the compressor and maximize occupancy.
The compressor works in two passes because one pass can not guarantee
that every primitive node (except) has at least two triangles. This
guarantee is used to advertise a smaller acceleration structure size to
the application.
During compression, every invocation processes at most two triangles.
Groups of 8 invocations are used to support the maximum triangle count
of 16 that the hardware supports.
The first step of compression is loading the triangle(s). Shared
vertices are deduplicated early to avoid doing it in the compression
loop. The compression loop tries to add triangles to a list of triangles
until the computed node size needed for storing the triangles reaches
the hardware node size. For this, each invocation first deduplicates
vertices with the triangles that have already been picked. It then
computes the node size of the picked triangles plus the candidate
triangles of the current invocation. The invocation that computed the
smallest size is added to the list.
Because it may not be possible to fit every triangle into the same node,
there can be multiple hardware nodes which are written in parallel for
optimal performance. If there are no nodes with only one triangle, all
nodes are written. If there is, compression of the batch is aborted and
the index of the batch is written to build scratch memory. The second
compression pass will repeat the steps above but only for those aborted
batches. The nodes with only one triangle can and are now merged.
It can not be determined during box node encode which triangles will be
compressed together so the encoder also has to fix up the parent box
node's child infos.
Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36965>
Upon acquiring an external image from external/foreign queue family,
skip AFBC metadata invalidation if the app has explicitly requested
acquireUnmodifiedMemory. This also applies to CRC which may or may not
get hooked up later.
Reviewed-by: John Anthony <john.anthony@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37972>