Define a more general-purpose version of `poly_unroll_restart`
named `poly_unroll_geometry`, which allows unrolling without an
input index buffer by separating the input and output index sizes.
This allows it to be used for additional use cases, such as
unrolling triangle fans or changing index types, where the draw
may not necessarily be indexed or the input and output index types
are not the same.
`poly_unroll_restart` remains as an alias with the same declaration
as before.
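
As an illustration of unrolling without an input index buffer, here is a minimal
sketch that expands a triangle fan into a triangle list. The helper name and
signature are hypothetical and are not the `poly_unroll_geometry` declaration;
the input index type (16-bit) and output index type (32-bit) are chosen only to
show that they may differ, and provoking-vertex ordering is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: expands a triangle fan of `vertex_count` vertices
 * into a triangle list written to `out`. `in` may be NULL for a
 * non-indexed draw, in which case the vertex number itself is emitted.
 * Returns the number of indices written. */
static size_t
sketch_unroll_triangle_fan(const uint16_t *in, uint32_t vertex_count,
                           uint32_t *out)
{
   size_t n = 0;

   assert(out != NULL);
   if (vertex_count < 3)
      return 0;

   for (uint32_t i = 1; i + 1 < vertex_count; i++) {
      /* Every fan triangle shares vertex 0. */
      const uint32_t tri[3] = {0, i, i + 1};
      for (unsigned j = 0; j < 3; j++)
         out[n++] = in ? in[tri[j]] : tri[j];
   }
   return n;
}
```

A fan of N vertices produces 3 * (N - 2) output indices, so the caller sizes
the output buffer from the vertex count rather than from an input index buffer.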
Reviewed-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41568>
Until now the BLT engine only handled same-format blits. Teach it to
convert between the UNORM formats it can represent, so format-converting
blits no longer fall back to the 3D blitter.
The supported formats are A8R8G8B8, X8R8G8B8, A4R4G4B4, A1R5G5B5,
R5G6B5, R8G8, R8 and A2R10G10B10.
The BLT format names are BGRA-based, matching the PE-internal byte order,
so an identity swizzle is correct for all of these except A2R10G10B10.
The PE keeps that format in RGBA order, so pipe R lands in the BLT B
position and vice versa. blt_conversion_needs_channel_swap() captures this
and find_blt_conversion() derives the per-image swizzle.
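
The swap rule described above can be sketched roughly as follows. The enum,
the channel names and both helpers are illustrative stand-ins, not the
driver's definitions, and the real find_blt_conversion() may derive the
swizzle differently (for example per source/destination pair).

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative subset; the real BLT_FORMAT_* values live in the driver. */
enum sketch_blt_format {
   SKETCH_BLT_A8R8G8B8,
   SKETCH_BLT_R5G6B5,
   SKETCH_BLT_A2R10G10B10,
};

enum sketch_chan { SKETCH_R, SKETCH_G, SKETCH_B, SKETCH_A };

/* Only the 10-bit format is kept in RGBA order by the PE. */
static bool
sketch_needs_channel_swap(enum sketch_blt_format fmt)
{
   return fmt == SKETCH_BLT_A2R10G10B10;
}

/* Identity swizzle unless R and B have to trade places. */
static void
sketch_swizzle(enum sketch_blt_format fmt, enum sketch_chan swiz[4])
{
   const bool swap = sketch_needs_channel_swap(fmt);

   swiz[0] = swap ? SKETCH_B : SKETCH_R;
   swiz[1] = SKETCH_G;
   swiz[2] = swap ? SKETCH_R : SKETCH_B;
   swiz[3] = SKETCH_A;
}
```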
SRGB variants share the BLT format of their UNORM sibling. For example
R8G8B8A8_UNORM and R8G8B8A8_SRGB both map to BLT_FORMAT_A8R8G8B8, and
the sRGB handling is carried separately via img->srgb.
No regressions in dEQP-GLES3.functional.fbo.blit.conversion.*.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39879>
Add an srgb field to blt_imginfo and use it to conditionally set the
BLT_SRC_IMAGE_CONFIG_SRGB and BLT_DEST_IMAGE_CONFIG_SRGB bits in
the BLT config register setup.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39879>
Mesa core pre-seeds VS/TCS/TES/GS/FS in _mesa_init_constants(..) with
MAX_TEXTURE_IMAGE_UNITS. When a driver does not expose a stage, this
seed leaks into the GL_MAX_COMBINED_TEXTURE_IMAGE_UNITS sum. Drivers
that only expose VS+FS (like etnaviv) overcounted by 96 (three
unexposed stages at 32 units each). Zero the field for stages the
driver does not expose so the sum reflects only the stages the
driver advertises.
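
The arithmetic behind the overcount can be checked with a small sketch. The
names are illustrative, and it assumes the exposed stages keep the 32-unit
seed, which a real driver may override with its own limits.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_MAX_TEXTURE_IMAGE_UNITS 32

enum { SKETCH_VS, SKETCH_TCS, SKETCH_TES, SKETCH_GS, SKETCH_FS, SKETCH_STAGES };

/* Sums the per-stage texture image units. Every stage starts with the
 * core pre-seed; unexposed stages are zeroed only when asked to. */
static unsigned
sketch_combined_units(const bool exposed[SKETCH_STAGES], bool zero_unexposed)
{
   unsigned sum = 0;

   for (int s = 0; s < SKETCH_STAGES; s++) {
      unsigned units = SKETCH_MAX_TEXTURE_IMAGE_UNITS;
      if (zero_unexposed && !exposed[s])
         units = 0;
      sum += units;
   }
   return sum;
}
```

For a VS+FS-only driver the two sums differ by exactly the three unexposed
stages.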
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41746>
The TFU stride-0 fill path allocates a 64 KiB staging BO
(V3D_TFU_MAX_DIM * cpp = 16384 * 4), maps it, fills it with the
pattern, and caches it on the command buffer. For non-zero patterns
the per-cmd-buffer cache works well, but WebGPU/Dawn workloads
issue many zero-fills (lazy buffer init) across separate command
buffers, so the cache misses almost every time and each fill pays
for a fresh alloc + mmap + memcpy.
Add a device-wide staging BO held in v3dv_device::meta.tfu_fill_zero,
lazily allocated under meta.mtx and used whenever data == 0. The BO
is read-only after init so it can be shared across queues without
extra synchronization, and it is freed in destroy_device_meta.
Measured on a Dawn/WebGPU zero-fill-heavy workload (RPi5, ~60
meta_fill_buffer calls, ~218 MiB total, all zero-fills):
  before: TFU branch total 7.328 ms, avg 115.55 us/call
  after:  TFU branch total 0.296 ms, avg   4.78 us/call (~24x)
Non-zero patterns continue to use the per-cmd-buffer cache.
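
The lazy, device-wide allocation can be pictured with the following sketch.
It is not the driver code: the struct is a stand-in for v3dv_device::meta, a
heap allocation replaces the BO, and a pthread mutex replaces whatever lock
type meta.mtx really is.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define SKETCH_ZERO_BO_SIZE (16384 * 4) /* V3D_TFU_MAX_DIM * cpp = 64 KiB */

struct sketch_device_meta {
   pthread_mutex_t mtx;
   void *tfu_fill_zero; /* stand-in for the shared staging BO */
};

/* Returns the shared zero-filled staging buffer, allocating it on first
 * use. The buffer is never written after creation, so callers can share
 * it without further locking. */
static void *
sketch_get_zero_bo(struct sketch_device_meta *meta)
{
   pthread_mutex_lock(&meta->mtx);
   if (!meta->tfu_fill_zero)
      meta->tfu_fill_zero = calloc(1, SKETCH_ZERO_BO_SIZE);
   void *bo = meta->tfu_fill_zero;
   pthread_mutex_unlock(&meta->mtx);
   return bo;
}

/* Mirrors the release at device teardown. */
static void
sketch_destroy_meta(struct sketch_device_meta *meta)
{
   free(meta->tfu_fill_zero);
   meta->tfu_fill_zero = NULL;
}
```

Only the first zero-fill pays for the allocation; every later call, from any
command buffer, returns the same pointer.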
Assisted-by: Claude Opus 4.7
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41725>
Base the eligibility check on imageExtent versus the slice dimensions
rather than on the buffer addressing dimensions. The TFU codepath
here always writes/reads the full slice from its origin, so the
required invariant is 'imageExtent == slice'; bufferRowLength and
bufferImageHeight may be larger than imageExtent (the spec permits
this for non-zero values), in which case the TFU reads/writes at the
buffer's row/layer stride but only touches slice->width pixels per
row and slice->height rows per layer, leaving the trailing padding
untouched.
The previous combined check (width == slice->width && height ==
slice->height applied to the buffer dimensions) would reject any
caller that set bufferRowLength or bufferImageHeight larger than the
image (this is common for buffers shared across mip levels or
for alignment requirements like Dawn aligning bufferRowLength to 2
for 1-pixel-wide textures), forcing those copies through the slower
TLB / blit / compute paths.
For compressed formats, keep the strict equality check since
block-level stride semantics are more complex.
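
A simplified sketch of the relaxed check follows. The struct and function
names are hypothetical, and the compressed branch compares plain texel
dimensions, whereas the real code presumably works in block units.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct sketch_slice {
   uint32_t width, height;
};

struct sketch_region {
   uint32_t extent_width, extent_height; /* imageExtent */
   uint32_t buffer_row_length;           /* 0 means tightly packed */
   uint32_t buffer_image_height;         /* 0 means tightly packed */
};

static bool
sketch_tfu_eligible(const struct sketch_region *r,
                    const struct sketch_slice *s, bool compressed)
{
   /* Buffer addressing dimensions: zero falls back to imageExtent. */
   const uint32_t buf_w =
      r->buffer_row_length ? r->buffer_row_length : r->extent_width;
   const uint32_t buf_h =
      r->buffer_image_height ? r->buffer_image_height : r->extent_height;

   /* Compressed formats keep the strict check on the buffer dimensions. */
   if (compressed)
      return buf_w == s->width && buf_h == s->height;

   /* Otherwise only imageExtent has to cover the whole slice; the buffer
    * row length and image height may be larger. */
   return r->extent_width == s->width && r->extent_height == s->height;
}
```

With a 1-pixel-wide slice and bufferRowLength aligned up to 2, the
uncompressed case is now accepted while the strict comparison rejects it.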
Assisted-by: Claude Opus 4.7
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41725>
Generalize copy_buffer_image_tfu with a to_buffer flag selecting which
side is the raster destination, and wire it into v3dv_CmdCopyImageToBuffer2
before the TLB path.
The to_buffer=true direction has the same eligibility constraints as
buffer-to-image, with one exception: V3D 4.2 is unsupported because its
TFU cannot produce raster output, and for image-to-buffer the destination
is always a raster buffer.
Assisted-by: Claude Opus 4.7
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41725>
Drop the direction from the function name in preparation for sharing
this implementation with image-to-buffer copies in the next commit.
Pure rename, no functional change.
Assisted-by: Claude Opus 4.7
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41725>
Replace the TLB-based meta_fill_buffer path on V3D 7.1+ with a TFU
raster-to-raster copy that broadcasts a single staging row across
the output via iis=0 (stride-0 input). This eliminates the per-fill
CL render job and its tile_alloc/TSDA BO overhead, which is
substantial on workloads that issue many small fills (e.g. WebGPU
lazy buffer initialization in Dawn).
The staging BO holding one row of the fill pattern is cached on the
command buffer and reused across fills with the same data value, so
sequences of identical-pattern fills share a single staging BO.
The existing TLB-based fill is kept as a fallback and is also used
when V3D_DEBUG=disable_tfu is set, or on V3D simulator builds where
the stride-0 TFU input mode is not supported and would assert.
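
The effect of a stride-0 input can be modelled on the CPU as a row copy whose
source stride is zero, so every output row reads the same staging row. This
is only a software analogy with a made-up helper name, not how the TFU job is
programmed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copies `rows` rows of `row_bytes` bytes each. Passing src_stride == 0
 * makes every destination row a copy of the first source row, which is
 * the broadcast behaviour used for fills. */
static void
sketch_raster_copy(uint8_t *dst, size_t dst_stride,
                   const uint8_t *src, size_t src_stride,
                   size_t row_bytes, size_t rows)
{
   assert(row_bytes <= dst_stride);
   for (size_t y = 0; y < rows; y++)
      memcpy(dst + y * dst_stride, src + y * src_stride, row_bytes);
}
```

A single staging row is therefore enough to fill an output of any height.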
Assisted-by: Claude Opus 4.7
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41725>
Move the BO destroy callback from v3dv_meta_copy.c to a generic v3dv_cmd_buffer_destroy_bo_cb
in the cmd buffer module. This makes it reusable for different callers
that want to attach a v3dv_bo to a command buffer's private_objs list.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41725>
Buffer-to-buffer copies on V3D 7.1+ can be served by the TFU as a
raster-to-raster copy, avoiding the per-copy CL render job and
tile_alloc/TSDA BO overhead of the TLB-based path.
Treat the buffer as a raster texture and chunk the copy into TFU
jobs of up to 16384x16384 pixels. Pick the largest pixel size
(cpp in {4,2,1}) such that src/dst offsets and size are all
cpp-aligned: cpp=4 (R8G8B8A8_UINT) is the expected common case;
cpp=2 (R8G8_UINT) and cpp=1 (R8_UINT) handle Vulkan-permitted
unaligned vkCmdCopyBuffer regions that would otherwise fall back
to the slow TLB path. Skipped when V3D_DEBUG=disable_tfu is set;
emits perf_debug when the cpp=1/2 fallback is taken.
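
The pixel-size selection described above amounts to an alignment test on the
two offsets and the size. A minimal sketch, with a hypothetical function name:

```c
#include <assert.h>
#include <stdint.h>

/* Returns the largest cpp in {4, 2, 1} such that the source offset, the
 * destination offset and the copy size are all multiples of it. */
static uint32_t
sketch_pick_cpp(uint64_t src_offset, uint64_t dst_offset, uint64_t size)
{
   /* A low bit set in any of the three values is set in the OR. */
   const uint64_t bits = src_offset | dst_offset | size;

   if ((bits & 3) == 0)
      return 4; /* R8G8B8A8_UINT, the expected common case */
   if ((bits & 1) == 0)
      return 2; /* R8G8_UINT */
   return 1;    /* R8_UINT */
}
```

For example, a copy with offsets 2 and 4 and size 6 is only 2-byte aligned,
so it is treated as an R8G8_UINT raster of 3 pixels.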
Drop the `if (copy_job)` guard on src_bo cleanup registration in
v3dv_CmdUpdateBuffer: the TFU path queues jobs without returning a
v3dv_job*, so the staging BO must be tracked unconditionally to
avoid leaking once the cmd buffer is submitted.
Assisted-by: Claude Opus 4.7
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41725>
Use the RT descriptors pointer as a dummy VA.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41780>
To avoid weird crashes caused by trying to load unused attributes.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41780>
can_bind_const_buffer_as_vertex is a direct copy of
screen->caps.can_bind_const_buffer_as_vertex.
Read the cap from the screen directly.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41184>
allow_st_finalize_nir_twice is a direct copy of
screen->caps.call_finalize_nir_in_linker.
Read the cap from the screen directly at each call site.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41184>
prefer_real_buffer_in_constbuf0 is a direct copy of
screen->caps.prefer_real_buffer_in_constbuf0.
Read the cap from the screen directly.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41184>
lower_two_sided_color is the negation of screen->caps.two_sided_color.
Replace all call sites with !screen->caps.two_sided_color directly.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41184>