pvr: consider the size of DMA request when setting msize of DDMADT

The DDMADT instruction of PDS has out-of-bound test capability, which is
used for implementation of robust vertex input fetch.

According to the pseudocode in the comment block before the "LAST DDMAD"
mark in pvr_pipeline_pds.c, the check is between
`calculated_source_address + (burst_size << 2)` and `base_address +
buffer_size`, in which the `burst_size` seems to correspond to the BSIZE
field set in the low 32-bit of DDMAD(T) src3 and the `buffer_size`
corresponds to the MSIZE field set in the DDMADT-specific high 32-bit of
src3. As the calculated source address is just the base address adds the
multiplication result (the offset), the base address could be eliminated
from the check, results in the check between `offset + (BSIZE * 4)` and
`MSIZE` .

Naturally it's expected to just set the MSIZE field to the buffer size.
In addition, as the Vulkan spec says "Reads from a vertex input MAY
instead be bounds checked against a range rounded down to the nearest
multiple of the stride of its binding", the driver rounds down the
accessible buffer size before setting MSIZE to it.

However when running OpenGL ES 2.0 CTS, two problems are exhibited about
the setting of the size to check:

- dEQP-GLES2.functional.buffer.write.basic.array_stream_draw sets up a
  VBO with 3 bytes per vertex (RGB colors and 1B per color) and 340
  vertices (results in a buffer size of 1020 = 0x3fc). However as the
  DMA request size, which is specified by BSIZE, is counted by dwords,
  3 bytes are rounded up to 1 dword (which is 4 bytes). When the bound
  check of the last vertex happens, the vertex's DMA start offset is
  0x3f9, so the DDMADT check happens between 0x3fd (0x3f9 + 1 * 4) and
  0x3fc, and indicates a check failure. This prevents the last vertex,
  which is perfectly in-bound, from being properly fetched; this is
  against the Vulkan specification, and needs to be fixed.
- dEQP-GLES2.functional.vertex_arrays.single_attribute.strides.
  buffer_0_32_float2_vec4_dynamic_draw_quads_1 sets up a VBO with a size
  of 168 bytes, and tries to draw 6 vertices (each vertex consumes 2
  floats (thus 8 bytes) of attribute) with a stride of 32 bytes using
  this VBO. Zink then translates the VBO to a Vulkan vertex buffer bound
  with size = 168B, stride = 32B. Here the optional rule about rounding
  down buffer size happens in the current PowerVR driver, and the
  checked bound is rounded down to 160B, which prevented the last
  vertex's 8B attributes to be fetched. It looks like this kind of
  situation is considered in the codepath without DDMADT, but omitted
  for the codepath utilizing DDMADT for bound check.

So this patch tries to mimic the behavior of DDMADT when setting the
MSIZE field of it to prevent false out-of-bounds. It first calculates
the offset of the last valid vertex DMA, then adds the DMA request size
to it to form the final MSIZE value. With the code calculating the last
valid DMA offset considering the situation of fetching the attribute
from the space after the last whole multiple of stride, both problems
mentioned above are solved by this rework.

There're 99 GLES CTS testcases fixed by this change, and Vulkan CTS
shows no regression on `dEQP-VK.robustness.robustness1_vertex_access.*`
tests.

Fixes: 4873903b56 ("pvr: Enable PDS_DDMADT")
Signed-off-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Reviewed-by: Ella Stanforth <ella@igalia.com>
(cherry picked from commit 252904f3d1)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40752>
This commit is contained in:
Icenowy Zheng 2026-03-20 22:22:38 +08:00 committed by Eric Engestrom
parent e62ef26e01
commit dbdbef673f
4 changed files with 44 additions and 9 deletions

View file

@ -1844,7 +1844,7 @@
"description": "pvr: consider the size of DMA request when setting msize of DDMADT",
"nominated": true,
"nomination_type": 2,
"resolution": 0,
"resolution": 1,
"main_sha": null,
"because_sha": "4873903b565a10322669db325de7284218539248",
"notes": null

View file

@ -1203,6 +1203,9 @@ struct pvr_pds_const_map_entry_vertex_attr_ddmadt_oob_buffer_size {
uint8_t type;
uint8_t const_offset;
uint8_t binding_index;
uint8_t size_in_dwords;
uint16_t offset;
uint8_t attrib_size_in_bytes;
} PVR_PACKED;
struct pvr_pds_const_map_entry_cond_render {

View file

@ -1101,7 +1101,11 @@ void pvr_pds_generate_vertex_primary_program(
oob_buffer_size->type =
PVR_PDS_CONST_MAP_ENTRY_TYPE_VERTEX_ATTR_DDMADT_OOB_BUFFER_SIZE;
oob_buffer_size->const_offset = const_base + 7;
oob_buffer_size->offset = vertex_dma->offset;
oob_buffer_size->binding_index = vertex_dma->binding_index;
oob_buffer_size->size_in_dwords = vertex_dma->size_in_dwords;
oob_buffer_size->attrib_size_in_bytes =
vertex_dma->attrib_size_in_bytes;
} else {
literal_entry =
pvr_prepare_next_pds_const_map_entry(&entry_write_state,

View file

@ -5345,26 +5345,54 @@ pvr_setup_vertex_buffers(struct pvr_cmd_buffer *cmd_buffer,
&cmd_buffer->vk.dynamic_graphics_state;
uint32_t stride =
dynamic_state->vi_binding_strides[ddmadt_src3->binding_index];
uint32_t bound_size = binding->buffer->vk.size - binding->offset;
uint32_t valid_buffer_size =
binding->buffer->vk.size - binding->offset - ddmadt_src3->offset;
uint32_t attrib_size = ddmadt_src3->attrib_size_in_bytes;
uint32_t dma_size = ddmadt_src3->size_in_dwords * 4;
uint64_t control_qword;
uint32_t control_dword;
assert(PVR_HAS_FEATURE(&cmd_buffer->device->pdevice->dev_info,
pds_ddmadt));
if (stride) {
bound_size -= bound_size % stride;
if (bound_size == 0) {
/* If size is zero, DMA OOB won't execute. Read will come from
* robustness buffer.
/* The DMA should cover the whole attribute, but do not overread
* more than the headroom when allocating memory for buffers.
*/
assert(dma_size >= attrib_size);
assert(dma_size <= attrib_size + PVR_BUFFER_MEMORY_PADDING_SIZE);
/* DDMADT checks whether the whole DMA request (size indicated by
* the sizes_in_dwords field) is out of the regions set by msize.
* Set the buffer size to the last valid vertex DMA address plus
* the size of one DMA request.
*
* This also ensures at least one valid inbound DDMAD is possible.
* In case the original buffer isn't big enough for one DMA, the
* DDMADT source_address should have been already redirected to the
* robust buffer when setting up ROBUST_VERTEX_ATTRIBUTE_ADDRESS.
*
* See also the pseudocode in pvr_pds_generate_vertex_primary_program()
* before the "LAST DDMAD" mark, which explains how DDMAD(T) works.
*/
if (stride == 0) {
/* All DMA requests happen on the same address */
valid_buffer_size = 0;
} else {
uint32_t buffer_tail_size = valid_buffer_size % stride;
valid_buffer_size -= buffer_tail_size;
if (buffer_tail_size < attrib_size && valid_buffer_size != 0) {
/* The buffer tail isn't big enough for a vertex attribute fetch
* operation so the last valid DMA happens on the last whole
* stride.
*/
bound_size = stride;
valid_buffer_size -= stride;
}
}
valid_buffer_size += dma_size;
pvr_csb_pack (&control_qword, PDSINST_DDMAD_FIELDS_SRC3, src3) {
src3.test = true;
src3.msize = bound_size;
src3.msize = valid_buffer_size;
}
control_dword = (uint32_t)(control_qword >> 32);