mirror of
https://gitlab.freedesktop.org/mesa/mesa.git
synced 2025-12-28 08:10:09 +01:00
Both per-primitive and per-vertex space is allocated in MUE in 8 dword chunks and those 8-dword chunks (granularity of 3DSTATE_SBE_MESH.Per[Primitive|Vertex]URBEntryOutputReadLength) are passed to fragment shaders as inputs (either non-interpolated for per-primitive and flat vertex attributes or interpolated for non-flat vertex attributes). Some attributes have a special meaning and must be placed in separate 8/16-dword slot called Primitive Header or Vertex Header. Primitive Header contains 4 such attributes (Cull Primitive, ViewportIndex, RTAIndex, CPS), leaving 4 dwords (the rest of 8-dword slot) potentially unused. Vertex Header is similar - it starts with 3 unused dwords, 1 dword for Point Size (but if we declare that shader doesn't produce Point Size then we can reuse it), followed by 4 dwords for Position and optionally 8 dwords for clip distances. This means we have an interesting optimization problem - we can put some user attributes into holes in Primitive and Vertex Headers, which may lead to smaller MUE size and potentially more mesh threads running in parallel, but we have to be careful to use those holes only when we need it, otherwise we could force HW to pass too much data to fragment shader. Example 1: Let's assume that Primitive Header is enabled and user defined 12 dwords of per-primitive attributes. Without packing we would consume 8 + ALIGN(12, 8) = 24 dwords of MUE space and pass ALIGN(12, 8) = 16 dwords to fragment shader. With packing, we'll consume 4 + 4 + ALIGN(12 - 4, 8) = 16 dwords of MUE space and pass ALIGN(4, 8) + ALIGN(12 - 4, 8) = 16 dwords to fragment shader. 16/16 is better than 24/16, so packing makes sense. Example 2: Now let's assume that Primitive Header is enabled and user defined 16 dwords of per-primitive attributes. Without packing we would consume 8 + ALIGN(16, 8) = 24 dwords of MUE space and pass ALIGN(16, 16) = 16 dwords to fragment shader. With packing, we'll consume 4 + 4 + ALIGN(16 - 4, 8) = 24 dwords of MUE space and pass ALIGN(4, 8) + ALIGN(16 - 4, 8) = 24 dwords to fragment shader. 24/24 is worse than 24/16, so packing doesn't make sense. This change doesn't affect vk_meshlet_cadscene in default configuration, but it speeds it up by up to 25% with "-extraattributes N", where N is some small value divisible by 2 (by default N == 1) and we are bound by URB size. Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20407> |
||
|---|---|---|
| .. | ||
| amd | ||
| android_stub | ||
| asahi | ||
| broadcom | ||
| c11 | ||
| compiler | ||
| drm-shim | ||
| egl | ||
| etnaviv | ||
| freedreno | ||
| gallium | ||
| gbm | ||
| getopt | ||
| glx | ||
| gtest | ||
| imagination | ||
| imgui | ||
| intel | ||
| loader | ||
| mapi | ||
| mesa | ||
| microsoft | ||
| nouveau | ||
| panfrost | ||
| tool | ||
| util | ||
| virtio | ||
| vulkan | ||
| .clang-format | ||
| meson.build | ||