mirror of
https://gitlab.freedesktop.org/mesa/mesa.git
synced 2026-03-11 02:40:39 +01:00
tu: Implement subsampled images
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39868>
This commit is contained in:
parent
cc710283a7
commit
4b87df29b3
20 changed files with 2137 additions and 194 deletions
@@ -36,11 +36,12 @@ This space exists whenever tiled rendering/GMEM is used, even without FDM. It
 is the space used to access GMEM, with the origin at the upper left of the
 tile. The hardware automatically transforms rendering space into GMEM space
 whenever GMEM is accessed using the various ``*_WINDOW_OFFSET`` registers. The
-origin of this space will be called :math:`b_{cs}`, the common bin start, for
-reasons that are explained below. When using FDM, coordinates in this space
-must be multiplied by the scaling factor :math:`s` derived from the fragment
-density map, or equivalently divided by the fragment area (as defined by the
-Vulkan specification), with the origin still at the upper left of the tile. For
+origin of this space in rendering space, or the value of ``*_WINDOW_OFFSET``,
+will be called :math:`b_{cs}`, the common bin start, for reasons that are
+explained below. When using FDM, coordinates in this space must be multiplied
+by the scaling factor :math:`s` derived from the fragment density map, or
+equivalently divided by the fragment area (as defined by the Vulkan
+specification), with the origin still at the upper left of the tile. For
 example, if :math:`s_x = 1/2`, then the bin is half as wide as it would've been
 without FDM and all coordinates in this space must be divided by 2.
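The scaling described above can be sketched in isolation. This is illustrative only, not the driver's code; the struct names and `rendering_to_gmem` are stand-ins, and `frag_area` plays the role of the per-bin fragment area from the density map (so `{2, 2}` corresponds to :math:`s_x = s_y = 1/2`):

```cpp
// Map a rendering-space coordinate to GMEM space under FDM by dividing by
// the fragment area, which is equivalent to multiplying by the scale s.
struct Extent2D { unsigned width, height; };
struct Offset2D { int x, y; };

static Offset2D rendering_to_gmem(Offset2D coord, Extent2D frag_area)
{
   return { coord.x / (int)frag_area.width,
            coord.y / (int)frag_area.height };
}
```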
@@ -81,6 +82,104 @@ a multiple of :math:`1 / s`. This is a natural constraint anyway, because if
 it wasn't the case then the bin would start in the middle of a fragment which
 isn't possible to handle correctly.
+
+Subsampled Space
+^^^^^^^^^^^^^^^^
+
+When using subsampled images, this is the space where the bin is stored in the
+underlying subsampled image. When sampling from a subsampled image, the driver
+inserts shader code to transform from framebuffer space to subsampled space
+using metadata written when rendering to the image.
+
+Accesses towards the edge of a bin may partially bleed into its neighboring
+bin with linear or bicubic sampling. If the neighbor has a different scale or
+isn't adjacent in subsampled space, we would sample incorrect data or empty
+space and return a corrupted result. To handle this, we insert an "apron"
+around problematic edges and corners by blitting from the nearest neighbor of
+each bin after the renderpass.
+
+Subsampled space is normally scaled down like rendering space, which is the
+point of subsampled images in the first place, but the origin of each bin is
+up to the driver. The driver chooses the origin of each bin when rendering a
+given render pass and then encodes it in the metadata used when sampling the
+image. Bins that require an apron must be far enough away from each other
+that their aprons don't intersect, and all of the bins must be contained
+within the underlying image.
+
+Even when subsampled images are in use, not every bin may be subsampled. For
+example, there may not be enough space to insert aprons around every bin.
+When this is the case, subsampled space is not scaled like rendering space;
+that is, we expand the bin when resolving, as with non-subsampled images.
+However, the origin of the bin may still differ from the framebuffer-space
+origin.
+
+The algorithm turnip uses to calculate the bin layout in subsampled space is
+to start with a "default" layout of the bins and then recursively resolve
+conflicts caused by bins whose aprons are too close together. The first
+strategy is to shift one of the bins over by a certain amount. The fallback
+strategy is to un-subsample both neighboring bins, expanding them so that
+they touch each other and no apron is needed.
+
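A one-dimensional sketch of those two strategies, under a deliberately simplified model (real bins are 2D, the un-subsampled bins' size expansion is omitted, and the driver's actual heuristics live in `tu_subsampled_image.cc`):

```cpp
// Each bin occupies [start, start + size) in subsampled space and needs an
// apron of `apron` pixels on each side when subsampled. Two subsampled
// neighbors conflict when their aprons would overlap.
struct Bin {
   int start;
   int size;
   bool subsampled;
};

static bool conflicts(const Bin &a, const Bin &b, int apron)
{
   if (!a.subsampled || !b.subsampled)
      return false;
   return a.start + a.size + apron > b.start - apron;
}

// Strategy 1: shift the right-hand bin over. Strategy 2 (fallback, when the
// shift would overflow the image): un-subsample both bins so they touch and
// no apron is needed.
static void solve_conflict(Bin &left, Bin &right, int apron, int image_size)
{
   if (!conflicts(left, right, apron))
      return;
   int shifted = left.start + left.size + 2 * apron;
   if (shifted + right.size + apron <= image_size) {
      right.start = shifted;
   } else {
      left.subsampled = right.subsampled = false;
      right.start = left.start + left.size;
   }
}
```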
+One natural choice for the "default" layout is to just use rendering space.
+That is, start each bin at :math:`b_{cs}` by default. That mostly works,
+except for two problems. The first is easier to solve and has to do with the
+border when sampling: border colors are allowed with subsampled images, and
+when the framebuffer covers the entire image, sampling around the edge is
+expected to correctly blend the border color with the edge pixel. For that
+to happen, bins that touch or intersect the edge of the framebuffer in
+framebuffer space have to be shifted over so that their edge touches the
+framebuffer edge in subsampled space too.
+
+Doing this also allows an optimization: because we are storing the tile's
+contents one-to-one from GMEM to system memory instead of scaling it up, we
+can use the dedicated resolve engine instead of GRAS to resolve the tile to
+system memory. Normally GRAS has to be used with non-subsampled images to
+scale up the bin when resolving. However, this doesn't work for tiles along
+the right and bottom edges, where we have to shift the tile over to align it
+to the edge, and it also gets tricky when a tile is shifted to avoid apron
+conflicts, because normally the resolve engine would write the tile directly
+without shifting. There is a trick we can use to avoid falling back to GRAS:
+by overriding ``RB_RESOLVE_WINDOW_OFFSET``, we can effectively apply an
+offset by telling the resolve engine that the tile was rendered somewhere
+else. This means that the shift amount has to be aligned to the alignment of
+``RB_RESOLVE_WINDOW_OFFSET``, which is ``tile_align_*`` in the device info.
+
+The other problem with making subsampled space equal rendering space is that
+with an FDM offset, rendering space can be arbitrarily larger than
+framebuffer space, and we may overflow the attachments by up to the size of a
+tile. The API is designed to allow the driver to allocate extra slop space in
+the image in this case, because there are image create flags for subsampled
+and FDM offset; however, the maximum tile size is far too large, and images
+would take up far too much memory if we allocated enough slop space for the
+largest possible tile. An alternative is to use a hybrid of framebuffer space
+and rendering space: shift the tiles over by :math:`b_o` so that their origin
+is :math:`b_s` instead of :math:`b_{cs}`, but leave them scaled down. This
+requires no slop space whatsoever, because the bins are shifted inside the
+original image, but we can no longer use the resolve engine, as the tile
+offsets are no longer aligned to ``tile_align_*``. So in the driver we
+combine both approaches: we calculate an aligned offset :math:`b_o'`, which
+is :math:`b_o` aligned down to ``tile_align_*``, and shift the tiles over by
+subtracting :math:`b_o'` instead of :math:`b_o`. This requires slop space,
+but only :math:`b_o - b_o'` of it, which must be less than ``tile_align_*``.
+As usual, the first row/column are not shifted over in x/y respectively.
+
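The aligned-offset trick amounts to a power-of-two align-down. A minimal sketch, where `tile_align` stands in for the device's `tile_align_*` value (assumed to be a power of two) and the function names are illustrative:

```cpp
// b_o' is b_o aligned down to tile_align; the image then only needs
// (b_o - b_o') extra slop space, which is always strictly less than
// tile_align.
static unsigned align_down_pot(unsigned x, unsigned align_pot)
{
   return x & ~(align_pot - 1);
}

static unsigned required_slop(unsigned b_o, unsigned tile_align)
{
   return b_o - align_down_pot(b_o, tile_align);
}
```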
+Here is an example of what a subsampled image looks like in memory, in this
+case without any FDM offset:
+
+.. figure:: subsampled_annotated.jpg
+   :alt: Example of a subsampled image
+
+Note how some of the bins are shifted over to make space for the apron. After
+applying the coordinate transform when sampling, this is the final image:
+
+.. figure:: subsampled_final.jpg
+   :alt: Example of a subsampled image after coordinate transform
+
+When ``VK_EXT_custom_resolve`` and subsampled images are used together, the
+custom resolve subpass writes directly to the subsampled image. This means
+that it needs to use subsampled space instead of rendering space, which in
+practice means replacing :math:`b_{cs}` with the origin of the bin in the
+subsampled image.
+
 Viewport and Scissor Patching
 -----------------------------
BIN  docs/drivers/freedreno/subsampled_annotated.jpg (new file, binary file not shown, 1.6 MiB)
BIN  docs/drivers/freedreno/subsampled_final.jpg (new file, binary file not shown, 1.9 MiB)
@@ -48,6 +48,7 @@ libtu_files = files(
   'tu_rmv.cc',
   'tu_shader.cc',
   'tu_suballoc.cc',
+  'tu_subsampled_image.cc',
   'tu_tile_config.cc',
   'tu_util.cc',
 )
@@ -647,6 +647,51 @@ build_blit_vs_shader(void)
    return b->shader;
 }
 
+static nir_shader *
+build_multi_blit_vs_shader(void)
+{
+   nir_builder _b =
+      nir_builder_init_simple_shader(MESA_SHADER_VERTEX, NULL, "multi blit vs");
+   nir_builder *b = &_b;
+
+   nir_variable *out_pos =
+      nir_create_variable_with_location(b->shader, nir_var_shader_out,
+                                        VARYING_SLOT_POS, glsl_vec4_type());
+
+   b->shader->info.num_ubos = 1;
+
+   nir_def *vertex = nir_load_vertex_id(b);
+   nir_def *pos_and_coords =
+      nir_load_ubo(b, 4, 32, nir_imm_int(b, 0),
+                   nir_ishl_imm(b, vertex, 4),
+                   .align_mul = 16,
+                   .align_offset = 0,
+                   .range = 1 << 16);
+
+   nir_def *pos = nir_channels(b, pos_and_coords, 0x3);
+   nir_def *coords = nir_channels(b, pos_and_coords, 0xc);
+
+   pos = nir_vec4(b, nir_channel(b, pos, 0),
+                  nir_channel(b, pos, 1),
+                  nir_imm_float(b, 0.0),
+                  nir_imm_float(b, 1.0));
+
+   nir_store_var(b, out_pos, pos, 0xf);
+
+   nir_variable *out_coords =
+      nir_create_variable_with_location(b->shader, nir_var_shader_out,
+                                        VARYING_SLOT_VAR0, glsl_vec_type(3));
+
+   coords = nir_vec3(b, nir_channel(b, coords, 0), nir_channel(b, coords, 1),
+                     nir_imm_float(b, 0));
+
+   nir_store_var(b, out_coords, coords, 0x7);
+
+   return b->shader;
+}
+
 static nir_shader *
 build_clear_vs_shader(void)
 {
@@ -823,6 +868,7 @@ tu_init_clear_blit_shaders(struct tu_device *dev)
 {
    unsigned offset = 0;
    compile_shader(dev, build_blit_vs_shader(), 3, &offset, GLOBAL_SH_VS_BLIT);
+   compile_shader(dev, build_multi_blit_vs_shader(), 3, &offset, GLOBAL_SH_VS_MULTI_BLIT);
    compile_shader(dev, build_clear_vs_shader(), 2, &offset, GLOBAL_SH_VS_CLEAR);
    compile_shader(dev, build_blit_fs_shader(false), 0, &offset, GLOBAL_SH_FS_BLIT);
    compile_shader(dev, build_blit_fs_shader(true), 0, &offset, GLOBAL_SH_FS_BLIT_ZSCALE);
@@ -846,6 +892,7 @@ tu_destroy_clear_blit_shaders(struct tu_device *dev)
 enum r3d_type {
    R3D_CLEAR,
    R3D_BLIT,
+   R3D_MULTI_BLIT,
 };
 
 template <chip CHIP>
@@ -855,7 +902,8 @@ r3d_common(struct tu_cmd_buffer *cmd, struct tu_cs *cs, enum r3d_type type,
            VkSampleCountFlagBits dst_samples)
 {
    enum global_shader vs_id =
-      type == R3D_CLEAR ? GLOBAL_SH_VS_CLEAR : GLOBAL_SH_VS_BLIT;
+      type == R3D_CLEAR ? GLOBAL_SH_VS_CLEAR :
+      (type == R3D_MULTI_BLIT ? GLOBAL_SH_VS_MULTI_BLIT : GLOBAL_SH_VS_BLIT);
 
    struct ir3_shader_variant *vs = cmd->device->global_shader_variants[vs_id];
    uint64_t vs_iova = cmd->device->global_shader_va[vs_id];
@@ -1056,6 +1104,49 @@ r3d_coords(struct tu_cmd_buffer *cmd,
    r3d_coords_raw(cmd, cs, coords);
 }
 
+static void
+r3d_coords_multi(struct tu_cmd_buffer *cmd,
+                 struct tu_cs *cs,
+                 const VkRect2D *dst,
+                 const tu_rect2d_float *src,
+                 unsigned count)
+{
+   struct tu_cs sub_cs;
+   VkResult result =
+      tu_cs_begin_sub_stream_aligned(&cmd->sub_cs, count * 2, 4, &sub_cs);
+   if (result != VK_SUCCESS) {
+      vk_command_buffer_set_error(&cmd->vk, result);
+      return;
+   }
+
+   for (unsigned i = 0; i < count; i++) {
+      tu_cs_emit(&sub_cs, fui(dst[i].offset.x));
+      tu_cs_emit(&sub_cs, fui(dst[i].offset.y));
+      tu_cs_emit(&sub_cs, fui(src[i].x_start));
+      tu_cs_emit(&sub_cs, fui(src[i].y_start));
+      tu_cs_emit(&sub_cs, fui(dst[i].offset.x + dst[i].extent.width));
+      tu_cs_emit(&sub_cs, fui(dst[i].offset.y + dst[i].extent.height));
+      tu_cs_emit(&sub_cs, fui(src[i].x_end));
+      tu_cs_emit(&sub_cs, fui(src[i].y_end));
+   }
+
+   struct tu_draw_state coords_ubo = tu_cs_end_draw_state(&cmd->sub_cs,
+                                                          &sub_cs);
+
+   tu_cs_emit_pkt7(cs, CP_LOAD_STATE6_GEOM, 5);
+   tu_cs_emit(cs,
+              CP_LOAD_STATE6_0_DST_OFF(0) |
+              CP_LOAD_STATE6_0_STATE_TYPE(ST6_UBO) |
+              CP_LOAD_STATE6_0_STATE_SRC(SS6_DIRECT) |
+              CP_LOAD_STATE6_0_STATE_BLOCK(SB6_VS_SHADER) |
+              CP_LOAD_STATE6_0_NUM_UNIT(1));
+   tu_cs_emit(cs, CP_LOAD_STATE6_1_EXT_SRC_ADDR(0));
+   tu_cs_emit(cs, CP_LOAD_STATE6_2_EXT_SRC_ADDR_HI(0));
+   tu_cs_emit_qw(cs,
+                 coords_ubo.iova |
+                 (uint64_t)A6XX_UBO_1_SIZE(count * 2) << 32);
+}
+
 static void
 r3d_clear_value(struct tu_cmd_buffer *cmd, struct tu_cs *cs, enum pipe_format format, const VkClearValue *val)
 {
@@ -1290,6 +1381,7 @@ r3d_src_load(struct tu_cmd_buffer *cmd,
              struct tu_cs *cs,
              const struct tu_image_view *iview,
              uint32_t layer,
+             VkFilter filter,
              bool override_swap)
 {
    uint32_t desc[FDL6_TEX_CONST_DWORDS];
@@ -1321,7 +1413,7 @@ r3d_src_load(struct tu_cmd_buffer *cmd,
    r3d_src_common<CHIP>(cmd, cs, desc,
                         iview->view.layer_size * layer,
                         iview->view.ubwc_layer_size * layer,
-                        VK_FILTER_NEAREST);
+                        filter);
 }
 
 template <chip CHIP>
@@ -1331,7 +1423,7 @@ r3d_src_gmem_load(struct tu_cmd_buffer *cmd,
                   const struct tu_image_view *iview,
                   uint32_t layer)
 {
-   r3d_src_load<CHIP>(cmd, cs, iview, layer, true);
+   r3d_src_load<CHIP>(cmd, cs, iview, layer, VK_FILTER_NEAREST, true);
 }
 
 template <chip CHIP>
@@ -1339,9 +1431,10 @@ static void
 r3d_src_sysmem_load(struct tu_cmd_buffer *cmd,
                     struct tu_cs *cs,
                     const struct tu_image_view *iview,
-                    uint32_t layer)
+                    uint32_t layer,
+                    VkFilter filter)
 {
-   r3d_src_load<CHIP>(cmd, cs, iview, layer, false);
+   r3d_src_load<CHIP>(cmd, cs, iview, layer, filter, false);
 }
 
 template <chip CHIP>
@@ -1594,6 +1687,9 @@ enum r3d_blit_param {
    R3D_Z_SCALE = 1 << 0,
    R3D_DST_GMEM = 1 << 1,
    R3D_COPY = 1 << 2,
+   R3D_USE_MULTI_BLIT = 1 << 3,
+   R3D_OUTSIDE_PASS = 1 << 4,
+   R3D_OVERLAPPING = 1 << 5,
 };
 
 template <chip CHIP>
@@ -1617,7 +1713,7 @@ r3d_setup(struct tu_cmd_buffer *cmd,
           blit_param & R3D_DST_GMEM);
    fixup_dst_format(src_format, &dst_format, &fmt);
 
-   if (!cmd->state.pass) {
+   if (!cmd->state.pass || (blit_param & R3D_OUTSIDE_PASS)) {
       tu_emit_cache_flush_ccu<CHIP>(cmd, cs, TU_CMD_CCU_SYSMEM);
       tu6_emit_window_scissor<CHIP>(cs, 0, 0, 0x3fff, 0x3fff);
       if (cmd->device->physical_device->info->props.has_hw_bin_scaling) {
@@ -1651,7 +1747,8 @@ r3d_setup(struct tu_cmd_buffer *cmd,
       }
    }
 
-   const enum r3d_type type = (clear) ? R3D_CLEAR : R3D_BLIT;
+   const enum r3d_type type = (clear) ? R3D_CLEAR :
+      ((blit_param & R3D_USE_MULTI_BLIT) ? R3D_MULTI_BLIT : R3D_BLIT);
    r3d_common<CHIP>(cmd, cs, type, 1, blit_param & R3D_Z_SCALE, src_samples,
                     dst_samples);
@@ -1696,7 +1793,17 @@ r3d_setup(struct tu_cmd_buffer *cmd,
       tu_cs_emit_regs(cs, GRAS_VRS_CONFIG(CHIP));
    }
 
-   tu_cs_emit_regs(cs, GRAS_SC_CNTL(CHIP, .ccusinglecachelinesize = 2));
+   /* We need to handle overlapping blits the same as feedback loops, which
+    * means setting this bit to avoid corruption due to UBWC flag caches
+    * becoming desynchronized. On a7xx+ UBWC caches are coherent.
+    */
+   enum a6xx_single_prim_mode prim_mode =
+      CHIP == A6XX && (blit_param & R3D_OVERLAPPING) && ubwc ?
+      FLUSH_PER_OVERLAP_AND_OVERWRITE : NO_FLUSH;
+
+   tu_cs_emit_regs(cs, GRAS_SC_CNTL(CHIP,
+                                    .single_prim_mode = prim_mode,
+                                    .ccusinglecachelinesize = 2));
 
    /* Disable sample counting in order to not affect occlusion query. */
    tu_cs_emit_regs(cs, A6XX_RB_SAMPLE_COUNTER_CNTL(.disable = true));
@@ -1738,6 +1845,17 @@ r3d_run_vis(struct tu_cmd_buffer *cmd, struct tu_cs *cs)
    tu_cs_emit(cs, 2); /* vertex count */
 }
 
+static void
+r3d_run_multi(struct tu_cmd_buffer *cmd, struct tu_cs *cs, unsigned count)
+{
+   tu_cs_emit_pkt7(cs, CP_DRAW_INDX_OFFSET, 3);
+   tu_cs_emit(cs, CP_DRAW_INDX_OFFSET_0_PRIM_TYPE(DI_PT_RECTLIST) |
+                  CP_DRAW_INDX_OFFSET_0_SOURCE_SELECT(DI_SRC_SEL_AUTO_INDEX) |
+                  CP_DRAW_INDX_OFFSET_0_VIS_CULL(IGNORE_VISIBILITY));
+   tu_cs_emit(cs, 1); /* instance count */
+   tu_cs_emit(cs, count * 2); /* vertex count */
+}
+
 template <chip CHIP>
 static void
 r3d_teardown(struct tu_cmd_buffer *cmd, struct tu_cs *cs)
@@ -3620,12 +3738,6 @@ tu_CmdResolveImage2(VkCommandBuffer commandBuffer,
 }
 TU_GENX(tu_CmdResolveImage2);
 
-#define for_each_layer(layer, layer_mask, layers) \
-   for (uint32_t layer = 0; \
-        layer < ((layer_mask) ? (util_logbase2(layer_mask) + 1) : layers); \
-        layer++) \
-      if (!layer_mask || (layer_mask & BIT(layer)))
-
 template <chip CHIP>
 static void
 resolve_sysmem(struct tu_cmd_buffer *cmd,
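The `for_each_layer` macro removed here (it is still used later in this file, so presumably it moves to a shared header in this MR) visits either all `layers` or only the bits set in `layer_mask`, bounded by the highest set bit. Its logic can be checked in isolation; `layers_to_visit` is an illustrative stand-in, with `31 - __builtin_clz(x)` playing the role of `util_logbase2`:

```cpp
#include <cstdint>
#include <vector>

// When layer_mask is nonzero, visit only its set bits (the loop bound is
// one past the highest set bit); otherwise visit [0, layers).
static std::vector<uint32_t> layers_to_visit(uint32_t layer_mask,
                                             uint32_t layers)
{
   std::vector<uint32_t> out;
   uint32_t limit =
      layer_mask ? (31 - __builtin_clz(layer_mask)) + 1 : layers;
   for (uint32_t layer = 0; layer < limit; layer++) {
      if (!layer_mask || (layer_mask & (1u << layer)))
         out.push_back(layer);
   }
   return out;
}
```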
@@ -3673,7 +3785,7 @@ resolve_sysmem(struct tu_cmd_buffer *cmd,
       }
    } else {
       if (ops == &r3d_ops<CHIP>) {
-         r3d_src_sysmem_load<CHIP>(cmd, cs, src, i);
+         r3d_src_sysmem_load<CHIP>(cmd, cs, src, i, VK_FILTER_NEAREST);
       } else {
          ops->src(cmd, cs, &src->view, i, VK_FILTER_NEAREST, dst_format);
       }
@@ -4984,6 +5096,124 @@ tu7_generic_clear_attachment(struct tu_cmd_buffer *cmd,
    trace_end_generic_clear(&cmd->rp_trace, cs);
 }
 
+/* Transform the render area from framebuffer space to subsampled space. Be
+ * conservative if the render area partially covers a fragment.
+ */
+static VkRect2D
+transform_render_area(VkRect2D render_area, const struct tu_tile_config *tile,
+                      const VkRect2D *bins, unsigned view)
+{
+   /* Calculate the transform from framebuffer space to subsampled space. */
+   VkExtent2D frag_area = (tile->subsampled_views & (1u << view)) ?
+      tile->frag_areas[view] : (VkExtent2D) { 1, 1 };
+
+   VkOffset2D offset = {
+      tile->subsampled_pos[view].offset.x -
+         bins[view].offset.x / frag_area.width,
+      tile->subsampled_pos[view].offset.y -
+         bins[view].offset.y / frag_area.height,
+   };
+
+   /* In the unlikely case subsampling was disabled due to running out of
+    * tiles, don't transform the render area.
+    */
+   if (!tile->subsampled)
+      offset = (VkOffset2D) { 0, 0 };
+
+   unsigned x1 =
+      render_area.offset.x / frag_area.width + offset.x;
+   unsigned x2 =
+      DIV_ROUND_UP(render_area.offset.x + render_area.extent.width,
+                   frag_area.width) + offset.x;
+   unsigned y1 =
+      render_area.offset.y / frag_area.height + offset.y;
+   unsigned y2 =
+      DIV_ROUND_UP(render_area.offset.y + render_area.extent.height,
+                   frag_area.height) + offset.y;
+
+   return (VkRect2D) {
+      { x1, y1 }, { x2 - x1, y2 - y1 }
+   };
+}
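A self-contained version of the conservative transform above, with plain stand-in structs in place of the Vulkan types, makes the floor/ceil behavior easy to check (here the per-bin `offset`, the subsampled bin origin minus the scaled bin origin, is passed in directly):

```cpp
struct Extent2D { unsigned width, height; };
struct Offset2D { int x, y; };
struct Rect2D { Offset2D offset; Extent2D extent; };

static unsigned div_round_up(unsigned n, unsigned d) { return (n + d - 1) / d; }

// Conservative framebuffer -> subsampled transform: the start is rounded
// down and the end rounded up to whole fragments, then the per-bin offset
// is applied.
static Rect2D transform_render_area(Rect2D ra, Extent2D frag_area,
                                    Offset2D offset)
{
   unsigned x1 = ra.offset.x / frag_area.width + offset.x;
   unsigned x2 = div_round_up(ra.offset.x + ra.extent.width,
                              frag_area.width) + offset.x;
   unsigned y1 = ra.offset.y / frag_area.height + offset.y;
   unsigned y2 = div_round_up(ra.offset.y + ra.extent.height,
                              frag_area.height) + offset.y;
   return { { (int)x1, (int)y1 }, { x2 - x1, y2 - y1 } };
}
```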
+
+struct apply_blit_scissor_state {
+   unsigned view;
+   VkRect2D render_area;
+};
+
+template <chip CHIP>
+static void
+fdm_apply_blit_scissor(struct tu_cmd_buffer *cmd,
+                       struct tu_cs *cs,
+                       void *data,
+                       VkOffset2D common_bin_offset,
+                       const VkOffset2D *hw_viewport_offsets,
+                       unsigned views,
+                       const struct tu_tile_config *tile,
+                       const VkRect2D *bins,
+                       bool binning)
+{
+   struct tu_physical_device *phys_dev = cmd->device->physical_device;
+   const struct apply_blit_scissor_state *state =
+      (const struct apply_blit_scissor_state *)data;
+   unsigned view = MIN2(state->view, views - 1);
+
+   VkRect2D subsampled_render_area =
+      transform_render_area(state->render_area, tile, bins, view);
+   VkOffset2D pos = tile->subsampled ?
+      tile->subsampled_pos[view].offset : common_bin_offset;
+
+   VkRect2D scissor = subsampled_render_area;
+   if (tile->subsampled) {
+      /* Intersect the render area with the subsampled tile. We don't want to
+       * store the whole unscaled tile, and the unscaled tile may jut into the
+       * next tile.
+       */
+      scissor.offset.x = MAX2(scissor.offset.x, tile->subsampled_pos[view].offset.x);
+      scissor.offset.y = MAX2(scissor.offset.y, tile->subsampled_pos[view].offset.y);
+      scissor.extent.width =
+         MIN2(subsampled_render_area.offset.x +
+              subsampled_render_area.extent.width,
+              tile->subsampled_pos[view].offset.x +
+              tile->subsampled_pos[view].extent.width) - scissor.offset.x;
+      scissor.extent.height =
+         MIN2(subsampled_render_area.offset.y +
+              subsampled_render_area.extent.height,
+              tile->subsampled_pos[view].offset.y +
+              tile->subsampled_pos[view].extent.height) - scissor.offset.y;
+   }
+
+   if (bins[view].extent.width == 0 && bins[view].extent.height == 0) {
+      tu_cs_emit_regs(cs,
+                      A6XX_RB_RESOLVE_CNTL_1(.x = 1, .y = 1),
+                      A6XX_RB_RESOLVE_CNTL_2(.x = 0, .y = 0));
+      tu_cs_emit_regs(cs,
+                      A6XX_RB_RESOLVE_WINDOW_OFFSET(.x = 0, .y = 0));
+   } else {
+      /* Note: we will not dynamically enable CCU_RESOLVE for stores unless
+       * the offset is aligned, but this patchpoint will be executed anyway
+       * so we have to do something and not assert in the builder.
+       */
+      uint32_t x1 = scissor.offset.x &
+         ~(phys_dev->info->gmem_align_w - 1);
+      uint32_t y1 = scissor.offset.y &
+         ~(phys_dev->info->gmem_align_h - 1);
+      uint32_t x2 = ALIGN_POT(scissor.offset.x +
+                              scissor.extent.width,
+                              phys_dev->info->gmem_align_w) - 1;
+      uint32_t y2 = ALIGN_POT(scissor.offset.y +
+                              scissor.extent.height,
+                              phys_dev->info->gmem_align_h) - 1;
+
+      tu_cs_emit_regs(cs,
+                      A6XX_RB_RESOLVE_CNTL_1(.x = x1, .y = y1),
+                      A6XX_RB_RESOLVE_CNTL_2(.x = x2, .y = y2));
+      tu_cs_emit_regs(cs,
+                      A6XX_RB_RESOLVE_WINDOW_OFFSET(.x = pos.x, .y = pos.y));
+   }
+}
+
 template <chip CHIP>
 static void
 tu_emit_blit(struct tu_cmd_buffer *cmd,
@@ -5041,8 +5271,17 @@ tu_emit_blit(struct tu_cmd_buffer *cmd,
    event_blit_setup(cs, buffer_id, attachment, blit_event_type, clear_mask);
 
    for_each_layer(i, attachment->used_views, cmd->state.framebuffer->layers) {
-      if (scissor_per_layer)
+      if (cmd->state.pass->has_fdm && cmd->state.fdm_subsampled) {
+         struct apply_blit_scissor_state state = {
+            .view = i,
+            .render_area = scissor_per_layer ?
+               cmd->state.render_areas[i] : cmd->state.render_areas[0],
+         };
+         tu_create_fdm_bin_patchpoint(cmd, cs, 5, TU_FDM_SKIP_BINNING,
+                                      fdm_apply_blit_scissor<CHIP>, state);
+      } else if (scissor_per_layer) {
          tu6_emit_blit_scissor(cmd, cs, i, align_scissor);
+      }
       event_blit_dst_view blt_view = blt_view_from_tu_view(iview, i);
       event_blit_run<CHIP>(cmd, cs, attachment, &blt_view, separate_stencil);
    }
@@ -5331,7 +5570,8 @@ store_cp_blit(struct tu_cmd_buffer *cmd,
 {
    r2d_setup_common<CHIP>(cmd, cs, src_format, dst_format,
                           VK_IMAGE_ASPECT_COLOR_BIT, 0, false,
-                          dst_iview->view.ubwc_enabled, true);
+                          dst_iview->view.ubwc_enabled,
+                          true);
 
    if (dst_iview->image->vk.format == VK_FORMAT_D32_SFLOAT_S8_UINT) {
       if (!separate_stencil) {
@@ -5509,13 +5749,16 @@ tu_attachment_store_unaligned(struct tu_cmd_buffer *cmd, uint32_t a)
    if (TU_DEBUG(UNALIGNED_STORE))
       return true;
 
-   /* We always use the unaligned store path when scaling rendering. */
-   if (cmd->state.pass->has_fdm)
-      return true;
-
    unsigned render_area_count =
       cmd->state.per_layer_render_area ? cmd->state.pass->num_views : 1;
 
+   /* With subsampling, the formula below doesn't work, but we already
+    * conditionally use A2D for the unaligned blits at the edge. Just return
+    * false here.
+    */
+   if (cmd->state.fdm_subsampled)
+      return false;
+
    for (unsigned i = 0; i < render_area_count; i++) {
       const VkRect2D *render_area = &cmd->state.render_areas[i];
       uint32_t x1 = render_area->offset.x;
@@ -5564,6 +5807,9 @@ tu_choose_gmem_layout(struct tu_cmd_buffer *cmd)
 {
    cmd->state.gmem_layout = TU_GMEM_LAYOUT_FULL;
 
+   if (cmd->state.pass->has_fdm)
+      cmd->state.gmem_layout = TU_GMEM_LAYOUT_AVOID_CCU;
+
    for (unsigned i = 0; i < cmd->state.pass->attachment_count; i++) {
       if (!cmd->state.attachments[i])
         continue;
@@ -5620,8 +5866,9 @@ fdm_apply_store_coords(struct tu_cmd_buffer *cmd,
 {
    const struct apply_store_coords_state *state =
      (const struct apply_store_coords_state *)data;
-   VkExtent2D frag_area = tile->frag_areas[MIN2(state->view, views - 1)];
-   VkRect2D bin = bins[MIN2(state->view, views - 1)];
+   unsigned view = MIN2(state->view, views - 1);
+   VkExtent2D frag_area = tile->frag_areas[view];
+   VkRect2D bin = bins[view];
 
    /* The bin width/height must be a multiple of the frag_area to make sure
     * that the scaling happens correctly. This means there may be some
@@ -5643,10 +5890,22 @@ fdm_apply_store_coords(struct tu_cmd_buffer *cmd,
                       GRAS_A2D_SRC_YMIN(CHIP, 1),
                       GRAS_A2D_SRC_YMAX(CHIP, 0));
    } else {
-      tu_cs_emit_regs(cs,
-                      GRAS_A2D_DEST_TL(CHIP, .x = bin.offset.x, .y = bin.offset.y),
-                      GRAS_A2D_DEST_BR(CHIP, .x = bin.offset.x + bin.extent.width - 1,
-                                       .y = bin.offset.y + bin.extent.height - 1));
+      VkOffset2D start =
+         tile->subsampled ? tile->subsampled_pos[view].offset : bin.offset;
+      if (tile->subsampled_views & (1u << view)) {
+         /* Subsampled blits don't scale up the bin, and go to the subsampled
+          * destination.
+          */
+         tu_cs_emit_regs(cs,
+                         GRAS_A2D_DEST_TL(CHIP, .x = start.x, .y = start.y),
+                         GRAS_A2D_DEST_BR(CHIP, .x = start.x + scaled_width - 1,
+                                          .y = start.y + scaled_height - 1));
+      } else {
+         tu_cs_emit_regs(cs,
+                         GRAS_A2D_DEST_TL(CHIP, .x = start.x, .y = start.y),
+                         GRAS_A2D_DEST_BR(CHIP, .x = start.x + bin.extent.width - 1,
+                                          .y = start.y + bin.extent.height - 1));
+      }
       tu_cs_emit_regs(cs,
                       GRAS_A2D_SRC_XMIN(CHIP, common_bin_offset.x),
                       GRAS_A2D_SRC_XMAX(CHIP, common_bin_offset.x + scaled_width - 1),
@@ -5655,6 +5914,45 @@ fdm_apply_store_coords(struct tu_cmd_buffer *cmd,
    }
 }
 
+struct apply_render_area_state {
+   unsigned view;
+   VkRect2D render_area;
+};
+
+template <chip CHIP>
+static void
+fdm_apply_render_area(struct tu_cmd_buffer *cmd,
+                      struct tu_cs *cs,
+                      void *data,
+                      VkOffset2D common_bin_offset,
+                      const VkOffset2D *hw_viewport_offsets,
+                      unsigned views,
+                      const struct tu_tile_config *tile,
+                      const VkRect2D *bins,
+                      bool binning)
+{
+   struct apply_render_area_state *state =
+      (struct apply_render_area_state *)data;
+
+   unsigned view = MIN2(state->view, views - 1);
+
+   VkRect2D subsampled_render_area =
+      transform_render_area(state->render_area, tile, bins, view);
+
+   unsigned x1 = subsampled_render_area.offset.x;
+   unsigned x2 = subsampled_render_area.offset.x +
+      subsampled_render_area.extent.width - 1;
+   unsigned y1 = subsampled_render_area.offset.y;
+   unsigned y2 = subsampled_render_area.offset.y +
+      subsampled_render_area.extent.height - 1;
+
+   tu_cs_emit_regs(cs,
+                   GRAS_A2D_SCISSOR_TL(CHIP, .x = x1, .y = y1),
+                   GRAS_A2D_SCISSOR_BR(CHIP, .x = x2, .y = y2));
+}
+
 template <chip CHIP>
 void
 tu_store_gmem_attachment(struct tu_cmd_buffer *cmd,
@@ -5703,7 +6001,10 @@ tu_store_gmem_attachment(struct tu_cmd_buffer *cmd,
 
    bool use_fast_path = !unaligned && !mismatched_mutability &&
                         !resolve_d24s8_s8 &&
-                        (a == gmem_a || blit_can_resolve(dst->format));
+                        (a == gmem_a || blit_can_resolve(dst->format)) &&
+                        (!cmd->state.pass->has_fdm || CHIP >= A7XX);
+
+   bool fast_path_conditional = use_fast_path && cmd->state.pass->has_fdm;
 
    trace_start_gmem_store(&cmd->rp_trace, cs, cmd, dst->format, use_fast_path, unaligned);
@@ -5717,6 +6018,11 @@ tu_store_gmem_attachment(struct tu_cmd_buffer *cmd,
 
    /* use fast path when render area is aligned, except for unsupported resolve cases */
    if (use_fast_path) {
+      if (fast_path_conditional) {
+         tu_cond_exec_start(cs, CP_COND_REG_EXEC_0_MODE(PRED_TEST) |
+                                CP_COND_REG_EXEC_0_PRED_BIT(TU_PREDICATE_FAST_STORE));
+      }
+
       if (store_common)
          tu_emit_blit<CHIP>(cmd, cs, resolve_group, dst_iview, src, clear_value,
                             BLIT_EVENT_STORE, per_layer_render_area, true, false);
@@ -5724,16 +6030,25 @@ tu_store_gmem_attachment(struct tu_cmd_buffer *cmd,
         tu_emit_blit<CHIP>(cmd, cs, resolve_group, dst_iview, src, clear_value,
                            BLIT_EVENT_STORE, per_layer_render_area, true, true);
 
-      if (cond_exec) {
-         tu_end_load_store_cond_exec(cmd, cs, false);
-      }
+      if (fast_path_conditional) {
+         tu_cond_exec_end(cs);
+      } else {
+         if (cond_exec) {
+            tu_end_load_store_cond_exec(cmd, cs, false);
+         }
 
-      trace_end_gmem_store(&cmd->rp_trace, cs);
-      return;
+         trace_end_gmem_store(&cmd->rp_trace, cs);
+         return;
+      }
    }
 
+   assert(cmd->state.gmem_layout == TU_GMEM_LAYOUT_AVOID_CCU);
+
+   if (fast_path_conditional) {
+      tu_cond_exec_start(cs, CP_COND_REG_EXEC_0_MODE(PRED_TEST) |
+                             CP_COND_REG_EXEC_0_PRED_BIT(TU_PREDICATE_NO_FAST_STORE));
+   }
+
    enum pipe_format src_format = vk_format_to_pipe_format(src->format);
    if (src_format == PIPE_FORMAT_Z32_FLOAT_S8X24_UINT)
       src_format = PIPE_FORMAT_Z32_FLOAT;
@@ -5773,7 +6088,7 @@ tu_store_gmem_attachment(struct tu_cmd_buffer *cmd,
       if (!cmd->state.pass->has_fdm) {
          r2d_coords<CHIP>(cmd, cs, render_area->offset, render_area->offset,
                           render_area->extent);
-      } else {
+      } else if (!cmd->state.fdm_subsampled) {
         /* Usually GRAS_2D_RESOLVE_CNTL_* clips the destination to the bin
          * area and the coordinates span the entire render area, but for
          * FDM we need to scale the coordinates so we need to take the
@ -5795,7 +6110,7 @@ tu_store_gmem_attachment(struct tu_cmd_buffer *cmd,
|
|||
if (!cmd->state.pass->has_fdm) {
|
||||
r2d_coords<CHIP>(cmd, cs, render_area->offset, render_area->offset,
|
||||
render_area->extent);
|
||||
} else {
|
||||
} else if (!cmd->state.fdm_subsampled) {
|
||||
tu_cs_emit_regs(cs,
|
||||
GRAS_A2D_SCISSOR_TL(CHIP, .x = render_area->offset.x,
|
||||
.y = render_area->offset.y,),
|
||||
|
|
@ -5805,6 +6120,17 @@ tu_store_gmem_attachment(struct tu_cmd_buffer *cmd,
|
|||
}
|
||||
|
||||
if (cmd->state.pass->has_fdm) {
|
||||
if (cmd->state.fdm_subsampled) {
|
||||
struct apply_render_area_state state {
|
||||
.view = i,
|
||||
.render_area =
|
||||
per_layer_render_area ? cmd->state.render_areas[i] :
|
||||
cmd->state.render_areas[0],
|
||||
};
|
||||
tu_create_fdm_bin_patchpoint(cmd, cs, 3, TU_FDM_SKIP_BINNING,
|
||||
fdm_apply_render_area<CHIP>,
|
||||
state);
|
||||
}
|
||||
struct apply_store_coords_state state = {
|
||||
.view = i,
|
||||
};
|
||||
|
|
@ -5822,6 +6148,9 @@ tu_store_gmem_attachment(struct tu_cmd_buffer *cmd,
|
|||
}
|
||||
}
|
||||
|
||||
if (fast_path_conditional)
|
||||
tu_cond_exec_end(cs);
|
||||
|
||||
if (cond_exec) {
|
||||
tu_end_load_store_cond_exec(cmd, cs, false);
|
||||
}
|
||||
|
|
@ -5829,3 +6158,71 @@ tu_store_gmem_attachment(struct tu_cmd_buffer *cmd,
|
|||
trace_end_gmem_store(&cmd->rp_trace, cs);
|
||||
}
|
||||
TU_GENX(tu_store_gmem_attachment);
|
||||
|
||||
template <chip CHIP>
|
||||
static void
|
||||
blit_subsampled_apron(struct tu_cmd_buffer *cmd,
|
||||
struct tu_cs *cs,
|
||||
const struct tu_image_view *iview,
|
||||
enum VkFormat vk_format,
|
||||
unsigned layer,
|
||||
const VkRect2D *dst_coord,
|
||||
const tu_rect2d_float *src_coord,
|
||||
unsigned count)
|
||||
{
|
||||
enum pipe_format format = vk_format_to_pipe_format(vk_format);
|
||||
|
||||
r3d_setup<CHIP>(cmd, cs, format, format, VK_IMAGE_ASPECT_COLOR_BIT,
|
||||
R3D_USE_MULTI_BLIT | R3D_OUTSIDE_PASS | R3D_OVERLAPPING,
|
||||
false, iview->image->layout[0].ubwc,
|
||||
VK_SAMPLE_COUNT_1_BIT, VK_SAMPLE_COUNT_1_BIT);
|
||||
|
||||
for (unsigned i = 0; i < count; i++) {
|
||||
assert(dst_coord[i].offset.x + dst_coord[i].extent.width <=
|
||||
iview->image->layout[0].width0);
|
||||
assert(dst_coord[i].offset.y + dst_coord[i].extent.height <=
|
||||
iview->image->layout[0].height0);
|
||||
}
|
||||
|
||||
r3d_coords_multi(cmd, cs, dst_coord, src_coord, count);
|
||||
|
||||
if (iview->image->vk.format == VK_FORMAT_D32_SFLOAT_S8_UINT) {
|
||||
if (vk_format == VK_FORMAT_D32_SFLOAT) {
|
||||
r3d_src_stencil<CHIP>(cmd, cs, iview, layer, VK_FILTER_NEAREST);
|
||||
r3d_dst_stencil<CHIP>(cs, iview, layer);
|
||||
} else {
|
||||
r3d_src_depth<CHIP>(cmd, cs, iview, layer, VK_FILTER_NEAREST);
|
||||
r3d_dst_depth<CHIP>(cs, iview, layer);
|
||||
}
|
||||
} else {
|
||||
r3d_src_sysmem_load<CHIP>(cmd, cs, iview, layer, VK_FILTER_NEAREST);
|
||||
r3d_dst<CHIP>(cs, &iview->view, layer, format);
|
||||
}
|
||||
|
||||
r3d_run_multi(cmd, cs, count);
|
||||
|
||||
r3d_teardown<CHIP>(cmd, cs);
|
||||
}
|
||||
|
||||
template <chip CHIP>
|
||||
void
|
||||
tu_blit_subsampled_apron(struct tu_cmd_buffer *cmd,
|
||||
struct tu_cs *cs,
|
||||
const struct tu_image_view *iview,
|
||||
unsigned layer,
|
||||
const VkRect2D *dst_coord,
|
||||
const tu_rect2d_float *src_coord,
|
||||
unsigned count)
|
||||
{
|
||||
if (iview->image->vk.format == VK_FORMAT_D32_SFLOAT_S8_UINT) {
|
||||
blit_subsampled_apron<CHIP>(cmd, cs, iview, VK_FORMAT_D32_SFLOAT, layer,
|
||||
dst_coord, src_coord, count);
|
||||
blit_subsampled_apron<CHIP>(cmd, cs, iview, VK_FORMAT_S8_UINT, layer,
|
||||
dst_coord, src_coord, count);
|
||||
} else {
|
||||
blit_subsampled_apron<CHIP>(cmd, cs, iview, iview->vk.format, layer,
|
||||
dst_coord, src_coord, count);
|
||||
}
|
||||
}
|
||||
TU_GENX(tu_blit_subsampled_apron);
@@ -100,4 +100,14 @@ tu_cmd_fill_buffer_addr(VkCommandBuffer commandBuffer,
                         VkDeviceSize fillSize,
                         uint32_t data);

+template <chip CHIP>
+void
+tu_blit_subsampled_apron(struct tu_cmd_buffer *cmd,
+                         struct tu_cs *cs,
+                         const struct tu_image_view *iview,
+                         unsigned layer,
+                         const VkRect2D *dst_coord,
+                         const tu_rect2d_float *src_coord,
+                         unsigned count);
+
 #endif /* TU_CLEAR_BLIT_H */
@@ -22,6 +22,7 @@
 #include "tu_knl.h"
 #include "tu_tile_config.h"
 #include "tu_tracepoints.h"
+#include "tu_subsampled_image.h"

 #include "common/freedreno_gpu_event.h"
 #include "common/freedreno_lrz.h"
@@ -1733,6 +1734,29 @@ tu6_emit_tile_select(struct tu_cmd_buffer *cmd,
       }
    }

+   if (CHIP >= A7XX) {
+      /* Without FDM offset, b_s = b_cs which is always aligned. With FDM
+       * offset, it may not be aligned. However with FDM offset and
+       * subsampled, we shift the subsampled coordinates to align the bins,
+       * so we can enable the fast path except for the last row/column where
+       * the end has to be aligned to the framebuffer end.
+       *
+       * We don't just directly check for aligned-ness because that depends
+       * on the actual offset, and significantly changing the performance
+       * could result in jank between frames as the offset changes.
+       */
+      bool use_fast_store = (!fdm_offsets && !bin_scale_en) ||
+         (tile->subsampled_views == tile->visible_views &&
+          !tile->subsampled_border);
+
+      tu7_set_pred_mask(cs, (1u << TU_PREDICATE_FAST_STORE) |
+                            (1u << TU_PREDICATE_NO_FAST_STORE),
+                        (1u << (use_fast_store ?
+                                TU_PREDICATE_FAST_STORE :
+                                TU_PREDICATE_NO_FAST_STORE)));
+   }
+
    util_dynarray_foreach (&cmd->fdm_bin_patchpoints,
                           struct tu_fdm_bin_patchpoint, patch) {
       tu_cs_emit_pkt7(cs, CP_MEM_WRITE, 2 + patch->size);
@@ -2951,6 +2975,16 @@ tu_renderpass_begin(struct tu_cmd_buffer *cmd)
                        MESA_VK_DYNAMIC_IA_PRIMITIVE_RESTART_ENABLE);

    cmd->state.fdm_enabled = cmd->state.pass->has_fdm;
+
+   cmd->state.fdm_subsampled = false;
+
+   for (unsigned i = 0; i < cmd->state.framebuffer->attachment_count; i++) {
+      const struct tu_image_view *iview = cmd->state.attachments[i];
+      if (iview && (iview->image->vk.create_flags &
+                    VK_IMAGE_CREATE_SUBSAMPLED_BIT_EXT)) {
+         cmd->state.fdm_subsampled = true;
+      }
+   }
 }

 static inline bool
@@ -3169,6 +3203,18 @@ tu6_sysmem_render_end(struct tu_cmd_buffer *cmd, struct tu_cs *cs,
    tu_cs_emit_pkt7(cs, CP_SKIP_IB2_ENABLE_GLOBAL, 1);
    tu_cs_emit(cs, 0x0);

+   if (cmd->state.fdm_subsampled) {
+      for (unsigned i = 0; i < cmd->state.pass->attachment_count; i++) {
+         if (i != cmd->state.pass->fragment_density_map.attachment &&
+             cmd->state.pass->attachments[i].store) {
+            /* emit dummy subsampled metadata since we didn't use FDM */
+            tu_emit_subsampled_metadata(cmd, &cmd->cs, i,
+                                        NULL, NULL, NULL,
+                                        cmd->state.framebuffer, NULL);
+         }
+      }
+   }
+
    tu_lrz_sysmem_end<CHIP>(cmd, cs);

    /* Clear the resource list for any LRZ resources we emitted at the
@@ -3651,6 +3697,73 @@ tu_allocate_transient_attachments(struct tu_cmd_buffer *cmd, bool sysmem)
    return VK_SUCCESS;
 }

+template <chip CHIP>
+static void
+tu_emit_subsampled(struct tu_cmd_buffer *cmd,
+                   const struct tu_tile_config *tiles,
+                   const struct tu_tiling_config *tiling,
+                   const struct tu_vsc_config *vsc,
+                   const struct tu_framebuffer *fb,
+                   const VkOffset2D *fdm_offsets)
+{
+   struct tu_cs *cs = &cmd->cs;
+
+   for (unsigned i = 0; i < cmd->state.pass->attachment_count; i++) {
+      if (i != cmd->state.pass->fragment_density_map.attachment &&
+          cmd->state.pass->attachments[i].store) {
+         tu_emit_subsampled_metadata(cmd, cs, i,
+                                     tiles, tiling, vsc,
+                                     cmd->state.framebuffer,
+                                     fdm_offsets);
+      }
+   }
+
+   /* We may have subsampled images without FDM if FDM is disabled due to
+    * multisampled loads/stores, in which case we only need to emit the
+    * metadata.
+    */
+   if (!tiles)
+      return;
+
+   /* Flush for GMEM -> UCHE */
+   cmd->state.cache.pending_flush_bits |=
+      TU_CMD_FLAG_CACHE_INVALIDATE |
+      TU_CMD_FLAG_WAIT_FOR_IDLE;
+
+   VkRect2D *dst =
+      (VkRect2D *)malloc(8 * vsc->tile_count.width * vsc->tile_count.height *
+                         (sizeof(VkRect2D) + sizeof(struct tu_rect2d_float)));
+   struct tu_rect2d_float *src =
+      (struct tu_rect2d_float *)(dst + 8 * vsc->tile_count.width * vsc->tile_count.height);
+   unsigned count;
+
+   /* Iterate over layers and then attachments so that we don't recompute the
+    * list of areas to copy for each attachment.
+    */
+   for (unsigned layer = 0; layer < MAX2(cmd->state.pass->num_views,
+                                         fb->layers); layer++) {
+      unsigned view = fb->layers > 1 ?
+         (cmd->state.fdm_per_layer ? layer : 0) : layer;
+      count = tu_calc_subsampled_aprons(dst, src, view, tiles, tiling, vsc, fb,
+                                        fdm_offsets);
+
+      if (count != 0) {
+         for (unsigned i = 0; i < cmd->state.pass->attachment_count; i++) {
+            if (i != cmd->state.pass->fragment_density_map.attachment &&
+                cmd->state.pass->attachments[i].store &&
+                (cmd->state.pass->num_views == 0 ||
+                 (cmd->state.pass->attachments[i].used_views & (1u << layer)) ||
+                 (cmd->state.pass->attachments[i].resolve_views & (1u << layer)))) {
+               tu_blit_subsampled_apron<CHIP>(cmd, cs, cmd->state.attachments[i],
+                                              layer, dst, src, count);
+            }
+         }
+      }
+   }
+
+   free(dst);
+}
+
 template <chip CHIP>
 static void
 tu_cmd_render_tiles(struct tu_cmd_buffer *cmd,
@@ -3750,6 +3863,17 @@ tu_cmd_render_tiles(struct tu_cmd_buffer *cmd,

    tu6_tile_render_end<CHIP>(cmd, &cmd->cs, autotune_result);

+   /* Outside of renderpasses we assume all draw states are disabled. We do
+    * this outside the draw CS for the normal case where 3d gmem stores aren't
+    * used. Do this before emitting subsampled blits.
+    */
+   tu_disable_draw_states(cmd, &cmd->cs);
+
+   if (cmd->state.fdm_subsampled) {
+      tu_emit_subsampled<CHIP>(cmd, tiles, tiling, vsc, cmd->state.framebuffer,
+                               fdm_offsets);
+   }
+
    tu_trace_end_render_pass<CHIP>(cmd, true);

    /* We have trashed the dynamically-emitted viewport, scissor, and FS params
@@ -3791,6 +3915,9 @@ tu_cmd_render_sysmem(struct tu_cmd_buffer *cmd,

    tu6_sysmem_render_end<CHIP>(cmd, &cmd->cs, autotune_result);

+   /* Outside of renderpasses we assume all draw states are disabled. */
+   tu_disable_draw_states(cmd, &cmd->cs);
+
    tu_clone_trace_range(cmd, &cmd->cs, &cmd->trace,
                         cmd->trace_renderpass_start,
                         u_trace_end_iterator(&cmd->rp_trace));
@@ -3811,13 +3938,6 @@ tu_cmd_render(struct tu_cmd_buffer *cmd_buffer,
       tu_cmd_render_sysmem<CHIP>(cmd_buffer, autotune_result);
    else
       tu_cmd_render_tiles<CHIP>(cmd_buffer, autotune_result, fdm_offsets);
-
-   /* Outside of renderpasses we assume all draw states are disabled. We do
-    * this outside the draw CS for the normal case where 3d gmem stores aren't
-    * used.
-    */
-   tu_disable_draw_states(cmd_buffer, &cmd_buffer->cs);
 }

 static void tu_reset_render_pass(struct tu_cmd_buffer *cmd_buffer)
@@ -5907,7 +6027,8 @@ tu_restore_suspended_pass(struct tu_cmd_buffer *cmd,
    memcpy(cmd->state.render_areas,
           suspended->state.suspended_pass.render_areas,
           sizeof(cmd->state.render_areas));
-   cmd->state.per_layer_render_area = suspended->state.per_layer_render_area;
+   cmd->state.per_layer_render_area = suspended->state.suspended_pass.per_layer_render_area;
+   cmd->state.fdm_subsampled = suspended->state.suspended_pass.fdm_subsampled;
    cmd->state.gmem_layout = suspended->state.suspended_pass.gmem_layout;
    cmd->state.tiling = &cmd->state.framebuffer->tiling[cmd->state.gmem_layout];
    cmd->state.lrz = suspended->state.suspended_pass.lrz;
@@ -6903,6 +7024,7 @@ tu_CmdBeginRendering(VkCommandBuffer commandBuffer,
       tu_lrz_begin_renderpass<CHIP>(cmd);
    }

+   tu_renderpass_begin(cmd);

    if (suspending) {
       cmd->state.suspended_pass.pass = cmd->state.pass;
@@ -6912,6 +7034,8 @@ tu_CmdBeginRendering(VkCommandBuffer commandBuffer,
              cmd->state.render_areas, sizeof(cmd->state.render_areas));
       cmd->state.suspended_pass.per_layer_render_area =
          cmd->state.per_layer_render_area;
+      cmd->state.suspended_pass.fdm_subsampled =
+         cmd->state.fdm_subsampled;
       cmd->state.suspended_pass.attachments = cmd->state.attachments;
       cmd->state.suspended_pass.clear_values = cmd->state.clear_values;
       cmd->state.suspended_pass.gmem_layout = cmd->state.gmem_layout;
@@ -6919,8 +7043,6 @@ tu_CmdBeginRendering(VkCommandBuffer commandBuffer,

    tu_fill_render_pass_state(&cmd->state.vk_rp, cmd->state.pass, cmd->state.subpass);

-   tu_renderpass_begin(cmd);
-
    if (!resuming) {
       cmd->patchpoints_ctx = ralloc_context(NULL);
       tu_emit_subpass_begin<CHIP>(cmd);
@@ -7676,41 +7798,53 @@ fdm_apply_fs_params(struct tu_cmd_buffer *cmd,
       * in which case views will be 1 and we have to replicate the one view
       * to all of the layers.
       */
-      VkExtent2D area = config->frag_areas[MIN2(i, views - 1)];
+      unsigned view = MIN2(i, views - 1);
+      VkExtent2D tile_frag_area = config->frag_areas[view];
       VkRect2D bin = bins[MIN2(i, views - 1)];
-      VkOffset2D offset = tu_fdm_per_bin_offset(area, bin, common_bin_offset);

-      /* For custom resolve, we switch to rendering directly to sysmem and so
-       * the fragment size becomes 1x1. This means we have to scale down
-       * FragCoord when accessing GMEM input attachments.
-       *
-       * TODO: When we support subsampled images, this should also only happen
-       * for non-subsampled images.
-       */
+      /* The space HW FragCoord (as well as viewport and scissor) is in is:
+       * - Without custom resolve, rendering space as usual.
+       * - With custom resolve to non-subsampled images, framebuffer
+       *   space.
+       * - With custom resolve to subsampled images, subsampled space. Its
+       *   origin is subsampled_pos.offset, and it may or may not be scaled
+       *   down depending on whether the view is subsampled.
+       *
+       * For user FragCoord, we need to transform from this space to
+       * framebuffer space. However the transform in the shader performs the
+       * opposite, so we actually need to transform from framebuffer space to
+       * this "custom rendering space". For GMEM FragCoord, we need to
+       * transform this space to rendering space.
+       */
+      VkOffset2D tile_start = common_bin_offset;
+      VkExtent2D rendering_frag_area = tile_frag_area;
+      VkExtent2D gmem_frag_area = (VkExtent2D) { 1, 1 };
       if (state->custom_resolve) {
-         tu_cs_emit(cs, 1 /* width */);
-         tu_cs_emit(cs, 1 /* height */);
-         tu_cs_emit(cs, fui(0.0));
-         tu_cs_emit(cs, fui(0.0));
-      } else {
-         tu_cs_emit(cs, area.width);
-         tu_cs_emit(cs, area.height);
-         tu_cs_emit(cs, fui(offset.x));
-         tu_cs_emit(cs, fui(offset.y));
+         if (config->subsampled)
+            tile_start = config->subsampled_pos[view].offset;
+         else
+            tile_start = bin.offset;
+         if (!(config->subsampled_views & (1u << view))) {
+            rendering_frag_area = (VkExtent2D){ 1, 1 };
+            gmem_frag_area = tile_frag_area;
+         }
       }
+      VkRect2D gmem_bin = bin;
+      gmem_bin.offset = tile_start;
+
+      VkOffset2D offset = tu_fdm_per_bin_offset(rendering_frag_area, bin, tile_start);
+      VkOffset2D gmem_offset = tu_fdm_per_bin_offset(gmem_frag_area, gmem_bin,
+                                                     common_bin_offset);
+
+      tu_cs_emit(cs, rendering_frag_area.width);
+      tu_cs_emit(cs, rendering_frag_area.height);
+      tu_cs_emit(cs, fui(offset.x));
+      tu_cs_emit(cs, fui(offset.y));

       if (i * 2 + 1 < num_consts) {
-         if (state->custom_resolve) {
-            tu_cs_emit(cs, fui(1. / area.width));
-            tu_cs_emit(cs, fui(1. / area.height));
-            tu_cs_emit(cs, fui(offset.x));
-            tu_cs_emit(cs, fui(offset.y));
-         } else {
-            tu_cs_emit(cs, fui(1.0));
-            tu_cs_emit(cs, fui(1.0));
-            tu_cs_emit(cs, fui(0.0));
-            tu_cs_emit(cs, fui(0.0));
-         }
+         tu_cs_emit(cs, fui(1. / gmem_frag_area.width));
+         tu_cs_emit(cs, fui(1. / gmem_frag_area.height));
+         tu_cs_emit(cs, fui(gmem_offset.x));
+         tu_cs_emit(cs, fui(gmem_offset.y));
       }
    }
 }
@@ -551,6 +551,7 @@ struct tu_cmd_state
       const struct tu_framebuffer *framebuffer;
       VkRect2D render_areas[MAX_VIEWS];
       bool per_layer_render_area;
+      bool fdm_subsampled;
       enum tu_gmem_layout gmem_layout;

       const struct tu_image_view **attachments;
@@ -560,6 +561,7 @@ struct tu_cmd_state
    } suspended_pass;

    bool fdm_enabled;
+   bool fdm_subsampled;

    bool tessfactor_addr_set;
    bool predication_active;
@@ -156,6 +156,8 @@ enum tu_predicate_bit {
    TU_PREDICATE_VTX_STATS_RUNNING = 3,
    TU_PREDICATE_VTX_STATS_NOT_RUNNING = 4,
    TU_PREDICATE_FIRST_TILE = 5,
+   TU_PREDICATE_FAST_STORE = 6,
+   TU_PREDICATE_NO_FAST_STORE = 7,
 };

 /* Onchip timestamp register layout. */
@@ -176,6 +178,11 @@ enum tu_onchip_addr {
    */
 };

+struct tu_rect2d_float {
+   float x_start, y_start;
+   float x_end, y_end;
+};
+
 #define TU_GENX(FUNC_NAME) FD_GENX(FUNC_NAME)

@@ -213,4 +220,13 @@ struct tu_suballocator;
 struct tu_subpass;
 struct tu_u_trace_submission_data;

+/* Helper for iterating over layers of an attachment that handles both
+ * multiview and layered rendering cases.
+ */
+#define for_each_layer(layer, layer_mask, layers)                            \
+   for (uint32_t layer = 0;                                                  \
+        layer < ((layer_mask) ? (util_logbase2(layer_mask) + 1) : (layers)); \
+        layer++)                                                             \
+      if (!(layer_mask) || ((layer_mask) & BIT(layer)))
+
 #endif /* TU_COMMON_H */
@@ -32,6 +32,7 @@
 #include "tu_image.h"
 #include "tu_formats.h"
 #include "tu_rmv.h"
+#include "tu_subsampled_image.h"
 #include "bvh/tu_build_interface.h"

 static inline uint8_t *
@@ -43,7 +44,8 @@ pool_base(struct tu_descriptor_pool *pool)
 static uint32_t
 descriptor_size(struct tu_device *dev,
                 const VkDescriptorSetLayoutBinding *binding,
-                VkDescriptorType type)
+                VkDescriptorType type,
+                bool subsampled)
 {
    switch (type) {
    case VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER:
@@ -54,7 +56,7 @@ descriptor_size(struct tu_device *dev,
       * descriptors which are less than 16 dwords. However combined images
       * and samplers are actually two descriptors, so they have size 2.
       */
-      return FDL6_TEX_CONST_DWORDS * 4 * 2;
+      return FDL6_TEX_CONST_DWORDS * 4 * (subsampled ? 3 : 2);
    case VK_DESCRIPTOR_TYPE_STORAGE_BUFFER:
    case VK_DESCRIPTOR_TYPE_STORAGE_BUFFER_DYNAMIC:
       /* isam.v allows using a single 16-bit descriptor for both 16-bit and
@@ -80,7 +82,8 @@ mutable_descriptor_size(struct tu_device *dev,
    uint32_t max_size = 0;

    for (uint32_t i = 0; i < list->descriptorTypeCount; i++) {
-      uint32_t size = descriptor_size(dev, NULL, list->pDescriptorTypes[i]);
+      uint32_t size = descriptor_size(dev, NULL, list->pDescriptorTypes[i],
+                                      false);
       max_size = MAX2(max_size, size);
    }

@@ -194,6 +197,7 @@ tu_CreateDescriptorSetLayout(
       set_layout->binding[b].dynamic_offset_offset = dynamic_offset_size;
       set_layout->binding[b].shader_stages = binding->stageFlags;

+      bool has_subsampled_sampler = false;
       if ((binding->descriptorType == VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER ||
            binding->descriptorType == VK_DESCRIPTOR_TYPE_SAMPLER) &&
           binding->pImmutableSamplers) {
@@ -208,8 +212,12 @@ tu_CreateDescriptorSetLayout(
          bool has_ycbcr_sampler = false;
          for (unsigned i = 0; i < pCreateInfo->pBindings[j].descriptorCount; ++i) {
-            if (tu_sampler_from_handle(binding->pImmutableSamplers[i])->vk.ycbcr_conversion)
+            VK_FROM_HANDLE(tu_sampler, sampler,
+                           binding->pImmutableSamplers[i]);
+            if (sampler->vk.ycbcr_conversion)
                has_ycbcr_sampler = true;
+            if (sampler->vk.flags & VK_SAMPLER_CREATE_SUBSAMPLED_BIT_EXT)
+               has_subsampled_sampler = true;
          }

          if (has_ycbcr_sampler) {
@@ -236,7 +244,8 @@ tu_CreateDescriptorSetLayout(
             mutable_descriptor_size(device, &mutable_info->pMutableDescriptorTypeLists[j]);
       } else {
          set_layout->binding[b].size =
-            descriptor_size(device, binding, binding->descriptorType);
+            descriptor_size(device, binding, binding->descriptorType,
+                            has_subsampled_sampler);
       }

       if (binding->descriptorType == VK_DESCRIPTOR_TYPE_INLINE_UNIFORM_BLOCK)
@@ -365,7 +374,19 @@ tu_GetDescriptorSetLayoutSupport(
          descriptor_sz =
             mutable_descriptor_size(device, &mutable_info->pMutableDescriptorTypeLists[i]);
       } else {
-         descriptor_sz = descriptor_size(device, binding, binding->descriptorType);
+         bool has_subsampled_sampler = false;
+         if (binding->pImmutableSamplers) {
+            for (unsigned i = 0; i < binding->descriptorCount; i++) {
+               VK_FROM_HANDLE(tu_sampler, sampler,
+                              binding->pImmutableSamplers[i]);
+               if (sampler->vk.flags & VK_SAMPLER_CREATE_SUBSAMPLED_BIT_EXT) {
+                  has_subsampled_sampler = true;
+                  break;
+               }
+            }
+         }
+         descriptor_sz = descriptor_size(device, binding, binding->descriptorType,
+                                         has_subsampled_sampler);
       }
       uint64_t descriptor_alignment = 4 * FDL6_TEX_CONST_DWORDS;

@@ -453,6 +474,9 @@ sha1_update_descriptor_set_binding_layout(struct mesa_sha1 *ctx,
    SHA1_UPDATE_VALUE(ctx, layout->dynamic_offset_offset);
    SHA1_UPDATE_VALUE(ctx, layout->immutable_samplers_offset);

+   const struct tu_sampler *samplers =
+      tu_immutable_samplers(set_layout, layout);
+
    const struct vk_ycbcr_conversion_state *ycbcr_samplers =
       tu_immutable_ycbcr_samplers(set_layout, layout);

@@ -460,6 +484,16 @@ sha1_update_descriptor_set_binding_layout(struct mesa_sha1 *ctx,
       for (unsigned i = 0; i < layout->array_size; i++)
          sha1_update_ycbcr_sampler(ctx, ycbcr_samplers + i);
    }
+
+   if (samplers) {
+      for (unsigned i = 0; i < layout->array_size; i++) {
+         if (samplers[i].vk.flags & VK_SAMPLER_CREATE_SUBSAMPLED_BIT_EXT) {
+            SHA1_UPDATE_VALUE(ctx, i);
+            SHA1_UPDATE_VALUE(ctx, samplers[i].vk.address_mode_u);
+            SHA1_UPDATE_VALUE(ctx, samplers[i].vk.address_mode_v);
+         }
+      }
+   }
 }

@@ -721,7 +755,7 @@ tu_CreateDescriptorPool(VkDevice _device,
       switch (pool_size->type) {
       case VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER_DYNAMIC:
       case VK_DESCRIPTOR_TYPE_STORAGE_BUFFER_DYNAMIC:
-         dynamic_size += descriptor_size(device, NULL, pool_size->type) *
+         dynamic_size += descriptor_size(device, NULL, pool_size->type, false) *
                          pool_size->descriptorCount;
          break;
       case VK_DESCRIPTOR_TYPE_MUTABLE_EXT:
@@ -740,7 +774,11 @@ tu_CreateDescriptorPool(VkDevice _device,
          bo_size += pool_size->descriptorCount;
          break;
       default:
-         bo_size += descriptor_size(device, NULL, pool_size->type) *
+         /* We don't know whether this pool will be used with subsampled
+          * images, so we have to assume it may be.
+          */
+         bo_size += descriptor_size(device, NULL, pool_size->type,
+                                    device->vk.enabled_features.fragmentDensityMap) *
                     pool_size->descriptorCount;
          break;
       }
@@ -1084,15 +1122,35 @@ static void
 write_combined_image_sampler_descriptor(uint32_t *dst,
                                         VkDescriptorType descriptor_type,
                                         const VkDescriptorImageInfo *image_info,
-                                        bool has_sampler)
+                                        bool write_sampler,
+                                        const struct tu_sampler *immutable_sampler)
 {
    write_image_descriptor(dst, descriptor_type, image_info);
+
    /* copy over sampler state */
-   if (has_sampler) {
+   if (write_sampler) {
       VK_FROM_HANDLE(tu_sampler, sampler, image_info->sampler);
       memcpy(dst + FDL6_TEX_CONST_DWORDS, sampler->descriptor, sizeof(sampler->descriptor));
    }
+
+   /* It's technically legal to sample from a mismatched descriptor (i.e. only
+    * the sampler or only the image has SUBSAMPLED_BIT) but it gives undefined
+    * results. So we have to make sure not to crash or disturb other
+    * descriptors. Therefore we check the sampler, because that's what
+    * triggers allocating extra space in the descriptor set.
+    */
+   if (immutable_sampler &&
+       (immutable_sampler->vk.flags & VK_SAMPLER_CREATE_SUBSAMPLED_BIT_EXT)) {
+      VK_FROM_HANDLE(tu_image_view, iview, image_info->imageView);
+      VkDescriptorAddressInfoEXT info = {
+         .address = iview->image->iova +
+            iview->image->subsampled_metadata_offset +
+            iview->vk.base_array_layer * sizeof(struct tu_subsampled_metadata),
+         .range =
+            iview->vk.layer_count * sizeof(struct tu_subsampled_metadata),
+      };
+      write_ubo_descriptor_addr(dst + 2 * FDL6_TEX_CONST_DWORDS, &info);
+   }
 }

 static void
@@ -1156,12 +1214,15 @@ tu_GetDescriptorEXT(
       write_image_descriptor(dest, VK_DESCRIPTOR_TYPE_STORAGE_IMAGE,
                              pDescriptorInfo->data.pStorageImage);
       break;
-   case VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER:
+   case VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER: {
+      VK_FROM_HANDLE(tu_sampler, sampler,
+                     pDescriptorInfo->data.pCombinedImageSampler->sampler);
       write_combined_image_sampler_descriptor(dest,
                                               VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER,
                                               pDescriptorInfo->data.pCombinedImageSampler,
-                                              true);
+                                              true, sampler);
       break;
+   }
    case VK_DESCRIPTOR_TYPE_SAMPLER:
       write_sampler_descriptor(dest, *pDescriptorInfo->data.pSampler);
       break;
@@ -1285,7 +1346,8 @@ tu_update_descriptor_sets(const struct tu_device *device,
             write_combined_image_sampler_descriptor(ptr,
                                                     writeset->descriptorType,
                                                     writeset->pImageInfo + j,
-                                                    !binding_layout->immutable_samplers_offset);
+                                                    !samplers,
+                                                    samplers ? &samplers[writeset->dstArrayElement + j] : NULL);

             if (copy_immutable_samplers)
                write_sampler_push(ptr + FDL6_TEX_CONST_DWORDS, &samplers[writeset->dstArrayElement + j]);
@@ -1636,7 +1698,8 @@ tu_update_descriptor_set_with_template(
            write_combined_image_sampler_descriptor(ptr,
                                                    templ->entry[i].descriptor_type,
                                                    (const VkDescriptorImageInfo *) src,
-                                                   !samplers);
+                                                   !samplers,
+                                                   samplers ? &samplers[j] : NULL);
            if (templ->entry[i].copy_immutable_samplers)
               write_sampler_push(ptr + FDL6_TEX_CONST_DWORDS, &samplers[j]);
            break;
@@ -1411,7 +1411,7 @@ tu_get_properties(struct tu_physical_device *pdevice,
    props->samplerDescriptorBufferAddressSpaceSize = ~0ull;
    props->resourceDescriptorBufferAddressSpaceSize = ~0ull;
    props->descriptorBufferAddressSpaceSize = ~0ull;
-   props->combinedImageSamplerDensityMapDescriptorSize = 2 * FDL6_TEX_CONST_DWORDS * 4;
+   props->combinedImageSamplerDensityMapDescriptorSize = 3 * FDL6_TEX_CONST_DWORDS * 4;

    /* VK_EXT_legacy_vertex_attributes */
    props->nativeUnalignedPerformance = true;
@@ -44,6 +44,7 @@

 enum global_shader {
    GLOBAL_SH_VS_BLIT,
+   GLOBAL_SH_VS_MULTI_BLIT,
    GLOBAL_SH_VS_CLEAR,
    GLOBAL_SH_FS_BLIT,
    GLOBAL_SH_FS_BLIT_ZSCALE,
@@ -29,6 +29,7 @@
 #include "tu_formats.h"
 #include "tu_lrz.h"
 #include "tu_rmv.h"
+#include "tu_subsampled_image.h"
 #include "tu_wsi.h"

 uint32_t
@@ -538,6 +539,15 @@ tu_image_update_layout(struct tu_device *device, struct tu_image *image,
       /* no UBWC for separate stencil */
       image->ubwc_enabled = false;

+   /* Subsampled images with FDM offset require extra space for adjusting
+    * the offset to make the tiles aligned.
+    */
+   if ((image->vk.create_flags & VK_IMAGE_CREATE_SUBSAMPLED_BIT_EXT) &&
+       (image->vk.create_flags & VK_IMAGE_CREATE_FRAGMENT_DENSITY_MAP_OFFSET_BIT_EXT)) {
+      width0 += device->physical_device->info->tile_align_w;
+      height0 += device->physical_device->info->tile_align_h;
+   }
+
    struct fdl_explicit_layout plane_layout;

    if (plane_layouts) {
@@ -634,6 +644,12 @@ tu_image_update_layout(struct tu_device *device, struct tu_image *image,
       image->lrz_layout.lrz_total_size = 0;
    }

+   if (image->vk.create_flags & VK_IMAGE_CREATE_SUBSAMPLED_BIT_EXT) {
+      image->subsampled_metadata_offset = align64(image->total_size, 16);
+      image->total_size = image->subsampled_metadata_offset +
+         image->vk.array_layers * sizeof(struct tu_subsampled_metadata);
+   }
+
    return VK_SUCCESS;
 }
 TU_GENX(tu_image_update_layout);
@@ -34,6 +34,7 @@ struct tu_image
    struct vk_image vk;

    struct fdl_layout layout[3];
+   uint64_t subsampled_metadata_offset;
    uint64_t total_size;

    /* Set when bound */
@@ -2732,32 +2732,46 @@ fdm_apply_viewports(struct tu_cmd_buffer *cmd, struct tu_cs *cs, void *data,
       * renderpass, views will be 1 and we also have to replicate the 0'th
       * view to every view.
       */
-      VkExtent2D frag_area =
-         (state->share_scale || views == 1) ? tile->frag_areas[0] : tile->frag_areas[i];
-      VkRect2D bin =
-         (state->share_scale || views == 1) ? bins[0] : bins[i];
-      VkOffset2D hw_viewport_offset =
-         (state->share_scale || views == 1) ? hw_viewport_offsets[0] :
-                                              hw_viewport_offsets[i];
+      unsigned view = (state->share_scale || views == 1) ? 0 : i;
+      VkExtent2D frag_area = tile->frag_areas[view];
+      VkRect2D bin = bins[view];
+      VkOffset2D hw_viewport_offset = hw_viewport_offsets[view];
       /* Implement fake_single_viewport by replicating viewport 0 across all
        * views.
        */
       VkViewport viewport =
          state->fake_single_viewport ? state->vp.viewports[0] : state->vp.viewports[i];
-      if ((frag_area.width == 1 && frag_area.height == 1 &&
-           common_bin_offset.x == bin.offset.x &&
-           common_bin_offset.y == bin.offset.y) ||
-          /* When in a custom resolve operation (TODO: and using
-           * non-subsampled images) we switch to framebuffer coordinates so we
-           * shouldn't apply the transform. However the binning pass isn't
-           * aware of this, so we have to keep applying the transform for
-           * binning.
-           */
-          (state->custom_resolve && !binning)) {
+      if (frag_area.width == 1 && frag_area.height == 1 &&
+          common_bin_offset.x == bin.offset.x &&
+          common_bin_offset.y == bin.offset.y) {
          vp.viewports[i] = viewport;
          continue;
       }
 
+      /* When custom resolve is enabled, we need to apply the viewport
+       * transform so that we render to where we would've blitted the tile to.
+       * Without subsampled images, this is the framebuffer space bin (so there
+       * is effectively no transform). With subsampled images, this is
+       * subsampled space, which may not be the same as rendering space if
+       * we had to shift the tile or with FDM offset.
+       */
+      VkOffset2D tile_start = common_bin_offset;
+      if (state->custom_resolve && !binning) {
+         if (tile->subsampled)
+            tile_start = tile->subsampled_pos[view].offset;
+         else
+            tile_start = bin.offset;
+      }
+
+      /* When in a custom resolve operation without subsampling we shouldn't
+       * scale the viewport down. However the binning pass isn't aware of
+       * this, so we have to keep applying the transform for binning.
+       */
+      if (state->custom_resolve &&
+          !(tile->subsampled_views & (1u << view)) && !binning) {
+         frag_area = (VkExtent2D) {1, 1};
+      }
+
       float scale_x = (float) 1.0f / frag_area.width;
       float scale_y = (float) 1.0f / frag_area.height;
@@ -2767,9 +2781,12 @@ fdm_apply_viewports(struct tu_cmd_buffer *cmd, struct tu_cs *cs, void *data,
       vp.viewports[i].height = viewport.height * scale_y;
 
       VkOffset2D offset = tu_fdm_per_bin_offset(frag_area, bin,
-                                                common_bin_offset);
-      offset.x -= hw_viewport_offset.x;
-      offset.y -= hw_viewport_offset.y;
+                                                tile_start);
+      /* FDM offsets are disabled with custom resolve. */
+      if (!state->custom_resolve) {
+         offset.x -= hw_viewport_offset.x;
+         offset.y -= hw_viewport_offset.y;
+      }
 
       vp.viewports[i].x = scale_x * viewport.x + offset.x;
       vp.viewports[i].y = scale_y * viewport.y + offset.y;
@@ -2861,15 +2878,33 @@ fdm_apply_scissors(struct tu_cmd_buffer *cmd, struct tu_cs *cs, void *data,
    struct vk_viewport_state vp = state->vp;
 
    for (unsigned i = 0; i < vp.scissor_count; i++) {
-      VkExtent2D frag_area =
-         (state->share_scale || views == 1) ? tile->frag_areas[0] : tile->frag_areas[i];
-      VkRect2D bin =
-         (state->share_scale || views == 1) ? bins[0] : bins[i];
+      unsigned view = (state->share_scale || views == 1) ? 0 : i;
+      VkExtent2D frag_area = tile->frag_areas[view];
+      VkRect2D bin = bins[view];
       VkRect2D scissor =
          state->fake_single_viewport ? state->vp.scissors[0] : state->vp.scissors[i];
-      VkOffset2D hw_viewport_offset =
-         (state->share_scale || views == 1) ? hw_viewport_offsets[0] :
-                                              hw_viewport_offsets[i];
+      VkOffset2D hw_viewport_offset = hw_viewport_offsets[view];
+
+      VkOffset2D tile_start = common_bin_offset;
+      if (state->custom_resolve && !binning) {
+         if (tile->subsampled)
+            tile_start = tile->subsampled_pos[view].offset;
+         else
+            tile_start = bin.offset;
+      }
+
+      /* Disable scaling when doing a custom resolve to a non-subsampled image
+       * and not in the binning pass, because we use framebuffer coordinates.
+       */
+      if (state->custom_resolve &&
+          !(tile->subsampled_views & (1u << view)) && !binning) {
+         frag_area = (VkExtent2D) {1, 1};
+      }
+
+      if (!state->custom_resolve) {
+         tile_start.x -= hw_viewport_offset.x;
+         tile_start.y -= hw_viewport_offset.y;
+      }
+
       /* Transform the scissor following the viewport. It's unclear how this
        * is supposed to handle cases where the scissor isn't aligned to the
@@ -2878,22 +2913,7 @@ fdm_apply_scissors(struct tu_cmd_buffer *cmd, struct tu_cs *cs, void *data,
       * isn't aligned to the fragment area.
       */
      VkOffset2D offset = tu_fdm_per_bin_offset(frag_area, bin,
-                                               common_bin_offset);
-     offset.x -= hw_viewport_offset.x;
-     offset.y -= hw_viewport_offset.y;
-
-     /* Disable scaling and offset when doing a custom resolve to a
-      * non-subsampled image and not in the binning pass, because we
-      * use framebuffer coordinates.
-      *
-      * TODO: When we support subsampled images, only do this for
-      * non-subsampled images.
-      */
-     if (state->custom_resolve && !binning) {
-        offset = (VkOffset2D) {};
-        frag_area = (VkExtent2D) {1, 1};
-     }
-
+                                               tile_start);
      VkOffset2D min = {
         scissor.offset.x / frag_area.width + offset.x,
         scissor.offset.y / frag_area.height + offset.y,
@@ -2904,26 +2924,17 @@ fdm_apply_scissors(struct tu_cmd_buffer *cmd, struct tu_cs *cs, void *data,
      };
 
      /* Intersect scissor with the scaled bin, this essentially replaces the
-      * window scissor. With custom resolve (TODO: and non-subsampled images)
-      * we have to use the unscaled bin instead.
+      * window scissor. With custom resolve we have to use the unscaled bin
+      * instead.
       */
      uint32_t scaled_width = bin.extent.width / frag_area.width;
      uint32_t scaled_height = bin.extent.height / frag_area.height;
-     int32_t bin_x;
-     int32_t bin_y;
-     if (state->custom_resolve && !binning) {
-        bin_x = bin.offset.x;
-        bin_y = bin.offset.y;
-     } else {
-        bin_x = common_bin_offset.x - hw_viewport_offset.x;
-        bin_y = common_bin_offset.y - hw_viewport_offset.y;
-     }
-     vp.scissors[i].offset.x = MAX2(min.x, bin_x);
-     vp.scissors[i].offset.y = MAX2(min.y, bin_y);
+     vp.scissors[i].offset.x = MAX2(min.x, tile_start.x);
+     vp.scissors[i].offset.y = MAX2(min.y, tile_start.y);
      vp.scissors[i].extent.width =
-        MIN2(max.x, bin_x + scaled_width) - vp.scissors[i].offset.x;
+        MIN2(max.x, tile_start.x + scaled_width) - vp.scissors[i].offset.x;
      vp.scissors[i].extent.height =
-        MIN2(max.y, bin_y + scaled_height) - vp.scissors[i].offset.y;
+        MIN2(max.y, tile_start.y + scaled_height) - vp.scissors[i].offset.y;
   }
 
   TU_CALLX(cs->device, tu6_emit_scissor)(cs, &vp);
@@ -21,6 +21,7 @@
 #include "tu_lrz.h"
 #include "tu_pipeline.h"
 #include "tu_rmv.h"
+#include "tu_subsampled_image.h"
 
 #include <initializer_list>
@@ -506,7 +507,7 @@ lower_ssbo_ubo_intrinsic(struct tu_device *dev,
 
 static nir_def *
 build_bindless(struct tu_device *dev, nir_builder *b,
-               nir_deref_instr *deref, bool is_sampler,
+               nir_deref_instr *deref, unsigned combined_descriptor_offset,
                struct tu_shader *shader,
                const struct tu_pipeline_layout *layout,
                uint32_t read_only_input_attachments,
@@ -568,9 +569,8 @@ build_bindless(struct tu_device *dev, nir_builder *b,
    /* Samplers come second in combined image/sampler descriptors, see
     * write_combined_image_sampler_descriptor().
     */
-   if (is_sampler && bind_layout->type ==
-       VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER) {
-      offset = 1;
+   if (bind_layout->type == VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER) {
+      offset = combined_descriptor_offset;
    }
    desc_offset =
       nir_imm_int(b, (bind_layout->offset / (4 * FDL6_TEX_CONST_DWORDS)) +
@@ -594,7 +594,7 @@ lower_image_deref(struct tu_device *dev, nir_builder *b,
                   const struct tu_pipeline_layout *layout)
 {
    nir_deref_instr *deref = nir_src_as_deref(instr->src[0]);
-   nir_def *bindless = build_bindless(dev, b, deref, false, shader, layout, 0, false);
+   nir_def *bindless = build_bindless(dev, b, deref, 0, shader, layout, 0, false);
    nir_rewrite_image_intrinsic(instr, bindless, true);
 }
@@ -697,42 +697,93 @@ lower_intrinsic(nir_builder *b, nir_intrinsic_instr *instr,
 }
 
 static void
-lower_tex_ycbcr(const struct tu_pipeline_layout *layout,
+lower_tex_subsampled(const struct tu_sampler *sampler,
+                     struct tu_device *dev,
+                     struct tu_shader *shader,
+                     const struct tu_pipeline_layout *layout,
+                     nir_builder *b,
+                     nir_tex_instr *tex)
+{
+   /* Only these ops are allowed with subsampled images */
+   if (tex->op != nir_texop_tex &&
+       tex->op != nir_texop_txl)
+      return;
+
+   b->cursor = nir_before_instr(&tex->instr);
+
+   int tex_src_idx = nir_tex_instr_src_index(tex, nir_tex_src_texture_deref);
+   assert(tex_src_idx >= 0);
+   nir_deref_instr *deref = nir_src_as_deref(tex->src[tex_src_idx].src);
+   nir_def *bindless = build_bindless(dev, b, deref, 2, shader, layout,
+                                      0, /* read_only_input_attachments (not used) */
+                                      false /* dynamic_renderpass (not used) */
+                                      );
+
+   nir_def *coord = nir_steal_tex_src(tex, nir_tex_src_coord);
+   nir_def *coord_xy = nir_channels(b, coord, 0x3);
+   nir_def *layer = NULL;
+   if (coord->num_components > 2)
+      layer = nir_channel(b, coord, 2);
+
+   /* In order to avoid problems in the math for finding the bin with
+    * an x or y coordinate of exactly 1.0, where we would overflow into the
+    * next bin, we have to clamp to some 1.0 - epsilon. The largest possible
+    * framebuffer is 2^14 pixels currently, and we cannot shift the coordinate
+    * to before the pixel center, so we use 2^-15.
+    */
+   const float epsilon = 0x1p-15f;
+   nir_def *clamped_coord_xy =
+      nir_fmax(b, nir_fmin(b, coord_xy, nir_imm_float(b, 1.0f - epsilon)),
+               nir_imm_float(b, 0.0));
+
+   nir_def *clamped_coord = clamped_coord_xy;
+   if (layer) {
+      clamped_coord = nir_vec3(b, nir_channel(b, clamped_coord_xy, 0),
+                               nir_channel(b, clamped_coord_xy, 1),
+                               layer);
+   }
+
+   nir_def *transformed_coord_xy =
+      tu_get_subsampled_coordinates(b, clamped_coord, bindless);
+
+   /* Due to VUID-VkSamplerCreateInfo-flags-02577 we only have to handle
+    * CLAMP_TO_EDGE and CLAMP_TO_BORDER. We implicitly do CLAMP_TO_EDGE to
+    * prevent OOB accesses to the metadata anyway, so we just fixup the
+    * coordinates to pass the original coordinates if OOB.
+    */
+   if (sampler->vk.address_mode_u == VK_SAMPLER_ADDRESS_MODE_CLAMP_TO_BORDER) {
+      nir_def *x = nir_channel(b, coord, 0);
+      nir_def *oob = nir_fneu(b, nir_fsat(b, x), x);
+      transformed_coord_xy =
+         nir_vec2(b, nir_bcsel(b, oob, x,
+                               nir_channel(b, transformed_coord_xy, 0)),
+                  nir_channel(b, transformed_coord_xy, 1));
+   }
+
+   if (sampler->vk.address_mode_v == VK_SAMPLER_ADDRESS_MODE_CLAMP_TO_BORDER) {
+      nir_def *y = nir_channel(b, coord, 1);
+      nir_def *oob = nir_fneu(b, nir_fsat(b, y), y);
+      transformed_coord_xy =
+         nir_vec2(b, nir_channel(b, transformed_coord_xy, 0),
+                  nir_bcsel(b, oob, y,
+                            nir_channel(b, transformed_coord_xy, 1)));
+   }
+
+   nir_def *transformed_coord = transformed_coord_xy;
+   if (layer) {
+      transformed_coord = nir_vec3(b, nir_channel(b, transformed_coord_xy, 0),
+                                   nir_channel(b, transformed_coord_xy, 1),
+                                   layer);
+   }
+
+   nir_tex_instr_add_src(tex, nir_tex_src_coord, transformed_coord);
+}
 static void
+lower_tex_ycbcr(const struct vk_ycbcr_conversion_state *ycbcr_sampler,
                 nir_builder *builder,
                 nir_tex_instr *tex)
 {
-   int deref_src_idx = nir_tex_instr_src_index(tex, nir_tex_src_texture_deref);
-   assert(deref_src_idx >= 0);
-   nir_deref_instr *deref = nir_src_as_deref(tex->src[deref_src_idx].src);
-
-   nir_variable *var = nir_deref_instr_get_variable(deref);
-   const struct tu_descriptor_set_layout *set_layout =
-      layout->set[var->data.descriptor_set].layout;
-   const struct tu_descriptor_set_binding_layout *binding =
-      &set_layout->binding[var->data.binding];
-   const struct vk_ycbcr_conversion_state *ycbcr_samplers =
-      tu_immutable_ycbcr_samplers(set_layout, binding);
-
-   if (!ycbcr_samplers)
-      return;
-
    /* For the following instructions, we don't apply any change */
    if (tex->op == nir_texop_txs ||
        tex->op == nir_texop_query_levels ||
        tex->op == nir_texop_lod)
       return;
 
-   assert(tex->texture_index == 0);
-   unsigned array_index = 0;
-   if (deref->deref_type != nir_deref_type_var) {
-      assert(deref->deref_type == nir_deref_type_array);
-      if (!nir_src_is_const(deref->arr.index))
-         return;
-      array_index = nir_src_as_uint(deref->arr.index);
-      array_index = MIN2(array_index, binding->array_size - 1);
-   }
-   const struct vk_ycbcr_conversion_state *ycbcr_sampler = ycbcr_samplers + array_index;
-
    if (ycbcr_sampler->ycbcr_model == VK_SAMPLER_YCBCR_MODEL_CONVERSION_RGB_IDENTITY)
       return;
@@ -756,6 +807,55 @@ lower_tex_ycbcr(const struct tu_pipeline_layout *layout,
    builder->cursor = nir_before_instr(&tex->instr);
 }
 
+static void
+lower_tex_immutable(struct tu_device *dev,
+                    struct tu_shader *shader,
+                    const struct tu_pipeline_layout *layout,
+                    nir_builder *builder,
+                    nir_tex_instr *tex)
+{
+   int deref_src_idx = nir_tex_instr_src_index(tex, nir_tex_src_texture_deref);
+   assert(deref_src_idx >= 0);
+   nir_deref_instr *deref = nir_src_as_deref(tex->src[deref_src_idx].src);
+
+   nir_variable *var = nir_deref_instr_get_variable(deref);
+   const struct tu_descriptor_set_layout *set_layout =
+      layout->set[var->data.descriptor_set].layout;
+   const struct tu_descriptor_set_binding_layout *binding =
+      &set_layout->binding[var->data.binding];
+
+   /* For the following instructions, we don't apply any change */
+   if (tex->op == nir_texop_txs ||
+       tex->op == nir_texop_query_levels ||
+       tex->op == nir_texop_lod)
+      return;
+
+   assert(tex->texture_index == 0);
+   unsigned array_index = 0;
+   if (deref->deref_type != nir_deref_type_var) {
+      assert(deref->deref_type == nir_deref_type_array);
+      if (!nir_src_is_const(deref->arr.index))
+         return;
+      array_index = nir_src_as_uint(deref->arr.index);
+      array_index = MIN2(array_index, binding->array_size - 1);
+   }
+
+   const struct vk_ycbcr_conversion_state *ycbcr_samplers =
+      tu_immutable_ycbcr_samplers(set_layout, binding);
+   if (ycbcr_samplers) {
+      const struct vk_ycbcr_conversion_state *ycbcr_sampler = ycbcr_samplers + array_index;
+      lower_tex_ycbcr(ycbcr_sampler, builder, tex);
+   }
+
+   const struct tu_sampler *samplers =
+      tu_immutable_samplers(set_layout, binding);
+   if (samplers) {
+      const struct tu_sampler *sampler = samplers + array_index;
+      if (sampler->vk.flags & VK_SAMPLER_CREATE_SUBSAMPLED_BIT_EXT)
+         lower_tex_subsampled(sampler, dev, shader, layout, builder, tex);
+   }
+}
+
 static bool
 lower_tex_impl(nir_builder *b, nir_tex_instr *tex, struct tu_device *dev,
                struct tu_shader *shader, const struct tu_pipeline_layout *layout,
int sampler_src_idx = nir_tex_instr_src_index(tex, ref ? nir_tex_src_sampler_2_deref : nir_tex_src_sampler_deref);
|
||||
if (sampler_src_idx >= 0) {
|
||||
nir_deref_instr *deref = nir_src_as_deref(tex->src[sampler_src_idx].src);
|
||||
nir_def *bindless = build_bindless(dev, b, deref, true, shader, layout,
|
||||
nir_def *bindless = build_bindless(dev, b, deref, 1, shader, layout,
|
||||
read_only_input_attachments,
|
||||
dynamic_renderpass);
|
||||
nir_src_rewrite(&tex->src[sampler_src_idx].src, bindless);
|
||||
|
|
@@ -775,7 +875,7 @@ lower_tex_impl(nir_builder *b, nir_tex_instr *tex, struct tu_device *dev,
    int tex_src_idx = nir_tex_instr_src_index(tex, ref ? nir_tex_src_texture_2_deref : nir_tex_src_texture_deref);
    if (tex_src_idx >= 0) {
       nir_deref_instr *deref = nir_src_as_deref(tex->src[tex_src_idx].src);
-      nir_def *bindless = build_bindless(dev, b, deref, false, shader, layout,
+      nir_def *bindless = build_bindless(dev, b, deref, 0, shader, layout,
                                          read_only_input_attachments,
                                          dynamic_renderpass);
       nir_src_rewrite(&tex->src[tex_src_idx].src, bindless);
@@ -800,7 +900,7 @@ lower_tex(nir_builder *b, nir_tex_instr *tex, struct tu_device *dev,
       lower_tex_impl(b, tex, dev, shader, layout, read_only_input_attachments, dynamic_renderpass, false);
       lower_tex_impl(b, tex, dev, shader, layout, read_only_input_attachments, dynamic_renderpass, true);
    } else {
-      lower_tex_ycbcr(layout, b, tex);
+      lower_tex_immutable(dev, shader, layout, b, tex);
       lower_tex_impl(b, tex, dev, shader, layout, read_only_input_attachments, dynamic_renderpass, false);
    }
src/freedreno/vulkan/tu_subsampled_image.cc (new file, 584 lines)

@@ -0,0 +1,584 @@
/*
 * Copyright © 2026 Valve Corporation.
 * SPDX-License-Identifier: MIT
 */

#include "tu_cmd_buffer.h"
#include "tu_subsampled_image.h"

#include "nir_builder.h"

/* If a tile is not subsampled, we treat it as if its fragment area is (1,1)
 * for the purposes of subsampling.
 */
static VkExtent2D
get_effective_frag_area(const struct tu_tile_config *tile, unsigned view)
{
   return (tile->subsampled_views & (1u << view)) ?
      tile->frag_areas[view] : (VkExtent2D) {1, 1};
}

void
tu_emit_subsampled_metadata(struct tu_cmd_buffer *cmd,
                            struct tu_cs *cs,
                            unsigned a,
                            const struct tu_tile_config *tiles,
                            const struct tu_tiling_config *tiling,
                            const struct tu_vsc_config *vsc,
                            const struct tu_framebuffer *fb,
                            const VkOffset2D *fdm_offsets)
{
   const struct tu_image_view *iview = cmd->state.attachments[a];
   float size_ratio_x = (float)iview->image->vk.extent.width /
      iview->image->layout[0].width0;
   float size_ratio_y = (float)iview->image->vk.extent.height /
      iview->image->layout[0].height0;
   for_each_layer (i, cmd->state.pass->attachments[a].used_views |
                   cmd->state.pass->attachments[a].resolve_views,
                   fb->layers) {
      struct tu_subsampled_metadata metadata;

      metadata.hdr.pad0[0] = metadata.hdr.pad0[1] = metadata.hdr.pad0[2] = 0;

      unsigned tile_count;
      if (!tiles || vsc->tile_count.width * vsc->tile_count.height >
          TU_SUBSAMPLED_MAX_BINS) {
         tile_count = 1;
         metadata.hdr.scale_x = 1.0;
         metadata.hdr.scale_y = 1.0;
         metadata.hdr.offset_x = 0.0;
         metadata.hdr.offset_y = 0.0;
         metadata.hdr.bin_stride = 1;
         metadata.bins[0].scale_x = size_ratio_x;
         metadata.bins[0].scale_y = size_ratio_y;
         metadata.bins[0].offset_x = 0.0;
         metadata.bins[0].offset_y = 0.0;
      } else {
         unsigned view = MIN2(i, tu_fdm_num_layers(cmd) - 1);
         VkOffset2D bin_offset = {};
         if (fdm_offsets)
            bin_offset = tu_bin_offset(fdm_offsets[view], tiling);
         tile_count = vsc->tile_count.width * vsc->tile_count.height;
         metadata.hdr.scale_x = (float)iview->vk.extent.width / tiling->tile0.width;
         metadata.hdr.scale_y = (float)iview->vk.extent.height / tiling->tile0.height;
         metadata.hdr.offset_x = (float)bin_offset.x / tiling->tile0.width;
         metadata.hdr.offset_y = (float)bin_offset.y / tiling->tile0.height;
         metadata.hdr.bin_stride = vsc->tile_count.width;

         for (unsigned j = 0; j < tile_count; j++) {
            const struct tu_tile_config *tile = &tiles[j];

            while (tile->merged_tile)
               tile = tile->merged_tile;

            if (!(tile->visible_views & (1u << view)) ||
                !tile->subsampled) {
               metadata.bins[j].scale_x = metadata.bins[j].scale_y = 1.0;
               metadata.bins[j].offset_x = metadata.bins[j].offset_y = 0.0;
               continue;
            }

            VkExtent2D frag_area = get_effective_frag_area(tile, view);
            VkOffset2D fb_bin_start = (VkOffset2D) {
               MAX2(tile->pos.x * (int32_t)tiling->tile0.width - bin_offset.x, 0),
               MAX2(tile->pos.y * (int32_t)tiling->tile0.height - bin_offset.y, 0),
            };
            metadata.bins[j].scale_x = 1.0 / frag_area.width * size_ratio_x;
            metadata.bins[j].scale_y = 1.0 / frag_area.height * size_ratio_y;
            metadata.bins[j].offset_x =
               (float)(tile->subsampled_pos[view].offset.x -
                       fb_bin_start.x / frag_area.width) /
               iview->image->layout[0].width0;
            metadata.bins[j].offset_y =
               (float)(tile->subsampled_pos[view].offset.y -
                       fb_bin_start.y / frag_area.height) /
               iview->image->layout[0].height0;
         }
      }

      uint64_t iova = iview->image->iova +
         iview->image->subsampled_metadata_offset +
         sizeof(struct tu_subsampled_metadata) *
         (iview->vk.base_array_layer + i);

      tu_cs_emit_pkt7(cs, CP_MEM_WRITE,
                      2 + (sizeof(struct tu_subsampled_header) +
                           tile_count * sizeof(struct tu_subsampled_bin)) / 4);
      tu_cs_emit_qw(cs, iova);
      tu_cs_emit_array(cs, (const uint32_t *)&metadata.hdr,
                       sizeof(struct tu_subsampled_header) / 4);
      tu_cs_emit_array(cs, (const uint32_t *)&metadata.bins,
                       sizeof(struct tu_subsampled_bin) * tile_count / 4);
   }

   /* The cache-tracking infrastructure can't be aware of subsampled images,
    * so manually make sure the writes land. Sampling as an image should
    * already insert a CACHE_INVALIDATE + WFI.
    */
   cmd->state.cache.pending_flush_bits |=
      TU_CMD_FLAG_WAIT_MEM_WRITES;
}
nir_def *
tu_get_subsampled_coordinates(nir_builder *b,
                              nir_def *coords,
                              nir_def *descriptor)
{
   nir_def *layer;
   if (coords->num_components > 2)
      layer = nir_f2u16(b, nir_channel(b, coords, 2));
   else
      layer = nir_imm_intN_t(b, 0, 16);

   nir_def *layer_offset =
      nir_imul_imm_nuw(b, layer, sizeof(struct tu_subsampled_metadata) / 16);

   nir_def *hdr0 =
      nir_load_ubo(b, 4, 32, descriptor,
                   nir_ishl_imm(b, nir_u2u32(b, layer_offset), 4),
                   .align_mul = 16,
                   .align_offset = 0,
                   .range = TU_SUBSAMPLED_MAX_LAYERS * sizeof(struct tu_subsampled_metadata));
   nir_def *bin_stride =
      nir_load_ubo(b, 1, 32, descriptor,
                   nir_ishl_imm(b, nir_u2u32(b, nir_iadd_imm(b, layer_offset, 1)), 4),
                   .align_mul = 16,
                   .align_offset = 0,
                   .range = TU_SUBSAMPLED_MAX_LAYERS * sizeof(struct tu_subsampled_metadata));

   nir_def *hdr_scale = nir_channels(b, hdr0, 0x3);
   nir_def *hdr_offset = nir_channels(b, hdr0, 0xc);

   nir_def *bin = nir_f2u16(b, nir_ffma(b, coords, hdr_scale, hdr_offset));
   nir_def *bin_idx = nir_iadd(b, nir_imul(b, nir_channel(b, bin, 1),
                                           nir_u2u16(b, bin_stride)),
                               nir_channel(b, bin, 0));

   bin_idx = nir_iadd_imm(b, nir_iadd(b, bin_idx, layer_offset),
                          sizeof(struct tu_subsampled_header) / 16);

   nir_def *bin_data =
      nir_load_ubo(b, 4, 32, descriptor, nir_ishl_imm(b, nir_u2u32(b, bin_idx), 4),
                   .align_mul = 16,
                   .align_offset = 0,
                   .range = TU_SUBSAMPLED_MAX_LAYERS * sizeof(struct tu_subsampled_metadata));

   nir_def *bin_scale = nir_channels(b, bin_data, 0x3);
   nir_def *bin_offset = nir_channels(b, bin_data, 0xc);

   return nir_ffma(b, coords, bin_scale, bin_offset);
}
/* Calculate the y coordinate in subsampled space of a given number of tiles
 * after the start of "tile".
 */
static void
calc_tile_vert_pos(const struct tu_tile_config *tile,
                   const struct tu_tiling_config *tiling,
                   const struct tu_framebuffer *fb,
                   unsigned view,
                   VkOffset2D bin_offset,
                   unsigned tile_offset,
                   unsigned *pos_y_out)
{
   int offset_px = 0;
   if (tile->pos.y == 0 && tile_offset > 0) {
      /* The first row is a partial row with FDM offset. */
      offset_px += tiling->tile0.height - bin_offset.y;
      tile_offset--;
   }
   offset_px += tiling->tile0.height * tile_offset;

   unsigned pos_y = tile->subsampled_pos[view].offset.y +
      offset_px / get_effective_frag_area(tile, view).height;

   /* The last tile is along the framebuffer edge, so clamp to the framebuffer
    * height.
    */
   *pos_y_out = MIN2(pos_y, tile->subsampled_pos[view].offset.y +
                     tile->subsampled_pos[view].extent.height);
}

static void
calc_tile_horiz_pos(const struct tu_tile_config *tile,
                    const struct tu_tiling_config *tiling,
                    const struct tu_framebuffer *fb,
                    unsigned view,
                    VkOffset2D bin_offset,
                    unsigned tile_offset,
                    unsigned *pos_x_out)
{
   int offset_px = 0;
   if (tile->pos.x == 0 && tile_offset > 0) {
      /* The first column is a partial column with FDM offset. */
      offset_px += tiling->tile0.width - bin_offset.x;
      tile_offset--;
   }
   offset_px += tiling->tile0.width * tile_offset;

   unsigned pos_x = tile->subsampled_pos[view].offset.x +
      offset_px / get_effective_frag_area(tile, view).width;

   /* The last tile is along the framebuffer edge, so clamp to the framebuffer
    * width.
    */
   *pos_x_out = MIN2(pos_x, tile->subsampled_pos[view].offset.x +
                     tile->subsampled_pos[view].extent.width);
}
/* Given two tiles "tile" and "other_tile", calculate the y coordinates of
 * their shared vertical edge in subsampled space relative to "tile". That is,
 * calculate the y coordinates along the edge of "tile" where "other_tile"
 * will touch it after scaling up to framebuffer coordinates. The start and
 * end may be the same coordinate if "tile" and "other_tile" only share a
 * corner, but this will be extended when handling corners.
 */
static void
calc_shared_vert_edge(const struct tu_tile_config *tile,
                      const struct tu_tile_config *other_tile,
                      const struct tu_tiling_config *tiling,
                      const struct tu_framebuffer *fb,
                      unsigned view,
                      VkOffset2D bin_offset,
                      unsigned *out_start,
                      unsigned *out_end)
{
   int other_start_tile = MAX2(other_tile->pos.y - tile->pos.y, 0);
   assert(other_start_tile <= tile->sysmem_extent.height);
   calc_tile_vert_pos(tile, tiling, fb, view, bin_offset,
                      other_start_tile, out_start);
   int other_end_tile =
      MIN2(tile->pos.y + tile->sysmem_extent.height,
           other_tile->pos.y + other_tile->sysmem_extent.height) - tile->pos.y;
   assert(other_end_tile >= 0);
   calc_tile_vert_pos(tile, tiling, fb, view, bin_offset,
                      other_end_tile, out_end);
}

static void
calc_shared_horiz_edge(const struct tu_tile_config *tile,
                       const struct tu_tile_config *other_tile,
                       const struct tu_tiling_config *tiling,
                       const struct tu_framebuffer *fb,
                       unsigned view,
                       VkOffset2D bin_offset,
                       unsigned *out_start,
                       unsigned *out_end)
{
   int other_start_tile = MAX2(other_tile->pos.x - tile->pos.x, 0);
   assert(other_start_tile <= tile->sysmem_extent.width);
   calc_tile_horiz_pos(tile, tiling, fb, view, bin_offset,
                       other_start_tile, out_start);
   int other_end_tile =
      MIN2(tile->pos.x + tile->sysmem_extent.width,
           other_tile->pos.x + other_tile->sysmem_extent.width) - tile->pos.x;
   assert(other_end_tile >= 0);
   calc_tile_horiz_pos(tile, tiling, fb, view, bin_offset,
                       other_end_tile, out_end);
}
/* Extend vertical-edge blit start and end for apron corners. */
static void
handle_vertical_corners(const struct tu_tile_config *tile,
                        const struct tu_tile_config *other_tile,
                        unsigned view,
                        VkRect2D *tile_dst,
                        struct tu_rect2d_float *other_src)
{
   float other_apron_height =
      (float)APRON_SIZE * get_effective_frag_area(tile, view).height /
      get_effective_frag_area(other_tile, view).height;
   if ((unsigned)other_src->y_start > other_tile->subsampled_pos[view].offset.y) {
      tile_dst->offset.y -= APRON_SIZE;
      tile_dst->extent.height += APRON_SIZE;
      other_src->y_start -= other_apron_height;
   }
   if ((unsigned)other_src->y_end <
       other_tile->subsampled_pos[view].offset.y +
       other_tile->subsampled_pos[view].extent.height) {
      tile_dst->extent.height += APRON_SIZE;
      other_src->y_end += other_apron_height;
   }
}

static void
handle_horizontal_corners(const struct tu_tile_config *tile,
                          const struct tu_tile_config *other_tile,
                          unsigned view,
                          VkRect2D *tile_dst,
                          struct tu_rect2d_float *other_src)
{
   float other_apron_width =
      (float)APRON_SIZE * get_effective_frag_area(tile, view).width /
      get_effective_frag_area(other_tile, view).width;
   if (other_src->x_start > other_tile->subsampled_pos[view].offset.x) {
      tile_dst->offset.x -= APRON_SIZE;
      tile_dst->extent.width += APRON_SIZE;
      other_src->x_start -= other_apron_width;
   }
   if ((unsigned)other_src->x_end <
       other_tile->subsampled_pos[view].offset.x +
       other_tile->subsampled_pos[view].extent.width) {
      tile_dst->extent.width += APRON_SIZE;
      other_src->x_end += other_apron_width;
   }
}

unsigned
|
||||
tu_calc_subsampled_aprons(VkRect2D *dst,
|
||||
struct tu_rect2d_float *src,
|
||||
unsigned view,
|
||||
const struct tu_tile_config *tiles,
|
||||
const struct tu_tiling_config *tiling,
|
||||
const struct tu_vsc_config *vsc,
|
||||
const struct tu_framebuffer *fb,
|
||||
const VkOffset2D *fdm_offsets)
|
||||
{
|
||||
   unsigned count = 0;

   VkOffset2D bin_offset = {};
   if (fdm_offsets)
      bin_offset = tu_bin_offset(fdm_offsets[view], tiling);

   for (unsigned y = 0; y < vsc->tile_count.height; y++) {
      for (unsigned x = 0; x < vsc->tile_count.width; x++) {
         const struct tu_tile_config *tile =
            &tiles[y * vsc->tile_count.width + x];

         if (tile->merged_tile || !(tile->visible_views & (1u << view)))
            continue;

         int x_neighbor = tile->pos.x + tile->sysmem_extent.width;
         int y_neighbor = tile->pos.y + tile->sysmem_extent.height;

         /* Start with vertically adjacent tiles. For a given neighbor to the
          * right, produce aprons for both this tile and its neighbor along
          * their shared edge. We handle tiles that only share an edge:
          *
          * -------- -------
          * |      |       |
          * | tile | other |
          * |      |       |
          * -------- -------
          *
          * Tiles that only share a corner:
          *
          *          -------
          *         |       |
          *         | other |
          *         |       |
          * -------- -------
          * |      |
          * | tile |
          * |      |
          * --------
          *
          * And tiles where the corner of one tile comes from the edge of
          * another:
          *
          *          -------
          *         |       |
          *         |       |
          *         |       |
          * --------| other |
          * |      |       |
          * | tile |       |
          * |      |       |
          * -------- -------
          *
          */
|
||||
         if (x_neighbor < vsc->tile_count.width) {
            int y_start = MAX2(tile->pos.y - 1, 0);
            int y_end = MIN2(tile->pos.y + tile->sysmem_extent.height,
                             vsc->tile_count.height - 1);
            const struct tu_tile_config *other_tile;

            /* Sweep all tiles directly to the right, keeping in mind
             * merged tiles.
             */
            for (int y = y_start; y <= y_end;
                 y = other_tile->pos.y + other_tile->sysmem_extent.height) {
               other_tile = tu_get_merged_tile_const(
                  &tiles[y * vsc->tile_count.width + x_neighbor]);

               if (!(other_tile->visible_views & (1u << view)))
                  continue;

               /* If they are next to each other then neither needs an apron. */
               if (tile->subsampled_pos[view].offset.x +
                   tile->subsampled_pos[view].extent.width ==
                   other_tile->subsampled_pos[view].offset.x)
                  continue;

               /* If other_tile isn't entirely to the right of tile, it is not
                * vertically adjacent and will be handled below instead.
                */
               if (other_tile->pos.x < tile->pos.x + tile->sysmem_extent.width)
                  continue;

               VkExtent2D frag_area = get_effective_frag_area(tile, view);
               VkExtent2D other_frag_area =
                  get_effective_frag_area(other_tile, view);

               unsigned tile_start, tile_end;
               calc_shared_vert_edge(tile, other_tile, tiling, fb, view,
                                     bin_offset, &tile_start, &tile_end);

               unsigned other_tile_start, other_tile_end;
               calc_shared_vert_edge(other_tile, tile, tiling, fb, view,
                                     bin_offset, &other_tile_start,
                                     &other_tile_end);

               VkRect2D tile_dst;

               tile_dst.offset.y = tile_start;
               tile_dst.extent.height = tile_end - tile_start;

               tile_dst.offset.x = tile->subsampled_pos[view].offset.x +
                  tile->subsampled_pos[view].extent.width;
               tile_dst.extent.width = APRON_SIZE;

               struct tu_rect2d_float other_src;

               other_src.x_start = other_tile->subsampled_pos[view].offset.x;
               other_src.x_end = other_src.x_start +
                  (float)APRON_SIZE * frag_area.width / other_frag_area.width;

               other_src.y_start = other_tile_start;
               other_src.y_end = other_tile_end;

               /* Extend start and end for apron corners. */
               handle_vertical_corners(tile, other_tile, view, &tile_dst,
                                       &other_src);

               /* Add other_tile -> tile blit to the list. */
               dst[count] = tile_dst;
               src[count] = other_src;
               count++;

               VkRect2D other_dst;

               other_dst.offset.y = other_tile_start;
               other_dst.extent.height = other_tile_end - other_tile_start;

               other_dst.offset.x =
                  other_tile->subsampled_pos[view].offset.x - APRON_SIZE;
               other_dst.extent.width = APRON_SIZE;

               struct tu_rect2d_float tile_src;

               tile_src.x_end = tile->subsampled_pos[view].offset.x +
                  tile->subsampled_pos[view].extent.width;
               tile_src.x_start = tile_src.x_end -
                  (float)APRON_SIZE * other_frag_area.width / frag_area.width;

               tile_src.y_start = tile_start;
               tile_src.y_end = tile_end;

               handle_vertical_corners(other_tile, tile, view, &other_dst,
                                       &tile_src);

               /* Add tile -> other_tile blit to the list. */
               dst[count] = other_dst;
               src[count] = tile_src;
               count++;
            }
         }
         /* Now do the same thing but for horizontally adjacent tiles. Because
          * the above loop handled tiles that only share a corner, we only
          * have to handle neighbors below it that share an edge. However,
          * these neighbors may also share a corner if they are merged tiles.
          */
         if (y_neighbor < vsc->tile_count.height) {
            const struct tu_tile_config *other_tile;

            /* Sweep all tiles directly below, keeping in mind merged tiles.
             */
            for (int x = tile->pos.x;
                 x < tile->pos.x + tile->sysmem_extent.width;
                 x = other_tile->pos.x + other_tile->sysmem_extent.width) {
               other_tile = tu_get_merged_tile_const(
                  &tiles[y_neighbor * vsc->tile_count.width + x]);

               if (!(other_tile->visible_views & (1u << view)))
                  continue;

               /* If both are next to each other then neither needs an apron. */
               if (tile->subsampled_pos[view].offset.y +
                   tile->subsampled_pos[view].extent.height ==
                   other_tile->subsampled_pos[view].offset.y)
                  continue;

               VkExtent2D frag_area = get_effective_frag_area(tile, view);
               VkExtent2D other_frag_area =
                  get_effective_frag_area(other_tile, view);

               unsigned tile_start, tile_end;
               calc_shared_horiz_edge(tile, other_tile, tiling, fb, view,
                                      bin_offset, &tile_start, &tile_end);

               unsigned other_tile_start, other_tile_end;
               calc_shared_horiz_edge(other_tile, tile, tiling, fb, view,
                                      bin_offset, &other_tile_start,
                                      &other_tile_end);

               VkRect2D tile_dst;

               tile_dst.offset.x = tile_start;
               tile_dst.extent.width = tile_end - tile_start;

               tile_dst.offset.y = tile->subsampled_pos[view].offset.y +
                  tile->subsampled_pos[view].extent.height;
               tile_dst.extent.height = APRON_SIZE;

               struct tu_rect2d_float other_src;

               other_src.y_start = other_tile->subsampled_pos[view].offset.y;
               other_src.y_end = other_src.y_start +
                  (float)APRON_SIZE * frag_area.height / other_frag_area.height;

               other_src.x_start = other_tile_start;
               other_src.x_end = other_tile_end;

               /* Extend start and end for apron corners. */
               handle_horizontal_corners(tile, other_tile, view, &tile_dst,
                                         &other_src);

               /* Add other_tile -> tile blit to the list. */
               dst[count] = tile_dst;
               src[count] = other_src;
               assert(tile_dst.offset.x >= 0);
               assert(tile_dst.offset.y >= 0);
               count++;

               VkRect2D other_dst;

               other_dst.offset.x = other_tile_start;
               other_dst.extent.width = other_tile_end - other_tile_start;

               other_dst.offset.y =
                  other_tile->subsampled_pos[view].offset.y - APRON_SIZE;
               other_dst.extent.height = APRON_SIZE;

               struct tu_rect2d_float tile_src;

               tile_src.y_end = tile->subsampled_pos[view].offset.y +
                  tile->subsampled_pos[view].extent.height;
               tile_src.y_start = tile_src.y_end -
                  (float)APRON_SIZE * other_frag_area.height / frag_area.height;

               tile_src.x_start = tile_start;
               tile_src.x_end = tile_end;

               handle_horizontal_corners(other_tile, tile, view, &other_dst,
                                         &tile_src);

               /* Add tile -> other_tile blit to the list. */
               dst[count] = other_dst;
               src[count] = tile_src;
               assert(other_dst.offset.x >= 0);
               assert(other_dst.offset.y >= 0);
               count++;
            }
         }
      }
   }

   return count;
}
88  src/freedreno/vulkan/tu_subsampled_image.h  Normal file

@@ -0,0 +1,88 @@
/*
 * Copyright © 2026 Valve Corporation.
 * SPDX-License-Identifier: MIT
 */

#include <stdint.h>

#include "tu_common.h"

/* Describes the format used for subsampled image metadata. This is attached
 * to subsampled images via a separate UBO descriptor after the image
 * descriptor. It is written after the render pass which writes to the image,
 * and is read via code injected into the shader when sampling from a
 * subsampled image.
 */

/* The maximum number of bins a subsampled image can have before we disable
 * subsampling.
 */
#define TU_SUBSAMPLED_MAX_BINS 512

/* The maximum number of layers a view of a subsampled image can have.
 *
 * There is one metadata structure per layer, and the view uses a UBO for the
 * metadata, so this is bounded by the maximum UBO size.
 *
 * TODO: When we implement fdm2, we should expose this as
 * maxSubsampledArrayLayers. The Vulkan spec says that the minimum value for
 * maxSubsampledArrayLayers is 2, so users can only rely on 2 layers even
 * though we support more.
 */
#define TU_SUBSAMPLED_MAX_LAYERS 6

/* This is 2 to allow for floating-point precision errors and in case the user
 * uses bicubic filtering.
 */
#define APRON_SIZE 2

struct tu_subsampled_bin {
   float scale_x;
   float scale_y;
   float offset_x;
   float offset_y;
};

struct tu_subsampled_header {
   /* The bin coordinate to use is calculated as:
    * bin = int(coord * scale + offset)
    */
   float scale_x;
   float scale_y;
   float offset_x;
   float offset_y;

   uint32_t bin_stride;
   uint32_t pad0[3];
};

struct tu_subsampled_metadata {
   struct tu_subsampled_header hdr;

   struct tu_subsampled_bin bins[TU_SUBSAMPLED_MAX_BINS];
};

void
tu_emit_subsampled_metadata(struct tu_cmd_buffer *cmd,
                            struct tu_cs *cs,
                            unsigned a,
                            const struct tu_tile_config *tiles,
                            const struct tu_tiling_config *tiling,
                            const struct tu_vsc_config *vsc,
                            const struct tu_framebuffer *fb,
                            const VkOffset2D *fdm_offsets);

unsigned
tu_calc_subsampled_aprons(VkRect2D *dst,
                          struct tu_rect2d_float *src,
                          unsigned view,
                          const struct tu_tile_config *tiles,
                          const struct tu_tiling_config *tiling,
                          const struct tu_vsc_config *vsc,
                          const struct tu_framebuffer *fb,
                          const VkOffset2D *fdm_offsets);

nir_def *
tu_get_subsampled_coordinates(nir_builder *b,
                              nir_def *coords,
                              nir_def *descriptor);
@@ -10,6 +10,9 @@

#include "tu_cmd_buffer.h"
#include "tu_tile_config.h"
#include "tu_subsampled_image.h"

#include "util/u_worklist.h"

static void
tu_calc_frag_area(struct tu_cmd_buffer *cmd,

@@ -369,6 +372,370 @@ tu_merge_tiles(struct tu_cmd_buffer *cmd, const struct tu_vsc_config *vsc,
   }
}

/* Get the default position of the tile in subsampled space. It may be shifted
 * over later, but it has to stay within the non-subsampled rectangle (i.e.
 * the result we return with frag_area = 1,1). If the tile is made
 * non-subsampled then its frag_area becomes 1,1.
 */
static VkRect2D
get_default_tile_pos(const struct tu_physical_device *phys_dev,
                     struct tu_tile_config *tile,
                     unsigned view,
                     const struct tu_framebuffer *fb,
                     const struct tu_tiling_config *tiling,
                     const VkOffset2D *fdm_offsets,
                     VkExtent2D frag_area)
{
   VkOffset2D offset = {};
   if (fdm_offsets)
      offset = tu_bin_offset(fdm_offsets[view], tiling);
   VkOffset2D aligned_offset = {};
   aligned_offset.x = offset.x / phys_dev->info->tile_align_w *
      phys_dev->info->tile_align_w;
   aligned_offset.y = offset.y / phys_dev->info->tile_align_h *
      phys_dev->info->tile_align_h;
   int32_t fb_start_x =
      MAX2(tile->pos.x * (int32_t)tiling->tile0.width - offset.x, 0);
   int32_t fb_end_x =
      (tile->pos.x + tile->sysmem_extent.width) * tiling->tile0.width - offset.x;
   int32_t fb_start_y =
      MAX2(tile->pos.y * (int32_t)tiling->tile0.height - offset.y, 0);
   int32_t fb_end_y =
      (tile->pos.y + tile->sysmem_extent.height) * tiling->tile0.height - offset.y;

   /* For tiles in the last row/column, we cannot create an apron for their
    * right/bottom edges because we don't know what addressing mode the
    * sampler will use. If the edge of the framebuffer is the same as the edge
    * of the image, then when sampling the image near the edge we'd expect the
    * sampler border handling to kick in, but that doesn't work unless the
    * tile is shifted to the end of the framebuffer. Because the images are
    * made larger, we have to shift it over by the same amount, which is
    * currently gmem_align_w/gmem_align_h, so that if the framebuffer is the
    * same size as the original API image then the border works correctly.
    *
    * For tiles not in the first row/column, we align the FDM offset down so
    * that we can use the faster tile store method. This means that the
    * subsampled space tile start may be shifted compared to framebuffer
    * space. This will create a gap between the first and second tiles, which
    * will require an apron even if neither is subsampled. This works because
    * gmem_align_w/gmem_align_h is always at least the apron size times two.
    */
   bool stick_to_end_x = fb_end_x >= fb->width;
   bool stick_to_end_y = fb_end_y >= fb->height;
   unsigned fb_offset_x = fdm_offsets ? phys_dev->info->tile_align_w : 0;
   unsigned fb_offset_y = fdm_offsets ? phys_dev->info->tile_align_h : 0;
   int32_t start_x, end_x, start_y, end_y;
   if (stick_to_end_x) {
      end_x = fb->width + fb_offset_x;
      start_x = end_x - DIV_ROUND_UP(fb->width - fb_start_x, frag_area.width);
   } else if (tile->pos.x == 0) {
      start_x = 0;
      end_x = fb_end_x / frag_area.width;
   } else {
      start_x = tile->pos.x * tiling->tile0.width - aligned_offset.x;
      end_x = start_x +
         tile->sysmem_extent.width * tiling->tile0.width / frag_area.width;
   }

   if (stick_to_end_y) {
      end_y = fb->height + fb_offset_y;
      start_y = end_y - DIV_ROUND_UP(fb->height - fb_start_y, frag_area.height);
   } else if (tile->pos.y == 0) {
      start_y = 0;
      end_y = fb_end_y / frag_area.height;
   } else {
      start_y = tile->pos.y * tiling->tile0.height - aligned_offset.y;
      end_y = start_y +
         tile->sysmem_extent.height * tiling->tile0.height / frag_area.height;
   }

   if (stick_to_end_x || stick_to_end_y)
      tile->subsampled_border = true;

   return (VkRect2D) {
      .offset = { start_x, start_y },
      .extent = { end_x - start_x, end_y - start_y },
   };
}

static void
make_non_subsampled(const struct tu_physical_device *phys_dev,
                    struct tu_tile_config *tile,
                    unsigned view,
                    const struct tu_framebuffer *fb,
                    const struct tu_tiling_config *tiling,
                    const VkOffset2D *fdm_offsets)
{
   tile->subsampled_views &= ~(1u << view);
   tile->subsampled_pos[view] =
      get_default_tile_pos(phys_dev, tile, view, fb, tiling, fdm_offsets,
                           (VkExtent2D) { 1, 1 });
}

static bool
aprons_intersect(struct tu_tile_config *a, struct tu_tile_config *b,
                 unsigned view)
{
   if (a->subsampled_pos[view].offset.x +
       a->subsampled_pos[view].extent.width + APRON_SIZE * 2 <=
       b->subsampled_pos[view].offset.x)
      return false;

   if (b->subsampled_pos[view].offset.x +
       b->subsampled_pos[view].extent.width + APRON_SIZE * 2 <=
       a->subsampled_pos[view].offset.x)
      return false;

   if (a->subsampled_pos[view].offset.y +
       a->subsampled_pos[view].extent.height + APRON_SIZE * 2 <=
       b->subsampled_pos[view].offset.y)
      return false;

   if (b->subsampled_pos[view].offset.y +
       b->subsampled_pos[view].extent.height + APRON_SIZE * 2 <=
       a->subsampled_pos[view].offset.y)
      return false;

   return true;
}

/*
 * Calculate the location of each bin in the subsampled image and whether we
 * need to avoid subsampling it. The constraint we have to deal with here is
 * that for any two tiles sharing an edge, either both must not be subsampled
 * (so that we do not need to insert an apron) or they must be at least 4
 * pixels apart along that edge to create an apron of 2 pixels around each
 * tile. The apron includes the corner of the tile, so tiles that only touch
 * corners also count as touching along both edges. The two strategies
 * available to us to deal with this are disabling subsampling and shifting
 * over the origin of the tile, which only works when there is enough free
 * space to shift it. This is complicated by the fact that one or both of the
 * neighboring tiles may be a merged tile, so each tile may have several
 * neighbors sharing an edge instead of just 3.
 *
 * By default, we make each bin start at an aligned version of the start in
 * framebuffer space, b_s. This means that the tile grid is shifted up and to
 * the right for FDM offset, making sure the last row/column of tiles always
 * fits within the image and we only need a small fixed amount of extra space
 * to hold the overflow.
 */
static void
tu_calc_subsampled(struct tu_tile_config *tiles,
                   const struct tu_physical_device *phys_dev,
                   const struct tu_tiling_config *tiling,
                   const struct tu_framebuffer *fb,
                   const struct tu_vsc_config *vsc,
                   const VkOffset2D *fdm_offsets)
{
   u_worklist worklist;
   u_worklist_init(&worklist, vsc->tile_count.width * vsc->tile_count.height,
                   NULL);

   for (unsigned y = 0; y < vsc->tile_count.height; y++) {
      for (unsigned x = 0; x < vsc->tile_count.width; x++) {
         struct tu_tile_config *tile = &tiles[y * vsc->tile_count.width + x];

         if (!tile->visible_views || tile->merged_tile)
            continue;

         u_foreach_bit (view, tile->visible_views) {
            VkOffset2D offset = {};
            if (fdm_offsets)
               offset = tu_bin_offset(fdm_offsets[view], tiling);
            tile->subsampled_pos[view] =
               get_default_tile_pos(phys_dev, tile, view, fb, tiling,
                                    fdm_offsets, tile->frag_areas[view]);

            if (tile->frag_areas[view].width != 1 ||
                tile->frag_areas[view].height != 1)
               tile->subsampled_views |= 1u << view;
         }

         tile->subsampled = true;
         tile->worklist_idx = y * vsc->tile_count.width + x;

         u_worklist_push_tail(&worklist, tile, worklist_idx);
      }
   }

   while (!u_worklist_is_empty(&worklist)) {
      struct tu_tile_config *tile =
         u_worklist_pop_head(&worklist, struct tu_tile_config, worklist_idx);

      /* First, iterate over the vertically adjacent tiles and check for
       * vertical issues.
       */
      for (unsigned i = 0; i < 2; i++) {
         int x_offset = i == 0 ? -1 : tile->sysmem_extent.width;
         int x_pos = tile->pos.x + x_offset;
         if (x_pos < 0 || x_pos >= vsc->tile_count.width)
            continue;
         int y_start = MAX2(tile->pos.y - 1, 0);
         int y_end = MIN2(tile->pos.y + tile->sysmem_extent.height,
                          vsc->tile_count.height - 1);
         struct tu_tile_config *other_tile =
            tu_get_merged_tile(&tiles[y_start * vsc->tile_count.width + x_pos]);
         /* Sweep from (x_pos, y_start) to (x_pos, y_end), keeping in mind
          * merged tiles.
          */
         for (int y = y_start; y <= y_end;
              y = other_tile->pos.y + other_tile->sysmem_extent.height) {
            other_tile = tu_get_merged_tile(&tiles[y * vsc->tile_count.width + x_pos]);
            uint32_t common_views = tile->visible_views &
               other_tile->visible_views;
            if (common_views == 0)
               continue;

            if (((tile->subsampled_views | other_tile->subsampled_views) &
                 common_views) == 0)
               continue;

            struct tu_tile_config *left_tile = (i == 0) ? other_tile : tile;
            struct tu_tile_config *right_tile = (i == 0) ? tile : other_tile;

            /* Due to bin merging, the right tile may not actually be
             * to the right of the left tile, instead extending to the right
             * of it, for example if other_tile includes (0, 0) and (1, 0) and
             * the current tile is (0, 1) or vice versa. right_tile will then
             * also be horizontally adjacent, and we can skip it because it
             * will be handled below, and it should not touch horizontally
             * which means it will also not touch vertically.
             */
            if (right_tile->pos.x < left_tile->pos.x +
                left_tile->sysmem_extent.width)
               continue;

            u_foreach_bit (view, common_views) {
               if (!((tile->subsampled_views | other_tile->subsampled_views) &
                     (1u << view)))
                  continue;

               if (!aprons_intersect(tile, other_tile, view))
                  continue;

               /* Try shifting the right tile to the right. */
               if (right_tile->subsampled_views & (1u << view)) {
                  VkRect2D right_unsubsampled =
                     get_default_tile_pos(phys_dev, right_tile, view, fb,
                                          tiling, fdm_offsets,
                                          (VkExtent2D) { 1, 1 });
                  const unsigned shift_amount =
                     MAX2(APRON_SIZE * 2, phys_dev->info->tile_align_w);
                  if (right_tile->subsampled_pos[view].offset.x +
                      right_tile->subsampled_pos[view].extent.width +
                      shift_amount <= right_unsubsampled.offset.x +
                      right_unsubsampled.extent.width) {
                     right_tile->subsampled_pos[view].offset.x +=
                        shift_amount;
                     u_worklist_push_tail(&worklist, right_tile,
                                          worklist_idx);
                     continue;
                  }
               }

               /* Now we have to make both tiles non-subsampled. */
               if (tile->subsampled_views & (1u << view)) {
                  make_non_subsampled(phys_dev, tile, view, fb, tiling,
                                      fdm_offsets);
                  u_worklist_push_tail(&worklist, tile, worklist_idx);
               }

               if (other_tile->subsampled_views & (1u << view)) {
                  make_non_subsampled(phys_dev, other_tile, view, fb, tiling,
                                      fdm_offsets);
                  u_worklist_push_tail(&worklist, other_tile, worklist_idx);
               }
            }
         }
      }

      /* Do the identical thing for horizontally adjacent tiles. */
      for (unsigned i = 0; i < 2; i++) {
         int y_offset = i == 0 ? -1 : tile->sysmem_extent.height;
         int y_pos = tile->pos.y + y_offset;
         if (y_pos < 0 || y_pos >= vsc->tile_count.height)
            continue;
         int x_start = MAX2(tile->pos.x - 1, 0);
         int x_end = MIN2(tile->pos.x + tile->sysmem_extent.width,
                          vsc->tile_count.width - 1);
         struct tu_tile_config *other_tile =
            tu_get_merged_tile(&tiles[y_pos * vsc->tile_count.width + x_start]);
         /* Sweep from (x_start, y_pos) to (x_end, y_pos), keeping in mind
          * merged tiles.
          */
         for (int x = x_start; x <= x_end;
              x = other_tile->pos.x + other_tile->sysmem_extent.width) {
            other_tile = tu_get_merged_tile(&tiles[y_pos * vsc->tile_count.width + x]);
            uint32_t common_views = tile->visible_views &
               other_tile->visible_views;
            if (common_views == 0)
               continue;

            if (((tile->subsampled_views | other_tile->subsampled_views) &
                 common_views) == 0)
               continue;

            struct tu_tile_config *top_tile = (i == 0) ? other_tile : tile;
            struct tu_tile_config *bottom_tile = (i == 0) ? tile : other_tile;

            /* Due to bin merging, the bottom tile may not actually be
             * below the top tile, instead extending below it, for example
             * if other_tile includes (0, 0) and (0, 1) and the current
             * tile is (1, 0) or vice versa. top_tile will then also be
             * vertically adjacent, and we can skip it because it will have
             * been handled above, and it should not touch vertically which
             * means it will also not touch horizontally.
             */
            if (bottom_tile->pos.y < top_tile->pos.y +
                top_tile->sysmem_extent.height)
               continue;

            u_foreach_bit (view, common_views) {
               if (!((tile->subsampled_views | other_tile->subsampled_views) &
                     (1u << view)))
                  continue;

               if (!aprons_intersect(tile, other_tile, view))
                  continue;

               /* Try shifting the bottom tile down. */
               if (bottom_tile->subsampled_views & (1u << view)) {
                  VkRect2D bottom_unsubsampled =
                     get_default_tile_pos(phys_dev, bottom_tile, view, fb,
                                          tiling, fdm_offsets,
                                          (VkExtent2D) { 1, 1 });
                  const unsigned shift_amount =
                     MAX2(APRON_SIZE * 2, phys_dev->info->tile_align_h);
                  if (bottom_tile->subsampled_pos[view].offset.y +
                      bottom_tile->subsampled_pos[view].extent.height +
                      shift_amount <= bottom_unsubsampled.offset.y +
                      bottom_unsubsampled.extent.height) {
                     bottom_tile->subsampled_pos[view].offset.y +=
                        shift_amount;
                     u_worklist_push_tail(&worklist, bottom_tile,
                                          worklist_idx);
                     continue;
                  }
               }

               /* Now we have to make both tiles non-subsampled. One or both
                * may be shifted so we have to un-shift them.
                */
               if (tile->subsampled_views & (1u << view)) {
                  make_non_subsampled(phys_dev, tile, view, fb, tiling,
                                      fdm_offsets);
                  u_worklist_push_tail(&worklist, tile, worklist_idx);
               }

               if (other_tile->subsampled_views & (1u << view)) {
                  make_non_subsampled(phys_dev, other_tile, view, fb, tiling,
                                      fdm_offsets);
                  u_worklist_push_tail(&worklist, other_tile, worklist_idx);
               }
            }
         }
      }
   }

   u_worklist_fini(&worklist);
}

struct tu_tile_config *
tu_calc_tile_config(struct tu_cmd_buffer *cmd, const struct tu_vsc_config *vsc,

@@ -420,6 +787,13 @@ tu_calc_tile_config(struct tu_cmd_buffer *cmd, const struct tu_vsc_config *vsc,
      }
   }

   if (cmd->state.fdm_subsampled &&
       vsc->tile_count.width * vsc->tile_count.height <= TU_SUBSAMPLED_MAX_BINS) {
      tu_calc_subsampled(tiles, cmd->device->physical_device,
                         cmd->state.tiling, cmd->state.framebuffer,
                         vsc, fdm_offsets);
   }

   return tiles;
}

@@ -18,10 +18,37 @@ struct tu_tile_config {
   uint32_t pipe;
   uint32_t slot_mask;
   uint32_t visible_views;

   /* Whether to use subsampled_pos instead of the normal origin in
    * framebuffer space when storing this tile.
    */
   bool subsampled;

   /* If subsampled is true, whether this is a border tile that may not be
    * aligned.
    */
   bool subsampled_border;

   /* If subsampled is true, which views to store subsampled. If a view's bit
    * is set, the view is stored low-resolution as-is; if clear, the view is
    * expanded to its full size in sysmem when resolving. However, the origin
    * of the tile in subsampled space is always subsampled_pos when subsampled
    * is true, regardless of the value of this field.
    */
   uint32_t subsampled_views;

   /* Used internally. */
   unsigned worklist_idx;

   /* The tile this tile was merged with. */
   struct tu_tile_config *merged_tile;

   /* For subsampled images, the start of the tile in the final subsampled
    * image for each view. This may or may not be the start of the tile in
    * framebuffer space, due to the need to shift tiles over.
    */
   VkRect2D subsampled_pos[MAX_VIEWS];

   /* For merged tiles, the extent in tiles when resolved to system memory.
    */
   VkExtent2D sysmem_extent;

@@ -34,6 +61,25 @@ struct tu_tile_config {
   VkExtent2D frag_areas[MAX_VIEWS];
};

/* After merging, follow the trail of merged_tile pointers back to the tile
 * this tile was ultimately merged with.
 */
static inline struct tu_tile_config *
tu_get_merged_tile(struct tu_tile_config *tile)
{
   while (tile->merged_tile)
      tile = tile->merged_tile;
   return tile;
}

static inline const struct tu_tile_config *
tu_get_merged_tile_const(const struct tu_tile_config *tile)
{
   while (tile->merged_tile)
      tile = tile->merged_tile;
   return tile;
}

struct tu_tile_config *
tu_calc_tile_config(struct tu_cmd_buffer *cmd, const struct tu_vsc_config *vsc,
                    const struct tu_image_view *fdm, const VkOffset2D *fdm_offsets);