mirror of https://gitlab.freedesktop.org/mesa/mesa.git
synced 2026-05-07 11:28:05 +02:00

tu: Implement subsampled images

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39868>

parent cc710283a7
commit 4b87df29b3

20 changed files with 2137 additions and 194 deletions
@@ -36,11 +36,12 @@ This space exists whenever tiled rendering/GMEM is used, even without FDM. It
 is the space used to access GMEM, with the origin at the upper left of the
 tile. The hardware automatically transforms rendering space into GMEM space
 whenever GMEM is accessed using the various ``*_WINDOW_OFFSET`` registers. The
-origin of this space will be called :math:`b_{cs}`, the common bin start, for
-reasons that are explained below. When using FDM, coordinates in this space
-must be multiplied by the scaling factor :math:`s` derived from the fragment
-density map, or equivalently divided by the fragment area (as defined by the
-Vulkan specification), with the origin still at the upper left of the tile. For
+origin of this space in rendering space, or the value of ``*_WINDOW_OFFSET``,
+will be called :math:`b_{cs}`, the common bin start, for reasons that are
+explained below. When using FDM, coordinates in this space must be multiplied
+by the scaling factor :math:`s` derived from the fragment density map, or
+equivalently divided by the fragment area (as defined by the Vulkan
+specification), with the origin still at the upper left of the tile. For
 example, if :math:`s_x = 1/2`, then the bin is half as wide as it would've been
 without FDM and all coordinates in this space must be divided by 2.
@@ -81,6 +82,104 @@ a multiple of :math:`1 / s`. This is a natural constraint anyway, because if
 it wasn't the case then the bin would start in the middle of a fragment which
 isn't possible to handle correctly.
 
+Subsampled Space
+^^^^^^^^^^^^^^^^
+
+When using subsampled images, this is the space where the bin is stored in the
+underlying subsampled image. When sampling from a subsampled image, the driver
+inserts shader code to transform from framebuffer space to subsampled space
+using metadata written when rendering to the image.
+
+Accesses towards the edge of a bin may partially bleed into its neighboring bin
+with linear or bicubic sampling. If its neighbor has a different scale or isn't
+adjacent in subsampled space, we will sample incorrect data or empty space
+and return a corrupted result. In order to handle this, we need to insert an
+"apron" around problematic edges and corners. This is done by blitting from the
+nearest neighbor of each bin after the renderpass.
+
+Subsampled space is normally scaled down similarly to rendering space, which is
+the point of subsampled images in the first place, but the origin of the bin
+is up to the driver. The driver chooses the origin of each bin when rendering a
+given render pass and then encodes it in the metadata used when sampling the
+image. Bins that require an apron must be far enough away from each other that
+their aprons don't intersect, and all of the bins must be contained within the
+underlying image.
+
+Even when subsampled images are in use, not all bins may be subsampled. For
+example, there may not be enough space to insert aprons around every bin. When
+this is the case, subsampled space is not scaled like rendering space; that is,
+we expand the bin when resolving, as with non-subsampled images. However, the
+origin of the bin may still differ from the framebuffer space origin.
+
+The algorithm used by turnip to calculate the bin layout in subsampled
+space is to start with a "default" layout of the bins and then recursively
+solve conflicts caused by bins whose aprons are too close together. The first
+strategy used is to shift one of the bins over by a certain amount. The second,
+fallback strategy is to un-subsample both neighboring bins, expanding them
+so that they touch each other and there is no apron.
+
+One natural choice for the "default" layout is to just use rendering space.
+That is, start each bin at :math:`b_{cs}` by default. That mostly works, except
+for two problems. The first is easier to solve, and has to do with the border
+when sampling: it is allowed to use border colors with subsampled images, and
+when that happens and the framebuffer covers the entire image, it is expected
+that sampling around the edge correctly blends the border color and the edge
+pixel. In order for that to happen, bins that touch or intersect the edge of
+the framebuffer in framebuffer space have to be shifted over so that their
+edges touch the framebuffer edge in subsampled space too.
+
+Doing this also allows an optimization: because we are storing the tile's
+contents one to one from GMEM to system memory instead of scaling it up, we can
+use the dedicated resolve engine instead of GRAS to resolve the tile to system
+memory. Normally GRAS has to be used with non-subsampled images to scale up the
+bin when resolving. However, this doesn't work for tiles around the right and
+bottom edges where we have to shift over the tile to align to the edge. This
+also gets a bit tricky when the tile is shifted to avoid apron conflicts,
+because normally the resolve engine would write the tile directly without
+shifting. However, there is a trick we can use to avoid falling back to GRAS:
+by overriding ``RB_RESOLVE_WINDOW_OFFSET``, we can effectively apply an offset
+by telling the resolve engine that the tile was rendered somewhere else. This
+means that the shift amount has to be aligned to the alignment of
+``RB_RESOLVE_WINDOW_OFFSET``, which is ``tile_align_*`` in the device info.
+
+The other problem with making subsampled space equal rendering space is that
+with an FDM offset, rendering space can be arbitrarily larger than framebuffer
+space, and we may overflow the attachments by up to the size of a tile. The API
+is designed to allow the driver to allocate extra slop space in the image in
+this case, because there are image create flags for subsampled and FDM offset;
+however, the maximum tile size is far too large and images would take up
+far too much memory if we allocated enough slop space for the largest
+possible tile. An alternative is to use a hybrid of framebuffer space and
+rendering space: shift over the tiles by :math:`b_o` so that their origin
+is :math:`b_s` instead of :math:`b_{cs}`, but leave them scaled down. This
+requires no slop space whatsoever, because the bins are shifted inside the
+original image, but we can no longer use the resolve engine as the tile offsets
+are no longer aligned to ``tile_align_*``. So in the driver we combine both
+approaches: we calculate an aligned offset :math:`b_o'`, which is :math:`b_o`
+aligned down to ``tile_align_*``, and shift over the tiles by subtracting
+:math:`b_o'` instead of :math:`b_o`. This requires slop space, but only
+:math:`b_o - b_o'` slop space is required, which must be less than
+``tile_align_*``. As usual, the first row/column are not shifted over in x/y
+respectively.
+
+Here is an example of what a subsampled image looks like in memory, in this
+case without any FDM offset:
+
+.. figure:: subsampled_annotated.jpg
+   :alt: Example of a subsampled image
+
+Note how some of the bins are shifted over to make space for the apron. After
+applying the coordinate transform when sampling, this is the final image:
+
+.. figure:: subsampled_final.jpg
+   :alt: Example of a subsampled image after coordinate transform
+
+When ``VK_EXT_custom_resolve`` and subsampled images are used together, the
+custom resolve subpass writes directly to the subsampled image. This means that
+it needs to use subsampled space instead of rendering space, which in practice
+means replacing :math:`b_{cs}` with the origin of the bin in the subsampled
+image.
+
 Viewport and Scissor Patching
 -----------------------------
 
BIN docs/drivers/freedreno/subsampled_annotated.jpg (new file, 1.6 MiB, binary file not shown)
BIN docs/drivers/freedreno/subsampled_final.jpg (new file, 1.9 MiB, binary file not shown)
@@ -48,6 +48,7 @@ libtu_files = files(
   'tu_rmv.cc',
   'tu_shader.cc',
   'tu_suballoc.cc',
+  'tu_subsampled_image.cc',
   'tu_tile_config.cc',
   'tu_util.cc',
 )
@@ -647,6 +647,51 @@ build_blit_vs_shader(void)
    return b->shader;
 }
 
+static nir_shader *
+build_multi_blit_vs_shader(void)
+{
+   nir_builder _b =
+      nir_builder_init_simple_shader(MESA_SHADER_VERTEX, NULL, "multi blit vs");
+   nir_builder *b = &_b;
+
+   nir_variable *out_pos =
+      nir_create_variable_with_location(b->shader, nir_var_shader_out,
+                                        VARYING_SLOT_POS,
+                                        glsl_vec4_type());
+
+   b->shader->info.num_ubos = 1;
+
+   nir_def *vertex = nir_load_vertex_id(b);
+   nir_def *pos_and_coords =
+      nir_load_ubo(b, 4, 32, nir_imm_int(b, 0),
+                   nir_ishl_imm(b, vertex, 4),
+                   .align_mul = 16,
+                   .align_offset = 0,
+                   .range = 1 << 16);
+
+   nir_def *pos = nir_channels(b, pos_and_coords, 0x3);
+   nir_def *coords = nir_channels(b, pos_and_coords, 0xc);
+
+   pos = nir_vec4(b, nir_channel(b, pos, 0),
+                  nir_channel(b, pos, 1),
+                  nir_imm_float(b, 0.0),
+                  nir_imm_float(b, 1.0));
+
+   nir_store_var(b, out_pos, pos, 0xf);
+
+   nir_variable *out_coords =
+      nir_create_variable_with_location(b->shader, nir_var_shader_out,
+                                        VARYING_SLOT_VAR0,
+                                        glsl_vec_type(3));
+
+   coords = nir_vec3(b, nir_channel(b, coords, 0), nir_channel(b, coords, 1),
+                     nir_imm_float(b, 0));
+
+   nir_store_var(b, out_coords, coords, 0x7);
+
+   return b->shader;
+}
+
 static nir_shader *
 build_clear_vs_shader(void)
 {
@@ -823,6 +868,7 @@ tu_init_clear_blit_shaders(struct tu_device *dev)
 {
    unsigned offset = 0;
    compile_shader(dev, build_blit_vs_shader(), 3, &offset, GLOBAL_SH_VS_BLIT);
+   compile_shader(dev, build_multi_blit_vs_shader(), 3, &offset, GLOBAL_SH_VS_MULTI_BLIT);
    compile_shader(dev, build_clear_vs_shader(), 2, &offset, GLOBAL_SH_VS_CLEAR);
    compile_shader(dev, build_blit_fs_shader(false), 0, &offset, GLOBAL_SH_FS_BLIT);
    compile_shader(dev, build_blit_fs_shader(true), 0, &offset, GLOBAL_SH_FS_BLIT_ZSCALE);
@@ -846,6 +892,7 @@ tu_destroy_clear_blit_shaders(struct tu_device *dev)
 enum r3d_type {
    R3D_CLEAR,
    R3D_BLIT,
+   R3D_MULTI_BLIT,
 };
 
 template <chip CHIP>
@@ -855,7 +902,8 @@ r3d_common(struct tu_cmd_buffer *cmd, struct tu_cs *cs, enum r3d_type type,
            VkSampleCountFlagBits dst_samples)
 {
    enum global_shader vs_id =
-      type == R3D_CLEAR ? GLOBAL_SH_VS_CLEAR : GLOBAL_SH_VS_BLIT;
+      type == R3D_CLEAR ? GLOBAL_SH_VS_CLEAR :
+      (type == R3D_MULTI_BLIT ? GLOBAL_SH_VS_MULTI_BLIT : GLOBAL_SH_VS_BLIT);
 
    struct ir3_shader_variant *vs = cmd->device->global_shader_variants[vs_id];
    uint64_t vs_iova = cmd->device->global_shader_va[vs_id];
@@ -1056,6 +1104,49 @@ r3d_coords(struct tu_cmd_buffer *cmd,
    r3d_coords_raw(cmd, cs, coords);
 }
 
+static void
+r3d_coords_multi(struct tu_cmd_buffer *cmd,
+                 struct tu_cs *cs,
+                 const VkRect2D *dst,
+                 const tu_rect2d_float *src,
+                 unsigned count)
+{
+   struct tu_cs sub_cs;
+   VkResult result =
+      tu_cs_begin_sub_stream_aligned(&cmd->sub_cs, count * 2, 4, &sub_cs);
+   if (result != VK_SUCCESS) {
+      vk_command_buffer_set_error(&cmd->vk, result);
+      return;
+   }
+
+   for (unsigned i = 0; i < count; i++) {
+      tu_cs_emit(&sub_cs, fui(dst[i].offset.x));
+      tu_cs_emit(&sub_cs, fui(dst[i].offset.y));
+      tu_cs_emit(&sub_cs, fui(src[i].x_start));
+      tu_cs_emit(&sub_cs, fui(src[i].y_start));
+      tu_cs_emit(&sub_cs, fui(dst[i].offset.x + dst[i].extent.width));
+      tu_cs_emit(&sub_cs, fui(dst[i].offset.y + dst[i].extent.height));
+      tu_cs_emit(&sub_cs, fui(src[i].x_end));
+      tu_cs_emit(&sub_cs, fui(src[i].y_end));
+   }
+
+   struct tu_draw_state coords_ubo = tu_cs_end_draw_state(&cmd->sub_cs,
+                                                          &sub_cs);
+
+   tu_cs_emit_pkt7(cs, CP_LOAD_STATE6_GEOM, 5);
+   tu_cs_emit(cs,
+              CP_LOAD_STATE6_0_DST_OFF(0) |
+              CP_LOAD_STATE6_0_STATE_TYPE(ST6_UBO) |
+              CP_LOAD_STATE6_0_STATE_SRC(SS6_DIRECT) |
+              CP_LOAD_STATE6_0_STATE_BLOCK(SB6_VS_SHADER) |
+              CP_LOAD_STATE6_0_NUM_UNIT(1));
+   tu_cs_emit(cs, CP_LOAD_STATE6_1_EXT_SRC_ADDR(0));
+   tu_cs_emit(cs, CP_LOAD_STATE6_2_EXT_SRC_ADDR_HI(0));
+   tu_cs_emit_qw(cs,
+                 coords_ubo.iova |
+                 (uint64_t)A6XX_UBO_1_SIZE(count * 2) << 32);
+}
+
 static void
 r3d_clear_value(struct tu_cmd_buffer *cmd, struct tu_cs *cs, enum pipe_format format, const VkClearValue *val)
 {
@@ -1290,6 +1381,7 @@ r3d_src_load(struct tu_cmd_buffer *cmd,
              struct tu_cs *cs,
              const struct tu_image_view *iview,
              uint32_t layer,
+             VkFilter filter,
              bool override_swap)
 {
    uint32_t desc[FDL6_TEX_CONST_DWORDS];
@@ -1321,7 +1413,7 @@ r3d_src_load(struct tu_cmd_buffer *cmd,
    r3d_src_common<CHIP>(cmd, cs, desc,
                         iview->view.layer_size * layer,
                         iview->view.ubwc_layer_size * layer,
-                        VK_FILTER_NEAREST);
+                        filter);
 }
 
 template <chip CHIP>
@@ -1331,7 +1423,7 @@ r3d_src_gmem_load(struct tu_cmd_buffer *cmd,
                   const struct tu_image_view *iview,
                   uint32_t layer)
 {
-   r3d_src_load<CHIP>(cmd, cs, iview, layer, true);
+   r3d_src_load<CHIP>(cmd, cs, iview, layer, VK_FILTER_NEAREST, true);
 }
 
 template <chip CHIP>
@@ -1339,9 +1431,10 @@ static void
 r3d_src_sysmem_load(struct tu_cmd_buffer *cmd,
                     struct tu_cs *cs,
                     const struct tu_image_view *iview,
-                    uint32_t layer)
+                    uint32_t layer,
+                    VkFilter filter)
 {
-   r3d_src_load<CHIP>(cmd, cs, iview, layer, false);
+   r3d_src_load<CHIP>(cmd, cs, iview, layer, filter, false);
 }
 
 template <chip CHIP>
@@ -1594,6 +1687,9 @@ enum r3d_blit_param {
    R3D_Z_SCALE = 1 << 0,
    R3D_DST_GMEM = 1 << 1,
    R3D_COPY = 1 << 2,
+   R3D_USE_MULTI_BLIT = 1 << 3,
+   R3D_OUTSIDE_PASS = 1 << 4,
+   R3D_OVERLAPPING = 1 << 5,
 };
 
 template <chip CHIP>
@@ -1617,7 +1713,7 @@ r3d_setup(struct tu_cmd_buffer *cmd,
              blit_param & R3D_DST_GMEM);
    fixup_dst_format(src_format, &dst_format, &fmt);
 
-   if (!cmd->state.pass) {
+   if (!cmd->state.pass || (blit_param & R3D_OUTSIDE_PASS)) {
       tu_emit_cache_flush_ccu<CHIP>(cmd, cs, TU_CMD_CCU_SYSMEM);
       tu6_emit_window_scissor<CHIP>(cs, 0, 0, 0x3fff, 0x3fff);
       if (cmd->device->physical_device->info->props.has_hw_bin_scaling) {
@@ -1651,7 +1747,8 @@ r3d_setup(struct tu_cmd_buffer *cmd,
       }
    }
 
-   const enum r3d_type type = (clear) ? R3D_CLEAR : R3D_BLIT;
+   const enum r3d_type type = (clear) ? R3D_CLEAR :
+      ((blit_param & R3D_USE_MULTI_BLIT) ? R3D_MULTI_BLIT : R3D_BLIT);
    r3d_common<CHIP>(cmd, cs, type, 1, blit_param & R3D_Z_SCALE, src_samples,
                     dst_samples);
 
@@ -1696,7 +1793,17 @@ r3d_setup(struct tu_cmd_buffer *cmd,
       tu_cs_emit_regs(cs, GRAS_VRS_CONFIG(CHIP));
    }
 
-   tu_cs_emit_regs(cs, GRAS_SC_CNTL(CHIP, .ccusinglecachelinesize = 2));
+   /* We need to handle overlapping blits the same as feedback loops, which
+    * means setting this bit to avoid corruption due to UBWC flag caches
+    * becoming desynchronized. On a7xx+ UBWC caches are coherent.
+    */
+   enum a6xx_single_prim_mode prim_mode =
+      CHIP == A6XX && (blit_param & R3D_OVERLAPPING) && ubwc ?
+      FLUSH_PER_OVERLAP_AND_OVERWRITE : NO_FLUSH;
+
+   tu_cs_emit_regs(cs, GRAS_SC_CNTL(CHIP,
+                                    .single_prim_mode = prim_mode,
+                                    .ccusinglecachelinesize = 2));
 
    /* Disable sample counting in order to not affect occlusion query. */
    tu_cs_emit_regs(cs, A6XX_RB_SAMPLE_COUNTER_CNTL(.disable = true));
@@ -1738,6 +1845,17 @@ r3d_run_vis(struct tu_cmd_buffer *cmd, struct tu_cs *cs)
    tu_cs_emit(cs, 2); /* vertex count */
 }
 
+static void
+r3d_run_multi(struct tu_cmd_buffer *cmd, struct tu_cs *cs, unsigned count)
+{
+   tu_cs_emit_pkt7(cs, CP_DRAW_INDX_OFFSET, 3);
+   tu_cs_emit(cs, CP_DRAW_INDX_OFFSET_0_PRIM_TYPE(DI_PT_RECTLIST) |
+                  CP_DRAW_INDX_OFFSET_0_SOURCE_SELECT(DI_SRC_SEL_AUTO_INDEX) |
+                  CP_DRAW_INDX_OFFSET_0_VIS_CULL(IGNORE_VISIBILITY));
+   tu_cs_emit(cs, 1); /* instance count */
+   tu_cs_emit(cs, count * 2); /* vertex count */
+}
+
 template <chip CHIP>
 static void
 r3d_teardown(struct tu_cmd_buffer *cmd, struct tu_cs *cs)
@@ -3620,12 +3738,6 @@ tu_CmdResolveImage2(VkCommandBuffer commandBuffer,
 }
 TU_GENX(tu_CmdResolveImage2);
 
-#define for_each_layer(layer, layer_mask, layers) \
-   for (uint32_t layer = 0; \
-        layer < ((layer_mask) ? (util_logbase2(layer_mask) + 1) : layers); \
-        layer++) \
-      if (!layer_mask || (layer_mask & BIT(layer)))
-
 template <chip CHIP>
 static void
 resolve_sysmem(struct tu_cmd_buffer *cmd,
@@ -3673,7 +3785,7 @@ resolve_sysmem(struct tu_cmd_buffer *cmd,
       }
    } else {
       if (ops == &r3d_ops<CHIP>) {
-         r3d_src_sysmem_load<CHIP>(cmd, cs, src, i);
+         r3d_src_sysmem_load<CHIP>(cmd, cs, src, i, VK_FILTER_NEAREST);
       } else {
          ops->src(cmd, cs, &src->view, i, VK_FILTER_NEAREST, dst_format);
       }
@@ -4984,6 +5096,124 @@ tu7_generic_clear_attachment(struct tu_cmd_buffer *cmd,
    trace_end_generic_clear(&cmd->rp_trace, cs);
 }
 
+/* Transform the render area from framebuffer space to subsampled space. Be
+ * conservative if the render area partially covers a fragment.
+ */
+static VkRect2D
+transform_render_area(VkRect2D render_area, const struct tu_tile_config *tile,
+                      const VkRect2D *bins, unsigned view)
+{
+   /* Calculate the transform from framebuffer space to subsampled space. */
+   VkExtent2D frag_area = (tile->subsampled_views & (1u << view)) ?
+      tile->frag_areas[view] : (VkExtent2D) { 1, 1 };
+
+   VkOffset2D offset = {
+      tile->subsampled_pos[view].offset.x -
+         bins[view].offset.x / frag_area.width,
+      tile->subsampled_pos[view].offset.y -
+         bins[view].offset.y / frag_area.height,
+   };
+
+   /* In the unlikely case subsampling was disabled due to running out of
+    * tiles, don't transform the render area.
+    */
+   if (!tile->subsampled)
+      offset = (VkOffset2D) { 0, 0 };
+
+   unsigned x1 =
+      render_area.offset.x / frag_area.width + offset.x;
+   unsigned x2 =
+      DIV_ROUND_UP(render_area.offset.x + render_area.extent.width,
+                   frag_area.width) + offset.x;
+   unsigned y1 =
+      render_area.offset.y / frag_area.height + offset.y;
+   unsigned y2 =
+      DIV_ROUND_UP(render_area.offset.y + render_area.extent.height,
+                   frag_area.height) + offset.y;
+
+   return (VkRect2D) {
+      { x1, y1 }, { x2 - x1, y2 - y1 }
+   };
+}
+
+struct apply_blit_scissor_state {
+   unsigned view;
+   VkRect2D render_area;
+};
+
+template <chip CHIP>
+static void
+fdm_apply_blit_scissor(struct tu_cmd_buffer *cmd,
+                       struct tu_cs *cs,
+                       void *data,
+                       VkOffset2D common_bin_offset,
+                       const VkOffset2D *hw_viewport_offsets,
+                       unsigned views,
+                       const struct tu_tile_config *tile,
+                       const VkRect2D *bins,
+                       bool binning)
+{
+   struct tu_physical_device *phys_dev = cmd->device->physical_device;
+   const struct apply_blit_scissor_state *state =
+      (const struct apply_blit_scissor_state *)data;
+   unsigned view = MIN2(state->view, views - 1);
+
+   VkRect2D subsampled_render_area =
+      transform_render_area(state->render_area, tile, bins, view);
+   VkOffset2D pos = tile->subsampled ?
+      tile->subsampled_pos[view].offset : common_bin_offset;
+
+   VkRect2D scissor = subsampled_render_area;
+   if (tile->subsampled) {
+      /* Intersect the render area with the subsampled tile. We don't want to
+       * store the whole unscaled tile, and the unscaled tile may jut into the
+       * next tile.
+       */
+      scissor.offset.x = MAX2(scissor.offset.x, tile->subsampled_pos[view].offset.x);
+      scissor.offset.y = MAX2(scissor.offset.y, tile->subsampled_pos[view].offset.y);
+      scissor.extent.width =
+         MIN2(subsampled_render_area.offset.x +
+              subsampled_render_area.extent.width,
+              tile->subsampled_pos[view].offset.x +
+              tile->subsampled_pos[view].extent.width) - scissor.offset.x;
+      scissor.extent.height =
+         MIN2(subsampled_render_area.offset.y +
+              subsampled_render_area.extent.height,
+              tile->subsampled_pos[view].offset.y +
+              tile->subsampled_pos[view].extent.height) - scissor.offset.y;
+   }
+
+   if (bins[view].extent.width == 0 && bins[view].extent.height == 0) {
+      tu_cs_emit_regs(cs,
+                      A6XX_RB_RESOLVE_CNTL_1(.x = 1, .y = 1),
+                      A6XX_RB_RESOLVE_CNTL_2(.x = 0, .y = 0));
+      tu_cs_emit_regs(cs,
+                      A6XX_RB_RESOLVE_WINDOW_OFFSET(.x = 0, .y = 0));
+   } else {
+      /* Note: we will not dynamically enable CCU_RESOLVE for stores unless the
+       * offset is aligned, but this patchpoint will be executed anyway so we
+       * have to do something and not assert in the builder.
+       */
+      uint32_t x1 = scissor.offset.x &
+         ~(phys_dev->info->gmem_align_w - 1);
+      uint32_t y1 = scissor.offset.y &
+         ~(phys_dev->info->gmem_align_h - 1);
+      uint32_t x2 = ALIGN_POT(scissor.offset.x +
+                              scissor.extent.width,
+                              phys_dev->info->gmem_align_w) - 1;
+      uint32_t y2 = ALIGN_POT(scissor.offset.y +
+                              scissor.extent.height,
+                              phys_dev->info->gmem_align_h) - 1;
+
+      tu_cs_emit_regs(cs,
+                      A6XX_RB_RESOLVE_CNTL_1(.x = x1, .y = y1),
+                      A6XX_RB_RESOLVE_CNTL_2(.x = x2, .y = y2));
+      tu_cs_emit_regs(cs,
+                      A6XX_RB_RESOLVE_WINDOW_OFFSET(.x = pos.x, .y = pos.y));
+   }
+}
+
 template <chip CHIP>
 static void
 tu_emit_blit(struct tu_cmd_buffer *cmd,
@@ -5041,8 +5271,17 @@ tu_emit_blit(struct tu_cmd_buffer *cmd,
    event_blit_setup(cs, buffer_id, attachment, blit_event_type, clear_mask);
 
    for_each_layer(i, attachment->used_views, cmd->state.framebuffer->layers) {
-      if (scissor_per_layer)
+      if (cmd->state.pass->has_fdm && cmd->state.fdm_subsampled) {
+         struct apply_blit_scissor_state state = {
+            .view = i,
+            .render_area = scissor_per_layer ?
+               cmd->state.render_areas[i] : cmd->state.render_areas[0],
+         };
+         tu_create_fdm_bin_patchpoint(cmd, cs, 5, TU_FDM_SKIP_BINNING,
+                                      fdm_apply_blit_scissor<CHIP>, state);
+      } else if (scissor_per_layer) {
          tu6_emit_blit_scissor(cmd, cs, i, align_scissor);
+      }
       event_blit_dst_view blt_view = blt_view_from_tu_view(iview, i);
       event_blit_run<CHIP>(cmd, cs, attachment, &blt_view, separate_stencil);
    }
@@ -5331,7 +5570,8 @@ store_cp_blit(struct tu_cmd_buffer *cmd,
 {
    r2d_setup_common<CHIP>(cmd, cs, src_format, dst_format,
                           VK_IMAGE_ASPECT_COLOR_BIT, 0, false,
-                          dst_iview->view.ubwc_enabled, true);
+                          dst_iview->view.ubwc_enabled,
+                          true);
 
    if (dst_iview->image->vk.format == VK_FORMAT_D32_SFLOAT_S8_UINT) {
       if (!separate_stencil) {
@@ -5509,13 +5749,16 @@ tu_attachment_store_unaligned(struct tu_cmd_buffer *cmd, uint32_t a)
    if (TU_DEBUG(UNALIGNED_STORE))
       return true;
 
-   /* We always use the unaligned store path when scaling rendering. */
-   if (cmd->state.pass->has_fdm)
-      return true;
-
    unsigned render_area_count =
       cmd->state.per_layer_render_area ? cmd->state.pass->num_views : 1;
 
+   /* With subsampling, the formula below doesn't work, but we already
+    * conditionally use A2D for the unaligned blits at the edge. Just return
+    * false here.
+    */
+   if (cmd->state.fdm_subsampled)
+      return false;
+
    for (unsigned i = 0; i < render_area_count; i++) {
       const VkRect2D *render_area = &cmd->state.render_areas[i];
       uint32_t x1 = render_area->offset.x;
@@ -5564,6 +5807,9 @@ tu_choose_gmem_layout(struct tu_cmd_buffer *cmd)
 {
    cmd->state.gmem_layout = TU_GMEM_LAYOUT_FULL;
 
+   if (cmd->state.pass->has_fdm)
+      cmd->state.gmem_layout = TU_GMEM_LAYOUT_AVOID_CCU;
+
    for (unsigned i = 0; i < cmd->state.pass->attachment_count; i++) {
       if (!cmd->state.attachments[i])
          continue;
@@ -5620,8 +5866,9 @@ fdm_apply_store_coords(struct tu_cmd_buffer *cmd,
 {
    const struct apply_store_coords_state *state =
       (const struct apply_store_coords_state *)data;
-   VkExtent2D frag_area = tile->frag_areas[MIN2(state->view, views - 1)];
-   VkRect2D bin = bins[MIN2(state->view, views - 1)];
+   unsigned view = MIN2(state->view, views - 1);
+   VkExtent2D frag_area = tile->frag_areas[view];
+   VkRect2D bin = bins[view];
 
    /* The bin width/height must be a multiple of the frag_area to make sure
    * that the scaling happens correctly. This means there may be some
@@ -5643,10 +5890,22 @@ fdm_apply_store_coords(struct tu_cmd_buffer *cmd,
                      GRAS_A2D_SRC_YMIN(CHIP, 1),
                      GRAS_A2D_SRC_YMAX(CHIP, 0));
    } else {
-      tu_cs_emit_regs(cs,
-                      GRAS_A2D_DEST_TL(CHIP, .x = bin.offset.x, .y = bin.offset.y),
-                      GRAS_A2D_DEST_BR(CHIP, .x = bin.offset.x + bin.extent.width - 1,
-                                       .y = bin.offset.y + bin.extent.height - 1));
+      VkOffset2D start =
+         tile->subsampled ? tile->subsampled_pos[view].offset : bin.offset;
+      if (tile->subsampled_views & (1u << view)) {
+         /* Subsampled blits don't scale up the bin, and go to the subsampled
+          * destination.
+          */
+         tu_cs_emit_regs(cs,
+                         GRAS_A2D_DEST_TL(CHIP, .x = start.x, .y = start.y),
+                         GRAS_A2D_DEST_BR(CHIP, .x = start.x + scaled_width - 1,
+                                          .y = start.y + scaled_height - 1));
+      } else {
+         tu_cs_emit_regs(cs,
+                         GRAS_A2D_DEST_TL(CHIP, .x = start.x, .y = start.y),
+                         GRAS_A2D_DEST_BR(CHIP, .x = start.x + bin.extent.width - 1,
+                                          .y = start.y + bin.extent.height - 1));
+      }
       tu_cs_emit_regs(cs,
                       GRAS_A2D_SRC_XMIN(CHIP, common_bin_offset.x),
                       GRAS_A2D_SRC_XMAX(CHIP, common_bin_offset.x + scaled_width - 1),
@@ -5655,6 +5914,45 @@ fdm_apply_store_coords(struct tu_cmd_buffer *cmd,
    }
 }
 
+struct apply_render_area_state {
+   unsigned view;
+   VkRect2D render_area;
+};
+
+template <chip CHIP>
+static void
+fdm_apply_render_area(struct tu_cmd_buffer *cmd,
+                      struct tu_cs *cs,
+                      void *data,
+                      VkOffset2D common_bin_offset,
+                      const VkOffset2D *hw_viewport_offsets,
+                      unsigned views,
+                      const struct tu_tile_config *tile,
+                      const VkRect2D *bins,
+                      bool binning)
+{
+   struct apply_render_area_state *state =
+      (struct apply_render_area_state *)data;
+
+   unsigned view = MIN2(state->view, views - 1);
+
+   VkRect2D subsampled_render_area =
+      transform_render_area(state->render_area, tile, bins, view);
+
+   unsigned x1 = subsampled_render_area.offset.x;
+   unsigned x2 = subsampled_render_area.offset.x +
+      subsampled_render_area.extent.width - 1;
+   unsigned y1 = subsampled_render_area.offset.y;
+   unsigned y2 = subsampled_render_area.offset.y +
+      subsampled_render_area.extent.height - 1;
+
+   tu_cs_emit_regs(cs,
+                   GRAS_A2D_SCISSOR_TL(CHIP, .x = x1,
+                                       .y = y1,),
+                   GRAS_A2D_SCISSOR_BR(CHIP, .x = x2,
+                                       .y = y2,));
+}
+
 template <chip CHIP>
 void
 tu_store_gmem_attachment(struct tu_cmd_buffer *cmd,
@@ -5703,7 +6001,10 @@ tu_store_gmem_attachment(struct tu_cmd_buffer *cmd,
 
    bool use_fast_path = !unaligned && !mismatched_mutability &&
       !resolve_d24s8_s8 &&
-      (a == gmem_a || blit_can_resolve(dst->format));
+      (a == gmem_a || blit_can_resolve(dst->format)) &&
+      (!cmd->state.pass->has_fdm || CHIP >= A7XX);
+
+   bool fast_path_conditional = use_fast_path && cmd->state.pass->has_fdm;
 
    trace_start_gmem_store(&cmd->rp_trace, cs, cmd, dst->format, use_fast_path, unaligned);
 
@@ -5717,6 +6018,11 @@ tu_store_gmem_attachment(struct tu_cmd_buffer *cmd,
 
    /* use fast path when render area is aligned, except for unsupported resolve cases */
    if (use_fast_path) {
+      if (fast_path_conditional) {
+         tu_cond_exec_start(cs, CP_COND_REG_EXEC_0_MODE(PRED_TEST) |
+                            CP_COND_REG_EXEC_0_PRED_BIT(TU_PREDICATE_FAST_STORE));
+      }
+
       if (store_common)
          tu_emit_blit<CHIP>(cmd, cs, resolve_group, dst_iview, src, clear_value,
                             BLIT_EVENT_STORE, per_layer_render_area, true, false);
@@ -5724,16 +6030,25 @@ tu_store_gmem_attachment(struct tu_cmd_buffer *cmd,
          tu_emit_blit<CHIP>(cmd, cs, resolve_group, dst_iview, src, clear_value,
                             BLIT_EVENT_STORE, per_layer_render_area, true, true);
 
-      if (cond_exec) {
-         tu_end_load_store_cond_exec(cmd, cs, false);
-      }
+      if (fast_path_conditional) {
+         tu_cond_exec_end(cs);
+      } else {
+         if (cond_exec) {
+            tu_end_load_store_cond_exec(cmd, cs, false);
+         }
 
          trace_end_gmem_store(&cmd->rp_trace, cs);
          return;
+      }
    }
 
    assert(cmd->state.gmem_layout == TU_GMEM_LAYOUT_AVOID_CCU);
 
+   if (fast_path_conditional) {
+      tu_cond_exec_start(cs, CP_COND_REG_EXEC_0_MODE(PRED_TEST) |
+                         CP_COND_REG_EXEC_0_PRED_BIT(TU_PREDICATE_NO_FAST_STORE));
+   }
+
    enum pipe_format src_format = vk_format_to_pipe_format(src->format);
    if (src_format == PIPE_FORMAT_Z32_FLOAT_S8X24_UINT)
       src_format = PIPE_FORMAT_Z32_FLOAT;
@@ -5773,7 +6088,7 @@ tu_store_gmem_attachment(struct tu_cmd_buffer *cmd,
       if (!cmd->state.pass->has_fdm) {
          r2d_coords<CHIP>(cmd, cs, render_area->offset, render_area->offset,
                           render_area->extent);
-      } else {
+      } else if (!cmd->state.fdm_subsampled) {
          /* Usually GRAS_2D_RESOLVE_CNTL_* clips the destination to the bin
          * area and the coordinates span the entire render area, but for
          * FDM we need to scale the coordinates so we need to take the
@@ -5795,7 +6110,7 @@ tu_store_gmem_attachment(struct tu_cmd_buffer *cmd,
       if (!cmd->state.pass->has_fdm) {
          r2d_coords<CHIP>(cmd, cs, render_area->offset, render_area->offset,
                           render_area->extent);
-      } else {
+      } else if (!cmd->state.fdm_subsampled) {
          tu_cs_emit_regs(cs,
                          GRAS_A2D_SCISSOR_TL(CHIP, .x = render_area->offset.x,
                                              .y = render_area->offset.y,),
@@ -5805,6 +6120,17 @@ tu_store_gmem_attachment(struct tu_cmd_buffer *cmd,
       }
 
       if (cmd->state.pass->has_fdm) {
+         if (cmd->state.fdm_subsampled) {
+            struct apply_render_area_state state {
+               .view = i,
+               .render_area =
+                  per_layer_render_area ? cmd->state.render_areas[i] :
+                                          cmd->state.render_areas[0],
+            };
+            tu_create_fdm_bin_patchpoint(cmd, cs, 3, TU_FDM_SKIP_BINNING,
+                                         fdm_apply_render_area<CHIP>,
+                                         state);
+         }
          struct apply_store_coords_state state = {
             .view = i,
          };
@@ -5822,6 +6148,9 @@ tu_store_gmem_attachment(struct tu_cmd_buffer *cmd,
       }
    }
 
+   if (fast_path_conditional)
+      tu_cond_exec_end(cs);
+
    if (cond_exec) {
       tu_end_load_store_cond_exec(cmd, cs, false);
    }
@@ -5829,3 +6158,71 @@ tu_store_gmem_attachment(struct tu_cmd_buffer *cmd,
    trace_end_gmem_store(&cmd->rp_trace, cs);
 }
 TU_GENX(tu_store_gmem_attachment);
+
+template <chip CHIP>
+static void
+blit_subsampled_apron(struct tu_cmd_buffer *cmd,
+                      struct tu_cs *cs,
+                      const struct tu_image_view *iview,
+                      enum VkFormat vk_format,
+                      unsigned layer,
+                      const VkRect2D *dst_coord,
+                      const tu_rect2d_float *src_coord,
+                      unsigned count)
+{
+   enum pipe_format format = vk_format_to_pipe_format(vk_format);
+
+   r3d_setup<CHIP>(cmd, cs, format, format, VK_IMAGE_ASPECT_COLOR_BIT,
+                   R3D_USE_MULTI_BLIT | R3D_OUTSIDE_PASS | R3D_OVERLAPPING,
+                   false, iview->image->layout[0].ubwc,
+                   VK_SAMPLE_COUNT_1_BIT, VK_SAMPLE_COUNT_1_BIT);
+
+   for (unsigned i = 0; i < count; i++) {
+      assert(dst_coord[i].offset.x + dst_coord[i].extent.width <=
+             iview->image->layout[0].width0);
+      assert(dst_coord[i].offset.y + dst_coord[i].extent.height <=
+             iview->image->layout[0].height0);
+   }
+
+   r3d_coords_multi(cmd, cs, dst_coord, src_coord, count);
+
+   if (iview->image->vk.format == VK_FORMAT_D32_SFLOAT_S8_UINT) {
+      if (vk_format == VK_FORMAT_D32_SFLOAT) {
+         r3d_src_stencil<CHIP>(cmd, cs, iview, layer, VK_FILTER_NEAREST);
+         r3d_dst_stencil<CHIP>(cs, iview, layer);
+      } else {
+         r3d_src_depth<CHIP>(cmd, cs, iview, layer, VK_FILTER_NEAREST);
+         r3d_dst_depth<CHIP>(cs, iview, layer);
+      }
+   } else {
+      r3d_src_sysmem_load<CHIP>(cmd, cs, iview, layer, VK_FILTER_NEAREST);
+      r3d_dst<CHIP>(cs, &iview->view, layer, format);
+   }
+
+   r3d_run_multi(cmd, cs, count);
+
+   r3d_teardown<CHIP>(cmd, cs);
+}
+
+template <chip CHIP>
+void
+tu_blit_subsampled_apron(struct tu_cmd_buffer *cmd,
+                         struct tu_cs *cs,
+                         const struct tu_image_view *iview,
+                         unsigned layer,
+                         const VkRect2D *dst_coord,
+                         const tu_rect2d_float *src_coord,
+                         unsigned count)
+{
+   if (iview->image->vk.format == VK_FORMAT_D32_SFLOAT_S8_UINT) {
+      blit_subsampled_apron<CHIP>(cmd, cs, iview, VK_FORMAT_D32_SFLOAT, layer,
+                                  dst_coord, src_coord, count);
+      blit_subsampled_apron<CHIP>(cmd, cs, iview, VK_FORMAT_S8_UINT, layer,
+                                  dst_coord, src_coord, count);
+   } else {
+      blit_subsampled_apron<CHIP>(cmd, cs, iview, iview->vk.format, layer,
+                                  dst_coord, src_coord, count);
+   }
+}
+TU_GENX(tu_blit_subsampled_apron);
@@ -100,4 +100,14 @@ tu_cmd_fill_buffer_addr(VkCommandBuffer commandBuffer,
                         VkDeviceSize fillSize,
                         uint32_t data);
 
+template <chip CHIP>
+void
+tu_blit_subsampled_apron(struct tu_cmd_buffer *cmd,
+                         struct tu_cs *cs,
+                         const struct tu_image_view *iview,
+                         unsigned layer,
+                         const VkRect2D *dst_coord,
+                         const tu_rect2d_float *src_coord,
+                         unsigned count);
+
 #endif /* TU_CLEAR_BLIT_H */
@@ -22,6 +22,7 @@
 #include "tu_knl.h"
 #include "tu_tile_config.h"
 #include "tu_tracepoints.h"
+#include "tu_subsampled_image.h"
 
 #include "common/freedreno_gpu_event.h"
 #include "common/freedreno_lrz.h"
@@ -1733,6 +1734,29 @@ tu6_emit_tile_select(struct tu_cmd_buffer *cmd,
       }
    }
 
+   if (CHIP >= A7XX) {
+      /* Without FDM offset, b_s = b_cs, which is always aligned. With FDM
+       * offset, it may not be aligned. However, with FDM offset and
+       * subsampled images, we shift the subsampled coordinates to align the
+       * bins, so we can enable the fast path except for the last row/column
+       * where the end has to be aligned to the framebuffer end.
+       *
+       * We don't just directly check for aligned-ness because that depends
+       * on the actual offset, and significantly changing the performance
+       * could result in jank between frames as the offset changes.
+       */
+      bool use_fast_store = (!fdm_offsets && !bin_scale_en) ||
+         (tile->subsampled_views == tile->visible_views &&
+          !tile->subsampled_border);
+
+      tu7_set_pred_mask(cs, (1u << TU_PREDICATE_FAST_STORE) |
+                        (1u << TU_PREDICATE_NO_FAST_STORE),
+                        (1u << (use_fast_store ?
+                                TU_PREDICATE_FAST_STORE :
+                                TU_PREDICATE_NO_FAST_STORE)));
+   }
+
    util_dynarray_foreach (&cmd->fdm_bin_patchpoints,
                           struct tu_fdm_bin_patchpoint, patch) {
       tu_cs_emit_pkt7(cs, CP_MEM_WRITE, 2 + patch->size);
@@ -2951,6 +2975,16 @@ tu_renderpass_begin(struct tu_cmd_buffer *cmd)
                         MESA_VK_DYNAMIC_IA_PRIMITIVE_RESTART_ENABLE);
 
    cmd->state.fdm_enabled = cmd->state.pass->has_fdm;
+
+   cmd->state.fdm_subsampled = false;
+
+   for (unsigned i = 0; i < cmd->state.framebuffer->attachment_count; i++) {
+      const struct tu_image_view *iview = cmd->state.attachments[i];
+      if (iview && (iview->image->vk.create_flags &
+                    VK_IMAGE_CREATE_SUBSAMPLED_BIT_EXT)) {
+         cmd->state.fdm_subsampled = true;
+      }
+   }
 }
 
 static inline bool
@@ -3169,6 +3203,18 @@ tu6_sysmem_render_end(struct tu_cmd_buffer *cmd, struct tu_cs *cs,
    tu_cs_emit_pkt7(cs, CP_SKIP_IB2_ENABLE_GLOBAL, 1);
    tu_cs_emit(cs, 0x0);
 
+   if (cmd->state.fdm_subsampled) {
+      for (unsigned i = 0; i < cmd->state.pass->attachment_count; i++) {
+         if (i != cmd->state.pass->fragment_density_map.attachment &&
+             cmd->state.pass->attachments[i].store) {
+            /* emit dummy subsampled metadata since we didn't use FDM */
+            tu_emit_subsampled_metadata(cmd, &cmd->cs, i,
+                                        NULL, NULL, NULL,
+                                        cmd->state.framebuffer, NULL);
+         }
+      }
+   }
+
    tu_lrz_sysmem_end<CHIP>(cmd, cs);
 
    /* Clear the resource list for any LRZ resources we emitted at the
@@ -3651,6 +3697,73 @@ tu_allocate_transient_attachments(struct tu_cmd_buffer *cmd, bool sysmem)
    return VK_SUCCESS;
 }
 
+template <chip CHIP>
+static void
+tu_emit_subsampled(struct tu_cmd_buffer *cmd,
+                   const struct tu_tile_config *tiles,
+                   const struct tu_tiling_config *tiling,
+                   const struct tu_vsc_config *vsc,
+                   const struct tu_framebuffer *fb,
+                   const VkOffset2D *fdm_offsets)
+{
+   struct tu_cs *cs = &cmd->cs;
+
+   for (unsigned i = 0; i < cmd->state.pass->attachment_count; i++) {
+      if (i != cmd->state.pass->fragment_density_map.attachment &&
+          cmd->state.pass->attachments[i].store) {
+         tu_emit_subsampled_metadata(cmd, cs, i,
+                                     tiles, tiling, vsc,
+                                     cmd->state.framebuffer,
+                                     fdm_offsets);
+      }
+   }
+
+   /* We may have subsampled images without FDM if FDM is disabled due to
+    * multisampled loads/stores, in which case we only need to emit the
+    * metadata.
+    */
+   if (!tiles)
+      return;
+
+   /* Flush for GMEM -> UCHE */
+   cmd->state.cache.pending_flush_bits |=
+      TU_CMD_FLAG_CACHE_INVALIDATE |
+      TU_CMD_FLAG_WAIT_FOR_IDLE;
+
+   VkRect2D *dst =
+      (VkRect2D *)malloc(8 * vsc->tile_count.width * vsc->tile_count.height *
+                         (sizeof(VkRect2D) + sizeof(struct tu_rect2d_float)));
+   struct tu_rect2d_float *src =
+      (struct tu_rect2d_float *)(dst + 8 * vsc->tile_count.width * vsc->tile_count.height);
+   unsigned count;
+
+   /* Iterate over layers and then attachments so that we don't recompute the
+    * list of areas to copy for each attachment.
+    */
+   for (unsigned layer = 0; layer < MAX2(cmd->state.pass->num_views,
+                                         fb->layers); layer++) {
+      unsigned view = fb->layers > 1 ?
+         (cmd->state.fdm_per_layer ? layer : 0) : layer;
+      count = tu_calc_subsampled_aprons(dst, src, view, tiles, tiling, vsc, fb,
+                                        fdm_offsets);
+
+      if (count != 0) {
+         for (unsigned i = 0; i < cmd->state.pass->attachment_count; i++) {
+            if (i != cmd->state.pass->fragment_density_map.attachment &&
+                cmd->state.pass->attachments[i].store &&
+                (cmd->state.pass->num_views == 0 ||
+                 (cmd->state.pass->attachments[i].used_views & (1u << layer)) ||
+                 (cmd->state.pass->attachments[i].resolve_views & (1u << layer)))) {
+               tu_blit_subsampled_apron<CHIP>(cmd, cs, cmd->state.attachments[i],
+                                              layer, dst, src, count);
+            }
+         }
+      }
+   }
+
+   free(dst);
+}
+
 template <chip CHIP>
 static void
 tu_cmd_render_tiles(struct tu_cmd_buffer *cmd,
@@ -3750,6 +3863,17 @@ tu_cmd_render_tiles(struct tu_cmd_buffer *cmd,
 
    tu6_tile_render_end<CHIP>(cmd, &cmd->cs, autotune_result);
 
+   /* Outside of renderpasses we assume all draw states are disabled. We do
+    * this outside the draw CS for the normal case where 3d gmem stores aren't
+    * used. Do this before emitting subsampled blits.
+    */
+   tu_disable_draw_states(cmd, &cmd->cs);
+
+   if (cmd->state.fdm_subsampled) {
+      tu_emit_subsampled<CHIP>(cmd, tiles, tiling, vsc, cmd->state.framebuffer,
+                               fdm_offsets);
+   }
+
    tu_trace_end_render_pass<CHIP>(cmd, true);
 
    /* We have trashed the dynamically-emitted viewport, scissor, and FS params
@@ -3791,6 +3915,9 @@ tu_cmd_render_sysmem(struct tu_cmd_buffer *cmd,
 
    tu6_sysmem_render_end<CHIP>(cmd, &cmd->cs, autotune_result);
 
+   /* Outside of renderpasses we assume all draw states are disabled. */
+   tu_disable_draw_states(cmd, &cmd->cs);
+
    tu_clone_trace_range(cmd, &cmd->cs, &cmd->trace,
                         cmd->trace_renderpass_start,
                         u_trace_end_iterator(&cmd->rp_trace));
@@ -3811,13 +3938,6 @@ tu_cmd_render(struct tu_cmd_buffer *cmd_buffer,
       tu_cmd_render_sysmem<CHIP>(cmd_buffer, autotune_result);
    else
       tu_cmd_render_tiles<CHIP>(cmd_buffer, autotune_result, fdm_offsets);
-
-   /* Outside of renderpasses we assume all draw states are disabled. We do
-    * this outside the draw CS for the normal case where 3d gmem stores aren't
-    * used.
-    */
-   tu_disable_draw_states(cmd_buffer, &cmd_buffer->cs);
-
 }
 
 static void tu_reset_render_pass(struct tu_cmd_buffer *cmd_buffer)
@@ -5907,7 +6027,8 @@ tu_restore_suspended_pass(struct tu_cmd_buffer *cmd,
    memcpy(cmd->state.render_areas,
           suspended->state.suspended_pass.render_areas,
           sizeof(cmd->state.render_areas));
-   cmd->state.per_layer_render_area = suspended->state.per_layer_render_area;
+   cmd->state.per_layer_render_area = suspended->state.suspended_pass.per_layer_render_area;
+   cmd->state.fdm_subsampled = suspended->state.suspended_pass.fdm_subsampled;
    cmd->state.gmem_layout = suspended->state.suspended_pass.gmem_layout;
    cmd->state.tiling = &cmd->state.framebuffer->tiling[cmd->state.gmem_layout];
    cmd->state.lrz = suspended->state.suspended_pass.lrz;
@@ -6903,6 +7024,7 @@ tu_CmdBeginRendering(VkCommandBuffer commandBuffer,
       tu_lrz_begin_renderpass<CHIP>(cmd);
    }
 
+   tu_renderpass_begin(cmd);
+
    if (suspending) {
       cmd->state.suspended_pass.pass = cmd->state.pass;
@@ -6912,6 +7034,8 @@ tu_CmdBeginRendering(VkCommandBuffer commandBuffer,
             cmd->state.render_areas, sizeof(cmd->state.render_areas));
       cmd->state.suspended_pass.per_layer_render_area =
          cmd->state.per_layer_render_area;
+      cmd->state.suspended_pass.fdm_subsampled =
+         cmd->state.fdm_subsampled;
       cmd->state.suspended_pass.attachments = cmd->state.attachments;
       cmd->state.suspended_pass.clear_values = cmd->state.clear_values;
       cmd->state.suspended_pass.gmem_layout = cmd->state.gmem_layout;
@@ -6919,8 +7043,6 @@ tu_CmdBeginRendering(VkCommandBuffer commandBuffer,
 
    tu_fill_render_pass_state(&cmd->state.vk_rp, cmd->state.pass, cmd->state.subpass);
 
-   tu_renderpass_begin(cmd);
-
    if (!resuming) {
       cmd->patchpoints_ctx = ralloc_context(NULL);
       tu_emit_subpass_begin<CHIP>(cmd);
@@ -7676,41 +7798,53 @@ fdm_apply_fs_params(struct tu_cmd_buffer *cmd,
        * in which case views will be 1 and we have to replicate the one view
        * to all of the layers.
        */
-      VkExtent2D area = config->frag_areas[MIN2(i, views - 1)];
+      unsigned view = MIN2(i, views - 1);
+      VkExtent2D tile_frag_area = config->frag_areas[view];
       VkRect2D bin = bins[MIN2(i, views - 1)];
-      VkOffset2D offset = tu_fdm_per_bin_offset(area, bin, common_bin_offset);
-
-      /* For custom resolve, we switch to rendering directly to sysmem and so
-       * the fragment size becomes 1x1. This means we have to scale down
-       * FragCoord when accessing GMEM input attachments.
-       *
-       * TODO: When we support subsampled images, this should also only happen
-       * for non-subsampled images.
-       */
+
+      /* The space HW FragCoord (as well as viewport and scissor) is in is:
+       * - Without custom resolve, rendering space as usual.
+       * - With custom resolve to non-subsampled images, framebuffer
+       *   space.
+       * - With custom resolve to subsampled images, subsampled space. Its
+       *   origin is subsampled_pos.offset, and it may or may not be scaled
+       *   down depending on whether the view is subsampled.
+       *
+       * For user FragCoord, we need to transform from this space to
+       * framebuffer space. However the transform in the shader performs the
+       * opposite, so we actually need to transform from framebuffer space to
+       * this "custom rendering space". For GMEM FragCoord, we need to
+       * transform this space to rendering space.
+       */
+      VkOffset2D tile_start = common_bin_offset;
+      VkExtent2D rendering_frag_area = tile_frag_area;
+      VkExtent2D gmem_frag_area = (VkExtent2D) { 1, 1 };
       if (state->custom_resolve) {
-         tu_cs_emit(cs, 1 /* width */);
-         tu_cs_emit(cs, 1 /* height */);
-         tu_cs_emit(cs, fui(0.0));
-         tu_cs_emit(cs, fui(0.0));
-      } else {
-         tu_cs_emit(cs, area.width);
-         tu_cs_emit(cs, area.height);
-         tu_cs_emit(cs, fui(offset.x));
-         tu_cs_emit(cs, fui(offset.y));
+         if (config->subsampled)
+            tile_start = config->subsampled_pos[view].offset;
+         else
+            tile_start = bin.offset;
+         if (!(config->subsampled_views & (1u << view))) {
+            rendering_frag_area = (VkExtent2D){ 1, 1 };
+            gmem_frag_area = tile_frag_area;
+         }
       }
+
+      VkRect2D gmem_bin = bin;
+      gmem_bin.offset = tile_start;
+
+      VkOffset2D offset = tu_fdm_per_bin_offset(rendering_frag_area, bin, tile_start);
+      VkOffset2D gmem_offset = tu_fdm_per_bin_offset(gmem_frag_area, gmem_bin,
+                                                     common_bin_offset);
+
+      tu_cs_emit(cs, rendering_frag_area.width);
+      tu_cs_emit(cs, rendering_frag_area.height);
+      tu_cs_emit(cs, fui(offset.x));
+      tu_cs_emit(cs, fui(offset.y));
+
       if (i * 2 + 1 < num_consts) {
-         if (state->custom_resolve) {
-            tu_cs_emit(cs, fui(1. / area.width));
-            tu_cs_emit(cs, fui(1. / area.height));
-            tu_cs_emit(cs, fui(offset.x));
-            tu_cs_emit(cs, fui(offset.y));
-         } else {
-            tu_cs_emit(cs, fui(1.0));
-            tu_cs_emit(cs, fui(1.0));
-            tu_cs_emit(cs, fui(0.0));
-            tu_cs_emit(cs, fui(0.0));
-         }
+         tu_cs_emit(cs, fui(1. / gmem_frag_area.width));
+         tu_cs_emit(cs, fui(1. / gmem_frag_area.height));
+         tu_cs_emit(cs, fui(gmem_offset.x));
+         tu_cs_emit(cs, fui(gmem_offset.y));
       }
    }
 }
@@ -551,6 +551,7 @@ struct tu_cmd_state
       const struct tu_framebuffer *framebuffer;
       VkRect2D render_areas[MAX_VIEWS];
       bool per_layer_render_area;
+      bool fdm_subsampled;
       enum tu_gmem_layout gmem_layout;
 
       const struct tu_image_view **attachments;
@@ -560,6 +561,7 @@ struct tu_cmd_state
    } suspended_pass;
 
    bool fdm_enabled;
+   bool fdm_subsampled;
 
    bool tessfactor_addr_set;
    bool predication_active;
@@ -156,6 +156,8 @@ enum tu_predicate_bit {
    TU_PREDICATE_VTX_STATS_RUNNING = 3,
    TU_PREDICATE_VTX_STATS_NOT_RUNNING = 4,
    TU_PREDICATE_FIRST_TILE = 5,
+   TU_PREDICATE_FAST_STORE = 6,
+   TU_PREDICATE_NO_FAST_STORE = 7,
 };
 
 /* Onchip timestamp register layout. */
@@ -176,6 +178,11 @@ enum tu_onchip_addr {
    */
 };
 
+struct tu_rect2d_float {
+   float x_start, y_start;
+   float x_end, y_end;
+};
+
 #define TU_GENX(FUNC_NAME) FD_GENX(FUNC_NAME)
 
@@ -213,4 +220,13 @@ struct tu_suballocator;
 struct tu_subpass;
 struct tu_u_trace_submission_data;

+/* Helper for iterating over layers of an attachment that handles both
+ * multiview and layered rendering cases.
+ */
+#define for_each_layer(layer, layer_mask, layers)                             \
+   for (uint32_t layer = 0;                                                   \
+        layer < ((layer_mask) ? (util_logbase2(layer_mask) + 1) : (layers));  \
+        layer++)                                                              \
+      if (!(layer_mask) || ((layer_mask) & BIT(layer)))
+
 #endif /* TU_COMMON_H */
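The `for_each_layer` helper added to tu_common.h pairs a `for` loop with a trailing `if`, so the statement that follows runs once per active layer: with a nonzero `layer_mask` it walks bit positions up to the highest set one and skips unset bits, otherwise it walks all `layers`. A standalone sketch of the iteration logic, with `util_logbase2()` and `BIT()` replaced by minimal stand-ins for Mesa's utility helpers:

```c
#include <assert.h>
#include <stdint.h>

/* Minimal stand-ins for Mesa's util_logbase2() and BIT() helpers. */
static uint32_t util_logbase2(uint32_t v) { return v ? 31 - __builtin_clz(v) : 0; }
#define BIT(b) (1u << (b))

#define for_each_layer(layer, layer_mask, layers)                             \
   for (uint32_t layer = 0;                                                   \
        layer < ((layer_mask) ? (util_logbase2(layer_mask) + 1) : (layers));  \
        layer++)                                                              \
      if (!(layer_mask) || ((layer_mask) & BIT(layer)))

/* Count how many layers the macro visits for a given mask/count pair. */
static uint32_t count_layers(uint32_t layer_mask, uint32_t layers)
{
   uint32_t n = 0;
   for_each_layer(layer, layer_mask, layers)
      n++;
   return n;
}
```

For example, a mask of 0x5 visits layers 0 and 2, while a zero mask with `layers == 4` visits 0 through 3.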
@@ -32,6 +32,7 @@
 #include "tu_image.h"
 #include "tu_formats.h"
 #include "tu_rmv.h"
+#include "tu_subsampled_image.h"
 #include "bvh/tu_build_interface.h"

 static inline uint8_t *

@@ -43,7 +44,8 @@ pool_base(struct tu_descriptor_pool *pool)
 static uint32_t
 descriptor_size(struct tu_device *dev,
                 const VkDescriptorSetLayoutBinding *binding,
-                VkDescriptorType type)
+                VkDescriptorType type,
+                bool subsampled)
 {
    switch (type) {
    case VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER:

@@ -54,7 +56,7 @@ descriptor_size(struct tu_device *dev,
       * descriptors which are less than 16 dwords. However combined images
       * and samplers are actually two descriptors, so they have size 2.
       */
-      return FDL6_TEX_CONST_DWORDS * 4 * 2;
+      return FDL6_TEX_CONST_DWORDS * 4 * (subsampled ? 3 : 2);
    case VK_DESCRIPTOR_TYPE_STORAGE_BUFFER:
    case VK_DESCRIPTOR_TYPE_STORAGE_BUFFER_DYNAMIC:
       /* isam.v allows using a single 16-bit descriptor for both 16-bit and
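The sizing rule above can be checked with a little arithmetic: each descriptor slot is `FDL6_TEX_CONST_DWORDS` dwords (assumed here to be 16, i.e. a 64-byte slot, as on a6xx), so a combined image/sampler descriptor grows from two slots to three when a subsampled sampler needs the extra metadata pointer. A minimal sketch under that assumption:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed slot size: 16 dwords = 64 bytes per descriptor slot on a6xx. */
#define FDL6_TEX_CONST_DWORDS 16

/* Mirror of the sizing rule above: image slot + sampler slot, plus a third
 * slot for the subsampled-metadata pointer when the sampler is subsampled. */
static uint32_t combined_image_sampler_size(bool subsampled)
{
   return FDL6_TEX_CONST_DWORDS * 4 * (subsampled ? 3 : 2);
}
```

The 192-byte subsampled case matches the `combinedImageSamplerDensityMapDescriptorSize` of `3 * FDL6_TEX_CONST_DWORDS * 4` reported elsewhere in this change.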
@@ -80,7 +82,8 @@ mutable_descriptor_size(struct tu_device *dev,
    uint32_t max_size = 0;

    for (uint32_t i = 0; i < list->descriptorTypeCount; i++) {
-      uint32_t size = descriptor_size(dev, NULL, list->pDescriptorTypes[i]);
+      uint32_t size = descriptor_size(dev, NULL, list->pDescriptorTypes[i],
+                                      false);
       max_size = MAX2(max_size, size);
    }

@@ -194,6 +197,7 @@ tu_CreateDescriptorSetLayout(
       set_layout->binding[b].dynamic_offset_offset = dynamic_offset_size;
       set_layout->binding[b].shader_stages = binding->stageFlags;

+      bool has_subsampled_sampler = false;
       if ((binding->descriptorType == VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER ||
            binding->descriptorType == VK_DESCRIPTOR_TYPE_SAMPLER) &&
           binding->pImmutableSamplers) {

@@ -208,8 +212,12 @@ tu_CreateDescriptorSetLayout(

          bool has_ycbcr_sampler = false;
          for (unsigned i = 0; i < pCreateInfo->pBindings[j].descriptorCount; ++i) {
-            if (tu_sampler_from_handle(binding->pImmutableSamplers[i])->vk.ycbcr_conversion)
+            VK_FROM_HANDLE(tu_sampler, sampler,
+                           binding->pImmutableSamplers[i]);
+            if (sampler->vk.ycbcr_conversion)
                has_ycbcr_sampler = true;
+            if (sampler->vk.flags & VK_SAMPLER_CREATE_SUBSAMPLED_BIT_EXT)
+               has_subsampled_sampler = true;
          }

          if (has_ycbcr_sampler) {

@@ -236,7 +244,8 @@ tu_CreateDescriptorSetLayout(
             mutable_descriptor_size(device, &mutable_info->pMutableDescriptorTypeLists[j]);
       } else {
          set_layout->binding[b].size =
-            descriptor_size(device, binding, binding->descriptorType);
+            descriptor_size(device, binding, binding->descriptorType,
+                            has_subsampled_sampler);
       }

       if (binding->descriptorType == VK_DESCRIPTOR_TYPE_INLINE_UNIFORM_BLOCK)

@@ -365,7 +374,19 @@ tu_GetDescriptorSetLayoutSupport(
             descriptor_sz =
                mutable_descriptor_size(device, &mutable_info->pMutableDescriptorTypeLists[i]);
          } else {
-            descriptor_sz = descriptor_size(device, binding, binding->descriptorType);
+            bool has_subsampled_sampler = false;
+            if (binding->pImmutableSamplers) {
+               for (unsigned i = 0; i < binding->descriptorCount; i++) {
+                  VK_FROM_HANDLE(tu_sampler, sampler,
+                                 binding->pImmutableSamplers[i]);
+                  if (sampler->vk.flags & VK_SAMPLER_CREATE_SUBSAMPLED_BIT_EXT) {
+                     has_subsampled_sampler = true;
+                     break;
+                  }
+               }
+            }
+            descriptor_sz = descriptor_size(device, binding, binding->descriptorType,
+                                            has_subsampled_sampler);
          }
          uint64_t descriptor_alignment = 4 * FDL6_TEX_CONST_DWORDS;
@@ -453,6 +474,9 @@ sha1_update_descriptor_set_binding_layout(struct mesa_sha1 *ctx,
    SHA1_UPDATE_VALUE(ctx, layout->dynamic_offset_offset);
    SHA1_UPDATE_VALUE(ctx, layout->immutable_samplers_offset);

+   const struct tu_sampler *samplers =
+      tu_immutable_samplers(set_layout, layout);
+
    const struct vk_ycbcr_conversion_state *ycbcr_samplers =
       tu_immutable_ycbcr_samplers(set_layout, layout);

@@ -460,6 +484,16 @@ sha1_update_descriptor_set_binding_layout(struct mesa_sha1 *ctx,
       for (unsigned i = 0; i < layout->array_size; i++)
          sha1_update_ycbcr_sampler(ctx, ycbcr_samplers + i);
    }
+
+   if (samplers) {
+      for (unsigned i = 0; i < layout->array_size; i++) {
+         if (samplers[i].vk.flags & VK_SAMPLER_CREATE_SUBSAMPLED_BIT_EXT) {
+            SHA1_UPDATE_VALUE(ctx, i);
+            SHA1_UPDATE_VALUE(ctx, samplers[i].vk.address_mode_u);
+            SHA1_UPDATE_VALUE(ctx, samplers[i].vk.address_mode_v);
+         }
+      }
+   }
 }

@@ -721,7 +755,7 @@ tu_CreateDescriptorPool(VkDevice _device,
       switch (pool_size->type) {
       case VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER_DYNAMIC:
       case VK_DESCRIPTOR_TYPE_STORAGE_BUFFER_DYNAMIC:
-         dynamic_size += descriptor_size(device, NULL, pool_size->type) *
+         dynamic_size += descriptor_size(device, NULL, pool_size->type, false) *
                          pool_size->descriptorCount;
          break;
       case VK_DESCRIPTOR_TYPE_MUTABLE_EXT:

@@ -740,7 +774,11 @@ tu_CreateDescriptorPool(VkDevice _device,
          bo_size += pool_size->descriptorCount;
          break;
       default:
-         bo_size += descriptor_size(device, NULL, pool_size->type) *
+         /* We don't know whether this pool will be used with subsampled
+          * images, so we have to assume it may be.
+          */
+         bo_size += descriptor_size(device, NULL, pool_size->type,
+                                    device->vk.enabled_features.fragmentDensityMap) *
                     pool_size->descriptorCount;
          break;
       }

@@ -1084,15 +1122,35 @@ static void
 write_combined_image_sampler_descriptor(uint32_t *dst,
                                         VkDescriptorType descriptor_type,
                                         const VkDescriptorImageInfo *image_info,
-                                        bool has_sampler)
+                                        bool write_sampler,
+                                        const struct tu_sampler *immutable_sampler)
 {
    write_image_descriptor(dst, descriptor_type, image_info);
-   /* copy over sampler state */
-   if (has_sampler) {
-      VK_FROM_HANDLE(tu_sampler, sampler, image_info->sampler);
+
+   /* copy over sampler state */
+   if (write_sampler) {
+      VK_FROM_HANDLE(tu_sampler, sampler, image_info->sampler);
       memcpy(dst + FDL6_TEX_CONST_DWORDS, sampler->descriptor, sizeof(sampler->descriptor));
    }
+
+   /* It's technically legal to sample from a mismatched descriptor (i.e. only
+    * the sampler or only the image has SUBSAMPLED_BIT) but it gives undefined
+    * results. So we have to make sure not to crash or disturb other
+    * descriptors. Therefore we check the sampler, because that's what
+    * triggers allocating extra space in the descriptor set.
+    */
+   if (immutable_sampler &&
+       (immutable_sampler->vk.flags & VK_SAMPLER_CREATE_SUBSAMPLED_BIT_EXT)) {
+      VK_FROM_HANDLE(tu_image_view, iview, image_info->imageView);
+      VkDescriptorAddressInfoEXT info = {
+         .address = iview->image->iova +
+            iview->image->subsampled_metadata_offset +
+            iview->vk.base_array_layer * sizeof(struct tu_subsampled_metadata),
+         .range =
+            iview->vk.layer_count * sizeof(struct tu_subsampled_metadata),
+      };
+      write_ubo_descriptor_addr(dst + 2 * FDL6_TEX_CONST_DWORDS, &info);
+   }
 }

 static void

@@ -1156,12 +1214,15 @@ tu_GetDescriptorEXT(
       write_image_descriptor(dest, VK_DESCRIPTOR_TYPE_STORAGE_IMAGE,
                              pDescriptorInfo->data.pStorageImage);
       break;
-   case VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER:
+   case VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER: {
+      VK_FROM_HANDLE(tu_sampler, sampler,
+                     pDescriptorInfo->data.pCombinedImageSampler->sampler);
       write_combined_image_sampler_descriptor(dest,
                                               VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER,
                                               pDescriptorInfo->data.pCombinedImageSampler,
-                                              true);
+                                              true, sampler);
       break;
+   }
    case VK_DESCRIPTOR_TYPE_SAMPLER:
       write_sampler_descriptor(dest, *pDescriptorInfo->data.pSampler);
       break;

@@ -1285,7 +1346,8 @@ tu_update_descriptor_sets(const struct tu_device *device,
             write_combined_image_sampler_descriptor(ptr,
                                                     writeset->descriptorType,
                                                     writeset->pImageInfo + j,
-                                                    !binding_layout->immutable_samplers_offset);
+                                                    !samplers,
+                                                    samplers ? &samplers[writeset->dstArrayElement + j] : NULL);

             if (copy_immutable_samplers)
                write_sampler_push(ptr + FDL6_TEX_CONST_DWORDS, &samplers[writeset->dstArrayElement + j]);

@@ -1636,7 +1698,8 @@ tu_update_descriptor_set_with_template(
             write_combined_image_sampler_descriptor(ptr,
                                                     templ->entry[i].descriptor_type,
                                                     (const VkDescriptorImageInfo *) src,
-                                                    !samplers);
+                                                    !samplers,
+                                                    samplers ? &samplers[j] : NULL);
             if (templ->entry[i].copy_immutable_samplers)
                write_sampler_push(ptr + FDL6_TEX_CONST_DWORDS, &samplers[j]);
             break;
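The UBO written into the third descriptor slot above points at per-layer metadata records stored after the image data. A sketch of the address/range arithmetic, with `META_SIZE` as an illustrative stand-in for `sizeof(struct tu_subsampled_metadata)` (the real struct size is not shown in this change):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-in for sizeof(struct tu_subsampled_metadata). */
#define META_SIZE 64u

/* Per-layer metadata records live at subsampled_metadata_offset past the
 * image data; a view covering [base_layer, base_layer + layer_count) gets a
 * UBO window over just its layers. */
static uint64_t meta_address(uint64_t iova, uint64_t meta_offset,
                             uint32_t base_layer)
{
   return iova + meta_offset + (uint64_t)base_layer * META_SIZE;
}

static uint64_t meta_range(uint32_t layer_count)
{
   return (uint64_t)layer_count * META_SIZE;
}
```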
@@ -1411,7 +1411,7 @@ tu_get_properties(struct tu_physical_device *pdevice,
    props->samplerDescriptorBufferAddressSpaceSize = ~0ull;
    props->resourceDescriptorBufferAddressSpaceSize = ~0ull;
    props->descriptorBufferAddressSpaceSize = ~0ull;
-   props->combinedImageSamplerDensityMapDescriptorSize = 2 * FDL6_TEX_CONST_DWORDS * 4;
+   props->combinedImageSamplerDensityMapDescriptorSize = 3 * FDL6_TEX_CONST_DWORDS * 4;

    /* VK_EXT_legacy_vertex_attributes */
    props->nativeUnalignedPerformance = true;
@@ -44,6 +44,7 @@

 enum global_shader {
    GLOBAL_SH_VS_BLIT,
+   GLOBAL_SH_VS_MULTI_BLIT,
    GLOBAL_SH_VS_CLEAR,
    GLOBAL_SH_FS_BLIT,
    GLOBAL_SH_FS_BLIT_ZSCALE,
@@ -29,6 +29,7 @@
 #include "tu_formats.h"
 #include "tu_lrz.h"
 #include "tu_rmv.h"
+#include "tu_subsampled_image.h"
 #include "tu_wsi.h"

 uint32_t

@@ -538,6 +539,15 @@ tu_image_update_layout(struct tu_device *device, struct tu_image *image,
       /* no UBWC for separate stencil */
       image->ubwc_enabled = false;

+   /* Subsampled images with FDM offset require extra space for adjusting
+    * the offset to make the tiles aligned.
+    */
+   if ((image->vk.create_flags & VK_IMAGE_CREATE_SUBSAMPLED_BIT_EXT) &&
+       (image->vk.create_flags & VK_IMAGE_CREATE_FRAGMENT_DENSITY_MAP_OFFSET_BIT_EXT)) {
+      width0 += device->physical_device->info->tile_align_w;
+      height0 += device->physical_device->info->tile_align_h;
+   }
+
    struct fdl_explicit_layout plane_layout;

    if (plane_layouts) {

@@ -634,6 +644,12 @@ tu_image_update_layout(struct tu_device *device, struct tu_image *image,
       image->lrz_layout.lrz_total_size = 0;
    }

+   if (image->vk.create_flags & VK_IMAGE_CREATE_SUBSAMPLED_BIT_EXT) {
+      image->subsampled_metadata_offset = align64(image->total_size, 16);
+      image->total_size = image->subsampled_metadata_offset +
+         image->vk.array_layers * sizeof(struct tu_subsampled_metadata);
+   }
+
    return VK_SUCCESS;
 }
 TU_GENX(tu_image_update_layout);
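The layout change above appends one `tu_subsampled_metadata` record per array layer after the image contents, aligned to 16 bytes. A sketch of that bookkeeping, again using an illustrative `META_SIZE` stand-in for the real struct size:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for Mesa's align64(): round v up to a power-of-two alignment. */
static uint64_t align64(uint64_t v, uint64_t a) { return (v + a - 1) & ~(a - 1); }

/* Illustrative stand-in for sizeof(struct tu_subsampled_metadata). */
#define META_SIZE 64u

/* Append one metadata record per array layer after the image data, 16-byte
 * aligned, and grow the total size to cover them. */
static uint64_t append_metadata(uint64_t total_size, uint32_t array_layers,
                                uint64_t *meta_offset)
{
   *meta_offset = align64(total_size, 16);
   return *meta_offset + (uint64_t)array_layers * META_SIZE;
}
```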
@@ -34,6 +34,7 @@ struct tu_image
    struct vk_image vk;

    struct fdl_layout layout[3];
+   uint64_t subsampled_metadata_offset;
    uint64_t total_size;

    /* Set when bound */
@@ -2732,32 +2732,46 @@ fdm_apply_viewports(struct tu_cmd_buffer *cmd, struct tu_cs *cs, void *data,
        * renderpass, views will be 1 and we also have to replicate the 0'th
        * view to every view.
        */
-      VkExtent2D frag_area =
-         (state->share_scale || views == 1) ? tile->frag_areas[0] : tile->frag_areas[i];
-      VkRect2D bin =
-         (state->share_scale || views == 1) ? bins[0] : bins[i];
-      VkOffset2D hw_viewport_offset =
-         (state->share_scale || views == 1) ? hw_viewport_offsets[0] :
-         hw_viewport_offsets[i];
+      unsigned view = (state->share_scale || views == 1) ? 0 : i;
+      VkExtent2D frag_area = tile->frag_areas[view];
+      VkRect2D bin = bins[view];
+      VkOffset2D hw_viewport_offset = hw_viewport_offsets[view];
       /* Implement fake_single_viewport by replicating viewport 0 across all
        * views.
        */
       VkViewport viewport =
          state->fake_single_viewport ? state->vp.viewports[0] : state->vp.viewports[i];
-      if ((frag_area.width == 1 && frag_area.height == 1 &&
+      if (frag_area.width == 1 && frag_area.height == 1 &&
           common_bin_offset.x == bin.offset.x &&
-          common_bin_offset.y == bin.offset.y) ||
-          /* When in a custom resolve operation (TODO: and using
-           * non-subsampled images) we switch to framebuffer coordinates so we
-           * shouldn't apply the transform. However the binning pass isn't
-           * aware of this, so we have to keep applying the transform for
-           * binning.
-           */
-          (state->custom_resolve && !binning)) {
+          common_bin_offset.y == bin.offset.y) {
          vp.viewports[i] = viewport;
          continue;
       }

+      /* When custom resolve is enabled, we need to apply the viewport
+       * transform so that we render to where we would've blitted the tile to.
+       * Without subsampled images, this is the framebuffer space bin (so there
+       * is effectively no transform). With subsampled images, this is
+       * subsampled space, which may not be the same as rendering space if
+       * we had to shift the tile or with FDM offset.
+       */
+      VkOffset2D tile_start = common_bin_offset;
+      if (state->custom_resolve && !binning) {
+         if (tile->subsampled)
+            tile_start = tile->subsampled_pos[view].offset;
+         else
+            tile_start = bin.offset;
+      }
+
+      /* When in a custom resolve operation without subsampling we shouldn't
+       * scale the viewport down. However the binning pass isn't aware of
+       * this, so we have to keep applying the transform for binning.
+       */
+      if (state->custom_resolve &&
+          !(tile->subsampled_views & (1u << view)) && !binning) {
+         frag_area = (VkExtent2D) {1, 1};
+      }
+
       float scale_x = (float) 1.0f / frag_area.width;
       float scale_y = (float) 1.0f / frag_area.height;
@@ -2767,9 +2781,12 @@ fdm_apply_viewports(struct tu_cmd_buffer *cmd, struct tu_cs *cs, void *data,
       vp.viewports[i].height = viewport.height * scale_y;

       VkOffset2D offset = tu_fdm_per_bin_offset(frag_area, bin,
-                                                common_bin_offset);
-      offset.x -= hw_viewport_offset.x;
-      offset.y -= hw_viewport_offset.y;
+                                                tile_start);
+      /* FDM offsets are disabled with custom resolve. */
+      if (!state->custom_resolve) {
+         offset.x -= hw_viewport_offset.x;
+         offset.y -= hw_viewport_offset.y;
+      }

       vp.viewports[i].x = scale_x * viewport.x + offset.x;
       vp.viewports[i].y = scale_y * viewport.y + offset.y;
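The viewport math above shrinks each viewport by the per-view fragment area and rebases it by a per-bin offset. A standalone sketch of the transform, treating the offset as a given input rather than computing it via `tu_fdm_per_bin_offset()`:

```c
#include <assert.h>

struct viewport { float x, y, width, height; };

/* Scale a viewport by 1/frag_area and shift it by the per-bin offset, as in
 * fdm_apply_viewports() above. */
static struct viewport scale_viewport(struct viewport vp,
                                      unsigned frag_w, unsigned frag_h,
                                      int off_x, int off_y)
{
   float sx = 1.0f / frag_w, sy = 1.0f / frag_h;
   return (struct viewport) {
      .x = sx * vp.x + off_x,
      .y = sy * vp.y + off_y,
      .width = vp.width * sx,
      .height = vp.height * sy,
   };
}
```

With a 2x2 fragment area the viewport halves in each dimension before the offset is applied, matching the `scale_x`/`scale_y` computation in the hunk above.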
@@ -2861,15 +2878,33 @@ fdm_apply_scissors(struct tu_cmd_buffer *cmd, struct tu_cs *cs, void *data,
    struct vk_viewport_state vp = state->vp;

    for (unsigned i = 0; i < vp.scissor_count; i++) {
-      VkExtent2D frag_area =
-         (state->share_scale || views == 1) ? tile->frag_areas[0] : tile->frag_areas[i];
-      VkRect2D bin =
-         (state->share_scale || views == 1) ? bins[0] : bins[i];
+      unsigned view = (state->share_scale || views == 1) ? 0 : i;
+      VkExtent2D frag_area = tile->frag_areas[view];
+      VkRect2D bin = bins[view];
       VkRect2D scissor =
          state->fake_single_viewport ? state->vp.scissors[0] : state->vp.scissors[i];
-      VkOffset2D hw_viewport_offset =
-         (state->share_scale || views == 1) ? hw_viewport_offsets[0] :
-         hw_viewport_offsets[i];
+      VkOffset2D hw_viewport_offset = hw_viewport_offsets[view];
+
+      VkOffset2D tile_start = common_bin_offset;
+      if (state->custom_resolve && !binning) {
+         if (tile->subsampled)
+            tile_start = tile->subsampled_pos[view].offset;
+         else
+            tile_start = bin.offset;
+      }
+
+      /* Disable scaling when doing a custom resolve to a non-subsampled image
+       * and not in the binning pass, because we use framebuffer coordinates.
+       */
+      if (state->custom_resolve &&
+          !(tile->subsampled_views & (1u << view)) && !binning) {
+         frag_area = (VkExtent2D) {1, 1};
+      }
+
+      if (!state->custom_resolve) {
+         tile_start.x -= hw_viewport_offset.x;
+         tile_start.y -= hw_viewport_offset.y;
+      }

       /* Transform the scissor following the viewport. It's unclear how this
        * is supposed to handle cases where the scissor isn't aligned to the

@@ -2878,22 +2913,7 @@ fdm_apply_scissors(struct tu_cmd_buffer *cmd, struct tu_cs *cs, void *data,
        * isn't aligned to the fragment area.
        */
       VkOffset2D offset = tu_fdm_per_bin_offset(frag_area, bin,
-                                                common_bin_offset);
-      offset.x -= hw_viewport_offset.x;
-      offset.y -= hw_viewport_offset.y;
-
-      /* Disable scaling and offset when doing a custom resolve to a
-       * non-subsampled image and not in the binning pass, because we
-       * use framebuffer coordinates.
-       *
-       * TODO: When we support subsampled images, only do this for
-       * non-subsampled images.
-       */
-      if (state->custom_resolve && !binning) {
-         offset = (VkOffset2D) {};
-         frag_area = (VkExtent2D) {1, 1};
-      }
+                                                tile_start);

       VkOffset2D min = {
          scissor.offset.x / frag_area.width + offset.x,
          scissor.offset.y / frag_area.height + offset.y,

@@ -2904,26 +2924,17 @@ fdm_apply_scissors(struct tu_cmd_buffer *cmd, struct tu_cs *cs, void *data,
       };

       /* Intersect scissor with the scaled bin, this essentially replaces the
-       * window scissor. With custom resolve (TODO: and non-subsampled images)
-       * we have to use the unscaled bin instead.
+       * window scissor. With custom resolve we have to use the unscaled bin
+       * instead.
        */
       uint32_t scaled_width = bin.extent.width / frag_area.width;
       uint32_t scaled_height = bin.extent.height / frag_area.height;
-      int32_t bin_x;
-      int32_t bin_y;
-      if (state->custom_resolve && !binning) {
-         bin_x = bin.offset.x;
-         bin_y = bin.offset.y;
-      } else {
-         bin_x = common_bin_offset.x - hw_viewport_offset.x;
-         bin_y = common_bin_offset.y - hw_viewport_offset.y;
-      }
-      vp.scissors[i].offset.x = MAX2(min.x, bin_x);
-      vp.scissors[i].offset.y = MAX2(min.y, bin_y);
+      vp.scissors[i].offset.x = MAX2(min.x, tile_start.x);
+      vp.scissors[i].offset.y = MAX2(min.y, tile_start.y);
       vp.scissors[i].extent.width =
-         MIN2(max.x, bin_x + scaled_width) - vp.scissors[i].offset.x;
+         MIN2(max.x, tile_start.x + scaled_width) - vp.scissors[i].offset.x;
       vp.scissors[i].extent.height =
-         MIN2(max.y, bin_y + scaled_height) - vp.scissors[i].offset.y;
+         MIN2(max.y, tile_start.y + scaled_height) - vp.scissors[i].offset.y;
    }

    TU_CALLX(cs->device, tu6_emit_scissor)(cs, &vp);
@ -21,6 +21,7 @@
|
||||||
#include "tu_lrz.h"
|
#include "tu_lrz.h"
|
||||||
#include "tu_pipeline.h"
|
#include "tu_pipeline.h"
|
||||||
#include "tu_rmv.h"
|
#include "tu_rmv.h"
|
||||||
|
#include "tu_subsampled_image.h"
|
||||||
|
|
||||||
#include <initializer_list>
|
#include <initializer_list>
|
||||||
|
|
||||||
|
|
@ -506,7 +507,7 @@ lower_ssbo_ubo_intrinsic(struct tu_device *dev,
|
||||||
|
|
||||||
static nir_def *
|
static nir_def *
|
||||||
build_bindless(struct tu_device *dev, nir_builder *b,
|
build_bindless(struct tu_device *dev, nir_builder *b,
|
||||||
nir_deref_instr *deref, bool is_sampler,
|
nir_deref_instr *deref, unsigned combined_descriptor_offset,
|
||||||
struct tu_shader *shader,
|
struct tu_shader *shader,
|
||||||
const struct tu_pipeline_layout *layout,
|
const struct tu_pipeline_layout *layout,
|
||||||
uint32_t read_only_input_attachments,
|
uint32_t read_only_input_attachments,
|
||||||
|
|
@ -568,9 +569,8 @@ build_bindless(struct tu_device *dev, nir_builder *b,
|
||||||
/* Samplers come second in combined image/sampler descriptors, see
|
/* Samplers come second in combined image/sampler descriptors, see
|
||||||
* write_combined_image_sampler_descriptor().
|
* write_combined_image_sampler_descriptor().
|
||||||
*/
|
*/
|
||||||
if (is_sampler && bind_layout->type ==
|
if (bind_layout->type == VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER) {
|
||||||
VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER) {
|
offset = combined_descriptor_offset;
|
||||||
offset = 1;
|
|
||||||
}
|
}
|
||||||
desc_offset =
|
desc_offset =
|
||||||
nir_imm_int(b, (bind_layout->offset / (4 * FDL6_TEX_CONST_DWORDS)) +
|
nir_imm_int(b, (bind_layout->offset / (4 * FDL6_TEX_CONST_DWORDS)) +
|
||||||
|
|
@ -594,7 +594,7 @@ lower_image_deref(struct tu_device *dev, nir_builder *b,
|
||||||
const struct tu_pipeline_layout *layout)
|
const struct tu_pipeline_layout *layout)
|
||||||
{
|
{
|
||||||
nir_deref_instr *deref = nir_src_as_deref(instr->src[0]);
|
nir_deref_instr *deref = nir_src_as_deref(instr->src[0]);
|
||||||
nir_def *bindless = build_bindless(dev, b, deref, false, shader, layout, 0, false);
|
nir_def *bindless = build_bindless(dev, b, deref, 0, shader, layout, 0, false);
|
||||||
nir_rewrite_image_intrinsic(instr, bindless, true);
|
nir_rewrite_image_intrinsic(instr, bindless, true);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
@ -697,42 +697,93 @@ lower_intrinsic(nir_builder *b, nir_intrinsic_instr *instr,
|
||||||
}
|
}
|
||||||
|
|
||||||
static void
|
static void
|
||||||
lower_tex_ycbcr(const struct tu_pipeline_layout *layout,
|
lower_tex_subsampled(const struct tu_sampler *sampler,
|
||||||
|
struct tu_device *dev,
|
||||||
|
struct tu_shader *shader,
|
||||||
|
const struct tu_pipeline_layout *layout,
|
||||||
|
nir_builder *b,
|
||||||
|
nir_tex_instr *tex)
|
||||||
|
{
|
||||||
|
/* Only these ops are allowed with subsampled images */
|
||||||
|
if (tex->op != nir_texop_tex &&
|
||||||
|
tex->op != nir_texop_txl)
|
||||||
|
return;
|
||||||
|
|
||||||
|
b->cursor = nir_before_instr(&tex->instr);
|
||||||
|
|
||||||
|
int tex_src_idx = nir_tex_instr_src_index(tex, nir_tex_src_texture_deref);
|
||||||
|
assert(tex_src_idx >= 0);
|
||||||
|
nir_deref_instr *deref = nir_src_as_deref(tex->src[tex_src_idx].src);
|
||||||
|
nir_def *bindless = build_bindless(dev, b, deref, 2, shader, layout,
|
||||||
|
0, /* read_only_input_attachments (not used) */
|
||||||
|
false /* dynamic_renderpass (not used)*/
|
||||||
|
);
|
||||||
|
|
||||||
|
   nir_def *coord = nir_steal_tex_src(tex, nir_tex_src_coord);
   nir_def *coord_xy = nir_channels(b, coord, 0x3);
   nir_def *layer = NULL;
   if (coord->num_components > 2)
      layer = nir_channel(b, coord, 2);

   /* In order to avoid problems in the math for finding the bin with
    * an x or y coordinate of exactly 1.0, where we would overflow into the
    * next bin, we have to clamp to some 1.0 - epsilon. The largest possible
    * framebuffer is 2^14 pixels currently, and we cannot shift the coordinate
    * to before the pixel center, so we use 2^-15.
    */
   const float epsilon = 0x1p-15f;
   nir_def *clamped_coord_xy =
      nir_fmax(b, nir_fmin(b, coord_xy, nir_imm_float(b, 1.0f - epsilon)),
               nir_imm_float(b, 0.0));

   nir_def *clamped_coord = clamped_coord_xy;
   if (layer) {
      clamped_coord = nir_vec3(b, nir_channel(b, clamped_coord_xy, 0),
                               nir_channel(b, clamped_coord_xy, 1),
                               layer);
   }

   nir_def *transformed_coord_xy =
      tu_get_subsampled_coordinates(b, clamped_coord, bindless);

   /* Due to VUID-VkSamplerCreateInfo-flags-02577 we only have to handle
    * CLAMP_TO_EDGE and CLAMP_TO_BORDER. We implicitly do CLAMP_TO_EDGE to
    * prevent OOB accesses to the metadata anyway, so we just fix up the
    * coordinates to pass the original coordinates if OOB.
    */
   if (sampler->vk.address_mode_u == VK_SAMPLER_ADDRESS_MODE_CLAMP_TO_BORDER) {
      nir_def *x = nir_channel(b, coord, 0);
      nir_def *oob = nir_fneu(b, nir_fsat(b, x), x);
      transformed_coord_xy =
         nir_vec2(b, nir_bcsel(b, oob, x,
                               nir_channel(b, transformed_coord_xy, 0)),
                  nir_channel(b, transformed_coord_xy, 1));
   }

   if (sampler->vk.address_mode_v == VK_SAMPLER_ADDRESS_MODE_CLAMP_TO_BORDER) {
      nir_def *y = nir_channel(b, coord, 1);
      nir_def *oob = nir_fneu(b, nir_fsat(b, y), y);
      transformed_coord_xy =
         nir_vec2(b, nir_channel(b, transformed_coord_xy, 0),
                  nir_bcsel(b, oob, y,
                            nir_channel(b, transformed_coord_xy, 1)));
   }

   nir_def *transformed_coord = transformed_coord_xy;
   if (layer) {
      transformed_coord = nir_vec3(b, nir_channel(b, transformed_coord_xy, 0),
                                   nir_channel(b, transformed_coord_xy, 1),
                                   layer);
   }

   nir_tex_instr_add_src(tex, nir_tex_src_coord, transformed_coord);
}

static void
lower_tex_ycbcr(const struct vk_ycbcr_conversion_state *ycbcr_sampler,
                nir_builder *builder,
                nir_tex_instr *tex)
{
   if (ycbcr_sampler->ycbcr_model == VK_SAMPLER_YCBCR_MODEL_CONVERSION_RGB_IDENTITY)
      return;

@@ -756,6 +807,55 @@ lower_tex_ycbcr(const struct tu_pipeline_layout *layout,
   builder->cursor = nir_before_instr(&tex->instr);
}
static void
lower_tex_immutable(struct tu_device *dev,
                    struct tu_shader *shader,
                    const struct tu_pipeline_layout *layout,
                    nir_builder *builder,
                    nir_tex_instr *tex)
{
   int deref_src_idx = nir_tex_instr_src_index(tex, nir_tex_src_texture_deref);
   assert(deref_src_idx >= 0);
   nir_deref_instr *deref = nir_src_as_deref(tex->src[deref_src_idx].src);

   nir_variable *var = nir_deref_instr_get_variable(deref);
   const struct tu_descriptor_set_layout *set_layout =
      layout->set[var->data.descriptor_set].layout;
   const struct tu_descriptor_set_binding_layout *binding =
      &set_layout->binding[var->data.binding];

   /* For the following instructions, we don't apply any change */
   if (tex->op == nir_texop_txs ||
       tex->op == nir_texop_query_levels ||
       tex->op == nir_texop_lod)
      return;

   assert(tex->texture_index == 0);
   unsigned array_index = 0;
   if (deref->deref_type != nir_deref_type_var) {
      assert(deref->deref_type == nir_deref_type_array);
      if (!nir_src_is_const(deref->arr.index))
         return;
      array_index = nir_src_as_uint(deref->arr.index);
      array_index = MIN2(array_index, binding->array_size - 1);
   }

   const struct vk_ycbcr_conversion_state *ycbcr_samplers =
      tu_immutable_ycbcr_samplers(set_layout, binding);
   if (ycbcr_samplers) {
      const struct vk_ycbcr_conversion_state *ycbcr_sampler = ycbcr_samplers + array_index;
      lower_tex_ycbcr(ycbcr_sampler, builder, tex);
   }

   const struct tu_sampler *samplers =
      tu_immutable_samplers(set_layout, binding);
   if (samplers) {
      const struct tu_sampler *sampler = samplers + array_index;
      if (sampler->vk.flags & VK_SAMPLER_CREATE_SUBSAMPLED_BIT_EXT)
         lower_tex_subsampled(sampler, dev, shader, layout, builder, tex);
   }
}

static bool
lower_tex_impl(nir_builder *b, nir_tex_instr *tex, struct tu_device *dev,
               struct tu_shader *shader, const struct tu_pipeline_layout *layout,

@@ -765,7 +865,7 @@ lower_tex_impl(nir_builder *b, nir_tex_instr *tex, struct tu_device *dev,
   int sampler_src_idx = nir_tex_instr_src_index(tex, ref ? nir_tex_src_sampler_2_deref : nir_tex_src_sampler_deref);
   if (sampler_src_idx >= 0) {
      nir_deref_instr *deref = nir_src_as_deref(tex->src[sampler_src_idx].src);
      nir_def *bindless = build_bindless(dev, b, deref, 1, shader, layout,
                                         read_only_input_attachments,
                                         dynamic_renderpass);
      nir_src_rewrite(&tex->src[sampler_src_idx].src, bindless);

@@ -775,7 +875,7 @@ lower_tex_impl(nir_builder *b, nir_tex_instr *tex, struct tu_device *dev,
   int tex_src_idx = nir_tex_instr_src_index(tex, ref ? nir_tex_src_texture_2_deref : nir_tex_src_texture_deref);
   if (tex_src_idx >= 0) {
      nir_deref_instr *deref = nir_src_as_deref(tex->src[tex_src_idx].src);
      nir_def *bindless = build_bindless(dev, b, deref, 0, shader, layout,
                                         read_only_input_attachments,
                                         dynamic_renderpass);
      nir_src_rewrite(&tex->src[tex_src_idx].src, bindless);

@@ -800,7 +900,7 @@ lower_tex(nir_builder *b, nir_tex_instr *tex, struct tu_device *dev,
      lower_tex_impl(b, tex, dev, shader, layout, read_only_input_attachments, dynamic_renderpass, false);
      lower_tex_impl(b, tex, dev, shader, layout, read_only_input_attachments, dynamic_renderpass, true);
   } else {
      lower_tex_immutable(dev, shader, layout, b, tex);
      lower_tex_impl(b, tex, dev, shader, layout, read_only_input_attachments, dynamic_renderpass, false);
   }

src/freedreno/vulkan/tu_subsampled_image.cc (new file, 584 lines)

@@ -0,0 +1,584 @@
/*
 * Copyright © 2026 Valve Corporation.
 * SPDX-License-Identifier: MIT
 */

#include "tu_cmd_buffer.h"
#include "tu_subsampled_image.h"

#include "nir_builder.h"

/* If a tile is not subsampled, we treat it as if its fragment area is (1,1)
 * for the purposes of subsampling.
 */
static VkExtent2D
get_effective_frag_area(const struct tu_tile_config *tile, unsigned view)
{
   return (tile->subsampled_views & (1u << view)) ?
      tile->frag_areas[view] : (VkExtent2D) {1, 1};
}

void
tu_emit_subsampled_metadata(struct tu_cmd_buffer *cmd,
                            struct tu_cs *cs,
                            unsigned a,
                            const struct tu_tile_config *tiles,
                            const struct tu_tiling_config *tiling,
                            const struct tu_vsc_config *vsc,
                            const struct tu_framebuffer *fb,
                            const VkOffset2D *fdm_offsets)
{
   const struct tu_image_view *iview = cmd->state.attachments[a];
   float size_ratio_x = (float)iview->image->vk.extent.width /
      iview->image->layout[0].width0;
   float size_ratio_y = (float)iview->image->vk.extent.height /
      iview->image->layout[0].height0;
   for_each_layer (i, cmd->state.pass->attachments[a].used_views |
                   cmd->state.pass->attachments[a].resolve_views,
                   fb->layers) {
      struct tu_subsampled_metadata metadata;

      metadata.hdr.pad0[0] = metadata.hdr.pad0[1] = metadata.hdr.pad0[2] = 0;

      unsigned tile_count;
      if (!tiles || vsc->tile_count.width * vsc->tile_count.height >
          TU_SUBSAMPLED_MAX_BINS) {
         tile_count = 1;
         metadata.hdr.scale_x = 1.0;
         metadata.hdr.scale_y = 1.0;
         metadata.hdr.offset_x = 0.0;
         metadata.hdr.offset_y = 0.0;
         metadata.hdr.bin_stride = 1;
         metadata.bins[0].scale_x = size_ratio_x;
         metadata.bins[0].scale_y = size_ratio_y;
         metadata.bins[0].offset_x = 0.0;
         metadata.bins[0].offset_y = 0.0;
      } else {
         unsigned view = MIN2(i, tu_fdm_num_layers(cmd) - 1);
         VkOffset2D bin_offset = {};
         if (fdm_offsets)
            bin_offset = tu_bin_offset(fdm_offsets[view], tiling);
         tile_count = vsc->tile_count.width * vsc->tile_count.height;
         metadata.hdr.scale_x = (float)iview->vk.extent.width / tiling->tile0.width;
         metadata.hdr.scale_y = (float)iview->vk.extent.height / tiling->tile0.height;
         metadata.hdr.offset_x = (float)bin_offset.x / tiling->tile0.width;
         metadata.hdr.offset_y = (float)bin_offset.y / tiling->tile0.height;
         metadata.hdr.bin_stride = vsc->tile_count.width;

         for (unsigned j = 0; j < tile_count; j++) {
            const struct tu_tile_config *tile = &tiles[j];

            while (tile->merged_tile)
               tile = tile->merged_tile;

            if (!(tile->visible_views & (1u << view)) ||
                !tile->subsampled) {
               metadata.bins[j].scale_x = metadata.bins[j].scale_y = 1.0;
               metadata.bins[j].offset_x = metadata.bins[j].offset_y = 0.0;
               continue;
            }

            VkExtent2D frag_area = get_effective_frag_area(tile, view);
            VkOffset2D fb_bin_start = (VkOffset2D) {
               MAX2(tile->pos.x * (int32_t)tiling->tile0.width - bin_offset.x, 0),
               MAX2(tile->pos.y * (int32_t)tiling->tile0.height - bin_offset.y, 0),
            };
            metadata.bins[j].scale_x = 1.0 / frag_area.width * size_ratio_x;
            metadata.bins[j].scale_y = 1.0 / frag_area.height * size_ratio_y;
            metadata.bins[j].offset_x =
               (float)(tile->subsampled_pos[view].offset.x -
                       fb_bin_start.x / frag_area.width) /
               iview->image->layout[0].width0;
            metadata.bins[j].offset_y =
               (float)(tile->subsampled_pos[view].offset.y -
                       fb_bin_start.y / frag_area.height) /
               iview->image->layout[0].height0;
         }
      }

      uint64_t iova = iview->image->iova +
         iview->image->subsampled_metadata_offset +
         sizeof(struct tu_subsampled_metadata) *
         (iview->vk.base_array_layer + i);

      tu_cs_emit_pkt7(cs, CP_MEM_WRITE,
                      2 + (sizeof(struct tu_subsampled_header) +
                           tile_count * sizeof(struct tu_subsampled_bin)) / 4);
      tu_cs_emit_qw(cs, iova);
      tu_cs_emit_array(cs, (const uint32_t *)&metadata.hdr,
                       sizeof(struct tu_subsampled_header) / 4);
      tu_cs_emit_array(cs, (const uint32_t *)&metadata.bins,
                       sizeof(struct tu_subsampled_bin) * tile_count / 4);
   }

   /* The cache-tracking infrastructure can't be aware of subsampled images,
    * so manually make sure the writes land. Sampling as an image should
    * already insert a CACHE_INVALIDATE + WFI.
    */
   cmd->state.cache.pending_flush_bits |=
      TU_CMD_FLAG_WAIT_MEM_WRITES;
}
nir_def *
tu_get_subsampled_coordinates(nir_builder *b,
                              nir_def *coords,
                              nir_def *descriptor)
{
   nir_def *layer;
   if (coords->num_components > 2)
      layer = nir_f2u16(b, nir_channel(b, coords, 2));
   else
      layer = nir_imm_intN_t(b, 0, 16);

   nir_def *layer_offset =
      nir_imul_imm_nuw(b, layer, sizeof(struct tu_subsampled_metadata) / 16);

   nir_def *hdr0 =
      nir_load_ubo(b, 4, 32, descriptor,
                   nir_ishl_imm(b, nir_u2u32(b, layer_offset), 4),
                   .align_mul = 16,
                   .align_offset = 0,
                   .range = TU_SUBSAMPLED_MAX_LAYERS * sizeof(struct tu_subsampled_metadata));
   nir_def *bin_stride =
      nir_load_ubo(b, 1, 32, descriptor,
                   nir_ishl_imm(b, nir_u2u32(b, nir_iadd_imm(b, layer_offset, 1)), 4),
                   .align_mul = 16,
                   .align_offset = 0,
                   .range = TU_SUBSAMPLED_MAX_LAYERS * sizeof(struct tu_subsampled_metadata));

   nir_def *hdr_scale = nir_channels(b, hdr0, 0x3);
   nir_def *hdr_offset = nir_channels(b, hdr0, 0xc);

   nir_def *bin = nir_f2u16(b, nir_ffma(b, coords, hdr_scale, hdr_offset));
   nir_def *bin_idx = nir_iadd(b, nir_imul(b, nir_channel(b, bin, 1),
                                           nir_u2u16(b, bin_stride)),
                               nir_channel(b, bin, 0));

   bin_idx = nir_iadd_imm(b, nir_iadd(b, bin_idx, layer_offset),
                          sizeof(struct tu_subsampled_header) / 16);

   nir_def *bin_data =
      nir_load_ubo(b, 4, 32, descriptor, nir_ishl_imm(b, nir_u2u32(b, bin_idx), 4),
                   .align_mul = 16,
                   .align_offset = 0,
                   .range = TU_SUBSAMPLED_MAX_LAYERS * sizeof(struct tu_subsampled_metadata));

   nir_def *bin_scale = nir_channels(b, bin_data, 0x3);
   nir_def *bin_offset = nir_channels(b, bin_data, 0xc);

   return nir_ffma(b, coords, bin_scale, bin_offset);
}
/* Calculate the y coordinate in subsampled space of a given number of tiles
 * after the start of "tile".
 */
static void
calc_tile_vert_pos(const struct tu_tile_config *tile,
                   const struct tu_tiling_config *tiling,
                   const struct tu_framebuffer *fb,
                   unsigned view,
                   VkOffset2D bin_offset,
                   unsigned tile_offset,
                   unsigned *pos_y_out)
{
   int offset_px = 0;
   if (tile->pos.y == 0 && tile_offset > 0) {
      /* The first row is a partial row with FDM offset. */
      offset_px += tiling->tile0.height - bin_offset.y;
      tile_offset--;
   }
   offset_px += tiling->tile0.height * tile_offset;

   unsigned pos_y = tile->subsampled_pos[view].offset.y +
      offset_px / get_effective_frag_area(tile, view).height;

   /* The last tile is along the framebuffer edge, so clamp to the framebuffer
    * height.
    */
   *pos_y_out = MIN2(pos_y, tile->subsampled_pos[view].offset.y +
                     tile->subsampled_pos[view].extent.height);
}

static void
calc_tile_horiz_pos(const struct tu_tile_config *tile,
                    const struct tu_tiling_config *tiling,
                    const struct tu_framebuffer *fb,
                    unsigned view,
                    VkOffset2D bin_offset,
                    unsigned tile_offset,
                    unsigned *pos_x_out)
{
   int offset_px = 0;
   if (tile->pos.x == 0 && tile_offset > 0) {
      /* The first column is a partial column with FDM offset. */
      offset_px += tiling->tile0.width - bin_offset.x;
      tile_offset--;
   }
   offset_px += tiling->tile0.width * tile_offset;

   unsigned pos_x = tile->subsampled_pos[view].offset.x +
      offset_px / get_effective_frag_area(tile, view).width;

   /* The last tile is along the framebuffer edge, so clamp to the framebuffer
    * width.
    */
   *pos_x_out = MIN2(pos_x, tile->subsampled_pos[view].offset.x +
                     tile->subsampled_pos[view].extent.width);
}
/* Given two tiles "tile" and "other_tile", calculate the y coordinates of
 * their shared vertical edge in subsampled space relative to "tile". That is,
 * calculate the y coordinates along the edge of "tile" where "other_tile"
 * will touch it after scaling up to framebuffer coordinates. The start and
 * end may be the same coordinate if "tile" and "other_tile" only share a
 * corner, but this will be extended when handling corners.
 */
static void
calc_shared_vert_edge(const struct tu_tile_config *tile,
                      const struct tu_tile_config *other_tile,
                      const struct tu_tiling_config *tiling,
                      const struct tu_framebuffer *fb,
                      unsigned view,
                      VkOffset2D bin_offset,
                      unsigned *out_start,
                      unsigned *out_end)
{
   int other_start_tile = MAX2(other_tile->pos.y - tile->pos.y, 0);
   assert(other_start_tile <= tile->sysmem_extent.height);
   calc_tile_vert_pos(tile, tiling, fb, view, bin_offset,
                      other_start_tile, out_start);
   int other_end_tile =
      MIN2(tile->pos.y + tile->sysmem_extent.height,
           other_tile->pos.y + other_tile->sysmem_extent.height) - tile->pos.y;
   assert(other_end_tile >= 0);
   calc_tile_vert_pos(tile, tiling, fb, view, bin_offset,
                      other_end_tile, out_end);
}

static void
calc_shared_horiz_edge(const struct tu_tile_config *tile,
                       const struct tu_tile_config *other_tile,
                       const struct tu_tiling_config *tiling,
                       const struct tu_framebuffer *fb,
                       unsigned view,
                       VkOffset2D bin_offset,
                       unsigned *out_start,
                       unsigned *out_end)
{
   int other_start_tile = MAX2(other_tile->pos.x - tile->pos.x, 0);
   assert(other_start_tile <= tile->sysmem_extent.width);
   calc_tile_horiz_pos(tile, tiling, fb, view, bin_offset,
                       other_start_tile, out_start);
   int other_end_tile =
      MIN2(tile->pos.x + tile->sysmem_extent.width,
           other_tile->pos.x + other_tile->sysmem_extent.width) - tile->pos.x;
   assert(other_end_tile >= 0);
   calc_tile_horiz_pos(tile, tiling, fb, view, bin_offset,
                       other_end_tile, out_end);
}
/* Extend vertical-edge blit start and end for apron corners. */
static void
handle_vertical_corners(const struct tu_tile_config *tile,
                        const struct tu_tile_config *other_tile,
                        unsigned view,
                        VkRect2D *tile_dst,
                        struct tu_rect2d_float *other_src)
{
   float other_apron_height =
      (float)APRON_SIZE * get_effective_frag_area(tile, view).height /
      get_effective_frag_area(other_tile, view).height;
   if ((unsigned)other_src->y_start > other_tile->subsampled_pos[view].offset.y) {
      tile_dst->offset.y -= APRON_SIZE;
      tile_dst->extent.height += APRON_SIZE;
      other_src->y_start -= other_apron_height;
   }
   if ((unsigned)other_src->y_end <
       other_tile->subsampled_pos[view].offset.y +
       other_tile->subsampled_pos[view].extent.height) {
      tile_dst->extent.height += APRON_SIZE;
      other_src->y_end += other_apron_height;
   }
}

static void
handle_horizontal_corners(const struct tu_tile_config *tile,
                          const struct tu_tile_config *other_tile,
                          unsigned view,
                          VkRect2D *tile_dst,
                          struct tu_rect2d_float *other_src)
{
   float other_apron_width =
      (float)APRON_SIZE * get_effective_frag_area(tile, view).width /
      get_effective_frag_area(other_tile, view).width;
   if (other_src->x_start > other_tile->subsampled_pos[view].offset.x) {
      tile_dst->offset.x -= APRON_SIZE;
      tile_dst->extent.width += APRON_SIZE;
      other_src->x_start -= other_apron_width;
   }
   if ((unsigned)other_src->x_end <
       other_tile->subsampled_pos[view].offset.x +
       other_tile->subsampled_pos[view].extent.width) {
      tile_dst->extent.width += APRON_SIZE;
      other_src->x_end += other_apron_width;
   }
}
unsigned
|
||||||
|
tu_calc_subsampled_aprons(VkRect2D *dst,
|
||||||
|
struct tu_rect2d_float *src,
|
||||||
|
unsigned view,
|
||||||
|
const struct tu_tile_config *tiles,
|
||||||
|
const struct tu_tiling_config *tiling,
|
||||||
|
const struct tu_vsc_config *vsc,
|
||||||
|
const struct tu_framebuffer *fb,
|
||||||
|
const VkOffset2D *fdm_offsets)
|
||||||
|
{
|
||||||
|
unsigned count = 0;
|
||||||
|
|
||||||
|
VkOffset2D bin_offset = {};
|
||||||
|
if (fdm_offsets)
|
||||||
|
bin_offset = tu_bin_offset(fdm_offsets[view], tiling);
|
||||||
|
|
||||||
|
for (unsigned y = 0; y < vsc->tile_count.height; y++) {
|
||||||
|
for (unsigned x = 0; x < vsc->tile_count.width; x++) {
|
||||||
|
const struct tu_tile_config *tile = &tiles[y * vsc->tile_count.width + x];
|
||||||
|
|
||||||
|
if (tile->merged_tile || !(tile->visible_views & (1u << view)))
|
||||||
|
continue;
|
||||||
|
|
||||||
|
int x_neighbor = tile->pos.x + tile->sysmem_extent.width;
|
||||||
|
int y_neighbor = tile->pos.y + tile->sysmem_extent.height;
|
||||||
|
|
||||||
|
/* Start with vertically adjacent tiles. For a given neighbor to the
|
||||||
|
* right, produce aprons for both this tile and its neighbor along
|
||||||
|
* their shared edge. We handle tiles that only share an edge:
|
||||||
|
*
|
||||||
|
* -------- -------
|
||||||
|
* | | |
|
||||||
|
* | tile | other |
|
||||||
|
* | | |
|
||||||
|
* -------- -------
|
||||||
|
*
|
||||||
|
* Tiles that only share a corner:
|
||||||
|
*
|
||||||
|
* -------
|
||||||
|
* | |
|
||||||
|
* | other |
|
||||||
|
* | |
|
||||||
|
* -------- -------
|
||||||
|
* | |
|
||||||
|
* | tile |
|
||||||
|
* | |
|
||||||
|
* --------
|
||||||
|
*
|
||||||
|
* And tiles where the corner of one tile comes from the edge of
|
||||||
|
* another:
|
||||||
|
*
|
||||||
|
* -------
|
||||||
|
* | |
|
||||||
|
* | |
|
||||||
|
* | |
|
||||||
|
* --------| other |
|
||||||
|
* | | |
|
||||||
|
* | tile | |
|
||||||
|
* | | |
|
||||||
|
* -------- -------
|
||||||
|
*
|
||||||
|
*/
|
||||||
|
if (x_neighbor < vsc->tile_count.width) {
|
||||||
|
int y_start = MAX2(tile->pos.y - 1, 0);
|
||||||
|
int y_end = MIN2(tile->pos.y + tile->sysmem_extent.height,
|
||||||
|
vsc->tile_count.height - 1);
|
||||||
|
const struct tu_tile_config *other_tile;
|
||||||
|
|
||||||
|
/* Sweep all tiles directly to the right, keeping in mind
|
||||||
|
* merged tiles.
|
||||||
|
*/
|
||||||
|
for (int y = y_start; y <= y_end;
|
||||||
|
y = other_tile->pos.y + other_tile->sysmem_extent.height) {
|
||||||
|
other_tile = tu_get_merged_tile_const(&tiles[y * vsc->tile_count.width + x_neighbor]);
|
||||||
|
|
||||||
|
if (!(other_tile->visible_views & (1u << view)))
|
||||||
|
continue;
|
||||||
|
|
||||||
|
/* If they are next to each other then neither needs an apron. */
|
||||||
|
if (tile->subsampled_pos[view].offset.x +
|
||||||
|
tile->subsampled_pos[view].extent.width ==
|
||||||
|
other_tile->subsampled_pos[view].offset.x)
|
||||||
|
continue;
|
||||||
|
|
||||||
|
/* If other_tile isn't entirely to the right of tile, it is not
|
||||||
|
* vertically adjacent and will be handled below instead.
|
||||||
|
*/
|
||||||
|
if (other_tile->pos.x < tile->pos.x + tile->sysmem_extent.width)
|
||||||
|
continue;
|
||||||
|
|
||||||
|
VkExtent2D frag_area = get_effective_frag_area(tile, view);
|
||||||
|
VkExtent2D other_frag_area =
|
||||||
|
get_effective_frag_area(other_tile, view);
|
||||||
|
|
||||||
|
unsigned tile_start, tile_end;
|
||||||
|
calc_shared_vert_edge(tile, other_tile, tiling, fb, view,
|
||||||
|
bin_offset, &tile_start, &tile_end);
|
||||||
|
|
||||||
|
unsigned other_tile_start, other_tile_end;
|
||||||
|
calc_shared_vert_edge(other_tile, tile, tiling, fb, view,
|
||||||
|
bin_offset, &other_tile_start,
|
||||||
|
&other_tile_end);
|
||||||
|
|
||||||
|
VkRect2D tile_dst;
|
||||||
|
|
||||||
|
tile_dst.offset.y = tile_start;
|
||||||
|
tile_dst.extent.height = tile_end - tile_start;
|
||||||
|
|
||||||
|
tile_dst.offset.x = tile->subsampled_pos[view].offset.x +
|
||||||
|
tile->subsampled_pos[view].extent.width;
|
||||||
|
tile_dst.extent.width = APRON_SIZE;
|
||||||
|
|
||||||
|
struct tu_rect2d_float other_src;
|
||||||
|
|
||||||
|
other_src.x_start = other_tile->subsampled_pos[view].offset.x;
|
||||||
|
other_src.x_end = other_src.x_start +
|
||||||
|
(float)APRON_SIZE * frag_area.width / other_frag_area.width;
|
||||||
|
|
||||||
|
other_src.y_start = other_tile_start;
|
||||||
|
other_src.y_end = other_tile_end;
|
||||||
|
|
||||||
|
/* Extend start and end for apron corners. */
|
||||||
|
handle_vertical_corners(tile, other_tile, view, &tile_dst,
|
||||||
|
&other_src);
|
||||||
|
|
||||||
|
/* Add other_tile -> tile blit to the list. */
|
||||||
|
dst[count] = tile_dst;
|
||||||
|
src[count] = other_src;
|
||||||
|
count++;
|
||||||
|
|
||||||
|
VkRect2D other_dst;
|
||||||
|
|
||||||
|
other_dst.offset.y = other_tile_start;
|
||||||
|
other_dst.extent.height = other_tile_end - other_tile_start;
|
||||||
|
|
||||||
|
other_dst.offset.x =
|
||||||
|
other_tile->subsampled_pos[view].offset.x - APRON_SIZE;
|
||||||
|
other_dst.extent.width = APRON_SIZE;
|
||||||
|
|
||||||
|
struct tu_rect2d_float tile_src;
|
||||||
|
|
||||||
|
tile_src.x_end = tile->subsampled_pos[view].offset.x
|
||||||
|
+ tile->subsampled_pos[view].extent.width;
|
||||||
|
tile_src.x_start = tile_src.x_end -
|
||||||
|
(float)APRON_SIZE * other_frag_area.width / frag_area.width;
|
||||||
|
|
||||||
|
tile_src.y_start = tile_start;
|
||||||
|
tile_src.y_end = tile_end;
|
||||||
|
|
||||||
|
handle_vertical_corners(other_tile, tile, view, &other_dst,
|
||||||
|
&tile_src);
|
||||||
|
|
||||||
|
/* Add tile -> other_tile blit to the list. */
|
||||||
|
dst[count] = other_dst;
|
||||||
|
src[count] = tile_src;
|
||||||
|
count++;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/* Now do the same thing but for horizontally adjacent tiles. Because
|
||||||
|
* the above loop handled tiles that only share a corner, we only
|
||||||
|
* have to handle neighbors below it that share an edge. However,
|
||||||
|
* these neighbors may also share a corner if they are merged tiles.
|
||||||
|
*/
|
||||||
|
if (y_neighbor < vsc->tile_count.height) {
|
||||||
|
const struct tu_tile_config *other_tile;
|
||||||
|
|
||||||
|
/* Sweep all tiles directly below, keeping in mind merged tiles.
|
||||||
|
*/
|
||||||
|
for (int x = tile->pos.x;
|
||||||
|
x < tile->pos.x + tile->sysmem_extent.width;
|
||||||
|
x = other_tile->pos.x + other_tile->sysmem_extent.width) {
|
||||||
|
other_tile = tu_get_merged_tile_const(&tiles[y_neighbor * vsc->tile_count.width + x]);
|
||||||
|
|
||||||
|
if (!(other_tile->visible_views & (1u << view)))
|
||||||
|
continue;
|
||||||
|
|
||||||
|
/* If both are next to each other then neither needs an apron. */
|
||||||
|
if (tile->subsampled_pos[view].offset.y +
|
||||||
|
tile->subsampled_pos[view].extent.height ==
|
||||||
|
other_tile->subsampled_pos[view].offset.y)
|
||||||
|
continue;
|
||||||
|
|
||||||
|
VkExtent2D frag_area = get_effective_frag_area(tile, view);
|
||||||
|
VkExtent2D other_frag_area =
|
||||||
|
get_effective_frag_area(other_tile, view);
|
||||||
|
|
||||||
|
unsigned tile_start, tile_end;
|
||||||
|
calc_shared_horiz_edge(tile, other_tile, tiling, fb, view,
|
||||||
|
bin_offset, &tile_start, &tile_end);
|
||||||
|
|
||||||
|
unsigned other_tile_start, other_tile_end;
|
||||||
|
calc_shared_horiz_edge(other_tile, tile, tiling, fb, view,
|
||||||
|
                                          bin_offset, &other_tile_start,
                                          &other_tile_end);

               VkRect2D tile_dst;

               tile_dst.offset.x = tile_start;
               tile_dst.extent.width = tile_end - tile_start;

               tile_dst.offset.y = tile->subsampled_pos[view].offset.y +
                  tile->subsampled_pos[view].extent.height;
               tile_dst.extent.height = APRON_SIZE;

               struct tu_rect2d_float other_src;

               other_src.y_start = other_tile->subsampled_pos[view].offset.y;
               other_src.y_end = other_src.y_start +
                  (float)APRON_SIZE * frag_area.height / other_frag_area.height;

               other_src.x_start = other_tile_start;
               other_src.x_end = other_tile_end;

               /* Extend start and end for apron corners. */
               handle_horizontal_corners(tile, other_tile, view, &tile_dst,
                                         &other_src);

               /* Add other_tile -> tile blit to the list. */
               dst[count] = tile_dst;
               src[count] = other_src;
               assert(tile_dst.offset.x >= 0);
               assert(tile_dst.offset.y >= 0);
               count++;

               VkRect2D other_dst;

               other_dst.offset.x = other_tile_start;
               other_dst.extent.width = other_tile_end - other_tile_start;

               other_dst.offset.y =
                  other_tile->subsampled_pos[view].offset.y - APRON_SIZE;
               other_dst.extent.height = APRON_SIZE;

               struct tu_rect2d_float tile_src;

               tile_src.y_end = tile->subsampled_pos[view].offset.y +
                  tile->subsampled_pos[view].extent.height;
               tile_src.y_start = tile_src.y_end -
                  (float)APRON_SIZE * other_frag_area.height / frag_area.height;

               tile_src.x_start = tile_start;
               tile_src.x_end = tile_end;

               handle_horizontal_corners(other_tile, tile, view, &other_dst,
                                         &tile_src);

               /* Add tile -> other_tile blit to the list. */
               dst[count] = other_dst;
               src[count] = tile_src;
               assert(other_dst.offset.x >= 0);
               assert(other_dst.offset.y >= 0);
               count++;
            }
         }
      }
   }

   return count;
}
88	src/freedreno/vulkan/tu_subsampled_image.h	Normal file

@@ -0,0 +1,88 @@
/*
 * Copyright © 2026 Valve Corporation.
 * SPDX-License-Identifier: MIT
 */

#include <stdint.h>

#include "tu_common.h"

/* Describes the format used for subsampled image metadata. This is attached
 * to subsampled images via a separate UBO descriptor after the image
 * descriptor. It is written after the render pass that writes to the image,
 * and is read via code injected into the shader when sampling from a
 * subsampled image.
 */

/* The maximum number of bins a subsampled image can have before we disable
 * subsampling.
 */
#define TU_SUBSAMPLED_MAX_BINS 512

/* The maximum number of layers a view of a subsampled image can have.
 *
 * There is one metadata structure per layer, and the view uses a UBO for the
 * metadata, so this is bounded by the maximum UBO size.
 *
 * TODO: When we implement fdm2, we should expose this as
 * maxSubsampledArrayLayers. The Vulkan spec says that the minimum value for
 * maxSubsampledArrayLayers is 2, so users can only rely on 2 layers even
 * though we support more.
 */
#define TU_SUBSAMPLED_MAX_LAYERS 6

/* This is 2 to allow for floating-point precision errors and in case the
 * user uses bicubic filtering.
 */
#define APRON_SIZE 2

struct tu_subsampled_bin {
   float scale_x;
   float scale_y;
   float offset_x;
   float offset_y;
};

struct tu_subsampled_header {
   /* The bin coordinate to use is calculated as:
    * bin = int(coord * scale + offset)
    */
   float scale_x;
   float scale_y;
   float offset_x;
   float offset_y;

   uint32_t bin_stride;
   uint32_t pad0[3];
};

struct tu_subsampled_metadata {
   struct tu_subsampled_header hdr;

   struct tu_subsampled_bin bins[TU_SUBSAMPLED_MAX_BINS];
};

void
tu_emit_subsampled_metadata(struct tu_cmd_buffer *cmd,
                            struct tu_cs *cs,
                            unsigned a,
                            const struct tu_tile_config *tiles,
                            const struct tu_tiling_config *tiling,
                            const struct tu_vsc_config *vsc,
                            const struct tu_framebuffer *fb,
                            const VkOffset2D *fdm_offsets);

unsigned
tu_calc_subsampled_aprons(VkRect2D *dst,
                          struct tu_rect2d_float *src,
                          unsigned view,
                          const struct tu_tile_config *tiles,
                          const struct tu_tiling_config *tiling,
                          const struct tu_vsc_config *vsc,
                          const struct tu_framebuffer *fb,
                          const VkOffset2D *fdm_offsets);

nir_def *
tu_get_subsampled_coordinates(nir_builder *b,
                              nir_def *coords,
                              nir_def *descriptor);
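The `tu_subsampled_header` comment above gives the lookup formula `bin = int(coord * scale + offset)`. A minimal host-side sketch of that lookup (the struct and function names here are illustrative, not the driver's injected NIR; only the formula and the row-major `bin_stride` indexing come from the header above):

```c
#include <stdint.h>

/* Mirrors the fields of struct tu_subsampled_header above. */
struct header {
   float scale_x, scale_y;
   float offset_x, offset_y;
   uint32_t bin_stride;
};

/* bin = int(coord * scale + offset) per axis, then row-major indexing
 * into the bins[] array using bin_stride. */
static uint32_t
bin_index(const struct header *hdr, float x, float y)
{
   uint32_t bx = (uint32_t)(x * hdr->scale_x + hdr->offset_x);
   uint32_t by = (uint32_t)(y * hdr->scale_y + hdr->offset_y);
   return by * hdr->bin_stride + bx;
}
```

The returned index selects the `tu_subsampled_bin` whose per-bin scale/offset then remaps the sampling coordinates.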
@@ -10,6 +10,9 @@

#include "tu_cmd_buffer.h"
#include "tu_tile_config.h"
#include "tu_subsampled_image.h"

#include "util/u_worklist.h"

static void
tu_calc_frag_area(struct tu_cmd_buffer *cmd,
@@ -369,6 +372,370 @@ tu_merge_tiles(struct tu_cmd_buffer *cmd, const struct tu_vsc_config *vsc,
      }
   }
}

/* Get the default position of the tile in subsampled space. It may be
 * shifted over later, but it has to stay within the non-subsampled rectangle
 * (i.e. the result we return with frag_area = 1,1). If the tile is made
 * non-subsampled then its frag_area becomes 1,1.
 */
static VkRect2D
get_default_tile_pos(const struct tu_physical_device *phys_dev,
                     struct tu_tile_config *tile,
                     unsigned view,
                     const struct tu_framebuffer *fb,
                     const struct tu_tiling_config *tiling,
                     const VkOffset2D *fdm_offsets,
                     VkExtent2D frag_area)
{
   VkOffset2D offset = {};
   if (fdm_offsets)
      offset = tu_bin_offset(fdm_offsets[view], tiling);
   VkOffset2D aligned_offset = {};
   aligned_offset.x = offset.x / phys_dev->info->tile_align_w *
      phys_dev->info->tile_align_w;
   aligned_offset.y = offset.y / phys_dev->info->tile_align_h *
      phys_dev->info->tile_align_h;
   int32_t fb_start_x =
      MAX2(tile->pos.x * (int32_t)tiling->tile0.width - offset.x, 0);
   int32_t fb_end_x =
      (tile->pos.x + tile->sysmem_extent.width) * tiling->tile0.width - offset.x;
   int32_t fb_start_y =
      MAX2(tile->pos.y * (int32_t)tiling->tile0.height - offset.y, 0);
   int32_t fb_end_y =
      (tile->pos.y + tile->sysmem_extent.height) * tiling->tile0.height - offset.y;

   /* For tiles in the last row/column, we cannot create an apron for their
    * right/bottom edges because we don't know what addressing mode the
    * sampler will use. If the edge of the framebuffer is the same as the
    * edge of the image, then when sampling the image near the edge we'd
    * expect the sampler border handling to kick in, but that doesn't work
    * unless the tile is shifted to the end of the framebuffer. Because the
    * images are made larger, we have to shift it over by the same amount,
    * which is currently gmem_align_w/gmem_align_h, so that if the
    * framebuffer is the same size as the original API image then the border
    * works correctly.
    *
    * For tiles not in the first row/column, we align the FDM offset down so
    * that we can use the faster tile store method. This means that the
    * subsampled space tile start may be shifted compared to framebuffer
    * space. This will create a gap between the first and second tiles, which
    * will require an apron even if neither is subsampled. This works because
    * gmem_align_w/gmem_align_h is always at least the apron size times two.
    */
   bool stick_to_end_x = fb_end_x >= fb->width;
   bool stick_to_end_y = fb_end_y >= fb->height;
   unsigned fb_offset_x = fdm_offsets ? phys_dev->info->tile_align_w : 0;
   unsigned fb_offset_y = fdm_offsets ? phys_dev->info->tile_align_h : 0;
   int32_t start_x, end_x, start_y, end_y;
   if (stick_to_end_x) {
      end_x = fb->width + fb_offset_x;
      start_x = end_x - DIV_ROUND_UP(fb->width - fb_start_x, frag_area.width);
   } else if (tile->pos.x == 0) {
      start_x = 0;
      end_x = fb_end_x / frag_area.width;
   } else {
      start_x = tile->pos.x * tiling->tile0.width - aligned_offset.x;
      end_x = start_x + tile->sysmem_extent.width * tiling->tile0.width / frag_area.width;
   }

   if (stick_to_end_y) {
      end_y = fb->height + fb_offset_y;
      start_y = end_y - DIV_ROUND_UP(fb->height - fb_start_y, frag_area.height);
   } else if (tile->pos.y == 0) {
      start_y = 0;
      end_y = fb_end_y / frag_area.height;
   } else {
      start_y = tile->pos.y * tiling->tile0.height - aligned_offset.y;
      end_y = start_y + tile->sysmem_extent.height * tiling->tile0.height / frag_area.height;
   }

   if (stick_to_end_x || stick_to_end_y)
      tile->subsampled_border = true;

   return (VkRect2D) {
      .offset = { start_x, start_y },
      .extent = { end_x - start_x, end_y - start_y },
   };
}
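The aligned_offset computation above aligns the FDM offset down to the tile alignment using truncating integer division. A tiny standalone sketch of that idiom (the helper name and the alignment values in the test are illustrative):

```c
#include <stdint.h>

/* Align v down to a multiple of align, as in the aligned_offset
 * computation above (v / align * align). Assumes v >= 0 and align > 0;
 * C integer division truncates toward zero, so this floors for
 * non-negative v. */
static int32_t
align_down(int32_t v, int32_t align)
{
   return v / align * align;
}
```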
static void
make_non_subsampled(const struct tu_physical_device *phys_dev,
                    struct tu_tile_config *tile,
                    unsigned view,
                    const struct tu_framebuffer *fb,
                    const struct tu_tiling_config *tiling,
                    const VkOffset2D *fdm_offsets)
{
   tile->subsampled_views &= ~(1u << view);
   tile->subsampled_pos[view] =
      get_default_tile_pos(phys_dev, tile, view, fb, tiling, fdm_offsets,
                           (VkExtent2D) { 1, 1 });
}
static bool
aprons_intersect(struct tu_tile_config *a, struct tu_tile_config *b,
                 unsigned view)
{
   if (a->subsampled_pos[view].offset.x +
       a->subsampled_pos[view].extent.width + APRON_SIZE * 2 <=
       b->subsampled_pos[view].offset.x)
      return false;

   if (b->subsampled_pos[view].offset.x +
       b->subsampled_pos[view].extent.width + APRON_SIZE * 2 <=
       a->subsampled_pos[view].offset.x)
      return false;

   if (a->subsampled_pos[view].offset.y +
       a->subsampled_pos[view].extent.height + APRON_SIZE * 2 <=
       b->subsampled_pos[view].offset.y)
      return false;

   if (b->subsampled_pos[view].offset.y +
       b->subsampled_pos[view].extent.height + APRON_SIZE * 2 <=
       a->subsampled_pos[view].offset.y)
      return false;

   return true;
}
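aprons_intersect above is a separating-axis rectangle test widened by APRON_SIZE on each side (hence the `APRON_SIZE * 2` terms): two tiles conflict when the gap between them on both axes is smaller than twice the apron. A self-contained equivalent over plain rectangles (the `rect` type is illustrative; the driver operates on the VkRect2D fields of tu_tile_config):

```c
#include <stdbool.h>
#include <stdint.h>

#define APRON_SIZE 2

struct rect { int32_t x, y; uint32_t w, h; };

/* Equivalent to growing each rect by APRON_SIZE on every side and
 * testing for overlap: there is no conflict only when some axis has a
 * gap of at least APRON_SIZE * 2 between the rects. */
static bool
aprons_intersect(struct rect a, struct rect b)
{
   if (a.x + (int32_t)a.w + APRON_SIZE * 2 <= b.x) return false;
   if (b.x + (int32_t)b.w + APRON_SIZE * 2 <= a.x) return false;
   if (a.y + (int32_t)a.h + APRON_SIZE * 2 <= b.y) return false;
   if (b.y + (int32_t)b.h + APRON_SIZE * 2 <= a.y) return false;
   return true;
}
```

Note that tiles touching only at a corner still conflict, which is why the driver's comment below says corner contact counts as touching along both edges.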
/*
 * Calculate the location of each bin in the subsampled image and whether we
 * need to avoid subsampling it. The constraint we have to deal with here is
 * that for any two tiles sharing an edge, either both must not be subsampled
 * (so that we do not need to insert an apron) or they must be at least 4
 * pixels apart along that edge to create an apron of 2 pixels around each
 * tile. The apron includes the corner of the tile, so tiles that only touch
 * at corners also count as touching along both edges. The two strategies
 * available to us to deal with this are disabling subsampling and shifting
 * over the origin of the tile, which only works when there is enough free
 * space to shift it. This is complicated by the fact that one or both of the
 * neighboring tiles may be a merged tile, so each tile may have several
 * neighbors sharing an edge instead of just 3.
 *
 * By default, we make each bin start at an aligned version of the start in
 * framebuffer space, b_s. This means that the tile grid is shifted up and to
 * the right for the FDM offset, making sure the last row/column of tiles
 * always fits within the image and we only need a small fixed amount of
 * extra space to hold the overflow.
 */
static void
tu_calc_subsampled(struct tu_tile_config *tiles,
                   const struct tu_physical_device *phys_dev,
                   const struct tu_tiling_config *tiling,
                   const struct tu_framebuffer *fb,
                   const struct tu_vsc_config *vsc,
                   const VkOffset2D *fdm_offsets)
{
   u_worklist worklist;
   u_worklist_init(&worklist, vsc->tile_count.width * vsc->tile_count.height,
                   NULL);

   for (unsigned y = 0; y < vsc->tile_count.height; y++) {
      for (unsigned x = 0; x < vsc->tile_count.width; x++) {
         struct tu_tile_config *tile = &tiles[y * vsc->tile_count.width + x];

         if (!tile->visible_views || tile->merged_tile)
            continue;

         u_foreach_bit (view, tile->visible_views) {
            VkOffset2D offset = {};
            if (fdm_offsets)
               offset = tu_bin_offset(fdm_offsets[view], tiling);
            tile->subsampled_pos[view] =
               get_default_tile_pos(phys_dev, tile, view, fb, tiling,
                                    fdm_offsets, tile->frag_areas[view]);

            if (tile->frag_areas[view].width != 1 ||
                tile->frag_areas[view].height != 1)
               tile->subsampled_views |= 1u << view;
         }

         tile->subsampled = true;
         tile->worklist_idx = y * vsc->tile_count.width + x;

         u_worklist_push_tail(&worklist, tile, worklist_idx);
      }
   }

   while (!u_worklist_is_empty(&worklist)) {
      struct tu_tile_config *tile =
         u_worklist_pop_head(&worklist, struct tu_tile_config, worklist_idx);

      /* First, iterate over the tiles adjacent along a vertical edge
       * (left/right neighbors) and check for conflicts there.
       */
      for (unsigned i = 0; i < 2; i++) {
         int x_offset = i == 0 ? -1 : tile->sysmem_extent.width;
         int x_pos = tile->pos.x + x_offset;
         if (x_pos < 0 || x_pos >= vsc->tile_count.width)
            continue;
         int y_start = MAX2(tile->pos.y - 1, 0);
         int y_end = MIN2(tile->pos.y + tile->sysmem_extent.height,
                          vsc->tile_count.height - 1);
         struct tu_tile_config *other_tile =
            tu_get_merged_tile(&tiles[y_start * vsc->tile_count.width + x_pos]);
         /* Sweep from (x_pos, y_start) to (x_pos, y_end), keeping in mind
          * merged tiles.
          */
         for (int y = y_start; y <= y_end;
              y = other_tile->pos.y + other_tile->sysmem_extent.height) {
            other_tile = tu_get_merged_tile(&tiles[y * vsc->tile_count.width + x_pos]);
            uint32_t common_views = tile->visible_views &
               other_tile->visible_views;
            if (common_views == 0)
               continue;

            if (((tile->subsampled_views | other_tile->subsampled_views) &
                 common_views) == 0)
               continue;

            struct tu_tile_config *left_tile = (i == 0) ? other_tile : tile;
            struct tu_tile_config *right_tile = (i == 0) ? tile : other_tile;

            /* Due to bin merging, the right tile may not actually be to the
             * right of the left tile, instead extending to the right of it,
             * for example if other_tile includes (0, 0) and (1, 0) and the
             * current tile is (0, 1) or vice versa. The tiles will then also
             * be adjacent along a horizontal edge, and we can skip them
             * because they will be handled below, and they should not touch
             * horizontally, which means they will also not touch vertically.
             */
            if (right_tile->pos.x < left_tile->pos.x +
                left_tile->sysmem_extent.width)
               continue;

            u_foreach_bit (view, common_views) {
               if (!((tile->subsampled_views | other_tile->subsampled_views) &
                     (1u << view)))
                  continue;

               if (!aprons_intersect(tile, other_tile, view))
                  continue;

               /* Try shifting the right tile to the right. */
               if (right_tile->subsampled_views & (1u << view)) {
                  VkRect2D right_unsubsampled =
                     get_default_tile_pos(phys_dev, right_tile, view, fb,
                                          tiling, fdm_offsets,
                                          (VkExtent2D) { 1, 1 });
                  const unsigned shift_amount =
                     MAX2(APRON_SIZE * 2, phys_dev->info->tile_align_w);
                  if (right_tile->subsampled_pos[view].offset.x +
                      right_tile->subsampled_pos[view].extent.width +
                      shift_amount <= right_unsubsampled.offset.x +
                      right_unsubsampled.extent.width) {
                     right_tile->subsampled_pos[view].offset.x +=
                        shift_amount;
                     u_worklist_push_tail(&worklist, right_tile,
                                          worklist_idx);
                     continue;
                  }
               }

               /* Now we have to make both tiles non-subsampled. */
               if (tile->subsampled_views & (1u << view)) {
                  make_non_subsampled(phys_dev, tile, view, fb, tiling, fdm_offsets);
                  u_worklist_push_tail(&worklist, tile, worklist_idx);
               }

               if (other_tile->subsampled_views & (1u << view)) {
                  make_non_subsampled(phys_dev, other_tile, view, fb, tiling, fdm_offsets);
                  u_worklist_push_tail(&worklist, other_tile, worklist_idx);
               }
            }
         }
      }

      /* Do the identical thing for tiles adjacent along a horizontal edge
       * (top/bottom neighbors).
       */
      for (unsigned i = 0; i < 2; i++) {
         int y_offset = i == 0 ? -1 : tile->sysmem_extent.height;
         int y_pos = tile->pos.y + y_offset;
         if (y_pos < 0 || y_pos >= vsc->tile_count.height)
            continue;
         int x_start = MAX2(tile->pos.x - 1, 0);
         int x_end = MIN2(tile->pos.x + tile->sysmem_extent.width,
                          vsc->tile_count.width - 1);
         struct tu_tile_config *other_tile =
            tu_get_merged_tile(&tiles[y_pos * vsc->tile_count.width + x_start]);
         /* Sweep from (x_start, y_pos) to (x_end, y_pos), keeping in mind
          * merged tiles.
          */
         for (int x = x_start; x <= x_end;
              x = other_tile->pos.x + other_tile->sysmem_extent.width) {
            other_tile = tu_get_merged_tile(&tiles[y_pos * vsc->tile_count.width + x]);
            uint32_t common_views = tile->visible_views &
               other_tile->visible_views;
            if (common_views == 0)
               continue;

            if (((tile->subsampled_views | other_tile->subsampled_views) &
                 common_views) == 0)
               continue;

            struct tu_tile_config *top_tile = (i == 0) ? other_tile : tile;
            struct tu_tile_config *bottom_tile = (i == 0) ? tile : other_tile;

            /* Due to bin merging, the bottom tile may not actually be below
             * the top tile, instead extending below it, for example if
             * other_tile includes (0, 0) and (0, 1) and the current tile is
             * (1, 0) or vice versa. The tiles will then also be adjacent
             * along a vertical edge, and we can skip them because they will
             * have been handled above, and they should not touch vertically,
             * which means they will also not touch horizontally.
             */
            if (bottom_tile->pos.y < top_tile->pos.y +
                top_tile->sysmem_extent.height)
               continue;

            u_foreach_bit (view, common_views) {
               if (!((tile->subsampled_views | other_tile->subsampled_views) &
                     (1u << view)))
                  continue;

               if (!aprons_intersect(tile, other_tile, view))
                  continue;

               /* Try shifting the bottom tile down. */
               if (bottom_tile->subsampled_views & (1u << view)) {
                  VkRect2D bottom_unsubsampled =
                     get_default_tile_pos(phys_dev, bottom_tile, view, fb,
                                          tiling, fdm_offsets,
                                          (VkExtent2D) { 1, 1 });
                  const unsigned shift_amount =
                     MAX2(APRON_SIZE * 2, phys_dev->info->tile_align_h);
                  if (bottom_tile->subsampled_pos[view].offset.y +
                      bottom_tile->subsampled_pos[view].extent.height +
                      shift_amount <= bottom_unsubsampled.offset.y +
                      bottom_unsubsampled.extent.height) {
                     bottom_tile->subsampled_pos[view].offset.y +=
                        shift_amount;
                     u_worklist_push_tail(&worklist, bottom_tile,
                                          worklist_idx);
                     continue;
                  }
               }

               /* Now we have to make both tiles non-subsampled. One or both
                * may be shifted so we have to un-shift them.
                */
               if (tile->subsampled_views & (1u << view)) {
                  make_non_subsampled(phys_dev, tile, view, fb, tiling, fdm_offsets);
                  u_worklist_push_tail(&worklist, tile, worklist_idx);
               }

               if (other_tile->subsampled_views & (1u << view)) {
                  make_non_subsampled(phys_dev, other_tile, view, fb, tiling, fdm_offsets);
                  u_worklist_push_tail(&worklist, other_tile, worklist_idx);
               }
            }
         }
      }
   }

   u_worklist_fini(&worklist);
}
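tu_calc_subsampled above is a fixpoint algorithm: any tile it shifts or un-subsamples is pushed back on the worklist so its neighbors get re-checked, until nothing changes. A simplified sweep-to-fixpoint sketch of that control flow (the invariant enforced here, adjacent values differing by at most 1, is purely illustrative; the driver revisits only changed tiles via u_worklist instead of rescanning everything):

```c
#include <stdbool.h>

/* Repeatedly enforce a local constraint between neighbors until a full
 * pass makes no changes. Fixing one element can re-break a neighbor's
 * constraint, which is exactly why tu_calc_subsampled re-pushes mutated
 * tiles onto its worklist. */
static void
relax_to_fixpoint(int vals[], int n)
{
   bool changed = true;
   while (changed) {
      changed = false;
      for (int i = 0; i + 1 < n; i++) {
         if (vals[i] > vals[i + 1] + 1) {
            vals[i] = vals[i + 1] + 1;
            changed = true;
         }
         if (vals[i + 1] > vals[i] + 1) {
            vals[i + 1] = vals[i] + 1;
            changed = true;
         }
      }
   }
}
```

Termination holds because every change strictly lowers a value, just as each tile in the driver can only be shifted or made non-subsampled a bounded number of times per view.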
struct tu_tile_config *
tu_calc_tile_config(struct tu_cmd_buffer *cmd, const struct tu_vsc_config *vsc,

@@ -420,6 +787,13 @@ tu_calc_tile_config(struct tu_cmd_buffer *cmd, const struct tu_vsc_config *vsc,
      }
   }

   if (cmd->state.fdm_subsampled &&
       vsc->tile_count.width * vsc->tile_count.height <= TU_SUBSAMPLED_MAX_BINS) {
      tu_calc_subsampled(tiles, cmd->device->physical_device,
                         cmd->state.tiling, cmd->state.framebuffer,
                         vsc, fdm_offsets);
   }

   return tiles;
}
@@ -18,10 +18,37 @@ struct tu_tile_config {
   uint32_t pipe;
   uint32_t slot_mask;
   uint32_t visible_views;

   /* Whether to use subsampled_pos instead of the normal origin in
    * framebuffer space when storing this tile.
    */
   bool subsampled;

   /* If subsampled is true, whether this is a border tile that may not be
    * aligned.
    */
   bool subsampled_border;

   /* If subsampled is true, which views to store subsampled. If a view's
    * bit is set, the view is stored low-resolution as-is; if it is clear,
    * the view is expanded to its full size in sysmem when resolving.
    * However the origin of the tile in subsampled space is always
    * subsampled_pos when subsampled is true, regardless of the value of
    * this field.
    */
   uint32_t subsampled_views;

   /* Used internally. */
   unsigned worklist_idx;

   /* The tile this tile was merged with. */
   struct tu_tile_config *merged_tile;

   /* For subsampled images, the start of the tile in the final subsampled
    * image for each view. This may or may not be the start of the tile in
    * framebuffer space, due to the need to shift tiles over.
    */
   VkRect2D subsampled_pos[MAX_VIEWS];

   /* For merged tiles, the extent in tiles when resolved to system memory.
    */
   VkExtent2D sysmem_extent;

@@ -34,6 +61,25 @@ struct tu_tile_config {
   VkExtent2D frag_areas[MAX_VIEWS];
};

/* After merging, follow the trail of merged_tile pointers back to the tile
 * this tile was ultimately merged with.
 */
static inline struct tu_tile_config *
tu_get_merged_tile(struct tu_tile_config *tile)
{
   while (tile->merged_tile)
      tile = tile->merged_tile;
   return tile;
}

static inline const struct tu_tile_config *
tu_get_merged_tile_const(const struct tu_tile_config *tile)
{
   while (tile->merged_tile)
      tile = tile->merged_tile;
   return tile;
}

struct tu_tile_config *
tu_calc_tile_config(struct tu_cmd_buffer *cmd, const struct tu_vsc_config *vsc,
                    const struct tu_image_view *fdm, const VkOffset2D *fdm_offsets);