i965/gen6+: Add code to perform blits on the render path ("blorp").

This patch expands the "blorp" component to be able to perform blits
as well as HiZ resolves. The new blitting code is located in
brw_blorp_blit.cpp. This includes the necessary fragment shader code
to look up pixels in the source buffer (which is configured as a
texture) and output them to the destination buffer (which is
configured as the render target).

Most of the time the fragment shader code is simple and
straightforward, since it merely has to apply a coordinate offset,
read from the texture, and write to the render target. However, in
the case of blitting stencil buffers, things are more complicated,
since the GPU stores stencil data using W tiling, and W tiling is not
supported for textures or render targets. So, we set up the stencil
buffers as Y tiled, and emit fragment shader code that adjusts the
coordinates to account for the difference between W and Y tiling.
Furthermore, since a rectangular region in W tiling does not
necessarily correspond to a rectangular region in Y tiling, we widen
the rectangle primitive to the nearest tile boundary and have the
fragment shader "kill" any pixels that don't fall inside the actual
desired destination rectangle.

All of this is a necessary prerequisite for implementing MSAA, since
we'll need to be able to blit between multisample color, depth, and
stencil buffers and their non-multisampled counterparts, and none of
the existing blitting mechanisms support multisampling.

In addition, the new blitting code should speed up operations where we
previously fell back to software rasterization, such as blitting of
stencil buffers. The current fallback sequence is: first we try to do
a blit using the hardware blitting engine. If that fails we try to do
a blit using the render path. If that also fails then we do the blit
using a meta-op (which may or may not fall back to software
rasterization).

Note that blitting using the render path has some limitations at the
moment: it only supports a few formats, and it doesn't support
clipping or scissoring. These limitations will be addressed in future
patch series.

v2:
- Add the code that configures the WM program to
  gen{6,7}_emit_wm_config() and gen7_emit_ps_config() rather than
  creating separate ...enable() functions.
- Call intel_prepare_render before determining which miptrees we are
  blitting from/to, because it may cause miptrees to be reallocated.
- Allow the blit to mirror X and/or Y coordinates.
- Disable blorp blits on Gen7 for now, since they aren't working yet.
2012-04-29 22:44:25 -07:00
/*
 * Copyright © 2012 Intel Corporation
 *
 * Permission is hereby granted, free of charge, to any person obtaining a
 * copy of this software and associated documentation files (the "Software"),
 * to deal in the Software without restriction, including without limitation
 * the rights to use, copy, modify, merge, publish, distribute, sublicense,
 * and/or sell copies of the Software, and to permit persons to whom the
 * Software is furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice (including the next
 * paragraph) shall be included in all copies or substantial portions of the
 * Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
 * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
 * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
 * IN THE SOFTWARE.
 */

#include "compiler/nir/nir_builder.h"

#include "blorp_priv.h"
#include "brw_meta_util.h"

#define FILE_DEBUG_FLAG DEBUG_BLORP
i965/blorp: Generalize sampling code in preparation for Gen7

This patch generalizes the function
brw_blorp_blit_program::texture_lookup() so that it prepares the
arguments to the sampler message based on a caller-provided array
rather than assuming the argument order is always (u, v).

This paves the way for the messages we will need to use in Gen7, which
use argument orders (u, lod, v) and (si, u, v) (si=sample index).
It will also allow us to read from arbitrary sample indices on
Gen6, by supplying the arguments (u, v, r, lod, si) to the SAMPLE_LD
message instead of just (u, v).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2012-05-08 16:28:43 -07:00
/**
 * Enum to specify the order of arguments in a sampler message
 */
enum sampler_message_arg
{
   SAMPLER_MESSAGE_ARG_U_FLOAT,
   SAMPLER_MESSAGE_ARG_V_FLOAT,
   SAMPLER_MESSAGE_ARG_U_INT,
   SAMPLER_MESSAGE_ARG_V_INT,
   SAMPLER_MESSAGE_ARG_R_INT,
   SAMPLER_MESSAGE_ARG_SI_INT,
   SAMPLER_MESSAGE_ARG_MCS_INT,
   SAMPLER_MESSAGE_ARG_ZERO_INT,
};

struct brw_blorp_blit_vars {
   /* Input values from brw_blorp_wm_inputs */
   nir_variable *v_discard_rect;
   nir_variable *v_rect_grid;
   nir_variable *v_coord_transform;
   nir_variable *v_src_z;
   nir_variable *v_src_offset;
   nir_variable *v_dst_offset;

   /* gl_FragCoord */
   nir_variable *frag_coord;

   /* gl_FragColor */
   nir_variable *color_out;
};

static void
brw_blorp_blit_vars_init(nir_builder *b, struct brw_blorp_blit_vars *v,
                         const struct brw_blorp_blit_prog_key *key)
{
   /* Blended and scaled blits never use pixel discard. */
   assert(!key->use_kill || !(key->blend && key->blit_scaled));

#define LOAD_INPUT(name, type)\
   v->v_##name = nir_variable_create(b->shader, nir_var_shader_in, \
                                     type, #name); \
   v->v_##name->data.interpolation = INTERP_MODE_FLAT; \
   v->v_##name->data.location = VARYING_SLOT_VAR0 + \
      offsetof(struct brw_blorp_wm_inputs, name) / (4 * sizeof(float)); \
   v->v_##name->data.location_frac = \
      (offsetof(struct brw_blorp_wm_inputs, name) / sizeof(float)) % 4;

   LOAD_INPUT(discard_rect, glsl_vec4_type())
   LOAD_INPUT(rect_grid, glsl_vec4_type())
   LOAD_INPUT(coord_transform, glsl_vec4_type())
   LOAD_INPUT(src_z, glsl_uint_type())
   LOAD_INPUT(src_offset, glsl_vector_type(GLSL_TYPE_UINT, 2))
   LOAD_INPUT(dst_offset, glsl_vector_type(GLSL_TYPE_UINT, 2))

#undef LOAD_INPUT

   v->frag_coord = nir_variable_create(b->shader, nir_var_shader_in,
                                       glsl_vec4_type(), "gl_FragCoord");
   v->frag_coord->data.location = VARYING_SLOT_POS;
   v->frag_coord->data.origin_upper_left = true;

   v->color_out = nir_variable_create(b->shader, nir_var_shader_out,
                                      glsl_vec4_type(), "gl_FragColor");
   v->color_out->data.location = FRAG_RESULT_COLOR;
}
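
The slot arithmetic in LOAD_INPUT can be checked on the CPU. The following is only a sketch: `struct example_inputs` is a hypothetical stand-in (the real layout of `struct brw_blorp_wm_inputs` is not shown in this file), and `input_slot` / `input_frac` are made-up helper names. It assumes `sizeof(float) == 4` and no padding between members.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the input struct; NOT the real layout. */
struct example_inputs {
   float    rect[4];     /* bytes  0..15 */
   float    xform[4];    /* bytes 16..31 */
   uint32_t z;           /* bytes 32..35 */
   uint32_t offset[2];   /* bytes 36..43 */
};

/* Index of the vec4 slot holding the field; LOAD_INPUT adds this to
 * VARYING_SLOT_VAR0 to form data.location.
 */
static unsigned
input_slot(size_t byte_offset)
{
   return (unsigned)(byte_offset / (4 * sizeof(float)));
}

/* Component inside that vec4 at which the field starts; this is the
 * value LOAD_INPUT stores in data.location_frac.
 */
static unsigned
input_frac(size_t byte_offset)
{
   return (unsigned)((byte_offset / sizeof(float)) % 4);
}
```

With this stand-in layout, `z` starts slot 2 at component 0 and `offset` follows it in the same slot at component 1, which is why a single vec4 slot can carry several small inputs.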

static nir_ssa_def *
blorp_blit_get_frag_coords(nir_builder *b,
                           const struct brw_blorp_blit_prog_key *key,
                           struct brw_blorp_blit_vars *v)
{
   nir_ssa_def *coord = nir_f2i(b, nir_load_var(b, v->frag_coord));

   /* Account for destination surface intra-tile offset
    *
    * Transformation parameters giving translation from destination to source
    * coordinates don't take into account possible intra-tile destination
    * offset.  Therefore it has to be first subtracted from the incoming
    * coordinates.  Vertices are set up based on coordinates containing the
    * intra-tile offset.
    */
   if (key->need_dst_offset)
      coord = nir_isub(b, coord, nir_load_var(b, v->v_dst_offset));

   if (key->persample_msaa_dispatch) {
      return nir_vec3(b, nir_channel(b, coord, 0), nir_channel(b, coord, 1),
                      nir_load_sample_id(b));
   } else {
      return nir_vec2(b, nir_channel(b, coord, 0), nir_channel(b, coord, 1));
   }
}

/**
 * Emit code to translate from destination (X, Y) coordinates to source (X, Y)
 * coordinates.
 */
static nir_ssa_def *
blorp_blit_apply_transform(nir_builder *b, nir_ssa_def *src_pos,
                           struct brw_blorp_blit_vars *v)
{
   nir_ssa_def *coord_transform = nir_load_var(b, v->v_coord_transform);

   nir_ssa_def *offset = nir_vec2(b, nir_channel(b, coord_transform, 1),
                                  nir_channel(b, coord_transform, 3));
   nir_ssa_def *mul = nir_vec2(b, nir_channel(b, coord_transform, 0),
                               nir_channel(b, coord_transform, 2));

   return nir_ffma(b, src_pos, mul, offset);
}
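
The transform above is a per-axis multiply-add, with the vec4 laid out as (mul_x, offset_x, mul_y, offset_y). A minimal CPU-side sketch of the same arithmetic follows; the `example_` types and function are hypothetical names, and a plain multiply and add stand in for the fused `nir_ffma`, so rounding can differ in the last bit for inexact inputs.

```c
#include <assert.h>

/* Hypothetical mirror of the vec4 read by the shader:
 * channels 0..3 are (mul_x, offset_x, mul_y, offset_y).
 */
struct example_coord_transform {
   float mul_x, offset_x, mul_y, offset_y;
};

struct example_vec2 {
   float x, y;
};

/* Map a destination position to a source position, one axis at a time. */
static struct example_vec2
example_apply_transform(struct example_vec2 pos,
                        struct example_coord_transform t)
{
   struct example_vec2 out;
   out.x = pos.x * t.mul_x + t.offset_x;
   out.y = pos.y * t.mul_y + t.offset_y;
   return out;
}
```

A negative multiplier mirrors an axis: with mul_y = -1 and offset_y = 8, destination row 2 reads from source row 6.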

static inline void
blorp_nir_discard_if_outside_rect(nir_builder *b, nir_ssa_def *pos,
                                  struct brw_blorp_blit_vars *v)
{
   nir_ssa_def *c0, *c1, *c2, *c3;
   nir_ssa_def *discard_rect = nir_load_var(b, v->v_discard_rect);
   nir_ssa_def *dst_x0 = nir_channel(b, discard_rect, 0);
   nir_ssa_def *dst_x1 = nir_channel(b, discard_rect, 1);
   nir_ssa_def *dst_y0 = nir_channel(b, discard_rect, 2);
   nir_ssa_def *dst_y1 = nir_channel(b, discard_rect, 3);

   c0 = nir_ult(b, nir_channel(b, pos, 0), dst_x0);
   c1 = nir_uge(b, nir_channel(b, pos, 0), dst_x1);
   c2 = nir_ult(b, nir_channel(b, pos, 1), dst_y0);
   c3 = nir_uge(b, nir_channel(b, pos, 1), dst_y1);

   nir_ssa_def *oob = nir_ior(b, nir_ior(b, c0, c1), nir_ior(b, c2, c3));

   nir_intrinsic_instr *discard =
      nir_intrinsic_instr_create(b->shader, nir_intrinsic_discard_if);
   discard->src[0] = nir_src_for_ssa(oob);
   nir_builder_instr_insert(b, &discard->instr);
}
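
The discard condition treats the rectangle as half-open: x0 and y0 are inside, x1 and y1 are not. A small CPU-side sketch of the same predicate, using hypothetical `example_` names and the same (x0, x1, y0, y1) channel order as discard_rect:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Channel order matches discard_rect above: (x0, x1, y0, y1). */
struct example_rect {
   uint32_t x0, x1, y0, y1;
};

/* True when (x, y) lies outside [x0, x1) x [y0, y1); these are the
 * same four unsigned comparisons the shader ORs together.
 */
static bool
example_outside_rect(uint32_t x, uint32_t y, struct example_rect r)
{
   return x < r.x0 || x >= r.x1 || y < r.y0 || y >= r.y1;
}
```

Because the comparisons are unsigned, a coordinate that went negative before conversion would wrap to a large value and be rejected by the upper-bound test.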

static nir_tex_instr *
blorp_create_nir_tex_instr(nir_builder *b, struct brw_blorp_blit_vars *v,
                           nir_texop op, nir_ssa_def *pos, unsigned num_srcs,
                           nir_alu_type dst_type)
{
   nir_tex_instr *tex = nir_tex_instr_create(b->shader, num_srcs);

   tex->op = op;

   tex->dest_type = dst_type;
   tex->is_array = false;
   tex->is_shadow = false;

   /* Blorp only has one texture and it's bound at unit 0 */
   tex->texture = NULL;
   tex->sampler = NULL;
   tex->texture_index = 0;
   tex->sampler_index = 0;

   /* To properly handle 3-D and 2-D array textures, we pull the Z component
    * from an input.  TODO: This is a bit magic; we should probably make this
    * more explicit in the future.
    */
   assert(pos->num_components >= 2);
   pos = nir_vec3(b, nir_channel(b, pos, 0), nir_channel(b, pos, 1),
                     nir_load_var(b, v->v_src_z));

   tex->src[0].src_type = nir_tex_src_coord;
   tex->src[0].src = nir_src_for_ssa(pos);
   tex->coord_components = 3;

   nir_ssa_dest_init(&tex->instr, &tex->dest, 4, 32, NULL);

   return tex;
}

static nir_ssa_def *
blorp_nir_tex(nir_builder *b, struct brw_blorp_blit_vars *v,
              nir_ssa_def *pos, nir_alu_type dst_type)
{
   nir_tex_instr *tex =
      blorp_create_nir_tex_instr(b, v, nir_texop_tex, pos, 2, dst_type);

   assert(pos->num_components == 2);
   tex->sampler_dim = GLSL_SAMPLER_DIM_2D;
   tex->src[1].src_type = nir_tex_src_lod;
   tex->src[1].src = nir_src_for_ssa(nir_imm_int(b, 0));

   nir_builder_instr_insert(b, &tex->instr);

   return &tex->dest.ssa;
}

static nir_ssa_def *
blorp_nir_txf(nir_builder *b, struct brw_blorp_blit_vars *v,
              nir_ssa_def *pos, nir_alu_type dst_type)
{
   nir_tex_instr *tex =
      blorp_create_nir_tex_instr(b, v, nir_texop_txf, pos, 2, dst_type);

   tex->sampler_dim = GLSL_SAMPLER_DIM_3D;
   tex->src[1].src_type = nir_tex_src_lod;
   tex->src[1].src = nir_src_for_ssa(nir_imm_int(b, 0));

   nir_builder_instr_insert(b, &tex->instr);

   return &tex->dest.ssa;
}

static nir_ssa_def *
blorp_nir_txf_ms(nir_builder *b, struct brw_blorp_blit_vars *v,
                 nir_ssa_def *pos, nir_ssa_def *mcs, nir_alu_type dst_type)
{
   nir_tex_instr *tex =
      blorp_create_nir_tex_instr(b, v, nir_texop_txf_ms, pos,
                                 mcs != NULL ? 3 : 2, dst_type);

   tex->sampler_dim = GLSL_SAMPLER_DIM_MS;

   tex->src[1].src_type = nir_tex_src_ms_index;
   if (pos->num_components == 2) {
      tex->src[1].src = nir_src_for_ssa(nir_imm_int(b, 0));
   } else {
      assert(pos->num_components == 3);
      tex->src[1].src = nir_src_for_ssa(nir_channel(b, pos, 2));
   }

   if (mcs) {
      tex->src[2].src_type = nir_tex_src_ms_mcs;
      tex->src[2].src = nir_src_for_ssa(mcs);
   }

   nir_builder_instr_insert(b, &tex->instr);

   return &tex->dest.ssa;
}

static nir_ssa_def *
blorp_nir_txf_ms_mcs(nir_builder *b, struct brw_blorp_blit_vars *v, nir_ssa_def *pos)
{
   nir_tex_instr *tex =
      blorp_create_nir_tex_instr(b, v, nir_texop_txf_ms_mcs,
                                 pos, 1, nir_type_int);

   tex->sampler_dim = GLSL_SAMPLER_DIM_MS;

   nir_builder_instr_insert(b, &tex->instr);

   return &tex->dest.ssa;
}

static nir_ssa_def *
nir_mask_shift_or(struct nir_builder *b, nir_ssa_def *dst, nir_ssa_def *src,
                  uint32_t src_mask, int src_left_shift)
{
   nir_ssa_def *masked = nir_iand(b, src, nir_imm_int(b, src_mask));

   nir_ssa_def *shifted;
   if (src_left_shift > 0) {
      shifted = nir_ishl(b, masked, nir_imm_int(b, src_left_shift));
   } else if (src_left_shift < 0) {
      shifted = nir_ushr(b, masked, nir_imm_int(b, -src_left_shift));
   } else {
      assert(src_left_shift == 0);
      shifted = masked;
   }

   return nir_ior(b, dst, shifted);
}
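
The helper above emits shader instructions, but the value it computes is easy to model on the CPU. The sketch below is a hypothetical scalar equivalent (the name `example_mask_shift_or` is not part of the driver): mask the source, shift left for a positive count or logically right for a negative one, then OR the result into the accumulator. It assumes shift counts stay below 32.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of nir_mask_shift_or: dst | shift(src & src_mask).
 * A negative src_left_shift means a logical right shift.
 */
static uint32_t
example_mask_shift_or(uint32_t dst, uint32_t src,
                      uint32_t src_mask, int src_left_shift)
{
   uint32_t masked = src & src_mask;
   uint32_t shifted;

   if (src_left_shift > 0)
      shifted = masked << src_left_shift;
   else if (src_left_shift < 0)
      shifted = masked >> -src_left_shift;
   else
      shifted = masked;

   return dst | shifted;
}
```

Chaining calls that start from 0 builds a coordinate one bit field at a time, which is how the retiling and MSAA encode/decode functions in this file use the helper.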

/**
 * Emit code to compensate for the difference between Y and W tiling.
 *
 * This code modifies the X and Y coordinates according to the formula:
 *
 *   (X', Y', S') = detile(W-MAJOR, tile(Y-MAJOR, X, Y, S))
 *
 * (See brw_blorp_build_nir_shader).
 */
static inline nir_ssa_def *
blorp_nir_retile_y_to_w(nir_builder *b, nir_ssa_def *pos)
{
   assert(pos->num_components == 2);
   nir_ssa_def *x_Y = nir_channel(b, pos, 0);
   nir_ssa_def *y_Y = nir_channel(b, pos, 1);

   /* Given X and Y coordinates that describe an address using Y tiling,
    * translate to the X and Y coordinates that describe the same address
    * using W tiling.
    *
    * If we break down the low-order bits of X and Y, using a
    * single letter to represent each low-order bit:
    *
    *   X = A << 7 | 0bBCDEFGH
    *   Y = J << 5 | 0bKLMNP                                       (1)
    *
    * Then we can apply the Y tiling formula to see the memory offset being
    * addressed:
    *
    *   offset = (J * tile_pitch + A) << 12 | 0bBCDKLMNPEFGH       (2)
    *
    * If we apply the W detiling formula to this memory location, we see
    * that the corresponding X' and Y' coordinates are:
    *
    *   X' = A << 6 | 0bBCDPFH                                     (3)
    *   Y' = J << 6 | 0bKLMNEG
    *
    * Combining (1) and (3), we see that to transform (X, Y) to (X', Y'),
    * we need to make the following computation:
    *
    *   X' = (X & ~0b1011) >> 1 | (Y & 0b1) << 2 | X & 0b1         (4)
    *   Y' = (Y & ~0b1) << 1 | (X & 0b1000) >> 2 | (X & 0b10) >> 1
    */
   nir_ssa_def *x_W = nir_imm_int(b, 0);
   x_W = nir_mask_shift_or(b, x_W, x_Y, 0xfffffff4, -1);
   x_W = nir_mask_shift_or(b, x_W, y_Y, 0x1, 2);
   x_W = nir_mask_shift_or(b, x_W, x_Y, 0x1, 0);

   nir_ssa_def *y_W = nir_imm_int(b, 0);
   y_W = nir_mask_shift_or(b, y_W, y_Y, 0xfffffffe, 1);
   y_W = nir_mask_shift_or(b, y_W, x_Y, 0x8, -2);
   y_W = nir_mask_shift_or(b, y_W, x_Y, 0x2, -1);

   return nir_vec2(b, x_W, y_W);
}

/**
 * Emit code to compensate for the difference between Y and W tiling.
 *
 * This code modifies the X and Y coordinates according to the formula:
 *
 *   (X', Y', S') = detile(Y-MAJOR, tile(W-MAJOR, X, Y, S))
 *
 * (See brw_blorp_build_nir_shader).
 */
static inline nir_ssa_def *
blorp_nir_retile_w_to_y(nir_builder *b, nir_ssa_def *pos)
{
   assert(pos->num_components == 2);
   nir_ssa_def *x_W = nir_channel(b, pos, 0);
   nir_ssa_def *y_W = nir_channel(b, pos, 1);

   /* Applying the same logic as above, but in reverse, we obtain the
    * formulas:
    *
    * X' = (X & ~0b101) << 1 | (Y & 0b10) << 2 | (Y & 0b1) << 1 | X & 0b1
    * Y' = (Y & ~0b11) >> 1 | (X & 0b100) >> 2
    */
   nir_ssa_def *x_Y = nir_imm_int(b, 0);
   x_Y = nir_mask_shift_or(b, x_Y, x_W, 0xfffffffa, 1);
   x_Y = nir_mask_shift_or(b, x_Y, y_W, 0x2, 2);
   x_Y = nir_mask_shift_or(b, x_Y, y_W, 0x1, 1);
   x_Y = nir_mask_shift_or(b, x_Y, x_W, 0x1, 0);

   nir_ssa_def *y_Y = nir_imm_int(b, 0);
   y_Y = nir_mask_shift_or(b, y_Y, y_W, 0xfffffffc, -1);
   y_Y = nir_mask_shift_or(b, y_Y, x_W, 0x4, -2);

   return nir_vec2(b, x_Y, y_Y);
}
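
The two retiling formulas in the comments of blorp_nir_retile_y_to_w and blorp_nir_retile_w_to_y should be inverses of each other. The sketch below transcribes them into plain C so that this can be checked on the CPU; the `example_` names are hypothetical, and the check only covers small coordinates where the left shifts cannot overflow 32 bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct example_xy {
   uint32_t x, y;
};

/* Formula (4): Y-tiled (X, Y) -> W-tiled (X', Y'). */
static struct example_xy
example_retile_y_to_w(struct example_xy p)
{
   struct example_xy out;
   out.x = ((p.x & ~0xbu) >> 1) | ((p.y & 0x1u) << 2) | (p.x & 0x1u);
   out.y = ((p.y & ~0x1u) << 1) | ((p.x & 0x8u) >> 2) | ((p.x & 0x2u) >> 1);
   return out;
}

/* Reverse formulas: W-tiled (X, Y) -> Y-tiled (X', Y'). */
static struct example_xy
example_retile_w_to_y(struct example_xy p)
{
   struct example_xy out;
   out.x = ((p.x & ~0x5u) << 1) | ((p.y & 0x2u) << 2) |
           ((p.y & 0x1u) << 1) | (p.x & 0x1u);
   out.y = ((p.y & ~0x3u) >> 1) | ((p.x & 0x4u) >> 2);
   return out;
}

/* Check that w_to_y(y_to_w(p)) == p for every p in a width x height grid. */
static bool
example_retile_roundtrip_ok(uint32_t width, uint32_t height)
{
   for (uint32_t y = 0; y < height; y++) {
      for (uint32_t x = 0; x < width; x++) {
         struct example_xy p = { x, y };
         struct example_xy q = example_retile_w_to_y(example_retile_y_to_w(p));
         if (q.x != p.x || q.y != p.y)
            return false;
      }
   }
   return true;
}
```

For instance, the Y-tiled coordinate (15, 3) maps to the W-tiled coordinate (7, 7), and applying the reverse formulas returns (15, 3).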

/**
 * Emit code to compensate for the difference between MSAA and non-MSAA
 * surfaces.
 *
 * This code modifies the X and Y coordinates according to the formula:
 *
 *   (X', Y', S') = encode_msaa(num_samples, IMS, X, Y, S)
 *
 * (See brw_blorp_blit_program).
 */
static inline nir_ssa_def *
blorp_nir_encode_msaa(nir_builder *b, nir_ssa_def *pos,
                      unsigned num_samples, enum isl_msaa_layout layout)
{
   assert(pos->num_components == 2 || pos->num_components == 3);

   switch (layout) {
   case ISL_MSAA_LAYOUT_NONE:
      assert(pos->num_components == 2);
      return pos;
   case ISL_MSAA_LAYOUT_ARRAY:
      /* No translation needed */
      return pos;
   case ISL_MSAA_LAYOUT_INTERLEAVED: {
      nir_ssa_def *x_in = nir_channel(b, pos, 0);
      nir_ssa_def *y_in = nir_channel(b, pos, 1);
      nir_ssa_def *s_in = pos->num_components == 2 ? nir_imm_int(b, 0) :
                                                     nir_channel(b, pos, 2);

      nir_ssa_def *x_out = nir_imm_int(b, 0);
      nir_ssa_def *y_out = nir_imm_int(b, 0);
      switch (num_samples) {
      case 2:
      case 4:
         /* encode_msaa(2, IMS, X, Y, S) = (X', Y', 0)
          *   where X' = (X & ~0b1) << 1 | (S & 0b1) << 1 | (X & 0b1)
          *         Y' = Y
          *
          * encode_msaa(4, IMS, X, Y, S) = (X', Y', 0)
          *   where X' = (X & ~0b1) << 1 | (S & 0b1) << 1 | (X & 0b1)
          *         Y' = (Y & ~0b1) << 1 | (S & 0b10) | (Y & 0b1)
          */
         x_out = nir_mask_shift_or(b, x_out, x_in, 0xfffffffe, 1);
         x_out = nir_mask_shift_or(b, x_out, s_in, 0x1, 1);
         x_out = nir_mask_shift_or(b, x_out, x_in, 0x1, 0);
         if (num_samples == 2) {
            y_out = y_in;
         } else {
            y_out = nir_mask_shift_or(b, y_out, y_in, 0xfffffffe, 1);
            y_out = nir_mask_shift_or(b, y_out, s_in, 0x2, 0);
            y_out = nir_mask_shift_or(b, y_out, y_in, 0x1, 0);
         }
         break;

      case 8:
         /* encode_msaa(8, IMS, X, Y, S) = (X', Y', 0)
          *   where X' = (X & ~0b1) << 2 | (S & 0b100) | (S & 0b1) << 1
          *              | (X & 0b1)
          *         Y' = (Y & ~0b1) << 1 | (S & 0b10) | (Y & 0b1)
          */
         x_out = nir_mask_shift_or(b, x_out, x_in, 0xfffffffe, 2);
         x_out = nir_mask_shift_or(b, x_out, s_in, 0x4, 0);
         x_out = nir_mask_shift_or(b, x_out, s_in, 0x1, 1);
         x_out = nir_mask_shift_or(b, x_out, x_in, 0x1, 0);
         y_out = nir_mask_shift_or(b, y_out, y_in, 0xfffffffe, 1);
         y_out = nir_mask_shift_or(b, y_out, s_in, 0x2, 0);
         y_out = nir_mask_shift_or(b, y_out, y_in, 0x1, 0);
         break;

      case 16:
         /* encode_msaa(16, IMS, X, Y, S) = (X', Y', 0)
          *   where X' = (X & ~0b1) << 2 | (S & 0b100) | (S & 0b1) << 1
          *              | (X & 0b1)
          *         Y' = (Y & ~0b1) << 2 | (S & 0b1000) >> 1 | (S & 0b10)
          *              | (Y & 0b1)
          */
         x_out = nir_mask_shift_or(b, x_out, x_in, 0xfffffffe, 2);
         x_out = nir_mask_shift_or(b, x_out, s_in, 0x4, 0);
         x_out = nir_mask_shift_or(b, x_out, s_in, 0x1, 1);
         x_out = nir_mask_shift_or(b, x_out, x_in, 0x1, 0);
         y_out = nir_mask_shift_or(b, y_out, y_in, 0xfffffffe, 2);
         y_out = nir_mask_shift_or(b, y_out, s_in, 0x8, -1);
         y_out = nir_mask_shift_or(b, y_out, s_in, 0x2, 0);
         y_out = nir_mask_shift_or(b, y_out, y_in, 0x1, 0);
         break;

      default:
         unreachable("Invalid number of samples for IMS layout");
      }

      return nir_vec2(b, x_out, y_out);
   }

   default:
      unreachable("Invalid MSAA layout");
   }
}
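
As a worked instance of the interleaved (IMS) layout, the sketch below transcribes the 4x encode formula from the comment above, together with the 4x decode formula documented in blorp_nir_decode_msaa, into plain C. The `example_` names are hypothetical and only the 4x case is modelled; for small coordinates the two functions should undo each other.

```c
#include <assert.h>
#include <stdint.h>

struct example_xys {
   uint32_t x, y, s;
};

/* encode_msaa(4, IMS, X, Y, S) = (X', Y', 0) */
static struct example_xys
example_encode_msaa_4x(struct example_xys p)
{
   struct example_xys out;
   out.x = ((p.x & ~0x1u) << 1) | ((p.s & 0x1u) << 1) | (p.x & 0x1u);
   out.y = ((p.y & ~0x1u) << 1) | (p.s & 0x2u) | (p.y & 0x1u);
   out.s = 0;
   return out;
}

/* decode_msaa(4, IMS, X, Y, 0) = (X', Y', S) */
static struct example_xys
example_decode_msaa_4x(struct example_xys p)
{
   struct example_xys out;
   out.x = ((p.x & ~0x3u) >> 1) | (p.x & 0x1u);
   out.y = ((p.y & ~0x3u) >> 1) | (p.y & 0x1u);
   out.s = (p.y & 0x2u) | ((p.x & 0x2u) >> 1);
   return out;
}
```

Sample 3 of pixel (3, 2) lands at (7, 6) in the interleaved surface: bit 0 of the sample index is inserted at bit 1 of X and bit 1 of the sample index at bit 1 of Y, so the encoded surface is about twice as wide and twice as tall as the logical one.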
|
|
|
|
|
|
|
|
|
|
/**
 * Emit code to compensate for the difference between MSAA and non-MSAA
 * surfaces.
 *
 * This code modifies the X and Y coordinates according to the formula:
 *
 *   (X', Y', S) = decode_msaa(num_samples, IMS, X, Y, S)
 *
 * (See brw_blorp_blit_program).
 */
static inline nir_ssa_def *
blorp_nir_decode_msaa(nir_builder *b, nir_ssa_def *pos,
                      unsigned num_samples, enum isl_msaa_layout layout)
{
   assert(pos->num_components == 2 || pos->num_components == 3);

   switch (layout) {
   case ISL_MSAA_LAYOUT_NONE:
      /* No translation necessary, and S should already be zero. */
      assert(pos->num_components == 2);
      return pos;
   case ISL_MSAA_LAYOUT_ARRAY:
      /* No translation necessary. */
      return pos;
   case ISL_MSAA_LAYOUT_INTERLEAVED: {
      assert(pos->num_components == 2);

      nir_ssa_def *x_in = nir_channel(b, pos, 0);
      nir_ssa_def *y_in = nir_channel(b, pos, 1);

      nir_ssa_def *x_out = nir_imm_int(b, 0);
      nir_ssa_def *y_out = nir_imm_int(b, 0);
      nir_ssa_def *s_out = nir_imm_int(b, 0);
      switch (num_samples) {
      case 2:
      case 4:
         /* decode_msaa(2, IMS, X, Y, 0) = (X', Y', S)
          *   where X' = (X & ~0b11) >> 1 | (X & 0b1)
          *         S = (X & 0b10) >> 1
          *
          * decode_msaa(4, IMS, X, Y, 0) = (X', Y', S)
          *   where X' = (X & ~0b11) >> 1 | (X & 0b1)
          *         Y' = (Y & ~0b11) >> 1 | (Y & 0b1)
          *         S = (Y & 0b10) | (X & 0b10) >> 1
          */
         x_out = nir_mask_shift_or(b, x_out, x_in, 0xfffffffc, -1);
         x_out = nir_mask_shift_or(b, x_out, x_in, 0x1, 0);
         if (num_samples == 2) {
            y_out = y_in;
            s_out = nir_mask_shift_or(b, s_out, x_in, 0x2, -1);
         } else {
            y_out = nir_mask_shift_or(b, y_out, y_in, 0xfffffffc, -1);
            y_out = nir_mask_shift_or(b, y_out, y_in, 0x1, 0);
            s_out = nir_mask_shift_or(b, s_out, x_in, 0x2, -1);
            s_out = nir_mask_shift_or(b, s_out, y_in, 0x2, 0);
         }
         break;

      case 8:
         /* decode_msaa(8, IMS, X, Y, 0) = (X', Y', S)
          *   where X' = (X & ~0b111) >> 2 | (X & 0b1)
          *         Y' = (Y & ~0b11) >> 1 | (Y & 0b1)
          *         S = (X & 0b100) | (Y & 0b10) | (X & 0b10) >> 1
          */
         x_out = nir_mask_shift_or(b, x_out, x_in, 0xfffffff8, -2);
         x_out = nir_mask_shift_or(b, x_out, x_in, 0x1, 0);
         y_out = nir_mask_shift_or(b, y_out, y_in, 0xfffffffc, -1);
         y_out = nir_mask_shift_or(b, y_out, y_in, 0x1, 0);
         s_out = nir_mask_shift_or(b, s_out, x_in, 0x4, 0);
         s_out = nir_mask_shift_or(b, s_out, y_in, 0x2, 0);
         s_out = nir_mask_shift_or(b, s_out, x_in, 0x2, -1);
         break;

      case 16:
         /* decode_msaa(16, IMS, X, Y, 0) = (X', Y', S)
          *   where X' = (X & ~0b111) >> 2 | (X & 0b1)
          *         Y' = (Y & ~0b111) >> 2 | (Y & 0b1)
          *         S = (Y & 0b100) << 1 | (X & 0b100) |
          *             (Y & 0b10) | (X & 0b10) >> 1
          */
         x_out = nir_mask_shift_or(b, x_out, x_in, 0xfffffff8, -2);
         x_out = nir_mask_shift_or(b, x_out, x_in, 0x1, 0);
         y_out = nir_mask_shift_or(b, y_out, y_in, 0xfffffff8, -2);
         y_out = nir_mask_shift_or(b, y_out, y_in, 0x1, 0);
         s_out = nir_mask_shift_or(b, s_out, y_in, 0x4, 1);
         s_out = nir_mask_shift_or(b, s_out, x_in, 0x4, 0);
         s_out = nir_mask_shift_or(b, s_out, y_in, 0x2, 0);
         s_out = nir_mask_shift_or(b, s_out, x_in, 0x2, -1);
         break;

      default:
         unreachable("Invalid number of samples for IMS layout");
      }

      return nir_vec3(b, x_out, y_out, s_out);
   }

   default:
      unreachable("Invalid MSAA layout");
   }
}

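/*
 * Illustrative sketch, not part of the driver: the 4x IMS formulas from the
 * comments in this file, written as scalar C so that the encode/decode round
 * trip can be checked on the CPU. The names ims4_encode() and ims4_decode()
 * are hypothetical; only the bit manipulations are taken from the documented
 * formulas, and the binary literals are written in hex for portability.
 */

```c
#include <assert.h>

/* Interleave sample index s (0..3) into bit 1 of x and y (4x IMS encode). */
static void
ims4_encode(unsigned x, unsigned y, unsigned s,
            unsigned *x_out, unsigned *y_out)
{
   assert(s < 4);
   *x_out = (x & ~0x1u) << 1 | (s & 0x1u) << 1 | (x & 0x1u);
   *y_out = (y & ~0x1u) << 1 | (s & 0x2u) | (y & 0x1u);
}

/* Recover the pixel coordinates and sample index (4x IMS decode). */
static void
ims4_decode(unsigned x, unsigned y,
            unsigned *x_out, unsigned *y_out, unsigned *s_out)
{
   *x_out = (x & ~0x3u) >> 1 | (x & 0x1u);
   *y_out = (y & ~0x3u) >> 1 | (y & 0x1u);
   *s_out = (y & 0x2u) | (x & 0x2u) >> 1;
}
```

/*
 * For example, pixel (5, 2) with sample 3 encodes to (11, 6), and decoding
 * (11, 6) returns (5, 2) with sample 3 again.
 */
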
/**
 * Count the number of trailing 1 bits in the given value.  For example:
 *
 * count_trailing_one_bits(0) == 0
 * count_trailing_one_bits(7) == 3
 * count_trailing_one_bits(11) == 2
 */
static inline int count_trailing_one_bits(unsigned value)
{
#ifdef HAVE___BUILTIN_CTZ
   return __builtin_ctz(~value);
#else
   return _mesa_bitcount(value & ~(value + 1));
#endif
}

static nir_ssa_def *
blorp_nir_manual_blend_average(nir_builder *b, struct brw_blorp_blit_vars *v,
                               nir_ssa_def *pos, unsigned tex_samples,
                               enum isl_aux_usage tex_aux_usage,
                               nir_alu_type dst_type)
{
   /* If non-null, this is the outer-most if statement */
   nir_if *outer_if = NULL;

   nir_variable *color =
      nir_local_variable_create(b->impl, glsl_vec4_type(), "color");

   nir_ssa_def *mcs = NULL;
   if (tex_aux_usage == ISL_AUX_USAGE_MCS)
      mcs = blorp_nir_txf_ms_mcs(b, v, pos);

   /* We add together samples using a binary tree structure, e.g. for 4x MSAA:
    *
    *   result = ((sample[0] + sample[1]) + (sample[2] + sample[3])) / 4
    *
    * This ensures that when all samples have the same value, no numerical
    * precision is lost, since each addition operation always adds two equal
    * values, and summing two equal floating point values does not lose
    * precision.
    *
    * We perform this computation by treating the texture_data array as a
    * stack and performing the following operations:
    *
    * - push sample 0 onto stack
    * - push sample 1 onto stack
    * - add top two stack entries
    * - push sample 2 onto stack
    * - push sample 3 onto stack
    * - add top two stack entries
    * - add top two stack entries
    * - divide top stack entry by 4
    *
    * Note that after pushing sample i onto the stack, the number of add
    * operations we do is equal to the number of trailing 1 bits in i.  This
    * works provided the total number of samples is a power of two, which it
    * always is for i965.
    *
    * For integer formats, we replace the add operations with average
    * operations and skip the final division.
    */
   nir_ssa_def *texture_data[5];
   unsigned stack_depth = 0;
   for (unsigned i = 0; i < tex_samples; ++i) {
      assert(stack_depth == _mesa_bitcount(i)); /* Loop invariant */

      /* Push sample i onto the stack */
      assert(stack_depth < ARRAY_SIZE(texture_data));

      nir_ssa_def *ms_pos = nir_vec3(b, nir_channel(b, pos, 0),
                                        nir_channel(b, pos, 1),
                                        nir_imm_int(b, i));
      texture_data[stack_depth++] = blorp_nir_txf_ms(b, v, ms_pos, mcs, dst_type);

      if (i == 0 && tex_aux_usage == ISL_AUX_USAGE_MCS) {
         /* The Ivy Bridge PRM, Vol4 Part1 p27 (Multisample Control Surface)
          * suggests an optimization:
          *
          *     "A simple optimization with probable large return in
          *     performance is to compare the MCS value to zero (indicating
          *     all samples are on sample slice 0), and sample only from
          *     sample slice 0 using ld2dss if MCS is zero."
          *
          * Note that in the case where the MCS value is zero, sampling from
          * sample slice 0 using ld2dss and sampling from sample 0 using
          * ld2dms are equivalent (since all samples are on sample slice 0).
          * Since we have already sampled from sample 0, all we need to do is
          * skip the remaining fetches and averaging if MCS is zero.
          */
         nir_ssa_def *mcs_zero =
            nir_ieq(b, nir_channel(b, mcs, 0), nir_imm_int(b, 0));
         if (tex_samples == 16) {
            mcs_zero = nir_iand(b, mcs_zero,
               nir_ieq(b, nir_channel(b, mcs, 1), nir_imm_int(b, 0)));
         }

         nir_if *if_stmt = nir_if_create(b->shader);
         if_stmt->condition = nir_src_for_ssa(mcs_zero);
         nir_cf_node_insert(b->cursor, &if_stmt->cf_node);

         b->cursor = nir_after_cf_list(&if_stmt->then_list);
         nir_store_var(b, color, texture_data[0], 0xf);

         b->cursor = nir_after_cf_list(&if_stmt->else_list);
         outer_if = if_stmt;
      }

      for (int j = 0; j < count_trailing_one_bits(i); j++) {
         assert(stack_depth >= 2);
         --stack_depth;

         assert(dst_type == nir_type_float);
         texture_data[stack_depth - 1] =
            nir_fadd(b, texture_data[stack_depth - 1],
                        texture_data[stack_depth]);
      }
   }

   /* We should have just 1 sample on the stack now. */
   assert(stack_depth == 1);

   texture_data[0] = nir_fmul(b, texture_data[0],
                              nir_imm_float(b, 1.0 / tex_samples));

   nir_store_var(b, color, texture_data[0], 0xf);

   if (outer_if)
      b->cursor = nir_after_cf_node(&outer_if->cf_node);

   return nir_load_var(b, color);
}

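/*
 * Illustrative sketch, not part of the driver: the stack-based binary-tree
 * summation described in blorp_nir_manual_blend_average(), applied to plain
 * floats on the CPU instead of NIR values. The names trailing_ones() and
 * average_pairwise() are hypothetical, and the trailing-ones count uses a
 * simple loop rather than the builtin used by count_trailing_one_bits().
 */

```c
#include <assert.h>

/* Number of trailing 1 bits in i, e.g. 0 -> 0, 7 -> 3, 11 -> 2. */
static unsigned
trailing_ones(unsigned i)
{
   unsigned n = 0;
   while (i & 1u) {
      n++;
      i >>= 1;
   }
   return n;
}

/* Average n samples, where n is a power of two no larger than 16. */
static float
average_pairwise(const float *samples, unsigned n)
{
   float stack[5];   /* 16 samples need at most 5 live entries */
   unsigned depth = 0;

   assert(n >= 1 && n <= 16 && (n & (n - 1)) == 0);

   for (unsigned i = 0; i < n; i++) {
      /* Push sample i, then fold once per trailing 1 bit in i. */
      stack[depth++] = samples[i];
      for (unsigned j = 0; j < trailing_ones(i); j++) {
         --depth;
         stack[depth - 1] += stack[depth];
      }
   }

   assert(depth == 1);
   return stack[0] * (1.0f / n);
}
```

/*
 * With eight identical inputs, every addition doubles a value and the final
 * scale is a power of two, so the average equals the input exactly.
 */
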
static inline nir_ssa_def *
nir_imm_vec2(nir_builder *build, float x, float y)
{
   nir_const_value v;

   memset(&v, 0, sizeof(v));
   v.f32[0] = x;
   v.f32[1] = y;

   return nir_build_imm(build, 4, 32, v);
}

static nir_ssa_def *
blorp_nir_manual_blend_bilinear(nir_builder *b, nir_ssa_def *pos,
                                unsigned tex_samples,
                                const struct brw_blorp_blit_prog_key *key,
                                struct brw_blorp_blit_vars *v)
{
   nir_ssa_def *pos_xy = nir_channels(b, pos, 0x3);
   nir_ssa_def *rect_grid = nir_load_var(b, v->v_rect_grid);
   nir_ssa_def *scale = nir_imm_vec2(b, key->x_scale, key->y_scale);

   /* Translate coordinates to lay out the samples in a rectangular grid
    * roughly corresponding to sample locations.
    */
   pos_xy = nir_fmul(b, pos_xy, scale);
   /* Adjust coordinates so that integers represent pixel centers rather
    * than pixel edges.
    */
   pos_xy = nir_fadd(b, pos_xy, nir_imm_float(b, -0.5));
   /* Clamp the X, Y texture coordinates to properly handle the sampling of
    * texels on texture edges.
    */
   pos_xy = nir_fmin(b, nir_fmax(b, pos_xy, nir_imm_float(b, 0.0)),
                        nir_vec2(b, nir_channel(b, rect_grid, 0),
                                    nir_channel(b, rect_grid, 1)));

   /* Store the fractional parts to be used as bilinear interpolation
    * coefficients.
    */
   nir_ssa_def *frac_xy = nir_ffract(b, pos_xy);
   /* Round the float coordinates down to nearest integer */
   pos_xy = nir_fdiv(b, nir_ftrunc(b, pos_xy), scale);

   nir_ssa_def *tex_data[4];
   for (unsigned i = 0; i < 4; ++i) {
      float sample_off_x = (float)(i & 0x1) / key->x_scale;
      float sample_off_y = (float)((i >> 1) & 0x1) / key->y_scale;
      nir_ssa_def *sample_off = nir_imm_vec2(b, sample_off_x, sample_off_y);

      nir_ssa_def *sample_coords = nir_fadd(b, pos_xy, sample_off);
      nir_ssa_def *sample_coords_int = nir_f2i(b, sample_coords);

      /* The MCS value we fetch has to match up with the pixel that we're
       * sampling from. Since we sample from different pixels in each
       * iteration of this "for" loop, the call to mcs_fetch() should be
       * here inside the loop after computing the pixel coordinates.
       */
      nir_ssa_def *mcs = NULL;
      if (key->tex_aux_usage == ISL_AUX_USAGE_MCS)
         mcs = blorp_nir_txf_ms_mcs(b, v, sample_coords_int);

      /* Compute sample index and map the sample index to a sample number.
       * Sample index layout shows the numbering of slots in a rectangular
       * grid of samples within a pixel. Sample number layout shows the
       * rectangular grid of samples roughly corresponding to the real sample
       * locations within a pixel.
       * In case of 4x MSAA, layout of sample indices matches the layout of
       * sample numbers:
       *           ---------
       *           | 0 | 1 |
       *           ---------
       *           | 2 | 3 |
       *           ---------
       *
       * In case of 8x MSAA the two layouts don't match.
       * sample index layout :  ---------    sample number layout :  ---------
       *                        | 0 | 1 |                            | 3 | 7 |
       *                        ---------                            ---------
       *                        | 2 | 3 |                            | 5 | 0 |
       *                        ---------                            ---------
       *                        | 4 | 5 |                            | 1 | 2 |
       *                        ---------                            ---------
       *                        | 6 | 7 |                            | 4 | 6 |
       *                        ---------                            ---------
       *
       * Fortunately, this can be done fairly easily as:
       * S' = (0x64210573 >> (S * 4)) & 0xf
       *
       * In the case of 16x MSAA the two layouts don't match.
       * Sample index layout:                Sample number layout:
       * ---------------------               ---------------------
       * |  0 |  1 |  2 |  3 |               | 15 | 10 |  9 |  7 |
       * ---------------------               ---------------------
       * |  4 |  5 |  6 |  7 |               |  4 |  1 |  3 | 13 |
       * ---------------------               ---------------------
       * |  8 |  9 | 10 | 11 |               | 12 |  2 |  0 |  6 |
       * ---------------------               ---------------------
       * | 12 | 13 | 14 | 15 |               | 11 |  8 |  5 | 14 |
       * ---------------------               ---------------------
       *
       * This is equivalent to
       * S' = (0xe58b602cd31479af >> (S * 4)) & 0xf
       */
      nir_ssa_def *frac = nir_ffract(b, sample_coords);
      nir_ssa_def *sample =
         nir_fdot2(b, frac, nir_imm_vec2(b, key->x_scale,
                                            key->x_scale * key->y_scale));
      sample = nir_f2i(b, sample);

      if (tex_samples == 8) {
         sample = nir_iand(b, nir_ishr(b, nir_imm_int(b, 0x64210573),
                                       nir_ishl(b, sample, nir_imm_int(b, 2))),
                           nir_imm_int(b, 0xf));
      } else if (tex_samples == 16) {
         nir_ssa_def *sample_low =
            nir_iand(b, nir_ishr(b, nir_imm_int(b, 0xd31479af),
                                 nir_ishl(b, sample, nir_imm_int(b, 2))),
                     nir_imm_int(b, 0xf));
         nir_ssa_def *sample_high =
            nir_iand(b, nir_ishr(b, nir_imm_int(b, 0xe58b602c),
                                 nir_ishl(b, nir_iadd(b, sample,
                                                      nir_imm_int(b, -8)),
                                          nir_imm_int(b, 2))),
                     nir_imm_int(b, 0xf));

         sample = nir_bcsel(b, nir_ilt(b, sample, nir_imm_int(b, 8)),
                            sample_low, sample_high);
      }
      nir_ssa_def *pos_ms = nir_vec3(b, nir_channel(b, sample_coords_int, 0),
                                        nir_channel(b, sample_coords_int, 1),
                                        sample);
      tex_data[i] = blorp_nir_txf_ms(b, v, pos_ms, mcs, key->texture_data_type);
   }

   nir_ssa_def *frac_x = nir_channel(b, frac_xy, 0);
   nir_ssa_def *frac_y = nir_channel(b, frac_xy, 1);
   return nir_flrp(b, nir_flrp(b, tex_data[0], tex_data[1], frac_x),
                      nir_flrp(b, tex_data[2], tex_data[3], frac_x),
                      frac_y);
}

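/*
 * Illustrative sketch, not part of the driver: the sample-index to
 * sample-number mapping used by blorp_nir_manual_blend_bilinear(), written
 * as scalar C. Each hex digit of the constant is one table entry, selected
 * by shifting right by four bits per index. The function names are
 * hypothetical. The 8x constant is the one the code passes to nir_imm_int();
 * the 16x constant is the 64-bit value from the comment, which the shader
 * code splits into two 32-bit halves.
 */

```c
#include <assert.h>
#include <stdint.h>

/* Map an 8x MSAA sample index (0..7) to its sample number. */
static unsigned
sample_number_8x(unsigned index)
{
   assert(index < 8);
   return (0x64210573u >> (index * 4)) & 0xfu;
}

/* Map a 16x MSAA sample index (0..15) to its sample number. */
static unsigned
sample_number_16x(unsigned index)
{
   assert(index < 16);
   return (unsigned)((UINT64_C(0xe58b602cd31479af) >> (index * 4)) & 0xf);
}
```

/*
 * For example, 8x index 3 maps to sample number 0, matching the second row
 * of the 8x sample number layout.
 */
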
/**
 * Generator for WM programs used in BLORP blits.
 *
 * The bulk of the work done by the WM program is to wrap and unwrap the
 * coordinate transformations used by the hardware to store surfaces in
i965/gen6: Initial implementation of MSAA.
This patch enables MSAA for Gen6, by modifying intel_mipmap_tree to
understand multisampled buffers, adapting the rendering pipeline setup
to enable multisampled rendering, and adding multisample resolve
operations to brw_blorp_blit.cpp. Some preparation work is also
included for Gen7, but it is not yet enabled.
MSAA support is still fairly preliminary. In particular, the
following are not yet supported:
- Fully general blits between MSAA and non-MSAA buffers.
- Formats other than RGBA8, DEPTH24, and STENCIL8.
- Centroid interpolation.
- Coverage parameters (glSampleCoverage, GL_SAMPLE_ALPHA_TO_COVERAGE,
GL_SAMPLE_ALPHA_TO_ONE, GL_SAMPLE_COVERAGE, GL_SAMPLE_COVERAGE_VALUE,
GL_SAMPLE_COVERAGE_INVERT).
Fixes piglit tests "EXT_framebuffer_multisample/accuracy" on
i965/Gen6.
v2:
- In intel_alloc_renderbuffer_storage(), quantize the requested number
of samples to the next higher sample count supported by the
hardware. This ensures that a query of GL_SAMPLES will return the
correct value. It also ensures that MSAA is fully disabled on Gen7
for now (since Gen7 MSAA support doesn't work yet).
- When reading from a non-MSAA surface, ensure that s_is_zero is true
so that we won't try to read from a nonexistent sample.
2012-04-29 21:41:42 -07:00
 * memory.  The hardware transforms a pixel location (X, Y, S) (where S is the
 * sample index for a multisampled surface) to a memory offset by the
 * following formulas:
 *
 *   offset = tile(tiling_format, encode_msaa(num_samples, layout, X, Y, S))
 *   (X, Y, S) = decode_msaa(num_samples, layout, detile(tiling_format, offset))
 *
 * For a single-sampled surface, or for a multisampled surface using
 * INTEL_MSAA_LAYOUT_UMS, encode_msaa() and decode_msaa() are the identity
 * function:
 *
 *   encode_msaa(1, NONE, X, Y, 0) = (X, Y, 0)
 *   decode_msaa(1, NONE, X, Y, 0) = (X, Y, 0)
 *   encode_msaa(n, UMS, X, Y, S) = (X, Y, S)
 *   decode_msaa(n, UMS, X, Y, S) = (X, Y, S)
 *
 * For a 4x multisampled surface using INTEL_MSAA_LAYOUT_IMS, encode_msaa()
 * embeds the sample number into bit 1 of the X and Y coordinates:
 *
 *   encode_msaa(4, IMS, X, Y, S) = (X', Y', 0)
 *     where X' = (X & ~0b1) << 1 | (S & 0b1) << 1 | (X & 0b1)
 *           Y' = (Y & ~0b1) << 1 | (S & 0b10) | (Y & 0b1)
 *   decode_msaa(4, IMS, X, Y, 0) = (X', Y', S)
 *     where X' = (X & ~0b11) >> 1 | (X & 0b1)
 *           Y' = (Y & ~0b11) >> 1 | (Y & 0b1)
 *           S = (Y & 0b10) | (X & 0b10) >> 1
2012-04-29 22:44:25 -07:00
|
|
|
*
|
2012-07-17 21:06:01 -07:00
|
|
|
* For an 8x multisampled surface using INTEL_MSAA_LAYOUT_IMS, encode_msaa()
|
|
|
|
|
* embeds the sample number into bits 1 and 2 of the X coordinate and bit 1 of
|
|
|
|
|
* the Y coordinate:
|
|
|
|
|
*
|
|
|
|
|
* encode_msaa(8, IMS, X, Y, S) = (X', Y', 0)
|
|
|
|
|
* where X' = (X & ~0b1) << 2 | (S & 0b100) | (S & 0b1) << 1 | (X & 0b1)
|
|
|
|
|
* Y' = (Y & ~0b1) << 1 | (S & 0b10) | (Y & 0b1)
|
|
|
|
|
* decode_msaa(8, IMS, X, Y, 0) = (X', Y', S)
|
|
|
|
|
* where X' = (X & ~0b111) >> 2 | (X & 0b1)
|
|
|
|
|
* Y' = (Y & ~0b11) >> 1 | (Y & 0b1)
|
|
|
|
|
* S = (X & 0b100) | (Y & 0b10) | (X & 0b10) >> 1
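As a sanity check on the 8x IMS formulas, here is a small CPU-side sketch that transcribes them directly. `Coord`, `encode_msaa_8x_ims`, and `decode_msaa_8x_ims` are hypothetical names used only for this illustration and are not taken from the driver.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type: a pixel coordinate plus a sample index.
struct Coord {
    uint32_t x, y, s;
};

// encode_msaa(8, IMS, X, Y, S) = (X', Y', 0):
// sample bits 0 and 2 land in bits 1 and 2 of X, sample bit 1 in bit 1 of Y.
inline Coord encode_msaa_8x_ims(uint32_t x, uint32_t y, uint32_t s)
{
    assert(s < 8); // only sample indices 0..7 exist at 8x
    uint32_t xe = ((x & ~0x1u) << 2) | (s & 0x4u) | ((s & 0x1u) << 1) | (x & 0x1u);
    uint32_t ye = ((y & ~0x1u) << 1) | (s & 0x2u) | (y & 0x1u);
    return {xe, ye, 0u};
}

// decode_msaa(8, IMS, X, Y, 0) = (X', Y', S): the inverse mapping.
inline Coord decode_msaa_8x_ims(uint32_t x, uint32_t y)
{
    uint32_t xd = ((x & ~0x7u) >> 2) | (x & 0x1u);
    uint32_t yd = ((y & ~0x3u) >> 1) | (y & 0x1u);
    uint32_t s = (x & 0x4u) | (y & 0x2u) | ((x & 0x2u) >> 1);
    return {xd, yd, s};
}
```

Encoding (X, Y, S) = (5, 3, 7) gives (23, 7, 0), and decoding (23, 7) recovers (5, 3, 7).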
|
|
|
|
|
*
|
2012-04-29 22:44:25 -07:00
|
|
|
* For X tiling, tile() combines together the low-order bits of the X and Y
|
|
|
|
|
* coordinates in the pattern 0byyyxxxxxxxxx, creating 4k tiles that are 512
|
|
|
|
|
* bytes wide and 8 rows high:
|
|
|
|
|
*
|
2012-05-08 13:39:10 -07:00
|
|
|
* tile(x_tiled, X, Y, S) = A
|
2012-04-29 22:44:25 -07:00
|
|
|
* where A = tile_num << 12 | offset
|
2012-05-08 13:39:10 -07:00
|
|
|
* tile_num = (Y' >> 3) * tile_pitch + (X' >> 9)
|
|
|
|
|
* offset = (Y' & 0b111) << 9
|
2012-04-29 22:44:25 -07:00
|
|
|
* | (X' & 0b111111111)
|
|
|
|
|
* X' = X * cpp
|
2012-05-08 13:39:10 -07:00
|
|
|
* Y' = Y + S * qpitch
|
|
|
|
|
* detile(x_tiled, A) = (X, Y, S)
|
2012-04-29 22:44:25 -07:00
|
|
|
* where X = X' / cpp
|
2012-05-08 13:39:10 -07:00
|
|
|
* Y = Y' % qpitch
|
|
|
|
|
* S = Y' / qpitch
|
|
|
|
|
* Y' = (tile_num / tile_pitch) << 3
|
|
|
|
|
* | (A & 0b111000000000) >> 9
|
2012-04-29 22:44:25 -07:00
|
|
|
* X' = (tile_num % tile_pitch) << 9
|
|
|
|
|
* | (A & 0b111111111)
|
|
|
|
|
*
|
|
|
|
|
* (In all tiling formulas, cpp is the number of bytes occupied by a single
|
2012-05-08 13:39:10 -07:00
|
|
|
* sample ("chars per pixel"), tile_pitch is the number of 4k tiles required
|
|
|
|
|
* to fill the width of the surface, and qpitch is the spacing (in rows)
|
|
|
|
|
* between array slices).
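The X tiling formulas can be illustrated with the following sketch, which computes a byte address from (X, Y, S) and back. The `Surface` and `Coord` structs and the function names are hypothetical, introduced only for this example. The sketch masks the byte coordinate X' (not the pixel coordinate X) when forming the offset, because detile() recovers X' from the low nine bits of A.

```cpp
#include <cstdint>

// Hypothetical surface description, mirroring the terms used in the formulas.
struct Surface {
    uint32_t cpp;        // bytes per sample
    uint32_t tile_pitch; // number of 4k tiles across the surface width
    uint32_t qpitch;     // rows between array slices
};

struct Coord {
    uint32_t x, y, s;
};

// tile(x_tiled, X, Y, S) = A
inline uint32_t tile_x(const Surface &surf, Coord c)
{
    uint32_t xb = c.x * surf.cpp;          // X': byte offset within the row
    uint32_t yr = c.y + c.s * surf.qpitch; // Y': row including the slice offset
    uint32_t tile_num = (yr >> 3) * surf.tile_pitch + (xb >> 9);
    uint32_t offset = ((yr & 0x7u) << 9) | (xb & 0x1ffu);
    return (tile_num << 12) | offset;
}

// detile(x_tiled, A) = (X, Y, S)
inline Coord detile_x(const Surface &surf, uint32_t a)
{
    uint32_t tile_num = a >> 12;
    uint32_t yr = ((tile_num / surf.tile_pitch) << 3) | ((a & 0xe00u) >> 9);
    uint32_t xb = ((tile_num % surf.tile_pitch) << 9) | (a & 0x1ffu);
    return {xb / surf.cpp, yr % surf.qpitch, yr / surf.qpitch};
}
```

With cpp = 4, tile_pitch = 2, and qpitch = 16, the coordinate (130, 5, 1) maps to address 23048, and detiling that address returns (130, 5, 1).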
|
2012-04-29 22:44:25 -07:00
|
|
|
*
|
|
|
|
|
* For Y tiling, tile() combines together the low-order bits of the X and Y
|
|
|
|
|
* coordinates in the pattern 0bxxxyyyyyxxxx, creating 4k tiles that are 128
|
|
|
|
|
* bytes wide and 32 rows high:
|
|
|
|
|
*
|
2012-05-08 13:39:10 -07:00
|
|
|
* tile(y_tiled, X, Y, S) = A
|
2012-04-29 22:44:25 -07:00
|
|
|
* where A = tile_num << 12 | offset
|
2012-05-08 13:39:10 -07:00
|
|
|
* tile_num = (Y' >> 5) * tile_pitch + (X' >> 7)
|
2012-04-29 22:44:25 -07:00
|
|
|
* offset = (X' & 0b1110000) << 5
|
|
|
|
|
* | (Y' & 0b11111) << 4
|
|
|
|
|
* | (X' & 0b1111)
|
|
|
|
|
* X' = X * cpp
|
2012-05-08 13:39:10 -07:00
|
|
|
* Y' = Y + S * qpitch
|
|
|
|
|
* detile(y_tiled, A) = (X, Y, S)
|
2012-04-29 22:44:25 -07:00
|
|
|
* where X = X' / cpp
|
2012-05-08 13:39:10 -07:00
|
|
|
* Y = Y' % qpitch
|
|
|
|
|
* S = Y' / qpitch
|
|
|
|
|
* Y' = (tile_num / tile_pitch) << 5
|
|
|
|
|
* | (A & 0b111110000) >> 4
|
2012-04-29 22:44:25 -07:00
|
|
|
* X' = (tile_num % tile_pitch) << 7
|
|
|
|
|
* | (A & 0b111000000000) >> 5
|
|
|
|
|
* | (A & 0b1111)
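The Y tiling formulas can be transcribed in the same way. The sketch below is only an illustration: `Surface`, `Coord`, `tile_y`, and `detile_y` are hypothetical names, and the code runs on the CPU, whereas the driver performs this arithmetic in generated shader code.

```cpp
#include <cstdint>

// Hypothetical surface description, mirroring the terms used in the formulas.
struct Surface {
    uint32_t cpp;        // bytes per sample
    uint32_t tile_pitch; // number of 4k tiles across the surface width
    uint32_t qpitch;     // rows between array slices
};

struct Coord {
    uint32_t x, y, s;
};

// tile(y_tiled, X, Y, S) = A
inline uint32_t tile_y(const Surface &surf, Coord c)
{
    uint32_t xb = c.x * surf.cpp;          // X': byte offset within the row
    uint32_t yr = c.y + c.s * surf.qpitch; // Y': row including the slice offset
    uint32_t tile_num = (yr >> 5) * surf.tile_pitch + (xb >> 7);
    uint32_t offset = ((xb & 0x70u) << 5)  // X' bits 4..6 -> A bits 9..11
                    | ((yr & 0x1fu) << 4)  // Y' bits 0..4 -> A bits 4..8
                    | (xb & 0xfu);         // X' bits 0..3 -> A bits 0..3
    return (tile_num << 12) | offset;
}

// detile(y_tiled, A) = (X, Y, S)
inline Coord detile_y(const Surface &surf, uint32_t a)
{
    uint32_t tile_num = a >> 12;
    uint32_t yr = ((tile_num / surf.tile_pitch) << 5) | ((a & 0x1f0u) >> 4);
    uint32_t xb = ((tile_num % surf.tile_pitch) << 7)
                | ((a & 0xe00u) >> 5)
                | (a & 0xfu);
    return {xb / surf.cpp, yr % surf.qpitch, yr / surf.qpitch};
}
```

With cpp = 4, tile_pitch = 4, and qpitch = 64, the coordinate (50, 37, 1) maps to address 55384, and detiling that address returns (50, 37, 1).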
|
|
|
|
|
*
|
|
|
|
|
* For W tiling, tile() combines together the low-order bits of the X and Y
|
|
|
|
|
* coordinates in the pattern 0bxxxyyyyxyxyx, creating 4k tiles that are 64
|
|
|
|
|
* bytes wide and 64 rows high (note that W tiling is only used for stencil
|
2012-05-08 13:39:10 -07:00
|
|
|
* buffers, which always have cpp = 1 and S = 0):
*
* tile(w_tiled, X, Y, S) = A
* where A = tile_num << 12 | offset
* tile_num = (Y' >> 6) * tile_pitch + (X' >> 6)
* offset = (X' & 0b111000) << 6
* | (Y' & 0b111100) << 3
* | (X' & 0b100) << 2
* | (Y' & 0b10) << 2
* | (X' & 0b10) << 1
* | (Y' & 0b1) << 1
* | (X' & 0b1)
* X' = X * cpp = X
* Y' = Y + S * qpitch
* detile(w_tiled, A) = (X, Y, S)
* where X = X' / cpp = X'
* Y = Y' % qpitch = Y'
* S = Y' / qpitch = 0
* Y' = (tile_num / tile_pitch) << 6
* | (A & 0b111100000) >> 3
* | (A & 0b1000) >> 2
* | (A & 0b10) >> 1
* X' = (tile_num % tile_pitch) << 6
* | (A & 0b111000000000) >> 6
* | (A & 0b10000) >> 2
* | (A & 0b100) >> 1
* | (A & 0b1)
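The W-tiling formulas above can be sketched as plain C++ for experimentation on the CPU. This is only an illustration under stated assumptions: the names `WCoord`, `w_tile`, `w_detile` and `w_round_trips` are hypothetical and do not appear in brw_blorp_blit.cpp, `tile_pitch` is taken to be the number of 64-byte-wide tiles per row of tiles, and cpp = 1 and S = 0 are assumed (so X' = X and Y' = Y), as the comment states for stencil buffers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type: a sample position in a W-tiled stencil surface.
struct WCoord {
    uint32_t x;
    uint32_t y;
};

// tile(w_tiled, X, Y, 0) = A, with cpp = 1 so that X' = X and Y' = Y.
inline uint32_t w_tile(uint32_t x, uint32_t y, uint32_t tile_pitch)
{
    const uint32_t tile_num = (y >> 6) * tile_pitch + (x >> 6);
    // Interleave the low six bits of X and Y as 0bxxxyyyyxyxyx.
    const uint32_t offset = ((x & 0b111000) << 6)
                          | ((y & 0b111100) << 3)
                          | ((x & 0b100) << 2)
                          | ((y & 0b10) << 2)
                          | ((x & 0b10) << 1)
                          | ((y & 0b1) << 1)
                          | (x & 0b1);
    return (tile_num << 12) | offset;
}

// detile(w_tiled, A) = (X, Y, 0): the inverse of w_tile().
inline WCoord w_detile(uint32_t a, uint32_t tile_pitch)
{
    assert(tile_pitch != 0);
    const uint32_t tile_num = a >> 12;
    const uint32_t y = ((tile_num / tile_pitch) << 6)
                     | ((a & 0b111100000) >> 3)
                     | ((a & 0b1000) >> 2)
                     | ((a & 0b10) >> 1);
    const uint32_t x = ((tile_num % tile_pitch) << 6)
                     | ((a & 0b111000000000) >> 6)
                     | ((a & 0b10000) >> 2)
                     | ((a & 0b100) >> 1)
                     | (a & 0b1);
    return WCoord{x, y};
}

// Checks that detile(tile(x, y)) == (x, y) for every sample of a surface
// that is tile_pitch tiles wide and `height` rows high.
inline bool w_round_trips(uint32_t tile_pitch, uint32_t height)
{
    for (uint32_t y = 0; y < height; ++y) {
        for (uint32_t x = 0; x < tile_pitch * 64; ++x) {
            const WCoord c = w_detile(w_tile(x, y, tile_pitch), tile_pitch);
            if (c.x != x || c.y != y)
                return false;
        }
    }
    return true;
}
```

With `tile_pitch = 4`, the last sample of the first tile, (63, 63), maps to offset 4095, and (64, 0) starts the second tile at address 4096.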
*
* Finally, for a non-tiled surface, tile() simply combines together the X and
* Y coordinates in the natural way:
*
* tile(untiled, X, Y, S) = A
* where A = Y' * pitch + X'
* X' = X * cpp
* Y' = Y + S * qpitch
* detile(untiled, A) = (X, Y, S)
* where X = X' / cpp
* Y = Y' % qpitch
* S = Y' / qpitch
* X' = A % pitch
* Y' = A / pitch
*
|
|
|
|
|
* (In these formulas, pitch is the number of bytes occupied by a single row
|
i965/gen6: Initial implementation of MSAA.
This patch enables MSAA for Gen6, by modifying intel_mipmap_tree to
understand multisampled buffers, adapting the rendering pipeline setup
to enable multisampled rendering, and adding multisample resolve
operations to brw_blorp_blit.cpp. Some preparation work is also
included for Gen7, but it is not yet enabled.
MSAA support is still fairly preliminary. In particular, the
following are not yet supported:
- Fully general blits between MSAA and non-MSAA buffers.
- Formats other than RGBA8, DEPTH24, and STENCIL8.
- Centroid interpolation.
- Coverage parameters (glSampleCoverage, GL_SAMPLE_ALPHA_TO_COVERAGE,
GL_SAMPLE_ALPHA_TO_ONE, GL_SAMPLE_COVERAGE, GL_SAMPLE_COVERAGE_VALUE,
GL_SAMPLE_COVERAGE_INVERT).
Fixes piglit tests "EXT_framebuffer_multisample/accuracy" on
i965/Gen6.
v2:
- In intel_alloc_renderbuffer_storage(), quantize the requested number
of samples to the next higher sample count supported by the
hardware. This ensures that a query of GL_SAMPLES will return the
correct value. It also ensures that MSAA is fully disabled on Gen7
for now (since Gen7 MSAA support doesn't work yet).
- When reading from a non-MSAA surface, ensure that s_is_zero is true
so that we won't try to read from a nonexistent sample.
2012-04-29 21:41:42 -07:00
|
|
|
* of samples).
|
2012-04-29 22:44:25 -07:00
|
|
|
*/
|
2016-04-29 12:52:00 -07:00
|
|
|
static nir_shader *
|
2016-10-21 12:09:38 -07:00
|
|
|
brw_blorp_build_nir_shader(struct blorp_context *blorp, void *mem_ctx,
|
2016-08-08 15:33:43 -07:00
|
|
|
const struct brw_blorp_blit_prog_key *key)
|
2016-04-29 12:52:00 -07:00
|
|
|
{
|
2016-08-22 15:01:08 -07:00
|
|
|
const struct gen_device_info *devinfo = blorp->isl_dev->info;
|
2016-04-29 12:52:00 -07:00
|
|
|
nir_ssa_def *src_pos, *dst_pos, *color;
|
|
|
|
|
|
|
|
|
|
/* Sanity checks */
|
2016-06-23 11:35:50 -07:00
|
|
|
if (key->dst_tiled_w && key->rt_samples > 1) {
|
2016-04-29 12:52:00 -07:00
|
|
|
/* If the destination image is W tiled and multisampled, then the thread
|
|
|
|
|
* must be dispatched once per sample, not once per pixel. This is
|
|
|
|
|
* necessary because after conversion between W and Y tiling, there's no
|
|
|
|
|
* guarantee that all samples corresponding to a single pixel will still
|
|
|
|
|
* be together.
|
|
|
|
|
*/
|
|
|
|
|
assert(key->persample_msaa_dispatch);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
if (key->blend) {
|
|
|
|
|
/* We are blending, which means we won't have an opportunity to
|
|
|
|
|
* translate the tiling and sample count for the texture surface. So
|
|
|
|
|
* the surface state for the texture must be configured with the correct
|
|
|
|
|
* tiling and sample count.
|
|
|
|
|
*/
|
|
|
|
|
assert(!key->src_tiled_w);
|
|
|
|
|
assert(key->tex_samples == key->src_samples);
|
|
|
|
|
assert(key->tex_layout == key->src_layout);
|
|
|
|
|
assert(key->tex_samples > 0);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
if (key->persample_msaa_dispatch) {
|
|
|
|
|
/* It only makes sense to do persample dispatch if the render target is
|
|
|
|
|
* configured as multisampled.
|
|
|
|
|
*/
|
|
|
|
|
assert(key->rt_samples > 0);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/* Make sure layout is consistent with sample count */
|
2016-06-23 15:50:18 -07:00
|
|
|
assert((key->tex_layout == ISL_MSAA_LAYOUT_NONE) ==
|
2016-06-23 11:35:50 -07:00
|
|
|
(key->tex_samples <= 1));
|
2016-06-23 15:50:18 -07:00
|
|
|
assert((key->rt_layout == ISL_MSAA_LAYOUT_NONE) ==
|
2016-06-23 11:35:50 -07:00
|
|
|
(key->rt_samples <= 1));
|
2016-06-23 15:50:18 -07:00
|
|
|
assert((key->src_layout == ISL_MSAA_LAYOUT_NONE) ==
|
2016-06-23 11:35:50 -07:00
|
|
|
(key->src_samples <= 1));
|
2016-06-23 15:50:18 -07:00
|
|
|
assert((key->dst_layout == ISL_MSAA_LAYOUT_NONE) ==
|
2016-06-23 11:35:50 -07:00
|
|
|
(key->dst_samples <= 1));
|
2016-04-29 12:52:00 -07:00
|
|
|
|
|
|
|
|
nir_builder b;
|
2016-10-21 12:09:38 -07:00
|
|
|
nir_builder_init_simple_shader(&b, mem_ctx, MESA_SHADER_FRAGMENT, NULL);
|
2016-04-29 12:52:00 -07:00
|
|
|
|
|
|
|
|
struct brw_blorp_blit_vars v;
|
|
|
|
|
brw_blorp_blit_vars_init(&b, &v, key);
|
|
|
|
|
|
|
|
|
|
dst_pos = blorp_blit_get_frag_coords(&b, key, &v);
|
|
|
|
|
|
|
|
|
|
/* Render target and texture hardware don't support W tiling until Gen8. */
|
|
|
|
|
const bool rt_tiled_w = false;
|
2016-08-19 00:54:56 -07:00
|
|
|
const bool tex_tiled_w = devinfo->gen >= 8 && key->src_tiled_w;
|
2016-04-29 12:52:00 -07:00
|
|
|
|
|
|
|
|
/* The address that data will be written to is determined by the
|
|
|
|
|
* coordinates supplied to the WM thread and the tiling and sample count of
|
|
|
|
|
* the render target, according to the formula:
|
|
|
|
|
*
|
|
|
|
|
* (X, Y, S) = decode_msaa(rt_samples, detile(rt_tiling, offset))
|
|
|
|
|
*
|
|
|
|
|
* If the actual tiling and sample count of the destination surface are not
|
|
|
|
|
* the same as the configuration of the render target, then these
|
|
|
|
|
* coordinates are wrong and we have to adjust them to compensate for the
|
|
|
|
|
* difference.
|
|
|
|
|
*/
|
|
|
|
|
if (rt_tiled_w != key->dst_tiled_w ||
|
|
|
|
|
key->rt_samples != key->dst_samples ||
|
|
|
|
|
key->rt_layout != key->dst_layout) {
|
2016-05-02 12:13:14 -07:00
|
|
|
dst_pos = blorp_nir_encode_msaa(&b, dst_pos, key->rt_samples,
|
|
|
|
|
key->rt_layout);
|
|
|
|
|
/* Now (X, Y, S) = detile(rt_tiling, offset) */
|
2016-05-02 11:50:06 -07:00
|
|
|
if (rt_tiled_w != key->dst_tiled_w)
|
|
|
|
|
dst_pos = blorp_nir_retile_y_to_w(&b, dst_pos);
|
2016-05-02 12:13:14 -07:00
|
|
|
/* Now (X, Y, S) = detile(dst_tiling, offset) */
|
|
|
|
|
dst_pos = blorp_nir_decode_msaa(&b, dst_pos, key->dst_samples,
|
|
|
|
|
key->dst_layout);
|
2016-04-29 12:52:00 -07:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/* Now (X, Y, S) = decode_msaa(dst_samples, detile(dst_tiling, offset)).
|
|
|
|
|
*
|
|
|
|
|
* That is: X, Y and S now contain the true coordinates and sample index of
|
|
|
|
|
* the data that the WM thread should output.
|
|
|
|
|
*
|
|
|
|
|
* If we need to kill pixels that are outside the destination rectangle,
|
|
|
|
|
* now is the time to do it.
|
|
|
|
|
*/
|
2016-05-17 09:27:49 +03:00
|
|
|
if (key->use_kill) {
|
|
|
|
|
assert(!(key->blend && key->blit_scaled));
|
2016-05-02 12:30:45 -07:00
|
|
|
blorp_nir_discard_if_outside_rect(&b, dst_pos, &v);
|
2016-05-17 09:27:49 +03:00
|
|
|
}
|
2016-04-29 12:52:00 -07:00
|
|
|
|
|
|
|
|
src_pos = blorp_blit_apply_transform(&b, nir_i2f(&b, dst_pos), &v);
|
2016-05-13 00:36:25 -07:00
|
|
|
if (dst_pos->num_components == 3) {
|
|
|
|
|
/* The sample coordinate is an integer that we want left alone but
|
|
|
|
|
* blorp_blit_apply_transform() blindly applies the transform to all
|
|
|
|
|
* three coordinates. Grab the original sample index.
|
|
|
|
|
*/
|
|
|
|
|
src_pos = nir_vec3(&b, nir_channel(&b, src_pos, 0),
|
|
|
|
|
nir_channel(&b, src_pos, 1),
|
|
|
|
|
nir_channel(&b, dst_pos, 2));
|
|
|
|
|
}
|
2016-04-29 12:52:00 -07:00
|
|
|
|
2016-05-02 12:13:14 -07:00
|
|
|
/* If the source image is not multisampled, then we want to fetch sample
|
|
|
|
|
* number 0, because that's the only sample there is.
|
|
|
|
|
*/
|
2016-07-19 19:01:38 -07:00
|
|
|
if (key->src_samples == 1)
|
2016-05-02 12:13:14 -07:00
|
|
|
src_pos = nir_channels(&b, src_pos, 0x3);
|
|
|
|
|
|
2016-04-29 12:52:00 -07:00
|
|
|
/* X, Y, and S are now the coordinates of the pixel in the source image
|
|
|
|
|
* that we want to texture from. Exception: if we are blending, then S is
|
|
|
|
|
* irrelevant, because we are going to fetch all samples.
|
|
|
|
|
*/
|
|
|
|
|
if (key->blend && !key->blit_scaled) {
|
2016-05-13 00:36:25 -07:00
|
|
|
/* Resolves (effectively) use texelFetch, so we need integers and we
|
|
|
|
|
* don't care about the sample index if we got one.
|
|
|
|
|
*/
|
|
|
|
|
src_pos = nir_f2i(&b, nir_channels(&b, src_pos, 0x3));
|
2016-05-05 14:27:23 -07:00
|
|
|
|
2016-08-19 00:54:56 -07:00
|
|
|
if (devinfo->gen == 6) {
|
2016-05-03 16:22:46 -07:00
|
|
|
/* Because gen6 only supports 4x interleaved MSAA, we can do all the
|
|
|
|
|
* blending we need with a single linear-interpolated texture lookup
|
|
|
|
|
* at the center of the sample. The texture coordinates need to be odd
|
|
|
|
|
* integers so that they correspond to the center of a 2x2 block
|
|
|
|
|
* representing the four samples that make up a pixel. So we need
|
|
|
|
|
* to multiply our X and Y coordinates each by 2 and then add 1.
|
|
|
|
|
*/
|
|
|
|
|
src_pos = nir_ishl(&b, src_pos, nir_imm_int(&b, 1));
|
|
|
|
|
src_pos = nir_iadd(&b, src_pos, nir_imm_int(&b, 1));
|
2016-05-13 00:36:25 -07:00
|
|
|
src_pos = nir_i2f(&b, src_pos);
|
2016-06-28 14:10:49 -07:00
|
|
|
color = blorp_nir_tex(&b, &v, src_pos, key->texture_data_type);
|
2016-05-03 16:22:46 -07:00
|
|
|
} else {
|
|
|
|
|
/* Gen7+ hardware doesn't automatically blend. */
|
2016-06-28 14:10:49 -07:00
|
|
|
color = blorp_nir_manual_blend_average(&b, &v, src_pos, key->src_samples,
|
2016-06-23 15:17:15 -07:00
|
|
|
key->tex_aux_usage,
|
2016-05-03 16:22:46 -07:00
|
|
|
key->texture_data_type);
|
|
|
|
|
}
|
2016-04-29 12:52:00 -07:00
|
|
|
} else if (key->blend && key->blit_scaled) {
|
2016-05-17 09:27:49 +03:00
|
|
|
assert(!key->use_kill);
|
2016-05-05 11:01:16 -07:00
|
|
|
color = blorp_nir_manual_blend_bilinear(&b, src_pos, key->src_samples, key, &v);
|
2016-04-29 12:52:00 -07:00
|
|
|
} else {
|
|
|
|
|
if (key->bilinear_filter) {
|
2016-06-28 14:10:49 -07:00
|
|
|
color = blorp_nir_tex(&b, &v, src_pos, key->texture_data_type);
|
2016-04-29 12:52:00 -07:00
|
|
|
} else {
|
2016-05-05 14:27:23 -07:00
|
|
|
/* We're going to use texelFetch, so we need integers */
|
2016-05-13 00:36:25 -07:00
|
|
|
if (src_pos->num_components == 2) {
|
|
|
|
|
src_pos = nir_f2i(&b, src_pos);
|
|
|
|
|
} else {
|
|
|
|
|
assert(src_pos->num_components == 3);
|
|
|
|
|
src_pos = nir_vec3(&b, nir_channel(&b, nir_f2i(&b, src_pos), 0),
|
|
|
|
|
nir_channel(&b, nir_f2i(&b, src_pos), 1),
|
|
|
|
|
nir_channel(&b, src_pos, 2));
|
|
|
|
|
}
|
2016-05-05 14:27:23 -07:00
|
|
|
|
|
|
|
|
/* We aren't blending, which means we just want to fetch a single
|
|
|
|
|
* sample from the source surface. The address that we want to fetch
|
|
|
|
|
* from is related to the X, Y and S values according to the formula:
|
|
|
|
|
*
|
|
|
|
|
* (X, Y, S) = decode_msaa(src_samples, detile(src_tiling, offset)).
|
|
|
|
|
*
|
|
|
|
|
* If the actual tiling and sample count of the source surface are
|
|
|
|
|
* not the same as the configuration of the texture, then we need to
|
|
|
|
|
* adjust the coordinates to compensate for the difference.
|
|
|
|
|
*/
|
|
|
|
|
if (tex_tiled_w != key->src_tiled_w ||
|
|
|
|
|
key->tex_samples != key->src_samples ||
|
|
|
|
|
key->tex_layout != key->src_layout) {
|
|
|
|
|
src_pos = blorp_nir_encode_msaa(&b, src_pos, key->src_samples,
|
|
|
|
|
key->src_layout);
|
|
|
|
|
/* Now (X, Y, S) = detile(src_tiling, offset) */
|
|
|
|
|
if (tex_tiled_w != key->src_tiled_w)
|
|
|
|
|
src_pos = blorp_nir_retile_w_to_y(&b, src_pos);
|
|
|
|
|
/* Now (X, Y, S) = detile(tex_tiling, offset) */
|
|
|
|
|
src_pos = blorp_nir_decode_msaa(&b, src_pos, key->tex_samples,
|
|
|
|
|
key->tex_layout);
|
|
|
|
|
}
|
|
|
|
|
|
2016-08-30 11:18:39 -07:00
|
|
|
if (key->need_src_offset)
|
|
|
|
|
src_pos = nir_iadd(&b, src_pos, nir_load_var(&b, v.v_src_offset));
|
|
|
|
|
|
2016-04-29 12:52:00 -07:00
|
|
|
/* Now (X, Y, S) = decode_msaa(tex_samples, detile(tex_tiling, offset)).
|
|
|
|
|
*
|
|
|
|
|
* In other words: X, Y, and S now contain values which, when passed to
|
|
|
|
|
* the texturing unit, will cause data to be read from the correct
|
|
|
|
|
* memory location. So we can fetch the texel now.
|
|
|
|
|
*/
|
2016-07-19 19:01:38 -07:00
|
|
|
if (key->src_samples == 1) {
|
2016-04-29 12:52:00 -07:00
|
|
|
color = blorp_nir_txf(&b, &v, src_pos, key->texture_data_type);
|
|
|
|
|
} else {
|
|
|
|
|
nir_ssa_def *mcs = NULL;
|
2016-06-23 15:17:15 -07:00
|
|
|
if (key->tex_aux_usage == ISL_AUX_USAGE_MCS)
|
2016-06-28 14:10:49 -07:00
|
|
|
mcs = blorp_nir_txf_ms_mcs(&b, &v, src_pos);
|
2016-04-29 12:52:00 -07:00
|
|
|
|
2016-06-28 14:10:49 -07:00
|
|
|
color = blorp_nir_txf_ms(&b, &v, src_pos, mcs, key->texture_data_type);
|
2016-04-29 12:52:00 -07:00
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
2016-08-27 12:07:31 -07:00
|
|
|
if (key->dst_rgb) {
|
|
|
|
|
/* The destination image is bound as a red texture three times as wide
|
|
|
|
|
* as the actual image. Our shader is effectively running one color
|
|
|
|
|
* component at a time. We need to pick off the appropriate component
|
|
|
|
|
* from the source color and write that to destination red.
|
|
|
|
|
*/
|
|
|
|
|
assert(dst_pos->num_components == 2);
|
|
|
|
|
nir_ssa_def *comp =
|
|
|
|
|
nir_umod(&b, nir_channel(&b, dst_pos, 0), nir_imm_int(&b, 3));
|
|
|
|
|
|
|
|
|
|
nir_ssa_def *color_component =
|
|
|
|
|
nir_bcsel(&b, nir_ieq(&b, comp, nir_imm_int(&b, 0)),
|
|
|
|
|
nir_channel(&b, color, 0),
|
|
|
|
|
nir_bcsel(&b, nir_ieq(&b, comp, nir_imm_int(&b, 1)),
|
|
|
|
|
nir_channel(&b, color, 1),
|
|
|
|
|
nir_channel(&b, color, 2)));
|
|
|
|
|
|
|
|
|
|
nir_ssa_def *u = nir_ssa_undef(&b, 1, 32);
|
|
|
|
|
color = nir_vec4(&b, color_component, u, u, u);
|
|
|
|
|
}
|
|
|
|
|
|
2016-04-29 12:52:00 -07:00
|
|
|
nir_store_var(&b, v.color_out, color, 0xf);
|
|
|
|
|
|
|
|
|
|
return b.shader;
|
|
|
|
|
}
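The comments in brw_blorp_build_nir_shader() above describe coordinates in terms of encode_msaa() and decode_msaa() without spelling out what those do. The sketch below illustrates the idea for one case only, 4x interleaved (IMS) multisampling. The bit layout is recalled from comments elsewhere in the blorp source and does not appear in this excerpt, so treat it as an assumption. The struct and function names are hypothetical and are not the NIR helpers used above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical coordinate triple: pixel X/Y plus sample index S. */
struct xys {
   uint32_t x, y, s;
};

/* Fold the sample index of a 4x interleaved (IMS) surface into X and Y,
 * giving the position the same data has when the surface is viewed as
 * single-sampled.  Assumed layout: bit 0 of S lands in bit 1 of X and
 * bit 1 of S lands in bit 1 of Y.
 */
static struct xys
encode_msaa_4x_ims(struct xys p)
{
   struct xys r;

   assert(p.s < 4);
   r.x = ((p.x & ~1u) << 1) | ((p.s & 1u) << 1) | (p.x & 1u);
   r.y = ((p.y & ~1u) << 1) | (p.s & 2u) | (p.y & 1u);
   r.s = 0;
   return r;
}

/* Inverse of the above: recover pixel X/Y and the sample index from
 * single-sampled coordinates.
 */
static struct xys
decode_msaa_4x_ims(struct xys p)
{
   struct xys r;

   r.x = ((p.x & ~3u) >> 1) | (p.x & 1u);
   r.y = ((p.y & ~3u) >> 1) | (p.y & 1u);
   r.s = (p.y & 2u) | ((p.x & 2u) >> 1);
   return r;
}
```

Under that assumed layout, encoding with the render target's sample count and decoding with the destination's sample count is what lets the shader reinterpret where a given thread's output really lands.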
|
|
|
|
|
|
2016-04-29 12:34:10 -07:00
|
|
|
static void
|
2016-08-19 00:54:56 -07:00
|
|
|
brw_blorp_get_blit_kernel(struct blorp_context *blorp,
|
2016-08-19 05:43:29 -07:00
|
|
|
struct blorp_params *params,
|
2016-04-29 12:34:10 -07:00
|
|
|
const struct brw_blorp_blit_prog_key *prog_key)
|
|
|
|
|
{
|
2016-08-19 00:54:56 -07:00
|
|
|
if (blorp->lookup_shader(blorp, prog_key, sizeof(*prog_key),
|
|
|
|
|
&params->wm_prog_kernel, &params->wm_prog_data))
|
2016-04-29 12:34:10 -07:00
|
|
|
return;
|
|
|
|
|
|
2016-10-21 12:09:38 -07:00
|
|
|
void *mem_ctx = ralloc_context(NULL);
|
|
|
|
|
|
2016-04-29 12:52:00 -07:00
|
|
|
const unsigned *program;
|
|
|
|
|
unsigned program_size;
|
|
|
|
|
struct brw_blorp_prog_data prog_data;
|
|
|
|
|
|
2016-10-21 12:09:38 -07:00
|
|
|
nir_shader *nir = brw_blorp_build_nir_shader(blorp, mem_ctx, prog_key);
|
2016-05-05 14:37:53 -07:00
|
|
|
struct brw_wm_prog_key wm_key;
|
|
|
|
|
brw_blorp_init_wm_prog_key(&wm_key);
|
|
|
|
|
wm_key.tex.compressed_multisample_layout_mask =
|
2016-06-23 15:17:15 -07:00
|
|
|
prog_key->tex_aux_usage == ISL_AUX_USAGE_MCS;
|
2016-05-11 17:11:47 -07:00
|
|
|
wm_key.tex.msaa_16 = prog_key->tex_samples == 16;
|
2016-05-05 14:37:53 -07:00
|
|
|
wm_key.multisample_fbo = prog_key->rt_samples > 1;
|
|
|
|
|
|
2016-10-21 12:09:38 -07:00
|
|
|
program = blorp_compile_fs(blorp, mem_ctx, nir, &wm_key, false,
|
2016-10-21 12:04:25 -07:00
|
|
|
&prog_data, &program_size);
|
2016-04-29 12:52:00 -07:00
|
|
|
|
2016-08-19 00:54:56 -07:00
|
|
|
blorp->upload_shader(blorp, prog_key, sizeof(*prog_key),
|
|
|
|
|
program, program_size,
|
|
|
|
|
&prog_data, sizeof(prog_data),
|
|
|
|
|
&params->wm_prog_kernel, &params->wm_prog_data);
|
2016-10-21 12:09:38 -07:00
|
|
|
|
|
|
|
|
ralloc_free(mem_ctx);
|
2016-04-29 12:34:10 -07:00
|
|
|
}
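brw_blorp_get_blit_kernel() above follows a lookup, compile, upload pattern keyed on the raw bytes of the program key. The sketch below shows that pattern in isolation. Everything in it (the demo_ names, the fixed-size array cache, the integer "kernel" handle) is hypothetical and stands in for the driver's lookup_shader and upload_shader callbacks, whose implementations are not shown in this file.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical key; the real brw_blorp_blit_prog_key has many more fields. */
struct demo_key {
   unsigned src_samples;
   unsigned dst_samples;
   bool blend;
};

struct demo_entry {
   struct demo_key key;
   int kernel;                 /* stand-in for an uploaded kernel handle */
};

#define DEMO_CACHE_SIZE 8
static struct demo_entry demo_cache[DEMO_CACHE_SIZE];
static unsigned demo_cache_count;
static unsigned demo_compile_count;

/* Compare keys byte-for-byte, as a cache keyed on sizeof(*key) would. */
static bool
demo_lookup(const struct demo_key *key, int *kernel)
{
   for (unsigned i = 0; i < demo_cache_count; i++) {
      if (memcmp(&demo_cache[i].key, key, sizeof(*key)) == 0) {
         *kernel = demo_cache[i].kernel;
         return true;
      }
   }
   return false;
}

/* Pretend compile step; counts invocations so cache hits are observable. */
static int
demo_compile(const struct demo_key *key)
{
   (void)key;
   demo_compile_count++;
   return 100 + (int)demo_compile_count;
}

static int
demo_get_kernel(const struct demo_key *key)
{
   int kernel;

   if (demo_lookup(key, &kernel))
      return kernel;           /* cache hit: nothing to compile */

   kernel = demo_compile(key);

   assert(demo_cache_count < DEMO_CACHE_SIZE);
   /* memcpy so that padding bytes are stored exactly as the caller set them */
   memcpy(&demo_cache[demo_cache_count].key, key, sizeof(*key));
   demo_cache[demo_cache_count].kernel = kernel;
   demo_cache_count++;
   return kernel;
}
```

Because the comparison is bytewise, a caller of this sketch should memset() the key to zero before filling in fields; otherwise padding bytes could make two logically equal keys miss each other.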
|
|
|
|
|
|
2016-04-21 18:10:53 -07:00
|
|
|
static void
|
|
|
|
|
brw_blorp_setup_coord_transform(struct brw_blorp_coord_transform *xform,
|
|
|
|
|
GLfloat src0, GLfloat src1,
|
|
|
|
|
GLfloat dst0, GLfloat dst1,
|
|
|
|
|
bool mirror)
|
2012-04-29 22:44:25 -07:00
|
|
|
{
|
2016-09-03 09:49:24 -07:00
|
|
|
double scale = (double)(src1 - src0) / (double)(dst1 - dst0);
|
2012-04-29 22:44:25 -07:00
|
|
|
if (!mirror) {
|
|
|
|
|
/* When not mirroring a coordinate (say, X), we need:
|
2013-05-14 17:20:02 -07:00
|
|
|
* src_x - src_x0 = (dst_x - dst_x0 + 0.5) * scale
|
2012-04-29 22:44:25 -07:00
|
|
|
* Therefore:
|
2013-05-14 17:20:02 -07:00
|
|
|
* src_x = src_x0 + (dst_x - dst_x0 + 0.5) * scale
|
|
|
|
|
*
|
|
|
|
|
* The blorp program uses "round toward zero" to convert the
|
|
|
|
|
* transformed floating point coordinates to integer coordinates,
|
|
|
|
|
* whereas the behaviour we actually want is "round to nearest",
|
|
|
|
|
* so 0.5 provides the necessary correction.
|
2012-04-29 22:44:25 -07:00
|
|
|
*/
|
2016-04-21 18:10:53 -07:00
|
|
|
xform->multiplier = scale;
|
2016-09-03 09:49:24 -07:00
|
|
|
xform->offset = src0 + (-(double)dst0 + 0.5) * scale;
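As a worked illustration of the non-mirrored case just set up, the sketch below computes the same multiplier and offset on the CPU and applies them with truncation toward zero. The demo_ names are hypothetical. The real code stores the values in struct brw_blorp_coord_transform and applies them in the fragment shader, and the mirrored case is left out here.

```c
#include <assert.h>

/* Hypothetical stand-in for struct brw_blorp_coord_transform. */
struct demo_coord_transform {
   float multiplier;
   float offset;
};

/* Non-mirrored case: src = src0 + (dst - dst0 + 0.5) * scale, rewritten
 * as src = dst * multiplier + offset.
 */
static void
demo_setup_coord_transform(struct demo_coord_transform *xform,
                           float src0, float src1,
                           float dst0, float dst1)
{
   assert(dst1 != dst0);
   double scale = (double)(src1 - src0) / (double)(dst1 - dst0);

   xform->multiplier = (float)scale;
   xform->offset = (float)(src0 + (-(double)dst0 + 0.5) * scale);
}

/* The conversion to int truncates toward zero; the 0.5 folded into the
 * offset makes each destination pixel sample at its center.
 */
static int
demo_apply_coord_transform(const struct demo_coord_transform *xform, int dst)
{
   return (int)((float)dst * xform->multiplier + xform->offset);
}
```

For example, shrinking a source span of 0..4 onto a destination span of 0..8 gives a multiplier of 0.5 and an offset of 0.25, so destination pixels 0 and 1 both read source pixel 0, and destination pixel 7 reads source pixel 3.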
|
2012-04-29 22:44:25 -07:00
|
|
|
} else {
|
|
|
|
|
/* When mirroring X we need:
|
2013-05-14 17:20:02 -07:00
|
|
|
* src_x - src_x0 = dst_x1 - dst_x - 0.5
|
2012-04-29 22:44:25 -07:00
|
|
|
* Therefore:
|
2013-05-14 17:20:02 -07:00
|
|
|
* src_x = src_x0 + (dst_x1 - dst_x - 0.5) * scale
|
2012-04-29 22:44:25 -07:00
|
|
|
*/
|
2016-04-21 18:10:53 -07:00
|
|
|
xform->multiplier = -scale;
|
2016-09-03 09:49:24 -07:00
|
|
|
xform->offset = src0 + ((double)dst1 - 0.5) * scale;
|
2012-04-29 22:44:25 -07:00
|
|
|
}
|
|
|
|
|
}
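The mirrored transform derived in the comment above can be checked numerically. The following is a minimal sketch, not the driver's code: the struct and function names are hypothetical stand-ins, and the non-mirrored branch is an assumption modelled on the symmetric case (src_x = src_x0 + (dst_x - dst_x0 + 0.5) * scale). It assumes the shader later truncates the result toward zero, which is why the 0.5 term is present.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-axis transform: src = dst * multiplier + offset. */
struct coord_transform_sketch {
   float multiplier;
   float offset;
};

static void
setup_coord_transform_sketch(struct coord_transform_sketch *xform,
                             float src0, float src1,
                             float dst0, float dst1, bool mirror)
{
   double scale = (double)(src1 - src0) / (double)(dst1 - dst0);

   if (!mirror) {
      /* Assumed non-mirrored form: src = src0 + (dst - dst0 + 0.5) * scale */
      xform->multiplier = scale;
      xform->offset = src0 + (-(double)dst0 + 0.5) * scale;
   } else {
      /* Mirrored form from the comment: src = src0 + (dst1 - dst - 0.5) * scale */
      xform->multiplier = -scale;
      xform->offset = src0 + ((double)dst1 - 0.5) * scale;
   }
}

static float
apply_coord_transform_sketch(const struct coord_transform_sketch *xform,
                             float dst)
{
   return dst * xform->multiplier + xform->offset;
}
```

For an unscaled blit from source columns 4..8 to destination columns 10..14, the mirrored transform maps destination pixel 10 to 7.5 (texel 7 after truncation) and destination pixel 13 to 4.5 (texel 4), so the first destination column reads the last source column.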
|
|
|
|
|
|
2016-08-30 11:18:39 -07:00
|
|
|
static inline void
surf_get_intratile_offset_px(struct brw_blorp_surface_info *info,
                             uint32_t *tile_x_px, uint32_t *tile_y_px)
{
   if (info->surf.msaa_layout == ISL_MSAA_LAYOUT_INTERLEAVED) {
      struct isl_extent2d px_size_sa =
         isl_get_interleaved_msaa_px_size_sa(info->surf.samples);
      assert(info->tile_x_sa % px_size_sa.width == 0);
      assert(info->tile_y_sa % px_size_sa.height == 0);
      *tile_x_px = info->tile_x_sa / px_size_sa.width;
      *tile_y_px = info->tile_y_sa / px_size_sa.height;
   } else {
      *tile_x_px = info->tile_x_sa;
      *tile_y_px = info->tile_y_sa;
   }
}
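The sample-to-pixel conversion above can be illustrated in isolation. This is a minimal sketch under stated assumptions: the per-pixel footprint table (2x1, 2x2, 4x2, 4x4 samples for 2x, 4x, 8x, 16x MSAA) is my understanding of what isl_get_interleaved_msaa_px_size_sa() returns and is not confirmed by this file, and every name ending in _sketch is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct extent2d_sketch {
   uint32_t width;
   uint32_t height;
};

/* Assumed size of one pixel, in samples, for interleaved MSAA layouts. */
static struct extent2d_sketch
interleaved_px_size_sa_sketch(uint32_t samples)
{
   struct extent2d_sketch e = { 1, 1 };

   switch (samples) {
   case 2:  e.width = 2; e.height = 1; break;
   case 4:  e.width = 2; e.height = 2; break;
   case 8:  e.width = 4; e.height = 2; break;
   case 16: e.width = 4; e.height = 4; break;
   default: break; /* single-sampled: one sample per pixel */
   }
   return e;
}

/* Convert an intratile offset from samples to pixels. */
static void
intratile_offset_sa_to_px_sketch(uint32_t tile_x_sa, uint32_t tile_y_sa,
                                 uint32_t samples, bool interleaved,
                                 uint32_t *tile_x_px, uint32_t *tile_y_px)
{
   if (interleaved) {
      struct extent2d_sketch px = interleaved_px_size_sa_sketch(samples);
      /* The offset must land on a whole-pixel boundary. */
      assert(tile_x_sa % px.width == 0);
      assert(tile_y_sa % px.height == 0);
      *tile_x_px = tile_x_sa / px.width;
      *tile_y_px = tile_y_sa / px.height;
   } else {
      *tile_x_px = tile_x_sa;
      *tile_y_px = tile_y_sa;
   }
}
```

With 8x interleaved MSAA, an offset of (16, 6) samples corresponds to (4, 3) pixels under these assumptions; for a non-interleaved layout the offset passes through unchanged.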
|
|
|
|
|
|
2016-06-27 11:54:14 -07:00
|
|
|
static void
|
2016-08-19 00:54:56 -07:00
|
|
|
surf_convert_to_single_slice(const struct isl_device *isl_dev,
|
2016-06-27 11:54:14 -07:00
|
|
|
struct brw_blorp_surface_info *info)
|
|
|
|
|
{
|
|
|
|
|
/* Just bail if we have nothing to do. */
|
|
|
|
|
if (info->surf.dim == ISL_SURF_DIM_2D &&
|
|
|
|
|
info->view.base_level == 0 && info->view.base_array_layer == 0 &&
|
2016-08-31 12:58:54 -07:00
|
|
|
info->surf.levels == 1 && info->surf.logical_level0_px.array_len == 1)
|
2016-06-27 11:54:14 -07:00
|
|
|
return;
|
|
|
|
|
|
2016-08-31 12:58:54 -07:00
|
|
|
   /* If this gets triggered then we've gotten here twice. This
    * shouldn't happen thanks to the above early return.
    */
   assert(info->tile_x_sa == 0 && info->tile_y_sa == 0);
|
|
|
|
|
|
2016-08-29 09:48:10 -07:00
|
|
|
   uint32_t layer = 0, z = 0;
   if (info->surf.dim == ISL_SURF_DIM_3D)
      z = info->view.base_array_layer + info->z_offset;
   else
      layer = info->view.base_array_layer;
|
|
|
|
|
|
2016-06-27 11:54:14 -07:00
|
|
|
uint32_t x_offset_sa, y_offset_sa;
|
2016-07-19 19:59:16 -07:00
|
|
|
isl_surf_get_image_offset_sa(&info->surf, info->view.base_level,
|
2016-08-29 09:48:10 -07:00
|
|
|
layer, z, &x_offset_sa, &y_offset_sa);
|
2016-06-27 11:54:14 -07:00
|
|
|
|
2016-07-22 14:24:06 -07:00
|
|
|
uint32_t byte_offset;
|
2016-08-19 00:54:56 -07:00
|
|
|
isl_tiling_get_intratile_offset_sa(isl_dev, info->surf.tiling,
|
2016-08-27 22:44:15 -07:00
|
|
|
info->surf.format, info->surf.row_pitch,
|
2016-06-27 11:54:14 -07:00
|
|
|
x_offset_sa, y_offset_sa,
|
2016-07-22 14:24:06 -07:00
|
|
|
&byte_offset,
|
2016-06-27 11:54:14 -07:00
|
|
|
&info->tile_x_sa, &info->tile_y_sa);
|
2016-08-18 02:19:29 -07:00
|
|
|
info->addr.offset += byte_offset;
|
2016-06-27 11:54:14 -07:00
|
|
|
|
2016-08-30 11:18:39 -07:00
|
|
|
   const uint32_t slice_width_px =
      minify(info->surf.logical_level0_px.width, info->view.base_level);
   const uint32_t slice_height_px =
      minify(info->surf.logical_level0_px.height, info->view.base_level);

   uint32_t tile_x_px, tile_y_px;
   surf_get_intratile_offset_px(info, &tile_x_px, &tile_y_px);
|
|
|
|
|
|
2016-06-27 11:54:14 -07:00
|
|
|
   /* TODO: Once this file gets converted to C, we should just use designated
    * initializers.
    */
|
2016-08-08 15:33:43 -07:00
|
|
|
struct isl_surf_init_info init_info = { 0, };
|
2016-06-27 11:54:14 -07:00
|
|
|
|
|
|
|
|
init_info.dim = ISL_SURF_DIM_2D;
|
2016-08-27 22:06:11 -07:00
|
|
|
init_info.format = info->surf.format;
|
2016-08-30 11:18:39 -07:00
|
|
|
init_info.width = slice_width_px + tile_x_px;
|
|
|
|
|
init_info.height = slice_height_px + tile_y_px;
|
2016-06-27 11:54:14 -07:00
|
|
|
   init_info.depth = 1;
   init_info.levels = 1;
   init_info.array_len = 1;
   init_info.samples = info->surf.samples;
   init_info.min_pitch = info->surf.row_pitch;
   init_info.usage = info->surf.usage;
   init_info.tiling_flags = 1 << info->surf.tiling;
|
|
|
|
|
|
2016-08-19 00:54:56 -07:00
|
|
|
isl_surf_init_s(isl_dev, &info->surf, &init_info);
|
2016-06-27 11:54:14 -07:00
|
|
|
   assert(info->surf.row_pitch == init_info.min_pitch);

   /* The view is also different now. */
   info->view.base_level = 0;
   info->view.levels = 1;
   info->view.base_array_layer = 0;
   info->view.array_len = 1;
|
2016-08-29 09:48:10 -07:00
|
|
|
info->z_offset = 0;
|
2016-06-27 11:54:14 -07:00
|
|
|
}
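The size of the replacement single-slice surface follows from two pieces computed above: the minified extent of the selected miplevel and the intratile offset in pixels. A minimal sketch of that arithmetic, assuming minify() behaves as "shift right by the level, but never below 1"; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed behaviour of minify(): halve once per miplevel, clamped to 1.
 * The level is assumed to be smaller than 32.
 */
static uint32_t
minify_sketch(uint32_t value, uint32_t level)
{
   uint32_t v = value >> level;
   return v > 0 ? v : 1;
}

/* Extent along one axis of the single-slice surface: the slice itself plus
 * the pixels that precede it inside its tile.
 */
static uint32_t
single_slice_extent_px_sketch(uint32_t level0_px, uint32_t base_level,
                              uint32_t tile_offset_px)
{
   return minify_sketch(level0_px, base_level) + tile_offset_px;
}
```

For example, a 100-pixel-wide surface viewed at level 2 with an 8-pixel intratile offset yields a 33-pixel-wide single-slice surface: the slice is 25 pixels wide, and the surface has to cover the 8 pixels to its left because the base address was moved to the start of the tile.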
|
|
|
|
|
|
|
|
|
|
static void
|
2016-08-19 00:54:56 -07:00
|
|
|
surf_fake_interleaved_msaa(const struct isl_device *isl_dev,
|
2016-06-27 11:54:14 -07:00
|
|
|
struct brw_blorp_surface_info *info)
|
|
|
|
|
{
|
|
|
|
|
assert(info->surf.msaa_layout == ISL_MSAA_LAYOUT_INTERLEAVED);
|
|
|
|
|
|
|
|
|
|
/* First, we need to convert it to a simple 1-level 1-layer 2-D surface */
|
2016-08-19 00:54:56 -07:00
|
|
|
surf_convert_to_single_slice(isl_dev, info);
|
2016-06-27 11:54:14 -07:00
|
|
|
|
|
|
|
|
info->surf.logical_level0_px = info->surf.phys_level0_sa;
|
|
|
|
|
info->surf.samples = 1;
|
|
|
|
|
info->surf.msaa_layout = ISL_MSAA_LAYOUT_NONE;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
static void
|
2016-08-19 00:54:56 -07:00
|
|
|
surf_retile_w_to_y(const struct isl_device *isl_dev,
|
2016-06-27 11:54:14 -07:00
|
|
|
struct brw_blorp_surface_info *info)
|
|
|
|
|
{
|
|
|
|
|
assert(info->surf.tiling == ISL_TILING_W);
|
|
|
|
|
|
|
|
|
|
/* First, we need to convert it to a simple 1-level 1-layer 2-D surface */
|
2016-08-19 00:54:56 -07:00
|
|
|
surf_convert_to_single_slice(isl_dev, info);
|
2016-06-27 11:54:14 -07:00
|
|
|
|
|
|
|
|
   /* On gen7+, we don't have interleaved multisampling for color render
    * targets so we have to fake it.
    *
    * TODO: Are we sure we don't also need to fake it on gen6?
    */
|
2016-08-19 00:54:56 -07:00
|
|
|
if (isl_dev->info->gen > 6 &&
|
|
|
|
|
info->surf.msaa_layout == ISL_MSAA_LAYOUT_INTERLEAVED) {
|
2016-08-29 18:05:11 -07:00
|
|
|
surf_fake_interleaved_msaa(isl_dev, info);
|
2016-06-27 11:54:14 -07:00
|
|
|
}
|
|
|
|
|
|
2016-08-19 00:54:56 -07:00
|
|
|
if (isl_dev->info->gen == 6) {
|
2016-06-27 11:54:14 -07:00
|
|
|
      /* Gen6 stencil buffers have a very large alignment coming in from the
       * miptree. It's out-of-bounds for what the surface state can handle.
       * Since we have a single layer and level, it doesn't really matter as
       * long as we don't pass a bogus value into isl_surf_fill_state().
       */
      info->surf.image_alignment_el = isl_extent3d(4, 2, 1);
   }

   /* Now that we've converted everything to a simple 2-D surface with only
    * one miplevel, we can go about retiling it.
    */
   const unsigned x_align = 8, y_align = info->surf.samples != 0 ? 8 : 4;
   info->surf.tiling = ISL_TILING_Y0;
   info->surf.logical_level0_px.width =
      ALIGN(info->surf.logical_level0_px.width, x_align) * 2;
   info->surf.logical_level0_px.height =
      ALIGN(info->surf.logical_level0_px.height, y_align) / 2;
   info->tile_x_sa *= 2;
   info->tile_y_sa /= 2;
}
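The retiling arithmetic at the end of surf_retile_w_to_y() can be tried on concrete numbers. This is a minimal sketch with hypothetical names; the boolean selecting the taller 8-row alignment is an assumption standing in for the sample-count test in the function above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct retile_sketch {
   uint32_t width;   /* surface width */
   uint32_t height;  /* surface height */
   uint32_t tile_x;  /* intratile X offset */
   uint32_t tile_y;  /* intratile Y offset */
};

/* Round v up to a multiple of a, where a is a power of two. */
static uint32_t
align_pot_sketch(uint32_t v, uint32_t a)
{
   return (v + a - 1) & ~(a - 1);
}

/* Re-express a W-tiled surface in Y-tiled units: twice as wide, half as
 * tall, after padding to whole sub-tiles.
 */
static struct retile_sketch
retile_w_to_y_extent_sketch(struct retile_sketch w, bool tall_alignment)
{
   const uint32_t x_align = 8, y_align = tall_alignment ? 8 : 4;
   struct retile_sketch y;

   y.width = align_pot_sketch(w.width, x_align) * 2;
   y.height = align_pot_sketch(w.height, y_align) / 2;
   y.tile_x = w.tile_x * 2;
   y.tile_y = w.tile_y / 2;
   return y;
}
```

A 100x50 W-tiled surface with an intratile offset of (8, 4) becomes 208x26 with an offset of (16, 2) when the 4-row alignment applies, and 208x28 with the 8-row alignment.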
|
|
|
|
|
|
2016-08-30 12:49:54 -07:00
|
|
|
static void
do_blorp_blit(struct blorp_batch *batch,
              struct blorp_params *params,
              struct brw_blorp_blit_prog_key *wm_prog_key,
              float src_x0, float src_y0,
              float src_x1, float src_y1,
              float dst_x0, float dst_y0,
              float dst_x1, float dst_y1,
              bool mirror_x, bool mirror_y)
|
2016-07-19 19:19:12 -07:00
|
|
|
{
|
2016-08-22 15:01:08 -07:00
|
|
|
const struct gen_device_info *devinfo = batch->blorp->isl_dev->info;
|
2016-08-19 00:54:56 -07:00
|
|
|
|
2016-08-30 12:49:54 -07:00
|
|
|
   if (isl_format_has_sint_channel(params->src.view.format)) {
      wm_prog_key->texture_data_type = nir_type_int;
   } else if (isl_format_has_uint_channel(params->src.view.format)) {
      wm_prog_key->texture_data_type = nir_type_uint;
|
2016-07-19 19:01:38 -07:00
|
|
|
} else {
|
2016-08-30 12:49:54 -07:00
|
|
|
wm_prog_key->texture_data_type = nir_type_float;
|
i965/gen6: Initial implementation of MSAA.
This patch enables MSAA for Gen6, by modifying intel_mipmap_tree to
understand multisampled buffers, adapting the rendering pipeline setup
to enable multisampled rendering, and adding multisample resolve
operations to brw_blorp_blit.cpp. Some preparation work is also
included for Gen7, but it is not yet enabled.
MSAA support is still fairly preliminary. In particular, the
following are not yet supported:
- Fully general blits between MSAA and non-MSAA buffers.
- Formats other than RGBA8, DEPTH24, and STENCIL8.
- Centroid interpolation.
- Coverage parameters (glSampleCoverage, GL_SAMPLE_ALPHA_TO_COVERAGE,
GL_SAMPLE_ALPHA_TO_ONE, GL_SAMPLE_COVERAGE, GL_SAMPLE_COVERAGE_VALUE,
GL_SAMPLE_COVERAGE_INVERT).
Fixes piglit tests "EXT_framebuffer_multisample/accuracy" on
i965/Gen6.
v2:
- In intel_alloc_renderbuffer_storage(), quantize the requested number
of samples to the next higher sample count supported by the
hardware. This ensures that a query of GL_SAMPLES will return the
correct value. It also ensures that MSAA is fully disabled on Gen7
for now (since Gen7 MSAA support doesn't work yet).
- When reading from a non-MSAA surface, ensure that s_is_zero is true
so that we won't try to read from a nonexistent sample.
2012-04-29 21:41:42 -07:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/* src_samples and dst_samples are the true sample counts */
|
2016-08-30 12:49:54 -07:00
|
|
|
wm_prog_key->src_samples = params->src.surf.samples;
|
|
|
|
|
wm_prog_key->dst_samples = params->dst.surf.samples;
|
2012-04-29 21:41:42 -07:00
|
|
|
|
2016-08-30 12:49:54 -07:00
|
|
|
wm_prog_key->tex_aux_usage = params->src.aux_usage;
|
2016-06-23 15:17:15 -07:00
|
|
|
|
2012-07-04 05:48:25 -07:00
|
|
|
/* src_layout and dst_layout indicate the true MSAA layout used by src and
|
|
|
|
|
* dst.
|
2012-05-08 13:39:10 -07:00
|
|
|
*/
|
2016-08-30 12:49:54 -07:00
|
|
|
wm_prog_key->src_layout = params->src.surf.msaa_layout;
|
|
|
|
|
wm_prog_key->dst_layout = params->dst.surf.msaa_layout;
|
2016-04-18 08:51:10 +03:00
|
|
|
|
2015-01-22 16:01:57 +01:00
|
|
|
/* Round floating point values to nearest integer to avoid "off by one texel"
|
|
|
|
|
* kind of errors when blitting.
|
|
|
|
|
*/
|
2016-08-30 12:49:54 -07:00
|
|
|
   params->x0 = params->wm_inputs.discard_rect.x0 = roundf(dst_x0);
   params->y0 = params->wm_inputs.discard_rect.y0 = roundf(dst_y0);
   params->x1 = params->wm_inputs.discard_rect.x1 = roundf(dst_x1);
   params->y1 = params->wm_inputs.discard_rect.y1 = roundf(dst_y1);
|
2016-05-17 09:27:49 +03:00
|
|
|
|
2016-08-30 12:49:54 -07:00
|
|
|
brw_blorp_setup_coord_transform(&params->wm_inputs.coord_transform[0],
|
2016-04-21 18:10:53 -07:00
|
|
|
src_x0, src_x1, dst_x0, dst_x1, mirror_x);
|
2016-08-30 12:49:54 -07:00
|
|
|
brw_blorp_setup_coord_transform(&params->wm_inputs.coord_transform[1],
|
2016-04-21 18:10:53 -07:00
|
|
|
src_y0, src_y1, dst_y0, dst_y1, mirror_y);
|
2016-04-22 14:06:08 -07:00
|
|
|
|
2016-08-19 00:54:56 -07:00
|
|
|
if (devinfo->gen > 6 &&
|
2016-08-30 12:49:54 -07:00
|
|
|
params->dst.surf.msaa_layout == ISL_MSAA_LAYOUT_INTERLEAVED) {
|
|
|
|
|
assert(params->dst.surf.samples > 1);
|
2016-06-27 11:54:14 -07:00
|
|
|
|
2012-04-29 21:41:42 -07:00
|
|
|
      /* We must expand the rectangle we send through the rendering pipeline,
       * to account for the fact that we are mapping the destination region as
       * single-sampled when it is in fact multisampled. We must also align
       * it to a multiple of the multisampling pattern, because the
       * differences between multisampled and single-sampled surface formats
       * will mean that pixels are scrambled within the multisampling pattern.
       * TODO: what if this makes the coordinates too large?
|
2012-05-08 13:39:10 -07:00
|
|
|
*
|
2012-07-04 05:48:25 -07:00
|
|
|
* Note: this only works if the destination surface uses the IMS layout.
|
|
|
|
|
* If it's UMS, then we have no choice but to set up the rendering
|
|
|
|
|
* pipeline as multisampled.
|
2012-04-29 21:41:42 -07:00
|
|
|
*/
|
2016-08-29 17:52:52 -07:00
|
|
|
struct isl_extent2d px_size_sa =
|
2016-08-30 12:49:54 -07:00
|
|
|
         isl_get_interleaved_msaa_px_size_sa(params->dst.surf.samples);
      params->x0 = ROUND_DOWN_TO(params->x0, 2) * px_size_sa.width;
      params->y0 = ROUND_DOWN_TO(params->y0, 2) * px_size_sa.height;
      params->x1 = ALIGN(params->x1, 2) * px_size_sa.width;
      params->y1 = ALIGN(params->y1, 2) * px_size_sa.height;
|
2016-06-23 17:06:37 -07:00
|
|
|
|
2016-08-30 12:49:54 -07:00
|
|
|
surf_fake_interleaved_msaa(batch->blorp->isl_dev, &params->dst);
|
2016-06-23 17:06:37 -07:00
|
|
|
|
2016-08-30 12:49:54 -07:00
|
|
|
wm_prog_key->use_kill = true;
|
|
|
|
|
wm_prog_key->need_dst_offset = true;
|
2012-04-29 21:41:42 -07:00
|
|
|
}
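The rectangle expansion performed in the block above (snap to a 2-pixel boundary, then scale from pixels to samples) can be sketched on its own. The names below are hypothetical, and the per-pixel sample footprint is passed in by the caller rather than looked up, since its exact values are an assumption outside this file.

```c
#include <assert.h>
#include <stdint.h>

struct rect_sketch {
   uint32_t x0, y0, x1, y1;
};

/* Round v down / up to a multiple of a, where a is a power of two. */
static uint32_t
round_down_pot_sketch(uint32_t v, uint32_t a)
{
   return v & ~(a - 1);
}

static uint32_t
round_up_pot_sketch(uint32_t v, uint32_t a)
{
   return (v + a - 1) & ~(a - 1);
}

/* Grow a destination rectangle (in pixels) outward to 2-pixel boundaries,
 * then convert it to sample coordinates for a surface that is being mapped
 * as single-sampled. px_w and px_h give the size of one pixel in samples.
 */
static struct rect_sketch
expand_rect_for_interleaved_sketch(struct rect_sketch r,
                                   uint32_t px_w, uint32_t px_h)
{
   struct rect_sketch out;

   out.x0 = round_down_pot_sketch(r.x0, 2) * px_w;
   out.y0 = round_down_pot_sketch(r.y0, 2) * px_h;
   out.x1 = round_up_pot_sketch(r.x1, 2) * px_w;
   out.y1 = round_up_pot_sketch(r.y1, 2) * px_h;
   return out;
}
```

With a 2x2 sample footprint, the pixel rectangle (3, 5)-(10, 9) grows to (2, 4)-(10, 10) and is then scaled to (4, 8)-(20, 20) in samples. Because the expanded rectangle covers more than the requested region, the shader has to discard the extra fragments, which is why use_kill is set above.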
|
|
|
|
|
|
2016-08-30 12:49:54 -07:00
|
|
|
if (params->dst.surf.tiling == ISL_TILING_W) {
|
2012-08-29 11:51:14 -07:00
|
|
|
/* We must modify the rectangle we send through the rendering pipeline
|
2012-08-30 08:01:54 -07:00
|
|
|
* (and the size and x/y offset of the destination surface), to account
|
|
|
|
|
* for the fact that we are mapping it as Y-tiled when it is in fact
|
|
|
|
|
* W-tiled.
|
2012-08-29 14:26:48 -07:00
|
|
|
*
|
|
|
|
|
       * Both Y tiling and W tiling can be understood as organizations of
       * 32-byte sub-tiles; within each 32-byte sub-tile, the layout of pixels
       * is different, but the layout of the 32-byte sub-tiles within the 4k
       * tile is the same (8 sub-tiles across by 16 sub-tiles down, in
       * column-major order). In Y tiling, the sub-tiles are 16 bytes wide
       * and 2 rows high; in W tiling, they are 8 bytes wide and 4 rows high.
       *
       * Therefore, to account for the layout differences within the 32-byte
       * sub-tiles, we must expand the rectangle so the X coordinates of its
       * edges are multiples of 8 (the W sub-tile width), and the Y
       * coordinates of its edges are multiples of 4 (the W sub-tile height).
       * Then we need to scale the X and Y coordinates of the rectangle to
       * account for the differences in aspect ratio between the Y and W
|
2012-08-30 08:01:54 -07:00
|
|
|
* sub-tiles. We need to modify the layer width and height similarly.
|
|
|
|
|
*
|
i965/blorp: Increase Y alignment for multisampled stencil blits.
This patch is a band-aid fix for a bug in commit 5fd67fa (i965/blorp:
Reduce alignment restrictions for stencil blits), which causes
multisampled stencil blits to work incorrectly on Sandy Bridge.
When blitting to or from a normal stencil buffer, we have to use a
coordinate transformation that swizzles coordinates to account for the
fact that stencil buffers use W tiling, but the most similar tiling
format available for textures and render targets is Y tiling. The
differences between W and Y tiling cause pixels to be scrambled within
a block of size 8x4 (width x height) as measured relative to a W tile,
or 16x2 as measured relative to a Y tile. So in order to make sure
that pixels at the edges of the blit aren't lost, we need to align the
rendering rectangle (and the buffer sizes) to multiples of the 8x4
block size. This alignment happens in the brw_blorp_blit_params
constructor, whereas the determination of how to swizzle the
coordinates happens during code generation, in the
brw_blorp_blit_program class.
When blitting to or from a multisampled stencil buffer, the coordinate
swizzling is more complex, because it has to account for the
interleaving pattern of samples, which uses 4x4 blocks for 4x MSAA and
8x4 blocks for 8x MSAA. The end result is that if multisampling is in
use, the 16x2 block size (relative to a Y tile) needs to be expanded
to 16x4, and the corresponding size relative to a W tile expands to
8x8.
The problem doesn't affect Ivy Bridge severely enough to crop up in
Piglit tests because on Ivy Bridge we have to disable multisampling
when blitting *to* a multisampled stencil buffer (the blorp compiler
generates code to compensate for the fact that multisampling is
disabled). However I suspect a bug is still present because we don't
disable multisampling when blitting *from* a multisampled stencil
buffer.
This patch fixes the problem by doubling the vertical alignment
requirement when blitting to or from a multisampled stencil buffer,
and multisampling has not been disabled.
In the long run I would like to rework the brw_blorp_blit_params
constructor--it's difficult to follow and has had several subtle bugs
like this one. However this band-aid fix should be suitable for
cherry-picking to release branches.
Fixes Piglit tests "unaligned-blit {2,4} stencil {msaa,upsample}" on
Sandy Bridge.
NOTE: This is a candidate for stable release branches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2012-09-12 11:13:49 -07:00
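The coordinate swizzle that the commit message above refers to can be sketched as a pair of bit manipulations. This is an illustration under assumptions, not the code generated by the blorp compiler: it assumes the low-order bit layouts given in the comment below for W and Y tiling of one-byte pixels, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit layouts, one letter per low-order coordinate bit:
 *
 *   W tiling: X = A << 6 | 0bBCDPFH    Y = J << 6 | 0bKLMNEG
 *   Y tiling: X = A << 7 | 0bBCDEFGH   Y = J << 5 | 0bKLMNP
 *
 * Both describe the same byte, so translating between them only moves
 * the bits E, G and P between the X and Y coordinates.
 */

/* W-tiled (x, y) -> Y-tiled (x', y') addressing the same byte. */
static void
retile_w_to_y_sketch(uint32_t x, uint32_t y, uint32_t *xo, uint32_t *yo)
{
   *xo = ((x & ~0x5u) << 1) | ((y & 0x2u) << 2) | ((y & 0x1u) << 1) |
         (x & 0x1u);
   *yo = ((y & ~0x3u) >> 1) | ((x & 0x4u) >> 2);
}

/* Y-tiled (x, y) -> W-tiled (x', y') addressing the same byte. */
static void
retile_y_to_w_sketch(uint32_t x, uint32_t y, uint32_t *xo, uint32_t *yo)
{
   *xo = ((x & ~0xbu) >> 1) | ((y & 0x1u) << 2) | (x & 0x1u);
   *yo = ((y & ~0x1u) << 1) | ((x & 0x8u) >> 2) | ((x & 0x2u) >> 1);
}
```

Under these assumptions the far corner (7, 3) of an 8x4 W block maps to (15, 1), the far corner of the matching 16x2 Y block. Pixels only move within such a block, which is consistent with aligning the rendering rectangle to the 8x4 block size as described above.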
|
|
|
       * A correction needs to be applied when MSAA is in use: since
       * INTEL_MSAA_LAYOUT_IMS uses an interleaving pattern whose height is 4,
       * we need to align the Y coordinates to multiples of 8, so that when
       * they are divided by two they are still multiples of 4.
|
|
|
|
|
*
|
2012-08-30 08:01:54 -07:00
|
|
|
       * Note: Since the x/y offset of the surface will be applied using the
       * SURFACE_STATE command packet, it will be invisible to the swizzling
       * code in the shader; therefore it needs to be in a multiple of the
       * 32-byte sub-tile size. Fortunately it is, since the sub-tile is 8
       * pixels wide and 4 pixels high (when viewed as a W-tiled stencil
       * buffer), and the miplevel alignment used for stencil buffers is 8
       * pixels horizontally and either 4 or 8 pixels vertically (see
       * intel_horizontal_texture_alignment_unit() and
       * intel_vertical_texture_alignment_unit()).
       *
       * Note: Also, since the SURFACE_STATE command packet can only apply
       * offsets that are multiples of 4 pixels horizontally and 2 pixels
       * vertically, it is important that the offsets will be multiples of
       * these sizes after they are converted into Y-tiled coordinates.
       * Fortunately they will be, since we know from above that the offsets
       * are a multiple of the 32-byte sub-tile size, and in Y-tiled
       * coordinates the sub-tile is 16 pixels wide and 2 pixels high.
|
2012-05-09 08:29:33 -07:00
|
|
|
*
|
2012-08-29 11:51:14 -07:00
|
|
|
* TODO: what if this makes the coordinates (or the texture size) too
|
|
|
|
|
* large?
|
2012-04-29 22:44:25 -07:00
       */
2016-08-30 12:49:54 -07:00
      const unsigned x_align = 8;
      const unsigned y_align = params->dst.surf.samples != 0 ? 8 : 4;
      params->x0 = ROUND_DOWN_TO(params->x0, x_align) * 2;
      params->y0 = ROUND_DOWN_TO(params->y0, y_align) / 2;
      params->x1 = ALIGN(params->x1, x_align) * 2;
      params->y1 = ALIGN(params->y1, y_align) / 2;
2016-06-27 11:54:14 -07:00

      /* Retile the surface to Y-tiled */
2016-08-30 12:49:54 -07:00
      surf_retile_w_to_y(batch->blorp->isl_dev, &params->dst);
2016-06-27 11:54:14 -07:00

2016-08-30 12:49:54 -07:00
      wm_prog_key->dst_tiled_w = true;
      wm_prog_key->use_kill = true;
      wm_prog_key->need_dst_offset = true;
2016-06-27 11:54:14 -07:00

2016-08-30 12:49:54 -07:00
      if (params->dst.surf.samples > 1) {
2016-06-27 11:54:14 -07:00
         /* If the destination surface is a W-tiled multisampled stencil
          * buffer that we're mapping as Y tiled, then we need to arrange for
          * the WM program to run once per sample rather than once per pixel,
          * because the memory layout of related samples doesn't match between
          * W and Y tiling.
          */
2016-08-30 12:49:54 -07:00
         wm_prog_key->persample_msaa_dispatch = true;
2016-06-27 11:54:14 -07:00
      }
2012-04-29 22:44:25 -07:00
   }
2012-08-29 11:51:14 -07:00

2016-08-30 12:49:54 -07:00
   if (devinfo->gen < 8 && params->src.surf.tiling == ISL_TILING_W) {
2016-06-22 16:46:20 -07:00
      /* On Haswell and earlier, we have to fake W-tiled sources as Y-tiled.
       * Broadwell adds support for sampling from stencil.
2012-08-30 08:01:54 -07:00
       *
       * See the comments above concerning x/y offset alignment for the
       * destination surface.
2012-08-29 11:51:14 -07:00
       *
       * TODO: what if this makes the texture size too large?
       */
2016-08-30 12:49:54 -07:00
      surf_retile_w_to_y(batch->blorp->isl_dev, &params->src);
2016-06-27 11:54:14 -07:00

2016-08-30 12:49:54 -07:00
      wm_prog_key->src_tiled_w = true;
      wm_prog_key->need_src_offset = true;
2012-08-29 11:51:14 -07:00
   }
2012-04-29 22:44:25 -07:00

2016-06-23 17:06:37 -07:00
   /* tex_samples and rt_samples are the sample counts that are set up in
    * SURFACE_STATE.
    */
2016-08-30 12:49:54 -07:00
   wm_prog_key->tex_samples = params->src.surf.samples;
   wm_prog_key->rt_samples = params->dst.surf.samples;
2016-06-23 17:06:37 -07:00

   /* tex_layout and rt_layout indicate the MSAA layout the GPU pipeline will
    * use to access the source and destination surfaces.
    */
2016-08-30 12:49:54 -07:00
   wm_prog_key->tex_layout = params->src.surf.msaa_layout;
   wm_prog_key->rt_layout = params->dst.surf.msaa_layout;
2016-06-23 17:06:37 -07:00

2016-08-30 12:49:54 -07:00
   if (params->src.surf.samples > 0 && params->dst.surf.samples > 1) {
2016-06-23 17:06:37 -07:00
      /* We are blitting from a multisample buffer to a multisample buffer, so
       * we must preserve samples within a pixel. This means we have to
       * arrange for the WM program to run once per sample rather than once
       * per pixel.
       */
2016-08-30 12:49:54 -07:00
      wm_prog_key->persample_msaa_dispatch = true;
2016-06-23 17:06:37 -07:00
   }

2016-08-30 12:49:54 -07:00
   if (params->src.tile_x_sa || params->src.tile_y_sa) {
      assert(wm_prog_key->need_src_offset);
      surf_get_intratile_offset_px(&params->src,
                                   &params->wm_inputs.src_offset.x,
                                   &params->wm_inputs.src_offset.y);
2016-08-30 11:18:39 -07:00
   }

2016-08-30 12:49:54 -07:00
   if (params->dst.tile_x_sa || params->dst.tile_y_sa) {
      assert(wm_prog_key->need_dst_offset);
      surf_get_intratile_offset_px(&params->dst,
                                   &params->wm_inputs.dst_offset.x,
                                   &params->wm_inputs.dst_offset.y);
      params->x0 += params->wm_inputs.dst_offset.x;
      params->y0 += params->wm_inputs.dst_offset.y;
      params->x1 += params->wm_inputs.dst_offset.x;
      params->y1 += params->wm_inputs.dst_offset.y;
2016-08-30 11:18:39 -07:00
   }

2016-08-29 09:48:10 -07:00
   /* For some texture types, we need to pass the layer through the sampler. */
2016-08-30 12:49:54 -07:00
   params->wm_inputs.src_z = params->src.z_offset;

   brw_blorp_get_blit_kernel(batch->blorp, params, wm_prog_key);
2016-08-29 09:48:10 -07:00

2016-08-30 12:49:54 -07:00
   batch->blorp->exec(batch, params);
}
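
The rectangle adjustment that do_blorp_blit() applies when a W-tiled destination is mapped as Y-tiled can be sketched in isolation. This is a minimal illustration, not blorp code: `round_down_to`, `align_up`, `struct rect`, and `w_rect_to_y_rect` are hypothetical names standing in for ROUND_DOWN_TO(), ALIGN(), and the x0/y0/x1/y1 fields of the blit parameters, and the `multisampled` flag is an assumed simplification of the sample-count test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the ROUND_DOWN_TO() and ALIGN() macros.
 * Both assume that `a` is a power of two. */
static uint32_t round_down_to(uint32_t v, uint32_t a)
{
   assert(a != 0 && (a & (a - 1)) == 0);
   return v & ~(a - 1);
}

static uint32_t align_up(uint32_t v, uint32_t a)
{
   assert(a != 0 && (a & (a - 1)) == 0);
   return (v + a - 1) & ~(a - 1);
}

struct rect { uint32_t x0, y0, x1, y1; };

/* Widen a rectangle given in W-tiled coordinates to the alignment
 * boundary, then express it in the coordinates of the same memory viewed
 * as Y-tiled: twice as wide and half as tall. */
static struct rect w_rect_to_y_rect(struct rect r, bool multisampled)
{
   const uint32_t x_align = 8;
   const uint32_t y_align = multisampled ? 8 : 4;
   struct rect out;

   out.x0 = round_down_to(r.x0, x_align) * 2;
   out.y0 = round_down_to(r.y0, y_align) / 2;
   out.x1 = align_up(r.x1, x_align) * 2;
   out.y1 = align_up(r.y1, y_align) / 2;
   return out;
}
```

Because the widened rectangle covers more pixels than were requested, the shader still has to discard pixels outside the original destination rectangle, which is what use_kill enables.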

void
blorp_blit(struct blorp_batch *batch,
           const struct blorp_surf *src_surf,
           unsigned src_level, unsigned src_layer,
           enum isl_format src_format, struct isl_swizzle src_swizzle,
           const struct blorp_surf *dst_surf,
           unsigned dst_level, unsigned dst_layer,
           enum isl_format dst_format, struct isl_swizzle dst_swizzle,
           float src_x0, float src_y0,
           float src_x1, float src_y1,
           float dst_x0, float dst_y0,
           float dst_x1, float dst_y1,
           GLenum filter, bool mirror_x, bool mirror_y)
{
   struct blorp_params params;
   blorp_params_init(&params);

   brw_blorp_surface_info_init(batch->blorp, &params.src, src_surf, src_level,
                               src_layer, src_format, false);
   brw_blorp_surface_info_init(batch->blorp, &params.dst, dst_surf, dst_level,
                               dst_layer, dst_format, true);
2016-04-22 14:06:08 -07:00

2016-08-27 21:48:40 -07:00
   params.src.view.swizzle = src_swizzle;
2016-08-27 21:57:51 -07:00
   params.dst.view.swizzle = dst_swizzle;
2016-04-22 14:06:08 -07:00

2016-08-30 12:49:54 -07:00
   struct brw_blorp_blit_prog_key wm_prog_key;
   memset(&wm_prog_key, 0, sizeof(wm_prog_key));

   /* Scaled blitting or not. */
   wm_prog_key.blit_scaled =
      ((dst_x1 - dst_x0) == (src_x1 - src_x0) &&
       (dst_y1 - dst_y0) == (src_y1 - src_y0)) ? false : true;

   /* Scaling factors used for bilinear filtering in multisample scaled
    * blits.
    */
   if (params.src.surf.samples == 16)
      wm_prog_key.x_scale = 4.0f;
   else
      wm_prog_key.x_scale = 2.0f;
   wm_prog_key.y_scale = params.src.surf.samples / wm_prog_key.x_scale;

   if (filter == GL_LINEAR &&
       params.src.surf.samples <= 1 && params.dst.surf.samples <= 1)
      wm_prog_key.bilinear_filter = true;

   if ((params.src.surf.usage & ISL_SURF_USAGE_DEPTH_BIT) == 0 &&
       (params.src.surf.usage & ISL_SURF_USAGE_STENCIL_BIT) == 0 &&
       !isl_format_has_int_channel(params.src.surf.format) &&
       params.src.surf.samples > 1 && params.dst.surf.samples <= 1) {
      /* We are downsampling a non-integer color buffer, so blend.
       *
       * Regarding integer color buffers, the OpenGL ES 3.2 spec says:
       *
       *    "If the source formats are integer types or stencil values, a
       *    single sample's value is selected for each pixel."
       *
       * This implies we should not blend in that case.
       */
      wm_prog_key.blend = true;
   }

   params.wm_inputs.rect_grid.x1 =
      minify(params.src.surf.logical_level0_px.width, src_level) *
      wm_prog_key.x_scale - 1.0f;
   params.wm_inputs.rect_grid.y1 =
      minify(params.src.surf.logical_level0_px.height, src_level) *
      wm_prog_key.y_scale - 1.0f;

   do_blorp_blit(batch, &params, &wm_prog_key,
                 src_x0, src_y0, src_x1, src_y1,
                 dst_x0, dst_y0, dst_x1, dst_y1,
                 mirror_x, mirror_y);
2012-04-29 22:44:25 -07:00
}
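
The scale factors and sample-grid bounds computed in blorp_blit() can be worked through with concrete sample counts. The sketch below is an illustration under stated assumptions: `msaa_scale_for_samples`, `minify_sketch`, and `rect_grid_max` are hypothetical helpers, and `minify_sketch` assumes that minify() halves a dimension once per mip level and clamps the result to 1.

```c
#include <assert.h>
#include <stdint.h>

struct msaa_scale { float x, y; };

/* Mirror of the x_scale / y_scale selection: 16x MSAA uses a 4x4 sample
 * grid, everything else uses 2 columns and samples / 2 rows. */
static struct msaa_scale msaa_scale_for_samples(unsigned samples)
{
   struct msaa_scale s;

   assert(samples >= 1);
   s.x = (samples == 16) ? 4.0f : 2.0f;
   s.y = samples / s.x;
   return s;
}

/* Assumed behaviour of minify(): shift per mip level, never below 1. */
static uint32_t minify_sketch(uint32_t size, unsigned level)
{
   uint32_t m = size >> level;
   return m > 0 ? m : 1;
}

/* Largest coordinate of the sample grid along one axis, as stored in
 * rect_grid.x1 / rect_grid.y1. */
static float rect_grid_max(uint32_t level0_size, unsigned level, float scale)
{
   return minify_sketch(level0_size, level) * scale - 1.0f;
}
```

For a 4x source this yields a 2x2 grid per pixel and for an 8x source a 2x4 grid. For a single-sampled source the computed y scale is 0.5, which is only harmless because the factors are used for multisample scaled blits.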
2016-08-30 13:13:43 -07:00

static enum isl_format
get_copy_format_for_bpb(unsigned bpb)
{
   /* The choice of UNORM and UINT formats is very intentional here. Most of
    * the time, we want to use a UINT format to avoid any rounding error in
    * the blit. For stencil blits, R8_UINT is required by the hardware.
    * (It's the only format allowed in conjunction with W-tiling.) Also we
    * intentionally use the 4-channel formats whenever we can. This is so
    * that, when we do a RGB <-> RGBX copy, the two formats will line up even
    * though one of them is 3/4 the size of the other. The choice of UNORM
    * vs. UINT is also very intentional because Haswell doesn't handle 8 or
    * 16-bit RGB UINT formats at all so we have to use UNORM there.
    * Fortunately, the only time we should ever use two different formats in
    * the table below is for RGB -> RGBA blits and so we will never have any
    * UNORM/UINT mismatch.
    */
   switch (bpb) {
   case 8:   return ISL_FORMAT_R8_UINT;
   case 16:  return ISL_FORMAT_R8G8_UINT;
   case 24:  return ISL_FORMAT_R8G8B8_UNORM;
   case 32:  return ISL_FORMAT_R8G8B8A8_UNORM;
   case 48:  return ISL_FORMAT_R16G16B16_UNORM;
   case 64:  return ISL_FORMAT_R16G16B16A16_UNORM;
   case 96:  return ISL_FORMAT_R32G32B32_UINT;
   case 128: return ISL_FORMAT_R32G32B32A32_UINT;
   default:
      unreachable("Unknown format bpb");
   }
}

static void
surf_convert_to_uncompressed(const struct isl_device *isl_dev,
                             struct brw_blorp_surface_info *info,
                             uint32_t *x, uint32_t *y,
                             uint32_t *width, uint32_t *height)
{
   const struct isl_format_layout *fmtl =
      isl_format_get_layout(info->surf.format);

   assert(fmtl->bw > 1 || fmtl->bh > 1);

   /* This is a compressed surface. We need to convert it to a single
    * slice (because compressed layouts don't perfectly match uncompressed
    * ones with the same bpb) and divide x, y, width, and height by the
    * block size.
    */
   surf_convert_to_single_slice(isl_dev, info);

   if (width || height) {
2016-10-25 22:47:21 -07:00
#ifndef NDEBUG
      uint32_t right_edge_px = info->tile_x_sa + *x + *width;
      uint32_t bottom_edge_px = info->tile_y_sa + *y + *height;
2016-08-30 13:13:43 -07:00
      assert(*width % fmtl->bw == 0 ||
2016-10-25 22:47:21 -07:00
             right_edge_px == info->surf.logical_level0_px.width);
2016-08-30 13:13:43 -07:00
      assert(*height % fmtl->bh == 0 ||
2016-10-25 22:47:21 -07:00
             bottom_edge_px == info->surf.logical_level0_px.height);
#endif
2016-08-30 13:13:43 -07:00
      *width = DIV_ROUND_UP(*width, fmtl->bw);
      *height = DIV_ROUND_UP(*height, fmtl->bh);
   }

   assert(*x % fmtl->bw == 0);
   assert(*y % fmtl->bh == 0);
   *x /= fmtl->bw;
   *y /= fmtl->bh;

   info->surf.logical_level0_px.width =
      DIV_ROUND_UP(info->surf.logical_level0_px.width, fmtl->bw);
   info->surf.logical_level0_px.height =
      DIV_ROUND_UP(info->surf.logical_level0_px.height, fmtl->bh);

   assert(info->surf.phys_level0_sa.width % fmtl->bw == 0);
   assert(info->surf.phys_level0_sa.height % fmtl->bh == 0);
   info->surf.phys_level0_sa.width /= fmtl->bw;
   info->surf.phys_level0_sa.height /= fmtl->bh;

   assert(info->tile_x_sa % fmtl->bw == 0);
   assert(info->tile_y_sa % fmtl->bh == 0);
   info->tile_x_sa /= fmtl->bw;
   info->tile_y_sa /= fmtl->bh;

   /* It's now an uncompressed surface so we need an uncompressed format */
   info->surf.format = get_copy_format_for_bpb(fmtl->bpb);
}
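
The pixel-to-block conversion in surf_convert_to_uncompressed() comes down to one exact division for the origin and one rounding-up division for the extent. The sketch below shows that arithmetic on its own; `struct region`, `div_round_up`, and `px_region_to_blocks` are hypothetical names, and the 4x4 block size used in the usage note is only an example of a compressed format's block dimensions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the DIV_ROUND_UP() macro. */
static uint32_t div_round_up(uint32_t n, uint32_t d)
{
   assert(d != 0);
   return (n + d - 1) / d;
}

struct region { uint32_t x, y, width, height; };

/* Convert a copy region from pixels to compression blocks. The origin
 * must sit on a block boundary, so it divides exactly; the extent is
 * rounded up so that a partial block at the edge is still copied. */
static struct region px_region_to_blocks(struct region px,
                                         uint32_t bw, uint32_t bh)
{
   struct region blk;

   assert(px.x % bw == 0);
   assert(px.y % bh == 0);
   blk.x = px.x / bw;
   blk.y = px.y / bh;
   blk.width = div_round_up(px.width, bw);
   blk.height = div_round_up(px.height, bh);
   return blk;
}
```

With 4x4 blocks, a region at pixel (8, 4) that is 10 pixels wide and 6 pixels high becomes block (2, 1) with an extent of 3 by 2 blocks.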

2016-08-27 12:07:31 -07:00
static void
surf_fake_rgb_with_red(const struct isl_device *isl_dev,
                       struct brw_blorp_surface_info *info,
                       uint32_t *x, uint32_t *width)
{
   surf_convert_to_single_slice(isl_dev, info);

   info->surf.logical_level0_px.width *= 3;
   info->surf.phys_level0_sa.width *= 3;
   *x *= 3;
   *width *= 3;

   enum isl_format red_format;
   switch (info->view.format) {
   case ISL_FORMAT_R8G8B8_UNORM:
      red_format = ISL_FORMAT_R8_UNORM;
      break;
   case ISL_FORMAT_R16G16B16_UNORM:
      red_format = ISL_FORMAT_R16_UNORM;
      break;
   case ISL_FORMAT_R32G32B32_UINT:
      red_format = ISL_FORMAT_R32_UINT;
      break;
   default:
      unreachable("Invalid RGB copy destination format");
   }
   assert(isl_format_get_layout(red_format)->channels.r.type ==
          isl_format_get_layout(info->view.format)->channels.r.type);
   assert(isl_format_get_layout(red_format)->channels.r.bits ==
          isl_format_get_layout(info->view.format)->channels.r.bits);

   info->surf.format = info->view.format = red_format;
}
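
surf_fake_rgb_with_red() treats each three-channel texel as three consecutive single-channel texels, which is why the width and the X offset are multiplied by 3. A minimal sketch of that mapping follows; `struct span`, `rgb_span_to_red`, and `red_channel_bits` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

struct span { uint32_t x, width; };

/* Each RGB texel becomes three red-only texels laid out side by side,
 * so horizontal positions and extents scale by 3. */
static struct span rgb_span_to_red(struct span rgb)
{
   struct span red;

   red.x = rgb.x * 3;
   red.width = rgb.width * 3;
   return red;
}

/* The red-only format keeps the per-channel size of the RGB format:
 * 24 bpb -> 8 bits, 48 bpb -> 16 bits, 96 bpb -> 32 bits. */
static unsigned red_channel_bits(unsigned rgb_bpb)
{
   assert(rgb_bpb % 3 == 0);
   return rgb_bpb / 3;
}
```

A 10-texel-wide RGB span starting at texel 5 therefore becomes a 30-texel-wide red span starting at texel 15, with Y coordinates unchanged.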

2016-08-30 13:13:43 -07:00
void
blorp_copy(struct blorp_batch *batch,
           const struct blorp_surf *src_surf,
           unsigned src_level, unsigned src_layer,
           const struct blorp_surf *dst_surf,
           unsigned dst_level, unsigned dst_layer,
           uint32_t src_x, uint32_t src_y,
           uint32_t dst_x, uint32_t dst_y,
           uint32_t src_width, uint32_t src_height)
{
   struct blorp_params params;

2016-09-26 10:17:49 -07:00
   if (src_width == 0 || src_height == 0)
      return;

   blorp_params_init(&params);
2016-08-30 13:13:43 -07:00
   brw_blorp_surface_info_init(batch->blorp, &params.src, src_surf, src_level,
                               src_layer, ISL_FORMAT_UNSUPPORTED, false);
   brw_blorp_surface_info_init(batch->blorp, &params.dst, dst_surf, dst_level,
                               dst_layer, ISL_FORMAT_UNSUPPORTED, true);

   struct brw_blorp_blit_prog_key wm_prog_key;
   memset(&wm_prog_key, 0, sizeof(wm_prog_key));

   const struct isl_format_layout *src_fmtl =
      isl_format_get_layout(params.src.surf.format);
   const struct isl_format_layout *dst_fmtl =
      isl_format_get_layout(params.dst.surf.format);

   params.src.view.format = get_copy_format_for_bpb(src_fmtl->bpb);
   if (src_fmtl->bw > 1 || src_fmtl->bh > 1) {
      surf_convert_to_uncompressed(batch->blorp->isl_dev, &params.src,
                                   &src_x, &src_y, &src_width, &src_height);
      wm_prog_key.need_src_offset = true;
   }

   params.dst.view.format = get_copy_format_for_bpb(dst_fmtl->bpb);
   if (dst_fmtl->bw > 1 || dst_fmtl->bh > 1) {
      surf_convert_to_uncompressed(batch->blorp->isl_dev, &params.dst,
                                   &dst_x, &dst_y, NULL, NULL);
      wm_prog_key.need_dst_offset = true;
   }

   /* Once both surfaces are stomped to uncompressed as needed, the
    * destination size is the same as the source size.
    */
   uint32_t dst_width = src_width;
   uint32_t dst_height = src_height;

2016-08-27 12:07:31 -07:00
   if (dst_fmtl->bpb % 3 == 0) {
      surf_fake_rgb_with_red(batch->blorp->isl_dev, &params.dst,
                             &dst_x, &dst_width);
      wm_prog_key.dst_rgb = true;
      wm_prog_key.need_dst_offset = true;
   }

2016-08-30 13:13:43 -07:00
   do_blorp_blit(batch, &params, &wm_prog_key,
                 src_x, src_y, src_x + src_width, src_y + src_height,
                 dst_x, dst_y, dst_x + dst_width, dst_y + dst_height,
                 false, false);
}