This moves the decision out of begin_subpass and into BeginRenderPass
like the decision for color clears. We use a similar name for the
function for depth/stencil as for color even though no aux usage is
really getting computed.
v2 (Jason Ekstrand):
- Don't always disable HiZ clears by accident
- Use the initial layout to decide whether to do fast clears
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
This is similar to blorp_gen8_hiz_clear_attachments except that it takes
actual images instead of trusting in the already set depth state.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
This doesn't really change much now but it will give us more/better
control over clears in the future. The one interesting functional
change here is that we are now re-emitting 3DSTATE_DEPTH_BUFFERS and
friends for each clear. However, this only happens at begin_subpass
time so it shouldn't be substantially more expensive.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
This is a bit less awkward than passing in the subpass because it means
we don't have to extract the subpass id from the subpass.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
This seems slightly more correct because it means that the flushes
happen before any clears or resolves implied by the subpass transition.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Previously, we just used all the channels regardless of the format.
This is less than ideal because some channels may have undefined values
and this should be ok from the client's perspective. Even though the
driver should do the correct thing regardless of what is in the
undefined value, it makes things less deterministic. In particular, the
driver may choose to fast-clear or not based on undefined values. This
level of nondeterminism is bad.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
This fixes a pile of hangs caused by the recent shuffling of resolves
and transitions. The particularly problematic case is when you have at
least three attachments with load ops of CLEAR, LOAD, CLEAR. In this
case, we execute the first CLEAR followed by a MI memcpy to copy the
clear values over for the LOAD followed by a second CLEAR. The MI
commands cause the first CLEAR to hang which causes us to get stuck on
the 3DSTATE_MULTISAMPLE in the second CLEAR.
We also add guards for BLORP to fix the same issue. These shouldn't
actually do anything right now because the only use of indirect clears
in BLORP today is for resolves which are already guarded by a render
cache flush and CS stall. However, this will guard us against potential
issues in the future.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Nanley Chery <nanley.g.chery@intel.com>
The PIPE_CONTROL command description says:
"Whenever a Binding Table Index (BTI) used by a Render Taget Message
points to a different RENDER_SURFACE_STATE, SW must issue a Render
Target Cache Flush by enabling this bit. When render target flush
is set due to new association of BTI, PS Scoreboard Stall bit must
be set in this packet."
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Sampling from hiz is enabled in i965 for GEN9+ but this feature has
been removed from gen11. So, this new flag will be useful to turn
the feature on/off for different gen h/w. It will be used later
in a patch adding device info for gen11.
Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
SIMD4x2 dispatch mode has been removed in GEN11. We're not using
it anyways in Mesa. Adding few asserts to make it explicit.
Use GEN_GEN macro in place of devinfo->gen (Ken)
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
StateCacheInvalidation is required on all gen7+ platforms. We
don't need to update this check for every new gen h/w unless
this requirement is changed. So, dropping the check for latest
gen h/w.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This reverts commit 10d1b0be8e.
This is unnecessary, the depend_files argument is for adding
dependencies on files that are not part of the input, which is already
done.
cc: Jason Ekstrand <jason.ekstrand@intel.com>
Fixes: 10d1b0be8e
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
From Skylake PRM Surface Formats section:
"The surface format for the typed atomic integer operations must
be R32_UINT or R32_SINT."
Fixes an error and a piglit GPU hang in simulation environment.
Piglit test: gl45-imageAtomicExchange-float.shader_test
Suggested-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.co
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "18.0 17.3" <mesa-stable@lists.freedesktop.org>
This code to re-set the type of the source and destination is not
necessary since we never manipulate the types. Looks like a
left over from a time where we had to retype to float temporarily
to handle 64-bit inputs.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Divide it by two as we do for other stages. This is because the
component layout qualifier is always in 32-bit units.
Fixes issues in a new CTS test (still WIP):
KHR-GL45.enhanced_layouts.varying_double_components
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
We were setting current_pipeline to UINT32_MAX and then calling
cmd_cmd_state_reset which memsets the entire state struct to 0 which
implicitly resets current_pipeline to 3D. I have no idea how this
hasn't caused everything to explode.
Fixes: cd3feea745 "anv/cmd_buffer: Rework anv_cmd_state_reset"
cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The previous code was trying to avoid non-existent layers by taking a
MAX with anv_image_aux_layers. Unfortunately, it wasn't taking into
account that layer_count starts at base_layer which may not be zero.
Instead, we need to subtract base_layer from anv_image_aux_layers with
a guard against roll-over.
Fixes: de3be61801 "anv/cmd_buffer: Rework aux tracking"
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
The kernel used to have execbuf parameters to program the INSTPM bit
for whether 3DSTATE_CONSTANT_* should be relative to dynamic state
base address or an absolute address. However, they never worked in
the presence of hardware contexts, so I deleted them a while back.
It doesn't make sense to set this flag, as it doesn't exist anymore.
It also never did anything anyway - the flag is zero, so |'ing it in
did nothing. The default is relative anyway.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Fix the following:
warning: format ‘%lx’ expects argument of type ‘long unsigned int’, but
argument 3 has type ‘uint64_t {aka long long unsigned int}.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Now that we're tracking aux properly per-slice, we can enable this for
applications which actually care.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
This commit completely reworks aux tracking. This includes a number of
somewhat distinct changes:
1) Since we are no longer fast-clearing multiple slices, we only need
to track one fast clear color and one fast clear type.
2) We store two bits for fast clear instead of one to let us
distinguish between zero and non-zero fast clear colors. This is
needed so that we can do full resolves when transitioning to
PRESENT_SRC_KHR with gen9 CCS images where we allow zero clear
values in all sorts of places we wouldn't normally.
3) We now track compression state as a boolean separate from fast clear
type and this is tracked on a per-slice granularity.
The previous scheme had some issues when it came to individual slices of
a multi-LOD images. In particular, we only tracked "needs resolve"
per-LOD but you could do a vkCmdPipelineBarrier that would only resolve
a portion of the image and would set "needs resolve" to false anyway.
Also, any transition from an undefined layout would reset the clear
color for the entire LOD regardless of whether or not there was some
clear color on some other slice.
As far as full/partial resolves go, he assumptions of the previous
scheme held because the one case where we do need a full resolve when
CCS_E is enabled is for window-system images. Since we only ever
allowed X-tiled window-system images, CCS was entirely disabled on gen9+
and we never got CCS_E. With the advent of Y-tiled window-system
buffers, we now need to properly support doing a full resolve of images
marked CCS_E.
v2 (Jason Ekstrand):
- Fix an bug in the compressed flag offset calculation
- Treat 3D images as multi-slice for the purposes of resolve tracking
v3 (Jason Ekstrand):
- Set the compressed flag whenever we fast-clear
- Simplify the resolve predicate computation logic
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>