This commit renames add_surface_state_reloc to add_surface_reloc and
makes it takes an address. We also rename add_image_view_relocs to
add_surface_state_relocs because it takes an anv_surface_state and
doesn't really care about the image view anymore.
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
Instead of storing a BO and offset separately, use an anv_address. This
changes anv_fill_buffer_surface_state to use anv_address and we now call
anv_address_physical and pass that into ISL.
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
This refactors surface state filling to work entirely in terms of
anv_addresses instead of offsets. This should make things simpler for
when we go to soft-pin image buffers. Among other things,
add_image_view_relocs now only cares about the addresses in the surface
state and doesn't really need the image view anymore.
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
From the bspec docs for "Indirect State Pointers Disable":
"At the completion of the post-sync operation associated with this
pipe control packet, the indirect state pointers in the hardware are
considered invalid"
So the ISP disable is a post-sync type of operation which means that it
should be combined with a CS stall. Without this, the simulator throws
an error.
Fixes: 766d801ca "anv: emit pixel scoreboard stall before ISP disable"
Fixes: f536097f6 "i965: require pixel scoreboard stall prior to ISP disable"
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We want to make sure that all indirect state data has been loaded into
the EUs before disable the pointers.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Fixes: 78c125af39 ("anv/gen10: Ignore push constant packets during context restore.")
base_vertex will be zero for non-indexed calls and in that case we
need vertex_id to be offset by the ‘first’ parameter instead. That is
what we get with first_vertex. This is true for both GL and Vulkan.
The freedreno driver is also setting vertex_id_zero_based on
nir_options. In order to avoid breakage this patch switches the
relevant code to handle SYSTEM_VALUE_FIRST_VERTEX so that it can
retain the same behavior.
v2: change a3xx/fd3_emit.c and a4xx/fd4_emit.c from
SYSTEM_VALUE_BASE_VERTEX to SYSTEM_VALUE_FIRST_VERTEX (Kenneth).
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: Rob Clark <robdclark@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
The base vertex in Vulkan is different from GL in that for non-indexed
primitives the value is taken from the firstVertex parameter instead
of being set to zero. This coincides with the new SYSTEM_VALUE_FIRST_VERTEX
instead of BASE_VERTEX.
v2 (idr): Add comment describing why SYSTEM_VALUE_FIRST_VERTEX is used
for SpvBuiltInBaseVertex. Suggested by Jason.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> [v1]
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We're not counting correctly with depth & stencil images.
Additionally we need to move an assert that is meant just for color
attachments.
v2: Move an assert() (Reported by Craig)
Change aspect mask checks (Francesco)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: a62a979335 ("anv: enable multiple planes per image/imageView")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105994
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Instead of updating the clear color in anv before a resolve, just let
blorp handle that for us during fast clears.
v5: Update comment about HiZ clear color (Jordan).
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
On Gen10+, instead of copying the clear color from the state buffer to
the surface state, just use the address of the state buffer in the
surface state directly. This way we can avoid the copy from state buffer
to surface state.
v4:
- Remove use_clear_address from anv code. (Jason)
- Use the helper to extract clear color from attachment (Jason)
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Extract the code from color_attachment_compute_aux_usage, so we can
later reuse it to update the clear color state buffer.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The size of the clear color struct (expected by the hardware) is 8
dwords (isl_dev.ss.clear_value_state_size here). But we still need to
track the size of the clear color, used when memcopying it to/from the
state buffer. For that we keep isl_dev.ss.clear_value_size.
v4:
- Add struct to gen11 too (Jason, Jordan)
- Add field for Converted Clear Color to gen11 (Jason)
- Add clear_color_state_offset to differentiate from
clear_value_offset.
- Fix all the places where clear_value_size was used.
v5 (Jason):
- Split genxml changes to another commit.
- Remove unnecessary gen checks.
- Bring back missing offset increment to init_fast_clear_color().
v6 (Jason):
- On init_fast_clear_color, change:
addr.offset += 4 => sdi.Address.offset += i * 4
- Use GEN_GEN instead of GEN_VERSIONx10.
[jordan.l.justen@intel.com: isl_device_init changes]
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v2: rebased on top of subpass rework.
v3: rebased
v4:
- rebased
- reset pending clear views in one go rather one bit at a time (Caio)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
When multiview is active a subpass clear may only clear a subset of the
attachment layers. Other subpasses in the same render pass may also
clear too and we want to honor those clears as well, however, we need to
ensure that we only clear a layer once, on the first subpass that uses
a particular layer (view) of a given attachment.
This means that when we check if a subpass attachment needs to be cleared
we need to check if all the layers used by that subpass (as indicated by
its view_mask) have already been cleared in previous subpasses or not, in
which case, we must clear any pending layers used by the subpass, and only
those pending.
v2:
- track pending clear views in the attachment state (Jason)
- rebased on top of fast-clear rework.
v3:
- rebased on top of subpass rework.
v4: rebased.
v5 (Caio):
- Rebased.
- Initialize pending clear views to only have bits set for layers
that exist.
- Reset pending clear views in one go rather one bit at a time.
- Put "last subpass for this attachment" condition in a separate
function to simplify the conditional that resets pending_clear_aspects.
Fixes:
dEQP-VK.multiview.readback_implicit_clear.*
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This is part of the device groups extension/feature but it's a decent
chunk of work in its own right so it's worth breaking into its own
patch. The mechanism we use is fairly straightforward: we just push the
base work group id into the shader and add it to the work group id we
get from dispatch.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
We'll want to re-use the complex resolve predicate computations for MCS
resolves so it's nice to have them as helper functions.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
This doesn't actually do anything because att_state->fast_clear is
determined based on the return value of anv_layout_to_fast_clear_type
which currently returns NONE for multisampled images.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
It's true for depth HiZ clears because we only have HiZ on single-slice
images right now. However, for stencil-only clears there is no such
restriction.
Tested-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
If we don't have HiZ, then anv_layout_to_aux_usage will return NONE for
both layouts. If the two layouts are the same, they will get the aux
usage. In either case, the code below will give us ISL_AUX_OP_NONE and
we'll return without doing anything.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
This is quite a bit cleaner because we now sync the clear values at the
same time as we do the fast clear. For loading the clear values into
the surface state, we now do it once when we handle the LOAD_OP_LOAD
instead of every subpass.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
This requires us to ditch the VkAttachmentReference struct in favor of
an anv-specific struct. However, we can now easily identify from just
the subpass attachment what kind of an attachment it is. This will make
iteration over anv_subpass::attachments a little easier in some case.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
This moves the decision out of begin_subpass and into BeginRenderPass
like the decision for color clears. We use a similar name for the
function for depth/stencil as for color even though no aux usage is
really getting computed.
v2 (Jason Ekstrand):
- Don't always disable HiZ clears by accident
- Use the initial layout to decide whether to do fast clears
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
This doesn't really change much now but it will give us more/better
control over clears in the future. The one interesting functional
change here is that we are now re-emitting 3DSTATE_DEPTH_BUFFERS and
friends for each clear. However, this only happens at begin_subpass
time so it shouldn't be substantially more expensive.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
This is a bit less awkward than passing in the subpass because it means
we don't have to extract the subpass id from the subpass.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
This seems slightly more correct because it means that the flushes
happen before any clears or resolves implied by the subpass transition.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Previously, we just used all the channels regardless of the format.
This is less than ideal because some channels may have undefined values
and this should be ok from the client's perspective. Even though the
driver should do the correct thing regardless of what is in the
undefined value, it makes things less deterministic. In particular, the
driver may choose to fast-clear or not based on undefined values. This
level of nondeterminism is bad.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
The PIPE_CONTROL command description says:
"Whenever a Binding Table Index (BTI) used by a Render Taget Message
points to a different RENDER_SURFACE_STATE, SW must issue a Render
Target Cache Flush by enabling this bit. When render target flush
is set due to new association of BTI, PS Scoreboard Stall bit must
be set in this packet."
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The previous code was trying to avoid non-existent layers by taking a
MAX with anv_image_aux_layers. Unfortunately, it wasn't taking into
account that layer_count starts at base_layer which may not be zero.
Instead, we need to subtract base_layer from anv_image_aux_layers with
a guard against roll-over.
Fixes: de3be61801 "anv/cmd_buffer: Rework aux tracking"
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Now that we're tracking aux properly per-slice, we can enable this for
applications which actually care.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
This commit completely reworks aux tracking. This includes a number of
somewhat distinct changes:
1) Since we are no longer fast-clearing multiple slices, we only need
to track one fast clear color and one fast clear type.
2) We store two bits for fast clear instead of one to let us
distinguish between zero and non-zero fast clear colors. This is
needed so that we can do full resolves when transitioning to
PRESENT_SRC_KHR with gen9 CCS images where we allow zero clear
values in all sorts of places we wouldn't normally.
3) We now track compression state as a boolean separate from fast clear
type and this is tracked on a per-slice granularity.
The previous scheme had some issues when it came to individual slices of
a multi-LOD images. In particular, we only tracked "needs resolve"
per-LOD but you could do a vkCmdPipelineBarrier that would only resolve
a portion of the image and would set "needs resolve" to false anyway.
Also, any transition from an undefined layout would reset the clear
color for the entire LOD regardless of whether or not there was some
clear color on some other slice.
As far as full/partial resolves go, he assumptions of the previous
scheme held because the one case where we do need a full resolve when
CCS_E is enabled is for window-system images. Since we only ever
allowed X-tiled window-system images, CCS was entirely disabled on gen9+
and we never got CCS_E. With the advent of Y-tiled window-system
buffers, we now need to properly support doing a full resolve of images
marked CCS_E.
v2 (Jason Ekstrand):
- Fix an bug in the compressed flag offset calculation
- Treat 3D images as multi-slice for the purposes of resolve tracking
v3 (Jason Ekstrand):
- Set the compressed flag whenever we fast-clear
- Simplify the resolve predicate computation logic
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Even though the blorp pass looks a bit on the sketchy side, the end
result in the Vulkan driver is very nice. Instead of having this weird
case where you do a fast clear and then maybe have to resolve, we just
do the ambiguate and are done with it. The ambiguate does exactly what
we want of setting all the CCS values to 0 which puts it into the
pass-through state.
This should also improve performance a bit in certain cases. For
instance, if we did a transition from UNDEFINED to GENERAL for a surface
that doesn't have CCS enabled all the time, we would end up doing a
fast-clear and then a full resolve which ends up touching every byte in
the main surface as well as the CCS. With the ambiguate pass, that
transition only touches the CCS.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Now that this isn't a multi-case if and it's just the one case, it's a
bit clearer if the condition is just part of the if instead of being
pulled out into a boolean variable.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
The current strategy we use for managing resolves has an issues where we
track clear colors and the need for resolves per-LOD but we still allow
resolves of only a subset of the slices in any given LOD and doing so
sets the "needs resolve" flag for that LOD to false while leaving the
remaining layers unresolved. This patch is only the first step and does
not, by itself fix anything. However, it's fairly self-contained and
splitting it out means any performance regressions should bisect to this
nice obvious commit rather than to the giant "rework aux tracking"
commit.
Nanley and I did some testing and none of the applications we tested
even tried to fast-clear anything other than the first slice of an
image. The test was done by adding a printf right before we call
blorp_fast_clear if we were every going to touch any slice other than
the first with a fast-clear. Due to the way the original code was
structured, this would not have included applications which only cleared
a subset of layers. The applications tested were:
* All Sascha Willems demos
* Aztec Ruins
* Dota 2
* The Talos Principle
* Mad Max
* Warhammer 40,000: Dawn of War III
* Serious Sam Fusion 2017: BFE
While not the full list of shipping applications, it's a pretty good
spread and covers most of the engines we've seen running on our driver.
If this is ever shown to be a performance problem in the future, we can
reconsider our strategy.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>