A descriptor buffer promoted to push constants requires a constant
cache invalidation if it is modified on the device.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 42b70cf05a)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
Given a situation like this :
- CB_A: begin, renderDepthA, end
- CB_B: begin, computeA, barrier (depth), computeB, end
The depth cache is not being flushed between renderDepthA & computeB
because :
- it's not flushed at the end of CB_A (it's not required)
- when CB_B starts, we're still on GFX pipeline mode but do not
flush render caches because pipeline mode is unknown
- when barrier is CB_B is executed, we're already in compute
pipeline mode and HW cannot flush depth.
The fix is to flush RT/depth cached when switching from unknown
pipeline mode any pipeline mode.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: e6dae6ef5f ("vulkan: Optimize implicit end_subpass barrier")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14816
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Tested-by: David Gow <david@davidgow.net>
(cherry picked from commit 888ac904a3)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
This bit is set in mocs for other protected attachment types by
anv_image_fill_surface_state() however was ommited for depth/stencil
attachments here.
Without the protected bit set, it causes heavy black artifacting when
attaching a protected depth attachment image to a framebuffer.
Fixes: 794b0496e9 ("anv: enable protected memory")
Signed-off-by: Juston Li <justonli@google.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit f84ed620c2)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
This has not been problem before the compression hint given to kernel
but now that we set it we hit problems when allocating bo if modifier
does not support compression.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14625
Fixes: f91de58818 ("anv: Add support to DRM_XE_GEM_CREATE_FLAG_NO_COMPRESSION")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
(cherry picked from commit fc814fa828)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
Fixes a race condition where a BVH will be dumped before its command buffer is
actually submitted if a different command buffer completes between the time the
BVH dump is recorded and the time the command buffer is actually submitted.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Fixes: 1b55f101 ("anv/bvh: Dump BVH synchronously upon command buffer completion")
(cherry picked from commit 95e471e558)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
Not enough tested on over Gen12 platforms.
Turns out to be not working on DG2, for example.
Cc: mesa-stable
Closes: #14449
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39676>
(cherry picked from commit d2c24a0d8b)
For non-WSI images, explicitly map VK_IMAGE_LAYOUT_PRESENT_SRC_KHR to
VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL in anv_layout_to_aux_state().
Before this patch, the function passed PRESENT_SRC into
vk_image_layout_to_usage_flags() and got a return value of 0 from it
(that function expects that layout to be explicitly handled by the
caller). This caused the logic dependent on the return value to be
unreliable.
Fixes: c5cad407f8 ("anv: handle non-wsi images in anv_layout_to_aux_state")
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39618>
(cherry picked from commit f616d4fb2a)
Don't return early from anv_layout_to_fast_clear_type() for Xe2+. We'll
need to make more use of the function for some MCS changes in later
commits.
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>
(cherry picked from commit 811c413f98)
The pMiColStarts/pMiRowStarts arrays from applications may have
incorrect units. Instead of using them directly, compute the tile
start positions in superblock units internally based on the tile
dimensions.
Cc: mesa-stable
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39471>
(cherry picked from commit 8e9fec8e40)
This fixes bunch of cts tests hitting issues when attempting
anv_image_mcs_op with compute.
Fixes: ab9d3528dc ("anv: fix queue check in anv_blorp_execute_on_companion on xe3")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39581>
(cherry picked from commit 85978ccd28)
For mesh/task shaders, the thread payload provides a local invocation
index, but it's always linear so it doesn't give the correct value when
quad derivatives are in use.
The lowering pass where all of this is done correctly for compute
shaders assumes load_local_invocation_index will be lowered in the
backend for mesh/task, calculates the values for the quads correctly but
then avoid replacing the original intrinsic and we remain with the wrong
results.
Add an intel specific intrinsic and always lower the generic one to that
(or whatever else was calculated) to avoid ambiguities and fix the value
for quad derivatives.
Fixes future CTS tests using mesh/task shaders under:
dEQP-VK.spirv_assembly.instruction.compute.compute_shader_derivatives.*
Fixes: d89bfb1ff7 ("intel/brw: Reorganize lowering of LocalID/Index to handle Mesh/Task")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39276>
(cherry picked from commit 5b48805b42)
We started allowing non-default clear colors with FCV in commit
cd8e120b97. When rendering to an image with FCV, set the fast-clear
type to ANV_FAST_CLEAR_ANY if the image properties allow such
fast-clears.
Fixes: cd8e120b97 ("anv: Allow more single subresource fast-clears with FCV")
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>
(cherry picked from commit ce196c9de5)
This optimization doesn't work when the ray query index isn't uniform across
the subgroup, which is something the spec allows. While there are some smart
ways to fix this and still avoid unnecessary spilling, its not worth investing
the time until we find a realtime raytracing workload that actually needs to
use multiple live ray queries for something.
Fixes: 1f1de7eb ("anv,brw: Allow multiple ray queries without spilling to a shadow stack")
Acked-by: Sagar Ghuge <sagar.ghuge@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39445>
(cherry picked from commit 895ff7fe92)
Each node has their own opacity bits, so we don't need to track these
opacity flags at header level.
This commit also fixes the instance flag. Instance flag is 8bit wide,
but we were always using 4 lower bits.
Cc: mesa-stable
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Felix DeGrood <felix.j.degrood@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39053>
Setting this bit always might hurt performance. It might forces
traversal to treat all leafs always valid.
Cc: mesa-stable
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Felix DeGrood <felix.j.degrood@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39053>
We have no usage of the information returned by
intel_perf_load_configuration(). It is only used to add a copy of the
configuration so we have the metric id but we could instead get the
metric id from sysfs, that is added by mdapi.
Xe KMD don't have a uAPI to query the metrics configuration, so
using sysfs also fixes the integration of mdapi with Xe KMD.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Lukasz Stalmirski <lukasz.stalmirski@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32842>
There is no usage for register_config outside of
anv_AcquirePerformanceConfigurationINTEL(), so we don't need to store
it.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Lukasz Stalmirski <lukasz.stalmirski@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32842>
Fixes:
dEQP-VK.api.copy_and_blit.dedicated_allocation.resolve_image.whole_copy_before_resolving_transfer.2_bit
Otherwise we attempt to use blorp and hit various asserts later in:
- blorp_copy_supports_blitter
- blorp_xy_block_copy_blt
Fixes: 61287b00f3 ("anv: Stop using RCS companion for MSAA copy/clear on Xe3+")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39346>
Fixes a crash with:
dEQP-VK.api.external.semaphore.opaque_fd.signal_export_import_wait_temporary
when driver calls genX(CmdSetEvent2) -> emit_apply_pipe_flushes with
having NULL in emitted_flush_bits.
Fixes: 8834ef8bcd ("anv: use flushing PIPE_CONTROL for event signaling")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39343>
Allows a shader to have multiple ray queries without spilling them to a shadow
stack. Instead, the driver provides the shader with an array of multiple
RTDispatchGlobals structs to give each query its own dedicated stack.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38778>
The previous approach does ensure that all entries are zero'd, but that
may not be clear to the reader (i.e., me). Using `{ 0 }` is clearer.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39245>
nr_params & params array are gone.
brw_ubo_range is not stored on the prog_data structure anymore (Anv
already stored a copy of that with its own additional information)
The backend now only deals with load_push_data_intel. load_uniform &
load_push_constant have to be lowered by the driver.
Pre Gfx12.5 platforms have to provide a subgroup_id_param to specify
where the subgroup_id value is located in the push constants.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38975>
Anv already manages this itself. This allows removing the logic from
the compiler.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38975>
Drivers can do all the lowering to push constants to find the only
value useful in that array (subgroup_id). Then drivers call into
brw_cs_fill_push_const_info() to get the cross/per thread constant
layout computed in the prog_data.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38975>
The way we build our ranges, the first empty one is the end of the
ranges.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38975>
That whole comment about Ivy Bridge is not relevant as ANV don't support IVB.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39175>
There is no side affects for this shadowing but better fix it.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39175>
When this flag is set, it gives a hint to KMD to skip some operations around
compressed buffers, like copying the auxiliary buffer to smem during eviction.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38425>
This patch adds compute_w_to_host_r barrier and also throw QueueWaitIdle
just to make sure all operations are completed before we fetch the BVH
data on the host.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39079>
The entire range of the allocated bo up to bo->size will be used, even if
alloc_size was way less, so to track the growth precisely for load factoring,
the allocated_batch_size should increase by bo->size.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38703>