In order to use load_global_const_block_intel we need to ensure the
64bit address in src[0] is uniform. This is not the case in the
vkd3d-proton test_bindless_cbv tests for example.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22624>
This change makes usage bits verification closer to the Vulkan spec.
i.e. VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT does not always require all formats
to support all the requested usage bits.
Also, VK_IMAGE_CREATE_EXTENDED_USAGE_BIT, when combined with
VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT can relax the requirements for the
usage supported by the original image format.
v2: Removed strict verification of the format_list_info formats usage
per chadversary's suggestion. Other minor style/comments tweaks.
v3: Added checking of all compatible formats when
VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT and VK_IMAGE_CREATE_EXTENDED_USAGE_BIT
are specified, but no list of possible formats was given.
v4: Add VK_IMAGE_CREATE_BLOCK_TEXEL_VIEW_COMPATIBLE_BIT handling.
Cc: 22.2 <mesa-stable>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6031
Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17182>
It's not invalid to have this value in the list, but the only case it
is actually valid as format in the creation of an image or image view
is with Android Hardware Buffers which have their format specified
externally.
So we can just ignore all entries with VK_FORMAT_UNDEFINED.
Cc: 22.2 <mesa-stable>
Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17182>
There's only one caller of this function and always passes false.
v2: (Nanley Chery)
- Also remove the aspect parameter
v3: (Nanley Chery)
- Rename the function so it's more clear in which direction it works
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20175>
Instead of doing a slow depth clear, we can do depth fast clear in
vkClearAttachments.
Sascha Willems occlusionquery demo shows more than 2% perf boost with
this series.
On Felix's Tigerlake with the GPU at fixed frequency, this patch
improves performance of RoTR by +0.5%.
v2: (Nanley Chery)
- Clear stencil surface along with depth.
- Check for multilayer resources.
- Lookout for state.attachments.
- Fallback on slow clear for BDW and CHV if conditional rendering
enabled.
- Keep flush in same function.
v3: (Nanley Chery)
- Return immediately after fast clearing.
- Remove unnecessary comment.
v4: (Nanley Chery)
- Add assertion for BLORP_BATCH_NO_EMIT_DEPTH_STENCIL.
- Remove unnecessary local variable.
- Add 3DSTATE_WM_HZ_OP comment.
v5: (Nanley Chery)
- Fix comments.
- Don't take fast depth clear path if BLORP_BATCH_PREDICATE_ENABLE set.
- Refactor code in can_hiz_clear_att.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20175>
Refactoring code from anv_image_hiz_clear which helps in future patches
to support fast depth clear in vkCmdClearAttachments.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20175>
This gives anv the same behavior as turnip in not asserting, and just not
filling out feedback for those stages.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15637>
Since there are concerns that the VK_EXT_GPL implementation may have
issues with mesh shading, disable it by default but give users a knob to
turn it on to experiment.
This doesn't automatically enable GPL use in zink, because we lack
extendedDynamicState2PatchControlPoints, but it means that you only need
to set ZINK_DEBUG=gpl and not both env vars.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15637>
With independent sets, we're not able to compute immediate values for
the index at which to read anv_push_constants::dynamic_offsets to get
the offset of a dynamic buffer. This is because the pipeline layout
may not have all the descriptor set layouts when we compile the
shader.
To solve that issue, we insert a layer of indirection.
This reworks the dynamic buffer offset storage with a 2D array in
anv_cmd_pipeline_state :
dynamic_offsets[MAX_SETS][MAX_DYN_BUFFERS]
When the pipeline or the dynamic buffer offsets are updated, we
flatten that array into the
anv_push_constants::dynamic_offsets[MAX_DYN_BUFFERS] array.
For shaders compiled with independent sets, the bottom 6 bits of
element X in anv_push_constants::desc_sets[] is used to specify the
base offsets into the anv_push_constants::dynamic_offsets[] for the
set X.
The computation in the shader is now something like :
base_dyn_buffer_set_idx = anv_push_constants::desc_sets[set_idx] & 0x3f
dyn_buffer_offset = anv_push_constants::dynamic_offsets[base_dyn_buffer_set_idx + dynamic_buffer_idx]
It was suggested by Faith to use a different push constant buffer with
dynamic_offsets prepared for each stage when using independent sets
instead, but it feels easier to understand this way. And there is some
room for optimization if you are set X and that you know all the sets in
the range [0, X], then you can still avoid the indirection. Separate
push constant allocations per stage do have a CPU cost.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15637>
For graphics pipelines, we'll need to load NIR for retained shaders.
We want to avoid as much processing as possible while doing that when
we're able to load ISA from cache.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15637>
With independent sets, we cannot bake into the shader the binding
table entry of input attachments anymore because that final location
is affected by multiple sets.
We can still access them by looking into the descriptor buffer. This
change enables the image handle to be stored in the descriptor buffer.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15637>
With variable fragment shading rate, the last pre-rasterization stage
is responsible to write the shading rate value.
The current checks is as follow :
If the fragment shader can be dispatched at variable shading rate,
look for the last pre-raster stage to force the write.
We change this to :
If we're the last pre-raster stage, force the write.
That way this works for pre-rasterization shaders compiled without a
fragment shader.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15637>
MTL and newer platforms on Xe kmd will have engines with gt_id != 0.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22477>
Measure performance of each draw separately in multi_draw event.
Previously, we measured duration of the sum of all draws launched
per multi_draw. This should provide more detailed data for
multi_draws.
Reviewed-by: Mark Janes <markjanes@swizzler.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21505>
cs_stall gets inserted if both flushes and invalidates are required.
This cs_stall reason was not called out explicitly, until now.
Reviewed-by: Mark Janes <markjanes@swizzler.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21505>
This optimization causes some MTL tests to run forever. Not
yet sure why. Disabling optimization until we have a fix.
Reviewed-by: Mark Janes <markjanes@swizzler.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22373>
Based on vkQueuePresentKHR calls. It just helps spotting the beginning
end of a frame in perfetto when apps are using 3/4 command buffers per
frame.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22276>
The query buffer contains a batch to implement the multi pass
replay/accumulation of results. So we can't clear it with a memset.
An optimization for later would be to move the batches to the very end
of the query buffer so we can clear the query data without touching
the batches.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 4dc7256bf9 ("anv: reset query pools using blorp")
Reviewed-by: Felix DeGrood <felix.j.degrood@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22421>
Imported buffers may be created in a device with different
memory alignment and this can cause vm bind to fail because bo
size can be smaller than the calculated vm bind range using the
importer device memory alignment.
So here adding actual_size to anv_bo, this will be set with the actual
size of the bo allocated by kmd for bos allocate in the current device.
For other bo the lseek or the Vulkan API size will be used.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22219>