There's only one caller of this function and always passes false.
v2: (Nanley Chery)
- Also remove the aspect parameter
v3: (Nanley Chery)
- Rename the function so it's more clear in which direction it works
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20175>
Instead of doing a slow depth clear, we can do depth fast clear in
vkClearAttachments.
Sascha Willems occlusionquery demo shows more than 2% perf boost with
this series.
On Felix's Tigerlake with the GPU at fixed frequency, this patch
improves performance of RoTR by +0.5%.
v2: (Nanley Chery)
- Clear stencil surface along with depth.
- Check for multilayer resources.
- Lookout for state.attachments.
- Fallback on slow clear for BDW and CHV if conditional rendering
enabled.
- Keep flush in same function.
v3: (Nanley Chery)
- Return immediately after fast clearing.
- Remove unnecessary comment.
v4: (Nanley Chery)
- Add assertion for BLORP_BATCH_NO_EMIT_DEPTH_STENCIL.
- Remove unnecessary local variable.
- Add 3DSTATE_WM_HZ_OP comment.
v5: (Nanley Chery)
- Fix comments.
- Don't take fast depth clear path if BLORP_BATCH_PREDICATE_ENABLE set.
- Refactor code in can_hiz_clear_att.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20175>
Refactoring code from anv_image_hiz_clear which helps in future patches
to support fast depth clear in vkCmdClearAttachments.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20175>
For MTL (verx10 == 125), float64 is supported, but int64 is not.
Therefore we need to lower cluster broadcast using 32-bit int ops.
For gfx12.5+ platforms that support int64, the register regions
used by cluster broadcast aren't supported by the 64-bit pipeline.
On MTL, dEQP-VK.subgroups.clustered.*_double* and
dEQP-VK.subgroups.clustered.*_dvec* were failing to validate the
compiled shader in debug mode, and reportedly gpu-hanging in release
mode.
With this change dEQP-VK.subgroups.clustered.*_double* passed all 48
tests and dEQP-VK.subgroups.clustered.*_dvec* passed all 140 tests on
MTL.
Rework:
* Move from generator to brw_fs_lower_regioning.cpp. (Suggested by
Francisco)
* Apply to verx10 >= 125.. (Suggested by Francisco)
Cc: 23.1 <mesa-stable>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com> (v1)
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22569>
Same/similar issues are seen on MTL platform as DG2 so disable for both.
Cc: mesa-stable
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22435>
This gives anv the same behavior as turnip in not asserting, and just not
filling out feedback for those stages.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15637>
Since there are concerns that the VK_EXT_GPL implementation may have
issues with mesh shading, disable it by default but give users a knob to
turn it on to experiment.
This doesn't automatically enable GPL use in zink, because we lack
extendedDynamicState2PatchControlPoints, but it means that you only need
to set ZINK_DEBUG=gpl and not both env vars.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15637>
With independent sets, we're not able to compute immediate values for
the index at which to read anv_push_constants::dynamic_offsets to get
the offset of a dynamic buffer. This is because the pipeline layout
may not have all the descriptor set layouts when we compile the
shader.
To solve that issue, we insert a layer of indirection.
This reworks the dynamic buffer offset storage with a 2D array in
anv_cmd_pipeline_state :
dynamic_offsets[MAX_SETS][MAX_DYN_BUFFERS]
When the pipeline or the dynamic buffer offsets are updated, we
flatten that array into the
anv_push_constants::dynamic_offsets[MAX_DYN_BUFFERS] array.
For shaders compiled with independent sets, the bottom 6 bits of
element X in anv_push_constants::desc_sets[] is used to specify the
base offsets into the anv_push_constants::dynamic_offsets[] for the
set X.
The computation in the shader is now something like :
base_dyn_buffer_set_idx = anv_push_constants::desc_sets[set_idx] & 0x3f
dyn_buffer_offset = anv_push_constants::dynamic_offsets[base_dyn_buffer_set_idx + dynamic_buffer_idx]
It was suggested by Faith to use a different push constant buffer with
dynamic_offsets prepared for each stage when using independent sets
instead, but it feels easier to understand this way. And there is some
room for optimization if you are set X and that you know all the sets in
the range [0, X], then you can still avoid the indirection. Separate
push constant allocations per stage do have a CPU cost.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15637>
For graphics pipelines, we'll need to load NIR for retained shaders.
We want to avoid as much processing as possible while doing that when
we're able to load ISA from cache.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15637>
With independent sets, we cannot bake into the shader the binding
table entry of input attachments anymore because that final location
is affected by multiple sets.
We can still access them by looking into the descriptor buffer. This
change enables the image handle to be stored in the descriptor buffer.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15637>
With variable fragment shading rate, the last pre-rasterization stage
is responsible to write the shading rate value.
The current checks is as follow :
If the fragment shader can be dispatched at variable shading rate,
look for the last pre-raster stage to force the write.
We change this to :
If we're the last pre-raster stage, force the write.
That way this works for pre-rasterization shaders compiled without a
fragment shader.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15637>
Since Gfx12+ 3DSTATE_STENCIL_BUFFER gained its own
Width/Depth/Format/etc... fields. So don't set those fields but leave
the address/pitch to 0.
Issue found on simulation.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15637>
MTL and newer platforms on Xe kmd will have engines with gt_id != 0.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22477>
INTEL_MEASURE normally measures timing of GPU events. However, it
is sometimes useful to instead measure when these gfx API calls
were requested of the driver. INTEL_MEASURE cpu can be used in
in conjunction with other driver debug capabilities, like
INTEL_DEBUG=pc for analyzing stalls/flushes or when debugger is
attached, to track which frame you're currently on or where in
the frame you're at.
Initial commit, without plumbing into anv/iris.
"INTEL_MEASURE=cpu" will collect a cpu timestamp for each
INTEL_MEASURE event instead of GPU timestamps.
Reviewed-by: Mark Janes <markjanes@swizzler.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21505>
Measure performance of each draw separately in multi_draw event.
Previously, we measured duration of the sum of all draws launched
per multi_draw. This should provide more detailed data for
multi_draws.
Reviewed-by: Mark Janes <markjanes@swizzler.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21505>
cs_stall gets inserted if both flushes and invalidates are required.
This cs_stall reason was not called out explicitly, until now.
Reviewed-by: Mark Janes <markjanes@swizzler.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21505>
Makes it clearer which platform is being run.
Signed-off-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: David Heidelberg <david.heidelberg@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Helen Koike <helen.koike@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22450>
Applying the same rule as the fs backend so that generation code
doesn't assert.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: daa8003e45 ("intel/fs: use nomask for setting cr0 for float controls")
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22473>
This optimization causes some MTL tests to run forever. Not
yet sure why. Disabling optimization until we have a fix.
Reviewed-by: Mark Janes <markjanes@swizzler.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22373>
Based on vkQueuePresentKHR calls. It just helps spotting the beginning
end of a frame in perfetto when apps are using 3/4 command buffers per
frame.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22276>
The query buffer contains a batch to implement the multi pass
replay/accumulation of results. So we can't clear it with a memset.
An optimization for later would be to move the batches to the very end
of the query buffer so we can clear the query data without touching
the batches.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 4dc7256bf9 ("anv: reset query pools using blorp")
Reviewed-by: Felix DeGrood <felix.j.degrood@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22421>
Imported buffers may be created in a device with different
memory alignment and this can cause vm bind to fail because bo
size can be smaller than the calculated vm bind range using the
importer device memory alignment.
So here adding actual_size to anv_bo, this will be set with the actual
size of the bo allocated by kmd for bos allocate in the current device.
For other bo the lseek or the Vulkan API size will be used.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22219>
The instructions manipulation cr0 use the default mask on lane0. So if
for some reason that lane is disabled in some of the dispatchs, we can
end up not executing the instructions.
Fixes flakyness in dEQP-VK.spirv_assembly.instruction.graphics.16bit_storage.uniform_float_32_to_16.uniform_matrix_float_rtz_frag
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22314>