These handles can be emitted in control flow, which means that the handle
might be in a block which does not dominate a block that's processed
later on, which results in incorrect DXIL if we try to reference it.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26803>
If we did clear a query buffer in compute mode, the flushing needs to
match the engine used for clearing.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 6823ffe70e ("anv: try to keep the pipeline in GPGPU mode when buffer transfer ops")
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28285>
External images translate to 2D images in ntv, so we will have to emit
OpImageQuerySizeLod instead of OpImageQuerySize (thanks Faith for
pointing that out). This quells
VUID-VkShaderModuleCreateInfo-pCode-08737
Image must have either 'MS'=1 or 'Sampled'=0 or 'Sampled'=2
%32 = OpImageQuerySize %v2int %31
triggred by piglit
spec@oes_egl_image_external_essl3@oes_egl_image_external_essl3
on Zink.
Fixes: 3f783a3c50
zink: omit Lod image operand in ntv when not using an image texture dim
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28389>
The right one is a few lines below.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 44bf552704 ("anv: allocate border colors for descriptor buffers")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28387>
Two known differences with a740 are:
- RB_DBG_ECO_CNTL being 1 on A740v3
- Concurrent binning is not used
We don't have concurrent binning implemented and it's unknown
how important is RB_DBG_ECO_CNTL diff. So for now A740v3 is aliased
to ordinary A740.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28380>
If this state was emitted at the point of previous RP, which
could happen if pipeline is not set at the start of current RP,
we have to emit non-draw-state state since it would become stale
in the next tile.
Fixes test with stale reg dbg:
dEQP-VK.transform_feedback.primitives_generated_query.get.queue_reset.32bit.tese.xfb.color_write_disable_static.patch_list.pgq_default_xfb_default.two_draws.pqg_first.none_2_queries
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28326>
The pipeline used in RP may have been bound in another RP, so
we have to save relevant state and re-apply it on first draw.
Fixes GPU hang in the following test with forced binning + reg stomping:
dEQP-VK.transform_feedback.primitives_generated_query.get.queue_reset.32bit.tese.xfb.color_write_disable_static.patch_list.pgq_default_xfb_default.two_draws.pqg_first.none_2_queries
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28326>
VK_EXT_post_depth_coverage was implemented in
f1305d49d9 ("tu: Implement VK_EXT_post_depth_coverage").
Additionally mark that certain extensions are supported from a650
onwards rather than exclusively on that generation in features.txt
to match the formatting that the other drivers use.
Signed-off-by: Valentine Burley <valentine.burley@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28236>
This patch exposes support for the following three extensions:
* VK_GOOGLE_decorate_string
* VK_GOOGLE_hlsl_functionality1
* VK_GOOGLE_user_type
There's nothing for the driver to do; it's all handled in spirv_to_nir.
Signed-off-by: Valentine Burley <valentine.burley@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28236>
Push constants are exposed as special registers on Bifrost/Valhall,
this means we can't index the push constant region with a dynamic
index. In order to support dynamic indexing, we need iterative CSELs
to select the right value from the access range.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28175>
On Bifrost, push constants are exposed as 64-bit registers which can
be accessed at a 32-bit granularity. Make sure push constant accesses
are lowered to guarantee a 32-bit alignment.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28175>
Mali GPUs have a 32-bit alignment constraint on push constants. Extend
nir_lower_mem_access_bit_sizes() so it can lower bit sizes on push
constant accesses.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28175>
Will be needed to support push constants in
nir_lower_mem_access_bit_sizes().
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28175>
Implement dynamic rendering entry points so we can get rid of the
render pass logic.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28167>
The gallium and vulkan drivers deal with vertex attribute emission
differently. The gallium driver re-emits the VS attributes on each
draw, while the vulkan driver uses explicit attribute/image
descriptor dirtiness tracking, and could keep the attribute array
around if a new pipeline using a different number of attribute is
bound. If we want to be able to do that, we need to assign a fixed
offset for image attributes, such that the Vulkan descriptor
lowering pass knows where the images are in the attribute table.
We could teach the Bifrost backend how to deal with a custom offset
but it doing that in a lowering pass also simplifies the Midgard
code.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28200>
Fixes crashes in tests like
dEQP-VK.pipeline.monolithic.render_to_image.core.2d_array.huge.width_height_layers.r8g8b8a8_unorm
with CTS main.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28364>
All we really need is for them to have no indirects which we can ensure
via nir_lower_indirect_derefs. Splitting into individual variables is a
relic of older attempts at FS output lowering and not needed.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28377>
This removes our reliance on driver_locaiton for varyings and attributes
by using nir_io_semantics instead. This is probably better as NIR seems
to be trending this direction long-term.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28377>
The pass "nir_opt_constant_folding" is definitely required.
For instance, this issue is triggered on a R430 with "piglit/bin/shader_runner generated_tests/spec/glsl-1.10/execution/variable-indexing/fs-varying-array-mat2-col-rd.shader_test -auto -fb":
shader_runner: ../src/compiler/nir/nir_lower_int_to_float.c:239: lower_alu_instr: Assertion `nir_alu_type_get_base_type(info->output_type) != nir_type_int && nir_alu_type_get_base_type(info->output_type) != nir_type_uint' failed.
Fixes: 092299f18a ("r300: remove some late NIR passes")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Pavel OndraÄka <pavel.ondracka@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28365>
The default vn_relax is mainly targeting Vulkan commands expecting a
rely like object creation and property queries. The defined relax reason
here is VN_RELAX_REASON_RING_SPACE. The polling strategy involves more
busy waits to overcome sleep penalty affecting cpu utilization, as well
as an edge case for Android system server which forces to sleep longer
even with trivial hrtimer interval.
However, for the below relax reasons:
- VN_RELAX_REASON_RING_SPACE
- VN_RELAX_REASON_FENCE
- VN_RELAX_REASON_SEMAPHORE
- VN_RELAX_REASON_QUERY
It's a waste of cpu cycles if we do more busy waits if the initial
polled signals are not "ready". Having less busy waits there allows to
jump to higher order of sleeps sooner to disturb the scheduler less
until signaled.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28287>
Up to this commit in this MR, the gfxbench manhattan scores have been
improved by 10~15% with ANGLE-on-Venus on some AMD platforms.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28287>
Better distinguish different client waiting and prepare for applying
different waiting profile for different reasons.
Default case is avoided in reason string mapping so that below can be
hit upon compilation:
- error: enumeration value ‘XXX’ not handled in switch [-Werror=switch]
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28287>
Similar to the rationale for the 50ms -> 5ms adjustment before. When
there's enough cpu cycles, doing so would only help reduce cpu
utilization. When cpu is mostly drained, less host side unnecessary
polling is favored by the scheduler. Also in the latter case, it'd be
the non-primary ring, so it doesn't hurt to idle out faster.
Besides the theory, there's no regression in popular benchmarks, but
only power wins. Making the idle timeout too small will lead to overhead
built up. e.g. From the initial notify to ring being waken up, it's
about 200us. The notify op is more expensive than ring thread doing a
few more polls. However, we normally would save many more polls by idle
out earlier. From my local testing, reducing down to 500us won't incur
and real perf regressions either.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28287>
The ring notification can be blocked on renderer main thread if a vq cmd
is waiting for a ring cmd (via a different non-idle ring). This change
optimizes to only try waking up the ring on the idle timeout period.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28287>