This removes our reliance on driver_locaiton for varyings and attributes
by using nir_io_semantics instead. This is probably better as NIR seems
to be trending this direction long-term.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28377>
The pass "nir_opt_constant_folding" is definitely required.
For instance, this issue is triggered on a R430 with "piglit/bin/shader_runner generated_tests/spec/glsl-1.10/execution/variable-indexing/fs-varying-array-mat2-col-rd.shader_test -auto -fb":
shader_runner: ../src/compiler/nir/nir_lower_int_to_float.c:239: lower_alu_instr: Assertion `nir_alu_type_get_base_type(info->output_type) != nir_type_int && nir_alu_type_get_base_type(info->output_type) != nir_type_uint' failed.
Fixes: 092299f18a ("r300: remove some late NIR passes")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Pavel OndraÄka <pavel.ondracka@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28365>
The default vn_relax is mainly targeting Vulkan commands expecting a
rely like object creation and property queries. The defined relax reason
here is VN_RELAX_REASON_RING_SPACE. The polling strategy involves more
busy waits to overcome sleep penalty affecting cpu utilization, as well
as an edge case for Android system server which forces to sleep longer
even with trivial hrtimer interval.
However, for the below relax reasons:
- VN_RELAX_REASON_RING_SPACE
- VN_RELAX_REASON_FENCE
- VN_RELAX_REASON_SEMAPHORE
- VN_RELAX_REASON_QUERY
It's a waste of cpu cycles if we do more busy waits if the initial
polled signals are not "ready". Having less busy waits there allows to
jump to higher order of sleeps sooner to disturb the scheduler less
until signaled.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28287>
Up to this commit in this MR, the gfxbench manhattan scores have been
improved by 10~15% with ANGLE-on-Venus on some AMD platforms.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28287>
Better distinguish different client waiting and prepare for applying
different waiting profile for different reasons.
Default case is avoided in reason string mapping so that below can be
hit upon compilation:
- error: enumeration value ‘XXX’ not handled in switch [-Werror=switch]
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28287>
Similar to the rationale for the 50ms -> 5ms adjustment before. When
there's enough cpu cycles, doing so would only help reduce cpu
utilization. When cpu is mostly drained, less host side unnecessary
polling is favored by the scheduler. Also in the latter case, it'd be
the non-primary ring, so it doesn't hurt to idle out faster.
Besides the theory, there's no regression in popular benchmarks, but
only power wins. Making the idle timeout too small will lead to overhead
built up. e.g. From the initial notify to ring being waken up, it's
about 200us. The notify op is more expensive than ring thread doing a
few more polls. However, we normally would save many more polls by idle
out earlier. From my local testing, reducing down to 500us won't incur
and real perf regressions either.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28287>
The ring notification can be blocked on renderer main thread if a vq cmd
is waiting for a ring cmd (via a different non-idle ring). This change
optimizes to only try waking up the ring on the idle timeout period.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28287>
If entry goes until the last byte of the bo it was being marked as
not valid while it is valid.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28376>
The design of the perfetto memory event is a bit more vk specific, but
we can abuse it to get a breakdown of memory usage for various purposes.
The memory_type parameter is (ab)used to get buffer vs image memory
split out into it's own track/graph.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28058>
init_perfcntr() can be called multiple times. We don't want to
regenerate the list of counters (and overwrite/leak various other
things), so just bail if we've already initialized.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28058>
We are not supposed to apply the vertex index offset to our varying or
non-VS attribute (AKA image) descriptors. While at it, explicitly set
offset_enable to true when emitting vertex attribute descriptors, to
clarify our intentions.
Fixes: c0d6539827 ("panvk: Drop support for Midgard")
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28182>
In NAK, halt turns int OpExit so this gives us exactly the same behavior
as before, just with the increased ability of NIR and our controlf-flow
lowering pass to reason about it.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28300>
This pass lowers from NIR structured control-flow to unstructured
control-flow with sync instructions scattered throughout to ensure
uniform convergence. Unlike the previous nak_nir_add_barriers() pass,
this one actually handles loop continues correctly. The previous pass
had no plan for handling divergent early continues whereas this pass
should. Also, the previous pass attempted to use barrier breaks in a
way that don't actually work because not all lanes involved in the
barrier were involved in the break.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28300>
This is just a little handle to tell the back-end where to do the copy.
Ideally, we'd have a NIR intrinsic that does the copy but we need to be
able to copy any number of registers up to 34 and NIR intrinsics just
aren't that flexible.
Reviewed-by: M Henning <drawoc@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28300>
This way we avoid destroying divergence information which may be used by
passes which also need to lower phis.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28300>
This fixes nir_print for unstructured control-flow. It's safe to
backport just this patch because the worst case is that we don't set as
many types and not as much gets printed.
Fixes: 260a9167db ("nir/print: Improve NIR_PRINT=print_consts by using nir_gather_ssa_types()")
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28300>
For any reg we can lower, we remove it whenever we remove the last read
or write. For regs that aren't used at all, however, there are no reads
or writes so there's nothing to trigger the removal. Instead, we need
to do it in setup_reg.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28300>