Instead, we only recompute liveness and we add new nodes and
interferences to the graph manually (we also need to patch
register classes in some cases).
To assist in this process, we also add an ip counter to our
instructions that we also recompute after each spill, which we use
to identify registers that cross thrsw boundries introduced with
TMU spills and fills and adjust their register classes accordingly
(removing their capacity to use accumulators).
This significantly reduces the CPU cost of spills. Using
shaders/closed/gputest/piano/7.shader_test as reference:
Compile time up to the first successful compile strategy in main is
~24s and with this change it is ~11s. With this speed up, we can now
try all 2-thread compile strategies (including the fallback scheduler)
in only ~15s.
A full shader-db run results in:
Total CPU time (seconds): 9904.67 -> 9087.98 (-8.25%)
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15041>
We may be pipelining TMU writes and reads, in which case we can
see both TMUWT and LDTMU at the end of a TMU sequence, so we should
not assume that a TMUWT always terminates a sequence.
Also, we had a bug where we were using inst instead of scan_inst
to check if we find another TMUWT after the curent instruction.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15041>
Instead of whether they are allowed to spill or not. This is more flexible.
Also, while we are not currently enabling spilling on any 4-thread strategies,
should we do that in the future, always prefer a 4-thread compile.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15041>
Until now we would only allow spilling as a last resort in the
last 2 strategies, however, it is possible that in some cases
earlier strategies may produce less spills if we allowed spilling
on them.
Likewise, the fallback scheduler can sometimes produce less spills
than 2 threads with optimizations disabled.
With this change, we start allowing all our 2-thread strategies to
spill, and instead of choosing the first strategy that is successful,
we choose the one that doesn't spill or the one with the least amount
of spilling.
It should be noted that this may incur in a significant increase
of compile times. We will address this in a follow-up patch.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15041>
nir_builder_init_simple_shader does this automatically now.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Acked-by: Rhys Perry <pendingchaos02@gmail.com>
Acked-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14936>
Matches the expected use by callers. We do need to fix up a few callers which
use this call for external shaders.
v2: Fix up a radv call site (Rhys).
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net> [v1]
Acked-by: Rhys Perry <pendingchaos02@gmail.com>
Acked-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14936>
Commit 38800b38 changed nir_opcodes.py, but that doesn't seem to have
triggered nir_opt_algebraic.py. The change in 75ef5991 depends on
opt_algebraic lowering 16-bit versions of slt, but if opt_algebraic is
not rebuilt, this may not happen. This resulted in some people seeing
assertion failures in, for example,
dEQP-VK.spirv_assembly.instruction.compute.float16.arithmetic_3.step,
due to the backend seeing nir_op_slt that it didn't know how to handle.
v2: Add nir_opcodes.py to nir_algebraic_py so that all the per-driver
algebraic passes pick up the dependency too. Rename it to
nir_algebraic_depends. Suggested by Emma.
Closes: #6047
Fixes: d1992255bb ("meson: Add build Intel "anv" vulkan driver")
Reviewed-by: Emma Anholt <emma@anholt.net>
Acked-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15050>
INTEL_MEASURE broke while implementing the common sync and submit
framework. Re-adding missing INTEL_MEASURE entry point for
command buffer submit.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14998>
memcpy is divided into chunks that are vec4 sized max. The problem
here happens with a structure of 24 bytes :
struct {
float3 a;
float3 b;
}
If you memcpy that struct, the lowering will emit 2 load/store, one of
sized 8, next one sized 16. But both end up located at offset 0, so we
effectively drop 2 floats.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: a3177cca99 ("nir: Add a lowering pass to lower memcpy")
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15049>
The img->deferred_info will out-live vn_CreateImage, so we need a deep
copy of the VkImageFormatListCreateInfo struct.
This change also avoids tracking VkImageFormatListCreateInfo struct with
a zero viewFormatCount.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15017>
If a secondary command buffer is used and the client provides a
framebuffer and that framebuffer has a stencil-only attchment, we would
try to get the aux usage for the depth component of that attachment and
crash. Check the aspects of the image before looking at aux usage.
This fixes at least the following SkQP tests on my Tigerlake:
- vk_circular-clips
- vk_filterfastbounds
- vk_innershapes_bw
- vk_lineclosepath
- vk_multipicturedraw_rrectclip_simple
- vk_pathinvfill
- vk_quadclosepath
- vk_rrect_clip_bw
- vk_windowrectangles
Fixes: 0d8b9c529c ("anv: Allow PMA optimization to be enabled in secondary command buffers")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15048>
Valhall has a new twist on Mali's task splitting voodoo, plus compute offset
support.
On Bifrost + Vulkan, compute offsets needed lowering on Bifrost (gl_GlobalID).
Valhall saves a few instructions here.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15047>
It breaks postprocessing in some games like Ghostrunner, Deathloop,
Street Fighter V.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15042>
It's more useful than I thought.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15042>
I was doing some RE on freedreno and we had some questions about when the
hardware might need non-uniform or non-constant array access for various
descriptor types, so let's leave some notes for the next person.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13621>
shader-db results (note that we expect many more loops unrolled, so instr
count is probably understating the win):
nv30:
total instructions in shared programs: 16535069 -> 14299105 (-13.52%)
instructions in affected programs: 16377286 -> 14141322 (-13.65%)
total gpr in shared programs: 81255 -> 67268 (-17.21%)
gpr in affected programs: 56714 -> 42727 (-24.66%)
LOST: 0
GAINED: 824
nv40:
total instructions in shared programs: 20907673 -> 18428749 (-11.86%)
instructions in affected programs: 20755510 -> 18276586 (-11.94%)
total gpr in shared programs: 104200 -> 82831 (-20.51%)
gpr in affected programs: 80278 -> 58909 (-26.62%)
14130
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14130>
With the recent addition of the shortcuts aiming to avoid atomic
operations, the reference count on resources can become unbalanced
in the Tegra driver since they are wrapped and then proxied to the
Nouveau driver.
Fix this by keeping a private reference count.
Fixes: 7688b8ae98 ("st/mesa: eliminate all atomic ops when setting vertex buffers")
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Tested-by: Karol Herbst <kherbst@redhat.com>
With the recent addition of the shortcuts aiming to avoid atomic
operations, the reference count on sampler views can become unbalanced
in the Tegra driver since they are wrapped and then proxied to the
Nouveau driver.
Fix this by keeping a private reference count.
Fixes: ef5d427413 ("st/mesa: add a mechanism to bypass atomics when binding sampler views")
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Tested-by: Karol Herbst <kherbst@redhat.com>
I had missed a int -> enum conversion in one recently added function and
it's probably nice to also dump the target value also in
trace_dump_resource_template() so let's do just that.
Signed-off-by: Matti Hamalainen <ccr@tnsp.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14980>
The "IB2" indirect buffer command is not supported on compute queues
according to PAL, and it indeed causes GPU hangs when task shaders are
used together with vkCmdExecuteCommands.
Cc: mesa-stable
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15006>
Ensure that we are using a recent virglrenderer to catch potential regressions
early.
Signed-off-by: Corentin Noël <corentin.noel@collabora.com>
Reviewed-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15023>
Once we simplified a phi node, we never updated the definition it points
to, which meant that it could become out of date if that definition were
also simplified, and we didn't check that when rewriting sources. That
could happen when there are multiple nested loops with phi nodes at the
header.
Fix it by updating the phi's pointer. Since we always update sources
after visiting the definition it points to, when we go to rewrite a
source, if that source points to a simplified phi, the phi's pointer
can't be pointing to a simplified phi because we already visited the phi
earlier in the pass and updated it, or else it's been simplified in the
meantime and this isn't the last pass. This way we don't need to
keep recursing when rewriting sources.
Fixes: 613eaac7b5 ("ir3: Initial support for spilling non-shared registers")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15035>
This fixes artifacts seen in games when using ASTC transcoding,
we need to use DXT5 for proper alpha channel support.
Number of components is a block specific property, there is no easy
way to see if we will require >1bit alpha support or not, so simply
use DXT5 to have support in place.
Fixes: 91cbe8d855 ("gallium: Add a transcode_astc driconf option")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15029>
This is needed to add the primitive shading rate output to the vertex
or geometry shaders, even if the default value is 1x1.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14713>
This introduces inotify support in RADV to handle changes from the
RADV_FORCE_VRS_CONFIG_FILE. This is similar to MangoHUD. I'm personally
not sure it's the best solution but let's try this way and change it
later if we have issues (or if we have a lightweight solution).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14713>
Similar to RADV_FORCE_VRS but from a file. If present, this is used
instead of RADV_FORCE_VRS.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14713>
It's the default value.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14713>
This reduces the overhead slightly when the VRS rates don't change.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14713>