Alias registers allow us to allocate non-consecutive registers and remap
them to consecutive ones using alias.tex. We implement this by adding
the sources of collects directly to the sources of their users. This
way, RA treats them as scalar registers and we can remap them to
consecutive registers afterwards. To keep track of the scalar sources
that should be remapped together, the IR3_REG_FIRST_ALIAS flag is
introduced. Every source of such an "alias group" will have
IR3_REG_ALIAS set, while the first one will also have
IR3_REG_FIRST_ALIAS set.
This commit also adds a number of helpers to iterate over sources while
keeping track of the original src index (i.e., before they were expanded
to alias groups), and to iterate over the sources within an alias group. It
also introduces a new notation (@{regs...}) to clearly show alias groups
when printing instructions.
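
As an illustration of the flag scheme described above, here is a minimal
sketch of walking alias groups. The struct, the flag values and the function
names are hypothetical and only mirror the roles of IR3_REG_ALIAS and
IR3_REG_FIRST_ALIAS; they are not the helpers added by this commit.

```c
#include <stddef.h>

/* Hypothetical flag values standing in for IR3_REG_ALIAS and
 * IR3_REG_FIRST_ALIAS; the real bit assignments are not shown here.
 */
enum {
   SKETCH_REG_ALIAS = 1u << 0,
   SKETCH_REG_FIRST_ALIAS = 1u << 1,
};

/* Hypothetical, simplified source register. */
struct sketch_src {
   unsigned flags;
   unsigned num; /* scalar register number, e.g. as assigned by RA */
};

/* Return the number of sources in the alias group starting at srcs[start],
 * or 0 if srcs[start] does not start a group.
 */
size_t
sketch_alias_group_size(const struct sketch_src *srcs, size_t n, size_t start)
{
   if (start >= n || !(srcs[start].flags & SKETCH_REG_FIRST_ALIAS))
      return 0;

   size_t size = 1;
   while (start + size < n &&
          (srcs[start + size].flags & SKETCH_REG_ALIAS) &&
          !(srcs[start + size].flags & SKETCH_REG_FIRST_ALIAS))
      size++;
   return size;
}

/* Count sources as they were before expansion: every alias group counts as
 * a single original source.
 */
size_t
sketch_count_original_srcs(const struct sketch_src *srcs, size_t n)
{
   size_t count = 0;
   for (size_t i = 0; i < n;) {
      size_t group = sketch_alias_group_size(srcs, n, i);
      i += group ? group : 1;
      count++;
   }
   return count;
}
```

The second flag is what makes two adjacent groups distinguishable: without
it, back-to-back groups would look like one long run of alias sources.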
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31222>
Take the properties of alias.{rt,tex} and its registers into account:
- Don't count alias registers for GPR usage;
- Allow all immediates in alias regs;
- Fix properties like is_barrier and (ss) support;
- alias.rt dst is not a GPR, so don't use it in legalize/postsched to track
dependencies.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31222>
The UNK field encodes the table size for alias.tex: the first alias.tex
instruction uses it to indicate how many follow (i.e., it is the total
table size minus one).
Also switch from a src to a cat7 field to store this value, which
makes it a bit easier to handle.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31222>
The alias scope and type bits are intertwined in the encoding:
- bit 47: low scope
- bit 48: type
- bit 49: high scope
- bit 50: type size
Combining the low and high scope bits, the value is used as follows:
- 0: tex
- 1: rt
- 2: mem
- 3: mem
I don't know what the difference between 2 and 3 is. The blob currently
doesn't use mem at all.
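
The bit layout above could be decoded roughly as follows. This is only a
sketch: it assumes the bit numbers index a 64-bit instruction word with bit 0
as the least significant bit, and all names are hypothetical. The type and
type size bits are returned raw, since their exact meaning is only described
informally here.

```c
#include <stdint.h>

/* Scope values as listed above; 2 and 3 both mean "mem". */
enum sketch_alias_scope {
   SKETCH_ALIAS_TEX = 0,
   SKETCH_ALIAS_RT = 1,
   SKETCH_ALIAS_MEM = 2,
   SKETCH_ALIAS_MEM_ALT = 3,
};

/* Extract a single bit from the (assumed) 64-bit instruction word. */
static inline unsigned
sketch_bit(uint64_t instr, unsigned n)
{
   return (unsigned)((instr >> n) & 1);
}

/* Combine the high scope bit (49) and the low scope bit (47). */
static inline enum sketch_alias_scope
sketch_alias_scope(uint64_t instr)
{
   return (enum sketch_alias_scope)((sketch_bit(instr, 49) << 1) |
                                    sketch_bit(instr, 47));
}

/* Raw type bit (48). */
static inline unsigned
sketch_alias_type(uint64_t instr)
{
   return sketch_bit(instr, 48);
}

/* Raw type size bit (50). */
static inline unsigned
sketch_alias_type_size(uint64_t instr)
{
   return sketch_bit(instr, 50);
}
```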
The type bit seems to be used to make a distinction between floating
point (f) and integer (b) sources. There doesn't seem to be any
functional difference and it only affects how immediates are displayed.
Note that I haven't exactly mimicked the blob in these cases:
- alias.tex.f16/32: the blob uses b16/32 while printing immediates in
floating point notation. I think it makes more sense to use f16/32.
- alias.rt.b16/32: the blob uses i16/32 here. I think it makes more
sense to stick to a single notation (b).
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31222>
There is a single swap field for each color attachment, regardless of
whether it's in GMEM or not, and this does appear to be used in
GMEM mode when MUTABLEEN is set on the attachment. This means that when
a color attachment has a non-identity swap because it's mutable on a750,
we have to use the same corresponding swap when it's a source in a
GMEM resolve. When using the fastpath, we have to make sure that the
swaps match because there aren't separate fields for GMEM and sysmem
swap.
This fixes dEQP-VK.image.mutable.2d.*_b8g8r8a8_unorm_draw_copy_resolve
with TU_DEBUG=gmem.
Fixes: 247d11d635 ("tu: Allow UBWC with images with swapped formats.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33115>
We don't currently support building acceleration structures on the CPU
or indirect building in the common framework, and drivers using it don't
either, but drivers have to return non-NULL entrypoints for CPU building
functions if they claim to support VK_KHR_acceleration_structure. Add
stub entrypoints here so that drivers don't have to have this
boilerplate.
Fixes dEQP-VK.api.version_check.entry_points on turnip.
Fixes: 671e3a65a6 ("tu: Support VK_KHR_acceleration_structure")
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33153>
The alignment was considered only for offset, but its users
(at least ir3_nir_opt_preamble) expect the size itself to also
be aligned.
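
A minimal sketch of the idea, assuming a simple bump allocator and a
power-of-two alignment. The struct and function names are hypothetical and
much simpler than the actual const allocation code.

```c
#include <stdint.h>

/* Round v up to a multiple of a; a must be a power of two. */
static inline uint32_t
sketch_align_pot(uint32_t v, uint32_t a)
{
   return (v + a - 1) & ~(a - 1);
}

/* Hypothetical result of a single const allocation. */
struct sketch_const_alloc {
   uint32_t offset;
   uint32_t size;
};

/* Allocate from a bump pointer, aligning both the offset and the size.
 * Aligning only the offset would leave the size unaligned, which is what
 * users expecting an aligned size trip over.
 */
static inline struct sketch_const_alloc
sketch_alloc_consts(uint32_t *next_free, uint32_t size, uint32_t align)
{
   struct sketch_const_alloc alloc;
   alloc.offset = sketch_align_pot(*next_free, align);
   alloc.size = sketch_align_pot(size, align);
   *next_free = alloc.offset + alloc.size;
   return alloc;
}
```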
Fixes tests:
dEQP-VK.spirv_assembly.instruction.graphics.16bit_storage.struct_mixed_types.uniform_buffer_block_geom
dEQP-VK.spirv_assembly.instruction.graphics.16bit_storage.struct_mixed_types.uniform_buffer_block_tessc
dEQP-VK.spirv_assembly.instruction.graphics.16bit_storage.struct_mixed_types.uniform_buffer_block_tesse
gmem-dEQP-VK.spirv_assembly.instruction.graphics.16bit_storage.struct_mixed_types.uniform_buffer_block_tesse
Fixes: 922ef8e720 ("ir3: Make allocation of consts more generic and order independent")
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33161>
This reverts commit 9d043e138d.
It is no longer needed. nir_convert_from_ssa() is now capable of
ignoring divergence information.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33009>
This pass used to unconditionally use divergence information,
which forced the caller to either call divergence_analysis or
ensure that the divergence is properly reset.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33009>
libwaffle 1.7.0 has a hack that dlopen's libglapi with RTLD_GLOBAL, which
was meant to preload libglapi, but with this MR it overwrites libgallium's
own symbols, which breaks libgallium.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Eric Engestrom <None>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32789>
so that we don't have to maintain a stable ABI for it.
This will allow removal of the remapping table to reduce CALL_* overhead
for GL dispatch tables.
Also we can now clean it up.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Eric Engestrom <None>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32789>
Previously, we just used the next block after a loop that
has a back-edge. This assumes that loop-exit blocks can
only be removed when falling through to the next block,
when in fact it can also be a jump to somewhere else,
in the future even to some block before the actual loop.
12 (0.02% of 79395) affected shaders.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32477>
They might be needed as convergence points in order to
insert code (e.g. for loop alignment, wait states, etc.).
Totals from 1 (0.00% of 79395) affected shaders:
CodeSize: 12672 -> 12716 (+0.35%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32477>
TS can be valid and flushed at the same time when no compression is
used. This state is beneficial if we needed to flush TS to the base
surface (filling cleared tiles) for any reason, but still use TS
state to accelerate read requests into PE or TX caches.
The current seqno based tracking of the TS flush state has a major
drawback with the following sequence of events:
1. fast clear surface (TS is now valid)
2. flush TS (base surface tiles filled, TS still valid,
flush seqno == surface seqno)
3. render to surface (surface seqno increased)
4. flush resource
Step 4 will now execute a full TS flush as the flush and surface seqnos
are different after rendering and TS is still valid, wasting memory
bandwidth to fill already filled tiles that are still marked as clear
in the TS state. If the TS has been flushed already, step 4 should be
a no-op.
Switch from the seqno based tracking to tracking the flush state itself,
marking the TS state un-/flushed as needed. With this boolean tracking
of the flush state, step 4 above will correctly see that the TS has already
been flushed since the last fast clear and skip the tile fill blit.
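
The four-step sequence above can be modelled with a small sketch. The struct
and function names are hypothetical, and the fill_blits counter exists only
to make the skipped blit observable. The sketch assumes the
no-compression case, where rendering leaves the flushed flag untouched.

```c
#include <stdbool.h>

/* Hypothetical, simplified surface state. */
struct sketch_surface {
   bool ts_valid;       /* TS describes the surface contents */
   bool ts_flushed;     /* cleared tiles have been filled in the base surface */
   unsigned fill_blits; /* number of tile fill blits issued so far */
};

/* Step 1: a fast clear makes TS valid and leaves unfilled tiles behind. */
static inline void
sketch_fast_clear(struct sketch_surface *s)
{
   s->ts_valid = true;
   s->ts_flushed = false;
}

/* Step 3: rendering. Assumption for this sketch: tiles that were already
 * filled stay filled, so the flushed flag is not changed.
 */
static inline void
sketch_render(struct sketch_surface *s)
{
   (void)s;
}

/* Steps 2 and 4: fill cleared tiles only if that has not happened since the
 * last fast clear. TS stays valid afterwards.
 */
static inline void
sketch_flush_ts(struct sketch_surface *s)
{
   if (!s->ts_valid || s->ts_flushed)
      return;

   s->fill_blits++;
   s->ts_flushed = true;
}
```

With seqno based tracking, the render in step 3 would bump the surface seqno
and force another fill in step 4; with the boolean, the second flush returns
early.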
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32956>
Without this, we fail to register-allocate the shader used in the
dEQP-VK.ssbo.phys.layout.random.8bit.scalar.78 VK-CTS test case.
Yeah, this sucks, but failing to compile sucks even more. We need a new
register allocator plan here.
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33124>
We can't use VK_SHADER_STAGE_ALL here, because we don't support geometry
and tessellation shaders. Additionally, the DDK doesn't support the
vertex stage, so let's not even try that for now; it probably won't
work.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32710>
Implement as_uniform with a simple mov, as the HW doesn't have
uniform registers (registers shared by all threads in the warp)
like some other hardware does.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32710>