Commit graph

200611 commits

Author SHA1 Message Date
Job Noorman
38f5fd66de freedreno: add chip param to emit_fs_output
We will need this to emit the a7xx-specific aliased components regs.

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31222>
2025-01-23 06:26:24 +00:00
Job Noorman
2716385afa tu: add support for aliased render target components
a7xx introduced support for aliasing render target components using
alias.rt. This allows components to be bound to uniform (const or
immediate) values in the preamble:

alias.rt.f32.0 rt0.y, c0.x
alias.rt.f32.0 rt1.z, (1.000000)

This aliases the 2nd component of RT0 to c0.x and the 3rd component of
RT1 to the immediate 1.0. All components of all 8 render targets can be
aliased.

In addition to using alias.rt, the hardware needs to be informed about
which render target components are being aliased using the
SP_PS_ALIASED_COMPONENTS{_CONTROL} registers. This commit implements
those registers.

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31222>
2025-01-23 06:26:24 +00:00
Job Noorman
3290a3dcf3 tu: add chip param to tu6_emit_fs_outputs
We will need this to emit the a7xx-specific aliased components regs.

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31222>
2025-01-23 06:26:24 +00:00
Job Noorman
040b803e4b ir3: reuse ir3_find_output in ir3_find_output_regid
The search logic was duplicated here. Also added a new helper
ir3_get_output_regid to make the regid calculation reusable.

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31222>
2025-01-23 06:26:24 +00:00
Job Noorman
39206c1150 ir3: make shader output struct non-anonymous
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31222>
2025-01-23 06:26:24 +00:00
Job Noorman
b8b3fe20b2 tu,ir3: inform ir3 of dynamically remapped FS slots
The clear FS shaders will statically use slot numbers from 0 up to the
number of supported render targets. However, the driver will remap those
slots to the actual render targets being cleared.

This means that ir3 should not make any assumptions about the static
slot number in those cases. This is especially important when
implementing alias.rt, which statically encodes the render target.

Add an new ir3_shader_option (fragdata_dynamic_remap) which allows the
driver to indicate to ir3 that it will perform such dynamic remapping.

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31222>
2025-01-23 06:26:24 +00:00
Job Noorman
2a0a317244 ir3: make find_end a global helper
Rename to ir3_find_end and move to ir3.{h,c}.

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31222>
2025-01-23 06:26:24 +00:00
Job Noorman
f3026b3d3e ir3: add some preamble helpers
Helpers to check for preamble existence, find shpe, and create an empty
preamble.

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31222>
2025-01-23 06:26:24 +00:00
Job Noorman
144121b6df ir3/dce: support partial writes from collects
When alias.rt is used to alias certain output components, we might end
up with a situation where some, but not all, of the components of
collects end up being unused. This is currently not supported which
means we end up with useless moves (coming from copy lowering) for
aliased output components.

Fix this by adding support for partial wrmasks for collects in DCE. The
wrmasks are initially zeroed out and then updated based on the wrmask of
their users. Sources of collects for which the corresponding dst ends up
being unused are treated as unused as well. This allows us to remove
the useless output moves by simply updating the wrmask of the end
sources.

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31222>
2025-01-23 06:26:24 +00:00
Job Noorman
a7a357f91d ir3/legalize: insert (sy) to read consts after ldc.k
Observed when reading consts in the preamble using alias.rt.

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31222>
2025-01-23 06:26:24 +00:00
Job Noorman
96e08c3859 ir3/legalize: insert (ss) to read consts after stc
Observed when reading consts in the preamble using alias.rt.

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31222>
2025-01-23 06:26:24 +00:00
Job Noorman
9b6bca52d5 ir3: optimize alias register allocation by reusing GPRs
Allocate alias registers for an alias group while trying to minimize the
number of needed aliases. That is, if the allocated GPRs for the group
are (partially) consecutive, only allocate aliases to fill-in the gaps.
For example:
   sam ..., @{r1.x, r5.z, r1.z}, ...
only needs a single alias:
   alias.tex.b32.0 r1.y, r5.z
   sam ..., r1.x, ...

Also, try to reuse allocations of previous groups. For example, this is
relatively common:
   sam ..., @{r2.z, 0}, @{0}
Reusing the allocation of the first group for the second one gives this:
   alias.tex.b32.0 r2.w, 0
   sam ..., r2.z, r2.w

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31222>
2025-01-23 06:26:24 +00:00
Job Noorman
3fb0f54d70 ir3: add support for alias.tex
alias.tex allows us to construct an "alias table" that creates a mapping
between virtual alias registers and concrete GPRs, consts, or
immediates. The following texture instruction will lookup its sources in
this table and use the mapped value instead. This has a few advantages:
- We don't have to allocate consecutive registers (necessary for many
  tex sources) as we can just map them to consecutive alias registers.
- We don't have to allocate GPRs at all for consts and immediates.
- There's no delay penalty when initializing alias registers with consts
  or immediates.

For example, this code:
mov.u32u32 r1.x, r3.z
mov.u32u32 r1.y, c0.x
mov.u32u32 r1.z, 0
(rpt2)nop
sam ..., r1.x, ...

Can be implemented as follows:
alias.tex.b32.2 r40.x, r3.z
alias.tex.b32.0 r40.y, c0.x
alias.tex.b32.0 r40.z, 0
sam ..., r40.x, ...

Note that the alias registers (r40.xyz in this case) do not occupy GPR
space.

(More intelligent allocation strategies are possible; e.g., just mapping
r3.w and r4.x to c0.x and 0. This is implemented by the next commit.)

Support for alias.tex is implemented in two passes in ir3.

In a first pass, sources of tex instructions are replaced by alias
sources (IR3_REG_ALIAS) as follows:
- movs from const/imm: replace with the const/imm;
- collects: replace with the sources of the collect;
- GPR sources: simply mark as alias.

This way, RA won't be forced to allocate consecutive registers for
collects and useless collects/movs can be DCE'd. Note that simply
lowering collects to aliases doesn't work because RA would assume that
killed sources of aliases are dead, while they are in fact live until
the tex instruction that uses them.

The second pass inserts alias.tex instructions in front of the tex
instructions that need them and fixes up the tex instruction's sources.
This pass needs to run post-RA as discussed above. It also needs to run
post-legalization as all the sync flags need to be inserted based on the
registers instructions actually use, not on the alias registers they
have as sources.

This commit uses a very simple allocation strategy for alias registers:
simply allocate consecutive registers starting from r40.x. Note that
this works because the alias table is reset after a tex instruction is
executed so we don't have to worry about aliasing a live register.

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31222>
2025-01-23 06:26:24 +00:00
Job Noorman
4a9faaae17 ir3: add ir3_compiler::has_alias
Flag to detect support for alias.rt/alias.tex available in a7xx.

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31222>
2025-01-23 06:26:24 +00:00
Job Noorman
c5c95f8916 ir3: add validation for alias
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31222>
2025-01-23 06:26:24 +00:00
Job Noorman
84b93cf718 ir3: introduce alias goups
Alias registers allow us to allocate non-consecutive registers and remap
them to consecutive ones using alias.tex. We implement this by adding
the sources of collects directly to the sources of their users. This
way, RA treats them as scalar registers and we can remap them to
consecutive registers afterwards. To keep track of the scalar sources
that should be remapped together, the IR3_REG_FIRST_ALIAS flag is
introduced. Every source of such an "alias group" will have the
IR3_REG_ALIAS set, while the first one will also have
IR3_REG_FIRST_ALIAS set.

This commit also adds a number of helpers to iterate over sources while
keeping track of the original src index (i.e., before they were expanded
to alias goups), and to iterate the sources within an alias group. It
also introduces a new notation (@{regs...}) to clearly show alias groups
when printing instructions.

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31222>
2025-01-23 06:26:23 +00:00
Job Noorman
4c2fc07a7e ir3: teach backend about alias
Take the properties of alias.{rt,tex} and its registers into account:
- Don't count alias registers for GPR usage;
- Allow all immediates in alias regs;
- Fix properties like is_barrier and (ss) support;
- alias.rt dst is not a GPR, don't use it in legalize/postsched to track
  dependencies;

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31222>
2025-01-23 06:26:23 +00:00
Job Noorman
a325573aaf ir3/print: add support for alias
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31222>
2025-01-23 06:26:23 +00:00
Job Noorman
fb9de08efd ir3/a7xx: document alias.rt
It works completely differently from alias.tex.

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31222>
2025-01-23 06:26:23 +00:00
Job Noorman
d9241c6360 ir3/a7xx: handle alias.rt dst
alias.rt writes to a render target, not a GPR. Render targets are
disassembled as rtN.c.

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31222>
2025-01-23 06:26:23 +00:00
Job Noorman
dab47b55ef ir3/a7xx: implement and document unknown alias field
The UNK field encodes the table size for alias.tex: the first alias.tex
instruction uses it to indicate how many follow (i.e., it is the total
table size minus one).

Also switch from using a src to a cat7 field to store this value which
makes it a bit easier to handle.

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31222>
2025-01-23 06:26:23 +00:00
Job Noorman
af7c6f8dd5 ir3/a7xx: disasm halfness of alias dst
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31222>
2025-01-23 06:26:23 +00:00
Job Noorman
c4d84a8675 ir3/a7xx: properly handle alias scope and type
The alias scope and type bits are intertwined in the encoding:
- bit 47: low scope
- bit 48: type
- bit 49: high scope
- bit 50: type size

Combining the low and high scope bits, the value is used as follows:
- 0: tex
- 1: rt
- 2: mem
- 3: mem

I don't know what the difference between 2 and 3 is. The blob currently
doesn't use mem at all.

The type bit seems to be used to make a distinction between floating
point (f) and integer (b) sources. There doesn't seem to be any
functional difference and it only affects how immediates are displayed.

Note that I haven't exactly mimicked the blob in these cases:
- alias.tex.f16/32: the blob uses b16/32 while printing immediates in
  floating point notation. I think it make more sense to use f16/32.
- alias.rt.b16/32: the blob uses i16/32 here. I think it makes more
  sense to stick to a single notation (b).

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31222>
2025-01-23 06:26:23 +00:00
Job Noorman
2f629810aa ir3/parser: fix parsing integer as float
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31222>
2025-01-23 06:26:23 +00:00
Connor Abbott
15642c8ec2 tu: Handle non-identity GMEM swaps for input attachments
I believe nothing currently tests this, but this should be required by
analogy with the previous commit.

Fixes: 247d11d635 ("tu: Allow UBWC with images with swapped formats.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33115>
2025-01-23 05:53:40 +00:00
Connor Abbott
a104a7ca1a tu: Handle non-identity GMEM swaps when resolving
There is a single swap field for each color attachment, regardless of
whether it's in GMEM or not, and this does appear to be used in
GMEM mode when MUTABLEEN is set on the attachment. This means that when
a color attachment has a non-identity swap because it's mutable on a750,
we have to use the same corresponding swap when it's a source in a
GMEM resolve.  When using the fastpath, we have to make sure that the
swaps match because there aren't separate fields for GMEM and sysmem
swap.

This fixes dEQP-VK.image.mutable.2d.*_b8g8r8a8_unorm_draw_copy_resolve
with TU_DEBUG=gmem.

Fixes: 247d11d635 ("tu: Allow UBWC with images with swapped formats.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33115>
2025-01-23 05:53:40 +00:00
Connor Abbott
450755bd40 tu: Use image view format for sysmem resolves
The spec says that we're supposed to do this. This fixes the
newly-introduced tests dEQP-VK.image.mutable.*.*_draw_copy_resolve with
TU_DEBUG=sysmem.

Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33115>
2025-01-23 05:53:40 +00:00
Connor Abbott
47a85815b0 radv: Delete acceleration structure stubs
These are now provided by common code.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33153>
2025-01-23 05:16:58 +00:00
Connor Abbott
987e499253 anv: Delete acceleration structure stubs
These are now provided by common code.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33153>
2025-01-23 05:16:58 +00:00
Connor Abbott
3141033ac2 vk/bvh: Add default stubs for unsupported entrypoints
We don't currently support building acceleration structures on the CPU
or indirect building in the common framework, and drivers using it don't
either, but drivers have to return non-NULL entrypoints for CPU building
functions if they claim to support VK_KHR_acceleration_structure. Add
stub entrypoints here so that drivers don't have to have this
boilerplate.

Fixes dEQP-VK.api.version_check.entry_points on turnip.

Fixes: 671e3a65a6 ("tu: Support VK_KHR_acceleration_structure")
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33153>
2025-01-23 05:16:58 +00:00
Eric Engestrom
762cd246ee docs/release-calendar: push back the 24.3.x releases by one week
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33171>
2025-01-23 03:09:36 +00:00
Eric Engestrom
835ecc5758 docs: add sha sum for 24.3.4
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33171>
2025-01-23 03:09:36 +00:00
Eric Engestrom
e5ca260032 docs: add release notes for 24.3.4
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33171>
2025-01-23 03:09:36 +00:00
Eric Engestrom
3d3ac0de25 docs: update calendar for 24.3.4
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33171>
2025-01-23 03:09:36 +00:00
Danylo Piliaiev
244e408341 ir3: Consider const alloc alignment in free space size calcs
The alignment was considered only for offset, but its users
(at least ir3_nir_opt_preamble) expect the size itself to also
be aligned.

Fixes tests:
  dEQP-VK.spirv_assembly.instruction.graphics.16bit_storage.struct_mixed_types.uniform_buffer_block_geom
  dEQP-VK.spirv_assembly.instruction.graphics.16bit_storage.struct_mixed_types.uniform_buffer_block_tessc
  dEQP-VK.spirv_assembly.instruction.graphics.16bit_storage.struct_mixed_types.uniform_buffer_block_tesse
  gmem-dEQP-VK.spirv_assembly.instruction.graphics.16bit_storage.struct_mixed_types.uniform_buffer_block_tesse

Fixes: 922ef8e720
("ir3: Make allocation of consts more generic and order independent")

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33161>
2025-01-23 02:31:48 +00:00
Daniel Schürmann
1feb733cd4 Revert "nir: add nir_clear_divergence_info, use it in nir_opt_varyings"
This reverts commit 9d043e138d.

It is no longer needed. nir_convert_from_ssa() is now capable to
ignore divergence information.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33009>
2025-01-23 01:31:24 +00:00
Daniel Schürmann
f3be7ce01b nir/from_ssa: only consider divergence if requested
This pass used to unconditionally use divergence information
which forced the caller to either call divergence_analysis or
ensure that the divergence is properly reset.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33009>
2025-01-23 01:31:23 +00:00
Marek Olšák
e7214b9446 glapi: rename exported symbols so as not to conflict with old libglapi
libwaffle 1.7.0 has a hack that dlopen's libglapi with RTLD_GLOBAL, which
was meant to preload libglapi, but with this MR it overwrites libgallium's
own symbols, which breaks libgallium.

Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Eric Engestrom <None>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32789>
2025-01-23 00:49:05 +00:00
Marek Olšák
6e3ee3a072 loader: improve the existing loader-libgallium non-matching version error
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Eric Engestrom <None>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32789>
2025-01-23 00:49:05 +00:00
Marek Olšák
464dde302c glapi: remove the remap table
it's unused now

Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Eric Engestrom <None>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32789>
2025-01-23 00:49:05 +00:00
Marek Olšák
b22f682a31 glapi: stop using the remap table
The remap table adds an array lookup into 75% of CALL_* macros, which are
used to call GL functions through the dispatch table. Removing the array
lookup reduces overhead of dispatch table calls.

Since libglapi is now required to be from the same build as libgallium,
the remap table is no longer needed.

This change doesn't remove the remapping table. It only disables it.
Compare asm:

Before:
0000000000000000 <_mesa_unmarshal_Uniform1f>:
   0:   f3 0f 1e fa             endbr64
   4:   48 83 ec 08             sub    $0x8,%rsp
   8:   48 8b 05 00 00 00 00    mov    0x0(%rip),%rax        # f <_mesa_unmarshal_Uniform1f+0xf>
   f:   8b 4e 04                mov    0x4(%rsi),%ecx
  12:   31 d2                   xor    %edx,%edx
  14:   f3 0f 10 46 08          movss  0x8(%rsi),%xmm0
  19:   48 63 80 a8 01 00 00    movslq 0x1a8(%rax),%rax
  20:   85 c0                   test   %eax,%eax
  22:   78 08                   js     2c <_mesa_unmarshal_Uniform1f+0x2c>
  24:   48 8b 57 40             mov    0x40(%rdi),%rdx
  28:   48 8b 14 c2             mov    (%rdx,%rax,8),%rdx
  2c:   89 cf                   mov    %ecx,%edi
  2e:   ff d2                   call   *%rdx
  30:   b8 02 00 00 00          mov    $0x2,%eax
  35:   48 83 c4 08             add    $0x8,%rsp
  39:   c3                      ret

After:
0000000000000000 <_mesa_unmarshal_Uniform1f>:
   0:   f3 0f 1e fa             endbr64
   4:   48 89 f8                mov    %rdi,%rax
   7:   48 83 ec 08             sub    $0x8,%rsp
   b:   f3 0f 10 46 08          movss  0x8(%rsi),%xmm0
  10:   8b 7e 04                mov    0x4(%rsi),%edi
  13:   48 8b 40 40             mov    0x40(%rax),%rax
  17:   ff 90 10 10 00 00       call   *0x1010(%rax)
  1d:   b8 02 00 00 00          mov    $0x2,%eax
  22:   48 83 c4 08             add    $0x8,%rsp
  26:   c3                      ret

Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Eric Engestrom <None>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32789>
2025-01-23 00:49:05 +00:00
Marek Olšák
44bda7c258 dri: put shared-glapi into libgallium.*.so
so that we don't have to maintain a stable ABI for it.

This will allow removal of the remapping table to reduce CALL_* overhead
for GL dispatch tables.

Also we can now clean it up.

Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Eric Engestrom <None>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32789>
2025-01-23 00:49:05 +00:00
Daniel Schürmann
08560b8ff8 aco/lower_branches: stitch linear blocks if there is exactly one successor with one predecessor
Totals from 12906 (16.26% of 79395) affected shaders: (Navi31)

Instrs: 22051521 -> 22049488 (-0.01%); split: -0.01%, +0.00%
CodeSize: 116591240 -> 116583920 (-0.01%)
Latency: 196625178 -> 196538410 (-0.04%); split: -0.04%, +0.00%
InvThroughput: 33943045 -> 33930615 (-0.04%); split: -0.04%, +0.00%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32477>
2025-01-23 00:11:06 +00:00
Daniel Schürmann
c90ae5f773 aco: delete aco_jump_threading.cpp
This is now handled by lower_branches().

Totals from 47236 (59.49% of 79395) affected shaders: (Navi31)
Instrs: 29490400 -> 29490507 (+0.00%)
CodeSize: 152316812 -> 152317248 (+0.00%); split: -0.00%, +0.00%
Latency: 229665459 -> 229665106 (-0.00%); split: -0.00%, +0.00%
InvThroughput: 36870605 -> 36870504 (-0.00%); split: -0.00%, +0.00%
Copies: 1966751 -> 2233467 (+13.56%)
SALU: 3122941 -> 3123048 (+0.00%)

Note, that only about 20 shaders are actually affected.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32477>
2025-01-23 00:11:06 +00:00
Daniel Schürmann
c677809f25 aco/lower_branches: allow for non-fallthrough loop exits in try_merge_break_with_continue()
Totals from 211 (0.27% of 79395) affected shaders: (Navi31)

Instrs: 276961 -> 276545 (-0.15%)
CodeSize: 1404356 -> 1402248 (-0.15%)
Latency: 1344722 -> 1344887 (+0.01%); split: -0.00%, +0.01%
InvThroughput: 165624 -> 165622 (-0.00%); split: -0.00%, +0.00%
Branches: 6149 -> 5987 (-2.63%)
SALU: 25722 -> 25468 (-0.99%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32477>
2025-01-23 00:11:06 +00:00
Daniel Schürmann
12656ea5f5 aco: move try_merge_break_with_continue() to lower_branches()
Totals from 3 (0.00% of 79395) affected shaders: (Navi31)

Instrs: 12888 -> 12882 (-0.05%)
Latency: 83253 -> 83246 (-0.01%)
InvThroughput: 9251 -> 9249 (-0.02%)
Branches: 483 -> 480 (-0.62%)
SALU: 1329 -> 1326 (-0.23%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32477>
2025-01-23 00:11:06 +00:00
Daniel Schürmann
13ad3db43f aco/lower_branches: implement try_remove_simple_block() in lower_branches()
This is mostly the same as in jump_threading, but can handle
multiple predecessors.

Totals from 3523 (4.44% of 79395) affected shaders: (Navi31)

Instrs: 10244892 -> 10244753 (-0.00%); split: -0.00%, +0.00%
CodeSize: 54171500 -> 54168540 (-0.01%); split: -0.01%, +0.00%
Latency: 75070425 -> 75059570 (-0.01%); split: -0.02%, +0.00%
InvThroughput: 11606911 -> 11605767 (-0.01%); split: -0.01%, +0.00%
Branches: 331778 -> 331675 (-0.03%); split: -0.05%, +0.02%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32477>
2025-01-23 00:11:06 +00:00
Daniel Schürmann
2b5a893e29 aco/lower_branches: do eliminate_useless_exec_writes_in_block() during branch lowering.
Totals from 728 (0.92% of 79395) affected shaders: (Navi31)

Instrs: 452926 -> 452161 (-0.17%)
CodeSize: 2255536 -> 2252504 (-0.13%)
Latency: 1683404 -> 1683470 (+0.00%); split: -0.01%, +0.01%
InvThroughput: 210887 -> 210888 (+0.00%); split: -0.00%, +0.00%
SALU: 77865 -> 77106 (-0.97%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32477>
2025-01-23 00:11:06 +00:00
Daniel Schürmann
eecdb45d61 aco: consider s_cbranch_exec* instructions in needs_exec_mask()
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32477>
2025-01-23 00:11:06 +00:00
Daniel Schürmann
de1e38e214 aco/assembler: Find loop exits using the successor's loop nest depth
Previously, we just used the next block after a loop that
has a back-edge. This assumes that loop-exit blocks can
only be removed when falling through to the next block,
when in fact it can also be a jump to somewhere else,
in future even to some block before the actual loop.

12 (0.02% of 79395) affected shaders.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32477>
2025-01-23 00:11:06 +00:00