Aligning these can create situations where register allocation is
impossible. Only pseudo-instructions can use these, and they don't require
any alignment.
I'm not sure if these temporaries actually happen in practice. This
probably only affects the get_reg()'s compact_relocate_vars fallback path,
which doesn't usually happen for SGPRs.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34343>
D16 MIMG or pseudo-scalar transcendental instructions might give lower
limits to the maximum register available for their definitions, so just
try to place them earlier.
This is also part of fixing compact_relocate_vars with aligned NPOT
def/killed-op space (the second part is the later commit which changes
get_stride()).
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34343>
The src and dst queue family indices in an image memory barrier may
contain arbitrary values that can be ignored unless both are different.
This fixes a crash in upcoming CTS tests.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34691>
These stages are for the jobs that are skipped in merge pipelines,
automatically run in nightly pipelines, and are available to run
manually in other pipelines.
None of these ever run in post-merge pipelines.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34590>
Add new job definitions using the new debian/x86_64-test-video container,
and convert the radeonsi-raven-va and radeonsi-raven-vaapi-fluster jobs
to use the rootfs derived from this container.
Now that we are no longer downloading the Fluster vectors at runtime, the
parallelsim of the radeonsi-raven-vaapi-fluster job can be greatly
decreased.
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34451>
The register file tests here should be done after update_renames().
Normally, get_reg() wouldn't have to move anything to make space for a 1-3
byte definition. This was encountered with skip_optimistic_path=true and a
get_reg_impl() bug (fixed in a later commit) which caused suboptimal
register assignment.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34679>
Invalid for now, but used by vkd3d-proton, where the use case is to convert
a result matrix to lower precision, followed by a store.
For 16bit accumulation matrices, GFX11 only uses 16bits per 32bit register.
RADV's coop matrix code pads the unused space with undefs and uses a vector
with twice as many elements as the matrix length. Extending that to 8bit by
leaving 24 bits unused is unnecessary as these matrices as there
is no hw unit that requires it. And in wave32, it would also result in
vectors larger than NIR's limit.
So tightly pack 8bit matrices without any undef padding.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34382>
With the current code in clpeak LLVM ended up generating v_mad_u64_u32
instructions, with this we get nice v_mad_u32_s24 ones instead and an 4x
performance increase in the int24 benchmark.
Suggested-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34630>
clang 20 complains:
../src/amd/compiler/aco_assembler.cpp:837:28: error: writing 1 byte into a region of size 0 [-Werror=stringop-overflow=]
837 | vaddr[num_vaddr + i] = reg(ctx, instr->operands.back(), 8) + i + 1;
| ~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../src/amd/compiler/aco_assembler.cpp:832:12: note: at offset 5 into destination object ‘vaddr’ of size 5
832 | uint8_t vaddr[5] = {0, 0, 0, 0, 0};
| ^~~~~
../src/amd/compiler/aco_assembler.cpp:837:28: error: writing 1 byte into a region of size 0 [-Werror=stringop-overflow=]
837 | vaddr[num_vaddr + i] = reg(ctx, instr->operands.back(), 8) + i + 1;
| ~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../src/amd/compiler/aco_assembler.cpp:832:12: note: at offset 6 into destination object ‘vaddr’ of size 5
832 | uint8_t vaddr[5] = {0, 0, 0, 0, 0};
| ^~~~~
../src/amd/compiler/aco_assembler.cpp:837:28: error: writing 1 byte into a region of size 0 [-Werror=stringop-overflow=]
837 | vaddr[num_vaddr + i] = reg(ctx, instr->operands.back(), 8) + i + 1;
| ~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../src/amd/compiler/aco_assembler.cpp:832:12: note: at offset 7 into destination object ‘vaddr’ of size 5
832 | uint8_t vaddr[5] = {0, 0, 0, 0, 0};
| ^~~~~
But `i < MIN2(instr->operands.back().size() - 1, 5 - num_vaddr)` means `i` is
at most `5 - num_vaddr - 1`, which means `vaddr[num_vaddr + i]` =>
`vaddr[num_vaddr + 5 - num_vaddr - 1]` => `vaddr[5 - 1]` => `vaddr[4]` which
is within the valid indices.
For some reason, using signed `int` instead allows clang to figure this
out, so let's do that since we don't need the extra range.
While at it, use ARRAY_SIZE(vaddr) instead of hard-coding the same `5`
in several places.
Backport-to: 25.0
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34625>
No change in the size on GPUs with 16 CUs per SE such as Navi31 and Navi48.
Navi21 and Navi32 get 25% increase. (20 CUs per SE)
APUs get a significant decrease. For example:
- Phoenix gets 25% decrease
- Vangogh gets 50% decrease
- Phoenix2 gets 75% decrease
- Raphael and Stoney get 87.5% decrease
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34544>