Reported by clang tools.
See: https://clangd.llvm.org/guides/include-cleaner
struct ac_cmdbuf had to be moved to ac_cmdbuf_base.h because we can't
include ac_cmdbuf.h->sid.h->amdgfxregs.h in radeon_winsys.h for r300.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41091>
It's not good for performance, but it's possible to use for debugging.
Running single-wave GS workgroups could work around any LDS race conditions.
Setting the workgroup size to 64 reliably works around
GLCTS *primitive_counter*line failures, indicating streamout data
corruption with multi-wave GS workgroups.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38328>
It's not only for GL, change to a generic name.
Use command:
find . -type f -not -path '*/.git/*' -exec sed -i 's/\bgl_shader_stage\b/mesa_shader_stage/g' {} +
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Yonggang Luo <luoyonggang@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36569>
In the C23 standard unreachable() is now a predefined function-like
macro in <stddef.h>
See https://android.googlesource.com/platform/bionic/+/HEAD/docs/c23.md#is-now-a-predefined-function_like-macro-in
And this causes build errors when building for C23:
-----------------------------------------------------------------------
In file included from ../src/util/log.h:30,
from ../src/util/log.c:30:
../src/util/macros.h:123:9: warning: "unreachable" redefined
123 | #define unreachable(str) \
| ^~~~~~~~~~~
In file included from ../src/util/macros.h:31:
/usr/lib/gcc/x86_64-linux-gnu/14/include/stddef.h:456:9: note: this is the location of the previous definition
456 | #define unreachable() (__builtin_unreachable ())
| ^~~~~~~~~~~
-----------------------------------------------------------------------
So don't redefine it with the same name, but use the name UNREACHABLE()
to also signify it's a macro.
Using a different name also makes sense because the behavior of the
macro was extending the one of __builtin_unreachable() anyway, and it
also had a different signature, accepting one argument, compared to the
standard unreachable() with no arguments.
This change improves the chances of building mesa with the C23 standard,
which for instance is the default in recent AOSP versions.
All the instances of the macro, including the definition, were updated
with the following command line:
git grep -l '[^_]unreachable(' -- "src/**" | sort | uniq | \
while read file; \
do \
sed -e 's/\([^_]\)unreachable(/\1UNREACHABLE(/g' -i "$file"; \
done && \
sed -e 's/#undef unreachable/#undef UNREACHABLE/g' -i src/intel/isl/isl_aux_info.c
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36437>
This will enable large code removal.
shader->config.lds_size is now always computed the same as ACO except for
compute shaders.
We have to add a new 8-bit user SGPR bitfield called
GS_STATE_GS_OUT_LDS_OFFSET_256B, which contains the offset
that was previously set by the relocation.
Since the offset must be a multiple of 256, we have to add padding
to the LDS size computation to make sure the alignment to 256 for the ESGS
LDS size doesn't cause us to exceed the maximum LDS size.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35529>
This has no effect on triangles because max 64 patches implied max 192
threads, but it improves performance for cases when the number of threads
per patch is > 3.
This improves the score for gfxbench5 "gl_tess_off" (offscreen) by 11%
on Navi48.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34863>
Using the used component count is not enough. We need to consider
the component mask because any component can be disabled. This might
fix tests.
This removes the component counting from ac_get_fs_input_vgpr_cnt
and determines the component mask where it's needed.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32910>
This should be faster than 32_ABGR.
Also, stencil exports are changed from UINT16_ABGR to 32_GR,
which should have no effect on performance.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33046>
It's split into ac_nir_lower_ps_early ac_nir_lower_ps_late.
ac_nir_lower_ps_early doesn't generate any AMD specific intrinsics except
some system values and is mainly an optimization pass with some lowering.
The new change here is that it also eliminates output components not needed
by spi_shader_col_format.
ac_nir_lower_ps_late lowers output stores to exports and does the bc_optimize
thing.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32966>
If there is a 4-byte hole between 2 loads, they are vectorized. Example:
load 4 + hole 4 + load 8 -> load 16
This helps GLSL uniform loads, which are often sparse. See the code for more
info.
RADV could get better code by vectorizing later.
radeonsi+ACO - TOTALS FROM AFFECTED SHADERS (45482/58355)
Spilled SGPRs: 841 -> 747 (-11.18 %)
Code Size: 67552396 -> 65291092 (-3.35 %) bytes
Max Waves: 714439 -> 714520 (0.01 %)
This should have no effect on LLVM because ac_build_buffer_load scalarizes
SMEM, but it's improved for some reason:
radeonsi+LLVM - TOTALS FROM AFFECTED SHADERS (4673/58355)
Spilled SGPRs: 1450 -> 1282 (-11.59 %)
Spilled VGPRs: 106 -> 107 (0.94 %)
Scratch size: 101 -> 102 (0.99 %) dwords per thread
Code Size: 14994624 -> 14956316 (-0.26 %) bytes
Max Waves: 66679 -> 66735 (0.08 %)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29399>
There is nothing preventing ACO from generating loads with unused
components. This happens often with GLSL uniforms. Some of those loads
are partially re-vectorized after this.
radeonsi+ACO:
TOTALS FROM AFFECTED SHADERS (19564/58918)
VGPRs: 732900 -> 728448 (-0.61 %)
Spilled SGPRs: 429 -> 433 (0.93 %)
Code Size: 38446004 -> 38485612 (0.10 %) bytes
Max Waves: 305440 -> 305549 (0.04 %)
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29399>
A negative hole size means the loads overlap. This will be used by drivers
to handle overlapping loads in the callback easily.
Reviewed-by: Mel Henning <drawoc@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32699>
This injects a MRTZ export with only the alpha channel to select it
with COVERAGE_TO_MASK_ENABLE for alpha-to-coverage.
Co-Authored-by: Rhys Perry <pendingchaos02@gmail.com>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32583>
This will need to be set to true when the GLSL linker lowers IO, which
can later be unlowered by st/mesa, and then drivers can lower it again
without load_interpolated_input. Therefore, it can't be a global
immutable option.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32229>
This was enabled by default in nir_opt_varyings, but vc4 can't handle
when shader outputs write Y but not X. Add an option for it and enable
it only for the driver that benefits from it.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32174>
It will be used to allow merging loads with a hole between them.
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29398>