Commit graph

228 commits

Author SHA1 Message Date
Daniel Schürmann
7b1f6fa6fc aco: remove radeon_family from aco::Program
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38701>
2025-12-22 07:34:48 +00:00
Daniel Schürmann
febc29907c amd: add and use ac_cu_info::has_gfx6_mrt_export_bug
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38701>
2025-12-22 07:34:47 +00:00
Daniel Schürmann
7b7bdb76ab amd: add ac_cu_info::has_point_sample_accel flag and use in ACO
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38701>
2025-12-22 07:34:47 +00:00
Daniel Schürmann
cfb745592d amd: add ac_cu_info::has_mad32 flag and use in ACO
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38701>
2025-12-22 07:34:47 +00:00
Daniel Schürmann
1e3db50170 aco: use additional flags from ac_cu_info
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38701>
2025-12-22 07:34:46 +00:00
Daniel Schürmann
f791e46c47 aco: add ac_cu_info to aco_compiler_options
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38701>
2025-12-22 07:34:46 +00:00
Daniel Schürmann
addd4ea59f aco: pass aco_compiler_options to init_program()
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38701>
2025-12-22 07:34:46 +00:00
Daniel Schürmann
bf9bec07c2 aco/tests: don't pass CHIP_UNKNOWN to ACO
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38701>
2025-12-22 07:34:46 +00:00
Daniel Schürmann
0db1ae1f01 aco: disable XNACK on all GPUs
Affects code generation on GFX8 and GFX9 APUs where we misunderstood
the feature. XNACK replay is not being used with graphics APIs.

Totals from 41759 (65.90% of 63370) affected shaders: (Raven)

MaxWaves: 298672 -> 299000 (+0.11%)
Instrs: 19200726 -> 19138227 (-0.33%); split: -0.33%, +0.00%
CodeSize: 98501904 -> 98253196 (-0.25%); split: -0.26%, +0.00%
SGPRs: 3058544 -> 2831492 (-7.42%)
VGPRs: 1644896 -> 1643660 (-0.08%)
Latency: 193383803 -> 193224047 (-0.08%); split: -0.08%, +0.00%
InvThroughput: 92741082 -> 92698975 (-0.05%); split: -0.05%, +0.00%
SClause: 678580 -> 630107 (-7.14%); split: -7.15%, +0.00%
Copies: 1863375 -> 1863406 (+0.00%); split: -0.04%, +0.04%
VALU: 13791245 -> 13791267 (+0.00%); split: -0.00%, +0.00%
SALU: 2066726 -> 2066741 (+0.00%); split: -0.04%, +0.04%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38701>
2025-12-22 07:34:43 +00:00
Marek Olšák
9b011a7344 amd: rename most GFX115x definitions for released chips
addrlib changes match the original code.

Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38718>
2025-12-03 13:29:07 +00:00
Antonio Ospite
222b85328e mesa: replace most occurrences of getenv() with os_get_option()
The standard way to query options in mesa is `os_get_option()` which
abstracts platform-specific mechanisms to get config variables.

However in quite a few places `getenv()` is still used and this may
preclude controlling some options on some systems.

For instance it is not generally possible to use `MESA_DEBUG` on
Android.

So replace most `getenv()` occurrences with  `os_get_option()` to
support configuration options more consistently across different
platforms.

Do the same with `secure_getenv()` replacing it with
`os_get_option_secure()`.

The bulk of the proposed changes are mechanically performed by the
following script:

-----------------------------------------------------------------------
  #!/bin/sh

  set -e

  replace() {

    # Don't replace in some files, for example where `os_get_option` is defined,
    # or in external files
    EXCLUDE_FILES_PATTERN='(src/util/os_misc.c|src/util/u_debug.h|src/gtest/include/gtest/internal/gtest-port.h)'

    # Don't replace some "system" variables
    EXCLUDE_VARS_PATTERN='("XDG|"DISPLAY|"HOME|"TMPDIR|"POSIXLY_CORRECT)'

    git grep "[=!( ]$1(" -- src/ | cut -d ':' -f 1 | sort | uniq | \
      grep -v -E "$EXCLUDE_FILES_PATTERN" | \
      while read -r file;
      do
        # Don't replace usages of XDG_* variables or HOME
        sed -E -e "/$EXCLUDE_VARS_PATTERN/!s/([=!\( ])$1\(/\1$2\(/g" -i "$file";
      done
  }

  # Add const to os_get_option results, to avoid warning about discarded qualifier:
  #   warning: initialization discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
  # but also errors in some cases:
  #   error: invalid conversion from ‘const char*’ to ‘char*’ [-fpermissive]
  add_const_results() {
    git grep -l -P '(?<!const )char.*os_get_option' | \
      while read -r file;
      do
        sed -e '/^\s*const/! s/\(char.*os_get_option\)/const \1/g' -i "$file"
      done
  }

  replace 'secure_getenv' 'os_get_option_secure'

  replace 'getenv' 'os_get_option'

  add_const_results
-----------------------------------------------------------------------

After this, the `#include "util/os_misc.h"` is also added in files where
`os_get_option()` was not used before.

And since the replacements from the script above generated some new
`-Wdiscarded-qualifiers` warnings, those have been addressed as well,
generally by declaring `os_get_option()` results as `const char *` and
adjusting some function declarations.

Finally some replacements caused new errors like:

-----------------------------------------------------------------------
../src/gallium/auxiliary/gallivm/lp_bld_misc.cpp:127:31: error: no matching function for call to 'strtok'
  127 |          for (n = 0, option = strtok(env_llc_options, " "); option; n++, option = strtok(NULL, " ")) {
      |                               ^~~~~~
/android-ndk-r27c/toolchains/llvm/prebuilt/linux-x86_64/bin/../sysroot/usr/include/string.h:124:17: note: candidate function not viable: 1st argument ('const char *') would lose const qualifier
  124 | char* _Nullable strtok(char* _Nullable __s, const char* _Nonnull __delimiter);
      |                 ^      ~~~~~~~~~~~~~~~~~~~
-----------------------------------------------------------------------

Those have been addressed too, copying the const string returned by
`os_get_option()` so that it could be modified.

In particular, the error above has been fixed  by copying the `const
char *env_llc_options` variable in
`src/gallium/auxiliary/gallivm/lp_bld_misc.cpp` to a `char *` which can
be tokenized using `strtok()`.

Reviewed-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38128>
2025-11-06 04:36:13 +00:00
Daniel Schürmann
fe6ff6d1ef aco: remove DeviceInfo::lds_encoding_granule and DeviceInfo::lds_alloc_granule
Use utility functions instead.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37577>
2025-10-15 11:20:08 +00:00
Georg Lehmann
cc08786689 aco: use maximum RT vgpr_limit that doesn't reduce wave count
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
144 instead of 132 with 5 waves, in practice.

Foz-DB Navi31:
Totals from 33 (0.04% of 80273) affected shaders:
Instrs: 3266241 -> 3261329 (-0.15%)
CodeSize: 16885356 -> 16860088 (-0.15%)
VGPRs: 4356 -> 4752 (+9.09%)
SpillVGPRs: 2504 -> 1535 (-38.70%)
Scratch: 264704 -> 216320 (-18.28%)
Latency: 18445909 -> 18395904 (-0.27%)
InvThroughput: 3689182 -> 3679182 (-0.27%)
VClause: 85171 -> 84595 (-0.68%)
SClause: 59365 -> 59320 (-0.08%); split: -0.08%, +0.01%
Copies: 260528 -> 259113 (-0.54%); split: -0.59%, +0.05%
Branches: 92537 -> 92519 (-0.02%)
VALU: 1937426 -> 1935925 (-0.08%); split: -0.08%, +0.01%
SALU: 393075 -> 393047 (-0.01%); split: -0.01%, +0.01%
VMEM: 147914 -> 146003 (-1.29%)

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37548>
2025-09-26 08:45:05 +00:00
Georg Lehmann
26e041e821 aco: remove existing dealloc_vgprs use
We didn't consider that s_sendmsg dealloc_vgpr waits for all counters
expect vscnt.

Foz-DB Navi31:
Totals from 74090 (92.52% of 80084) affected shaders:
Instrs: 36031071 -> 35853573 (-0.49%)
CodeSize: 189233756 -> 188523764 (-0.38%)
Latency: 222378318 -> 222374890 (-0.00%)
InvThroughput: 33366893 -> 33362457 (-0.01%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37508>
2025-09-26 07:51:02 +00:00
nihui
849344dc08 aco: set program->dev.fused_mad_mix=true for GFX940
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35655>
2025-09-16 07:02:32 +00:00
Natalie Vock
917a98b722 aco: Add ABI and Pseudo CALL format
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34531>
2025-09-15 17:16:20 +00:00
Rhys Perry
ac882985c0 aco/gfx10: skip waitcnts or use vm_vsrc(0) for workgroup vmem barriers
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36491>
2025-09-09 12:34:40 +00:00
Rhys Perry
21332609b9 aco: don't move acquire barriers before interlock begin
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36491>
2025-09-09 12:34:40 +00:00
Rhys Perry
0ee1c137f9 aco: don't move release barriers after interlock end
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36491>
2025-09-09 12:34:40 +00:00
Rhys Perry
7c056dd473 aco: add is_atomic_or_control_instr helper
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36491>
2025-09-09 12:34:40 +00:00
Georg Lehmann
a4c537c5b3 aco: use new disable_wqm for mubuf/mtbuf
Foz-DB GFX1201:
Totals from 66 (0.08% of 80251) affected shaders:
Instrs: 45373 -> 45663 (+0.64%); split: -0.01%, +0.65%
CodeSize: 251708 -> 252900 (+0.47%); split: -0.00%, +0.48%
Latency: 278977 -> 278652 (-0.12%); split: -0.14%, +0.02%
InvThroughput: 38259 -> 38245 (-0.04%); split: -0.05%, +0.02%
VClause: 982 -> 962 (-2.04%)
Copies: 2882 -> 2808 (-2.57%)
PreSGPRs: 2564 -> 2599 (+1.37%)
SALU: 4748 -> 5010 (+5.52%)

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970>
2025-08-15 07:03:46 +00:00
Antonio Ospite
ddf2aa3a4d build: avoid redefining unreachable() which is standard in C23
In the C23 standard unreachable() is now a predefined function-like
macro in <stddef.h>

See https://android.googlesource.com/platform/bionic/+/HEAD/docs/c23.md#is-now-a-predefined-function_like-macro-in

And this causes build errors when building for C23:

-----------------------------------------------------------------------
In file included from ../src/util/log.h:30,
                 from ../src/util/log.c:30:
../src/util/macros.h:123:9: warning: "unreachable" redefined
  123 | #define unreachable(str)    \
      |         ^~~~~~~~~~~
In file included from ../src/util/macros.h:31:
/usr/lib/gcc/x86_64-linux-gnu/14/include/stddef.h:456:9: note: this is the location of the previous definition
  456 | #define unreachable() (__builtin_unreachable ())
      |         ^~~~~~~~~~~
-----------------------------------------------------------------------

So don't redefine it with the same name, but use the name UNREACHABLE()
to also signify it's a macro.

Using a different name also makes sense because the behavior of the
macro was extending the one of __builtin_unreachable() anyway, and it
also had a different signature, accepting one argument, compared to the
standard unreachable() with no arguments.

This change improves the chances of building mesa with the C23 standard,
which for instance is the default in recent AOSP versions.

All the instances of the macro, including the definition, were updated
with the following command line:

  git grep -l '[^_]unreachable(' -- "src/**" | sort | uniq | \
  while read file; \
  do \
    sed -e 's/\([^_]\)unreachable(/\1UNREACHABLE(/g' -i "$file"; \
  done && \
  sed -e 's/#undef unreachable/#undef UNREACHABLE/g' -i src/intel/isl/isl_aux_info.c

Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36437>
2025-07-31 17:49:42 +00:00
Natalie Vock
ea66a8d1c5 aco,nir: Add support for GFX12 ds_bvh_stack_push8_pop1_rtn_b32 instruction
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35269>
2025-07-15 21:34:40 +00:00
Natalie Vock
9707b30965 nir,aco: Add ds_bvh_stack_rtn
This is a ds instruction that also overwrites its first input, so
introduce a new ds format with two outputs.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35269>
2025-07-15 21:34:39 +00:00
Natalie Vock
c515f1fd58 aco: Use vector-aligned operands for image_bvh8_intersect_ray
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35269>
2025-07-15 21:34:38 +00:00
Georg Lehmann
d45f375a9d aco: only insert fp mode when needed
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35746>
2025-07-10 13:48:50 +00:00
Natalie Vock
630913e1b4 aco: Introduce static_scratch_rsrc program member
Function callees get their scratch resource as a parameter instead of
generating it on-the-fly.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35031>
2025-06-26 11:02:53 +00:00
Natalie Vock
4a62b342f3 aco: Add common utility to load scratch descriptor
Also modifies the scratch descriptor to take the stack pointer into
account.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35031>
2025-06-26 11:02:52 +00:00
Georg Lehmann
001cd632ee aco: select float8 to fp32 conversions
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35434>
2025-06-23 07:59:27 +00:00
Georg Lehmann
9e6adcbca0 aco: select fp32 to float8 conversions
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35434>
2025-06-23 07:59:26 +00:00
Georg Lehmann
94c191e6d9 aco: remove p_v_cvt_pk_u8_f32
Now unused.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35391>
2025-06-10 07:32:04 +00:00
Rhys Perry
86ccceb4de aco: don't consider gfx1153 to have point sample acceleration
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34978>
2025-06-06 11:55:13 +01:00
Rhys Perry
1fdfdbaf92 aco/hard_clauses: simplify and complete get_type()
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
This now includes image_msaa_load and the new atomic instructions in
GFX12.

It also treats point sample accelerated MIMG as either sample or load,
like the waitcnt insertion pass. I'm not sure if that's necessary or not,
though.

No fossil-db changes (gfx1201, gfx1150 and navi31).

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35235>
2025-06-02 10:28:10 +00:00
Rhys Perry
8764ec0230 aco: consider image_msaa_load a sample operation before gfx12
LLVM commit 62dea99a7d7df9daedbb86133f3d46699cd2728d made this instruction
a sample for all GFX levels, then with f898161bfa95723954a273a519180e070a5ccd2e
it was changed to be GFX12+. Now 34b6285735c999d2fab77b0ff8e5b497d86df3af
changed it to be all GFX levels again.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35235>
2025-06-02 10:28:09 +00:00
Rhys Perry
c04ef28c56 aco: rename ops_fixed_to_def to tied_defs
This is less words.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34700>
2025-05-20 15:40:46 +00:00
Georg Lehmann
0257644130 aco: assume sram ecc is enabled on Vega20
There are D16 load issues on Vega20 that are expected if sram ecc is enabled.
It's a professional class chip and I found mentions of it supporting ecc,
so assume it's enabled.

Maybe this could be improved by querying ecc info from the kernel, but
I'm not sure which query should be used.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13189
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12393
Cc: mesa-stable

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35045>
2025-05-20 08:24:21 +00:00
Georg Lehmann
f98d20fec6 aco: replace get_operand_size with get_operand_type
To not use constant_bits everywhere.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35020>
2025-05-19 13:05:48 +00:00
Georg Lehmann
fa3f207035 aco: swap operands without instructions
For future work.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35020>
2025-05-19 13:05:48 +00:00
Georg Lehmann
3f70433ff0 aco: add type information for operands/definitions
More information available for use in the optimizer.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29695>
2025-05-15 12:17:17 +00:00
Rhys Perry
171920ceed aco/gfx115: consider point sample acceleration
Like 15428e0d786939a5c7629a9978947c8a9112ce96 in LLVM.

fossil-db (gfx1150):
Totals from 909 (1.14% of 79653) affected shaders:
Instrs: 5840489 -> 5840705 (+0.00%); split: -0.00%, +0.00%
CodeSize: 31133460 -> 31134296 (+0.00%); split: -0.00%, +0.00%
Latency: 52982280 -> 53438577 (+0.86%); split: -0.00%, +0.86%
InvThroughput: 10841454 -> 10942682 (+0.93%); split: -0.00%, +0.93%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Backport-to: 25.0
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34935>
2025-05-14 11:22:13 +00:00
Georg Lehmann
c0e88c376a aco/optimizer: validate context data
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34858>
2025-05-09 14:23:25 +00:00
Georg Lehmann
906b7dbcec aco: replace novalidateir with novalidate debug option
The next commits will add more validation that's enabled by default.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34858>
2025-05-09 14:23:25 +00:00
Rhys Perry
6338ed44c5 aco/gfx12: increase maximum vbuffer offset
fossil-db (gfx1201):
Totals from 301 (0.38% of 79377) affected shaders:
Instrs: 2734478 -> 2728816 (-0.21%); split: -0.21%, +0.00%
CodeSize: 14347476 -> 14306568 (-0.29%)
Latency: 15508055 -> 15502202 (-0.04%); split: -0.04%, +0.00%
InvThroughput: 2846419 -> 2842387 (-0.14%); split: -0.14%, +0.00%
VClause: 68286 -> 68101 (-0.27%); split: -0.30%, +0.03%
SClause: 49487 -> 49500 (+0.03%)
Copies: 207179 -> 206093 (-0.52%); split: -0.57%, +0.04%
Branches: 72941 -> 72942 (+0.00%); split: -0.00%, +0.00%
VALU: 1549156 -> 1544727 (-0.29%); split: -0.29%, +0.00%
SALU: 339620 -> 338989 (-0.19%); split: -0.19%, +0.00%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34730>
2025-04-29 17:44:41 +00:00
Rhys Perry
d987d5e341 aco/gfx12: increase maximum global/scratch offset
fossil-db (gfx1201):
Totals from 29 (0.04% of 79377) affected shaders:
Instrs: 1439082 -> 1439070 (-0.00%)
CodeSize: 7641688 -> 7641608 (-0.00%)
Latency: 9836296 -> 9836280 (-0.00%)
VALU: 799504 -> 799496 (-0.00%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34730>
2025-04-29 17:44:41 +00:00
Rhys Perry
02d193f058 aco/gfx12: increase maximum smem offset
fossil-db (gfx1201):
Totals from 55 (0.07% of 79377) affected shaders:
Instrs: 3203775 -> 3200809 (-0.09%)
CodeSize: 16817140 -> 16813440 (-0.02%); split: -0.04%, +0.02%
Latency: 17838315 -> 17829658 (-0.05%)
InvThroughput: 3352905 -> 3351689 (-0.04%)
SClause: 57377 -> 57273 (-0.18%)
Copies: 231006 -> 230941 (-0.03%)
SALU: 436900 -> 435234 (-0.38%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34730>
2025-04-29 17:44:41 +00:00
Rhys Perry
c26851b80b aco: increase max_const_offset_plus_one for SMEM load_global
fossil-db (gfx1201):
Totals from 1115 (1.40% of 79377) affected shaders:
Instrs: 1473805 -> 1467571 (-0.42%); split: -0.43%, +0.01%
CodeSize: 7852972 -> 7819656 (-0.42%); split: -0.44%, +0.02%
SpillSGPRs: 1632 -> 1460 (-10.54%); split: -11.27%, +0.74%
Latency: 11975762 -> 11971915 (-0.03%); split: -0.05%, +0.02%
InvThroughput: 2496961 -> 2496448 (-0.02%); split: -0.03%, +0.01%
VClause: 25213 -> 25218 (+0.02%); split: -0.00%, +0.02%
SClause: 28822 -> 28565 (-0.89%); split: -1.41%, +0.52%
Copies: 106377 -> 105715 (-0.62%); split: -1.23%, +0.61%
Branches: 27497 -> 27473 (-0.09%)
PreSGPRs: 52071 -> 51310 (-1.46%)
VALU: 871051 -> 870694 (-0.04%); split: -0.04%, +0.00%
SALU: 186090 -> 181811 (-2.30%); split: -2.32%, +0.02%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34730>
2025-04-29 17:44:41 +00:00
Konstantin Seurer
978e9b670e aco,nir: Add support for new GFX12 ray tracing instructions
Adds image_bvh_dual_intersect_ray and image_bvh8_intersect_ray which can
handle the new BVH format. Both instructions write up to 10 VGPRs so
they need to use a vec16 definition in nir.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34273>
2025-04-17 20:20:40 +00:00
Natalie Vock
f309d76aab aco: Add support for multiple ops fixed to defs
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34273>
2025-04-17 20:20:40 +00:00
Natalie Vock
d1ff9e951a aco: Fix RT VGPR limit on Navi31/32, GFX11.5, GFX12
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Since 128 is not a multiple of the VGPR allocation granule, we will
actually allocate 134 VGPRs. No reason not to use the extra 6.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34265>
2025-04-09 10:02:52 +00:00
Georg Lehmann
64cae5c48d aco: form mixed MTBUF/MUBUF clauses
This should be one clause (all of the instructions load from the same vertex buffer)

s_clause 0x2                                                ; bfa10002
tbuffer_load_format_xyzw v[8:11], v5, s[4:7], 0 format:[BUF_FMT_8_8_8_8_UNORM] idxen offset:36 ; e9c32024 80010805
tbuffer_load_format_xyzw v[12:15], v5, s[4:7], 0 format:[BUF_FMT_8_8_8_8_UNORM] idxen offset:16 ; e9c32010 80010c05
tbuffer_load_format_xyzw v[16:19], v5, s[4:7], 0 format:[BUF_FMT_8_8_8_8_UNORM] idxen offset:12 ; e9c3200c 80011005
s_clause 0x2                                                ; bfa10002
buffer_load_dwordx3 v[20:22], v5, s[4:7], 0 idxen           ; e03c2000 80011405
buffer_load_dwordx3 v[23:25], v5, s[4:7], 0 idxen offset:20 ; e03c2014 80011705
buffer_load_dwordx4 v[28:31], v5, s[4:7], 0 idxen offset:48 ; e0382030 80011c05
tbuffer_load_format_xy v[0:1], v5, s[4:7], 0 format:[BUF_FMT_8_8_UNORM] idxen offset:32 ; e8712020 80010005

Foz-DB Navi21:
Totals from 5624 (7.08% of 79395) affected shaders:
MaxWaves: 149894 -> 149898 (+0.00%)
Instrs: 3032697 -> 3034853 (+0.07%); split: -0.05%, +0.12%
CodeSize: 15907852 -> 15915752 (+0.05%); split: -0.05%, +0.10%
VGPRs: 216248 -> 216144 (-0.05%)
Latency: 10955137 -> 11008760 (+0.49%); split: -0.22%, +0.70%
InvThroughput: 2032857 -> 2033916 (+0.05%); split: -0.03%, +0.08%
VClause: 50120 -> 41778 (-16.64%); split: -16.66%, +0.02%
SClause: 62034 -> 62004 (-0.05%); split: -0.33%, +0.29%
Copies: 253836 -> 254505 (+0.26%); split: -0.17%, +0.43%
VALU: 1621606 -> 1622274 (+0.04%); split: -0.03%, +0.07%
SALU: 653251 -> 653252 (+0.00%)

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34379>
2025-04-08 09:22:04 +00:00