Commit graph

335 commits

Author SHA1 Message Date
Samuel Pitoiset
8e4d5743d2 radv: move debug related drirc to radv_drirc::debug
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37145>
2025-09-05 05:56:17 +00:00
Georg Lehmann
83326af899 nir/builder: add nir_inverse_ballot_imm
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37178>
2025-09-04 14:03:56 +00:00
Georg Lehmann
ef8c364d3d nir: make inverse_ballot 1bit only
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37178>
2025-09-04 14:03:56 +00:00
Samuel Pitoiset
decf9af472 radv/rt: only use one user SGPR for the traversal shader addr
All shaders are allocated in the 32-bit addr space. To avoid an issue
with alignment, and also for future work, there is an unused user SGPR.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37133>
2025-09-03 05:53:41 +00:00
Daniel Schürmann
fcf8899c9e radv/rt: use ACCESS_CAN_REORDER when loading SBT entries
Totals from 56 (0.07% of 79839) affected shaders: (Navi48)

Instrs: 2790220 -> 2790130 (-0.00%); split: -0.00%, +0.00%
CodeSize: 14704952 -> 14704292 (-0.00%)
Latency: 13994383 -> 13953444 (-0.29%); split: -0.29%, +0.00%
InvThroughput: 2717973 -> 2710748 (-0.27%); split: -0.27%, +0.00%
VClause: 68783 -> 68687 (-0.14%)
SClause: 51910 -> 52007 (+0.19%)
Copies: 223192 -> 223190 (-0.00%); split: -0.01%, +0.01%
VALU: 1557513 -> 1557451 (-0.00%); split: -0.00%, +0.00%
VMEM: 118789 -> 118692 (-0.08%)
SMEM: 66498 -> 66595 (+0.15%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36933>
2025-09-02 19:07:30 +00:00
Samuel Pitoiset
bc9a020dd3 radv: rename NGG culling user SGPRs
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37022>
2025-09-01 08:52:55 +00:00
Marek Olšák
9e16ed7a13 ac/nir: switch nir_load_smem_amd uses to ac_nir_load_smem wrapper
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
ac_nir_load_smem will use load_global_amd

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37101>
2025-08-30 15:04:32 -04:00
Georg Lehmann
acd879f096 radv: set ACCESS_CAN_SPECULATE for smem buffer loads with known good descriptors
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Foz-DB GFX1201:
Totals from 2872 (3.59% of 80098) affected shaders:
MaxWaves: 78208 -> 78234 (+0.03%); split: +0.21%, -0.18%
Instrs: 6214171 -> 6193701 (-0.33%); split: -0.40%, +0.07%
CodeSize: 33121244 -> 33113692 (-0.02%); split: -0.18%, +0.16%
VGPRs: 151680 -> 152016 (+0.22%); split: -0.25%, +0.47%
SpillSGPRs: 775 -> 776 (+0.13%)
Latency: 46080905 -> 45955331 (-0.27%); split: -0.55%, +0.28%
InvThroughput: 6235954 -> 6250598 (+0.23%); split: -0.25%, +0.48%
VClause: 111125 -> 110955 (-0.15%); split: -0.17%, +0.02%
SClause: 221845 -> 214761 (-3.19%); split: -3.20%, +0.01%
Copies: 501387 -> 488215 (-2.63%); split: -2.96%, +0.33%
Branches: 191455 -> 178574 (-6.73%)
PreSGPRs: 146364 -> 146923 (+0.38%); split: -0.12%, +0.50%
PreVGPRs: 120813 -> 121073 (+0.22%)
VALU: 3139282 -> 3137471 (-0.06%); split: -0.11%, +0.05%
SALU: 1079863 -> 1083158 (+0.31%); split: -0.55%, +0.86%
VMEM: 182255 -> 182247 (-0.00%)
SMEM: 293409 -> 290233 (-1.08%)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36938>
2025-08-27 09:45:19 +00:00
Georg Lehmann
5a10142a9f radv/nir/lower_cmat: split up larger nested switches
This has been annoying me for quite some while, the level of indention
makes reviewing code changes in Gitlab harder.

I think now is a good time to change this before more cmat lowering is added.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37002>
2025-08-27 08:20:47 +00:00
Samuel Pitoiset
c5a5c8818c radv/nir/lower_cmat: handle untyped pointers for load/store
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36677>
2025-08-26 13:47:07 +00:00
Samuel Pitoiset
19c712c8ef radv: rename rast_prim to vgt_outprim_type everywhere
To avoid confusion between the primitive topology and the output
rasterized primitive.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36912>
2025-08-25 12:17:38 +00:00
Samuel Pitoiset
ce83800262 radv: remove unused forwarded declarations of pipeline layout
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36792>
2025-08-18 07:25:34 +00:00
Konstantin Seurer
cc0dc4b566 radv: Store parent node IDs inside nodes on GFX12
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Saves some space.

Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36691>
2025-08-15 13:00:32 +00:00
Konstantin Seurer
be4be884e1 radv: Rename radv_printf files to radv_debug_nir
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34392>
2025-08-15 10:32:34 +00:00
Samuel Pitoiset
0ac7f1888f radv: reduce the combined image/sampler desc size on GFX11+
From 96 to 64 due to the 32 bytes descriptor alignment.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36762>
2025-08-14 06:47:30 +00:00
Samuel Pitoiset
297cf6f1aa radv/meta: add a pass to clear HiZ surfaces
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36739>
2025-08-12 13:48:09 +00:00
Konstantin Seurer
c4b18c689f radv: Emit compressed primitive nodes on GFX12
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Emits two triangles per node whenever possible. The nir code will
revisit the triangle node to handle the second triangle only if both
triangles are interescted by the ray.

Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35734>
2025-08-07 20:23:15 +00:00
Qiang Yu
196569b1a4 all: rename gl_shader_stage to mesa_shader_stage
It's not only for GL, change to a generic name.

Use command:
  find . -type f -not -path '*/.git/*' -exec sed -i 's/\bgl_shader_stage\b/mesa_shader_stage/g' {} +

Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Yonggang Luo <luoyonggang@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36569>
2025-08-06 10:28:40 +08:00
Alyssa Rosenzweig
82ae8b1d33 treewide: simplify nir_def_rewrite_uses_after
Most of the time with nir_def_rewrite_uses_after, you want to rewrite after the
replacement. Make that the default thing to be more ergonomic and to drop
parent_instr uses.

We leave nir_def_rewrite_uses_after_instr defined if you really want the old
signature with an arbitrary after point.

Via Coccinelle patch:

    @@
    expression a, b;
    @@

    -nir_def_rewrite_uses_after(a, b, b->parent_instr)
    +nir_def_rewrite_uses_after_def(a, b)

Followed by a bunch of sed.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Acked-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36489>
2025-08-01 15:34:24 +00:00
Alyssa Rosenzweig
cc6e3b84cb treewide: use nir_def_as_*
Via Coccinelle patch:

    @@
    expression definition;
    @@

    -nir_instr_as_alu(definition->parent_instr)
    +nir_def_as_alu(definition)

    @@
    expression definition;
    @@

    -nir_instr_as_intrinsic(definition->parent_instr)
    +nir_def_as_intrinsic(definition)

    @@
    expression definition;
    @@

    -nir_instr_as_phi(definition->parent_instr)
    +nir_def_as_phi(definition)

    @@
    expression definition;
    @@

    -nir_instr_as_load_const(definition->parent_instr)
    +nir_def_as_load_const(definition)

    @@
    expression definition;
    @@

    -nir_instr_as_deref(definition->parent_instr)
    +nir_def_as_deref(definition)

    @@
    expression definition;
    @@

    -nir_instr_as_tex(definition->parent_instr)
    +nir_def_as_tex(definition)

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Acked-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36489>
2025-08-01 15:34:24 +00:00
Antonio Ospite
ddf2aa3a4d build: avoid redefining unreachable() which is standard in C23
In the C23 standard unreachable() is now a predefined function-like
macro in <stddef.h>

See https://android.googlesource.com/platform/bionic/+/HEAD/docs/c23.md#is-now-a-predefined-function_like-macro-in

And this causes build errors when building for C23:

-----------------------------------------------------------------------
In file included from ../src/util/log.h:30,
                 from ../src/util/log.c:30:
../src/util/macros.h:123:9: warning: "unreachable" redefined
  123 | #define unreachable(str)    \
      |         ^~~~~~~~~~~
In file included from ../src/util/macros.h:31:
/usr/lib/gcc/x86_64-linux-gnu/14/include/stddef.h:456:9: note: this is the location of the previous definition
  456 | #define unreachable() (__builtin_unreachable ())
      |         ^~~~~~~~~~~
-----------------------------------------------------------------------

So don't redefine it with the same name, but use the name UNREACHABLE()
to also signify it's a macro.

Using a different name also makes sense because the behavior of the
macro was extending the one of __builtin_unreachable() anyway, and it
also had a different signature, accepting one argument, compared to the
standard unreachable() with no arguments.

This change improves the chances of building mesa with the C23 standard,
which for instance is the default in recent AOSP versions.

All the instances of the macro, including the definition, were updated
with the following command line:

  git grep -l '[^_]unreachable(' -- "src/**" | sort | uniq | \
  while read file; \
  do \
    sed -e 's/\([^_]\)unreachable(/\1UNREACHABLE(/g' -i "$file"; \
  done && \
  sed -e 's/#undef unreachable/#undef UNREACHABLE/g' -i src/intel/isl/isl_aux_info.c

Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36437>
2025-07-31 17:49:42 +00:00
Georg Lehmann
4683187f49 radv/nir/lower_cmat: load gfx11 8bit ACC using the B layout to get aligned loads
This allows us to use aligned loads that can be vectorized, without any
downside as 8bit scalar loads always write 16bits of a register.

Foz-DB Navi31:
Totals from 10 out of 14 FSR4 shader:
MaxWaves: 71 -> 68 (-4.23%)
Instrs: 60146 -> 59781 (-0.61%); split: -0.67%, +0.06%
CodeSize: 412448 -> 413428 (+0.24%); split: -0.11%, +0.35%
VGPRs: 2112 -> 2160 (+2.27%)
SpillVGPRs: 89 -> 68 (-23.60%)
Scratch: 11776 -> 8704 (-26.09%)
Latency: 196628 -> 193770 (-1.45%); split: -2.62%, +1.17%
InvThroughput: 224944 -> 226274 (+0.59%); split: -0.02%, +0.61%
VClause: 862 -> 796 (-7.66%)
Copies: 3166 -> 3342 (+5.56%); split: -6.22%, +11.78%
Branches: 37 -> 38 (+2.70%)
PreSGPRs: 311 -> 312 (+0.32%)
PreVGPRs: 2153 -> 2214 (+2.83%); split: -1.35%, +4.18%
VALU: 51073 -> 51448 (+0.73%); split: -0.03%, +0.77%
SALU: 1072 -> 1074 (+0.19%)
VMEM: 3275 -> 2765 (-15.57%)
VOPD: 1739 -> 1783 (+2.53%); split: +7.99%, -5.46%

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36117>
2025-07-30 07:25:51 +00:00
Marek Olšák
09e607c385 nir: add access to load_smem_amd (for ACCESS_CAN_SPECULATE)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36099>
2025-07-24 18:41:38 +00:00
Marek Olšák
4c8a757951 radv,radeonsi: mark VS input loads and poly stipple load speculatable
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35950>
2025-07-24 06:31:17 +00:00
Alyssa Rosenzweig
8a1a410389 treewide: use SWAP macro
Via Coccinelle patch + manual clean up:

    @@
    identifier temporary, a, b;
    type T;
    @@

    -T temporary = a;
    -a = b;
    -b = temporary;
    +SWAP(a, b);

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36297>
2025-07-23 19:49:47 +00:00
Alyssa Rosenzweig
6b34e2174e nir: introduce ergonomic tex builder
for intrinsics, we have these really nice builders using designated initializers
+ macros to specify optional indices. texture instrs have even more craziness
involved, but we can do the same trick. this commit takes the existing "fixed
form" deref-centric tex builders and generalizes them to work with non-deref
textures, making it useful also for GL and late VK passes, while providing an
API that strives to be ergonomic and consistent.

this series only implements a subset of possible texture operations for now, but
more generalizing could be added as people have need.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36050>
2025-07-21 12:11:41 +00:00
Konstantin Seurer
d59c22b6e1 radv/rt: Implement null acceleration structure in shader code
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
The previous approach is broken with descriptor buffer capture/replay
because the address off the dummy VA used can randomly change.

Totals from 78 (20.58% of 379) affected shaders:

Instrs: 3837275 -> 3839653 (+0.06%); split: -0.01%, +0.07%
CodeSize: 20235104 -> 20251744 (+0.08%); split: -0.01%, +0.09%
SpillSGPRs: 997 -> 1007 (+1.00%)
Latency: 22305937 -> 22331551 (+0.11%); split: -0.03%, +0.15%
InvThroughput: 4232313 -> 4237341 (+0.12%); split: -0.03%, +0.15%
VClause: 97043 -> 97027 (-0.02%); split: -0.02%, +0.01%
SClause: 72169 -> 72416 (+0.34%); split: -0.00%, +0.35%
Copies: 321578 -> 322126 (+0.17%); split: -0.11%, +0.28%
Branches: 110163 -> 110444 (+0.26%); split: -0.00%, +0.26%
PreSGPRs: 7879 -> 7942 (+0.80%)
VALU: 2155040 -> 2156425 (+0.06%); split: -0.02%, +0.09%
SALU: 502292 -> 503078 (+0.16%); split: -0.00%, +0.16%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36034>
2025-07-19 21:02:42 +00:00
Konstantin Seurer
d28ff8050a radv/rt: Use inv_dir for software ray-triangle tests
Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Reviewed-by: Autumn Ashton <misyl@froggi.es>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36213>
2025-07-19 16:35:37 +00:00
Konstantin Seurer
5494789e89 radv/rt: Optimize emulated ray-triangle tests
The imod instructions are lowered to 4 alu instructions each. We can do
better by packing the results with the values for kz.

Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Reviewed-by: Autumn Ashton <misyl@froggi.es>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36213>
2025-07-19 16:35:37 +00:00
Konstantin Seurer
d140f2a6a2 radv: Implement watertightness for emulated RT
Instead of using fp64 (Which is broken in some cases) the new approach
only uses fp32 and implements tiebreaking for edge/vertex hits. Using
fp32 is also much faster, improving performance of q2rtx by around 40%.

Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Reviewed-by: Autumn Ashton <misyl@froggi.es>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36213>
2025-07-19 16:35:36 +00:00
Konstantin Seurer
55641f9ca0 radv: Disable pointer flags and the GFX12 WA for emulated RT
Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Reviewed-by: Autumn Ashton <misyl@froggi.es>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36213>
2025-07-19 16:35:36 +00:00
Konstantin Seurer
df44b353ad radv: Optimize ray tracing position fetch
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Gets rid of a lot of indirection when fetching triangle positions.
Storing the primitive address increases register pressure by a bit but
the traversal shader which should have the highest register demand
should not be affected when position fetch is not used.

Totals:
Instrs: 4021686 -> 4022435 (+0.02%); split: -0.01%, +0.03%
CodeSize: 21235812 -> 21235832 (+0.00%); split: -0.02%, +0.02%
Latency: 23402275 -> 23412110 (+0.04%); split: -0.04%, +0.09%
InvThroughput: 4352818 -> 4352206 (-0.01%); split: -0.04%, +0.02%
VClause: 101906 -> 102058 (+0.15%); split: -0.03%, +0.18%
Copies: 342210 -> 342368 (+0.05%); split: -0.09%, +0.14%
Branches: 114988 -> 114993 (+0.00%)
PreVGPRs: 26551 -> 27111 (+2.11%)
VALU: 2249366 -> 2249524 (+0.01%); split: -0.01%, +0.02%
SALU: 529828 -> 529808 (-0.00%); split: -0.01%, +0.00%

Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35533>
2025-07-19 16:07:59 +00:00
Georg Lehmann
497f607c8e radv/nir/lower_cmat: vectorize GFX11 B -> ACC conversion
Foz-DB Navi31:
Totals from 7 out of 14 FSR4 shaders:
MaxWaves: 50 -> 52 (+4.00%)
Instrs: 44951 -> 44516 (-0.97%); split: -1.00%, +0.03%
CodeSize: 309176 -> 305500 (-1.19%); split: -1.23%, +0.04%
VGPRs: 1464 -> 1416 (-3.28%)
SpillVGPRs: 188 -> 92 (-51.06%)
Scratch: 24064 -> 11776 (-51.06%)
Latency: 171318 -> 163663 (-4.47%); split: -4.51%, +0.04%
InvThroughput: 178796 -> 178956 (+0.09%); split: -0.04%, +0.13%
VClause: 769 -> 730 (-5.07%); split: -6.50%, +1.43%
Copies: 3149 -> 3261 (+3.56%); split: -1.21%, +4.76%
PreVGPRs: 1607 -> 1467 (-8.71%)
VALU: 37715 -> 37744 (+0.08%); split: -0.11%, +0.18%
SALU: 754 -> 753 (-0.13%)
VMEM: 2813 -> 2621 (-6.83%)
VOPD: 1674 -> 1685 (+0.66%); split: +1.55%, -0.90%

Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36115>
2025-07-16 11:46:52 +00:00
Georg Lehmann
7546169e1c radv/nir/lower_cmat: vectorize GFX11 ACC -> B conversion
Foz-DB Navi31:
Totals from 10 out of 14 FSR4 shaders:
Instrs: 64204 -> 60749 (-5.38%)
CodeSize: 439052 -> 417668 (-4.87%)
SpillVGPRs: 186 -> 188 (+1.08%)
Scratch: 23808 -> 24064 (+1.08%)
Latency: 208878 -> 202903 (-2.86%)
InvThroughput: 232898 -> 225688 (-3.10%)
VClause: 902 -> 907 (+0.55%); split: -1.55%, +2.11%
Copies: 6418 -> 3762 (-41.38%)
Branches: 55 -> 37 (-32.73%)
PreSGPRs: 297 -> 298 (+0.34%)
PreVGPRs: 2299 -> 2303 (+0.17%)
VALU: 54762 -> 51489 (-5.98%)
SALU: 956 -> 938 (-1.88%)
VMEM: 3469 -> 3473 (+0.12%)
VOPD: 3895 -> 2126 (-45.42%)

Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36115>
2025-07-16 11:46:52 +00:00
Georg Lehmann
56d93c40ea radv/nir/lower_cmat: convert matrix use in smaller type
Less conversions, and less data to move around.

Foz-DB Navi31:
Totals from 10 out of 14 FSR4 shaders:
Instrs: 65443 -> 64204 (-1.89%); split: -1.93%, +0.04%
CodeSize: 441884 -> 439052 (-0.64%); split: -1.21%, +0.57%
Latency: 213374 -> 208878 (-2.11%); split: -2.17%, +0.07%
InvThroughput: 236922 -> 232898 (-1.70%); split: -1.77%, +0.08%
VClause: 935 -> 902 (-3.53%); split: -3.74%, +0.21%
Copies: 5064 -> 6418 (+26.74%); split: -13.35%, +40.09%
Branches: 54 -> 55 (+1.85%)
VALU: 55700 -> 54762 (-1.68%); split: -1.85%, +0.16%
VOPD: 3459 -> 3895 (+12.60%); split: +16.88%, -4.28%

Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36115>
2025-07-16 11:46:52 +00:00
Georg Lehmann
f2846b936a radv/nir/lower_cmat: use v_permlanex16_b32 instead of ds_swizzle_b32 for GFX11 ACC->B
ds_swizzle is slower than I expected.

Foz-DB Navi31:
Totals from 10 out of 14 FSR4 shaders:
Instrs: 68802 -> 65443 (-4.88%)
CodeSize: 458000 -> 441884 (-3.52%)
Latency: 218147 -> 213374 (-2.19%); split: -3.17%, +0.99%
InvThroughput: 230190 -> 236922 (+2.92%); split: -0.25%, +3.18%
VClause: 922 -> 935 (+1.41%); split: -0.98%, +2.39%
Copies: 5877 -> 5064 (-13.83%); split: -15.74%, +1.91%
Branches: 37 -> 54 (+45.95%)
VALU: 53441 -> 55700 (+4.23%); split: -0.55%, +4.77%
SALU: 872 -> 956 (+9.63%)
VOPD: 1767 -> 3459 (+95.76%)

Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36115>
2025-07-16 11:46:51 +00:00
Samuel Pitoiset
ea742877f6 radv: re-run clang-format
For style consistency.

$ clang-format -i $(find src/amd/vulkan/ -name "*.h" -o -name "*.c" -o -name "*.cpp")

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36118>
2025-07-16 09:10:33 +02:00
Natalie Vock
e978f6e247 radv/rt: Use ds_bvh_stack_push8_pop1_rtn_b32
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35269>
2025-07-15 21:34:40 +00:00
Natalie Vock
f0aa383e09 radv/rt: Use ds_bvh_stack_rtn
Improves Quake 2 RTX performance by 5% on RDNA3.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35269>
2025-07-15 21:34:40 +00:00
Natalie Vock
8815845271 radv/rt/gfx12: Always overwrite origin/dir
They're unchanged if we don't test against instance nodes. This makes
image_bvh8_intersect_ray kill its direction/origin operands, improving
RA.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35269>
2025-07-15 21:34:38 +00:00
Marek Olšák
bdcfe15457 radv: don't export cull distances if the shader culls against them
This increases primitive throughput for all hw with NGG if the shader
culls and the removal of cull distances reduces the number of position
exports.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35473>
2025-07-12 05:20:05 +00:00
Marek Olšák
89e1ec92c5 radv: cull against clip and cull distances in the shader
Clip and cull distance outputs decrease primitive throughput, so culling
against them in the shader has even more benefit than other culling
options.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35473>
2025-07-12 05:20:03 +00:00
Marek Olšák
65972f2301 ac/nir: return GSVS emit sizes from legacy GS lowering and simplify shader info
This simplifies shader info in drivers by returning GSVS emit sizes from
ac_nir_lower_legacy_gs. The pass knows the sizes, so drivers shouldn't
have to determine them independently.

This also makes the values more accurate because both drivers were
computing the GSVS emit sizes inaccurately and had redundant fields
in shader info. RADV had a lot of redudancy there.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35473>
2025-07-12 05:20:02 +00:00
Qiang Yu
88c79a13b9 ac,radv: move nir_load_ring_mesh_scratch_offset_amd to ac
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
To be shared with radeonsi.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35931>
2025-07-11 02:25:51 +00:00
Qiang Yu
5ddbd8c83b ac,radv: move mesh scratch ring constants to ac
To be shared with radeonsi.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35931>
2025-07-11 02:25:51 +00:00
Qiang Yu
78fed5fc13 ac,radv: move nir_load_task_ring_entry_amd to ac
To be shared with radeonsi.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35931>
2025-07-11 02:25:51 +00:00
Georg Lehmann
cac60c39a9 radv/nir/lower_cmat: use explicit shift when calculating gfx12 wave64 layout
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
The rest of the compiler stack doesn't understand the alignment implications
of the combined shift.

Effect on llama.cpp fossils:
Totals from 3 (13.64% of 22) affected shaders:
Instrs: 5778 -> 5684 (-1.63%)
CodeSize: 33540 -> 32800 (-2.21%)
VGPRs: 228 -> 216 (-5.26%)
Latency: 39942 -> 39417 (-1.31%)
InvThroughput: 12037 -> 11862 (-1.45%)
VALU: 2162 -> 2111 (-2.36%)

More importantly, this replaces some ds_load_2addr_b32 with ds_load_b64.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13447
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36016>
2025-07-10 07:11:23 +00:00
Alyssa Rosenzweig
d31cb824df treewide: use VARYING_BIT_*
Some checks failed
macOS-CI / macOS-CI (dri) (push) Has been cancelled
macOS-CI / macOS-CI (xlib) (push) Has been cancelled
Via Coccinelle patch generated by the following Python:

  varys = [ "POS", "COL0", "COL1", "FOGC", "TEX0", "TEX1", "TEX2", "TEX3", "TEX4",
           "TEX5", "TEX6", "TEX7", "PSIZ", "BFC0", "BFC1", "EDGE", "CLIP_VERTEX",
           "CLIP_DIST0", "CLIP_DIST1", "CULL_DIST0", "CULL_DIST1", "PRIMITIVE_ID",
           "PRIMITIVE_COUNT", "LAYER", "VIEWPORT", "FACE",
           "PRIMITIVE_SHADING_RATE", "PNTC", "TESS_LEVEL_OUTER",
           "TESS_LEVEL_INNER", "PRIMITIVE_INDICES", "BOUNDING_BOX0",
           "BOUNDING_BOX1", "VIEWPORT_MASK", "CULL_PRIMITIVE" ]
  t = """
  @@
  @@

  -(1 << VARYING_SLOT_${V})
  +VARYING_BIT_${V}

  @@
  @@

  -BITFIELD_BIT(VARYING_SLOT_${V})
  +VARYING_BIT_${V}

  @@
  @@

  -(1ull << VARYING_SLOT_${V})
  +VARYING_BIT_${V}

  @@
  @@

  -BITFIELD64_BIT(VARYING_SLOT_${V})
  +VARYING_BIT_${V}

  """
  for v in varys:
      from mako.template import Template
      print(Template(t).render(V = v))

Closes: #13453
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com> [panfrost, common]
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> [broadcom]
Reviewed-by: Corentin Noël <corentin.noel@collabora.com> [virgl]
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com> [zink]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35917>
2025-07-04 19:01:04 +00:00
Marek Olšák
4263b49778 ac/nir: remove ngg_scratch LDS ABI, allocate it in the lowering pass
This is a cleanup.

Old gs LDS layout: [es outputs][gs outputs][scratch]
Old nogs LDS layout: [xfb/cull][scratch]

New gs LDS layout: [es outputs][scratch|gs outputs]
New nogs LDS layout: [scratch|xfb/cull]

The LDS scratch is moved to the beginning of the preceding buffer in LDS,
while the addresses in that LDS buffer are offset by the scratch size.
It effectively merges the LDS scratch with the preceding buffer in LDS.
Thanks to that, we no longer need the ngg_scratch ABI and the offset
in a user SGPR.

The lowering passes now return the LDS scratch size, which is used
by the drivers to determine the final LDS size.

The ngg_lds_layout SGPR is now unused without GS in RADV.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35352>
2025-07-02 20:27:41 +00:00
Natalie Vock
e236a731e4 radv/rt: Enable pointer flags on GFX11+
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Allows hardware to do some of the culling work, as well as early-cull
box nodes with CullOpaque/CullNonOpaque ray masks when all children are
(not) opaque.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32417>
2025-06-28 10:31:38 +00:00