Konstantin Seurer
d59c22b6e1
radv/rt: Implement null acceleration structure in shader code
The previous approach is broken with descriptor buffer capture/replay
because the address of the dummy VA it used can change randomly.
Totals from 78 (20.58% of 379) affected shaders:
Instrs: 3837275 -> 3839653 (+0.06%); split: -0.01%, +0.07%
CodeSize: 20235104 -> 20251744 (+0.08%); split: -0.01%, +0.09%
SpillSGPRs: 997 -> 1007 (+1.00%)
Latency: 22305937 -> 22331551 (+0.11%); split: -0.03%, +0.15%
InvThroughput: 4232313 -> 4237341 (+0.12%); split: -0.03%, +0.15%
VClause: 97043 -> 97027 (-0.02%); split: -0.02%, +0.01%
SClause: 72169 -> 72416 (+0.34%); split: -0.00%, +0.35%
Copies: 321578 -> 322126 (+0.17%); split: -0.11%, +0.28%
Branches: 110163 -> 110444 (+0.26%); split: -0.00%, +0.26%
PreSGPRs: 7879 -> 7942 (+0.80%)
VALU: 2155040 -> 2156425 (+0.06%); split: -0.02%, +0.09%
SALU: 502292 -> 503078 (+0.16%); split: -0.00%, +0.16%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36034 >
2025-07-19 21:02:42 +00:00
Konstantin Seurer
d28ff8050a
radv/rt: Use inv_dir for software ray-triangle tests
Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Reviewed-by: Autumn Ashton <misyl@froggi.es>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36213 >
2025-07-19 16:35:37 +00:00
Konstantin Seurer
5494789e89
radv/rt: Optimize emulated ray-triangle tests
The imod instructions are lowered to 4 alu instructions each. We can do
better by packing the results with the values for kz.
Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Reviewed-by: Autumn Ashton <misyl@froggi.es>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36213 >
2025-07-19 16:35:37 +00:00
Konstantin Seurer
d140f2a6a2
radv: Implement watertightness for emulated RT
Instead of using fp64 (which is broken in some cases), the new approach
only uses fp32 and implements tie-breaking for edge/vertex hits. Using
fp32 is also much faster, improving performance of q2rtx by around 40%.
Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Reviewed-by: Autumn Ashton <misyl@froggi.es>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36213 >
2025-07-19 16:35:36 +00:00
Konstantin Seurer
55641f9ca0
radv: Disable pointer flags and the GFX12 WA for emulated RT
Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Reviewed-by: Autumn Ashton <misyl@froggi.es>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36213 >
2025-07-19 16:35:36 +00:00
Konstantin Seurer
df44b353ad
radv: Optimize ray tracing position fetch
Gets rid of a lot of indirection when fetching triangle positions.
Storing the primitive address increases register pressure a bit, but
the traversal shader, which should have the highest register demand,
should not be affected when position fetch is not used.
Totals:
Instrs: 4021686 -> 4022435 (+0.02%); split: -0.01%, +0.03%
CodeSize: 21235812 -> 21235832 (+0.00%); split: -0.02%, +0.02%
Latency: 23402275 -> 23412110 (+0.04%); split: -0.04%, +0.09%
InvThroughput: 4352818 -> 4352206 (-0.01%); split: -0.04%, +0.02%
VClause: 101906 -> 102058 (+0.15%); split: -0.03%, +0.18%
Copies: 342210 -> 342368 (+0.05%); split: -0.09%, +0.14%
Branches: 114988 -> 114993 (+0.00%)
PreVGPRs: 26551 -> 27111 (+2.11%)
VALU: 2249366 -> 2249524 (+0.01%); split: -0.01%, +0.02%
SALU: 529828 -> 529808 (-0.00%); split: -0.01%, +0.00%
Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35533 >
2025-07-19 16:07:59 +00:00
Georg Lehmann
497f607c8e
radv/nir/lower_cmat: vectorize GFX11 B -> ACC conversion
Foz-DB Navi31:
Totals from 7 out of 14 FSR4 shaders:
MaxWaves: 50 -> 52 (+4.00%)
Instrs: 44951 -> 44516 (-0.97%); split: -1.00%, +0.03%
CodeSize: 309176 -> 305500 (-1.19%); split: -1.23%, +0.04%
VGPRs: 1464 -> 1416 (-3.28%)
SpillVGPRs: 188 -> 92 (-51.06%)
Scratch: 24064 -> 11776 (-51.06%)
Latency: 171318 -> 163663 (-4.47%); split: -4.51%, +0.04%
InvThroughput: 178796 -> 178956 (+0.09%); split: -0.04%, +0.13%
VClause: 769 -> 730 (-5.07%); split: -6.50%, +1.43%
Copies: 3149 -> 3261 (+3.56%); split: -1.21%, +4.76%
PreVGPRs: 1607 -> 1467 (-8.71%)
VALU: 37715 -> 37744 (+0.08%); split: -0.11%, +0.18%
SALU: 754 -> 753 (-0.13%)
VMEM: 2813 -> 2621 (-6.83%)
VOPD: 1674 -> 1685 (+0.66%); split: +1.55%, -0.90%
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36115 >
2025-07-16 11:46:52 +00:00
Georg Lehmann
7546169e1c
radv/nir/lower_cmat: vectorize GFX11 ACC -> B conversion
Foz-DB Navi31:
Totals from 10 out of 14 FSR4 shaders:
Instrs: 64204 -> 60749 (-5.38%)
CodeSize: 439052 -> 417668 (-4.87%)
SpillVGPRs: 186 -> 188 (+1.08%)
Scratch: 23808 -> 24064 (+1.08%)
Latency: 208878 -> 202903 (-2.86%)
InvThroughput: 232898 -> 225688 (-3.10%)
VClause: 902 -> 907 (+0.55%); split: -1.55%, +2.11%
Copies: 6418 -> 3762 (-41.38%)
Branches: 55 -> 37 (-32.73%)
PreSGPRs: 297 -> 298 (+0.34%)
PreVGPRs: 2299 -> 2303 (+0.17%)
VALU: 54762 -> 51489 (-5.98%)
SALU: 956 -> 938 (-1.88%)
VMEM: 3469 -> 3473 (+0.12%)
VOPD: 3895 -> 2126 (-45.42%)
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36115 >
2025-07-16 11:46:52 +00:00
Georg Lehmann
56d93c40ea
radv/nir/lower_cmat: convert matrix use in smaller type
Fewer conversions, and less data to move around.
Foz-DB Navi31:
Totals from 10 out of 14 FSR4 shaders:
Instrs: 65443 -> 64204 (-1.89%); split: -1.93%, +0.04%
CodeSize: 441884 -> 439052 (-0.64%); split: -1.21%, +0.57%
Latency: 213374 -> 208878 (-2.11%); split: -2.17%, +0.07%
InvThroughput: 236922 -> 232898 (-1.70%); split: -1.77%, +0.08%
VClause: 935 -> 902 (-3.53%); split: -3.74%, +0.21%
Copies: 5064 -> 6418 (+26.74%); split: -13.35%, +40.09%
Branches: 54 -> 55 (+1.85%)
VALU: 55700 -> 54762 (-1.68%); split: -1.85%, +0.16%
VOPD: 3459 -> 3895 (+12.60%); split: +16.88%, -4.28%
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36115 >
2025-07-16 11:46:52 +00:00
Georg Lehmann
f2846b936a
radv/nir/lower_cmat: use v_permlanex16_b32 instead of ds_swizzle_b32 for GFX11 ACC->B
ds_swizzle is slower than I expected.
Foz-DB Navi31:
Totals from 10 out of 14 FSR4 shaders:
Instrs: 68802 -> 65443 (-4.88%)
CodeSize: 458000 -> 441884 (-3.52%)
Latency: 218147 -> 213374 (-2.19%); split: -3.17%, +0.99%
InvThroughput: 230190 -> 236922 (+2.92%); split: -0.25%, +3.18%
VClause: 922 -> 935 (+1.41%); split: -0.98%, +2.39%
Copies: 5877 -> 5064 (-13.83%); split: -15.74%, +1.91%
Branches: 37 -> 54 (+45.95%)
VALU: 53441 -> 55700 (+4.23%); split: -0.55%, +4.77%
SALU: 872 -> 956 (+9.63%)
VOPD: 1767 -> 3459 (+95.76%)
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36115 >
2025-07-16 11:46:51 +00:00
Samuel Pitoiset
ea742877f6
radv: re-run clang-format
For style consistency.
$ clang-format -i $(find src/amd/vulkan/ -name "*.h" -o -name "*.c" -o -name "*.cpp")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36118 >
2025-07-16 09:10:33 +02:00
Natalie Vock
e978f6e247
radv/rt: Use ds_bvh_stack_push8_pop1_rtn_b32
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35269 >
2025-07-15 21:34:40 +00:00
Natalie Vock
f0aa383e09
radv/rt: Use ds_bvh_stack_rtn
Improves Quake 2 RTX performance by 5% on RDNA3.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35269 >
2025-07-15 21:34:40 +00:00
Natalie Vock
8815845271
radv/rt/gfx12: Always overwrite origin/dir
They're unchanged if we don't test against instance nodes. This makes
image_bvh8_intersect_ray kill its direction/origin operands, improving
RA.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35269 >
2025-07-15 21:34:38 +00:00
Marek Olšák
bdcfe15457
radv: don't export cull distances if the shader culls against them
This increases primitive throughput for all hw with NGG if the shader
culls and the removal of cull distances reduces the number of position
exports.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35473 >
2025-07-12 05:20:05 +00:00
Marek Olšák
89e1ec92c5
radv: cull against clip and cull distances in the shader
Clip and cull distance outputs decrease primitive throughput, so culling
against them in the shader has even more benefit than other culling
options.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35473 >
2025-07-12 05:20:03 +00:00
Marek Olšák
65972f2301
ac/nir: return GSVS emit sizes from legacy GS lowering and simplify shader info
This simplifies shader info in drivers by returning GSVS emit sizes from
ac_nir_lower_legacy_gs. The pass knows the sizes, so drivers shouldn't
have to determine them independently.
This also makes the values more accurate because both drivers were
computing the GSVS emit sizes inaccurately and had redundant fields
in shader info. RADV had a lot of redundancy there.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35473 >
2025-07-12 05:20:02 +00:00
Qiang Yu
88c79a13b9
ac,radv: move nir_load_ring_mesh_scratch_offset_amd to ac
To be shared with radeonsi.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35931 >
2025-07-11 02:25:51 +00:00
Qiang Yu
5ddbd8c83b
ac,radv: move mesh scratch ring constants to ac
To be shared with radeonsi.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35931 >
2025-07-11 02:25:51 +00:00
Qiang Yu
78fed5fc13
ac,radv: move nir_load_task_ring_entry_amd to ac
To be shared with radeonsi.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35931 >
2025-07-11 02:25:51 +00:00
Georg Lehmann
cac60c39a9
radv/nir/lower_cmat: use explicit shift when calculating gfx12 wave64 layout
The rest of the compiler stack doesn't understand the alignment implications
of the combined shift.
Effect on llama.cpp fossils:
Totals from 3 (13.64% of 22) affected shaders:
Instrs: 5778 -> 5684 (-1.63%)
CodeSize: 33540 -> 32800 (-2.21%)
VGPRs: 228 -> 216 (-5.26%)
Latency: 39942 -> 39417 (-1.31%)
InvThroughput: 12037 -> 11862 (-1.45%)
VALU: 2162 -> 2111 (-2.36%)
More importantly, this replaces some ds_load_2addr_b32 with ds_load_b64.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13447
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36016 >
2025-07-10 07:11:23 +00:00
Alyssa Rosenzweig
d31cb824df
treewide: use VARYING_BIT_*
Via Coccinelle patch generated by the following Python:
varys = [ "POS", "COL0", "COL1", "FOGC", "TEX0", "TEX1", "TEX2", "TEX3", "TEX4",
"TEX5", "TEX6", "TEX7", "PSIZ", "BFC0", "BFC1", "EDGE", "CLIP_VERTEX",
"CLIP_DIST0", "CLIP_DIST1", "CULL_DIST0", "CULL_DIST1", "PRIMITIVE_ID",
"PRIMITIVE_COUNT", "LAYER", "VIEWPORT", "FACE",
"PRIMITIVE_SHADING_RATE", "PNTC", "TESS_LEVEL_OUTER",
"TESS_LEVEL_INNER", "PRIMITIVE_INDICES", "BOUNDING_BOX0",
"BOUNDING_BOX1", "VIEWPORT_MASK", "CULL_PRIMITIVE" ]
t = """
@@
@@
-(1 << VARYING_SLOT_${V})
+VARYING_BIT_${V}
@@
@@
-BITFIELD_BIT(VARYING_SLOT_${V})
+VARYING_BIT_${V}
@@
@@
-(1ull << VARYING_SLOT_${V})
+VARYING_BIT_${V}
@@
@@
-BITFIELD64_BIT(VARYING_SLOT_${V})
+VARYING_BIT_${V}
"""
from mako.template import Template
for v in varys:
    print(Template(t).render(V = v))
Closes: #13453
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com> [panfrost, common]
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> [broadcom]
Reviewed-by: Corentin Noël <corentin.noel@collabora.com> [virgl]
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com> [zink]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35917 >
2025-07-04 19:01:04 +00:00
Marek Olšák
4263b49778
ac/nir: remove ngg_scratch LDS ABI, allocate it in the lowering pass
This is a cleanup.
Old gs LDS layout: [es outputs][gs outputs][scratch]
Old nogs LDS layout: [xfb/cull][scratch]
New gs LDS layout: [es outputs][scratch|gs outputs]
New nogs LDS layout: [scratch|xfb/cull]
The LDS scratch is moved to the beginning of the preceding buffer in LDS,
while the addresses in that LDS buffer are offset by the scratch size.
It effectively merges the LDS scratch with the preceding buffer in LDS.
Thanks to that, we no longer need the ngg_scratch ABI and the offset
in a user SGPR.
The lowering passes now return the LDS scratch size, which is used
by the drivers to determine the final LDS size.
The ngg_lds_layout SGPR is now unused without GS in RADV.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35352 >
2025-07-02 20:27:41 +00:00
Natalie Vock
e236a731e4
radv/rt: Enable pointer flags on GFX11+
Allows hardware to do some of the culling work, as well as early-cull
box nodes with CullOpaque/CullNonOpaque ray masks when all children are
(not) opaque.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32417 >
2025-06-28 10:31:38 +00:00
Marek Olšák
42e98f115a
radv: always use the ngg_lds_layout SGPR
This is a prerequisite for NGG lowering passes to return LDS vertex and
scratch sizes, which will lead to further simplifications. That will
require calling gfx10_get_ngg_info after radv_postprocess_nir, which means
LDS offsets are unknown when the passes are called.
This makes the 2 values no longer compile-time constants.
A later commit will remove NGG_LDS_LAYOUT_SCRATCH_BASE (the passes will
determine it), so only NGG_LDS_LAYOUT_GS_OUT_VERTEX_BASE will come from
an SGPR, though that could be removed too (non-trivially) or handled as
a relocation.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35351 >
2025-06-28 08:20:26 +00:00
Samuel Pitoiset
989162e67a
radv: split descriptor set and descriptor utils in separate files
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35732 >
2025-06-27 07:55:37 +00:00
Marek Olšák
12df9b3def
nir: rename nir_vectorize_tess_levels -> nir_lower_tess_level_array_vars_to_vec
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35760 >
2025-06-26 18:20:50 +00:00
Marek Olšák
439d805291
nir: rename nir_lower_io_to_scalar_early -> nir_lower_io_vars_to_scalar
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35760 >
2025-06-26 18:20:49 +00:00
Georg Lehmann
21523dad96
radv/nir/lower_cmat: use nir_src_as_deref
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35633 >
2025-06-24 17:12:34 +00:00
Georg Lehmann
48fc8c8d1c
radv/nir/lower_cmat: set optimal load/store alignment
Allows vectorizing load/stores with sub dword types or with robustness.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35633 >
2025-06-24 17:12:33 +00:00
Georg Lehmann
ed2ecf9ef8
radv/nir/lower_cmat: share cmat_load/cmat_store code
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35633 >
2025-06-24 17:12:33 +00:00
Georg Lehmann
4ac6aae3a4
radv/nir/lower_cmat: fix gfx11 B->ACC conversion
Of course I messed up the one path that's not tested by CTS.
Fixes: 249ccc6b4c ("radv/nir/lower_cmat: implement use conversions/transpose")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35713 >
2025-06-24 15:53:52 +00:00
Georg Lehmann
249ccc6b4c
radv/nir/lower_cmat: implement use conversions/transpose
This could potentially be improved using packed 32-bit subgroup ops,
but what we actually care about (gfx12 ACC -> B) is free.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34793 >
2025-06-24 07:14:34 +00:00
Georg Lehmann
cbd17cb4d6
radv/nir/lower_cmat: handle float8 conversions
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35434 >
2025-06-23 07:59:27 +00:00
Samuel Pitoiset
59dfa8c2f5
radv: switch to nir_intrinsic_load_input_attachment_coord
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35556 >
2025-06-20 06:12:24 +00:00
Samuel Pitoiset
50c4d5cccd
radv: use one descriptor per plane for combined image+sampler with ycbcr
This removes a very old hack which will also allow us to enable DCC
for multiplanar formats eventually and to reduce the combined
image+sampler descriptor size from 96 to 48 on RDNA3+.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35457 >
2025-06-19 12:58:32 +00:00
Samuel Pitoiset
4f37876c7b
radv: replace radv_combined_image_descriptor_sampler_offset() by a constant
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35457 >
2025-06-19 12:58:31 +00:00
Georg Lehmann
e0cdf4dfdd
radv/nir/lower_cmat: use common matrix layout on gfx12
The GFX12 ISA doc describes other layouts for A/B, but they are identical
to the C layout with the exception of the order of the rows (columns for A).
And as long as these are swapped in the same way for both A and B, the muladd
result will be the same. So we use the C layout for all uses.
This will simplify conversions between uses, and allows A/B to use a single
memory access for load/store in wave32.
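The reason the row/column order does not matter is that each element of the product is a sum over the shared dimension. Reordering that dimension with the same permutation \(\sigma\) for the columns of A and the rows of B only reindexes the sum. Here K denotes the shared (inner) dimension. Equivalently, with the permutation matrix P defined by \(P_{mk} = [m = \sigma(k)]\), the product is unchanged:

```latex
D_{ij} = \sum_{k=0}^{K-1} A_{ik} B_{kj}
       = \sum_{k=0}^{K-1} A_{i\,\sigma(k)} B_{\sigma(k)\,j},
\qquad
(AP)(P^{\mathsf{T}}B) = A\,(PP^{\mathsf{T}})\,B = AB .
```

This only holds if the same \(\sigma\) is used for both operands, which is the condition the message states.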
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35570 >
2025-06-18 06:33:06 +00:00
Samuel Pitoiset
eeabce93b6
radv: use constants for different descriptor sizes
Instead of magic values everywhere.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35428 >
2025-06-13 07:53:04 +00:00
Samuel Pitoiset
63f8b8ce6d
radv/nir: adjust a comment about inlining immutable samplers
That (broken) optimization was removed a few weeks ago.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35428 >
2025-06-13 07:53:04 +00:00
Samuel Pitoiset
99fb1a9bd7
radv/nir: lower unassigned vertex attributes to (0,0,0,0)
The spec allows both (0,0,0,0) and (0,0,0,1). Returning all zeroes makes it
consistent with vertex prologs.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35423 >
2025-06-13 07:33:03 +00:00
Marek Olšák
5734a916d6
ac: move tcs_offchip_layout into ac_shader_args
It's the same variable between radv and radeonsi, but the implementation of
the load intrinsics is very different.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34780 >
2025-06-07 16:29:39 +00:00
Marek Olšák
9d9cfd89da
ac/nir/tess: compute the number of remapped VRAM outputs in common code
This unifies it for both drivers.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34780 >
2025-06-07 16:29:39 +00:00
Marek Olšák
ea70060826
ac/nir/tess: stop using tes_inputs_read / tes_patch_inputs_read for TCS & TES
Use ac_nir_tess_io_info instead.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34780 >
2025-06-07 16:29:39 +00:00
Marek Olšák
42445e271e
radv,radeonsi: use ac_nir_tess_io_info for LDS size computation
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34780 >
2025-06-07 16:29:39 +00:00
Marek Olšák
a59464b6e3
radv,radeonsi: precompute and pass TCS per-vertex output stride via a user SGPR
This is the stride of a single output, which is not 16 but 16 * num_threads,
aligned to 256.
tcs_offchip_layout has 5 unused bits, so let's use them.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34780 >
2025-06-07 16:29:39 +00:00
Marek Olšák
742227c65c
radv,radeonsi: make TCS_OFFCHIP_LAYOUT_NUM_PATCHES not off by one
We never use 128 anyway.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34780 >
2025-06-07 16:29:39 +00:00
Marek Olšák
8d3e3c72e0
radv,radeonsi: merge PATCH_CONTROL_POINT & OUT_PATCH_CP into 1 field
One is only used by TCS, the other is only used by TES.
Use the same field for both, call it PATCH_VERTICES_IN.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34780 >
2025-06-07 16:29:39 +00:00
Marek Olšák
534b282573
ac/nir/tess: adjust memory layout of TCS outputs to have aligned store offsets
There is a comment that explains it.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34780 >
2025-06-07 16:29:38 +00:00
Rhys Perry
00a2ed60f8
radv/meta: use unsigned min in copy/fill shaders
Otherwise, this would break >2 GiB copy/fill.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Backport: 25.1
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35343 >
2025-06-05 09:55:32 +00:00