David Rosca
211dc09e0f
radv/video: Fix encode when using layered source image
...
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Found by inspection.
Cc: mesa-stable
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36214 >
2025-07-21 09:52:47 +00:00
Samuel Pitoiset
49e49d7dde
radv: add RADV_DEBUG=novideo to disable all video extensions
...
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
It's useful when video support has issues.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36186 >
2025-07-21 07:20:12 +00:00
Samuel Pitoiset
af22d5c97d
radv: use vk_optimize_depth_stencil_state() for optimal settings
...
For apps that aren't optimized.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36168 >
2025-07-21 06:53:40 +00:00
Samuel Pitoiset
79c02a3388
radv: adjust conservative rasterization configuration on GFX12
...
PAL doesn't set these two registers either.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36196 >
2025-07-21 06:26:48 +00:00
Konstantin Seurer
d59c22b6e1
radv/rt: Implement null acceleration structure in shader code
...
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
The previous approach is broken with descriptor buffer capture/replay
because the address off the dummy VA used can randomly change.
Totals from 78 (20.58% of 379) affected shaders:
Instrs: 3837275 -> 3839653 (+0.06%); split: -0.01%, +0.07%
CodeSize: 20235104 -> 20251744 (+0.08%); split: -0.01%, +0.09%
SpillSGPRs: 997 -> 1007 (+1.00%)
Latency: 22305937 -> 22331551 (+0.11%); split: -0.03%, +0.15%
InvThroughput: 4232313 -> 4237341 (+0.12%); split: -0.03%, +0.15%
VClause: 97043 -> 97027 (-0.02%); split: -0.02%, +0.01%
SClause: 72169 -> 72416 (+0.34%); split: -0.00%, +0.35%
Copies: 321578 -> 322126 (+0.17%); split: -0.11%, +0.28%
Branches: 110163 -> 110444 (+0.26%); split: -0.00%, +0.26%
PreSGPRs: 7879 -> 7942 (+0.80%)
VALU: 2155040 -> 2156425 (+0.06%); split: -0.02%, +0.09%
SALU: 502292 -> 503078 (+0.16%); split: -0.00%, +0.16%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36034 >
2025-07-19 21:02:42 +00:00
Konstantin Seurer
d28ff8050a
radv/rt: Use inv_dir for software ray-triangle tests
...
Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Reviewed-by: Autumn Ashton <misyl@froggi.es>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36213 >
2025-07-19 16:35:37 +00:00
Konstantin Seurer
5494789e89
radv/rt: Optimize emulated ray-triangle tests
...
The imod instructions are lowered to 4 alu instructions each. We can do
better by packing the results with the values for kz.
Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Reviewed-by: Autumn Ashton <misyl@froggi.es>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36213 >
2025-07-19 16:35:37 +00:00
Konstantin Seurer
d140f2a6a2
radv: Implement watertightness for emulated RT
...
Instead of using fp64 (Which is broken in some cases) the new approach
only uses fp32 and implements tiebreaking for edge/vertex hits. Using
fp32 is also much faster, improving performance of q2rtx by around 40%.
Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Reviewed-by: Autumn Ashton <misyl@froggi.es>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36213 >
2025-07-19 16:35:36 +00:00
Konstantin Seurer
55641f9ca0
radv: Disable pointer flags and the GFX12 WA for emulated RT
...
Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Reviewed-by: Autumn Ashton <misyl@froggi.es>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36213 >
2025-07-19 16:35:36 +00:00
Konstantin Seurer
df44b353ad
radv: Optimize ray tracing position fetch
...
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Gets rid of a lot of indirection when fetching triangle positions.
Storing the primitive address increases register pressure by a bit but
the traversal shader which should have the highest register demand
should not be affected when position fetch is not used.
Totals:
Instrs: 4021686 -> 4022435 (+0.02%); split: -0.01%, +0.03%
CodeSize: 21235812 -> 21235832 (+0.00%); split: -0.02%, +0.02%
Latency: 23402275 -> 23412110 (+0.04%); split: -0.04%, +0.09%
InvThroughput: 4352818 -> 4352206 (-0.01%); split: -0.04%, +0.02%
VClause: 101906 -> 102058 (+0.15%); split: -0.03%, +0.18%
Copies: 342210 -> 342368 (+0.05%); split: -0.09%, +0.14%
Branches: 114988 -> 114993 (+0.00%)
PreVGPRs: 26551 -> 27111 (+2.11%)
VALU: 2249366 -> 2249524 (+0.01%); split: -0.01%, +0.02%
SALU: 529828 -> 529808 (-0.00%); split: -0.01%, +0.00%
Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35533 >
2025-07-19 16:07:59 +00:00
Georg Lehmann
497f607c8e
radv/nir/lower_cmat: vectorize GFX11 B -> ACC conversion
...
Foz-DB Navi31:
Totals from 7 out of 14 FSR4 shaders:
MaxWaves: 50 -> 52 (+4.00%)
Instrs: 44951 -> 44516 (-0.97%); split: -1.00%, +0.03%
CodeSize: 309176 -> 305500 (-1.19%); split: -1.23%, +0.04%
VGPRs: 1464 -> 1416 (-3.28%)
SpillVGPRs: 188 -> 92 (-51.06%)
Scratch: 24064 -> 11776 (-51.06%)
Latency: 171318 -> 163663 (-4.47%); split: -4.51%, +0.04%
InvThroughput: 178796 -> 178956 (+0.09%); split: -0.04%, +0.13%
VClause: 769 -> 730 (-5.07%); split: -6.50%, +1.43%
Copies: 3149 -> 3261 (+3.56%); split: -1.21%, +4.76%
PreVGPRs: 1607 -> 1467 (-8.71%)
VALU: 37715 -> 37744 (+0.08%); split: -0.11%, +0.18%
SALU: 754 -> 753 (-0.13%)
VMEM: 2813 -> 2621 (-6.83%)
VOPD: 1674 -> 1685 (+0.66%); split: +1.55%, -0.90%
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36115 >
2025-07-16 11:46:52 +00:00
Georg Lehmann
7546169e1c
radv/nir/lower_cmat: vectorize GFX11 ACC -> B conversion
...
Foz-DB Navi31:
Totals from 10 out of 14 FSR4 shaders:
Instrs: 64204 -> 60749 (-5.38%)
CodeSize: 439052 -> 417668 (-4.87%)
SpillVGPRs: 186 -> 188 (+1.08%)
Scratch: 23808 -> 24064 (+1.08%)
Latency: 208878 -> 202903 (-2.86%)
InvThroughput: 232898 -> 225688 (-3.10%)
VClause: 902 -> 907 (+0.55%); split: -1.55%, +2.11%
Copies: 6418 -> 3762 (-41.38%)
Branches: 55 -> 37 (-32.73%)
PreSGPRs: 297 -> 298 (+0.34%)
PreVGPRs: 2299 -> 2303 (+0.17%)
VALU: 54762 -> 51489 (-5.98%)
SALU: 956 -> 938 (-1.88%)
VMEM: 3469 -> 3473 (+0.12%)
VOPD: 3895 -> 2126 (-45.42%)
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36115 >
2025-07-16 11:46:52 +00:00
Georg Lehmann
56d93c40ea
radv/nir/lower_cmat: convert matrix use in smaller type
...
Less conversions, and less data to move around.
Foz-DB Navi31:
Totals from 10 out of 14 FSR4 shaders:
Instrs: 65443 -> 64204 (-1.89%); split: -1.93%, +0.04%
CodeSize: 441884 -> 439052 (-0.64%); split: -1.21%, +0.57%
Latency: 213374 -> 208878 (-2.11%); split: -2.17%, +0.07%
InvThroughput: 236922 -> 232898 (-1.70%); split: -1.77%, +0.08%
VClause: 935 -> 902 (-3.53%); split: -3.74%, +0.21%
Copies: 5064 -> 6418 (+26.74%); split: -13.35%, +40.09%
Branches: 54 -> 55 (+1.85%)
VALU: 55700 -> 54762 (-1.68%); split: -1.85%, +0.16%
VOPD: 3459 -> 3895 (+12.60%); split: +16.88%, -4.28%
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36115 >
2025-07-16 11:46:52 +00:00
Georg Lehmann
f2846b936a
radv/nir/lower_cmat: use v_permlanex16_b32 instead of ds_swizzle_b32 for GFX11 ACC->B
...
ds_swizzle is slower than I expected.
Foz-DB Navi31:
Totals from 10 out of 14 FSR4 shaders:
Instrs: 68802 -> 65443 (-4.88%)
CodeSize: 458000 -> 441884 (-3.52%)
Latency: 218147 -> 213374 (-2.19%); split: -3.17%, +0.99%
InvThroughput: 230190 -> 236922 (+2.92%); split: -0.25%, +3.18%
VClause: 922 -> 935 (+1.41%); split: -0.98%, +2.39%
Copies: 5877 -> 5064 (-13.83%); split: -15.74%, +1.91%
Branches: 37 -> 54 (+45.95%)
VALU: 53441 -> 55700 (+4.23%); split: -0.55%, +4.77%
SALU: 872 -> 956 (+9.63%)
VOPD: 1767 -> 3459 (+95.76%)
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36115 >
2025-07-16 11:46:51 +00:00
Samuel Pitoiset
00a6e284c8
radv: implement DGC IB chaining when the number of sequences is too high
...
The maximum number of DWORDS per IB is limited by the hardware. So,
when the number of sequences is too high, it would just hang.
The solution here is to implement IB chaining inside the DGC cmdbuf
itself, so that a sequence chains the next one basically.
In practice, games only use up to 4K sequences and they aren't affected
by this change.
This fixes dEQP-VK.dgc.ext.misc.properties.maxIndirectSequenceCount.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13536
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36062 >
2025-07-16 10:30:41 +00:00
Samuel Pitoiset
ea742877f6
radv: re-run clang-format
...
For style consistency.
$ clang-format -i $(find src/amd/vulkan/ -name "*.h" -o -name "*.c" -o -name "*.cpp")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36118 >
2025-07-16 09:10:33 +02:00
Samuel Pitoiset
6111e40a55
radv/bvh: remove redundant definition of DIV_ROUND_UP
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36118 >
2025-07-16 09:09:30 +02:00
Natalie Vock
e978f6e247
radv/rt: Use ds_bvh_stack_push8_pop1_rtn_b32
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35269 >
2025-07-15 21:34:40 +00:00
Natalie Vock
f0aa383e09
radv/rt: Use ds_bvh_stack_rtn
...
Improves Quake 2 RTX performance by 5% on RDNA3.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35269 >
2025-07-15 21:34:40 +00:00
Natalie Vock
8815845271
radv/rt/gfx12: Always overwrite origin/dir
...
They're unchanged if we don't test against instance nodes. This makes
image_bvh8_intersect_ray kill its direction/origin operands, improving
RA.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35269 >
2025-07-15 21:34:38 +00:00
David Rosca
850a3b0cae
radv/video: Set correct VP9 decode minCodedExtent
...
Fixes: b8ac2d47e7 ("radv/video: add KHR_video_decode_vp9 support.")
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35997 >
2025-07-15 17:44:15 +00:00
David Rosca
50eaa0c19f
radv/video: Set correct H264/5 decode minCodedExtent
...
Cc: mesa-stable
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35997 >
2025-07-15 17:44:15 +00:00
Marek Olšák
6286c1c66f
nir/opt_vectorize_io: optionally vectorize loads with holes
...
e.g. load X; load W; ==> load XYZW. Verified with a shader test.
This will be used by AMD drivers. See the code comments.
Reviewed-by: Simon Perretta <simon.perretta@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36098 >
2025-07-15 16:29:30 +00:00
Marek Olšák
b0494f9485
nir/opt_varyings: optimize the consumer after constant propagation and dedupli.
...
A TF2 shader propagates 0 to the consumer, which eliminates 1 input
if we run algebraic opts and DCE before compaction.
This is a prerequisite for removing all IO var optimizations from the GLSL
linker that are redundant with nir_opt_varyings.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36091 >
2025-07-15 13:38:29 +00:00
Samuel Pitoiset
b59895140d
radv: add a way to disable the HIZ/HiS events based workaround on GFX12
...
This workaround doesn't mitigate the issue reliably/completely. An
alternative (but complex) solution also exists.
This introduces a small option that allows to disable the current
workaround as preliminary work.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36110 >
2025-07-15 10:01:54 +00:00
Pavel Gribov
24cb745460
radv: small fix for sam check
...
for exact PCIe 3.0 x8 case there will be
pcie_bandwidth_mbps >= bandwidth_mbps_threshold => (8069 >= 8069,12) == false
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36109 >
2025-07-15 09:37:32 +00:00
Samuel Pitoiset
fbea486854
radv: advertise VK_EXT_host_image_copy on GFX10+ behind RADV_PERFTEST=hic
...
This exposes an experimental implementation of HIC with
RADV_PERFTEST=hic. It's passing 100% of VKCTS but it requires some
benchmarks first to verify if performance is acceptable or not.
No addrlib support for GFX6-9.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35974 >
2025-07-15 09:12:16 +00:00
Samuel Pitoiset
ea4ad51eb1
radv: implement vkTransitionImageLayout()
...
It's a no-op.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35974 >
2025-07-15 09:12:16 +00:00
Samuel Pitoiset
b2e338a9c7
radv: implement vkCopyImageToImageEXT()
...
Because there is no surface<->surface helper in addrlib, this allocates
a temporary buffer on the CPU to do image->buffer->image. It's a naive
implementation which is probably not the best for performance, but it
works at least.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35974 >
2025-07-15 09:12:16 +00:00
Samuel Pitoiset
c9ea920da0
radv: implement vkCopyMemoryToImageEXT()/vkCopyImageToMemoryEXT()
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35974 >
2025-07-15 09:12:16 +00:00
Samuel Pitoiset
4a5370819c
radv: do not use MRT counters for host-transfer images
...
Otherwise, the tile swizzle changes and addrlib is confused.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35974 >
2025-07-15 09:12:15 +00:00
Samuel Pitoiset
8d38b25cb3
radv: add support for querying HIC memcpy size
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35974 >
2025-07-15 09:12:15 +00:00
Samuel Pitoiset
031843ebb1
radv: add support for querying HIC performance info
...
On GFX12, everything is compressed with DCC and it's completely
transparent to the userspace driver, so that should be optimal.
On older gens, using HIC disables compression which isn't optimal
for device access.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35974 >
2025-07-15 09:12:15 +00:00
Samuel Pitoiset
d89b11011f
radv: add support for formats with host-transfer
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35974 >
2025-07-15 09:12:15 +00:00
Samuel Pitoiset
545d5a0675
radv: set RADEON_SURF_HOST_TRANSFER for host-transfer images
...
To forbid some swizzles on GFX11.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35974 >
2025-07-15 09:12:14 +00:00
Samuel Pitoiset
37f3997edf
radv: disable compression for host-transfer images
...
HIC isn't supposed to have compression.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35974 >
2025-07-15 09:12:14 +00:00
Samuel Pitoiset
afa7509207
radv: map images with host-transfer at bind time
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35974 >
2025-07-15 09:12:13 +00:00
Samuel Pitoiset
d85b7b6c62
radv: only expose host visible memory types for images with host-transfer
...
Because the memory must be mapped on the CPU.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35974 >
2025-07-15 09:12:13 +00:00
David Rosca
f78222dc29
radv/video: Add support for decode tier3
...
On VCN5 both distinct and coincide output/dpb are supported. Tier3
(coincide) requires tiling, Tier2 (distinct) also works with linear.
Application can decide which one to use.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35878 >
2025-07-14 07:42:28 +00:00
David Rosca
2d06b43292
radv: Enable tiling for video images on VCN5
...
All planes must have the same swizzle mode and no tile swizzle.
Only linear decode target requires the custom height alignment.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35878 >
2025-07-14 07:42:27 +00:00
David Rosca
e50ee32876
radv: Don't allow linear tiling for video DPB images
...
We don't support linear DPB images and they will currently always
be tiled internally.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35878 >
2025-07-14 07:42:27 +00:00
Samuel Pitoiset
a51afbaff8
radv/sdma: fix unaligned 96-bits copies on GFX9
...
On SDMA4, when the pitch isn't aligned, the width needs to be scaled
by 3 for 96-bits formats.
On SDMA5+, the pitch is aligned and the driver doesn't need to fallback
to unaligned copies.
CC: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36067 >
2025-07-14 06:30:55 +00:00
Yogesh Mohan Marimuthu
1000ee3d2f
ac,radeonsi,radv: rename register_shadowing_required
...
rename register_shadowing_required to has_kernelq_reg_shadowing
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35106 >
2025-07-13 20:05:25 +00:00
Marek Olšák
34580a32ff
ac/nir: remove redundant option dont_export_cull_distances
...
It has the same value as can_cull.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35529 >
2025-07-12 10:28:21 +00:00
Marek Olšák
fde3384cfd
ac/nir: remove pack_clip_cull_distances option
...
it's always true
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35529 >
2025-07-12 10:28:21 +00:00
Marek Olšák
f6aecfb886
ac/llvm: don't declare LDS as an array for HS & GS & CS, use IntToPtr(0)
...
We don't need all this stuff anymore.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35529 >
2025-07-12 10:28:21 +00:00
Marek Olšák
65c5ee1628
radeonsi: stop using LLVM LDS linking logic for the GS out LDS offset
...
This will enable large code removal.
shader->config.lds_size is now always computed the same as ACO except for
compute shaders.
We have to add a new 8-bit user SGPR bitfield called
GS_STATE_GS_OUT_LDS_OFFSET_256B, which contains the offset
that was previously set by the relocation.
Since the offset must be a multiple of 256, we have to add padding
to the LDS size computation to make sure the alignment to 256 for the ESGS
LDS size doesn't cause us to exceed the maximum LDS size.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35529 >
2025-07-12 10:28:20 +00:00
Marek Olšák
f8918ed6c6
radv: stop using LLVM LDS linking logic
...
Not needed.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35473 >
2025-07-12 05:20:06 +00:00
Marek Olšák
44dd39d121
radv: pack clip and cull distance outputs for both legacy and NGG pipelines
...
This increases primitive throughput when packing reduces the number
of pos exports due to holes in clip and cull distance arrays that could be
punched out by nir_opt_clip_cull_const. This applies to all chips.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35473 >
2025-07-12 05:20:06 +00:00
Marek Olšák
2751d488ce
radv: enable nir_opt_clip_cull_const for GS too
...
The pass also supports GS now.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35473 >
2025-07-12 05:20:05 +00:00