fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-20 13:28:09 +02:00

Author	SHA1	Message	Date
Marek Olšák	4c8a757951	radv,radeonsi: mark VS input loads and poly stipple load speculatable Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35950>	2025-07-24 06:31:17 +00:00
Alyssa Rosenzweig	8a1a410389	treewide: use SWAP macro Via Coccinelle patch + manual clean up: @@ identifier temporary, a, b; type T; @@ -T temporary = a; -a = b; -b = temporary; +SWAP(a, b); Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36297>	2025-07-23 19:49:47 +00:00
Alyssa Rosenzweig	6b34e2174e	nir: introduce ergonomic tex builder for intrinsics, we have these really nice builders using designated initializers + macros to specify optional indices. texture instrs have even more craziness involved, but we can do the same trick. this commit takes the existing "fixed form" deref-centric tex builders and generalizes them to work with non-deref textures, making it useful also for GL and late VK passes, while providing an API that strives to be ergonomic and consistent. this series only implements a subset of possible texture operations for now, but more generalizing could be added as people have need. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36050>	2025-07-21 12:11:41 +00:00
Konstantin Seurer	d59c22b6e1	radv/rt: Implement null acceleration structure in shader code The previous approach is broken with descriptor buffer capture/replay because the address of the dummy VA used can randomly change. Totals from 78 (20.58% of 379) affected shaders: Instrs: 3837275 -> 3839653 (+0.06%); split: -0.01%, +0.07% CodeSize: 20235104 -> 20251744 (+0.08%); split: -0.01%, +0.09% SpillSGPRs: 997 -> 1007 (+1.00%) Latency: 22305937 -> 22331551 (+0.11%); split: -0.03%, +0.15% InvThroughput: 4232313 -> 4237341 (+0.12%); split: -0.03%, +0.15% VClause: 97043 -> 97027 (-0.02%); split: -0.02%, +0.01% SClause: 72169 -> 72416 (+0.34%); split: -0.00%, +0.35% Copies: 321578 -> 322126 (+0.17%); split: -0.11%, +0.28% Branches: 110163 -> 110444 (+0.26%); split: -0.00%, +0.26% PreSGPRs: 7879 -> 7942 (+0.80%) VALU: 2155040 -> 2156425 (+0.06%); split: -0.02%, +0.09% SALU: 502292 -> 503078 (+0.16%); split: -0.00%, +0.16% Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36034>	2025-07-19 21:02:42 +00:00
Konstantin Seurer	d28ff8050a	radv/rt: Use inv_dir for software ray-triangle tests Reviewed-by: Natalie Vock <natalie.vock@gmx.de> Reviewed-by: Autumn Ashton <misyl@froggi.es> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36213>	2025-07-19 16:35:37 +00:00
Konstantin Seurer	5494789e89	radv/rt: Optimize emulated ray-triangle tests The imod instructions are lowered to 4 alu instructions each. We can do better by packing the results with the values for kz. Reviewed-by: Natalie Vock <natalie.vock@gmx.de> Reviewed-by: Autumn Ashton <misyl@froggi.es> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36213>	2025-07-19 16:35:37 +00:00
Konstantin Seurer	d140f2a6a2	radv: Implement watertightness for emulated RT Instead of using fp64 (which is broken in some cases), the new approach only uses fp32 and implements tiebreaking for edge/vertex hits. Using fp32 is also much faster, improving performance of q2rtx by around 40%. Reviewed-by: Natalie Vock <natalie.vock@gmx.de> Reviewed-by: Autumn Ashton <misyl@froggi.es> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36213>	2025-07-19 16:35:36 +00:00
Konstantin Seurer	55641f9ca0	radv: Disable pointer flags and the GFX12 WA for emulated RT Reviewed-by: Natalie Vock <natalie.vock@gmx.de> Reviewed-by: Autumn Ashton <misyl@froggi.es> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36213>	2025-07-19 16:35:36 +00:00
Konstantin Seurer	df44b353ad	radv: Optimize ray tracing position fetch Gets rid of a lot of indirection when fetching triangle positions. Storing the primitive address increases register pressure by a bit but the traversal shader which should have the highest register demand should not be affected when position fetch is not used. Totals: Instrs: 4021686 -> 4022435 (+0.02%); split: -0.01%, +0.03% CodeSize: 21235812 -> 21235832 (+0.00%); split: -0.02%, +0.02% Latency: 23402275 -> 23412110 (+0.04%); split: -0.04%, +0.09% InvThroughput: 4352818 -> 4352206 (-0.01%); split: -0.04%, +0.02% VClause: 101906 -> 102058 (+0.15%); split: -0.03%, +0.18% Copies: 342210 -> 342368 (+0.05%); split: -0.09%, +0.14% Branches: 114988 -> 114993 (+0.00%) PreVGPRs: 26551 -> 27111 (+2.11%) VALU: 2249366 -> 2249524 (+0.01%); split: -0.01%, +0.02% SALU: 529828 -> 529808 (-0.00%); split: -0.01%, +0.00% Reviewed-by: Natalie Vock <natalie.vock@gmx.de> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35533>	2025-07-19 16:07:59 +00:00
Georg Lehmann	497f607c8e	radv/nir/lower_cmat: vectorize GFX11 B -> ACC conversion Foz-DB Navi31: Totals from 7 out of 14 FSR4 shaders: MaxWaves: 50 -> 52 (+4.00%) Instrs: 44951 -> 44516 (-0.97%); split: -1.00%, +0.03% CodeSize: 309176 -> 305500 (-1.19%); split: -1.23%, +0.04% VGPRs: 1464 -> 1416 (-3.28%) SpillVGPRs: 188 -> 92 (-51.06%) Scratch: 24064 -> 11776 (-51.06%) Latency: 171318 -> 163663 (-4.47%); split: -4.51%, +0.04% InvThroughput: 178796 -> 178956 (+0.09%); split: -0.04%, +0.13% VClause: 769 -> 730 (-5.07%); split: -6.50%, +1.43% Copies: 3149 -> 3261 (+3.56%); split: -1.21%, +4.76% PreVGPRs: 1607 -> 1467 (-8.71%) VALU: 37715 -> 37744 (+0.08%); split: -0.11%, +0.18% SALU: 754 -> 753 (-0.13%) VMEM: 2813 -> 2621 (-6.83%) VOPD: 1674 -> 1685 (+0.66%); split: +1.55%, -0.90% Acked-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36115>	2025-07-16 11:46:52 +00:00
Georg Lehmann	7546169e1c	radv/nir/lower_cmat: vectorize GFX11 ACC -> B conversion Foz-DB Navi31: Totals from 10 out of 14 FSR4 shaders: Instrs: 64204 -> 60749 (-5.38%) CodeSize: 439052 -> 417668 (-4.87%) SpillVGPRs: 186 -> 188 (+1.08%) Scratch: 23808 -> 24064 (+1.08%) Latency: 208878 -> 202903 (-2.86%) InvThroughput: 232898 -> 225688 (-3.10%) VClause: 902 -> 907 (+0.55%); split: -1.55%, +2.11% Copies: 6418 -> 3762 (-41.38%) Branches: 55 -> 37 (-32.73%) PreSGPRs: 297 -> 298 (+0.34%) PreVGPRs: 2299 -> 2303 (+0.17%) VALU: 54762 -> 51489 (-5.98%) SALU: 956 -> 938 (-1.88%) VMEM: 3469 -> 3473 (+0.12%) VOPD: 3895 -> 2126 (-45.42%) Acked-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36115>	2025-07-16 11:46:52 +00:00
Georg Lehmann	56d93c40ea	radv/nir/lower_cmat: convert matrix use in smaller type Fewer conversions, and less data to move around. Foz-DB Navi31: Totals from 10 out of 14 FSR4 shaders: Instrs: 65443 -> 64204 (-1.89%); split: -1.93%, +0.04% CodeSize: 441884 -> 439052 (-0.64%); split: -1.21%, +0.57% Latency: 213374 -> 208878 (-2.11%); split: -2.17%, +0.07% InvThroughput: 236922 -> 232898 (-1.70%); split: -1.77%, +0.08% VClause: 935 -> 902 (-3.53%); split: -3.74%, +0.21% Copies: 5064 -> 6418 (+26.74%); split: -13.35%, +40.09% Branches: 54 -> 55 (+1.85%) VALU: 55700 -> 54762 (-1.68%); split: -1.85%, +0.16% VOPD: 3459 -> 3895 (+12.60%); split: +16.88%, -4.28% Acked-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36115>	2025-07-16 11:46:52 +00:00
Georg Lehmann	f2846b936a	radv/nir/lower_cmat: use v_permlanex16_b32 instead of ds_swizzle_b32 for GFX11 ACC->B ds_swizzle is slower than I expected. Foz-DB Navi31: Totals from 10 out of 14 FSR4 shaders: Instrs: 68802 -> 65443 (-4.88%) CodeSize: 458000 -> 441884 (-3.52%) Latency: 218147 -> 213374 (-2.19%); split: -3.17%, +0.99% InvThroughput: 230190 -> 236922 (+2.92%); split: -0.25%, +3.18% VClause: 922 -> 935 (+1.41%); split: -0.98%, +2.39% Copies: 5877 -> 5064 (-13.83%); split: -15.74%, +1.91% Branches: 37 -> 54 (+45.95%) VALU: 53441 -> 55700 (+4.23%); split: -0.55%, +4.77% SALU: 872 -> 956 (+9.63%) VOPD: 1767 -> 3459 (+95.76%) Acked-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36115>	2025-07-16 11:46:51 +00:00
Samuel Pitoiset	ea742877f6	radv: re-run clang-format For style consistency. $ clang-format -i $(find src/amd/vulkan/ -name "*.h" -o -name "*.c" -o -name "*.cpp") Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36118>	2025-07-16 09:10:33 +02:00
Natalie Vock	e978f6e247	radv/rt: Use ds_bvh_stack_push8_pop1_rtn_b32 Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35269>	2025-07-15 21:34:40 +00:00
Natalie Vock	f0aa383e09	radv/rt: Use ds_bvh_stack_rtn Improves Quake 2 RTX performance by 5% on RDNA3. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35269>	2025-07-15 21:34:40 +00:00
Natalie Vock	8815845271	radv/rt/gfx12: Always overwrite origin/dir They're unchanged if we don't test against instance nodes. This makes image_bvh8_intersect_ray kill its direction/origin operands, improving RA. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35269>	2025-07-15 21:34:38 +00:00
Marek Olšák	bdcfe15457	radv: don't export cull distances if the shader culls against them This increases primitive throughput for all hw with NGG if the shader culls and the removal of cull distances reduces the number of position exports. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35473>	2025-07-12 05:20:05 +00:00
Marek Olšák	89e1ec92c5	radv: cull against clip and cull distances in the shader Clip and cull distance outputs decrease primitive throughput, so culling against them in the shader has even more benefit than other culling options. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35473>	2025-07-12 05:20:03 +00:00
Marek Olšák	65972f2301	ac/nir: return GSVS emit sizes from legacy GS lowering and simplify shader info This simplifies shader info in drivers by returning GSVS emit sizes from ac_nir_lower_legacy_gs. The pass knows the sizes, so drivers shouldn't have to determine them independently. This also makes the values more accurate because both drivers were computing the GSVS emit sizes inaccurately and had redundant fields in shader info. RADV had a lot of redundancy there. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35473>	2025-07-12 05:20:02 +00:00
Qiang Yu	88c79a13b9	ac,radv: move nir_load_ring_mesh_scratch_offset_amd to ac To be shared with radeonsi. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35931>	2025-07-11 02:25:51 +00:00
Qiang Yu	5ddbd8c83b	ac,radv: move mesh scratch ring constants to ac To be shared with radeonsi. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35931>	2025-07-11 02:25:51 +00:00
Qiang Yu	78fed5fc13	ac,radv: move nir_load_task_ring_entry_amd to ac To be shared with radeonsi. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35931>	2025-07-11 02:25:51 +00:00
Georg Lehmann	cac60c39a9	radv/nir/lower_cmat: use explicit shift when calculating gfx12 wave64 layout The rest of the compiler stack doesn't understand the alignment implications of the combined shift. Effect on llama.cpp fossils: Totals from 3 (13.64% of 22) affected shaders: Instrs: 5778 -> 5684 (-1.63%) CodeSize: 33540 -> 32800 (-2.21%) VGPRs: 228 -> 216 (-5.26%) Latency: 39942 -> 39417 (-1.31%) InvThroughput: 12037 -> 11862 (-1.45%) VALU: 2162 -> 2111 (-2.36%) More importantly, this replaces some ds_load_2addr_b32 with ds_load_b64. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13447 Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36016>	2025-07-10 07:11:23 +00:00
Alyssa Rosenzweig	d31cb824df	treewide: use VARYING_BIT_* Via Coccinelle patch generated by the following Python: varys = [ "POS", "COL0", "COL1", "FOGC", "TEX0", "TEX1", "TEX2", "TEX3", "TEX4", "TEX5", "TEX6", "TEX7", "PSIZ", "BFC0", "BFC1", "EDGE", "CLIP_VERTEX", "CLIP_DIST0", "CLIP_DIST1", "CULL_DIST0", "CULL_DIST1", "PRIMITIVE_ID", "PRIMITIVE_COUNT", "LAYER", "VIEWPORT", "FACE", "PRIMITIVE_SHADING_RATE", "PNTC", "TESS_LEVEL_OUTER", "TESS_LEVEL_INNER", "PRIMITIVE_INDICES", "BOUNDING_BOX0", "BOUNDING_BOX1", "VIEWPORT_MASK", "CULL_PRIMITIVE" ] t = """ @@ @@ -(1 << VARYING_SLOT_${V}) +VARYING_BIT_${V} @@ @@ -BITFIELD_BIT(VARYING_SLOT_${V}) +VARYING_BIT_${V} @@ @@ -(1ull << VARYING_SLOT_${V}) +VARYING_BIT_${V} @@ @@ -BITFIELD64_BIT(VARYING_SLOT_${V}) +VARYING_BIT_${V} """ for v in varys: from mako.template import Template print(Template(t).render(V = v)) Closes: #13453 Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Reviewed-by: Marek Olšák <maraeo@gmail.com> Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com> [panfrost, common] Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> [broadcom] Reviewed-by: Corentin Noël <corentin.noel@collabora.com> [virgl] Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com> [zink] Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35917>	2025-07-04 19:01:04 +00:00
Marek Olšák	4263b49778	ac/nir: remove ngg_scratch LDS ABI, allocate it in the lowering pass This is a cleanup. Old gs LDS layout: [es outputs][gs outputs][scratch] Old nogs LDS layout: [xfb/cull][scratch] New gs LDS layout: [es outputs][scratch|gs outputs] New nogs LDS layout: [scratch|xfb/cull] The LDS scratch is moved to the beginning of the preceding buffer in LDS, while the addresses in that LDS buffer are offset by the scratch size. It effectively merges the LDS scratch with the preceding buffer in LDS. Thanks to that, we no longer need the ngg_scratch ABI and the offset in a user SGPR. The lowering passes now return the LDS scratch size, which is used by the drivers to determine the final LDS size. The ngg_lds_layout SGPR is now unused without GS in RADV. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35352>	2025-07-02 20:27:41 +00:00
Natalie Vock	e236a731e4	radv/rt: Enable pointer flags on GFX11+ Allows hardware to do some of the culling work, as well as early-cull box nodes with CullOpaque/CullNonOpaque ray masks when all children are (not) opaque. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32417>	2025-06-28 10:31:38 +00:00
Marek Olšák	42e98f115a	radv: always use the ngg_lds_layout SGPR This is a prerequisite for NGG lowering passes to return LDS vertex and scratch sizes, which will lead to further simplifications. That will require calling gfx10_get_ngg_info after radv_postprocess_nir, which means LDS offsets are unknown when the passes are called. This makes the 2 values no longer compile-time constants. A later commit will remove NGG_LDS_LAYOUT_SCRATCH_BASE (the passes will determine it), so only NGG_LDS_LAYOUT_GS_OUT_VERTEX_BASE will come from an SGPR, though that could be removed too (non-trivially) or handled as a relocation. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35351>	2025-06-28 08:20:26 +00:00
Samuel Pitoiset	989162e67a	radv: split descriptor set and descriptor utils in separate files Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35732>	2025-06-27 07:55:37 +00:00
Marek Olšák	12df9b3def	nir: rename nir_vectorize_tess_levels -> nir_lower_tess_level_array_vars_to_vec Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35760>	2025-06-26 18:20:50 +00:00
Marek Olšák	439d805291	nir: rename nir_lower_io_to_scalar_early -> nir_lower_io_vars_to_scalar Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35760>	2025-06-26 18:20:49 +00:00
Georg Lehmann	21523dad96	radv/nir/lower_cmat: use nir_src_as_deref Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35633>	2025-06-24 17:12:34 +00:00
Georg Lehmann	48fc8c8d1c	radv/nir/lower_cmat: set optimal load/store alignment Allows vectorizing load/stores with sub dword types or with robustness. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35633>	2025-06-24 17:12:33 +00:00
Georg Lehmann	ed2ecf9ef8	radv/nir/lower_cmat: share cmat_load/cmat_store code Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35633>	2025-06-24 17:12:33 +00:00
Georg Lehmann	4ac6aae3a4	radv/nir/lower_cmat: fix gfx11 B->ACC conversion Of course I messed up the one path that's not tested by CTS. Fixes: `249ccc6b4c` ("radv/nir/lower_cmat: implement use conversions/transpose") Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35713>	2025-06-24 15:53:52 +00:00
Georg Lehmann	249ccc6b4c	radv/nir/lower_cmat: implement use conversions/transpose This could potentially be improved using packed 32bit subgroup ops, but what we actually care about (gfx12 ACC -> B) is free. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34793>	2025-06-24 07:14:34 +00:00
Georg Lehmann	cbd17cb4d6	radv/nir/lower_cmat: handle float8 conversions Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35434>	2025-06-23 07:59:27 +00:00
Samuel Pitoiset	59dfa8c2f5	radv: switch to nir_intrinsic_load_input_attachment_coord Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35556>	2025-06-20 06:12:24 +00:00
Samuel Pitoiset	50c4d5cccd	radv: use one descriptor per plane for combined image+sampler with ycbcr This removes a very old hack which will also allow us to enable DCC for multiplanar formats eventually and to reduce the combined image+sampler descriptor size from 96 to 48 on RDNA3+. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35457>	2025-06-19 12:58:32 +00:00
Samuel Pitoiset	4f37876c7b	radv: replace radv_combined_image_descriptor_sampler_offset() by a constant Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35457>	2025-06-19 12:58:31 +00:00
Georg Lehmann	e0cdf4dfdd	radv/nir/lower_cmat: use common matrix layout on gfx12 The GFX12 ISA doc describes other layouts for A/B, but they are identical to the C layout with the exception of the order of the rows (columns for A). And as long as these are swapped in the same way for both A and B, the muladd result will be the same. So we use the C layout for all uses. This will simplify conversions between uses, and allows A/B to use a single memory access for load/store in wave32. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35570>	2025-06-18 06:33:06 +00:00
Samuel Pitoiset	eeabce93b6	radv: use constants for different descriptor sizes Instead of magic values everywhere. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35428>	2025-06-13 07:53:04 +00:00
Samuel Pitoiset	63f8b8ce6d	radv/nir: adjust a comment about inlining immutable samplers That (broken) optimization was removed a few weeks ago. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35428>	2025-06-13 07:53:04 +00:00
Samuel Pitoiset	99fb1a9bd7	radv/nir: lower unassigned vertex attributes to (0,0,0,0) The spec allows both 0,0,0,0 and 0,0,0,1. Returning all zeroes makes it consistent with vertex prologs. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35423>	2025-06-13 07:33:03 +00:00
Marek Olšák	5734a916d6	ac: move tcs_offchip_layout into ac_shader_args It's the same variable between radv and radeonsi, but the implementation of the load intrinsics is very different. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34780>	2025-06-07 16:29:39 +00:00
Marek Olšák	9d9cfd89da	ac/nir/tess: compute the number of remapped VRAM outputs in common code This unifies it for both drivers. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34780>	2025-06-07 16:29:39 +00:00
Marek Olšák	ea70060826	ac/nir/tess: stop using tes_inputs_read / tes_patch_inputs read for TCS & TES use ac_nir_tess_io_info instead Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34780>	2025-06-07 16:29:39 +00:00
Marek Olšák	42445e271e	radv,radeonsi: use ac_nir_tess_io_info for LDS size computation Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34780>	2025-06-07 16:29:39 +00:00
Marek Olšák	a59464b6e3	radv,radeonsi: precompute and pass TCS per-vertex output stride via a user SGPR It's a stride of 1 output, which isn't 16. It's 16 * num_threads, aligned to 256. tcs_offchip_layout has 5 unused bits, so let's use them. Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34780>	2025-06-07 16:29:39 +00:00
Marek Olšák	742227c65c	radv,radeonsi: make TCS_OFFCHIP_LAYOUT_NUM_PATCHES not off by one We never use 128 anyway. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34780>	2025-06-07 16:29:39 +00:00
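
Note on a59464b6e3 (TCS per-vertex output stride): the message gives the stride of one output as 16 * num_threads, aligned to 256. A minimal sketch of that arithmetic follows. The helper names align_pot and tcs_per_vertex_output_stride are hypothetical and not taken from the Mesa tree, and the unit is assumed to be bytes, which the message does not state.

```c
#include <assert.h>
#include <stdint.h>

/* Round value up to a multiple of alignment (alignment must be a power of two).
 * Hypothetical helper, written only for this illustration. */
static uint32_t align_pot(uint32_t value, uint32_t alignment)
{
   assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
   return (value + alignment - 1) & ~(alignment - 1);
}

/* Stride of one per-vertex output as described in the commit message:
 * 16 * num_threads, aligned to 256. The "bytes" unit is an assumption. */
static uint32_t tcs_per_vertex_output_stride(uint32_t num_threads)
{
   return align_pot(16u * num_threads, 256u);
}
```

With this sketch, 16 threads give a stride of 256 and 17 threads give 512, which shows why a fixed stride of 16 would be wrong.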


312 commits
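
Note on the two treewide rewrites (8a1a410389 and d31cb824df): both replace a hand-written idiom with a named macro. The sketch below shows the before and after shape of each rewrite using stand-in definitions. EXAMPLE_SWAP, the example_varying_slot enum, and its slot numbers are hypothetical and are not Mesa's actual definitions. The swap macro relies on the GCC/Clang __typeof__ extension.

```c
#include <stdint.h>

/* Stand-in swap macro (not Mesa's definition); uses the __typeof__ extension. */
#define EXAMPLE_SWAP(a, b)         \
   do {                            \
      __typeof__(a) tmp_ = (a);    \
      (a) = (b);                   \
      (b) = tmp_;                  \
   } while (0)

/* Hypothetical slot numbers, chosen only for illustration. */
enum example_varying_slot {
   EXAMPLE_SLOT_POS = 0,
   EXAMPLE_SLOT_LAYER = 40,
};

/* Named 64-bit masks, one per slot, replacing open-coded shifts. */
#define EXAMPLE_BIT_POS   (UINT64_C(1) << EXAMPLE_SLOT_POS)
#define EXAMPLE_BIT_LAYER (UINT64_C(1) << EXAMPLE_SLOT_LAYER)

/* Before: T tmp = a; a = b; b = tmp;   After: EXAMPLE_SWAP(a, b); */
static void order_pair(int *lo, int *hi)
{
   if (*lo > *hi)
      EXAMPLE_SWAP(*lo, *hi);
}

/* Before: (1ull << SLOT_POS) | (1ull << SLOT_LAYER)   After: named bits. */
static uint64_t example_outputs_mask(void)
{
   return EXAMPLE_BIT_POS | EXAMPLE_BIT_LAYER;
}
```

One practical benefit of a named 64-bit mask in a sketch like this is that it avoids an accidental int-width shift such as 1 << 40, which is undefined behavior in C when int is 32 bits wide.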