fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-20 06:58:16 +02:00

Author	SHA1	Message	Date
Natalie Vock	8815845271	radv/rt/gfx12: Always overwrite origin/dir They're unchanged if we don't test against instance nodes. This makes image_bvh8_intersect_ray kill its direction/origin operands, improving RA. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35269>	2025-07-15 21:34:38 +00:00
Marek Olšák	bdcfe15457	radv: don't export cull distances if the shader culls against them This increases primitive throughput for all hw with NGG if the shader culls and the removal of cull distances reduces the number of position exports. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35473>	2025-07-12 05:20:05 +00:00
Marek Olšák	89e1ec92c5	radv: cull against clip and cull distances in the shader Clip and cull distance outputs decrease primitive throughput, so culling against them in the shader has even more benefit than other culling options. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35473>	2025-07-12 05:20:03 +00:00
Marek Olšák	65972f2301	ac/nir: return GSVS emit sizes from legacy GS lowering and simplify shader info This simplifies shader info in drivers by returning GSVS emit sizes from ac_nir_lower_legacy_gs. The pass knows the sizes, so drivers shouldn't have to determine them independently. This also makes the values more accurate because both drivers were computing the GSVS emit sizes inaccurately and had redundant fields in shader info. RADV had a lot of redundancy there. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35473>	2025-07-12 05:20:02 +00:00
Qiang Yu	88c79a13b9	ac,radv: move nir_load_ring_mesh_scratch_offset_amd to ac To be shared with radeonsi. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35931>	2025-07-11 02:25:51 +00:00
Qiang Yu	5ddbd8c83b	ac,radv: move mesh scratch ring constants to ac To be shared with radeonsi. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35931>	2025-07-11 02:25:51 +00:00
Qiang Yu	78fed5fc13	ac,radv: move nir_load_task_ring_entry_amd to ac To be shared with radeonsi. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35931>	2025-07-11 02:25:51 +00:00
Georg Lehmann	cac60c39a9	radv/nir/lower_cmat: use explicit shift when calculating gfx12 wave64 layout The rest of the compiler stack doesn't understand the alignment implications of the combined shift. Effect on llama.cpp fossils: Totals from 3 (13.64% of 22) affected shaders: Instrs: 5778 -> 5684 (-1.63%) CodeSize: 33540 -> 32800 (-2.21%) VGPRs: 228 -> 216 (-5.26%) Latency: 39942 -> 39417 (-1.31%) InvThroughput: 12037 -> 11862 (-1.45%) VALU: 2162 -> 2111 (-2.36%) More importantly, this replaces some ds_load_2addr_b32 with ds_load_b64. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13447 Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36016>	2025-07-10 07:11:23 +00:00
Alyssa Rosenzweig	d31cb824df	treewide: use VARYING_BIT_* Via Coccinelle patch generated by the following Python: varys = [ "POS", "COL0", "COL1", "FOGC", "TEX0", "TEX1", "TEX2", "TEX3", "TEX4", "TEX5", "TEX6", "TEX7", "PSIZ", "BFC0", "BFC1", "EDGE", "CLIP_VERTEX", "CLIP_DIST0", "CLIP_DIST1", "CULL_DIST0", "CULL_DIST1", "PRIMITIVE_ID", "PRIMITIVE_COUNT", "LAYER", "VIEWPORT", "FACE", "PRIMITIVE_SHADING_RATE", "PNTC", "TESS_LEVEL_OUTER", "TESS_LEVEL_INNER", "PRIMITIVE_INDICES", "BOUNDING_BOX0", "BOUNDING_BOX1", "VIEWPORT_MASK", "CULL_PRIMITIVE" ] t = """ @@ @@ -(1 << VARYING_SLOT_${V}) +VARYING_BIT_${V} @@ @@ -BITFIELD_BIT(VARYING_SLOT_${V}) +VARYING_BIT_${V} @@ @@ -(1ull << VARYING_SLOT_${V}) +VARYING_BIT_${V} @@ @@ -BITFIELD64_BIT(VARYING_SLOT_${V}) +VARYING_BIT_${V} """ for v in varys: from mako.template import Template print(Template(t).render(V = v)) Closes: #13453 Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Reviewed-by: Marek Olšák <maraeo@gmail.com> Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com> [panfrost, common] Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> [broadcom] Reviewed-by: Corentin Noël <corentin.noel@collabora.com> [virgl] Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com> [zink] Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35917>	2025-07-04 19:01:04 +00:00
Marek Olšák	4263b49778	ac/nir: remove ngg_scratch LDS ABI, allocate it in the lowering pass This is a cleanup. Old gs LDS layout: [es outputs][gs outputs][scratch] Old nogs LDS layout: [xfb/cull][scratch] New gs LDS layout: [es outputs][scratch|gs outputs] New nogs LDS layout: [scratch|xfb/cull] The LDS scratch is moved to the beginning of the preceding buffer in LDS, while the addresses in that LDS buffer are offset by the scratch size. It effectively merges the LDS scratch with the preceding buffer in LDS. Thanks to that, we no longer need the ngg_scratch ABI and the offset in a user SGPR. The lowering passes now return the LDS scratch size, which is used by the drivers to determine the final LDS size. The ngg_lds_layout SGPR is now unused without GS in RADV. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35352>	2025-07-02 20:27:41 +00:00
Natalie Vock	e236a731e4	radv/rt: Enable pointer flags on GFX11+ Allows hardware to do some of the culling work, as well as early-cull box nodes with CullOpaque/CullNonOpaque ray masks when all children are (not) opaque. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32417>	2025-06-28 10:31:38 +00:00
Marek Olšák	42e98f115a	radv: always use the ngg_lds_layout SGPR This is a prerequisite for NGG lowering passes to return LDS vertex and scratch sizes, which will lead to further simplifications. That will require calling gfx10_get_ngg_info after radv_postprocess_nir, which means LDS offsets are unknown when the passes are called. This makes the 2 values no longer compile-time constants. A later commit will remove NGG_LDS_LAYOUT_SCRATCH_BASE (the passes will determine it), so only NGG_LDS_LAYOUT_GS_OUT_VERTEX_BASE will come from an SGPR, though that could be removed too (non-trivially) or handled as a relocation. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35351>	2025-06-28 08:20:26 +00:00
Samuel Pitoiset	989162e67a	radv: split descriptor set and descriptor utils in separate files Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35732>	2025-06-27 07:55:37 +00:00
Marek Olšák	12df9b3def	nir: rename nir_vectorize_tess_levels -> nir_lower_tess_level_array_vars_to_vec Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35760>	2025-06-26 18:20:50 +00:00
Marek Olšák	439d805291	nir: rename nir_lower_io_to_scalar_early -> nir_lower_io_vars_to_scalar Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35760>	2025-06-26 18:20:49 +00:00
Georg Lehmann	21523dad96	radv/nir/lower_cmat: use nir_src_as_deref Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35633>	2025-06-24 17:12:34 +00:00
Georg Lehmann	48fc8c8d1c	radv/nir/lower_cmat: set optimal load/store alignment Allows vectorizing load/stores with sub dword types or with robustness. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35633>	2025-06-24 17:12:33 +00:00
Georg Lehmann	ed2ecf9ef8	radv/nir/lower_cmat: share cmat_load/cmat_store code Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35633>	2025-06-24 17:12:33 +00:00
Georg Lehmann	4ac6aae3a4	radv/nir/lower_cmat: fix gfx11 B->ACC conversion Of course I messed up the one path that's not tested by CTS. Fixes: `249ccc6b4c` ("radv/nir/lower_cmat: implement use conversions/transpose") Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35713>	2025-06-24 15:53:52 +00:00
Georg Lehmann	249ccc6b4c	radv/nir/lower_cmat: implement use conversions/transpose This could potentially be improved using packed 32bit subgroup ops, but what we actually care about (gfx12 ACC -> B) is free. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34793>	2025-06-24 07:14:34 +00:00
Georg Lehmann	cbd17cb4d6	radv/nir/lower_cmat: handle float8 conversions Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35434>	2025-06-23 07:59:27 +00:00
Samuel Pitoiset	59dfa8c2f5	radv: switch to nir_intrinsic_load_input_attachment_coord Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35556>	2025-06-20 06:12:24 +00:00
Samuel Pitoiset	50c4d5cccd	radv: use one descriptor per plane for combined image+sampler with ycbcr This removes a very old hack which will also allow us to enable DCC for multiplanar formats eventually and to reduce the combined image+sampler descriptor size from 96 to 48 on RDNA3+. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35457>	2025-06-19 12:58:32 +00:00
Samuel Pitoiset	4f37876c7b	radv: replace radv_combined_image_descriptor_sampler_offset() by a constant Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35457>	2025-06-19 12:58:31 +00:00
Georg Lehmann	e0cdf4dfdd	radv/nir/lower_cmat: use common matrix layout on gfx12 The GFX12 ISA doc describes other layouts for A/B, but they are identical to the C layout with the exception of the order of the rows (columns for A). And as long as these are swapped in the same way for both A and B, the muladd result will be the same. So we use the C layout for all uses. This will simplify conversions between uses, and allows A/B to use a single memory access for load/store in wave32. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35570>	2025-06-18 06:33:06 +00:00
Samuel Pitoiset	eeabce93b6	radv: use constants for different descriptor sizes Instead of magic values everywhere. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35428>	2025-06-13 07:53:04 +00:00
Samuel Pitoiset	63f8b8ce6d	radv/nir: adjust a comment about inlining immutable samplers That (broken) optimization has been removed few weeks ago. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35428>	2025-06-13 07:53:04 +00:00
Samuel Pitoiset	99fb1a9bd7	radv/nir: lower unassigned vertex attributes to (0,0,0,0) The spec allows both 0,0,0,0 and 0,0,0,1. Returning all zeroes makes it consistent with vertex prologs. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35423>	2025-06-13 07:33:03 +00:00
Marek Olšák	5734a916d6	ac: move tcs_offchip_layout into ac_shader_args It's the same variable between radv and radeonsi, but the implementation of the load intrinsics is very different. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34780>	2025-06-07 16:29:39 +00:00
Marek Olšák	9d9cfd89da	ac/nir/tess: compute the number of remapped VRAM outputs in common code This unifies it for both drivers. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34780>	2025-06-07 16:29:39 +00:00
Marek Olšák	ea70060826	ac/nir/tess: stop using tes_inputs_read / tes_patch_inputs read for TCS & TES use ac_nir_tess_io_info instead Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34780>	2025-06-07 16:29:39 +00:00
Marek Olšák	42445e271e	radv,radeonsi: use ac_nir_tess_io_info for LDS size computation Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34780>	2025-06-07 16:29:39 +00:00
Marek Olšák	a59464b6e3	radv,radeonsi: precompute and pass TCS per-vertex output stride via a user SGPR It's a stride of 1 output, which isn't 16. It's 16 * num_threads, aligned to 256. tcs_offchip_layout has 5 unused bits, so let's use them. Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34780>	2025-06-07 16:29:39 +00:00
Marek Olšák	742227c65c	radv,radeonsi: make TCS_OFFCHIP_LAYOUT_NUM_PATCHES not off by one We never use 128 anyway. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34780>	2025-06-07 16:29:39 +00:00
Marek Olšák	8d3e3c72e0	radv,radeonsi: merge PATCH_CONTROL_POINT & OUT_PATCH_CP into 1 field One is only used by TCS, the other is only used by TES. Use the same field for both, call it PATCH_VERTICES_IN. Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34780>	2025-06-07 16:29:39 +00:00
Marek Olšák	534b282573	ac/nir/tess: adjust memory layout of TCS outputs to have aligned store offsets There is a comment that explains it. Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34780>	2025-06-07 16:29:38 +00:00
Rhys Perry	00a2ed60f8	radv/meta: use unsigned min in copy/fill shaders Otherwise, this would break >2 GiB copy/fill. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Backport: 25.1 Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35343>	2025-06-05 09:55:32 +00:00
Marek Olšák	c3034fa82c	amd: replace most u_bit_consecutive* with BITFIELD_MASK/RANGE Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35346>	2025-06-04 17:46:38 +00:00
Samuel Pitoiset	25eb836eec	radv: fix CP DMA with NULL PRT pages on GFX8-9 On GFX8-9 (starting from Polaris10), CP DMA is broken with NULL PRT pages. It doesn't read 0 and doesn't discard writes which can cause GPU hangs. Fix that by always using the compute path when a BO is sparse. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12828 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35071>	2025-05-21 09:41:23 +00:00
Samuel Pitoiset	6528bb76b1	radv: stop using GDS for emulated prims gen/xfb queries on GFX11-GFX11.5 Use the same path as GFX12 using SSBO atomics because performance should be equal or slightly better due to less synchronization. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35017>	2025-05-21 08:48:04 +02:00
Samuel Pitoiset	439baafe5e	radv: increase size of the buffer for emulated queries on GFX12 This increases this buffer by 20 bytes but it will be re-used for emulated queries on GFX11-GFX11.5 in order to remove the GDS path. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35017>	2025-05-21 08:46:12 +02:00
Samuel Pitoiset	69ff204422	radv: remove the optimization for equal immutable samplers This optimization used to optimize the allocated space for descriptors when immutable samplers are equal. Though, this was basically broken : - descriptor copies were broken for combiner image sampler (or sampler) with equal immutable samplers because 96 bytes were copied instead of 64 bytes (cf. the linked ticket). This could be fixed but it's not worth it. - the value returned by vkGetDescriptorLayoutSupport() was broken, it should have been 96 with no immutable samplers (or when they aren't equal) This optimization was also not applied for descriptor buffers which is the default for vkd3d-proton and Zink. DXVK doesn't use db but it doesn't use immutable samplers, so basically only native vulkan games would be concerned. Note that immutable samplers would still be inlined in shaders if no indirect access which should be 99.9% of the usecase. Cc: mesa-stable Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11165 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34928>	2025-05-13 16:27:22 +00:00
Georg Lehmann	7716e63cd6	radv/nir/lower_cmat: handle bf16 conversions Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34768>	2025-05-09 11:20:25 +00:00
Georg Lehmann	78524837c1	radv/nir/opt_cmat: support bfloat16 Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34768>	2025-05-09 11:20:25 +00:00
Georg Lehmann	e8f5c335ff	radv,aco,nir: keep the A and B base type for cmat_muladd_amd With bfloat16, and the two fp8 formats in the future, using just the bit size to identify the types is no longer possible. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34768>	2025-05-09 11:20:25 +00:00
Rhys Perry	8abb787c6b	radv/gfx12: use dword3 smem loads for push constants fossil-db (gfx1201): Totals from 5 (0.01% of 79377) affected shaders: (no affected stats) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34162>	2025-05-08 13:30:50 +00:00
Georg Lehmann	6d2190300a	radv/nir/lower_cmat: tightly pack 8bit gfx11 acc matrix Invalid for now, but used by vkd3d-proton, where the use case is to convert a result matrix to lower precision, followed by a store. For 16bit accumulation matrices, GFX11 only uses 16bits per 32bit register. RADV's coop matrix code pads the unused space with undefs and uses a vector with twice as many elements as the matrix length. Extending that to 8bit by leaving 24 bits unused is unnecessary as these matrices as there is no hw unit that requires it. And in wave32, it would also result in vectors larger than NIR's limit. So tightly pack 8bit matrices without any undef padding. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34382>	2025-04-24 06:37:44 +00:00
Georg Lehmann	bbc9bc9d24	radv/nir/lower_cmat: use cmat_mul instead of duplicating hw details for type conversion Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34382>	2025-04-24 06:37:44 +00:00
Georg Lehmann	31a3430570	radv/nir/lower_cmat: use radv_nir_cmat_bits consistently Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34382>	2025-04-24 06:37:44 +00:00
Georg Lehmann	c3964e87f8	radv: apply fneg/fabs modifiers to wmma Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34396>	2025-04-22 16:08:55 +00:00


296 commits