fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-09 04:38:03 +02:00

Author	SHA1	Message	Date
Alyssa Rosenzweig	a0694fd5c3	libagx: drop pointless helper Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32529>	2024-12-12 21:16:12 +00:00
Alyssa Rosenzweig	c34635c58d	agx: implement halts just translate to a stop. seems to work fine. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32529>	2024-12-12 21:16:12 +00:00
Alyssa Rosenzweig	21c16fe343	asahi,hk: wire up printf, abort Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32529>	2024-12-12 21:16:12 +00:00
Lionel Landwerlin	36623697d1	hk: fix timeline value type Signed-off-by: Lionel Landwerlin <llandwerlin@gmail.com> Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32529>	2024-12-12 21:16:12 +00:00
Alyssa Rosenzweig	dd4805fcc8	asahi/clc: remap __FILE__ important for reproducibility. wondering if we can do this in common code but not sure how yet. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32529>	2024-12-12 21:16:12 +00:00
Alyssa Rosenzweig	bfe1fd737b	asahi: allow c23 extensions hk already does. this quiesces warnings with single argument static_assert which we want for CL parity. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32529>	2024-12-12 21:16:12 +00:00
Alyssa Rosenzweig	13a4186c96	util/bitpack_helpers: make partially CL safe add enough preprocessor guards that we can include this from CL and get basic implementations of things. FIXED packs are missing due to llroundf (probably fixable). Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com> Reviewed-by: Karol Herbst <kherbst@redhat.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32529>	2024-12-12 21:16:12 +00:00
Alyssa Rosenzweig	d64caf4161	libcl: add VkDraw(Indexed)IndirectCommand definitions this is helpful to indirect draw munging code, which applies to at least 3 stacks using driver CL stuff (current Intel, shortterm Asahi, mediumterm Panfrost) Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com> Reviewed-by: Karol Herbst <kherbst@redhat.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32529>	2024-12-12 21:16:12 +00:00
Alyssa Rosenzweig	12e27497b3	libcl: add a common header for CPU/GPU stuff In an attempt to make OpenCL shaders more "batteries included", start building up a standard library. Based on libagx.h. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com> Reviewed-by: Karol Herbst <kherbst@redhat.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32529>	2024-12-12 21:16:12 +00:00
Alyssa Rosenzweig	13b8af95fb	clc: plumb cl_khr_subgroup_ballot although rusticl isn't lighting it up yet, it's helpful to get sub_group_ballot for driver CL, which is all standard Vulkan-compatible spirv. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Karol Herbst <kherbst@redhat.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32529>	2024-12-12 21:16:12 +00:00
Paulo Zanoni	d4a54d4f92	brw: don't read past the end of old_src buffer in resize_sources() In this case, num_sources is bigger than this->sources, so if we loop up to num_sources (instead of this->sources) we'll end up reading past the end of old_src[]. Only copy up to what we originally had. This was found by code inspection, I'm not aware of any applications failing due to the lack of this patch. Fixes: `d9e737212d` ("intel/brw: Add a src array for the common case in fs_inst") Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32600>	2024-12-12 20:33:13 +00:00
Samuel Pitoiset	c7a7f0244f	radv: add radv_lower_terminate_to_discard and enable for Indiana Jones To work around a game bug. This fixes the rendering issue with eyes. Cc: mesa-stable Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32606>	2024-12-12 19:54:39 +00:00
Samuel Pitoiset	4d4418dbb3	spirv: add an option to lower SpvOpTerminateInvocation to OpKill To work around game bugs like Indiana Jones. Original workaround found by Hans-Kristian. Cc: mesa-stable Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32606>	2024-12-12 19:54:39 +00:00
Erik Faye-Lund	976eb6825e	panvk: do not require opt-in for panvk on v10 As of writing, PanVK on v10 HW is in pretty good shape. It's not yet conformant, but we were passing over 99.9% of the CTS last time I checked. That's probably good enough to drop the opt-in here. Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32561>	2024-12-12 19:32:06 +00:00
Erik Faye-Lund	12067727fa	panvk: soften the language around opt-in We already have and use vk_warn_non_conformant_implementation(), so we're already being clear that PanVK is not yet conformant. Let's not repeat that information here, and instead focus on it not being well-tested. This brings the wording more or less in-line with NVK. Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32561>	2024-12-12 19:32:06 +00:00
Timur Kristóf	deab81fb0d	radv: Configure implicit VS primitive ID to be per-primitive. This is beneficial to applications that rely on the implicit primitive ID from VS. - We don't have to disable provoking vertex reuse, which results in more efficient vertex processing. - There is no LDS access needed to export the primitive ID, because it is already available to GS threads. - As a consequence of not needing LDS, we can use this together with NGG passthrough mode. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32270>	2024-12-12 18:11:47 +00:00
Timur Kristóf	95ac0f8d76	radv: Reorder FS primitive ID input after layer and viewport. We want to make the implicit VS primitive ID a per-primitive output attribute, which means that this has to be last. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32270>	2024-12-12 18:11:47 +00:00
Timur Kristóf	9224b9a752	ac/nir/ngg: Add ability to store primitive ID as per-primitive. This configuration will be enabled in RADV in a subsequent commit. On GFX10.3: Do this together with the primitive export, to avoid adding extra CF, and to ensure optimal access of the export space. On GFX11: It's not an export but a memory store instruction, so always do it earlier and ensure the optimal attribute ring access pattern. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32270>	2024-12-12 18:11:45 +00:00
Timur Kristóf	d670dc0c0b	radv: Only set NGG_DISABLE_PROVOK_REUSE for VS. It doesn't do anything useful for other stages. In VS, we use this when the implicit primitive ID is needed, so that we can export that as a per-vertex attribute of the provoking vertex. In TES, the patch ID (which is used as the primitive ID) is already a per-vertex input VGPR, so it doesn't make sense to configure this. In GS, the primitive ID is explicitly written by the shader, so it makes no sense to disable provoking vertex reuse in the input. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32270>	2024-12-12 18:11:45 +00:00
Rhys Perry	9fe92689cc	radv: increase maxComputeWorkGroupCount[0] Match AMDVLK and radeonsi. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32577>	2024-12-12 17:38:47 +00:00
Rhys Perry	53d0187bab	aco: decrease max_workgroup_size Match the limit of radeonsi and RADV. No fossil-db changes. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32577>	2024-12-12 17:38:46 +00:00
Rhys Perry	87f2f77960	aco: fix max_workgroup_count[0] This is necessary for radeonsi. fossil-db (navi21): Totals from 292 (0.37% of 79395) affected shaders: Instrs: 305965 -> 306182 (+0.07%); split: -0.00%, +0.07% CodeSize: 1624816 -> 1627212 (+0.15%); split: -0.00%, +0.15% Latency: 5244652 -> 5243587 (-0.02%); split: -0.07%, +0.05% InvThroughput: 1221089 -> 1225285 (+0.34%); split: -0.04%, +0.38% Copies: 22712 -> 22702 (-0.04%) PreSGPRs: 10713 -> 10712 (-0.01%) PreVGPRs: 10918 -> 10920 (+0.02%) VALU: 178613 -> 178836 (+0.12%) SALU: 43490 -> 43493 (+0.01%); split: -0.02%, +0.03% Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32577>	2024-12-12 17:38:46 +00:00
Lionel Landwerlin	e0b5179869	blorp: use 2D dimension for 1D tiled images Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `31eeb72e45` ("blorp: Add support for blorp_copy via XY_BLOCK_COPY_BLT") Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32608>	2024-12-12 17:10:45 +00:00
Erik Faye-Lund	cfb5687cb3	panvk: disable imageCubeArray on bifrost We haven't wired this up correctly on Bifrost, so let's make this V10 only for now. Fixes: `605c173fbd` ("panvk: update feature support") Reviewed-by: Eric R. Smith <eric.smith@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32610>	2024-12-12 15:10:26 +00:00
Erik Faye-Lund	1766e676fe	panvk: do not expose subgroup support We don't currently support it in the compiler, so we shouldn't claim support for it either. Fixes: `a6e03ce428` ("panvk: advertise version 1.1 support") Acked-by: Mary Guillemard <mary.guillemard@collabora.com> Acked-by: Eric R. Smith <eric.smith@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32604>	2024-12-12 14:50:25 +00:00
Hans-Kristian Arntzen	e815d6523c	radv: Add radv_invariant_geom=true for Indiana Jones. Water puddles expect invariant position, but the vertex shaders do not declare it, leading to random glitches. Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no> Cc: mesa-stable Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32607>	2024-12-12 13:54:27 +00:00
Eric R. Smith	f8bc6c8663	panfrost: fix potential memory leak In the very unlikely case that the packed AFBC image will not save (enough) memory, we abort packing. In this case we should free the BO associated with the metadata. Fixes: `5a928f7563` ("panfrost: Add env variable for max AFBC packing ratio") Reviewed-by: Louis-Francis Ratté-Boulianne <lfrb@collabora.com> Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32597>	2024-12-12 13:08:28 +00:00
Eric R. Smith	b59e73e426	panfrost: fix read/write resource confusion in afbc_pack We read the source rather than write it, but due to a typo we were not setting this correctly. Fixes: `bc55d150a9` ("panfrost: Add support for AFBC packing") Reviewed-by: Louis-Francis Ratté-Boulianne <lfrb@collabora.com> Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32597>	2024-12-12 13:08:28 +00:00
Christian Gmeiner	2ebd5fb978	etnaviv: rs: Add DBG(..) why blt usage was not possible Can be helpful to debug issues. Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com> Reviewed-by: Lucas Stach <l.stach@pengutronix.de> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32592>	2024-12-12 12:52:37 +00:00
Christian Gmeiner	faf562651a	etnaviv: blt: Add DBG(..) why blt usage was not possible Can be helpful to debug issues. Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com> Reviewed-by: Lucas Stach <l.stach@pengutronix.de> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32592>	2024-12-12 12:52:37 +00:00
Konstantin	815ca049cd	vulkan: Fix the argument order of update_as Also moves the src argument before dst which is more consistent. Reviewed-by: Friedrich Vock <friedrich.vock@gmx.de> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32488>	2024-12-12 11:15:08 +00:00
Valentine Burley	3bff52da4e	ci: Drop lava-piglit:(x86_64\|arm64) definitions As part of the migration to deqp-runner suites, remove these definitions to prevent the introduction of additional piglit jobs without test suites. Signed-off-by: Valentine Burley <valentine.burley@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32461>	2024-12-12 10:35:41 +00:00
Valentine Burley	8e54b77910	panfrost/ci: Convert to deqp-runner suite Convert the panfrost-g52-piglit-gles2:arm64 job to a deqp-runner suite. Signed-off-by: Valentine Burley <valentine.burley@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32461>	2024-12-12 10:35:41 +00:00
Valentine Burley	ca7df52af8	svga/ci: Convert to deqp-runner suite Convert the vmware-vmx-piglit:x86_64 job to a deqp-runner suite. Signed-off-by: Valentine Burley <valentine.burley@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32461>	2024-12-12 10:35:40 +00:00
Samuel Pitoiset	370886c898	Revert "radv: disable alphaToOne except for Zink" This reverts commit `3b010a9e60`. This should be fixed properly now. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32583>	2024-12-12 10:07:25 +00:00
Samuel Pitoiset	c3a050da07	radv: fix alpha-to-coverage with alpha-to-one without MRTZ This injects a MRTZ export with only the alpha channel to select it with COVERAGE_TO_MASK_ENABLE for alpha-to-coverage. Co-Authored-by: Rhys Perry <pendingchaos02@gmail.com> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32583>	2024-12-12 10:07:25 +00:00
Samuel Pitoiset	838b1cfcbd	radv: simplify determining some fragment shader info with epilogs Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32583>	2024-12-12 10:07:25 +00:00
Collabora's Gfx CI Team	8085984aa2	Uprev Piglit to 4c0fd15fd956ec70c5509bedee219d602b334464 `468221c722...4c0fd15fd9` Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32478>	2024-12-12 09:33:59 +00:00
Samuel Pitoiset	4d1aa9a2d0	radv: fix disabling DCC for stores with drirc Displayable DCC should also be disabled, otherwise it's asserting somewhere in ac_surface.c Fixes: `e3d1f27b31` ("radv: add radv_disable_dcc_stores and enable for Indiana Jones: The Great Circle") Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32584>	2024-12-12 09:11:37 +00:00
Daniel Schürmann	26a3038b65	aco/lower_branches: remove edges between blocks if there is no direct branch This way, linear predecessors and successors better reflect the actual control flow which improves wait state insertion and hazard mitigation. Totals from 10252 (12.91% of 79395) affected shaders: (Navi31) Instrs: 18824540 -> 18803823 (-0.11%); split: -0.11%, +0.00% CodeSize: 99025464 -> 98942028 (-0.08%); split: -0.08%, +0.00% Latency: 169291854 -> 165781877 (-2.07%); split: -2.07%, +0.00% InvThroughput: 29701086 -> 29228602 (-1.59%); split: -1.59%, +0.00% SClause: 510587 -> 510586 (-0.00%) Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32389>	2024-12-12 08:46:22 +00:00
Daniel Schürmann	22ffe72022	aco: move branch lowering optimization into separate file 'aco_lower_branches.cpp' No fossil changes. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32389>	2024-12-12 08:46:22 +00:00
Friedrich Vock	845660f2b7	aco/lower_to_hw_instr: Check the right instruction's opcode instr is the branch instruction, its opcode won't ever be writelane. We should check inst instead. Found by inspection. Cc: mesa-stable Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32389>	2024-12-12 08:46:21 +00:00
Daniel Schürmann	28ab7f0168	aco/jump_threading: remove branch sequence optimization This optimization gets applied during postRA optimization, now. No fossil changes. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32330>	2024-12-12 08:11:22 +00:00
Daniel Schürmann	fcd94a8ca7	aco: move try_optimize_branching_sequence() to postRA optimizations Totals from 196 (0.25% of 79206) affected shaders: (Navi31) Instrs: 534343 -> 534438 (+0.02%); split: -0.00%, +0.02% CodeSize: 2774852 -> 2775420 (+0.02%); split: -0.00%, +0.02% Latency: 7103512 -> 7103021 (-0.01%); split: -0.01%, +0.00% InvThroughput: 959477 -> 959447 (-0.00%) Copies: 42646 -> 42648 (+0.00%) Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32330>	2024-12-12 08:11:21 +00:00
Daniel Schürmann	95d44c7ce0	aco/optimizer_postRA: set branch()->never_taken if exec is constant non-zero Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32330>	2024-12-12 08:11:21 +00:00
Daniel Schürmann	d67932f69e	aco/print_ir: don't print disconnected empty blocks Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32330>	2024-12-12 08:11:21 +00:00
Lionel Landwerlin	2bb98a8f99	anv: document UBO descriptor range alignments Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32347>	2024-12-12 07:35:18 +00:00
Lionel Landwerlin	99bb2a087a	intel/decoder: fix COMPUTE_WALKER handling Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `17096f87` ("intel: Switch to COMPUTE_WALKER_BODY") Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32347>	2024-12-12 07:35:18 +00:00
Kenneth Graunke	6341b3cd87	brw: Combine convergent texture buffer fetches into fewer loads Borderlands 3 (both DX11 and DX12 renderers) have a common pattern across many shaders: con 32x4 %510 = (uint32)txf %2 (handle), %1191 (0x10) (coord), %1 (0x0) (lod), 0 (texture) con 32x4 %512 = (uint32)txf %2 (handle), %1511 (0x11) (coord), %1 (0x0) (lod), 0 (texture) ... con 32x4 %550 = (uint32)txf %2 (handle), %1549 (0x25) (coord), %1 (0x0) (lod), 0 (texture) con 32x4 %552 = (uint32)txf %2 (handle), %1551 (0x26) (coord), %1 (0x0) (lod), 0 (texture) A single basic block contains piles of texelFetches from a 1D buffer texture, with constant coordinates. In most cases, only the .x channel of the result is read. So we have something on the order of 28 sampler messages, each asking for...a single uint32_t scalar value. Because our sampler doesn't have any support for convergent block loads (like the untyped LSC transpose messages for SSBOs)...this means we were emitting SIMD8/16 (or SIMD16/32 on Xe2) sampler messages for every single scalar, replicating what's effectively a SIMD1 value to the entire register. This is hugely wasteful, both in terms of register pressure, and also in back-and-forth sending and receiving memory messages. The good news is we can take advantage of our explicit SIMD model to handle this more efficiently. This patch adds a new optimization pass that detects a series of SHADER_OPCODE_TXF_LOGICAL, in the same basic block, with constant offsets, from the same texture. It constructs a new divergent coordinate where each channel is one of the constants (i.e <10, 11, 12, ..., 26> in the above example). It issues a new NoMask divergent texel fetch which loads N useful channels in one go, and replaces the rest with expansion MOVs that splat the SIMD1 result back to the full SIMD width. (These get copy propagated away.) We can pick the SIMD size of the load independently of the native shader width as well. 
On Xe2, those 28 convergent loads become a single SIMD32 ld message. On earlier hardware, we use 2 SIMD16 messages. Or we can use a smaller size when there aren't many to combine. In fossil-db, this cuts 27% of send messages in affected shaders, 3-6% of cycles, 2-3% of instructions, and 8-12% of live registers. On A770, this improves performance of Borderlands 3 by roughly 2.5-3.5%. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32573>	2024-12-12 00:05:42 +00:00
Daniel Schürmann	22881712c8	aco/assembler: Don't emit target basic block index when chaining branches This could erroneously cause an assertion to fail if the target block index was larger than UINT16_MAX. Fixes: `cab5639a09` ('aco/assembler: chain branches instead of emitting long jumps') Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32599>	2024-12-11 23:28:55 +00:00
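
The out-of-bounds read described in d4a54d4f92 (growing a source array while looping up to the new count instead of the old one) can be illustrated with a minimal sketch. This is not the Mesa code: `struct inst`, its fields, and this `resize_sources()` are hypothetical stand-ins, and the only point shown is that the copy is bounded by the smaller of the old and new counts.

```c
#include <stdlib.h>

/* Hypothetical stand-in for an instruction that owns a source array.
 * This is an illustration only, not the brw data structure. */
struct inst {
    int *src;          /* heap-allocated array of sources */
    unsigned sources;  /* number of valid entries in src */
};

/* Resize the source array to num_sources entries.
 * Returns 0 on success, -1 on allocation failure. */
static int resize_sources(struct inst *in, unsigned num_sources)
{
    if (num_sources == in->sources)
        return 0;

    int *old_src = in->src;
    /* calloc zero-initialises the entries that did not exist before. */
    int *new_src = calloc(num_sources ? num_sources : 1, sizeof(*new_src));
    if (!new_src)
        return -1;

    /* Copy only what the old array actually held. Looping up to
     * num_sources when growing would read past the end of old_src. */
    unsigned to_copy = in->sources < num_sources ? in->sources : num_sources;
    for (unsigned i = 0; i < to_copy; i++)
        new_src[i] = old_src[i];

    free(old_src);
    in->src = new_src;
    in->sources = num_sources;
    return 0;
}
```

Growing a two-entry array to four entries with this sketch keeps the first two values and leaves the new entries zeroed; shrinking copies only the entries that still fit.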


199110 commits