mesa/src at 6341b3cd87d98dfca5d40b4c1e95ac26500d8558 - fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-31 00:48:13 +02:00


Kenneth Graunke  6341b3cd87  brw: Combine convergent texture buffer fetches into fewer loads

Borderlands 3 (both the DX11 and DX12 renderers) has a common pattern across many shaders:

    con 32x4 %510 = (uint32)txf %2 (handle), %1191 (0x10) (coord), %1 (0x0) (lod), 0 (texture)
    con 32x4 %512 = (uint32)txf %2 (handle), %1511 (0x11) (coord), %1 (0x0) (lod), 0 (texture)
    ...
    con 32x4 %550 = (uint32)txf %2 (handle), %1549 (0x25) (coord), %1 (0x0) (lod), 0 (texture)
    con 32x4 %552 = (uint32)txf %2 (handle), %1551 (0x26) (coord), %1 (0x0) (lod), 0 (texture)

A single basic block contains piles of texelFetches from a 1D buffer texture, with constant coordinates. In most cases, only the .x channel of the result is read. So we have something on the order of 28 sampler messages, each asking for... a single uint32_t scalar value.

Because our sampler doesn't have any support for convergent block loads (like the untyped LSC transpose messages for SSBOs), we were emitting SIMD8/16 (or SIMD16/32 on Xe2) sampler messages for every single scalar, replicating what's effectively a SIMD1 value across the entire register. This is hugely wasteful, both in register pressure and in the back-and-forth of sending and receiving memory messages.

The good news is that we can take advantage of our explicit SIMD model to handle this more efficiently. This patch adds a new optimization pass that detects a series of SHADER_OPCODE_TXF_LOGICAL instructions in the same basic block, with constant offsets, reading from the same texture. It constructs a new divergent coordinate where each channel is one of the constants (i.e. <0x10, 0x11, 0x12, ..., 0x26> in the above example). It issues a single NoMask divergent texel fetch which loads N useful channels in one go, and replaces the original fetches with expansion MOVs that splat each SIMD1 result back to the full SIMD width. (These get copy propagated away.)

We can also pick the SIMD size of the load independently of the native shader width. On Xe2, those 28 convergent loads become a single SIMD32 ld message. On earlier hardware, we use 2 SIMD16 messages. We can also use a smaller size when there aren't many loads to combine.

In fossil-db, this cuts 27% of send messages in affected shaders, 3-6% of cycles, 2-3% of instructions, and 8-12% of live registers. On A770, this improves performance of Borderlands 3 by roughly 2.5-3.5%.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32573>

2024-12-12 00:05:42 +00:00
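The combining step described in the commit message can be sketched as a small, self-contained C++ model. This is not the actual brw pass: the `TexelFetch` and `CombinedLoad` types, `plan_combined_fetches()`, `run_combined_loads()`, and the power-of-two sizing policy in `pick_simd_width()` are all hypothetical. The sketch groups the constant-coordinate fetches in one basic block by texture, packs their coordinates into the lanes of wider loads, and records which original fetch each lane replaces.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical, simplified model of a texel fetch: the texture it reads and,
// when the coordinate is a compile-time constant, that constant.
struct TexelFetch {
   unsigned texture;
   bool coord_is_const;
   uint32_t coord;
};

// One combined NoMask load: lane i fetches coords[i], and its result stands
// in for the original fetch at index sources[i] of the basic block.
struct CombinedLoad {
   unsigned texture;
   unsigned simd_width;
   std::vector<uint32_t> coords;
   std::vector<size_t> sources;
};

// Assumed sizing policy: smallest power of two >= n, clamped to
// [min_width, max_width].
static unsigned pick_simd_width(size_t n, unsigned min_width, unsigned max_width)
{
   unsigned w = min_width;
   while (w < n && w < max_width)
      w *= 2;
   return w;
}

// Groups the constant-coordinate fetches of one basic block by texture and
// packs each group into loads of at most max_width lanes.
static std::vector<CombinedLoad>
plan_combined_fetches(const std::vector<TexelFetch> &block,
                      unsigned min_width, unsigned max_width)
{
   std::map<unsigned, std::vector<size_t>> by_texture;
   for (size_t i = 0; i < block.size(); i++) {
      if (block[i].coord_is_const)
         by_texture[block[i].texture].push_back(i);
   }

   std::vector<CombinedLoad> loads;
   for (const auto &[texture, idx] : by_texture) {
      if (idx.size() < 2)
         continue; // a lone fetch has nothing to combine with
      for (size_t start = 0; start < idx.size(); start += max_width) {
         size_t n = std::min<size_t>(max_width, idx.size() - start);
         CombinedLoad load{texture, pick_simd_width(n, min_width, max_width), {}, {}};
         for (size_t k = start; k < start + n; k++) {
            load.coords.push_back(block[idx[k]].coord);
            load.sources.push_back(idx[k]);
         }
         loads.push_back(load);
      }
   }
   return loads;
}

// Simulates the combined loads against a buffer, then performs the scalar
// equivalent of the expansion MOVs: each original fetch takes its lane's value.
static std::vector<uint32_t>
run_combined_loads(const std::vector<CombinedLoad> &loads,
                   const std::vector<uint32_t> &buffer, size_t num_fetches)
{
   std::vector<uint32_t> results(num_fetches, 0);
   for (const CombinedLoad &load : loads) {
      assert(load.coords.size() <= load.simd_width);
      std::vector<uint32_t> lanes(load.simd_width, 0); // unused lanes stay 0
      for (size_t lane = 0; lane < load.coords.size(); lane++)
         lanes[lane] = buffer.at(load.coords[lane]);
      for (size_t lane = 0; lane < load.sources.size(); lane++)
         results[load.sources[lane]] = lanes[lane];
   }
   return results;
}
```

With 28 constant fetches on one texture, a width range of [16, 32] yields a single SIMD32 load, in line with the Xe2 case above, while [8, 16] yields two SIMD16 loads (16 + 12 lanes), in line with the earlier-hardware case; those minimum widths are assumptions of this sketch. The real pass works on SHADER_OPCODE_TXF_LOGICAL instructions and emits MOVs that copy propagation later removes, whereas this model only tracks which lane feeds which fetch.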
amd	aco/assembler: Don't emit target basic block index when chaining branches	2024-12-11 23:28:55 +00:00
android_stub
asahi	vulkan: rename depth bias graphics states	2024-12-06 13:48:26 -05:00
broadcom	vulkan: rename depth bias graphics states	2024-12-06 13:48:26 -05:00
c11	build: pass licensing information in SPDX form	2024-06-29 12:42:49 -07:00
compiler	nir: make ballot ALU and mbcnt_amd operations reorderable	2024-12-11 14:47:12 +00:00
drm-shim	drm-shim: stub synobj_timeline_wait and query ioctl	2024-07-16 11:17:59 +02:00
egl	meson: drop unused variables	2024-11-26 20:45:41 +00:00
etnaviv	etnaviv/ml: Add support for tensor split and concatenation operations	2024-12-06 13:29:11 +00:00
freedreno	ir3/cp: add support for swapping srcs of sad	2024-12-11 16:45:13 +00:00
gallium	format: Add R8_G8B8_422_UNORM format	2024-12-11 15:28:08 +00:00
gbm	Revert "gbm: mark surface buffers as explicit flushed"	2024-11-27 22:48:04 +00:00
getopt	build: pass licensing information in SPDX form	2024-06-29 12:42:49 -07:00
gfxstream	gfxstream: fix issues with VK1.4 build	2024-12-03 20:35:44 +00:00
glx	glx: return BadMatch for invalid reset notification strategy	2024-11-27 19:00:20 +00:00
gtest	build: pass licensing information in SPDX form	2024-06-29 12:42:49 -07:00
imagination	vulkan: rename depth bias graphics states	2024-12-06 13:48:26 -05:00
imgui
intel	brw: Combine convergent texture buffer fetches into fewer loads	2024-12-12 00:05:42 +00:00
loader	loader: Fix typo in __DRI_IMAGE_FORMAT_XBGR16161616 definition	2024-10-25 14:18:24 +00:00
mapi	meson: remove selinux option	2024-10-21 01:14:35 +00:00
mesa	mesa: when blitting between formats clear any unused components	2024-12-05 18:27:37 +00:00
microsoft	microsoft/clc: Initialize printf buffer for tests	2024-12-10 19:13:07 +00:00
nouveau	nvk/ci: update the ga106 expectations	2024-12-11 04:24:35 +00:00
panfrost	panvk/ci: update g52-vk-full job	2024-12-11 20:19:43 +00:00
tool	perfetto: Add Panfrost data sources to system.cfg	2024-08-22 18:33:45 +00:00
util	util/format: nr_channels is always <= 4	2024-12-11 18:34:47 +00:00
virtio	treewide: Stop putting enum in front of Vulkan enum types	2024-12-02 17:22:49 +00:00
vulkan	wsi/wayland: Add forward progress guarantee for present wait.	2024-12-11 11:51:48 +00:00
x11	meson: require dri3 modifiers	2024-09-06 17:34:17 +00:00
.clang-format	nir: add helpers for precompiled shaders	2024-11-28 17:34:12 +00:00
meson.build	meson: simplify logic a bit	2024-11-26 20:45:41 +00:00