mesa/src
Kenneth Graunke 6341b3cd87 brw: Combine convergent texture buffer fetches into fewer loads
Borderlands 3 (both DX11 and DX12 renderers) has a common pattern
across many shaders:

  con 32x4 %510 = (uint32)txf %2 (handle), %1191 (0x10) (coord), %1 (0x0) (lod), 0 (texture)
  con 32x4 %512 = (uint32)txf %2 (handle), %1511 (0x11) (coord), %1 (0x0) (lod), 0 (texture)
  ...
  con 32x4 %550 = (uint32)txf %2 (handle), %1549 (0x25) (coord), %1 (0x0) (lod), 0 (texture)
  con 32x4 %552 = (uint32)txf %2 (handle), %1551 (0x26) (coord), %1 (0x0) (lod), 0 (texture)

A single basic block contains piles of texelFetches from a 1D buffer
texture, with constant coordinates.  In most cases, only the .x channel
of the result is read.  So we have something on the order of 28 sampler
messages, each asking for...a single uint32_t scalar value.  Because our
sampler doesn't have any support for convergent block loads (like the
untyped LSC transpose messages for SSBOs)...this means we were emitting
SIMD8/16 (or SIMD16/32 on Xe2) sampler messages for every single scalar,
replicating what's effectively a SIMD1 value to the entire register.
This is hugely wasteful, both in terms of register pressure, and also in
back-and-forth sending and receiving memory messages.

The good news is we can take advantage of our explicit SIMD model to
handle this more efficiently.  This patch adds a new optimization pass
that detects a series of SHADER_OPCODE_TXF_LOGICAL, in the same basic
block, with constant offsets, from the same texture.  It constructs a
new divergent coordinate where each channel is one of the constants
(i.e. <0x10, 0x11, 0x12, ..., 0x26> in the above example).  It issues a new
NoMask divergent texel fetch which loads N useful channels in one go,
and replaces the rest with expansion MOVs that splat the SIMD1 result
back to the full SIMD width.  (These get copy propagated away.)
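
The transformation described above can be modelled outside the compiler.
The following is a minimal Python sketch, not the brw pass itself: the
`Fetch` record, `combine_fetches`, and `run_combined` are hypothetical
names invented for illustration, and textures are modelled as plain
lists.  It groups fetches by handle, packs their constant coordinates
into one coordinate vector per combined load, and records which channel
each original destination would read back via its expansion MOV.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Fetch:
    """One convergent texel fetch: dest = txf(handle, constant coord)."""
    dest: str
    handle: str
    coord: int


def combine_fetches(fetches, max_width):
    """Pack constant-coordinate fetches from the same texture together.

    Returns (loads, movs).  Each entry of loads is (handle, coords), one
    combined fetch whose channel i reads coords[i].  movs maps every
    original dest to (load index, channel): the SIMD1 value that an
    expansion MOV would splat back to the full width.
    """
    by_handle = defaultdict(list)
    for f in fetches:
        by_handle[f.handle].append(f)

    loads, movs = [], {}
    for handle, group in by_handle.items():
        # Split into chunks of at most max_width channels per message.
        for start in range(0, len(group), max_width):
            chunk = group[start:start + max_width]
            loads.append((handle, [f.coord for f in chunk]))
            for channel, f in enumerate(chunk):
                movs[f.dest] = (len(loads) - 1, channel)
    return loads, movs


def run_combined(buffers, loads, movs):
    """Execute the combined loads on toy buffers and resolve each MOV."""
    results = [[buffers[h][c] for c in coords] for h, coords in loads]
    return {dest: results[i][ch] for dest, (i, ch) in movs.items()}
```

Feeding it the coordinates 0x10 through 0x26 from one handle with
`max_width=32` yields a single combined load, and `run_combined` returns
the same value per destination as fetching each texel individually.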

We can pick the SIMD size of the load independently of the native shader
width as well.  On Xe2, those 28 convergent loads become a single SIMD32
ld message.  On earlier hardware, we use 2 SIMD16 messages.  Or we can
use a smaller size when there aren't many to combine.
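
The width selection can be illustrated the same way.  This is a sketch
under stated assumptions rather than the pass's actual heuristic:
`plan_load_widths` is a hypothetical helper, widths are assumed to be
powers of two, and the default minimum width of 8 is a guess that the
commit message does not confirm.

```python
def plan_load_widths(count, max_width, min_width=8):
    """Return the SIMD width of each combined load for `count` fetches.

    Assumption: each message uses the smallest power-of-two width that
    covers the remaining fetches, clamped to [min_width, max_width].
    """
    if count < 0 or max_width < min_width:
        raise ValueError("invalid count or width range")

    widths = []
    remaining = count
    while remaining > 0:
        width = min_width
        # Grow the message until it covers what is left or hits the cap.
        while width < remaining and width < max_width:
            width *= 2
        widths.append(width)
        remaining -= min(width, remaining)
    return widths
```

With a cap of 32, `plan_load_widths(28, 32, 16)` gives one SIMD32 load;
with a cap of 16, `plan_load_widths(28, 16)` gives two SIMD16 loads,
matching the counts quoted above.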

In fossil-db, this cuts 27% of send messages in affected shaders, 3-6%
of cycles, 2-3% of instructions, and 8-12% of live registers.  On A770,
this improves performance of Borderlands 3 by roughly 2.5-3.5%.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32573>
2024-12-12 00:05:42 +00:00
amd aco/assembler: Don't emit target basic block index when chaining branches 2024-12-11 23:28:55 +00:00
android_stub
asahi vulkan: rename depth bias graphics states 2024-12-06 13:48:26 -05:00
broadcom vulkan: rename depth bias graphics states 2024-12-06 13:48:26 -05:00
c11 build: pass licensing information in SPDX form 2024-06-29 12:42:49 -07:00
compiler nir: make ballot ALU and mbcnt_amd operations reorderable 2024-12-11 14:47:12 +00:00
drm-shim drm-shim: stub synobj_timeline_wait and query ioctl 2024-07-16 11:17:59 +02:00
egl meson: drop unused variables 2024-11-26 20:45:41 +00:00
etnaviv etnaviv/ml: Add support for tensor split and concatenation operations 2024-12-06 13:29:11 +00:00
freedreno ir3/cp: add support for swapping srcs of sad 2024-12-11 16:45:13 +00:00
gallium format: Add R8_G8B8_422_UNORM format 2024-12-11 15:28:08 +00:00
gbm Revert "gbm: mark surface buffers as explicit flushed" 2024-11-27 22:48:04 +00:00
getopt build: pass licensing information in SPDX form 2024-06-29 12:42:49 -07:00
gfxstream gfxstream: fix issues with VK1.4 build 2024-12-03 20:35:44 +00:00
glx glx: return BadMatch for invalid reset notification strategy 2024-11-27 19:00:20 +00:00
gtest build: pass licensing information in SPDX form 2024-06-29 12:42:49 -07:00
imagination vulkan: rename depth bias graphics states 2024-12-06 13:48:26 -05:00
imgui
intel brw: Combine convergent texture buffer fetches into fewer loads 2024-12-12 00:05:42 +00:00
loader loader: Fix typo in __DRI_IMAGE_FORMAT_XBGR16161616 definition 2024-10-25 14:18:24 +00:00
mapi meson: remove selinux option 2024-10-21 01:14:35 +00:00
mesa mesa: when blitting between formats clear any unused components 2024-12-05 18:27:37 +00:00
microsoft microsoft/clc: Initialize printf buffer for tests 2024-12-10 19:13:07 +00:00
nouveau nvk/ci: update the ga106 expectations 2024-12-11 04:24:35 +00:00
panfrost panvk/ci: update g52-vk-full job 2024-12-11 20:19:43 +00:00
tool perfetto: Add Panfrost data sources to system.cfg 2024-08-22 18:33:45 +00:00
util util/format: nr_channels is always <= 4 2024-12-11 18:34:47 +00:00
virtio treewide: Stop putting enum in front of Vulkan enum types 2024-12-02 17:22:49 +00:00
vulkan wsi/wayland: Add forward progress guarantee for present wait. 2024-12-11 11:51:48 +00:00
x11 meson: require dri3 modifiers 2024-09-06 17:34:17 +00:00
.clang-format nir: add helpers for precompiled shaders 2024-11-28 17:34:12 +00:00
meson.build meson: simplify logic a bit 2024-11-26 20:45:41 +00:00