mesa/src/intel
Kenneth Graunke 24c66d3871 brw: Vectorize URB intrinsics using nir_opt_load_store_vectorize
This significantly reduces the number of URB messages in tessellation
and mesh shaders.  fossil-db results on Battlemage:

   Instrs: 505172392 -> 505207187 (+0.01%); split: -0.00%, +0.01%
   Send messages: 23678197 -> 23656126 (-0.09%); split: -0.09%, +0.00%
   Cycle count: 63150470088 -> 63147482640 (-0.00%); split: -0.01%, +0.00%
   Spill count: 576554 -> 576616 (+0.01%)
   Fill count: 545304 -> 545413 (+0.02%)
   Max live registers: 141099192 -> 141150675 (+0.04%); split: -0.00%, +0.04%
   Max dispatch width: 39856192 -> 39856208 (+0.00%)

   Totals from 4231 (0.27% of 1583648) affected shaders:
   Instrs: 1620161 -> 1654956 (+2.15%); split: -0.25%, +2.40%
   Send messages: 128652 -> 106581 (-17.16%); split: -17.18%, +0.03%
   Cycle count: 24650700 -> 21663252 (-12.12%); split: -12.82%, +0.70%
   Spill count: 378 -> 440 (+16.40%)
   Fill count: 1308 -> 1417 (+8.33%)
   Max live registers: 364676 -> 416159 (+14.12%); split: -0.24%, +14.36%
   Max dispatch width: 67952 -> 67968 (+0.02%)
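
The percentages in these fossil-db summaries are consistent with a plain
relative change, (after - before) / before.  A minimal C sketch for
checking them by hand follows.  The helper names pct_change and pct_of are
hypothetical and are not part of fossil-db.

```c
#include <assert.h>

/* Relative change from `before` to `after`, in percent.  Hypothetical
 * helper; it reproduces the per-metric deltas in the summary above. */
static double
pct_change(double before, double after)
{
   assert(before != 0.0);
   return (after - before) / before * 100.0;
}

/* Share of `part` within `total`, in percent, e.g. affected shaders
 * out of all shaders in the fossil-db run. */
static double
pct_of(double part, double total)
{
   assert(total != 0.0);
   return part / total * 100.0;
}
```

For example, pct_change(128652, 106581) comes out to about -17.16, which
matches the affected-shader Send messages line, and pct_of(4231, 1583648)
is about 0.27, which matches the affected-shader share.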

There are two reasons we didn't use nir_opt_vectorize_io instead:

1. nir_opt_vectorize_io appears to work at the level of slot locations.
   We want to vectorize based on URB offsets, especially for cases like
   point size, layer, and viewport, which have different VARYING_SLOT_*
   values but live in the same vec4 of a URB entry.

2. We want vec8 stores, but nir_opt_vectorize_io only seems to vectorize
   within a single 32-bit vec4.  It does handle 8 components, but only
   for packing 16-bit values into a 32-bit vec4.
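
To illustrate why grouping by URB offset (rather than by varying slot) and
allowing vec8 writes both matter, here is a toy C model.  It is not the
brw implementation, and every name in it is hypothetical.  It assumes that
a single write covers up to 8 dwords (two vec4s) starting at a
vec4-aligned offset, with a per-dword mask.  For illustration only, it
also assumes that layer, viewport index, and point size occupy components
1, 2, and 3 of the first URB vec4, with position in the next vec4.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VEC8_DWORDS 8u /* assumed maximum dwords per merged write */

/* Hypothetical URB store, addressed by 32-bit component offset within
 * the URB entry rather than by VARYING_SLOT_* location. */
struct urb_store {
   unsigned dword_offset;
   unsigned num_components; /* assumed <= 4 */
};

/* Hypothetical merged write: vec4-aligned base plus a mask of which of
 * the following (up to) 8 dwords are written. */
struct urb_write {
   unsigned base;
   uint8_t mask; /* bit i set => dword base + i is written */
};

/* Mask bits covered by store s, relative to a write starting at base.
 * Callers guarantee s lies within [base, base + 8). */
static uint8_t
store_mask(const struct urb_store *s, unsigned base)
{
   uint8_t mask = 0;
   for (unsigned i = 0; i < s->num_components; i++)
      mask |= (uint8_t)(1u << (s->dword_offset + i - base));
   return mask;
}

/* Begin a new write at the vec4 boundary at or below the store. */
static struct urb_write
start_write(const struct urb_store *s)
{
   struct urb_write w;
   w.base = s->dword_offset & ~3u;
   assert(s->dword_offset + s->num_components <= w.base + VEC8_DWORDS);
   w.mask = store_mask(s, w.base);
   return w;
}

/* Fold store s into write w if it fits in w's 8-dword window and does
 * not overlap dwords that w already writes. */
static bool
try_merge(struct urb_write *w, const struct urb_store *s)
{
   if (s->dword_offset < w->base ||
       s->dword_offset + s->num_components > w->base + VEC8_DWORDS)
      return false;
   uint8_t m = store_mask(s, w->base);
   if (w->mask & m)
      return false;
   w->mask |= m;
   return true;
}
```

Under the assumed layout, the layer, viewport, and point size stores fold
into one write with mask 0x0e even though they come from three different
varying slots.  A following vec4 position store then extends that write to
mask 0xfe, producing a single vec8 write that spans two vec4s.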

Improves performance of Sascha Willems' tessellation demo by around 4%
on Meteorlake.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39250>
2026-01-27 16:08:36 +00:00
Name           Last commit date            Last commit message
blorp/         2026-01-27 15:28:55 +00:00  blorp: fix asserts hit with msaa blorp blits on xe3
ci/            2026-01-19 16:11:29 +00:00  ci: update trace checksums
common/        2026-01-27 08:52:16 +00:00  intel/measure: Define snapshot type for HiZ partial resolves.
compiler/      2026-01-27 16:08:36 +00:00  brw: Vectorize URB intrinsics using nir_opt_load_store_vectorize
decoder/       2025-11-24 16:40:02 +08:00  intel/decoder: make libvulkan_intel to depend on stub decoder when buildtyle=release.
dev/           2026-01-26 15:24:55 +00:00  intel/dev: Add INTEL_DEVICE_INFO_MMAP_MODE_INVALID
ds/            2025-12-15 08:25:42 +00:00  anv: instrument resource barriers instruction in u_trace
executor/      2025-11-21 21:48:57 +00:00  meson: make dep_lua a disabler
genxml/        2026-01-27 08:52:17 +00:00  intel/blorp: Add support for partial resolves of HiZ-CCS surfaces.
isl/           2026-01-27 08:52:18 +00:00  intel/isl: Add unit tests for ISL_AUX_STATE_COMPRESSED_HIER_DEPTH.
mda/           2025-12-13 01:21:08 +00:00  intel/mda: Handle better processing a lot of archives
nullhw-layer/  2025-07-31 17:49:42 +00:00  build: avoid redefining unreachable() which is standard in C23
perf/          2026-01-19 19:24:16 +00:00  intel/perf: Add Gfx 12.5 mdap_metrics struct and set it
shaders/       2025-11-20 02:14:50 +00:00  util/glsl2spirv: Use better glslang flag for -Olib
tools/         2026-01-07 19:16:25 +00:00  intel/hang_replay: add option to dump VM state as part of the dump
vulkan/        2026-01-27 14:57:20 +00:00  driconf: LTO disable
vulkan_hasvk/  2026-01-27 08:52:12 +00:00  intel/isl: Define ISL_AUX_STATE_COMPRESSED_HIER_DEPTH aux state.
meson.build    2025-10-09 07:01:47 +00:00  brw: Move into a new src/intel/compiler/brw subdirectory