mesa/src/intel/compiler at c7a7f0244f3d3e02c5b8c677cda52b97cd546349 - fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-21 20:10:14 +01:00

Kenneth Graunke	6341b3cd87	brw: Combine convergent texture buffer fetches into fewer loads	2024-12-12 00:05:42 +00:00

Borderlands 3 (both DX11 and DX12 renderers) has a common pattern across many shaders:

   con 32x4 %510 = (uint32)txf %2 (handle), %1191 (0x10) (coord), %1 (0x0) (lod), 0 (texture)
   con 32x4 %512 = (uint32)txf %2 (handle), %1511 (0x11) (coord), %1 (0x0) (lod), 0 (texture)
   ...
   con 32x4 %550 = (uint32)txf %2 (handle), %1549 (0x25) (coord), %1 (0x0) (lod), 0 (texture)
   con 32x4 %552 = (uint32)txf %2 (handle), %1551 (0x26) (coord), %1 (0x0) (lod), 0 (texture)

A single basic block contains piles of texelFetches from a 1D buffer texture, with constant coordinates. In most cases, only the .x channel of the result is read. So we have something on the order of 28 sampler messages, each asking for... a single uint32_t scalar value.

Because our sampler doesn't have any support for convergent block loads (like the untyped LSC transpose messages for SSBOs), we were emitting SIMD8/16 (or SIMD16/32 on Xe2) sampler messages for every single scalar, replicating what's effectively a SIMD1 value to the entire register. This is hugely wasteful, both in terms of register pressure and in back-and-forth sending and receiving of memory messages.

The good news is we can take advantage of our explicit SIMD model to handle this more efficiently.

This patch adds a new optimization pass that detects a series of SHADER_OPCODE_TXF_LOGICAL, in the same basic block, with constant offsets, from the same texture. It constructs a new divergent coordinate where each channel is one of the constants (i.e. <0x10, 0x11, 0x12, ..., 0x26> in the above example). It issues a new NoMask divergent texel fetch which loads N useful channels in one go, and replaces the rest with expansion MOVs that splat the SIMD1 result back to the full SIMD width. (These get copy propagated away.)

We can pick the SIMD size of the load independently of the native shader width as well. On Xe2, those 28 convergent loads become a single SIMD32 ld message. On earlier hardware, we use 2 SIMD16 messages. Or we can use a smaller size when there aren't many to combine.

In fossil-db, this cuts 27% of send messages in affected shaders, 3-6% of cycles, 2-3% of instructions, and 8-12% of live registers. On A770, this improves performance of Borderlands 3 by roughly 2.5-3.5%.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32573>

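
The packing step the commit message describes can be illustrated outside the compiler. The following is a minimal standalone sketch, not the code in brw_opt_txf_combiner.cpp: the names CombinedFetch, pick_width and combine_fetches are hypothetical, and the min_width/max_width parameters are assumptions standing in for whatever SIMD sizes the real pass considers on each platform. It only models how a run of constant coordinates would be split into per-channel coordinate vectors and which SIMD width each combined load would use.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of one combined (divergent) texel fetch.
struct CombinedFetch {
    unsigned simd_width;                // channels in the combined message
    std::vector<uint32_t> lane_coords;  // lane_coords[i] = coordinate loaded by channel i
};

// Smallest power-of-two width >= min_width that holds `count` lanes,
// capped at max_width. Both bounds are assumed to be powers of two.
static unsigned pick_width(std::size_t count, unsigned min_width, unsigned max_width)
{
    unsigned w = min_width;
    while (w < count && w < max_width)
        w *= 2;
    return w;
}

// Pack the constant coordinates of N convergent fetches (same texture,
// same basic block) into as few combined fetches as possible.
std::vector<CombinedFetch> combine_fetches(const std::vector<uint32_t> &coords,
                                           unsigned min_width, unsigned max_width)
{
    assert(min_width > 0 && min_width <= max_width);

    std::vector<CombinedFetch> out;
    for (std::size_t i = 0; i < coords.size(); i += max_width) {
        const std::size_t n = std::min<std::size_t>(max_width, coords.size() - i);

        CombinedFetch f;
        f.simd_width = pick_width(n, min_width, max_width);
        f.lane_coords.assign(coords.begin() + i, coords.begin() + i + n);
        out.push_back(f);
    }
    return out;
}
```

Under these assumptions, 28 coordinates with max_width = 32 yield one 32-wide fetch, and with max_width = 16 they yield two 16-wide fetches (16 and 12 useful lanes), matching the counts quoted in the message. In this model, original fetch k reads its scalar from lane k % max_width of combined fetch k / max_width, which is the value the expansion MOVs would splat back to the full width.
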
..
elk	intel/compiler: Use #pragma once instead of header guards	2024-12-11 19:47:44 +00:00
tests	intel/brw: Remove assembler tests for Gfx8-	2024-02-24 02:10:56 +00:00
brw_asm.c	intel/brw: Dump errors when brw_assemble() fails EU validation	2024-12-10 20:23:25 +00:00
brw_asm.h	intel/compiler: Use #pragma once instead of header guards	2024-12-11 19:47:44 +00:00
brw_asm_internal.h	intel/compiler: Use #pragma once instead of header guards	2024-12-11 19:47:44 +00:00
brw_asm_tool.c	intel/brw: Split off assembler logic into library	2024-07-12 19:34:23 +00:00
brw_cfg.cpp	intel/brw: Add a file parameter to idom_tree::dump()	2024-08-22 22:54:45 +00:00
brw_cfg.h	intel/compiler: Use #pragma once instead of header guards	2024-12-11 19:47:44 +00:00
brw_compile_bs.cpp	intel/brw/gfx9: Implement WaClearArfDependenciesBeforeEot	2024-10-23 15:02:27 +00:00
brw_compile_cs.cpp	intel/brw/gfx9: Implement WaClearArfDependenciesBeforeEot	2024-10-23 15:02:27 +00:00
brw_compile_fs.cpp	brw: move barycentric_mode enum to intel_shader_enums.h	2024-11-26 13:05:30 +00:00
brw_compile_gs.cpp	intel/brw/gfx9: Implement WaClearArfDependenciesBeforeEot	2024-10-23 15:02:27 +00:00
brw_compile_mesh.cpp	brw: fix task/mesh push constant loading	2024-10-26 18:12:41 +00:00
brw_compile_tcs.cpp	intel/brw/gfx9: Implement WaClearArfDependenciesBeforeEot	2024-10-23 15:02:27 +00:00
brw_compile_tes.cpp	intel/brw/gfx9: Implement WaClearArfDependenciesBeforeEot	2024-10-23 15:02:27 +00:00
brw_compile_vs.cpp	intel/brw/gfx9: Implement WaClearArfDependenciesBeforeEot	2024-10-23 15:02:27 +00:00
brw_compiler.c	nir: add option to use compact view indices	2024-12-09 20:31:49 +00:00
brw_compiler.h	intel/compiler: Use #pragma once instead of header guards	2024-12-11 19:47:44 +00:00
brw_debug_recompile.c	intel/brw: Simplify @file annotations	2024-07-22 22:48:03 +00:00
brw_def_analysis.cpp	intel: Add statistic for Non SSA registers after NIR to BRW	2024-10-11 06:40:29 +00:00
brw_device_sha1_gen_c.py	intel/compiler: drop unused ray-tracing fields from cache hash	2024-03-22 00:01:28 +00:00
brw_disasm.c	intel/brw_asm: Add BranchCtrl support	2024-11-02 18:01:19 +00:00
brw_disasm.h	intel/compiler: Use #pragma once instead of header guards	2024-12-11 19:47:44 +00:00
brw_disasm_info.cpp	intel/brw: Simplify fs_inst annotation	2024-08-28 03:59:50 +00:00
brw_disasm_info.h	intel/compiler: Use #pragma once instead of header guards	2024-12-11 19:47:44 +00:00
brw_disasm_tool.c	intel/brw: Remove Gfx8- code from disassembler	2024-02-28 05:45:38 +00:00
brw_eu.c	brw,elk: Fix opening flags on dumping shader binaries	2024-08-27 08:26:08 +00:00
brw_eu.h	intel/compiler: Use #pragma once instead of header guards	2024-12-11 19:47:44 +00:00
brw_eu_compact.c	intel/compiler: Xe2 and Xe3 use the same compaction tables	2024-10-26 07:39:30 +00:00
brw_eu_defines.h	intel/compiler: Use #pragma once instead of header guards	2024-12-11 19:47:44 +00:00
brw_eu_emit.c	brw/emit: Add correct 3-source instruction assertions for each platform	2024-11-08 16:48:57 +00:00
brw_eu_validate.c	intel/brw: Fix decoding of cond_modifier and saturate in EU validation	2024-11-22 21:15:46 +00:00
brw_fs.cpp	intel/brw: Add is_control_source for the new subgroup ops	2024-12-04 01:19:37 +00:00
brw_fs.h	brw: Combine convergent texture buffer fetches into fewer loads	2024-12-12 00:05:42 +00:00
brw_fs_bank_conflicts.cpp	intel/brw: Simplify @file annotations	2024-07-22 22:48:03 +00:00
brw_fs_builder.h	intel/compiler: Use #pragma once instead of header guards	2024-12-11 19:47:44 +00:00
brw_fs_cmod_propagation.cpp	brw: Fix mov cmod propagation when there's int signedness mismatch	2024-09-09 22:13:08 +00:00
brw_fs_combine_constants.cpp	intel/brw: Allow immediates in the BFE instruction on Gfx12+	2024-10-24 21:31:28 +00:00
brw_fs_copy_propagation.cpp	brw/copy: Allow copy prop into src1 of broadcast	2024-12-05 00:15:27 +00:00
brw_fs_cse.cpp	brw/cse: Don't eliminate instructions that write flags	2024-11-08 17:46:45 +00:00
brw_fs_dead_code_eliminate.cpp	intel/brw: Delete old-style surface and A64 message opcodes	2024-09-12 20:54:36 +00:00
brw_fs_generator.cpp	brw: add a NOP in between WHILE instructions on LNL	2024-10-31 23:57:10 +00:00
brw_fs_live_variables.cpp	intel/brw: Simplify @file annotations	2024-07-22 22:48:03 +00:00
brw_fs_live_variables.h	intel/compiler: Use #pragma once instead of header guards	2024-12-11 19:47:44 +00:00
brw_fs_lower.cpp	intel/brw: Simplify fs_inst annotation	2024-08-28 03:59:50 +00:00
brw_fs_lower_dpas.cpp	intel/brw: Replace uses of fs_reg with brw_reg	2024-07-03 02:53:19 +00:00
brw_fs_lower_integer_multiplication.cpp	intel/brw: Replace uses of fs_reg with brw_reg	2024-07-03 02:53:19 +00:00
brw_fs_lower_pack.cpp	intel/brw: Replace uses of fs_reg with brw_reg	2024-07-03 02:53:19 +00:00
brw_fs_lower_regioning.cpp	brw/lower: Don't "fix" regioning of broadcast	2024-12-05 00:15:27 +00:00
brw_fs_lower_simd_width.cpp	brw: Allow SIMD32 math instructions on Xe2	2024-12-04 02:42:34 +00:00
brw_fs_nir.cpp	brw: don't forget the base when emitting SHADER_OPCODE_MOV_RELOC_IMM	2024-12-09 15:45:49 +00:00
brw_fs_opt.cpp	brw: Combine convergent texture buffer fetches into fewer loads	2024-12-12 00:05:42 +00:00
brw_fs_opt_algebraic.cpp	brw/build: Use SIMD8 temporaries in emit_uniformize	2024-12-05 00:15:27 +00:00
brw_fs_opt_virtual_grfs.cpp	brw: fix virtual register splitting to not go below physical register size	2024-09-18 23:26:34 +00:00
brw_fs_reg_allocate.cpp	brw: use transpose unspill messages when possible	2024-12-04 08:59:07 +00:00
brw_fs_register_coalesce.cpp	intel/brw: Simplify @file annotations	2024-07-22 22:48:03 +00:00
brw_fs_saturate_propagation.cpp	intel/brw: Use def analysis for simple cases of saturate propagation	2024-08-09 14:26:05 -07:00
brw_fs_scoreboard.cpp	intel/brw: Allow extra SWSB encodings for Xe2	2024-11-19 04:27:00 +00:00
brw_fs_thread_payload.cpp	brw: move barycentric_mode enum to intel_shader_enums.h	2024-11-26 13:05:30 +00:00
brw_fs_validate.cpp	intel/brw: Add SHADER_OPCODE_QUAD_SWAP	2024-11-22 00:27:01 +00:00
brw_fs_visitor.cpp	intel/brw: Add phases to backend	2024-10-11 06:40:29 +00:00
brw_fs_workaround.cpp	intel/brw/gfx9: Implement WaClearArfDependenciesBeforeEot	2024-10-23 15:02:27 +00:00
brw_gram.y	intel/brw_asm: Add BranchCtrl support	2024-11-02 18:01:19 +00:00
brw_inst.h	intel/compiler: Use #pragma once instead of header guards	2024-12-11 19:47:44 +00:00
brw_ir.h	intel/compiler: Use #pragma once instead of header guards	2024-12-11 19:47:44 +00:00
brw_ir_allocator.h	intel/compiler: Use #pragma once instead of header guards	2024-12-11 19:47:44 +00:00
brw_ir_analysis.h	intel/compiler: Use #pragma once instead of header guards	2024-12-11 19:47:44 +00:00
brw_ir_fs.h	intel/compiler: Use #pragma once instead of header guards	2024-12-11 19:47:44 +00:00
brw_ir_performance.cpp	intel/brw/xe2+: Adjust performance analysis divergence weight due to EU fusion removal.	2024-10-24 22:06:52 +00:00
brw_ir_performance.h	intel/compiler: Use #pragma once instead of header guards	2024-12-11 19:47:44 +00:00
brw_isa_info.h	intel/compiler: Use #pragma once instead of header guards	2024-12-11 19:47:44 +00:00
brw_kernel.c	intel-clc: missing printf lowering	2024-08-06 17:55:18 +00:00
brw_kernel.h	intel/compiler: Use #pragma once instead of header guards	2024-12-11 19:47:44 +00:00
brw_lex.l	intel/brw_asm: Add BranchCtrl support	2024-11-02 18:01:19 +00:00
brw_lower_logical_sends.cpp	brw: rename brw_sometimes to intel_sometimes	2024-11-26 13:05:30 +00:00
brw_lower_subgroup_ops.cpp	intel/brw: Add SHADER_OPCODE_QUAD_SWAP	2024-11-22 00:27:01 +00:00
brw_nir.c	nir: treat per-view outputs as arrayed IO	2024-12-09 20:31:49 +00:00
brw_nir.h	intel/compiler: Use #pragma once instead of header guards	2024-12-11 19:47:44 +00:00
brw_nir_analyze_ubo_ranges.c	brw: Only consider components read for UBO push analysis	2024-12-03 02:02:33 +00:00
brw_nir_lower_alpha_to_coverage.c	brw: rename brw_sometimes to intel_sometimes	2024-11-26 13:05:30 +00:00
brw_nir_lower_cooperative_matrix.c	intel/brw/xe2+: Allow vec16 for cooperative matrix	2024-06-25 14:17:47 -07:00
brw_nir_lower_cs_intrinsics.c	compiler: Allow derivative_group to be used for all stages in shader_info	2024-09-03 20:03:18 +00:00
brw_nir_lower_fsign.py	intel/brw: Use range analysis to optimize fsign	2024-05-14 01:28:21 +00:00
brw_nir_lower_intersection_shader.c	intel/rt: fix terminateOnFirstHit handling	2024-08-05 21:43:36 +00:00
brw_nir_lower_ray_queries.c	intel/rt: fix ray_query stack address calculation	2024-11-08 18:31:52 +00:00
brw_nir_lower_rt_intrinsics.c	brw/rt: fix ray_object_(direction|origin) for closest-hit shaders	2024-08-13 10:28:50 +00:00
brw_nir_lower_shader_calls.c	treewide: use nir_metadata_control_flow	2024-06-17 16:28:14 -04:00
brw_nir_lower_storage_image.c	intel/brw: Drop image_{load,store}_raw_intel handling	2024-08-09 07:20:08 +00:00
brw_nir_opt_fsat.c	intel/brw: Move fsat instructions closer to the source	2024-08-09 14:26:10 -07:00
brw_nir_rt.c	brw/nir: rework inline_data_intel to work with compute	2024-10-17 19:35:59 +00:00
brw_nir_rt.h	intel/compiler: Use #pragma once instead of header guards	2024-12-11 19:47:44 +00:00
brw_nir_rt_builder.h	intel/compiler: Use #pragma once instead of header guards	2024-12-11 19:47:44 +00:00
brw_nir_trig_workarounds.py
brw_opt_txf_combiner.cpp	brw: Combine convergent texture buffer fetches into fewer loads	2024-12-12 00:05:42 +00:00
brw_packed_float.c
brw_prim.h	intel/compiler: Use #pragma once instead of header guards	2024-12-11 19:47:44 +00:00
brw_print.cpp	intel/brw: Fix SWSB output when printing IR	2024-11-22 21:47:46 +00:00
brw_private.h	intel/compiler: Use #pragma once instead of header guards	2024-12-11 19:47:44 +00:00
brw_reg.h	intel/compiler: Use #pragma once instead of header guards	2024-12-11 19:47:44 +00:00
brw_reg_type.c	intel/brw: Rename brw_reg_type_to_hw_type to brw_type_encode	2024-04-25 11:41:48 +00:00
brw_reg_type.h	intel/compiler: Use #pragma once instead of header guards	2024-12-11 19:47:44 +00:00
brw_rt.h	intel/compiler: Use #pragma once instead of header guards	2024-12-11 19:47:44 +00:00
brw_schedule_instructions.cpp	intel/brw: Only force g0's liveness to be the whole program if spilling	2024-08-01 16:37:34 -07:00
brw_shader.cpp	intel/brw: Delete old-style surface and A64 message opcodes	2024-09-12 20:54:36 +00:00
brw_simd_selection.cpp	intel/brw: fix subgroup size of geometry stages for lnl+	2024-05-14 23:13:37 +00:00
brw_vue_map.c	intel/brw: Simplify @file annotations	2024-07-22 22:48:03 +00:00
intel_clc.c	clc: Tell clang to track imported dependencies	2024-12-06 13:48:26 -05:00
intel_gfx_ver_enum.h	intel/compiler: Use #pragma once instead of header guards	2024-12-11 19:47:44 +00:00
intel_nir.c	intel/compiler: Rename the passes and files related to intel_nir.h	2024-02-16 22:35:05 +00:00
intel_nir.h	intel/compiler: Use #pragma once instead of header guards	2024-12-11 19:47:44 +00:00
intel_nir_blockify_uniform_loads.c	Revert in correct commit "fix"	2024-11-26 16:36:06 +02:00
intel_nir_clamp_image_1d_2d_array_sizes.c	treewide: use nir_metadata_control_flow	2024-06-17 16:28:14 -04:00
intel_nir_clamp_per_vertex_loads.c	treewide: use nir_metadata_control_flow	2024-06-17 16:28:14 -04:00
intel_nir_lower_conversions.c	intel/nir: Don't needlessly split u2f16 for nir_type_uint32	2024-07-11 02:37:05 -07:00
intel_nir_lower_non_uniform_barycentric_at_sample.c	nir: change signature of nir_src_is_divergent()	2024-10-24 10:06:17 +00:00
intel_nir_lower_non_uniform_resource_intel.c	treewide: use nir_metadata_control_flow	2024-06-17 16:28:14 -04:00
intel_nir_lower_printf.c	treewide: use nir_metadata_control_flow	2024-06-17 16:28:14 -04:00
intel_nir_lower_shading_rate_output.c	treewide: use nir_metadata_control_flow	2024-06-17 16:28:14 -04:00
intel_nir_lower_sparse.c	treewide: use nir_metadata_control_flow	2024-06-17 16:28:14 -04:00
intel_nir_lower_texture.c	intel/compiler: Pack texture LOD and offset to a single 32-bit value	2024-02-27 00:22:46 +00:00
intel_nir_opt_peephole_ffma.c	treewide: use nir_metadata_control_flow	2024-06-17 16:28:14 -04:00
intel_nir_opt_peephole_imul32x16.c	treewide: use nir_metadata_control_flow	2024-06-17 16:28:14 -04:00
intel_nir_tcs_workarounds.c	intel/nir: Set src_type on TCS quads workaround store_output	2024-05-02 13:58:21 -07:00
intel_shader_enums.h	intel/compiler: Use #pragma once instead of header guards	2024-12-11 19:47:44 +00:00
meson.build	brw: Combine convergent texture buffer fetches into fewer loads	2024-12-12 00:05:42 +00:00
test_eu_compact.cpp	intel/brw: Enable EU validation and compaction tests for PTL	2024-12-04 23:03:11 +00:00
test_eu_validate.cpp	intel/brw: Enable EU validation and compaction tests for PTL	2024-12-04 23:03:11 +00:00
test_fs_cmod_propagation.cpp	brw: Fix mov cmod propagation when there's int signedness mismatch	2024-09-09 22:13:08 +00:00
test_fs_combine_constants.cpp	intel/brw: Move calculate_cfg out of fs_visitor	2024-07-25 15:37:13 +00:00
test_fs_copy_propagation.cpp	intel/brw: Copy prop from raw integer moves with mismatched types	2024-08-30 03:39:31 +00:00
test_fs_cse.cpp	intel/brw: Move calculate_cfg out of fs_visitor	2024-07-25 15:37:13 +00:00
test_fs_saturate_propagation.cpp	brw/sat: Convert nearly all tests to use new style builders	2024-10-25 20:31:45 +00:00
test_fs_scoreboard.cpp	intel/brw: Allow extra SWSB encodings for Xe2	2024-11-19 04:27:00 +00:00
test_simd_selection.cpp	intel: Remove brw_ prefix from process debug function	2024-02-16 22:35:05 +00:00
test_vf_float_conversions.cpp