This is going to handle more than atomics shortly.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30828>
Literally inside an if-statement (about 26 lines before this hunk)
that checks for !nir_src_is_const(instr->src[1]).
No shader-db or fossil-db changes on any Intel platform.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30251>
When INTEL_DEBUG=ann is also set, the disassembler would annotate the
output with either a string or the string version of a NIR instruction.
This was done by keeping two pointers (but only using one at a time).
Change the code to print the instruction into a string instead of
keeping a pointer to it around (pegging the string's lifetime to the
shader). That way, only one pointer is needed for annotations. Because
that serialization is not free, only do it when the environment
variable is set.
While we are here, move the annotation string field to the end,
placing it in the least commonly used cacheline. Further packing might
allow the entire fs_inst to fit in two cachelines.
For release builds, don't even add the debug annotation to the struct.
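A minimal sketch of the layout idea (field placement and surrounding
fields are illustrative, not the actual fs_inst definition):
```
struct fs_inst {
   /* ... hot fields: opcode, dst, sources, exec size, flags ... */
#ifndef NDEBUG
   /* Serialized eagerly only when INTEL_DEBUG=ann is set; allocated
    * against the shader so no separate ownership tracking is needed. */
   const char *annotation;
#endif
};
```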
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30822>
The long names were originally meant to map to the HW encoding, but
nowadays the actual encoding values depend on the gfx version, whether
the instruction is 3-src, etc.
Suggested by Ken.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30704>
In these cases there's a clear bound we can use. In C++ this is a
compiler extension and is not compatible with zero-initializing a
regular struct -- which will happen in a later change.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30704>
We were emitting logical atomic instructions with a packed
destination region for sub-dword LSC atomics, along the lines of:
> untyped_atomic_logical(32) dst<1>:HF, ...
However, these instructions use an LSC data size D16U32, which means
that the 16b data on the return payload is expanded to 32b by the LSC
shared function, so we were lying to the compiler about the location
of the individual channels on the return payload, its execution
masking, etc. This is why the hacks that manually set the
'inst->size_written' of the instruction were required.
In some cases this worked, but any non-trivial manipulation of the
instruction destination by lowering or optimization passes could have
led to corruption, as has been reproduced in deqp-vk during
lower_simd_width() for shaders that use 16-bit atomics in SIMD32
dispatch mode.
Note that LSC sub-dword reads aren't affected by this because they use
raw UD destinations and specify the actual bit size of the operation
datatype as the immediate SURFACE_LOGICAL_SRC_IMM_ARG, which doesn't
work for atomic operations since that immediate specifies the atomic
opcode.
Instead of silently replacing the 16-bit region with an inconsistent
32-bit region, have the logical operation implement the behavior of
16-bit destinations correctly: emit the MOV instructions that pack the
data from the UD temporary into the packed destination from the
lower_logical_sends() pass instead of from the NIR translation pass.
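To illustrate the D16U32 layout, here is a standalone sketch (plain C,
not Mesa code) of what the packing MOVs do: each 16-bit datum comes
back widened to a 32-bit slot, and explicit per-channel copies produce
the packed 16-bit destination.
```
#include <stdint.h>

/* ret_payload: LSC return data, one 16-bit value per 32-bit slot.
 * packed: the tightly packed 16-bit destination region. */
static void
pack_d16u32(const uint32_t *ret_payload, uint16_t *packed, unsigned n)
{
   for (unsigned i = 0; i < n; i++)
      packed[i] = (uint16_t)ret_payload[i]; /* one MOV per channel */
}
```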
Fixes: 43169dbbe5 ("intel/compiler: Support 16 bit float ops")
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30683>
Make sure to take the new GRF size into consideration and adjust the
indirect offset according to the new size, so that when we do the
indirect load with the address register, we load the right values.
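One plausible shape of the adjustment, as a hedged sketch (the
parameter names are hypothetical and the actual patch operates on
backend registers): rescale the register-indexed part of the offset
while keeping the intra-register byte offset.
```
/* old_grf/new_grf are GRF sizes in bytes (e.g. 32 vs. 64). */
static unsigned
rescale_indirect_offset(unsigned offset, unsigned old_grf,
                        unsigned new_grf)
{
   return (offset / old_grf) * new_grf + (offset % old_grf);
}
```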
This helps pass the following tests:
- dEQP-VK.binding_model.descriptor_buffer.mutable_descriptor.*geom*
- dEQP-VK.ray_query.*geometry_shader.*
Backport-to: 24.2
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30679>
load_global_constant_uniform_block_intel is equivalent in terms of
loading; for the predicate, we just do a bcsel afterward in the places
where that is required.
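As a sketch with nir_builder (the load helper is hypothetical): the
block load is emitted unpredicated, and where the predicate matters a
bcsel picks between the loaded value and the fallback.
```
nir_def *val = emit_unpredicated_block_load(b);   /* hypothetical */
nir_def *res = nir_bcsel(b, pred, val, fallback); /* per-lane select */
```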
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30659>
Bspec 57508: Structure_SIMD16TraceRayMessage:: RayQuery Enable
"When this bit is set in the header, Trace Ray Message behaves like a
Ray Query. This message requires a write-back message indicating
RayQuery for all valid Rays (SIMD lanes) have completed."
If we don't pass the write-back register, the message somehow steps
on the R0 register and can mess up scratch space accesses, which could
potentially lead to a GPU hang. This can be noticed while running
under a simulator trace.
send.rta (16|M0) null r124 r126:1 0x0 0x02000100 {$15} // wr:1+1, rd:0; simd16 trace ray
R0 = 00000001 00000000 00000000 00000001 00000000 00000000 00000001 00000000 00000000 00000001 00000000 00000000 00000001 00000000 00000000 00000001
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30600>
Gfx8 required us to emulate image load store with untyped messages,
whereas Gfx9 just has typed message support for everything. brw no
longer supports Gfx8, so all of this code is effectively dead.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30576>
Instead of emitting a single one at the top and making references to
it, emit the virtual instruction as needed and let CSE do its job.
Since load_subgroup_invocation can now appear somewhere other than at
the start of the shader, use UNDEF in all cases to ensure that the
liveness of the destination doesn't extend back to the first partial
write done here (UNDEF was being used only for SIMD > 8 before).
Note this option was considered in the past in 6132992cdb but
dismissed at the time. The difference now is that the lowering of the
virtual instruction happens earlier than scheduling.
The motivation for this change is to allow passes other than the NIR
conversion to use this value. The alternative of storing a `brw_reg` in
the shader (instead of NIR state) gets complicated by passes like
compact_vgrfs, which move VGRFs around (and update the instructions).
This and maybe other passes would have to care about the brw_reg.
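The per-use emission looks roughly like this (builder calls
approximate, not the exact patch):
```
/* Start with UNDEF so liveness doesn't stretch back over the partial
 * writes the later lowering introduces; CSE merges duplicates. */
brw_reg tmp = bld.vgrf(BRW_TYPE_UD);
bld.UNDEF(tmp);
bld.emit(SHADER_OPCODE_LOAD_SUBGROUP_INVOCATION, tmp);
```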
Fossil-db numbers, TGL
```
*** Shaders only in 'after' results are ignored:
steam-native/shadow_of_the_tomb_raider/c683ea5067ee157d/fs.32/0, steam-native/shadow_of_the_tomb_raider/f4df450c3cef40b4/fs.32/0, steam-native/shadow_of_the_tomb_raider/94b708fb8e3d9597/fs.32/0, steam-native/shadow_of_the_tomb_raider/19d44c328edabd30/fs.32/0, steam-native/shadow_of_the_tomb_raider/8a7dcbd5a74a19bf/fs.32/0, and 366 more
from 4 apps: steam-dxvk/alan_wake, steam-dxvk/batman_arkham_city_goty, steam-dxvk/batman_arkham_origins, steam-native/shadow_of_the_tomb_raider
*** Shaders only in 'before' results are ignored:
steam-dxvk/octopath_traveler/aaa3d10acb726906/fs.32/0, steam-dxvk/batman_arkham_origins/e6872ae23569c35f/fs.32/0, steam-dxvk/octopath_traveler/fd33a99fa5c271a8/fs.32/0, steam-dxvk/octopath_traveler/9a077cdc16f24520/fs.32/0, steam-dxvk/batman_arkham_city_goty/fac7b438ad52f622/fs.32/0, and 12 more
from 4 apps: steam-dxvk/batman_arkham_city_goty, steam-dxvk/batman_arkham_origins, steam-dxvk/octopath_traveler, steam-native/shadow_of_the_tomb_raider
Totals:
Instrs: 149752381 -> 149751337 (-0.00%); split: -0.00%, +0.00%
Cycle count: 11553609349 -> 11549970294 (-0.03%); split: -0.06%, +0.03%
Spill count: 42763 -> 42764 (+0.00%); split: -0.01%, +0.01%
Fill count: 75650 -> 75651 (+0.00%); split: -0.00%, +0.01%
Max live registers: 31725096 -> 31671792 (-0.17%)
Max dispatch width: 5546008 -> 5551672 (+0.10%); split: +0.11%, -0.00%
Totals from 52574 (8.34% of 630441) affected shaders:
Instrs: 9535159 -> 9534115 (-0.01%); split: -0.03%, +0.02%
Cycle count: 1006627109 -> 1002988054 (-0.36%); split: -0.65%, +0.29%
Spill count: 11588 -> 11589 (+0.01%); split: -0.03%, +0.03%
Fill count: 21057 -> 21058 (+0.00%); split: -0.01%, +0.02%
Max live registers: 1992493 -> 1939189 (-2.68%)
Max dispatch width: 559696 -> 565360 (+1.01%); split: +1.06%, -0.05%
```
and DG2
```
*** Shaders only in 'after' results are ignored:
steam-native/shadow_of_the_tomb_raider/1f95a9d3db21df85/fs.32/0, steam-native/shadow_of_the_tomb_raider/56b87c4a46613a2a/fs.32/0, steam-native/shadow_of_the_tomb_raider/a74b4137f85dbbd3/fs.32/0, steam-native/shadow_of_the_tomb_raider/e07e38d3f48e8402/fs.32/0, steam-native/shadow_of_the_tomb_raider/206336789c48996c/fs.32/0, and 268 more
from 4 apps: steam-dxvk/alan_wake, steam-dxvk/batman_arkham_city_goty, steam-dxvk/batman_arkham_origins, steam-native/shadow_of_the_tomb_raider
*** Shaders only in 'before' results are ignored:
steam-native/shadow_of_the_tomb_raider/0420d7c3a2ea99ec/fs.32/0, steam-native/shadow_of_the_tomb_raider/2ff39f8bf7d24abb/fs.32/0, steam-native/shadow_of_the_tomb_raider/92d7be2824bd9659/fs.32/0, steam-native/shadow_of_the_tomb_raider/f09ca6d2ecf18015/fs.32/0, steam-native/shadow_of_the_tomb_raider/490f8ffd59e52949/fs.32/0, and 205 more
from 3 apps: steam-dxvk/batman_arkham_city_goty, steam-dxvk/batman_arkham_origins, steam-native/shadow_of_the_tomb_raider
Totals:
Instrs: 151597619 -> 151599914 (+0.00%); split: -0.00%, +0.00%
Subgroup size: 7699776 -> 7699784 (+0.00%)
Cycle count: 12738501989 -> 12739841170 (+0.01%); split: -0.01%, +0.02%
Spill count: 61283 -> 61274 (-0.01%)
Fill count: 119886 -> 119849 (-0.03%)
Max live registers: 31810432 -> 31758920 (-0.16%)
Max dispatch width: 5540128 -> 5541136 (+0.02%); split: +0.08%, -0.06%
Totals from 49286 (7.81% of 631231) affected shaders:
Instrs: 8607753 -> 8610048 (+0.03%); split: -0.01%, +0.04%
Subgroup size: 857752 -> 857760 (+0.00%)
Cycle count: 305939495 -> 307278676 (+0.44%); split: -0.28%, +0.72%
Spill count: 6339 -> 6330 (-0.14%)
Fill count: 12571 -> 12534 (-0.29%)
Max live registers: 1788346 -> 1736834 (-2.88%)
Max dispatch width: 510920 -> 511928 (+0.20%); split: +0.85%, -0.66%
```
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30489>
16-bit SIMD8 sampler writeback messages come with a bit of padding in
them, requiring us to emit a LOAD_PAYLOAD to reorganize the data into
the padding-free format expected by NIR. Additionally, we may reduce
the response length on the sampler messages based on which components
of the (always vec4) NIR destination are actually in use. When we do
that, dest_size > read_size, and the trailing components are all empty
BAD_FILE registers, indicating the contents are undefined.
Unfortunately, we can't ignore those trailing components entirely.
In the past, we left them default-initialized, giving us a BAD_FILE
register with UD type (which didn't matter, since all sampler returns
were 32-bit). But with 16-bit returns, this confused LOAD_PAYLOAD.
For example, writing RGB and skipping A (without sparse) would produce
read_size = 3 and dest_size = 4 and nir_dest[5] containing:
nir_dest[] = <R:hf, G:hf, B:hf, blank-A:ud, blank-sparse:ud>
We'd then call LOAD_PAYLOAD on the first 4 sources, causing it to see
3 HF's and a UD, and try to copy the full 32-bit value at the end
instead of 16 bits of padding as we intended. This meant it would
overflow the destination register's size, triggering validation errors.
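The fix, approximately (names abbreviated): type the blank trailing
components as 16-bit so LOAD_PAYLOAD copies padding rather than a full
dword.
```
/* dest[read_size..dest_size-1] stay BAD_FILE, but must carry the
 * 16-bit type so LOAD_PAYLOAD sizes the copy correctly. */
for (unsigned i = read_size; i < dest_size; i++)
   nir_dest[i] = retype(nir_dest[i], BRW_TYPE_HF);
```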
Thanks to Ian Romanick for noticing this, writing a test, and also
coming up with a nearly identical fix.
Fixes: 0116430d39 ("intel/brw: Handle 16-bit sampler return payloads")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11617
References: https://gitlab.freedesktop.org/mesa/crucible/-/merge_requests/152
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30529>
We can achieve most of what brw_fs_opt_predicated_break() does with
simple peepholes at NIR -> BRW conversion time.
For predicated break and continue, we can simply look at an IF ... ENDIF
sequence after emitting it. If there's a single instruction between the
two, and it's a BREAK or CONTINUE, then we can move the predicate from
the IF onto the jump, and delete the IF/ENDIF. Because we haven't built
the CFG at this stage, we only need to remove them from the linked list
of instructions, which is trivial to do.
For the predicated while optimization, we can rely on the fact that we
already did the predicated break optimization, and simply look for a
predicated BREAK just before the WHILE. If so, we move the predicate
onto the WHILE, invert it, and remove the BREAK.
There are a few cases where this approach does a worse job than the old
one: nir_convert_from_ssa may introduce load_reg and store_reg in blocks
containing break, and nir_trivialize_registers may decide it needs to
insert movs into those blocks. So, at NIR -> BRW time, we'll actually
emit some MOVs there, which might have been possible to copy propagate
out after later optimizations.
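In rough form against the instruction list (the traversal helpers are
hypothetical), the break/continue peephole looks like:
```
fs_inst *endif_inst = last_emitted_inst();   /* hypothetical */
fs_inst *jump = previous_inst(endif_inst);
fs_inst *if_inst = previous_inst(jump);

if (if_inst->opcode == BRW_OPCODE_IF &&
    (jump->opcode == BRW_OPCODE_BREAK ||
     jump->opcode == BRW_OPCODE_CONTINUE)) {
   /* Move the predicate from the IF onto the jump... */
   jump->predicate = if_inst->predicate;
   jump->predicate_inverse = if_inst->predicate_inverse;
   /* ...and drop IF/ENDIF; no CFG exists yet, so this is just
    * linked-list surgery. */
   if_inst->remove();
   endif_inst->remove();
}
```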
However, the fossil-db results show that it's still pretty competitive.
For instructions, 1017 shaders were helped (average -1.87 instructions),
while only 62 were hurt (average +2.19 instructions). In affected
shaders, it was -0.08% for instructions.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30498>
The API requires samplers to return 32-bit values even though the
hardware can handle 16-bit floating point, so we detect that case and
make more efficient use of memory bandwidth. This improves the
performance of LLM token encode and decode by at least 5% across
multiple platforms.
Thank you Kenneth Graunke for suggesting and guiding me throughout
this implementation.
Signed-off-by: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30447>
While extending our backend to handle 16-bit sampler return payloads, we
found that in piglit's arb_texture_view-rendering-formats, the SIMD8 FS
was missing the sampling operation altogether. This was because we were
first emitting the texturing instruction, and then calling
get_nir_def(), which adds an UNDEF instruction when the destination is
smaller than 32 bits. So the texturing was dead-code eliminated. Fix
this by calling get_nir_def() earlier.
Thank you to Kenneth Graunke for suggesting and guiding me throughout
this implementation.
Signed-off-by: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30447>
When -fsanitize=shift is used, many instances of the following are
produced:
src/intel/compiler/brw_fs_nir.cpp:114:30: runtime error: left shift of negative value -1
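The usual remedy (illustrative; the actual patch may differ in its
details) is to do the shift in an unsigned type and convert back:
```
#include <stdint.h>

static int32_t
shl_no_ubsan(int32_t v, unsigned amount)
{
   /* Left-shifting a negative signed value is UB; shifting the same
    * bits as unsigned is well defined and yields the intended result
    * on two's-complement targets. */
   return (int32_t)((uint32_t)v << amount);
}
```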
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30333>
If you have something like this:
con 32 %66 = @load_reg (%62) (base=0, legacy_fabs=0, legacy_fneg=0)
con 32 %27 = @resource_intel (%22 (0xdeaddead), %66, %67, %17 (0x0)) (desc_set=2, binding=96, resource_intel=0, resource_block_intel=-1)
Just copying the brw_reg in ssa_values[] is not enough for the
load_reg intrinsic. We need to call get_nir_src() to force some logic
that creates the register correctly.
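Approximately (the exact expression is elided in the patch context):
```
/* Run the usual source-handling logic instead of aliasing the stored
 * brw_reg directly. */
ntb.ssa_values[intrin->def.index] = get_nir_src(ntb, intrin->src[0]);
```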
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: b8209d69ff ("intel/fs: Add support for new-style registers")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30050>
Looks like some of the tests use a bias which does not fit into a
half-float parameter, so it's better to use a float parameter for
sample_b. If we have cube arrays, we combine the BIAS and array index
properly anyway, so we don't have to worry about the first parameter.
This fixes: GTF-GL46.gtf21.GL3Tests.texture_lod_bias.texture_lod_bias_clamp_m_g_M
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29533>
Following https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28957,
some Xe2 code paths started triggering asserts.
In the cases fixed by this patch, it was because of the assert added
to brw_type_larger_of() in cf8ed9925f ("intel/brw: Make a helper for
finding the largest of two types"), and then brw_type_larger_of() is
used in 674e89953f. (For example, the assert was triggering when the
SHL types differed between D and UD.)
Fixes: 674e89953f ("intel/brw: Use new builder helpers that allocate a VGRF destination")
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29925>
This implements a replacement for the previous implementation of
nir_intrinsic_load_barycentric_at_sample that relied on the Pixel
Interpolator shared function, since it's going to be removed from the
hardware from Xe2 onwards.
This implementation simply looks up the X/Y offsets of each sample
index in the table provided in the PS thread payload by using indirect
addressing, then does the actual interpolation by recursing into
emit_pixel_interpolater_alu_at_offset() introduced in the previous
commit.
Note that even though this is only immediately useful on Xe2+
platforms there's no reason why it shouldn't work on earlier
platforms, as long as we have the sample X/Y offsets available in the
thread payload.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29847>
This implements a replacement for the previous implementation of
nir_intrinsic_load_barycentric_at_offset that relied on the Pixel
Interpolator shared function, since it's going to be removed from the
hardware from Xe2 onwards.
That's okay since we can get all the primitive setup information
needed for interpolation at an arbitrary coordinate: We use the X/Y
offset relative to the "X/Y Start" coordinates from the thread payload
in order to evaluate the plane equations also provided in the thread
payload for each barycentric coordinate of each polygon. The
evaluation of the barycentric plane equations (and the RHW plane
equation for perspective-correct interpolation) uses the accumulator
and MAD/MAC for ALU efficiency, but that means we need to manually
split instructions to fit the width of the accumulator. The division
and scaling for perspective-correct interpolation is also now done in
the shader if necessary.
Note that even though this is only immediately useful on Xe2+, the
thread payload numbers are filled out for older platforms, and the EU
restrictions of previous Xe platforms are taken into account, mostly
for the purposes of testing and performance evaluation.
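In scalar form, the math being emitted per channel is (illustrative C,
not the generated code; plane and coordinate names are assumptions):
```
/* Evaluate a plane equation (a, b, c) at the pixel's start coordinate
 * plus the requested offset. */
static float
eval_plane(const float p[3], float x, float y)
{
   return p[0] * x + p[1] * y + p[2];
}

/* Perspective-correct interpolation: evaluate both the barycentric
 * plane and the RHW (1/W) plane, then divide. */
static float
perspective_bary(const float bary_plane[3], const float rhw_plane[3],
                 float x, float y)
{
   return eval_plane(bary_plane, x, y) / eval_plane(rhw_plane, x, y);
}
```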
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29847>
Even though the hardware does not natively support these
configurations, there are many potential benefits to advertising them.
These configurations can theoretically use half the memory bandwidth
for loads and stores. For large matrices, that can be the limiting
factor in performance.
The current implementation, however, has a number of significant
problems.
The conversion from float16 to float32 is performed in the driver during
conversion from NIR. As a result, many common usage patterns end up
doing back-to-back conversions to and from float16 between matrix
multiplications (when the result of one multiplication is used as the
accumulator for the next).
The float16 version of the matrix wastes half the possible register
space. Each float16 value sits alone in a dword. This is done so that
the per-invocation slice of an 8x8 float16 result matrix and an 8x8
float32 result matrix will have the same number of elements. This makes
it possible to do straightforward implementations of all the unary_op
type conversions in NIR.
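Concretely (an illustrative sketch of the layout, not driver code):
```
#include <stdint.h>

static uint16_t
fp16_element(const uint32_t *slice, unsigned i)
{
   /* Each fp16 value occupies only the low 16 bits of its dword, so
    * the fp16 slice matches the fp32 slice's element count while
    * wasting the high half of every dword. */
   return (uint16_t)slice[i];
}
```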
It would be possible to perform N:M element type conversions in the
backend using specialized NIR intrinsics. However, per #10961, this
would be very, very painful. My hope is that, once a suitable resolution
for that issue can be found, support for these configs can be restored.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28834>
The default case should lead us to the next rematerialization block,
so this is useless.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29663>
Some of the instructions we don't need to rematerialize because we
already know they are executed with NoMask, so we can use their
destinations without re-emitting them.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29663>
The r0.5 thread payload register contains Surface State Offset bits
[27:6] as bits [31:10], so we need to shift the register right by 4 in
order to get the surface state offset expected in ExBSO mode, which is
the only extended descriptor encoding supported by the UGM shared
function for SS addressing on Xe2+.
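In other words, straight from the bit positions above:
```
#include <stdint.h>

/* r0.5 holds Surface State Offset bits [27:6] at bits [31:10]; ExBSO
 * expects those same offset bits at [27:6], i.e. four bits lower. */
static uint32_t
exbso_surface_state_offset(uint32_t r0_5)
{
   return r0_5 >> 4;
}
```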
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29543>
Our code to initialize gl_SubgroupInvocation uses multiple
instructions, some of which are partial writes. This makes it difficult
to analyze
expressions involving gl_SubgroupInvocation, which appear very
frequently in compute shaders.
To make this easier, we add a new virtual opcode which initializes
a full VGRF to the value of gl_SubgroupInvocation. (We also expand
it to UD for SIMD8 so there are no partial-write issues.) We then
lower it to the original code later on in compilation, after we've
done the bulk of our optimizations.
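The code the virtual opcode lowers back into looks roughly like this
(builder calls approximate):
```
/* Lanes 0..7 come from a packed vector immediate; each later group of
 * 8 lanes adds its base offset -- a partial write per group. */
const fs_builder allbld8 = bld.group(8, 0).exec_all();
brw_reg tmp = bld.vgrf(BRW_TYPE_UW);
allbld8.MOV(tmp, brw_imm_v(0x76543210));
if (bld.dispatch_width() > 8)
   allbld8.ADD(byte_offset(tmp, 16), tmp, brw_imm_uw(8u));
if (bld.dispatch_width() > 16) {
   const fs_builder allbld16 = bld.group(16, 0).exec_all();
   allbld16.ADD(byte_offset(tmp, 32), tmp, brw_imm_uw(16u));
}
bld.MOV(dst, tmp);   /* widen UW -> UD */
```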
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28666>
These allow avoiding deadlocks in non-compliant applications that
execute barriers under non-uniform control flow. They're not expected
to have any major disadvantage, so let's enable them unconditionally.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29562>
The semantics of discard differ between GLSL and HLSL and
their various implementations. Subsequently, numerous application
bugs occurred and SPV_EXT_demote_to_helper_invocation was written
in order to clarify the behavior. In NIR, we now have 3 different
intrinsics for 2 things, and while demote and terminate have clear
semantics, discard still doesn't and can mean either of the two.
This patch entirely removes nir_intrinsic_discard and
nir_intrinsic_discard_if and replaces all occurrences with either
nir_intrinsic_terminate{_if} or, when the NIR option
'discard_is_demote' is set, nir_intrinsic_demote{_if}.
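The replacement rule, sketched with nir_builder helpers:
```
/* Each discard becomes demote when the option is set, else terminate. */
if (b->shader->options->discard_is_demote)
   nir_demote(b);
else
   nir_terminate(b);
```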
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27617>
Pass in the nir_src and check if it's constant, handling it via CPU-side
arithmetic instead of emitting instructions. While we can constant fold
these via our optimization passes, we have to do opt_algebraic to fold
the binary operation with constant sources into a MOV of an immediate,
then opt_copy_propagation to put it in the next expression, and so on,
until the entire expression is folded. This can take several iterations
of the optimization loop, which is inefficient.
For example, gfxbench5/aztec-ruins/normal/7 has load/store_scratch
intrinsics with constant sources, and this patch removes a number of
optimization passes according to INTEL_DEBUG=optimizer.
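The shape of the change, as a hedged sketch (the helper and its
parameters are hypothetical, not the patch's actual signature):
```
/* If the offset source is constant, fold the arithmetic on the CPU and
 * return an immediate; otherwise emit the ADD as before. */
static brw_reg
scratch_offset(const fs_builder &bld, const nir_src &src, unsigned base)
{
   if (nir_src_is_const(src))
      return brw_imm_ud(base + nir_src_as_uint(src));

   brw_reg off = bld.vgrf(BRW_TYPE_UD);
   bld.ADD(off, get_nir_src(src), brw_imm_ud(base));
   return off;
}
```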
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29624>