fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-22 11:20:11 +01:00

Author	SHA1	Message	Date
Rohan Garg	b5040bfc3f	intel/brw: Handle typed surface and atomic messages for xe2+ Reworks: * Francisco: Rebase on `07b9bfacc7` ("intel/compiler: Move logical-send lowering to a separate file") * Jordan: Rebase on `952a523abb` ("intel: switch over to unified atomics") Signed-off-by: Rohan Garg <rohan.garg@intel.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28484>	2024-04-01 00:00:03 +00:00
Francisco Jerez	74efde7663	intel/brw/xehp+: Drop redundant arguments of lsc_msg_desc*(). Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28484>	2024-04-01 00:00:03 +00:00
Francisco Jerez	f1812437e8	intel/eu/xehp+: Don't initialize mlen and rlen descriptor fields from lsc_msg_desc*(). These fields are overlapping with the ones set by brw_message_desc(), so the latter should be used instead. This fixes corruption of the LSC message descriptors when inconsistent values are specified through both helpers, which can happen if the 'inst->mlen' field is modified during optimization (e.g. by opt_split_sends()). Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28484>	2024-04-01 00:00:03 +00:00
Francisco Jerez	fa96274a87	intel/brw/xehp+: Replace lsc_msg_desc_dest_len()/lsc_msg_desc_src0_len() with helpers to do the computation. We cannot rely on the immediate message descriptor having accurate values for mlen and rlen at the IR level, since they are updated at codegen time via 'inst->mlen' and 'inst->size_written', which could end up with values inconsistent with the message descriptor if e.g. the split sends optimization had an effect. Instead, define helpers that do the computation without relying on the message descriptor, and use the pre-existing brw_message_desc_mlen()/brw_message_desc_rlen() helpers (fully equivalent to the lsc helpers deleted here) during disassembly. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28484>	2024-04-01 00:00:03 +00:00
Ian Romanick	5f9ab41457	intel/brw/xe2: Update uniform handling to account for 512b physical registers Rework: * Jordan: Drop FINISHME (s-b Caio) * Jordan: Use reg_unit() in asserts rather than a ver check (s-b Caio) * Ian: Make use of reg_unit() in round_components_to_whole_registers() Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28484>	2024-04-01 00:00:03 +00:00
Ian Romanick	8587ef172c	intel/brw/xe2: Update brw_nir_analyze_ubo_ranges to account for 512b physical registers Rework: * Jordan: Use `REG_SIZE * reg_unit` (Suggested by Caio) Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28484>	2024-04-01 00:00:03 +00:00
Caio Oliveira	d9e737212d	intel/brw: Add a src array for the common case in fs_inst In the common case, fs_inst will have up to 4 sources (the HW instructions have up to 3, and our representation of SENDs has 4). Embed such an array into the fs_inst, and use it whenever applicable instead of allocating a new array. Also change the code to reuse the allocated src array when resizing to a smaller length. Between the changes above and the reduced amount of initializing fs_regs, this reduces fossil-db time by around 2% for Borderlands 3 and Rise of the Tomb Raider, and around 1.5% for Total War Warhammer 3. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28379>	2024-03-29 22:44:01 +00:00
Caio Oliveira	dae9795628	intel/brw: Remove vestiges of sources on IF opcode, only valid on Gfx6 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28379>	2024-03-29 22:44:01 +00:00
Kenneth Graunke	816a33849a	intel/brw: Rearrange fs_inst fields For better packing, and to make all the small fields easier to hash and compare en masse. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28379>	2024-03-29 22:44:01 +00:00
Ian Romanick	5e9c01dfe4	intel/brw/xe2+: Use phys_nr and phys_subnr in DPAS encoding Suggested-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28404>	2024-03-29 21:12:32 +00:00
Ian Romanick	6d85f7129a	intel/brw/xe2+: DPAS must be SIMD16 now Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28404>	2024-03-29 21:12:32 +00:00
Ian Romanick	a8115221e5	nir: intel/brw: Change the order of sources for nir_dpas_intel It was by pure luck that all sources (and the result) of nir_dpas_intel had the same number of components. It is possible to support matrix sizes where the accumulator matrix and the result matrix are larger (e.g., 16x8 * 8x16 = 16x16). This breaks all of the assumptions of NIR's infrastructure for code generating intrinsics. Fix this by making the accumulator matrix be the first source. The accumulator and the result will always have the same dimensions (due to rules of matrix multiplication) and the same type (due to restrictions of the cooperative matrix extension). This forces them to have the same number of components. This doesn't fix all the potential problems. NIR expects that all 0-sized sources will have the same number of components. This just ensures that the result has the correct number of components. Fixes: `6b14da33ad` ("intel/fs: nir: Add nir_intrinsic_dpas_intel") Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28404>	2024-03-29 21:12:32 +00:00
Ian Romanick	c6bd6f2a41	intel/brw: Use enums for DPAS source regioning Was previously passing 1, 1, 0 as the regioning. This generated incorrect disassembly because the encoding for a width of 1 is 0. Use the enums to ensure the correct values are used. Fixes: `1c92dad5cb` ("intel/disasm: Disassembly support for DPAS") Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28404>	2024-03-29 21:12:32 +00:00
Ian Romanick	be4fa59a72	intel/brw: Clear write_accumulator flag when changing the destination If the destination was the accumulator but is no longer, having the flag set is not correct. On Xe2 this also causes a validation error. v2: Reword the comment to be more clear. Suggested by Jordan. Fixes: `efa4e4bc5f` ("intel/fs: Introduce regioning lowering pass.") Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28404>	2024-03-29 21:12:32 +00:00
Rohan Garg	df3a1348d1	intel/brw: minor rework to deduplicate variable assignment Signed-off-by: Rohan Garg <rohan.garg@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27235>	2024-03-28 19:53:40 +00:00
Rohan Garg	a715512177	intel/brw: adjust the copy propagation pass to account for wider GRFs on Xe2+ Signed-off-by: Rohan Garg <rohan.garg@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27235>	2024-03-28 19:53:40 +00:00
Rohan Garg	7d425913f7	intel/brw: update disassembly for MATH pipe Signed-off-by: Rohan Garg <rohan.garg@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27235>	2024-03-28 19:53:40 +00:00
Rohan Garg	467ee9d27a	intel/brw: Xe2+ can do SIMD16 for extended math on HF types BSpec 56797: Math operation rules when half-floats are used on both source and destination operands and both source and destinations are packed. The execution size must be 16. Signed-off-by: Rohan Garg <rohan.garg@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27235>	2024-03-28 19:53:40 +00:00
Rohan Garg	c4b38c717d	intel/brw: account for sources when determining if an operation uses half floats Signed-off-by: Rohan Garg <rohan.garg@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27235>	2024-03-28 19:53:40 +00:00
Kenneth Graunke	348506462a	intel/brw: Stop checking mlen on math opcodes in CSE pass These were only messages on Gfx4 which we no longer support here. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28067>	2024-03-27 04:52:17 +00:00
Kenneth Graunke	a203722634	intel/brw: Delete brw_fs_lower_minmax This is for old hardware and never called in brw. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28067>	2024-03-27 04:52:17 +00:00
Kenneth Graunke	e5a0f3b570	intel/brw: Allow changing types for LOAD_PAYLOAD with 1 source This is equivalent to a MOV. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28067>	2024-03-27 04:52:17 +00:00
Kenneth Graunke	c0c05c1041	intel/brw: Fix destination stride assertion in copy propagation We were asserting that entry->dst.offset % REG_SIZE == 0, which is easily tripped by a simple LOAD_PAYLOAD that writes a 16-bit vec2: load_payload(8) vgrf1:UW, vgrf2+0.0:UW, vgrf3+0.0:UW We create separate ACP entries corresponding to the values coming from vgrf2 and vgrf3, with entry->dst set to the location within vgrf1 where those sources get written to. So the second entry will have offset 16, which is not REG_SIZE aligned. It looks like this assert was originally added back in 2014 (see commit `1728e74957`) and adjusted through the ages, including at a point when we combined reg and subreg offsets into a single byte offset, and over time also extended copy propagation. Here the destination offset is already accounted for via rel_offset, at the byte offset level, so things ought to work and there is no need to assert that this is the case. Ian had already noted that the assert tripped in commit `e3f502e007`, but checking for inst->opcode == MOV here doesn't really make sense - it's just the case that he found that broke. Remove the erroneous assertion. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28067>	2024-03-27 04:52:17 +00:00
Kenneth Graunke	1cb9946228	intel/brw: Fix register coalescing's LOAD_PAYLOAD dst offset handling We were discarding inst->dst.offset on LOAD_PAYLOAD instructions. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28067>	2024-03-27 04:52:17 +00:00
Kenneth Graunke	ba11127944	intel/brw: Fix opt_split_sends() to allow for FIXED_GRF send sources opt_copy_propagation() can sometimes propagate FIXED_GRF sources into SHADER_OPCODE_SENDs as the message payload. For example, GS input reads, which simply take a URB handle and have the offset in the descriptor. For non-VGRFs, there isn't a payload to split, so just skip past such send messages. Fixes: `589b03d02f` ("intel/fs: Opportunistically split SEND message payloads") Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28067>	2024-03-27 04:52:17 +00:00
Kenneth Graunke	0e7bb74a1a	Revert "intel/brw: Don't consider UNIFORM_PULL_CONSTANT_LOAD a send-from-GRF" This reverts commit `5814534de5`. It apparently caused GPU hangs in Assassin's Creed: Valhalla, and it isn't that critical of a patch, so let's just roll it back for now. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10894 Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28390>	2024-03-26 18:58:20 +00:00
Ian Romanick	b835784dde	intel/brw: Remove last vestiges of could_coissue Most of the obvious bits were removed by `7ac5696157` ("intel/brw: Remove Gfx8- code from backend passes"). No shader-db or fossil-db changes on any Intel platform. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28342>	2024-03-23 01:29:22 +00:00
Yonggang Luo	1ac1c0843f	treewide: Replace usage of macro DEBUG with MESA_DEBUG when possible This is achieved by the following steps: #ifndef DEBUG => #if !MESA_DEBUG defined(DEBUG) => MESA_DEBUG #ifdef DEBUG => #if MESA_DEBUG This is done by replace in vscode excludes docs,.rs,addrlib,src/imgui,.sh,src/intel/vulkan/grl/gpu These are safe because those files should keep DEBUG macro is already excluded; and not directly replace DEBUG, as we have some symbols around it. Use debug or NDEBUG instead of DEBUG in comments when proper This for reduce the usage of DEBUG, so it's easier migrating to MESA_DEBUG These are found when migrating DEBUG to MESA_DEBUG, these are all comment update, so it's safe Replace comment /* DEBUG */ and /* !DEBUG */ with proper /* MESA_DEBUG */ or /* !MESA_DEBUG */ manually DEBUG || !NDEBUG -> MESA_DEBUG || !NDEBUG !DEBUG && NDEBUG -> !(MESA_DEBUG || !NDEBUG) Replace the DEBUG present in comment with proper new MESA_DEBUG manually Signed-off-by: Yonggang Luo <luoyonggang@gmail.com> Acked-by: David Heidelberg <david.heidelberg@collabora.com> Reviewed-by: Eric Engestrom <eric@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28092>	2024-03-22 18:22:34 +00:00
Mark Janes	4acea392af	intel/compiler: drop unused ray-tracing fields from cache hash The compiler only references `intel_device_info->subslice_masks` for ray tracing workloads. Platforms which lack raytracing support can share a cache even if they differ on this field. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28311>	2024-03-22 00:01:28 +00:00
Kenneth Graunke	9a72116367	intel/brw: Unify DF and Q/UQ lowering for MOV Using the new unsupported_64bit_type helper. Fixes: `ea423aba1b` ("intel/brw: Split out 64-bit lowering from algebraic optimizations") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28328>	2024-03-21 23:25:56 +00:00
Kenneth Graunke	97c7d5113d	intel/brw: Use correct execution pipe for lowering SEL on DF This is a float operation, let's keep it on the float pipe. Fixes: `ea423aba1b` ("intel/brw: Split out 64-bit lowering from algebraic optimizations") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28328>	2024-03-21 23:25:56 +00:00
Kenneth Graunke	26d65e96dd	intel/brw: Assert that min/max are not happening in 64-bit SEL lowering These aren't handled, only pure selects. Fixes: `ea423aba1b` ("intel/brw: Split out 64-bit lowering from algebraic optimizations") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28328>	2024-03-21 23:25:56 +00:00
Kenneth Graunke	a2c2a7bc00	intel/brw: Fix check for 64-bit SEL lowering types The 64-bit type lowering for SEL in opt_algebraic had a pre-existing bug where it only triggered when 64-bit float _and_ integer types were unsupported. Meteorlake supports 64-bit float but not integer, so we need to lower Q/UQ in that case still. When I moved this to a later pass, opt_peephole_sel started generating Q/UQ SEL instructions which were failing to be lowered. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10867 Fixes: `ea423aba1b` ("intel/brw: Split out 64-bit lowering from algebraic optimizations") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28328>	2024-03-21 23:25:56 +00:00
Dylan Baker	75ede9d9bc	intel/brw: track last successful pass and leave the loop early This is similar to what RADV implements using the NIR_LOOP_PASS helpers. I have not used those helpers for a couple of reasons: 1. They use the pointer to the optimization function, which doesn't work if the same function is called multiple times in one invocation of the loop (fixable) 2. After fixing them, due to Intel's use of sub-expressions, the amount of code added to wrap the shared macro becomes more than simply reimplementing them for the Intel compiler On most workloads the results are a wash, but on compile heavy workloads like Cyberpunk 2077 and Rise of the Tomb Raider, I saw fossil-db runtimes fall by 1-2% on my ICL, with no changes to the compiled shaders. Caio saw closer to 2.5% on TGL. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27510>	2024-03-21 23:02:32 +00:00
Caio Oliveira	b2ee98d2db	intel/brw: Handle Xe2 in brw_fs_opt_zero_samples The mlen tracking is in REG_SIZE units, but in Xe2 each GRF has doubled the size. The optimization can only elide full GRFs, so round down the amount of trailing zeros to ensure the optimization will remove only full GRFs. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28279>	2024-03-21 22:38:54 +00:00
Ian Romanick	cd70e49394	intel/brw: Allow SIMD16 F and HF type conversion moves On DG2, the lowering generated for these MOV instructions is awful. The original SIMD16 MOV { 18} 67: mov(16) vgrf54+0.0:HF, vgrf46+0.0:F NoMask group0 is lowered to SIMD8 MOVs: { 18} 118: mov(8) vgrf54+0.0:HF, vgrf46+0.0:F NoMask group0 { 18} 119: mov(8) vgrf54+0.16:HF, vgrf46+1.0:F NoMask group8 These MOVs violate Gfx12.5 region restrictions, so these are further lowered: { 17} 119: mov(8) vgrf83<2>:HF, vgrf46+0.0:F NoMask group0 { 19} 120: mov(8) vgrf54+0.0:UW, vgrf83<2>:UW NoMask group0 { 19} 122: mov(8) vgrf84<2>:HF, vgrf46+1.0:F NoMask group8 { 19} 123: mov(8) vgrf54+0.16:UW, vgrf84<2>:UW NoMask group8 The shader-db and fossil-db results are nothing to get excited about. However, the effect on vk_cooperative_matrix_perf is substantial. In one subtest shader: shaders/shmemfp16.spv cooperativeMatrixProps = 8x8x16 A = float16_t B = float16_t C = float16_t D = float16_t scope = subgroup TILE_M=128 TILE_N=128, TILE_K=32 BLayout=0 performance on my DG2 improved by ~60% due to a MASSIVE reduction in spills and fills: -Native code for unnamed compute shader (null) (src_hash 0x00000000) (sha1 c6a41b1c4e7aa2da327a39a70ed36c822a4b172f) -SIMD32 shader: 32484 instructions. 1 loops. 1893868 cycles. 737:1820 spills:fills, 442 sends, scheduled with mode none. Promoted 1 constants. Compacted 519744 to 492224 bytes (5%) - START B0 (20782 cycles) +Native code for unnamed compute shader (null) (src_hash 0x00000000) (sha1 621e960daad5b5579b176717f24a315e7ea560a1) +SIMD32 shader: 23918 instructions. 1 loops. 1089894 cycles. 432:1166 spills:fills, 442 sends, scheduled with mode none. Promoted 1 constants. Compacted 382688 to 353232 bytes (8%) shader-db: All Gfx9 and later platforms had similar results. (Meteor Lake shown) total instructions in shared programs: 19656270 -> 19653981 (-0.01%) instructions in affected programs: 61810 -> 59521 (-3.70%) helped: 116 / HURT: 0 total cycles in shared programs: 823368888 -> 823375854 (<.01%) cycles in affected programs: 1165284 -> 1172250 (0.60%) helped: 51 / HURT: 57 fossil-db: DG2 and Meteor Lake had similar results. (Meteor Lake shown) * Shaders only in 'before' results are ignored: fossil-db/steam-dxvk/total_war_warhammer3/2a3ed2ca632a7cb7/fs.32, fossil-db/steam-dxvk/total_war_warhammer3/18b9d4a3b1961616/fs.32, fossil-db/steam-dxvk/total_war_warhammer3/04ac9f3146a6db19/fs.32, fossil-db/steam-dxvk/total_war_warhammer3/f37ebec6aa1b379a/fs.32, fossil-db/steam-dxvk/total_war_warhammer3/255c987feb0d4310/fs.32, and 25 more from 1 apps: fossil-db/steam-dxvk/total_war_warhammer3 Totals: Instrs: 160946537 -> 160928389 (-0.01%); split: -0.01%, +0.00% Cycles: 14125908620 -> 14125873958 (-0.00%); split: -0.00%, +0.00% Totals from 1002 (0.15% of 652134) affected shaders: Instrs: 411261 -> 393113 (-4.41%); split: -4.41%, +0.00% Cycles: 16676735 -> 16642073 (-0.21%); split: -0.48%, +0.27% Tiger Lake Totals: Instrs: 164511816 -> 164497202 (-0.01%); split: -0.01%, +0.00% Cycles: 13801675722 -> 13801629397 (-0.00%); split: -0.00%, +0.00% Subgroup size: 7955168 -> 7955152 (-0.00%) Send messages: 8544494 -> 8544486 (-0.00%) Totals from 997 (0.15% of 651454) affected shaders: Instrs: 460820 -> 446206 (-3.17%); split: -3.17%, +0.00% Cycles: 16265514 -> 16219189 (-0.28%); split: -0.84%, +0.56% Subgroup size: 17552 -> 17536 (-0.09%) Send messages: 26045 -> 26037 (-0.03%) Ice Lake Totals: Instrs: 165504747 -> 165489970 (-0.01%); split: -0.01%, +0.00% Cycles: 15145244554 -> 15145149627 (-0.00%); split: -0.00%, +0.00% Subgroup size: 8107032 -> 8107016 (-0.00%) Send messages: 8598680 -> 8598672 (-0.00%) Spill count: 45427 -> 45423 (-0.01%) Fill count: 74749 -> 74747 (-0.00%) Totals from 1125 (0.17% of 656115) affected shaders: Instrs: 521676 -> 506899 (-2.83%); split: -2.83%, +0.00% Cycles: 19555434 -> 19460507 (-0.49%); split: -0.59%, +0.10% Subgroup size: 21616 -> 21600 (-0.07%) Send messages: 28623 -> 28615 (-0.03%) Spill count: 603 -> 599 (-0.66%) Fill count: 1362 -> 1360 (-0.15%) Skylake * Shaders only in 'after' results are ignored: fossil-db/steam-native/red_dead_redemption2/cef460b80bad8485/fs.16, fossil-db/steam-native/red_dead_redemption2/cd5fe081e2e5529d/fs.16 from 1 apps: fossil-db/steam-native/red_dead_redemption2 Totals: Instrs: 141607617 -> 141593776 (-0.01%); split: -0.01%, +0.00% Cycles: 14257812441 -> 14257661671 (-0.00%); split: -0.00%, +0.00% Subgroup size: 7743752 -> 7743736 (-0.00%) Send messages: 7552728 -> 7552720 (-0.00%) Spill count: 43660 -> 43661 (+0.00%) Fill count: 71301 -> 71303 (+0.00%) Totals from 1017 (0.16% of 636964) affected shaders: Instrs: 392454 -> 378613 (-3.53%); split: -3.53%, +0.00% Cycles: 16622974 -> 16472204 (-0.91%); split: -1.04%, +0.13% Subgroup size: 19840 -> 19824 (-0.08%) Send messages: 23021 -> 23013 (-0.03%) Spill count: 484 -> 485 (+0.21%) Fill count: 1155 -> 1157 (+0.17%) Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28281>	2024-03-21 15:12:58 -07:00
Ian Romanick	66dc6e07f5	intel/brw: Fix handling of accumulator register numbers Folks, there's more than one accumulator. In general, when the register file is ARF, the upper 4 bits of the register number specify which ARF, and the lower 4 bits specify which one of that ARF. This can be further partitioned by the subregister number. This is already mostly handled correctly for flags register, but lots of places wanted to check the register number for equality with BRW_ARF_ACCUMULATOR. If acc1 is ever specified, that won't work. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28281>	2024-03-21 15:12:54 -07:00
Dylan Baker	477943cc9d	meson: Allow building intel-clc for the host if it can be run In what is probably the most common case of cross compilation, x86_64 -> x86, it should be possible to build intel-clc for the host machine and run it. Doing so simplifies the build by not needing to be able to cross compile half of mesa, and should ease developer and distro strain for building Intel drivers for x86. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28222>	2024-03-21 16:31:35 +00:00
Ian Romanick	3556dbb97f	intel/brw/xe2: Correctly disassemble RT write subtypes The encoding changed when SIMD32 was added. Part of Wa_14011334914. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28306>	2024-03-20 15:46:44 -07:00
Francisco Jerez	c4325f426c	intel/brw/xe2+: Setup PS thread payload registers required for ALU-based pixel interpolation. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28306>	2024-03-20 15:46:44 -07:00
Francisco Jerez	6427f16074	intel/brw/gfx12: Setup PS thread payload registers required for ALU-based pixel interpolation. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28306>	2024-03-20 15:46:44 -07:00
Rohan Garg	2df6d208c8	intel/brw: Adjust src1 length bits for xe2+ Signed-off-by: Rohan Garg <rohan.garg@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28306>	2024-03-20 15:46:44 -07:00
Rohan Garg	83f2bdc116	intel/brw: Set the right cache control bits for xe2 Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28306>	2024-03-20 15:46:44 -07:00
Rohan Garg	adb853ed10	intel/brw: Update written size depending on the LSC message Signed-off-by: Rohan Garg <rohan.garg@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28306>	2024-03-20 15:46:44 -07:00
Rohan Garg	48376ac3b8	intel/brw: Cleanup send generation Signed-off-by: Rohan Garg <rohan.garg@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28306>	2024-03-20 15:46:44 -07:00
Rohan Garg	65f66974a5	intel/brw: Use the dimensions supplied in the instruction Rework: * Francisco Jerez: Rebase on `07b9bfacc7` ("intel/compiler: Move logical-send lowering to a separate file") Signed-off-by: Rohan Garg <rohan.garg@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28306>	2024-03-20 15:46:44 -07:00
Francisco Jerez	644a0ede1e	intel/blorp/xe2+: Don't use replicated-data clears. They've been removed from the hardware. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28306>	2024-03-20 15:46:44 -07:00
Francisco Jerez	af8b9af700	intel/brw/xe2+: Allow dual-source blending in SIMD16 mode. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28306>	2024-03-20 15:46:44 -07:00
Francisco Jerez	762ec3fd59	intel/brw/xe2+: Allow FS stencil output in SIMD16 dispatch mode. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28306>	2024-03-20 15:46:44 -07:00
Francisco Jerez	efc0601ddf	intel/brw/xe2+: Double allowed SIMD width of FB write SEND messages. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28306>	2024-03-20 15:46:44 -07:00


3369 commits