fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-22 00:30:13 +01:00

Author	SHA1	Message	Date
Francisco Jerez	0ad835a929	intel/fs/xe2: Fix up subdword integer region restriction with strided byte src and packed byte dst. This fixes a corner case of the LNL sub-dword integer restrictions that wasn't being detected by has_subdword_integer_region_restriction(), specifically: > if(Src.Type==Byte && Dst.Type==Byte && Dst.Stride==1 && W!=2) { > // ... > if(Src.Stride == 2) && (Src.UniformStride) && (Dst.SubReg%32 == Src.SubReg/2 ) { Allowed } > // ... > } All the other restrictions that require agreement between the SubReg number of source and destination only affect sources with a stride greater than a dword, which is why has_subdword_integer_region_restriction() was returning false except when "byte_stride(srcs[i]) >= 4" evaluated to true, but as implied by the pseudocode above, in the particular case of a packed byte destination, the restriction applies for source strides as narrow as 2B. The form of the equation that relates the subreg numbers is consistent with the existing calculations in brw_fs_lower_regioning (see required_src_byte_offset()), we just need to enable lowering for this corner case, and change lower_dst_region() to call lower_instruction() recursively, since some of the cases where we break this restriction are copy instructions introduced by brw_fs_lower_regioning() itself trying to lower other instructions with byte destinations. This fixes some Vulkan CTS test-cases that were hitting these restrictions with byte data types. Fixes: `217d412360` ("intel/fs/gfx20+: Implement sub-dword integer regioning restrictions.") Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30630>	2024-11-15 07:39:33 +00:00
Ian Romanick	7aad19ccd2	brw/lower: Lower invalid source conversion to better code There are two fragment shaders from RDR2 that are hurt for spills and fills on Lunar Lake. Totals from 2 (0.00% of 551413) affected shaders: Spill count: 1252 -> 1317 (+5.19%) Fill count: 2518 -> 2642 (+4.92%) Those shaders... have a lot of room for improvement. There are some patterns in those shaders that we handle very, very poorly. Improving those patterns would likely improve the spills and fills in these shaders quite dramatically. Given how much other platforms are helped, I don't think this should block this commit. No shader-db or fossil-db changes on any pre-Gfx12.5 Intel platforms. v2: Add some comments and an additional assertion. Suggested by Ken. shader-db: Lunar Lake total instructions in shared programs: 18094517 -> 18094511 (<.01%) instructions in affected programs: 809 -> 803 (-0.74%) helped: 6 / HURT: 0 total cycles in shared programs: 921532158 -> 921532168 (<.01%) cycles in affected programs: 2266 -> 2276 (0.44%) helped: 0 / HURT: 3 Meteor Lake and DG2 had similar results. (Meteor Lake shown) total instructions in shared programs: 19820845 -> 19820839 (<.01%) instructions in affected programs: 803 -> 797 (-0.75%) helped: 6 / HURT: 0 total cycles in shared programs: 906372999 -> 906372949 (<.01%) cycles in affected programs: 3216 -> 3166 (-1.55%) helped: 6 / HURT: 0 fossil-db: Lunar Lake Totals: Instrs: 141887377 -> 141884465 (-0.00%); split: -0.00%, +0.00% Cycle count: 21990301498 -> 21990267232 (-0.00%); split: -0.00%, +0.00% Spill count: 69732 -> 69797 (+0.09%) Fill count: 128521 -> 128645 (+0.10%) Totals from 349 (0.06% of 551413) affected shaders: Instrs: 506117 -> 503205 (-0.58%); split: -0.79%, +0.21% Cycle count: 32362996 -> 32328730 (-0.11%); split: -0.52%, +0.41% Spill count: 1951 -> 2016 (+3.33%) Fill count: 4899 -> 5023 (+2.53%) Meteor Lake and DG2 had similar results.
(Meteor Lake shown) Totals: Instrs: 152773732 -> 152761383 (-0.01%); split: -0.01%, +0.00% Cycle count: 17187529968 -> 17187450663 (-0.00%); split: -0.00%, +0.00% Spill count: 79279 -> 79003 (-0.35%) Fill count: 148803 -> 147942 (-0.58%) Scratch Memory Size: 3949568 -> 3946496 (-0.08%) Max live registers: 31879325 -> 31879230 (-0.00%) Totals from 366 (0.06% of 633185) affected shaders: Instrs: 557377 -> 545028 (-2.22%); split: -2.22%, +0.01% Cycle count: 26171205 -> 26091900 (-0.30%); split: -0.54%, +0.24% Spill count: 3238 -> 2962 (-8.52%) Fill count: 10018 -> 9157 (-8.59%) Scratch Memory Size: 257024 -> 253952 (-1.20%) Max live registers: 28187 -> 28092 (-0.34%) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32041>	2024-11-08 17:46:45 +00:00
Ian Romanick	b2d7a823be	brw/lower: Don't emit spurious moves to or from NULL register Previously an instruction like cmp.l.f0.0(16) null:F, v359:F, 0f would get lowered to undef(16) v13703:UD cmp.l.f0.0(16) v13703:F, v359:F, 0f mov(16) null:UD, v13703:UD After copy propagation and dead-code elimination are run again, the original CMP gets turned back into its original form! Some cases can also emit MOVs from the original NULL register. It should be possible to not do any lowering here, but there are some interactions with source lowering passes for things like cmp.l.f0.0(16) null:HF, g89.1<16,16,1>:HF, 0hf What inspired this was... diff'ing step-by-step dumps from INTEL_DEBUG=optimizer had a lot of useless changes due to these MOVs and undefs. It was very annoying. This low-effort change gets the majority of the possible benefit. No shader-db or fossil-db changes on any Intel platform. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32041>	2024-11-08 17:46:45 +00:00
Rohan Garg	56adf42110	intel/brw: lower math op regions for Xe2+ This helps fix: - dEQP-VK.spirv_assembly.instruction.graphics.float16.arithmetic_3.tan_frag - dEQP-VK.spirv_assembly.instruction.graphics.float16.arithmetic_2.tan_frag Signed-off-by: Rohan Garg <rohan.garg@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31218>	2024-09-24 09:58:28 +00:00
Kenneth Graunke	d630ff1f79	intel/brw: Disallow scalar byte to float conversions on DG2+ I haven't been able to find this restriction mentioned anywhere in the hardware documentation, but the simulator has code to reject this case as invalid, and it doesn't appear to work on hardware anymore. Having lower_regioning() handle this takes care of the issue so we don't have to worry about generating it in random places. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11489 Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30140>	2024-07-18 18:51:35 +00:00
Caio Oliveira	3670c24740	intel/brw: Replace uses of fs_reg with brw_reg And remove the fs_reg alias. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29791>	2024-07-03 02:53:19 +00:00
Caio Oliveira	d00329e821	intel/brw: Replace some fs_reg constructors with functions Create three helper functions for ATTR, UNIFORM and VGRF creation. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29791>	2024-07-03 02:53:18 +00:00
Francisco Jerez	8be9f00d84	intel/brw/xe2+: Lower 64-bit SHUFFLE and CLUSTER_BROADCAST. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28283>	2024-05-15 17:16:51 +00:00
Kenneth Graunke	545bb8fb6f	intel/brw: Replace type_sz and brw_reg_type_to_size with brw_type_size_* Both of these helpers do the same thing. We now have brw_type_size_bits and brw_type_size_bytes and can use whichever makes sense in that place. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>	2024-04-25 11:41:48 +00:00
Kenneth Graunke	007d891239	intel/brw: Use newer brw_type_is_* shorter names Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>	2024-04-25 11:41:48 +00:00
Kenneth Graunke	873fcdff38	intel/brw: Stop using long BRW_REGISTER_TYPE enum names s/BRW_REGISTER_TYPE/BRW_TYPE/g Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>	2024-04-25 11:41:48 +00:00
Francisco Jerez	217d412360	intel/fs/gfx20+: Implement sub-dword integer regioning restrictions. This patch introduces code to enforce the pages-long regioning restrictions introduced by Xe2 that apply to sub-dword integer datatypes (See BSpec page 56640). They impose a number of restrictions on what the regioning parameters of a source can be depending on the source and destination datatypes as well as the alignment of the destination. The tricky cases are when the destination stride is smaller than 32 bits and the source stride is at least 32 bits, since such cases require the destination and source offsets to be in agreement based on an equation determined by the source and destination strides. The second source of instructions with multiple sources is even more restricted, and due to the existence of hardware bug HSDES#16012383669 it basically requires the source data to be packed in the GRF if the destination stride isn't dword-aligned. In order to address those restrictions this patch leverages the existing infrastructure from brw_fs_lower_regioning.cpp. The same general approach can be used to handle this restriction we were using to handle restrictions of the floating-point pipeline in previous generations: Unsupported source regions are lowered by emitting an additional copy before the instruction that shuffles the data in a way that allows using a valid region in the original instruction. The main difficulty that wasn't encountered in previous platforms is that it is non-trivial to come up with a copy instruction that doesn't break the regioning restrictions itself, since on previous platforms we could just bitcast floating-point data and use integer copies in order to implement arbitrary regioning, which is unfortunately no longer a choice lacking a magic third pipeline able to do the regioning modes the integer pipeline is no longer able to do. 
The required_src_byte_stride() and required_src_byte_offset() helpers introduced here try to calculate parameters for both regions that avoid that situation, but it isn't always possible, and actually in some cases that involve the second source of ALU instructions a chain of multiple copy instructions will be required, so the lower_instruction() routine needs to be applied recursively to the instructions emitted to lower the original instruction. XXX - Allow more flexible regioning for the second source of an instruction if bug HSDES#16012383669 is fixed in a future hardware platform. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28698>	2024-04-22 18:02:07 -07:00
Ian Romanick	90e12ed843	intel/brw/xe2+: Only apply Wa 22016140776 to math instructions The check in has_invalid_src_region incorrectly omitted inst->is_math() from the condition. Fixes: `0e817ba548` ("intel/brw/xe2+: Implement Wa 22016140776") Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28821>	2024-04-18 23:03:57 +00:00
Ian Romanick	0e817ba548	intel/brw/xe2+: Implement Wa 22016140776 HF sources to math instructions cannot be scalar. This is very similar to an old Gfx6 restriction on POW, so let's fix it in a similar way. As an extra bit of safety, lower any occurrences that might slip through in brw_fs_lower_regioning. The primary change is to prevent copy propagation from violating the restriction. With that change, nothing should be able to generate these invalid source strides. The modification to fs_visitor::validate should detect potential problems sooner rather than later. Previous attempts to implement this Wa when emitting the math instruction (in brw_eu_emit.c gfx6_math) didn't work for several reasons. The lowering happens after the SWSB pass, so the scoreboarding was incorrect (thanks to Curro for finding that). In addition, the lowering happens after register allocation, so it's impossible to allocate a non-scalar register to expand the scalar value. Fixes 113 tests in the dEQP-VK.spirv_assembly.* group on LNL. v2: Add changes to brw_fs_lower_regioning. Suggested by Curro. Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28480>	2024-04-04 21:04:09 -07:00
Kenneth Graunke	a520c976a5	intel/brw: Drop dead CHV checks. This compiler no longer supports Cherryview. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28458>	2024-04-02 00:00:59 +00:00
Kenneth Graunke	7a24f29fbb	intel/brw: Fix lower_regioning for BROADCAST, MOV_INDIRECT on Q types For BROADCAST and MOV_INDIRECT, required_exec_type was returning brw_int_type(type_sz(t), false), which is an unsigned type. However, get_exec_type(inst) returns the original type for either Q or UQ. This meant that has_invalid_exec_type would detect a mismatch and trigger lowering. That lowering would insert new 64-bit MOVs, which would need to be lowered on platforms which don't support Q/UQ. Except, we already ran that lowering pass earlier. So, the unlowered Q/UQ MOVs would reach the software scoreboarding pass, and trigger failures in the inferred_exec_pipe() function, as no pipe is available to handle 64-bit integer operations. It turns out that we don't need the region lowering pass to do anything for these opcodes. The generator code for both BROADCAST and MOV_INDIRECT already handle decomposing Q/UQ operations into 32-bit MOVs when they're not supported. And, it also implicitly converts to integer types, even for floating point sources. The inferred_exec_pipe function already special cases them to note that they'll always be handled on the integer pipe, so that matches. Just drop the region lowering code for these opcodes. Cc: mesa-stable Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28458>	2024-04-02 00:00:59 +00:00
Ian Romanick	be4fa59a72	intel/brw: Clear write_accumulator flag when changing the destination If the destination was the accumulator but is no longer, having the flag set is not correct. On Xe2 this also causes a validation error. v2: Reword the comment to be more clear. Suggested by Jordan. Fixes: `efa4e4bc5f` ("intel/fs: Introduce regioning lowering pass.") Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28404>	2024-03-29 21:12:32 +00:00
Caio Oliveira	e1afffe7fa	intel/brw: Use hstride instead of stride for accumulator The `stride` field is not meant to be used by ARFs (like the accumulator), and is always 1. Use the `hstride` instead. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28064>	2024-03-09 18:26:24 +00:00
Caio Oliveira	8f3c52c1da	intel/brw: Remove MRF type Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>	2024-02-28 05:45:39 +00:00
Caio Oliveira	7ac5696157	intel/brw: Remove Gfx8- code from backend passes Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>	2024-02-28 05:45:38 +00:00
Caio Oliveira	06272d4883	intel/brw: Pull lower_regioning out of fs_visitor Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26887>	2024-02-26 20:54:25 +00:00
Ian Romanick	e666872c75	intel/compiler: Initial bits for DPAS instruction v2: Add brw_ir_performance.cpp and brw_fs_generator.cpp changes. Fix overlapping register allocation (via has_source_and_destination_hazard). Fix incorrect destination register file encoding. v3: Prevent lower_regioning from trying to "fix" DPAS sources. v4: Add instruction latency information for scheduling and perf estimates. v5: Remove all mention of DPASW. Suggested by Curro and Caio. Update the comment in fs_inst::has_source_and_destination_hazard. Suggested by Caio. v6: Add some comments near the src2 calculation in fs_inst::size_read. Suggested by Caio. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>	2023-12-29 20:24:16 -08:00
Francisco Jerez	ac4f598577	intel/fs/xe2+: Update regioning lowering offset alignment checks for Xe2 regs. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>	2023-09-20 17:19:36 -07:00
Francisco Jerez	37e280f28a	intel/fs: Lower unsupported regioning with non-trivial 2D regions on FIXED_GRFs. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>	2023-09-20 17:19:36 -07:00
Jordan Justen	fcb72ffd0c	intel/compiler/gfx12.5+: Lower 64-bit cluster_broadcast with 32-bit ops For MTL (verx10 == 125), float64 is supported, but int64 is not. Therefore we need to lower cluster broadcast using 32-bit int ops. For gfx12.5+ platforms that support int64, the register regions used by cluster broadcast aren't supported by the 64-bit pipeline. On MTL, dEQP-VK.subgroups.clustered._double and dEQP-VK.subgroups.clustered._dvec were failing to validate the compiled shader in debug mode, and reportedly gpu-hanging in release mode. With this change dEQP-VK.subgroups.clustered._double passed all 48 tests and dEQP-VK.subgroups.clustered._dvec passed all 140 tests on MTL. Rework: * Move from generator to brw_fs_lower_regioning.cpp. (Suggested by Francisco) * Apply to verx10 >= 125.. (Suggested by Francisco) Cc: 23.1 <mesa-stable> Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com> (v1) Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22569>	2023-04-20 11:41:10 -07:00
Paulo Zanoni	a099d6ae4d	intel: add devinfo->has_64bit_float_via_math_pipe Unusual hardware features that require special handling usually get a devinfo field, so do this for MTL's unordered DF types. This will guarantee that any platform based on MTL (thus inheriting from MTL_FEATURES) will automatically be handled in these special cases. v2: s/has_unordered_64bit_float/has_64bit_float_via_math_pipe/ (Curro). Reviewed-by: Francisco Jerez <currojerez@riseup.net> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20072>	2022-12-10 03:59:19 +00:00
Paulo Zanoni	df50add27e	intel/compiler: avoid 64bit SEL_EXEC on MTL On MTL, instructions with DF type are unordered, executed in the math pipe. This means that they require different SWSB dependency handling, and also that in some cases such as MOVs it's generally faster to simply use 2 smaller ordered moves than a single unordered MOV. One problem we have with the current code is that generate_code() is not setting the proper SWSB dependencies for the generated DF MOVs, causing some tests to fail. One solution would be to fix generate_code() by making it set the appropriate dependencies. This was the first patch I wrote. Another solution to this problem, pointed to us by Curro, is to change required_exec_type() so we use UD instructions instead of DF, just like we do with platforms that don't have 64 bit instructions, which means there won't be anything to fix in generate_code(). The second solution is what this patch implements. This fixes at least: - dEQP-VK.subgroups.arithmetic.framebuffer.subgroupmin_double_vertex Thanks to Francisco Jerez for all the major help provided with this problem. Credits-to: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20072>	2022-12-10 03:59:19 +00:00
Francisco Jerez	051887fbf3	intel/fs: Make the result of is_unordered() dependent on devinfo. Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20072>	2022-12-10 03:59:19 +00:00
José Roberto de Souza	daf0b67bc2	intel/compiler/fs: Fix compilation of shaders with SHADER_OPCODE_SHUFFLE of float64 type During the lower_regioning() optimization, required_exec_type() returns BRW_REGISTER_TYPE_UQ when processing SHADER_OPCODE_SHUFFLE instructions of type BRW_REGISTER_TYPE_DF, but MTL has float64 support and lacks int64 support, causing shader compilation to fail. To fix that we could make required_exec_type() return BRW_REGISTER_TYPE_DF in such a case, but the SHADER_OPCODE_SHUFFLE virtual instruction runs in the integer pipeline (inferred_exec_pipe()). So here we replace the has_64bit check with has_64bit_int; this properly handles older and newer cases by making this function return BRW_REGISTER_TYPE_UD. Then lower_exec_type() will take care of generating two 32-bit operations to accomplish the same. While at it, also drop the 'devinfo->verx10 == 70' check, as GFX7_FEATURES falls into the same category as MTL: it has float64 but no int64 support. Fixes at least these crucible tests: func.uniform-subgroup.exclusive.fadd64.q0 func.uniform-subgroup.exclusive.fmin64.q0 func.uniform-subgroup.exclusive.fmax64.q0 Reviewed-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Matt Turner <mattst88@gmail.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18577>	2022-09-14 19:32:43 +00:00
Iván Briano	81f97905c3	intel/compiler: make CLUSTER_BROADCAST always deal with integers This way we don't run afoul of regioning restrictions around floating point types. Cc: 22.0 <mesa-stable> Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15039>	2022-02-16 21:36:42 +00:00
Francisco Jerez	79fb7f9de8	intel/fs: Perform 64-bit CLUSTER_BROADCAST lowering in the lower_regioning pass. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14273>	2022-01-25 22:40:44 +00:00
Francisco Jerez	6c8782c135	intel/fs: Perform 64-bit SEL_EXEC lowering in the lower_regioning pass. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14273>	2022-01-25 22:40:44 +00:00
Francisco Jerez	9449b71bdd	intel/fs: Perform 64-bit SHUFFLE lowering in the lower_regioning pass. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14273>	2022-01-25 22:40:44 +00:00
Francisco Jerez	44e48751d2	intel/fs: Teach the lower_regioning pass how to split instructions of unsupported exec type. This adds some generic infrastructure that allows splitting any instruction into a number of instructions of a smaller legal execution type. This is meant to replace several instances of handcrafted 64bit type lowering done manually in the code generator, which is rather error-prone, prevents scheduling of the lowered instructions, and makes them invisible to the SWSB pass on Gfx12+ platforms, which will become especially problematic on Gfx12.5+ since the EUs introduce multiple asynchronous execution pipelines which the SWSB pass needs to be able to synchronize to one another, so it's critical for the real execution type of the instruction to be visible to the SWSB pass. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14273>	2022-01-25 22:40:44 +00:00
Francisco Jerez	539c879a6b	intel/fs: Move legal exec type calculation into helper function in lower_regioning pass. Right now the execution type lowering functionality of this pass assumes that an integer type of the original bit size is always acceptable, however we'll want more complex behavior than that in order to leverage this pass to automate the lowering of unsupported 64-bit operations into multiple 32-bit operations. In order to do that calculate the closest legal execution type from a new helper function, and take advantage of that function from the has_invalid_exec_type() helper, along the lines of other lower_regioning() helpers structured as a pair of has_invalid_foo() + required_foo() functions. This shouldn't have any functional changes. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14273>	2022-01-25 22:40:44 +00:00
Lionel Landwerlin	361b3fee3c	intel: move away from booleans to identify platforms v2: Drop changes around GFX_VERx10 == 75 (Luis) v3: Replace (GFX_VERx10 < 75 && devinfo->platform != INTEL_PLATFORM_BYT) by (devinfo->platform == INTEL_PLATFORM_IVB) Replace (devinfo->ver >= 5 || devinfo->platform == INTEL_PLATFORM_G4X) by (devinfo->verx10 >= 45) Replace (devinfo->platform != INTEL_PLATFORM_G4X) by (devinfo->verx10 != 45) v4: Fix crocus typo v5: Rebase v6: Add GFX3, ILK & I965 platforms (Jordan) Move ifdef to code expressions (Jordan) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12981>	2021-11-08 16:48:06 +00:00
Ian Romanick	38807ceeae	intel/fs: sel.cond writes the flags on Gfx4 and Gfx5 On Gfx4 and Gfx5, sel.l (for min) and sel.ge (for max) are implemented using separate cmpn and sel instructions. This lowering occurs in fs_visitor::lower_minmax which is called very, very late... a long, long time after the first calls to opt_cmod_propagation. As a result, conditional modifiers can be incorrectly propagated across sel.cond on those platforms. No tests were affected by this change, and I find that quite shocking. After just changing flags_written(), all of the atan tests started failing on ILK. That required the change in cmod_propagation (and the addition of the prop_across_into_sel_gfx5 unit test). Shader-db results for ILK and GM45 are below. I looked at a couple before and after shaders... and every case that I looked at had experienced incorrect cmod propagation. This affected a LOT of apps! Euro Truck Simulator 2, The Talos Principle, Serious Sam 3, Sanctum 2, Gang Beasts, and on and on... :( I discovered this bug while working on a couple new optimization passes. One of the passes attempts to remove condition modifiers that are never used. The pass made no progress except on ILK and GM45. After investigating a couple of the affected shaders, I noticed that the code in those shaders looked wrong... investigation led to this cause. v2: Trivial changes in the unit tests. v3: Fix type in comment in unit tests. Noticed by Jason and Priit. v4: Tweak handling of BRW_OPCODE_SEL special case. Suggested by Jason.
Fixes: `df1aec763e` ("i965/fs: Define methods to calculate the flag subset read or written by an fs_inst.") Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Tested-by: Dave Airlie <airlied@redhat.com> Iron Lake total instructions in shared programs: 8180493 -> 8181781 (0.02%) instructions in affected programs: 541796 -> 543084 (0.24%) helped: 28 HURT: 1158 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.35% max: 0.86% x̄: 0.53% x̃: 0.50% HURT stats (abs) min: 1 max: 3 x̄: 1.14 x̃: 1 HURT stats (rel) min: 0.12% max: 4.00% x̄: 0.37% x̃: 0.23% 95% mean confidence interval for instructions value: 1.06 1.11 95% mean confidence interval for instructions %-change: 0.31% 0.38% Instructions are HURT. total cycles in shared programs: 239420470 -> 239421690 (<.01%) cycles in affected programs: 2925992 -> 2927212 (0.04%) helped: 49 HURT: 157 helped stats (abs) min: 2 max: 284 x̄: 62.69 x̃: 70 helped stats (rel) min: 0.04% max: 6.20% x̄: 1.68% x̃: 1.96% HURT stats (abs) min: 2 max: 48 x̄: 27.34 x̃: 24 HURT stats (rel) min: 0.02% max: 2.91% x̄: 0.31% x̃: 0.20% 95% mean confidence interval for cycles value: -0.80 12.64 95% mean confidence interval for cycles %-change: -0.31% <.01% Inconclusive result (value mean confidence interval includes 0). GM45 total instructions in shared programs: 4985517 -> 4986207 (0.01%) instructions in affected programs: 306935 -> 307625 (0.22%) helped: 14 HURT: 625 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.35% max: 0.82% x̄: 0.52% x̃: 0.49% HURT stats (abs) min: 1 max: 3 x̄: 1.13 x̃: 1 HURT stats (rel) min: 0.12% max: 3.90% x̄: 0.34% x̃: 0.22% 95% mean confidence interval for instructions value: 1.04 1.12 95% mean confidence interval for instructions %-change: 0.29% 0.36% Instructions are HURT. 
total cycles in shared programs: 153827268 -> 153828052 (<.01%) cycles in affected programs: 1669290 -> 1670074 (0.05%) helped: 24 HURT: 84 helped stats (abs) min: 2 max: 232 x̄: 64.33 x̃: 67 helped stats (rel) min: 0.04% max: 4.62% x̄: 1.60% x̃: 1.94% HURT stats (abs) min: 2 max: 48 x̄: 27.71 x̃: 24 HURT stats (rel) min: 0.02% max: 2.66% x̄: 0.34% x̃: 0.14% 95% mean confidence interval for cycles value: -1.94 16.46 95% mean confidence interval for cycles %-change: -0.29% 0.11% Inconclusive result (value mean confidence interval includes 0). Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12191>	2021-08-11 13:09:20 -07:00
Lionel Landwerlin	474eaa25ad	intel/fs: make sure shuffle is lowered to supported types On XeHP there are restrictions on types of source and destinations with float types. As shuffle is implemented using MOV we need to make sure we lower it to supported types. This fixes tests like : dEQP-VK.subgroups.arithmetic.framebuffer.subgroupexclusivemax_vec4_vertex dEQP-VK.subgroups.arithmetic.framebuffer.subgroupexclusivemul_f16vec3_vertex Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Suggested-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10902>	2021-05-22 21:55:33 +00:00
Marcin Ślusarz	3340d5ee02	intel: simplify is_haswell checks, part 1 Generated with: files=`git grep is_haswell | cut -d: -f1 | sort | uniq` for file in $files; do cat $file | \ sed "s/devinfo->ver <= 7 && !devinfo->is_haswell/devinfo->verx10 <= 70/g" | \ sed "s/devinfo->ver >= 8 || devinfo->is_haswell/devinfo->verx10 >= 75/g" | \ sed "s/devinfo->is_haswell || devinfo->ver >= 8/devinfo->verx10 >= 75/g" | \ sed "s/devinfo.is_haswell || devinfo.ver >= 8/devinfo.verx10 >= 75/g" | \ sed "s/devinfo->ver > 7 || devinfo->is_haswell/devinfo->verx10 >= 75/g" | \ sed "s/devinfo->ver == 7 && !devinfo->is_haswell/devinfo->verx10 == 70/g" | \ sed "s/devinfo.ver == 7 && !devinfo.is_haswell/devinfo.verx10 == 70/g" | \ sed "s/devinfo->ver < 8 && !devinfo->is_haswell/devinfo->verx10 <= 70/g" | \ sed "s/device->info.ver == 7 && !device->info.is_haswell/device->info.verx10 == 70/g" \ > tmpXXX mv tmpXXX $file done Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com> Acked-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10810>	2021-05-17 09:46:45 +00:00
Anuj Phogat	4c535cbf99	intel: Fix alignment and line wrapping due to gen_device renaming Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10241>	2021-04-20 20:06:33 +00:00
Anuj Phogat	61e8636557	intel: Rename gen_device prefix to intel_device export SEARCH_PATH="src/intel src/gallium/drivers/iris src/mesa/drivers/dri/i965" grep -E "gen_device" -rIl $SEARCH_PATH | xargs sed -ie "s/gen_device/intel_device/g" Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10241>	2021-04-20 20:06:33 +00:00
Francisco Jerez	f3e5cd813a	intel/fs: Handle regioning restrictions of split FP/DP pipelines. The floating-point and double-precision FPU pipelines of XeHP platforms don't support arbitrary regioning modes, corresponding channels of sources and destination are required to be aligned to the same sub-register offset, similar to the restriction FP64 instructions had on CHV/BXT platforms. Most violations of this restriction can be fixed easily by teaching has_dst_aligned_region_restriction() about the change so the regioning lowering pass gets rid of any unsupported regioning. For cases where this is not sufficient (e.g. because a virtual instruction internally uses some regioning mode not supported by the floating-point pipeline) the regioning lowering pass is extended with an additional lower_exec_type() codepath that bit-casts sources and destination to an integer type whenever the execution type is not supported by the instruction. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10000>	2021-04-16 08:27:35 +00:00
Anuj Phogat	abe9a71a09	intel: Rename gen field in gen_device_info struct to ver Commands used to do the changes: export SEARCH_PATH="src/intel src/gallium/drivers/iris src/mesa/drivers/dri/i965" grep -E "info\)*(.|->)gen" -rIl $SEARCH_PATH | xargs sed -ie "s/info\()*\)\(\.\|->\)gen/info\1\2ver/g" Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9936>	2021-04-02 18:33:07 +00:00
Mark Janes	ee06e47c5b	intel/fs: Assert if lower_source_modifiers converts 32x16 to 32x32 multiplication Lowering source modifiers will convert a 16bit source to a 32bit value. In the case of integer multiplication, this will reverse previous lowering performed by lower_mul_dword_inst. Assert to prevent an illegal DxD operation (and GPU hang). Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2020-08-10 13:29:56 -07:00
Francisco Jerez	ab6d792986	intel/compiler: Pass detailed dependency classes to invalidate_analysis() Have fun reading through the whole back-end optimizer to verify whether I've missed any dependency flags -- Or alternatively, just trust that any mistake here will trigger an assertion failure during analysis pass validation if it ever poses a problem for the consistency of any of the analysis passes managed by the framework. Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>	2020-03-06 10:20:39 -08:00
Francisco Jerez	d966a6b4c4	intel/compiler: Introduce backend_shader method to propagate IR changes to analysis passes The invalidate_analysis() method knows what analysis passes there are in the back-end and calls their invalidate() method to report changes in the IR. For the moment it just calls invalidate_live_intervals() (which will eventually be fully replaced by this function) if anything changed. This makes all optimization passes invalidate DEPENDENCY_EVERYTHING, which is clearly far from ideal -- The dependency classes passed to invalidate_analysis() will be refined in a future commit. Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>	2020-03-06 10:20:32 -08:00
Jason Ekstrand	f4ef34f207	intel/fs: Add an UNDEF instruction to avoid excess live ranges With 8 and 16-bit types and anything where we have to use non-trivial strides registers to deal with restrictions, we end up with things that look like partial writes even though we don't care about any values in the register except those written by that instruction. This is particularly important when dealing with loops because liveness sees is_partial_write and the fact that an old version from a previous loop iteration may be valid at that point and extends all purely partially written values to the entire loop. This commit adds a new UNDEF instruction which does nothing (the generator doesn't emit anything) but which does a fake write to the register. This informs liveness that we don't care about any values before that point so it won't consider those registers to be falsely live. We can safely emit UNDEF instructions for all SSA values that come in from NIR and nearly all temporaries generated by various stages of the compiler. In particular, we need to insert UNDEF instructions when we handle region restrictions because the newly allocated registers are almost guaranteed to be partially written. No shader-db changes. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110432 Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-06-04 14:27:30 -05:00
Iago Toral Quiroga	0986199b31	intel/compiler: workaround for SIMD8 half-float MAD in gen8 Empirical testing shows that gen8 has a bug where MAD instructions with a half-float source starting at a non-zero offset fail to execute properly. This scenario usually happened in SIMD8 executions, where we used to pack vector components Y and W in the second half of SIMD registers (therefore, with a 16B offset). It looks like we are not currently doing this any more but this would handle the situation properly if we ever happen to produce code like this again. v2 (Jason): - Move this workaround to the lower_regioning pass as an additional case to has_invalid_src_region() - Do not apply the workaround if the stride of the source operand is 0, testing suggests the problem doesn't exist in that case. v3 (Jason): - We want offset % REG_SIZE > 0, not just offset > 0 - Use a helper to compute the offset Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> (v1)	2019-04-18 11:05:18 +02:00
Francisco Jerez	7f9f6263c1	intel/fs: Cap dst-aligned region stride to maximum representable hstride value. This is required in combination with the following commit, because otherwise if a source region with an extended 8+ stride is present in the instruction (which we're about to declare legal) we'll end up emitting code that attempts to write to such a region, even though strides greater than four are still illegal for the destination. Tested-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-21 14:07:25 -08:00
Francisco Jerez	c3c27762f7	intel/fs: Exclude control sources from execution type and region alignment calculations. Currently the execution type calculation will return a bogus value in cases like: mov_indirect(8) vgrf0:w, vgrf1:w, vgrf2:ud, 32u Which will be considered to have a 32-bit integer execution type even though the actual indirect move operation will be carried out with 16-bit precision. Similarly there's no need to apply the CHV/BXT double-precision region alignment restrictions to such control sources, since they aren't directly involved in the double-precision arithmetic operations emitted by these virtual instructions. Applying the CHV/BXT restrictions to control sources was expected to be harmless if mildly inefficient, but unfortunately it exposed problems at codegen level for virtual instructions (namely the SHUFFLE instruction used for the Vulkan 1.1 subgroup feature) that weren't prepared to accept control sources with an arbitrary strided region. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109328 Reported-by: Mark Janes <mark.a.janes@intel.com> Fixes: `efa4e4bc5f` "intel/fs: Introduce regioning lowering pass." Tested-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-21 14:07:25 -08:00

52 commits