fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-22 09:10:11 +01:00

Author	SHA1	Message	Date
Lionel Landwerlin	5227b2db73	brw: annotation send instructions with surface handles generated with exec_all Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29663>	2024-06-21 08:29:44 +00:00
Kenneth Graunke	061da9f748	intel/brw: Make brw_reg::bits publicly accessible from fs_reg I want to be able to hash an fs_reg, including all the brw_reg fields. It's easiest to do this if I can use the "bits" union field that incorporates many of the other ones. We also move the using declaration for "nr" down because that field was moved to the second section a while back. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29624>	2024-06-08 02:19:01 -07:00
Kenneth Graunke	545bb8fb6f	intel/brw: Replace type_sz and brw_reg_type_to_size with brw_type_size_* Both of these helpers do the same thing. We now have brw_type_size_bits and brw_type_size_bytes and can use whichever makes sense in that place. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>	2024-04-25 11:41:48 +00:00
Kenneth Graunke	007d891239	intel/brw: Use newer brw_type_is_* shorter names Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>	2024-04-25 11:41:48 +00:00
Kenneth Graunke	873fcdff38	intel/brw: Stop using long BRW_REGISTER_TYPE enum names s/BRW_REGISTER_TYPE/BRW_TYPE/g Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>	2024-04-25 11:41:48 +00:00
Francisco Jerez	62aab1437e	intel/fs/gfx20+: Handle subdword integer regioning restrictions in copy propagation. This makes sure that copy propagation doesn't undo the lowering of restricted sub-dword integer regions done by brw_fs_lower_regioning(). Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28698>	2024-04-22 18:02:32 -07:00
Francisco Jerez	217d412360	intel/fs/gfx20+: Implement sub-dword integer regioning restrictions. This patch introduces code to enforce the pages-long regioning restrictions introduced by Xe2 that apply to sub-dword integer datatypes (See BSpec page 56640). They impose a number of restrictions on what the regioning parameters of a source can be depending on the source and destination datatypes as well as the alignment of the destination. The tricky cases are when the destination stride is smaller than 32 bits and the source stride is at least 32 bits, since such cases require the destination and source offsets to be in agreement based on an equation determined by the source and destination strides. The second source of instructions with multiple sources is even more restricted, and due to the existence of hardware bug HSDES#16012383669 it basically requires the source data to be packed in the GRF if the destination stride isn't dword-aligned. In order to address those restrictions this patch leverages the existing infrastructure from brw_fs_lower_regioning.cpp. The same general approach can be used to handle this restriction we were using to handle restrictions of the floating-point pipeline in previous generations: Unsupported source regions are lowered by emitting an additional copy before the instruction that shuffles the data in a way that allows using a valid region in the original instruction. The main difficulty that wasn't encountered in previous platforms is that it is non-trivial to come up with a copy instruction that doesn't break the regioning restrictions itself, since on previous platforms we could just bitcast floating-point data and use integer copies in order to implement arbitrary regioning, which is unfortunately no longer a choice lacking a magic third pipeline able to do the regioning modes the integer pipeline is no longer able to do. The required_src_byte_stride() and required_src_byte_offset() helpers introduced here try to calculate parameters for both regions that avoid that situation, but it isn't always possible, and actually in some cases that involve the second source of ALU instructions a chain of multiple copy instructions will be required, so the lower_instruction() routine needs to be applied recursively to the instructions emitted to lower the original instruction. XXX - Allow more flexible regioning for the second source of an instruction if bug HSDES#16012383669 is fixed in a future hardware platform. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28698>	2024-04-22 18:02:07 -07:00
Kenneth Graunke	a520c976a5	intel/brw: Drop dead CHV checks. This compiler no longer supports Cherryview. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28458>	2024-04-02 00:00:59 +00:00
Caio Oliveira	d9e737212d	intel/brw: Add a src array for the common case in fs_inst In the common case, fs_inst will have up to 4 sources (the HW instructions have up to 3, and our representation of SENDs have 4). Embed such array into the fs_inst, and use it whenever applicable instead of allocating a new array. Also change the code to reuse the allocated src array when resizing to a smaller length. Between the changes above and the reduced amount of initializing fs_regs, this reduces fossil-db time by around 2% for Borderlands 3 and Rise of the Tomb Raider, and around 1.5% for Total War Warhammer 3. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28379>	2024-03-29 22:44:01 +00:00
Kenneth Graunke	816a33849a	intel/brw: Rearrange fs_inst fields For better packing, and to make all the small fields easier to hash and compare en masse. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28379>	2024-03-29 22:44:01 +00:00
Caio Oliveira	082735750b	intel/brw: Simplify usage of reg immediate helpers Use fs_reg and don't take the type as argument. In all uses the type passed is the type of the register. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27904>	2024-03-01 17:52:09 +00:00
Caio Oliveira	fb1d871714	intel/brw: Fold backend_reg into fs_reg Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27904>	2024-03-01 17:52:09 +00:00
Caio Oliveira	0f5f3fddd4	intel/brw: Fold backend_instruction into fs_inst Since we are touching it, change fs_inst to use struct instead of class so its forward declaration is C compatible. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27866>	2024-02-29 21:14:13 -08:00
Caio Oliveira	e5c5a983f7	intel/brw: Move functions from backend_instruction into fs_inst Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27866>	2024-02-29 21:14:13 -08:00
Caio Oliveira	634dff403f	intel/brw: Fold backend_shader into fs_visitor The base class was used when we had vec4, but now we can fold it with its only subclass. Declare fs_visitor now as a struct to be able to forward declare for C code without causing errors due to class/struct being mixed. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27861>	2024-02-29 19:28:05 +00:00
Caio Oliveira	8f3c52c1da	intel/brw: Remove MRF type Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>	2024-02-28 05:45:39 +00:00
Sagar Ghuge	b34b2bdff3	intel/compiler: Adjust sample_b parameter according to new layout On Xe2+, we need to pack LOD with array index for cube array surfaces, with that mlod parameter gets adjusted to different indices based on the layout. So track if we are packing LOD with array index in fs_inst and propogate that to sampler lowering code to adjust param location. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27447>	2024-02-27 00:22:46 +00:00
Caio Oliveira	c989ad09f1	intel/brw: Expose flag_mask/bit_mask fs helpers Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26887>	2024-02-26 20:54:25 +00:00
Ian Romanick	3cb9625539	intel/fs: Fix scoreboarding for DPAS v2: Remove all mention of DPASW. Suggested by Curro and Caio. Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>	2023-12-29 20:27:15 -08:00
Francisco Jerez	742a575bd6	intel/fs: Consider ATTR registers with different fs_reg::nr as belonging to disjoint register spaces. Instead of treating fs_reg::nr as an offset for ATTR registers simply consider different indices as denoting disjoint spaces that can never be accessed simultaneously by a single region. From now on geometry stages will just use ATTR #0 for everything and select specific attributes via offset() with the native dispatch width of the program, which should work on current platforms as well as on Xe2+. See "intel/fs: Map all GS input attributes to ATTR register number 0." for the rationale. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>	2023-12-22 18:05:30 +00:00
Francisco Jerez	ff3814abdd	intel/fs/xe2+: Handle extended math instructions as in-order in SWSB pass. Extended math instructions are now synchronized as in-order instructions like other ALU operations, which is more efficient than the out-of-order tracking we had to do in previous generations, and avoids false dependencies introduced due to SBID aliasing. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25514>	2023-11-08 23:17:12 -08:00
Caio Oliveira	40416850f1	intel/compiler: Re-enable opt_zero_samples() in many cases for Gfx12.5 The workaround applies specifically to Cube and Cube Arrays, so we can still apply the optimization for the others. Ideally we would like to pull opt_zero_samples logic into the lowering sends -- to avoid adding a bit to communicate between passes. However the texture coordinates for the LOGICAL backend instructions, which are a common target for the optimization, are combined into offsets over a single VGRF, so we can't easily identify the constant cases. The copy-prop pass make this more visible for opt_zero_samples. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25742>	2023-11-09 03:56:28 +00:00
Lionel Landwerlin	d33aff783d	intel/fs: add support for sparse accesses Purely from the backend point of view it's just an additional parameter to sampler messages. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23882>	2023-07-27 02:02:30 +03:00
Ian Romanick	78ee74de4a	intel/compiler: Micro optimize regions_overlap On my Ice Lake laptop (using a locked CPU speed and other measures to prevent thermal throttling, etc.) using a release build, improves performance of compiling shaders from batman_arkham_city_goty.foz by -1.09% ± 0.084% (n = 5, pooled s = 0.354471) Reduces the size of a release build by 26k. text data bss dec hex filename 23163641 400720 231360 23795721 16b1809 before/lib64/dri/iris_dri.so 23137264 400720 231360 23769344 16ab100 after/lib64/dri/iris_dri.so Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22299>	2023-04-06 19:07:50 +00:00
Paulo Zanoni	a099d6ae4d	intel: add devinfo->has_64bit_float_via_math_pipe Unusual hardware features that require special hanlding usually get a devinfo field, so do this for MTL's unordered DF types. This will guarantee that any platform based on MTL (thus inheriting from MTL_FEATURES) will automatically be handled in these special cases. v2: s/has_unordered_64bit_float/has_64bit_float_via_math_pipe/ (Curro). Reviewed-by: Francisco Jerez <currojerez@riseup.net> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20072>	2022-12-10 03:59:19 +00:00
Paulo Zanoni	16b9f87104	intel/compiler: on MTL, DF instructions run in the math pipe Adjust the scoreboard code to take that into account. Fixes at least: - dEQP-VK.glsl.builtin.precision_double.refract.compute.vec3 - dEQP-VK.glsl.builtin.precision_double.matrixcompmult.compute.mat4 v2: use intel_device_info_is_mtl() (Curro, Jordan) Reviewed-by: Francisco Jerez <currojerez@riseup.net> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20072>	2022-12-10 03:59:19 +00:00
Francisco Jerez	051887fbf3	intel/fs: Make the result of is_unordered() dependent on devinfo. Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20072>	2022-12-10 03:59:19 +00:00
LingMan	11f91505d9	intel/fs: Accept an unsigned int in fs_reg::fs_reg The parameter `nr` is currenlty an `int` but it only gets assigned to an `unsigned int`. Make it clear in the function signature what's actually required. Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19423>	2022-11-23 18:37:35 +00:00
Caio Oliveira	0a2cfa14dd	intel/compiler: Make component() work for FIXED_GRF/ARF Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18157>	2022-08-23 19:52:38 +00:00
Francisco Jerez	6f33b22495	intel/fs: Fix horiz_offset() to handle FIXED_GRFs with non-trivial 2D regions. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18157>	2022-08-23 19:52:38 +00:00
Kenneth Graunke	72e9843991	intel/compiler: Introduce a new brw_isa_info structure This structure will contain the opcode mapping tables in the next commit. For now, this is the mechanical change to plumb it into all the necessary places, and it continues simply holding devinfo. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17309>	2022-06-30 23:46:35 +00:00
Lionel Landwerlin	361b3fee3c	intel: move away from booleans to identify platforms v2: Drop changes around GFX_VERx10 == 75 (Luis) v3: Replace (GFX_VERx10 < 75 && devinfo->platform != INTEL_PLATFORM_BYT) by (devinfo->platform == INTEL_PLATFORM_IVB) Replace (devinfo->ver >= 5 \|\| devinfo->platform == INTEL_PLATFORM_G4X) by (devinfo->verx10 >= 45) Replace (devinfo->platform != INTEL_PLATFORM_G4X) by (devinfo->verx10 != 45) v4: Fix crocus typo v5: Rebase v6: Add GFX3, ILK & I965 platforms (Jordan) Move ifdef to code expressions (Jordan) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12981>	2021-11-08 16:48:06 +00:00
Ian Romanick	38807ceeae	intel/fs: sel.cond writes the flags on Gfx4 and Gfx5 On Gfx4 and Gfx5, sel.l (for min) and sel.ge (for max) are implemented using a separte cmpn and sel instruction. This lowering occurs in fs_vistor::lower_minmax which is called very, very late... a long, long time after the first calls to opt_cmod_propagation. As a result, conditional modifiers can be incorrectly propagated across sel.cond on those platforms. No tests were affected by this change, and I find that quite shocking. After just changing flags_written(), all of the atan tests started failing on ILK. That required the change in cmod_propagatin (and the addition of the prop_across_into_sel_gfx5 unit test). Shader-db results for ILK and GM45 are below. I looked at a couple before and after shaders... and every case that I looked at had experienced incorrect cmod propagation. This affected a LOT of apps! Euro Truck Simulator 2, The Talos Principle, Serious Sam 3, Sanctum 2, Gang Beasts, and on and on... :( I discovered this bug while working on a couple new optimization passes. One of the passes attempts to remove condition modifiers that are never used. The pass made no progress except on ILK and GM45. After investigating a couple of the affected shaders, I noticed that the code in those shaders looked wrong... investigation led to this cause. v2: Trivial changes in the unit tests. v3: Fix type in comment in unit tests. Noticed by Jason and Priit. v4: Tweak handling of BRW_OPCODE_SEL special case. Suggested by Jason. Fixes: `df1aec763e` ("i965/fs: Define methods to calculate the flag subset read or written by an fs_inst.") Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Tested-by: Dave Airlie <airlied@redhat.com> Iron Lake total instructions in shared programs: 8180493 -> 8181781 (0.02%) instructions in affected programs: 541796 -> 543084 (0.24%) helped: 28 HURT: 1158 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.35% max: 0.86% x̄: 0.53% x̃: 0.50% HURT stats (abs) min: 1 max: 3 x̄: 1.14 x̃: 1 HURT stats (rel) min: 0.12% max: 4.00% x̄: 0.37% x̃: 0.23% 95% mean confidence interval for instructions value: 1.06 1.11 95% mean confidence interval for instructions %-change: 0.31% 0.38% Instructions are HURT. total cycles in shared programs: 239420470 -> 239421690 (<.01%) cycles in affected programs: 2925992 -> 2927212 (0.04%) helped: 49 HURT: 157 helped stats (abs) min: 2 max: 284 x̄: 62.69 x̃: 70 helped stats (rel) min: 0.04% max: 6.20% x̄: 1.68% x̃: 1.96% HURT stats (abs) min: 2 max: 48 x̄: 27.34 x̃: 24 HURT stats (rel) min: 0.02% max: 2.91% x̄: 0.31% x̃: 0.20% 95% mean confidence interval for cycles value: -0.80 12.64 95% mean confidence interval for cycles %-change: -0.31% <.01% Inconclusive result (value mean confidence interval includes 0). GM45 total instructions in shared programs: 4985517 -> 4986207 (0.01%) instructions in affected programs: 306935 -> 307625 (0.22%) helped: 14 HURT: 625 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.35% max: 0.82% x̄: 0.52% x̃: 0.49% HURT stats (abs) min: 1 max: 3 x̄: 1.13 x̃: 1 HURT stats (rel) min: 0.12% max: 3.90% x̄: 0.34% x̃: 0.22% 95% mean confidence interval for instructions value: 1.04 1.12 95% mean confidence interval for instructions %-change: 0.29% 0.36% Instructions are HURT. total cycles in shared programs: 153827268 -> 153828052 (<.01%) cycles in affected programs: 1669290 -> 1670074 (0.05%) helped: 24 HURT: 84 helped stats (abs) min: 2 max: 232 x̄: 64.33 x̃: 67 helped stats (rel) min: 0.04% max: 4.62% x̄: 1.60% x̃: 1.94% HURT stats (abs) min: 2 max: 48 x̄: 27.71 x̃: 24 HURT stats (rel) min: 0.02% max: 2.66% x̄: 0.34% x̃: 0.14% 95% mean confidence interval for cycles value: -1.94 16.46 95% mean confidence interval for cycles %-change: -0.29% 0.11% Inconclusive result (value mean confidence interval includes 0). Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12191>	2021-08-11 13:09:20 -07:00
Lionel Landwerlin	f46aa1b9d7	intel/fs: use the final destination type for regioning restrictions This is most likely a rebase mistake :( Fixes: `f3e5cd813a` ("intel/fs: Handle regioning restrictions of split FP/DP pipelines.") Ref: `aa53665fda` ("intel/fs/copy_prop: check stride constraints with actual final type") Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10764>	2021-05-12 21:19:11 +00:00
Anuj Phogat	61e8636557	intel: Rename gen_device prefix to intel_device export SEARCH_PATH="src/intel src/gallium/drivers/iris src/mesa/drivers/dri/i965" grep -E "gen_device" -rIl $SEARCH_PATH \| xargs sed -ie "s/gen_device/intel_device/g" Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10241>	2021-04-20 20:06:33 +00:00
Francisco Jerez	f3e5cd813a	intel/fs: Handle regioning restrictions of split FP/DP pipelines. The floating-point and double-precision FPU pipelines of XeHP platforms don't support arbitrary regioning modes, corresponding channels of sources and destination are required to be aligned to the same sub-register offset, similar to the restriction FP64 instructions had on CHV/BXT platforms. Most violations of this restriction can be fixed easily by teaching has_dst_aligned_region_restriction() about the change so the regioning lowering pass gets rid of any unsupported regioning. For cases where this is not sufficient (e.g. because a virtual instruction internally uses some regioning mode not supported by the floating-point pipeline) the regioning lowering pass is extended with an additional lower_exec_type() codepath that bit-casts sources and destination to an integer type whenever the execution type is not supported by the instruction. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10000>	2021-04-16 08:27:35 +00:00
Lionel Landwerlin	aa53665fda	intel/fs/copy_prop: check stride constraints with actual final type In some cases we will change the type of the destination register of an instruction. This is the type we should use to verify that we're allow to do the replacement. Otherwise we can hit restrictions on CHV and upcoming Xe-Hp for instance where the copy propagation transforms this : send(16) (mlen: 2) vgrf10:UD, 0u, 0u, vgrf35:D, null:UD mov(16) vgrf11:UW, vgrf10<2>:UW mov(16) vgrf12:UW, vgrf10+0.2<2>:UW mov(16) vgrf15:HF, \|vgrf11\|:HF mov(16) vgrf16:HF, \|vgrf12\|:HF mov(8) vgrf41<2>:UW, vgrf15+0.0:UW group0 mov(8) vgrf42<2>:UW, vgrf15+0.16:UW group8 mov(8) vgrf45<2>:UW, vgrf16+0.0:UW group0 mov(8) vgrf46<2>:UW, vgrf16+0.16:UW group8 into this : send(16) (mlen: 2) vgrf10:UD, 0u, 0u, vgrf35:D, null:UD mov(8) vgrf41<2>:HF, \|vgrf10+0.0\|<2>:HF group0 mov(8) vgrf42<2>:HF, \|vgrf10+1.0\|<2>:HF group8 mov(8) vgrf45<2>:HF, \|vgrf10+0.2\|<2>:HF group0 mov(8) vgrf46<2>:HF, \|vgrf10+1.2\|<2>:HF group8 Because of the floating point use, stride and offets should be the same. v2: Fix final destination type selection (Curro) v3: constify (Curro) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: <mesa-stable@lists.freedesktop.org> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9832>	2021-03-29 22:14:45 +00:00
Jason Ekstrand	69a3559efd	intel/reg,fs: Handle immediates properly in subscript() Just returning the original type isn't what we want in basically any case. Mask and shift the immediate as needed. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7329>	2021-01-22 18:38:37 +00:00
Jason Ekstrand	4c8cbe9b13	intel/compiler: Return 1 for immediates in regs_read Previously, we were returning 2 whenever the source was a Q type. As far as I can tell, the only reason why this hasn't blown up before is that it was only ever used for VGRFs until the SWSB pass landed which uses it for everything. This wasn't a problem because Q types generally aren't a thing on TGL. However, they are for a small handful of instructions. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7329>	2021-01-22 18:38:37 +00:00
Francisco Jerez	bda1d72dd9	intel/fs: Replace fs_visitor::bank_conflict_cycles() with stand-alone function. This will be re-usable by the IR performance analysis pass. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-04-28 23:00:29 -07:00
Francisco Jerez	6310a05f68	intel/fs: Rename half() helpers to quarter(), allow index up to 3. Makes more sense considering SIMD32. Relaxing the assertion in brw_ir_fs.h will be required in order to avoid assertion failures on SNB with SIMD32 fragment shaders. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-04-28 23:00:29 -07:00
Dylan Baker	f8e4542bad	replace _mesa_logbase2 with util_logbase2 Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3024>	2020-04-21 11:09:03 -07:00
Francisco Jerez	ab0d1b3b3d	intel/fs: Rework fs_inst::is_copy_payload() into multiple classification helpers. This reworks the current fs_inst::is_copy_payload() method into a number of classification helpers with well-defined semantics. This will be useful later on in order to optimize LOAD_PAYLOAD instructions more aggressively in cases where we can determine it's safe to do so. The closest equivalent of the present fs_inst::is_copy_payload() method is the is_coalescing_payload() helper introduced here. No functional nor shader-db changes. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-01-17 13:21:19 -08:00
Francisco Jerez	c20dc9b836	intel/fs: Make implied_mrf_writes() an fs_inst method. This will be convenient in a later commit enabling SIMD32 fragment shaders, and happens to fix the calculation for MATH instructions which is currently inaccurate for SIMD-lowered instructions on Gen4-5 platforms (all of them on Gen4 in SIMD16 mode), since it was based on the shader's dispatch width rather than on the actual execution size of the instruction. This causes some shader-db noise on Gen4 due to the more compact register allocation interacting with the SEND dependency workarounds, but otherwise no major changes. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-01-10 11:02:30 -08:00
Francisco Jerez	e0b8d7953e	intel/fs/gen12: Add scheduling information to the IR. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-10-11 12:24:16 -07:00
Francisco Jerez	6f275a863d	intel/fs: Define is_send() convenience IR helper. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-10-11 12:24:16 -07:00
Francisco Jerez	f326d9d218	intel/fs: Define is_payload() method of the IR instruction class. This is required because SEND message payload sources are fetched asynchronously by the hardware, which can lead to WaR data corruption on Gen12+ platforms if not handled specially by the compiler to guarantee proper synchronization. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-10-11 12:24:16 -07:00
Juan A. Suarez Romero	b06ae53606	Revert "intel/compiler: split is_partial_write() into two variants" This reverts commit `40b3abb4d1`. It is not clear that this commit was entirely correct, and unfortunately it was pushed by error. CC: Jason Ekstrand <jason@jlekstrand.net> Acked-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-25 09:19:10 +02:00
Iago Toral Quiroga	40b3abb4d1	intel/compiler: split is_partial_write() into two variants This function is used in two different scenarios that for 32-bit instructions are the same, but for 16-bit instructions are not. One scenario is that in which we are working at a SIMD8 register level and we need to know if a register is fully defined or written. This is useful, for example, in the context of liveness analysis or register allocation, where we work with units of registers. The other scenario is that in which we want to know if an instruction is writing a full scalar component or just some subset of it. This is useful, for example, in the context of some optimization passes like copy propagation. For 32-bit instructions (or larger), a SIMD8 dispatch will always write at least a full SIMD8 register (32B) if the write is not partial. The function is_partial_write() checks this to determine if we have a partial write. However, when we deal with 16-bit instructions, that logic disables some optimizations that should be safe. For example, a SIMD8 16-bit MOV will only update half of a SIMD register, but it is still a complete write of the variable for a SIMD8 dispatch, so we should not prevent copy propagation in this scenario because we don't write all 32 bytes in the SIMD register or because the write starts at offset 16B (wehere we pack components Y or W of 16-bit vectors). This is a problem for SIMD8 executions (VS, TCS, TES, GS) of 16-bit instructions, which lose a number of optimizations because of this, most important of which is copy-propagation. This patch splits is_partial_write() into is_partial_reg_write(), which represents the current is_partial_write(), useful for things like liveness analysis, and is_partial_var_write(), which considers the dispatch size to check if we are writing a full variable (rather than a full register) to decide if the write is partial or not, which is what we really want in many optimization passes. Then the patch goes on and rewrites all uses of is_partial_write() to use one or the other version. Specifically, we use is_partial_var_write() in the following places: copy propagation, cmod propagation, common subexpression elimination, saturate propagation and sel peephole. Notice that the semantics of is_partial_var_write() exactly match the current implementation of is_partial_write() for anything that is 32-bit or larger, so no changes are expected for 32-bit instructions. Tested against ~5000 tests involving 16-bit instructions in CTS produced the following changes in instruction counts: Patched \| Master \| % \| ================================================ SIMD8 \| 621,900 \| 706,721 \| -12.00% \| ================================================ SIMD16 \| 93,252 \| 93,252 \| 0.00% \| ================================================ As expected, the change only affects SIMD8 dispatches. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2019-04-18 11:05:18 +02:00
Francisco Jerez	7272fe9c08	intel/fs: Rely on undocumented unrestricted regioning for 32x16-bit integer multiply. Even though the hardware spec claims that any "integer DWord multiply" operation is affected by the regioning restrictions of CHV/BXT/GLK, this is inconsistent with the behavior of the simulator and with empirical evidence -- Return false from has_dst_aligned_region_restriction() for such instructions as a micro-optimization. Tested-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-21 14:07:25 -08:00

1 2

64 commits