fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-22 17:50:12 +01:00

Author	SHA1	Message	Date
Kenneth Graunke	e2d9ff8004	intel/brw: Handle scratch address swizzling of constants Pass in the nir_src and check if it's constant, handling it via CPU-side arithmetic instead of emitting instructions. While we can constant fold these via our optimization passes, we have to do opt_algebraic to fold the binary operation with constant sources into a MOV of an immediate, then opt_copy_propagation to put it in the next expression, and so on, until the entire expression is folded. This can take several iterations of the optimization loop, which is inefficient. For example, gfxbench5/aztec-ruins/normal/7 has load/store_scratch intrinsics with constant sources, and this patch removes a number of optimization passes according to INTEL_DEBUG=optimizer. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29624>	2024-06-08 02:18:54 -07:00
Kenneth Graunke	07745752d6	intel/brw: Skip fs_nir_setup_outputs for compute shaders There aren't any outputs, so there's no point to doing this work. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29624>	2024-06-08 02:18:54 -07:00
Kenneth Graunke	fa1564fb87	intel/brw: Recreate GS output registers after EmitVertex Geometry shaders write outputs multiple times, with EmitVertex() between them. The value of output variables becomes undefined after calling EmitVertex(), so we don't need to preserve those. This lets us recreate new registers after each EmitVertex(), assuming we aren't in control flow, allowing them to have separate live ranges. It also means that those registers are more likely to be written once, rather than having multiple writes, which can make optimization easier. This is pretty much a total hack, but it's helpful. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29624>	2024-06-08 02:18:51 -07:00
Rohan Garg	6f6da58315	intel/compiler: fix shuffle generation on LNL There doesn't seem to be a restriction on the mentioned data types on LNL anymore. Default to a maximum exec size of SIMD16. This patch fixes dEQP-VK.subgroups.shuffle.framebuffer.* on LNL Signed-off-by: Rohan Garg <rohan.garg@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29504>	2024-06-07 12:51:30 +00:00
Sagar Ghuge	2dba5d484b	intel/fs: Adjust destination register size for global atomic on Xe2+ For 16-bit data type, we are padding 16-bit and using 32-bit data type, so we need to account for the padded portion while calculating the size_written. Rework: (Rohan) - Drop unnecessary fs_builder instance Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29271>	2024-06-06 00:18:37 +00:00
Sagar Ghuge	55c7b24899	intel/fs: Adjust destination register size for untyped atomic on Xe2+ For 16-bit data type, we are padding 16-bit and using 32-bit data type, so we need to account for the padded portion while calculating the size_written. Rework: (Rohan) - Drop unnecessary fs_builder instance Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29271>	2024-06-06 00:18:37 +00:00
Jordan Justen	1fa84d34ef	intel/compiler: Don't set size written in brw_lower_logical_sends.cpp Rework: (Sagar) - Drop unused variable Suggested-by: Francisco Jerez <currojerez@riseup.net> Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29271>	2024-06-06 00:18:37 +00:00
Zach Battleman	ecfe8b0f75	intel/brw: update Wa_1805992985 to use workarounds mechanism Replaced two instances of checking version 11 with the new workaround mechanism. Reviewed-by: Mark Janes <markjanes@swizzler.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29560>	2024-06-05 23:45:33 +00:00
Zach Battleman	ddaa7c4221	intel/brw: update comment to accurately reflect intended behavior Removed mention of Wa_* when referencing an intended hardware behavior since version 12. This will prevent the erroneous usage of the `intel_needs_workaround` in the future. Reviewed-by: Mark Janes <markjanes@swizzler.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29559>	2024-06-05 23:23:30 +00:00
Iván Briano	1c6a6349b0	intel/brw: always read LAYER/VIEWPORT from the FS payload Following on https://gitlab.freedesktop.org/mesa/mesa/-/issues/9811 the restriction that kept us from using the payload values for non-mesh cases is gone, so just use the same codepath for everything. But since we have functions that correctly read those for all gens, use those instead of the broken hack we had until now. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9796 Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29448>	2024-06-05 21:52:51 +00:00
Iván Briano	3d071fe7db	intel/brw: add fetch_viewport_index function Like fetch_render_target_array_index(), it reads the values provided by the FS payload. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29448>	2024-06-05 21:52:51 +00:00
Sagar Ghuge	415c5ad989	intel/compiler: No need to re-type the destination register For 16-bit float case handling, intermediate destination register is already 32-bit wide, we don't have to retype it to 32-bit. Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29506>	2024-06-04 18:07:44 +00:00
Lionel Landwerlin	724bb7fa15	brw: better model READ_ARF_REG opcode This opcode gets translated to 2 ALU instructions with dependency ALU stall. This change reproduces the FS_OPCODE_PACK_HALF_2x16_SPLIT values which is another opcode that generates 2 instructions. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29446>	2024-05-31 20:22:27 +00:00
Lionel Landwerlin	ac03cefb28	brw: limit dependencies on SR register Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29446>	2024-05-31 20:22:27 +00:00
Lionel Landwerlin	d8b78924c5	brw: use a single virtual opcode to read ARF registers In `2c65d90bc8` I forgot to add the new SHADER_OPCODE_READ_MASK_REG opcode to the list of barrier instruction in the scheduler. Let's just use a single opcode for all ARF registers that need special scoreboarding and put the register as source (nicer for the debug output). Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `2c65d90bc8` ("intel/brw: ensure find_live_channel don't access arch register without sync") Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29446>	2024-05-31 20:22:27 +00:00
Ian Romanick	7b7e5cf5d4	nir/algebraic: intel/fs: Optimize some patterns before lowering 64-bit integers v2: Add some comments explaining some of the nuance of the shift optimizations. Fix a bug in the shift count calculation of the upper 32-bits. Move the @64 from the variable to the opcode. All suggested by Jordan. No shader-db changes on any Intel platform. fossil-db: Meteor Lake and DG2 had similar results. (Meteor Lake shown) Totals: Instrs: 154507026 -> 154506576 (-0.00%) Cycle count: 17436298868 -> 17436295016 (-0.00%) Max live registers: 32635309 -> 32635297 (-0.00%) Totals from 42 (0.01% of 632575) affected shaders: Instrs: 5616 -> 5166 (-8.01%) Cycle count: 133680 -> 129828 (-2.88%) Max live registers: 1158 -> 1146 (-1.04%) No fossil-db changes on any other Intel platform. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29148>	2024-05-31 09:13:23 -07:00
Ian Romanick	22095c60bc	nir/algebraic: Add nir_lower_int64_options::nir_lower_iadd3_64 This allows us to not generate 64-bit iadd3 on Intel but continue generating it for NVIDIA. No shader-db or fossil-db changes. v2: Add nir_lower_iadd3_64 flag so we can continue to generate 64-bit iadd3 on NVIDIA platforms. v3: s/bit_size == 64/s == 64/. This cut-and-paste bug prevented any of the optimizations from ever occurring. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29148>	2024-05-31 09:13:23 -07:00
Georg Lehmann	dcab408a6c	nir: remove unpack_half_flush_to_zero It doesn't make sense to have two sets of opcodes for this when all backends that support the flush_to_zero variant just rely on the global floating point mode anyway. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29433>	2024-05-31 09:46:35 +00:00
Jordan Justen	fbf5ea6b44	intel/dev: Silence INTEL_FORCE_PROBE warning for intel_clc Running intel_clc as part of the build doesn't need to issue this warning. Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29445>	2024-05-30 22:28:50 +00:00
Kenneth Graunke	fbe0f8d36d	intel/brw: Blockify convergent load_shared on Gfx11-12 as well Gfx11-12 can support SLM block loads via OWord Block Load messages (notably, the aligned version, not the unaligned version). A while back we deleted the SHADER_OPCODE_OWORD_BLOCK_READ opcode. Rather than bring it back, we continue using UNALIGNED_OWORD_BLOCK_READ for SLM block access (like we do for SSBOs) but switch it over to the aligned variant when lowering logical sends. We do ensure the alignment is at least 16B, however. This is ugly, but it's probably not worth bringing back a whole extra opcode for a legacy HDC block load quirk. References: BSpec 47652 and 1689 Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9960 Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29429>	2024-05-30 22:01:10 +00:00
José Roberto de Souza	f5f71bae02	intel: Move slm functions from brw_compiler.h to intel_compute_slm.c/h These functions were inlined in a header and duplicated between brw and elk. That would be reason enough to move them to a C file, but upcoming patches will add more code to support Xe2 platforms, which would cause more code to be inlined, duplicating even more code and increasing lib size. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28910>	2024-05-30 16:46:16 +00:00
Jordan Justen	4ffe1a9f9e	intel/brw: Fix SSBO/shared load offset register size for Xe2 Rework: * Ken: Reword commit message Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29273>	2024-05-28 18:45:49 +00:00
Jordan Justen	4bc4da01f4	intel/brw: Allow xe2 in brw_stage_has_packed_dispatch() Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29273>	2024-05-28 18:45:49 +00:00
Jordan Justen	739613ec70	intel/brw: Simplify enabling brw_fs_test_dispatch_packing Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29273>	2024-05-28 18:45:49 +00:00
Lionel Landwerlin	2c65d90bc8	intel/brw: ensure find_live_channel don't access arch register without sync Another architecture register that requires some care before reading. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `49ee3ae9e8` ("intel/compiler: Lower FIND_[LAST_]LIVE_CHANNEL in IR on Gfx8+") Tested-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29319>	2024-05-24 07:26:17 +00:00
Francisco Jerez	eebc4ec264	intel/brw/xe2+: Round up spill/unspill data size to nearest reg_size multiple. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28283>	2024-05-15 17:16:52 +00:00
Francisco Jerez	50daf161f4	intel/brw/xe2+: Lower 64-bit integer uadd_sat. Fixes failures of CTS tests that currently end up emitting 64-bit integer ADDs with saturation, which isn't supported by the hardware. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28283>	2024-05-15 17:16:52 +00:00
Francisco Jerez	4bb5b25e53	intel/xe2+: Enable native 64-bit integer arithmetic. Note that some previously-supported 64-bit integer operations have been removed from the hardware, so we need to instruct NIR to lower them. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28283>	2024-05-15 17:16:51 +00:00
Francisco Jerez	8be9f00d84	intel/brw/xe2+: Lower 64-bit SHUFFLE and CLUSTER_BROADCAST. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28283>	2024-05-15 17:16:51 +00:00
Francisco Jerez	6261f4d361	intel/brw/xe2+: Fix 64-bit subgroup scan intrinsics not to rely on SEL instructions. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28283>	2024-05-15 17:16:51 +00:00
Francisco Jerez	1bf93ee4ec	intel/brw/xe2+: Don't use SEL peephole on 64-bit moves. 64-bit SEL isn't supported by the INT pipeline on this platform. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28283>	2024-05-15 17:16:51 +00:00
Francisco Jerez	8f798cc911	intel/brw/xe2+: Fix indirect extended descriptor setup for scratch space. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28283>	2024-05-15 17:16:51 +00:00
Francisco Jerez	0d92ec44e5	intel/brw: Don't emit Z coordinate interpolation if CPS isn't in use. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28283>	2024-05-15 17:16:51 +00:00
Francisco Jerez	7c129d9365	intel/brw/xe2+: Keep PS sample mask in the f1.0 register whether or not kill is used. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28283>	2024-05-15 17:16:51 +00:00
Rohan Garg	7668de019b	intel/eu/xe2+: Fix src1 length bits of SEND instruction with UGM target. Rework: * Francisco Jerez: Specify the src1 length value in the correct units. Don't break earlier platforms. Signed-off-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28283>	2024-05-15 17:16:51 +00:00
Lionel Landwerlin	5b76696861	intel/clc: enable printfs support Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25814>	2024-05-15 13:13:38 +00:00
Lionel Landwerlin	9a36278475	intel/nir: add printf lowering Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25814>	2024-05-15 13:13:38 +00:00
Lionel Landwerlin	6a8ff3b550	intel/compiler: store u_printf_info in prog_data So that the driver can decode the printf buffer. We're not going to use the NIR data directly from the driver (Iris/Anv) because the late compile steps might want to add more printfs. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25814>	2024-05-15 13:13:38 +00:00
Lionel Landwerlin	ecbec25e84	intel/nir: add reloc delta to load_reloc_const_intel intrinsic We'll use the delta for an upcoming internal printf mechanism, where the PARAM_IDX will be the base printf reloc identifier and the BASE will be the string id. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25814>	2024-05-15 13:13:38 +00:00
Lionel Landwerlin	dde91d18c2	intel/nir: remove unused prototypes Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25814>	2024-05-15 13:13:37 +00:00
Rohan Garg	aa9244c8f6	intel/brw: update Xe2 max SIMD message sizes All the non-transpose messages are SIMD 1,2,4,8,16,32 capable (BSpec 57330) Signed-off-by: Rohan Garg <rohan.garg@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29212>	2024-05-15 12:02:02 +00:00
Iván Briano	a9f24fb5f1	intel/brw: fix subgroup size of geometry stages for lnl+ Fixes dEQP-VK.subgroups.size_control.allow_varying_subgroup_size and maybe others checking subgroup size. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29177>	2024-05-14 23:13:37 +00:00
Ian Romanick	97e3c6a12a	intel/brw: Use range analysis to optimize fsign shader-db: Meteor Lake, DG2, and Tiger Lake had similar results. (Meteor Lake shown) total instructions in shared programs: 19674784 -> 19665960 (-0.04%) instructions in affected programs: 933425 -> 924601 (-0.95%) helped: 3656 / HURT: 0 total cycles in shared programs: 810343919 -> 810241030 (-0.01%) cycles in affected programs: 56752034 -> 56649145 (-0.18%) helped: 3032 / HURT: 434 LOST: 11 GAINED: 0 Ice Lake and Skylake had similar results. (Ice Lake shown) total instructions in shared programs: 20315795 -> 20305856 (-0.05%) instructions in affected programs: 979698 -> 969759 (-1.01%) helped: 3845 / HURT: 0 total cycles in shared programs: 830600281 -> 830534694 (<.01%) cycles in affected programs: 45675615 -> 45610028 (-0.14%) helped: 3250 / HURT: 325 total spills in shared programs: 4583 -> 4565 (-0.39%) spills in affected programs: 180 -> 162 (-10.00%) helped: 3 / HURT: 0 total fills in shared programs: 5245 -> 5219 (-0.50%) fills in affected programs: 379 -> 353 (-6.86%) helped: 3 / HURT: 0 LOST: 14 GAINED: 8 fossil-db: All Intel platforms except Tiger Lake had similar results. (Meteor Lake shown) Totals: Instrs: 154024263 -> 154023814 (-0.00%) Cycle count: 17463341602 -> 17461726239 (-0.01%); split: -0.01%, +0.00% Totals from 322 (0.05% of 631440) affected shaders: Instrs: 199933 -> 199484 (-0.22%) Cycle count: 168492537 -> 166877174 (-0.96%); split: -0.96%, +0.00% Tiger Lake Instrs: 149984723 -> 149984287 (-0.00%) Cycle count: 15238596937 -> 15239260415 (+0.00%); split: -0.00%, +0.01% Max dispatch width: 5553408 -> 5553424 (+0.00%) Totals from 318 (0.05% of 631414) affected shaders: Instrs: 179624 -> 179188 (-0.24%) Cycle count: 160724533 -> 161388011 (+0.41%); split: -0.06%, +0.48% Max dispatch width: 3296 -> 3312 (+0.49%) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29095>	2024-05-14 01:28:21 +00:00
Ian Romanick	e578657313	intel/brw: Implement more strictly correct fsign lowering The huge amount of helped shaders is due to the "~" versions of the patterns. shader-db: Meteor Lake and DG2 had similar results. (Meteor Lake shown) total instructions in shared programs: 19672345 -> 19662605 (-0.05%) instructions in affected programs: 1147766 -> 1138026 (-0.85%) helped: 2691 / HURT: 1650 total cycles in shared programs: 810323688 -> 810145191 (-0.02%) cycles in affected programs: 68918312 -> 68739815 (-0.26%) helped: 3651 / HURT: 1832 LOST: 29 GAINED: 38 Tiger Lake total instructions in shared programs: 19489619 -> 19479909 (-0.05%) instructions in affected programs: 1124564 -> 1114854 (-0.86%) helped: 2682 / HURT: 1643 total cycles in shared programs: 811468406 -> 811706747 (0.03%) cycles in affected programs: 66397690 -> 66636031 (0.36%) helped: 3692 / HURT: 1775 total spills in shared programs: 3906 -> 3907 (0.03%) spills in affected programs: 16 -> 17 (6.25%) helped: 0 / HURT: 1 total fills in shared programs: 3220 -> 3222 (0.06%) fills in affected programs: 50 -> 52 (4.00%) helped: 0 / HURT: 1 LOST: 33 GAINED: 36 Ice Lake and Skylake had similar results. (Ice Lake shown) total instructions in shared programs: 20317882 -> 20307495 (-0.05%) instructions in affected programs: 1199651 -> 1189264 (-0.87%) helped: 2863 / HURT: 1680 total cycles in shared programs: 830880024 -> 830457927 (-0.05%) cycles in affected programs: 63347102 -> 62925005 (-0.67%) helped: 4118 / HURT: 1622 total spills in shared programs: 4593 -> 4583 (-0.22%) spills in affected programs: 205 -> 195 (-4.88%) helped: 4 / HURT: 0 total fills in shared programs: 5284 -> 5245 (-0.74%) fills in affected programs: 464 -> 425 (-8.41%) helped: 4 / HURT: 0 LOST: 70 GAINED: 33 fossil-db: Meteor Lake and DG2 had similar results. 
(Meteor Lake shown) Totals: Instrs: 154025275 -> 154022035 (-0.00%); split: -0.00%, +0.00% Cycle count: 17472869499 -> 17463289530 (-0.05%); split: -0.06%, +0.00% Spill count: 141269 -> 141246 (-0.02%); split: -0.02%, +0.00% Fill count: 265342 -> 265159 (-0.07%); split: -0.11%, +0.04% Max live registers: 32597829 -> 32597986 (+0.00%); split: -0.00%, +0.00% Max dispatch width: 5536776 -> 5537048 (+0.00%) Totals from 1590 (0.25% of 631423) affected shaders: Instrs: 1146532 -> 1143292 (-0.28%); split: -0.44%, +0.16% Cycle count: 1230843330 -> 1221263361 (-0.78%); split: -0.83%, +0.05% Spill count: 15832 -> 15809 (-0.15%); split: -0.19%, +0.04% Fill count: 36071 -> 35888 (-0.51%); split: -0.79%, +0.29% Max live registers: 93529 -> 93686 (+0.17%); split: -0.00%, +0.17% Max dispatch width: 15168 -> 15440 (+1.79%) Tiger Lake, Ice Lake, and Skylake had similar results. (Tiger Lake shown) Totals: Instrs: 149564084 -> 149562467 (-0.00%); split: -0.00%, +0.00% Cycle count: 15151701515 -> 15158290114 (+0.04%); split: -0.00%, +0.04% Max live registers: 32249443 -> 32249620 (+0.00%); split: -0.00%, +0.00% Max dispatch width: 5540536 -> 5540488 (-0.00%) Totals from 1605 (0.25% of 630303) affected shaders: Instrs: 584950 -> 583333 (-0.28%); split: -0.49%, +0.21% Cycle count: 160926321 -> 167514920 (+4.09%); split: -0.05%, +4.14% Max live registers: 90851 -> 91028 (+0.19%); split: -0.00%, +0.20% Max dispatch width: 15440 -> 15392 (-0.31%) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29095>	2024-05-14 01:28:20 +00:00
Ian Romanick	864268ff0d	intel/brw: Algebraic optimizations for CSEL No shader-db or fossil-db changes on any Intel platform. In this MR, the only benefit of these changes is to convert some "-a > 0" CSEL comparisons to "a < 0" for improved readability. v2: Add integer CSEL support v3: Use fs_inst::resize_sources and brw_type_is_sint. Both suggested by Ken. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29095>	2024-05-14 01:28:20 +00:00
Ian Romanick	033405cd4b	intel/brw: Combine constants and constant propagation for CSEL No shader-db or fossil-db changes on any Intel platform. This ends up being helpful in "intel/brw: Use range analysis to optimize fsign." v2: Add integer CSEL support v3: Massive simplification (-20 lines!) of constant propagation logic. Suggested by Ken. Add missing CSEL case in supports_src_as_imm. Noticed by Ken. v4: While MAD can mix F and HF sources on some platforms, CSEL cannot. Found by skqp on TGL. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> [v3] Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29095>	2024-05-14 01:28:20 +00:00
Ian Romanick	504b742b83	intel/brw: Update CSEL source type validation Gfx9 can only have F, but newer GPUs can have F, HF, D, or W. The source and destination types must still match in size. v2: Simplify the float vs integer logic. Suggested by Ken. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29095>	2024-05-14 01:28:20 +00:00
Ian Romanick	3f151c03af	intel/brw: Handle fsign optimization in a NIR algebraic pass This is a lot less code, and it makes it easier to experiment with other pattern-based optimizations in the future. The results here are nearly identical to the results I got from Ken's "intel/brw: Make fsign (for 16/32-bit) in SSA form"... which are not particularly good. In this commit and in Ken's, all of the shader-db shaders hurt for spills and fills are from Deus Ex Mankind Divided. Each shader has a bunch of texture instructions with a single fsign between the blocks. With the dependency on the flag removed, the scheduler puts all of the texture instructions at the start... and there are a LOT of them. shader-db: All Intel platforms had similar results. (Meteor Lake shown) total instructions in shared programs: 19647060 -> 19650207 (0.02%) instructions in affected programs: 734718 -> 737865 (0.43%) helped: 382 / HURT: 1984 total cycles in shared programs: 823238442 -> 822785913 (-0.05%) cycles in affected programs: 426901157 -> 426448628 (-0.11%) helped: 3408 / HURT: 3671 total spills in shared programs: 3887 -> 3891 (0.10%) spills in affected programs: 256 -> 260 (1.56%) helped: 0 / HURT: 4 total fills in shared programs: 3236 -> 3306 (2.16%) fills in affected programs: 882 -> 952 (7.94%) helped: 0 / HURT: 12 LOST: 37 GAINED: 34 fossil-db: DG2 and Meteor Lake had similar results. 
(Meteor Lake shown) Totals: Instrs: 154005469 -> 154008294 (+0.00%); split: -0.00%, +0.00% Cycle count: 17551859277 -> 17554293955 (+0.01%); split: -0.02%, +0.04% Spill count: 142078 -> 142090 (+0.01%) Fill count: 266761 -> 266729 (-0.01%); split: -0.02%, +0.01% Max live registers: 32593578 -> 32593858 (+0.00%) Max dispatch width: 5535944 -> 5536816 (+0.02%); split: +0.02%, -0.01% Totals from 5867 (0.93% of 631350) affected shaders: Instrs: 5475544 -> 5478369 (+0.05%); split: -0.04%, +0.09% Cycle count: 1649032029 -> 1651466707 (+0.15%); split: -0.24%, +0.39% Spill count: 26411 -> 26423 (+0.05%) Fill count: 57364 -> 57332 (-0.06%); split: -0.10%, +0.04% Max live registers: 431561 -> 431841 (+0.06%) Max dispatch width: 49784 -> 50656 (+1.75%); split: +2.38%, -0.63% Tiger Lake Totals: Instrs: 149530671 -> 149533588 (+0.00%); split: -0.00%, +0.00% Cycle count: 15261418953 -> 15264764921 (+0.02%); split: -0.00%, +0.03% Spill count: 60317 -> 60316 (-0.00%); split: -0.02%, +0.01% Max live registers: 32249201 -> 32249464 (+0.00%) Max dispatch width: 5540608 -> 5540584 (-0.00%) Totals from 5862 (0.93% of 630309) affected shaders: Instrs: 4740800 -> 4743717 (+0.06%); split: -0.04%, +0.10% Cycle count: 566531248 -> 569877216 (+0.59%); split: -0.13%, +0.72% Spill count: 11709 -> 11708 (-0.01%); split: -0.09%, +0.08% Max live registers: 424560 -> 424823 (+0.06%) Max dispatch width: 50304 -> 50280 (-0.05%) Ice Lake Totals: Instrs: 150499705 -> 150502608 (+0.00%); split: -0.00%, +0.00% Cycle count: 15105629116 -> 15105425880 (-0.00%); split: -0.00%, +0.00% Spill count: 60087 -> 60090 (+0.00%) Fill count: 100542 -> 100541 (-0.00%); split: -0.00%, +0.00% Max live registers: 32605215 -> 32605495 (+0.00%) Max dispatch width: 5617752 -> 5617792 (+0.00%); split: +0.00%, -0.00% Totals from 5882 (0.93% of 634934) affected shaders: Instrs: 4737206 -> 4740109 (+0.06%); split: -0.04%, +0.10% Cycle count: 598882104 -> 598678868 (-0.03%); split: -0.08%, +0.05% Spill count: 10278 -> 10281 
(+0.03%) Fill count: 22504 -> 22503 (-0.00%); split: -0.01%, +0.01% Max live registers: 424184 -> 424464 (+0.07%) Max dispatch width: 50216 -> 50256 (+0.08%); split: +0.25%, -0.18% Skylake Totals: Instrs: 139092612 -> 139095257 (+0.00%); split: -0.00%, +0.00% Cycle count: 14533550285 -> 14533544716 (-0.00%); split: -0.00%, +0.00% Spill count: 58176 -> 58172 (-0.01%) Fill count: 95877 -> 95796 (-0.08%) Max live registers: 31924594 -> 31924874 (+0.00%) Max dispatch width: 5484568 -> 5484552 (-0.00%); split: +0.00%, -0.00% Totals from 5789 (0.93% of 625512) affected shaders: Instrs: 4481987 -> 4484632 (+0.06%); split: -0.04%, +0.10% Cycle count: 578310124 -> 578304555 (-0.00%); split: -0.05%, +0.05% Spill count: 9248 -> 9244 (-0.04%) Fill count: 19677 -> 19596 (-0.41%) Max live registers: 415340 -> 415620 (+0.07%) Max dispatch width: 49720 -> 49704 (-0.03%); split: +0.10%, -0.13% Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29095>	2024-05-14 01:28:20 +00:00
Ian Romanick	cd343fb9ac	intel/brw: Add support for fcsel opcodes Don't enable nir_opt_algebraic to generate these opcodes yet. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29095>	2024-05-14 01:28:20 +00:00
Ian Romanick	d51ad9f4e0	intel/brw: Use fs_inst::resize_sources in brw_fs_opt_algebraic Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29095>	2024-05-14 01:28:20 +00:00

3754 commits