fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-21 15:50:11 +01:00

Author	SHA1	Message	Date
Caio Oliveira	3670c24740	intel/brw: Replace uses of fs_reg with brw_reg And remove the fs_reg alias. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29791>	2024-07-03 02:53:19 +00:00
Kenneth Graunke	9e750f00c3	intel/brw: Make opt_copy_propagation_defs clean up its own trash Copy propagation often eliminates all uses of an instruction. If we detect that we've done so, we can eliminate the instruction ourselves rather than leaving it hanging until the next DCE pass. This saves some CPU time as other passes don't see dead code. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28666>	2024-06-18 09:02:25 +00:00
Kenneth Graunke	580e1c592d	intel/brw: Introduce a new SSA-based copy propagation pass (Quite a few of the restrictions here are ported from the old pass.) Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28666>	2024-06-18 09:02:25 +00:00
Kenneth Graunke	3da444b79e	intel/brw: Refactor code to commute immediates into legal positions This will let us reuse this in a new pass shortly. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29624>	2024-06-08 02:19:12 -07:00
Kenneth Graunke	d45da713e7	intel/brw: Refactor try_constant_propagate() This will let us reuse the bulk of this code in a new copy propagation pass without replicating it. We retain a wrapper function for dealing with ACP entries, which the new pass won't have. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29624>	2024-06-08 02:19:10 -07:00
Kenneth Graunke	85aa6f80af	intel/brw: Drop BRW_OPCODE_IF from try_constant_propagate This was for Sandybridge's IF with embedded comparison, which only existed for a single generation of hardware. Since the compiler fork, we no longer support Sandybridge here. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29624>	2024-06-08 02:19:08 -07:00
Kenneth Graunke	7019bc4469	intel/brw: Drop compiler parameter from try_constant_propagate() This is unused. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29624>	2024-06-08 02:19:06 -07:00
Ian Romanick	033405cd4b	intel/brw: Combine constants and constant propagation for CSEL No shader-db or fossil-db changes on any Intel platform. This ends up being helpful in "intel/brw: Use range analysis to optimize fsign." v2: Add integer CSEL support v3: Massive simplification (-20 lines!) of constant propagation logic. Suggested by Ken. Add missing CSEL case in supports_src_as_imm. Noticed by Ken. v4: While MAD can mix F and HF sources on some platforms, CSEL cannot. Found by skqp on TGL. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> [v3] Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29095>	2024-05-14 01:28:20 +00:00
Kenneth Graunke	545bb8fb6f	intel/brw: Replace type_sz and brw_reg_type_to_size with brw_type_size_* Both of these helpers do the same thing. We now have brw_type_size_bits and brw_type_size_bytes and can use whichever makes sense in that place. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>	2024-04-25 11:41:48 +00:00
Kenneth Graunke	873fcdff38	intel/brw: Stop using long BRW_REGISTER_TYPE enum names s/BRW_REGISTER_TYPE/BRW_TYPE/g Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>	2024-04-25 11:41:48 +00:00
Francisco Jerez	62aab1437e	intel/fs/gfx20+: Handle subdword integer regioning restrictions in copy propagation. This makes sure that copy propagation doesn't undo the lowering of restricted sub-dword integer regions done by brw_fs_lower_regioning(). Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28698>	2024-04-22 18:02:32 -07:00
Ian Romanick	0e817ba548	intel/brw/xe2+: Implement Wa 22016140776 HF sources to math instructions cannot be scalar. This is very similar to an old Gfx6 restriction on POW, so let's fix it in a similar way. As an extra bit of safety, lower any occurrences that might slip through in brw_fs_lower_regioning. The primary change is to prevent copy propagation from violating the restriction. With that change, nothing should be able to generate these invalid source strides. The modification to fs_visitor::validate should detect potential problems sooner rather than later. Previous attempts to implement this Wa when emitting the math instruction (in brw_eu_emit.c gfx6_math) didn't work for several reasons. The lowering happens after the SWSB pass, so the scoreboarding was incorrect (thanks to Curro for finding that). In addition, the lowering happens after register allocation, so it's impossible to allocate a non-scalar register to expand the scalar value. Fixes 113 tests in the dEQP-VK.spirv_assembly.* group on LNL. v2: Add changes to brw_fs_lower_regioning. Suggested by Curro. Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28480>	2024-04-04 21:04:09 -07:00
Rohan Garg	a715512177	intel/brw: adjust the copy propagation pass to account for wider GRFs on Xe2+ Signed-off-by: Rohan Garg <rohan.garg@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27235>	2024-03-28 19:53:40 +00:00
Kenneth Graunke	c0c05c1041	intel/brw: Fix destination stride assertion in copy propagation We were asserting that entry->dst.offset % REG_SIZE == 0, which is easily tripped by a simple LOAD_PAYLOAD that writes a 16-bit vec2: load_payload(8) vgrf1:UW, vgrf2+0.0:UW, vgrf3+0.0:UW We create separate ACP entries corresponding to the values coming from vgrf2 and vgrf3, with entry->dst set to the location within vgrf1 where those sources get written to. So the second entry will have offset 16, which is not REG_SIZE aligned. It looks like this assert was originally added back in 2014 (see commit `1728e74957`) and adjusted through the ages, including at a point when we combined reg and subreg offsets into a single byte offset, and over time also extended copy propagation. Here the destination offset is already accounted for via rel_offset, at the byte offset level, so things ought to work and there is no need to assert that this is the case. Ian had already noted that the assert tripped in commit `e3f502e007`, but checking for inst->opcode == MOV here doesn't really make sense - it's just the case that he found that broke. Remove the erroneous assertion. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28067>	2024-03-27 04:52:17 +00:00
Ian Romanick	d9674cbe7d	intel/brw: Combine constants for src0 of POW instructions too I tried this when I was working on MR !7698, and it didn't have much effect back then. Maybe I've added more stuff to my fossil-db? Gfx12 platforms (Tiger Lake and DG2) are unaffected because the POW instruction was removed. shader-db: Ice Lake and Skylake had similar results. (Ice Lake shown) total instructions in shared programs: 20301933 -> 20301900 (<.01%) instructions in affected programs: 9077 -> 9044 (-0.36%) helped: 33 / HURT: 0 total cycles in shared programs: 842797624 -> 842799471 (<.01%) cycles in affected programs: 1361911 -> 1363758 (0.14%) helped: 35 / HURT: 111 LOST: 0 GAINED: 9 fossil-db: Ice Lake and Skylake had similar results. (Ice Lake shown) Totals: Instrs: 165510222 -> 165510163 (-0.00%) Cycles: 15125195835 -> 15125194484 (-0.00%); split: -0.00%, +0.00% Spill count: 45204 -> 45196 (-0.02%) Fill count: 74157 -> 74149 (-0.01%) Totals from 65 (0.01% of 656118) affected shaders: Instrs: 57426 -> 57367 (-0.10%) Cycles: 1667918 -> 1666567 (-0.08%); split: -0.11%, +0.03% Spill count: 137 -> 129 (-5.84%) Fill count: 515 -> 507 (-1.55%) Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27552>	2024-03-12 21:31:30 +00:00
Ian Romanick	e7480f94c1	intel/brw: Combine constants for src0 of integer multiply too The majority of cases that would have been affected by this actually had both sources as integer constants. The earlier commit "intel/rt: Don't directly generate umul_32x16" allowed those to be constant folded. v2: Move the a-1 block to be near the existing a-1 block. No shader-db changes on any Intel platform. fossil-db results: All Intel platforms had similar results. (Ice Lake shown) Totals: Instrs: 165510246 -> 165510222 (-0.00%) Cycles: 15125198238 -> 15125195835 (-0.00%); split: -0.00%, +0.00% Totals from 46 (0.01% of 656118) affected shaders: Instrs: 36010 -> 35986 (-0.07%) Cycles: 2613658 -> 2611255 (-0.09%); split: -0.17%, +0.07% Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27552>	2024-03-12 21:31:30 +00:00
Kenneth Graunke	f6ac6c94a9	intel/brw: Handle SHADER_OPCODE_SEND without src[3] in copy prop We construct some SENDs with only 3 sources (such as FB writes). This code could read out of bounds. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27876>	2024-03-05 11:39:26 +00:00
Kenneth Graunke	49606ab067	intel/brw: Avoid copy propagating any fixed registers into EOTs We were handling FIXED_GRF, but we probably also ought to handle ATTR (pushed inputs) and UNIFORM (pushed constants). Just check if file isn't VGRF to handle everything. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27876>	2024-03-05 11:39:26 +00:00
Kenneth Graunke	45a5e4c0c4	intel/brw: Delete SHADER_OPCODE_TXF_UMS Nothing seems to generate this anymore. I guess we always use CMS. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27908>	2024-03-01 22:19:51 +00:00
Kenneth Graunke	601ef12467	intel/brw: Delete SHADER_OPCODE_TXF_CMS[_LOGICAL] We always use the wide variant (_W) on hardware this compiler supports. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27908>	2024-03-01 22:19:50 +00:00
Caio Oliveira	082735750b	intel/brw: Simplify usage of reg immediate helpers Use fs_reg and don't take the type as argument. In all uses the type passed is the type of the register. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27904>	2024-03-01 17:52:09 +00:00
Caio Oliveira	8f3c52c1da	intel/brw: Remove MRF type Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>	2024-02-28 05:45:39 +00:00
Caio Oliveira	5c93a0e125	intel/brw: Remove Gfx8- remaining opcodes Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>	2024-02-28 05:45:39 +00:00
Caio Oliveira	7ac5696157	intel/brw: Remove Gfx8- code from backend passes Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>	2024-02-28 05:45:38 +00:00
Sagar Ghuge	6f0ab5e4d5	intel/compiler: Add texture gather offset LOD/Bias message support v2: (Ian) - Space formatting on conditional statement Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27447>	2024-02-27 00:22:46 +00:00
Sagar Ghuge	79af0ac29a	intel/compiler: Add gather4_i/l/[_c]/b sampler message v2: (Ian) - Format comment Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27447>	2024-02-27 00:22:46 +00:00
Caio Oliveira	6a3329a6c4	intel/brw: Pull opt_copy_propagation out of fs_visitor Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26887>	2024-02-26 20:54:24 +00:00
Caio Oliveira	b5cd91501d	intel/fs: Use linear allocator in opt_copy_propagation Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25670>	2024-01-04 23:06:07 +00:00
Caio Oliveira	6d2503e935	intel/fs: Only allocate acp_entry if we are adding one In practice it seems we always enter here; I haven't looked in detail at whether we could just assert at this point. But for now, only allocate a new acp_entry if we are going to add it. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25670>	2024-01-04 23:06:07 +00:00
Francisco Jerez	6bf99e6a45	intel/compiler: Don't change types for copies from ATTR file. Since the <8;8,0> regions they use in multipolygon mode could violate regioning restrictions in some cases, depending on the execution type of the instruction. Note that the assertion is removed from try_copy_propagate() since a more accurate check is used within that function than what fs_inst::can_change_types() can do. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>	2023-12-22 18:05:31 +00:00
Francisco Jerez	2ed36050fb	intel/fs: Don't copy-propagate ATTR registers in multi-polygon FS shaders when invalid. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>	2023-12-22 18:05:31 +00:00
Jordan Justen	3f89fa63e6	intel/compiler: Pass max_polygons to copy-prop from fs_visitor. Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>	2023-12-22 18:05:31 +00:00
Ian Romanick	92f5442489	intel/fs: Merge copy prop dataflow loops This is kept as a separate commit because the change looks like a lot more than it is. The order of the two loops is swapped, then the two loops are merged. Suggested-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25091>	2023-09-14 22:31:23 +00:00
Ian Romanick	fa2757aa97	intel/fs: Use rb_tree for copy prop dataflow Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25091>	2023-09-14 22:31:23 +00:00
Ian Romanick	35644bb483	intel/fs: Use rb_tree to store ACP entries by destination Using a single data structure seems better. There's no appreciable performance change. On batman_arkham_city_goty.foz, the difference reported was 0.48%±0.36% (n=20). Several commits in the MR, including some that should have no effect at all, reported similar changes. I attribute this primarily to changes in loop alignment and similar effects. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25091>	2023-09-14 22:31:23 +00:00
Ian Romanick	c28bf1a249	intel/fs: Use rb_tree to store ACP entries by source On batman_arkham_city_goty.foz, this improves fossil-db time by -3.83%±0.24% (n=20). This fossil takes the longest time of any in my database. v2: Add some comments for cmp_entry_src_entry_src and cmp_entry_src_nr. Suggested by Ken. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25091>	2023-09-14 22:31:23 +00:00
Ian Romanick	06bdd3eac0	intel/fs: Encapsulate per-block ACP in a structure This simplifies some later changes. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25091>	2023-09-14 22:31:23 +00:00
Ian Romanick	c262752d74	intel/fs: Make opt_copy_propagation_local file private This annoyed me during development of this MR. Every time I changed the parameters to this internal function, I had to modify a public header file... and trigger a much larger rebuild. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25091>	2023-09-14 22:31:23 +00:00
Ian Romanick	0946108298	intel/fs: Simplify check in can_propagate_from The larger predicate here already requires that inst->opcode must be BRW_OPCODE_MOV, so it can't be BRW_OPCODE_SEL. With that removed, the other simplifications are pretty straightforward. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25091>	2023-09-14 22:31:23 +00:00
Ian Romanick	1f15a0f8b2	intel/fs: Don't loop in try_constant_propagate The caller already loops over the sources. This means that the caller must loop over the sources in reverse because constant propagation prefers to propagate into the last sources first. The shader-db and fossil-db changes (below) are all due to SEL instructions. Changing the order sources are visited changes whether a SEL with two immediate sources is (+f0.0) sel g12 IMM_A IMM_B or (-f0.0) sel g12 IMM_B IMM_A The ordering of the sources affects the order in which constant combining encounters the values, and that determines which value is "combined" and which value remains an immediate. This affects the results by luck. If there are two instructions: (+f0.0) sel g12 IMM_A IMM_B (+f0.0) sel g13 IMM_A IMM_C Picking IMM_A is advantageous over picking IMM_B and IMM_C. Since the selection algorithm in constant combining is greedy, this case requires that the algorithm see the values in just the right order for the right thing to happen. v2: Rebase on many, many changes. Move instruction source fixup reordering out of try_constant_propagate. v3: Rebase on !7698. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25091>	2023-09-14 22:31:23 +00:00
Ian Romanick	ab23d89ade	intel/fs: Move src.file checks out of try_constant_propagate and try_copy_propagate Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25091>	2023-09-14 22:31:23 +00:00
Ian Romanick	b5b2338c5c	intel/fs: Make try_constant_propagate and try_copy_propagate file private This annoyed me during development of this MR. Every time I changed the parameters to this internal function, I had to modify a public header file... and trigger a much larger rebuild. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25091>	2023-09-14 22:31:22 +00:00
Ian Romanick	8665e37960	intel/fs: Don't try to copy propagate into a source again after progress is made If the linked list structure used depended on the list head to know when to terminate, this would be a pretty serious bug. If try_constant_propagate or try_copy_propagate make progress, inst->src[i].nr will change. This results in the foreach_in_list using a different list header on later iterations of the loop. This causes two shaders in shader-db and 9 shaders in fossil-db to change. Looking at the code changes, these are cases where there was a copy of a copy that gets propagated. The part that confuses me is the VGRF numbers involved should not hash to the same bucket, so it should be impossible to find the original source from the intermediate VGRF. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25091>	2023-09-14 22:31:22 +00:00
Ian Romanick	c506d7e511	intel/fs: Combine constants for integer instructions too v2: Remove type change for SHR with negation. This was a leftover from a previous attempt to deal with SHR and negation. Now all right-shifts with unsigned parameters are marked as not being able to have source modifiers. v3: Disallow negations on right shifts of unsigned sources by setting the no_negations flag in add_candidate_immediate. This eliminates the need to exclude SHR in can_do_source_mods. Tiger Lake total instructions in shared programs: 21102817 -> 21099443 (-0.02%) instructions in affected programs: 296796 -> 293422 (-1.14%) helped: 92 / HURT: 356 total cycles in shared programs: 790564691 -> 790393358 (-0.02%) cycles in affected programs: 36456886 -> 36285553 (-0.47%) helped: 171 / HURT: 286 total spills in shared programs: 3951 -> 3959 (0.20%) spills in affected programs: 176 -> 184 (4.55%) helped: 0 / HURT: 2 total fills in shared programs: 2631 -> 2639 (0.30%) fills in affected programs: 176 -> 184 (4.55%) helped: 0 / HURT: 2 LOST: 0 GAINED: 4 Ice Lake total instructions in shared programs: 19954204 -> 19949122 (-0.03%) instructions in affected programs: 40301 -> 35219 (-12.61%) helped: 23 / HURT: 2 total cycles in shared programs: 858377735 -> 858462082 (<.01%) cycles in affected programs: 75537286 -> 75621633 (0.11%) helped: 124 / HURT: 319 total spills in shared programs: 6255 -> 6190 (-1.04%) spills in affected programs: 392 -> 327 (-16.58%) helped: 1 / HURT: 2 total fills in shared programs: 7813 -> 7382 (-5.52%) fills in affected programs: 942 -> 511 (-45.75%) helped: 1 / HURT: 2 LOST: 0 GAINED: 3 Skylake total instructions in shared programs: 18049362 -> 18044440 (-0.03%) instructions in affected programs: 48317 -> 43395 (-10.19%) helped: 26 / HURT: 2 total cycles in shared programs: 844884806 -> 844915655 (<.01%) cycles in affected programs: 76137133 -> 76167982 (0.04%) helped: 171 / HURT: 293 total spills in shared programs: 6148 -> 6149 (0.02%) spills in 
affected programs: 595 -> 596 (0.17%) helped: 4 / HURT: 2 total fills in shared programs: 7484 -> 7067 (-5.57%) fills in affected programs: 1226 -> 809 (-34.01%) helped: 4 / HURT: 2 LOST: 0 GAINED: 8 Broadwell total instructions in shared programs: 17826844 -> 17821805 (-0.03%) instructions in affected programs: 60687 -> 55648 (-8.30%) helped: 28 / HURT: 8 total cycles in shared programs: 905332682 -> 904369499 (-0.11%) cycles in affected programs: 76743509 -> 75780326 (-1.26%) helped: 179 / HURT: 225 total spills in shared programs: 17922 -> 17908 (-0.08%) spills in affected programs: 2495 -> 2481 (-0.56%) helped: 6 / HURT: 8 total fills in shared programs: 26290 -> 25397 (-3.40%) fills in affected programs: 2606 -> 1713 (-34.27%) helped: 8 / HURT: 6 LOST: 1 GAINED: 1 Haswell total instructions in shared programs: 16678878 -> 16674444 (-0.03%) instructions in affected programs: 78458 -> 74024 (-5.65%) helped: 87 / HURT: 6 total cycles in shared programs: 880189381 -> 880301043 (0.01%) cycles in affected programs: 29956463 -> 30068125 (0.37%) helped: 169 / HURT: 163 total spills in shared programs: 14428 -> 14378 (-0.35%) spills in affected programs: 2384 -> 2334 (-2.10%) helped: 8 / HURT: 6 total fills in shared programs: 16975 -> 16881 (-0.55%) fills in affected programs: 1334 -> 1240 (-7.05%) helped: 10 / HURT: 4 Ivy Bridge total instructions in shared programs: 15706048 -> 15706035 (<.01%) instructions in affected programs: 9941 -> 9928 (-0.13%) helped: 13 / HURT: 0 total cycles in shared programs: 433618834 -> 433624637 (<.01%) cycles in affected programs: 12926714 -> 12932517 (0.04%) helped: 52 / HURT: 41 Sandy Bridge total cycles in shared programs: 741223552 -> 741223443 (<.01%) cycles in affected programs: 19814 -> 19705 (-0.55%) helped: 14 / HURT: 0 No changes on Iron Lake or GM45 fossil-db changes: Tiger Lake Instructions in all programs: 156858030 -> 156905532 (+0.0%) Instructions helped: 3915 Instructions hurt: 15411 Cycles in all programs: 7529667771 
-> 7532117340 (+0.0%) Cycles helped: 10260 Cycles hurt: 9990 Spills in all programs: 5610 -> 5457 (-2.7%) Spills helped: 18 Fills in all programs: 6274 -> 6091 (-2.9%) Fills helped: 18 Gained: 2 Lost: 16 Ice Lake Instructions in all programs: 141308082 -> 141303083 (-0.0%) Instructions helped: 574 Instructions hurt: 172 Cycles in all programs: 9091361325 -> 9094622766 (+0.0%) Cycles helped: 8764 Cycles hurt: 11702 Spills in all programs: 7531 -> 7385 (-1.9%) Spills helped: 19 Fills in all programs: 8462 -> 8294 (-2.0%) Fills helped: 19 Gained: 22 Lost: 15 Skylake Instructions in all programs: 131872162 -> 131867263 (-0.0%) Instructions helped: 566 Instructions hurt: 172 Cycles in all programs: 8795095440 -> 8799676943 (+0.1%) Cycles helped: 8333 Cycles hurt: 12182 Spills in all programs: 7006 -> 6884 (-1.7%) Spills helped: 13 Fills in all programs: 7696 -> 7552 (-1.9%) Fills helped: 13 Gained: 24 Lost: 1 Tested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7698>	2023-08-29 19:01:36 +00:00
Ian Romanick	64c251bb3a	intel/fs: Combine constants for SEL instructions too It is very common to have bcsel where the second and third sources are both constants. This results in a situation where we would want to emit a SEL with two constant sources, but that's not allowed. Previously, we would load both constants into registers, then let constant propagation copy the last constant into the SEL instruction. This results in the constant using an entire SIMD register instead of a single channel. Instead, copy propagate both sources, then let the combine-constants pass do its thing. In the worst case, this stores the constant in a single channel of the SIMD register. In the best case, it reuses a value that was loaded into a register to satisfy another instruction. shader-db results: Tiger Lake, Ice Lake, and Skylake had similar results. (Ice Lake shown) total instructions in shared programs: 19951549 -> 19948709 (-0.01%) instructions in affected programs: 482795 -> 479955 (-0.59%) helped: 1184 / HURT: 3 total cycles in shared programs: 858584724 -> 858205341 (-0.04%) cycles in affected programs: 356168375 -> 355788992 (-0.11%) helped: 1448 / HURT: 1195 total spills in shared programs: 6569 -> 6255 (-4.78%) spills in affected programs: 912 -> 598 (-34.43%) helped: 58 / HURT: 0 total fills in shared programs: 8218 -> 7813 (-4.93%) fills in affected programs: 1570 -> 1165 (-25.80%) helped: 58 / HURT: 0 LOST: 6 GAINED: 16 Broadwell total instructions in shared programs: 17819660 -> 17819389 (<.01%) instructions in affected programs: 1078129 -> 1077858 (-0.03%) helped: 1067 / HURT: 304 total cycles in shared programs: 904722624 -> 905035016 (0.03%) cycles in affected programs: 362583117 -> 362895509 (0.09%) helped: 1381 / HURT: 1123 total spills in shared programs: 17884 -> 17922 (0.21%) spills in affected programs: 5088 -> 5126 (0.75%) helped: 55 / HURT: 152 total fills in shared programs: 25533 -> 26290 (2.96%) fills in affected programs: 12992 -> 13749 (5.83%) 
helped: 61 /HURT: 295 LOST: 7 GAINED: 24 Haswell total instructions in shared programs: 16678080 -> 16673976 (-0.02%) instructions in affected programs: 1162893 -> 1158789 (-0.35%) helped: 1584 / HURT: 7 total cycles in shared programs: 880180082 -> 879932525 (-0.03%) cycles in affected programs: 364067522 -> 363819965 (-0.07%) helped: 1226 / HURT: 976 total spills in shared programs: 14937 -> 14428 (-3.41%) spills in affected programs: 7866 -> 7357 (-6.47%) helped: 351 / HURT: 5 total fills in shared programs: 17572 -> 16975 (-3.40%) fills in affected programs: 11028 -> 10431 (-5.41%) helped: 350 / HURT: 3 LOST: 8 GAINED: 16 Ivy Bridge total instructions in shared programs: 15704044 -> 15703158 (<.01%) instructions in affected programs: 304513 -> 303627 (-0.29%) helped: 707 / HURT: 0 total cycles in shared programs: 433560149 -> 433471118 (-0.02%) cycles in affected programs: 19299650 -> 19210619 (-0.46%) helped: 687 / HURT: 395 LOST: 2 GAINED: 9 Sandy Bridge total instructions in shared programs: 13913386 -> 13912884 (<.01%) instructions in affected programs: 195687 -> 195185 (-0.26%) helped: 455 / HURT: 0 total cycles in shared programs: 741156272 -> 741136266 (<.01%) cycles in affected programs: 10934349 -> 10914343 (-0.18%) helped: 578 / HURT: 289 LOST: 9 GAINED: 4 Iron Lake and GM45 had similar results. 
(Iron Lake shown) total instructions in shared programs: 8364056 -> 8364042 (<.01%) instructions in affected programs: 5178 -> 5164 (-0.27%) helped: 10 / HURT: 0 total cycles in shared programs: 248759794 -> 248757940 (<.01%) cycles in affected programs: 4305246 -> 4303392 (-0.04%) helped: 183 / HURT: 24 fossil-db results: Tiger Lake Instructions in all programs: 156943594 -> 156802601 (-0.1%) Instructions helped: 20595 Instructions hurt: 23248 Cycles in all programs: 7512086950 -> 7528386387 (+0.2%) Cycles helped: 29531 Cycles hurt: 27837 Spills in all programs: 13500 -> 5643 (-58.2%) Spills helped: 394 Spills hurt: 22 Fills in all programs: 18943 -> 6306 (-66.7%) Fills helped: 394 Fills hurt: 11 Gained: 93 Lost: 76 Ice Lake Instructions in all programs: 141395899 -> 141249621 (-0.1%) Instructions helped: 30067 Instructions hurt: 3 Cycles in all programs: 9097127057 -> 9089668235 (-0.1%) Cycles helped: 32268 Cycles hurt: 24315 Spills in all programs: 13695 -> 7564 (-44.8%) Spills helped: 403 Fills in all programs: 18400 -> 8494 (-53.8%) Fills helped: 403 Gained: 114 Lost: 137 Skylake Instructions in all programs: 131948328 -> 131826063 (-0.1%) Instructions helped: 29968 Instructions hurt: 3 Cycles in all programs: 8794778440 -> 8793934844 (-0.0%) Cycles helped: 32705 Cycles hurt: 23575 Spills in all programs: 10526 -> 7039 (-33.1%) Spills helped: 403 Fills in all programs: 11025 -> 7728 (-29.9%) Fills helped: 403 Gained: 102 Lost: 250 Tested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7698>	2023-08-29 19:01:36 +00:00
Ian Romanick	cb0de0a1d3	intel/fs: Constant fold OR and AND The path taken in fs_visitor::swizzle_nir_scratch_addr for DG2 generates some AND and OR instructions before the SHL. This commit folds those so the whole calculation becomes a constant (like on older platforms). v2: Fix return type of src_as_uint. Noticed by Marcin. shader-db results: DG2 total instructions in shared programs: 23190475 -> 23179540 (-0.05%) instructions in affected programs: 36026 -> 25091 (-30.35%) helped: 7 / HURT: 0 total cycles in shared programs: 841196807 -> 841142563 (<.01%) cycles in affected programs: 1660670 -> 1606426 (-3.27%) helped: 7 / HURT: 0 No shader-db changes on any older Intel platforms. fossil-db results: DG2 Totals: Instrs: 197780372 -> 197773966 (-0.00%) Cycles: 14066410782 -> 14066399378 (-0.00%); split: -0.00%, +0.00% Subgroup size: 8438104 -> 8438112 (+0.00%) Send messages: 8049445 -> 8049446 (+0.00%) Scratch Memory Size: 14263296 -> 14264320 (+0.01%) Totals from 9 (0.00% of 668055) affected shaders: Instrs: 24547 -> 18141 (-26.10%) Cycles: 1984791 -> 1973387 (-0.57%); split: -0.98%, +0.40% Subgroup size: 88 -> 96 (+9.09%) Send messages: 867 -> 868 (+0.12%) Scratch Memory Size: 69632 -> 70656 (+1.47%) No fossil-db changes on any older Intel platforms. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23884>	2023-07-25 22:11:21 +00:00
Ian Romanick	61c786bad5	intel/fs: Constant fold SHL This is a modified version of a commit originally in !7698. This version adds the changes to brw_fs_copy_propagation. If the address passed to fs_visitor::swizzle_nir_scratch_addr is a constant, that function will generate SHL with two constant sources. DG2 uses a different path to generate those addresses, so the constant folding can't occur there yet. That will be addressed in the next commit. What follows is the commit change history from that older MR. v2: Previously this commit was after `intel/fs: Combine constants for integer instructions too`. However, this commit can create invalid instructions that are only cleaned up by `intel/fs: Combine constants for integer instructions too`. That would potentially affect the shader-db results of each commit, but I did not collect new data for the reordering. v3: Fix masking for W/UW and for Q/UQ types. Add an assertion for !saturate. Both suggested by Ken. Also add an assertion that B/UB types don't magically come back. v4: Fix sources count. See also `ed3c2f73db` ("intel/fs: fixup sources number from opt_algebraic"). v5: Fix typo in comment added in v3. Noticed by Marcin. Fix a typo in a comment added when pulling this commit out of !7698. Noticed by Ken. shader-db results: DG2 No changes. Tiger Lake, Ice Lake, and Skylake had similar results (Ice Lake shown) total instructions in shared programs: 20655696 -> 20651648 (-0.02%) instructions in affected programs: 23125 -> 19077 (-17.50%) helped: 7 / HURT: 0 total cycles in shared programs: 858436639 -> 858407749 (<.01%) cycles in affected programs: 8990532 -> 8961642 (-0.32%) helped: 7 / HURT: 0 Broadwell and Haswell had similar results. 
(Broadwell shown) total instructions in shared programs: 18500780 -> 18496630 (-0.02%) instructions in affected programs: 24715 -> 20565 (-16.79%) helped: 7 / HURT: 0 total cycles in shared programs: 946100660 -> 946087688 (<.01%) cycles in affected programs: 5838252 -> 5825280 (-0.22%) helped: 7 / HURT: 0 total spills in shared programs: 17588 -> 17572 (-0.09%) spills in affected programs: 1206 -> 1190 (-1.33%) helped: 2 / HURT: 0 total fills in shared programs: 25192 -> 25156 (-0.14%) fills in affected programs: 156 -> 120 (-23.08%) helped: 2 / HURT: 0 No shader-db changes on any older Intel platforms. fossil-db results: DG2 Totals: Instrs: 197780415 -> 197780372 (-0.00%); split: -0.00%, +0.00% Cycles: 14066412266 -> 14066410782 (-0.00%); split: -0.00%, +0.00% Totals from 16 (0.00% of 668055) affected shaders: Instrs: 16420 -> 16377 (-0.26%); split: -0.43%, +0.17% Cycles: 220133 -> 218649 (-0.67%); split: -0.69%, +0.01% Tiger Lake, Ice Lake and Skylake had similar results. (Ice Lake shown) Totals: Instrs: 153425977 -> 153423678 (-0.00%) Cycles: 14747928947 -> 14747929547 (+0.00%); split: -0.00%, +0.00% Subgroup size: 8535968 -> 8535976 (+0.00%) Send messages: 7697606 -> 7697607 (+0.00%) Scratch Memory Size: 4380672 -> 4381696 (+0.02%) Totals from 6 (0.00% of 662749) affected shaders: Instrs: 13893 -> 11594 (-16.55%) Cycles: 5386074 -> 5386674 (+0.01%); split: -0.42%, +0.43% Subgroup size: 80 -> 88 (+10.00%) Send messages: 675 -> 676 (+0.15%) Scratch Memory Size: 91136 -> 92160 (+1.12%) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23884>	2023-07-25 22:11:21 +00:00
Ian Romanick	5336cbff3b	intel/fs: Constant propagate into SHADER_OPCODE_SHUFFLE Code already exists to convert SHADER_OPCODE_SHUFFLE into a simple MOV when either source is constant. However... the constants have to actually get into those sources! On a shader that I'm working on that multiplies very large matrices using lots of subgroup operations, -SIMD8 shader: 1378 instructions. 3 loops. 793896 cycles. 0:0 spills:fills, 23 sends, scheduled with mode non-lifo. Promoted 0 constants. Compacted 22048 to 21664 bytes (2%) +SIMD8 shader: 346 instructions. 3 loops. 61742 cycles. 0:0 spills:fills, 23 sends, scheduled with mode top-down. Promoted 0 constants. Compacted 5536 to 5216 bytes (6%) No changes in shader-db or fossil-db on any Intel platform. v2: Merge a bunch of identical cases. Suggested by Ken. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> [v1] Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23609>	2023-06-21 17:16:57 +00:00
Ian Romanick	7ef45e661f	intel/fs: Add constant propagation for ADD3 v2: Require that the constant value be representable as either uint16_t or int16_t. Suggested by Matt. v3: Remove redundant patterns. Noticed by Matt. shader-db: DG2 total instructions in shared programs: 23103767 -> 23103577 (<.01%) instructions in affected programs: 51822 -> 51632 (-0.37%) helped: 98 / HURT: 15 total cycles in shared programs: 842347714 -> 842380017 (<.01%) cycles in affected programs: 1942595 -> 1974898 (1.66%) helped: 97 / HURT: 32 Nearly all of the affected shaders (around 9,900) are shaders in Cyberpunk 2077. It's about an even split between vertex and fragment shaders. The majority of the remaining affected shaders (3,600) are from Strange Brigade. This was also a nearly even split between fragment and vertex. All but two of the lost shaders are SIMD32 fragment shaders in Cyberpunk 2077. The other two are SIMD32 fragment shaders in Dota2. fossil-db: DG2 Instructions in all programs: 196379107 -> 196248608 (-0.1%) helped: 13467 / HURT: 1210 Cycles in all programs: 13931355281 -> 13929955971 (-0.0%) helped: 11801 / HURT: 2922 Lost: 90 Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23262>	2023-06-06 06:10:53 +00:00
Ian Romanick	7873edee6e	intel/fs: Use specialized version of regions_overlap in opt_copy_propagation Since one of the registers must always be either VGRF or FIXED_GRF, much of regions_overlap and reg_offset can be elided. On my Ice Lake laptop (using a locked CPU speed and other measures to prevent thermal throttling, etc.) using a debugoptimized build, improves performance of Vulkan CTS "deqp-vk --deqp-case='dEQP-VK.spir'" by -0.29% ± 0.097% (n = 5, pooled s = 0.361697). Using a release build, improves performance of compiling shaders from batman_arkham_city_goty.foz by -3.3% ± 0.04% (n = 5, pooled s = 0.178312). Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22299>	2023-04-06 19:07:50 +00:00

97 commits