fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-21 20:10:14 +01:00

Author	SHA1	Message	Date
Kenneth Graunke	7c579f448f	intel/brw: Mark all UBO access with a direct buffer index as speculative UBO loads with a non-indirect buffer index should be safe to perform speculatively. With a direct offset, we may sometimes turn them into push constants, at which point it's just reading a register with no cost at all. Otherwise, we access them via messages that use surface state, and automatically perform bounds checking. So we shouldn't have any issues with reading out of bounds and page faulting, for example. This allows nir_opt_peephole_sel() to operate on load_ubo intrinsics, so we can turn simple if's with loads on both sides to bcsels. In some cases this can collapse a surprising amount of control flow, allowing other optimizations to work better. The i965 OpenGL driver used load_uniform intrinsics, which are allowed in NIR's peephole select pass. But iris uses the Gallium NIR pass that translates uniforms to loads from UBO 0, so we haven't been able to take advantage of NIR's peephole select pass there. The backend pass was still able to handle this to some extent, however. fossil-db results on Alchemist: Totals: Instrs: 150656329 -> 150645307 (-0.01%); split: -0.01%, +0.00% Cycles: 12635230179 -> 12633696811 (-0.01%); split: -0.02%, +0.00% Send messages: 7416330 -> 7416261 (-0.00%) Spill count: 52471 -> 52473 (+0.00%) Fill count: 100818 -> 100803 (-0.01%); split: -0.02%, +0.00% Scratch Memory Size: 3197952 -> 3198976 (+0.03%) Totals from 1848 (0.29% of 630003) affected shaders: Instrs: 1412300 -> 1401278 (-0.78%); split: -0.80%, +0.02% Cycles: 1809789567 -> 1808256199 (-0.08%); split: -0.11%, +0.03% Send messages: 59829 -> 59760 (-0.12%) Spill count: 3870 -> 3872 (+0.05%) Fill count: 9693 -> 9678 (-0.15%); split: -0.18%, +0.02% Scratch Memory Size: 174080 -> 175104 (+0.59%) Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30498>	2024-08-05 19:17:55 -07:00
Sushma Venkatesh Reddy	0116430d39	intel/brw: Handle 16-bit sampler return payloads API requires samplers to return 32-bit even though hardware can handle 16-bit floating point, so we detect that case and make more efficient use of memory BW. This is helping improve performance of encode and decode tokens during LLM by at least 5% across multiple platforms. Thank you Kenneth Graunke for suggesting and guiding me throughout this implementation. Signed-off-by: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30447>	2024-07-31 21:26:46 +00:00
Marek Olšák	b2d32ae246	nir: add nir_intrinsic_load_per_primitive_input, split from io_semantics flag Instead of having 1 bit in nir_io_semantics indicating a per-primitive FS input, add a dedicated intrinsic for it. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29895>	2024-07-23 16:13:16 +00:00
Qiang Yu	3151f5ec47	nir: add filter parameter to nir_lower_array_deref_of_vec To be used by later commits to limit the lowering to specific variables. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Qiang Yu <yuq825@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29799>	2024-07-03 02:06:56 +00:00
Francisco Jerez	e8007c9325	intel/fs/xe2+: Don't lower barycentric load offsets to fixed-point format on Xe2+. Floating-point offsets work fine in combination with the floating-point arithmetic we're about to lower these intrinsics into, and they require fewer instructions than converting to fixed-point and then back. No reason to take the precision/range hit nor the extra instructions. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29847>	2024-06-27 00:18:00 +00:00
Alyssa Rosenzweig	da752ed7c1	treewide: use nir_def_replace sometimes Two Coccinelle patches here. Didn't catch nearly as much as I would've liked but it's a start. Coccinelle patch: @@ expression intr, repl; @@ -nir_def_rewrite_uses(&intr->def, repl); -nir_instr_remove(&intr->instr); +nir_def_replace(&intr->def, repl); Coccinelle patch: @@ identifier intr; expression instr, repl; @@ nir_intrinsic_instr *intr = nir_instr_as_intrinsic(instr); ... -nir_def_rewrite_uses(&intr->def, repl); -nir_instr_remove(instr); +nir_def_replace(&intr->def, repl); Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com> [broadcom] Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com> [lima] Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com> [etna] Reviewed-by: Pavel Ondračka <pavel.ondracka@gmail.com> [r300] Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29817>	2024-06-21 15:36:56 +00:00
Alyssa Rosenzweig	15257b65c6	treewide: use nir_metadata_control_flow Via Coccinelle patch: @@ @@ -nir_metadata_block_index | nir_metadata_dominance +nir_metadata_control_flow ...plus some manual fixups for call sites missed by coccinelle. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Acked-by: Karol Herbst <kherbst@redhat.com> Acked-by: Juan A. Suarez Romero <jasuarez@igalia.com> [broadcom] Acked-by: Vasily Khoruzhick <anarsoul@gmail.com> [lima] Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29745>	2024-06-17 16:28:14 -04:00
Ian Romanick	7b7e5cf5d4	nir/algebraic: intel/fs: Optimize some patterns before lowering 64-bit integers v2: Add some comments explaining some of the nuance of the shift optimizations. Fix a bug in the shift count calculation of the upper 32-bits. Move the @64 from the variable to the opcode. All suggested by Jordan. No shader-db changes on any Intel platform. fossil-db: Meteor Lake and DG2 had similar results. (Meteor Lake shown) Totals: Instrs: 154507026 -> 154506576 (-0.00%) Cycle count: 17436298868 -> 17436295016 (-0.00%) Max live registers: 32635309 -> 32635297 (-0.00%) Totals from 42 (0.01% of 632575) affected shaders: Instrs: 5616 -> 5166 (-8.01%) Cycle count: 133680 -> 129828 (-2.88%) Max live registers: 1158 -> 1146 (-1.04%) No fossil-db changes on any other Intel platform. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29148>	2024-05-31 09:13:23 -07:00
Lionel Landwerlin	9a36278475	intel/nir: add printf lowering Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25814>	2024-05-15 13:13:38 +00:00
Ian Romanick	3f151c03af	intel/brw: Handle fsign optimization in a NIR algebraic pass This is a lot less code, and it makes it easier to experiment with other pattern-based optimizations in the future. The results here are nearly identical to the results I got from Ken's "intel/brw: Make fsign (for 16/32-bit) in SSA form"... which are not particularly good. In this commit and in Ken's, all of the shader-db shaders hurt for spills and fills are from Deus Ex Mankind Divided. Each shader has a bunch of texture instructions with a single fsign between the blocks. With the dependency on the flag removed, the scheduler puts all of the texture instructions at the start... and there are a LOT of them. shader-db: All Intel platforms had similar results. (Meteor Lake shown) total instructions in shared programs: 19647060 -> 19650207 (0.02%) instructions in affected programs: 734718 -> 737865 (0.43%) helped: 382 / HURT: 1984 total cycles in shared programs: 823238442 -> 822785913 (-0.05%) cycles in affected programs: 426901157 -> 426448628 (-0.11%) helped: 3408 / HURT: 3671 total spills in shared programs: 3887 -> 3891 (0.10%) spills in affected programs: 256 -> 260 (1.56%) helped: 0 / HURT: 4 total fills in shared programs: 3236 -> 3306 (2.16%) fills in affected programs: 882 -> 952 (7.94%) helped: 0 / HURT: 12 LOST: 37 GAINED: 34 fossil-db: DG2 and Meteor Lake had similar results. (Meteor Lake shown) Totals: Instrs: 154005469 -> 154008294 (+0.00%); split: -0.00%, +0.00% Cycle count: 17551859277 -> 17554293955 (+0.01%); split: -0.02%, +0.04% Spill count: 142078 -> 142090 (+0.01%) Fill count: 266761 -> 266729 (-0.01%); split: -0.02%, +0.01% Max live registers: 32593578 -> 32593858 (+0.00%) Max dispatch width: 5535944 -> 5536816 (+0.02%); split: +0.02%, -0.01% Totals from 5867 (0.93% of 631350) affected shaders: Instrs: 5475544 -> 5478369 (+0.05%); split: -0.04%, +0.09% Cycle count: 1649032029 -> 1651466707 (+0.15%); split: -0.24%, +0.39% Spill count: 26411 -> 26423 (+0.05%) Fill count: 57364 -> 57332 (-0.06%); split: -0.10%, +0.04% Max live registers: 431561 -> 431841 (+0.06%) Max dispatch width: 49784 -> 50656 (+1.75%); split: +2.38%, -0.63% Tiger Lake Totals: Instrs: 149530671 -> 149533588 (+0.00%); split: -0.00%, +0.00% Cycle count: 15261418953 -> 15264764921 (+0.02%); split: -0.00%, +0.03% Spill count: 60317 -> 60316 (-0.00%); split: -0.02%, +0.01% Max live registers: 32249201 -> 32249464 (+0.00%) Max dispatch width: 5540608 -> 5540584 (-0.00%) Totals from 5862 (0.93% of 630309) affected shaders: Instrs: 4740800 -> 4743717 (+0.06%); split: -0.04%, +0.10% Cycle count: 566531248 -> 569877216 (+0.59%); split: -0.13%, +0.72% Spill count: 11709 -> 11708 (-0.01%); split: -0.09%, +0.08% Max live registers: 424560 -> 424823 (+0.06%) Max dispatch width: 50304 -> 50280 (-0.05%) Ice Lake Totals: Instrs: 150499705 -> 150502608 (+0.00%); split: -0.00%, +0.00% Cycle count: 15105629116 -> 15105425880 (-0.00%); split: -0.00%, +0.00% Spill count: 60087 -> 60090 (+0.00%) Fill count: 100542 -> 100541 (-0.00%); split: -0.00%, +0.00% Max live registers: 32605215 -> 32605495 (+0.00%) Max dispatch width: 5617752 -> 5617792 (+0.00%); split: +0.00%, -0.00% Totals from 5882 (0.93% of 634934) affected shaders: Instrs: 4737206 -> 4740109 (+0.06%); split: -0.04%, +0.10% Cycle count: 598882104 -> 598678868 (-0.03%); split: -0.08%, +0.05% Spill count: 10278 -> 10281 (+0.03%) Fill count: 22504 -> 22503 (-0.00%); split: -0.01%, +0.01% Max live registers: 424184 -> 424464 (+0.07%) Max dispatch width: 50216 -> 50256 (+0.08%); split: +0.25%, -0.18% Skylake Totals: Instrs: 139092612 -> 139095257 (+0.00%); split: -0.00%, +0.00% Cycle count: 14533550285 -> 14533544716 (-0.00%); split: -0.00%, +0.00% Spill count: 58176 -> 58172 (-0.01%) Fill count: 95877 -> 95796 (-0.08%) Max live registers: 31924594 -> 31924874 (+0.00%) Max dispatch width: 5484568 -> 5484552 (-0.00%); split: +0.00%, -0.00% Totals from 5789 (0.93% of 625512) affected shaders: Instrs: 4481987 -> 4484632 (+0.06%); split: -0.04%, +0.10% Cycle count: 578310124 -> 578304555 (-0.00%); split: -0.05%, +0.05% Spill count: 9248 -> 9244 (-0.04%) Fill count: 19677 -> 19596 (-0.41%) Max live registers: 415340 -> 415620 (+0.07%) Max dispatch width: 49720 -> 49704 (-0.03%); split: +0.10%, -0.13% Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29095>	2024-05-14 01:28:20 +00:00
Kenneth Graunke	873fcdff38	intel/brw: Stop using long BRW_REGISTER_TYPE enum names s/BRW_REGISTER_TYPE/BRW_TYPE/g Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>	2024-04-25 11:41:48 +00:00
Ian Romanick	24cdbbdaa2	intel/brw: Delete stray nir_opt_dce No shader-db or fossil-db changes on any Intel platform. Fixes: `f76f4be301` ("intel/compiler: move gen5 final pass to actually be final pass") Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28136>	2024-04-04 23:42:27 +00:00
Ian Romanick	6377e8fd29	intel/brw: Don't call nir_opt_remove_phis before nir_convert_from_ssa Per discussion in #10727, removing phis breaks LCSSA form which in turn invalidates divergence analysis. shader-db: All Skylake and newer platforms had similar results. (Ice Lake shown) total instructions in shared programs: 20299612 -> 20299695 (<.01%) instructions in affected programs: 20829 -> 20912 (0.40%) helped: 6 / HURT: 13 total cycles in shared programs: 842149085 -> 842148399 (<.01%) cycles in affected programs: 15146222 -> 15145536 (<.01%) helped: 40 / HURT: 45 fossil-db: All Intel platforms had similar results. (Ice Lake shown) Totals: Instrs: 165505077 -> 165505603 (+0.00%); split: -0.00%, +0.00% Cycles: 15144183575 -> 15144235695 (+0.00%); split: -0.00%, +0.00% Spill count: 45213 -> 45220 (+0.02%) Fill count: 74166 -> 74184 (+0.02%) Totals from 94 (0.01% of 656116) affected shaders: Instrs: 263079 -> 263605 (+0.20%); split: -0.00%, +0.20% Cycles: 28411487 -> 28463607 (+0.18%); split: -0.18%, +0.37% Spill count: 3474 -> 3481 (+0.20%) Fill count: 6713 -> 6731 (+0.27%) Fixes: `6dbb5f1e07` ("intel/fs: rerun divergence analysis prior to convert_from_ssa") Closes: #10727 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28136>	2024-04-04 23:42:27 +00:00
Dylan Baker	75ede9d9bc	intel/brw: track last successful pass and leave the loop early This is similar to what RADV implements using the NIR_LOOP_PASS helpers. I have not used those helpers for a couple of reasons: 1. They use the pointer to the optimization function, which doesn't work if the same function is called multiple times in one invocation of the loop (fixable) 2. After fixing them, due to Intel's use of sub-expressions, the amount of code added to wrap the shared macro becomes more than simply reimplementing them for the Intel compiler On most workloads the results are a wash, but on compile heavy workloads like Cyberpunk 2077 and Rise of the Tomb Raider, I saw fossil-db runtimes fall by 1-2% on my ICL, with no changes to the compiled shaders. Caio saw closer to 2.5% on TGL. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27510>	2024-03-21 23:02:32 +00:00
Alyssa Rosenzweig	a6123a80da	nir/opt_shrink_vectors: shrink some intrinsics from start If the backend supports it, intrinsics with a component() are straightforward to shrink from the start. Notably helps vectorized I/O. v2: add an option for this and enable only on grown up backends, because some backends ignore the component() parameter. RADV GFX11: Totals from 921 (1.16% of 79439) affected shaders: Instrs: 616558 -> 615529 (-0.17%); split: -0.30%, +0.14% CodeSize: 3099864 -> 3095632 (-0.14%); split: -0.25%, +0.11% Latency: 2177075 -> 2160966 (-0.74%); split: -0.79%, +0.05% InvThroughput: 299997 -> 298664 (-0.44%); split: -0.47%, +0.02% VClause: 16343 -> 16395 (+0.32%); split: -0.01%, +0.32% SClause: 10715 -> 10714 (-0.01%) Copies: 24736 -> 24701 (-0.14%); split: -0.37%, +0.23% PreVGPRs: 30179 -> 30173 (-0.02%) VALU: 353472 -> 353439 (-0.01%); split: -0.03%, +0.02% SALU: 40323 -> 40322 (-0.00%) VMEM: 25353 -> 25352 (-0.00%) AGX: total instructions in shared programs: 2038217 -> 2038049 (<.01%) instructions in affected programs: 10249 -> 10081 (-1.64%) total alu in shared programs: 1593094 -> 1592939 (<.01%) alu in affected programs: 7145 -> 6990 (-2.17%) total fscib in shared programs: 1589254 -> 1589102 (<.01%) fscib in affected programs: 7217 -> 7065 (-2.11%) total bytes in shared programs: 13975666 -> 13974722 (<.01%) bytes in affected programs: 65942 -> 64998 (-1.43%) total regs in shared programs: 592758 -> 591187 (-0.27%) regs in affected programs: 6936 -> 5365 (-22.65%) Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> (v1) Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28004>	2024-03-12 18:17:17 +00:00
Caio Oliveira	865ef36609	intel/brw: Remove brw_shader.h Find a better home for its existing content. Some functions are now just static functions at the usage sites. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27861>	2024-02-29 19:28:06 +00:00
Kenneth Graunke	5fbba530cf	intel/brw: Delete compiler->supports_shader_constants True for all drivers using this compiler. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27872>	2024-02-29 18:00:14 +00:00
Caio Oliveira	63a4a4400a	intel/brw: Remove edgeflag_is_last VS parameter Suggested by Ken. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>	2024-02-28 05:45:39 +00:00
Caio Oliveira	5a3f65e678	intel/brw: Remove unused attrib workarounds Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>	2024-02-28 05:45:39 +00:00
Caio Oliveira	d3e451780b	intel/brw: Inline brw_nir_apply_sampler_key code It doesn't use the prog_key anymore, so just move the nir_lower_tex call pass to the single callsite. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>	2024-02-28 05:45:39 +00:00
Caio Oliveira	a1e694a890	intel/brw: Remove Gfx8- code from NIR passes Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>	2024-02-28 05:45:38 +00:00
Caio Oliveira	7c23b90537	intel/brw: Always use scalar shaders Remove scalar_stage[] array, since now it is always scalar. This removes any usage of vec4 shaders in brw. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>	2024-02-28 05:45:37 +00:00
Caio Oliveira	303fd4e935	intel/brw: Move type_size_* functions out of vec4-specific file This will make it easier to delete the vec4 files later. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>	2024-02-28 05:45:37 +00:00
Ian Romanick	535caaf3e0	nir: Optimize uniform iadd, fadd, and ixor reduction operations This adds optimizations for iadd, fadd, and ixor with reduce, inclusive scan, and exclusive scan. NOTE: The fadd and ixor optimizations had no shader-db or fossil-db changes on any Intel platform. NOTE 2: This change "fixes" arb_compute_variable_group_size-local-size and base-local-size.shader_test on DG2 and MTL. This is just changing the code path taken to not use whatever path was not working properly before. This is a subset of the things optimized by ACO. See also https://gitlab.freedesktop.org/mesa/mesa/-/issues/3731#note_682802. The min, max, iand, and ior exclusive_scan optimizations are not implemented. Broadwell on shader-db is not happy. I have not investigated. v2: Silence some warnings about discarding const. v3: Rename mbcnt to count_active_invocations. Add a big comment explaining the differences between the two paths. Suggested by Rhys. shader-db: All Gfx9 and newer platforms had similar results. (Ice Lake shown) total instructions in shared programs: 20300384 -> 20299545 (<.01%) instructions in affected programs: 19167 -> 18328 (-4.38%) helped: 35 / HURT: 0 total cycles in shared programs: 842809750 -> 842766381 (<.01%) cycles in affected programs: 2160249 -> 2116880 (-2.01%) helped: 33 / HURT: 2 total spills in shared programs: 4632 -> 4626 (-0.13%) spills in affected programs: 206 -> 200 (-2.91%) helped: 3 / HURT: 0 total fills in shared programs: 5594 -> 5581 (-0.23%) fills in affected programs: 664 -> 651 (-1.96%) helped: 3 / HURT: 1 fossil-db results: All Intel platforms had similar results. (Ice Lake shown) Totals: Instrs: 165551893 -> 165513303 (-0.02%) Cycles: 15132539132 -> 15125314947 (-0.05%); split: -0.05%, +0.00% Spill count: 45258 -> 45204 (-0.12%) Fill count: 74286 -> 74157 (-0.17%) Scratch Memory Size: 2467840 -> 2451456 (-0.66%) Totals from 712 (0.11% of 656120) affected shaders: Instrs: 598931 -> 560341 (-6.44%) Cycles: 184650167 -> 177425982 (-3.91%); split: -3.95%, +0.04% Spill count: 983 -> 929 (-5.49%) Fill count: 2274 -> 2145 (-5.67%) Scratch Memory Size: 52224 -> 35840 (-31.37%) Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27044>	2024-02-27 09:44:11 -08:00
Ian Romanick	c63ea755fe	intel/fs: Use nir_opt_uniform_subgroup shader-db: All Skylake and newer platforms had similar results. (Ice Lake shown) total instructions in shared programs: 20300435 -> 20300384 (<.01%) instructions in affected programs: 303 -> 252 (-16.83%) helped: 2 / HURT: 0 total cycles in shared programs: 842810326 -> 842809750 (<.01%) cycles in affected programs: 8374 -> 7798 (-6.88%) helped: 2 / HURT: 0 fossil-db: All Intel platforms (note below) had similar results. (Ice Lake shown) Instrs: 165559735 -> 165551893 (-0.00%) Cycles: 15133083961 -> 15132539132 (-0.00%); split: -0.00%, +0.00% Spill count: 45262 -> 45258 (-0.01%) Fill count: 74293 -> 74286 (-0.01%) Totals from 854 (0.13% of 656120) affected shaders: Instrs: 3461998 -> 3454156 (-0.23%) Cycles: 154252729 -> 153707900 (-0.35%); split: -0.36%, +0.01% Spill count: 2655 -> 2651 (-0.15%) Fill count: 3881 -> 3874 (-0.18%) DG2 did not see changes in spills or fills. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27044>	2024-02-27 08:38:45 -08:00
Ian Romanick	b22fff90d5	intel/fs: Enable nir_opt_uniform_atomics in all shader stages The problem seems to have been related to nir_intrinsic_load_global_block_intel being marked as non-divergent. No shader-db or fossil-db changes on any Intel platform. v2: Rebase on splitting ELK from BRW. Remove devinfo->ver >= 8 check. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27044>	2024-02-27 08:37:05 -08:00
Sagar Ghuge	269d2c4a3f	intel/compiler: Enable packing of offset with LOD or Bias Move intel_nir_lower_texture just before nir_lower_tex since we need to operate on the offset and those are getting lowered. v2: (Ian) - Rename variable name to intel_tex_options Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27447>	2024-02-27 00:22:46 +00:00
Caio Oliveira	d8f9a05f32	intel/compiler: Rename the passes and files related to intel_nir.h Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27644>	2024-02-16 22:35:05 +00:00
Caio Oliveira	dc76cfc781	intel/compiler: Collect NIR-only passes in intel_nir.h Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27644>	2024-02-16 22:35:05 +00:00
Caio Oliveira	c5b80de583	intel/compiler: Rename brw_vue_map to intel_vue_map And move to the intel_shader_enums.h file. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27475>	2024-02-14 22:31:23 -08:00
Lionel Landwerlin	2437556d83	intel/fs: rerun divergence prior to lowering non-uniform interpolate at sample Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `74a40cc4b6` ("intel/fs: move lower of non-uniform at_sample barycentric to NIR") Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26797>	2024-02-13 00:06:44 +00:00
Sagar Ghuge	98b62434bd	intel/compiler: Lower texture operation to combine LOD and AI We have to push the lowering of texture operations a bit further in pipeline since nir_lower_tex gets invoked twice and if there is no LOD source present, nir_lower_tex adds that as a source. Once that's all done we can easily combine the LOD and array index into a single 32-bit value. Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27458>	2024-02-12 21:25:48 +00:00
Sagar Ghuge	15129c7634	intel/compiler: Use nir_tex_src_backend1 to pack LOD and array index Since this lowering is totally Intel specific, we don't have to introduce the new texture source. We can use the nir_tex_src_backend1 source to pack LOD/LOD Bias and array index into a single 32-bit value. Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27458>	2024-02-12 21:25:48 +00:00
Ian Romanick	84de7a88d3	intel/compiler/xe2: Emit texture instructions w/ combined LOD and array index The extra assertions are just there to help validate pack_lod_and_array_index (in nir_lower_tex.c). v2: Split got_lod_or_bias into two variables. This simplifies some changes that Sagar is working on. Suggested by Sagar. Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27305>	2024-02-02 02:39:10 +00:00
Caio Oliveira	4af079960d	intel/compiler: Enable lower_rotate_to_shuffle in subgroup lowering Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27272>	2024-01-25 19:07:42 +00:00
Daniel Schürmann	a3ed36da1a	treewide: replace calls to nir_opt_trivial_continues() with nir_opt_loop() Totals from 850 (1.11% of 76636) affected shaders: (RADV, GFX11) MaxWaves: 18134 -> 18130 (-0.02%) Instrs: 3011298 -> 3008585 (-0.09%); split: -0.17%, +0.08% CodeSize: 15836804 -> 15841972 (+0.03%); split: -0.09%, +0.12% VGPRs: 63580 -> 63604 (+0.04%) SpillSGPRs: 966 -> 1148 (+18.84%); split: -0.83%, +19.67% Latency: 36102291 -> 30186144 (-16.39%); split: -16.41%, +0.02% InvThroughput: 9058100 -> 7011821 (-22.59%); split: -22.61%, +0.02% VClause: 65369 -> 65364 (-0.01%); split: -0.03%, +0.02% SClause: 100309 -> 100305 (-0.00%); split: -0.04%, +0.04% Copies: 335658 -> 336472 (+0.24%); split: -0.70%, +0.94% Branches: 110806 -> 108945 (-1.68%); split: -1.94%, +0.26% PreSGPRs: 73476 -> 73934 (+0.62%); split: -0.25%, +0.87% PreVGPRs: 58809 -> 58840 (+0.05%); split: -0.01%, +0.06% Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24940>	2024-01-03 20:48:04 +00:00
Dave Airlie	f76f4be301	intel/compiler: move gen5 final pass to actually be final pass This got broken by the register conversion, this pass needs to be after all the others. Fixes: `ce75c3c3fe` ("intel: Switch to intrinsic-based registers") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26731>	2023-12-18 07:24:37 +00:00
Lionel Landwerlin	6dbb5f1e07	intel/fs: rerun divergence analysis prior to convert_from_ssa Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9964 Cc: mesa-stable Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26235>	2023-11-17 06:40:49 +00:00
Rhys Perry	f695a9fed2	intel/compiler: use nir_lower_fp16_casts Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Ivan Briano <ivan.briano@intel.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25566>	2023-11-16 11:02:31 +00:00
Caio Oliveira	d2125dac85	intel/compiler: Take more precise params in brw_nir_optimize() Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25986>	2023-11-08 18:10:31 +00:00
Caio Oliveira	c4be90b4ba	intel/compiler: Remove unused parameter from brw_nir_adjust_payload() Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25986>	2023-11-08 18:10:31 +00:00
Iván Briano	54498937c5	intel/compiler: round f2f16 correctly for RTNE case v2: bcsel -> b2i32 (Ian) Fixes upcoming Vulkan CTS tests: dEQP-VK.spirv_assembly.instruction.compute.float_controls.fp16.input_args.rounding_rte_conv_from_fp64_up dEQP-VK.spirv_assembly.instruction.compute.float_controls.fp16.input_args.rounding_rte_conv_from_fp64_up_nostorage dEQP-VK.spirv_assembly.instruction.graphics.float_controls.fp16.input_args.rounding_rte_conv_from_fp64_up_vert dEQP-VK.spirv_assembly.instruction.graphics.float_controls.fp16.input_args.rounding_rte_conv_from_fp64_up_nostorage_vert dEQP-VK.spirv_assembly.instruction.graphics.float_controls.fp16.input_args.rounding_rte_conv_from_fp64_up_frag dEQP-VK.spirv_assembly.instruction.graphics.float_controls.fp16.input_args.rounding_rte_conv_from_fp64_up_nostorage_frag Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25281>	2023-10-09 23:37:52 +00:00
Connor Abbott	4282386311	nir/spirv: Add inverse_ballot intrinsic This is actually a no-op on AMD, so we really don't want to lower it to something more complicated. There may be a more efficient way to do this on Intel too. In addition, in the future we'll want to use this for lowering boolean reduce operations, where the inverse ballot will operate on the backend's "natural" ballot type as indicated by options->ballot_bit_size, instead of uvec4 as produced by SPIR-V. In total, there are now three possible lowerings we may have to perform: - inverse_ballot with source type of uvec4 from SPIR-V to inverse_ballot with natural source type, when the backend supports inverse_ballot natively. - inverse_ballot with source type of uvec4 from SPIR-V to arithmetic, when the backend doesn't support inverse_ballot. - inverse_ballot with natural source type from reduce operation, when the backend doesn't support inverse_ballot. Previously we just did the second lowering unconditionally in vtn, but it's just a combination of the first and third. We add support here for the first and third lowerings in nir_lower_subgroups, instead of simply moving the second lowering, to avoid unnecessary churn. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25123>	2023-09-20 14:41:18 +00:00
Pavel Ondračka	1c72c71bdf	nir/move_vec_src_uses_to_dest: allow to skip reuse of constant sources And enable this for r300 and intel-vec4. crocus HSW (mostly helps a few Dolphin ubershaders): total instructions in shared programs: 1576736 -> 1576589 (<.01%) instructions in affected programs: 38235 -> 38088 (-0.38%) helped: 12 HURT: 0 total cycles in shared programs: 111025838 -> 110944796 (-0.07%) cycles in affected programs: 5646582 -> 5565540 (-1.44%) helped: 15 HURT: 6 total spills in shared programs: 447 -> 432 (-3.36%) spills in affected programs: 186 -> 171 (-8.06%) helped: 12 HURT: 0 total fills in shared programs: 792 -> 774 (-2.27%) fills in affected programs: 291 -> 273 (-6.19%) helped: 12 HURT: 0 r300 RV530: total instructions in shared programs: 96655 -> 96304 (-0.36%) instructions in affected programs: 15020 -> 14669 (-2.34%) helped: 79 HURT: 18 total temps in shared programs: 13027 -> 12952 (-0.58%) temps in affected programs: 677 -> 602 (-11.08%) helped: 41 HURT: 9 total cycles in shared programs: 147745 -> 147314 (-0.29%) cycles in affected programs: 21831 -> 21400 (-1.97%) helped: 84 HURT: 19 r300 RV370: total instructions in shared programs: 63678 -> 63669 (-0.01%) instructions in affected programs: 931 -> 922 (-0.97%) helped: 12 HURT: 6 total temps in shared programs: 10028 -> 10013 (-0.15%) temps in affected programs: 339 -> 324 (-4.42%) helped: 33 HURT: 10 total cycles in shared programs: 101118 -> 101087 (-0.03%) cycles in affected programs: 2659 -> 2628 (-1.17%) helped: 22 HURT: 6 Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24932>	2023-09-19 18:05:37 +02:00
Alyssa Rosenzweig	d1eb17e92e	treewide: Drop nir_ssa_for_src users Via Coccinelle patch: @@ expression b, s, n; @@ -nir_ssa_for_src(b, *s, n) +s->ssa @@ expression b, s, n; @@ -nir_ssa_for_src(b, s, n) +s.ssa Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25247>	2023-09-18 10:25:17 -04:00
Ian Romanick	5eddf60e56	intel/compiler: Combine control barriers with identical memory semantics This prevents the second barrier from generating a spurious fence message identical to the one generated by the first barrier. fossil-db stats on Alchemist: Totals: Instrs: 196513342 -> 196512777 (-0.00%); split: -0.00%, +0.00% Cycles: 14271426028 -> 14271404569 (-0.00%); split: -0.00%, +0.00% Send messages: 8021892 -> 8021770 (-0.00%) Totals from 46 (0.01% of 653252) affected shaders: Instrs: 76761 -> 76196 (-0.74%); split: -0.75%, +0.01% Cycles: 2027946 -> 2006487 (-1.06%); split: -1.45%, +0.39% Send messages: 7589 -> 7467 (-1.61%) Nothing in shader-db was affected. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24842>	2023-09-09 04:41:25 +00:00
Lionel Landwerlin	10e75aae1b	intel/nir: rerun lower_tex if it lowers something nir_lower_tex can lower tg4 coords into tg4 offset which on DG2+ we also need to lower into constant offsets. Unfortunately the nir_lower_tex pass is not able to lower the instructions it itself generates, so the easy fix for when nir_lower_tex lowers tg4 coords into tg4 offsets is to rerun the pass. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9735 Cc: mesa-stable Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Tested-by: Yiwei Zhang <zzyiwei@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25015>	2023-09-05 13:35:51 +00:00
Lionel Landwerlin	74a40cc4b6	intel/fs: move lower of non-uniform at_sample barycentric to NIR We use a non-uniform lowering loop in the backend which we can do better in NIR because we can also use divergence analysis there. This change also limits VGRF usage to a single VGRF to hold the sample ID in the backend. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24716>	2023-08-29 23:19:13 +00:00
Alyssa Rosenzweig	cda1961835	treewide: Also handle struct nir_builder form Via Coccinelle patch: @def@ typedef bool; typedef nir_builder; typedef nir_instr; typedef nir_def; identifier fn, instr, intr, x, builder, data; @@ static fn(struct nir_builder* builder, -nir_instr instr, +nir_intrinsic_instr intr, ...) { ( - if (instr->type != nir_instr_type_intrinsic) - return false; - nir_intrinsic_instr intr = nir_instr_as_intrinsic(instr); \| - nir_intrinsic_instr intr = nir_instr_as_intrinsic(instr); - if (instr->type != nir_instr_type_intrinsic) - return false; ) <... ( -instr->x +intr->instr.x \| -instr +&intr->instr ) ...> } @pass depends on def@ identifier def.fn; expression shader, progress; @@ ( -nir_shader_instructions_pass(shader, fn, +nir_shader_intrinsics_pass(shader, fn, ...) \| -NIR_PASS_V(shader, nir_shader_instructions_pass, fn, +NIR_PASS_V(shader, nir_shader_intrinsics_pass, fn, ...) \| -NIR_PASS(progress, shader, nir_shader_instructions_pass, fn, +NIR_PASS(progress, shader, nir_shader_intrinsics_pass, fn, ...) ) Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24852>	2023-08-24 15:48:02 +00:00
Alyssa Rosenzweig	465b138f01	treewide: Use nir_shader_intrinsic_pass sometimes This converts a lot of trivial passes. Nice boilerplate deletion. Via Coccinelle patch (with a small manual fix-up for panfrost where coccinelle got confused by genxml + ninja clang-format squashed in, and for Zink because my semantic patch was slightly buggy). @def@ typedef bool; typedef nir_builder; typedef nir_instr; typedef nir_def; identifier fn, instr, intr, x, builder, data; @@ static fn(nir_builder* builder, -nir_instr instr, +nir_intrinsic_instr intr, ...) { ( - if (instr->type != nir_instr_type_intrinsic) - return false; - nir_intrinsic_instr intr = nir_instr_as_intrinsic(instr); \| - nir_intrinsic_instr intr = nir_instr_as_intrinsic(instr); - if (instr->type != nir_instr_type_intrinsic) - return false; ) <... ( -instr->x +intr->instr.x \| -instr +&intr->instr ) ...> } @pass depends on def@ identifier def.fn; expression shader, progress; @@ ( -nir_shader_instructions_pass(shader, fn, +nir_shader_intrinsics_pass(shader, fn, ...) \| -NIR_PASS_V(shader, nir_shader_instructions_pass, fn, +NIR_PASS_V(shader, nir_shader_intrinsics_pass, fn, ...) \| -NIR_PASS(progress, shader, nir_shader_instructions_pass, fn, +NIR_PASS(progress, shader, nir_shader_intrinsics_pass, fn, ...) ) Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24852>	2023-08-24 15:48:02 +00:00
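
The inverse_ballot commit (4282386311) listed above describes three lowerings, the second being a combination of the first and third. The following is a minimal standalone sketch of that relationship in plain C, not Mesa code: the type `uvec4_ballot` and all three function names are hypothetical, and it assumes a 64-bit "natural" ballot type and a subgroup of at most 64 invocations.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the uvec4 ballot produced by SPIR-V:
 * invocation i is represented by bit (i % 32) of word (i / 32). */
typedef struct {
    uint32_t w[4];
} uvec4_ballot;

/* Sketch of the first lowering: uvec4 ballot -> "natural" ballot type.
 * Assumes a 64-bit natural type, so only the two low words are used. */
static uint64_t ballot_to_natural64(uvec4_ballot b)
{
    return (uint64_t)b.w[0] | ((uint64_t)b.w[1] << 32);
}

/* Sketch of the third lowering: inverse_ballot on the natural type
 * expressed as plain arithmetic (test the invocation's own bit). */
static bool inverse_ballot_natural64(uint64_t ballot, unsigned invocation)
{
    return invocation < 64 && ((ballot >> invocation) & 1u) != 0;
}

/* The second lowering (uvec4 -> arithmetic) is the composition of the
 * two functions above. */
static bool inverse_ballot_uvec4(uvec4_ballot b, unsigned invocation)
{
    return inverse_ballot_natural64(ballot_to_natural64(b), invocation);
}
```

Under these assumptions, a backend with native inverse_ballot support would only need the conversion step, while one without it would need the bit test as well.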

367 commits