fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-22 17:50:12 +01:00

Author	SHA1	Message	Date
Lionel Landwerlin	9b42215e0d	iris: ensure null render target for specific cases Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: mesa-stable Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31196>	2024-09-23 15:56:02 +00:00
Kenneth Graunke	8a6903e50d	intel/brw: Rename lsc_aop_for_nir_intrinsic to "op" instead of "aop" This is going to handle more than atomics shortly. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30828>	2024-09-12 20:54:36 +00:00
Ian Romanick	119801e647	intel/brw: Move fsat instructions closer to the source Intel GPUs have a saturate destination modifier, and brw_fs_opt_saturate_propagation tries to replace explicit saturate operations with this destination modifier. That pass is limited in several ways. If the source of the explicit saturate is in a different block or if the source of the explicit saturate is live after the explicit saturate, brw_fs_opt_saturate_propagation will be unable to make progress. This optimization exists to help brw_fs_opt_saturate_propagation make more progress. It tries to move NIR fsat instructions to the same block that contains the definition of its source. It does this only in cases where it will not create additional live values. It also attempts to do this only in cases where the explicit saturate will ultimately be converted to a destination modifier. v2: Fix metadata_preserve when there's no progress and use nir_metadata_control_flow when there is progress. All suggested by Alyssa. v3: Fix a typo in the file header comment. Noticed by Ken. Don't require nir_metadata_instr_index. Use nir_def_rewrite_uses_after instead of open-coding something slightly more specific. Both suggested by Ken. shader-db: All Intel platforms had similar results. (Meteor Lake shown) total instructions in shared programs: 19733645 -> 19733028 (<.01%) instructions in affected programs: 193300 -> 192683 (-0.32%) helped: 246 HURT: 1 helped stats (abs) min: 2 max: 48 x̄: 2.51 x̃: 2 helped stats (rel) min: 0.18% max: 0.39% x̄: 0.33% x̃: 0.34% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.31% max: 0.31% x̄: 0.31% x̃: 0.31% 95% mean confidence interval for instructions value: -2.87 -2.13 95% mean confidence interval for instructions %-change: -0.34% -0.32% Instructions are helped. 
total cycles in shared programs: 916180971 -> 916264656 (<.01%) cycles in affected programs: 30197180 -> 30280865 (0.28%) helped: 194 HURT: 142 helped stats (abs) min: 1 max: 21251 x̄: 872.75 x̃: 19 helped stats (rel) min: <.01% max: 23.17% x̄: 2.59% x̃: 0.23% HURT stats (abs) min: 1 max: 28058 x̄: 1781.68 x̃: 399 HURT stats (rel) min: <.01% max: 37.21% x̄: 4.85% x̃: 1.63% 95% mean confidence interval for cycles value: -196.84 694.97 95% mean confidence interval for cycles %-change: -0.17% 1.27% Inconclusive result (value mean confidence interval includes 0). fossil-db: Meteor Lake, DG2, and Tiger Lake had similar results. (Meteor Lake shown) Totals: Instrs: 151512021 -> 151511351 (-0.00%); split: -0.00%, +0.00% Cycle count: 17209013596 -> 17209840995 (+0.00%); split: -0.02%, +0.02% Max live registers: 32013312 -> 32013549 (+0.00%) Max dispatch width: 5512304 -> 5512136 (-0.00%) Totals from 774 (0.12% of 630172) affected shaders: Instrs: 1559285 -> 1558615 (-0.04%); split: -0.05%, +0.01% Cycle count: 1312656268 -> 1313483667 (+0.06%); split: -0.24%, +0.30% Max live registers: 82195 -> 82432 (+0.29%) Max dispatch width: 6664 -> 6496 (-2.52%) Ice Lake Totals: Instrs: 151416791 -> 151416137 (-0.00%); split: -0.00%, +0.00% Cycle count: 15162468885 -> 15163298824 (+0.01%); split: -0.00%, +0.01% Max live registers: 32471367 -> 32471603 (+0.00%) Max dispatch width: 5623752 -> 5623712 (-0.00%) Totals from 733 (0.12% of 635598) affected shaders: Instrs: 877965 -> 877311 (-0.07%); split: -0.09%, +0.01% Cycle count: 190763628 -> 191593567 (+0.44%); split: -0.21%, +0.64% Max live registers: 72067 -> 72303 (+0.33%) Max dispatch width: 6216 -> 6176 (-0.64%) Skylake Totals: Instrs: 140794845 -> 140794075 (-0.00%); split: -0.00%, +0.00% Cycle count: 14665159301 -> 14665320514 (+0.00%); split: -0.00%, +0.01% Max live registers: 31783341 -> 31783662 (+0.00%); split: -0.00%, +0.00% Totals from 659 (0.11% of 625670) affected shaders: Instrs: 829061 -> 828291 (-0.09%); split: -0.09%, 
+0.00% Cycle count: 185478478 -> 185639691 (+0.09%); split: -0.33%, +0.41% Max live registers: 67491 -> 67812 (+0.48%); split: -0.01%, +0.48% Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29774>	2024-08-09 14:26:10 -07:00
Lionel Landwerlin	9a36278475	intel/nir: add printf lowering Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25814>	2024-05-15 13:13:38 +00:00
Lionel Landwerlin	dde91d18c2	intel/nir: remove unused prototypes Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25814>	2024-05-15 13:13:37 +00:00
Ian Romanick	3f151c03af	intel/brw: Handle fsign optimization in a NIR algebraic pass This is a lot less code, and it makes it easier to experiment with other pattern-based optimizations in the future. The results here are nearly identical to the results I got from Ken's "intel/brw: Make fsign (for 16/32-bit) in SSA form"... which are not particularly good. In this commit and in Ken's, all of the shader-db shaders hurt for spills and fills are from Deus Ex Mankind Divided. Each shader has a bunch of texture instructions with a single fsign between the blocks. With the dependency on the flag removed, the scheduler puts all of the texture instructions at the start... and there are a LOT of them. shader-db: All Intel platforms had similar results. (Meteor Lake shown) total instructions in shared programs: 19647060 -> 19650207 (0.02%) instructions in affected programs: 734718 -> 737865 (0.43%) helped: 382 / HURT: 1984 total cycles in shared programs: 823238442 -> 822785913 (-0.05%) cycles in affected programs: 426901157 -> 426448628 (-0.11%) helped: 3408 / HURT: 3671 total spills in shared programs: 3887 -> 3891 (0.10%) spills in affected programs: 256 -> 260 (1.56%) helped: 0 / HURT: 4 total fills in shared programs: 3236 -> 3306 (2.16%) fills in affected programs: 882 -> 952 (7.94%) helped: 0 / HURT: 12 LOST: 37 GAINED: 34 fossil-db: DG2 and Meteor Lake had similar results. 
(Meteor Lake shown) Totals: Instrs: 154005469 -> 154008294 (+0.00%); split: -0.00%, +0.00% Cycle count: 17551859277 -> 17554293955 (+0.01%); split: -0.02%, +0.04% Spill count: 142078 -> 142090 (+0.01%) Fill count: 266761 -> 266729 (-0.01%); split: -0.02%, +0.01% Max live registers: 32593578 -> 32593858 (+0.00%) Max dispatch width: 5535944 -> 5536816 (+0.02%); split: +0.02%, -0.01% Totals from 5867 (0.93% of 631350) affected shaders: Instrs: 5475544 -> 5478369 (+0.05%); split: -0.04%, +0.09% Cycle count: 1649032029 -> 1651466707 (+0.15%); split: -0.24%, +0.39% Spill count: 26411 -> 26423 (+0.05%) Fill count: 57364 -> 57332 (-0.06%); split: -0.10%, +0.04% Max live registers: 431561 -> 431841 (+0.06%) Max dispatch width: 49784 -> 50656 (+1.75%); split: +2.38%, -0.63% Tiger Lake Totals: Instrs: 149530671 -> 149533588 (+0.00%); split: -0.00%, +0.00% Cycle count: 15261418953 -> 15264764921 (+0.02%); split: -0.00%, +0.03% Spill count: 60317 -> 60316 (-0.00%); split: -0.02%, +0.01% Max live registers: 32249201 -> 32249464 (+0.00%) Max dispatch width: 5540608 -> 5540584 (-0.00%) Totals from 5862 (0.93% of 630309) affected shaders: Instrs: 4740800 -> 4743717 (+0.06%); split: -0.04%, +0.10% Cycle count: 566531248 -> 569877216 (+0.59%); split: -0.13%, +0.72% Spill count: 11709 -> 11708 (-0.01%); split: -0.09%, +0.08% Max live registers: 424560 -> 424823 (+0.06%) Max dispatch width: 50304 -> 50280 (-0.05%) Ice Lake Totals: Instrs: 150499705 -> 150502608 (+0.00%); split: -0.00%, +0.00% Cycle count: 15105629116 -> 15105425880 (-0.00%); split: -0.00%, +0.00% Spill count: 60087 -> 60090 (+0.00%) Fill count: 100542 -> 100541 (-0.00%); split: -0.00%, +0.00% Max live registers: 32605215 -> 32605495 (+0.00%) Max dispatch width: 5617752 -> 5617792 (+0.00%); split: +0.00%, -0.00% Totals from 5882 (0.93% of 634934) affected shaders: Instrs: 4737206 -> 4740109 (+0.06%); split: -0.04%, +0.10% Cycle count: 598882104 -> 598678868 (-0.03%); split: -0.08%, +0.05% Spill count: 10278 -> 10281 
(+0.03%) Fill count: 22504 -> 22503 (-0.00%); split: -0.01%, +0.01% Max live registers: 424184 -> 424464 (+0.07%) Max dispatch width: 50216 -> 50256 (+0.08%); split: +0.25%, -0.18% Skylake Totals: Instrs: 139092612 -> 139095257 (+0.00%); split: -0.00%, +0.00% Cycle count: 14533550285 -> 14533544716 (-0.00%); split: -0.00%, +0.00% Spill count: 58176 -> 58172 (-0.01%) Fill count: 95877 -> 95796 (-0.08%) Max live registers: 31924594 -> 31924874 (+0.00%) Max dispatch width: 5484568 -> 5484552 (-0.00%); split: +0.00%, -0.00% Totals from 5789 (0.93% of 625512) affected shaders: Instrs: 4481987 -> 4484632 (+0.06%); split: -0.04%, +0.10% Cycle count: 578310124 -> 578304555 (-0.00%); split: -0.05%, +0.05% Spill count: 9248 -> 9244 (-0.04%) Fill count: 19677 -> 19596 (-0.41%) Max live registers: 415340 -> 415620 (+0.07%) Max dispatch width: 49720 -> 49704 (-0.03%); split: +0.10%, -0.13% Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29095>	2024-05-14 01:28:20 +00:00
Caio Oliveira	865ef36609	intel/brw: Remove brw_shader.h Find a better home for its existing content. Some functions are now just static functions at the usage sites. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27861>	2024-02-29 19:28:06 +00:00
Caio Oliveira	63a4a4400a	intel/brw: Remove edgeflag_is_last VS parameter Suggested by Ken. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>	2024-02-28 05:45:39 +00:00
Caio Oliveira	5a3f65e678	intel/brw: Remove unused attrib workarounds Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>	2024-02-28 05:45:39 +00:00
Caio Oliveira	a1e694a890	intel/brw: Remove Gfx8- code from NIR passes Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>	2024-02-28 05:45:38 +00:00
Caio Oliveira	1ee29f82d2	intel/brw: Remove Gfx8- code from lower storage image pass Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>	2024-02-28 05:45:38 +00:00
Caio Oliveira	7c23b90537	intel/brw: Always use scalar shaders Remove scalar_stage[] array, since now it is always scalar. This removes any usage of vec4 shaders in brw. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>	2024-02-28 05:45:37 +00:00
Lionel Landwerlin	cf193af762	anv: fixup push descriptor shader analysis There are a couple mistakes here : - using a bitfield as an index to generate a bitfield... - in anv_nir_push_desc_ubo_fully_promoted(), confusing binding table access of the descriptor buffer with actual descriptors Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `ff91c5ca42` ("anv: add analysis for push descriptor uses and store it in shader cache") Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27504>	2024-02-19 11:10:29 +00:00
Caio Oliveira	dc76cfc781	intel/compiler: Collect NIR-only passes in intel_nir.h Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27644>	2024-02-16 22:35:05 +00:00
Caio Oliveira	c5b80de583	intel/compiler: Rename brw_vue_map to intel_vue_map And move to the intel_shader_enums.h file. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27475>	2024-02-14 22:31:23 -08:00
Lionel Landwerlin	2a1ff08376	intel/compiler: make default NIR compiler options visible Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26797>	2024-02-13 00:06:45 +00:00
Sagar Ghuge	73a3257968	intel/compiler: Add texture operation lowering pass This pass combines the LOD or LOD bias and array index into a single 32-bit value since Xe2+ sampler messages require us to do that. v2: (Alyssa) - Use nir_iand_imm instead of nir_iand and nir_imm_int - Use nir_trim_vector instead of nir_swizzle Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27458>	2024-02-12 21:25:48 +00:00
Kenneth Graunke	10ed4f1cab	intel/nir: Pass devinfo and prog_data to brw_nir_lower_cs_intrinsics We'll want to check for Alchemist and set various prog_data fields in the next patch, in order to enable some optimizations. Passing NULL for prog_data will remain valid and continue working as before. Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27167>	2024-01-25 08:43:04 +00:00
Caio Oliveira	dba6451ce8	intel/cmat: Add pass to lower cooperative matrix to subgroup operations This is just the skeleton of the implementation. Future commits will fill it all in. v2: Move to src/intel/compiler v3 (idr): Use vecN instead of array[N] for slice type. v4 (idr): Refactor lower_cooperative_matrix_load and lower_cooperative_matrix_store into a single function. v5 (idr): Remove old, verbose debug logging. Assert that entry is not NULL in get_coop_type_for_slice. Use nir_component_mask(...) instead of 0xffff. s/cooperative_matrix/cmat/. All suggested by Caio. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> I put both R-b on this because, at this point, we've each done equal parts authoring and reviewing. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>	2023-12-29 20:24:15 -08:00
Caio Oliveira	f4601d82c1	intel/compiler: Remove unused parameter from brw_nir_analyze_ubo_ranges() This parameter was used by i965 driver that is now gone. Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25986>	2023-11-08 18:10:31 +00:00
Caio Oliveira	d2125dac85	intel/compiler: Take more precise params in brw_nir_optimize() Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25986>	2023-11-08 18:10:31 +00:00
Caio Oliveira	c4be90b4ba	intel/compiler: Remove unused parameter from brw_nir_adjust_payload() Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25986>	2023-11-08 18:10:31 +00:00
Lionel Landwerlin	74a40cc4b6	intel/fs: move lower of non-uniform at_sample barycentric to NIR We use a non-uniform lowering loop in the backend which we can do better in NIR because we can also use divergence analysis there. This change also limits VGRF usage to a single VGRF to hold the sample ID in the backend. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24716>	2023-08-29 23:19:13 +00:00
Alyssa Rosenzweig	09d31922de	nir: Drop "SSA" from NIR language Everything is SSA now. sed -e 's/nir_ssa_def/nir_def/g' \ -e 's/nir_ssa_undef/nir_undef/g' \ -e 's/nir_ssa_scalar/nir_scalar/g' \ -e 's/nir_src_rewrite_ssa/nir_src_rewrite/g' \ -e 's/nir_gather_ssa_types/nir_gather_types/g' \ -i $(git grep -l nir \| grep -v relnotes) git mv src/compiler/nir/nir_gather_ssa_types.c \ src/compiler/nir/nir_gather_types.c ninja -C build/ clang-format cd src/compiler/nir && find .c .h -type f -exec clang-format -i \{} \; Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com> Acked-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24585>	2023-08-12 16:44:41 -04:00
Lionel Landwerlin	9934613c74	anv/hasvk: track robustness per pipeline stage And split them into UBO and SSBO v2 (Lionel): - Get rid of robustness fields in anv_shader_bin v3 (Lionel): - Do not pass unused parameters around Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17545>	2023-08-09 09:00:12 +03:00
Alyssa Rosenzweig	11fc4f969c	intel: Collapse is_ssa checks Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24432>	2023-08-03 22:40:29 +00:00
Lionel Landwerlin	fe81d40bff	intel/nir: add lower for sparse images & textures We have to lower images into image load + sampler residency. There is also a restriction on sampler access with a compare, lower those as 2 sampler instructions to meet the restriction. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23882>	2023-07-27 02:02:59 +03:00
Iván Briano	377c2a045f	intel/compiler: call brw_nir_adjust_payload from brw_postprocess_nir Calling anything after nir_trivialize_registers() risks undoing some of its work. In this case, brw_nir_adjust_payload() will do a constant folding pass if any payload adjusting happened, and that can turn a bunch of @store_regs into basically noops. Fixes dEQP-VK.subgroups.*task Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24325>	2023-07-25 22:48:09 +00:00
Marcin Ślusarz	a252123363	intel/compiler/mesh: compactify MUE layout Instead of using 4 dwords for each output slot, use only the amount of memory actually needed by each variable. There are some complications from this "obvious" idea: - flat and non-flat variables can't be merged into the same vec4 slot, because flat inputs mask has vec4 stride - multi-slot variables can have different layout: float[N] requires N 1-dword slots, but i64vec3 requires 1 fully occupied 4-dword slot followed by 2-dword slot - some output variables occur both in single-channel/component split and combined variants - crossing vec4 boundary requires generating more writes, so avoiding them if possible is beneficial This patch fixes some issues with arrays in per-vertex and per-primitive data (func.mesh.ext.outputs.*.indirect_array.q0 in crucible) and by reduction in single MUE size it allows spawning more threads at the same time. Note: this patch doesn't improve vk_meshlet_cadscene performance because default layout is already optimal enough. Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20407>	2023-07-24 07:55:29 +00:00
Lionel Landwerlin	c26c0a36d3	intel/fs: disable coarse pixel shader with interpolater messages at sample Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: mesa-stable Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9292 Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23962>	2023-07-06 12:48:52 +00:00
Lionel Landwerlin	86e9943b00	intel/fs: teach ubo range analysis pass about resource_intel Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>	2023-05-30 06:36:37 +00:00
Lionel Landwerlin	12540dfb6b	intel/fs: add a pass to move resource_intel closer to user Non uniform lower can insert read_first_invocation on the result of resource_intel. We want to keep that intrinsic directly in front of the user (load_ubo/load_ssbo/load_image/etc...) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>	2023-05-30 06:36:37 +00:00
Lionel Landwerlin	429ef02f83	intel/fs: make tcs input_vertices dynamic We need to do 3 things to accomplish this : 1. make all the register access consider the maximal case when unknown at compile time 2. move the clamping of load_per_vertex_input prior to lowering nir_intrinsic_load_patch_vertices_in (in the dynamic cases, the clamping will use the nir_intrinsic_load_patch_vertices_in to clamp), meaning clamping using derefs rather than lowered nir_intrinsic_load_per_vertex_input 3. in the known cases, lower nir_intrinsic_load_patch_vertices_in in NIR (so that the clamped elements can still be vectorized to the smallest number of URB read messages) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22378>	2023-05-24 18:32:07 +00:00
Rohan Garg	a15cc833f9	intel: drop unused is_scalar function parameter in brw_nir_apply_key Signed-off-by: Rohan Garg <rohan.garg@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23098>	2023-05-18 15:46:06 +02:00
Rohan Garg	212810ac8a	intel: infer scalar'ness locally for brw_postprocess_nir Signed-off-by: Rohan Garg <rohan.garg@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23098>	2023-05-18 15:46:06 +02:00
Lionel Landwerlin	d04d701cc6	intel/nir: add options to storage image lowering Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22302>	2023-04-18 08:38:55 +00:00
Lionel Landwerlin	a358b97c58	intel/fs: optimize uniform SSBO & shared loads Using divergence analysis, figure out when SSBO & shared memory loads are uniform and carry the data only once in register space. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21853>	2023-04-05 12:32:56 +00:00
Lionel Landwerlin	56474fae93	intel/fs: fix subgroup invocation read bounds checking nir->info.subgroup_size can be set to an enum : SUBGROUP_SIZE_VARYING = 0 SUBGROUP_SIZE_UNIFORM = 1 SUBGROUP_SIZE_API_CONSTANT = 2 SUBGROUP_SIZE_FULL_SUBGROUPS = 3 So compute the API subgroup size value and compare it to the dispatch size to determine whether we need some bound checking. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `9ac192d79d` ("intel/fs: bound subgroup invocation read to dispatch size") Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21856>	2023-03-14 12:15:48 +00:00
Caio Oliveira	07de034791	intel/compiler: Drop brw_nir_lower_scoped_barriers Now that we handle scoped barriers with execution scope during NIR -> Backend IR translation, this lowering is not needed anymore. Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21634>	2023-03-07 00:41:13 +00:00
Alejandro Piñeiro	ba0bc7182d	anv: use shader_info->var_copies_lowered Instead of passing allow_copies as a parameter for brw_nir_optimize (so manually doing that tracking). Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19338>	2023-02-06 22:11:34 +00:00
Lionel Landwerlin	fd7debc8bb	intel/fs: make alpha_to_coverage a tristate That way in some cases we can do this dynamically. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21094>	2023-02-06 09:12:18 +00:00
Kenneth Graunke	90a2137cd5	intel/compiler: Use LSC opcode enum rather than legacy BRW_AOPs This gets our logical atomic messages using the lsc_opcode enum rather than the legacy BRW_AOP_* defines. We have to translate one way or another, and using the modern set makes sense going forward. One advantage is that the lsc_opcode encoding has opcodes for both integer and floating point atomics in the same enum, whereas the legacy encoding used overlapping values (BRW_AOP_AND == 1 == BRW_AOP_FMAX), which made it impossible to handle both sensibly in common code. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>	2023-01-19 08:42:22 +00:00
Lionel Landwerlin	94bb4a13fa	intel/fs: make Wa_1806565034 conditional to non robust access Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20280>	2022-12-13 18:05:19 +00:00
Caio Oliveira	e9efd05af5	intel/compiler: Remove leftover declarations of old NIR passes Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19805>	2022-12-12 10:03:04 +00:00
Lionel Landwerlin	e25e17dd0c	intel/fs: clamp per vertex input accesses to patchControlPoints In a tessellation control shader where an input array is accessed using the index gl_InvocationID, we can end up accessing elements beyond the number of input vertices specified in the shader key. This happens because of the lowering in nir_lower_indirect_derefs(). This lowering will affect compact variables which happens in this case : in gl_PerVertex { vec4 gl_Position; float gl_ClipDistance[1]; } gl_in[gl_MaxPatchVertices]; The lowered code produced by NIR is somewhat inefficient (implements a binary search) : if (gl_InvocationID < 16) { if (gl_InvocationID < 8) { if (gl_InvocationID < 4) { vec4 vals = load_at_offset(0); value = bcsel(vals, gl_InvocationID); } else { vec4 vals = load_at_offset(4); value = bcsel(vals, gl_InvocationID - 4); } } else { if (gl_InvocationID < 12) { vec4 vals = load_at_offset(8); value = bcsel(vals, gl_InvocationID - 8); } else { vec4 vals = load_at_offset(12); value = bcsel(vals, gl_InvocationID - 12); } } } else { if (gl_InvocationID < 24) { ... } else { ... } } By default the gl_MaxPatchVertices must be set at 32 items and that's what the lowering code will use to divide the access into chunks of 4. But when running with 3 input vertices, this means we'll pull one more item than what was delivered in the shader payload. This triggers issues further down the register scheduling where the g5UD (register for the 4th item) is overwritten by a previous SEND, leading the URB read to use an invalid handle. This pass clamps the vertex index of any load_per_vertex_input intrinsic to (input_vertices - 1). 
Fixes issues with tests like : dEQP-VK.clipping.user_defined.clip_distance.vert_tess.* Also fixes a hang with zink/anv on : KHR-GL46.draw_elements_base_vertex_tests.AEP_shader_stages v2: Don't replace source register v3: Implement in NIR v4: Clamp per vertex array sizes in NIR (Jason) v5: Move the clamping on the intel compiler Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9749>	2022-12-07 08:16:03 +00:00
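The clamp described in the commit above can be pictured with a small standalone sketch. This is not the Mesa pass itself, which works on NIR intrinsics; `clamp_vertex_index` is a hypothetical helper name, and only the "(input_vertices - 1)" bound comes from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a Mesa function): clamp a per-vertex input
 * index so it never points past the last vertex that was actually
 * delivered in the shader payload. */
static inline uint32_t
clamp_vertex_index(uint32_t vertex_index, uint32_t input_vertices)
{
   /* The bound is (input_vertices - 1), so at least one vertex is required. */
   assert(input_vertices > 0);

   const uint32_t last = input_vertices - 1;
   return vertex_index < last ? vertex_index : last;
}
```

With 3 input vertices, an access to index 3 (the "4th item" in the commit message) would be redirected to index 2 under this sketch, so the read stays inside the delivered payload.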
Lionel Landwerlin	6f2dbe6da1	anv: enable lower_shader_calls vectorizing On Q2RTX RT shaders : Totals from 7 (22.58% of 31) affected shaders: Instrs: 15453 -> 14418 (-6.70%) Cycles: 232647 -> 224959 (-3.30%) Send messages: 574 -> 481 (-16.20%) Spill count: 118 -> 106 (-10.17%) Fill count: 156 -> 140 (-10.26%) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Konstantin Seurer <konstantin.seurer@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20058>	2022-11-30 07:23:30 +00:00
Caio Oliveira	fbe40720e0	intel/compiler: Remove redundant argument from brw_nir_create_passthrough_tcs Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19831>	2022-11-19 00:35:56 +00:00
Ian Romanick	f90d71055b	intel/compiler: Add and use a pass to generate imul_32x16 instructions Gfx8 and Gfx9 platforms are helped for cycles because now many instructions like mul(8) g12<1>D g10<8,8,1>D 6D become mul(8) g12<1>D g10<8,8,1>D 6W It is the same number of instructions, but the 32x16 multiply is a little faster. v2: Fix transposed hi and lo in "(hi >= INT16_MIN && lo <= INT16_MAX)". Noticed by Caio. Use nir_src_is_const instead of open coding it. Suggested by Caio. Broadwell and Skylake had similar results. (Skylake shown) total cycles in shared programs: 845748380 -> 845145547 (-0.07%) cycles in affected programs: 446346348 -> 445743515 (-0.14%) helped: 6017 HURT: 0 helped stats (abs) min: 2 max: 7380 x̄: 100.19 x̃: 8 helped stats (rel) min: <.01% max: 3.72% x̄: 0.41% x̃: 0.39% 95% mean confidence interval for cycles value: -113.37 -87.00 95% mean confidence interval for cycles %-change: -0.42% -0.41% Cycles are helped. Skylake Cycles in all programs: 8844820715 -> 8828897462 (-0.2%) Cycles helped: 47914 Cycles hurt: 1 No shader-db or fossil-db changes on any other Intel platform. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17718>	2022-11-08 00:02:16 +00:00
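The v2 note in the commit above mentions a transposed range test. A minimal sketch of what the corrected test presumably looks like follows. The helper name `range_fits_int16` and the use of a `[lo, hi]` pair are assumptions for illustration; the commit message only quotes the transposed expression.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not a Mesa function): returns true when every
 * value in the inclusive range [lo, hi] fits in a signed 16-bit operand,
 * which is the property a 32x16 multiply would need for its narrow
 * source. */
static inline bool
range_fits_int16(int64_t lo, int64_t hi)
{
   assert(lo <= hi);

   /* The lower bound is checked against the minimum and the upper bound
    * against the maximum. The transposed form quoted in the commit
    * message compares hi with INT16_MIN and lo with INT16_MAX instead. */
   return lo >= INT16_MIN && hi <= INT16_MAX;
}
```

Under this sketch the constant 6 from the commit's `mul(8)` example passes as the range [6, 6]. The transposed form would also accept a range such as [-40000, 40000], which does not fit in 16 bits.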
Vadym Shovkoplias	55c71217ec	driconf: Add a limit_trig_input_range option With this option enabled range of input values for fsin and fcos is limited to [-2pi : 2pi] by calculating the remainder after 2*pi modulo division. This helps to improve calculation precision for large input arguments on Intel. -v2: Add limit_trig_input_range option to prog_key to update shader cache (Lionel) Signed-off-by: Vadym Shovkoplias <vadym.shovkoplias@globallogic.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16388>	2022-05-13 06:47:53 +00:00
Lionel Landwerlin	cebf284ac1	intel/compiler: add a new pass to lower shading rate into HW format Rework: * Jason: Modernize brw_nir_lower_shading_rate_output: 1. Use nir_shader_instructions_pass() 2. Use *_imm builder helpers. 3. Use nir_intrinsic_base() instead of ->const_index[0] v2: Also lower loads (Caio) v3: Update stage check to trigger lowering (Caio) v4: Assert on != MESH (Caio) v5: Fixup instruction insertion (Caio) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13739>	2022-02-02 17:09:46 +00:00


107 commits