fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-21 05:00:09 +01:00

Author	SHA1	Message	Date
Kenneth Graunke	07745752d6	intel/brw: Skip fs_nir_setup_outputs for compute shaders There aren't any outputs, so there's no point to doing this work. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29624>	2024-06-08 02:18:54 -07:00
Kenneth Graunke	fa1564fb87	intel/brw: Recreate GS output registers after EmitVertex Geometry shaders write outputs multiple times, with EmitVertex() between them. The value of output variables becomes undefined after calling EmitVertex(), so we don't need to preserve those. This lets us recreate new registers after each EmitVertex(), assuming we aren't in control flow, allowing them to have separate live ranges. It also means that those registers are more likely to be written once, rather than having multiple writes, which can make optimization easier. This is pretty much a total hack, but it's helpful. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29624>	2024-06-08 02:18:51 -07:00
Sagar Ghuge	2dba5d484b	intel/fs: Adjust destination register size for global atomic on Xe2+ For 16-bit data type, we are padding 16-bit and using 32-bit data type, so we need to account for the padded portion while calculating the size_written. Rework: (Rohan) - Drop unnecessary fs_builder instance Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29271>	2024-06-06 00:18:37 +00:00
Sagar Ghuge	55c7b24899	intel/fs: Adjust destination register size for untyped atomic on Xe2+ For 16-bit data type, we are padding 16-bit and using 32-bit data type, so we need to account for the padded portion while calculating the size_written. Rework: (Rohan) - Drop unnecessary fs_builder instance Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29271>	2024-06-06 00:18:37 +00:00
Iván Briano	1c6a6349b0	intel/brw: always read LAYER/VIEWPORT from the FS payload Following on https://gitlab.freedesktop.org/mesa/mesa/-/issues/9811 the restriction that kept us from using the payload values for non-mesh cases is gone, so just use the same codepath for everything. But since we have functions that correctly read those for all gens, use those instead of the broken hack we had until now. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9796 Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29448>	2024-06-05 21:52:51 +00:00
Iván Briano	3d071fe7db	intel/brw: add fetch_viewport_index function Like fetch_render_target_array_index(), it reads the values provided by the FS payload. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29448>	2024-06-05 21:52:51 +00:00
Sagar Ghuge	415c5ad989	intel/compiler: No need to re-type the destination register For 16-bit float case handling, intermediate destination register is already 32-bit wide, we don't have to retype it to 32-bit. Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29506>	2024-06-04 18:07:44 +00:00
Lionel Landwerlin	d8b78924c5	brw: use a single virtual opcode to read ARF registers In `2c65d90bc8` I forgot to add the new SHADER_OPCODE_READ_MASK_REG opcode to the list of barrier instruction in the scheduler. Let's just use a single opcode for all ARF registers that need special scoreboarding and put the register as source (nicer for the debug output). Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `2c65d90bc8` ("intel/brw: ensure find_live_channel don't access arch register without sync") Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29446>	2024-05-31 20:22:27 +00:00
Ian Romanick	22095c60bc	nir/algebraic: Add nir_lower_int64_options::nir_lower_iadd3_64 This allows us to not generate 64-bit iadd3 on Intel but continue generating it for NVIDIA. No shader-db or fossil-db changes. v2: Add nir_lower_iadd3_64 flag so we can continue to generate 64-bit iadd3 on NVIDIA platforms. v3: s/bit_size == 64/s == 64/. This cut-and-paste bug prevented any of the optimizations from ever occurring. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29148>	2024-05-31 09:13:23 -07:00
Georg Lehmann	dcab408a6c	nir: remove unpack_half_flush_to_zero It doesn't make sense to have two sets of opcodes for this when all backends that support the flush_to_zero variant just rely on the global floating point mode anyway. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29433>	2024-05-31 09:46:35 +00:00
Kenneth Graunke	fbe0f8d36d	intel/brw: Blockify convergent load_shared on Gfx11-12 as well Gfx11-12 can support SLM block loads via OWord Block Load messages (notably, the aligned version, not the unaligned version). A while back we deleted the SHADER_OPCODE_OWORD_BLOCK_READ opcode. Rather than bring it back, we continue using UNALIGNED_OWORD_BLOCK_READ for SLM block access (like we do for SSBOs) but switch it over to the aligned variant when lowering logical sends. We do ensure the alignment is at least 16B, however. This is ugly, but it's probably not worth bringing back a whole extra opcode for a legacy HDC block load quirk. References: BSpec 47652 and 1689 Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9960 Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29429>	2024-05-30 22:01:10 +00:00
Jordan Justen	4ffe1a9f9e	intel/brw: Fix SSBO/shared load offset register size for Xe2 Rework: * Ken: Reword commit message Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29273>	2024-05-28 18:45:49 +00:00
Jordan Justen	739613ec70	intel/brw: Simplify enabling brw_fs_test_dispatch_packing Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29273>	2024-05-28 18:45:49 +00:00
Lionel Landwerlin	6a8ff3b550	intel/compiler: store u_printf_info in prog_data So that the driver can decode the printf buffer. We're not going to use the NIR data directly from the driver (Iris/Anv) because the late compile steps might want to add more printfs. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25814>	2024-05-15 13:13:38 +00:00
Lionel Landwerlin	ecbec25e84	intel/nir: add reloc delta to load_reloc_const_intel intrinsic We'll use the delta for an upcoming internal printf mechanism, where the PARAM_IDX will be the base printf reloc identifier and the BASE will be the string id. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25814>	2024-05-15 13:13:38 +00:00
Ian Romanick	3f151c03af	intel/brw: Handle fsign optimization in a NIR algebraic pass This is a lot less code, and it makes it easier to experiment with other pattern-based optimizations in the future. The results here are nearly identical to the results I got from Ken's "intel/brw: Make fsign (for 16/32-bit) in SSA form"... which are not particularly good. In this commit and in Ken's, all of the shader-db shaders hurt for spills and fills are from Deus Ex Mankind Divided. Each shader has a bunch of texture instructions with a single fsign between the blocks. With the dependency on the flag removed, the scheduler puts all of the texture instructions at the start... and there are a LOT of them. shader-db: All Intel platforms had similar results. (Meteor Lake shown) total instructions in shared programs: 19647060 -> 19650207 (0.02%) instructions in affected programs: 734718 -> 737865 (0.43%) helped: 382 / HURT: 1984 total cycles in shared programs: 823238442 -> 822785913 (-0.05%) cycles in affected programs: 426901157 -> 426448628 (-0.11%) helped: 3408 / HURT: 3671 total spills in shared programs: 3887 -> 3891 (0.10%) spills in affected programs: 256 -> 260 (1.56%) helped: 0 / HURT: 4 total fills in shared programs: 3236 -> 3306 (2.16%) fills in affected programs: 882 -> 952 (7.94%) helped: 0 / HURT: 12 LOST: 37 GAINED: 34 fossil-db: DG2 and Meteor Lake had similar results. 
(Meteor Lake shown) Totals: Instrs: 154005469 -> 154008294 (+0.00%); split: -0.00%, +0.00% Cycle count: 17551859277 -> 17554293955 (+0.01%); split: -0.02%, +0.04% Spill count: 142078 -> 142090 (+0.01%) Fill count: 266761 -> 266729 (-0.01%); split: -0.02%, +0.01% Max live registers: 32593578 -> 32593858 (+0.00%) Max dispatch width: 5535944 -> 5536816 (+0.02%); split: +0.02%, -0.01% Totals from 5867 (0.93% of 631350) affected shaders: Instrs: 5475544 -> 5478369 (+0.05%); split: -0.04%, +0.09% Cycle count: 1649032029 -> 1651466707 (+0.15%); split: -0.24%, +0.39% Spill count: 26411 -> 26423 (+0.05%) Fill count: 57364 -> 57332 (-0.06%); split: -0.10%, +0.04% Max live registers: 431561 -> 431841 (+0.06%) Max dispatch width: 49784 -> 50656 (+1.75%); split: +2.38%, -0.63% Tiger Lake Totals: Instrs: 149530671 -> 149533588 (+0.00%); split: -0.00%, +0.00% Cycle count: 15261418953 -> 15264764921 (+0.02%); split: -0.00%, +0.03% Spill count: 60317 -> 60316 (-0.00%); split: -0.02%, +0.01% Max live registers: 32249201 -> 32249464 (+0.00%) Max dispatch width: 5540608 -> 5540584 (-0.00%) Totals from 5862 (0.93% of 630309) affected shaders: Instrs: 4740800 -> 4743717 (+0.06%); split: -0.04%, +0.10% Cycle count: 566531248 -> 569877216 (+0.59%); split: -0.13%, +0.72% Spill count: 11709 -> 11708 (-0.01%); split: -0.09%, +0.08% Max live registers: 424560 -> 424823 (+0.06%) Max dispatch width: 50304 -> 50280 (-0.05%) Ice Lake Totals: Instrs: 150499705 -> 150502608 (+0.00%); split: -0.00%, +0.00% Cycle count: 15105629116 -> 15105425880 (-0.00%); split: -0.00%, +0.00% Spill count: 60087 -> 60090 (+0.00%) Fill count: 100542 -> 100541 (-0.00%); split: -0.00%, +0.00% Max live registers: 32605215 -> 32605495 (+0.00%) Max dispatch width: 5617752 -> 5617792 (+0.00%); split: +0.00%, -0.00% Totals from 5882 (0.93% of 634934) affected shaders: Instrs: 4737206 -> 4740109 (+0.06%); split: -0.04%, +0.10% Cycle count: 598882104 -> 598678868 (-0.03%); split: -0.08%, +0.05% Spill count: 10278 -> 10281 
(+0.03%) Fill count: 22504 -> 22503 (-0.00%); split: -0.01%, +0.01% Max live registers: 424184 -> 424464 (+0.07%) Max dispatch width: 50216 -> 50256 (+0.08%); split: +0.25%, -0.18% Skylake Totals: Instrs: 139092612 -> 139095257 (+0.00%); split: -0.00%, +0.00% Cycle count: 14533550285 -> 14533544716 (-0.00%); split: -0.00%, +0.00% Spill count: 58176 -> 58172 (-0.01%) Fill count: 95877 -> 95796 (-0.08%) Max live registers: 31924594 -> 31924874 (+0.00%) Max dispatch width: 5484568 -> 5484552 (-0.00%); split: +0.00%, -0.00% Totals from 5789 (0.93% of 625512) affected shaders: Instrs: 4481987 -> 4484632 (+0.06%); split: -0.04%, +0.10% Cycle count: 578310124 -> 578304555 (-0.00%); split: -0.05%, +0.05% Spill count: 9248 -> 9244 (-0.04%) Fill count: 19677 -> 19596 (-0.41%) Max live registers: 415340 -> 415620 (+0.07%) Max dispatch width: 49720 -> 49704 (-0.03%); split: +0.10%, -0.13% Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29095>	2024-05-14 01:28:20 +00:00
Ian Romanick	cd343fb9ac	intel/brw: Add support for fcsel opcodes Don't enable nir_opt_algebraic to generate these opcodes yet. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29095>	2024-05-14 01:28:20 +00:00
Ian Romanick	ded8690336	intel/brw: Remove dsign optimization This bit from the comment should have been a big red flag: There are currently zero instances of fsign(double(x))*IMM in shader-db or any test suite, so it is hard to care at this time. The implementation of that path was incorrect. The XOR instructions should be predicated like the OR instruction in the non-multiplication path. As a result, dsign(zero_value) * x will not produce the correct result. Instead of fixing this code that is never exercised by anything, replace it with the simple lowering in NIR. No shader-db or fossil-db changes on any Intel platform. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29095>	2024-05-14 01:28:20 +00:00
Ian Romanick	fc2360167c	intel/brw: Avoid optimize_extract_to_float when it will just be undone later v2: Add bspec quotation. Suggested by Caio. With a better understanding of the restriction, only apply on DG2 and newer platforms. shader-db: DG2 and Meteor Lake had similar results. (DG2 shown) total instructions in shared programs: 19659363 -> 19659360 (<.01%) instructions in affected programs: 2484 -> 2481 (-0.12%) helped: 6 / HURT: 1 total cycles in shared programs: 823445738 -> 823432524 (<.01%) cycles in affected programs: 2619836 -> 2606622 (-0.50%) helped: 48 / HURT: 63 fossil-db: DG2 and Meteor Lake had similar results. (DG2 shown) Totals: Instrs: 154015863 -> 153987806 (-0.02%); split: -0.02%, +0.00% Cycle count: 17552172994 -> 17562047866 (+0.06%); split: -0.13%, +0.19% Spill count: 142124 -> 141544 (-0.41%); split: -0.54%, +0.13% Fill count: 266803 -> 266046 (-0.28%); split: -0.38%, +0.09% Scratch Memory Size: 10266624 -> 10271744 (+0.05%); split: -0.02%, +0.07% Max live registers: 32592428 -> 32592393 (-0.00%); split: -0.00%, +0.00% Max dispatch width: 5535944 -> 5535912 (-0.00%); split: +0.00%, -0.00% Totals from 41887 (6.63% of 631367) affected shaders: Instrs: 32971032 -> 32942975 (-0.09%); split: -0.10%, +0.01% Cycle count: 3892086217 -> 3901961089 (+0.25%); split: -0.60%, +0.85% Spill count: 105669 -> 105089 (-0.55%); split: -0.72%, +0.18% Fill count: 206459 -> 205702 (-0.37%); split: -0.49%, +0.12% Scratch Memory Size: 7766016 -> 7771136 (+0.07%); split: -0.03%, +0.09% Max live registers: 3230515 -> 3230480 (-0.00%); split: -0.00%, +0.00% Max dispatch width: 337232 -> 337200 (-0.01%); split: +0.00%, -0.01% No shader-db or fossil-db changes on any earlier Intel platforms. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27891>	2024-05-03 15:01:43 -07:00
Ian Romanick	bf5d82654a	intel/brw: Fix optimize_extract_to_float for i2f of unsigned extract Fixes fs-uint-to-float-of-extract-int8.shader_test and fs-uint-to-float-of-extract-int16.shader_test added by piglit!883. No shader-db or fossil-db changes on any Intel platform. v2: Expand the comment explaining the potential problem. Suggested by Caio. Fixes: `29ce110be6` ("i965/fs: Remove extract virtual opcodes.") Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27891>	2024-05-03 15:01:43 -07:00
Kenneth Graunke	1b54b4fad5	intel/brw: Use VEC for NIR vec*() sources This writes the whole destination register in a single builder call. Eventually, VEC will write the whole destination register in one go, allowing better visibility into how it is defined. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28971>	2024-04-30 17:16:50 -07:00
Kenneth Graunke	d4563747d9	intel/brw: Use VEC for output stores This writes the whole destination register in a single builder call. Eventually, VEC will write the whole destination register in one go, allowing better visibility into how it is defined. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28971>	2024-04-30 17:16:49 -07:00
Kenneth Graunke	f0c29c9b71	intel/brw: Use VEC for FS outputs This writes the whole destination register in a single builder call. Eventually, VEC will write the whole destination register in one go, allowing better visibility into how it is defined. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28971>	2024-04-30 17:16:49 -07:00
Kenneth Graunke	cbe7a13f2b	intel/brw: Use VEC for TCS/TES/GS input/output loads This writes the whole destination register in a single builder call. Eventually, VEC will write the whole destination register in one go, allowing better visibility into how it is defined. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28971>	2024-04-30 17:16:48 -07:00
Kenneth Graunke	a94e1bd0ac	intel/brw: Use VEC for gl_FragCoord This writes the whole destination register in a single builder call. Eventually, VEC will write the whole destination register in one go, allowing better visibility into how it is defined. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28971>	2024-04-30 17:16:47 -07:00
Kenneth Graunke	d0a24496fd	intel/brw: Use VEC for load_const This writes the whole destination register in a single builder call. Eventually, VEC will write the whole destination register in one go, allowing better visibility into how it is defined. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28971>	2024-04-30 17:16:45 -07:00
Kenneth Graunke	c194df565a	intel/brw: Don't include unnecessary undefined values in texture results When emitting a sampler message, we allocate a temporary destination large enough to hold 4 values (or 5 for sparse). This is the maximum size needed to hold any result. However, we shrink the size written by the sampler message to skip writing any trailing components that NIR tells us are never read. So we may not write the entire temporary. The NIR texture instruction has a destination VGRF which is sized assuming that all components are present. We issue a LOAD_PAYLOAD instruction to copy our sampler result temporary to the NIR destination. When we reduce the response length of the sampler messages, then some of these temporary components have undefined values. The correct way to indicate that is by using a BAD_FILE source. Unfortunately, we were naively reading offsets of the temporary that were never written, but are still part of a larger VGRF. This complicates things. For example, sampling and only using RGB (not RGBA) was producing this: txl_logical(8) (written: 3) vgrf3+0.0:F, ... undef(8) (written: 4) vgrf4:UD load_payload(8) (written: 4) vgrf4:F, vgrf3+0.0:F, vgrf3+1.0:F, vgrf3+2.0:F, vgrf3+3.0:F The last source, vgrf3+3.0:F, is undefined, and should be BAD_FILE. Doing so allows VGRF splitting and other optimizations to work better. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28971>	2024-04-30 17:16:41 -07:00
Kenneth Graunke	674e89953f	intel/brw: Use new builder helpers that allocate a VGRF destination With the previous commit, we now have new builder helpers that will allocate a temporary destination for us. So we can eliminate a lot of the temporary naming and declarations, and build up expressions. In a number of cases here, the code was confusingly mixing D-type addresses with UD-immediates, or expecting a UD destination. But the underlying values should always be positive anyway. To accommodate the type inference restriction that the base types must match, we switch these over to be purely UD calculations. It's cleaner to do so anyway. Compared to the old code, this may in some cases allocate additional temporary registers for subexpressions. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28957>	2024-04-29 07:51:45 +00:00
Kenneth Graunke	319ba85e10	intel/brw: Add builder helpers for math functions Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28957>	2024-04-29 07:51:45 +00:00
Kenneth Graunke	f5473e6edd	intel/brw: Don't use inst return value when it isn't needed We just want to emit an instruction, but we don't need to do anything further with it, so we don't need to store the resulting inst pointer anywhere. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28957>	2024-04-29 07:51:45 +00:00
Lionel Landwerlin	1bbe2d9833	intel/brw: fixup wm_prog_data_barycentric_modes() Always select sample barycentric when persample dispatch is unknown at compile time and let the payload adjustments feed the expected value based on dispatch. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: mesa-stable Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27803>	2024-04-26 05:13:02 +00:00
Kenneth Graunke	545bb8fb6f	intel/brw: Replace type_sz and brw_reg_type_to_size with brw_type_size_* Both of these helpers do the same thing. We now have brw_type_size_bits and brw_type_size_bytes and can use whichever makes sense in that place. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>	2024-04-25 11:41:48 +00:00
Kenneth Graunke	c22f44ff07	intel/brw: Replace brw_reg_type_from_bit_size by brw_type_with_size Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>	2024-04-25 11:41:48 +00:00
Kenneth Graunke	f523bfcf90	intel/brw: Reindent after shortening BRW_REGISTER_TYPE_* to BRW_TYPE_* Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>	2024-04-25 11:41:48 +00:00
Kenneth Graunke	873fcdff38	intel/brw: Stop using long BRW_REGISTER_TYPE enum names s/BRW_REGISTER_TYPE/BRW_TYPE/g Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>	2024-04-25 11:41:48 +00:00
Karol Herbst	d22f936019	nir: remove workgroup_id_zero_base This removes the need for drivers to handle both versions. The base will get added once in nir_lower_system_values when converting from deref to intrinsic and will be replaced by a zero for users not supporting it. Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Signed-off-by: Karol Herbst <kherbst@redhat.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26800>	2024-04-24 20:18:49 +00:00
Jordan Justen	4e5ed7ebd5	intel/brw: Avoid getting a stride of 0 for nir_intrinsic_exclusive_scan Ref: `671745b616` ("intel/fs: Don't allow 0 stride on MOV destination") Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28821>	2024-04-18 23:03:57 +00:00
Kenneth Graunke	e637c63239	intel/brw: Make an fs_builder::SYNC helper We always want a null destination, so this saves some typing. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28705>	2024-04-16 02:14:49 +00:00
Kenneth Graunke	d5b8cec7a2	intel/brw: Replace FS_OPCODE_LINTERP with BRW_OPCODE_PLN We no longer support the old LINE+MAC lowering, and we already lower this to MAD in NIR on Gfx11+, so the LINTERP virtual opcode always corresponds to PLN. The only catch is that LINTERP's operands are reversed from PLN, so we have to switch them. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28705>	2024-04-16 02:14:49 +00:00
Kenneth Graunke	46a7ee772e	intel/brw: Drop default size of 1 from bld.vgrf() calls This isn't necessary as 1 is the default value for the parameter. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28705>	2024-04-16 02:14:49 +00:00
Kenneth Graunke	217d56e9b1	intel/brw: Delete fs_visitor::vgrf helper Just use fs_builder::vgrf instead of the older glsl_type-based one. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28705>	2024-04-16 02:14:49 +00:00
Ian Romanick	6d85f7129a	intel/brw/xe2+: DPAS must be SIMD16 now Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28404>	2024-03-29 21:12:32 +00:00
Ian Romanick	a8115221e5	nir: intel/brw: Change the order of sources for nir_dpas_intel It was by pure luck that all sources (and the result) of nir_dpas_intel had the same number of components. It is possible to support matrix sizes where the accumulator matrix and the result matrix are larger (e.g., 16x8 * 8x16 = 16x16). This breaks all of the assumptions of NIR's infrastructure for code generating intrinsics. Fix this by making the accumulator matrix be the first source. The accumulator and the result will always have the same dimensions (due to rules of matrix multiplication) and the same type (due to restrictions of the cooperative matrix extension). This forces them to have the same number of components. This doesn't fix all the potential problems. NIR expects that all 0-sized sources will have the same number of components. This just ensures that the result has the correct number of components. Fixes: `6b14da33ad` ("intel/fs: nir: Add nir_intrinsic_dpas_intel") Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28404>	2024-03-29 21:12:32 +00:00
Rohan Garg	df3a1348d1	intel/brw: minor rework to de duplicate variable assignment Signed-off-by: Rohan Garg <rohan.garg@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27235>	2024-03-28 19:53:40 +00:00
Kenneth Graunke	d473004576	intel/fs: Avoid generating useless UNDEFs for every SSA def Emitting UNDEF is only necessary when the instructions we generate to produce the NIR def are considered partial writes. By adding a simple check (adapted from fs_inst::is_partial_write()), we can avoid creating loads of unnecessary UNDEFs that we have to clean up later. Our first dead code elimination pass does get rid of them pretty quickly, but this should save memory and time during our first split_virtual_grfs and dead_code_elimination passes. This generates roughly 30% fewer instructions at the beginning. Improves compilation time of shaders: - Rise of the Tomb Raider: -3.51563% +/- 0.103951% (n=7) - Borderlands 3: -3.64422% +/- 0.300951% (n=7). Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28169>	2024-03-19 19:32:18 +00:00
Caio Oliveira	b22879e753	intel/brw: Use predicates for quad_vote_any and quad_vote_all when available Up until Xe2, we can use the predicates ANY4H and ALL4H to achieve the same result with fewer instructions. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27279>	2024-03-19 18:41:15 +00:00
Caio Oliveira	857e62e6ac	intel/brw: Implement quad_vote_any and quad_vote_all Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27279>	2024-03-19 18:41:15 +00:00
Ian Romanick	671745b616	intel/fs: Don't allow 0 stride on MOV destination Outside SIMD1 instructions, a destination stride of zero doesn't make any sense. When such strides exist, they would be fixed by the FS generator. Currently the only place that intentionally generates such a stride is setup_barrier_message_payload_gfx125, and this commit changes that. The existence of a zero stride that won't really be a zero stride causes a variety of problems with other optimization passes. Those passes don't know that 0 actually means 1, and they make incorrect assumptions about sizes written, etc. The assertion helped catch many bugs in some other work in progress that tries to store convergent values in SIMD8 registers regardless of the dispatch width. That code would accidentally generate destination strides of zero. v2: Check stride differently depending on register file. Suggested by Caio. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28256>	2024-03-19 18:17:59 +00:00
Kenneth Graunke	97aec40111	intel/brw: Emit better code for read_invocation(x, constant) For something as basic as read_invocation(x, 0), we were emitting: mov(8) vgrf67:D, 0d find_live_channel(8) vgrf236:UD, NoMask broadcast(8) vgrf237:D, vgrf67:D, vgrf236+0.0<0>:UD NoMask broadcast(8) vgrf235+0.0:W, vgrf197+0.0:W, vgrf237+0.0<0>:D NoMask mov(8) vgrf234+0.0:W, vgrf235+0.0<0>:W This is way overcomplicated - if the invocation is a constant, we can simply emit a single MOV which reads the desired channel index. Not only that, but it's difficult to clean up: 1. If this expression appears multiple times, CSE will find all the redundant emit_uniformize(invocation) and get rid of the duplicate (find_live_channel+broadcast) on future instructions. 2. Copy propagation will put the 0d directly in the first broadcast. 3. Dead code elimination will get rid of the vgrf67 temp holding 0. 4. Algebraic will replace the first broadcast(x, 0) with a MOV. 5. Copy propagation will put the 0d directly in the second broadcast. 6. Dead code elimination will get rid of the vgrf237 temp. 7. Algebraic will replace the second broadcast(x, 0) with a MOV. 8. Copy propagation will finally combine the two MOVs. That's at least 7-8 optimization passes and several loops through the same passes just to clean up something we can do trivially. Cuts 25% of the optimizer steps in pipeline 22200210259a2c9c of fossil-db/google-meet-clvk/BgBlur.1f58fdf742c27594.1 (31 to 23). Shortens compilation time of the google-meet-clvk/Relight pipeline by -2.87717% +/- 0.509162% (n=150). Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28097>	2024-03-12 21:58:27 +00:00
Rohan Garg	73d98848fa	intel/compiler: Xe2+ can do URB load/store with a byte offset Thanks to Ken for suggesting this URB refactoring change and pointing out that the LSC can operate on the byte offset granularity. This should fix the geometry shader test cases where we have more than 32 vertices since previously we were failing to write the correct control data bits because of incorrect write mask. Shader-db results for Xe2: total instructions in shared programs: 153475 -> 153437 (-0.02%) instructions in affected programs: 1374 -> 1336 (-2.77%) helped: 11 HURT: 0 helped stats (abs) min: 3 max: 5 x̄: 3.45 x̃: 3 helped stats (rel) min: 1.67% max: 4.92% x̄: 3.23% x̃: 2.70% 95% mean confidence interval for instructions value: -3.92 -2.99 95% mean confidence interval for instructions %-change: -4.10% -2.36% Instructions are helped. total loops in shared programs: 140 -> 140 (0.00%) loops in affected programs: 0 -> 0 helped: 0 HURT: 0 total cycles in shared programs: 16002649 -> 16002329 (<.01%) cycles in affected programs: 9174 -> 8854 (-3.49%) helped: 11 HURT: 0 helped stats (abs) min: 22 max: 38 x̄: 29.09 x̃: 32 helped stats (rel) min: 2.62% max: 5.54% x̄: 3.78% x̃: 3.85% 95% mean confidence interval for cycles value: -33.56 -24.62 95% mean confidence interval for cycles %-change: -4.48% -3.08% Cycles are helped. total spills in shared programs: 52 -> 52 (0.00%) spills in affected programs: 0 -> 0 helped: 0 HURT: 0 total fills in shared programs: 94 -> 94 (0.00%) fills in affected programs: 0 -> 0 helped: 0 HURT: 0 total sends in shared programs: 4240 -> 4240 (0.00%) sends in affected programs: 0 -> 0 helped: 0 HURT: 0 LOST: 0 GAINED: 0 Rework: (Sagar) - Adjust offset/indirect offset calculation. 
- Add shader-db results - Always calculate dword index - Drop changes for indirect writes Signed-off-by: Rohan Garg <rohan.garg@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27602>	2024-03-01 16:11:30 +00:00


649 commits