fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-21 18:00:13 +01:00

Author	SHA1	Message	Date
Rohan Garg	9b477eea19	intel/compiler: use an immediate when doing the shift We can pass immediates to SHL and don't need to allocate a separate register here. Signed-off-by: Rohan Garg <rohan.garg@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34604>	2025-04-18 10:08:22 +00:00
Ian Romanick	cb69d019cf	brw/nir: Use offset() for all uses of offs in emit_pixel_interpolater_alu_at_offset This is necessary to appropriately uniformize the first component access of a convergent vector. Without this, this is produced: load_payload(16) %18:D, 0d, 0d NoMask group0 add(32) %21:F, %18+0.0:F, 0.5f add(32) %22:F, %18+2.0<0>:F, 0.5f This is the correct code: load_payload(16) %18:D, 0d, 0d NoMask group0 add(32) %21:F, %18+0.0<0>:F, 0.5f add(32) %22:F, %18+2.0<0>:F, 0.5f Without `38b58e286f`, the code generated was more incorrect, but happened to work for this test case: load_payload(16) %18:D, 0d, 0d NoMask group0 add(32) %21:F, %18+0.0<0>:F, 0.5f add(32) %22:F, %18+0.4<0>:F, 0.5f Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `38b58e286f` ("brw/nir: Fix source handling of nir_intrinsic_load_barycentric_at_offset") Closes: #12969 Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34427>	2025-04-09 22:21:18 +00:00
Caio Oliveira	bf9ad36f2d	brw: Properly handle cooperative matrices created with constants Expand constant sources to cover the region read by DPAS, and also use NULL register as accumulator when possible. Reviewed-by: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34373>	2025-04-07 14:27:43 -07:00
Ian Romanick	f33faa4648	brw/nir: Allow b2f(not(X)) optimization on Gfx12.5+ Since there are no type conversions, no restrictions are violated. No shader-db or fossil-db changes on any Gfx12 or older Intel platforms. shader-db: Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown) total instructions in shared programs: 16956077 -> 16944933 (-0.07%) instructions in affected programs: 1957573 -> 1946429 (-0.57%) helped: 4629 / HURT: 35 total cycles in shared programs: 915668518 -> 915684808 (<.01%) cycles in affected programs: 341925598 -> 341941888 (<.01%) helped: 3040 / HURT: 1305 helped stats (abs) min: 2 max: 23034 x̄: 205.36 x̃: 16 helped stats (rel) min: <.01% max: 41.21% x̄: 1.28% x̃: 0.48% HURT stats (abs) min: 2 max: 68820 x̄: 490.88 x̃: 22 HURT stats (rel) min: <.01% max: 103.69% x̄: 2.29% x̃: 0.37% 95% mean confidence interval for cycles value: -50.28 57.78 95% mean confidence interval for cycles %-change: -0.35% -0.07% Inconclusive result (value mean confidence interval includes 0). LOST: 40 GAINED: 42 fossil-db: Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown) Totals: Instrs: 209828027 -> 209790349 (-0.02%); split: -0.03%, +0.01% Cycle count: 30504938008 -> 30514045408 (+0.03%); split: -0.06%, +0.09% Spill count: 512182 -> 512168 (-0.00%) Fill count: 623432 -> 623426 (-0.00%); split: -0.00%, +0.00% Max live registers: 65465029 -> 65464959 (-0.00%) Totals from 57895 (8.19% of 706589) affected shaders: Instrs: 50144907 -> 50107229 (-0.08%); split: -0.11%, +0.03% Cycle count: 7549692606 -> 7558800006 (+0.12%); split: -0.25%, +0.37% Spill count: 58834 -> 58820 (-0.02%) Fill count: 102324 -> 102318 (-0.01%); split: -0.01%, +0.01% Max live registers: 9129045 -> 9128975 (-0.00%) Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33931>	2025-04-07 17:42:05 +00:00
Ian Romanick	853ead2073	brw/nir: Optimize b2f(not(X)) using logical operations instead of arithmetic Funny story... this is how regular b2f was implemented before Curro implemented the `MOV dst:F -src:D` method 9 years ago (see `3ee2daf23d`). Eliminating the type conversion in the arithmetic operation enables the next commit. No shader-db or fossil-db changes on any Intel platform. Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33931>	2025-04-07 17:42:05 +00:00
Ian Romanick	5656682344	brw/nir: Eliminate default parameter to get_nir_src The vast majority of the callers want channel = 0. During the development process, using this default parameter value saved a lot of pain in rebasing. However, it seems to be more trouble than it's worth. Issue #12464 occurred because LNL was merged while this code was in review. As a result, one caller of get_nir_src that wanted channel = -1 was not inspected closely, and it got the default channel = 0 instead. To prevent this happening in the future (with possible branches still yet to be merged, for example), remove the default parameter. This will force the inspection of any callers that don't have an explicit channel parameter. Hopefully that will prevent more problems. Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33007>	2025-04-07 17:16:34 +00:00
Ian Romanick	38b58e286f	brw/nir: Fix source handling of nir_intrinsic_load_barycentric_at_offset The source of nir_intrinsic_load_barycentric_at_offset is a vector, so -1 should be passed to get_nir_src. This is also done for texture sampling intrinsics. I skimmed the other users of get_nir_src, and I believe they are correct. This one was just missed as LNL support landed and many, many rebases of the original MR occurred. v2: Fix another get_nir_src call. Suggested by Lionel. Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> [v1] Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Fixes: `d5d7ae22ae` ("brw/nir: Fix up handling of sources that might be convergent vectors") Closes: #12464 Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33007>	2025-04-07 17:16:34 +00:00
Caio Oliveira	7ae638c0fe	brw: Add brw_builder::uniform() Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34355>	2025-04-04 23:07:21 +00:00
Lionel Landwerlin	4346210ae6	brw: move texture offset packing to NIR That way we can deal with upcoming non constant values for VK_KHR_maintenance8. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33138>	2025-03-29 02:15:18 +00:00
Kenneth Graunke	51c67ad7cf	brw: Avoid regioning restrictions for u2u16/i2i16 narrowing conversions Cuts 0.83% of instructions on Alchemist in affected fossil-db shaders (nearly all of which are in parallel-rdp). Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31833>	2025-03-28 13:40:07 +00:00
Kenneth Graunke	86f8b8860e	brw: Use a smaller type for masked sub-32-bit shift values Cuts 0.14% of instructions on Alchemist in affected fossil-db shaders (all of which are in parallel-rdp). Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31833>	2025-03-28 13:40:07 +00:00
Kenneth Graunke	2e108afb8c	brw: Skip unnecessary UNDEFs for comparisons For example, SIMD16 W/UW fills an entire REG_SIZE so UNDEF isn't needed. No change in fossil-db. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31833>	2025-03-28 13:40:07 +00:00
Kenneth Graunke	b89e269a46	brw: Make a helper to emit UNDEF for temporaries containing small types Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31833>	2025-03-28 13:40:07 +00:00
Caio Oliveira	89a87fab66	brw: Remove extra SHADER_OPCODE_FLOW emitted during NIR conversion The DO() helper already emits a FLOW. Fixes: `d2c39b1779` ("intel/brw: Always have a (non-DO) block after a DO in the CFG") Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33954>	2025-03-25 02:05:26 +00:00
Sagar Ghuge	bea9d79cb9	intel/compiler: Add support for MSAA typed load/store messages Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32690>	2025-03-07 23:06:14 +00:00
Caio Oliveira	8e2a7cb42d	brw: Embed at_end() inside brw_builder(brw_shader *) constructor All remaining uses of that constructor would also use at_end(), and vice-versa. So just implement that behavior in the constructor itself. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33815>	2025-03-06 23:33:38 +00:00
Kenneth Graunke	88309a9818	brw: Rename shared function enums for clarity Our name for this enum was brw_message_target, but it's better known as shared function ID or SFID. Call it brw_sfid to make it easier to find. Now that brw only supports Gfx9+, we don't particularly care whether SFIDs were introduced on Gfx4, Gfx6, or Gfx7.5. Also, the LSC SFIDs were confusingly tagged "GFX12" but aren't available on Gfx12.0; they were introduced with Alchemist/Meteorlake. GFX6_SFID_DATAPORT_SAMPLER_CACHE in particular was confusing. It sounds like the SFID to use for the sampler on Gfx6+, however it has nothing to do with the sampler at all. BRW_SFID_SAMPLER remains the sampler SFID. On Haswell, we ran out of messages on the main data cache data port, and so they introduced two additional ones, for more messages. The modern Tigerlake PRMs simply call these DP_DC0, DP_DC1, and DP_DC2. I think the "sampler" name came from some idea about reorganizing messages that never materialized (instead, the LSC came as a much larger cleanup). Recently we've adopted the term "HDC" for the legacy data cluster, as opposed to "LSC" for the modern Load/Store Cache. To make clear which SFIDs target the legacy HDC dataports, we use BRW_SFID_HDC0/1/2. We were also citing the G45, Sandybridge, and Ivybridge PRMs for a compiler that supports none of those platforms. Cite modern docs. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33650>	2025-02-27 08:49:24 +00:00
Caio Oliveira	d2c39b1779	intel/brw: Always have a (non-DO) block after a DO in the CFG Make the "block after DO" more stable so that adding instructions after a DO doesn't require repairing the CFG. Use a new SHADER_OPCODE_FLOW instruction that is a placeholder representing "go to the next block" and disappears at code generation. For some context, there are a few facts about how CFG currently works - Blocks are assumed to not be empty; - DO is always by itself in a block, i.e. starts and ends a block; - There are no empty blocks; - Predicated WHILE and CONTINUE will link to the "block after DO"; - When nesting loops, it is possible that the "block after DO" is another "DO". Reasons and further explanations for those are in the brw_cfg.c comments. What makes this new change useful is that a pass might want to add instructions between two DO instructions. When that happens, a new block must be created and any predicated WHILE and CONTINUE must be repaired. So, instead of requiring a repair (which has proven to be tricky in the past), this change adds a block that can be "virtually" empty but allow instructions to be added without further changes. One alternative design would be allowing empty blocks, that would be a deeper change since the blocks are currently assumed to be not empty in various places. We'll save that for when other changes are made to the CFG. The problem described happens in brw_opt_combine_constants, and a different patch will clean that up. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33536>	2025-02-24 23:25:06 +00:00
Caio Oliveira	d32a5ab0e4	intel/brw: Use the builder DO() function in all places Shorter and a preparation to add some functionality to DO(). Had to make it const since that's the convention for builder, so just made all the sibling helpers const too. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33536>	2025-02-24 23:25:06 +00:00
Lionel Landwerlin	3bd4c5a166	brw: include UGM fence when TGM + lowered image->global Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32676>	2025-02-23 15:16:50 +00:00
Caio Oliveira	5c55b29d1a	intel/brw: Rename a few remaining functions to remove fs prefix Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32536>	2025-02-11 09:13:28 +00:00
Caio Oliveira	cf3bb77224	intel/brw: Rename fs_visitor to brw_shader Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32536>	2025-02-11 09:13:28 +00:00
Caio Oliveira	352a63122f	intel/brw: Rename files brw_fs.cpp/h to brw_shader.cpp/h Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32536>	2025-02-11 09:13:28 +00:00
Caio Oliveira	f8a979466b	intel/brw: Rename and move thread_payload types to own header Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32536>	2025-02-11 09:13:28 +00:00
Kenneth Graunke	d06c3e21ac	brw: Drop unnecessary mlen/header_size on virtual GET_BUFFER_SIZE op The logical send lowering code sets these, and is the code which -should- set these. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33297>	2025-02-08 01:07:22 +00:00
Kenneth Graunke	ae60338142	brw: Lower MEMORY_FENCE and INTERLOCK in lower_logical_sends We teach lower_logical_sends to lower these to SHADER_OPCODE_SEND and drop all the corresponding generator and eu_emit code. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33297>	2025-02-08 01:07:22 +00:00
Kenneth Graunke	b9de19f917	brw: Eliminate the BTI source from MEMORY_FENCE/INTERLOCK opcodes Memory fences do not refer to an element of a binding table. Rather, the reason we had "BTI" in these opcodes was to distinguish what in modern terms are called UGM (untyped memory data cache) vs. SLM (cross-thread shared local memory) fences. Icelake and older platforms used the "data cache" SFID for both purposes, distinguishing them by having a special binding table index, 254, meaning "this is actually SLM access". This is where the notion that fences had BTIs came in. (In fact, prior to Icelake, separate SLM fences were not a thing, so BTI wasn't used there either.) To avoid confusion about BTI being involved, we choose a simpler lie: we have Icelake SLM fences target GFX12_SFID_SLM (like modern platforms would), even though it didn't really exist back then. Later lowering code sets it back to the correct Data Cache SFID with magic SLM binding table index. This eliminates BTI everywhere and an unnecessary source. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33297>	2025-02-08 01:07:22 +00:00
Kenneth Graunke	c0a32af125	brw: Use correct builder size for MEMORY_FENCE/INTERLOCK virtual opcodes brw_memory_fence() overrides the instructions generated by the MEMORY_FENCE or INTERLOCK opcodes to be force_writemask_all with exec_size == 1. But the IR was emitting it in SIMD8 (regardless of dispatch width). Instead, just emit the IR as SIMD1/NoMask so the IR matches what we actually generate. Have size_written indicate that the entire destination is written, however, as it is ultimately going to be a SEND that writes a whole register. We were also using a UD register for the source of FS_OPCODE_SCHEDULING_FENCE when the generator overrides it to UW, so just specify UW in the IR as well so that they line up. Also add validation for MEMORY_FENCE/INTERLOCK that we've done the exec_size and masking right in the IR. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33297>	2025-02-08 01:07:22 +00:00
Kenneth Graunke	accef5e8f5	brw: Replace fs_inst::target field with logical FB read/write sources We can just specify this as a source to the logical FB read/write opcodes. Notably FB reads had no sources before; now they have one. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33297>	2025-02-08 01:07:22 +00:00
Kenneth Graunke	7390d6189c	brw: Replace fs_inst::pi_noperspective with a logical control source We already have logical pixel interpolator messages that get lowered to send messages. We can just add an extra boolean source to those opcodes rather than sticking an opcode-specific boolean in the generic fs_inst data structure. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33297>	2025-02-08 01:07:22 +00:00
Kenneth Graunke	168ac07ffd	brw: Eliminate fs_inst::shadow_compare brw_lower_logical_sends can just check for the TEX_LOGICAL_SRC_SHADOW_C source; we don't need a generic instruction bit for this. We used to have one because this was handled in the generator for older hardware before the advent of logical opcode lowering. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33297>	2025-02-08 01:07:22 +00:00
Caio Oliveira	f82bcd56fc	intel/brw: Add functions to allocate VGRF space Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33334>	2025-02-06 08:33:03 -08:00
Caio Oliveira	ea87bab4ce	intel/brw: Remove 'using namespace brw' directives Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33418>	2025-02-06 07:58:55 -08:00
Caio Oliveira	92085e7bab	intel/brw: Remove 'fs' prefix from brw_from_nir functions Reviewed-by: Dylan Baker <dylan.c.baker@intel.com> Reviewed-by: Dylan Baker <None> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33330>	2025-02-03 23:08:11 +00:00
Caio Oliveira	1332d84500	intel/brw: Rename file brw_fs_nir.cpp to brw_from_nir.cpp Reviewed-by: Dylan Baker <dylan.c.baker@intel.com> Reviewed-by: Dylan Baker <None> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33330>	2025-02-03 23:08:11 +00:00
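Commit 853ead2073 in the log above describes computing b2f(not(X)) with logical operations rather than arithmetic. The following is a minimal sketch of that idea, not the code from the commit: it assumes 32-bit booleans encoded as 0 (false) or ~0 (true) and IEEE-754 single-precision floats, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helpers for illustration only.
// Assumption: a boolean is a 32-bit integer that is 0 (false) or ~0 (true).

// Arithmetic form of b2f(b): negate the integer boolean and convert it to
// float, in the spirit of `MOV dst:F -src:D`. This needs an int-to-float
// type conversion.
float b2f_arith(std::int32_t b)
{
   assert(b == 0 || b == -1);
   return static_cast<float>(-b);
}

// Logical form of b2f(not(b)): invert the mask, then AND it with the bit
// pattern of 1.0f (0x3f800000). Only bitwise operations are used, so no
// type conversion takes place.
float b2f_not_logical(std::int32_t b)
{
   assert(b == 0 || b == -1);
   const std::uint32_t bits = ~static_cast<std::uint32_t>(b) & 0x3f800000u;
   float f;
   std::memcpy(&f, &bits, sizeof f);
   return f;
}
```

Under these assumptions, `b2f_not_logical(b)` gives the same value as `b2f_arith(~b)` for both valid inputs, which is the equivalence the optimization relies on.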

85 commits