fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-21 18:00:13 +01:00

Author	SHA1	Message	Date
Faith Ekstrand	83fd7a5ed1	intel: Use nir_lower_tex_options::lower_index_to_offset Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21546>	2023-03-06 21:38:32 +00:00
Caio Oliveira	c80268a20d	intel/compiler: Mark various memory barriers intrinsics unreachable Now that both SPIR-V and GLSL are using scoped barriers, we can stop handling the specialized ones. Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3339>	2023-02-27 20:24:01 +00:00
Daniel Schürmann	2bb369dd8d	nir: add assertions that loops don't have a Continue Construct Hoping that I didn't miss any, this should add assertions to all functions and passes which explicitly handle 'nir_loop'. Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13962>	2023-02-21 10:41:11 +00:00
Lionel Landwerlin	9ac192d79d	intel/fs: bound subgroup invocation read to dispatch size This is to avoid out of bound register accesses (potentially leading to hangs) when the dispatch size is smaller than what is reported in the NIR subgroup_size. v2: Implement bounding with a mask (since workgroup sizes are powers of 2) (Faith) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `530de844ef` ("intel,anv,iris,crocus: Drop subgroup size from the shader key") Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21282>	2023-02-14 21:29:42 +00:00
Jason Ekstrand	949b42c4dc	intel/compiler: Convert wm_prog_key::multisample_fbo to a tri-state This allows us to communicate to the back-end that we don't actually know if the framebuffer is multisampled or not. No drivers set anything but ALWAYS/NEVER and we still have a few ALWAYS/NEVER assumptions but those should be asserted. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21094>	2023-02-06 09:12:18 +00:00
Ian Romanick	ea413e826b	nir: Eliminate nir_op_f2b Builds on the work of !15121. This gets to delete even more code because many drivers shared a lot of code for i2b and f2b. No shader-db or fossil-db changes on any Intel platform. v2: Rebase on `1a35acd8d9`. v3: Update a comment in nir_opcodes_c.py. Suggested by Konstantin. v4: Another rebase. Remove f2b stuff from Midgard. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20509>	2023-02-03 22:39:57 +00:00
Sagar Ghuge	0c083d29a5	intel/fs: Always stall between the fences on Gen11+ Be conservative on Gfx11+ and always stall in a fence, since there are two different fences and a shader might want to synchronize between them. This change also brings back the original code block for the stall between the fence and comment from the commit `b390ff3517`. v2: (Caio) - Re-arrange code block. - Adjust comment. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6958 Fixes: `f7262462` ("intel/fs: Rework fence handling in brw_fs_nir.cpp") Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Tested-by: Mark Janes <markjanes@swizzler.org> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20996>	2023-02-02 00:21:21 +00:00
Amber	ab4c2990ed	intel/compiler: use lower_image_samples_to_one Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Signed-off-by: Amber Amber <amber@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20813>	2023-02-01 19:52:49 +00:00
Lionel Landwerlin	13cca48920	intel/fs: drop FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD_GFX7 We can lower FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD into other more generic sends and drop this internal opcode. The idea behind this change is to allow bindless surfaces to be used for UBO pulls and why it's interesting to be able to reuse setup_surface_descriptors(). But that will come in a later change. No shader-db changes on TGL & DG2. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20416>	2023-01-26 11:26:53 +00:00
Marcin Ślusarz	9bb18a4f9e	intel/compiler: fix generation of vec8/vec16 alu instruction I stumbled on this when I inserted some suboptimal lowering code after all optimizations. Adding certain subset of optimizations after my lowering code actually avoided this bug, so I think it's not possible to hit this on upstream. Let's fix this for the next person generating suboptimal code... Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20857>	2023-01-24 13:15:58 +00:00
Kenneth Graunke	16b66ab659	intel/compiler: Drop dest checking in atomic code NIR atomic operation intrinsics all have destinations. This is just copy and pasted from other generic intrinsic handling where that may or may not be the case. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>	2023-01-19 08:42:22 +00:00
Kenneth Graunke	780f3e2e6b	intel/compiler: Delete all the A64 atomic variants for type sizes These are handled identically in almost all cases. There is one place in the legacy surface lowering that was obtaining the bitsize from the opcode, but the LSC-based lowering uses (type_sz(inst->dst.type) * 8) for that and works just fine. If we just do that in the legacy lowering too, then we don't need this plethora of opcodes. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>	2023-01-19 08:42:22 +00:00
Kenneth Graunke	03ddde1230	intel/compiler: Combine nir_emit_{ssbo,shared}_atomic into one helper These are basically identical save for: - shared has surface hardcoded to SLM rather than an SSBO index - shared has to handle adding the 'base' const_index (SSBO have none) - the NIR source index for data is shifted by one It's not worth copy and pasting the entire function for this. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>	2023-01-19 08:42:22 +00:00
Kenneth Graunke	b84939c678	intel/compiler: Delete fs_visitor::nir_emit_{ssbo,shared}_atomic_float() These are now basically identical to their non-float counterparts. The only thing that differed was the opcode checking to determine which operands existed. Now that we have a unified opcode enum and a helper for the number of data operands, we can just use that. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>	2023-01-19 08:42:22 +00:00
Kenneth Graunke	f7b29d7924	intel/compiler: Drop redundant 32-bit expansion for shared float atomics We already expanded data to 32-bit a few lines earlier, so this is just redundantly doing it a second time. Fixes: `43169dbbe5` ("intel/compiler: Support 16 bit float ops") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>	2023-01-19 08:42:22 +00:00
Kenneth Graunke	02129eee3a	intel/compiler: Eliminate SHADER_OPCODE_UNTYPED_ATOMIC_FLOAT The only reason for the separate opcode was because of the overlapping BRW_AOP_* enums, making it impossible to tell whether a particular AOP was the integer or float operation. Now that we use the lsc_opcode enums, we can just have the legacy lowering inspect the opcode and select the right descriptor. No need for a separate opcode. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>	2023-01-19 08:42:22 +00:00
Kenneth Graunke	90a2137cd5	intel/compiler: Use LSC opcode enum rather than legacy BRW_AOPs This gets our logical atomic messages using the lsc_opcode enum rather than the legacy BRW_AOP_* defines. We have to translate one way or another, and using the modern set makes sense going forward. One advantage is that the lsc_opcode encoding has opcodes for both integer and floating point atomics in the same enum, whereas the legacy encoding used overlapping values (BRW_AOP_AND == 1 == BRW_AOP_FMAX), which made it impossible to handle both sensibly in common code. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>	2023-01-19 08:42:22 +00:00
Kenneth Graunke	8d2dc52a14	intel/compiler: Move atomic op translation into emit_*_atomic() There's no need to pass both the intrinsic and an opcode computed from that same intrinsic. Just do it in the functions themselves. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>	2023-01-19 08:42:22 +00:00
Francisco Jerez	4a2e7306dd	intel/fs/gfx12: Ensure that prior reads have executed before barrier with acquire semantics. This avoids a violation of the Vulkan memory model that was leading to intermittent failures of at least 8k test-cases of the Vulkan CTS (within the group dEQP-VK.memory_model.) on TGL and DG2 platforms. In theory the issue may be reproducible on earlier platforms like IVB and ICL, but the SYNC.ALLWR instruction is not available on those platforms so a different (likely costlier) fix will be needed. The issue occurs within the sequence we emit for a NIR memory barrier with acquire semantics requiring the synchronization of multiple caches, e.g. in pseudocode for a barrier involving the TGM and UGM caches on DG2: x <- load.ugm // Atomic read sequenced-before the barrier y <- fence.ugm z <- fence.tgm wait(y, z) w <- load.tgm // Read sequenced-after the barrier In the example we must provide the guarantee that the memory load for x is completed before the one for w, however this ordering can be reversed with the intervention of a concurrent thread, since the UGM fence will block on the prior UGM load and potentially take a long time, while the TGM fence may complete and invalidate the TGM cache immediately, so a concurrent thread could pollute the TGM cache with stale contents for the w location before the UGM load has completed, leading to an inversion of the expected memory ordering. v2: Apply the workaround regardless of whether the NIR barrier intrinsic specifies multiple storage classes or a single one, since an acquire barrier is required to order subsequent requests relative to previous atomic requests of unknown storage class not necessarily specified by the memory scope information of the intrinsic. Cc: mesa-stable Reviewed-by: Ivan Briano <ivan.briano@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20690>	2023-01-18 21:34:33 -08:00
Ian Romanick	eb76cee9f8	nir: Eliminate nir_op_i2b There are a lot of optimizations in opt_algebraic that match ('ine', a, 0), but there are almost none that match i2b. Instead of adding a huge pile of additional patterns (including variations that include both ine and i2b), always lower i2b to a != 0. At this point in the series, it should be impossible for anything to generate i2b, so there /should not/ be any changes. The failing test on d3d12 is a pre-existing bug that is triggered by this change. I talked to Jesse about it, and, after some analysis, he suggested just adding it to the list of known failures. v2: Don't rematerialize i2b instructions in dxil_nir_lower_x2b. v3: Don't rematerialize i2b instructions in zink_nir_algebraic.py. v4: Fix zink-on-TGL CI failures by calling nir_opt_algebraic after nir_lower_doubles makes progress. The latter can generate b2i instructions, but nir_lower_int64 can't handle them (anymore). v5: Add back most of the hunk at line 2125 of nir_opt_algebraic.py. I had accidentally removed the f2b(bf2(x)) optimization. v6: Just eliminate the i2b instruction. v7: Remove missed i2b32 in midgard_compile.c. Remove (now unused) emit_alu_i2orf2_b1 function from sfn_instr_alu.cpp. Previously this function was still used. 🤷 No shader-db changes on any Intel platform. All Intel platforms had similar results. (Ice Lake shown) Instructions in all programs: 141165875 -> 141165873 (-0.0%) Instructions helped: 2 Cycles in all programs: 9098956382 -> 9098956350 (-0.0%) Cycles helped: 2 The two Vulkan shaders are helped because of the "new" (('b2i32', ('ine', ('ubfe', a, b, 1), 0)), ('ubfe', a, b, 1)) algebraic pattern. Acked-by: Jesse Natalie <jenatali@microsoft.com> [earlier version] Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Tested-by: Daniel Schürmann <daniel@schuermann.dev> [earlier version] Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>	2022-12-14 06:23:21 +00:00
Kenneth Graunke	88918baf5c	intel/compiler: Delete key->msaa_16 None of the drivers have used this since we dropped i965, and BLORP no longer uses it as of the previous commit. We can also drop the former compressed_multisample_tex_mask (now padding) field so that things remain 64-bit aligned. Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20223>	2022-12-09 10:18:25 +00:00
Kenneth Graunke	584e18863e	intel: Drop compressed_multisample_layout_mask from the compiler keys The compiler looks at this key field to determine whether to perform an MCS fetch for a txf_ms or samples_identical texture message, if a nir_tex_src_ms_mcs_intel source wasn't provided. If it isn't set, it instead uses constant 0 (nothing is compressed). All of the drivers (iris, crocus, anv, hasvk) unconditionally set this to ~0 because we don't want to pay for costly shader recompiles (which can cause nasty stuttering). Most textures are compressed anyway, and the hardware ignores the l2dms MCS parameter if MCS is disabled. The only user was BLORP, which sets the key field based on whether the texture's aux usage has MCS. But if it has MCS, it also does the MCS fetch itself and supplies it directly. Otherwise, it relies on the compiler to fill in the 0 value. But it could easily just provide the 0 value itself in that case and not rely on the compiler at all. With that fixed, we can just drop the key fields entirely. We leave them as padding for now to avoid repacking structures; we won't need to after the next commits anyway. Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20223>	2022-12-09 10:18:25 +00:00
Jason Ekstrand	7d2e3f660c	intel/fs: Support load_workgroup_id_zero_base Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20068>	2022-12-01 04:56:48 +00:00
Lionel Landwerlin	9c1c1888d9	intel/fs: put scratch surface in the surface state heap In `4ceaed7839` we made scratch surface state allocations part of the internal heap (mapped to STATE_BASE_ADDRESS::SurfaceStateBaseAddress) so that it doesn't use slots in the application's expected 1M descriptors (especially with vkd3d-proton). But all our compiler code relies on BSS (STATE_BASE_ADDRESS::BindlessSurfaceStateBaseAddress). The additional issue is that there are only 26 bits of surface offset available in CS instructions (CFE_STATE, 3DSTATE_VS, etc...) for scratch surfaces. So we need the drivers to put the scratch surfaces in the first chunk of STATE_BASE_ADDRESS::SurfaceStateBaseAddress (hence all the driver changes). Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `4ceaed7839` ("anv: split internal surface states from descriptors") Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7687 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19727>	2022-11-19 14:58:58 +00:00
Ian Romanick	351b8c6aec	intel/fs: Enable nir_op_imul_32x16 and nir_op_umul_32x16 on pre-Gfx7 Even though Intel's CI doesn't test these old platforms anymore, the validation added in "intel/eu/validate: Validate integer multiplication source size restrictions" combined with full shader-db runs gives me confidence in the changes. Sandy Bridge total instructions in shared programs: 13902341 -> 13902167 (<.01%) instructions in affected programs: 30771 -> 30597 (-0.57%) helped: 66 / HURT: 0 total cycles in shared programs: 741795500 -> 741791931 (<.01%) cycles in affected programs: 987602 -> 984033 (-0.36%) helped: 28 / HURT: 5 Iron Lake total instructions in shared programs: 8365806 -> 8365754 (<.01%) instructions in affected programs: 1766 -> 1714 (-2.94%) helped: 10 / HURT: 0 total cycles in shared programs: 248542694 -> 248542378 (<.01%) cycles in affected programs: 29836 -> 29520 (-1.06%) helped: 9 / HURT: 0 GM45 total instructions in shared programs: 5187127 -> 5187101 (<.01%) instructions in affected programs: 891 -> 865 (-2.92%) helped: 5 / HURT: 0 total cycles in shared programs: 163643914 -> 163643750 (<.01%) cycles in affected programs: 22206 -> 22042 (-0.74%) helped: 5 / HURT: 0 Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19602>	2022-11-09 21:34:26 +00:00
Ian Romanick	293ad13e3f	intel/fs: Slightly restructure emitting nir_op_imul_32x16 and nir_op_umul_32x16 There are no immediate values at this point, so all of this code was bunk. :face_palm: Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19602>	2022-11-09 21:34:26 +00:00
Caio Oliveira	22d8ed84b8	intel/compiler: Remove unused fs_visitor::emit_percomp() Since `7ef7738a61` ("i965: Write gl_FragCoord directly to the destination.") this is not used. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19586>	2022-11-08 07:33:09 +00:00
Rohan Garg	43169dbbe5	intel/compiler: Support 16 bit float ops Signed-off-by: Rohan Garg <rohan.garg@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17988>	2022-10-17 15:56:28 +02:00
Lionel Landwerlin	9dba8d8aa1	intel/fs: take a builder arg for resolve_source_modifiers() There will be situations where we will want to use a local builder rather than the one associated with NIR->backend translation. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16970>	2022-09-28 05:38:36 +00:00
José Roberto de Souza	f4857591e1	intel/compiler/fs: Use DF to load constants when has_64bit_int is not supported This has already been done for gen7 platforms, so now extending to all platforms without has_64bit_int. Reviewed-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Matt Turner <mattst88@gmail.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18577>	2022-09-14 19:32:43 +00:00
Caio Oliveira	0b6e613de8	intel/compiler: Create and use struct for CS thread payload Move subgroup_id, that's only used by CS for verx10 < 125, as part of the payload too -- even though it is not, strictly speaking. Note the thread execution of Task/Mesh is similar enough, so we make their common struct inherit from cs_thread_payload. Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18176>	2022-09-13 01:44:24 +00:00
Caio Oliveira	9de790760e	intel/compiler: Create and use struct for Bindless thread payload Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com> Acked-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18176>	2022-09-13 01:44:24 +00:00
Caio Oliveira	a70378f292	intel/compiler: Store start of ICP handles in GS thread payload struct Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com> Acked-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18176>	2022-09-13 01:44:24 +00:00
Caio Oliveira	5b6987daee	intel/compiler: Create and use struct for GS thread payload Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com> Acked-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18176>	2022-09-13 01:44:24 +00:00
Caio Oliveira	19c6e1b447	intel/compiler: Create and use struct for TES thread payload Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Acked-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18176>	2022-09-13 01:44:24 +00:00
Caio Oliveira	eb837dd23b	intel/compiler: Store start of ICP handles in TCS thread payload struct Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Acked-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18176>	2022-09-13 01:44:24 +00:00
Caio Oliveira	2622fc3af1	intel/compiler: Store Primitive ID in TCS thread payload struct Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Acked-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18176>	2022-09-13 01:44:24 +00:00
Caio Oliveira	9a9b1119b4	intel/compiler: Store Patch URB output in TCS thread payload struct Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Acked-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18176>	2022-09-13 01:44:24 +00:00
Jordan Justen	af8ab4a889	intel/compiler: Use builder to allocate fs regs for gs control data bits Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18537>	2022-09-12 10:00:28 -07:00
Caio Oliveira	00b8f9a3a6	intel/compiler: Use builder to allocate fs regs for TCS store output Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18537>	2022-09-12 10:00:18 -07:00
Caio Oliveira	55db3aaa3a	intel/compiler: Create fs_visitor::emit_tcs_barrier() Allow us to implement this in brw_fs_visitor.cpp, which then will let us deduplicate code between the CS-like barrier and the TCS barrier in a later patch. Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18362>	2022-09-09 09:35:08 -07:00
Kenneth Graunke	19fc870ac6	intel/compiler: Use subgroup invocation for ICP handle loads When loading a TCS or GS input, we generate some code to read the URB handle for a particular input control point (ICP handle), which often involves indirect addressing due to a non-constant vertex. For example: mov(8) vgrf148+0.0:UW, 76543210V shl(8) vgrf149:UD, vgrf148+0.0:UW, 2u shl(8) vgrf150:UD, vgrf145:UD, 5u add(8) vgrf151:UD, vgrf150:UD, vgrf149:UD mov_indirect(8) vgrf147:UD, g2:UD, vgrf151:UD, 96u Unfortunately, the first load with 76543210V is considered a partial write because the 8 channels of 16-bit UW data doesn't fill an entire register, and we can't allocate VGRFs at sub-register granularity. This causes none of the above math to be CSE'd, even though the first two instructions are common to all input loads, and the rest may be reused sometimes as well. To work around this, we stop emitting 76543210V to a temporary, and instead use nir_system_values[SYSTEM_VALUE_SUBGROUP_INVOCATION], which already contains this value, and is unconditionally set up for us. With all input loads using the same register for the sequence, our CSE pass is able to eliminate the rest of the common math. shader-db results on Tigerlake: total instructions in shared programs: 20748243 -> 20744844 (-0.02%) instructions in affected programs: 73410 -> 70011 (-4.63%) helped: 242 / HURT: 21 helped stats (abs) min: 1 max: 37 x̄: 14.17 x̃: 15 helped stats (rel) min: 0.17% max: 19.58% x̄: 6.13% x̃: 6.32% HURT stats (abs) min: 1 max: 4 x̄: 1.38 x̃: 1 HURT stats (rel) min: 0.18% max: 1.31% x̄: 0.58% x̃: 0.58% 95% mean confidence interval for instructions value: -13.73 -12.12 95% mean confidence interval for instructions %-change: -6.00% -5.19% Instructions are helped. total cycles in shared programs: 785828951 -> 785788480 (<.01%) cycles in affected programs: 597593 -> 557122 (-6.77%) helped: 227 / HURT: 13 helped stats (abs) min: 6 max: 624 x̄: 182.19 x̃: 185 helped stats (rel) min: 0.24% max: 18.22% x̄: 7.85% x̃: 7.80% HURT stats (abs) min: 2 max: 153 x̄: 68.08 x̃: 36 HURT stats (rel) min: 0.03% max: 7.79% x̄: 2.97% x̃: 1.25% 95% mean confidence interval for cycles value: -182.55 -154.71 95% mean confidence interval for cycles %-change: -7.84% -6.69% Cycles are helped. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18455>	2022-09-08 15:12:41 +00:00
Marcin Ślusarz	66bc9aec65	intel/compiler: add support for non-zero base in [load\|store]_shared intrins Acked-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17618>	2022-08-29 12:42:40 +00:00
Lionel Landwerlin	407f2beb97	intel/fs: port block a64/surface messages to use LSC v2: Fixup block load/store on surfaces/shared-memory (Rohan) v3: drop write specific size_written case (Rohan) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17555>	2022-08-24 17:51:40 +00:00
Caio Oliveira	bee2df64d2	intel/compiler: Use fs_reg helpers for GS icp_handle selection Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18221>	2022-08-24 01:42:23 +00:00
Caio Oliveira	b4aff6ab49	intel/compiler: Use fs_reg helpers for TCS icp_handle selection Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18221>	2022-08-24 01:42:22 +00:00
Caio Oliveira	a1b1fdf70d	intel/compiler: Rename 8_PATCH to MULTI_PATCH Make it clearer we are dealing with multiple patches, works better in contrast with SINGLE_PATCH. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18151>	2022-08-24 00:39:57 +00:00
Lionel Landwerlin	3c78e94ff3	intel/fs: fixup scratch load/store handling on Gfx12.5+ We did not handle the operation with data size < 4. It works fine on all other messages (global/shared). The initial commit was just too restrictive. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `1e242785c3` ("intel/fs: Implement load/store_scratch on XeHP") Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16964>	2022-08-23 22:19:16 +00:00
Lionel Landwerlin	46a13404c0	intel/fs: fix load_scratch intrinsic The selection of the internal opcode to deal with load_scratch is incorrect. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `c643979228` ("intel/fs: Choose memory message type based on bit size") Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16964>	2022-08-23 22:19:16 +00:00
Kenneth Graunke	2cea0d6ef6	intel/compiler: Drop variable group size lowering This backend lowering code has been dead since the removal of i965 - nothing in the current source tree ever sets the flag. This is handled by iris_setup_uniforms() and crocus_setup_uniforms(). Variable group size does not appear to be a feature in anv. Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18055>	2022-08-18 16:17:03 +00:00
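
Commit 9ac192d79d in the list above bounds the subgroup invocation read with a mask rather than a comparison. The following is a minimal Python sketch of that bit trick only; it is not the backend code from the commit. The names `bound_invocation` and `dispatch_width` are hypothetical and chosen for illustration, and the sketch assumes the width is a power of two, as the commit message states.

```python
def bound_invocation(index: int, dispatch_width: int) -> int:
    """Wrap a subgroup invocation index into [0, dispatch_width).

    Hypothetical helper for illustration; not a Mesa function.
    """
    # The mask trick is only valid when the width is a power of two:
    # then (width - 1) has all low bits set and AND acts like modulo.
    if dispatch_width <= 0 or dispatch_width & (dispatch_width - 1) != 0:
        raise ValueError("dispatch_width must be a positive power of two")
    return index & (dispatch_width - 1)


# An index that is already in range is unchanged; one that exceeds
# the dispatch width wraps back into range instead of reading
# past the last valid channel.
in_range = bound_invocation(5, 8)
wrapped = bound_invocation(19, 16)
```

The mask avoids a compare-and-select pair, at the cost of wrapping an out-of-range index to some other valid channel instead of clamping it to the last one.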


446 commits