fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-21 20:10:14 +01:00

Author	SHA1	Message	Date
Ian Romanick	556e78f737	intel/brw/xe2+: Allow vec16 for cooperative matrix Xe2 will allow a B matrix large enough that it will be stored in a vec16. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28834>	2024-06-25 14:17:47 -07:00
Ian Romanick	b6236dd8f3	intel/brw/xe2+: Adjust DPAS lowering to DP4A to accommodate larger GRF and SIMD16 Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28834>	2024-06-25 14:17:47 -07:00
Ian Romanick	77ef241577	intel/brw/xe2+: Scale size_written by reg_unit for DPAS Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28834>	2024-06-25 14:17:47 -07:00
Ian Romanick	e368b8e01b	intel/brw/xe2+: Adjust size_read() for DPAS v2: Remove "DG2" from a comment because it applies to DG2 and Xe2. Suggested by Caio. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28834>	2024-06-25 14:17:47 -07:00
Ian Romanick	b051602754	intel/brw/xe2+: Catch invalid uses of writes_accumulator earlier It turns out the problem I was trying to catch in `be4fa59a72` ("intel/brw: Clear write_accumulator flag when changing the destination") also came from the DPAS lowering pass itself. Checking for invalid uses of the feature in fs_validate helped detect the problem. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28834>	2024-06-25 14:17:47 -07:00
Ian Romanick	7a773ac53e	intel/brw: Major rework of lower_cmat_load_store The original goal was to get rid of a bunch of the magic constants sprinkled through the function. Once I did that, I realized that there was a lot more symmetry possible between the row-major and column-major paths. It's +6 lines of code, but about 15 of those lines are comments explaining things that were not obvious in the original code. v2: Save duplicated condition in a variable with a meaningful name. Suggested by Caio. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28834>	2024-06-25 14:16:48 -07:00
Ian Romanick	ea6e10c0b2	intel/brw: Temporarily disable result=float16 matrix configs Even though the hardware does not natively support these configurations, there are many potential benefits to advertising them. These configurations can theoretically use half the memory bandwidth for loads and stores. For large matrices, that can be the limiting factor in performance. The current implementation, however, has a number of significant problems. The conversion from float16 to float32 is performed in the driver during conversion from NIR. As a result, many common usage patterns end up doing back-to-back conversions to and from float16 between matrix multiplications (when the result of one multiplication is used as the accumulator for the next). The float16 version of the matrix wastes half the possible register space. Each float16 value sits alone in a dword. This is done so that the per-invocation slice of an 8x8 float16 result matrix and an 8x8 float32 result matrix will have the same number of elements. This makes it possible to do straightforward implementations of all the unary_op type conversions in NIR. It would be possible to perform N:M element type conversions in the backend using specialized NIR intrinsics. However, per #10961, this would be very, very painful. My hope is that, once a suitable resolution for that issue can be found, support for these configs can be restored. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28834>	2024-06-25 13:52:12 -07:00
Kenneth Graunke	5cb15a6c67	intel/brw: Make bld.ADD(x, 0) emit no instructions and return x directly There are a lot of places where we add 0 to an offset. Avoiding generating this can save us algebraic + copy_propagation later. Cuts compile time in Borderlands 3 by -0.590631% +/- 0.170108% (n=25). Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29849>	2024-06-24 19:12:21 -07:00
Kenneth Graunke	068865ce81	intel/brw: Make an alu2 builder helper Instead of replicating the whole thing in macros, just make an alu2() function and use that in the wrappers. It ought to get inlined anyway. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29849>	2024-06-24 19:12:19 -07:00
Kenneth Graunke	c18de3f048	intel/brw: Delay liveness calculations in saturate propagation Wait and see if we actually have a candidate for saturate propagation before requesting liveness info. Saves the calculation in the case where we have nothing to do. Cuts compile time in Borderlands 3 by -0.304754% +/- 0.194162% (n=25). Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29849>	2024-06-24 19:12:00 -07:00
Caio Oliveira	b59ea3d63f	intel/brw: Print SWSB information when dumping instructions These were only being shown before as part of disassemble. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29738>	2024-06-23 08:09:56 -07:00
Alyssa Rosenzweig	da752ed7c1	treewide: use nir_def_replace sometimes Two Coccinelle patches here. Didn't catch nearly as much as I would've liked but it's a start. Coccinelle patch: @@ expression intr, repl; @@ -nir_def_rewrite_uses(&intr->def, repl); -nir_instr_remove(&intr->instr); +nir_def_replace(&intr->def, repl); Coccinelle patch: @@ identifier intr; expression instr, repl; @@ nir_intrinsic_instr *intr = nir_instr_as_intrinsic(instr); ... -nir_def_rewrite_uses(&intr->def, repl); -nir_instr_remove(instr); +nir_def_replace(&intr->def, repl); Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com> [broadcom] Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com> [lima] Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com> [etna] Reviewed-by: Pavel Ondračka <pavel.ondracka@gmail.com> [r300] Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29817>	2024-06-21 15:36:56 +00:00
Lionel Landwerlin	339630ab05	brw: enable A64 loads source rematerialization This allows avoiding Wa_1407528679 on A64 loads. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29663>	2024-06-21 08:29:44 +00:00
Lionel Landwerlin	f482fc33cf	brw: blockify load_global_const_block_intel This intrinsic is pretty much equivalent to load_global_constant_uniform_block_intel, it just has a predicate. If the predicate is always true we can turn it into the other. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29663>	2024-06-21 08:29:44 +00:00
Lionel Landwerlin	6fe6b9c8fa	brw: avoid Wa_1407528679 in uniform cases When the surface handles are generated with exec_all, we can avoid emitting the workaround. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29663>	2024-06-21 08:29:44 +00:00
Lionel Landwerlin	5227b2db73	brw: annotate send instructions with surface handles generated with exec_all Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29663>	2024-06-21 08:29:44 +00:00
Lionel Landwerlin	b79e85a93f	brw: always use new registers for load address increments Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29663>	2024-06-21 08:29:44 +00:00
Lionel Landwerlin	7f1ca16e3b	brw: enable rematerialization of non 32bit uniforms Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29663>	2024-06-21 08:29:44 +00:00
Lionel Landwerlin	0531f568ac	brw: remove some brackets Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29663>	2024-06-21 08:29:44 +00:00
Lionel Landwerlin	11a634151b	brw: remove rematerialization assert The default case should lead us to the next rematerialization block so this is useless. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29663>	2024-06-21 08:29:44 +00:00
Lionel Landwerlin	d42bc0d3fc	brw: bound the amount of rematerialized NIR instructions Some of the instructions we don't need to rematerialize because we already know they are executed with NoMask so we can use their destination without reemitting them again. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29663>	2024-06-21 08:29:44 +00:00
Lionel Landwerlin	4bfb4f35a8	brw: improve rematerialization of surface/sampler handles This change handles patterns like this: con v0 = load_ubo ... con v1 = add v0, 0x30 con v2 = load_ubo v1, 0x0 Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29663>	2024-06-21 08:29:44 +00:00
Lionel Landwerlin	c7b312ad45	brw: factor out source extraction for rematerialization Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29663>	2024-06-21 08:29:44 +00:00
Lionel Landwerlin	8fbbc9c301	brw: add missing break Not fixing anything because of the default case below. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29663>	2024-06-21 08:29:44 +00:00
Francisco Jerez	c1feccdd90	intel/fs/gfx20+: Fix surface state address on extended descriptors for NIR scratch intrinsics. The r0.5 thread payload register contains Surface State Offset bits [27:6] as bits [31:10], so we need to shift the register right by 4 in order to get the surface state offset expected in ExBSO mode, which is the only extended descriptor encoding supported by the UGM shared function for SS addressing on Xe2+. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29543>	2024-06-21 01:49:43 +00:00
Kenneth Graunke	50519598ff	intel/brw: Skip discarding the interference graph We no longer need to reserve registers for constructing spill/fill messages. We have split sends and construct message headers in new temporary registers with a very short lifespan which are simply added to the existing interference graph as new nodes and allocated via the normal mechanism. This means that when we need to spill for the first time, we can avoid discarding and recomputing the entire interference graph. We also avoid needing to recreate all spill candidate information once ra_allocate() fails, because the graph remains valid, and none of the existing nodes had any changes to their interference. The existing spill candidates remain valid. This will slightly help improve compile time when needing to spill. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25811>	2024-06-20 09:47:18 +00:00
Kenneth Graunke	29d6264627	intel/brw: Build the scratch header on the fly for pre-LSC systems Instead of reserving a register to contain the spill header, which gets marked live for the entire program, we can just emit the ALU instructions to build it on the fly. (This is similar to the way we handle scratch on Alchemist with the newer LSC data port.) There are a couple of downsides that make this not obviously a win. First, in order to construct the scratch header on Gfx9-12, we have to use fields from g0, which will have to remain live anywhere that scratch access is required. This could negate the register pressure benefits of creating the header on the fly. However, g0 is oft used in other places anyway, so it may already be there. Another is that it's a non-trivial number of ALU instructions to construct the value. Still, trading lower pressure (so fewer spills, less memory access and stalls) for more cheap ALU seems like it ought to be a win. There is another valuable benefit: by not reserving a register, we eliminate the need to reconstruct the interference graph. (The next patch will actually do so.) shader-db on Icelake shows spills/fills at 54/53 helped, 4/10 hurt, and an 8% increase in ALU on affected shaders. Synmark's OglCSDof (a benchmark that spills) performance remains the same on Alderlake. fossil-db on Icelake shows a 5.6%/5.1% reduction in spills/fills and a 4% reduction in scratch memory size on affected shaders. Instruction counts go up by 11.07%, but cycle estimates only increase by 0.57%. Assassin's Creed Odyssey and Wolfenstein Youngblood both see 20-30% reductions in spills/fills, a significant improvement. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25811>	2024-06-20 09:47:18 +00:00
Caio Oliveira	2a9f4618c5	intel/brw: Make component_size() consistent between VGRF and FIXED_GRF Change so the size rounds up to the next multiple of the horizontal stride like is done for VGRF. This was causing an inconsistency in regs_read() -- The original component_size() calculation for FIXED_GRF excluded any padding at the end but it was still being discounted by regs_read(). Suggested by Curro. Fixes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11069 Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29736>	2024-06-19 01:33:58 +00:00
Caio Oliveira	8fb70f0746	intel/brw: Add unit tests for scoreboard handling FIXED_GRF with stride Based on shaders reported in https://gitlab.freedesktop.org/mesa/mesa/-/issues/11069 and https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29723. These currently fail, later patch will enable them. Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29736>	2024-06-19 01:33:58 +00:00
Caio Oliveira	f982d2bb79	intel/brw: Fix typo in DPAS emission code The enums were mixed up. Code was working because they were being used only for their numerical values. Fixes: `e666872c75` ("intel/compiler: Initial bits for DPAS instruction") Acked-by: Iván Briano <ivan.briano@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29762>	2024-06-18 18:25:21 +00:00
Kenneth Graunke	9e750f00c3	intel/brw: Make opt_copy_propagation_defs clean up its own trash Copy propagation often eliminates all uses of an instruction. If we detect that we've done so, we can eliminate the instruction ourselves rather than leaving it hanging until the next DCE pass. This saves some CPU time as other passes don't see dead code. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28666>	2024-06-18 09:02:25 +00:00
Kenneth Graunke	2af84c2d49	intel/brw: Use the defs-based copy propagation along with the old one The new def-based pass works better in many cases, and should be less resource intensive. However, the limited visibility of the defs-based pass due to many values not being SSA yet makes it unable to fully replace the old pass. Try the new one, and if it can't make progress, then try the old one. That way, things will mostly be handled by the new pass, but everything that was being cleaned up still will be. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28666>	2024-06-18 09:02:25 +00:00
Kenneth Graunke	580e1c592d	intel/brw: Introduce a new SSA-based copy propagation pass (Quite a few of the restrictions here are ported from the old pass.) Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28666>	2024-06-18 09:02:25 +00:00
Kenneth Graunke	9690bd369d	intel/brw: Delete old local common subexpression elimination pass We no longer use this older pass, so there's no need to keep it. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28666>	2024-06-18 09:02:25 +00:00
Kenneth Graunke	8f09c58ddc	intel/brw: Switch to the new defs-based global CSE pass While the limited visibility due to partial SSA is a downside to the new pass, it has a huge number of advantages that make it worth switching over even now. It's much more efficient, can eliminate redundant memory loads across blocks, and doesn't generate loads of unnecessary copies that other passes have to clean up. This means we also eliminate the infighting between the old CSE, coalescing, and copy propagation passes. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28666>	2024-06-18 09:02:25 +00:00
Kenneth Graunke	234c45c929	intel/brw: Write a new global CSE pass that works on defs This has a number of advantages compared to the pass I wrote years ago: - It can easily perform either Global CSE or block-local CSE, without needing to roll any dataflow analysis, thanks to SSA def analysis. This global CSE is able to detect and coalesce memory loads across blocks. Although it may increase spilling a little, the reduction in memory loads seems to more than compensate. - Because SSA guarantees that values are never written more than once, the new CSE pass can directly reuse an existing value. The old pass emitted copies at the point where it discovered a value because it had no idea whether it'd be mutated later. This led it to generate a ton of trash for copy propagation to clean up later, and also a nasty fragility where CSE, register coalescing, and copy propagation could all fight one another by generating and cleaning up copies, leading to infinite optimization loops unless we were really careful. Generating less trash improves our CPU efficiency. - It uses hash tables like nir_instr_set and nir_opt_cse, instead of linearly walking lists and comparing each element. This is much more CPU efficient. - It doesn't use liveness analysis, which is one of the most expensive analysis passes that we have. Def analysis is cheaper. In addition to CSE'ing SSA values, we continue to handle flag writes, as this is a huge source of CSE'able values. These remain block local. However, we can simply track the last flag write, rather than creating entire sets of instruction entries like the old pass. Much simpler. The only real downside to this pass is that, because the backend is currently only partially SSA, it has limited visibility and isn't able to see all values. However, the results appear to be good enough that the new pass can effectively replace the old pass in almost all cases. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28666>	2024-06-18 09:02:25 +00:00
Kenneth Graunke	2b30b3bbd4	intel/brw: Print defs in dump_instructions Like NIR, we print SSA defs as %1, %2, and so on. The number here is the VGRF number. VGRFs that don't correspond to a SSA def remain printed as vgrf1, vgrf2, and so on. This makes it much easier to see what values are SSA and which aren't. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28666>	2024-06-18 09:02:25 +00:00
Caio Oliveira	08da7edc0e	intel/brw: Track the number of uses of each def in def_analysis Even without a full use list, simply tracking the number of uses will let us tell "this is the only use of the def" or "we've just replaced all uses of a def". It's inexpensive to calculate and will be useful. (rebased by Kenneth Graunke) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28666>	2024-06-18 09:02:25 +00:00
Kenneth Graunke	0d144821f0	intel/brw: Add a new def analysis pass This introduces a new analysis pass that opportunistically looks for VGRFs which happen to satisfy the SSA definition properties. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28666>	2024-06-18 09:02:25 +00:00
Kenneth Graunke	ad9e414aa9	intel/brw: Skip LOAD_PAYLOADs after every texture instruction if possible This avoids generating a bunch of trash we have to clean up later. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28666>	2024-06-18 09:02:25 +00:00
Kenneth Graunke	84219892ad	intel/brw: Make gl_SubgroupInvocation lane index loading SSA Our code to initialize gl_SubgroupInvocation uses multiple instructions some of which are partial writes. This makes it difficult to analyze expressions involving gl_SubgroupInvocation, which appear very frequently in compute shaders. To make this easier, we add a new virtual opcode which initializes a full VGRF to the value of gl_SubgroupInvocation. (We also expand it to UD for SIMD8 so there are not partial write issues.) We then lower it to the original code later on in compilation, after we've done the bulk of our optimizations. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28666>	2024-06-18 09:02:25 +00:00
Kenneth Graunke	344d4ee9f0	intel/brw: Make VEC() perform a single write to its destination. This gathers a number of sources into a contiguous vector register, typically using LOAD_PAYLOAD. However, it uses MOV for a single source. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28666>	2024-06-18 09:02:25 +00:00
Francisco Jerez	06e4e088a3	intel/brw/xe2+: Use active-thread-only barriers available since Xe2+. These allow avoiding dead-locks in non-compliant applications that execute barriers under non-uniform control flow. They're not expected to have any major disadvantage so let's enable them unconditionally. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29562>	2024-06-17 16:19:18 -07:00
Alyssa Rosenzweig	15257b65c6	treewide: use nir_metadata_control_flow Via Coccinelle patch: @@ @@ -nir_metadata_block_index | nir_metadata_dominance +nir_metadata_control_flow ...plus some manual fixups for call sites missed by coccinelle. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Acked-by: Karol Herbst <kherbst@redhat.com> Acked-by: Juan A. Suarez Romero <jasuarez@igalia.com> [broadcom] Acked-by: Vasily Khoruzhick <anarsoul@gmail.com> [lima] Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29745>	2024-06-17 16:28:14 -04:00
Daniel Schürmann	7af16e9f1e	nir/shader_info: remove uses_demote This flag is mostly redundant with uses_discard and was only introduced to implement demote with LLVM when it didn't have that intrinsic. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27617>	2024-06-17 19:37:16 +00:00
Daniel Schürmann	9b1a748b5e	nir: remove nir_intrinsic_discard The semantics of discard differ between GLSL and HLSL and their various implementations. Subsequently, numerous application bugs occurred and SPV_EXT_demote_to_helper_invocation was written in order to clarify the behavior. In NIR, we now have 3 different intrinsics for 2 things, and while demote and terminate have clear semantics, discard still doesn't and can mean either of the two. This patch entirely removes nir_intrinsic_discard and nir_intrinsic_discard_if and replaces all occurrences either with nir_intrinsic_terminate{_if} or nir_intrinsic_demote{_if} in the case that the NIR option 'discard_is_demote' is being set. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27617>	2024-06-17 19:37:16 +00:00
Daniel Schürmann	f3d8bd18dd	nir: introduce discard_is_demote compiler option This new option indicates that the driver emits the same code for nir_intrinsic_discard and nir_intrinsic_demote. Otherwise, it is assumed that discard is implemented as terminate. spirv_to_nir uses this option in order to directly emit nir_demote in case of OpKill. RADV GFX11: Totals from 3965 (4.99% of 79439) affected shaders: MaxWaves: 119418 -> 119424 (+0.01%); split: +0.03%, -0.03% Instrs: 1608753 -> 1620830 (+0.75%); split: -0.18%, +0.93% CodeSize: 8759152 -> 8785152 (+0.30%); split: -0.18%, +0.48% VGPRs: 152292 -> 149232 (-2.01%); split: -2.37%, +0.36% Latency: 9162314 -> 10033923 (+9.51%); split: -0.46%, +9.97% InvThroughput: 1491656 -> 1493408 (+0.12%); split: -0.10%, +0.22% VClause: 21424 -> 21452 (+0.13%); split: -0.31%, +0.44% SClause: 53598 -> 55871 (+4.24%); split: -2.15%, +6.39% Copies: 90553 -> 90462 (-0.10%); split: -2.91%, +2.81% Branches: 16283 -> 16311 (+0.17%) PreSGPRs: 113993 -> 113254 (-0.65%); split: -1.84%, +1.19% PreVGPRs: 110951 -> 108914 (-1.84%); split: -2.08%, +0.24% VALU: 963192 -> 963167 (-0.00%); split: -0.01%, +0.01% SALU: 87926 -> 90795 (+3.26%); split: -2.92%, +6.18% VMEM: 25937 -> 25936 (-0.00%) SMEM: 110012 -> 109799 (-0.19%); split: -0.20%, +0.01% Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27617>	2024-06-17 19:37:15 +00:00
Lionel Landwerlin	13dc2a28ce	intel/fs: fix lower_simd_width for MOV_INDIRECT MOV_INDIRECT picks one lane from the src[0] and moves it to all lanes in the destination. Even if we split the instruction, src[0] should remain identical. Noticed this while trying to use this instruction in SIMD32. All current use cases are limited to SIMD8 shaders (or SIMD16 on Xe2). Or maybe in SIMD32 but with a uniform src[0]. That's why we think we've never seen the issue so far. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: mesa-stable Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28036>	2024-06-14 22:21:26 +00:00
Sviatoslav Peleshko	5ca51156e2	intel/elk: Actually retype integer sources of sampler message payload According to PRMs: "All parameters are of type IEEE_Float, except those in the ld*, resinfo, and the offu, offv of the gather4_po[_c] instruction message types, which are of type signed integer." Currently, we load parameters with the correct types, but use them as send sources with the default float type, which may confuse passes downstream. Fix this by actually storing the retyped sources. Cc: mesa-stable Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29581>	2024-06-12 18:59:17 +00:00
Sviatoslav Peleshko	2358c997f3	intel/brw: Actually retype integer sources of sampler message payload According to PRMs: "All parameters are of type IEEE_Float, except those in the ld*, resinfo, and the offu, offv of the gather4_po[_c] instruction message types, which are of type signed integer." Currently, we load parameters with the correct types, but use them as send sources with the default float type, which may confuse passes downstream. Fix this by actually storing the retyped sources. Cc: mesa-stable Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11118 Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29581>	2024-06-12 18:59:17 +00:00
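
Commit 5cb15a6c67 in the list above describes making bld.ADD(x, 0) emit no instructions and return x directly. The idea can be sketched roughly as follows. This is only an illustration under stated assumptions: Value, Inst, ToyBuilder and add_zero_example are hypothetical names invented here, not Mesa's actual builder classes, and the real change may handle more cases than a literal zero second operand.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical toy IR; these types are illustrative and are not Mesa's brw classes.
struct Value {
    bool is_imm;   // true for an immediate constant
    int64_t imm;   // constant payload when is_imm is true
    int reg;       // virtual register number otherwise (-1 for immediates)
};

struct Inst { Value dst, src0, src1; };

struct ToyBuilder {
    std::vector<Inst> insts;   // instructions emitted so far
    int next_reg = 0;

    static Value imm(int64_t v) { return Value{true, v, -1}; }
    Value fresh() { return Value{false, 0, next_reg++}; }

    // Adding an immediate 0 is the identity, so return x and emit nothing.
    // Otherwise allocate a destination and record an ADD.
    Value add(Value x, Value y) {
        if (y.is_imm && y.imm == 0)
            return x;
        Value dst = fresh();
        insts.push_back(Inst{dst, x, y});
        return dst;
    }
};

// Usage example: an offset computation that adds 0 costs no instruction.
inline void add_zero_example() {
    ToyBuilder bld;
    Value base = bld.fresh();
    Value same = bld.add(base, ToyBuilder::imm(0));
    assert(same.reg == base.reg);   // the input is returned unchanged
    assert(bld.insts.empty());      // nothing was emitted
}
```

Folding at build time means later algebraic and copy-propagation passes never see the redundant instruction, which is the compile-time saving the commit message points to.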


3563 commits
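
Commit 234c45c929 in the list above describes a CSE pass that relies on SSA defs and hash tables: because a def is written only once, a repeated expression can reuse the earlier def directly instead of emitting a copy. A minimal sketch of that idea, assuming a toy instruction form dst = op(src0, src1), might look like this. SsaInst, ExprKey, ExprKeyHash and toy_cse are hypothetical names for this illustration; the actual pass also deals with flag writes, memory loads and the non-SSA values the commit message mentions, none of which are modeled here.

```cpp
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <vector>

// Hypothetical toy SSA instruction: dst = op(src0, src1). Not Mesa's brw IR.
struct SsaInst { int op, dst, src0, src1; };

// Hash-table key describing the expression an instruction computes.
struct ExprKey {
    int op, src0, src1;
    bool operator==(const ExprKey &o) const {
        return op == o.op && src0 == o.src0 && src1 == o.src1;
    }
};

struct ExprKeyHash {
    std::size_t operator()(const ExprKey &k) const {
        std::size_t h = std::hash<int>()(k.op);
        h = h * 31 + std::hash<int>()(k.src0);
        h = h * 31 + std::hash<int>()(k.src1);
        return h;
    }
};

// Drops instructions whose expression was already computed and rewrites
// later sources to use the surviving def. Returns the remaining instructions.
std::vector<SsaInst> toy_cse(const std::vector<SsaInst> &in) {
    std::unordered_map<ExprKey, int, ExprKeyHash> seen;  // expression -> def holding it
    std::unordered_map<int, int> remap;                  // eliminated def -> surviving def
    std::vector<SsaInst> out;

    auto resolve = [&remap](int v) {
        auto it = remap.find(v);
        return it == remap.end() ? v : it->second;
    };

    for (SsaInst inst : in) {
        inst.src0 = resolve(inst.src0);
        inst.src1 = resolve(inst.src1);
        ExprKey key{inst.op, inst.src0, inst.src1};
        auto res = seen.emplace(key, inst.dst);
        if (res.second)
            out.push_back(inst);                 // first time this expression is seen
        else
            remap[inst.dst] = res.first->second; // reuse the earlier def, emit no copy
    }
    return out;
}
```

Each lookup is a single hash probe rather than a walk over a list of candidates, and no copies are generated for a later pass to clean up.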