fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-22 00:30:13 +01:00

Author	SHA1	Message	Date
Job Noorman	144121b6df	ir3/dce: support partial writes from collects When alias.rt is used to alias certain output components, we might end up with a situation where some, but not all, of the components of collects end up being unused. This is currently not supported which means we end up with useless moves (coming from copy lowering) for aliased output components. Fix this by adding support for partial wrmasks for collects in DCE. The wrmasks are initially zeroed out and then updated based on the wrmask of their users. Sources of collects for which the corresponding dst ends up being unused are treated as unused as well. This allows us to remove the useless output moves by simply updating the wrmask of the end sources. Signed-off-by: Job Noorman <jnoorman@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31222>	2025-01-23 06:26:24 +00:00
Job Noorman	a7a357f91d	ir3/legalize: insert (sy) to read consts after ldc.k Observed when reading consts in the preamble using alias.rt. Signed-off-by: Job Noorman <jnoorman@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31222>	2025-01-23 06:26:24 +00:00
Job Noorman	96e08c3859	ir3/legalize: insert (ss) to read consts after stc Observed when reading consts in the preamble using alias.rt. Signed-off-by: Job Noorman <jnoorman@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31222>	2025-01-23 06:26:24 +00:00
Job Noorman	9b6bca52d5	ir3: optimize alias register allocation by reusing GPRs Allocate alias registers for an alias group while trying to minimize the number of needed aliases. That is, if the allocated GPRs for the group are (partially) consecutive, only allocate aliases to fill-in the gaps. For example: sam ..., @{r1.x, r5.z, r1.z}, ... only needs a single alias: alias.tex.b32.0 r1.y, r5.z sam ..., r1.x, ... Also, try to reuse allocations of previous groups. For example, this is relatively common: sam ..., @{r2.z, 0}, @{0} Reusing the allocation of the first group for the second one gives this: alias.tex.b32.0 r2.w, 0 sam ..., r2.z, r2.w Signed-off-by: Job Noorman <jnoorman@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31222>	2025-01-23 06:26:24 +00:00
Job Noorman	3fb0f54d70	ir3: add support for alias.tex alias.tex allows us to construct an "alias table" that creates a mapping between virtual alias registers and concrete GPRs, consts, or immediates. The following texture instruction will look up its sources in this table and use the mapped value instead. This has a few advantages: - We don't have to allocate consecutive registers (necessary for many tex sources) as we can just map them to consecutive alias registers. - We don't have to allocate GPRs at all for consts and immediates. - There's no delay penalty when initializing alias registers with consts or immediates. For example, this code: mov.u32u32 r1.x, r3.z mov.u32u32 r1.y, c0.x mov.u32u32 r1.z, 0 (rpt2)nop sam ..., r1.x, ... Can be implemented as follows: alias.tex.b32.2 r40.x, r3.z alias.tex.b32.0 r40.y, c0.x alias.tex.b32.0 r40.z, 0 sam ..., r40.x, ... Note that the alias registers (r40.xyz in this case) do not occupy GPR space. (More intelligent allocation strategies are possible; e.g., just mapping r3.w and r4.x to c0.x and 0. This is implemented by the next commit.) Support for alias.tex is implemented in two passes in ir3. In a first pass, sources of tex instructions are replaced by alias sources (IR3_REG_ALIAS) as follows: - movs from const/imm: replace with the const/imm; - collects: replace with the sources of the collect; - GPR sources: simply mark as alias. This way, RA won't be forced to allocate consecutive registers for collects and useless collects/movs can be DCE'd. Note that simply lowering collects to aliases doesn't work because RA would assume that killed sources of aliases are dead, while they are in fact live until the tex instruction that uses them. The second pass inserts alias.tex instructions in front of the tex instructions that need them and fixes up the tex instruction's sources. This pass needs to run post-RA as discussed above. It also needs to run post-legalization as all the sync flags need to be inserted based on the registers instructions actually use, not on the alias registers they have as sources. This commit uses a very simple allocation strategy for alias registers: simply allocate consecutive registers starting from r40.x. Note that this works because the alias table is reset after a tex instruction is executed so we don't have to worry about aliasing a live register. Signed-off-by: Job Noorman <jnoorman@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31222>	2025-01-23 06:26:24 +00:00
Job Noorman	4a9faaae17	ir3: add ir3_compiler::has_alias Flag to detect support for alias.rt/alias.tex available in a7xx. Signed-off-by: Job Noorman <jnoorman@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31222>	2025-01-23 06:26:24 +00:00
Job Noorman	c5c95f8916	ir3: add validation for alias Signed-off-by: Job Noorman <jnoorman@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31222>	2025-01-23 06:26:24 +00:00
Job Noorman	84b93cf718	ir3: introduce alias groups Alias registers allow us to allocate non-consecutive registers and remap them to consecutive ones using alias.tex. We implement this by adding the sources of collects directly to the sources of their users. This way, RA treats them as scalar registers and we can remap them to consecutive registers afterwards. To keep track of the scalar sources that should be remapped together, the IR3_REG_FIRST_ALIAS flag is introduced. Every source of such an "alias group" will have the IR3_REG_ALIAS set, while the first one will also have IR3_REG_FIRST_ALIAS set. This commit also adds a number of helpers to iterate over sources while keeping track of the original src index (i.e., before they were expanded to alias groups), and to iterate the sources within an alias group. It also introduces a new notation (@{regs...}) to clearly show alias groups when printing instructions. Signed-off-by: Job Noorman <jnoorman@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31222>	2025-01-23 06:26:23 +00:00
Job Noorman	4c2fc07a7e	ir3: teach backend about alias Take the properties of alias.{rt,tex} and its registers into account: - Don't count alias registers for GPR usage; - Allow all immediates in alias regs; - Fix properties like is_barrier and (ss) support; - alias.rt dst is not a GPR, don't use it in legalize/postsched to track dependencies; Signed-off-by: Job Noorman <jnoorman@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31222>	2025-01-23 06:26:23 +00:00
Job Noorman	a325573aaf	ir3/print: add support for alias Signed-off-by: Job Noorman <jnoorman@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31222>	2025-01-23 06:26:23 +00:00
Job Noorman	fb9de08efd	ir3/a7xx: document alias.rt It works completely differently from alias.tex. Signed-off-by: Job Noorman <jnoorman@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31222>	2025-01-23 06:26:23 +00:00
Job Noorman	d9241c6360	ir3/a7xx: handle alias.rt dst alias.rt writes to a render target, not a GPR. Render targets are disassembled as rtN.c. Signed-off-by: Job Noorman <jnoorman@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31222>	2025-01-23 06:26:23 +00:00
Job Noorman	dab47b55ef	ir3/a7xx: implement and document unknown alias field The UNK field encodes the table size for alias.tex: the first alias.tex instruction uses it to indicate how many follow (i.e., it is the total table size minus one). Also switch from using a src to a cat7 field to store this value which makes it a bit easier to handle. Signed-off-by: Job Noorman <jnoorman@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31222>	2025-01-23 06:26:23 +00:00
Job Noorman	af7c6f8dd5	ir3/a7xx: disasm halfness of alias dst Signed-off-by: Job Noorman <jnoorman@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31222>	2025-01-23 06:26:23 +00:00
Job Noorman	c4d84a8675	ir3/a7xx: properly handle alias scope and type The alias scope and type bits are intertwined in the encoding: - bit 47: low scope - bit 48: type - bit 49: high scope - bit 50: type size Combining the low and high scope bits, the value is used as follows: - 0: tex - 1: rt - 2: mem - 3: mem I don't know what the difference between 2 and 3 is. The blob currently doesn't use mem at all. The type bit seems to be used to make a distinction between floating point (f) and integer (b) sources. There doesn't seem to be any functional difference and it only affects how immediates are displayed. Note that I haven't exactly mimicked the blob in these cases: - alias.tex.f16/32: the blob uses b16/32 while printing immediates in floating point notation. I think it makes more sense to use f16/32. - alias.rt.b16/32: the blob uses i16/32 here. I think it makes more sense to stick to a single notation (b). Signed-off-by: Job Noorman <jnoorman@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31222>	2025-01-23 06:26:23 +00:00
Job Noorman	2f629810aa	ir3/parser: fix parsing integer as float Signed-off-by: Job Noorman <jnoorman@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31222>	2025-01-23 06:26:23 +00:00
Connor Abbott	15642c8ec2	tu: Handle non-identity GMEM swaps for input attachments I believe nothing currently tests this, but this should be required by analogy with the previous commit. Fixes: `247d11d635` ("tu: Allow UBWC with images with swapped formats.") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33115>	2025-01-23 05:53:40 +00:00
Connor Abbott	a104a7ca1a	tu: Handle non-identity GMEM swaps when resolving There is a single swap field for each color attachment, regardless of whether it's in GMEM or not, and this does appear to be used in GMEM mode when MUTABLEEN is set on the attachment. This means that when a color attachment has a non-identity swap because it's mutable on a750, we have to use the same corresponding swap when it's a source in a GMEM resolve. When using the fastpath, we have to make sure that the swaps match because there aren't separate fields for GMEM and sysmem swap. This fixes dEQP-VK.image.mutable.2d.*_b8g8r8a8_unorm_draw_copy_resolve with TU_DEBUG=gmem. Fixes: `247d11d635` ("tu: Allow UBWC with images with swapped formats.") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33115>	2025-01-23 05:53:40 +00:00
Connor Abbott	450755bd40	tu: Use image view format for sysmem resolves The spec says that we're supposed to do this. This fixes the newly-introduced tests dEQP-VK.image.mutable.*.*_draw_copy_resolve with TU_DEBUG=sysmem. Cc: mesa-stable Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33115>	2025-01-23 05:53:40 +00:00
Connor Abbott	47a85815b0	radv: Delete acceleration structure stubs These are now provided by common code. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33153>	2025-01-23 05:16:58 +00:00
Connor Abbott	987e499253	anv: Delete acceleration structure stubs These are now provided by common code. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33153>	2025-01-23 05:16:58 +00:00
Connor Abbott	3141033ac2	vk/bvh: Add default stubs for unsupported entrypoints We don't currently support building acceleration structures on the CPU or indirect building in the common framework, and drivers using it don't either, but drivers have to return non-NULL entrypoints for CPU building functions if they claim to support VK_KHR_acceleration_structure. Add stub entrypoints here so that drivers don't have to have this boilerplate. Fixes dEQP-VK.api.version_check.entry_points on turnip. Fixes: `671e3a65a6` ("tu: Support VK_KHR_acceleration_structure") Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33153>	2025-01-23 05:16:58 +00:00
Eric Engestrom	762cd246ee	docs/release-calendar: push back the 24.3.x releases by one week Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33171>	2025-01-23 03:09:36 +00:00
Eric Engestrom	835ecc5758	docs: add sha sum for 24.3.4 Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33171>	2025-01-23 03:09:36 +00:00
Eric Engestrom	e5ca260032	docs: add release notes for 24.3.4 Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33171>	2025-01-23 03:09:36 +00:00
Eric Engestrom	3d3ac0de25	docs: update calendar for 24.3.4 Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33171>	2025-01-23 03:09:36 +00:00
Danylo Piliaiev	244e408341	ir3: Consider const alloc alignment in free space size calcs The alignment was considered only for offset, but its users (at least ir3_nir_opt_preamble) expect the size itself to also be aligned. Fixes tests: dEQP-VK.spirv_assembly.instruction.graphics.16bit_storage.struct_mixed_types.uniform_buffer_block_geom dEQP-VK.spirv_assembly.instruction.graphics.16bit_storage.struct_mixed_types.uniform_buffer_block_tessc dEQP-VK.spirv_assembly.instruction.graphics.16bit_storage.struct_mixed_types.uniform_buffer_block_tesse gmem-dEQP-VK.spirv_assembly.instruction.graphics.16bit_storage.struct_mixed_types.uniform_buffer_block_tesse Fixes: `922ef8e720` ("ir3: Make allocation of consts more generic and order independent") Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33161>	2025-01-23 02:31:48 +00:00
Daniel Schürmann	1feb733cd4	Revert "nir: add nir_clear_divergence_info, use it in nir_opt_varyings" This reverts commit `9d043e138d`. It is no longer needed. nir_convert_from_ssa() is now capable of ignoring divergence information. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33009>	2025-01-23 01:31:24 +00:00
Daniel Schürmann	f3be7ce01b	nir/from_ssa: only consider divergence if requested This pass used to unconditionally use divergence information which forced the caller to either call divergence_analysis or ensure that the divergence is properly reset. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33009>	2025-01-23 01:31:23 +00:00
Marek Olšák	e7214b9446	glapi: rename exported symbols so as not to conflict with old libglapi libwaffle 1.7.0 has a hack that dlopen's libglapi with RTLD_GLOBAL, which was meant to preload libglapi, but with this MR it overwrites libgallium's own symbols, which breaks libgallium. Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com> Reviewed-by: Eric Engestrom <None> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32789>	2025-01-23 00:49:05 +00:00
Marek Olšák	6e3ee3a072	loader: improve the existing loader-libgallium non-matching version error Reviewed-by: Adam Jackson <ajax@redhat.com> Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Eric Engestrom <None> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32789>	2025-01-23 00:49:05 +00:00
Marek Olšák	464dde302c	glapi: remove the remap table it's unused now Reviewed-by: Adam Jackson <ajax@redhat.com> Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Eric Engestrom <None> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32789>	2025-01-23 00:49:05 +00:00
Marek Olšák	b22f682a31	glapi: stop using the remap table The remap table adds an array lookup into 75% of CALL_* macros, which are used to call GL functions through the dispatch table. Removing the array lookup reduces overhead of dispatch table calls. Since libglapi is now required to be from the same build as libgallium, the remap table is no longer needed. This change doesn't remove the remapping table. It only disables it. Compare asm: Before: 0000000000000000 <_mesa_unmarshal_Uniform1f>: 0: f3 0f 1e fa endbr64 4: 48 83 ec 08 sub $0x8,%rsp 8: 48 8b 05 00 00 00 00 mov 0x0(%rip),%rax # f <_mesa_unmarshal_Uniform1f+0xf> f: 8b 4e 04 mov 0x4(%rsi),%ecx 12: 31 d2 xor %edx,%edx 14: f3 0f 10 46 08 movss 0x8(%rsi),%xmm0 19: 48 63 80 a8 01 00 00 movslq 0x1a8(%rax),%rax 20: 85 c0 test %eax,%eax 22: 78 08 js 2c <_mesa_unmarshal_Uniform1f+0x2c> 24: 48 8b 57 40 mov 0x40(%rdi),%rdx 28: 48 8b 14 c2 mov (%rdx,%rax,8),%rdx 2c: 89 cf mov %ecx,%edi 2e: ff d2 call %rdx 30: b8 02 00 00 00 mov $0x2,%eax 35: 48 83 c4 08 add $0x8,%rsp 39: c3 ret After: 0000000000000000 <_mesa_unmarshal_Uniform1f>: 0: f3 0f 1e fa endbr64 4: 48 89 f8 mov %rdi,%rax 7: 48 83 ec 08 sub $0x8,%rsp b: f3 0f 10 46 08 movss 0x8(%rsi),%xmm0 10: 8b 7e 04 mov 0x4(%rsi),%edi 13: 48 8b 40 40 mov 0x40(%rax),%rax 17: ff 90 10 10 00 00 call 0x1010(%rax) 1d: b8 02 00 00 00 mov $0x2,%eax 22: 48 83 c4 08 add $0x8,%rsp 26: c3 ret Reviewed-by: Adam Jackson <ajax@redhat.com> Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Eric Engestrom <None> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32789>	2025-01-23 00:49:05 +00:00
Marek Olšák	44bda7c258	dri: put shared-glapi into libgallium.*.so so that we don't have to maintain a stable ABI for it. This will allow removal of the remapping table to reduce CALL_* overhead for GL dispatch tables. Also we can now clean it up. Reviewed-by: Adam Jackson <ajax@redhat.com> Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Eric Engestrom <None> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32789>	2025-01-23 00:49:05 +00:00
Daniel Schürmann	08560b8ff8	aco/lower_branches: stitch linear blocks if there is exactly one successor with one predecessor Totals from 12906 (16.26% of 79395) affected shaders: (Navi31) Instrs: 22051521 -> 22049488 (-0.01%); split: -0.01%, +0.00% CodeSize: 116591240 -> 116583920 (-0.01%) Latency: 196625178 -> 196538410 (-0.04%); split: -0.04%, +0.00% InvThroughput: 33943045 -> 33930615 (-0.04%); split: -0.04%, +0.00% Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32477>	2025-01-23 00:11:06 +00:00
Daniel Schürmann	c90ae5f773	aco: delete aco_jump_threading.cpp This is now handled by lower_branches(). Totals from 47236 (59.49% of 79395) affected shaders: (Navi31) Instrs: 29490400 -> 29490507 (+0.00%) CodeSize: 152316812 -> 152317248 (+0.00%); split: -0.00%, +0.00% Latency: 229665459 -> 229665106 (-0.00%); split: -0.00%, +0.00% InvThroughput: 36870605 -> 36870504 (-0.00%); split: -0.00%, +0.00% Copies: 1966751 -> 2233467 (+13.56%) SALU: 3122941 -> 3123048 (+0.00%) Note that only about 20 shaders are actually affected. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32477>	2025-01-23 00:11:06 +00:00
Daniel Schürmann	c677809f25	aco/lower_branches: allow for non-fallthrough loop exits in try_merge_break_with_continue() Totals from 211 (0.27% of 79395) affected shaders: (Navi31) Instrs: 276961 -> 276545 (-0.15%) CodeSize: 1404356 -> 1402248 (-0.15%) Latency: 1344722 -> 1344887 (+0.01%); split: -0.00%, +0.01% InvThroughput: 165624 -> 165622 (-0.00%); split: -0.00%, +0.00% Branches: 6149 -> 5987 (-2.63%) SALU: 25722 -> 25468 (-0.99%) Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32477>	2025-01-23 00:11:06 +00:00
Daniel Schürmann	12656ea5f5	aco: move try_merge_break_with_continue() to lower_branches() Totals from 3 (0.00% of 79395) affected shaders: (Navi31) Instrs: 12888 -> 12882 (-0.05%) Latency: 83253 -> 83246 (-0.01%) InvThroughput: 9251 -> 9249 (-0.02%) Branches: 483 -> 480 (-0.62%) SALU: 1329 -> 1326 (-0.23%) Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32477>	2025-01-23 00:11:06 +00:00
Daniel Schürmann	13ad3db43f	aco/lower_branches: implement try_remove_simple_block() in lower_branches() This is mostly the same as in jump_threading, but can handle multiple predecessors. Totals from 3523 (4.44% of 79395) affected shaders: (Navi31) Instrs: 10244892 -> 10244753 (-0.00%); split: -0.00%, +0.00% CodeSize: 54171500 -> 54168540 (-0.01%); split: -0.01%, +0.00% Latency: 75070425 -> 75059570 (-0.01%); split: -0.02%, +0.00% InvThroughput: 11606911 -> 11605767 (-0.01%); split: -0.01%, +0.00% Branches: 331778 -> 331675 (-0.03%); split: -0.05%, +0.02% Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32477>	2025-01-23 00:11:06 +00:00
Daniel Schürmann	2b5a893e29	aco/lower_branches: do eliminate_useless_exec_writes_in_block() during branch lowering. Totals from 728 (0.92% of 79395) affected shaders: (Navi31) Instrs: 452926 -> 452161 (-0.17%) CodeSize: 2255536 -> 2252504 (-0.13%) Latency: 1683404 -> 1683470 (+0.00%); split: -0.01%, +0.01% InvThroughput: 210887 -> 210888 (+0.00%); split: -0.00%, +0.00% SALU: 77865 -> 77106 (-0.97%) Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32477>	2025-01-23 00:11:06 +00:00
Daniel Schürmann	eecdb45d61	aco: consider s_cbranch_exec* instructions in needs_exec_mask() Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32477>	2025-01-23 00:11:06 +00:00
Daniel Schürmann	de1e38e214	aco/assembler: Find loop exits using the successor's loop nest depth Previously, we just used the next block after a loop that has a back-edge. This assumes that loop-exit blocks can only be removed when falling through to the next block, when in fact it can also be a jump to somewhere else, in future even to some block before the actual loop. 12 (0.02% of 79395) affected shaders. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32477>	2025-01-23 00:11:06 +00:00
Daniel Schürmann	29c63de062	aco/jump_threading: don't remove loop preheaders They might be needed as convergence point in order to insert code (e.g. for loop alignment, wait states, etc.). Totals from 1 (0.00% of 79395) affected shaders: CodeSize: 12672 -> 12716 (+0.35%) Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32477>	2025-01-23 00:11:06 +00:00
Lucas Stach	4bed508122	etnaviv: track TS flushed status as bool TS can be valid and flushed at the same time when no compression is used. This state is beneficial if we needed to flush TS to the base surface (filling cleared tiles) for any reason, but still use TS state to accelerate read requests into PE or TX caches. The current seqno based tracking of the TS flush state has a major drawback with the following sequence of events: 1. fast clear surface (TS is now valid) 2. flush TS (base surface tiles filled, TS still valid, flush seqno == surface seqno) 3. render to surface (surface seqno increased) 4. flush resource Step 4 will now execute a full TS flush as the flush and surface seqnos are different after rendering and TS is still valid, wasting memory bandwidth to fill already filled tiles that are still marked as clear in the TS state. If the TS has been flushed already, step 4 should be a no-op. Switch from the seqno based tracking to tracking the flush state itself, marking the TS state un-/flushed as needed. With this boolean tracking of the flush state, step 4 above will correctly see that the TS has already been flushed since the last fast clear and skip the tile fill blit. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32956>	2025-01-22 23:50:00 +00:00
Erik Faye-Lund	d74f569035	pan/bi: bump iter_count to 2000 Without this, we fail to register-allocate the shader used in the dEQP-VK.ssbo.phys.layout.random.8bit.scalar.78 VK-CTS test case. Yeah, this sucks, but failing to compile sucks even more. We need a new register allocator plan here. Acked-by: Boris Brezillon <boris.brezillon@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33124>	2025-01-22 23:19:18 +00:00
Samuel Pitoiset	b4085df31c	radv: re-emit streamout state for GFX12 when the user SGPR changes This is more for consistency than a real fix. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33164>	2025-01-22 22:54:23 +00:00
Caterina Shablia	d46b80249b	panvk: enable subgroupSizeControl This is trivial for us, the hardware only ever supports a single subgroup size. Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32710>	2025-01-22 21:49:52 +00:00
Erik Faye-Lund	1a81bff6aa	panvk: expose vk1.1 on v10 hardware Subgroup ops were the last bit missing Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32710>	2025-01-22 21:49:52 +00:00
Erik Faye-Lund	ac05c2a2b8	panvk: expose subgroup operations We can't use VK_SHADER_STAGE_ALL here, because we don't support geometry and tessellation shaders. Additionally, the DDK doesn't support the vertex stage, so let's not even try that for now; it probably won't work. Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32710>	2025-01-22 21:49:52 +00:00
Caterina Shablia	d2838f3ceb	pan/bi: handle barriers with SUBGROUP scope Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32710>	2025-01-22 21:49:52 +00:00
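
The alias scope and type encoding described in commit c4d84a8675 above can be sketched as follows. This is only an illustration, not Mesa code: the helper name `decode_alias_bits`, the returned dictionary layout, and the assumption that the instruction is passed as a single 64-bit integer are all hypothetical. The commit message does not say which value of the type bit or the type size bit means what, so both are returned as raw bits.

```python
# Hypothetical helper, not part of Mesa: decodes the alias scope from an
# instruction word using the bit positions listed in commit c4d84a8675.

# Combined scope value -> name, as listed in the commit message.
# The commit author notes the difference between 2 and 3 is unknown.
SCOPE_NAMES = {0: "tex", 1: "rt", 2: "mem", 3: "mem"}


def decode_alias_bits(instr: int) -> dict:
    """Extract the intertwined scope/type bits (47..50) from an instruction.

    Assumes `instr` holds the whole instruction as one 64-bit integer.
    """
    low_scope = (instr >> 47) & 1      # bit 47: low scope
    type_bit = (instr >> 48) & 1       # bit 48: type (meaning of 0/1 not decoded here)
    high_scope = (instr >> 49) & 1     # bit 49: high scope
    type_size_bit = (instr >> 50) & 1  # bit 50: type size (raw bit only)

    # The two scope bits are not adjacent, so they are recombined here.
    scope = (high_scope << 1) | low_scope
    return {
        "scope": SCOPE_NAMES[scope],
        "type_bit": type_bit,
        "type_size_bit": type_size_bit,
    }
```

For instance, a word with only bit 47 set decodes to the `rt` scope, while a word with bit 49 set decodes to `mem` regardless of bit 47.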

200653 commits