fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-23 08:38:08 +02:00

Author	SHA1	Message	Date
Marek Olšák	66dd70adc5	amd: lower load_gs_wave_id_amd in NIR Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>	2025-01-02 17:36:55 +00:00
Marek Olšák	923f59c971	amd: lower load_barycentric_at_offset in NIR Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>	2025-01-02 17:36:55 +00:00
Marek Olšák	16ab05fad1	amd: lower load_barycentric_pixel/centroid/sample in NIR radeonsi needs to preserve interp_mode in the arg load. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>	2025-01-02 17:36:55 +00:00
Marek Olšák	7e83f6ca8b	amd: lower load_front_face in NIR radeonsi must do this after si_lower_nir_abi, which optimizes front_face, but doesn't lower it. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>	2025-01-02 17:36:55 +00:00
Marek Olšák	6ad5225b2a	amd: lower load_frag_shading_rate in NIR Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>	2025-01-02 17:36:55 +00:00
Marek Olšák	6d2e29ff6e	amd: lower load_sample_pos in NIR Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>	2025-01-02 17:36:55 +00:00
Marek Olšák	110e474b4f	amd: lower load_sample_id in NIR Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>	2025-01-02 17:36:55 +00:00
Marek Olšák	684c8da553	amd: lower load_invocation_id in NIR ACO can't look for it because it's lowered there. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>	2025-01-02 17:36:55 +00:00
Marek Olšák	d281240c57	amd: lower load_first_vertex/base_instance/draw_id/view_index in NIR Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>	2025-01-02 17:36:55 +00:00
Marek Olšák	0d372b043b	amd: lower load_local_invocation_id in NIR This is based on ACO. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>	2025-01-02 17:36:55 +00:00
Marek Olšák	13cb5c7b72	amd: lower load_frag_coord in NIR Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>	2025-01-02 17:36:55 +00:00
Marek Olšák	58cb155068	amd: lower load_pixel_coord in NIR Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>	2025-01-02 17:36:55 +00:00
Georg Lehmann	272ff275fa	aco/insert_exec: reset top exec for p_discard_if Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12363 Fixes: `31f62a6123` ("aco/insert_exec: don't always reset top exec") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32830>	2025-01-02 15:18:48 +00:00
Georg Lehmann	3da2d96bc5	aco/optimizer: fix signed extract of sub dword temps with SDWA If an instruction didn't already use SDWA convert_to_SDWA in apply_extract will add ubyte0/uword0 selections for v1b/v2b operands. This loses information that the instruction doesn't care about the high bits and makes the next apply_extract_twice fail. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Fixes: `6cb9d39bc2` ("aco: combine extracts with sub-dword definitions") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32803>	2025-01-02 09:33:18 +00:00
Timur Kristóf	01bf998e17	aco: Update documentation Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32766>	2024-12-31 23:01:23 +00:00
Georg Lehmann	43fca7fffe	amd: support load_front_face_fsign Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32791>	2024-12-30 22:31:35 +00:00
Georg Lehmann	aee0c7274c	amd: switch to FRONT_FACE_ALL_BITS(0) Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32791>	2024-12-30 22:31:34 +00:00
Georg Lehmann	6a6b26dfa5	aco: create v_cmpx with s_andn2(exec, v_cmp) Foz-DB Navi21: Totals from 3928 (4.95% of 79395) affected shaders: Instrs: 1155370 -> 1151154 (-0.36%) CodeSize: 6332192 -> 6314616 (-0.28%) Latency: 11955231 -> 11933281 (-0.18%); split: -0.18%, +0.00% InvThroughput: 1842283 -> 1841822 (-0.03%); split: -0.03%, +0.00% SALU: 175431 -> 171215 (-2.40%) Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <None> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32731>	2024-12-30 13:05:23 +00:00
Georg Lehmann	42512208d8	aco/insert_exec: exit shader using exec for top level discard Totals from 14538 (18.31% of 79395) affected shaders: no changes Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <None> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32731>	2024-12-30 13:05:23 +00:00
Georg Lehmann	6b35d6f75b	aco: allow p_exit_early_if_not with exec condition Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <None> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32731>	2024-12-30 13:05:23 +00:00
Georg Lehmann	c279e63a79	aco: rename p_early_exit_if to if_not It exits the shaders if the condition is false, not true. Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <None> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32731>	2024-12-30 13:05:23 +00:00
Georg Lehmann	33a73203b0	aco/isel: skip and(exec) for top level demote_if/terminate_if In nested control flow this is necessary to not demote/terminate invocations that are part of the global but not part of the local mask. At the top level, the masks are the same and no additional invocations can be accidentally disabled. Foz-DB Navi21: Totals from 2095 (2.64% of 79395) affected shaders: Instrs: 1058326 -> 1056839 (-0.14%) CodeSize: 5632480 -> 5626616 (-0.10%) Latency: 12082761 -> 12080520 (-0.02%); split: -0.02%, +0.00% InvThroughput: 2246677 -> 2246636 (-0.00%); split: -0.00%, +0.00% Copies: 114446 -> 114433 (-0.01%) SALU: 230585 -> 229098 (-0.64%) Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32755>	2024-12-26 18:34:38 +00:00
Marek Olšák	de996ac481	radeonsi: kill Z and stencil PS outputs if depth or stencil is disabled This adds kill_z and kill_stencil flags to the shader PS epilog key, which removes those outputs if depth or stencil are disabled. It must be implemented in: * ACO PS epilog * LLVM PS epilog * ac_nir_lower_ps for monolithic shaders Some of the samplemask code wasn't completely correct, but probably harmless. Reviewed-by: Qiang Yu <yuq825@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32713>	2024-12-24 12:02:20 +00:00
Qiang Yu	dff14d102d	aco: fix voffset missing when buffer store base >=4096 Regression on test: dEQP-GLES31.functional.geometry_shading.basic.output_256 voffset is missing if buffer store base >=4096, we need to re-calculate offen after resolve_excess_vmem_const_offset(). Fixes: `cdaf269924` ("aco: inline store_vmem_mubuf/emit_single_mubuf_store") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32767>	2024-12-24 01:42:45 +00:00
Marek Olšák	85c20def94	ac,radv,radeonsi: enable TCS input reads from VGPRs for all compatible loads Cross-invocation TCS input access doesn't prevent same-invocation access. This improves shaders that use both for the same inputs. Also, if some components of a vec4 slot only use same-invocation access and other components only use cross-invocation access (it's possible after compaction), this takes the VGPR path for the components with same-invocation access, which didn't happen previously because all masks only describe whole vec4s. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31673>	2024-12-18 11:07:59 +00:00
Qiang Yu	d38efee8ef	aco: enable gfx12 support for radeonsi Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32570>	2024-12-16 07:35:07 +00:00
Rhys Perry	53d0187bab	aco: decrease max_workgroup_size Match the limit of radeonsi and RADV. No fossil-db changes. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32577>	2024-12-12 17:38:46 +00:00
Rhys Perry	87f2f77960	aco: fix max_workgroup_count[0] This is necessary for radeonsi. fossil-db (navi21): Totals from 292 (0.37% of 79395) affected shaders: Instrs: 305965 -> 306182 (+0.07%); split: -0.00%, +0.07% CodeSize: 1624816 -> 1627212 (+0.15%); split: -0.00%, +0.15% Latency: 5244652 -> 5243587 (-0.02%); split: -0.07%, +0.05% InvThroughput: 1221089 -> 1225285 (+0.34%); split: -0.04%, +0.38% Copies: 22712 -> 22702 (-0.04%) PreSGPRs: 10713 -> 10712 (-0.01%) PreVGPRs: 10918 -> 10920 (+0.02%) VALU: 178613 -> 178836 (+0.12%) SALU: 43490 -> 43493 (+0.01%); split: -0.02%, +0.03% Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32577>	2024-12-12 17:38:46 +00:00
Daniel Schürmann	26a3038b65	aco/lower_branches: remove edges between blocks if there is no direct branch This way, linear predecessors and successors better reflect the actual control flow which improves wait state insertion and hazard mitigation. Totals from 10252 (12.91% of 79395) affected shaders: (Navi31) Instrs: 18824540 -> 18803823 (-0.11%); split: -0.11%, +0.00% CodeSize: 99025464 -> 98942028 (-0.08%); split: -0.08%, +0.00% Latency: 169291854 -> 165781877 (-2.07%); split: -2.07%, +0.00% InvThroughput: 29701086 -> 29228602 (-1.59%); split: -1.59%, +0.00% SClause: 510587 -> 510586 (-0.00%) Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32389>	2024-12-12 08:46:22 +00:00
Daniel Schürmann	22ffe72022	aco: move branch lowering optimization into separate file 'aco_lower_branches.cpp' No fossil changes. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32389>	2024-12-12 08:46:22 +00:00
Friedrich Vock	845660f2b7	aco/lower_to_hw_instr: Check the right instruction's opcode instr is the branch instruction, its opcode won't ever be writelane. We should check inst instead. Found by inspection. Cc: mesa-stable Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32389>	2024-12-12 08:46:21 +00:00
Daniel Schürmann	28ab7f0168	aco/jump_threading: remove branch sequence optimization This optimization gets applied during postRA optimization, now. No fossil changes. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32330>	2024-12-12 08:11:22 +00:00
Daniel Schürmann	fcd94a8ca7	aco: move try_optimize_branching_sequence() to postRA optimizations Totals from 196 (0.25% of 79206) affected shaders: (Navi31) Instrs: 534343 -> 534438 (+0.02%); split: -0.00%, +0.02% CodeSize: 2774852 -> 2775420 (+0.02%); split: -0.00%, +0.02% Latency: 7103512 -> 7103021 (-0.01%); split: -0.01%, +0.00% InvThroughput: 959477 -> 959447 (-0.00%) Copies: 42646 -> 42648 (+0.00%) Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32330>	2024-12-12 08:11:21 +00:00
Daniel Schürmann	95d44c7ce0	aco/optimizer_postRA: set branch()->never_taken if exec is constant non-zero Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32330>	2024-12-12 08:11:21 +00:00
Daniel Schürmann	d67932f69e	aco/print_ir: don't print disconnected empty blocks Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32330>	2024-12-12 08:11:21 +00:00
Daniel Schürmann	22881712c8	aco/assembler: Don't emit target basic block index when chaining branches This could erroneously cause an assertion to fail if the target block index was larger than UINT16_MAX. Fixes: `cab5639a09` ('aco/assembler: chain branches instead of emitting long jumps') Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32599>	2024-12-11 23:28:55 +00:00
Georg Lehmann	65506e635b	aco/ra: don't write to scc/ttmp with s_fmac Fixes: `4bd229ac50` ("aco/gfx11.5: select SOP2 float instructions") Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32545>	2024-12-11 12:51:18 +00:00
Georg Lehmann	0b9e2a5427	aco/ra: disallow s_cmpk with scc operand Fixes: `2d6b0a4177` ("aco/optimizer: Optimize SOPC with literal to SOPK.") Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32545>	2024-12-11 12:51:18 +00:00
Georg Lehmann	fe0c72caec	aco/ra: don't write to exec/ttmp with mulk/addk/cmovk ttmp sgprs are readonly outside of trap handlers, so the instructions were probably skipped. RA should also never create additional exec writes. Fixes: `e06773281b` ("aco/ra: Optimize some SOP2 instructions with literal to SOPK.") Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32545>	2024-12-11 12:51:18 +00:00
Georg Lehmann	576a2e798c	aco/gfx12: don't assume memory operations complete in order Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32569>	2024-12-11 12:22:59 +00:00
Samuel Pitoiset	553eb1a3fd	radv: fix alpha-to-coverage with alpha-to-one when MRTZ is also exported On AMD hardware, it's possible to export a separate alpha channel for applying alpha-to-one after alpha-to-coverage and not before. On GFX11+, it's already mostly supported but alpha needs to be exported to MRTZ.a and one to MRT0.a. The hw always uses alpha for alpha-to-coverage from MRTZ.a. On older generations, the driver needs the same separate alpha export but it also needs to configure the hardware with COVERAGE_TO_MASK_ENABLE which selects alpha from MRTZ.a. This should fix alpha-to-coverage with alpha-to-one when either depth, stencil or samplemask are exported but it still needs a slightly different solution without MRTZ. I will fix that later. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32523>	2024-12-11 10:50:31 +00:00
Samuel Pitoiset	70047e6bd6	aco: export alpha to MRTZ.a and one to MRT0.a for alpha-to-one on GFX11 For FS epilogs. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32523>	2024-12-11 10:50:31 +00:00
Georg Lehmann	4a977ea24f	aco/gfx11+: use v_and_b32 to extract local id 0 Foz-DB Navi31: Totals from 2561 (3.23% of 79206) affected shaders: CodeSize: 10399004 -> 10389120 (-0.10%) Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32532>	2024-12-10 11:58:21 +00:00
Daniel Schürmann	b64fff7731	aco: remove definition from Pseudo branch instructions They are not needed anymore. Totals from 7019 (8.84% of 79395) affected shaders: (Navi31) Instrs: 14805400 -> 14824196 (+0.13%); split: -0.00%, +0.13% CodeSize: 78079972 -> 78132932 (+0.07%); split: -0.01%, +0.08% SpillSGPRs: 4485 -> 4515 (+0.67%); split: -0.76%, +1.43% Latency: 165862000 -> 165836134 (-0.02%); split: -0.02%, +0.00% InvThroughput: 30061764 -> 30057781 (-0.01%); split: -0.01%, +0.00% SClause: 392323 -> 392286 (-0.01%); split: -0.01%, +0.00% Copies: 1012262 -> 1012234 (-0.00%); split: -0.04%, +0.04% Branches: 365910 -> 365909 (-0.00%); split: -0.00%, +0.00% PreSGPRs: 360167 -> 355363 (-1.33%) VALU: 8837197 -> 8837276 (+0.00%); split: -0.00%, +0.00% SALU: 1402593 -> 1402621 (+0.00%); split: -0.03%, +0.03% Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32037>	2024-12-06 14:34:03 +00:00
Daniel Schürmann	7e4687fd04	aco: remove definition from SOPP branch instructions Totals from 17942 (22.60% of 79395) affected shaders: (Navi31) Instrs: 20334063 -> 20312676 (-0.11%); split: -0.11%, +0.00% CodeSize: 108458732 -> 108377540 (-0.07%); split: -0.08%, +0.00% Latency: 180510540 -> 180479666 (-0.02%); split: -0.02%, +0.00% InvThroughput: 28079325 -> 28077938 (-0.00%); split: -0.01%, +0.00% Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32037>	2024-12-06 14:34:03 +00:00
Daniel Schürmann	cab5639a09	aco/assembler: chain branches instead of emitting long jumps As regular branch instructions cannot jump further than 32768 dwords, previously we used long jumps as fallback solution. The disadvantage of that is that an extra SGPR pair must be provided in order to temporarily store the PC. This patch changes that to chained branch instructions by inserting an artificial extra block into the code to be targeted by the original branch. This block contains a single branch instruction jumping to the original target. Before the block, if necessary, we insert a <branch 1> instruction for the existing code in order to jump over the newly inserted block. Only a few RT shaders are affected. Totals from 29 (0.04% of 79395) affected shaders: (Navi31) CodeSize: 17281176 -> 17276332 (-0.03%) Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32037>	2024-12-06 14:34:03 +00:00
Daniel Schürmann	c3d777d8ac	aco/assembler: change ctx.loop_header to uint32_t instead of Block* We are about to add new blocks during assembly which makes pointers into a vector unreliable. Also, only set it if the loop has no back-edge. Totals from 126 (0.16% of 79206) affected shaders: (Navi31) CodeSize: 1486152 -> 1488152 (+0.13%) Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32037>	2024-12-06 14:34:03 +00:00
Daniel Schürmann	592f3fd994	aco/assembler: Actually insert s_inst_prefetch instructions when aligning blocks for loops Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32037>	2024-12-06 14:34:03 +00:00
Daniel Schürmann	b92afdbd28	aco/assembler: constify assembly functions Ensure that instruction formats and special operands are not manipulated during assembly. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32037>	2024-12-06 14:34:03 +00:00
Daniel Schürmann	3a02bbd916	aco/print_asm: allow for empty blocks with arbitrary offsets We will add empty blocks at the end of the shader, in order to store some branch offset information. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32037>	2024-12-06 14:34:03 +00:00
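
Several messages above quote shader statistics in the form `before -> after (change%)`, plus an "affected shaders" share. As a reading aid, here is a minimal sketch of how those percentages can be recomputed from the raw counts. The helper names `pct_change` and `pct_affected` are hypothetical and are not part of Mesa or of the fossil-db tooling; the numbers are copied from commit 6a6b26dfa5 (Foz-DB Navi21).

```python
def pct_change(before: int, after: int) -> float:
    """Relative change from `before` to `after`, in percent (hypothetical helper)."""
    return (after - before) / before * 100.0


def pct_affected(affected: int, total: int) -> float:
    """Share of affected shaders in the whole set, in percent (hypothetical helper)."""
    return affected / total * 100.0


# Raw counts as quoted in the message of commit 6a6b26dfa5.
stats = {
    "Instrs": (1155370, 1151154),
    "CodeSize": (6332192, 6314616),
    "SALU": (175431, 171215),
}

for name, (before, after) in stats.items():
    # Same layout as the commit message: "Name: before -> after (+x.xx%)"
    print(f"{name}: {before} -> {after} ({pct_change(before, after):+.2f}%)")

print(f"Totals from 3928 ({pct_affected(3928, 79395):.2f}% of 79395) affected shaders")
```

Rounded to two decimals, the results match the figures quoted in that commit message (-0.36%, -0.28%, -2.40%, and 4.95% affected). The `split:` values in other entries cannot be recomputed this way, since they depend on per-shader data that is not shown here.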

3503 commits