fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-20 19:58:19 +02:00

Author	SHA1	Message	Date
Marek Olšák	d279d019d4	ac/nir/tess: remove parameter from and simplify hs_per_patch_output_vmem_offset Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34780>	2025-06-07 16:29:39 +00:00
Marek Olšák	fa5e07d5f7	ac/nir/tess: write TCS patch outputs to memory as vec4 stores at the end This moves per-patch output VMEM stores to the end of the shader where they execute only once. They are skipped if the whole workgroup discards all patches. If tcs_vertices_out == 1, per-patch output VMEM stores use the same lanes as per-vertex output VMEM stores, which are aligned to 4 or 8 lanes to get cached bandwidth for the stores. Previously, per-patch outputs were stored to memory for every store_output intrinsic in TCS. Additionally, LDS is no longer allocated for per-patch outputs that are only written and read by invocation 0, or they are written by all invocations but not read, and don't have indirect indexing. This reduces LDS usage and LDS traffic. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34780>	2025-06-07 16:29:39 +00:00
Marek Olšák	c732306c5a	ac/nir/tess: unify computing LDS output patch size, minimize LDS bank conflicts This unifies the duplicated LDS output patch size computation between hs_output_lds_offset and ac_nir_compute_tess_wg_info. "+ 4" to the output patch stride minimizes LDS bank conflicts by making the beginning of each patch start on a different LDS bank. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34780>	2025-06-07 16:29:39 +00:00
Marek Olšák	37dc376395	ac/nir/tess: use if-ladder to determine valid tess level components for the vote Checking whether every component is valid in tess_level_has_effect() when prim_mode is unknown generated too many SALU. Do this instead: if (triangles) ... subgroup vote for triangles else if (quads) ... subgroup vote for quads else // isoline subgroup vote for isolines Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34780>	2025-06-07 16:29:39 +00:00
Marek Olšák	2f0d9495c5	ac/nir/tess: inline mask helpers Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34780>	2025-06-07 16:29:39 +00:00
Marek Olšák	10ae5b2fbf	ac/nir/tess: rewrite tess level tracking, don't use LDS for more cases This rewrites tess level value tracking to use the 2-bit masks, which means LDS allocation is determined separately for outer and inner levels. LDS is not allocated for tess levels that are only written by invocation 0 and never read or only read by invocation 0. If the number of output patch vertices is 1, LDS is also not allocated for tess levels. Tess level outputs for TES are always written as whole vec4 to get cached bandwidth. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34780>	2025-06-07 16:29:39 +00:00
Marek Olšák	9d9cfd89da	ac/nir/tess: compute the number of remapped VRAM outputs in common code This unifies it for both drivers. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34780>	2025-06-07 16:29:39 +00:00
Marek Olšák	ea70060826	ac/nir/tess: stop using tes_inputs_read / tes_patch_inputs_read for TCS & TES use ac_nir_tess_io_info instead Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34780>	2025-06-07 16:29:39 +00:00
Marek Olšák	c38bc4824f	ac/nir/tess: apply no_varying to ac_nir_tess_io_info This has the effect that no_varying is finally honored for per-patch outputs, skipping VMEM stores that TES doesn't read. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34780>	2025-06-07 16:29:39 +00:00
Marek Olšák	42445e271e	radv,radeonsi: use ac_nir_tess_io_info for LDS size computation Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34780>	2025-06-07 16:29:39 +00:00
Marek Olšák	c678844ccb	ac/nir/tess: move LDS and VMEM output masks into a new info structure This will replace LDS and VMEM output size computations in drivers. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34780>	2025-06-07 16:29:39 +00:00
Marek Olšák	f9c2a01f6a	ac/nir/tess: indent a block for nir_if Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34780>	2025-06-07 16:29:39 +00:00
Marek Olšák	d967266edd	ac/nir/tess: if all tess levels are 0, skip per-vertex TCS output stores This is done for all chips. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34780>	2025-06-07 16:29:39 +00:00
Marek Olšák	c1237256cb	ac/nir/tess: execute the tess level workgroup vote on all chips It will be used to skip stores for discarded patches. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34780>	2025-06-07 16:29:39 +00:00
Marek Olšák	9c16228359	ac/nir/tess: write TCS per-vertex outputs to memory as vec4 stores at the end This improves write throughput for TCS outputs. It follows the same idea as attribute stores in hw GS. The improvement is easily measurable with a microbenchmark. It also has the advantage that multiple output stores to the same address don't result in multiple memory stores. Each output component gets only one memory store at the end of the shader. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34780>	2025-06-07 16:29:39 +00:00
Marek Olšák	509f0e62ad	ac/nir/tess: allow passing explicit patch_offset to VMEM/LDS offset calculations Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34780>	2025-06-07 16:29:39 +00:00
Marek Olšák	a59464b6e3	radv,radeonsi: precompute and pass TCS per-vertex output stride via a user SGPR It's a stride of 1 output, which isn't 16. It's 16 * num_threads, aligned to 256. tcs_offchip_layout has 5 unused bits, so let's use them. Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34780>	2025-06-07 16:29:39 +00:00
Marek Olšák	534b282573	ac/nir/tess: adjust memory layout of TCS outputs to have aligned store offsets There is a comment that explains it. Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34780>	2025-06-07 16:29:38 +00:00
Marek Olšák	80236f2367	ac/nir/tess: add if/endif for HS threads in NIR instead of ACO/LLVM This just removes the if/endif wrapping for LLVM, and hopefully the ACO change does the same thing. ACO had redundant code in endif_merged_wave_info, which is removed here. Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34780>	2025-06-07 16:29:38 +00:00
Marek Olšák	cd366b57d9	ac/nir: implement load_subgroup_id/local_invocation_index for TCS on gfx6-10.x Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34780>	2025-06-07 16:29:38 +00:00
Marek Olšák	c3034fa82c	amd: replace most u_bit_consecutive* with BITFIELD_MASK/RANGE Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35346>	2025-06-04 17:46:38 +00:00
Karol Herbst	4f5ce2d5aa	ac/nir: fix unaligned single component load/stores This fixes two problems: 1. we need to lower the bit_size according to the alignment. 2. num_components could end up being 0, so we need to round up instead. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13102 Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34976>	2025-06-03 13:14:31 +00:00
Samuel Pitoiset	fe2c93a788	ac/nir: enable 64-bit lowering for bitfield_extract Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35187>	2025-05-29 08:45:41 +02:00
Marek Olšák	6e4154b7ef	ac/nir: fix export_ps_outputs not preserving divergence metadata Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34492>	2025-05-14 20:19:16 +00:00
Georg Lehmann	a2209547db	ac/nir: enable nir_op_bfdot2_bfadd Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34768>	2025-05-09 11:20:26 +00:00
Georg Lehmann	f364303084	ac/nir: set lower_bfloat16_conversions Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34768>	2025-05-09 11:20:26 +00:00
Rhys Perry	3b42626973	ac/nir: allow 8/16-bit smem loads fossil-db (gfx1201): Totals from 295 (0.37% of 79377) affected shaders: Instrs: 314018 -> 313355 (-0.21%); split: -0.22%, +0.00% CodeSize: 1697996 -> 1696528 (-0.09%); split: -0.11%, +0.02% Latency: 4197719 -> 4197106 (-0.01%) InvThroughput: 1258891 -> 1258744 (-0.01%) PreSGPRs: 12232 -> 12230 (-0.02%) SALU: 66762 -> 66269 (-0.74%) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34162>	2025-05-08 13:30:50 +00:00
Rhys Perry	5b116c4de9	ac/nir: allow vectorization of unsupported 8/16-bit loads These can later be lowered to a vectorized 32-bit load. fossil-db (gfx1201): Totals from 1259 (1.59% of 79377) affected shaders: MaxWaves: 36821 -> 36817 (-0.01%) Instrs: 4363702 -> 4355749 (-0.18%); split: -0.23%, +0.05% CodeSize: 22779980 -> 22706504 (-0.32%); split: -0.37%, +0.05% VGPRs: 69672 -> 69792 (+0.17%); split: -0.02%, +0.19% SpillSGPRs: 675 -> 673 (-0.30%) Latency: 26684053 -> 26663819 (-0.08%); split: -0.11%, +0.03% InvThroughput: 5617687 -> 5614798 (-0.05%); split: -0.10%, +0.04% VClause: 106830 -> 106654 (-0.16%); split: -0.17%, +0.00% SClause: 75523 -> 75495 (-0.04%); split: -0.04%, +0.01% Copies: 323199 -> 323525 (+0.10%); split: -0.10%, +0.20% Branches: 109475 -> 109480 (+0.00%); split: -0.00%, +0.01% PreSGPRs: 55036 -> 55040 (+0.01%) PreVGPRs: 47538 -> 47582 (+0.09%); split: -0.12%, +0.21% VALU: 2377777 -> 2389977 (+0.51%); split: -0.02%, +0.53% SALU: 578272 -> 578385 (+0.02%); split: -0.02%, +0.04% VMEM: 190065 -> 181204 (-4.66%) SMEM: 99709 -> 99565 (-0.14%) VOPD: 244 -> 243 (-0.41%); split: +0.41%, -0.82% Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34162>	2025-05-08 13:30:50 +00:00
Rhys Perry	6dbf44ad9c	ac/nir: allow less than one register of overfetch This is to allow vectorization of 8/16-bit loads, which can later be cheaply lowered to a 32-bit load. fossil-db (gfx1201): Totals from 178 (0.22% of 79377) affected shaders: MaxWaves: 4138 -> 4102 (-0.87%) Instrs: 619714 -> 617917 (-0.29%); split: -0.32%, +0.03% CodeSize: 3364396 -> 3352724 (-0.35%); split: -0.38%, +0.03% VGPRs: 12896 -> 12980 (+0.65%); split: -0.19%, +0.84% SpillSGPRs: 546 -> 545 (-0.18%) Latency: 7589585 -> 7406076 (-2.42%); split: -2.45%, +0.04% InvThroughput: 1926356 -> 1879866 (-2.41%); split: -2.42%, +0.00% VClause: 12301 -> 11750 (-4.48%) SClause: 13614 -> 13583 (-0.23%); split: -0.45%, +0.22% Copies: 82207 -> 82265 (+0.07%); split: -0.10%, +0.17% Branches: 19284 -> 19266 (-0.09%) PreSGPRs: 9525 -> 9457 (-0.71%) PreVGPRs: 12366 -> 12421 (+0.44%) VALU: 347928 -> 348020 (+0.03%); split: -0.01%, +0.04% SALU: 82620 -> 82519 (-0.12%); split: -0.19%, +0.07% VMEM: 22248 -> 21430 (-3.68%) SMEM: 17951 -> 17843 (-0.60%) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34162>	2025-05-08 13:30:50 +00:00
Rhys Perry	ddef4bddf8	ac/nir: round components when lowering 8/16-bit loads to 32-bit Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34162>	2025-05-08 13:30:50 +00:00
Marek Olšák	dfc3c1135c	ac/nir/tess: don't pass nir_intrinsic_instr to hs_output_lds_offset It will be used without intrinsics. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34863>	2025-05-08 02:54:13 +00:00
Marek Olšák	4bbe497d9b	ac/nir/tess: don't pass nir_intrinsic_instr to VMEM IO calc helpers These will be used without intrinsics. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34863>	2025-05-08 02:54:13 +00:00
Marek Olšák	360494f50d	ac/nir/tess: remove unused variables Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34863>	2025-05-08 02:54:12 +00:00
Marek Olšák	f58c0cbb6a	nir: split _accessed_indirectly bitmasks into _read/written_indirectly for AMD Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34863>	2025-05-08 02:54:12 +00:00
Pierre-Eric Pelloux-Prayer	992a340eab	ac/nir: init blake3 for cs blit shader Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34574>	2025-04-23 07:59:10 +00:00
Marek Olšák	d2e016c37d	ac/nir: don't store tess levels for TES in TCS if no_varying is set Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34544>	2025-04-19 22:55:00 -04:00
Marek Olšák	be8977811b	ac/nir: remove shader_info parameter from ac_nir_compute_tess_wg_info Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34544>	2025-04-19 22:55:00 -04:00
Marek Olšák	5fb2de9454	ac/nir: don't include TCS offchip size in LDS_SIZE This drastically reduces LDS usage for TCS. Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34544>	2025-04-19 22:55:00 -04:00
Marek Olšák	2c122d478b	ac/nir: set X=0 for task->mesh shader dispatch when Y or Z is 0 The code set X=0 when Y and Z is 0, not "or". Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34432>	2025-04-16 06:08:48 +00:00
Marek Olšák	27d5be13c6	ac/nir/cull: always do frustum culling, skip only small prim culling Only small prim culling uses the viewport state, so only that must be disabled when there are multiple viewports. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34016>	2025-04-07 19:44:22 +00:00
Marek Olšák	0f97dc707d	ac/nir/cull: rename skip_viewport_culling -> skip_viewport_state_culling Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34016>	2025-04-07 19:44:22 +00:00
Marek Olšák	1d5c42528b	nir/opt_algebraic: lower 16-bit imul_high & umul_high Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34016>	2025-04-07 19:44:22 +00:00
Marek Olšák	ce716d009f	ac/nir/cull: cull small prims using a point-triangle intersection test This is based on Timur Kristof's code, but there are a lot of differences. The idea is that it doesn't just compute an intersection between a point and a triangle. It computes the distance between a point and a triangle and it does so in screen space. It accurately takes the subpixel precision of the rasterizer into account, so that it works optimally at all resolutions, all MSAA modes, and all quant modes. The distance computation is only approximated because it only considers the infinite lines going through triangle edges. However, it seems to be more than sufficient in practice because the existing rounding-based small prim culling compensates for it. The performance improvement is up to 10% in some geometry-bound tests, though targeted microbenchmarks can show a lot more than that. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33361>	2025-04-01 16:12:22 +00:00
Pierre-Eric Pelloux-Prayer	785df1b980	ac/nir: fix nir_metadata value of ac_nir_lower_image_opcodes This pass can insert new blocks so 'nir_metadata_control_flow' is not preserved. Fixes: eaf98b1422 ("ac/nir: implement image opcode emulation for CDNA, enable it in radeonsi") Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34241>	2025-03-31 15:19:29 +02:00
Timur Kristóf	64c6930bfc	ac/nir/ngg: Remove cleanup_culling_shader_after_dce. Not needed anymore, now that the new concept is there. No Fossil DB changes on Navi 21. Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22073>	2025-03-29 00:47:20 +00:00
Timur Kristóf	243a80be44	ac/nir/ngg: Use deferred info for compacted arguments. This means we don't have to emit dead code anymore and can only repack the sysvals that are actually used by the deferred part. No Fossil DB changes on Navi 21. Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22073>	2025-03-29 00:47:20 +00:00
Timur Kristóf	0b71293358	ac/nir/ngg: Gather info about what the deferred shader part uses. Now that the deferred shader part is prepared before emitting the non-deferred part, we can also gather info about what sysvals it needs. No Fossil DB changes on Navi 21. Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22073>	2025-03-29 00:47:20 +00:00
Timur Kristóf	e4c91c01e3	ac/nir/ngg: Prepare deferred shader part before adding culling code. The previous concept was to emit the non-deferred shader part first, including the culling code, and then modify the non-deferred part accordingly. This caused some issues because it was really impossible to tell which sysvals the deferred part needs after DCE, so we had to run an additional cleanup pass afterwards. The new concept is to prepare the deferred part first by applying reusable variables (from the non-deferred part) and run DCE. This opens the possibility to accurately gather info about what the deferred part needs. This idea is further expanded in the next commits. Fossil DB stats on Navi 21: Totals from 17 (0.02% of 79377) affected shaders: Instrs: 18063 -> 18064 (+0.01%) CodeSize: 93368 -> 93372 (+0.00%) Latency: 49889 -> 49899 (+0.02%); split: -0.01%, +0.03% SALU: 2416 -> 2417 (+0.04%) Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22073>	2025-03-29 00:47:20 +00:00
Timur Kristóf	e9e58fa412	ac/nir/ngg: Remove inputs_needed_by_* This information will be collected by NIR core better, no need to do it here. It is also currently unused. No functional changes. Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22073>	2025-03-29 00:47:20 +00:00
Timur Kristóf	1e7d28a82e	ac/nir/ngg: Improve reuse of position value. Instead of hand-rolled code, use nir_scalar and its helper functions to reuse the position value. Results in more copies, which are mitigated by copy prop from the previous commit. This helps eliminate some instructions, especially VMEM loads from the deferred shader part of NGG culling shaders, which can be reused from the position values calculated by the non-deferred part. Fossil DB stats on Navi 21: Totals from 2472 (3.11% of 79377) affected shaders: MaxWaves: 78748 -> 78772 (+0.03%) Instrs: 636342 -> 633739 (-0.41%); split: -0.45%, +0.04% CodeSize: 3444740 -> 3427172 (-0.51%); split: -0.53%, +0.02% VGPRs: 62552 -> 62176 (-0.60%) Latency: 2025711 -> 2019449 (-0.31%); split: -0.73%, +0.42% InvThroughput: 221140 -> 221946 (+0.36%); split: -0.12%, +0.49% VClause: 5443 -> 5278 (-3.03%); split: -3.20%, +0.17% SClause: 8369 -> 8302 (-0.80%); split: -0.82%, +0.02% Copies: 102435 -> 101652 (-0.76%); split: -0.87%, +0.11% PreSGPRs: 63714 -> 63533 (-0.28%) PreVGPRs: 48555 -> 48392 (-0.34%) VALU: 242165 -> 241457 (-0.29%); split: -0.33%, +0.04% SALU: 197656 -> 197482 (-0.09%); split: -0.10%, +0.01% VMEM: 7746 -> 7571 (-2.26%) SMEM: 10822 -> 10730 (-0.85%) Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22073>	2025-03-29 00:47:20 +00:00
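
Commit a59464b6e3 above describes the TCS per-vertex output stride as "16 * num_threads, aligned to 256". A minimal sketch of that arithmetic follows. The helper names are hypothetical and are not taken from the Mesa source, and reading the constant 16 as the byte size of one vec4 is an assumption, not something the commit message states.

```c
#include <assert.h>
#include <stdint.h>

/* Round value up to the next multiple of alignment.
 * alignment must be a non-zero power of two. */
static uint32_t align_pot(uint32_t value, uint32_t alignment)
{
   assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
   return (value + alignment - 1) & ~(alignment - 1);
}

/* Hypothetical helper illustrating the commit message:
 * the stride of one output is 16 * num_threads, aligned to 256.
 * The 16 is assumed to be the size in bytes of one vec4. */
static uint32_t tcs_per_vertex_output_stride(uint32_t num_threads)
{
   return align_pot(16 * num_threads, 256);
}
```

With these definitions, 16 threads give a stride of exactly 256, while 20 threads give 320 before alignment and 512 after it.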


136 commits