fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-19 07:08:05 +02:00

Author	SHA1	Message	Date
Alyssa Rosenzweig	c6e3324980	agx: Legalize image LODs to be 16-bit Required by the hardware. Do it in NIR so we can optimize the conversion. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24258>	2023-07-20 15:33:28 +00:00
Alyssa Rosenzweig	a77facd459	asahi: Augment PBE descriptor for software access For implementing image atomics (and multisample image writes), we need information about the image layout in the shader. It's a lot nicer to determine the image layouts on the CPU (where we have ail) and stash the results in the PBE descriptor, where we have a convenient hole to do so, rather than trying to do all the layout calculations on the GPU on the fly. Add a data structure that the driver will fill out and the image atomic lowering will consider as part of the hardware. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24258>	2023-07-20 15:33:28 +00:00
Asahi Lina	ee83453f69	asahi: Add a shared library interface for decode Add a simple API so that decode can be used as a shared library by the Python hypervisor. Note that this is not thread-safe. If we ever want to use this in other contexts with thread safety, it will need a refactor (along with the core decode code anyway). Signed-off-by: Asahi Lina <lina@asahilina.net> Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24258>	2023-07-20 15:33:28 +00:00
Asahi Lina	55d363e02e	asahi: decode: Add a function to construct decode_params from a chip_id Should be useful on macOS later to properly support detecting the right GPU, but for now just hardcode T8103/G13G. Signed-off-by: Asahi Lina <lina@asahilina.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24258>	2023-07-20 15:33:28 +00:00
Asahi Lina	56d5db247a	asahi: decode: Refactor to always copy GPU mem to local buffers We want to plug this library into the hypervisor, but there we don't have all GPU memory already mapped in our address space. Refactor the GPU mem read function to always allocate local buffers and copy in the data there. Signed-off-by: Asahi Lina <lina@asahilina.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24258>	2023-07-20 15:33:28 +00:00
Asahi Lina	2c2858c2af	asahi: wrap: Handle freeing shmems Needed for some Metal demos that end up creating multiple queues. This is still definitely broken/not fully correct, but it at least gets things working for those. Signed-off-by: Asahi Lina <lina@asahilina.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24258>	2023-07-20 15:33:28 +00:00
Asahi Lina	0dc819f284	asahi: Add extra CDM header block for G14X Looks like we finally found our first properly divergent codepath. Signed-off-by: Asahi Lina <lina@asahilina.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24258>	2023-07-20 15:33:28 +00:00
Asahi Lina	69e91527d3	asahi: decode: Add a params argument to pass through Sooner or later we were going to need divergent codepaths in decode, and it looks like now is the time. Add a `params` typedef and pass it through all the decoder callbacks. This is an alias for drm_asahi_params_global, but use a typedef so we can change that later without changing dozens of instances. Signed-off-by: Asahi Lina <lina@asahilina.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24258>	2023-07-20 15:33:28 +00:00
Alyssa Rosenzweig	de1174791d	agx: Fix bogus assert Dolphin uses all the uniforms. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24258>	2023-07-20 15:33:28 +00:00
Alyssa Rosenzweig	80e103d718	agx: Reduce un/packs with mem access lowering Often not needed and makes the NIR harder to read. shader-db is noise. total instructions in shared programs: 1752712 -> 1752688 (<.01%) instructions in affected programs: 8338 -> 8314 (-0.29%) helped: 21 HURT: 8 Inconclusive result (%-change mean confidence interval includes 0). total bytes in shared programs: 11943572 -> 11943434 (<.01%) bytes in affected programs: 56716 -> 56578 (-0.24%) helped: 21 HURT: 8 Inconclusive result (%-change mean confidence interval includes 0). Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24258>	2023-07-20 15:33:28 +00:00
Alyssa Rosenzweig	afa38c7d4f	agx: Vectorize 16-bit parallel copies If we have two 16-bit copies to/from adjacent 16-bit registers, we can instead use a single 32-bit copy from the 32-bit register pair. Since 32-bit integer arithmetic is (almost) as efficient as 16-bit on AGX, this (almost) doubles performance of affected parallel copies. total instructions in shared programs: 1788606 -> 1788301 (-0.02%) instructions in affected programs: 17057 -> 16752 (-1.79%) helped: 150 HURT: 0 Instructions are helped. total bytes in shared programs: 12196492 -> 12194662 (-0.02%) bytes in affected programs: 122894 -> 121064 (-1.49%) helped: 150 HURT: 0 Bytes are helped. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24258>	2023-07-20 15:33:28 +00:00
Alyssa Rosenzweig	42a4c09b72	agx: Try to allocate phi sources with loop phis total instructions in shared programs: 1788666 -> 1788606 (<.01%) instructions in affected programs: 7953 -> 7893 (-0.75%) helped: 29 HURT: 0 Instructions are helped. total bytes in shared programs: 12196852 -> 12196492 (<.01%) bytes in affected programs: 53908 -> 53548 (-0.67%) helped: 29 HURT: 0 Bytes are helped. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24258>	2023-07-20 15:33:28 +00:00
Alyssa Rosenzweig	d0caa08c26	agx: Try to allocate phi sources with phis Not meaningfully using more registers since this is just about how we assign registers after fixing the maximum # of registers used (note that thread count is unaffected). total instructions in shared programs: 1790901 -> 1788666 (-0.12%) instructions in affected programs: 230680 -> 228445 (-0.97%) helped: 681 HURT: 2 Instructions are helped. total bytes in shared programs: 12210266 -> 12196852 (-0.11%) bytes in affected programs: 1634100 -> 1620686 (-0.82%) helped: 682 HURT: 2 Bytes are helped. total halfregs in shared programs: 532130 -> 532218 (0.02%) halfregs in affected programs: 848 -> 936 (10.38%) helped: 3 HURT: 13 Halfregs are HURT. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24258>	2023-07-20 15:33:28 +00:00
Alyssa Rosenzweig	73da872a66	agx: Try to allocate phis compatibly with sources All shaders affected for thread count are in pubg... by chance the allocation before used fewer registers than the calculated register demand (I guess because we're conservative with our vector handling) and so got lucky and got higher thread count. That shader is also helped massively for instructions. The halfreg change doesn't matter -- we're not actually increasing register demand, we're just being more choosy about our registers. total instructions in shared programs: 1799738 -> 1790901 (-0.49%) instructions in affected programs: 306081 -> 297244 (-2.89%) helped: 889 HURT: 14 Instructions are helped. total bytes in shared programs: 12263290 -> 12210266 (-0.43%) bytes in affected programs: 2150966 -> 2097942 (-2.47%) helped: 889 HURT: 14 Bytes are helped. total halfregs in shared programs: 531981 -> 532130 (0.03%) halfregs in affected programs: 1925 -> 2074 (7.74%) helped: 0 HURT: 26 Halfregs are HURT. total threads in shared programs: 18885184 -> 18884224 (<.01%) threads in affected programs: 13440 -> 12480 (-7.14%) helped: 0 HURT: 15 Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24258>	2023-07-20 15:33:28 +00:00
Alyssa Rosenzweig	6cc8d7b52a	agx: Add try_coalesce_with helper Common logic the next few patches will use to try to assign something to the same register as something else. "If it's already been assigned a register and that register is free now, use it, otherwise bail." Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24258>	2023-07-20 15:33:28 +00:00
Alyssa Rosenzweig	8db9eeaeec	asahi: Upload image descriptors Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24258>	2023-07-20 15:33:28 +00:00
Alyssa Rosenzweig	16f081bf2a	ail: Page-align layers for writable images This appears to be necessary for PBE writes. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24258>	2023-07-20 15:33:27 +00:00
Alyssa Rosenzweig	f716da596b	asahi,agx: Set coherency bit for clustered targets We need to set a particular bit on atomics for them to be coherent across clusters. Fixes atomics on G13X. Setting this bit on the single-cluster G13G, on the other hand, wedges the GPU. So best be careful ;-) Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24258>	2023-07-20 15:33:27 +00:00
Janne Grunau	f66fc18886	asahi: toggle more barrier bits after transform feedback Fixes KHR-GLES31.core.draw_indirect.advanced-twoPass-transformFeedback-arrays and KHR-GLES31.core.draw_indirect.advanced-twoPass-transformFeedback-elements on M1 Ultra (G13D). Let's assume that same bits are required on M1 Pro and Max. Signed-off-by: Janne Grunau <j@jannau.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24258>	2023-07-20 15:33:27 +00:00
Alyssa Rosenzweig	58d43ca03c	asahi: Identify background/EOT counts Similar to the counts for VDM/PDM/CDM. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24258>	2023-07-20 15:33:27 +00:00
Asahi Lina	0e08923a7b	asahi: Add nomsaa debug flag This forces off MSAA, which together with smalltile mode helps test more combinations. Signed-off-by: Asahi Lina <lina@asahilina.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24258>	2023-07-20 15:33:27 +00:00
Asahi Lina	e9b2f02c2f	asahi: Add smalltile debug option This lets us force small tiles when they otherwise would not be necessary, which is useful for decoupling tile size and the logic that depends on it from things like MSAA and MRT which can trigger small tiles. Signed-off-by: Asahi Lina <lina@asahilina.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24258>	2023-07-20 15:33:27 +00:00
Asahi Lina	35715db30d	asahi: Add synctvb debug flag This requests synchronous TVB growth (instead of split renders). Mostly for testing at this point. Only works with newer kernels and the kernel will complain on dmesg for now. Signed-off-by: Asahi Lina <lina@asahilina.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24258>	2023-07-20 15:33:27 +00:00
Alyssa Rosenzweig	85c829d64f	asahi: Remove unused #define Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24258>	2023-07-20 15:33:27 +00:00
Alyssa Rosenzweig	f10d51541d	asahi: Use nir_builder_at more Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24258>	2023-07-20 15:33:27 +00:00
Alyssa Rosenzweig	c20c9f06d3	asahi: Augment fake drm_asahi_params_global Stub out a bit more UAPI so we can build with the additions in this patch series. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24258>	2023-07-20 15:33:27 +00:00
Alyssa Rosenzweig	a28f9738e1	asahi: Use txf_ms for MSAA background programs Fixes regression in assorted dEQP tests including: dEQP-EGL.functional.color_clears.multi_context.gles3.rgba8888_window Fixes: `d4424950ac` ("asahi: Use txf for background program") Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23998>	2023-07-05 05:11:49 +00:00
Alyssa Rosenzweig	02ac7305a0	agx: Don't leak ssa_to_reg_out calloc'd in the RA, should be freed in the RA. Identified with valgrind. Fixes: 6b13616cba2 ("agx: Implement vector live range splitting") Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23998>	2023-07-05 05:11:49 +00:00
Asahi Lina	1140bdb783	asahi: Arrange VS varyings in the correct order The GPU ABI requires varyings to be grouped as follows: - Position - Smooth shaded fp32 - Flat shaded fp32 - Linear shaded fp32 - Smooth shaded fp16 - Flat shaded fp16 - Linear shaded fp16 - Point size Use the flat shaded mask info we now have in the vertex shader key to sort things properly, and pass the counts to the hardware. FP16 is still TODO. Signed-off-by: Asahi Lina <lina@asahilina.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23998>	2023-07-05 05:11:49 +00:00
Asahi Lina	2055e03243	asahi: Add flat/linear shaded varyings mask to the VS shader key We need this information in order to arrange varyings properly, which means we need shader variants. Add this to the shader key, taking the value from the FS input info. Signed-off-by: Asahi Lina <lina@asahilina.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23998>	2023-07-05 05:11:49 +00:00
Asahi Lina	4a65b4bb14	asahi: Fix type confusion for fragment shader keys We can't attempt to access the fs union member if this is not a FS. That worked so far since there wasn't a VS shader key at all, but we're about to introduce one. Signed-off-by: Asahi Lina <lina@asahilina.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23998>	2023-07-05 05:11:49 +00:00
Asahi Lina	90834353a1	asahi: Gather flat/linear shaded input info from uncompiled FS We need to propagate shading model metadata from the FS to the VS in order to correctly lay out the uniforms in the right order. This means we need VS variants depending on this data. We could use the existing shader info structure, but that applies to compiled shaders which would introduce a dependency from the VS compile to the FS compile. This information does not change with FS variants, so we can introduce an agx_uncompiled_shader_info structure and gather it early at precompilation time. Signed-off-by: Asahi Lina <lina@asahilina.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23998>	2023-07-05 05:11:49 +00:00
Asahi Lina	49994dc8cb	asahi: Identify the separate varying count fields Flat/Gouraud/linear and 32/16 need to be specified separately. This change identifies the new fields but should be a functional no-op. Signed-off-by: Asahi Lina <lina@asahilina.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23998>	2023-07-05 05:11:49 +00:00
Alyssa Rosenzweig	d9bf52e00f	agx: Assert that barriers are not used in the preamble It is nonsensical and confuses the hardware. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23998>	2023-07-05 05:11:49 +00:00
Alyssa Rosenzweig	9bf7d14b2c	agx: Use nir_opt_shrink_vectors Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23998>	2023-07-05 05:11:49 +00:00
Alyssa Rosenzweig	c81a14c754	agx: Use nir_opt_shrink_stores This especially helps with image stores, where we otherwise insert a bunch of pointless moves to collect a vector even when we know the format only has a single channel. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23998>	2023-07-05 05:11:49 +00:00
Alyssa Rosenzweig	b57faede71	asahi: Identify PBE::sRGB flag Needed to write out sRGB images correctly. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23998>	2023-07-05 05:11:49 +00:00
Alyssa Rosenzweig	6dc6991930	asahi: Rename 'Render Target' to 'PBE' It's used for all PBE operations, including regular image writes, so use the more general name. Compare the powervr driver. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23998>	2023-07-05 05:11:49 +00:00
Alyssa Rosenzweig	75b5bf8dbc	asahi: Strip ? in GenXML Sometimes it's nice to have boolean flags with ? in the name, allow this by stripping ? when generating the sanitized C name. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23998>	2023-07-05 05:11:49 +00:00
Asahi Lina	850380cbf5	asahi: match_soa: Treat offsets as signed An offset may be negative, indexing backwards from the array base. When we right shift an offset by the format shift, we need to use a signed shift to ensure that the resulting offset is still negative. Fixes Nautilus faults/pink crashes. Signed-off-by: Asahi Lina <lina@asahilina.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23998>	2023-07-05 05:11:49 +00:00
Alyssa Rosenzweig	a90b0743f3	agx: Smarten discard_agx -> sample_mask lowering In 97a1bbeaf26 ("agx: Fix discards"), we made our discard lowering very simple, since we had just discovered the underlying instruction behaviour and needed a hotfix for misrendering in the wild. Now that we understand the behaviour, we can do better. There are two potential performance issues with the lowering in that commit: 1. It generates extra sample_mask instructions. For a shader that has a single discard_if at root level, it would generate two instructions sample_mask foo, 0 sample_mask ~0, ~0 rather than a single sample_mask ~0, ~foo 2. It runs depth/stencil testing/updates at the end of the shader, even when it could be run immediately after the discard. This might cause pipeline stalls. The solution is to insert the "trigger testing" sample_mask instruction as soon after the "discard" instruction as possible, fusing them if they would be next to each other. There are two cases: 1. The last discard is executed unconditionally. In this case, we can test immediately after, unconditionally, and fuse together. 2. The last discard is executed conditionally. In this case, we test in the first unconditional block after the discard. Example shader: ... loop { if .. { loop { discard_if <-- discard here ... } .. } ... } <---- we test here ... store_output Together this covers all the usual patterns for single-sampled discard. We could still do better with multisampling, but whatever. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23998>	2023-07-05 05:11:49 +00:00
Alyssa Rosenzweig	5a4c9136cd	agx: Add algebraic opt to help with discard lowering When lowering discards, it will be convenient to generate the pattern: (cond ? 255 : 0) ^ 255 Add rules to optimize that to (cond ? 0 : 255) This is not part of the main algebraic optimizer since this lowering happens late. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23998>	2023-07-05 05:11:49 +00:00
Konstantin Seurer	8ce27e7ed2	asahi: Use nir_builder_at Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23883>	2023-07-03 15:21:37 +00:00
Yonggang Luo	99dce8407e	asahi: Use nir_foreach_function_impl instead of nir_foreach_function in function agx_nir_lower_zs_emit Signed-off-by: Yonggang Luo <luoyonggang@gmail.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23920>	2023-06-29 11:29:54 +00:00
Yonggang Luo	62ce223245	treewide: Switch to use nir_foreach_function_with_impl when possible Signed-off-by: Yonggang Luo <luoyonggang@gmail.com> Reviewed-by: Jesse Natalie <jenatali@microsoft.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23903>	2023-06-29 08:36:03 +00:00
Alyssa Rosenzweig	173b9ee69a	treewide: Use nir_builder_create more perl -p0e 's/nir_builder_init\(&([^,]*), /\1 = nir_builder_create(/g' -i $(git grep -l nir_builder_init) Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23860>	2023-06-27 18:13:02 +00:00
Alyssa Rosenzweig	815efcdf7e	nir: Use nir_builder_create perl -p0e 's/nir_builder ([^;]);\snir_builder_init\(&\1, /nir_builder \1 = nir_builder_create(/g' -i $(git grep -l nir_builder_init) Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23860>	2023-06-27 18:13:02 +00:00
Alyssa Rosenzweig	d4424950ac	asahi: Use txf for background program More straightforward (txf instead of tex, with integer coords). No discernible performance difference. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23836>	2023-06-27 14:38:21 +00:00
Alyssa Rosenzweig	05adeb850b	agx: Use nir_lower_frag_coord_to_pixel_coord Instead of open-coding the logic. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23836>	2023-06-27 14:38:21 +00:00
Alyssa Rosenzweig	766535c867	agx: Implement vector live range splitting The SSA killer feature is that, under an "optimal" allocator, the number of registers used (register demand) is equal to the number of registers required (register pressure, the maximum number of variables simultaneously live at any point in the program). I put "optimal" in scare quotes, because we don't need to use the exact minimum number of registers as long as we don't sacrifice thread count or introduce spilling, and using a few extra registers when possible can help coalesce moves. Details-shmetails. The problem is that, prior to this commit, our register allocator was not well-behaved in certain circumstances, and would require an arbitrarily large number of registers. In particular, since different variables have different sizes and require contiguous allocation, in large programs the register file may become fragmented, causing the RA to use arbitrarily many registers despite having lots of registers free. The solution is vector live range splitting. First, we calculate the register pressure (the minimum number of registers that it is theoretically possible to allocate successfully), and round up to the maximum number of registers we will actually use (to give some wiggle room to coalesce moves). Then, we will treat this maximum as a bound, requiring that we don't use more registers than chosen. In the event that register file fragmentation prevents us from finding a contiguous sequence of registers to allocate a variable, rather than giving up or using registers we don't have, we shuffle the register file around (defragmenting it) to make room for the new variable. That lets us use a few moves to avoid sacrificing thread count or introducing spilling, which is usually a great choice. Android GLES3.1 shader-db results are as expected: some noise / small regressions for instruction count, but a bunch of shaders with improved thread count. The massive increase in register demand may seem weird, but this is the RA doing exactly what it's supposed to: using more registers if and only if they would not hurt thread count. Notice that no programs whatsoever are hurt for thread count, which is the salient part. total instructions in shared programs: 1781473 -> 1781574 (<.01%) instructions in affected programs: 276268 -> 276369 (0.04%) helped: 1074 HURT: 463 Inconclusive result (value mean confidence interval includes 0). total bytes in shared programs: 12196640 -> 12201670 (0.04%) bytes in affected programs: 1987322 -> 1992352 (0.25%) helped: 1060 HURT: 513 Bytes are HURT. total halfregs in shared programs: 488755 -> 529651 (8.37%) halfregs in affected programs: 295651 -> 336547 (13.83%) helped: 358 HURT: 9737 Halfregs are HURT. total threads in shared programs: 18875008 -> 18885440 (0.06%) threads in affected programs: 64576 -> 75008 (16.15%) helped: 82 HURT: 0 Threads are helped. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23832>	2023-06-23 17:37:41 +00:00
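
Illustration for 766535c867 ("agx: Implement vector live range splitting"): the fragmentation problem that commit describes can be modelled in a few lines of Python. This is a sketch only, not the Mesa code. The names find_contiguous, defragment and allocate, the dict-of-(base, size) layout, the choice to ignore register alignment, and the choice to pack every live variable toward register 0 are all assumptions made for the example. The commit message does not say which variables the real allocator moves or how it orders the moves.

```python
def find_contiguous(free, size):
    """Return the base of the first run of `size` free registers, or None."""
    run = 0
    for i, is_free in enumerate(free):
        run = run + 1 if is_free else 0
        if run == size:
            return i - size + 1
    return None


def defragment(assignment, num_regs):
    """Pack live variables toward register 0 (hypothetical strategy).

    `assignment` maps a variable name to (base, size). Returns the packed
    assignment and the moves as (name, old_base, new_base), in ascending
    order of old_base so that executing them one by one never overwrites a
    variable that has not been moved yet.
    """
    packed, moves, next_free = {}, [], 0
    for name, (base, size) in sorted(assignment.items(), key=lambda kv: kv[1][0]):
        if base != next_free:
            moves.append((name, base, next_free))
        packed[name] = (next_free, size)
        next_free += size
    assert next_free <= num_regs
    return packed, moves


def allocate(assignment, num_regs, name, size):
    """Allocate `size` contiguous registers without exceeding `num_regs`.

    If enough registers are free but no contiguous run exists, shuffle the
    register file instead of using registers beyond the bound.
    """
    free = [True] * num_regs
    for base, sz in assignment.values():
        for reg in range(base, base + sz):
            free[reg] = False

    moves = []
    base = find_contiguous(free, size)
    if base is None:
        if sum(free) < size:
            raise ValueError("register pressure exceeds the chosen bound")
        assignment, moves = defragment(assignment, num_regs)
        # After packing, everything past the live variables is free.
        base = sum(sz for _, sz in assignment.values())

    result = dict(assignment)
    result[name] = (base, size)
    return result, moves
```

With six registers and three scalars at registers 0, 2 and 4, three registers are free but no two are adjacent, so a request for a two-register vector goes through the defragmentation path: allocate({"a": (0, 1), "b": (2, 1), "c": (4, 1)}, 6, "v", 2). The cost is a couple of moves, which is the trade the commit message describes as preferable to losing thread count or spilling.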


917 commits