fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-05 09:38:07 +02:00

Author	SHA1	Message	Date
Asahi Lina	0a132b0640	asahi: Add a helper macro for debug/error messages This includes the program short name in the message, which is useful when running entire desktop sessions with a single log to figure out who is doing what. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>	2023-04-07 03:23:04 +00:00
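A minimal sketch of such a macro follows. It assumes a helper that returns the process short name. `get_process_name`, `format_msg` and `AGX_MSG` are hypothetical names used only for illustration, not the driver's actual code.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-in for a "current process short name" helper. */
static const char *get_process_name(void)
{
   return "glxgears";
}

/* Format "[<process>] asahi: <message>" into buf.
 * Returns the snprintf-style total length, or a negative value on error. */
static int format_msg(char *buf, size_t len, const char *fmt, ...)
{
   assert(buf != NULL && len > 0);

   int n = snprintf(buf, len, "[%s] asahi: ", get_process_name());
   if (n < 0 || (size_t)n >= len)
      return n; /* the prefix alone already filled (or failed to fill) buf */

   va_list ap;
   va_start(ap, fmt);
   int m = vsnprintf(buf + n, len - (size_t)n, fmt, ap);
   va_end(ap);
   return m < 0 ? m : n + m;
}

/* Hypothetical macro: every driver message carries the program name, so a
 * single log from a whole desktop session shows which client printed what. */
#define AGX_MSG(...)                                                 \
   do {                                                              \
      char agx_msg_buf_[256];                                        \
      format_msg(agx_msg_buf_, sizeof(agx_msg_buf_), __VA_ARGS__);   \
      fprintf(stderr, "%s\n", agx_msg_buf_);                         \
   } while (0)
```

For example, `AGX_MSG("BO import failed: %d", -22);` would print `[glxgears] asahi: BO import failed: -22` to stderr.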
Asahi Lina	883ba4b161	asahi: Make BO import path failures more robust These operations can fail for complex reasons through no fault of mesa, so we should have proper runtime checks for them even in release builds. Signed-off-by: Asahi Lina <lina@asahilina.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>	2023-04-07 03:23:04 +00:00
Asahi Lina	fcf594d00b	asahi: Implement valid buffer range tracking A common pattern is to allocate a vertex/etc buffer and write to it in subsets. Some games interleave this with draw calls using the buffer. This causes very expensive flushing for every draw call. Fix this by tracking which range of a buffer has been written to, and elide syncs when the range was previously uninitialized. Fixes Source engine game performance and probably helps a bunch of others. Signed-off-by: Asahi Lina <lina@asahilina.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>	2023-04-07 03:23:04 +00:00
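The idea can be sketched as a single conservative interval of "ever written" bytes per buffer. This is an illustration under that assumption. `struct valid_range` and the helper functions are hypothetical, not the driver's data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Half-open [start, end) byte range that has ever been written.
 * Empty when start >= end. */
struct valid_range {
   uint64_t start, end;
};

static void range_init(struct valid_range *r)
{
   r->start = UINT64_MAX;
   r->end = 0;
}

static void range_add(struct valid_range *r, uint64_t start, uint64_t end)
{
   assert(start <= end);
   if (start < r->start)
      r->start = start;
   if (end > r->end)
      r->end = end;
}

static bool range_intersects(const struct valid_range *r, uint64_t start, uint64_t end)
{
   return start < r->end && r->start < end;
}

/* Does a CPU write to [start, end) need to wait for the GPU? Only if it may
 * overwrite bytes that already hold data a queued draw could be reading;
 * writes into never-initialized bytes can skip the flush. */
static bool write_needs_sync(struct valid_range *r, uint64_t start, uint64_t end)
{
   bool sync = range_intersects(r, start, end);
   range_add(r, start, end);
   return sync;
}
```

Keeping one interval rather than a list of disjoint ranges is cheap, at the cost of reporting a sync for writes that land in a never-written gap between two written regions.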
Asahi Lina	00064ba4e3	asahi: Fix style nits Found with a grep abomination which is probably too broken/silly to actually implement in CI... but hey, at least it found some. Signed-off-by: Asahi Lina <lina@asahilina.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>	2023-04-07 03:23:04 +00:00
Asahi Lina	a88b9c5540	asahi: Locate low VA BOs correctly These need the shader_base added to them. Signed-off-by: Asahi Lina <lina@asahilina.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>	2023-04-07 03:23:04 +00:00
Asahi Lina	030b2306a4	asahi: Enable glthread This helps a lot with FEX, since the GPU driver runs emulated (and only 64bit supports thunking). Signed-off-by: Asahi Lina <lina@asahilina.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>	2023-04-07 03:23:04 +00:00
Asahi Lina	4a5115c47b	asahi: Make agx_alloc_staging() take a screen instead of a context This makes it clear that it is thread-safe. Signed-off-by: Asahi Lina <lina@asahilina.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>	2023-04-07 03:23:04 +00:00
Asahi Lina	75e3212809	Revert "asahi: Advertise dual-source blending" This reverts commit `f4e2b22646`. This is broken until GL3 is enabled, possibly due to a core Mesa bug, but it's a corner case not worth fixing. Fixes Chromium. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>	2023-04-07 03:23:04 +00:00
Alyssa Rosenzweig	8a6d74d15b	agx: Make signal_pix instructions explicit Rather than implicitly packing them with the sample_mask. Again, this is just changing where they're emitted, no functional changes yet. Bug for bug compatibility with the old behaviour. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>	2023-04-07 03:23:04 +00:00
Alyssa Rosenzweig	bb530760a2	agx: Rename writeout to wait_pix This is the name applegpu is currently using, to capture the semantics of a pixel fence. I'm not sure what Apple calls this, but wait_pix is certainly closer than writeout. This commit just does the rename; it doesn't fix the broken semantics we've had, to ease review and bisection. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>	2023-04-07 03:23:04 +00:00
Alyssa Rosenzweig	2028e7b88b	agx: Tease apart some sample_mask packing magic There's a second instruction here, and a second source in the first instruction. applegpu has known about the encodings for a while, but I never updated the packing code. We will need to stop hardcoding this for multisampling support, so as preparation, tease apart the magic pieces. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>	2023-04-07 03:23:04 +00:00
Alyssa Rosenzweig	13b3da822b	asahi: Clamp texture buffer sizes Per the spec / freedreno. Fixes arb_texture_buffer_object-texture-buffer-size-clamp Fixes: `6b22a02f90` ("asahi,agx: Implement buffer textures with gnarly NIR") Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>	2023-04-07 03:23:04 +00:00
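A sketch of the clamp, assuming the texel count is the buffer size divided by the texel size and then limited to the driver's maximum texture buffer size. `buffer_texture_texels` and its `max_texels` parameter are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Number of texels visible through a buffer texture, clamped to the
 * driver's advertised limit (passed in as max_texels). */
static uint32_t buffer_texture_texels(uint64_t size_bytes, uint32_t bytes_per_texel,
                                      uint32_t max_texels)
{
   assert(bytes_per_texel > 0);
   uint64_t texels = size_bytes / bytes_per_texel; /* partial texels are dropped */
   return texels > max_texels ? max_texels : (uint32_t)texels;
}
```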
Alyssa Rosenzweig	c4175c5fc8	asahi: Dirty track depth bias uploads Reduces how much we upload in SuperTuxKart. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>	2023-04-07 03:23:04 +00:00
Alyssa Rosenzweig	23880daa8d	asahi: Lower 1D to 2D Khronos APIs require that we support mipmapping even for 1D textures. However, it isn't clear whether this is supported in the hardware, or how it would work even if it were. But 1D textures are pretty useless, so we just lower 1D textures to 2D textures instead of worrying about that. Fixes piles of Piglits relating to 1D textures. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>	2023-04-07 03:23:03 +00:00
Alyssa Rosenzweig	098295f1a0	asahi: Implement null textures Use the same silly workaround that Metal does: fill in texture descriptors when nothing is bound, in the interest of robust behaviour. Fixes a null pointer dereference in arb_shading_language_420pack-active-sampler-conflict. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>	2023-04-07 03:23:03 +00:00
Alyssa Rosenzweig	1fb4e34020	asahi: Honour sampler count It may not be equal to the texture count. Prevents a regression from the next commit. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>	2023-04-07 03:23:03 +00:00
Alyssa Rosenzweig	203c9c12e2	agx: Don't overallocate registers We need to account for the full vector lengths. Especially important once we start restricting the reg file. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>	2023-04-07 03:23:03 +00:00
Alyssa Rosenzweig	42c5d6140b	agx: Coalesce more collects Try harder to coalesce collects by trying to allocate collects only to regions of the register file where we actually have a full vector's worth of registers free. If we already know that the vector will be blocked later, it's not a good base register to pick, since we'd be forced to shuffle later. So, this tweak to the collect coalescing heuristic lets us eliminate a pile of pointless copying. shader-db results are excellent. Note that, although we use more registers, none of the shaders tested had their thread count affected, likely because the max HURT isn't too high and most of the scary % here is from using a few more registers when the register pressure is already low. In the near future, that property will become guaranteed thanks to live range splitting, too. total instructions in shared programs: 1507337 -> 1500562 (-0.45%) instructions in affected programs: 428137 -> 421362 (-1.58%) helped: 2658 HURT: 167 helped stats (abs) min: 1.0 max: 34.0 x̄: 2.63 x̃: 2 helped stats (rel) min: 0.10% max: 25.00% x̄: 3.04% x̃: 2.14% HURT stats (abs) min: 1.0 max: 10.0 x̄: 1.24 x̃: 1 HURT stats (rel) min: 0.20% max: 23.81% x̄: 3.90% x̃: 3.57% 95% mean confidence interval for instructions value: -2.49 -2.31 95% mean confidence interval for instructions %-change: -2.76% -2.51% Instructions are helped. total bytes in shared programs: 10333670 -> 10293172 (-0.39%) bytes in affected programs: 2996682 -> 2956184 (-1.35%) helped: 2660 HURT: 175 helped stats (abs) min: 2.0 max: 204.0 x̄: 15.70 x̃: 12 helped stats (rel) min: 0.08% max: 23.08% x̄: 2.64% x̃: 1.83% HURT stats (abs) min: 2.0 max: 60.0 x̄: 7.26 x̃: 6 HURT stats (rel) min: 0.12% max: 22.39% x̄: 3.19% x̃: 2.78% 95% mean confidence interval for bytes value: -14.81 -13.76 95% mean confidence interval for bytes %-change: -2.39% -2.18% Bytes are helped. 
total halfregs in shared programs: 417284 -> 427363 (2.42%) halfregs in affected programs: 49814 -> 59893 (20.23%) helped: 95 HURT: 3018 helped stats (abs) min: 1.0 max: 8.0 x̄: 2.29 x̃: 2 helped stats (rel) min: 2.44% max: 28.57% x̄: 9.20% x̃: 6.06% HURT stats (abs) min: 1.0 max: 14.0 x̄: 3.41 x̃: 4 HURT stats (rel) min: 2.08% max: 150.00% x̄: 36.54% x̃: 27.27% 95% mean confidence interval for halfregs value: 3.17 3.31 95% mean confidence interval for halfregs %-change: 34.05% 36.23% Halfregs are HURT. total threads in shared programs: 16465280 -> 16465280 (0.00%) threads in affected programs: 0 -> 0 helped: 0 HURT: 0 Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>	2023-04-07 03:23:03 +00:00
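The heuristic above can be sketched as a search for a base register whose whole vector span is free. This assumes a 64-entry register file tracked as a bitmap and a caller-supplied alignment. `NUM_REGS`, `region_free` and `pick_collect_base` are hypothetical, and the real allocator's fallback when no region fits is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_REGS 64u /* hypothetical register-file size for the sketch */

/* True if registers [base, base + count) are all free.
 * Bit r of `used` is set when register r is occupied. */
static bool region_free(uint64_t used, unsigned base, unsigned count)
{
   assert(count >= 1 && count <= NUM_REGS);
   if (base + count > NUM_REGS)
      return false;
   uint64_t mask = (count == 64) ? UINT64_MAX : ((UINT64_C(1) << count) - 1) << base;
   return (used & mask) == 0;
}

/* Pick a base register for a `count`-wide collect, accepting only bases
 * where the whole vector fits in free registers: a base whose tail is
 * already blocked would force a shuffle later. Returns -1 if none exists. */
static int pick_collect_base(uint64_t used, unsigned count, unsigned align)
{
   assert(align >= 1);
   for (unsigned base = 0; base + count <= NUM_REGS; base += align) {
      if (region_free(used, base, count))
         return (int)base;
   }
   return -1;
}
```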
Alyssa Rosenzweig	43b221cd59	asahi: Set PIPE_CAP_LOAD_CONSTBUF The CAP is a bit of a misnomer; what it really does is relax the alignment requirements for UBO packing. It should work fine and save us some memory. Noticed while debugging piglit failures. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>	2023-04-07 03:23:03 +00:00
Alyssa Rosenzweig	8e501b758a	asahi/decode: Print VDM barriers Instead of just decoding silently. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>	2023-04-07 03:23:03 +00:00
Alyssa Rosenzweig	0bbd8b502a	asahi/decode: Remove agxdecode_dump_bo Now that we have proper parsing, this is more of a nuisance than a help. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>	2023-04-07 03:23:03 +00:00
Alyssa Rosenzweig	e713983875	agx: Add helper for calculating occupancy Add information about the relationship between program register usage and program occupancy (the maximum number of threads that may execute concurrently on a single shader core). This table is derived from studying the maxTotalThreadsPerThreadgroup property in Metal while varying the register usage, something I blogged about a few years back. It's probably not 100% accurate and it hasn't been tested against hardware, but it matters "only" for performance (not correctness) so I'm not super stressed about the details. In the (near) future, RA will be able to make use of this information to know exactly when it can use more registers without hurting performance. In the present, it's just used for better shader-db statistics. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>	2023-04-07 03:23:03 +00:00
Alyssa Rosenzweig	05e614cc31	agx: Set loads_varying accurately Instead of just always mashing to true. Should be better for depth-only passes. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>	2023-04-07 03:23:03 +00:00
Alyssa Rosenzweig	80adaa47e5	asahi: Add perf debug for shader variants Compiling these can cause jank. This is still an issue in Quake3. There is a way to solve it, but it's rather involved and certainly not this weekend's project. Better perf debugging, on the other hand, apparently is ^_^ Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>	2023-04-07 03:23:03 +00:00
Alyssa Rosenzweig	3a4920e928	asahi: Add perf debug for generate_mipmap The current implementation leaves a lot of perf on the table, so call it out on ASAHI_MESA_DEBUG=perf to help debugging perf problems, especially if this ever happens in a real application (i.e. not a benchmark). Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>	2023-04-07 03:23:03 +00:00
Alyssa Rosenzweig	3a87d2cfbd	agx: Don't destroy usub_sat with constant Fixes KHR-GLES31.core.shader_storage_buffer_object.advanced-unsizedArrayLength-cs-std430-vec-pad Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>	2023-04-07 03:23:03 +00:00
Alyssa Rosenzweig	8ec91ee16f	agx: Don't allow uniform source to local_atomic Fixes KHR-GLES31.core.compute_shader.atomic-case3 Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>	2023-04-07 03:23:03 +00:00
Alyssa Rosenzweig	c643f42dc6	agx: Constify agx_{read,write}_registers Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>	2023-04-07 03:23:03 +00:00
Alyssa Rosenzweig	da9c8a4627	agx: Assert that we don't overflow registers This will become particularly important when we bound to smaller register files. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>	2023-04-07 03:23:03 +00:00
Alyssa Rosenzweig	7c7b95ba2a	agx: DCE even with noopt To simplify live range splitting, RA will soon assume that DCE has run (removing extraneous vectors). So run DCE even when otherwise disabling backend optimizations. AGX_MESA_DEBUG=noopt is still useful for disabling instruction combining, which is the more-likely-to-be-buggy pass anyway. This also fixes IR not being printed with noopt. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>	2023-04-07 03:23:03 +00:00
Alyssa Rosenzweig	75b858e904	asahi: Support more renderable formats Fixes KHR-GLES3.copy_tex_image_conversions.forbidden.* Arguably working around a mesa/st issue but more format support is good for compatibility and performance anyway. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Janne Grunau <j@jannau.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>	2023-04-07 03:23:03 +00:00
Yiwei Zhang	fc22380c32	venus/docs: sync to latest venus supported extensions Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22243>	2023-04-07 03:05:02 +00:00
Yiwei Zhang	bb7424b4b4	venus: add VK_EXT_rasterization_order_attachment_access support Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22243>	2023-04-07 03:05:02 +00:00
Yiwei Zhang	9c19d426cd	venus: add VK_EXT_load_store_op_none support There are no feature/properties structs associated with this extension. Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22243>	2023-04-07 03:05:02 +00:00
Yiwei Zhang	303a2136a4	venus: sync latest protocol for layering extensions - VK_EXT_load_store_op_none - VK_EXT_rasterization_order_attachment_access Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22243>	2023-04-07 03:05:02 +00:00
Sajeesh Sidharthan	ab3507691a	radeonsi/vcn: optimize bitstream buffer resize logic The bitstream buffer is unmapped, resized and mapped again whenever the new size is greater than the current bitstream buffer size, and this is done for each input buffer. This patch avoids that and resizes only once, irrespective of the number of input buffers: the total size is calculated first, and unmap, resize and map are then called only once. Signed-off-by: Sajeesh Sidharthan <sajeesh.sidharthan@amd.com> Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22308>	2023-04-07 02:31:24 +00:00
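The "sum first, grow once" pattern can be sketched with a plain heap buffer standing in for the mapped bitstream BO. `struct bs_buffer`, `bs_grow` and `bs_append_all` are hypothetical. `realloc` here only models the expensive unmap, resize and map sequence, and the buffer is assumed to be refilled from offset 0 for each submission.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical CPU-side view of a bitstream buffer. */
struct bs_buffer {
   unsigned char *data;
   size_t size;
   unsigned resizes; /* counts grow operations, for illustration */
};

/* Stand-in for the unmap -> resize -> map sequence. */
static int bs_grow(struct bs_buffer *b, size_t new_size)
{
   unsigned char *p = realloc(b->data, new_size);
   if (p == NULL)
      return -1;
   b->data = p;
   b->size = new_size;
   b->resizes++;
   return 0;
}

/* Copy all input buffers back to back, growing the buffer at most once. */
static int bs_append_all(struct bs_buffer *b, const void *const *inputs,
                         const size_t *sizes, unsigned n)
{
   assert(b != NULL);
   size_t total = 0;
   for (unsigned i = 0; i < n; i++)
      total += sizes[i];

   if (total > b->size && bs_grow(b, total) != 0)
      return -1;

   size_t off = 0;
   for (unsigned i = 0; i < n; i++) {
      memcpy(b->data + off, inputs[i], sizes[i]);
      off += sizes[i];
   }
   return 0;
}
```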
Alyssa Rosenzweig	d1b569d26f	nir/print: Don't print sampler_index for txf NIR's docs for sampler_index say The following operations do not require a sampler and, as such, this field should be ignored: - nir_texop_txf - nir_texop_txf_ms - nir_texop_txs - nir_texop_query_levels - nir_texop_texture_samples - nir_texop_samples_identical Contrary to this documentation, we were still printing the sampler_index anyway, even though the value is formally undefined. This was helpful for PIPE_CAP_TEXTURE_BUFFER_SAMPLER drivers that (despite the NIR docs) respected the sampler_index anyway. There are no longer any such drivers, so we should stop printing sampler_index for txf to avoid confusion (and noise). Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22223>	2023-04-07 01:15:41 +00:00
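The rule can be illustrated with a small printer that omits the sampler index for exactly the operations listed in NIR's docs. The `texop` enum below is a local stand-in whose names mirror `nir_texop_*`; it is not NIR's actual type, and `print_tex_indices` is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Local stand-in for a few NIR texture opcodes. */
enum texop {
   TEXOP_TEX,
   TEXOP_TXL,
   TEXOP_TXF,
   TEXOP_TXF_MS,
   TEXOP_TXS,
   TEXOP_QUERY_LEVELS,
   TEXOP_TEXTURE_SAMPLES,
   TEXOP_SAMPLES_IDENTICAL,
};

/* The operations NIR's docs list as not requiring a sampler. */
static bool texop_uses_sampler(enum texop op)
{
   switch (op) {
   case TEXOP_TXF:
   case TEXOP_TXF_MS:
   case TEXOP_TXS:
   case TEXOP_QUERY_LEVELS:
   case TEXOP_TEXTURE_SAMPLES:
   case TEXOP_SAMPLES_IDENTICAL:
      return false;
   default:
      return true;
   }
}

/* Print the indices, omitting sampler_index where it is formally undefined. */
static int print_tex_indices(char *buf, size_t len, enum texop op,
                             unsigned texture_index, unsigned sampler_index)
{
   assert(buf != NULL);
   if (texop_uses_sampler(op))
      return snprintf(buf, len, "texture %u sampler %u", texture_index, sampler_index);
   return snprintf(buf, len, "texture %u", texture_index);
}
```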
Alyssa Rosenzweig	a9f9953928	docs/gallium: Note samplers are not used for txf Now that PIPE_CAP_TEXTURE_BUFFER_SAMPLER is gone, txf does not require samplers for any texture on any Gallium driver. NIR already requires drivers to ignore sampler_index for non-sampler operations (mainly txf), and nowadays all Gallium drivers ingest NIR. So, document that samplers aren't bound for txf (etc) as part of the Gallium frontend-driver contract. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Suggested-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22223>	2023-04-07 01:15:41 +00:00
Alyssa Rosenzweig	6ba29d37c8	gallium: Remove PIPE_CAP_TEXTURE_BUFFER_SAMPLER No more users. It was already not respected by rusticl so you couldn't set it if you wanted OpenCL support. I regret introducing the CAP in the first place, and no more drivers should use it. Reverts `d5d3f77e4a` ("gallium: Add new cap PIPE_CAP_TEXTURE_BUFFER_SAMPLER"). Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22223>	2023-04-07 01:15:41 +00:00
Alyssa Rosenzweig	e406e74aa4	panfrost: Unset TEXTURE_BUFFER_SAMPLERS We no longer need this CAP, as we can easily synthesize our own internal sampler for this case. Gallium doesn't need to know about this quirk of our hardware. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Italo Nicola <italonicola@collabora.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22223>	2023-04-07 01:15:41 +00:00
Alyssa Rosenzweig	b9cc2b2a98	pan/{mdg,bi}: Always use sampler 0 for txf Now that we upload workaround samplers for txf, sampler 0 is guaranteed to be valid but other samplers are not. So ignore whatever the current sampler_index value is (it's formally undefined in NIR) and use 0, which we know is valid. We already do this on Valhall for OpenCL, just need to generalize for Midgard and Bifrost. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Italo Nicola <italonicola@collabora.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22223>	2023-04-07 01:15:41 +00:00
Alyssa Rosenzweig	e15603bdf1	panfrost: Always upload a workaround sampler The hardware requires a valid sampler even for texelFetch (txf), even though its contents are ignored. We'd rather not pass on this requirement to the frontends, so we should handle it by uploading our own workaround sampler in the case when no sampler is already present. We already do this on Valhall (for rusticl), so we just need to port the same workaround back to Midgard/Bifrost. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Italo Nicola <italonicola@collabora.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22223>	2023-04-07 01:15:40 +00:00
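The two panfrost changes above fit together roughly as follows: the driver guarantees that sampler 0 exists, and texel fetches always reference sampler 0. This is a sketch assuming a simple pointer table of bound samplers. `struct sampler_desc`, `samplers_for_upload` and `txf_sampler_index` are hypothetical, and the real descriptor layout differs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sampler descriptor. */
struct sampler_desc {
   unsigned min_filter, mag_filter, wrap;
};

/* Contents are ignored by texel fetches; it only has to be valid. */
static const struct sampler_desc workaround_sampler = { 0, 0, 0 };

/* Returns the sampler table to upload. If no samplers are bound, substitute
 * a one-entry table holding the workaround sampler, so sampler 0 is valid. */
static const struct sampler_desc *const *
samplers_for_upload(const struct sampler_desc *const *bound, unsigned *count)
{
   static const struct sampler_desc *const fallback[1] = { &workaround_sampler };
   assert(count != NULL);
   if (*count == 0) {
      *count = 1;
      return fallback;
   }
   return bound;
}

/* txf ignores NIR's sampler_index (formally undefined) and uses sampler 0. */
static unsigned txf_sampler_index(unsigned nir_sampler_index)
{
   (void)nir_sampler_index;
   return 0;
}
```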
Mike Blumenkrantz	06bfe07212	zink: don't try copying multiple results for conditional render copy Conditional render uses only a single result, so multiple results need to be aggregated first. fixes #8798 cc: mesa-stable Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22345>	2023-04-07 00:52:27 +00:00
Ian Romanick	72a9d12c96	nir/tests: Port almost all loop_analyze tests to the new macro-based infrastructure The one test that remains would have an automatically generated name that would conflict with another test. This test is also a little special (per the comment in the test), so it's probably best to leave it separate anyway. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3445>	2023-04-06 23:50:27 +00:00
Yevhenii Kolesnikov	9427aaeab7	nir/loop_analyze: Determine iteration counts for more kinds of loops If the loop iterator is incremented with something other than regular addition, it would be more error-prone to calculate the number of iterations theoretically. What we can do instead is try to emulate the loop and determine the number of iterations empirically. These operations are covered: - imul - fmul - ishl - ishr - ushr Also add unit tests for loop unrolling. Improves performance of Aztec Ruins (sixonix gfxbench5.aztec_ruins_vk_high) by -1.28042% +/- 0.498555% (N=5) on Intel Arc A770. v2 (idr): Rebase on 3 years. :( Use nir_phi_instr_add_src in the test cases. v3 (idr): Use try_eval_const_alu to evaluate the loop termination condition in get_iteration_empirical. Also restructure the loop slightly. This fixed off-by-one iteration errors in "inverted" loop tests (e.g., nir_loop_analyze_test.ushr_ieq_known_count_invert_31). v4 (idr): Use try_eval_const_alu to evaluate the induction variable update in get_iteration_empirical. This fixes non-commutative update operations (e.g., shifts) when the induction variable is not the first source. This fixes the unit test nir_loop_analyze_test.ishl_rev_ieq_infinite_loop_unknown_count. v5 (idr): Fix _type parameter for fadd and fadd_rev loop unroll tests. Hopefully that fixes the failure on s390x. Temporarily disable fmul. This works around the revealed problem in glsl-fs-loop-unroll-mul-fp64, and there were no shader-db or fossil-db changes. v6 (idr): Plumb max_unroll_iterations into get_iteration_empirical. I was going to do this, but I forgot. Suggested by Tim. v7 (idr): Disable fadd tests on s390x. They fail because S390 is weird. Almost all of the shaders affected (OpenGL or Vulkan) are from gfxbench or geekbench. A couple of shaders in Deus Ex (OpenGL), Dirt Rally (OpenGL), Octopath Traveler (Vulkan), and Rise of the Tomb Raider (Vulkan) are helped. 
The lost / gained shaders in OpenGL are an Aztec Ruins shader that goes from SIMD16 to SIMD8. The spills / fills affected are in a single Aztec Ruins (Vulkan) compute shader. shader-db results: Skylake, Ice Lake, and Tiger Lake had similar results. (Tiger Lake shown) total loops in shared programs: 5514 -> 5470 (-0.80%) loops in affected programs: 62 -> 18 (-70.97%) helped: 37 / HURT: 0 LOST: 2 GAINED: 2 Haswell and Broadwell had similar results. (Broadwell shown) total loops in shared programs: 5346 -> 5298 (-0.90%) loops in affected programs: 66 -> 18 (-72.73%) helped: 39 / HURT: 0 fossil-db results: Skylake, Ice Lake, and Tiger Lake had similar results. (Tiger Lake shown) Instructions in all programs: 157374679 -> 157397421 (+0.0%) Instructions hurt: 28 SENDs in all programs: 7463800 -> 7467639 (+0.1%) SENDs hurt: 28 Loops in all programs: 38980 -> 38950 (-0.1%) Loops helped: 28 Cycles in all programs: 7559486451 -> 7557455384 (-0.0%) Cycles helped: 28 Spills in all programs: 11405 -> 11403 (-0.0%) Spills helped: 1 Fills in all programs: 19578 -> 19588 (+0.1%) Fills hurt: 1 Lost: 1 Signed-off-by: Yevhenii Kolesnikov <yevhenii.kolesnikov@globallogic.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3445>	2023-04-06 23:50:27 +00:00
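The empirical approach amounts to running the loop's induction variable and exit test on constants until the loop exits or a budget is exhausted. This is a sketch for 32-bit integers only (the commit temporarily disables fmul). `iterations_empirical`, its loop shape, and the enums are hypothetical and are not the signature of `get_iteration_empirical`; `max_iters` plays the role the commit gives to `max_unroll_iterations`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum update_op { OP_IADD, OP_IMUL, OP_ISHL, OP_ISHR, OP_USHR };
enum cmp_op { CMP_ILT, CMP_IGE, CMP_IEQ, CMP_INE };

/* 32-bit induction variable update. Shift counts are masked to the bit size.
 * Unsigned arithmetic avoids signed-overflow UB; converting back to int32_t
 * and right-shifting negatives are implementation-defined in C, and GCC and
 * Clang wrap and shift arithmetically, respectively. */
static int32_t apply_update(enum update_op op, int32_t i, int32_t step)
{
   uint32_t u = (uint32_t)i, s = (uint32_t)step;
   switch (op) {
   case OP_IADD: return (int32_t)(u + s);
   case OP_IMUL: return (int32_t)(u * s);
   case OP_ISHL: return (int32_t)(u << (s & 31));
   case OP_ISHR: return i >> (s & 31);
   case OP_USHR: return (int32_t)(u >> (s & 31));
   }
   assert(!"unknown update op");
   return i;
}

static bool eval_cmp(enum cmp_op op, int32_t a, int32_t b)
{
   switch (op) {
   case CMP_ILT: return a < b;
   case CMP_IGE: return a >= b;
   case CMP_IEQ: return a == b;
   case CMP_INE: return a != b;
   }
   assert(!"unknown cmp op");
   return false;
}

/* Emulate: i = init; loop { if ((i CMP limit) != invert) break; i = i OP step; }
 * Returns the number of completed iterations, or -1 ("unknown") if the loop
 * has not exited within max_iters iterations. */
static int iterations_empirical(int32_t init, enum update_op op, int32_t step,
                                enum cmp_op cmp, int32_t limit, bool invert,
                                int max_iters)
{
   int32_t i = init;
   for (int iter = 0; iter <= max_iters; iter++) {
      if (eval_cmp(cmp, i, limit) != invert)
         return iter;
      i = apply_update(op, i, step);
   }
   return -1;
}
```

For instance, `iterations_empirical(1, OP_ISHL, 1, CMP_IGE, 16, false, 64)` steps through 1, 2, 4, 8, 16 and reports 4 iterations; a loop whose variable never changes (0 multiplied by 2) is reported as unknown.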
Yevhenii Kolesnikov	f051967f19	nir/loop_analyze: Track induction variables incremented by more operations These operations are covered: - imul - fmul - ishl - ishr - ushr The only cases that can be currently affected are those where the calculated loop-trip count would be zero. v2 (idr): Split out from original commit. Rebase on lots of other work. v3 (idr): Move operand size assertion. This code only cares that the operands have the same size for the iadd and fadd cases. In other cases, such as shifts, the sizes may not match. Fixes assertion failures in tests/spec/arb_gpu_shader_int64/glsl-fs-loop-unroll-ishl-int64.shader_test. No shader-db or fossil-db changes on any Intel platform. Signed-off-by: Yevhenii Kolesnikov <yevhenii.kolesnikov@globallogic.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3445>	2023-04-06 23:50:27 +00:00
Ian Romanick	bc170e895f	nir/loop_analyze: Use try_eval_const_alu and induction variable basis info This dramatically simplifies will_break_on_first_iteration, and, much more importantly, makes it significantly more flexible. It is now possible to handle loops with more complex exit conditions and other kinds of increment operations. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3445>	2023-04-06 23:50:27 +00:00
Ian Romanick	99a7a6648d	nir/loop_analyze: Change invert_cond instead of changing the condition This ensures that scenarios like nir_loop_analyze_test.iadd_inot_ilt_rev_known_count_5 don't regress in the next commit. It also means we don't change float comparisons. These are probably fine... but it still made me a little uneasy. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3445>	2023-04-06 23:50:27 +00:00
Ian Romanick	aeb8af1141	nir/loop_analyze: Track induction variable basis information Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3445>	2023-04-06 23:50:27 +00:00
Ian Romanick	30879a760c	nir/loop_analyze: Add a function to evaluate an ALU as constant ...with a substitution. This function is largely a copy-and-paste of try_fold_alu (nir_opt_constant_folding.c), and an argument could be made that this function belongs in that file. v2: Some changes were mistakenly squashed into "nir/loop_analyze: Use try_eval_const_alu and induction variable basis info" that should have been here. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3445>	2023-04-06 23:50:27 +00:00
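"Evaluate as constant with a substitution" can be pictured as constant folding over an expression tree in which one designated variable, such as the induction variable, is replaced by a known value. This is a sketch over a tiny hypothetical expression type, not NIR's ALU instructions. `struct node`, `try_eval` and `breaks_immediately` are illustrative names. The last helper only shows the kind of question such a function can answer, in the spirit of `will_break_on_first_iteration`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum node_kind { N_CONST, N_VAR, N_IADD, N_ILT };

/* Tiny hypothetical expression tree standing in for chains of ALU ops. */
struct node {
   enum node_kind kind;
   int32_t value;             /* N_CONST only */
   const struct node *src[2]; /* N_IADD / N_ILT only */
};

/* Fold `n` to a constant, substituting `subst_val` wherever the node
 * `subst` appears. Any other variable makes folding fail. */
static bool try_eval(const struct node *n, const struct node *subst,
                     int32_t subst_val, int32_t *out)
{
   int32_t a, b;
   switch (n->kind) {
   case N_CONST:
      *out = n->value;
      return true;
   case N_VAR:
      if (n != subst)
         return false;
      *out = subst_val;
      return true;
   case N_IADD:
   case N_ILT:
      if (!try_eval(n->src[0], subst, subst_val, &a) ||
          !try_eval(n->src[1], subst, subst_val, &b))
         return false;
      *out = n->kind == N_IADD ? (int32_t)((uint32_t)a + (uint32_t)b)
                               : (int32_t)(a < b);
      return true;
   }
   assert(!"unknown node kind");
   return false;
}

/* Does the exit condition already hold for the induction variable's
 * initial value? Unknown (non-foldable) conditions answer false. */
static bool breaks_immediately(const struct node *exit_cond,
                               const struct node *iv, int32_t init)
{
   int32_t v;
   return try_eval(exit_cond, iv, init, &v) && v != 0;
}
```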


169515 commits