fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-02-22 22:50:32 +01:00

Author	SHA1	Message	Date
Jason Ekstrand	292ceb297c	v3dv: Enable VK_EXT_debug_utils It's implemented in common code as long as you use vk_command_buffer. Acked-by: Emma Anholt <emma@anholt.net> Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15560>	2022-04-06 01:18:23 +00:00
Omar Akkila	4208895175	ci: bump VK-GL-CTS to 1.3.1.1 Signed-off-by: Omar Akkila <omar.akkila@collabora.com> Acked-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com> Reviewed-by: Juan A. Suarez <jasuarez@igalia.com> Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15668>	2022-04-04 23:04:33 +00:00
Iago Toral Quiroga	827ef5fba9	v3dv: fix limits for inline uniform blocks We don't support 'Update After Bind', however, the limits for this model also include the ones without it. See the with or without remark in the spec below: "maxPerStageDescriptorUpdateAfterBindInlineUniformBlocks is similar to maxPerStageDescriptorInlineUniformBlocks but counts descriptor bindings from descriptor sets created with or without the VK_DESCRIPTOR_SET_LAYOUT_CREATE_UPDATE_AFTER_BIND_POOL_BIT bit set." Fixes: dEQP-VK.api.info.vulkan1p2_limits_validation.ext_inline_uniform_block Reviewed-by: Juan A. Suarez <jasuarez@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15732>	2022-04-04 09:28:55 +00:00
Iago Toral Quiroga	597560e27c	broadcom/compiler: always enable per-quad on spill operations This ensures that any channels used for helper invocations are also spilled/filled correctly. Alternatively, we could recursively track all temps that get involved in computing values that are then used in explicit (dfdx,dfdy) or implicit (texture coordinates for mipmap or anisotropic filtering, etc) derivatives, and only enable per-quad on these (or disable spilling of any of these values). Fixes: dEQP-VK.graphicsfuzz.cov-dfdx-dfdy-after-nested-loops Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15705>	2022-04-01 08:53:50 +00:00
Jason Ekstrand	688d478045	v3dv/queue: Rework multisync_free Thix fixes two bugs. First, we stop leaking in/out fences with multisync. Because the in_syncs and out_syncs parameters to set_multisync were arrays and not pointers to arrays, the caller's in_syncs and out_syncs pointers never got set and remained NULL so multisync_free() always sees to NULL pointers and does nothing, leaking both arrays. Not sure how this isn't showing up in the dEQP leak check tests. Second, the struct drm_v3d_multi_sync was in the scope of the then clause of the `if (device->pdevice->caps.multisync)` so it goes out of scope before the ioctl. This is, effectively, a use-after-free and, depending on stack allocation details, may result in the multisync extension struct getting stompped before the ioctl. Fixes: `ff8586c345` ("v3dv: enable multiple semaphores on cl submission") Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15512>	2022-03-29 14:38:41 +00:00
Iago Toral Quiroga	7f6ecb8667	v3dv: add reference counting for descriptor set layouts The spec states that descriptor set layouts can be destroyed almost at any time: "VkDescriptorSetLayout objects may be accessed by commands that operate on descriptor sets allocated using that layout, and those descriptor sets must not be updated with vkUpdateDescriptorSets after the descriptor set layout has been destroyed. Otherwise, descriptor set layouts can be destroyed any time they are not in use by an API command." Based on a similar fix for RADV. Gitlab: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5893 Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15634>	2022-03-29 11:28:39 +00:00
Iago Toral Quiroga	ca861bd6f4	v3dv: drop unnecessary memset We are already zeroing when we allocate the descriptor set layout memory with vk_object_zalloc. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15634>	2022-03-29 11:28:39 +00:00
Iago Toral Quiroga	591eed30b2	v3dv: fix sampler array addressing in v3dv_descriptor_set_layout Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15634>	2022-03-29 11:28:39 +00:00
Alejandro Piñeiro	81039feda4	broadcom: update language on V3D_DEBUG options Some typos, and bad grammar. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Juan A. Suarez <jasuarez@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15593>	2022-03-28 19:21:48 +00:00
Iago Toral Quiroga	ce849032a4	broadcom/compiler: allow ldunifa with indirect uniform loads We handle uniforms by copying them into the uniform stream to be consumed with ldunif when they have a constant offset. Otherwise we fallback to general TMU access, which has more latency. However, just like we did for UBOs and read-only SSBOs, we can also try to use the unifa mechanism to handle indirect accesses in certain cases instead of the TMU fallback. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15575>	2022-03-28 10:44:13 +00:00
Iago Toral Quiroga	ea3223e7a4	v3dv: implement VK_EXT_inline_uniform_block Inline uniform blocks store their contents in pool memory rather than a separate buffer, and are intended to provide a way in which some platforms may provide more efficient access to the uniform data, similar to push constants but with more flexible size constraints. We implement these in a similar way as push constants: for constant access we copy the data in the uniform stream (using the new QUNIFORM_UNIFORM_UBO_*) enums to identify the inline buffer from which we need to copy and for indirect access we fallback to regular UBO access. Because at NIR level there is no distinction between inline and regular UBOs and the compiler isn't aware of Vulkan descriptor sets, we use the UBO index on UBO load intrinsics to identify inline UBOs, just like we do for push constants. Particularly, we reserve indices 1..MAX_INLINE_UNIFORM_BUFFERS for this, however, unlike push constants, inline buffers are accessed through descriptor sets, and therefore we need to make sure they are located in the first slots of the UBO descriptor map. This means we store them in the first MAX_INLINE_UNIFORM_BUFFERS slots of the map, with regular UBOs always coming after these slots. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15575>	2022-03-28 10:44:13 +00:00
Boris Brezillon	56a2ccf058	v3dv: Stop using VK_OUTARRAY_MAKE() We're trying to replace VK_OUTARRAY_MAKE() by VK_OUTARRAY_MAKE_TYPED() so people don't get tempted to use it and make things incompatible with MSVC (which doesn't support typeof()). Suggested-by: Daniel Stone <daniels@collabora.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15522>	2022-03-25 11:00:02 +00:00
Jason Ekstrand	19f56e3fc4	v3dv: Drop GetPhysicalDeviceQueueFamilyProperties Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15459>	2022-03-18 11:19:14 -05:00
Iago Toral Quiroga	4f284254e4	v3dv: support importing external semaphores This was waiting for multisync support in our kernel interface so we can wait on the actual imported payload of a semaphore rather than the last job we submitted. Reviewed-by: Melissa Wen <mwen@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15342>	2022-03-18 13:17:58 +00:00
Iago Toral Quiroga	fa1b10f36d	v3dv: lock around noop job submits Any thread we create may end up creating/submitting at least a noop job, which is a shared object. Before multisync, this was an issue only for the creation of the job itself, but with multisync we can also modify parameters of the noop job every time it is used (for signaling and serialization configuration). This change adds a noop mutex that all threads (main, wait and master) take before submitting a noop job to ensure concurrent access is not an issue. Fixes flakyness observed with multisync with the following test: dEQP-VK.api.command_buffers.secondary_execute_twice Reviewed-by: Melissa Wen <mwen@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15342>	2022-03-18 13:17:58 +00:00
Iago Toral Quiroga	daa865fb2c	v3dv: fix semaphore wait from CPU job If a CPU job comes first in a command buffer with a semaphore wait operation we need to wait on the CPU for the semaphore to be signaled before we process the job. We have been doing this with a WaitForIdle operation, but that only works if the semaphore has been submitted for signaling from the same instance of the driver. If we have an imported payload from another instance in our semaphore however, waitForIdle may return too early since the submission to signal the semaphore may have been submitted by a different instance of the driver as well, and our wait for idle checks only know about this instance submissions. To fix this, we always submit a noop job from our instance that waits on the semaphores on the GPU and follow up with WaitForIdle to wait for that to complete. Fixes test failures and/or assert crashes in: dEQP-VK.synchronization.cross_instance.* (when enabling support for semaphore imports) Reviewed-by: Melissa Wen <mwen@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15342>	2022-03-18 13:17:58 +00:00
Iago Toral Quiroga	3b8ab8a9ce	v3dv: don't signal semaphores/fences from a wait thread When we have a wait thread we can't ensure that the last job in the last command buffer will be the one to signal semaphores because in this case there is no gurantee that jobs from command buffers in the batch will be submitted to the GPU in order, as those put in a wait thread will be submitted later when the event wait operation is completed. Instead, we need to wait for all outstanding wait threads to complete and only then we should signal any semaphores or fences. This also fixes a bug where the wait for events was the last job in the command buffer. In this case, once the event wait is completed we have no additional jobs to submit and thus would never try to signal semaphores or fences. Reviewed-by: Melissa Wen <mwen@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15342>	2022-03-18 13:17:58 +00:00
Iago Toral Quiroga	03840bfcd1	v3dv: fix temporary imports of semaphores and fences with multisync This is preparatory work to expose support for importing semaphores, which was waiting on kernel multisync support. When we implemented user-space multisync support we didn't handle temporary fence/semaphore payload imports at all, so we fix that here. Also, we add a has_temp boolean flag to identify the case where we have a temporary payload in a fence/sempahore instead of just checking if temp_sync is not 0. This is necessary to support semaphore imports (for which we are not exposing support yet) because these need to drop the temporary payload when they are used as wait semaphores in a submit, but we can't destroy the underlying temp_sync at that point because it needs to survive at least until the submit is finished, so instead we use a flag to tell if we have an active temporary payload or not, and we simply destroy any temp_sync on a semaphore destroy or any new import on the same semaphore. We only strictly need this flag for semaphores because fences drop the temporary payload when they are reset, which happens in the CPU and can only be done if the GPU is not using the fence, but we add the same flag for the fence for consistency. Reviewed-by: Melissa Wen <mwen@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15342>	2022-03-18 13:17:58 +00:00
Iago Toral Quiroga	5a11a2fb6c	v3dv: don't expose image load/store features for linear images Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15342>	2022-03-18 13:17:58 +00:00
Iago Toral Quiroga	0590ce1362	v3dv: return early on image to buffer blit copies if image is linear This path uses a shader blit to implement the copy which is only supported for tiled images (except 1D). While blit_shader() already checks for this, this path does a lot of heavy lifting to prepare for the blit_shader call so we rather avoid that if possible when we know blit_shader won't be able to implement the blit. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15342>	2022-03-18 13:17:58 +00:00
Iago Toral Quiroga	397f4963ed	v3dv: TFU destination must be UIF We had some code that considered the possibility that the destination might be linear when configuring TFU jobs, but we never actually allow for this to happen since we avoid hitting these paths in that case, as the TFU always produces UIF results. Instead, add an assert when producing the TFU packet to ensure we are expecting a UIF result. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15342>	2022-03-18 13:17:58 +00:00
Alejandro Piñeiro	e3d905ec39	v3dv/pipeline: use new helper vk_shader_module_to_nir In addition to use the helper, we also remove some of the lowering we had at preprocess_nir, as they are called now by the helper. As we are here we also move the call to nir_lower_sysvals_to_varyings, that for some reason we were calling it before preprocess_nir. It is worth to note that with this change we lose the ability to debug the NIR just after spirv_to_nir using V3D_DEBUG, as now this is done on vk_spirv_to_nir, and as mentioned that includes several lowerings now. The workaround to that is to use NIR_DEBUG. We also needed to change how to check the entrypoint on the broadcom compiler, checking just if it is an entrypoint, instead of assuming that the name will be "main". v2: tweak comment, squash v3dv and compiler change (Iago) Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15449>	2022-03-18 11:05:11 +00:00
Juan A. Suarez Romero	730a294b90	v3dv: implement VK_EXT_line_rasterization Allow to choose the line rasterization algorithm. It supports rectangular and Bresenham-style line rasterization. v2 (Iago): - Update documentation. Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15407>	2022-03-18 09:38:38 +00:00
Juan A. Suarez Romero	22759e9174	v3dv: add subpixel precision definition Move number of bits for subpixel precision in rasterizer to a define. Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15407>	2022-03-18 09:38:38 +00:00
Juan A. Suarez Romero	b53dda6da8	broadcom: add line rasterization mode to packet definition Add the supported line rasterization modes as enums in the XML packet definition. Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15407>	2022-03-18 09:38:38 +00:00
Juan A. Suarez Romero	102ae4bdc8	broadcom: add on-disk cache debug option Add support for`V3D_DEBUG=cache`, which prints on-disk cache events. v2: - Use same debug format for v3d and v3dv (Alejandro) Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15380>	2022-03-18 08:58:01 +00:00
Iago Toral Quiroga	5c1302f47c	v3dv: expose VK_EXT_image_drm_format_modifier This has been implemented for a while but we could not expose it on Vulkan 1.0 because the extension declares a dependency on VK_KHR_sampler_ycbcr_conversion, which we don't implement, and CTS would complain. On Vulkan 1.1 however, VK_KHR_sampler_ycbcr_conversion was promoted to core as an optional feature, and this is enough for the the dependency to be satisfied, even if the feature is not supported, meaning that we can now expose the extension. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15426>	2022-03-18 06:42:06 +00:00
Juan A. Suarez Romero	c432bfe74b	broadcom/ci: Update flake list Some of the tests marked as flake didn't show up as flakes for a long time (more than 3 months). So likely they are already fixed. Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> Acked-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15411>	2022-03-17 13:56:41 +00:00
Juan A. Suarez Romero	dfb6438392	v3dv: change MESA_GLSL_CACHE envvar reference This was renamed to MESA_SHADER_CACHE. Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15390>	2022-03-17 11:16:45 +01:00
Juan A. Suarez Romero	000b935c50	v3dv/ci: add test to skip list Add test that it is a timeout in the CI, but otherwise it passes. Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15374>	2022-03-14 18:55:13 +00:00
Tapani Pälli	adea096029	ci: update various ci result files Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12936>	2022-03-11 09:58:28 +00:00
Iago Toral Quiroga	49b5431197	broadcom/compiler: remove unused functions Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15302>	2022-03-10 07:25:37 +00:00
Iago Toral Quiroga	44feff93c2	broadcom/compiler: don't always assign r5 if available Instead, only favor assigning r5 if we have first decided to assign an accumulator. This helps with assining r5 to short lived uniforms, favoring accumulator rotation to facilitate QPU merges. total instructions in shared programs: 12656164 -> 12628339 (-0.22%) instructions in affected programs: 5368373 -> 5340548 (-0.52%) helped: 17420 HURT: 9996 total uniforms in shared programs: 3704776 -> 3704863 (<.01%) uniforms in affected programs: 12247 -> 12334 (0.71%) helped: 23 HURT: 78 total max-temps in shared programs: 2153505 -> 2152684 (-0.04%) max-temps in affected programs: 26468 -> 25647 (-3.10%) helped: 569 HURT: 328 total fills in shared programs: 4656 -> 4657 (0.02%) fills in affected programs: 43 -> 44 (2.33%) helped: 0 HURT: 1 total sfu-stalls in shared programs: 34728 -> 34403 (-0.94%) sfu-stalls in affected programs: 3411 -> 3086 (-9.53%) helped: 842 HURT: 534 Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15276>	2022-03-09 15:53:04 +00:00
Iago Toral Quiroga	77f58b46d9	broadcom/compiler: add comment on why we don't use r5 with ldunifa Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15276>	2022-03-09 15:53:04 +00:00
Iago Toral Quiroga	5b140428b0	broadcom/compiler: adjust register threshold for 2-thread compiles We have twice the registers in this case so it makes sense to double this as well. While this causes slight regressions in shader-db stats (due to additional register pressure), it helps us hide latency of memory reads better on 2-thread compiles, where the thread switch mechanism will be less effective. This shows a ~3% performance improvement on the UE4 SunTemple demo. total instructions in shared programs: 12642413 -> 12656164 (0.11%) instructions in affected programs: 2272652 -> 2286403 (0.61%) helped: 2924 HURT: 3389 total uniforms in shared programs: 3703861 -> 3704776 (0.02%) uniforms in affected programs: 213729 -> 214644 (0.43%) helped: 823 HURT: 1272 total max-temps in shared programs: 2150686 -> 2153505 (0.13%) max-temps in affected programs: 191332 -> 194151 (1.47%) helped: 1900 HURT: 1891 total spills in shared programs: 3255 -> 3274 (0.58%) spills in affected programs: 166 -> 185 (11.45%) helped: 3 HURT: 6 total fills in shared programs: 4630 -> 4656 (0.56%) fills in affected programs: 367 -> 393 (7.08%) helped: 7 HURT: 15 Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15276>	2022-03-09 15:53:04 +00:00
Iago Toral Quiroga	a35b47a0b1	broadcom/compiler: add a strategy to disable scheduling of general TMU reads This can add quite a bit of register pressure so it makes sense to disable it to prevent us from dropping to 2 threads or increase spills: total instructions in shared programs: 12672813 -> 12642413 (-0.24%) instructions in affected programs: 256721 -> 226321 (-11.84%) helped: 719 HURT: 77 total threads in shared programs: 415534 -> 416322 (0.19%) threads in affected programs: 788 -> 1576 (100.00%) helped: 394 HURT: 0 total uniforms in shared programs: 3711370 -> 3703861 (-0.20%) uniforms in affected programs: 28859 -> 21350 (-26.02%) helped: 204 HURT: 455 total max-temps in shared programs: 2159439 -> 2150686 (-0.41%) max-temps in affected programs: 32945 -> 24192 (-26.57%) helped: 585 HURT: 47 total spills in shared programs: 5966 -> 3255 (-45.44%) spills in affected programs: 2933 -> 222 (-92.43%) helped: 192 HURT: 4 total fills in shared programs: 9328 -> 4630 (-50.36%) fills in affected programs: 5184 -> 486 (-90.62%) helped: 196 HURT: 0 Compared to the stats before adding scheduling of non-filtered memory reads we see we that we have now gotten back all that was lost and then some: total instructions in shared programs: 12663186 -> 12642413 (-0.16%) instructions in affected programs: 2051803 -> 2031030 (-1.01%) helped: 4885 HURT: 3338 total threads in shared programs: 415870 -> 416322 (0.11%) threads in affected programs: 896 -> 1348 (50.45%) helped: 300 HURT: 74 total uniforms in shared programs: 3711629 -> 3703861 (-0.21%) uniforms in affected programs: 158766 -> 150998 (-4.89%) helped: 1973 HURT: 499 total max-temps in shared programs: 2138857 -> 2150686 (0.55%) max-temps in affected programs: 177920 -> 189749 (6.65%) helped: 2666 HURT: 2035 total spills in shared programs: 3860 -> 3255 (-15.67%) spills in affected programs: 2653 -> 2048 (-22.80%) helped: 77 HURT: 21 total fills in shared programs: 5573 -> 4630 (-16.92%) fills in affected programs: 3839 -> 2896 (-24.56%) helped: 81 HURT: 15 total sfu-stalls in shared programs: 39583 -> 38154 (-3.61%) sfu-stalls in affected programs: 8993 -> 7564 (-15.89%) helped: 1808 HURT: 1038 total nops in shared programs: 324894 -> 323685 (-0.37%) nops in affected programs: 30362 -> 29153 (-3.98%) helped: 2513 HURT: 2077 Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15276>	2022-03-09 15:53:04 +00:00
Iago Toral Quiroga	f783bd0d2a	broadcom/compiler: define v3d-specific delays for NIR instructions We do a few changes over NIR's defaults: 1. Lower delay for texture reads. Empirically, we don't observe any benefits with delays over 50 and since this delay value is still used by the scheduler in the "favor register pressure" case it is benefitial to avoid overestimating it too much. 2. Adjust delay for non-filtered TMU reads to the delay selected for texture reads. 3. In our case, UBO reads from dynamically uniform addresses don't use the TMU and have a latency of 1 instruction in the best case scenario or 4 at worse, so we go with 1 so we don't try to move this early. This helps us get back some of what we lost when updating the default scheduler configuration to add a delay for non-filtered memory reads: total instructions in shared programs: 13126587 -> 12671765 (-3.46%) instructions in affected programs: 3764097 -> 3309275 (-12.08%) helped: 14664 HURT: 4244 total threads in shared programs: 407208 -> 415522 (2.04%) threads in affected programs: 8716 -> 17030 (95.39%) helped: 4224 HURT: 67 total uniforms in shared programs: 3812698 -> 3711224 (-2.66%) uniforms in affected programs: 335170 -> 233696 (-30.28%) helped: 2816 HURT: 3551 total max-temps in shared programs: 2318430 -> 2159345 (-6.86%) max-temps in affected programs: 539991 -> 380906 (-29.46%) helped: 13173 HURT: 1440 total spills in shared programs: 49086 -> 5966 (-87.85%) spills in affected programs: 48306 -> 5186 (-89.26%) helped: 1655 HURT: 28 total fills in shared programs: 55810 -> 9328 (-83.29%) fills in affected programs: 54821 -> 8339 (-84.79%) helped: 1659 HURT: 22 LOST: 0 GAINED: 3 Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15276>	2022-03-09 15:53:04 +00:00
Iago Toral Quiroga	e7a4e97076	nir/schedule: use larger delay for non-filtered memory reads This has been pending for a long time. It is not very consistent to add a significant delay for textures and not do it for UBOs, etc The reason we have not been doing this so far is the accumulated effect on register pressure for V3D as shown by shader-db results below, but from the point of view of a generic scheduler it makes sense to do this. Later patches will address V3D specific issues with register pressure derived from this by letting the driver control its instruction delay settings. total instructions in shared programs: 12662138 -> 13126587 (3.67%) instructions in affected programs: 1813091 -> 2277540 (25.62%) helped: 2410 HURT: 10499 total threads in shared programs: 415858 -> 407208 (-2.08%) threads in affected programs: 17348 -> 8698 (-49.86%) helped: 8 HURT: 4333 total uniforms in shared programs: 3711483 -> 3812698 (2.73%) uniforms in affected programs: 128012 -> 229227 (79.07%) helped: 3474 HURT: 2143 total max-temps in shared programs: 2138763 -> 2318430 (8.40%) max-temps in affected programs: 318780 -> 498447 (56.36%) helped: 588 HURT: 11997 total spills in shared programs: 3860 -> 49086 (1171.66%) spills in affected programs: 709 -> 45935 (6378.84%) helped: 23 HURT: 1595 total fills in shared programs: 5573 -> 55810 (901.44%) fills in affected programs: 1067 -> 51304 (4708.25%) helped: 23 HURT: 1595 LOST: 3 GAINED: 0 Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15276>	2022-03-09 15:53:04 +00:00
Iago Toral Quiroga	9ef499b315	broadcom/compiler: stop moving UBO loads before NIR scheduling This doesn't have any significant impact shader-db stats and would reduce our capacity to hide latency from the loads, so it is probably undesirable: total instructions in shared programs: 12663189 -> 12663186 (<.01%) instructions in affected programs: 4222 -> 4219 (-0.07%) helped: 9 HURT: 4 total uniforms in shared programs: 3711624 -> 3711629 (<.01%) uniforms in affected programs: 186 -> 191 (2.69%) helped: 0 HURT: 2 total max-temps in shared programs: 2138822 -> 2138857 (<.01%) max-temps in affected programs: 569 -> 604 (6.15%) helped: 1 HURT: 9 Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15276>	2022-03-09 15:53:03 +00:00
Timur Kristóf	64acec0ef9	nir: Fix lowering terminology of compute system values: "from"->"to". This is to match other NIR terminology. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15103>	2022-03-08 17:36:31 +00:00
Iago Toral Quiroga	f761f8fd9e	broadcom/compiler: simplify node/temp translation during register allocation Now that we don't sort our nodes we can arrange them so we can easily translate between nodes and temps without a mapping table, just applying an offset. To do this we have a single array of nodes where twe put first the nodes for accumulators and then the nodes for temps. With this setup we can ensure that for any given temp T, its node is always T + ACC_COUNT. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15168>	2022-03-02 08:09:11 +00:00
Iago Toral Quiroga	871b0a7f6a	broadcom/compiler: don't sort nodes for register allocation Nodes are allocated in order to registers so initially sorting was used to ensure that nodes with smaller life ranges would be assigned first and therefore be more likely to get accumulators. However, since `d81a6e5f1d` now we don't rely on order to make decisions about accumulators and instead we make policy decisions based on actual liveness, so sorting is no longer strictly relevant to this decision. Furthermore, we are not re-sorting nodes after each spill either, since that would probably require that we rebuild the interference graph after each spill (the graph identifies nodes by their index). Shader-db results show a significant improvement in instruction counts, due to more optimal accumulator assignments. The reason for this is that we use a round-robin policy for choosing the next accumulator to assign. The idea behind this is preventing nearby temps to be assigned to the same accumulator so that QPU scheduling is more flexible, but if we sort our nodes, we are basically not assigning temps in program order any more and the round-robin policy becomes less effective: total instructions in shared programs: 13000420 -> 12663189 (-2.59%) instructions in affected programs: 11791267 -> 11454036 (-2.86%) helped: 62890 HURT: 19987 total threads in shared programs: 415874 -> 415870 (<.01%) threads in affected programs: 20 -> 16 (-20.00%) helped: 2 HURT: 4 total uniforms in shared programs: 3711652 -> 3711624 (<.01%) uniforms in affected programs: 43430 -> 43402 (-0.06%) helped: 134 HURT: 173 total max-temps in shared programs: 2144876 -> 2138822 (-0.28%) max-temps in affected programs: 123334 -> 117280 (-4.91%) helped: 4112 HURT: 1195 total spills in shared programs: 3870 -> 3860 (-0.26%) spills in affected programs: 1013 -> 1003 (-0.99%) helped: 14 HURT: 12 total fills in shared programs: 5560 -> 5573 (0.23%) fills in affected programs: 1765 -> 1778 (0.74%) helped: 14 HURT: 17 Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15168>	2022-03-02 08:09:11 +00:00
Iago Toral Quiroga	4483cd24af	broadcom/compiler: sink uniform loads total instructions in shared programs: 13014428 -> 13000420 (-0.11%) instructions in affected programs: 743624 -> 729616 (-1.88%) helped: 1392 HURT: 611 total threads in shared programs: 415858 -> 415874 (<.01%) threads in affected programs: 16 -> 32 (100.00%) helped: 8 HURT: 0 total uniforms in shared programs: 3720410 -> 3711652 (-0.24%) uniforms in affected programs: 113442 -> 104684 (-7.72%) helped: 635 HURT: 29 total max-temps in shared programs: 2154268 -> 2144876 (-0.44%) max-temps in affected programs: 61279 -> 51887 (-15.33%) helped: 1124 HURT: 187 total spills in shared programs: 4002 -> 3870 (-3.30%) spills in affected programs: 265 -> 133 (-49.81%) helped: 6 HURT: 0 total fills in shared programs: 5788 -> 5560 (-3.94%) fills in affected programs: 603 -> 375 (-37.81%) helped: 6 HURT: 0 Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15168>	2022-03-02 08:09:11 +00:00
Iago Toral Quiroga	e228642cf5	broadcom/compiler: move constants before their first user For us they are basically uniforms too so we want to make their lifespans short to facilitate allocating them to accumulators. total instructions in shared programs: 13043585 -> 13015385 (-0.22%) instructions in affected programs: 8326040 -> 8297840 (-0.34%) helped: 24939 HURT: 19894 total threads in shared programs: 415860 -> 415858 (<.01%) threads in affected programs: 4 -> 2 (-50.00%) helped: 0 HURT: 1 total uniforms in shared programs: 3721953 -> 3720451 (-0.04%) uniforms in affected programs: 96134 -> 94632 (-1.56%) helped: 744 HURT: 435 total max-temps in shared programs: 2173431 -> 2154260 (-0.88%) max-temps in affected programs: 264598 -> 245427 (-7.25%) helped: 10858 HURT: 841 total spills in shared programs: 4005 -> 4010 (0.12%) spills in affected programs: 700 -> 705 (0.71%) helped: 5 HURT: 10 total fills in shared programs: 5801 -> 5817 (0.28%) fills in affected programs: 1346 -> 1362 (1.19%) helped: 6 HURT: 11 Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15168>	2022-03-02 08:09:11 +00:00
Iago Toral Quiroga	a1998a9f43	broadcom/compiler: disallow TMU spills if max tmu spills is 0 If we are compiling with a strategy that does not allow TMU spills we should not allow spilling anything that is not a uniform. Otherwise the RA cost/benefit algorithm may choose to spill a temp that is not uniform and that will cause us to immediately fail the strategy and fallback to the next one, even if we could've instead chosen to spill more uniforms to compile the program successfully with that strategy. Some relevant shader-db stats: total instructions in shared programs: 13040711 -> 13043585 (0.02%) instructions in affected programs: 234238 -> 237112 (1.23%) helped: 73 HURT: 172 total threads in shared programs: 415664 -> 415860 (0.05%) threads in affected programs: 196 -> 392 (100.00%) helped: 98 HURT: 0 total uniforms in shared programs: 3717266 -> 3721953 (0.13%) uniforms in affected programs: 12831 -> 17518 (36.53%) helped: 6 HURT: 100 total max-temps in shared programs: 2174177 -> 2173431 (-0.03%) max-temps in affected programs: 4597 -> 3851 (-16.23%) helped: 79 HURT: 21 total spills in shared programs: 4010 -> 4005 (-0.12%) spills in affected programs: 55 -> 50 (-9.09%) helped: 5 HURT: 0 total fills in shared programs: 5820 -> 5801 (-0.33%) fills in affected programs: 186 -> 167 (-10.22%) helped: 5 HURT: 0 Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15168>	2022-03-02 08:09:11 +00:00
Iago Toral Quiroga	cbb4d0dded	broadcom/compiler: increase cost of TMU spills to 10 Our cost was 5 which matches the number of instructions we have to add for a TMU spill (a fill is 4 instructions). Uniform spills on the other hand add an extra instruction for each fill and remove one instruction for the spill itself. These have a cost of 1. Therefore, if we have a single spill+fill, we end up with +9 instructions if it is a TMU spill and +0 instructions with a uniform spill, so making the former only 5 times more costly is probably not a good idea, and this is without even considering the added latency of the TMU accesses. Relevant shader-db changes show this causes as a marginal instruction count increase in a few shaders but better thread counts and lower TMU spilling overall: total instructions in shared programs: 13037315 -> 13040711 (0.03%) instructions in affected programs: 370106 -> 373502 (0.92%) helped: 187 HURT: 321 total threads in shared programs: 415090 -> 415664 (0.14%) threads in affected programs: 574 -> 1148 (100.00%) helped: 287 HURT: 0 total uniforms in shared programs: 3706674 -> 3717266 (0.29%) uniforms in affected programs: 63075 -> 73667 (16.79%) helped: 40 HURT: 395 total max-temps in shared programs: 2176080 -> 2174177 (-0.09%) max-temps in affected programs: 15838 -> 13935 (-12.02%) helped: 316 HURT: 34 total spills in shared programs: 4247 -> 4010 (-5.58%) spills in affected programs: 2599 -> 2362 (-9.12%) helped: 107 HURT: 14 total fills in shared programs: 6121 -> 5820 (-4.92%) fills in affected programs: 3622 -> 3321 (-8.31%) helped: 108 HURT: 13 Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15168>	2022-03-02 08:09:11 +00:00
Guilherme Gallo	d1c6185b5a	ci: skqp: Add Vulkan support for a630_skqp job This commit adds support for Vulkan backend on a630_skqp job. = Needed changes - Needed to install libvulkan-dev package on system - Refactored the way the available skqp reports are printed tested in development builds with skia tools Piglit expectations had to be updated in various drivers due to !14750 not having bumped the tags when it tried to uprev. Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com> Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14686>	2022-02-25 05:50:06 +00:00
Iago Toral Quiroga	cf99584f51	broadcom/compiler: move uniforms right before their first use after scheduling On V3D the quality of the code we generate is significantly affected by how we decide to assign accumulators during register allocation, which is determined by liveness, favoring short-lived temps. There are many shaders that end up doing a whole lot of uniform loads first, and using them later, which is very inconvenient for our register allocation process because this increases uniform liveness and causes us to use accumulators less efficientely, leading to significant churn. To fix this, we move uniforms right before their first use in the same block, but we need to do this after NIR scheduling, which means we are doing it in non-SSA form, since the scheduler has a tendency to undo this optimization and it is not easy to modify it to avoid it, since it works in more abstract terms, using instruction dependencies, estimated register pressure and instruction delay information to do its work, which are very different concepts. total instructions in shared programs: 13316738 -> 13033613 (-2.13%) instructions in affected programs: 10389172 -> 10106047 (-2.73%) helped: 55442 HURT: 16144 total threads in shared programs: 413722 -> 415048 (0.32%) threads in affected programs: 1428 -> 2754 (92.86%) helped: 680 HURT: 17 total loops in shared programs: 1716 -> 1690 (-1.52%) loops in affected programs: 26 -> 0 helped: 26 HURT: 0 total uniforms in shared programs: 3704313 -> 3705181 (0.02%) uniforms in affected programs: 687730 -> 688598 (0.13%) helped: 2920 HURT: 7384 total max-temps in shared programs: 2364785 -> 2175190 (-8.02%) max-temps in affected programs: 1215387 -> 1025792 (-15.60%) helped: 49667 HURT: 1556 total spills in shared programs: 4241 -> 4248 (0.17%) spills in affected programs: 642 -> 649 (1.09%) helped: 11 HURT: 19 total fills in shared programs: 6115 -> 6125 (0.16%) fills in affected programs: 1276 -> 1286 (0.78%) helped: 11 HURT: 21 total sfu-stalls in shared programs: 34381 -> 36578 (6.39%) sfu-stalls in affected programs: 16055 -> 18252 (13.68%) helped: 3647 HURT: 5206 Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15056>	2022-02-24 11:36:00 +00:00
Iago Toral Quiroga	c4a78a2d2a	broadcom/compiler: fix register class patching for postponed spills If we have a postponed spill, the temp we create at ip is no longer the spilled temp and therefore is affected by the thrsw injection. Fixes corruption in the additive blending animation demo from Three.js. Fixes: `f3c3228522` ('broadcom/compiler: do not rebuild the interference graph after each spill') Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15112>	2022-02-22 11:17:10 +00:00
Erik Faye-Lund	25a37fabb7	vulkan/wsi: untangle buffer-images from prime Not all Vulkan implementations allows rendering to linear images, so in order to support scanning out from these on Windows we might have to copy through a buffer like we do in the PRIME path. To avoid reimplementing the same, let's instead generalize the code a bit so it doesn't have to specfy any PRIME-specific details. Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12210>	2022-02-22 10:04:34 +00:00

1 2 3 4 5 ...

1947 commits