Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7260>
Clip & cull distances, which are compact arrays, exposed a lot of holes
because they can take up multiple slots and partially overlap.
I wanted to eliminate our dependence on knowing the layout of the
variables, as this can get complicated with things like partially
overlapping arrays, which can happen with ARB_enhanced_layouts or with
clip/cull distance arrays. This means no longer changing the layout
based on whether the i/o is part of an array or not, and no longer
matching producer <-> consumer based on the variables. At the end of the
day we have to match things based on the user-specified location, so for
simplicity this switches the entire i/o handling to be based off the
user location rather than the driver location. This means that the
primitive map may be a little bigger, but it reduces the complexity
because we never have to build a table mapping user location to driver
location, and it reduces the amount of work done at link time in the SSO
case. It also brings us closer to what the other drivers do.
While here, I also fixed the handling of component qualifiers, which was
another thing broken with clip/cull distances.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6959>
OUT_OF_MEMORY is the only valid error code from this function, but this
error is more of a "things went horribly wrong, you can't talk to the GPU"
case. Set the device to be in error.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7224>
../src/freedreno/decode/cffdec.c: In function ‘reg_disasm_gpuaddr’:
../src/freedreno/decode/cffdec.c:404:29: error: ‘sprintf’ writing a
terminating nul past the end of the destination [-Werror=format-overflow=]
404 | sprintf(filename, "%04d.%s", n++, ext);
../src/freedreno/decode/cffdec.c:404:3: note: ‘sprintf’ output between
9 and 16 bytes into a destination of size 8
404 | sprintf(filename, "%04d.%s", n++, ext);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7224>
Given that we already link to Android's libsync, use it instead of using a
build-time dependency on libdrm for the KGSL path. This also would help
for older kernel compat with KGSL.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6821>
And document where to find information on qcom gralloc's private handle
layout. I chose not to #include the gralloc_priv because it seems that
there's not much we need yet, and I'm hoping we can avoid the build-time
dependency on the specific platform.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7015>
Shaders may not use a particular region of a UBO in a given shader (think
UBOs shared between stages, or between shaders), and by just always
extending the existing range for a given UBO, we'd waste bandwidth
uploading it, and also waste our precious const space in storing the
unused data.
Instead, only upload exactly the ranges we can use, and merge ranges when
they're neighbors. We may end up with more upload packets, but the
bandwidth savings is surely going to be worth it (and if find we want a
distance threshold for merging with nearby uploads, that would be easy to
add).
total instructions in shared programs: 9266114 -> 9255092 (-0.12%)
total full in shared programs: 343162 -> 341709 (-0.42%)
total constlen in shared programs: 1454368 -> 1275236 (-12.32%)
total cat6 in shared programs: 93073 -> 82589 (-11.26%)
total (ss) in shared programs: 212402 -> 206404 (-2.82%)
total (sy) in shared programs: 122905 -> 114007 (-7.24%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7036>
turnip doesn't implement pre-emption, this hasn't been a problem with drm
backend since the kernel driver doesn't implement it either, however this
causes issues with kgsl backend.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6994>
v2:
* Use sub_cs when creating the IB in tu6_build_lrz(). (Jonathan Marek)
* Emit tu6_build_lrz() only when pipeline state changes or there is a
clear. (Jonathan Marek)
v3:
* Don't modify tu_pipeline object, track the changes in command buffer
state.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5146>
v2:
* Don't emit tu6_clear_lrz() using a IB but in the command stream
provided. (Jonathan Marek)
* Valid_clear_ib is always false if TU_DEBUG_NOLRZ is set. Remove the
useless condition. (Jonathan Marek)
* Added more comments.
* Use r2d function for blitting LRZ. (Jonathan Marek)
v3:
* Do LRZ tracking in the command buffer state (Connor).
v4:
* Simplify the emission of source setup (Jonathan Marek)
v5:
* Separate LRZ setup in a different function.
* Not hide LRZ setup inside GMEM path (Jonathan Marek)
* Fix iova address emission in tu6_clear_lrz() (Jonathan Marek)
* Add CCU sysmem flushes (Jonathan Marek)
v6:
* Fixed bug related to storing a VkClearValue pointer that could be
out-of-scope when we access to it for emitting LRZ clear.
v7:
* Merge tu6_clear_lrz() and tu6_clear_lrz_setup() into the same
function and emit LRZ clear at the beginning of the renderpass.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5146>
Disable LRZ write if the fragment shader discard the fragments, modify
its position or if early-Z is disabled.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5146>
There are depth compare op modes that are not supported by LRZ in the
HW. Also, it is not supported when blend or stencil are enabled.
v2:
* Set pipeline->lrz.write to the same value than depthWriteEnable.
* Improve comment on disabling LRZ write on blend.
* Remove pipeline's lrz invalidation when there is no clear mask in
render pass. It is confusing. (Jonathan Marek)
* Mark the pipeline state as changed.
* Add comment on not using GREATER flag.
v3:
* Replace {rb,gras}_lrz_cntl by flags in struct tu_pipeline.
* Added z_test_enable flag.
v4:
* Created struct tu_lrz_pipeline to avoid modifying immutable objects.
v5:
* Fixed crashes when pDepthStencilState pointer is NULL.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5146>
v2:
- Add missing vulkan subpass support. (Jonathan Marek)
- When creating the BO, mark it as not valid until it is cleared.
- Move LRZ struct to tu_image. (Jonathan Marek)
- Destroy BO when we destroy the image. (Jonathan Marek)
v3:
- Allocate the buffer as part of the image's BO (Connor)
- Moved image's LRZ values to its layout.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5146>
This reverts commit 4fb2eddfdf.
This reverts commit 7a1deb16f8.
This reverts commit 2b6a172343.
This reverts commit 5af81393e4.
This reverts commit 87900afe5b.
A couple of problems were discovered after this series was merged that
cause breakage in different configurations:
(1) It seems that using -mf16c also enables AVX, leading to SIGILL on
platforms that do not support AVX.
(2) Since clang only warns about unknown flags, and as I understand
it Meson's handling in cc.has_argument() is broken, the F16C code is
wrongly enabled when clang is used, even for example on ARM, leading
to a compilation error.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3583
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6969>
We're about to introduce conversion ops which are going to want two
different types. We may as well just split the one we have rather than
end up with three. There are a couple places where this is mildly
inconvenient but most of the time I find it to actually be nicer.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6945>
This will merge loads of UBO components together into vec4 loads. At the
same time, it improves the alignment information on our loads, fixing the
regression from the vec3 loads fix.
shader-db results:
total instructions in shared programs: 12829370 -> 8755851 (-31.75%)
total cat6 in shared programs: 145840 -> 97027 (-33.47%)
Overall results from before the vec3 fix:
total instructions in shared programs: 8019997 -> 8755851 (9.18%)
total cat6 in shared programs: 87683 -> 97027 (10.66%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6612>