And document where to find information on qcom gralloc's private handle
layout. I chose not to #include the gralloc_priv because it seems that
there's not much we need yet, and I'm hoping we can avoid the build-time
dependency on the specific platform.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7015>
This reverts commit bcfec61d1e.
The previous patch fixed the underlying issue that the above commit was
actually working around. It turns out that the previously observed
performance regression was due to invalid aux-map entries for
multi-layer HiZ+CCS buffers.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7046>
Fixes rendering corruption in the shadowmappingcascade Sascha Willems
Vulkan demo. To see the corruption, I adjusted the demo options as
follows:
1. Enable "Display depth map"
2. Set "Split lambda" to 0.100
3. Make "Cascade" non-zero.
Fixes: 80ffbe915f ("anv: Add support for HiZ+CCS")
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7046>
Shaders may not use a particular region of a UBO in a given shader (think
UBOs shared between stages, or between shaders), and by just always
extending the existing range for a given UBO, we'd waste bandwidth
uploading it, and also waste our precious const space in storing the
unused data.
Instead, only upload exactly the ranges we can use, and merge ranges when
they're neighbors. We may end up with more upload packets, but the
bandwidth savings is surely going to be worth it (and if find we want a
distance threshold for merging with nearby uploads, that would be easy to
add).
total instructions in shared programs: 9266114 -> 9255092 (-0.12%)
total full in shared programs: 343162 -> 341709 (-0.42%)
total constlen in shared programs: 1454368 -> 1275236 (-12.32%)
total cat6 in shared programs: 93073 -> 82589 (-11.26%)
total (ss) in shared programs: 212402 -> 206404 (-2.82%)
total (sy) in shared programs: 122905 -> 114007 (-7.24%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7036>
rehashing a populated hash table is very expensive, so for the case where
the maximum/likely table size is already known, this function allows for
pre-sizing the table to avoid ever needing a rehash
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7037>
Add resource pointers ptr1 and ptr2 and offsets offset1 and offset2,
and just emit relocs if the pointers are non-NULL. This lets us move
a little more logic to the CSO building.
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6693>
Some GPUs can sample biplanar formats like NV12 natively, returning
the YUV values. Add a lowering type that uses that for sampling and
relies on existing colorspace conversions.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6693>
This is a planar, subsampled format. It's basically NV12, but without
colorspace conversion.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6693>
On Gen7, the data cache is pretty terrible so we'd rather avoid it
there. On Gen8+, it should be fine and is less likely to conflict with
texturing so we should get less cache thrashing there.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3932>
It's identical to nir_intrinsic_load_global except that it works on data
that's guaranteed to be constant throughout the shader invocation.
Fixes: ff2f44d865 "intel/fs: Implement nir_intrinsic_load_global_constant"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6872>
In 53bfcdeecf, we added load/store_scratch instructions which deviate
a little bit from most memory load/store instructions in that we can't
use the normal untyped read/write instructions which can read and write
up to a vec4 at a time. Instead, we have to use the DWORD scattered
read/write instructions which are scalar. To handle this, we added code
to brw_nir_lower_mem_access_bit_sizes to cause them to be scalarized.
However, one case was missing: the load-as-larger-vector case. In this
case, we take small bit-sized constant-offset loads replace it with a
32-bit load and shuffle the result around as needed.
For scratch, this case is much trickier to get right because it often
emits vec2 or wider which we would then have to lower again. We did
this for other load and store ops because, for lower bit-sizes we have
to scalarize thanks to the byte scattered read/write instructions being
scalar. However, for scratch we're not losing as much because we can't
vectorize 32-bit loads and stores either. It's easier to just disallow
it whenever we have to scalarize.
Fixes: 53bfcdeecf "intel/fs: Implement the new load/store_scratch..."
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6872>
If -b is specified, we don't add a null to the end of the char array.
If -b is not specified, we assert that there are no nulls in the middle.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7034>
sys and string are unused, os is needed but not imported
fixes: 412472da5c
("glsl: Add utility to convert text files to C strings")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7034>
This can be useful if you rsync an install between two machines and the
paths don't perfectly match up. OpenGL drivers already work fine but
anything which uses pipe-loader has a compile-time path.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7047>
This commit enables clover support for iris. It is intended as a
compiler developer tool and not as a new OpenCL implementation from
Intel. If you want competent OpenCL, we have a different open-source
driver for that built on our LLVM-based IGC compiler stack. However,
using clover with iris is becoming increasingly useful as a compiler
development tool and I'm getting tired of carrying the patches in a
private branch.
By default, clover will not initialize on iris. To enable clover, set
the IRIS_ENABLE_CLOVER environment variable to "1" or "true". As we've
done with the semi-sketchy platform support in ANV, it dumps a very loud
WARNING to stderr when enabled. Use at your own risk.
NOTE: To anyone intending to benchmark this, the performance is going to
be terrible and that is expected. This is in no way representative of
the Intel/NIR compiler stack. As it currently stands, clover passes
-O0 to clang when compiling OpenCL C to make SPIRV-LLVM-Transator work.
When compiling the SPIR-V, clover currently doesn't run any NIR
optimizations before it lowers memory access so any NIR optimizations
iris attempts to do are severely hampered. One day, clover will get a
NIR optimization loop or the ability to hand things off to the driver
per-lowering but today is not that day.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7047>
The value specified in pipe_compute_state is in addition to the implicit
value computed by NIR.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7047>
To ask to debug a registr allocation failure
(V3D_DEBUG_REGISTER_ALLOCATION seemed too long to me).
When a fallback register allocation algorithm was added, if the
register allocation fails, it only dumpg the current vir with the
register pressure info with the failed fallback. But if we want do
debug the problem, we would be interested on both.
Additionally, it was strange that we got the full vir dump with the
failure even if no debug option was set.
Additionally we add shaderdb like stats for those failures, to make
easier to compare one and the other.
v2: keep a small warning message in case both register allocation
algorithms fails (Neil)
Reviewed-by: Neil Roberts <nroberts@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6999>
This is needed due Vulkan because by spec (31.1. Limit Requirements)
the minimum value for the following limits are the following ones:
maxPerStageDescriptorSampledImages 16
maxPerStageDescriptorStorageImages 4
maxPerStageDescriptorInputAttachments 4
And we are using v3d textures for all of them, so current limit would
not be enough for some cases.
Note that as the current comment explains there is not exactly a HW
limit for it, so we could bump to 32 for example, but let's just be
conservative and ask the minimum required.
It is worth to note that we needed to maintain the same value for the
OpenGL case, as it gets a register allocation failure on some GL
cases. We tried to fix that with small changes on the nir scheduler,
but we found that it would require some non-trivial effort to get it
done (that eventually we would need to).
Fixes tests like:
dEQP-VK.binding_model.descriptorset_random.sets16.constant.ubolimitlow.sbolimitlow.imglimitlow.noiub.uab.comp.noia.0
v2: keep the previous limit for Opengl (Eric)
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6999>
Shaders may read out components past the attributes provided by the
application, so the read mask can indicate a larger component count than
were actually reserved in the array.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6728>
Previously, ycbcr samplers were tightly packed with 4-byte alignment,
but the structure requires 8-byte alignment. These samplers are now padded
to 8-byte boundaries instead.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6728>
"max_bindings + 1" was repeatedly used throughout this function,
so talking about the binding *count* is more natural here.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6728>
Vulkan allows for these input pointers to be null when the respective
object count is zero. Calling memcpy with null pointers is undefined,
so they are guarded with a check for the legit use pattern now.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6728>