Currently we have 3 paths for ALU serialization/deserialization in NIR:
1. If the ALU is qualified as packed_src_ssa_16bit (identity swizzle):
   - The write stores an object index only.
   - The read loads only the swizzles actually used; the rest are 0.
2. If the ALU is not qualified as packed_src_ssa_16bit, we have two cases:
   2.1 Up to vec4:
       - The write stores all 4 swizzle components.
       - The read loads all 4 swizzle components.
   2.2 vec8/16:
       - The write stores only the swizzle components used; the rest are 0.
       - The read loads only the swizzle components used; the rest are 0.
This inconsistency in how these paths encode/decode unused swizzle components
can cause issues in some scenarios where a backend compiler may receive
functionally equivalent NIR shaders from Mesa that won't produce the same sha1,
leading to unnecessary cache misses.
This patch makes path 2.1 always encode and decode unused swizzle components
as 0, making it consistent with the other paths.
This fixes issues where backends sometimes need to compile a shader twice
before it is actually retrieved from the disk cache. This has been
observed at least with V3D and Panfrost.
The problem occurs when an ALU src with unused swizzle components is serialized
in the Mesa frontend using path 1, but is serialized using path 2.1 when it
later hits the backend. The backend uses the sha1 of the serialized NIR for
the cache key. On the second execution the Mesa frontend has a cache hit and,
when it deserializes the ALU src, sets its unused components to 0. That causes
a backend cache miss, since that NIR doesn't match the one the backend cached
on the first execution.
By always making unused swizzle components decode and encode consistently to 0
in all paths we ensure the issue never happens and that NIR variants that only
differ in swizzle components that are not used lead to cache hits.
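As a rough illustration of the idea (not the actual nir_serialize.c code), the
sketch below zeroes unused swizzle components before writing an ALU source, so
that two sources differing only in unused components serialize to identical
bytes. The struct, the function names, and the assumption that the "used"
components are the first num_used entries are all hypothetical simplifications.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_SWIZZLE 4

/* Hypothetical stand-in for an ALU source; not the real nir_alu_src. */
struct alu_src_sketch {
   uint32_t ssa_index;
   uint8_t swizzle[MAX_SWIZZLE];
};

/* Force every swizzle component at or beyond num_used to 0. */
static void
canonicalize_swizzle(struct alu_src_sketch *src, unsigned num_used)
{
   assert(num_used <= MAX_SWIZZLE);
   for (unsigned i = num_used; i < MAX_SWIZZLE; i++)
      src->swizzle[i] = 0;
}

/* Write the index followed by the canonicalized swizzle bytes into out.
 * Returns the number of bytes written.
 */
static size_t
serialize_src(const struct alu_src_sketch *src, unsigned num_used,
              uint8_t *out)
{
   struct alu_src_sketch tmp = *src;
   canonicalize_swizzle(&tmp, num_used);

   memcpy(out, &tmp.ssa_index, sizeof(tmp.ssa_index));
   memcpy(out + sizeof(tmp.ssa_index), tmp.swizzle, MAX_SWIZZLE);
   return sizeof(tmp.ssa_index) + MAX_SWIZZLE;
}
```

With this kind of canonicalization, any hash computed over the serialized bytes
no longer depends on whatever values happened to sit in the unused components.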
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37218>
This fixes new VKCTS coverage
dEQP-VK.descriptor_indexing.non_uniform_atomics.
Found this while implementing a new extension.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37295>
VDPAU only supports X11 and GL interop. There is no Wayland or Vulkan
interop support. The API has limitations that make it impossible to
correctly decode certain streams.
Application support is also very limited, and VAAPI is always a better
choice than VDPAU.
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Acked-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36632>
This will allow UBOs to have arrays containing millions of
elements without excessive memory use on a remap table. Before this
change, using the max sized array on radeonsi would result in 1.3GB
of memory being used for a remap table in a single shader.
There is also a small functional change here. Previously, if the
shader used more than GL_MAX_UNIFORM_BLOCK_SIZE, mesa would
allow it, because the original ARB_uniform_buffer_object spec
stated:
"If the amount of storage required for a uniform block exceeds
this limit, a program may fail to link."
However, in OpenGL 4.3 the text was clarified and the "may" was
removed, so with this change we enforce the max limit.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9953
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36997>
This was incorrect (it also lowered int64 reductions/scans), and the only
user can just use the general callback to lower precisely what it wants.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37164>
This was added with the goal of eventually replacing the per-pass
subgroup/ballot size options, but that won't work because
some backends don't have a fixed subgroup size across the compilation
process.
It was also mostly added to hack around mesa state tracker behavior,
and we have a better solution there now.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37164>
We skip iterations with ifs.
These can be optimized away after the subgroup size is known.
Every driver should do that because applications depend on it anyway.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37164>
If the offset is iadd(iadd(iadd(a, 1), b), -1), try_extract_const_addition
will create a dead iadd(a, b) and claim that it didn't modify the shader.
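To see why no constant is left to extract in this example, here is a minimal
sketch that sums the constant leaves of a nested iadd chain. The expression
type and function name are hypothetical toys for illustration; they are not
NIR types and not the actual try_extract_const_addition implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy expression node; not a real NIR type. */
enum expr_kind { EXPR_VAR, EXPR_CONST, EXPR_IADD };

struct expr {
   enum expr_kind kind;
   int64_t value;           /* used by EXPR_CONST only */
   const struct expr *lhs;  /* used by EXPR_IADD only */
   const struct expr *rhs;  /* used by EXPR_IADD only */
};

/* Sum every constant leaf reachable through nested iadd nodes. */
static int64_t
sum_iadd_constants(const struct expr *e)
{
   assert(e != NULL);

   switch (e->kind) {
   case EXPR_CONST:
      return e->value;
   case EXPR_IADD:
      return sum_iadd_constants(e->lhs) + sum_iadd_constants(e->rhs);
   default:
      return 0; /* variables contribute no constant */
   }
}
```

For iadd(iadd(iadd(a, 1), b), -1) the constants 1 and -1 cancel, so the total
is 0 and only the non-constant part a + b remains.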
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36370>
On a fossil from the Blender 4.5.0 Vulkan backend, this improves compile
times in NAK by about 17%. Compile time of other shaders improves by a
more modest 1.2%.
No stat changes on shader-db.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36184>
We're about to add to nir_metadata_control_flow, and we don't want
passes to require the new metadata.
Via coccinelle:
@@
expression e1;
@@
- nir_metadata_require(e1, nir_metadata_control_flow)
+ nir_metadata_require(e1, nir_metadata_block_index | nir_metadata_dominance)
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36184>
This is GL-specific, and a following fix will add more GL-specific
params, so here we move it to the st to avoid filling nir.h with
more junk.
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37037>