Previously, custom buffer descriptors were owned by a descriptor set. Now,
custom buffer descriptors are owned by the buffer. Additionally, we respect
the app-provided sizes when they're smaller than the buffer size, even if
robustness is not enabled, so that size queries work correctly.
This new design fixes several issues:
* Descriptor set copies were broken when they involved custom descriptors,
because the original descriptor set owned the lifetime of the custom
descriptor and the new one was just borrowing it. If those lifetimes didn't
line up, problems would arise.
* A single buffer with the same sub-view placed in multiple descriptor sets
would allocate multiple slots, when it only really needed one.
* Custom buffer descriptors now lower the base offset to 0 to allow merging
multiple overlapping (ending at the same upper bound) descriptors. Since
the shader is already doing an offset add, making it nonzero is free.
* Dynamic buffer descriptors were incorrect before. The size passed into the
descriptor set is supposed to be the size from the *dynamic* offset, not the
size from the static offset. Allocating and populating the descriptor when
it was placed into the set prevented larger offsets from working correctly.
The buffer-owned design also means cmdbufs don't have to own the lifetime of
custom descriptors.
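The base-offset lowering described above can be sketched as follows. This is
only an illustration: the `toy_*` types and functions are hypothetical and are
not the driver's actual structures. It assumes a view is identified by its
offset and size, and that the shader adds the view's offset itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: a sub-range of a buffer as requested by the app. */
struct toy_buffer_view {
   uint64_t offset;
   uint64_t size;
};

/* Hypothetical: the range a custom descriptor actually covers. */
struct toy_custom_desc {
   uint64_t base;
   uint64_t size;
};

/* Lower the base to 0 and keep the same upper bound. The shader is assumed
 * to add view.offset on its own, so the descriptor only needs the end. */
static struct toy_custom_desc
toy_lower_view(struct toy_buffer_view view)
{
   return (struct toy_custom_desc){ .base = 0, .size = view.offset + view.size };
}

/* Two views can share one slot when their lowered descriptors are equal,
 * i.e. when they end at the same upper bound. */
static bool
toy_can_share_slot(struct toy_buffer_view a, struct toy_buffer_view b)
{
   struct toy_custom_desc da = toy_lower_view(a);
   struct toy_custom_desc db = toy_lower_view(b);
   return da.base == db.base && da.size == db.size;
}
```

With this scheme, views (offset 256, size 768) and (offset 512, size 512) both
end at 1024 and would need a single slot.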
Fixes dEQP-VK.ssbo.unsized_array_length.float_offset_explicit_size
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22639>
Fixes dEQP-VK.draw.renderpass.depth_bias.depth_bias_triangle_list_point
This is not complete: there's no slope scale or clamp handling, but it
does handle both static and dynamic bias (though dynamic is untested).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22639>
This lets us follow the Vulkan spec requirements for MSAA line
rasterization, using a width of 1.0 instead of D3D's prescribed
width of 1.4. There's no reason to predicate this on MSAA being
enabled, since quadrilateral lines with a width of 1.0 are actually
the most desired type of line rasterization for Vulkan.
Follow-ups:
- We can probably turn on 'strict lines' when this is supported.
- We should enable the line rasterization mode extension.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22606>
The new code splits the work into a few passes instead of trying to do
everything with a single pass. This helps to apply the new clarified
rules for structured control flow in the SPIR-V specification, in
particular the "exit construct" rules.
First find an appropriate ordering for the blocks, based on the
approach taken by Tint (WebGPU compiler). Then, with those blocks
in order, identify the SPIR-V constructs' start and end positions.
Finally, walk the blocks again to emit NIR for each of them, "opening"
and "closing" the necessary NIR constructs as we reach the start and
end positions of the SPIR-V constructs.
There are a couple of interesting choices when mapping the constructs
to NIR:
- NIR doesn't have something like a switch, so like the previous code,
we lower the switch construct to a series of conditionals for each
case.
- And, unlike the previous code, when there's a need to perform a
break from a construct that NIR doesn't directly support (e.g. inside
a case construct, conditionally breaking early from the switch), we
now use a combination of a NIR loop and a NIR if. Extra code is
added to ensure that loop_break and loop_continues are propagated
to the right loop.
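As a rough analogy in plain C (not NIR, and not the code in this MR), an early
break out of a switch case can be expressed with a wrapper loop that runs at
most once plus a chain of ifs. Both functions below are made up for
illustration, and the sketch does not cover propagating breaks and continues
to an enclosing loop.

```c
/* Source form: a switch whose first case may break out early. */
static int
with_switch(int sel, int early)
{
   int r = 0;
   switch (sel) {
   case 0:
      r += 1;
      if (early)
         break;   /* conditional early exit from the switch */
      r += 10;
      break;
   case 1:
      r += 100;
      break;
   default:
      r += 1000;
      break;
   }
   return r;
}

/* Lowered form: no switch, only a loop and ifs. */
static int
with_loop_and_if(int sel, int early)
{
   int r = 0;
   for (;;) {     /* wrapper loop, executes at most once */
      if (sel == 0) {
         r += 1;
         if (early)
            break;   /* the switch break becomes a loop break */
         r += 10;
      } else if (sel == 1) {
         r += 100;
      } else {
         r += 1000;
      }
      break;      /* leave the wrapper loop after the selected case */
   }
   return r;
}
```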
This should fix various issues with valid SPIR-V that previously
resulted in "Invalid back or cross-edge in the CFG" errors.
Thanks to Alan Baker and David Neto for their explanations of
ordering the blocks, in the Tint code and in presentations to
the SPIR-V WG.
Thanks to Jack Clark for providing a lot of valuable tests used to
validate this MR.
Closes: #5973, #6369
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17922>
There are bad interactions with dynamic buffers at this point:
* Perf issues due to allocating and freeing the buffer to store indices/offsets
* Large dynamic uniform buffer offsets (above 65K) cause out-of-bounds reads
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22371>
The scenario of:
* App binds multiple descriptor sets
* App binds a pipeline that uses a subset of them
* App binds a pipeline that uses more of them
was broken. We were only copying the descriptors for the accessible
subset before, but then clearing all dirty bits, so simply changing
the pipeline wouldn't result in more descriptors being copied.
When running not-bindless, the right thing to do is to copy *all*
descriptors if we're copying any. When running bindless, each parameter
is set separately, and more importantly, *can't* be set on the command
list if the root signature can't access them.
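A toy model of the not-bindless case, to make the scenario concrete. All
names here (`toy_cmdbuf`, the masks, the flush functions) are hypothetical and
only mimic the dirty-bit behavior described above; one bit stands for one
descriptor set.

```c
#include <stdint.h>

/* Hypothetical command buffer state. */
struct toy_cmdbuf {
   uint32_t bound;   /* sets the app has bound */
   uint32_t dirty;   /* sets bound since the last flush */
   uint32_t copied;  /* sets whose descriptors have been copied */
};

static void
toy_bind_sets(struct toy_cmdbuf *cb, uint32_t mask)
{
   cb->bound |= mask;
   cb->dirty |= mask;
}

/* Old behavior: copy only the accessible subset, then clear every dirty
 * bit, so sets the current pipeline cannot see are never copied later. */
static void
toy_flush_buggy(struct toy_cmdbuf *cb, uint32_t accessible)
{
   cb->copied |= cb->dirty & accessible;
   cb->dirty = 0;
}

/* New behavior: if anything is dirty, copy all bound sets. */
static void
toy_flush_fixed(struct toy_cmdbuf *cb)
{
   if (cb->dirty) {
      cb->copied = cb->bound;
      cb->dirty = 0;
   }
}
```

In this model, binding sets 0 and 1, flushing for a pipeline that only
accesses set 0, and then switching to a pipeline that accesses both leaves
set 1 uncopied with the old flush, while the new flush has already copied it.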
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22371>
We can now determine whether a nir_src is for an if without a sideband, so
simplify the function signature.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Suggested-by: Faith Ekstrand <faith@gfxstrand.net>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22343>
Every nir_ssa_def has two chains of uses (regular uses and if-uses), each
implemented as a doubly linked list. Each list requires 2 * 64-bit = 16 bytes
per def, so together they require 32 bytes per def. Not cool.
To cut that memory use in half, we can combine the two linked lists into a
single use list that contains both regular instruction uses and if-uses. To do
this, we augment the nir_src with a boolean "is_if", and reimplement the
abstract if-uses operations on top of that list. That boolean should fit into
the padding already in nir_src so should not actually affect memory use, and in
the future we could sneak it into the bottom bit of a pointer.
However, this creates a new inefficiency: now iterating over regular uses
separate from if-uses is (nominally) more expensive. It turns out virtually
every caller of nir_foreach_if_use(_safe) also calls nir_foreach_use(_safe)
immediately before, so we rewrite most of the callers to instead call a new
single `nir_foreach_use_including_if(_safe)` which predicates the logic based on
`src->is_if`. This should mitigate the performance difference.
There's a bit of churn, but this is largely a mechanical set of changes.
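A simplified sketch of the combined list, using made-up `toy_*` types rather
than the real NIR structures (and a singly linked list to keep it short). It
only illustrates walking one list and branching on `is_if`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for nir_src: one list link plus the is_if flag. */
struct toy_src {
   struct toy_src *next;
   bool is_if;      /* true when the user is an if condition */
   int user_id;     /* placeholder for the parent instruction or if */
};

/* Hypothetical stand-in for nir_ssa_def: a single use list. */
struct toy_def {
   struct toy_src *uses;
};

static void
toy_add_use(struct toy_def *def, struct toy_src *src)
{
   src->next = def->uses;
   def->uses = src;
}

/* One walk over all uses, regular and if-uses alike. */
#define toy_foreach_use_including_if(src, def) \
   for (struct toy_src *src = (def)->uses; src != NULL; src = src->next)

static void
toy_count_uses(const struct toy_def *def, unsigned *instr_uses, unsigned *if_uses)
{
   *instr_uses = 0;
   *if_uses = 0;
   toy_foreach_use_including_if(src, def) {
      if (src->is_if)
         (*if_uses)++;
      else
         (*instr_uses)++;
   }
}
```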
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22343>
If the app passes us unaligned buffer offsets, we need to align them
down to the nearest aligned offset, and then put the difference into
the descriptor set buffer.
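The arithmetic is roughly the following. The struct and function names are
hypothetical, and the sketch assumes the required alignment is a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result: where the bound range starts, and how far into it
 * the app's requested offset lies. */
struct toy_aligned_offset {
   uint64_t base;    /* offset rounded down to the alignment */
   uint32_t delta;   /* difference to store alongside the descriptor */
};

static struct toy_aligned_offset
toy_align_buffer_offset(uint64_t offset, uint32_t alignment)
{
   /* Assumption: alignment is a nonzero power of two. */
   assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

   struct toy_aligned_offset r;
   r.base = offset & ~((uint64_t)alignment - 1);
   r.delta = (uint32_t)(offset - r.base);
   return r;
}
```

For example, an offset of 20 with a 16-byte alignment gives a base of 16 and a
delta of 4.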
Fixes: 8bd5fbf8 ("dzn: Bind buffers for bindless descriptor sets")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22225>
Cache coherent UMA implies that the GPU is reading data through the
CPU caches. Using write-combined CPU pages for such a system would
be bad, since the GPU would then be reading uncached data. One
example of such a system is WARP. This significantly improves WARP's
performance for some apps (including the CTS).
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22225>