Move the tool to summarize a failed pipeline to a generic .marge/hooks
directory. This will allow the fdo-bots repo to handle all marge hooks in
a consistent way across repositories that use this service.
Add a symlink to the bin/ci directory so that the pipeline summary tool
can still be run locally as well.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32413>
Notably, our convergent block loads were already overfetching - we
rounded up to block sizes of 8, 16, 32, or 64 (LSC-only). But we did
so in the backend, rather than NIR.
With recent changes, nir_opt_load_store_vectorize allows holes of up
to 28 bytes (7 components at 4 bytes each). This allows us to detect
cases where we did a convergent block load for 1 component (but loaded
a whole vec8), then another load for the next vec8, and combine them
into a single V16 load. Single component loads aren't the most common,
but convergent loads of a vec2 in one group and a vec3 in another are
quite common, and it makes no sense to do V8+V8 loads instead of V16.
For non-block loads, we allow a max hole of 4 bytes. This allows the
common case of XYZ_ + XYZ_ loads (where the last component is unread)
to combine into a single larger load.
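The two hole limits above can be illustrated with a small standalone
sketch. This is not the driver's actual vectorizer callback; the helper
names and macros are hypothetical, and only the 28-byte and 4-byte
limits come from the text.

```c
#include <assert.h>
#include <stdbool.h>

/* Hole limits from the text above; the macro names are hypothetical. */
#define MAX_HOLE_BLOCK_LOAD (7 * 4) /* 7 components at 4 bytes each */
#define MAX_HOLE_OTHER_LOAD 4       /* one unread 32-bit component */

/* Bytes left unread between the end of "low" and the start of "high". */
static unsigned
hole_size(unsigned low_offset, unsigned low_size, unsigned high_offset)
{
   assert(high_offset >= low_offset + low_size);
   return high_offset - (low_offset + low_size);
}

/* Assumed policy: accept the pair only if the hole fits the limit. */
static bool
can_combine(bool is_block_load, unsigned low_offset, unsigned low_size,
            unsigned high_offset)
{
   unsigned max_hole = is_block_load ? MAX_HOLE_BLOCK_LOAD
                                     : MAX_HOLE_OTHER_LOAD;
   return hole_size(low_offset, low_size, high_offset) <= max_hole;
}
```

With these assumptions, can_combine(true, 0, 4, 32) accepts a
single-component block load followed by the next vec8 (a 28-byte hole),
and can_combine(false, 0, 12, 16) accepts the XYZ_ + XYZ_ case (a
4-byte hole).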
fossil-db results on Lunarlake:
Totals:
Instrs: 146692608 -> 146246432 (-0.30%); split: -0.33%, +0.02%
Subgroup size: 11100528 -> 11100512 (-0.00%)
Send messages: 7003425 -> 6862529 (-2.01%); split: -2.01%, +0.00%
Cycle count: 22396273274 -> 22523048654 (+0.57%); split: -1.08%, +1.64%
Spill count: 67671 -> 67594 (-0.11%); split: -1.59%, +1.48%
Fill count: 128999 -> 130223 (+0.95%); split: -1.73%, +2.68%
Scratch Memory Size: 5986304 -> 6042624 (+0.94%); split: -1.40%, +2.34%
Max live registers: 48898858 -> 48881655 (-0.04%); split: -0.05%, +0.01%
Non SSA regs after NIR: 172397792 -> 167577380 (-2.80%); split: -2.80%, +0.00%
Totals from 451003 (80.87% of 557667) affected shaders:
Instrs: 134111754 -> 133665578 (-0.33%); split: -0.36%, +0.03%
Subgroup size: 9039104 -> 9039088 (-0.00%)
Send messages: 6127775 -> 5986879 (-2.30%); split: -2.30%, +0.00%
Cycle count: 20306336726 -> 20433112106 (+0.62%); split: -1.19%, +1.81%
Spill count: 56230 -> 56153 (-0.14%); split: -1.92%, +1.78%
Fill count: 112920 -> 114144 (+1.08%); split: -1.97%, +3.06%
Scratch Memory Size: 3769344 -> 3825664 (+1.49%); split: -2.23%, +3.72%
Max live registers: 43750259 -> 43733056 (-0.04%); split: -0.05%, +0.01%
Non SSA regs after NIR: 158449343 -> 153628931 (-3.04%); split: -3.04%, +0.00%
In particular, sends get cut by 20.85% for Borderlands 3 DX12, 13.82%
on Cyberpunk 2077, 10.75% on Strange Brigade, and 10.20% on Red Dead
Redemption 2. Yet, spill/fills remain about the same.
fossil-db results on Alchemist are similar though not quite as good.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32315>
nir_opt_load_store_vectorize checks for potential address wrapping
when vectorizing two loads ("low" and "high"). It looks for cases where
"low" might have a large address, and "high" has a positive offset
which, when added together, could trigger integer wraparound. The issue
here is that if the large address of "low" was considered out-of-bounds,
adding offset could wrap around to a small address, which might actually
be in-bounds. Thus, when loaded separately, "low" will fail and trigger
robustness out-of-bound-read behavior, but "high" would read correctly.
When vectorized, the entire load would fail. This is explicitly tested
for with 32-bit SSBO addresses in the Vulkan CTS.
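As a rough illustration of the wraparound described above, assuming
32-bit addresses: the helpers below are hypothetical and operate on
concrete values, whereas the real pass has to reason about possible
address ranges at compile time.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if low_addr + offset does not fit in 32 bits. */
static bool
add_wraps_u32(uint32_t low_addr, uint32_t offset)
{
   return low_addr > UINT32_MAX - offset;
}

/* Address that "high" actually reads; the result wraps modulo 2^32. */
static uint32_t
high_address(uint32_t low_addr, uint32_t offset)
{
   return low_addr + offset;
}
```

For example, with "low" at 0xFFFFFFF0 and an offset of 0x20, the
addition wraps and "high" lands at 0x10, a small address that could be
in-bounds even though "low" is not.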
However, anv's 64-bit global addresses and VMA handling effectively
prevent this case. Addresses 0-4095 are a reserved page so that if
people try to use 0 as a NULL pointer, it never maps to a valid BO.
That alone guarantees that the above case where "high" gets a small
address would never be in-bounds, so we don't need to check for it.
In fact, we allocate most user allocations out of high addresses,
and have specialized allocation heaps for certain types of GPU data
structures in the lower GB of memory. For a load to wrap around and
successfully land in the right heap, it would have to load gigabytes.
Disabling this allows load vectorization and overfetching in more cases.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32315>
The load_*_uniform_block_intel intrinsics always load either 8x or 16x
32-bit components worth of data (so 32 byte increments). This leads to
cases where we load a few components from one vec8, followed by a few
components of an adjacent vec8. We want to combine those into a vec16
load, as that loads a whole cacheline at a time, and requires fewer hoops
to calculate addresses and request memory loads.
So, we allow 7 * 4 = 28 bytes of holes, which handles vec8+vec8 where
only the .x component is read.
Most drivers and intrinsics will not want such large holes. I thought
about adding a per-intrinsic max_hole to the core code, but decided that
since we already have driver callbacks, we can just rely on them to
reject what makes sense to them.
No driver callbacks currently allow holes, so this should not affect any
existing drivers. But any work-in-progress branches may need to be
updated to reject larger holes.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32315>
Just calculate the block size using util_logbase2() - it's simpler.
Also drop the name "oword" as this refers to legacy HDC messages,
rather than the newer LSC "vector size" field.
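A minimal sketch of the idea, assuming block sizes are powers of two:
logbase2() below is a local stand-in for util_logbase2(), and
block_size_log2() is a hypothetical helper. How the result maps onto
the hardware message field is not shown here.

```c
#include <assert.h>

/* Local stand-in for util_logbase2(): floor(log2(n)) for n > 0. */
static unsigned
logbase2(unsigned n)
{
   unsigned log = 0;
   while (n > 1) {
      n >>= 1;
      log++;
   }
   return log;
}

/* Hypothetical helper: log2 of a power-of-two block size in components. */
static unsigned
block_size_log2(unsigned components)
{
   assert(components != 0 && (components & (components - 1)) == 0);
   return logbase2(components);
}
```

A single logarithm replaces a per-size switch statement, which is the
simplification the message refers to.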
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32315>
This will matter more with overfetching, where we may suggest loading
additional data that we don't actually need for vectorization purposes.
We want to make sure that push ranges have the data we actually need;
any extra padding is irrelevant.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32315>
i965 used to upload its own regular GL uniforms and push those in
addition to UBO ranges. st/mesa instead uploads regular uniforms
and presents those to us as UBO 0. So this really isn't a thing
anymore.
nir_intrinsic_load_uniform is still used today but it represents
Vulkan push constants. anv_nir_compute_push_layout already takes
care of ensuring too many ranges aren't present, so it doesn't need
the pass to do so. iris doesn't use this intrinsic at all.
We can also drop the compute shader check, because neither iris nor
anv use UBO push analysis for compute shaders - except for anv's
internal kernels, which already have well specified push layouts.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32315>
When this pass is used with Zink, gl_PrimitiveID needs to be passed
through; however, this is unnecessary for other drivers.
Analogous to the previous commit.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Fixes: d0342e28b3 ("nir: Add helper to create passthrough GS shader")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32397>
Zink emulates quads with a GS, which imposes requirements for gl_PrimitiveID.
Handle them here. Previously Zink went out of spec.
Fixes spec@glsl-1.50@execution@primitive-id-no-gs-quads and
spec@glsl-1.50@execution@primitive-id-no-gs-quad-strip.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Antonino Maniscalco <antomani103@gmail.com>
Fixes: e2220ee55e ("zink: filled quad emulation gs generation function")
Closes: #12214
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32397>
DXIL requires that each I/O variable has a unique semantic name, but when
dealing with semantics that take up multiple slots, that variable implicitly
takes up multiple names. So when assigning driver_location, we need to do
the same.
That means also updating outputs and patch constants to have a mapping from
driver_location to a compacted index, since the metadata arrays *can't* have
holes.
This would be simpler if we could hang it off the nir_variable but there
aren't really any free fields to be able to do that. We only need this compacted
mapping inside the DXIL backend anyway so we can just store the array in the
module.
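The relationship between the two numberings can be sketched as follows.
This is an illustration only, not the DXIL backend's code: struct
io_var, both functions, and the constants are hypothetical, and each
variable is assumed to advance driver_location by the number of slots
it occupies.

```c
#include <assert.h>

#define MAX_LOCATIONS 64
#define NO_INDEX (-1)

/* Hypothetical stand-in for the per-variable data that matters here. */
struct io_var {
   unsigned num_slots;       /* semantic names this variable consumes */
   unsigned driver_location; /* filled by assign_driver_locations() */
};

/* Each variable advances the location by the slots it occupies, so
 * multi-slot variables leave gaps in the driver_location sequence.
 * Returns the total number of locations consumed. */
static unsigned
assign_driver_locations(struct io_var *vars, unsigned num_vars)
{
   unsigned next = 0;
   for (unsigned i = 0; i < num_vars; i++) {
      vars[i].driver_location = next;
      next += vars[i].num_slots;
   }
   return next;
}

/* Map each used driver_location to a dense index with no holes. */
static void
build_compact_map(const struct io_var *vars, unsigned num_vars,
                  int map[MAX_LOCATIONS])
{
   for (unsigned i = 0; i < MAX_LOCATIONS; i++)
      map[i] = NO_INDEX;
   for (unsigned i = 0; i < num_vars; i++) {
      assert(vars[i].driver_location < MAX_LOCATIONS);
      map[vars[i].driver_location] = (int)i;
   }
}
```

With variables occupying 1, 4, and 1 slots, the locations become 0, 1,
and 5, while the compacted indices stay 0, 1, and 2.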
Tested-by: Benjamin Otte <otte@gnome.org>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12128
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32047>
Vulkan 1.4 can only be exposed on a7xx devices due to a number of bumps
in the required limits, including bumping maxDescriptorSets to 7. a7xx
bumped the number of bindless bases from 5 to 8, with one reserved for
the driver.
I've followed what we've already done and exposed a conformanceVersion
of 1.4.0.0 for all a7xx devices, even though I've only submitted
conformance for X1-85. I'm not sure if we want to change this, but at
least for now a618 on Chromebooks and X1-85 on laptops are the only
cases where turnip is being "shipped" to users in some official
capacity, so it shouldn't be a huge deal.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32437>
Vulkan runtime doesn't layer vkGetImageMemoryRequirements2
on top of vkGetDeviceImageMemoryRequirements, as that would
require initializing a full image, which is expensive on
certain drivers such as NVK, so it's up to us to implement
both functions.
In our implementation of vkGetDeviceImageMemoryRequirements,
we initialize a slimmed down image and then forward everything
to vkGetImageMemoryRequirements2.
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32361>
This factors out the initialization of panvk_image, so we can reuse the
logic for computing requirements without creating an actual VkImage
object first.
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32361>
This format was already supported on Bifrost as a single
plane format. Valhall doesn't support this interleaved D32_S8,
so we add support for multiplanar D32_S8 and move Bifrost to
this layout too, as it's more memory efficient than the
interleaved layout.
Signed-off-by: Rebecca Mckeever <rebecca.mckeever@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32275>
Pass pan_image_section_info around instead of passing each field
of the struct separately.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32275>
The layout can be extracted from the iview and plane_index arguments.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32275>
Index is vague as it could refer to the array index too. Let's clarify
the situation by renaming the argument plane_index.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32275>
The format is never adjusted, and can thus be extracted from the view.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32275>
This allows us to properly split the multiplanar and single plane cases
in panfrost_emit_surface(), which makes the code easier to follow.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32275>
The depth and stencil planes might be different. Let's add a specific
helper to retrieve the stencil plane. We keep using
pan_image_view_get_zs_plane() for the depth plane, because it's
guaranteed to always be on the first plane.
Signed-off-by: Rebecca Mckeever <rebecca.mckeever@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32275>
Pass an image to pan_force_clean_write_rt() so we can easily
support the multiplanar depth-stencil case, and rename the
function to pan_force_clean_write_on() to avoid confusion.
Signed-off-by: Rebecca Mckeever <rebecca.mckeever@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32275>
We are about to add multiplanar depth/stencil support. A stencil-only
view of a multiplanar d32_s8 format will have a NULL depth plane
(plane0), so we need to prepare the texture logic to deal with that.
Signed-off-by: Rebecca Mckeever <rebecca.mckeever@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32275>
pan_resource_modifier_convert can use a blit to convert images
from AFBC. If we call this from panfrost_set_shader_images then
we end up crashing due to using an inconsistent set of images.
Fix this by doing the AFBC/AFRC conversion before the image
bindings.
This fixes a crash in piglit oes_egl_image_external_essl3 tests.
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30243>