The changes to try_copy_propagate will be removed later in the series.
v2: Fix up some comments to note that offset != 0 is allowed only when
stride == 0. Apply same offset=0 restriction in try_copy_propagate_def
too. Allow copy propagation if the source is either a def or
UNIFORM. Don't copy prop a load_reg through a non-def value.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31497>
load_reg is something like load_payload except it has a single
source. It copies the entire source to the destination. Its purpose is
to convert a non-SSA VGRF into an SSA value. This copy is marked as
volatile so that it will act as a scheduling barrier.
v2: Fix some typos in the commit message. Eliminate the
brw_builder::LOAD_REG overload that returns a brw_inst*. This is
unlikely to ever be used. Add some checks to brw_validate. All
suggested by Caio.
v3: Force the source and destination types of the LOAD_REG to by
integer. This will (eventually) simplify the creating of unit tests for
the pass that adds LOAD_REG instructions.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31497>
v2: Move to immediately before the main optimization loop. Most
importantly, this is after the first call to DCE.
fossil-db:
All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Non SSA regs after NIR: 237045283 -> 100183460 (-57.74%); split: -58.12%, +0.39%
Totals from 701423 (99.26% of 706657) affected shaders:
Non SSA regs after NIR: 236868848 -> 100007025 (-57.78%); split: -58.17%, +0.39%
Suggested-by: Ken
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31497>
device-select-layer needs to obtain the display server's preferred
display device, and has so far relied on wl_drm for this. wl_drm is
superseded by linux-dmabuf with some Wayland servers having dropped
support for wl_drm entirely.
Implement linux-dmabuf as preferred mechanism for obtaining the main
device, with wl_drm support retained as a fallback for now.
Signed-off-by: Kenny Levinsen <kl@kl.wtf>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34219>
Starting with sRGB meant we would refcount to -1 if an application
chooses PASS_THROUGH. Instead, just initialize with PASS_THROUGH so the
initial refcount of 0 reflects reality.
Previously, we would segfault if an application chose PASS_THROUGH at
swapchain initialization then switched to a color managed colorspace
later in the runtime, because we would increment refcount from -1 -> 0
and this would result in not creating a new color managed surface.
Fixes: 789507c99c ("vulkan/wsi: implement the Wayland color management protocol")
Signed-off-by: llyyr <llyyr.public@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34353>
Calling dri2_x11_setup_swap_interval with swap_available = false
sets the min/max/default swap interval values to zero.
EGL_MIN/MAX_SWAP_INTERVAL is always reported as 0 and the interval
value set by eglSwapInterval gets clamped to 0.
Set swap_available to true before calling dri2_x11_setup_swap_interval,
as was done before.
Fixes: c00701c83a ("egl/x11: unify swrast/kopper/dri3 paths a bit")
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34235>
This field was never used for determining the number of outputs,
just for determining whether streamout was enabled, which makes
it unnecessary. We can use enabled_stream_buffers_mask for that.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34317>
Shaders may have undefined output stores after nir_opt_varyings.
These must be optimized out, otherwise they hit an assertion.
Fixes: 17f6ab28cc
Cc: mesa-stable
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34317>
We need to enable these buffers regardless of whether or not the
shader actually writes any outputs to them, otherwise we break
XFB queries.
Cc: mesa-stable
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34317>
We need to remember which streamout buffers and streams were enabled,
even if the shader doesn't actually write any outputs to them,
because the API requires that we count vertices created by this shader
towards queries against those streams.
That information can be gathered by nir_gather_xfb_info_with_varyings
from the original NIR I/O variables that we get from the frontend,
but it isn't included in any intrinsics so would be otherwise lost here.
Cc: mesa-stable
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34317>
On i686, where VK_USE_64_BIT_PTR_DEFINES is unset and Vulkan handles are
represented as 64-bit integers instead, the code used the wrong format
specifier, causing a build error.
Fixes: 7fb31361f4 ("Handle external fences in vkGetFenceStatus()")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34124>
Per spec, logic operations between fragment values and color attachments
should be disabled when attachments are using float or sRGB formats.
Regardless of attachment's format, enabled logic operations should keep
blending disabled.
Fixes: dEQP-VK.pipeline.*.logic_op_na_formats.*
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34212>
Context flushes can be caused by all kinds of operations that aren't
obvious to a GL API user. As those are quite heavy-weight operations
it is nice to have some insight into how many of those are happening
per frame. Add a sw query to make this information easily accessible.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34350>
For the pReferenceSlots.slotIndex, the max
value should the maxDpbSlots which is
h264: 16 + 1
h265 : 15 + 2
av1: 7+2
Fixing SVA_CL1_E test vector in JVT-AVC_V1
fluster test suite.
Reviewed-by: David Rosca <david.rosca@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33094>
Since shared RA happens after creating merge sets, newly inserted
splits/collects did not have merge sets created for them. Fix this by
creating merge sets for new instructions after shared RA.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33319>
Shared RA might insert new defs to be handled by regular RA (e.g.,
shared spills). However, their interval offsets were not initialized
which caused their intervals to sometimes be mistakenly matched with
those containing offset 0. Fix this by calling index_merge_sets after
shared RA and modifying that function to only index new defs in that
case.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Fixes: fa22b0901a ("ir3/ra: Add specialized shared register RA/spilling")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33319>
This command isn't supposed to be affected by conditional rendering.
This fixes new VKCTS coverage
dEQP-VK.conditional_rendering.conditional_ignore.resolve_image*.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34338>
shpe is a bit of a special instruction: it's not really a terminator
(i.e., it does not perform a jump) but it does have to stay at the end
of its block. Up to now, we tried to enforce this by creating const
write barriers on shpe; the assumption being that everything that
happens in the preamble ends in a write to the const file so shpe stays
at the end. Alas, it turns out this is not true: things like sampler
prefetches do not write the const file and nothing was preventing those
from being scheduled after shpe.
Instead of trying to create even more barrier dependencies, fix this by
making shpe a terminator. Both sched and postsched treat terminators
specially to make sure they always stay at the end of their block.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34290>
Shader may have zero instructions and no prefetches but have inputs
that without modifications are used as output.
Fixed vkd3d test:
test_depth_bias_behaviour
Fixes: b0a98d3b13
("ir3: Detect empty fragment shaders")
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34348>
In order to implement FDM offset, we will have to offset the viewport
and scissor in the binning pass. In order to do this, we have to pass a
bin with nonsensical negative offsets to the patchpoint function, which
would result in asserts when patching the load/store sequences. But we
don't really need to patch these anyways as they are unused during
binning, so add the ability to skip them when binning. FS params and
some implementations of CmdClearAttachments (that don't contribute to
visibility) can similarly be skipped.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33500>
For FDM offset, we will need to expand the number of bins by 1, which
can change how pipes are allocated. We don't necessarily know whether
FDM offset will be used when creating the VkFramebuffer, so we'll have
to create two different configs when FDM is enabled. Split out the parts
that are affected by the number of bins into a separate "VSC config"
struct that will be duplicated with FDM offset.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33500>
nir_opt_vectorize could replace swizzled movs with vectorized movs in a
different block. If this happens with swizzled movs in a then block, it
could leave this block empty. ir3 assumes only the else block can be
empty (e.g., when lowering predicates) so make sure ifs are in that
canonical form again.
This fixes empty predication blocks in some shaders, for example:
predt
predf
...
prede
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34272>
At least on all a6xx/a7xx, mad.f32 and mad.f16 are not fused. This means
that when the sources of a NIR ffma are all uniform we can split it in
two to execute it on the scalar ALU. This is important to reduce
register pressure and make more preambles executed early.
On fossil-db the statistics are mostly a wash as expected, but with
early preambles increasing dramatically:
Totals:
MaxWaves: 2249180 -> 2249230 (+0.00%); split: +0.01%, -0.01%
Instrs: 49668884 -> 49662951 (-0.01%); split: -0.12%, +0.11%
CodeSize: 103662656 -> 103831154 (+0.16%); split: -0.22%, +0.38%
NOPs: 8502571 -> 8495568 (-0.08%); split: -0.61%, +0.53%
MOVs: 1554442 -> 1538804 (-1.01%); split: -2.01%, +1.01%
Full: 1820906 -> 1814292 (-0.36%); split: -0.39%, +0.03%
(ss): 1168628 -> 1165868 (-0.24%); split: -1.01%, +0.78%
(sy): 616751 -> 616521 (-0.04%); split: -0.52%, +0.49%
(ss)-stall: 4384397 -> 4361662 (-0.52%); split: -1.44%, +0.93%
(sy)-stall: 17850227 -> 17858949 (+0.05%); split: -0.58%, +0.63%
Early-preamble: 102262 -> 115702 (+13.14%)
Cat0: 9375820 -> 9367978 (-0.08%); split: -0.57%, +0.48%
Cat1: 2470212 -> 2454318 (-0.64%); split: -1.28%, +0.64%
Cat2: 18673655 -> 18707106 (+0.18%)
Cat3: 14227810 -> 14211106 (-0.12%)
Cat5: 1424184 -> 1424150 (-0.00%)
Cat7: 1404718 -> 1405808 (+0.08%); split: -0.39%, +0.47%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34115>
wsi_configure_image() with the same info is already called by
configure_image() in wsi_swapchain_init(), so this second call is
unnecessary. Furthermore, calling it the second time caused a memory
leak of queue family indices array.
Fixes: d4a2c0fc ("vulkan/wsi: add a headless swapchain implementation/option")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12811
Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34194>
This was handled in other instances in a previous patch, but this
instance remains, as the zlib decompression routine is slightly
different.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34118>
There are three copies of this function, all of them have the same
memory leak in them. Instead of fixing them one by one, just use a
common implementation for all three, since they already all have a
shared helper lib.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34118>
With "classic" renderpasses, the VkFramebuffer's layerCount must be 1 if
multiview is enabled. We accidentally rely on this to not disable GMEM
for multiview, and possibly for other things too. Apparently the dynamic
rendering equivalent, VkRenderingInfo::layerCount, can be anything when
multiview is enabled, and some CTS tests set it to the number of views.
Sanitize it when constructing the internal framebuffer for dynamic
rendering.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34080>
We were a bit too conservative and fully disabled LRZ for when stencil
or blending were involved. There is no need to fully disable LRZ
in those cases, only LRZ writes should be disabled.
The final rules are:
LRZ is DISABLED until depth attachment is cleared when:
- Depth Write + changing direction of depth test
e.g. from OP_GREATER to OP_LESS;
- Depth Write + OP_ALWAYS or OP_NOT_EQUAL;
- Clearing depth with vkCmdClearAttachments;
- Depth image is a target of blit commands.
- (pre-a650) Not clearing depth attachment with LOAD_OP_CLEAR;
- (pre-a650) Using secondary command buffers;
LRZ WRITE is DISABLED until depth attachment is cleared when:
- Depth Write + blending (color blend, logic ops, partial color mask, etc.);
- Fragment may be killed by stencil;
LRZ is disabled for CURRENT draw when:
- Fragment shader side-effects (writing to SSBOs, atomic operations, etc);
- Fragment shader writes depth or stencil;
LRZ WRITE is DISABLED (via LATE_Z) for CURRENT draw when:
- Fragment may be via killed alpha-to-coverage, discard, sample coverage;
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33851>
When creating the image view in the texel buffer shader copy function,
take in account the region to copy can start in a different Z-offset
than 0.
This fixes several dEQP-VK.image.concurrent_copy.* failing tests.
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34112>
As we will be creating an image view that covers the region to copy,
batch all the regions that share the same depth offset and depth extent.
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34112>
Source format is not involved at all on creating the blit render pass,
so remove from the function call.
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34112>
Remapping was missing for format description which made these formats
effectively unsupported as zero format features were reported.
Fixes: 0098f8ef35 ("radv: Remap 10 and 12 bit formats to 16 bit formats")
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34274>