8-bit bit_count cannot simply use the masked result of a 16-bit
bit_count. Make sure it is properly lowered to a 16-bit bit_count.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Fixes: 8aa2cad5df ("ir3: lower relevant 8-bit ALU ops in nir_lower_bit_size")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37116>
(cherry picked from commit 603d6fe240)
This fixes graphics artifacts happening with particular shader.
This 'heuristic' hits few very similar shaders but should provide better
performance than current fix to turn off caching from all shaders.
Cc: mesa-stable
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35929>
(cherry picked from commit 4035520ca9)
The highest possible values that can be represented with
16/12/10 bits are 65535/4095/1023, not 65536/4096/1024.
In order to ensure 1023 maps to 65535 in the Sx10 case
we thus need to multiply by 65535 / 1023 ~= 64.06158
instead of 64.
Fixes: a166d7609f ("gles: Add support for 10/12/16 bit SW decoder YCbCr formats")
Suggested-by: Benjamin Otte <otte@redhat.com>
Signed-off-by: Robert Mader <robert.mader@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37077>
(cherry picked from commit 1772380307)
On Android, Vulkan loader implements KHR_swapchain and owns both surface
and swapchain handles. On non-Android, common wsi implements the same and
owns the same. So for both cases, the drivers are unable to handle
vkGet/SetPrivateData call on either a surface or a swapchain.
Inspired by https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37043
Cc: mesa-stable
Acked-by: Yonggang Luo <luoyonggang@gmail.com>
Acked-by: Ryan Zhang <ryan.zhang@nxp.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37064>
(cherry picked from commit 6e1c2e4d83)
Apply any outstanding accumulated PC bits before we proceed on building
Acceleration Structure.
2 reasons for this :
- some of the data accessed by the build might need to be flushed
as a result of a previous barrier
- the scratch buffer might get reused between builds
Cc: mesa-stable
Closes: #13711
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Tested-by: Caleb Callaway <caleb.callaway@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36951>
(cherry picked from commit 90daa80d1d)
If implementation does not actually replay the VA, it must return 0
to not violate:
"If the memory object was allocated with a non-zero value of
opaqueCaptureAddress, the return value must be the same address."
Fixes RenderDoc capture replay, which asserts on the this spec rule
being followed.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Fixes: 5bc8284816 ("hk: add Vulkan driver for Apple GPUs")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37090>
(cherry picked from commit b7a0f0215f)
Missing 32-bit entry point in GLSL
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Fixes: 2ce20170 ("mesa: Add support for GL_EXT_shader_clock")
Signed-off-by: Ashley Smith <ashley.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36041>
(cherry picked from commit d9b388af27)
The original lower_phis_to_regs_block() is a little too clever. It
crawls up the predecessor tree until it finds a cross edge and places
the register writes as deep as it can. This breaks nak_nir_lower_cf().
Say you have a shader like...
con %0 = load_uniform()
con loop {
if div {
} else {
}
break;
}
con %1 = phi %0
The original lower_phis_to_regs_block() will turn it into
con %0 = load_uniform()
con %r = decl_reg();
con loop {
if div {
reg_store(%r, %0)
} else {
reg_store(%r, %0)
}
break;
}
con %1 = reg_load(%r)
We then convert it into unstructured control-flow and run regs_to_ssa()
to get our phis back, which lowers each of the registers we inserted to
a phi tree. When we try to recover divergence information on phis by
looking at their sources, this works fine if each source maps directly
to a reg_store() whic maps directly to a phi in the original IR.
However, because the reg_store() instructions are placed deeper, it may
introduce false divergence.
Switch to the simple version of nir_lower_phis_to_regs_block() which
places reg writes directly in phi predecessor blocks. We could probably
be more conservative and just avoid placing writes to uniform regs in
divergent control-flow but it's more robust to make the load/store_reg
intrinsics match the original phis directly.
This fixes some shaders in Horizon: Zero Dawn Remastered
Fixes: b013d54e4f ("nak/lower_cf: Flag phis as convergent when possible")
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36914>
(cherry picked from commit c6e831ac44)
Right now it tries to place reg_write instructions as far up the
predecessor chain as possible. This is useful for a bunch of the passes
that call it since it ensures they don't get placed in dead blocks or in
single successors and things like that. But it screws up NAK's control
flow lowering so we need the option to turn it off and make the pass
place the reg_write instructions in the most obvious place possible.
Fixes: b013d54e4f ("nak/lower_cf: Flag phis as convergent when possible")
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36914>
(cherry picked from commit 26e32417b9)
This makes lavapipe act like other DRM drivers whenever we have udmabuf
and just make everything a dma-buf even if it doesn't strictly have to
be. Without this we can end up in weird cases if the client asks to
allocate a memory object with multiple export types. Before, if this
happened, we would allocate a memfd and then return that when the client
calls GetMemoryFd() even if they asked for a dma-buf. In theory, we
could add additional plumbing to allow for using the memfd itself for
OPAQUE_FD and only wrap in a udmabuf if DMA_BUF is requested but this is
simpler and more in line with what hardware DRM drivers do.
Fixes: c1657de63c ("lavapipe: support VK_EXTERNAL_MEMORY_HANDLE_TYPE_DMA_BUF_BIT_EXT")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13798
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37067>
(cherry picked from commit 31f0d0732e)
If implementation does not actually replay the VA, it must return 0
to not violate:
"If the memory object was allocated with a non-zero value of
opaqueCaptureAddress, the return value must be the same address."
Fixes RenderDoc capture replay, which asserts on the this spec rule
being followed.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Fixes: ed6d5c33 ("nvk: Implement VK_EXT/KHR_buffer_device_address")
Reviewed-by: Mohamed Ahmed <mohamedahmedegypt2001@gmail.com>
Closes#13784
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37047>
(cherry picked from commit 6fbe2be7a7)
Fix the following warning, and others that are similar, as rustc
suggests:
warning: hiding a lifetime that's elided elsewhere is confusing
--> ../src/gallium/frontends/rusticl/mesa/compiler/nir.rs:282:22
|
282 | pub fn variables(&mut self) -> ExecListIter<nir_variable> {
| ^^^^^^^^^ -------------------------- the same lifetime is hidden here
| |
| the lifetime is elided here
= help: the same lifetime is referred to in inconsistent ways, making the signature confusing
= note: `#[warn(mismatched_lifetime_syntaxes)]` on by default
help: use `'_` for type paths
|
282 | pub fn variables(&mut self) -> ExecListIter<'_, nir_variable> {
| +++
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37059>
(cherry picked from commit 0e6b24451d)
The problem with the current code is that there is a disconnect between :
- the virtual register size allocated
- the dispatch size
- the size_written value
Only the last 2 are in sync and this confuses the spiller that only
looks at the destination register allocation & dispatch size to figure
out how much to spill.
The solution in this change is to make BROADCAST more like
MOV_INDIRECT, so that you can do a BROADCAST(8) that actually reads a
SIMD32 register. We put the size of the register read into src2.
Now the spiller sees correct read/write sizes just looking at the
destination register & dispatch size.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 662339a2ff ("brw/build: Use SIMD8 temporaries in emit_uniformize")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13614
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36564>
(cherry picked from commit 93996c07e2)
Initialization of driver has been moved to register_data_source() from
OnSetup() by: a739889789 ("pps: Report available counters when
gpu.counters* data source is registered")
With above change, pps will destroy driver when collecting data stops
(pps may keep running) then the driver will become nullptr when user try
to collect data again. This will cause segmentation fault in OnSetup().
So this remove driver = nullptr in OnStop() and init driver in OnSetup()
to make sure driver exists when pps-producer run more than once.
Fixes: a739889789 ("pps: Report available counters when gpu.counters*
data source is registered")
Signed-off-by: Julia Zhang <Julia.Zhang@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36548>
(cherry picked from commit 20b809f1f0)
The previous algorithm just looked at the dominator's loop header.
However, if you have multiple consecutive loops like:
function_impl {
loop {
// Stuff
}
loop {
// Other stuff
}
}
then it will look like the second loop is contained in the first loop
because the first loop's header dominates the second loop. This isn't
actually what we want. Instead, we want a node N to be considered part
of a loop with header H if H dominates N and H is reachable from N.
Fixes: 741f7067f1 ("nak: Add loop detection to the CFG")
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36524>
(cherry picked from commit a1d5e8bfdb)
Also add a #define NAK_QMD_ALIGN_B for alignments. The alignment is
always 256B and there's no evidence of that changing.
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Backport-to: 25.2
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36995>
(cherry picked from commit 02ef6a5158)
It turns out that the `occlusion_query.syncobj` is used to set
state that later code relies on, and setting it to NULL causes
some Vulkan CTS tests to fail. Instead, we should explicitly check
for the mode being `MALI_OCCLUSION_MODE_DISABLED` to avoid using
an invalid `ptr` field.
Fixes: 24c692c981 ("panvk: fix a NULL pointer dereference in occlusion queries")
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36794>
(cherry picked from commit 9b4eb81162)
When using formats with less than 32-bits per pixel, we pad the
tile-buffer to a multiple of 32-bits so we can store additional bits
used by dithering.
Account for this when computing the max MSAA setting.
Fixes: 329568b5eb ("panfrost: add color-attachment and msaa helpers")
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35755>
(cherry picked from commit 6a83193771)
Right now, we store the last VkDeviceMemory object bound to an image in
panvk_image::mem, but this doesn't work for disjoint images where the mem
object can differ on each plane.
Move panvk_image::mem to panvk_image_plane::mem and prefix the offset
field with mem_, so it's clear the offset refers to the mem object.
Note this should only fix host copy on disjoint images, since the GPU
address was already properly set at bind time.
Fixes: 1cd61ee948 ("panvk: implement VK_EXT_host_image_copy for linear color images")
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Olivia Lee <olivia.lee@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36926>
(cherry picked from commit d0126f5ced)
The actual number of samples chosen is allowed to equal the number
requested, but currently we just check for sample counts greater
than the request.
Fixes: 894b37e060 ("mesa: fix sample count handling for MSRTT")
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36878>
(cherry picked from commit 095d1b6bcc)
Those are for push constants, no point in doing that because :
- there is no HW constant offsets in push constants (payload
delivery), it's just register offset calculation
- if we have an dynamic value it's already using MOV_INDIRECT
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: e103afe7be ("brw: run the nir_opt_offsets pass and set the maximum offset size")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36958>
(cherry picked from commit 27c69acb6a)
It's possible with GPL.
This fixes a NULL pointer dereference with updated pipeline binaries
tests in VKCTS.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36999>
(cherry picked from commit 944e26eae7)
FdMemoryDataSource was being registered as a Perfetto data source
unconditionally which led to anything calling fd_device_new(...)
attempting to do this even when they might not have Perfetto
initialized which is done as a part of util_perfetto_init, without
which trying to register the event causes a SEGFAULT.
Fixes: c7045e3e63 ("perfetto: unify init")
Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36993>
(cherry picked from commit 098521559d)