The non-compute end flag should be INTEL_DS_TRACEPOINT_FLAG_END_OF_PIPE.
This fixes the broken anv utrace for anything non-compute that can
potentially overlap (execute in parallel).
Fixes: 6281b207db ("anv: add tracepoints timestamp mode for empty dispatches")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37155>
(cherry picked from commit c0e51bcf24)
ENABLE_DRM_AMDGPU must be defined when amdgpu_virtio is enabled;
otherwise, vdrm and amdgpu_virtio will have different definitions of
struct virgl_renderer_capset_drm. As a result, on amdgpu_virtio side,
the content of struct vdrm_device will be corrupted.
Thanks Honglei Huang <honglei1.huang@amd.com> for pointing out the
different definitions of struct virgl_renderer_capset_drm.
Cc: mesa-stable
Signed-off-by: Trigger Huang <Trigger.Huang@amd.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37023>
(cherry picked from commit 5736280730)
Pre-rasterization stages need a CS stall if they need to wait on the
flushes from a PIPE_CONTROL.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37132>
(cherry picked from commit f262865a90)
musl removed the LFS64 APIs like mmap64(), which were intended to be a
transitional measure multiple decades ago, causing a build failure
here. Since virtio-gpu sizes and offsets are 64-bit, we do still want
to make sure that we're using 64-bit mmap here, so I've added
-D_FILE_OFFSET_BITS=64, which will ensure that off_t is always 64-bit
in gfxstream guest, and which is generally the modern solution here.
With this change, I am able to build gfxstream with musl.
Fixes: fec8e296a3 ("Make VirtGpu* interfaces")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37086>
(cherry picked from commit 6f8cdd8a3c)
With the existing union setup, only the first 8 bytes are initialized
properly for UBOs, yet the UBO size is 16, and all 16 bytes are copied
to applications. This leads to broken capture-replay since the
descriptor payload is no longer invariant.
Fix this by ensuring all union members are 16 bytes, which then get
properly initialized with the designated initializers.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Fixes: 8b5835af31 ("nvk: Use bindless cbufs on Turing+")
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37053>
(cherry picked from commit f28f72a5a2)
8-bit bit_count cannot simply use the masked result of a 16-bit
bit_count. Make sure it is properly lowered to a 16-bit bit_count.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Fixes: 8aa2cad5df ("ir3: lower relevant 8-bit ALU ops in nir_lower_bit_size")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37116>
(cherry picked from commit 603d6fe240)
This fixes graphics artifacts happening with particular shader.
This 'heuristic' hits few very similar shaders but should provide better
performance than current fix to turn off caching from all shaders.
Cc: mesa-stable
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35929>
(cherry picked from commit 4035520ca9)
The highest possible values that can be represented with
16/12/10 bits are 65535/4095/1023, not 65536/4096/1024.
In order to ensure 1023 maps to 65535 in the Sx10 case
we thus need to multiply by 65535 / 1023 ~= 64.06158
instead of 64.
Fixes: a166d7609f ("gles: Add support for 10/12/16 bit SW decoder YCbCr formats")
Suggested-by: Benjamin Otte <otte@redhat.com>
Signed-off-by: Robert Mader <robert.mader@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37077>
(cherry picked from commit 1772380307)
On Android, Vulkan loader implements KHR_swapchain and owns both surface
and swapchain handles. On non-Android, common wsi implements the same and
owns the same. So for both cases, the drivers are unable to handle
vkGet/SetPrivateData call on either a surface or a swapchain.
Inspired by https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37043
Cc: mesa-stable
Acked-by: Yonggang Luo <luoyonggang@gmail.com>
Acked-by: Ryan Zhang <ryan.zhang@nxp.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37064>
(cherry picked from commit 6e1c2e4d83)
Apply any outstanding accumulated PC bits before we proceed on building
Acceleration Structure.
2 reasons for this :
- some of the data accessed by the build might need to be flushed
as a result of a previous barrier
- the scratch buffer might get reused between builds
Cc: mesa-stable
Closes: #13711
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Tested-by: Caleb Callaway <caleb.callaway@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36951>
(cherry picked from commit 90daa80d1d)
If implementation does not actually replay the VA, it must return 0
to not violate:
"If the memory object was allocated with a non-zero value of
opaqueCaptureAddress, the return value must be the same address."
Fixes RenderDoc capture replay, which asserts on the this spec rule
being followed.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Fixes: 5bc8284816 ("hk: add Vulkan driver for Apple GPUs")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37090>
(cherry picked from commit b7a0f0215f)
Missing 32-bit entry point in GLSL
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Fixes: 2ce20170 ("mesa: Add support for GL_EXT_shader_clock")
Signed-off-by: Ashley Smith <ashley.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36041>
(cherry picked from commit d9b388af27)
The original lower_phis_to_regs_block() is a little too clever. It
crawls up the predecessor tree until it finds a cross edge and places
the register writes as deep as it can. This breaks nak_nir_lower_cf().
Say you have a shader like...
con %0 = load_uniform()
con loop {
if div {
} else {
}
break;
}
con %1 = phi %0
The original lower_phis_to_regs_block() will turn it into
con %0 = load_uniform()
con %r = decl_reg();
con loop {
if div {
reg_store(%r, %0)
} else {
reg_store(%r, %0)
}
break;
}
con %1 = reg_load(%r)
We then convert it into unstructured control-flow and run regs_to_ssa()
to get our phis back, which lowers each of the registers we inserted to
a phi tree. When we try to recover divergence information on phis by
looking at their sources, this works fine if each source maps directly
to a reg_store() whic maps directly to a phi in the original IR.
However, because the reg_store() instructions are placed deeper, it may
introduce false divergence.
Switch to the simple version of nir_lower_phis_to_regs_block() which
places reg writes directly in phi predecessor blocks. We could probably
be more conservative and just avoid placing writes to uniform regs in
divergent control-flow but it's more robust to make the load/store_reg
intrinsics match the original phis directly.
This fixes some shaders in Horizon: Zero Dawn Remastered
Fixes: b013d54e4f ("nak/lower_cf: Flag phis as convergent when possible")
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36914>
(cherry picked from commit c6e831ac44)
Right now it tries to place reg_write instructions as far up the
predecessor chain as possible. This is useful for a bunch of the passes
that call it since it ensures they don't get placed in dead blocks or in
single successors and things like that. But it screws up NAK's control
flow lowering so we need the option to turn it off and make the pass
place the reg_write instructions in the most obvious place possible.
Fixes: b013d54e4f ("nak/lower_cf: Flag phis as convergent when possible")
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36914>
(cherry picked from commit 26e32417b9)
This makes lavapipe act like other DRM drivers whenever we have udmabuf
and just make everything a dma-buf even if it doesn't strictly have to
be. Without this we can end up in weird cases if the client asks to
allocate a memory object with multiple export types. Before, if this
happened, we would allocate a memfd and then return that when the client
calls GetMemoryFd() even if they asked for a dma-buf. In theory, we
could add additional plumbing to allow for using the memfd itself for
OPAQUE_FD and only wrap in a udmabuf if DMA_BUF is requested but this is
simpler and more in line with what hardware DRM drivers do.
Fixes: c1657de63c ("lavapipe: support VK_EXTERNAL_MEMORY_HANDLE_TYPE_DMA_BUF_BIT_EXT")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13798
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37067>
(cherry picked from commit 31f0d0732e)
If implementation does not actually replay the VA, it must return 0
to not violate:
"If the memory object was allocated with a non-zero value of
opaqueCaptureAddress, the return value must be the same address."
Fixes RenderDoc capture replay, which asserts on the this spec rule
being followed.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Fixes: ed6d5c33 ("nvk: Implement VK_EXT/KHR_buffer_device_address")
Reviewed-by: Mohamed Ahmed <mohamedahmedegypt2001@gmail.com>
Closes#13784
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37047>
(cherry picked from commit 6fbe2be7a7)
Fix the following warning, and others that are similar, as rustc
suggests:
warning: hiding a lifetime that's elided elsewhere is confusing
--> ../src/gallium/frontends/rusticl/mesa/compiler/nir.rs:282:22
|
282 | pub fn variables(&mut self) -> ExecListIter<nir_variable> {
| ^^^^^^^^^ -------------------------- the same lifetime is hidden here
| |
| the lifetime is elided here
= help: the same lifetime is referred to in inconsistent ways, making the signature confusing
= note: `#[warn(mismatched_lifetime_syntaxes)]` on by default
help: use `'_` for type paths
|
282 | pub fn variables(&mut self) -> ExecListIter<'_, nir_variable> {
| +++
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37059>
(cherry picked from commit 0e6b24451d)
The problem with the current code is that there is a disconnect between :
- the virtual register size allocated
- the dispatch size
- the size_written value
Only the last 2 are in sync and this confuses the spiller that only
looks at the destination register allocation & dispatch size to figure
out how much to spill.
The solution in this change is to make BROADCAST more like
MOV_INDIRECT, so that you can do a BROADCAST(8) that actually reads a
SIMD32 register. We put the size of the register read into src2.
Now the spiller sees correct read/write sizes just looking at the
destination register & dispatch size.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 662339a2ff ("brw/build: Use SIMD8 temporaries in emit_uniformize")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13614
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36564>
(cherry picked from commit 93996c07e2)
Initialization of driver has been moved to register_data_source() from
OnSetup() by: a739889789 ("pps: Report available counters when
gpu.counters* data source is registered")
With above change, pps will destroy driver when collecting data stops
(pps may keep running) then the driver will become nullptr when user try
to collect data again. This will cause segmentation fault in OnSetup().
So this remove driver = nullptr in OnStop() and init driver in OnSetup()
to make sure driver exists when pps-producer run more than once.
Fixes: a739889789 ("pps: Report available counters when gpu.counters*
data source is registered")
Signed-off-by: Julia Zhang <Julia.Zhang@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36548>
(cherry picked from commit 20b809f1f0)
The previous algorithm just looked at the dominator's loop header.
However, if you have multiple consecutive loops like:
function_impl {
loop {
// Stuff
}
loop {
// Other stuff
}
}
then it will look like the second loop is contained in the first loop
because the first loop's header dominates the second loop. This isn't
actually what we want. Instead, we want a node N to be considered part
of a loop with header H if H dominates N and H is reachable from N.
Fixes: 741f7067f1 ("nak: Add loop detection to the CFG")
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36524>
(cherry picked from commit a1d5e8bfdb)