lower_boolean_reduce only works if the number of components is 1, and even
asserts on this in its prologue. Otherwise, given a boolean vector type, it
may produce output using ballot/vote with a boolean vector input.
Acked-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41186>
v3dv_CmdFillBuffer was passing only the user-supplied dstOffset to
meta_fill_buffer, ignoring the destination VkBuffer's mem_offset.
When several VkBuffers share one VkDeviceMemory at different offsets
(sub-allocation) the fill landed on whichever VkBuffer was
bound at offset 0 of the memory object instead of the requested one.
Fixes: 5ed78d91fe ("v3dv: implement vkCmdFillBuffer")
Assisted-by: Claude Opus 4.7
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41436>
Drivers using blorp on ELK platforms don't need the special
color->depth conversion path that needs 64bit floating point math.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Allen Ballway <ballway@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39341>
This shouldn't be necessary because SDMA can detile the image just fine,
only buffer->image and image->image need to fallback.
It just works on GFX10+ because RADV is using NBC views, and I think
it works on eg. VEGA10 just by luck due to different
swizzles/alignments.
Fixes: 3d803d7a2e ("radv: Use compute copy for emulated formats")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41384>
KHR-Single-GL46.arrays_of_arrays_gl.SubroutineFunctionCalls2 subtests
pass, but some are slow and we keep skipping them.
copy_image.* now takes 2:30 of runtime on my T14s and has some interesting
fails in rgb9e5, though a750 CI seems to pass.
texture_swizzle.functional* now takes 6.5s of runtime on my T14s.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41245>
pre_chain.rp_trace usage relied on a bunch of bad assumptions
and together with u_trace_move didn't cause issues until
u_trace is started to be refactored. Fixing those bad assumptions
and correctly initializing and freeing pre_chain.rp_trace
also requires fixing u_trace_move at the same time.
u_trace_move fixes:
- If dst had trace chunks in it - we may have leaked them.
- The correct list move pattern is "list_replace -> list_inithead"
not "list_replace -> list_delinit"
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41390>
Format draw and draw_indexed Perfetto events with their vertex count.
For draw_indirect and draw_indexed_indirect, include the draw count
when indirect tracing is enabled (MESA_GPU_TRACES=indirects), otherwise
fall back to the static name.
Signed-off-by: Michael Cheng <michael.cheng@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41374>
Format compute events as compute(x,y,z) using the end-payload group
dimensions. Trailing dimensions that equal 1 are omitted to keep labels
concise — e.g. compute(128,1,1) becomes compute(128).
For compute_indirect, the dispatch dimensions are not known at command
record time since they live in GPU memory as a VkDispatchIndirectCommand.
The u_trace framework reads them back at trace flush time via the
is_indirect mechanism: the GPU address is recorded alongside the
tracepoint, and u_trace copies the pointed-to struct into indirect_data
once the GPU has finished. The same trailing-1 trimming is applied when
indirect tracing is enabled (MESA_GPU_TRACES=indirects); otherwise the
event falls back to the static "compute_indirect" name.
Signed-off-by: Michael Cheng <michael.cheng@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41374>
Add a separate end_event_dyn() that takes a std::string by value for
dynamic event names. The [=] lambda capture deep-copies the string into
the closure, avoiding a dangling pointer when the Trace() continuation
runs after the caller's stack frame is gone.
The existing end_event() with const char* remains for string literals
and long-lived pointers (e.g. payload->str), where no copy is needed.
CREATE_DUAL_EVENT_CALLBACK_DYN formats the event name via snprintf and
passes the result as a std::string to end_event_dyn(). Follow-up patches
will use this macro to label events with runtime dimensions.
Signed-off-by: Michael Cheng <michael.cheng@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41374>
When uncached memory type is used under emulation then most games have a
significant performance penalty due to accessing the buffer atomically.
Instead when this option is set, it will override uncached buffer
allocations to instead be cached+coherent if the host supports it. This
allows the atomic accesses to still be done but not have abysmal
performance.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41323>
Instead of leaving timestamp_copy_data half-initialized in
copy_timestamp_cs_pool - always have it fully initialized and valid
state there.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41438>
Add a new helper function pvr_pbe_format_num_sample_components that maps a
pvr_transfer_pbe_pixel_src format to the number of components it actually
uses. Use pvr_pbe_format_num_sample_components in pvr_uscgen_tq_frag_load
to set params.sample_components before calling pco_emit_nir_smp, so the
instruction is emitted with the correct component count. This allows the
generation of a more optimal SMP instruction, avoiding the emission of
unused result components.
Signed-off-by: Caius Moldovan <caius.moldovan@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41393>
We've been reporting in features.txt that we support this extension
unconditionally, but we didn't. Now that we have the bits wired up due
to Vulkan, we can actually enable it on Bifrost and later.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34339>
Enabling clamping in the opcode here doesn't do quite what we need. This
makes the HW clamp to the max LOD specified in the sampler, but we need
to clamp to the maximum available LOD instead, which is the minimum
of the max-lod of the sampler and the max level in the texture itself.
We also need to take the mipmap mode into account when computing the
level of detail. This is not something the TEX_GRADIENT instruction
does, so we need to do this manually.
Now that we no longer modify the flags in the loop, we can get rid of
the loop alltogether, and only issue a single TEX_GRADIENT instruction.
While we're at it, clean up some naming to better match the phrasing
from the spec.
This only applies to Valhall for now.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/work_items/14867
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34339>
This bit is reserved and should be zero on V9, so we should report an
illegal instruction if we ever encounter it while packing.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34339>
If we use the texture coordinates mode for TEX_GRADIENT we need valid
texture coordinates on disabled lanes to compute correct lods across all
pixels on a triangle, otherwise pixels along triangle edges will read
garbage when computing coordinate deltas and produce bogus results.
We previously tried to solve this by setting the force_delta_enable bit,
but that doesn't always work... and worse, this bit isn't supported on
V9, which means we sometimes end up generating illegal instructions.
Fixes Piglit:
shaders/zero-tex-coord texturequerylod
Fixes: 4e58029dc0 ("pan/va: fix base-level for nir_texop_lod")
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34339>