Gallivm runs shaders that are originally compiled with another backend's
compiler options, which may have optimizations that introduce opcodes
that gallivm does not support. Add a pass to lower these.
Assisted-by: Claude Opus 4.6
Signed-off-by: Benjamin Cheng <benjamin.cheng@amd.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41302>
Previously, the mem alloc wait barrier was issued via a separate renderer
submission (e.g. an execbuf for the virtgpu backend). Instead, we can use
the cmd payload in resource_create_blob to avoid the extra submission.
This would help the downstream win32 backend as well.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42003>
Use drm_virtgpu_resource_create_blob::cmd to express the blob mem host
resource dependency in the virtgpu backend, which avoids the execbuf.
The vtest backend is handled similarly.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42003>
Currently, when hashing a pipeline stage, the final hash is different
when the module is passed as VkPipelineShaderStageCreateInfo::module
(the module's hash is hashed) or as a VkShaderModuleCreateInfo in its
pNext chain (the module's code is hashed). This causes unnecessary cache
misses. To prevent this, hash the code first in the latter case and add
that hash to the stage's hash.
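A minimal sketch of the idea, not the driver's actual code: FNV-1a stands in
for the real hash function, and every sketch_* name is hypothetical. The
inline-code path first reduces the code to the same hash a module object
would carry, then feeds that hash into the stage hash, so both paths agree.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_HASH_INIT 0xcbf29ce484222325ull

/* Stand-in 64-bit FNV-1a; the real driver uses a different hash function. */
static uint64_t
sketch_hash_update(uint64_t h, const void *data, size_t size)
{
   const uint8_t *bytes = data;
   for (size_t i = 0; i < size; i++) {
      h ^= bytes[i];
      h *= 0x100000001b3ull;
   }
   return h;
}

/* Hash of the shader code, as a module object would store it. */
static uint64_t
sketch_hash_code(const uint32_t *code, size_t code_size)
{
   return sketch_hash_update(SKETCH_HASH_INIT, code, code_size);
}

/* Module passed as an object: mix its stored hash into the stage hash. */
static uint64_t
sketch_stage_hash_from_module(uint64_t module_hash, const char *entry,
                              size_t entry_len)
{
   uint64_t h = SKETCH_HASH_INIT;
   h = sketch_hash_update(h, &module_hash, sizeof(module_hash));
   return sketch_hash_update(h, entry, entry_len);
}

/* Code passed inline via pNext: hash the code first, then reuse the
 * module path so the final stage hash is identical. */
static uint64_t
sketch_stage_hash_from_inline_code(const uint32_t *code, size_t code_size,
                                   const char *entry, size_t entry_len)
{
   uint64_t code_hash = sketch_hash_code(code, code_size);
   return sketch_stage_hash_from_module(code_hash, entry, entry_len);
}
```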
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42014>
prev_refs currently has 8 entries, which matches the number of reference
frames, but it needs to match the full size of the DPB, which is 9.
prev_refs is also now indexed by DPB slot and holds the last intra frame
written to that slot.
This fixes visible artifacts on AV1 streams that mix super-res and
non-super-res frames in a hierarchical reference structure.
Closes: mesa/mesa#15503
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41846>
This may delete existing pointer flags coming from the instance if the
traversal loop is exited and then restarted, as is done with ray
queries.
Fixes geometry being incorrectly culled due to FLIP_FACING flags going
missing.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41965>
V3D 7.1 now exposes shaderFloat16, shaderInt8, shaderInt16 and
VK_KHR_shader_float16_int8.
Partial native Float16 support is already available, but the rest of
the sub-32-bit ALU operations are widened to 32-bit by nir_lower_bit_size
in v3d_lower_nir(). Conversion and pack operations are kept at their
native bit width so the QPU's 16-bit pack/unpack paths on mul/mov can
be used.
Assisted-by: Claude Opus 4.7
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41810>
Keep f16 fadd/fsub/fmul/fmin/fmax/fneg/fabs at 16-bit through
nir_lower_bit_size on V3D 7.1+ and emit the matching VF* op in
nir_to_vir, instead of widening to f32 with f16<->f32 round-trip
movs that pack-fold can absorb into hints. The native path saves
the absorption overhead in f16-heavy shaders.
Only the lower half of each VF* result is consumed; the upper half
is computed but unused.
Add new VIR helpers vir_VFADD, vir_VFSUB, vir_VFCMP, vir_VFMIN,
vir_VFMUL, vir_VFMOV, vir_VFABS, vir_VFNEG and vir_VFNAB.
Assisted-by: Claude Opus 4.7
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41810>
Add the V3D 7.1+ 2x16-bit f16 add-pipe ops (VFADD/VFSUB/VFCMP and
the sign-manipulation family VFMOV/VFABS/VFNEG/VFNAB), wire VFMAX
into v3d71_add_ops, and complete the V3D 7.1 decode/encode for
VFMIN/VFMAX/VFMUL.
Assisted-by: Claude Opus 4.7
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41810>
The liveness analysis treated any output-pack write (D.l /
D.h) as a partial definition, refusing to mark the variable as
defined in the block. That extended live ranges all the way to the
top of the program for every f16 temporary, artificially increasing
register pressure.
D.l/D.h only modify the written half and preserve the bits of the
unwritten half. So a pack write is a full definition whenever no
consumer ever observes the unwritten half, or when both halves are
written before the variable is used.
Scan every instruction into a per-temp read-flag array
(TEMP_READ_LO / TEMP_READ_HI, with FULL = LO | HI) by inspecting
each source's input unpack, and recognize two patterns as full
definitions:
* Both PACK_L and PACK_H written unconditionally in the same block.
* The instruction's pack writes the half that covers every observed
  read of the variable across the program (the unwritten half is never
  read).
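The two patterns above can be sketched as a small predicate. This is an
illustration under assumptions rather than the code in the liveness pass:
the sketch_* names, the TEMP_READ_FULL spelling and the way the caller
tracks "halves already written in this block" are all hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

enum {
   TEMP_READ_LO = 1 << 0,
   TEMP_READ_HI = 1 << 1,
   TEMP_READ_FULL = TEMP_READ_LO | TEMP_READ_HI,
};

/* Hypothetical stand-ins for the output pack modes (none, D.l, D.h). */
enum sketch_pack { SKETCH_PACK_NONE, SKETCH_PACK_L, SKETCH_PACK_H };

/* Which halves of the destination a write with this pack mode touches. */
static uint8_t
sketch_written_halves(enum sketch_pack pack)
{
   switch (pack) {
   case SKETCH_PACK_L: return TEMP_READ_LO;
   case SKETCH_PACK_H: return TEMP_READ_HI;
   default:            return TEMP_READ_FULL;
   }
}

/* written_in_block: halves written unconditionally earlier in this block.
 * reads: union of the halves any instruction in the program reads. */
static bool
sketch_write_is_full_def(enum sketch_pack pack, uint8_t written_in_block,
                         uint8_t reads)
{
   uint8_t written = sketch_written_halves(pack);

   /* Pattern 1: both halves are written unconditionally in the block. */
   if (((written_in_block | written) & TEMP_READ_FULL) == TEMP_READ_FULL)
      return true;

   /* Pattern 2: the unwritten half is never read anywhere. */
   return (reads & ~written) == 0;
}
```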
Assisted-by: Claude Opus 4.7
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41810>
nir_lower_subgroups lowers reduce/scan to a tree of shuffle + ALU
chains over the source data type. When the source is sub-32-bit
(int8, int16, float16, or vector forms) those new ALU ops escape
the bit_size widening done earlier in v3d_lower_nir, leaving the
QPU codegen to emit raw min/max/etc. on 32-bit channel registers
whose upper bits are unspecified. The result is wrong reductions
for signed integer min/max (the upper bits make a signed int8 look
like a positive int32), wrong unsigned reductions (high-bit garbage
mixes into the result), and wrong f16 reductions.
Re-run nir_lower_bit_size after nir_lower_subgroups so the
generated sub-32-bit ALU ops are widened with the correct
sign/zero extension on inputs and the matching narrow on outputs.
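As a rough illustration of the signed case (plain C, not driver code, with
hypothetical sketch_* helpers): an int8 value of -1 held in a 32-bit channel
whose upper bits happen to be zero compares as +255, so a raw 32-bit min
picks the wrong operand, while sign-extending the inputs first gives the
expected result.

```c
#include <stdint.h>

/* Sign-extend the low 8 bits of a 32-bit channel. */
static int32_t
sketch_sext8(uint32_t chan)
{
   return (int32_t)((chan & 0xffu) ^ 0x80u) - 0x80;
}

/* Raw 32-bit signed min on channels whose upper 24 bits are unspecified;
 * only the low 8 bits of the chosen channel are the int8 result. */
static int32_t
sketch_imin8_raw(uint32_t a, uint32_t b)
{
   uint32_t r = (int32_t)a < (int32_t)b ? a : b;
   return sketch_sext8(r);
}

/* Widened form: sign-extend the inputs, do the 32-bit op, narrow back. */
static int32_t
sketch_imin8_widened(uint32_t a, uint32_t b)
{
   int32_t wa = sketch_sext8(a);
   int32_t wb = sketch_sext8(b);
   int32_t r = wa < wb ? wa : wb;
   return sketch_sext8((uint32_t)r);
}
```

With a = 0x000000ff (int8 -1) and b = 0x00000001, the raw version returns 1
and the widened version returns -1.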
Also widen vote_feq/vote_ieq when the source operand is sub-32-bit:
the V3D backend emits ALLFEQ/ALLEQ on full 32-bit channels (it does
not yet use the f16 vfcmp/vfmin/vfmax HW path), so the comparison input
must be 32-bit.
Assisted-by: Claude Opus 4.7
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41810>
flrp32 is already lowered; mirror it for flrp16 so V3D's f16 ALU
path doesn't see an unsupported flrp@16 leftover after bit_size
widening. No measurable test impact on the current f16 sweep,
but matches the f32 behaviour and keeps the lowering surface
consistent across bit sizes.
Assisted-by: Claude Opus 4.7
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41810>
The frexp lowering decomposes frexp into bit manipulation (fabs, ushr,
iand, ior) that relies on implicit float-to-int bit reinterpretation.
When lowered at 16-bit, the subsequent nir_lower_bit_size pass widens
float operations with f2f32 (changing the bit pattern to IEEE fp32)
and integer operations with u2u32 (zero-extending 16-bit bits). This
breaks the reinterpretation: ushr on the fabs result gets f2f32-widened
float bits instead of the original fp16 bit pattern, causing the sign
bit to leak into the exponent extraction for negative inputs.
Move nir_lower_frexp into v3d_lower_nir after nir_lower_bit_size.
This way frexp decomposition operates at 32-bit where float and integer
operations share the same bit width, and the bit manipulation masks use
the correct IEEE fp32 constants.
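A simplified illustration of why the shift and mask constants must match the
layout of the bits being reinterpreted. This is plain C with hypothetical
sketch_* helpers, not the NIR lowering itself, and it does not reproduce the
exact failure described above: it only shows that fp16 constants applied to
an fp32 bit pattern extract the wrong field.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a float as its IEEE fp32 bit pattern. */
static uint32_t
sketch_f32_bits(float f)
{
   uint32_t u;
   memcpy(&u, &f, sizeof(u));
   return u;
}

/* Biased exponent with fp32 constants: clear the sign, shift past the
 * 23 mantissa bits, keep the 8 exponent bits. */
static uint32_t
sketch_biased_exp_f32(float f)
{
   uint32_t abs_bits = sketch_f32_bits(f) & 0x7fffffffu;
   return (abs_bits >> 23) & 0xffu;
}

/* Biased exponent with fp16 constants on a genuine fp16 bit pattern:
 * 10 mantissa bits, 5 exponent bits. */
static uint32_t
sketch_biased_exp_f16(uint16_t h)
{
   uint32_t abs_bits = h & 0x7fffu;
   return (abs_bits >> 10) & 0x1fu;
}

/* Mismatch: fp16 shift and mask applied to fp32 bits. */
static uint32_t
sketch_biased_exp_mixed(float f)
{
   return (sketch_f32_bits(f) >> 10) & 0x1fu;
}
```

For -1.0, the fp32 path yields the fp32 bias (127) and the fp16 path on the
pattern 0xBC00 yields the fp16 bias (15), while the mixed variant yields
neither.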
Assisted-by: Claude Opus 4.7
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41810>
itof and utof natively support packing the f32 result to f16
(.l/.h), but the encode/decode paths fell through to the default
case and rejected any non-NONE pack, breaking nir_op_i2f16 /
nir_op_u2f16 codegen with "Failed to pack instruction: itof rfN.l".
Assisted-by: Claude Opus 4.7
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41810>
Add the tu-build-id meson option to force the build ID to a particular
value. This allows us to share the shader cache between different
builds, for example between x86 drm-shim and aarch64 native builds.
Also add tu_override_{graphics,compute}_shader_version driconf options
to force recompilation of shaders even when tu-build-id stays the same.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41954>
gpu_id has been deprecated for a while. Moreover, drm-shim actually sets
a gpu_id for a7xx devices (while native builds do not) making the cache
UUID inconsistent.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41954>
Metal does not support importing host memory pointers into MTLHeap,
only MTLBuffer. Buffers can import without issue, and images are
restricted to linear images without flags requiring aliasing.
Reviewed-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41894>
Same approach as HK for tessellation. It also handles instance_id lowering.
instance_id_includes_base_index is not taken into account in multiple
other passes that use instance id. These passes expect instance id to
actually be instance id. This change adds a pass to work around this.
Signed-off-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41038>
Tessellation and geometry stages require emulation by launching
pre-graphics compute workloads, modifying the draw index and switching to
indirect. However, since these emulation steps can only take one draw at
a time (multi draw being the issue), we need to accommodate this limitation
by splitting kk_draw_data into two structures: a constant structure that
holds the initial values (whether restart is enabled, the index buffer,
etc.) and a second structure containing the modified values used to
dispatch the Metal draw call.
This change also returns early if any of the emulation steps fail, instead
of allowing the draw to continue, to avoid potential issues.
Signed-off-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41038>
Add layer size and mip level offset information to image layouts.
With this information, we can calculate the subresource accessed by a
block texel view and create an aliased texture in the intended format.
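A minimal sketch of the offset calculation this enables. The struct and
field names are hypothetical, and the layer-major layout (every layer holds
all of its mip levels) is an assumption made for illustration.

```c
#include <stdint.h>

#define SKETCH_MAX_LEVELS 16

/* Hypothetical layout description; not the driver's actual struct. */
struct sketch_image_layout {
   uint64_t layer_size_B;                      /* stride between layers */
   uint64_t level_offset_B[SKETCH_MAX_LEVELS]; /* mip offset in a layer */
   uint32_t levels;
};

/* Byte offset of (layer, level) from the start of the image memory. */
static uint64_t
sketch_subresource_offset_B(const struct sketch_image_layout *layout,
                            uint32_t layer, uint32_t level)
{
   return (uint64_t)layer * layout->layer_size_B +
          layout->level_offset_B[level];
}
```

The aliased texture for the view would then be created at that offset in
the view's format.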
Reviewed-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41900>
Metal does not guarantee that image reads after writes will be coherent,
requiring us to insert fences for read-write textures.
Reviewed-by: Arcady Goldmints-Orlov <arcady@lunarg.com>
Reviewed-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41900>
On real hardware compute_heap_size() reserves a fraction of total_ram for
the rest of the system and compute_memory_budget() reports at most 90% of
the available memory, both because that RAM is shared between the GPU and
the CPU. In simulator mode the memory is instead a dedicated GPU pool
allocated by the simulator, so these reservations just hid memory: although
we allocate 1 GiB for the simulator, only 512 MiB was exposed as the heap
and as the budget.
Expose the full simulator allocation as both the heap size and the budget.
The simulator never allocates more than the 4 GiB the GPU MMU can address,
which we assert.
Before:
   memoryHeaps[0]:
      size   = 536870912 (0x20000000) (512.00 MiB)
      budget = 536870912 (0x20000000) (512.00 MiB)
After:
   memoryHeaps[0]:
      size   = 1073741824 (0x40000000) (1024.00 MiB)
      budget = 1073725536 (0x3fffc060) (1023.98 MiB)
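A rough sketch of the policy described above, not the driver's functions.
The sketch_* names are hypothetical, and the "reserve half" rule for real
hardware is an assumption chosen only because it matches the 1 GiB to
512 MiB figures in the "Before" output.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_GIB (1024ull * 1024ull * 1024ull)

/* Heap size: the simulator pool is dedicated to the GPU, so expose all of
 * it; on hardware keep part of the shared RAM for the rest of the system
 * (the one-half fraction is an assumption). */
static uint64_t
sketch_heap_size(uint64_t total_ram, bool simulator)
{
   if (simulator) {
      /* The GPU MMU can address at most 4 GiB. */
      assert(total_ram <= 4 * SKETCH_GIB);
      return total_ram;
   }
   return total_ram / 2;
}

/* Upper bound on the reported budget: at most 90% of the available memory
 * on hardware, all of it in the simulator. */
static uint64_t
sketch_budget_cap(uint64_t available, bool simulator)
{
   return simulator ? available : available / 10 * 9;
}
```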
Assisted-by: Claude Opus 4.8
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41898>
Add a helper to allocate a counter for a requested countable, and (if
supported by KMD) do the PERFCNTR_CONFIG ioctl to reserve the counter
for UMD local (inline) usage.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41158>
Add support for the new ioctl for KMD global counter collection. This
avoids needing hacks to parse dtb and mmap the GPU's i/o space.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41158>
With PERFCNTR_CONFIG, some other process may have already reserved some
counters, so not all will be available to fdperf. Prepare for this by
using num_counters in counter_group.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41158>
Move this earlier so we have the counter config early enough to probe
kernel support for PERFCNTR_CONFIG with a valid config.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41158>
Pull in updated UABI header with PERFCNTR_CONFIG ioctl. Sync with:
commit 44c460d2cc8b87c08360fe60f861660c8045ef90
Merge: 9bb8af2770b7 9a967125427e
Author: Dave Airlie <airlied@redhat.com>
Merge tag 'drm-msm-next-2026-05-30' of https://gitlab.freedesktop.org/drm/msm into drm-next
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41158>