itof and utof natively support packing the f32 result to f16
(.l/.h), but the encode/decode paths fell through to the default
case and rejected any non-NONE pack, breaking nir_op_i2f16 /
nir_op_u2f16 codegen with "Failed to pack instruction: itof rfN.l".
Assisted-by: Claude Opus 4.7
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41810>
Add the tu-build-id meson option to force the build ID to a particular
value. This allows us to share the shader cache between different
builds. This enables, for example, sharing the cache between x86
drm-shim and aarch64 native builds.
Also add tu_override_{graphics,compute}_shader_version driconf options
to force recompilation of shaders even when tu-build-id stays the same.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41954>
gpu_id has been deprecated for a while. Moreover, drm-shim actually sets
a gpu_id for a7xx devices (while native builds do not), making the cache
UUID inconsistent.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41954>
Metal does not support importing host memory pointers into MTLHeap,
only MTLBuffer. Buffers can import without issue, and images are
restricted to linear images without flags requiring aliasing.
Reviewed-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41894>
Same approach as HK for tessellation. It also handles instance_id lowering.
instance_id_includes_base_index is not taken into account in multiple
other passes that use instance id. These passes expect instance id to
actually be instance id. This change adds a pass to work around this.
Signed-off-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41038>
Tessellation and geometry stages require emulation by launching
pre-graphics compute workloads, modifying the draw index and switching to
indirect. However, these emulation steps can only take one draw at a
time, which is a problem for multi draw. To accommodate this limitation,
split kk_draw_data in two: a constant structure that maintains the
initial values (whether restart is enabled, the index buffer, etc.) and a
second structure containing the modified values used to dispatch the
Metal draw call.
This change also returns early if any of the emulation steps fails,
instead of letting the draw continue, to avoid potential issues.
Signed-off-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41038>
Adds layer size and mip level offset information to image layouts.
With this information, we can calculate the subresource accessed for
block texel view and create an aliased texture in the intended format.
Reviewed-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41900>
Metal does not guarantee that image reads after writes will be coherent,
requiring us to insert fences for read-write textures.
Reviewed-by: Arcady Goldmints-Orlov <arcady@lunarg.com>
Reviewed-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41900>
On real hardware compute_heap_size() reserves a fraction of total_ram for
the rest of the system and compute_memory_budget() reports at most 90% of
the available memory, both because that RAM is shared between the GPU and
the CPU. In simulator mode the memory is instead a dedicated GPU pool
allocated by the simulator, so these reservations just hid memory: although
we allocate 1 GiB for the simulator, only 512 MiB was exposed as the heap
and as the budget.
Expose the full simulator allocation as both the heap size and the budget.
The simulator never allocates more than the 4 GiB the GPU MMU can address,
which we assert.
Before:
    memoryHeaps[0]:
        size   = 536870912 (0x20000000) (512.00 MiB)
        budget = 536870912 (0x20000000) (512.00 MiB)
After:
    memoryHeaps[0]:
        size   = 1073741824 (0x40000000) (1024.00 MiB)
        budget = 1073725536 (0x3fffc060) (1023.98 MiB)
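The reported figures can be cross-checked with a few lines of arithmetic.
This is only a sketch: describe() is a hypothetical helper that mimics the
"bytes (hex) (MiB)" layout of the output above, and is not part of the
driver. The source does not say what accounts for the small gap between
heap size and budget in the "After" output; the snippet only measures it.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def describe(size_in_bytes):
    """Format a byte count as 'bytes (hex) (MiB)', like the output above."""
    return (f"{size_in_bytes} ({size_in_bytes:#010x}) "
            f"({size_in_bytes / MIB:.2f} MiB)")


# Values copied from the Before/After output in this message.
heap_before = 536870912
heap_after = 1073741824
budget_after = 1073725536

# Before: only half of the 1 GiB simulator allocation was exposed.
half_exposed = heap_before * 2 == GIB
# After: the heap is the full 1 GiB, and the budget is slightly below it.
full_exposed = heap_after == GIB
budget_gap = heap_after - budget_after  # bytes not reported as budget
```

Calling describe(budget_after) reproduces the last line of the "After"
output, and budget_gap shows how far the budget sits below the heap size.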
Assisted-by: Claude Opus 4.8
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41898>
Add a helper to allocate a counter for a requested countable, and (if
supported by KMD) do the PERFCNTR_CONFIG ioctl to reserve the counter
for UMD local (inline) usage.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41158>
Add support for the new ioctl for KMD global counter collection. This
avoids needing hacks to parse dtb and mmap the GPU's i/o space.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41158>
With PERFCNTR_CONFIG, some other process may have already reserved some
counters, so not all will be available to fdperf. Prepare for this by
using num_counters in counter_group.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41158>
Move this earlier so we have the counter config early enough to probe
kernel support for PERFCNTR_CONFIG with a valid config.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41158>
Pull in updated UABI header with PERFCNTR_CONFIG ioctl. Sync with:
commit 44c460d2cc8b87c08360fe60f861660c8045ef90
Merge: 9bb8af2770b7 9a967125427e
Author: Dave Airlie <airlied@redhat.com>
Merge tag 'drm-msm-next-2026-05-30' of https://gitlab.freedesktop.org/drm/msm into drm-next
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41158>
When we do f2fmp(fadd(f2f32(a), f2f32(b))) we can always optimize it to
fadd(a, b) and obtain the same result without the intermediate rounding
step. The same holds for fmul.
I verified this on CPU using a custom script with the Berkeley SoftFloat
implementation; the results there are bit-for-bit identical except for
NaN representations.
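For readers who want to reproduce the idea without SoftFloat, here is a
minimal sketch. It is not the script mentioned above: it is an independent
check that relies on Python's struct module for binary16 ("e") and binary32
("f") rounding, samples the 16-bit input space instead of covering it
exhaustively, compares values rather than bit patterns, and skips NaN and
infinite inputs. All function names are hypothetical. It relies on the fact
that the sum and product of two binary16 values are exact in a Python
float (binary64).

```python
import math
import struct


def f16_from_bits(bits):
    """Reinterpret a 16-bit pattern as an IEEE 754 binary16 value."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def round_to_f16(x):
    """Round a Python float to binary16, round-to-nearest-even."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses finite values that round past the largest
        # binary16 number; IEEE rounding produces infinity there.
        return math.copysign(math.inf, x)


def round_to_f32(x):
    """Round a Python float to binary32, round-to-nearest-even."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def count_mismatches(step=251):
    """Compare the f16 result with the f32-then-f16 result on a sample."""
    values = [f16_from_bits(b) for b in range(0, 0x10000, step)]
    values = [v for v in values if math.isfinite(v)]
    mismatches = 0
    for a in values:
        for b in values:
            # Both a + b and a * b are exact in binary64 for f16 inputs.
            for exact in (a + b, a * b):
                direct = round_to_f16(exact)                  # fadd(a, b)
                via_f32 = round_to_f16(round_to_f32(exact))   # f2fmp(...)
                if direct != via_f32:
                    mismatches += 1
    return mismatches
```

count_mismatches() returning 0 on the sample is consistent with the claim
above; a smaller step widens the sample at the cost of run time.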
Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Reviewed-by: Ashley Smith <ashley.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41419>
This change splits the algorithm in two steps: first we have the
logical decision of which caches to bypass based on the needs of the
send operation, and then we have the code that picks the caching modes
based on which caches to bypass.
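The shape of the split can be sketched as follows. This is an illustration
only: the Bypass type, the function names, the conditions and the mode
strings are all hypothetical and do not correspond to the actual code or to
real hardware caching modes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bypass:
    """Hypothetical result of step 1: which caches must be bypassed."""
    l1: bool = False
    l3: bool = False


def caches_to_bypass(is_atomic, is_volatile):
    """Step 1: logical decision, based only on what the send needs."""
    # Made-up rules, standing in for the real workarounds.
    if is_volatile:
        return Bypass(l1=True, l3=True)
    if is_atomic:
        return Bypass(l1=True)
    return Bypass()


def cache_mode_for(bypass):
    """Step 2: translate the decision into a (made-up) caching mode."""
    l1 = "L1UC" if bypass.l1 else "L1C"
    l3 = "L3UC" if bypass.l3 else "L3C"
    return f"{l1}_{l3}"
```

With this shape, a new workaround only adds a rule to step 1 and leaves the
mode selection in step 2 untouched.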
This should make it significantly easier for us to add new workarounds
without the risk of breaking existing cases.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41319>
Instead of having an if ladder followed by another if that overwrites
the previous result, have a single if ladder.
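As a generic illustration of this kind of cleanup (the conditions and mode
names below are hypothetical, not the ones in the function being changed),
the two forms look like this:

```python
from itertools import product


def pick_before(is_atomic, is_store, needs_bypass):
    """Ladder followed by a separate `if` that overwrites its result."""
    if is_atomic:
        mode = "atomic-default"
    elif is_store:
        mode = "store-default"
    else:
        mode = "load-default"
    if needs_bypass:
        mode = "uncached"
    return mode


def pick_after(is_atomic, is_store, needs_bypass):
    """Single ladder: the overriding case comes first."""
    if needs_bypass:
        return "uncached"
    elif is_atomic:
        return "atomic-default"
    elif is_store:
        return "store-default"
    else:
        return "load-default"


def ladders_agree():
    """Check that both forms agree on every input combination."""
    return all(pick_before(*case) == pick_after(*case)
               for case in product([False, True], repeat=3))
```

In the single-ladder form each outcome is decided in exactly one place, so
no later statement can silently replace it.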
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41319>
This is the next - but not final - step toward making this function more
organized: split cache_mode into atomic, load and store versions, then
pick the version at the end.
v2: Initialize {load,store}_cache_mode (Sagar).
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41319>
We have code to choose cache_mode before send->sfid is assigned, but
after it we have more code to choose cache_mode that relies on
send->sfid. Move everything to after the selection of send->sfid so
the code to pick cache_mode is all together. I plan to simplify this
further in the next commits; the goal of this patch is to make the next
diff easier to read.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41319>
GL_TEXTURE_IMMUTABLE_LEVELS is core state in OpenGL ES 3.0 (it comes with
immutable textures / glTexStorage), queryable through glGetTexParameter. The
getter only allowed it when ARB_texture_view or OES_texture_view is present, so
a GLES3 driver without texture views returns GL_INVALID_ENUM for a valid query.
Its sibling GL_TEXTURE_IMMUTABLE_FORMAT is correctly ungated.
Allow the query on any GLES3 context, matching the spec.
Fixes 8 dEQP-GLES3.functional.state_query.texture.*_immutable_levels_* cases on
etnaviv (which exposes neither texture-view extension).
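The change in gating can be modeled with a small sketch. The Context class
and both predicates are hypothetical stand-ins written for illustration;
they are not the getter code itself.

```python
from dataclasses import dataclass


@dataclass
class Context:
    """Hypothetical stand-in for the relevant bits of a GL context."""
    is_gles3: bool = False
    has_arb_texture_view: bool = False
    has_oes_texture_view: bool = False


def immutable_levels_allowed_before(ctx):
    """Old gating: the query needs one of the texture view extensions."""
    return ctx.has_arb_texture_view or ctx.has_oes_texture_view


def immutable_levels_allowed_after(ctx):
    """New gating: any GLES3 context may also query it."""
    return (ctx.is_gles3
            or ctx.has_arb_texture_view
            or ctx.has_oes_texture_view)


# A GLES3 context with neither texture view extension, as on etnaviv.
gles3_without_views = Context(is_gles3=True)
```

For gles3_without_views the old predicate rejects the query and the new one
accepts it, while contexts that expose a texture view extension are
accepted by both.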
Fixes: 214fd4e40d ("mesa/main: fix texture view enum checks")
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41910>
`VkDeviceDeviceMemoryReportCreateInfoEXT::pUserData` was updated to have
`optional` sometime between v1.4.335 and v1.4.337, which changes codegen
in a backwards-incompatible way. VkDeviceDeviceMemoryReportCreateInfoEXT
should not really be sent to the host anyway (as a guest-provided callback
can never be called from the host), but older existing guest images are
already sending this struct, so we need to preserve compatibility.
Bug: b/519606352
Test: GfxstreamEnd2EndTests
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42007>
Remove the unused tgsi_dump include, the stale nir_to_tgsi comment, the
redundant freshly-allocated null check, and the dead VS/FS delete branches.
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41925>
r300_create_vs_state always allocates vs->first and builds the initial shader
variant before the state can be bound, so the !vs->first path in
r300_pick_vertex_shader is unreachable.
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41925>
Replace the TGSI-based r300_draw_init_vertex_shader with a NIR
implementation, removing the nir_to_tgsi call from r300_state.c.
The three transformations previously done via tgsi_transform are now
done directly on a cloned NIR shader:
- Insert missing primary color output if secondary color is present.
- Insert all missing color/bcolor outputs if any back-face color is used.
- Add a WPOS output (copy of gl_Position) in the next available generic
  slot, by duplicating each store_deref to gl_Position to the new output
  at the same point.
The ntr_fixup_varying_slots is applied to the clone before any
transforms, keeping VS outputs aligned with FS inputs.
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Assisted-by: Claude Sonnet 4.6
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41925>
Even though we use an enum to implement it internally, there's no real
benefit to that enum being exposed to users. This makes it look more
like any other container type with an opaque implementation.
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41941>