This allows rusticl to use the native fma instructions, which gives us
better OpenCL performance.
e.g. ProjectPhysX_OpenCL-Benchmark on my GA102:
FP32 0.610 -> 11.474 TFLOPs/s
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41720>
Vulkan dispatch commands should multiply group count by local size,
but pre-compiled dispatches should not. For example, the predicate
indirect shader has a local size of 32 and a grid size equal to the
max draw count, which was resulting in a dispatch total grid size of
(max_draw_count * 32), overrunning the buffer.
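The difference between the two dispatch kinds can be sketched as below.
The struct and function names are hypothetical and are not the driver's
actual code; only the arithmetic follows the description above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical illustration type; not taken from the driver. */
struct grid_size {
   uint32_t x, y, z;
};

/* Vulkan dispatch: the API supplies a workgroup count, so the total grid
 * (in threads) is count * local size along each axis. */
struct grid_size
vulkan_dispatch_grid(struct grid_size group_count, struct grid_size local_size)
{
   assert(local_size.x && local_size.y && local_size.z);
   struct grid_size total = {
      group_count.x * local_size.x,
      group_count.y * local_size.y,
      group_count.z * local_size.z,
   };
   return total;
}

/* Pre-compiled dispatch: the caller already supplies the grid in threads,
 * so it is passed through unchanged. Multiplying here is what produced
 * (max_draw_count * 32) in the case described above. */
struct grid_size
precompiled_dispatch_grid(struct grid_size grid, struct grid_size local_size)
{
   assert(local_size.x && local_size.y && local_size.z);
   return grid;
}
```

With a local size of 32 and a grid of 100, the pre-compiled path keeps a
total of 100 threads, while the Vulkan path applied to the same numbers
would yield 3200.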
Reviewed-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41740>
Ported from HoneyKrisp upload pools, with some adaptations such as
providing the MTLBuffer and offset for use with certain commands.
Reviewed-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41740>
The register values will depend on new fields in PS_STATE and it doesn't
seem like dynamic state belongs in radv_emit_fragment_shader_state.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41689>
to reduce the number of initialized PS VGPRs, increasing the PS wave launch
rate.
The pass will have more RADV-specific stuff.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41689>
These are SGPR inputs, so they are uniform in subgroups but may
have different values in different subgroups.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41584>
These intrinsics are generally divergent between different
subgroups, but they can be uniform when all their sources
are also uniform.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41584>
AMD SMEM instructions are always uniform within a subgroup,
but they may be divergent across subgroups, i.e. each subgroup
may have a different value from the same SMEM instruction.
This needs to be considered for divergence across subgroups
as well as for vertex divergence, because vertices of the
same primitive may be split between different waves.
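As a rough model of the rule above, the sketch below classifies an SMEM
result per scope. The enum and function are hypothetical names made up
for illustration and do not correspond to the NIR divergence analysis
API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical scopes over which a value may or may not be uniform. */
enum divergence_scope {
   SCOPE_WITHIN_SUBGROUP,
   SCOPE_ACROSS_SUBGROUPS,
   SCOPE_ACROSS_VERTICES,
};

/* Returns true if the result of an SMEM load may differ between
 * invocations in the given scope. */
bool
smem_result_may_diverge(enum divergence_scope scope)
{
   switch (scope) {
   case SCOPE_WITHIN_SUBGROUP:
      /* A scalar load yields one value for the whole subgroup. */
      return false;
   case SCOPE_ACROSS_SUBGROUPS:
      /* Each subgroup executes its own load and may see another value. */
      return true;
   case SCOPE_ACROSS_VERTICES:
      /* Vertices of one primitive may be split between waves. */
      return true;
   }
   assert(!"unknown scope");
   return true;
}
```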
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41584>
There is no hardware which supports ffmaz with denorms.
We also need this to be separate because there is AMD hardware
with ffma but not ffmaz.
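For readers unfamiliar with the difference, here is a reference sketch of
the zero handling that ffmaz adds on top of ffma, as I understand the NIR
opcode: anything multiplied by zero is zero, even infinity or NaN. The
function names are hypothetical, denorm behaviour is not modelled, and
the multiply-add is emulated in double precision rather than truly fused.

```c
#include <assert.h>
#include <math.h>

/* Plain multiply-add: 0 * inf is NaN under IEEE 754 rules. */
float
ffma_ref(float a, float b, float c)
{
   /* The product of two floats is exact in double. The sum is rounded to
    * double and then to float, so this only approximates the single
    * rounding of a real fused operation. */
   return (float)((double)a * (double)b + (double)c);
}

/* Assumed ffmaz rule: a zero multiplicand forces the product to zero. */
float
ffmaz_ref(float a, float b, float c)
{
   if (a == 0.0f || b == 0.0f)
      return 0.0f + c;
   return ffma_ref(a, b, c);
}

/* Small self-check contrasting the two behaviours. */
void
ffmaz_self_check(void)
{
   assert(isnan(ffma_ref(0.0f, INFINITY, 1.0f)));
   assert(ffmaz_ref(0.0f, INFINITY, 1.0f) == 1.0f);
}
```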
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41649>
This change adds the DRAW_ATTR_NONEXIST functionality
which fixes the memory access issue.
For instance, this issue is triggered with "piglit/bin/glsl-routing -auto -fbo":
==8384==ERROR: AddressSanitizer: heap-use-after-free on address 0xa11dfd84 at pc 0xae573fbd bp 0xbf87f688 sp 0xbf87f67c
READ of size 4 at 0xa11dfd84 thread T0
#0 0xae573fbc in emit_hw_vertex ../src/gallium/drivers/i915/i915_prim_emit.c:92
#1 0xae574ab0 in emit_prim ../src/gallium/drivers/i915/i915_prim_emit.c:154
#2 0xae574ab0 in setup_tri ../src/gallium/drivers/i915/i915_prim_emit.c:160
#3 0xad65d322 in do_triangle ../src/gallium/auxiliary/draw/draw_pipe.c:173
#4 0xad65d322 in pipe_run_linear ../src/gallium/auxiliary/draw/draw_decompose_tmp.h:181
#5 0xad663375 in draw_pipeline_run_linear ../src/gallium/auxiliary/draw/draw_pipe.c:337
#6 0xad86d9ac in pipeline ../src/gallium/auxiliary/draw/draw_pt_fetch_shade_pipeline_llvm.c:476
#7 0xad86d9ac in llvm_pipeline_generic ../src/gallium/auxiliary/draw/draw_pt_fetch_shade_pipeline_llvm.c:701
#8 0xad86ed75 in llvm_middle_end_linear_run ../src/gallium/auxiliary/draw/draw_pt_fetch_shade_pipeline_llvm.c:784
#9 0xad6aaaee in vsplit_segment_simple_linear ../src/gallium/auxiliary/draw/draw_pt_vsplit_tmp.h:223
#10 0xad6aaaee in vsplit_run_linear ../src/gallium/auxiliary/draw/draw_split_tmp.h:64
#11 0xad68a74b in draw_pt_arrays ../src/gallium/auxiliary/draw/draw_pt.c:161
#12 0xad68b7ca in draw_pt_arrays_restart ../src/gallium/auxiliary/draw/draw_pt.c:430
#13 0xad68b7ca in draw_instances ../src/gallium/auxiliary/draw/draw_pt.c:491
#14 0xad68ce0a in draw_vbo ../src/gallium/auxiliary/draw/draw_pt.c:628
#15 0xae5651d4 in i915_draw_vbo ../src/gallium/drivers/i915/i915_context.c:115
#16 0xae5651d4 in i915_draw_vbo ../src/gallium/drivers/i915/i915_context.c:51
#17 0xac7f50d3 in _mesa_draw_arrays ../src/mesa/main/draw.c:1204
Fixes: 247cee92df ("i915g: replace "uint" with normal uint32_t.")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27571>
* `msl_ensure_vertex_point_size_output` is missing some NIR pass logic
after being converted to use lowered I/O, and only needs to be called
for point primitives.
* `nir_separate_merged_clip_cull_io` requires the `compact_arrays` NIR
option be enabled for compact clip/cull distance arrays.
Reviewed-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41741>
We now handle that specific form on F2F and F2I but were still
asserting.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Fixes: 67bfbc7535 ("nak: rework swizzling on scalar FP16 ops")
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41771>
Let-parameters (e.g. hasRasterization, hasTessellation for
VkGraphicsPipelineCreateInfo) are local variables initialized by
reading a value from the stream. The codegen was creating them
with isConst=True, which caused streamPrimitive() to emit a cast
like:
hasRasterization = (const uint32_t)vkStream->getBe32();
The const qualifier on a scalar rvalue cast result is discarded
and triggers -Werror=ignored-qualifiers once that flag is enabled
for Soong compatibility, breaking the build:
src/gfxstream/guest/vulkan_enc/goldfish_vk_marshaling_guest.cpp:
In function 'void gfxstream::vk::unmarshal_VkGraphicsPipelineCreateInfo(
VulkanStreamGuest*, VkStructureType, VkGraphicsPipelineCreateInfo*)':
goldfish_vk_marshaling_guest.cpp:4202:28: error: type qualifiers
ignored on cast result type [-Werror=ignored-qualifiers]
4202 | hasRasterization = (const uint32_t)vkStream->getBe32();
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
goldfish_vk_marshaling_guest.cpp:4207:27: error: type qualifiers
ignored on cast result type [-Werror=ignored-qualifiers]
4207 | hasTessellation = (const uint32_t)vkStream->getBe32();
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1plus: some warnings being treated as errors
Mark the let-param as non-const since it is a local that gets
assigned.
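The shape of the emitted code after the change can be sketched as below.
This is an illustration written in C rather than the generated C++, and
read_be32() and unmarshal_flag() are hypothetical stand-ins for the
stream read and the generated unmarshal function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for vkStream->getBe32(). */
uint32_t
read_be32(const uint8_t *p)
{
   return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
          ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

uint32_t
unmarshal_flag(const uint8_t *p)
{
   assert(p != NULL);
   uint32_t hasRasterization;
   /* Before: hasRasterization = (const uint32_t)read_be32(p);
    * The const on the cast result is discarded, which is what the
    * warning reports. After: the cast carries no qualifier. */
   hasRasterization = (uint32_t)read_be32(p);
   return hasRasterization;
}
```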
Fixes: 190ce8280f ("meson: Add Soong compatibility compiler flags to Vulkan drivers")
Assisted-by: Claude Code (Opus 4.7)
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Gurchetan Singh <gurchetan.singh.foss@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41763>
The FS compilation needs the VUE map from the previous stage when the FS
has more inputs than SBE_SWIZ can remap. Recomputing a VUE map just
from FS inputs loses certain slots like multiview slots and extra
position slots (for primitive replication), so high-numbered attributes
can read the wrong source.
When available, pass the previous stage VUE map to the FS compilation
and use it. Make sure that the payload is sized based on what is read,
in case the previous stage has more outputs than FS reads.
Bugs did not surface when there were just 16 or fewer varying inputs,
because the driver can program SBE_SWIZ to translate the positions in
the previous shader VUE into what the FS wants. For more inputs, this
mapping is not used, and the FS must use the exact same slots.
Note this is not a problem for pipeline libraries because they use
a different fixed layout. This is also not an issue with
EXT_shader_object because multiview draws are not allowed with that
extension.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41747>
If the aux-usage changes, we need to flush out the previous mode from
the cache (see iris's flush_previous_aux_mode()).
I ran into this while testing layout-based compression toggling with the
Hogwarts Legacy trace on DG2. The trace exhibited graphical corruption
unless the DATA_CACHE was flushed.
On an unmodified driver, this currently only affects transitions from
AUX_NONE->AUX_CCS_D.
Backport-to: *
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41034>
To do this, we use a heuristic that depends on the image format and size
(see HSD 18014810884).
Average of two runs on an A750 from the performance CI:
* Naraka +0.89%
* TWWH3 +0.45%
* Control +0.37%
* Cyberpunk +0.35%
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41034>
We don't support CCS on block-compressed textures prior to Xe2. On Xe2,
CCS is enabled on every image.
Improves INTEL_DEBUG=perf outputs. For example, in the Naraka trace on
DG2, we now report that r32_uint is CCS_E-incompatible instead of
bptc_rgba. This incompatibility is due to the storage usage flag and
will be clarified in future commits.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41034>
Print the image format which is incompatible (or has an incompatible
list). On gfx12+, the format list shouldn't impact CCS_E-compatibility.
So, not printing the entire list should be sufficient on those
platforms.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41034>
anv emits performance warnings earlier about compression being disabled,
so there is no need to emit this for AUX_NONE. Do provide the tiling,
however, as Xe2+ supports compressed linear surfaces.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41034>
We're already grabbing the VS for VERTEX_VARY_SPD on v11 and earlier and
we're already carrying the code to check for IA_PRIMITIVE_TOPOLOGY. It
makes sense to have the code which selects the shader descriptor there, too.
Otherwise the helper is a little too magic and can lead to bugs if
someone isn't paying attention. (See also the previous commit.)
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41638>
Intersection shaders work on custom procedural geometry, which is
present only at the BLAS (object) level, not at the TLAS (world) level.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41739>
Stops incorrectly assuming cached-coherent memory is supported on
hardware that does not support it, such as a610 and a619-holi.
Fixes: 5a59410962 ("turnip: add cached and cached-coherent memory types")
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41761>
Zink sets multiple external memory handle types (such as Opaque FD and
DMA-BUF) without confirming that the Vulkan driver actually supports
them. This may lead to failures when allocating external memory with
vkAllocateMemory.
This patch introduces query_external_memory_compatibility() to verify
handle type support via vkGetPhysicalDeviceImageFormatProperties2.
Combine handle types only if they are compatible; otherwise, use a single
supported type as a fallback.
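The selection rule can be sketched as below. This is a simplified model,
not the zink code: the handle-type bits, the callback type and the sample
query functions are hypothetical, and the callback stands in for whatever
compatibility mask the driver reports for a given handle type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the Vulkan handle-type bits. */
#define HANDLE_OPAQUE_FD 0x1u
#define HANDLE_DMA_BUF   0x2u

/* Hypothetical query: returns the handle types compatible with `type`
 * (including `type` itself), or 0 if `type` is unsupported. */
typedef uint32_t (*compat_query_fn)(uint32_t type);

/* Sample queries used to exercise the sketch. */
uint32_t query_all_compatible(uint32_t type) { (void)type; return HANDLE_OPAQUE_FD | HANDLE_DMA_BUF; }
uint32_t query_isolated(uint32_t type) { return type; }
uint32_t query_none(uint32_t type) { (void)type; return 0; }

/* Returns all wanted types if every one is supported and mutually
 * compatible, otherwise the first supported type, otherwise 0. */
uint32_t
choose_handle_types(uint32_t wanted, compat_query_fn query)
{
   assert(query != NULL);
   uint32_t first_supported = 0;
   uint32_t combined = wanted;

   for (uint32_t bit = 1; bit != 0; bit <<= 1) {
      if (!(wanted & bit))
         continue;

      uint32_t compat = query(bit);
      if (!(compat & bit)) {
         combined = 0; /* this type is not supported at all */
         continue;
      }
      if (!first_supported)
         first_supported = bit;
      if ((compat & wanted) != wanted)
         combined = 0; /* cannot be combined with the other types */
   }
   return combined ? combined : first_supported;
}
```

With query_isolated, where each type is only compatible with itself, a
request for both types falls back to HANDLE_OPAQUE_FD alone.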
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40212>
find_good_mod was accumulating DISJOINT_BIT across iterations and
setting ici->usage on success. Change it to return results via
out-parameters and save/restore ici->flags around each modifier
attempt. The caller (negotiate_image_config) now explicitly sets
ici->usage and ici->flags after find_good_mod returns.
Also save/restore flags in the LINEAR modifier fallback path.
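A minimal sketch of the save/restore pattern follows. The structs, the
flag value and the function name are hypothetical simplifications; the
real function operates on VkImageCreateInfo and queries the driver for
each modifier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FLAG_DISJOINT 0x1u /* hypothetical stand-in for the DISJOINT_BIT */

/* Hypothetical stand-in for the shared create info. */
struct image_info {
   uint32_t flags;
};

/* Hypothetical per-modifier test data. */
struct mod_candidate {
   uint64_t modifier;
   bool needs_disjoint;
   bool supported;
};

/* Returns the index of the last good modifier, or -1. The flags of the
 * winning attempt are reported through out_flags, and ici->flags is
 * restored after every attempt so nothing accumulates. */
int
find_good_mod_sketch(struct image_info *ici, const struct mod_candidate *mods,
                     size_t count, uint32_t *out_flags)
{
   assert(ici != NULL && out_flags != NULL);
   int best = -1;

   for (size_t i = 0; i < count; i++) {
      const uint32_t saved_flags = ici->flags;

      if (mods[i].needs_disjoint)
         ici->flags |= FLAG_DISJOINT;

      if (mods[i].supported) {
         best = (int)i;
         *out_flags = ici->flags;
      }

      ici->flags = saved_flags; /* undo this attempt's changes */
   }
   return best;
}
```

Without the restore, a disjoint attempt on an earlier modifier would leak
FLAG_DISJOINT into the result for a later non-disjoint modifier.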
Assisted-by: Claude
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41734>
Replace the nested retry loops (eval_ici, set_image_usage,
double_check_ici, suboptimal_check_ici, try_set_image_usage_or_EXTENDED)
with a flat candidate array that encodes the same fallback order.
Instead of mutating a shared VkImageCreateInfo through deeply nested
function calls and retrying with toggled flags, we now:
1. build_usage_candidates() generates an array of (tiling, usage,
flags, has_format_list) tuples in preference order
2. try_image_config() applies each candidate and calls check_ici
3. negotiate_image_config() iterates tiling/extended combos, builds
candidates for each, and takes the first passing one
The modifier path (find_good_mod) is kept separate since it iterates
modifiers and takes the last good one (max-by-position, matching
the GBM worst-to-best convention), which is fundamentally different
from the candidate model's first-match-from-fallback-chain.
Duplicate candidates from the old code's redundant retry paths are
eliminated via dedup_configs(). The pNext chain surgery in
double_check_ici (manually unlinking VkImageFormatListCreateInfo) is
replaced by try_image_config's explicit format list chain/unchain.
The cube-compatible post-pass is simplified to a single check_ici
call instead of re-running the full negotiation.
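The candidate model can be sketched as below: deduplicate an ordered
array, then take the first entry that passes. The struct layout, the
`_sketch` function names and example_check() are hypothetical and much
simpler than the real code, which applies each candidate to a
VkImageCreateInfo and calls check_ici.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical (tiling, usage, flags, has_format_list) tuple. */
struct image_config {
   int tiling;
   uint32_t usage;
   uint32_t flags;
   bool has_format_list;
};

typedef bool (*check_fn)(const struct image_config *cfg);

/* Sample check: pretend the driver rejects configs with a format list. */
bool example_check(const struct image_config *cfg) { return !cfg->has_format_list; }

bool
config_equal(const struct image_config *a, const struct image_config *b)
{
   return a->tiling == b->tiling && a->usage == b->usage &&
          a->flags == b->flags && a->has_format_list == b->has_format_list;
}

/* Removes duplicates in place, keeping the first occurrence so the
 * preference order is preserved. Returns the new count. */
size_t
dedup_configs_sketch(struct image_config *cfgs, size_t count)
{
   size_t out = 0;
   for (size_t i = 0; i < count; i++) {
      bool seen = false;
      for (size_t j = 0; j < out && !seen; j++)
         seen = config_equal(&cfgs[i], &cfgs[j]);
      if (!seen)
         cfgs[out++] = cfgs[i];
   }
   return out;
}

/* First match wins. Returns the index of the chosen candidate or -1. */
int
first_passing_config(const struct image_config *cfgs, size_t count,
                     check_fn check)
{
   assert(check != NULL);
   for (size_t i = 0; i < count; i++) {
      if (check(&cfgs[i]))
         return (int)i;
   }
   return -1;
}
```

This first-match loop is the opposite of the modifier path, which keeps
the last good entry.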
Assisted-by: Claude
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41734>
Move disjoint/non-disjoint BindImageMemory into bind_image_memory().
create_image now reads as a clear sequence: format list, init_ici,
negotiate, pNext chain, CreateImage, allocate, bind. No functional
change.
Assisted-by: Claude
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41734>
Move VkExternalMemoryImageCreateInfo, DRM modifier explicit/list
create info, and user memory pNext chain building into
setup_image_pnext(). The Vk*Info structs now live in
image_pnext_state on the caller's stack. No functional change.
Assisted-by: Claude
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41734>