has_indirect_partial_stride is a direct copy of
screen->caps.multi_draw_indirect_partial_stride.
Read the cap from the screen directly.
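As a rough illustration of the pattern, the sketch below reads a cap through the screen pointer instead of keeping a cached copy in the context. The struct and function names (caps_sketch, screen_sketch, st_context_sketch, can_use_partial_stride) are simplified stand-ins invented for this example, not the real Mesa definitions.

```c
#include <stdbool.h>

/* Simplified stand-ins for illustration only; the real screen and
 * st_context structures in Mesa contain many more fields. */
struct caps_sketch {
   bool multi_draw_indirect_partial_stride;
};

struct screen_sketch {
   struct caps_sketch caps;
};

struct st_context_sketch {
   struct screen_sketch *screen;
   /* No cached has_indirect_partial_stride copy is kept here. */
};

/* Query the cap through the screen instead of a duplicated field, so
 * there is a single source of truth. */
static bool
can_use_partial_stride(const struct st_context_sketch *st)
{
   return st->screen->caps.multi_draw_indirect_partial_stride;
}
```

With a single source of truth, a context whose screen pointer is not initialized fails immediately instead of silently reading a stale copy.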
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41184>
has_multi_draw_indirect is a direct copy of
screen->caps.multi_draw_indirect.
Read the cap from the screen directly at each call site.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41184>
has_shareable_shaders is a direct copy of screen->caps.shareable_shaders.
Read the cap from the screen directly at each call site.
Fix st_format_test, which was missing st_context.screen initialization,
as exposed by this change.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41184>
This cap is a direct alias of screen->caps.astc_void_extents_need_denorm_flush.
Replace all st_context usages with the screen cap directly.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41184>
This is more common, and it implicitly enables this for Path of Exile
and X4 Foundations. However, zeroing VRAM allocations is already the
default in AMDGPU, so this changes nothing in practice (except on very
old kernels).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41735>
libdrm dups the fd internally, so local_fd and get_fd() are different
fd numbers, but they refer to the same open file. Close local_fd right
after the amdgpu device is initialized to avoid keeping two fds open
for the same thing.
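A minimal sketch of the open-then-close pattern, assuming a device init that duplicates the fd it is given. device_init_sketch, init_and_drop_local_fd and same_file are hypothetical helpers written for this example; they are not libdrm or RADV functions, and same_file only compares device and inode numbers.

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns true when two fd numbers refer to the same underlying file
 * (compared by device and inode number). */
static bool
same_file(int fd_a, int fd_b)
{
   struct stat a, b;
   if (fstat(fd_a, &a) != 0 || fstat(fd_b, &b) != 0)
      return false;
   return a.st_dev == b.st_dev && a.st_ino == b.st_ino;
}

/* Hypothetical stand-in for a device init that dups the fd internally.
 * Returns the device's own fd, or -1 on failure. */
static int
device_init_sketch(int fd)
{
   return dup(fd);
}

/* Once the device owns its duplicate, the caller's fd is redundant and
 * can be closed right away. */
static int
init_and_drop_local_fd(int local_fd)
{
   int dev_fd = device_init_sketch(local_fd);
   if (dev_fd < 0)
      return -1;
   close(local_fd);
   return dev_fd;
}
```

The duplicate stays usable after the caller's fd is closed, because closing one fd number does not close the file while another fd still refers to it.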
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41753>
Update the with_gfx_compute flag so decode-only builds do not
unnecessarily compile gfx/compute code.
Signed-off-by: Thong Thai <thong.thai@amd.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41493>
Remove gfx, tgsi and driver_trace code from auxiliary for
non-gfx/compute builds to reduce the number of files that get
compiled.
Signed-off-by: Thong Thai <thong.thai@amd.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41493>
This allows rusticl to use the native fma instructions, which gives
better OpenCL performance.
For example, ProjectPhysX_OpenCL-Benchmark on my GA102:
FP32 0.610 -> 11.474 TFLOPs/s
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41720>
Vulkan dispatch commands should multiply group count by local size,
but pre-compiled dispatches should not. For example, the predicate
indirect shader has a local size of 32 and a grid size equal to the
max draw count, which was resulting in a dispatch total grid size of
(max_draw_count * 32), overrunning the buffer.
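The difference between the two dispatch kinds can be sketched as follows. The function names are invented for this example and do not correspond to the driver's actual entry points.

```c
#include <stdint.h>

/* Vulkan-style dispatch: the caller passes a workgroup count, so the
 * total number of invocations is group_count * local_size. */
static uint32_t
total_invocations_from_groups(uint32_t group_count, uint32_t local_size)
{
   return group_count * local_size;
}

/* Pre-compiled dispatch: the caller already passes the total grid size,
 * so it must not be multiplied by the local size again. */
static uint32_t
total_invocations_from_grid(uint32_t grid_size)
{
   return grid_size;
}
```

For a grid size equal to a max draw count of 100 and a local size of 32, the first form would launch 3200 invocations where only 100 are wanted.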
Reviewed-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41740>
Ported from HoneyKrisp upload pools, with some adaptations such as
providing the MTLBuffer and offset for use with certain commands.
Reviewed-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41740>
The register values will depend on new fields in PS_STATE and it doesn't
seem like dynamic state belongs in radv_emit_fragment_shader_state.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41689>
to reduce the number of initialized PS VGPRs, increasing the PS wave launch
rate.
The pass will have more RADV-specific stuff.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41689>
These are SGPR inputs, so they are uniform in subgroups but may
have different values in different subgroups.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41584>
These intrinsics are generally divergent between different
subgroups, but they can be uniform when all their sources
are also uniform.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41584>
AMD SMEM instructions are always uniform within a subgroup,
but they may be divergent across subgroups, i.e. each subgroup
may get a different value from the same SMEM instruction.
This needs to be considered for divergence across subgroups
as well as for vertex divergence, because vertices of the
same primitive may be split between different waves.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41584>
There is no hardware which supports ffmaz with denorms.
We also need this to be separate because there is AMD hardware
with ffma but not ffmaz.
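For readers unfamiliar with ffmaz, the sketch below shows how its zero handling differs from a plain multiply-add: anything multiplied by zero, even infinity or NaN, contributes zero. ffmaz_sketch is a name invented here, and it uses an unfused a * b + c, so the single-rounding behaviour of a real fma is an assumption left unmodelled.

```c
#include <math.h>

/* Sketch of ffmaz zero handling: if either multiplicand is zero, the
 * product is treated as zero regardless of the other operand. The
 * non-zero path uses an unfused multiply and add for simplicity. */
static float
ffmaz_sketch(float a, float b, float c)
{
   if (a == 0.0f || b == 0.0f)
      return 0.0f + c;
   return a * b + c;
}
```

With a plain multiply-add, 0 * INFINITY + 1 is NaN, while the sketch returns 1 for the same inputs.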
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41649>
This change adds the DRAW_ATTR_NONEXIST functionality
which fixes the memory access issue.
For instance, this issue is triggered with "piglit/bin/glsl-routing -auto -fbo":
==8384==ERROR: AddressSanitizer: heap-use-after-free on address 0xa11dfd84 at pc 0xae573fbd bp 0xbf87f688 sp 0xbf87f67c
READ of size 4 at 0xa11dfd84 thread T0
#0 0xae573fbc in emit_hw_vertex ../src/gallium/drivers/i915/i915_prim_emit.c:92
#1 0xae574ab0 in emit_prim ../src/gallium/drivers/i915/i915_prim_emit.c:154
#2 0xae574ab0 in setup_tri ../src/gallium/drivers/i915/i915_prim_emit.c:160
#3 0xad65d322 in do_triangle ../src/gallium/auxiliary/draw/draw_pipe.c:173
#4 0xad65d322 in pipe_run_linear ../src/gallium/auxiliary/draw/draw_decompose_tmp.h:181
#5 0xad663375 in draw_pipeline_run_linear ../src/gallium/auxiliary/draw/draw_pipe.c:337
#6 0xad86d9ac in pipeline ../src/gallium/auxiliary/draw/draw_pt_fetch_shade_pipeline_llvm.c:476
#7 0xad86d9ac in llvm_pipeline_generic ../src/gallium/auxiliary/draw/draw_pt_fetch_shade_pipeline_llvm.c:701
#8 0xad86ed75 in llvm_middle_end_linear_run ../src/gallium/auxiliary/draw/draw_pt_fetch_shade_pipeline_llvm.c:784
#9 0xad6aaaee in vsplit_segment_simple_linear ../src/gallium/auxiliary/draw/draw_pt_vsplit_tmp.h:223
#10 0xad6aaaee in vsplit_run_linear ../src/gallium/auxiliary/draw/draw_split_tmp.h:64
#11 0xad68a74b in draw_pt_arrays ../src/gallium/auxiliary/draw/draw_pt.c:161
#12 0xad68b7ca in draw_pt_arrays_restart ../src/gallium/auxiliary/draw/draw_pt.c:430
#13 0xad68b7ca in draw_instances ../src/gallium/auxiliary/draw/draw_pt.c:491
#14 0xad68ce0a in draw_vbo ../src/gallium/auxiliary/draw/draw_pt.c:628
#15 0xae5651d4 in i915_draw_vbo ../src/gallium/drivers/i915/i915_context.c:115
#16 0xae5651d4 in i915_draw_vbo ../src/gallium/drivers/i915/i915_context.c:51
#17 0xac7f50d3 in _mesa_draw_arrays ../src/mesa/main/draw.c:1204
Fixes: 247cee92df ("i915g: replace "uint" with normal uint32_t.")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27571>
* `msl_ensure_vertex_point_size_output` is missing some NIR pass logic
after being converted to use lowered I/O, and only needs to be called
for point primitives.
* `nir_separate_merged_clip_cull_io` requires the `compact_arrays` NIR
option be enabled for compact clip/cull distance arrays.
Reviewed-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41741>
We now handle that specific form on F2F and F2I but were still
asserting.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Fixes: 67bfbc7535 ("nak: rework swizzling on scalar FP16 ops")
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41771>
Let-parameters (e.g. hasRasterization, hasTessellation for
VkGraphicsPipelineCreateInfo) are local variables initialized by
reading a value from the stream. The codegen was creating them
with isConst=True, which caused streamPrimitive() to emit a cast
like:
    hasRasterization = (const uint32_t)vkStream->getBe32();
The const qualifier on a scalar rvalue cast result is discarded
and triggers -Werror=ignored-qualifiers once that flag is enabled
for Soong compatibility, breaking the build:
    src/gfxstream/guest/vulkan_enc/goldfish_vk_marshaling_guest.cpp:
    In function 'void gfxstream::vk::unmarshal_VkGraphicsPipelineCreateInfo(
    VulkanStreamGuest*, VkStructureType, VkGraphicsPipelineCreateInfo*)':
    goldfish_vk_marshaling_guest.cpp:4202:28: error: type qualifiers
    ignored on cast result type [-Werror=ignored-qualifiers]
     4202 | hasRasterization = (const uint32_t)vkStream->getBe32();
          |                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    goldfish_vk_marshaling_guest.cpp:4207:27: error: type qualifiers
    ignored on cast result type [-Werror=ignored-qualifiers]
     4207 | hasTessellation = (const uint32_t)vkStream->getBe32();
          |                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    cc1plus: some warnings being treated as errors
Mark the let-param as non-const since it is a local that gets
assigned.
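The shape of the fixed output can be sketched in C as below (the generated marshaling code itself is C++). stream_sketch, stream_get_be32 and read_has_rasterization are hypothetical stand-ins for the stream type and its getBe32() method; only the plain, unqualified cast mirrors the change described above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal byte stream, standing in for the guest stream. */
struct stream_sketch {
   const uint8_t *data;
   size_t pos;
};

/* Read a big-endian 32-bit value and advance the stream. */
static uint32_t
stream_get_be32(struct stream_sketch *s)
{
   const uint8_t *p = s->data + s->pos;
   s->pos += 4;
   return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
          ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* The let-parameter is a local that gets assigned, so both the variable
 * and the cast are non-const. */
static uint32_t
read_has_rasterization(struct stream_sketch *s)
{
   uint32_t hasRasterization;
   hasRasterization = (uint32_t)stream_get_be32(s);
   return hasRasterization;
}
```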
Fixes: 190ce8280f ("meson: Add Soong compatibility compiler flags to Vulkan drivers")
Assisted-by: Claude Code (Opus 4.7)
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Gurchetan Singh <gurchetan.singh.foss@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41763>
The FS compilation needs the VUE map from the previous stage when the FS
has more inputs than SBE_SWIZ can remap. Recomputing a VUE map just
from FS inputs loses certain slots like multiview slots and extra
position slots (for primitive replication), so high-numbered attributes
can read the wrong source.
When available, pass the previous stage VUE map to the FS compilation
and use it. Make sure that the payload is sized based on what is read,
in case the previous stage has more outputs than FS reads.
Bugs did not surface when there were just 16 or fewer varying inputs,
because the driver can program SBE_SWIZ to translate the positions in
the previous shader VUE into what the FS wants. For more inputs, this
mapping is not used, and the FS must use the exact same slots.
Note this is not a problem for pipeline libraries because they use
a different fixed layout. This is also not an issue with
EXT_shader_object because multiview draws are not allowed with that
extension.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41747>