Commit graph

213920 commits

Author SHA1 Message Date
Romaric Jodin
a25e88cd84 aux/trace: remove -I argument
enums2names.py is only uses in one place. I propose to remove the -I
argument that is not strictly necessary as we can already get the
header name from the `-H` argument.

That modification is motivated by the need to help ninja-to-soong to
generate proper rule for the Android build system.

ninja-to-soong can't differenciate output file location and a string
matching the output file name.

Ref #14072

Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37785>
2025-10-23 07:03:01 +00:00
Faith Ekstrand
aa0f404f7b nvk: Disable sampleLocationsSampleCounts for 1x MSAA
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Suggested-by: Mel Henning <mhenning@darkrefraction.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14108
Fixes: a34edc7500 ("nvk: Fill out sample locations on Maxwell B+")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38018>
2025-10-22 21:55:22 +00:00
Faith Ekstrand
d1793c7a59 nvk: Include the chipset in the pipeline/binary cache UUID
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38018>
2025-10-22 21:55:22 +00:00
Lionel Landwerlin
aa929ea706 nir/lower_io: add missing levels intrinsics to get_io_index_src_number
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: c7ac46a1d8 ("nir/lower_io: add get_io_index_src_number support for image intrinsics")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38012>
2025-10-22 21:21:47 +00:00
Aitor Camacho
cf49338ccd kk: Expose missing BC formats
Exposing textureCompressionBC requires all BC formats in Vulkan.
Missing BC formats added in this commit:
 - VK_FORMAT_BC1_RGB_UNORM_BLOCK
 - VK_FORMAT_BC1_RGB_SRGB_BLOCK

Fixes: fa6fa8c19e ("kk: BCn Formats")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38014>
2025-10-22 21:08:27 +00:00
Lionel Landwerlin
f3df267735 brw: handle GLSL/GLSL tessellation parameters
Apparently various tessellation parameters come specified from
TESS_EVAL stage in GLSL while they come from the TESS_CTRL stage in
HLSL.

We switch to store the tesselation params more like shader_info with 0
values for unspecified fields. That let's us merge it with a simple OR
with values from from tcs/tes and the resulting merge can be used for
state programming.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: a91e0e0d61 ("brw: add support for separate tessellation shader compilation")
Fixes: 50fd669294 ("anv: prep work for separate tessellation shaders")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37979>
2025-10-22 20:48:59 +00:00
Lionel Landwerlin
8d05b7b72e anv: rename structure holding 3DSTATE_WM_DEPTH_STENCIL state
Cc stable for the next commit.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37979>
2025-10-22 20:48:59 +00:00
Mel Henning
fafb81cd02 treewide: Use vk_collect_dependency_info_src_stages
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37961>
2025-10-22 19:22:17 +00:00
Mel Henning
90d17c0088 vulkan: Add vk_collect_dependency_info_src_stages
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37961>
2025-10-22 19:22:17 +00:00
Valentine Burley
fd2fa0fbc9 tu: Fix maxVariableDescriptorCount with inline uniform blocks
It must not be larger than maxInlineUniformBlockSize.

Fixes VKCTS 1.4.4.0's
dEQP-VK.api.maintenance3_check.support_count_inline_uniform_block*.

Cc: mesa-stable

Signed-off-by: Valentine Burley <valentine.burley@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38002>
2025-10-22 18:14:00 +00:00
Valentine Burley
17e25b4983 tu: Fix indexing with variable descriptor count
Based on RADV.
The Vulkan spec says:
    "If bindingCount is zero or if this structure is not included in
     the pNext chain, the VkDescriptorBindingFlags for each descriptor
     set layout binding is considered to be zero. Otherwise, the
     descriptor set layout binding at
     VkDescriptorSetLayoutCreateInfo::pBindings[i] uses the flags in
     pBindingFlags[i]."

Fixes dEQP-VK.api.maintenance3_check.* in VKCTS 1.4.4.0.

Cc: mesa-stable

Signed-off-by: Valentine Burley <valentine.burley@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38002>
2025-10-22 18:14:00 +00:00
Simon Perretta
ff51e6dc9e nir: commonize barycentric intrinsic opt pass
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Introduces an opt pass that attempts to optimize
load_barycentric_at_{sample,offset} with simpler load_barycentric_*
equivalents where possible, and optionally lowers
load_barycentric_at_sample to load_barycentric_at_offset with a position
derived from the sample ID instead.

Signed-off-by: Simon Perretta <simon.perretta@imgtec.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37658>
2025-10-22 16:48:01 +00:00
Caio Oliveira
e38491eb18 mesa/st: Lower to ALU scalar after fp64 subgroup lowering
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
The subgroup lowering may generate new fp64 vector operations, so
ensure that those are lowered before calling nir_lower_doubles().

Issue spotted by Georg Lehmann.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38003>
2025-10-22 15:35:34 +00:00
Isaac Marovitz
fa6fa8c19e kk: BCn Formats
Signed-off-by: Isaac Marovitz <isaacryu@icloud.com>
Reviewed-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37967>
2025-10-22 15:21:29 +00:00
Jarrett Johnson
a62f285607 kk: advertise multiDrawIndirect
Reviewed-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37962>
2025-10-22 15:08:13 +00:00
Olivia Lee
bca29b1c92 hk: fix data race when initializing poly_heap
hk_heap is called during command buffer recording, which may be
concurrent, so writing dev->heap without synchronization is a data race.

Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Fixes: 5bc8284816 ("hk: add Vulkan driver for Apple GPUs")
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37973>
2025-10-22 14:54:24 +00:00
Benjamin Cheng
b6d6c1af73 radv/video_enc: Cleanup slice count assert
This was left over when first enabling multiple slice encoding.

Fixes: 63e952ff2c ("radv/video: Support encoding multiple slices")
Reviewed-by: David Rosca <david.rosca@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37999>
2025-10-22 14:31:45 +00:00
Alyssa Rosenzweig
05481f56a0 brw: use the right int8/int16 division lowering
lowering bitsize before lowering idiv is silly, since then it forces us
down the software int32 division path instead of the much faster
int8/int16 lowered path. Relevant CTS tests:

dEQP-VK.spirv_assembly.type.scalar.i16.div_comp,
dEQP-VK.spirv_assembly.type.scalar.i8.rem_comp,

Go from:

SIMD8 shader: 46 instructions. 1 loops. 4716 cycles. 0:0 spills:fills
SIMD8 shader: 1008 instructions. 0 loops. 3600 cycles. 0:0 spills:fills, 8 sends

to:

SIMD8 shader: 17 instructions. 1 loops. 2556 cycles. 0:0 spills:fills
SIMD8 shader: 464 instructions. 0 loops. 1394 cycles. 0:0 spills:fills, 8 sends

No stats change on fossil-db (which has very little int8/int16 and even
less integer division, apparently).

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37966>
2025-10-22 10:00:36 -04:00
Aksel Hjerpbakk
c2a6fb6419 panvk: cull semaphores in unrelated subqueues
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Skip waiting/signaling on semaphores with stages not related
to a given subqueue

Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37810>
2025-10-22 11:48:50 +00:00
Aksel Hjerpbakk
1381228329 panvk: refactor vk_stage_to_subqueue_mask
- rename vk_stage_to_subqueue_mask -> vk_stages_to_subqueue_mask
- handle stage masks instead of single stages.
- Add which sync scope it is reading to better reason with the mask semantics.
- Handle ALL_COMMANDS as well as TOP/BOTTOM (using sync scopes)
- add timestamp utility vk_stage_to_timestamp_subqueue_mask

Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37810>
2025-10-22 11:48:50 +00:00
Julian Orth
9fde755357 kopper: disable color management for wayland surfaces
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Otherwise clients cannot use color management themselves.

Signed-off-by: Julian Orth <ju.orth@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37693>
2025-10-22 10:22:41 +00:00
Pierre-Eric Pelloux-Prayer
90103fe618 radeonsi: propagate shader updates for merged shaders
In case of merged shaders (eg: VS+GS), a change to VS should trigger
a GS update.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13935
Fixes: b1a34ac95d ("radeonsi: change do_update_shaders boolean to a bitmask")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37882>
2025-10-22 09:45:49 +00:00
Pierre-Eric Pelloux-Prayer
091f18ea57 radeonsi: set VS dirty bit from si_vs_key_update_inputs
Since it has to be done for every caller we might as well
do it from a single place.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37882>
2025-10-22 09:45:49 +00:00
Pierre-Eric Pelloux-Prayer
d3f8571e84 radeonsi: limit the sqtt buffer size
We don't want the buffer size to wraparound to 0.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37882>
2025-10-22 09:45:49 +00:00
Lars-Ivar Hesselberg Simonsen
a7bf37e3bd panvk/v9+: Reduce maxBoundDescriptorSets to 7
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Until now, the driver has been using a single set for internal state
while leaving maxBoundDescriptorSets to the remaining 15.

This gives us no room for optimizations of driver sets, which might
become an issue in the future.

To remedy this, we therefore reduce maxBoundDescriptorSets to 7. This
aligns with the proprietary driver and gives us the space to optimize
the driver sets.

We might increase this in the future if we see that we don't need all
the driver sets we now reserve.

Reviewed-by: John Anthony <john.anthony@arm.com>
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37978>
2025-10-22 10:38:35 +02:00
Eric Engestrom
4ab65cdaa4 docs: update/fix vk spec urls
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37993>
2025-10-22 09:23:34 +02:00
Faith Ekstrand
efbecd93ba util: Build util/cache_ops_x86.c with -msse2
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
__builtin_ia32_clflush() requires -msse2 so we need to set -msse2 at
least for building that file.  Fortunately, there are no GPUs that
actually need userspace cache flushing that can ever be bolted onto a
pre-SSE2 x86 CPUs.

Fixes: 555881e574 ("util/cache_ops: Add some cache flush helpers")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14134
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37958>
2025-10-22 03:36:10 +00:00
Faith Ekstrand
3739d7a90c util: Don't advertise cache ops on x86 without SSE2
Fixes: 555881e574 ("util/cache_ops: Add some cache flush helpers")
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37958>
2025-10-22 03:36:10 +00:00
Olivia Lee
a410d90fd2 panfrost: fix cl_local_size for precompiled shaders
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
nir_lower_compute_system_values will attempt to lower
load_workgroup_size unless workgroup_size_variable is set. For precomp
shaders, the workgroup size is set statically for each entrypoint by
nir_precompiled_build_variant. Because we call
lower_compute_system_values early, it sets the workgroup size to zero.
Temporarily setting workgroup_size_variable while we are still
processing all the entrypoints together inhibits this.

Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Fixes: 20970bcd96 ("panfrost: Add base of OpenCL C infrastructure")
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37799>
2025-10-22 00:15:49 +00:00
Aitor Camacho
02a1ec1021 kk: Hash vertex input state
We embed vertex fetching into vertex shaders and therefore
we require hashing that state so the hash works as expected.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37992>
2025-10-21 23:59:33 +00:00
Lorenzo Rossi
dc0dcc993b nvk: implement VK_EXT_discard_rectangles
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Signed-off-by: Lorenzo Rossi <git@rossilorenzo.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33476>
2025-10-21 23:17:38 +00:00
Lorenzo Rossi
4c62e09505 vulkan: increase MESA_VK_MAX_DISCARD_RECTANGLES
Turing and newer Nvidia cards can work with up to 8 discard rectangles

Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Signed-off-by: Lorenzo Rossi <git@rossilorenzo.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33476>
2025-10-21 23:17:38 +00:00
Rhys Perry
b18421ae3d amd/lower_mem_access_bit_sizes: fix shared access when bytes<bit_size/8
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
This can happen with (for example) 32x2 loads with
align_mul=4,align_offset=2.

This patch does bit_size=min(bit_size,bytes) to prevent num_components
from being 0.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: 52cd5f7e69 ("ac/nir_lower_mem_access_bit_sizes: Split unsupported shared memory instructions")
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37953>
2025-10-21 22:10:34 +00:00
Rhys Perry
64ec757688 nir/lower_mem_access_bit_sizes: increase chunk limit
Not sure about creating u64vec16 loads, but creating unaligned loads is
possible with opt_if_rewrite_uniform_uses.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37953>
2025-10-21 22:10:34 +00:00
Rhys Perry
e89b22280f amd/lower_mem_access_bit_sizes: be more careful with 8/16-bit scratch load
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Backport-to: 25.3
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37953>
2025-10-21 22:10:34 +00:00
Rhys Perry
8829fc3bd6 amd/lower_mem_access_bit_sizes: improve subdword/unaligned SMEM lowering
Summary of changes:
- handle unaligned 16-bit scalar loads when supported_dword=true
- increases the size of 8/16/32/64-bit buffer loads which are not dword
  aligned, which can create less SMEM loads.
- handles when "bytes" is less than "bit_size / 8"

fossil-db (gfx1201):
Totals from 26 (0.03% of 79839) affected shaders:
Instrs: 12676 -> 12710 (+0.27%); split: -0.30%, +0.57%
CodeSize: 67272 -> 67384 (+0.17%); split: -0.24%, +0.40%
Latency: 44399 -> 44375 (-0.05%); split: -0.09%, +0.04%
SClause: 352 -> 344 (-2.27%)
SALU: 3972 -> 3992 (+0.50%)
SMEM: 554 -> 528 (-4.69%)

fossil-db (navi21):
Totals from 6 (0.01% of 79825) affected shaders:
Instrs: 2192 -> 2186 (-0.27%)
CodeSize: 12188 -> 12140 (-0.39%)
Latency: 10037 -> 10033 (-0.04%); split: -0.12%, +0.08%
SMEM: 124 -> 118 (-4.84%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: fbf0399517 ("amd/lower_mem_access_bit_sizes: lower all SMEM instructions to supported sizes")
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37953>
2025-10-21 22:10:34 +00:00
Rhys Perry
79b2fa785d amd/lower_mem_access_bit_sizes: don't create subdword UBO loads with LLVM
These are unsupported.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14127
Fixes: fbf0399517 ("amd/lower_mem_access_bit_sizes: lower all SMEM instructions to supported sizes")
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37953>
2025-10-21 22:10:33 +00:00
Dylan Baker
38e1a43f53 intel/mda: Fix potential underflow in printing code
The actual chances of this happening seem dubious, but the cleaned up
code seems nice. printf returns a value >= 0 on success, which is the
number of characters it writes a return < 0 means that an error
occurred, and then errno is set. Which negative value doesn't seem to be
specified, but it also seems unlikely that any implementation would
return `-MAX_INT`...

Anyway, this is fixed by converting the generic `print_repeated` to a
`print_separator` that avoids the need to do arithmetic at all by just
stopping the loop at 1 instead of 0, and then printing a newline.

CID: 1666497
CID: 1666256
CID: 1666531
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37746>
2025-10-21 21:55:53 +00:00
Dylan Baker
f25e59b951 intel/mda/tests: use an ASSERT on fread()
Coverity is pointing out that we should check this, and in reality if
this isn't what we expect the rest of the test is probably invalid
anyway.

CID: 1666504
CID: 1666544
CID: 1666552
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37750>
2025-10-21 14:39:18 -07:00
Mel Henning
28fbc6addb nvk: VK_DEPENDENCY_ASYMMETRIC_EVENT_BIT_KHR
This was missed in the original maintenance9 MR.

Fixes the flakes in test
dEQP-VK.synchronization2.op.single_queue.event.write_ssbo_compute_read_ssbo_compute.buffer_16384_maintenance9

Fixes: 7692d3c0 ("nvk: Advertise VK_KHR_maintenance9")
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37964>
2025-10-21 20:57:41 +00:00
Karol Herbst
e7dca5a6ca nak: fix MMA latencies on Ampere
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Fixes: 7a01953a39 ("nak: Add Ampere and Ada latency information")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37941>
2025-10-21 20:12:30 +00:00
Karol Herbst
cf4df97093 nak: improve fp16 latencies on Ampere
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37941>
2025-10-21 20:12:30 +00:00
Karol Herbst
85480200f8 nak: simplify SM80 HMMA latency categorization
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37941>
2025-10-21 20:12:30 +00:00
Karol Herbst
3bbf3f7826 nak: ensure deref has a ptr_stride in cmat load/store lowering
With untyped pointer we might get a deref_cast with a 0 ptr_stride. But we
were supposed to ignore the stride information on the pointer anyway, so
let's do that properly now.

Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Fixes: 05dca16143 ("nak: extract nir_intrinsic_cmat_load lowering into a function")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37941>
2025-10-21 20:12:30 +00:00
Karol Herbst
f632bfc715 nak: extract cmat load/store element offset calculation
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Fixes: 05dca16143 ("nak: extract nir_intrinsic_cmat_load lowering into a function")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37941>
2025-10-21 20:12:30 +00:00
Konstantin Seurer
d423554e9e radv/bvh: Pair compress triangles in more cases
Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36965>
2025-10-21 19:32:55 +00:00
Konstantin Seurer
c0f332f1cb vulkan/bvh: Add leaf.h to vk_bvh_includes
Otherwise, the shader will not recompile when the file was modified.

Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36965>
2025-10-21 19:32:55 +00:00
Konstantin Seurer
020bd86d30 vulkan: Remove the vk_ir_triangle_node::id field
Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36965>
2025-10-21 19:32:55 +00:00
Konstantin Seurer
c18a7d0e2b radv: Emit compressed primitive nodes on GFX12
The normal encode pass writes batches to a section in build scratch
memory. Those batches contain information about the internal node and
the primitive nodes. The encoder is split to avoid the register
pressure of the compressor and maximize occupancy.

The compressor works in two passes because one pass can not guarantee
that every primitive node (except) has at least two triangles. This
guarantee is used to advertise a smaller acceleration structure size to
the application.

During compression, every invocation processes at most two triangles.
Groups of 8 invocations are used to support the maximum triangle count
of 16 that the hardware supports.

The first step of compression is loading the triangle(s). Shared
vertices are deduplicated early to avoid doing it in the compression
loop. The compression loop tries to add triangles to a list of triangles
until the computed node size needed for storing the triangles reaches
the hardware node size. For this, each invocation first deduplicates
vertices with the triangles that have already been picked. It then
computes the node size of the picked triangles plus the candidate
triangles of the current invocation. The invocation that computed the
smallest size is added to the list.

Because it may not be possible to fit every triangle into the same node,
there can be multiple hardware nodes which are written in parallel for
optimal performance. If there are no nodes with only one triangle, all
nodes are written. If there is, compression of the batch is aborted and
the index of the batch is written to build scratch memory. The second
compression pass will repeat the steps above but only for those aborted
batches. The nodes with only one triangle can and are now merged.

It can not be determined during box node encode which triangles will be
compressed together so the encoder also has to fix up the parent box
node's child infos.

Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36965>
2025-10-21 19:32:55 +00:00
Konstantin Seurer
c5f9fe5e3b radv/rra/gfx12: Properly validate geometry indices
Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36965>
2025-10-21 19:32:54 +00:00