Commit graph

186646 commits

Author SHA1 Message Date
Rebecca Mckeever
ddbbc1d217 panvk: Update panvk_get_desc_stride prototype
This will help set things up for multiplane samplers and textures.

Signed-off-by: Rebecca Mckeever <rebecca.mckeever@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32563>
2025-02-08 07:48:40 +00:00
Rebecca Mckeever
9e5b6370c0 panvk: Create helper function for sampler descriptor emission
This will help set things up for multiplane samplers.

Signed-off-by: Rebecca Mckeever <rebecca.mckeever@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32563>
2025-02-08 07:48:40 +00:00
Rebecca Mckeever
339c58f21f panvk: Change immutable_samplers to panvk_sampler **
We will need vk_sampler for colorspace conversion.

Signed-off-by: Rebecca Mckeever <rebecca.mckeever@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32563>
2025-02-08 07:48:40 +00:00
Rebecca Mckeever
53df2c2260 panvk: Move single-plane views of multiplane formats to pview.planes[0]
Place the view plane at index 0 for single-plane views of multiplane
formats. Does not apply to YCbCr views of multiplane images since
view->vk.aspects for those will contain the full set of plane aspects.

Signed-off-by: Rebecca Mckeever <rebecca.mckeever@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32563>
2025-02-08 07:48:40 +00:00
Rebecca Mckeever
9c4b530c49 panvk: Allow a 32-bit binding value in desc id key and use 64-bit keys
Since the binding value can be any 32-bit number, we cannot assume that
it is <= 27 bits. We need 64-bit keys to accommodate a 32-bit binding.

This will also provide more bits to store the subdesc id, which will be
needed for multiplane texture and sampler descriptors.

Fixes: 7bea6f86 ("panvk: Overhaul the Bifrost descriptor set implementation")

Signed-off-by: Rebecca Mckeever <rebecca.mckeever@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32563>
2025-02-08 07:48:40 +00:00
Rebecca Mckeever
1d0f44739d util/hash_table: Add _mesa_hash_table_u64_replace()
This function updates the data of a u64 hash_table entry and is safe to
use inside a hash_table_u64_foreach() loop.

Fixes: 7bea6f86 ("panvk: Overhaul the Bifrost descriptor set implementation")

Signed-off-by: Rebecca Mckeever <rebecca.mckeever@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32563>
2025-02-08 07:48:40 +00:00
Rebecca Mckeever
3b5114a34b vk/meta: Extend copy/fill/update helpers to support YCbCr
Since copies happen one plane at a time, we can handle multiplanar copies
like color copies. The user gets to decide the format to use for each
plane, but the pipeline type and the optimal tile size applies to the
whole image.

Signed-off-by: Rebecca Mckeever <rebecca.mckeever@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32563>
2025-02-08 07:48:40 +00:00
Kenneth Graunke
d06c3e21ac brw: Drop unnecessary mlen/header_size on virtual GET_BUFFER_SIZE op
The logical send lowering code sets these, and is the code which
-should- set these.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33297>
2025-02-08 01:07:22 +00:00
Kenneth Graunke
37a6278c9f brw: Drop INTERPOLATE_AT mlen handling from size_read()
FS_OPCODE_INTERPOLATE_AT_{SAMPLE,SHARED_OFFSET} never have a mlen set.
They are lowered to SHADER_OPCODE_SEND in logical send lowering, at
which point they acquire an mlen, but cease to be those opcodes.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33297>
2025-02-08 01:07:22 +00:00
Kenneth Graunke
ae60338142 brw: Lower MEMORY_FENCE and INTERLOCK in lower_logical_sends
We teach lower_logical_sends to lower these to SHADER_OPCODE_SEND
and drop all the corresponding generator and eu_emit code.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33297>
2025-02-08 01:07:22 +00:00
Kenneth Graunke
7b4e31b243 brw: Add latencies for HDC/RC memory fences
We're about to start lowering these in the IR, at which point the
scheduler will see SEND instructions with fence messages.  Previously,
we handled those in the generator, and didn't handle the virtual opcodes
here, letting them fall through to the default case of 14 cycles.

These new numbers are completely fabricated, matching the times we have
for atomic operations.  This is basically what we did for LSC atomics.
While it may not be accurate, it's at least better than 14 cycles.

Acked-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33297>
2025-02-08 01:07:22 +00:00
Kenneth Graunke
b9de19f917 brw: Eliminate the BTI source from MEMORY_FENCE/INTERLOCK opcodes
Memory fences do not refer to an element of a binding table.  Rather,
the reason we had "BTI" in these opcodes was to distinguish what in
modern terms are called UGM (untyped memory data cache) vs. SLM
(cross-thread shared local memory) fences.

Icelake and older platforms used the "data cache" SFID for both
purposes, distinguishing them by having a special binding table
index, 254, meaning "this is actually SLM access".  This is where
the notion that fences had BTIs came in.  (In fact, prior to Icelake,
separate SLM fences were not a thing, so BTI wasn't used there either.)

To avoid confusion about BTI being involved, we choose a simpler lie: we
have Icelake SLM fences target GFX12_SFID_SLM (like modern platforms
would), even though it didn't really exist back then.  Later lowering
code sets it back to the correct Data Cache SFID with magic SLM binding
table index.  This eliminates BTI everywhere and an unnecessary source.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33297>
2025-02-08 01:07:22 +00:00
Kenneth Graunke
43d0ac9eb4 brw: Change destination of memory fences to UD type
For some reason, we were using UW type for the destination of memory
fences at the generator level, while in the IR we selected UD.

There are some comments in the documentation for the message about it
writing the notification register to the destination, which is 32-bit.
Prior to Xe2, bits 31:16 were Reserved/MBZ.  But on Xe2, all 32 bits
are populated with actual data.

I don't know whether this will fix anything in practice, but it seems
like a better plan to use UD.  Often we used UW types to avoid having
the destination region of sends span too many registers, but we're in
SIMD1 here, so it shouldn't matter.

Acked-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33297>
2025-02-08 01:07:22 +00:00
Kenneth Graunke
c0a32af125 brw: Use correct builder size for MEMORY_FENCE/INTERLOCK virtual opcodes
brw_memory_fence() overrides the instructions generated by the
MEMORY_FENCE or INTERLOCK opcodes to be force_writemask_all with
exec_size == 1.  But the IR was emitting it in SIMD8 (regardless
of dispatch width).  Instead, just emit the IR as SIMD1/NoMask so
the IR matches what we actually generate.  Have size_written indicate
that the entire destination is written, however, as it is ultimately
going to be a SEND that writes a whole register.

We were also using a UD register for the source of
FS_OPCODE_SCHEDULING_FENCE when the generator overrides it to UW,
so just specify UW in the IR as well so that they line up.

Also add validation for MEMORY_FENCE/INTERLOCK that we've done the
exec_size and masking right in the IR.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33297>
2025-02-08 01:07:22 +00:00
Kenneth Graunke
accef5e8f5 brw: Replace fs_inst::target field with logical FB read/write sources
We can just specify this as a source to the logical FB read/write
opcodes.  Notably FB reads had no sources before; now they have one.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33297>
2025-02-08 01:07:22 +00:00
Kenneth Graunke
32dd722ff3 brw: Replace fs_inst::last_rt with a logical control source
Rather than using a bit in the generic fs_inst data structure, we can
simply set a source on our logical FB write messages.  (We already do
so for many other cases.)

In the repclear shader, setting this wasn't actually having an effect,
as we were setting it on a SHADER_OPCODE_SEND message which ignored it.
(We had already correctly set the bit in the message descriptor.)

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33297>
2025-02-08 01:07:22 +00:00
Kenneth Graunke
fce01b8461 brw: Drop FB_WRITE_LOGICAL_SRC_DST_DEPTH source
This was used for legacy depth passthrough on older hardware.  Gfx9+
doesn't actually have dst depth as part of the message, which is the
only hardware brw supports these days.

It sure looks like we were setting it though...

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33297>
2025-02-08 01:07:22 +00:00
Kenneth Graunke
7390d6189c brw: Replace fs_inst::pi_noperspective with a logical control source
We already have logical pixel interpolator messages that get lowered
to send messages.  We can just add an extra boolean source to those
opcodes rather than sticking a opcode-specific boolean in the generic
fs_inst data structure.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33297>
2025-02-08 01:07:22 +00:00
Kenneth Graunke
168ac07ffd brw: Eliminate fs_inst::shadow_compare
brw_lower_logical_sends can just check for the TEX_LOGICAL_SRC_SHADOW_C
source; we don't need a generic instruction bit for this.  We used to
have one because this was handled in the generator for older hardware
before the advent of logical opcode lowering.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33297>
2025-02-08 01:07:22 +00:00
Kenneth Graunke
df836ee895 brw: Drop unused defines
Nothing uses these.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33297>
2025-02-08 01:07:22 +00:00
Ian Romanick
9c133fe638 crocus: Use nir_shader_intrinsics_pass in crocus_lower_storage_image_derefs
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Suggested-by: Lionel
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33450>
2025-02-07 23:20:16 +00:00
Ian Romanick
d2458f964f iris: Use nir_shader_intrinsics_pass in iris_lower_storage_image_derefs
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Suggested-by: Lionel
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33450>
2025-02-07 23:20:16 +00:00
Ian Romanick
40948b9715 crocus: Add missing nir_metadata_preserve in crocus_lower_storage_image_derefs
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Fixes: f3630548f1 ("crocus: initial gallium driver for Intel gfx 4-7")
Closes: #12589
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33450>
2025-02-07 23:20:16 +00:00
Ian Romanick
f2a01be57e iris: Add missing nir_metadata_preserve in iris_lower_storage_image_derefs
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Fixes: 26a54ae4b2 ("iris: lower storage image derefs")
Closes: #12589
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33450>
2025-02-07 23:20:16 +00:00
Benjamin Lee
08cd331cc0 panvk: implement VK_EXT_separate_stencil_usage
Needed for Vulkan 1.2.

The only real improvement from this is that in some situations we can
skip creating texture descriptors for image views that have a more
restrictive usage for either the depth or stencil aspect.

Signed-off-by: Benjamin Lee <benjamin.lee@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33392>
2025-02-07 12:54:33 -08:00
Aaron Ruby
2553d60d47 gfxstream: Add common interfaces in the VirtGpuDevice to query DrmInfo
and PciBusInfo

- Advertise the availability of these extensions, fully implemented as
guestOnly features

Reviewed-By: Gurchetan Singh <gurchetansingh@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33363>
2025-02-07 17:08:34 +00:00
Aaron Ruby
94f8244ac8 gfxstream: Change "mesaOnly" nomenclature to be "guestOnly"
This refers to extensions that are fully implemented by the guest driver

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33363>
2025-02-07 17:08:34 +00:00
Aaron Ruby
5d2c0cc526 gfxstream: Make the virtgpu device discovery for LinuxVirtGpu more robust
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33363>
2025-02-07 17:08:34 +00:00
Eric R. Smith
e550a3cab0 panfrost: avoid potential divide by 0 calculating timer_resolution
On armhf integer divide by 0 can raise SIGFPE, whereas on aarch64
it just returns 0. This has become an issue because the recently
added panfrost_init_screen_caps always calls pan_gpu_time_to_ns to
calculate caps->timer_resolution, whereas before we only called it
when PIPE_CAP_TIMER_RESOLUTION was queried, and only OpenCL
does that (and not always).

Fixes: 205669e3a9 ("panfrost: add panfrost_init_screen_caps")
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33435>
2025-02-07 14:51:57 +00:00
Erik Faye-Lund
2ae97a4eb6 panvk: correct number of read bytes for dynamic buffers
This function takes the number of bytes, not number of entries. This
should hopefully fix start-up issues on Citra.

While we're at it, fixup the alignment of the line that writes the
bytes.

Fixes: 27beadcbdb ("panvk: Extend the shader logic to support Valhall")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12539
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33429>
2025-02-07 14:34:11 +00:00
Rhys Perry
c3d27906d8 radv: vectorize lowered shader IO
fossil-db (navi31):
Totals from 2329 (2.93% of 79377) affected shaders:
MaxWaves: 72152 -> 72102 (-0.07%)
Instrs: 1048791 -> 1041920 (-0.66%); split: -0.72%, +0.07%
CodeSize: 5331832 -> 5285572 (-0.87%); split: -0.90%, +0.03%
VGPRs: 113844 -> 113820 (-0.02%); split: -0.14%, +0.12%
Latency: 4349524 -> 4346374 (-0.07%); split: -0.35%, +0.28%
InvThroughput: 609449 -> 609235 (-0.04%); split: -0.27%, +0.24%
VClause: 22613 -> 22451 (-0.72%); split: -1.03%, +0.31%
SClause: 21197 -> 21177 (-0.09%); split: -0.45%, +0.35%
Copies: 81900 -> 82446 (+0.67%); split: -1.51%, +2.18%
PreSGPRs: 94697 -> 93596 (-1.16%); split: -1.23%, +0.07%
PreVGPRs: 69962 -> 70080 (+0.17%); split: -0.01%, +0.18%
VALU: 625247 -> 625390 (+0.02%); split: -0.23%, +0.25%
SALU: 101692 -> 101555 (-0.13%); split: -0.24%, +0.11%
VMEM: 46459 -> 44845 (-3.47%)

fossil-db (navi21):
Totals from 17522 (22.07% of 79377) affected shaders:
MaxWaves: 425698 -> 425460 (-0.06%); split: +0.00%, -0.06%
Instrs: 11444215 -> 11428321 (-0.14%); split: -0.14%, +0.00%
CodeSize: 59227492 -> 59019376 (-0.35%); split: -0.35%, +0.00%
VGPRs: 780920 -> 781208 (+0.04%); split: -0.00%, +0.04%
Latency: 44965072 -> 44926529 (-0.09%); split: -0.12%, +0.03%
InvThroughput: 9718148 -> 9728793 (+0.11%); split: -0.01%, +0.12%
VClause: 225732 -> 225605 (-0.06%); split: -0.10%, +0.04%
SClause: 217196 -> 217160 (-0.02%); split: -0.03%, +0.01%
Copies: 1050351 -> 1065263 (+1.42%); split: -0.03%, +1.45%
PreSGPRs: 747538 -> 747223 (-0.04%); split: -0.05%, +0.01%
PreVGPRs: 626702 -> 626748 (+0.01%); split: -0.00%, +0.01%
VALU: 6629403 -> 6643822 (+0.22%); split: -0.01%, +0.23%
SALU: 1898492 -> 1898452 (-0.00%); split: -0.00%, +0.00%
VMEM: 529942 -> 528361 (-0.30%)

fossil-db (vega10):
Totals from 1791 (2.84% of 62962) affected shaders:
MaxWaves: 12270 -> 12253 (-0.14%); split: +0.01%, -0.15%
Instrs: 602026 -> 597473 (-0.76%); split: -0.83%, +0.08%
CodeSize: 3109872 -> 3071664 (-1.23%); split: -1.26%, +0.03%
SGPRs: 137826 -> 137938 (+0.08%); split: -0.10%, +0.19%
VGPRs: 70364 -> 70520 (+0.22%); split: -0.03%, +0.26%
Latency: 4757850 -> 4781905 (+0.51%); split: -0.35%, +0.86%
InvThroughput: 2296941 -> 2310685 (+0.60%); split: -0.14%, +0.74%
VClause: 14161 -> 14050 (-0.78%); split: -1.23%, +0.44%
SClause: 14058 -> 14077 (+0.14%); split: -0.57%, +0.70%
Copies: 40954 -> 42191 (+3.02%); split: -1.69%, +4.71%
PreSGPRs: 64314 -> 63214 (-1.71%); split: -1.81%, +0.10%
PreVGPRs: 53558 -> 53894 (+0.63%); split: -0.01%, +0.64%
VALU: 449920 -> 450830 (+0.20%); split: -0.19%, +0.39%
SALU: 32973 -> 32839 (-0.41%); split: -0.76%, +0.35%
VMEM: 28796 -> 25151 (-12.66%)

fossil-db (polaris10):
Totals from 1769 (2.86% of 61794) affected shaders:
MaxWaves: 12024 -> 12021 (-0.02%)
Instrs: 474761 -> 470760 (-0.84%); split: -0.94%, +0.10%
CodeSize: 2447964 -> 2420712 (-1.11%); split: -1.15%, +0.04%
SGPRs: 129664 -> 129728 (+0.05%); split: -0.14%, +0.19%
VGPRs: 65216 -> 65560 (+0.53%); split: -0.05%, +0.58%
Latency: 4304734 -> 4318319 (+0.32%); split: -0.41%, +0.72%
InvThroughput: 2114950 -> 2122580 (+0.36%); split: -0.18%, +0.54%
VClause: 10933 -> 10808 (-1.14%); split: -1.42%, +0.27%
SClause: 11430 -> 11446 (+0.14%); split: -0.70%, +0.84%
Copies: 32290 -> 31891 (-1.24%); split: -2.80%, +1.56%
PreSGPRs: 58184 -> 57096 (-1.87%); split: -1.98%, +0.11%
PreVGPRs: 48757 -> 48874 (+0.24%); split: -0.02%, +0.26%
VALU: 359097 -> 358582 (-0.14%); split: -0.25%, +0.11%
SALU: 26279 -> 25934 (-1.31%); split: -1.75%, +0.43%
VMEM: 18825 -> 17247 (-8.38%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29242>
2025-02-07 13:52:57 +00:00
Rhys Perry
953faac23e radv: vectorize descriptor loads
fossil-db (navi31):
Totals from 49237 (62.03% of 79377) affected shaders:
MaxWaves: 1497901 -> 1497851 (-0.00%); split: +0.00%, -0.00%
Instrs: 25766029 -> 25595468 (-0.66%); split: -0.68%, +0.02%
CodeSize: 133811412 -> 132616356 (-0.89%); split: -0.90%, +0.01%
VGPRs: 2318068 -> 2318200 (+0.01%); split: -0.00%, +0.01%
SpillSGPRs: 4512 -> 4507 (-0.11%); split: -0.64%, +0.53%
Latency: 164086813 -> 163869930 (-0.13%); split: -0.22%, +0.09%
InvThroughput: 24811220 -> 24802709 (-0.03%); split: -0.05%, +0.02%
VClause: 553717 -> 557194 (+0.63%); split: -0.30%, +0.93%
SClause: 723038 -> 710431 (-1.74%); split: -2.69%, +0.95%
Copies: 1709226 -> 1711030 (+0.11%); split: -0.48%, +0.59%
Branches: 465169 -> 465164 (-0.00%); split: -0.00%, +0.00%
PreSGPRs: 1775360 -> 1961282 (+10.47%); split: -0.01%, +10.48%
VALU: 15418039 -> 15417896 (-0.00%); split: -0.02%, +0.02%
SALU: 2424519 -> 2416263 (-0.34%); split: -0.61%, +0.26%
SMEM: 1245273 -> 1121006 (-9.98%)
VOPD: 3882 -> 3885 (+0.08%); split: +0.18%, -0.10%

fossil-db (navi21):
Totals from 48539 (61.15% of 79377) affected shaders:
MaxWaves: 1262958 -> 1262912 (-0.00%); split: +0.00%, -0.01%
Instrs: 30334013 -> 30154279 (-0.59%); split: -0.60%, +0.01%
CodeSize: 161298192 -> 160027616 (-0.79%); split: -0.80%, +0.01%
VGPRs: 1979248 -> 1979192 (-0.00%); split: -0.01%, +0.01%
SpillSGPRs: 3751 -> 3776 (+0.67%); split: -0.75%, +1.41%
Latency: 185665578 -> 185429672 (-0.13%); split: -0.23%, +0.10%
InvThroughput: 41413438 -> 41406558 (-0.02%); split: -0.03%, +0.02%
VClause: 624116 -> 626703 (+0.41%); split: -0.30%, +0.71%
SClause: 775094 -> 764569 (-1.36%); split: -2.73%, +1.38%
Copies: 2437041 -> 2441758 (+0.19%); split: -0.23%, +0.42%
Branches: 770540 -> 770552 (+0.00%); split: -0.00%, +0.00%
PreSGPRs: 1919117 -> 2021093 (+5.31%); split: -0.01%, +5.32%
VALU: 18926346 -> 18926269 (-0.00%); split: -0.01%, +0.01%
SALU: 4316722 -> 4310066 (-0.15%); split: -0.33%, +0.17%
SMEM: 1350230 -> 1216865 (-9.88%)

fossil-db (vega10):
Totals from 41793 (66.38% of 62962) affected shaders:
MaxWaves: 306797 -> 306685 (-0.04%); split: +0.02%, -0.06%
Instrs: 16251398 -> 16140153 (-0.68%); split: -0.71%, +0.02%
CodeSize: 83407848 -> 82543596 (-1.04%); split: -1.05%, +0.01%
SGPRs: 2787936 -> 2854864 (+2.40%); split: -0.73%, +3.13%
VGPRs: 1585644 -> 1586156 (+0.03%); split: -0.01%, +0.05%
SpillSGPRs: 3856 -> 3843 (-0.34%); split: -1.50%, +1.17%
SpillVGPRs: 560 -> 562 (+0.36%)
Latency: 167478607 -> 166829429 (-0.39%); split: -0.50%, +0.12%
InvThroughput: 76378642 -> 76353650 (-0.03%); split: -0.06%, +0.03%
VClause: 361639 -> 362694 (+0.29%); split: -0.31%, +0.60%
SClause: 546919 -> 535879 (-2.02%); split: -2.98%, +0.96%
Copies: 1388817 -> 1396020 (+0.52%); split: -0.37%, +0.89%
Branches: 227697 -> 227705 (+0.00%); split: -0.00%, +0.00%
PreSGPRs: 1384316 -> 1532654 (+10.72%); split: -0.01%, +10.73%
VALU: 11896315 -> 11896547 (+0.00%); split: -0.01%, +0.01%
SALU: 1371452 -> 1370143 (-0.10%); split: -0.83%, +0.73%
VMEM: 628506 -> 628510 (+0.00%)
SMEM: 984495 -> 882129 (-10.40%)

fossil-db (polaris10):
Totals from 41057 (66.44% of 61794) affected shaders:
MaxWaves: 270307 -> 270311 (+0.00%); split: +0.02%, -0.01%
Instrs: 16082187 -> 15972163 (-0.68%); split: -0.71%, +0.02%
CodeSize: 82199592 -> 81341176 (-1.04%); split: -1.05%, +0.01%
SGPRs: 2894960 -> 2970720 (+2.62%); split: -0.67%, +3.29%
VGPRs: 1620132 -> 1620352 (+0.01%); split: -0.01%, +0.02%
SpillSGPRs: 3885 -> 3868 (-0.44%); split: -1.47%, +1.03%
SpillVGPRs: 617 -> 619 (+0.32%)
Latency: 166722696 -> 166066137 (-0.39%); split: -0.52%, +0.13%
InvThroughput: 76887856 -> 76862349 (-0.03%); split: -0.08%, +0.04%
VClause: 353499 -> 354709 (+0.34%); split: -0.28%, +0.62%
SClause: 544073 -> 533053 (-2.03%); split: -2.97%, +0.95%
Copies: 1398025 -> 1405848 (+0.56%); split: -0.30%, +0.86%
Branches: 224038 -> 224041 (+0.00%); split: -0.00%, +0.00%
PreSGPRs: 1362781 -> 1509495 (+10.77%); split: -0.01%, +10.77%
VALU: 11771997 -> 11772271 (+0.00%); split: -0.01%, +0.01%
SALU: 1416410 -> 1415708 (-0.05%); split: -0.72%, +0.68%
VMEM: 616867 -> 616871 (+0.00%)
SMEM: 970539 -> 869729 (-10.39%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29242>
2025-02-07 13:52:57 +00:00
Rhys Perry
3c5bcc5f7f nir/algebraic: optimize ishl(iadd(iadd(a, #b), c), #d)
fossil-db (navi31):
Totals from 671 (0.85% of 79377) affected shaders:
MaxWaves: 17048 -> 17052 (+0.02%); split: +0.04%, -0.01%
Instrs: 786643 -> 785459 (-0.15%); split: -0.20%, +0.05%
CodeSize: 4074988 -> 4069304 (-0.14%); split: -0.18%, +0.04%
VGPRs: 43896 -> 43860 (-0.08%); split: -0.11%, +0.03%
SpillSGPRs: 753 -> 748 (-0.66%)
Latency: 8187731 -> 8186707 (-0.01%); split: -0.11%, +0.10%
InvThroughput: 1274564 -> 1274582 (+0.00%); split: -0.11%, +0.11%
VClause: 14292 -> 14183 (-0.76%); split: -0.98%, +0.22%
SClause: 21527 -> 21426 (-0.47%); split: -0.53%, +0.06%
Copies: 59381 -> 59299 (-0.14%); split: -0.67%, +0.53%
PreSGPRs: 29358 -> 29349 (-0.03%)
PreVGPRs: 36595 -> 36368 (-0.62%); split: -0.70%, +0.08%
VALU: 482669 -> 481927 (-0.15%); split: -0.21%, +0.06%
SALU: 70019 -> 70009 (-0.01%); split: -0.06%, +0.05%
VOPD: 142 -> 139 (-2.11%)

fossil-db (navi21):
Totals from 671 (0.85% of 79377) affected shaders:
MaxWaves: 11536 -> 11516 (-0.17%); split: +0.03%, -0.21%
Instrs: 773615 -> 772476 (-0.15%); split: -0.18%, +0.03%
CodeSize: 4092564 -> 4086688 (-0.14%); split: -0.17%, +0.03%
VGPRs: 43424 -> 43448 (+0.06%); split: -0.04%, +0.09%
SpillSGPRs: 565 -> 560 (-0.88%)
Latency: 8650893 -> 8633993 (-0.20%); split: -0.31%, +0.11%
InvThroughput: 1920741 -> 1920368 (-0.02%); split: -0.10%, +0.08%
VClause: 15830 -> 15774 (-0.35%); split: -0.76%, +0.40%
SClause: 21025 -> 21009 (-0.08%); split: -0.11%, +0.03%
Copies: 65425 -> 65460 (+0.05%); split: -0.37%, +0.43%
Branches: 21845 -> 21848 (+0.01%)
PreSGPRs: 29457 -> 29448 (-0.03%)
PreVGPRs: 37296 -> 37066 (-0.62%); split: -0.69%, +0.08%
VALU: 516908 -> 516056 (-0.16%); split: -0.20%, +0.04%
SALU: 91545 -> 91531 (-0.02%); split: -0.05%, +0.03%

fossil-db (vega10):
Totals from 497 (0.79% of 62962) affected shaders:
MaxWaves: 2325 -> 2328 (+0.13%); split: +0.17%, -0.04%
Instrs: 298230 -> 297284 (-0.32%); split: -0.35%, +0.03%
CodeSize: 1535212 -> 1530636 (-0.30%); split: -0.34%, +0.04%
SGPRs: 36464 -> 36480 (+0.04%)
VGPRs: 29412 -> 29396 (-0.05%); split: -0.07%, +0.01%
SpillSGPRs: 164 -> 159 (-3.05%)
Latency: 3957230 -> 3948919 (-0.21%); split: -0.51%, +0.30%
InvThroughput: 1680680 -> 1679105 (-0.09%); split: -0.17%, +0.08%
VClause: 6175 -> 6102 (-1.18%); split: -1.55%, +0.37%
SClause: 9503 -> 9510 (+0.07%); split: -0.15%, +0.22%
Copies: 20992 -> 20892 (-0.48%); split: -0.97%, +0.50%
PreSGPRs: 17803 -> 17795 (-0.04%)
PreVGPRs: 23072 -> 22823 (-1.08%); split: -1.11%, +0.03%
VALU: 225322 -> 224587 (-0.33%); split: -0.36%, +0.04%
SALU: 21029 -> 21011 (-0.09%); split: -0.22%, +0.13%

fossil-db (polaris10):
Totals from 489 (0.79% of 61794) affected shaders:
Instrs: 299330 -> 298308 (-0.34%); split: -0.40%, +0.06%
CodeSize: 1529316 -> 1525440 (-0.25%); split: -0.32%, +0.07%
SpillSGPRs: 159 -> 149 (-6.29%)
Latency: 3924819 -> 3898471 (-0.67%); split: -0.93%, +0.25%
InvThroughput: 1687167 -> 1684956 (-0.13%); split: -0.22%, +0.09%
VClause: 6248 -> 6067 (-2.90%); split: -3.28%, +0.38%
SClause: 9519 -> 9492 (-0.28%); split: -0.72%, +0.44%
Copies: 21673 -> 21637 (-0.17%); split: -0.90%, +0.73%
PreSGPRs: 17611 -> 17603 (-0.05%)
PreVGPRs: 22873 -> 22625 (-1.08%)
VALU: 226805 -> 225928 (-0.39%); split: -0.45%, +0.06%
SALU: 21419 -> 21413 (-0.03%); split: -0.28%, +0.25%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29242>
2025-02-07 13:52:57 +00:00
Rhys Perry
150305bbb8 nir/load_store_vectorize: fix sorting of vectors in add_to_entry_key
fossil-db (navi31):
Totals from 13 (0.02% of 79377) affected shaders:
Instrs: 2997 -> 2990 (-0.23%); split: -0.77%, +0.53%
CodeSize: 16552 -> 16504 (-0.29%); split: -0.85%, +0.56%
Latency: 75923 -> 75744 (-0.24%); split: -0.30%, +0.06%
InvThroughput: 12741 -> 12754 (+0.10%); split: -0.14%, +0.24%
PreVGPRs: 225 -> 230 (+2.22%)
VALU: 1565 -> 1569 (+0.26%); split: -0.77%, +1.02%

fossil-db (navi21):
Totals from 13 (0.02% of 79377) affected shaders:
Instrs: 2522 -> 2518 (-0.16%); split: -0.75%, +0.59%
CodeSize: 14660 -> 14620 (-0.27%); split: -0.85%, +0.57%
Latency: 77878 -> 77634 (-0.31%); split: -0.36%, +0.05%
InvThroughput: 15512 -> 15518 (+0.04%); split: -0.15%, +0.19%
Copies: 230 -> 231 (+0.43%); split: -0.87%, +1.30%
PreVGPRs: 225 -> 230 (+2.22%)
VALU: 1536 -> 1541 (+0.33%); split: -0.91%, +1.24%

fossil-db (vega10):
Totals from 13 (0.02% of 62962) affected shaders:
Instrs: 2684 -> 2674 (-0.37%); split: -0.75%, +0.37%
CodeSize: 14784 -> 14752 (-0.22%); split: -0.65%, +0.43%
Latency: 118228 -> 118215 (-0.01%); split: -0.06%, +0.05%
InvThroughput: 42893 -> 42892 (-0.00%); split: -0.11%, +0.11%
SClause: 63 -> 62 (-1.59%)
PreVGPRs: 236 -> 241 (+2.12%)
VALU: 1665 -> 1666 (+0.06%); split: -0.72%, +0.78%

fossil-db (polaris10):
Totals from 9 (0.01% of 61794) affected shaders:
Instrs: 1872 -> 1885 (+0.69%); split: -0.16%, +0.85%
CodeSize: 9980 -> 10012 (+0.32%); split: -0.20%, +0.52%
Latency: 82331 -> 82382 (+0.06%); split: -0.01%, +0.07%
InvThroughput: 30603 -> 30686 (+0.27%)
SClause: 44 -> 45 (+2.27%)
Copies: 252 -> 256 (+1.59%)
PreVGPRs: 169 -> 173 (+2.37%)
VALU: 1100 -> 1117 (+1.55%); split: -0.27%, +1.82%
SALU: 430 -> 434 (+0.93%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29242>
2025-02-07 13:52:57 +00:00
Rhys Perry
5fe0012670 radv: DCE before nir_opt_shrink_vectors
fossil-db (navi31):
Totals from 941 (1.19% of 79377) affected shaders:
MaxWaves: 28422 -> 28502 (+0.28%)
Instrs: 645374 -> 642031 (-0.52%); split: -0.55%, +0.03%
CodeSize: 3264460 -> 3244488 (-0.61%); split: -0.63%, +0.02%
VGPRs: 48392 -> 48044 (-0.72%)
SpillVGPRs: 7 -> 0 (-inf%)
Scratch: 1792 -> 0 (-inf%)
Latency: 2596896 -> 2536952 (-2.31%); split: -2.33%, +0.02%
InvThroughput: 528726 -> 500139 (-5.41%); split: -5.42%, +0.01%
VClause: 14566 -> 14539 (-0.19%)
Copies: 53022 -> 51296 (-3.26%); split: -3.37%, +0.12%
PreSGPRs: 29369 -> 29367 (-0.01%)
PreVGPRs: 29710 -> 29694 (-0.05%)
VALU: 366134 -> 364245 (-0.52%); split: -0.53%, +0.02%
SALU: 74017 -> 73891 (-0.17%)
VMEM: 25240 -> 25208 (-0.13%)

fossil-db (navi21):
Totals from 941 (1.19% of 79377) affected shaders:
MaxWaves: 22018 -> 22058 (+0.18%); split: +0.28%, -0.10%
Instrs: 521053 -> 518898 (-0.41%); split: -0.43%, +0.02%
CodeSize: 2750628 -> 2734996 (-0.57%); split: -0.58%, +0.01%
VGPRs: 41152 -> 41024 (-0.31%); split: -0.41%, +0.10%
SpillVGPRs: 5 -> 0 (-inf%)
Scratch: 2048 -> 0 (-inf%)
Latency: 2655941 -> 2607022 (-1.84%); split: -1.86%, +0.02%
InvThroughput: 711733 -> 690032 (-3.05%); split: -3.07%, +0.02%
VClause: 16388 -> 16363 (-0.15%)
Copies: 35152 -> 33485 (-4.74%); split: -4.98%, +0.24%
PreSGPRs: 28486 -> 28484 (-0.01%)
PreVGPRs: 30317 -> 30301 (-0.05%)
VALU: 348423 -> 346614 (-0.52%); split: -0.54%, +0.02%
SALU: 44020 -> 43869 (-0.34%)
VMEM: 25216 -> 25195 (-0.08%)

fossil-db (vega10):
Totals from 416 (0.66% of 62962) affected shaders:
MaxWaves: 2687 -> 2696 (+0.33%); split: +0.37%, -0.04%
Instrs: 245634 -> 243501 (-0.87%); split: -0.88%, +0.01%
CodeSize: 1312836 -> 1297248 (-1.19%); split: -1.19%, +0.01%
VGPRs: 17684 -> 17692 (+0.05%); split: -0.43%, +0.48%
SpillVGPRs: 5 -> 0 (-inf%)
Scratch: 2048 -> 0 (-inf%)
Latency: 1928393 -> 1881346 (-2.44%); split: -2.44%, +0.00%
InvThroughput: 1163915 -> 1117096 (-4.02%); split: -4.03%, +0.00%
VClause: 7070 -> 7053 (-0.24%)
Copies: 22577 -> 20834 (-7.72%); split: -7.78%, +0.06%
Branches: 4328 -> 4320 (-0.18%)
PreSGPRs: 13993 -> 13991 (-0.01%)
PreVGPRs: 13452 -> 13436 (-0.12%)
VALU: 165253 -> 163366 (-1.14%); split: -1.15%, +0.01%
SALU: 26258 -> 26111 (-0.56%)
VMEM: 11736 -> 11715 (-0.18%)

fossil-db (polaris10):
Totals from 355 (0.57% of 61794) affected shaders:
Instrs: 108639 -> 108682 (+0.04%); split: -0.03%, +0.07%
CodeSize: 583804 -> 583936 (+0.02%); split: -0.03%, +0.06%
SGPRs: 17712 -> 17728 (+0.09%)
Latency: 735332 -> 734777 (-0.08%); split: -0.08%, +0.01%
InvThroughput: 443975 -> 444045 (+0.02%); split: -0.03%, +0.04%
VClause: 2552 -> 2558 (+0.24%)
SClause: 2394 -> 2393 (-0.04%)
Copies: 11433 -> 11464 (+0.27%); split: -0.15%, +0.42%
PreVGPRs: 7365 -> 7364 (-0.01%)
VALU: 69385 -> 69416 (+0.04%); split: -0.02%, +0.07%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29242>
2025-02-07 13:52:57 +00:00
Rhys Perry
d04e1ea02d radv: move nir_opt_shrink_vectors later
This seems to be helpful with shaders which use NGG culling.

fossil-db (navi21):
Totals from 3529 (4.45% of 79377) affected shaders:
MaxWaves: 81490 -> 82066 (+0.71%)
Instrs: 2868872 -> 2863476 (-0.19%); split: -0.22%, +0.04%
CodeSize: 14949540 -> 14927580 (-0.15%); split: -0.18%, +0.03%
VGPRs: 165440 -> 164144 (-0.78%)
SpillSGPRs: 578 -> 405 (-29.93%)
Latency: 15388119 -> 15151882 (-1.54%); split: -1.74%, +0.20%
InvThroughput: 2935873 -> 2929736 (-0.21%); split: -0.25%, +0.04%
VClause: 70192 -> 68904 (-1.83%); split: -2.17%, +0.33%
SClause: 67678 -> 67679 (+0.00%); split: -0.10%, +0.10%
Copies: 265824 -> 261458 (-1.64%); split: -1.96%, +0.32%
Branches: 75084 -> 75088 (+0.01%); split: -0.02%, +0.02%
PreSGPRs: 165962 -> 165716 (-0.15%)
PreVGPRs: 135122 -> 134724 (-0.29%)
VALU: 1681747 -> 1677134 (-0.27%); split: -0.32%, +0.05%
SALU: 436003 -> 435915 (-0.02%); split: -0.03%, +0.01%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29242>
2025-02-07 13:52:57 +00:00
Rhys Perry
f034aa9cd3 radv: don't use bit_sizes_int to skip nir_lower_bit_size
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29242>
2025-02-07 13:52:57 +00:00
Rhys Perry
19394f44df ac/nir: set memory_modes for lowered TES input loads
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29242>
2025-02-07 13:52:57 +00:00
Rhys Perry
0a699e16f9 nir/load_store_vectorize: handle load_buffer_amd/store_buffer_amd
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29242>
2025-02-07 13:52:57 +00:00
Rhys Perry
1fca72ddc8 ac/nir/ngg: update bit_sizes_int
This is used for RADV's bit size lowering.

fossil-db (navi21):
Totals from 4520 (5.69% of 79377) affected shaders:
(no stat changes)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29242>
2025-02-07 13:52:57 +00:00
Rhys Perry
cfa217ee04 nir/opt_offsets: don't check NUW for unswizzled buffer_amd
This isn't necessary.

fossil-db (navi21):
Totals from 13 (0.02% of 79377) affected shaders:
Instrs: 18070 -> 18042 (-0.15%); split: -0.17%, +0.01%
CodeSize: 98336 -> 98012 (-0.33%)
Latency: 72735 -> 72992 (+0.35%); split: -0.02%, +0.38%
InvThroughput: 13157 -> 13105 (-0.40%)
VClause: 334 -> 324 (-2.99%)
SClause: 563 -> 564 (+0.18%)
Copies: 1194 -> 1197 (+0.25%)
VALU: 12330 -> 12297 (-0.27%)

fossil-db (polaris10):
Totals from 10 (0.02% of 61794) affected shaders:
Instrs: 4543 -> 4441 (-2.25%)
CodeSize: 30196 -> 29388 (-2.68%)
Latency: 64290 -> 64272 (-0.03%); split: -0.05%, +0.02%
InvThroughput: 20371 -> 20362 (-0.04%); split: -0.08%, +0.04%
VClause: 195 -> 135 (-30.77%)
Copies: 97 -> 100 (+3.09%)
PreSGPRs: 178 -> 177 (-0.56%)
VALU: 1765 -> 1666 (-5.61%)
VMEM: 2448 -> 2445 (-0.12%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29242>
2025-02-07 13:52:57 +00:00
Rhys Perry
539f9b4ba6 nir,aco,radv: add align_mul/offset to buffer_amd intrinsics
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29242>
2025-02-07 13:52:57 +00:00
David Rosca
62b0f84981 ac/vcn_dec: Fix AV1 film grain on VCN5
Cc: mesa-stable
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33376>
2025-02-07 13:13:45 +00:00
Juan A. Suarez Romero
f3de2134dd broadcom/ci: add new failures/flakes
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33443>
2025-02-07 12:44:12 +00:00
Samuel Pitoiset
76dcac9d47 radv: advertise VK_KHR_cooperative_matrix on GFX12
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33378>
2025-02-07 12:06:10 +00:00
Samuel Pitoiset
b05a112d92 radv/nir: add cooperative matrix lowering for GFX12
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33378>
2025-02-07 12:06:10 +00:00
Samuel Pitoiset
ad611adeb7 radv/nir: add a struct for parameters to cooperative matrix lowering
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33378>
2025-02-07 12:06:10 +00:00
Samuel Pitoiset
baa09cb94a nir: adjust number of components for cmat_muladd_amd
On GFX12, A&B matrices can be vectors of 4 or 8 elements.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33378>
2025-02-07 12:06:10 +00:00
Karol Herbst
7c51ffe560 rusticl/mem: accelerate Buffer::write_rect
similar to the copy_rect change

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33426>
2025-02-07 10:36:28 +00:00
Karol Herbst
b76784e655 rusticl/mem: accelerate Buffer::copy_rect
It's doing a bunch of copies, but at least they are done on the GPU, not
the CPU.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33426>
2025-02-07 10:36:28 +00:00