Commit graph

218710 commits

Author SHA1 Message Date
Rhys Perry
f81aaee7f1 aco/ra: create vectors for affinities of split definitions
For example:
a = ...
b = ...
if {
   c, d = split
}
phi(a, c)
phi(b, d)

This patch will allocate 'a' and 'b' as a vector.

fossil-db (navi31):
Totals from 2556 (3.20% of 79825) affected shaders:
MaxWaves: 59957 -> 59955 (-0.00%)
Instrs: 9170941 -> 9154954 (-0.17%); split: -0.19%, +0.02%
CodeSize: 48245956 -> 48182620 (-0.13%); split: -0.15%, +0.02%
VGPRs: 189372 -> 189900 (+0.28%); split: -0.04%, +0.32%
Latency: 85469322 -> 85262360 (-0.24%); split: -0.32%, +0.08%
InvThroughput: 14515911 -> 14486970 (-0.20%); split: -0.27%, +0.07%
VClause: 197980 -> 197959 (-0.01%); split: -0.02%, +0.01%
Copies: 787838 -> 774288 (-1.72%); split: -1.91%, +0.19%
Branches: 271810 -> 271799 (-0.00%); split: -0.01%, +0.01%
VALU: 5331813 -> 5318566 (-0.25%); split: -0.28%, +0.03%
SALU: 1133559 -> 1133054 (-0.04%); split: -0.05%, +0.01%
VOPD: 2435 -> 2418 (-0.70%); split: +0.12%, -0.82%

fossil-db (navi21):
Totals from 37513 (46.99% of 79825) affected shaders:
Instrs: 26734825 -> 26681225 (-0.20%); split: -0.23%, +0.03%
CodeSize: 141353284 -> 141144360 (-0.15%); split: -0.17%, +0.02%
VGPRs: 1556760 -> 1556384 (-0.02%); split: -0.21%, +0.18%
Latency: 146201548 -> 146156473 (-0.03%); split: -0.20%, +0.17%
InvThroughput: 33921803 -> 33867398 (-0.16%); split: -0.23%, +0.07%
VClause: 502263 -> 502209 (-0.01%); split: -0.27%, +0.26%
SClause: 593142 -> 593155 (+0.00%); split: -0.00%, +0.00%
Copies: 2600995 -> 2551257 (-1.91%); split: -2.16%, +0.25%
Branches: 857910 -> 857787 (-0.01%); split: -0.03%, +0.02%
VALU: 15674532 -> 15625013 (-0.32%); split: -0.35%, +0.04%
SALU: 4635548 -> 4634680 (-0.02%); split: -0.04%, +0.02%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38262>
2026-02-16 19:39:43 +00:00
Rhys Perry
86f0195f5c aco/ra: prefer phi operands which don't create waitcnt
fossil-db (navi31):
Totals from 89 (0.11% of 79825) affected shaders:
Instrs: 343443 -> 343384 (-0.02%); split: -0.10%, +0.09%
CodeSize: 1792948 -> 1792668 (-0.02%); split: -0.10%, +0.08%
Latency: 2656294 -> 2656490 (+0.01%); split: -0.02%, +0.02%
InvThroughput: 517696 -> 517691 (-0.00%); split: -0.01%, +0.01%
SClause: 9213 -> 9215 (+0.02%); split: -0.01%, +0.03%
Copies: 39138 -> 39089 (-0.13%); split: -0.84%, +0.71%
Branches: 10863 -> 10872 (+0.08%); split: -0.05%, +0.13%
SALU: 49185 -> 49136 (-0.10%); split: -0.67%, +0.57%

fossil-db (navi21):
Totals from 34490 (43.21% of 79825) affected shaders:
Instrs: 23005853 -> 22956529 (-0.21%); split: -0.25%, +0.04%
CodeSize: 120532004 -> 120341412 (-0.16%); split: -0.19%, +0.03%
VGPRs: 1396928 -> 1397520 (+0.04%); split: -0.07%, +0.11%
Latency: 108740068 -> 108499644 (-0.22%); split: -0.53%, +0.30%
InvThroughput: 25286526 -> 25358695 (+0.29%); split: -0.11%, +0.39%
VClause: 421179 -> 421132 (-0.01%); split: -0.29%, +0.27%
SClause: 446414 -> 446423 (+0.00%); split: -0.00%, +0.00%
Copies: 2242236 -> 2243168 (+0.04%); split: -0.42%, +0.46%
Branches: 724556 -> 724903 (+0.05%); split: -0.02%, +0.07%
VALU: 13321078 -> 13321940 (+0.01%); split: -0.07%, +0.08%
SALU: 4069929 -> 4070580 (+0.02%); split: -0.02%, +0.03%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38262>
2026-02-16 19:39:43 +00:00
Rhys Perry
310f588f92 aco/ra: move variables from affinity register to avoid waitcnt
If we don't use this affinity register, we're likely to end up moving the
temporary later. If it's a memory instruction destination, that's probably
more expensive than just copying the blocking variables.

fossil-db (navi31):
Totals from 504 (0.63% of 79825) affected shaders:
Instrs: 4108284 -> 4109026 (+0.02%); split: -0.01%, +0.03%
CodeSize: 21226764 -> 21229764 (+0.01%); split: -0.01%, +0.02%
Latency: 26931635 -> 26806989 (-0.46%); split: -0.47%, +0.00%
InvThroughput: 8443520 -> 8439235 (-0.05%); split: -0.06%, +0.01%
VClause: 99209 -> 99314 (+0.11%); split: -0.00%, +0.11%
SClause: 85089 -> 85085 (-0.00%)
Copies: 340323 -> 340993 (+0.20%); split: -0.06%, +0.26%
Branches: 117225 -> 117209 (-0.01%); split: -0.02%, +0.00%
VALU: 2421859 -> 2422529 (+0.03%); split: -0.01%, +0.04%
SALU: 503465 -> 503470 (+0.00%); split: -0.00%, +0.00%

fossil-db (navi21):
Totals from 582 (0.73% of 79825) affected shaders:
Instrs: 3714908 -> 3714990 (+0.00%); split: -0.02%, +0.02%
CodeSize: 19977880 -> 19973076 (-0.02%); split: -0.04%, +0.01%
VGPRs: 40480 -> 40496 (+0.04%)
Latency: 26028895 -> 25772711 (-0.98%); split: -0.99%, +0.00%
InvThroughput: 9827389 -> 9818194 (-0.09%); split: -0.10%, +0.01%
VClause: 103702 -> 103815 (+0.11%); split: -0.02%, +0.13%
SClause: 90861 -> 90857 (-0.00%)
Copies: 335276 -> 335992 (+0.21%); split: -0.09%, +0.30%
Branches: 123912 -> 123897 (-0.01%); split: -0.02%, +0.00%
VALU: 2466032 -> 2466748 (+0.03%); split: -0.01%, +0.04%
SALU: 533658 -> 533667 (+0.00%); split: -0.00%, +0.00%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38262>
2026-02-16 19:39:43 +00:00
Rhys Perry
681ec4cba7 aco/ra: track cost of moving variables
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38262>
2026-02-16 19:39:43 +00:00
Rhys Perry
69bc4efa37 aco/sched_ilp: improve scheduling with VMEM/DS->VALU WaW
This improves scheduling with one side of a divergent branch writing to a
VGPR using VMEM/DS, and the other writing using VALU. At the merge block,
it will properly consider that the VGPR was written by a VMEM/DS.

fossil-db (navi31):
Totals from 1224 (1.53% of 79825) affected shaders:
Instrs: 5264815 -> 5267604 (+0.05%); split: -0.00%, +0.06%
CodeSize: 27406404 -> 27422132 (+0.06%); split: -0.00%, +0.06%
Latency: 48325204 -> 48293975 (-0.06%); split: -0.09%, +0.03%
InvThroughput: 8923880 -> 8919191 (-0.05%); split: -0.07%, +0.02%

fossil-db (navi21):
Totals from 1267 (1.59% of 79825) affected shaders:
Instrs: 4628583 -> 4629190 (+0.01%); split: -0.00%, +0.01%
CodeSize: 24974672 -> 24977188 (+0.01%); split: -0.00%, +0.01%
Latency: 45080476 -> 44998120 (-0.18%); split: -0.20%, +0.02%
InvThroughput: 12288202 -> 12269634 (-0.15%); split: -0.16%, +0.01%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38262>
2026-02-16 19:39:43 +00:00
Rhys Perry
88b6b6db17 aco: only consider cost of memory loads at waitcnt
We don't run this code before waitcnt insertion, so this isn't necessary.

This change improves accuracy in these two situations, because the waitcnt
insertion pass is more aware of divergent control flow:

v0 = valu
if (divergent) {
    v0 = vmem
} else {
    use(v0)
}

v0 = vmem
if (divergent) {
    wait vmcnt(0)
} else {
    wait vmcnt(0)
}
use(v0)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38262>
2026-02-16 19:39:43 +00:00
Georg Lehmann
bca5aab2be nir: let nir_analyze_fp_range take a nir_def
This is midly worse for vector constants, but so much simpler.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39756>
2026-02-16 18:08:53 +00:00
Georg Lehmann
474af815ff nir: rename nir_analyze_range because it's float only
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39756>
2026-02-16 18:08:53 +00:00
Georg Lehmann
f2a59fdea6 nir: remove non float nir_analyse_range support
This was always unused/unfinished.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39756>
2026-02-16 18:08:53 +00:00
Georg Lehmann
f7222d6939 nir/opt_algebraic: remove few uses of integer nir_analyze_range
Surprisingly, this has an effect on GFX1201:

Totals from 66 (0.08% of 82405) affected shaders:
Instrs: 200725 -> 201517 (+0.39%)
CodeSize: 978676 -> 981488 (+0.29%)
Latency: 291736 -> 291760 (+0.01%)
InvThroughput: 31556 -> 31604 (+0.15%)
Copies: 11928 -> 12588 (+5.53%)
Branches: 14850 -> 15048 (+1.33%)
SALU: 68981 -> 69509 (+0.77%)

I say surprisingly, because nir_analyze_range handles nothing but
constants and bcsel for integers. Maybe rdr2 is actually
hitting some weird bcsel(a, #b, #c) == 0 case where b and c are not 0?
No, I looked at a few of those shaders, and it's just noise from changed
instruction order.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39756>
2026-02-16 18:08:53 +00:00
Ryan Zhang
418e6c4ed9 panvk: guard against NULL pointers to avoid crash
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Vkcts simulate_oom caselist try to alloc fail manual
which caused the panvk crash. We should guard driver
cannot access null pointor.

Fixes: 598a8d9d11 ("panvk: Collect allocated push
sets at the command level")

Fixed:
dEQP-VK.wsi.wayland.swapchain.simulate_oom.*

Signed-off-by: Ryan Zhang <ryan.zhang@nxp.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39873>
2026-02-16 16:21:11 +00:00
Rhys Perry
6963c8dd80 radv,aco/gfx11: preserve s2 when NGG_WAVE_ID_EN=1
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
According to the ISA doc, this is needed for hang recovery.

This works by just avoiding putting temporaries in s0-3 unless they're
precolored there.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> (radv)
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39720>
2026-02-16 14:33:58 +00:00
Rhys Perry
f9c11a8e15 radv: add ngg_wave_id_en to radv_shader_info
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39720>
2026-02-16 14:33:57 +00:00
Marek Olšák
1a105e1b1f radeonsi: remove CB_RESOLVE
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
It's rarely used because a custom pixel or compute shader is almost always
faster, and we have those already.

RGB<->BGR swapping and microtile mode switching existed only for CB_RESOLVE
and are removed too.

RADV could also remove CB_RESOLVE, but it should probably use a pixel shader
until it can use ac_nir_meta_cs_blit, which is the fastest option for gfx12.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39872>
2026-02-16 13:38:43 +00:00
Marek Olšák
aa92b464f3 nir/opt_non_uniform_access: use new query flags
NFC for drivers

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39743>
2026-02-16 12:59:36 +00:00
Marek Olšák
61a96be494 nir/lower_non_uniform_access: add an option not to lower tex & image queries
AMD can do non-uniform queries. The RADV change will be in a separate commit.

NFC for drivers.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39743>
2026-02-16 12:59:36 +00:00
Marek Olšák
a9df891bc6 nir: allow get_ssbo_size to return a 64-bit result
to match get_ubo_size, and to support HW where SSBOs can have a 64-bit size.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39743>
2026-02-16 12:59:36 +00:00
Marek Olšák
c151402f35 nir: add ACCESS to get_ubo_size
so that we can set NON_UNIFORM

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39743>
2026-02-16 12:59:36 +00:00
Marek Olšák
1d09a975bf nir: handle get_ubo_size as a resource query in nir_shader_gather_info
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39743>
2026-02-16 12:59:36 +00:00
Lars-Ivar Hesselberg Simonsen
2d9be41706 panvk/v13: Support HSR Prepass
Add an option to enable HSR Prepass.

It is currently disabled by default as it might cause performance
regressions for content that:

- Has very simple fragment work.
- Already does a ZS prepass.

Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39615>
2026-02-16 12:25:14 +00:00
Lars-Ivar Hesselberg Simonsen
3d6c7cf8b7 panvk/v13: Set HSR flags
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39615>
2026-02-16 12:25:14 +00:00
Lars-Ivar Hesselberg Simonsen
b10555ea63 pan/compiler: Add pass to collect HSR info
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39615>
2026-02-16 12:25:14 +00:00
Lars-Ivar Hesselberg Simonsen
6e88d9cbe3 pan/genxml/v13: Add HSR operation enums
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39615>
2026-02-16 12:25:14 +00:00
Lars-Ivar Hesselberg Simonsen
71500a32fa pan/genxml/v13: Fix HSR Prepass typo
Fixes: ece01443e1 ("pan/genxml: Add v13 definition")
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39615>
2026-02-16 12:25:14 +00:00
Lars-Ivar Hesselberg Simonsen
75242b1862 panvk: Fix dcd_flags1 dirty bit
dcd_flags1 was not counted as dirty in case the color attachment map was
updated. This could lead to an outdated value for render_target_mask.

Fixes: a4670a67e0 ("panvk/csf: Set the correct DCD_FLAGS_1.render_rarget_mask")
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39615>
2026-02-16 12:25:13 +00:00
Pavel Ondračka
0763fb947a r300: align macro-tiled stride-addressed textures in X
Odd macro-tile counts in X trigger flaky rendering/readback in
parallel stress runs with macro-tiled NPOT textures (for example
piglit draw-pixel-with-texture -auto -fbo).

When a texture is macro-tiled and uses stride addressing, align the
width to two macro tiles. This keeps the stride at an even number of
macro tiles in X and avoids the corruption without disabling
macrotiling.

I was not able to find anything about this in the docs.

Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39882>
2026-02-16 13:04:56 +01:00
Pavel Ondračka
7ae9262dc3 r300: split unaligned 3D texsubimage uploads by layer
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
TexSubImage3D failures were caused by tiled multi-slice uploads for
unaligned XY box. Falls back to per-layer uploads when the 3D box
depth is > 1 and XY box is unaligned.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39897>
2026-02-16 11:26:41 +00:00
Hyunjun Ko
eedbe136ea anv/video: remove unsupported feautres for encoders
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39884>
2026-02-16 10:58:40 +00:00
Hyunjun Ko
1185bbe18d anv/video: set Sad Qp Lambda values properly for H265 encoder.
This is taken from media-driver(Intel VA-API)

Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39884>
2026-02-16 10:58:40 +00:00
Hyunjun Ko
1cb4fe5ef5 anv/video: Handle GPB(Generalized P and B frames) properly for H265 enc.
The previous code was copying RefPicList0 to RefPicList1 but not updating
num_ref_idx_l1_active_minus1, leaving it potentially uninitialized or zero.
This caused the hardware to see an inconsistent L1 list state.

Accordingly it sets num_ref_idx_active_override_flag if necessary.

Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39884>
2026-02-16 10:58:40 +00:00
Hyunjun Ko
4d4a5e4a42 anv/video: set Qp passed from apps for h265 encoder
Instead of 26 by default.

Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39884>
2026-02-16 10:58:40 +00:00
Hyunjun Ko
6efbb80c98 anv/video: set transform skip numbers according to qp
Instead of hardcode.

Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39884>
2026-02-16 10:58:40 +00:00
Juan A. Suarez Romero
f641eb4fad broadcom/ci: update expected results
Add new flakes found in the nightly runs.

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39912>
2026-02-16 11:21:23 +01:00
Juan A. Suarez Romero
154b5ccc9e broadcom/ci: update available devices
These are the devices used for pre-merges.

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39912>
2026-02-16 11:21:15 +01:00
Job Noorman
65362a9c38 ir3: don't use predication for large blocks
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Even when the branch condition is divergent, it may still be uniform
within a wave. When this happens for very large blocks, predication
causes a large overhead because the instructions in the false block are
not simply jumped over. Therefore, we fall back to normal branches for
large blocks.

Measuring renderpass time on some OpenGL traces:
average fps +0.5% (+/- 0.1%)
max fps +0.5% (+/- 0.1%)

Totals from 18522 (10.51% of 176279) affected shaders:
MaxWaves: 203126 -> 203156 (+0.01%)
Instrs: 23999194 -> 23521729 (-1.99%); split: -1.99%, +0.00%
CodeSize: 45462360 -> 45250224 (-0.47%); split: -0.47%, +0.00%
NOPs: 5078652 -> 4647917 (-8.48%); split: -8.48%, +0.00%
MOVs: 813450 -> 812615 (-0.10%); split: -0.26%, +0.15%
COVs: 296638 -> 296620 (-0.01%); split: -0.01%, +0.00%
Full: 334991 -> 334923 (-0.02%)
(ss): 636625 -> 636682 (+0.01%); split: -0.12%, +0.13%
(sy): 283395 -> 283429 (+0.01%); split: -0.10%, +0.11%
(ss)-stall: 2652246 -> 2651590 (-0.02%); split: -0.18%, +0.16%
(sy)-stall: 7862615 -> 7881590 (+0.24%); split: -0.13%, +0.37%
STPs: 15994 -> 15992 (-0.01%)
LDPs: 23360 -> 23356 (-0.02%)
Subgroup size: 896 -> 1792 (+100.00%)
Cat0: 5572855 -> 5097380 (-8.53%); split: -8.53%, +0.00%
Cat1: 1146050 -> 1145189 (-0.08%); split: -0.18%, +0.11%
Cat2: 8975537 -> 8974390 (-0.01%); split: -0.01%, +0.00%
Cat6: 196837 -> 196831 (-0.00%)
Cat7: 795866 -> 795890 (+0.00%); split: -0.06%, +0.06%

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39734>
2026-02-16 09:00:14 +00:00
Job Noorman
e7c3834a27 ir3: add block_can_be_predicated helper
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39734>
2026-02-16 09:00:13 +00:00
Samuel Pitoiset
47841c1142 radv/meta: remove useless DCC decompressions for image<->buffer
It's not needed to decompress DCC when formats are compatible each
other, this basically removes all decompressions on GFX11-GFX11.5.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39888>
2026-02-16 07:40:13 +00:00
Yiwei Zhang
b0397b967d venus: workaround a gcc-15 dead store elimination (DSE) bug
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
No issue with clang or gcc-14.x (or earlier versions). The issue only
shows up since gcc-15.1. The compiler somehow fails to consider those
cs helpers dereferencing the pointer from the pNext chain for reads,
and thus has falsely optimized away the pNext store. This change works
around this with a no-op memory clobber.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13242
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39906>
2026-02-16 01:39:10 +00:00
Vinson Lee
7239b5288f freedreno/decode: replace lua_pushunsigned with lua_pushinteger
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
lua_pushunsigned was introduced in Lua 5.2, deprecated in 5.3,
and removed in 5.4. Replace it with lua_pushinteger which has
been available since Lua 5.0 and handles the uint32_t value
safely via implicit widening to the 64-bit lua_Integer type.

This fixes the build with Lua 5.5:

../src/freedreno/decode/script.c: In function 'pushdecval':
../src/freedreno/decode/script.c:182:7: error: implicit declaration of function 'lua_pushunsigned'; did you mean 'lua_pushinteger'? [-Wimplicit-function-declaration]
  182 |       lua_pushunsigned(L, val.u);
      |       ^~~~~~~~~~~~~~~~
      |       lua_pushinteger

Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39901>
2026-02-14 22:45:45 -08:00
Yiwei Zhang
a8baedef29 venus: expose VK_EXT_descriptor_heap behind a debug option
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
There's TODO for optimizing descriptor allocations, so currently we
expose the extension behind VN_DEBUG=desc_heap

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39762>
2026-02-15 04:32:30 +00:00
Yiwei Zhang
6265dad4f2 venus: fill descriptor heap feats and props
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39762>
2026-02-15 04:32:30 +00:00
Yiwei Zhang
dea6221a65 venus: take care of combined image sampler descriptor for ycbcr
We'd have to query it by reconstructing the sampler ycbcr conversion
image format props query. It's straightforward except having to consider
the modifier tiling case.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39762>
2026-02-15 04:32:30 +00:00
Yiwei Zhang
4f475789d5 venus: ensure descriptor writes invariance
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39762>
2026-02-15 04:32:30 +00:00
Yiwei Zhang
1d779f5af1 venus: cache descriptor size query
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39762>
2026-02-15 04:32:29 +00:00
Yiwei Zhang
dfc5d76205 venus: rename format_update_mutex for general purpose
Not worth separate locks for physical device level queries.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39762>
2026-02-15 04:32:29 +00:00
Yiwei Zhang
be52338399 venus: add vn_descriptor.h to be shared between different desc systems
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39762>
2026-02-15 04:32:28 +00:00
Yiwei Zhang
990b5fca37 venus: skip image cache for VkOpaqueCaptureDataCreateInfoEXT
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39762>
2026-02-15 04:32:28 +00:00
Yiwei Zhang
526788a097 venus: pipeline layout is now optional
If descriptor heap is used, there's no pipeline layout created. So we
have to patch compute and RT pipelines to allow it. Graphics pipeline
doesn't need the change because of GPL.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39762>
2026-02-15 04:32:28 +00:00
Yiwei Zhang
95331f3bd0 venus: cmd inheritance info fix to consider descriptor heap
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39762>
2026-02-15 04:32:28 +00:00
Yiwei Zhang
485b2b501c venus: implement all descriptor heap commands
There're potential optimizations available for below:
- vkWriteSamplerDescriptorsEXT
- vkWriteResourceDescriptorsEXT
- vkGetPhysicalDeviceDescriptorSizeEXT
- vkRegisterCustomBorderColorEXT

...and we can revisit if there's perf hit from above for real apps.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39762>
2026-02-15 04:32:28 +00:00