Commit graph

216211 commits

Author SHA1 Message Date
Emma Anholt
c00ebca5c4 ir3: Improve spilling of NIR vars to scratch.
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Previously, we would spill at the NIR level any temp array over 16 vec4s.
This had two problems:

1) We wouldn't spill for the worst case scenario: a MAD accessing a dst
array and 3 different src arrays (that all get fully unspilled, rather
than just reloading the specific reg in the operand).  This would fail to
register allocate.  We haven't seen this in practice.

2) We would spill vec4[17] and larger arrays that weren't necessary to get
the shader to register allocate.  This occurred on a FS for in Stray that
had a vec4[24] array and just 4 vec4s of register pressure other than the
array.

Instead, use NIR scratch spilling when the worst case set of vars to
reference in an instruction would overflow GPR space.  This makes the
shader in Stray go from 11ms to .5ms, by eliminating all spilling and
leaving the array in GPRs.  On the other hand, if leaving the arrays
unspilled in NIR means that we cause spilling in ir3, the fact that ir3
spills/reloads work on the whole array may cause the amount of spilling to
increase.  However, we can see the effect is very small in terms of number
of shaders affected in shader-db and an overwhelmingly positive effect on
spills:

MaxWaves: 22522470 -> 22520664 (-0.01%)
Instrs: 396093281 -> 396122221 (+0.01%); split: -0.00%, +0.01%
STPs: 218915 -> 182907 (-16.45%)
LDPs: 155374 -> 153364 (-1.29%); split: -2.79%, +1.50%

Totals from 496 (0.03% of 1561298) affected shaders:
MaxWaves: 3792 -> 1986 (-47.63%)
Instrs: 441224 -> 470164 (+6.56%); split: -0.00%, +6.57%
CodeSize: 926164 -> 976734 (+5.46%); split: -0.05%, +5.52%
NOPs: 58896 -> 52765 (-10.41%); split: -14.95%, +4.60%
MOVs: 16314 -> 57901 (+254.92%)
COVs: 3293 -> 5146 (+56.27%)
Full: 12876 -> 23632 (+83.54%)
(ss): 18613 -> 11573 (-37.82%); split: -47.53%, +9.71%
(sy): 2539 -> 2505 (-1.34%); split: -10.75%, +9.41%
(ss)-stall: 40682 -> 26413 (-35.07%); split: -47.90%, +12.80%
(sy)-stall: 147862 -> 117004 (-20.87%); split: -37.65%, +16.69%
STPs: 38566 -> 2558 (-93.37%)
LDPs: 5060 -> 3050 (-39.72%); split: -85.77%, +45.93%
Cat0: 65593 -> 59487 (-9.31%); split: -13.42%, +4.15%
Cat1: 19667 -> 63105 (+220.87%)
Cat2: 155958 -> 157879 (+1.23%); split: -0.05%, +1.28%
Cat6: 105228 -> 94910 (-9.81%); split: -12.36%, +2.54%
Cat7: 2480 -> 2485 (+0.20%); split: -0.08%, +0.28%
Subgroup size: 31872 -> 31744 (-0.40%)

The primary impacted application from shader-db is gfxbench aztec ruins.
A quick test of it showed no significant performance improvement (n=3).

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37245>
2025-12-17 19:50:28 +00:00
Emma Anholt
0d9428736b ir3/ra: Make a helper to get RA register pressure limits.
I'll be reusing this to let vars_to_scratch keep bigger arrays in register
space.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37245>
2025-12-17 19:50:28 +00:00
Emma Anholt
d5cb38e457 ir3: Move the compute shader threadsize forcing earlier.
With this, we can look at real_wavesize while running NIR passes and know
if we have to be doubled because of the shader info coming in.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37245>
2025-12-17 19:50:28 +00:00
Emma Anholt
5a09abe890 nir: Introduce nir_lower_vars_to_scratch_global().
This lets the driver make a more informed decision about which vars to
lower to scratch based on the vars available to spill.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37245>
2025-12-17 19:50:28 +00:00
Emma Anholt
059d301c79 nir: Drop the mode argument of nir_lower_vars_to_scratch().
It only makes sense for function temps, and that's the only way it's been
used.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37245>
2025-12-17 19:50:28 +00:00
Yiwei Zhang
962bed2dd6 vulkan: update ALLOWED_ANDROID_VERSION for api level 36
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Reviewed-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38988>
2025-12-17 19:22:47 +00:00
Mel Henning
dfdaee5ca7 nak: Use the hardware's max warps_per_sm value
This should improve our occupancy estimates.

Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38913>
2025-12-17 19:08:05 +00:00
Mel Henning
b154071178 nak: Don't box ShaderModelInfo
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38913>
2025-12-17 19:08:05 +00:00
Mel Henning
d7e906d60e nak: Replace &dyn ShaderModel w/ &ShaderModelInfo
This is mostly a s/dyn ShaderModel/ShaderModelInfo/ with a few manual fixes.
With this change, we now statically dispatch into ShaderModel, which is
a bit faster than dynamically dispatching. Together, this commit and the
last one improve compile times by about 1% geomean.

Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38913>
2025-12-17 19:08:04 +00:00
Mel Henning
ee65578fa1 nak: Add ShaderModelInfo
which statically dispatches into the right ShaderModel implementation.

Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38913>
2025-12-17 19:08:04 +00:00
Ian Romanick
66fd4d72fd nir/algebraic: Mask with shifted constant instead of shift-then-mask
shader-db:

All Intel platforms had similar results. (Lunar Lake shown)
total instructions in shared programs: 17088766 -> 17088765 (<.01%)
instructions in affected programs: 1375 -> 1374 (-0.07%)
helped: 1 / HURT: 1

total cycles in shared programs: 887873068 -> 887871748 (<.01%)
cycles in affected programs: 136402 -> 135082 (-0.97%)
helped: 2 / HURT: 0

fossil-db:

Lunar Lake
Totals:
Instrs: 924954240 -> 924939317 (-0.00%); split: -0.00%, +0.00%
Subgroup size: 40937696 -> 40937728 (+0.00%)
Cycle count: 106116946509 -> 106116637903 (-0.00%); split: -0.00%, +0.00%
Spill count: 3423930 -> 3423250 (-0.02%); split: -0.02%, +0.00%
Fill count: 4876960 -> 4876045 (-0.02%); split: -0.03%, +0.01%
Max live registers: 193882457 -> 193881816 (-0.00%); split: -0.00%, +0.00%
Max dispatch width: 49078640 -> 49078656 (+0.00%)
Non SSA regs after NIR: 231314214 -> 231314219 (+0.00%); split: -0.00%, +0.00%

Totals from 13809 (0.68% of 2019450) affected shaders:
Instrs: 25433084 -> 25418161 (-0.06%); split: -0.08%, +0.02%
Subgroup size: 32 -> 64 (+100.00%)
Cycle count: 1483550606 -> 1483242000 (-0.02%); split: -0.27%, +0.25%
Spill count: 41466 -> 40786 (-1.64%); split: -1.88%, +0.24%
Fill count: 74195 -> 73280 (-1.23%); split: -2.12%, +0.88%
Max live registers: 2326365 -> 2325724 (-0.03%); split: -0.05%, +0.02%
Max dispatch width: 234848 -> 234864 (+0.01%)
Non SSA regs after NIR: 3394104 -> 3394109 (+0.00%); split: -0.00%, +0.00%

Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Instrs: 997527742 -> 997524495 (-0.00%); split: -0.00%, +0.00%
Subgroup size: 27452928 -> 27452944 (+0.00%)
Cycle count: 93646717070 -> 93649738060 (+0.00%); split: -0.00%, +0.01%
Spill count: 3710125 -> 3709784 (-0.01%); split: -0.03%, +0.02%
Fill count: 5032819 -> 5033191 (+0.01%); split: -0.04%, +0.05%
Max live registers: 121648838 -> 121648528 (-0.00%); split: -0.00%, +0.00%
Max dispatch width: 37811544 -> 37811584 (+0.00%)
Non SSA regs after NIR: 255562054 -> 255565914 (+0.00%); split: -0.00%, +0.00%

Totals from 14438 (0.63% of 2281134) affected shaders:
Instrs: 25974222 -> 25970975 (-0.01%); split: -0.08%, +0.06%
Subgroup size: 16 -> 32 (+100.00%)
Cycle count: 1149710820 -> 1152731810 (+0.26%); split: -0.29%, +0.55%
Spill count: 44445 -> 44104 (-0.77%); split: -2.23%, +1.46%
Fill count: 76172 -> 76544 (+0.49%); split: -2.89%, +3.37%
Max live registers: 1237997 -> 1237687 (-0.03%); split: -0.04%, +0.02%
Max dispatch width: 123528 -> 123568 (+0.03%)
Non SSA regs after NIR: 3490757 -> 3494617 (+0.11%); split: -0.03%, +0.14%

Tiger Lake, Ice Lake, and Skylake had similar results. (Tiger Lake shown)
Totals:
Instrs: 1013364485 -> 1013342384 (-0.00%); split: -0.00%, +0.00%
Cycle count: 85509342602 -> 85500105656 (-0.01%); split: -0.02%, +0.01%
Spill count: 3903944 -> 3903350 (-0.02%); split: -0.02%, +0.01%
Fill count: 6801948 -> 6799368 (-0.04%); split: -0.05%, +0.01%
Max live registers: 122212165 -> 122211859 (-0.00%); split: -0.00%, +0.00%
Max dispatch width: 37805336 -> 37805472 (+0.00%)
Non SSA regs after NIR: 244624956 -> 244628603 (+0.00%); split: -0.00%, +0.00%

Totals from 14835 (0.65% of 2278397) affected shaders:
Instrs: 27522570 -> 27500469 (-0.08%); split: -0.10%, +0.02%
Cycle count: 1128820972 -> 1119584026 (-0.82%); split: -1.53%, +0.71%
Spill count: 46408 -> 45814 (-1.28%); split: -2.04%, +0.76%
Fill count: 99071 -> 96491 (-2.60%); split: -3.14%, +0.54%
Max live registers: 1287967 -> 1287661 (-0.02%); split: -0.04%, +0.02%
Max dispatch width: 126600 -> 126736 (+0.11%)
Non SSA regs after NIR: 3438628 -> 3442275 (+0.11%); split: -0.03%, +0.14%

Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38979>
2025-12-17 18:38:55 +00:00
Tapani Pälli
2418c91537 anv/drirc: disable Xe2 CCS drm modifiers for GTK engine
Cc: mesa-stable
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38373>
2025-12-17 17:34:09 +00:00
Connor Abbott
68c1a8230d freedreno/crashdec: Fix crash with older kernels
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Older kernels lack the cluster-name property. Don't crash decoding
devcoredumps from them, even if they can't be converted to snapshots
properly.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38931>
2025-12-17 16:00:56 +00:00
Samuel Pitoiset
f8feed17e1 ac,radv,radeonsi: add tracked register macros to common code
Because the tracked registers are really driver dependant, the driver
is expected to handle the tracked_registers struct itself.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38740>
2025-12-17 15:09:26 +00:00
Samuel Pitoiset
c580fc667f ac,radv: add ac_cmdbuf::context_roll and use it
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38740>
2025-12-17 15:09:26 +00:00
Samuel Pitoiset
f3b385859a ac,radv: add more cmdbuf emit helpers
Some can't be shared with RadeonSI because it uses templates in some
places.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38740>
2025-12-17 15:09:25 +00:00
Samuel Pitoiset
b444dc145a radv: remove redundant assertions in radeon_emit_{array}()
The common helpers already have assertions.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38740>
2025-12-17 15:09:25 +00:00
Samuel Pitoiset
262fc80e45 ac,radv,radeonsi: add functions to initialize tracked regs
Also initialize the new slots for RADV.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38740>
2025-12-17 15:09:25 +00:00
Samuel Pitoiset
eb2f4a13c4 radeonsi: remove dead code in si_set_tracked_regs_to_clear_state()
GFX12 doesn't have clear state.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38740>
2025-12-17 15:09:24 +00:00
Samuel Pitoiset
44314e1ea6 ac,radv,radeonsi: add ac_tracked_regs
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38740>
2025-12-17 15:09:24 +00:00
Samuel Pitoiset
c97bd17d4d radv: switch to AC_TRACKED_xxx
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38740>
2025-12-17 15:09:23 +00:00
Samuel Pitoiset
fad24d6fcc ac/cmdbuf: add new slots to ac_tracked_reg
For RADV registers that aren't tracked in RadeonSI.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38740>
2025-12-17 15:09:23 +00:00
Samuel Pitoiset
18bdb76408 ac,radeonsi: move si_tracked_reg to common code
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38740>
2025-12-17 15:09:22 +00:00
Icenowy Zheng
6bda88bfdb pvr: copy WSI can_present_on_device function from PanVK
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Both PVR and PanVK are drivers for generic embedded GPU IP cores, so
just take the can_present_on_device implementation from PanVK, which
allows any platform devices for presentation.

Signed-off-by: Icenowy Zheng <uwu@icenowy.me>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38985>
2025-12-17 14:53:39 +00:00
Martin Roukala (né Peres)
8b8e472c65 zink/ci: update the a750 expectations
Signed-off-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38977>
2025-12-17 14:10:32 +00:00
Martin Roukala (né Peres)
5f54ae9048 turnip/ci: update the vkd3d expectations
Signed-off-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38977>
2025-12-17 14:10:32 +00:00
Martin Roukala (né Peres)
f155711a33 freedreno/ci: update the a750 expectations
Signed-off-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38977>
2025-12-17 14:10:32 +00:00
Martin Roukala (né Peres)
6993b0172b freedreno/ci/a750: switch to the linux-firmware-provided gpu fw
Now that qcom has released the gpu firmware for the a750, let's stop
using my fw package in favor of the publicly-available ones.

v2:

 * Be more specific in the list of files we want to keep (lumag)
 * Uprev the linux firmware version
 * Use gfx-ci/firmware rather than the upstream gitlab repo

Signed-off-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38977>
2025-12-17 14:10:32 +00:00
Erik Faye-Lund
74b7b68628 mesa/st: always override internal-format for 10-bit formats
We also need to do this in the GLES-only code-path, otherwise we'll end
up setting PIPE_BIND_RENDER_TARGET for these, which means we'll
incorrectly require these to be color-renderable.

Fixes: 60e115dedf ("mesa/st: do not drop binding prematurely")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38945>
2025-12-17 13:42:21 +00:00
Caterina Shablia
0da350f879 panvk: remove AFBC header zeroing
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
This is not actually necessary and moreover was corrupting
mipmapped arrayed 2D images in cases when the transition barrier
wasn't transitioning all mips, but more than one layer.

Keep the layout transition infrastructure in place as we'll need
it for transaction elimination CRC zeroing on v10-.

Fixes: c95f8993 ("panvk: add a meta command for transitioning image layout")

Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38972>
2025-12-17 12:33:58 +00:00
Caterina Shablia
d8ceb38ef1 panvk: do not access the image in image view's destructor
Vulkan allows destroying an image without destroying the views of
this image first. These views can not be used in any way and the
only thing that the user can do with such a view is destroy it.
This also means that the driver can not refer to the image inside
the image view's destructor.

Fixes cb3f6481 ("panvk: Create MS shadow images and views")

Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38972>
2025-12-17 12:33:58 +00:00
Samuel Pitoiset
bf2aa05b60 zink/ci: add two tests to the skip lists
They either fails or hangs.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38714>
2025-12-17 11:11:18 +00:00
Samuel Pitoiset
5d76202b6d radv: create descriptors for color/depth-stencil surfaces earlier
For less CPU overhead when rendering begins and also because it's
easy to pre-compute those descriptors.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38714>
2025-12-17 11:11:18 +00:00
Samuel Pitoiset
c8729cdd3c radv/meta: stop passing a stencil attachment for depth decompress
It should only be the depth aspect.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38714>
2025-12-17 11:11:18 +00:00
Samuel Pitoiset
43d7d97b13 radv/meta: inject image view usage info
This will be used to initialize color/depth-stencil descriptors earlier
when the image view is created.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38714>
2025-12-17 11:11:18 +00:00
Samuel Pitoiset
ce69cabb60 radv: constify radv_{cb,ds}_buffer_info parameters
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38714>
2025-12-17 11:11:18 +00:00
Lucas Fryzek
48799005d7 Revert "drisw: Copy entire buffer ignoring damage regions"
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
This reverts commit 755e795e4c.

Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38817>
2025-12-17 10:06:32 +00:00
Lucas Fryzek
17ab0f2ece drisw: Modify drisw_swap_buffers_with_damage to swap entire buffer
When swapping buffer with damage regions, to be strictly correct we
need to swap the entire back buffer to the front buffer. This needs to
be done in case the compositor does not support damage regions. This
means we need to ignore the input damage region and tell drisw to swap
the entire buffer.

Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38817>
2025-12-17 10:06:32 +00:00
Georg Lehmann
37c3a2fb89 zink/ci: update radv trace checksums
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38730>
2025-12-17 08:41:32 +00:00
Georg Lehmann
0478021fdc aco/optimizer: reassociate rcp(mul(a, const)) into rcp_omod(a)
Foz-DB Navi48:
Totals from 2484 (2.54% of 97637) affected shaders:
Instrs: 10368279 -> 10361892 (-0.06%); split: -0.06%, +0.00%
CodeSize: 55161104 -> 55150752 (-0.02%); split: -0.02%, +0.00%
SpillSGPRs: 14665 -> 14666 (+0.01%)
Latency: 87694014 -> 87689324 (-0.01%); split: -0.01%, +0.00%
InvThroughput: 16595764 -> 16594448 (-0.01%); split: -0.01%, +0.00%
VClause: 209922 -> 209918 (-0.00%); split: -0.01%, +0.00%
SClause: 205195 -> 205251 (+0.03%); split: -0.01%, +0.04%
Copies: 843771 -> 843765 (-0.00%); split: -0.01%, +0.01%
Branches: 275985 -> 275962 (-0.01%); split: -0.01%, +0.00%
PreVGPRs: 170608 -> 170494 (-0.07%)
VALU: 5840893 -> 5838038 (-0.05%); split: -0.05%, +0.00%
SALU: 1481388 -> 1479037 (-0.16%); split: -0.16%, +0.00%
VOPD: 7496 -> 7485 (-0.15%)

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38730>
2025-12-17 08:41:32 +00:00
Georg Lehmann
a8f5ced670 aco/optimizer: reassociate mul(mul(a, const), b) into mul_omod(a, b)
Foz-DB Navi48:
Totals from 14608 (14.96% of 97637) affected shaders:
MaxWaves: 364201 -> 364421 (+0.06%)
Instrs: 28051720 -> 28022503 (-0.10%); split: -0.13%, +0.03%
CodeSize: 148938740 -> 148943480 (+0.00%); split: -0.04%, +0.04%
VGPRs: 994520 -> 994004 (-0.05%); split: -0.05%, +0.00%
SpillSGPRs: 45182 -> 45179 (-0.01%)
Latency: 187734461 -> 187725301 (-0.00%); split: -0.07%, +0.06%
InvThroughput: 33967002 -> 33949881 (-0.05%); split: -0.11%, +0.06%
VClause: 495237 -> 495207 (-0.01%); split: -0.03%, +0.02%
Copies: 2048324 -> 2047937 (-0.02%); split: -0.12%, +0.10%
Branches: 598445 -> 598431 (-0.00%); split: -0.01%, +0.01%
PreSGPRs: 877715 -> 877684 (-0.00%)
PreVGPRs: 778146 -> 776383 (-0.23%); split: -0.23%, +0.00%
VALU: 16413380 -> 16391508 (-0.13%); split: -0.15%, +0.01%
SALU: 3685279 -> 3677655 (-0.21%); split: -0.23%, +0.02%
VOPD: 26219 -> 25926 (-1.12%); split: +0.43%, -1.55%

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38730>
2025-12-17 08:41:31 +00:00
Daniel Schürmann
125ac1626d radv: remove precomputed registers from radv_shader_binary
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
It is enough to compute them after upload.
This saves some disk space and eliminates an unlikely
bug where the shader cache is shared between two GPUs
with the same chip but a different number of enabled CUs.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38970>
2025-12-17 08:16:06 +00:00
Sagar Ghuge
61287b00f3 anv: Stop using RCS companion for MSAA copy/clear on Xe3+
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
On Xe3+, we have typed MSAA load/store message support. We can use them
during MSAA copies. We don't have to fallback on RCS companion queue
anymore.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33905>
2025-12-17 05:34:02 +00:00
Sagar Ghuge
de0c547448 blorp: Handle 2D MSAA array image copies on compute shader
We are passing number of layers as inline parameter register, so figure
out z_pos and write to 2D MSAA array images in compute
shader. We already get component X, Y and sample index, all we needed
was the number of layers.

Ken:
- Use load/store var instead of derefs

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33905>
2025-12-17 05:34:02 +00:00
Sagar Ghuge
080d28a03e blorp: Set persample_msaa_dispatch for render shader
Only 3D shader gets dispatched per sample not the compute shader.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33905>
2025-12-17 05:34:02 +00:00
Lionel Landwerlin
d99a3d9b58 anv: remove CS-L3 coherency on Xe2
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
I'll try to write some crucible tests for this.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: be5f5f659f ("anv: consider CS coherent with L3 on Xe2+")
Fixes: 503355c7f8 ("anv: update pipeline barriers for Xe2+")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38966>
2025-12-16 21:35:27 +00:00
Steev Klimaszewski
10f259e673 tu: Stop printing descriptor pool allocation failures
The VK_ERROR_FRAGMENTED_POOL and VK_ERROR_OUT_OF_POOL_MEMORY errors are
not as exceptional cases as most.  These are expected to be hit by
applications in the normal course of doing their thing.  Probably best
not to spam stderr and the debug logs with them.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38940>
2025-12-16 21:11:41 +00:00
Rob Clark
a520752328 freedreno/a6xx: gen8 lrz support
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38930>
2025-12-16 19:38:38 +00:00
Rob Clark
0e82a8d759 freedreno/a6xx: Fix layered lrz
Don't hard-code to a single layer, and fix lrz (slow) clear path to
account for the # of layers.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5582
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38930>
2025-12-16 19:38:37 +00:00
Rob Clark
14a23e8b3e freedreno/lrz: Add gen8 lrz layout support
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38930>
2025-12-16 19:38:37 +00:00