Mike Blumenkrantz
4d0650d188
zink: fix image sync deferral
...
each of these cases wasn't actually checking what the comment claimed
it was checking, which would add unnecessary deferred sync
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36846 >
2025-08-19 22:11:51 +00:00
Mike Blumenkrantz
af7b39a22f
zink: optimize a GENERAL layout case in pre-draw/dispatch barriers
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36846 >
2025-08-19 22:11:50 +00:00
Job Noorman
77c1c688dc
ir3/array_to_ssa: remove trivial all-undef phis
...
remove_trivial_phi erroneously skipped phis containing an undef src
because the remaining srcs may not dominate the phi. However, it's fine
to replace a phi whose srcs are all undef with undef. Fix this by simply
checking if all srcs are equal, whether undef or not.
Note that in practice, this often caused phis with undef srcs to be
inserted all the way up to the entry block, keeping their defs alive for
much longer than necessary.
Fixes unnecessary spilling in God Of War and Neon Noir traces.
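The corrected check can be sketched in C as follows. This is a simplified illustration rather than the ir3 code: `struct value`, `phi_is_trivial` and the convention that NULL stands for undef are assumptions made for the example, and self-referencing srcs are not handled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an SSA def; NULL represents undef here. */
struct value {
   int id;
};

/*
 * Returns 1 if every src of the phi is the same value, including the case
 * where all srcs are undef, and stores that value in *out.
 * A phi mixing undef with a real def is not treated as trivial, because the
 * real def may not dominate the phi.
 */
static int
phi_is_trivial(const struct value *const *srcs, size_t count,
               const struct value **out)
{
   assert(count > 0);

   for (size_t i = 1; i < count; i++) {
      if (srcs[i] != srcs[0])
         return 0;
   }

   *out = srcs[0]; /* may be NULL, i.e. undef */
   return 1;
}
```

With this shape, an all-undef phi is replaced by undef instead of being kept alive, while a phi such as (def, undef) is left alone.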
Totals:
MaxWaves: 2381774 -> 2384954 (+0.13%)
Instrs: 49052269 -> 49052865 (+0.00%); split: -0.03%, +0.04%
CodeSize: 102493810 -> 102514296 (+0.02%); split: -0.02%, +0.04%
NOPs: 8391570 -> 8385296 (-0.07%); split: -0.14%, +0.07%
MOVs: 1448918 -> 1455153 (+0.43%); split: -0.43%, +0.86%
COVs: 824835 -> 824846 (+0.00%)
Full: 1714015 -> 1707987 (-0.35%)
(ss): 1125974 -> 1126692 (+0.06%); split: -0.14%, +0.21%
(sy): 553893 -> 553561 (-0.06%); split: -0.23%, +0.17%
(ss)-stall: 4011440 -> 4006144 (-0.13%); split: -0.21%, +0.08%
(sy)-stall: 16707741 -> 16664838 (-0.26%); split: -0.48%, +0.23%
STPs: 18953 -> 18495 (-2.42%)
LDPs: 23957 -> 22121 (-7.66%)
Preamble Instrs: 11100893 -> 11100673 (-0.00%)
Early Preamble: 122185 -> 122188 (+0.00%)
Last helper: 11913048 -> 11914963 (+0.02%); split: -0.04%, +0.06%
Subgroup size: 12925248 -> 12926272 (+0.01%)
Cat0: 9246551 -> 9240417 (-0.07%); split: -0.13%, +0.07%
Cat1: 2335781 -> 2341487 (+0.24%); split: -0.29%, +0.53%
Cat2: 18445905 -> 18445930 (+0.00%)
Cat6: 515382 -> 514732 (-0.13%)
Cat7: 1635575 -> 1637224 (+0.10%); split: -0.09%, +0.19%
Totals from 2293 (1.39% of 164705) affected shaders:
MaxWaves: 21622 -> 24802 (+14.71%)
Instrs: 3399456 -> 3400052 (+0.02%); split: -0.49%, +0.51%
CodeSize: 6576806 -> 6597292 (+0.31%); split: -0.24%, +0.55%
NOPs: 774365 -> 768091 (-0.81%); split: -1.54%, +0.73%
MOVs: 226724 -> 232959 (+2.75%); split: -2.73%, +5.48%
COVs: 48005 -> 48016 (+0.02%)
Full: 50599 -> 44571 (-11.91%)
(ss): 88248 -> 88966 (+0.81%); split: -1.85%, +2.66%
(sy): 41345 -> 41013 (-0.80%); split: -3.03%, +2.23%
(ss)-stall: 396793 -> 391497 (-1.33%); split: -2.11%, +0.78%
(sy)-stall: 1594786 -> 1551883 (-2.69%); split: -5.06%, +2.37%
STPs: 1147 -> 689 (-39.93%)
LDPs: 2535 -> 699 (-72.43%)
Preamble Instrs: 707407 -> 707187 (-0.03%)
Early Preamble: 180 -> 183 (+1.67%)
Last helper: 1538341 -> 1540256 (+0.12%); split: -0.35%, +0.47%
Subgroup size: 149248 -> 150272 (+0.69%)
Cat0: 857696 -> 851562 (-0.72%); split: -1.43%, +0.72%
Cat1: 275565 -> 281271 (+2.07%); split: -2.44%, +4.51%
Cat2: 1139467 -> 1139492 (+0.00%)
Cat6: 22505 -> 21855 (-2.89%)
Cat7: 129600 -> 131249 (+1.27%); split: -1.15%, +2.42%
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36714 >
2025-08-19 20:07:34 +00:00
Job Noorman
ca15116fa1
ir3/array_to_ssa: fix updating/removing phis
...
Fix checking instruction flags instead of dst flags, and updating src
instead of def.
Totals:
MaxWaves: 2381954 -> 2381958 (+0.00%)
Instrs: 49073677 -> 49073417 (-0.00%)
CodeSize: 102537524 -> 102536824 (-0.00%)
NOPs: 8396340 -> 8396432 (+0.00%); split: -0.00%, +0.00%
MOVs: 1450777 -> 1450422 (-0.02%)
Full: 1714304 -> 1714287 (-0.00%)
(ss): 1126433 -> 1126463 (+0.00%); split: -0.00%, +0.00%
(ss)-stall: 4013834 -> 4013854 (+0.00%)
(sy)-stall: 16713036 -> 16713082 (+0.00%)
Cat0: 9252109 -> 9252194 (+0.00%); split: -0.00%, +0.00%
Cat1: 2337941 -> 2337592 (-0.01%)
Cat7: 1636810 -> 1636814 (+0.00%); split: -0.00%, +0.00%
Totals from 5 (0.00% of 164705) affected shaders:
MaxWaves: 42 -> 46 (+9.52%)
Instrs: 9052 -> 8792 (-2.87%)
CodeSize: 16806 -> 16106 (-4.17%)
NOPs: 2369 -> 2461 (+3.88%); split: -0.17%, +4.05%
MOVs: 1140 -> 785 (-31.14%)
Full: 133 -> 116 (-12.78%)
(ss): 206 -> 236 (+14.56%); split: -0.97%, +15.53%
(ss)-stall: 901 -> 921 (+2.22%)
(sy)-stall: 6229 -> 6275 (+0.74%)
Cat0: 2695 -> 2780 (+3.15%); split: -0.22%, +3.38%
Cat1: 1333 -> 984 (-26.18%)
Cat7: 419 -> 423 (+0.95%); split: -0.48%, +1.43%
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Fixes: 3ac743c333 ("ir3: Add pass to lower arrays to SSA")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36714 >
2025-08-19 20:07:34 +00:00
Michal Krol
2385fa2098
gallium: Do not flush subnormals during tessellation.
...
D3D11 requires that subnormals are not flushed to zero
when tessellating primitives. Since we are flushing
subnormals during shader execution, we must temporarily
turn flushing off when calling the tessellator.
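The save/disable/restore pattern described above might look like the following C sketch. It is not the gallium code: `call_without_flushing`, `tess_fn` and `halve` are hypothetical names, and the x86 path assumes the MXCSR flush-to-zero (0x8000) and denormals-are-zero (0x0040) bits; on other targets the sketch falls back to a no-op.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE__) || defined(__x86_64__)
#include <xmmintrin.h>
static unsigned read_fp_state(void) { return _mm_getcsr(); }
static void write_fp_state(unsigned state) { _mm_setcsr(state); }
#else
/* Fallback for other targets: no control register is touched. */
static unsigned read_fp_state(void) { return 0; }
static void write_fp_state(unsigned state) { (void)state; }
#endif

/* MXCSR bits: flush-to-zero (0x8000) and denormals-are-zero (0x0040). */
#define FTZ_DAZ_BITS 0x8040u

/* Hypothetical callback type standing in for the tessellator entry point. */
typedef float (*tess_fn)(float);

/* Example callback used only to exercise the wrapper. */
static float halve(float x) { return x * 0.5f; }

/* Runs fn with subnormal flushing disabled, then restores the saved state. */
static float
call_without_flushing(tess_fn fn, float arg)
{
   assert(fn != NULL);

   unsigned saved = read_fp_state();
   write_fp_state(saved & ~FTZ_DAZ_BITS);
   float result = fn(arg);
   write_fp_state(saved);
   return result;
}
```

Restoring the saved state rather than unconditionally re-enabling flushing keeps the wrapper correct even if the caller was not flushing in the first place.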
Reviewed-by: Roland Scheidegger <roland.scheidegger@broadcom.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36811 >
2025-08-19 19:45:29 +00:00
Gert Wollny
8fc2b0d24c
r600/sfn: Emit thread position as two-slot op
...
It doesn't change much though, because it always has to be scheduled
in the xy channels.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36743 >
2025-08-19 19:30:33 +00:00
Gert Wollny
b0bf1d914a
r600/sfn: give more liberty to the channel selection in simple two-slot ops
...
Some ops on 64 bit data don't require the data to reside in neighboring
channels and can be executed as separate 32 bit ops. In these cases we don't
need to pin the registers to a specific channel, but for scheduling it is better
that we make sure that both destination values reside in different channels, so
that they can be scheduled into one ALU group and reduce the probability of
read-port conflicts when used as source values.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36743 >
2025-08-19 19:30:33 +00:00
Gert Wollny
206d50ba25
r600/sfn: op1v_flt64_to_flt32 as multi-slot instruction
...
With that the optimizer can better switch the channel.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36743 >
2025-08-19 19:30:32 +00:00
Gert Wollny
2d88e9236d
r600/sfn: Handle more ops in dest mask evaluation
...
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36743 >
2025-08-19 19:30:32 +00:00
Gert Wollny
00c41ad03a
r600/sfn: replace hard-coded multislot dot handling
...
More ops than op2_dot_ieee + op2_mul_ieee can be submitted
as multi-slot ops. Make it easy to handle additional opcodes
when splitting the alu op that has only one dst but requires
multiple slots. With that we can emit more multi-slot ops that
use consecutive slots and use a different opcode in the last slot.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36743 >
2025-08-19 19:30:31 +00:00
Gert Wollny
f2916b3df4
r600/sfn: Fix the mods when splitting ALU op
...
In preparation for splitting 64 bit two slot ops with one 32 bit
dest register, use the right start slot.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36743 >
2025-08-19 19:30:31 +00:00
Gert Wollny
1ba8ff9fe6
r600/sfn: Take slot count into account when pinning registers
...
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36743 >
2025-08-19 19:30:30 +00:00
Gert Wollny
77eaad8e21
r600/sfn: Fix test when allocating registers more freely
...
With the changes to the register pinning we have to update the test
to avoid failures later.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36743 >
2025-08-19 19:30:29 +00:00
Gert Wollny
b6a917b6da
r600/sfn: Only map ssa index to register index if pinning is not free
...
If we have more than one register that is associated with the same
ssa index, but can be allocated without a specific channel pinning,
then don't add it to the ssa.index/register.index map to not
re-use the same register index.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36743 >
2025-08-19 19:30:29 +00:00
Gert Wollny
6e2f08633a
r600/sfn: Take allowed dest mask into account in copy-prop
...
In addition, on Cayman some trans ops can use three or four channels,
and it may be an advantage to use the four channel version if the
result needs to be written to the w channel to reduce the overall
ALU instruction group count.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36743 >
2025-08-19 19:30:29 +00:00
Faith Ekstrand
14b4160792
vulkan/wsi: Only test for dma-buf sync file support once
...
Instead of each helper having a VK_ERROR_FEATURE_NOT_PRESENT fast-reject
path, drop those paths and check at the top of each caller. This
ensures that we do the check once per wsi_device, and only on a known
test dma-buf, and that any subsequent failures are reported as failures
rather than silently turning off explicit/implicit sync in potentially
inconsistent ways.
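A rough C sketch of the "probe once, then trust the result" structure described above. None of these names come from the WSI code: `struct device`, `device_init`, `import_sync_file`, `acquire` and the `enum result` values are hypothetical stand-ins, and the probe callbacks only simulate exercising a test dma-buf.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum result { RESULT_OK = 0, RESULT_NOT_SUPPORTED, RESULT_ERROR };

struct device {
   bool has_sync_file; /* decided once, at device creation */
};

/* Hypothetical probe; a real one would run against a test dma-buf. */
typedef enum result (*probe_fn)(void);

static enum result probe_ok(void) { return RESULT_OK; }
static enum result probe_unsupported(void) { return RESULT_NOT_SUPPORTED; }

/* The only place where "not supported" is turned into a capability flag. */
static enum result
device_init(struct device *dev, probe_fn probe)
{
   assert(dev != NULL && probe != NULL);

   enum result r = probe();
   if (r == RESULT_ERROR)
      return r;

   dev->has_sync_file = (r == RESULT_OK);
   return RESULT_OK;
}

/* Helper without a fast-reject path: it either works or fails. */
static enum result
import_sync_file(int fd)
{
   return fd >= 0 ? RESULT_OK : RESULT_ERROR;
}

/* Caller checks the cached flag first; later failures are real errors. */
static enum result
acquire(const struct device *dev, int fd)
{
   if (!dev->has_sync_file)
      return RESULT_NOT_SUPPORTED;

   return import_sync_file(fd);
}
```

Because the capability is decided in one place, a helper failing later cannot quietly flip the device into a different sync mode.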
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36816 >
2025-08-19 18:59:43 +00:00
Faith Ekstrand
6d3c82704d
vulkan/wsi: Sanitize the result of wsi_drm_check_dma_buf_sync_file_import_export()
...
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36816 >
2025-08-19 18:59:43 +00:00
Faith Ekstrand
9ddd29639c
vulkan/wsi: Style nits
...
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36816 >
2025-08-19 18:59:43 +00:00
Natalie Vock
4de3a5cce3
radv: Only expose indirect raytracing on gfx7+
...
It relies on unaligned indirect dispatches which are broken on gfx6.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30811 >
2025-08-19 18:34:41 +00:00
Rob Clark
e1493996b5
freedreno/decode: Add missing varset check
...
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13688
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36818 >
2025-08-19 18:19:58 +00:00
Samuel Pitoiset
baaf5d643a
radv: emit inlined push constants with buffered SH regs on GFX12
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36570 >
2025-08-19 18:01:23 +00:00
Samuel Pitoiset
c710eaa443
radv: emit descriptor pointers with buffered SH regs on GFX12
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36570 >
2025-08-19 18:01:22 +00:00
Samuel Pitoiset
95d2f009a9
radv: emit compute pipeline with buffered SH regs on GFX12
...
This also includes RT, task shaders and DGC IES for compute.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36570 >
2025-08-19 18:01:21 +00:00
Samuel Pitoiset
bbf8338443
radv: rework the helper to emit buffered regs on GFX12
...
Also reserve enough space if needed.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36570 >
2025-08-19 18:01:21 +00:00
Samuel Pitoiset
1f26f93aa7
radv: emit relocation for task shaders at the same place as other stages
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36570 >
2025-08-19 18:01:21 +00:00
Karol Herbst
f2f945c2b7
nak: run nir_opt_move nir_move_comparisons
...
Totals:
CodeSize: 914469536 -> 914055696 (-0.05%); split: -0.07%, +0.02%
Number of GPRs: 3863818 -> 3866731 (+0.08%); split: -0.01%, +0.08%
SLM Size: 841076 -> 840828 (-0.03%); split: -0.03%, +0.00%
Static cycle count: 1073101189 -> 1059404451 (-1.28%); split: -1.39%, +0.11%
Spills to memory: 57317 -> 54698 (-4.57%); split: -4.57%, +0.00%
Fills from memory: 57317 -> 54698 (-4.57%); split: -4.57%, +0.00%
Spills to reg: 67707 -> 57646 (-14.86%); split: -15.24%, +0.38%
Fills from reg: 80456 -> 71960 (-10.56%); split: -10.75%, +0.20%
Max warps/SM: 3672668 -> 3672244 (-0.01%); split: +0.00%, -0.01%
Totals from 33585 (38.33% of 87622) affected shaders:
CodeSize: 614909536 -> 614495696 (-0.07%); split: -0.10%, +0.03%
Number of GPRs: 1771770 -> 1774683 (+0.16%); split: -0.01%, +0.18%
SLM Size: 659824 -> 659576 (-0.04%); split: -0.04%, +0.00%
Static cycle count: 994849091 -> 981152353 (-1.38%); split: -1.50%, +0.12%
Spills to memory: 57317 -> 54698 (-4.57%); split: -4.57%, +0.00%
Fills from memory: 57317 -> 54698 (-4.57%); split: -4.57%, +0.00%
Spills to reg: 67372 -> 57311 (-14.93%); split: -15.32%, +0.39%
Fills from reg: 80178 -> 71682 (-10.60%); split: -10.79%, +0.20%
Max warps/SM: 1299808 -> 1299384 (-0.03%); split: +0.01%, -0.04%
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36536 >
2025-08-19 17:29:07 +00:00
Karol Herbst
83cf765f8e
nak: run nir_opt_move nir_move_load_ubo
...
Usually we can fold most ldc and ldcx into the instruction using it,
however there are a couple of cases where we can't, e.g. when there is an
indirect offset.
Moving the ldc(x) down to the consumer increases value ranges for
uniform registers, but lowers them for normal registers.
Totals:
CodeSize: 914650304 -> 914469536 (-0.02%); split: -0.05%, +0.03%
Number of GPRs: 3879754 -> 3863818 (-0.41%); split: -0.42%, +0.01%
Static cycle count: 1073273107 -> 1073101189 (-0.02%); split: -0.09%, +0.08%
Spills to reg: 67219 -> 67707 (+0.73%); split: -0.10%, +0.83%
Fills from reg: 79733 -> 80456 (+0.91%); split: -0.10%, +1.01%
Max warps/SM: 3666036 -> 3672668 (+0.18%); split: +0.18%, -0.00%
Totals from 24235 (27.66% of 87622) affected shaders:
CodeSize: 444747392 -> 444566624 (-0.04%); split: -0.11%, +0.07%
Number of GPRs: 1360384 -> 1344448 (-1.17%); split: -1.20%, +0.03%
Static cycle count: 806310857 -> 806138939 (-0.02%); split: -0.12%, +0.10%
Spills to reg: 35826 -> 36314 (+1.36%); split: -0.19%, +1.55%
Fills from reg: 31863 -> 32586 (+2.27%); split: -0.26%, +2.53%
Max warps/SM: 911328 -> 917960 (+0.73%); split: +0.74%, -0.01%
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36536 >
2025-08-19 17:29:07 +00:00
Erik Faye-Lund
efd73dca12
docs/panfrost: update exposed vulkan version
...
I've been waiting for the Vulkan 1.4 results to be formally conformant
to submit this, so I didn't have to update the wording, hehe.
Acked-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36838 >
2025-08-19 17:24:25 +00:00
Erik Faye-Lund
4c9aac2799
docs/features: sort drivers
...
We usually keep these alphabetically sorted, let's update the sorting
here.
Acked-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36838 >
2025-08-19 17:24:25 +00:00
Daniel Schürmann
0546ecfadb
aco/scheduler: small refactor of schedule_VMEM()
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36599 >
2025-08-19 16:59:12 +00:00
Daniel Schürmann
0c590eb903
aco/scheduler: schedule VMEM store clauses during the regular forward pass
...
Totals from 1456 (1.82% of 79839) affected shaders: (Navi48)
MaxWaves: 37780 -> 37128 (-1.73%); split: +0.15%, -1.87%
Instrs: 3788175 -> 3788435 (+0.01%); split: -0.04%, +0.04%
CodeSize: 20468648 -> 20467432 (-0.01%); split: -0.04%, +0.03%
VGPRs: 86820 -> 91440 (+5.32%); split: -0.10%, +5.42%
Latency: 26866232 -> 26858867 (-0.03%); split: -0.04%, +0.01%
InvThroughput: 3491741 -> 3828339 (+9.64%); split: -0.02%, +9.66%
VClause: 90413 -> 89426 (-1.09%); split: -1.27%, +0.18%
SClause: 130532 -> 130530 (-0.00%); split: -0.00%, +0.00%
Copies: 347397 -> 347806 (+0.12%); split: -0.11%, +0.23%
Branches: 117476 -> 117496 (+0.02%)
VALU: 1897427 -> 1897830 (+0.02%); split: -0.02%, +0.04%
SALU: 602365 -> 602379 (+0.00%)
VOPD: 1259 -> 1251 (-0.64%); split: +0.24%, -0.87%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36599 >
2025-08-19 16:59:12 +00:00
Daniel Schürmann
f601eb8555
aco/scheduler: move clauses as batch
...
Totals from 391 (0.49% of 79839) affected shaders:
Instrs: 612478 -> 612515 (+0.01%); split: -0.06%, +0.06%
CodeSize: 3342896 -> 3343228 (+0.01%); split: -0.04%, +0.05%
Latency: 6909794 -> 6909938 (+0.00%); split: -0.03%, +0.03%
VClause: 10752 -> 10167 (-5.44%); split: -5.46%, +0.02%
Copies: 26623 -> 26627 (+0.02%); split: -0.00%, +0.02%
VALU: 377494 -> 377499 (+0.00%); split: -0.00%, +0.00%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36599 >
2025-08-19 16:59:12 +00:00
Daniel Schürmann
70f0c065e8
aco/scheduler: ignore potential SMEM stalls when forming clauses
...
Totals from 4190 (5.25% of 79839) affected shaders: (Navi48)
MaxWaves: 117020 -> 117014 (-0.01%)
Instrs: 4801892 -> 4801547 (-0.01%); split: -0.06%, +0.05%
CodeSize: 25327632 -> 25325500 (-0.01%); split: -0.05%, +0.04%
VGPRs: 236452 -> 236488 (+0.02%)
Latency: 30569070 -> 30539464 (-0.10%); split: -0.13%, +0.04%
InvThroughput: 4891650 -> 4891062 (-0.01%); split: -0.03%, +0.01%
VClause: 119615 -> 118763 (-0.71%); split: -1.02%, +0.31%
SClause: 100482 -> 100297 (-0.18%); split: -0.44%, +0.26%
Copies: 326644 -> 326756 (+0.03%); split: -0.19%, +0.22%
Branches: 98982 -> 98980 (-0.00%)
VALU: 2712397 -> 2712534 (+0.01%); split: -0.02%, +0.03%
SALU: 591836 -> 591817 (-0.00%); split: -0.00%, +0.00%
VOPD: 993 -> 987 (-0.60%); split: +0.20%, -0.81%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36599 >
2025-08-19 16:59:11 +00:00
Daniel Schürmann
d3a0f268b9
aco/scheduler: short-cut downwards_move_clause() when no movement is done
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36599 >
2025-08-19 16:59:11 +00:00
Daniel Schürmann
8543b6cf2e
aco/scheduler: remove DownwardsCursor::clause_demand
...
As we stop scheduling after forming clauses, this value
is not needed anymore.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36599 >
2025-08-19 16:59:10 +00:00
Daniel Schürmann
5ae30deffb
aco/scheduler: remove DownwardsCursor::insert_demand_clause
...
This partially reverts 93872270f0 ('aco/scheduler: keep track of RegisterDemand at DownwardsCursor::insert_idx{_clause}').
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36599 >
2025-08-19 16:59:10 +00:00
Daniel Schürmann
e95d728a98
aco/scheduler: split downwards_move_clause() from downwards_move()
...
We will do batched moves for clauses with the next commit.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36599 >
2025-08-19 16:59:09 +00:00
Daniel Schürmann
37299a8d1a
aco/scheduler: Stop downwards scheduling after encountering the first clause
...
Totals from 9899 (12.40% of 79839) affected shaders: (Navi48)
MaxWaves: 276355 -> 276317 (-0.01%); split: +0.01%, -0.02%
Instrs: 8781768 -> 8766504 (-0.17%); split: -0.25%, +0.07%
CodeSize: 46297556 -> 46236104 (-0.13%); split: -0.19%, +0.06%
VGPRs: 574680 -> 574800 (+0.02%); split: -0.00%, +0.03%
Latency: 54261324 -> 54357916 (+0.18%); split: -0.14%, +0.32%
InvThroughput: 9122700 -> 9121115 (-0.02%); split: -0.07%, +0.05%
VClause: 222062 -> 218499 (-1.60%); split: -2.33%, +0.73%
SClause: 167138 -> 163233 (-2.34%); split: -2.43%, +0.09%
Copies: 602395 -> 598560 (-0.64%); split: -1.21%, +0.57%
Branches: 161939 -> 161932 (-0.00%); split: -0.01%, +0.00%
VALU: 5063999 -> 5060199 (-0.08%); split: -0.14%, +0.07%
SALU: 988254 -> 988285 (+0.00%); split: -0.02%, +0.02%
VOPD: 2478 -> 2443 (-1.41%); split: +0.40%, -1.82%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36599 >
2025-08-19 16:59:09 +00:00
Daniel Schürmann
fb6b95517e
aco/scheduler: check dependencies of entire clause upfront
...
and bail if any instruction of the clause can't be moved.
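The all-or-nothing behaviour can be illustrated with a small C sketch. This is not the ACO scheduler: `struct instr`, `clause_can_move` and `try_move_clause` are hypothetical, and the per-instruction `can_move` flag stands in for the real dependency check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical instruction record for the example. */
struct instr {
   bool can_move; /* result of a per-instruction dependency check */
   bool moved;
};

/* Returns true only if every instruction of the clause may be moved. */
static bool
clause_can_move(const struct instr *clause, size_t count)
{
   assert(clause != NULL || count == 0);

   for (size_t i = 0; i < count; i++) {
      if (!clause[i].can_move)
         return false;
   }
   return true;
}

/* Moves the whole clause or nothing; returns how many instructions moved. */
static size_t
try_move_clause(struct instr *clause, size_t count)
{
   if (!clause_can_move(clause, count))
      return 0; /* bail before touching any instruction */

   for (size_t i = 0; i < count; i++)
      clause[i].moved = true;
   return count;
}
```

Checking first means a clause is never left half-moved when one of its instructions turns out to be blocked.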
Totals from 4310 (5.40% of 79839) affected shaders:
MaxWaves: 115826 -> 115834 (+0.01%)
Instrs: 6256436 -> 6257599 (+0.02%); split: -0.05%, +0.07%
CodeSize: 32816488 -> 32820768 (+0.01%); split: -0.04%, +0.05%
VGPRs: 260184 -> 260172 (-0.00%)
Latency: 41207213 -> 41052150 (-0.38%); split: -0.45%, +0.07%
InvThroughput: 6822608 -> 6815208 (-0.11%); split: -0.14%, +0.03%
VClause: 148412 -> 147133 (-0.86%); split: -1.03%, +0.17%
SClause: 120854 -> 120856 (+0.00%); split: -0.01%, +0.01%
Copies: 425910 -> 427276 (+0.32%); split: -0.25%, +0.57%
VALU: 3572293 -> 3573647 (+0.04%); split: -0.03%, +0.07%
VOPD: 2803 -> 2816 (+0.46%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36599 >
2025-08-19 16:59:08 +00:00
Aksel Hjerpbakk
0e339c7a64
panvk: clear big_bos on cmd pool reset with release bit
...
Clear big bos cache if the user calls vkResetCommandPool with
VK_COMMAND_POOL_RESET_RELEASE_RESOURCES_BIT.
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36713 >
2025-08-19 16:41:31 +00:00
Aksel Hjerpbakk
0e88dd575f
panvk: pool large TLS allocations
...
Cache TLS in the case of large spilling. For content that is spilling
large amounts of TLS this can bring substantial uplifts in
performance.
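A minimal C sketch of the caching idea, assuming a small fixed-size free list and plain malloc/free in place of GPU buffer objects. `struct alloc_pool`, `pool_get`, `pool_put`, `pool_clear` and `POOL_SLOTS` are hypothetical names, not panvk's.

```c
#include <assert.h>
#include <stdlib.h>

#define POOL_SLOTS 4

struct big_alloc {
   void *mem;
   size_t size;
};

struct alloc_pool {
   struct big_alloc free_list[POOL_SLOTS];
   unsigned count;
};

/* Reuses a cached allocation that is large enough, else allocates. */
static void *
pool_get(struct alloc_pool *pool, size_t size, size_t *got_size)
{
   assert(pool != NULL && got_size != NULL);

   for (unsigned i = 0; i < pool->count; i++) {
      if (pool->free_list[i].size >= size) {
         struct big_alloc hit = pool->free_list[i];
         /* Fill the hole with the last cached entry. */
         pool->free_list[i] = pool->free_list[--pool->count];
         *got_size = hit.size;
         return hit.mem;
      }
   }

   *got_size = size;
   return malloc(size);
}

/* Returns an allocation to the cache, or frees it if the cache is full. */
static void
pool_put(struct alloc_pool *pool, void *mem, size_t size)
{
   if (pool->count < POOL_SLOTS)
      pool->free_list[pool->count++] = (struct big_alloc){mem, size};
   else
      free(mem);
}

/* Drops everything that is cached. */
static void
pool_clear(struct alloc_pool *pool)
{
   while (pool->count > 0)
      free(pool->free_list[--pool->count].mem);
}
```

Something like `pool_clear` is what a command pool reset with VK_COMMAND_POOL_RESET_RELEASE_RESOURCES_BIT would be expected to trigger, so that cached memory is actually handed back.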
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36713 >
2025-08-19 16:41:31 +00:00
Georg Lehmann
de3d04dd72
nir/uub: guard against division by 0
...
Fixes: 8ee5440073 ("nir/uub: improve ishl/imul with constant sources")
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36805 >
2025-08-19 15:49:57 +00:00
Romaric Jodin
910ac069c5
panfrost/perfetto: Use Android-internal perfetto
...
This enables ninja-to-soong to generate an Android.bp that builds Mesa
against Android's libperfetto_client_experimental library.
Following:
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36561
Reviewed-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36807 >
2025-08-19 15:02:06 +00:00
Daniel Schürmann
7e63251d1f
aco/isel: refactor store_shared() by directly matching NIR intrinsics to ACO opcodes
...
Totals from 1435 (1.80% of 79839) affected shaders: (Navi48)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36133 >
2025-08-19 14:28:15 +00:00
Daniel Schürmann
e504c2543a
radv: unconditionally call ac_nir_lower_mem_access_bit_sizes()
...
radv_nir_lower_io_to_mem() might also create unaligned memory accesses.
Totals from 1339 (1.68% of 79839) affected shaders: (Navi48)
MaxWaves: 35424 -> 35408 (-0.05%); split: +0.07%, -0.12%
Instrs: 1080783 -> 1047739 (-3.06%)
CodeSize: 5559464 -> 5311520 (-4.46%)
VGPRs: 78900 -> 78852 (-0.06%); split: -0.17%, +0.11%
Latency: 2802027 -> 2769668 (-1.15%); split: -1.16%, +0.01%
InvThroughput: 439935 -> 439313 (-0.14%); split: -0.23%, +0.09%
SClause: 15188 -> 15187 (-0.01%)
Copies: 63302 -> 62585 (-1.13%); split: -1.35%, +0.22%
PreVGPRs: 64891 -> 64901 (+0.02%)
VALU: 604979 -> 605116 (+0.02%); split: -0.04%, +0.06%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36133 >
2025-08-19 14:28:15 +00:00
Daniel Schürmann
1fde289539
aco/isel: refactor load_shared() by directly matching NIR intrinsics to ACO opcodes
...
Totals from 3 (0.00% of 79839) affected shaders: (Navi48)
Instrs: 700 -> 698 (-0.29%)
CodeSize: 3860 -> 3852 (-0.21%)
Latency: 2351 -> 2349 (-0.09%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36133 >
2025-08-19 14:28:15 +00:00
Daniel Schürmann
4632ee4c37
aco/isel: rename emit_readfirstlane() -> emit_vector_as_uniform()
...
Also allow using p_as_uniform and improve vector splitting.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36133 >
2025-08-19 14:28:14 +00:00
Daniel Schürmann
52cd5f7e69
ac/nir_lower_mem_access_bit_sizes: Split unsupported shared memory instructions
...
Totals from 1400 (1.75% of 79839) affected shaders: (Navi48)
MaxWaves: 38313 -> 38317 (+0.01%); split: +0.06%, -0.05%
Instrs: 1162521 -> 1199627 (+3.19%); split: -0.01%, +3.20%
CodeSize: 5874288 -> 6146832 (+4.64%); split: -0.01%, +4.65%
VGPRs: 79948 -> 79984 (+0.05%); split: -0.12%, +0.17%
Latency: 3703961 -> 3741457 (+1.01%); split: -0.02%, +1.04%
InvThroughput: 589594 -> 590597 (+0.17%); split: -0.06%, +0.23%
VClause: 22561 -> 22564 (+0.01%)
SClause: 19615 -> 19611 (-0.02%); split: -0.03%, +0.01%
Copies: 70721 -> 71678 (+1.35%); split: -0.25%, +1.60%
PreVGPRs: 61068 -> 61101 (+0.05%); split: -0.00%, +0.06%
VALU: 651754 -> 651785 (+0.00%); split: -0.07%, +0.07%
SALU: 141953 -> 141955 (+0.00%)
VOPD: 489 -> 485 (-0.82%); split: +0.41%, -1.23%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36133 >
2025-08-19 14:28:14 +00:00
Daniel Schürmann
63f7a03dd1
ac/nir: use HW-requirements on alignment for vectorizing LDS
...
Totals from 663 (0.83% of 79839) affected shaders: (Navi48)
MaxWaves: 16758 -> 16752 (-0.04%)
Instrs: 748063 -> 750213 (+0.29%); split: -0.08%, +0.37%
CodeSize: 3864912 -> 3874984 (+0.26%); split: -0.11%, +0.37%
VGPRs: 40640 -> 40604 (-0.09%); split: -0.30%, +0.21%
Latency: 6977888 -> 6980523 (+0.04%); split: -0.05%, +0.09%
InvThroughput: 1176313 -> 1174557 (-0.15%); split: -0.23%, +0.08%
VClause: 13852 -> 13843 (-0.06%); split: -0.10%, +0.04%
SClause: 13221 -> 13219 (-0.02%)
Copies: 44814 -> 44760 (-0.12%); split: -0.41%, +0.29%
PreSGPRs: 29276 -> 29285 (+0.03%)
PreVGPRs: 30835 -> 30861 (+0.08%); split: -0.11%, +0.19%
VALU: 423942 -> 423782 (-0.04%); split: -0.21%, +0.17%
SALU: 81271 -> 81188 (-0.10%); split: -0.19%, +0.09%
VOPD: 243 -> 238 (-2.06%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36133 >
2025-08-19 14:28:14 +00:00
Daniel Schürmann
26595577b3
aco/isel: allow for large 8-bit vectors in extract_8_16_bit_sgpr_element()
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36133 >
2025-08-19 14:28:14 +00:00