Marek Olšák
f00f054087
ac,radeonsi: move lowering to load_color0/1 to ac_nir_lower_ps_early
...
It's better to have these all in one pass.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38802 >
2026-01-01 18:30:29 +00:00
Georg Lehmann
cbedced5e8
ac/nir/cull: do not reuse variables if subgroup ops are used
...
Subgroup ops make divergence information useless for our purpose;
we would need workgroup divergence.
The game affected here has control flow dependent on vote_any,
so it's possible that a wave only executes the code after culling/reordering
invocations.
That means we can't reuse the possibly undefined value from before culling.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14459
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39060 >
2025-12-29 18:38:29 +00:00
Samuel Pitoiset
78e1f53429
ac/perfcounter: update configuration of many blocks on GFX12
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39083 >
2025-12-29 07:22:41 +00:00
Samuel Pitoiset
e377060e5c
ac/perfcounter: rework computing the number of block instances on GFX12
...
This needs to be generalized to older generations.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39083 >
2025-12-29 07:22:41 +00:00
Samuel Pitoiset
a90b913817
ac/perfcounter: fix the number of static instances for some blocks on GFX12
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39083 >
2025-12-29 07:22:41 +00:00
Samuel Pitoiset
a62ca19010
ac/perfcounter: update the number of events for GRBME_SE on GFX12
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39083 >
2025-12-29 07:22:40 +00:00
Samuel Pitoiset
3317ea5122
ac/perfcounter: define a distribution mode for all perf blocks on GFX12
...
This will be used to compute the number of instances and more stuff.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39083 >
2025-12-29 07:22:40 +00:00
Samuel Pitoiset
5de9390d4c
ac/perfcounter: move configuration for GFX12 in a separate file
...
Performance counters are too different between generations and it's
less error prone to define them separately for each generation.
I'm starting with GFX12.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39083 >
2025-12-29 07:22:39 +00:00
Samuel Pitoiset
b3c983b8dd
amd,radv,radeonsi: add a new function to update windowed perf counters
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39065 >
2025-12-24 07:20:01 +00:00
Timur Kristóf
7dbabc6acc
ac/nir/lower_taskmesh_io_to_mem: Use AC_TASK_DRAW_ENTRY_BYTES
...
Replace draw_entry_bytes with AC_TASK_DRAW_ENTRY_BYTES.
This is 16 on all AMD HW that supports task/mesh shaders.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39032 >
2025-12-22 15:17:59 +00:00
Timur Kristóf
fc57fa4589
radv, radeonsi: Don't pass task ring info to mesh/task payload lowering
...
The pass now uses the ring descriptors to figure these out.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39032 >
2025-12-22 15:17:59 +00:00
Timur Kristóf
4d381c9136
ac/nir/lower_taskmesh_io_to_mem: Don't hardcode payload entry size in shaders
...
Currently the task payload entry size is hardcoded
in shaders as a constant. This isn't a good idea because it
makes the code inflexible, e.g. it doesn't allow us
to change the number of entries dynamically.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39032 >
2025-12-22 15:17:59 +00:00
Timur Kristóf
5348d953aa
ac/nir/lower_taskmesh_io_to_mem: Don't hardcode num_entries in shaders
...
Currently the number of task shader ring entries is hardcoded
in shaders as a constant. This isn't a good idea because it
makes the code inflexible, e.g. it prevents us from using the same
shader binary across some chips and doesn't allow us
to change the number of entries dynamically.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39032 >
2025-12-22 15:17:58 +00:00
Samuel Pitoiset
3b18fa348e
ac/rgp: enable new performance counters for RGP 2.6 on GFX10-GFX11
...
GFX12 needs more work and it will be added separately.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39013 >
2025-12-22 09:52:14 +01:00
Samuel Pitoiset
8bc37d0d19
ac/spm: add support for Ray Tracing counters in RGP
...
These aren't new in RGP 2.6; they were added a while ago. But
because RADV didn't support the new derived SPM chunk, it wasn't
possible to expose them.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39013 >
2025-12-22 09:51:44 +01:00
Samuel Pitoiset
0b5ae0758e
ac/spm: add support for new Memory percentage counters in RGP 2.6
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39013 >
2025-12-22 09:51:14 +01:00
Samuel Pitoiset
3d2bb52a81
ac/spm: add support for new Memory bytes counters in RGP 2.6
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39013 >
2025-12-22 09:50:44 +01:00
Samuel Pitoiset
84ecdc534c
ac/spm: add support for new LDS counters in RGP 2.6
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39013 >
2025-12-22 09:50:41 +01:00
Samuel Pitoiset
07d9fc574c
ac/spm: implement the new derived SPM chunk for performance counters
...
This is the new method to add performance counters to RGP captures.
This will be used to add the new RGP 2.6 counters too.
The previous SPM code will be deprecated at some point but it's hard
to support all generations in one batch. So, I will implement this
step by step.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39013 >
2025-12-22 09:48:59 +01:00
Samuel Pitoiset
3e4d629458
ac/spm: add an ID to raw performance counters
...
This will be used to compute derived values for the new RGP/SPM chunk.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39013 >
2025-12-22 09:48:29 +01:00
Samuel Pitoiset
21ad7e4e32
ac/spm: print an error message when a group is unknown
...
Helps with debugging.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39013 >
2025-12-22 09:48:21 +01:00
Samuel Pitoiset
7da6fe6a00
ac/spm: fix programming more than one counter slot
...
Some blocks have two or more SPM counters and they should be used when
more than 4 counters are programmed (i.e. 16 bits per counter).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39013 >
2025-12-22 09:48:14 +01:00
Samuel Pitoiset
e5a041ee1c
ac/spm: add an assertion to check the number of global instances
...
To make sure counters aren't silently discarded.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39013 >
2025-12-22 09:48:06 +01:00
Samuel Pitoiset
eca9c00430
ac/spm: adjust configuration of some GPU blocks
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39013 >
2025-12-22 09:47:58 +01:00
Samuel Pitoiset
6613dfb234
ac/perfcounter: add GCEA block description on GFX10-11
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39013 >
2025-12-22 09:47:29 +01:00
Samuel Pitoiset
25e28819bd
ac/perfcounter: adjust the number of events for TD on GFX10.3
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39013 >
2025-12-22 09:47:21 +01:00
Samuel Pitoiset
a4cb114f5a
ac/perfcounter: add a separate group for GFX10.3
...
This is just a copy and paste, but GFX10.3 has many more counters than
GFX10, which will be added later.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39013 >
2025-12-22 09:47:09 +01:00
Daniel Schürmann
1e8d367537
amd: add and use ac_cu_info::has_vtx_format_alpha_adjust_bug
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38701 >
2025-12-22 07:34:48 +00:00
Daniel Schürmann
febc29907c
amd: add and use ac_cu_info::has_gfx6_mrt_export_bug
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38701 >
2025-12-22 07:34:47 +00:00
Daniel Schürmann
7b7bdb76ab
amd: add ac_cu_info::has_point_sample_accel flag and use in ACO
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38701 >
2025-12-22 07:34:47 +00:00
Daniel Schürmann
cfb745592d
amd: add ac_cu_info::has_mad32 flag and use in ACO
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38701 >
2025-12-22 07:34:47 +00:00
Daniel Schürmann
f7c4aa48a0
ac/gpu_info: add some more flags to ac_cu_info
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38701 >
2025-12-22 07:34:46 +00:00
Daniel Schürmann
6f4e8046b5
ac/gpu_info: create separate function ac_fill_cu_info() to fill out CU info
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38701 >
2025-12-22 07:34:45 +00:00
Daniel Schürmann
749c619c45
ac/gpu_info: correct some SGPR and VGPR allocation values in ac_cu_info
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38701 >
2025-12-22 07:34:45 +00:00
Daniel Schürmann
553b431aca
ac/gpu_info: move some CU information into separate struct ac_cu_info
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38701 >
2025-12-22 07:34:44 +00:00
Daniel Schürmann
d94e90df25
amd/common: link with libamdgpu_addrlib
...
ac_surface needs that.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38701 >
2025-12-22 07:34:41 +00:00
Daniel Schürmann
f930ecdc55
amd: add newer small APUs to get_task_num_entries()
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38999 >
2025-12-19 13:03:49 +00:00
Pierre-Eric Pelloux-Prayer
645fff5dae
ac/descriptors: account for num_storage_samples for gfx10
...
This fixes a page fault when nr_samples=4 but nr_storage_samples=2.
Based on si_is_format_supported this is only supported for color
formats and when has_eqaa_surface_allocator is true (< GFX11).
The referenced commit below didn't introduce the issue but it
exposed it by forcing the gfx blit path to be used.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13255
Fixes: 3424e16ece ("radeonsi: add decision code to select when to use CB_RESOLVE for performance")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38925 >
2025-12-18 10:45:49 +00:00
Emma Anholt
059d301c79
nir: Drop the mode argument of nir_lower_vars_to_scratch().
...
It only makes sense for function temps, and that's the only way it's been
used.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37245 >
2025-12-17 19:50:28 +00:00
Samuel Pitoiset
f8feed17e1
ac,radv,radeonsi: add tracked register macros to common code
...
Because the tracked registers are really driver-dependent, the driver
is expected to handle the tracked_registers struct itself.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38740 >
2025-12-17 15:09:26 +00:00
Samuel Pitoiset
c580fc667f
ac,radv: add ac_cmdbuf::context_roll and use it
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38740 >
2025-12-17 15:09:26 +00:00
Samuel Pitoiset
f3b385859a
ac,radv: add more cmdbuf emit helpers
...
Some can't be shared with RadeonSI because it uses templates in some
places.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38740 >
2025-12-17 15:09:25 +00:00
Samuel Pitoiset
262fc80e45
ac,radv,radeonsi: add functions to initialize tracked regs
...
Also initialize the new slots for RADV.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38740 >
2025-12-17 15:09:25 +00:00
Samuel Pitoiset
44314e1ea6
ac,radv,radeonsi: add ac_tracked_regs
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38740 >
2025-12-17 15:09:24 +00:00
Samuel Pitoiset
fad24d6fcc
ac/cmdbuf: add new slots to ac_tracked_reg
...
For RADV registers that aren't tracked in RadeonSI.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38740 >
2025-12-17 15:09:23 +00:00
Samuel Pitoiset
18bdb76408
ac,radeonsi: move si_tracked_reg to common code
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38740 >
2025-12-17 15:09:22 +00:00
Timur Kristóf
0324700c03
radv: Use zero-filled BO for GFX6 and GFX10 null index buffer bug
...
GFX10 hangs when drawing from a 0-sized index buffer.
GFX6 has a HW bug when the index buffer address is 0.
Looking at VK CTS runs, GFX6 still triggers VM faults despite the
current mitigation, and it also tries to access memory when the
index buffer is zero-sized. So it looks like GFX6 and GFX10
really have the same bug.
Let's share the mitigation between the two.
Use a zero-filled BO instead of the upload buffer.
This fixes VM faults on GFX6, and should speed up GFX10 a bit.
Note that the zero-filled BO is also going to be used for
other bug mitigations on GFX6-7.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38958 >
2025-12-15 21:03:19 +00:00
Georg Lehmann
da197c3d55
ac/nir/lower_ps_late: remove gfx6 mrtz writemask workaround
...
This is now done in the backends.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38853 >
2025-12-12 17:00:51 +00:00
Rhys Perry
b5cf3b1628
ac/nir: fix check for increasing size of non-descriptor loads
...
In the previous version, "end" could have been zero, which would have
allowed an increase of "mul" bytes, when it should not be increased at all.
For example:
- align_offset=4
- mul=4
- unaligned_new_size=96
- aligned_new_size=128
This would have loaded a dword which was not loaded previously.
fossil-db (gfx1201):
Totals from 115 (0.14% of 79839) affected shaders:
Instrs: 286697 -> 287097 (+0.14%); split: -0.16%, +0.30%
CodeSize: 1477728 -> 1481256 (+0.24%); split: -0.13%, +0.37%
SpillSGPRs: 1662 -> 1658 (-0.24%); split: -0.42%, +0.18%
Latency: 2288612 -> 2290248 (+0.07%); split: -0.04%, +0.11%
InvThroughput: 467307 -> 467602 (+0.06%); split: -0.03%, +0.10%
VClause: 3689 -> 3691 (+0.05%)
SClause: 5052 -> 5064 (+0.24%); split: -0.20%, +0.44%
Copies: 34837 -> 35103 (+0.76%); split: -0.80%, +1.56%
Branches: 7402 -> 7401 (-0.01%)
PreSGPRs: 9147 -> 9143 (-0.04%); split: -0.44%, +0.39%
VALU: 159333 -> 159372 (+0.02%); split: -0.01%, +0.04%
SALU: 52047 -> 52276 (+0.44%); split: -0.55%, +0.99%
SMEM: 9556 -> 9697 (+1.48%)
fossil-db (navi31):
Totals from 238 (0.30% of 79825) affected shaders:
Instrs: 484480 -> 485105 (+0.13%); split: -0.05%, +0.17%
CodeSize: 2514012 -> 2517928 (+0.16%); split: -0.06%, +0.22%
SpillSGPRs: 1064 -> 1059 (-0.47%)
Latency: 3941121 -> 3944670 (+0.09%); split: -0.04%, +0.13%
InvThroughput: 897483 -> 898090 (+0.07%); split: -0.04%, +0.11%
VClause: 7101 -> 7098 (-0.04%)
SClause: 9036 -> 9052 (+0.18%); split: -0.44%, +0.62%
Copies: 42790 -> 43096 (+0.72%); split: -0.30%, +1.01%
PreSGPRs: 14357 -> 14342 (-0.10%); split: -0.37%, +0.26%
VALU: 298325 -> 298347 (+0.01%); split: -0.01%, +0.02%
SALU: 57288 -> 57577 (+0.50%); split: -0.20%, +0.70%
SMEM: 18768 -> 18967 (+1.06%); split: -0.01%, +1.07%
fossil-db (navi21):
Totals from 239 (0.30% of 79825) affected shaders:
Instrs: 444783 -> 445177 (+0.09%); split: -0.07%, +0.15%
CodeSize: 2371776 -> 2373136 (+0.06%); split: -0.13%, +0.19%
Latency: 4226478 -> 4219221 (-0.17%); split: -0.24%, +0.07%
InvThroughput: 1430962 -> 1428445 (-0.18%); split: -0.23%, +0.06%
SClause: 9357 -> 9398 (+0.44%); split: -0.20%, +0.64%
Copies: 42742 -> 42927 (+0.43%); split: -0.53%, +0.96%
Branches: 12975 -> 12970 (-0.04%); split: -0.05%, +0.02%
PreSGPRs: 14368 -> 14312 (-0.39%); split: -0.47%, +0.08%
VALU: 306642 -> 306720 (+0.03%); split: -0.02%, +0.05%
SALU: 63702 -> 63790 (+0.14%); split: -0.31%, +0.45%
SMEM: 20030 -> 20231 (+1.00%); split: -0.00%, +1.01%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14458
Backport-to: 25.3
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38903 >
2025-12-12 13:58:42 +00:00
Rhys Perry
49d923078f
ac/nir: fix calculation of aligned_new_size
...
This should consider nir_round_up_components().
fossil-db (gfx1201):
Totals from 90 (0.11% of 79839) affected shaders:
MaxWaves: 1829 -> 1901 (+3.94%)
Instrs: 410780 -> 411825 (+0.25%); split: -0.02%, +0.27%
CodeSize: 2227956 -> 2234464 (+0.29%); split: -0.02%, +0.31%
VGPRs: 6952 -> 6760 (-2.76%); split: -3.11%, +0.35%
Latency: 3071765 -> 3073960 (+0.07%); split: -0.00%, +0.07%
InvThroughput: 766201 -> 767322 (+0.15%); split: -0.00%, +0.15%
VClause: 7887 -> 7898 (+0.14%); split: -0.08%, +0.22%
Copies: 48189 -> 48324 (+0.28%); split: -0.05%, +0.33%
PreVGPRs: 6605 -> 6595 (-0.15%); split: -0.18%, +0.03%
VALU: 237272 -> 238147 (+0.37%); split: -0.01%, +0.37%
SALU: 48987 -> 49003 (+0.03%)
VMEM: 15542 -> 15560 (+0.12%)
VOPD: 188 -> 200 (+6.38%)
fossil-db (navi31):
Totals from 89 (0.11% of 79825) affected shaders:
MaxWaves: 1811 -> 1883 (+3.98%)
Instrs: 403695 -> 404691 (+0.25%); split: -0.01%, +0.26%
CodeSize: 2150612 -> 2154860 (+0.20%); split: -0.03%, +0.23%
VGPRs: 6892 -> 6676 (-3.13%)
Latency: 3306107 -> 3310010 (+0.12%); split: -0.01%, +0.13%
InvThroughput: 813092 -> 814382 (+0.16%); split: -0.00%, +0.16%
VClause: 7999 -> 8010 (+0.14%); split: -0.06%, +0.20%
Copies: 50089 -> 50210 (+0.24%); split: -0.05%, +0.29%
PreVGPRs: 6596 -> 6586 (-0.15%); split: -0.18%, +0.03%
VALU: 239617 -> 240392 (+0.32%); split: -0.01%, +0.33%
SALU: 45349 -> 45363 (+0.03%)
VMEM: 15762 -> 15780 (+0.11%)
VOPD: 258 -> 262 (+1.55%)
fossil-db (navi21):
Totals from 89 (0.11% of 79825) affected shaders:
Instrs: 345634 -> 346426 (+0.23%); split: -0.00%, +0.23%
CodeSize: 1895616 -> 1900156 (+0.24%); split: -0.00%, +0.24%
Latency: 3043334 -> 3046859 (+0.12%); split: -0.01%, +0.13%
InvThroughput: 928236 -> 929626 (+0.15%); split: -0.01%, +0.16%
VClause: 7894 -> 7905 (+0.14%); split: -0.06%, +0.20%
Copies: 48694 -> 48785 (+0.19%); split: -0.03%, +0.22%
PreVGPRs: 6580 -> 6570 (-0.15%); split: -0.18%, +0.03%
VALU: 228323 -> 229072 (+0.33%); split: -0.01%, +0.33%
SALU: 47202 -> 47216 (+0.03%)
VMEM: 16546 -> 16564 (+0.11%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Gitlab: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14458
Backport-to: 25.3
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38903 >
2025-12-12 13:58:42 +00:00