There are two fragment shaders from RDR2 that is hurt for spills and
fills on Lunar Lake.
Totals from 2 (0.00% of 551413) affected shaders:
Spill count: 1252 -> 1317 (+5.19%)
Fill count: 2518 -> 2642 (+4.92%)
Those shaders... have a lot of room for improvement. There are some
patterns in those shaders that we handle very, very poorly. Improving
those patterns would likely improve the spills and fills in these
shaders quite dramatically.
Given how much other platforms are helped, I don't this should block
this commit.
No shader-db or fossil-db changes on any pre-Gfx12.5 Intel platforms.
v2: Add some comments and an additional assertion. Suggested by Ken.
shader-db:
Lunar Lake
total instructions in shared programs: 18094517 -> 18094511 (<.01%)
instructions in affected programs: 809 -> 803 (-0.74%)
helped: 6 / HURT: 0
total cycles in shared programs: 921532158 -> 921532168 (<.01%)
cycles in affected programs: 2266 -> 2276 (0.44%)
helped: 0 / HURT: 3
Meteor Lake and DG2 had similar results. (Meteor Lake shown)
total instructions in shared programs: 19820845 -> 19820839 (<.01%)
instructions in affected programs: 803 -> 797 (-0.75%)
helped: 6 / HURT: 0
total cycles in shared programs: 906372999 -> 906372949 (<.01%)
cycles in affected programs: 3216 -> 3166 (-1.55%)
helped: 6 / HURT: 0
fossil-db:
Lunar Lake
Totals:
Instrs: 141887377 -> 141884465 (-0.00%); split: -0.00%, +0.00%
Cycle count: 21990301498 -> 21990267232 (-0.00%); split: -0.00%, +0.00%
Spill count: 69732 -> 69797 (+0.09%)
Fill count: 128521 -> 128645 (+0.10%)
Totals from 349 (0.06% of 551413) affected shaders:
Instrs: 506117 -> 503205 (-0.58%); split: -0.79%, +0.21%
Cycle count: 32362996 -> 32328730 (-0.11%); split: -0.52%, +0.41%
Spill count: 1951 -> 2016 (+3.33%)
Fill count: 4899 -> 5023 (+2.53%)
Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Instrs: 152773732 -> 152761383 (-0.01%); split: -0.01%, +0.00%
Cycle count: 17187529968 -> 17187450663 (-0.00%); split: -0.00%, +0.00%
Spill count: 79279 -> 79003 (-0.35%)
Fill count: 148803 -> 147942 (-0.58%)
Scratch Memory Size: 3949568 -> 3946496 (-0.08%)
Max live registers: 31879325 -> 31879230 (-0.00%)
Totals from 366 (0.06% of 633185) affected shaders:
Instrs: 557377 -> 545028 (-2.22%); split: -2.22%, +0.01%
Cycle count: 26171205 -> 26091900 (-0.30%); split: -0.54%, +0.24%
Spill count: 3238 -> 2962 (-8.52%)
Fill count: 10018 -> 9157 (-8.59%)
Scratch Memory Size: 257024 -> 253952 (-1.20%)
Max live registers: 28187 -> 28092 (-0.34%)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32041>
Some uses of the old pattern still exist. The use in brw_fs_nir.cpp is
deleted by commits !29884. The use in brw_lower_logical_sends.cpp seems
different, so I decided to keep it.
The next commit wants to use this.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32041>
Previously an instruction like
cmp.l.f0.0(16) null:F, v359:F, 0f
would get lowered to
undef(16) v13703:UD
cmp.l.f0.0(16) v13703:F, v359:F, 0f
mov(16) null:UD, v13703:UD
After copy propagation and dead-code elimination are run again, the
original CMP gets turned back into its original form!
Some cases can also emit MOVs from the original NULL register.
It should be possible to not do any lowering here, but there are some
interactions with source lowering passes for things like
cmp.l.f0.0(16) null:HF, g89.1<16,16,1>:HF, 0hf
What inspired this was... diff'ing step-by-step dumps from
INTEL_DEBUG=optimizer had a lot of useless changes due to these MOVs
and undefs. It was very annoying. This low-effort change gets the
majority of the possible benefit.
No shader-db or fossil-db changes on any Intel platform.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32041>
Copy propagation would incorrectly occur in this code
mov(16) v4+2.0:UW, u0<0>:UW NoMask
...
mov(8) v6+2.0:UD, v4+2.0:UD NoMask group0
to create
mov(16) v4+2.0:UW, u0<0>:UW NoMask
...
mov(8) v6+2.0:UD, u0<0>:UD NoMask group0
This has different behavior. I think I just made a mistake when I
changed this condition in e3f502e007.
It seems like this condition could be relaxed to cover cases like (note
the change of destination stride)
mov(16) v4+2.0<2>:UW, u0<0>:UW NoMask
...
mov(8) v6+2.0:UD, v4+2.0:UD NoMask group0
I'm not sure it's worth it.
No shader-db or fossil-db changes on any Intel platform. Even the code
for the test case mentioned in the original commit did not change.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes: e3f502e007 ("intel/fs: Allow copy propagation between MOVs of mixed sizes")
Closes: #12116
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32041>
Specifically, allow two immediate sources for BFE on Gfx12+. I stumbled
on this while trying some stuff with !31852.
v2: Don't be lazy. Add proper assertions for all the things on all the
platforms. Based on a suggestion by Ken.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes: 7bed11fbde ("intel/brw: Allow immediates in the BFE instruction on Gfx12+")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31858>
This is required, otherwise we regress latency in cases where
applications are using FIFO without explicit KHR_present_wait.
This is an unacceptable regression.
The fix is to normalize the behavior to X11 WSI.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Fixes: d052b0201e ("vulkan/wsi/wayland: Use fifo protocol for FIFO")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32029>
SCC is only updated on GFX9+ but let's do it by default because the
trap handler shader is likely going to be more and more complex over
time.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32026>
ac_fake_hw_db can be the single place where radeon_info content
is emulated when overriding the GPU type.
For some fields we need to avoid overriding them with the value
coming from the ioctls to get the correct behavior.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31841>
The next commit will reuse the radeon_info when AMD_FORCE_FAMILY
is used.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31841>
If the gfx version is overriden by the exporter process, the surface
layout might not be compatible with the importer process (which uses
the real gfx version).
So fail early, except if the layout is LINEAR (because it should work
on all gen) or a modifier is used (which should be rejected elsewhere
if not supported).
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31841>
Instead of increasing the version number to describe which fields
are set, use the lower 16 bits for the metadata format version,
and the other bits as flags.
This way the version number defines the layout, and the flags
tell which values are set.
The format version is bumped to 3 (= can have flags), and 2 flags
are defined:
* AC_SURF_METADATA_FLAG_EXTRA_MD_BIT: replaces what was version
number = 2. This means the metadata contains extra information
for tools.
* AC_SURF_METADATA_FLAG_FAMILY_OVERRIDEN_BIT: if set, it means the
surface was allocated from a context that used an overriden gfx
family. This allows the importer process to fail the import early,
as the surface is likely to be invalid.
It also adds an extra dw at the end, to store the fake family.
This is a breaking change for existing code that interpreted
"version > 1" as 2, but only in one case:
AC_SURF_METADATA_FLAG_FAMILY_OVERRIDEN_BIT being set, but not
AC_SURF_METADATA_FLAG_EXTRA_MD_BIT, which produces a version number
of 0x20001 but there's not extra data.
I think this is ok, since both gfx family overriding and extra_md
are debugging tools.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31841>
It's not used anymore
Acked-by: David Heidelberg <david@ixit.cz>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27783>
People had enough time to migrate to rusticl, also nobody would support
this anyway anymore.
Acked-by: David Heidelberg <david@ixit.cz>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27783>
Fixes: 212f1ab40e ("nvc0: support PIPE_CAP_RESOURCE_FROM_USER_MEMORY_COMPUTE_ONLY")
Acked-by: David Heidelberg <david@ixit.cz>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27783>
This is only header metadata hint, so it should be passed directly
from packed headers to output. Also fix the value as render_width from
frontend is actually render_width_minus_1 (and same for height).
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31977>