This is similar to
75ede9d9bc ("intel/brw: track last successful pass and leave the loop early")
except that it uses the common nir helpers.
Note that I've also marked nir_opt_peephole_select as NOT_IDEMPOTENT
because I'm skeptical that it actually is idempotent. This differs from
both brw and radv.
I'm also marking gcm as not idempotent because it isn't idempotent in
practice on one of the shaders in my shader-db:
2bf4ba7133/fossils/blender
pipeline hash 0e972f8e349af903
This is about a 4% geomean compile time speedup on my local collection
of shaders.
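The early-exit idea, and why the idempotency marking matters, can be sketched with a toy model. This is not the NIR helper API: `run_to_fixed_point` and the three toy passes are hypothetical names, and the "IR" is just a tuple of integers. The loop remembers the last pass that made progress. When it comes back around to that pass with no changes in between, an idempotent pass can be skipped and the loop ends, while a non-idempotent pass has to run once more.

```python
def run_to_fixed_point(ir, passes):
    """Run `passes` in order, looping until nothing changes.

    Each entry is (fn, idempotent); fn(ir) returns (new_ir, progress).
    All names here are illustrative, not Mesa APIs.
    """
    last_progress = None  # index of the most recent pass that changed `ir`
    while True:
        any_progress = False
        for idx, (fn, idempotent) in enumerate(passes):
            back_at_last = idx == last_progress
            # Every other pass has run since `last_progress` without
            # changing anything. An idempotent pass cannot do more work.
            if back_at_last and idempotent:
                return ir
            ir, progress = fn(ir)
            if progress:
                last_progress = idx
                any_progress = True
            elif back_at_last:
                # The non-idempotent pass got its extra run and found nothing.
                return ir
        if not any_progress:
            return ir


def drop_zeros(ir):
    out = tuple(x for x in ir if x != 0)
    return out, out != ir


def dedupe_adjacent(ir):
    out = tuple(x for i, x in enumerate(ir) if i == 0 or x != ir[i - 1])
    return out, out != ir


def halve_evens_once(ir):
    # Not idempotent: 8 -> 4 on the first run, 4 -> 2 on the second.
    out = tuple(x // 2 if x % 2 == 0 else x for x in ir)
    return out, out != ir


shader = (8, 0, 8, 3)
correct = run_to_fixed_point(
    shader, [(drop_zeros, True), (dedupe_adjacent, True), (halve_evens_once, False)]
)
mismarked = run_to_fixed_point(
    shader, [(drop_zeros, True), (dedupe_adjacent, True), (halve_evens_once, True)]
)
```

With the correct marking the loop reaches `(1, 3)`. If `halve_evens_once` is wrongly marked idempotent, the loop stops at `(4, 3)`, which is the kind of missed optimization that marking a doubtful pass as not idempotent avoids.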
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41118>
When lowering cf, we go out of SSA, which translates phis into reg
intrinsics. However, when converting them back to SSA, phis that
initially had a single source now have an undef source, leading to
increased register pressure on the NAK side. This also hinders copy
propagation, as it is not designed to handle sources through phis yet.
Totals from 50621 (4.17% of 1212873) affected shaders:
CodeSize: 1605273744 -> 1621029728 (+0.98%); split: -0.34%, +1.32%
Number of GPRs: 4673586 -> 4067935 (-12.96%); split: -12.97%, +0.01%
SLM Size: 263428 -> 258176 (-1.99%)
Static cycle count: 2599838439 -> 2586392435 (-0.52%); split: -1.11%, +0.59%
Spills to memory: 23512 -> 15527 (-33.96%)
Fills from memory: 23512 -> 15527 (-33.96%)
Spills to reg: 64590 -> 57328 (-11.24%); split: -13.83%, +2.58%
Fills from reg: 55559 -> 44319 (-20.23%); split: -22.66%, +2.42%
Max warps/SM: 1189396 -> 1347600 (+13.30%)
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41042>
Some Max Payne 3 shaders are affected by this, so it will probably fix
some issues there. The VK CTS isn't testing this, but it was verified to
fix a real problem by inserting 0 offsets into the instruction and having
CTS tests fail with the old ordering.
Totals from 3 (0.00% of 1163204) affected shaders:
CodeSize: 2496 -> 2736 (+9.62%)
Static cycle count: 732 -> 741 (+1.23%)
Fixes: ad01fbdda0 ("nak: Add a NIR texture lowering pass")
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40957>
This appears to cause some sort of prefetching, which causes page
faults for linear textures on the page following the texture
allocation.
This might be okay for tiled, but for now just disable it.
The crash was triggered by allocating an 800x409 linear 2D texture,
which gnome-initial-setup does.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/work_items/15277
Cc: mesa-stable
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40939>
This helps RA a bit by reducing the size of the vectors passed to tex
instructions, which eliminates a few movs.
Totals from 145533 (12.51% of 1163204) affected shaders:
CodeSize: 1868329120 -> 1855817520 (-0.67%); split: -0.70%, +0.03%
Number of GPRs: 7007196 -> 7007028 (-0.00%); split: -0.01%, +0.01%
Static cycle count: 1157484762 -> 1153189018 (-0.37%); split: -0.46%, +0.09%
Spills to reg: 30581 -> 30580 (-0.00%)
Fills from reg: 33263 -> 33262 (-0.00%)
Max warps/SM: 5911104 -> 5911100 (-0.00%); split: +0.00%, -0.00%
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40900>
With OpLd now having a predicate, we forgot to update legalize_ext_instr
to allow predicates for it.
We should really get rid of those functions, but for now let's keep it
simple and sync the implementation to what the SM20 backend has.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Fixes: 9d90cbc314 ("nak: add input predicate to load_global_nv and OpLd")
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40934>
The issues with this are proving to be difficult to solve. Turn it off
for now until we have a proper fix.
Fixes: c24963d8da ("nvk: Enable zcull for VK_ATTACHMENT_LOAD_OP_LOAD")
Acked-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40894>
Calling this variable `i` made it very easy for it to shadow a loop
variable in the enclosing scope, which became an issue if `src` were an
expression referencing a different variable `i`. Rename the variable to
make shadowing less likely.
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40639>
The more general address space we used to have cannot be
implemented on top of ROOT_TABLE because of ROOT_TABLE's bank pattern.
Instead, adjust the address space so it provides a less general index
into dynamic_buffers.
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40639>
Those were relevant for Fermi or just the Gallium driver.
The vertex runout is implemented a bit later
(SET_VERTEX_STREAM_SUBSTITUTE_A).
I also rewrote the comment about CSAA_ENABLE as it is still relevant.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Tested-by: Thomas H.P. Andersen <phomes@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40753>
The NVIDIA proprietary driver sets 4 for UNORM8 and SRGB8; let's match this.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Tested-by: Thomas H.P. Andersen <phomes@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40753>
The NVIDIA proprietary driver does that. We were missing this, possibly
making the VAF (Vertex Attribute Fetch) unit evict the first entry
instead if nothing was setting it.
The golden ctx already sets it for us, at least on Ada, but for
consistency let's make sure it's set here in case this is different on
other generations.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Tested-by: Thomas H.P. Andersen <phomes@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40753>
There is no reason to cut 48KiB of memory out of L1 cache on gfx
considering that we do not have shared memory and that local
memory does not need to be directly addressable.
This is not set by the NVIDIA proprietary driver, and the golden ctx setup
keeps it uninitialized.
Unsure if that will change anything in terms of performance, but it might
reduce L1 cache usage on 3D.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Tested-by: Thomas H.P. Andersen <phomes@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40753>
We are already doing this in nvk_push_draw_state_init, so there is no need
for the extra DMA fill.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Tested-by: Thomas H.P. Andersen <phomes@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40753>
The nil Rust test fails to link when built under Coverity (cov-build):
    /usr/bin/ld: src/util/libmesa_util.a.p/format_u_format_other.c.o:
      undefined reference to symbol 'sqrtf@@GLIBC_2.2.5'
    /usr/bin/ld: /lib/x86_64-linux-gnu/libm.so.6:
      error adding symbols: DSO missing from command line
This does not reproduce with plain GCC or Clang builds.
When rustc invokes the linker for the nil test binary, the generated
link command is structured as:
    cc ... [Rust rlibs] -Bdynamic -lm -ldl -lc ...
       -fuse-ld=lld -B.../gcc-ld ...
       [static archives: libmesa_util.a ...]
The -lm appears before libmesa_util.a in both Coverity and non-Coverity
builds. With --as-needed enabled, the linker only records a shared
library as needed if it resolves an undefined symbol at the point it
is encountered. Since no symbols need -lm when it is first seen, the
outcome depends on the linker implementation:
  - lld (rustc's bundled linker, used in plain builds): Tolerates
    back-references from later static archives to earlier shared
    libraries, so libmesa_util.a's sqrtf reference is still resolved
    by the previously-seen libm.so.
  - ld.bfd (GNU ld): Strict single-pass left-to-right. Once -lm is
    skipped by --as-needed, it cannot satisfy sqrtf when libmesa_util.a
    is processed later.
Coverity's cov-build wrapper intercepts rustc's call to the linker
and strips the -fuse-ld=lld and -B.../gcc-ld arguments, causing the
linker to fall back to the system's ld.bfd. This exposes the latent
link-order problem that lld was masking.
The underlying issue is that rustc places default libraries (-lm, -lc,
etc.) before user-specified static archives in the link command, which
is a known rustc limitation.
See also: https://github.com/rust-lang/rust/issues/154975
Fix this by passing -lm via rust_args with --no-as-needed brackets.
This forces ld.bfd to record libm as needed regardless of when it
appears on the command line, so sqrtf from libmesa_util.a is resolved
correctly under both lld and ld.bfd.
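The link-order behavior described above can be illustrated with a toy model of a strict single-pass linker. This is a deliberate simplification and not how ld.bfd is implemented: `link_single_pass`, `LinkError`, and the tuple layout are hypothetical, static archives are treated as always-linked objects, and the position of the bracketed -lm on the command line is an assumption.

```python
class LinkError(Exception):
    pass


def link_single_pass(command_line):
    """Toy model of a strict left-to-right link with --as-needed.

    Flags are plain strings. Inputs are tuples of
    (kind, name, defines, requires) where kind is "shared" or "static".
    Returns the list of shared libraries recorded as needed.
    """
    defined, undefined, needed = set(), set(), []
    as_needed = False
    for item in command_line:
        if item == "--as-needed":
            as_needed = True
            continue
        if item == "--no-as-needed":
            as_needed = False
            continue
        kind, name, defines, requires = item
        if kind == "shared":
            # Under --as-needed, a shared library is dropped unless it
            # resolves something that is undefined right now.
            if as_needed and not (undefined & defines):
                continue
            needed.append(name)
        defined |= defines
        undefined -= defines
        undefined |= requires - defined
    if undefined:
        raise LinkError("undefined reference to: " + ", ".join(sorted(undefined)))
    return needed


libm = ("shared", "libm.so.6", {"sqrtf"}, set())
mesa_util = ("static", "libmesa_util.a", {"util_format_pack"}, {"sqrtf"})

# -lm is seen first, skipped, and cannot satisfy sqrtf later.
broken = ["--as-needed", libm, mesa_util]

# -lm is bracketed so it is recorded regardless of position.
fixed = ["--as-needed", "--no-as-needed", libm, "--as-needed", mesa_util]
```

In this model `link_single_pass(broken)` raises `LinkError` for `sqrtf`, while `link_single_pass(fixed)` records `libm.so.6` as needed and succeeds.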
Fixes: 0920e0afb5 ("nil: Add zcull support")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40793>
On Ampere B and later, we can specify the prefetch size, in blocks, of
the gfx shader we are binding.
The NVIDIA proprietary driver always sets it to the maximum possible
size (up to 127 blocks).
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Tested-by: Thomas H.P. Andersen <phomes@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40700>
We are going to need the total shader size (without embedded data), so
let's move this out of the upload codepath.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Tested-by: Thomas H.P. Andersen <phomes@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40700>
Instead of keeping track of the topology with some scratch value in MME,
we can rely on SET_PRIMITIVE_TOPOLOGY to directly set it.
This simplifies some of the MME codegen but does not seem to have any
impact on performance in general.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40749>
Fixes multiple CTS tests on Blackwell, including e.g.
dEQP-VK.spirv_assembly.instruction.graphics.float16.arithmetic_2.opfdiv_tessc
Fixes: d031365f7c ("nak: support MUFU.F16")
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40804>
Doesn't really help many shaders, but I've seen a couple that turn from
MUFU into F2F(MUFU.F16(F2F)). This might just as well be a limitation
of related code though, e.g. returning F32 from TEX instead of using
TEX.F16.
Totals:
CodeSize: 8662337424 -> 8662336960 (-0.00%)
Static cycle count: 4718044491 -> 4718044554 (+0.00%); split: -0.00%, +0.00%
Totals from 7 (0.00% of 1163204) affected shaders:
CodeSize: 236480 -> 236016 (-0.20%)
Static cycle count: 2108061 -> 2108124 (+0.00%); split: -0.01%, +0.01%
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40392>