3983 commits

Author SHA1 Message Date
Karol Herbst
4b66258717 nak: call nir_opt_algebraic_distribute_src_mods
Totals from 134863 (11.12% of 1212873) affected shaders:
CodeSize: 2109574320 -> 2105266608 (-0.20%); split: -0.23%, +0.02%
Number of GPRs: 7199115 -> 7194107 (-0.07%); split: -0.13%, +0.06%
SLM Size: 201728 -> 201720 (-0.00%); split: -0.01%, +0.00%
Static cycle count: 2037608114 -> 2035165858 (-0.12%); split: -0.17%, +0.05%
Spills to memory: 22063 -> 22035 (-0.13%); split: -0.14%, +0.01%
Fills from memory: 22063 -> 22035 (-0.13%); split: -0.14%, +0.01%
Spills to reg: 78193 -> 78139 (-0.07%); split: -0.17%, +0.10%
Fills from reg: 83383 -> 83335 (-0.06%); split: -0.15%, +0.09%
Max warps/SM: 5188428 -> 5188840 (+0.01%); split: +0.03%, -0.02%

Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41214>
2026-04-28 03:08:01 +02:00
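The percentages in the shader-db totals above can be reproduced from the raw counts. The following is a minimal sketch of that arithmetic; the function names are illustrative and are not taken from any shader-db tooling.

```c
/* Relative change in percent, as printed in "before -> after (x%)". */
double percent_change(double before, double after)
{
   return (after - before) / before * 100.0;
}

/* Share of affected shaders in percent, as printed in "N (x% of TOTAL)". */
double percent_share(double affected, double total)
{
   return affected / total * 100.0;
}
```

For the CodeSize line of 4b66258717, percent_change(2109574320, 2105266608) gives roughly -0.204, which matches the reported -0.20% after rounding. Likewise, percent_share(134863, 1212873) gives roughly 11.12.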
Karol Herbst
a9eac010dd nak: call nir_opt_fp_math_ctrl
Totals from 77360 (6.38% of 1212873) affected shaders:
CodeSize: 1255332672 -> 1250129888 (-0.41%); split: -0.44%, +0.03%
Number of GPRs: 4233257 -> 4226625 (-0.16%); split: -0.20%, +0.05%
Static cycle count: 937314398 -> 935865851 (-0.15%); split: -0.22%, +0.07%
Spills to memory: 11371 -> 11373 (+0.02%)
Fills from memory: 11371 -> 11373 (+0.02%)
Spills to reg: 24245 -> 24262 (+0.07%); split: -0.65%, +0.72%
Fills from reg: 23689 -> 23742 (+0.22%); split: -0.55%, +0.77%
Max warps/SM: 2912604 -> 2916096 (+0.12%); split: +0.15%, -0.03%

Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41214>
2026-04-28 03:07:51 +02:00
Mel Henning
4b0a0ed7b6 nak: Use NIR_LOOP_PASS
This is similar to
75ede9d9bc ("intel/brw: track last successful pass and leave the loop early")
except that it uses the common nir helpers.

Note that I've also marked nir_opt_peephole_select as NOT_IDEMPOTENT
because I'm skeptical that it actually is idempotent. This differs from
both brw and radv.

I'm also marking gcm as not idempotent because it isn't idempotent in
practice on one of the shaders in my shader-db:
2bf4ba7133/fossils/blender
pipeline hash 0e972f8e349af903

This is about a 4% geomean compile time speedup on my local collection
of shaders.

Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41118>
2026-04-27 20:14:05 +00:00
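The idea behind 4b0a0ed7b6 (remember the last pass that made progress, and leave the optimization loop as soon as every other pass has been a no-op since) can be sketched as below. This is only an illustration under stated assumptions: struct pass, run_to_fixed_point and the toy passes are hypothetical and do not reproduce the NIR_LOOP_PASS helpers. The idempotent flag models why a pass marked as not idempotent has to be re-run even if it was the last one to make progress.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pass descriptor; this is not the NIR API. */
struct pass {
   bool (*run)(int *state);   /* returns true if it changed the state */
   bool idempotent;           /* true if re-running it right away never helps */
};

/* Runs the passes round-robin until a fixed point is reached and
 * returns the number of pass invocations that were needed. */
size_t run_to_fixed_point(const struct pass *passes, size_t n, int *state)
{
   if (n == 0)
      return 0;

   size_t last = SIZE_MAX;   /* index of the last pass that made progress */
   size_t clean = 0;         /* consecutive invocations without progress */
   size_t runs = 0;

   for (size_t p = 0; ; p = (p + 1) % n) {
      /* Every other pass was a no-op since `last` ran. If `last` is
       * idempotent, running it again cannot help, so leave early. */
      if (p == last && clean == n - 1 && passes[p].idempotent)
         break;

      runs++;
      if (passes[p].run(state)) {
         last = p;
         clean = 0;
      } else if (++clean == n) {
         break;   /* a full round without progress */
      }
   }
   return runs;
}

/* Toy pass on an integer "shader": divide out all factors of two.
 * Idempotent, because the result is odd (or zero) afterwards. */
bool strip_twos(int *s)
{
   bool progress = false;
   while (*s != 0 && *s % 2 == 0) {
      *s /= 2;
      progress = true;
   }
   return progress;
}

/* Toy pass: a single decrement while the value is above 10.
 * Not idempotent, because a second run may decrement again. */
bool dec_if_large(int *s)
{
   if (*s > 10) {
      *s -= 1;
      return true;
   }
   return false;
}

const struct pass example_passes[] = {
   { strip_twos, true },
   { dec_if_large, false },
};
```

Starting from the value 23, this sketch reaches 5 after six pass invocations. A loop that only stops after a complete round without progress would invoke strip_twos a seventh time for no benefit.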
Mel Henning
75fc9e2704 nak: Use shader_info->var_copies_lowered
This mirrors the change from
ba0bc7182d ("anv: use shader_info->var_copies_lowered")

Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41118>
2026-04-27 20:14:05 +00:00
Karol Herbst
4cd64165a3 nak/lower_cf: remove single src phis
When lowering cf we go out of SSA which translates phis into reg
intrinsics. However when converting them back to SSA, initially single
source phis now have an undef source leading to increased register
pressure on the NAK side. This also hinders copy propagation as it's not
designed to handle sources through phis yet.

Totals from 50621 (4.17% of 1212873) affected shaders:
CodeSize: 1605273744 -> 1621029728 (+0.98%); split: -0.34%, +1.32%
Number of GPRs: 4673586 -> 4067935 (-12.96%); split: -12.97%, +0.01%
SLM Size: 263428 -> 258176 (-1.99%)
Static cycle count: 2599838439 -> 2586392435 (-0.52%); split: -1.11%, +0.59%
Spills to memory: 23512 -> 15527 (-33.96%)
Fills from memory: 23512 -> 15527 (-33.96%)
Spills to reg: 64590 -> 57328 (-11.24%); split: -13.83%, +2.58%
Fills from reg: 55559 -> 44319 (-20.23%); split: -22.66%, +2.42%
Max warps/SM: 1189396 -> 1347600 (+13.30%)

Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41042>
2026-04-21 23:37:55 +00:00
Karol Herbst
e09045e26c nak: the MS location comes last in TLD, same spot as depth compare in TEX
Some Max Payne 3 shaders are affected by this, so it will probably fix
some issue there. The VK CTS isn't testing this, but it was verified to
fix a real problem by inserting 0 offsets into the instruction and having
CTS tests fail with the old ordering.

Totals from 3 (0.00% of 1163204) affected shaders:
CodeSize: 2496 -> 2736 (+9.62%)
Static cycle count: 732 -> 741 (+1.23%)

Fixes: ad01fbdda0 ("nak: Add a NIR texture lowering pass")
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40957>
2026-04-15 17:00:07 +00:00
Dave Airlie
7067b66846 nvk: don't set sector promotion on texture headers
This appears to cause some sort of prefetching which is causing
page faults for linear textures on the following page after the
texture allocation.

This might be okay for tiled, but for now just disable it.

The crash was triggered by allocating an 800x409 linear 2D texture,
which gnome-initial-setup was doing.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/work_items/15277
Cc: mesa-stable
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40939>
2026-04-14 21:56:56 +00:00
Karol Herbst
9fdf3f684f nak: uregs are 6 bits before Hopper, so enforce that
Some instructions actually use the 2 other bits for things, e.g. sust

Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40891>
2026-04-14 20:18:39 +00:00
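To illustrate why 9fdf3f684f enforces the 6-bit limit: if an instruction reuses the two bits above the uniform register index, an index of 64 or more would silently corrupt them. The sketch below assumes a hypothetical byte layout (index in bits 0..5, instruction-specific value in bits 6..7); pack_ureg_byte and PRE_HOPPER_UREG_BITS are made-up names, not NAK's encoder.

```c
#include <stdbool.h>
#include <stdint.h>

#define PRE_HOPPER_UREG_BITS 6u

/* Packs a uniform register index and a 2-bit instruction-specific
 * value into one byte. The layout is hypothetical. Returns false
 * instead of letting either value spill into the other's bits. */
bool pack_ureg_byte(uint32_t ureg_idx, uint32_t extra2, uint8_t *out)
{
   if (ureg_idx >= (1u << PRE_HOPPER_UREG_BITS) || extra2 >= 4u)
      return false;

   *out = (uint8_t)(ureg_idx | (extra2 << PRE_HOPPER_UREG_BITS));
   return true;
}
```

Rejecting the out-of-range index up front, rather than masking it, turns a silent mis-encoding into a visible failure.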
Karol Herbst
bf6c3e9d99 nak: add is_gpr_reg and is_ugpr_reg helpers
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40891>
2026-04-14 20:18:39 +00:00
Karol Herbst
6c5ee118cd nak: add ugpr latency classes for memory instructions
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40891>
2026-04-14 20:18:39 +00:00
Karol Herbst
0c92d2191b nak/nvdisasm_tests: fix offset stride for gens older than Turing
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40891>
2026-04-14 20:18:39 +00:00
Karol Herbst
c10b4b1e47 nak: scalarize tex, tld and tld4 on SM70+
This helps RA a bit by reducing the size of the vectors passed to tex
instructions, which eliminates a few movs.

Totals from 145533 (12.51% of 1163204) affected shaders:
CodeSize: 1868329120 -> 1855817520 (-0.67%); split: -0.70%, +0.03%
Number of GPRs: 7007196 -> 7007028 (-0.00%); split: -0.01%, +0.01%
Static cycle count: 1157484762 -> 1153189018 (-0.37%); split: -0.46%, +0.09%
Spills to reg: 30581 -> 30580 (-0.00%)
Fills from reg: 33263 -> 33262 (-0.00%)
Max warps/SM: 5911104 -> 5911100 (-0.00%); split: +0.00%, -0.00%

Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40900>
2026-04-14 17:48:04 +00:00
Karol Herbst
b6fb51caf5 nak/nvdisasm_tests: test .SCR flag in TEX, TLD and TLD4
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40900>
2026-04-14 17:48:04 +00:00
Karol Herbst
f76e7d8e62 nak: add scalar tex encoding support
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40900>
2026-04-14 17:48:04 +00:00
Karol Herbst
f3ce8fe90b nak: properly copy prop neg/abs float sources for flushed values
This allows us to copy prop fadd.ftz -rZ, -|ssa| into consuming
instructions giving us nice gains across the board.

Totals from 1033868 (85.24% of 1212873) affected shaders:
CodeSize: 8813536528 -> 8355226128 (-5.20%); split: -5.21%, +0.01%
Number of GPRs: 44954066 -> 44299483 (-1.46%); split: -1.52%, +0.06%
SLM Size: 799688 -> 798544 (-0.14%)
Static cycle count: 4646939330 -> 4485129185 (-3.48%); split: -3.67%, +0.18%
Spills to memory: 35405 -> 33136 (-6.41%); split: -6.41%, +0.01%
Fills from memory: 35405 -> 33136 (-6.41%); split: -6.41%, +0.01%
Spills to reg: 196547 -> 196231 (-0.16%); split: -1.22%, +1.06%
Fills from reg: 201227 -> 200988 (-0.12%); split: -1.00%, +0.88%
Max warps/SM: 44143984 -> 44306960 (+0.37%); split: +0.38%, -0.01%

Reviewed-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40897>
2026-04-13 23:15:10 +00:00
Karol Herbst
8170f18d9b nak/copy_prop: allow modified F16v2 and F16 sources
Seems to help a couple of shaders using MUFU.F16

Totals from 178 (0.01% of 1212873) affected shaders:
CodeSize: 5929856 -> 5925088 (-0.08%); split: -0.08%, +0.00%
Static cycle count: 8667151 -> 8665940 (-0.01%); split: -0.02%, +0.00%

Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40897>
2026-04-13 23:15:10 +00:00
Mary Guillemard
3be57aa4c3 nak: Allows predicate in legalize_ext_instr
With OpLd now having a predicate, we forgot to update legalize_ext_instr
to allow predicates for it.

We should really get rid of those functions, but for now let's keep it
simple and sync the implementation with what the SM20 backend has.

Signed-off-by: Mary Guillemard <mary@mary.zone>
Fixes: 9d90cbc314 ("nak: add input predicate to load_global_nv and OpLd")
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40934>
2026-04-13 22:13:06 +00:00
Mary Guillemard
13f98d8658 nvk: Adjust maxFragmentCombinedOutputResources to match max descriptors limit
This was set to the lowest allowed value by spec but it should really be
matching the max descriptors limit.

Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/15249 for NVK
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40868>
2026-04-13 18:44:08 +00:00
Mel Henning
c7ab501171 nvk: Disable zcull save/restore regions for now
The issues with this are proving to be difficult to solve. Turn it off
for now until we have a proper fix.

Fixes: c24963d8da ("nvk: Enable zcull for VK_ATTACHMENT_LOAD_OP_LOAD")
Acked-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40894>
2026-04-10 22:20:58 +00:00
Mel Henning
ad65ed643b nvk: SET_ROOT_TABLE_PREFETCH
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40639>
2026-04-10 19:21:39 +00:00
Mel Henning
30b3de6ec4 nvk: Wire up ROOT_TABLE
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/work_items/12576
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40639>
2026-04-10 19:21:39 +00:00
Mel Henning
bff2d8dd9b nvk: Move mme_set_anti_alias_tests to a check func
This is more flexible than the expected array and will be necessary in
the following patches.

Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40639>
2026-04-10 19:21:38 +00:00
Mel Henning
26c00f6e17 nvk/cmd_indirect: Pass pdev into more functions
This will be used for checking if root table is enabled.

Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40639>
2026-04-10 19:21:37 +00:00
Mel Henning
701a2579fe nak: Add printf_cb to nak_constant_offset_info
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40639>
2026-04-10 19:21:37 +00:00
Mel Henning
aa782218e5 nak: Add an is_graphics param to nak_const_offsets
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40639>
2026-04-10 19:21:37 +00:00
Mel Henning
017bb885db nak: Turn nak_const_offsets into a function.
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40639>
2026-04-10 19:21:37 +00:00
Mel Henning
7fc8fdbdaf nvk: Factor out build_push_write_push_const
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40639>
2026-04-10 19:21:37 +00:00
Mel Henning
3b4b72f546 nvk: Reorder nvk_root_descriptor_table
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40639>
2026-04-10 19:21:37 +00:00
Mel Henning
18fcf24547 nvk: Initialize NVC597_SET_ROOT_TABLE_VISIBILITY
This matches the initialization I've seen in traces from the proprietary
driver.

Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40639>
2026-04-10 19:21:37 +00:00
Mel Henning
398837aa58 nvk: Swizzle root_table.dynamic_buffers[]
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40639>
2026-04-10 19:21:37 +00:00
Mel Henning
54b41565a0 nvk: Rename macro loop index from i to _index
Calling this variable `i` made it very easy for it to shadow a loop
variable in the enclosing scope, which became an issue if `src` were an
expression referencing a different variable `i`. Rename the variable to
make shadowing less likely.

Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40639>
2026-04-10 19:21:37 +00:00
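The shadowing problem described in 54b41565a0 can be reproduced with a small C example. The macros and functions below are hypothetical stand-ins, not the actual NVK macro; they only show how a `src` expression that mentions the caller's `i` ends up bound to the macro's own counter.

```c
#include <stddef.h>

/* Hypothetical copy macro whose internal counter is named `i`. */
#define COPY_SHADOWING(dst, src, n)            \
   do {                                        \
      for (size_t i = 0; i < (n); i++)         \
         (dst)[i] = (src);                     \
   } while (0)

/* Same macro with the counter renamed, so that a caller's `i` is
 * unlikely to collide with it. */
#define COPY_RENAMED(dst, src, n)                      \
   do {                                                \
      for (size_t _index = 0; _index < (n); _index++)  \
         (dst)[_index] = (src);                        \
   } while (0)

/* Intends to fill out[] with table[i] for the caller's i. */
int fill_shadowing(const int *table, size_t i)
{
   int out[4];
   (void)i;   /* the caller's i is never read: the macro's i shadows it */
   COPY_SHADOWING(out, table[i], 4);   /* copies table[0..3] instead */
   return out[3];
}

int fill_renamed(const int *table, size_t i)
{
   int out[4];
   COPY_RENAMED(out, table[i], 4);     /* table[i] uses the caller's i */
   return out[3];
}
```

With table = {10, 20, 30, 40} and i = 1, fill_shadowing returns 40 while fill_renamed returns the intended 20.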
Mel Henning
3476b34f63 nvk/lower_descriptors: Change ROOT_DESC addr space
The more general address space we used to have cannot be
implemented on top of ROOT_TABLE because of ROOT_TABLE's bank pattern.
Instead, adjust the address space so it provides a less general index
into dynamic_buffers.

Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40639>
2026-04-10 19:21:37 +00:00
Mel Henning
e53036c85a nvk/lower_descriptors: Add load_root_table_array()
This is a helper function for loading from an index of an array member
of root_table.

Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40639>
2026-04-10 19:21:36 +00:00
Mel Henning
5d90bbe8e5 nvk/lower_descriptors: .base in load_root_table
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40639>
2026-04-10 19:21:36 +00:00
Mel Henning
4bdbd6c341 nvk/lower_descriptors: Use more load_root_table
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40639>
2026-04-10 19:21:36 +00:00
Mel Henning
153454a6fd nvk/lower_descriptors: Move load_root_table up
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40639>
2026-04-10 19:21:36 +00:00
Rhys Perry
463e3643f2 nir: add and use block predecessor helpers
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40242>
2026-04-08 15:06:32 +00:00
Mary Guillemard
b2e55f5a1a nvk: Remove old comments from draw state init
Those were relevant for Fermi or just the Gallium driver.
The vertex runout is implemented a bit later
(SET_VERTEX_STREAM_SUBSTITUTE_A).

I also rewrote the comment about CSAA_ENABLE as it is still relevant.

Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Tested-by: Thomas H.P. Andersen <phomes@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40753>
2026-04-08 08:06:19 +00:00
Mary Guillemard
091db8a827 nvk: adjust reduce color thresholds default values
The NVIDIA proprietary driver sets 4 for UNORM8 and SRGB8; let's match this.

Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Tested-by: Thomas H.P. Andersen <phomes@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40753>
2026-04-08 08:06:19 +00:00
Mary Guillemard
99c226b833 nvk: Set VAF eviction policy to normal
The NVIDIA proprietary driver does that. We were missing this, possibly
making the VAF (Vertex Attribute Fetch) unit evict the first entry
instead if nothing was setting it.

The golden ctx already sets it for us, at least on Ada, but for
consistency let's make sure it's set here in case this is different on
other generations.

Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Tested-by: Thomas H.P. Andersen <phomes@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40753>
2026-04-08 08:06:18 +00:00
Mary Guillemard
90c005dd90 nvk: Do not use SET_L1_CONFIGURATION on 3D state init
There is no reason to cut 48KiB of memory out of the L1 cache on gfx,
considering that we do not have shared memory and that local
memory does not need to be directly addressable.

This is not set by the NVIDIA proprietary driver, and the golden ctx setup
keeps it uninitialized.

Unsure if that will change anything in terms of performance, but it might
reduce L1 cache usage on 3D.

Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Tested-by: Thomas H.P. Andersen <phomes@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40753>
2026-04-08 08:06:18 +00:00
Mary Guillemard
3b674771bb nvk: Do not fill cb0 at queue creation
We are already doing this in nvk_push_draw_state_init, so there is no
need for the extra DMA fill.

Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Tested-by: Thomas H.P. Andersen <phomes@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40753>
2026-04-08 08:06:18 +00:00
Vinson Lee
ca6edbd9c8 nil: Fix Rust test link failure under Coverity due to missing -lm
The nil Rust test fails to link when built under Coverity (cov-build):

  /usr/bin/ld: src/util/libmesa_util.a.p/format_u_format_other.c.o:
    undefined reference to symbol 'sqrtf@@GLIBC_2.2.5'
  /usr/bin/ld: /lib/x86_64-linux-gnu/libm.so.6:
    error adding symbols: DSO missing from command line

This does not reproduce with plain GCC or Clang builds.

When rustc invokes the linker for the nil test binary, the generated
link command is structured as:

  cc ... [Rust rlibs] -Bdynamic -lm -ldl -lc ...
     -fuse-ld=lld -B.../gcc-ld ...
     [static archives: libmesa_util.a ...]

The -lm appears before libmesa_util.a in both Coverity and non-Coverity
builds. With --as-needed enabled, the linker only records a shared
library as needed if it resolves an undefined symbol at the point it
is encountered. Since no symbols need -lm when it is first seen, the
outcome depends on the linker implementation:

- lld (rustc's bundled linker, used in plain builds): Tolerates
  back-references from later static archives to earlier shared
  libraries, so libmesa_util.a's sqrtf reference is still resolved
  by the previously-seen libm.so.

- ld.bfd (GNU ld): Strict single-pass left-to-right. Once -lm is
  skipped by --as-needed, it cannot satisfy sqrtf when libmesa_util.a
  is processed later.

Coverity's cov-build wrapper intercepts rustc's call to the linker
and strips the -fuse-ld=lld and -B.../gcc-ld arguments, causing the
linker to fall back to the system's ld.bfd. This exposes the latent
link-order problem that lld was masking.

The underlying issue is that rustc places default libraries (-lm, -lc,
etc.) before user-specified static archives in the link command, which
is a known rustc limitation.
See also: https://github.com/rust-lang/rust/issues/154975

Fix this by passing -lm via rust_args with --no-as-needed brackets.
This forces ld.bfd to record libm as needed regardless of when it
appears on the command line, so sqrtf from libmesa_util.a is resolved
correctly under both lld and ld.bfd.

Fixes: 0920e0afb5 ("nil: Add zcull support")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40793>
2026-04-07 21:17:27 -07:00
Mary Guillemard
5a5febfccd nvk: Ensure that shader I-cache prefetch is enabled on Ada+
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Tested-by: Thomas H.P. Andersen <phomes@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40700>
2026-04-08 00:05:40 +00:00
Mary Guillemard
55a279e8b8 nvk: Wire up shader program prefetch method
On Ampere B and later, we can specify the prefetch size in blocks of a
gfx shader we are binding.

The NVIDIA proprietary driver always sets it to the maximum possible
size (up to 127 blocks).

Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Tested-by: Thomas H.P. Andersen <phomes@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40700>
2026-04-08 00:05:40 +00:00
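The prefetch size described in 55a279e8b8 is counted in blocks and capped at 127. A minimal sketch of that computation follows. The block size is not stated in the commit message, so it is left as a parameter here, and prefetch_blocks and MAX_PREFETCH_BLOCKS are hypothetical names rather than NVK's.

```c
#include <stdint.h>

#define MAX_PREFETCH_BLOCKS 127u

/* Number of blocks to prefetch for a shader of `shader_size` bytes:
 * the size rounded up to whole blocks, capped at 127.
 * `block_size` must be non-zero; its real value is an assumption
 * left to the caller. */
uint32_t prefetch_blocks(uint32_t shader_size, uint32_t block_size)
{
   uint32_t blocks = shader_size / block_size +
                     (shader_size % block_size != 0);
   return blocks < MAX_PREFETCH_BLOCKS ? blocks : MAX_PREFETCH_BLOCKS;
}
```

With an assumed 64-byte block, a 1000-byte shader needs 16 blocks, while a 100000-byte shader is capped at 127.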
Mary Guillemard
742c91ce68 nvk: Move shader size and offset calculations to nvk_shader_get_shader_size
We are going to need the total shader size (without embedded data),
let's move this out of the upload codepath.

Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Tested-by: Thomas H.P. Andersen <phomes@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40700>
2026-04-08 00:05:40 +00:00
Mary Guillemard
6d700284ac nvk: Use SET_PRIMITIVE_TOPOLOGY instead of MME scratch
Instead of keeping track of the topology with some scratch value in MME,
we can rely on SET_PRIMITIVE_TOPOLOGY to directly set it.

This simplifies some of the MME codegen but does not seem to have any
impact on performance in general.

Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40749>
2026-04-07 14:11:16 +00:00
Mel Henning
001de6d71b nak: Fix mufu's f16 bit on sm90+
Fixes multiple CTS tests on Blackwell, including e.g.
dEQP-VK.spirv_assembly.instruction.graphics.float16.arithmetic_2.opfdiv_tessc

Fixes: d031365f7c ("nak: support MUFU.F16")
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40804>
2026-04-07 05:10:16 +00:00
Karol Herbst
72e9f9a760 nak: add algebraic patterns to improve MUFU.F16
Doesn't really help many shaders, but I've seen a couple that turn from
MUFU into F2F(MUFU.F16(F2F)). Though this might just as well be a limitation
of related code, e.g. returning F32 from TEX instead of using TEX.F16.

Totals:
CodeSize: 8662337424 -> 8662336960 (-0.00%)
Static cycle count: 4718044491 -> 4718044554 (+0.00%); split: -0.00%, +0.00%

Totals from 7 (0.00% of 1163204) affected shaders:
CodeSize: 236480 -> 236016 (-0.20%)
Static cycle count: 2108061 -> 2108124 (+0.00%); split: -0.01%, +0.01%

Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40392>
2026-04-02 01:10:57 +00:00
Karol Herbst
9cc2cd843b nak: enable MUFU.F16 on Turing and newer
Totals from 1427 (0.12% of 1163204) affected shaders:
CodeSize: 18599616 -> 18495424 (-0.56%); split: -0.56%, +0.00%
Number of GPRs: 91579 -> 91571 (-0.01%)
SLM Size: 14144 -> 14140 (-0.03%)
Static cycle count: 96164214 -> 96075886 (-0.09%); split: -0.13%, +0.04%
Spills to memory: 2677 -> 2681 (+0.15%)
Fills from memory: 2677 -> 2681 (+0.15%)
Max warps/SM: 48868 -> 48872 (+0.01%)

Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40392>
2026-04-02 01:10:57 +00:00