209614 commits

Author SHA1 Message Date
Alyssa Rosenzweig
74ed2b78e8 asahi,hk: optimize no-op FS
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36399>
2025-08-03 14:40:53 -04:00
Alyssa Rosenzweig
626fa80c1b asahi: optimize pass type with depth-only passes
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36399>
2025-08-03 14:40:53 -04:00
Alyssa Rosenzweig
7f2a6cdd26 hk: only enable image view min LOD for dx12
I don't really want random Vulkan apps using this. fixes Steam shader
precaching via fossilize.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36399>
2025-08-03 14:40:53 -04:00
Alyssa Rosenzweig
a0a18c084e hk: kill psiz writes via topology, not feature
this regresses DXVK fast link shaders, I guess, but fixes Proton shader
precompiles. per discussion with Hans-Kristian

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36399>
2025-08-03 14:40:53 -04:00
Alyssa Rosenzweig
9c987ee75e asahi: use native colour masking
seems to work now.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36399>
2025-08-03 14:40:53 -04:00
Alyssa Rosenzweig
562377f01d agx: try to rematerialize to improve occupancy
we already have a perfectly good spiller and SSA... use it when it helps. yes,
this costs a bit of CPU time, but it's guarded behind enough checks that the
average time should be fine.

this was prompted by a shadertoy where we were losing waves due to way too
many constants pooled at the start of a chunky shader.

in GL shader-db, only affected shaders are in blender:

   instrs HURT:   shaders/blender/1020.shader_test FS:              3125 -> 3178 (1.70%)
   instrs HURT:   shaders/blender/981.shader_test FS:               3125 -> 3178 (1.70%)
   instrs HURT:   shaders/blender/729.shader_test FS:               3086 -> 3154 (2.20%)
   instrs HURT:   shaders/blender/1023.shader_test FS:              3085 -> 3153 (2.20%)
   instrs HURT:   shaders/blender/424.shader_test FS:               3085 -> 3153 (2.20%)

   threads helped:   shaders/blender/1020.shader_test FS:              576 -> 640 (11.11%)
   threads helped:   shaders/blender/1023.shader_test FS:              576 -> 640 (11.11%)
   threads helped:   shaders/blender/424.shader_test FS:               576 -> 640 (11.11%)
   threads helped:   shaders/blender/729.shader_test FS:               576 -> 640 (11.11%)
   threads helped:   shaders/blender/981.shader_test FS:               576 -> 640 (11.11%)

in VK fossils, there's a lot more high pressure shaders that benefit:

   Totals from 113 (0.21% of 54019) affected shaders:
   MaxWaves: 64448 -> 73088 (+13.41%)
   Instrs: 388529 -> 391646 (+0.80%); split: -0.00%, +0.80%
   CodeSize: 2750064 -> 2769106 (+0.69%); split: -0.00%, +0.69%
   ALU: 292960 -> 295863 (+0.99%); split: -0.00%, +0.99%
   FSCIB: 292960 -> 295863 (+0.99%); split: -0.00%, +0.99%
   GPRs: 21297 -> 19289 (-9.43%)
   Preamble instrs: 75703 -> 75911 (+0.27%)

notable improvement in Far Cry 5.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36399>
2025-08-03 14:40:53 -04:00
Alyssa Rosenzweig
6544a4f1ae asahi: drop sink/move in GS code
this is asking for trouble, since divergence analysis doesn't handle stuff we
lower quickly. this fixes geometry shaders blowing up since the cited commit,
but since I was the one who r-b'd that change, I don't have anyone to blame but
myself C:

Fixes: d61edf079b ("nir: add nir_move_only_convergent/divergent")
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36399>
2025-08-03 14:40:53 -04:00
Antonino Maniscalco
e4584c9470 tu: Add support for realtime vk priority
The kernel creates 4 rings so it is possible to map each of vulkan's
priority to each ring.
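
As a rough illustration of the one-to-one mapping described above, the sketch below pairs each Vulkan global queue priority with one of four ring indices. The enum values are the ones Vulkan defines for VkQueueGlobalPriority; the ring numbering (0 = lowest priority) and the helper name `ring_for_priority` are assumptions made for this example, not taken from the driver or the kernel.

```python
# VkQueueGlobalPriority values as defined by the Vulkan headers.
VK_QUEUE_GLOBAL_PRIORITY_LOW = 128
VK_QUEUE_GLOBAL_PRIORITY_MEDIUM = 256
VK_QUEUE_GLOBAL_PRIORITY_HIGH = 512
VK_QUEUE_GLOBAL_PRIORITY_REALTIME = 1024

# Hypothetical ring numbering: one ring per priority, 0 = lowest.
# The real kernel/driver ordering may differ.
PRIORITY_TO_RING = {
    VK_QUEUE_GLOBAL_PRIORITY_LOW: 0,
    VK_QUEUE_GLOBAL_PRIORITY_MEDIUM: 1,
    VK_QUEUE_GLOBAL_PRIORITY_HIGH: 2,
    VK_QUEUE_GLOBAL_PRIORITY_REALTIME: 3,
}


def ring_for_priority(priority):
    """Return the (hypothetical) ring index for a Vulkan global priority."""
    try:
        return PRIORITY_TO_RING[priority]
    except KeyError:
        raise ValueError(f"unsupported priority: {priority}") from None
```

With four rings and four priorities, every priority gets its own ring, so realtime work never shares a ring with lower-priority submissions in this model.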

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36172>
2025-08-03 12:46:17 +00:00
LingMan
8227283d58 nak: Drop include paths for size_of and size_of_val
They have been added to the prelude with Rust 1.80.

Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36526>
2025-08-03 10:16:21 +00:00
LingMan
8376ecd842 rusticl: Use std::mem::offset_of!()
Support for nested fields got stabilized with Rust 1.82.

Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36526>
2025-08-03 10:16:21 +00:00
LingMan
0631b4fd7e rusticl: Drop include paths for size_of, size_of_val, and align_of
They have been added to the prelude with Rust 1.80.

Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36526>
2025-08-03 10:16:21 +00:00
LingMan
d4a7811519 rusticl: Use is_aligned from std
It got stabilized with Rust 1.79.

Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36526>
2025-08-03 10:16:20 +00:00
LingMan
6c7084357d mesa: Bump required Rust version to 1.82
Firefox ESR requires Rust 1.82 since version 140. Thus, this update
is in line with our Rust update policy.

Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36526>
2025-08-03 10:16:20 +00:00
LingMan
eda7043025 docs/rusticl: Update documented version requirements for meson and bindgen
The requirements bump a few weeks ago forgot to update the docs.

Fixes: 1a698c75ae ("build: Rust: Bump minimum Meson and bindgen version")
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36526>
2025-08-03 10:16:20 +00:00
LingMan
b364732502 ci/rust: Drop date from Rust release channel selection
For stable Rust, specifying the patch version already uniquely identifies a toolchain build. Specifying the date would only be required for nightly releases.

Reviewed-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36526>
2025-08-03 10:16:19 +00:00
Job Noorman
b101aecb03 ir3: add shader bisect debug tool
When debugging a problem in a trace, CTS test, etc. that is caused by a
known compiler feature, the first step is usually to find which shader
causes the problem. This is often non-trivial as the number of shaders
in a trace can be huge. This commit adds a debugging tool to help with
this.

The idea behind this tool is to assign every shader a deterministic
(pre-compilation) ID that can be used to order shaders. Once we have
this, we can use it to bisect which shader causes the problem. This
obviously only works if the problem can be traced back to a single
shader. In my experience, this is often the case.

This tool reuses the shader cache key as deterministic ID. It is
concatenated with the variant ID to distinguish the different variants
of a shader.

In practice, bisecting the shaders in a test run works like this:
- Gate the problematic compiler feature using ir3_shader_bisect_select;
  E.g., if (ir3_shader_bisect_select(v)) IR3_PASS(...);
- Run test with IR3_SHADER_BISECT_DUMP_IDS_PATH=ids.txt
- Sort ids.txt
- Bisect the shader IDs using IR3_SHADER_BISECT_LO/IR3_SHADER_BISECT_HI.
- Dump the problematic shader using IR3_SHADER_BISECT_DISASM.

A Python script is provided to make all this easier:
- ir3_shader_bisect.py dump-ids -o ids.txt 'test args'
- ir3_shader_bisect.py bisect -i ids.txt 'test args'
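
The bisection step above can be sketched as a plain binary search over the sorted IDs. This is a minimal model, not the provided script: `bisect_shader_ids` and the `fails_with_range` callback are hypothetical names, and the callback stands in for "run the test with the gated feature enabled only for IDs between LO and HI". It assumes, as the message does, that a single shader is responsible.

```python
def bisect_shader_ids(sorted_ids, fails_with_range):
    """Return the single ID whose gated feature triggers the failure.

    fails_with_range(lo_id, hi_id) is a hypothetical callback: it runs the
    test with the feature enabled only for IDs in [lo_id, hi_id] (inclusive)
    and returns True if the problem reproduces.
    """
    if not sorted_ids:
        return None
    lo, hi = 0, len(sorted_ids) - 1
    # Sanity check: the problem must reproduce with the full range enabled.
    if not fails_with_range(sorted_ids[lo], sorted_ids[hi]):
        return None
    while lo < hi:
        mid = (lo + hi) // 2
        if fails_with_range(sorted_ids[lo], sorted_ids[mid]):
            hi = mid          # culprit is in the lower half
        else:
            lo = mid + 1      # culprit is in the upper half
    return sorted_ids[lo]


# Made-up IDs in a "cache key / variant" shape, purely for illustration.
ids = sorted(["1a/0", "1a/1", "7c/0", "9e/0", "f2/3"])
culprit = "9e/0"
found = bisect_shader_ids(ids, lambda lo, hi: lo <= culprit <= hi)
```

Each iteration halves the candidate range, so even a trace with many thousands of shaders needs only a handful of test runs.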

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33602>
2025-08-03 09:30:49 +00:00
Job Noorman
0a123ce68b ir3: add pointer from ir3_shader_variant to ir3_shader
Needed in the next commit to get the shader key for a variant.

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33602>
2025-08-03 09:30:49 +00:00
Job Noorman
d36594f7f0 ir3/ra: fix file start wraparound
The initial wraparound was calculated in a way I do not fully
understand. However, it could lead to not starting from register 0 when
a wraparound occurs. This, in turn, could lead to some unnecessary gaps.
Fix this by explicitly setting start to 0 when a wraparound occurs.
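
To make the intended behaviour concrete, here is a small model of a first-fit search that starts at a round-robin hint and restarts from register 0 when the hint would run past the end of the file. It is only a sketch under assumptions: `find_gap`, its parameters, and the set-based register file are hypothetical and do not mirror the actual ir3 RA data structures.

```python
def align_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def find_gap(used, file_size, size, alignment, rr_start):
    """First-fit search for `size` free registers, starting at a hint.

    used: set of occupied register indices (hypothetical representation).
    Returns the first register of a free range, or None if nothing fits.
    """
    if size > file_size:
        return None
    start = align_up(rr_start, alignment)
    if start + size > file_size:
        start = 0  # wraparound: explicitly restart from register 0
    candidate = start
    while True:
        if all(r not in used for r in range(candidate, candidate + size)):
            return candidate
        candidate += alignment
        if candidate + size > file_size:
            candidate = 0  # wrap while scanning
        if candidate == start:
            return None  # visited every aligned position
```

In this model, a hint near the end of the file never produces a start in the middle of the file, which is the kind of gap the message describes.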

Totals from 16452 (10.00% of 164575) affected shaders:
Instrs: 16456187 -> 16449330 (-0.04%); split: -0.14%, +0.10%
CodeSize: 32357818 -> 32345432 (-0.04%); split: -0.14%, +0.10%
NOPs: 3411778 -> 3410810 (-0.03%); split: -0.43%, +0.40%
MOVs: 603559 -> 603199 (-0.06%); split: -0.81%, +0.75%
COVs: 262804 -> 262761 (-0.02%); split: -0.13%, +0.11%
Full: 279264 -> 279179 (-0.03%); split: -0.04%, +0.01%
(ss): 422887 -> 422739 (-0.03%); split: -0.81%, +0.77%
(sy): 188298 -> 188513 (+0.11%); split: -0.53%, +0.65%
(ss)-stall: 1685300 -> 1679865 (-0.32%); split: -0.99%, +0.67%
(sy)-stall: 5797450 -> 5788564 (-0.15%); split: -0.74%, +0.58%
STPs: 18359 -> 18341 (-0.10%); split: -0.14%, +0.04%
LDPs: 32825 -> 32833 (+0.02%); split: -0.22%, +0.24%
Preamble Instrs: 3307822 -> 3308388 (+0.02%); split: -0.31%, +0.33%
Early Preamble: 5853 -> 5852 (-0.02%)
Last helper: 4154632 -> 4164580 (+0.24%); split: -0.34%, +0.58%

Cat0: 3760257 -> 3759249 (-0.03%); split: -0.39%, +0.36%
Cat1: 968587 -> 963086 (-0.57%); split: -0.99%, +0.43%
Cat2: 6133128 -> 6133532 (+0.01%); split: -0.03%, +0.03%
Cat6: 183289 -> 183275 (-0.01%); split: -0.05%, +0.05%
Cat7: 684028 -> 683290 (-0.11%); split: -0.35%, +0.25%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36374>
2025-08-03 08:58:29 +00:00
sarbes
0a12ff6f45 lima: move RSW packing/unpacking to genxml
This MR removes most magic values of the affected code paths, and makes the code more readable. Parsing of the RSW words is now done by genxml.

v2:
- Renamed varying types
- Removed unnecessary whitespaces

Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36401>
2025-08-02 18:26:55 +00:00
Iván Briano
bf8ebb6a7d intel: Re-disable ray tracing on 32 bits
We had this disabled before moving to the common framework for BVH
building and lost it along the way.

Fixes: f0e18c475b ("intel: remove GRL/intel-clc")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36522>
2025-08-02 00:12:44 +00:00
Yiwei Zhang
83b9c13b6f Revert "android: moving HMI symbol to separate file"
This reverts commit 6c7f7e4953.

The original change wasn't properly reviewed and the rationale was
obscure. Meanwhile, it was for gfxstream Android frontend which was not
built in upstream mesa at all.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36532>
2025-08-01 23:44:49 +00:00
Paulo Zanoni
4c7254d105 zink: new expected failures for sparse depth buffers
Anv removed support for sparse depth buffers, but some glcts tests try
to use them without first asking if we support them. We'll have to fix
this in the VK-GL-CTS codebase. In the meantime, keep Marge happy.

Reviewed-by: Iván Briano <ivan.briano@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35524>
2025-08-01 14:51:10 -07:00
Paulo Zanoni
ea9f19ac7b anv/sparse: call sparse_image_check_support from get_image_format_properties
Function anv_get_image_format_properties() can get called from two
different Vulkan entry points:
  - anv_GetPhysicalDeviceImageFormatProperties2
  - anv_GetPhysicalDeviceSparseImageFormatProperties2

While there is a sparse-named function aimed specifically at sparse
images, you can call vkGetPhysicalDeviceImageFormatProperties2
passing sparse flags in VkPhysicalDeviceImageFormatInfo2::flags. And
when that happens, we need to detect it and either return
VK_ERROR_FORMAT_NOT_SUPPORTED or set
props->imageFormatProperties->sampleCounts to a value that matches
the sparse usage.

This change affects our behavior in 3 types of cases: color MSAA
cases, depth/stencil MSAA cases and atomic_emulated cases. The
previous patches should have covered these cases, so everything should
be passing now.

v2: Rebase.
v3: Reword the commit message.
v4: Rebase and reword the commit message.

Testcase: dEQP-VK.api.info.sparse_image_format_properties2.2d.optimal.r16g16_unorm
Testcase: dEQP-VK.api.info.image_format_properties.2d.optimal.d16_unorm
Testcase: dEQP-VK.api.info.image_format_properties.2d.optimal.r64_uint
Reviewed-by: Iván Briano <ivan.briano@intel.com> (v1)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35524>
2025-08-01 14:51:10 -07:00
Paulo Zanoni
a1628aba1f anv/sparse: we can support R64 and other atomics emulated formats
We set sparseImageInt64Atomics to false on these formats, so there's
no need for the software detiling. Thus, we can not set the flag,
which will make ISL pick Tile64 for these formats, and things will
work.

Thanks to Lionel for pointing out the fix here.

Testcase: dEQP-VK.api.info.image_format_properties.*d.optimal.r64_*int
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Iván Briano <ivan.briano@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35524>
2025-08-01 14:51:10 -07:00
Paulo Zanoni
d5da6980d3 anv/sparse: don't support depth/stencil with sparse
We can't support multi-sampling with depth/stencil, only 1x and only
with 2D and sometimes 3D formats. Claim everything as not supported,
since games don't seem to be affected.

This will be noticeable once we fix
anv_GetPhysicalDeviceImageFormatProperties2() to stop (accidentally)
lying about what we support: without this patch we'll get failures.
It seems CTS expects that, if we do support the format, we have to
support it with multi-sampling as well.

Testcase: dEQP-VK.api.info.image_format_properties.2d.optimal.s8_uint (and 5 others)
Reviewed-by: Iván Briano <ivan.briano@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35524>
2025-08-01 14:51:10 -07:00
Paulo Zanoni
420cda4798 anv/sparse: allow multiple sample bits in anv_sparse_image_check_support
Prepare this function in a way where the caller is able to pass
multiple sample bits as the 'samples' argument, and add an output to
the function where we return the subset of 'samples' that is actually
valid, when it's valid.
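
The calling convention described above can be modelled with plain bitmask arithmetic. This is a sketch only: `sparse_check_support` and its `supported` parameter are hypothetical stand-ins, and the real function returns a Vulkan result code rather than a Python tuple. The sample-count bit values are the ones Vulkan defines.

```python
# VkSampleCountFlagBits values as defined by Vulkan.
VK_SAMPLE_COUNT_1_BIT = 0x01
VK_SAMPLE_COUNT_2_BIT = 0x02
VK_SAMPLE_COUNT_4_BIT = 0x04
VK_SAMPLE_COUNT_8_BIT = 0x08
VK_SAMPLE_COUNT_16_BIT = 0x10


def sparse_check_support(samples, supported):
    """Model of a check that accepts several sample bits at once.

    samples:   mask of sample counts the caller asks about.
    supported: mask of sample counts usable with sparse (assumed input).
    Returns (ok, valid_samples): ok is True when at least one requested
    bit is supported, valid_samples is the supported subset of `samples`.
    """
    valid = samples & supported
    return (valid != 0, valid)
```

A caller can then pass every sample count it cares about in one call and use the returned subset directly, instead of probing one bit at a time.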

For now neither of the two callers uses the new argument, but this
will change in the next patch.

Reviewed-by: Iván Briano <ivan.briano@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35524>
2025-08-01 14:51:10 -07:00
Paulo Zanoni
1797337efc anv/sparse: declare sparse MSAA block shapes as standard before Xe2
Only Xe2 and newer contain non-standard block shapes for sparse MSAA
images.

Reviewed-by: Iván Briano <ivan.briano@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36523>
2025-08-01 21:32:04 +00:00
Paulo Zanoni
c6f832e849 anv/sparse: don't claim Xe2's non-standard MSAA shapes as unsupported
We already advertise residencyStandard2DMultisampleBlockShape to be
false, there's no need to claim these as not supported.

Reviewed-by: Iván Briano <ivan.briano@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36523>
2025-08-01 21:32:04 +00:00
Alyssa Rosenzweig
aca4948997 clc: force exact! across libclc
libclc seems to have piles of bugs where it relies on precise floating point
behaviours to meet CL precision requirements but doesn't actually disable fast
math in its own spir-v. I am tired of playing this whack-a-mole game. Let's just
assume that the math in CLC is right and should not be optimized in unsafe ways,
and force the exact bit across libclc. This works around a large class of libclc
bugs that keep cropping up from innocuous NIR changes.

This does not force the exact bit for application shaders using libclc, just for
the calculations inside of libclc itself. This seems like the right tradeoff all
considered, anything "fast" bypasses libclc anyway.

Fixes generated_tests/cl/builtin/math/builtin-float-pow-1.0.generated.cl on
drivers using nir_opt_reassociate, and probably other stuff.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36527>
2025-08-01 21:00:47 +00:00
Georg Lehmann
cfd5fbfde1 nir/opt_algebraic: make fmin/fmax(a, #b) 16bit if only used by f2f16
Foz-DB Navi31:
Totals from 11 out of 14 FSR4 shaders:
Instrs: 58298 -> 58374 (+0.13%); split: -0.08%, +0.21%
CodeSize: 397836 -> 398108 (+0.07%); split: -0.08%, +0.15%
Latency: 209634 -> 211438 (+0.86%); split: -0.14%, +1.00%
InvThroughput: 229152 -> 229314 (+0.07%); split: -0.03%, +0.10%
VClause: 826 -> 847 (+2.54%); split: -0.36%, +2.91%
Copies: 2954 -> 3040 (+2.91%); split: -1.56%, +4.47%
VALU: 49637 -> 49711 (+0.15%); split: -0.06%, +0.21%
VOPD: 1916 -> 1400 (-26.93%)

These stats look bad, but it's actually just unlucky RA.
Replacing 1 VOPD (two v_dual_max_f32) with 1 VOP3P (v_pk_max_f16)
should still be a win from a register bandwidth perspective.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36468>
2025-08-01 20:29:30 +00:00
Georg Lehmann
3168ebe2c5 nir/range_analysis: look through vec2
Foz-DB Navi31:
Totals from 11 out of 14 FSR4 shaders:
Instrs: 58987 -> 58298 (-1.17%)
CodeSize: 402844 -> 397836 (-1.24%)
Latency: 209630 -> 209634 (+0.00%); split: -0.66%, +0.66%
InvThroughput: 230240 -> 229152 (-0.47%); split: -0.48%, +0.00%
VClause: 838 -> 826 (-1.43%); split: -1.55%, +0.12%
Copies: 3019 -> 2954 (-2.15%); split: -2.82%, +0.66%
VALU: 50196 -> 49637 (-1.11%)
VOPD: 1950 -> 1916 (-1.74%); split: +0.72%, -2.46%

Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36468>
2025-08-01 20:29:29 +00:00
Georg Lehmann
caf89c97de nir/range_analysis: look through f2f
Foz-DB Navi31:
Totals from 93 (0.12% of 80273) affected shaders:
Instrs: 123927 -> 121073 (-2.30%); split: -2.30%, +0.00%
CodeSize: 670832 -> 653332 (-2.61%); split: -2.61%, +0.00%
Latency: 337678 -> 322803 (-4.41%); split: -4.41%, +0.00%
InvThroughput: 63277 -> 61083 (-3.47%)
VClause: 460 -> 373 (-18.91%)
SClause: 2178 -> 2100 (-3.58%)
Copies: 7637 -> 7744 (+1.40%)
PreSGPRs: 4414 -> 4287 (-2.88%)
PreVGPRs: 4229 -> 4230 (+0.02%)
VALU: 77375 -> 75693 (-2.17%)
SALU: 16497 -> 16383 (-0.69%); split: -0.73%, +0.04%
VMEM: 561 -> 477 (-14.97%)
SMEM: 3197 -> 3113 (-2.63%)

Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36468>
2025-08-01 20:29:28 +00:00
Georg Lehmann
261239a492 nir/opt_algebraic: use range analysis to detect no-op fmin/fmax
Foz-DB Navi31:
Totals from 418 (0.52% of 80273) affected shaders:
Instrs: 564550 -> 564387 (-0.03%); split: -0.04%, +0.01%
CodeSize: 2983860 -> 2982684 (-0.04%); split: -0.05%, +0.01%
Latency: 4387264 -> 4386397 (-0.02%); split: -0.02%, +0.00%
InvThroughput: 717464 -> 716874 (-0.08%); split: -0.08%, +0.00%
Copies: 40126 -> 40125 (-0.00%)
VALU: 352128 -> 352003 (-0.04%); split: -0.04%, +0.01%
SALU: 50290 -> 50283 (-0.01%)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36468>
2025-08-01 20:29:28 +00:00
Georg Lehmann
a0665e79e9 nir/opt_algebraic: push fsat into bcsel with constant
bcsel doesn't have a free clamp modifier on AMD hardware,
but what's inside might have free clamp.
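
A quick scalar model of why pushing the clamp inward is safe: saturating the result of a select gives the same value as selecting between the saturated operands, and the saturated constant can be folded at compile time. The helpers below are plain Python stand-ins for the NIR opcodes, and treating fsat(NaN) as 0 is an assumption of this sketch.

```python
import math


def fsat(x):
    """Clamp to [0, 1]; NaN is mapped to 0 in this model (assumption)."""
    if math.isnan(x):
        return 0.0
    return min(max(x, 0.0), 1.0)


def bcsel(cond, a, b):
    """Select a when cond is true, otherwise b."""
    return a if cond else b


def before(cond, a, k):
    # Clamp applied to the select result.
    return fsat(bcsel(cond, a, k))


def after(cond, a, k):
    # Clamp pushed into both arms; fsat(k) folds when k is a constant.
    return bcsel(cond, fsat(a), fsat(k))
```

After the rewrite the remaining fsat sits directly on the instruction producing `a`, which is where a free clamp modifier can apply.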

Foz-DB Navi31:
Totals from 873 (1.09% of 80273) affected shaders:
MaxWaves: 22008 -> 21968 (-0.18%)
Instrs: 4624956 -> 4623950 (-0.02%); split: -0.04%, +0.02%
CodeSize: 24152780 -> 24142884 (-0.04%); split: -0.05%, +0.01%
VGPRs: 57900 -> 57960 (+0.10%)
Latency: 28762622 -> 28749889 (-0.04%); split: -0.06%, +0.02%
InvThroughput: 5320810 -> 5320145 (-0.01%); split: -0.02%, +0.00%
VClause: 115879 -> 115929 (+0.04%); split: -0.10%, +0.14%
SClause: 93058 -> 93059 (+0.00%); split: -0.01%, +0.02%
Copies: 335674 -> 335845 (+0.05%); split: -0.05%, +0.10%
PreSGPRs: 53819 -> 53843 (+0.04%); split: -0.01%, +0.05%
PreVGPRs: 50908 -> 50939 (+0.06%); split: -0.02%, +0.08%
VALU: 2816395 -> 2815514 (-0.03%); split: -0.04%, +0.01%
SALU: 509988 -> 509987 (-0.00%); split: -0.02%, +0.02%

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36468>
2025-08-01 20:29:27 +00:00
Georg Lehmann
e9e5146848 nir/opt_algebraic: optimize fsat(fmax(a, b)) where b is not positive
Foz-DB Navi31:
Totals from 946 (1.18% of 80273) affected shaders:
Instrs: 4986082 -> 4983988 (-0.04%); split: -0.04%, +0.00%
CodeSize: 25998700 -> 25989796 (-0.03%); split: -0.04%, +0.00%
Latency: 45514742 -> 45510330 (-0.01%); split: -0.01%, +0.00%
InvThroughput: 8163529 -> 8162325 (-0.01%); split: -0.02%, +0.00%
VClause: 112105 -> 112104 (-0.00%); split: -0.00%, +0.00%
SClause: 109694 -> 109688 (-0.01%)
Copies: 372356 -> 372284 (-0.02%); split: -0.03%, +0.01%
Branches: 132636 -> 132633 (-0.00%)
PreVGPRs: 58997 -> 58979 (-0.03%); split: -0.03%, +0.00%
VALU: 3025662 -> 3024191 (-0.05%); split: -0.05%, +0.00%
SALU: 551712 -> 551714 (+0.00%); split: -0.00%, +0.00%
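
The title does not spell out the replacement; assuming it is fsat(a), the identity can be checked numerically. When b is not positive, any case where fmax picks b ends up clamped to 0, which is also what fsat(a) gives for a below b. The helpers are scalar stand-ins for the NIR opcodes; fsat(NaN) = 0 and fmax returning the non-NaN operand are assumptions of this sketch.

```python
import math


def fsat(x):
    """Clamp to [0, 1]; NaN is mapped to 0 in this model (assumption)."""
    if math.isnan(x):
        return 0.0
    return min(max(x, 0.0), 1.0)


def fmax(a, b):
    """Maximum that returns the non-NaN operand if one input is NaN."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return max(a, b)


def identity_holds(a, b):
    """Check fsat(fmax(a, b)) == fsat(a) for a given pair, with b <= 0."""
    assert b <= 0.0
    return fsat(fmax(a, b)) == fsat(a)
```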

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36468>
2025-08-01 20:29:27 +00:00
Rob Clark
898fa317dd util: Optimize MESA_TRACE_FUNC()
Avoiding the vsnprintf speeds up drawoverhead -test 3 by 60+% !!

Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36492>
2025-08-01 19:58:24 +00:00
Rob Clark
b833bb2df4 freedreno/registers: Fix DBGC_CFG_DBGBUS_SEL_D definition
Offset is the same, but bitfields change between a6xx and a7xx.  Syncing
the change from https://patchwork.freedesktop.org/series/152200/

Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36426>
2025-08-01 19:33:28 +00:00
Rob Clark
a05b6e293c freedreno/crashdec: Add option to export a snapshot
Add support to convert into the "snapshot" format used by internal
tooling.

Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36426>
2025-08-01 19:33:28 +00:00
Rob Clark
08b9d771e3 freedreno/crashdec: Sanitize index-regs section names
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36426>
2025-08-01 19:33:28 +00:00
Rob Clark
d8840db682 freedreno/decode: Add enum value decoding
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36426>
2025-08-01 19:33:27 +00:00
Job Noorman
c8f9990733 ir3/legalize: prevent infinite loop when inserting (ss)nop
We need to insert a (ss)nop when an instruction that doesn't support
(ss) needs it. However, when this happens in a block that needs to be
legalized more than once (e.g., because it is in a loop), the (ss)nop
would be inserted every iteration, causing an infinite loop.

Fix this by checking if the previous instruction is a nop and applying
(ss) there.
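
The failure mode and the fix can be modelled with a toy legalization pass that is run repeatedly over the same block. Everything here is hypothetical (the dict-based instructions, `legalize_block`, the `reuse_nop` switch); it only illustrates why reusing an existing nop makes the pass idempotent.

```python
def make(op, ss=False, needs_ss=False, supports_ss=True):
    """Build a toy instruction record."""
    return {"op": op, "ss": ss, "needs_ss": needs_ss, "supports_ss": supports_ss}


def legalize_block(block, reuse_nop):
    """Run one legalization pass and return the new instruction list."""
    out = []
    for instr in block:
        instr = dict(instr)  # work on a copy
        if instr["needs_ss"] and not instr["supports_ss"]:
            prev = out[-1] if out else None
            if reuse_nop and prev is not None and prev["op"] == "nop":
                prev["ss"] = True  # fixed behaviour: put (ss) on the nop
            else:
                out.append(make("nop", ss=True))  # insert a fresh (ss)nop
        elif instr["needs_ss"]:
            instr["ss"] = True
        out.append(instr)
    return out


def nops_after(passes, reuse_nop):
    """Count nops after legalizing the same block `passes` times."""
    block = [make("add"), make("special", needs_ss=True, supports_ss=False)]
    for _ in range(passes):
        block = legalize_block(block, reuse_nop)
    return sum(1 for i in block if i["op"] == "nop")
```

Without the check, every pass adds another nop, so a block that is re-legalized until it stops changing never converges; with the check, the second pass leaves the block as it was.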

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Fixes: 5993723471 ("freedreno/a3xx/compiler: scheduling/legalize fixes")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36440>
2025-08-01 19:08:23 +00:00
Paulo Zanoni
257e1515e3 brw: null-tile sends don't need to skip L3 on Xe2 and newer
Despite the information in "Overview of Memory Access" (57046), the L3
seems to be smarter on Xe2+. See 4aa3b2d3ad ("anv: LNL+ doesn't need
the special flush for sparse").

The behavior is the same both with vm_bind and TR-TT.

v2: Add some comments (Caio).

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36150>
2025-08-01 18:47:37 +00:00
Paulo Zanoni
80f01c03ba brw: remove unnecessary casts to unsigned after calling LSC_CACHE()
The macro already casts the values to unsigned.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36150>
2025-08-01 18:47:37 +00:00
Paulo Zanoni
c845b30a21 brw: adjust comment pasted from a commit message
The comment was pasted from the commit message that added it. Remove
the parts that only make sense in the commit message, not in the final
code.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36150>
2025-08-01 18:47:37 +00:00
Paulo Zanoni
4bb41156b9 brw: mark 'volatile' sends as uncached on LSC messages
The residencyNonResidentStrict property requires that writes to
unbound memory be ignored and reads return zero. We need this
property, otherwise vkd3d will claim we don't support DX12.

If a shader writes to a variable associated with an unbound memory
region (i.e., mapped to a null tile), reads it back (in the same
shader) and expects the value to be 0 instead of what was written, it has
to use the 'volatile' access qualifier on the variable associated with
the access, otherwise the compiler will be allowed to optimize things
and use the non-zero value. This is explained in the "Accessing
Unbound Regions" section of the Vulkan spec.

Our hardware adds an extra problem on top of the above. BSpec page
"Overview of Memory Access" (47630, 57046) says:

  "If a read from a Null tile gets a cache-hit in a
   virtually-addressed GPU cache, then the read may not return
   zeroes."

So, when we detect this type of access, we have to turn off the
caching.

There's a proposed Vulkan CTS test that does exactly the above.

No shaders on shader_db seem to be using 'volatile'.

v2:
 - Reorder commit order
 - Rewrite commit message

v3:
 - Rework the patch after Caio pointed out the interaction with
   'coherent'.
 - Remove previous R-B tags due to the patch differences.

v4:
 - Rework the patch and commit message again after further
   discussions.

v5:
 - Check for atomic first so we don't regress DG2 atomic tests.

Fixes future test: dEQP-VK.sparse_resources.buffer.ssbo.read_write.sparse_residency_non_resident_strict

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36150>
2025-08-01 18:47:37 +00:00
Paulo Zanoni
f7581e4a38 brw: consider 'volatile' memory access when doing CSE
The GLSL spec says (among other things):

  "When a volatile variable is read, its value must be re-fetched from
   the underlying memory, even if the shader invocation performing the
   read had previously fetched its value from the same memory. When a
   volatile variable is written, its value must be written to the
   underlying memory, even if the compiler can conclusively determine
   that its value will be overwritten by a subsequent write."

The SPIR-V spec says (among other things):

  "Accesses to volatile memory cannot be eliminated, duplicated, or
   combined with other accesses."

So in this commit we make sure that both writes and reads marked as
volatile can't be affected by CSE.
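
A minimal model of the rule: a value-numbering style CSE that merges identical instructions but never merges, or merges into, one flagged as volatile. The tuple-based IR and the `cse` function are hypothetical and unrelated to the brw data structures, and the model ignores intervening stores, which a real pass also has to respect.

```python
def cse(instrs):
    """Eliminate repeated (op, srcs) pairs, leaving volatile accesses alone.

    Each instruction is a tuple (dest, op, srcs, is_volatile).
    Returns the surviving instructions with sources rewritten.
    """
    seen = {}    # (op, srcs) -> dest of the first non-volatile occurrence
    remap = {}   # eliminated dest -> surviving dest
    out = []
    for dest, op, srcs, is_volatile in instrs:
        srcs = tuple(remap.get(s, s) for s in srcs)
        key = (op, srcs)
        if not is_volatile and key in seen:
            remap[dest] = seen[key]  # duplicate: reuse the earlier result
            continue
        out.append((dest, op, srcs, is_volatile))
        if not is_volatile:
            seen[key] = dest  # volatile results are never reused
    return out


program = [
    ("a", "load", ("addr",), False),
    ("b", "load", ("addr",), False),  # duplicate of a, can be merged
    ("c", "load", ("addr",), True),   # volatile, must stay
    ("d", "load", ("addr",), True),   # volatile, must stay
    ("e", "add", ("a", "b"), False),
]
result = cse(program)
```

In this example the second plain load is folded into the first, while both volatile loads survive as separate memory accesses.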

v2: Reorder patches in the series.

Credits-to: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> (v1)
Reviewed-by: Iván Briano <ivan.briano@intel.com> (v1)
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36150>
2025-08-01 18:47:36 +00:00
Paulo Zanoni
8e1e3ba152 brw: store 'volatile' GLSL/SPIR-V access in MEMORY_LOGICAL_FLAGS
We seem to be ignoring the 'volatile' keyword coming from the shaders.
Record this in MEMORY_LOGICAL_FLAGS so we can use it later.

Credits-to: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Iván Briano <ivan.briano@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36150>
2025-08-01 18:47:36 +00:00
Paulo Zanoni
670cd08c68 brw: remove unnecessary <vector> inclusions
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Iván Briano <ivan.briano@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36150>
2025-08-01 18:47:35 +00:00
Jeongik Cha
3e39c09aa0 gfxstream: Generate goldfish dispatch code for AHB extension
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36510>
2025-08-01 18:34:15 +00:00
Daniel Schürmann
4ca3cc5a1a aco/ra: propagate precolor affinities through parallelcopies and tied definitions
Totals from 214 (0.27% of 79839) affected shaders: (Navi48)

Instrs: 65339 -> 65311 (-0.04%); split: -0.05%, +0.00%
CodeSize: 352616 -> 350952 (-0.47%); split: -0.55%, +0.07%
VGPRs: 9984 -> 9960 (-0.24%)
Latency: 207556 -> 207508 (-0.02%); split: -0.03%, +0.01%
InvThroughput: 40422 -> 40397 (-0.06%)
Copies: 3180 -> 3155 (-0.79%)
VALU: 38347 -> 38322 (-0.07%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36345>
2025-08-01 17:15:54 +00:00