Sagar Ghuge
620835926d
brw: Pass write back register for ray query messages
...
DG2 (Bspec 47937) has the same programming note as Xe2+:
"When this bit is set in the header, Trace Ray Message behaves like a
Ray Query. This message requires a write-back message indicating
RayQuery for all valid Rays (SIMD lanes) have completed."
So this patch simply passes a write-back destination register when we
have a ray query message.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41039 >
2026-04-21 23:16:09 +00:00
José Roberto de Souza
64bc538f5e
intel/brw: Explicitly upcast UB to UW for SHR with vector immediates
...
HW does not allow instructions with vector immediates to cross a GRF boundary if
it has a stride.
Under register pressure, the register allocator may place a temporary register
across such a boundary.
To resolve this, we now explicitly emit a MOV to upcast the UB payload into a
UW VGRF.
This ensures the SHR instruction operates on a dense, well-aligned region that
satisfies hardware alignment constraints.
Below is the portion of the shader exhibiting this issue:
Native code for unnamed fragment shader GLSL6 (src_hash 0x9c84a007) (sha1 48745e7dae90d08f8a9bbe4dbf837de23440c841f0344e669cb8af9df79bce58)
SIMD32 shader: 44 instructions. 0 loops. 354 cycles. 0:0 spills:fills, 2 sends, scheduled with mode latency-sensitive. Promoted 0 constants. GRF registers: 22. Non-SSA regs (after NIR): 11. Compacted 800 to 800 bytes (0%)
mov(1) f1<1>UW g0.30<0,1,0>UW { align1 WE_all 1N };
mov(1) f1.1<1>UW g1.30<0,1,0>UW { align1 WE_all 1N I@1 };
mov(32) g2<2>UW g0.20<2,8,0>UW { align1 WE_all };
mov(32) g4<2>UW g0.21<2,8,0>UW { align1 WE_all };
mov(32) g8<2>UW g1.20<2,8,0>UW { align1 WE_all };
mov(32) g10<2>UW g1.21<2,8,0>UW { align1 WE_all };
mov(16) g12<4>UB g0.60<1,8,0>UB { align1 1H };
mov(16) g13<4>UB g1.60<1,8,0>UB { align1 2H };
add(32) g0<1>UW g2<16,8,2>UW 0x01000100V { align1 WE_all I@6 };
add(32) g1<1>UW g4<16,8,2>UW 0x01010000V { align1 WE_all I@6 };
add(32) g2<1>UW g8<16,8,2>UW 0x01000100V { align1 WE_all I@6 };
add(32) g3<1>UW g10<16,8,2>UW 0x01010000V { align1 WE_all I@6 };
shr(16) g4<1>UW g12<32,8,4>UB 0x76543210V { align1 1H I@6 };
mov(16) g14.32<4>UB g13<32,8,4>UB { align1 2H I@6 };
sync nop(1) null<0,1,0>UB { align1 WE_all 1N I@6 };
mov(16) g5<1>UW g0<16,8,2>UW { align1 1H };
sync nop(1) null<0,1,0>UB { align1 WE_all 1N I@6 };
mov(16) g0<1>UW g1<16,8,2>UW { align1 1H };
sync nop(1) null<0,1,0>UB { align1 WE_all 5N I@6 };
mov(16) g5.16<1>UW g2<16,8,2>UW { align1 2H };
sync nop(1) null<0,1,0>UB { align1 WE_all 5N I@6 };
mov(16) g0.16<1>UW g3<16,8,2>UW { align1 2H };
shr(16) g4.16<1>UW g14.32<32,8,4>UB 0x76543210V { align1 2H I@5 };
ERROR: Invalid register region for source 0. See special restrictions section.
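As a reading aid for the listing above, here is a minimal Python sketch (not code from the patch) that decodes a packed `V` immediate and checks whether a strided source region runs past the register it starts in. The helper names are hypothetical, and the 64-byte GRF size is an assumption inferred from the listing, where 16 UB elements at stride 4 fill exactly one register. The exact hardware restriction lives in the Bspec and is not reproduced here.

```python
def decode_v_immediate(imm):
    """Decode a packed 'V' immediate: eight signed 4-bit values, lowest nibble first."""
    values = []
    for i in range(8):
        nibble = (imm >> (4 * i)) & 0xF
        # Sign-extend the 4-bit value.
        values.append(nibble - 16 if nibble & 0x8 else nibble)
    return values


def region_crosses_grf(start_byte, exec_size, vstride, width, hstride,
                       type_size=1, grf_bytes=64):
    """Return True if a <vstride,width,hstride> region reads past the GRF it starts in.

    start_byte is the byte offset inside the first register; for UB sources
    the subregister number in the disassembly is already a byte offset.
    grf_bytes=64 is an assumption, not something stated in the commit.
    """
    last_byte = 0
    for channel in range(exec_size):
        row, col = divmod(channel, width)
        offset = (row * vstride + col * hstride) * type_size
        last_byte = max(last_byte, offset + type_size - 1)
    return start_byte + last_byte >= grf_bytes


# 0x76543210V supplies a different shift count to each of eight channels.
shift_counts = decode_v_immediate(0x76543210)

# g12<32,8,4>UB starts at byte 0; g14.32<32,8,4>UB starts at byte 32.
first_shr_crosses = region_crosses_grf(0, 16, 32, 8, 4)
second_shr_crosses = region_crosses_grf(32, 16, 32, 8, 4)
```

Under these assumptions the first `shr` source stays inside one register while the second one, starting at `g14.32`, spills into the next register, which is the instruction the validator rejects.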
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40856 >
2026-04-21 22:51:45 +00:00
Eric R. Smith
4ae192a3d9
glsl, spirv: Improve accuracy of asin() and acos()
...
The polynomial used for asin_expr() was suboptimal (and its source was
not documented).
A better approximation is found in the _Handbook_of_Mathematical_Functions_
by Abramowitz and Stegun, which is used in Nvidia's Cg toolkit. However,
while this approximation gives a good absolute error bound, its relative
error exceeds the 4096 ulp allowed by the Vulkan spec. Taking a page
from the spirv implementation of asin(), we implement a piecewise
approximation where a Taylor series is used for small values of |x|.
This patch also harmonizes the GLSL and Vulkan implementations by moving
the implementation to common code (nir_builder).
Running tests on asin() with a grid of 64000 samples between 0.0 and +1.0,
the original asin() at 32 bits has:
```
                  glsl                        spirv
RMSE:             1.756451e-04                1.609091e-04
worst abs error:  3.904104e-04 at 0.937001    3.904104e-04 at 0.937001
worst ulp error:  11800 at 6.2499e-05         3826 at 0.841331
```
whereas the new implementation has for both:
```
RMSE: 2.528056e-05
worst abs error: 4.962087e-05 at 0.451149
worst ulp error: 2379 at 0.215106
```
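The piecewise idea described above can be sketched in Python as follows. This is an illustration only, not the nir_builder code from the patch: the coefficients are those of Abramowitz and Stegun formula 4.4.45 as commonly quoted, while the switch-over point `SMALL_X` and the number of Taylor terms are assumptions chosen for the sketch.

```python
import math

# Abramowitz & Stegun 4.4.45:
#   asin(x) ~= pi/2 - sqrt(1 - x) * (A0 + A1*x + A2*x^2 + A3*x^3),  0 <= x <= 1
A0, A1, A2, A3 = 1.5707288, -0.2121144, 0.0742610, -0.0187293

# Hypothetical switch-over point; the patch may use a different one.
SMALL_X = 0.25


def asin_piecewise(x):
    """Approximate asin(x) for x in [-1, 1] with a two-piece scheme."""
    ax = abs(x)
    if ax < SMALL_X:
        # Taylor series around 0: x + x^3/6 + 3*x^5/40.
        # It returns exactly 0 at x = 0, so the relative error stays small there,
        # whereas the polynomial below is off by a small constant at 0.
        x2 = ax * ax
        result = ax * (1.0 + x2 * (1.0 / 6.0 + x2 * (3.0 / 40.0)))
    else:
        poly = A0 + ax * (A1 + ax * (A2 + ax * A3))
        result = math.pi / 2 - math.sqrt(1.0 - ax) * poly
    # asin is odd, so restore the sign of the input.
    return math.copysign(result, x)


def acos_piecewise(x):
    """acos(x) = pi/2 - asin(x), reusing the same approximation."""
    return math.pi / 2 - asin_piecewise(x)


# Worst absolute error over a grid similar to the one used in the commit message.
worst_abs_error = max(
    abs(asin_piecewise(i / 64000) - math.asin(i / 64000)) for i in range(64001)
)
```

The sketch runs in double precision, so its error figures are not directly comparable to the 32-bit numbers quoted above.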
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Acked-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40862 >
2026-04-21 21:10:22 +00:00
Jordan Justen
fa784fffd0
brw: Don't set header_size at init since it will be re-set in later code
...
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Ref: efcba73b49 ("brw: switch to new sampler payload description scheme")
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41035 >
2026-04-21 19:23:41 +00:00
José Roberto de Souza
26525ac7ae
anv: Move code to load color border to memory to a function
...
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41035 >
2026-04-21 19:23:41 +00:00
José Roberto de Souza
83d75a0384
anv: Move init and finish of state pools to its own functions
...
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41035 >
2026-04-21 19:23:41 +00:00
José Roberto de Souza
a4c22baeb4
anv: Move VMA heaps init and finish of vma heaps to anv_va.c
...
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41035 >
2026-04-21 19:23:40 +00:00
José Roberto de Souza
32f3d6486c
anv: Change fill_inline_params() first parameter from struct GENX(COMPUTE_WALKER_BODY) to uint32_t *
...
This will make this function more generic allowing us to use it for
COMPUTE_WALKER_2.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41035 >
2026-04-21 19:23:40 +00:00
Jesse Natalie
6f8656ec64
microsoft/compiler: Back-propagate interpolator modes from FS
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41067 >
2026-04-21 18:31:31 +00:00
Erik Faye-Lund
c4287eaa04
gallium: delete leftovers of post-processing infrastructure
...
This was removed, but driconfs and docs were left behind.
Reviewed-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41048 >
2026-04-21 18:04:11 +00:00
Erik Faye-Lund
8259e06645
haiku: remove unfinished post-processing support
...
This doesn't work, because pp_init_fbos and pp_run aren't wired up and
no filters ever get enabled.
But the post processing infrastructure has been removed, so let's just
delete this code. This gives the code a chance of compiling!
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41048 >
2026-04-21 18:04:11 +00:00
Eric Engestrom
4731fc588e
docs: add stub of vk_struct_type_cast.h for vk_util.h
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41029 >
2026-04-21 17:29:04 +00:00
Samuel Pitoiset
ebf2797da2
vulkan,treewide: stop passing vk_device to vk_pipeline_robustness_state_fill()
...
This will be helpful for RADV.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41029 >
2026-04-21 17:29:04 +00:00
Samuel Pitoiset
b7a8b09b21
vulkan: pre-compute the default robustness state in the device
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41029 >
2026-04-21 17:29:04 +00:00
Samuel Pitoiset
5828ebeb70
vulkan: refactor vk_pipeline_robustness_state_fill() slightly
...
Overwrite the default device state only when requested.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41029 >
2026-04-21 17:29:04 +00:00
Lionel Landwerlin
b0c17357db
intel/ci: update expectation for RPL
...
This fails everywhere, but CI only runs this test on RPL.
A CTS fix has been merged in main.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39451 >
2026-04-21 16:29:14 +00:00
Lionel Landwerlin
eda83bc2b6
anv: add a pass to realign global loads on DX CBV resources
...
CBV resources are supposed to be 256B aligned
(D3D12_CONSTANT_BUFFER_DATA_PLACEMENT_ALIGNMENT).
vkd3d-proton puts CBV addresses in the push constant data and does
global loads on them. Unfortunately those loads don't have a 256B
alignment value on them. So when looking at what we can promote to HW
push buffers, we can't consider them.
This change introduces a detection pass for CBV resources (according
to vkd3d-proton devs those are 64KiB in size) and realigns the loads to
be 256B aligned.
This is only enabled on DX emulation.
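A rough Python sketch of the alignment reasoning above, not the actual anv pass: once a load is known to target a CBV whose base address is 256B aligned, the alignment of the load follows from its constant offset alone. The function name is hypothetical; the `(align_mul, align_offset)` pair mirrors how NIR describes load alignment.

```python
# D3D12_CONSTANT_BUFFER_DATA_PLACEMENT_ALIGNMENT, as stated in the commit message.
CBV_ALIGNMENT = 256


def realigned_load(const_offset, base_align=CBV_ALIGNMENT):
    """Return (align_mul, align_offset) for a load at base + const_offset.

    Assumes the base address is a multiple of base_align, which is what
    the detection pass is meant to establish for CBV resources.
    """
    return base_align, const_offset % base_align


# A load 272 bytes into the CBV sits 16 bytes past a 256B boundary.
example = realigned_load(272)
```

With that information attached, a later pass can reason about which aligned 256B block a load falls in instead of treating the address as arbitrarily aligned.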
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Felix DeGrood <felix.j.degrood@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39451 >
2026-04-21 16:29:14 +00:00
Lionel Landwerlin
bba428ce3f
anv: promote push constant pointers to push buffers
...
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Felix DeGrood <felix.j.degrood@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39451 >
2026-04-21 16:29:14 +00:00
Lionel Landwerlin
0539f26065
brw: track push constants shader stats
...
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Felix DeGrood <felix.j.degrood@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39451 >
2026-04-21 16:29:14 +00:00
squidbus
f59734d5d3
kk: Use device limits for buffers and compute shared memory.
...
Metal provides these limits as properties of MTLDevice, which can be
used instead of hardcoding them.
Reviewed-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41077 >
2026-04-21 14:11:08 +00:00
Valentine Burley
17d03d98c7
ci/zink/intel: Disable flaky TGL canvas_moire-v2 trace
...
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41083 >
2026-04-21 15:37:19 +02:00
Lionel Landwerlin
b10be13434
ci/zink/intel: disable TGL demo-v2 trace
...
Flaky trace, renders at the wrong resolution (32x32).
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41083 >
2026-04-21 15:41:28 +03:00
Rhys Perry
bddd8b36a6
aco: use RegisterDemand::operator[] more
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39690 >
2026-04-21 11:16:26 +00:00
Rhys Perry
176b075129
aco: prefer spilling smaller temporaries if it finishes spilling
...
fossil-db stats seemed less positive when updating process_block() too.
fossil-db (navi31):
Totals from 41 (0.05% of 84369) affected shaders:
Instrs: 294758 -> 294694 (-0.02%); split: -0.11%, +0.09%
CodeSize: 1566136 -> 1564392 (-0.11%); split: -0.21%, +0.10%
SpillSGPRs: 2306 -> 2143 (-7.07%); split: -8.37%, +1.30%
Latency: 3877251 -> 3868194 (-0.23%); split: -0.29%, +0.05%
InvThroughput: 881747 -> 882352 (+0.07%); split: -0.01%, +0.08%
SClause: 6498 -> 6494 (-0.06%); split: -0.09%, +0.03%
Copies: 33582 -> 33900 (+0.95%); split: -0.23%, +1.18%
Branches: 6799 -> 6801 (+0.03%)
VALU: 192977 -> 192646 (-0.17%); split: -0.21%, +0.04%
SALU: 28082 -> 28395 (+1.11%); split: -0.27%, +1.39%
VOPD: 1939 -> 1959 (+1.03%); split: +1.19%, -0.15%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39690 >
2026-04-21 11:16:26 +00:00
Rhys Perry
0ffbc30d7f
aco: refactor spiller to use spills_needed variable
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39690 >
2026-04-21 11:16:26 +00:00
Samuel Pitoiset
e60b49a3f6
radv/ci: document more HIC regressions on NAVI10
...
addrlib support for HIC needs more bugfixes and AMD is aware.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40996 >
2026-04-21 10:14:43 +00:00
Samuel Pitoiset
87e95c5e50
radv: advertise VK_EXT_host_image_copy by default on GFX10.3+
...
Latest addrlib supports SIMD (AVX2) and it's definitely fast enough to
be used in production now.
GFX10 is still not enabled by default due to some regressions from the
addrlib bump, and AVX is also still missing for some formats.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40996 >
2026-04-21 10:14:43 +00:00
Samuel Pitoiset
aea04d11b7
amd: allow addrlib to enable SIMD if possible
...
The SIMD variants are much faster, seemingly by about an order of magnitude (x10).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40996 >
2026-04-21 10:14:42 +00:00
Caius-Moldovan-img
daeb52d38d
pco: Replace nir_shader_lower_instructions with nir_shader_*_pass
...
Signed-off-by: Caius Moldovan <caius.moldovan@imgtec.com>
Reviewed-by: Simon Perretta <simon.perretta@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40390 >
2026-04-21 09:17:28 +00:00
David Rosca
27dbe82800
ac/parse_ib: Fix printing enc recon VAs on VCN5
...
Fixes: f8f80c3700 ("ac/parse_ib: Fix VCN address parsing")
Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41025 >
2026-04-21 08:09:09 +00:00
Samuel Pitoiset
1fc8683281
radv: allow depth+stencil formats with host image copy
...
Not super useful but it's supported. The NAVI10 crashes are expected
and they are due to a bug in addrlib.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41000 >
2026-04-21 08:57:31 +02:00
Samuel Pitoiset
4de652c78b
radv: add depth+stencil formats support with host image copy
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41000 >
2026-04-21 08:57:31 +02:00
Samuel Pitoiset
fd95195f45
ac/surface: add stencil-only support for host mem->surf copies
...
It's needed to tweak the surface info and to adjust the base pointer.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41000 >
2026-04-21 08:57:31 +02:00
Brandon Jones
d1dd65d425
nir/opt_algebraic: fix fabs optimization
...
This fixes a regression found in blender's unit testing, which called
fabs(-0.0) and invoked an NIR optimization that was not valid for
the parameter -0.0. IEEE 754 requires that abs clear the sign bit
for the value -0.0.
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41060 >
2026-04-21 04:10:29 +00:00
jinmiliu
e5392e3d5f
mesa/st: Set protected content context flag based on pipe context attributes
...
If the PIPE_CONTEXT_PROTECTED flag is set in the context attributes,
propagate this by enabling GL_CONTEXT_FLAG_PROTECTED_CONTENT_BIT_EXT
on the corresponding Mesa GL context.
Signed-off-by: jinmiliu <jinming.liu@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40998 >
2026-04-21 03:14:35 +00:00
Sagar Ghuge
7a627fa8f3
anv: Fix Wa_14021821874, Wa_14018813551, Wa_14026600921
...
StackSizePerRay is the RTDispatchGlobals::AsyncStackSize and
DisableRTGlobalsKnownValues is to interpret how many Max BVH levels we
need to use. It's not relevant to Vulkan, since we have just 2 fixed BVH
levels.
Fixes: cb423ee6 ("anv: Fix Wa_14021821874, Wa_14018813551, Wa_14026600921")
Fixes: c1a44e8d ("anv: force StackIDControl value for Wa_14021821874")
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41012 >
2026-04-21 01:38:34 +00:00
Alyssa Rosenzweig
6397ddd15d
gallium: Drop post-processing filters
...
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.co
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5448 >
2026-04-20 22:58:39 +00:00
Alyssa Rosenzweig
168141fbac
gallium: Drop users of post-processing filters
...
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5448 >
2026-04-20 22:58:39 +00:00
Alyssa Rosenzweig
fd46a48ccc
jay/ra: only use stride=4 temps
...
SIMD16:
Totals from 56 (2.12% of 2647) affected shaders:
Instrs: 541831 -> 542004 (+0.03%); split: -0.40%, +0.44%
CodeSize: 8597680 -> 8597248 (-0.01%); split: -0.45%, +0.44%
SIMD32:
Totals:
Instrs: 4858179 -> 4734713 (-2.54%); split: -2.78%, +0.24%
CodeSize: 78651424 -> 76667440 (-2.52%); split: -2.76%, +0.24%
Totals from 1108 (41.86% of 2647) affected shaders:
Instrs: 4241312 -> 4117846 (-2.91%); split: -3.18%, +0.27%
CodeSize: 68753152 -> 66769168 (-2.89%); split: -3.16%, +0.27%
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41064 >
2026-04-20 22:32:12 +00:00
Alyssa Rosenzweig
1f62da938b
jay/ra: drop memory copy reordering
...
No shader-db changes, and no longer required for correctness.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41064 >
2026-04-20 22:32:12 +00:00
Alyssa Rosenzweig
45845ea7f2
jay/ra: use accumulator for stride=4 swaps
...
SIMD16:
Totals:
Instrs: 2767930 -> 2767190 (-0.03%)
CodeSize: 44327408 -> 44312304 (-0.03%); split: -0.04%, +0.00%
Totals from 142 (5.36% of 2647) affected shaders:
Instrs: 658928 -> 658188 (-0.11%)
CodeSize: 10514512 -> 10499408 (-0.14%); split: -0.16%, +0.01%
SIMD32:
Totals:
Instrs: 4884039 -> 4858179 (-0.53%)
CodeSize: 79079008 -> 78651424 (-0.54%); split: -0.54%, +0.00%
Totals from 761 (28.75% of 2647) affected shaders:
Instrs: 3803274 -> 3777414 (-0.68%)
CodeSize: 61707728 -> 61280144 (-0.69%); split: -0.70%, +0.00%
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41064 >
2026-04-20 22:32:12 +00:00
Alyssa Rosenzweig
489f883277
jay/ra: use accumulator for memory swaps
...
SIMD16:
Totals from 34 (1.28% of 2647) affected shaders:
Instrs: 427731 -> 434349 (+1.55%); split: -0.03%, +1.58%
CodeSize: 6773248 -> 6881136 (+1.59%); split: -0.04%, +1.63%
Number of spill instructions: 1833 -> 1700 (-7.26%)
Number of fill instructions: 2095 -> 1944 (-7.21%)
SIMD32:
Totals from 621 (23.46% of 2647) affected shaders:
Instrs: 3663406 -> 3739089 (+2.07%); split: -0.62%, +2.68%
CodeSize: 59392464 -> 60624704 (+2.07%); split: -0.61%, +2.68%
Number of spill instructions: 52115 -> 50109 (-3.85%); split: -3.90%, +0.05%
Number of fill instructions: 53864 -> 51355 (-4.66%)
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41064 >
2026-04-20 22:32:11 +00:00
Alyssa Rosenzweig
2e5fd6da42
jay/ra: use accumulator for memory copies
...
SIMD16:
Totals from 34 (1.28% of 2647) affected shaders:
Instrs: 424527 -> 427731 (+0.75%); split: -0.03%, +0.78%
CodeSize: 6720896 -> 6773248 (+0.78%); split: -0.04%, +0.82%
Number of spill instructions: 1967 -> 1833 (-6.81%)
Number of fill instructions: 2247 -> 2095 (-6.76%)
SIMD32:
Totals:
Instrs: 4691989 -> 4808356 (+2.48%); split: -0.46%, +2.94%
CodeSize: 76011248 -> 77884320 (+2.46%); split: -0.46%, +2.92%
Number of spill instructions: 54223 -> 52115 (-3.89%); split: -4.08%, +0.19%
Number of fill instructions: 56519 -> 53864 (-4.70%)
Totals from 606 (22.89% of 2647) affected shaders:
Instrs: 3509511 -> 3625878 (+3.32%); split: -0.61%, +3.93%
CodeSize: 56909488 -> 58782560 (+3.29%); split: -0.61%, +3.90%
Number of spill instructions: 54223 -> 52115 (-3.89%); split: -4.08%, +0.19%
Number of fill instructions: 56519 -> 53864 (-4.70%)
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41064 >
2026-04-20 22:32:11 +00:00
Alyssa Rosenzweig
7d2a88a9e5
jay/ra: don't reserve registers when not spilling
...
No changes at SIMD16. At SIMD32:
Totals:
Instrs: 4691895 -> 4691989 (+0.00%); split: -0.03%, +0.03%
CodeSize: 76010880 -> 76011248 (+0.00%); split: -0.03%, +0.03%
Number of spill instructions: 54369 -> 54223 (-0.27%)
Number of fill instructions: 56668 -> 56519 (-0.26%)
Totals from 71 (2.68% of 2647) affected shaders:
Instrs: 75963 -> 76057 (+0.12%); split: -1.67%, +1.79%
CodeSize: 1229792 -> 1230160 (+0.03%); split: -1.71%, +1.74%
Number of spill instructions: 146 -> 0 (-inf%)
Number of fill instructions: 149 -> 0 (-inf%)
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41064 >
2026-04-20 22:32:11 +00:00
Alyssa Rosenzweig
e5bf153d4f
jay/lower_post_ra: drop old 2<-->8 lowering
...
this XOR based lowering is no longer needed.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41064 >
2026-04-20 22:32:10 +00:00
Alyssa Rosenzweig
915af8e121
jay/lower_post_ra: remove SWAP macro
...
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41064 >
2026-04-20 22:32:10 +00:00
Alyssa Rosenzweig
4c5ad7a832
jay/register_allocate: start using accumulators
...
this lets us lower away 8<-->2 copies/swaps in a faster, more straightforward
way by (ab)using accumulators. I think as an edge case this plays nicely enough
with my plans to profit from accs for normal fma-heavy code.
SIMD16:
Totals:
Instrs: 2761525 -> 2758108 (-0.12%)
CodeSize: 44222384 -> 44167168 (-0.12%)
Totals from 33 (1.25% of 2647) affected shaders:
Instrs: 422130 -> 418713 (-0.81%)
CodeSize: 6713680 -> 6658464 (-0.82%)
SIMD32:
Totals:
Instrs: 4911601 -> 4691895 (-4.47%)
CodeSize: 79553984 -> 76010880 (-4.45%)
Totals from 947 (35.78% of 2647) affected shaders:
Instrs: 4143501 -> 3923795 (-5.30%)
CodeSize: 67174592 -> 63631488 (-5.27%)
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41064 >
2026-04-20 22:32:10 +00:00
Alyssa Rosenzweig
53c1c076a8
jay: validate non-SSA accumulators
...
just enough for us to do parallel copy lowering with them.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41064 >
2026-04-20 22:32:09 +00:00
Alyssa Rosenzweig
28cf0f52c1
jay/to_binary: handle packing accumulators
...
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41064 >
2026-04-20 22:32:09 +00:00
Alyssa Rosenzweig
aa37d8b248
jay/print: deal with bare r0 copies
...
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41064 >
2026-04-20 22:32:09 +00:00