fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-17 05:18:12 +02:00

Author	SHA1	Message	Date
Ian Romanick	e301817753	brw: Don't lower phis involved in DPAS instructions to scalar On my Arc A380 (DG2), this more than doubles the performance of Jeff Bolz's cooperative matrix benchmark. With llama.cpp modified to use cooperative matrix on DG2, performance is improved by 37%. Closes: #15311 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Tested-by: Matt Corallo <git@bluematt.me> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41172>	2026-04-27 18:09:16 +00:00
Ian Romanick	09b43966ba	brw: Lower all phis to scalar The next commit will cause some very specific phis to not be lowered to scalar, and that's the reason the callback is used instead of nir_lower_all_phis_to_scalar. It's worth noting that the comment in nir_lower_phis_to_scalar.c specifically calls out Deus Ex as the reason some phis should not be lowered. At least on current BRW, zero shaders from Deus Ex trace were affected for spills or fills on any Intel platform. shader-db: All Intel platforms had similar results. (Lunar Lake shown) total instructions in shared programs: 17050005 -> 17051449 (<.01%) instructions in affected programs: 41032 -> 42476 (3.52%) helped: 29 / HURT: 159 total cycles in shared programs: 876411976 -> 876433702 (<.01%) cycles in affected programs: 1455550 -> 1477276 (1.49%) helped: 40 / HURT: 150 fossil-db: All Intel platforms had similar results. (Lunar Lake shown) Totals: Instrs: 916599633 -> 916694854 (+0.01%); split: -0.00%, +0.01% CodeSize: 14705971792 -> 14708302384 (+0.02%); split: -0.00%, +0.02% Send messages: 40870114 -> 40870113 (-0.00%) Cycle count: 102360965889 -> 102364169753 (+0.00%); split: -0.00%, +0.01% Spill count: 3460669 -> 3460240 (-0.01%) Fill count: 4988325 -> 4987891 (-0.01%) Max live registers: 192914542 -> 192918153 (+0.00%); split: -0.00%, +0.00% Max dispatch width: 48848112 -> 48848128 (+0.00%) Non SSA regs after NIR: 141633613 -> 141671589 (+0.03%); split: -0.00%, +0.03% Totals from 5713 (0.28% of 2010434) affected shaders: Instrs: 5215921 -> 5311142 (+1.83%); split: -0.09%, +1.91% CodeSize: 88940784 -> 91271376 (+2.62%); split: -0.20%, +2.82% Send messages: 284751 -> 284750 (-0.00%) Cycle count: 275671864 -> 278875728 (+1.16%); split: -0.74%, +1.90% Spill count: 857 -> 428 (-50.06%) Fill count: 845 -> 411 (-51.36%) Max live registers: 667776 -> 671387 (+0.54%); split: -0.86%, +1.40% Max dispatch width: 160416 -> 160432 (+0.01%) Non SSA regs after NIR: 1127904 -> 1165880 (+3.37%); split: -0.10%, +3.47% 
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Tested-by: Matt Corallo <git@bluematt.me> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41172>	2026-04-27 18:09:16 +00:00
Jaishankar Rajendran	12f43d048e	anv: tune parameters of the ASTC software decoding Signed-off-by: Prakhar Vishwakarma <prakhar.vishwakarma@intel.com> Signed-off-by: Jaishankar Rajendran <jaishankar.rajendran@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41205>	2026-04-27 15:17:04 +00:00
Jaishankar Rajendran	cd941d3970	vulkan/runtime: enable parametrization of ASTC software decode Enable the driver to select : - LUT allocation alignment - LUT memory flags selection Signed-off-by: Prakhar Vishwakarma <prakhar.vishwakarma@intel.com> Signed-off-by: Jaishankar Rajendran <jaishankar.rajendran@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41205>	2026-04-27 15:17:04 +00:00
José Roberto de Souza	965e28ff8a	intel/tools: Fix parse of '[HWCTX].replay_' in aubinator_error_decode_xe This hides [HWCTX].replay_offset and [HWCTX].replay_length in the error decoder, as those are not relevant when just reading the decoded error. From: GuC ID: 33 Name: bcs33 Class: 3 Logical mask: 0x1 Width: 1 Ref: 65 Timeout: 0 (ms) Timeslice: 1000 (us) Preempt timeout: 640000 (us) HW Context Desc: 0x03862000 HW Ring address: 0x0385e000 HW Indirect Ring State: 0x00000000 LRC Head: (memory) 0 LRC Tail: (internal) 4408, (memory) 4408 Ring start: (memory) 0x0385e000 Start seqno: (memory) -127 Seqno: (memory) -128 Timestamp: 0x00000001 Job Timestamp: 0x0000005c type char: [HWCTX].replay_offset: 0x0 type char: [HWCTX].replay_length: 0x1000 Schedule State: 0x241 Flags: 0x0 To: * Contexts ** GuC ID: 33 Name: bcs33 Class: 3 Logical mask: 0x1 Width: 1 Ref: 65 Timeout: 0 (ms) Timeslice: 1000 (us) Preempt timeout: 640000 (us) HW Context Desc: 0x03862000 HW Ring address: 0x0385e000 HW Indirect Ring State: 0x00000000 LRC Head: (memory) 0 LRC Tail: (internal) 4408, (memory) 4408 Ring start: (memory) 0x0385e000 Start seqno: (memory) -127 Seqno: (memory) -128 Timestamp: 0x00000001 Job Timestamp: 0x0000005c Schedule State: 0x241 Flags: 0x0 Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Carlos Santa <carlos.santa@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41113>	2026-04-24 20:19:09 +00:00
Alyssa Rosenzweig	bccaeb28bb	brw/nir_lower_cs_intrinsics: do some math at 16-bit There are fewer than 2^16 lanes within a threadgroup, so it is safe to do all math at 16-bit. This allows us to use 16-bit integer division which is much faster than 32-bit integer division (in terms of the lowerings). In a "hello world" kernel with variable wg size, simd32 goes 72 inst -> 57 inst on jay and 82 -> 67 inst on brw. OTOH it's a loss for non-variable wg size, so do it only there to avoid unwelcome stats regressions on Vulkan. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41084>	2026-04-24 17:13:24 +00:00
Caio Oliveira	0422165d9a	brw: Remove various unused fields These are a mix of fields whose last use was removed or fields that were never used, possibly because they remained in a patch while the rest of the code changed before landing. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41139>	2026-04-24 15:04:25 +00:00
Sagar Ghuge	f36b6c8f13	anv: Update values for DispatchTimeoutCounter The BTD unit will keep accumulating threads and then eventually dispatch those active threads once it reaches the counter. I guess dispatching too fast will not give full occupancy at the BTD unit, so instead we just pick half of the max value for the counter. This patch also adds a drirc option for dispatch_timeout_counter and tweaks the values internally with respect to HW limits. The default value we have right now is 512 clocks; we can for sure tune it per app. Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40733>	2026-04-24 01:38:20 +00:00
Sagar Ghuge	8a990b5a1c	intel/genxml: Added dispatch timeout counter extended field Since field is split in between multiple fields, we have to manually write the values and refer to Bspec 43851 for exact values. Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40733>	2026-04-24 01:38:20 +00:00
Emma Anholt	01cb024922	ci/intel: Switch over to the new tool for restricted traces. The new tool has much better image diffing presentation (thanks to Danilo's work on turnip's private trace CI), better performance, flake checking within a single run, parallelized downloads along with replays, system monitoring for replay debug (OOMs especially), and DXVK support (I've added a few traces, but not most of the collection because I didn't want to block on stabilizing this job with everything). Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41115>	2026-04-23 22:54:12 +00:00
Sagar Ghuge	e65e62b17f	intel/genxml: Disable compute walker mid-thread preemption On Xe, we have this bit reversed. It's called Thread preemption Disable. On Xe2+ (Bspec 56590), it's called Thread preemption with option enabled/disabled. AFAIK, we don't support mid-thread preemption. This patch sets the values properly according to the Bspec. Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41120>	2026-04-23 19:24:41 +00:00
Lionel Landwerlin	b3fe0cb34e	anv: expose VK_KHR_shader_constant_data Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40741>	2026-04-23 19:02:27 +00:00
Tapani Pälli	c105366165	drirc/anv: add flag to disable VK_EXT_subgroup_size_control This can be used to workaround problem cases with application controlled subgroup size. Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40813>	2026-04-23 13:16:05 +00:00
Iván Briano	c5edb90046	anv: silence warning ../src/intel/vulkan/genX_init_state.c: In function ‘gfx9_CreateSampler’: ../src/intel/vulkan/genX_init_state.c:1507:40: warning: ‘border_color_offset’ may be used uninitialized [-Wmaybe-uninitialized] 1507 \| sampler_state.BorderColorPointer = border_color_offset; Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41116>	2026-04-22 16:17:35 -07:00
GKraats	3c01e6139a	hasvk: unbreak assert format != ISL_FORMAT_UNSUPPORTED Format is set to ISL_FORMAT_UNSUPPORTED in anv_get_format_plane at src/intel/vulkan_hasvk/anv_formats.c, because Ivy Bridge does not support enough 24- and 48-bit formats. Problem solved by checking the format after the call. Signed-off-by: GKraats <vd.kraats@hccnet.nl> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40237>	2026-04-22 20:35:25 +00:00
Valentine Burley	d982092865	anv/ci: Revert ADL VKCTS job to stable 6.17 kernel Xe is unstable on 6.19; revert to the previous stable kernel. https://gitlab.freedesktop.org/mesa/mesa/-/jobs/97945843 https://gitlab.freedesktop.org/mesa/mesa/-/jobs/97944526 Signed-off-by: Valentine Burley <valentine.burley@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41112>	2026-04-22 19:29:43 +00:00
Caio Oliveira	26ef12f7c1	brw: Use brw prefix to LSC helpers tied to brw Mapping from BRW ops to LSC ops. And the len() helpers that use the REG_SIZE as unit -- which is a BRW convention. Acked-by: Iván Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41006>	2026-04-22 18:25:41 +00:00
Caio Oliveira	9329da6d88	brw: Don't set saturate for SYNC instruction This helper might be used by another instruction emission, which itself might have set the saturate bit in the default state. This might result in the SYNC being created with the saturate bit already set. Since SYNC doesn't have saturate, clear that field instead of sometimes having it set. Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41005>	2026-04-22 16:06:42 +00:00
Lionel Landwerlin	6031d52393	anv: implement VK_EXT_primitive_restart_index Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40776>	2026-04-22 08:52:57 +00:00
Samuel Pitoiset	9d17a7bdb4	spirv,treewide: rework specialization constant With SPV_KHR_constant_data, it's allowed to specialize arrays of constants. RustiCL changes are from Karol Herbst <kherbst@redhat.com>. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41046>	2026-04-22 06:57:55 +00:00
Sagar Ghuge	12f81eaa88	anv: Enable dynamic stack ID control on Xe3+ This patch enables dynamic stack ID control on Xe3+. Programmed values are the recommended settings from the Bspec. Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41066>	2026-04-22 01:48:19 +00:00
Sagar Ghuge	acecc0f1b3	intel/genxml: Update xml for dynamic stack ID control fields Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41066>	2026-04-22 01:48:18 +00:00
Sagar Ghuge	620835926d	brw: Pass write back register for ray query messages DG2 (Bspec 47937) has the same programming note as Xe2+: "When this bit is set in the header, Trace Ray Message behaves like a Ray Query. This message requires a write-back message indicating RayQuery for all valid Rays (SIMD lanes) have completed." So this patch just passes a write-back destination register when we have a ray query message. Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41039>	2026-04-21 23:16:09 +00:00
José Roberto de Souza	64bc538f5e	intel/brw: Explicitly upcast UB to UW for SHR with vector immediates HW does not allow instructions with vector immediates to cross a GRF boundary if it has a stride. Under register pressure, the register allocator may place a temporary register across such a boundary. To resolve this, we now explicitly emit a MOV to upcast the UB payload into a UW VGRF. This ensures the SHR instruction operates on a dense, well-aligned region that satisfies hardware alignment constraints. Below is the portion of the shader exhibiting this issue: Native code for unnamed fragment shader GLSL6 (src_hash 0x9c84a007) (sha1 48745e7dae90d08f8a9bbe4dbf837de23440c841f0344e669cb8af9df79bce58) SIMD32 shader: 44 instructions. 0 loops. 354 cycles. 0:0 spills:fills, 2 sends, scheduled with mode latency-sensitive. Promoted 0 constants. GRF registers: 22. Non-SSA regs (after NIR): 11. Compacted 800 to 800 bytes (0%) mov(1) f1<1>UW g0.30<0,1,0>UW { align1 WE_all 1N }; mov(1) f1.1<1>UW g1.30<0,1,0>UW { align1 WE_all 1N I@1 }; mov(32) g2<2>UW g0.20<2,8,0>UW { align1 WE_all }; mov(32) g4<2>UW g0.21<2,8,0>UW { align1 WE_all }; mov(32) g8<2>UW g1.20<2,8,0>UW { align1 WE_all }; mov(32) g10<2>UW g1.21<2,8,0>UW { align1 WE_all }; mov(16) g12<4>UB g0.60<1,8,0>UB { align1 1H }; mov(16) g13<4>UB g1.60<1,8,0>UB { align1 2H }; add(32) g0<1>UW g2<16,8,2>UW 0x01000100V { align1 WE_all I@6 }; add(32) g1<1>UW g4<16,8,2>UW 0x01010000V { align1 WE_all I@6 }; add(32) g2<1>UW g8<16,8,2>UW 0x01000100V { align1 WE_all I@6 }; add(32) g3<1>UW g10<16,8,2>UW 0x01010000V { align1 WE_all I@6 }; shr(16) g4<1>UW g12<32,8,4>UB 0x76543210V { align1 1H I@6 }; mov(16) g14.32<4>UB g13<32,8,4>UB { align1 2H I@6 }; sync nop(1) null<0,1,0>UB { align1 WE_all 1N I@6 }; mov(16) g5<1>UW g0<16,8,2>UW { align1 1H }; sync nop(1) null<0,1,0>UB { align1 WE_all 1N I@6 }; mov(16) g0<1>UW g1<16,8,2>UW { align1 1H }; sync nop(1) null<0,1,0>UB { align1 WE_all 5N I@6 }; mov(16) g5.16<1>UW g2<16,8,2>UW { align1 2H }; 
sync nop(1) null<0,1,0>UB { align1 WE_all 5N I@6 }; mov(16) g0.16<1>UW g3<16,8,2>UW { align1 2H }; shr(16) g4.16<1>UW g14.32<32,8,4>UB 0x76543210V { align1 2H I@5 }; ERROR: Invalid register region for source 0. See special restrictions section. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40856>	2026-04-21 22:51:45 +00:00
Eric R. Smith	4ae192a3d9	glsl, spirv: Improve accuracy of asin() and acos() The polynomial used for asin_expr() was suboptimal (and its source was not documented). A better approximation is found in the _Handbook_of_Mathematical_Functions_ by Abramowitz and Stegun, which is used in Nvidia's Cg toolkit. However, while this approximation gives a good absolute error bound, its relative error exceeds the 4096 ulp allowed by the Vulkan spec. Taking a page from the spirv implementation of asin(), we implement a piecewise approximation where a Taylor series is used for small values of \|x\|. This patch also harmonizes the GLSL and Vulkan implementations by moving the implementation to common code (nir_builder). Running tests on asin() with a grid of 64000 samples between 0.0 and +1.0, the original asin() at 32 bits has: ``` glsl spirv RMSE: 1.756451e-04 1.609091e-04 worst abs error: 3.904104e-04 at 0.937001 3.904104e-04 at 0.937001 worst ulp error: 11800 at 6.2499e-05 3826 at 0.841331 ``` whereas the new implementation has for both: ``` RMSE: 2.528056e-05 worst abs error: 4.962087e-05 at 0.451149 worst ulp error: 2379 at 0.215106 ``` Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Acked-by: Mel Henning <mhenning@darkrefraction.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40862>	2026-04-21 21:10:22 +00:00
Jordan Justen	fa784fffd0	brw: Don't set header_size at init since it will be re-set in later code Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Ref: `efcba73b49` ("brw: switch to new sampler payload description scheme") Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41035>	2026-04-21 19:23:41 +00:00
José Roberto de Souza	26525ac7ae	anv: Move code to load color border to memory to a function Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41035>	2026-04-21 19:23:41 +00:00
José Roberto de Souza	83d75a0384	anv: Move init and finish of state pools to its own functions Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41035>	2026-04-21 19:23:41 +00:00
José Roberto de Souza	a4c22baeb4	anv: Move VMA heaps init and finish of vma heaps to anv_va.c Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41035>	2026-04-21 19:23:40 +00:00
José Roberto de Souza	32f3d6486c	anv: Change fill_inline_params() first parameter from struct GENX(COMPUTE_WALKER_BODY) to uint32_t * This will make this function more generic allowing us to use it for COMPUTE_WALKER_2. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41035>	2026-04-21 19:23:40 +00:00
Lionel Landwerlin	b0c17357db	intel/ci: update expectation for RPL This fails everywhere, but CI only runs this test on RPL. A CTS fix has been merged in main. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39451>	2026-04-21 16:29:14 +00:00
Lionel Landwerlin	eda83bc2b6	anv: add a pass to realign global loads on DX CBV resources CBV resources are supposed to be 256B aligned (D3D12_CONSTANT_BUFFER_DATA_PLACEMENT_ALIGNMENT). vkd3d-proton puts CBV addresses in the push constant data and does global loads on them. Unfortunately those loads don't have a 256B alignment value on them. So when looking at what we can promote to HW push buffers, we can't consider them. This change introduces a detection pass for CBV resources (according to vkd3d-proton devs those are 64KiB in size) and realigns the loads to be 256B aligned. This is only enabled on DX emulation. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Felix DeGrood <felix.j.degrood@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39451>	2026-04-21 16:29:14 +00:00
Lionel Landwerlin	bba428ce3f	anv: promote push constant pointers to push buffers Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Felix DeGrood <felix.j.degrood@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39451>	2026-04-21 16:29:14 +00:00
Lionel Landwerlin	0539f26065	brw: track push constants shader stats Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Felix DeGrood <felix.j.degrood@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39451>	2026-04-21 16:29:14 +00:00
Sagar Ghuge	7a627fa8f3	anv: Fix Wa_14021821874, Wa_14018813551, Wa_14026600921 StackSizePerRay is the RTDispatchGlobals::AsyncStackSize and DisableRTGlobalsKnownValues is to interpret how many Max BVH levels we need to use. It's not relevant to Vulkan, since we have just 2 fixed BVH levels. Fixes: `cb423ee6` ("anv: Fix Wa_14021821874, Wa_14018813551, Wa_14026600921") Fixes: `c1a44e8d` ("anv: force StackIDControl value for Wa_14021821874") Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41012>	2026-04-21 01:38:34 +00:00
Alyssa Rosenzweig	fd46a48ccc	jay/ra: only use stride=4 temps SIMD16: Totals from 56 (2.12% of 2647) affected shaders: Instrs: 541831 -> 542004 (+0.03%); split: -0.40%, +0.44% CodeSize: 8597680 -> 8597248 (-0.01%); split: -0.45%, +0.44% SIMD32: Totals: Instrs: 4858179 -> 4734713 (-2.54%); split: -2.78%, +0.24% CodeSize: 78651424 -> 76667440 (-2.52%); split: -2.76%, +0.24% Totals from 1108 (41.86% of 2647) affected shaders: Instrs: 4241312 -> 4117846 (-2.91%); split: -3.18%, +0.27% CodeSize: 68753152 -> 66769168 (-2.89%); split: -3.16%, +0.27% Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41064>	2026-04-20 22:32:12 +00:00
Alyssa Rosenzweig	1f62da938b	jay/ra: drop memory copy reordering No shader-db changes, and no longer required for correctness. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41064>	2026-04-20 22:32:12 +00:00
Alyssa Rosenzweig	45845ea7f2	jay/ra: use accumulator for stride=4 swaps SIMD16: Totals: Instrs: 2767930 -> 2767190 (-0.03%) CodeSize: 44327408 -> 44312304 (-0.03%); split: -0.04%, +0.00% Totals from 142 (5.36% of 2647) affected shaders: Instrs: 658928 -> 658188 (-0.11%) CodeSize: 10514512 -> 10499408 (-0.14%); split: -0.16%, +0.01% SIMD32: Totals: Instrs: 4884039 -> 4858179 (-0.53%) CodeSize: 79079008 -> 78651424 (-0.54%); split: -0.54%, +0.00% Totals from 761 (28.75% of 2647) affected shaders: Instrs: 3803274 -> 3777414 (-0.68%) CodeSize: 61707728 -> 61280144 (-0.69%); split: -0.70%, +0.00% Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41064>	2026-04-20 22:32:12 +00:00
Alyssa Rosenzweig	489f883277	jay/ra: use accumulator for memory swaps SIMD16: Totals from 34 (1.28% of 2647) affected shaders: Instrs: 427731 -> 434349 (+1.55%); split: -0.03%, +1.58% CodeSize: 6773248 -> 6881136 (+1.59%); split: -0.04%, +1.63% Number of spill instructions: 1833 -> 1700 (-7.26%) Number of fill instructions: 2095 -> 1944 (-7.21%) SIMD32: Totals from 621 (23.46% of 2647) affected shaders: Instrs: 3663406 -> 3739089 (+2.07%); split: -0.62%, +2.68% CodeSize: 59392464 -> 60624704 (+2.07%); split: -0.61%, +2.68% Number of spill instructions: 52115 -> 50109 (-3.85%); split: -3.90%, +0.05% Number of fill instructions: 53864 -> 51355 (-4.66%) Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41064>	2026-04-20 22:32:11 +00:00
Alyssa Rosenzweig	2e5fd6da42	jay/ra: use accumulator for memory copies SIMD16: Totals from 34 (1.28% of 2647) affected shaders: Instrs: 424527 -> 427731 (+0.75%); split: -0.03%, +0.78% CodeSize: 6720896 -> 6773248 (+0.78%); split: -0.04%, +0.82% Number of spill instructions: 1967 -> 1833 (-6.81%) Number of fill instructions: 2247 -> 2095 (-6.76%) SIMD32: Totals: Instrs: 4691989 -> 4808356 (+2.48%); split: -0.46%, +2.94% CodeSize: 76011248 -> 77884320 (+2.46%); split: -0.46%, +2.92% Number of spill instructions: 54223 -> 52115 (-3.89%); split: -4.08%, +0.19% Number of fill instructions: 56519 -> 53864 (-4.70%) Totals from 606 (22.89% of 2647) affected shaders: Instrs: 3509511 -> 3625878 (+3.32%); split: -0.61%, +3.93% CodeSize: 56909488 -> 58782560 (+3.29%); split: -0.61%, +3.90% Number of spill instructions: 54223 -> 52115 (-3.89%); split: -4.08%, +0.19% Number of fill instructions: 56519 -> 53864 (-4.70%) Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41064>	2026-04-20 22:32:11 +00:00
Alyssa Rosenzweig	7d2a88a9e5	jay/ra: don't reserve registers when not spilling No changes at SIMD16. At SIMD32: Totals: Instrs: 4691895 -> 4691989 (+0.00%); split: -0.03%, +0.03% CodeSize: 76010880 -> 76011248 (+0.00%); split: -0.03%, +0.03% Number of spill instructions: 54369 -> 54223 (-0.27%) Number of fill instructions: 56668 -> 56519 (-0.26%) Totals from 71 (2.68% of 2647) affected shaders: Instrs: 75963 -> 76057 (+0.12%); split: -1.67%, +1.79% CodeSize: 1229792 -> 1230160 (+0.03%); split: -1.71%, +1.74% Number of spill instructions: 146 -> 0 (-inf%) Number of fill instructions: 149 -> 0 (-inf%) Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41064>	2026-04-20 22:32:11 +00:00
Alyssa Rosenzweig	e5bf153d4f	jay/lower_post_ra: drop old 2<-->8 lowering this XOR based lowering is no longer needed. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41064>	2026-04-20 22:32:10 +00:00
Alyssa Rosenzweig	915af8e121	jay/lower_post_ra: remove SWAP macro Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41064>	2026-04-20 22:32:10 +00:00
Alyssa Rosenzweig	4c5ad7a832	jay/register_allocate: start using accumulators this lets us lower away 8<-->2 copies/swaps in a faster, more straightforward way by (ab)using accumulators. I think as an edge case this plays nicely enough with my plans to profit from accs for normal fma-heavy code. SIMD16: Totals: Instrs: 2761525 -> 2758108 (-0.12%) CodeSize: 44222384 -> 44167168 (-0.12%) Totals from 33 (1.25% of 2647) affected shaders: Instrs: 422130 -> 418713 (-0.81%) CodeSize: 6713680 -> 6658464 (-0.82%) SIMD32: Totals: Instrs: 4911601 -> 4691895 (-4.47%) CodeSize: 79553984 -> 76010880 (-4.45%) Totals from 947 (35.78% of 2647) affected shaders: Instrs: 4143501 -> 3923795 (-5.30%) CodeSize: 67174592 -> 63631488 (-5.27%) Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41064>	2026-04-20 22:32:10 +00:00
Alyssa Rosenzweig	53c1c076a8	jay: validate non-SSA accumulators just enough for us to do parallel copy lowering with them. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41064>	2026-04-20 22:32:09 +00:00
Alyssa Rosenzweig	28cf0f52c1	jay/to_binary: handle packing accumulators Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41064>	2026-04-20 22:32:09 +00:00
Alyssa Rosenzweig	aa37d8b248	jay/print: deal with bare r0 copies Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41064>	2026-04-20 22:32:09 +00:00
Kenneth Graunke	e55af8793f	jay: Add missing ROR case Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41064>	2026-04-20 22:32:09 +00:00
Alyssa Rosenzweig	6c862b1951	jay: fix SEL types SEL.f32 flushes denorms but SEL.u32 does not. That means changing the type of the SEL is only justified if we know we're used as a float. This fixes miscompilation in cases like: ieq(1, bcsel(a, fneg(b), c)) Previously we'd be too greedy and form (a) SEL.f32 t, -b, c cmp.u32 t, 1 But that would inadvertently flush c which is an integer here. So just set the type based on what we're used as. Some regressions due to is_only_used_as_float not seeing through phis (..could probably be fixed?). Totals: Instrs: 2760796 -> 2761525 (+0.03%); split: -0.06%, +0.08% CodeSize: 44244128 -> 44222384 (-0.05%); split: -0.13%, +0.08% Totals from 945 (35.70% of 2647) affected shaders: Instrs: 1968645 -> 1969374 (+0.04%); split: -0.08%, +0.11% CodeSize: 31721968 -> 31700224 (-0.07%); split: -0.17%, +0.11% Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41064>	2026-04-20 22:32:09 +00:00
Alyssa Rosenzweig	b5898a418b	jay: relax mov type check prevents regression with next patch which turns u32 into s32. Totals: Instrs: 2764288 -> 2760796 (-0.13%) CodeSize: 44299920 -> 44244128 (-0.13%); split: -0.13%, +0.00% Totals from 193 (7.29% of 2647) affected shaders: Instrs: 255455 -> 251963 (-1.37%) CodeSize: 4160400 -> 4104608 (-1.34%); split: -1.34%, +0.00% Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41064>	2026-04-20 22:32:07 +00:00

15916 commits