fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-19 17:58:09 +02:00

Author	SHA1	Message	Date
Lionel Landwerlin	2418525b2e	anv: avoid 64bit atomics emulation on Xe2+ Xe2+ still requires lowering 64bit image load/store to 2x32bit for the message format. But atomics work without lowering. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34876>	2025-05-23 10:04:29 +00:00
Paulo Zanoni	d77b49eb0a	anv/trtt: don't avoid the TR-TT submission when there is stuff to signal When an application issues a sparse binding operation, it may be the case that the state the app is setting is the state that is already there. In that case, both n_l3l2_binds and n_l1_binds are zero, so the batch doesn't contain anything and, since `0802bbd486`, we just skip the batch submission and return. The problem is that skipping the batch submission and returning ignores the synchronization: there may be syncobjs that we have to wait and, more importantly, there may be syncobjs that we have to signal. This case is exercised by vkd3d-proton's test suite, but I'm not aware of any other workload that triggers it. This commit only affects Meteor Lake and older, as TR-TT is only the default behavior for the platforms running i915.ko. Testcase: vkd3d-proton/d3d12/test_sparse_buffer_memory_lifetime Fixes: `0802bbd486` ("anv/trtt: don't submit empty batches when there are no binds to do") Reviewed-by: Iván Briano <ivan.briano@intel.com> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35078>	2025-05-23 00:17:18 +00:00
Sushma Venkatesh Reddy	6d226ceca1	intel/compiler: Call brw_try_override_assembly independent of debug flag Previously, brw_try_override_assembly was only called when a debug flag was enabled. However, during investigations involving workloads such as Steam games, enabling the debug flag results in excessive NIR and ISA output to stderr, making debugging more difficult. This change ensures that brw_try_override_assembly is called when the INTEL_SHADER_ASM_READ_PATH is set, regardless of the debug flag. This improves usability in scenarios where minimal debug output is desired. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35115>	2025-05-22 21:45:38 +00:00
Dylan Baker	51b51eb676	anv: Add comment why we overmap and then unmap a region Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35114>	2025-05-22 21:14:26 +00:00
Dylan Baker	25dd3923dc	anv: attempt to make coverity happy Coverity is upset that we're using `ptr` after we've `munmap`ed up to the offset of the region, even though we're just moving past the unmapped region to the still mapped region. Attempt to make it happy by doing that calculation before unmapping. If it's still mad there's nothing left we can do. CID: 1646981 Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> CID: 1646956 Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35114>	2025-05-22 21:14:26 +00:00
Dylan Baker	ff5cb90880	anv: avoid potential integer overflow Coverity points out that we're using a 32bit type on the left side here, so the entire operation is done as 32 bit instead of 64 CID: 1646960 Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35114>	2025-05-22 21:14:26 +00:00
Dylan Baker	2a3cf70db8	blorp: cast uint32_t -> int64_t to avoid potential overflow In practice, I don't think it's actually going to overflow, but it could in theory, which coverity is pointing out. CID: 1647010 Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35114>	2025-05-22 21:14:26 +00:00
Lionel Landwerlin	b036d2ded2	hasvk/elk: stop turning load_push_constants into load_uniform Those intrinsics have different semantics in particular with regards to divergence. Turning one into the other without invalidating the divergence information breaks NIR validation. But also the conversion means we get artificially less convergent values in the shaders. So just handle load_push_constants in the backend and stop changing things in Hasvk. Fixes a bunch of tests in dEQP-VK.descriptor_indexing.* dEQP-VK.pipeline..push_constant.graphics_pipeline.dynamic_index_ Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: mesa-stable Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34546>	2025-05-22 07:49:20 +00:00
Lionel Landwerlin	df15968813	anv/brw: stop turning load_push_constants into load_uniform Those intrinsics have different semantics in particular with regards to divergence. Turning one into the other without invalidating the divergence information breaks NIR validation. But also the conversion means we get artificially less convergent values in the shaders. So just handle load_push_constants in the backend and stop changing things in Anv. Fixes a bunch of tests in dEQP-VK.descriptor_indexing.* dEQP-VK.pipeline..push_constant.graphics_pipeline.dynamic_index_ Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: mesa-stable Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34546>	2025-05-22 07:49:20 +00:00
Lionel Landwerlin	b204153106	anv: add a comment about Wa_14016820455 Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35101>	2025-05-22 07:31:49 +00:00
Sushma Venkatesh Reddy	524733a990	intel/compiler: Centralize type stomping logic for Gen12.5 restrictions This patch improves code readability by centralizing the type stomping logic for Gen12.5 region restrictions in `brw_lower_alu_restrictions`. It removes redundant comments and ensures type consistency assertions in `brw_broadcast`, `generate_mov_indirect`, and `generate_shuffle`. Thank you Ken for guiding me on this. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35006>	2025-05-22 06:46:18 +00:00
Valentine Burley	dc483ea924	ci: Remove firmware from test-base Firmware packages continue to grow in size, so stop installing them in the test-base image. The necessary firmware is now collected and uploaded per vendor in an external repository. LAVA devices can opt into optional firmware by specifying the name of the archive via LAVA_FIRMWARE. For bare-metal, Qualcomm firmware required for DUTs in the Google lab is included in the baremetal image. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13051 Signed-off-by: Valentine Burley <valentine.burley@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34861>	2025-05-21 08:48:15 +00:00
Iván Briano	815bcda06d	anv: enable VK_KHR_fragment_shader_barycentric Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34445>	2025-05-20 20:57:59 +00:00
Iván Briano	6268792a29	anv: set HW state for fragment shader barycentric When the FS requires it, set the VertexAttributeBypass and LegacyBaryAssignmentDisable bits accordingly. Also program the provoking vertex to give us the per-vertex attributes in the order the Vulkan specification dictates, and track its dynamic value for the FS to pick up constant interpolated inputs correctly. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34445>	2025-05-20 20:57:59 +00:00
Iván Briano	27a2f6d1ff	brw: add lowering passes for FS barycentric inputs Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34445>	2025-05-20 20:57:59 +00:00
Iván Briano	8ee14e5291	brw/anv: add provoking vertex to fs_msaa_flags This will be necessary to select the right value for flat inputs in fragment shaders when fragment shader barycentrics are in use. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34445>	2025-05-20 20:57:58 +00:00
Iván Briano	acdd30a9da	brw: check if the FS needs vertex_attributes_bypass to be set Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34445>	2025-05-20 20:57:58 +00:00
Iván Briano	c327b83706	brw: implement load_input_vertex intrinsic Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34445>	2025-05-20 20:57:58 +00:00
Iván Briano	4c1f9554f5	intel/genxml: update some instructions for Xe2+ 3DSTATE_CLIP and 3DSTATE_SF add: - Triangle Strip Odd Provoking Vertex Select 3DSTATE_RASTER: - Legacy Bary Assignment Disable 3DSTATE_SBE: - Vertex Attributes Bypass Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34445>	2025-05-20 20:57:58 +00:00
Aditya Swarup	6528d267ef	anv: Disable fast clear when surface width is 16k HSD 16023071695 description mentions we need to extend WA_16021232440 to cover the case when surface width is 16k. BSpec: 57340 Signed-off-by: Aditya Swarup <aditya.swarup@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34838>	2025-05-20 10:44:52 -07:00
Tapani Pälli	5828612da2	anv: use internal rt-null-ahs when any_hit is null Tested on BMG and PTL using both settings for RT_CTRL. Cc: mesa-stable Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35044>	2025-05-20 10:58:53 +00:00
Tapani Pälli	0f591425c9	intel/compiler: provide a helper for null any-hit shader Xe driver will be disabling the HW functionality for null any-hit shaders, drivers need to take care of it instead. This commit brings back parts of older workaround (see `b0624e414f`) we used to have to handle the null any-hit case. Cc: mesa-stable Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35044>	2025-05-20 10:58:53 +00:00
Felix DeGrood	e75c0160df	intel/tools: add intel_measure.py We are moving away from the INTEL_MEASURE tool, replacing it with utrace. Utrace is better maintained and provides similar debug data. The eventual plan is to EOL INTEL_MEASURE from the driver. This python script reinterprets the dumped utrace data into the traditional INTEL_MEASURE csv file format. Usage: MESA_GPU_TRACES=print_csv MESA_GPU_TRACEFILE=/tmp/ut.csv INTEL_DEBUG=stall <cmd> intel_measure.py /tmp/ut.csv > im.csv Reviewed-by: Casey Bowman <casey.g.bowman@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34662>	2025-05-19 17:27:30 +00:00
Lionel Landwerlin	c570740272	anv: enable preemption setting on command/batch correctly The 2 helpers we're using for doing internal operations (copies, command generation, etc...) can work on command buffers or lower level batches. When working with command buffers, the helpers should set the preemption using genX(cmd_buffer_set_preemption) so that whatever operation comes after toggles the state back to what it needs and we minimize the toggles. When working with batchs, the helpers should disable preemption using genX(batch_set_preemption) and turn it back on when done. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: mesa-stable Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35030>	2025-05-19 06:56:04 +00:00
Jianxun Zhang	2212865ce0	anv: Use different PAT entries for compressed resources Displayable compressed resources have a different PAT entry from the non-displayable compressed. Signed-off-by: Jianxun Zhang <jianxun.zhang@intel.com> Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29928>	2025-05-16 16:03:54 -07:00
Jianxun Zhang	ca092db7ce	intel/dev: Differentiate displayable PAT entry of compression (xe2) We need two PAT entries with compression for displayable and non-displayable compressed images. The current 'compressed' entry is renamed to 'scanout_compressed' for the displayable. Signed-off-by: Jianxun Zhang <jianxun.zhang@intel.com> Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29928>	2025-05-16 16:03:54 -07:00
Kenneth Graunke	20222cd956	anv: Use the new nir_opt_acquire_release_barriers pass Improves performance of Phasmophobia with the "Eye Adaptation" video setting enabled on Arc B570 by about 9.5%. fossil-db results on Battlemage: Totals: Instrs: 148797922 -> 148797865 (-0.00%) Send messages: 7066341 -> 7066317 (-0.00%) Cycle count: 21459978352 -> 21459975048 (-0.00%) Totals from 8 (0.00% of 574410) affected shaders: Instrs: 4633 -> 4576 (-1.23%) Send messages: 479 -> 455 (-5.01%) Cycle count: 611886 -> 608582 (-0.54%) Observed to cut 15% of sends in a Phasmophobia shader, 8.3% in a Far Cry New Dawn shader, 7% in a Borderlands 3 DX11 shader, and 3.4-3.7% of sends in a few Witcher 3 and Dark Souls 3 shaders. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33504>	2025-05-16 00:29:13 +00:00
José Roberto de Souza	cb6f96a1e8	anv: Remove a '#if GFX_VER >= 30' block inside of a else of '#if GFX_VERx10 >= 125' Removing deadcode. Reviewed-by: Lucas Fryzek <lfryzek@igalia.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34988>	2025-05-15 15:25:12 +00:00
José Roberto de Souza	37b42ef648	anv: Drop '#if GFX_VERx10 >= 125' inside of '#if GFX_VERx10 >= 125' This is just redundant. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34988>	2025-05-15 15:25:12 +00:00
José Roberto de Souza	3cd972a2d3	anv: Enable preemption due 3DPRIMITIVE in GFX 12 The issues preventing it to be enabled were fixed so now we can enable it but we need also to enable workaround 16013994831 back again. Cc: mesa-stable Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34988>	2025-05-15 15:25:12 +00:00
José Roberto de Souza	2432d6677e	anv: Implement missing part of Wa_1604061319 Description of this workaround are not clear but looking at Iris implementation we need to emit all 3DSTATE_PUSH_CONSTANT_ALLOC_XS if any 3DSTATE_PUSH_CONSTANT_ALLOC_XS is emitted. Cc: mesa-stable Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34988>	2025-05-15 15:25:12 +00:00
Hyunjun Ko	7ddf51dc99	anv: Fix to set CDEF filter flag correctly. This fixes to play av1_intel_broken2.ivf. Signed-off-by: Hyunjun Ko <zzoon@igalia.com> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34866>	2025-05-15 01:02:05 +00:00
Hyunjun Ko	2e256a3cee	anv: Allocate MV buffers enough for AV1 decoding. As other video memories for AV1 are already allocated for the maximum sizes, now it does the same for MV buffers too. This fixes a bunch of artifacts of AV1 playing. Signed-off-by: Hyunjun Ko <zzoon@igalia.com> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34866>	2025-05-15 01:02:05 +00:00
Hyunjun Ko	f4d480f808	anv: Always allocate cdf tables when independent profiles provided Signed-off-by: Hyunjun Ko <zzoon@igalia.com> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34866>	2025-05-15 01:02:05 +00:00
Nanley Chery	4502254cd2	anv: Drop the slow clear heuristic This no longer provides a performance improvement. Reviewed-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33776>	2025-05-13 15:13:05 +00:00
Nanley Chery	67d60f4325	intel/blorp: Simplify get_fast_clear_rect() for gfx12.5 Refactor the scale factors to highlight the 16-tile width requirement on Tile4. The fast-clear simulator code associated with HSD 1407682962 also contains a 16-tile requirement for Tile4 surfaces (for the pitch). Reviewed-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33776>	2025-05-13 15:13:05 +00:00
Nanley Chery	312952048b	intel/blorp: Redescribe gfx12.5 surfaces for CCS fast clears According to HSD 1407682962 and the associated simulator code, fast-clear performance can be affected by: image alignment, tiling, dimensionality, and row pitch. Redescribe surfaces in order to avoid fast-clearing at a slower rate. Also, benchmarking the main patch in the performance CI (hw=A750) shows that some traces are helped significantly: * TotalWarWarhammer3 +5.58% (n=2) * Factorio +3.75% (n=1) * TerminatorResistance +3.3% (n=2) * Borderlands3 +3.23% (n=2) We could additionally increase the alignment requirements of surfaces in order to deterministically increase fast-clear performance. That's left out of this patch in order to avoid any functional pitfalls that can arise with increased memory consumption. As a result, performance will continue to be affected by how ISL/drivers/apps configure main surface memory alignments (directly or indirectly). Thanks to Lionel Landwerlin for pointing me to the relevant simulator code. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11168 Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11418 Reviewed-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33776>	2025-05-13 15:13:05 +00:00
Nanley Chery	169e22f962	intel/blorp: Drop clear color assignment prior to Xe2 This hasn't been used since the responsibility of clear color updates moved to the drivers. Reviewed-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33776>	2025-05-13 15:13:05 +00:00
Nanley Chery	e353244553	intel/blorp: Disable repclear for gfx12 fast-clear Docs indicate that this shouldn't be used. Reviewed-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33776>	2025-05-13 15:13:05 +00:00
Nanley Chery	8dad01903a	intel: Add and use isl_surf_image_has_unique_tiles() Returns whether or not a subresource range maps to a tile-aligned memory range which doesn't overlap other subresources. Reviewed-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33776>	2025-05-13 15:13:04 +00:00
Nanley Chery	fcdae4d4c0	intel: Add and use isl_surf_from_mem() Unify code which creates surfaces from buffers. The behavior is slightly changed to use array layers to enable arrayed buffer clears (as needed). Reviewed-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33776>	2025-05-13 15:13:04 +00:00
Mauro Rossi	04a643d877	intel/compiler: use ffsll instead of ffsl in brw_vue_map.c `18bbcf9a` triggered the following building error in Android, simple fix is to use ffsll() as it was done before `18bbcf9a` to process uint64_t generics argument. Fixes the following building error: FAILED: src/intel/compiler/libintel_compiler.a.p/brw_vue_map.c.o ... ../src/intel/compiler/brw_vue_map.c:120:37: error: implicit declaration of function 'ffsl' is invalid in C99 [-Werror,-Wimplicit-function-declaration] const int first_generic_output = ffsl(generics) - 1; ^ ../src/intel/compiler/brw_vue_map.c:120:37: note: did you mean 'ffs'? /home/utente/r-x86_kernel/bionic/libc/include/strings.h:72:5: note: 'ffs' declared here int ffs(int __i) __INTRODUCED_IN_X86(18); ^ 1 error generated. Fixes: `18bbcf9a` ("intel: introduce new VUE layout for separate compiled shader with mesh") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34915>	2025-05-11 00:50:21 +02:00
Ian Romanick	338273dedd	brw/reg_allocate: Optimize spill offset calculation using integer MAD Gfx12.5 and later allow the use of two 16-bit immediate values in integer MAD. Gfx11 and Gfx12 allow a single immediate for integer MAD, but that is not helpful here. v2: brw_reg_alloc::build_lane_offsets is only used on Gfx12.5+, so the check around using integer MAD is unnecessary. No shader-db or fossil-db changes on any pre-Gfx12.5 platforms. shader-db: Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown) total instructions in shared programs: 17119962 -> 17118441 (<.01%) instructions in affected programs: 65398 -> 63877 (-2.33%) helped: 32 / HURT: 0 total cycles in shared programs: 895433316 -> 895425578 (<.01%) cycles in affected programs: 13437376 -> 13429638 (-0.06%) helped: 30 / HURT: 2 fossil-db: Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown) Totals: Instrs: 210052706 -> 209550074 (-0.24%) Cycle count: 31486266412 -> 31436238696 (-0.16%); split: -0.16%, +0.00% Totals from 7081 (1.00% of 707082) affected shaders: Instrs: 16864614 -> 16361982 (-2.98%) Cycle count: 6323185782 -> 6273158066 (-0.79%); split: -0.79%, +0.00% Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34886>	2025-05-09 21:31:09 +00:00
Ian Romanick	3db8dbfdc3	brw/reg_allocate: Optimize spill offset calculation using more SIMD8 Re-associate the calculation. The current calculation is ((lane + zero_or_8) << 2) + offset The first addition is SIMD8, and the shift and second addition are SIMD16. By switching to ((lane << 2) + offset) + zero_or_32 All operations are SIMD8. The SHL operates directly on the UW 0x76543210UV value, and that eliminates the MOV to expand the UW to UD. v2: Switch to alternate method. Update for SIMD32 on Xe2. No shader-db or fossil-db changes on any pre-Gfx12.5 platforms. shader-db: Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown) total instructions in shared programs: 17121519 -> 17119962 (<.01%) instructions in affected programs: 73208 -> 71651 (-2.13%) helped: 36 HURT: 0 helped stats (abs) min: 1 max: 129 x̄: 43.25 x̃: 56 helped stats (rel) min: 0.05% max: 4.92% x̄: 2.50% x̃: 2.79% 95% mean confidence interval for instructions value: -56.02 -30.48 95% mean confidence interval for instructions %-change: -3.24% -1.75% Instructions are helped. total cycles in shared programs: 895450146 -> 895433316 (<.01%) cycles in affected programs: 13709400 -> 13692570 (-0.12%) helped: 31 HURT: 2 helped stats (abs) min: 26 max: 1654 x̄: 543.10 x̃: 672 helped stats (rel) min: <.01% max: 3.43% x̄: 0.43% x̃: 0.51% HURT stats (abs) min: 2 max: 4 x̄: 3.00 x̃: 3 HURT stats (rel) min: <.01% max: <.01% x̄: <.01% x̃: <.01% 95% mean confidence interval for cycles value: -652.42 -367.58 95% mean confidence interval for cycles %-change: -0.61% -0.19% Cycles are helped. fossil-db: Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown) Totals: Instrs: 210566294 -> 210052706 (-0.24%) Cycle count: 31582309052 -> 31486266412 (-0.30%); split: -0.30%, +0.00% Totals from 7091 (1.00% of 707082) affected shaders: Instrs: 17408115 -> 16894527 (-2.95%) Cycle count: 6443785290 -> 6347742650 (-1.49%); split: -1.49%, +0.00% Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34886>	2025-05-09 21:31:09 +00:00
Iván Briano	99405647a4	anv: vkCmdTraceRays* are not covered by conditional rendering The spec says: Certain rendering commands can be executed conditionally based on a value in buffer memory. These rendering commands are limited to drawing commands, dispatching commands, and clearing attachments with vkCmdClearAttachments within a conditional rendering block which is defined by commands vkCmdBeginConditionalRenderingEXT and vkCmdEndConditionalRenderingEXT. Other rendering commands remain unaffected by conditional rendering. It would seem that vkCmdTraceRays* are not covered by that. Fixes new tests dEQP-VK.conditional_rendering.conditional_ignore.trace_rays* Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34864>	2025-05-08 21:08:06 +00:00
Lionel Landwerlin	5c7c1eceb5	anv/brw: handle pipeline libraries with mesh I always thought there was a massive issue with pipeline libraries & mesh shaders. Indeed recent CTS tests have exposed a number of issues. Some values delivered to the fragment shader are coming from different places depending on whether the preceding shader is Mesh or not. For example PrimitiveID is delivered in the per-primitive block in Mesh pipelines whereas for other pipelines it's coming as a VUE slot (which is per-vertex). Those are 2 different locations in the payload. We have to find a layout for fragment shaders that is compatible with everything. Leaving gaps here and there in the thread payload. Fixes the following test pattern : dEQP-VK.mesh_shader.ext.smoke.fast_lib.shared_* Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ivan Briano <ivan.briano@intel.com> Acked-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34109>	2025-05-08 06:48:35 +00:00
Lionel Landwerlin	18bbcf9a63	intel: introduce new VUE layout for separate compiled shader with mesh Mesh shaders have per vertex block in URB pretty much identical to the VUE format. Let's just reuse that concept to do all of our layout in the payload attribute registers. This will ensure that we have consistent VUE layout between Mesh & non-Mesh pipelines. We need a new way of laying out the VUE though as we have to accommodate a HW constraint of maximum (per-primitive + per-vertex) of 32 varying. This means we cannot have 2 locations in the payload for things like PrimitiveID which can come from either the per-primitive or the per-vertex block. The new layout places the PrimitiveID at the end of the per-vertex attributes and shrinks the delivery dynamically if the mesh stage is active. The shader is compiled with a MOV_INDIRECT to read the PrimitiveID from the right location in the attributes. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34109>	2025-05-08 06:48:35 +00:00
Lionel Landwerlin	2d396f6085	intel: prepare VUE layout for more than 2 layouts Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34109>	2025-05-08 06:48:35 +00:00
Lionel Landwerlin	95efdca00b	brw: add documentation pointers to FS attribute layout Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34109>	2025-05-08 06:48:35 +00:00
Lionel Landwerlin	9d342081e7	brw/nir: add intrinsics to read attribute payload register indirectly Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34109>	2025-05-08 06:48:35 +00:00
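
Note on ff5cb90880 ("anv: avoid potential integer overflow"): the message says that a 32-bit type on the left side makes the whole operation 32-bit instead of 64-bit. A minimal sketch of that class of bug, not the actual anv code. The helper names and the count/stride operands are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: both operands are 32-bit, so the multiply is done
 * in 32 bits and wraps before the result is widened to 64 bits. */
static uint64_t size_narrow(uint32_t count, uint32_t stride)
{
   return count * stride;
}

/* Hypothetical helper: casting the left operand first makes the whole
 * multiply 64-bit, so the full product is kept. */
static uint64_t size_wide(uint32_t count, uint32_t stride)
{
   return (uint64_t)count * stride;
}
```

With count = stride = 0x10000, size_narrow() wraps to 0 while size_wide() returns 0x100000000.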

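Note on 3db8dbfdc3 ("brw/reg_allocate: Optimize spill offset calculation using more SIMD8"): the re-association in the message relies on the two expressions producing the same offsets when zero_or_32 is zero_or_8 shifted left by 2. A scalar sketch of that equivalence, assuming one lane at a time. It does not model the SIMD8/SIMD16 execution widths or the register allocator itself, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Form before the change: ((lane + zero_or_8) << 2) + offset */
static uint32_t spill_offset_old(uint32_t lane, uint32_t zero_or_8,
                                 uint32_t offset)
{
   return ((lane + zero_or_8) << 2) + offset;
}

/* Re-associated form: ((lane << 2) + offset) + zero_or_32 */
static uint32_t spill_offset_new(uint32_t lane, uint32_t zero_or_32,
                                 uint32_t offset)
{
   return ((lane << 2) + offset) + zero_or_32;
}

/* Returns 1 when both forms agree for lanes 0..7 in both halves
 * (zero_or_8 in {0, 8}, zero_or_32 in {0, 32}), else 0. */
static int spill_offsets_agree(uint32_t offset)
{
   for (uint32_t half = 0; half < 2; half++) {
      for (uint32_t lane = 0; lane < 8; lane++) {
         if (spill_offset_old(lane, half * 8, offset) !=
             spill_offset_new(lane, half * 32, offset))
            return 0;
      }
   }
   return 1;
}
```

For example, lane 3 in the upper half with offset 64 gives 108 in both forms.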

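Note on 04a643d877 ("intel/compiler: use ffsll instead of ffsl in brw_vue_map.c"): the message says ffsll() is used to process the uint64_t generics argument. The sketch below illustrates why a find-first-set that only sees 32 bits would miss high bits in such a mask. It uses the GCC/Clang builtins __builtin_ffsll and __builtin_ffs rather than the libc functions, and the truncation to 32 bits is an assumption standing in for a platform where long is 32-bit. The reported build failure itself was an implicit declaration of ffsl, not a truncation.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the lowest set bit of a 64-bit mask, or -1 if the mask is 0. */
static int first_bit_64(uint64_t mask)
{
   return __builtin_ffsll((long long)mask) - 1;
}

/* Same query, but only the low 32 bits are examined (illustrative only). */
static int first_bit_low32(uint64_t mask)
{
   return __builtin_ffs((int)(uint32_t)mask) - 1;
}
```

For a mask with only bit 40 set, first_bit_64() returns 40 and first_bit_low32() returns -1.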
14047 commits