fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-19 20:08:06 +02:00

Author	SHA1	Message	Date
Nanley Chery	f616d4fb2a	anv: Treat non-WSI PRESENT_SRC as TRANSFER_SRC For non-WSI images, explicitly map VK_IMAGE_LAYOUT_PRESENT_SRC_KHR to VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL in anv_layout_to_aux_state(). Before this patch, the function passed PRESENT_SRC into vk_image_layout_to_usage_flags() and got a return value of 0 from it (that function expects that layout to be explicitly handled by the caller). This caused the logic dependent on the return value to be unreliable. Fixes: `c5cad407f8` ("anv: handle non-wsi images in anv_layout_to_aux_state") Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39618>	2026-02-02 18:40:50 +00:00
Nanley Chery	476f461ce7	anv: Fix clear state of WSI blit sources during presentation On gfx12+, this fixes assert failures in hybrid GPU scenarios. Fixes: `811c413f98` ("anv: Don't return the Xe2+ fast-clear type early") Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39618>	2026-02-02 18:40:49 +00:00
José Roberto de Souza	1f61b1c367	intel/brw: Add BRW_DEPENDENCY_INSTRUCTIONS invalidation when instructions are added or removed in brw_opt_split_virtual_grfs() This fix a brw_ip_ranges shader analysis, were it fails because there is a different number of instructions than expected after brw_opt_split_virtual_grfs() optimization. Reproduced in Piglit test spec@arb_sample_shading@builtin-gl-sample-mask 0: arb_sample_shading-builtin-gl-sample-mask: ../src/intel/compiler/brw_analysis.h:150: T& brw_analysis<T, C>::require() [with T = brw_ip_ranges; C = brw_shader]: Assertion `p->validate(c)' failed. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39629>	2026-02-02 14:46:50 +00:00
Hyunjun Ko	260908cecb	anv: Add dummy workload for AV1 decode on affected platforms (Wa_1508208842) Implement software workaround for AVP decoder corruption on Gen12 platforms. These platforms require a warmup workload before the actual AV1 decode to prevent output corruption. - Gen12: Tiger Lake, DG1, Rocket Lake, Alder Lake Signed-off-by: Hyunjun Ko <zzoon@igalia.com> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39604>	2026-01-30 04:24:05 +00:00
Hyunjun Ko	8e9fec8e40	anv/video: Compute AV1 tile positions internally The pMiColStarts/pMiRowStarts arrays from applications may have incorrect units. Instead of using them directly, compute the tile start positions in superblock units internally based on the tile dimensions. Cc: mesa-stable Signed-off-by: Hyunjun Ko <zzoon@igalia.com> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39471>	2026-01-30 03:28:01 +00:00
Hyunjun Ko	8004f46466	anv/video: fix a typo in Vulkan AV1 decoding. Cc: mesa-stable Fixes: e510efed05d("anv: support in-loop super resolution for AV1 decoding") Signed-off-by: Hyunjun Ko <zzoon@igalia.com> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39471>	2026-01-30 03:28:01 +00:00
Caio Oliveira	db4bc5407f	brw: Print "GRF registers" in INTEL_DEBUG=shaders output Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39601>	2026-01-29 20:16:48 +00:00
Caio Oliveira	0d19fc8256	brw: Fix "GRF registers" stats output Pick the value from the brw_shader instead of from the prog_data, since when there are multiple variants, the prog_data one will have the maximum value. Picking the wrong value also caused compute shaders that had a single variant to report 0 GRFs since the prog_data was being filled after the generate_code() call. Issue spotted by Felix DeGrood. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39601>	2026-01-29 20:16:48 +00:00
Lionel Landwerlin	8661cb12e2	anv: implement VK_KHR_internally_synchronized_queues Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39534>	2026-01-29 16:03:26 +00:00
Lionel Landwerlin	db5319fbf0	anv/xe: move special WaitIdle optimization to submission path Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39534>	2026-01-29 16:03:26 +00:00
Tapani Pälli	85978ccd28	anv: route clear operations on compute to companion This fixes bunch of cts tests hitting issues when attempting anv_image_mcs_op with compute. Fixes: `ab9d3528dc` ("anv: fix queue check in anv_blorp_execute_on_companion on xe3") Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39581>	2026-01-29 14:25:54 +00:00
Michael Cheng	4f82dfc5f5	anv: Implement RT shader group handle capture/replay Signed-off-by: Michael Cheng <michael.cheng@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33022>	2026-01-29 08:46:50 +00:00
Lionel Landwerlin	9e4d9d3f35	anv: fix shader heap replay addr Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Michael Cheng <michael.cheng@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33022>	2026-01-29 08:46:50 +00:00
Kenneth Graunke	bfca9d32d3	brw: Fix geometry shaders with non-constant vertex indices Geometry shaders load from separate handles for each vertex, so they don't incorporate the vertex index in the URB offset like tessellation shaders do. This means we can have a constant offset (within a vertex's section) but not have a constant vertex index. Prior to `41d7debcfe` we were emitting non-folded ALU so we thought the offset was non-constant at this point. Now we can properly detect constant offsets...but still don't want to use push inputs for non-constant vertex indices. Fixes: `41d7debcfe` ("brw: Use nir_imul_imm in per-vertex/per-primitive offset calculation") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39603>	2026-01-29 00:18:20 +00:00
Caio Oliveira	cc06e1ebe2	brw: Remove outdated comment about remove_dead_variables This now also removes dead variables created by split_array_vars, and in the future it is reasonable other optimizations inside the optimization loop to make temp variables dead. Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39596>	2026-01-28 22:26:43 +00:00
Caio Oliveira	d404f5934d	intel/mda: Use -W for color words diff and -U for regular unified diff Also add colors to -Y. Default continue to be the "color words" now called -W. As before, MDA_DIFF_COMMAND environment variable can be used to set a custom diff command. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39595>	2026-01-28 22:11:11 +00:00
Caio Oliveira	05fc275837	intel/mda: Change the matching logic Previously the matching logic was designed to match names like this ``` 99993681767ac...32132a.anv.mda.tar/CS/NIR8/046-ssa ``` So up until the first slash of a pattern, a prefix match would be used, followed by fuzzy matching for the remaining pattern. This don't work well when there are subdirectories in the name, so when we see ``` before/99993681767ac...32132a.anv.mda.tar/CS/NIR8/046-ssa before/91132154353bd...090919.anv.mda.tar/CS/NIR8/046-ssa after/91132154353bd...090919.anv.mda.tar/CS/NIR8/046-ssa ``` the first entry can't be matched by `before/9999/first` since the fuzzy match will kick in for the 9999 and if the second entry has four 9s (which it does here) there would be multiple choices. In practice the flexibility of fuzzy matching is not really needed since we've been using consistent small prefixes (like CS, NIR8, BRW, etc). The exception is the last part (the object versions, i.e. "pass names"), where sometimes is convenient to reach by a substring. The new matching logic is to use prefix match by default, except when matching the "object version", where substring match is used. In the example a possible set of the patterns to identify each entry can be `b/99/ssa`, `b/91/ssa` and `a/91/ssa`. The patch adds a few tests to the `is_match()` to clarify the behavior. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39506>	2026-01-28 21:56:59 +00:00
Caio Oliveira	354dbbe3ae	brw: Use the "early break" loop macros when possible This macro will stop the loop early if there's no chance to make further progress. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39504>	2026-01-28 19:52:02 +00:00
Caio Oliveira	da80122257	brw: Include backend NIR passes in mda files Add a pass tracker struct that can live the whole lifetime of brw_compile() functions, it will keep track of the debug_archiver and also store some metadata that allow us to name the passes. With that, we can also embed the loop tracking in the same struct, so that is free for any loop to use the "early break" optimization. There are other brw_nir_* passes that are called in the pre-processing phase. These are not currently included in the mda yet. Will be handled when we hook debug_archiver or similar to the runtime/driver. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39504>	2026-01-28 19:52:02 +00:00
Caio Oliveira	b91c576ae7	intel/mda: add difflog command Compares versions of two objects one by one. Useful to compare two shader compilations and find the first pass that changed. This could already be done by using something like `diff <(mda log ...) <(mda log ...)` but it is useful enough to become a builtin. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39420>	2026-01-28 18:00:45 +00:00
Lionel Landwerlin	a05fc97bc9	anv/iris: add drirc to enable sampler state & compute surface state prefetch I noticed we disable the prefetch only on Gfx12.5. But surely that recommendation carries on on later platforms. It seems other drivers just disable it all the time and only have an option to force the prefetch. So implementing the same thing here. Blorp path is left untouched. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39424>	2026-01-28 13:13:40 +00:00
Iván Briano	5b48805b42	brw: fix local_invocation_index with quad derivaties on mesh/task shaders For mesh/task shaders, the thread payload provides a local invocation index, but it's always linear so it doesn't give the correct value when quad derivatives are in use. The lowering pass where all of this is done correctly for compute shaders assumes load_local_invocation_index will be lowered in the backend for mesh/task, calculates the values for the quads correctly but then avoid replacing the original intrinsic and we remain with the wrong results. Add an intel specific intrinsic and always lower the generic one to that (or whatever else was calculated) to avoid ambiguities and fix the value for quad derivatives. Fixes future CTS tests using mesh/task shaders under: dEQP-VK.spirv_assembly.instruction.compute.compute_shader_derivatives.* Fixes: `d89bfb1ff7` ("intel/brw: Reorganize lowering of LocalID/Index to handle Mesh/Task") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39276>	2026-01-27 22:28:19 +00:00
Nanley Chery	4512d81559	intel/blorp: Bump pitch when clearing unaligned bottom rows This might be faster if the layer starts at a 64KB offset. No performance benefits found in the performance CI. Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>	2026-01-27 18:46:55 +00:00
Nanley Chery	3e331e4fe9	intel/blorp: Optimize non-zero-layer fast-clears Allow surface redescription when fast-clearing a layer > 0. This affects at least five traces in the performance CI, but the CI doesn't report any performance benefit from this. We already had code to handle unaligned rows at the bottom of an image. Now that this handles the misalignment at the top of the image range, we gain some symmetry. Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>	2026-01-27 18:46:55 +00:00
Nanley Chery	ba63883692	intel/blorp: Avoid unused surface redescription calc Suggested-by: Jianxun Zhang <jianxun.zhang@intel.com> Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>	2026-01-27 18:46:54 +00:00
Nanley Chery	e42b2a5d70	anv: Don't partial resolve LOD1+ for non-FCV CCS We don't allow fast-clears in this case. Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>	2026-01-27 18:46:54 +00:00
Nanley Chery	21d187b7f5	anv: Support fast clears on more layers On Xe2+, support multi-layer and non-zero-layer CCS fast-clears. To do this in a simple manner, drop the code which splits multi-layer clears into fast clears and slow clears. The performance CI reports no regressions nor improvements on BMG. For MCS on all platforms and for CCS on prior platforms, use a new heuristic. Instead of only allowing fast clears on the first slice/layer, do the following: For 3D images, only fast-clear if all slices are cleared. Enables fast-clearing every slice of 3D textures in: * Terminator Resistance - 480x270x128. * Ghostrunner 2 - 320x180x128. For 2D arrays, match the Xe2+ behavior and allow clearing to any layer. This is possible because we only allow fast-clearing if the clear color matches the default value. Enables fast-clearing every layer of 2D array textures in: * Assassin's Creed - 128x128, 6-layers. * Blackops 3 - 1024x1024, 6-layers. * Borderlands 3 - 128x128, 6-layers. * Cyberpunk - 1024x1024, 10-layers. * Unigine Superposition - 4K, 2-layers. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11893 Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>	2026-01-27 18:46:54 +00:00
Nanley Chery	b8f6ad9060	anv: Use variable default value for some images using CLEAR A future commit will enable clearing to more than the first layer of 2D array images. To ensure consistency for the clear color, require the ANV_FAST_CLEAR_DEFAULT_VALUE for such images if they make use of ISL_AUX_STATE_CLEAR. Also, use a non-zero default value for some image formats. I tested the majority of workloads in the performance CI. This will cause those which clear to 2D array layers to gain clears on more than just the first layer. At the moment, we still only support clearing the first layer, so there should be no change in performance. Affected games are documented in the code. Acked-by: Jianxun Zhang <jianxun.zhang@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>	2026-01-27 18:46:53 +00:00
Nanley Chery	811c413f98	anv: Don't return the Xe2+ fast-clear type early Don't return early from anv_layout_to_fast_clear_type() for Xe2+. We'll need to make more use of the function for some MCS changes in later commits. Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>	2026-01-27 18:46:53 +00:00
Nanley Chery	7bb7b63b96	anv: Line wrap anv_CmdClearColorImage Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>	2026-01-27 18:46:52 +00:00
Nanley Chery	390c9e3fda	anv: Inline the CCS/MCS predicated resolve functions Now we can see the MI writes performed before and after the resolves in transition_color_buffer(). Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>	2026-01-27 18:46:52 +00:00
Nanley Chery	4d8c71ab1f	anv: Delete conversion of CCS_D partial resolve Now that hasvk is the driver for supporting HSW and BDW, we no longer need to convert CCS_D partial resolves to full resolves to avoid an assert-failure in BLORP. Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>	2026-01-27 18:46:51 +00:00
Nanley Chery	b1db1179c2	anv: Set compressed bit separately from fast-clear type This will make handling fast-clears on multiple layers simpler by saving us from having to pass more parameters into fast-clear state setting functions. It also allows us to set more complex fast-clear state for FCV_CCS_E without marking the image as compressed. Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>	2026-01-27 18:46:50 +00:00
Nanley Chery	c054d4fe2f	anv: Support partial resolves on any level/layer Enables more support for FCV_CCS_E partial resolves if we ever need it. Also enables support for multiple layers being fast cleared and needing resolves. Support for that will arrive in several commits. Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>	2026-01-27 18:46:50 +00:00
Nanley Chery	0a8ab13b9d	anv: Reset fast-clear type in transition_color_buffer() Moving the code here will simplify the task of supporting fast-clears on multiple array layers and depth slices. Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>	2026-01-27 18:46:49 +00:00
Nanley Chery	ce196c9de5	anv: Fix the fast clear type for FCV writes We started allowing non-default clear colors with FCV in commit `cd8e120b97`. When rendering to an image with FCV, set the fast-clear type to ANV_FAST_CLEAR_ANY if the image properties allow such fast-clears. Fixes: `cd8e120b97` ("anv: Allow more single subresource fast-clears with FCV") Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>	2026-01-27 18:46:49 +00:00
Nanley Chery	e7854d06a5	anv: Update predicated resolve documentation * Don't mention gfx7-8 due to the hasvk split. * Account for the array of clear colors. Fixes: `0e6b132a75` ("anv: Access more colors in fast_clear_memory_range") Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>	2026-01-27 18:46:48 +00:00
Nanley Chery	eb4a581e44	intel/isl: Fix QPitch of arrayed MCS From RENDER_SURFACE_STATE::AuxiliarySurfaceQPitch on BDW+, This field must be set to an integer multiple of the Surface Vertical Alignment Accomplish this by aligning the height of each MCS layer to main surface's vertical alignment. Prevents the following test group from failing on Xe2 when a future commit enables multi-layer fast-clears in anv: dEQP-VK.api.image_clearing.. clear_color_attachment.multiple_layers. _clamp_input_sample_count_* The main test I used to debug this: dEQP-VK.api.image_clearing.core. clear_color_attachment.multiple_layers. a8b8g8r8_unorm_pack32_64x11_clamp_input_sample_count_2 Backport-to: 25.3 Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>	2026-01-27 18:46:47 +00:00
Kenneth Graunke	41d7debcfe	brw: Use nir_imul_imm in per-vertex/per-primitive offset calculation This avoids generating some useless math that would need to be cleaned up later, without complicating things too much. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39250>	2026-01-27 16:08:36 +00:00
Kenneth Graunke	24c66d3871	brw: Vectorize URB intrinsics using nir_opt_load_store_vectorize This helps cut down URB messages on tessellation and mesh shaders significantly. fossil-db results on Battlemage: Instrs: 505172392 -> 505207187 (+0.01%); split: -0.00%, +0.01% Send messages: 23678197 -> 23656126 (-0.09%); split: -0.09%, +0.00% Cycle count: 63150470088 -> 63147482640 (-0.00%); split: -0.01%, +0.00% Spill count: 576554 -> 576616 (+0.01%) Fill count: 545304 -> 545413 (+0.02%) Max live registers: 141099192 -> 141150675 (+0.04%); split: -0.00%, +0.04% Max dispatch width: 39856192 -> 39856208 (+0.00%) Totals from 4231 (0.27% of 1583648) affected shaders: Instrs: 1620161 -> 1654956 (+2.15%); split: -0.25%, +2.40% Send messages: 128652 -> 106581 (-17.16%); split: -17.18%, +0.03% Cycle count: 24650700 -> 21663252 (-12.12%); split: -12.82%, +0.70% Spill count: 378 -> 440 (+16.40%) Fill count: 1308 -> 1417 (+8.33%) Max live registers: 364676 -> 416159 (+14.12%); split: -0.24%, +14.36% Max dispatch width: 67952 -> 67968 (+0.02%) There are several reasons we didn't go with nir_opt_vectorize_io: 1. nir_opt_vectorize_io appears to work on the slot location level. We want to be able to vectorize based on the URB offsets, especially for cases like point size, layer, and viewport which have different VARYING_SLOT_* values but live in the same vec4 in a URB entry. 2. We want vec8 stores, and nir_opt_vectorize_io only seems to vectorize within a single 32-bit vec4. It does handle 8 components, but that's only for packing 16-bit values into a 32-bit vec4. Improves performance of Sascha Willems' tessellation demo by around 4% on Meteorlake. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39250>	2026-01-27 16:08:36 +00:00
Kenneth Graunke	aafe8967fd	brw: Avoid using URB global offset with per-slot offsets on <= Icelake Both the URB Global Offset and Per-Slot Offsets are specified to be unsigned numbers. The URB Global Offset is only 11 bits, and so is limited to be between [0, 2047]. While the per-slot offsets are given as U32 values, it would appear that adding the two offsets does not handle 32-bit overflow/unsigned wrap correctly. This pops up in Piglit's TCS variable-indexing tests, which ends up performing loads from offset (x - 16) and a base of 18, and at an offset (x) with a base of 2. These should be equivalent, but when x <= 15, the per-slot offset calculated in the shader is negative (0xfffffff[0-f]) and adding the base of 18 is not wrapping around correctly to [2, 17]. To work around this, avoid using the global offset when the per-slot offset is present, and just add the two in the shader where unsigned wrap works correctly. Tigerlake and later don't seem to have this issue. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39250>	2026-01-27 16:08:36 +00:00
Kenneth Graunke	07ac0e3463	brw: Skip vec8 store_urb_vec4_intel noop writemasks as well We were checking for 0xf which is fine for vec4, but vec8 gets 0xff. Either way, nothing is writemasked, so we can skip sending the mask. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39250>	2026-01-27 16:08:36 +00:00
Kenneth Graunke	dbb24ff56b	brw: Assert that urb_vec4_intel stores only have 4/8 components vec1-3, 5-7, and 9+ are not supported. Only vec4 and vec8. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39250>	2026-01-27 16:08:36 +00:00
Kenneth Graunke	c2f03ba12f	nir: Add memory modes to URB load intrinsics This makes it easier for NIR passes to distinguish between inputs and outputs without having to reason about which URB handle source was passed to the intrinsic. It probably also makes it a bit easier for humans to read the NIR too. v2: Don't add memory mode to store intrinsics. It's always output. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39250>	2026-01-27 16:08:36 +00:00
Tapani Pälli	bb84773c81	blorp: fix asserts hit with msaa blorp blits on xe3 Tested on PTL, fixes various copy_and_blit tests that utilize compute after `ab9d3528dc` that exposed this to them. Fixes: `ab9d3528dc` ("anv: fix queue check in anv_blorp_execute_on_companion on xe3") Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39548>	2026-01-27 15:28:55 +00:00
Caleb Callaway	a91a636faf	driconf: LTO disable Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39544>	2026-01-27 14:57:20 +00:00
Hans-Kristian Arntzen	22bd72aa58	anv: Enable VK_EXT_present_timing. Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no> Reviewed-by: Emma Anholt <emma@anholt.net> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38770>	2026-01-27 11:09:51 +00:00
Hans-Kristian Arntzen	c18b14aea2	anv: Add PRESENT_STAGE_LOCAL_EXT path for calibration. Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38770>	2026-01-27 11:09:50 +00:00
Francisco Jerez	c0cf14f0e2	intel/isl: Add unit tests for ISL_AUX_STATE_COMPRESSED_HIER_DEPTH. v2: Add additional AUX state transition test-cases for HIZ_CCS (Nanley). v3: Assume partial resolve is equivalent to full resolve on legacy HiZ surfaces during isl_aux_state_transition_aux_op() instead of asserting (Nanley). v4: Move some tests into different group, add more MCS tests (Nanley). Acked-by: Nanley Chery <nanley.g.chery@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31139>	2026-01-27 08:52:18 +00:00
Francisco Jerez	349b09f8a2	anv/gfx12.5: Apply HIZ-CCS resolve TC flush on full resolves for all gfx12.5. This appears to be needed to guarantee that a resolved depth surface has no remaining fast-cleared blocks on DG2 as well as MTL. After this series this should no longer be hit in practice since we'll be doing partial resolves in most cases, but it seems sensible to keep and correct the workaround for our peace of mind to make sure that full resolves are truly resolving the main surface. Reviewed-by: Nanley Chery <nanley.g.chery@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31139>	2026-01-27 08:52:17 +00:00

15393 commits