fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-08 11:18:08 +02:00

Author	SHA1	Message	Date
Loïc Molinari	0a36a33f53	panfrost: Move clear funcs to pan_blitter Move panfrost_clear_render_target() and panfrost_clear_depth_stencil() to pan_blitter.c. This allows making pan_blitter_save_states() static and avoids exposing the saved-state enums. Signed-off-by: Loïc Molinari <loic.molinari@collabora.com> Reviewed-by: Ashley Smith <ashley.smith@collabora.com> Acked-by: Boris Brezillon <boris.brezillon@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40124>	2026-04-28 11:43:33 +00:00
Loïc Molinari	e30d95309f	panfrost: Move blit funcs to pan_blitter Add a dedicated header and prefix blit funcs accordingly. Signed-off-by: Loïc Molinari <loic.molinari@collabora.com> Reviewed-by: Ashley Smith <ashley.smith@collabora.com> Acked-by: Boris Brezillon <boris.brezillon@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40124>	2026-04-28 11:43:33 +00:00
Loïc Molinari	547c4ff8cb	panfrost: Fix MTK tiled resources mapped as shader images MTK tiled resources, like AFBC/AFRC resources, must be converted to linear before being used as a shader image in a CS. Signed-off-by: Loïc Molinari <loic.molinari@collabora.com> Reviewed-by: Ashley Smith <ashley.smith@collabora.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40124>	2026-04-28 11:43:32 +00:00
Loïc Molinari	23ab18e9e3	panfrost: Blit with the RUN_FULLSCREEN instruction (v13) Extend RUN_FULLSCREEN support to architecture v13. Draw call descriptor flags must now be copied to input staging registers for the tiler to avoid dereferences in the fragment pre-pass. Signed-off-by: Loïc Molinari <loic.molinari@collabora.com> Reviewed-by: Ashley Smith <ashley.smith@collabora.com> Acked-by: Boris Brezillon <boris.brezillon@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40124>	2026-04-28 11:43:32 +00:00
Loïc Molinari	712f2534b9	panfrost: Blit with the RUN_FULLSCREEN instruction (v12) Extend RUN_FULLSCREEN support to architecture v12. Signed-off-by: Loïc Molinari <loic.molinari@collabora.com> Reviewed-by: Ashley Smith <ashley.smith@collabora.com> Acked-by: Boris Brezillon <boris.brezillon@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40124>	2026-04-28 11:43:32 +00:00
Loïc Molinari	1e9c175ea0	panfrost: Blit with the RUN_FULLSCREEN instruction (v10) The GPU (tiler) only writes the VertexArrayDescriptor fields "pointer", "vertex_packet_stride" and "vertex_attribute_stride" in RUN_IDVS malloc mode. These fields can also be programmed in non-malloc modes. Update the XML spec file comment accordingly. Fixes: `69ddb910e0` ("panfrost/ci: Enable G610 piglit job") Signed-off-by: Loïc Molinari <loic.molinari@collabora.com> Reviewed-by: Ashley Smith <ashley.smith@collabora.com> Acked-by: Boris Brezillon <boris.brezillon@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40124>	2026-04-28 11:43:31 +00:00
Loïc Molinari	ed05834caf	panfrost: Blit with the FullScreenJob descriptor (v9) A vertex array is pre-allocated in order to interpolate varyings. The GPU (tiler) only writes the VertexArrayDescriptor fields "pointer", "vertex_packet_stride" and "vertex_attribute_stride" in MallocVertexShaderJob mode. These fields can also be programmed with IndexedVertexShaderJob and FullScreenJob modes. Update the XML spec file comment accordingly. Signed-off-by: Loïc Molinari <loic.molinari@collabora.com> Reviewed-by: Ashley Smith <ashley.smith@collabora.com> Acked-by: Boris Brezillon <boris.brezillon@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40124>	2026-04-28 11:43:31 +00:00
Loïc Molinari	82306a6bd7	panfrost: Emit draw call flags with dedicated func The new function will be used in the next commit to emit fullscreen draw call flags. Signed-off-by: Loïc Molinari <loic.molinari@collabora.com> Reviewed-by: Ashley Smith <ashley.smith@collabora.com> Acked-by: Boris Brezillon <boris.brezillon@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40124>	2026-04-28 11:43:31 +00:00
Loïc Molinari	338b937d01	panfrost: Pass primitive mode to funcs instead of full draw info panfrost_update_active_prim() and prepare_draw() both take a pipe_draw_info struct pointer only for accessing the primitive mode. Directly pass the primitive mode instead in order to ease the addition of new draw functions. Signed-off-by: Loïc Molinari <loic.molinari@collabora.com> Reviewed-by: Ashley Smith <ashley.smith@collabora.com> Acked-by: Boris Brezillon <boris.brezillon@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40124>	2026-04-28 11:43:30 +00:00
Loïc Molinari	6ad3944290	panfrost: Add infrastructure for fullscreen draw calls Panfrost blits are implemented using u_blitter which exposes the draw_rectangle handler for drivers to blit with dedicated GPU support. By default, it ends up blitting with a draw_vbo call on the pipe. In Panfrost, draw_vbo then emits a full-featured draw call using an indexed or malloc vertex shader job on pre-CSF GPUs or a RUN_IDVS instruction on CSF GPUs. Since v9, Mali GPUs expose a lighter draw call with the FullScreenJob descriptor on pre-CSF GPUs or with the RUN_FULLSCREEN instruction on CSF GPUs. These draw calls emit a quad primitive into the polygon list and run tiling of a fullscreen fragment job without vertex processing. This commit adds the infrastructure to implement blits using fullscreen draw calls. It supports all types of blits apart from the instanced ones. Partial blits are supported using scissors and textured blits are supported using varying interpolation and fragment shading. Signed-off-by: Loïc Molinari <loic.molinari@collabora.com> Reviewed-by: Ashley Smith <ashley.smith@collabora.com> Acked-by: Boris Brezillon <boris.brezillon@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40124>	2026-04-28 11:43:30 +00:00
Loïc Molinari	d83e090f12	u_blitter: Fix out-dated draw_rectangle handler doc Remove mention of UTIL_BLITTER_ATTRIB_COLOR and UTIL_BLITTER_ATTRIB_TEXCOORD and add a few words about UTIL_BLITTER_ATTRIB_XY and UTIL_BLITTER_ATTRIB_XYZW. Fixes: `22ed1ba01a` ("gallium/u_blitter: use draw_rectangle for all blits except cubemaps") Fixes: `ca09c173f6` ("gallium/u_blitter: remove UTIL_BLITTER_ATTRIB_COLOR, use a constant buffer") Signed-off-by: Loïc Molinari <loic.molinari@collabora.com> Reviewed-by: Ashley Smith <ashley.smith@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40124>	2026-04-28 11:43:30 +00:00
Nick Hamilton	b205c7d592	pvr: Enable shaderImageGatherExtended Enables the shaderImageGatherExtended feature and sets the {min,max}TexelGatherOffset physical device properties. The properties are queried via Zink and are expected to be non-zero. Signed-off-by: Nick Hamilton <nick.hamilton@imgtec.com> Reviewed-by: Frank Binns <frank.binns@imgtec.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40687>	2026-04-28 12:04:09 +01:00
Simon Perretta	57791c4a99	pco: track how many tg4/raw sample comps are needed Rather than always emitting and swizzling 16 components for raw samples, scale it by the number actually needed as defined by the selected tg4 channel/components. Signed-off-by: Simon Perretta <simon.perretta@imgtec.com> Acked-by: Frank Binns <frank.binns@imgtec.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40687>	2026-04-28 12:04:03 +01:00
Nick Hamilton	b80a5f9b7d	pco: fix clamping the array index when shaderImageGatherExtended is enabled The array index value is a signed integer, but the compiler was using the unsigned version of the clamp helper function, meaning the value was not being clamped to 0 when it was < 0. Fixes the following dEQP test cases when shaderImageGatherExtended is enabled: dEQP-VK.glsl.texture_gather.basic.2d_array.* dEQP-VK.glsl.texture_gather.offset.*.2d_array.* dEQP-VK.glsl.texture_gather.offset_dynamic.*.2d_array.* dEQP-VK.glsl.texture_gather.offsets.*.2d_array.* Fixes: `854563f0f8` ("pco: fully switch over to common smp emission code") Signed-off-by: Nick Hamilton <nick.hamilton@imgtec.com> Reviewed-by: Frank Binns <frank.binns@imgtec.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40687>	2026-04-28 12:03:51 +01:00
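As a hedged illustration of the signed/unsigned mismatch described in the commit above, here is a plain-C sketch. It is not the PCO compiler's actual NIR code; `clamp_u32`, `clamp_i32` and `clamp_array_layer` are hypothetical names. An index of -1 reinterpreted as unsigned becomes 0xFFFFFFFF, so an unsigned clamp pins it to the top of the range instead of to 0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an unsigned clamp helper. */
static uint32_t clamp_u32(uint32_t v, uint32_t lo, uint32_t hi)
{
   return v < lo ? lo : (v > hi ? hi : v);
}

/* Hypothetical stand-in for a signed clamp helper. */
static int32_t clamp_i32(int32_t v, int32_t lo, int32_t hi)
{
   return v < lo ? lo : (v > hi ? hi : v);
}

/* Clamp a possibly negative array layer index into [0, num_layers - 1]
 * using the signed helper, so negative indices end up at 0. */
static int32_t clamp_array_layer(int32_t layer, int32_t num_layers)
{
   assert(num_layers > 0);
   return clamp_i32(layer, 0, num_layers - 1);
}
```

With four layers, `clamp_u32((uint32_t)-1, 0, 3)` returns 3, while `clamp_array_layer(-1, 4)` returns 0.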
Simon Perretta	56b8dc92a9	pco: amend tg4 lowering Use lower_tg4_offsets to take care of explicit offsets, and just swizzle the texels in the order defined by textureGather* Fixes: `46c9239c11` ("pvr, pco: initial texture gather support with gather sampler") Signed-off-by: Simon Perretta <simon.perretta@imgtec.com> Acked-by: Frank Binns <frank.binns@imgtec.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40687>	2026-04-28 12:03:30 +01:00
Arjob Mukherjee	35f57a2739	pvr: increase value of maxPerStageDescriptorStorageBuffers Increase it past the minimum required by the Vulkan spec to fix tests. This was needed because Zink splits `maxPerStageDescriptorStorageBuffers` between atomic buffers and `MaxShaderStorageBlocks`. Fixes the following GLES conformance tests: KHR-GLES31.core.compute_shader.resources-max KHR-GLES31.core.draw_indirect.advanced-twoPass-Compute-arrays KHR-GLES31.core.shader_image_load_store.advanced-sync-vertexArray KHR-GLES31.core.shader_image_load_store.basic-allTargets-store-cs KHR-GLES31.core.shader_image_load_store.basic-allTargets-store-fs KHR-GLES31.core.shader_storage_buffer_object.advanced-unsizedArrayLength-cs-int KHR-GLES31.core.shader_storage_buffer_object.basic-stdLayout_UBO_SSBO-case1-cs KHR-GLES31.core.shader_storage_buffer_object.basic-stdLayout_UBO_SSBO-case2-cs dEQP-GLES31.functional.draw_indirect.compute_interop.combined.drawelements_compute_cmd_and_data_and_indices dEQP-GLES31.functional.synchronization.in_invocation.ssbo_alias_overwrite dEQP-GLES31.functional.synchronization.in_invocation.ssbo_alias_write dEQP-GLES31.functional.synchronization.in_invocation.ssbo_atomic_alias_overwrite dEQP-GLES31.functional.synchronization.in_invocation.ssbo_atomic_alias_write dEQP-GLES31.functional.synchronization.inter_call.with_memory_barrier.ssbo_atomic_multiple_write_read dEQP-GLES31.functional.synchronization.inter_call.with_memory_barrier.ssbo_multiple_write_read dEQP-GLES31.functional.synchronization.inter_invocation.ssbo_alias_overwrite dEQP-GLES31.functional.synchronization.inter_invocation.ssbo_alias_write dEQP-GLES31.functional.synchronization.inter_invocation.ssbo_atomic_alias_overwrite dEQP-GLES31.functional.synchronization.inter_invocation.ssbo_atomic_alias_write Backport-to: 26.0 Signed-off-by: Arjob Mukherjee <arjob.mukherjee@imgtec.com> Reviewed-by: Frank Binns <frank.binns@imgtec.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41156>	2026-04-28 08:36:48 +00:00
Christian Gmeiner	7d59c62fde	panvk: Wire up VK_EXT_conservative_rasterization on v11+ Mali >= v11 has a Conservative Rast Mode field in DCD Flags 0 with values Disabled and Over Estimate. Wire it to vk_runtime's rasterization state and expose the extension on PAN_ARCH >= 11, with caps restricted to overestimate only; HW has no underestimate value and no overestimation-size granularity. On v11-v13, degenerate triangles produce a wrong fragment w when overestimate is enabled, so cull_zero_area is forced on alongside the mode bit and degenerateTrianglesRasterized is reported as false. Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com> Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com> Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41189>	2026-04-28 09:34:28 +02:00
Samuel Pitoiset	df3de4acbb	ac,radv,radeonsi: replace mesh_fast_launch_2 by gfx_level checks Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41204>	2026-04-28 06:50:43 +00:00
Samuel Pitoiset	94ae99f16f	radv: replace use_ngg_streamout by gfx_level checks There is no way to enable or disable it via debug options; it's only used on GFX11+. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41204>	2026-04-28 06:50:43 +00:00
Karol Herbst	4b66258717	nak: call nir_opt_algebraic_distribute_src_mods Totals from 134863 (11.12% of 1212873) affected shaders: CodeSize: 2109574320 -> 2105266608 (-0.20%); split: -0.23%, +0.02% Number of GPRs: 7199115 -> 7194107 (-0.07%); split: -0.13%, +0.06% SLM Size: 201728 -> 201720 (-0.00%); split: -0.01%, +0.00% Static cycle count: 2037608114 -> 2035165858 (-0.12%); split: -0.17%, +0.05% Spills to memory: 22063 -> 22035 (-0.13%); split: -0.14%, +0.01% Fills from memory: 22063 -> 22035 (-0.13%); split: -0.14%, +0.01% Spills to reg: 78193 -> 78139 (-0.07%); split: -0.17%, +0.10% Fills from reg: 83383 -> 83335 (-0.06%); split: -0.15%, +0.09% Max warps/SM: 5188428 -> 5188840 (+0.01%); split: +0.03%, -0.02% Reviewed-by: Mel Henning <mhenning@darkrefraction.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41214>	2026-04-28 03:08:01 +02:00
Karol Herbst	a9eac010dd	nak: call nir_opt_fp_math_ctrl Totals from 77360 (6.38% of 1212873) affected shaders: CodeSize: 1255332672 -> 1250129888 (-0.41%); split: -0.44%, +0.03% Number of GPRs: 4233257 -> 4226625 (-0.16%); split: -0.20%, +0.05% Static cycle count: 937314398 -> 935865851 (-0.15%); split: -0.22%, +0.07% Spills to memory: 11371 -> 11373 (+0.02%) Fills from memory: 11371 -> 11373 (+0.02%) Spills to reg: 24245 -> 24262 (+0.07%); split: -0.65%, +0.72% Fills from reg: 23689 -> 23742 (+0.22%); split: -0.55%, +0.77% Max warps/SM: 2912604 -> 2916096 (+0.12%); split: +0.15%, -0.03% Reviewed-by: Mel Henning <mhenning@darkrefraction.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41214>	2026-04-28 03:07:51 +02:00
Marek Olšák	3dcba87ca3	nir/opt_licm: hoist instructions across multiple levels of nested loops radv gfx12: Totals: Instrs: 42861311 -> 42861476 (+0.00%); split: -0.00%, +0.00% CodeSize: 227917476 -> 227918160 (+0.00%); split: -0.00%, +0.00% Latency: 265381068 -> 265373506 (-0.00%); split: -0.00%, +0.00% InvThroughput: 42954018 -> 42952350 (-0.00%) VClause: 819026 -> 819024 (-0.00%) SClause: 1210348 -> 1210293 (-0.00%) Copies: 2919525 -> 2919597 (+0.00%); split: -0.00%, +0.00% PreSGPRs: 2889432 -> 2889406 (-0.00%) VALU: 23757371 -> 23757377 (+0.00%); split: -0.00%, +0.00% SALU: 5981417 -> 5981485 (+0.00%); split: -0.00%, +0.00% VOPD: 8966 -> 8964 (-0.02%) Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41220>	2026-04-27 23:58:21 +00:00
Marek Olšák	8e036fcaec	nir/opt_licm: use nir_metadata_control_flow Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41220>	2026-04-27 23:58:21 +00:00
Marek Olšák	e0112be522	nir/opt_licm: add a private state structure for the pass The structure will grow in later commits. The major change is that the preheader and exit blocks are replaced by tracking just the innermost optimized nir_loop * and getting the predecessor and successor blocks out of it. Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41220>	2026-04-27 23:58:20 +00:00
Yiwei Zhang	90c4f1f9dc	util/android_stub: drop legacy atrace The legacy atrace support was added due to the lack of perfetto C bindings in: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13255 Now that perfetto has matured while atrace has compatibility issues with C++, let's drop legacy atrace support in favor of perfetto even for Android. Reviewed-by: Dhruv Mark Collins <mark@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41219>	2026-04-27 23:29:27 +00:00
Timothy Arceri	a42c55da46	amd/radeonsi: dont clamp packed user varyings ac_nir_optimize_outputs() might pack user varyings into the color built-ins. If this happens we skip adding clamping to the components that contain the user varying. This change also fixes a second bug where a color built-in can be packed into a non-color slot and was no longer being clamped. Fixes: `3777a5d7` ("radeonsi: assign param export indices before compilation") Closes: #14443 Reviewed-by: Marek Olšák <maraeo@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40594>	2026-04-27 22:59:58 +00:00
Marek Olšák	0684976de8	ac/nir: add ac_nir_assign_fs_input_locations to set PS input locations in stone No intended functional change. This prevents possible breakage due to DCE removing input loads followed by nir_shader_gather_info updating input masks and changing the result of ac_nir_get_io_driver_location after PS input register contents are already determined. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41175>	2026-04-27 21:05:53 +00:00
Mel Henning	4b0a0ed7b6	nak: Use NIR_LOOP_PASS This is similar to `75ede9d9bc` ("intel/brw: track last successful pass and leave the loop early") except that it uses the common nir helpers. Note that I've also marked nir_opt_peephole_select as NOT_IDEMPOTENT because I'm skeptical that it actually is idempotent. This differs from both brw and radv. I'm also marking gcm as not idempotent because it isn't idempotent in practice on one of the shaders in my shader-db: `2bf4ba7133/fossils/blender` pipeline hash 0e972f8e349af903 This is about a 4% geomean compile time speedup on my local collection of shaders. Reviewed-by: Karol Herbst <kherbst@redhat.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41118>	2026-04-27 20:14:05 +00:00
Mel Henning	75fc9e2704	nak: Use shader_info->var_copies_lowered This mirrors the change from `ba0bc7182d` ("anv: use shader_info->var_copies_lowered") Reviewed-by: Karol Herbst <kherbst@redhat.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41118>	2026-04-27 20:14:05 +00:00
Danylo Piliaiev	5b5bc956df	tu/perfetto: Move away from single timeline for all apps This moves from deprecated stage_id/hw_queue_id to per-context stage_iid/hw_queue_iid, which leads to separate timelines per app. There are several benefits to this: - Different driver versions could be used by different apps and perfetto won't confuse tracepoints. - Tracepoints from different apps may not align perfectly, so previously we got a fair amount of weird vertical ordering of tracepoints. The downside is that info is spread across several timelines multiplied by queues, but I think that's better since it is easier to understand which tracepoints correspond to which app. The changes are mostly copied from radeon/intel perfetto integration. This also fixes app_event emission along the way, previously debug_marker_stage was called _before_ SEQ_INCREMENTAL_STATE_CLEARED. Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41105>	2026-04-27 19:45:42 +00:00
Serdar Kocdemir	117f3cb1fc	gfxstream: allow VK_KHR_maintenance extensions Add latest maintenance extensions required by Android Vulkan requirements, except VK_KHR_maintenance5 which enables dynamic rendering on ANGLE and causes issues with some cuttlefish targets. Test: CI Reviewed-by: Aaron Ruby <aruby@qnx.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41210>	2026-04-27 18:57:43 +00:00
Serdar Kocdemir	e8773b96df	gfxstream: Add VK_EXT_pipeline_protected_access Test: dEQP-VK.protected_memory.* Test: dEQP-VK.pipeline.monolithic.image.dedicated_allocation.* Test: dEQP-VK.api.info.get_physical_device_properties2.features.* Reviewed-by: Aaron Ruby <aruby@qnx.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41210>	2026-04-27 18:57:43 +00:00
Ian Romanick	e301817753	brw: Don't lower phis involved in DPAS instructions to scalar On my Arc A380 (DG2), this more than doubles the performance of Jeff Bolz's cooperative matrix benchmark. With llama.cpp modified to use cooperative matrix on DG2, performance is improved by 37%. Closes: #15311 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Tested-by: Matt Corallo <git@bluematt.me> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41172>	2026-04-27 18:09:16 +00:00
Ian Romanick	09b43966ba	brw: Lower all phis to scalar The next commit will cause some very specific phis to not be lowered to scalar, and that's the reason the callback is used instead of nir_lower_all_phis_to_scalar. It's worth noting that the comment in nir_lower_phis_to_scalar.c specifically calls out Deus Ex as the reason some phis should not be lowered. At least on current BRW, zero shaders from Deus Ex trace were affected for spills or fills on any Intel platform. shader-db: All Intel platforms had similar results. (Lunar Lake shown) total instructions in shared programs: 17050005 -> 17051449 (<.01%) instructions in affected programs: 41032 -> 42476 (3.52%) helped: 29 / HURT: 159 total cycles in shared programs: 876411976 -> 876433702 (<.01%) cycles in affected programs: 1455550 -> 1477276 (1.49%) helped: 40 / HURT: 150 fossil-db: All Intel platforms had similar results. (Lunar Lake shown) Totals: Instrs: 916599633 -> 916694854 (+0.01%); split: -0.00%, +0.01% CodeSize: 14705971792 -> 14708302384 (+0.02%); split: -0.00%, +0.02% Send messages: 40870114 -> 40870113 (-0.00%) Cycle count: 102360965889 -> 102364169753 (+0.00%); split: -0.00%, +0.01% Spill count: 3460669 -> 3460240 (-0.01%) Fill count: 4988325 -> 4987891 (-0.01%) Max live registers: 192914542 -> 192918153 (+0.00%); split: -0.00%, +0.00% Max dispatch width: 48848112 -> 48848128 (+0.00%) Non SSA regs after NIR: 141633613 -> 141671589 (+0.03%); split: -0.00%, +0.03% Totals from 5713 (0.28% of 2010434) affected shaders: Instrs: 5215921 -> 5311142 (+1.83%); split: -0.09%, +1.91% CodeSize: 88940784 -> 91271376 (+2.62%); split: -0.20%, +2.82% Send messages: 284751 -> 284750 (-0.00%) Cycle count: 275671864 -> 278875728 (+1.16%); split: -0.74%, +1.90% Spill count: 857 -> 428 (-50.06%) Fill count: 845 -> 411 (-51.36%) Max live registers: 667776 -> 671387 (+0.54%); split: -0.86%, +1.40% Max dispatch width: 160416 -> 160432 (+0.01%) Non SSA regs after NIR: 1127904 -> 1165880 (+3.37%); split: -0.10%, +3.47% Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Tested-by: Matt Corallo <git@bluematt.me> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41172>	2026-04-27 18:09:16 +00:00
Suresh Guttula	71508d90aa	ac: Add vcn_5_3_0 support Enable hardware decode/encode capabilities for VCN 5.3.0 by configuring the supported codec list. This allows vainfo to properly enumerate available codec capabilities. Signed-off-by: Suresh Guttula <suresh.guttula@amd.com> Reviewed-by: David Rosca <david.rosca@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41202>	2026-04-27 17:13:18 +00:00
Benjamin Cheng	0e04954c9a	radv/video_enc: Use correct swizzle mode for VCN5 with GFX11 Signed-off-by: Suresh Guttula <suresh.guttula@amd.com> Reviewed-by: David Rosca <david.rosca@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41202>	2026-04-27 17:13:18 +00:00
Benjamin Cheng	922d04c9a5	ac/vcn: Rename VCN5 swizzle mode to GFX12 The original naming is inaccurate, it depends on the GFX version, not VCN. Signed-off-by: Suresh Guttula <suresh.guttula@amd.com> Reviewed-by: David Rosca <david.rosca@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41202>	2026-04-27 17:13:18 +00:00
Matt Turner	acba4c9fd8	radv: expose VK_KHR_performance_query on GFX11 Enable VK_KHR_performance_query on GFX11 (RDNA3 / RDNA3.5) now that the selector tables and packet emission are in place. Tested on Strix Halo with dEQP-VK.query_pool.performance_query.* (6 pass, 6 not-supported for the allowCommandBufferQueryCopies cases). Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41157>	2026-04-27 16:16:00 +00:00
Matt Turner	8499d86b94	radv/perfcounter: add GFX11 performance counter selectors GFX11 reorganizes the shader perfcounter blocks: wave counts move from SQ to the SQG registers (still mapped as the SQ block in ac/), while per-instruction counters move from SQ to the new SQ_WGP block. Add GFX11-specific selector enums using the new block assignments and branch radv_query_perfcounter_descs to select them on GFX11+. GL2C, GL1C, and TCP selectors are unchanged between GFX10.3 and GFX11. The "Instructions" (total count) counter is dropped on GFX11 as there is no direct SQ_WGP equivalent for INSTS_ALL. Selector indices verified against gpu_performance_api's gpa_hw_counter_gfx11.cc. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41157>	2026-04-27 16:15:59 +00:00
Matt Turner	703de21af8	radv/perfcounter: guard select1 access in radv_emit_select Some perfcounter blocks (e.g. SQ_WGP on GFX11) define num_spm_modules but have no select1 register array. Skip the select1 loop when the array is NULL. This is a prerequisite for enabling performance queries on GFX11. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41157>	2026-04-27 16:15:59 +00:00
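The NULL guard described in the commit above can be sketched in plain C as follows. This is a simplified model, not radv's real code: `struct pc_block`, `struct reg_write` and `emit_select` are hypothetical, register writes are recorded in an array instead of emitted as packets, and the assumption that the select1 loop runs over `num_spm_modules` is taken from the commit wording only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified perfcounter block description. */
struct pc_block {
   unsigned num_spm_modules;
   const uint32_t *select0; /* selector registers, always present */
   const uint32_t *select1; /* may be NULL, e.g. for SQ_WGP on GFX11 */
};

/* Hypothetical register write, recorded instead of a real packet. */
struct reg_write {
   uint32_t reg;
   uint32_t value;
};

/* Records the select register writes for one block and returns the count. */
static unsigned emit_select(const struct pc_block *b, unsigned count,
                            uint32_t selector, struct reg_write *out)
{
   unsigned n = 0;

   assert(b != NULL && out != NULL);

   for (unsigned i = 0; i < count; i++)
      out[n++] = (struct reg_write){ b->select0[i], selector };

   /* Skip the select1 loop when the block has no select1 array;
    * without this check, b->select1[i] would dereference NULL. */
   if (b->select1 != NULL) {
      for (unsigned i = 0; i < b->num_spm_modules; i++)
         out[n++] = (struct reg_write){ b->select1[i], 0 };
   }

   return n;
}
```

A block that defines `num_spm_modules` but leaves `select1` NULL then produces only its select0 writes instead of crashing.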
Matt Turner	2595940b0d	radv: fix UB in radv_format_pack_clear_color for snorm formats Casting a negative float to uint64_t is undefined behavior. GCC 15 with -O2 produces 0xFFFFFFFFFFFFFFFF for (uint64_t)(-32767.5f), causing snorm clear values to be packed incorrectly (e.g. 0xFFFF instead of 0x8001 for snorm16 -1.0). This results in wrong DCC comp-to-single clear colors and ~966 CTS snorm multisample_resolve test failures. Fix by casting through int64_t first, which is well-defined (truncation toward zero) and preserves the two's complement bit pattern. Fixes: `585c25be1e` ("radv: fix color conversions for normalized uint/sint formats") Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41164>	2026-04-27 15:44:09 +00:00
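A minimal sketch of the int64_t round-trip from the commit above, assuming a round-half-away-from-zero scheme (chosen so that -1.0 in snorm16 yields the -32767.5f value quoted in the message). `pack_snorm` is a hypothetical helper, not radv_format_pack_clear_color itself, and it limits itself to 16 bits or fewer so that float has enough precision.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pack a float into a `bits`-wide snorm field. */
static uint64_t pack_snorm(float v, unsigned bits)
{
   assert(bits >= 2 && bits <= 16);

   const float max = (float)((1ull << (bits - 1)) - 1); /* 32767 for 16 bits */

   /* Clamp to [-1, 1]; NaN handling is out of scope for this sketch. */
   if (v < -1.0f)
      v = -1.0f;
   else if (v > 1.0f)
      v = 1.0f;

   /* Round half away from zero; the cast below truncates toward zero. */
   float scaled = v * max + (v >= 0.0f ? 0.5f : -0.5f);

   /* (uint64_t)scaled is undefined for negative values. Converting to
    * int64_t first is well-defined, and the int64_t -> uint64_t step
    * keeps the two's complement bit pattern. */
   uint64_t raw = (uint64_t)(int64_t)scaled;

   return raw & ((1ull << bits) - 1);
}
```

For snorm16 -1.0, `scaled` is -32767.5f, the int64_t conversion gives -32767, and masking leaves 0x8001, the value the commit message expects.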
Jaishankar Rajendran	12f43d048e	anv: tune parameters of the ASTC software decoding Signed-off-by: Prakhar Vishwakarma <prakhar.vishwakarma@intel.com> Signed-off-by: Jaishankar Rajendran <jaishankar.rajendran@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41205>	2026-04-27 15:17:04 +00:00
Jaishankar Rajendran	cd941d3970	vulkan/runtime: enable parametrization of ASTC software decode Enable the driver to select: - LUT allocation alignment - LUT memory flags Signed-off-by: Prakhar Vishwakarma <prakhar.vishwakarma@intel.com> Signed-off-by: Jaishankar Rajendran <jaishankar.rajendran@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41205>	2026-04-27 15:17:04 +00:00
Yiwei Zhang	0f2a42afcf	lvp/android: use common ANB implementations This has been unblocked by https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40211. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41138>	2026-04-27 14:58:18 +00:00
David Rosca	7b5277ce5c	frontends/va: Fix out of bounds write in AV1 decode tile info For invalid streams tile cols and rows may be higher than 64. This would overwrite data after the height_in_sbs array, but since the maximum amount of bytes overwritten is bound by the maximum supported decode resolution, this can't overwrite any important fields and thus won't cause any observable issue. As this can only happen with invalid streams, it still won't decode correctly with this fixed. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/work_items/15290 Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41016>	2026-04-27 14:29:34 +00:00
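One way to avoid the out-of-bounds write described in the commit above is to clamp the row count to the array size before copying; this sketch does not claim that Mesa's fix clamps rather than rejects. The 64 limit matches AV1's MAX_TILE_ROWS; `struct tile_info` and `set_tile_rows` are hypothetical names, and only the `height_in_sbs` field name comes from the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* AV1 allows at most 64 tile rows (MAX_TILE_ROWS in the AV1 spec). */
#define MAX_TILE_ROWS 64

/* Hypothetical decode-parameter struct with a fixed-size per-row array. */
struct tile_info {
   unsigned tile_rows;
   uint16_t height_in_sbs[MAX_TILE_ROWS];
};

/* Copies per-row heights, clamping the row count so that an invalid stream
 * cannot write past the end of height_in_sbs. Returns the rows kept. */
static unsigned set_tile_rows(struct tile_info *ti, const uint16_t *heights,
                              unsigned rows)
{
   assert(ti != NULL && heights != NULL);

   if (rows > MAX_TILE_ROWS)
      rows = MAX_TILE_ROWS;

   for (unsigned i = 0; i < rows; i++)
      ti->height_in_sbs[i] = heights[i];

   ti->tile_rows = rows;
   return rows;
}
```

As the commit notes, such a stream still will not decode correctly; the clamp only keeps the write in bounds.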
David Rosca	c2a4fa33b8	frontends/va: Fix finding LTRs from POCs in HEVC decode This should only consider valid entries, not loop over the entire array. In addition the array size was wrong before. Fixes: `779edc0759` ("frontends/va: Correctly derive HEVC StCurrBefore, StCurrAfter and LtCurr") Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41016>	2026-04-27 14:29:34 +00:00
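The "only consider valid entries" rule from the commit above amounts to bounding the search by the valid-entry count instead of the array capacity. This is a hypothetical sketch: `struct ref_list`, `find_ref_by_poc` and the 16-entry capacity are assumptions, not the frontend's real data layout.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed capacity for this sketch; only the first num_refs are valid. */
#define MAX_REFS 16

/* Hypothetical reference list. */
struct ref_list {
   unsigned num_refs;
   int32_t poc[MAX_REFS];
};

/* Returns the index of the reference with the given POC, or -1. Only the
 * valid prefix is searched, so stale entries past num_refs cannot match. */
static int find_ref_by_poc(const struct ref_list *refs, int32_t poc)
{
   assert(refs->num_refs <= MAX_REFS);

   for (unsigned i = 0; i < refs->num_refs; i++) {
      if (refs->poc[i] == poc)
         return (int)i;
   }
   return -1;
}
```

Scanning all `MAX_REFS` slots instead would let a leftover or zero-initialized entry match a POC that is not actually a reference.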
David Rosca	630a4d2249	radeonsi: Always use 2D tiling for video dpb Fixes decode on VCN5 with AMD_DEBUG=notiling Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41201>	2026-04-27 14:10:11 +00:00
Pavel Ondračka	cc7be8433a	r300: dirty VS state when switching variants When r300_pick_vertex_shader switches to a WPOS variant, it only dirtied rs_block_state, leaving vs_state with a stale code size. This caused cs_count warnings (offset of -4 for one extra VS instruction) but was mostly harmless since the emitted packet stream still used the current shader. Factor the VS code dirtying out of r300_bind_vs_state into a helper and call it when selecting a new variant too. Fixes: `806dcf9db7` ("r300: only output wpos in vertex shaders when needed") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41200>	2026-04-27 12:48:34 +00:00
Jesse Natalie	3f35e65253	wgl: Use an hwnd xor hdc for framebuffers It seems maybe hdcs can get recycled? Fixes: `28058221` ("wgl: Support contexts created from non-window DCs") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41174>	2026-04-27 12:26:54 +00:00
Pavel Ondračka	416da54cce	r300: fix MSAA resolve COLORPITCH tiling after pipe_surface de-pointerization r300_simple_msaa_resolve used to patch srcsurf->pitch with the resolve destination's tiling bits before passing the surface to the blitter. That worked when set_framebuffer_state kept the same pipe_surface pointer, so r300_get_nonnull_cb returned the patched object. After the de-pointerization, r300_framebuffer_init creates a fresh r300_surface from the pipe_surface template, discarding the pitch modification. The hardware then uses the MSAA source tiling for R300_RB3D_COLORPITCH0, leading to corruption. Move the tiling override into r300_emit_fb_state and override the tiling bits of COLORPITCH from the destination surface at emit time. Fixes: `2eb45daa9c` ("gallium: de-pointerize pipe_surface") Closes: https://gitlab.freedesktop.org/mesa/mesa/-/work_items/15303 Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41092>	2026-04-27 12:05:44 +00:00


221716 commits