fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-24 01:58:16 +02:00

Author	SHA1	Message	Date
Alyssa Rosenzweig	bc22a37d98	jay: schedule for pressure Implement a simple pre-RA bottom-up list scheduler with the goal of decreasing register pressure. On Xe2, this significantly reduces spilling. SSA form allows us to estimate register demand cheaply and accurately, which theoretically [1] gives this algorithm the two Hippocratic properties: 1. Shaders with low register pressure are unaffected. 2. Register pressure can only be decreased, never increased. In other words: first, do no harm. The heuristic itself is very simple: greedily choose instructions that decrease liveness using a backwards list scheduler. This is far from optimal! But thanks to the above properties, even a heuristic that picked random instructions would be a win overall - by construction, we can only ever win. In other words: this scheduler is your older brother powering off the game console any time he's about to lose a game, maintaining a 100% win rate. [1] In reality, neither property is strictly satisfied due to the messy details of mapping our clean logical model onto Intel's many weird physical register files. Nevertheless, the algorithm is well-motivated and the empirical results on Xe2 are excellent. SIMD16: Totals: Instrs: 2754194 -> 2753957 (-0.01%); split: -0.23%, +0.22% CodeSize: 41094768 -> 41092768 (-0.00%); split: -0.23%, +0.23% Number of spill instructions: 1724 -> 1129 (-34.51%) Number of fill instructions: 1912 -> 1119 (-41.47%) Totals from 168 (6.35% of 2647) affected shaders: Instrs: 850994 -> 850757 (-0.03%); split: -0.75%, +0.73% CodeSize: 12825680 -> 12823680 (-0.02%); split: -0.74%, +0.73% Number of spill instructions: 1724 -> 1129 (-34.51%) Number of fill instructions: 1912 -> 1119 (-41.47%) SIMD32: Totals: Instrs: 4688858 -> 4557800 (-2.80%); split: -3.53%, +0.74% CodeSize: 70177200 -> 68214816 (-2.80%); split: -3.53%, +0.74% Number of spill instructions: 50316 -> 45795 (-8.99%); split: -9.56%, +0.57% Number of fill instructions: 51526 -> 45075 (-12.52%); split: -13.23%, +0.71% Totals from 819 (30.94% of 2647) affected shaders: Instrs: 3810182 -> 3679124 (-3.44%); split: -4.35%, +0.91% CodeSize: 57044000 -> 55081616 (-3.44%); split: -4.35%, +0.91% Number of spill instructions: 49264 -> 44743 (-9.18%); split: -9.76%, +0.58% Number of fill instructions: 50182 -> 43731 (-12.86%); split: -13.58%, +0.73% Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>	2026-05-21 15:34:46 +00:00
Alyssa Rosenzweig	81e21a8756	jay: factor jay_op_(starts,ends)_block queries Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>	2026-05-21 15:34:46 +00:00
Alyssa Rosenzweig	e72ffb0046	jay: annotate pure sends for scheduling, CSE, etc Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>	2026-05-21 15:34:46 +00:00
Alyssa Rosenzweig	c069b7e47c	jay/opt_propagate: avoid branching on poison logically it doesn't matter because we'll bail on a later check, but this is still UB and therefore releases nasal demons. i am jealous of Faith's Rust compilers. there, I said it. ==107281== Conditional jump or move depends on uninitialised value(s) ==107281== at 0x7069768: propagate_backwards (jay_opt_propagate.c:327) ==107281== by 0x7069768: jay_opt_propagate_backwards (jay_opt_propagate.c:367) ==107281== by 0x7058960: jay_compile (jay_from_nir.c:2677) Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>	2026-05-21 15:34:46 +00:00
Alyssa Rosenzweig	4b0c3f5c32	jay/lower_scoreboard: add asserts on key bounds if these are botched you get UB (-: Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>	2026-05-21 15:34:46 +00:00
Alyssa Rosenzweig	4c97493b69	jay/lower_scoreboard: handle accumulator hazard Challenging to hit but fixes dEQP-GLES3.functional.shaders.swizzle_math_operations.vector_multiply.mediump_ivec4_wzyx_zyxw_fragment with scheduling changes. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>	2026-05-21 15:34:46 +00:00
Alyssa Rosenzweig	9a68101bc2	jay/liveness: drop redundant source filtering Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>	2026-05-21 15:34:46 +00:00
Alyssa Rosenzweig	9b68b4e7a1	jay/liveness: speed up physical CFG merging on top of scheduler changes, compile-time of shaders/blender/1017.shader_test: Difference at 95.0% confidence -0.00173202 +/- 0.00116931 -0.791537% +/- 0.532384% Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>	2026-05-21 15:34:46 +00:00
Alyssa Rosenzweig	1b50d3eed2	jay/liveness: remove pointless bitset init dup initializes it. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>	2026-05-21 15:34:46 +00:00
Alyssa Rosenzweig	5da3b57605	jay: insert simd32 deswizzle in a dedicated pass we don't actually need the DESWIZZLE pseudo instruction, and the pseudo op complicates pre-RA scheduling. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>	2026-05-21 15:34:46 +00:00
Alyssa Rosenzweig	47c6601d5e	jay: relax fragment payload layout this isn't optimal but it should unblock bring up. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Co-authored-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>	2026-05-21 15:34:46 +00:00
Kenneth Graunke	cb75c9f962	brw: Lower sample_pos for non-per-sample shaders in NIR We generalize the sample_mask_in lowering to handle this too. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>	2026-05-21 15:34:45 +00:00
Mike Blumenkrantz	58308b7580	zink: add another anv/adl flake Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41728>	2026-05-21 14:58:52 +00:00
Mike Blumenkrantz	64be743fbe	zink: fix unbinding vertex buffers from null VS state num_bindings doesn't encompass all the bound buffers if bindings reuse the same buffers Fixes: `f8c96df9d2` ("zink: move vbo unbind to bind_vertex_state") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41728>	2026-05-21 14:58:52 +00:00
Samuel Pitoiset	07754c960a	radv: validate drirc option names at compile time This would prevent any typos or if something is backported incorrectly in the future. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41700>	2026-05-21 14:26:28 +00:00
Samuel Pitoiset	ccb669a05f	util: add very basic way to validate drirc files This just checks for option names that don't exist. This is something that already happened in the past with RADV. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41700>	2026-05-21 14:26:28 +00:00
Samuel Pitoiset	e685f8d6aa	radv/ci: cleanup list of expected failures Triage invalid tests to make it easier to see real failures. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41717>	2026-05-21 14:03:22 +00:00
Samuel Pitoiset	91cf0a6e6d	radv: use the new generation script for drirc Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41634>	2026-05-21 12:57:43 +00:00
Samuel Pitoiset	bf787fd91b	radv: rename few drirc options for consistency So that the option name matches everywhere. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41634>	2026-05-21 12:57:41 +00:00
Lucas Francisco Fryzek	7b84183201	util/u_trace: Don't use empty initializer list Modify empty initializer list to use a zero initializer so we aren't relying on the gnu extension. Fixes: `690d9b0d00` ("util/u_trace: Rework resource management") Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41715>	2026-05-21 12:18:07 +00:00
Benjamin Gaignard	0e91cf34af	pan/format: Advertise support for AFBC(32x8,sparse) Some video decoders spit out AFBC(32x8,sparse) images. Advertise support for this modifier so we can import such images. Signed-off-by: Benjamin Gaignard <benjamin.gaignard@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40886>	2026-05-21 11:50:16 +00:00
Daniel Stone	4203b770b4	pan/afbc: Properly validate format/parameter combinations AFBC has a number of superblock sizes and valid layouts, with differing combinations allowed. It's quite clear that 16x16 is ambivalent about whether or not block-split mode is used. 64x4 prohibits block-split mode, and 32x8 either requires or prohibits it depending on the format. Add proper handling so we filter out the right combinations. Signed-off-by: Daniel Stone <daniels@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40886>	2026-05-21 11:50:16 +00:00
Daniel Stone	c5415c7aed	pan/mod: Reorder linear modifier checks As with AFBC, split the checks into 'can this ever work' vs. 'can this work for what I want it to?'. Signed-off-by: Daniel Stone <daniels@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40886>	2026-05-21 11:50:16 +00:00
Daniel Stone	4364f5352a	pan/mod: Protect against no usage flags for 64k This doesn't happen now, but it will later. Signed-off-by: Daniel Stone <daniels@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40886>	2026-05-21 11:50:15 +00:00
Daniel Stone	0fb529053b	pan/afbc: Code motion for split modifier queries Reorder the AFBC modifier checking code to first query whether the device can do the mode at all, then to query whether or not the format + modifier is supported at all, then to query whether the specific image usage is OK, then to query whether or not it's optimal. This will come in useful later when we want to split modifier queries into: can this modifier ever be used, what can this modifier be used for, and is this the best modifier for this usage. Signed-off-by: Daniel Stone <daniels@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40886>	2026-05-21 11:50:15 +00:00
Karol Herbst	48ec237bf9	zink: proper advertise keep_weak_ffma for fp16 Zink never sets the fp16 screen cap, but the caps also are initialized after zink_screen_init_compiler. So just replicate the check to be safe here. Fixes: `2146e09962` ("zink: keep ffma_weak and use GLSLstd450Fma for it") Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41722>	2026-05-21 10:50:32 +00:00
squidbus	b1c72223af	kk: Support VK_KHR_unified_image_layouts Metal has no concept of image layouts, and we don't care about them. Reviewed-by: Aitor Camacho <aitor@lunarg.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41613>	2026-05-21 09:59:38 +00:00
squidbus	f52f7bf8d5	kk: Support attachment feedback loop extensions Metal GPU image optimization is disabled for attachment feedback usage since it causes some CTS flakes. Reviewed-by: Aitor Camacho <aitor@lunarg.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41613>	2026-05-21 09:59:38 +00:00
squidbus	2a119991f6	kk: Support VK_KHR_shader_fma Reviewed-by: Aitor Camacho <aitor@lunarg.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41692>	2026-05-21 09:36:35 +00:00
squidbus	33ce3040e6	kk: Support VK_EXT_host_image_copy Metal provides straightforward ways to copy an image to/from memory, and image-to-image copies can be implemented by chaining them. Note that host copy of combined depth-stencil is not supported, as Metal does not allow CPU copy for these formats. Additionally, GPU optimized contents are not allowed with host image copy usage; CTS directly initializes the raw memory of optimized images to random invalid data, which appears to decompress differently on GPU vs CPU and fail. Reviewed-by: Aitor Camacho <aitor@lunarg.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41714>	2026-05-21 02:06:46 -07:00
squidbus	76125cb7af	kk: Separate linear and GPU optimized image layout properties `linear` controls whether the created image is in linear layout, and `optimized_layout` controls only the `allowGPUOptimizedContents` Metal property. Reviewed-by: Aitor Camacho <aitor@lunarg.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41714>	2026-05-21 02:06:46 -07:00
Valentine Burley	8cc3ca6231	turnip/ci: Add nightly Android CTS job The job runs the following modules with ANGLE: - CtsGraphicsTestCases - CtsNativeHardwareTestCases - CtsSkQPTestCases Signed-off-by: Valentine Burley <valentine.burley@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41440>	2026-05-21 06:21:02 +00:00
Valentine Burley	03a84a1e03	ci/android: Add arm64 support for Android CTS Android CTS for both arm64 and x86_64 Android targets always ships with an x86_64 host JDK. Tradefed supports running on arm64 hosts though, so provide a native JDK by installing Debian's openjdk-21-jdk-headless package on arm64. Signed-off-by: Valentine Burley <valentine.burley@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41440>	2026-05-21 06:21:02 +00:00
Valentine Burley	aeb40ed23b	ci/android: Update Android CTS to android-cts-16.0_r5 The latest Android 16 release. Signed-off-by: Valentine Burley <valentine.burley@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41440>	2026-05-21 06:21:02 +00:00
Valentine Burley	5dacbdc3ce	ci: Bump ci-deb-repo revision to update aapt Update aapt from the Android 14-based version in Trixie to a custom fork based on the upstream Android 16 QPR2 branch, which fixes the following error spam on arm64: E aapt2 : Entry offset at index 0 points outside the Type's boundaries E aapt2 : Entry offset at index 1 points outside the Type's boundaries E aapt2 : Entry offset at index 2 points outside the Type's boundaries E aapt2 : Entry offset at index 3 points outside the Type's boundaries E aapt2 : Entry offset at index 4 points outside the Type's boundaries E aapt2 : Entry offset at index 5 points outside the Type's boundaries E aapt2 : Entry offset at index 6 points outside the Type's boundaries E aapt2 : Entry offset at index 7 points outside the Type's boundaries E aapt2 : Entry offset at index 8 points outside the Type's boundaries E aapt2 : Entry offset at index 9 points outside the Type's boundaries E aapt2 : Entry offset at index 10 points outside the Type's boundaries E aapt2 : Entry offset at index 11 points outside the Type's boundaries E aapt2 : Entry offset at index 12 points outside the Type's boundaries E aapt2 : Entry offset at index 13 points outside the Type's boundaries E aapt2 : Entry at index 14 is too small (0) E aapt2 : Index 15 points to entry with unaligned offset 0x03080001 Signed-off-by: Valentine Burley <valentine.burley@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41440>	2026-05-21 06:21:02 +00:00
Timothy Arceri	ca88f851c8	ac/nir/lower_tex_coord: basic lower tex coord test Tests issue from: https://gitlab.freedesktop.org/mesa/mesa/-/work_items/15494 Assisted-by: ChatGPT (GPT-5.5) Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41666>	2026-05-21 00:54:56 +00:00
Mike Blumenkrantz	eb5bb61f87	lavapipe: enable some forgotten ds3 states these are already used by shader objects Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41683>	2026-05-20 22:53:35 +00:00
Collabora's Gfx CI Team	18ba81e5b6	Uprev Piglit to 6fd29fe44f8857b876a67bee962919635f22ecc8 `11ce9eb56e...6fd29fe44f` Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40989>	2026-05-20 21:37:44 +00:00
Sergi Blanch Torne	7831892158	xfiles: update before uprev Running jobs in the uprev, some results don't come from the uprev itself, but they are already in the mesa nightly run: https://gitlab.freedesktop.org/mesa/mesa/-/pipelines/1670759. Signed-off-by: Sergi Blanch Torne <sergi.blanch.torne@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40989>	2026-05-20 21:37:44 +00:00
Mike Blumenkrantz	c7758681f3	zink: rework custom sample locations this is more consistent and comprehensible Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41707>	2026-05-20 21:15:48 +00:00
Christoph Neuhauser	7eba054c5b	anv: Add compute only divergent atomics fusion optimization for Blender Blender uses atomic operations as part of its virtual shadow mapping implementation. Virtual shadow mapping page tagging in compute shaders benefits from divergent atomics fusion, while fragment shaders doing the atomic raster step in general have worse performance with this optimization turned on. Thus, an option is added to only apply divergent atomics fusion to compute shaders in ANV, and this option is enabled for Blender. Initial support for divergent atomics fusion optimization in ANV was added in https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40631. Signed-off-by: Christoph Neuhauser <christoph.neuhauser@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41706>	2026-05-20 19:29:15 +00:00
Jordan Justen	28f6a442c6	brw/compact: Precompact using 2src fields on 3src instructions In shader-db, with `-p skl`, shaders/0ad/12.shader_test does not compact an instruction because precompact overwrites portions of the instruction. (Treating the three source instruction as a two source when accessing instruction fields.) This instruction could be compacted: mad(8) g65<1>F g61<4,4,1>F g64<4,4,1>F -g17<4,4,1>F { align16 1Q }; But, since precompact erroneously sets bits, the instruction isn't compacted. Fossil testing: * Tested with `0a3f3fd193` ("brw: drop unused color_outputs_valid key") reverted, as fossils are currently producing inconsistent results otherwise. * Tested skl, icl, dg2, mtl, lnl, bmg and ptl. Only skl had a change. SKL: Totals: CodeSize: 8335219296 -> 8320248992 (-0.18%) Totals from 359508 (14.42% of 2492689) affected shaders: CodeSize: 2838254352 -> 2823284048 (-0.53%) Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41588>	2026-05-20 11:52:52 -07:00
Mike Blumenkrantz	65b75137b4	zink: disable implicit sync handling for qcom proprietary Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41710>	2026-05-20 17:58:16 +00:00
Karol Herbst	8735aa72a1	nak: optimize iadds with an uniform operand in iadds of address calculations Instead of doing the iadd manually we can use the uniform slot of the ld/st/atom instruction getting rid of the iadd altogether. Additionally for global memory we can also consume a 32 bit offset instead of requiring it to be 64 bit. Totals from 158539 (13.07% of 1212873) affected shaders: CodeSize: 2308216336 -> 2242231136 (-2.86%); split: -2.86%, +0.00% Number of GPRs: 8682436 -> 8662675 (-0.23%); split: -0.26%, +0.04% SLM Size: 238816 -> 238604 (-0.09%) Static cycle count: 2169063422 -> 2147747544 (-0.98%); split: -0.99%, +0.01% Spills to memory: 25845 -> 25799 (-0.18%); split: -0.20%, +0.02% Fills from memory: 25845 -> 25799 (-0.18%); split: -0.20%, +0.02% Spills to reg: 45053 -> 45273 (+0.49%); split: -0.04%, +0.53% Fills from reg: 36385 -> 36757 (+1.02%); split: -0.04%, +1.06% Max warps/SM: 6027232 -> 6034616 (+0.12%); split: +0.12%, -0.00% Reviewed-by: Mel Henning <mhenning@darkrefraction.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39384>	2026-05-20 17:23:33 +00:00
Karol Herbst	4cebda7f66	nak: add UGPR/GPR lowering for load/store/atom instructions This tries to handle all combinations we might run into. We should rely on previous optimizations so that the more difficult cases never happen. As a side benefit instead of lowering a UGPR to a GPR, it will now be moved to the UGPR slot. Totals from 258010 (21.27% of 1212873) affected shaders: CodeSize: 3742700224 -> 3576740928 (-4.43%); split: -4.44%, +0.01% Number of GPRs: 13606055 -> 13496463 (-0.81%); split: -0.86%, +0.05% SLM Size: 589740 -> 589660 (-0.01%) Static cycle count: 3271547493 -> 3272550831 (+0.03%); split: -0.47%, +0.50% Spills to memory: 56180 -> 56136 (-0.08%) Fills from memory: 56180 -> 56136 (-0.08%) Spills to reg: 108211 -> 110013 (+1.67%); split: -0.63%, +2.30% Fills from reg: 99216 -> 100471 (+1.26%); split: -0.30%, +1.56% Max warps/SM: 9921228 -> 9972060 (+0.51%); split: +0.52%, -0.00% Reviewed-by: Mel Henning <mhenning@darkrefraction.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39384>	2026-05-20 17:23:33 +00:00
Karol Herbst	273204e24e	nir: add uniform address to nvidia IO intrinsics Adding the zero constants has a minor impact on stats due to some unlucky interactions with nir_opt_cse, opt_instr_sched_prepass and assign_regs. Totals from 61 (0.01% of 1212873) affected shaders: CodeSize: 1044720 -> 1047472 (+0.26%); split: -0.00%, +0.27% Static cycle count: 1198932 -> 1198490 (-0.04%); split: -0.07%, +0.04% Reviewed-by: Mel Henning <mhenning@darkrefraction.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39384>	2026-05-20 17:23:33 +00:00
Karol Herbst	22daaddd67	nak: wire up UGPR Ld/St/Atom encoding Reviewed-by: Mel Henning <mhenning@darkrefraction.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39384>	2026-05-20 17:23:33 +00:00
Karol Herbst	b1323de44a	nak/sm70: add helper for memory load store addresses This also makes the selection of 32 vs 64 bit addresses based on the actual source in the IR. Reviewed-by: Mel Henning <mhenning@darkrefraction.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39384>	2026-05-20 17:23:33 +00:00
Karol Herbst	32fd51687d	nir: add nir_intrinsic_cmat_load_shared_nv to nir_get_io_offset_src_number Reviewed-by: Mel Henning <mhenning@darkrefraction.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39384>	2026-05-20 17:23:32 +00:00
Konstantin Seurer	cfdaa26a64	vulkan,spirv: Update spec to 1.4.352 Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41682>	2026-05-20 15:36:39 +00:00

222863 commits