fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-16 09:48:16 +02:00

Author	SHA1	Message	Date
Alyssa Rosenzweig	5ebf0c9161	jay: elide atomic dests. SIMD16 results are kinda noisy, but this is obviously the right thing to do. Totals from 45 (1.70% of 2647) affected shaders: Instrs: 59182 -> 59194 (+0.02%); split: -0.11%, +0.14% CodeSize: 905200 -> 904752 (-0.05%); split: -0.17%, +0.12% Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41510>	2026-05-12 22:46:31 +00:00
Alyssa Rosenzweig	b3fe01e2c1	jay: fix bfn with 0xffff constant. Awkward. Totals from 128 (4.84% of 2647) affected shaders: Instrs: 258121 -> 257970 (-0.06%); split: -0.07%, +0.01% CodeSize: 3662400 -> 3661792 (-0.02%); split: -0.14%, +0.12% Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41510>	2026-05-12 22:46:30 +00:00
Alyssa Rosenzweig	c5cee5d973	jay: add JAY_DEBUG=noacc option. Can help when debugging RA. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41510>	2026-05-12 22:46:30 +00:00
Alyssa Rosenzweig	9dbaaecb74	jay: swap predication/acc pass order. Lets us use more accumulators; I think this is well motivated. Saw this in a test shader. Totals from 242 (9.14% of 2647) affected shaders: Instrs: 1365060 -> 1365035 (-0.00%); split: -0.00%, +0.00% CodeSize: 20678592 -> 20680096 (+0.01%); split: -0.01%, +0.02% Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41510>	2026-05-12 22:46:30 +00:00
Ian Romanick	907cc49c32	brw: Calculate divergence before brw_from_nir. We were previously assuming that potentially stale divergence data was valid. On some paths the register pressure estimator would recalculate this, but, as is obvious from the results, not always. v2: Add an assertion in brw_from_nir_emit_impl to ensure we don't end up in this situation again. v3: Call nir_divergence_analysis from brw_nir_lower_deferred_urb_writes. This fixes assertion failures (the assertion added in v2) in basically every graphics shader. The alternative was to call it from brw_compile_vs, brw_compile_gs, and brw_compile_tes. shader-db: All Intel platforms had similar results. (Lunar Lake shown) total instructions in shared programs: 17050403 -> 17054033 (0.02%) instructions in affected programs: 296344 -> 299974 (1.22%) helped: 0 / HURT: 376 total cycles in shared programs: 876063126 -> 875817316 (-0.03%) cycles in affected programs: 78627328 -> 78381518 (-0.31%) helped: 91 / HURT: 276 LOST: 1 GAINED: 10 fossil-db: All Intel platforms had similar results. (Lunar Lake shown) Totals: Instrs: 913770429 -> 916075391 (+0.25%); split: -0.00%, +0.26% CodeSize: 14647414640 -> 14726176320 (+0.54%); split: -0.02%, +0.56% Cycle count: 102308091527 -> 102290664775 (-0.02%); split: -0.26%, +0.24% Spill count: 3469632 -> 3469124 (-0.01%); split: -0.08%, +0.07% Fill count: 5007038 -> 4998674 (-0.17%); split: -0.51%, +0.34% Max live registers: 192568853 -> 192595355 (+0.01%); split: -0.00%, +0.02% Max dispatch width: 48713168 -> 48712880 (-0.00%); split: +0.00%, -0.00% Non SSA regs after NIR: 140252767 -> 140253718 (+0.00%) Totals from 223099 (11.11% of 2007586) affected shaders: Instrs: 314077245 -> 316382207 (+0.73%); split: -0.01%, +0.75% CodeSize: 5335583824 -> 5414345504 (+1.48%); split: -0.06%, +1.54% Cycle count: 45868025821 -> 45850599069 (-0.04%); split: -0.58%, +0.54% Spill count: 2062649 -> 2062141 (-0.02%); split: -0.14%, +0.11% Fill count: 3343019 -> 3334655 (-0.25%); split: -0.76%, +0.51% Max live registers: 36762498 -> 36789000 (+0.07%); split: -0.02%, +0.09% Max dispatch width: 5542224 -> 5541936 (-0.01%); split: +0.03%, -0.03% Non SSA regs after NIR: 43727142 -> 43728093 (+0.00%) Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> [v1] Fixes: `1bff4f93ca` ("brw: Basic infrastructure to store convergent values as scalars") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41370>	2026-05-11 21:03:19 +00:00
Caio Oliveira	d08d345686	brw: Remove references to SIMD4x2. In Gfx9 the enum value was changed to mean SIMD8 double precision, so drop the old unused enum. At least on Gfx9 there is an extension bit to select the old SIMD4x2 mode, so we can recover it if we ever need it in the future. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41457>	2026-05-11 20:16:02 +00:00
Iván Briano	756343271a	anv: add and use a drirc option to enable FullyCovered for vkd3d Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Tested-by: Caleb Callaway <caleb.callaway@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38879>	2026-05-11 18:15:50 +00:00
Iván Briano	2ad92e3ea4	anv/brw: handle FullyCoveredEXT Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Tested-by: Caleb Callaway <caleb.callaway@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38879>	2026-05-11 18:15:50 +00:00
Iván Briano	58006eaaa4	anv/brw: add conservative raster on/off to FS_CONFIG. FullyCovered will need to know if conservative rasterization is enabled, so pass it on to the shader. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Tested-by: Caleb Callaway <caleb.callaway@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38879>	2026-05-11 18:15:50 +00:00
Iván Briano	fea8830946	intel/brw: add load_frag_shading_rate_intel. Add a new intrinsic to read the raw shading rate provided to the FS payload, and lower load_frag_shading_rate in NIR using it. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Tested-by: Caleb Callaway <caleb.callaway@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38879>	2026-05-11 18:15:49 +00:00
Iván Briano	5383afadbf	intel/brw: add load_msaa_rate_intel intrinsic Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Tested-by: Caleb Callaway <caleb.callaway@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38879>	2026-05-11 18:15:49 +00:00
Iván Briano	3448f3ce4a	intel/brw: add load_coverage_mask_intel intrinsic. We'll need the raw coverage mask provided to the fragment shader in a future patch. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Tested-by: Caleb Callaway <caleb.callaway@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38879>	2026-05-11 18:15:49 +00:00
Lionel Landwerlin	7d3b62e13d	anv: only load fp64 software shader when needed Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14665 Reviewed-by: Allen Ballway <ballway@google.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39341>	2026-05-11 08:27:14 +00:00
Lionel Landwerlin	beb0ffc069	anv: sweep the NIR fp64 shader before keeping it on the device Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Allen Ballway <ballway@google.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39341>	2026-05-11 08:27:14 +00:00
Lionel Landwerlin	19997bc245	blorp: only request fp64 shader when required Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Allen Ballway <ballway@google.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39341>	2026-05-11 08:27:14 +00:00
Lionel Landwerlin	91cf85906b	blorp: stop requesting the fp64 shader for ELK. Drivers using blorp on ELK platforms don't need the special color->depth conversion path that needs 64-bit floating-point math. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Allen Ballway <ballway@google.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39341>	2026-05-11 08:27:14 +00:00
Valentine Burley	2c4ed4f90d	ci: Add missing rule for new trace replay config files Signed-off-by: Valentine Burley <valentine.burley@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41443>	2026-05-11 08:02:05 +00:00
Hyunjun Ko	ff3e0ec5f4	anv/video: fix up H.264/H.265 encode session parameters to match advertised caps. This was initially meant to fix an issue when apps set wrong CTB sizes. In addition, we need to align things with the advertised caps. This is inspired by radv. The relevant discussion is here: https://github.com/KhronosGroup/Vulkan-Video-Samples/pull/169 Signed-off-by: Hyunjun Ko <zzoon@igalia.com> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41433>	2026-05-11 07:12:33 +00:00
Caio Oliveira	46cd7b6e28	brw: Move brw_prog_data_init to a different file. The generator code will be reworked, so remove this unrelated function from it. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41458>	2026-05-10 00:07:15 +00:00
Caio Oliveira	2273533504	brw: Fix some indentation in brw_generator.cpp. Will reduce noise in later changes. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41459>	2026-05-09 16:40:32 -07:00
Caio Oliveira	b1c3e36fe3	intel/dev: Expose list of known platform names Acked-by: Iván Briano <ivan.briano@intel.com> Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41007>	2026-05-09 22:00:54 +00:00
Michael Cheng	24aa7715cb	intel/ds: Label selected draw events with vertex count. Format draw and draw_indexed Perfetto events with their vertex count. For draw_indirect and draw_indexed_indirect, include the draw count when indirect tracing is enabled (MESA_GPU_TRACES=indirects), otherwise fall back to the static name. Signed-off-by: Michael Cheng <michael.cheng@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41374>	2026-05-08 19:51:48 +00:00
Michael Cheng	e8b6f61a50	intel/ds: Label compute events with dispatch dimensions in Perfetto. Format compute events as compute(x,y,z) using the end-payload group dimensions. Trailing dimensions that equal 1 are omitted to keep labels concise; e.g. compute(128,1,1) becomes compute(128). For compute_indirect, the dispatch dimensions are not known at command record time since they live in GPU memory as a VkDispatchIndirectCommand. The u_trace framework reads them back at trace flush time via the is_indirect mechanism: the GPU address is recorded alongside the tracepoint, and u_trace copies the pointed-to struct into indirect_data once the GPU has finished. The same trailing-1 trimming is applied when indirect tracing is enabled (MESA_GPU_TRACES=indirects); otherwise the event falls back to the static "compute_indirect" name. Signed-off-by: Michael Cheng <michael.cheng@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41374>	2026-05-08 19:51:48 +00:00
Michael Cheng	ecbc6625cf	intel/ds: Add end_event_dyn() and CREATE_DUAL_EVENT_CALLBACK_DYN macro. Add a separate end_event_dyn() that takes a std::string by value for dynamic event names. The [=] lambda capture deep-copies the string into the closure, avoiding a dangling pointer when the Trace() continuation runs after the caller's stack frame is gone. The existing end_event() with const char* remains for string literals and long-lived pointers (e.g. payload->str), where no copy is needed. CREATE_DUAL_EVENT_CALLBACK_DYN formats the event name via snprintf and passes the result as a std::string to end_event_dyn(). Follow-up patches will use this macro to label events with runtime dimensions. Signed-off-by: Michael Cheng <michael.cheng@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41374>	2026-05-08 19:51:48 +00:00
Lionel Landwerlin	d2732faac0	anv: enable VK_EXT_swapchain_compression_control when possible Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Nanley Chery <nanley.g.chery@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40429>	2026-05-08 13:24:47 +03:00
Lionel Landwerlin	7094ad91e3	anv: implement missing device image property compression filtering. We want to avoid reporting support for disabling compression with a compressed DRM modifier. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `c94cd1235f` ("anv: implement VK_EXT_image_compression_control") Reviewed-by: Nanley Chery <nanley.g.chery@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40429>	2026-05-08 13:24:28 +03:00
Paulo Zanoni	ff5b909511	anv/sparse: bring back our (limited) support for depth/stencil. The ambiguity of the Vulkan spec was clarified, and we don't need to support sparse depth/stencil with exactly the same number of samples as non-sparse. If you want to pass CTS, you'll need VK-GL-CTS commit 03976477f521 ("Don't require more than VK_SAMPLE_COUNT_1_BIT for non-color sparse resident images"). This is essentially a revert of `d5da6980d3` ("anv/sparse: don't support depth/stencil with sparse") and `7b337e214d` ("anv: remove dead code"). Thanks to Iván Briano for working with Khronos to get clarification on the spec and for implementing the VK-GL-CTS fix. Reviewed-by: Iván Briano <ivan.briano@intel.com> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37423>	2026-05-07 23:47:52 +00:00
Paulo Zanoni	7eab94d542	intel/nir: fix sparse shadow comparison for BRW. While Jay overwrites sparse_tex->op with the newer opcodes that only return red and the sparse stuff, BRW keeps using the original opcode of the cloned instruction, so it can't change def->num_components. This was not previously detectable since we did not have sparse enabled for depth/stencil on Anv for a while. A patch to re-enable that was proposed a while ago (MR !37423) but never merged; a recent attempt (by me) to merge it detected this regression. Let's fix the regression first, then we can finally re-enable sparse depth/stencil support in Anv, hopefully. Fixes: `7468261d3d` ("intel/nir: Make intel_nir_lower_sparse work for either brw or jay") Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37423>	2026-05-07 23:47:51 +00:00
Tapani Pälli	c540405ca3	anv: use INTEL_NEEDS_WA_14025112257 define for workaround Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41281>	2026-05-07 16:20:29 +00:00
Tapani Pälli	c381b4fdd4	intel/dev: update mesa_defs.json from workaround database. This removes 18042479026, as we don't utilize BRW_AOP_MOV in the compiler, and adds missing Xe2 entries for 14025112257. Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41281>	2026-05-07 16:20:29 +00:00
Lionel Landwerlin	62b890046f	anv: remove old entrypoints Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40387>	2026-05-07 15:49:20 +00:00
Lionel Landwerlin	f123030dcd	anv: implement VK_KHR_device_address_commands Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40387>	2026-05-07 15:49:20 +00:00
Lionel Landwerlin	7adece7ce0	anv: fixup null address check Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: mesa-stable Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40387>	2026-05-07 15:49:19 +00:00
Kenneth Graunke	2729b1608f	brw: Limit SIMD width based on NIR rather than first backend compile. I originally added this mechanism to have the first (SIMD8) compile note that certain features were in use which would prevent SIMD16/32 from compiling, so we could skip the work of trying those. But these days, there aren't many cases, and the ones we have are easily detectable based on the NIR. We can detect it earlier without even having to do the SIMD8 compile. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41122>	2026-05-07 08:29:40 +00:00
Kenneth Graunke	c5928d40ae	brw: Drop dead code from dispatch limit check for dual source blending. We checked that ver is 11 or 12. It can't be >= 20. This is dead code. Dual source blending on Xe2 does not have native SIMD32 RT write message support, but SIMD splitting is currently lowering it to low/high SIMD16 message pairs when using SIMD32 dispatch. I'm not aware of any of the hardware errata from previous platforms still applying. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41122>	2026-05-07 08:29:40 +00:00
Kenneth Graunke	599d26db00	brw: Set prog_data::dual_src_blend from NIR outputs written bitfield. Simpler and set earlier. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41122>	2026-05-07 08:29:40 +00:00
Kenneth Graunke	afb97ff2af	brw: Switch FS outputs to semantic IO and FRAG_RESULT_DUAL_SRC_BLEND. The new FRAG_RESULT_DUAL_SRC_BLEND option is easier to work with than looking for FRAG_RESULT_DATA0 with an index of 1. This also means we no longer care about the dual source blend index, and can just use the FRAG_RESULT location. That cascades to meaning we no longer have to store a tuple in driver_location. And, if we just need location, we can avoid populating that at all and use nir_io_semantics to get it. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41122>	2026-05-07 08:29:40 +00:00
Kenneth Graunke	fbaa5ad0c3	iris: Implement force_dual_color_blend_by_location via NIR. We can just have iris look at its own program key and change the fragment shader output variable's location/index in the NIR. By doing this before lowering fragment shader outputs, the rest of the output lowering does the right thing, and the backend no longer has to consider hacks for broken OpenGL apps. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41122>	2026-05-07 08:29:40 +00:00
Calder Young	efc6a3053d	anv: Fix some usage flags not propagated to ISL for explicit layouts. Some vulkancts tests rely on vkGetImageMemoryRequirements to return the same exact size after exporting and importing an image. This broke when we started adding padding to sampled surfaces to manage overfetch, because the texture usage flag does not get applied to the ISL surface when the image is recreated using an explicit layout. Fixes: `8d13628f7` ("isl: Add additional alignment/padding requirements to prevent overfetch") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: mesa-stable Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41376>	2026-05-07 00:02:43 +00:00
Alyssa Rosenzweig	5636a57f60	jay/lower_scoreboard: use SYNC.allrd/allwr. This collapses piles of silliness. Totals: CodeSize: 71626288 -> 70710000 (-1.28%) Totals from 1634 (61.73% of 2647) affected shaders: CodeSize: 66319376 -> 65403088 (-1.38%) Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41398>	2026-05-06 23:25:26 +00:00
Alyssa Rosenzweig	c1dc9d3b1a	jay/lower_scoreboard: be the sole emitter of SYNC. This gets closer to something we can schedule and avoids some pointless syncs. Totals from 491 (18.55% of 2647) affected shaders: Instrs: 602994 -> 602946 (-0.01%) CodeSize: 9063888 -> 9015904 (-0.53%) Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41398>	2026-05-06 23:25:26 +00:00
Alyssa Rosenzweig	0885ed10f5	jay/lower_scoreboard: use .src annotations. This is less heavy-handed, avoiding unnecessary stalls after SENDs in a bunch of common cases. The stats (SIMD32) are: Totals: CodeSize: 70345392 -> 71674272 (+1.89%) Totals from 1774 (67.02% of 2647) affected shaders: CodeSize: 67359248 -> 68688128 (+1.97%) What's happening here is we are inserting extra SYNC.nop instructions in a bunch of cases for the .src preceding the eventual .dst. However, putting aside the i-cache impact for a moment, this is showing the optimization doing what it should (deferring dst syncs and inserting cheaper src syncs first). So this should be positive in reality despite the negative stat impact. The most-hurt shaders pool up SYNC.nops at the end of blocks due to local-only SWSB and the lack of a SYNC.allwr optimization. The latter is added later in this MR. The former is planned. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41398>	2026-05-06 23:25:25 +00:00
Alyssa Rosenzweig	130e724d5e	jay/lower_scoreboard: refactor SYNC.nop insertion for next commit Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41398>	2026-05-06 23:25:25 +00:00
Alyssa Rosenzweig	1ecd75a397	jay/lower_scoreboard: fix tracking for A@* and *@7. Update the tracking with what we actually waited on, not what we ideally wanted to wait on. Reduces extra annotations in some cases. SIMD32: Totals from 194 (7.33% of 2647) affected shaders: CodeSize: 14473840 -> 14469088 (-0.03%) Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41398>	2026-05-06 23:25:25 +00:00
Alyssa Rosenzweig	93edf9a3fd	jay/lower_scoreboard: refactor wait pipe code for next commit. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41398>	2026-05-06 23:25:25 +00:00
Alyssa Rosenzweig	18e09858eb	jay/lower_scoreboard: elide more dependencies. IGC does these optimizations and I think they should be safe given my mental model. Given a sequence like: r0 = add.f32 r1, r2; r1 = add.f32 r3, r4. Each ALU pipe is pipelined but in-order. Therefore, the second add cannot possibly complete before the first add, so it cannot write r1 before the first add reads r1, so we can elide the write-after-read dependency. That in turn avoids a pipeline bubble between the two instructions. Ditto for write-after-write. Similarly, we can elide a dependency within an in-order pipe if the distance is too great, since the maximum pipeline length is finite. Note that if there are cross-pipe dependencies, we do need the annotation, since the pipes themselves are parallel. SIMD32: Totals from 58 (2.19% of 2647) affected shaders: CodeSize: 3316592 -> 3315056 (-0.05%); split: -0.05%, +0.00% Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41398>	2026-05-06 23:25:25 +00:00
Alyssa Rosenzweig	e4dc161277	jay: assign accumulators post-RA. Greedy post-RA substitution pass, similar to IGC's AccSubstitution pass. Stats together with the previous commits. SIMD16: Totals from 2209 (83.45% of 2647) affected shaders: Instrs: 2701029 -> 2696350 (-0.17%) CodeSize: 39166720 -> 40372272 (+3.08%); split: -0.36%, +3.44% SIMD32: Totals from 2211 (83.53% of 2647) affected shaders: Instrs: 4691165 -> 4641188 (-1.07%) CodeSize: 69365792 -> 69341616 (-0.03%); split: -0.50%, +0.47% The instruction count reduction is from RA shuffle code getting coalesced via accumulators. The code size changes are from: * Fewer moves from the instr count reduction (helped) * Smaller MADs encoded as MACs (helped) * Fewer SYNC.nop due to fewer scoreboarding annotations (helped) * Less compaction due to explicit accumulator operands (hurt) I expect significant cycle count changes from this but we don't have a cycle model wired up yet, so reading the assembly will have to do. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41398>	2026-05-06 23:25:25 +00:00
Alyssa Rosenzweig	8b324591d1	jay: move simd32 deswizzling to float pipe for more accumulator usage. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41398>	2026-05-06 23:25:25 +00:00
Alyssa Rosenzweig	712719a2ae	jay: do moves on the float pipe where possible. This allows us to use accumulators more. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41398>	2026-05-06 23:25:25 +00:00
Alyssa Rosenzweig	6f2b1cece6	jay: model MAC Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41398>	2026-05-06 23:25:25 +00:00


16064 commits
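The label format described in e8b6f61a50 (compute(x,y,z) with trailing dimensions equal to 1 omitted) can be sketched as below. This is a minimal illustration, not the intel/ds code. The helper name compute_label is hypothetical, and so is the assumption that at least the x dimension is always kept, so a 1x1x1 dispatch reads compute(1).

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper (not the actual intel/ds implementation): builds a
// Perfetto-style label from the dispatch group dimensions, dropping trailing
// dimensions equal to 1 while always keeping the x dimension (an assumption).
std::string compute_label(uint32_t x, uint32_t y, uint32_t z)
{
   const uint32_t dims[3] = {x, y, z};
   int count = 3;

   // Trim only *trailing* 1s; an interior 1 such as compute(2,1,3) is kept.
   while (count > 1 && dims[count - 1] == 1)
      count--;

   std::string label = "compute(";
   for (int i = 0; i < count; i++) {
      if (i > 0)
         label += ",";
      label += std::to_string(dims[i]);
   }
   label += ")";
   return label;
}
```

For example, compute_label(128, 1, 1) yields "compute(128)", matching the example in the commit message. compute_label(1, 4, 1) keeps the leading 1 and yields "compute(1,4)".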
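The lifetime issue that ecbc6625cf addresses can be illustrated with a small stand-in for a deferred trace continuation. A name formatted into a stack buffer would dangle if only its const char* were captured. Taking a std::string by value and capturing it with [=] gives the closure its own copy. The sketch below is hypothetical: the pending queue and flush_events() stand in for Perfetto's Trace() machinery, record_draw() only mimics what CREATE_DUAL_EVENT_CALLBACK_DYN is described as doing, and the "draw(N)" format is an assumption.

```cpp
#include <cassert>
#include <cstdio>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for deferred trace continuations: callbacks are
// queued now and run later, after the recording function has returned.
static std::vector<std::function<std::string()>> pending;

// Analogue of end_event_dyn(): the by-value parameter plus the [=] capture
// copies the characters into the closure, so the callback owns them.
void end_event_dyn(std::string name)
{
   pending.push_back([=]() { return name; });
}

// Analogue of the _DYN macro: format into a stack buffer with snprintf,
// then hand the result over as a std::string.
void record_draw(unsigned vertex_count)
{
   char buf[64];
   std::snprintf(buf, sizeof(buf), "draw(%u)", vertex_count);
   end_event_dyn(buf);  // copied into a std::string; buf may now go away
}

// Run and clear all deferred callbacks, returning the event names in order.
std::vector<std::string> flush_events()
{
   std::vector<std::string> names;
   for (auto &cb : pending)
      names.push_back(cb());
   pending.clear();
   return names;
}
```

Capturing buf itself (a const char* into a dead stack frame) instead of the std::string copy would be undefined behavior once record_draw() returns. That is why the commit keeps the const char* end_event() only for string literals and other long-lived pointers.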
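The dependency-elision reasoning in 18e09858eb can be written as a small decision function. This is a sketch of the rule as the commit message states it, not jay's actual lower_scoreboard code:

- A write-after-read or write-after-write hazard within one in-order pipe needs no annotation.
- A cross-pipe dependency always needs one.
- A same-pipe read-after-write can be dropped once the producer is further back than the pipeline can hold.

The two-pipe enum, the needs_annotation() name and the kMaxPipeDepth value of 7 are all hypothetical; the real limit is hardware-specific.

```cpp
#include <cassert>

// Hypothetical two-pipe subset; real hardware has more ALU pipes.
enum class Pipe { Float, Int };

// Kinds of register hazard between an earlier and a later instruction.
enum class DepKind { RAW, WAR, WAW };

// Assumed maximum in-flight depth of an in-order pipe (illustrative only).
constexpr int kMaxPipeDepth = 7;

// Returns true if a scoreboard annotation is still needed for a dependency
// from an earlier instruction to one that issues `distance` instructions later.
bool needs_annotation(DepKind kind, Pipe earlier, Pipe later, int distance)
{
   // Pipes run in parallel, so across pipes the later instruction may finish
   // first: always keep the annotation.
   if (earlier != later)
      return true;

   // Same in-order pipe: the later instruction cannot complete (and thus
   // write its destination) before the earlier one reads or writes it.
   if (kind == DepKind::WAR || kind == DepKind::WAW)
      return false;

   // Same-pipe RAW: a producer further back than the pipeline can hold must
   // already have completed.
   return distance <= kMaxPipeDepth;
}
```

Applied to the commit's example (two float adds where the second overwrites r1, which the first reads), needs_annotation(DepKind::WAR, Pipe::Float, Pipe::Float, 1) returns false, so the annotation and its pipeline bubble disappear.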