fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-06-21 03:48:22 +02:00

Author	SHA1	Message	Date
Sagar Ghuge	b00f00a87a	intel: Add debug option to dump out parent-child map This commit adds new debug options to dump out parent-child relationship map using INTEL_DEBUG=bvh_pcrel_map. Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Felix DeGrood <felix.j.degrood@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39617>	2026-06-15 18:27:03 -07:00
Sagar Ghuge	fc4458db9e	anv: Track leaf block offset map Track where is each leaf_id encoded in final BVH. It's a map of leaf_id == final_bvh_offset. This will help us to navigate the BVH layout in update pass. Leaf block offset will give us : Leaf id -> bvh block and parent-child map can be used for: bvh_block -> parent offset. Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Felix DeGrood <felix.j.degrood@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39617>	2026-06-15 18:27:03 -07:00
Sagar Ghuge	6ef5d17523	anv: Track parent-child map for BVH update This map stores parent BVH offset for each of their children. This will help us to walk the BVH layout later in the update pass. Since we are tracking block indexes, even with 2^32 large BVH size, we can have 2^26 max indices (each block 64B wide) that leaves us 6 bits in which we can track child slot index occupancies in parent. Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Felix DeGrood <felix.j.degrood@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39617>	2026-06-15 18:27:03 -07:00
Sagar Ghuge	1f4b6f21ea	anv/rt: Use constant BVH offset instead of pushing Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Felix DeGrood <felix.j.degrood@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39617>	2026-06-15 18:27:03 -07:00
Sagar Ghuge	4c9a4abb65	anv/rt: Extract common code in separate header Extract leaf encoding in encode.h and move some of the helper in anv_build_helper.h Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Felix DeGrood <felix.j.degrood@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39617>	2026-06-15 18:27:03 -07:00
Sagar Ghuge	86db940766	anv: Pass vk_acceleration_structure_build_state as param Pass vk_acceleration_structure_build_state as parameter to get_bvh_layout. Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Felix DeGrood <felix.j.degrood@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39617>	2026-06-15 18:27:03 -07:00
Sagar Ghuge	d9263b617c	anv/rt: Skip invalid node in child block count Previously, we were accounting invalid nodes as well in child block count which insert holes in the BVH memory. These holes in the memory would trigger the HW traversal hangs. Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Iván Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40858>	2026-06-16 01:00:55 +00:00
Kenneth Graunke	392ccf3517	intel/nir: Turn load_global_constant into load_global_intel too This allows us to use immediate offsets for these as well. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42237>	2026-06-16 00:00:02 +00:00
Kenneth Graunke	dde67bbaca	intel/nir: Improve address reuse in brw_nir_lower_immediate_offsets When a base is larger than the supported [min, max] bounds, we were clamping the base to that range, and adding the rest. This works, but it leaves us with a bunch of loads/stores with the same maximum base, and different iadds for addresses. This isn't ideal, because it means that every access has a different iadd. Instead, flip it around: now we calculate the largest multiple of (max + 1) which is less than base, and iadd that. Then the new base becomes the remaining portion, which is guaranteed to be <= max. With that, all loads/stores within a maximum-offset window share a common iadd which can be CSE'd, and use the immediate offset field for small deltas from there. Note that this should work for negative offsets beyond the minimum too; we do calculate a larger negative addition and then flip to positive immediate offsets. Cuts 11% of instructions from the first compute shader of dEQP-VK.ray_query.builtin.rayqueryterminate.comp.aabbs. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42237>	2026-06-16 00:00:02 +00:00
Alyssa Rosenzweig	8f4d469a6a	intel, nir: Add {load,store}_global_intel intrinsics These take a base offset that we can plug into the LSC extended descriptor immediate. This is essentially the same improvement that we made by switching to the ssbo_intel intrinsics. eliminates spilling in dEQP-VK.ray_query.builtin.rayqueryterminate.comp.aabbs Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42237>	2026-06-16 00:00:02 +00:00
Paulo Zanoni	d3371e22d7	brw: don't preprocess software doubles if opts->softfp64 is not set The Anv driver doesn't ever set opts->softfp64 for the preprocess stage (anv_shader_preprocess_nir()). The Vulkan preprocess stage is a "physical device" stage, and softfp64 requires the actual anv_device: see the comments for the preprocess_nir function pointer inside the definition of struct vk_device_shader_ops, and the definition of anv_ensure_fp64_shader(). It is only during anv_shader_compile() that we call anv_ensure_fp64_shader(), where we actually build and store the nir_shader we name fp64_nir. Then we have everything ready and we can call the nir_lower_doubles pass. To account for all that, just have brw check if opts->softfp64 is actually set, and disable the full_software lowering if we don't have it: otherwise we'll either segfault or hit the assert(softfp64) that is in lower_doubles_instr_to_soft() in nir_lower_double_ops.c. This prevents a segfault (or an assertion failure when in debug mode) when running DIRT 5 on Tiger Lake. Fixes: `7d3b62e13d` ("anv: only load fp64 software shader when needed") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42105>	2026-06-15 23:34:04 +00:00
Georg Lehmann	5134104c9c	nir/skip_helpers: don't require helpers for non uniform descriptors If the descriptor is allowed to be non uniform, we don't have to force helpers to keep it uniform. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42232>	2026-06-15 19:23:06 +00:00
Georg Lehmann	78206c06fe	nir/skip_helpers: keep descriptors uniform even for stores that skip helpers Before we only did this for loads, but the same logic applied here too. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42232>	2026-06-15 19:23:06 +00:00
Georg Lehmann	7516487df3	nir/skip_helpers: handle vendored store_scratch We might as well make sure that those backends don't break on future use. At least jay will probably use this pass. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42232>	2026-06-15 19:23:06 +00:00
Georg Lehmann	8b53692614	nir/skip_helpers: fix stores with ACCESS_INCLUDE_HELPERS These need all sources, including the data, to not skip helpers. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42232>	2026-06-15 19:23:06 +00:00
José Roberto de Souza	d3c50442f9	anv: Replace anv_descriptor_set_binding_layout::descriptor_data_sampler_size by a local variable Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42023>	2026-06-15 18:33:32 +00:00
José Roberto de Souza	ef26f1a592	anv: Support sampler state of different sizes A future GPU will have a larger size for the sampler state in GPU, so here doing the necessary adjustment to support sampler state of any size in run-time. For now ANV_SAMPLER_STATE_GPU_SIZE is doing a dumb check because without it compiler will complain that device is not used. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42023>	2026-06-15 18:33:32 +00:00
José Roberto de Souza	0e41204af2	anv: Fix memcpy overflows around sampler state This issue happens in a couple of places, but here is the main problem: ANV_SAMPLER_STATE_SIZE is 32 bytes long (no idea why), but SAMPLER_STATE in GPU is 16 bytes long. anv_sampler_state::state and anv_sampler_state::state_no_bc have 16 bytes of storage, but in some places we do a memcpy of ANV_SAMPLER_STATE_SIZE bytes, like in anv_GetDescriptorEXT(): memcpy(pDescriptor, sampler->state.state[0], ANV_SAMPLER_STATE_SIZE); So let's replace the magic numbers by macros, have CPU data with ANV_SAMPLER_STATE_SIZE size and only when copying to GPU copy the exact size that GPU expects with ANV_SAMPLER_STATE_GPU_SIZE. Cc: stable Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42023>	2026-06-15 18:33:32 +00:00
Alyssa Rosenzweig	2e169746ec	nir/opt_dead_cf: delete redundant returns/halts Cleans up the final halt in dEQP-VK.rasterization.frag_side_effects.color_at_beginning.terminate_invocation with the terminate lowering. O(1) for the function so that's pretty good. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42219>	2026-06-15 18:00:43 +00:00
Alyssa Rosenzweig	9cc686ac72	jay: rewrite demote/terminate/helper/halt handling * implement terminate * fix HALT brokenness on all shader stages (we need a real end block) * optimize demote codegen a ton * optimize gl_HelperInvocation/gl_SampleMask * optimize "all lanes demoted" via HALT.any * optimize scheduling of stores/atomics/demotes in FS * optimize some texturing with helper invocations Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42097>	2026-06-15 17:29:16 +00:00
Alyssa Rosenzweig	52d4d47edc	jay: track skip_helpers Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42097>	2026-06-15 17:29:15 +00:00
Alyssa Rosenzweig	483999e954	jay: clang-format Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42097>	2026-06-15 17:29:15 +00:00
Alyssa Rosenzweig	ec29c05907	jay: manually format jay_type_for_glsl_base_type Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42097>	2026-06-15 17:29:15 +00:00
Alyssa Rosenzweig	86ac591a5c	jay: autopep8 Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42097>	2026-06-15 17:29:15 +00:00
Kenneth Graunke	00aa817892	jay: Implement load_subgroup_size Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42097>	2026-06-15 17:29:15 +00:00
Alyssa Rosenzweig	7e0027444c	jay: forbid 8-bit immediate prop dEQP-VK.renderpasses.renderpass1.depth_stencil_write_conditions.stencil_terminate_initialize_d24unorm_s8ui Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42097>	2026-06-15 17:29:15 +00:00
Alyssa Rosenzweig	7dc69f747f	jay: cache message headers locally this is a little spicier than CSE, but well, the stats speak for themselves. SIMD16: Totals from 1150 (43.45% of 2647) affected shaders: Instrs: 1752063 -> 1671121 (-4.62%); split: -4.62%, +0.00% CodeSize: 24366528 -> 23326992 (-4.27%); split: -4.28%, +0.01% SIMD32: Totals from 1152 (43.52% of 2647) affected shaders: Instrs: 2008124 -> 1922714 (-4.25%); split: -4.27%, +0.02% CodeSize: 28563184 -> 27442624 (-3.92%); split: -3.95%, +0.02% Number of spill instructions: 12562 -> 12600 (+0.30%); split: -0.02%, +0.32% Number of fill instructions: 31496 -> 31545 (+0.16%); split: -0.01%, +0.16% Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42097>	2026-06-15 17:29:15 +00:00
Alyssa Rosenzweig	592022f989	jay: remove #include Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42097>	2026-06-15 17:29:15 +00:00
Alyssa Rosenzweig	94e141b374	jay: pool constants per block under pressure annoying compromise solution. SIMD16: Totals from 322 (12.16% of 2647) affected shaders: Instrs: 980848 -> 990315 (+0.97%); split: -0.02%, +0.98% CodeSize: 13477760 -> 13727264 (+1.85%); split: -0.05%, +1.90% Number of spill instructions: 170 -> 124 (-27.06%) Number of fill instructions: 378 -> 260 (-31.22%) SIMD32: Totals from 1105 (41.75% of 2647) affected shaders: Instrs: 2376617 -> 2400203 (+0.99%); split: -0.28%, +1.27% CodeSize: 33702928 -> 34234176 (+1.58%); split: -0.28%, +1.86% Number of spill instructions: 14487 -> 14250 (-1.64%) Number of fill instructions: 34343 -> 33745 (-1.74%); split: -1.75%, +0.01% Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42097>	2026-06-15 17:29:15 +00:00
Kenneth Graunke	c19607b505	jay: Still predicate Null RT store if everything is discarded Even if nothing is being written, we still need to avoid generating fragments for occlusion query purposes. Fixes dEQP-GLES31.functional.fbo.no_attachments.maximums.height as well as misrendering in Baldurs Gate 3 and Elder Scrolls Online. Fixes: `e7cfcf41f4` ("jay: Ignore RT store condition if there are no outputs") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42097>	2026-06-15 17:29:15 +00:00
Alyssa Rosenzweig	5e64954fe0	jay: introduce accumulators into the partition In SIMD16, map acc2/acc3 as extra GPRs. This gets us a pressure reduction. We leave acc0/acc1 reserved for mul_32 lowering and for parallel copy lowering, changing this would be very challenging due to the possibility of SIMD1 multiplies leading to uniform access on the accumulator => stuff blows up. But this is an easy win on select platforms. Note we still use acc2/acc3 for post-RA accumulator substitution, this just lets us also use them as panic registers. SIMD16: Totals from 784 (29.62% of 2647) affected shaders: Instrs: 1686724 -> 1686700 (-0.00%); split: -0.15%, +0.15% CodeSize: 23406952 -> 23409432 (+0.01%); split: -0.16%, +0.17% Number of spill instructions: 224 -> 174 (-22.32%) Number of fill instructions: 546 -> 382 (-30.04%) Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42097>	2026-06-15 17:29:15 +00:00
Alyssa Rosenzweig	091e6976d9	jay: allow more 3-src imms SIMD16: Totals from 2082 (78.66% of 2647) affected shaders: Instrs: 2349636 -> 2345856 (-0.16%); split: -0.23%, +0.06% CodeSize: 32796448 -> 32748448 (-0.15%); split: -0.30%, +0.15% SIMD32: Totals from 2081 (78.62% of 2647) affected shaders: Instrs: 2609349 -> 2604746 (-0.18%); split: -0.27%, +0.09% CodeSize: 37148624 -> 37077824 (-0.19%); split: -0.34%, +0.15% Number of spill instructions: 13104 -> 13094 (-0.08%) Number of fill instructions: 32677 -> 32662 (-0.05%) Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42097>	2026-06-15 17:29:15 +00:00
Alyssa Rosenzweig	e673e68ac3	jay/register_allocate: don't search a 2nd UGPR temp unused Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42097>	2026-06-15 17:29:15 +00:00
Alyssa Rosenzweig	ec1a9d353a	jay/register_allocate: simplify split copy logic Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42097>	2026-06-15 17:29:15 +00:00
Alyssa Rosenzweig	178983f51b	jay/validate: check for mixed ugpr/gpr problems hit with spilling stuff. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42097>	2026-06-15 17:29:15 +00:00
Alyssa Rosenzweig	8943077cb9	jay: fix bogus unit tests will fail validation with next commit. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42097>	2026-06-15 17:29:15 +00:00
Alyssa Rosenzweig	8863fd74e4	jay/spill: simplify limit() stats look like mostly noise - qsort() is not a stable sort. SIMD16: Totals from 10 (0.38% of 2647) affected shaders: Instrs: 225879 -> 225843 (-0.02%); split: -0.03%, +0.02% CodeSize: 3237440 -> 3237152 (-0.01%); split: -0.03%, +0.02% SIMD32: Totals from 151 (5.70% of 2647) affected shaders: Instrs: 922754 -> 923121 (+0.04%); split: -0.15%, +0.19% CodeSize: 13313364 -> 13318004 (+0.03%); split: -0.15%, +0.18% Number of spill instructions: 7510 -> 7452 (-0.77%) Number of fill instructions: 22819 -> 22845 (+0.11%); split: -0.01%, +0.12% Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42097>	2026-06-15 17:29:15 +00:00
Alyssa Rosenzweig	32ace4f400	jay/spill: drop sketchy heuristic I did this for AGX for some Blender shader but apparently it's not doing all that much for Jay! SIMD16: Totals from 0 (0.00% of 2647) affected shaders: SIMD32: Totals from 8 (0.30% of 2647) affected shaders: Instrs: 29566 -> 29255 (-1.05%); split: -1.08%, +0.03% CodeSize: 432672 -> 427408 (-1.22%); split: -1.24%, +0.02% Number of spill instructions: 799 -> 658 (-17.65%) Number of fill instructions: 1010 -> 906 (-10.30%) Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42097>	2026-06-15 17:29:15 +00:00
Alyssa Rosenzweig	bcd958cb89	jay/spill: refactor Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42097>	2026-06-15 17:29:15 +00:00
Alyssa Rosenzweig	c96dba7e47	jay/spill: unstub rematerialization Once we start spilling UGPRs, this will become interesting. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42097>	2026-06-15 17:29:15 +00:00
Alyssa Rosenzweig	97a44c138d	jay/spill: do initial find-and-replace for ugpr spilling doesn't solve any of the hard problems yet, just getting prepped. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42097>	2026-06-15 17:29:15 +00:00
Alyssa Rosenzweig	99a2538029	jay/spill: spill at definitions Braun-Hack has a complex algorithm to insert spills on-demand when pressure exceeds the limit, with fix ups along control flow to ensure we spill along each execution path. Faith implemented a slightly different version in NAK which is what Jay did, with some nonobvious tradeoffs between the two. But actually.. why are we doing this at all? We can alternatively spill immediately after the definition, which guarantees (by the usual properties of dominance) that the spill executes before any reloads. Then we don't need any tricky bookkeeping or control flow fixups. Beyond the simplification, this has a couple advantages: * Lower register pressure throughout more of the program. This doesn't affect the /amount/ of spilling, but it gives RA more freedom so should reduce shuffling. This in itself probably justifies doing this. * Less SSA repair needed around memory definitions, which can reduce memory traffic due to our naive memory definition handling. (I think if we implemented Braun-Hack properly with CSSA this would be less of a concern, but whatever.) * Less reliance on the "no critical edges" property which will come in handy for UGPR spilling, this is a yak shave for that. * No risk of executing the same spill twice due to divergence (if we spill inside of a divergent IF). This means this commit is probably a better idea for GPUs than CPUs in practice. This also has a couple of disadvantages explaining why the paper didn't do this: * If a value only needs to be spilled/filled in conditional control flow, this executes extra spills. But, spills are cheaper than fills (they burn bandwidth but have basically no latency since they're stores), so I'm not super concerned about this corner case. * If a value is defined in a loop and needs to be spilled due to a use outside the loop, it spills N times instead of 1. This is a more compelling reason to do the paper's thing. But we demand the input program is LCSSA so this shouldn't actually happen for us. * The spill stalls on the definition value. That's probably not a big deal and the eventual post-RA scheduler should be able to cope. Overall I think this is a reasonable direction. We can revisit later but I don't want to add more complexity to the spiller than absolutely necessary, and it's about to be necessary to add complexity for UGPRs. SIMD16: Totals from 17 (0.64% of 2647) affected shaders: Instrs: 250304 -> 249221 (-0.43%); split: -0.44%, +0.01% CodeSize: 3476640 -> 3461312 (-0.44%); split: -0.45%, +0.01% Number of spill instructions: 555 -> 223 (-59.82%) Number of fill instructions: 551 -> 543 (-1.45%) SIMD32: Totals from 420 (15.87% of 2647) affected shaders: Instrs: 1779193 -> 1698683 (-4.53%); split: -4.53%, +0.01% CodeSize: 25455456 -> 24198416 (-4.94%); split: -4.95%, +0.01% Number of spill instructions: 36900 -> 14440 (-60.87%) Number of fill instructions: 36550 -> 35103 (-3.96%) Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42097>	2026-06-15 17:29:15 +00:00
Alyssa Rosenzweig	68b21aa375	nir/lower_terminate_to_demote: tweak terminate_if lowering This saves instructions on Jay. We probably could teach backend predication to chew thru the mess, but I don't see a reason not to just do this everywhere. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Acked-by: Mel Henning <mhenning@darkrefraction.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42197>	2026-06-15 16:12:44 +00:00
Lorenzo Rossi	706d431d08	pan/bifrost: Fix 16-bit demote_if Right now demote_if only works with 32-bit registers but in NIR it can also have 16-bit sources, we have a couple of bug on those. First is during NIR->BIR translation (h0 swizzle was not set), second is in discard_b32 to discard_f32 lowering (bifrost has restrictions). Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42197>	2026-06-15 16:12:44 +00:00
Caio Oliveira	10486d0dbc	nir: Handle nir_var_mem_push_const in divergence analysis Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42243>	2026-06-15 15:33:13 +00:00
Jose Maria Casanova Crespo	cdc6a0bfed	v3dv: allow TFU readahead padding above maxMemoryAllocationSize Our get_buffer/image_memory_requirements() pad TRANSFER_SRC resources with V3D_TFU_READAHEAD_SIZE, so allocating the reported requirements of a resource of exactly maxMemoryAllocationSize failed with VK_ERROR_OUT_OF_DEVICE_MEMORY. Accept up to one extra page over the limit: since the allocation size is page-aligned, that covers any sub-page padding. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42179>	2026-06-15 15:15:10 +00:00
Samuel Pitoiset	0c3a45c202	radv: create one winsys for each logical device This prevents holding open file descriptors after physical devices are enumerated. This also prevents potential (and unknown) multithreading issues with the winsys being shared between more than one logical device. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41824>	2026-06-15 14:52:50 +00:00
Samuel Pitoiset	b7cd4d718c	radv: duplicate the fd used for syncobj with KHR_display This is required to move the winsys to logical devices. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41824>	2026-06-15 14:52:49 +00:00
Samuel Pitoiset	47e818c18e	radv: query heap info without using the winsys The winsys will be moved to logical devices. This creates a ac_drm_device on-demand to make the call faster because otherwise it's too slow for a function that can be called every frame. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41824>	2026-06-15 14:52:49 +00:00
Samuel Pitoiset	9bbc72f3de	radv/amdgpu: add a function to query heap info To remove the winsys dependency. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41824>	2026-06-15 14:52:49 +00:00
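
The bit budget described in commit 6ef5d17523 (a 2^32-byte BVH split into 64B blocks gives at most 2^26 block indices, leaving 6 bits per 32-bit entry) can be sketched as below. This is only an illustration of the arithmetic in that commit message, not the driver's code: the function names, and the choice to put the 6-bit field in the high bits, are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* From the commit message: blocks are 64B wide, so a 32-bit byte offset
 * needs only 32 - 6 = 26 bits once expressed as a block index. */
#define BLOCK_SIZE_LOG2  6u
#define BLOCK_INDEX_BITS 26u
#define BLOCK_INDEX_MASK ((1u << BLOCK_INDEX_BITS) - 1u)

/* Hypothetical helper: pack a parent's byte offset and a 6-bit child slot
 * field into one 32-bit map entry. The field placement is an assumption. */
static uint32_t
pack_parent_entry(uint32_t parent_byte_offset, uint32_t slot_bits)
{
   assert((parent_byte_offset & ((1u << BLOCK_SIZE_LOG2) - 1u)) == 0);
   assert(slot_bits < (1u << (32u - BLOCK_INDEX_BITS)));

   uint32_t block_index = parent_byte_offset >> BLOCK_SIZE_LOG2;
   return (slot_bits << BLOCK_INDEX_BITS) | block_index;
}

/* Hypothetical helper: recover the parent's byte offset from an entry. */
static uint32_t
entry_parent_offset(uint32_t entry)
{
   return (entry & BLOCK_INDEX_MASK) << BLOCK_SIZE_LOG2;
}

/* Hypothetical helper: recover the 6-bit child slot field from an entry. */
static uint32_t
entry_slot_bits(uint32_t entry)
{
   return entry >> BLOCK_INDEX_BITS;
}
```

Packing offset 0x1000 with slot field 5 and unpacking it again returns the same two values; the largest block-aligned offset, 0xFFFFFFC0, still fits because its block index is 2^26 - 1.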

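
The base-splitting scheme described in commit dde67bbaca (add a multiple of max + 1, keep the remainder as the immediate) can be sketched as a floor division. This is a rough illustration under stated assumptions, not the code in brw_nir_lower_immediate_offsets: the struct, the function name, and the example limit of 2047 used in the note below are hypothetical, and the sketch only covers bases that are already known to be out of range.

```c
#include <stdint.h>

/* Hypothetical result type: 'add' is folded into the address with an iadd,
 * 'imm' goes into the immediate offset field. */
struct split_offset {
   int64_t add;
   int32_t imm;
};

/* Hypothetical helper: split 'base' so that add + imm == base,
 * add is a multiple of (max + 1), and 0 <= imm <= max. */
static struct split_offset
split_base(int64_t base, int32_t max)
{
   const int64_t window = (int64_t)max + 1;

   /* C division truncates toward zero; adjust to floor so that negative
    * bases get a larger negative add and a non-negative immediate. */
   int64_t q = base / window;
   int64_t r = base % window;
   if (r < 0) {
      r += window;
      q -= 1;
   }

   return (struct split_offset){ .add = q * window, .imm = (int32_t)r };
}
```

With an assumed max of 2047, bases 4100 and 5000 both split to an add of 4096 (immediates 4 and 904), so the two accesses share one iadd that can be CSE'd. A base of -100 splits to an add of -2048 and an immediate of 1948.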

224393 commits