fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-20 22:08:10 +02:00

Author	SHA1	Message	Date
Ian Romanick	0c089a5c32	brw: Eliminate duplicate fills When the register allocator decides to spill a value, all reads of that value are filled. This can result in cases where the same value is filled many times in a single block. In those cases, the result of an earlier fill may still be available when a later fill occurs. This optimization replaces the later fill with a move from the result of the earlier fill. v2: Use FIXED_GRF for register overlap tests. Since this is after register allocation, the VGRF values will not tell the whole truth. v3: Use brw_transform_inst. Suggested by Caio. Add brw_scratch_inst::offset instead of storing it as a source. Suggested by Lionel. v4: An intervening spill to the same location also invalidates the value. 🤦 v5: Don't eliminate a fill if its destination partially overlaps the preceding fill destination. Fixes failures in cooperative matrix CTS. shader-db: Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown) total instructions in shared programs: 17249903 -> 17249653 (<.01%) instructions in affected programs: 35550 -> 35300 (-0.70%) helped: 20 / HURT: 0 total cycles in shared programs: 893092398 -> 893101836 (<.01%) cycles in affected programs: 2501720 -> 2511158 (0.38%) helped: 6 / HURT: 14 total fills in shared programs: 1901 -> 1776 (-6.58%) fills in affected programs: 1757 -> 1632 (-7.11%) helped: 20 / HURT: 0 fossil-db: Lunar Lake, Meteor Lake, and DG2 had similar results. 
(Lunar Lake shown) Totals: Instrs: 929949528 -> 926770338 (-0.34%) Cycle count: 105126671329 -> 104851299099 (-0.26%); split: -0.28%, +0.02% Fill count: 6520785 -> 5021518 (-22.99%) Totals from 54281 (2.69% of 2018922) affected shaders: Instrs: 239616289 -> 236437099 (-1.33%) Cycle count: 22051883404 -> 21776511174 (-1.25%); split: -1.33%, +0.08% Fill count: 6406295 -> 4907028 (-23.40%) Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37827>	2025-11-26 17:20:13 +00:00
Ian Romanick	d2e3707ecc	brw: Eliminate redundant fills and spills When the register allocator decides to spill a value, all writes to that value are spilled and all reads are filled. In regions where there is not high register pressure, a spill of a value may be followed by a fill of that same value while the spilled register is still live. This optimization pass finds these cases, and it converts the fill to a move from the still-live register. The restriction that the spill and the fill must have matching NoMask really hampers this optimization. With the restriction removed, the pass was more than 2x helpful. v2: Require force_writemask_all to be the same for the spill and the fill. v3: Use FIXED_GRF for register overlap tests. Since this is after register allocation, the VGRF values will not tell the whole truth. v4: Use brw_transform_inst. Suggested by Caio. This allows two of the loops to be merged. Add brw_scratch_inst::offset instead of storing it as a source. Suggested by Lionel. v5: Add no-fill-opt debug option to disable optimizations. Suggested by Lionel. v6: Move a calculation outside a loop. Suggested by Lionel. v7: Check that spill ranges overlap instead of just checking initial offset. Zero shaders in fossil-db were affected, but some CTS with spill_fs were fixed (e.g., dEQP-VK.subgroups.arithmetic.compute.subgroupmin_uint64_t_requiredsubgroupsize). Suggested by Lionel. v8: Add DEBUG_NO_FILL_OPT to debug_bits in brw_get_compiler_config_value(). Noticed by Lionel. shader-db: Lunar Lake total instructions in shared programs: 17249907 -> 17249903 (<.01%) instructions in affected programs: 10684 -> 10680 (-0.04%) helped: 2 / HURT: 0 total cycles in shared programs: 893092630 -> 893092398 (<.01%) cycles in affected programs: 237320 -> 237088 (-0.10%) helped: 2 / HURT: 0 total fills in shared programs: 1903 -> 1901 (-0.11%) fills in affected programs: 110 -> 108 (-1.82%) helped: 2 / HURT: 0 Meteor Lake and DG2 had similar results. 
(Meteor Lake shown) total instructions in shared programs: 19968898 -> 19968778 (<.01%) instructions in affected programs: 33020 -> 32900 (-0.36%) helped: 10 / HURT: 0 total cycles in shared programs: 885157211 -> 884925015 (-0.03%) cycles in affected programs: 39944544 -> 39712348 (-0.58%) helped: 8 / HURT: 2 total fills in shared programs: 4454 -> 4394 (-1.35%) fills in affected programs: 2678 -> 2618 (-2.24%) helped: 10 / HURT: 0 fossil-db: Lunar Lake Totals: Instrs: 930445228 -> 929949528 (-0.05%) Cycle count: 105195579417 -> 105126671329 (-0.07%); split: -0.07%, +0.00% Spill count: 3495279 -> 3494400 (-0.03%) Fill count: 6767063 -> 6520785 (-3.64%) Totals from 43844 (2.17% of 2018922) affected shaders: Instrs: 212614840 -> 212119140 (-0.23%) Cycle count: 19151130510 -> 19082222422 (-0.36%); split: -0.39%, +0.03% Spill count: 2831100 -> 2830221 (-0.03%) Fill count: 6128316 -> 5882038 (-4.02%) Meteor Lake and DG2 had similar results. (Meteor Lake shown) Totals: Instrs: 1001375893 -> 1001113407 (-0.03%) Cycle count: 92746180943 -> 92679877883 (-0.07%); split: -0.08%, +0.01% Spill count: 3729157 -> 3728585 (-0.02%) Fill count: 6697296 -> 6566874 (-1.95%) Totals from 35062 (1.53% of 2284674) affected shaders: Instrs: 179819265 -> 179556779 (-0.15%) Cycle count: 18111194752 -> 18044891692 (-0.37%); split: -0.41%, +0.04% Spill count: 2453752 -> 2453180 (-0.02%) Fill count: 5279259 -> 5148837 (-2.47%) Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37827>	2025-11-26 17:20:13 +00:00
Ian Romanick	b7f5285ad3	brw: Add fill and spill opcodes for LSC platforms These opcodes are emitted during register allocation instead of the scratch reads and writes that were previously emitted. These instructions contain additional information (i.e., the instruction encodes the scratch offset) that enables optimizations to be added later. The fill and spill opcodes are lowered to scratch reads and writes shortly after register allocation. Eventually this lowering may have some optimizations (e.g., reuse previous address calculations for successive spills). v2: Add brw_scratch_inst::offset instead of storing it as a source. Suggested by Lionel. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37827>	2025-11-26 17:20:12 +00:00
Ian Romanick	2215003d95	brw: Add OPT macro to brw_shader.cpp like brw_opt.cpp Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37827>	2025-11-26 17:20:11 +00:00
Ian Romanick	1f42ff530c	brw: Return the new register from brw_lower_vgrf_to_fixed_grf ...and make the function public. v2: s/struct brw_reg/brw_reg/. Suggested by Lionel. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37827>	2025-11-26 17:20:11 +00:00
Ian Romanick	243a3a4ca7	brw: Don't pass compressed to brw_lower_vgrf_to_fixed_grf The parameter is never used. It's recalculated in the function. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37827>	2025-11-26 17:20:10 +00:00
Ian Romanick	1fc2f52d36	brw: Force allow_spilling when spill_all is set This ensures that g0 is reserved for spilling since there is going to be spilling. Fixes: `8bca7e520c` ("intel/brw: Only force g0's liveness to be the whole program if spilling") Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37827>	2025-11-26 17:20:09 +00:00
Ian Romanick	042417a72e	brw: Don't spill_all on internal shaders Basically all of the internal shaders (e.g., from blorp) will fail assertions if there is any scratch space used. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37827>	2025-11-26 17:20:09 +00:00
Alyssa Rosenzweig	e3328dfa2f	brw: only initialize sample mask flag if needed This is a refinement of `7c129d9365` ("intel/brw/xe2+: Keep PS sample mask in the f1.0 register whether or not kill is used."). Rather than always insert this move, do so only when we'll actually read the register: for memory writes and for discards. This deletes an instruction from piles of fragment shaders. shader-db on LNL: total instructions in shared programs: 17134031 -> 17042706 (-0.53%) instructions in affected programs: 9065743 -> 8974418 (-1.01%) helped: 65045 HURT: 0 helped stats (abs) min: 1.0 max: 3.0 x̄: 1.40 x̃: 1 helped stats (rel) min: <.01% max: 50.00% x̄: 3.06% x̃: 1.64% 95% mean confidence interval for instructions value: -1.41 -1.40 95% mean confidence interval for instructions %-change: -3.10% -3.03% Instructions are helped. total cycles in shared programs: 885172098 -> 884835306 (-0.04%) cycles in affected programs: 590294230 -> 589957438 (-0.06%) helped: 53636 HURT: 4500 helped stats (abs) min: 2.0 max: 1126.0 x̄: 8.02 x̃: 4 helped stats (rel) min: <.01% max: 50.00% x̄: 1.24% x̃: 0.24% HURT stats (abs) min: 2.0 max: 7706.0 x̄: 20.77 x̃: 6 HURT stats (rel) min: <.01% max: 82.06% x̄: 1.09% x̃: 0.54% 95% mean confidence interval for cycles value: -6.15 -5.43 95% mean confidence interval for cycles %-change: -1.10% -1.02% Cycles are helped. LOST: 385 GAINED: 47 Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38665>	2025-11-26 16:53:36 +00:00
Kenneth Graunke	3182deaae1	brw: Combine output stores for TCS outputs even when unlinked Otherwise we get a lot of individual x/y/z stores to tesslevels when we should really just be storing the whole thing at once. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38482>	2025-11-25 22:44:03 +00:00
Kenneth Graunke	7e02738b63	brw: Drop check for legacy tess levels from remap_patch_urb_offsets The newly rewritten remap_tess_levels_legacy will have already lowered anything it cares about to URB intrinsics. So the generic remapping pass won't see them, as it operates on generic input/output intrinsics. This also drops some of the callback boilerplate we needed temporarily. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38482>	2025-11-25 22:44:03 +00:00
Kenneth Graunke	d95a9714c2	brw: Rewrite legacy tess level remapping This unifies the dynamic (SSO) and fixed (linked together) versions. We emit piles of NIR as if we were doing the dynamic version, but replace the tess config field access with constant values. It all should optimize away back to something reasonable. We lower these directly to URB read/write intrinsics. It also rewrites the dynamic version to directly read/write the URB rather than going through temporaries. The old version was broken in that tessellation control shader invocations can technically use the shared output area for cross-invocation data sharing with barriers, although doing so using the built-in tesslevel patch outputs is very unlikely. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38482>	2025-11-25 22:44:03 +00:00
Kenneth Graunke	ee407481c2	brw: Switch to URB intrinsics for TCS inputs Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38482>	2025-11-25 22:44:02 +00:00
Kenneth Graunke	943b2acf02	brw: Switch to NIR URB intrinsics for TES inputs Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38482>	2025-11-25 22:44:01 +00:00
Kenneth Graunke	c0d69b2faf	brw: Switch to NIR URB intrinsics for TCS outputs Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38482>	2025-11-25 22:44:01 +00:00
Kenneth Graunke	9aff3cac3c	brw: Add infrastructure for lowering to URB intrinsics Based on earlier code by Lionel Landwerlin. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38482>	2025-11-25 22:44:00 +00:00
Kenneth Graunke	13acc889af	brw: Use io_sem.location instead of base to get varying slots Alyssa noted we can be using semantic IO here rather than relying on bases not having been remapped. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38482>	2025-11-25 22:43:59 +00:00
Kenneth Graunke	96d331766a	brw: Generalize read_attribute_payload_intel to handle more cases We were using this for indirect loads of the shader input thread payload, but there's no reason we can't use it for constant access too. In this case we can just MOV from the ATTR file directly without a special opcode that turns into MOV_INDIRECT later. We also allow it to load multiple components now. This is useful for say, returning vec4 pushed inputs. And, we allow it in more stages than just the fragment stage. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38482>	2025-11-25 22:43:59 +00:00
Kenneth Graunke	792762617a	brw: Rename read_attribute_payload_intel to load_attribute_payload_intel We're going to change the intrinsic to a load(...) which puts "load" in the name. Also, it's just more consistent with our usual terminology. We also rename the corresponding backend opcode so they remain matched. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38482>	2025-11-25 22:43:58 +00:00
Kenneth Graunke	0f7590af81	brw, anv, iris: Switch to reversed patch header layouts These are a ton more convenient. When the TCS and TES were linked together, the legacy layouts were a hassle, but didn't impose any significant cost. With unlinked TCS and TES, the legacy layouts involve significant runtime code for scrambling the data, whereas the reversed layouts are substantially less overhead. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38482>	2025-11-25 22:43:57 +00:00
Kenneth Graunke	7d1dfc3468	brw: Lower tesslevel vars to vectors even for unlinked TCS/TES st/nir lowers this for iris, and brw_link_shaders lowers this for anv, but for unlinked tessellation control / evaluation shaders, the lowering was not happening for TCS. Just do it unconditionally when lowering TCS outputs and TES inputs. This lets the remapping code just assume vectors all the time, rather than getting single component stores with nir_intrinsic_component set (which came from nir_lower_io lowering compact arrays). This also requires changes to the dynamic unlinked TCS/TES lowering to temporaries, which needs to use vectors rather than arrays with this change. That code is going away in future patches anyway, but this keeps it going for now to avoid interim breakage. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38482>	2025-11-25 22:43:56 +00:00
Kenneth Graunke	7736e693b1	brw: Pass devinfo into remap_patch_urb_offsets Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38482>	2025-11-25 22:43:56 +00:00
Kenneth Graunke	4dc6413de8	brw: Rename remap_non_header_patch_values to remap_patch_values See rationale in the previous patch. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38482>	2025-11-25 22:43:56 +00:00
Kenneth Graunke	2b51963b8c	brw: Remap tesslevels before other patch remapping We now call remap_tess_levels before remap_non_header_patch_urb_offsets. The latter already excludes tess levels anyway, so the order shouldn't matter. This paves the way for remap_tess_levels to skip handling some header values in certain cases, because with reversed layouts, many of them no longer need any special handling and we can just let the generic pass handle them. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38482>	2025-11-25 22:43:56 +00:00
Kenneth Graunke	e8669a8333	brw: Rework the tess level remapping interface Just have a single remap_tess_levels that does either the statically-known-primitive or the dynamic (unlinked) mode. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38482>	2025-11-25 22:43:56 +00:00
Kenneth Graunke	1995c879a9	brw: Flip the TESS_LEVEL_INNER/OUTER vue map slot assignments Our current legacy patch header layout handling doesn't actually care which is which slot, and remaps everything to its correct spot anyway. For using the newer "reversed" patch header layouts, it will be more convenient to have outer as slot 0, and inner as slot 1, as that just works with no special remapping needed for both quads and triangles (but unfortunately isolines are still a pain). Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38482>	2025-11-25 22:43:56 +00:00
Kenneth Graunke	e5c1d00faf	brw: Pass devinfo to brw_nir_lower_tes_inputs This will be useful for using reversed patch header layouts. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38482>	2025-11-25 22:43:56 +00:00
Kenneth Graunke	a1c7ae9d15	brw: Implement URB handle intrinsics for TCS and TES stages Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38482>	2025-11-25 22:43:56 +00:00
Lionel Landwerlin	e290f9641d	brw: Implement load/store URB intrinsics These work the same regardless of stage. v2 (Ken): Rebase, move from mesh to all stages, add reorderable load variant, allow channel masks to be non-constant even on Xe2. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38482>	2025-11-25 22:43:55 +00:00
Lionel Landwerlin	0d8ee4ed23	brw: use default builder for urb handle adjustment Be consistent with lowering that happens after, so that it gets a full vector register and can stride into it. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38482>	2025-11-25 22:43:55 +00:00
Lionel Landwerlin	7e72d392d7	brw: switch to load_(pixel_coord|frag_coord_z|frag_coord_w) intrinsics Allows us to better determine if we need Z/W payload delivery. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36392>	2025-11-25 15:50:48 +00:00
Kenneth Graunke	e49418744a	brw: Set extended_bindless_surface_offset to true for Gfx12.5+ anv sets device->uses_ex_bso on verx10 >= 125 and then sets the compiler->extended_bindless_surface_offset to that. iris was not setting anything. However, LSC_ADDR_SURFTYPE_SS used for scratch on Gfx12.5 is bindless, and Xe2 uses ExBSO for all UGM access, so we need to be setting this. Just set it in the compiler so both drivers have it set. Fixes piglit arb_tessellation_shader-tes-gs-max-output -small -scan 1 50 on iris. Fixes: `80c89909f3` ("brw: fixup immediate bindless surface handling") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38645>	2025-11-25 08:21:30 +00:00
Lionel Landwerlin	d51c0b8988	brw: fix SS surfaces usage In `80c89909f3` ("brw: fixup immediate bindless surface handling") I forgot that we have a special usage for the only _SS surface (the scratch surface). Because it's only delivered in the 31:10 bits of R0 and because we want to minimize the amount of shader instructions for scratch messages, the surface offset is shifted right by the driver to align things properly for the 31:6 extended descriptor format. This is unfortunately incompatible with the full 32bit format of ExBSO. So this surface type currently cannot be considered bindless. We might revisit later if we start using _SS surfaces for other things. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `80c89909f3` ("brw: fixup immediate bindless surface handling") Reviewed-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38618>	2025-11-24 16:12:27 +00:00
Lionel Landwerlin	8f9acc0150	brw: compute final copy propagation resulting source Fixes this test on Xe2+: INTEL_DEBUG=no32 ./deqp-vk -n dEQP-VK.spirv_assembly.instruction.maint9_vectorization.bit_field_u_extract.result_v16i-base_v16i-offset_s64u-count_s16i Generate invalid code for that platform: and(16) g37<1>UW g65<16,4,4>UW 0x000fUW { align1 1H I@5 }; ERROR: Invalid register region for source 0. See special restrictions section. Several helpers like has_subdword_integer_region_restriction() do not see the final type of the source, so compute it early. Maybe new_src could be used in more cases. Being conservative for now. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: mesa-stable Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38548>	2025-11-24 10:14:32 +00:00
Kenneth Graunke	3160c516ca	brw: Delete input_slots_valid from brw_wm_prog_key Nothing in the compiler seems to use this anymore. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38556>	2025-11-20 14:10:39 -08:00
Kenneth Graunke	868377e4c7	brw: Delete program_string_id from brw program keys This is strictly a GL thing. iris can manage it in its own program keys without polluting the compiler with stuff nobody else cares about. We can also drop a lot of padding that was introduced in commit `a18835a9ca` which doesn't appear to be necessary. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38556>	2025-11-20 14:10:38 -08:00
Marek Olšák	9e339f4b32	nir: rename nir_lower_indirect_derefs -> nir_lower_indirect_derefs_to_if_else_trees This describes better what it does. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Acked-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38471>	2025-11-20 05:42:11 +00:00
Boris Brezillon	ea4d4d2a77	nir: Prepare nir_lower_io_vars_to_temporaries() for optional PLS lowering Rather than adding another boolean to optionally lower PLS vars, pass the types we want to lower through a nir_variable_mode bitmask. Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric R. Smith <eric.smith@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37110>	2025-11-18 20:25:42 +00:00
Lionel Landwerlin	4816318887	brw: fix workaround fence rlen field send.ugm (1|M0) r125 r0 null:0 0x0 0x0200651F {$9} // wr:1+0, rd:0; fence invalid flush type scoped to tile When destination of Send(s) is not null, the response length must not be 0. Should only affect DG2 products. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: mesa-stable Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38478>	2025-11-17 17:08:30 +00:00
Calder Young	d6fbbfef5c	brw: fix SIMD lowering of fp16 sampler message data with multiple components Fixes: `61d6aea4` ("brw: fix SIMD lowering of sampler messages with fp16 data") Closes: mesa/mesa#13149 Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38455>	2025-11-17 12:38:14 +00:00
Marek Olšák	e372365cf4	nir: rename nir_copy_prop -> nir_opt_copy_prop Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38411>	2025-11-15 02:16:38 +00:00
Caio Oliveira	e20d910a6a	brw: Remove 3src_exec_size from the field macros It is incomplete and it is the same as regular exec_size. Change the test code that was using it to use the regular one. Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38208>	2025-11-14 18:46:58 +00:00
Lionel Landwerlin	61d6aea401	brw: fix SIMD lowering of sampler messages with fp16 data We need to make sure the data part returned by sampler messages is always aligned to a physical register. Just like the residency data lives in a single physical register after the data. Lowering a vec3 16bits per components led to a half a physical register allocation which then confused the descriptor lowering (expecting physical register units). Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `295734bf88` ("intel/fs: fix residency handling on Xe2") Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12794 Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34008>	2025-11-14 10:26:23 +02:00
Lionel Landwerlin	80c89909f3	brw: fixup immediate bindless surface handling This is unused at the moment but the backend incorrectly assumes immediate handles are for the binding table (therefore not bindless). Some new CTS tests are using an immediate bindless handle which is broken. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38359>	2025-11-14 00:24:55 +00:00
Lionel Landwerlin	b3cc54731f	brw: fixup 64bit atomics emulation on 2D array images Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `ce7208c3ee` ("brw: add support for texel address lowering") Acked-by: Nanley Chery <nanley.g.chery@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38409>	2025-11-14 00:01:50 +00:00
Yonggang Luo	ecb0ccf603	treewide: Replace calling to function ALIGN with align This is done by grep ALIGN( to align( docs,*.xml,blake3 is excluded Signed-off-by: Yonggang Luo <luoyonggang@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38365>	2025-11-12 21:58:40 +00:00
Yonggang Luo	db767eb7e0	brw: Do not use align as variable name, as it's a function in u_math.h and will be used Signed-off-by: Yonggang Luo <luoyonggang@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38365>	2025-11-12 21:58:38 +00:00
Konstantin Seurer	de32f9275f	treewide: add & use parent instr helpers We add a bunch of new helpers to avoid the need to touch ->parent_instr, including the full set of: * nir_def_is_* * nir_def_as_*_or_null * nir_def_as_* [assumes the right instr type] * nir_src_is_* * nir_src_as_* * nir_scalar_is_* * nir_scalar_as_* Plus nir_def_instr() where there's no more suitable helper. Also an existing helper is renamed to unify all the names, while we're churning the tree: * nir_src_as_alu_instr -> nir_src_as_alu ..and then we port the tree to use the helpers as much as possible, using nir_def_instr() where that does not work. Acked-by: Marek Olšák <maraeo@gmail.com> --- To eliminate nir_def::parent_instr we need to churn the tree anyway, so I'm taking this opportunity to clean up a lot of NIR patterns. Co-authored-by: Konstantin Seurer <konstantin.seurer@gmail.com> Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38313>	2025-11-12 21:22:13 +00:00
Kenneth Graunke	9ffae42975	brw: Store brw_urb_inst::offset in bytes on Xe2 Xe2 uses byte offsets rather than OWord offsets. We've been storing the per-slot offsets in bytes on Xe2 for a while, but kept the global offset immediate in OWords for some reason, choosing to lower it during logical send lowering. This patch makes both offsets (global immediate, per-slot) in the same units, so they could be added together if necessary without scaling. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38343>	2025-11-11 10:55:44 +00:00
Kenneth Graunke	cde3a34a43	brw: Use nir_intrinsic_[set_]base rather than poking at const_index[0] Much clearer, especially since we're dealing with at least four different kinds of intrinsics. These helpers were introduced years ago, but probably didn't exist when we first wrote this code. Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38343>	2025-11-11 10:55:43 +00:00
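
Note on commit 0c089a5c32 ("brw: Eliminate duplicate fills"): the idea in that message can be illustrated with a toy model. This is not Mesa code. The `Inst` type, the register names, and `eliminate_duplicate_fills` are hypothetical, and the sketch assumes registers and scratch offsets are opaque exact-match keys, whereas the commit message says the real pass tests FIXED_GRF register overlap and refuses to eliminate a fill whose destination only partially overlaps the preceding fill destination.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Inst:
    """Hypothetical instruction record for this sketch only."""
    op: str                       # "fill", "spill", "mov" or "alu"
    dst: Optional[str] = None     # register written, if any
    src: Optional[str] = None     # register read, if any
    offset: Optional[int] = None  # scratch offset for fill/spill


def eliminate_duplicate_fills(block):
    """Within one block, turn a repeated fill into a move from the earlier fill."""
    available = {}  # scratch offset -> register still holding that slot's value
    out = []
    for inst in block:
        if inst.op == "spill":
            # An intervening spill rewrites the slot, so an earlier
            # fill result no longer matches it (the v4 fix).
            available.pop(inst.offset, None)
            out.append(inst)
            continue

        holder = None
        slot = None
        if inst.op == "fill":
            slot = inst.offset
            holder = available.get(slot)
            if holder is not None:
                # The value is still in a register: replace the fill with a move.
                inst = Inst("mov", dst=inst.dst, src=holder)
            else:
                holder = inst.dst

        # Any write clobbers cached values that lived in the destination.
        if inst.dst is not None:
            available = {o: r for o, r in available.items() if r != inst.dst}
        if holder is not None:
            available[slot] = holder
        out.append(inst)
    return out
```

In this model a second fill from the same offset becomes a `mov` only while the first fill's destination is untouched and no spill has hit that offset in between; otherwise the fill is kept.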


4768 commits