fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-22 15:18:09 +02:00

Author	SHA1	Message	Date
José Roberto de Souza	39ec9e3448	intel/brw: Add and call brw_lsc_supports_base_offset() in places that checks for support of this feature Some checks are pending macOS-CI / macOS-CI (dri) (push) Waiting to run Details macOS-CI / macOS-CI (xlib) (push) Waiting to run Details Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39817>	2026-02-19 16:53:03 +00:00
José Roberto de Souza	91c5744e25	intel/brw: Use computed push constants size in brw_assign_urb_setup() It was already computed in brw_shader::assign_curb_setup() so we can use it in brw_assign_urb_setup(). There was a mismatch between assign_curb_setup() and brw_assign_urb_setup() when push_sizes were not multiple of REG_SIZE, the first one was aligning every push_sizes before sum it, while brw_assign_urb_setup() was only aligning the sum of all push_size. By luck the only places that did not had a push_size aligned to REG_SIZE only had one push_size, so this was not an issue. So here also fixing this mismatch and adding an assert to caught any future mismatch. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39817>	2026-02-19 16:53:03 +00:00
Lionel Landwerlin	d1a1e98e4e	brw: handle non-GRF aligned pushed UBO masking Right now all the drivers align push data to GRF (32B pre Xe2, 64B post Xe2) but the push constant delivery mechanism can actually pack 32B ranges so alignment is not required. Off course we need the push UBO masking to deal with unaligned pushed ranges. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Calder Young <cgiacun@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35160>	2026-02-12 16:45:25 +00:00
Kenneth Graunke	c5859b2d40	intel: Rename wm_prog_key to fs_prog_key Some checks are pending macOS-CI / macOS-CI (dri) (push) Waiting to run Details macOS-CI / macOS-CI (xlib) (push) Waiting to run Details This is the shader key for the fragment shader. Nobody even knows what the windowizer/masker unit is or does anymore. Even on Gen4-6, "fs" is still clearer. This makes the codebase easier to read. This is only about 15 years overdue. Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39748>	2026-02-06 20:52:01 -08:00
Kenneth Graunke	56e638be81	intel: Rename wm_prog_data to fs_prog_data This is the program data for the fragment shader. Nobody even knows what the windowizer/masker unit is or does anymore. Even on Gen4-6, "fs" is still clearer. This makes the codebase easier to read. This is only about 15 years overdue. Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39748>	2026-02-06 20:51:59 -08:00
Kenneth Graunke	6fbe201a12	brw: Convert VS/TES/GS outputs to URB intrinsics. For VS/TES/GS, we lower all outputs to temporaries and emit copies at the end of the shader (or for GS, at each EmitVertex() call) from those temporaries back to real outputs. We use vec8 URB writes without writemasking, since our output area's contents are undefined anyhow. This is simpler than what TCS and Mesh do, which allow for output variables to be read/written at a per-component level at any time, with the output memory being used for cross-thread communication. Rather than using the complicated TCS/Mesh handling and relying on vectorization, we port the emit_urb_writes() approach to NIR. This also takes care of emitting the VUE header with default values when fields aren't explicitly written by the shader. We also handle multiview in the process. It simplifies things, and also drops another case of non-semantic IO in brw. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39666>	2026-02-03 19:11:21 +00:00
Kenneth Graunke	52341b8b9c	brw: Split EOT handling out of emit_urb_writes() The TES workaround code is still going to be needed even after we rework URB output handling for VS/TES/GS to use NIR intrinsics. For VS, we know at least one URB write will have been emitted at the end of the program, so we can just tag it. GS already handles EOT via emit_gs_thread_end(). Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39666>	2026-02-03 19:11:21 +00:00
Caio Oliveira	da80122257	brw: Include backend NIR passes in mda files Add a pass tracker struct that can live the whole lifetime of brw_compile() functions, it will keep track of the debug_archiver and also store some metadata that allow us to name the passes. With that, we can also embed the loop tracking in the same struct, so that is free for any loop to use the "early break" optimization. There are other brw_nir_* passes that are called in the pre-processing phase. These are not currently included in the mda yet. Will be handled when we hook debug_archiver or similar to the runtime/driver. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39504>	2026-01-28 19:52:02 +00:00
Caio Oliveira	74f1d4f47b	intel/compiler: Use SPDX annotations Minor adjustments to formatting of the copyright line, but keep dates and holders. "Authors" entries that could be obtained via Git logs were also removed. The license in brw_disasm.c and elk_disasm.c don't match directly any SPDX pattern I could find, so kept as is. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39503>	2026-01-24 20:37:31 +00:00
Lionel Landwerlin	3d2a696763	brw: treat inline parameters like UNIFORM Makes a bunch of copy propagation and other passes work much better. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39382>	2026-01-20 21:25:53 +00:00
Lionel Landwerlin	0e9453291c	brw: improve push constant loading using base offsets Xe2+ only Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39382>	2026-01-20 21:25:52 +00:00
Lionel Landwerlin	faa857a061	intel: rework push constant handling Some checks are pending macOS-CI / macOS-CI (dri) (push) Waiting to run Details macOS-CI / macOS-CI (xlib) (push) Waiting to run Details nr_params & params array are gone. brw_ubo_range is not stored on the prog_data structure anymore (Anv already stored a copy of that with its own additional information) The backend now only deals with load_push_data_intel. load_uniform & load_push_constant have to be lowered by the driver. Pre Gfx12.5 platforms have to provide a subgroup_id_param to specify where the subgroup_id value is located in the push constants. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38975>	2026-01-09 14:19:52 +00:00
Lionel Landwerlin	60e359412d	iris: manage TBIMR null push constant wa in driver Anv already manages this itself. This allows removing the logic from the compiler. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38975>	2026-01-09 14:19:52 +00:00
Lionel Landwerlin	f4a0e05970	anv/brw/iris: get rid of param array on prog_data Drivers can do all the lowering to push constants to find the only value useful in that array (subgroup_id). Then drivers call into brw_cs_fill_push_const_info() to get the cross/per thread constant layout computed in the prog_data. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38975>	2026-01-09 14:19:51 +00:00
Ian Romanick	b967942b64	brw: Do cmod prop again after scheduling Some checks are pending macOS-CI / macOS-CI (dri) (push) Waiting to run Details macOS-CI / macOS-CI (xlib) (push) Waiting to run Details After selecting the scheduling mode, do cmod prop again. It's possible that doing cmod prop between performing a schedule and trying to register allocate would cause a different scheduling mode to be selected. However, this would require fully restoring the pre-schedule set of instructions (via cloning). I have tried to implement this, and it's harder than it looks. :( v2: Delete unused variable `progress`. Noticed by Marge. shader-db: All Intel platforms had similar results. (Meteor Lake shown) total instructions in shared programs: 19967018 -> 19967006 (<.01%) instructions in affected programs: 10652 -> 10640 (-0.11%) helped: 4 / HURT: 0 total cycles in shared programs: 884129990 -> 884139590 (<.01%) cycles in affected programs: 20334512 -> 20344112 (0.05%) helped: 0 / HURT: 4 fossil-db: Lunar Lake Totals: Instrs: 924967191 -> 924963460 (-0.00%); split: -0.00%, +0.00% Cycle count: 105962414958 -> 105961925594 (-0.00%); split: -0.00%, +0.00% Spill count: 3423582 -> 3423564 (-0.00%); split: -0.00%, +0.00% Fill count: 4877121 -> 4876955 (-0.00%); split: -0.00%, +0.00% Totals from 2511 (0.12% of 2018786) affected shaders: Instrs: 12541707 -> 12537976 (-0.03%); split: -0.03%, +0.00% Cycle count: 4816359238 -> 4815869874 (-0.01%); split: -0.01%, +0.00% Spill count: 179536 -> 179518 (-0.01%); split: -0.03%, +0.02% Fill count: 279407 -> 279241 (-0.06%); split: -0.07%, +0.01% Meteor Lake, DG2, Tiger Lake, Ice Lake, and Skylake had similar results. (Meteor Lake shown) Totals: Instrs: 980252404 -> 980237686 (-0.00%); split: -0.00%, +0.00% Cycle count: 91758669556 -> 91764028404 (+0.01%); split: -0.00%, +0.01% Spill count: 3664771 -> 3664744 (-0.00%); split: -0.00%, +0.00% Fill count: 4962078 -> 4960482 (-0.03%); split: -0.04%, +0.01% Totals from 8472 (0.38% of 2251522) affected shaders: Instrs: 34977623 -> 34962905 (-0.04%); split: -0.04%, +0.00% Cycle count: 6251857553 -> 6257216401 (+0.09%); split: -0.04%, +0.13% Spill count: 480251 -> 480224 (-0.01%); split: -0.01%, +0.00% Fill count: 676539 -> 674943 (-0.24%); split: -0.28%, +0.05% Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38315>	2025-12-18 15:15:20 -08:00
Ian Romanick	09450faf6a	brw: Do cmod prop again after post-RA scheduling shader-db: All Intel platforms had similar results. (Meteor Lake shown) total instructions in shared programs: 19968728 -> 19963825 (-0.02%) instructions in affected programs: 788014 -> 783111 (-0.62%) helped: 2503 / HURT: 0 total cycles in shared programs: 884112912 -> 884093268 (<.01%) cycles in affected programs: 20017168 -> 19997524 (-0.10%) helped: 1830 / HURT: 52 LOST: 0 GAINED: 6 fossil-db: All Intel platforms had similar results. (Meteor Lake shown) Totals: Instrs: 980768016 -> 980172179 (-0.06%) Cycle count: 91762351767 -> 91757280093 (-0.01%); split: -0.01%, +0.00% Max dispatch width: 37602592 -> 37608768 (+0.02%) Totals from 157150 (6.98% of 2251329) affected shaders: Instrs: 107323207 -> 106727370 (-0.56%) Cycle count: 12696754006 -> 12691682332 (-0.04%); split: -0.04%, +0.00% Max dispatch width: 3708584 -> 3714760 (+0.17%) Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38315>	2025-12-18 15:15:20 -08:00
Caio Oliveira	9c16bbd023	brw: Perform mark_last_urb_write_with_eot optimization after CFG Some checks are pending macOS-CI / macOS-CI (dri) (push) Waiting to run Details macOS-CI / macOS-CI (xlib) (push) Waiting to run Details Avoid using exec_node::remove() and the initial "main list of instructions", and instead use the existing helpers like other passes. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37146>	2025-12-16 17:02:58 +00:00
Ian Romanick	d2e3707ecc	brw: Eliminate redundant fills and spills When the register allocator decides to spill a value, all writes to that value are spilled and all reads are filled. In regions where there is not high register pressure, a spill of a value may be followed by a fill of that same file while the spilled register is still live. This optimization pass finds these cases, and it converts the fill to a move from the still-live register. The restriction that the spill and the fill must have matching NoMask really hampers this optimization. With the restriction removed, the pass was more than 2x helpful. v2: Require force_writemask_all to be the same for the spill and the fill. v3: Use FIXED_GRF for register overlap tests. Since this is after register allocation, the VGRF values will not tell the whole truth. v4: Use brw_transform_inst. Suggested by Caio. The allows two of the loops to be merged. Add brw_scratch_inst::offset instead of storing it as a source. Suggested by Lionel. v5: Add no-fill-opt debug option to disable optimizations. Suggested by Lionel. v6: Move a calculation outside a loop. Suggested by Lionel. v7: Check that spill ranges overlap instead of just checking initial offset. Zero shaders in fossil-db were affected, but some CTS with spill_fs were fixed (e.g., dEQP-VK.subgroups.arithmetic.compute.subgroupmin_uint64_t_requiredsubgroupsize). Suggested by Lionel. v8: Add DEBUG_NO_FILL_OPT to debug_bits in brw_get_compiler_config_value(). Noticed by Lionel. shader-db: Lunar Lake total instructions in shared programs: 17249907 -> 17249903 (<.01%) instructions in affected programs: 10684 -> 10680 (-0.04%) helped: 2 / HURT: 0 total cycles in shared programs: 893092630 -> 893092398 (<.01%) cycles in affected programs: 237320 -> 237088 (-0.10%) helped: 2 / HURT: 0 total fills in shared programs: 1903 -> 1901 (-0.11%) fills in affected programs: 110 -> 108 (-1.82%) helped: 2 / HURT: 0 Meteor Lake and DG2 had similar results. (Meteor Lake shown) total instructions in shared programs: 19968898 -> 19968778 (<.01%) instructions in affected programs: 33020 -> 32900 (-0.36%) helped: 10 / HURT: 0 total cycles in shared programs: 885157211 -> 884925015 (-0.03%) cycles in affected programs: 39944544 -> 39712348 (-0.58%) helped: 8 / HURT: 2 total fills in shared programs: 4454 -> 4394 (-1.35%) fills in affected programs: 2678 -> 2618 (-2.24%) helped: 10 / HURT: 0 fossil-db: Lunar Lake Totals: Instrs: 930445228 -> 929949528 (-0.05%) Cycle count: 105195579417 -> 105126671329 (-0.07%); split: -0.07%, +0.00% Spill count: 3495279 -> 3494400 (-0.03%) Fill count: 6767063 -> 6520785 (-3.64%) Totals from 43844 (2.17% of 2018922) affected shaders: Instrs: 212614840 -> 212119140 (-0.23%) Cycle count: 19151130510 -> 19082222422 (-0.36%); split: -0.39%, +0.03% Spill count: 2831100 -> 2830221 (-0.03%) Fill count: 6128316 -> 5882038 (-4.02%) Meteor Lake and DG2 had similar results. (Meteor Lake shown) Totals: Instrs: 1001375893 -> 1001113407 (-0.03%) Cycle count: 92746180943 -> 92679877883 (-0.07%); split: -0.08%, +0.01% Spill count: 3729157 -> 3728585 (-0.02%) Fill count: 6697296 -> 6566874 (-1.95%) Totals from 35062 (1.53% of 2284674) affected shaders: Instrs: 179819265 -> 179556779 (-0.15%) Cycle count: 18111194752 -> 18044891692 (-0.37%); split: -0.41%, +0.04% Spill count: 2453752 -> 2453180 (-0.02%) Fill count: 5279259 -> 5148837 (-2.47%) Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37827>	2025-11-26 17:20:13 +00:00
Ian Romanick	b7f5285ad3	brw: Add fill and spill opcodes for LSC platforms These opcodes are emitted during register allocation instead of the scratch reads and writes that were previously emitted. These instructions contain additional information (i.e., the instruction encodes the scratch offset) that enable optimizations to be added later. The fill and spill opcodes are lowered to scratch reads and writes shortly after register allocation. Eventually this lower may have some optimizations (e.g., reuse previous address calculations for successive spills). v2: Add brw_scratch_inst::offset instead of storing it as a source. Suggested by Lionel. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37827>	2025-11-26 17:20:12 +00:00
Ian Romanick	2215003d95	brw: Add OPT macro to brw_shader.cpp like brw_opt.cpp Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37827>	2025-11-26 17:20:11 +00:00
Ian Romanick	042417a72e	brw: Don't spill_all on internal shaders Basically all of the internal shaders (e.g., from blorp) will fail assertions if there is any scratch space used. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37827>	2025-11-26 17:20:09 +00:00
Kenneth Graunke	9ffae42975	brw: Store brw_urb_inst::offset in bytes on Xe2 Xe2 uses byte offsets rather than OWord offsets. We've been storing the per-slot offsets in bytes on Xe2 for a while, but kept the global offset immediate in OWords for some reason, choosing to lower it during logical send lowering. This patch makes both offsets (global immediate, per-slot) in the same units, so they could be added together if necessary without scaling. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38343>	2025-11-11 10:55:44 +00:00
Kenneth Graunke	73cbb35442	brw: Move into a new src/intel/compiler/brw subdirectory Some checks are pending macOS-CI / macOS-CI (dri) (push) Waiting to run Details macOS-CI / macOS-CI (xlib) (push) Waiting to run Details This keeps the directory structure a bit more organized: - brw specific code - elk specific code - common NIR passes that could be used in both places It also means that you can now 'git grep' in the brw directory without finding a bunch of elk code, or having to "grep thing b*". Reviewed-by: Dylan Baker <dylan.c.baker@intel.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37755>	2025-10-09 07:01:47 +00:00

23 commits