Commit graph

215413 commits

Author SHA1 Message Date
Georg Lehmann
37d3c63a12 aco/optimizer: add new helpers for applying output modifiers
To replace the old instr_mod_labels.

Foz-DB Navi21:
Totals from 683 (0.70% of 97591) affected shaders:
Instrs: 3341288 -> 3340447 (-0.03%); split: -0.03%, +0.00%
CodeSize: 18522460 -> 18520212 (-0.01%); split: -0.01%, +0.00%
Latency: 34359519 -> 34358772 (-0.00%); split: -0.00%, +0.00%
InvThroughput: 9229621 -> 9229494 (-0.00%); split: -0.00%, +0.00%
Copies: 368383 -> 368260 (-0.03%); split: -0.04%, +0.00%
PreSGPRs: 48060 -> 48061 (+0.00%)
SALU: 543991 -> 543150 (-0.15%); split: -0.16%, +0.00%

Changes are caused by optimizing not(salu) without killed scc.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38658>
2025-11-29 08:27:56 +00:00
Georg Lehmann
fc29821d3b aco/optimizer: move med3 -> add_clamp opt later
Soon we will apply omod later,
when the combine_instruction reaches the multiplication with constant.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38658>
2025-11-29 08:27:55 +00:00
Georg Lehmann
39a61502e5 aco/opt_postRA: allow v_cmpx to clobber exec before nop split/create vector
Kind of ugly, but I really hate seeing this in every rt traversal loop:

image_bvh64_intersect_ray v[56:59], [v40, v41, v42, v47, v48, v49, v50, v51, v52, v53, v54, v55], s[44:47]
v_cmp_class_f32_e64 s57, 0xff800000, v12
s_and_b32 exec_lo, s57, exec_lo
s_cbranch_execz BB219

Foz-DB Navi21:
Totals from 3394 (3.48% of 97591) affected shaders:
Instrs: 9536259 -> 9533592 (-0.03%)
CodeSize: 51657072 -> 51640120 (-0.03%); split: -0.03%, +0.00%
Latency: 109493553 -> 109513317 (+0.02%); split: -0.01%, +0.02%
InvThroughput: 29125525 -> 29131876 (+0.02%); split: -0.00%, +0.02%
Copies: 815888 -> 818219 (+0.29%); split: -0.01%, +0.30%
Branches: 277451 -> 277449 (-0.00%)
SALU: 1217642 -> 1214976 (-0.22%)

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38697>
2025-11-29 08:02:24 +00:00
Marek Olšák
1f2d129bfa gallium: add a flag to finalize_nir to allow drivers to skip NIR opts
This could help achieve better compile times.

Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38600>
2025-11-29 07:29:05 +00:00
Marek Olšák
9294448fe1 nir/recompute_io_bases: report progress only if anything was changed
also preserve all metadata because it doesn't add/remove any instructions

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38599>
2025-11-29 05:00:40 +00:00
Marek Olšák
e6499fa73e nir/recompute_io_bases: move color input bases after all other inputs
This is related to the FS prolog.
It should have no effect on other drivers.

v2: make it optional via io_options

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> (v1)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38599>
2025-11-29 05:00:40 +00:00
Marek Olšák
18a338066b nir/recompute_io_bases: don't use safe iterators
the pass doesn't remove anything

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38599>
2025-11-29 05:00:40 +00:00
Faith Ekstrand
4711e5954e nir: Always use sysvals in lower_input_attachments()
The last holdouts of the var options are gone so we can just emit the
system values.  This is overall simpler as it confines all the sysval to
var logic to nir_lower_sysvals_to_varyings().

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38562>
2025-11-29 00:50:34 +00:00
Faith Ekstrand
5bbbf5cf9b tu: Set use_layer_id_sysval for nir_lower_input_attachments
We can just use nir_lower_sysvals_to_varyings instead.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38562>
2025-11-29 00:50:33 +00:00
Faith Ekstrand
b02a98d7d8 microsoft: Run lower_sysvals_to_varyings after lower_input_attachments
This lets us request system values from lower_input_attachments and just
lower them ourselves instead of asking it to create variables.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38562>
2025-11-29 00:50:32 +00:00
Faith Ekstrand
82280a7e86 nir: Support sysval intrinsics in lower_sysvals_to_varyings()
Since this is a downgrade path for drivers, it's useful to support both
forms of these common sysvals.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38562>
2025-11-29 00:50:32 +00:00
Faith Ekstrand
0c36c39103 spirv: Emit SYSTEM_VALUE_LAYER_ID for fragment shaders
We have nir_lower_sysvals_to_varyings() so we can just have that lower
it for the drivers who don't want a sysval.  Most have to support the
sysval version anyway for various lowering so making them all have to
support both is pretty annoying.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38562>
2025-11-29 00:50:32 +00:00
Faith Ekstrand
701a9c269e nir: Add LAYER_ID and VIEW_INDEX to nir_lower_sysvals_to_varyings()
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38562>
2025-11-29 00:50:31 +00:00
Marek Olšák
fa0bea5ff8 nir: remove nir_io_add_const_offset_to_base
nir_opt_constant_folding does it now.

Acked-by: Emma Anholt <emma@anholt.net>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38277>
2025-11-29 00:16:38 +00:00
Marek Olšák
726bbb352e nir/opt_constant_folding: add nir_io_add_const_offset_to_base behavior
We almost always call both passes next to each other.

The code is copied from nir_io_add_const_offset_to_base. No changes.

Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38277>
2025-11-29 00:16:38 +00:00
Marek Olšák
9a56672f56 nir: add shader_info::disable_input/output_offset_src_constant_folding
and set it where needed to prevent nir_opt_constant_folding from breaking
those drivers.

Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38277>
2025-11-29 00:16:38 +00:00
Marek Olšák
7330bca9db nir: handle load_fs_input_interp_deltas in nir_is_input_load
for nir_opt_constant_folding

Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38277>
2025-11-29 00:16:37 +00:00
Marek Olšák
ffcbbeb54a nir/validate: don't require offset src to be 0 if constant
nir_opt_constant_folding does the folding, so this can be non-zero before
that.

Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38277>
2025-11-29 00:16:36 +00:00
Eric Engestrom
b87b83d15e broadcom/ci: update device count in ci-tron farm
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38719>
2025-11-28 23:50:01 +01:00
Eric Engestrom
c09550e3c0 broadcom/ci: apply "Cannot open root device" reboot workaround to all rpi boards
The problem has been observed on rpi4 and rpi5 as well.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38719>
2025-11-28 23:43:07 +01:00
Marek Olšák
21cdbfa223 ac,radv: move opt_vectorize_callback to common code
radeonsi will use it.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38603>
2025-11-28 20:16:10 +00:00
Marek Olšák
2c9995a94f ac/nir: move aco_nir_op_supports_packed_math_16bit here
aco_nir_op_supports_packed_math_16bit currently can't be used by amd/common
because tests don't link with ACO, so linking would fail, but we want
to move the nir_opt_vectorize callback here that uses it.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38603>
2025-11-28 20:16:10 +00:00
Juan A. Suarez Romero
d95b43e07b broadcom/ci: remove ci-tron- prefix from nightly jobs
All the nightly jobs are run with CI-Tron, so no need to prefix them.

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38715>
2025-11-28 18:09:09 +01:00
Juan A. Suarez Romero
50ba2a0e34 broadcom/ci: remove all baremetal nightly jobs
All those jobs will be executed using CI-Tron.

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38715>
2025-11-28 18:09:09 +01:00
Yiwei Zhang
a6ade961b2 venus: implement VK_EXT_map_memory_placed
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38706>
2025-11-28 16:38:26 +00:00
Yiwei Zhang
8adfdc3304 venus: add renderer support for placed mapping
Prepare for VK_EXT_map_memory_placed support.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38706>
2025-11-28 16:38:25 +00:00
David Rosca
38090d5be0 radv/video: Drop casts from vk_find_struct*
The macro itself does the cast.

Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38521>
2025-11-28 15:35:26 +00:00
David Rosca
32a02720a8 radv/video: Init session and update rate control in ControlVideoCoding
This eliminates the last state we kept in encode video session.
Also fixes changing encode resolution without reset.

Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38521>
2025-11-28 15:35:26 +00:00
David Rosca
a7fe0188d4 radv/video: Remove tile config and skip mode from video session state
Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38521>
2025-11-28 15:35:25 +00:00
David Rosca
5d0d00e5f8 radv/video: Use radv_enc_aligned_coded_extent for session params overrides
Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38521>
2025-11-28 15:35:25 +00:00
David Rosca
0fc4ead36f radv/video: Remove enc_session from video session state
It was only used to store aligned picture size. Add helper
function to get the aligned size and use it when needed.

Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38521>
2025-11-28 15:35:25 +00:00
Samuel Pitoiset
c3420ca932 Revert "radv: remove the workaround for DISPATCH_TASKMESH_INDIRECT_MULTI_ACE on GFX10.3"
This reverts commit 0391902eb5.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38711>
2025-11-28 15:34:53 +01:00
Georg Lehmann
653716b745 nir/opt_algebraic: create more bit test
Helps backends with has_bit_test more (i.e. ACO), but it
shouldn't hurt others either.

Foz-DB Navi21:
Totals from 1138 (1.17% of 97591) affected shaders:
Instrs: 5478747 -> 5476055 (-0.05%); split: -0.05%, +0.00%
CodeSize: 29850188 -> 29853140 (+0.01%); split: -0.04%, +0.05%
SpillSGPRs: 1406 -> 1401 (-0.36%)
Latency: 42324245 -> 42325921 (+0.00%); split: -0.01%, +0.01%
InvThroughput: 11396940 -> 11394048 (-0.03%); split: -0.04%, +0.01%
VClause: 142294 -> 142309 (+0.01%); split: -0.00%, +0.01%
SClause: 124412 -> 124411 (-0.00%); split: -0.00%, +0.00%
Copies: 572696 -> 572749 (+0.01%); split: -0.02%, +0.03%
Branches: 199932 -> 199929 (-0.00%)
PreSGPRs: 73372 -> 74970 (+2.18%)
PreVGPRs: 79514 -> 79511 (-0.00%)
VALU: 3628764 -> 3625744 (-0.08%); split: -0.08%, +0.00%
SALU: 818258 -> 818475 (+0.03%); split: -0.03%, +0.06%

Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38700>
2025-11-28 13:25:24 +00:00
Valentine Burley
f9ef7e0f64 Revert "anv/ci: Run vkd3d job in parallel"
With the new vkd3d-proton uprev, a random crash has appeared when
running in parallel.

This reverts commit 45c9c61ad3.

Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38652>
2025-11-28 11:44:28 +00:00
Samuel Pitoiset
92a468f8f2 ci: uprev vkd3d
vkd3d-proton had an issue with its runner and a few tests were excluded
by accident.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38652>
2025-11-28 11:44:28 +00:00
Erik Faye-Lund
60e115dedf mesa/st: do not drop binding prematurely
While it's true that we currently need to eventually fall back to
checking without the render-target binding, this should really be the
last resort. Because otherwise we might end up picking a format that
isn't possible to render to for a color-renderable internalformat.

In the long run, this code should be rewritten to check *properly* if
the internalformat is color-renderable or not *up front*, and not even
try to fall back. But we're currently missing proper helpers for this,
and reworking what we have is a fair bit of work.

So for now, let's just do what we currently do, but shuffle around the
order of testing things so we don't end up dropping unless we absolutely
have to.

Tested-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38673>
2025-11-28 10:46:53 +00:00
Samuel Pitoiset
0391902eb5 radv: remove the workaround for DISPATCH_TASKMESH_INDIRECT_MULTI_ACE on GFX10.3
Only very old MEC firmwares are concerned, so let's remove it and
disable mesh shaders with those firmwares.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38691>
2025-11-28 10:21:30 +00:00
Christoph Pillmayer
262a427a51 pan/bi: Add missing 8bit widen swizzles
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38109>
2025-11-28 09:52:11 +00:00
Tapani Pälli
ba89826b75 anv: add furmark workaround layer
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14274
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38410>
2025-11-28 09:26:41 +00:00
Timothy Arceri
d10036362f util/driconf: Add linux version of Penumbra fixes
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38563>
2025-11-28 08:49:55 +00:00
Job Noorman
233b77878d ir3/ra: try to allocate overlapping regs for shared subreg movs
This was implemented for vector RA but not for shared RA yet.

Totals from 1361 (0.77% of 176279) affected shaders:
Instrs: 1175437 -> 1170238 (-0.44%); split: -0.45%, +0.01%
CodeSize: 2300656 -> 2290258 (-0.45%)
NOPs: 221042 -> 220527 (-0.23%); split: -0.48%, +0.25%
MOVs: 30645 -> 30643 (-0.01%); split: -0.01%, +0.00%
COVs: 47425 -> 47016 (-0.86%)
(ss): 35953 -> 35890 (-0.18%); split: -0.21%, +0.03%
(sy): 20174 -> 20168 (-0.03%)
(ss)-stall: 124094 -> 123625 (-0.38%); split: -0.38%, +0.00%
(sy)-stall: 806166 -> 805832 (-0.04%); split: -0.06%, +0.02%
Preamble Instrs: 173151 -> 171299 (-1.07%)
Cat0: 250836 -> 250321 (-0.21%); split: -0.43%, +0.22%
Cat1: 78738 -> 78327 (-0.52%); split: -0.52%, +0.00%
Cat2: 386528 -> 382255 (-1.11%)

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38573>
2025-11-28 08:00:42 +00:00
Job Noorman
1fc49eb120 ir3/ra: try to allocate subreg movs earlier
Successful subreg allocations allow us to remove the instruction so it
makes sense to try this first before trying other allocation strategies.

Totals from 72 (0.04% of 176279) affected shaders:
Instrs: 144346 -> 144277 (-0.05%); split: -0.06%, +0.01%
CodeSize: 312174 -> 312182 (+0.00%); split: -0.01%, +0.01%
NOPs: 32438 -> 32443 (+0.02%); split: -0.07%, +0.09%
MOVs: 5923 -> 5934 (+0.19%)
COVs: 3039 -> 3000 (-1.28%)
(ss): 2967 -> 2968 (+0.03%)
(sy): 1831 -> 1830 (-0.05%)
(ss)-stall: 9113 -> 9128 (+0.16%)
(sy)-stall: 45844 -> 45858 (+0.03%); split: -0.03%, +0.06%
Cat0: 36136 -> 36141 (+0.01%); split: -0.06%, +0.08%
Cat1: 9010 -> 8982 (-0.31%); split: -0.37%, +0.06%
Cat2: 53533 -> 53487 (-0.09%)

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38573>
2025-11-28 08:00:42 +00:00
Samuel Pitoiset
5fd7af9e42 ac/surface: do not use tile swizzle for replayable/aliased FMASK surfaces
Otherwise the VA might change.

Fixes: 2bbc7d1db6 ("radv: move more surf_index logic to use_tile_swizzle")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38696>
2025-11-28 07:39:33 +00:00
Emma Anholt
2d441c10af ir3: Make the debug-print block numbers be the NIR block numbers.
We still have to fall back to our pointers-as-indices for things like the
preamble's block or old streamout (I'm presuming here that pointers will
always be much greater than the NIR block count, which seems safe enough
for debug).  This gives us nice printouts in debugoptimized builds, and
helps you correlate your ir3 back to NIR (which was helpful in the
hundreds of blocks in the shader I fixed in the previous commit).

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38666>
2025-11-28 07:06:22 +00:00
Emma Anholt
a35f26a983 ir3: Fix incorrect use of predicated ifs on getlast.
The getlast lowering will generate new branches, violating the assumptions
of prede_sched().

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38666>
2025-11-28 07:06:21 +00:00
Job Noorman
435db6fabe ir3: merge rpt groups after postsched
It often happens that postsched puts rpt groups back into an order that
allows them to be merged into a repeated instruction. Breaking up these
rpt groups before postsched loses this opportunity.

To fix this, change ir3_cleanup_rpt to not take the ip into account and
call ir3_merge_rpt after postsched.

Totals from 129238 (73.31% of 176279) affected shaders:
MaxWaves: 1834226 -> 1834248 (+0.00%); split: +0.00%, -0.00%
Instrs: 46484782 -> 46382869 (-0.22%); split: -0.69%, +0.48%
CodeSize: 95513914 -> 93871848 (-1.72%); split: -2.24%, +0.52%
NOPs: 8018516 -> 7939362 (-0.99%); split: -3.28%, +2.30%
MOVs: 1391770 -> 1408039 (+1.17%); split: -4.39%, +5.56%
COVs: 776518 -> 776182 (-0.04%); split: -0.06%, +0.02%
Full: 1473903 -> 1489694 (+1.07%); split: -0.76%, +1.83%
(ss): 1143180 -> 1146977 (+0.33%); split: -3.07%, +3.40%
(sy): 552487 -> 562122 (+1.74%); split: -1.83%, +3.57%
(ss)-stall: 4292082 -> 4259946 (-0.75%); split: -3.95%, +3.20%
(sy)-stall: 16573976 -> 17151457 (+3.48%); split: -2.41%, +5.89%
STPs: 16131 -> 16157 (+0.16%); split: -0.10%, +0.26%
LDPs: 19583 -> 19634 (+0.26%); split: -0.02%, +0.28%
Preamble Instrs: 9889595 -> 9887178 (-0.02%); split: -0.23%, +0.21%
Early Preamble: 103194 -> 103646 (+0.44%); split: +0.51%, -0.07%
Cat0: 8850422 -> 8769964 (-0.91%); split: -3.00%, +2.09%
Cat1: 2212326 -> 2226425 (+0.64%); split: -2.90%, +3.54%
Cat2: 17452525 -> 17448724 (-0.02%); split: -0.02%, +0.00%
Cat6: 501182 -> 501263 (+0.02%); split: -0.00%, +0.02%
Cat7: 1293844 -> 1262010 (-2.46%); split: -4.17%, +1.71%

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38576>
2025-11-28 06:41:32 +00:00
Job Noorman
e8dbed2be4 ir3: don't use list_head for rpt groups
To link together instructions in a rpt group we currently (ab)use
list_head. This is a bit of a hack because we don't actually have a
list_head that points to the first instruction without being embedded in
an instruction itself (the way list_head is supposed to be used).
Instead, the list_head embedded in the first instruction of a rpt group
also serves as the one pointing to the list. In order to make a distinction
between the first and last instruction (for which the main list_head
would usually be used), we rely on the fact that (currently)
instructions in a rpt group are emitted in order which means that later
instructions have a larger serialno than earlier ones.

In order to make all this less hacky, and to lift the restriction of
needing instructions to be emitted in order, replace the list_head with
explicit rpt_next/rpt_prev pointers which link the instructions together
in a doubly but non-circular linked list.

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38576>
2025-11-28 06:41:32 +00:00
Emma Anholt
2cf0ba35bc ir3: Drop the vector splitting and simplify ir3_nir_lower_64b_global().
There's no need to generate the separate memory accesses, when
nir_lower_mem_access_bit_sizes() has already done so.  Also, the way the
64b address is handled changed in 2490ecf5fc ("ir3: ingest global
addresses as 64b values from NIR")

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38688>
2025-11-28 06:14:00 +00:00
Emma Anholt
997c500cc4 ir3: Drop ir3_nir_lower_64b_intrinsics
Our 64-bit memory load/stores are already split to 32 bits by
nir_lower_mem_access_bit_sizes.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38688>
2025-11-28 06:14:00 +00:00
Emma Anholt
f8901bddac ir3: Drop use of nir_lower_wrmasks().
It gets done by nir_lower_mem_access_bit_sizes that's called right after.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38688>
2025-11-28 06:13:59 +00:00