Georg Lehmann
37d3c63a12
aco/optimizer: add new helpers for applying output modifiers
...
To replace the old instr_mod_labels.
Foz-DB Navi21:
Totals from 683 (0.70% of 97591) affected shaders:
Instrs: 3341288 -> 3340447 (-0.03%); split: -0.03%, +0.00%
CodeSize: 18522460 -> 18520212 (-0.01%); split: -0.01%, +0.00%
Latency: 34359519 -> 34358772 (-0.00%); split: -0.00%, +0.00%
InvThroughput: 9229621 -> 9229494 (-0.00%); split: -0.00%, +0.00%
Copies: 368383 -> 368260 (-0.03%); split: -0.04%, +0.00%
PreSGPRs: 48060 -> 48061 (+0.00%)
SALU: 543991 -> 543150 (-0.15%); split: -0.16%, +0.00%
Changes are caused by optimizing not(salu) without killed scc.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38658 >
2025-11-29 08:27:56 +00:00
Georg Lehmann
fc29821d3b
aco/optimizer: move med3 -> add_clamp opt later
...
Soon we will apply omod later,
when the combine_instruction reaches the multiplication with constant.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38658 >
2025-11-29 08:27:55 +00:00
Georg Lehmann
39a61502e5
aco/opt_postRA: allow v_cmpx to clobber exec before nop split/create vector
...
Kind of ugly, but I really hate seeing this in every rt traversal loop:
image_bvh64_intersect_ray v[56:59], [v40, v41, v42, v47, v48, v49, v50, v51, v52, v53, v54, v55], s[44:47]
v_cmp_class_f32_e64 s57, 0xff800000, v12
s_and_b32 exec_lo, s57, exec_lo
s_cbranch_execz BB219
Foz-DB Navi21:
Totals from 3394 (3.48% of 97591) affected shaders:
Instrs: 9536259 -> 9533592 (-0.03%)
CodeSize: 51657072 -> 51640120 (-0.03%); split: -0.03%, +0.00%
Latency: 109493553 -> 109513317 (+0.02%); split: -0.01%, +0.02%
InvThroughput: 29125525 -> 29131876 (+0.02%); split: -0.00%, +0.02%
Copies: 815888 -> 818219 (+0.29%); split: -0.01%, +0.30%
Branches: 277451 -> 277449 (-0.00%)
SALU: 1217642 -> 1214976 (-0.22%)
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38697 >
2025-11-29 08:02:24 +00:00
Marek Olšák
1f2d129bfa
gallium: add a flag to finalize_nir to allow drivers to skip NIR opts
...
This could help achieve better compile times.
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38600 >
2025-11-29 07:29:05 +00:00
Marek Olšák
9294448fe1
nir/recompute_io_bases: report progress only if anything was changed
...
also preserve all metadata because it doesn't add/remove any instructions
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38599 >
2025-11-29 05:00:40 +00:00
Marek Olšák
e6499fa73e
nir/recompute_io_bases: move color input bases after all other inputs
...
This is related to the FS prolog.
It should have no effect on other drivers.
v2: make it optional via io_options
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> (v1)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38599 >
2025-11-29 05:00:40 +00:00
Marek Olšák
18a338066b
nir/recompute_io_bases: don't use safe iterators
...
the pass doesn't remove anything
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38599 >
2025-11-29 05:00:40 +00:00
Faith Ekstrand
4711e5954e
nir: Always use sysvals in lower_input_attachments()
...
The last holdouts of the var options are gone so we can just emit the
system values. This is overall simpler as it confines all the sysval to
var logic to nir_lower_sysvals_to_varyings().
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38562 >
2025-11-29 00:50:34 +00:00
Faith Ekstrand
5bbbf5cf9b
tu: Set use_layer_id_sysval for nir_lower_input_attachments
...
We can just use nir_lower_sysvals_to_varyings instead.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38562 >
2025-11-29 00:50:33 +00:00
Faith Ekstrand
b02a98d7d8
microsoft: Run lower_sysvals_to_varyings after lower_input_attachments
...
This lets us request system values from lower_input_attachments and just
lower them ourselves instead of asking it to create variables.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38562 >
2025-11-29 00:50:32 +00:00
Faith Ekstrand
82280a7e86
nir: Support sysval intrinsics in lower_sysvals_to_varyings()
...
Since this is a downgrade path for drivers, it's useful to support both
forms of these common sysvals.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38562 >
2025-11-29 00:50:32 +00:00
Faith Ekstrand
0c36c39103
spirv: Emit SYSTEM_VALUE_LAYER_ID for fragment shaders
...
We have nir_lower_sysvals_to_varyings() so we can just have that lower
it for the drivers who don't want a sysval. Most have to support the
sysval version anyway for various lowering so making them all have to
support both is pretty annoying.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38562 >
2025-11-29 00:50:32 +00:00
Faith Ekstrand
701a9c269e
nir: Add LAYER_ID and VIEW_INDEX to nir_lower_sysvals_to_varyings()
...
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38562 >
2025-11-29 00:50:31 +00:00
Marek Olšák
fa0bea5ff8
nir: remove nir_io_add_const_offset_to_base
...
nir_opt_constant_folding does it now.
Acked-by: Emma Anholt <emma@anholt.net>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38277 >
2025-11-29 00:16:38 +00:00
Marek Olšák
726bbb352e
nir/opt_constant_folding: add nir_io_add_const_offset_to_base behavior
...
We almost always call both passes next to each other.
The code is copied from nir_io_add_const_offset_to_base. No changes.
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38277 >
2025-11-29 00:16:38 +00:00
Marek Olšák
9a56672f56
nir: add shader_info::disable_input/output_offset_src_constant_folding
...
and set it where needed to prevent nir_opt_constant_folding from breaking
those drivers.
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38277 >
2025-11-29 00:16:38 +00:00
Marek Olšák
7330bca9db
nir: handle load_fs_input_interp_deltas in nir_is_input_load
...
for nir_opt_constant_folding
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38277 >
2025-11-29 00:16:37 +00:00
Marek Olšák
ffcbbeb54a
nir/validate: don't require offset src to be 0 if constant
...
nir_opt_constant_folding does the folding, so this can be non-zero before
that.
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38277 >
2025-11-29 00:16:36 +00:00
Eric Engestrom
b87b83d15e
broadcom/ci: update device count in ci-tron farm
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38719 >
2025-11-28 23:50:01 +01:00
Eric Engestrom
c09550e3c0
broadcom/ci: apply "Cannot open root device" reboot workaround to all rpi boards
...
The problem has been observed on rpi4 and rpi5 as well.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38719 >
2025-11-28 23:43:07 +01:00
Marek Olšák
21cdbfa223
ac,radv: move opt_vectorize_callback to common code
...
radeonsi will use it.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38603 >
2025-11-28 20:16:10 +00:00
Marek Olšák
2c9995a94f
ac/nir: move aco_nir_op_supports_packed_math_16bit here
...
aco_nir_op_supports_packed_math_16bit currently can't be used by amd/common
because tests don't link with ACO, so linking would fail, but we want
to move the nir_opt_vectorize callback here that uses it.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38603 >
2025-11-28 20:16:10 +00:00
Juan A. Suarez Romero
d95b43e07b
broadcom/ci: remove ci-tron- prefix from nightly jobs
...
All the nightly jobs are run with CI-Tron, so no need to prefix them.
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38715 >
2025-11-28 18:09:09 +01:00
Juan A. Suarez Romero
50ba2a0e34
broadcom/ci: remove all baremetal nightly jobs
...
All those jobs will be executed using CI-Tron.
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38715 >
2025-11-28 18:09:09 +01:00
Yiwei Zhang
a6ade961b2
venus: implement VK_EXT_map_memory_placed
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38706 >
2025-11-28 16:38:26 +00:00
Yiwei Zhang
8adfdc3304
venus: add renderer support for placed mapping
...
Prepare for VK_EXT_map_memory_placed support.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38706 >
2025-11-28 16:38:25 +00:00
David Rosca
38090d5be0
radv/video: Drop casts from vk_find_struct*
...
The macro itself does the cast.
Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38521 >
2025-11-28 15:35:26 +00:00
David Rosca
32a02720a8
radv/video: Init session and update rate control in ControlVideoCoding
...
This eliminates the last state we kept in encode video session.
Also fixes changing encode resolution without reset.
Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38521 >
2025-11-28 15:35:26 +00:00
David Rosca
a7fe0188d4
radv/video: Remove tile config and skip mode from video session state
...
Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38521 >
2025-11-28 15:35:25 +00:00
David Rosca
5d0d00e5f8
radv/video: Use radv_enc_aligned_coded_extent for session params overrides
...
Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38521 >
2025-11-28 15:35:25 +00:00
David Rosca
0fc4ead36f
radv/video: Remove enc_session from video session state
...
It was only used to store aligned picture size. Add helper
function to get the aligned size and use it when needed.
Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38521 >
2025-11-28 15:35:25 +00:00
Samuel Pitoiset
c3420ca932
Revert "radv: remove the workaround for DISPATCH_TASKMESH_INDIRECT_MULTI_ACE on GFX10.3"
...
This reverts commit 0391902eb5 .
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38711 >
2025-11-28 15:34:53 +01:00
Georg Lehmann
653716b745
nir/opt_algebraic: create more bit test
...
Helps backends with has_bit_test more (i.e. ACO), but it
shouldn't hurt others either.
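For context, a single-bit test can be spelled in several equivalent ways, and a backend with a native bit-test instruction benefits when more of those spellings are recognized. The commit message does not say which patterns were added, so the two forms below are only an illustration of the idea, with hypothetical helper names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mask-and-compare form: build a one-bit mask, then test it. */
static bool
test_by_mask(uint32_t value, uint32_t bit)
{
   assert(bit < 32);
   return (value & (UINT32_C(1) << bit)) != 0;
}

/* Shift-and-extract form: move the bit to position 0, then test it. */
static bool
test_by_shift(uint32_t value, uint32_t bit)
{
   assert(bit < 32);
   return ((value >> bit) & 1) != 0;
}
```

Both helpers return the same result for every value and every bit index below 32, which is what makes rewriting either one into a single bit test safe.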
Foz-DB Navi21:
Totals from 1138 (1.17% of 97591) affected shaders:
Instrs: 5478747 -> 5476055 (-0.05%); split: -0.05%, +0.00%
CodeSize: 29850188 -> 29853140 (+0.01%); split: -0.04%, +0.05%
SpillSGPRs: 1406 -> 1401 (-0.36%)
Latency: 42324245 -> 42325921 (+0.00%); split: -0.01%, +0.01%
InvThroughput: 11396940 -> 11394048 (-0.03%); split: -0.04%, +0.01%
VClause: 142294 -> 142309 (+0.01%); split: -0.00%, +0.01%
SClause: 124412 -> 124411 (-0.00%); split: -0.00%, +0.00%
Copies: 572696 -> 572749 (+0.01%); split: -0.02%, +0.03%
Branches: 199932 -> 199929 (-0.00%)
PreSGPRs: 73372 -> 74970 (+2.18%)
PreVGPRs: 79514 -> 79511 (-0.00%)
VALU: 3628764 -> 3625744 (-0.08%); split: -0.08%, +0.00%
SALU: 818258 -> 818475 (+0.03%); split: -0.03%, +0.06%
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38700 >
2025-11-28 13:25:24 +00:00
Valentine Burley
f9ef7e0f64
Revert "anv/ci: Run vkd3d job in parallel"
...
With the new vkd3d-proton uprev, a random crash has appeared when
running in parallel.
This reverts commit 45c9c61ad3 .
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38652 >
2025-11-28 11:44:28 +00:00
Samuel Pitoiset
92a468f8f2
ci: uprev vkd3d
...
vkd3d-proton had an issue with its runner and a few tests were excluded
by accident.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38652 >
2025-11-28 11:44:28 +00:00
Erik Faye-Lund
60e115dedf
mesa/st: do not drop binding prematurely
...
While it's true that we currently need to eventually fall back to
checking without the render-target binding, this should really be the
last resort. Because otherwise we might end up picking a format that
isn't possible to render to for a color-renderable internalformat.
In the long run, this code should be rewritten to check *properly* if
the internalformat is color-renderable or not *up front*, and not even
try to fall back. But we're currently missing proper helpers for this,
and reworking what we have is a fair bit of work.
So for now, let's just do what we currently do, but shuffle around the
order of testing things so we don't end up dropping unless we absolutely
have to.
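The reordering can be pictured roughly as follows. This is a sketch under assumptions, not the state tracker's code: the binding flags, the two candidate lists, `find_format`, `choose_format` and `demo_supported` are all hypothetical. The point is only that every candidate is tried with the full bindings before the render-target binding is dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BIND_SAMPLER       (1u << 0) /* hypothetical binding flags */
#define BIND_RENDER_TARGET (1u << 1)

typedef bool (*supported_fn)(int format, unsigned bindings);

/* Return the first candidate supporting all requested bindings, or -1. */
static int
find_format(const int *candidates, size_t count, unsigned bindings,
            supported_fn supported)
{
   for (size_t i = 0; i < count; i++) {
      if (supported(candidates[i], bindings))
         return candidates[i];
   }
   return -1;
}

/* Try both candidate lists with the full bindings first; drop the
 * render-target binding only if nothing at all matched. */
static int
choose_format(const int *preferred, size_t n_preferred,
              const int *broader, size_t n_broader,
              unsigned bindings, supported_fn supported)
{
   assert(supported != NULL);

   int fmt = find_format(preferred, n_preferred, bindings, supported);
   if (fmt < 0)
      fmt = find_format(broader, n_broader, bindings, supported);
   if (fmt >= 0)
      return fmt;

   /* Last resort: retry without the render-target binding. */
   unsigned relaxed = bindings & ~BIND_RENDER_TARGET;
   fmt = find_format(preferred, n_preferred, relaxed, supported);
   if (fmt < 0)
      fmt = find_format(broader, n_broader, relaxed, supported);
   return fmt;
}

/* Demo capability table: format 1 is sample-only, format 2 is renderable. */
static bool
demo_supported(int format, unsigned bindings)
{
   unsigned caps = 0;
   if (format == 1)
      caps = BIND_SAMPLER;
   else if (format == 2)
      caps = BIND_SAMPLER | BIND_RENDER_TARGET;
   return (bindings & ~caps) == 0;
}
```

With format 1 preferred and format 2 in the broader list, a request for both bindings yields the renderable format 2. Dropping the binding right after the first failed lookup would have settled for the sample-only format 1 instead.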
Tested-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38673 >
2025-11-28 10:46:53 +00:00
Samuel Pitoiset
0391902eb5
radv: remove the workaround for DISPATCH_TASKMESH_INDIRECT_MULTI_ACE on GFX10.3
...
Only very old MEC firmwares are concerned, so let's remove it and
disable mesh shaders with those firmwares.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38691 >
2025-11-28 10:21:30 +00:00
Christoph Pillmayer
262a427a51
pan/bi: Add missing 8bit widen swizzles
...
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38109 >
2025-11-28 09:52:11 +00:00
Tapani Pälli
ba89826b75
anv: add furmark workaround layer
...
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14274
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38410 >
2025-11-28 09:26:41 +00:00
Timothy Arceri
d10036362f
util/driconf: Add linux version of Penumbra fixes
...
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38563 >
2025-11-28 08:49:55 +00:00
Job Noorman
233b77878d
ir3/ra: try to allocate overlapping regs for shared subreg movs
...
This was implemented for vector RA but not for shared RA yet.
Totals from 1361 (0.77% of 176279) affected shaders:
Instrs: 1175437 -> 1170238 (-0.44%); split: -0.45%, +0.01%
CodeSize: 2300656 -> 2290258 (-0.45%)
NOPs: 221042 -> 220527 (-0.23%); split: -0.48%, +0.25%
MOVs: 30645 -> 30643 (-0.01%); split: -0.01%, +0.00%
COVs: 47425 -> 47016 (-0.86%)
(ss): 35953 -> 35890 (-0.18%); split: -0.21%, +0.03%
(sy): 20174 -> 20168 (-0.03%)
(ss)-stall: 124094 -> 123625 (-0.38%); split: -0.38%, +0.00%
(sy)-stall: 806166 -> 805832 (-0.04%); split: -0.06%, +0.02%
Preamble Instrs: 173151 -> 171299 (-1.07%)
Cat0: 250836 -> 250321 (-0.21%); split: -0.43%, +0.22%
Cat1: 78738 -> 78327 (-0.52%); split: -0.52%, +0.00%
Cat2: 386528 -> 382255 (-1.11%)
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38573 >
2025-11-28 08:00:42 +00:00
Job Noorman
1fc49eb120
ir3/ra: try to allocate subreg movs earlier
...
Successful subreg allocations allow us to remove the instruction so it
makes sense to try this first before trying other allocation strategies.
Totals from 72 (0.04% of 176279) affected shaders:
Instrs: 144346 -> 144277 (-0.05%); split: -0.06%, +0.01%
CodeSize: 312174 -> 312182 (+0.00%); split: -0.01%, +0.01%
NOPs: 32438 -> 32443 (+0.02%); split: -0.07%, +0.09%
MOVs: 5923 -> 5934 (+0.19%)
COVs: 3039 -> 3000 (-1.28%)
(ss): 2967 -> 2968 (+0.03%)
(sy): 1831 -> 1830 (-0.05%)
(ss)-stall: 9113 -> 9128 (+0.16%)
(sy)-stall: 45844 -> 45858 (+0.03%); split: -0.03%, +0.06%
Cat0: 36136 -> 36141 (+0.01%); split: -0.06%, +0.08%
Cat1: 9010 -> 8982 (-0.31%); split: -0.37%, +0.06%
Cat2: 53533 -> 53487 (-0.09%)
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38573 >
2025-11-28 08:00:42 +00:00
Samuel Pitoiset
5fd7af9e42
ac/surface: do not use tile swizzle for replayable/aliased FMASK surfaces
...
Otherwise the VA might change.
Fixes: 2bbc7d1db6 ("radv: move more surf_index logic to use_tile_swizzle")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38696 >
2025-11-28 07:39:33 +00:00
Emma Anholt
2d441c10af
ir3: Make the debug-print block numbers be the NIR block numbers.
...
We still have to fall back to our pointers-as-indices for things like the
preamble's block or old streamout (I'm presuming here that pointers will
always be much greater than the NIR block count, which seems safe enough
for debug). This gives us nice printouts in debugoptimized builds, and
helps you correlate your ir3 back to NIR (which was helpful in the
hundreds of blocks in the shader I fixed in the previous commit).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38666 >
2025-11-28 07:06:22 +00:00
Emma Anholt
a35f26a983
ir3: Fix incorrect use of predicated ifs on getlast.
...
The getlast lowering will generate new branches, violating the assumptions
of prede_sched().
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38666 >
2025-11-28 07:06:21 +00:00
Job Noorman
435db6fabe
ir3: merge rpt groups after postsched
...
It often happens that postsched puts rpt groups back into an order that
allows them to be merged into a repeated instruction. Breaking up these
rpt groups before postsched loses this opportunity.
To fix this, change ir3_cleanup_rpt to not take the ip into account and
call ir3_merge_rpt after postsched.
Totals from 129238 (73.31% of 176279) affected shaders:
MaxWaves: 1834226 -> 1834248 (+0.00%); split: +0.00%, -0.00%
Instrs: 46484782 -> 46382869 (-0.22%); split: -0.69%, +0.48%
CodeSize: 95513914 -> 93871848 (-1.72%); split: -2.24%, +0.52%
NOPs: 8018516 -> 7939362 (-0.99%); split: -3.28%, +2.30%
MOVs: 1391770 -> 1408039 (+1.17%); split: -4.39%, +5.56%
COVs: 776518 -> 776182 (-0.04%); split: -0.06%, +0.02%
Full: 1473903 -> 1489694 (+1.07%); split: -0.76%, +1.83%
(ss): 1143180 -> 1146977 (+0.33%); split: -3.07%, +3.40%
(sy): 552487 -> 562122 (+1.74%); split: -1.83%, +3.57%
(ss)-stall: 4292082 -> 4259946 (-0.75%); split: -3.95%, +3.20%
(sy)-stall: 16573976 -> 17151457 (+3.48%); split: -2.41%, +5.89%
STPs: 16131 -> 16157 (+0.16%); split: -0.10%, +0.26%
LDPs: 19583 -> 19634 (+0.26%); split: -0.02%, +0.28%
Preamble Instrs: 9889595 -> 9887178 (-0.02%); split: -0.23%, +0.21%
Early Preamble: 103194 -> 103646 (+0.44%); split: +0.51%, -0.07%
Cat0: 8850422 -> 8769964 (-0.91%); split: -3.00%, +2.09%
Cat1: 2212326 -> 2226425 (+0.64%); split: -2.90%, +3.54%
Cat2: 17452525 -> 17448724 (-0.02%); split: -0.02%, +0.00%
Cat6: 501182 -> 501263 (+0.02%); split: -0.00%, +0.02%
Cat7: 1293844 -> 1262010 (-2.46%); split: -4.17%, +1.71%
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38576 >
2025-11-28 06:41:32 +00:00
Job Noorman
e8dbed2be4
ir3: don't use list_head for rpt groups
...
To link together instructions in a rpt group we currently (ab)use
list_head. This is a bit of a hack because we don't actually have a
list_head that points to the first instruction without being embedded in
an instruction itself (the way list_head is supposed to be used).
Instead, the list_head embedded in the first instruction of a rpt group
also serves as the one pointing to the list. In order to make a distinction
between the first and last instruction (for which the main list_head
would usually be used), we rely on the fact that (currently)
instructions in a rpt group are emitted in order which means that later
instructions have a larger serialno than earlier ones.
In order to make all this less hacky, and to lift the restriction of
needing instructions to be emitted in order, replace the list_head with
explicit rpt_next/rpt_prev pointers which link the instructions together
in a doubly linked but non-circular list.
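A minimal sketch of such a structure, assuming a stripped-down instruction type: only the `rpt_next`/`rpt_prev` field names come from the description above, while `struct instr` and the helper functions are hypothetical. Because the ends of the group are marked by NULL pointers, finding the first instruction needs no serialno comparison.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an ir3 instruction; only the rpt links matter. */
struct instr {
   int id;
   struct instr *rpt_prev; /* NULL for the first instruction of the group */
   struct instr *rpt_next; /* NULL for the last instruction of the group */
};

/* Link `next` directly after `prev` in the same rpt group. */
static void
rpt_link(struct instr *prev, struct instr *next)
{
   assert(prev->rpt_next == NULL && next->rpt_prev == NULL);
   prev->rpt_next = next;
   next->rpt_prev = prev;
}

/* Walk back to the first instruction of the group. */
static struct instr *
rpt_first(struct instr *instr)
{
   while (instr->rpt_prev)
      instr = instr->rpt_prev;
   return instr;
}

/* Count the instructions in the group that `instr` belongs to. */
static unsigned
rpt_length(struct instr *instr)
{
   unsigned n = 0;
   for (struct instr *it = rpt_first(instr); it; it = it->rpt_next)
      n++;
   return n;
}
```

An instruction outside any group simply has both pointers set to NULL, so `rpt_length` returns 1 for it.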
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38576 >
2025-11-28 06:41:32 +00:00
Emma Anholt
2cf0ba35bc
ir3: Drop the vector splitting and simplify ir3_nir_lower_64b_global().
...
There's no need to generate the separate memory accesses, when
nir_lower_mem_access_bit_sizes() has already done so. Also, the way the
64b address is handled changed in 2490ecf5fc ("ir3: ingest global
addresses as 64b values from NIR")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38688 >
2025-11-28 06:14:00 +00:00
Emma Anholt
997c500cc4
ir3: Drop ir3_nir_lower_64b_intrinsics
...
Our 64-bit memory load/stores are already split to 32 bits by
nir_lower_mem_access_bit_sizes.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38688 >
2025-11-28 06:14:00 +00:00
Emma Anholt
f8901bddac
ir3: Drop use of nir_lower_wrmasks().
...
It gets done by nir_lower_mem_access_bit_sizes that's called right after.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38688 >
2025-11-28 06:13:59 +00:00