Commit graph

6118 commits

Author SHA1 Message Date
Caio Oliveira
2ed79f80ba nir/load_store_vectorize: Skip new bit-sizes that are unaligned with high_offset
Otherwise this would require combining two values to produce a single
(new bit-size) channel, which vectorize_stores() doesn't handle. The pass
can still keep trying smaller bit-sizes.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12946
Fixes: ce9205c03b ("nir: add a load/store vectorization pass")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34414>
2025-04-11 19:17:17 +00:00
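The alignment condition described in 2ed79f80ba can be sketched as follows. This is only an illustration: the function names are hypothetical, and it assumes the offset of the high store is measured in bytes. It is not the pass's actual code.

```python
def new_bit_size_is_aligned(high_offset_bytes: int, new_bit_size: int) -> bool:
    """True when the high store starts on a boundary of the new channel size.

    If this is False, one new-size channel would have to be assembled from
    parts of two values, which is the case the commit says is skipped.
    """
    return (high_offset_bytes * 8) % new_bit_size == 0


def candidate_bit_sizes(high_offset_bytes: int, sizes=(64, 32, 16, 8)):
    """Keep trying smaller bit sizes, returning those that stay aligned."""
    return [s for s in sizes if new_bit_size_is_aligned(high_offset_bytes, s)]
```

With a high offset of 2 bytes, 64-bit and 32-bit channels are rejected under this model, while 16-bit and 8-bit channels remain candidates.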
Georg Lehmann
d046ecf95a nir/opt_algebraic: optimize open coded ffract
Foz-DB Navi21:
Totals from 274 (0.34% of 79789) affected shaders:
Instrs: 522630 -> 522181 (-0.09%); split: -0.09%, +0.01%
CodeSize: 2880668 -> 2878940 (-0.06%); split: -0.07%, +0.01%
VGPRs: 14488 -> 14464 (-0.17%)
Latency: 4092358 -> 4091243 (-0.03%); split: -0.04%, +0.01%
InvThroughput: 1014148 -> 1013471 (-0.07%); split: -0.07%, +0.00%
VClause: 11646 -> 11639 (-0.06%)
SClause: 18614 -> 18611 (-0.02%)
Copies: 56248 -> 56309 (+0.11%); split: -0.05%, +0.16%
PreVGPRs: 13649 -> 13647 (-0.01%)
VALU: 359733 -> 359285 (-0.12%); split: -0.13%, +0.01%
SALU: 59719 -> 59720 (+0.00%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33369>
2025-04-11 12:36:02 +00:00
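As a rough illustration of what "open coded ffract" in d046ecf95a may refer to, the toy rewriter below replaces `a - floor(a)` with a single `ffract` node. Expressions are modelled as nested tuples. The log does not show the exact patterns the commit matches, so the shape used here is an assumption and the function name is hypothetical.

```python
def rewrite_open_coded_ffract(expr):
    """Rewrite ('fsub', x, ('ffloor', x)) to ('ffract', x).

    Any other expression shape is returned unchanged.
    """
    if isinstance(expr, tuple) and len(expr) == 3 and expr[0] == "fsub":
        _, lhs, rhs = expr
        # Both operands must refer to the same value x.
        if rhs == ("ffloor", lhs):
            return ("ffract", lhs)
    return expr
```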
Konstantin Seurer
ba001626ac nir: Turn the format string index into a const index
It is already expected to be constant.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34208>
2025-04-10 19:31:37 +00:00
Boris Brezillon
4f4ac56145 pan/va: Support relaxed waits on read-only render targets
On Valhall we can optimize lower waits, which wait for both readers and
writers, into resource_waits which only wait for writers, allowing
threads accessing read-only resources to execute concurrently.

Let's use that on LD_TILE instructions so we can optimize the read-only
case.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32540>
2025-04-10 13:17:53 +00:00
Boris Brezillon
20275d6521 pan/bi: Introduce two intrinsics to support input attachment remapping
In order to dynamically load the content of the tile buffer, we need
to know the target (color, depth or stencil) and the conversion to
apply. Let's define the load_input_attachment_{target,conv}_pan
intrinsics so we can dissociate the logic lowering input attachment
loads into load_converted_output_pan, and the part optimizing the shader
when input attachment map is passed at compile time.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32540>
2025-04-10 13:17:53 +00:00
Boris Brezillon
f3be0836b7 pan/bi: Pass an explicit sampleid to load_converted_output_pan
Needed if we want to lower multisample input attachment loads to
tile buffer loads.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32540>
2025-04-10 13:17:53 +00:00
Boris Brezillon
cdeda45282 pan/bi: Pass load_converted_output_pan target through a source
This allows us to pass a dynamic render target which will be needed
to support VK_KHR_dynamic_rendering_local_read.

While at it, we also enable support for depth/stencil tile loads.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32540>
2025-04-10 13:17:53 +00:00
Alyssa Rosenzweig
c2a3c70086 nir/lower_tex: use vector_insert_imm
was in the area.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34426>
2025-04-08 19:04:47 +00:00
Alyssa Rosenzweig
c23201ad8a nir/lower_blend: disable logic ops for unsupported formats
Fixes new Vulkan CTS cases on Honeykrisp (and probably panvk and whatever)

dEQP-VK.pipeline.shader_object_unlinked_binary.logic_op_na_formats.*

Cc: mesa-stable
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34426>
2025-04-08 19:04:47 +00:00
Alyssa Rosenzweig
54ccc8ed0b nir/lower_blend: refactor logicop variables
This pulls out the logicop_func variable from the options struct, so we can
modify it in the next commit in a central place. It then refactors out the
format variable from the options struct since we end up duplicating
options->format[rt] a zillion times and passing in both an options struct and a
logicop func override is confusing so this will just make everything neater and
self-contained next commit.

no functional change.

Cc'd to make the next commit cherrypickable.

Cc: mesa-stable
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34426>
2025-04-08 19:04:46 +00:00
Faith Ekstrand
6aa2c152b8 nak,nir: Add an image_load_raw_nv intrinsic
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34336>
2025-04-08 04:06:45 +00:00
Marek Olšák
1d5c42528b nir/opt_algebraic: lower 16-bit imul_high & umul_high
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34016>
2025-04-07 19:44:22 +00:00
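One way to picture the lowering of 16-bit `umul_high` and `imul_high` in 1d5c42528b is to do the multiply at 32 bits and keep the upper half of the product. The sketch below models that arithmetic only. It assumes this is the strategy used, which the log does not confirm, and the helper names are hypothetical.

```python
def to_signed16(x: int) -> int:
    """Interpret the low 16 bits of x as a two's-complement value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def umul_high16(a: int, b: int) -> int:
    """High 16 bits of the unsigned 16x16 product."""
    return ((a & 0xFFFF) * (b & 0xFFFF)) >> 16


def imul_high16(a: int, b: int) -> int:
    """High 16 bits of the signed 16x16 product, returned as a 16-bit pattern."""
    product = to_signed16(a) * to_signed16(b)
    # Python's >> on negative ints is an arithmetic shift, as needed here.
    return (product >> 16) & 0xFFFF
```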
Timothy Arceri
d8782db3a4 glsl: fix regression in ubo cloning
Fixes KHR-GL46.layout_binding.block_layout_binding_block_VertexShader
with radeonsi.

Fixes: 2b2132d2ac ("nir: fix uniform cloning helper")

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34337>
2025-04-06 19:43:47 +10:00
Konstantin
e7a44de184 nir/tests: Do not rely on __LINE__
__LINE__ can be inconsistent when using different compilers. This patch
changes the test runner to do a simple string find/replace of the test
source file instead of looking for the line where the reference string
starts.

Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33980>
2025-04-04 19:01:01 +00:00
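The find/replace approach described in e7a44de184 could look roughly like the sketch below. It is a hypothetical helper, not the Mesa test runner's code: it locates the old reference string by content rather than by line number and swaps in the new one.

```python
def replace_reference(source_text: str, old_reference: str, new_reference: str) -> str:
    """Replace the reference string in a test source file by content.

    No __LINE__ information is needed. The string must occur exactly once,
    so that the replacement is unambiguous.
    """
    occurrences = source_text.count(old_reference)
    if occurrences != 1:
        raise ValueError(f"expected exactly one match, found {occurrences}")
    return source_text.replace(old_reference, new_reference)
```

Requiring exactly one match is a design choice of this sketch. It avoids silently updating the wrong test when two tests share the same reference text.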
Timur Kristóf
a530890e75 nir/print: Fix variable mode for arrayed output load intrinsics.
This helps print the names of varyings correctly.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34317>
2025-04-03 19:54:51 +00:00
Timur Kristóf
96d11d0f56 nir/opt_varyings: Fix assertion when deduplicating TCS outputs.
When deduplicating TCS outputs, we may find outputs that aren't
loaded by the shader itself. This previously hit a bad assertion.

Fixes: c66967b5cb
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12410
Cc: mesa-stable
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34317>
2025-04-03 19:54:51 +00:00
Timur Kristóf
a29b5857f7 nir/xfb: Preserve some xfb information when gathering from intrinsics.
We need to remember which streamout buffers and streams were enabled,
even if the shader doesn't actually write any outputs to them,
because the API requires that we count vertices created by this shader
towards queries against those streams.

That information can be gathered by nir_gather_xfb_info_with_varyings
from the original NIR I/O variables that we get from the frontend,
but it isn't included in any intrinsics so would be otherwise lost here.

Cc: mesa-stable
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34317>
2025-04-03 19:54:51 +00:00
Faith Ekstrand
a3935c7aa2 nak,nir: Generalize nak_nir_split_64bit_conversions and move it to NIR
This pass was originally based on a similar pass from Intel but it's
grown support for some fancy stuff like fp64 -> fp16 conversion
splitting with proper rounding.

Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Reviewed-by: Benjamin Lee <benjamin.lee@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34126>
2025-03-29 03:02:17 +00:00
Lionel Landwerlin
772beb0ebf nir: add support for lowering non uniform texture offsets
Intel HW only has support for non-uniform offsets for TG4 operations.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33138>
2025-03-29 02:15:18 +00:00
Georg Lehmann
2b1fc1a7fe nir: add option to keep mul24_relaxed
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33871>
2025-03-27 06:24:15 +00:00
Timothy Arceri
2b2132d2ac nir: fix uniform cloning helper
glsl allows for ubos to have the same name but different bindings.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Fixes: b47b8d16d9 ("nir: expose reusable linking helpers for cloning uniform loads")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12852
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34138>
2025-03-25 06:54:53 +00:00
Connor Abbott
1621080df7 compiler,nir: Gather needs_full_quad_helper_invocations info
This is needed on Qualcomm, where there are separate fields to enable
just 3 fragments and all 4 fragments.

Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Fixes: 264d8a6766 ("ir3: Set need_full_quad depending on info.fs.require_full_quads")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33862>
2025-03-14 21:55:58 +00:00
Connor Abbott
7a55e13939 nir, compiler: Rename needs_quad_helper_invocations
This currently treats coarse and fine derivatives the same, but Qualcomm
needs to know whether just coarse derivatives are used or fine
derivatives/quad ops are also used. Rename this to
needs_coarse_quad_helper_invocations to make clear the difference from the
new field, needs_full_quad_helper_invocations.

Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Fixes: 264d8a6766 ("ir3: Set need_full_quad depending on info.fs.require_full_quads")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33862>
2025-03-14 21:55:57 +00:00
Karol Herbst
3a9954c117 nir/serialize: fix decoding of is_return and is_uniform
Fixes: 3321a56d1d ("nir: Serialize all parameter attributes")
Fixes: 26cbb6b933 ("nir: Add parameter divergence info")

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34052>
2025-03-14 15:01:32 +00:00
Georg Lehmann
b386659588 nir/opt_algebraic: create ubfe from (a & mask) >> c
Foz-DB Navi21:
Totals from 917 (1.16% of 79188) affected shaders:
Instrs: 2549482 -> 2544997 (-0.18%); split: -0.18%, +0.00%
CodeSize: 13781648 -> 13763616 (-0.13%); split: -0.13%, +0.00%
Latency: 24832087 -> 24825199 (-0.03%); split: -0.04%, +0.01%
InvThroughput: 5921339 -> 5914799 (-0.11%); split: -0.12%, +0.01%
VClause: 59910 -> 59898 (-0.02%); split: -0.02%, +0.00%
SClause: 62294 -> 62293 (-0.00%)
Copies: 221015 -> 220988 (-0.01%); split: -0.02%, +0.01%
VALU: 1717280 -> 1713332 (-0.23%); split: -0.23%, +0.00%
SALU: 359390 -> 358910 (-0.13%)
VMEM: 101966 -> 101924 (-0.04%)

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33455>
2025-03-14 11:15:04 +00:00
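The identity behind b386659588 can be checked with a small model of 32-bit unsigned arithmetic. When `mask >> c` is a run of n low one-bits, `(a & mask) >> c` extracts the n-bit field starting at bit c, which is what an unsigned bitfield extract computes. The exact conditions the commit checks are not shown in this log; `field_width` and the other helper names below are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def ubfe(value: int, offset: int, bits: int) -> int:
    """Unsigned bitfield extract: `bits` bits of `value` starting at `offset`."""
    return ((value & MASK32) >> offset) & ((1 << bits) - 1)


def and_then_shift(a: int, mask: int, c: int) -> int:
    """The open-coded form (a & mask) >> c on 32-bit unsigned values."""
    return (a & mask & MASK32) >> c


def field_width(mask: int, c: int):
    """Return n if (mask >> c) == 2**n - 1, otherwise None."""
    field = (mask & MASK32) >> c
    if field & (field + 1) != 0:
        return None  # the shifted mask is not a contiguous run of low bits
    return field.bit_length()
```

For example, with `mask = 0x00FF0000` and `c = 16` the field width is 8, and both forms return `0x34` for `a = 0x12345678`.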
Matt Turner
7534559f2f nir: Return NULL, not false, from functions returning pointers
Reported by clang's `-Wbool-conversion`.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34014>
2025-03-13 20:11:09 +00:00
Mary Guillemard
e0be93d881 nir: Add Panfrost specific shader_output intrinsic
On Avalon, this is a bitfield that holds information on what
values a vertex shader should output.

Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Benjamin Lee <benjamin.lee@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33910>
2025-03-10 07:38:16 +01:00
Alyssa Rosenzweig
bc6b527b52 nir/lower_helper_writes: fix stores after discard
We need to use nir_is_helper_invocation instead of
nir_load_helper_invocation, to correctly predicate stores after demote.

Identified in a Piglit on AGX a year ago but I forgot to upstream this.

Fixes: 586da7b329 ("nir: Add nir_lower_helper_writes pass")
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33939>
2025-03-08 07:47:40 +00:00
Daniel Schürmann
dbd41e3ddd nir: set SYSTEM_VALUE_HELPER_INVOCATION read for nir_intrinsic_is_helper_invocation
is_helper_invocation is the volatile access of load_helper_invocation.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33492>
2025-03-07 15:44:49 +00:00
Daniel Schürmann
a4cffa91b8 nir: remove nir_lower_discard_if_to_cf option
Since removing nir_intrinsic_discard{_if} it has no purpose anymore.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33492>
2025-03-07 15:44:49 +00:00
Corentin Noël
eb1274ef08 nir: Add bool return value to nir_legacy_trivialize(..)
Signed-off-by: Corentin Noël <corentin.noel@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33686>
2025-03-06 03:29:20 +00:00
Caterina Shablia
ca9ff8c8c7 nir: teach nir_lower_bit_size to handle ballot and ballot_relaxed
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33365>
2025-03-05 22:58:15 +00:00
Karol Herbst
5c1f61d900 nir: Do not eliminate dead writes to shared memory in called functions.
Fixes regressions in rusticl and c11_atomic OpenCL CTS test.

Fixes: e65c1473de ("nir: Eliminate dead writes to shared memory at the end of the program")
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33807>
2025-03-04 19:41:13 +00:00
Konstantin Seurer
3aeab4ce40 nir/print: Do not print debug information when gathering it
Referencing a shader string with different debug information is
confusing.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28613>
2025-03-04 18:42:48 +00:00
Konstantin Seurer
a04b5ebd3c nir/sweep: Fix handling instructions with debug info
When debug information is present, the nir_instr pointer is not the
start of the allocation.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28613>
2025-03-04 18:42:48 +00:00
Konstantin Seurer
3a69b52d37 nir: Test nir_minimize_call_live_states
Adds a couple of tests for various instructions and controlflow
constructs.

Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33289>
2025-03-03 23:30:57 +00:00
Faith Ekstrand
a65009e808 nir: Add a nir_opt_tex_skip_helpers optimization
Arm and NVIDIA hardware both have this as a bit you can set on the
texture instruction so we may as well have a shared pass for it.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33402>
2025-03-01 08:44:15 +00:00
Faith Ekstrand
7ac6ec2ceb nir: Add a get_io_index_src() helper
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33402>
2025-03-01 08:44:15 +00:00
Georg Lehmann
d272a6e261 nir/opt_algebraic: optimize d3d a ? b : 0
Foz-DB Navi21:
Totals from 3466 (4.34% of 79789) affected shaders:
MaxWaves: 73163 -> 73161 (-0.00%); split: +0.02%, -0.02%
Instrs: 3993862 -> 3987633 (-0.16%); split: -0.19%, +0.04%
CodeSize: 21747420 -> 21725620 (-0.10%); split: -0.15%, +0.05%
VGPRs: 190736 -> 190728 (-0.00%); split: -0.04%, +0.03%
SpillSGPRs: 489 -> 478 (-2.25%); split: -2.86%, +0.61%
Latency: 48169718 -> 48159068 (-0.02%); split: -0.05%, +0.02%
InvThroughput: 12132999 -> 12128721 (-0.04%); split: -0.05%, +0.01%
VClause: 78063 -> 78052 (-0.01%); split: -0.09%, +0.08%
SClause: 109095 -> 108996 (-0.09%); split: -0.13%, +0.04%
Copies: 265784 -> 264530 (-0.47%); split: -0.72%, +0.25%
Branches: 84533 -> 84553 (+0.02%)
PreSGPRs: 172577 -> 172531 (-0.03%); split: -0.19%, +0.16%
PreVGPRs: 165776 -> 165825 (+0.03%); split: -0.06%, +0.09%
VALU: 2851544 -> 2850426 (-0.04%); split: -0.08%, +0.04%
SALU: 413543 -> 408408 (-1.24%); split: -1.45%, +0.21%
VMEM: 139890 -> 139887 (-0.00%)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33761>
2025-03-01 07:49:28 +00:00
Georg Lehmann
2e7f34af6b nir/opt_algebraic: optimize more ine/ieq(umin(b2i, ), 0)
Foz-DB Navi21:
Totals from 76 (0.10% of 79789) affected shaders:
MaxWaves: 1050 -> 1062 (+1.14%)
Instrs: 113754 -> 113691 (-0.06%); split: -0.11%, +0.06%
CodeSize: 605096 -> 605216 (+0.02%); split: -0.03%, +0.05%
VGPRs: 6024 -> 5976 (-0.80%)
Latency: 1776501 -> 1777519 (+0.06%); split: -0.06%, +0.12%
InvThroughput: 379644 -> 376751 (-0.76%)
SClause: 2132 -> 2134 (+0.09%)
Copies: 4131 -> 4128 (-0.07%); split: -1.77%, +1.69%
PreSGPRs: 4275 -> 4270 (-0.12%)
PreVGPRs: 5568 -> 5526 (-0.75%)
VALU: 86732 -> 86581 (-0.17%); split: -0.24%, +0.07%
SALU: 7112 -> 7198 (+1.21%)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33761>
2025-03-01 07:49:28 +00:00
Georg Lehmann
7bc3062a3b nir/opt_algebraic: push comparisons with constants into bcsel with constant
Foz-DB Navi21:
Totals from 1657 (2.08% of 79789) affected shaders:
MaxWaves: 30275 -> 30261 (-0.05%); split: +0.01%, -0.05%
Instrs: 3316251 -> 3315701 (-0.02%); split: -0.04%, +0.02%
CodeSize: 17831924 -> 17832020 (+0.00%); split: -0.06%, +0.06%
SpillSGPRs: 815 -> 859 (+5.40%)
SpillVGPRs: 3335 -> 3293 (-1.26%)
Scratch: 231424 -> 230400 (-0.44%)
Latency: 33413310 -> 33402751 (-0.03%); split: -0.04%, +0.01%
InvThroughput: 9116062 -> 9112904 (-0.03%); split: -0.04%, +0.00%
VClause: 65587 -> 65560 (-0.04%); split: -0.05%, +0.01%
SClause: 86208 -> 86261 (+0.06%); split: -0.02%, +0.08%
Copies: 356158 -> 356439 (+0.08%); split: -0.07%, +0.15%
PreSGPRs: 101710 -> 101806 (+0.09%); split: -0.01%, +0.11%
PreVGPRs: 89293 -> 89286 (-0.01%); split: -0.04%, +0.04%
VALU: 2220900 -> 2218839 (-0.09%); split: -0.11%, +0.01%
SALU: 472988 -> 474567 (+0.33%); split: -0.08%, +0.42%
VMEM: 118401 -> 118347 (-0.05%)
SMEM: 123597 -> 123592 (-0.00%)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33761>
2025-03-01 07:49:27 +00:00
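The rewrite named in 7bc3062a3b relies on a comparison distributing over a select: comparing the result of `bcsel` against a constant gives the same answer as selecting between the two individual comparisons. When one `bcsel` arm is itself a constant, that arm's comparison can be folded at compile time. The sketch below only checks the equivalence, using `<` as a stand-in comparison; all names in it are illustrative.

```python
def bcsel(cond: bool, a, b):
    """Select a when cond is true, otherwise b."""
    return a if cond else b


def compare_then_select(cond: bool, const_a: int, b: int, const_d: int) -> bool:
    """Original shape: cmp(bcsel(cond, #a, b), #d)."""
    return bcsel(cond, const_a, b) < const_d


def select_then_compare(cond: bool, const_a: int, b: int, const_d: int) -> bool:
    """Rewritten shape: bcsel(cond, cmp(#a, #d), cmp(b, #d)).

    const_a < const_d involves only constants, so a compiler could fold it.
    """
    return bcsel(cond, const_a < const_d, b < const_d)
```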
Georg Lehmann
3837bc6d16 nir/opt_algebraic: optimize ~a == ~b and ~a == #b
Foz-DB Navi21:
Totals from 2 (0.00% of 79789) affected shaders:
Instrs: 8343 -> 8323 (-0.24%)
CodeSize: 43884 -> 43764 (-0.27%)
Latency: 19390 -> 19363 (-0.14%)
InvThroughput: 3380 -> 3356 (-0.71%)
VALU: 5413 -> 5393 (-0.37%)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33761>
2025-03-01 07:49:27 +00:00
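The two rewrites in the subject of 3837bc6d16 follow from bitwise NOT being a bijection on fixed-width integers: `~a == ~b` holds exactly when `a == b`, and `~a == #b` holds exactly when `a == ~#b`, where the right-hand side is a foldable constant. A quick check on 32-bit values, with a hypothetical `inot` helper:

```python
MASK32 = 0xFFFFFFFF


def inot(x: int) -> int:
    """32-bit bitwise NOT."""
    return ~x & MASK32


def not_eq_not(a: int, b: int) -> bool:
    """Original shape: ~a == ~b."""
    return inot(a) == inot(b)


def not_eq_const(a: int, const_b: int) -> bool:
    """Original shape: ~a == #b; equivalent to a == ~#b with ~#b folded."""
    return inot(a) == const_b
```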
Georg Lehmann
8759223498 nir/opt_algebraic: optimize b2i/b2f comparison with non 0/1 constants
Foz-DB Navi21:
Totals from 28 (0.04% of 79789) affected shaders:
MaxWaves: 732 -> 728 (-0.55%)
Instrs: 23425 -> 22559 (-3.70%)
CodeSize: 137740 -> 132292 (-3.96%)
VGPRs: 1128 -> 1144 (+1.42%)
Latency: 94604 -> 92423 (-2.31%)
InvThroughput: 19166 -> 18814 (-1.84%); split: -2.38%, +0.54%
VClause: 429 -> 423 (-1.40%)
SClause: 937 -> 926 (-1.17%)
Copies: 1199 -> 914 (-23.77%); split: -24.52%, +0.75%
Branches: 451 -> 421 (-6.65%)
PreSGPRs: 1043 -> 996 (-4.51%)
PreVGPRs: 992 -> 973 (-1.92%); split: -3.53%, +1.61%
VALU: 17566 -> 16865 (-3.99%)
SALU: 1254 -> 1157 (-7.74%)
VMEM: 619 -> 609 (-1.62%)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33761>
2025-03-01 07:49:27 +00:00
Georg Lehmann
2bfcfef5da nir/opt_algebraic: optimize bcsel of b2f and constants
Foz-DB Navi21:
Totals from 212 (0.27% of 79789) affected shaders:
MaxWaves: 4024 -> 4030 (+0.15%)
Instrs: 1314134 -> 1313894 (-0.02%); split: -0.03%, +0.02%
CodeSize: 7033216 -> 7026888 (-0.09%); split: -0.10%, +0.01%
VGPRs: 14224 -> 14176 (-0.34%)
Latency: 7402062 -> 7399180 (-0.04%); split: -0.06%, +0.02%
InvThroughput: 1724879 -> 1723773 (-0.06%); split: -0.07%, +0.00%
VClause: 37741 -> 37711 (-0.08%); split: -0.11%, +0.03%
SClause: 29266 -> 29268 (+0.01%); split: -0.01%, +0.01%
Copies: 123810 -> 123786 (-0.02%); split: -0.19%, +0.17%
Branches: 42370 -> 42407 (+0.09%); split: -0.03%, +0.11%
PreSGPRs: 13149 -> 13196 (+0.36%); split: -0.05%, +0.40%
PreVGPRs: 12407 -> 12395 (-0.10%)
VALU: 884471 -> 883475 (-0.11%); split: -0.12%, +0.01%
SALU: 177671 -> 178408 (+0.41%); split: -0.03%, +0.45%

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33761>
2025-03-01 07:49:27 +00:00
Georg Lehmann
b90826736d nir/opt_algebraic: optimize bit_count(a) != 0
vkd3d-proton will emit
b = ballot(!gl_HelperInvocation);
(subgroupBallotBitCount(b) != 0u) ? subgroupShuffle(a, subgroupBallotFindLSB(b)) : 0u;

for WaveReadFirstLane(a) in fragment shaders

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33808>
2025-02-28 18:03:04 +00:00
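The simplification in b90826736d rests on a simple fact: a value has a non-zero population count exactly when the value itself is non-zero, so `bit_count(a) != 0` can be replaced by `a != 0`. A minimal check on 32-bit values, with illustrative helper names:

```python
MASK32 = 0xFFFFFFFF


def bit_count(a: int) -> int:
    """Number of set bits in the low 32 bits of a."""
    return bin(a & MASK32).count("1")


def has_any_bit_set(a: int) -> bool:
    """Cheaper equivalent of bit_count(a) != 0."""
    return (a & MASK32) != 0
```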
Georg Lehmann
f595bcfe78 nir/opt_varyings: clean up nir_progress usage
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33770>
2025-02-28 14:38:14 +00:00
Job Noorman
739ca77e66 nir/lower_subgroups: use build_cluster_mask for quad mask
build_subgroup_quad_mask can now be written in terms of
build_cluster_mask.

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31732>
2025-02-27 18:53:19 +00:00
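The relationship stated in 739ca77e66 can be modelled with plain integers: a quad is a cluster of four invocations, so a quad mask is a cluster mask with cluster size 4. The functions below borrow the names from the commit message, but their signatures and bodies are assumptions made for illustration, not Mesa's implementation.

```python
def build_cluster_mask(invocation: int, cluster_size: int) -> int:
    """Bitmask of all invocations in the same power-of-two-sized cluster."""
    assert cluster_size > 0 and cluster_size & (cluster_size - 1) == 0
    first = invocation & ~(cluster_size - 1)  # first invocation of the cluster
    return ((1 << cluster_size) - 1) << first


def build_subgroup_quad_mask(invocation: int) -> int:
    """A quad mask is the cluster mask for clusters of four invocations."""
    return build_cluster_mask(invocation, 4)
```

Under this model, invocation 5 belongs to the quad covering invocations 4 to 7, so its quad mask is `0xF0`.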
Benjamin Lee
252c59602e panfrost: implement 16-bit ldexp
Bifrost LDEXP.v2f16 takes a 16-bit exponent, which requires messy
lowering. The codegen for this is quite bad currently, but would be
improved by implementing unpack_32_2x16_split_*, and by fusing
comparisons with CSEL.

The main alternative is converting to F32, then LDEXP.f32, then
converting back to F16. This has better codegen for dynamic exponents
currently, but worse in the common case with a constant exponent where
all the saturating cast logic can be folded.

Fixes dEQP-VK.glsl.builtin.precision_fp16_storage16b.ldexp.compute.vec2
when shaderFloat16 is enabled in panvk.

Signed-off-by: Benjamin Lee <benjamin.lee@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Acked-by: Rebecca Mckeever <rebecca.mckeever@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33637>
2025-02-27 16:49:11 +00:00
Job Noorman
2619d576e7 nir/lower_phis_to_scalar: don't create moves for undef sources
Creating moves out of undefs makes it more difficult for other passes to
detect undefs without having to chase moves. Instead, just create a new
1-component undef.

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29889>
2025-02-27 13:18:14 +00:00
Job Noorman
5ae12b6a5a nir/lower_phis_to_scalar: use nir_builder API where possible
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29889>
2025-02-27 13:18:14 +00:00