Commit graph

5878 commits

Author SHA1 Message Date
Alyssa Rosenzweig
7a4469681e nir: pass a callback to nir_lower_robust_access
rather than try to enumerate everything a driver might want with an unmanageable
collection of booleans, just do a filter callback + data. this ends up simpler
overall, and will allow Intel to use this pass for just 64-bit images without
needing to add even more booleans.

while we're churning the pass signature, also do a quick port to
nir_shader_intrinsics_pass

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> [NIR and V3D]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32907>
2025-01-08 15:59:05 +00:00
Daniel Schürmann
d2f52e61c2 nir/divergence: change nir_has_divergent_loop() to return true only for divergent breaks
The important information is whether a loop has a uniform number
of iterations.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28627>
2025-01-08 13:33:54 +01:00
Mary Guillemard
ecdccae990 nir,agx: Allow nir_precomp_print_blob to print a static array
This makes it stop leaking shader binary blobs definition and is
required for panfrost clc.

Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32939>
2025-01-08 11:37:27 +00:00
Georg Lehmann
67d74a04b9 nir/peephole_select: allow load_vector/scalar_arg_amd
Foz-DB Navi21:
Totals from 1507 (1.90% of 79395) affected shaders:
MaxWaves: 31830 -> 31870 (+0.13%); split: +0.20%, -0.08%
Instrs: 938704 -> 937232 (-0.16%); split: -0.19%, +0.03%
CodeSize: 4970860 -> 4964652 (-0.12%); split: -0.14%, +0.02%
VGPRs: 79536 -> 79512 (-0.03%); split: -0.08%, +0.05%
Latency: 5194524 -> 5218285 (+0.46%); split: -0.38%, +0.84%
InvThroughput: 1200152 -> 1207251 (+0.59%); split: -0.02%, +0.61%
VClause: 20728 -> 20741 (+0.06%); split: -0.11%, +0.17%
SClause: 33612 -> 32871 (-2.20%); split: -2.78%, +0.57%
Copies: 70601 -> 68847 (-2.48%); split: -2.62%, +0.13%
Branches: 20032 -> 17521 (-12.53%)
PreSGPRs: 47828 -> 47801 (-0.06%)
VALU: 637446 -> 638094 (+0.10%); split: -0.02%, +0.13%
SALU: 88627 -> 88462 (-0.19%); split: -1.08%, +0.90%
VMEM: 36664 -> 36659 (-0.01%)

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32792>
2025-01-08 09:56:39 +00:00
Boris Brezillon
2af6e4beeb pan: Don't pretend we support load_{vertex_id_zero_base,first_vertex}
load_vertex_id_zero_base() is supposed to return the zero-based
vertex ID, which is then offset by load_first_vertex() to get
an absolute vertex ID. At the same time, when we're in a Vulkan
environment, load_first_vertex() also encodes the vertexOffset
passed to the indexed draw.

Midgard/Bifrost have a sligtly different semantics, where
load_first_vertex() returns vertexOffset + minVertexIdInIndexRange,
and load_vertex_id_zero_base() returns an ID that needs to be offset
by this vertexOffset + minVertexIdInIndexRange to get the absolute
vertex ID. Everything works fine as long as all the load_first_vertex()
and load_vertex_id_zero_base() calls are coming from the
load_vertex_id() lowering. But as mentioned above, that's no longer
the case in Vulkan, where gl_BaseVertexARB will be turned into
load_first_vertex() and expect a value of vertexOffset in an
indexed draw context.

We thus need to fix the mismatch by introducing two new
panfrost-specific intrinsic so we can stop abusing load_first_vertex()
and load_vertex_id_zero_base().

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32415>
2025-01-07 08:15:19 +00:00
Marek Olšák
3800f0af41 nir/algebraic: optimize pack_split(unpack(a).x, unpack(a).y) -> a
This is required to optimize FP64 and Int64 shaders generated by
virglrenderer. It generates pack/unpack around every 64-bit op,
which NIR currently can't eliminate. This fixes that.

There is a new constraint ".y", which means that the use of an instruction
should have swizzle.y. This allows us to add patterns that have Y swizzle
on results of instructions.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32172>
2025-01-07 05:47:52 +00:00
Marek Olšák
b1bc691b0f nir/algebraic: add and improve pack/unpack patterns
Some duplicated patterns are removed.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32172>
2025-01-07 05:47:52 +00:00
Marek Olšák
ebec182b04 nir/algebraic: use is_used_once for comparison patterns
otherwise we are just creating new instructions while not removing any

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32172>
2025-01-07 05:47:52 +00:00
Marek Olšák
ee8916c414 nir: use IO intrinsics in nir_lower_drawpixels
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32779>
2025-01-06 19:09:17 +00:00
Marek Olšák
0de28a9fd0 nir: use IO intrinsics in nir_lower_bitmap
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32779>
2025-01-06 19:09:17 +00:00
Marek Olšák
a7ad1b302b nir: remove redundant option linker_ignore_precision
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32779>
2025-01-06 19:09:17 +00:00
Marek Olšák
730c8d506f nir: flip the early exit condition in nir_lower_io_temporaries
no change in behavior other than skipping COMPUTE as well.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32779>
2025-01-06 19:09:17 +00:00
Marek Olšák
7b55ee999d nir: don't set num_slots/src/dest_type/write_mask when they're set automatically
to those values

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32779>
2025-01-06 19:09:17 +00:00
Marek Olšák
55a4a8a2a8 nir: set src_type and dest_type to float implicitly for IO build helpers
If you want to set it to int/uint, set .src_type or .dest_type. If you want
to set it to float, you don't need to set the type at all. It's implicitly
set to float.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32779>
2025-01-06 19:09:17 +00:00
Marek Olšák
b9f9d001d7 nir: set nir_io_semantics::num_slots to at least 1 in build helpers
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32779>
2025-01-06 19:09:17 +00:00
Benjamin Lee
081438ad39 panfrost: add nir pass to lower noperspective varyings
Mali only supports perspective-correct varying interpolation in
hardware, so we have to emulate noperspective with lowering in both the
VS and FS.

Both vulkan and opengl allow mismatched interpolation qualifiers between
stages. Because we need all varyings that are noperspective in the FS to
be lowered in the VS, we cannot rely on the interpolation qualifiers in
the VS. Loading the set of noperspective varyings as a sysval allows the
implementation to pass them as a compile-time constant when known
statically, or a runtime push constant when not. Passing noperspective
varyings dynamically has a performance cost with unnecessary branches
and fmuls.

This sysval is not hooked up yet in either panfrost or panvk, so shader
compilation will fail.

Signed-off-by: Benjamin Lee <benjamin.lee@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32127>
2025-01-03 07:04:05 +00:00
Benjamin Lee
6f541e2016 panfrost: add intrinsic to load frag coord at a barycentric
This is needed for noperspective lowering, where we need to multiply the
varying value by gl_FragCoord.w at the same barycentric as the varying.
Normal nir_load_frag_coord_zw instructions are lowered to the new
intrinsic on bifrost with the pan_lower_frag_coord_zw pass.

Signed-off-by: Benjamin Lee <benjamin.lee@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32127>
2025-01-03 07:04:05 +00:00
Timur Kristóf
ec548fd37b Revert "nir/opt_varyings: Add workaround for RADV mesh shader multiview."
The workaround is not needed anymore, because RADV now implements
the FS layer ID input as a sysval.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32641>
2025-01-02 14:07:51 +00:00
Marek Olšák
c21bc65ba7 nir/opt_load_store_vectorize: make hole_size signed to indicate overlapping loads
A negative hole size means the loads overlap. This will be used by drivers
to handle overlapping loads in the callback easily.

Reviewed-by: Mel Henning <drawoc@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32699>
2025-01-01 00:03:55 +00:00
Georg Lehmann
e112e2b047 nir,amd: optimize front_face ? a : -a
Foz-DB Navi31:
Totals from 3345 (4.21% of 79395) affected shaders:
MaxWaves: 96182 -> 96174 (-0.01%)
Instrs: 3135439 -> 3129508 (-0.19%); split: -0.24%, +0.05%
CodeSize: 16776088 -> 16718048 (-0.35%); split: -0.38%, +0.03%
VGPRs: 190884 -> 190848 (-0.02%); split: -0.03%, +0.01%
Latency: 32624132 -> 32621734 (-0.01%); split: -0.16%, +0.16%
InvThroughput: 5759987 -> 5749957 (-0.17%); split: -0.23%, +0.05%
VClause: 51044 -> 51086 (+0.08%); split: -0.12%, +0.20%
SClause: 103415 -> 103223 (-0.19%); split: -0.64%, +0.45%
Copies: 170398 -> 170555 (+0.09%); split: -0.64%, +0.74%
PreSGPRs: 135567 -> 133887 (-1.24%)
PreVGPRs: 140569 -> 141317 (+0.53%)
VALU: 1959144 -> 1953839 (-0.27%); split: -0.30%, +0.03%
SALU: 217956 -> 217676 (-0.13%); split: -0.20%, +0.07%

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32791>
2024-12-30 22:31:35 +00:00
Georg Lehmann
9bd4296845 nir: add nir_alu_srcs_negative_equal_typed
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32791>
2024-12-30 22:31:35 +00:00
Georg Lehmann
15d754fefa nir: add load_front_face_fsign
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32791>
2024-12-30 22:31:34 +00:00
Georg Lehmann
b8fa9daf0c nir: sink/move alu with two identical, non constant sources.
Foz-DB Navi21:
Totals from 32363 (40.76% of 79395) affected shaders:
MaxWaves: 787499 -> 787675 (+0.02%); split: +0.02%, -0.00%
Instrs: 28783404 -> 28783464 (+0.00%); split: -0.01%, +0.01%
CodeSize: 156763536 -> 156765148 (+0.00%); split: -0.01%, +0.02%
VGPRs: 1493304 -> 1492848 (-0.03%); split: -0.04%, +0.01%
Latency: 243022511 -> 243051994 (+0.01%); split: -0.08%, +0.09%
InvThroughput: 57827398 -> 57828129 (+0.00%); split: -0.05%, +0.05%
VClause: 582208 -> 582298 (+0.02%); split: -0.07%, +0.08%
SClause: 959634 -> 959312 (-0.03%); split: -0.07%, +0.04%
Copies: 1965821 -> 1965826 (+0.00%); split: -0.17%, +0.17%
Branches: 710593 -> 710596 (+0.00%); split: -0.00%, +0.01%
PreSGPRs: 1313513 -> 1313632 (+0.01%); split: -0.00%, +0.01%
PreVGPRs: 1210596 -> 1209103 (-0.12%); split: -0.12%, +0.00%
VALU: 19463445 -> 19463497 (+0.00%); split: -0.02%, +0.02%
SALU: 3319529 -> 3319500 (-0.00%); split: -0.01%, +0.01%

Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32783>
2024-12-30 13:28:30 +00:00
Georg Lehmann
5b4b195f1b nir: optimize unpacking 8bit values from a 64bit source
Useful for load vectorization.

Foz-DB Navi21:
Totals from 299 (0.38% of 79395) affected shaders:
Instrs: 287818 -> 284333 (-1.21%); split: -1.21%, +0.00%
CodeSize: 1557124 -> 1540544 (-1.06%); split: -1.07%, +0.00%
Latency: 4009407 -> 4012389 (+0.07%); split: -0.05%, +0.12%
InvThroughput: 1260613 -> 1262530 (+0.15%); split: -0.01%, +0.17%
VClause: 5472 -> 5369 (-1.88%); split: -1.92%, +0.04%
SClause: 5419 -> 5305 (-2.10%); split: -2.58%, +0.48%
Copies: 36709 -> 36060 (-1.77%); split: -1.81%, +0.04%
PreSGPRs: 11861 -> 11655 (-1.74%)
SALU: 66920 -> 64310 (-3.90%)

Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32778>
2024-12-26 17:50:32 +00:00
Marek Olšák
58132d6fc8 radeonsi: implement nir_opt_frag_depth using kill_z instead of the NIR pass
This uses si_shader_info to store whether gl_FragDepth can be removed,
and it uses the kill_z epilog flag to do the removal without recompilation.

Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32713>
2024-12-24 12:02:20 +00:00
Marek Olšák
a50d069d1c nir/opt_varyings: clear info->clip/cull_distance_array_size if relocated
svga breaks if shader_info declares these, but the shader is missing
the outputs.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32684>
2024-12-20 02:32:08 +00:00
Marek Olšák
9d129505b5 nir/opt_varyings: set all IO types to float to facilitate full vectorization
If types differ between components of a vec4 slot, IO vectorization can't
be done.

This also helps drivers like d3d12 that require matching types between
shaders.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32684>
2024-12-20 02:32:08 +00:00
Caterina Shablia
f4fcfa8016 pan,nir: introduce load_attribute_pan
load_attribute_pan is a panfrost-specific intrinsic for loading
vertex attributes. Takes explicit vertex and instance IDs which
we need in order to implement vertex attribute divisor with
non-zero base instance on v9+.

Passes which are used by panvk are modified to be aware of
load_attribute_pan.

Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32039>
2024-12-18 08:33:16 +00:00
Georg Lehmann
c695043e81 nir/opt_algebraic: optimize min(max(a, b), a)
Foz-DB Navi21:
Totals from 105 (0.13% of 79395) affected shaders:
MaxWaves: 2638 -> 2646 (+0.30%)
Instrs: 76531 -> 75077 (-1.90%)
CodeSize: 413668 -> 406484 (-1.74%)
VGPRs: 4856 -> 4848 (-0.16%)
Latency: 333684 -> 328438 (-1.57%); split: -1.57%, +0.00%
InvThroughput: 80417 -> 78579 (-2.29%)
VClause: 1818 -> 1768 (-2.75%)
SClause: 3028 -> 2964 (-2.11%)
Copies: 4708 -> 4513 (-4.14%); split: -4.50%, +0.36%
PreVGPRs: 3792 -> 3715 (-2.03%); split: -2.08%, +0.05%
VALU: 54734 -> 53528 (-2.20%)
SALU: 6195 -> 6137 (-0.94%)
VMEM: 2363 -> 2313 (-2.12%)
SMEM: 5219 -> 5119 (-1.92%)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32634>
2024-12-16 22:29:21 +00:00
Georg Lehmann
0e6d32777f nir/opt_remove_phis: rematerialize equal alu
Foz-DB Navi31:
Totals from 943 (1.19% of 79395) affected shaders:
MaxWaves: 24672 -> 24722 (+0.20%)
Instrs: 1541665 -> 1544956 (+0.21%); split: -0.23%, +0.44%
CodeSize: 8085180 -> 8109212 (+0.30%); split: -0.16%, +0.46%
VGPRs: 57768 -> 57624 (-0.25%)
Latency: 18043743 -> 17948245 (-0.53%); split: -1.28%, +0.75%
InvThroughput: 2692605 -> 2677049 (-0.58%); split: -2.07%, +1.49%
VClause: 25321 -> 25343 (+0.09%); split: -0.48%, +0.57%
SClause: 38473 -> 38614 (+0.37%); split: -0.00%, +0.37%
Copies: 86089 -> 86236 (+0.17%); split: -0.46%, +0.63%
Branches: 36719 -> 36777 (+0.16%); split: -0.60%, +0.76%
PreSGPRs: 44138 -> 44303 (+0.37%); split: -0.05%, +0.42%
PreVGPRs: 43319 -> 43009 (-0.72%)
VALU: 893684 -> 894272 (+0.07%); split: -0.42%, +0.48%
SALU: 189561 -> 191358 (+0.95%); split: -0.05%, +1.00%
VMEM: 42294 -> 42313 (+0.04%); split: -0.44%, +0.49%
SMEM: 72916 -> 73144 (+0.31%)

Instruction count regressions are largly caused by additional
loop unrolling.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31028>
2024-12-16 20:38:38 +00:00
Qiang Yu
129e37bab6 nir: do not generate b2i64 when driver want to lower it
This is found on GFX12 by:
  KHR-GL43.shader_ballot_tests.ShaderBallotBitmasks

ACO does not support it.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32570>
2024-12-16 07:35:07 +00:00
Alyssa Rosenzweig
bd89279dd4 nir: add lower_scratch_to_var pass
to ease opencl pain.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32529>
2024-12-12 21:16:13 +00:00
Rhys Perry
26790e90d3 nir: make ballot ALU and mbcnt_amd operations reorderable
These can be lowered to ALU and load_subgroup_invocation, all of which are
reorderable.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32512>
2024-12-11 14:47:12 +00:00
Rhys Perry
650468fbdf nir/move_discards_to_top: don't move across more intrinsics
This missed dpp16_shift_amd, lane_permute_16_amd, last_invocation and
ballot_relaxed.

Instead, list the non-reorderable intrinsics which are allowed to be moved
after discards.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32512>
2024-12-11 14:47:12 +00:00
Rhys Perry
5368569d06 nir: make load_helper_invocation non-reorderable
This can't be moved to after demote, so it's not reorderable.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32512>
2024-12-11 14:47:12 +00:00
Georg Lehmann
e8b29abb25 nir: add unsigned upper bound support for fsat
Foz-DB Navi21:
Totals from 89 (0.11% of 79395) affected shaders:
Instrs: 97018 -> 96995 (-0.02%)
CodeSize: 492996 -> 492488 (-0.10%)
Latency: 504649 -> 504555 (-0.02%)
InvThroughput: 121968 -> 121875 (-0.08%)
VALU: 67427 -> 67404 (-0.03%)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32565>
2024-12-10 20:53:53 +00:00
Georg Lehmann
e78e63e3fe nir: add unsigned upper bound support for f2i32
Foz-DB Navi21:
Totals from 649 (0.82% of 79395) affected shaders:
CodeSize: 2330592 -> 2314112 (-0.71%)
Latency: 2068161 -> 2053370 (-0.72%)
InvThroughput: 346583 -> 329425 (-4.95%)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32565>
2024-12-10 20:53:53 +00:00
Georg Lehmann
0b366a7ab2 nir/uub: properly limit float support to 32bit
Cc: mesa-stable

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32565>
2024-12-10 20:53:53 +00:00
Alyssa Rosenzweig
83dd4889a7 nir/lower_point_size: skip non-var derefs
these can happen depending on pass order, otherwise we crash on the null
pointer.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32564>
2024-12-10 19:13:07 +00:00
Alyssa Rosenzweig
69a0962c70 nir/lower_printf: use 64-bit math
this lets load_store_vectorize vectorize the stores we produce. it also matches
actual OpenCL kernel code looks, so drivers need to have an optimized path for
these 64+32 patterns regardless.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32564>
2024-12-10 19:13:07 +00:00
Alyssa Rosenzweig
da967416db nir/lower_printf: use unsigned math
negative offsets/sizes don't make sense, and zero-extension is often easier
to optimize/lower than sign-extension.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32564>
2024-12-10 19:13:07 +00:00
Alyssa Rosenzweig
8db0751eb8 nir/lower_printf: lower aborts
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32564>
2024-12-10 19:13:07 +00:00
Alyssa Rosenzweig
0b9072e2e5 nir/lower_printf: allow fixed address
fixed address printf buffers can avoid a lot of complexity, especially with the
general case of (e.g.) DGC-enqueued precompiled kernels. so add a knob for that
and save the driver the need to write a lowering pass.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32564>
2024-12-10 19:13:07 +00:00
Alyssa Rosenzweig
816c14d33d nir: add printf_abort intrinsic
abort() for the gpu, implemented with the printf infrastructure since they go
together.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32564>
2024-12-10 19:13:07 +00:00
Georg Lehmann
c5c22fc3a3 nir: add constant clip/cull distance optimization
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32518>
2024-12-10 16:35:01 +00:00
Benjamin Lee
b01afd06cd nir: update docs for nir_get_io_arrayed_index_src
Signed-off-by: Benjamin Lee <benjamin.lee@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31704>
2024-12-09 20:31:49 +00:00
Benjamin Lee
74ccf6cbdc nir: add option to use compact view indices
In panvk we pass absolute view indices to the hardware, so we need to do
the conversion from compacted to absolute at some point. Emitting
absolute indices from nir_lower_multiview initially looks like the
simplest option, but nir_lower_io_to_temporaries will emit a write for
every element of array varyings. This results in unnecessary writes to
disabled views.

Signed-off-by: Benjamin Lee <benjamin.lee@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31704>
2024-12-09 20:31:49 +00:00
Benjamin Lee
becb014d27 nir: treat per-view outputs as arrayed IO
This is needed for implementing multiview in panvk, where the address
calculation for multiview outputs is not well-represented by lowering to
nir_intrinsic_store_output with a single offset.

The case where a variable is both per-view and per-{vertex,primitive} is
now unsupported. This would come up with drivers implementing
NV_mesh_shader or using nir_lower_multiview on geometry, tessellation,
or mesh shaders. No drivers currently do either of these. There was some
code that attempted to handle the nested per-view case by unwrapping
per-view/arrayed types twice, but it's unclear to what extent this
actually worked.

ANV and Turnip both rely on per-view outputs being assigned a unique
driver location for each view, so I've added on option to configure that
behavior rather than removing it.

Signed-off-by: Benjamin Lee <benjamin.lee@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31704>
2024-12-09 20:31:49 +00:00
Benjamin Lee
6d843cde45 nir: document index semantics in nir_lower_multiview
Signed-off-by: Benjamin Lee <benjamin.lee@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31704>
2024-12-09 20:31:49 +00:00
Benjamin Lee
975c3ecd1e nir: handle arbitrary per-view outputs in nir_lower_multiview
This is needed for panvk, where multiview is "all or nothing". When
multiview is enabled, all outputs may be written with separate values
for each view.

The edge case mentioned in the previous `nir_can_lower_multiview` is now
handled because we now handle an arbitrary number of per-view output
vars instead of expecting to find exactly one.

Signed-off-by: Benjamin Lee <benjamin.lee@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31704>
2024-12-09 20:31:49 +00:00