Commit graph

10075 commits

Author SHA1 Message Date
Mel Henning
3b341366a6 compiler/rust: Fix running tests
`ninja test` wasn't actually running these tests, I guess because the
target name was duplicated in meson. Fix this so the tests actually run.

Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32812>
2025-01-02 20:52:47 +00:00
Mel Henning
639211dea8 compiler/rust/bitset: Fix the bitset iterator
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32812>
2025-01-02 20:52:47 +00:00
Timur Kristóf
ec548fd37b Revert "nir/opt_varyings: Add workaround for RADV mesh shader multiview."
The workaround is not needed anymore, because RADV now implements
the FS layer ID input as a sysval.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32641>
2025-01-02 14:07:51 +00:00
Marek Olšák
c21bc65ba7 nir/opt_load_store_vectorize: make hole_size signed to indicate overlapping loads
A negative hole size means the loads overlap. This will be used by drivers
to handle overlapping loads in the callback easily.

Reviewed-by: Mel Henning <drawoc@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32699>
2025-01-01 00:03:55 +00:00
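A minimal sketch of the hole-size arithmetic. It assumes each load is described by a byte offset and a size, and that load A starts at or before load B. `load_hole_size` is a hypothetical helper, not the vectorizer's actual callback signature.

```c
#include <stdint.h>

/* Hypothetical helper: bytes between the end of load A and the start of
 * load B, assuming A starts at or before B.
 *   > 0 : gap of that many bytes between the loads
 *   = 0 : the loads are exactly adjacent
 *   < 0 : the loads overlap by that many bytes */
static int64_t load_hole_size(uint64_t offset_a, uint32_t size_a,
                              uint64_t offset_b)
{
   return (int64_t)offset_b - (int64_t)(offset_a + size_a);
}
```

With a signed result, a driver callback could, for example, merge a negative (overlapping) case into one wider load instead of treating it as a gap.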
Georg Lehmann
e112e2b047 nir,amd: optimize front_face ? a : -a
Foz-DB Navi31:
Totals from 3345 (4.21% of 79395) affected shaders:
MaxWaves: 96182 -> 96174 (-0.01%)
Instrs: 3135439 -> 3129508 (-0.19%); split: -0.24%, +0.05%
CodeSize: 16776088 -> 16718048 (-0.35%); split: -0.38%, +0.03%
VGPRs: 190884 -> 190848 (-0.02%); split: -0.03%, +0.01%
Latency: 32624132 -> 32621734 (-0.01%); split: -0.16%, +0.16%
InvThroughput: 5759987 -> 5749957 (-0.17%); split: -0.23%, +0.05%
VClause: 51044 -> 51086 (+0.08%); split: -0.12%, +0.20%
SClause: 103415 -> 103223 (-0.19%); split: -0.64%, +0.45%
Copies: 170398 -> 170555 (+0.09%); split: -0.64%, +0.74%
PreSGPRs: 135567 -> 133887 (-1.24%)
PreVGPRs: 140569 -> 141317 (+0.53%)
VALU: 1959144 -> 1953839 (-0.27%); split: -0.30%, +0.03%
SALU: 217956 -> 217676 (-0.13%); split: -0.20%, +0.07%

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32791>
2024-12-30 22:31:35 +00:00
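This rewrite relies on an identity: `front_face ? a : -a` equals `a` multiplied by a +1.0/-1.0 sign derived from the facing bit. That sign is presumably what `load_front_face_fsign` (15d754fefa) provides. Below is a C sketch of the two forms with hypothetical helper names. Multiplying by +1.0 or -1.0 is exact, so the two forms agree for ordinary values, zeros and infinities.

```c
#include <stdbool.h>

/* Assumed model of an fsign-style facing value: +1.0 for front faces,
 * -1.0 for back faces. */
static float front_face_fsign(bool front_face)
{
   return front_face ? 1.0f : -1.0f;
}

/* Original pattern: select between a and its negation. */
static float select_form(bool front_face, float a)
{
   return front_face ? a : -a;
}

/* Rewritten pattern: one multiply by the facing sign. */
static float multiply_form(bool front_face, float a)
{
   return a * front_face_fsign(front_face);
}
```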
Georg Lehmann
9bd4296845 nir: add nir_alu_srcs_negative_equal_typed
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32791>
2024-12-30 22:31:35 +00:00
Georg Lehmann
15d754fefa nir: add load_front_face_fsign
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32791>
2024-12-30 22:31:34 +00:00
Georg Lehmann
b8fa9daf0c nir: sink/move alu with two identical, non constant sources.
Foz-DB Navi21:
Totals from 32363 (40.76% of 79395) affected shaders:
MaxWaves: 787499 -> 787675 (+0.02%); split: +0.02%, -0.00%
Instrs: 28783404 -> 28783464 (+0.00%); split: -0.01%, +0.01%
CodeSize: 156763536 -> 156765148 (+0.00%); split: -0.01%, +0.02%
VGPRs: 1493304 -> 1492848 (-0.03%); split: -0.04%, +0.01%
Latency: 243022511 -> 243051994 (+0.01%); split: -0.08%, +0.09%
InvThroughput: 57827398 -> 57828129 (+0.00%); split: -0.05%, +0.05%
VClause: 582208 -> 582298 (+0.02%); split: -0.07%, +0.08%
SClause: 959634 -> 959312 (-0.03%); split: -0.07%, +0.04%
Copies: 1965821 -> 1965826 (+0.00%); split: -0.17%, +0.17%
Branches: 710593 -> 710596 (+0.00%); split: -0.00%, +0.01%
PreSGPRs: 1313513 -> 1313632 (+0.01%); split: -0.00%, +0.01%
PreVGPRs: 1210596 -> 1209103 (-0.12%); split: -0.12%, +0.00%
VALU: 19463445 -> 19463497 (+0.00%); split: -0.02%, +0.02%
SALU: 3319529 -> 3319500 (-0.00%); split: -0.01%, +0.01%

Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32783>
2024-12-30 13:28:30 +00:00
Georg Lehmann
5b4b195f1b nir: optimize unpacking 8bit values from a 64bit source
Useful for load vectorization.

Foz-DB Navi21:
Totals from 299 (0.38% of 79395) affected shaders:
Instrs: 287818 -> 284333 (-1.21%); split: -1.21%, +0.00%
CodeSize: 1557124 -> 1540544 (-1.06%); split: -1.07%, +0.00%
Latency: 4009407 -> 4012389 (+0.07%); split: -0.05%, +0.12%
InvThroughput: 1260613 -> 1262530 (+0.15%); split: -0.01%, +0.17%
VClause: 5472 -> 5369 (-1.88%); split: -1.92%, +0.04%
SClause: 5419 -> 5305 (-2.10%); split: -2.58%, +0.48%
Copies: 36709 -> 36060 (-1.77%); split: -1.81%, +0.04%
PreSGPRs: 11861 -> 11655 (-1.74%)
SALU: 66920 -> 64310 (-3.90%)

Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32778>
2024-12-26 17:50:32 +00:00
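The commit message does not spell out the exact rewrite. One identity that such an optimization can rely on is that any byte of a 64-bit value can be read from the 32-bit half that contains it, which avoids 64-bit shifts on hardware with 32-bit ALUs. A sketch with hypothetical helper names:

```c
#include <stdint.h>

/* Byte i (0..7) of x, extracted with a 64-bit shift. */
static uint8_t byte_from_u64(uint64_t x, unsigned i)
{
   return (uint8_t)(x >> (8 * i));
}

/* The same byte, read from the 32-bit half that contains it:
 * bytes 0..3 live in the low half, bytes 4..7 in the high half. */
static uint8_t byte_from_u32_half(uint64_t x, unsigned i)
{
   uint32_t half = (uint32_t)(x >> (32 * (i / 4)));
   return (uint8_t)(half >> (8 * (i % 4)));
}
```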
Marek Olšák
58132d6fc8 radeonsi: implement nir_opt_frag_depth using kill_z instead of the NIR pass
This uses si_shader_info to store whether gl_FragDepth can be removed,
and it uses the kill_z epilog flag to do the removal without recompilation.

Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32713>
2024-12-24 12:02:20 +00:00
Marek Olšák
dae57e184a glsl,st/mesa: always lower IO for GLSL, unlower IO for drivers
This enables nir_opt_varyings for all gallium drivers.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31942>
2024-12-24 05:54:07 -05:00
Mary Guillemard
13fe5a597b meson: Add mesa-clc and install-mesa-clc options
Due to cross-build issues in current meson, we add new options to
allow mesa_clc and vtn_bindgen to be installed or searched for on the
system.

Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32719>
2024-12-23 15:09:40 +00:00
Marek Olšák
a50d069d1c nir/opt_varyings: clear info->clip/cull_distance_array_size if relocated
svga breaks if shader_info declares these, but the shader is missing
the outputs.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32684>
2024-12-20 02:32:08 +00:00
Marek Olšák
9d129505b5 nir/opt_varyings: set all IO types to float to facilitate full vectorization
If types differ between components of a vec4 slot, IO vectorization can't
be done.

This also helps drivers like d3d12 that require matching types between
shaders.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32684>
2024-12-20 02:32:08 +00:00
Caterina Shablia
f4fcfa8016 pan,nir: introduce load_attribute_pan
load_attribute_pan is a panfrost-specific intrinsic for loading
vertex attributes. Takes explicit vertex and instance IDs which
we need in order to implement vertex attribute divisor with
non-zero base instance on v9+.

Passes which are used by panvk are modified to be aware of
load_attribute_pan.

Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32039>
2024-12-18 08:33:16 +00:00
Georg Lehmann
c695043e81 nir/opt_algebraic: optimize min(max(a, b), a)
Foz-DB Navi21:
Totals from 105 (0.13% of 79395) affected shaders:
MaxWaves: 2638 -> 2646 (+0.30%)
Instrs: 76531 -> 75077 (-1.90%)
CodeSize: 413668 -> 406484 (-1.74%)
VGPRs: 4856 -> 4848 (-0.16%)
Latency: 333684 -> 328438 (-1.57%); split: -1.57%, +0.00%
InvThroughput: 80417 -> 78579 (-2.29%)
VClause: 1818 -> 1768 (-2.75%)
SClause: 3028 -> 2964 (-2.11%)
Copies: 4708 -> 4513 (-4.14%); split: -4.50%, +0.36%
PreVGPRs: 3792 -> 3715 (-2.03%); split: -2.08%, +0.05%
VALU: 54734 -> 53528 (-2.20%)
SALU: 6195 -> 6137 (-0.94%)
VMEM: 2363 -> 2313 (-2.12%)
SMEM: 5219 -> 5119 (-1.92%)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32634>
2024-12-16 22:29:21 +00:00
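Why `min(max(a, b), a)` folds to `a`: `max(a, b)` is never below `a`, so clamping it back down with `min(..., a)` always yields `a`. The sketch below shows the integer case. For floats the same reasoning holds for non-NaN inputs, but the float fold also depends on how NaN is handled.

```c
#include <stdint.h>

static int32_t imin(int32_t x, int32_t y) { return x < y ? x : y; }
static int32_t imax(int32_t x, int32_t y) { return x > y ? x : y; }

/* max(a, b) >= a, so min(max(a, b), a) == a for every a and b. */
static int32_t min_of_max(int32_t a, int32_t b)
{
   return imin(imax(a, b), a);
}
```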
Georg Lehmann
0e6d32777f nir/opt_remove_phis: rematerialize equal alu
Foz-DB Navi31:
Totals from 943 (1.19% of 79395) affected shaders:
MaxWaves: 24672 -> 24722 (+0.20%)
Instrs: 1541665 -> 1544956 (+0.21%); split: -0.23%, +0.44%
CodeSize: 8085180 -> 8109212 (+0.30%); split: -0.16%, +0.46%
VGPRs: 57768 -> 57624 (-0.25%)
Latency: 18043743 -> 17948245 (-0.53%); split: -1.28%, +0.75%
InvThroughput: 2692605 -> 2677049 (-0.58%); split: -2.07%, +1.49%
VClause: 25321 -> 25343 (+0.09%); split: -0.48%, +0.57%
SClause: 38473 -> 38614 (+0.37%); split: -0.00%, +0.37%
Copies: 86089 -> 86236 (+0.17%); split: -0.46%, +0.63%
Branches: 36719 -> 36777 (+0.16%); split: -0.60%, +0.76%
PreSGPRs: 44138 -> 44303 (+0.37%); split: -0.05%, +0.42%
PreVGPRs: 43319 -> 43009 (-0.72%)
VALU: 893684 -> 894272 (+0.07%); split: -0.42%, +0.48%
SALU: 189561 -> 191358 (+0.95%); split: -0.05%, +1.00%
VMEM: 42294 -> 42313 (+0.04%); split: -0.44%, +0.49%
SMEM: 72916 -> 73144 (+0.31%)

Instruction count regressions are largely caused by additional
loop unrolling.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31028>
2024-12-16 20:38:38 +00:00
Qiang Yu
129e37bab6 nir: do not generate b2i64 when the driver wants to lower it
This is found on GFX12 by:
  KHR-GL43.shader_ballot_tests.ShaderBallotBitmasks

ACO does not support it.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32570>
2024-12-16 07:35:07 +00:00
Alyssa Rosenzweig
923e6361d1 compiler/glsl_types: add glsl_get_word_size_align_bytes
This alignment matches what nir_lower_scratch_to_var wants. This is not
correctness-bearing, but it mitigates stats regressions.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32529>
2024-12-12 21:16:13 +00:00
Alyssa Rosenzweig
bd89279dd4 nir: add lower_scratch_to_var pass
To ease OpenCL pain.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32529>
2024-12-12 21:16:13 +00:00
Alyssa Rosenzweig
8abb043c19 compiler: add mesa_prim_has_adjacency helper
hk will use this; it's a pretty obvious thing to want.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32529>
2024-12-12 21:16:12 +00:00
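A sketch of what such a helper checks. The enum below is hypothetical and exists only for illustration. Mesa's real primitive enum, its values and the real helper's signature may differ. A switch is used rather than a range comparison, so the result does not depend on enum ordering.

```c
#include <stdbool.h>

/* Hypothetical primitive enum for illustration only. */
enum prim {
   PRIM_POINTS,
   PRIM_LINES,
   PRIM_LINE_STRIP,
   PRIM_TRIANGLES,
   PRIM_TRIANGLE_STRIP,
   PRIM_LINES_ADJACENCY,
   PRIM_LINE_STRIP_ADJACENCY,
   PRIM_TRIANGLES_ADJACENCY,
   PRIM_TRIANGLE_STRIP_ADJACENCY,
};

/* True for the four primitive types that carry adjacency vertices. */
static bool prim_has_adjacency(enum prim p)
{
   switch (p) {
   case PRIM_LINES_ADJACENCY:
   case PRIM_LINE_STRIP_ADJACENCY:
   case PRIM_TRIANGLES_ADJACENCY:
   case PRIM_TRIANGLE_STRIP_ADJACENCY:
      return true;
   default:
      return false;
   }
}
```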
Alyssa Rosenzweig
e4f61771d8 compiler: use libcl.h for CL
instead of redefining BITFIELD_BIT.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32529>
2024-12-12 21:16:12 +00:00
Alyssa Rosenzweig
d64caf4161 libcl: add VkDraw(Indexed)IndirectCommand definitions
This is helpful for indirect draw munging code, which applies to at least 3
stacks using driver CL stuff (current Intel, short-term Asahi, medium-term
Panfrost).

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32529>
2024-12-12 21:16:12 +00:00
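For reference, the two commands have the field layouts given in the Vulkan specification. This listing does not show how libcl actually spells them. The munging helper at the end is hypothetical and shows the kind of check that indirect-draw code might perform on such a record.

```c
#include <stdbool.h>
#include <stdint.h>

/* Field layouts as defined by the Vulkan specification. */
typedef struct {
   uint32_t vertexCount;
   uint32_t instanceCount;
   uint32_t firstVertex;
   uint32_t firstInstance;
} VkDrawIndirectCommand;

typedef struct {
   uint32_t indexCount;
   uint32_t instanceCount;
   uint32_t firstIndex;
   int32_t  vertexOffset;
   uint32_t firstInstance;
} VkDrawIndexedIndirectCommand;

/* All fields are 4 bytes, so there is no padding; CPU and GPU code can
 * agree on the layout. */
_Static_assert(sizeof(VkDrawIndirectCommand) == 16, "unexpected size");
_Static_assert(sizeof(VkDrawIndexedIndirectCommand) == 20, "unexpected size");

/* Hypothetical munging helper: true when the draw would render nothing. */
static bool draw_indirect_is_empty(const VkDrawIndirectCommand *cmd)
{
   return cmd->vertexCount == 0 || cmd->instanceCount == 0;
}
```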
Alyssa Rosenzweig
12e27497b3 libcl: add a common header for CPU/GPU stuff
In an attempt to make OpenCL shaders more "batteries included", start building
up a standard library. Based on libagx.h.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32529>
2024-12-12 21:16:12 +00:00
Alyssa Rosenzweig
13b8af95fb clc: plumb cl_khr_subgroup_ballot
Although rusticl isn't lighting it up yet, it's helpful to get
sub_group_ballot for driver CL, which is all standard Vulkan-compatible SPIR-V.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32529>
2024-12-12 21:16:12 +00:00
Samuel Pitoiset
4d4418dbb3 spirv: add an option to lower SpvOpTerminateInvocation to OpKill
To work around game bugs, such as the one in Indiana Jones.

Original workaround found by Hans-Kristian.

Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32606>
2024-12-12 19:54:39 +00:00
Rhys Perry
26790e90d3 nir: make ballot ALU and mbcnt_amd operations reorderable
These can be lowered to ALU and load_subgroup_invocation, all of which are
reorderable.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32512>
2024-12-11 14:47:12 +00:00
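To illustrate why mbcnt-style operations only need ALU work plus the invocation index, here is a sketch. It assumes the mbcnt semantics are "count the set bits of a mask below the current lane, then add a second operand". The lane index is the only input that does not come from ALU sources, and it is what load_subgroup_invocation would supply. The names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Portable population count: clear the lowest set bit until none remain. */
static unsigned popcount64(uint64_t x)
{
   unsigned n = 0;
   while (x) {
      x &= x - 1;
      n++;
   }
   return n;
}

/* Assumed mbcnt semantics: set bits of `mask` strictly below `lane`,
 * plus `addend`. */
static uint32_t mbcnt(uint64_t mask, uint32_t addend, unsigned lane)
{
   assert(lane < 64);
   /* (1 << lane) - 1 selects bits 0..lane-1; for lane 0 it is empty. */
   uint64_t below = mask & ((UINT64_C(1) << lane) - 1);
   return popcount64(below) + addend;
}
```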
Rhys Perry
650468fbdf nir/move_discards_to_top: don't move across more intrinsics
This missed dpp16_shift_amd, lane_permute_16_amd, last_invocation and
ballot_relaxed.

Instead, list the non-reorderable intrinsics which are allowed to be moved
after discards.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32512>
2024-12-11 14:47:12 +00:00
Rhys Perry
5368569d06 nir: make load_helper_invocation non-reorderable
This can't be moved to after demote, so it's not reorderable.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32512>
2024-12-11 14:47:12 +00:00
Georg Lehmann
e8b29abb25 nir: add unsigned upper bound support for fsat
Foz-DB Navi21:
Totals from 89 (0.11% of 79395) affected shaders:
Instrs: 97018 -> 96995 (-0.02%)
CodeSize: 492996 -> 492488 (-0.10%)
Latency: 504649 -> 504555 (-0.02%)
InvThroughput: 121968 -> 121875 (-0.08%)
VALU: 67427 -> 67404 (-0.03%)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32565>
2024-12-10 20:53:53 +00:00
Georg Lehmann
e78e63e3fe nir: add unsigned upper bound support for f2i32
Foz-DB Navi21:
Totals from 649 (0.82% of 79395) affected shaders:
CodeSize: 2330592 -> 2314112 (-0.71%)
Latency: 2068161 -> 2053370 (-0.72%)
InvThroughput: 346583 -> 329425 (-4.95%)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32565>
2024-12-10 20:53:53 +00:00
Georg Lehmann
0b366a7ab2 nir/uub: properly limit float support to 32bit
Cc: mesa-stable

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32565>
2024-12-10 20:53:53 +00:00
Alyssa Rosenzweig
83dd4889a7 nir/lower_point_size: skip non-var derefs
These can happen depending on pass order; without this check we crash on
the null pointer.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32564>
2024-12-10 19:13:07 +00:00
Alyssa Rosenzweig
69a0962c70 nir/lower_printf: use 64-bit math
This lets load_store_vectorize vectorize the stores we produce. It also matches
what actual OpenCL kernel code looks like, so drivers need to have an optimized
path for these 64+32 patterns regardless.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32564>
2024-12-10 19:13:07 +00:00
Alyssa Rosenzweig
da967416db nir/lower_printf: use unsigned math
Negative offsets/sizes don't make sense, and zero-extension is often easier
to optimize/lower than sign-extension.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32564>
2024-12-10 19:13:07 +00:00
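To illustrate the zero- versus sign-extension point: widening an unsigned 32-bit offset to 64 bits only needs a zero high word. A signed offset has to replicate bit 31 into the high word first, and the two differ once bit 31 is set. A sketch of a 64-bit base plus a 32-bit offset, with hypothetical helper names:

```c
#include <stdint.h>

/* Unsigned offset: widening is a zero-extension (high word = 0). */
static uint64_t addr_zext(uint64_t base, uint32_t offset)
{
   return base + (uint64_t)offset;
}

/* Signed offset: widening is a sign-extension (high word copies bit 31). */
static uint64_t addr_sext(uint64_t base, int32_t offset)
{
   return base + (uint64_t)(int64_t)offset;
}
```

For small offsets the two agree. With bit 31 set, the signed form moves the address backwards, which is never what a printf buffer offset means.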
Alyssa Rosenzweig
8db0751eb8 nir/lower_printf: lower aborts
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32564>
2024-12-10 19:13:07 +00:00
Alyssa Rosenzweig
0b9072e2e5 nir/lower_printf: allow fixed address
Fixed-address printf buffers can avoid a lot of complexity, especially in the
general case of (e.g.) DGC-enqueued precompiled kernels. So add a knob for that
and save the driver the need to write a lowering pass.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32564>
2024-12-10 19:13:07 +00:00
Alyssa Rosenzweig
816c14d33d nir: add printf_abort intrinsic
abort() for the GPU, implemented with the printf infrastructure since they go
together.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32564>
2024-12-10 19:13:07 +00:00
Georg Lehmann
c5c22fc3a3 nir: add constant clip/cull distance optimization
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32518>
2024-12-10 16:35:01 +00:00
Benjamin Lee
b01afd06cd nir: update docs for nir_get_io_arrayed_index_src
Signed-off-by: Benjamin Lee <benjamin.lee@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31704>
2024-12-09 20:31:49 +00:00
Benjamin Lee
74ccf6cbdc nir: add option to use compact view indices
In panvk we pass absolute view indices to the hardware, so we need to do
the conversion from compacted to absolute at some point. Emitting
absolute indices from nir_lower_multiview initially looks like the
simplest option, but nir_lower_io_to_temporaries will emit a write for
every element of array varyings. This results in unnecessary writes to
disabled views.

Signed-off-by: Benjamin Lee <benjamin.lee@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31704>
2024-12-09 20:31:49 +00:00
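This sketch assumes that a compacted view index counts only the views enabled in the multiview mask, which is what the message implies. Converting it to an absolute view index then means finding the bit position of the n-th set bit of the mask. The helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: map a compacted view index (the n-th enabled view)
 * to its absolute view index (that view's bit position in view_mask). */
static unsigned compact_to_absolute_view(uint32_t view_mask, unsigned compact)
{
   for (unsigned bit = 0; bit < 32; bit++) {
      if (view_mask & (UINT32_C(1) << bit)) {
         if (compact == 0)
            return bit;
         compact--; /* skip this enabled view */
      }
   }
   assert(!"compact index out of range for view_mask");
   return 0;
}
```

With views 1 and 3 enabled (mask 0b1010), compacted index 0 maps to view 1 and index 1 maps to view 3.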
Benjamin Lee
becb014d27 nir: treat per-view outputs as arrayed IO
This is needed for implementing multiview in panvk, where the address
calculation for multiview outputs is not well-represented by lowering to
nir_intrinsic_store_output with a single offset.

The case where a variable is both per-view and per-{vertex,primitive} is
now unsupported. This would come up with drivers implementing
NV_mesh_shader or using nir_lower_multiview on geometry, tessellation,
or mesh shaders. No drivers currently do either of these. There was some
code that attempted to handle the nested per-view case by unwrapping
per-view/arrayed types twice, but it's unclear to what extent this
actually worked.

ANV and Turnip both rely on per-view outputs being assigned a unique
driver location for each view, so I've added an option to configure that
behavior rather than removing it.

Signed-off-by: Benjamin Lee <benjamin.lee@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31704>
2024-12-09 20:31:49 +00:00
Benjamin Lee
6d843cde45 nir: document index semantics in nir_lower_multiview
Signed-off-by: Benjamin Lee <benjamin.lee@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31704>
2024-12-09 20:31:49 +00:00
Benjamin Lee
975c3ecd1e nir: handle arbitrary per-view outputs in nir_lower_multiview
This is needed for panvk, where multiview is "all or nothing". When
multiview is enabled, all outputs may be written with separate values
for each view.

The edge case mentioned in the previous version of `nir_can_lower_multiview`
is now handled, because we support an arbitrary number of per-view output
vars instead of expecting to find exactly one.

Signed-off-by: Benjamin Lee <benjamin.lee@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31704>
2024-12-09 20:31:49 +00:00
Alyssa Rosenzweig
3d35ea6a6b mesa_clc: add depfile support
This allows the tool to tell ninja what headers it read, so ninja can
correctly rebuild when necessary.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Dylan Baker <dylan.c.baker@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32505>
2024-12-06 13:48:26 -05:00
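A depfile uses Makefile syntax: a target, a colon, then the files it depends on. Ninja reads this to decide what to rebuild. Below is a minimal sketch of formatting one into a caller-provided buffer. It is a hypothetical helper, not mesa_clc's actual code, and it assumes backslash-escaping spaces in paths, which Make-style depfile parsers accept.

```c
#include <stddef.h>

/* Append s to buf (capacity cap, current length *len), optionally escaping
 * spaces with a backslash. Keeps buf NUL-terminated; returns -1 if full. */
static int append(char *buf, size_t cap, size_t *len, const char *s,
                  int escape_spaces)
{
   for (; *s; s++) {
      if (escape_spaces && *s == ' ') {
         if (*len + 1 >= cap)
            return -1;
         buf[(*len)++] = '\\';
      }
      if (*len + 1 >= cap)
         return -1;
      buf[(*len)++] = *s;
   }
   buf[*len] = '\0';
   return 0;
}

/* Hypothetical helper: write "target: dep1 dep2 ...\n" into buf.
 * Returns the number of characters written, or -1 if buf is too small. */
static int format_depfile(char *buf, size_t cap, const char *target,
                          const char *const *deps, size_t ndeps)
{
   size_t len = 0;
   if (cap == 0)
      return -1;
   buf[0] = '\0';
   if (append(buf, cap, &len, target, 1) || append(buf, cap, &len, ":", 0))
      return -1;
   for (size_t i = 0; i < ndeps; i++) {
      if (append(buf, cap, &len, " ", 0) ||
          append(buf, cap, &len, deps[i], 1))
         return -1;
   }
   if (append(buf, cap, &len, "\n", 0))
      return -1;
   return (int)len;
}
```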
Dylan Baker
33a1acb0da clc: Tell clang to track imported dependencies
Clang is capable of tracking which headers it imports, as long as we set
it up to do that. While that isn't important for rusticl, it would be
useful for the various `_clc` tools, as they can then tell Ninja which
headers they read to make rebuilds more reliable.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Dylan Baker <dylan.c.baker@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32505>
2024-12-06 13:48:26 -05:00
Karmjit Mahil
047049dcb5 nir: Fix the spelling of compare
Signed-off-by: Karmjit Mahil <karmjit.mahil@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32189>
2024-12-06 08:42:36 +00:00
Karmjit Mahil
b79994e92d nir,ir3: Add icsel_eqz
In IR3, `sel.b32` selects based on whether its condition is zero, so add
`icsel_eqz` to fuse the cmp and sel that we'd otherwise need.

total Instruction Count in shared programs: 1112814 -> 1110473 (-0.21%)
Instruction Count in affected programs: 162701 -> 160360 (-1.44%)
helped: 81
HURT: 29
Instruction count are helped.

total MOV Count in shared programs: 86777 -> 88671 (2.18%)
MOV Count in affected programs: 28119 -> 30013 (6.74%)
helped: 1
HURT: 292
Mov count are HURT.

total COV Count in shared programs: 15070 -> 14962 (-0.72%)
COV Count in affected programs: 5770 -> 5662 (-1.87%)
helped: 76
HURT: 2
Cov count are helped.

total Last helper instruction in shared programs: 592729 -> 590638 (-0.35%)
Last helper instruction in affected programs: 91331 -> 89240 (-2.29%)
helped: 30
HURT: 1
Last helper instruction are helped.

total Instructions with SS sync bit in shared programs: 29336 -> 29546 (0.72%)
Instructions with SS sync bit in affected programs: 4702 -> 4912 (4.47%)
helped: 8
HURT: 43
Instructions with ss sync bit are HURT.

total Estimated cycles stalled on SS in shared programs: 111590 -> 112401 (0.73%)
Estimated cycles stalled on SS in affected programs: 27708 -> 28519 (2.93%)
helped: 21
HURT: 61
Estimated cycles stalled on ss are HURT.

total cat1 instructions in shared programs: 101933 -> 103695 (1.73%)
cat1 instructions in affected programs: 35804 -> 37566 (4.92%)
helped: 18
HURT: 290
Cat1 instructions are HURT.

total cat2 instructions in shared programs: 380299 -> 377499 (-0.74%)
cat2 instructions in affected programs: 128609 -> 125809 (-2.18%)
helped: 322
HURT: 0
Cat2 instructions are helped.

Signed-off-by: Karmjit Mahil <karmjit.mahil@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32189>
2024-12-06 08:42:36 +00:00
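A sketch of the fusion in plain C: the unfused form needs a compare against zero followed by a select, while the fused form does both in one operation. The operand order of `icsel_eqz` shown here (tested value first, then the value chosen when it is zero) is an assumption for illustration.

```c
#include <stdint.h>

/* Unfused: a separate compare-with-zero, then a select. */
static uint32_t bcsel_ieq_zero(uint32_t a, uint32_t b, uint32_t c)
{
   int is_zero = (a == 0); /* the cmp */
   return is_zero ? b : c; /* the sel */
}

/* Fused (assumed operand order): pick b when a == 0, otherwise c. */
static uint32_t icsel_eqz(uint32_t a, uint32_t b, uint32_t c)
{
   return a == 0 ? b : c;
}
```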
Karmjit Mahil
aad0aa0a9c nir/algebraic: turn u{ge,lt} a, 1 to i{ne,eq} a, 0
Signed-off-by: Karmjit Mahil <karmjit.mahil@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32189>
2024-12-06 08:42:36 +00:00
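Why the rewrite is valid: for an unsigned value, the only value below 1 is 0. So `a >= 1` is the same test as `a != 0`, and `a < 1` is the same test as `a == 0`. A C sketch of both pairs:

```c
#include <stdbool.h>
#include <stdint.h>

/* Unsigned comparisons against 1 ... */
static bool uge_1(uint32_t a) { return a >= 1u; }
static bool ult_1(uint32_t a) { return a < 1u; }

/* ... are equivalent to (in)equality with 0. */
static bool ine_0(uint32_t a) { return a != 0u; }
static bool ieq_0(uint32_t a) { return a == 0u; }
```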
Ian Romanick
e1bb53bb3c nir/algebraic: Optimize some trivial bfi
In fossil-db, one big compute shader on Hogwarts Legacy is helped for
spills and fills. It has a lot of instances of bfi(0x3f, a, a).

On Tiger Lake and Skylake, a compute shader in Unicom that has a
single instance of this pattern is hurt for spills and fills. I think
this is just due to non-determinism in the register allocation
algorithm.

shader-db:

All Intel platforms had similar results. (Lunar Lake shown)
total instructions in shared programs: 16992643 -> 16992548 (<.01%)
instructions in affected programs: 17533 -> 17438 (-0.54%)
helped: 33 / HURT: 0

total cycles in shared programs: 914313986 -> 914316238 (<.01%)
cycles in affected programs: 3734544 -> 3736796 (0.06%)
helped: 26 / HURT: 6

fossil-db:

Lunar Lake, Meteor Lake, DG2, and Ice Lake had similar results. (Lunar Lake shown)
Totals:
Instrs: 208952780 -> 208952537 (-0.00%)
Send messages: 10934879 -> 10934875 (-0.00%)
Cycle count: 30988230904 -> 30988228660 (-0.00%); split: -0.00%, +0.00%
Spill count: 534864 -> 534843 (-0.00%)
Fill count: 667081 -> 667068 (-0.00%)
Max live registers: 65686656 -> 65686624 (-0.00%)
Non SSA regs after NIR: 244185358 -> 244185335 (-0.00%)

Totals from 3 (0.00% of 704834) affected shaders:
Instrs: 4708 -> 4465 (-5.16%)
Send messages: 234 -> 230 (-1.71%)
Cycle count: 264382 -> 262138 (-0.85%); split: -0.88%, +0.03%
Spill count: 91 -> 70 (-23.08%)
Fill count: 73 -> 60 (-17.81%)
Max live registers: 647 -> 615 (-4.95%)
Non SSA regs after NIR: 3957 -> 3934 (-0.58%)

Tiger Lake
Totals:
Instrs: 230516919 -> 230515185 (-0.00%); split: -0.00%, +0.00%
Send messages: 12657684 -> 12657680 (-0.00%)
Cycle count: 23060318600 -> 23060279758 (-0.00%); split: -0.00%, +0.00%
Spill count: 548462 -> 548446 (-0.00%); split: -0.00%, +0.00%
Fill count: 582304 -> 582294 (-0.00%); split: -0.00%, +0.00%
Scratch Memory Size: 19538944 -> 19539968 (+0.01%)
Max live registers: 41713622 -> 41713593 (-0.00%)
Non SSA regs after NIR: 260667939 -> 260667712 (-0.00%); split: -0.00%, +0.00%

Totals from 174 (0.02% of 794323) affected shaders:
Instrs: 158346 -> 156612 (-1.10%); split: -1.13%, +0.04%
Send messages: 14330 -> 14326 (-0.03%)
Cycle count: 24859875 -> 24821033 (-0.16%); split: -0.32%, +0.16%
Spill count: 183 -> 167 (-8.74%); split: -9.29%, +0.55%
Fill count: 284 -> 274 (-3.52%); split: -7.39%, +3.87%
Scratch Memory Size: 9216 -> 10240 (+11.11%)
Max live registers: 12587 -> 12558 (-0.23%)
Non SSA regs after NIR: 164466 -> 164239 (-0.14%); split: -0.16%, +0.02%

Skylake
Totals:
Instrs: 158904982 -> 158903764 (-0.00%)
Send messages: 8490500 -> 8490496 (-0.00%)
Cycle count: 19732284279 -> 19732345496 (+0.00%); split: -0.00%, +0.00%
Spill count: 519127 -> 519115 (-0.00%)
Fill count: 594283 -> 594290 (+0.00%); split: -0.00%, +0.00%
Max live registers: 33708764 -> 33708739 (-0.00%)
Non SSA regs after NIR: 169377234 -> 169377007 (-0.00%); split: -0.00%, +0.00%

Totals from 174 (0.03% of 648725) affected shaders:
Instrs: 160391 -> 159173 (-0.76%)
Send messages: 14354 -> 14350 (-0.03%)
Cycle count: 24776486 -> 24837703 (+0.25%); split: -0.07%, +0.32%
Spill count: 332 -> 320 (-3.61%)
Fill count: 587 -> 594 (+1.19%); split: -0.17%, +1.36%
Max live registers: 12709 -> 12684 (-0.20%)
Non SSA regs after NIR: 166557 -> 166330 (-0.14%); split: -0.16%, +0.02%

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32493>
2024-12-05 21:39:07 +00:00
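Why `bfi(0x3f, a, a)` is trivial: a bitfield insert shifts `insert` up to the lowest set bit of `mask`, takes the masked bits from it and takes the remaining bits from `base`. When bit 0 of the mask is set, nothing is shifted. The masked bits then come from `a` and the other bits also come from `a`, so the result is `a`. A sketch assuming NIR-style `bfi(mask, insert, base)` semantics:

```c
#include <stdint.h>

/* Assumed bfi semantics: shift insert to the lowest set bit of mask, then
 * merge (insert & mask) with (base & ~mask). */
static uint32_t bfi(uint32_t mask, uint32_t insert, uint32_t base)
{
   if (mask == 0)
      return base;
   for (uint32_t tmp = mask; !(tmp & 1); tmp >>= 1)
      insert <<= 1;
   return (base & ~mask) | (insert & mask);
}
```

For any mask with bit 0 set, `bfi(mask, a, a) == a`. With a mask such as 0x30, the insert is shifted by 4 and the identity no longer holds.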