Commit graph

72 commits

Author SHA1 Message Date
Samuel Pitoiset
22e1d1a1f4 nir/opt_sink: add heap support
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40657>
2026-03-31 10:10:17 +00:00
Lorenzo Rossi
c0e0591999 pan/compiler: Replace frag_coord_zw_pan with var_special_pan
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Just a bit cleaner, and we can unify point size too.

Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40677>
2026-03-27 19:23:02 +00:00
Alyssa Rosenzweig
373358da45 nir/opt_sink: sink pack_64_2x32_split
This comes up in lowered load_ubo sequences (observed in OpenCL test
test_api min_max_parameter_size). Hopefully the pack gets coalesced,
it's like nir_op_vec2 on most backends, so it should usually be ok to
sink even though the register pressure heuristic will reject it.
Allowing it to sink allows the UBO load to sink.

Intel's backend scheduler can optimize the relevant sequences locally
but there should still be a win here for global load sinking.

Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40267>
2026-03-13 17:03:00 +00:00
Alyssa Rosenzweig
507e7a04bf nir/opt_sink: sink Intel UBO loads
Acts like load_ubo, handle it in the same path.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40267>
2026-03-13 17:03:00 +00:00
Georg Lehmann
077b654cc7 nir: don't sink alu that uses ballot(true)
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Don't sink alu that uses ballot(true), as that can a local system value
and moving the alu then requires a new mov in the old location.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38829>
2025-12-08 09:07:54 +00:00
Kenneth Graunke
f1ab64ad74 nir: add new intrinsics to load/store from URB on intel
We add several new intrinsics for accessing URB handles:

- load_urb_output_handle_intel
- load_urb_input_handle_intel
- load_urb_input_handle_intel_indexed

The latter is used by stages like TCS and GS where each input control
point has a unique handle.  The index is which ICP to read from.  The
others are for most stages, where all inputs or outputs are accessed
via a single handle.

Then we have URB load and store operations, split for Xe2+ (URB via LSC)
and earlier (HDC OWord messages):

- load_urb_vec4_intel
- load_urb_lsc_intel
- store_urb_vec4_intel
- store_urb_lsc_intel

The legacy vec4 variants take a handle and a 128-bit OWord offset as
sources.  Additionally, stores take a set of channel enables to mask
off and avoid writing vec4 components.  We don't use the WRITE_MASK
const-index as our channel enables are not required to be constant.

The Xe2+ LSC variants are simpler.  Handles are byte offsets into the
URB memory region, and offsets are expressed in bytes.  So we simply
add them into a single "address" source.  We don't support writemasks
here, as they aren't really necessary with the better addressability.
(Plus, the store_cmask operations work significantly differently than
the previous HDC OWord messages).  We will lower disjoint writemasks
to multiple stores.

Based on earlier code by Lionel Landwerlin.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38482>
2025-11-25 22:43:54 +00:00
Konstantin Seurer
de32f9275f treewide: add & use parent instr helpers
We add a bunch of new helpers to avoid the need to touch >parent_instr,
including the full set of:

* nir_def_is_*
* nir_def_as_*_or_null
* nir_def_as_* [assumes the right instr type]
* nir_src_is_*
* nir_src_as_*
* nir_scalar_is_*
* nir_scalar_as_*

Plus nir_def_instr() where there's no more suitable helper.

Also an existing helper is renamed to unify all the names, while we're
churning the tree:

* nir_src_as_alu_instr -> nir_src_as_alu

..and then we port the tree to use the helpers as much as possible, using
nir_def_instr() where that does not work.

Acked-by: Marek Olšák <maraeo@gmail.com>

---

To eliminate nir_def::parent_instr we need to churn the tree anyway, so I'm
taking this opportunity to clean up a lot of NIR patterns.

Co-authored-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38313>
2025-11-12 21:22:13 +00:00
Marek Olšák
3fe651f607 nir: remove load_smem_amd
replaced by load_global_amd + ACCESS_SMEM_AMD

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36936>
2025-10-08 08:54:11 +00:00
Mel Henning
17876a00af nir: Add a faster lowest common ancestor algorithm
On a fossil from the blender 4.5.0 vulkan backend, this improves compile
times in nak by about 17%. Compile time of other shaders improves by a
more modest 1.2%.

No stat changes on shader-db.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36184>
2025-09-08 23:03:13 +00:00
Mel Henning
ee8d448241 nir: Don't require nir_metadata_control_flow
We're about to add to nir_metadata_control_flow, and we don't want
passes to require the new metadata.

Via coccinelle:

@@
expression e1;
@@
- nir_metadata_require(e1, nir_metadata_control_flow)
+ nir_metadata_require(e1, nir_metadata_block_index | nir_metadata_dominance)

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36184>
2025-09-08 23:03:13 +00:00
Marek Olšák
48050dbef6 nir/opt_sink: handle load_global_amd
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37101>
2025-08-30 14:55:13 -04:00
Marek Olšák
3aadae22ad nir: make nir_block::predecessors & dom_frontier sets non-malloc'd
We can just place the set structures inside nir_block.

This reduces the number of ralloc calls by 6.7% when compiling Heaven
shaders with radeonsi+ACO using a release build (i.e. not including
nir_validate set allocations, which are also removed).

Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36728>
2025-08-21 06:13:48 +00:00
Karol Herbst
83cf765f8e nak: run nir_opt_move nir_move_load_ubo
Usually we can fold most ldc and ldcx into the instruction using it,
however there are a couple of cases where we can't, e.g. when there is an
indirect offset.

Moving the ldc(x) down to the consumer leads to increase value ranges for
uniform registers, but lowering them for normal registers.

Totals:
CodeSize: 914650304 -> 914469536 (-0.02%); split: -0.05%, +0.03%
Number of GPRs: 3879754 -> 3863818 (-0.41%); split: -0.42%, +0.01%
Static cycle count: 1073273107 -> 1073101189 (-0.02%); split: -0.09%, +0.08%
Spills to reg: 67219 -> 67707 (+0.73%); split: -0.10%, +0.83%
Fills from reg: 79733 -> 80456 (+0.91%); split: -0.10%, +1.01%
Max warps/SM: 3666036 -> 3672668 (+0.18%); split: +0.18%, -0.00%

Totals from 24235 (27.66% of 87622) affected shaders:
CodeSize: 444747392 -> 444566624 (-0.04%); split: -0.11%, +0.07%
Number of GPRs: 1360384 -> 1344448 (-1.17%); split: -1.20%, +0.03%
Static cycle count: 806310857 -> 806138939 (-0.02%); split: -0.12%, +0.10%
Spills to reg: 35826 -> 36314 (+1.36%); split: -0.19%, +1.55%
Fills from reg: 31863 -> 32586 (+2.27%); split: -0.26%, +2.53%
Max warps/SM: 911328 -> 917960 (+0.73%); split: +0.74%, -0.01%

Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36536>
2025-08-19 17:29:07 +00:00
Alyssa Rosenzweig
bcf1a1c20b treewide: use nir_def_block
Via Coccinelle patch:

    @@
    expression definition;
    @@

    -definition->parent_instr->block
    +nir_def_block(definition)

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Acked-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36489>
2025-08-01 15:34:24 +00:00
Marek Olšák
d61edf079b nir: add nir_move_only_convergent/divergent
This will be needed by nir_opt_move_reorder_loads, which will use
the move flags.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36357>
2025-07-29 16:20:53 -04:00
Marek Olšák
35bbc8405b nir: add more nir_move_options
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36357>
2025-07-29 16:20:51 -04:00
Marek Olšák
44d78c4451 nir: handle load_input_vertex in nir_can_move_instr
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36357>
2025-07-29 16:20:49 -04:00
Marek Olšák
8d3e76c250 nir: split nir_move_load_frag_coord from nir_move_load_input
It's a pure system value on AMD, not an input.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36357>
2025-07-29 16:20:48 -04:00
Marek Olšák
8d584586f5 nir: handle can_reorder robustly in nir_can_move_instr
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36357>
2025-07-29 16:20:44 -04:00
Marek Olšák
c229c93540 nir: change how can_mov_out_of_loop is set for intrinsics in nir_can_move_instr
Set to false first, then set to true when needed.

More intrinsics will set false.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36357>
2025-07-29 16:20:42 -04:00
Alyssa Rosenzweig
63ce73a601 nir,hk: sink lowered UBOs
this is better than doing it once we've lowered to hardware ops which makes it
more challenging to sink since then we'd have to sink the whole tree instead of
a single intrinsic.

Totals from 17617 (32.81% of 53701) affected shaders:
MaxWaves: 16863872 -> 16901504 (+0.22%); split: +0.24%, -0.02%
Instrs: 12406405 -> 12430375 (+0.19%); split: -0.15%, +0.35%
CodeSize: 87055248 -> 87180802 (+0.14%); split: -0.18%, +0.33%
Spills: 10350 -> 9301 (-10.14%); split: -11.57%, +1.43%
Fills: 5215 -> 3733 (-28.42%); split: -31.49%, +3.07%
Scratch: 113164 -> 110472 (-2.38%); split: -2.63%, +0.25%
ALU: 9552550 -> 9558513 (+0.06%); split: -0.22%, +0.28%
FSCIB: 9552545 -> 9558508 (+0.06%); split: -0.22%, +0.28%
IC: 2874032 -> 2876442 (+0.08%); split: -0.00%, +0.09%
GPRs: 1470040 -> 1459283 (-0.73%); split: -1.00%, +0.27%
Uniforms: 5113254 -> 5115158 (+0.04%); split: -0.82%, +0.85%

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Job Noorman <job@noorman.info> [NIR]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35720>
2025-06-26 16:41:55 +00:00
Alyssa Rosenzweig
caa0854da8 nir: plumb load_global_bounded
this lets the backend implement bounded loads (i.e. robust SSBOs) in a way
that's more clever than a full branch. similar idea to
load_global_constant_bound which should eventually be merged into this.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Job Noorman <job@noorman.info>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35720>
2025-06-26 16:41:53 +00:00
Georg Lehmann
7de352e99e nir,radv: add an option to not move 8/16bit vecs
ACO will overestimate the register demand of the sources, so we don't
want to create the vector later.

Foz-DB Navi48:
Totals from 240 (0.30% of 80265) affected shaders:
MaxWaves: 6429 -> 6435 (+0.09%)
Instrs: 3406069 -> 3406646 (+0.02%); split: -0.01%, +0.03%
CodeSize: 18231596 -> 18233288 (+0.01%); split: -0.01%, +0.02%
VGPRs: 14768 -> 14732 (-0.24%)
Latency: 18981274 -> 18979170 (-0.01%); split: -0.02%, +0.01%
InvThroughput: 4247331 -> 4246634 (-0.02%); split: -0.02%, +0.01%
VClause: 85453 -> 85458 (+0.01%); split: -0.01%, +0.01%
Copies: 262046 -> 261971 (-0.03%); split: -0.06%, +0.03%
PreVGPRs: 10899 -> 10775 (-1.14%)
VALU: 1923441 -> 1923485 (+0.00%); split: -0.01%, +0.01%
SALU: 457983 -> 457982 (-0.00%)
VOPD: 4980 -> 4861 (-2.39%); split: +0.48%, -2.87%

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35729>
2025-06-26 09:29:43 +00:00
Georg Lehmann
7ac9a87572 nir/opt_sink: don't assume moving conversion can't increase register pressure
Foz-DB Navi48:
Totals from 11311 (14.09% of 80265) affected shaders:
MaxWaves: 337664 -> 337648 (-0.00%); split: +0.00%, -0.01%
Instrs: 10102221 -> 10101625 (-0.01%); split: -0.05%, +0.04%
CodeSize: 55000184 -> 54999292 (-0.00%); split: -0.04%, +0.03%
VGPRs: 571052 -> 571064 (+0.00%); split: -0.03%, +0.03%
Latency: 59247189 -> 59204726 (-0.07%); split: -0.13%, +0.06%
InvThroughput: 10236407 -> 10215675 (-0.20%); split: -0.26%, +0.06%
VClause: 211730 -> 211677 (-0.03%); split: -0.07%, +0.04%
SClause: 284802 -> 284762 (-0.01%); split: -0.07%, +0.06%
Copies: 702890 -> 702539 (-0.05%); split: -0.18%, +0.13%
Branches: 205117 -> 205112 (-0.00%)
PreSGPRs: 475898 -> 475825 (-0.02%); split: -0.02%, +0.00%
PreVGPRs: 366318 -> 366449 (+0.04%); split: -0.14%, +0.17%
VALU: 5764791 -> 5764349 (-0.01%); split: -0.02%, +0.01%
SALU: 1259529 -> 1259517 (-0.00%); split: -0.04%, +0.04%
VOPD: 5854 -> 5724 (-2.22%); split: +0.70%, -2.92%

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35729>
2025-06-26 09:29:43 +00:00
Lionel Landwerlin
16fca611d7 nir: add new intel ssbo intrinsics
Similar to ir3 ones, to optimize offsets in the backend.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35252>
2025-06-22 10:55:23 +00:00
Alyssa Rosenzweig
b0f8c22682 nir/opt_sink: sink agx backfacing
helps an elden ring shader:

Totals from 1 (0.03% of 3206) affected shaders:
Instrs: 4146 -> 4141 (-0.12%)
CodeSize: 27374 -> 27334 (-0.15%)
ALU: 2554 -> 2549 (-0.20%)
FSCIB: 2554 -> 2549 (-0.20%)

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35559>
2025-06-20 16:09:28 +00:00
Emma Anholt
7db62e6dad nir: Split nir_load_frag_coord_zw to separate z/w intrinsics.
This will be a win for Intel for tracking which payload values need to be
set up.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25190>
2025-06-18 23:11:36 +00:00
Mary Guillemard
e0be93d881 nir: Add Panfrost specific shader_output intrinsic
On Avalon, this is a bitfield that holds information on what
values a vertex shader should output.

Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Benjamin Lee <benjamin.lee@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33910>
2025-03-10 07:38:16 +01:00
Alyssa Rosenzweig
9a58a8257e treewide: Switch to nir_progress
Via the Coccinelle patch at the end of the commit message, followed by

sed -ie 's/progress = progress | /progress |=/g' $(git grep -l 'progress = prog')
ninja -C ~/mesa/build clang-format
cd ~/mesa/src/compiler/nir && clang-format -i *.c
agxfmt

    @@
    identifier prog;
    expression impl, metadata;
    @@

    -if (prog) {
    -nir_metadata_preserve(impl, metadata);
    -} else {
    -nir_metadata_preserve(impl, nir_metadata_all);
    -}
    -return prog;
    +return nir_progress(prog, impl, metadata);

    @@
    expression prog_expr, impl, metadata;
    @@

    -if (prog_expr) {
    -nir_metadata_preserve(impl, metadata);
    -return true;
    -} else {
    -nir_metadata_preserve(impl, nir_metadata_all);
    -return false;
    -}
    +bool progress = prog_expr;
    +return nir_progress(progress, impl, metadata);

    @@
    identifier prog;
    expression impl, metadata;
    @@

    -nir_metadata_preserve(impl, prog ? (metadata) : nir_metadata_all);
    -return prog;
    +return nir_progress(prog, impl, metadata);

    @@
    identifier prog;
    expression impl, metadata;
    @@

    -nir_metadata_preserve(impl, prog ? (metadata) : nir_metadata_all);
    +nir_progress(prog, impl, metadata);

    @@
    expression impl, metadata;
    @@

    -nir_metadata_preserve(impl, metadata);
    -return true;
    +return nir_progress(true, impl, metadata);

    @@
    expression impl;
    @@

    -nir_metadata_preserve(impl, nir_metadata_all);
    -return false;
    +return nir_no_progress(impl);

    @@
    identifier other_prog, prog;
    expression impl, metadata;
    @@

    -if (prog) {
    -nir_metadata_preserve(impl, metadata);
    -} else {
    -nir_metadata_preserve(impl, nir_metadata_all);
    -}
    -other_prog |= prog;
    +other_prog = other_prog | nir_progress(prog, impl, metadata);

    @@
    identifier prog;
    expression impl, metadata;
    @@

    -if (prog) {
    -nir_metadata_preserve(impl, metadata);
    -} else {
    -nir_metadata_preserve(impl, nir_metadata_all);
    -}
    +nir_progress(prog, impl, metadata);

    @@
    identifier other_prog, prog;
    expression impl, metadata;
    @@

    -if (prog) {
    -nir_metadata_preserve(impl, metadata);
    -other_prog = true;
    -} else {
    -nir_metadata_preserve(impl, nir_metadata_all);
    -}
    +other_prog = other_prog | nir_progress(prog, impl, metadata);

    @@
    expression prog_expr, impl, metadata;
    identifier prog;
    @@

    -if (prog_expr) {
    -nir_metadata_preserve(impl, metadata);
    -prog = true;
    -} else {
    -nir_metadata_preserve(impl, nir_metadata_all);
    -}
    +bool impl_progress = prog_expr;
    +prog = prog | nir_progress(impl_progress, impl, metadata);

    @@
    identifier other_prog, prog;
    expression impl, metadata;
    @@

    -if (prog) {
    -other_prog = true;
    -nir_metadata_preserve(impl, metadata);
    -} else {
    -nir_metadata_preserve(impl, nir_metadata_all);
    -}
    +other_prog = other_prog | nir_progress(prog, impl, metadata);

    @@
    expression prog_expr, impl, metadata;
    identifier prog;
    @@

    -if (prog_expr) {
    -prog = true;
    -nir_metadata_preserve(impl, metadata);
    -} else {
    -nir_metadata_preserve(impl, nir_metadata_all);
    -}
    +bool impl_progress = prog_expr;
    +prog = prog | nir_progress(impl_progress, impl, metadata);

    @@
    expression prog_expr, impl, metadata;
    @@

    -if (prog_expr) {
    -nir_metadata_preserve(impl, metadata);
    -} else {
    -nir_metadata_preserve(impl, nir_metadata_all);
    -}
    +bool impl_progress = prog_expr;
    +nir_progress(impl_progress, impl, metadata);

    @@
    identifier prog;
    expression impl, metadata;
    @@

    -nir_metadata_preserve(impl, metadata);
    -prog = true;
    +prog = nir_progress(true, impl, metadata);

    @@
    identifier prog;
    expression impl, metadata;
    @@

    -if (prog) {
    -nir_metadata_preserve(impl, metadata);
    -}
    -return prog;
    +return nir_progress(prog, impl, metadata);

    @@
    identifier prog;
    expression impl, metadata;
    @@

    -if (prog) {
    -nir_metadata_preserve(impl, metadata);
    -}
    +nir_progress(prog, impl, metadata);

    @@
    expression impl;
    @@

    -nir_metadata_preserve(impl, nir_metadata_all);
    +nir_no_progress(impl);

    @@
    expression impl, metadata;
    @@

    -nir_metadata_preserve(impl, metadata);
    +nir_progress(true, impl, metadata);

squashme! sed -ie 's/progress = progress | /progress |=/g' $(git grep -l 'progress = prog')

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33722>
2025-02-26 15:19:53 +00:00
Benjamin Lee
6f541e2016 panfrost: add intrinsic to load frag coord at a barycentric
This is needed for noperspective lowering, where we need to multiply the
varying value by gl_FragCoord.w at the same barycentric as the varying.
Normal nir_load_frag_coord_zw instructions are lowered to the new
intrinsic on bifrost with the pan_lower_frag_coord_zw pass.

Signed-off-by: Benjamin Lee <benjamin.lee@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32127>
2025-01-03 07:04:05 +00:00
Georg Lehmann
b8fa9daf0c nir: sink/move alu with two identical, non constant sources.
Foz-DB Navi21:
Totals from 32363 (40.76% of 79395) affected shaders:
MaxWaves: 787499 -> 787675 (+0.02%); split: +0.02%, -0.00%
Instrs: 28783404 -> 28783464 (+0.00%); split: -0.01%, +0.01%
CodeSize: 156763536 -> 156765148 (+0.00%); split: -0.01%, +0.02%
VGPRs: 1493304 -> 1492848 (-0.03%); split: -0.04%, +0.01%
Latency: 243022511 -> 243051994 (+0.01%); split: -0.08%, +0.09%
InvThroughput: 57827398 -> 57828129 (+0.00%); split: -0.05%, +0.05%
VClause: 582208 -> 582298 (+0.02%); split: -0.07%, +0.08%
SClause: 959634 -> 959312 (-0.03%); split: -0.07%, +0.04%
Copies: 1965821 -> 1965826 (+0.00%); split: -0.17%, +0.17%
Branches: 710593 -> 710596 (+0.00%); split: -0.00%, +0.01%
PreSGPRs: 1313513 -> 1313632 (+0.01%); split: -0.00%, +0.01%
PreVGPRs: 1210596 -> 1209103 (-0.12%); split: -0.12%, +0.00%
VALU: 19463445 -> 19463497 (+0.00%); split: -0.02%, +0.02%
SALU: 3319529 -> 3319500 (-0.00%); split: -0.01%, +0.01%

Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32783>
2024-12-30 13:28:30 +00:00
Caterina Shablia
f4fcfa8016 pan,nir: introduce load_attribute_pan
load_attribute_pan is a panfrost-specific intrinsic for loading
vertex attributes. Takes explicit vertex and instance IDs which
we need in order to implement vertex attribute divisor with
non-zero base instance on v9+.

Passes which are used by panvk are modified to be aware of
load_attribute_pan.

Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32039>
2024-12-18 08:33:16 +00:00
Georg Lehmann
dbf63a0788 nir: remove nir_op_is_derivative
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31014>
2024-10-17 09:50:19 +00:00
Georg Lehmann
41e82b8b8e nir: sink is_subgroup_invocation_lt_amd
Having it closer to the branches means we can eliminate an exec copy.

Foz-DB Navi31:
Totals from 11615 (14.63% of 79395) affected shaders:
Instrs: 6804372 -> 6804903 (+0.01%); split: -0.04%, +0.05%
CodeSize: 33684672 -> 33680584 (-0.01%); split: -0.07%, +0.05%
VGPRs: 578616 -> 578604 (-0.00%)
SpillSGPRs: 1506 -> 1304 (-13.41%)
Latency: 29817034 -> 29821320 (+0.01%); split: -0.03%, +0.05%
InvThroughput: 3581587 -> 3581217 (-0.01%); split: -0.02%, +0.01%
VClause: 124826 -> 124782 (-0.04%); split: -0.04%, +0.00%
SClause: 187916 -> 187645 (-0.14%); split: -0.27%, +0.13%
Copies: 520969 -> 510027 (-2.10%); split: -2.20%, +0.10%
PreSGPRs: 442584 -> 421344 (-4.80%)
VALU: 3810755 -> 3810267 (-0.01%); split: -0.01%, +0.00%
SALU: 763402 -> 752650 (-1.41%); split: -1.48%, +0.07%

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31184>
2024-09-26 14:29:14 +00:00
Georg Lehmann
7fa7812219 nir: merge out of loop decision with nir_can_move_instr logic
One place to modify instead of two when adding new intrinsics here.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30906>
2024-09-12 21:49:34 +00:00
Georg Lehmann
91f8e32a85 nir/opt_sink: do not sink inverse_ballot out of loops
Inverse_ballot result is undefined if the input is not dynamically uniform.
And sinking out of loops might make the input divergent.

Fixes: 18a0ff137f ("nir: sink/move inverse_ballot like moves")

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30906>
2024-09-12 21:49:34 +00:00
Georg Lehmann
1ec3cc2aed nir/opt_sink: do not sink load_ubo_vec4 out of loops
Same reason as for load_ubo.

Fixes: d199d65c3a ("nir/nir_opt_move,sink: Include load_ubo_vec4 as a load_ubo instr.")

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30906>
2024-09-12 21:49:34 +00:00
Daniel Schürmann
50d416fe89 nir: add nir_block *nir_src_get_block(src) helper
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7710>
2024-08-29 09:42:55 +00:00
Marek Olšák
b2d32ae246 nir: add nir_intrinsic_load_per_primitive_input, split from io_semantics flag
Instead of having 1 bit in nir_io_semantics indicating a per-primitive
FS input, add a dedicated intrinsic for it.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29895>
2024-07-23 16:13:16 +00:00
Daniel Schürmann
ffef3d1709 nir/opt_sink: ignore loops without backedge
Loops without backedge should not be considered loops.
For RADV, 2069 (2.61% of 79395) affected shaders.

Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28783>
2024-07-16 12:29:08 +00:00
Karol Herbst
d5da434851 nir/opt_sink: add load_kernel_input
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25362>
2024-06-26 10:04:02 +00:00
Alyssa Rosenzweig
15257b65c6 treewide: use nir_metadata_control_flow
Via Coccinelle patch:

    @@
    @@

    -nir_metadata_block_index | nir_metadata_dominance
    +nir_metadata_control_flow

...plus some manual fixups for call sites missed by coccinelle.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Acked-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Juan A. Suarez Romero <jasuarez@igalia.com> [broadcom]
Acked-by: Vasily Khoruzhick <anarsoul@gmail.com> [lima]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29745>
2024-06-17 16:28:14 -04:00
Georg Lehmann
18a0ff137f nir: sink/move inverse_ballot like moves
It's just a copy for the backends that don't lower it.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29502>
2024-06-04 15:40:57 +00:00
Alyssa Rosenzweig
c39896b17b nir: Use getters for nir_src::parent_*
First, we need to give the parent_instr field a unique name to be able to
replace with a helper.  We have parent_instr fields for both nir_src and
nir_def, so let's rename nir_src::parent_instr in preparation for rework.

This was done with a combination of sed and manual fix-ups.

Then we use semantic patches plus manual fixups:

    @@
    expression s;
    @@

    -s->renamed_parent_instr
    +nir_src_parent_instr(s)

    @@
    expression s;
    @@

    -s.renamed_parent_instr
    +nir_src_parent_instr(&s)

    @@
    expression s;
    @@

    -s->parent_if
    +nir_src_parent_if(s)

    @@
    expression s;
    @@

    -s.renamed_parent_if
    +nir_src_parent_if(&s)

    @@
    expression s;
    @@

    -s->is_if
    +nir_src_is_if(s)

    @@
    expression s;
    @@

    -s.is_if
    +nir_src_is_if(&s)

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24671>
2023-10-10 04:58:05 -04:00
Alyssa Rosenzweig
4bcb62d203 nir/opt_sink: Also consider load_preamble as const
Acts like constants, schedule them like constants. This lets us move lowered
frag coord code down. Results on dolphin ubers:

   total instructions in shared programs: 195144 -> 196633 (0.76%)
   instructions in affected programs: 175737 -> 177226 (0.85%)
   helped: 28
   HURT: 27
   Instructions are HURT.

   total bytes in shared programs: 1379980 -> 1388308 (0.60%)
   bytes in affected programs: 1244250 -> 1252578 (0.67%)
   helped: 28
   HURT: 27
   Bytes are HURT.

   total halfregs in shared programs: 13591 -> 13557 (-0.25%)
   halfregs in affected programs: 2176 -> 2142 (-1.56%)
   helped: 12
   HURT: 2
   Inconclusive result (%-change mean confidence interval includes 0).

   total threads in shared programs: 233728 -> 234112 (0.16%)
   threads in affected programs: 3264 -> 3648 (11.76%)
   helped: 6
   HURT: 0
   Threads are helped.

Results on Android shader-db:

   total instructions in shared programs: 1775324 -> 1775912 (0.03%)
   instructions in affected programs: 155305 -> 155893 (0.38%)
   helped: 353
   HURT: 548
   Instructions are HURT.

   total bytes in shared programs: 11676650 -> 11678454 (0.02%)
   bytes in affected programs: 1058924 -> 1060728 (0.17%)
   helped: 370
   HURT: 547
   Inconclusive result (value mean confidence interval includes 0).

   total halfregs in shared programs: 484143 -> 471212 (-2.67%)
   halfregs in affected programs: 98833 -> 85902 (-13.08%)
   helped: 2478
   HURT: 674
   Halfregs are helped.

Instr count changes due to losing the RA lottery.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24833>
2023-09-18 08:38:16 -04:00
Alyssa Rosenzweig
aead5316d2 nir/opt_sink: Move ALU with constant sources
In general, sinking ALU instructions can negatively impact register pressure,
since it extends the live ranges of the sources, although it does shrink the live range
of the destination.

However, constants do not usually contribute to register pressure. This is not a
totally true assumption, but it's pretty good in practice, since...

* constants can be rematerialized (backend-dependent)
* constants can often be inlined (ISA-dependent)
* constants can sometimes be promoted to free uniform registers (ISA-dependent)
* constants can live in scalar registers although the ALU destination might need
  a vector register (and vector registers are assumed to be much more expensive
  than scalar registers, again ISA-dependent)

So, assume that constants have zero effect on register pressure. Now consider an
ALU instruction where all but one source is a constant. Then there are two
cases:

1. The ALU instruction is moved past when its source was otherwise killed. Then
   there is no effect on register pressure, since the source live range is
   extended exactly as much as the destination live range shrinks.

2. The ALU instruction is moved down but its source is still alive where it's
   moved to. Then register pressure is improved, since the source live range is
   unchanged while the destination live range shrinks.

So, as a heuristic, we always move ALU instructions where n-1 sources are
constant. As an inevitable special case, this also (necessarily) moves unary ALU
ops, which should be beneficial by the same justification. This is not 100%
perfect but it is well-motivated. Results on AGX are decent:

   total instructions in shared programs: 1796101 -> 1795652 (-0.02%)
   instructions in affected programs: 326822 -> 326373 (-0.14%)
   helped: 800
   HURT: 371
   Inconclusive result (%-change mean confidence interval includes 0).

   total bytes in shared programs: 11805004 -> 11801424 (-0.03%)
   bytes in affected programs: 2610630 -> 2607050 (-0.14%)
   helped: 912
   HURT: 462
   Inconclusive result (%-change mean confidence interval includes 0).

   total halfregs in shared programs: 525818 -> 515399 (-1.98%)
   halfregs in affected programs: 118197 -> 107778 (-8.81%)
   helped: 2095
   HURT: 804
   Halfregs are helped.

   total threads in shared programs: 18916608 -> 18917056 (<.01%)
   threads in affected programs: 4800 -> 5248 (9.33%)
   helped: 7
   HURT: 0
   Threads are helped.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24833>
2023-09-18 08:38:16 -04:00
Alyssa Rosenzweig
561df40211 nir/opt_sink: Do not move derivatives
At the moment, this does nothing. It will prevent problems from the next patch,
however.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24833>
2023-09-18 08:38:16 -04:00
Alyssa Rosenzweig
469fd36fba nir/opt_sink: Sink frag coord instructions
load_input-like. ubershaders:

   instructions in affected programs: 72392 -> 72522 (0.18%)
   helped: 8
   HURT: 18
   Inconclusive result (value mean confidence interval includes 0).

   total bytes in shared programs: 1468550 -> 1469170 (0.04%)
   bytes in affected programs: 560486 -> 561106 (0.11%)
   helped: 10
   HURT: 17
   Inconclusive result (value mean confidence interval includes 0).

   total halfregs in shared programs: 13946 -> 13898 (-0.34%)
   halfregs in affected programs: 3642 -> 3594 (-1.32%)
   helped: 21
   HURT: 0
   Halfregs are helped.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24833>
2023-09-18 08:38:16 -04:00
Alyssa Rosenzweig
c07a9dca65 nir/opt_sink: Sink load_local_pixel_agx
This is the AGX version of load_output, which shaders can use for framebuffer
fetch. It is beneficial to sink framebuffer fetch as late as possible, both to
reduce register pressure but also to reduce serialization of overlapping
fragments.

Results on a collection of ubershaders:

   total bytes in shared programs: 1468928 -> 1468550 (-0.03%)
   bytes in affected programs: 495300 -> 494922 (-0.08%)
   helped: 24
   HURT: 0
   Bytes are helped.

   total halfregs in shared programs: 14162 -> 13946 (-1.53%)
   halfregs in affected programs: 5148 -> 4932 (-4.20%)
   helped: 27
   HURT: 0
   Halfregs are helped.

   total threads in shared programs: 216896 -> 217664 (0.35%)
   threads in affected programs: 6912 -> 7680 (11.11%)
   helped: 12
   HURT: 0
   Threads are helped.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24833>
2023-09-18 08:38:16 -04:00
Alyssa Rosenzweig
596682ad4b nir/opt_sink: Sink load_constant_agx
By the time this runs, we will have already lowered load_ubo and load_vbo to
load_constant_agx so we need to handle the backend version.

This is very important for reducing register pressure in monolithic VS+GS
shaders on AGX. Since no other backend has _agx intrinsics, there's no need for
an option to gate this.

The additional instruction count is from more frequent wait instructions due to
fewer instructions grouped together. This should be mitigated in the future with
an ACO-style latency-reducing scheduler in the backend, after register pressure
is reduced by opt_sink.

total instructions in shared programs: 1793385 -> 1796101 (0.15%)
instructions in affected programs: 199816 -> 202532 (1.36%)
helped: 3
HURT: 941
Instructions are HURT.

total bytes in shared programs: 11799628 -> 11805004 (0.05%)
bytes in affected programs: 1345656 -> 1351032 (0.40%)
helped: 34
HURT: 919
Bytes are HURT.

total halfregs in shared programs: 533151 -> 525818 (-1.38%)
halfregs in affected programs: 40335 -> 33002 (-18.18%)
helped: 613
HURT: 42
Halfregs are helped.

total threads in shared programs: 18910464 -> 18916608 (0.03%)
threads in affected programs: 6144 -> 12288 (100.00%)
helped: 12
HURT: 0
Threads are helped.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24833>
2023-09-18 08:38:16 -04:00