This is mostly a direct port of the GLSL IR code there are only
2 real functional changes.
1. The direct use of mesa symbol_table instead of glsl_symbol_table.
However since none of the extra functionality offered by
glsl_symbol_table was ever used here this can be seen as an
improvement.
2. Because interface blocks are lowered before this new nir linker
sees them we must explicitly skip them (they are validated
elsewhere) to avoid errors.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25371>
Includes a modified version of using extract/insert for OpLoad/OpStore
from Ian.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (earlier version)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> (earlier version)
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23825>
In certain cases, we have complex opaque objects that are loaded
into (SPIR-V) SSA values. To represent these, we now can store a
reference to a variable in vtn_ssa_value.
Also implements a few operations we know will have to be supported,
like Select and Copy.
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23825>
If the high 32 bits were zero, this would be umin(find_lsb(lo), 31). This
evaluates to 31 if lo is also zero, instead of -1.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Fixes: 9293d8e64b ("nir: Add find_lsb lowering to nir_lower_int64.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25409>
This pass is useful for vector based backends as we might end up with alu
instructions referencing vec8/vec16 values even though being vec4 or
smaller themselves.
This new pass intents to clean up any use of vec8/vec16 sources other
passes won't.
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25330>
In the linear allocation only the parent (context) can be used
to allocate new children, so let's use an opaque type to identify
the linear context. This is similar to what's done in GC allocator.
Update the documentation and a couple of function names to
refer to linear context instead of linear parent.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25280>
None of the callsites took advantage of this, so remove
the feature. This will help to a next change that will
add an opaque type to represent a linear parent.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25280>
The inc_compiler should come as part of idep_compiler, idep_nir or
idep_nir_headers dependency.
Acked-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> (v3dv)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25314>
That will make sure the include directories are passed on and also
make sure the generated headers are properly built before whoever code
depends on it. NIR dependency propagates that dependency too.
Since the right include directory is always propagated, we can remove
the extra "compiler/" prefix from the `#include`s in glsl_types.h.
Note: NIR has a special "header only" dependency, so include the
generated headers for compiler there too.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9843
Reviewed-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25314>
This was 4th on the list of things to try in 3ee2e84c60 ("nir:
Rematerialize compare instructions"). This is implemented as a separate
subpass that tries to find ALU instructions (with restrictions) that are
only used by comparisons with zero that are in turn only used as
conditions for bcsel or if-statements.
There are two restrictions implemented. One of the sources must be a
constant. This is done in an attempt to prevent increasing register
pressure. Additionally, the opcode of the instruction must be one that
has a high probablility of getting a conditional modifier on Intel
GPUs. Not all instructions can have a conditional modifiers (e.g., min
and max), so I don't think there is any benefit to moving these
instructions.
v2: Rebase on many, many recent NIR infrastructure changes.
v3: Make data in commit message more clear. Suggested by Matt. Rebase on
b5d6b7c402 ("nir: Drop most uses if nir_instr_rewrite_src()").
All of the affected shaders on ILK and G45 are in CS:GO. There is some
brief analysis of the changes in the MR.
Reviewed-by: Matt Tuner <mattst88@gmail.com>
Shader-db results:
DG2
total instructions in shared programs: 22824637 -> 22824258 (<.01%)
instructions in affected programs: 365742 -> 365363 (-0.10%)
helped: 190 / HURT: 97
total cycles in shared programs: 832186193 -> 832157290 (<.01%)
cycles in affected programs: 41245259 -> 41216356 (-0.07%)
helped: 208 / HURT: 117
total spills in shared programs: 4072 -> 4060 (-0.29%)
spills in affected programs: 366 -> 354 (-3.28%)
helped: 4 / HURT: 2
total fills in shared programs: 3601 -> 3607 (0.17%)
fills in affected programs: 708 -> 714 (0.85%)
helped: 4 / HURT: 2
LOST: 0
GAINED: 1
Tiger Lake and Ice Lake had similar results. (Ice Lake shown)
total instructions in shared programs: 20320934 -> 20320689 (<.01%)
instructions in affected programs: 236592 -> 236347 (-0.10%)
helped: 176 / HURT: 29
total cycles in shared programs: 849846341 -> 849843856 (<.01%)
cycles in affected programs: 41277336 -> 41274851 (<.01%)
helped: 195 / HURT: 110
LOST: 0
GAINED: 1
Skylake
total instructions in shared programs: 18550811 -> 18550470 (<.01%)
instructions in affected programs: 233908 -> 233567 (-0.15%)
helped: 182 / HURT: 25
total cycles in shared programs: 835910983 -> 835889167 (<.01%)
cycles in affected programs: 38764359 -> 38742543 (-0.06%)
helped: 207/ HURT: 94
total spills in shared programs: 4522 -> 4506 (-0.35%)
spills in affected programs: 324 -> 308 (-4.94%)
helped: 4 / HURT: 0
total fills in shared programs: 5296 -> 5280 (-0.30%)
fills in affected programs: 324 -> 308 (-4.94%)
helped: 4 / HURT: 0
LOST: 0
GAINED: 1
Broadwell
total instructions in shared programs: 18199130 -> 18197920 (<.01%)
instructions in affected programs: 214664 -> 213454 (-0.56%)
helped: 191 / HURT: 0
total cycles in shared programs: 935131908 -> 934870248 (-0.03%)
cycles in affected programs: 75770568 -> 75508908 (-0.35%)
helped: 203 / HURT: 84
total spills in shared programs: 13896 -> 13734 (-1.17%)
spills in affected programs: 162 -> 0
helped: 3 / HURT: 0
total fills in shared programs: 16989 -> 16761 (-1.34%)
fills in affected programs: 228 -> 0
helped: 3 / HURT: 0
Haswell
total instructions in shared programs: 16969502 -> 16969085 (<.01%)
instructions in affected programs: 185498 -> 185081 (-0.22%)
helped: 121 / HURT: 1
total cycles in shared programs: 925290863 -> 924806827 (-0.05%)
cycles in affected programs: 30200863 -> 29716827 (-1.60%)
helped: 100 / HURT: 85
total spills in shared programs: 13565 -> 13533 (-0.24%)
spills in affected programs: 736 -> 704 (-4.35%)
helped: 8 / HURT: 0
total fills in shared programs: 15468 -> 15436 (-0.21%)
fills in affected programs: 740 -> 708 (-4.32%)
helped: 8 / HURT: 0
LOST: 0
GAINED: 1
Ivy Bridge
total instructions in shared programs: 15839127 -> 15838947 (<.01%)
instructions in affected programs: 77776 -> 77596 (-0.23%)
helped: 58 / HURT: 0
total cycles in shared programs: 459852774 -> 459739770 (-0.02%)
cycles in affected programs: 11970210 -> 11857206 (-0.94%)
helped: 79 / HURT: 53
Sandy Bridge
total instructions in shared programs: 14106847 -> 14106831 (<.01%)
instructions in affected programs: 1611 -> 1595 (-0.99%)
helped: 10 / HURT: 0
total cycles in shared programs: 775004024 -> 775007516 (<.01%)
cycles in affected programs: 2530686 -> 2534178 (0.14%)
helped: 55 / HURT: 48
Iron Lake
total cycles in shared programs: 257753356 -> 257754900 (<.01%)
cycles in affected programs: 2977374 -> 2978918 (0.05%)
helped: 12 / HURT: 106
GM45
total cycles in shared programs: 169711382 -> 169712816 (<.01%)
cycles in affected programs: 2402070 -> 2403504 (0.06%)
helped: 12 / HURT: 57
Fossil-db results:
All Intel platforms had similar results. (DG2 shown)
Totals:
Instrs: 193884596 -> 193465896 (-0.22%); split: -0.25%, +0.03%
Cycles: 14050193354 -> 14048194826 (-0.01%); split: -0.34%, +0.33%
Spill count: 114944 -> 100449 (-12.61%); split: -13.59%, +0.98%
Fill count: 201525 -> 179534 (-10.91%); split: -11.22%, +0.31%
Scratch Memory Size: 10028032 -> 8468480 (-15.55%)
Totals from 16912 (2.59% of 653124) affected shaders:
Instrs: 34173709 -> 33755009 (-1.23%); split: -1.41%, +0.19%
Cycles: 2945969110 -> 2943970582 (-0.07%); split: -1.62%, +1.55%
Spill count: 97753 -> 83258 (-14.83%); split: -15.98%, +1.15%
Fill count: 176355 -> 154364 (-12.47%); split: -12.82%, +0.35%
Scratch Memory Size: 8619008 -> 7059456 (-18.09%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20176>
This is actually a no-op on AMD, so we really don't want to lower it to
something more complicated. There may be a more efficient way to do
this on Intel too. In addition, in the future we'll want to use this for
lowering boolean reduce operations, where the inverse ballot will
operate on the backend's "natural" ballot type as indicated by
options->ballot_bit_size, instead of uvec4 as produced by SPIR-V. In
total, there are now three possible lowerings we may have to perform:
- inverse_ballot with source type of uvec4 from SPIR-V to inverse_ballot
with natural source type, when the backend supports inverse_ballot
natively.
- inverse_ballot with source type of uvec4 from SPIR-V to arithmetic,
when the backend doesn't support inverse_ballot.
- inverse_ballot with natural source type from reduce operation, when
the backend doesn't support inverse_ballot.
Previously we just did the second lowering unconditionally in vtn, but
it's just a combination of the first and third. We add support here for
the first and third lowerings in nir_lower_subgroups, instead of simply
moving the second lowering, to avoid unnecessary churn.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25123>
Since using nir_shader_lower_instructions(), instructions get revisited
before proceeding with the next one. This already guarantees that any
subsequent lowerings of those instructions happen during the same pass
of nir_lower_subgroups().
v2: use nir_shader_lower_instructions() instead of setting the cursor.
Co-authored-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25123>
It is now unused and has no real use cases now that nir_register is gone.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25247>
In general, sinking ALU instructions can negatively impact register pressure,
since it extends the live ranges of the sources, although it does shrink the live range
of the destination.
However, constants do not usually contribute to register pressure. This is not a
totally true assumption, but it's pretty good in practice, since...
* constants can be rematerialized (backend-dependent)
* constants can often be inlined (ISA-dependent)
* constants can sometimes be promoted to free uniform registers (ISA-dependent)
* constants can live in scalar registers although the ALU destination might need
a vector register (and vector registers are assumed to be much more expensive
than scalar registers, again ISA-dependent)
So, assume that constants have zero effect on register pressure. Now consider an
ALU instruction where all but one source is a constant. Then there are two
cases:
1. The ALU instruction is moved past when its source was otherwise killed. Then
there is no effect on register pressure, since the source live range is
extended exactly as much as the destination live range shrinks.
2. The ALU instruction is moved down but its source is still alive where it's
moved to. Then register pressure is improved, since the source live range is
unchanged while the destination live range shrinks.
So, as a heuristic, we always move ALU instructions where n-1 sources are
constant. As an inevitable special case, this also (necessarily) moves unary ALU
ops, which should be beneficial by the same justification. This is not 100%
perfect but it is well-motivated. Results on AGX are decent:
total instructions in shared programs: 1796101 -> 1795652 (-0.02%)
instructions in affected programs: 326822 -> 326373 (-0.14%)
helped: 800
HURT: 371
Inconclusive result (%-change mean confidence interval includes 0).
total bytes in shared programs: 11805004 -> 11801424 (-0.03%)
bytes in affected programs: 2610630 -> 2607050 (-0.14%)
helped: 912
HURT: 462
Inconclusive result (%-change mean confidence interval includes 0).
total halfregs in shared programs: 525818 -> 515399 (-1.98%)
halfregs in affected programs: 118197 -> 107778 (-8.81%)
helped: 2095
HURT: 804
Halfregs are helped.
total threads in shared programs: 18916608 -> 18917056 (<.01%)
threads in affected programs: 4800 -> 5248 (9.33%)
helped: 7
HURT: 0
Threads are helped.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24833>
At the moment, this does nothing. It will prevent problems from the next patch,
however.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24833>
This is the AGX version of load_output, which shaders can use for framebuffer
fetch. It is beneficial to sink framebuffer fetch as late as possible, both to
reduce register pressure but also to reduce serialization of overlapping
fragments.
Results on a collection of ubershaders:
total bytes in shared programs: 1468928 -> 1468550 (-0.03%)
bytes in affected programs: 495300 -> 494922 (-0.08%)
helped: 24
HURT: 0
Bytes are helped.
total halfregs in shared programs: 14162 -> 13946 (-1.53%)
halfregs in affected programs: 5148 -> 4932 (-4.20%)
helped: 27
HURT: 0
Halfregs are helped.
total threads in shared programs: 216896 -> 217664 (0.35%)
threads in affected programs: 6912 -> 7680 (11.11%)
helped: 12
HURT: 0
Threads are helped.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24833>
By the time this runs, we will have already lowered load_ubo and load_vbo to
load_constant_agx so we need to handle the backend version.
This is very important for reducing register pressure in monolithic VS+GS
shaders on AGX. Since no other backend has _agx intrinsics, there's no need for
an option to gate this.
The additional instruction count is from more frequent wait instructions due to
fewer instructions grouped together. This should be mitigated in the future with
an ACO-style latency-reducing scheduler in the backend, after register pressure
is reduced by opt_sink.
total instructions in shared programs: 1793385 -> 1796101 (0.15%)
instructions in affected programs: 199816 -> 202532 (1.36%)
helped: 3
HURT: 941
Instructions are HURT.
total bytes in shared programs: 11799628 -> 11805004 (0.05%)
bytes in affected programs: 1345656 -> 1351032 (0.40%)
helped: 34
HURT: 919
Bytes are HURT.
total halfregs in shared programs: 533151 -> 525818 (-1.38%)
halfregs in affected programs: 40335 -> 33002 (-18.18%)
helped: 613
HURT: 42
Halfregs are helped.
total threads in shared programs: 18910464 -> 18916608 (0.03%)
threads in affected programs: 6144 -> 12288 (100.00%)
helped: 12
HURT: 0
Threads are helped.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24833>