None of the callsites took advantage of this, so remove
the feature. This will help to a next change that will
add an opaque type to represent a linear parent.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25280>
The inc_compiler should come as part of idep_compiler, idep_nir or
idep_nir_headers dependency.
Acked-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> (v3dv)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25314>
That will make sure the include directories are passed on and also
make sure the generated headers are properly built before whoever code
depends on it. NIR dependency propagates that dependency too.
Since the right include directory is always propagated, we can remove
the extra "compiler/" prefix from the `#include`s in glsl_types.h.
Note: NIR has a special "header only" dependency, so include the
generated headers for compiler there too.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9843
Reviewed-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25314>
This was 4th on the list of things to try in 3ee2e84c60 ("nir:
Rematerialize compare instructions"). This is implemented as a separate
subpass that tries to find ALU instructions (with restrictions) that are
only used by comparisons with zero that are in turn only used as
conditions for bcsel or if-statements.
There are two restrictions implemented. One of the sources must be a
constant. This is done in an attempt to prevent increasing register
pressure. Additionally, the opcode of the instruction must be one that
has a high probablility of getting a conditional modifier on Intel
GPUs. Not all instructions can have a conditional modifiers (e.g., min
and max), so I don't think there is any benefit to moving these
instructions.
v2: Rebase on many, many recent NIR infrastructure changes.
v3: Make data in commit message more clear. Suggested by Matt. Rebase on
b5d6b7c402 ("nir: Drop most uses if nir_instr_rewrite_src()").
All of the affected shaders on ILK and G45 are in CS:GO. There is some
brief analysis of the changes in the MR.
Reviewed-by: Matt Tuner <mattst88@gmail.com>
Shader-db results:
DG2
total instructions in shared programs: 22824637 -> 22824258 (<.01%)
instructions in affected programs: 365742 -> 365363 (-0.10%)
helped: 190 / HURT: 97
total cycles in shared programs: 832186193 -> 832157290 (<.01%)
cycles in affected programs: 41245259 -> 41216356 (-0.07%)
helped: 208 / HURT: 117
total spills in shared programs: 4072 -> 4060 (-0.29%)
spills in affected programs: 366 -> 354 (-3.28%)
helped: 4 / HURT: 2
total fills in shared programs: 3601 -> 3607 (0.17%)
fills in affected programs: 708 -> 714 (0.85%)
helped: 4 / HURT: 2
LOST: 0
GAINED: 1
Tiger Lake and Ice Lake had similar results. (Ice Lake shown)
total instructions in shared programs: 20320934 -> 20320689 (<.01%)
instructions in affected programs: 236592 -> 236347 (-0.10%)
helped: 176 / HURT: 29
total cycles in shared programs: 849846341 -> 849843856 (<.01%)
cycles in affected programs: 41277336 -> 41274851 (<.01%)
helped: 195 / HURT: 110
LOST: 0
GAINED: 1
Skylake
total instructions in shared programs: 18550811 -> 18550470 (<.01%)
instructions in affected programs: 233908 -> 233567 (-0.15%)
helped: 182 / HURT: 25
total cycles in shared programs: 835910983 -> 835889167 (<.01%)
cycles in affected programs: 38764359 -> 38742543 (-0.06%)
helped: 207/ HURT: 94
total spills in shared programs: 4522 -> 4506 (-0.35%)
spills in affected programs: 324 -> 308 (-4.94%)
helped: 4 / HURT: 0
total fills in shared programs: 5296 -> 5280 (-0.30%)
fills in affected programs: 324 -> 308 (-4.94%)
helped: 4 / HURT: 0
LOST: 0
GAINED: 1
Broadwell
total instructions in shared programs: 18199130 -> 18197920 (<.01%)
instructions in affected programs: 214664 -> 213454 (-0.56%)
helped: 191 / HURT: 0
total cycles in shared programs: 935131908 -> 934870248 (-0.03%)
cycles in affected programs: 75770568 -> 75508908 (-0.35%)
helped: 203 / HURT: 84
total spills in shared programs: 13896 -> 13734 (-1.17%)
spills in affected programs: 162 -> 0
helped: 3 / HURT: 0
total fills in shared programs: 16989 -> 16761 (-1.34%)
fills in affected programs: 228 -> 0
helped: 3 / HURT: 0
Haswell
total instructions in shared programs: 16969502 -> 16969085 (<.01%)
instructions in affected programs: 185498 -> 185081 (-0.22%)
helped: 121 / HURT: 1
total cycles in shared programs: 925290863 -> 924806827 (-0.05%)
cycles in affected programs: 30200863 -> 29716827 (-1.60%)
helped: 100 / HURT: 85
total spills in shared programs: 13565 -> 13533 (-0.24%)
spills in affected programs: 736 -> 704 (-4.35%)
helped: 8 / HURT: 0
total fills in shared programs: 15468 -> 15436 (-0.21%)
fills in affected programs: 740 -> 708 (-4.32%)
helped: 8 / HURT: 0
LOST: 0
GAINED: 1
Ivy Bridge
total instructions in shared programs: 15839127 -> 15838947 (<.01%)
instructions in affected programs: 77776 -> 77596 (-0.23%)
helped: 58 / HURT: 0
total cycles in shared programs: 459852774 -> 459739770 (-0.02%)
cycles in affected programs: 11970210 -> 11857206 (-0.94%)
helped: 79 / HURT: 53
Sandy Bridge
total instructions in shared programs: 14106847 -> 14106831 (<.01%)
instructions in affected programs: 1611 -> 1595 (-0.99%)
helped: 10 / HURT: 0
total cycles in shared programs: 775004024 -> 775007516 (<.01%)
cycles in affected programs: 2530686 -> 2534178 (0.14%)
helped: 55 / HURT: 48
Iron Lake
total cycles in shared programs: 257753356 -> 257754900 (<.01%)
cycles in affected programs: 2977374 -> 2978918 (0.05%)
helped: 12 / HURT: 106
GM45
total cycles in shared programs: 169711382 -> 169712816 (<.01%)
cycles in affected programs: 2402070 -> 2403504 (0.06%)
helped: 12 / HURT: 57
Fossil-db results:
All Intel platforms had similar results. (DG2 shown)
Totals:
Instrs: 193884596 -> 193465896 (-0.22%); split: -0.25%, +0.03%
Cycles: 14050193354 -> 14048194826 (-0.01%); split: -0.34%, +0.33%
Spill count: 114944 -> 100449 (-12.61%); split: -13.59%, +0.98%
Fill count: 201525 -> 179534 (-10.91%); split: -11.22%, +0.31%
Scratch Memory Size: 10028032 -> 8468480 (-15.55%)
Totals from 16912 (2.59% of 653124) affected shaders:
Instrs: 34173709 -> 33755009 (-1.23%); split: -1.41%, +0.19%
Cycles: 2945969110 -> 2943970582 (-0.07%); split: -1.62%, +1.55%
Spill count: 97753 -> 83258 (-14.83%); split: -15.98%, +1.15%
Fill count: 176355 -> 154364 (-12.47%); split: -12.82%, +0.35%
Scratch Memory Size: 8619008 -> 7059456 (-18.09%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20176>
This is actually a no-op on AMD, so we really don't want to lower it to
something more complicated. There may be a more efficient way to do
this on Intel too. In addition, in the future we'll want to use this for
lowering boolean reduce operations, where the inverse ballot will
operate on the backend's "natural" ballot type as indicated by
options->ballot_bit_size, instead of uvec4 as produced by SPIR-V. In
total, there are now three possible lowerings we may have to perform:
- inverse_ballot with source type of uvec4 from SPIR-V to inverse_ballot
with natural source type, when the backend supports inverse_ballot
natively.
- inverse_ballot with source type of uvec4 from SPIR-V to arithmetic,
when the backend doesn't support inverse_ballot.
- inverse_ballot with natural source type from reduce operation, when
the backend doesn't support inverse_ballot.
Previously we just did the second lowering unconditionally in vtn, but
it's just a combination of the first and third. We add support here for
the first and third lowerings in nir_lower_subgroups, instead of simply
moving the second lowering, to avoid unnecessary churn.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25123>
Since using nir_shader_lower_instructions(), instructions get revisited
before proceeding with the next one. This already guarantees that any
subsequent lowerings of those instructions happen during the same pass
of nir_lower_subgroups().
v2: use nir_shader_lower_instructions() instead of setting the cursor.
Co-authored-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25123>
It is now unused and has no real use cases now that nir_register is gone.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25247>
In general, sinking ALU instructions can negatively impact register pressure,
since it extends the live ranges of the sources, although it does shrink the live range
of the destination.
However, constants do not usually contribute to register pressure. This is not a
totally true assumption, but it's pretty good in practice, since...
* constants can be rematerialized (backend-dependent)
* constants can often be inlined (ISA-dependent)
* constants can sometimes be promoted to free uniform registers (ISA-dependent)
* constants can live in scalar registers although the ALU destination might need
a vector register (and vector registers are assumed to be much more expensive
than scalar registers, again ISA-dependent)
So, assume that constants have zero effect on register pressure. Now consider an
ALU instruction where all but one source is a constant. Then there are two
cases:
1. The ALU instruction is moved past when its source was otherwise killed. Then
there is no effect on register pressure, since the source live range is
extended exactly as much as the destination live range shrinks.
2. The ALU instruction is moved down but its source is still alive where it's
moved to. Then register pressure is improved, since the source live range is
unchanged while the destination live range shrinks.
So, as a heuristic, we always move ALU instructions where n-1 sources are
constant. As an inevitable special case, this also (necessarily) moves unary ALU
ops, which should be beneficial by the same justification. This is not 100%
perfect but it is well-motivated. Results on AGX are decent:
total instructions in shared programs: 1796101 -> 1795652 (-0.02%)
instructions in affected programs: 326822 -> 326373 (-0.14%)
helped: 800
HURT: 371
Inconclusive result (%-change mean confidence interval includes 0).
total bytes in shared programs: 11805004 -> 11801424 (-0.03%)
bytes in affected programs: 2610630 -> 2607050 (-0.14%)
helped: 912
HURT: 462
Inconclusive result (%-change mean confidence interval includes 0).
total halfregs in shared programs: 525818 -> 515399 (-1.98%)
halfregs in affected programs: 118197 -> 107778 (-8.81%)
helped: 2095
HURT: 804
Halfregs are helped.
total threads in shared programs: 18916608 -> 18917056 (<.01%)
threads in affected programs: 4800 -> 5248 (9.33%)
helped: 7
HURT: 0
Threads are helped.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24833>
At the moment, this does nothing. It will prevent problems from the next patch,
however.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24833>
This is the AGX version of load_output, which shaders can use for framebuffer
fetch. It is beneficial to sink framebuffer fetch as late as possible, both to
reduce register pressure but also to reduce serialization of overlapping
fragments.
Results on a collection of ubershaders:
total bytes in shared programs: 1468928 -> 1468550 (-0.03%)
bytes in affected programs: 495300 -> 494922 (-0.08%)
helped: 24
HURT: 0
Bytes are helped.
total halfregs in shared programs: 14162 -> 13946 (-1.53%)
halfregs in affected programs: 5148 -> 4932 (-4.20%)
helped: 27
HURT: 0
Halfregs are helped.
total threads in shared programs: 216896 -> 217664 (0.35%)
threads in affected programs: 6912 -> 7680 (11.11%)
helped: 12
HURT: 0
Threads are helped.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24833>
By the time this runs, we will have already lowered load_ubo and load_vbo to
load_constant_agx so we need to handle the backend version.
This is very important for reducing register pressure in monolithic VS+GS
shaders on AGX. Since no other backend has _agx intrinsics, there's no need for
an option to gate this.
The additional instruction count is from more frequent wait instructions due to
fewer instructions grouped together. This should be mitigated in the future with
an ACO-style latency-reducing scheduler in the backend, after register pressure
is reduced by opt_sink.
total instructions in shared programs: 1793385 -> 1796101 (0.15%)
instructions in affected programs: 199816 -> 202532 (1.36%)
helped: 3
HURT: 941
Instructions are HURT.
total bytes in shared programs: 11799628 -> 11805004 (0.05%)
bytes in affected programs: 1345656 -> 1351032 (0.40%)
helped: 34
HURT: 919
Bytes are HURT.
total halfregs in shared programs: 533151 -> 525818 (-1.38%)
halfregs in affected programs: 40335 -> 33002 (-18.18%)
helped: 613
HURT: 42
Halfregs are helped.
total threads in shared programs: 18910464 -> 18916608 (0.03%)
threads in affected programs: 6144 -> 12288 (100.00%)
helped: 12
HURT: 0
Threads are helped.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24833>
Redefine in terms of the algebraic property. This correctly handles the
Mali-specific derivatives.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24833>
Will be useful later to generate string tables for the builtin types.
Note we make some extra effort to ensure C++ client code doesn't need to change,
by keeping glsl_type::*_type pointers around.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com> (Python and Meson changes)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25191>
Rewrite the code to use linear_asprintf and always flip the
dimensions in place if the element type is an array. The new
code will now (correctly) flip even in the case of unsized arrays.
The flipping is done by swapping the ranges [a, b) and [b, c), as
shown below, with element type int[...] and an array of length 4.
```
+--------------- a: first bracket in the name
| +---------- b: end of the element name
| | +------- c: end of the array name
| | |
int[...][4]$
will be transformed into
int[4][...]$
```
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23278>
And fix all the errors it found.
Note that for the unsized array error, we will print the
toplevel type -- so that the fact that an inner array is
unsized can be seen.
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25200>
While both clang and gcc can handle designated initializers in C++,
MSVC only does with the C++20 support enabled. So move the initialization
of builtins to a C file.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25006>
Remove constructors from glsl_type so it can be used as a
POD ("plain old data") struct, allowing the builtins to be
initialized directly in memory.
For other types, we now allocate them from glsl_type_cache's mem_ctx,
instead of using the global allocator.
As a side-effect of how the new helpers work, we can completely
create the mock key types for struct/interface lookup without
allocating any memory.
Note there's no `make_sampler_type` since all the sampler types
are created through direct initialization.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25006>
Since now all the data referenced by it is allocated with the cache's
mem_ctx, it is sufficient to just free it, and then reset the cache
state to be ready for a next initialization if it happens.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25006>
These are used only by types created at runtime. Since those will follow
the lifetime of the glsl_type_cache, we can use its mem_ctx for all the types.
Without a mem_ctx, there's nothing to be done in the destructor, so remove it.
Note some keys are calculated by building a mock type, so we need to create
a tmp_ctx in some cases. We'll get rid of them in a later commit.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25006>