Shaders might declare PLS vars as inout but might just use them as in
or out but not both. This pass detects those cases and adjusts the
variable/deref modes accordingly.
This pass should be called before nir_lower_io_vars_to_temporaries(),
otherwise the copy_derefs will be inserted, turning unused variables
into used ones.
This should ideally be called after DCE to make sure we don't leave
PLS inout variables behind.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37110>
Rather than adding another boolean to optionally lower PLS vars, pass
the types we want to lowers through a nir_variable_mode bitmask.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37110>
This adds some new call operations to handle various parts of the
reductions.
cmat_reduce: is the initial toplevel operation from SPIR-V
this is used after lowering for row/col operation on single hw
supported matrix sizes. The spir-v operation is lowered into
multiple of these on flex dimensions, but also can be lowered into
others.
cmat_reduce_finish:
after multiple reduction operations on a flexible dimension matrix,
there is often subsequent operations on the output matrices to
finish the operation.
cmat_reduce_2x2:
this takes 4 input matrices, and 1 dst to do a 2x2 reduction op.
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38389>
With coopmat2 a bunch of functions need a lot of lowering passes
to happen before they can be lowered, so mark them as to be lowered
later.
Drivers needing these should call the nir_remove_non_cmat_call_entrypoints
where they remove entrypoints now, and call the original nir_remove_non_entrypoints
after lowering coopmat2.
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38389>
This adds a new instruction type to handle cooperative matrix calls.
This clones the call instr, drops callee, and adds a single metadata
slot and a call operation (dummy only for now).
(Not NACKed by Alyssa)
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38389>
Only needed by zink. This clip/cull distance separation pass is needed
to remove nir_io_separate_clip_cull_distance_arrays, so that all shared
GLSL code only uses merged clip+cull distance outputs.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38452>
This aborts if a pass would make any progress. It can be used to assert that:
- our minimalist pass invocation loops in drivers are sufficient and don't
leave any unoptimized code in the shader
- our lowering is sufficient and other passes don't add instructions that
would cause lowering having to be repeated
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38406>
Based on existing softfloat64 support and Berkeley SoftFloat. This is
targeted at drivers that can't preserve denorms, so operations where
denorm support is irrelevant like conversions to/from integers aren't
handled.
Because the existing mechanism used by Gallium for softfloat64 doesn't
support includes, we unfortunately can't extract common code into a
header. This can be done later if we switch Gallium to using glslang and
spirv-to-nir.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37608>
We add a bunch of new helpers to avoid the need to touch >parent_instr,
including the full set of:
* nir_def_is_*
* nir_def_as_*_or_null
* nir_def_as_* [assumes the right instr type]
* nir_src_is_*
* nir_src_as_*
* nir_scalar_is_*
* nir_scalar_as_*
Plus nir_def_instr() where there's no more suitable helper.
Also an existing helper is renamed to unify all the names, while we're
churning the tree:
* nir_src_as_alu_instr -> nir_src_as_alu
..and then we port the tree to use the helpers as much as possible, using
nir_def_instr() where that does not work.
Acked-by: Marek Olšák <maraeo@gmail.com>
---
To eliminate nir_def::parent_instr we need to churn the tree anyway, so I'm
taking this opportunity to clean up a lot of NIR patterns.
Co-authored-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38313>
On Mali, we need not only clamp but also convert to float16 on Valhall+.
We could have a separate pass for this but it fits in nicely with the
rest of nir_lower_point_size() so we might as well put it there.
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38379>
The size and stage parameters are left-overs from history. Originally,
the function acted on a list and so it needed an explicit stage and size
output. Now that it takes a NIR shader and a mode, we can just take the
stage from the shader and set num_(in|out)puts.
The one caller that actually used the explicit output parameter was
turnip. However, given that the helper sorts and re-numbers all the I/O
variables, it's not like changing num_(in|out)puts instead of writing it
to some other location is that big of a deal.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38297>
Some shaders, especially RTPSO shaders that have parts of the PSO
inlined, can become absolutely huge. Using a sparse bitset avoids
quadratic complexity in memory consumption for the liveness information.
This reduces peak memory usage in worst-case tests (hammering
compilation of many huge RTPSOs on 32 threads concurrently) by ~60%,
from 43GB to 18GB.
CPU time (seconds) differences for a workload with mostly small shaders:
Difference at 95.0% confidence
-5.27 +/- 1.08963
-0.88811% +/- 0.183626%
(Student's t, pooled s = 0.629735)
Peak resident set usage for the mostly-small workload:
Difference at 95.0% confidence
30809 +/- 13394.3
1.59276% +/- 0.69246%
(Student's t, pooled s = 7741.09)
CPU time for the heavy workload did not show any difference.
Co-authored-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37908>
The standard way to query options in mesa is `os_get_option()` which
abstracts platform-specific mechanisms to get config variables.
However in quite a few places `getenv()` is still used and this may
preclude controlling some options on some systems.
For instance it is not generally possible to use `MESA_DEBUG` on
Android.
So replace most `getenv()` occurrences with `os_get_option()` to
support configuration options more consistently across different
platforms.
Do the same with `secure_getenv()` replacing it with
`os_get_option_secure()`.
The bulk of the proposed changes are mechanically performed by the
following script:
-----------------------------------------------------------------------
#!/bin/sh
set -e
replace() {
# Don't replace in some files, for example where `os_get_option` is defined,
# or in external files
EXCLUDE_FILES_PATTERN='(src/util/os_misc.c|src/util/u_debug.h|src/gtest/include/gtest/internal/gtest-port.h)'
# Don't replace some "system" variables
EXCLUDE_VARS_PATTERN='("XDG|"DISPLAY|"HOME|"TMPDIR|"POSIXLY_CORRECT)'
git grep "[=!( ]$1(" -- src/ | cut -d ':' -f 1 | sort | uniq | \
grep -v -E "$EXCLUDE_FILES_PATTERN" | \
while read -r file;
do
# Don't replace usages of XDG_* variables or HOME
sed -E -e "/$EXCLUDE_VARS_PATTERN/!s/([=!\( ])$1\(/\1$2\(/g" -i "$file";
done
}
# Add const to os_get_option results, to avoid warning about discarded qualifier:
# warning: initialization discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
# but also errors in some cases:
# error: invalid conversion from ‘const char*’ to ‘char*’ [-fpermissive]
add_const_results() {
git grep -l -P '(?<!const )char.*os_get_option' | \
while read -r file;
do
sed -e '/^\s*const/! s/\(char.*os_get_option\)/const \1/g' -i "$file"
done
}
replace 'secure_getenv' 'os_get_option_secure'
replace 'getenv' 'os_get_option'
add_const_results
-----------------------------------------------------------------------
After this, the `#include "util/os_misc.h"` is also added in files where
`os_get_option()` was not used before.
And since the replacements from the script above generated some new
`-Wdiscarded-qualifiers` warnings, those have been addressed as well,
generally by declaring `os_get_option()` results as `const char *` and
adjusting some function declarations.
Finally some replacements caused new errors like:
-----------------------------------------------------------------------
../src/gallium/auxiliary/gallivm/lp_bld_misc.cpp:127:31: error: no matching function for call to 'strtok'
127 | for (n = 0, option = strtok(env_llc_options, " "); option; n++, option = strtok(NULL, " ")) {
| ^~~~~~
/android-ndk-r27c/toolchains/llvm/prebuilt/linux-x86_64/bin/../sysroot/usr/include/string.h:124:17: note: candidate function not viable: 1st argument ('const char *') would lose const qualifier
124 | char* _Nullable strtok(char* _Nullable __s, const char* _Nonnull __delimiter);
| ^ ~~~~~~~~~~~~~~~~~~~
-----------------------------------------------------------------------
Those have been addressed too, copying the const string returned by
`os_get_option()` so that it could be modified.
In particular, the error above has been fixed by copying the `const
char *env_llc_options` variable in
`src/gallium/auxiliary/gallivm/lp_bld_misc.cpp` to a `char *` which can
be tokenized using `strtok()`.
Reviewed-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38128>
Introduces an opt pass that attempts to optimize
load_barycentric_at_{sample,offset} with simpler load_barycentric_*
equivalents where possible, and optionally lowers
load_barycentric_at_sample to load_barycentric_at_offset with a position
derived from the sample ID instead.
Signed-off-by: Simon Perretta <simon.perretta@imgtec.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37658>
When you're trying to figure out what shader some NIR pass broke, use
nir_shader_bisect_select() to decide between NIR pass behaviors, and then
nir_shader_bisect.py will help you automatically bisect down to which
source_blake3 is at fault. Once it's identified, it prints you a C call
you can use for selecting that shader specifically, which you can use for
continuing on in your debugging.
On a test I was looking at, this took 10 steps to bisect 134 shaders down
to the source_blake3 of the NIR shader in question.
This idea is heavily lifted from Job Noorman's ir3_shader_bisect.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37468>
This broke after divergence became metadata because the divergence
analysis pass does not support all instructions.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35069>
This was incorrect (it also lowered int64 reductions/scans), and the only
user can just use the general callback to precisely only lower what it wants.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37164>
On a fossil from the blender 4.5.0 vulkan backend, this improves compile
times in nak by about 17%. Compile time of other shaders improves by a
more modest 1.2%.
No stat changes on shader-db.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36184>
This is gl specific and a following fix will add more gl specific
params so here we move it to the st to avoid filling nir.h with
more junk.
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37037>
This is gl specific and a following fix will add more gl specific
params so here we move it to the st to avoid filling nir.h with
more junk.
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37037>
Wether need_nuw is used is currently decided in two different ways:
- globally through the allow_offset_wrap option;
- per intrinsic but hard-coded in opt_offsets.
Make this more flexible by creating a callback that is called per
intrinsic. This will allow backends to decide, on a per-intrinsic basis,
whether need_nuw is needed.
Note that the main use case for ir3 is to add support for opt_offsets
for global memory accesses. Other intrinsics don't need need_nuw but
global memory accesses do.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37114>