It's the only driver that uses the pass so it may as well go there.
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40307>
This catches a number of bugs in the current NIR algebraic optimizations
or opcodes implementations (as fixed in this series, or documented in the
XFAIL tests), and should prevent many future bugs from landing.
This required bumping the test timeout, because s390x is very slow to
emulate in CI.
Closes: #3338
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39076>
Lowers a shader to use a smaller workgroup to do the same work,
while it will still appear as a bigger workgroup to applications.
To achieve this, the pass augments the CF of the shader
so that each real subgroup will execute two or more logical
subgroups. A logical subgroup represents what the application
can observe as a subgroup.
The size of a logical subgroup is the same as a real subgroup.
Only one logical subgroup may be executed per real subgroup
at the same time. This ensures that all subgroup operations
keep working and the subgroup invocation ID stays the same.
- When the CF contains barriers, we can't just repeat
the code and we need to augment each CF node individually
so that they are aware of logical subgroups.
- In case parts of the CF don't contain any barriers, we can simply
repeat and predicate that CF for each logical subgroup.
It is technically not necessary to implement this strategy, but
in practice it helps reduce the amount of branches in the shader
and therefore improves compile times.
The pass is mainly intended for working around HW limitations,
for example when the HW has an upper limit on the workgroup size
or doesn't support workgroups at all, but the API requires a
certain minimum.
Notes:
- Only applicable to shader stages that use workgroups
- Hits an assertion when called on smaller workgroups
- Always flattens workgroup size to 1D
- Creates local variables
- Does not change subgroup size
- Variable workgroup size not supported yet, maybe later
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Anna Maniscalco <anna.maniscalco2000@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37985>
Introduce a new NIR pass called nir_remove_outputs which works on
lowered I/O intrinsics and can remove any output varying or sysval.
This is meant to replace custom solutions in drivers,
such as radv_remove_varyings and similar.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33928>
Shaders might declare PLS vars as inout but might just use them as in
or out but not both. This pass detects those cases and adjusts the
variable/deref modes accordingly.
This pass should be called before nir_lower_io_vars_to_temporaries(),
otherwise the copy_derefs will be inserted, turning unused variables
into used ones.
This should ideally be called after DCE to make sure we don't leave
PLS inout variables behind.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37110>
Only needed by zink. This clip/cull distance separation pass is needed
to remove nir_io_separate_clip_cull_distance_arrays, so that all shared
GLSL code only uses merged clip+cull distance outputs.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38452>
Based on existing softfloat64 support and Berkeley SoftFloat. This is
targeted at drivers that can't preserve denorms, so operations where
denorm support is irrelevant like conversions to/from integers aren't
handled.
Because the existing mechanism used by Gallium for softfloat64 doesn't
support includes, we unfortunately can't extract common code into a
header. This can be done later if we switch Gallium to using glslang and
spirv-to-nir.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37608>
Introduces an opt pass that attempts to optimize
load_barycentric_at_{sample,offset} with simpler load_barycentric_*
equivalents where possible, and optionally lowers
load_barycentric_at_sample to load_barycentric_at_offset with a position
derived from the sample ID instead.
Signed-off-by: Simon Perretta <simon.perretta@imgtec.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37658>
When you're trying to figure out what shader some NIR pass broke, use
nir_shader_bisect_select() to decide between NIR pass behaviors, and then
nir_shader_bisect.py will help you automatically bisect down to which
source_blake3 is at fault. Once it's identified, it prints you a C call
you can use for selecting that shader specifically, which you can use for
continuing on in your debugging.
On a test I was looking at, this took 10 steps to bisect 134 shaders down
to the source_blake3 of the NIR shader in question.
This idea is heavily lifted from Job Noorman's ir3_shader_bisect.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37468>
Usage of '--outdir' argument in python scripts makes it very
complicated for tools like ninja-to-soong to generate the Android
equivalent build file.
This is because the option is less clear on what will be generated.
Instead, change it for '--out' where we give the full path of the file
to generate. This has the good point of deduplicating the locations of
the file name to have it only in 'meson.build'.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37741>
VDPAU only supports X11 and GL interop. There is no Wayland or Vulkan
interop support. The API has limitations that makes it impossible to
correctly decode certain streams.
Application support is also very limited, and VAAPI is always a better
choice over VDPAU.
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Acked-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36632>
On a fossil from the blender 4.5.0 vulkan backend, this improves compile
times in nak by about 17%. Compile time of other shaders improves by a
more modest 1.2%.
No stat changes on shader-db.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36184>
This is gl specific and a following fix will add more gl specific
params so here we move it to the st to avoid filling nir.h with
more junk.
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37037>
This is gl specific and a following fix will add more gl specific
params so here we move it to the st to avoid filling nir.h with
more junk.
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37037>
This adds a generic lowering pass for coop mat flexible dimensions.
This should be suitable for all drivers that implement coop mat2 flexible dimensions
or even just lowering sw exposed sizes to hw sizes.
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36544>
a bunch of drivers have versions of this, might as well make a common one.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: John Anthony <john.anthony@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36516>
See the comment at the top file :-)
The ideas in this pass are based on LLVM. The implementation itself is from
scratch because have you /tried/ to read that thing?
Because LLVM and therefore prop drivers do these optimizations, this should help
narrow Mesa's performance gap with the blobs.
This probably needs some tuning for best results on other ISAs, but the stats
for AGX speak for themselves, see next commits.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36147>
This can be used to move input loads to top after we stop using
nir_lower_io_vars_to_temporaries that does it unconditionally.
It's more flexible than what nir_lower_io_vars_to_temporaries was doing,
and can be extended to handle any instructions.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36018>
This is a partial replacement for nir_lower_io_vars_to_temporaries.
It supports all input and output loads. It doesn't handle stores.
The motivation is to improve compile times.
The main differences compared to nir_lower_io_vars_to_temporaries are:
- it only lowers indirect loads to temps and doesn't touch direct loads
which improves compile times and removes the need for nir_lower_vars_to_ssa
afterward because indirect temp access can't be lowered to SSA
- it doesn't move all input loads to the top; it only moves those input
loads to the top whose indirect loads are lowered (which improves
register usage because direct loads are not moved)
- it doesn't have to deal with complexities of variables
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36018>