Some ALU instructions will likely end up being copy propagated in the
backend, which means they would not have any cost. This helps the
scheduler make better decisions for the new open-coded patterns
produced in NIR for extracts (i.e. unpack_2x16) with MR#39511.
With this (together with previous patches) we manage to produce similar
shader-db results as with the unpack_2x16 NIR extract opcodes that
MR#39511 will drop.
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39687>
We need this to produce optimal code in the backend for sequences
like this:
32 %10 = ushr %5.x, %9 (0x10)
16 %14 = u2u16 %10
32 %17 = f2f32 %14
With such code, our copy propagation pass will drop the u216 and
with this patch we will be able to drop the ushr too.
This pattern can show up for VK_KHR_16bit_storage when we successfully
vectorize 16-bit loads into 32-bit loads, but will become a lot more
common after MR#39511 lands, since that would also affect things like
16-bit TMU loads, which are more common.
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39687>
We only really use sub-32bit integers in conversions, so we can skip
clearing the MSB bits when we produce them by converting from larger types
(leaving these bits undefined) and only clear them when we convert from them
to larger types, since we don't have native opcodes to do these conversions
that would only access relevant bits, at least on Pi4. Also, document the
cases where we could do better for Pi5.
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39687>
Set the per-pixel mask based on the value of skip_helpers.
This slightly increase the performance on several traces.
fps_avg helped: gl_gfxbench_trex.trace: 22.30 -> 22.79 (2.20%)
total fps_avg in all runs: 55.18 -> 55.71 (0.97%)
total fps_avg in affected (through threshold) runs: 22.30 -> 22.79 (2.20%)
helped: 1
HURT: 0
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38759>
It will be used with image loads to enable or disable helper invocations.
This fixes a Vulkan CTS test that perform an imageLoad() inside a
fwidth() operation.
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38759>
v3d_nir_lower_load_store_bitsize that uses nir_lower_mem_access_bit_sizes
already ensures that any writemask on store has consecutive bits set.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38921>
This is only compile tested. I have not collected any shader-db or
fossil-db data.
v2: Drop the calls to nir_opt_constant_folding. The builder in
nir_lower_flrp will already take care of this.
v3: NIR_PASS_V is gone. Noticed by Marge.
Acked-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12526>
For umul24 we expose the operation as UMUL24_RTOP0 so we can identify
the difference between umul24 as part of a sequence generated from an
imul as "multop+umul24" and a simple umul24 where rtop will always be 0.
For umul24_rtop0 instructions we relax the scheduling restrictions,
so they don't need to be serialized like the multop+umul24 ops. But
we maintain the read dependency with the last_rtop.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38642>
All drivers use the same callback and it is unlikely that new drivers will use
this pass since it has better replacements today (lower_mem_bit_sizes for
memory, and it never worked for I/O). This should discourage as much.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38533>
This adds a new instruction type to handle cooperative matrix calls.
This clones the call instr, drops callee, and adds a single metadata
slot and a call operation (dummy only for now).
(Not NACKed by Alyssa)
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38389>
We add a bunch of new helpers to avoid the need to touch >parent_instr,
including the full set of:
* nir_def_is_*
* nir_def_as_*_or_null
* nir_def_as_* [assumes the right instr type]
* nir_src_is_*
* nir_src_as_*
* nir_scalar_is_*
* nir_scalar_as_*
Plus nir_def_instr() where there's no more suitable helper.
Also an existing helper is renamed to unify all the names, while we're
churning the tree:
* nir_src_as_alu_instr -> nir_src_as_alu
..and then we port the tree to use the helpers as much as possible, using
nir_def_instr() where that does not work.
Acked-by: Marek Olšák <maraeo@gmail.com>
---
To eliminate nir_def::parent_instr we need to churn the tree anyway, so I'm
taking this opportunity to clean up a lot of NIR patterns.
Co-authored-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38313>
On Mali, we need not only clamp but also convert to float16 on Valhall+.
We could have a separate pass for this but it fits in nicely with the
rest of nir_lower_point_size() so we might as well put it there.
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38379>
Certain subgroup operations don’t impose constraints on
CSD supergroup packing. Mark these as supported
and account for them in v3d_csd_choose_workgroups_per_supergroup()
so packing remains unchanged when they are present.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37836>
All the compiler support was implemented as part of the v3dv
implementation (see commit 31e8740808 and MR#27211).
We are using the same size/supported_stages and mostly the same
supported features, so probably at some point it would be good to have
a common place for that info. Zink reuses their definitions, but as
far as I see it does that because the PIPE and equivalent VK
definitions has the same values, that seems somewhat fragile.
We don't support all features, and in order to support arithmetic we
need to enable a lowering.
Using CTS, right now we are passing 1023 tests out of 6053 (the rest
are skipped).
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37621>
It's not only for GL, change to a generic name.
Use command:
find . -type f -not -path '*/.git/*' -exec sed -i 's/\bgl_shader_stage\b/mesa_shader_stage/g' {} +
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Yonggang Luo <luoyonggang@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36569>
Most of the time with nir_def_rewrite_uses_after, you want to rewrite after the
replacement. Make that the default thing to be more ergonomic and to drop
parent_instr uses.
We leave nir_def_rewrite_uses_after_instr defined if you really want the old
signature with an arbitrary after point.
Via Coccinelle patch:
@@
expression a, b;
@@
-nir_def_rewrite_uses_after(a, b, b->parent_instr)
+nir_def_rewrite_uses_after_def(a, b)
Followed by a bunch of sed.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Acked-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36489>
In the C23 standard unreachable() is now a predefined function-like
macro in <stddef.h>
See https://android.googlesource.com/platform/bionic/+/HEAD/docs/c23.md#is-now-a-predefined-function_like-macro-in
And this causes build errors when building for C23:
-----------------------------------------------------------------------
In file included from ../src/util/log.h:30,
from ../src/util/log.c:30:
../src/util/macros.h:123:9: warning: "unreachable" redefined
123 | #define unreachable(str) \
| ^~~~~~~~~~~
In file included from ../src/util/macros.h:31:
/usr/lib/gcc/x86_64-linux-gnu/14/include/stddef.h:456:9: note: this is the location of the previous definition
456 | #define unreachable() (__builtin_unreachable ())
| ^~~~~~~~~~~
-----------------------------------------------------------------------
So don't redefine it with the same name, but use the name UNREACHABLE()
to also signify it's a macro.
Using a different name also makes sense because the behavior of the
macro was extending the one of __builtin_unreachable() anyway, and it
also had a different signature, accepting one argument, compared to the
standard unreachable() with no arguments.
This change improves the chances of building mesa with the C23 standard,
which for instance is the default in recent AOSP versions.
All the instances of the macro, including the definition, were updated
with the following command line:
git grep -l '[^_]unreachable(' -- "src/**" | sort | uniq | \
while read file; \
do \
sed -e 's/\([^_]\)unreachable(/\1UNREACHABLE(/g' -i "$file"; \
done && \
sed -e 's/#undef unreachable/#undef UNREACHABLE/g' -i src/intel/isl/isl_aux_info.c
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36437>
When building for the C23 standard, the compiler gives the following
error:
-----------------------------------------------------------------------
../src/broadcom/compiler/vir_register_allocate.c
../src/broadcom/compiler/vir_register_allocate.c: In function ‘update_graph_and_reg_classes_for_inst’:
../src/broadcom/compiler/vir_register_allocate.c:1225:44: error: expected statement before ‘;’ token
1225 | FALLTHROUGH;
| ^
../src/broadcom/compiler/vir_register_allocate.c:1225:44: warning: ‘fallthrough’ attribute ignored [-Wattributes]
../src/broadcom/compiler/vir_register_allocate.c:1225:44: error: suggest braces around empty body in an ‘else’ statement [-Werror=empty-body]
../src/broadcom/compiler/vir_register_allocate.c:1222:28: warning: this statement may fall through [-Wimplicit-fallthrough=]
1222 | if (c->devinfo->ver >= 71)
| ^
../src/broadcom/compiler/vir_register_allocate.c:1226:17: note: here
1226 | case 1:
| ^~~~
-----------------------------------------------------------------------
Fix that by doing what the compiler suggests, i.e. by using braces
around empty body in the ‘else’ statement.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36323>
This should work now that the pass returns progress and invalidates metadata.
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36291>
Similar to nir_lower_alu_width(), the callback can return the
desired number of components for a phi, or 0 for no lowering.
The previous behavior of nir_lower_phis_to_scalar() with lower_all=true
can be elicited via nir_lower_all_phis_to_scalar() while the previous
behavior with lower_all=false now corresponds to nir_lower_phis_to_scalar()
with NULL callback.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35783>
RA is intended to dump the vir when register allocation fails, so it
should be checked when we set c->compilation_result to
FAILED_REGISTER_ALLOCATION.
But it seems that this option was forgotten when on some of the
refactorings around compilation_result, as was let on an old condition
that reported register allocation failures.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35823>
The GL frontend is perfectly good at lowering it like we do. Cuts out a
bunch of duplicate code.
We still have ucp_enables for the FS due to lowering of CLIPDIST to
discards in the FS.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8953>
Before we move a UBO load to a previous location in the block we take a
reference to the instruction after it so we can continue the loop from
there, however, if the load we just moved was already the last instruction
in the block we just want to break the loop right there.
Fixes crashes with shaders from http://flightradar24.com
Fixes: 8998666de7 ("broadcom/compiler: sort constant UBO loads by index and offset")
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35333>