Via Coccinelle patches
@@
expression a, b, c;
@@
-nir_channels(b, a, (1 << c) - 1)
+nir_trim_vector(b, a, c)
@@
expression a, b, c;
@@
-nir_channels(b, a, BITFIELD_MASK(c))
+nir_trim_vector(b, a, c)
@@
expression a, b;
@@
-nir_channels(b, a, 3)
+nir_trim_vector(b, a, 2)
@@
expression a, b;
@@
-nir_channels(b, a, 7)
+nir_trim_vector(b, a, 3)
Plus a fixup for pointless trimming an immediate in RADV and radeonsi.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23352>
To add the const offset to the base index.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23254>
Since 624e799cc3 ("nir: Drop nir_ssa_def::name and nir_register::name"), SSA
defs don't have names, making the name argument unused. Drop it from the
signature and fix the call sites. This was done with the help of the following
Coccinelle semantic patch:
@@
expression A, B, C, D, E;
@@
-nir_ssa_dest_init(A, B, C, D, E);
+nir_ssa_dest_init(A, B, C, D);
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23078>
Currently, we have an atomic intrinsic for each combination of memory type
(global, shared, image, etc) and atomic operation (add, sub, etc). So for m
types of memory supported by the driver and n atomic opcodes, the driver has to
handle O(mn) intrinsics. This makes a total mess in every single backend I've
looked at, without fail.
It would be a lot nicer to unify the intrinsics. There are two obvious ways:
1. Make the memory type a constant index, keep different intrinsics for
different operations. The problem with this is that different memory types
imply different intrinsic signatures (number of sources, etc). As an
example, it doesn't make sense to unify global_atomic_amd with
global_atomic_2x32, as an example. The first takes 3 scalar sources, the
second takes 1 vector and 1 scalar. Also, in any single backend, there are a
lot more operations than there are memory types.
2. Make the opcode a constant index, keep different intrinsics for different
operations. This works well, with one exception: compswap and fcompswap
take an extra argument that other atomics don't, so there's an extra axis of
variation for the intrinsic signatures.
So, the solution is to have 2 intrinsics for each memory type -- for atomics
taking 1 argument and atomics taking 2 respectively. Both of these intrinsics
take an nir_atomic_op enum to describe its operation. We don't use a nir_op for
this purpose, as there are some atomics (cmpxchg, inc_wrap, etc) that don't
cleanly map to any ALU op and it would be weird to force it.
The plan is to transition to these new opcodes gradually. This series adds a
lowering pass producing these opcodes from the existing opcodes, so that
backends can opt-in to the new forms one-by-one. Then we can convert backends
separately without any cross-tree flag day. Once everything is converted, we can
convert the producers and core NIR as a flag day, but we have far fewer
producers than backends so this should be fine. Finally we can drop the old
stuff.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22914>
We need to keep variables in the IR because a few places use them,
like nir_build_program_resource_list. This will allow us to lower IO
in the linker.
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21861>
Every nir_ssa_def is part of a chain of uses, implemented with doubly linked
lists. That means each requires 2 * 64-bit = 16 bytes per def, which is
memory intensive. Together they require 32 bytes per def. Not cool.
To cut that memory use in half, we can combine the two linked lists into a
single use list that contains both regular instruction uses and if-uses. To do
this, we augment the nir_src with a boolean "is_if", and reimplement the
abstract if-uses operations on top of that list. That boolean should fit into
the padding already in nir_src so should not actually affect memory use, and in
the future we sneak it into the bottom bit of a pointer.
However, this creates a new inefficiency: now iterating over regular uses
separate from if-uses is (nominally) more expensive. It turns out virtually
every caller of nir_foreach_if_use(_safe) also calls nir_foreach_use(_safe)
immediately before, so we rewrite most of the callers to instead call a new
single `nir_foreach_use_including_if(_safe)` which predicates the logic based on
`src->is_if`. This should mitigate the performance difference.
There's a bit of churn, but this is largely a mechanical set of changes.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22343>
We have specialized lowering passes dealing with most of that already:
1. gl_nir_lower_samplers_as_deref
2. nir_lower_samplers
3. nir_lower_cl_images
If we need more than that, those passes can deal with following deref
chains as well.
We _might_ need to improve nir_lower_cl_images a bit for more complex
kernels, but CL also doesn't allow indirect images, so we are always able
to optimize the entire deref chain away.
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20161>
If the offset is negative like it's the case in
dEQP-VK.robustness.robustness2.bind.notemplate.r32i.unroll.volatile.storage_buffer_dynamic.readwrite.no_fmt_qual.len_256.samples_1.1d.comp
we end up passing the bounds checking condition because it's using
signed integers.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Suggested-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Cc: mesa-stable
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20762>
Move them up to where the other conversion helpers. For nir_b2<T>(),
suffix them with N like all the others and make them use
nir_type_convert() as well.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20067>
This pass can insert if blocks, therefore no dominance/block_index for
you.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19818>
...even for non-pixel interpolation, for consistency. Otherwise backends get
funny intrinsics with interpolateAt:
vec4 32 ssa_4 = intrinsic load_interpolated_input (ssa_3, ssa_2) (base=1, component=0, dest_type=invalid /*0*/, io location=33 slots=1 /*161*/)
We know it'll be a float, but backends shouldn't need to special case this. (Or
maybe interpolated_input shouldn't have a dest_type index. I'd be ok with that
resolution too. But having one and not setting it consistently is wrong.)
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19085>
A task shader must use this instruction to specify the dimensions
of the launched mesh shader workgroups.
It is a terminating instruction.
When the task shader doesn't have the optional payload, use the
pre-existing launch_mesh_workgroups intrinsics.
When the task shader has a payload, use a new
launch_mesh_workgroups_with_payload_deref intrinsics which has
a deref that refers to the payload variable.
We also add this new intrinsic to nir_lower_io which lowers this
to the pre-existing explicit intrinsic.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18366>
This is almost always a nir_instr and updating the src of a nir_if will
have to work slightly differently in the future.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12910>
This adds support for global 64-bit GPU addresses as a pair of
32-bit values. This is useful for platforms with 32-bit GPUs
that want to support VK_KHR_buffer_device_address, which makes
GPU addresses explicitly 64-bit.
With the new format we also add new global intrinsics with 2x32
suffix that consume the new address format.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17275>
Before, if the ssbo is too large this would always return 0.
Also, this code is easier to optimize, so the common case of offset 0
and pot stride results in one ushr instead of 5+ instructions.
Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17468>
Perform address calculation in 32 bits when
dealing with inbounds array derefs.
Closes: #6562
Signed-off-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16729>
This is useful for drivers which wish to consume XFB information. These
hopefully-uncontroversial hunks are extracted from the much more controversial
"st,nir,radeons: Move nir_lower_io_passes to si_nir_lower_io" by Jason.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15720>
For the NIR XFB gathering as well as all the Vulkan drivers, buffer
strides in nir_xfb_info are in bytes. When Marek started using
nir_xfb_info for GLSL on radeonsi, he copied directly from the GLSL
struct which has strides in dwords. This inconsistency didn't show up
until I went through and started us using the NIR passes for GL drivers
directly without going through the GLSL structs. We could change the
nir_xfb_buffer_info field to be in dwords to be consistent with
shader_info but that would mean changing all the Vulkan drivers but, for
now, it's easier to always use bytes in nir_xfb_info.
Fixes: 2a22885a45 ("st,nir: Use nir_shader::xfb_info in nir_lower_io_passes")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16819>
PRIMITIVE_INDICES is a flat array in NV_mesh_shader,
not a proper arrayed output, as opposed to D3D-style
mesh shaders where it's addressed by the primitive index.
Prevent assigning several slots to primitive indices,
to avoid issues.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15160>
moved from radeonsi without the vectorization, which won't be needed for
now. We will lower IO in st/mesa instead of radeonsi to get the transform
feedback info into store instructions.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14388>
Task shader outputs work differently than other shaders, so they
need special consideration. Essentially, they have two kinds of
outputs:
1. Number of mesh shader workgroups to launch.
Will be still represented by a shader output.
2. Optional payload of up to (at least) 16K bytes.
These payload variables behave similarly to shared memory, but
the spec doesn't actually define them as shared memory (also, they
may be implemented differently by each backend), so we need to add
a new NIR variable mode for them.
These payload variables can't be represented by shader outputs
because the 16K bytes don't fit the 32x vec4 model that NIR uses
for its output variables.
This patch adds a new NIR variable mode: nir_var_mem_task_payload
and corresponding explicit I/O intrinsics, as well as support for
this new mode in nir_lower_io.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14930>
Ultimately this is consumed by nir-to-tgsi and needed by virglrenderer
to correctly declare output variables.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14423>
I aimed for "things that look like big switch statements, or cases where
the compiler is unlikely to be able to constant-propagate an argument into
something useful."
Saves another 80kb on disk. No perf difference on iris shader-db, n=23.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13916>
Mesh shader outputs are either:
- non-array builtins
- array builtins that are either per-primitive or per-vertex
- user-defined outputs that must be either per-primitive or per-vertex
So we can identify any array output as "arrayed" for the purposes of
I/O lowering.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10600>