nir_foreach_src() bails as soon as the callback returns false for any
src, which isn't the behavior we were looking for. Move the progress
flag to the state struct instead, so we don't skip visiting some sources.
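A minimal sketch of the pattern, using a hypothetical foreach_src() helper and
struct src in place of the real NIR types (these names are illustrative only,
not the actual API). The callback always returns true and records progress in
the state struct, so the early-exit contract never cuts the walk short:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a source; not the real nir_src. */
struct src {
   int value;
};

typedef bool (*src_cb)(struct src *src, void *state);

/* Mimics the early-exit contract: stop at the first callback that
 * returns false. */
static bool
foreach_src(struct src *srcs, size_t count, src_cb cb, void *state)
{
   for (size_t i = 0; i < count; i++) {
      if (!cb(&srcs[i], state))
         return false;
   }
   return true;
}

struct lower_state {
   bool progress;
   unsigned visited;
};

/* Rounds odd values up to even ones.  Progress goes into the state
 * struct, and the callback always returns true so no source is skipped. */
static bool
lower_src(struct src *src, void *data)
{
   struct lower_state *state = data;

   state->visited++;
   if (src->value & 1) {
      src->value++;
      state->progress = true;
   }
   return true;
}

/* Returns whether anything changed; optionally reports how many sources
 * were visited. */
static bool
lower_all(struct src *srcs, size_t count, unsigned *visited)
{
   struct lower_state state = { .progress = false, .visited = 0 };

   foreach_src(srcs, count, lower_src, &state);
   if (visited)
      *visited = state.visited;
   return state.progress;
}
```

If lower_src() returned the progress flag directly, the walk would stop at the
first source that needed no change.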
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12732>
Normal UBOs have explicit strides on them, make our lowered one behave the
same.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12175>
Without this, there's no way to match the UBO nir_variable declarations to
the load_ubo intrinsics referencing their data.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12175>
The lowered LS and NGG stages use local_invocation_index and they
can benefit from the unsigned upper bound because they can emit a
less expensive integer multiplication instruction.
This was working in the past, but was accidentally broken by a refactor.
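To illustrate why a known upper bound helps, here is a rough sketch that
assumes the cheaper instruction is a multiply that only reads the low 24 bits
of each operand (on AMD hardware this would plausibly be something like
v_mul_u32_u24, but that mapping is an assumption here). The helper names are
made up for the example:

```c
#include <stdbool.h>
#include <stdint.h>

/* Models a multiply that only looks at the low 24 bits of each operand. */
static uint32_t
mul_u32_u24(uint32_t a, uint32_t b)
{
   return (a & 0xffffffu) * (b & 0xffffffu);
}

static bool
fits_in_24_bits(uint32_t upper_bound)
{
   return upper_bound <= 0xffffffu;
}

/* Picks the cheap multiply only when both upper bounds prove that the
 * masked-off bits are zero; otherwise falls back to the full multiply. */
static uint32_t
mul_with_bounds(uint32_t a, uint32_t a_bound, uint32_t b, uint32_t b_bound)
{
   if (fits_in_24_bits(a_bound) && fits_in_24_bits(b_bound))
      return mul_u32_u24(a, b);
   return a * b;
}
```

Without a usable bound on local_invocation_index, the second path is the only
safe choice.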
Fossil DB changes on Sienna Cichlid:
Totals from 956 (0.74% of 128647) affected shaders:
CodeSize: 2354172 -> 2344712 (-0.40%)
Instrs: 434359 -> 434327 (-0.01%)
Latency: 1883949 -> 1876814 (-0.38%)
InvThroughput: 762638 -> 757405 (-0.69%)
Fossil DB changes on Sienna Cichlid (with NGGC enabled):
Totals from 57873 (44.99% of 128647) affected shaders:
CodeSize: 155844192 -> 155607064 (-0.15%)
Instrs: 29799184 -> 29799152 (-0.00%)
Latency: 130959764 -> 130814224 (-0.11%); split: -0.11%, +0.00%
InvThroughput: 21100300 -> 20928635 (-0.81%); split: -0.81%, +0.00%
Fixes: 8af6766062
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12558>
These won't work since a workgroup can span more than one thread, and
the temporaries are not shared memory.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10600>
Mesh shader outputs are either:
- non-array builtins
- array builtins that are either per-primitive or per-vertex
- user-defined outputs that must be either per-primitive or per-vertex
So we can identify any array output as "arrayed" for the purposes of
I/O lowering.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10600>
Per-primitive is similar to per-vertex attributes, but applies to all
fragments of the primitive without any interpolation involved.
Because they are regular inputs and outputs, keep track in shader_info
of which I/O is per-primitive so we can distinguish them after deref
lowering. These fields can be used in combination with the regular
`inputs_read`, `outputs_written` and `outputs_read`.
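As a sketch of how such masks might be combined, assume the new fields are
64-bit bitmasks indexed the same way as the existing ones. The struct below is
a cut-down stand-in, and the `per_primitive_*` field names are assumptions
made for illustration:

```c
#include <stdint.h>

/* Cut-down stand-in for the relevant part of shader_info. */
struct io_info {
   uint64_t inputs_read;
   uint64_t outputs_written;
   uint64_t per_primitive_inputs;  /* assumed field name */
   uint64_t per_primitive_outputs; /* assumed field name */
};

/* Outputs that are written and flagged as per-primitive. */
static uint64_t
per_primitive_outputs_written(const struct io_info *info)
{
   return info->outputs_written & info->per_primitive_outputs;
}

/* Outputs that are written and not flagged, i.e. per-vertex. */
static uint64_t
per_vertex_outputs_written(const struct io_info *info)
{
   return info->outputs_written & ~info->per_primitive_outputs;
}
```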
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10600>
We were treating each field as if it took up a single slot, which is
not the case. With strict matching (GLSL 4.20+ / ES 3.1+) we would end
up not matching identical interfaces.
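A small sketch of the difference, using a hypothetical field table rather than
the linker's real data structures. A mat4 member occupies four locations, so
every field after it shifts by more than one:

```c
#include <stddef.h>

/* Hypothetical description of one interface block member. */
struct field {
   const char *name;
   unsigned slots; /* number of locations the member occupies */
};

/* The mistaken model: every field takes exactly one slot. */
static unsigned
field_location_one_slot_each(unsigned base, size_t index)
{
   return base + (unsigned)index;
}

/* Accumulates the real slot count of every preceding field. */
static unsigned
field_location(const struct field *fields, size_t index, unsigned base)
{
   unsigned location = base;

   for (size_t i = 0; i < index; i++)
      location += fields[i].slots;
   return location;
}
```

For a block holding a vec4, a mat4 and another vec4, the third member lands at
location 5 rather than 2.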
Fixes: c4545676d7 ("glsl/linker: fix location aliasing checks for interface variables")
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12479>
We can't append instructions following a return/halt instruction
because the control flow helpers will modify the successor of the
block containing the return/halt. And the NIR validator enforces that
the return/halt must have the end of the function as successor.
This tends to happen following lower_shader_calls lowering which
inserts halts. This probably doesn't prevent the optimization, it'll
just happen in one of the return shaders after the halt has been
removed.
v2: Move prev block ending check earlier in the function (Daniel)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12506>
v2 (Ivan): Add missing capability enum handling.
v3 (idr): Properly handle cases where dest_size != 32.
v4 (idr): Rewrite most of the error checking to use vtn_fail_if. Use
nir_ssa_def with vtn_push_nir_ssa instead of vtn_ssa_value with
vtn_push_ssa_value. All suggested by Jason. Massive rewrite of the
handling of packed 4x8 saturating opcodes. Based on some observations
made by Jason.
v5 (idr): Remove some debugging cruft accidentally added in v4. Noticed
by Jason.
v6: Emit packed versions of vectored instructions when possible.
Suggested by Jason.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12142>
v2: Fix copy-and-paste bugs in lowering patterns.
v3: Add has_sudot_4x8 flag. Requested by Rhys.
v4: Since the names of the opcodes changed from dp4 to dot_4x8, also
change the names of the lowering helpers. Suggested by Jason.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12142>
v2: Add and modify patterns to let constant folding do better.
v3: Remove '(is_not_zero)' from the patterns that try to combine
addends. I honestly don't know why I had it there in the first place,
and nothing in my deep git logs could help clue me in. Noticed by
Alyssa. Remove patterns that detect open-coded udot_4x8. Suggested by
Alyssa and Jason. Add missing sudot_4x8 patterns.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12142>
Six opcodes are added: sdot_4x8_iadd, udot_4x8_uadd, sudot_4x8_iadd,
sdot_4x8_iadd_sat, udot_4x8_uadd_sat, and sudot_4x8_iadd_sat. These
represent the combinations of integer dot-product and add that operate
on packed source vectors. That is, the four 8-bit values for each
vector are stored in a single 32-bit integer.
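A reference-style sketch of what three of these opcodes compute, written as
plain C rather than taken from the NIR constant-folding code. The byte helpers
are invented for the example, and the mixed-sign sudot variants are left out:

```c
#include <stdint.h>

/* Extracts byte i of a packed vector as a signed value. */
static int32_t
sbyte(uint32_t v, unsigned i)
{
   int32_t x = (int32_t)((v >> (8 * i)) & 0xff);
   return x >= 128 ? x - 256 : x;
}

/* Extracts byte i of a packed vector as an unsigned value. */
static uint32_t
ubyte(uint32_t v, unsigned i)
{
   return (v >> (8 * i)) & 0xff;
}

/* Signed dot product plus accumulator, wrapping on overflow.  The sum is
 * kept unsigned to avoid signed-overflow undefined behavior; the final
 * cast assumes the usual two's complement conversion. */
static int32_t
sdot_4x8_iadd(uint32_t a, uint32_t b, int32_t c)
{
   uint32_t sum = (uint32_t)c;

   for (unsigned i = 0; i < 4; i++)
      sum += (uint32_t)(sbyte(a, i) * sbyte(b, i));
   return (int32_t)sum;
}

/* Unsigned dot product plus accumulator, wrapping on overflow. */
static uint32_t
udot_4x8_uadd(uint32_t a, uint32_t b, uint32_t c)
{
   uint32_t sum = c;

   for (unsigned i = 0; i < 4; i++)
      sum += ubyte(a, i) * ubyte(b, i);
   return sum;
}

/* Signed dot product plus accumulator, clamped to the int32 range. */
static int32_t
sdot_4x8_iadd_sat(uint32_t a, uint32_t b, int32_t c)
{
   int64_t sum = c;

   for (unsigned i = 0; i < 4; i++)
      sum += sbyte(a, i) * sbyte(b, i);
   if (sum > INT32_MAX)
      return INT32_MAX;
   if (sum < INT32_MIN)
      return INT32_MIN;
   return (int32_t)sum;
}
```

The same packed inputs give different results depending on signedness: with
a = 0xff020301 and b = 0x02020202 the top byte contributes -2 to the signed
dot product and 510 to the unsigned one.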
Some hardware may prefer to operate on unpacked byte vectors. When such
hardware comes to Mesa, we'll have to figure out how to name things.
v2: Add nir_op_iudp4a and nir_op_iudp4a_sat instructions. These opcodes
are not 2-source commutative.
v3: Rename all opcodes to be more like some existing 4x8 opcodes.
Suggested by Timur. Change type of packed vector sources to uint32,
change types of constant folding variables to have explicit size, and
delete some extra casts. All suggested by Jason.
v4: Fix typo previously noticed by Alyssa but missed in v2.
v5: Add has_sudot_4x8 flag. Requested by Rhys.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12142>
Without this, lowered saturating ALU instructions would only clamp to
the range of the new type instead of the range of the old type.
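For example, consider an 8-bit saturating add that has been widened to 32
bits. The sketch below uses plain C integers instead of NIR builder calls, and
the function names are made up for the illustration. 100 + 100 never
saturates at 32 bits, so the result has to be clamped to the original int8
range explicitly:

```c
#include <stdint.h>

/* Clamps x to the inclusive range [lo, hi]. */
static int32_t
iclamp(int32_t x, int32_t lo, int32_t hi)
{
   if (x < lo)
      return lo;
   if (x > hi)
      return hi;
   return x;
}

/* An 8-bit saturating add carried out in 32-bit arithmetic.  The wide
 * sum cannot overflow, so the clamp to the old type's range is the only
 * thing that provides the saturation. */
static int8_t
iadd_sat_8_via_32(int8_t a, int8_t b)
{
   int32_t wide = (int32_t)a + (int32_t)b;

   return (int8_t)iclamp(wide, INT8_MIN, INT8_MAX);
}
```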
v2: Use nir_iclamp. Suggested by Jason. Use new
u_{int,uint}N_{min,max}() helpers.
Fixes: 090e282407 ("nir: Add a saturated unsigned integer add opcode")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12142>
This creates a single nir_op_vecn instead of a nir_op_vecn and several
copies.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12469>
Clang analyzer thinks struct_base_offset can be used uninitialized
because it doesn't know that glsl_type_is_struct_or_ifc returns
the same value for the same type.
Refactor the code to make it clear what is going on. As a side effect
this should be faster because glsl_get_length and
glsl_type_is_struct_or_ifc will be called only once (they are not
inline functions).
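The general shape of such a refactor can be sketched as follows. The types and
the function are hypothetical and much simpler than the real code; a call
counter is included only to make the single query visible:

```c
#include <stdbool.h>

/* Hypothetical miniature type description; not glsl_type. */
struct type_info {
   bool is_struct;
   unsigned length;
};

/* Counts queries so the effect of caching the result can be observed. */
static unsigned calls_to_is_struct;

static bool
type_is_struct(const struct type_info *t)
{
   calls_to_is_struct++;
   return t->is_struct;
}

/* Queries the type once and keeps the answer in a local, so both the
 * initialization and the later use depend on the same visible condition
 * and an analyzer can see that the offset is always initialized. */
static unsigned
sum_member_offsets(const struct type_info *t, unsigned base_offset)
{
   const bool is_struct = type_is_struct(t);
   const unsigned struct_base_offset = is_struct ? base_offset : 0;
   unsigned total = 0;

   for (unsigned i = 0; i < t->length; i++)
      total += is_struct ? struct_base_offset + i : 1;
   return total;
}
```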
This is an alternative approach to
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12399.
Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12403>
Allow uniforms used in loops to be inlined so that the loops can be
unrolled. Nested loops and ifs are also supported.
Some examples:
  for (i = 0; i < count; i++)
    ...
The uniform "count" will be inlined. Note that this does not guarantee
that the loop will be unrolled (e.g. count = 1000).
  for (i = 0; i < count; i++)
    for (j = init; j < 10; j++)
      if (type == 2)
        ...
The uniforms "count", "init" and "type" will be inlined.
This is intentionally not too aggressive about adding uniforms, to
avoid false positives while still supporting the most common usage.
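To make the effect concrete, here is a C analogy rather than actual shader
code. It assumes the uniform "count" happens to be 4 when the specialized
variant is built, and both function names are invented for the example. Once
the bound is a known constant, the loop can be fully unrolled:

```c
/* Generic variant: the trip count is only known at run time. */
static float
sum_generic(const float *values, unsigned count)
{
   float sum = 0.0f;

   for (unsigned i = 0; i < count; i++)
      sum += values[i];
   return sum;
}

/* Variant specialized for count == 4 (an assumed value): with the
 * uniform inlined as a constant, the loop unrolls into straight-line
 * code. */
static float
sum_count_4(const float *values)
{
   float sum = 0.0f;

   sum += values[0];
   sum += values[1];
   sum += values[2];
   sum += values[3];
   return sum;
}
```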
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11950>
Instead of failing in the trip count calculation, don't mark this kind
of variable as an induction variable in the first place.
Don't bother inlining uniforms to deal with this kind of variable
either.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11950>