v2: Fix copy-and-paste bugs in lowering patterns.
v3: Add has_sudot_4x8 flag. Requested by Rhys.
v4: Since the names of the opcodes changed from dp4 to dot_4x8, also
change the names of the lowering helpers. Suggested by Jason.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12142>
v2: Add and modify patterns to let constant folding do better.
v3: Remove '(is_not_zero)' from the patterns that try to combine
addends. I honestly don't know why I had it there in the first place,
and nothing in my deep git logs could help clue me in. Noticed by
Alyssa. Remover patterns that detect open-coded udot_4x8. Suggested by
Alyssa and Jason. Add missing sudot_4x8 patterns.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12142>
Six opcodes are added: sdot_4x8_iadd, udot_4x8_uadd, sudot_4x8_iadd,
sdot_4x8_iadd_sat, udot_4x8_uadd_sate, and sudot_4x8_iadd_sat. These
represent the combinations of integer dot-product and add that operate
on packed source vectors. That is, the four 8-bit values for each
vector is stored in a single 32-bit integer.
Some hardware may prefer to operate on unpacked byte vectors. When such
hardware comes to Mesa, we'll have to figure out how to name things.
v2: Add nir_op_iudp4a and nir_op_iudp4a_sat instructions. These opcodes
are not 2-source commutative.
v3: Rename all opcodes to be more like some existing 4x8 opcodes.
Suggested by Timur. Change type of packed vector sources to uint32,
change types of constant folding variables to have explicit size, and
delete some extra casts. All suggested by Jason.
v4: Fix typo previously noticed by Alyssa but missed in v2.
v5: Add has_sudot_4x8 flag. Requested by Rhys.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12142>
Without this, lowered saturating ALU instructions would only clamp to
the range of the new type instead of the range of the old type.
v2: Use nir_iclamp. Suggested by Jason. Use new
u_{int,uint}N_{min,max}() helpers.
Fixes: 090e282407 ("nir: Add a saturated unsigned integer add opcode")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12142>
This creates a single nir_op_vecn instead of a nir_op_vecn and several
copies.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12469>
Clang analyzer thinks struct_base_offset can be used uninitialized
because it doesn't know that glsl_type_is_struct_or_ifc returns
the same value for the same type.
Refactor the code to make it clear what is going on. As a side effect
this should be faster because glsl_get_length and
glsl_type_is_struct_or_ifc will be called only once (they are not
inline functions).
This is an alternative approach to
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12399.
Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12403>
Be able to inline uniforms in loop for unrolling it.
Nested loop/if is also supported.
Some example:
for (i = 0; i < count; i++)
...
uniform "count" will be inlined. But note this does not
make sure the loop will be unrolled (ie. count = 1000).
for (i = 0; i < count; i++)
for (j = init; j < 10; j++)
if (type == 2)
...
uniform "count", "init" and "type" will be inlined.
It is intentional to not be too aggressive to add uniforms
to avoid false positive case while be able to support most
common usage.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11950>
Instead of fail in trip count calculation, just don't mark such
kind of variable as induction from the beginning.
Don't bother inline uniform to deal with such kind of variable
either.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11950>
Collect per vector component dependency and lower vector uniform
load to scalar if any component need to be inlined.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11950>
Unless all uniforms in the condition can be inlined we can
lower the if/loop. So we rollback added uniforms when one
of uniforms in a if condition fail to be added.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11950>
A lot of CTS tests write a u8vec4 or an i8vec4 to an SSBO. This results
in a lot of shifts and MOVs. When that pattern can be recognized, the
individual 8-bit components can be packed much more efficiently.
v2: Rebase on b4369de27f ("nir/lower_packing: use
shader_instructions_pass")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9025>
Not only does this eliminate a bunch of unnecessary type converting
MOVs, but it can also enable some SWAR. The
dEQP-VK.spirv_assembly.type.vec3.i8.bitwise_xor_frag test does
something about like:
c = a.x ^ b.x;
d = a.y ^ b.y;
e = a.z ^ b.z;
After this change, it looks more like:
uint t = i8vec3AsUint(a) ^ i8vec3AsUint(b);
c = extract_u8(t, 0);
d = extract_u8(t, 1);
e = extract_u8(t, 2);
On Ice Lake, this results in:
SIMD8 shader: 41 instructions. 1 loops. 3804 cycles. 0:0 spills:fills, 5 sends
SIMD8 shader: 31 instructions. 1 loops. 2844 cycles. 0:0 spills:fills, 5 sends
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9025>
This eliminates some spurious, size-converting moves. For example, on
Ice Lake this helps dEQP-VK.spirv_assembly.type.vec3.i8.bitwise_xor_frag:
SIMD8 shader: 56 instructions. 1 loops. 4444 cycles. 0:0 spills:fills, 5 sends
SIMD8 shader: 52 instructions. 1 loops. 4164 cycles. 0:0 spills:fills, 5 sends
v2: Condition two of the patterns on !options->lower_extract_byte.
Suggested by Lionel.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9025>
When operators other than eq and ne are involved we can't really
move operands around and negate them because such transformation
may change the value of the whole expression.
Some examples:
For unsigned var:
0 >= 1u + var would eventually become 0xffffffff >= var,
which would always evaluate to true, when original expression
was true only for var == 0xffffffff.
For signed var:
0 >= 1 + var would become -1 >= var, which would evaluate to
false for var == 2147483647, when original expression evaluated
to true (because signed overflow is defined to wrap around in
glsl, 1 + 2147483647 == -2147483648, so 0 >= -2147483648).
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5226
Fixes: 34ec1a24d6 ("glsl: Optimize (x + y cmp 0) into (x cmp -y).")
Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12359>
Enabling GVN uncovered a bug where we would crash if the pass
thinking about pushing something into a loop.
Fixes: 6538b3e566 ("nir: add heuristic for instructions in loops with GCM")
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12242>
This shouldn't do anything but will make testing a later patch easier.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8056>
This shouldn't do anything but will make testing a later patch easier.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8056>
This reverts commit ad363913e6.
This code was added to be able to compare the output file while porting
the script from python2 to python3, but this has long been finished and
the extra complexity is not needed anymore.
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3674>
If a loop is followed by a barrier, the ordering between a load inside
the loop and other memory operations after the barrier may have to be
preserved depending on the type of memory involved. This is relevant
when the memory is writeable by other invocations. In such case, it
is not valid to completely eliminate the loop.
This commit doesn't attempt to precisely catch the barrier case, as
analysis could become too complex. It simply assumes it can't drop
the loops that contain certain types of loads unless those are known
to be safe to reorder (via the access flag).
Fixes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4475
Acked-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9938>
Without this stitch_blocks complains about ending in a jump with a
non-empty block after the inserted body.
I hit this with CTS raytracing tests where we tried to inline a
function that basically ended up being something like
{
ignore_ray_intersection
halt
}
I kept the nop path when possible as that does not leave a mess
for the optimization loop to optimize.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12163>
Need to avoid lowering temps when they are used by other instructions,
like the rt instructions (some of the shader call parameters get
converted to temp variables and we will lower them later with
the explicit io lowering pass as we need to guarantee they will
end up in scratch).
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12162>
gl_FragDepth default value is gl_FragCoord.z so if a shader does:
gl_FragDepth = gl_FragCoord.z
we can drop this assignment.
v2: use nir_ssa_scalar_resolved and don't do this is gl_FragDepth
is wrote multiple times (Jason)
v3: - move to its own pass (Jason)
- handle var = NULL (Rhys)
v4: refactoring (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10697>