We skip emitting ffma_weak here, because otherwise we'd require a lowering
loop with opt_algebraic and lower_double_ops and this way it's also
cheaper.
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41165>
We'll get three new opcodes to properly model float multiply-add.
ffma_old is temporary and will be deleted at the end of this series.
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41165>
to strengthen it and fix a rare case of incorrect behavior.
This is the incorrect behavior that's fixed:
if (invocation_id == 0)
patch_output[0] = invocation_id;
else if (invocation_id == 1)
patch_output[1] = invocation_id;
else
patch_output[2] = invocation_id;
The stored SSA defs are the same, but each invocation writes a different
value to different outputs, so they are not duplicated, but the code
incorrectly treated them as duplicated and removed [1] and [2].
The requirement for correctness is that if an output is duplicated,
it must have stores in the same blocks as the output it duplicates.
To fix the incorrect behavior, awareness of blocks is added to the algorithm,
which leads to the following new cases that are optimized:
1. Different blocks store different SSA defs, but equal in each block:
if (..) {
output[0] = a;
output[1] = a; // eliminated
} else {
output[0] = b;
output[1] = b; // eliminated
}
2. Different GS emit sections store different SSA defs, but equal in each
section:
output[0] = a;
output[1] = a; // eliminated
emit_vertex;
output[0] = b;
output[1] = b; // eliminated
emit_vertex;
3. A weird case that could be duplicated but is left alone due to
the counterexample above:
if (..)
output[0] = a;
else
output[1] = a;
Reviewed-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41329>
Intel uses nir_lower_bit_size to convert 8-bit integer values to 16-bit
for most instructions. By constant folding u2u8 or i2i8 through a bcsel,
this lowering is undone.
Fixes assertion failure in fossils/parallel-rdp/small_subgroup.foz.
fossilize-replay: src/intel/compiler/brw/brw_from_nir.cpp:852: void brw_from_nir_emit_alu(nir_to_brw_state&, nir_alu_instr*, bool): Assertion `brw_type_size_bytes(op[i].type) > 1' failed.
v2: Reject all integer conversions. Suggested by Daniel Schürmann.
Fixes: f4812dc11d ("nir/opt_constant_folding: constant-fold op(bcsel(), #c) -> bcsel(.., #c1, #c2)")
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41412>
It seems like davinci resolve conflicts on those symbols and we got
regressions from our static libstdc++ linking workaround.
Cc: mesa-stable
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41488>
nir_intrinsic_index_size() expects a nir_intrinsic_index_flag, not
the position in the intrinsic's index list. This could cause
part of a multi-slot index to be ignored.
Fixes: b2bc57551a ("nir/instr_set: allow cse with fp_math_ctrl mismatches for intrinsics")
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41593>
nir_def_init sets divergent = true, this means for something like
reduce(reduce(convergent)) we previously only optimized the inner
reduce.
No fossil changes at the moment, but I hit this when trying to
optimize shared memory to subgroup operations.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41542>
This was fundamentally broken for workgroup sizes >= 8x8.
This fixes new VKCTS coverage
dEQP-VK.glsl.texture_functions.texture.*_compute, and also few tests
from the vkd3d-proton testsuite (note that quad derivatives is
currently disabled for < GFX12).
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41483>
This splits the nir_move_to_top_input_loads option into 2 options. The latter
option is mainly for at_offset/at_sample loads. Then it updates most places to
use only the first option.
The rationale is that moving at_sample loads makes Control (game) shaders
worse, as per the code comment.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41167>
This makes adding workgroup scope easier, this just creates the
split_box and moves things into it and adds some helpers.
This also rewrites some loops from r/c into i which calc r/c
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41500>
Move the VecPair<A, B> data structure from NAK's ir.rs to the shared
compiler Rust crate so it can be reused by other backends.
The fields are private, and NAK's ir.rs (now in a different crate)
needs to read and mutate the inner Vecs. Add a_as_slice(..),
a_as_mut_slice(..), b_as_slice(..) and b_as_mut_slice(..), and update
NAK's SrcsAsSlice and DstsAsSlice impls to call them. Returning slices
keeps callers from changing the length of one side without the other,
which is what VecPair is built to prevent.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41435>
Color interpolation (INTERP_MODE_NONE) has unknown barycentrics
and it could be flat shading at runtime.
It's a problem when shader_info is expected to match what's actually
used.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41226>