We'll get three new opcodes to properly model float multiply-add.
ffma_old is temporary and will be deleted at the end of this series.
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41165>
to strengthen it and fix a rare case of incorrect behavior.
This is the incorrect behavior that's fixed:
   if (invocation_id == 0)
      patch_output[0] = invocation_id;
   else if (invocation_id == 1)
      patch_output[1] = invocation_id;
   else
      patch_output[2] = invocation_id;
The stored SSA defs are the same, but each invocation writes a different
value to different outputs, so they are not duplicated, but the code
incorrectly treated them as duplicated and removed [1] and [2].
The requirement for correctness is that if an output is duplicated,
it must have stores in the same blocks as the output it duplicates.
To fix the incorrect behavior, awareness of blocks is added to the algorithm,
which leads to the following new cases that are optimized:
1. Different blocks store different SSA defs, but equal in each block:
   if (..) {
      output[0] = a;
      output[1] = a; // eliminated
   } else {
      output[0] = b;
      output[1] = b; // eliminated
   }
2. Different GS emit sections store different SSA defs, but equal in each
section:
   output[0] = a;
   output[1] = a; // eliminated
   emit_vertex;
   output[0] = b;
   output[1] = b; // eliminated
   emit_vertex;
3. A weird case that could be duplicated but is left alone due to
the counterexample above:
   if (..)
      output[0] = a;
   else
      output[1] = a;
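
The block-aware requirement above can be sketched roughly as follows. This is
a toy model, not the pass itself: struct output_store, struct output_info and
outputs_are_duplicates are hypothetical names, and blocks (or GS emit
sections) and SSA defs are reduced to plain indices.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one store to an output. */
struct output_store {
   unsigned block; /* index of the block (or GS emit section) with the store */
   unsigned def;   /* index of the stored SSA def */
};

/* Hypothetical model of all stores to one output. */
struct output_info {
   const struct output_store *stores;
   size_t num_stores;
};

/* Return true if "out" has a store of the same def in the same block. */
static bool
has_matching_store(const struct output_info *out, const struct output_store *s)
{
   for (size_t i = 0; i < out->num_stores; i++) {
      if (out->stores[i].block == s->block && out->stores[i].def == s->def)
         return true;
   }
   return false;
}

/* Two outputs are treated as duplicates only if every store of one has a
 * matching store (same block, same def) in the other, in both directions.
 */
static bool
outputs_are_duplicates(const struct output_info *a, const struct output_info *b)
{
   if (a->num_stores != b->num_stores)
      return false;

   for (size_t i = 0; i < a->num_stores; i++) {
      if (!has_matching_store(b, &a->stores[i]))
         return false;
   }
   for (size_t i = 0; i < b->num_stores; i++) {
      if (!has_matching_store(a, &b->stores[i]))
         return false;
   }
   return true;
}
```

Under this model, case 1 is accepted because both outputs store the same def
in each block, while case 3 and the patch_output counterexample are rejected
because the stores live in different blocks even though the def is the same.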
Reviewed-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41329>
Intel uses nir_lower_bit_size to convert 8-bit integer values to 16-bit
for most instructions. By constant folding u2u8 or i2i8 through a bcsel,
this lowering is undone.
Fixes assertion failure in fossils/parallel-rdp/small_subgroup.foz.
fossilize-replay: src/intel/compiler/brw/brw_from_nir.cpp:852: void brw_from_nir_emit_alu(nir_to_brw_state&, nir_alu_instr*, bool): Assertion `brw_type_size_bytes(op[i].type) > 1' failed.
v2: Reject all integer conversions. Suggested by Daniel Schürmann.
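
As an illustration only, the rejection can be pictured with the toy model
below. The enum, struct and function names are hypothetical and do not match
the real nir_opt_constant_folding code; only one foldable op is modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical opcodes for illustration; not the real nir_op values. */
enum toy_op {
   TOY_OP_IADD,
   TOY_OP_U2U8,
   TOY_OP_I2I8,
};

struct toy_fold_result {
   bool folded;
   uint32_t if_true;  /* #c1 in bcsel(cond, #c1, #c2) */
   uint32_t if_false; /* #c2 in bcsel(cond, #c1, #c2) */
};

static bool
toy_op_is_int_conversion(enum toy_op op)
{
   return op == TOY_OP_U2U8 || op == TOY_OP_I2I8;
}

/* Try to rewrite op(bcsel(cond, #a, #b), #c) as bcsel(cond, #c1, #c2). */
static struct toy_fold_result
toy_fold_through_bcsel(enum toy_op op, uint32_t a, uint32_t b, uint32_t c)
{
   struct toy_fold_result res = { false, 0, 0 };

   /* Folding a conversion would produce a bcsel of the narrow type that an
    * earlier bit-size lowering removed, so leave it alone.
    */
   if (toy_op_is_int_conversion(op))
      return res;

   if (op == TOY_OP_IADD) {
      res.folded = true;
      res.if_true = a + c;
      res.if_false = b + c;
   }
   return res;
}
```

In this sketch, an iadd with constant operands is folded into the two bcsel
sources, while u2u8 and i2i8 are returned unfolded.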
Fixes: f4812dc11d ("nir/opt_constant_folding: constant-fold op(bcsel(), #c) -> bcsel(.., #c1, #c2)")
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41412>
nir_intrinsic_index_size() expects a nir_intrinsic_index_flag, not
the position in the intrinsic's index list. This could cause
part of a multi-slot index to be ignored.
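
A toy sketch of this class of bug, using hypothetical names and sizes rather
than the real nir_intrinsic_index_flag values or tables:

```c
#include <stddef.h>

/* Hypothetical index kinds; not the real nir_intrinsic_index_flag values. */
enum toy_index_flag {
   TOY_INDEX_BASE,
   TOY_INDEX_WRITE_MASK,
   TOY_INDEX_WIDE, /* occupies two slots in this toy model */
   TOY_NUM_INDEX_FLAGS,
};

/* Number of slots each index kind occupies, keyed by flag. */
static const unsigned toy_index_size[TOY_NUM_INDEX_FLAGS] = {
   [TOY_INDEX_BASE] = 1,
   [TOY_INDEX_WRITE_MASK] = 1,
   [TOY_INDEX_WIDE] = 2,
};

/* Buggy: looks up the size with the list position instead of the flag. */
static unsigned
total_slots_by_position(const enum toy_index_flag *indices, size_t count)
{
   (void)indices;
   unsigned total = 0;
   for (size_t i = 0; i < count; i++)
      total += toy_index_size[i];
   return total;
}

/* Fixed: looks up the size with the flag stored at that position. */
static unsigned
total_slots_by_flag(const enum toy_index_flag *indices, size_t count)
{
   unsigned total = 0;
   for (size_t i = 0; i < count; i++)
      total += toy_index_size[indices[i]];
   return total;
}
```

For an intrinsic whose only index is TOY_INDEX_WIDE, the buggy lookup sees
one slot and the fixed lookup sees two, so the second slot would be skipped.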
Fixes: b2bc57551a ("nir/instr_set: allow cse with fp_math_ctrl mismatches for intrinsics")
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41593>
nir_def_init sets divergent = true, so for something like
reduce(reduce(convergent)) we previously only optimized the inner
reduce.
No fossil changes at the moment, but I hit this when trying to
optimize shared memory to subgroup operations.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41542>
This was fundamentally broken for workgroup sizes >= 8x8.
This fixes new VKCTS coverage
dEQP-VK.glsl.texture_functions.texture.*_compute, and also a few tests
from the vkd3d-proton testsuite (note that quad derivatives are
currently disabled for < GFX12).
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41483>
This splits the nir_move_to_top_input_loads option into 2 options. The latter
option is mainly for at_offset/at_sample loads. Then it updates most places to
use only the first option.
The rationale is that moving at_sample loads makes Control (game) shaders
worse, as per the code comment.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41167>
This makes adding workgroup scope easier. It just creates the
split_box, moves things into it, and adds some helpers.
This also rewrites some loops from iterating over r/c to iterating over
i and calculating r/c from it.
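
A minimal sketch of that loop shape, assuming row-major order (the commit
message does not say which order is used). struct grid_pos,
grid_pos_from_index and count_diagonal_flat are hypothetical names for
illustration only.

```c
/* Hypothetical row/column pair derived from a flat index. */
struct grid_pos {
   unsigned r;
   unsigned c;
};

/* Derive row and column from a flat index; cols must be nonzero. */
static struct grid_pos
grid_pos_from_index(unsigned i, unsigned cols)
{
   struct grid_pos pos = { i / cols, i % cols };
   return pos;
}

/* Example of a single loop over i replacing a nested r/c loop: count the
 * cells on the main diagonal of a rows x cols grid.
 */
static unsigned
count_diagonal_flat(unsigned rows, unsigned cols)
{
   unsigned count = 0;
   for (unsigned i = 0; i < rows * cols; i++) {
      struct grid_pos pos = grid_pos_from_index(i, cols);
      if (pos.r == pos.c)
         count++;
   }
   return count;
}
```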
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41500>
Color interpolation (INTERP_MODE_NONE) has unknown barycentrics
and it could be flat shading at runtime.
It's a problem when shader_info is expected to match what's actually
used.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41226>
Add a new intrinsic to read the raw shading rate provided to the FS
payload, and lower load_frag_shading_rate in NIR using it.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Caleb Callaway <caleb.callaway@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38879>
We'll need the raw coverage mask provided to the fragment shader in a
future patch.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Caleb Callaway <caleb.callaway@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38879>
Instead of dirtying the root buffer and re-uploading the whole thing for each
draw where a per-draw value like the draw ID is changed, use a smaller
secondary buffer for per-draw data. We can also skip flushing state for every
individual batched draw and just flush once for the whole draw command.
This may also be useful in the future for handling how sized index buffers from
maintenance5 and null index buffers from maintenance6 work with robustness2,
allowing us to pass through indexed draw parameters and lower the index buffer
read into the shader with bounds checks.
Reviewed-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41399>
lower_boolean_reduce only works if the number of components is 1, and even
asserts on this in its prologue. Otherwise, given a boolean vector type, it
may produce output using ballot/vote with a boolean vector input.
Acked-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41186>