With the pass order shuffling, code like `(x & 0xf) + (x & 0xfffffff0)` gets
optimized to `bitfield_select(0xf, x, x)`. It would be much better to optimize
it to simply `x`. nir_opt_algebraic would do that for us, but we run this pass
too late for algebraic to save us from ourselves, so be smarter.
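
A minimal C sketch of why the fold is safe and where the extra check sits. It
assumes NIR's bitfield_select(mask, insert, base) computes
(mask & insert) | (~mask & base). match_masked_pair, struct rewrite and the
integer "SSA indices" are hypothetical stand-ins, not the pass's actual code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed semantics of NIR's bitfield_select(mask, insert, base). */
static uint32_t bitfield_select(uint32_t mask, uint32_t insert, uint32_t base)
{
   return (mask & insert) | (~mask & base);
}

/* Hypothetical result of matching (a & mask_a) OP (b & mask_b), where OP is
 * an add, or, or xor and a/b are SSA values identified by index. */
enum rewrite_kind {
   REWRITE_NONE, /* masks are not complementary: leave the code alone */
   REWRITE_BFS,  /* emit bitfield_select(mask, insert, base) */
   REWRITE_COPY, /* both halves come from one value: use it directly */
};

struct rewrite {
   enum rewrite_kind kind;
   uint32_t mask; /* select mask, only meaningful for REWRITE_BFS */
   int insert;    /* value taken where mask bits are 1 */
   int base;      /* value taken where mask bits are 0 */
};

static struct rewrite
match_masked_pair(int a, uint32_t mask_a, int b, uint32_t mask_b)
{
   struct rewrite r = { REWRITE_NONE, 0, -1, -1 };

   assert(a >= 0 && b >= 0); /* hypothetical SSA indices are non-negative */

   /* Complementary masks partition the bits, so the two halves never
    * overlap and add, or, and xor all produce the same result. */
   if (mask_b != (uint32_t)~mask_a)
      return r;

   if (a == b) {
      /* bitfield_select(m, x, x) == x, so skip the select entirely. */
      r.kind = REWRITE_COPY;
      r.insert = r.base = a;
   } else {
      r.kind = REWRITE_BFS;
      r.mask = mask_a;
      r.insert = a;
      r.base = b;
   }
   return r;
}
```

For the example above, match_masked_pair(7, 0xf, 7, 0xfffffff0) reports a
plain copy of value 7 rather than a bitfield_select.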

Observed on dEQP-GLES31.functional.compute.basic.image_atomic_op_local_size_8
with Jay; this saves an instruction there.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40956>

Inspired by a commit message in !30934, I set about optimizing the code
generated for nir_copysign. It would be possible to just implement an
opt_algebraic pattern for the specific values used by nir_copysign, but
this pass casts a slightly larger net.
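
For 32-bit floats, copysign(x, y) is commonly expressed as
(x & 0x7fffffff) | (y & 0x80000000), which is exactly one bitfield_select
with the sign-bit mask. The C sketch below assumes that lowering (not
necessarily nir_copysign's exact code), IEEE 754 binary32 floats, and
bitfield_select(mask, insert, base) = (mask & insert) | (~mask & base). All
function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* This sketch reinterprets float bits as a 32-bit integer. */
static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Assumed semantics of NIR's bitfield_select(mask, insert, base). */
static uint32_t bitfield_select(uint32_t mask, uint32_t insert, uint32_t base)
{
   return (mask & insert) | (~mask & base);
}

static uint32_t f32_bits(float f)
{
   uint32_t u;
   memcpy(&u, &f, sizeof(u));
   return u;
}

static float bits_f32(uint32_t u)
{
   float f;
   memcpy(&f, &u, sizeof(f));
   return f;
}

/* copysign as two ANDs and an OR: magnitude bits of x, sign bit of y. */
static float copysign_as_and_or(float x, float y)
{
   return bits_f32((f32_bits(x) & 0x7fffffffu) | (f32_bits(y) & 0x80000000u));
}

/* The same computation as a single bitfield_select on the sign bit. */
static float copysign_as_bfs(float x, float y)
{
   return bits_f32(bitfield_select(0x80000000u, f32_bits(y), f32_bits(x)));
}
```

The masks 0x7fffffff and 0x80000000 are exact complements, so a matcher that
accepts any complementary constant masks covers this case without a
copysign-specific pattern.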

As noted in a comment in the code, there may be variations of the
pattern that this pass misses. The opt_algebraic pattern would miss them
too.

v2: Use nir_def_replace. Suggested by Alyssa. Allow more "root"
instruction types. Suggested by Georg.
v3: Treat extract_u16(x, 0) as (x & 0x0000ffff), and treat
extract_u8(x, 0) as (x & 0x000000ff).
v4: Use nir_scalar. Suggested by Georg.
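
The v3 equivalence can be sketched in C as follows, assuming extract_u8(x, i)
returns (x >> (8 * i)) & 0xff and extract_u16(x, i) returns
(x >> (16 * i)) & 0xffff on 32-bit values. extract_as_and_mask is a
hypothetical helper, not part of NIR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed reference semantics of the unsigned extracts on 32-bit values. */
static uint32_t extract_u8(uint32_t x, unsigned byte)
{
   assert(byte < 4);
   return (x >> (byte * 8)) & 0xffu;
}

static uint32_t extract_u16(uint32_t x, unsigned word)
{
   assert(word < 2);
   return (x >> (word * 16)) & 0xffffu;
}

/* Hypothetical helper: an extract from index 0 is just an AND with a low
 * mask, so it can feed the same matcher as an explicit iand. Higher
 * indices also shift the value, so they are rejected. */
static bool extract_as_and_mask(unsigned field_bits, unsigned index,
                                uint32_t *mask)
{
   if (index != 0 || (field_bits != 8 && field_bits != 16))
      return false;
   *mask = (1u << field_bits) - 1u;
   return true;
}
```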

Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31006>