We also enable it in all of the NIR drivers.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
We also have to add support for 1-bit integers while we're here so we
get 1-bit variants of iand, ior, and inot.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This just makes it nicely scale across bit sizes.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This commit adds support for 1-bit Booleans and integers. Booleans
obviously take a value of true or false. Because we have to define the
semantics of 1-bit signed and unsigned integers, we define uint1_t to
take values of 0 and 1 and int1_t to take values of 0 and -1. 1-bit
arithmetic is then well-defined in the usual way, just with fewer bits.
The definition of int1_t and uint1_t doesn't usually matter but we do
need something for purposes of constant folding.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This commit contains three related changes. First, we define boolN_t
for N = 8, 16, and 64 and move the definition of boolN_vec to the loop
with the other vec definitions. Second, there's no reason why we need
the != 0 on the source because that happens implicitly when it's
converted to bool. Third, for destinations, we use a signed integer
type and just do -(int)bool_val which will give us the 0/-1 behavior we
want and neatly scales to all bit widths.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This is a squash of a bunch of individual changes:
nir/builder: Generate 32-bit bool opcodes transparently
nir/algebraic: Remap Boolean opcodes to the 32-bit variant
Use 32-bit opcodes in the NIR producers and optimizations
Generated with a little hand-editing and the following sed commands:
sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' **/*.c
sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' **/*.c
sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' **/*.c
sed -i 's/nir_op_bany_inequal/nir_op_b32any_inequal/g' **/*.c
sed -i 's/nir_op_\([fiu]lt\)/nir_op_\132/g' **/*.c
sed -i 's/nir_op_\([fiu]ge\)/nir_op_\132/g' **/*.c
sed -i 's/nir_op_\([fiu]ne\)/nir_op_\132/g' **/*.c
sed -i 's/nir_op_\([fiu]eq\)/nir_op_\132/g' **/*.c
sed -i 's/nir_op_\([fi]\)ne32g/nir_op_\1neg/g' **/*.c
sed -i 's/nir_op_bcsel/nir_op_b32csel/g' **/*.c
Use 32-bit opcodes in the NIR back-ends
Generated with a little hand-editing and the following sed commands:
sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' **/*.c
sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' **/*.c
sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' **/*.c
sed -i 's/nir_op_bany_inequal/nir_op_b32any_inequal/g' **/*.c
sed -i 's/nir_op_\([fiu]lt\)/nir_op_\132/g' **/*.c
sed -i 's/nir_op_\([fiu]ge\)/nir_op_\132/g' **/*.c
sed -i 's/nir_op_\([fiu]ne\)/nir_op_\132/g' **/*.c
sed -i 's/nir_op_\([fiu]eq\)/nir_op_\132/g' **/*.c
sed -i 's/nir_op_\([fi]\)ne32g/nir_op_\1neg/g' **/*.c
sed -i 's/nir_op_bcsel/nir_op_b32csel/g' **/*.c
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Later in this series, bool is not going to imply 32-bit.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This was originally added for the out-of-tree Mali driver but I think
we've all agreed it's easy enough for them to just do in their back-end.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Shader-db results on Kaby Lake:
total instructions in shared programs: 15072525 -> 15072525 (0.00%)
instructions in affected programs: 0 -> 0
helped: 0
HURT: 0
This helps prevent regressions in later commits.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Instead of looking at input_sizes[i] which contains the number of
components for each source, we look at the bit size of input_types[i].
This fixes a regression in the 1-bit boolean series though I have no
idea how we haven't seen it before now.
Fixes: 35baee5dce "nir/constant_folding: fix incorrect bit-size check"
Fixes: 9076c4e289 "nir: update opcode definitions for different bit sizes"
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The previous code was creating a boolean by doing an arithmetic right-
shift by 31 which produces a boolean which is true if the argument is
negative. This is the same as the expression r < 0 which is much
simpler and doesn't depend on NIR's representation of booleans.
Reviewed-by: Eric Anholt <eric@anholt.net>
The pass did not correctly handle loops ending in:
if ssa_7 {
block block_8:
/* preds: block_7 */
continue
/* succs: block_1 */
} else {
block block_9:
/* preds: block_7 */
break
/* succs: block_11 */
}
The break will get eliminated by another opt but if this pass gets
called first (as it does on RADV) we ended up inserting
instructions after the break.
Fixes: 5921a19d4b ("nir: add if opt opt_if_loop_last_continue()")
Reviewed-by: Dave Airlie <airlied@redhat.com>
I needed the same function for v3d. This was originally in d3e046e76c
("nir: Pull some of intel's image load/store format conversion to
nir_format.h") before we made am istake about simplifying the function.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This helps a lot when debugging image load/store lowering on large
testcases. Unfortunately the Mesa enum name stuff is under src/mesa and
we can't get at it from the compiler.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
It's a reasonably well-known fact in the world of compilers that integer
divisions by constants can be replaced by a multiply, an add, and some
shifts. This commit adds such an optimization to NIR for easiest case
of udiv. Other division operations will be added in following commits.
In order to provide some additional driver control, the pass takes a
minimum bit size to optimize.
Reviewed-by: Ian Romanick ian.d.romanick@intel.com
I needed the same functions for v3d. Note that the color value in the
Intel lowering has already been cut down to image.chans num_components.
v2: Drop the half float one, since it was a 1-liner after cleanup.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Most of the bits were constant, but a few were missed. Avoids warnings
from v3d's upcoming static const bits declarations.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Here we rework force_unroll_array_access() so that we can reuse
the induction variable detection in a following patch.
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
This has thrown a few people off recently and it's good to have the
process and all the rational for it documented somewhere. A comment at
the top of nir_inline_functions seems as good a place as any.
Acked-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
I don't know if one is better than the other or not but this approach
has the advantage that we never forget to copy information over and
we're not hard-coding quite as many assumptions. It's also a lot
simpler and much less code.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Instead of having to call two different lower_gradient functions based
on whether or not it's a cube, just make lower_gradient handle cubes.
This significantly simplifies some of the logic.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Just to make it easier to run a nir tests together.
Fixes: a0ae12ca91
("nir/algebraic: Add unit tests for bitsize validation")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Following commits will introduce additional fields such as
guessed_trip_count. Renaming these will help avoid confusion
as our unrolling feature set grows.
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
b2i can now take any size boolean in preparation for 1-bit booleans, so
the error message printed is slightly different.
Fixes: dca6cd9ce6 ("nir: Make boolean conversions sized just like the others")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108961
Cc: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Instead of a single i2b and b2i, we now have i2b32 and b2iN where N is
one if 8, 16, 32, or 64. This leads to having a few more opcodes but
now everything is consistent and booleans aren't a weird special case
anymore.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Suffixes are dropped from a bunch of conversion opcodes when it makes
sense to do so. Others are kept if we really do want the bit-size
restriction.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
All conversion opcodes require a destination size but this makes
constructing certain algebraic expressions rather cumbersome. This
commit adds support to nir_search and nir_algebraic for writing
conversion opcodes without a size. These meta-opcodes match any
conversion of that type regardless of destination size and the size gets
inferred from the sizes of the things being matched or from other
opcodes in the expression.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Instead of using an OrderedDict, just have a (necessarily sorted) array
of transforms and a set of opcodes.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Both of these things are already handled in the Value base class so we
don't need to handle them explicitly in Constant.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>