Only non-power-of-two (NPOT) vectors greater than vec4 use the extra uint32.
This is for instructions that share the dest code.
load_const and undef already support 1-16 in the header.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
vec4 scalarized ALUs typically have 4 equal instruction headers, so remove
the last 3.
There are no bits left in the ALU header for more flags, so future
extensions of NIR will have to use something like instr_type == 15
to describe more complex ALU instructions.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
It can be derived from src and var. This frees 10 bits in the header
that will be used later.
"mode" is moved in the structure, because those bits will be used for
something else later.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
- type_cast: deduplicate types if the last one is the same
- derive the type from the parent for other derefs
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
The majority of constants can be packed like this.
v2: - use enum for the packing encoding,
    - trim packed_value to 20 bits and add 1 bit to last_component,
which simplifies a later commit
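Roughly, the idea looks like this; the field names, widths, and enum values
below are illustrative of the approach, not the exact nir_serialize.c encoding:

  /* Hedged sketch: small constants are stashed directly in the 32-bit
   * instruction header instead of being written as a separate blob. */
  enum load_const_packing {
     load_const_full,        /* value follows the header */
     load_const_scalar_lo,   /* value inlined in packed_value */
     load_const_scalar_hi,   /* value inlined, remaining bits implied */
  };

  union packed_instr {
     uint32_t u32;
     struct {
        unsigned instr_type : 4;
        unsigned last_component : 3;
        unsigned bit_size : 3;
        unsigned packing : 2;        /* enum load_const_packing */
        unsigned packed_value : 20;  /* most constants fit here */
     } load_const;
  };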
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
All of the tables are static const, so they only need to be validated
once. As noted in the previous commit, the compiler should be able to
eliminate all of this code when the assertions would pass. Even with
the help of the previous commit, this does not always occur.
-Og: -95.688 +/- 3.91935 (-24.9562% +/- 1.0222%) N=5
-O1: No difference proven at 95.0% confidence. N=5
-O2: -1.962 +/- 0.85001 (-0.860013% +/- 0.372589%) N=5
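A minimal sketch of the run-once guard (the names are placeholders, not the
in-tree identifiers):

  /* Hedged sketch: validate the static const tables only on the first call;
   * with NDEBUG the whole thing compiles away. */
  static void
  validate_tables(void)
  {
  #ifndef NDEBUG
     static bool validated = false;
     if (validated)
        return;

     /* ... assert() over the static const tables goes here ... */

     validated = true;
  #endif
  }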
Reviewed-by: Eric Anholt <eric@anholt.net>
I was pretty liberal with these assertions when I wrote this code
because I had assumed that GCC would unroll the loops, inline the lookups
of static const arrays with now-constant indices, and then eliminate
all the actual assertions. It seems none of this happens even at -O3.
Adding the pragmas helps encourage loop unrolling at some optimization
levels. I tested by running shader-db with NIR_VALIDATE=false on a Core
i7 Haswell desktop system.
-Og: No difference proven at 95.0% confidence. N=5
-O1: -48.304 +/- 1.221 (-16.3343% +/- 0.412888%) N=5
-O2: -49.94 +/- 1.23521 (-17.9634% +/- 0.444303%) N=5
v2: Add a _Pragma to an inner loop that was accidentally dropped during
a rebase.
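The pattern is roughly the following; 'table' is a stand-in for the real
static const NIR tables, and "GCC unroll N" needs GCC 8 or newer:

  #include <assert.h>

  /* Hedged sketch of the pragma pattern, not the in-tree code. */
  struct entry { unsigned num_srcs; };
  static const struct entry table[4] = { {1}, {2}, {3}, {4} };

  static void
  check_table(void)
  {
     _Pragma("GCC unroll 4")
     for (unsigned i = 0; i < 4; i++) {
        /* Once the loop is unrolled and 'i' is a constant, the compiler can
         * fold the table lookup and drop assertions that always pass. */
        assert(table[i].num_srcs <= 4);
     }
  }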
Reviewed-by: Eric Anholt <eric@anholt.net>
This loads in the <min_lod, max_lod, lod_bias> settings for a given
sampler, which is necessary for lowering clamps/biases on certain
Midgard chips.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
This adds the option to lower 64-bit ufind_msb opcodes.
v2: use split_x/y, which removes the component loops (Jason)
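A sketch of the lowering with the usual nir_builder helpers (the function
name is illustrative; the in-tree pass may differ in detail):

  #include "nir_builder.h"

  /* Hedged sketch: 64-bit ufind_msb in terms of the 32-bit opcode. */
  static nir_ssa_def *
  lower_ufind_msb64(nir_builder *b, nir_ssa_def *x)
  {
     /* v2: use the split_x/split_y opcodes instead of per-component loops. */
     nir_ssa_def *lo = nir_unpack_64_2x32_split_x(b, x);
     nir_ssa_def *hi = nir_unpack_64_2x32_split_y(b, x);

     nir_ssa_def *lo_msb = nir_ufind_msb(b, lo);
     nir_ssa_def *hi_msb = nir_iadd(b, nir_ufind_msb(b, hi), nir_imm_int(b, 32));

     /* If the high half is zero, take the low-half result; that is also -1
      * when the whole value is zero, matching ufind_msb's convention. */
     nir_ssa_def *hi_is_zero = nir_ieq(b, hi, nir_imm_int(b, 0));
     return nir_bcsel(b, hi_is_zero, lo_msb, hi_msb);
  }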
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Also make 8 and 16 components invalid. We will enable them again later
when we actually support them.
v2: fix validation of nir_intrinsic_instr::num_components
correct validation of instr->num_components
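The check boils down to something like this (the helper name is illustrative):

  /* Hedged sketch: only 1-4 components are accepted for now; 8 and 16
   * come back once they are actually supported. */
  static bool
  num_components_ok(unsigned n)
  {
     return n >= 1 && n <= 4;
  }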
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This will be useful as a deterministic identifier/index for the variable.
v2: fix comment style
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> (v1)
Doesn't shrink it (at least, on x86-64) and leaves space for more members.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Adds nir_type_bool8 as well as 8-bit versions of all the bool
opcodes.
Reviewed-by: Rob Clark <robdclark@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Adds nir_type_bool16 as well as 16-bit versions of all the bool
opcodes.
Reviewed-by: Rob Clark <robdclark@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Adds binop_reduce_all_sizes which generates both 1-bit and 32-bit
versions of the reduce operation. This reduces the code duplication a
bit and will make it easier to later add 16-bit versions as well.
Reviewed-by: Rob Clark <robdclark@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Adds binop_compare_all_sizes which generates both 1-bit and 32-bit
versions of the comparison operation. This reduces the code
duplication a bit and will make it easier to later add 16-bit versions
as well.
Reviewed-by: Rob Clark <robdclark@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Store a flag stating whether there was an implementation, and use
fxn->impl as a temporary flag between deserialization stages.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Suggested by Jason.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Using a hash-table walk means that variables will get inserted in
different orders on different runs. Just walk the list of globals
instead, even if some of them can't be turned into locals.
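Conceptually the walk becomes the following (the predicate is a placeholder;
nir_foreach_variable and shader->globals are the names NIR used at the time):

  /* Hedged sketch: iterate the globals list in order so variables are
   * inserted in the same order on every run. */
  static void
  move_globals_to_locals(nir_shader *shader)
  {
     nir_foreach_variable(var, &shader->globals) {
        if (!can_become_local(var))   /* placeholder for the pass's check */
           continue;
        /* ... re-register the variable as a function-local ... */
     }
  }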
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
If the data is uniform, then it's really a uniform copy. If the index is
uniform, then it's really a read_invocation.
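As a sketch of the rewrite (the uniformity flags and the read_invocation
builder helper are assumptions here, not necessarily the pass's exact API):

  /* Hedged sketch: rewrite shuffle(data, index) when one operand is uniform. */
  static nir_ssa_def *
  opt_shuffle(nir_builder *b, nir_ssa_def *data, nir_ssa_def *index,
              bool data_uniform, bool index_uniform)
  {
     if (data_uniform)
        return data;                                 /* just a uniform copy */
     if (index_uniform)
        return nir_read_invocation(b, data, index);  /* one lane, broadcast */
     return NULL;                                    /* nothing to do */
  }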
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Otherwise, we fail validation and potentially generate invalid code.
Let's fix up the mode of the accesses to the variable.
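Roughly, the fixup rewrites the mode on every deref that points at the
variable (a sketch only; deref->mode is the single-mode field NIR had at
the time):

  /* Hedged sketch: keep deref instructions in sync with a variable whose
   * mode was changed, so nir_validate doesn't complain. */
  static void
  fixup_deref_modes_for_var(nir_function_impl *impl, nir_variable *var)
  {
     nir_foreach_block(block, impl) {
        nir_foreach_instr(instr, block) {
           if (instr->type != nir_instr_type_deref)
              continue;

           nir_deref_instr *deref = nir_instr_as_deref(instr);
           if (nir_deref_instr_get_variable(deref) == var)
              deref->mode = var->data.mode;
        }
     }
  }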
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>