This really hacky commit adds a bit size to registers and SSA values. It
also adds rules in the validator to validate that they do the right things.
It's still an open question as to whether or not we want a bit_size in
nir_alu_instr or if we just want to let it inherit from the destination.
I'm inclined to just let it inherit from the destination. A similar
question needs to be asked about intrinsics.
v2 (Connor):
- Relax validation: comparisons have explicit destination sizes
and implicit source sizes.
v3 (Sam):
- Use helpers to get size and base types of nir_alu_type enum.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
v2: Fix size/type mask to properly handle 8-bit types.
v3: Add helpers to get the bitsize and base type of a
nir_alu_type enum.
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
This allows us to first generate atomic operations for shared
variables using these opcodes, and then later we can lower those to
the shared atomics intrinsics with nir_lower_io.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Previously we were receiving shared variable accesses via a lowered
intrinsic function from glsl. This change allows us to send in
variables instead. For example, when converting from SPIR-V.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Since we aren't going to put the function parameters or the return variable
in the list of locals, it won't get a proper declaration. This changes
nir_print to print the type along with each parameter or return variable.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Otherwise, we have a problem when we go to print functions with arguments
because their names get added to the hash table during declaration which
happens after we print the prototype.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
NIR has never been used on IR where we haven't already done function
inlining so this code has been dead from the beginning. Let's just get rid
of it for now. We can always put it back in if we decide to use NIR for
function inlining at some point in the future.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This new pass lowers load/store_var intrinsics that act on indirect derefs
to if-ladder of direct load/store_var intrinsics. The if-ladders perform a
simple binary search on the indirect.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
No shader-db changes, but does recognize some extract_u16 which enables
the next patch to optimize some code.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Two shaders that appear in Unigine benchmarks (Heaven and Valley) unpack
three bytes from an integer and convert each into a float:
float((val >> 16u) & 0xffu)
float((val >> 8u) & 0xffu)
float((val >> 0u) & 0xffu)
Instead of shifting, masking, and type converting like this:
shr(8) g15<1>UD g25<8,8,1>UD 0x00000010UD
and(8) g16<1>UD g15<8,8,1>UD 0x000000ffUD
mov(8) g17<1>F g16<8,8,1>UD
shr(8) g18<1>UD g25<8,8,1>UD 0x00000008UD
and(8) g19<1>UD g18<8,8,1>UD 0x000000ffUD
mov(8) g20<1>F g19<8,8,1>UD
and(8) g21<1>UD g25<8,8,1>UD 0x000000ffUD
mov(8) g22<1>F g21<8,8,1>UD
i965 can simply extract a byte and convert to float in a single
instruction:
mov(8) g17<1>F g25.2<32,8,4>UB
mov(8) g20<1>F g25.1<32,8,4>UB
mov(8) g22<1>F g25.0<32,8,4>UB
This patch implements the first step: recognizing byte extraction. A
later patch will optimize out the conversion to float.
instructions in affected programs: 28568 -> 27450 (-3.91%)
helped: 7
cycles in affected programs: 210076 -> 203144 (-3.30%)
helped: 7
This patch decreases the number of instructions in the two Unigine
programs by:
#1721: 4520 -> 4374 instructions (-3.23%)
#1706: 3752 -> 3582 instructions (-4.53%)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Unfortunately, glslang gives us cull/clip distance and GS streams even if
the shader doesn't use it whenever a shader is declared as version 450.
This is a glslang bug, but we can easily enough ignore it for now.
"Result is the same as computing the sum of the absolute values of
OpDPdx and OpDPdy on P."
We were doing sum of absolute values of OpDPdx of P and OpDPdx of NULL.
These were originally added to reduce compiler warnings but aren't really
needed. Getting rid of them reduces the diff between the Vulkan branch and
master, so we might as well.
When NIR was originally drafted, there was no easy way to determine if
something was constant or not. The result was that we had lots of
special-casing for constant values such as this. Now that load_const
instructions are SSA-only, it's really easy to find constants and this
isn't really needed anymore.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Rob Clark <robclark@gmail.com>
This fixes two issues. First, we had a use-after-free in the case where
the instruction got deleted and we tried to return mov->dest.write_mask.
Second, in the case where we are doing a self-mov of a register, we delete
those channels that are moved to themselves from the write-mask. This
means that those channels aren't reported as being handled even though they
are. We now stash off the write-mask before remove unneeded channels so
that they still get reported as handled.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94073
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
The sub-determinate implementation pattern fixed by
6a7e2904e0 has a second instance in the
same file.
With the previous algorithm, when row and j are both 3, the index
overruns the array. This only impacts the stack on 32 bit builds.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This commit adds the capability to NIR to support separate textures and
samplers. As it currently stands, glsl_to_nir only sets the texture deref
and leaves the sampler deref alone as it did before and nir_lower_samplers
assumes this. Backends can still assume that they are combined and only
look at only at the texture index. Or, if they wish, they can assume that
they are separate because nir_lower_samplers, tgsi_to_nir, and prog_to_nir
all set both texture and sampler index whenever a sampler is required (the
two indices are the same in this case).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We're about to separate the two concepts. When we do, the sampler will
become optional. Doing a rename first makes the separation a bit more
safe because drivers that depend on GLSL or TGSI behaviour will be fine to
just use the texture index all the time.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Direct access to intr->const_index[n], where different slots have
different meanings, is somewhat confusing.
Instead, let's put some extra info in nir_intrinsic_infos[] about which
slots map to what, and add some get/set helpers. The helpers validate
that the field being accessed (base/writemask/etc) is applicable for the
intrinsic opc, for some extra safety. And nir_print can use this to
dump out decoded const_index fields.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>