Phi handling is somewhat intrinsically tied to the CFG. Moving it here
makes it a bit easier to handle that. In particular, we can now do SSA
repair after we've done the phi node second-pass. This fixes 6 CTS tests.
This is a port of Matt's GLSL IR lowering pass to NIR. It's required
because we translate SPIR-V directly to NIR, bypassing GLSL IR.
I haven't introduced a lower_ldexp flag, as I believe all current NIR
consumers would set the flag. i965 wants this, vc4 doesn't implement
this feature, and st_glsl_to_tgsi currently lowers ldexp
unconditionally anyway.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
struct.pack('i', val) interprets `val` as a signed integer, and dies
if `val` > INT_MAX. For larger constants, we need to use 'I' which
interprets it as an unsigned value.
This patch makes us use 'I' for all values >= 0, and 'i' for negative
values. This should work in all cases.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
This is untested and probably broken.
We already passed the atan2 CTS tests before implementing this opcode.
Presumably, glslang or something was giving us a plain Atan opcode
instead of Atan2. I don't know why.
SPIR-V only has one ImageQuerySize opcode that has to work for both
textures and storage images. Therefore, we have to special-case that one a
bit and look at the type of the incoming image handle.
Image intrinsics always take a vec4 coordinate and always return a vec4.
This simplifies the intrinsics a bit but also means that they don't
actually match the incoming SPIR-V. In order to compensate for this, we
add swizzling movs for both source and destination to get the right number
of components.
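The compensation amounts to padding the source coordinate up to four components and trimming the result back down to what the SPIR-V type wants. A toy model of that shape-matching (the real code emits swizzling movs in NIR; these helper names are hypothetical):

```python
def expand_coord(coord, fill=0):
    # Pad a 1-3 component coordinate up to the vec4 the image
    # intrinsics expect; extra components are don't-cares here.
    return (list(coord) + [fill] * 4)[:4]

def trim_result(vec4, num_components):
    # Keep only the components the SPIR-V result type asked for.
    return vec4[:num_components]
```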
The Vulkan spec requires all allocations that happen for device creation to
happen with SCOPE_DEVICE. Since meta calls into other things that allocate
memory, the easiest way to do this is with an allocator.
The border color packet is specified as a 64-byte aligned address relative
to dynamic state base address. The way the packing functions are currently
set up, we need to provide it with (offset >> 6) because it just shoves the
bits in where the PRM says they go and isn't really aware that it's an
address.
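In other words, the driver has to shift before packing; a quick sketch of that arithmetic (hypothetical helper name, illustrating only the alignment and shift):

```python
def border_color_field(offset):
    # The border color pointer is 64-byte aligned relative to dynamic
    # state base address, so the low 6 bits are implied zero and the
    # packing functions expect the address pre-shifted by 6.
    assert offset % 64 == 0, "border color state must be 64-byte aligned"
    return offset >> 6
```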
The __gen_fixed helper properly clamps the value and also handles negative
values correctly. Eventually, we need to make the scripts generate this
and use it for more things.
Instead of emitting the continue before the loop body we emit it
afterwards. Then, once we've finished with the entire function, we run
nir_repair_ssa to add whatever phi nodes are needed.
The efficiency should be approximately the same. We do a little more work
per phi node because we have to sort the predecessors. However, we no
longer have to walk the blocks a second time to pop things off the stack.
The bigger advantage, however, is that we can now re-use the phi placement
and per-block SSA value tracking in other passes.
Right now, we have phi placement code in two places and there are other
places where it would be nice to be able to do this analysis. Instead of
repeating it all over the place, this commit adds a helper for placing all
of the needed phi nodes for a value.
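The textbook way to compute where those phi nodes go is the iterated dominance frontier of the value's definition blocks; a sketch of that worklist loop (generic SSA-construction pseudocode in Python, not the NIR helper itself):

```python
def place_phis(def_blocks, dom_frontier):
    # Iterate the dominance frontier: a phi inserted at a frontier
    # block is itself a new definition, so its frontier must be
    # processed too. dom_frontier maps block -> set of frontier blocks.
    phi_blocks = set()
    worklist = list(def_blocks)
    while worklist:
        block = worklist.pop()
        for df in dom_frontier.get(block, ()):
            if df not in phi_blocks:
                phi_blocks.add(df)
                worklist.append(df)
    return phi_blocks
```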