IR_LOOP now has two children: the body code, and the tail code.
Tail code is the "i++" part of a for-loop, or the expression at the end
of a "do {} while(expr);" loop.
"continue" translates into: "execute tail code; CONT;"
Also, the test for infinite do/while loops was incorrect.
ctx->Shader.EmitCondCodes determines if we use condition codes.
If not, IF statement uses first operand's X component as the condition.
Added OPCODE_BRK0, OPCODE_BRK1, OPCODE_CONT0, OPCODE_CONT1 to handle
the common cases of conditional break/continue.
Previously, comparing vec2, vec3, vec4 was broken.
Added IR_EQUAL, IR_NOTEQUAL nodes/operators to compute boolean
equality/inequality vs. IR_SEQUAL/IR_SNEQUAL which work component-wise.
Use IR_EQUAL/IR_NOTEQUAL for the == and != operators.
To compute vec4 equality, use SNE, DP4, SEQ instruction sequence.
There are times when we don't want to allow swizzling when searching for or
adding vector constants. Passing NULL for swizzleOut disables swizzling.
This fixes a constant/swizzle bug in link_uniform_vars().
The index is no longer necessary to share constants between multiple
SIN/COS/SCS instructions inside a single fragment program, and storing
a tiny implementation detail like this in the fragment_program structure
itself was just nasty.
The constant/parameter allocation was significantly simplified, removing
one unnecessary copy operation of parameters. The dirty state tracking is
unchanged and far from optimal, since all state is always re-fetched.
Constants and parameters are now emitted only once, which significantly
reduces the resource pressure on larger programs.
Make sure that instruction slots are fully initialized with NOPs during
find_and_prepare_slot(). This fixes a bug when a fragment program was
translated more than once (e.g. due to a second call to glProgramStringARB).
This partially fixes glean/fragProg1.
This is a necessary change to emit the right instructions when writing
to result.depth.
However, even with this test, Z-write doesn't work properly, and I don't
fully understand why. In addition to this, we'll at least have to disable
early-Z, but even that doesn't seem to be enough.
Fix a bug in the LIT implementation (clamp exponent to 128, not 0.5)
and change the implementation around. In theory, the new implementation
needs as little as 5 instruction slots. Unfortunately, the dependency
analysis in find_and_replace_slot is not strong enough to look at
individual components of a register yet.