Replace swizzles on the LHS with additional swizzles on the RHS and a
write mask in the assignment instruction. As part of this add
ir_assignment::set_lhs. Ideally we'd make ir_assignment::lhs private
to prevent erroneous writes, but that would require a lot of code
butchery at this point.
Add ir_assignment constructor that takes an explicit write mask. This
is required for ir_assignment::clone, but it can also be used in other
places. Without this, ir_assignment clones lose their write masks,
and incorrect IR is generated in optimization passes.
Add ir_assignment::whole_variable_written method. This method gets
the variable on the LHS if the whole variable is written or NULL
otherwise. This is different from
ir->lhs->whole_variable_referenced() because the latter has no
knowledge of the write mask stored in the ir_assignment.
Gut all code from ir_to_mesa that handled swizzles on the LHS of
assignments. There is probably some other refactoring that could be
done here, but that can be left for another day.
Calling exit() on a memory failure probably made sense for the
standalone preprocessor, but doesn't seem too appealing as part of
the GL library. Also, we don't use it in the main compiler.
We only uploaded up to the highest offset a program would use,
and if the constant buffer isn't changed when a new program is
used, the new program is missing the rest of them.
Might want to introduce a "fill state" for user mem constbufs.
Mesa will do the mapping at _mesa_add_sampler() time. Fixes assertion
failures in debug builds, which might have caught real problems with
multiple samplers linked in a row.
Instead of using a linker-assigned location (since samplers don't
actually take up uniform space, being a link-time choice), use the
sampler's varaible pointer as a hash key.
The new compiler doesn't generate Mesa IR at compile time, and that
compile time code previously wouldn't have reflected the link time
code that actually got used. But do dump the info log of the compile
regardless.
In most cases, we needed to be reparenting the cloned IR to a
different context (for example, to the linked shader instead of the
unlinked shader), or optimization before the reparent would cause
memory usage of the original object to grow and grow.
seems hw can do unaligned accesses and unaligned strides
removes extra conversion when using vbo's
however I needed to switch 3 component byte format to 4 component formats
for tests to pass. Somewhat sililar to GL_SHORT fix done earlier
removes assert and gains +2 piglit especially draw-vertices
The BGNLOOP and ENDLOOP instructions are now being used correctly, which
makes break and continue possible. The deadcode pass has been modified to
handle breaks, and the compiler is more careful about which loops are
unrolled.