This is necessary for GLSL 1.30+ shadow sampling functions, which return
a single float rather than splatting the value to a vec4 based on
GL_DEPTH_TEXTURE_MODE.
Rather than passing "True", pass a bitfield describing the particular
variant's features - either projection or offset.
This should make the code a bit more readable ("Proj" instead of "True")
and make it easier to support offsets in the future.
Having these as actual integer values makes it difficult to implement
the texture*Offset built-in functions, since the offset is actually a
function parameter (which doesn't have a constant value).
The original rationale was that some hardware needs these offset baked
into the instruction opcode. However, at least i965 should be able to
support non-constant offsets. Others should be able to rely on inlining
and constant propagation.
I think was used long ago, when we actually read the builtins into the
shader's instruction stream directly, rather than creating a separate
shader and linking the two. It doesn't seem to serve any purpose now.
These mistakenly computed 't' instead of t * t * (3.0 - 2.0 * t).
Also, properly vectorize the smoothstep(float, float, vec) variants.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
This makes a very simple 1.30 shader go from 196k of memory to 9k.
NOTE: This -may- be a candidate for the 7.9 branch, as the benefit is
substantial. However, it's not a simple change, so it may be wiser to
wait for 7.10.
We are not aware of any GPU that actually implements the cross product
as a single instruction. Hence, there's no need for it to be an opcode.
Future commits will remove it entirely.
In particular, calling the abs function is silly, since there's already
an expression opcode for that. Also, assigning to temporaries then
assigning those to the final location is rather redundant.
This works around MSVC's 65535 byte limit, unfortunately at the expense
of any semblance of readability and much larger file size. Hopefully I
can implement a better solution later, but for now this fixes the build.
This implements round() via the ir_unop_round_even opcode, rather than
adding a new opcode. We may wish to add one in the future, since it
might enable a small performance increase on some hardware, but for now,
this should suffice.
It turns out that most people new to this IR are surprised when an
assignment to (say) 3 components on the LHS takes 4 components on the
RHS. It also makes for quite strange IR output:
(assign (constant bool (1)) (x) (var_ref color) (swiz x (var_ref v) ))
(assign (constant bool (1)) (y) (var_ref color) (swiz yy (var_ref v) ))
(assign (constant bool (1)) (z) (var_ref color) (swiz zzz (var_ref v) ))
But even worse, even we get it wrong, as shown by this line of our
current step(float, vec4):
(assign (constant bool (1)) (w)
(var_ref t)
(expression float b2f (expression bool >=
(swiz w (var_ref x))(var_ref edge))))
where we try to assign a float to the writemasked-out x channel and
don't supply anything for the actual w channel we're writing. Drivers
right now just get lucky since ir_to_mesa spams the float value across
all the source channels of a vec4.
Instead, the RHS will now have a number of components equal to the
number of components actually being written. Hopefully this confuses
everyone less, and it also makes codegen for a scalar target simpler.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Otherwise builtin_profiles contains dangling pointers the next time
_mesa_read_profile is called. I suspect this may fix bugzilla #29847,
but I was never able to reproduce it.