In a fragment shader or inside of control flow, invocation 0 might be
inactive, and so our use-first-invocation-and-broadcast optimizations
would be invalid, and the loop logic of an emit_read_invocation would
defeat the point of these optimized paths.
For load_kernel_input, I didn't guard the uniform path with the check
because some CL tests that are currently passing are doing the
load_kernel_input under (presumably) uniform control flow. Instead, I
dropped in a once warning for the next person to be debugging CL.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14999>