llvmpipe: add bitcasts around fptrunc/fpext operations
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run

We are using different llvm type depending on the cpu supporting f16c
instructions or not.
The reasoning behind this was that we really couldn't do anything with f16
values and had to cast them to some int type anyway, plus IIRC originally
this actually predates llvm even supporting a half type in the first place
(or if it did, at the very least it was not able to do anything useful with
it).
There are now bugs with lavapipe when the cpu doesn't support f16c, since while
we don't expose f16 capabilities in this case, we can still hit f16 conversion
functions for the likes of unpack2x16float and quantizeToF16, and we're just
straight calling fpext/fptrunc functions, not touching our own code for half
conversion (I believe our own code might still be faster as llvm de-vectorizes
it if it's not supported by the cpu, but don't quote me on that - could depend
on llvm version, and also for trunc the rounding is actually different since
our own functions implement rounding according to d3d10 requirements (mostly
used for f16 render targets)).
This only seems to be a problem for vulkan, not GL, since glsl has its own
lowering pass if the half float packing instructions aren't supported by the
driver.
Ideally we'd fix this by just always using llvm half type for f16, however
still not all llvm backends can handle it.
So instead do some hacky bitcasts around the fpext/fptrunc calls with f16,
which works on x86 even when not supporting f16c. Other llvm backends not
really supporting halfs will still crash there as before (albeit it should
be a "cleaner" crash as the IR is now correct...), but at least keeps them
running for more ordinary things such as f16 texture sampling / render
targets (which they wouldn't if we'd use llvm half type everywhere).

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13807
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13865
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37344>
This commit is contained in:
Roland Scheidegger 2025-09-17 22:35:06 +02:00 committed by Marge Bot
parent bbab69d343
commit 0064baeaa5
3 changed files with 41 additions and 7 deletions

View file

@ -230,8 +230,16 @@ LLVMValueRef
lp_build_zero(struct gallivm_state *gallivm, struct lp_type type)
{
if (type.length == 1) {
if (type.floating)
return lp_build_const_float(gallivm, 0.0);
if (type.floating) {
if (type.width == 16)
return lp_build_const_elem(gallivm, type, 0.0);
else if (type.width == 32)
return lp_build_const_float(gallivm, 0.0);
else {
assert(type.width == 64);
return lp_build_const_double(gallivm, 0.0);
}
}
else
return LLVMConstInt(LLVMIntTypeInContext(gallivm->context, type.width), 0, 0);
} else {

View file

@ -198,7 +198,8 @@ lp_build_float_to_half(struct gallivm_state *gallivm,
if (length == 4) {
result = lp_build_extract_range(gallivm, result, 0, 4);
}
result = LLVMBuildBitCast(builder, result, lp_build_vec_type(gallivm, lp_type_float_vec(16, 16 * length)), "");
result = LLVMBuildBitCast(builder, result,
lp_build_vec_type(gallivm, lp_type_float_vec(16, 16 * length)), "");
}
else {

View file

@ -3241,12 +3241,37 @@ do_alu_action(struct lp_build_nir_soa_context *bld,
case nir_op_f2f16:
case nir_op_f2f32:
case nir_op_f2f64:
/*
* FIXME: for f2f16 we can get here even without lp_has_fp16(), despite
* not announcing f16 capability. Should not happen from GL side (glsl has
* lowering pass for f16 pack/unpack functions if not supported) however
* from Vulkan side things like quantizeToF16, unpack2x16float use this.
* Need to fix up the llvm type in this case at least, although on some
* platforms this will still crash if the llvm backend doesn't handle half
* type on its own (but works on x86 even without f16c).
* Alternatively could use lp_build_half_to_float/float_to_half() but need
* to modify to handle rounding differently (need RNE instead of RTZ here).
* Ideally would be able to just use llvm half type consistently for f16,
* but not all backends are ready for that.
*/
if (src_bit_size[0] > instr->def.bit_size) {
result = LLVMBuildFPTrunc(builder, src[0],
dst_float_bld->vec_type, "");
LLVMTypeRef llvm_dst_type = dst_float_bld->vec_type;
if (instr->def.bit_size == 16 && !lp_has_fp16()) {
llvm_dst_type = LLVMHalfTypeInContext(gallivm->context);
if (dst_float_bld->type.length != 1)
llvm_dst_type = LLVMVectorType(llvm_dst_type, dst_float_bld->type.length);
}
result = LLVMBuildFPTrunc(builder, src[0], llvm_dst_type, "");
/* don't need a bit cast for f16, caller already has one */
} else {
result = LLVMBuildFPExt(builder, src[0],
dst_float_bld->vec_type, "");
LLVMValueRef src0 = src[0];
if (src_bit_size[0] == 16 && !lp_has_fp16()) {
LLVMTypeRef llvm_src_type = LLVMHalfTypeInContext(gallivm->context);
if (dst_float_bld->type.length != 1)
llvm_src_type = LLVMVectorType(llvm_src_type, dst_float_bld->type.length);
src0 = LLVMBuildBitCast(builder, src0, llvm_src_type, "");
}
result = LLVMBuildFPExt(builder, src0, dst_float_bld->vec_type, "");
}
break;
case nir_op_f2i8: