With cl_khr_fp16 we can get texture instructions w/ f16 dest. Not all
drivers handle this, so convert to 32b dest and insert alu conversion to
the requested type. Drivers that can handle f16 texture loads would
fold away the extra conversion with nir_opt_16bit_tex_image.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35470>
This was an oversight when BitSet was parameterized on a key type.
BitSetStream needs to also take a key type to prevent users from mixing
different key types in binary operators. Constraining this makes BitSet
usage more type safe.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35328>
It's a stride of 1 output, which isn't 16. It's 16 * num_threads,
aligned to 256.
tcs_offchip_layout has 5 unused bits, so let's use them.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34780>
Not strictly scaling, but we upcast fo fp32, do the fdiv there and cast
back again.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34053>
If we can't find an appropiate builtin in the libclc library, we add our
own wrapper at runtime executing the op in fp32 space.
Libclc has variying support for fp16 opcodes and with a libclc prior
llvm-19 it does not work as good as with the newer one.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34053>
no_varying cannot be used to eliminate stores on locations which may
be subsequently read
Fixes: 0058989357 ("nir/lower_io_to_scalar: don't create output stores that have no effect")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35325>
I think this was unlikely to cause issues, even if the qsort()
implementation is unstable.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35255>
Fixes "left shift of negative value -128" with parallel_rdp/00f93a9497dfbb3b
and UBSan.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35255>
Without this cast, the left shift is promoted to 'int'.
Fixes "left shift of 50432 by 16 places cannot be represented in type 'int'"
with horizon_zero_dawn/001064f580f8e3be and UBSan.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35255>
Use the param type, not the referenced variable. The referenced variable
can be a structure, which wouldn't be recognized as a sampler or image.
Fixes: 733bee57eb - glsl: lower samplers with highp coordinates correctly
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Tested-by: Dieter Nützel Dieter@nuetzel-hh.de on gfx8 (Polaris 20)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34959>
Normal fmin doesn't make any promises about NaN, common additionally
doesn't make any promises about infinities. Would be nice to hook that
up to codegen but lowering them to normal works for now.
Cc: mesa-stable
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34941>
Kepler cards do not support shared atomic operations directly, but they
have special ldslk and stsul that can implement mutex locks on
addresses. Shared atomics can be lowered into operations in mutexes.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35028>
We still allow mesh shader promote constant output to flat, but
mesh shader like geometry shader may store multi vertices'
varying in a single thread. So mesh shader may store different
constant values to different vertices in a single thread, we
should not promote this case to flat.
I'm not using shader_info.mesh.ms_cross_invocation_output_access
because OpenGL does not require IO to have explicit location, so
when nir_shader_gather_info is called in OpenGL GLSL compiler to
compute ms_cross_invocation_output_access, some implicit output
has -1 location which causes ms_cross_invocation_output_access
unset for it.
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13134
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35081>