ctz is a CL2.0 opcode but 3.0 requires it as well so just add support
for it.
Tested against CTS integer_ops integer_ctz test.
(long line broken up)
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7468>
The SPV_KHR_ray_tracing extension adds 6 new storage classes which is a
bit on the ridiculous side. In order to avoid adding that many variable
modes to NIR, we make a few simplifying assumptions:
1. CallableData and RayPayload data actually lives on the stack
somewhere, presumably in the caller's stack. We assume that these
are no different from global variables and use nir_var_shader_temp
for them. We still need a separate storage class for the incoming
variants but only so we can figure out which one the incoming one
is and lower it to something useful.
2. There's no difference between incoming CallableData and RayPaolad
data. We can use a single storage class for both.
3. ShaderRecordBuffer data is just a global memory access. This lets
us avoid NIR variables entirely and just fetch the pointer via the
shader_record_ptr system value and it's accessed using a 64-bit
global memory pointer.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6479>
Missing in this commit are NIR intrinsics for the ObjectToWorld and
WorldToObject built-ins. Those are matrices and so they take a bit more
work and justify a separate commit. For now, we add the enums and leave
the SYSTEM_VALUE <-> nir_intrinsic conversion commented out.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6479>
For now, we assume its a 64-bit global pointer.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6479>
We already fail in these same cases in vk_desc_type_for_mode. These
additional assertions are just extra code to update.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6479>
Most of this is fairly straightforward; we just set all the modes on any
derefs which are generic. The one tricky bit is OpGenericCastToPtrExplicit.
Instead of adding NIR intrinsics to do the cast, we add NIR intrinsics
to do a storage class check and then bcsel based on that.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6332>
We rename it to "modes" to make it clear that it may contain more than
one mode and adjust all the uses of nir_deref_instr::modes to attempt to
handle multiple modes.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6332>
This will create code that is easier to combine into MADs/FMA when the
last component of the vector is 1.0.
nir_opt_algebraic_late has an optimization to do something similar but it
only works for inexact code, if the multiplication-by-1 optimization is
done before it and if the backend enables fuse_ffma.
fossil-db (Navi):
Totals from 4296 (3.75% of 114665) affected shaders:
SGPRs: 283468 -> 283764 (+0.10%); split: -0.02%, +0.12%
VGPRs: 172868 -> 172904 (+0.02%); split: -0.09%, +0.11%
CodeSize: 14045312 -> 14027128 (-0.13%); split: -0.15%, +0.02%
MaxWaves: 59285 -> 59282 (-0.01%); split: +0.04%, -0.05%
Instrs: 2703507 -> 2683187 (-0.75%); split: -0.76%, +0.00%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5631>
`bswap_32` and `<byteswap.h>` aren't available on BSDs. Instead the
same function is spelled slightly different and is provided by
different header file. However, Mesa provides `util_bswap32` to avoid
complicated conditionals.
Fixes: fb6b243c11 ("spirv: Support big-endian strings")
Tested-by: Piotr Kubaj <pkubaj@FreeBSD.org>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7257>
Just casting to a float is insufficient because that gives us the upper-left corner
of the texel rather than the center.
Fixes: 701cb9d60c "nir/vtn: Handle integer sampling coordinates"
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7069>
Note, the aligned versions aren't handled specially yet.
The float16buffer capability is now at least partially supported after
this patch, so move it to be supported when kernels are supported.
v2 (Jason Ekstrand):
- A few cosmetic cleanups around type/base_type
- Rebased on top of the big SPIR-V SSA value rework
- Use the new version of the conversion helpers
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6945>
At that point in the function, we don't know if it's a load or a store
so calling it dest_type isn't really helpful. Also, we don't really
want the glsl_type; we want the base_type.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6945>
This is done for kernels via the new convert_alu_types intrinsic. For
Vulkan and OpenGL, we maintain the old path so that drivers don't have
to add that lowering pass.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6945>
We're about to introduce conversion ops which are going to want two
different types. We may as well just split the one we have rather than
end up with three. There are a couple places where this is mildly
inconvenient but most of the time I find it to actually be nicer.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6945>
All of these are pretty well-defined. Rather than implementing them
as a sequence of nir ops, we can just use the libclc implementation.
v2 (idr): Delete functions that are now unused.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6035>
Specifically, fmod only uses libclc if it was going to be lowered.
Also, add missing half_divide and half_recip handling.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6035>
Adds an additional hook for spirv_to_nir to handle a core opcode via
the OpenCL libclc infrastructure, and adds handling for SpvOpGroupAsyncCopy and
SpvOpGroupWaitEvents.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6035>