This hoists all the annoyance of figuring out the current pixel's input
attachment coordinates to the driver. The pass still deals with all the
annoyance of turning an image instruciton into a texture instruction but
it gives the driver more control over the position. For most drivers,
this will be something like ivec3(int(gl_FragCoord.xy), gl_Layer) or
similar, some drivers need something more nuanced. Turnip, for
instance, needs unscaled coordinates for some attachments and NVK
doesn't really want gl_Layer or gl_ViewIndex for the layer. It's better
to just have a new system value that drivers can make what they want.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35551>
The SPIR-V spec is pretty clear that coordinates on subpass attachments
are relative to the current pixel. They're required to be zero but we
should stay consistent with ourselves (we already do this for image
intrinsics) and with the spec.
Fixes: 84b08971fb ("nir/lower_input_attachments: lower nir_texop_fragment_{mask}_fetch")
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35551>
There's nothing in NIR which guarantees that the deref is the first
source or that the coordinate is the second. Use
nir_tex_instr_src_index() to get the actual indices.
Fixes: 84b08971fb ("nir/lower_input_attachments: lower nir_texop_fragment_{mask}_fetch")
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35551>
There are currently two places where we have to handle values that are
logically 64b: 64b atomics and 64b global addresses. For the former, we
ingest the values as 64b from NIR, while the latter uses 2x32b values.
This commit makes things more consistent by using 64b NIR values for
global addresses as well.
Of course, we could also go the other way around and use 2x32b values
everywhere, which would make things consistent as well. Given that ir3
doesn't actually have 64b registers, and 64b values are represented by
collected 2x32b registers, this could actually make more sense.
In the end, both methods are mostly equivalent and it probably doesn't
matter too much one way or the other. However, the reason I have a
slight preference for ingesting things as 64b is that it allows us to
use more of the generic NIR intrinsics, which use 1-component values for
64b addresses or atomic values. This commit already makes
global_atomic(_swap)_ir3 obsolete and I'm planning to create generic
intrinsics to support ldg.a/stg.a as well.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33797>
nir_unsigned_upper_bound is good enough that this isn't needed anymore.
No fossil-db changes.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35514>
If nir_opt_vectorize_io isn't called, 16-bit IO is broken.
This is a workaround to keep RADV working and consume incorrect NIR
while other drivers consume correct NIR.
Hopefully this will be removed ASAP.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35315>
We will use it to skip lowering mediump IO to 16 bits if the previous
shader has XFB and mediump XFB doesn't work.
Drivers can decide not to lower mediump IO to 16 bits if nir->xfb_info
!= NULL for the pre-rast stage and if nir->info.prev_stage_has_xfb is
true for the FS stage. That way, drivers can lower mediump to 16 bits
for all cases except XFB.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35315>
Hopefully this doesn't break it (we may even lack tests), but we need to
know in prelink_lowering whether XFB is enabled or not.
The next commit that adds shader_info::prev_stage_uses_xfb depends on this.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35315>
NIR represents low bits of 16-bit IO as a separate vec4, and high bits as
another separate vec4. There is no representation that allows vec8.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35315>
When lowering mediump to 16 bits, this will allow drivers to enable
the lowering only for certain pairs of stages, e.g. a driver can lower
mediump for VS->FS, but not GS->FS.
This could also be useful for other things.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35315>
With cl_khr_fp16 we can get texture instructions w/ f16 dest. Not all
drivers handle this, so convert to 32b dest and insert alu conversion to
the requested type. Drivers that can handle f16 texture loads would
fold away the extra conversion with nir_opt_16bit_tex_image.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35470>
This was an oversight when BitSet was parameterized on a key type.
BitSetStream needs to also take a key type to prevent users from mixing
different key types in binary operators. Constraining this makes BitSet
usage more type safe.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35328>
It's a stride of 1 output, which isn't 16. It's 16 * num_threads,
aligned to 256.
tcs_offchip_layout has 5 unused bits, so let's use them.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34780>
Not strictly scaling, but we upcast fo fp32, do the fdiv there and cast
back again.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34053>
If we can't find an appropiate builtin in the libclc library, we add our
own wrapper at runtime executing the op in fp32 space.
Libclc has variying support for fp16 opcodes and with a libclc prior
llvm-19 it does not work as good as with the newer one.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34053>
no_varying cannot be used to eliminate stores on locations which may
be subsequently read
Fixes: 0058989357 ("nir/lower_io_to_scalar: don't create output stores that have no effect")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35325>