Only the GL driver actually wants this; neither panvk nor internal shaders do.
Cc'd as a prereq to the next patch
Cc: mesa-stable
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34193>
This is complicated by two things: mediump varyings, and the lack of u16
regfmt support in LD_VAR.
With mediump, a load(_interpolated)_input with a 16-bit dest size may
either be an explicit 16-bit type or a mediump type lowered by
nir_lower_mediump_io. With explicit 16-bit types, we write 16-bit values
in the VS, but with mediump we write 32-bit in the VS (for messy
reasons). bi_emit_load_vary needs to distinguish these cases by checking
for a mediump type, and set the appropriate source_format to convert the
type on the LD_VAR_BUF path. Types like 'mediump uint16' are luckily not
allowed.
The missing u16 regfmt for LD_VAR means that we can't take the obvious
approach for 16-bit int varyings of emitting 16-bit int formats in the
attribute descriptor and loading them to u16. Instead, we just
write/read all 16-bit varyings as f16 regardless of type. Unlike with
mediump, we don't need to do any 32bit->16bit conversion when loading in
the FS, so as long as we use the same type between the attribute
descriptor and LD_VAR, the conversion is a no-op and the mismatch
doesn't matter.
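The distinction between the two 16-bit cases can be sketched as below. This is only an illustration of the reasoning above, not the code in bi_emit_load_vary; the struct and the sketch_* helpers are hypothetical names invented for this example.

```c
#include <stdbool.h>

/* Hypothetical description of a varying load in the FS.
 * This is not a real Mesa structure. */
struct sketch_varying_load {
   unsigned dest_bits; /* destination size of the load, 16 or 32 */
   bool is_mediump;    /* true if the 16-bit size came from mediump lowering */
};

/* Width of the value the VS wrote for this varying. */
static unsigned
sketch_vs_written_bits(const struct sketch_varying_load *ld)
{
   /* mediump varyings are written as 32-bit in the VS even though the
    * FS load has a 16-bit destination. */
   if (ld->is_mediump)
      return 32;

   /* Explicit 16-bit (and plain 32-bit) varyings keep their own width. */
   return ld->dest_bits;
}

/* True when the load has to convert, i.e. the mediump 32-bit -> 16-bit case. */
static bool
sketch_load_needs_conversion(const struct sketch_varying_load *ld)
{
   return sketch_vs_written_bits(ld) != ld->dest_bits;
}
```

Under these assumptions, only the mediump case needs a conversion; an explicit 16-bit varying is written and read at the same width, which is why reading it as f16 regardless of its base type is a no-op.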
Signed-off-by: Benjamin Lee <benjamin.lee@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33078>
For Bifrost and newer, we always write mediump varyings from a 32-bit
source in the VS. This is needed because the FS does not unconditionally
lower mediump to 16-bit.
Previously we worked around this in panvk by replacing 16-bit formats
with 32-bit in emit_varying_descs, but once we support
storageInputOutput16, we will need to preserve 16-bit formats for
explicit 16-bit varyings.
Signed-off-by: Benjamin Lee <benjamin.lee@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33078>
Neither of these changes is a behavior difference. The change to
emitting uint16 formats from pan_collect_varyings for PSIZ is
inconsequential because neither panvk nor the gallium driver emits
attribute descriptors for special varyings.
Signed-off-by: Benjamin Lee <benjamin.lee@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33078>
The current implementation uses LD_VAR_BUF[_IMM] to look up varyings,
which limits the number of varying components to 64 due to an 8-bit
offset value.
As this does not align to maxVertexOutputComponents (128), this change
replaces the use of LD_VAR_BUF[_IMM] with LD_VAR[_IMM] + Attribute
Descriptors, which do not have this limitation.
As allocating Attribute Descriptors is potentially expensive, this can
be further optimized by falling back to LD_VAR_BUF[_IMM] in cases where
we can ensure we do not use more than 64 varying components.
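The possible fallback mentioned above could look roughly like this. It is a sketch of the decision only; the constant and function names are hypothetical, and the 64-component limit is taken directly from the description above.

```c
#include <stdbool.h>

/* Limit imposed by the 8-bit offset of LD_VAR_BUF[_IMM], as stated above. */
#define SKETCH_LD_VAR_BUF_MAX_COMPONENTS 64u

/* Value of maxVertexOutputComponents quoted above. */
#define SKETCH_MAX_VERTEX_OUTPUT_COMPONENTS 128u

/* Hypothetical helper: true if the cheaper LD_VAR_BUF[_IMM] path can
 * address every varying component, false if LD_VAR[_IMM] plus
 * attribute descriptors is required. */
static bool
sketch_can_use_ld_var_buf(unsigned num_varying_components)
{
   return num_varying_components <= SKETCH_LD_VAR_BUF_MAX_COMPONENTS;
}
```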
This change currently does not change behavior for gallium/panfrost,
though that should be done as well.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32969>
The ATEST instruction needs sample_mask as an input, but if the
shader writes to color before sample_mask we could emit them
in the wrong order. Fix this in pan_nir_lower_zs_store by
deferring the color write until after the sample_mask write.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Benjamin Lee <benjamin.lee@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32879>
It's empty now, so we don't need to include it from the packer headers.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32899>
load_vertex_id_zero_base() is supposed to return the zero-based
vertex ID, which is then offset by load_first_vertex() to get
an absolute vertex ID. At the same time, when we're in a Vulkan
environment, load_first_vertex() also encodes the vertexOffset
passed to the indexed draw.
Midgard/Bifrost have slightly different semantics, where
load_first_vertex() returns vertexOffset + minVertexIdInIndexRange,
and load_vertex_id_zero_base() returns an ID that needs to be offset
by this vertexOffset + minVertexIdInIndexRange to get the absolute
vertex ID. Everything works fine as long as all the load_first_vertex()
and load_vertex_id_zero_base() calls are coming from the
load_vertex_id() lowering. But as mentioned above, that's no longer
the case in Vulkan, where gl_BaseVertexARB will be turned into
load_first_vertex() and expect a value of vertexOffset in an
indexed draw context.
We thus need to fix the mismatch by introducing two new
panfrost-specific intrinsics so we can stop abusing load_first_vertex()
and load_vertex_id_zero_base().
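The mismatch is easiest to see with numbers. The sketch below models the two conventions described above with plain integers; the struct and the sketch_* functions are hypothetical and only mirror the arithmetic in this message.

```c
/* Hypothetical model of an indexed draw. */
struct sketch_draw {
   int vertex_offset;  /* vertexOffset passed to the indexed draw */
   unsigned min_index; /* minVertexIdInIndexRange */
};

/* Midgard/Bifrost-style offset: vertexOffset + minVertexIdInIndexRange. */
static int
sketch_hw_vertex_offset(const struct sketch_draw *d)
{
   return d->vertex_offset + (int)d->min_index;
}

/* Midgard/Bifrost-style raw ID, relative to the smallest index in range. */
static int
sketch_hw_raw_vertex_id(const struct sketch_draw *d, unsigned index)
{
   return (int)(index - d->min_index);
}

/* Absolute vertex ID: the two hardware-style values added together. */
static int
sketch_absolute_vertex_id(const struct sketch_draw *d, unsigned index)
{
   return sketch_hw_raw_vertex_id(d, index) + sketch_hw_vertex_offset(d);
}

/* What gl_BaseVertexARB expects in Vulkan: just vertexOffset. */
static int
sketch_expected_base_vertex(const struct sketch_draw *d)
{
   return d->vertex_offset;
}
```

With vertexOffset = 100 and a minimum index of 5, index 7 yields the absolute ID 107 either way, but the hardware-style offset is 105 while gl_BaseVertexARB expects 100. The sum is correct, yet neither hardware-style value can stand in for the standard intrinsics on its own.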
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32415>
This isn't hooked up yet, but should be a significant performance
improvement when available.
Signed-off-by: Benjamin Lee <benjamin.lee@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32127>
When linking FS and VS with mismatched interpolation qualifiers, we need
to read the FS qualifiers and pass them to the VS. I put
nir_collect_noperspective_varyings in a separate function instead of
merging it into the existing walk_varyings loop because it will later be
used on uncompiled shaders that don't have a pan_shader_info yet.
Signed-off-by: Benjamin Lee <benjamin.lee@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32127>
Mali only supports perspective-correct varying interpolation in
hardware, so we have to emulate noperspective with lowering in both the
VS and FS.
Both Vulkan and OpenGL allow mismatched interpolation qualifiers between
stages. Because we need all varyings that are noperspective in the FS to
be lowered in the VS, we cannot rely on the interpolation qualifiers in
the VS. Loading the set of noperspective varyings as a sysval allows the
implementation to pass them as a compile-time constant when known
statically, or a runtime push constant when not. Passing noperspective
varyings dynamically has a performance cost with unnecessary branches
and fmuls.
This sysval is not hooked up yet in either panfrost or panvk, so shader
compilation will fail.
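As a rough illustration of why lowering is needed in both stages, the sketch below emulates noperspective interpolation on top of a perspective-correct interpolator. It assumes the VS scales the varying by the clip-space W and the FS multiplies the interpolated result by gl_FragCoord.w (1/W interpolated linearly in screen space). All sketch_* names, and the bitmask layout of the noperspective set, are assumptions made for this example rather than the actual pass.

```c
#include <stdbool.h>
#include <stdint.h>

/* Perspective-correct interpolation over a triangle, as the hardware
 * does it: sum(b_i * a_i / w_i) / sum(b_i / w_i), where b_i are the
 * screen-space barycentrics and w_i the per-vertex clip-space W. */
static float
sketch_interp_perspective(const float a[3], const float w[3], const float b[3])
{
   float num = 0.0f, den = 0.0f;
   for (int i = 0; i < 3; i++) {
      num += b[i] * a[i] / w[i];
      den += b[i] / w[i];
   }
   return num / den;
}

/* gl_FragCoord.w at the same barycentric: 1/W interpolated linearly. */
static float
sketch_frag_coord_w(const float w[3], const float b[3])
{
   float inv_w = 0.0f;
   for (int i = 0; i < 3; i++)
      inv_w += b[i] / w[i];
   return inv_w;
}

/* Load one varying slot, applying the lowering only to slots whose bit
 * is set in the (hypothetical) noperspective mask. */
static float
sketch_load_varying(unsigned slot, uint32_t noperspective_mask,
                    const float v[3], const float w[3], const float b[3])
{
   bool noperspective = (noperspective_mask >> slot) & 1u;
   float vs_out[3];

   /* VS side: pre-multiply by W so the hardware divide cancels out. */
   for (int i = 0; i < 3; i++)
      vs_out[i] = noperspective ? v[i] * w[i] : v[i];

   float value = sketch_interp_perspective(vs_out, w, b);

   /* FS side: multiply by gl_FragCoord.w to undo the normalization. */
   return noperspective ? value * sketch_frag_coord_w(w, b) : value;
}
```

For v = {1, 2, 4}, w = {1, 2, 4} and barycentrics {0.25, 0.25, 0.5}, the lowered path returns the screen-space linear blend 2.75, while the untouched perspective path returns 2.0. When the mask is only known at run time, the per-slot test and extra multiplies are the branches and fmuls mentioned above.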
Signed-off-by: Benjamin Lee <benjamin.lee@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32127>
This is needed for noperspective lowering, where we need to multiply the
varying value by gl_FragCoord.w at the same barycentric as the varying.
Normal nir_load_frag_coord_zw instructions are lowered to the new
intrinsic on bifrost with the pan_lower_frag_coord_zw pass.
Signed-off-by: Benjamin Lee <benjamin.lee@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32127>
Previous code was assuming that load_sample_id loaded the hardware
sample ID register, which is 32 when sample shading is disabled. The
expectation was that we would read (0.5,0.5) from sample_positions[32].
Because the top 3 bits of the sample ID register are masked out in
bi_load_sample_id_to, we were instead reading the position of the first
sample.
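The effect of the masking can be reproduced in a few lines. The sketch assumes the sample ID occupies an 8-bit field, so that masking the top 3 bits keeps the low 5 bits; that width and the sketch_* names are assumptions for illustration, not taken from bi_load_sample_id_to.

```c
/* Register value reported when sample shading is disabled, per the
 * description above. */
#define SKETCH_SAMPLE_ID_DISABLED 32u

/* Assumed masking: an 8-bit field with the top 3 bits cleared. */
#define SKETCH_SAMPLE_ID_MASK 0x1Fu

/* Index that actually ends up being used to look up sample_positions[]. */
static unsigned
sketch_masked_sample_id(unsigned hw_sample_id)
{
   return hw_sample_id & SKETCH_SAMPLE_ID_MASK;
}
```

Under that assumption the value 32 collapses to 0, so the lookup lands on the first sample's position instead of the (0.5,0.5) entry expected at index 32.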
This doesn't affect OpenGL, because OpenGL never uses
nir_load_sample_pos when sample shading is disabled.
Signed-off-by: Benjamin Lee <benjamin.lee@collabora.com>
Fixes: 60146cc57c ("panvk: implement sampleRateShading")
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32601>
This tripped me up in the multiview implementation. The commit message
that introduced the pass mentioned that we're relying on
nir_lower_io_to_temporaries, but this was dropped when it was copied to
the comment block.
Signed-off-by: Benjamin Lee <benjamin.lee@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31704>
In Valhall multiview, position/varying shaders are invoked once per
draw. Each invocation writes separate outputs for all views. Fragment
processing is handled by the existing multilayer support. Note that
because the hardware only supports up to 8 views, we don't have to care
about the case where there are too many layers to fit in one tiler when
multiview is enabled.
Signed-off-by: Benjamin Lee <benjamin.lee@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31704>
It's generally useful to use mesa_log for error messages and similar
output, because it makes it easier to forward diagnostics into the
right logs.
So let's be more consistent about where we're logging things.
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32094>
Writing depth/stencil when update/kill is set to force-early seems to
trip up Valhall GPUs. Since depth/stencil writes are supposed to be
ignored in that case anyway, drop them from the shader.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31678>
V3D can use these too.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31480>
On Bifrost we can only use 3 coordinates for images, but
image2DMSArray needs 4 (x, y, sample#, and array index).
We work around this by making the image nr_samples times
higher than the original image, using the Y coordinate to
address the sample plane. This limits the maximum image
height (to 4K pixels instead of 64K pixels in the 16 sample
case) but at least allows us to use the images.
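The coordinate remapping can be sketched as follows. The layout (sample planes stacked vertically, one full image height per sample) is an assumption drawn from the description above, and the struct and sketch_* helpers are hypothetical.

```c
/* The three coordinates the hardware can take for an image access. */
struct sketch_coord3 {
   unsigned x;
   unsigned y;
   unsigned layer;
};

/* Fold the sample index into Y, assuming sample plane s starts at row
 * s * height of the taller image. */
static struct sketch_coord3
sketch_lower_ms_array_coord(unsigned x, unsigned y, unsigned sample,
                            unsigned layer, unsigned height)
{
   struct sketch_coord3 c = { x, sample * height + y, layer };
   return c;
}

/* Largest original height that still fits in the 64K row limit. */
static unsigned
sketch_max_ms_array_height(unsigned nr_samples)
{
   return 65536u / nr_samples;
}
```

With 16 samples the usable height drops to 4096 rows, which matches the 4K limit quoted above.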
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30521>
Instead of having a hardcoded list of endian-independent format aliases
in the header, generate them from the format definitions.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29649>
nir_lower_frag_coord_to_pixel_coord was adding .5 to work around the
drivers mistakenly setting PIXEL_COORD_HALF_INTEGER. With the setting
corrected, the GL frontend handles it appropriately (instead of
subtracting half in the frontend for the ARB_fragment_coord_conventions
integer setting and then adding the half back here), which also makes
the pass reusable from Intel.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29585>
For some reason, flat shading on T604 does not work when using the auto32
varying type.
This commit introduces a quirk for T60x, and some plumbing in pan_nir, that
allows explicitly using appropriate types rather than always using .u32 for
flat shading.
Backport-to: 24.1
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10632
Signed-off-by: Alexandre Marquet <tb@a-marquet.fr>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28146>
The gallium and vulkan drivers deal with vertex attribute emission
differently. The gallium driver re-emits the VS attributes on each
draw, while the vulkan driver uses explicit attribute/image
descriptor dirtiness tracking, and could keep the attribute array
around if a new pipeline using a different number of attributes is
bound. If we want to be able to do that, we need to assign a fixed
offset for image attributes, such that the Vulkan descriptor
lowering pass knows where the images are in the attribute table.
We could teach the Bifrost backend how to deal with a custom offset,
but doing that in a lowering pass also simplifies the Midgard
code.
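A minimal sketch of the idea, assuming the attribute table is split into a vertex-attribute region followed by an image region at a fixed base. The base value 16 and the sketch_* names are placeholders chosen for the example, not the values used by the drivers.

```c
/* Hypothetical fixed base of the image region in the attribute table.
 * With a fixed base, the index of an image no longer depends on how
 * many vertex attributes the bound pipeline happens to use. */
#define SKETCH_IMAGE_ATTRIB_BASE 16u

/* Attribute-table index for a given image index. */
static unsigned
sketch_image_attrib_index(unsigned image_index)
{
   return SKETCH_IMAGE_ATTRIB_BASE + image_index;
}
```

Because the result is independent of the vertex attribute count, a descriptor lowering pass could compute it without knowing which pipeline is bound.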
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28200>
This also fixes the missing encoding of indices with a non-immediate index.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27846>
Lower tex/sampler table in indices on panfrost.
This also implements wide indices and changes the format of texture and sampler
indices received by the compiler.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27846>
Rework the way we compute thread info to make it mostly GPU-agnostic
outside of the kmod backend.
The new logic is based on the following information extracted from
GPU registers:
- maximum number of threads per core
- maximum number of threads per workgroup
- number of registers per core
If the GPU doesn't provide this information (registers are zero), we
pick the per-arch defaults we had in panfrost_max_thread_count().
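The fallback rule can be sketched like this. The struct layout and sketch_* names are hypothetical, and any default values a caller passes in are placeholders rather than the real per-arch numbers from panfrost_max_thread_count().

```c
/* Hypothetical container for the three values listed above. */
struct sketch_thread_info {
   unsigned max_threads_per_core;
   unsigned max_threads_per_wg;
   unsigned num_registers_per_core;
};

/* A register value of zero means "not provided": use the default. */
static unsigned
sketch_reg_or_default(unsigned reg_value, unsigned arch_default)
{
   return reg_value ? reg_value : arch_default;
}

/* Resolve each field independently against the per-arch defaults. */
static struct sketch_thread_info
sketch_resolve_thread_info(const struct sketch_thread_info *regs,
                           const struct sketch_thread_info *defaults)
{
   struct sketch_thread_info out = {
      sketch_reg_or_default(regs->max_threads_per_core,
                            defaults->max_threads_per_core),
      sketch_reg_or_default(regs->max_threads_per_wg,
                            defaults->max_threads_per_wg),
      sketch_reg_or_default(regs->num_registers_per_core,
                            defaults->num_registers_per_core),
   };
   return out;
}
```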
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Antonino Maniscalco <antonino.maniscalco@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26358>
Panfrost generally treats 2D multisampled images like 3D images,
with the R coordinate holding the sample index. This commit adds
a lowering pass to convert 2DMS images to 3D for the compiler. It
is not actually invoked yet.
Signed-off-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27626>
If dual blending is enabled, only 1 output is supported. Multiple
outputs confuse the write combining pass in this case, leading to
incorrect output and/or an assert failure in emit_fragment_store.
The fix is straightforward: skip the speculative emission of
multiple outputs in the case where dual source blending is enabled.
This also adds an extra sanity check in `pan_nir_lower_zs_store` to
check for only one blend store being present.
Fixes: c65a9be421 ("panfrost: Preprocess shaders at CSO create time")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9487
Co-Authored-By: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26474>
Instead, we replace every use of it with nir_def. Most of this commit
was generated by sed:
sed -i -e 's/dest.ssa/def/g' src/**/*.h src/**/*.c src/**/*.cpp
A few manual fixups were required in lima and the nir_legacy code.
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24674>
We could add a nir_def_num_components() helper but we use
ssa.num_components about 3x as often as nir_dest_num_components() today
so that's a major Coccinelle refactor anyway and this doesn't make it
much worse. Most of this commit was generated by the following
semantic patch:
@@
expression D;
@@
<...
-nir_dest_num_components(D)
+D.ssa.num_components
...>
Some manual fixup was needed, especially in cpp files where Coccinelle
tends to give up the moment it sees any interesting C++.
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24674>
We could add a nir_def_bit_size() helper but we use ->bit_size about 3x
as often as nir_dest_bit_size() today so that's a major Coccinelle
refactor anyway and this doesn't make it much worse. Most of this
commit was generated by the following semantic patch:
@@
expression D;
@@
<...
-nir_dest_bit_size(D)
+D.ssa.bit_size
...>
Some manual fixup was needed, especially in cpp files where Coccinelle
tends to give up the moment it sees any interesting C++.
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24674>