The spirv-to-c-array.py script assembles a SPIR-V module,
then disassembles it, capturing that text, then re-assembles
that text, providing it on stdin. But this last invocation of
spirv-as must use '-' to specify that the text input appears
on stdin.
Currently it always errors out, complaining that there must
be exactly one input file.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35691>
Currently anv exposes this on lots of devices, with the intent to be better
than apps can give, but I think this is wrong for a couple of reasons.
Apps want to know if hw exposes the fast path, Vulkan is meant to be explicit,
and telling llama.cpp if the fast path exists lets it make smarter decisions.
It seems unless someone heavily optimises the slow path, that CPU is usually
faster than GPU with llama-bench unless the hw path exists.
v2: added INTEL_LOWER_DPAS support
Acked-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35564>
It is annoying to change all function signatures when a driver needs
more information. There are also some callbacks that have a lot of
parameters and there have already been bugs related to that.
This patch tries to clean the interface by adding a struct that contains
all information that might be relevant for the driver and passing that
to most callbacks.
radv changes are:
Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
anv changes are:
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
turnip changes are:
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
vulkan runtime changes are:
Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35385>
The spec says (emphasis mine):
If the color attachment is fixed-point, the components of the source and
destination values **AND BLEND FACTORS** are each clamped to [0,1] or [-1,1]
respectively for an unsigned normalized or signed normalized color attachment
prior to evaluating the blend operations. If the color attachment is
floating-point, no clamping occurs.
However, neither the CTS nor any hardware implement this semantic.
For unsigned normalized formats, the definitions are roughly equivalent (except
perhaps around constant colours). 0 <= x <= 1 implies that 0 <= 1 - x <= 1.
Therefore if the source/destination colours are clamped to [0, 1], then their
complements are also in [0, 1], so clamping any blend factor (except constant
colour) has no effect if the source/dest were already clamped.
For signed normalized formats, however, this difference matters. -1 <= x <= 1
implies that 0 <= 1 - x <= 2... so to implement the spec text faithfully, we
would need to clamp again the complemented colour blend factors to return back
to signed normalized range. Software blending implementations can of course do
that... but doing so causes CTS fails, as the CTS reference renderer does not do
this.
This commit adjusts nir_lower_blend to match what actual hardware does, what CTS
requires, and what the spec should have said.
See https://gitlab.khronos.org/vulkan/vulkan/-/issues/4293 for the spec
resolution.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35519>
virtio-gpu cross-domain context supersedes the downstream virtio-wl.
However, there's no compositor working well with cross-domain context
yet (sommelier from ChromeOS only works robustly with virtio-wl while
being pretty broken with cross-domain). So let's just drop the legacy
piece to avoid confusions.
Reviewed-by: Corentin Noël <corentin.noel@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35680>
Spec recommended values should give us a good balance between progress
in render and compute engines, also with less possibility of values
it will reduce the number of times that we need to emit
STATE_COMPUTE_MODE reducing the number of stalls in the compute engine.
Cc: stable
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35563>
Spec has several restrictions to the values we program to compute
engine async thread limits.
Without those we risk hit deadlocks, so here adding a function to
return the optimal value based on those restrictions.
Cc: stable
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35563>
If attachments are used in separate subpasses, then we can store them in
the same GMEM area. That gives us more space per attachment and thus
larger tile sizes. For gfxbench vk-5-normal's main renderpass it goes
from 128x128 to 160x128, improving perf by 2.125% +/- 0.194356% (n=4)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17943>
This can reduce the subpass live range of attachments, for future gmem
attachment space sharing.
We have to disable IB2 skipping when the subpass isn't the last, but being
able to reuse the gmem space by storing early ends up paying off (in the
next commit).
Fixes: #5181
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17943>
Structured tagging ensures that we are building and testing the current
component version specified in the commit by matching the checksum of
the related build script file.
In this case, it is worthy to isolate the Android CTS version part,
because we don't need to rebuild the entire test-android container when
we change the CTS version or the CTS modules filtering.
PS: actually the new file `build-android-cts.sh` is not building
anything, it is just downloads, filters, compress and reupload the
stripped version to S3. The `build-` prefix is to make it work
transparently with `bin/ci/update_tag.py` script.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35596>
The CTS is almost 9GB, when we unzip it locally (on Fedora 41 at least)
it is partially extracted because it is falsely detected as a zipbomb.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35596>
The CTS has almost 9GB atm, which takes almost 20 minutes to download
it. Moreover, it is stripped down after that, so we don't need the entire
file anyway.
So let's move this artifact to S3 in a similar way that we do with
fluster vectors.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35596>
Our index manipulation was only okay because of the order in which
spirv_to_nir constructs texture sources. It's better if we always
reference sources by type and not by index. Otherwise, we risk
screwing up indices when we remove a source.
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35623>
This adds a variant of nir_steal_tex_src() which is for derefs as well
as versions that just return the source without removing it.
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35623>
When dumping nir validation errors, flush stderr before
calling abort. Otherwise the errors might not be emitted.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35665>
This change was tested on rv770, palm, barts and cayman:
spec/amd_framebuffer_multisample_advanced/api-glcore: skip pass
spec/amd_framebuffer_multisample_advanced/api-gles3: skip pass
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Acked-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35429>
The snorm formats are not compatible with the srf flag
which was set by the emit_image_load_or_atomic() function.
In this specific case, "use_const_fields" is not set which
implies that the format definition is local. The other
supported formats do not require the srf flag as well.
This change was tested on cypress, barts and cayman. Here are the tests fixed:
khr-gl4[2-6]/shader_image_load_store/basic-allformats-load: fail pass
khr-gl4[2-6]/shader_image_load_store/basic-alltargets-loadstorecs: fail pass
khr-gl4[5-6]/es_31_compatibility/shader_image_load_store/basic-allformats-loadstorecomputestage: fail pass
khr-gl4[5-6]/es_31_compatibility/shader_image_load_store/basic-alltargets-loadstorecs: fail pass
khr-gles31/core/shader_image_load_store/basic-allformats-loadstorecomputestage: fail pass
khr-gles31/core/shader_image_load_store/basic-alltargets-loadstorecs: fail pass
deqp-gles31/functional/image_load_store/2d/format_reinterpret/r32f_rgba8_snorm: fail pass
deqp-gles31/functional/image_load_store/2d/format_reinterpret/rgba8_rgba8_snorm: fail pass
deqp-gles31/functional/image_load_store/2d_array/format_reinterpret/r32f_rgba8_snorm: fail pass
deqp-gles31/functional/image_load_store/2d_array/format_reinterpret/rgba8_rgba8_snorm: fail pass
deqp-gles31/functional/image_load_store/3d/format_reinterpret/r32f_rgba8_snorm: fail pass
deqp-gles31/functional/image_load_store/3d/format_reinterpret/rgba8_rgba8_snorm: fail pass
deqp-gles31/functional/image_load_store/buffer/format_reinterpret/r32f_rgba8_snorm: fail pass
deqp-gles31/functional/image_load_store/buffer/format_reinterpret/rgba8_rgba8_snorm: fail pass
deqp-gles31/functional/image_load_store/cube/format_reinterpret/r32f_rgba8_snorm: fail pass
deqp-gles31/functional/image_load_store/cube/format_reinterpret/rgba8_rgba8_snorm: fail pass
Cc: mesa-stable
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35548>
The mode r10g10b10a2_snorm processed as vertex on palm at the
hardware level doesn't follow the current standard. Indeed, the .w
component (2-bits) is not calculated as expected. The table below
describes the situation.
This change fixes this issue by adding three gpu instructions at
the vertex fetch shader stage. An equivalent C representation and
a gpu asm dump of the generated sequence are available below.
.w(2-bits) expected palm
0 0.0 0.000000
1 1.0 0.333333
2 -1.0 0.666667
3 -1.0 1.000000
w_out = (4.*w_in > 1. ? 1. : 4.*w_in) - (w_in > 0.5 ? 2. : 0.);
0002 00000008 A0080000 ALU 3 @16
0016 00000C02 A0000CC0 1 y: MOV*4_sat __.y, R2.w
0018 801F8C02 600004A0 w: SETGT*2 __.w, R2.w, 0.5
0020 839FC4FE 60400010 2 w: ADD R2.w, PV.y, -PV.w
Note: The rv770 and cypress don't need this correction. This is
definitely a hardware change between these gpus.
This change was tested on palm, barts and cayman. Here are the tests fixed:
spec/arb_vertex_type_2_10_10_10_rev/arb_vertex_type_2_10_10_10_rev-array_types: fail pass
deqp-gles3/functional/draw/random/124: fail pass
deqp-gles3/functional/vertex_arrays/single_attribute/normalize/int2_10_10_10/components4_quads1: fail pass
deqp-gles3/functional/vertex_arrays/single_attribute/normalize/int2_10_10_10/components4_quads256: fail pass
khr-gl43/vertex_attrib_binding/basic-input-case5: fail pass
khr-gl44/vertex_attrib_binding/basic-input-case5: fail pass
khr-gl45/vertex_attrib_binding/basic-input-case5: fail pass
Cc: mesa-stable
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32427>