Commit graph

207385 commits

Author SHA1 Message Date
Christian Gmeiner
e64390a056 etnaviv: nir: Move pre-halti5 tex lowering
Let's have all the texture lowerings in one place.

Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35696>
2025-06-23 22:50:38 +00:00
Yiwei Zhang
84cb568150 vulkan/wsi: drop wsi_common_get_images
..since there's no users of it now.

Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35702>
2025-06-23 22:05:19 +00:00
Yiwei Zhang
8d8bdd2b1b dozen: drop redundant dzn_swapchain_get_image api
This drops the last usage of wsi_common_get_images.

Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35702>
2025-06-23 22:05:19 +00:00
David Neto
e9b9f1f764 spirv: spirv-to-c-array: use '-' to specify stdin
The spirv-to-c-array.py script assembles a SPIR-V module,
then disassembles it, capturing that text, then re-assembles
that text, providing it on stdin. But this last invocation of
spirv-as must use '-' to specify that the text input appears
on stdin.

Currently it always errors out, complaining that there must
be exactly one input file.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35691>
2025-06-23 21:24:56 +00:00
Dave Airlie
29c599ffea anv: only expose VK_KHR_cooperative_matrix on devices with hw instructions.
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Currently anv exposes this on lots of devices, with the intent to be better
than apps can give, but I think this is wrong for a couple of reasons.

Apps want to know if hw exposes the fast path, Vulkan is meant to be explicit,
and telling llama.cpp if the fast path exists lets it make smarter decisions.

It seems unless someone heavily optimises the slow path, that CPU is usually
faster than GPU with llama-bench unless the hw path exists.

v2: added INTEL_LOWER_DPAS support

Acked-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35564>
2025-06-23 21:06:51 +00:00
Konstantin Seurer
4cbbdc0a50 vulkan: Pass a structure to most BVH build callbacks
It is annoying to change all function signatures when a driver needs
more information. There are also some callbacks that have a lot of
parameters and there have already been bugs related to that.

This patch tries to clean the interface by adding a struct that contains
all information that might be relevant for the driver and passing that
to most callbacks.

radv changes are:
Reviewed-by: Natalie Vock <natalie.vock@gmx.de>

anv changes are:
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

turnip changes are:
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>

vulkan runtime changes are:

Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35385>
2025-06-23 20:43:43 +00:00
Konstantin Seurer
a73824a59d vulkan: Remove bvh_state::leaf_node_size
Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35385>
2025-06-23 20:43:43 +00:00
Konstantin Seurer
28713789ad vulkan: Replace get_*_key with get_build_config
It is a cleaner API and gives more control about the build to the
driver.

Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35385>
2025-06-23 20:43:43 +00:00
Konstantin Seurer
37869a9bc2 vulkan: Move the build options to the accel struct header
Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35385>
2025-06-23 20:43:42 +00:00
Konstantin Seurer
6cae6e8708 vulkan: Allow reserving scratch memory for encode passes
Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35385>
2025-06-23 20:43:42 +00:00
Alyssa Rosenzweig
f336b4d1dd hk: enable snorm rendering
now that blending works.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35519>
2025-06-23 19:38:27 +00:00
Alyssa Rosenzweig
d37bf148d2 nir/lower_blend: fix snorm factor clamping
The spec says (emphasis mine):

  If the color attachment is fixed-point, the components of the source and
  destination values **AND BLEND FACTORS** are each clamped to [0,1] or [-1,1]
  respectively for an unsigned normalized or signed normalized color attachment
  prior to evaluating the blend operations. If the color attachment is
  floating-point, no clamping occurs.

However, neither the CTS nor any hardware implement this semantic.

For unsigned normalized formats, the definitions are roughly equivalent (except
perhaps around constant colours). 0 <= x <= 1 implies that 0 <= 1 - x <= 1.
Therefore if the source/destination colours are clamped to [0, 1], then their
complements are also in [0, 1], so clamping any blend factor (except constant
colour) has no effect if the source/dest were already clamped.

For signed normalized formats, however, this difference matters. -1 <= x <= 1
implies that 0 <= 1 - x <= 2... so to implement the spec text faithfully, we
would need to clamp again the complemented colour blend factors to return back
to signed normalized range. Software blending implementations can of course do
that... but doing so causes CTS fails, as the CTS reference renderer does not do
this.

This commit adjusts nir_lower_blend to match what actual hardware does, what CTS
requires, and what the spec should have said.

See https://gitlab.khronos.org/vulkan/vulkan/-/issues/4293 for the spec
resolution.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35519>
2025-06-23 19:38:27 +00:00
Yiwei Zhang
09d6427c13 docs/venus: keep requirements up to date
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Reviewed-by: Corentin Noël <corentin.noel@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35680>
2025-06-23 19:16:34 +00:00
Yiwei Zhang
a441470315 docs/venus: drop Virtio-WL section
virtio-gpu cross-domain context supersedes the downstream virtio-wl.
However, there's no compositor working well with cross-domain context
yet (sommelier from ChromeOS only works robustly with virtio-wl while
being pretty broken with cross-domain). So let's just drop the legacy
piece to avoid confusions.

Reviewed-by: Corentin Noël <corentin.noel@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35680>
2025-06-23 19:16:34 +00:00
José Roberto de Souza
bdd20457ed anv: Emit STATE_COMPUTE_MODE before COMPUTE_WALKER when new async compute limits are needed
Cc: stable
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35563>
2025-06-23 18:57:25 +00:00
José Roberto de Souza
b37747ce68 blorp: Emit STATE_COMPUTE_MODE before COMPUTE_WALKER
Cc: stable
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35563>
2025-06-23 18:57:25 +00:00
José Roberto de Souza
52bd6fae0e iris: Emit STATE_COMPUTE_MODE before COMPUTE_WALKER when new async compute limits are needed
Cc: stable
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35563>
2025-06-23 18:57:25 +00:00
José Roberto de Souza
59d361043e intel/common: Use as much as possible spec recommended values for compute engine async thread limits
Spec recommended values should give us a good balance between progress
in render and compute engines, also with less possibility of values
it will reduce the number of times that we need to emit
STATE_COMPUTE_MODE reducing the number of stalls in the compute engine.

Cc: stable
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35563>
2025-06-23 18:57:25 +00:00
José Roberto de Souza
080b9a165c intel/common: Add function to compute optimal compute engine async thread limits
Spec has several restrictions to the values we program to compute
engine async thread limits.
Without those we risk hit deadlocks, so here adding a function to
return the optimal value based on those restrictions.

Cc: stable
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35563>
2025-06-23 18:57:24 +00:00
Emma Anholt
61ceaed48c tu: Block ib2 skipping for pre-final subpass resolves.
Fixes a TODO I noticed along the way in this MR.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17943>
2025-06-23 18:40:41 +00:00
Emma Anholt
37ce9a7a61 turnip: Share gmem allocations between attachments.
If attachments are used in separate subpasses, then we can store them in
the same GMEM area.  That gives us more space per attachment and thus
larger tile sizes.  For gfxbench vk-5-normal's main renderpass it goes
from 128x128 to 160x128, improving perf by 2.125% +/- 0.194356% (n=4)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17943>
2025-06-23 18:40:41 +00:00
Emma Anholt
ba9d0ba9a0 turnip: Emit tile stores at subpass end time.
This can reduce the subpass live range of attachments, for future gmem
attachment space sharing.

We have to disable IB2 skipping when the subpass isn't the last, but being
able to reuse the gmem space by storing early ends up paying off (in the
next commit).

Fixes: #5181
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17943>
2025-06-23 18:40:41 +00:00
Emma Anholt
33f3e6255d turnip: Move end-of-subpass resolves to a helper function.
Deduplicates a bit of attachment walking code, and drops a note about
something that's not right with how we're doing things.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17943>
2025-06-23 18:40:40 +00:00
Emma Anholt
2b0e6f42de turnip: Fix subpass depth/stencil change detection.
Subpass had a postfix increment, not prefix, so last_subpass was the same
as subpass.

Fixes: 550975f229 ("turnip: Don't disable LRZ in subpasses after the first in the easy case.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17943>
2025-06-23 18:40:40 +00:00
Emma Anholt
d0d3797a4d freedreno/a2xx: Share the shader state create/delete functions.
They were almost exactly the same.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8431>
2025-06-23 18:24:00 +00:00
Emma Anholt
3178b9f4d1 freedreno/a2xx: Dump the intrinsic name instead of a number when compile failing.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8431>
2025-06-23 18:24:00 +00:00
Emma Anholt
5a3300f4a3 freedreno/a2xx: Disable interpolated input intrinsics.
We never wanted them, it appears to be a refactor bug.

Fixes: 25d4943481 ("nir: make use_interpolated_input_intrinsics a nir_lower_io parameter")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8431>
2025-06-23 18:24:00 +00:00
Emma Anholt
58187a98c7 ir3: Enable NIR matrix reassociation.
We have to be careful about the placement of this to run after preamble,
otherwise you start executing more instructions in the case of uniform *
uniform * varying.

fossils on a730 (ANGLE affected):
Totals from 45 (0.09% of 50500) affected shaders:
Instrs: 6548 -> 6456 (-1.41%)
CodeSize: 14032 -> 13512 (-3.71%)
NOPs: 364 -> 476 (+30.77%)
MOVs: 91 -> 157 (+72.53%)
Full: 506 -> 508 (+0.40%)
(sy)-stall: 2691 -> 2507 (-6.84%)
Cat0: 411 -> 523 (+27.25%)
Cat1: 195 -> 261 (+33.85%)
Cat2: 1980 -> 1710 (-13.64%)

GL on a630 (mostly ANGLE, skia affected)

total instructions in shared programs: 11097572 -> 11096713 (<.01%)
instructions in affected programs: 43897 -> 43038 (-1.96%)

total nops in shared programs: 1549278 -> 1549333 (<.01%)
nops in affected programs: 4027 -> 4082 (1.37%)

total non-nops in shared programs: 9548294 -> 9547380 (<.01%)
non-nops in affected programs: 39625 -> 38711 (-2.31%)

total mov in shared programs: 290948 -> 290888 (-0.02%)
mov in affected programs: 3382 -> 3322 (-1.77%)

total dwords in shared programs: 21428802 -> 21426224 (-0.01%)
dwords in affected programs: 65354 -> 62776 (-3.94%)

total last-baryf in shared programs: 381845 -> 381784 (-0.02%)
last-baryf in affected programs: 278 -> 217 (-21.94%)

total last-helper in shared programs: 2613970 -> 2614334 (0.01%)
last-helper in affected programs: 31582 -> 31946 (1.15%)

total full in shared programs: 783901 -> 783989 (0.01%)
full in affected programs: 128 -> 216 (68.75%)

total cat0 in shared programs: 1722310 -> 1722365 (<.01%)
cat0 in affected programs: 4291 -> 4346 (1.28%)

total cat1 in shared programs: 478775 -> 478715 (-0.01%)
cat1 in affected programs: 3406 -> 3346 (-1.76%)

total cat2 in shared programs: 4245797 -> 4245335 (-0.01%)
cat2 in affected programs: 4148 -> 3686 (-11.14%)

total cat3 in shared programs: 4134879 -> 4134483 (<.01%)
cat3 in affected programs: 1971 -> 1575 (-20.09%)

total cat6 in shared programs: 167122 -> 167126 (<.01%)
cat6 in affected programs: 433 -> 437 (0.92%)

total sstall in shared programs: 633551 -> 633541 (<.01%)
sstall in affected programs: 47 -> 37 (-21.28%)

total (ss) in shared programs: 188548 -> 187926 (-0.33%)
(ss) in affected programs: 1320 -> 698 (-47.12%)

total systall in shared programs: 1632601 -> 1632496 (<.01%)
systall in affected programs: 14152 -> 14047 (-0.74%)

total (sy) in shared programs: 90995 -> 91000 (<.01%)
(sy) in affected programs: 394 -> 399 (1.27%)

total waves in shared programs: 1530906 -> 1530892 (<.01%)
waves in affected programs: 38 -> 24 (-36.84%)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35622>
2025-06-23 17:49:51 +00:00
Emma Anholt
bc8994cb48 nir: Add a pass to reassociate multiplication of mat*mat*vec.
The typical case of mat4*mat4*vec4 is 80 scalar multiplications, but
mat4*(mat4*vec4) is only 32.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35622>
2025-06-23 17:49:51 +00:00
Timothy Arceri
21ea8c205f nir: raise NIR_SEARCH_MAX_VARIABLES limit to 24
This is required to process the pattern in the following patch.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35622>
2025-06-23 17:49:51 +00:00
Brian Paul
a5b36600d8 svga: assorted code clean-ups in svga drm code
Fix whitespace/formatting.  Remove trailing whitespace.  Replace
tabs with spaces.  No functional change.

Signed-off-by: Brian Paul <brian.paul@broadcom.com>
Reviewed-by: Neha Bhende <neha.bhende@broadcom.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35663>
2025-06-23 17:33:18 +00:00
Pohsiang (John) Hsu
d7d31708fa d3d12: fix failure when building with v1.717.0-preview and running on Windows 11 without Agility Pack
remove wrong #else statment preventing falling back to old interface.

Reviewed-by: Sil Vilerino <sivileri@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35690>
2025-06-23 17:18:55 +00:00
Eric Engestrom
2ae9b2ce43 r300/ci: update expectations and document recent flakes
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35693>
2025-06-23 16:58:19 +00:00
Eric Engestrom
63e1fbd891 zink/ci: document recent flakes
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35693>
2025-06-23 16:58:19 +00:00
Eric Engestrom
5848e3a894 broadcom/ci: document recent flakes
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35693>
2025-06-23 16:58:19 +00:00
Eric Engestrom
4052348d09 radv/ci: document recent flakes
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35693>
2025-06-23 16:58:19 +00:00
Eric Engestrom
8fa443b3a8 radeonsi/ci: document recent flakes
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35693>
2025-06-23 16:58:19 +00:00
Eric Engestrom
8553cfe125 lavapipe/ci: document recent flakes
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35693>
2025-06-23 16:58:19 +00:00
Eric Engestrom
6eacb756de lavapipe/ci: remove duplicate flakes line
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35693>
2025-06-23 16:58:19 +00:00
Eric Engestrom
3a58b84033 lavapipe/ci: fix flakes regex
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35693>
2025-06-23 16:58:18 +00:00
Guilherme Gallo
969bb7bd70 ci/android: Use structured tag for Android CTS version
Structured tagging ensures that we are building and testing the current
component version specified in the commit by matching the checksum of
the related build script file.

In this case, it is worthy to isolate the Android CTS version part,
because we don't need to rebuild the entire test-android container when
we change the CTS version or the CTS modules filtering.

PS: actually the new file `build-android-cts.sh` is not building
anything, it is just downloads, filters, compress and reupload the
stripped version to S3. The `build-` prefix is to make it work
transparently with `bin/ci/update_tag.py` script.

Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35596>
2025-06-23 15:22:45 +00:00
Guilherme Gallo
b26ad1783d ci/android: Disable zipbomb detection for CTS
The CTS is almost 9GB, when we unzip it locally (on Fedora 41 at least)
it is partially extracted because it is falsely detected as a zipbomb.

Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35596>
2025-06-23 15:22:45 +00:00
Guilherme Gallo
e9a1e22d6c ci/android: Store stripped CTS on S3
The CTS has almost 9GB atm, which takes almost 20 minutes to download
it. Moreover, it is stripped down after that, so we don't need the entire
file anyway.

So let's move this artifact to S3 in a similar way that we do with
fluster vectors.

Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35596>
2025-06-23 15:22:44 +00:00
Faith Ekstrand
261366cb11 nvk: Use nir_tex_get/steal_src in nvk_nir_lower_descriptors()
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Our index manipulation was only okay because of the order in which
spirv_to_nir constructs texture sources.  It's better if we always
reference sources by type and not by index.  Otherwise, we risk
screwing up indices when we remove a source.

Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35623>
2025-06-23 14:25:31 +00:00
Faith Ekstrand
bb4c5edda1 nir: Add more tex_src helpers
This adds a variant of nir_steal_tex_src() which is for derefs as well
as versions that just return the source without removing it.

Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35623>
2025-06-23 14:25:30 +00:00
Faith Ekstrand
2b40fa09f2 nir: Move nir_steal_tex_src() to nir.h
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35623>
2025-06-23 14:25:30 +00:00
David Neto
bff2b1b947 mesa: flush stderr when dumping nir validation errors
When dumping nir validation errors, flush stderr before
calling abort. Otherwise the errors might not be emitted.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35665>
2025-06-23 13:36:09 +00:00
Patrick Lerda
b986550b92 r600: enable AMD_framebuffer_multisample_advanced
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
This change was tested on rv770, palm, barts and cayman:
spec/amd_framebuffer_multisample_advanced/api-glcore: skip pass
spec/amd_framebuffer_multisample_advanced/api-gles3: skip pass

Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Acked-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35429>
2025-06-23 12:24:42 +00:00
Patrick Lerda
d27ed38d1a r600: fix emit_image_load_or_atomic() snorm formats
The snorm formats are not compatible with the srf flag
which was set by the emit_image_load_or_atomic() function.
In this specific case, "use_const_fields" is not set which
implies that the format definition is local. The other
supported formats do not require the srf flag as well.

This change was tested on cypress, barts and cayman. Here are the tests fixed:
khr-gl4[2-6]/shader_image_load_store/basic-allformats-load: fail pass
khr-gl4[2-6]/shader_image_load_store/basic-alltargets-loadstorecs: fail pass
khr-gl4[5-6]/es_31_compatibility/shader_image_load_store/basic-allformats-loadstorecomputestage: fail pass
khr-gl4[5-6]/es_31_compatibility/shader_image_load_store/basic-alltargets-loadstorecs: fail pass
khr-gles31/core/shader_image_load_store/basic-allformats-loadstorecomputestage: fail pass
khr-gles31/core/shader_image_load_store/basic-alltargets-loadstorecs: fail pass
deqp-gles31/functional/image_load_store/2d/format_reinterpret/r32f_rgba8_snorm: fail pass
deqp-gles31/functional/image_load_store/2d/format_reinterpret/rgba8_rgba8_snorm: fail pass
deqp-gles31/functional/image_load_store/2d_array/format_reinterpret/r32f_rgba8_snorm: fail pass
deqp-gles31/functional/image_load_store/2d_array/format_reinterpret/rgba8_rgba8_snorm: fail pass
deqp-gles31/functional/image_load_store/3d/format_reinterpret/r32f_rgba8_snorm: fail pass
deqp-gles31/functional/image_load_store/3d/format_reinterpret/rgba8_rgba8_snorm: fail pass
deqp-gles31/functional/image_load_store/buffer/format_reinterpret/r32f_rgba8_snorm: fail pass
deqp-gles31/functional/image_load_store/buffer/format_reinterpret/rgba8_rgba8_snorm: fail pass
deqp-gles31/functional/image_load_store/cube/format_reinterpret/r32f_rgba8_snorm: fail pass
deqp-gles31/functional/image_load_store/cube/format_reinterpret/rgba8_rgba8_snorm: fail pass

Cc: mesa-stable
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35548>
2025-06-23 12:11:07 +00:00
Patrick Lerda
e8fa3b4950 r600: make vertex r10g10b10a2_snorm conformant on palm and beyond
The mode r10g10b10a2_snorm processed as vertex on palm at the
hardware level doesn't follow the current standard. Indeed, the .w
component (2-bits) is not calculated as expected. The table below
describes the situation.

This change fixes this issue by adding three gpu instructions at
the vertex fetch shader stage. An equivalent C representation and
a gpu asm dump of the generated sequence are available below.

.w(2-bits) expected	palm
0	    0.0		0.000000
1 	    1.0		0.333333
2	   -1.0		0.666667
3	   -1.0		1.000000

w_out = (4.*w_in > 1. ? 1. : 4.*w_in) - (w_in > 0.5 ? 2. : 0.);

0002 00000008 A0080000  ALU 3 @16
 0016 00000C02 A0000CC0     1 y:     MOV*4_sat              __.y,  R2.w
 0018 801F8C02 600004A0       w:     SETGT*2                __.w,  R2.w, 0.5
 0020 839FC4FE 60400010     2 w:     ADD                    R2.w,  PV.y, -PV.w

Note: The rv770 and cypress don't need this correction. This is
definitely a hardware change between these gpus.

This change was tested on palm, barts and cayman. Here are the tests fixed:
spec/arb_vertex_type_2_10_10_10_rev/arb_vertex_type_2_10_10_10_rev-array_types: fail pass
deqp-gles3/functional/draw/random/124: fail pass
deqp-gles3/functional/vertex_arrays/single_attribute/normalize/int2_10_10_10/components4_quads1: fail pass
deqp-gles3/functional/vertex_arrays/single_attribute/normalize/int2_10_10_10/components4_quads256: fail pass
khr-gl43/vertex_attrib_binding/basic-input-case5: fail pass
khr-gl44/vertex_attrib_binding/basic-input-case5: fail pass
khr-gl45/vertex_attrib_binding/basic-input-case5: fail pass

Cc: mesa-stable
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32427>
2025-06-23 11:56:11 +00:00