Previously only `-mtls-dialect=gnu2` was probed, which was appropriate
for arm, x86 and x86_64, but not for newer architectures such as
aarch64, loongarch64 and riscv64 which all use `-mtls-dialect=desc`
instead. Because the driver option is not consistent across
architectures (and probably will not), try both variants and choose the
first one working.
While at it, rename "gnu2_*" variables to "tlsdesc_*" respectively, for
clarity.
Cc: mesa-stable
Reviewed-by: Icenowy Zheng <uwu@icenowy.me>
Reviewed-by: Yukari Chiba <i@0x7f.cc>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: WANG Xuerui <git@xen0n.name>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30599>
(cherry picked from commit cc2dbb8ea5)
Although the ORCJIT codepath is fresh and relatively less tested, this
is still better than no llvmpipe at all for those newer architectures
that will not gain MCJIT support, such as LoongArch or RISC-V.
Fixes: 6f02ec5ed1 ("llvmpipe: add an implementation with llvm orcjit")
Reviewed-by: Icenowy Zheng <uwu@icenowy.me>
Reviewed-by: Yukari Chiba <i@0x7f.cc>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: WANG Xuerui <git@xen0n.name>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30599>
(cherry picked from commit 56f38672a2)
Fixes a "regression" where comically large FPS tests regressed.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Fixes: 19dba854 ("wsi/x11: Rewrite implementation to always use threads.")
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30638>
(cherry picked from commit 5a97916fdc)
The versioned libgallium library can be confusing on Android, and it is
probably not even needed there, so simplify the build on Android by
always build the unversioned `libgallium_dri.so` overriding the
`-Dunversion-libgallium=true` option added in
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30579
Remove also all the bits that deal with the versioned library which are
not needed anymore.
Fixes: 9568976c52 ("android: fix build in multiple ways")
Acked-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Mauro Rossi <issor.oruam@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30641>
(cherry picked from commit 2d2bc5b307)
This is unneeded in some environments, like ChromeOS and Android. And
for CrOS it specifically causes problems with the gpu sandbox rules.. we
don't want to have to update the sandbox rules for each new mesa
version.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30579>
(cherry picked from commit 19ff16387a)
Valgrind will report some memory possibly lost because of this singleton
(it's dynamically allocated when it is first accessed).
Use atexit() to register a handler that releases this singleton.
Signed-off-by: Icenowy Zheng <uwu@icenowy.me>
Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 5f22e152ad)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30645>
The ownership of the TargetMachine object is released when LPJit
singleton is constructed, leads to a slight memory loss detectable.
Keep the ownership by saving the unique pointer as another class member
named tm_unique.
Signed-off-by: Icenowy Zheng <uwu@icenowy.me>
Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 3423e73cec)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30645>
LoongArch is an architecture too new to have MCJIT support.
Add its support to ORCJIT code.
Signed-off-by: Icenowy Zheng <uwu@icenowy.me>
Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit e16a74c023)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30645>
Currently the mattrs is set according to the softdev convention, with
LSX explicitly disabled because it's troublesome at least on LLVM 17.
Signed-off-by: Icenowy Zheng <uwu@icenowy.me>
Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 979c364018)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30645>
Only 64-bit is considered now because 32-bit LoongArch Linux support
doesn't exist in upstream yet.
Signed-off-by: Icenowy Zheng <uwu@icenowy.me>
Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 08425d9aaf)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30645>
The `capture_not_overwritten` unit test captures and compares two
backtraces -- one from inside a call to `func_c` and one outside -- and
confirms that they are not identical. That is, that `func_c` is in the
backtrace.
On 32-bit x86, without `-fno-omit-frame-pointer`, the function will not
emit a stack frame. As a result, the unit test fails.
The fix is to compile `func_c` with the flag `-fno-omit-frame-pointer`
to prevent the compiler from optimizing out the stack frame which is
otherwise unneeded.
Bug: https://bugs.gentoo.org/823774
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4091
Fixes: d0d14f3f64 ("util: Add unit test for stack backtrace caputure")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30622>
(cherry picked from commit 05dc4eb536)
* the damage region was not being used correctly (this is a normal rect)
* use_damage was never unset at frame boundary
* original renderArea was never re-set
Fixes: 3d38c9597f ("zink: hook up KHR_partial_update")
Acked-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30625>
(cherry picked from commit a7f64c6203)
When closest hit shader is called, the BVH object level
brw_nir_rt_load_mem_ray origin/direction is 0. What we should be using
is the ray origin/direction and apply the transform of the current
instance.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 9ba7d459a3 ("intel/rt: Implement the new ray-tracing system values")
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30578>
(cherry picked from commit aaff191356)
the CL CTS added a new test being printf("\n", "foo"), but we ended up
printing the new line twice. If we can't find a specifier anymore, ignore
the argument as after the loop processing all arguments we'll print the
remaining format string anyway.
Cc: mesa-stable
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30574>
(cherry picked from commit 4080269845)
Make wsi_device_matches_drm_fd() a default helper that PCI based GPUs plug in to
wsi_dev->can_present_on_device. This is needed for devices without libdrm, where
wsi_device_matches_drm_fd was still being called causing an "undefined reference"
build error.
Suggested-by: Rob Clark <robdclark@chromium.org>
Fixes: baa38c144f ("vulkan/wsi: Use VK_EXT_pci_bus_info for DRM fd matching")
Reviewed-by: Mark Collins <mark@igalia.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Signed-off-by: Valentine Burley <valentine.burley@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29627>
(cherry picked from commit 47289ebc8d)
Instead of aligning offsets, we just align the size every time we query
it. This simplifies our offset and size calculations later since we can
always just add up descriptor buffer sizes and know that we'll be okay.
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Fixes: 7ab5c5d36d ("zink: use EXT_descriptor_buffer with ZINK_DESCRIPTORS=db")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30580>
(cherry picked from commit 0f8f407e57)
There are two problems.
1. This is not NaN safe. 'add.le.sat dst F, Inf F, -Inf F' has a
different result than 'add dst F, Inf F, -Inf F; cmp.le null, dst F, 0F'.
2. Ignoring the first problem, this only produces the desired flags
for LE and G. All other cases can produce the wrong result.
shader-db:
All Intel platforms had similar results. (Broadwell shown)
total instructions in shared programs: 18282314 -> 18282316 (<.01%)
instructions in affected programs: 78 -> 80 (2.56%)
helped: 0
HURT: 2
total cycles in shared programs: 952924234 -> 952924252 (<.01%)
cycles in affected programs: 584 -> 602 (3.08%)
helped: 0
HURT: 2
Fixes: e6022281f2 ("intel/elk: Rename files to use elk prefix")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29774>
(cherry picked from commit 9125b7c1b4)
There are two problems.
1. This is not NaN safe. 'add.le.sat dst F, Inf F, -Inf F' has a
different result than 'add dst F, Inf F, -Inf F; cmp.le null, dst F, 0F'.
2. Ignoring the first problem, this only produces the desired flags
for LE and G. All other cases can produce the wrong result.
For example, batman_arkham_city_goty.foz 6a63c4caacaa0dae has the
following code:
mad.ge.f0.0(8) g51<1>F g50<8,8,1>F g46<8,8,1>F g11<1,1,1>F
mov.sat(8) g52<1>F g51<1,1,0>F
...
(+f0.0) sel(8) g54<1>UD g53<8,8,1>UD 0x3f000000UD
Without this commit, the saturate is incorrectly propagated to the MAD.
A similar case exists in witcher_3_dxvk_g2.foz 5b03243be667a275.
There are even worse cases like total_war_warhammer3.dx12vk-g6.foz
78328466761ef7ab and ee920491573860fc. The former has the following
code (and the latter has very similar code):
mad.l.f0.0(16) g95<1>F g93<8,8,1>F g62<8,8,1>F g68<1,1,1>F
...
mov.sat(16) g109<1>F -g95<1,1,0>F
...
(+f0.0) sel(16) g68<1>UD g111<1,1,0>UD g54<1,1,0>UD
(+f0.0) sel(16) g70<1>UD g113<1,1,0>UD g56<1,1,0>UD
(+f0.0) sel(16) g72<1>UD g115<1,1,0>UD g58<1,1,0>UD
Saturate propagation makes a hash of this code:
mad.sat.l.f0.0(16) g106<1>F -g93<8,8,1>F -g62<8,8,1>F g68<1,1,1>F
...
(+f0.0) sel(16) g70<1>UD g110<1,1,0>UD g56<1,1,0>UD
(+f0.0) sel(16) g72<1>UD g112<1,1,0>UD g58<1,1,0>UD
(+f0.0) sel(16) g68<1>UD g108<1,1,0>UD g54<1,1,0>UD
Not only is the saturate incorrectly applied to the MAD, but the MAD
result is negated without changing the conditional modifier to G!
NOTE: Backports of this commit to stable branches may need to be more
like the following commit to elk.
shader-db:
All Intel platforms had similar results. (Meteor Lake shown)
total instructions in shared programs: 19729375 -> 19729377 (<.01%)
instructions in affected programs: 112 -> 114 (1.79%)
helped: 0
HURT: 2
total cycles in shared programs: 916234266 -> 916234288 (<.01%)
cycles in affected programs: 636 -> 658 (3.46%)
helped: 0
HURT: 2
fossil-db:
All Intel platforms had similar results. (Meteor Lake shown)
Totals:
Instrs: 151531594 -> 151531601 (+0.00%)
Cycle count: 17209107419 -> 17209107474 (+0.00%); split: -0.00%, +0.00%
Totals from 6 (0.00% of 630198) affected shaders:
Instrs: 4550 -> 4557 (+0.15%)
Cycle count: 194629 -> 194684 (+0.03%); split: -0.00%, +0.03%
Fixes: 947c828d5c ("i965/fs: Add a saturation propagation optimization pass.")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29774>
(cherry picked from commit 3d8fea0e09)
Using ior here is equivalent to using uadd_sat, but works for every driver
and shouldn't hurt anywhere.
I forgot to fix this up when fixing up some vvl errors with zink.
Fixes crashes with the integer_ctz CL CTS tests in zink.
Fixes: 39ec184db6 ("zink: lower 64 bit find_lsb, ufind_msb and bit_count")
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30535>
(cherry picked from commit 48acf9d358)
* close(fd) requires also resetting the fd=-1 or else boom
* checking just driver_name is broken because loader_get_driver_for_fd()
uses MESA_LOADER_DRIVER_OVERRIDE, so there's no way to differentiate
an inferred load
Fixes: b907eb4750 ("egl: don't bind zink under dri2/3")
Acked-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30556>
(cherry picked from commit 1a579552af)
In utrace timestamp copy case cmd_buffer is NULL.
Fixes: dbbcd5c32c ("anv: factor out generation kernel dispatch into helper")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30475>
(cherry picked from commit ff8953f666)
The CL CTS started to call this API, luckily we don't have to actually
implement it, because we don't intent to support CL 1.0 only devices in
the first place (probably).
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30575>
(cherry picked from commit cd2dc4f70c)
For some purposes (e.g. advanced blending) we need a non-zero alpha
value returned from reads. This is only guaranteed on Bifrost if
we explicitly request RGB1 component ordering. The default is to use
RGBA component ordering, which for R5G6B5 causes 0 to be read for
alpha.
A complication is that the Mali fixed function hardware requires
four components (which implies RGBA rather than RGB1). If fixed
function blending is in use, we modify the pixel format back to
RGBA when building the blend descriptor.
Cc: mesa-stable
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29606>
(cherry picked from commit 004e0eb3ab)
We have to swizzle the border color in order to offset the
automatic swizzling introduced to compensate for limited
component order support in AFBC/AFRC. However, the border color
format is only available if the `TEXTURE_BORDER_COLOR_QUIRK` is
enabled, so set that for v10 (it was already set for v7).
While testing, we uncovered another issue: valhall introduces a
swizzle for depth+stencil formats that isn't present for bifrost, and
also isn't needed (or wanted) for the border color. So ignore the
border color swizzle for depth+stencil on valhall (on bifrost the
swizzle is a no-op anyway).
Fixes: 87aad0a5e4 ("panfrost: encode component order as an inverted swizzle (v10)")
Reviewed-by: Louis-Francis Ratté-Boulianne <lfrb@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30542>
(cherry picked from commit 3135f76331)
Before the patch, intel_device_info_get_max_preferred_slm_size()
returns values in kilobytes, but then
intel_device_info_get_max_slm_size() is multiplying it by 1024.
As a result, LNL is reporting maxComputeSharedMemorySize to be
134217728, which is 128mb.
Fix this by making intel_device_info_get_max_slm_size() not multiply
it by 1024.
This should fix at least the following dEQP tests:
dEQP-VK.compute.pipeline.zero_initialize_workgroup_memory.max_workgroup_memory.1
dEQP-VK.compute.pipeline.zero_initialize_workgroup_memory.max_workgroup_memory.128
dEQP-VK.compute.pipeline.zero_initialize_workgroup_memory.max_workgroup_memory.16
dEQP-VK.compute.pipeline.zero_initialize_workgroup_memory.max_workgroup_memory.2
dEQP-VK.compute.pipeline.zero_initialize_workgroup_memory.max_workgroup_memory.4
dEQP-VK.compute.pipeline.zero_initialize_workgroup_memory.max_workgroup_memory.64
Some tests were failing with:
deqp-vk: ../../src/intel/common/intel_compute_slm.c:24: slm_encode_lookup: Assertion `kbytes <= table[table_len - 1].size_in_kb' failed.
while other tests were triggering the OOM.
v2:
- Make everybody return sizes in bytes (José).
v3:
- Rename variable to bytes (José, Jordan).
Fixes: fd368f5521 ("anv: Set maxComputeSharedMemorySize value for Xe2 platforms")
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30541>
(cherry picked from commit 0e38b794e2)
This reverts commit d6bb4ddc63.
Fixes: d6bb4ddc63 ("d3d12: Video Encode - Remove PIPE_VIDEO_PROFILE_MPEG4_AVC_BASELINE as not supported")
PIPE_VIDEO_PROFILE_MPEG4_AVC_BASELINE is necessary for some scenarios like the example below
described in https://github.com/microsoft/WSL/issues/11838
gst-launch-1.0 -v videotestsrc num-buffers=250 !
video/x-raw,width=1920,height=1200 !
vaapipostproc !
vaapih264enc !
filesink location=~/wsl_test.h264
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30548>
(cherry picked from commit a0f1a708c4)
This is more correct than the previous code and the CL CTS relies on edge
case behavior here, e.g. for 1Dbuffer images.
I think part of that is not actually required by the spec, but whatever.
Fixes: 7b22bc617b ("rusticl/memory: complete rework on how mapping is implemented")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30528>
(cherry picked from commit 2484331e82)
An application could map and unmap a host ptr allocation multiple times,
but because how the refcounting works, we might never ended up syncing the
written data to the mapped region.
This moves the refcounting out of the event processing.
Fixes: 7b22bc617b ("rusticl/memory: complete rework on how mapping is implemented")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30528>
(cherry picked from commit 1fa288b224)