st_texture_set_sampler_view() currently allows only one samplerview for
a given texobj per context. In a scenario where the same texobj is
bound multiple times with different samplerviews (e.g., SRGB) for the
same draw, like
samplerviews[] = {view0, view1}
st_texture_set_sampler_view() will release view0 while creating view1
before either view is actually set to the driver, and then the driver
will explode.
This is gross, but the best solution which avoids infinite memory ballooning
from bufferview offsets is to pass the array of views through during creation
to ensure that the cache doesn't try to prune a view it just created.
Caught by Left 4 Dead 2.
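The pruning hazard can be modeled with a toy cache. This is only a sketch under stated assumptions: struct view, struct view_cache, cache_get_view() and the fixed-size pool are hypothetical names invented for illustration and are not the Mesa state tracker API. The only idea taken from the text above is that the creation path receives the array of views already picked for the draw and refuses to prune any of them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { POOL_SIZE = 8 }; /* hypothetical cache capacity */

struct view {
   int format;   /* stands in for whatever distinguishes two views */
   bool in_use;
};

struct view_cache {
   struct view pool[POOL_SIZE];
};

/* True if v is one of the views already selected for the current draw. */
static bool is_kept(const struct view *v, struct view *const *keep,
                    size_t num_keep)
{
   for (size_t i = 0; i < num_keep; i++) {
      if (keep[i] == v)
         return true;
   }
   return false;
}

/* Return a view for `format`, creating one if needed. Creation prunes
 * every cached view that is not listed in `keep`. */
static struct view *cache_get_view(struct view_cache *cache, int format,
                                   struct view *const *keep, size_t num_keep)
{
   struct view *free_slot = NULL;

   /* Reuse a live view with a matching format. */
   for (size_t i = 0; i < POOL_SIZE; i++) {
      struct view *v = &cache->pool[i];
      if (v->in_use && v->format == format)
         return v;
   }

   /* Prune unprotected views and remember the first free slot. */
   for (size_t i = 0; i < POOL_SIZE; i++) {
      struct view *v = &cache->pool[i];
      if (v->in_use && !is_kept(v, keep, num_keep))
         v->in_use = false;
      if (!v->in_use && free_slot == NULL)
         free_slot = v;
   }

   if (free_slot == NULL)
      return NULL; /* every slot is protected */

   free_slot->format = format;
   free_slot->in_use = true;
   return free_slot;
}
```

Called with an empty keep list, the second lookup recycles the slot that the first lookup returned, so the caller's view0 pointer silently becomes view1. Passing the views collected so far as the keep list makes the second lookup take a different slot.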
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/15045
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
(cherry picked from commit 3264adf863)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40488>
We can produce a transposed value sometimes, and we have to make sure
that val->transposed is also updated when that happens.
Noticed by inspection after the previous commit.
Cc: mesa-stable
(cherry picked from commit c13bdaaa40)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40488>
Since the stack pointer may wrap around the stack size in overflow
cases, traversal logic calculates the real stack pointer with
nir_umod_imm(b, stack, args->stack_entries * args->stack_stride).
For ray queries, "stack" was initialized to
"stack_base + local_invocation_idx * 4". This was completely broken, as
the umod would later delete the stack base completely and overwrite the
start of LDS, which belongs to the apps' shared memory.
Instead, add the stack base as a constant offset in the load/store_stack
callback. (This should also save 1 VALU per ray query)
Also, delete radv_ray_traversal_args::stack_base since it's unused now.
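The arithmetic behind the bug can be shown on the CPU. This is a sketch with made-up sizes: STACK_ENTRIES, STACK_STRIDE and both helper names are hypothetical and only mirror the expressions quoted above, not the actual radv code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sizes, chosen only to make the numbers easy to follow. */
enum { STACK_ENTRIES = 16, STACK_STRIDE = 4 };
#define STACK_SIZE ((uint32_t)(STACK_ENTRIES * STACK_STRIDE))

/* Old scheme: the base is folded into the stack pointer, so the wrap
 * (the umod) also strips the base and the access lands near LDS offset 0. */
static uint32_t lds_addr_base_in_pointer(uint32_t stack_base,
                                         uint32_t invocation_idx)
{
   uint32_t stack = stack_base + invocation_idx * 4;
   return stack % STACK_SIZE;
}

/* New scheme: wrap the base-relative pointer first, then add the base
 * as a constant offset at load/store time. */
static uint32_t lds_addr_base_as_offset(uint32_t stack_base,
                                        uint32_t invocation_idx)
{
   assert(stack_base <= UINT32_MAX - STACK_SIZE); /* no overflow */
   uint32_t stack = invocation_idx * 4;
   return stack_base + stack % STACK_SIZE;
}
```

With a base of 1024 and invocation index 3, the old scheme yields address 12, which is inside the region before the stack, while the new scheme yields 1036.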
Cc: mesa-stable
(cherry picked from commit b046eaf36d)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40488>
Vertex shaders shorter than four instructions can hard-lock R3xx GPUs.
This seems to happen in combination with a small vertex count. This was
seen before, most notably with dummy shaders, but the earlier fix only
removed those dummy shaders, so some occurrences could still slip
through the cracks. Pad all vertex shaders to four instructions on R3xx.
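A minimal sketch of the padding step, assuming a flat instruction array: struct vs_inst, its single-word encoding, the nop value and pad_vertex_shader() are all hypothetical and do not reflect the real r300 compiler data structures or where the driver actually places the padding.

```c
#include <assert.h>
#include <stdint.h>

enum { MIN_VS_INSTRUCTIONS = 4 }; /* minimum length named in the text */

/* Hypothetical, simplified instruction encoding. */
struct vs_inst {
   uint32_t word;
};

/* Append no-op instructions until the program has at least
 * MIN_VS_INSTRUCTIONS entries. Returns the new instruction count. */
static unsigned pad_vertex_shader(struct vs_inst *insts, unsigned count,
                                  unsigned capacity, struct vs_inst nop)
{
   assert(capacity >= MIN_VS_INSTRUCTIONS);
   assert(count <= capacity);

   while (count < MIN_VS_INSTRUCTIONS)
      insts[count++] = nop;

   return count;
}
```

Programs that already have four or more instructions are returned unchanged, so the padding only costs anything for the short shaders that trigger the lockup.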
Reviewed-by: Filip Gawin <filip@gawin.net>
Fixes: c6aa639ba9 ("r300: skip draws instead of using a dummy vertex shader")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/337
(cherry picked from commit 9b12664b72)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40488>
GFX12 encoding added one bit to the stack offset, doubling the limit on
the stack base offset that is possible to encode. In practice, this
always allows using bvh_stack_push* instructions on GFX12 since LDS is
still 64kB.
Cc: mesa-stable
Fixes: 59a39779 (radv/rt: Only use ds_bvh_stack_rtn if the stack base is possible to encode)
(cherry picked from commit 867d0b33b3)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40488>
Needed for some FSR macro changes I want to test.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Fixes: 7d6cc15ab8 ("nvk/mme: Add a unit test framework for driver macros")
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
(cherry picked from commit 32895657b4)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40488>
The correct dependency is for cs_flush_caches.cs_defer.signal to
signal cs_sync32_set.cs_defer.wait in the occlusion query path.
Fixes: 443ddac ("panvk/csf: merge v10 and v11 paths in issue_fragment_jobs")
Fixed: many random failures in VK-GL-CTS 1.4.4.2, e.g.
dEQP-VK.query_pool.occlusion_query.get_results_conservative_size_64_wait_query_without_availability_draw_points_clear_color
Signed-off-by: Ryan Zhang <ryan.zhang@nxp.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
(cherry picked from commit 93b58064f7)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40488>
Normally Venus on Nvidia GPUs takes the prime blit path. The exception
is when KWin or any wlroots-based compositor is used:
1. KWin and wlroots-based compositors always add LINEAR to dmabuf
feedback tranches, assuming LINEAR can be handled by GPU drivers.
2. Venus + Virgl only sees the compositor-injected LINEAR mod since
Virgl doesn't support explicit modifiers on the driver side.
3. Nvidia GPUs don't support LINEAR color attachments, and it's too
late to reject the LINEAR mod when the native image path has already
been taken instead of the prime image path.
Gamescope requires VK_EXT_physical_device_drm and its runtime doesn't
use standard WSI extensions, so Venus can spoof without impacting it.
Cc: mesa-stable
(cherry picked from commit 1a302155ee)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40488>
"Explicit sw" means llvmpipe, which cannot be a real drm device. This also
requires returning only a single device so as to avoid leaking non-sw drivers.
This should fix LIBGL_ALWAYS_SOFTWARE=1 eglinfo.
Fixes: 8a339cdebc ("egl: fix sw fallback rejection in non-sw EGL_PLATFORM=device")
(cherry picked from commit c9b2986607)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
The WA states that we need to set the maximum number of stackIDs per DSS
in RT_DISPATCH_GLOBALS to 2048.
We can still throttle/control CFE_STATE::StackID to be in the range
specified by that field.
Having CFE_STATE::stackIDs capped to 2K by default does impact
performance. The more outstanding ray queries there are, the larger the
working set and the greater the impact on cache hit rate.
This affects performance on Xe2+:
* Boundary Benchmark: 36.2%
* Solar Bay extreme: 9.8%
* Hitman world of assassination: 3.9%
Fixes: c1a44e8d43 ("anv: force StackIDControl value for Wa_14021821874")
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
(cherry picked from commit cb423ee636)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
Previously, we assumed that the selector for bcsel could be whatever,
regardless of the bit sizes of the data and we'd just fix it in the
back-end. This works okay for scalars but falls over the moment we
vectorize because all our vector handling assumes bit sizes match.
Since matching bit sizes is what the hardware wants anyway, it's better
to do the right thing in NIR and hope copy-propagation can fold in
conversions if needed.
Unfortunately, copy prop isn't that smart yet so this does hurt a bit:
Instrs: 1193679 -> 1198086 (+0.37%); split: -0.06%, +0.43%
CodeSize: 11915136 -> 11950592 (+0.30%); split: -0.05%, +0.34%
Full: 160985 -> 160941 (-0.03%); split: -0.04%, +0.01%
Estimated normalized CVT cycles: 4456.938557000181 -> 4480.876069000186 (+0.54%); split: -0.13%, +0.67%
Estimated normalized SFU cycles: 6350.9375 -> 6392.21875 (+0.65%)
Estimated normalized Load/Store cycles: 205773.0 -> 205795.0 (+0.01%)
Maximum number of threads: 12864 -> 12863 (-0.01%)
Number of spill instructions: 22487 -> 22489 (+0.01%)
Number of fill instructions: 52179 -> 52219 (+0.08%)
Hurt shaders:
google-meet-clvk/BgBlur
google-meet-clvk/Relight
parallel-rdp/small_subgroup
parallel-rdp/small_uber_subgroup
The proper solution here is to teach copy-prop about this stuff so that
it can propagate swizzles into ALU ops when they're supported:
https://gitlab.freedesktop.org/panfrost/mesa/-/issues/265
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14945
Cc: mesa-stable
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
(cherry picked from commit 3fd471dca5)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
It calls both for some reason but never handles any booleans other than
32-bit. This was probably a mistake.
Fixes: e63a7882a0 ("etnaviv: call nir_lower_bool_to_bitsize")
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
(cherry picked from commit 6fb3995659)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
This should be doing an OR and not an assignment.
This fixes issues on NVK with mesh stages on DGC.
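The difference between the two operators is easy to see on a bitmask. The message does not say which field is affected, so this sketch assumes a hypothetical stage mask: the STAGE_* bits and both helper functions are illustrative only and are not taken from the Vulkan runtime code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stage bits, for illustration only. */
enum {
   STAGE_TASK     = 1u << 0,
   STAGE_MESH     = 1u << 1,
   STAGE_FRAGMENT = 1u << 2,
};

/* Buggy pattern: each iteration overwrites the mask, so only the
 * last stage survives. */
static uint32_t collect_stages_assign(const uint32_t *stages, int count)
{
   uint32_t mask = 0;
   for (int i = 0; i < count; i++)
      mask = stages[i];
   return mask;
}

/* Intended pattern: each iteration accumulates into the mask. */
static uint32_t collect_stages_or(const uint32_t *stages, int count)
{
   uint32_t mask = 0;
   for (int i = 0; i < count; i++)
      mask |= stages[i];
   return mask;
}
```

For the sequence task, mesh, fragment, the assigning version reports only the fragment bit, while the OR version reports all three.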
Signed-off-by: Mary Guillemard <mary@mary.zone>
Fixes: 9308e8d90d ("vulkan: Add generic graphics and compute VkPipeline implementations")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit 8f2eeee7ba)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>