Correlate the screen roots with the xcb_window_t root.
This mapping should be static.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Reviewed-and-tested-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39551>
Makes it possible to query refresh rates of screens.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Reviewed-and-tested-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39551>
Somewhat surprising this was not done already.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Reviewed-and-tested-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39551>
Allows these helpers to be used for X11 WSI as well.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Acked-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39551>
These will be implemented entirely in the register allocator as register
assignment constraints (and possibly a copy in the case of OpRegOut).
Only OpRegIn is implemented in the trivial RA. OpRegOut will have to
wait for the real RA.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42344>
The index for shared memory should be set to TGSI_MEMORY_TYPE_SHARED
which matches the index used in the declaration.
Fixes spec@arb_compute_shader@execution@shared-atomic* with SVGA driver
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16458>
Adds NVK_EXPERIMENTAL=dlss_backwards_compat
Allows using a SASS binary with a matching major version
number, but smaller minor number than the device.
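
A minimal sketch of the version check described above. The helper name,
the "major * 10 + minor" encoding (75 for sm_75) and the exact-match
behaviour without the flag are assumptions for illustration, not taken
from the NVK sources:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: SM versions are encoded as major * 10 + minor,
 * e.g. 75 for sm_75 and 89 for sm_89 (assumed encoding). */
static bool
sass_is_compatible(uint32_t binary_sm, uint32_t device_sm,
                   bool backwards_compat)
{
   /* An exact match is always usable. */
   if (binary_sm == device_sm)
      return true;

   /* Assumption: without the experimental flag nothing else is accepted. */
   if (!backwards_compat)
      return false;

   /* Same major version, and a smaller minor version than the device. */
   return binary_sm / 10 == device_sm / 10 &&
          binary_sm % 10 < device_sm % 10;
}
```

Under these assumptions an sm_86 binary would be accepted on an sm_89
device only when the flag is set, and an sm_75 binary never would.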
Reviewed-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Tested-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40686>
This commit implements the subset of cubin launch functionality in
VK_NVX_binary_import used by DLSS.
With this, DLSS works in Control on an RTX 2060 Super (sm_75) and an
RTX 4060 (sm_89).
Right now, this will only work where there is compatible bytecode
available for the current physical device and will return an error in
vkCreateCuModuleNVX if none is present.
DXVK-NVAPI and DLSS handle this error gracefully and will disable the
affected features.
The NVIDIA driver would do PTX -> bytecode on the fly to handle this,
but we don't have PTX->NIR yet, and that is a very large undertaking
not done by this MR.
This doesn't fully close #12439, as we don't have PTX->NIR, but
is a big step towards it.
Signed-off-by: Autumn Ashton <misyl@froggi.es>
Reviewed-by: Mohamed Ahmed <mohamedahmedegypt2001@gmail.com>
Reviewed-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Tested-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40686>
These cubin and fatbin parsers implement the subset of functionality
needed to launch the modern CUDA kernels used for DLSS.
Co-authored-by: Mary Guillemard <mary@mary.zone>
Signed-off-by: Autumn Ashton <misyl@froggi.es>
Reviewed-by: Mohamed Ahmed <mohamedahmedegypt2001@gmail.com>
Reviewed-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Acked-by: Karol Herbst <kherbst@redhat.com>
Tested-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40686>
This allows dispatching with a caller-provided root descriptor,
primarily for cubin kernel launches.
Signed-off-by: Autumn Ashton <misyl@froggi.es>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Reviewed-by: Mohamed Ahmed <mohamedahmedegypt2001@gmail.com>
Reviewed-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Tested-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40686>
The cubin kernel launches need to use a root descriptor that's
compatible with the bytecode that nvcc generates. That descriptor
contains block dim, grid dim and the kernel params at specific layouts,
which can be influenced by ELF .nv.info attributes.
Thus, expose the ability to input custom root descriptors
in nvk_cmd_upload_qmd.
Signed-off-by: Autumn Ashton <misyl@froggi.es>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Reviewed-by: Mohamed Ahmed <mohamedahmedegypt2001@gmail.com>
Reviewed-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Tested-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40686>
Previously this was only accessible from Rust,
but VK_NVX_binary_import needs to calculate this
for imported cubin kernels from EIATTR_REGCOUNT.
Signed-off-by: Autumn Ashton <misyl@froggi.es>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Reviewed-by: Mohamed Ahmed <mohamedahmedegypt2001@gmail.com>
Reviewed-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Tested-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40686>
There's no longer a need for the panvk_sparse library, or for panvk to care
about whether the KMD can do native sparse mapping. Submit sparse VM
bindings as a single operation and let pankmod handle the gory details.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40400>
On top of that, leverage the new push/flush interface so that management of
the black hole in older KMD versions can be handled by the pankmod layer.
Merging of operations is now done in conjunction with buffering the latest
submission, so that the very last operation can have its signal syncs
assigned before being delivered to the pankmod layer.
Co-developed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40400>
The goal is moving the need for prebuffering when the total number of
vm_bind operations isn't known in advance away from panvk, and into the
pankmod layer, and also to consolidate that treatment in a single place. At
the moment, both panvk_vX_bind_queue.c and panvk_sparse.c roll their own
workarounds for the blackhole-mapping sparse bind mechanism.
For older KMD versions with no sparse mapping support, emulate it by
cyclically mapping over a dummy BO, which is allocated on demand and per
VM. This behaviour is similar to that of the Panthor KMD.
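
A rough sketch of the cyclic mapping idea, assuming the sparse range is
split into chunks that each map the start of the dummy BO. The struct and
function names are hypothetical and do not mirror the pankmod code:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one emitted mapping. */
struct map_chunk {
   uint64_t va;        /* GPU virtual address where the chunk starts */
   uint64_t bo_offset; /* offset into the dummy BO, always 0 here */
   uint64_t size;      /* chunk size in bytes */
};

/* Split [va, va + size) into chunks no larger than the dummy BO, each
 * mapping the BO from offset 0. Returns the number of chunks required
 * and writes at most max_chunks of them into chunks[]. */
static size_t
emulate_sparse_map(uint64_t va, uint64_t size, uint64_t dummy_bo_size,
                   struct map_chunk *chunks, size_t max_chunks)
{
   size_t count = 0;

   if (dummy_bo_size == 0)
      return 0;

   while (size > 0) {
      uint64_t chunk = size < dummy_bo_size ? size : dummy_bo_size;

      if (count < max_chunks)
         chunks[count] = (struct map_chunk){ va, 0, chunk };

      va += chunk;
      size -= chunk;
      count++;
   }

   return count;
}
```

Because the chunk count depends on the range size, a caller cannot know
the number of vm_bind operations up front, which is the situation the
prebuffering mentioned above has to deal with.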
This moves responsibility over whether to use native KMD sparse mapping or
the blackhole method into the pankmod layer, so that the sparse mapping
mechanism is transparent to the Vulkan driver. Also disallow automatic VA
assignment when sparse emulation is required, because relaying auto VAs
back to the caller is both cumbersome and unsafe, and it is not a practical
use case anyway.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40400>
Register whether the underlying KMD supports sparse mappings in a device
property. Add a new VM operation field that holds flags; for the time being,
only sparse is a valid operation modifier. Disallow sparse operations when
an automatic VA is requested or when a BO is provided accidentally.
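
The validation rules above could look roughly like this. The flag bit,
the auto-VA marker and the struct layout are placeholders chosen for the
example, not the actual pan_kmod definitions:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VM_OP_FLAG_SPARSE (1u << 0) /* hypothetical flag bit */
#define VM_AUTO_VA UINT64_MAX       /* hypothetical "pick a VA" marker */

/* Hypothetical, simplified VM operation. */
struct vm_op {
   uint32_t flags;
   uint64_t va;
   const void *bo; /* NULL when no BO backs the mapping */
};

static bool
vm_op_is_valid(const struct vm_op *op)
{
   /* Sparse is the only known operation modifier for now. */
   if (op->flags & ~VM_OP_FLAG_SPARSE)
      return false;

   if (op->flags & VM_OP_FLAG_SPARSE) {
      /* Sparse operations need an explicit VA and must not carry a BO. */
      if (op->va == VM_AUTO_VA || op->bo != NULL)
         return false;
   }

   return true;
}
```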
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40400>
A future commit will want to have a binary sync object attached to a
vm_bind operation or a sync operation only, so rather than creating a
separate pankmod flag for it, we simply check the point (always 0).
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40400>
This is done in preparation of kmod support for blackhole sparse mappings.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40400>
Instead of hard-coding available page sizes in UM, have pan_kmod
backends query the KMD when these are exposed by the kernel.
This is not yet done for Panfrost, but it might be added soon.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40400>
Fixes a compiler warning regarding the assertion.
Fixes: 6d6a3ab679 ("v3dv: asserts push constants data is valid")
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42269>
Two bugs in lp_build_fetch_rgba_aos's small_unorm fast path:
- vector_justify=true shifted the loaded value into the MSB of the wider
type on big-endian. The format_desc already carries
big-endian-corrected channel shifts, so the extra shift broke channel
extraction for sub-32-bit formats (e.g. R8G8B8, B5G5R5).
- The output OR-loop packed channels assuming little-endian byte order
(shift = j * width), so after bitcast to vec4-u8 on big-endian the
alpha channel landed at byte[0] instead of byte[3].
The fix is simple: gather with vector_justify=false so format_desc
shifts apply directly; use (3-j)*width on UTIL_ARCH_BIG_ENDIAN to match
the memory layout that big-endian bitcast produces.
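
To illustrate the second bug, here is a scalar sketch of the OR-loop for
four 8-bit channels. It is plain C rather than the LLVM IR builder code,
and the function name and the big_endian parameter (standing in for
UTIL_ARCH_BIG_ENDIAN) are made up for the example:

```c
#include <stdbool.h>
#include <stdint.h>

/* Pack four 8-bit channels (index 0 = red ... 3 = alpha) into one 32-bit
 * word so that reinterpreting the word as four bytes in memory puts
 * channel j at byte[j]. */
static uint32_t
pack_rgba8(const uint8_t chan[4], bool big_endian)
{
   const unsigned width = 8;
   uint32_t packed = 0;

   for (unsigned j = 0; j < 4; j++) {
      /* Little-endian: channel 0 lives in the least significant byte.
       * Big-endian: channel 0 lives in the most significant byte. */
      unsigned shift = big_endian ? (3 - j) * width : j * width;
      packed |= (uint32_t)chan[j] << shift;
   }

   return packed;
}
```

With channels { 0x11, 0x22, 0x33, 0x44 } this yields 0x44332211 for the
little-endian shifts and 0x11223344 for the big-endian ones. Both values
place 0x11 at byte[0] when stored on the matching architecture.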
This fixes the lp_test_format test on big-endian platforms.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42228>
The software fallback in lp_build_round (used when
arch_rounding_available returns false, e.g. altivec with length < 4)
used lp_build_iround's bias-and-truncate path, which rounds
half-away-from-zero due to float32 rounding of the (a + nextafterf(0.5))
sum. This caused lp_test_arit failures for v1 and v2 vector widths on
ppc64.
For altivec/VSX, llvm.nearbyint lowers to vrfin (AltiVec) or xvrspic
(VSX) — both single instructions that round to nearest-even — for any
vector width. Use it in the else branch when has_altivec is set,
preserving the lp_build_iround path for x86 pre-SSE4.1 where
llvm.nearbyint would expand to scalar nearbyintf calls.
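
The rounding problem in the software fallback can be reproduced in scalar
C. This is a sketch of the bias-and-truncate approach as described above,
not the lp_build_iround code itself. The hex constant is the largest
float below 0.5, i.e. the value nextafterf(0.5f, 0.0f) returns:

```c
/* Bias-and-truncate rounding: add just under 0.5 with the sign of the
 * input, then truncate towards zero. */
static int
round_bias_truncate(float a)
{
   /* 0x1.fffffep-2f == 0.5 - 2^-25, the largest float below 0.5. */
   const float bias = a < 0.0f ? -0x1.fffffep-2f : 0x1.fffffep-2f;

   /* volatile forces the sum to be rounded to float32 precision. */
   volatile float sum = a + bias;

   return (int)sum;
}
```

For a = 2.5f the exact sum is 3 - 2^-25, which is not representable in
float32 and rounds up to 3.0f, so the function returns 3. Rounding to
nearest-even would give 2 for the same input.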
Update the length==2 expected-failure condition in lp_test_arit to
exclude altivec (now fixed), keeping it for other platforms that still
use the software fallback.
This fixes the lp_test_arit test on ppc64.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42227>
We pass the found variables as a pointer set to the driver. Since the
callback is supposed to be used for global decisions, the driver might
end up picking different variables based on the (non-deterministic)
iteration order of the set. Fix this by passing the variables as a
util_dynarray instead.
To make sure the contents of the util_dynarray don't have to be shuffled
around every time the driver wants to remove a variable from it,
introduce nir_variable::pass_flags that we use to create an intrusive
ordered set using a util_dynarray.
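
The intrusive ordered set can be pictured as follows. This is a
simplified stand-in that uses a plain pointer array instead of
util_dynarray; the struct, the flag value and the helper names are
hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VAR_IN_SET 1u /* hypothetical flag value */

/* Stand-in for nir_variable, reduced to what the example needs. */
struct var {
   const char *name;
   uint32_t pass_flags; /* scratch field owned by the running pass */
};

struct var_set {
   struct var **items; /* insertion-ordered, entries are never moved */
   size_t count;
};

static void
var_set_init(struct var_set *set, struct var **items, size_t count)
{
   set->items = items;
   set->count = count;
   for (size_t i = 0; i < count; i++)
      items[i]->pass_flags = VAR_IN_SET;
}

/* Removal only clears the flag, so the array stays untouched. */
static void
var_set_remove(struct var *v)
{
   v->pass_flags = 0;
}

static bool
var_set_contains(const struct var *v)
{
   return v->pass_flags == VAR_IN_SET;
}

/* Returns the index of the next member at or after i, or set->count. */
static size_t
var_set_next(const struct var_set *set, size_t i)
{
   while (i < set->count && !var_set_contains(set->items[i]))
      i++;
   return i;
}
```

Iteration always follows insertion order, so a driver that removes
entries and then walks the remaining ones sees the same sequence on every
run.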
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42168>
The previous fix used `grep -P` which is not supported by the grep
implementation used in this job, so replace it with `grep -E` + `cut`
which is supported by that implementation.
Fixes: df3756e6dc ("ci: fix perfetto download in `make-git-archive` nightly job")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42331>
128-bit render targets are emulated as paired G32R32F targets. There is no
integer 64-bit PE format, so the integer formats also render through
G32R32F, as the blob does. The real hardware requirement is the half-float
pipe that provides G32R32F, so gate on HALF_FLOAT instead of the
conservative halti5 level. This enables the formats on older GPUs that have
the pipe.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42201>
128-bit formats (RGBA32) are emulated as two stacked G32R32 planes. The
bound sampler reads the RG plane and a companion sampler reads the BA plane,
which etna_nir_lower_128bit(..) reassembles in the shader. Only the
descriptor path set up the companion, so the state path could not sample
these formats. Set up the companion on the state path too and share
companion_slot(..) between both paths.
The real requirement is the plane format, not the descriptors. The float
plane G32R32F samples through the half-float pipe, so gate it on HALF_FLOAT
and advertise GL_OES_texture_float, also on halti2 GPUs like GC3000. The
integer plane G32R32I needs halti5, so keep the integer formats there.
The KHR-GLES2 internalformat tests for sized RGB32F/RGBA32F need an ES3
context, so list them as expected fails on GC3000 too.
Verified on GC7000 with and without ETNA_MESA_DEBUG=no_texdesc.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42201>
sRGB encoding was only handled through the global PE.LOGIC_OP SRGB bit,
which the hardware applies to the primary render target alone. An sRGB
surface bound to any other MRT slot was written as linear.
Fixes dEQP-GLES3.functional.fragment_out.random.{1,17,39,64,86,93,96}.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Daniel Lang <dalang@gmx.at>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42201>
PE.MEM_CONFIG.COLOR_TS_MODE is a single global field, so every TS-enabled
color render target in a framebuffer has to share one TS mode. With
CACHE128B256BPERLINE the mode is picked per resource (256B for compressible
formats, 128B otherwise), so a compressible format bound next to an integer
format disagrees and the odd target gets decoded in the wrong mode, reading
back as the clear color.
The blob keeps TS on the targets that match the global mode and disables it
only on the odd one, instead of giving up TS for the whole framebuffer.
Compute a per-RT TS mask once in etna_set_framebuffer_state(..), store it in
etna_framebuffer_state and reuse it when arming the BLT fast clear, so the
two consumers stay consistent by construction. A disabled target keeps its
tile status allocated, so it recovers once a later framebuffer is compatible
again.
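
A sketch of how such a per-RT mask might be computed. How the global mode
is selected is not covered here and is passed in as a parameter; the
types and names are illustrative and not the etnaviv ones:

```c
#include <stdint.h>

enum ts_mode { TS_MODE_128B, TS_MODE_256B };

/* Hypothetical, reduced view of a bound color render target. */
struct rt_info {
   int has_ts;           /* non-zero if tile status is allocated */
   enum ts_mode ts_mode; /* mode picked for this resource */
};

/* Returns a bitmask with bit i set when render target i can keep TS
 * enabled, i.e. its per-resource mode matches the global mode. */
static uint32_t
compute_ts_mask(const struct rt_info *rts, unsigned num_rts,
                enum ts_mode global_mode)
{
   uint32_t mask = 0;

   for (unsigned i = 0; i < num_rts && i < 32; i++) {
      if (rts[i].has_ts && rts[i].ts_mode == global_mode)
         mask |= 1u << i;
   }

   return mask;
}
```

Computing the mask once and storing it means the framebuffer setup and
the fast-clear path read the same value instead of re-deriving it.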
Fixes 23 dEQP-GLES3.functional.draw_buffers_indexed.random.* cases that mix
integer and unorm render targets, with no regression in fbo.color or fbo.blit.
Fixes: d70531ca93 ("etnaviv: Extend etna_update_ts_config(..) for MRTs")
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Daniel Lang <dalang@gmx.at>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42201>
The 128-bit emulation now covers the clear, blit, copy and sample paths,
so stop rejecting the three emulated RGBA32 formats. The format table is
the remaining filter. Sampling still relies on the halti5 texture
descriptors, so halti5 is the gate.
Sampling RGBA32F enables GL_OES_texture_float, and with the existing
half-float support also GL_ARB_texture_float, so advertise both.
The KHR-GLES2 internalformat tests for sized RGB32F/RGBA32F need an ES3
context, so they fail on the ES2 driver. List them as expected fails, as
other ES2 drivers do.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42201>
nir_lower_fragcolor(..) expands a broadcast gl_FragColor into one store
per render target. It was passed specs->num_rts, the physical HW count,
but on HALTI2 only half of them are advertised (caps.max_render_targets)
since the upper half is reserved for float and 128-bit format emulation
companions.
A broadcast shader thus wrote into the reserved slots. For a 128-bit
target the clear meta shader stores to every gl_FragData and overwrote
the BA companion plane filled by etna_nir_lower_128bit(..), so the clear
came back with the RG half replicated into BA.
Pass the advertised count instead to keep the broadcast inside the user
visible range.
Fixes: 928a276b78 ("etnaviv: Limit max supported render targets")
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Daniel Lang <dalang@gmx.at>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42201>
A 128-bit color level is laid out as two stacked G32R32F planes, so
clear it with two 64bpp RS fills, the RG half at the level offset and
the BA half at the second-plane offset.
A cache flush and stall separate the two fills. etna_clear_rs(..) needs
the same flush between its color and depth clears to avoid a GC600 hang,
and the blob brackets every RS operation this way. The blob clears
RGBA32F render targets through RS with the same plane split, verified
with a cmdstream capture on a faked GC7000 rev 6204 identity.
Fixes dEQP-GLES3.functional.fbo.color.repeated_clear.* for 128-bit
formats on RS-only halti5 hardware.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Daniel Lang <dalang@gmx.at>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42201>
etna_blit_save_state(..) saved the expanded framebuffer including the
appended companion slots. The util_blitter restore goes through
etna_set_framebuffer_state(..), which appends companions again, so every
blitter round trip with a 128-bit color buffer bound grew nr_cbufs until
the expansion assert fired.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Daniel Lang <dalang@gmx.at>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42201>
The 128-bit emulation stores all RG halves in the first half of the BO
and all BA halves in the second half. The sampler descriptors, the CPU
upload and the BLT clear all compute the second plane as
(size * depth) / 2.
etna_try_blt_blit(..) advanced source and destination by layer_stride
instead, an interleaved layout nothing else uses. For single-layer 2D
targets both formulas coincide, so plain blits worked, but per-layer
blits of a multi-layer 128-bit array texture corrupted the BA half of
every layer. Use the same (size * depth) / 2 offset as the rest of the
emulation.
Fixes: 1f60a0397b ("etnaviv: blt: Support 128 bit blit operations")
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Daniel Lang <dalang@gmx.at>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42201>