One of the cpu pointers wasn't marked as read-write, causing gcc to complain:
../src/gallium/drivers/vc4/vc4_tiling_lt.c:181:17: error: output operand constraint lacks ‘=’
__asm__ volatile (
Cc: Emil Velikov <emil.l.velikov@gmail.com>
Fixes: 813f0a8296 ("vc4: Declare the cpu pointers as being modified in NEON asm.")
Currently the Intel "anvil" driver races with the generation of genxml
files, while i965 has an explicit dependency. This patch adds the same
dependency to anvil.
Fixes: d1992255bb
("meson: Add build Intel "anv" vulkan driver")
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
(cherry picked from commit 279060cd32)
For some reason, this breaks tree rendering in Project Cars.
Fixes: 85010585cd ("radv: only enable gl_SampleMask if MSAA is enabled too")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109401
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit 334da034d8)
Because none of them have been picked up for 19.0 due to this bug
being reintroduced.
v2: - Fix fixes tags
Fixes: e6b3a3b201
("bin/get-pick-list.sh: handle "typod" usecase.")
Fixes: fac10169bb
("bin/get-pick-list.sh: prefix output with "[stable] "")
Reviewed-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit aff52dd2c6)
Stop using 12.12 quantization for viewports that are not contained in
the lower 4k corner of the render target as the hardware needs to keep
both absolute and relative coordinates representable.
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Cc: 18.3 19.0 <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 3c540e0a74)
This can happen when we record a VkCmdDraw in a secondary buffer that
was created inheriting from the primary buffer, but with the framebuffer
set to NULL in the VkCommandBufferInheritanceInfo.
Vulkan 1.1.81 spec says that "the application must ensure (using scissor
if necessary) that all rendering is contained in the render area [...]
[which] must be contained within the framebuffer dimensions".
While this should be done by the application, commit 465e5a86 added the
clamp to the framebuffer size, in case the application does not do it.
But this requires knowing the framebuffer dimensions.
If we do not have a framebuffer at that moment, the best compromise we
can make is to just apply the scissor as it is, and let the application
ensure the rendering is contained in the render area.
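A minimal sketch of the clamping policy in plain C. The types and names
(`struct rect`, `clamp_scissor`, `is_primary`) are hypothetical and are not
the anv ones, and the branch order is an assumption pieced together from the
changelog notes of this patch, not a copy of the driver code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical rectangle type; min/max are inclusive pixel coordinates. */
struct rect {
    int64_t x_min, y_min, x_max, y_max;
};

static int64_t clamp_i64(int64_t v, int64_t lo, int64_t hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Clamp every edge of the scissor into the bounding rectangle. */
static struct rect clamp_to(struct rect s, struct rect bounds)
{
    s.x_min = clamp_i64(s.x_min, bounds.x_min, bounds.x_max);
    s.x_max = clamp_i64(s.x_max, bounds.x_min, bounds.x_max);
    s.y_min = clamp_i64(s.y_min, bounds.y_min, bounds.y_max);
    s.y_max = clamp_i64(s.y_max, bounds.y_min, bounds.y_max);
    return s;
}

/*
 * Assumed policy:
 *  - primary command buffer: clamp x and y to the render area
 *  - secondary with a known framebuffer: clamp to the framebuffer
 *  - secondary without a framebuffer: apply the scissor as it is
 */
static struct rect clamp_scissor(struct rect scissor, bool is_primary,
                                 const struct rect *render_area,
                                 const struct rect *framebuffer)
{
    if (is_primary && render_area != NULL)
        return clamp_to(scissor, *render_area);
    if (framebuffer != NULL)
        return clamp_to(scissor, *framebuffer);
    return scissor;
}
```

Passing NULL for `framebuffer` models the secondary command buffer that
inherited no framebuffer; the scissor then comes back unchanged.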
v2: do not clamp to framebuffer if there isn't a framebuffer
v3 (Jason):
- clamp earlier in the conditional
- clamp to render area if command buffer is primary
v4: clamp also x and y to render area (Jason)
v5: rename used variables (Jason)
Fixes: 465e5a86 ("anv: Clamp scissors to the framebuffer boundary")
CC: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit 1ad26f9417)
Check if a pixel format is supported by the Wayland server's gpu driver
before exposing it to the client via wl_drm, so we avoid reporting formats
to the client which the server gpu can't handle.
Restrict this reporting to the new color depth 30 formats for now, as the
ARGB/XRGB8888 and RGB565 formats are probably supported by every gpu under
the sun.
At the moment this is mostly useful to allow proper PRIME render offload
for depth 30 formats on the typical Intel iGPU + NVidia dGPU "NVidia
Optimus" laptop combo.
Tested on Intel, AMD, NVidia with single-gpu setup and on an Intel + NVidia
Optimus setup.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
(cherry picked from commit 820dfcea43)
Support PRIME render offload between a Wayland server gpu and a Wayland
client gpu with different channel ordering for their color formats,
e.g., between Intel drivers which currently only support ARGB2101010
and XRGB2101010 import/display and nouveau which only supports ABGR2101010
rendering and display on nv-50 and later.
In the wl_visuals table, we also store for each format an alternate
sibling format which stores colors at the same precision, but with
different channel ordering, e.g., ARGB2101010 <-> ABGR2101010.
If a given client-gpu renderable format is not supported by the server
for import, but the alternate format is supported by the server, expose
the client-gpu renderable format as a valid EGLConfig to the client. At
eglSwapBuffers time, during the blitImage() detiling blit from the client
backbuffer to the linear buffer, the client format is converted to the
server supported format. As we have to do a copy for PRIME anyway,
this channel swizzling conversion comes essentially for free.
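A rough sketch of the sibling-format lookup, assuming a table shaped like
the one described above. The enum values, `struct visual` and
`pick_server_format` are hypothetical names for illustration and do not
mirror the real wl_visuals layout.

```c
#include <stddef.h>

/* Hypothetical format identifiers, for illustration only. */
enum fmt { FMT_NONE, FMT_ARGB2101010, FMT_ABGR2101010, FMT_ARGB8888 };

struct visual {
    enum fmt format;     /* format the client gpu renders to */
    enum fmt alt_format; /* same precision, different channel ordering */
};

static const struct visual visuals[] = {
    { FMT_ARGB2101010, FMT_ABGR2101010 },
    { FMT_ABGR2101010, FMT_ARGB2101010 },
    { FMT_ARGB8888,    FMT_NONE },
};

/*
 * server_formats is a bitmask with bit (1u << fmt) set for every format the
 * server can import. Returns the format to put into the linear buffer, or
 * FMT_NONE if neither the client format nor its sibling is usable.
 */
static enum fmt pick_server_format(enum fmt client_format,
                                   unsigned server_formats)
{
    for (size_t i = 0; i < sizeof(visuals) / sizeof(visuals[0]); i++) {
        if (visuals[i].format != client_format)
            continue;
        if (server_formats & (1u << visuals[i].format))
            return visuals[i].format;      /* no conversion needed */
        if (visuals[i].alt_format != FMT_NONE &&
            (server_formats & (1u << visuals[i].alt_format)))
            return visuals[i].alt_format;  /* swizzle during the blit */
        return FMT_NONE;
    }
    return FMT_NONE;
}
```

When the result differs from `client_format`, the copy into the linear
buffer is where the channel swizzle would happen.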
Note that even if a server gpu in principle does support sampling
from the client's native format, this conversion will be a performance
advantage if it allows converting to the server's preferred format
for direct scanout, as the Wayland compositor may then be able to
directly page-flip a fullscreen client wl_buffer onto the primary
plane, or onto a hardware overlay plane, avoiding an extra data copy
for desktop composition.
Tested so far under Weston with: nouveau single-gpu, Intel single-gpu,
AMD single-gpu, "Optimus" Intel server iGPU for display + NVidia
client dGPU for rendering.
v2: Implement minor review comments by Eric Engestrom: Add some
comment and assert, and some style fixes for clarity.
No functional change.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
(cherry picked from commit a34b0d68bb)
The implementation of these opcodes in the generator assumes that their
arguments are packed, and it generates register regions based on that
assumption.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit 3918943211)
"The C standard says that compound literals which occur inside of
the body of a function have automatic storage duration associated
with the enclosing block. Older GCC releases were putting such
compound literals into the scope of the whole function, so their
lifetime actually ended at the end of containing function. This
has been fixed in GCC 9. Code that relied on this extended lifetime
needs to be fixed, move the compound literals to whatever scope
they need to be accessible in."
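A small illustration of the pattern the quoted note warns about, and one way
to fix it. `struct extent`, `area` and `pick_area` are made-up names and are
not taken from the driver.

```c
struct extent {
    int width, height;
};

static int area(const struct extent *e)
{
    return e->width * e->height;
}

/*
 * Broken pattern (relies on the old GCC behaviour): the compound literal
 * only lives until the end of the inner block, so `e` dangles afterwards.
 *
 *     const struct extent *e = custom;
 *     if (use_default) {
 *         e = &(struct extent){ 64, 64 };
 *     }
 *     return area(e);
 */
static int pick_area(int use_default, const struct extent *custom)
{
    /* Fixed: the fallback object lives in the scope where it is used. */
    const struct extent fallback = { 64, 64 };
    const struct extent *e = use_default ? &fallback : custom;

    return area(e);
}
```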
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109543
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Gustaw Smolarczyk <wielkiegie@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit 129a9f4937)
Piglit's vp-max-array test creates a vertex program containing a uniform
array sized to the value of GL_MAX_NATIVE_PROGRAM_PARAMETERS_ARB. Mesa
will then add additional state-var parameters for things like the MVP
matrix.
radeonsi currently exposes a value of 4096, derived from constant buffer
upload size. This means the array will have 4096 elements, and the
extra MVP state-vars would get a prog_src_register::Index of 4096 or more.
Unfortunately, prog_src_register::Index is a signed 13-bit integer, so
values of 4096 and above end up turning into negative numbers. Negative
source indexes are only valid for relative addressing, so this ends up
generating illegal IR.
In prog_to_nir, this would cause an out of bounds array access.
st_mesa_to_tgsi checks for a negative value, assumes it's bogus,
and remaps it to parameter 0 in order to get something in-range.
This isn't right - instead of reading the MVP matrix, it would read
the first element of the vertex program's large array. But the test
only checks that the program compiles, so we never noticed that it
was broken.
This patch lowers the program parameter limits, with the understanding
that we may need to generate additional state-vars internally. i965 has
exposed 1024 for this limit for years, so I don't expect lowering it to
2048 will cause any practical problems for radeonsi or other drivers.
Fixes vp-max-array with prog_to_nir.c.
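To make the wrap-around concrete, here is a small sketch that reinterprets
the low 13 bits of a value as a two's-complement number. `as_signed_13bit`
is a hypothetical helper; it uses explicit masking instead of a real
bit-field because storing an out-of-range value in a signed bit-field is
implementation-defined in C.

```c
/* Interpret the low 13 bits of value as a two's-complement signed integer. */
static int as_signed_13bit(int value)
{
    unsigned bits = (unsigned)value & 0x1fffu; /* keep 13 bits */

    /* Bit 12 is the sign bit: if set, subtract 2^13. */
    return (bits & 0x1000u) ? (int)bits - 0x2000 : (int)bits;
}
```

The largest index that survives is 4095; the first state-var placed after a
4096-element array wraps to -4096.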
Cc: "19.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit f45dd6d31b)
If there is no information about number of render targets
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Boyuan Zhang <boyuan.zhang@amd.com>
Cc: 19.0 <mesa-stable@lists.freedesktop.org>
(cherry picked from commit a0a52a0367)
Add "PIPE_VIDEO_PROFILE_MAX" to the enum, so that the code here stays
correct when more profiles are added in the future.
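A generic sketch of the sentinel idiom, not the actual pipe enum: the
profile names below are placeholders, and sizing an array by the sentinel is
just one common use that is assumed here for illustration.

```c
/* Placeholder profile list; the real enum has many more entries. */
enum video_profile {
    VIDEO_PROFILE_UNKNOWN,
    VIDEO_PROFILE_A,
    VIDEO_PROFILE_B,
    VIDEO_PROFILE_MAX /* sentinel: must stay the last entry */
};

/* One slot per profile; grows automatically when entries are added. */
static int supported[VIDEO_PROFILE_MAX];

static void mark_supported(enum video_profile p)
{
    /* Reject anything at or beyond the sentinel. */
    if ((unsigned)p < (unsigned)VIDEO_PROFILE_MAX)
        supported[p] = 1;
}

static int count_supported(void)
{
    int n = 0;

    for (int i = 0; i < VIDEO_PROFILE_MAX; i++)
        n += supported[i];
    return n;
}
```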
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109107
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Boyuan Zhang <boyuan.zhang@amd.com>
Cc: 19.0 <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 21cdb828a3)
wow, it's hard to believe that fence and syncobjs dependencies were ignored.
Cc: 18.3 19.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
(cherry picked from commit ddfe209a0d)
When nir_rematerialize_derefs_in_use_blocks_impl was first written, I
attempted to optimize things a bit by not bothering to re-materialize
the sources of deref instructions figuring that the final caller would
take care of that. However, in the case of more complex deref chains
where the first link or two lives in block A and then another link and
the load/store_deref intrinsic live in block B it doesn't work. The
code in rematerialize_deref_in_block looks at the tail of the chain,
sees that it's already in block B and skips it, not realizing that part
of the chain also lives in block A.
The easy solution here is to just rematerialize deref sources of deref
instructions as well. This may potentially lead to a few more deref
instructions being created, but the conditions required for that to
actually happen are fairly unlikely and, thanks to the caching, it's all
linear time regardless.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109603
Fixes: 7d1d1208c2 "nir: Add a small pass to rematerialize derefs per-block"
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
(cherry picked from commit 9e6a6ef0d4)
For some reason we don't use view volume clipping by default, and use
scissors instead. These scissors were set to an 8k max fb size, while
the driver advertises 16k-sized framebuffers.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: <mesa-stable@lists.freedesktop.org>
(cherry picked from commit cc79a1483f)
We're writing to the bo and the kernel needs to know for
fd_bo_cpu_prep() to work.
Fixes: f93e431272 ("freedreno/a6xx: Enable blitter")
Reviewed-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
(cherry picked from commit 357ea7da51)
The check was for 1 bit being set, which is clearly not what we want.
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 3c24fc64c7)
Equivalent of ANV patch c7f4a2867c
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 39ab4e12f7)
Fixes: c7b65dcaff "xvmc: Define some Xv attribs to allow users
to specify color standard and procamp"
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
(cherry picked from commit 110a6e1839)
When Mesa is compiled for gallium-xlib using e.g.
./configure --enable-glx=gallium-xlib --disable-dri --disable-gbm
--disable-egl
and is used by an X server (usually remotely via SSH X11 forwarding)
that does not support MIT-SHM such as XMing or MobaXterm, OpenGL
clients report error messages such as
Xlib: extension "MIT-SHM" missing on display "localhost:11.0".
ad infinitum.
The reason is that the code in src/gallium/winsys/sw/xlib uses
MIT-SHM without checking for its existence, unlike the code
in src/glx/drisw_glx.c and src/mesa/drivers/x11/xm_api.c.
I copied the same check using XQueryExtension, and tested with
glxgears on MobaXterm.
This issue was reported before here:
https://lists.freedesktop.org/archives/mesa-users/2016-July/001183.html
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Cc: <mesa-stable@lists.freedesktop.org>
(cherry picked from commit a203eaa4f4)
autotools doesn't have any requirement. This fixes meson on Ubuntu 16.04.
Cc: 18.3 19.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
(cherry picked from commit 1e85cfb91a)
Previously, we only applied the fix to shaders with a dispatch mode of
SIMD8 but the code it relies on for SIMD16 mode only applies to SIMD16
instructions. If you have a SIMD8 instruction in a SIMD16 shader,
neither would trigger and the restriction could still be hit.
Fixes: 232ed89802 "i965/fs: Register allocator shoudn't use grf127..."
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit b4f0d062cd)
Reported by Coverity: in the case of unsupported modifier request, the
code does not jump to the “fail” label to destroy the acquired resource.
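The shape of such a fix, sketched with made-up names (`struct resource`,
`resource_create`, `live_resources`); this is an assumption about the
general pattern and not the v3d code.

```c
#include <stdlib.h>

struct resource {
    void *storage;
};

/* Counts live resources so the leak is observable in this sketch. */
static int live_resources;

static void resource_destroy(struct resource *r)
{
    free(r->storage);
    free(r);
    live_resources--;
}

static struct resource *resource_create(size_t size, int modifier_supported)
{
    struct resource *r = calloc(1, sizeof(*r));

    if (!r)
        return NULL;
    live_resources++;

    /* A plain `return NULL` here would leak r; jump to the cleanup label. */
    if (!modifier_supported)
        goto fail;

    r->storage = malloc(size);
    if (!r->storage)
        goto fail;

    return r;

fail:
    resource_destroy(r);
    return NULL;
}
```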
CID: 1435704
Signed-off-by: Ernestas Kulik <ernestas.kulik@gmail.com>
Fixes: 45bb8f2957 ("broadcom: Add V3D 3.3 gallium driver called "vc5", for BCM7268.")
(cherry picked from commit 90458bef54)
Reported by Coverity: in the case where there exist hardware and
non-hardware queries, the code does not jump to err_free_query and leaks
the query.
CID: 1430194
Signed-off-by: Ernestas Kulik <ernestas.kulik@gmail.com>
Fixes: 9ea90ffb98 ("broadcom/vc4: Add support for HW perfmon")
(cherry picked from commit f6e49d5ad0)
Like all the other sends, it's just mlen * REG_SIZE.
Fixes: 3cbc02e469 "intel: Use TXS for image_size when we have..."
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
(cherry picked from commit cf42b0f9e2)
Previously we tried to normalize nr_samples to MAX2(1, nr_samples) to
avoid having to deal with 0 vs 1 everywhere. But this causes problems
in mesa/st, for example st_finalize_texture() will think there is a
nr_samples mismatch and recreate the texture. Somehow this manifests
as corrupt x11 font rendering on generations that do not support MSAA
(but apparently works fine on a5xx and a6xx which do support MSAA.)
Fixes: cf0c7258ee freedreno/a5xx: MSAA
Signed-off-by: Rob Clark <robdclark@gmail.com>
(cherry picked from commit c3baa077bf)
[Emil Velikov: resolve trivial conflicts]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Conflicts:
src/gallium/drivers/freedreno/freedreno_batch_cache.c
Function's out variable could be an array dereferenced by an array:
func(v[w[i]]);
or something more complicated.
Copy index in any case.
Fixes: 76c27e47b9 ("glsl: Copy function out to temp if we don't directly ref a variable")
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 0862929bf6)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109488
Nominated-by: Matt Turner <mattst88@gmail.com>
Otherwise we can end up with IR that looks like this:
(
(declare (temporary ) vec4 f@8)
(assign (xyzw) (var_ref f@8) (var_ref f) )
(call f16 ((swiz y (var_ref f@8) )))
(assign (xyzw) (var_ref f) (var_ref f@8) )
))
When we really need:
(declare (temporary ) float inout_tmp)
(assign (x) (var_ref inout_tmp) (swiz y (var_ref f) ))
(call f16 ((var_ref inout_tmp) ))
(assign (y) (var_ref f) (swiz y (swiz xxxx (var_ref inout_tmp) )))
(declare (temporary ) void void_var)
The GLSL IR function inlining code seemed to produce correct code
even without this but we need the correct IR for GLSL IR -> NIR to
be able to understand what's going on.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 76c27e47b9)
Nominated-by: Matt Turner <mattst88@gmail.com>
We were leaking surfaces because the references taken in
etna_set_framebuffer_state weren't being released on context destroy.
Instead of just directly releasing those references in
etna_context_destroy, use the util_copy_framebuffer_state helper.
Take the chance to remove the duplicated buffer references in
compiled_framebuffer_state to avoid confusion.
The leak can be reproduced with a client that continuously creates and
destroys contexts.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reported-by: Sjoerd Simons <sjoerd.simons@collabora.co.uk>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
(cherry picked from commit bf1dfcc3e8)
[Emil: resolve trivial conflict - dummy_rt does not exist in branch]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Conflicts:
src/gallium/drivers/etnaviv/etnaviv_context.c
The core Mesa with_asm_arch and USE_ARM_ASM flags are disabled for meson
cross-builds because of the need to run host binaries on the build system.
vc4 doesn't need to do that, so skip with_asm_arch to enable NEON on my
cross-builds.
Fixes: ebcb4c2156 ("meson: Enable VC4's NEON assembly support.")
(cherry picked from commit 932ed9c00b)
meson.build:166:21: ERROR: Unknown method "verson_compare" for a string.
Fixes: c1efa240c9 ("meson: Add warnings and errors when using ICC")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Cc: 18.3 <mesa-stable@lists.freedesktop.org>
(cherry picked from commit be5b271ea7)
Otherwise, the compiler is free to reuse the register containing the input
for another call and assume that the value hasn't been modified. Fixes
crashes on texture upload/download with current gcc.
We now have to have a temporary for the cpu2 value, since outputs must be
lvalues.
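A portable sketch of the operand declaration, using GCC/Clang extended asm
with an empty template so that it builds on any architecture. `touch_rows`
is a hypothetical helper and not the vc4 code; real code would place the
post-incrementing NEON loads and stores in the template.

```c
#include <stddef.h>
#include <stdint.h>

static const uint8_t *touch_rows(const uint8_t *cpu, size_t stride)
{
    /* Outputs must be lvalues, so `cpu + stride` needs a temporary. */
    const uint8_t *cpu2 = cpu + stride;

    __asm__ volatile(
        "" /* empty here; real code would advance the pointers */
        /* "+r" marks each pointer as both read and written, so the
         * compiler will not assume the registers are unchanged. */
        : "+r"(cpu), "+r"(cpu2)
        :
        : "memory");

    return cpu2;
}
```

Listing a pointer in the output section with a bare "r" constraint, without
"=" or "+", is what makes gcc reject the statement.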
(commit message by anholt)
Fixes: 4d30024238 ("vc4: Use NEON to speed up utile loads on Pi2.")
(cherry picked from commit 300d3ae8b1)
[Emil: apply the patch to vc4_tiling_lt.c instead of v3d_cpu_tiling.h]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Conflicts:
src/broadcom/common/v3d_cpu_tiling.h
Squashed with commit:
vc4: Declare the last cpu pointer as being modified in NEON asm.
Earlier commit addressed 7 of the 8 instances available.
v2: Rebase patch back to master (by anholt)
Cc: Carsten Haitzler (Rasterman) <raster@rasterman.com>
Cc: Eric Anholt <eric@anholt.net>
Fixes: 300d3ae8b1 ("vc4: Declare the cpu pointers as being modified in NEON asm.")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 385843ac3c)
Conflicts:
src/broadcom/common/v3d_cpu_tiling.h
This makes the asm code more intelligible and clarifies the functional
change in the next commit.
(commit message and commit squashing by anholt)
(cherry picked from commit 522f688471)
[Emil: apply the patch to vc4_tiling_lt.c instead of v3d_cpu_tiling.h]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Conflicts:
src/broadcom/common/v3d_cpu_tiling.h
This allows us to avoid expensive string compares since we already have
a map to the pointers.
These compares were taking ~30 seconds for a single shader compile
in Godot due to it using 64,000+ uniforms.
Fixes: c4cff5f402 ("glsl: add basic support for resource list to shader cache")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109229
(cherry picked from commit fb78a6cb72)
Fixes: b722b29f10 ("radv: add support for 16bit input/output")
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit 0907ae35ad)
From the Vulkan spec 3.2 "Instances":
"Providing a NULL VkInstanceCreateInfo::pApplicationInfo or providing an
apiVersion of 0 is equivalent to providing an apiVersion of
VK_MAKE_VERSION(1,0,0)."
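The rule from the spec can be sketched as below. `struct app_info` and
`effective_api_version` are hypothetical stand-ins so the snippet builds
without the Vulkan headers; `MAKE_VERSION` uses the same bit layout as
VK_MAKE_VERSION.

```c
#include <stddef.h>
#include <stdint.h>

/* Same bit layout as VK_MAKE_VERSION: major << 22 | minor << 12 | patch. */
#define MAKE_VERSION(major, minor, patch) \
    ((((uint32_t)(major)) << 22) | (((uint32_t)(minor)) << 12) | \
     ((uint32_t)(patch)))

/* Hypothetical stand-in for VkApplicationInfo. */
struct app_info {
    uint32_t api_version;
};

/* A NULL info or an api_version of 0 both mean Vulkan 1.0.0. */
static uint32_t effective_api_version(const struct app_info *app)
{
    if (app == NULL || app->api_version == 0)
        return MAKE_VERSION(1, 0, 0);
    return app->api_version;
}
```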
Fixes: ffa15861ef "radv: UseEnumerateInstanceVersion for the default version."
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit d12dc39396)
Fixes regression caused by
42d672fa6a
st/nine: Bind src not dst in nine_context_box_upload
Before that patch, for user provided textures,
when the texture was destroyed, the safety
check for pending uploads, which according to
the code "Following condition cannot happen currently",
was flushing the queue and thus triggering the upload.
After the patch, the texture destruction was delayed until after
the upload. However, the user frees the texture buffer,
as it thinks the texture has been released.
Instead of reverting the faulty patch,
this patch flushes the csmt queue right away
after queuing the upload for this type of texture.
This is more future-proof, as we may want to bind the
surface for other reasons in the future.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Cc: 18.3 <mesa-stable@lists.freedesktop.org>
(cherry picked from commit d7433c22e6)