Translate the NIR variables directly to LLVM instead of lowering to a
TGSI-style giant array of vec4's and then back to a variable. This
should fix indirect dereferences, make shared variables more tightly
packed, and make LLVM's alias analysis more precise. This should fix an
upcoming Feral title, which has a compute shader that was failing to
compile because the extra padding made us run out of LDS space.
v2: Combine the previous two patches into one, only use this for shared
variables for now until LLVM becomes smarter.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-by: Alex Smith <asmith@feralinteractive.com>
Otherwise, if a client gave us a list of modifiers that contained a
modifier we understand but which is not supported on the hardware, we
might return that one and then fail to create the image.
Reviewed-by: Daniel Stone <daniels@collabora.com>
This commit splits the mapping in half. The modifier_infos table now
only contains the modifier and the since_gen field. The tiling bits
have been moved into a table in tiling_to_modifier as that's the only
place it was ever used. The modifier_is_supported function now takes a
devinfo and does the since_gen check.
Reviewed-by: Daniel Stone <daniels@collabora.com>
This assert was removed in b0cc55f298 but
got added back in 1a43d774b6, probably by
accident.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Now that we have an actual aux_usage field, we no longer need the
complex logic of is_lossless_compressed in order to figure out if a
miptree is CCS_E compressed. As a side-effect, there is not longer any
need to overload MSAA_LAYOUT_CMS for CCS_E and we can stop doing so.
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
HiZ, like MCS and CCS_E, can compress more than just clear colors so we
want it turned on whenever the miptree is being used as a depth
attachment. It's theoretically possible for someone to create a depth
texture, upload data with glTexSubImage2D, and texture from it without
ever binding it as a depth target. If this happens, we would end up
wasting a bit of space by allocating a HiZ surface we never use.
However, this is rather unlikely out side of test cases, so we're better
off just allocating it up-front.
Reviewed-by: Chad Versace <chadversary@chromium.org>
We need this split for the same reason that we need the split for CCS:
intel_miptree_supports_hiz is called *before* we choose the actual
tiling. Adding a tiling_supports_hiz helper lets choose_aux_usage
more accurately decide whether or not to enable hiz. In particular,
this prevents us from enabling HiZ on linear depth buffers.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Fixes piglit test crash when context creation fails.
v2: As suggested by Brian, move the init to st_create_context_priv()
Reviewed-by: Brian Paul <brianp@vmware.com>
Enable the capability if the DRM supports it.
Hook up mechanism to send and receive fence FD from the DRM.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Connect fence_get_fd, fence_create_fd, and fence_server_sync.
Implement the required functions in vmw_fence module.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Connect fence_get_fd, fence_create_fd, and fence_server_sync.
Return PIPE_CAP_NATIVE_FENCE_FD capability based on what the
winsys reports
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
The new interfaces will be used to enable
EGL_ANDROID_native_fence_sync.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
The timeout parameter is required to implement
EGL_ANDROID_native_fence_sync.
v2
* Replaced default timeout from 0 to PIPE_TIMEOUT_INFINITE
* Add more documentation to the new timeout parameter
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
From the Vulkan 1.0.53 spec VU for vkCreateImageView:
"image must have been created with a usage value containing at least
one of VK_IMAGE_USAGE_SAMPLED_BIT, VK_IMAGE_USAGE_STORAGE_BIT,
VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT,
VK_IMAGE_USAGE_DEPTH_STENCIL_ATTACHMENT_BIT, or
VK_IMAGE_USAGE_INPUT_ATTACHMENT_BIT"
We were missing VK_IMAGE_USAGE_INPUT_ATTACHMENT_BIT from out list.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable@lists.freedesktop.org
We were early returning and never created the NULL surface state.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: James Legg <jlegg@feralinteractive.com>
Cc: mesa-stable@lists.freedesktop.org
If the queue is full, util_queue_add_job will wait while bo_fence_lock is
held.
It pb_slab wants to reuse a buffer, it will lock the pb_slab mutex and
try to check BO fence busyness, but it has to wait for bo_fence_lock to get
released. Both bo_fence_lock and pb_slab mutex are locked now.
When the CS thread unreferences and releases a suballocated buffer,
it will try to lock the pb_slab mutex and has to wait. The CS thread
can't finish its job in order to free a queue slot and unblock
util_queue_add_job ==> deadlock.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Consider the following situation:
mtx_lock(mutex);
do_something();
util_queue_add_job(...);
mtx_unlock(mutex);
If the queue is full, util_queue_add_job will wait for a free slot.
If the job which is currently being executed tries to lock the mutex,
it will be stuck forever, because util_queue_add_job is stuck.
The deadlock can be trivially resolved by increasing the queue size
(reallocating the queue) in util_queue_add_job if the queue is full.
Then util_queue_add_job becomes wait-free.
radeonsi will use it.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
During bring-up, this is often 0. Prevent automatic disablement of
ARB_timer_query and demotion of the OpenGL version to 3.2 by setting
a non-zero frequency. Print an error message instead.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
For inputs and outputs, indirect indexing is lowered by the GLSL compiler.
For temporaries, use alloca and disable the "promote-alloca" pass.
In the future, we could switch all codepaths to alloca permanently and
just rely on the "promote-alloca" pass.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Mesa here requires the scaling lists in diagonal scan order, but
VAAPI passes them in raster scan order. Therefore, rearrange the
elements when copying.
v2: Move scan tables to vl_zscan.c.
Fix type in size assertion.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Mark Thompson <sw@jkqxz.net>
Reviewed-by: Christian König <christian.koenig@amd.com>
Jason updated the Khronos spec to explicitly state that Wayland surfaces
must support VK_PRESENT_MODE_MAILBOX_KHR.
ANV did so since day one (back in 2015)
Cc: mesa-stable@lists.freedesktop.org
Cc: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: Dave Airlie <airlied@redhat.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Jason updated the Khronos spec to explicitly state that Wayland surfaces
must support VK_PRESENT_MODE_MAILBOX_KHR.
ANV did so since day one (back in 2015)
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
One can override the deviceID, by setting the INTEL_DEVID_OVERRIDE
variable. A few symbolic names or a numerical value for the actual
device ID is accepted.
At the same time we're using strtod (string to double) to convert the
string to a decimal numeral. A seeming thinko, made by the original
commit that introduces the code in libdrm_intel and got here with the
import.
Fixes: 514db96c11 ("i965: Import libdrm_intel.")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
There seems to be a rounding difference with F2I vs nearest filtering.
The precise problem in the rounding is unknown.
This fixes an incorrect output with OpenMAX encoding.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>