limit==0 is the signal for "don't peephole anything but a move that will
be optimized away." limit > 0 is "up to N alu instructions may be moved
out." nir-to-tgsi uses ~0 as the indicator of "No, we really need to
eliminate all if instructions" on hardware like i915 that doesn't have
control flow.
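
The limit regimes above can be sketched as a small predicate. This is
only an illustration under stated assumptions, not the code in the NIR
pass: the function name and both helper parameters are hypothetical, and
alu_count is assumed to exclude moves that are expected to be optimized
away.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not the actual pass.
 * alu_count:       ALU instructions that would really have to be hoisted
 *                  (assumed not to count "free" moves).
 * only_free_moves: the branch body holds nothing but moves that are
 *                  expected to be optimized away. */
static bool
sketch_may_flatten(unsigned limit, unsigned alu_count, bool only_free_moves)
{
   /* By the assumption above, free moves never contribute to alu_count. */
   assert(!only_free_moves || alu_count == 0);

   if (limit == 0)
      return only_free_moves;   /* nothing but moves that will disappear */

   /* limit > 0: up to N ALU instructions may be moved out.  With
    * limit == ~0u this comparison always succeeds, which matches the
    * "eliminate every if" request. */
   return alu_count <= limit;
}
```

For example, sketch_may_flatten(4, 5, false) is rejected, while
sketch_may_flatten(~0u, 1000, false) is accepted.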
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11329>
We need to make sure it gets updated after we may have nir_instr_remove()d an
instruction, and when we cross blocks. This didn't really matter before
because the only builder usage was idiv, which other users of
lower_int_to_float were probably never hitting.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11329>
The frontend lowering handles normalizing the conventions to the only
model we support, we just need to ignore the property in the TGSI.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11329>
It's not a required feature of GL 2.1 or GLES2, and you really don't
want to be doing SW VS access of the write-combined texture data. Also,
this avoids memory corruption in dEQP:
Test case 'dEQP-GLES2.functional.texture.vertex.cube.filtering.linear_mipmap_nearest_linear_repeat'..
Mesa: User error: GL_INVALID_ENUM in glGetIntegerv(pname=GL_MAJOR_VERSION)
Fail (Image comparison failed)
Test case 'dEQP-GLES2.functional.fragment_ops.depth_stencil.stencil_depth_funcs.stencil_equal_depth_always'..
==559181== Invalid read of size 4
==559181== at 0x641E8D0: i915_drm_buffer_unmap (i915_drm_buffer.c:204)
==559181== by 0x64151EB: i915_cleanup_vertex_sampling (i915_state.c:449)
==559181== by 0x640AEA7: i915_draw_vbo (i915_context.c:134)
==559181== by 0x640AEA7: i915_draw_vbo (i915_context.c:55)
==559181== by 0x61367B1: cso_draw_vbo (cso_context.c:1524)
[...]
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11329>
This patch only enables the below VkFormat:
- VK_FORMAT_G8_B8R8_2PLANE_420_UNORM
This patch ensures the proper behavior of the below APIs:
- vkGetPhysicalDeviceFormatProperties2
- vkGetPhysicalDeviceImageFormatProperties2
- vkCreateImage
- vkGetImageSubresourceLayout
- vkGetImageDrmFormatModifierPropertiesEXT
- vkGetImageMemoryRequirements
- vkGetImageMemoryRequirements2
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Chad Versace <chad@kiwitree.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11281>
Add initial multi-planar format support on the images with modifiers:
- With aux usage,
  - Format plane count must be 1.
  - Memory plane count must be 2.
- Without aux usage,
  - Each format plane must map to a distinct memory plane.
For the other cases, currently there is no way to properly map memory
planes to format planes and aux planes due to the lack of defined ABI
for external multi-planar images.
This patch doesn't include some potentially supported cases, such as all
format planes mapping to a single memory plane; additional refactoring
is needed to work around explicit base offset + ANV_OFFSET_IMPLICIT.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Chad Versace <chad@kiwitree.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11281>
When storing depth- or stencil-only texture data that has been packed into
a depth/stencil texture, the tex store gets PIPE_MAP_READ added onto it
since the other channel will get ORed into the incoming data, but
sometimes we know that the other component is undefined because the whole
texture is either fresh or just invalidated.
Cleans up a confusing extra blit in a dEQP case I've been debugging, and
should be less work for dEQP CI.
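
A minimal sketch of the flag selection described above, assuming a
hypothetical helper and placeholder flag bits (the real PIPE_MAP_*
values and the actual tex store code are not reproduced here):

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder bits for this sketch only; not the real PIPE_MAP_* values. */
#define SKETCH_MAP_READ  (1u << 0)
#define SKETCH_MAP_WRITE (1u << 1)

/* Hypothetical helper: pick the map flags for storing one component
 * (depth or stencil) into a packed depth/stencil texture. */
static unsigned
sketch_packed_zs_store_flags(bool other_component_defined)
{
   unsigned flags = SKETCH_MAP_WRITE;

   /* The read is only needed so the other channel can be ORed back into
    * the incoming data.  If the texture is fresh or was just invalidated,
    * there is nothing worth preserving. */
   if (other_component_defined)
      flags |= SKETCH_MAP_READ;

   assert(flags & SKETCH_MAP_WRITE); /* a store always writes */
   return flags;
}
```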
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11452>
We use the same key in autotune to track historical data about a given
framebuffer state, to inform the decision about using gmem vs sysmem
rendering. This means we need the key to stick around during the
flush, even if the batch is removed from the batch-cache before the
flush.
Fixes: 507f701d9e ("freedreno: Fix batch flush race condition")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11450>
When an OpBranchConditional that had two equal branches was parsed, we
were treating it as a regular OpBranch. However this doesn't work
well when there's an associated OpSelectionMerge. We ended up
skipping marking the merge block as such, and depending on what was
inside the construct we would end up trying to process the block
twice.
Fix this by keeping the vtn_if around, but when emitting NIR identify
the two equal branch case.
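
The split between parsing and emission can be sketched as follows. The
structures and function names are hypothetical, heavily simplified
stand-ins and are not the vtn_* types themselves:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a parsed block. */
struct sketch_block {
   bool is_merge_target;
};

/* Hypothetical stand-in for the if construct kept by the parser. */
struct sketch_if {
   struct sketch_block *then_block;
   struct sketch_block *else_block;
   struct sketch_block *merge_block; /* from OpSelectionMerge, may be NULL */
};

enum sketch_emit { SKETCH_EMIT_IF, SKETCH_EMIT_STRAIGHT_LINE };

/* Parse step: always keep the if construct, so the merge block is marked
 * even when both branch targets are the same block. */
static void
sketch_parse_if(struct sketch_if *cf)
{
   assert(cf->then_block != NULL && cf->else_block != NULL);
   if (cf->merge_block != NULL)
      cf->merge_block->is_merge_target = true;
}

/* Emit step: only here is the two-equal-branches case identified, so no
 * real conditional needs to be generated for it. */
static enum sketch_emit
sketch_emit_kind(const struct sketch_if *cf)
{
   return cf->then_block == cf->else_block ? SKETCH_EMIT_STRAIGHT_LINE
                                           : SKETCH_EMIT_IF;
}
```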
Fixes: 9c2a11430e ("spirv: Rewrite CFG construction")
Closes: #3786, #4580
Reviewed-by: Yevhenii Kolesnikov <yevhenii.kolesnikov@globallogic.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9297>
Read directly from the instruction getting spilt. Otherwise a fill
will be inserted before the spill writing the value, so the
instruction reading the spilt value gets garbage data.
Use the bundle_id to check if the instructions are in the same bundle.
Insert a move instruction, as the spill needs the value in a LD/ST
register such as AL0, while the ALU instruction reading the value
needs it in a work register such as R0.
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4857
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11212>
Now that we have the rest of format "casting" sharp edges sorted, flip
on copy_image and gles32.
Unfortunately this adds piglit xfails back (but at least that is more
than offset by my previous round of piglit fixes, and these are pretty
much all things we know had issues based on corresponding nv_copy_image
tests).
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11402>
We have some logic to detect when u_blitter generated draws overwrite
the entire render-target, so we know we can discard anything previous.
But some blits (like multi-sample) do multiple draws. We don't want to
discard the earlier draws from the same blit.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11402>
In paths where we are handling blits on the 3d pipe, if src==dst we need
to flush to ensure what gets sampled by the blit shader reflects the
results of any previous blits.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11402>
Sequences that call pctx->set_framebuffer_state() before pctx->flush() will
see ctx->batch being NULL, but they still need to call fd_bc_flush(ctx)
to ensure pending batches associated with the context are flushed.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11402>
The invalidate would take it out of the bc tracking, so you could go
allocate a new batch->idx matching this one, while this one is still in
the bc using that idx.
You can't generate any new rendering with the ctx's old batches at this
point, anyway, so just flush for simplicity.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11439>
It wasn't checking that the transfer map would definitely overwrite all of
the data being initialized by the back blit, and if we knew that it
would, then the caller would have provided PIPE_MAP_DISCARD_WHOLE_RESOURCE.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11439>
We took references under the lock, but then accessed the lock-requiring
batch_cache structure without holding the lock. The batches wouldn't get
freed and removed from their slots until the last ref goes away so it was
safe (other than the assert at the end), but writing the simple code is
shorter and requires fewer assumptions.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11439>
When the system doesn't have enough memory, iris may crash GNOME
Shell:
gnome-shell[1161]: iris: Failed to submit batchbuffer: Cannot allocate memory
gnome-shell[1161]: GNOME Shell crashed with signal 6
So don't abort() when the kernel can't allocate memory, to avoid crashing
the entire desktop.
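
One way to express that policy, as a sketch only: the helper name is
hypothetical, and the assumption that the submit path reports failures
as negative errno values is mine, not taken from the driver source.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical policy helper.  ret is assumed to be 0 on success or a
 * negative errno value from a failed batch submission. */
static bool
sketch_submit_failure_is_fatal(int ret)
{
   assert(ret <= 0);

   /* Out-of-memory is reported but survivable; anything else still
    * counts as fatal in this sketch. */
   return ret < 0 && ret != -ENOMEM;
}
```

A caller would only abort() when this returns true, and otherwise log
the failure and carry on.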
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11178>
This is entirely implemented in the SPIR-V frontend.
Relevant CTS tests:
dEQP-VK.spirv_assembly.instruction.compute.non_semantic_info.*
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11440>
Instead of spawning 4 threads when the cache is created,
spawn 1 and let u_queue grow the number of threads if
needed.
I wrote this patch because when running piglit's quick_shader
profile I had lots of samples in disk cache threads, mostly
in the native_queued_spin_lock_slowpath kernel function.
Since these tests shouldn't really stress the cache, I assumed
it was caused only by thread creations.
After writing the patch and redoing the measurement, I got an
improvement but I still saw more hits in the same function for the
shader_runner:$disk0 thread, so something was wrong.
After digging more, I found out that my shader cache index was
corrupted: the on-disk size was 29MB but the index reported it
was way more than 1GB. So each disk cache thread was spending
a lot of time trying to evict files. Given that my cache had
a really low count of files, the LRU method based on randomly
generating subfolder names failed, so evicting was very slow.
Now that my cache index is fixed, the disk cache threads are
mostly idle but I still think it makes sense to grow the
number of threads instead of spawning 4 at the program start.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11296>
This flag allows creating a single thread initially while setting
max_threads to the requested thread count.
If the queue is full and num_threads is lower than max_threads,
we spawn a new thread to help process the queue faster.
This avoids creating N threads at queue creation time.
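
The growth rule can be sketched like this. The struct and function names
are hypothetical and only model the bookkeeping; no real threads are
created and this is not the u_queue implementation:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the queue's bookkeeping. */
struct sketch_queue {
   unsigned num_queued;   /* jobs currently waiting */
   unsigned max_jobs;     /* queue capacity */
   unsigned num_threads;  /* worker threads running now */
   unsigned max_threads;  /* requested thread count (upper bound) */
};

/* With the flag set, start with one thread but remember the requested
 * count as the ceiling. */
static struct sketch_queue
sketch_queue_init(unsigned requested_threads, unsigned max_jobs)
{
   assert(requested_threads > 0);

   struct sketch_queue q = {
      .num_queued = 0,
      .max_jobs = max_jobs,
      .num_threads = 1,
      .max_threads = requested_threads,
   };
   return q;
}

/* Grow only when the queue is full and there is still headroom. */
static bool
sketch_should_spawn_thread(const struct sketch_queue *q)
{
   return q->num_queued == q->max_jobs && q->num_threads < q->max_threads;
}
```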
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11296>
Instead of doing vbo_exec_vtx_map during initialization,
defer it until the first actual user.
v2: move init to vbo_exec_wrap_upgrade_vertex (Emma Anholt)
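
The deferral follows the usual lazy-initialization pattern, sketched
here with hypothetical names standing in for the vbo_exec state and the
map call:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the exec state. */
struct sketch_exec {
   bool mapped;
   unsigned map_calls; /* counts how often the expensive map ran */
};

/* Initialization no longer maps anything. */
static void
sketch_exec_init(struct sketch_exec *exec)
{
   exec->mapped = false;
   exec->map_calls = 0;
}

/* Stand-in for the expensive mapping step. */
static void
sketch_exec_map(struct sketch_exec *exec)
{
   exec->mapped = true;
   exec->map_calls++;
}

/* The first actual user triggers the map; later users find it done. */
static void
sketch_exec_use(struct sketch_exec *exec)
{
   if (!exec->mapped)
      sketch_exec_map(exec);
   assert(exec->mapped);
}
```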
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11296>