remove_trivial_phi() mostly does nothing for non-array phis, but it
rewrites sources if their definining instruction are trivial phis.
In the case of trivial phis in the loop continue block (for loops with
divergent non-trivial continues), we might need to keep those if they
write a shared register, because the source of the trivial phi will not be
reachable from the loop header phi.
In this example, the predecessors of the continue block should be block2,
but the physical predecessors are block2 and block3, requiring a phi in
the continue block which will then be lowered by ir3_lower_shared_phis.
loop {
block1:
a = phi 0, b
if (divergent) {
block2:
b = a + 1
continue;
}
block3:
break;
}
Fixes RA validation error when compiling blackmythwukong/5645a84e669a6179
from radv_fossils.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Backport-to: 26.0
(cherry picked from commit 4f0fb5784f)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40979>
VkPipelineRenderingCreateInfo is only required in the fragment output
interface lib. For pre-rasterization shaders and fragment shader state
libs, only the view mask is used but it's optional.
If the attachments info isn't marked invalid merging renderpass info
during lib imports wouldn't work because it would assume that the first
lib has attachment info (eg. the pre-rasterization lib).
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/work_items/15241
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 1950b6c1a7)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40979>
The spec also mentions "output Location decorated color attachments".
Signed-off-by: Mary Guillemard <mary@mary.zone>
Fixes: 564b061981 ("hk: Increase maxFragmentCombinedOutputResources to HK_MAX_DESCRIPTORS")
Reviewed-by: Janne Grunau <j@jannau.net>
(cherry picked from commit 59d9bc7bee)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40979>
According to the GL_EXT_multisampled_render_to_texture specification,
copy operations should be allowed when the extension is supported.
Previously, glCopyTexImage* would unconditionally fail with
GL_INVALID_OPERATION when copying from any multisampled framebuffer
(samples > 0), even when using render-to-texture attachments.
Fixes: d7b9da2673 ("mesa/main: fix artifacts with GL_EXT_multisampled_render_to_texture")
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Signed-off-by: Wujian Sun <wujian.sun_1@nxp.com>
(cherry picked from commit 2e340d63d2)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40979>
When round_to_nearest_even is enabled with NEAREST filtering, texture
coordinates near texel boundaries (e.g. 0.9999999404) can be incorrectly
rounded up to the next texel instead of being floor()'d.
According to OpenCL spec section 8.2, for CLK_FILTER_NEAREST:
i = address_mode((int)floor(u))
Backport-to: *
Signed-off-by: Eric Guo <eric.guo@nxp.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
(cherry picked from commit c415134454)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40979>
AMD APUs are hitting this case where they have very small discrete VRAM,
but a lot of staging memory, which can be used additionally.
Fixes: 7487ac2046 ("rusticl/device: support query_memory_info to retrieve available memory")
(cherry picked from commit 58d45725c7)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40979>
Is a single triangle is selected, it can be the case that the next iteration
can't merge any pair with the triangle. In that case, the HW node with a
single triangle will not have the highest hw_node_index, triggering an
assert.
Fixes: c18a7d0 ("radv: Emit compressed primitive nodes on GFX12")
Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
(cherry picked from commit db38d1a98c)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40979>
When cross-compiling with MinGW, d3d12_resource_state.cpp fails to
compile with:
d3d12_resource_state.cpp:161:83: error: call to non-'constexpr'
function 'D3D12_RESOURCE_STATES operator|(D3D12_RESOURCE_STATES,
D3D12_RESOURCE_STATES)'
161 | D3D12_RESOURCE_STATE_ALL_SHADER_RESOURCE |
| D3D12_RESOURCE_STATE_COPY_SOURCE | D3D12_RESOURCE_STATE_COPY_DEST;
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/share/mingw-w64/include/minwindef.h:163,
from /usr/share/mingw-w64/include/windef.h:9,
from /usr/share/mingw-w64/include/windows.h:69,
from /usr/share/mingw-w64/include/rpc.h:16,
from /usr/share/mingw-w64/include/unknwn.h:7,
from ../subprojects/DirectX-Headers-1.0/include/wsl/winadapter.h:6,
from ../src/gallium/drivers/d3d12/d3d12_common.h:29,
from ../src/gallium/drivers/d3d12/d3d12_bufmgr.h:31,
from ../src/gallium/drivers/d3d12/d3d12_resource_state.cpp:24:
../subprojects/DirectX-Headers-1.0/include/directx/d3d12.h:3540:1:
note: 'D3D12_RESOURCE_STATES operator|(D3D12_RESOURCE_STATES,
D3D12_RESOURCE_STATES)' declared here
3540 | DEFINE_ENUM_FLAG_OPERATORS( D3D12_RESOURCE_STATES )
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
The DEFINE_ENUM_FLAG_OPERATORS macro in the MinGW winnt.h header
defines operator| for D3D12_RESOURCE_STATES as inline but not
constexpr. (The DirectX-Headers WSL stubs do define it as constexpr,
but when building with MinGW, windows.h is pulled in via winadapter.h
and its non-constexpr definition wins.) Calling a non-constexpr
function to initialize a constexpr variable is ill-formed in C++.
Fix by changing static constexpr to static const, which avoids the
constexpr context while still giving the variable static storage
duration.
Fixes: fe48cd7c5a ("d3d12: Allow state promotion for non-simultaneous access textures")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
(cherry picked from commit 2443f3608a)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40979>
Usually connector property IDs are acquired in
wsi_display_get_connector, which is called by wsi_get_connectors, and in
turn by vkGetPhysicalDeviceDisplayProperties2KHR and
vkGetPhysicalDeviceDisplayPlanePropertiesKHR. Except if the drm fd is
not available when these functions are called. Which will be the case if
vkAcquireXlibDisplayEXT is not called first.
So it goes like this. First, the display is created in
vkGetRandROutputDisplayEXT. Then it's used in
vkGetPhysicalDeviceDisplayPlanePropertiesKHR, but since the drm fd is
not available at this point, connector property IDs are not initialized.
Later, this display is used in vkAcquireXlibDisplayEXT, which also
doesn't touch the property IDs. Finally in drm_atomic_commit, the
atomic commit fails with EINVAL, specifically because of the
uninitialized ID of the "CRTC_ID" property. Since it's one of the
properties drm_atomic_commit tries to set.
This commit makes sure that find_connector_properties is called in
vkAcquireXlibDisplayEXT to initialize the property IDs.
Fixes: 513ffea1d3 ("wsi/display: use atomic mode setting")
Signed-off-by: Yuxuan Shui <yshui@codeweavers.com>
(cherry picked from commit 37a1986691)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40979>
The immediate fs_clear_color shader uses IMM[0] but still declares
CONST[0][0]. That can make drivers try to read a fragment constant
buffer even though one is never uploaded on this path. Only declare
CONST[0][0] when the shader actually uses a constant buffer.
Fixes: 2ff9fa8b72 ("gallium/u_blitter: add a new fs_color_clear variant")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 79e3196320)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40979>
Entering a loop with empty exec mask might lead to
not be able to execute the break condition and
lead to infinite loops.
Totals from 81 (0.04% of 202440) affected shaders: (Navi48)
Instrs: 3040566 -> 3040716 (+0.00%)
CodeSize: 17506768 -> 17507188 (+0.00%)
Latency: 16342966 -> 16345166 (+0.01%)
InvThroughput: 3112932 -> 3113286 (+0.01%)
Branches: 82229 -> 82365 (+0.17%)
Cc: mesa-stable
(cherry picked from commit 60b3e5b3f0)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40979>
Add asserts this time that we don't miss any and that the buckets
actually match the enum in bifrost/compiler.h.
Fixes: 82328a5245 ("pan/bi: Generate instruction packer for new IR")
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
(cherry picked from commit fd5c6d1223)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40979>
On v12+, IDVS no longer has separate position and varying variants, so
we only need to emit stats for one binary. Attempting to emit stats for
the nonexistent varying shader breaks shader-db.
Fixes: 7819b103fa ("pan/bi: Add support for IDVS2 on Avalon")
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
(cherry picked from commit 31ddfe26eb)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40979>
All B-series Rogue cores seem to have USC rounding mode as RTE instead
of RTZ.
Set the has_usc_alu_roundingmode_rne feature flag for them (currently
only BXS-4-64 has it set).
Verified via testing on BXM-4-64 (36.52.104.182) by fixing CTS tests
dEQP-VK.spirv_assembly.instruction.*.float_controls.fp32.input_args.* ,
and via proprietary driver vulkaninfo result on BXE-2-32 (36.29.52.182),
BXE-4-32 (36.50.54.182) and BXM-4-64 (36.56.104.183) (checking
shaderRoundingModeRT?Float32 properties).
Fixes: 1db1038a61 ("pvr: add device info for BXM-4-64 (36.56.104.183)")
Fixes: e60e0c96ba ("pvr: add device info for BXE-2-32 (36.29.52.182)")
Fixes: 2743363a57 ("pvr: add device info for BXM-4-64 (36.52.104.182)")
Fixes: ea28791d40 ("pvr: add device info for BXE-4-32 (36.50.54.182)")
Signed-off-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Reviewed-by: Simon Perretta <simon.perretta@imgtec.com>
(cherry picked from commit 9b44def4e9)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40979>
If both src and dst are compressed formats, adjusting the extent isn't
necessary because it's required that texel block extent matches. The
previous division was also wrong because it was truncating partial
blocks causing issues in some tests.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 4e00e1c3d0)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40979>
The sc7180-trogdor-lazor-limozeen devices have been dying off over the
past few weeks, so move the last two jobs to sc7180-trogdor-kingoftown
and retire the device type.
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
(cherry picked from commit bbed00ac81)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40979>
The sc7180-trogdor-lazor-limozeen devices are having issues, so move the
job to a different device with available capacity.
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
(cherry picked from commit 17d38c9668)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40979>
This could override data allocated by the application when shader code
is loaded from binary in vkCreateShaderObjectEXT().
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: d39e443ef8 ("anv: add infrastructure for common vk_pipeline")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
(cherry picked from commit 21952ffb07)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40979>
Fix non-functional issue where calls to cs_run_fragment() had swapped
tile_order and enable_tem arguments. Both arguments evaluate to 0.
Hence, no functional change.
Fixes: 53f780ec91 ("panfrost: Remove progress_increment from all CS builders")
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
(cherry picked from commit 0d08b197f2)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40979>
The pvr_clear_vdm_state_get_size_in_dw() wrongly think instance count
inputs are needed when doing RTA clear for cores without the
gs_rta_support feature. However, the instance ID is exploited to output
the target layer ID, which isn't supported at all for cores w/o that
feature, so it looks that the condition is inverted. In addition, the
pvr_pack_clear_vdm_state() function seems to have similar logic deciding
whether to emit instance_count, and the logic is opposite to the logic
in pvr_clear_vdm_state_get_size_in_dw() for the part checking the
gs_rta_support feature.
Invert the condition to take instance ID inputs for cores with the
gs_rta_support feature instead of those without this feature.
Fixes: b59eb30e88 ("pvr: Fix cs corruption in pvr_pack_clear_vdm_state()")
Signed-off-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Reviewed-by: Luigi Santivetti <luigi.santivetti@imgtec.com>
(cherry picked from commit 3db93bbf34)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40979>
runtime error: left shift of 65535 by 16 places cannot be represented in type 'int'
This fixes nir_opt_algebraic_pattern_test.bf2f.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: ecd2d2cf46 ("util: Add functions to convert float to/from bfloat16")
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
(cherry picked from commit 72f2b8a034)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40979>
Without this, reg_offset will return 1024 for acc0. This causes
has_invalid_dst_region to decide that the destination region is invalid
(because 1024 != 0), and the lowering code tries to treat the floating
point accumulators as integers. It's a mess.
v2: Add and use set_gfx_platform. Suggested by Caio.
Fixes: 937373eb25 ("i965/fs: Handle fixed HW GRF subnr in reg_offset().")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
(cherry picked from commit cfdb3ddb93)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40979>
Integer accumulators and float accumulators do not occupy the same bits,
so the types cannot be arbitrarily changed.
No shader-db or fossil-db changes on any Intel platform.
v2: Use is_accumulator() instead if brw_reg_is_arf(). Add an extra test
to show the desired behavior when an accumulator is not
involved. Suggested by Caio.
Fixes: 64c251bb3a ("intel/fs: Combine constants for SEL instructions too")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
(cherry picked from commit ffdc310bf1)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40979>
After abe6d750e5, glXDestroyContext() can defer destruction by marking
the context with xid == None while it is still current.
However, the release-current path did not clear current->currentDpy,
so a context that had already been marked for deletion could remain
associated with a display after unbinding.
Fixes: abe6d750e5 ("xlib: fix glXDestroyContext in Gallium frontends")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14947
Reviewed-by: Adam Jackson <ajax@redhat.com>
(cherry picked from commit 447a1d2e8d)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40979>
panfrost_bo_import() calls drmPrimeFDToHandle() then pan_kmod_bo_import(),
which also calls drmPrimeFDToHandle() internally. This double import causes
GEM handle refcount leaks because each drmPrimeFDToHandle() increments the
kernel's GEM handle refcount, but only one drmCloseBufferHandle() is called
during cleanup by panfrost_kmod_bo_free(or panthor_kmod_bo_free).
Fix by removing the redundant drmPrimeFDToHandle() and using
pan_kmod_bo_import() directly. On re-import of existing buffers, properly
release the extra pan_kmod_bo reference with pan_kmod_bo_put().
This ensures GEM handle refcount, pan_kmod_bo refcount, and panfrost_bo
refcount are all properly balanced.
Fixes: 5089a758df ("panfrost: Back panfrost_bo with pan_kmod_bo object")
Signed-off-by: Xianzhong Li <xianzhong.li@nxp.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
(cherry picked from commit 248b0b47b7)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40979>
The only real requirement here is that the destination offset is zero
and that the destination is big enough to hold the source. The source
offset doesn't matter.
Fixes: bc17288697 ("pan/bi: Lower split/collect before RA")
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
(cherry picked from commit 05c5e52054)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40979>
The non-trivial non-replicate swizzles on IADD.v4x8 and ISUB.v4x8 are
either documented wrong or broken in hardware. Instead of swizzling
b0101 and b2323, they swizzle b0011 and b2233 on G52. This is either a
hardware bug or an issue with documentation. In either case, it's
probably best not to trust it. Those swizzles aren't all that useful
anyway. We also weren't using any of them before (or they'd have
broken) so this isn't a performance regression.
Cc: mesa-stable
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
(cherry picked from commit 538b5c411e)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40979>
Midpoint sorting is incompatible with how our traversal works.
Specifically, we change tMax when a hit is committed so we can skip over
BVH nodes that are guaranteed not to produce a closer hit. However,
changing tMax also changes the intersection interval of box nodes with
the ray, and thus, the midpoints of that interval. Stackless traversal
relies on getting nodes back in the exact same order as before, and if
that requirement is not met, traversal may incorrectly skip over nodes.
The likely benefit of midpoint sorting does not make up for the loss of
ability to skip over BVH nodes exceeding tMax, so simply disable
midpoint sorting.
This fixes geometry being visible behind other geometry when it
shouldn't be in various applications, including Half-Life 2 RTX.
Cc: mesa-stable
(cherry picked from commit c1a7680d93)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40979>
ballot_bit_count_reduce expects the ballot to have 4 components causing
validation failures on targets where 1 < ballot_components < 4. Fix this
by padding the ballot to 4 components.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Fixes: ae66bd1c00 ("nir/opt_uniform_subgroup: use ballot_bit_count")
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
(cherry picked from commit cc6eec79c2)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40979>
nir_builder_alu_instr_finish_and_insert initialized the def's bit_size
and num_components so we should set them afterwards.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Fixes: c66967b5cb ("nir: add nir_opt_varyings, new pass optimizing and compacting varyings")
(cherry picked from commit 273fd18b89)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40979>
st_cb_bitmap appends a temporary bitmap sampler view to the sampler
view array passed to set_sampler_views().
1a5c660ef5 changed this path to only release the extra YUV views
returned by st_get_sampler_views(), but the temporary bitmap view is
created locally and is not part of extra_sampler_views. It therefore
stopped being released so release the temporary bitmap sampler view
explicitly after drawing the bitmap quad.
Fixes: 1a5c660ef5 ("st/bitmap: only release YUV samplerviews")
(cherry picked from commit 33864e569e)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40979>
From the OpenCL specification:
`CL_MEM_KERNEL_READ_AND_WRITE`: This flag is only used by
clGetSupportedImageFormats to query image formats that may be both
read from and written to by the same kernel instance. To create a
memory object that may be read from and written to use
CL_MEM_READ_WRITE.
If an application follows the instructions above, i.e. query a list of
supported image formats, using `CL_MEM_KERNEL_READ_AND_WRITE` as
input, and then attempts to create an image using one of the supported
image formats, by calling `clCreateImage` and passing
`CL_MEM_READ_WRITE`, the call to the image creation entry point should
succeed. This instead fails on Mali devices with the error
`CL_IMAGE_FORMAT_NOT_SUPPORTED`.
Rusticl fails when validating the image format against its supported
flags. Formats that support `PIPE_BIND_SHADER_IMAGE` have their
supported flags set as `CL_MEM_WRITE_ONLY` and
`CL_MEM_KERNEL_READ_AND_WRITE`.
This changes the supported CL flags to be `CL_MEM_WRITE_ONLY` for
`PIPE_BIND_SHADER_IMAGE` and `CL_MEM_READ_WRTE |
CL_MEM_KERNEL_READ_AND_WRITE` for `PIPE_BIND_SAMPLER_VIEW |
PIPE_BIND_SHADER_IMAGE`.
Fixes: 3386e142 (rusticl: support read_write images)
Fixes OpenCL-CTS test: `test_image_streams` on Mali. Invocation:
```
test_image_streams write 1D CL_RGB CL_SIGNED_INT8
```
Signed-off-by: Ahmed Hesham <ahmed.hesham@arm.com>
(cherry picked from commit e77c984cef)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40979>
This would read OOB and crash because data type is optional per the
SPIRV spec.
Original patch by Faith Ekstrand <faith.ekstrand@collabora.com>.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 1f8be7bfad)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40979>
When converting the index buffer from 4-bytes to 2-bytes, we use the
uploader for the job. Since commit b3133e250e we do an uploader alloc
ref, which releases the uploader buffer if there is no enough space,
creating a new one.
The problem happens when we also need this buffer because it is the one
containing the index buffer to convert. This happens, for instance, if
we need to convert the primitives because they are not supported (e.g.,
converting quads to triangles), as this is done
also using the uploader.
The solution is to ensure the uploader's buffer has an extra reference
so when released, it is not destroyed. This can easily achieved by
calling first pipe_buffer_map_range(), which is required to access the
buffer, and it increases the references.
This fixes `spec@!opengl 1.1@longprim`.
Fixes: b3133e250e ("gallium: add pipe_context::resource_release to eliminate buffer refcounting")
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
(cherry picked from commit 48c086cb42)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40752>
R300_PACKET3_3D_CLEAR_HIZ encodes COUNT in 14 bits (COUNT[13:0]), so a
single packet can clear at most 0x3fff dwords.
Large depth surfaces on R5xx can require more HiZ dwords than that.
When we emitted a single packet, COUNT truncated and part of HiZ RAM
remained uncleared, which could show up as HyperZ corruption.
Emit CLEAR_HIZ in chunks of R300_CLEAR_HIZ_COUNT_MAX and reserve enough
atom space for the worst-case packet count derived.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/360
Fixes: 12dcbd5954 ("r300g: enable Hyper-Z by default on r500")
(cherry picked from commit fddc101070)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40752>
Introduce r300_hyperz_pipe_count and use it in\nr300_setup_hyperz_properties.\n\nRV530 selects pipe topology from NUM_Z_PIPES, while other families use\nNUM_GB_PIPES. Keeping this in one helper avoids duplicated family checks\nand prepares follow-up HiZ clear sizing changes to reuse the same rule.
(cherry picked from commit e97ac38ff3)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40752>
Ian Romanick reported some "undefined behaviour" warnings during some
not specified tests, relating to introduction of RGB[A}16_UNORM formats
in merge request
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38588
This due to overflowing the 32-bits masks[], and then during assignment
the red/green/blue/alphaMask fields in struct gl_config when using a 16
bpc format. Iow. the red/green/blue/alphaMask would not be usable.
Suppress this warning by setting masks[] to zero for unorm16 formats,
just as was previously done for is_float16, ie. fp16 formats.
16 bpc formats are only exposed for display on non-X11 WSI target
platforms like GBM+DRM, Wayland, surfaceless, and these platforms do
not use the info in red/green/blue/alphaMask at all, so the "undefined
behaviour" is meaningless.
Fixes: f2aaa9ce00 ("dri,gallium: Add support for RGB[A]16_UNORM display formats.")
Reported-by: Ian Romanick @idr
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
(cherry picked from commit ab94515b0a)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40752>
Fix a crash in image descriptor emission caused by stale image_mask bits.
Root cause:
- set_shader_images used a shift expression with count==64 when clearing
image_mask, which is undefined behavior in C.
- This could leave image_mask inconsistent with actual image bindings,
so panfrost_emit_images() might dereferences NULL image resources.
Fixes:
- Use 64-bit-safe bit helpers for mask updates to avoid invalid shifts.
Crash observed when running: OpenCL-CTS api/test_api
Backtrace:
#0 util_image_to_sampler_view (v->resource is NULL)
#1 panfrost_emit_images
#2 panfrost_update_shader_state
#3 panfrost_launch_grid_on_batch
#4 panfrost_launch_grid
Backport-to: *
Signed-off-by: Eric Guo <eric.guo@nxp.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
(cherry picked from commit c1770565f3)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40752>
Mesa's vk_common_SetDebugUtilsObjectNameEXT assumes every Vulkan object
handle is a pointer to a vk_object_base struct. In gfxstream, only a
subset of objects (instance, device, queue, command buffer, command pool,
buffer, fence, semaphore) carry a Mesa wrapper. All other non-dispatchable
handles (shader modules, pipelines, render passes, etc.) are opaque host
handles that are not valid pointers.
Passing such an unwrapped handle to the common path causes it to be cast
to a vk_object_base pointer and dereferenced, resulting in a SIGSEGV
(null-pointer dereference at offset 0x40).
Override the function in the gfxstream driver to store debug names on
vk_object_base for wrapped objects and return VK_SUCCESS for unwrapped
objects.
Fixes: 7b50e62179 ("gfxstream: mega-change to support guest Linux WSI with gfxstream")
Test: Verified with hellovk (with validation layers) on Android Emulator - no crashes.
(cherry picked from commit bf8862b49f)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40752>
When DXGI is not supported, win32 falls back to sw wsi without acquire
timeout ignored.
This change:
1. adds the needed acquire mutex and cond
- the fail path is intentionally left untouched so that mutex and
cond are both valid when wsi_win32_swapchain_destroy is called
2. adds wsi_win32_acquire_idle_cpu_image helper to respect timeout
3. adds wsi_win32_set_image_idle helper to properly signal acquire_cond
for sw wsi case
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/15122
Cc: mesa-stable
(cherry picked from commit af42f0c80f)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40752>
Implement software workaround for AVP decoder corruption on Gen12
platforms. These platforms require a warmup workload before
the actual AV1 decode to prevent output corruption.
- Gen12: Tiger Lake, DG1, Rocket Lake, Alder Lake
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit 260908cecb)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40752>
I have been running into crashes in this function when using blender.
Some of the entries in ice->state.framebuffer.base.cbufs[0] can
apparently have the texture field be null, which was causing a segfault
in this loop.
In my case, nr_cbufs was 3, and the first two cbufs entries had a null
texture and format set to PIPE_FORMAT_NONE. The last entry had format of
PIPE_FORMAT_R16G16_FLOAT and a non-null texture.
Adding this null check before attempting to dereference the texture
fixes the crash for me and allows blender to work normally.
Fixes: ca96f8517c ("iris: remove uses of pipe_surface as a pointer")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
(cherry picked from commit e16c8cc579)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40752>
In the attribute model, the size is for the attribute binding and the
offset is an offset into that range. If we're going to use that to
offset the buffer itself, we need to increase the size accordingly.
Fixes: a21ee564e2 ("pan/bi: Make texel buffers use Attribute Buffers")
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
(cherry picked from commit ce56f49561)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40752>
They swizzle just like anything else. Technically, we could maybe do a
little better than the generic case for these since they only read 8
bits per 16 bits in the destination but the generic case is correct,
even if it isn't optimal.
Fixes: f7d44a46cd ("pan/bi: Optimize replication")
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
(cherry picked from commit 8dc458225b)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40752>
struct nu_handle is hashed and deduplicated using struct nu_handle_key, which ignored
parent_deref. That means all instructions will use the first parent_deref when rewriting
the sources.
Avoid this by not including the parent deref in the struct, and instead querying it
when needed.
Fixes: 4d09cd7fa5 ("nir/lower_non_uniform_access: Group accesses using the same resource")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/work_items/15173
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
(cherry picked from commit e7077e8f5c)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40752>
It didn't save states properly. The only correct place to save them is
si_blitter_begin. Unfortunately, we can't skip saving and restoring
those states because we don't know in advance whether the rectangle path
will be used.
Cc: mesa-stable
Reviewed-by: Pierre-Eric
(cherry picked from commit 556ceb1b75)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40752>
The dirty state of stencil ops is not checked when deciding whether to
rebuild the ISP state, although the values are part of the ISP state
(the 27:16 bits of ISPB word).
Add MESA_VK_DYNAMIC_DS_STENCIL_OP to the condition for rebuilding ISP
control registers.
Fixes GLCTS tests when running on top of Zink:
dEQP-GLES2.functional.fragment_ops.stencil.zero_stencil_fail
Fixes: 88f1fad3f7 ("pvr: Use common pipeline & dynamic state frameworks")
Signed-off-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Reviewed-by: Simon Perretta <simon.perretta@imgtec.com>
(cherry picked from commit ee031d67b4)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40752>
On LSC platforms the SLM writes are unfenced between workgroups. This
means a workgroup W1 finishing might have uncompleted SLM writes.
Another workgroup W2 dispatched after W1 which gets allocated an
overlapping SLM location might have writes that race with the previous
W1 operations.
The solution to this is fence all write operations (store & atomics)
of a workgroup before ending the threads. We do this by emitting a
single SLM fence either at the end of the shader or if there is only a
single unfenced right, at the end of that block.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13924
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
(cherry picked from commit fa523aedd0)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40752>
The DDMADT instruction of PDS has out-of-bound test capability, which is
used for implementation of robust vertex input fetch.
According to the pseudocode in the comment block before the "LAST DDMAD"
mark in pvr_pipeline_pds.c, the check is between
`calculated_source_address + (burst_size << 2)` and `base_address +
buffer_size`, in which the `burst_size` seems to correspond to the BSIZE
field set in the low 32-bit of DDMAD(T) src3 and the `buffer_size`
corresponds to the MSIZE field set in the DDMADT-specific high 32-bit of
src3. As the calculated source address is just the base address adds the
multiplication result (the offset), the base address could be eliminated
from the check, results in the check between `offset + (BSIZE * 4)` and
`MSIZE` .
Naturally it's expected to just set the MSIZE field to the buffer size.
In addition, as the Vulkan spec says "Reads from a vertex input MAY
instead be bounds checked against a range rounded down to the nearest
multiple of the stride of its binding", the driver rounds down the
accessible buffer size before setting MSIZE to it.
However when running OpenGL ES 2.0 CTS, two problems are exhibited about
the setting of the size to check:
- dEQP-GLES2.functional.buffer.write.basic.array_stream_draw sets up a
VBO with 3 bytes per vertex (RGB colors and 1B per color) and 340
vertices (results in a buffer size of 1020 = 0x3fc). However as the
DMA request size, which is specified by BSIZE, is counted by dwords,
3 bytes are rounded up to 1 dword (which is 4 bytes). When the bound
check of the last vertex happens, the vertex's DMA start offset is
0x3f9, so the DDMADT check happens between 0x3fd (0x3f9 + 1 * 4) and
0x3fc, and indicates a check failure. This prevents the last vertex,
which is perfectly in-bound, from being properly fetched; this is
against the Vulkan specification, and needs to be fixed.
- dEQP-GLES2.functional.vertex_arrays.single_attribute.strides.
buffer_0_32_float2_vec4_dynamic_draw_quads_1 sets up a VBO with a size
of 168 bytes, and tries to draw 6 vertices (each vertex consumes 2
floats (thus 8 bytes) of attribute) with a stride of 32 bytes using
this VBO. Zink then translates the VBO to a Vulkan vertex buffer bound
with size = 168B, stride = 32B. Here the optional rule about rounding
down buffer size happens in the current PowerVR driver, and the
checked bound is rounded down to 160B, which prevented the last
vertex's 8B attributes to be fetched. It looks like this kind of
situation is considered in the codepath without DDMADT, but omitted
for the codepath utilizing DDMADT for bound check.
So this patch tries to mimic the behavior of DDMADT when setting the
MSIZE field of it to prevent false out-of-bounds. It first calculates
the offset of the last valid vertex DMA, then adds the DMA request size
to it to form the final MSIZE value. With the code calculating the last
valid DMA offset considering the situation of fetching the attribute
from the space after the last whole multiple of stride, both problems
mentioned above are solved by this rework.
There're 99 GLES CTS testcases fixed by this change, and Vulkan CTS
shows no regression on `dEQP-VK.robustness.robustness1_vertex_access.*`
tests.
Fixes: 4873903b56 ("pvr: Enable PDS_DDMADT")
Signed-off-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Reviewed-by: Ella Stanforth <ella@igalia.com>
(cherry picked from commit 252904f3d1)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40752>
This memory padding is enforced by GetBufferMemoryRequirements2 and
might be then checked against to decide whether it's enough.
Move it to pvr_buffer.h for further assertions.
Backport-to: 25.3
Signed-off-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Reviewed-by: Ella Stanforth <ella@igalia.com>
(cherry picked from commit d992474be9)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40752>
Currently the size of single components inside one attribute is saved
and checked against when checking DMA capability. However, the vertex
attribute DMA happens for a whole attribute instead of individually for
its components, so checking against the component size is useless -- the
size of the whole attribute is what needs to be saved and checked.
Rename all component_size_in_bytes fields to attrib_size_in_bytes, and
save the size of the whole attribute inside them.
Fixes: 8991e64641 ("pvr: Add a Vulkan driver for Imagination Technologies PowerVR Rogue GPUs")
Signed-off-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Reviewed-by: Ella Stanforth <ella@igalia.com>
(cherry picked from commit aa8dad141c)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40752>
The ddmadt_oob_buffer_size structure to be filled is named
`obb_buffer_size`, which is obviously a typo.
Change to `oob_buffer_size` to fix the typo.
Fixes: 8991e64641 ("pvr: Add a Vulkan driver for Imagination Technologies PowerVR Rogue GPUs")
Signed-off-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Reviewed-by: Ella Stanforth <ella@igalia.com>
(cherry picked from commit caea72cffc)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40752>
Applications often miss emitting barriers between a shader
initializing data & another shader writing data in the same location
afterward. This is very common for UAVs (see vkd3d-proton).
Vkd3d-proton does a pretty good job as inserting missing barriers
between UAV clears & writes. But some applications also have similar
issues with custom shaders. Here we introduce an analysis pass that
recognize shaders doing clear/initialization. We'll use that
information in the following commit to insert barriers after those
shaders.
Since Gfx12.5 our HW has become a lot more sensitive to those issues
due to the introduction of an L1 untyped data cache that is not
coherent across the shader units. On Gfx20+, typed data is also L1
cacheable exposing even more issues.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
(cherry picked from commit 13bf1a4008)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40752>
This complements our existing nir_get_io_index_src helper. Most, but annoyingly
not all, stores put their data source in source 0. Having a helper for this lets
us reduce special casing in a bunch of random places.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Job Noorman <jnoorman@igalia.com>
(cherry picked from commit 8fb1d65426)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40752>
Even if a linear image isn't created with usages declaring PBE writes,
the image might be exported and then re-imported with a usage that
allows rendering to.
Always align linear images' width for being written by PBE.
This fixes WSI creating surfaces with odd width, exporting them and
re-importing for rendering.
Backport-to: 26.0
Signed-off-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
(cherry picked from commit 765a9f4fd9)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40752>
The s0abs bit in the encoing of fred instruction is wrongly set to the
status of .neg modifier instead of .abs modifier.
Fix this copy-n-paste error.
Fixes GLCTS tests when running on top of Zink:
dEQP-GLES2.functional.shaders.random.trigonometric.vertex.4
dEQP-GLES2.functional.shaders.random.trigonometric.vertex.45
dEQP-GLES2.functional.shaders.random.trigonometric.fragment.4
dEQP-GLES2.functional.shaders.random.trigonometric.fragment.45
Fixes: 8ec174b3f9 ("pco: add support for various selection, complex, trig ops")
Signed-off-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Reviewed-by: Simon Perretta <simon.perretta@imgtec.com>
(cherry picked from commit 54860bb4c7)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40752>
The referenced commit switched from a passthrough shader
to fs_clear_color[write_all_cbufs=0]. It shouldn't matter since
the shader isn't supposed to be executed - it's only setup to get
the first color output active.
On some chips (gfx8) it seems to cause issues (hangs or page fault)
for some piglit tests, eg:
framebuffer-blit-levels draw stencil
To fix this, introduce a 3rd variant, where a constant buffer isn't
required and instead the color is hardcoded in the shader.
Fixes: ca09c173f6 ("gallium/u_blitter: remove UTIL_BLITTER_ATTRIB_COLOR, use a constant buffer")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 2ff9fa8b72)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40752>
The per-bind status was always being set to VK_SUCCESS instead of the
actual result from nvk_bind_image_memory.
Fixes: 93792b5ef2 ("nvk: Add static wrappers for image/buffer binding")
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
(cherry picked from commit dd3e153a10)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40752>
When a predt/predf branch can be removed, any sync flags set on the
terminator were removed as well. Fix this by copying these flags to the
prede that replaces the terminator.
Fixes frame instability in "Devil May Cry 5" and "Resident Evil 3".
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Fixes: 39088571f0 ("ir3: add support for predication")
(cherry picked from commit b2a44da9e9)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40752>
The V3D 7.1 TFU ICFG register restructured the IFORMAT field to 3 bits
(25:23) vs 4 bits on V3D 4.2. The defines were still using the V3D 4.2
encoding (11-15) which overflows the 3-bit field. Fix values to the
correct 3-7 range.
This was working by accident because the overflow bits land in the
SVTWID field, which is not used for the affected tiling formats.
Also rename SAND_128 to SAND since V3D 7.1 has a single SAND input
format; the tile width is now controlled by SVTWID.
Fixes: 146ceadcf4 ("v3dv: add support for TFU jobs in v71")
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
(cherry picked from commit 89229f08bb)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40752>
Skylake is the default device for the Intel shim, and it's already
included in the four Intel families listed below.
Fixes: 183d57aa9e ("ci: Run intel shader-db on Haswell, Broadwell, and Meteorlake")
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
(cherry picked from commit 9dd0f19198)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40752>
Prevents assertion failures in func.shader-ballot.basic.q0 and other
tests starting with "nir/algebraic: Optimize some b2f of integer
comparison".
Vector immediates, bfloat, and 8-bit floats are still not supported.
v2: Almost complete re-write based on suggestions from Ken.
v3: Don't retype() on a brw_imm_f value.
Fixes: f8e54d02f7 ("intel/compiler: Relax mixed type restriction for saturating immediates")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 985ace332b)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40752>
Move options were bit or-ing from the wrong enum, causing undefined
behaviour when the number of intrinsics changed.
Replaced it with the values from the right nir_move_options enum that
were previously working. (Further refinement needed on these after
extensive testing.)
Fixes: f1b24267d2 ("pco: rework nir processing and passes")
Signed-off-by: Radu Costas <radu.costas@imgtec.com>
Reviewed-by: Simon Perretta <simon.perretta@imgtec.com>
(cherry picked from commit 721c1b8f65)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40752>
They check for if uses and want to return false but nir_foreach_use()
means the if uses are never seen.
Cc: mesa-stable
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
(cherry picked from commit 3f870d62b0)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40752>
Variants can modify which outputs get written so we must update
these fields otherwise spi_shader_col_format will be incorrect.
This can happen for instance with uniforms inlining:
uniform bool depth_only;
void main() {
if (depth_only) return;
...
}
When depth_only is true, this shader becomes empty after uniforms
inlining but spi_shader_col_format wasn't updated properly,
causing a hang.
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14737
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 88986dcc9c)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40752>
move_rt_instructions() only makes sense for CPS recursive shaders, where
later rt_trace_ray calls can overwrite the current shader's RT system
values.
Running it on the function-call path can hoist load_hit_attrib_amd
above merged intersection writes, which corrupts any-hit
hitAttributeEXT. Move the pass into the existing CPS-only
non-intersection branch before nir_lower_shader_calls().
Fixes: c5d796c902 ("radv/rt: Use function call structure in NIR lowering")
Closes: #15074
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
(cherry picked from commit 5a7f4c62d8)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40752>
When vdrm_handle_to_res_id fails in virtio_bo_init_dmabuf, the handle
obtained from vdrm_dmabuf_to_handle was leaked.
Closing the handle is safe despite the lack of vdrm refcounting
because dma_bo_lock is held and already-imported BOs return early.
At this point, we are the sole holder of the handle.
While here, use the local vdrm variable consistently.
Fixes: 6ca192f586 ("turnip: virtio: fix iova leak upon found already imported dmabuf")
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
(cherry picked from commit f2c89f0188)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40752>
In tu_bo_init, if growing the submit BO list fails, the GEM handle
must be closed. However, bo->gem_handle is only populated later
via compound assignment. Use the gem_handle parameter directly
to ensure the correct handle is closed and not leaked.
Fixes: d67d501af4 ("tu/drm/virtio: Switch to vdrm helper")
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
(cherry picked from commit 316d9b0209)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40752>
When initializing a BO using a lazy VMA, the iova is provided by
the sparse VMA and was not allocated from the device's VMA heap.
Avoid calling util_vma_heap_free in the error path for such BOs
to prevent heap corruption and potential double-frees.
Fixes: 88d001383a ("tu: Add support for a "lazy" sparse VMA")
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
(cherry picked from commit eb7897f57b)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40752>
set_iova() was called unconditionally after tu_bo_init(), even on the
failure path where the BO has been zeroed. This would call set_iova()
with res_id 0 and a stale iova, corrupting the iova mapping.
Move set_iova() into the success branch so it is only called when
tu_bo_init() succeeds.
Fixes: db88a490b8 ("tu: Avoid extraneous set_iova")
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
(cherry picked from commit 7a96bc3187)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40752>
load_helper_invocation can not be reordered past a demote.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Fixes: 7ece220f96 ("nak/nir: Lower systm values before lowering I/O")
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
(cherry picked from commit cba5841d61)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40752>
Signed-off-by: Mary Guillemard <mary@mary.zone>
Fixes: 0092edfec0 ("nir/dead_cf: Do not remove loops with loads that can't be reordered")
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
(cherry picked from commit 6013667d61)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40752>
with the unlowering pass, there is no longer a separate gl_LastFragData variable,
so this workaround just breaks color outputs
fixes dEQP-GLES31.functional.shaders.framebuffer_fetch.basic.last_frag_data
cc: mesa-stable
(cherry picked from commit 4b2022a8f5)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40752>
When the first attachment is assigned to a tile buffer, the buffer
alloc mask was not been updated. This means when a second attachment
is added to the same tile buffer it will be assigned the same offset
as the first which will lead to incorrect behaviour.
Fixes for depq-vk:
dEQP-VK.renderpasses.dynamic_rendering.complete_secondary_cmd_buff.suballocation.attachment.4.568
dEQP-VK.renderpasses.dynamic_rendering.complete_secondary_cmd_buff.dedicated_allocation.attachment.4.568
dEQP-VK.renderpasses.dynamic_rendering.primary_cmd_buff.suballocation.attachment.4.568
dEQP-VK.renderpasses.dynamic_rendering.primary_cmd_buff.dedicated_allocation.attachment.4.568
Fixes: a7de9dae6b ("pvr: Add routine for filling out usc_mrt_setup from dynamic rendering state")
Signed-off-by: Nick Hamilton <nick.hamilton@imgtec.com>
Acked-by: Frank Binns <frank.binns@imgtec.com>
(cherry picked from commit 96cfb1cb7f)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40752>
Do not assume that the application always provides images for backing
attachments. The app can provide a super set of attachments of which
only some are actually backed with images.
We want to filter-out attachments that aren't meaningful for rendering
or sampling, and create compiler resources only for relevant ones.
Fix assert in CTS:
pvr_arch_mrt.c:215: pvr_rogue_init_usc_mrt_setup: Assertion `att_format != VK_FORMAT_UNDEFINED' failed.
Seen in pipeline monolithic, for instance:
dEQP-VK.pipeline.monolithic.multisample.misc.dynamic_rendering.multi_renderpass.r8g8b8a8_unorm_r16g16b16a16_sfloat_r16g16b16a16_sint_d32_sfloat_s8_uint.random_127
Fixes: d549c1d045 ("pvr: add pipeline handling to use dynamic rendering info")
Signed-off-by: Luigi Santivetti <luigi.santivetti@imgtec.com>
Acked-by: Frank Binns <frank.binns@imgtec.com>
(cherry picked from commit 5473ca3be3)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40752>
Expose the routine in preperation for a later commit.
Backport-to: 26.0
Signed-off-by: Luigi Santivetti <luigi.santivetti@imgtec.com>
Acked-by: Frank Binns <frank.binns@imgtec.com>
(cherry picked from commit 6b0fea938b)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40752>
Use the valid/input coverage masks for tile buffer store coverage masks
when running single/multi-sampled fragment shaders respectively.
Fixes: 297a0c269a ("pvr, pco: tile buffer support")
Signed-off-by: Simon Perretta <simon.perretta@imgtec.com>
Reported-by: Nick Hamilton <nick.hamilton@imgtec.com>
Acked-by: Frank Binns <frank.binns@imgtec.com>
(cherry picked from commit 8eee60fa78)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40752>
When destroying H264/5 decode context we check the profile from decoder to
free the H264/5 PPS/SPS objects, but decoder is only created when decoding
first frame so these objects will never get freed in case decoder is NULL.
Cc: mesa-stable
Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com>
(cherry picked from commit 5134d37e7d)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40752>
Dual source blending when one of the sources is not written to leaves
those values undefined, but the other should still be valid.
By omitting unwritten outputs, we ended up not writing anything at all
for the case that OUT1 is written to but OUT0 is undefined.
Fixes new CTS tests: dEQP-VK.pipeline.*.blend.dual_source.undefined_output.first*
Cc: mesa-stable
Signed-off-by: Iván Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit fd556e54f6)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40752>
Dividing this by itself is nonsensical, and just always gives us one.
That's obviously not what we want here.
But in this case we also know that the extent is divisible by the tile
extent, so there's no need for DIV_ROUND_UP, we can just divide.
Fixes: e6f8cab698 ("pan/layout: Split the logic per modifier")
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
(cherry picked from commit 5280b80281)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40752>
We can't use the stencil-aspect of a color-attachment. That's going to
fail, so let's use the color-aspect instead. We already have it around
anyway.
Fixes: 7a763bb0a3 ("pan/genxml: Rework the RT/ZS emission logic")
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
(cherry picked from commit 322aaa88c6)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40752>
This is controlled by the writeback-mode when using AFRC, not by an YUV
Enable field. This Filed doesn't exist in these, and should according to
the spec be zero.
Fixes: 7a763bb0a3 ("pan/genxml: Rework the RT/ZS emission logic")
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
(cherry picked from commit 15e0ac0731)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40752>
This code path is usually used by lavapipe when importing dmabufs, not
for output.
The resulting size_required is then used to calculate the size
requirements for VkMemoryRequirements2 etc. Requiring a multiple of
LP_RASTER_BLOCK_SIZE - 4 - can eventually result in lavapipe rejecting
dmabuf imports.
An example is YUV420 at a resolution of 1680x1050 produced by Gstreamer
1.28 - e.g. from a screencasts. In this case we currently compute a size
of 3235840, while other drivers like radv compute 3225600. The actual
size is 3227648, fitting into the later but not the former.
Removing the alignment brings lavapipe in line with other drivers.
Cc: mesa-stable
Signed-off-by: Robert Mader <robert.mader@collabora.com>
(cherry picked from commit 0bbc26d2c4)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40752>
The preprocessor symbol we want is `PAN_ARCH`, not `MALI_ARCH`.
Fixes: a21ee564e2 ("pan/bi: Make texel buffers use Attribute Buffers")
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
(cherry picked from commit 3945421c17)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40752>
We were computing some positions using `void*` rather than pointers to
the appropriate structures. This caused bad pointers, the effect of
which depended on the current memory environment -- tests related to
texel buffers could pass or not depending on what other tests had run
previously.
Fixes: a21ee564e2 ("pan/bi: Make texel buffers use Attribute Buffers")
Signed-off-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
(cherry picked from commit 0142e2e5e3)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40752>
On Turing, the hardware rely on the viewport index for FSR.
If not all viewports are defined, we will end up not rendering
anything when selecting the primitive shading rate.
This patch makes it that we now broadcast the viewport and scissor 0
likes the proprietary driver.
This fixes "dEQP-VK.mesh_shader.ext.builtin.primitive_shading_rate_*" on
Turing.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Fixes: 2fb4aed9 ("nvk: Advertise VK_KHR_fragment_shading_rate")
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
(cherry picked from commit d00965651a)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40752>
We are going to need to reuse those functions to fix FSR support on
Turing.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Cc: mesa-stable
(cherry picked from commit 56e31d8145)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40752>
The game uses glGetUniformLocation() but specifies the wrong program id
for one of the uniforms. The shader programs both contain shaders with
a uniform of the same name but because they have a different number of
uniforms the returned uniform location does not match the expected uniform.
Here we add a workaround to force the uniform with the wrong get location
params to always have the location 0 so that it doesn't matter which
shader the application checks for the location.
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14864
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 09393b33b2)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40752>
Allows a uniform name to be passed to force_explicit_uniform_loc_zero
allowing us to set that uniform to an explicit location of zero.
Cc: mesa-stable
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 87ae5cab94)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40752>
This sets the per-vertex point size state correctly in the presence of mesh shaders.
(fixes line is just a educated pick)
Fixes: 51d6e4404a ("mesa: allow NULL for vertex shader when mesh pipeline")
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
(cherry picked from commit 5bfaf7536a)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40752>
Currently the headless WSI unconditionally uses DRM images as WSI
images, which isn't proper behavior for working with lavapipe driver,
and leads to either error or crash (depending on whether udmabuf is
available).
Properly setup CPU images instead of DRM images for software-rendering
WSI devices.
This fixes (at least) `dEQP-VK.wsi.headless.swapchain.render.*` on
lavapipe.
Fixes: 90caf9bdbd ("vulkan/wsi/headless: drop the wsi_create_null_image_mem override")
Signed-off-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
(cherry picked from commit 38cf1b3829)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40752>
st_texture_set_sampler_view() currently allows only one samplerview for
a given texobj per context. in a scenario where the same texobj is
bound multiple times with different samplerviews (e.g., SRGB) for the
same draw like
samplerviews[] = {view0, view1}
then st_texture_set_sampler_view() will release view0 while creating view1
before either view is actually set to the driver, and then the driver will explode
this is gross, but the best solution which avoids infinite memory ballooning
from bufferview offsets is to pass through the array of views during creation
to ensure that the cache doesn't try to prune a view it just created
caught by Left 4 Dead 2
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/15045
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
(cherry picked from commit 3264adf863)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40488>
We can produce a transposed value sometimes, and we have to make sure
that val->transposed is also updated when that happens.
Noticed by inspection after the previous commit.
Cc: mesa-stable
(cherry picked from commit c13bdaaa40)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40488>
Since the stack pointer may wrap around the stack size in overflow
cases, traversal logic calculates the real stack pointer with
nir_umod_imm(b, stack, args->stack_entries * args->stack_stride).
For ray queries, "stack" was initialized to
"stack_base + local_invocation_idx * 4". This was completely broken, as
the umod would later delete the stack base completely and overwrite the
start of LDS, which belongs to the apps' shared memory.
Instead, add the stack base as a constant offset in the load/store_stack
callback. (This should also save 1 VALU per ray query)
Also, delete radv_ray_traversal_args::stack_base since it's unused now.
Cc: mesa-stable
(cherry picked from commit b046eaf36d)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40488>
Vertex shaders shorter than four instructions can hard-lock R3xx GPUs.
This seems to happen in combination with a small vertex count. This was
seen before, most notably with dummy shaders, but the earlier fix only
removed those dummy shaders, so some occurrences could still slip
through the cracks. Pad all vertex shaders to four instructions on R3xx.
Reviewed-by: Filip Gawin <filip@gawin.net>
Fixes: c6aa639ba9 ("r300: skip draws instead of using a dummy vertex shader")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/337
(cherry picked from commit 9b12664b72)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40488>
GFX12 encoding added one bit to the stack offset, doubling the limit on
the stack base offset that is possible to encode. In practice, this
always allows using bvh_stack_push* instructions on GFX12 since LDS is
still 64kB.
Cc: mesa-stable
Fixes: 59a39779 (radv/rt: Only use ds_bvh_stack_rtn if the stack base is possible to encode)
(cherry picked from commit 867d0b33b3)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40488>
Needed for some FSR macro changes I want to test.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Fixes: 7d6cc15ab8 ("nvk/mme: Add a unit test framework for driver macros")
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
(cherry picked from commit 32895657b4)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40488>
The correct dependence is cs_flush_caches.cs_defer.signal to
signal cs_sync32_set.cs_defer.wait in occulusion query path.
Fixes: 443ddac ("panvk/csf: merge v10 and v11 paths in
issue_fragment_jobs")
Fixed: many random fail cases in VK-GL-CTS 1.4.4.2, eg.
dEQP-VK.query_pool.occlusion_query.get_results_conservative
_size_64_wait_query_without_availability_draw_points_clear_color
Signed-off-by: Ryan Zhang <ryan.zhang@nxp.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
(cherry picked from commit 93b58064f7)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40488>
Normally Venus on Nvidia GPUs takes the prime blit path. The exception
is when KWin or any wlroots based compositors are used:
1. KWin and wlroots based compositors always add LINEAR to dmabuf
feedback tranches assuming LINEAR can be handled by GPU drivers.
2. Venus + Virgl only sees the compositor injected LINEAR mod since
Virgl doesn't support explicit modifiers on the driver side.
3. Nvidia GPUs doesn't support LINEAR color attachment, and it's too
late to reject LINEAR mod when the native image path has already
been taken instead of the prime image path.
Gamescope requires VK_EXT_physical_device_drm and its runtime doesn't
use standard WSI extensions, so venus can spoof without impacting it.
Cc: mesa-stable
(cherry picked from commit 1a302155ee)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40488>
"explicit sw" means llvmpipe, which cannot be a real drm device. this requires also
returning only a single device so as to avoid leaking non-sw drivers
should fix LIBGL_ALWAYS_SOFTWARE=1 eglinfo
Fixes: 8a339cdebc ("egl: fix sw fallback rejection in non-sw EGL_PLATFORM=device")
(cherry picked from commit c9b2986607)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
WA states that we need to allocate maximum number of stackIDs per DSS
from RT_DISPATCH_GLOBALS to 2048.
We can still throttle/control the CFE_STATE::StackID to be in range
specified by the field.
This does impact performance having CFE_STATE::stackIDs capped to 2K
by default. More the outstanding ray queries, larger the working set and
have more impact on cache hit rate.
This affect performance on Xe2+ onwards:
* Boundary Benchmark: 36.2%
* Solar Bay extreme: 9.8%
* Hitman world of assassination: 3.9%
Fixes: c1a44e8d43 ("anv: force StackIDControl value for Wa_14021821874")
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
(cherry picked from commit cb423ee636)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
Previously, we assumed that the selector for bcsel could be whatever,
regardless of the bit sizes of the data and we'd just fix it in the
back-end. This works okay for scalars but falls over the moment we
vectorize because all our vector handling assumes bit sizes match.
Since matching bit sizes is what the hardware wants anyway, it's better
to do the right thing in NIR and hope copy-propagation can fold in
conversions if needed.
Unfortunately, copy prop isn't that smart yet so this does hurt a bit:
Instrs: 1193679 -> 1198086 (+0.37%); split: -0.06%, +0.43%
CodeSize: 11915136 -> 11950592 (+0.30%); split: -0.05%, +0.34%
Full: 160985 -> 160941 (-0.03%); split: -0.04%, +0.01%
Estimated normalized CVT cycles: 4456.938557000181 -> 4480.876069000186 (+0.54%); split: -0.13%, +0.67%
Estimated normalized SFU cycles: 6350.9375 -> 6392.21875 (+0.65%)
Estimated normalized Load/Store cycles: 205773.0 -> 205795.0 (+0.01%)
Maximum number of threads: 12864 -> 12863 (-0.01%)
Number of spill instructions: 22487 -> 22489 (+0.01%)
Number of fill instructions: 52179 -> 52219 (+0.08%)
Hurt shaders:
google-meet-clvk/BgBlur
google-meet-clvk/Relight
parallel-rdp/small_subgroup
parallel-rdp/small_uber_subgroup
The proper solution here is to teach copy-prop about this stuff so that
it can propagate swizzles into ALU ops when they're supported:
https://gitlab.freedesktop.org/panfrost/mesa/-/issues/265
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14945
Cc: mesa-stable
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
(cherry picked from commit 3fd471dca5)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
It calls both for some reason but never handles any other booleans than
32-bit. This was probably a mistake.
Fixes: e63a7882a0 ("etnaviv: call nir_lower_bool_to_bitsize")
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
(cherry picked from commit 6fb3995659)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
This should be doing a or and not an assign.
This fixes issues on NVK with mesh stages on DGC.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Fixes: 9308e8d90d ("vulkan: Add generic graphics and compute VkPipeline implementations")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit 8f2eeee7ba)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
Zink uses the output primitive of the last vertex stage when deciding
the raster primitive. When we generate the gs the output primitive
depends on the raster primitive.
Not only does the generated gs output primitive have no value in chosing
the raster primitive, it can also get us stuck with the last raster
primitve which is of course incorrect.
Ignore it for generated shaders.
Cc: mesa-stable
(cherry picked from commit d526bbc29b)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
If multiview is enabled on the render pass, baseLayer and layerCount
will be 0 and 1 respectively and throw us off.
We can still fast clear if view_mask == 1, but anything else hits the
BLORP_BATCH_NO_EMIT_DEPTH_STENCIL restriction.
Fixes: e488773b29 ("anv: Fast clear depth/stencil surface in vkCmdClearAttachments")
Signed-off-by: Iván Briano <ivan.briano@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
(cherry picked from commit 5d22f307d5)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
Commit 0d4aa5f55f introduced the watermark to optimize the guardband
state changes and always computed new_distance as MAX2(distance,
watermark).
That is correct for point/line paths where distance > 0, but it keeps a
non-zero discard distance alive when the next draw sets distance = 0
(triangles). This leaks wide point/line clip-discard state into later
triangle draws and can clip away large parts of geometry (as observed in
Sauerbraten). Only apply the watermark when distance > 0 and reset it to
zero otherwise so triangle draws disable clip-discard as intended.
Fixes: 0d4aa5f55f ("r300: pop-free clipping")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14959
(cherry picked from commit ce33f82f83)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
It must be lowered.
This fixes
dEQP-VK.spirv_assembly.instruction.compute.compute_shader_derivatives.{mesh,task}.*.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 3c4cb16159)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
If the secondary changes the fragment output state and if the same
PS epilog used before ExecuteCommands() is re-bind immediately after
that call, the PS epilog state wouldn't be re-emitted.
Apply the same change for VS prologs, although the logic is slightly
different and the bug shouldn't occur. The whole logic of secondaries
should be completely rewritten because it's definitely not robust.
This fixes a GPU hang in Where Winds Meet, see
https://github.com/doitsujin/dxvk/issues/5436.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 1a00587c44)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
Lavapipe relies on true udmabuf support for dmabuf export allocation.
This changes aligns the behavior with both llvmpipe_allocate_memory_fd
and llvmpipe_import_memory_fd.
Fixes: 7d0a631f20 ("llvmpipe: export dmabuf caps for kms_swrast")
Reviewed-by: Lucas Fryzek <lfryzek@igalia.com>
(cherry picked from commit 5ab8c8a439)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
This is not NaN correct.
And also make the pattern 32bit only because the constant is hard coded
FLT_MAX.
Fixes: 780b5c1037 ("nir/algebraic: Simplify some Inf and NaN avoidance code")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
(cherry picked from commit ab773fc5d4)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
There was duplicated code to set unscaled_input_fragcoord and a read
from VK_ATTACHMENT_UNUSED attachment, which incorrectly updated
builder->unscaled_input_fragcoord.
ubsan:
tu_pipeline.cc:4734:44: runtime error: load of value 127, which is not a valid value for type 'bool'
Seen in:
dEQP-VK.renderpasses.renderpass1.custom_resolve.monolithic.stencil_only_s8
Fixes: 97da0a7734 ("tu: Rewrite to use common Vulkan dynamic state")
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
(cherry picked from commit 81a76be861)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
For dynamic renderpass we created a fake second subpass,
which would is used by CmdBeginCustomResolveEXT, however
CmdBeginCustomResolveEXT doesn't trigger tile stores, but
attachments didn't know they should be stored after fake
custom resolve subpass.
Fixes: 520e3f3a47 ("tu: Implement VK_EXT_custom_resolve")
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
(cherry picked from commit 67c54c4465)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
Use INT_MIN instead of INT_MAX for underflow.
Fixes: cc4b50b023 ("nir/opcodes: use u_overflow to fix incorrect checks")
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pelloux@gmail.com>
(cherry picked from commit da57fbfb07)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
We need to update the state var locations before the
st_serialize_base_nir() calls otherwise
_mesa_optimize_state_parameters() can alter params such that
variants wont be able to find the correct match when calling
_mesa_lookup_state_param_idx().
Prior to 891d46f5 this worked because after failing to match
we would end up adding additional params back in that we had
just attempted to optimise.
Fixes: a6fcc2835e ("
st/glsl_to_nir: make sure the variant has the correct locations set")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14837
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
(cherry picked from commit 6c60f423b3)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
There was an assumption that if the instruction had non-native float
as a source, the first source would have such type. This doesn't
hold for Select, and the code failed in two ways
- The boolean source of Select was being converted to the non-native
float type.
- The loop that resolves the bit-size for unsized operands would
trip at `assert(i == 0)` because Select has more than one source.
Re-organize the code to track the types of the sources independently,
and fix both issues above.
Fixes: 90e1b12890 ("spirv: Add bfloat16 support to SpecConstantOp")
Fixes: 51d3c4c889 ("spirv: support float8 spec constant op")
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
(cherry picked from commit 6affcb43a7)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
Only used by Convert operations, so just pass 0 from callers that
are not Convert and clarify that in the code.
Backport-to: 26.0
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
(cherry picked from commit 1c3c987d5c)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
If (uintptr_t)&deleted_key is small enough, inserting entries into the
hash table might not work correctly.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Backport-to: 26.0
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
(cherry picked from commit c0079e09ca)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
Like accumulators and ARF address registers, the virtual address
registers are not tracked in a way the defs analysis can know
about. This could actually be fixed, but that is future work.
Fixes: b110b06447 ("brw: introduce a new register type for the address register")
Suggested-by: Lionel
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 8624da56ee)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
brw_reg::nr encodes both which ARF it is and which instance of that
ARF. In other words, nr for acc0 and acc2 have some bits that say
BRW_ARF_ACCUMULATOR and some bits that say 0 vs 2. The previous test
would only detect acc0.
Fixes: 0d144821f0 ("intel/brw: Add a new def analysis pass")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 366410e913)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
This can occur if NULL or an accumulator is an explicit destination.
update_for_reads still needs to process the sources.
v2: Pass a brw_reg to ::mark_invalid, and do the VGRF check in that one
place.
Fixes: 0d144821f0 ("intel/brw: Add a new def analysis pass")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit a548466186)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
v3d_tlb_blit_fast includes the blit onto a pending job that writes
to the source resource. The TLB data is already unpacked according to
the job's RT format, so storing it with a different RT format performs
a channel reinterpretation rather than a raw byte copy, corrupting the
data.
So when copying from RGB10_A2UI to RG16UI with glCopyImageSubData,
the copy_image path remaps both formats to R16G16_UNORM for a raw
32-bit copy. The fast TLB blit found the pending clear job
(RGB10_A2UI, 4 channels: 10-10-10-2) and stored its TLB data as RG16UI
(2 channels: 16-16), writing the unpacked 10-bit R and G channel values
into 16-bit fields instead of preserving the raw packed bits.
Previous internal_type/bpp check was insufficient: both RGB10_A2UI
and RG16UI share internal_type=16UI and the source bpp (64) exceeds
the destination bpp (32), but their channel layouts are different.
Add a check that the job's source surface RT format matches the blit
destination RT format before allowing the fast path.
Fixes: 66de8b4b5c ("v3d: add a faster TLB blit path")
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
(cherry picked from commit 5454221cfb)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
This is an old driver bug that could cause Z corruption on gfx8-11.5.
v2: handle allow_expclear differently
Cc: mesa-stable
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> (v1)
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> (v2)
(cherry picked from commit 4cfe08e583)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
fp16 has quite the limited value range and with bigger integers
nir_round_int_to_float might return Inf where it shouldn't depending on
the rounding mode.
Fixes conversions half_rt[npz]_(u)?(int|long) CL CTS tests.
Cc: mesa-stable
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Rob Clark <rob.clark@oss.qualcomm.com>
(cherry picked from commit e1ed7de274)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
The special value "Inf" doesn't fit into an int and therefore we have to
clamp regardless of whether all the other values would fit. And because
f2u32 and f2u64 define out-of-range conversions as UB in nir, we need to
clamp.
This change should have no effect for non saturating conversions.
Fixes "conversions long_sat_*half" CL CTS tests
Cc: mesa-stable
Suggested-by: Rob Clark <rob.clark@oss.qualcomm.com>
Reviewed-by: Rob Clark <rob.clark@oss.qualcomm.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
(cherry picked from commit 8e8fb2ebaa)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
If the BO comes from a different subsystem
(args.extra_flags & DRM_PANTHOR_BO_IS_IMPORTED), we should normally
add extra DMA_BUF_IOCTL_SYNC calls around CPU accesses to ensure the
CPU mapping consistency, but this is something we never worried about
(we've always assumed exporters were exposing uncached mappings with
NOP {begin,end}_cpu_access() implementations), and it worked fine until
now.
The long term plan is to hook up DMA_BUF_IOCTL_SYNC, but this requires
more work, and we need a quick fix that can be backported easily, hence
this revert+FIXME.
Fixes: b5e47ba894 ("pan/kmod: Add new helpers to sync BO CPU mappings")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14963
Closes: https://gitlab.freedesktop.org/panfrost/mesa/-/issues/282
Closes: https://gitlab.freedesktop.org/wayland/weston/-/issues/1101
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Acked-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
(cherry picked from commit 30f1d5bab9)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
st->release_counter is initialized to 0, so if we happen to call
st_add_releasebuf with a non-NULL releasebuf when st->work_counter
is 0 due to wraparound in st_context_add_work, we might end up never
calling st_prune_releasebufs.
Since st_context_add_work and st_add_releasebuf both use work_counter
as a "some work was done" and don't care about the actual value, we
can remove the wraparound which will fix the buffer not being released
issue.
Fixes: b3133e250e ("gallium: add pipe_context::resource_release to eliminate buffer refcounting")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14955
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14499
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 10d32feae8)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
We can have PHIs like this: m10 = PHI u2, 3.
For these, insert_coupling_code will spill the sources but that doesn't
work properly for FAU values before this commit because bi_index_as_mem
asserts that index.type == BI_INDEX_NORMAL and we also can't look up an
FAU index in ctx->S_exit or ctx->remat.
Fixes: 6c64ad93 ("panfrost: spill registers in SSA form")
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
(cherry picked from commit 8a4d8d490b)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
In the following arrangement the old logic leads to the following:
|
v
+----------+------------+
|block5 |
|m815 = PHI m1034, m860 |<-----------+
|343 = FMA.f32 ... | |
+----------+------------+ |
| |
+--------------+ |
| | |
v v |
+-----+ +-----+ |
|b6 | |b7,8 | |
| | | | |
+-----+ +--+--+ |
| +---+ | +---+ |
+----|b9 +-----+----|b10+---+ |
v +---+ +---+ v |
+-------+-------------+ +-------+---------+ |
|block12 | |block11 | |
|m882 = PHI m815, m860| |m860 = MEMMOV 343+--+
+---------+-----------+ +-----------------+
v
The spill of / into m860 (corresponding to 343) ends up in block11 when
insert_coupling_code(succ=block5, pred=block11) because of the memory
phi in block5. Later, in insert_coupling_code(block12, block9), we
reject inserting the spill after ca9c9957. As a result, m860 is
undefined along block5 -> block7,8 -> block9 -> block12.
When the spill position is chosen first, ctx->block is block5 so
choose_spill_position falsely returns the fallback position. The issue
can be fixed by explicitly passing the "current block".
Fixes: ca9c9957 ("pan: Avoid some redundant SSA spills")
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
(cherry picked from commit 09e1ba28e5)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
0886be09 ("glsl: Allow precision mismatch on dead data with GLSL ES 1.00")
allowed precision mismatches on uniforms, however if you lower precision on
16-bit consts, then this error triggers instead.
So here we relax the type matching and just make sure we match int vs
float.
Fixes: 0886be09 ("glsl: Allow precision mismatch on dead data with GLSL ES 1.00")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5337
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 73bc604128)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
AMD docs support this:
R5xx Acceleration v1.5 says safest handling for ZFUNC changes is to disable
HiZ except specific LESS/LEQUAL and GREATER/GEQUAL transitions.
ATI OpenGL Programming and Optimization Guide advises avoiding ALWAYS when
trying to benefit from HiZ so that would imply fglrx also disables HiZ
there.
On RV530 this fixes the following dEQPs:
dEQP-GLES2.functional.fragment_ops.interaction.basic_shader.43
dEQP-GLES2.functional.fragment_ops.interaction.basic_shader.74
Fixes: 12dcbd5954 ("r300g: enable Hyper-Z by default on r500")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8093
(cherry picked from commit b0f019f8cf)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
The DISCARD_WHOLE_RESOURCE path in vc4_map_usage_prep() replaces the
resource's BO with vc4_resource_bo_alloc(). As the RCL resolves
rsc->bo at job submit in vc4_submit_setup_rcl_surface(), any pending
write job would store to the new BO instead of the old one, corrupting
the new written data.
This is the same bug that was fixed in v3d in the previous commit.
Fixes: 18ccda7b86 ("vc4: When asked to discard-map a whole resource, discard it.")
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
(cherry picked from commit ecb6c5d555)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
The DISCARD_WHOLE_RESOURCE path in v3d_map_usage_prep() replaces the
resource's BO with v3d_resource_bo_alloc(). As the RCL resolves
rsc->bo at job submit in emit_rcl() any pending write job would
store to the new BO instead of the old one, corrupting the new
written data.
This is adressed by flushing all pending write jobs affecting the
resource before replacing its BO.
This fixes multiple tests where data copied to a renderbuffer was
overwritten by a previos GPU clear. Test are from the subgroup:
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.*
Fixes: 45bb8f2957 ("broadcom: Add V3D 3.3 gallium driver called "vc5", for BCM7268.")
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
(cherry picked from commit 1eaa46da09)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
We might need to DCE users of dead instructions removed by
process_block().
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: 9e8ba10447 ("aco/vn: remove dead instructions early")
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
(cherry picked from commit 17b18496f6)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
We're setting this in the non-DRI codepath, but this was missed when we
started embedding the VA driver into libgallium. This means we no longer
were able to use VA-API from meson devenv, like we could before.
Fixes: 212d57f7e6 ("targets/va: Build va driver into libgallium when building with dri")
(cherry picked from commit 7e4744909b)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
This change is useful when the compute shader is called multiple
times with the atomic operations enabled. It fixes some data
coherency issues. This is done by moving
evergreen_emit_atomic_buffer_setup() after r600_flush_emit().
This change is also a partial fix for compute_shader.pipeline-compute-chain.
In this specific case, it makes the memory barrier working.
This change was tested on cayman and barts; it makes these tests
fully deterministic:
khr-gl4[2-6]/shader_atomic_counters/advanced-usage-many-dispatches: fail pass
khr-gles31/core/shader_atomic_counters/advanced-usage-many-dispatches: fail pass
deqp-gles31/functional/synchronization/inter_call/without_memory_barrier/atomic_counter_dispatch_.*_calls_.*_invocations: fail pass
Cc: mesa-stable
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
(cherry picked from commit dad942b468)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
The alpha instruction always wrote to the same rendertarget as the rgb and the
original target was ignored (surprisingly the HW docs explicitly allows rgb and
alpha to write to different targets). This makes tesseract rendering a bit
better, but there are still some remaining issues.
Fixes: 1c2c4ddbd1 ("r300g: copy the compiler from r300c")
Reviewed-by: Filip Gawin <filip@gawin.net>
(cherry picked from commit 87a881558f)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
When building with "-Dvideo-codecs=h264dec,h265dec,av1dec" va/encode.c
won't be built but it's still required because it's used from
picture.c
Fixes: c4f05bdf60 ("frontends/va: include picture_*.c based on selected codec")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 82a51ba9b3)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
Among all subcommands, only gfx subcommands are bound to a query pool,
other subcommands seem to need no special handling.
In addition, if a ResetQuery is done before BeginQuery, the last
subcommand will be a event one, which fails the current assert that
assumes it's a gfx one.
Change the assertion of the subcommand being a gfx one to an addition
check of whether the subcommand is a gfx one.
This fixes crash of Vulkan CTS 1.4.5.1 test
dEQP-VK.query_pool.discard.normal.no_depth.none.discard .
Backport-to: 26.0
Signed-off-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Reviewed-by: Simon Perretta <simon.perretta@imgtec.com>
(cherry picked from commit 5a497316d4)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
In 1f83e73145, the pre-encode input picture size was also reduced.
However it was recently discovered that VCN FW uses the input picture
pitch as the pitch for this, which means that previous change broke
pre-encode.
Fixes: 1f83e73145 ("radeonsi/vcn: Reduce allocated size for pre-encode recon pics")
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
(cherry picked from commit 2b2b1d405a)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
The logic is supposed to find the stage with the maximum constlen to
trim for each time we have to trim a stage. But by not resetting
max_constlen each time, we would "trim" the same stage repeatedly,
leaving us thinking the total is below the limit when it actually isn't.
Cc: mesa-stable
(cherry picked from commit ae8928b638)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
The HW uses ViewportIndex to select which GRAS_BIN_FOVEAT offset to use.
For normal 3d draws, either the ViewportIndex equals the view/layer or
we make the offset the same for all viewports/layers, but we aren't
aware of this in the 3d path and we always use viewport 0.
Use the HW offset 0 when subtracting the HW offset. This is a bit of a
hack, but it should work. This fixes LOAD_OP_LOAD with FDM.
Fixes: b34b089ca1 ("tu: Use GRAS bin offset registers")
(cherry picked from commit 68c0031f56)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
- RGBA8888_* is a preprocessor alias for R8G8B8A8_* in u_format.yaml.
- Both entries in the format tables collide on the same enum value, and
RGBA8888 overwrites R8G8B8A8.
- The fix was reverting to the version that was in the commit
39e949434c because there is a different format
was used that did not cause any collisions.
dEQP fixes:
dEQP-VK.api.info.format_properties.r8g8b8a8_sint
dEQP-VK.api.info.format_properties.r8g8b8a8_snorm
dEQP-VK.api.info.format_properties.r8g8b8a8_uint
dEQP-VK.api.info.format_properties.r8g8b8a8_unorm
Fixes: 9f740b26a6 ("pvr: Fix bugs in the format table")
Signed-off-by: Leon Perianu <leon.perianu@imgtec.com>
Tested-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
(cherry picked from commit 7c6dbb099a)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
We should be returning if no GS is needed and no GS shader is bound.
This fix various segfaults introduced by the original fix.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Fixes: e10f29399f ("hk: fix passthrough GS key invalidation")
Reviewed-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Janne Grunau <j@jannau.net>
(cherry picked from commit 6d040df750)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
With perfetto that string is processed later leading to
use-after-free.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
(cherry picked from commit 413e169f45)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
Blorp emits 3DSTATE_BINDING_TABLE_POINTER_* instructions in 3D mode.
At the moment we're saved by the push constants reemitting the btp but
we'll drop that in the next commit.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
(cherry picked from commit 533c748b34)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
Cache flushes should be skipped on SDMA. In practice,
radv_emit_cache_flush() should only be called on GFX/ACE.
SDMA NOP packets are emitted in barriers directly.
This fixes recent VKCTS coverage
dEQP-VK.api.command_buffers.secondary_on_transfer_queue.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit c4d5090d69)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
This should definitely be an OR operation if MRT0 and MRT1 don't write
the same channels. This also requires to set the writemask manually
because when it's 0 (in case a dual-source output is missing), the
intrinsic computes the mask itself with the number of components.
No fossils-db changes on NAVI33.
Fixes: 45d8cd037a ("ac/nir: rewrite ac_nir_lower_ps epilog to fix dual src blending with mono PS")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14878
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 2eb9420061)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
On the Rogue architecture add support for using a fragment passthrough
shader when there is no fragment shader present in a graphics
pipeline but the sample mask is required.
fix:
dEQP-VK.pipeline.monolithic.empty_fs.masked_samples
Backport-to: 26.0
Signed-off-by: Nick Hamilton <nick.hamilton@imgtec.com>
Co-authored-by: Simon Perretta <simon.perretta@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
(cherry picked from commit 14508b4c9a)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
Configure the EOT setup for SPM EOT programs so that the generated
programs load the tile buffer into the output buffer before doing
the emit
Partial fix for:
dEQP-VK.renderpass.*.attachment_allocation.input_output.71
Backport-to: 26.0
Signed-off-by: Nick Hamilton <nick.hamilton@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
(cherry picked from commit d1f2ad17dd)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
In subpasses preserve attachments are not used by the subpass but
their contents must be preserved throughout the subpass.
Add a list for the preserve attachments info specified by a subpass
and when determining a subpass attachments total uses check the
preserve attachments list and add it uses to the total.
Partial fix for:
dEQP-VK.renderpass.*.attachment_allocation.input_output.71
Backport-to: 26.0
Signed-off-by: Nick Hamilton <nick.hamilton@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
(cherry picked from commit 0e01b9ef2d)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
The struct will also be used for preserve attachments in the next
commit.
Backport-to: 26.0
Signed-off-by: Nick Hamilton <nick.hamilton@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
(cherry picked from commit e18670347a)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
When calculating the dwords per pixel the output registers should
always be taken into account in addition to the number of tile buffers.
Fixes incorrect scratch buffer space calculation when both output
registers and tile buffers are emitted by a render.
Partial fix for:
dEQP-VK.renderpass.*.attachment_allocation.input_output.71
Fixes: 3457f8083a ("pvr: Acquire scratch buffer on framebuffer creation.")
Signed-off-by: Nick Hamilton <nick.hamilton@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
(cherry picked from commit df445dc9b9)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
The subpass merging optimisation check for when subpasses are using
tile buffers was in the incorrect location.
The current check is in a function called from two places but only
the first of these should have been doing the optimisation check.
This was incorrectly affecting the number of renders that subpass
merging could avoid.
Partial fix for:
dEQP-VK.renderpass.*.attachment_allocation.input_output.71
Fixes: 10b6a0d567 ("pvr: Add support for generating render pass hw setup data.")
Signed-off-by: Nick Hamilton <nick.hamilton@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
(cherry picked from commit 0640ac7e3b)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
Empirically, TCS outputs have to be aligned to 64 bytes,
otherwise stale data may be read in rare cases. The exact
reason is not clear, but tests and proprietary driver behavior
strongly point at the need for 64 byte alignment.
Fixes tesselation issues in at least "Conan Exiles" but likely in many
more cases.
CC: mesa-stable
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
(cherry picked from commit 47251b2e2d)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
The Vulkan spec says about VkFormatFeatureFlagBits:
If a format does not incorporate chroma downsampling (it is
not a “422” or “420” format) but the implementation supports
sampler Y′CBCR conversion for this format, the implementation
must set VK_FORMAT_FEATURE_MIDPOINT_CHROMA_SAMPLES_BIT.
Fixes: af062126ae
Signed-off-by: Benjamin Otte <otte@redhat.com>
(cherry picked from commit 0b6dd167ac)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
This is possible since VK_KHR_maintenance10.
This fixes new VKCTS coverage in
dEQP-VK.pipeline.*.multisample.m10_resolve.*.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit ab6147e8ef)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
We already had this for LOAD_OP_DONT_CARE but we also need it for
LOAD_OP_NONE.
Cc: mesa-stable
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Acked-by: Eric R. Smith <eric.smith@collabora.com>
(cherry picked from commit 44ff0c4707)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
The old code was all out of order and made no sense. There's a reason
it made no sense. It was wrong. Cleaning this up fixes a solid 1/3 of
the remaining Bifrost CTS fails in CI.
Cc: mesa-stable
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Acked-by: Eric R. Smith <eric.smith@collabora.com>
(cherry picked from commit 962d1f33e1)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
cc: mesa-stable
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Acked-by: Eric R. Smith <eric.smith@collabora.com>
(cherry picked from commit 3bb7d929f4)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
The Vulkan spec says that aspects are ignored for Z/S attachments so we
shouldn't consider that as a factor when deciding whether or not to
create other aspect descriptors. This will be irrelevant in a couple of
commits but we need it for the backport anyway.
Cc: mesa-stable
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Acked-by: Eric R. Smith <eric.smith@collabora.com>
(cherry picked from commit 19ad26a8de)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
Fixes: 6fc1030e4f ("nir: Add some new panfrost fragment shader intrinsics")
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Acked-by: Eric R. Smith <eric.smith@collabora.com>
(cherry picked from commit 88ad8bc75d)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
The util code doesn't actually fill things with zeros so the high bits
are undefined. If we really want things replicated, we need to mask off
just the bits we care about.
Cc: mesa-stable
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Acked-by: Eric R. Smith <eric.smith@collabora.com>
(cherry picked from commit 4d8551552e)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
As the comment says, we want to limit our pressure based on underlying HW
reg file size, not max it out to HW reg file size. This caused us to not
spill when we should when the HW reg size was bigger than the ISA reg file
size, leading to OOB writes in RA when it tried to allocate to the limit
pressure we spilled to.
Fixes segfaults in llama.cpp's test-backend-ops.
Fixes: e6e34883a9 ("ir3: Add wavesize control")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14846
(cherry picked from commit 0c6da326f8)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
Since the stride is always 32 dwords, we need to treat the workgroup
size as multiples of that value. Using MAX2() only works for cases where
the workgroup size is less than 32, which was hit by some CTS with 1x1
workgroups.
Cc: mesa-stable
(cherry picked from commit b08f9f192c)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
While reworking image resolves completely in RADV, I found a very weird
bug where the only fix was to emit caches immediately after
decompressing the source resolve image (after FMASK_DECOMPRESS).
I have been struggling this for few hours and figured that it was
something related to context rolls (ie. as long the context was rolled
out, emitting the flushes immediately was required).
It turns out this was a known hardware bug on GFX6 that was implemented
in PAL. Though PAL only applies on GFX6 but GFX7-8 are also affected
based on my testing. Note that RadeonSI flushes CB_META too.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 837078b8d5)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
The new compression scheme introduced in Xe2 also applies to Xe3, so
we're liable for the same bugs.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 2418c91537 ("anv/drirc: disable Xe2 CCS drm modifiers for GTK engine")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
(cherry picked from commit 4ac47f8dde)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
Might happen with radv_emulate_rt=true.
Fixes the_great_circle/a6079328b8df7712 with polaris10.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: e006f68b11 ("aco/isel: Don't add scratch offset as gfx8- soffset if no offsets exist")
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
(cherry picked from commit 75722da909)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
If either source of the CMP is modified before an appropriate ADD is
found, the ADD and the CMP will not have the same result.
No shader-db changes on any ELK platform. I suspect the problematic
cases only occur after scheduling has rearranged instructions. This is
likely the reason BRW didn't experience this problem until 09450faf.
Fixes: 020b0055e7 ("i965/fs: Propagate conditional modifiers from compares to adds")
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit da1fd9786b)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
This is a backport of BRW e26270249b.
shader-db:
All Intel platforms had similar results. (Broadwell shown)
total instructions in shared programs: 18623918 -> 18624594 (<.01%)
instructions in affected programs: 125179 -> 125855 (0.54%)
helped: 0 / HURT: 139
total cycles in shared programs: 957073100 -> 957072484 (<.01%)
cycles in affected programs: 16534168 -> 16533552 (<.01%)
helped: 42 / HURT: 68
Fixes: 020b0055e7 ("i965/fs: Propagate conditional modifiers from compares to adds")
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit bdbfe8de4d)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
If either source of the CMP is modified before an appropriate ADD is
found, the ADD and the CMP will not have the same result.
shader-db:
Lunar Lake
total instructions in shared programs: 17098815 -> 17098818 (<.01%)
instructions in affected programs: 1187 -> 1190 (0.25%)
helped: 0 / HURT: 3
total cycles in shared programs: 876858960 -> 876858968 (<.01%)
cycles in affected programs: 6878 -> 6886 (0.12%)
helped: 0 / HURT: 1
Meteor Lake, DG2, Tiger Lake, Ice Lake, and Skylake had similar results. (Meteor Lake shown)
total instructions in shared programs: 20034973 -> 20034984 (<.01%)
instructions in affected programs: 4599 -> 4610 (0.24%)
helped: 0 / HURT: 11
total cycles in shared programs: 881033088 -> 881033108 (<.01%)
cycles in affected programs: 57872 -> 57892 (0.03%)
helped: 0 / HURT: 5
fossil-db:
All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 918873064 -> 918873269 (+0.00%)
CodeSize: 14747338416 -> 14747339360 (+0.00%); split: -0.00%, +0.00%
Cycle count: 104141836677 -> 104141840371 (+0.00%); split: -0.00%, +0.00%
Totals from 205 (0.01% of 2011421) affected shaders:
Instrs: 290415 -> 290620 (+0.07%)
CodeSize: 4280704 -> 4281648 (+0.02%); split: -0.01%, +0.03%
Cycle count: 18166526 -> 18170220 (+0.02%); split: -0.00%, +0.02%
Closes: #14874
Fixes: 020b0055e7 ("i965/fs: Propagate conditional modifiers from compares to adds")
Reviewed-by: Matt Turner <mattst88@gmail.com>
Tested-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
(cherry picked from commit d1614cd6db)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
When creating frame buffers the alloc callbacks are used in the host
allocations, those same alloc callbacks need to be used when freeing
those allocations but are missing in some places causing the CTS to
report memory leaks in certain test cases.
Fixes: 146364ab9f ("pvr: add support for VK_KHR_dynamic_rendering")
fix:
dEQP-VK.api.object_management.alloc_callback_fail.framebuffer
dEQP-VK.api.object_management.single_alloc_callbacks.framebuffer
Signed-off-by: Nick Hamilton <nick.hamilton@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
(cherry picked from commit 05ef9f01a7)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
Bundles all graphics pipeline creation information required by Metal into
the vertex shader so we can later rebuild the pipeline. This allows us to
correctly create pipelines from caches that were loaded from files.
Signed-off-by: Aitor Camacho <aitor@lunarg.com>
(cherry picked from commit cdbf7242f3)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
When deserializing the compute shader from a blob, we need to recreate the
pipeline because the blob may have been loaded from file and therefore the
reference to the Metal resource will be invalid.
Signed-off-by: Aitor Camacho <aitor@lunarg.com>
(cherry picked from commit 75f6f46c0f)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
The condition to release Metal pipelines incorrectly checks which shader
stage we are destroying leading to leads when graphics pipelines had to
be released.
Signed-off-by: Aitor Camacho <aitor@lunarg.com>
(cherry picked from commit 622ebba476)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
The NIR_PASS macro only overwrites this when the pass actually makes
progress. If the pass doesn't make progress, the variable stays
uninitialized.
Clang correctly spots this and warns about it.
Cc: mesa-stable
(cherry picked from commit 47e4a68a83)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
The hardware only provides 13 bits for encoding the stack base (in
dwords). That translates to the stack base being required to be below
8192 dwords, or 32kB. It's possible to exceed this - LDS is 64kB after
all. Add an explicit check to make sure we don't end up with offsets
that overflow the hw's address fields. This fixes Metro Exodus Enhanced
Edition, which was using ray queries in a 1024-thread sized workgroup,
resulting in exactly 64kB of LDS being required for the stack.
This check isn't required for RT pipelines as we always use 32 or 64
wide workgroups with no other LDS used, so it's impossible to reach this
stack base limit.
Cc: mesa-stable
(cherry picked from commit 59a397793e)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
Just seeing that a passthrough GS was already bound is not sufficient to
know that it is a *matching* passthrough GS. If the application binds a
new VS that requires a different passthrough GS key than the previous
VS, then we need to bind a different passthrough GS.
Fixes: 5bc8284816 ("hk: add Vulkan driver for Apple GPUs")
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Mary Guillemard <mary@mary.zone>
(cherry picked from commit e10f29399f)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
Use either libagx_fill_uint4 or libagx_fill based of size and object
alignment for clear_sizes which are a power of two up to 16.
Reported fill rate for 256MB buffers on a M1 Ultra (G13D) in
gpu-ratemeter is 355 GB/s for 16 byte aligned buffers and 155 GB/s for
4 byte aligned buffers.
Signed-off-by: Janne Grunau <janne-fdr@jannau.net>
(cherry picked from commit 5c2d62c030)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
Use a compute shader to copy PIPE_BUFFERs. Based on hk's hk_cmd_copy().
For large copy sizes (>= 128MB) it achieves 3/4 of the available memory
bandwidth on a M1 Ultra (G13D). `gpu-ratemeter gl.bufbw` reports
~625 GB/s for 256MB buffer size. Apple specifies the memory bandwidth of
the M1 Ultra with 819.2 GB/s.
Signed-off-by: Janne Grunau <j@jannau.net>
(cherry picked from commit 3f5497ded8)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
The issue caused us to put a switch to disable (Xe2) drm modifers
in 2418c91537 is fixed in GTK 4.20.3,
so we can enable the modifiers with this and newer GTK releases.
GTK https://gitlab.gnome.org/GNOME/gtk/-/merge_requests/9164:
b2a42d5a6e Revert "vulkan: Wait for device to be idle before
create/recreating swapchain"
270735a151 vulkan: Rework swapchain present implementation
The hex values represent the GTK version range: [4.0.0, 4.20.2] for
VK_MAKE_VERSION(), refer to:
f493f5c88d
Cc: mesa-stable
Signed-off-by: Jianxun Zhang <jianxun.zhang@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
(cherry picked from commit df7d333656)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
dcd_flags1 was not counted as dirty in case the color attachment map was
updated. This could lead to an outdated value for render_target_mask.
Fixes: a4670a67e0 ("panvk/csf: Set the correct DCD_FLAGS_1.render_rarget_mask")
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
(cherry picked from commit 75242b1862)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
Odd macro-tile counts in X trigger flaky rendering/readback in
parallel stress runs with macro-tiled NPOT textures (for example
piglit draw-pixel-with-texture -auto -fbo).
When a texture is macro-tiled and uses stride addressing, align the
width to two macro tiles. This keeps the stride at an even number of
macro tiles in X and avoids the corruption without disabling
macrotiling.
I was not able to find anything about this in the docs.
Cc: mesa-stable
(cherry picked from commit 0763fb947a)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
No issue with clang or gcc-14.x (or earlier versions). The issue only
shows up since gcc-15.1. The compiler somehow fails to consider those
cs helpers dereferencing the pointer from the pNext chain for reads,
and thus has falsely optimized away the pNext store. This change works
around this with a no-op memory clobber.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13242
Cc: mesa-stable
(cherry picked from commit b0397b967d)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
For drivers that set allow_st_finalize_nir_twice locations are set
when the variable is created. But for variants here we update the
locations in case parameter opt pass or something else changed the
location.
Fixes: 891d46f517 ("st/glsl_to_nir: dont add duplicate state tokens")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14837
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
(cherry picked from commit a6fcc2835e)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
Make sure that lowering undone in elk_nir_optimize are reapplied.
No shader-db or fossil-db changes on any Intel platform. This is most
likely to impact either Gfx8 on ANV or Gfx7.5 on HASVK. I don't
fossil-db test either of those platforms.
I tried doing a similar thing here as is done in BRW (previous commit),
but that caused a couple Haswell shaders to fall off a performance
cliff:
total spills in shared programs: 8247 -> 8311 (0.78%)
spills in affected programs: 6 -> 70 (1066.67%)
helped: 0 / HURT: 2
total fills in shared programs: 8558 -> 8910 (4.11%)
fills in affected programs: 6 -> 358 (5866.67%)
helped: 0 / HURT: 2
Fixes: 442daeb54a ("nir/opt_algebraic: use fcanonicalize")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
(cherry picked from commit df704bd38e)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
The number of fields comes from the shader, so it could be a value large
enough that using alloca would be problematic.
Fixes: c11833ab24 ("nir,spirv: Rework function calls")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Ryan Neph <ryanneph@google.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit 9017d37e84)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
The number of fields comes from the shader, so it could be a value large
enough that using alloca would be problematic.
Fixes: 2a023f30a6 ("nir/spirv: Add basic support for types")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Ryan Neph <ryanneph@google.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit 3da828d2dd)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
This was incorrect for OpenCL due to the possibility of variable shared memory
existing despite shared_size == 0. Fortunately the optimization it was trying to
do should be done in NIR via nir_opt_barrier_modes so we can just drop the brw
code and move on with our merry lives. Fixes OpenCL tests on Iris:
non_uniform_work_group non_uniform_3d_barriers
basic async_strided_copy_local_to_global
Cc: mesa-stable
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit bd5ebbb2f8)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
This fixes new VKCTS coverage
dEQP-VK.api.copy_and_blit.core.use_after_copy.*.
is_stencil isn't set for RadeonSI because it doesn't do SDMA copies
with Z/S.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 1be4ffdff9)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
Caught through VVL test NegativeWsi.SwapchainImageFormatList. The test
would try to create a swapchain with a color space from
VK_EXT_swapchain_colorspace without enabling the extension. This is
because wsi would expose those color spaces even when the extension was
not enabled.
Fixes: fd045ac99c ("wsi/metal: add support for color spaces")
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Signed-off-by: Aitor Camacho <aitor@lunarg.com>
(cherry picked from commit e6f118f12b)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
A descriptor buffer promoted to push constants requires a constant
cache invalidation if it is modified on the device.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 42b70cf05a)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
Given a situation like this :
- CB_A: begin, renderDepthA, end
- CB_B: begin, computeA, barrier (depth), computeB, end
The depth cache is not being flushed between renderDepthA & computeB
because :
- it's not flushed at the end of CB_A (it's not required)
- when CB_B starts, we're still on GFX pipeline mode but do not
flush render caches because pipeline mode is unknown
- when barrier is CB_B is executed, we're already in compute
pipeline mode and HW cannot flush depth.
The fix is to flush RT/depth cached when switching from unknown
pipeline mode any pipeline mode.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: e6dae6ef5f ("vulkan: Optimize implicit end_subpass barrier")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14816
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Tested-by: David Gow <david@davidgow.net>
(cherry picked from commit 888ac904a3)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
This bit is set in mocs for other protected attachment types by
anv_image_fill_surface_state() however was ommited for depth/stencil
attachments here.
Without the protected bit set, it causes heavy black artifacting when
attaching a protected depth attachment image to a framebuffer.
Fixes: 794b0496e9 ("anv: enable protected memory")
Signed-off-by: Juston Li <justonli@google.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit f84ed620c2)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
`operands_match` was modifying instruction source operands in-place
(through the `elk_fs_reg *src` pointer member) and relying on a
save/restore pattern to undo the modifications. Work on local copies
instead, which is simpler and avoids mutating shared state in a
comparison function.
Fixes: 47c4b38540 ("i965/fs: Allow CSE to handle MULs with negated arguments.")
(cherry picked from commit 14c65322e8)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
The MUL case in `operands_match` was reading and writing the `.f` union
member unconditionally, even when the register's `.file != IMM`. In that
case `.f` aliases the struct containing `.nr`/`.swizzle`/etc, so the
`fabsf()` call could corrupt the `.nr` by clearing bit 31.
Guard all `.f` accesses with `.file == IMM` checks.
Fixes: 47c4b38540 ("i965/fs: Allow CSE to handle MULs with negated arguments.")
(cherry picked from commit 93f39f87c4)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
`operands_match` was modifying instruction source operands in-place
(through the `brw_reg *src` pointer member) and relying on a
save/restore pattern to undo the modifications. Work on local copies
instead, which is simpler and avoids mutating shared state in a
comparison function.
Fixes: 47c4b38540 ("i965/fs: Allow CSE to handle MULs with negated arguments.")
(cherry picked from commit b302faad8b)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
The MUL case in `operands_match` was reading and writing the `.f` union
member unconditionally, even when the register's `.file != IMM`. In that
case `.f` aliases the struct containing `.nr`/`.swizzle`/etc, so the
`fabsf()` call could corrupt the `.nr` by clearing bit 31.
Guard all `.f` accesses with `.file == IMM` checks.
Fixes: 47c4b38540 ("i965/fs: Allow CSE to handle MULs with negated arguments.")
(cherry picked from commit f5e0f63216)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
Could do better by checking which registers are clobbered/preserved,
but that's unlikely to be useful anyway.
Backport-to: 26.0
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
(cherry picked from commit fc7b5d7eed)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
The kernel capabilty has the `FPFastMathMode` decoration, but not the
`FPFastMathDefault` execution mode, so a SPIR-V module not using
`SPV_KHR_float_controls2` has no way of setting any defaults.
Fixes: 9da2d21804 ("vtn: implement default fp_math_ctrl without using execution mode")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Tested-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
(cherry picked from commit faf3a93e8f)
[Eric: adjusted commit because of missing 46a617884e, as suggested by the author
at https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39790#note_3325830]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
I somehow screwed this up on my previous attempt at fixing this bug,
This should fix the loop limiter bug on big endian properly.
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Cc: mesa-stable
Fixes: e28cfb2bad ("gallivm: handle u8/u16 const loads properly on big-endian.")
(cherry picked from commit c016346b50)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
When a buffer is deleted, we have to remove it from all binding points.
We were re-using the code for BindBufferRange for this; however, this
caused the general binding point to be unbound (bound to NULL)
unconditionally, even if a different buffer is bound there. Fix this by
inlining the various bind calls into the delete buffers code.
cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14755
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
(cherry picked from commit fa418f1e73)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
Only for partial copies because image stores don't decompress on writes
(ie. HTILE isn't updated by image stores).
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 9f5a20abde)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
libclc doesn't so we have to. fixes math_brutefore cbrt on Iris.
Co-authored-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
(cherry picked from commit af954427bf)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
On the ppc64le architecture error log fail to compile with error:
../src/virtio/vulkan/vn_renderer_virtgpu.c: In function ‘virtgpu_ioctl_map’:
../src/virtio/vulkan/vn_renderer_virtgpu.c:751:66: error: format ‘%llu’ expects argument of type ‘long long unsigned int’, but argument 6 has type ‘__u64’ {aka ‘long unsigned int’} [-Werror=format=]
751 | "mmap failed: gpu_fd=%d, handle=%u, size=%zu, offset=%llu, err=%s",
| ~~~^
| |
| long long unsigned int
| %lu
752 | gpu->fd, gem_handle, size, args.offset, strerror(errno));
| ~~~~~~~~~~~
| |
| __u64 {aka long unsigned int}
cc1: some warnings being treated as errors
Parse the parameters to fix the failure.
Fixes: a49b7adad8 ("venus: add error log coverage for virtgpu backend")
(cherry picked from commit dd3fe2d671)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
There might be cases under which we can make this work but they're
tricky at best. For now, don't even try.
Cc: mesa-stable
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
(cherry picked from commit 918624174b)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
There's a nice little comment here saying we use the same write mask (an
out of date term in NIR) and swizzle but we're no longer actually doing
that. Depending on nir_builder magic, we may actually generate a scalar
when we really want a vector. The fix is to use more builder helpers
and just eat the potential copy.
Fixes: 3180656bbc ("nir: don't use nir_build_alu() with incomplete sources")
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
(cherry picked from commit 711b3358a8)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
the lowerings for e.g. f2f16_rtp have carefully written sequences using
Infinity. nir_opt_algebraic will stomp right through this. `feq x, inf`
without an exact flag is basically always a bug. Disable fast math here.
Fixes OpenCL CTS test_half on Iris.
Cc: mesa-stable
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
(cherry picked from commit 91550d0709)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
The passed flags is always zero on the import paths:
- panfrost_bo_import
- panvk_AllocateMemory
- panvk_GetMemoryFdPropertiesKHR
Fixes: 1c7793ea0b ("panvk: Advertise a HOST_CACHED memory type if we have WC maps")
Tested-by: Valentine Burley <valentine.burley@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
(cherry picked from commit 8d25f9821b)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
Fix compiler error:
../src/freedreno/decode/cffdec.c:580:7: error: assigning to 'char *'
from 'const char *' discards qualifiers
[-Werror,-Wincompatible-pointer-types-discards-qualifiers]
580 | p = strstr(name, "CONST");
| ^ ~~~~~~~~~~~~~~~~~~~~~
glibc now provides C23-style type-generic string functions. strstr
returns const char * when passed a const char * argument. Update p
declaration to const since it's only used for offset calculation.
Fixes: 1ea4ef0d3b ("freedreno: slurp in decode tools")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
(cherry picked from commit bc34a122f3)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
glibc master has been C23'fying the functions which is resulting errors
Several functions assigned results of bsearch/strstr/strpbrk/memchr to
non-const pointers, triggering -Wincompatible-pointer-types-discards-qualifiers
under clang/gcc with -Werror. Cast bsearch return values where needed and
propagate const correctness for strstr/strpbrk/memchr results.
Removes build failures with strict warning flags without changing behavior.
Signed-off-by: Khem Raj <raj.khem@gmail.com>
[Eric: changed the glxglvnd.c hunk to add the missing `const` instead of casting it away]
Cc: mesa-stable
(cherry picked from commit 268e19378f)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
This was added specifically for vectorized stores, so allow for loads.
Without this, the pass will fail to vectorize 2 consecutive 16-bit loads
into a single 32-bit load.
Fixes: 2ed79f80ba ("nir/load_store_vectorize: Skip new bit-sizes that are unaligned with high_offset")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
(cherry picked from commit f6a2d14008)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
This has not been problem before the compression hint given to kernel
but now that we set it we hit problems when allocating bo if modifier
does not support compression.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14625
Fixes: f91de58818 ("anv: Add support to DRM_XE_GEM_CREATE_FLAG_NO_COMPRESSION")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
(cherry picked from commit fc814fa828)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
In glibc 2.43 the strstr function now propagate const to the output, triggering -Wincompatible-pointer-types-discards-qualifiers
under clang/gcc with -Werror.
Fix two of these cases by adding the const qualifier.
cc: mesa-stable
(cherry picked from commit ece5f671b3)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
lower_store_component() always returns false even though it modifies
NIR instructions (rewrites sources, creates new SSA defs, removes
previous stores). This triggers the "NIR changed but no progress
reported" assertion in nir_shader_intrinsics_pass.
Return true when a store_output or store_per_view_output intrinsic is
processed, since the function always modifies the shader in that case.
Closes: https://gitlab.freedesktop.org/panfrost/mesa/-/issues/274
Cc: mesa-stable
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
(cherry picked from commit 4938ad435e)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
Those have long been supported by vn_image_deferred_info_init because of
AHB support. For non-aliased ANB image, those are directly passed from
the platform swapchain create info as well. So we just need to drop the
obsolete asserts to make newer Android platform and ANGLE happy.
Cc: mesa-stable
(cherry picked from commit 091c4f43ff)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
Since glibc-2.43:
For ISO C23, the functions bsearch, memchr, strchr, strpbrk, strrchr, strstr, wcschr, wcspbrk, wcsrchr, wcsstr and wmemchr that return pointers into their input arrays now have definitions as macros that return a pointer to a const-qualified type when the input argument is a pointer to a const-qualified type.
https://lists.gnu.org/archive/html/info-gnu/2026-01/msg00005.html
Resolves the following warnings:
src/mesa/glapi/glapi/gen/enums.c: In function '_mesa_enum_to_string':
src/mesa/glapi/glapi/gen/enums.c:7799:8: warning: assignment discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
7799 | elt = bsearch(& nr, enum_string_table_offsets,
| ^
../src/egl/main/egldispatchstubs.c: In function 'FindProcIndex':
../src/egl/main/egldispatchstubs.c:52:7: warning: initialization discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
52 | bsearch(name, __EGL_DISPATCH_FUNC_NAMES, __EGL_DISPATCH_COUNT,
| ^~~~~~~
Signed-off-by: Rudi Heitbaum <rudi@heitbaum.com>
(cherry picked from commit 1acc96b8cb)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
Its no longer an error for depth and stencil formats to have invalid
accumulator format.
Fixes the following tests:
* dEQP-VK.api.info.image_format_properties.2d.optimal.d16_unorm
* dEQP-VK.api.info.image_format_properties.2d.optimal.d24_unorm_s8_uint
* dEQP-VK.api.info.image_format_properties.2d.optimal.d32_sfloat
* dEQP-VK.api.info.image_format_properties.2d.optimal.d32_sfloat_s8_uint
* dEQP-VK.api.info.image_format_properties.2d.optimal.s8_uint
* dEQP-VK.api.info.image_format_properties.2d.optimal.x8_d24_unorm_pack32
Backport-to: 26.0
Signed-off-by: Arjob Mukherjee <arjob.mukherjee@imgtec.com>
Tested-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Reviewed-by: Simon Perretta <simon.perretta@imgtec.com>
(cherry picked from commit 58c7437d3a)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
This could return the graphics DCC pipeline if it was created before,
and crash or potentially hang the GPU.
Found this while working on in-progress VKCTS coverage.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit ad7151f4bf)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
Commit cf4bd2e412 added a fast path for handling no-command submits to
accommodate a kernel behavior quirk. Sparse support was complete before
that change but landed afterwards, leaving sparse submits that don't have
command buffers but do have sparse bind commands to take that fast path,
leaving the bind commands unhandled. The condition for the fast path is
fixed to address that.
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Fixes: 71ef46717c ("tu/kgsl: Add support for sparse binding")
(cherry picked from commit 5b33ee9f0b)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
Fixes a race condition where a BVH will be dumped before its command buffer is
actually submitted if a different command buffer completes between the time the
BVH dump is recorded and the time the command buffer is actually submitted.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Fixes: 1b55f101 ("anv/bvh: Dump BVH synchronously upon command buffer completion")
(cherry picked from commit 95e471e558)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
This is to make sure early culling related Wa_16020518922 is enabled
properly.
Cc: mesa-stable
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit 331238e44e)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
Now that the small/large pages race is fixed, we can safely enable it
back when the kernel side report 1.4.2 support.
Fixes: f3c53cf66b ("nvk: Disable large pages for now")
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
(cherry picked from commit b524bf368e)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
If the rendering state is inherited in the secondary, otherwise nothing
wait for the pending flushes after a decompression pass. One more
argument to stop delaying this.
Fixes
dEQP-VK.renderpasses.dynamic_rendering.partial_secondary_cmd_buff.local_read.*
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39678>
(cherry picked from commit 13c9e529bd)
Not enough tested on over Gen12 platforms.
Turns out to be not working on DG2, for example.
Cc: mesa-stable
Closes: #14449
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39676>
(cherry picked from commit d2c24a0d8b)
Clean tiles must actually be written back for AFBC buffers (color,
z/s) when either one of the effective tile size dimension is smaller
than the superblock dimension. This commit fixes the current check
which compares the effective tile size to the superblock size.
Fixes: 762a0f4133 ("panfrost: Add the concept of render block")
Signed-off-by: Loïc Molinari <loic.molinari@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38422>
(cherry picked from commit 098b69a05c)
tu_CmdBeginRendering was unconditionally allocating a new
patchpoints_ctx. When resuming a render pass chain, this overwrote the
existing context from the suspended pass, leaking it and all associated
FDM patchpoints.
Fixes: 0dd06c74d6 ("tu: Fix FDM patchpoint memory leak")
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39639>
(cherry picked from commit d4ad50752f)
Avoiding NaNs should have the same effect but it's good practice to not
rely on float OPs for correctness.
Fixes: 95a89f7 ("radv: Report smaller bvh sizes when possible")
Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39640>
(cherry picked from commit 24a1e3d8c2)
As seen in the Vivante kernel driver function gckHARDWARE_Flush(),
GPUs without gcvFEATURE_TEX_CACHE_FLUSH_FIX, which translates to
all GPUs before halti5, need a full stall of the GPU pipeline
before flushing the texture caches.
This fixes sporadic GPU hangs observed in use-cases where texture
data updates are intermixed with draws without any state changes
that might necessitate a stall.
Cc: mesa-stable
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39673>
(cherry picked from commit 643ba9a784)
When you get UnexpectedResult(skip), that means take your xfail out
because it's now skipping. Which is the fix, instead of "take the xfail
out and add it to manual skips".
Fixes: e54440d15e ("Uprev Piglit to a3826de3c26a279599d15b018a9a3e75ca46f4f8")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39568>
(cherry picked from commit 42e17a948e)
The generation version for V3D XML package was marked as 3.3, but
actually we removed all the code supporting this generation, and the
generations we support now are from 4.2 onwards.
So we bump up the generation version.
Fixes: 9c4829473a ("broadcom/cle: remove v33 and v41 from xml definition")
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39577>
(cherry picked from commit 5a85b3d9f4)
Mesh shader uses store per vertex output for point size
and store per primitive output for layer id.
This fixes gpu-ratemeter run slow for kill point size
and layer id cases when mono shader is used which expect
to kill these outputs.
Also gather fragment shader per primitive input info
to kill mesh shader per primitive output.
Fixes: e6e21dfbf2 ("radeonsi: kill outputs for mesh shader")
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39644>
(cherry picked from commit f20cd07e21)
For non-WSI images, explicitly map VK_IMAGE_LAYOUT_PRESENT_SRC_KHR to
VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL in anv_layout_to_aux_state().
Before this patch, the function passed PRESENT_SRC into
vk_image_layout_to_usage_flags() and got a return value of 0 from it
(that function expects that layout to be explicitly handled by the
caller). This caused the logic dependent on the return value to be
unreliable.
Fixes: c5cad407f8 ("anv: handle non-wsi images in anv_layout_to_aux_state")
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39618>
(cherry picked from commit f616d4fb2a)
Don't return early from anv_layout_to_fast_clear_type() for Xe2+. We'll
need to make more use of the function for some MCS changes in later
commits.
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>
(cherry picked from commit 811c413f98)
This change updates the vs_as_ls switch logic to make it
reliable. It resets the dirty flag when the switch is
happening. It uses also evergreen_emit_vs_constant_buffers()
to try to update again some of the states which could be
lost otherwise.
This change fixes some "flakes". These tests needed previously
to be executed twice to set the hardware in the proper state
for the test to pass. It also fixes the main issue of the
texture_view.view_sampling test.
This change was tested on palm and cayman. Here are the tests
which are now utterly fixed:
khr-gl4[3-6]/stencil_texturing/functional: fail pass
khr-gl4[4-6]/texture_cube_map_array/texture_size_tesselation_ev_sh: fail pass
khr-gles31/core/texture_cube_map_array/texture_size_tesselation_ev_sh: fail pass
khr-glesext/texture_cube_map_array/texture_size_tesselation_ev_sh: fail pass
Fixes: 25f96c1120 ("r600: hook up constants/samplers/sampler view for tessellation")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Acked-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39269>
(cherry picked from commit 9c5e15e6f5)
The empty .clang-format file in the project root is required for meson
to generate the clang-format target. It was accidentally deleted.
Fixes: efe60d2940 ("intel: remove unused show_shader_stage debug option")
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Acked-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39648>
(cherry picked from commit 449261b6ba)
It was incorrect to mark chit/miss arguments as discardable without
the equivalent in the traversal shader. Also, tail calls with modified
parameters that aren't marked discardable are incorrect.
This could lead to random corruption by clobbering parameter values
across two levels of nested calls: A Raygen shader calls traversal,
expecting e.g. the ray tMax parameter to be preserved. Traversal
overwrites the parameter's register with the hit t and tail-calls chit,
which immediately returns to raygen. Now the raygen shader still has the
clobbered tMax (which is actually the ray hit t) - if it calls traversal
multiple times, the second traversal iteration may use the previous
ray's hit t as tMax instead of the intended value.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39579>
(cherry picked from commit 3275be503c)
There were two issues here:
1. Tail calls where the tail-callee receives modified parameters are
hazardous and only work if the parameter is return or discardable.
Otherwise, the caller of the function that executes the tail-call may
not expect some of the parameters to be clobbered.
2. There was also an indexing confusion with the call instruction vs.
call signature parameters. The call instruction has not been adapted
to the new lowered signatures, where the system args are prepended. To
make things clearer, split the loop into two, one iterating over
parameters in the call signature and one for parameters of the call
instruction.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39579>
(cherry picked from commit 0d7705c206)
The original semantic of discardable parameters was "okay, nothing
actually uses this parameter, feel free to clobber it", but we were
only using it with tail calls from a function without discardable
parameters, which was broken.
Instead, slightly change the use-case and utilize the "discardable"
attribute to mark parameters that the callee will clobber in a tail
call. This makes doing tail calls safe when the tail callee receives a
modified set of parameters.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39579>
(cherry picked from commit ad23e02a28)
This matches the initialization that the proprietary driver does.
Fixes dEQP-VK.query_pool.discard.*.alpha_to_coverage* on vk cts 1.4.5
Cc: mesa-stable
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39621>
(cherry picked from commit 8d7f14620b)
Expose DRM format modifiers via VkDrmFormatModifierPropertiesList2EXT.
VVL is one notable user.
This is required for VK_EXT_image_drm_format_modifier when
VK_KHR_format_feature_flags2 is supported.
Cc: mesa-stable
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39600>
(cherry picked from commit e185f40fc3)
In case of texel buffers that are read in the shader, but not bound by
the application, the current implementation would incorrectly try to
read from non-existent buffers.
To ensure this does not happen, this change sets the format for any
unbound attributes to CONST_0000, which will kill any actual
reads/writes and always return zeroes.
This fixes the following two tests:
- spec@arb_shading_language_420pack@active sampler conflict
- spec@arb_texture_buffer_object@render-no-bo
Fixes: a21ee564e2 ("pan/bi: Make texel buffers use Attribute Buffers")
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39431>
(cherry picked from commit aec8132b8b)
Also change to use H265 constant for maxDpbSlots (both values for H264 and H265
are the same).
Fixes: ee535aa039 ("radv: video: rework maxActiveReferenceSlot/MaxDpbSlots")
Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39609>
(cherry picked from commit 7607aeefa6)
This is just wrong if the secondary uses ESO because the emitted
pipelines would be NULL in the secondary, but if the app re-binds
the same pipeline in the primary it would consider it as already
emitted. A sequence like this would break:
CmdBindPipeline(compute)
CmdDispatch()
CmdExecuteCommands() --> with ESO compute
CmdBindPipeline(compute)
CmdDispatch()
This tracking is probably useless anyways because it's unlikely that
apps will rebind the same pipeline right after CmdExecuteCommands() but
let's keep it because this is a bugfix.
Fixes
dEQP-VK.api.command_buffers.pipeline_shader_object_mix_with_secondaries.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39587>
(cherry picked from commit 9ad02b5724)
The pMiColStarts/pMiRowStarts arrays from applications may have
incorrect units. Instead of using them directly, compute the tile
start positions in superblock units internally based on the tile
dimensions.
Cc: mesa-stable
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39471>
(cherry picked from commit 8e9fec8e40)
It might be that the radv_pipeline_cache_lookup_nir_handle() in
radv_ray_tracing_pipeline_cache_search() fails but we will later need the
NIR. If rt_stages[i].shader was non-NULL, then we would not have created
the NIR.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Backport-to: 25.2
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38263>
(cherry picked from commit 89eefdcadb)
The non-dynamic members of xfb_info are already included in
sizeof(hk_passthrough_gs_key), so adding nir_xfb_info_size counts them
twice. Because of this we were including uninitialized memory in the key
in hk_handle_passthrough_gs, which is undefined behavior.
Fixes: 5bc8284816 ("hk: add Vulkan driver for Apple GPUs")
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39574>
(cherry picked from commit d6745b358d)
This fixes bunch of cts tests hitting issues when attempting
anv_image_mcs_op with compute.
Fixes: ab9d3528dc ("anv: fix queue check in anv_blorp_execute_on_companion on xe3")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39581>
(cherry picked from commit 85978ccd28)
When proceeding with rendering, any transient attachment that will be used
as LRZ buffer should also be allocated. With GMEM rendering, these
attachments otherwise remained unloaded and subsequent LRZ clears produced
GPU faults.
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Fixes: 764b3d9161 ("tu: Implement transient attachments and lazily allocated memory")
Fixes: #14604
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39535>
(cherry picked from commit b6a049ea4b)
The samples per tile calculation was incorrect for sample count 4 and 8.
Fix:
dEQP-VK.pipeline.monolithic.multisample.std_sample_locations.draw.depth.samples_4.*
dEQP-VK.pipeline.monolithic.multisample.std_sample_locations.draw.stencil.samples_4.*
Backport-to: 26.0
Signed-off-by: Nick Hamilton <nick.hamilton@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Reviewed-by: Simon Perretta <simon.perretta@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39580>
(cherry picked from commit 9f9788330e)
Within the driver buffers are treated as 2D as sampling them as 1D
will run into HW restrictions on max size.
The compiler does the same however for atomic image ops the address
is manually calculated and doing this via the 2D path leads to
incorrect offsets.
The fix is to treat buffers as 1D for atomic ops which calculates
the correct offsets for the operations.
Fix deqp:
dEQP-VK.image.atomic_operations.add.buffer.*
dEQP-VK.image.atomic_operations.and.buffer.*
dEQP-VK.image.atomic_operations.compare_exchange.buffer.*
dEQP-VK.image.atomic_operations.dec.buffer.*
dEQP-VK.image.atomic_operations.exchange.buffer.*
dEQP-VK.image.atomic_operations.inc.buffer.*
dEQP-VK.image.atomic_operations.max.buffer.*
dEQP-VK.image.atomic_operations.min.buffer.*
dEQP-VK.image.atomic_operations.or.buffer.*
dEQP-VK.image.atomic_operations.sub.buffer.*
dEQP-VK.image.atomic_operations.xor.buffer.*
Fixes: 6dc5e1e109 ("pco: fully support Vulkan 1.2 image atomics")
Signed-off-by: Nick Hamilton <nick.hamilton@imgtec.com>
Reviewed-by: Simon Perretta <simon.perretta@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39521>
(cherry picked from commit 079377c767)
This reverts commit 6eadcaa851.
VK_EXT_primitives_generated_query has a dependency on
VK_EXT_transform_feedback, which we do not implement yet. This is
breaking the android CTS. It will be reenabled once transform feedback
is in.
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39547>
(cherry picked from commit 4959f45e99)
For mesh/task shaders, the thread payload provides a local invocation
index, but it's always linear so it doesn't give the correct value when
quad derivatives are in use.
The lowering pass where all of this is done correctly for compute
shaders assumes load_local_invocation_index will be lowered in the
backend for mesh/task, calculates the values for the quads correctly but
then avoid replacing the original intrinsic and we remain with the wrong
results.
Add an intel specific intrinsic and always lower the generic one to that
(or whatever else was calculated) to avoid ambiguities and fix the value
for quad derivatives.
Fixes future CTS tests using mesh/task shaders under:
dEQP-VK.spirv_assembly.instruction.compute.compute_shader_derivatives.*
Fixes: d89bfb1ff7 ("intel/brw: Reorganize lowering of LocalID/Index to handle Mesh/Task")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39276>
(cherry picked from commit 5b48805b42)
We started allowing non-default clear colors with FCV in commit
cd8e120b97. When rendering to an image with FCV, set the fast-clear
type to ANV_FAST_CLEAR_ANY if the image properties allow such
fast-clears.
Fixes: cd8e120b97 ("anv: Allow more single subresource fast-clears with FCV")
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>
(cherry picked from commit ce196c9de5)
On Xe2+, HSD 14011946253 and the related documents explain that MCS
still only supports a single clear color.
Fixes: df006bba02 ("iris: Update aux state for color fast clears (xe2)")
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>
(cherry picked from commit 6c6b2d8f30)
When changing the clear color without a fast clear, use dirty bits to
ensure that surfaces with inline clear colors are updated and that
partial resolves are done as needed.
Remove the flags at the bottom of fast_clear_color() as
blorp_fast_clear() already sets them for us.
Fixes: 64d861b700 ("iris: Skip some fast-clears even on color changes")
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>
(cherry picked from commit 3b642f7456)
From RENDER_SURFACE_STATE::AuxiliarySurfaceQPitch on BDW+,
This field must be set to an integer multiple of the Surface
Vertical Alignment
Accomplish this by aligning the height of each MCS layer to main
surface's vertical alignment. Prevents the following test group from
failing on Xe2 when a future commit enables multi-layer fast-clears in
anv:
dEQP-VK.api.image_clearing.*.
clear_color_attachment.multiple_layers.
*_clamp_input_sample_count_*
The main test I used to debug this:
dEQP-VK.api.image_clearing.core.
clear_color_attachment.multiple_layers.
a8b8g8r8_unorm_pack32_64x11_clamp_input_sample_count_2
Backport-to: 25.3
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>
(cherry picked from commit eb4a581e44)
Tested on PTL, fixes various copy_and_blit tests that utilize compute
after ab9d3528dc that exposed this to them.
Fixes: ab9d3528dc ("anv: fix queue check in anv_blorp_execute_on_companion on xe3")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39548>
(cherry picked from commit bb84773c81)
Prevent assert failures in a future commit where Tile64 will be selected
more often.
Fixes: 42ef23ecd1 ("intel/blorp: Don't redescribe some Tile64 clears")
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38063>
(cherry picked from commit 6fc0e5c0aa)
When determining if an LOD can fit within a miptail, we must minify in
pixel space and then convert to elements.
Prevents the following test case from failing when Yf is force-enabled:
dEQP-VK.image.texel_view_compatible.graphic.extended.3d_image.texture_read.astc_8x5_srgb_block.r32g32b32a32_uint
Fixes: 46f45d62d1 ("intel/isl: Start using miptails")
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38063>
(cherry picked from commit add742fca6)
The previous method to calculate imageSize().z was
incorrect for a cubearray view.
This change was tested on palm and cayman. Here is the test fixed:
spec/arb_texture_view/rendering-layers-image/layers rendering of imagecubearray: fail pass
Fixes: 6c1432f0be ("r600/eg: fix cube map array buffer images.")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39063>
(cherry picked from commit 0b8d8f2b17)
Some apps (old FFmpeg, contemporary CTS) send down pMi{Col,Row}Starts in
SB units, not MI units. Instead of dependening on those values which
could be unreliable, derive the tile sizes in SB using other parameters.
Cc: mesa-stable
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39492>
(cherry picked from commit c10ebb0fda)
This change fixes the clamp to max_texel_buffer_elements
issue related to rv770 and older gpus.
Here are the tests fixed on rv770:
spec/arb_texture_buffer_object/texture-buffer-size-clamp/r8ui_texture_buffer_size_via_sampler: fail pass
spec/arb_texture_buffer_object/texture-buffer-size-clamp/rg8ui_texture_buffer_size_via_sampler: fail pass
spec/arb_texture_buffer_object/texture-buffer-size-clamp/rgba8ui_texture_buffer_size_via_sampler: fail pass
Fixes: 1a441ad5cb ("r600: clamp to max_texel_buffer_elements")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39385>
(cherry picked from commit afcead9158)
This is a gl4.3 issue very similar to e8fa3b4950.
The mode r10g10b10a2_sscaled processed as vertex on palm at the
hardware level doesn't follow the current standard. Indeed, the .w
component (2-bits) is not calculated as expected. The table below
describes the situation.
This change fixes this issue by adding two gpu instructions at
the vertex fetch shader stage. An equivalent C representation and
a gpu asm dump of the generated sequence are available below.
.w(2-bits) expected palm cypress
0 0 0 0
1 1 1 1
2 -2 2 -2
3 -1 3 -1
w_out = w_in - (w_in > 1. ? 4. : 0.);
0002 00000024 A0040000 ALU 2 @72
0072 801F2C0A 600004C0 1 w: SETGT*4 __.w, R10.w, 1.0
0074 839FCC0A 61400010 2 w: ADD R10.w, R10.w, -PV.w
Note: cypress returns the expected value, and does not need
this correction.
This change was tested on palm, barts and cayman. Here are the tests fixed:
khr-gl4[3-6]/vertex_attrib_binding/basic-input-case6: fail pass
khr-gles31/core/vertex_attrib_binding/basic-input-case6: fail pass
Cc: mesa-stable
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38849>
(cherry picked from commit 2ed761021f)
Using a PV register which is not PV.x, after a dot4 operation,
does not work on rv770. Anyway, this does work on evergreen
but this is not documented.
This change updates this behavior for all the r600 gpus
which fixes the issue on rv770. It adds max4 which has the
same requirement in the case of max4 being implemented.
Here are some of the affected tests on rv770:
piglit/bin/fp-abs-01 -auto -fbo
glcts --deqp-case=KHR-GL31.buffer_objects.triangles
piglit/bin/shader_runner generated_tests/spec/glsl-1.10/execution/built-in-functions/fs-distance-vec2-vec2.shader_test -auto -fbo
Fixes: 942e6af40b ("r600/sfn: use PS and PV inline registers when possible")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39101>
(cherry picked from commit da1108dcc4)
The functionality was working properly at glMinSampleShading(0.)
and glMinSampleShading(1.). The issue was with the intermediary
values. This change makes this function compatible with the
evergreen setup.
Note: this was one of the few functionalities which were working
properly on evergreen but not on cayman.
Here are the tests fixed:
spec/arb_sample_shading/samplemask 4 all/0.500000 partition: fail pass
spec/arb_sample_shading/samplemask 4/0.500000 partition: fail pass
spec/arb_sample_shading/samplemask 6 all/0.250000 partition: fail pass
spec/arb_sample_shading/samplemask 6 all/0.500000 partition: fail pass
spec/arb_sample_shading/samplemask 6/0.250000 partition: fail pass
spec/arb_sample_shading/samplemask 6/0.500000 partition: fail pass
spec/arb_sample_shading/samplemask 8 all/0.250000 partition: fail pass
spec/arb_sample_shading/samplemask 8 all/0.500000 partition: fail pass
spec/arb_sample_shading/samplemask 8/0.250000 partition: fail pass
spec/arb_sample_shading/samplemask 8/0.500000 partition: fail pass
deqp-gles31/functional/shaders/sample_variables/sample_mask_in/bit_count_per_two_samples/multisample_rbo_4: fail pass
deqp-gles31/functional/shaders/sample_variables/sample_mask_in/bit_count_per_two_samples/multisample_rbo_8: fail pass
deqp-gles31/functional/shaders/sample_variables/sample_mask_in/bit_count_per_two_samples/multisample_texture_4: fail pass
deqp-gles31/functional/shaders/sample_variables/sample_mask_in/bit_count_per_two_samples/multisample_texture_8: fail pass
Fixes: f7796a966d ("radeonsi: add basic code for overrasterization")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38615>
(cherry picked from commit d5d844bfc4)
This can happen if a loop has no continues, and the later code should work
fine in this situation.
This fixes war_thunder/0013a69e097b2471 on navi21.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Fixes: 6b9d28ab9b ("aco/insert_fp_mode: insert fp mode in reverse")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39481>
(cherry picked from commit e59a0df302)
MESA_VK_DYNAMIC_DS_DEPTH_BOUNDS_TEST_BOUNDS state should be emitted as part
of TU_DYNAMIC_STATE_RB_DEPTH_CNTL along with other depth state, and not as
part of dynamic stencil state.
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Fixes: 979cf7bac0 ("tu: Merge depth/stencil draw states")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39323>
(cherry picked from commit 3cb4776ede)
The previous Gfx12+ implementation using bit masking is failing for FP8
types, so replacing with explicit lookup tables.
For float types, the encoding now aligns with brw_data_type_float, ensuring
correct behavior for DPAS and other 3-source instructions.
Fixes: d1d4e3d530 ("brw: Add EU assembler support for float8")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39448>
(cherry picked from commit 0ce4e8ba6f)
This optimization doesn't work when the ray query index isn't uniform across
the subgroup, which is something the spec allows. While there are some smart
ways to fix this and still avoid unnecessary spilling, its not worth investing
the time until we find a realtime raytracing workload that actually needs to
use multiple live ray queries for something.
Fixes: 1f1de7eb ("anv,brw: Allow multiple ray queries without spilling to a shadow stack")
Acked-by: Sagar Ghuge <sagar.ghuge@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39445>
(cherry picked from commit 895ff7fe92)
Otherwise:
gallium/auxiliary/gallivm/lp_bld_nir_soa.c:2394:7:
error: variable 'opname' is used uninitialized whenever switch default is taken
is observed.
Reviewed-by: @LingMan
Fixes: 12bceb228a ("gallivm: let reduce ops use llvm intrinsics")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39418>
(cherry picked from commit 0f582b0268)
The extension is optional in Vulkan 1.2 and is causing crashes in
multiple CTS tests.
Signed-off-by: Nick Hamilton <nick.hamilton@imgtec.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Backport-to: 26.0
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39351>
(cherry picked from commit 3aacc324bc)
now that transient images are a more complete mechanism, this should
in theory be okay and also accounts for the case where
a framebuffer contains mixed msrtt textures and plain multisampled textures
(cherry picked from commit 6474af3b42)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39469>
WSI used to track the similar for aliased wsi image creation, but later
got deprecated. So let's rename wsi.memory to wsi.anb_mem and drop
wsi.memory_owned to avoid confusions with common wsi related trackings.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39401>
(cherry picked from commit 481df22209)
Vulkan is supposed to operate in explicit synchronization mode. However,
for legacy compositors that only support implicit fencing, we have to
extract the compositor implicit fence (release fence) and resolve it
properly.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39401>
(cherry picked from commit 849e3552e8)
The skip check should only be checking the format rather than the entire
packed word.
Fixes: 52ddc40a75 ("pco: restrict shadow sampler comparator clamping to unorm formats")
Signed-off-by: Simon Perretta <simon.perretta@imgtec.com>
Acked-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39428>
(cherry picked from commit c5b70dcb48)
Fixes: 6bda88bfdb ("pvr: copy WSI can_present_on_device function from PanVK")
Signed-off-by: Kitlith <kitlith@kitl.pw>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39415>
(cherry picked from commit b18b52e61d)
The offset for the dynamic buffers needs to be computed with the currently
bound pipeline layout. This change fixes incorrectly selecting the offset
for a dynamic buffer if a descriptor with a lower index than the currently
being bound contains a dynamic buffer but said descriptor hasn't being
bound yet. It also prevents the binding to override the dynamic buffers in
order to preserve the already bound dynamic descriptors.
Signed-off-by: Aitor Camacho <aitor@lunarg.com>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
(cherry picked from commit aaf4405507)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39440>
The offset for the dynamic buffers needs to be computed with the currently
bound pipeline layout. This change fixes incorrectly selecting the offset
for a dynamic buffer if a descriptor with a lower index than the currently
being bound contains a dynamic buffer but said descriptor hasn't being
bound yet. It also prevents the binding to override the dynamic buffers in
order to preserve the already bound dynamic descriptors.
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Signed-off-by: Aitor Camacho <aitor@lunarg.com>
(cherry picked from commit 80a076f5d0)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39440>
```
Traceback (most recent call last):
File "bin/pick-ui.py", line 31, in <module>
loop = urwid.MainLoop(u.render(), PALETTE, event_loop=evl, handle_mouse=False)
~~~~~~~~^^
File "bin/pick/ui.py", line 196, in render
asyncio.ensure_future(self.update())
~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^
File "/usr/lib64/python3.14/asyncio/tasks.py", line 730, in ensure_future
loop = events.get_event_loop()
File "/usr/lib64/python3.14/asyncio/events.py", line 715, in get_event_loop
raise RuntimeError('There is no current event loop in thread %r.'
% threading.current_thread().name)
RuntimeError: There is no current event loop in thread 'MainThread'.
```
Of the 3 dependencies, only urwid actually needs to be updated, but
while at it let's pick the latest of each.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39452>
(cherry picked from commit 21829c9f7e)