Using wayland on s390x has all the colors wrong.
Mesa reports using GBM_FORMAT_XRGB8888 but inside the buffer, the
colors are in GBM_FORMAT_BGRX8888 order.
This patch fixes it for common formats, and also introduced BGRX8888
which is the default on big endian.
Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31707>
__DRI_IMAGE formats are not well defined for big endian.
This patch has no functionnal change and prepare the work to better support
big endian.
Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31707>
This fixes:
KHR-GLES32.core.tessellation_shader.tessellation_shader_tessellation.max_in_out_attributes
Without this change the assert in gather_output is hit:
assert(!nir_src_is_const(offset) || nir_src_as_uint(offset) == 0)
Because nir_opt_algebraic determines that some ssa values are constant,
but the nir_io_add_const_offset_to_base wasn't run afterwards.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31684>
When offset=0, the pass was a no-op but was setting the progress
flag which could cause infinite loops when this pass is going
to be added to gl_nir_opts.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31684>
We cannot set the {X,Y}MAX_RIGHT_EXCLUSION bits
if we have a sample location at a pixel boundary.
CTS does not seem to be catching this.
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Co-authored-by: Vitaliy Triang3l Kuzmin <triang3l@yandex.ru>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31839>
RADV needs to adjust this register for user sample locations because
it seems possible to have a sample on the -8 coordinate.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31815>
It's possible to enable NGG culling with ESO if shaders are linked, or
if the VS doesn't need a prolog or if TES is used. This wasn't
supposed to be enabled but I think it worked just by luck because the
user SGPR value was probably zero and NGGC was disabled at draw time.
Found by inspection.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31829>
This is useful in order to correlate shader hashes between RGP and
Fossilize. This is because Fossilize needs to pass the capture
statistics flag for getting shader hashes and the pipeline key won't
match otherwise.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31820>
This reduces the penalty the heuristic gives to SIMD32 shaders
relative to SIMD16 in presence of discard control flow on Xe2+. The
penalty was meant to account for the inefficient divergence behavior
of SIMD32 shaders on Gfx12.x platforms, since Gfx12 hardware had EUs
bundled in groups of two, and each pair shared control flow logic so
both EUs could only execute instructions in lockstep, which meant that
SIMD32 shaders had an effective warp size of 64 on Gfx12.x.
This change switches back to more optimistic modelling of discard
divergence. With it we gain about 6% performance in a Shadow of the
Tomb Raider trace (tested on BMG).
One may wonder if there are still workloads that would suffer
materially from enabling SIMD32 for all pixel shaders on Xe2 instead
of using this heuristic, since Xe2 EUs have twice the GRF space, twice
the FPU throughput and better divergence behavior than Xe, but the
answer seems to be yes unfortunately: E.g. Superposition has some
pixel shaders where SIMD32 has substantially worse scheduling due to
the increased number of false dependencies due to higher register
pressure, and using SIMD32 for them reduces performance significantly.
The heuristic seems to model this correctly so it doesn't look like we
can do without it at least right now on Xe2.
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31697>
We weren't allowing immediates in BFE at all. Gfx12+ supports
immediates in src0 (value) and src2 (width), but not src1 (offset).
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31437>
This very test was working until the commit 4da147a02b
("mesa: remove fallback for GL_DEPTH_STENCIL"). Indeed this
commit lets the driver handles this path and this was
failing on evergreen r600.
The test was processed through r600_blit() which loads the
fragment shader util_make_fs_blit_zs(). This fragment shader
loads two textures the stencil and depth. The texture depth
was processed properly but the other texture was generating
incorrect values. This issue, which seems to be related to
the hardware configuration, disappears when the underlying
surface is allocated using a width multiple of 32.
This change was tested on cayman and palm with the normal test:
"piglit/bin/ext_packed_depth_stencil-getteximage -auto -fb" and
the test was modified to test all the relevant width and height
values. The gpu rv770 was not affected by this issue. Here is
the result:
spec/ext_packed_depth_stencil/getteximage: fail pass
Cc: mesa-stable
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Acked-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31757>
HSD 14021541470 lists a HW bug on blitter engine where the compression pairing bit is
not programmed correctly for stencil resources.
Use RCS Engine to perform copy instead.
Signed-off-by: Aditya Swarup <aditya.swarup@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31792>
Fix a typo in prepare_vp which causes incorrect scissor box with
non-zero X in viewport/scissor.
Fixes: 5544d39f44 ("panvk: Add a CSF backend for panvk_queue/cmd_buffer")
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31832>
According to pandecode, r32 is global attribute offset and r36 is vertex
offset. Follow panfrost to use r36 instead of r32 for both non-indexed
firstVertex and indexed vertexOffset.
With this, gl_VertexIndex stops being zero-based which is incorrect.
Fixes: 5544d39f44 ("panvk: Add a CSF backend for panvk_queue/cmd_buffer")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31810>
I don't think this matters for how we use CounterMap::operator==.
The BITSET_TEST() was unnecessary because of the BITSET_EQUAL above.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30478>
We don't need to bend over backwards to try to repair the consequences
of something that happened earlier when we can just prevent that from
happening in the first palce instead.
While we're at it, also move the cmd_dispatch initialization into the
conditional block, because that's only used by the secondary-buffer
emulation code.
This fixes WSI, because there's more differences than just secondary
command buffers between the device-level dispatch-table and the
cmd_dispatch table.
Fixes: c2299b6642 ("panvk/csf: Implement vkCmdExecuteCommands")
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31819>
We don't dirty the CS state, so if a 3D blit comes between binding a
compute pipeline and executing a dispatch then we won't re-emit the
pipeline and invalidating CS state causes immediates emitted via
CP_LOAD_STATE to disappear. Fixes
dEQP-VK.binding_model.descriptor_buffer.ycbcr_sampler.compute_comp.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31764>
This GPU seems to have half the compute constlen of other a7xx GPUs,
because there are sporadic hangs in dEQP-VK.robustness.robustness2.* and
other tests unless we limit the constlen. This does *not* happen on
SM8550-HDK, so it does seem to be specific to the GPU in x1e laptops.
Fixes: b0d22461b9 ("freedreno: Enable the X1-85")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31764>
Setting the surface index is optimal for performance in case of
multiple MRTs because addrlib rotates tiles differently.
But this should be disabled when the image is used for descriptor
buffers capture&replay because the descriptor isn't immutable
(ie. tile_swizzle can be different).
This fixes dEQP-VK.binding_model.descriptor_buffer.capture_replay.*
on some GPUs where tile_swizzle is non-zero.
Fixes: 3b57a35ece ("radv: Enable descriptorBufferCaptureReplay.")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31818>