If we're in handle_collect()'s dst allocation and are part of a merge set
near the end of the file, our check for reg_elem_size(reg) would let us
use the preferred reg when that would immediately lead to
allocate_dst_fixed() creating an interval extending thruogh reg_size(reg)
that overflows the file.
Avoids a regression on gfxbench5/gl_5_high_off/17.shader_test in the next
commit. No change on shader-db.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18946>
(cherry picked from commit a39113b616)
While Panfrost allocates linear images with strides that are a multiple of 64
bytes, other dma-buf producers on the system may not satisfy this requirement.
However, at least on v7 and newer, any image with a regular format must have a
stride that is a multiple of 64 bytes.
This fixes a real bug in an application that created a linear R8_UNORM image
with stride 480 bytes, imported it as an EGL_image, and then tried to texture
from it with the GPU. Previously, the driver allowed this situation but it
resulted in an imprecise fault from the GPU. This patch corrects the driver to
reject the import as invalid due to the unaligned stride, ensuring we never
attempt to texture from such a resource.
To implement, we add some new layout queries to centralize knowledge about the
stride alignment requirements, and we sprinkle in asserts to show how the
invariant is upheld throughout the lifecycle of image creation to texturing.
Cc: mesa-stable
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19620>
(cherry picked from commit 811f8a1946)
Outputs are always considered 32-bits.
Found by inspection.
Cc: 22.3 mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19612>
(cherry picked from commit d2563e6600)
This can cause us to stomp the contents of r5 before we have a chance to read
it, like this:
0x3d103186bb800000 nop ; nop ; ldvary.r0
0x3d105686bbf40000 nop ; mov rf26, r5 ; ldvary.r1
0x020000ef0000d000 bu.allna 232, r:unif (0x0000001c / 0.000000)
0x3d1096c6bbf40000 nop ; mov rf27, r5 ; ldvary.r2
Here, the MOV in the last instruction is supposed to read r5 produced from
ldvary.r0, but because we have inserted the bu instruction in between now
that read happens at the same time that ldvary.r1 updates r5, stomping the
value we were supposed to read.
Fix this by disallowing injection of a branch instruction in between an ldvary
instruction and its write to the r5 register 2 instructions later.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7062
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19616>
(cherry picked from commit 1174f37609)
The back-end swizzles dwords so that our indirect scratch messages match
the memory layout of spill/fill messages for better cache coherency.
The swizzle happens at a DWORD granularity. If a read or write crosses
a DWORD boundary, the first bit will get correctly swizzled but whatever
piece lands in the next dword will not because the scatter instructions
assume sequential addresses for all bytes. For DWORD writes, this is
handled naturally as part of scalarizing. For smaller writes, we need
to be sure that a single write never escapes a dword.
Fixes: fd04f858b0 ("intel/nir: Don't try to emit vector load_scratch instructions")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7364
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19580>
(cherry picked from commit 25c180b509)
Because dup_mem_intrinsic() retains the SSA offset from the original
intrinsic and only modifies it by adding a constant, we can compute the
alignment based on the original alignment and the constant offset. This
is both easier and more accurate.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19580>
(cherry picked from commit 85685cf932)
If an application was transitioning out of fullscreen exclusive
display mode, the wsi_display_connector->active state was not
reset in vkReleaseDisplay() from fullscreen. When the app then
later tried to go to fullscreen display mode again on the same
display output with the same video mode, this caused
_wsi_display_queue_next() to skip a required drmModeSetCrtc()
during the first vkQueuePresent() after entering direct display
mode.
While this often worked by pure luck on a single-display setup,
it goes sideways on a multi-display setup where the viewport
of the associated crtc does not have a (x,y) offset of (0,0).
E.g., XOrg/X11 RandR output leasing of an output whose viewport
starts at x = 1920:
1. X-Server has RandR outputs viewport at x = 1920, in a shared
framebuffer, shared across all crtc's on a X-Screen.
2. Application leases that output for direct display mode,
1st vkQueuePresent() triggers drmModeSetCrtc() of output
to (x,y) = 0,0, as required for Vulkan/wsi/direct framebuffer
setup.
3. Application does rendering and presenting.
4. Application vkReleaseDisplay() the output, terminates the
RandR lease. X-Server takes over again.
5. X-Server modesets to reconfigure output back to viewport
with (x,y) = 1920, 0.
6. Application leases same output again later on, and tries
vkQueuePresent() again. Because of the bug fixed in this
commit, the required drmModeSetCrtc() to (x,y) = 0,0 is
erroneously skipped due to the stale cached connector state.
7. drmModePageflip() fails due to the wrong crtc viewport
(x,y) = 1920, 0, mismatched for the need of the Vulkan
framebuffer of (x,y) = 0,0. Kernel returns -ENOSPACE,
Swapchain goes into permanent VK_ERROR_SURFACE_LOST state.
Destroying and recreating the swapchain, as recommended
by the Vulkan spec for error handling won't help. Game over!
Resetting wsi_display_connector->active = false; fixes the
problem of wrong / stale connector state and Vulkan/wsi/display
clients are happy on multi-display setups again, as tested
in various single- and multi-display configurations.
This bug affects all Mesa releases with Vulkan/WSI/Display
support and should therefore be backported.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Fixes: 352d320a07 ("vulkan: Add EXT_direct_mode_display [v2]")
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19484>
(cherry picked from commit 24094ee03d)
If the map doesn't set MAP_DISCARD_RANGE, we do have to copy the existing
contents over. MAP_WRITE on its only gives permission to replace the contents,
unfortunately it does not require that the application actually do so.
Closes: #7640
Fixes: 0b26a9f773 ("panfrost: Don't copy resources if replaced")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reported-by: Roman Elshin
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19576>
(cherry picked from commit cf7a3906b0)
Fixes issue with "is helper invocation" that in recent SPIR-V is mapped to
a volatile Load. The CSE was catching the loads before they were transformed
in the new is_helper_invocation intrinsic (that is not reorderable).
Fixes: 729df14e45 ("nir: Handle volatile semantics for loading HelperInvocation builtin")
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: M Henning <drawoc@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19432>
(cherry picked from commit 8ab628ab2e)
Don't copy propagate the constant in situations like
mov(8) g8<1>D 0x7fffffffD
mul(8) g16<1>D g8<8,8,1>D g15<16,8,2>W
On platforms that only have a 32x16 multiplier, this will result in
lowering the multiply to
mul(8) g15<1>D g14<8,8,1>D 0xffffUW
mul(8) g16<1>D g14<8,8,1>D 0x7fffUW
add(8) g15.1<2>UW g15.1<16,8,2>UW g16<16,8,2>UW
On Gfx8 and Gfx9, which have the full 32x32 multiplier, it results in
mul(8) g16<1>D g15<16,8,2>W 0x7fffffffD
Volume 2a of the Skylake PRM says:
When multiplying a DW and any lower precision integer, the
DW operand must on src0.
See also https://gitlab.freedesktop.org/mesa/crucible/-/merge_requests/104.
Previous to INTEL_shader_integer_functions2 (in Vulkan or OpenGL), I
don't think it would be possible to create a situation where this could
occur. I discovered this via some optimizations that can determine that
the non-constant source must be able to fit in 16-bits. The case listed
above came from piglit's "ext_transform_feedback-order arrays points"
with those optimizations in place.
No shader-db or fossil-db changes on any Intel platform.
Fixes: de6c0f8487 ("intel/fs: Implement support for NIR opcodes for INTEL_shader_integer_functions2")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17718>
(cherry picked from commit db20412168)
XFB queries need to be enabled with NGG streamout and VS/TES.
Previously, the NGG lowering code relied on has_prim_query for XFB.
This fixes failures with RADV_PERFTEST=ngg_streamout on GFX10.3 with
the vkd3d-proton testsuite. Vulkan CTS is missing TES tests with XFB
queries apparently.
Cc: 22.3 mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19493>
(cherry picked from commit 505290dc44)
Meta operations change dynamic states like viewports and previously,
the guardband state was also always re-emitted because it relied on
dynamic viewport/scissor changes.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7577
Fixes: 40d8df7280 ("radv: emit the guardband state separately from the scissor state")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19521>
(cherry picked from commit 33d60bda9d)
While the GC880 is HALTI0, it still uses the old set of state registers
for PE pipe configuration. This is another specialty of the GC880, readd
the missing handling for this GPU otherwise e.g. Qt5 cube example suffers
from rendering corruption with both eglfs and wayland backends.
Fixes: 7c46a48836 ("etnaviv: use new PE pipe address states on >= HALTI0")
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Marek Vasut <marex@denx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19562>
(cherry picked from commit 20984aab0f)
Currently float16 to int64 conversions don't work correctly, because
the "div" variable has an infinite value, since 2^32 isn't
representable as a 16-bit float, which causes the result of of rem(x,
div) to be NaN for all inputs, leading to an incorrect result. Since
no values of magnitude greater than 2^32 are representable as a
float16 we don't actually need to do the fdiv/frem operations, the
conversion is equivalent to f2u32 with the result padded to 64 bits.
Rework:
* Jordan: Handle f16 in if/else rather than conditional
Fixes: 936c58c8fc ("nir: Extend nir_lower_int64() to support i2f/f2i lowering")
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19391>
(cherry picked from commit e14f85366e)
`pred` is a pointer, for sufficiently large numbers these
being cast to int were both > 0 regardless of the order
of `data1` and `data2`.
Fixes: 523a28d3fe ("nir: add an instruction set API")
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19539>
(cherry picked from commit c987a727a7)
GDS is used for NGG queries/streamout (GFX10+ only) and the BOs were
only added to the graphics queue because compute doesn't need them.
Though, the kernel emits a GDS switch when a queue submission doesn't
use GDS. That means that submitting jobs on the compute queue without
GDS can reset the state of the graphics queue and lead to GPU hangs.
The only viable solution for now is to make the GDS BOs resident to
avoid resetting the state between queues. This shouldn't introduce
more syncs between queues because GDS BOs are similar for both.
This fixes a GPU hang with Warhammer Chaosbane during loading time and
possibly some spurious random GPU hangs. Note that this GPU hang was
workarounded on the Steam side with RADV_DEBUG=nongg.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19466>
(cherry picked from commit 26c8fedc1b)
This test as a whole does not seem to work anywhere, even lavapipe, but
one particular subtest was passing until a recent change
(!19438 - zink: polygon mode fixes?).
After consideration by @kusma, it appears that the subtest was passing
by accident due to zink generating the wrong values. Given that this is
not something that users would ever experience as a regression, we
simply document this new failure along with all the others for this
test.
Fixes: 53721827ea ("zink: correct depth-bias enable condition")
Signed-off-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19517>
(cherry picked from commit d7ad9e7014)
We have been incorrectly assuming there was just one for all the
events, apparently CTS never uses more than one event.
Fixes: e6884df088 ('v3dv: fix event synchronization')
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19518>
(cherry picked from commit 36ef75b6eb)
Found by inspection because the MIN_LOD bits were moved.
Cc: 22.3 mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19496>
(cherry picked from commit e891e84f4b)
Since we now implement events in the GPU we need to be more careful
and insert barriers to honor the dependencies provided by the API
as well as ensuring we are synchronizing these with the compute
queue, since that is how we implement GPU event functionality.
Fixes: ecb01d53fd ("v3dv: refactor events")
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19458>
(cherry picked from commit e6884df088)
These leaks on device creation failure have been there before, but
were only exposed as CTS failures after the recent event refactoring.
Partially fixes:
dEQP-VK.api.device_init.create_instance_device_intentional_alloc_fail.basic
dEQP-VK.api.object_management.alloc_callback_fail.device
dEQP-VK.api.object_management.alloc_callback_fail.device_group
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Eric Engestrom <eric@igalia.com>
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19458>
(cherry picked from commit b78fd50e90)
It would assert anyways. Found by inspection.
Cc: 22.3 mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19495>
(cherry picked from commit fab87b0f41)
Implement Wa_1508744258:
Disable RHWO by setting 0x7010[14] by default except during resolve
pass.
Disable the RCC RHWO optimization at all times except when resolving
single sampled color surfaces.
v2: Move stalling to genX(cmd_buffer_apply_pipe_flushes) for clarity (Mark)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Mark Janes <markjanes@swizzler.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19450>
(cherry picked from commit ba0336ab3f)
Copied wrong from radeonsi. The registers following the scratch
buffer address are the shader rsrc1/rsrc2. Not the user SGPR0
containing the ring resource word 1.
Fixes: 278e533ec9 ("radv: update scratch buffer registers on GFX11")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19488>
(cherry picked from commit b8865ad046)
The entire point of resource shadowing is to avoid unnecessary flushing.
Flushing readers after shadowing is counterproductive. A refresher on
how resource shadowing is supposed to work:
First, we determine if it's beneficial to shadow resources. If so, we
create a new backing buffer object. We flush the current writer of the
resource, if there is one, so the current contents become known to the
CPU. If we are not discarding the original resource, we then copy the
existing contents of the buffer to the new shadow buffer on the CPU.
Finally, we swap the resource's backing buffer for our shadow. Any batch
that reads the resource will continue to read the old copy of the
resource, and any future draw calls will see the new copy with the
change implemented.
Where did we go wrong?
In 988d5aae74 ("panfrost: Flush resources when shadowing"), we started
flushing all readers. We didn't actually need to flush, we just needed
to avoid dangling references on the batches reading the old copy of the
resource. But that's easily enough avoided: just remove the references.
The batches still hold a reference to the underlying BO, which will be
freed at the right time regardless.
Originally motivated by glmark2 -bbuffer:update-method=subdata, which
has some pathological access paterns.
Firefox is a lot faster anecdotally (now scrolling at 60fps in firefox).
But what actually motivated this is an apitrace from Duckstation's GLES
renderer. With this patch, the in-game portion is improved 3fps to 21fps.
Closes: #4028
Fixes: 988d5aae74 ("panfrost: Flush resources when shadowing")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19361>
(cherry picked from commit 2d8f28df73)
If a synchronized transfer_map is going to overwrite an entire resource,
there's no need to memcpy in the original contents ahead-of-time. This
memcpy is particularly bad for large buffers where it's copying WC->WC,
although that could be mitigated with threaded_context's cpu_storage in
the future if needed.
Prevents a performance regression in glmark2's buffer scenes from the
next patch, hence the Cc.
Cc: mesa-stable
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19361>
(cherry picked from commit 0b26a9f773)