It turns out that we need to add a NOP not only in between two
consecutive WHILE instructions, but also after every control flow
instruction that immediately precedes a WHILE.
v2: Rebase after the renames.
Fixes: 5ca883505e ("brw: add a NOP in between WHILE instructions on LNL")
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33021>
I'm not aware of any workloads that will be impacted by this change,
but let's keep our list of control flow instructions complete. A
shader-db run on MTL tells me nothing changes.
v2: "The scheduler relies on HALT not being considered control flow to
be able to move code past HALT instructions. Doing this would prevent
such optimization from happening and would reduce performance
dramatically in some cases." - Francisco.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33021>
The state-tracker already tells us if we should use rectangular ends or
not on our lines, so we don't need to manually infer this from
combination of states.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33382>
This field is really about the line-shape, not multisampling or not.
Yeah, in OpenGL, these two concepts are kinda intertwined. But this is
what the state actually does, so let's name it based on that.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33382>
In `fossilize-replay --pipeline-hash 375a63e14afa96c4
fossils/fossil-db/steam-dxvk/f1_22_abu_dhabi.dx12vk-ultra.foz`,
`cf_count` would get decremented below zero. This would lead trying to
print `UINT_MAX` levels of indentation just a few lines below. I ran
out of disk space and patience before that finished. 🤣
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33748>
Fill the core features and properties properly, and conditionally pass
through support of the extension based on the renderer venus protocol
spec version.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33757>
For setters, e.g. vk_set_physical_device_properties_struct used by venus
to fill all props, the out array storage comes from the driver, so we'd
assign directly. This change also fixes the template indent and drops an
unused arg.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33757>
The img-2-img and layout transition are trivial passthrough. For
img-2-mem and mem-2-img copies, host pointer has to be sized for proper
protocol encoding and decoding, and we have to either query or calculate
on our own based on VK_HOST_IMAGE_COPY_MEMCPY_EXT flag being used or
not.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33757>
Currently the condition to use a single MOV is failing on immediate
values, so we emit 2 MOVs in SIMD8 instead of a single SIMD16.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33704>
No need to burn an entire PAGE_SIZE BO for an event. And in particular
the pattern of allocate + immediate mmap is expensive in a VM.
Suballocating cuts down the # of times we do this in
dEQP-VK.api.command_buffers.execute_large_primary from 10000 to 157,
avoiding problems with the test running up against watchdog timeout.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33721>
semantic patch made a few bad choices.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33722>
in prep for removing this method.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33722>
apply our semantic patch manually to the remaining users. Coccinelle bailed on
these files for whatever reason, I guess.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33722>
These will replace nir_metadata_preserve as more ergonomic replacements that
convey a notion of impl progress instead of simply updating metadata.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33722>
Since there is only one userq per process there is no need to add
glWaitSync to cs->seq_no_dependencies if the fence is not imported
and ip type is same.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33661>
Jobs from multiple context are submitted to aws->cs_queue are executed in order. Jobs
in aws->cs_queue are directly added to userqueue ring, hence userqueue execution order
between context is guaranteed in case of userqueue.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33661>
radeon_cmdbuf is rcs instead of rws, probably earlier renaming of
rws was agressive.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33661>
Instead of csc1 and csc2, make it as an array. Use current_cs_index
to point to csc that will be getting filled with commands.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33661>
Use amdgpu_cs(rcs)->csc. This will give more code readability with
next cleanup patches.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33661>
The issue with using video buffer associated data is that the data will
not be cleared when the buffer is removed from DPB. This will cause
issues if application tries to reuse such buffer (buffer that was
valid buffer in DPB in the past, but is currently not active in DPB)
as a dummy buffer for missing reference.
With Tier2 this works correctly because we allocate the DPB buffers
internally, but with UDT we use the video buffers directly for
references and so we need to make sure to only use the valid buffer
for a given index.
Instead of storing the buffer index as video buffer associated data,
use the render_pic_list array that we already have for keeping track
of active buffers in DPB.
Acked-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33641>
There are some tests that reaches out of memory (OOM) on purpose to
cover some fail cases.
But others that shouldn't are actually causing OOM too because we run
multiple tests in parallel, which increases the memory pressure.
This can affects other tests running in parallel, causing an increase of
the flakiness.
It is better to skip all of them
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33763>
This used to work fine with linear only, but now we need to use the
actual chroma surface pitch. For JPEG this value is in bytes.
Also swap 64KB_R_X addr mode with 256KB_S_X.
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33598>