It's split into ac_nir_lower_ps_early ac_nir_lower_ps_late.
ac_nir_lower_ps_early doesn't generate any AMD specific intrinsics except
some system values and is mainly an optimization pass with some lowering.
The new change here is that it also eliminates output components not needed
by spi_shader_col_format.
ac_nir_lower_ps_late lowers output stores to exports and does the bc_optimize
thing.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32966>
Add support for getting time elapsed values via glBeginQuery/glEndQuery.
When recording query start & end time, we ensure that all pending jobs have
been completed by using v3d cpu_queue & the multisync extension.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32547>
Add support for getting timestamp values via
glGet(GL_TIMESTAMP) and glQueryCounter(GL_TIMESTAMP). For the case of
glQueryCounter, we make use of v3d cpu jobs via
DRM_IOCTL_V3D_SUBMIT_CPU and DRM_V3D_EXT_ID_CPU_TIMESTAMP_QUERY.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32547>
Add support to check if v3d supports the multisync
extension. This will be used in future patches to enable support for
PIPE_CAP_QUERY_TIMESTAMP & PIPE_CAP_QUERY_TIME_ELAPSED.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32547>
Add support to check if v3d supports cpu_queue. This
will be used in future patches to enable support for
PIPE_CAP_QUERY_TIMESTAMP & PIPE_CAP_QUERY_TIME_ELAPSED.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32547>
The attribute ring size per SE is different than GFX11 and it was
already computed correctly in common code but RADV was using the old
GFX11 style.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32994>
The spec says
vkCmdCopyQueryPoolResults is considered to be a transfer operation,
and its writes to buffer memory must be synchronized using
VK_PIPELINE_STAGE_TRANSFER_BIT and VK_ACCESS_TRANSFER_WRITE_BIT before
using the results.
While STORE_MULTIPLE is not exactly VK_PIPELINE_STAGE_TRANSFER_BIT /
VK_ACCESS_TRANSFER_WRITE_BIT, we can still rely on user barriers to do
the right thing (e.g., flush caches for host access).
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32697>
When VK_QUERY_RESULT_WAIT_BIT is set, we rely on sync wait. When
VK_QUERY_RESULT_WAIT_BIT is not set, no wait is needed.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32697>
We can guarantee ordering with this sequence of async cmds
RUN_FRAGMENT ->
(signal and wait SB_ITER) ->
FLUSH_CACHE2 ->
(signal and wait DEFERRED_FLUSH) ->
SYNC_SET32
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32697>
The spec says
VUID-vkCmdBeginQueryIndexedEXT-None-00807
All queries used by the command must be unavailable
and panvk_cmd_reset_occlusion_queries is synchronous.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32697>
The spec says
Resetting a query via vkCmdResetQueryPool or vkResetQueryPool sets the
status to unavailable and makes the numerical results undefined.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32697>
The spec says
VUID-VkQueryPoolCreateInfo-queryCount-02763
queryCount must be greater than 0
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32697>
The spec says
After query pool creation, each query is in an uninitialized state and
must be reset before it is used.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32697>
Format re-interpretation is no longer a problem with texture views. The
clear color address now points to a clear color that is in the expected
format.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31374>
On gfx12+, the pre-amble and post-amble flushes contain the stalls
necessary to ensure the prior operation is complete. Remove the extra
uses of ANV_PIPE_END_OF_PIPE_SYNC_BIT in post-amble flushes. Also do
this for the pre-amble flushes, but this doesn't have any impact. The
flush application function will implicitly add the bit.
For A750, this improves the TWWH3 trace in the performance CI by 0.52%
(n=2).
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31600>
On gfx12+, the post-amble flushes contain the stalls necessary to ensure
the prior operation is complete. Remove the extra uses
iris_emit_end_of_pipe_sync().
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31600>
Each reg may store a list of conflict regs. This was handled by
util_dynarray, however each of those hold an extra pointer for
the ra_regs (which serves as mem_ctx for that). Since the usage
here is very simple, we just handle the array growth manually.
The initial size remains the same as before.
The mem_ctx of each ra_reg was being used to identify the case
in which the list wasn't used. Change to use a bool in the
ra_regs struct instead.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25744>
For Intel, looking at a few fossils, the majority of nodes
have more than 32 entries in the list. I'd expect other backends
to have similar numbers.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25744>
Each node stores a list of adjacent nodes. This was handled by
util_dynarray, however each of those hold an extra pointer for
the ra_graph (which serves as mem_ctx for that). Since the usage
here is very simple, we just handle the array growth manually.
For now keep using the same initial size as was being used by dynarray.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25744>
Create a parallel array to hold them. In particular, the `spill_cost` is
used at a completely different moment than the main node data.
Reduces the `struct ra_node` size to 40 bytes.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25744>
Fast-clears require expensive flushes beforehand and afterwards. The
cost of flushes are decreased in a series of back-to-back fast-clears as
no extra fast-clear flushes are required in between them. If the ratio
of a command buffer's recorded back-to-back fast clears over independent
fast-clears falls below 1/2, prevent that command buffer from recording
any further fast-clears.
Averaging two runs of our Factorio trace on an A750 shows a +14.37%
improvement in FPS.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32984>
RBBM_PRIMCTR registers are used for different pipeline statistics that can
be queried, but current usage was wrong in some cases. Comments in the
register file are updated, and the per-statistic register index mapping is
updated accordingly.
Fixes on a750:
test_query_pipeline_statistics in vkd3d-proton
arb_query_buffer_object failures in piglit (zink)
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32900>
If we don't have surface descriptor, treat this as a hint from
application that it will export the surface later.
This matches Intel driver behavior.
Reviewed-by: Leo Liu <leo.liu@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32970>