The assertion (or the attempt to change the layout) is causing a crash
when launching Steel Rats. Tighten the condition for the change so that
it only applies when the runtime has made changes.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12602
Fixes: eed788213b ("anv: ensure consistent layout transitions in render passes")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33523>
On gfx12+, the pre-amble and post-amble flushes contain the stalls
necessary to ensure the prior operation is complete. Remove the extra
uses of ANV_PIPE_END_OF_PIPE_SYNC_BIT in post-amble flushes. Also do
this for the pre-amble flushes, although it has no impact there since
the flush application function implicitly adds the bit.
For A750, this improves the TWWH3 trace in the performance CI by 0.52%
(n=2).
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31600>
Fast-clears require expensive flushes beforehand and afterwards. The
cost of these flushes is reduced in a series of back-to-back
fast-clears, as no extra flushes are required between them. If the ratio
of a command buffer's recorded back-to-back fast clears over independent
fast-clears falls below 1/2, prevent that command buffer from recording
any further fast-clears.
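A minimal sketch of the ratio heuristic described above, assuming hypothetical per-command-buffer counters (struct fc_stats, fc_record() and fc_allowed() are illustrative names, not anv's). The FC_MIN_SAMPLES threshold is also an assumption, added so that one isolated fast clear does not disable the feature on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: the ratio is only evaluated after this many fast clears,
 * so a single isolated clear does not immediately disable the feature. */
#define FC_MIN_SAMPLES 4

struct fc_stats {
   uint32_t back_to_back; /* fast clears directly following another one */
   uint32_t independent;  /* fast clears that needed their own flushes */
   bool disabled;         /* sticky: no more fast clears once set */
};

/* Record one fast clear; follows_fast_clear is true when no extra
 * flush was needed because the previous operation was a fast clear. */
static void
fc_record(struct fc_stats *s, bool follows_fast_clear)
{
   if (follows_fast_clear)
      s->back_to_back++;
   else
      s->independent++;

   if (s->back_to_back + s->independent < FC_MIN_SAMPLES)
      return;

   /* back_to_back / independent < 1/2, computed without division. */
   if (2 * s->back_to_back < s->independent)
      s->disabled = true;
}

static bool
fc_allowed(const struct fc_stats *s)
{
   return !s->disabled;
}
```

Comparing 2 * back_to_back against independent avoids a division and the zero-denominator case.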
Averaging two runs of our Factorio trace on an A750 shows a +14.37%
improvement in FPS.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32984>
Use the 3DSTATE_URB_ALLOC_* instructions to program the URB for
multi-slice device configurations.
If only one slice is available in the device, the SliceN fields are
ignored by the HW.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32736>
Prevents the next patch from failing CTS tests such as:
dEQP-VK.api.image_clearing.core.clear_color_image.*.b4g4r4a4*
Brings back the feature that was introduced in commit 46187bb54f
("anv: Swizzle fast-clear values"), but went unused in commit
721d0c3e77 ("anv,hasvk: Always use BLORP_BATCH_NO_UPDATE_CLEAR_COLOR").
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32187>
We're going to drop a generic restriction on clear color conversions in
anv_can_fast_clear_color(). Without preparing for it, the following
tests would fail:
* piglit.spec.arb_framebuffer_srgb.blit texture srgb msaa disabled clear.gen9_zinkm64
* piglit.spec.arb_framebuffer_srgb.blit renderbuffer srgb msaa disabled clear.gen9_zinkm64
* piglit.spec.arb_framebuffer_srgb.blit texture srgb downsample enabled clear.gen9_zinkm64
* piglit.spec.arb_framebuffer_srgb.blit renderbuffer srgb downsample enabled clear.gen9_zinkm64
* piglit.spec.arb_framebuffer_srgb.blit renderbuffer srgb msaa enabled clear.gen9_zinkm64
* piglit.spec.arb_framebuffer_srgb.blit texture srgb msaa enabled clear.gen9_zinkm64
So, add support for sRGB sampling via BLORP transfer operations and drop
the gfx9-specific restriction on sRGB fast-clears.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32187>
This bit is not needed for barriers and appears to trigger a
performance regression, so keep it only for AUX-TT
flushing/invalidation.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: e3814dee1a ("anv: add plumbing/support for L3 fabric flush")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12090
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31915>
Stuff COMPUTE_WALKER_BODY into COMPUTE_WALKER in both iris and anv.
This also fixes the tracepoint for ray dispatches. Stuffing
COMPUTE_WALKER_BODY allows us to set
cmd_buffer->state.last_compute_walker.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31822>
If a render area is smaller than an attachment's extent and is not
aligned to the CCS block size, we must load the clear color so that the
pixels outside of that area are decompressed with the right clear
color.
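A sketch of the condition above, assuming a hypothetical rectangle type that mirrors VkRect2D and CCS block dimensions passed in by the caller (they depend on format and platform). render_area_needs_clear_color() is an illustrative name; an edge that touches the attachment's far border is treated as aligned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical rectangle mirroring VkRect2D's fields. */
struct rect {
   int32_t x, y;
   uint32_t width, height;
};

/* True when the render area is smaller than the attachment and has an
 * edge inside a CCS block, so pixels outside the area may share a block
 * with rendered pixels and need the clear color loaded. */
static bool
render_area_needs_clear_color(struct rect area,
                              uint32_t att_w, uint32_t att_h,
                              uint32_t blk_w, uint32_t blk_h)
{
   bool full = area.x == 0 && area.y == 0 &&
               area.width == att_w && area.height == att_h;
   if (full)
      return false;

   uint32_t x1 = (uint32_t)area.x + area.width;
   uint32_t y1 = (uint32_t)area.y + area.height;

   bool aligned = (uint32_t)area.x % blk_w == 0 &&
                  (uint32_t)area.y % blk_h == 0 &&
                  (x1 % blk_w == 0 || x1 == att_w) &&
                  (y1 % blk_h == 0 || y1 == att_h);
   return !aligned;
}
```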
Prevents the next patch from causing the following test failure on gfx9:
dEQP-VK.renderpass.suballocation.load_store_op_none.color_load_op_none_store_op_none
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31743>
Store an array of clear values, one for each view format of the image.
Load the clear value based on the view format.
anv_image_msaa_resolve() may override the source or destination with
ISL_FORMAT_UNSUPPORTED, so make anv_image_get_clear_color_addr() handle
that format.
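A sketch of per-view-format clear value storage, assuming a hypothetical image struct with a list of view formats and evenly strided clear-value slots. get_clear_color_addr() only mimics the idea of anv_image_get_clear_color_addr(). Mapping the unsupported format to the first slot is purely an assumption; the commit message does not say how that case is handled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an ISL format enum. */
enum fmt { FMT_UNSUPPORTED = -1, FMT_RGBA8_UNORM, FMT_RGBA8_SRGB };

#define MAX_VIEW_FORMATS 4

struct image {
   enum fmt view_formats[MAX_VIEW_FORMATS];
   uint32_t num_view_formats;
   uint64_t clear_color_base;   /* address of the first clear value */
   uint32_t clear_color_stride; /* bytes per stored clear value */
};

/* Return the address of the clear value matching the view format. */
static uint64_t
get_clear_color_addr(const struct image *img, enum fmt view_fmt)
{
   /* Assumption: an unsupported format (e.g. from an MSAA resolve)
    * falls back to the first slot. */
   if (view_fmt == FMT_UNSUPPORTED)
      return img->clear_color_base;

   for (uint32_t i = 0; i < img->num_view_formats; i++) {
      if (img->view_formats[i] == view_fmt)
         return img->clear_color_base + (uint64_t)i * img->clear_color_stride;
   }
   assert(!"view format not in the image's format list");
   return img->clear_color_base;
}
```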
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31743>
Fixes: 4aa3b2d ('anv: LNL+ doesn't need the special flush for sparse')
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31737>
This will make things easier in situations where we don't want to use
the binding table at all (indirect draws/dispatches).
The mechanism is simple: upload a vec3 either through push constants
(<= Gfx12.0) or through the inline parameter register (>= Gfx12.5).
In the shader, do this:
   if vec.x == 0xffffffff:
      addr = pack64_2x32 vec.y, vec.z
      vec = load_global addr
This works because we limit the maximum workgroup count to 0xffff in
all dimensions:
   maxComputeWorkGroupCount = { 65535, 65535, 65535 },
so larger values can be used to signal the need for indirect
loading.
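A CPU-side emulation of this encoding, assuming that pack64_2x32 treats vec.y as the low 32 bits and vec.z as the high 32 bits. encode_indirect() and resolve() are hypothetical names, and the "global load" is modeled as a plain pointer dereference.

```c
#include <assert.h>
#include <stdint.h>

/* Marker in .x: the real vec3 lives in memory at (.y | .z << 32).
 * Safe because maxComputeWorkGroupCount is 65535 per dimension. */
#define INDIRECT_MARKER 0xffffffffu

struct uvec3 { uint32_t x, y, z; };

/* Driver side: the value uploaded for an indirect dispatch. */
static struct uvec3
encode_indirect(uint64_t addr)
{
   return (struct uvec3){ INDIRECT_MARKER, (uint32_t)addr,
                          (uint32_t)(addr >> 32) };
}

/* Shader-side logic emulated on the CPU. */
static struct uvec3
resolve(struct uvec3 v)
{
   if (v.x == INDIRECT_MARKER) {
      uint64_t addr = (uint64_t)v.y | ((uint64_t)v.z << 32);
      const uint32_t *mem = (const uint32_t *)(uintptr_t)addr;
      v = (struct uvec3){ mem[0], mem[1], mem[2] };
   }
   return v;
}
```

Direct dispatches pass their counts unchanged, since no valid count can equal the marker.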
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31508>
When flushing the render target cache for future operations, we need a
stall at the pixel scoreboard. We likely didn't see any issue until now
because a change in render target added the pb-stall.
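A sketch of the rule in the previous paragraph. The bit names echo anv's pipe bits, but the enum and its values here are local to this example, and fixup_pipe_bits() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit values, local to this sketch. */
enum pipe_bits {
   PIPE_RENDER_TARGET_CACHE_FLUSH_BIT = 1 << 0,
   PIPE_STALL_AT_SCOREBOARD_BIT       = 1 << 1,
   PIPE_CS_STALL_BIT                  = 1 << 2,
};

/* A render target cache flush done for later operations is paired with
 * a stall at the pixel scoreboard. */
static uint32_t
fixup_pipe_bits(uint32_t bits)
{
   if (bits & PIPE_RENDER_TARGET_CACHE_FLUSH_BIT)
      bits |= PIPE_STALL_AT_SCOREBOARD_BIT;
   return bits;
}
```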
When using 2 compute shaders with the following pattern:
  vkCmdDispatch()
  vkCmdPipelineBarrier() ImageBarrier with (src|dst)AccessMask=0 & identical layout
  vkCmdDispatch()
we should ensure that the first dispatch is complete before executing
the second one, otherwise they can race on resource accesses. This
fixes failures in some new CTS tests.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31676>
Coverity notices that we've ensured that the index is < MAX_RTS in one
case, but not in another. Since `color_att_count` is a uint32_t, it can
easily exceed MAX_RTS (8), and would thus create an out-of-bounds read
situation. While the type system would allow this, the actual
implementation shouldn't, so an assert should make Coverity happy and
help us check our assumption.
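A sketch of the pattern, assuming a hypothetical fixed-size table indexed by color attachment. set_color_locations() and color_map are illustrative names, not the code touched by this commit.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_RTS 8

/* Hypothetical per-attachment table sized for MAX_RTS entries. */
static uint8_t color_map[MAX_RTS];

static void
set_color_locations(const uint32_t *locations, uint32_t color_att_count)
{
   /* uint32_t admits any count, but the driver never exposes more than
    * MAX_RTS color attachments. The assert documents this for readers
    * and static analyzers. With NDEBUG it compiles away, so it checks
    * the assumption in debug builds only. */
   assert(color_att_count <= MAX_RTS);
   for (uint32_t i = 0; i < color_att_count; i++)
      color_map[i] = (uint8_t)locations[i];
}
```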
CID: 1620440
Fixes: d2f7b6d5a7 ("anv: implement VK_KHR_dynamic_rendering_local_read")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31640>
These should be included according to the table in Bspec 43904.
The patch removes PIPE_CONTROL_STATE_CACHE_INVALIDATE based on HSDES.
Cc: mesa-stable
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29764>
This avoids sprinkling those all over the code base. Debug breakpoints
are put in there too.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Backport-to: 24.2
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31481>
The workaround is already implemented by
batch_emit_pipe_control_write(), so we don't need to do it here as well.
This was spotted by Lionel Landwerlin. The credit goes to him, I just
wrote the patch.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31412>
Commit a603cc0633 ("anv: move some pc was to
batch_emit_pipe_control_write") moved some WAs from
emit_apply_pipe_flushes() to batch_emit_pipe_control_write(), but it
turns out one of them was already there since cf7e1f3817 ("anv,
iris: add missing CS_STALL bit for GPGPU texture invalidation").
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31412>
We set up an empty render target when there are no color attachments,
which effectively makes it a different surface state. In most cases
the compiler will insert a null-rt bit in the extended descriptor,
which means the RT isn't even accessed. But in some cases, like
alpha-to-coverage output + depth/stencil write, we will access the
render target because using the null-rt would prevent alpha-to-coverage
from happening.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 2bd304bc8f ("anv: Skip the RT flush when doing depth-only rendering.")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31196>
Whenever we execute a fast-clear due to LOAD_OP_CLEAR, we decrease the
number of layers to clear by one. We then enter the slow clear function
and possibly exit without clearing if the layer count is zero.
Unfortunately, we've already compiled the shader for slow clears by the
time we exit. Skip the slow clear function if there are no layers to
clear.
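A sketch of the control flow, assuming hypothetical counters in place of real shader compilation and clearing. slow_clear() and clear_attachment() are illustrative names. The check in clear_attachment() is the fix: the slow path is skipped when the fast clear has already handled the only layer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical counters standing in for real work. */
static int shaders_compiled;
static int layers_cleared;

static void
slow_clear(uint32_t base_layer, uint32_t layer_count)
{
   (void)base_layer;
   shaders_compiled++;   /* the expensive part happens first */
   if (layer_count == 0)
      return;            /* too late: the shader is already built */
   layers_cleared += (int)layer_count;
}

/* Assumes layer_count >= 1 whenever can_fast_clear is true. */
static void
clear_attachment(uint32_t base_layer, uint32_t layer_count,
                 bool can_fast_clear)
{
   if (can_fast_clear) {
      /* The first layer is fast-cleared; the rest needs a slow clear. */
      layers_cleared++;
      base_layer++;
      layer_count--;
   }
   /* Don't enter the slow path at all when nothing is left. */
   if (layer_count > 0)
      slow_clear(base_layer, layer_count);
}
```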
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31167>
Coverity warns that the uint32_t pointer I was passing into
isl_color_value_pack() could possibly be used as an array. The value is
being used as such, but only the first element of that array should be
accessed. That's because the depth buffer formats I'm also passing into
the function only have a single channel, R. Nonetheless, let's update
the code to avoid the warning.
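A sketch of the kind of change described, using a hypothetical pack_color() helper rather than isl_color_value_pack() itself. The helper may write up to four words, so the caller gives it a full four-element array and reads element 0 for a single-channel depth format. It stores raw float bits, as an R32_FLOAT format would.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical packer: writes one 32-bit word per channel, up to 4. */
static void
pack_color(const float value[4], uint32_t num_channels, uint32_t out[4])
{
   for (uint32_t c = 0; c < num_channels; c++)
      memcpy(&out[c], &value[c], sizeof(uint32_t));
}

/* Passing the address of a single uint32_t only works because depth
 * formats have one channel (R), which an analyzer can't see. A full
 * array removes that assumption. */
static uint32_t
pack_depth_clear(float depth)
{
   const float value[4] = { depth, 0.0f, 0.0f, 0.0f };
   uint32_t packed[4] = { 0 };
   pack_color(value, 1, packed);
   return packed[0];
}
```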
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31123>
Otherwise we can end up with uninitialized values. This fixes the
following valgrind warning:
==31283== Uninitialised byte(s) found during client check request
==31283== at 0x503E4DE: anv_batch_bo_finish (anv_batch_chain.c:345)
==31283== by 0x504220A: anv_cmd_buffer_end_batch_buffer (anv_batch_chain.c:1103)
==31283== by 0x55A0E4F: end_command_buffer (genX_cmd_buffer.c:3455)
==31283== by 0x55A0E82: gfx11_EndCommandBuffer (genX_cmd_buffer.c:3466)
==31283== by 0x11233A: ??? (in /usr/bin/vkcube)
==31283== by 0x10BDEE: ??? (in /usr/bin/vkcube)
==31283== by 0x49B5149: (below main) (in /usr/lib64/libc.so.6)
==31283== Address 0xc10c4d8 is 1,240 bytes inside a block of size 8,192 client-defined
==31283== at 0x5036EF6: anv_bo_pool_alloc (anv_allocator.c:1284)
==31283== by 0x503E0E1: anv_batch_bo_create (anv_batch_chain.c:262)
==31283== by 0x5040D3F: anv_cmd_buffer_init_batch_bo_chain (anv_batch_chain.c:868)
==31283== by 0x504F9C1: anv_create_cmd_buffer (anv_cmd_buffer.c:147)
==31283== by 0x6B718C4: vk_common_AllocateCommandBuffers (vk_command_pool.c:206)
==31283== by 0x4FB06B2: vkAllocateCommandBuffers (trampoline.c:1996)
==31283== by 0x111E6B: ??? (in /usr/bin/vkcube)
==31283== by 0x10BDEE: ??? (in /usr/bin/vkcube)
==31283== by 0x49B5149: (below main) (in /usr/lib64/libc.so.6)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30990>
The benchmarks we're tracking tend to prefer clearing depth buffers to
0.0f when the depth buffers are part of images with multiple aspects.
Otherwise, they tend to prefer clearing depth buffers to 1.0f.
Replace the ANV_HZ_FC_VAL constant with a function which implements this
heuristic.
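A sketch of the heuristic, assuming a hypothetical image description with depth/stencil aspect flags. preferred_hiz_clear_value() is an illustrative name, not the function this commit adds.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical image description; anv's real image struct differs. */
struct image_desc {
   bool has_depth;
   bool has_stencil;
};

/* Combined depth/stencil images prefer 0.0f.
 * Depth-only images prefer 1.0f. */
static float
preferred_hiz_clear_value(const struct image_desc *img)
{
   assert(img->has_depth);
   return img->has_stencil ? 0.0f : 1.0f;
}
```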
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30767>
Xe2 can easily support fast-clearing depth buffers to multiple clear
values. Instead of assuming a hard-coded value in various parts of the
driver, pass the clear value down the expected paths.
For consistency, also adjust the slow depth clear function to have a
matching parameter.
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30767>
We're going to be storing clear colors from the drivers rather than
BLORP. Add a function for this purpose.
For now, the first use replaces init_fast_clear_color(). One change in
behavior is that the clear color initialization is now done without
write-checking on gfx12. This actually matches what anv does for other
writes to the image's fast-clear tracking state. We can fix this later
if and when we address the larger issue.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30824>
The hardware's clear color conversion feature requires invalidating the
texture cache for every fast clear. We're no longer using the hardware
feature, so we no longer need the invalidation.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30646>