Take advantage of 3 spare JSL in Collabora lab to load the balance of
those jobs:
job name avg duation (min)
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
--- ---
anv-jsl 15
anv-jsl-angle 20
iris-jsl-deqp 18
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31414>
We can't just always negate the alu instruction's cmod, because negating
it can produce different results when the argument is NaN float. We can
still do that if the condition is == or !=.
Fixes: 0ba9497e ("intel/fs: Improve discard_if code generation")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11800
Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31042>
The per-primitive have their own separate section in the FS thread
payload, and are not considered when setting the mask in
3STATE_SBE's ConstantInterpolationEnable.
This is also consistent with what is done for brw_interp_reg().
Fixes
- dEQP-VK.mesh_shader.ext.misc.clip_geom_provoking_last
- dEQP-VK.mesh_shader.ext.misc.clip_geom_and_task_shader_provoking_last
Backport-to: 24.2
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11844
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31417>
When allocating a buffer normally, this flag gets to the allocator from
the memory requirements, but when sparse bindings are created we were
checking for them but never setting them.
Fixes sparse descriptor buffers on Xe2.
Makes the failure on TRTT more obvious.
Fixes: c6a91f1695 ("anv: add new heap/pool for descriptor buffers")
Fixes: 692e1ab2c1 ("anv: get rid of the second dynamic state heap")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31372>
Valgrind doesn't seem to know that drmSyncobjQuery() writes to the
variable that we pass as 'last_value'. This gets rid of:
==6275== Conditional jump or move depends on uninitialised value(s)
==6275== at 0x5308370: anv_sparse_trtt_garbage_collect_batches (anv_sparse.c:540)
==6275== by 0x53091E2: anv_sparse_bind_trtt (anv_sparse.c:825)
==6275== by 0x5309771: anv_sparse_bind (anv_sparse.c:953)
==6275== by 0x5309A3B: anv_free_sparse_bindings (anv_sparse.c:1041)
==6275== by 0x529FF21: anv_DestroyBuffer (anv_buffer.c:248)
==6275== by 0x932ADBD: ??? (in /usr/lib/x86_64-linux-gnu/libVkLayer_khronos_validation.so)
==6275== by 0x127AA2: MyVkBuffer::~MyVkBuffer() (sparse.cpp:364)
==6275== by 0x12B2D4: MyApp::test1_trivial_sparse() (sparse.cpp:1421)
==6275== by 0x13E01A: MyApp::run_test(int) (sparse.cpp:6594)
==6275== by 0x13E3B0: main (sparse.cpp:6656)
==6275== Uninitialised value was created by a stack allocation
==6275== at 0x53082D3: anv_sparse_trtt_garbage_collect_batches (anv_sparse.c:525)
An alternative to these Valgrind macros would simply have been to
zero-intialize last_value.
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31332>
I just noticed that my custom sparse program was not working correctly
when I used ANV_QUEUE_OVERRIDE (instead of enabling the compute queue
by default or using INTEL_ENGINE_CLASS_COMPUTE, which was removed by
commit 600d88ab3c ("intel: Remove INTEL_ENGINE_CLASS_COMPUTE and
INTEL_ENGINE_CLASS_COPY parameters").
It turns out we were not setting the same engine class type when using
ANV_QUEUE_OVERRIDE vs the other cases. Move the code around so the
behavior can stay the same.
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31332>
This avoids massively long shader compile times when there is lots of
spilling, at a minor cost of a few more spills/fills. Choose 15 as it is
already the default used by the Cyberpunk 2077 driconf workaround.
Surprisingly the number of additional spills/fills are miniscule in
fossil-db:
Instructions in all programs: 152680595 -> 152681525 (+0.0%)
SENDs in all programs: 7672789 -> 7672789 (+0.0%)
Loops in all programs: 48469 -> 48469 (+0.0%)
Cycles in all programs: 11981743456 -> 11984228708 (+0.0%)
Spills in all programs: 42989 -> 42779 (-0.5%)
Fills in all programs: 76380 -> 76776 (+0.5%)
partly because of the chaotic unpredictability that the choice of
registe to spill has on a shader. For example, this patch massively
helps some shaders in terms of spills/fills:
Spills helped fossils/fossil-db/steam-native/red_dead_redemption2.vk-g6.foz/4101ff9c9b83bf22/SIMD8 fragment: 3208 -> 2894 (-9.8%)
Fills helped fossils/fossil-db/steam-native/red_dead_redemption2.vk-g6.foz/4101ff9c9b83bf22/SIMD8 fragment: 7258 -> 6795 (-6.4%)
Spills helped fossils/q2rtx/q2rtx-rt-pipeline.976f4ab1c0fee975.1.foz/c496e8a549f6b4bf/compute: 109 -> 92 (-15.6%)
Related: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31133
Related: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9241
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11709
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11844
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31269>
Having the actual generated assembly is helpful when trying to figure
out if the code emission and disassembly are implemented correctly.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31305>
This tests is asserting on LNL like :
dEQP-VK.pipeline.monolithic.sampler.border_swizzle.r8_srgb.gbar.custom.gather_1.no_swizzle_hint
dEQP-VK.api.image_clearing.core.clear_color_image.2d.optimal.single_layer.e5b9g9r9_ufloat_pack32
Because blorp tries, for example, to setup a render target with
L8_UNORM_SRGB (which is mapped to the R8_UNORM_SRGB of Vulkan) but is
not supported for rendering.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 1c7fe9ad1b ("anv: Support fast clears in anv_CmdClearColorImage")
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31357>
During compute state save/restore, let's track all the descriptor sets.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30798>
When tests were added, there was a single pipe (float), so there wasn't
a pipe to compare in `operator==`. Add it there now and adjust
expectations accordingly.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31335>
Intel's protection mechanism is descriptor based. There is nothing
going on in the shaders.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31339>
We can't use register counts since 16-bit sampler loads in SIMD8 will
only write back half a GRF.
Signed-off-by: Lionel Landwerlin <llandwerlin@gmail.com>
Fixes: 0116430d39 ("intel/brw: Handle 16-bit sampler return payloads")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31307>
We can generalize the simd8-16bits case by just rounding to a physical
register.
We also take the opportunity to limit the register allocation to a
single physical GRF for the residency data.
Signed-off-by: Lionel Landwerlin <llandwerlin@gmail.com>
Fixes: 0116430d39 ("intel/brw: Handle 16-bit sampler return payloads")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31307>
In the RT shader, if there's a executeCallableEXT() in between,
even though the called shader does nothing, the instructions before and
after the executeCallableEXT() is not properly synced.
Patch fixes:
- dEQP-VK.ray_tracing_pipeline.memguarantee.inside.rgen
- dEQP-VK.ray_tracing_pipeline.memguarantee.inside.chit
- dEQP-VK.ray_tracing_pipeline.memguarantee.inside.miss
- dEQP-VK.ray_tracing_pipeline.memguarantee.inside.call
Thank to Kevin for finding out there is a load/store issue.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31201>
In commit 44351d67f8, I needed to change some variables in a check for
compression in anv_can_fast_clear_color_view(). Instead of doing that, I
dropped the check altogether because I thought the call to
anv_layout_to_fast_clear_type() which followed right afterwards would
return ANV_FAST_CLEAR_NONE if the aux usage was ISL_AUX_USAGE_NONE.
That turned out not to be the case, due to special-casing of Xe2+. For
now, make Xe2+ more like other platforms when it comes to enabling
fast-clears. If there comes a reason to actually fast-clear with
ISL_AUX_USAGE_NONE, we can revisit this.
Fixes: 44351d67f8 ("anv: Change params of anv_can_fast_clear_color_view")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11920
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31297>
In an earlier commit, I made us stop counting sync.nops in the shader
statistics we use for shader-db (brw_debug_log_message) and fossil-db
(stats->instructions = ...). However, I missed adjusting the printout
for INTEL_DEBUG.
Fixes: 1497f4e0c2 ("intel/fs: Don't include sync.nop in instruction count statistics")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31311>
When no pipeline cache is provided by the application and we rely on the
internal one, cache hits are not counted as such.
This was causing us to return COMPILE_REQUIRED on some cases where all
shaders had been found in the cache, as well as some unnecessary extra
processing in the case that we did have to compile the pipeline.
Fixes: 1dacea10f3 ("anv: implement caching for ray tracing pipelines")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31298>
We have not yet added the shaders to the pipeline->shaders array at
this point. If we couldn't compile (or were asked not to) the
pipeline, we were leaking references to any shaders found in the cache.
This would manifest as an assert on device destruction:
vk_pipeline_cache_destroy: Assertion `cache->object_cache->entries == 0' failed.
Fixes: 58c9f817cb ("anv: fix pipeline executable properties with graphics libraries")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31298>
We found out that some HW changes on Xe2 make the HW avoid reading the
blend state if we're using the null_rt bit in the extended descriptor.
Since the alpha_to_coverage bit resides in the blend state, that state
is ignored and writes are going through to the depth/stencil buffers.
Disable null_rt in the color outputs if the color outputs can affect
other outputs (through alpha_to_coverage & omask).
Fixes tests in this pattern on Xe2 :
dEQP-VK.pipeline.*.multisample.alpha_to_coverage_no_color_attachment.*
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Backport-to: 24.2
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31196>
We'll want to tune this setting based on other parameters.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Backport-to: 24.2
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31196>
We setup an empty render target when there are no color attachments,
which effectively makes it a different surface state. In most cases
the compiler will insert a null-rt bit in the extended descriptor
which means the RT isn't even accessed. But in some cases like
alpha-to-coverage output + depth/stencil write, we will access the
render target because using the null-rt will prevent alpha-to-coverage
from happening.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 2bd304bc8f ("anv: Skip the RT flush when doing depth-only rendering.")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31196>
Whenever we execute a fast-clear due to LOAD_OP_CLEAR, we decrease the
number of layers to clear by one. We then enter the slow clear function
and possibly exit without clearing if the layer count is zero.
Unfortunately, we've already compiled the shader for slow clears by the
time we exit. Skip the slow clear function if there are no layers to
clear.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31167>
vk_common_QueueWaitIdle() creates a syncobj, does a submit with no
batch buffers what translates to execute trivial_batch_bo and then
waits for syncobj to be signaled when trivial_batch_bo finishes.
On Xe KMD on other hand we can avoid the trivial_batch_bo submission
and instead use the special DRM_IOCTL_XE_EXEC with num_batch_buffer == 0
to get a syncobj to be signaled when the last exec finish execution.
This should free a bit GPU to execute more important workloads.
This will also optimize vkDeviceWaitIdle() that calls QueueWaitIdle().
It have to fallback to vk_common_QueueWaitIdle() when queue is in
VK_QUEUE_SUBMIT_MODE_THREADED mode because vkQueueWaitIdle()
could return but there still stuff in VK/CPU submission queue.
Also it could cause use after free when resources attached to
submission are freed before it is processed, example:
vkCreateFence() or vkCreateSemaphore()
vkQueueSubmit() // with Fence or Semaphore created above
vkQueueWaitIdle() // with the race it returns
vkDestroyFence() or vkDestroySemaphore()
// vk_queue_submit_thread_func() start to process submission above...
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30958>
Split anv_xe_wait_exec_queue_idle() into 2 functions, the first
function creates the syncobj and prepares it to be signaled when the
last workload in queue is completed.
And the second one that calls the first function, then waits for the
syncobj to be signaled and destroy the syncobj.
The main reason for that is that the first function can be reused in
Iris and a future patch will add another user, so lets share it.
No changes in behavior are expected here.
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30958>
We currently disable CCS if a 3D Ys/Yf surface uses miptails. However,
ISL generally configures surfaces to be compatible with compression. For
consistency, disable miptails on 3D Ys/Yf surfaces in order to allow
compression.
If drivers prefer to have a more compact layout, they can pass the
ISL_SURF_USAGE_DISABLE_AUX_BIT flag at surface creation time.
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30081>
We currently disable CCS if a surface uses more than 11 slots in a
miptail. However, ISL generally configures surfaces to be compatible
with compression. For consistency, reduce the number of slots used in
miptails in order to allow compression.
If drivers prefer to have a more compact layout, they can pass the
ISL_SURF_USAGE_DISABLE_AUX_BIT flag at surface creation time.
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30081>