These values are taking from runtime interrogation of the media driver.
It would be nice to know if they are correct, but they work.
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20782>
this was including the generated tcs bits, which was likely to be wrong
and thus break optimal key hashing, requiring more pipelines
it also wasn't setting the optimal key value correctly during precompile,
which meant the wrong hash value was used and the precompiled libs were never
actually accessible
cc: mesa-stable
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21169>
failing to set this yields patterns like
* bind fs
* bind samplerviews
* draw
* bind fs2
* ~~unbind samplerviews~~ (eliminated)
* draw
the eliminated unbinding of samplerviews between draws also eliminates a descriptor update,
triggering various artifacts in certain corner cases (like DOOM2016 shadows)
it's possible to manage the updating during shader binding, but the detection is a bit more
complex, and the cpu overhead from maintaining the current codepath with an
extra pipe_context::set_sampler_views (et al) isn't high enough to warrant further investigation
at this time
fixes#8252
Fixes: 153af03b94 ("gallium: Add cap to request state validation for all dirty state")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21176>
At vkQueueSubmit time, for each batch with timeline semaphores to
signal, append cmd_buffers with feedback cmds to update the counter
value in its respective feedback slot.
Since multiple signals on the same semaphore could be pending at the
same time across batches/vkQueueSubmits, src slots and commands are
allocated on demand. These src slots can be reused after they've been
signaled (if the current semaphore counter is greater/equal than the
src value) and are cleaned up on vkDestroySemaphore.
Signed-off-by: Juston Li <justonli@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20500>
Unlike fence feedback, commands to update timeline semaphore feedback
slots can't be fully pre-recorded because of the counter value input
for signaling timeline semaphores. To avoid fully recording commands
during vkQueueSubmit, pre-record commands that write a counter value
from a feedback "src" slot to the feedback "dst" slot. Then at
vkQueueSubmit, parse the signal semaphores and write the signal counter
value in the feedback src slot and append the command that writes from
that feedback src slot offset to the command buffer associated with the
signal semaphore.
Signed-off-by: Juston Li <justonli@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20500>
Refactor into the following stages:
- prepare: Does an initial pass setting vn_queue_submission fields
and fixing up semaphores.
- alloc_storage: based on fields (including counts) from prepare,
calculate and allocate the amount of temporary storage needed.
- setup_batches: perform any modifications on the submission
batches using the allocated temporary storage.
- cleanup: free any temporary storage used.
Currently, only fence feedback needs alloc_storage and setup_batches
to append fence feedback to the submission but this slow will also
be utilized by upcoming timeline semaphore feedback.
Signed-off-by: Juston Li <justonli@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20500>
This hardware bug is the result of a control flow optimization present
in Gfx8-9 meant to prevent the ELSE instruction from disabling all
channels and update the control flow stack only to have them
re-enabled at the ENDIF instruction executed immediately after it.
Instead, on Gfx8-9 an ELSE instruction that would normally have ended
up with all channels disabled would pop off the last element of the
stack and jump directly to JIP+1 instead of to the ENDIF at JIP,
skipping over the ENDIF instruction. In simple cases this would work
okay (though it's actual performance benefit is questionable), but in
cases where a branch instruction within the IF block (e.g. BREAK or
CONTINUE) caused all active channels to jump outside the IF
conditional, the optimization would break the JIP chain of "join"
instructions by skipping the ENDIF, causing the block of instructions
immediately after the ENDIF to execute with all channels disabled
until execution reaches the reconvergence point.
This issue was observed on SKL in the
dEQP-VK.reconvergence.subgroup_uniform_control_flow_elect.compute.nesting4.0.38
test in combination with some Vulkan binding model changes Lionel is
working on. In such cases the execution with all channels disabled
was leading to corruption of an indirect message descriptor, causing a
hang.
Unfortunately the hardware bug doesn't provide a recommended
workaround. In order to fix the problem we point the JIP of an ELSE
instruction to the instruction immediately before the ENDIF -- However
that's not expected to work due to the restriction that JIP and UIP
must be equal if and only if BranchCtrl is disabled -- So this patch
also enables BranchCtrl, which is intended to support join
instructions within the "ELSE" block, which in turn disables the
optimization described above, which in turn causes us to execute the
instruction immediately *before* the ENDIF with all channels disabled
-- So in order to avoid further fallout from executing code with all
channels disabled we need to insert a NOP before ENDIF instructions
that have a matching ELSE instruction.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20921>
It took me a minute to figure out why the last uni_offsets NULL check
didn't also need to check num_offsets. I think this makes the code
slightly easier to understand.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21144>
A few lines earlier uni_offsets is accessed with ubo scaled by
PIPE_MAX_CONSTANT_BUFFERS:
if (uni_offsets[ubo * PIPE_MAX_CONSTANT_BUFFERS + i] == offset)
Found by inspection.
Looking at the before and after NIR code for
dEQP-VK.graphicsfuzz.cov-int-initialize-from-multiple-large-arrays,
using the correct indexing appears to enable the pass to inline an
additional uniform. My guess is that when a uniform is used more than
once, the first loop wouldn't find the offset recored in the table
because it was recorded at the wrong location.
Fixes: d23a9380dd ("lavapipe: implement extreme uniform inlining")
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21144>
Revert part of the commit 8187b35f to disable decoder fence for now
since it is causing regression for transcoding tests.
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Daniel Almeida <daniel.almeida@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21145>
Point sprite coordinates in general need to be inverted,
not just the texcoords converted to point sprite.
Move point coord y inversion out to its own pass.
Fixes GTF-GL46.gtf21.GL2FixedTests.point_sprites.point_sprites
with FBO dEQP surface.
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21050>