Here we switch to using a sorted arrays for the binary search which
significantly speeds up the lookups. For a shader with ~8000
uniforms its up to 10x faster and the godot-tps-gles3-high.trace
in issue #13894 returns to its original runtime length before we
switched to using range remap in e052254066
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37754>
This creates an array of the linked list elements to be used for
faster binary searches in the following patch, and frees the old
linked list.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37754>
This will allow us to use the linked list for validation of inserts
during linking then switch to using an array for fast binary searches.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37754>
If the ptr value is NULL and we match an existing entry during an
insert call we now just return the existing value. This allows us
to drop an extra lookup, this will become important as the
following patches change `util_range_remap()` lookups to use a
sorted array that is not created until after we have added all
entries to our linked list.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37754>
This works around a long-standing synchronization issue consequence of
the HALT instruction used to implement FS discard not being considered
a control flow instruction by the back-end -- The fact that it doesn't
cause the CFG pass to introduce an edge in the graph means that the
software scoreboard pass is completely blind to the effect of discard
jumps on control flow, so it doesn't introduce the required
annotations to avoid data hazards when the discard path of the CFG is
taken. Note that because of the very limited set of instructions that
can follow the HALT target in a fragment shader this was very unlikely
to lead to issues in practice, but starting on xe3 it appears to have
become far more likely due to the use of SENDG, since SENDG requires
the scalar register to be set prior to the submission of the render
target write payloads, which can easily lead to a WaR hazard if there
was another SENDG before the HALT jump that wasn't done reading out
its payload from the GRF.
In an ideal world this would be avoided by having HALT be a normal
control flow instruction represented as an edge in the control flow
graph -- But unfortunately that would prevent the optimizations we
currently do that take advantage of the ability of reordering code
past the HALT instruction, so it would have a pretty large performance
cost. Instead this simply adds a SYNC.ALLWR instruction after the
HALT target to guarantee that all pending SEND messages have finished
execution -- That may also seem costly, however its cost in practice
appears to be minimal since at the point of the program when the
target HALT is executed there is almost nothing left to do other than
send out the render target write payloads, so any pending operations
had to be waited on at roughly this point of the program regardless.
There appear to be no statistically significant regressions in Traci
on neither BMG nor PTL. Fixes hangs observed on Dying Light 2 and
Cyberpunk on PTL.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13896
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13965
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14092
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37674>
Instead of a bunch of switches which have to match, this introduces a
table which we can use to map bidirectionally from GOBType to
(GOBKindVersion, SectorLayout).
Backport-to: 25.2
Reviewed-by: James Jones <jajones@nvidia.com>
Reviewed-by: Mohamed Ahmed <mohamedahmedegypt2001@gmail.com>
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37824>
Recursion in Rust can be annoyingly expensive. The recursive DFS
implementation we have now is probably not terrible but these things
bloat in debug builds more than you'd expect. This replaceas it with a
non-recursive implementation which uses a Vec instead.
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Reviewed-by: Lorenzo Rossi <git@rossilorenzo.dev>
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37536>
There are four depth first searches in cfg.rs. This adds a DFS
abstraction which we can later optimize.
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Reviewed-by: Lorenzo Rossi <git@rossilorenzo.dev>
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37536>
The proprietary driver seems to always do this before a graphics
semaphore release. This is presumably weaker than a WFI because it's
never emitted for pipeline barriers.
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37857>
tc_reserve_set_vertex_elements_and_buffers_call slots data are only valid
after the call to tc_set_vertex_elements_for_call.
If a batch flush occurs between these 2 calls, random memory will be read
leading to crashes.
The only user of tc_reserve_set_vertex_elements_and_buffers_call being
st_update_array_templ, we can determine that only 2 tc_buffer_unmap calls
can be inserted, so we reserve slots for them.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37763>
Call it before tc_add_set_vertex_elements_and_buffers_call to
make sure that its use from st_setup_current -> st_add_releasebuf
won't insert calls into the batch.
Fixes: 1638d486 ("gallium/u_threaded,st/mesa: add a merged set_vertex_elements_and_buffers call")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37763>
_mesa_glthread_release_upload_buffer will call _mesa_reference_buffer_object
which will then end up in tc_resource_release.
Doing so from glthread is bad because we now have 2 threads concurrently
accessing TC batch: glthread and the unmarshalling/tc thread.
Fix this by queuing the release call through an "Internal" glthread function.
Fixes: b3133e250e ("gallium: add pipe_context::resource_release to eliminate buffer refcounting")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37763>
In release builds, the assertion checking whether spilling failed isn't
evaluated, which can result in shader compilation continuing despite this.
Signed-off-by: Simon Perretta <simon.perretta@imgtec.com>
Reviewed-by: Karmjit Mahil <karmjit.mahil@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37872>
New Vulkan CTS 1.4.4 started requiring glx.pc pkg-config file. Provide
one if GLVND is not used in order to let VK CTS and other programs find
Mesa GLX implementation.
Cc: mesa-stable
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37834>