This workflow has been discussed a lot with the team for the past
few years. Let's just clarify it for real in the documentation.
Co-written-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41239>
GPU capture hits bugs if heap sizes are not aligned to at least 16K. Ensuring that
they are is not expected to impact memory usage, since the actual internal
memory allocation already appears to be aligned to 16K; the issue is only with
how the heap reports its size versus the allocation size that capture uses.
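As an illustration of the rounding involved, here is a minimal sketch of aligning a reported heap size up to 16K. The names `HEAP_ALIGNMENT` and `align_up` are hypothetical and are not taken from the driver, which is not written in Python.

```python
HEAP_ALIGNMENT = 16 * 1024  # 16K, the minimum alignment named in the text


def align_up(size: int, alignment: int = HEAP_ALIGNMENT) -> int:
    """Round size up to the next multiple of alignment (a power of two)."""
    if size < 0:
        raise ValueError("size must be non-negative")
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    # Adding alignment - 1 and masking off the low bits rounds upwards.
    return (size + alignment - 1) & ~(alignment - 1)
```

A size that is already a multiple of 16K is returned unchanged, which matches the expectation that the change only affects the reported size.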
Reviewed-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41218>
If a loop has only one break case, Metal appears to re-order it to after
the loop ends, which goes against the expected behavior for reconvergence.
Work around this by putting the break statement into a trivial, always-true
runtime conditional, when maximal reconvergence is requested.
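The shape of the workaround can be sketched as a tiny code emitter. This is illustrative only: `emit_loop_break` and the `always_true` expression are hypothetical names, and the source does not say how the driver obtains a value that is always true at runtime.

```python
def emit_loop_break(maximal_reconvergence: bool,
                    always_true_expr: str = "always_true") -> list[str]:
    """Return the shader source lines to emit for a loop break.

    always_true_expr stands in for some expression that is only known
    to be true at runtime (hypothetical placeholder).
    """
    if maximal_reconvergence:
        # Wrap the break in a trivial runtime conditional.
        return [f"if ({always_true_expr}) {{", "    break;", "}"]
    # Otherwise emit the plain break unchanged.
    return ["break;"]
```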
Fixes dEQP-VK.reconvergence.maximal.compute.nesting*
Reviewed-by: Arcady Goldmints-Orlov <arcady@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41229>
Certain games tend to use rendering patterns that strongly prefer
one mode over the other, and thus we're better off not bothering
with profiling them.
Signed-off-by: Dhruv Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37802>
This introduces a new option that makes autotune optimize for low
preemption latency, which is crucial to ensure responsiveness on
systems with GPU-based composition. With draw-level preemption, a large
enough draw can entirely block the compositor from running. This can
be mitigated by preferring to use GMEM, which breaks up the draw into
smaller pieces and generally has a lower latency for preemption.
As a further mitigation, tiles in GMEM are then divided into smaller
and smaller pieces, which lowers the non-preemptible duration. There
are static checks in place to avoid doing this when it would incur a
cost that is too large.
Uses performance counters read during ambles to detect preemption
latency events while rendering in SYSMEM. This approach is superior
to using RBBM draw time thresholds which could be imprecise as only
the average was calculated rather than true maximum draw time.
However, converting the preemption latency performance counter value
from CP ticks to wall clock is based on the average GPU frequency of
the whole period from the start of the RP until the switch-away amble
while the preemption latency starts counting from the request. Thus, if
the GPU frequency shifts rapidly throughout the RP, it may cause the
estimated wall clock time to be inaccurate, but it should be good enough
in the vast majority of cases.
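The conversion and its source of error can be sketched as follows. `ticks_to_ns` is a hypothetical helper, and the frequencies in the usage note are made-up numbers chosen only to show the effect.

```python
def ticks_to_ns(ticks: int, avg_freq_hz: float) -> float:
    """Convert a tick count to nanoseconds using an average frequency."""
    if avg_freq_hz <= 0:
        raise ValueError("frequency must be positive")
    return ticks * 1e9 / avg_freq_hz
```

For example, 1,000,000 ticks at an average of 500 MHz over the whole RP is estimated as 2 ms. If the GPU actually ran at 800 MHz while the latency counter was counting, the true duration would be 1.25 ms, so the estimate is off by the ratio of the two frequencies.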
Signed-off-by: Dhruv Mark Collins <mark@igalia.com>
Co-authored-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37802>
Tuning these small renderpasses is difficult due to their high
variability across command buffers and low impact on overall performance
in most cases. This change disables autotuning for renderpasses with 5
or fewer draw calls unless the TUNE_SMALL modifier flag is explicitly
set.
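A minimal sketch of the check described above. The names `SMALL_RP_DRAW_THRESHOLD` and `should_autotune` are hypothetical; only the threshold of 5 draw calls and the TUNE_SMALL override come from the text.

```python
SMALL_RP_DRAW_THRESHOLD = 5  # renderpasses with this many draws or fewer are "small"


def should_autotune(draw_count: int, tune_small: bool = False) -> bool:
    """Return True if a renderpass with draw_count draws should be tuned.

    tune_small models the TUNE_SMALL modifier flag being set.
    """
    return tune_small or draw_count > SMALL_RP_DRAW_THRESHOLD
```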
Signed-off-by: Dhruv Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37802>
This algo measures the time taken by each RP as a whole, and uses that
to move a probability distribution of whether to use GMEM or SYSMEM for
that RP. This is done with a delta of 5% per run, and the probability is
clamped to 5% and 95% to avoid getting stuck when conditions change.
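The update rule can be sketched as below. This is not the driver's code: the class name, the 50% starting point, and the choice to compare the most recent timing of each mode are all assumptions made for illustration. Only the 5% step and the 5%/95% clamp come from the text.

```python
import random

STEP = 0.05                 # probability moved per run
P_MIN, P_MAX = 0.05, 0.95   # clamp so neither mode is ever ruled out


class RenderPassTuner:
    def __init__(self) -> None:
        self.p_gmem = 0.5  # assumed starting probability of choosing GMEM
        self.last_time = {"gmem": None, "sysmem": None}

    def choose(self, rng=random) -> str:
        """Pick a mode at random according to the current probability."""
        return "gmem" if rng.random() < self.p_gmem else "sysmem"

    def record(self, mode: str, duration: float) -> None:
        """Store the RP's measured time and nudge the probability."""
        self.last_time[mode] = duration
        gmem, sysmem = self.last_time["gmem"], self.last_time["sysmem"]
        if gmem is None or sysmem is None or gmem == sysmem:
            return  # nothing to compare yet, or a tie
        delta = STEP if gmem < sysmem else -STEP
        self.p_gmem = min(P_MAX, max(P_MIN, self.p_gmem + delta))
```

Because the probability never reaches 0% or 100%, the slower mode is still sampled occasionally, which is what lets the tuner notice when conditions change.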
Additionally, there is an "immediate resolve" variant which takes a
single data point in SYSMEM and in GMEM, then immediately resolves to the
faster path. This is useful in CI, which runs a single frame
multiple times and where the performance doesn't vary from frame to
frame.
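A minimal sketch of the "immediate resolve" idea, assuming one sample per mode is collected in SYSMEM-then-GMEM order. The class name and that ordering are assumptions, not taken from the driver.

```python
class ImmediateResolve:
    def __init__(self) -> None:
        self.samples = {}     # mode -> first measured duration
        self.resolved = None  # set once both modes have one sample

    def choose(self) -> str:
        """Return the mode to use for the next run of this RP."""
        if self.resolved is not None:
            return self.resolved
        return "sysmem" if "sysmem" not in self.samples else "gmem"

    def record(self, mode: str, duration: float) -> None:
        """Store the first sample per mode, then lock in the faster one."""
        if self.resolved is not None:
            return  # already decided; later timings are ignored
        self.samples.setdefault(mode, duration)
        if len(self.samples) == 2:
            self.resolved = min(self.samples, key=self.samples.get)
```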
Signed-off-by: Dhruv Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37802>
The list in the documentation still doesn’t go higher than v10, and it
isn’t clear from that list of GPU IDs which one actually corresponds to
the newer generations, but at least users can test them.
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39564>
Allow adjusting the location of RD dumps and trigger file through the
FD_RD_DUMP_PATH environment variable. When not present, the existing
defaults will be used.
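The fallback behaviour can be sketched like this. `rd_dump_dir` and `DEFAULT_DUMP_DIR` are hypothetical; in particular, the default value shown is a placeholder and not necessarily the driver's actual default.

```python
import os

DEFAULT_DUMP_DIR = "/tmp"  # placeholder default, assumed for illustration


def rd_dump_dir(environ=os.environ) -> str:
    """Return the dump directory, honouring FD_RD_DUMP_PATH when set."""
    return environ.get("FD_RD_DUMP_PATH", DEFAULT_DUMP_DIR)
```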
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40532>
NVIDIA moved the Nsight Graphics docs, and the stale link was being
flagged by linkcheck in sphinx-build.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40340>
Add 'force_robustness' to 'MESA_DEBUG_KK' to force robustness in all
shaders.
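As a rough sketch of how such a flag might be checked, assuming the variable holds a comma-separated list of flag names (a common convention for debug variables, but an assumption here). The helper names are hypothetical.

```python
import os


def kk_debug_flags(environ=os.environ) -> set[str]:
    """Split MESA_DEBUG_KK into a set of flag names (assumed comma-separated)."""
    raw = environ.get("MESA_DEBUG_KK", "")
    return {flag.strip() for flag in raw.split(",") if flag.strip()}


def force_robustness(environ=os.environ) -> bool:
    """Return True if force_robustness is among the debug flags."""
    return "force_robustness" in kk_debug_flags(environ)
```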
Reviewed-by: Arcady Goldmints-Orlov <arcady@lunarg.com>
Signed-off-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38881>
Metal does not seem to respect memory coherency for threads. Workaround 6
enforces device coherency for global loads/stores even if it should not
be needed.
Reviewed-by: Arcady Goldmints-Orlov <arcady@lunarg.com>
Signed-off-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38847>
Metal will prematurely discard fragments with side effects even if those
side effects happen before the discard. Work around this by making said
discards "optional".
Reviewed-by: Arcady Goldmints-Orlov <arcady@lunarg.com>
Signed-off-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38741>
These were accidental when I split up the large article into multiple
documents. Let's fix that up, so we don't end up repeating this for
future documents.
Fixes: 8248cc0bf4 ("docs/panfrost: move details to separate articles")
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38738>
When discarding a fragment in Metal, it will not be demoted to a helper
invocation, at least on Apple Silicon M1 and M2. Call
nir_lower_is_helper_invocation to work around this.
Reviewed-by: Arcady Goldmints-Orlov <arcady@lunarg.com>
Signed-off-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38590>
Netcat locks on to the first connection, so if one tries to use
breadcrumbs again, Netcat will appear as if it didn't receive
anything. Use `-k` so that it accepts another connection.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38419>
Adds build instructions and workarounds documentation.
The workarounds documentation only covers the biggest offenders;
there are probably many more in the code that have yet to be
documented.
Reviewed-by: Arcady Goldmints-Orlov <arcady@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38232>
This description was incorrect in that it implied we supported Hopper
and Blackwell A, which is not currently the case (see nvk_is_conformant
in nvk_physical_device.c).
Fixes: edd0cb6d56 ("docs/nvk: Update hardware support")
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38320>