The usage after spilling is usually either the same as before or the
maximum.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25559>
This lets us measure optimizations without interference of waitcnt
instructions.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25559>
A somewhat random collection of fossils:
N Min Max Median Avg Stddev
x 6 16.59 16.61 16.605 16.603333 0.0081649658
+ 6 15.99 16 16 15.998333 0.0040824829
Difference at 95.0% confidence
-0.605 +/- 0.00830327
-3.64385% +/- 0.0485573%
(Student's t, pooled s = 0.00645497)
I'm not sure if nir_opt_if and nir_opt_loop_unroll are actually idempotent
or not.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24197>
For example, in the loop:
while (more_late_algebraic) {
more_late_algebraic = false;
NIR_PASS(more_late_algebraic, nir, nir_opt_algebraic_late);
NIR_PASS(_, nir, nir_opt_constant_folding);
NIR_PASS(_, nir, nir_copy_prop);
NIR_PASS(_, nir, nir_opt_dce);
NIR_PASS(_, nir, nir_opt_cse);
}
if nir_opt_algebraic_late makes no progress, later passes might be
skippable depending on which ones made progress in the previous iteration.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24197>
Patch adds a fallback to calculate_tile_dimensions if such case is hit,
this happened when running CTS tests on simulation.
Fixes: d13c81a2c3 ("iris/xehp: Implement TBIMR tile pass setup and pipeline bandwidth estimation.")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25989>
When the pitch or slice pitch isn't properly aligned,
the SDMA HW is unable to copy between tiled images and buffers.
To work around this, we process the image chunk by chunk,
copying the data to a temporary buffer which uses supported
pitches, and then copy it to the intended destination.
The implementation assumes that at least one pixel row of the
image will fit into the temporary buffer, and will try to copy
as many rows at once as possible. Sadly, this still results in
a lot of packets being generated for large images.
A possibe future improvement is to copy the image slice by slice
when only the slice pitch is misaligned. However, that is out
of scope for this commit.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25831>
Some copy operations are poorly supported by the SDMA hardware,
meaning that the built-in packets don't support them, so we will
need to work around that by copying to and from a temporary BO.
The size of the temporary buffer was chosen so that it can fit
at least one full pixel row of the largest possible image.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25831>
Previously, RADV only had a simple implementation of
image to buffer copies using the SDMA for the PRIME copy.
This commit replaces that with a full-featured implementation
that includes buffer to image and image to buffer copies and
removes the assumptions that the PRIME copy had, as well as
adds new helper functions which will be shared with other copy
functions in upcoming commits.
Unaligned buffer/image copies require a workaround, which
will be implemented by a future commit.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25831>
This register seems needed to enable compute shader shader invocations
on GFX7. On GFX8+ it's working fine without emitting this register but
I think it doesn't hurt.
This fixes dEQP-VK.query_pool.statistics_query.*_cq on GFX7.
Fixes: a9945216ba ("radv: fix COMPUTE_SHADER_INVOCATIONS query on compute queue")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25957>
Looks like GFX6 always writes the number of compute shader invocations
at offset 0 when used on compute queue.
This fixes dEQP-VK.query_pool.statistics_query.*_cq on GFX6.
Fixes: a9945216ba ("radv: fix COMPUTE_SHADER_INVOCATIONS query on compute queue")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25957>
This partially reverts 74ab940156 which
was a fix for random GPU hangs with binning on GFX10+. Though,
according to RadeonSI, only GFX10+ is affected and this reduced perf
on GFX9 chips.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25951>
The following sequence is valid (although weird) but many other drivers
(including RADV) were broken:
- bind pipeline with some static state
- set state command for that static state (to a bad value)
- bind the same pipeline again
- draw
Fixes new dEQP-VK.dynamic_state.*.double_static_bind.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25954>
Another case of a game assuming XeSS is available since an
Intel ARC GPU is discovered by the game's executable binary.
With this, a warning will appear that GPU is unstable/not supported,
but a warning is preferable over the game crashing.
No other issues observed upon starting & playing the game.
Signed-off-by: Casey Bowman <casey.g.bowman@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25965>
Removes the link-time dependency on tgsi_get_gl_varying_semantic from
Gallium auxiliary.
ps_prim_id_input linkage removed due to redundancy - the SPI SID is
calculated for VARYING_SLOT_PRIMITIVE_ID on both sides.
Signed-off-by: Vitaliy Triang3l Kuzmin <triang3l@yandex.ru>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25695>
Hopefully this will get us more useful backtraces in CI (for ex, with
traces replay) while maintaining _most_ of the artifact size benefits of
stripping:
-rwxr-xr-x 1 robclark robclark 50M Oct 30 11:47 msm_dri.so.strip-debug
-rwxr-xr-x 1 robclark robclark 40M Oct 30 11:47 msm_dri.so.strip
-rwxr-xr-x 1 robclark robclark 129M Oct 30 11:47 msm_dri.so.orig
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25962>
Changes a variable type from `nir_component_mask_t` to `uint32_t`. The
variable's name suggests it may have been meant to be a 32-bit integer
anyway.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25691>
So far we were packing by hand unnormalized coordinates at the
V3D41_TMU_CONFIG_PARAMETER_1 pack structure. To get this working we
hardcoded V3D_VERSION to 41 at v3dv_uniforms, that works for v71
because the structure are the same. But that is somewhat ugly, and
will not work if a new hw generation have a different structure.
Additionally, we found that for v3d this will be also needed.
So this commit adds a helper on the compiler. For now, and to simplify
it also use just one method for both generations. This solves the
problem of the same code needed on both v3d and v3dv.
But the idea is that in the future we need a similar need, but the
structure different on each generation, it would have used a similar
approach to other generation dependent function calls (like
v3d40_vir_emit_tex), having the implementation on a source file that
can safely include the hw generation headers.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25544>
Currently the run takes ~ 35 minutes, which is too much.
Reported-by: default avatarDaniel Stone <daniels@collabora.com>
Signed-off-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25843>
Most likely was historically used there, but not recently.
Reviewed-by: Martin Roukala <martin.roukala@mupuf.org>
Signed-off-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25843>