The is a squash of what in the original MR was "util: Add generic pass
that tries to combine constants" and "intel/fs: Switch to using
util_combine_constants".
The new algorithm uses a multi-pass greedy algorithm that attempts to
collect constants for loading in order of increasing degrees of freedom.
The first pass collects constants that must be emitted as-is (e.g.,
without source modifiers).
The second pass emits all constants that must be emitted (because they
are used in a source field that cannot be a literal constant) but that
can have a source modifier.
The final pass possibly emits constants that may not have to be emitted.
This is used for instructions where one of the fields is allowed to be a
constant. This is not used in the current commit, but future commits
that enable SEL will use this. The SEL instruction can have a single
constant, but when both sources are constant, one of the sources has to
be loaded into a register.
By loading constants in this order, required "choices" made in earlier
passes may be re-used in later passes. This provides a more optimal
result.
At this point in the series, most platforms have the same results with
the new implementation. Gen7 platforms see a significant number of
"small" changes. Due to the coissue optimization on Gen7, each shader
is likely to have most constants affected by constant combining.
If a shader has only a single basic block, constants are packed into
registers in the order produced by the constant combining process.
Since each constant has a different live range in the shader, even
slightly different packing orders can have dramatic effects on the live
range of a register. Even in cases where this does not affect register
pressure in a meaningful way, it can cause the scheduler to make very
different choices about the ordering of instructions.
From my analysis (using the `if (debug) { ... }` block at the end of
fs_visitor::opt_combine_constants), the old implementation and the new
implementation pick the same set of constants, but the order produced
may be slightly different. For the smaller number of values in non-Gfx7
shaders, the orders are similar enough to not matter.
No shader-db or fossil-db changes on any non-Gfx7 platforms.
Haswell and Ivy Bridge had similar results. (Haswell shown)
total cycles in shared programs: 879930036 -> 880001666 (<.01%)
cycles in affected programs: 22485040 -> 22556670 (0.32%)
helped: 1879
HURT: 2309
helped stats (abs) min: 1 max: 6296 x̄: 258.54 x̃: 34
helped stats (rel) min: <.01% max: 54.63% x̄: 3.88% x̃: 0.87%
HURT stats (abs) min: 1 max: 9739 x̄: 241.41 x̃: 40
HURT stats (rel) min: <.01% max: 160.50% x̄: 6.01% x̃: 0.99%
95% mean confidence interval for cycles value: -1.04 35.25
95% mean confidence interval for cycles %-change: 1.23% 1.92%
Inconclusive result (value mean confidence interval includes 0).
LOST: 82
GAINED: 39
Tested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7698>
strace output is only used for debug and its output takes too much
space. Remove it to save resources.
Signed-off-by: Helen Koike <helen.koike@collabora.com>
Reviewed-by: David Heidelberg <david.heidelberg@collabora.com>
Reviewed-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Fixes: 7b51a583ed ("ci/android: add android to the ci")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24913>
Now that we have drm-ci, add --project, so the script can also be used
to linux (and any other projects).
Let the default to "mesa" so it can keep behaving as before when the
option is not given.
Signed-off-by: Helen Koike <helen.koike@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24912>
There is no way we will rematerialize a 40k instruction long chain and
it also won't be beneficial. This improves the replay time if our CP2077
fossil by 350% when compiling only ray tracing pipelines.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24864>
With merged shaders, the long-jump should be emitted inside the
divergent if (ie. only for TCS invocations) and other non TCS
invocations should just end the program.
This fixes a bunch of failures with CTS by forcing TCS epilogs on
RDNA2.
Not sure how RadeonSI will handle that but maybe doing the merged
wave info thing in epilogs would help.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24832>
On commit 29588fe116 we re-enable sync_fd import/export. But only
with the real hw, because at that time there were wrong CTS tests
(that were calling vkSetEvent after submission) that needed to be
fixed.
Since this commit:
717c051d4b
Those tests are fixed. That fix has been on CTS releases for some
time. So we can enable it on the simulator too.
With this change the pattern dEQP-VK.api.external.semaphore.sync_fd*
goes from 2 Passed/10 Not Supported to 12 Passed on the simulator.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24688>
this allows more reordering when the first barrier in a new cmdbuf can
be reordered after previous ordered access exists
KHR-GLES3.copy_tex_image_conversions.required.texture2d_cubemap_negz:
before - ordered 68
after - ordered 16
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24886>
chain->base.present_mode is unset at this point, ie. it's
zero-initialized. VK_PRESENT_MODE_IMMEDIATE_KHR happens to be 0,
so the WSI will attempt to use tearing-control on compositors that
don't support it.
Signed-off-by: Simon Ser <contact@emersion.fr>
Fixes: 5ceba97c2e ("vulkan/wsi/wayland: add support for IMMEDIATE")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24885>
Kernel makes every mapping coherent. If a memory type is truly
incoherent, it's better to remove the host-visible flag than silently
making it coherent. However, for app compatibility purpose, when
coherent-cached memory type is unavailable, we emulate the first cached
memory type with the first coherent memory type.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24875>
Have to set the interlaced field of the surface before
end_frame is called in the pipe codec object so the info
is available to the frontend/va, instead of setting it
directly in end_frame like before.
Fixes: 578e10e157 ("frontends/va: Alloc interlaced surface for interlaced pics")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24909>
Restore transfer box original size after temporal per plane
dimension calculation.
Currently the returned transfer object on Map will have the
size (usually downsampled) of the latest plane instead of
the overall resource size. Then on unmap, when flushing the
changes the received transfer box has the wrong dimensions
and only partial data is flushed.
Fixes: 12a4f2c132 ("frontends/va: Also map VAImageBufferType for reading")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24909>
Fix defect reported by Coverity Scan.
Logically dead code (DEADCODE)
dead_error_line: Execution cannot reach this statement: return VK_ERROR_SURFACE_LOS....
Fixes: fb9f697fbb ("vk/wsi/x11: move surface alpha check from get_caps to creation")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24802>
Fix defect reported by Coverity Scan.
Unused value (UNUSED_VALUE)
assigned_pointer: Assigning value from &nv50->vtxbuf[b] to vb here, but
that stored value is overwritten before it can be used.
Fixes: 7672545223 ("gallium: move vertex stride to CSO")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24801>
This almost certainly intends to call the user-definied assignment
operator here instead of the automatically generated copy constructor.
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24904>
With lavapipe the previous change to only enable the hack when there's
no textures bound doesn't work anymore, since we don't have that
information when using the texture handles.
So add another random state restriction which is sufficient to pass
tests. (This really needs a better long term solution.)
Reviewed-by: Brian Paul <brianp@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24887>