This commit completely rewrites the way we extract a structured CFG from
SPIR-V. The new approach is different in a few ways:
1. It does a breadth-first search instead of depth-first. This means
that we've visited the merge node for a construct before we visit
any of the nodes inside the construct. This makes it easier to
validate things like loop and switch nesting.
2. We record more information in the CFG. Earlier commits added a
parent pointer to vtn_cf_node but we now record all of the merge and
other special blocks for each CFG node. This lets us validate
things more precisely.
3. It makes heavy use of merge blocks for walking the CFG. Previously,
we sort of used them as hints for trying to guess the CFG structure
but things got dicey whenever a merge was missing. We had some
heuristics for how to handle short-circuiting if statements but it
was a bunch of special cases.
Now, we make them a fundamental part of walking the CFG. When we
encounter a control-flow construct, we add the body components of
the construct to the BFS work list and then jump to the merge block
if one exists to continue scanning the current CFG nesting level.
If no merge block exists, we assume that means that control-flow
never re-converges in a normal way and that the only way to get back
to normality is with a direct jump such as a loop break or continue.
This should make things far more robust when trying to deal with the
more creative placement (or lack thereof) of merge instructions.
Reviewed-by: Alan Baker <alanbaker@google.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3820>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3820>
This commit enables the I/O vectorization pass that was originally
written for ACO for Intel drivers. We enable it for UBOs, SSBOs, global
memory, and SLM. We only enable vectorization for the scalar back-end
because it vec4 makes certain alignment assumptions.
Shader-db results with iris on ICL:
total instructions in shared programs: 16077927 -> 16068236 (-0.06%)
instructions in affected programs: 199839 -> 190148 (-4.85%)
helped: 324
HURT: 0
helped stats (abs) min: 2 max: 458 x̄: 29.91 x̃: 4
helped stats (rel) min: 0.11% max: 38.94% x̄: 4.32% x̃: 1.64%
95% mean confidence interval for instructions value: -37.02 -22.80
95% mean confidence interval for instructions %-change: -5.07% -3.58%
Instructions are helped.
total cycles in shared programs: 336806135 -> 336151501 (-0.19%)
cycles in affected programs: 16009735 -> 15355101 (-4.09%)
helped: 458
HURT: 154
helped stats (abs) min: 1 max: 77812 x̄: 1542.50 x̃: 75
helped stats (rel) min: <.01% max: 34.46% x̄: 5.16% x̃: 2.01%
HURT stats (abs) min: 1 max: 22800 x̄: 336.55 x̃: 20
HURT stats (rel) min: <.01% max: 17.11% x̄: 2.12% x̃: 1.00%
95% mean confidence interval for cycles value: -1596.83 -542.49
95% mean confidence interval for cycles %-change: -3.83% -2.82%
Cycles are helped.
total sends in shared programs: 814177 -> 809049 (-0.63%)
sends in affected programs: 15422 -> 10294 (-33.25%)
helped: 324
HURT: 0
helped stats (abs) min: 1 max: 256 x̄: 15.83 x̃: 2
helped stats (rel) min: 1.33% max: 67.90% x̄: 21.21% x̃: 15.38%
95% mean confidence interval for sends value: -19.67 -11.98
95% mean confidence interval for sends %-change: -23.03% -19.39%
Sends are helped.
LOST: 7
GAINED: 2
Most of the helped shaders were in the following titles:
- Doom
- Deus Ex: Mankind Divided
- Aztec Ruins
- Shadow of Mordor
- DiRT Showdown
- Tomb Raider (Rise, I think)
Five of the lost programs are SIMD16 shaders we lost from dirt showdown.
The other two are compute shaders in Aztec Ruins which switched from
SIMD8 to SIMD16.
Vulkan pipeline-db stats on ICL:
Instructions in all programs: 296780486 -> 293493363 (-1.1%)
Loops in all programs: 149669 -> 149669 (+0.0%)
Cycles in all programs: 90999206722 -> 88513844563 (-2.7%)
Spills in all programs: 1710217 -> 1730691 (+1.2%)
Fills in all programs: 1931235 -> 1958138 (+1.4%)
By far the most help was in the Tomb Raider games. A couple of Batman
games with DXVK were also helped. In Shadow of the Tomb Raider:
Instructions in all programs: 41614336 -> 39408023 (-5.3%)
Loops in all programs: 32200 -> 32200 (+0.0%)
Cycles in all programs: 1875498485 -> 1667034831 (-11.1%)
Spills in all programs: 196307 -> 214945 (+9.5%)
Fills in all programs: 282736 -> 307113 (+8.6%)
Benchmarks of real games I've done on this patch:
- Rise of the Tomb Raider: +3%
- Shadow of the Tomb Raider: +10%
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4367>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4367>
These were clearly copied and pasted from SSBOs. The shared atomics
don't have an SSBO index so their offset is src0 and data is src1.
Fixes: ce9205c03b "nir: add a load/store vectorization pass"
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4367>
We're about to do load/store vectorization right before this but we need
that to happen after we've done a round of optimization. Otherwise,
we'll be getting unoptimized NIR in from ANV and the vectorizer won't be
able to do anything with it.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4367>
This commit makes us take both bit size and alignment into account so
that we can properly handle cases such as when we have a 32-bit store
to an 8-bit-aligned address.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4367>
Thanks to the NIR vectorizing pass, we're about to see alignments that
are higher than the bit size. Previously, we could use either and we
just happened to choose alignment (probably the wrong choice) so it's
harmless to switch to detecting based on bit size. This commit changes
things to take both into account which is more accurate to what the
messages we're using do. We also beef up the asserts and make them more
consistent, more accurate, and more complete.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4367>
Immediate offsets are currently collapsed for ldlw, but ldlw does
behave correctly with immediate values. For example,
`ldlw.u32 r0.x, l[4], 1` actually means to use the value of
regid 4 (r1.x) as the offset when we actually want it to use the
imm value of 4 as the offset.
This commit disables copy prop for ldlw offsets so the same
intrinsic gets compiled to:
mov.u32u32 r0.y, 0x00000004
ldlw.u32 r0.x, l[r0.y], 1
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4439>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4439>
Previously, turnip advertised 4-bit subpixel precision when in
practice, a6xx seems to render with 8-bit precision. This caused
dEQP-VK.renderpass2.suballocation.subpass_dependencies.late_fragment_tests.*
to fail because they compare images rendered with turnip against
ones rendered via a software reference implementation parameterized
by turnip's VkPhysicalDeviceLimits.subPixelPrecisionBits value.
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4172>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4172>
This isn't perfect (for example, changes might not be too meaningful when
comparing shaders with different control flow) but it should be useful for
evaluating scheduler changes.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2965>
Adds these statistics:
- hash of code and constant data
- number of instructions
- number of copies from pseudo-instructions
- number of branches
- estimate of cycles spent not waiting in s_waitcnt
- number of vmem/smem "clauses"
- sgpr/vgpr usage before scheduling
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2965>
Statistics will be added to ACO in later commits.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2965>
This was missed in the original conversion, which added support for
eglSetDamageRegionKHR to local EGL exports, but forgot to generate
updated dispatch for GLVND.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Fixes: 9827547313 ("egl/android: support for EGL_KHR_partial_update")
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4403>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4403>
Due to possible alignment issues, make sure to split stores of
16-bit vectors.
Doom Eternal requires storageBuffer16BitAccess.
Cc: 20.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4339>
Due to possible alignment issues, make sure to split loads/stores
of 16-bit vectors.
Doom Eternal requires storageBuffer16BitAccess.
Cc: 20.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4339>
Due to possible alignment issues, make sure to split stores of
8-bit vectors.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4339>
Due to possible alignment issues, make sure to split loads/stores
of 8-bit vectors.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4339>
v_mad and v_madak are both 64-bit instructions, so it doesn't
increase code size to always apply a 32-bit literal instead of
using v_mad and a sgpr which contains that literal.
Found with some Youngblood shaders but help some other games.
vkpipeline-db (VEGA10):
Totals from affected shaders:
SGPRS: 46168 -> 46016 (-0.33 %)
VGPRS: 45576 -> 45564 (-0.03 %)
Code Size: 5187208 -> 5179584 (-0.15 %) bytes
Max Waves: 3297 -> 3297 (0.00 %)
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4410>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4410>
The use of vector.end()[-1] seems to generate warnings in Coverity about
not allowing a negative argument to a parameter. The intention with the
code snippet is just to access the last element of the vector. The
vector.back() call acheives the same thing, is clearer and will
hopefully fix the Coverity warning.
I’m not exactly sure why Coverity thinks the array index can’t be
negative. cplusplus.com says that vector::end() returns a random access
iterator and that the type of the array index operator argument to that
should be the difference type for the container. It then also says that
difference_type for a vector is "a signed integral type".
Reviewed-by: Eric Anholt <eric@anholt.net>