No matter the format, this should return true if the instruction has an
exec operand.
Otherwise, eliminate_useless_exec_writes_in_block() could remove an exec
write in a block if it's successor begins with:
s2: %3737:s[8-9] = p_parallelcopy %0:exec
s2: %0:exec, s1: %3738:scc = s_wqm_b64 %3737:s[8-9]
Totals from 3 (0.00% of 150170) affected shaders (GFX10.3):
CodeSize: 23184 -> 23204 (+0.09%)
Instrs: 4143 -> 4148 (+0.12%)
Latency: 98379 -> 98382 (+0.00%)
Copies: 172 -> 175 (+1.74%)
Branches: 95 -> 97 (+2.11%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: bc13049747 ("aco: Eliminate useless exec writes in jump threading.")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5620
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13776>
Instead, we can rely on the fact that subdword definitions
must preserve the unused bits while dword definitions either
pad or sign-extend.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12640>
This commit introduces a new struct SubdwordSel
in order to ease and clean up the usage of SDWA
selections. This includes removing the distinction
between register-allocated and fixed SDWA selections.
Instead, SDWA selections can now also access the high
bits of subdword variables. Alignment and sizes are
validated accordingly. Size, offset and sign_extend
can be evaluated via helper methods.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12640>
For example, v_cvt_f64_i32. LLVM doesn't seem to allow this either and it
doesn't seem to work correctly.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3151>
These don't really need the exec mask (and never have), but we haven't
needed to include them in needs_exec_mask yet.
No Fossil DB changes.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10691>
In the future, we might have vertex attribute loads from the same binding
but with different descriptors. Since they will be loading from the same
buffer, we should continue grouping them into clauses.
No fossil-db changes.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7871>
This is used when printing the program and to avoid updating register
demand during post-RA liveness analysis.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10315>
This prints the program with each instruction's contribution to it's
latency and various factors for the calculation of the Inverse Throughput
statistic.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8994>
This reverts commit 1a0b0e8460.
The bounds checking behaviour of ds_read_b64, ds_read_b96 and ds_read_b128
make this feature very difficult to use safely.
This fixes a blocking artifact in Hitman 2. Previously, it contained:
ds_read_b64(local_invocation_index() * 4 - 4)
For local_invocation_index()=0, the second dword would be considered
out-of-bounds, even though it's at offset 0.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9332>
Instead of assuming WGP mode on GFX10+ in different places, add a member
to Program that can be used instead.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8761>
This patch delays the calculation of GPR limits in order to
precisely incorporate extra registers (VCC etc.) and shared VGPRs.
Additionally, the allocation granularity is used to set the config.
This has some effect on the reported SGPR stats.
Totals (Navi10):
SGPRs: 6971787 -> 17753642 (+154.65%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8921>
This also switches the alloc_granule of Tonga and Iceland
to 96, so that the calculation is consistent.
Also changes the granularity for RDNA to 16 to keep
better stats with the upcoming patch.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8921>
Previously, when the instruction had 3 operands, this would cause
possible corruption because of writing to sdwa->sel[2].
This was noticed thanks to GCC 10's -Wstringop-overflow warning.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6436>
Sounds useful for debugging missing wait-states and for improving
detection of the faulty instruction in case of memory violations.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6386>