For:
v_mov_b32_e32 v0, 1.0
exp mrtz v0, off, off, off
we should completely remove the ALU entry before creating the EXP's WaR entry for v0.
Otherwise, the two will be combined into an entry which will wait for
expcnt(0) for later uses of v0.
gen_alu() should also be before gen(), since gen_alu() performs the clear
while gen() creates the WaR entry.
fossil-db (gfx1100):
Totals from 3589 (2.69% of 133428) affected shaders:
Instrs: 5591041 -> 5589047 (-0.04%); split: -0.04%, +0.00%
CodeSize: 28580840 -> 28572864 (-0.03%); split: -0.03%, +0.00%
Latency: 65427923 -> 65427543 (-0.00%); split: -0.00%, +0.00%
InvThroughput: 11109079 -> 11109065 (-0.00%); split: -0.00%, +0.00%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23213>
Unlike \S, \w only matches characters which are valid in identifiers. This
seems to be much faster, especially for longer identifier names.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22636>
Usually the register demand before an instruction would be considered part
of the previous instruction, since it's not greater than the register
demand for that previous instruction. Except, it can be greater in the
case of an definition fixed to a non-killed operand: the RA needs to
reserve space between the two instructions for the definition (containing
a copy of the operand).
fossil-db (navi21):
Totals from 5 (0.00% of 135636) affected shaders:
PreVGPRs: 35 -> 40 (+14.29%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8807
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22446>
C++17 is the project-wide default since f9057cea51 ("fix(FTBFS):
meson: raise C++ standard to C++17"), so let's drop these local
overrides.
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23048>
Now p_wqm always kills its operand, so no movs will be created for it.
Long term we want to remove p_wqm in favor of a Definition flag,
so this is also a step in that direction.
Foz-DB Navi21:
Totals from 45351 (33.63% of 134864) affected shaders:
VGPRs: 2099552 -> 2116192 (+0.79%); split: -0.14%, +0.93%
CodeSize: 179530772 -> 179072104 (-0.26%); split: -0.29%, +0.03%
MaxWaves: 1054740 -> 1052262 (-0.23%); split: +0.10%, -0.33%
Instrs: 33238535 -> 33188347 (-0.15%); split: -0.17%, +0.02%
Latency: 451000471 -> 450869384 (-0.03%); split: -0.11%, +0.08%
InvThroughput: 86026785 -> 86286288 (+0.30%); split: -0.11%, +0.41%
VClause: 633291 -> 623920 (-1.48%); split: -1.91%, +0.43%
SClause: 1436708 -> 1431395 (-0.37%); split: -0.60%, +0.23%
Copies: 2166563 -> 2122592 (-2.03%); split: -2.29%, +0.26%
Branches: 706846 -> 706838 (-0.00%); split: -0.00%, +0.00%
PreSGPRs: 1976162 -> 1976592 (+0.02%)
PreVGPRs: 1797409 -> 1794704 (-0.15%)
MaxWaves regressions in Detroit: Become Human MaxWaves seem to be due
to the scheduler choosing to schedule more aggressively.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22956>
... and expect it to behave like ACCESS_NON_TEMPORAL.
Handling ACCESS_NON_TEMPORAL is sufficient.
This was copied from ac_nir_to_llvm.c.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22770>
Previously, we could safely read/write unused lanes of VMEM/DS
destination VGPRs without waiting for the load to finish. That doesn't
seem to be the case on GFX11.
fossil-db (gfx1100):
Totals from 6698 (4.94% of 135636) affected shaders:
Instrs: 11184274 -> 11199420 (+0.14%); split: -0.00%, +0.14%
CodeSize: 57578344 -> 57638928 (+0.11%); split: -0.00%, +0.11%
Latency: 198348808 -> 198382472 (+0.02%); split: -0.00%, +0.02%
InvThroughput: 24376324 -> 24378439 (+0.01%); split: -0.00%, +0.01%
VClause: 192420 -> 192559 (+0.07%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8722
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8239
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22965>
This can not happen because the post-RA optimizer doesn't support sub dword
writes at the moment, but everytime I look at this I wonder if there might
be a bug here.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22821>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22690>
Legacy GS needs to emit a DONE signal at the end. Do this in NIR
instead of in the backends.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22690>
Contains a global wave ID of legacy GS waves.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22690>
This intrinsic is going to be used for simplifying GS code.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22690>