We can't do the optimization if there are neg/abs in-between.
No pipeline-db changes.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2883>
Using needs_vop3 check was flawed because it would only combine the
literal if the first operand is the literal. If the second or third
operand is the literal, then needs_vop3 will be true and the literal will
not be combined.
pipeline-db (Navi):
Totals from affected shaders:
SGPRS: 782051 -> 782051 (0.00 %)
VGPRS: 630048 -> 630048 (0.00 %)
Spilled SGPRs: 195 -> 195 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Code Size: 54743740 -> 54585548 (-0.29 %) bytes
Max Waves: 67340 -> 67340 (0.00 %)
Instructions: 10182030 -> 10182030 (0.00 %)
pipeline-db (Vega):
Totals from affected shaders:
SGPRS: 701990 -> 699590 (-0.34 %)
VGPRS: 566632 -> 566784 (0.03 %)
Spilled SGPRs: 218 -> 218 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Code Size: 49173564 -> 49007856 (-0.34 %) bytes
Max Waves: 59650 -> 59612 (-0.06 %)
Instructions: 9315135 -> 9293330 (-0.23 %)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2883>
We could create VALU instructions which read two sgprs, but only if isel
created an instruction which already read one of them.
This change is in a separate patch from the apply_sgprs() rewrite so that
it can be tested if the rewrite affected anything.
pipeline-db (Navi):
Totals from affected shaders:
SGPRS: 216 -> 216 (0.00 %)
VGPRS: 64 -> 64 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Code Size: 1756 -> 1708 (-2.73 %) bytes
Max Waves: 120 -> 120 (0.00 %)
Instructions: 312 -> 300 (-3.85 %)
pipeline-db (Vega):
Totals from affected shaders:
SGPRS: 216 -> 216 (0.00 %)
VGPRS: 64 -> 64 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Code Size: 1784 -> 1736 (-2.69 %) bytes
Max Waves: 120 -> 120 (0.00 %)
Instructions: 319 -> 307 (-3.76 %)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2883>
This will be useful when taking advantage of GFX10 features.
No pipeline-db changes.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2883>
This will make it easier to apply two different sgprs (for GFX10) or apply
the same sgpr twice (just remove the break).
No pipeline-db changes.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2883>
Should make taking advantage of GFX10's increased constant bus limit and
VOP3 literals easier.
No pipeline-db changes
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2883>
GFX10 increased the constant bus limit and allowed literals on VOP3
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2883>
This is a more explicit name now that we don't want it to be doing any
memory barrier stuff for us.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>
Right now, it's implemented as a no-op for everyone. For most drivers,
it's a switch case in the NIR -> whatever which just breaks. For ir3,
they already have code to delete tessellation barriers so we just add a
case to also delete memory_barrier_tcs_patch.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>
ds_bpermute_b32/ds_permute_b32 are fine, I think
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: 93c8ebfa78 ('aco: Initial commit of independent AMD compiler')
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3081>
We don't create atomics with definitions if they are not used in NIR, but
our own DCE can remove the uses if an export turns out to be null.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: 93c8ebfa78 ('aco: Initial commit of independent AMD compiler')
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3081>
isel/label_instruction currently doesn't create these but we should
probably check anyway.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3081>
This was fine because a literal used to mean that it didn't use modifiers,
but now VOP3 can take a literal on GFX10.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3081>
Tested on Navi by using dEQP-VK.image.image_size.buffer.* and the GFX8
path with the size multipled by the stride.
dEQP-VK.image.image_size.buffer.* was also run with the tests modified to
use a 96bit format.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: 93c8ebfa78 ('aco: Initial commit of independent AMD compiler')
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3081>
RADV's LLVM backend and radeonsi does the same thing.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Cc: 19.3 <mesa-stable@lists.freedesktop.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3081>
This field is for the primitive ID export to the fragment shader.
Ported from RadeonSI.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Only needed for NGG without passthrough mode or for NGG streamout.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
It can't be enabled for geometry shaders, for NGG streamout and
for vertex shaders that export the primitive ID. NGG passthrough
requires that LDS isn't used.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The output of v_cmp instructions is s1 (a single SGPR) in wave32 mode,
as opposed to s2 (an SGPR-pair) in wave64 mode.
A couple of cases where this should have been fixed were omitted from
the previous patch by mistake.
Fixes: e0bcefc3a0
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Previously, when cluster_size was set to 0, it always worked as if
the cluster size was 64. This commit fixes it in wave32 mode by
changing to work as if the cluster size was set to 32.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This extension allows to use LOD with image read/write operations.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Use image_load_mip and image_store_mip respectively if the lod
parameter isn't zero.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Use image_load_mip and image_store_mip respectively if the lod
parameter isn't zero.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>