Commit graph

845 commits

Author SHA1 Message Date
Rhys Perry
c240c1aecf aco: apply literals to split mads
Removing the return is also needed to apply literals to mads (which can be
done on GFX10).

pipeline-db (Navi):
Totals from affected shaders:
SGPRS: 368787 -> 367555 (-0.33 %)
VGPRS: 312436 -> 312448 (0.00 %)
Spilled SGPRs: 461 -> 461 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Code Size: 26113388 -> 26098260 (-0.06 %) bytes
Max Waves: 35982 -> 35982 (0.00 %)
Instructions: 5038670 -> 5028941 (-0.19 %)

pipeline-db (Vega):
Totals from affected shaders:
SGPRS: 369843 -> 368659 (-0.32 %)
VGPRS: 317224 -> 317196 (-0.01 %)
Spilled SGPRs: 629 -> 629 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Code Size: 26310540 -> 26295156 (-0.06 %) bytes
Max Waves: 36324 -> 36326 (0.01 %)
Instructions: 5073957 -> 5064164 (-0.19 %)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2883>
2020-01-14 12:56:28 +00:00
Rhys Perry
8f10e48745 aco: update IR validator
GFX10 increased the constant bus limit and allowed literals on VOP3

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2883>
2020-01-14 12:56:27 +00:00
Daniel Schürmann
05c81875d7 aco: fix unconditional demote_to_helper
This patch fixes an out-of-bounds access on p_exit_early
and binds the exec register to the correct operand.

Fixes: 2ea9e59e8d ('aco: move s_andn2_b64 instructions out of the p_discard_if')
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3347>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3347>
2020-01-13 21:08:41 +00:00
Jason Ekstrand
d3737002ee nir/lower_atomics_to_ssbo: Also lower barriers
This is more correct for a pass which is supposed to completely lower
away atomic counters.  It also lets us stop supporting atomic counter
barriers in most of the drivers.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>
2020-01-13 17:23:47 +00:00
Jason Ekstrand
e40b11bbcb nir: Rename nir_intrinsic_barrier to control_barrier
This is a more explicit name now that we don't want it to be doing any
memory barrier stuff for us.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>
2020-01-13 17:23:47 +00:00
Jason Ekstrand
60097cc840 nir: Add a new memory_barrier_tcs_patch intrinsic
Right now, it's implemented as a no-op for everyone.  For most drivers,
it's a switch case in the NIR -> whatever which just breaks.  For ir3,
they already have code to delete tessellation barriers so we just add a
case to also delete memory_barrier_tcs_patch.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>
2020-01-13 17:23:47 +00:00
Rhys Perry
809c8feb92 aco: check if multiplication/clamp is live when applying output modifier
It's possible that a multiplication/clamp is dead code and the single use
is from a different user.

Fixes portal rendering in Path of Exile when global illumination is
enabled.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Fixes: 93c8ebfa78 ('aco: Initial commit of independent AMD compiler')
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3081>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3081>
2020-01-13 13:26:43 +00:00
Rhys Perry
ef8abfa790 aco: disable add combining for ds_swizzle_b32
ds_bpermute_b32/ds_permute_b32 are fine, I think

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: 93c8ebfa78 ('aco: Initial commit of independent AMD compiler')
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3081>
2020-01-13 13:26:43 +00:00
Rhys Perry
69bed1c918 aco: don't DCE atomics with return values
We don't create atomics with definitions if they are not used in NIR, but
our own DCE can remove the uses if an export turns out to be null.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: 93c8ebfa78 ('aco: Initial commit of independent AMD compiler')
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3081>
2020-01-13 13:26:43 +00:00
Rhys Perry
8f291dc146 aco: set exec_potentially_empty for demotes
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: 93c8ebfa78 ('aco: Initial commit of independent AMD compiler')
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3081>
2020-01-13 13:26:43 +00:00
Rhys Perry
21eafe30df aco: better handle neg/abs of sgprs
isel/label_instruction currently doesn't create these but we should
probably check anyway.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3081>
2020-01-13 13:26:43 +00:00
Rhys Perry
f29a5a205c aco: check usesModifiers() when identifying a neg/abs
This was fine because a literal used to mean that it didn't use modifiers,
but now VOP3 can take a literal on GFX10.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3081>
2020-01-13 13:26:43 +00:00
Rhys Perry
46fb341b8d aco: handle omod successors with the constant in the first operand
No pipeline-db changes

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3081>
2020-01-13 13:26:43 +00:00
Rhys Perry
7ce244b7d1 aco: handle VOP3 modifiers when combining a constant comparison's NaN test
No pipeline-db changes

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3081>
2020-01-13 13:26:43 +00:00
Rhys Perry
bbac52873f aco: fix uninitialized data in the binary
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Fixes: 93c8ebfa78 ('aco: Initial commit of independent AMD compiler')
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3081>
2020-01-13 13:25:32 +00:00
Rhys Perry
fcd6d83245 aco: fix imageSize()/textureSize() with large buffers on GFX8
Tested on Navi by using dEQP-VK.image.image_size.buffer.* and the GFX8
path with the size multipled by the stride.
dEQP-VK.image.image_size.buffer.* was also run with the tests modified to
use a 96bit format.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: 93c8ebfa78 ('aco: Initial commit of independent AMD compiler')
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3081>
2020-01-13 13:25:32 +00:00
Rhys Perry
49bcd06f97 aco: set vm for pos0 exports on GFX10
RADV's LLVM backend and radeonsi does the same thing.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Cc: 19.3 <mesa-stable@lists.freedesktop.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3081>
2020-01-13 13:25:32 +00:00
Timur Kristóf
44a6b17df7 aco/wave32: Set the definitions of v_cmp instructions to the lane mask.
The output of v_cmp instructions is s1 (a single SGPR) in wave32 mode,
as opposed to s2 (an SGPR-pair) in wave64 mode.
A couple of cases where this should have been fixed were omitted from
the previous patch by mistake.

Fixes: e0bcefc3a0
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2020-01-11 20:15:53 +01:00
Daniel Schürmann
8b7a42d6d0 aco: compact aco::span<T> to use uint16_t offset and size instead of pointer and size_t.
This reduces the size of the Instruction base class
from 40 bytes to 16 bytes. No pipelinedb changes.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3332>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3332>
2020-01-10 17:49:18 +00:00
Daniel Schürmann
ffb4790279 aco: compact various Instruction classes
No pipelinedb changes.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3332>
2020-01-10 17:49:18 +00:00
Rhys Perry
b5c9688516 aco: limit register usage for large work groups
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2020-01-10 12:10:37 +00:00
Samuel Pitoiset
4d49a7ac73 aco: handle nir_intrinsic_image_deref_{load,store} with lod
Use image_load_mip and image_store_mip respectively if the lod
parameter isn't zero.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2020-01-09 07:58:33 +01:00
Timur Kristóf
11e62a9734 aco: Fix uniform i2i64.
Fixes 240 failing test cases in dEQP-VK.spirv_assembly which
were failing due to a bad s_ashr_i32 instruction. This commit
fixes the instruction format along with the definitions of the
instruction.

Fixes: 11f43caaec
Cc: 19.3 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-12-31 14:22:31 +01:00
Eric Engestrom
51569e525a amd: fix empty-body issues
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Fixes: 8d43e2b2de ("meson: add -Werror=empty-body to disallow `if(x);`")
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
2019-12-27 22:09:00 +00:00
Rhys Perry
afe1a8ff5b aco: fix vgpr alloc granule with wave32
We still need to increase the number of physical vgprs

Totals from affected shaders:
SGPRS: 671976 -> 675288 (0.49 %)
VGPRS: 550112 -> 562596 (2.27 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Code Size: 27621660 -> 27606532 (-0.05 %) bytes
Max Waves: 81083 -> 87833 (8.32 %)
Instructions: 5391560 -> 5389031 (-0.05 %)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-12-21 12:38:42 +01:00
Rhys Perry
01ccd7839c aco: improve jump threading with wave32
Totals from affected shaders:
SGPRS: 748746 -> 748746 (0.00 %)
VGPRS: 636984 -> 636984 (0.00 %)
Spilled SGPRs: 387 -> 387 (0.00 %)
Spilled VGPRs: 15 -> 15 (0.00 %)
Code Size: 61138824 -> 60928620 (-0.34 %) bytes
Max Waves: 48602 -> 48602 (0.00 %)
Instructions: 11967660 -> 11915084 (-0.44 %)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-12-21 12:38:42 +01:00
Rhys Perry
6ff92f3d68 aco/wave32: fix comparison optimizations
Previously, they weren't done in wave32.

Totals from affected shaders:
SGPRS: 507726 -> 508006 (0.06 %)
VGPRS: 450340 -> 450268 (-0.02 %)
Spilled SGPRs: 298 -> 298 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Code Size: 39689708 -> 39384488 (-0.77 %) bytes
Max Waves: 39631 -> 39636 (0.01 %)
Instructions: 7865919 -> 7793650 (-0.92 %)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-12-21 12:38:42 +01:00
Karol Herbst
b35e583c17 aco: use NIR_MAX_VEC_COMPONENTS instead of 4
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-12-21 11:00:16 +00:00
Samuel Pitoiset
13b4e9adcf ac: declare an enum for the OOB select field on GFX10
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3147>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3147>
2019-12-19 15:15:32 +01:00
Daniel Schürmann
8259c97b2d aco: propagate temporaries into expanded vectors
Gives a very slight decrease in code size:
Totals from affected shaders:
Code Size: 1708488 -> 1702768 (-0.33 %) bytes
Max Waves: 2858 -> 2855 (-0.10 %)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
df3e674fb3 aco: improve readfirstlane after uniform ssbo loads on GFX7
pipeline-db changes for GFX7:

80310 shaders in 40472 tests
Totals:
SGPRS: 3655900 -> 3643916 (-0.33 %)
VGPRS: 2678324 -> 2686324 (0.30 %)
Spilled SGPRs: 1730 -> 1634 (-5.55 %)
Spilled VGPRs: 14 -> 21 (50.00 %)
Scratch size: 15540 -> 15536 (-0.03 %) dwords per thread
Code Size: 136106120 -> 135457616 (-0.48 %) bytes
LDS: 1259 -> 1259 (0.00 %) blocks
Max Waves: 601014 -> 600206 (-0.13 %)

Totals from affected shaders:
SGPRS: 307832 -> 295848 (-3.89 %)
VGPRS: 267864 -> 275864 (2.99 %)
Spilled SGPRs: 770 -> 674 (-12.47 %)
Spilled VGPRs: 14 -> 21 (50.00 %)
Scratch size: 16 -> 12 (-25.00 %) dwords per thread
Code Size: 22007488 -> 21358984 (-2.95 %) bytes
LDS: 65 -> 65 (0.00 %) blocks
Max Waves: 28668 -> 27860 (-2.82 %)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
0837471463 aco: use soffset for MUBUF instructions on SI/CI
pipeline-db changes for GFX7:

80310 shaders in 40472 tests
Totals:
SGPRS: 3655300 -> 3655900 (0.02 %)
VGPRS: 2677732 -> 2678324 (0.02 %)
Spilled SGPRs: 1730 -> 1730 (0.00 %)
Spilled VGPRs: 14 -> 14 (0.00 %)
Scratch size: 15540 -> 15540 (0.00 %) dwords per thread
Code Size: 136488364 -> 136106120 (-0.28 %) bytes
LDS: 1259 -> 1259 (0.00 %) blocks
Max Waves: 601039 -> 601014 (-0.00 %)

Totals from affected shaders:
SGPRS: 316312 -> 316912 (0.19 %)
VGPRS: 273844 -> 274436 (0.22 %)
Spilled SGPRs: 770 -> 770 (0.00 %)
Spilled VGPRs: 14 -> 14 (0.00 %)
Scratch size: 16 -> 16 (0.00 %) dwords per thread
Code Size: 22724904 -> 22342660 (-1.68 %) bytes
LDS: 114 -> 114 (0.00 %) blocks
Max Waves: 30861 -> 30836 (-0.08 %)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
28c95cc402 aco: return to loop_active mask at continue_or_break blocks
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
746165e540 aco: implement exclusive scan for SI/CI
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
7ae227effd aco: implement inclusive_scan for SI/CI
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
f895a8b1df aco: implement (clustered) reductions for SI/CI
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
9254fb4fc7 aco: don't use a scalar temporary for reductions on GFX10
This patch also adds the scalar temporary for scans on SI/CI

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
8ad43d8838 aco: flush denorms after fmin/fmax on pre-GFX9
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
79ce6c1b33 aco: disable disassembly for SI/CI due to lack of support by LLVM
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
1c4afe38f2 aco: implement 64bit ine/ieq for SI/CI
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
1e1356b2ad aco: implement 64bit i2b for SI /CI
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
da7ff58835 aco: make 1/2*PI a literal constant on SI/CI
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
90fad7360d aco: implement 64bit VGPR shifts for SI/CI
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
6a586a6006 aco: split read/writelane opcode into VOP2/VOP3 version for SI/CI
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
23319add93 aco: fix disassembly of writelane instructions.
ACO writes an unused 3rd operand for internal usage
which makes LLVM recoginize it as illegal instruction.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
6fc9ddfef8 aco: recognize SI/CI SMRD hazards
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
3eed4d2be5 aco: implement quad swizzles for SI/CI
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
bde9c1e3a1 aco: move buffer_store data to VGPR if needed
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
a8195bdf2e aco: implement nir_op_isign on SI/CI
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
b8783973cd aco: only use scalar loads for readonly buffers on SI/CI
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00