Commit graph

104 commits

Author SHA1 Message Date
Samuel Pitoiset
aad5176c58 aco: implement 64-bit nir_op_ftrunc on GFX6
GFX6 doesn't have V_TRUNC_F64, it needs to be lowered. Loosely based
on the AMDGPU LLVM backend.

Introduce a new function because it will be useful for some other
64-bit operations.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3477>
2020-01-23 14:40:34 +01:00
Samuel Pitoiset
36e7a5f5b9 aco: implement nir_intrinsic_global_atomic_* on GFX6
GFX6 doesn't have FLAT instructions, use MUBUF instructions instead.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3477>
2020-01-23 14:40:30 +01:00
Samuel Pitoiset
22d8822683 aco: implement nir_intrinsic_load_global on GFX6
GFX6 doesn't have FLAT instructions, use MUBUF instructions instead.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3477>
2020-01-23 14:40:27 +01:00
Samuel Pitoiset
d6af7571c2 aco: implement nir_intrinsic_store_global on GFX6
GFX6 doesn't have FLAT instructions, use MUBUF instructions instead.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3477>
2020-01-23 14:40:24 +01:00
Samuel Pitoiset
01f0bef71e aco: fix wrong IR in nir_intrinsic_load_barycentric_at_sample
Only GFX6 was affected, my mistake. The total number of SGPR operands
should be 4 when we want to create a vec4.

Fixes: dbdf3b3ef9 ("aco: implement nir_intrinsic_load_barycentric_at_sample on GFX6")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3477>
2020-01-23 14:40:21 +01:00
Samuel Pitoiset
e030aef32c aco: add support for nir_texop_fragment_{mask}_fetch
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3304>
2020-01-23 10:48:02 +00:00
Timur Kristóf
533a20dbd5 aco: Fix maybe-uninitialized warnings.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3483>
2020-01-22 11:09:14 +01:00
Samuel Pitoiset
dbdf3b3ef9 aco: implement nir_intrinsic_load_barycentric_at_sample on GFX6
GFX6 doesn't have FLAT instructions which means we have to emit
a 64-bit MUBUF load.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3432>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3432>
2020-01-20 16:24:55 +00:00
Samuel Pitoiset
fe9157a700 aco: do not use the vec3 variant for loads on GFX6
GFX6 only supports vec3 with load/store format.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3432>
2020-01-20 16:24:55 +00:00
Samuel Pitoiset
1b5bb204d9 aco: do not use the vec3 variant for stores on GFX6
GFX6 only supports vec3 with load/store format.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3432>
2020-01-20 16:24:55 +00:00
Samuel Pitoiset
300f8dec76 aco: implement stream output with vec3 on GFX6
GFX6 doesn't support vec3.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3412>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3412>
2020-01-16 14:06:06 +00:00
Samuel Pitoiset
923005bf54 aco: do not select 96-bit/128-bit variants for ds_read/ds_write on GFX6
Only GFX7 and later support large ds_read/ds_write.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3412>
2020-01-16 14:06:06 +00:00
Timur Kristóf
dfaa3c0af6 aco: Flip s_cbranch / s_cselect to optimize out an s_not if possible.
When possible, get rid of an s_not when all it does is invert the SCC,
and its successor s_cbranch / s_cselect can be inverted instead.

Also modify some parts of instruction_selection to take advantage of
this feature.

Example:
s2: %3900,  s1: %3899:scc = s_andn2_b64 %0:exec, %406
s2: %3902 = s_cselect_b64 -1, 0, %3900:scc
s2: %407,  s1: %3903:scc = s_not_b64 %3902
s2: %3906,  s1: %3905:scc = s_and_b64 %407, %0:exec
p_cbranch_z %3905:scc
Can now be optimized to:
s2: %3900,  s1: %3899:scc = s_andn2_b64 %0:exec, %406
p_cbranch_nz %3900:scc

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2020-01-14 21:21:06 +01:00
Timur Kristóf
338d03090f aco: Allow optimizing vote_all and nir_op_iand.
By adding an extra instruction, we can replace the operands of
the s_cselect_b64, which allows it to get picked up by the
optimizer when it looks for uniform booleans.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2020-01-14 21:21:06 +01:00
Rhys Perry
f92a89a979 aco: improve readfirstlane after uniform LDS loads
Totals from affected shaders:
SGPRS: 976 -> 968 (-0.82 %)
VGPRS: 580 -> 584 (0.69 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Code Size: 106032 -> 103076 (-2.79 %) bytes
Max Waves: 237 -> 237 (0.00 %)
Instructions: 19452 -> 18740 (-3.66 %)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2883>
2020-01-14 12:56:28 +00:00
Daniel Schürmann
05c81875d7 aco: fix unconditional demote_to_helper
This patch fixes an out-of-bounds access on p_exit_early
and binds the exec register to the correct operand.

Fixes: 2ea9e59e8d ('aco: move s_andn2_b64 instructions out of the p_discard_if')
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3347>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3347>
2020-01-13 21:08:41 +00:00
Jason Ekstrand
d3737002ee nir/lower_atomics_to_ssbo: Also lower barriers
This is more correct for a pass which is supposed to completely lower
away atomic counters.  It also lets us stop supporting atomic counter
barriers in most of the drivers.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>
2020-01-13 17:23:47 +00:00
Jason Ekstrand
e40b11bbcb nir: Rename nir_intrinsic_barrier to control_barrier
This is a more explicit name now that we don't want it to be doing any
memory barrier stuff for us.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>
2020-01-13 17:23:47 +00:00
Jason Ekstrand
60097cc840 nir: Add a new memory_barrier_tcs_patch intrinsic
Right now, it's implemented as a no-op for everyone.  For most drivers,
it's a switch case in the NIR -> whatever which just breaks.  For ir3,
they already have code to delete tessellation barriers so we just add a
case to also delete memory_barrier_tcs_patch.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>
2020-01-13 17:23:47 +00:00
Rhys Perry
8f291dc146 aco: set exec_potentially_empty for demotes
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: 93c8ebfa78 ('aco: Initial commit of independent AMD compiler')
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3081>
2020-01-13 13:26:43 +00:00
Rhys Perry
fcd6d83245 aco: fix imageSize()/textureSize() with large buffers on GFX8
Tested on Navi by using dEQP-VK.image.image_size.buffer.* and the GFX8
path with the size multipled by the stride.
dEQP-VK.image.image_size.buffer.* was also run with the tests modified to
use a 96bit format.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: 93c8ebfa78 ('aco: Initial commit of independent AMD compiler')
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3081>
2020-01-13 13:25:32 +00:00
Rhys Perry
49bcd06f97 aco: set vm for pos0 exports on GFX10
RADV's LLVM backend and radeonsi does the same thing.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Cc: 19.3 <mesa-stable@lists.freedesktop.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3081>
2020-01-13 13:25:32 +00:00
Timur Kristóf
44a6b17df7 aco/wave32: Set the definitions of v_cmp instructions to the lane mask.
The output of v_cmp instructions is s1 (a single SGPR) in wave32 mode,
as opposed to s2 (an SGPR-pair) in wave64 mode.
A couple of cases where this should have been fixed were omitted from
the previous patch by mistake.

Fixes: e0bcefc3a0
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2020-01-11 20:15:53 +01:00
Daniel Schürmann
8b7a42d6d0 aco: compact aco::span<T> to use uint16_t offset and size instead of pointer and size_t.
This reduces the size of the Instruction base class
from 40 bytes to 16 bytes. No pipelinedb changes.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3332>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3332>
2020-01-10 17:49:18 +00:00
Daniel Schürmann
ffb4790279 aco: compact various Instruction classes
No pipelinedb changes.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3332>
2020-01-10 17:49:18 +00:00
Samuel Pitoiset
4d49a7ac73 aco: handle nir_intrinsic_image_deref_{load,store} with lod
Use image_load_mip and image_store_mip respectively if the lod
parameter isn't zero.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2020-01-09 07:58:33 +01:00
Timur Kristóf
11e62a9734 aco: Fix uniform i2i64.
Fixes 240 failing test cases in dEQP-VK.spirv_assembly which
were failing due to a bad s_ashr_i32 instruction. This commit
fixes the instruction format along with the definitions of the
instruction.

Fixes: 11f43caaec
Cc: 19.3 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-12-31 14:22:31 +01:00
Karol Herbst
b35e583c17 aco: use NIR_MAX_VEC_COMPONENTS instead of 4
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-12-21 11:00:16 +00:00
Samuel Pitoiset
13b4e9adcf ac: declare an enum for the OOB select field on GFX10
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3147>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3147>
2019-12-19 15:15:32 +01:00
Daniel Schürmann
df3e674fb3 aco: improve readfirstlane after uniform ssbo loads on GFX7
pipeline-db changes for GFX7:

80310 shaders in 40472 tests
Totals:
SGPRS: 3655900 -> 3643916 (-0.33 %)
VGPRS: 2678324 -> 2686324 (0.30 %)
Spilled SGPRs: 1730 -> 1634 (-5.55 %)
Spilled VGPRs: 14 -> 21 (50.00 %)
Scratch size: 15540 -> 15536 (-0.03 %) dwords per thread
Code Size: 136106120 -> 135457616 (-0.48 %) bytes
LDS: 1259 -> 1259 (0.00 %) blocks
Max Waves: 601014 -> 600206 (-0.13 %)

Totals from affected shaders:
SGPRS: 307832 -> 295848 (-3.89 %)
VGPRS: 267864 -> 275864 (2.99 %)
Spilled SGPRs: 770 -> 674 (-12.47 %)
Spilled VGPRs: 14 -> 21 (50.00 %)
Scratch size: 16 -> 12 (-25.00 %) dwords per thread
Code Size: 22007488 -> 21358984 (-2.95 %) bytes
LDS: 65 -> 65 (0.00 %) blocks
Max Waves: 28668 -> 27860 (-2.82 %)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
0837471463 aco: use soffset for MUBUF instructions on SI/CI
pipeline-db changes for GFX7:

80310 shaders in 40472 tests
Totals:
SGPRS: 3655300 -> 3655900 (0.02 %)
VGPRS: 2677732 -> 2678324 (0.02 %)
Spilled SGPRs: 1730 -> 1730 (0.00 %)
Spilled VGPRs: 14 -> 14 (0.00 %)
Scratch size: 15540 -> 15540 (0.00 %) dwords per thread
Code Size: 136488364 -> 136106120 (-0.28 %) bytes
LDS: 1259 -> 1259 (0.00 %) blocks
Max Waves: 601039 -> 601014 (-0.00 %)

Totals from affected shaders:
SGPRS: 316312 -> 316912 (0.19 %)
VGPRS: 273844 -> 274436 (0.22 %)
Spilled SGPRs: 770 -> 770 (0.00 %)
Spilled VGPRs: 14 -> 14 (0.00 %)
Scratch size: 16 -> 16 (0.00 %) dwords per thread
Code Size: 22724904 -> 22342660 (-1.68 %) bytes
LDS: 114 -> 114 (0.00 %) blocks
Max Waves: 30861 -> 30836 (-0.08 %)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
8ad43d8838 aco: flush denorms after fmin/fmax on pre-GFX9
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
1c4afe38f2 aco: implement 64bit ine/ieq for SI/CI
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
1e1356b2ad aco: implement 64bit i2b for SI /CI
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
da7ff58835 aco: make 1/2*PI a literal constant on SI/CI
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
90fad7360d aco: implement 64bit VGPR shifts for SI/CI
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
6a586a6006 aco: split read/writelane opcode into VOP2/VOP3 version for SI/CI
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
3eed4d2be5 aco: implement quad swizzles for SI/CI
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
bde9c1e3a1 aco: move buffer_store data to VGPR if needed
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
a8195bdf2e aco: implement nir_op_isign on SI/CI
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
b8783973cd aco: only use scalar loads for readonly buffers on SI/CI
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
f27783a667 aco: implement nir_op_fquantize2f16 for SI/CI
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
8aab92b393 aco: SI/CI - fix sampler aniso
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Dave Airlie
9b533a2ca3 aco: handle gfx7 int8/10 clamping on exports
Co-authored-by: Daniel Schürmann <daniel@schuermann.dev>

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
3177346bfc aco: refactor visit_store_fs_output() to use the Builder
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Timur Kristóf
637c5a1dd9 aco/wave32: Fix reductions.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-12-04 10:36:01 +00:00
Timur Kristóf
21db083504 aco/wave32: Allow setting the subgroup ballot size to 64-bit.
Previously, it would only work when the ballot size was set to the
lane mask. This patch makes is possible to set the ballot size
to either 32-bit or 64-bit for wave32 mode.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-12-04 10:36:01 +00:00
Timur Kristóf
ed815d503e aco/wave32: Use wave_size for barrier intrinsic.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-12-04 10:36:01 +00:00
Timur Kristóf
b8f2edb452 aco/wave32: Fix load_local_invocation_index to support wave32.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-12-04 10:36:01 +00:00
Timur Kristóf
e0bcefc3a0 aco/wave32: Use lane mask regclass for exec/vcc.
Currently all usages of exec and vcc are hardcoded to use s2 regclass.
This commit makes it possible to use s1 in wave32 mode and
s2 in wave64 mode.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-12-04 10:36:01 +00:00