Commit graph

2571 commits

Author SHA1 Message Date
Rhys Perry
132ae89b19 aco: use nir_lower_idiv_precise
v7: rename _nv50/_llvm to _fast/_precise

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-21 18:49:46 +00:00
Rhys Perry
8b98d0954e nir/lower_idiv: add new llvm-based path
v2: make variable names snake_case
v2: minor cleanups in emit_udiv()
v2: fix Panfrost build failure
v3: use an enum instead of a boolean flag in nir_lower_idiv()'s signature
v4: remove nir_op_urcp
v5: drop nv50 path
v5: rebase
v6: add back nv50 path
v6: add comment for nir_lower_idiv_path enum
v7: rename _nv50/_llvm to _fast/_precise
v8: fix etnaviv build failure

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-21 18:49:46 +00:00
Daniel Schürmann
0e4bd261b1 aco: ensure that uniform booleans are computed in WQM if their uses happen in WQM
This fixes graphical corruption in SC2.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-10-21 17:39:46 +00:00
Timur Kristóf
7e5f87b533 aco/gfx10: Update constant addresses in fix_branches_gfx10.
Due to a bug in GFX10 hardware, s_nop instructions must be added
if a branch is at 0x3f. We already do this, but forgot to also update
the constant addresses that come after this instruction.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-21 14:33:54 +00:00
Timur Kristóf
f380398f8f aco/gfx10: Fix PS exports for SPI_SHADER_32_AR.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-21 14:33:54 +00:00
Timur Kristóf
1749953ea3 aco/gfx10: Wait for pending SMEM stores before loads
Currently if you have an SMEM store followed by an SMEM load that
loads the same location as was written, it won't work because the
store isn't finished before the load is executed. This is NOT
mitigated by an s_nop instruction on GFX10.

Since we currently don't have proper alias analysis, this commit adds
a workaround which will insert an s_waitcnt lgkmcnt(0) before each
SSBO load if they follow a store. We should further refine this in
the future when we can make sure to only add the wait when we load the
same thing as has been stored.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-21 14:33:54 +00:00
Daniel Schürmann
4b458b3e8f aco: don't combine minmax3 if there is a neg or abs modifier in between
This fixes a graphical corruption in HotS.
No pipelinedb changes other than that.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-10-17 16:21:19 +00:00
Rhys Perry
88f1c0a360 aco: emit_split_vector() s_memtime results
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-16 15:31:19 +01:00
Rhys Perry
ded51b13da aco: don't CSE s_memtime
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-16 15:31:19 +01:00
Rhys Perry
d7838152f5 aco: fix scheduling with s_memtime/s_memrealtime
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-16 15:31:19 +01:00
Rhys Perry
f13ad839f1 aco: don't use p_as_uniform for vgpr sampler/image indices
p_as_uniform can get CSE'd, which can be incorrect and break some
dEQP-VK.descriptor_indexing.* tests.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-11 14:26:58 +00:00
Rhys Perry
0c3fe323b6 aco: implement divergent vulkan_resource_index
Fixes the UBO/SSBO dEQP-VK.descriptor_indexing.* tests

v2: remove bld.copy() usage

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-11 14:26:58 +00:00
Rhys Perry
5526a557ee aco: readfirstlane vgpr pointers in convert_pointer_to_64_bit()
This can happen when bcsel is used between the results of two
vulkan_resource_index. It's also probably needed for non-uniform
descriptor indexing

Fixes dEQP-VK.spirv_assembly.instruction.compute.variable_pointers.compute.reads_opselect_two_buffers

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-11 14:26:58 +00:00
Rhys Perry
45d6c69b99 aco: use can_accept_constant in valu_can_accept_literal
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-11 14:26:58 +00:00
Rhys Perry
b37857bcea aco: don't apply sgprs/constants to read/write lane instructions
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-11 14:26:58 +00:00
Rhys Perry
2026ff5165 aco: update print_ir
Mostly adds GFX10 stuff.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
2019-10-10 20:02:36 +00:00
Rhys Perry
283eda71cf aco: rework scratch resource code
Fix compute, cleanup and add GFX10 support.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
2019-10-10 20:02:36 +00:00
Rhys Perry
f64b1a3454 aco/gfx10: disable GFX9 1D texture workarounds
Navi added back support for 1D textures.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
2019-10-10 20:02:36 +00:00
Rhys Perry
de0748c42e aco/gfx10: fix inline uniform blocks
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
2019-10-10 20:02:36 +00:00
Timur Kristóf
aa75be05af aco: Clean up usages of PhysReg::reg from aco_assembler.
These are not needed anymore, since PhyReg has an implicit
conversion operator that can convert it to unsigned int,
which is equivalent to accessing this field.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-10 09:57:53 +02:00
Timur Kristóf
d729d8f1dc aco: Add extra assertion for number of FS input VGPRs.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-10 09:57:53 +02:00
Timur Kristóf
a89153d038 aco: Fix s_dcache_wb on GFX10.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-10 09:57:53 +02:00
Rhys Perry
68c9554732 aco: Have s_waitcnt_vscnt write to NULL.
Not sure if this instruction actually writes anything, but LLVM
disassembles a destination and sets it to NULL.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-10 09:57:53 +02:00
Rhys Perry
619f0a71cc aco: Use the VOP3-only add/sub GFX10 instructions if needed.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-10 09:57:53 +02:00
Rhys Perry
6a6bef59b0 aco: Initial work to avoid GFX10 hazards.
Currently just breaks up SMEM groups and fixes
FeatureVMEMtoScalarWriteHazard (name from LLVM).

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-10 09:57:53 +02:00
Rhys Perry
d63c175897 aco: pad code with s_code_end on GFX10
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-10 09:57:53 +02:00
Rhys Perry
83993f535e aco: workaround GFX10 0x3f branch bug
According to LLVM, branches with an offset of 0x3f are buggy.

v2: (by Timur Kristóf)
- extract the GFX10 specific part to its own function

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-10 09:57:53 +02:00
Timur Kristóf
0be1dd8564 aco: Fix VS input VGPRs on GFX10.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-10 09:57:53 +02:00
Rhys Perry
c24cd97515 aco: Assemble opsel in VOP3 instructions.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-10 09:57:53 +02:00
Rhys Perry
818bdab796 aco: Allow literals on VOP3 instructions.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
2019-10-10 09:57:53 +02:00
Timur Kristóf
7cf1dcf22d aco: Support subvector loops in aco_assembler.
These are currently not used, but could be useful later.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-10 09:57:53 +02:00
Timur Kristóf
21f1953383 aco: Set GFX10 dimensionality on the instructions that need it.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-10 09:57:53 +02:00
Timur Kristóf
eaa2a7cdf6 aco: Use ac_get_sampler_dim, delete duplicate code.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-10 09:57:53 +02:00
Timur Kristóf
1de9ef9c96 aco: Set GFX10 DLC bit properly.
The DLC bit is now set to 1 for all loads when GLC is also set,
but cleared to 0 for all stores (otherwise it causes issues),
and also cleared to 0 for atomics.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-10 09:57:53 +02:00
Timur Kristóf
89b074be86 aco: Support GFX10 VOP3 and VOP1 as VOP3 in aco_assembler.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-10 09:57:53 +02:00
Timur Kristóf
d3a48c272f aco: Support GFX10 EXP in aco_assembler.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-10 09:57:53 +02:00
Timur Kristóf
e6330d71b5 aco: Fix GFX9 FLAT, SCRATCH, GLOBAL instructions, add GFX10 support.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-10 09:57:53 +02:00
Timur Kristóf
64d74ca816 aco: Support GFX10 MIMG and GFX9 D16 in aco_assembler.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-10 09:57:53 +02:00
Timur Kristóf
c0df15e645 aco: Support GFX10 MTBUF in aco_assembler.
Also remove img_format from aco_ir, since it can be calculated
from dfmt and nfmt. So only the assember needs to deal with it.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-10 09:57:53 +02:00
Timur Kristóf
e96124bd65 aco: Link ACO with amd/common.
We'd like to use some functions, for example some
ac_shader_util functions in ACO, so we need to link
ACO to AC.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-10 09:57:52 +02:00
Timur Kristóf
9e27816252 aco: Support GFX10 MUBUF in aco_assembler.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-10 09:57:52 +02:00
Timur Kristóf
6106d4bce9 aco: Support GFX10 DS in aco_assembler.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-10 09:57:52 +02:00
Timur Kristóf
bbe87eb6c3 aco: Support GFX10 VINTRP in aco_assembler.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-10 09:57:52 +02:00
Timur Kristóf
b6235651b9 aco: Support GFX10 SMEM in aco_assembler.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-10 09:57:52 +02:00
Timur Kristóf
fd1d947457 aco: Add missing GFX10 specific fields and some README notes.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-10 09:57:52 +02:00
Timur Kristóf
a01d796de4 aco: Set +wavefrontsize64 for LLVM disassembler in GFX10 wave64 mode.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-10 09:57:52 +02:00
Rhys Perry
3f6e91a8d8 aco: enable nir_opt_sink
SGPRS: 880272 -> 838936 (-4.70 %)
VGPRS: 705316 -> 680988 (-3.45 %)
Spilled SGPRs: 1032 -> 832 (-19.38 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 252 -> 252 (0.00 %) dwords per thread
Code Size: 55150788 -> 55172436 (0.04 %) bytes
LDS: 451 -> 451 (0.00 %) blocks
Max Waves: 66178 -> 68706 (3.82 %)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-09 17:55:25 +00:00
Rhys Perry
2ea9e59e8d aco: move s_andn2_b64 instructions out of the p_discard_if
And use a new p_discard_early_exit instruction. This fixes some cases
where a definition having the same register as an operand causes issues.

v2: rename instruction to p_exit_early_if
v2: modify the existing instruction instead of creating a new one
v3: merge the "i == num - 1" IFs

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-09 16:19:02 +00:00
Daniel Schürmann
f584c42707 aco: don't reorder instructions in order to lower boolean phis
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-10-09 17:50:23 +02:00
Daniel Schürmann
10be90671f aco: re-use existing phi instruction when lowering boolean phis
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-10-09 17:50:23 +02:00