Commit graph

4340 commits

Author SHA1 Message Date
Timur Kristóf
e0bcefc3a0 aco/wave32: Use lane mask regclass for exec/vcc.
Currently all usages of exec and vcc are hardcoded to use s2 regclass.
This commit makes it possible to use s1 in wave32 mode and
s2 in wave64 mode.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-12-04 10:36:01 +00:00
Timur Kristóf
b4efe179ed aco/wave32: Add wave size specific opcodes to aco_builder.
Several places in ACO we use SOP1 or SOP2 instructions to operate over the
exec mask or VCC, and these need to be adapted to the new size in wave32
mode.

This commit adds a way to deal with this problem in aco_builder: the caller
can specify a wave size specific opcode and the builder will translate that
to the correct opcode based on the current wave size.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-12-04 10:36:01 +00:00
Timur Kristóf
c44af6cbc7 aco/wave32: Introduce emit_mbcnt which takes wave size into account.
This is relevant because in wave32 mode the v_mbcnt_hi_u32_b32
instruction is superfluous.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-12-04 10:36:01 +00:00
Timur Kristóf
07754a9c9e aco/wave32: Replace hardcoded numbers in spiller with wave size.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-12-04 10:36:01 +00:00
Timur Kristóf
c0dbf42a03 aco/wave32: Change uniform bool optimization to work with wave32.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-12-04 10:36:01 +00:00
Timur Kristóf
dd9dad731b aco: Optimize load_subgroup_id to one bit field extract instruction.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-12-04 10:36:01 +00:00
Timur Kristóf
753670e902 aco: Remove lower_linear_bool_phi, it is not needed anymore.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-12-04 10:36:01 +00:00
Timur Kristóf
0d2d672020 aco: Remove superfluous argument from emit_boolean_logic.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-12-04 10:36:01 +00:00
Timur Kristóf
9a43d26b74 aco: Fix operand of s_bcnt1_i32_b64 in emit_boolean_reduce.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-12-04 10:36:01 +00:00
Samuel Pitoiset
4cacba0c86 radv/gfx10: fix the vertex order for triangle strips emitted by a GS
My fix wasn't totally correct as pointed out by Marek.
Ported from RadeonSI.

Fixes: deafe4cc58 ("radv/gfx10: fix primitive indices orientation for NGG GS")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-12-04 08:28:57 +01:00
Samuel Pitoiset
dac6bd29ae radv: simplify a check in radv_fixup_vertex_input_fetches()
The number of loaded channels should always be > 0 now.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-12-04 08:04:05 +01:00
Samuel Pitoiset
3b51259f06 radv: remove dead shader input/output variables
No pipeline-db changes.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-12-04 08:04:05 +01:00
Samuel Pitoiset
f63a3132e8 ac/llvm: fix atomic var operations if source isn't a deref
Fixes some CTS regressions.

Fixes: e61a826f39 ("ac/llvm: fix pointer type for global atomics")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-12-03 09:41:33 +01:00
Rhys Perry
01cacdb71e aco: fix block_kind_discard s_andn2 definition to exec
Improves generated code of dEQP-VK.graphicsfuzz.disc-and-add-in-func-in-loop
because a loop exit phi can then be fixed to exec, removing copies and
improving jump threading.

No pipeline-db changes.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-12-02 16:56:24 +00:00
Rhys Perry
0e8da9f607 aco: handle loop exit and IF merge phis with break/discard
ACO considers discards jumps and creates edges in the CFG for them but NIR
does neither of these.

This can be fixed instead by keeping track of whether a side of an IF had
a break/discard, but this doesn't solve the issue with discards affecting
loop exit phis. So this reworks phi handling a bit.

Fixes these tests:
dEQP-VK.graphicsfuzz.disc-and-add-in-func-in-loop
dEQP-VK.graphicsfuzz.loop-call-discard
dEQP-VK.graphicsfuzz.complex-nested-loops-and-call

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-12-02 16:56:19 +00:00
Rhys Perry
06fc83989c aco: validate the CFG
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-12-02 16:56:05 +00:00
Rhys Perry
35fab1ba33 radv: set writes_memory for global memory stores/atomics
Fixes: 13ab63bb62 ('radv: Implement VK_EXT_buffer_device_address.')
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-12-02 11:47:12 +00:00
Rhys Perry
a814f3d8a7 ac/llvm: improve sync scope for global atomics
Stronger ordering is implemented in SPIRV->NIR with barriers.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-12-02 10:48:27 +00:00
Rhys Perry
e61a826f39 ac/llvm: fix pointer type for global atomics
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-12-02 10:48:18 +00:00
Samuel Pitoiset
0eb78a078e aco: drop useless lowering of deref operations for shared memory
Moved to RADV. No pipeline-db changes.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-11-29 21:58:25 +01:00
Samuel Pitoiset
c105e6169c radv,ac/nir: lower deref operations for shared memory
This shouldn't introduce any functional changes for RadeonSI
when NIR is enabled because these operations are already lowered.

pipeline-db (NAVI10/LLVM):
SGPRS: 9043 -> 9051 (0.09 %)
VGPRS: 7272 -> 7292 (0.28 %)
Code Size: 638892 -> 621628 (-2.70 %) bytes
LDS: 1333 -> 1331 (-0.15 %) blocks
Max Waves: 1614 -> 1608 (-0.37 %)

Found this while glancing at some F12019 shaders.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-11-29 21:58:18 +01:00
Daniel Schürmann
b690543851 aco: fix a couple of value numbering issues
Fixes: 3a20ef4a32 'aco: refactor value numbering'

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-11-29 21:54:27 +01:00
Daniel Schürmann
8861a82be7 aco: don't split live-ranges of linear VGPRs
Fixes: 93c8ebfa78 'aco: Initial commit of independent AMD compiler'

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-11-29 21:54:27 +01:00
Rhys Perry
73783ed389 aco: implement global atomics
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-11-29 17:46:02 +00:00
Rhys Perry
389ee819c0 aco: improve FLAT/GLOBAL scheduling
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-11-29 17:46:02 +00:00
Rhys Perry
cc742562c1 aco: don't enable store_global for helper invocations
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-11-29 17:46:02 +00:00
Rhys Perry
31e68e230f aco: fix SADDR with FLAT on GFX10
The reference guide is incorrect and SADDR is actually used with FLAT on
GFX10.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-11-29 17:46:01 +00:00
Rhys Perry
082e3a68fa aco: fix assembly of FLAT/GLOBAL atomics
They can take both a definition and data operand

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-11-29 17:46:01 +00:00
Rhys Perry
f1381e6715 aco: fix GFX10 opcodes for some global/flat atomics
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-11-29 17:46:01 +00:00
Rhys Perry
5986e00194 aco: improve WAR hazard workaround with >64bit stores
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-11-29 17:46:01 +00:00
Rhys Perry
a9fc81b098 aco: add v_nop inbetween exec write and VMEM/DS/FLAT
LLVM and the proprietary compiler seem to do this

Fixes: b01847bd9 ("aco/gfx10: Fix mitigation of VMEMtoScalarWriteHazard.")
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-11-29 17:46:01 +00:00
Rhys Perry
54742e157d aco: fix incorrect cast in parse_wait_instr()
s_waitcnt is SOPP, not SOPK

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-11-29 17:46:01 +00:00
Rhys Perry
11f43caaec aco: fix i2i64
Fixes: 93c8ebfa ('aco: Initial commit of independent AMD compiler')
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-11-29 17:46:01 +00:00
Rhys Perry
ff70ccad16 aco: propagate p_wqm on an image_sample's coordinate p_create_vector
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2156
Fixes: 93c8ebfa78 ('aco: Initial commit of independent AMD compiler')
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-11-29 17:19:52 +00:00
Bas Nieuwenhuizen
48fc65413c radv: Fix timeline semaphore refcounting.
Was totally broken ...

Removed two if(point) {} because point is always non-NULL and we
were counting on that already for counting, since we NULL our
references to semaphores without active point earlier.

Fixes: 4aa75bb3bd "radv: Add wait-before-submit support for timelines."
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2137
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-11-28 23:46:09 +01:00
Bas Nieuwenhuizen
4cde0e04e3 radv: Unify max_descriptor_set_size.
They were out of sync. Besides syncing, lets ensure they never diverge
again.

Fixes: 8d2654a419 "radv: Support VK_EXT_inline_uniform_block."
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-11-28 12:06:44 +01:00
Bas Nieuwenhuizen
e09426ad6b amd/llvm: Refactor ac_build_scan.
Split out the logic for exclusive scans into a separate function
that makes clear what it does instead of having this opaque 60
line if.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-11-28 11:35:11 +01:00
Samuel Pitoiset
d347f2805d radv: add more constants to avoid using magic numbers
Trivial.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-11-28 10:59:14 +01:00
Samuel Pitoiset
52aadbfd04 ac/llvm: convert src operands to pointers if necessary
To avoid generating invalid LLVM IR when both operands don't have
the same type. This might happen when performing pointer comparisons
with SPIRV 1.4.

Fixes invalid LLVM IR for:
dEQP-VK.spirv_assembly.instruction.spirv1p4.opptrequal.variable_pointers_ssbo_equal
dEQP-VK.spirv_assembly.instruction.spirv1p4.opptrnotequal.variable_pointers_ssbo_not_equal

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-11-28 08:26:51 +01:00
Marek Olšák
42318f9197 ac/nir: don't rely on data.patch for tess factors
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-11-27 19:28:10 -05:00
Samuel Pitoiset
a24f1c8f7f radv: enable VK_KHR_shader_subgroup_extended_types on GFX10
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-11-27 07:42:44 +01:00
Samuel Pitoiset
0812dbd403 ac: add 8-bit and 16-bit supports to ac_build_permlane16()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-11-27 07:42:42 +01:00
Samuel Pitoiset
c9aa843961 radv/gfx10: fix implementation of exclusive scans
This implementation is loosely based on ROCm.
https://github.com/RadeonOpenCompute/ROCm-Device-Libs/blob/master/ockl/src/wfredscan.cl

This fixes dEQP-VK.subgroups.arithmetic.*.subgroupexclusive* on GFX10.

Fixes: 227c29a80d ("amd/common/gfx10: implement scan & reduce operations")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-11-27 07:39:26 +01:00
Samuel Pitoiset
86a5fbfd4a radv: fix enabling sample shading with SampleID/SamplePosition
When a fragment shader includes an input variable decorated with
SampleId or SamplePosition, sample shading should be enabled
because minSampleShadingFactor is expected to be 1.0.

Cc: 19.2, 19.3 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-11-27 07:22:54 +01:00
Daniel Schürmann
7cd548d352 aco: don't value-number instructions from within a loop with ones after the loop.
Fixes:
Wolfenstein:Youngblood (w/o shader_ballot)
dEQP-VK.descriptor_indexing.combined_image_sampler_in_loop_with_lod

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-11-26 14:39:27 +00:00
Rhys Perry
46420dd294 aco: set dlc/glc correctly for image loads
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
2019-11-26 14:39:27 +00:00
Rhys Perry
37843e454e aco: allow constant offsets for global/scratch instructions on GFX10
I don't think the bug applies for global/scratch instructions and
load_barycentric_at_sample selection expects this feature to work.

Fixes various dEQP-VK.pipeline.multisample_interpolation.* tests on GFX10.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
2019-11-26 14:39:27 +00:00
Bas Nieuwenhuizen
02375b8436 radv: Enable VK_KHR_buffer_device_address.
Still no capture/replay or multi device support.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-11-26 11:59:52 +00:00
Samuel Pitoiset
34dd4251e2 radv: fix reporting subgroup size with VK_KHR_pipeline_executable_properties
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-11-26 10:48:48 +01:00
Bas Nieuwenhuizen
25bc9102d8 radv: Allocate cmdbuffer space for buffer marker write.
Fixes: 946193ae00 "radv: add support for VK_AMD_buffer_marker"
Reviewed-by:  Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-11-26 09:35:02 +00:00