fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-21 22:20:14 +01:00

Author	SHA1	Message	Date
Daniel Schürmann	1e1356b2ad	aco: implement 64bit i2b for SI/CI Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>	2019-12-07 11:23:11 +01:00
Daniel Schürmann	da7ff58835	aco: make 1/2*PI a literal constant on SI/CI Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>	2019-12-07 11:23:11 +01:00
Daniel Schürmann	90fad7360d	aco: implement 64bit VGPR shifts for SI/CI Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>	2019-12-07 11:23:11 +01:00
Daniel Schürmann	6a586a6006	aco: split read/writelane opcode into VOP2/VOP3 version for SI/CI Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>	2019-12-07 11:23:11 +01:00
Daniel Schürmann	3eed4d2be5	aco: implement quad swizzles for SI/CI Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>	2019-12-07 11:23:11 +01:00
Daniel Schürmann	bde9c1e3a1	aco: move buffer_store data to VGPR if needed Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>	2019-12-07 11:23:11 +01:00
Daniel Schürmann	a8195bdf2e	aco: implement nir_op_isign on SI/CI Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>	2019-12-07 11:23:11 +01:00
Daniel Schürmann	b8783973cd	aco: only use scalar loads for readonly buffers on SI/CI Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>	2019-12-07 11:23:11 +01:00
Daniel Schürmann	f27783a667	aco: implement nir_op_fquantize2f16 for SI/CI Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>	2019-12-07 11:23:11 +01:00
Daniel Schürmann	8aab92b393	aco: SI/CI - fix sampler aniso Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>	2019-12-07 11:23:11 +01:00
Dave Airlie	9b533a2ca3	aco: handle gfx7 int8/10 clamping on exports Co-authored-by: Daniel Schürmann <daniel@schuermann.dev> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>	2019-12-07 11:23:11 +01:00
Daniel Schürmann	3177346bfc	aco: refactor visit_store_fs_output() to use the Builder Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>	2019-12-07 11:23:11 +01:00
Timur Kristóf	637c5a1dd9	aco/wave32: Fix reductions. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-12-04 10:36:01 +00:00
Timur Kristóf	21db083504	aco/wave32: Allow setting the subgroup ballot size to 64-bit. Previously, it would only work when the ballot size was set to the lane mask. This patch makes it possible to set the ballot size to either 32-bit or 64-bit for wave32 mode. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-12-04 10:36:01 +00:00
Timur Kristóf	ed815d503e	aco/wave32: Use wave_size for barrier intrinsic. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-12-04 10:36:01 +00:00
Timur Kristóf	b8f2edb452	aco/wave32: Fix load_local_invocation_index to support wave32. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-12-04 10:36:01 +00:00
Timur Kristóf	e0bcefc3a0	aco/wave32: Use lane mask regclass for exec/vcc. Currently all usages of exec and vcc are hardcoded to use s2 regclass. This commit makes it possible to use s1 in wave32 mode and s2 in wave64 mode. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-12-04 10:36:01 +00:00
Timur Kristóf	c44af6cbc7	aco/wave32: Introduce emit_mbcnt which takes wave size into account. This is relevant because in wave32 mode the v_mbcnt_hi_u32_b32 instruction is superfluous. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-12-04 10:36:01 +00:00
Timur Kristóf	dd9dad731b	aco: Optimize load_subgroup_id to one bit field extract instruction. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-12-04 10:36:01 +00:00
Timur Kristóf	0d2d672020	aco: Remove superfluous argument from emit_boolean_logic. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-12-04 10:36:01 +00:00
Timur Kristóf	9a43d26b74	aco: Fix operand of s_bcnt1_i32_b64 in emit_boolean_reduce. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-12-04 10:36:01 +00:00
Rhys Perry	0e8da9f607	aco: handle loop exit and IF merge phis with break/discard ACO considers discards jumps and creates edges in the CFG for them but NIR does neither of these. This can be fixed instead by keeping track of whether a side of an IF had a break/discard, but this doesn't solve the issue with discards affecting loop exit phis. So this reworks phi handling a bit. Fixes these tests: dEQP-VK.graphicsfuzz.disc-and-add-in-func-in-loop dEQP-VK.graphicsfuzz.loop-call-discard dEQP-VK.graphicsfuzz.complex-nested-loops-and-call Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-12-02 16:56:19 +00:00
Rhys Perry	73783ed389	aco: implement global atomics Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-11-29 17:46:02 +00:00
Rhys Perry	389ee819c0	aco: improve FLAT/GLOBAL scheduling Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-11-29 17:46:02 +00:00
Rhys Perry	cc742562c1	aco: don't enable store_global for helper invocations Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-11-29 17:46:02 +00:00
Rhys Perry	11f43caaec	aco: fix i2i64 Fixes: `93c8ebfa` ('aco: Initial commit of independent AMD compiler') Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-11-29 17:46:01 +00:00
Rhys Perry	ff70ccad16	aco: propagate p_wqm on an image_sample's coordinate p_create_vector Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2156 Fixes: `93c8ebfa78` ('aco: Initial commit of independent AMD compiler') Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-11-29 17:19:52 +00:00
Rhys Perry	46420dd294	aco: set dlc/glc correctly for image loads Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>	2019-11-26 14:39:27 +00:00
Connor Abbott	01eb6ef870	aco: Make unused workgroup id's 0 It shouldn't matter, but the 1 was leftover from when it was handled together with workgroup_size and num_work_groups. Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-11-25 14:17:51 +01:00
Connor Abbott	bb78f9b4e4	aco: Use common argument handling Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-11-25 14:17:51 +01:00
Connor Abbott	4d6676d78a	aco: Make num_workgroups and local_invocation_ids one argument each To match the LLVM argument setup code. Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-11-25 14:17:51 +01:00
Connor Abbott	a7f1c63442	aco: Split vector arguments at the beginning Due to how LLVM works we have to make some of the FS inputs become vectors, and therefore have to split them early so that they don't take up extra register pressure due to how RA currently works. Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-11-25 14:17:51 +01:00
Connor Abbott	680b086db1	aco: Constify radv_nir_compiler_options in isel It's already const for everything else. Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-11-25 14:17:51 +01:00
Marek Olšák	ebe7579655	nir: move data.image.access to data.access The size of the data structure doesn't change. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-11-19 18:20:05 -05:00
Rhys Perry	df645fa369	aco: implement VK_KHR_shader_float_controls This actually supports more of the extension than the LLVM backend but we can't enable it because ACO doesn't work with all stages yet. With more of it enabled, some CTS tests fail because our 64-bit sqrt is very imprecise. I can't find any precision requirements for it anywhere, so I'm thinking it might be a CTS issue. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-11-15 17:36:21 +00:00
Rhys Perry	be1d11249b	aco: fix 64-bit fsign with 0 Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Fixes: `93c8ebfa` ('aco: Initial commit of independent AMD compiler') Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-11-15 17:36:21 +00:00
Timur Kristóf	8995c0b30a	aco: Treat all booleans as per-lane. Previously, instruction selection had two kinds of booleans: 1. divergent which was per-lane and stored in s2 (VCC size) 2. uniform which was stored in s1 Additionally, uniform booleans were made per-lane when they resulted from operations which were supported only by the VALU. To decide which type was used, we relied on the destination size, which was not reliable due to the per-lane uniform bools, but it mostly works on wave64. However, in wave32 mode (where VCC is also s1) this approach makes it impossible to keep track of which boolean is uniform and which is divergent. This commit makes all booleans per-lane. The resulting excess code size will be taken care of by the optimizer. v2 (by Daniel Schürmann): - Better names for some functions - Use s_andn2_b64 with exec for nir_op_inot - Simplify code due to using s_and_b64 in bool_to_scalar_condition v3 (by Timur Kristóf): - Fix several subgroups regressions Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-11-14 17:27:11 +01:00
Daniel Schürmann	a1622c1a11	aco: use s_and_b64 exec to reduce uniform booleans to one bit Reviewed-By: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>	2019-11-14 17:27:10 +01:00
Rhys Perry	6914b0236f	aco: combine read_invocation and shuffle implementations They do mostly the same thing now. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-11-12 17:21:38 +00:00
Rhys Perry	5a1bacb6f9	aco: fix read_invocation with VGPR lane index Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Fixes: `93c8ebfa` ('aco: Initial commit of independent AMD compiler')	2019-11-12 17:21:38 +00:00
Rhys Perry	f97d933426	aco: fix shuffle with uniform operands Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Fixes: `93c8ebfa` ('aco: Initial commit of independent AMD compiler')	2019-11-12 17:21:38 +00:00
Samuel Pitoiset	d3f9957de4	radv: determine shaders wavesize at pipeline level Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-11-06 09:20:34 +01:00
Daniel Schürmann	c79972b604	aco: always set scratch_offset in startpgm This patch also moves private_segment_buffer and scratch_offset to Program to easily access it. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>	2019-10-30 19:48:33 +00:00
Daniel Schürmann	655a703349	aco: remove potential critical edge on loops. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>	2019-10-30 19:48:33 +00:00
Daniel Schürmann	636d45e46a	aco: add can_reorder flags to load_ubo and load_constant These got lost due to some refactoring. Due to the way our scheduler works currently, for now we add back the reorder flag for divergent loads only. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>	2019-10-30 16:12:10 +00:00
Timur Kristóf	c52ebbcea4	aco: Introduce vgpr_limit to keep track of available VGPRs. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-10-28 23:52:50 +00:00
Timur Kristóf	d59f702e26	aco: Implement subgroup shuffle in GFX10 wave64 mode. Previously subgroup shuffle was implemented using the bpermute instruction, which only works across half-waves, so by itself it's not suitable for implementing subgroup shuffle when the shader is running in wave64 mode. This commit adds a trick using shared VGPRs that makes it possible to implement subgroup shuffle relatively effectively in this mode. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-10-28 23:52:50 +00:00
Rhys Perry	964ce47abc	aco: add missing bld.scc() Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-10-22 18:52:29 +00:00
Rhys Perry	a8d0101d69	aco: use ds_read2_b64/ds_write2_b64 Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-10-22 18:52:29 +00:00
Rhys Perry	58d4aee5df	aco: fix sparse store_lds() p_extract_vector's second operand is in units of the definition size, not dwords. v2: move extract_subvector() to right before ds_write_helper Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-10-22 18:52:29 +00:00

121 commits