fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-21 22:20:14 +01:00

Author	SHA1	Message	Date
Samuel Pitoiset	36e7a5f5b9	aco: implement nir_intrinsic_global_atomic_* on GFX6 GFX6 doesn't have FLAT instructions, use MUBUF instructions instead. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3477>	2020-01-23 14:40:30 +01:00
Samuel Pitoiset	22d8822683	aco: implement nir_intrinsic_load_global on GFX6 GFX6 doesn't have FLAT instructions, use MUBUF instructions instead. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3477>	2020-01-23 14:40:27 +01:00
Samuel Pitoiset	d6af7571c2	aco: implement nir_intrinsic_store_global on GFX6 GFX6 doesn't have FLAT instructions, use MUBUF instructions instead. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3477>	2020-01-23 14:40:24 +01:00
Samuel Pitoiset	01f0bef71e	aco: fix wrong IR in nir_intrinsic_load_barycentric_at_sample Only GFX6 was affected, my mistake. The total number of SGPR operands should be 4 when we want to create a vec4. Fixes: `dbdf3b3ef9` ("aco: implement nir_intrinsic_load_barycentric_at_sample on GFX6") Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3477>	2020-01-23 14:40:21 +01:00
Samuel Pitoiset	e030aef32c	aco: add support for nir_texop_fragment_{mask}_fetch Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3304>	2020-01-23 10:48:02 +00:00
Timur Kristóf	533a20dbd5	aco: Fix maybe-uninitialized warnings. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3483>	2020-01-22 11:09:14 +01:00
Samuel Pitoiset	dbdf3b3ef9	aco: implement nir_intrinsic_load_barycentric_at_sample on GFX6 GFX6 doesn't have FLAT instructions which means we have to emit a 64-bit MUBUF load. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-By: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3432> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3432>	2020-01-20 16:24:55 +00:00
Samuel Pitoiset	fe9157a700	aco: do not use the vec3 variant for loads on GFX6 GFX6 only supports vec3 with load/store format. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-By: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3432>	2020-01-20 16:24:55 +00:00
Samuel Pitoiset	1b5bb204d9	aco: do not use the vec3 variant for stores on GFX6 GFX6 only supports vec3 with load/store format. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-By: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3432>	2020-01-20 16:24:55 +00:00
Samuel Pitoiset	300f8dec76	aco: implement stream output with vec3 on GFX6 GFX6 doesn't support vec3. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3412> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3412>	2020-01-16 14:06:06 +00:00
Samuel Pitoiset	923005bf54	aco: do not select 96-bit/128-bit variants for ds_read/ds_write on GFX6 Only GFX7 and later support large ds_read/ds_write. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3412>	2020-01-16 14:06:06 +00:00
Timur Kristóf	dfaa3c0af6	aco: Flip s_cbranch / s_cselect to optimize out an s_not if possible. When possible, get rid of an s_not when all it does is invert the SCC, and its successor s_cbranch / s_cselect can be inverted instead. Also modify some parts of instruction_selection to take advantage of this feature. Example: s2: %3900, s1: %3899:scc = s_andn2_b64 %0:exec, %406 s2: %3902 = s_cselect_b64 -1, 0, %3900:scc s2: %407, s1: %3903:scc = s_not_b64 %3902 s2: %3906, s1: %3905:scc = s_and_b64 %407, %0:exec p_cbranch_z %3905:scc Can now be optimized to: s2: %3900, s1: %3899:scc = s_andn2_b64 %0:exec, %406 p_cbranch_nz %3900:scc Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2020-01-14 21:21:06 +01:00
Timur Kristóf	338d03090f	aco: Allow optimizing vote_all and nir_op_iand. By adding an extra instruction, we can replace the operands of the s_cselect_b64, which allows it to get picked up by the optimizer when it looks for uniform booleans. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2020-01-14 21:21:06 +01:00
Rhys Perry	f92a89a979	aco: improve readfirstlane after uniform LDS loads Totals from affected shaders: SGPRS: 976 -> 968 (-0.82 %) VGPRS: 580 -> 584 (0.69 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Code Size: 106032 -> 103076 (-2.79 %) bytes Max Waves: 237 -> 237 (0.00 %) Instructions: 19452 -> 18740 (-3.66 %) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2883>	2020-01-14 12:56:28 +00:00
Daniel Schürmann	05c81875d7	aco: fix unconditional demote_to_helper This patch fixes an out-of-bounds access on p_exit_early and binds the exec register to the correct operand. Fixes: `2ea9e59e8d` ('aco: move s_andn2_b64 instructions out of the p_discard_if') Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3347> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3347>	2020-01-13 21:08:41 +00:00
Jason Ekstrand	d3737002ee	nir/lower_atomics_to_ssbo: Also lower barriers This is more correct for a pass which is supposed to completely lower away atomic counters. It also lets us stop supporting atomic counter barriers in most of the drivers. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>	2020-01-13 17:23:47 +00:00
Jason Ekstrand	e40b11bbcb	nir: Rename nir_intrinsic_barrier to control_barrier This is a more explicit name now that we don't want it to be doing any memory barrier stuff for us. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>	2020-01-13 17:23:47 +00:00
Jason Ekstrand	60097cc840	nir: Add a new memory_barrier_tcs_patch intrinsic Right now, it's implemented as a no-op for everyone. For most drivers, it's a switch case in the NIR -> whatever which just breaks. For ir3, they already have code to delete tessellation barriers so we just add a case to also delete memory_barrier_tcs_patch. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>	2020-01-13 17:23:47 +00:00
Rhys Perry	8f291dc146	aco: set exec_potentially_empty for demotes Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Fixes: `93c8ebfa78` ('aco: Initial commit of independent AMD compiler') Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3081>	2020-01-13 13:26:43 +00:00
Rhys Perry	fcd6d83245	aco: fix imageSize()/textureSize() with large buffers on GFX8 Tested on Navi by using dEQP-VK.image.image_size.buffer.* and the GFX8 path with the size multiplied by the stride. dEQP-VK.image.image_size.buffer.* was also run with the tests modified to use a 96bit format. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Fixes: `93c8ebfa78` ('aco: Initial commit of independent AMD compiler') Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3081>	2020-01-13 13:25:32 +00:00
Rhys Perry	49bcd06f97	aco: set vm for pos0 exports on GFX10 RADV's LLVM backend and radeonsi do the same thing. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Cc: 19.3 <mesa-stable@lists.freedesktop.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3081>	2020-01-13 13:25:32 +00:00
Timur Kristóf	44a6b17df7	aco/wave32: Set the definitions of v_cmp instructions to the lane mask. The output of v_cmp instructions is s1 (a single SGPR) in wave32 mode, as opposed to s2 (an SGPR-pair) in wave64 mode. A couple of cases where this should have been fixed were omitted from the previous patch by mistake. Fixes: `e0bcefc3a0` Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2020-01-11 20:15:53 +01:00
Daniel Schürmann	8b7a42d6d0	aco: compact aco::span<T> to use uint16_t offset and size instead of pointer and size_t. This reduces the size of the Instruction base class from 40 bytes to 16 bytes. No pipelinedb changes. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3332> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3332>	2020-01-10 17:49:18 +00:00
Daniel Schürmann	ffb4790279	aco: compact various Instruction classes No pipelinedb changes. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3332>	2020-01-10 17:49:18 +00:00
Samuel Pitoiset	4d49a7ac73	aco: handle nir_intrinsic_image_deref_{load,store} with lod Use image_load_mip and image_store_mip respectively if the lod parameter isn't zero. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2020-01-09 07:58:33 +01:00
Timur Kristóf	11e62a9734	aco: Fix uniform i2i64. Fixes 240 failing test cases in dEQP-VK.spirv_assembly which were failing due to a bad s_ashr_i32 instruction. This commit fixes the instruction format along with the definitions of the instruction. Fixes: `11f43caaec` Cc: 19.3 <mesa-stable@lists.freedesktop.org> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-12-31 14:22:31 +01:00
Karol Herbst	b35e583c17	aco: use NIR_MAX_VEC_COMPONENTS instead of 4 Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-12-21 11:00:16 +00:00
Samuel Pitoiset	13b4e9adcf	ac: declare an enum for the OOB select field on GFX10 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3147> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3147>	2019-12-19 15:15:32 +01:00
Daniel Schürmann	df3e674fb3	aco: improve readfirstlane after uniform ssbo loads on GFX7 pipeline-db changes for GFX7: 80310 shaders in 40472 tests Totals: SGPRS: 3655900 -> 3643916 (-0.33 %) VGPRS: 2678324 -> 2686324 (0.30 %) Spilled SGPRs: 1730 -> 1634 (-5.55 %) Spilled VGPRs: 14 -> 21 (50.00 %) Scratch size: 15540 -> 15536 (-0.03 %) dwords per thread Code Size: 136106120 -> 135457616 (-0.48 %) bytes LDS: 1259 -> 1259 (0.00 %) blocks Max Waves: 601014 -> 600206 (-0.13 %) Totals from affected shaders: SGPRS: 307832 -> 295848 (-3.89 %) VGPRS: 267864 -> 275864 (2.99 %) Spilled SGPRs: 770 -> 674 (-12.47 %) Spilled VGPRs: 14 -> 21 (50.00 %) Scratch size: 16 -> 12 (-25.00 %) dwords per thread Code Size: 22007488 -> 21358984 (-2.95 %) bytes LDS: 65 -> 65 (0.00 %) blocks Max Waves: 28668 -> 27860 (-2.82 %) Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>	2019-12-07 11:23:11 +01:00
Daniel Schürmann	0837471463	aco: use soffset for MUBUF instructions on SI/CI pipeline-db changes for GFX7: 80310 shaders in 40472 tests Totals: SGPRS: 3655300 -> 3655900 (0.02 %) VGPRS: 2677732 -> 2678324 (0.02 %) Spilled SGPRs: 1730 -> 1730 (0.00 %) Spilled VGPRs: 14 -> 14 (0.00 %) Scratch size: 15540 -> 15540 (0.00 %) dwords per thread Code Size: 136488364 -> 136106120 (-0.28 %) bytes LDS: 1259 -> 1259 (0.00 %) blocks Max Waves: 601039 -> 601014 (-0.00 %) Totals from affected shaders: SGPRS: 316312 -> 316912 (0.19 %) VGPRS: 273844 -> 274436 (0.22 %) Spilled SGPRs: 770 -> 770 (0.00 %) Spilled VGPRs: 14 -> 14 (0.00 %) Scratch size: 16 -> 16 (0.00 %) dwords per thread Code Size: 22724904 -> 22342660 (-1.68 %) bytes LDS: 114 -> 114 (0.00 %) blocks Max Waves: 30861 -> 30836 (-0.08 %) Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>	2019-12-07 11:23:11 +01:00
Daniel Schürmann	8ad43d8838	aco: flush denorms after fmin/fmax on pre-GFX9 Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>	2019-12-07 11:23:11 +01:00
Daniel Schürmann	1c4afe38f2	aco: implement 64bit ine/ieq for SI/CI Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>	2019-12-07 11:23:11 +01:00
Daniel Schürmann	1e1356b2ad	aco: implement 64bit i2b for SI/CI Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>	2019-12-07 11:23:11 +01:00
Daniel Schürmann	da7ff58835	aco: make 1/2*PI a literal constant on SI/CI Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>	2019-12-07 11:23:11 +01:00
Daniel Schürmann	90fad7360d	aco: implement 64bit VGPR shifts for SI/CI Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>	2019-12-07 11:23:11 +01:00
Daniel Schürmann	6a586a6006	aco: split read/writelane opcode into VOP2/VOP3 version for SI/CI Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>	2019-12-07 11:23:11 +01:00
Daniel Schürmann	3eed4d2be5	aco: implement quad swizzles for SI/CI Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>	2019-12-07 11:23:11 +01:00
Daniel Schürmann	bde9c1e3a1	aco: move buffer_store data to VGPR if needed Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>	2019-12-07 11:23:11 +01:00
Daniel Schürmann	a8195bdf2e	aco: implement nir_op_isign on SI/CI Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>	2019-12-07 11:23:11 +01:00
Daniel Schürmann	b8783973cd	aco: only use scalar loads for readonly buffers on SI/CI Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>	2019-12-07 11:23:11 +01:00
Daniel Schürmann	f27783a667	aco: implement nir_op_fquantize2f16 for SI/CI Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>	2019-12-07 11:23:11 +01:00
Daniel Schürmann	8aab92b393	aco: SI/CI - fix sampler aniso Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>	2019-12-07 11:23:11 +01:00
Dave Airlie	9b533a2ca3	aco: handle gfx7 int8/10 clamping on exports Co-authored-by: Daniel Schürmann <daniel@schuermann.dev> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>	2019-12-07 11:23:11 +01:00
Daniel Schürmann	3177346bfc	aco: refactor visit_store_fs_output() to use the Builder Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>	2019-12-07 11:23:11 +01:00
Timur Kristóf	637c5a1dd9	aco/wave32: Fix reductions. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-12-04 10:36:01 +00:00
Timur Kristóf	21db083504	aco/wave32: Allow setting the subgroup ballot size to 64-bit. Previously, it would only work when the ballot size was set to the lane mask. This patch makes it possible to set the ballot size to either 32-bit or 64-bit for wave32 mode. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-12-04 10:36:01 +00:00
Timur Kristóf	ed815d503e	aco/wave32: Use wave_size for barrier intrinsic. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-12-04 10:36:01 +00:00
Timur Kristóf	b8f2edb452	aco/wave32: Fix load_local_invocation_index to support wave32. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-12-04 10:36:01 +00:00
Timur Kristóf	e0bcefc3a0	aco/wave32: Use lane mask regclass for exec/vcc. Currently all usages of exec and vcc are hardcoded to use s2 regclass. This commit makes it possible to use s1 in wave32 mode and s2 in wave64 mode. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-12-04 10:36:01 +00:00
Timur Kristóf	c44af6cbc7	aco/wave32: Introduce emit_mbcnt which takes wave size into account. This is relevant because in wave32 mode the v_mbcnt_hi_u32_b32 instruction is superfluous. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-12-04 10:36:01 +00:00
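
Commit 8b7a42d6d0 in the list above describes compacting aco::span<T> to a uint16_t offset and size instead of a pointer and size_t. The block below is a minimal C++ sketch of that idea, not the Mesa code: the names CompactSpan, bind, and Node are hypothetical, and it assumes the element storage lives in the same allocation as the span, at a higher address and within 65535 bytes of it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical illustration only; names and layout are assumptions,
// not the actual aco::span<T> from Mesa.
template <typename T>
class CompactSpan {
public:
    CompactSpan() : offset_(0), length_(0) {}

    // Record where the elements are, as a byte distance from this object.
    // Assumes `data` is at a higher address than `this`, at most 65535 bytes away.
    void bind(T* data, std::uint16_t length) {
        std::uintptr_t self = reinterpret_cast<std::uintptr_t>(this);
        std::uintptr_t target = reinterpret_cast<std::uintptr_t>(data);
        assert(target >= self && target - self <= UINT16_MAX);
        offset_ = static_cast<std::uint16_t>(target - self);
        length_ = length;
    }

    // Rebuild the element pointer from the span's own address plus the offset.
    T* begin() {
        return reinterpret_cast<T*>(reinterpret_cast<std::uintptr_t>(this) + offset_);
    }
    T* end() { return begin() + length_; }
    std::uint16_t size() const { return length_; }
    T& operator[](std::size_t i) { return begin()[i]; }

private:
    std::uint16_t offset_;  // byte offset from `this` to the first element
    std::uint16_t length_;  // number of elements
};

// Two 16-bit fields: 4 bytes, versus 16 for a pointer plus size_t on a
// typical 64-bit target.
static_assert(sizeof(CompactSpan<std::uint32_t>) == 4, "span should be 4 bytes");

// Hypothetical owner that keeps the span and its storage in one object.
struct Node {
    CompactSpan<std::uint32_t> operands;
    std::uint32_t storage[4] = {};
};
```

Because the offset is relative to the span itself, a byte-wise copy of the whole Node keeps the span pointing at the copy's own storage. The cost is that the span cannot be moved or copied separately from the data it refers to.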

303 commits