fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-24 11:00:11 +01:00

Author	SHA1	Message	Date
Nicolai Hähnle	8efaffa893	amd/common: add i1 special case to ac_build_{inclusive,exclusive}_scan Allow for a unified but efficient treatment of adding a bitmask over a wave or an entire threadgroup. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-19 12:01:19 +01:00
Nicolai Hähnle	300876a9a7	amd/common: scan/reduce across waves of a workgroup Order-aware scan/reduce can trade-off LDS traffic for external atomics memory traffic in producer/consumer compute shaders. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-19 12:01:17 +01:00
Nicolai Hähnle	3963402fd3	amd/common: add ac_build_ifcc Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-19 12:01:15 +01:00
Nicolai Hähnle	3c77f26ccc	amd/common: whitespace fixes Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-19 12:01:12 +01:00
Rhys Perry	12dc7cb202	ac: refactor visit_load_buffer This is so that we can split different types of loads more easily. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-12-16 14:56:10 +00:00
Samuel Pitoiset	3fbdcd942f	amd: remove support for LLVM 6.0 Users are encouraged to switch to LLVM 7.0 released in September 2018. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-06 14:02:56 +01:00
Dave Airlie	ec9fe8abc7	ac: avoid casting pointers on bcsel and stores For variable pointers we really don't want to cast the pointers to int without a good reason, just add a wrapper for bcsel loading and result storing. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-11-21 08:54:25 +10:00
Bas Nieuwenhuizen	dd0172e865	radv: Use structured intrinsics instead of indexing workaround for GFX9. These force the index to be used in the instruction so we don't need the workaround. Totals: SGPRS: 1321642 -> 1321802 (0.01 %) VGPRS: 943664 -> 943788 (0.01 %) Spilled SGPRs: 28468 -> 28480 (0.04 %) Spilled VGPRs: 88 -> 89 (1.14 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 80 -> 80 (0.00 %) dwords per thread Code Size: 52415292 -> 52338932 (-0.15 %) bytes LDS: 400 -> 400 (0.00 %) blocks Max Waves: 233903 -> 233803 (-0.04 %) Wait states: 0 -> 0 (0.00 %) Totals from affected shaders: SGPRS: 238344 -> 238504 (0.07 %) VGPRS: 232732 -> 232856 (0.05 %) Spilled SGPRs: 13125 -> 13137 (0.09 %) Spilled VGPRs: 88 -> 89 (1.14 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 80 -> 80 (0.00 %) dwords per thread Code Size: 15752712 -> 15676352 (-0.48 %) bytes LDS: 139 -> 139 (0.00 %) blocks Max Waves: 31680 -> 31580 (-0.32 %) Wait states: 0 -> 0 (0.00 %) Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-11-19 23:36:00 +01:00
Marek Olšák	8676af12c8	ac: fix ac_build_fdiv for f64 trivial Fixes: `a5f35aa742`	2018-10-29 17:24:21 -04:00
Connor Abbott	59535b05cf	ac: Introduce ac_build_expand() And implement ac_bulid_expand_to_vec4() on top of it. Fixes: `7e7ee82698` ("ac: add support for 16bit buffer loads") Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-10-22 09:44:51 +02:00
Marek Olšák	bfc795670e	ac: add helpers for fast integer division by a constant	2018-10-16 17:23:25 -04:00
Samuel Pitoiset	416013b4f5	radv: emit the GLC bit for SSBO loads/stores when needed This fixes some new memory model tests: dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.device.* Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108112 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-10-12 08:42:08 +02:00
Marek Olšák	77903c8cfb	ac: add ac_build_round	2018-10-06 21:50:09 -04:00
Marek Olšák	82f5f89bf6	ac: simplify LLVM alloca helpers	2018-10-06 21:50:09 -04:00
Marek Olšák	a668c8d6ba	ac: define all address spaces properly	2018-10-06 21:50:09 -04:00
Samuel Pitoiset	cd76ce0078	ac: add 16-bit support to ac_build_bitfield_reverse() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-09-17 15:18:37 +02:00
Samuel Pitoiset	fc398f4d67	ac: add 16-bit support to ac_build_bit_count() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-09-17 15:18:34 +02:00
Samuel Pitoiset	94dd08eb7c	ac: add 16-bit support to ac_find_lsb() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-09-17 15:18:32 +02:00
Samuel Pitoiset	5a6c8ca3e8	ac: add 16-bit support to ac_build_umsb() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-09-17 15:18:30 +02:00
Samuel Pitoiset	3e7f3e2cd1	ac: add 16-bit support to ac_build_isign() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-09-17 15:18:28 +02:00
Samuel Pitoiset	cfd6314cfe	ac: add 16-bit constant values for zero and one Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-09-17 15:18:26 +02:00
Samuel Pitoiset	074e29183c	ac: add ac_build_bifield_reverse() helper Are we missing 64-bit support? Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-09-17 15:18:23 +02:00
Samuel Pitoiset	371c35e5bb	ac: add ac_build_bit_count() helper Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-09-17 15:18:20 +02:00
Marek Olšák	be0bd95abf	radeonsi: fix GPU hangs with bindless textures and LLVM 7.0 Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2018-09-10 15:19:56 -04:00
Marek Olšák	cc36ebbdc3	ac: use iN_0/1 constants Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2018-09-10 15:19:56 -04:00
Marek Olšák	a5f35aa742	ac: revert new LLVM 7.0 behavior for fdiv Cc: 18.2 <mesa-stable@lists.freedesktop.org> Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2018-09-10 15:19:56 -04:00
Marek Olšák	60beac9efc	ac,radeonsi: use ac_build_fmad Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-08-21 20:50:37 -04:00
Marek Olšák	659f2e0fcb	ac: add imad & fmad helpers Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-08-21 20:50:37 -04:00
Marek Olšák	2276f8f064	ac: add ac_build_s_barrier Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-08-21 20:50:37 -04:00
Marek Olšák	fd1121e839	amd: remove support for LLVM 5.0 Users are encouraged to switch to LLVM 6.0 released in March 2018. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-08-03 18:36:11 -04:00
Daniel Schürmann	a6a21e651d	ac: add support for 16bit UBO loads Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-07-23 23:16:25 +02:00
Daniel Schürmann	f582367d49	ac: add 16bit conversion operations Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-07-23 23:16:25 +02:00
Marek Olšák	4695984dbc	ac: fold LLVMContext creation into ac_llvm_context_init Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-07-04 15:48:18 -04:00
Marek Olšák	e5e57c3a5e	ac: handle undefined EQAA samples in ac_apply_fmask_to_sample RADV might wanna use this helper too. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2018-06-13 22:00:12 -04:00
Timothy Arceri	fae3b38770	ac: fix possible truncation of intrinsic name Fixes the gcc warning: ‘snprintf’ output between 26 and 33 bytes into a destination of size 32 Fixes: `d5f7ebda3e` ("ac: add LLVM build functions for subgroup instrinsics") Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-06-08 09:24:15 +10:00
Bas Nieuwenhuizen	4fc2d5e141	amd/common: Fix number of coords for getlod. The LLVM 6 code reduced it to a non-array call. We need to do that with the new code too. This fixes dEQP-VK.glsl.texture_functions.query.texturequerylod.array for radv. Fixes: `a9a7993441` "amd/common: use the dimension-aware image intrinsics on LLVM 7+" Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-06-07 23:59:52 +02:00
Nicolai Hähnle	a9a7993441	amd/common: use the dimension-aware image intrinsics on LLVM 7+ Requires LLVM trunk r329166. Acked-by: Marek Olšák <marek.olsak@amd.com>	2018-06-04 21:34:59 +02:00
Bas Nieuwenhuizen	699e1f5aac	ac: Use DPP for build_ddxy where possible. WQM is pretty reliable now on LLVM 7, so let us just use DPP + WQM. This gives approximately a 1.5% performance increase on the vrcompositor built-in benchmark. v2: Use ac_build_quad_swizzle. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2018-05-23 21:02:45 +02:00
Marek Olšák	f9eb1ef870	amd: remove support for LLVM 4.0 It doesn't support GFX9. Acked-by: Dave Airlie <airlied@redhat.com> Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-05-17 14:54:41 -04:00
Dave Airlie	eba4cf797c	ac/llvm: use amdgcn.tbuffer.store instead of SI.tbuffer.store intrinsic Drop the use of the old intrinsic. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-05-17 11:46:53 +10:00
Nicolai Hähnle	c0acb596f4	amd/common: use llvm.amdgcn.wqm for explicit derivatives To comply with an upcoming change in LLVM, see https://reviews.llvm.org/D46051 Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-05-04 11:02:48 +02:00
Samuel Pitoiset	d136a5fad9	ac: fix the number of coordinates for ac_image_get_lod and arrays This fixes crashes for the following CTS: dEQP-VK.glsl.texture_functions.query.texturequerylod.* Cubemaps are the same as 2D arrays. Fixes: `625dcbbc45` ("amd/common: pass address components individually to ac_build_image_intrinsic") Cc: 18.1 <mesa-stable@lists.freedesktop.org> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2018-04-23 21:48:38 +02:00
Nicolai Hähnle	24fb3e6aa1	ac/nir: use ac_build_image_opcode for image intrinsics So that we'll use the dimension-aware intrinsics in the future. Acked-by: Marek Olšák <marek.olsak@amd.com>	2018-04-20 09:30:07 +02:00
Nicolai Hähnle	74063431f1	radeonsi: generate image load/store/atomic ops using ac_build_image_opcode In preparation of dimension-aware LLVM image intrinsics. Acked-by: Marek Olšák <marek.olsak@amd.com>	2018-04-20 09:29:57 +02:00
Nicolai Hähnle	625dcbbc45	amd/common: pass address components individually to ac_build_image_intrinsic This is in preparation for the new image intrinsics. Acked-by: Marek Olšák <marek.olsak@amd.com>	2018-04-20 09:23:52 +02:00
Nicolai Hähnle	f931583828	amd/common: pass new enum ac_image_dim to ac_build_image_opcode This is in preparation for the new, dimension-aware LLVM image intrinsics. Acked-by: Marek Olšák <marek.olsak@amd.com>	2018-04-20 09:23:40 +02:00
Daniel Schürmann	d5f7ebda3e	ac: add LLVM build functions for subgroup instrinsics Co-authored-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-04-14 01:03:09 +02:00
Daniel Schürmann	d19f20e793	ac: make ballot and umsb capable of 64bit inputs Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-04-14 00:52:22 +02:00
Marek Olšák	dc04e4bba2	radeonsi: move FMASK shader logic to shared code We'll need it for FBFETCH in both TGSI and NIR paths. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2018-04-02 13:55:22 -04:00
Bas Nieuwenhuizen	4503ff760c	ac/nir: Add workaround for GFX9 buffer views. On GFX9 whether the buffer size is interpreted as elements or bytes depends on whether IDXEN is enabled in the instruction. If the index is a constant zero, LLVM optimizes IDXEN to 0. Now the size in elements is interpreted in bytes which of course results in out of bounds accesses. The correct fix is most likely to disable the LLVM optimization, but we need something to work with LLVM <= 6.0. radeonsi does the max between stride and element count on the CPU but that results in the size intrinsics returning the wrong size for the buffer. This would cause CTS errors for radv. v2: Also include the store changes. Fixes: `e38685cc62` 'Revert "radv: disable support for VEGA for now."' Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-03-29 00:03:03 +02:00

160 commits