fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-01-04 09:10:12 +01:00

Author	SHA1	Message	Date
Marek Olšák	facfab28fe	radeonsi/gfx9: add workarounds to avoid VGPR indexing completely For inputs and outputs, indirect indexing is lowered by the GLSL compiler. For temporaries, use alloca and disable the "promote-alloca" pass. In the future, we could switch all codepaths to alloca permanently and just rely on the "promote-alloca" pass. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-07-17 10:50:39 -04:00
Marek Olšák	4560f2b90a	radeonsi: merge si_llvm_get_amdgpu_target into ac_get_llvm_target Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-07-17 10:50:39 -04:00
Marek Olšák	ece0c0439f	radeonsi: don't call gallivm_init_llvm_targets It's for initializing the native (x86) target. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-07-17 10:50:39 -04:00
Juan A. Suarez Romero	a625d58ee1	radeonsi: call LLVMAddEarlyCSEMemSSAPass only for LLVM >= 4.0 LLVMAddEarlyCSEMemSSAPass() is defined in LLVM 4.0. Fixes: `257b538` ("radeonsi: do EarlyCSEMemSSA LLVM pass") Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2017-06-08 23:32:32 +02:00
Marek Olšák	257b538fd2	radeonsi: do EarlyCSEMemSSA LLVM pass so that LLVM IR looks like CSE has been run on it. It's also recommended by the instruction combining pass. This also fixes: - GL45-CTS.arrays_of_arrays_gl.InteractionFunctionCalls2 (crash) - piglit/spec/arb_shader_ballot/execution/fs-readFirstInvocation-uint-loop (fail) The code size decrease is positive, the register usage isn't. There is a decrease in VGPR spilling for Tomb Raider, but an increase in DiRT Showdown and GRID Autosport. EarlyCSEMemSSA has a -0.01% change in code size compared to EarlyCSE. SGPRS: 1935420 -> 1938076 (0.14 %) VGPRS: 1645504 -> 1645988 (0.03 %) Spilled SGPRs: 2493 -> 2651 (6.34 %) Spilled VGPRs: 107 -> 115 (7.48 %) Private memory VGPRs: 1332 -> 1332 (0.00 %) Scratch size: 1512 -> 1516 (0.26 %) dwords per thread Code Size: 61981592 -> 61890012 (-0.15 %) bytes Max Waves: 371847 -> 371798 (-0.01 %) Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 20:17:09 +02:00
Marek Olšák	b8f8d9e46c	radeonsi: clamp indirect index to the number of declared shader resources We'll do partial uploads of descriptor arrays, so we need to clamp against what shaders declare. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-05-18 22:15:02 +02:00
Nicolai Hähnle	c485b47383	radeonsi: extract TGSI memory/texture opcode handling into its own file It's about time to get the growth of si_shader.c somewhat under control. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-05-16 16:11:55 +02:00
Marek Olšák	e107c5a426	radeonsi/gfx9: set correct LLVM calling conventions for merged shaders for scratch support Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-05-05 00:23:44 +02:00
Marek Olšák	2d662c0cba	radeonsi: inline si_llvm_shader_type into si_llvm_create_func Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-05-05 00:23:44 +02:00
Marek Olšák	f8f8242e8b	radeonsi: fold surrounding code into si_llvm_finalize_module and rename to si_llvm_optimize_module. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-05-05 00:23:44 +02:00
Marek Olšák	12beef0374	radeonsi: drop support for LLVM 3.8 LLVM 3.8: - had broken indirect resource indexing - didn't have scratch coalescing - was the last user of problematic v16i8 - only supported OpenGL 4.1 This leaves us with LLVM 3.9 and LLVM 4.0 support for Mesa 17.2. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-05-05 00:23:44 +02:00
Marek Olšák	4d32b4ac99	radeonsi: stop using v16i8 Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-05-05 00:23:44 +02:00
Marek Olšák	130e198c49	radeonsi: separate out TGSI initialization of si_shader_context so that we can put multiple different TGSI shaders into one module. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-04-28 21:47:35 +02:00
Nicolai Hähnle	24d4fbe226	radeonsi: strengthen emit_optimization_barrier LLVM will lift inline assembly out of if-else-blocks if both paths have the same inline assembly. Prevent this by adding an irrelevant unique text to the assembly. This requires the LLVM assembly parser to be initialized. Furthermore, allow forcing subsequent computations to happen after the optimization barrier by defining a data dependency. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-04-05 15:29:43 +02:00
Nicolai Hähnle	4cf2942777	radeonsi: support 64-bit system values For simplicity, always store system values as 32-bit values or arrays of 32-bit values. 64-bit values are unpacked and packed accordingly. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-04-05 15:29:43 +02:00
Marek Olšák	6ca46c3d77	radeonsi: access gallivm through ctx in most places Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-04-04 16:55:21 +02:00
Marek Olšák	04e4fe594b	radeonsi: use ctx->types instead of bld->types etc. even vec_type is f32. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-04-04 16:55:19 +02:00
Marek Olšák	7a5e6dcba5	radeonsi: use i32_0/1 instead of *int_bld.zero/one in most places Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-04-04 16:55:16 +02:00
Marek Olšák	29adaa19ac	radeonsi: remove most uses of lp_build_const* Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-04-04 11:14:43 +02:00
Marek Olšák	474468fbf9	radeonsi/gfx9: disable features that don't work Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-03-30 14:44:33 +02:00
Samuel Pitoiset	7751ed39e4	radeonsi: disable sinking common instructions down to the end block Initially this was a workaround for a bug introduced in LLVM 4.0 in the SimplifyCFG pass that caused image intrinsics to disappear (because they were badly sunk). Finally, this is a win because it decreases SGPR spilling and increases the number of waves a bit. Although shader-db results are good, I think we might want to remove it in the future once the issue is fixed. For now, enable it for LLVM >= 4.0. This also fixes a rendering issue with the speedometer in Dirt Rally. More information can be found here https://reviews.llvm.org/D26348. Thanks to Dave Airlie for the patch. v2: - add a FIXME comment - use if (HAVE_LLVM >= 0x0400) instead Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99484 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97988 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Cc: 17.0 <mesa-stable@lists.freedesktop.org> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-03-15 14:24:40 +01:00
Marek Olšák	7e1faa79d3	radeonsi: drop support for LLVM 3.6 & 3.7 They are too old. Reviewed-by: Dave Airlie <airlied@redhat.com>	2017-03-06 14:13:04 +01:00
Marek Olšák	7f1446a8a1	ac: normalize build helper names s/emit/build/ Reviewed-by: Dave Airlie <airlied@redhat.com>	2017-03-03 17:30:07 +01:00
Timothy Arceri	d90bf4ef3e	radeon: remove unused radeon_elf_util.{c,h} We now use the shared code in AMD common instead. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-02-28 13:20:31 +11:00
Timothy Arceri	dc4c551a34	radeon/ac: switch from radeon_elf_read() to ac_elf_read() Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-02-28 13:20:31 +11:00
Timothy Arceri	69a687189e	radeon/ac: switch from radeon_shader_binary to ac_shader_binary Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-02-28 13:20:31 +11:00
Marek Olšák	52581606c2	radeonsi: set no-signed-zeros-fp-math Recommended by Matt Arsenault. 46757 shaders in 28742 tests Totals: SGPRS: 2068851 -> 2066907 (-0.09 %) VGPRS: 1604056 -> 1602676 (-0.09 %) Spilled SGPRs: 1402 -> 1382 (-1.43 %) Spilled VGPRs: 113 -> 113 (0.00 %) Private memory VGPRs: 1332 -> 1332 (0.00 %) Scratch size: 3224 -> 3188 (-1.12 %) dwords per thread Code Size: 58815520 -> 58716788 (-0.17 %) bytes LDS: 1162 -> 1162 (0.00 %) blocks Max Waves: 354616 -> 354905 (0.08 %) Wait states: 0 -> 0 (0.00 %) Totals from affected shaders: SGPRS: 786452 -> 784508 (-0.25 %) VGPRS: 530000 -> 528620 (-0.26 %) Spilled SGPRs: 958 -> 938 (-2.09 %) Spilled VGPRs: 85 -> 85 (0.00 %) Private memory VGPRs: 636 -> 636 (0.00 %) Scratch size: 1880 -> 1844 (-1.91 %) dwords per thread Code Size: 26349936 -> 26251204 (-0.37 %) bytes LDS: 304 -> 304 (0.00 %) blocks Max Waves: 108962 -> 109251 (0.27 %) Wait states: 0 -> 0 (0.00 %) Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-02-21 21:27:23 +01:00
Marek Olšák	fd3e73f54e	gallivm: add no-signed-zeros-fp-math option to lp_create_builder (v2) v2: define lp_float_mode Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-02-21 21:27:23 +01:00
Marek Olšák	660b55e6d9	radeonsi: stop using TGSI_OPCODE_CLAMP by moving it to amd/common Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-02-18 02:58:43 +01:00
Marek Olšák	dbd38f2a92	radeonsi: add a workaround for clamping unaligned RGB 8 & 16-bit vertex loads Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-02-18 01:22:08 +01:00
Tom Stellard	226a2c6d6e	radeonsi: Fix build on LLVM < 3.9 v2 This was broken by: `e0cc0a614c` v2: - Use preprocessor macro Tested-by: Mark Janes <mark.a.janes@intel.com>	2017-02-01 02:10:00 +00:00
Tom Stellard	e0cc0a614c	radeonsi: Set datalayout on the llvm module This prevents LLVM from using sext instructions for local memory offsets and allows the backend to fold immediate offsets into the instruction. This also prevents some incorrect code generation for ptrtoint and inttoptr instructions. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-01-31 20:39:30 +00:00
Marek Olšák	59c5da40ed	radeonsi: preload PS inputs only if KILL is used so that most shaders can get lower VGPR usage thanks to lazy input loading. I think this is a more accurate constraint that prevents the black transitions in Witcher 2. Affected shaders (7758): Max Waves: 57437 -> 58231 (1.38 %) Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-01-23 23:43:38 +01:00
Samuel Pitoiset	e1ea70d9f3	radeonsi: replace si_shader_context::soa by bld_base We no longer need to use lp_build_tgsi_soa_context. No regressions found with full piglit run. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-01-13 10:41:08 +01:00
Samuel Pitoiset	ecf04b84e5	radeonsi: replace ctx->soa.outputs by ctx->outputs The plan is to replace si_shader_context::soa with its parent structure (ie. bld_base). Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-01-13 10:41:06 +01:00
Samuel Pitoiset	f04088a7ba	radeonsi: move si_shader_context::soa::addr to si_shader_context The plan is to replace si_shader_context::soa with its parent structure (ie. bld_base). Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-01-13 10:41:02 +01:00
Samuel Pitoiset	6f0d955b6d	radeonsi: allocate the array of immediates dynamically Currently, we can store up to 256 immediates in a static array, but this is not always enough. Instead, allocate a dynamic array like what we currently do for temps. This fixes a segfault with dEQP-GLES31.functional.ssbo.layout.random.all_shared_buffer.23 No regressions found with full piglit run. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-01-13 10:40:57 +01:00
Nicolai Hähnle	a0ce09b4b2	amd/common: unify cube map coordinate handling between radeonsi and radv Code is taken from a combination of radv (for the more basic functions, to avoid gallivm dependencies) and radeonsi (for the new and improved derivative calculations). v2: add 0.5 offset to tex coords only after derivative calculation v3: - really only touch the first three coordinates - rebase on the removal of the 1.5 --> 0.5 offset change Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> (v2) Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-01-13 00:39:10 +01:00
Marek Olšák	cac74a9bcc	radeonsi: fix the Witcher 2 black transitions v2: do it properly Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98238 Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-01-09 12:01:30 +01:00
Marek Olšák	5b85a6b3f7	radeonsi: set si_shader_context::input_decls for ranged decls correctly This has no effect because no code uses those members with ranged decls. Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com> Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-01-09 12:01:30 +01:00
Marek Olšák	358079da2d	radeonsi: set unsafe fpmath on FP instructions when allowed by R600_DEBUG Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-11-15 19:17:56 +01:00
Marek Olšák	171e349782	radeonsi: fold some shader context initialization to si_llvm_context_init Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-11-15 19:17:56 +01:00
Nicolai Hähnle	0b9bba7f6c	radeonsi: pass the function name to si_llvm_create_func We will use multiple functions in one module, so they should have different names. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-11-03 10:06:54 +01:00
Nicolai Hähnle	23dfb688ba	radeonsi: add always-inline pass to si_llvm_finalize_module Change the pass manager as well, since this is a module-level pass. No noticeable run-time difference on shader-db. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-11-03 10:06:42 +01:00
Marek Olšák	21af69e753	radeonsi: rename prefixes from radeon to si Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>	2016-10-18 18:41:08 +02:00
Marek Olšák	6e475fefa1	radeonsi: merge radeon_llvm_context and si_shader_context Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>	2016-10-18 18:41:06 +02:00
Marek Olšák	5ab25bb4ba	radeonsi: import all TGSI->LLVM code from gallium/radeon Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>	2016-10-18 18:41:04 +02:00

47 commits