fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-01-01 14:00:16 +01:00

Author	SHA1	Message	Date
Nicolai Hähnle	1e9476e8c5	gallium/radeon: fix argument type of llvm.{cttz,ctlz}.i32 intrinsics Caught by R600_DEBUG=checkir (next commit). Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-10-04 16:39:28 +02:00
Nicolai Hähnle	1b6fb88ab2	gallium/radeon: unify the creation of basic blocks This changes the order of basic blocks to be equal to the order of code in the original TGSI, which is nice for making sense of shader dumps. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-10-04 16:39:25 +02:00
Nicolai Hähnle	d377f4c1ca	gallium/radeon: merge branch and loop flow control stacks Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-10-04 16:39:21 +02:00
Nicolai Hähnle	b0d50e157d	gallium/radeon: simplify if/else/endif blocks In particular, we no longer emit an else block when there is no ELSE instruction. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-10-04 16:39:18 +02:00
Nicolai Hähnle	89e9de2ea6	gallium/radeon: label basic blocks by the corresponding TGSI pc Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-10-04 16:39:15 +02:00
Nicolai Hähnle	6f87d7a146	gallium/radeon: cleanup and fix branch emits Some of the existing code is needlessly complicated. The basic principle should be: control-flow opcodes emit branches to properly terminate the current block, _unless_ the current block already has a terminator (which happens if and only if there was a BRK or CONT). This also fixes a bug where multiple terminators were created in a block. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97887 Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-10-04 16:39:10 +02:00
Dave Airlie	4207612f9c	radeonsi: prepare 64-bit integer support. (v2) v2: - no PIPE_CAP_INT64 yet - emit DIV/MOD without the divide-by-zero workaround Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1) Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-09-21 10:24:38 +02:00
Marek Olšák	ab29788250	radeonsi: reload PS inputs with direct indexing at each use (v2) The LLVM compiler can CSE interp intrinsics thanks to LLVMReadNoneAttribute. 26011 shaders in 14651 tests Totals: SGPRS: 1146340 -> 1132676 (-1.19 %) VGPRS: 727371 -> 711730 (-2.15 %) Spilled SGPRs: 2218 -> 2078 (-6.31 %) Spilled VGPRs: 369 -> 369 (0.00 %) Scratch VGPRs: 1344 -> 1344 (0.00 %) dwords per thread Code Size: 35841268 -> 36009732 (0.47 %) bytes LDS: 767 -> 767 (0.00 %) blocks Max Waves: 222559 -> 224779 (1.00 %) Wait states: 0 -> 0 (0.00 %) v2: don't call load_input for fragment shaders in emit_declaration Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-09-14 12:33:00 +02:00
Marek Olšák	a491b9e945	radeonsi: don't use allocas for arrays with LLVM 3.8 It crashes. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97413	2016-08-25 21:19:17 +02:00
Marek Olšák	07ccec002b	radeonsi: initialize and finalize the LLVM function pass manager Reviewed-by: Tom Stellard <thomas.stellard@amd.com>	2016-08-18 21:36:03 +02:00
Nicolai Hähnle	c5798d6314	gallium/radeon: use lp_build_alloca_undef Avoid building all those store 0 / store undef instruction pairs that end up getting removed anyway. Reviewed-by: Roland Scheidegger <sroland@vmware.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-08-17 12:11:25 +02:00
Nicolai Hähnle	f4204ba53d	gallium/radeon: protect against out of bounds temporary array accesses They can lead to VM faults and worse, which goes against the GL robustness promises. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-08-17 12:11:24 +02:00
Nicolai Hähnle	ea283779be	gallium/radeon: add radeon_llvm_bound_index for bounds checking Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-08-17 12:11:24 +02:00
Nicolai Hähnle	8916d1e2fa	gallium/radeon: reduce alloca of temporaries based on usagemask v2: take actual writemasks into account Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-08-17 12:11:24 +02:00
Nicolai Hähnle	6bba956073	gallium/radeon: use tgsi_scan_arrays for temp arrays Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-08-17 12:11:23 +02:00
Nicolai Hähnle	7c2295d7ef	gallium/radeon: allocate temps array info in radeon_llvm_context_init Also, prepare for using tgsi_array_info. This also opens the door for properly handling allocation failures, but I'm leaving that for a separate change. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-08-17 12:11:23 +02:00
Nicolai Hähnle	850c8dcc9c	gallium/radeon: always do the full store in store_value_to_array Doing the write-back of the temporary vector in radeon_llvm_emit_store makes no sense. This also allows us to get rid of get_alloca_for_array. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-08-17 12:11:23 +02:00
Nicolai Hähnle	4b150931c9	gallium/radeon: extract common getelementptr logic into get_pointer_into_array Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-08-17 12:11:23 +02:00
Nicolai Hähnle	dfbb8ea284	gallium/radeon: pass indirect register info into get_alloca_for_array To have the same signature as get_array_range. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-08-17 12:11:23 +02:00
Nicolai Hähnle	b76aabffa2	gallium/radeon: extract common lookup code into get_temp_array function Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-08-17 12:11:23 +02:00
Nicolai Hähnle	fa84296a5a	gallium/radeon: clarify the comment on the array alloca heuristic Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-08-17 12:11:22 +02:00
Nicolai Hähnle	92b66b38c9	gallium/radeon: more descriptive names for LLVM temporaries in debug builds Reviewed-by: Tom Stellard <thomas.stellard@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-08-17 12:11:22 +02:00
Nicolai Hähnle	eacfc86d83	gallium/radeon: simplify radeon_llvm_emit_store for direct array addressing We can use the pointer stored in the temps array directly. Reviewed-by: Tom Stellard <thomas.stellard@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-08-17 12:11:22 +02:00
Nicolai Hähnle	87fa7cea23	gallium/radeon: simplify radeon_llvm_emit_fetch for direct array addressing We can use the pointer stored in the temps array directly. Reviewed-by: Tom Stellard <thomas.stellard@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-08-17 12:11:22 +02:00
Nicolai Hähnle	eb50cbf3bd	gallium/radeon: clean up emit_declaration for temporaries In the alloca'd array case, no longer create redundant and unused allocas for the individual elements; create getelementptrs instead. Reviewed-by: Tom Stellard <thomas.stellard@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-08-17 12:11:22 +02:00
Marek Olšák	c88b309fd5	radeonsi: don't set the last parameter component of llvm.AMDGPU.cube LLVM doesn't use it. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-08-03 17:46:46 +02:00
Marek Olšák	42c5f839ad	radeonsi: use llvm.amdgcn.cube* if available Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-08-03 17:46:46 +02:00
Marek Olšák	1fb6e55eaf	radeonsi: use llvm.amdgcn.rsq.f64 if available Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-08-03 17:46:46 +02:00
Marek Olšák	db2d31dab1	radeonsi: use v_mad_f32 for fma v_fma_f32 runs at FP64 rate (= slow). Alien Isolation and F1 2015 seem to use fma for all d3d multiply-add instructions, which is silly. This tries to restore performance for those games. The main difference between v_mad_f32 and v_fma_f32 is that v_mad doesn't support denormals, which we don't enable anyway, because they are slow too. Also, there is code size reduction: Totals from affected shaders: VGPRS: 109796 -> 109808 (0.01 %) Spilled SGPRs: 29995 -> 30022 (0.09 %) Spilled VGPRs: 12 -> 13 (8.33 %) <-- it's just one shader going from 12 to 13 Code Size: 6667596 -> 6476356 (-2.87 %) bytes Max Waves: 26931 -> 26899 (-0.12 %) I've not actually tested real performance. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-08-03 17:46:46 +02:00
Marek Olšák	c98c732158	radeon/llvm: Use alloca instructions for larger arrays [revert a revert] This reverts commit `f84e9d749f`. Bioshock Infinite no longer hangs.	2016-07-26 23:31:56 +02:00
Marek Olšák	f84e9d749f	Revert "radeon/llvm: Use alloca instructions for larger arrays" This reverts commit `513fccdfb6`. Bioshock Infinite hangs with that.	2016-07-14 22:15:08 +02:00
Marek Olšák	f2f573e777	gallium/radeon: normalize the code style no change in behavior Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-13 19:46:16 +02:00
Tom Stellard	513fccdfb6	radeon/llvm: Use alloca instructions for larger arrays We were storing arrays in vectors, which was leading to some really bad spill code for large arrays. allocas instructions are a better fit for arrays and LLVM optimizations are more geared toward dealing with allocas instead of vectors. For arrays that have 16 or less 32-bit elements, we will continue to use vectors, because this will force LLVM to store them in registers and use indirect registers, which is usually faster for small arrays. In the future we should use allocas for all arrays and teach LLVM how to store allocas in registers. This fixes the piglit test: spec/glsl-1.50/execution/geometry/max-input-component Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-07-06 19:47:38 +00:00
Tom Stellard	02873a7b0c	radeon/llvm: Add helpers for loading and storing data from arrays. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-07-06 19:47:38 +00:00
Tom Stellard	2dc48984b2	radeon/llvm: Remove uses_temp_indirect_addressing() function bld->indirect_files is never set, so this function always returns false. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-07-06 19:47:38 +00:00
Marek Olšák	eaccc4e8c8	radeonsi: keep using v_rcp_f32 for division in future LLVM (v2) This will be needed after some LLVM changes that haven't landed yet. v2: - use LLVMIsConstant to fix an LLVM assertion failure. LLVMSetMetadata doesn't work with constants. - don't set float metadata as string Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-05 00:47:12 +02:00
Marek Olšák	7db10093d3	gallium/radeon: boolean -> bool, TRUE -> true, FALSE -> false Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Vedran Miletić <vedran@miletic.net> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-06-25 23:13:42 +02:00
Marek Olšák	0e1fefa722	radeonsi: emit 1/sqrt for RSQ We don't need the clamped version and we don't have to use any intrinsic. Stats on Tonga: 15382 shaders in 9128 tests Totals: SGPRS: 1230560 -> 1230560 (0.00 %) VGPRS: 469577 -> 462504 (-1.51 %) Code Size: 22089908 -> 21730052 (-1.63 %) bytes LDS: 598 -> 598 (0.00 %) blocks Scratch: 283648 -> 281600 (-0.72 %) bytes per wave Max Waves: 125664 -> 126969 (1.04 %) Wait states: 0 -> 0 (0.00 %) Totals from affected shaders: SGPRS: 547280 -> 547280 (0.00 %) VGPRS: 269132 -> 262059 (-2.63 %) Code Size: 15709604 -> 15349748 (-2.29 %) bytes LDS: 198 -> 198 (0.00 %) blocks Scratch: 74752 -> 72704 (-2.74 %) bytes per wave Max Waves: 47840 -> 49145 (2.73 %) Wait states: 0 -> 0 (0.00 %) Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2016-06-24 12:31:03 +02:00
Dave Airlie	f550b6d296	radeonsi: convert to 64-bitness checks instead of doubles. This converts to testing for 64-bit types and renames some things in anticipation of 64-bit integer support. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2016-06-11 06:44:21 +10:00
Jan Vesely	47b390fe45	Treewide: Remove Elements() macro Signed-off-by: Jan Vesely <jano.vesely@gmail.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2016-05-17 15:28:04 -04:00
Emil Velikov	9fa2e57a73	gallium/radeon: nuke the final pre LLVM 3.6 codepath Missed with commit `100796c15c` "gallium/radeon: drop support for LLVM 3.5" v2: s/LLVN/LLVM/ in shortlog (Nicolai) Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1) Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-05-01 08:57:32 +01:00
Bas Nieuwenhuizen	84a6761ae3	radeonsi: add shared memory Declares the shared memory as a global variable so that LLVM is aware of it and it does not conflict with passes like AMDGPUPromoteAlloca. v2: - Use ctx->i8. - Dropped null-check for declare_memory_region. - Changed memory region array to single region. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>	2016-04-19 18:10:30 +02:00
Marek Olšák	ea2bff1d11	gallium/radeon: remove remnants of R600 TGSI->LLVM Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-03-20 00:57:05 +01:00
Marek Olšák	36202182ac	gallium/radeon: add basic code for setting shader return values LLVMBuildInsertValue will be used on return_value. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-02-21 21:08:57 +01:00
Tom Stellard	dc7cf07af3	radeon/llvm: Add TargetLibraryInfo to the pass manager This will prevent optimization passes from introducing unsupported library calls. Tested-by: Michel Dänzer <michel.daenzer@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-02-17 19:06:41 +00:00
Tom Stellard	4f351a6cb1	radeon/llvm: Set the target triple on the module Tested-by: Michel Dänzer <michel.daenzer@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-02-17 19:06:41 +00:00
Nicolai Hähnle	5aafc169ca	gallium/radeon: emit LLVM `ret void` before radeon_llvm_finalize_module This allows dumping a consumable LLVM module before the initial optimization passes are run. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-02-05 09:21:54 -05:00
Marek Olšák	bff640b3e0	radeonsi: implement PK2H and UP2H opcodes Based on a gallivm patch by Ilia Mirkin. +8 piglit regressions due to precision issues (I blame the tests) The benefit is that we'll get v_cvt_f32_f16 and v_cvt_f16_f32 instead of emulation with integer instructions. They are GLSL 4.00 intrinsics. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2016-02-04 19:52:28 +01:00
Marek Olšák	b3bac55621	radeonsi: change LLVM intrinsics for BREV, CLAMP, EX2 Requested by Matt Arsenault. Reviewed-by: Tom Stellard <thomas.stellard@amd.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-01-22 22:05:42 +01:00
Michel Dänzer	d094631936	radeon/llvm: Use llvm.AMDIL.exp intrinsic again for now llvm.exp2.f32 doesn't work in some cases yet. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92709 Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2015-11-24 18:07:48 +09:00

138 commits