fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-01-03 13:40:11 +01:00

Author	SHA1	Message	Date
Tom Stellard	dc7cf07af3	radeon/llvm: Add TargetLibraryInfo to the pass manager This will prevent optimization passes from introducing unsupported library calls. Tested-by: Michel Dänzer <michel.daenzer@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-02-17 19:06:41 +00:00
Tom Stellard	4f351a6cb1	radeon/llvm: Set the target triple on the module Tested-by: Michel Dänzer <michel.daenzer@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-02-17 19:06:41 +00:00
Nicolai Hähnle	5aafc169ca	gallium/radeon: emit LLVM `ret void` before radeon_llvm_finalize_module This allows dumping a consumable LLVM module before the initial optimization passes are run. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-02-05 09:21:54 -05:00
Marek Olšák	bff640b3e0	radeonsi: implement PK2H and UP2H opcodes Based on a gallivm patch by Ilia Mirkin. +8 piglit regressions due to precision issues (I blame the tests) The benefit is that we'll get v_cvt_f32_f16 and v_cvt_f16_f32 instead of emulation with integer instructions. They are GLSL 4.00 intrinsics. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2016-02-04 19:52:28 +01:00
Marek Olšák	b3bac55621	radeonsi: change LLVM intrinsics for BREV, CLAMP, EX2 Requested by Matt Arsenault. Reviewed-by: Tom Stellard <thomas.stellard@amd.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-01-22 22:05:42 +01:00
Michel Dänzer	d094631936	radeon/llvm: Use llvm.AMDIL.exp intrinsic again for now llvm.exp2.f32 doesn't work in some cases yet. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92709 Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2015-11-24 18:07:48 +09:00
Marek Olšák	7c10af6425	radeonsi: don't use the AMDGPU intrinsic for CMP No difference according to shader-db. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> Reviewed-by: Tom Stellard <thomas.stellard@amd.com>	2015-10-17 21:40:04 +02:00
Marek Olšák	f2cdb68c8b	radeonsi: use LRP from gallivm Totals: SGPRS: 344552 -> 344368 (-0.05 %) VGPRS: 197132 -> 197552 (0.21 %) Code Size: 7375376 -> 7366304 (-0.12 %) bytes LDS: 91 -> 91 (0.00 %) blocks Scratch: 1679360 -> 1615872 (-3.78 %) bytes per wave Totals from affected shaders: SGPRS: 47736 -> 47552 (-0.39 %) VGPRS: 27952 -> 28372 (1.50 %) Code Size: 1392724 -> 1383652 (-0.65 %) bytes LDS: 39 -> 39 (0.00 %) blocks Scratch: 513024 -> 449536 (-12.38 %) bytes per wave Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-10-17 21:40:04 +02:00
Marek Olšák	eb11efc989	radeonsi: don't emit AMDGPU intrinsics for integer abs, min, max No difference according to shader-db. (with the new S_ABS_I32 pattern) Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> Reviewed-by: Tom Stellard <thomas.stellard@amd.com>	2015-10-17 21:40:04 +02:00
Marek Olšák	d72a26ec5d	radeonsi: don't emit AMDGPU intrinsics for EX2, ROUND, TRUNC No difference according to shader-db. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> Reviewed-by: Tom Stellard <thomas.stellard@amd.com>	2015-10-17 21:40:04 +02:00
Marek Olšák	6660ca7121	radeonsi: initialize output, temp, and address registers to "undef" This removes "v_mov v0, 0" which typically occurs before exports. Totals: SGPRS: 345216 -> 344552 (-0.19 %) VGPRS: 197684 -> 197132 (-0.28 %) Code Size: 7390408 -> 7375376 (-0.20 %) bytes LDS: 91 -> 91 (0.00 %) blocks Scratch: 1842176 -> 1679360 (-8.84 %) bytes per wave Totals from affected shaders: SGPRS: 101336 -> 100672 (-0.66 %) VGPRS: 53920 -> 53368 (-1.02 %) Code Size: 2170176 -> 2155144 (-0.69 %) bytes LDS: 2 -> 2 (0.00 %) blocks Scratch: 1015808 -> 852992 (-16.03 %) bytes per wave Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> Reviewed-by: Tom Stellard <thomas.stellard@amd.com>	2015-10-17 21:40:03 +02:00
Marek Olšák	e6d3846dd0	gallium/radeon: drop support for LLVM 3.4 This allows using the new tex intrinsics unconditionally. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-09-10 17:14:15 +02:00
Marek Olšák	cc59c78b0a	gallium/radeon: always use the llvm. prefix in intrinsic names Acked-by: Michel Dänzer <michel.daenzer@amd.com> Reviewed-by: Tom Stellard <thomas.stellard@amd.com>	2015-08-06 20:44:35 +02:00
Marek Olšák	1bbe408363	radeonsi: don't use llvm.AMDIL.fraction for FRC and DFRAC There are 2 reasons for this: - LLVM optimization passes can work with floor - there are patterns to select v_fract from floor anyway There is no change in the generated code.	2015-07-31 16:49:16 +02:00
Marek Olšák	12a197b2d5	gallium/radeon: don't use rsq_action Reviewed-by: Dave Airlie <airlied@redhat.com>	2015-07-31 16:49:16 +02:00
Marek Olšák	681dbcf690	gallium/radeon: move r600-specific code to r600g Reviewed-by: Tom Stellard <thomas.stellard@amd.com>	2015-07-31 16:49:16 +02:00
Marek Olšák	9a4c57afe4	gallium/radeon: remove unused variables and old comments Reviewed-by: Dave Airlie <airlied@redhat.com>	2015-07-31 16:49:16 +02:00
Marek Olšák	b9dad585e6	gallium/radeon: remove build_intrinsic and build_tgsi_intrinsic duplicated now Reviewed-by: Dave Airlie <airlied@redhat.com>	2015-07-31 16:49:16 +02:00
Marek Olšák	9deb614cac	radeonsi: fix GLSL textureGrad(samplerCube*) functions +4 piglits Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-07-25 10:38:14 +02:00
Marek Olšák	1bc0fba572	gallium/radeon: expose emit_fetch Radeonsi will use this.	2015-07-23 00:59:31 +02:00
Marek Olšák	a3be59b4a9	gallium/radeon: expose LLVM functions implementing emit_store emit_store will be reimplemented for tessellation control shader outputs where only radeon_llvm_saturate will be used, but radeonsi will want to fall back to radeon_llvm_emit_store for other register types. This exposes both functions.	2015-07-23 00:59:31 +02:00
Dave Airlie	de5c2b6f2b	radeonsi: direct emit intrinsic for DFRAC. Michel reported this still failed, and this fixed it Signed-off-by: Dave Airlie <airlied@redhat.com>	2015-07-13 09:21:43 +01:00
Dave Airlie	4cbf0a0ccf	radeonsi: ARB_gpu_shader_fp64 + ARB_vertex_attrib_64bit support. This adds the translation from TGSI to AMDGPU llvm backend, for the 64-bit opcodes. The backend pretty much handles everything for us fine. There is one patch required for SI DFRAC support, that I know of. [airlied: fixed missing comma, updated relnotes] Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2015-07-12 22:40:51 +01:00
Marek Olšák	7116250b7a	radeon/llvm: reset temps_count on deallocation Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-05-29 11:52:44 +02:00
Marek Olšák	7afc992c20	radeon/llvm: don't use a static array size for radeon_llvm_context::arrays (v2) v2: - don't use realloc (tgsi_shader_info provides the size) Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-05-29 11:52:44 +02:00
Marek Olšák	e1c4e8aaaa	gallium: remove TGSI_SAT_MINUS_PLUS_ONE It's a remnant of some old NV extension. Unused. I also have a patch that removes predicates if anyone is interested. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2015-05-20 15:40:46 +02:00
Marek Olšák	ecc7f2ed91	gallium/radeon: don't crash when getting out-of-bounds TEMP references Reviewed-by: Tom Stellard <thomas.stellard@amd.com>	2015-04-23 16:14:39 +02:00
Tom Stellard	e0994e0f97	radeon/llvm: Improve codegen for KILL_IF Rather than emitting one kill instruction per component of KILL_IF's src reg, we now or the components of the src register together and use the result as a condition for just one kill instruction. shader-db stats (bonaire): 979 shaders Totals: SGPRS: 34872 -> 34848 (-0.07 %) VGPRS: 20696 -> 20676 (-0.10 %) Code Size: 749032 -> 748452 (-0.08 %) bytes LDS: 11 -> 11 (0.00 %) blocks Scratch: 12288 -> 12288 (0.00 %) bytes per wave Totals from affected shaders: SGPRS: 1184 -> 1160 (-2.03 %) VGPRS: 600 -> 580 (-3.33 %) Code Size: 13200 -> 12620 (-4.39 %) bytes LDS: 0 -> 0 (0.00 %) blocks Scratch: 0 -> 0 (0.00 %) bytes per wave Increases: SGPRS: 2 (0.00 %) VGPRS: 0 (0.00 %) Code Size: 0 (0.00 %) LDS: 0 (0.00 %) Scratch: 0 (0.00 %) Decreases: SGPRS: 5 (0.01 %) VGPRS: 5 (0.01 %) Code Size: 25 (0.03 %) LDS: 0 (0.00 %) Scratch: 0 (0.00 %) * BY PERCENTAGE * Max Increase: SGPRS: 32 -> 40 (25.00 %) VGPRS: 0 -> 0 (0.00 %) Code Size: 0 -> 0 (0.00 %) bytes LDS: 0 -> 0 (0.00 %) blocks Scratch: 0 -> 0 (0.00 %) bytes per wave Max Decrease: SGPRS: 32 -> 24 (-25.00 %) VGPRS: 16 -> 12 (-25.00 %) Code Size: 116 -> 96 (-17.24 %) bytes LDS: 0 -> 0 (0.00 %) blocks Scratch: 0 -> 0 (0.00 %) bytes per wave * BY UNIT * Max Increase: SGPRS: 64 -> 72 (12.50 %) VGPRS: 0 -> 0 (0.00 %) Code Size: 0 -> 0 (0.00 %) bytes LDS: 0 -> 0 (0.00 %) blocks Scratch: 0 -> 0 (0.00 %) bytes per wave Max Decrease: SGPRS: 32 -> 24 (-25.00 %) VGPRS: 16 -> 12 (-25.00 %) Code Size: 424 -> 356 (-16.04 %) bytes LDS: 0 -> 0 (0.00 %) blocks Scratch: 0 -> 0 (0.00 %) bytes per wave Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2015-04-14 13:37:12 +00:00
Tom Stellard	c6d79ed289	radeon/llvm: Run LLVM's instruction combining pass This should improve code quality in general and will help with some future changes to how we emit kill instructions. shader-db shows a few regressions, but these don't seem to be the result of deficiencies in instcombine. They're mostly caused by the scheduler making different decisions than before. shader-db stats (bonaire): 979 shaders Totals: SGPRS: 35056 -> 34872 (-0.52 %) VGPRS: 20624 -> 20696 (0.35 %) Code Size: 764372 -> 749032 (-2.01 %) bytes LDS: 11 -> 11 (0.00 %) blocks Scratch: 12288 -> 12288 (0.00 %) bytes per wave Totals from affected shaders: SGPRS: 13264 -> 13072 (-1.45 %) VGPRS: 8248 -> 8316 (0.82 %) Code Size: 486320 -> 470992 (-3.15 %) bytes LDS: 11 -> 11 (0.00 %) blocks Scratch: 11264 -> 11264 (0.00 %) bytes per wave Increases: SGPRS: 6 (0.01 %) VGPRS: 20 (0.02 %) Code Size: 14 (0.01 %) LDS: 0 (0.00 %) Scratch: 0 (0.00 %) Decreases: SGPRS: 32 (0.03 %) VGPRS: 8 (0.01 %) Code Size: 244 (0.25 %) LDS: 0 (0.00 %) Scratch: 0 (0.00 %) * BY PERCENTAGE * Max Increase: SGPRS: 32 -> 48 (50.00 %) VGPRS: 12 -> 20 (66.67 %) Code Size: 216 -> 224 (3.70 %) bytes LDS: 0 -> 0 (0.00 %) blocks Scratch: 0 -> 0 (0.00 %) bytes per wave Max Decrease: SGPRS: 40 -> 32 (-20.00 %) VGPRS: 16 -> 12 (-25.00 %) Code Size: 368 -> 280 (-23.91 %) bytes LDS: 0 -> 0 (0.00 %) blocks Scratch: 0 -> 0 (0.00 %) bytes per wave * BY UNIT * Max Increase: SGPRS: 32 -> 48 (50.00 %) VGPRS: 28 -> 36 (28.57 %) Code Size: 39320 -> 40132 (2.07 %) bytes LDS: 0 -> 0 (0.00 %) blocks Scratch: 0 -> 0 (0.00 %) bytes per wave Max Decrease: SGPRS: 72 -> 64 (-11.11 %) VGPRS: 48 -> 40 (-16.67 %) Code Size: 6272 -> 5852 (-6.70 %) bytes LDS: 0 -> 0 (0.00 %) blocks Scratch: 0 -> 0 (0.00 %) bytes per wave Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2015-04-14 13:37:05 +00:00
Marek Olšák	a984abdad3	radeonsi: increase coords array size for radeon_llvm_emit_prepare_cube_coords radeon_llvm_emit_prepare_cube_coords uses coords[4] in some cases (TXB2 etc.) Discovered by Coverity. Reported by Ilia Mirkin. Cc: 10.5 10.4 <mesa-stable@lists.freedesktop.org> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-03-18 12:04:27 +01:00
Marek Olšák	b5f19db976	radeonsi: implement TGSI_OPCODE_BFI (v2) v2: Don't use the intrinsics, the shader backend can recognize these patterns and generates optimal code automatically. Reviewed-by: Tom Stellard <thomas.stellard@amd.com>	2015-03-16 14:58:19 +01:00
Marek Olšák	955ebf2890	radeonsi: add support for easy opcodes from ARB_gpu_shader5 I have to use the BFE intrinsics, because BFE is one of the most complex instructions that can't be matched easily. BFE has 3 conditional branches and one of them is quite big. In the isel DAG, lowered BFE has 27 nodes (including leaves).	2015-03-16 12:54:18 +01:00
Marek Olšák	755a2907a3	radeonsi: implement bit-finding opcodes from ARB_gpu_shader5 Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>	2015-03-16 12:54:18 +01:00
Marek Olšák	f9fd0c4a55	radeonsi: add support for SQRT Reviewed-by: Tom Stellard <thomas.stellard@amd.com> Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>	2015-03-16 12:54:18 +01:00
Marek Olšák	d73c1c1304	radeonsi: add support for FMA Reviewed-by: Tom Stellard <thomas.stellard@amd.com> Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>	2015-03-16 12:54:18 +01:00
Marek Olšák	dfea35666e	gallium/radeon: don't use LLVMReadOnlyAttribute for ALU None of the instructions use a pointer argument. (+ small cosmetic changes) Reviewed-by: Tom Stellard <thomas.stellard@amd.com>	2015-03-16 12:54:18 +01:00
Marek Olšák	d1d2af2398	radeonsi: use ordered compares for SSG and face selection Ordered compares are what you have in C. Unordered compares are the result of negating ordered compares (they return true if either argument is NaN). That special NaN behavior is completely useless here, and unordered compares produce horrible code with all stable LLVM versions. (I think that has been fixed in LLVM git) Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-01-07 12:06:43 +01:00
Michel Dänzer	402ab50bed	radeon/llvm: Dynamically allocate branch/loop stack arrays This prevents us from silently overflowing the stack arrays, and allows arbitrary stack depths. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=85454 Cc: mesa-stable@lists.freedesktop.org Reported-and-Tested-by: Nick Sarnie <commendsarnex@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2014-10-29 19:01:25 +09:00
Marek Olšák	8067732740	radeonsi: remove shader->input[] and output[] arrays and dependencies They were reinventing tgsi_shader_info. They are unused now. radeon_llvm_context::load_input can be NULL if input fetching is implemented in some other way. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2014-10-12 23:53:57 +02:00
Tom Stellard	b9f501bc6b	radeon/llvm: Use the llvm.rsq.clamped intrinsic for RSQ Reviewed-and-Tested-by: Michel Dänzer <michel.daenzer@amd.com> Tested-by: Laurent Carlier <lordheavym@gmail.com> https://bugs.freedesktop.org/show_bug.cgi?id=80015 CC: "10.1 10.2" <mesa-stable@lists.freedesktop.org>	2014-07-02 14:59:29 -04:00
Michel Dänzer	93b6b1fa83	radeon/llvm: Adapt to AMDGPU.rsq intrinsic change in LLVM 3.5 Reviewed-by: Tom Stellard <thomas.stellard@amd.com> Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>	2014-06-19 09:58:03 -04:00
Marek Olšák	bd2df40a84	radeon/llvm: add support for non-scalar system values The sample position is one of them. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2014-05-10 13:58:46 +02:00
Marek Olšák	559af1df10	gallium/radeon: fix warnings	2014-02-06 17:43:29 +01:00
Michel Dänzer	404b29d765	radeonsi: Initial geometry shader support Partly based on the corresponding r600g work by Vadim Girlin and Dave Airlie. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2014-01-29 11:06:28 +09:00
Vincent Lejeune	797894036d	r600/llvm: Allow arbitrary amount of temps in tgsi to llvm	2013-12-07 18:39:10 +01:00
Aaron Watry	df482fe02f	radeon/llvm: fix spelling error Reviewed-by: Tom Stellard <thomas.stellard@amd.com> CC: "10.0" <mesa-stable@lists.freedesktop.org>	2013-11-15 09:16:49 -08:00
Marek Olšák	900b1863c8	radeon/llvm: fix TGSI_OPCODE_UCMP This doesn't fix any known issue (I haven't run piglit with this yet), but the code was obviously completely wrong. It looks like copy-pasted from CMP. Reviewed-by: Tom Stellard <thomas.stellard@amd.com>	2013-09-29 14:49:23 +02:00
Marek Olšák	028b26e2ef	radeon/llvm: fix shadow cube texturing for GL3.0 The fix is at the end (TGSI_TEXTURE_SHADOWCUBE handling), but I also restructured the code for it to be more readable. Fixes spec/!OpenGL 3.0/sampler-cube-shadow. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2013-09-25 20:45:23 +02:00
Roland Scheidegger	7727fbb7c5	r600/radeonsi: implement new float comparison instructions Also use ordered comparisons for old cmp instructions. Tested-by: Michel Dänzer <michel@daenzer.net> Reviewed-by: Tom Stellard <tom@stellard.net>	2013-08-15 00:40:14 +02:00
Brian Paul	46205ab8cc	tgsi: rename the TGSI fragment kill opcodes TGSI_OPCODE_KIL and KILP had confusing names. The former was conditional kill (if any src component < 0). The latter was unconditional kill. At one time KILP was supposed to work with NV-style condition codes/predicates but we never had that in TGSI. This patch renames both opcodes: TGSI_OPCODE_KIL -> KILL_IF (kill if src.xyzw < 0) TGSI_OPCODE_KILP -> KILL (unconditional kill) Note: I didn't just transpose the opcode names to help ensure that I didn't miss updating any code anywhere. I believe I've updated all the relevant code and comments but I'm not 100% sure that some drivers had this right in the first place. For example, the radeon driver might have llvm.AMDGPU.kill and llvm.AMDGPU.kilp mixed up. Driver authors should review their code. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-07-12 08:32:51 -06:00


94 commits