Commit graph

94 commits

Author SHA1 Message Date
Tom Stellard
dc7cf07af3 radeon/llvm: Add TargetLibraryInfo to the pass manager
This will prevent optimization passes from introducing unsupported
library calls.

Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-02-17 19:06:41 +00:00
Tom Stellard
4f351a6cb1 radeon/llvm: Set the target triple on the module
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-02-17 19:06:41 +00:00
Nicolai Hähnle
5aafc169ca gallium/radeon: emit LLVM ret void before radeon_llvm_finalize_module
This allows dumping a consumable LLVM module before the initial optimization
passes are run.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-02-05 09:21:54 -05:00
Marek Olšák
bff640b3e0 radeonsi: implement PK2H and UP2H opcodes
Based on a gallivm patch by Ilia Mirkin.

+8 piglit regressions due to precision issues (I blame the tests)

The benefit is that we'll get v_cvt_f32_f16 and v_cvt_f16_f32 instead
of emulation with integer instructions. They are GLSL 4.00 intrinsics.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2016-02-04 19:52:28 +01:00
Marek Olšák
b3bac55621 radeonsi: change LLVM intrinsics for BREV, CLAMP, EX2
Requested by Matt Arsenault.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-22 22:05:42 +01:00
Michel Dänzer
d094631936 radeon/llvm: Use llvm.AMDIL.exp intrinsic again for now
llvm.exp2.f32 doesn't work in some cases yet.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92709

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-24 18:07:48 +09:00
Marek Olšák
7c10af6425 radeonsi: don't use the AMDGPU intrinsic for CMP
No difference according to shader-db.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2015-10-17 21:40:04 +02:00
Marek Olšák
f2cdb68c8b radeonsi: use LRP from gallivm
Totals:
SGPRS: 344552 -> 344368 (-0.05 %)
VGPRS: 197132 -> 197552 (0.21 %)
Code Size: 7375376 -> 7366304 (-0.12 %) bytes
LDS: 91 -> 91 (0.00 %) blocks
Scratch: 1679360 -> 1615872 (-3.78 %) bytes per wave

Totals from affected shaders:
SGPRS: 47736 -> 47552 (-0.39 %)
VGPRS: 27952 -> 28372 (1.50 %)
Code Size: 1392724 -> 1383652 (-0.65 %) bytes
LDS: 39 -> 39 (0.00 %) blocks
Scratch: 513024 -> 449536 (-12.38 %) bytes per wave

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-17 21:40:04 +02:00
Marek Olšák
eb11efc989 radeonsi: don't emit AMDGPU intrinsics for integer abs, min, max
No difference according to shader-db. (with the new S_ABS_I32 pattern)

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2015-10-17 21:40:04 +02:00
Marek Olšák
d72a26ec5d radeonsi: don't emit AMDGPU intrinsics for EX2, ROUND, TRUNC
No difference according to shader-db.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2015-10-17 21:40:04 +02:00
Marek Olšák
6660ca7121 radeonsi: initialize output, temp, and address registers to "undef"
This removes "v_mov v0, 0" which typically occurs before exports.

Totals:
SGPRS: 345216 -> 344552 (-0.19 %)
VGPRS: 197684 -> 197132 (-0.28 %)
Code Size: 7390408 -> 7375376 (-0.20 %) bytes
LDS: 91 -> 91 (0.00 %) blocks
Scratch: 1842176 -> 1679360 (-8.84 %) bytes per wave

Totals from affected shaders:
SGPRS: 101336 -> 100672 (-0.66 %)
VGPRS: 53920 -> 53368 (-1.02 %)
Code Size: 2170176 -> 2155144 (-0.69 %) bytes
LDS: 2 -> 2 (0.00 %) blocks
Scratch: 1015808 -> 852992 (-16.03 %) bytes per wave

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2015-10-17 21:40:03 +02:00
Marek Olšák
e6d3846dd0 gallium/radeon: drop support for LLVM 3.4
This allows using the new tex instrinsics unconditionally.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-09-10 17:14:15 +02:00
Marek Olšák
cc59c78b0a gallium/radeon: always use the llvm. prefix in intrinsic names
Acked-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2015-08-06 20:44:35 +02:00
Marek Olšák
1bbe408363 radeonsi: don't use llvm.AMDIL.fraction for FRC and DFRAC
There are 2 reasons for this:
- LLVM optimization passes can work with floor
- there are patterns to select v_fract from floor anyway

There is no change in the generated code.
2015-07-31 16:49:16 +02:00
Marek Olšák
12a197b2d5 gallium/radeon: don't use rsq_action
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-07-31 16:49:16 +02:00
Marek Olšák
681dbcf690 gallium/radeon: move r600-specific code to r600g
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2015-07-31 16:49:16 +02:00
Marek Olšák
9a4c57afe4 gallium/radeon: remove unused variables and old comments
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-07-31 16:49:16 +02:00
Marek Olšák
b9dad585e6 gallium/radeon: remove build_intrinsic and build_tgsi_intrinsic
duplicated now

Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-07-31 16:49:16 +02:00
Marek Olšák
9deb614cac radeonsi: fix GLSL textureGrad(samplerCube*) functions
+4 piglits

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-07-25 10:38:14 +02:00
Marek Olšák
1bc0fba572 gallium/radeon: expose emit_fetch
Radeonsi will use this.
2015-07-23 00:59:31 +02:00
Marek Olšák
a3be59b4a9 gallium/radeon: expose LLVM functions implementing emit_store
emit_store will be reimplemented for tessellation control shader outputs
where only radeon_llvm_saturate will be used, but radeonsi will want to
fall back to radeon_llvm_emit_store for other register types.

This exposes both functions.
2015-07-23 00:59:31 +02:00
Dave Airlie
de5c2b6f2b radeonsi: direct emit intrinsic for DFRAC.
Michel reported this still failed, and this fixed it

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-07-13 09:21:43 +01:00
Dave Airlie
4cbf0a0ccf radeonsi: ARB_gpu_shader_fp64 + ARB_vertex_attrib_64bit support.
This adds the translation from TGSI to AMDGPU llvm backend, for the
64-bit opcodes. The backend pretty much handles everything for us
fine. There is one patch required for SI DFRAC support, that I know
off.

[airlied: fixed missing comma, updated relnotes]

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-07-12 22:40:51 +01:00
Marek Olšák
7116250b7a radeon/llvm: reset temps_count on deallocation
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-05-29 11:52:44 +02:00
Marek Olšák
7afc992c20 radeon/llvm: don't use a static array size for radeon_llvm_context::arrays (v2)
v2: - don't use realloc (tgsi_shader_info provides the size)

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-05-29 11:52:44 +02:00
Marek Olšák
e1c4e8aaaa gallium: remove TGSI_SAT_MINUS_PLUS_ONE
It's a remnant of some old NV extension. Unused.

I also have a patch that removes predicates if anyone is interested.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2015-05-20 15:40:46 +02:00
Marek Olšák
ecc7f2ed91 gallium/radeon: don't crash when getting out-of-bounds TEMP references
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2015-04-23 16:14:39 +02:00
Tom Stellard
e0994e0f97 radeon/llvm: Improve codegen for KILL_IF
Rather than emitting one kill instruction per component of KILL_IF's src
reg, we now or the components of the src register together and use the
result as a condition for just one kill instruction.

shader-db stats (bonaire):

979 shaders
Totals:
SGPRS: 34872 -> 34848 (-0.07 %)
VGPRS: 20696 -> 20676 (-0.10 %)
Code Size: 749032 -> 748452 (-0.08 %) bytes
LDS: 11 -> 11 (0.00 %) blocks
Scratch: 12288 -> 12288 (0.00 %) bytes per wave

Totals from affected shaders:
SGPRS: 1184 -> 1160 (-2.03 %)
VGPRS: 600 -> 580 (-3.33 %)
Code Size: 13200 -> 12620 (-4.39 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Scratch: 0 -> 0 (0.00 %) bytes per wave

Increases:
SGPRS: 2 (0.00 %)
VGPRS: 0 (0.00 %)
Code Size: 0 (0.00 %)
LDS: 0 (0.00 %)
Scratch: 0 (0.00 %)

Decreases:
SGPRS: 5 (0.01 %)
VGPRS: 5 (0.01 %)
Code Size: 25 (0.03 %)
LDS: 0 (0.00 %)
Scratch: 0 (0.00 %)

*** BY PERCENTAGE ***

Max Increase:

SGPRS: 32 -> 40 (25.00 %)
VGPRS: 0 -> 0 (0.00 %)
Code Size: 0 -> 0 (0.00 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Scratch: 0 -> 0 (0.00 %) bytes per wave

Max Decrease:

SGPRS: 32 -> 24 (-25.00 %)
VGPRS: 16 -> 12 (-25.00 %)
Code Size: 116 -> 96 (-17.24 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Scratch: 0 -> 0 (0.00 %) bytes per wave

*** BY UNIT ***

Max Increase:

SGPRS: 64 -> 72 (12.50 %)
VGPRS: 0 -> 0 (0.00 %)
Code Size: 0 -> 0 (0.00 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Scratch: 0 -> 0 (0.00 %) bytes per wave

Max Decrease:

SGPRS: 32 -> 24 (-25.00 %)
VGPRS: 16 -> 12 (-25.00 %)
Code Size: 424 -> 356 (-16.04 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Scratch: 0 -> 0 (0.00 %) bytes per wave

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-04-14 13:37:12 +00:00
Tom Stellard
c6d79ed289 radeon/llvm: Run LLVM's instruction combining pass
This should improve code quality in general and will help with some
future changes to how we emit kill instructions.

shader-db shows a few regressions, but these don't seem to be the result
of deficiencies in instcombine.  They're mostly caused by the scheduler
making different decisions than before.

shader-db stats (bonaire):

979 shaders
Totals:
SGPRS: 35056 -> 34872 (-0.52 %)
VGPRS: 20624 -> 20696 (0.35 %)
Code Size: 764372 -> 749032 (-2.01 %) bytes
LDS: 11 -> 11 (0.00 %) blocks
Scratch: 12288 -> 12288 (0.00 %) bytes per wave

Totals from affected shaders:
SGPRS: 13264 -> 13072 (-1.45 %)
VGPRS: 8248 -> 8316 (0.82 %)
Code Size: 486320 -> 470992 (-3.15 %) bytes
LDS: 11 -> 11 (0.00 %) blocks
Scratch: 11264 -> 11264 (0.00 %) bytes per wave

Increases:
SGPRS: 6 (0.01 %)
VGPRS: 20 (0.02 %)
Code Size: 14 (0.01 %)
LDS: 0 (0.00 %)
Scratch: 0 (0.00 %)

Decreases:
SGPRS: 32 (0.03 %)
VGPRS: 8 (0.01 %)
Code Size: 244 (0.25 %)
LDS: 0 (0.00 %)
Scratch: 0 (0.00 %)

*** BY PERCENTAGE ***

Max Increase:

SGPRS: 32 -> 48 (50.00 %)
VGPRS: 12 -> 20 (66.67 %)
Code Size: 216 -> 224 (3.70 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Scratch: 0 -> 0 (0.00 %) bytes per wave

Max Decrease:

SGPRS: 40 -> 32 (-20.00 %)
VGPRS: 16 -> 12 (-25.00 %)
Code Size: 368 -> 280 (-23.91 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Scratch: 0 -> 0 (0.00 %) bytes per wave

*** BY UNIT ***

Max Increase:

SGPRS: 32 -> 48 (50.00 %)
VGPRS: 28 -> 36 (28.57 %)
Code Size: 39320 -> 40132 (2.07 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Scratch: 0 -> 0 (0.00 %) bytes per wave

Max Decrease:

SGPRS: 72 -> 64 (-11.11 %)
VGPRS: 48 -> 40 (-16.67 %)
Code Size: 6272 -> 5852 (-6.70 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Scratch: 0 -> 0 (0.00 %) bytes per wave

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-04-14 13:37:05 +00:00
Marek Olšák
a984abdad3 radeonsi: increase coords array size for radeon_llvm_emit_prepare_cube_coords
radeon_llvm_emit_prepare_cube_coords uses coords[4] in some cases (TXB2 etc.)

Discovered by Coverity. Reported by Ilia Mirkin.

Cc: 10.5 10.4 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-03-18 12:04:27 +01:00
Marek Olšák
b5f19db976 radeonsi: implement TGSI_OPCODE_BFI (v2)
v2: Don't use the intrinsics, the shader backend can recognize these
    patterns and generates optimal code automatically.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2015-03-16 14:58:19 +01:00
Marek Olšák
955ebf2890 radeonsi: add support for easy opcodes from ARB_gpu_shader5
I have to use the BFE instrinsics, because BFE is one of the most complex
instructions that can't be matched easily. BFE has 3 conditional branches
and one of them is quite big.

In the isel DAG, lowered BFE has 27 nodes (including leafs).
2015-03-16 12:54:18 +01:00
Marek Olšák
755a2907a3 radeonsi: implement bit-finding opcodes from ARB_gpu_shader5
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
2015-03-16 12:54:18 +01:00
Marek Olšák
f9fd0c4a55 radeonsi: add support for SQRT
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
2015-03-16 12:54:18 +01:00
Marek Olšák
d73c1c1304 radeonsi: add support for FMA
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
2015-03-16 12:54:18 +01:00
Marek Olšák
dfea35666e gallium/radeon: don't use LLVMReadOnlyAttribute for ALU
None of the instructions use a pointer argument.
(+ small cosmetic changes)

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2015-03-16 12:54:18 +01:00
Marek Olšák
d1d2af2398 radeonsi: use ordered compares for SSG and face selection
Ordered compares are what you have in C. Unordered compares are the result
of negating ordered compares (they return true if either argument is NaN).

That special NaN behavior is completely useless here, and unordered
compares produce horrible code with all stable LLVM versions.
(I think that has been fixed in LLVM git)

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-01-07 12:06:43 +01:00
Michel Dänzer
402ab50bed radeon/llvm: Dynamically allocate branch/loop stack arrays
This prevents us from silently overflowing the stack arrays, and allows
arbitrary stack depths.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=85454

Cc: mesa-stable@lists.freedesktop.org
Reported-and-Tested-by: Nick Sarnie <commendsarnex@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2014-10-29 19:01:25 +09:00
Marek Olšák
8067732740 radeonsi: remove shader->input[] and output[] arrays and dependencies
They were reinventing tgsi_shader_info. They are unused now.

radeon_llvm_context::load_input can be NULL if input fetching is implemented
in some other way.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2014-10-12 23:53:57 +02:00
Tom Stellard
b9f501bc6b radeon/llvm: Use the llvm.rsq.clamped intrinsic for RSQ
Reviewed-and-Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Tested-by: Laurent Carlier <lordheavym@gmail.com>

https://bugs.freedesktop.org/show_bug.cgi?id=80015

CC: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
2014-07-02 14:59:29 -04:00
Michel Dänzer
93b6b1fa83 radeon/llvm: Adapt to AMDGPU.rsq intrinsic change in LLVM 3.5
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
2014-06-19 09:58:03 -04:00
Marek Olšák
bd2df40a84 radeon/llvm: add support for non-scalar system values
The sample position is one of them.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2014-05-10 13:58:46 +02:00
Marek Olšák
559af1df10 gallium/radeon: fix warnings 2014-02-06 17:43:29 +01:00
Michel Dänzer
404b29d765 radeonsi: Initial geometry shader support
Partly based on the corresponding r600g work by Vadim Girlin and Dave
Airlie.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2014-01-29 11:06:28 +09:00
Vincent Lejeune
797894036d r600/llvm: Allow arbitrary amount of temps in tgsi to llvm 2013-12-07 18:39:10 +01:00
Aaron Watry
df482fe02f radeon/llvm: fix spelling error
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>

CC: "10.0" <mesa-stable@lists.freedesktop.org>
2013-11-15 09:16:49 -08:00
Marek Olšák
900b1863c8 radeon/llvm: fix TGSI_OPCODE_UCMP
This doesn't fix any known issue (I haven't run piglit with this yet),
but the code was obviously completely wrong. It looks like copy-pasted from CMP.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2013-09-29 14:49:23 +02:00
Marek Olšák
028b26e2ef radeon/llvm: fix shadow cube texturing for GL3.0
The fix is at the end (TGSI_TEXTURE_SHADOWCUBE handling), but I also
restructured the code for it to be more readable.

Fixes spec/!OpenGL 3.0/sampler-cube-shadow.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2013-09-25 20:45:23 +02:00
Roland Scheidegger
7727fbb7c5 r600/radeonsi: implement new float comparison instructions
Also use ordered comparisons for old cmp instructions.

Tested-by: Michel Dänzer <michel@daenzer.net>
Reviewed-by: Tom Stellard <tom@stellard.net>
2013-08-15 00:40:14 +02:00
Brian Paul
46205ab8cc tgsi: rename the TGSI fragment kill opcodes
TGSI_OPCODE_KIL and KILP had confusing names.  The former was conditional
kill (if any src component < 0).  The later was unconditional kill.
At one time KILP was supposed to work with NV-style condition
codes/predicates but we never had that in TGSI.

This patch renames both opcodes:
  TGSI_OPCODE_KIL -> KILL_IF   (kill if src.xyzw < 0)
  TGSI_OPCODE_KILP -> KILL     (unconditional kill)

Note: I didn't just transpose the opcode names to help ensure that I
didn't miss updating any code anywhere.

I believe I've updated all the relevant code and comments but I'm
not 100% sure that some drivers had this right in the first place.
For example, the radeon driver might have llvm.AMDGPU.kill and
llvm.AMDGPU.kilp mixed up.  Driver authors should review their code.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-07-12 08:32:51 -06:00