138 commits

Author SHA1 Message Date
Nicolai Hähnle
1e9476e8c5 gallium/radeon: fix argument type of llvm.{cttz,ctlz}.i32 intrinsics
Caught by R600_DEBUG=checkir (next commit).

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-04 16:39:28 +02:00
Nicolai Hähnle
1b6fb88ab2 gallium/radeon: unify the creation of basic blocks
This changes the order of basic blocks to be equal to the order of code in the
original TGSI, which is nice for making sense of shader dumps.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-04 16:39:25 +02:00
Nicolai Hähnle
d377f4c1ca gallium/radeon: merge branch and loop flow control stacks
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-04 16:39:21 +02:00
Nicolai Hähnle
b0d50e157d gallium/radeon: simplify if/else/endif blocks
In particular, we no longer emit an else block when there is no ELSE
instruction.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-04 16:39:18 +02:00
Nicolai Hähnle
89e9de2ea6 gallium/radeon: label basic blocks by the corresponding TGSI pc
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-04 16:39:15 +02:00
Nicolai Hähnle
6f87d7a146 gallium/radeon: cleanup and fix branch emits
Some of the existing code is needlessly complicated. The basic principle
should be: control-flow opcodes emit branches to properly terminate the
current block, _unless_ the current block already has a terminator (which
happens if and only if there was a BRK or CONT).

This also fixes a bug where multiple terminators were created in a block.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97887
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-04 16:39:10 +02:00
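
The rule stated in 6f87d7a146 (emit a branch to terminate the current block unless it already has a terminator) can be sketched with a toy model. This is only an illustration: `BasicBlock`, `has_terminator` and `emit_default_branch` are hypothetical names, not the driver's or LLVM's API, and the real code works on LLVM basic blocks.

```python
from dataclasses import dataclass, field


@dataclass
class BasicBlock:
    """Hypothetical stand-in for an LLVM basic block: a label plus instructions."""
    label: str
    instructions: list = field(default_factory=list)

    def has_terminator(self) -> bool:
        # In this toy model only "br" instructions terminate a block.
        return bool(self.instructions) and self.instructions[-1][0] == "br"


def emit_default_branch(block: BasicBlock, target: BasicBlock) -> None:
    """Terminate `block` with a branch to `target` unless it is already terminated."""
    if not block.has_terminator():
        block.instructions.append(("br", target.label))


then_block = BasicBlock("then")
plain_block = BasicBlock("plain")
endif_block = BasicBlock("endif")
endloop_block = BasicBlock("endloop")

# A BRK inside the IF body already branches out of the loop ...
then_block.instructions.append(("br", endloop_block.label))
# ... so the ENDIF handling must not add a second terminator.
emit_default_branch(then_block, endif_block)

# A block without BRK/CONT still needs its fall-through branch.
emit_default_branch(plain_block, endif_block)
```

With the guard in place, `then_block` keeps its single branch to `endloop`, which is the multiple-terminator situation the commit message says it fixes.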
Dave Airlie
4207612f9c radeonsi: prepare 64-bit integer support. (v2)
v2:
- no PIPE_CAP_INT64 yet
- emit DIV/MOD without the divide-by-zero workaround

Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-21 10:24:38 +02:00
Marek Olšák
ab29788250 radeonsi: reload PS inputs with direct indexing at each use (v2)
The LLVM compiler can CSE interp intrinsics thanks to
LLVMReadNoneAttribute.

26011 shaders in 14651 tests
Totals:
SGPRS: 1146340 -> 1132676 (-1.19 %)
VGPRS: 727371 -> 711730 (-2.15 %)
Spilled SGPRs: 2218 -> 2078 (-6.31 %)
Spilled VGPRs: 369 -> 369 (0.00 %)
Scratch VGPRs: 1344 -> 1344 (0.00 %) dwords per thread
Code Size: 35841268 -> 36009732 (0.47 %) bytes
LDS: 767 -> 767 (0.00 %) blocks
Max Waves: 222559 -> 224779 (1.00 %)
Wait states: 0 -> 0 (0.00 %)

v2: don't call load_input for fragment shaders in emit_declaration

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-14 12:33:00 +02:00
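
The percentages in the statistics above appear consistent with a plain relative change, (after - before) / before. The sketch below recomputes a few of them under that assumption; `percent_change` and `format_stat` are hypothetical helpers, not part of any Mesa or shader-db tooling.

```python
def percent_change(before: int, after: int) -> float:
    """Relative change in percent; a 0 -> 0 row is reported as 0.0 (assumption)."""
    if before == 0:
        return 0.0
    return (after - before) / before * 100.0


def format_stat(name: str, before: int, after: int) -> str:
    """Render one row in the 'NAME: before -> after (x.xx %)' layout used above."""
    return f"{name}: {before} -> {after} ({percent_change(before, after):.2f} %)"


# Values copied from the commit message of ab29788250.
rows = [
    format_stat("SGPRS", 1146340, 1132676),
    format_stat("VGPRS", 727371, 711730),
    format_stat("Max Waves", 222559, 224779),
    format_stat("Wait states", 0, 0),
]
for row in rows:
    print(row)
```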
Marek Olšák
a491b9e945 radeonsi: don't use allocas for arrays with LLVM 3.8
It crashes.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97413
2016-08-25 21:19:17 +02:00
Marek Olšák
07ccec002b radeonsi: initialize and finalize the LLVM function pass manager
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2016-08-18 21:36:03 +02:00
Nicolai Hähnle
c5798d6314 gallium/radeon: use lp_build_alloca_undef
Avoid building all those store 0 / store undef instruction pairs that
end up getting removed anyway.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-17 12:11:25 +02:00
Nicolai Hähnle
f4204ba53d gallium/radeon: protect against out of bounds temporary array accesses
They can lead to VM faults and worse, which goes against the GL robustness
promises.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-17 12:11:24 +02:00
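
One common way to keep an indirect index inside an array is to force it into the valid range before the access. The sketch below shows that idea on plain integers. It is an assumption about the general technique, not a description of what `radeon_llvm_bound_index` actually emits, and `bound_index` is a hypothetical name.

```python
def bound_index(index: int, num: int) -> int:
    """Force `index` (treated as an unsigned 32-bit value) into [0, num - 1]."""
    if num <= 0:
        raise ValueError("array must have at least one element")
    index &= 0xFFFFFFFF  # model unsigned 32-bit wraparound
    max_index = num - 1
    if num & max_index == 0:
        # Power-of-two size: a single bit-wise AND keeps the index in range.
        return index & max_index
    # Otherwise clamp out-of-range indices to the last element.
    return index if index <= max_index else max_index
```

Either branch guarantees the resulting address stays within the array, at the cost of reading or writing a different element than the shader asked for when the index is out of range.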
Nicolai Hähnle
ea283779be gallium/radeon: add radeon_llvm_bound_index for bounds checking
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-17 12:11:24 +02:00
Nicolai Hähnle
8916d1e2fa gallium/radeon: reduce alloca of temporaries based on usagemask
v2: take actual writemasks into account

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-17 12:11:24 +02:00
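
As a rough sketch of the idea in 8916d1e2fa, storage can be limited to the channels that are actually written. The bit layout (x = 1, y = 2, z = 4, w = 8) follows the usual TGSI writemask convention; `channels_to_allocate` and `count_allocas` are hypothetical helpers and do not mirror the driver code.

```python
from typing import Dict, List

CHANNELS = "xyzw"  # bit 0 = x, bit 1 = y, bit 2 = z, bit 3 = w


def channels_to_allocate(writemask: int) -> List[str]:
    """Return the channels of one temporary that need storage."""
    return [name for bit, name in enumerate(CHANNELS) if writemask & (1 << bit)]


def count_allocas(temp_writemasks: Dict[int, int]) -> int:
    """Total number of per-channel allocations for a set of temporaries."""
    return sum(len(channels_to_allocate(mask)) for mask in temp_writemasks.values())


# TEMP[0] writes all four channels, TEMP[1] only .x, TEMP[2] is never written.
masks = {0: 0b1111, 1: 0b0001, 2: 0b0000}
total = count_allocas(masks)
```

In this example only 5 channel slots are needed instead of the 12 that a full vec4 per temporary would take.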
Nicolai Hähnle
6bba956073 gallium/radeon: use tgsi_scan_arrays for temp arrays
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-17 12:11:23 +02:00
Nicolai Hähnle
7c2295d7ef gallium/radeon: allocate temps array info in radeon_llvm_context_init
Also, prepare for using tgsi_array_info.

This also opens the door for properly handling allocation failures, but I'm
leaving that for a separate change.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-17 12:11:23 +02:00
Nicolai Hähnle
850c8dcc9c gallium/radeon: always do the full store in store_value_to_array
Doing the write-back of the temporary vector in radeon_llvm_emit_store makes
no sense.

This also allows us to get rid of get_alloca_for_array.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-17 12:11:23 +02:00
Nicolai Hähnle
4b150931c9 gallium/radeon: extract common getelementptr logic into get_pointer_into_array
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-17 12:11:23 +02:00
Nicolai Hähnle
dfbb8ea284 gallium/radeon: pass indirect register info into get_alloca_for_array
To have the same signature as get_array_range.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-17 12:11:23 +02:00
Nicolai Hähnle
b76aabffa2 gallium/radeon: extract common lookup code into get_temp_array function
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-17 12:11:23 +02:00
Nicolai Hähnle
fa84296a5a gallium/radeon: clarify the comment on the array alloca heuristic
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-17 12:11:22 +02:00
Nicolai Hähnle
92b66b38c9 gallium/radeon: more descriptive names for LLVM temporaries in debug builds
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-17 12:11:22 +02:00
Nicolai Hähnle
eacfc86d83 gallium/radeon: simplify radeon_llvm_emit_store for direct array addressing
We can use the pointer stored in the temps array directly.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-17 12:11:22 +02:00
Nicolai Hähnle
87fa7cea23 gallium/radeon: simplify radeon_llvm_emit_fetch for direct array addressing
We can use the pointer stored in the temps array directly.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-17 12:11:22 +02:00
Nicolai Hähnle
eb50cbf3bd gallium/radeon: clean up emit_declaration for temporaries
In the alloca'd array case, no longer create redundant and unused allocas
for the individual elements; create getelementptrs instead.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-17 12:11:22 +02:00
Marek Olšák
c88b309fd5 radeonsi: don't set the last parameter component of llvm.AMDGPU.cube
LLVM doesn't use it.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-03 17:46:46 +02:00
Marek Olšák
42c5f839ad radeonsi: use llvm.amdgcn.cube* if available
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-03 17:46:46 +02:00
Marek Olšák
1fb6e55eaf radeonsi: use llvm.amdgcn.rsq.f64 if available
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-03 17:46:46 +02:00
Marek Olšák
db2d31dab1 radeonsi: use v_mad_f32 for fma
v_fma_f32 runs at FP64 rate (= slow). Alien Isolation and F1 2015 seem
to use fma for all d3d multiply-add instructions, which is silly.

This tries to restore performance for those games.

The main difference between v_mad_f32 and v_fma_f32 is that v_mad doesn't
support denormals, which we don't enable anyway, because they are slow too.

Also, there is code size reduction:
  Totals from affected shaders:
  VGPRS: 109796 -> 109808 (0.01 %)
  Spilled SGPRs: 29995 -> 30022 (0.09 %)
  Spilled VGPRs: 12 -> 13 (8.33 %) <-- it's just one shader going from 12 to 13
  Code Size: 6667596 -> 6476356 (-2.87 %) bytes
  Max Waves: 26931 -> 26899 (-0.12 %)

I've not actually tested real performance.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-03 17:46:46 +02:00
Marek Olšák
c98c732158 radeon/llvm: Use alloca instructions for larger arrays [revert a revert]
This reverts commit f84e9d749f.

Bioshock Infinite no longer hangs.
2016-07-26 23:31:56 +02:00
Marek Olšák
f84e9d749f Revert "radeon/llvm: Use alloca instructions for larger arrays"
This reverts commit 513fccdfb6.

Bioshock Infinite hangs with that.
2016-07-14 22:15:08 +02:00
Marek Olšák
f2f573e777 gallium/radeon: normalize the code style
no change in behavior

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-13 19:46:16 +02:00
Tom Stellard
513fccdfb6 radeon/llvm: Use alloca instructions for larger arrays
We were storing arrays in vectors, which was leading to some really bad
spill code for large arrays.  alloca instructions are a better fit for
arrays and LLVM optimizations are more geared toward dealing with
allocas instead of vectors.

For arrays that have 16 or fewer 32-bit elements, we will continue to use
vectors, because this will force LLVM to store them in registers and
use indirect registers, which is usually faster for small arrays.

In the future we should use allocas for all arrays and teach LLVM
how to store allocas in registers.

This fixes the piglit test:

spec/glsl-1.50/execution/geometry/max-input-component

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-07-06 19:47:38 +00:00
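
The size heuristic described in 513fccdfb6 can be sketched as a simple threshold check. The threshold of 16 comes from the commit message; the function names, and the assumption that each TGSI register contributes four 32-bit elements, are illustrative only.

```python
MAX_VECTOR_ELEMENTS = 16  # threshold stated in the commit message


def tgsi_array_elements(num_registers: int) -> int:
    """Assumption: each TGSI register holds four 32-bit channels."""
    return num_registers * 4


def storage_for_array(num_elements: int) -> str:
    """Pick 'vector' for small arrays and 'alloca' for larger ones."""
    return "vector" if num_elements <= MAX_VECTOR_ELEMENTS else "alloca"


small = storage_for_array(tgsi_array_elements(4))  # 16 elements
large = storage_for_array(tgsi_array_elements(5))  # 20 elements
```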
Tom Stellard
02873a7b0c radeon/llvm: Add helpers for loading and storing data from arrays.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-07-06 19:47:38 +00:00
Tom Stellard
2dc48984b2 radeon/llvm: Remove uses_temp_indirect_addressing() function
bld->indirect_files is never set, so this function always returns false.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-07-06 19:47:38 +00:00
Marek Olšák
eaccc4e8c8 radeonsi: keep using v_rcp_f32 for division in future LLVM (v2)
This will be needed after some LLVM changes that haven't landed yet.

v2: - use LLVMIsConstant to fix an LLVM assertion failure.
      LLVMSetMetadata doesn't work with constants.
    - don't set float metadata as string

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-05 00:47:12 +02:00
Marek Olšák
7db10093d3 gallium/radeon: boolean -> bool, TRUE -> true, FALSE -> false
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Vedran Miletić <vedran@miletic.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-25 23:13:42 +02:00
Marek Olšák
0e1fefa722 radeonsi: emit 1/sqrt for RSQ
We don't need the clamped version and we don't have to use any intrinsic.

Stats on Tonga:

15382 shaders in 9128 tests
Totals:
SGPRS: 1230560 -> 1230560 (0.00 %)
VGPRS: 469577 -> 462504 (-1.51 %)
Code Size: 22089908 -> 21730052 (-1.63 %) bytes
LDS: 598 -> 598 (0.00 %) blocks
Scratch: 283648 -> 281600 (-0.72 %) bytes per wave
Max Waves: 125664 -> 126969 (1.04 %)
Wait states: 0 -> 0 (0.00 %)

Totals from affected shaders:
SGPRS: 547280 -> 547280 (0.00 %)
VGPRS: 269132 -> 262059 (-2.63 %)
Code Size: 15709604 -> 15349748 (-2.29 %) bytes
LDS: 198 -> 198 (0.00 %) blocks
Scratch: 74752 -> 72704 (-2.74 %) bytes per wave
Max Waves: 47840 -> 49145 (2.73 %)
Wait states: 0 -> 0 (0.00 %)

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2016-06-24 12:31:03 +02:00
Dave Airlie
f550b6d296 radeonsi: convert to 64-bitness checks instead of doubles.
This converts to testing for 64-bit types and renames some things
in anticipation of 64-bit integer support.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-11 06:44:21 +10:00
Jan Vesely
47b390fe45 Treewide: Remove Elements() macro
Signed-off-by: Jan Vesely <jano.vesely@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-05-17 15:28:04 -04:00
Emil Velikov
9fa2e57a73 gallium/radeon: nuke the final pre LLVM 3.6 codepath
Missed with commit 100796c15c "gallium/radeon: drop support for LLVM
3.5"

v2: s/LLVN/LLVM/ in shortlog (Nicolai)

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-05-01 08:57:32 +01:00
Bas Nieuwenhuizen
84a6761ae3 radeonsi: add shared memory
Declares the shared memory as a global variable so that
LLVM is aware of it and it does not conflict with passes
like AMDGPUPromoteAlloca.

v2: - Use ctx->i8.
    - Dropped null-check for declare_memory_region.
    - Changed memory region array to single region.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-04-19 18:10:30 +02:00
Marek Olšák
ea2bff1d11 gallium/radeon: remove remnants of R600 TGSI->LLVM
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-03-20 00:57:05 +01:00
Marek Olšák
36202182ac gallium/radeon: add basic code for setting shader return values
LLVMBuildInsertValue will be used on return_value.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-21 21:08:57 +01:00
Tom Stellard
dc7cf07af3 radeon/llvm: Add TargetLibraryInfo to the pass manager
This will prevent optimization passes from introducing unsupported
library calls.

Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-02-17 19:06:41 +00:00
Tom Stellard
4f351a6cb1 radeon/llvm: Set the target triple on the module
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-02-17 19:06:41 +00:00
Nicolai Hähnle
5aafc169ca gallium/radeon: emit LLVM ret void before radeon_llvm_finalize_module
This allows dumping a consumable LLVM module before the initial optimization
passes are run.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-02-05 09:21:54 -05:00
Marek Olšák
bff640b3e0 radeonsi: implement PK2H and UP2H opcodes
Based on a gallivm patch by Ilia Mirkin.

+8 piglit regressions due to precision issues (I blame the tests)

The benefit is that we'll get v_cvt_f32_f16 and v_cvt_f16_f32 instead
of emulation with integer instructions. They are GLSL 4.00 intrinsics.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2016-02-04 19:52:28 +01:00
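
For readers unfamiliar with the opcodes in bff640b3e0, the sketch below models the packing on the CPU: two floats are converted to half precision and stored in the low and high 16 bits of one 32-bit word, and the reverse. It assumes the first operand goes into the low half, as in GLSL `packHalf2x16`; `pk2h` and `up2h` are hypothetical Python helpers, not driver functions.

```python
import struct
from typing import Tuple


def pk2h(x: float, y: float) -> int:
    """Pack two floats as IEEE half-precision values into one 32-bit word."""
    # struct's "e" format is IEEE 754 binary16; it raises OverflowError
    # for values that do not fit into a half float.
    (lo,) = struct.unpack("<H", struct.pack("<e", x))
    (hi,) = struct.unpack("<H", struct.pack("<e", y))
    return lo | (hi << 16)


def up2h(packed: int) -> Tuple[float, float]:
    """Unpack a 32-bit word into the two half-precision values it holds."""
    (x,) = struct.unpack("<e", struct.pack("<H", packed & 0xFFFF))
    (y,) = struct.unpack("<e", struct.pack("<H", (packed >> 16) & 0xFFFF))
    return x, y


word = pk2h(1.0, -2.0)
roundtrip = up2h(word)
lossy = up2h(pk2h(0.1, 0.0))[0]  # 0.1 is not exactly representable as a half
```

Values such as 1.0 and -2.0 survive the round trip exactly, while 0.1 does not, which illustrates the kind of precision loss a half-float conversion can introduce.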
Marek Olšák
b3bac55621 radeonsi: change LLVM intrinsics for BREV, CLAMP, EX2
Requested by Matt Arsenault.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-22 22:05:42 +01:00
Michel Dänzer
d094631936 radeon/llvm: Use llvm.AMDIL.exp intrinsic again for now
llvm.exp2.f32 doesn't work in some cases yet.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92709

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-24 18:07:48 +09:00