47 commits

Author SHA1 Message Date
Marek Olšák
facfab28fe radeonsi/gfx9: add workarounds to avoid VGPR indexing completely
For inputs and outputs, indirect indexing is lowered by the GLSL compiler.
For temporaries, use alloca and disable the "promote-alloca" pass.

In the future, we could switch all codepaths to alloca permanently and
just rely on the "promote-alloca" pass.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-17 10:50:39 -04:00
Marek Olšák
4560f2b90a radeonsi: merge si_llvm_get_amdgpu_target into ac_get_llvm_target
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-17 10:50:39 -04:00
Marek Olšák
ece0c0439f radeonsi: don't call gallivm_init_llvm_targets
It's for initializing the native (x86) target.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-17 10:50:39 -04:00
Juan A. Suarez Romero
a625d58ee1 radeonsi: call LLVMAddEarlyCSEMemSSAPass only for LLVM >= 4.0
LLVMAddEarlyCSEMemSSAPass() is defined in LLVM 4.0.

Fixes: 257b538 ("radeonsi: do EarlyCSEMemSSA LLVM pass")

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2017-06-08 23:32:32 +02:00
Marek Olšák
257b538fd2 radeonsi: do EarlyCSEMemSSA LLVM pass
so that LLVM IR looks like CSE has been run on it. It's also recommended
by the instruction combining pass.

This also fixes:
- GL45-CTS.arrays_of_arrays_gl.InteractionFunctionCalls2 (crash)
- piglit/spec/arb_shader_ballot/execution/fs-readFirstInvocation-uint-loop (fail)

The code size decrease is positive, the register usage isn't. There is
a decrease in VGPR spilling for Tomb Raider, but an increase in DiRT Showdown
and GRID Autosport.

EarlyCSEMemSSA has a -0.01% change in code size compared to EarlyCSE.

SGPRS: 1935420 -> 1938076 (0.14 %)
VGPRS: 1645504 -> 1645988 (0.03 %)
Spilled SGPRs: 2493 -> 2651 (6.34 %)
Spilled VGPRs: 107 -> 115 (7.48 %)
Private memory VGPRs: 1332 -> 1332 (0.00 %)
Scratch size: 1512 -> 1516 (0.26 %) dwords per thread
Code Size: 61981592 -> 61890012 (-0.15 %) bytes
Max Waves: 371847 -> 371798 (-0.01 %)

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07 20:17:09 +02:00
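The percentages in the shader-db totals of 257b538fd2 appear to be plain relative changes between the "before" and "after" columns. A minimal sketch of that arithmetic, assuming this reading; `percent_change` is a hypothetical helper and not part of shader-db or Mesa:

```c
#include <assert.h>

/* Relative change from `before` to `after`, in percent.
 * Hypothetical helper for illustration only. */
static double percent_change(double before, double after)
{
    assert(before != 0.0); /* the ratio is undefined for a zero baseline */
    return (after - before) / before * 100.0;
}
```

With the figures above, `percent_change(2493, 2651)` is about 6.34 (Spilled SGPRs) and `percent_change(61981592, 61890012)` is about -0.15 (Code Size).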
Marek Olšák
b8f8d9e46c radeonsi: clamp indirect index to the number of declared shader resources
We'll do partial uploads of descriptor arrays, so we need to clamp
against what shaders declare.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-18 22:15:02 +02:00
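The clamping described in b8f8d9e46c can be illustrated on the host side as follows. This is only a sketch: the driver would emit the equivalent comparison as shader code, and `clamp_resource_index` and `num_declared` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp an indirect (runtime) index to the last slot the shader declares,
 * so that slots beyond a partially uploaded descriptor array are never read.
 * `num_declared` is the number of resources declared by the shader. */
static uint32_t clamp_resource_index(uint32_t index, uint32_t num_declared)
{
    assert(num_declared > 0);
    uint32_t last = num_declared - 1;
    return index < last ? index : last;
}
```

For a shader declaring 8 resources, an index of 100 is clamped to 7, while an in-range index such as 2 is returned unchanged.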
Nicolai Hähnle
c485b47383 radeonsi: extract TGSI memory/texture opcode handling into its own file
It's about time to get the growth of si_shader.c somewhat under control.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-05-16 16:11:55 +02:00
Marek Olšák
e107c5a426 radeonsi/gfx9: set correct LLVM calling conventions for merged shaders
for scratch support

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-05 00:23:44 +02:00
Marek Olšák
2d662c0cba radeonsi: inline si_llvm_shader_type into si_llvm_create_func
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-05 00:23:44 +02:00
Marek Olšák
f8f8242e8b radeonsi: fold surrounding code into si_llvm_finalize_module
and rename to si_llvm_optimize_module.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-05 00:23:44 +02:00
Marek Olšák
12beef0374 radeonsi: drop support for LLVM 3.8
LLVM 3.8:
- had broken indirect resource indexing
- didn't have scratch coalescing
- was the last user of problematic v16i8
- only supported OpenGL 4.1

This leaves us with LLVM 3.9 and LLVM 4.0 support for Mesa 17.2.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-05 00:23:44 +02:00
Marek Olšák
4d32b4ac99 radeonsi: stop using v16i8
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-05 00:23:44 +02:00
Marek Olšák
130e198c49 radeonsi: separate out TGSI initialization of si_shader_context
so that we can put multiple different TGSI shaders into one module.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-28 21:47:35 +02:00
Nicolai Hähnle
24d4fbe226 radeonsi: strengthen emit_optimization_barrier
LLVM will lift inline assembly out of if-else-blocks if both paths have
the same inline assembly. Prevent this by adding an irrelevant unique
text to the assembly.

This requires the LLVM assembly parser to be initialized.

Furthermore, allow forcing subsequent computations to happen after the
optimization barrier by defining a data dependency.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-05 15:29:43 +02:00
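The "irrelevant unique text" idea from 24d4fbe226 can be sketched without any LLVM dependency. The assumption here is that the text is an assembly comment carrying a running counter; `make_barrier_text` is a hypothetical helper, and the step that turns the string into an inline-assembly value is left out.

```c
#include <assert.h>
#include <stdio.h>

/* Write a distinct assembly comment ("; 1", "; 2", ...) into buf.
 * Because no two calls produce the same text, two barriers never look
 * identical to an optimizer that merges equal inline-assembly blocks.
 * The counter is not thread-safe in this sketch. */
static void make_barrier_text(char *buf, size_t size)
{
    static unsigned counter = 0;
    assert(buf && size > 0);
    snprintf(buf, size, "; %u", ++counter);
}
```

Each call would supply the assembly string for one barrier, so the barriers in the two branches of an if-else end up with different strings.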
Nicolai Hähnle
4cf2942777 radeonsi: support 64-bit system values
For simplicity, always store system values as 32-bit values or arrays
of 32-bit values. 64-bit values are unpacked and packed accordingly.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-05 15:29:43 +02:00
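The packing and unpacking mentioned in 4cf2942777 amounts to splitting a 64-bit value into two 32-bit words and joining them again. A minimal sketch, assuming the low word is stored first; the function names and the word order are not taken from the commit:

```c
#include <assert.h>
#include <stdint.h>

/* Split a 64-bit value into two 32-bit words: out[0] = low, out[1] = high.
 * The word order is an assumption of this sketch. */
static void unpack_u64(uint64_t value, uint32_t out[2])
{
    assert(out);
    out[0] = (uint32_t)(value & 0xffffffffu);
    out[1] = (uint32_t)(value >> 32);
}

/* Rebuild the 64-bit value from the two words written by unpack_u64. */
static uint64_t pack_u64(const uint32_t in[2])
{
    assert(in);
    return ((uint64_t)in[1] << 32) | in[0];
}
```

Storing every system value as 32-bit words this way keeps one storage format for both widths, at the cost of a pack step when a 64-bit value is read.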
Marek Olšák
6ca46c3d77 radeonsi: access gallivm through ctx in most places
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-04 16:55:21 +02:00
Marek Olšák
04e4fe594b radeonsi: use ctx->types instead of bld->types etc.
even vec_type is f32.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-04 16:55:19 +02:00
Marek Olšák
7a5e6dcba5 radeonsi: use i32_0/1 instead of *int_bld.zero/one in most places
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-04 16:55:16 +02:00
Marek Olšák
29adaa19ac radeonsi: remove most uses of lp_build_const*
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-04 11:14:43 +02:00
Marek Olšák
474468fbf9 radeonsi/gfx9: disable features that don't work
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Samuel Pitoiset
7751ed39e4 radeonsi: disable sinking common instructions down to the end block
Initially this was a workaround for a bug introduced in LLVM 4.0
in the SimplifyCFG pass that caused image intrinsics to disappear
(because they were badly sunk). Finally, this is a win because it
decreases SGPR spilling and increases the number of waves a bit.

Although shader-db results are good, I think we might want to
remove it in the future once the issue is fixed. For now, enable
it for LLVM >= 4.0.

This also fixes a rendering issue with the speedometer in Dirt Rally.

More information can be found here: https://reviews.llvm.org/D26348.

Thanks to Dave Airlie for the patch.

v2: - add a FIXME comment
    - use if (HAVE_LLVM >= 0x0400) instead

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99484
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97988
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-15 14:24:40 +01:00
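The `HAVE_LLVM >= 0x0400` check in the v2 note of 7751ed39e4 compares against a packed version number. A minimal sketch of how such a comparison behaves, assuming the major version sits in the high byte and the minor version in the low byte (which is how `0x0400` reads for 4.0); `VERSION_ENCODE` and `needs_sink_workaround` are hypothetical names:

```c
#include <assert.h>

/* Pack major.minor into one integer, one byte each (assumed layout). */
#define VERSION_ENCODE(major, minor) (((major) << 8) | (minor))

/* Returns 1 when the workaround applies, i.e. for version 4.0 or newer. */
static int needs_sink_workaround(int major, int minor)
{
    assert(major >= 0 && minor >= 0 && minor < 256);
    return VERSION_ENCODE(major, minor) >= 0x0400;
}
```

Under this layout 3.9 encodes to `0x0309`, which compares below `0x0400`, so the workaround stays off for 3.9 and is on for 4.0 and later.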
Marek Olšák
7e1faa79d3 radeonsi: drop support for LLVM 3.6 & 3.7
They are too old.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-03-06 14:13:04 +01:00
Marek Olšák
7f1446a8a1 ac: normalize build helper names
s/emit/build/

Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-03-03 17:30:07 +01:00
Timothy Arceri
d90bf4ef3e radeon: remove unused radeon_elf_util.{c,h}
We now use the shared code in AMD common instead.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-02-28 13:20:31 +11:00
Timothy Arceri
dc4c551a34 radeon/ac: switch from radeon_elf_read() to ac_elf_read()
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-02-28 13:20:31 +11:00
Timothy Arceri
69a687189e radeon/ac: switch from radeon_shader_binary to ac_shader_binary
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-02-28 13:20:31 +11:00
Marek Olšák
52581606c2 radeonsi: set no-signed-zeros-fp-math
Recommended by Matt Arsenault.

46757 shaders in 28742 tests
Totals:
SGPRS: 2068851 -> 2066907 (-0.09 %)
VGPRS: 1604056 -> 1602676 (-0.09 %)
Spilled SGPRs: 1402 -> 1382 (-1.43 %)
Spilled VGPRs: 113 -> 113 (0.00 %)
Private memory VGPRs: 1332 -> 1332 (0.00 %)
Scratch size: 3224 -> 3188 (-1.12 %) dwords per thread
Code Size: 58815520 -> 58716788 (-0.17 %) bytes
LDS: 1162 -> 1162 (0.00 %) blocks
Max Waves: 354616 -> 354905 (0.08 %)
Wait states: 0 -> 0 (0.00 %)

Totals from affected shaders:
SGPRS: 786452 -> 784508 (-0.25 %)
VGPRS: 530000 -> 528620 (-0.26 %)
Spilled SGPRs: 958 -> 938 (-2.09 %)
Spilled VGPRs: 85 -> 85 (0.00 %)
Private memory VGPRs: 636 -> 636 (0.00 %)
Scratch size: 1880 -> 1844 (-1.91 %) dwords per thread
Code Size: 26349936 -> 26251204 (-0.37 %) bytes
LDS: 304 -> 304 (0.00 %) blocks
Max Waves: 108962 -> 109251 (0.27 %)
Wait states: 0 -> 0 (0.00 %)

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-21 21:27:23 +01:00
Marek Olšák
fd3e73f54e gallivm: add no-signed-zeros-fp-math option to lp_create_builder (v2)
v2: define lp_float_mode

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-21 21:27:23 +01:00
Marek Olšák
660b55e6d9 radeonsi: stop using TGSI_OPCODE_CLAMP by moving it to amd/common
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-18 02:58:43 +01:00
Marek Olšák
dbd38f2a92 radeonsi: add a workaround for clamping unaligned RGB 8 & 16-bit vertex loads
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-18 01:22:08 +01:00
Tom Stellard
226a2c6d6e radeonsi: Fix build on LLVM < 3.9 v2
This was broken by: e0cc0a614c

v2:
  - Use preprocessor macro

Tested-by: Mark Janes <mark.a.janes@intel.com>
2017-02-01 02:10:00 +00:00
Tom Stellard
e0cc0a614c radeonsi: Set datalayout on the llvm module
This prevents LLVM from using sext instructions for local memory offsets
and allows the backend to fold immediate offsets into the instruction.

This also prevents some incorrect code generation for ptrtoint and
inttoptr instructions.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-31 20:39:30 +00:00
Marek Olšák
59c5da40ed radeonsi: preload PS inputs only if KILL is used
so that most shaders can get lower VGPR usage thanks to lazy input loading.
I think this is a more accurate constraint that prevents the black transitions
in Witcher 2.

Affected shaders (7758):
Max Waves: 57437 -> 58231 (1.38 %)

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-23 23:43:38 +01:00
Samuel Pitoiset
e1ea70d9f3 radeonsi: replace si_shader_context::soa by bld_base
We no longer need to use lp_build_tgsi_soa_context.

No regressions found with full piglit run.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-01-13 10:41:08 +01:00
Samuel Pitoiset
ecf04b84e5 radeonsi: replace ctx->soa.outputs by ctx->outputs
The plan is to replace si_shader_context::soa with its parent
structure (i.e. bld_base).

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-01-13 10:41:06 +01:00
Samuel Pitoiset
f04088a7ba radeonsi: move si_shader_context::soa::addr to si_shader_context
The plan is to replace si_shader_context::soa with its parent
structure (i.e. bld_base).

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-01-13 10:41:02 +01:00
Samuel Pitoiset
6f0d955b6d radeonsi: allocate the array of immediates dynamically
Currently, we can store up to 256 immediates in a static array,
but this is not always enough. Instead, allocate a dynamic array
like what we currently do for temps.

This fixes a segfault with
dEQP-GLES31.functional.ssbo.layout.random.all_shared_buffer.23

No regressions found with full piglit run.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-01-13 10:40:57 +01:00
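The change in 6f0d955b6d, replacing a fixed 256-entry array with one that grows on demand, follows a common pattern. A minimal sketch of that pattern, not the driver's actual code; the struct, its fields, and the doubling growth policy are all assumptions made for illustration:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Growable array of 32-bit immediates. All names are hypothetical. */
struct imm_array {
    uint32_t *data;
    size_t count;
    size_t capacity;
};

/* Append one value, doubling the storage when it is full.
 * Returns 0 on success, -1 if allocation fails (array left unchanged). */
static int imm_array_push(struct imm_array *arr, uint32_t value)
{
    assert(arr);
    if (arr->count == arr->capacity) {
        size_t new_cap = arr->capacity ? arr->capacity * 2 : 16;
        uint32_t *p = realloc(arr->data, new_cap * sizeof(*p));
        if (!p)
            return -1;
        arr->data = p;
        arr->capacity = new_cap;
    }
    arr->data[arr->count++] = value;
    return 0;
}
```

Starting from a zero-initialized `struct imm_array`, pushing more than 256 values simply triggers further reallocations instead of writing past the end of a static buffer; the caller releases the storage with `free(arr.data)`.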
Nicolai Hähnle
a0ce09b4b2 amd/common: unify cube map coordinate handling between radeonsi and radv
Code is taken from a combination of radv (for the more basic functions,
to avoid gallivm dependencies) and radeonsi (for the new and improved
derivative calculations).

v2: add 0.5 offset to tex coords only after derivative calculation

v3:
- really only touch the first three coordinates
- rebase on the removal of the 1.5 --> 0.5 offset change

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> (v2)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-01-13 00:39:10 +01:00
Marek Olšák
cac74a9bcc radeonsi: fix the Witcher 2 black transitions
v2: do it properly

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98238

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-09 12:01:30 +01:00
Marek Olšák
5b85a6b3f7 radeonsi: set si_shader_context::input_decls for ranged decls correctly
This has no effect because no code uses those members with ranged decls.

Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-09 12:01:30 +01:00
Marek Olšák
358079da2d radeonsi: set unsafe fpmath on FP instructions when allowed by R600_DEBUG
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-15 19:17:56 +01:00
Marek Olšák
171e349782 radeonsi: fold some shader context initialization to si_llvm_context_init
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-15 19:17:56 +01:00
Nicolai Hähnle
0b9bba7f6c radeonsi: pass the function name to si_llvm_create_func
We will use multiple functions in one module, so they should have
different names.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 10:06:54 +01:00
Nicolai Hähnle
23dfb688ba radeonsi: add always-inline pass to si_llvm_finalize_module
Change the pass manager as well, since this is a module-level pass. No
noticeable run-time difference on shader-db.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 10:06:42 +01:00
Marek Olšák
21af69e753 radeonsi: rename prefixes from radeon to si
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-18 18:41:08 +02:00
Marek Olšák
6e475fefa1 radeonsi: merge radeon_llvm_context and si_shader_context
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-18 18:41:06 +02:00
Marek Olšák
5ab25bb4ba radeonsi: import all TGSI->LLVM code from gallium/radeon
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-18 18:41:04 +02:00
Renamed from src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c