This changes the order of basic blocks to be equal to the order of code in the
original TGSI, which is nice for making sense of shader dumps.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Some of the existing code is needlessly complicated. The basic principle
should be: control-flow opcodes emit branches to properly terminate the
current block, _unless_ the current block already has a terminator (which
happens if and only if there was a BRK or CONT).
This also fixes a bug where multiple terminators were created in a block.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97887
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
v2:
- no PIPE_CAP_INT64 yet
- emit DIV/MOD without the divide-by-zero workaround
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Avoid building all those store 0 / store undef instruction pairs that
end up getting removed anyway.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Also, prepare for using tgsi_array_info.
This also opens the door for properly handling allocation failures, but I'm
leaving that for a separate change.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Doing the write-back of the temporary vector in radeon_llvm_emit_store makes
no sense.
This also allows us to get rid of get_alloca_for_array.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We can use the pointer stored in the temps array directly.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We can use the pointer stored in the temps array directly.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
In the alloca'd array case, no longer create redundant and unused allocas
for the individual elements; create getelementptrs instead.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
v_fma_f32 runs at FP64 rate (= slow). Alien Isolation and F1 2015 seem
to use fma for all d3d multiply-add instructions, which is silly.
This tries to restore performance for those games.
The main difference between v_mad_f32 and v_fma_f32 is that v_mad doesn't
support denormals, which we don't enable anyway, because they are slow too.
Also, there is code size reduction:
Totals from affected shaders:
VGPRS: 109796 -> 109808 (0.01 %)
Spilled SGPRs: 29995 -> 30022 (0.09 %)
Spilled VGPRs: 12 -> 13 (8.33 %) <-- it's just one shader going from 12 to 13
Code Size: 6667596 -> 6476356 (-2.87 %) bytes
Max Waves: 26931 -> 26899 (-0.12 %)
I've not actually tested real performance.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
We were storing arrays in vectors, which was leading to some really bad
spill code for large arrays. allocas instructions are a better fit for
arrays and LLVM optimizations are more geared toward dealing with
allocas instead of vectors.
For arrays that have 16 or less 32-bit elements, we will continue to use
vectors, because this will force LLVM to store them in registers and
use indirect registers, which is usually faster for small arrays.
In the future we should use allocas for all arrays and teach LLVM
how to store allocas in registers.
This fixes the piglit test:
spec/glsl-1.50/execution/geometry/max-input-component
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This will be needed after some LLVM changes that haven't landed yet.
v2: - use LLVMIsConstant to fix an LLVM assertion failure.
LLVMSetMetadata doesn't work with constants.
- don't set float metadata as string
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This converts to testing for 64-bit types and renames some things
in anticipation of 64-bit integer support.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Missed with commit 100796c15c "gallium/radeon: drop support for LLVM
3.5"
v2: s/LLVN/LLVM/ in shortlog (Nicolai)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Declares the shared memory as a global variable so that
LLVM is aware of it and it does not conflict with passes
like AMDGPUPromoteAlloca.
v2: - Use ctx->i8.
- Dropped null-check for declare_memory_region.
- Changed memory region array to single region.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
This will prevent optimization passes from introducing unsupported
library calls.
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Based on a gallivm patch by Ilia Mirkin.
+8 piglit regressions due to precision issues (I blame the tests)
The benefit is that we'll get v_cvt_f32_f16 and v_cvt_f16_f32 instead
of emulation with integer instructions. They are GLSL 4.00 intrinsics.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>