Commit graph

80324 commits

Author SHA1 Message Date
Bas Nieuwenhuizen
e56514f631 radeonsi: update predicate condition for compute dispatches
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-04-19 18:10:31 +02:00
Bas Nieuwenhuizen
c3083d841e radeonsi: implement TGSI compute dispatch
v2: - Use radeon_set_sh_reg_seq.
    - Set predicate bit for conditional rendering.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-04-19 18:10:31 +02:00
Bas Nieuwenhuizen
1349dd16ff radeonsi: only emit compute shader state when switching shaders
v2: - Do check if anything changed earlier
    - Use emitted_program instead of emitted_bo to prevent
      shaders with shader->bo = NULL confusing the check
    - Use radeon_set_sh_reg*

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-04-19 18:10:31 +02:00
Bas Nieuwenhuizen
ba1f66a73d radeonsi: rework compute scratch buffer
Instead of having a scratch buffer per program, have one per
context.

Also removed the per kernel wave count calculations, but
that only helped if the total number of waves in the dispatch
was smaller than sctx->scratch_waves.

v2: Fix style issue.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-19 18:10:31 +02:00
Bas Nieuwenhuizen
107f4d3538 radeonsi: do per cs setup for compute shaders once per cs
Also removes PKT3_CONTEXT_CONTROL as that is already being done
by si_begin_new_cs, when emitting init_config.

v2: - Use radeon_set_sh_reg_seq.
    - Also set COMPUTE_STATIC_THREAD_MGMT_SE2 / SE3 for CIK+

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-19 18:10:31 +02:00
Bas Nieuwenhuizen
52d3584dec radeonsi: don't pass scratch buffer to user SGPRs
As far as I can see we use relocations for clover too.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-19 18:10:31 +02:00
Bas Nieuwenhuizen
422a19f76f radeonsi: split input upload off from si_launch_grid
Also uses a dynamically allocated buffer using u_upload_alloc.
The old buffer per program approach required serializing all
dispatches of the same program.

v2: - Clarified commit message.
    - Use radeon_set_sh_reg_seq.
    - Also upload input buffer for clover kernels, even when
      input_size is 0, as it contains grid parameters.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-04-19 18:10:31 +02:00
Bas Nieuwenhuizen
898298efc9 radeonsi: implement TGSI compute shader creation
v2: Moved scratch_enabled initialization after compile.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-19 18:10:31 +02:00
Bas Nieuwenhuizen
85fd7817ee radeonsi: update shader count for compute shaders
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-19 18:10:31 +02:00
Bas Nieuwenhuizen
da88c2a8e8 radeonsi: set maximum work group size based on block size
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-19 18:10:31 +02:00
Bas Nieuwenhuizen
b082147b78 radeonsi: implement shared atomics
v2: - Use single region
    - Use get_memory_ptr

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-04-19 18:10:31 +02:00
Bas Nieuwenhuizen
8acf3e501b radeonsi: implement shared memory load/store
v2: - Use single region
    - Combine address calculation

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-04-19 18:10:31 +02:00
Bas Nieuwenhuizen
84a6761ae3 radeonsi: add shared memory
Declares the shared memory as a global variable so that
LLVM is aware of it and it does not conflict with passes
like AMDGPUPromoteAlloca.

v2: - Use ctx->i8.
    - Dropped null-check for declare_memory_region.
    - Changed memory region array to single region.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-04-19 18:10:30 +02:00
Bas Nieuwenhuizen
753a3e472b radeonsi: lower compute shader arguments
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-19 18:10:30 +02:00
Bas Nieuwenhuizen
008d977d01 radeonsi: Use CE for all descriptors.
v2: Load previous list for new CS instead of re-emitting
    all descriptors.

v3: Do radeon_add_to_buffer_list in si_ce_upload.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-19 18:10:30 +02:00
Bas Nieuwenhuizen
0b6c463dac gallium/util: Add u_bit_scan_consecutive_range64.
For use by radeonsi.

v2: Make sure that it works for all 64 bits set.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-19 18:10:30 +02:00
Bas Nieuwenhuizen
058b54c624 radeonsi: Replace list_dirty with a mask.
We can then upload only the dirty ones with the constant engine.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-19 18:10:30 +02:00
Bas Nieuwenhuizen
aabc7d61d6 radeonsi: Add CE uploader.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-19 18:10:30 +02:00
Bas Nieuwenhuizen
0d7ddd6819 radeonsi: Allocate chunks of CE ram.
v2: Use 32 byte alignment.

v3: Don't allocate CE space for vertex buffer descriptors.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-19 18:10:30 +02:00
Bas Nieuwenhuizen
86c71ff989 radeonsi: Add CE synchronization.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-19 18:10:30 +02:00
Bas Nieuwenhuizen
fe1ef23b66 radeonsi: Add CE packet definitions.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-19 18:10:30 +02:00
Bas Nieuwenhuizen
8fee75d606 radeonsi: Create CE IB.
Based on work by Marek Olšák.

v2: Add preamble IB.

Leaves the load packet in the space calculation as the
radeon winsys might not be able to support a premable.

The added space calculation may look expensive, but
is converted to a constant with (at least) -O2 and -O3.

v3: - Fix code style.
    - Remove needed space for vertex buffer descriptors.
    - Fail when the preamble cannot be created.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-19 18:10:30 +02:00
Bas Nieuwenhuizen
7201230582 winsys/amdgpu: Enlarge const IB size.
Necessary to prevent performance regressions due to extra flushing.

Probably should enlarge it even further when also updating
uniforms through the CE, but this seems large enough for now.

v2: Add preamble IB.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-19 18:10:30 +02:00
Marek Olšák
7997b5f005 winsys/amdgpu: Add support for const IB.
v2: Use the correct IB to update request (Bas Nieuwenhuizen)
v3: Add preamble IB. (Bas Nieuwenhuizen)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-19 18:10:30 +02:00
Marek Olšák
e78170f388 winsys/amdgpu: split IB data into a new structure in preparation for CE
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-04-19 18:10:30 +02:00
Marek Olšák
f4b77c764a gallium/radeon: move ring_type into winsyses
Not used by drivers.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-04-19 18:10:30 +02:00
Jose Fonseca
1d2ac7a7ca llvmpipe: Call LLVMShutdown before exiting.
So that LLVM frees its globals.

Trivial.
2016-04-19 12:10:09 +01:00
Jose Fonseca
524042fa35 llvmpipe: Avoid LLVMGetGlobalContext in tests.
Trivial.
2016-04-19 12:10:02 +01:00
Jose Fonseca
bb9e8c5090 llvmpipe: Skip false exp2 failure in lp_test_arit due to buggy MSVCRT.
64bits MSVCRT's exp2f(-inf) returns -inf instead of 0.  Tested with
MSVC 2013's CRT.  (I haven't tried 2015 yet.)

Also this does not happen with MinGW.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-04-19 11:31:53 +01:00
Jose Fonseca
ee9876be1d llvmpipe: Test more vector lengths.
All power of two of up native vector length.

There is actually a bug in lp_build_round for v2, whereby it doesn't
round to nearest.  Fixing is left to the future, but the test is now
able to expect it to fail.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-04-19 11:31:44 +01:00
Jose Fonseca
932b71f17d gallivm: Avoid llvm::sys::getProcessTriple().
Just use LLVM_HOST_TRIPLE, which is available at least from LLVM 3.3
onwards, and is pretty much what llvm::sys::getProcessTriple() does anyway,

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-04-19 11:31:37 +01:00
Jose Fonseca
b5ca689cee gallivm: Remove lp_get_module_id.
Just keep a copy of the module_name in gallivm.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-04-19 11:31:26 +01:00
Jose Fonseca
969ba8bfa7 gallivm: Fix MCJIT with LLVM 3.3.
One needs to call setJITMemoryManager for LLVM 3.3, instead of
setMCJITMemoryManager.

This regressed in commits 065256df/75ad4fe7 when trying to make the
code to build with LLVM 3.6.

Tested MCJIT with LLVM 3.3 to 3.6.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-04-19 11:31:17 +01:00
Jose Fonseca
cf4105740f gallivm: Make MCJIT a runtime option.
On the LLVM versions that support it, so we can easily switch between
MCJIT/old-jit for testing.

The new option is GALLIVM_MCJIT.

Unfortunately setting GALLIVM_MCJIT=1 for LLVM 3.3 or 3.4 causes
segfault, both on Linux and Windows.  I'm almost certain this used to
work, so there probably is a regression somewhere.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-04-19 11:31:14 +01:00
Jose Fonseca
7d2151b6ea scons: Show the unit test full path.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-04-19 11:31:11 +01:00
Jose Fonseca
2211f8d559 gallivm: Use LLVMSetTarget.
Instead of LLVM C++ interfaces.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-04-19 11:31:00 +01:00
Jose Fonseca
9aa23b11e4 gallivm: Use LLVMPrintValueToString where available.
And llvm::raw_string_ostream where not (LLVM 3.3).

Thereby eliminating yet another dependency on unstable LLVM interfaces.

As a bonus this also gets LLVM IR on OutputDebugMessageA on MSVC (which
was disabled, probably due to C++ issues.)

Tested `lp_test_arit -v -v` on LLVM 3.3, 3.4 and 3.8.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-04-19 11:28:37 +01:00
Jose Fonseca
f6621cd3be gallium/tests: Update UTIL_FORMAT_MAX_* defines.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-04-19 11:28:16 +01:00
Jose Fonseca
121a0cedc8 Revert "nv50/ra: isinf() is in namespace std since C++11."
This reverts commit f525db6358.

It was superseeded by commit 649704f1f7.
2016-04-19 11:22:45 +01:00
Eric Anholt
802b9292aa vc4: Fix fbo-generatemipmap-formats for NPOT.
Single-sampled texture miplevels > 1 are stored in POT-aligned areas, but
we only get one value to control the stride of the src and dst for single
sampled buffers.  A RCL tile blit from level != 1 to level == 0 would
therefore load from the wrong stride.
2016-04-18 16:55:36 -07:00
Eric Anholt
2402bb6095 vc4: Remove unused "immediates" field
This was for TGSI, which we no longer have to deal with.
2016-04-18 16:48:45 -07:00
Ben Widawsky
2408899cb2 i965: Define miptree map functions static (trivial)
They were already declared as such. It was changed here:
commit 31f0967fb5
Author: Ian Romanick <ian.d.romanick@intel.com>
Date:   Wed Sep 2 14:43:18 2015 -0700

    i965: Make intel_miptree_map_raw static

Cc: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-04-18 16:12:13 -07:00
Matt Turner
b1d9353cb5 glsl: Properly handle ldexp(0.0f, non-zero-exp). 2016-04-18 15:48:54 -07:00
Dave Airlie
3a26ef23e7 gallivm: convert size query to using a set of parameters.
This isn't currently that easy to expand, so fix it up
before expanding it later to include dynamic samplers.

[airlied: use some local variables (Roland)]

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-04-19 07:33:39 +10:00
Tim Rowley
3227c10270 swr: dereference cbuf/zbuf/views on context destroy
Fixes resource memory leaks.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-18 15:52:26 -05:00
Rob Clark
77a9107bf2 freedreno/ir3: fix grouping issue w/ reverse swizzles
When we have something like:

   MOV OUT[n], IN[m].wzyx

the existing grouping code was missing a potential conflict.  Due to
input needing to be sequential scalar regs, we have:

 IN:  x <-> y <-> z <-> w

which would be grouped to:

 OUT: w <-> z2 <-> y2 <-> x  (where the 2 denotes a copy/mov)

but that can't actually work.  We need to realize that x and w are
already in the same chain, not just that they aren't both already in
new chain being built.

With this fixed, we probably no longer need the hack from f68f6c0.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-18 15:41:32 -04:00
Marek Olšák
ed66c75784 radeonsi: use enums in si_shader.h
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-18 19:51:25 +02:00
Marek Olšák
0c52caf7b7 gallium/radeon: use enums in r600_query.h
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-18 19:51:25 +02:00
Marek Olšák
dd9ca77cb9 radeonsi: always use PFP_SYNC_ME when doing flushes and waits
This is typically used by the closed driver before SURFACE_SYNC.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-18 19:51:25 +02:00
Marek Olšák
1db5678688 radeonsi: don't do VS/PS partial flushes if SURFACE_SYNC waits too
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-18 19:51:25 +02:00