Commit graph

80315 commits

Author SHA1 Message Date
Bas Nieuwenhuizen
da88c2a8e8 radeonsi: set maximum work group size based on block size
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-19 18:10:31 +02:00
Bas Nieuwenhuizen
b082147b78 radeonsi: implement shared atomics
v2: - Use single region
    - Use get_memory_ptr

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-04-19 18:10:31 +02:00
Bas Nieuwenhuizen
8acf3e501b radeonsi: implement shared memory load/store
v2: - Use single region
    - Combine address calculation

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-04-19 18:10:31 +02:00
Bas Nieuwenhuizen
84a6761ae3 radeonsi: add shared memory
Declares the shared memory as a global variable so that
LLVM is aware of it and it does not conflict with passes
like AMDGPUPromoteAlloca.

v2: - Use ctx->i8.
    - Dropped null-check for declare_memory_region.
    - Changed memory region array to single region.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-04-19 18:10:30 +02:00
Bas Nieuwenhuizen
753a3e472b radeonsi: lower compute shader arguments
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-19 18:10:30 +02:00
Bas Nieuwenhuizen
008d977d01 radeonsi: Use CE for all descriptors.
v2: Load previous list for new CS instead of re-emitting
    all descriptors.

v3: Do radeon_add_to_buffer_list in si_ce_upload.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-19 18:10:30 +02:00
Bas Nieuwenhuizen
0b6c463dac gallium/util: Add u_bit_scan_consecutive_range64.
For use by radeonsi.

v2: Make sure that it works for all 64 bits set.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-19 18:10:30 +02:00
Bas Nieuwenhuizen
058b54c624 radeonsi: Replace list_dirty with a mask.
We can then upload only the dirty ones with the constant engine.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-19 18:10:30 +02:00
Bas Nieuwenhuizen
aabc7d61d6 radeonsi: Add CE uploader.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-19 18:10:30 +02:00
Bas Nieuwenhuizen
0d7ddd6819 radeonsi: Allocate chunks of CE ram.
v2: Use 32 byte alignment.

v3: Don't allocate CE space for vertex buffer descriptors.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-19 18:10:30 +02:00
Bas Nieuwenhuizen
86c71ff989 radeonsi: Add CE synchronization.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-19 18:10:30 +02:00
Bas Nieuwenhuizen
fe1ef23b66 radeonsi: Add CE packet definitions.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-19 18:10:30 +02:00
Bas Nieuwenhuizen
8fee75d606 radeonsi: Create CE IB.
Based on work by Marek Olšák.

v2: Add preamble IB.

Leaves the load packet in the space calculation as the
radeon winsys might not be able to support a premable.

The added space calculation may look expensive, but
is converted to a constant with (at least) -O2 and -O3.

v3: - Fix code style.
    - Remove needed space for vertex buffer descriptors.
    - Fail when the preamble cannot be created.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-19 18:10:30 +02:00
Bas Nieuwenhuizen
7201230582 winsys/amdgpu: Enlarge const IB size.
Necessary to prevent performance regressions due to extra flushing.

Probably should enlarge it even further when also updating
uniforms through the CE, but this seems large enough for now.

v2: Add preamble IB.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-19 18:10:30 +02:00
Marek Olšák
7997b5f005 winsys/amdgpu: Add support for const IB.
v2: Use the correct IB to update request (Bas Nieuwenhuizen)
v3: Add preamble IB. (Bas Nieuwenhuizen)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-19 18:10:30 +02:00
Marek Olšák
e78170f388 winsys/amdgpu: split IB data into a new structure in preparation for CE
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-04-19 18:10:30 +02:00
Marek Olšák
f4b77c764a gallium/radeon: move ring_type into winsyses
Not used by drivers.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-04-19 18:10:30 +02:00
Jose Fonseca
1d2ac7a7ca llvmpipe: Call LLVMShutdown before exiting.
So that LLVM frees its globals.

Trivial.
2016-04-19 12:10:09 +01:00
Jose Fonseca
524042fa35 llvmpipe: Avoid LLVMGetGlobalContext in tests.
Trivial.
2016-04-19 12:10:02 +01:00
Jose Fonseca
bb9e8c5090 llvmpipe: Skip false exp2 failure in lp_test_arit due to buggy MSVCRT.
64bits MSVCRT's exp2f(-inf) returns -inf instead of 0.  Tested with
MSVC 2013's CRT.  (I haven't tried 2015 yet.)

Also this does not happen with MinGW.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-04-19 11:31:53 +01:00
Jose Fonseca
ee9876be1d llvmpipe: Test more vector lengths.
All power of two of up native vector length.

There is actually a bug in lp_build_round for v2, whereby it doesn't
round to nearest.  Fixing is left to the future, but the test is now
able to expect it to fail.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-04-19 11:31:44 +01:00
Jose Fonseca
932b71f17d gallivm: Avoid llvm::sys::getProcessTriple().
Just use LLVM_HOST_TRIPLE, which is available at least from LLVM 3.3
onwards, and is pretty much what llvm::sys::getProcessTriple() does anyway,

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-04-19 11:31:37 +01:00
Jose Fonseca
b5ca689cee gallivm: Remove lp_get_module_id.
Just keep a copy of the module_name in gallivm.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-04-19 11:31:26 +01:00
Jose Fonseca
969ba8bfa7 gallivm: Fix MCJIT with LLVM 3.3.
One needs to call setJITMemoryManager for LLVM 3.3, instead of
setMCJITMemoryManager.

This regressed in commits 065256df/75ad4fe7 when trying to make the
code to build with LLVM 3.6.

Tested MCJIT with LLVM 3.3 to 3.6.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-04-19 11:31:17 +01:00
Jose Fonseca
cf4105740f gallivm: Make MCJIT a runtime option.
On the LLVM versions that support it, so we can easily switch between
MCJIT/old-jit for testing.

The new option is GALLIVM_MCJIT.

Unfortunately setting GALLIVM_MCJIT=1 for LLVM 3.3 or 3.4 causes
segfault, both on Linux and Windows.  I'm almost certain this used to
work, so there probably is a regression somewhere.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-04-19 11:31:14 +01:00
Jose Fonseca
7d2151b6ea scons: Show the unit test full path.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-04-19 11:31:11 +01:00
Jose Fonseca
2211f8d559 gallivm: Use LLVMSetTarget.
Instead of LLVM C++ interfaces.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-04-19 11:31:00 +01:00
Jose Fonseca
9aa23b11e4 gallivm: Use LLVMPrintValueToString where available.
And llvm::raw_string_ostream where not (LLVM 3.3).

Thereby eliminating yet another dependency on unstable LLVM interfaces.

As a bonus this also gets LLVM IR on OutputDebugMessageA on MSVC (which
was disabled, probably due to C++ issues.)

Tested `lp_test_arit -v -v` on LLVM 3.3, 3.4 and 3.8.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-04-19 11:28:37 +01:00
Jose Fonseca
f6621cd3be gallium/tests: Update UTIL_FORMAT_MAX_* defines.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-04-19 11:28:16 +01:00
Jose Fonseca
121a0cedc8 Revert "nv50/ra: isinf() is in namespace std since C++11."
This reverts commit f525db6358.

It was superseeded by commit 649704f1f7.
2016-04-19 11:22:45 +01:00
Eric Anholt
802b9292aa vc4: Fix fbo-generatemipmap-formats for NPOT.
Single-sampled texture miplevels > 1 are stored in POT-aligned areas, but
we only get one value to control the stride of the src and dst for single
sampled buffers.  A RCL tile blit from level != 1 to level == 0 would
therefore load from the wrong stride.
2016-04-18 16:55:36 -07:00
Eric Anholt
2402bb6095 vc4: Remove unused "immediates" field
This was for TGSI, which we no longer have to deal with.
2016-04-18 16:48:45 -07:00
Ben Widawsky
2408899cb2 i965: Define miptree map functions static (trivial)
They were already declared as such. It was changed here:
commit 31f0967fb5
Author: Ian Romanick <ian.d.romanick@intel.com>
Date:   Wed Sep 2 14:43:18 2015 -0700

    i965: Make intel_miptree_map_raw static

Cc: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-04-18 16:12:13 -07:00
Matt Turner
b1d9353cb5 glsl: Properly handle ldexp(0.0f, non-zero-exp). 2016-04-18 15:48:54 -07:00
Dave Airlie
3a26ef23e7 gallivm: convert size query to using a set of parameters.
This isn't currently that easy to expand, so fix it up
before expanding it later to include dynamic samplers.

[airlied: use some local variables (Roland)]

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-04-19 07:33:39 +10:00
Tim Rowley
3227c10270 swr: dereference cbuf/zbuf/views on context destroy
Fixes resource memory leaks.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-18 15:52:26 -05:00
Rob Clark
77a9107bf2 freedreno/ir3: fix grouping issue w/ reverse swizzles
When we have something like:

   MOV OUT[n], IN[m].wzyx

the existing grouping code was missing a potential conflict.  Due to
input needing to be sequential scalar regs, we have:

 IN:  x <-> y <-> z <-> w

which would be grouped to:

 OUT: w <-> z2 <-> y2 <-> x  (where the 2 denotes a copy/mov)

but that can't actually work.  We need to realize that x and w are
already in the same chain, not just that they aren't both already in
new chain being built.

With this fixed, we probably no longer need the hack from f68f6c0.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-18 15:41:32 -04:00
Marek Olšák
ed66c75784 radeonsi: use enums in si_shader.h
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-18 19:51:25 +02:00
Marek Olšák
0c52caf7b7 gallium/radeon: use enums in r600_query.h
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-18 19:51:25 +02:00
Marek Olšák
dd9ca77cb9 radeonsi: always use PFP_SYNC_ME when doing flushes and waits
This is typically used by the closed driver before SURFACE_SYNC.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-18 19:51:25 +02:00
Marek Olšák
1db5678688 radeonsi: don't do VS/PS partial flushes if SURFACE_SYNC waits too
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-18 19:51:25 +02:00
Marek Olšák
58494b42b5 radeonsi: add safety assertions for meta cache flushes
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-18 19:51:25 +02:00
Marek Olšák
78f58a4e6f radeonsi: don't use ACQUIRE_MEM on the graphics ring
It's only required on the compute ring. This matches the closed driver.

The compute flag is removed to prevent confusion and Bas's compute shader
patches remove it in the whole function.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-18 19:51:25 +02:00
Marek Olšák
3faecdd4e1 radeonsi: remove TODO and correct a comment in si_emit_cache_flush
Yes, that flag is really needed.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-18 19:51:25 +02:00
Marek Olšák
28c2573b4f radeonsi: don't flush CB/DB caches for performance counters
I'm not sure about this. This will make the engines go idle, but the caches
will be unflushed. This should match app behavior without performance
counters, which can be a good thing.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-18 19:51:24 +02:00
Marek Olšák
97c328b2a3 gallium/radeon: don't flush CB/DB caches for timestamp queries
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-18 19:51:24 +02:00
Marek Olšák
6dc21b1962 gallium/util: fix undefined shift to the last bit in u_bit_scan
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-18 19:51:24 +02:00
Marek Olšák
9434aa8103 gallium/util: fix u_bit_scan_consecutive_range for mask == 0xffffffff
The second ffs returns 0, yielding count == -1.

v2: change 1 to 1u

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-04-18 19:51:24 +02:00
Marek Olšák
e50e1f86b0 gallium/radeon: fix Nine with its slightly shifted viewports
just need to do the calculation in floating-point and then round things
properly

Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-04-18 19:51:24 +02:00
Erik Faye-Lund
ee5b35142a docs: correct name for GL_OES_primitive_bounding_box
When this extension was added, an underscore were mistakenly replaced
by a space. Let's correct this, so it's a tad easier to grep for this
extension.

Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
2016-04-18 10:48:57 -07:00