Commit graph

26626 commits

Author SHA1 Message Date
Dave Airlie
aeb2be3a2f r600: use rctx consistently in evergreen_compute.c
Another step towards cleaning this up.

Acked-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-04-07 04:39:05 +01:00
Dave Airlie
0560c82ff6 r600: cleanup whitespace in evergreen_compute.c
This aligns the code with the style of the rest of the driver.

Makes editing it a lot less painful.

Acked-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-04-07 04:38:51 +01:00
Edward O'Callaghan
ea310f2b38 r600g: Enable ARB_framebuffer_no_attachments
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-07 12:03:59 +10:00
Edward O'Callaghan
483a686f80 radeonsi: Enable ARB_framebuffer_no_attachments
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-07 12:03:59 +10:00
Edward O'Callaghan
1156cad405 radeonsi: Improve assert info out of si_set_framebuffer_state()
Let's give the developer a little help if we are going to assert
on a zero literal at the end of a branch.

Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-07 12:03:58 +10:00
Edward O'Callaghan
bb1bd0ddd7 radeonsi: Allow 16 samples MSAA mode for PIPE_FORMAT_NONE
For ARB_framebuffer_no_attachments, an is_format_supported() query
with 'PIPE_FORMAT_NONE' passed implies a query of the number of
samples supported by a framebuffer with no attachments.
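A minimal sketch of the check described above, using a simplified stand-in for a driver's is_format_supported() callback. The enum, the function name, and the exact list of accepted sample counts are hypothetical illustrations, not radeonsi's actual code.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the gallium format enum. */
enum fake_format { FAKE_FORMAT_NONE = 0, FAKE_FORMAT_RGBA8 = 1 };

/* Hypothetical sketch: for FORMAT_NONE the query really asks "how many
 * samples can a framebuffer with no attachments use?", so accept the
 * power-of-two sample counts up to 16 regardless of surface format. */
static bool sketch_is_format_supported(enum fake_format format,
                                       unsigned sample_count)
{
    if (format == FAKE_FORMAT_NONE) {
        switch (sample_count) {
        case 0: case 1: case 2: case 4: case 8: case 16:
            return true;
        default:
            return false;
        }
    }
    /* Real formats would go through the usual per-format checks;
     * this sketch only accepts single-sampled rendering for them. */
    return sample_count <= 1;
}
```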

Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-07 12:03:58 +10:00
Edward O'Callaghan
63f2b2f2c0 softpipe: Set samples and layers in set_framebuffer_state() cb
Carries across the number of samples and layers state in the
'softpipe_set_framebuffer_state()' callback. This state is
part of 'ARB_framebuffer_no_attachments' support.

Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-07 12:03:58 +10:00
Edward O'Callaghan
7ff28d2af0 gallium/trace: Dump no.of samples and layers in fb state
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-07 12:03:58 +10:00
Edward O'Callaghan
0b7075fed7 gallium: Put no.of {samples,layers} into pipe_framebuffer_state
Here we store the number of samples and layers directly in the
pipe_framebuffer_state so that in the case of
ARB_framebuffer_no_attachment we may make use of them directly.

Further, we adjust various gallium/auxiliary helper functions
accordingly.

V2:
  Convert branches in util_framebuffer_get_num_layers() and
  util_framebuffer_get_num_samples() to their canonical form.

V3:
  Apply ('git stash pop') the typo fix of 'cbufs' to
  'nr_cbufs' that was missing from V2, whoops! Thanks Marek for
  pointing this out yet again.

V4:
  Squash in the following patch:

  'gallium/util: Ensure util_framebuffer_get_num_samples() is valid'

   Upon context creation, internal driver structures are malloc()'ed
   and memset() to zero. This results in an invalid number of
   samples 'by default'. Handle this in the simplest way to avoid
   elaborate and probably equally sub-optimal solutions.
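A sketch of the helper logic described above, assuming heavily simplified stand-ins for pipe_surface and pipe_framebuffer_state (the struct and function names here are hypothetical, not gallium's real definitions). It checks nr_cbufs rather than the always-present cbufs array, and clamps a zeroed sample count to 1.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the real gallium structs. */
struct sketch_surface { unsigned nr_samples; };

struct sketch_fb_state {
    unsigned nr_cbufs;
    struct sketch_surface *cbufs[8];
    struct sketch_surface *zsbuf;
    unsigned samples;   /* meaningful only when there are no attachments */
    unsigned layers;
};

static unsigned sketch_max_u(unsigned a, unsigned b) { return a > b ? a : b; }

static unsigned sketch_fb_get_num_samples(const struct sketch_fb_state *fb)
{
    /* No attachments: use the count stored in the state itself, clamped
     * to 1 because a memset()-zeroed state would otherwise report 0. */
    if (fb->nr_cbufs == 0 && fb->zsbuf == NULL)
        return sketch_max_u(fb->samples, 1);

    /* Otherwise the first bound attachment determines the count. */
    for (unsigned i = 0; i < fb->nr_cbufs; i++) {
        if (fb->cbufs[i])
            return sketch_max_u(fb->cbufs[i]->nr_samples, 1);
    }
    if (fb->zsbuf)
        return sketch_max_u(fb->zsbuf->nr_samples, 1);
    return 1;
}
```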

Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-07 12:03:58 +10:00
Edward O'Callaghan
4bc9130fba gallium: Add PIPE_CAP_FRAMEBUFFER_NO_ATTACHMENT
Add PIPE_CAP to determine if the GL extension
'GL_ARB_framebuffer_no_attachments' shall be
supported.

The driver is required to support 'PIPE_FORMAT_NONE'
via its 'is_format_supported()' callback in order
to determine the MSAA modes the hardware supports so
that values requested from the application using
'GL_ARB_framebuffer_no_attachments' may be quantized
to what the hardware expects.
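One plausible quantization policy, sketched under the assumption that the state tracker rounds the requested count up to the smallest count the driver accepts for 'PIPE_FORMAT_NONE'. The callback type, function names, and the example hardware are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical callback standing in for is_format_supported()
 * called with PIPE_FORMAT_NONE and a given sample count. */
typedef bool (*sketch_sample_query_fn)(unsigned sample_count);

/* Round the requested count up to the smallest supported count no larger
 * than max_samples. Returns 0 for "no MSAA" or if nothing fits. */
static unsigned sketch_quantize_samples(sketch_sample_query_fn supported,
                                        unsigned requested,
                                        unsigned max_samples)
{
    if (requested == 0)
        return 0;
    for (unsigned s = requested; s <= max_samples; s++) {
        if (supported(s))
            return s;
    }
    return 0;
}

/* Example query: hardware that supports 2, 4, 8 and 16 samples. */
static bool sketch_example_hw_supports(unsigned n)
{
    return n == 2 || n == 4 || n == 8 || n == 16;
}
```

With this policy a request for 3 samples becomes 4 and a request for 5 becomes 8.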

V.2:
 Fix doc for a more detailed description of the PIPE_CAP
 and the corresponding GL constant.

V.3:
 Renamed and repurposed once again.

V.4:
 Remove CAP from cap_mapping array.

[airlied: fix damaged whitespace]

Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-04-07 11:56:44 +10:00
Bas Nieuwenhuizen
3393358115 radeonsi: set shader calling conventions
Note that old mesa + new LLVM or new mesa + old LLVM breaks
with this change and the corresponding LLVM change (D18559).

For LLVM version <= 3.8 we use the old method, but we can't detect
people using a post 3.8 svn version that is still too old.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-06 21:54:35 +02:00
Rob Clark
506b561ba7 freedreno/ir3: insert extra move into phi
We had an implicit assumption that the phi src was assigned in its
source (pred) block leading into the phi.  But this is not true with
NIR, so we can't just ignore the source block specified in the
nir_phi_src.  Insert an extra mov in the source block.  If it is not
required the CP pass will take it back out again.
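A minimal sketch of the transformation on a toy IR, assuming hypothetical structs (this is not ir3's data model): each phi source gets a fresh mov at the end of the predecessor block it names, so the value is guaranteed to be assigned in that block.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified IR for illustration only. */
#define SKETCH_MAX_INSTRS 16

struct sketch_instr { int dst; int src; };          /* "mov dst, src" */
struct sketch_block {
    struct sketch_instr instrs[SKETCH_MAX_INSTRS];
    int count;
};
struct sketch_phi_src { struct sketch_block *pred; int value; };

static int sketch_next_reg = 100; /* arbitrary first temporary register */

/* Append "mov tmp, value" to each phi source's predecessor block and make
 * the phi read tmp. Returns false if a block has no room for the mov. */
static bool sketch_insert_phi_movs(struct sketch_phi_src *srcs, int n)
{
    for (int i = 0; i < n; i++) {
        struct sketch_block *b = srcs[i].pred;
        if (b->count >= SKETCH_MAX_INSTRS)
            return false;
        int tmp = sketch_next_reg++;
        b->instrs[b->count].dst = tmp;
        b->instrs[b->count].src = srcs[i].value;
        b->count++;
        srcs[i].value = tmp;
    }
    return true;
}
```

The extra movs are inserted unconditionally; per the message above, copy propagation removes the ones that turn out to be unnecessary.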

Fixes:

  ./tests/spec/glsl-1.10/execution/vs-call-in-nested-loop.shader_test
  ./tests/spec/glsl-1.10/execution/vs-inner-loop-modifies-outer-loop-var.shader_test

and probably others.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-05 15:04:43 -04:00
Rob Clark
f9cdbf4405 freedreno/ir3: eliminate unnecessary absneg's
The frontend inserts (abs) and (neg)'s to convert between NIR boolean
(~0/0) and native boolean (1/0).  So we'd end up with things like:

   cmps.s.ge r1.x, ...
   absneg.s r1.x, (neg)r1.x
   absneg.s r1.x, (abs)r1.x
   sel.b32 r2.x, r0.x, r1.x, r0.y

The (neg) already gets collapsed due to the following (abs).  Now by
realizing that r1.x comes from a cmps.s instruction, we can drop the
(abs) as well.
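The arithmetic behind this can be modelled with plain 32-bit integers: (neg) maps a native 1/0 to the NIR ~0/0 form, (abs) maps ~0/0 back to 1/0, and (abs) applied to a value that is already 0 or 1 (as produced by cmps.s) changes nothing. The function names are illustrative only.

```c
#include <stdint.h>

/* Model of the two modifiers on 32-bit integer registers. Inputs are
 * assumed to be booleans (0, 1 or ~0), so negation cannot overflow. */
static int32_t sketch_neg(int32_t x) { return -x; }             /* (neg) */
static int32_t sketch_abs(int32_t x) { return x < 0 ? -x : x; } /* (abs) */
```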

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-05 15:04:25 -04:00
Michel Dänzer
0daab9878d clover: Fix build against clang SVN >= r265359
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2016-04-05 17:00:58 +00:00
Bas Nieuwenhuizen
799789ba99 radeonsi: use bounded indexing for samplers
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-05 19:19:18 +02:00
Bas Nieuwenhuizen
713353db18 radeonsi: use bounded indexing for constant buffers
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-05 19:19:07 +02:00
Marek Olšák
a64dbdf612 gallium/radeon: allow multiple exports of the same texture with different usage
Instead of failing an assertion, disable DCC and CMASK on the first export
that needs it, and merge the external usage flags.
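A sketch of the flag merge, assuming (one reading of the v2 note) that EXPLICIT_FLUSH should remain set only while every export has requested it. The flag names and values are hypothetical, and the DCC/CMASK handling is not modelled.

```c
/* Hypothetical flag bits standing in for the external usage flags. */
#define SKETCH_USAGE_READ           (1u << 0)
#define SKETCH_USAGE_WRITE          (1u << 1)
#define SKETCH_USAGE_EXPLICIT_FLUSH (1u << 2)

/* Merge the usage of a later export into the accumulated usage. */
static unsigned sketch_merge_usage(unsigned current, unsigned new_usage)
{
    /* A later export never adds EXPLICIT_FLUSH... */
    current |= new_usage & ~SKETCH_USAGE_EXPLICIT_FLUSH;
    /* ...and it is dropped as soon as one export does not request it. */
    if (!(new_usage & SKETCH_USAGE_EXPLICIT_FLUSH))
        current &= ~SKETCH_USAGE_EXPLICIT_FLUSH;
    return current;
}
```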

v2: clear the EXPLICIT_FLUSH flag if it's not set; whitespace fixes

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2016-04-05 15:32:40 +02:00
Rob Clark
3e13572826 freedreno/ir3: deal with duplicate phi sources
Otherwise we end up with funny things like:

  mov.f32f32 r0.x, r1.y
  mov.f32f32 r0.x, r1.y

(It doesn't happen as much after fixing the problem w/ CP into phi src,
but it can still happen since we aren't too clever about generating phi
sources in the first place.)

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-04 20:18:18 -04:00
Rob Clark
f8feb97ba5 freedreno/ir3: fix silly brain-fart in RA
We want to consider all the vars, not 1/32nd of them, when extending
live-ranges.
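The message does not show the code. A common way to end up visiting only 1/32nd of the variables is to loop over the number of 32-bit words in a bitset instead of the number of bits. A hypothetical sketch of that pitfall (names are illustrative, not ir3's RA code):

```c
#include <stdint.h>

#define SKETCH_BITS_PER_WORD 32u

/* Number of 32-bit words needed to hold n bits. */
static unsigned sketch_bitset_words(unsigned n)
{
    return (n + SKETCH_BITS_PER_WORD - 1) / SKETCH_BITS_PER_WORD;
}

static int sketch_bitset_test(const uint32_t *set, unsigned i)
{
    return (set[i / SKETCH_BITS_PER_WORD] >> (i % SKETCH_BITS_PER_WORD)) & 1u;
}

/* Count live variables by visiting every bit, i.e. every variable.
 * Bounding the loop by sketch_bitset_words(nvars) instead would only
 * visit the first 1/32nd of them. */
static unsigned sketch_count_live(const uint32_t *set, unsigned nvars)
{
    unsigned live = 0;
    for (unsigned i = 0; i < nvars; i++)
        live += sketch_bitset_test(set, i);
    return live;
}
```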

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-04 20:18:18 -04:00
Rob Clark
8e451c2d06 freedreno/ir3: don't cp into phi's
The block defining a phi source might not have been executed.  If we
allow copy propagation, we could end up pointing to a src instruction in
the wrong block.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-04 20:18:18 -04:00
Rob Clark
383b6e87f9 freedreno/ir3: we can't store immediate values
Fixes some transform-feedback piglits, like:

bin/ext_transform_feedback-nonflat-integral

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-04 20:18:18 -04:00
Rob Clark
d47fb856af freedreno/ir3: add dumping for use/def/live-in/live-out
Turned out to be useful to debug an issue in RA.  Let's keep it.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-04 20:18:18 -04:00
Rob Clark
38ae05a340 freedreno/ir3: drop unused instr category arg
No longer used, so drop the extra arg to ir3_instr_create()

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-04 20:18:18 -04:00
Rob Clark
19739e4fb9 freedreno/ir3: remove ir3_instruction::category
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-04 20:18:18 -04:00
Rob Clark
70735643f4 freedreno/ir3: encode instruction category in opc_t
Been on my TODO list for a while.  If nothing else this will make gdb
properly grok the opc_t enum.

This first step preserves ir3_instruction::category (with an added
assert that category matches what is encoded in opc_t).  Next step is
to drop the category field (and arg to ir3_instr_create()), but that
is split into next commit for bisectability and so that we can run
piglit in the intermediate state to flush out any problems.
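A sketch of the encoding idea, assuming the category is packed into the upper bits of each enum value. The bit width, macro names, and the opcode/category numbers are illustrative assumptions, not ir3's actual values. Because each enumerator is still a named enum constant, a debugger can print the opcode by name.

```c
/* Hypothetical: pack the instruction category above the opcode bits. */
#define SKETCH_NOPC_BITS 6
#define SKETCH_OPC(cat, opc) (((cat) << SKETCH_NOPC_BITS) | (opc))

typedef enum {
    SKETCH_OPC_NOP   = SKETCH_OPC(0, 0),
    SKETCH_OPC_ADD_F = SKETCH_OPC(2, 0),
    SKETCH_OPC_MUL_F = SKETCH_OPC(2, 1),
    SKETCH_OPC_SIN   = SKETCH_OPC(4, 12),
} sketch_opc_t;

/* Recover the category and the per-category opcode from the enum. */
static inline unsigned sketch_opc_cat(sketch_opc_t opc)
{
    return (unsigned)opc >> SKETCH_NOPC_BITS;
}

static inline unsigned sketch_opc_op(sketch_opc_t opc)
{
    return (unsigned)opc & ((1u << SKETCH_NOPC_BITS) - 1);
}
```

In the intermediate state described above, an assert comparing the stored category with sketch_opc_cat() of the opcode would catch any mismatch.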

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-04 20:18:18 -04:00
Ilia Mirkin
4bc3b1ca48 nvc0: add hardware ETC2 and ASTC support on GK20A and GM107+
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-04 00:32:48 -04:00
Jose Fonseca
7ad49daca6 gallivm: Introduce lp_format_intrinsic.
For adding .v4f32-like suffixes to intrinsics, taking special care for the
scalar case, which was often neglected.

This fixes invalid IR when doing mipmap filtering on SSE2 (the only
case where we'd use intrinsics with scalars.)
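A sketch of the suffix formatting, showing why the scalar case needs its own branch: a vector of 4 floats gives "llvm.floor.v4f32", while a scalar must give "llvm.floor.f32" rather than ".v1f32". The function name and parameters are hypothetical; this is not the signature of gallivm's lp_format_intrinsic().

```c
#include <stdio.h>

/* Build "<name>.v<len><kind><bits>" for vectors and "<name>.<kind><bits>"
 * for scalars (length 1). Returns the snprintf() result. */
static int sketch_format_intrinsic(char *buf, size_t size,
                                   const char *name, char elem_kind,
                                   unsigned elem_bits, unsigned length)
{
    if (length > 1)
        return snprintf(buf, size, "%s.v%u%c%u",
                        name, length, elem_kind, elem_bits);
    return snprintf(buf, size, "%s.%c%u", name, elem_kind, elem_bits);
}
```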

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-04-04 00:06:09 +01:00
Jose Fonseca
a293f57e13 gallivm: Use llvm.fabs.
Exactly the same code.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-04-03 22:09:09 +01:00
Jose Fonseca
e4f01da15d gallivm: Prefer backend agnostic intrinsic for rounding.
We could unconditionally use these intrinsics, but performance with SSE2
would suck, as LLVM falls back to calling libm.

lp_test_arit.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-04-03 22:09:07 +01:00
Jose Fonseca
324451e73f gallivm: Add debug option to force SSE2.
For simulating less capable machines.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-04-03 22:08:57 +01:00
Jose Fonseca
5fa31a4aba llvmpipe: Test abs.
Trivial.
2016-04-03 11:17:20 +01:00
Jose Fonseca
522ebe701d llvmpipe: Build lp_test_arit on MSVC too.
It builds fine now.  Probably due to C99 support.

Trivial.
2016-04-03 11:17:20 +01:00
Jose Fonseca
b284f1f7f9 gallivm: Fix performance regressions due to vector selects.
LLVM often can't determine the mask elements are all ones/zeros, and
there doesn't seem to be a good way to hint that.
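When every mask lane is known to be all ones or all zeros, a per-lane select is equivalent to the bitwise form (a & m) | (b & ~m), which SSE-class hardware executes cheaply; if the mask might hold other values the two differ, so LLVM cannot always substitute one for the other. A scalar model of one lane, with hypothetical names (not gallivm's lp_build_select()):

```c
#include <stdint.h>

/* Per-lane select: any non-zero mask picks a. */
static uint32_t sketch_select_lane(uint32_t m, uint32_t a, uint32_t b)
{
    return m ? a : b;
}

/* Bitwise select: matches sketch_select_lane() only when m is 0 or ~0. */
static uint32_t sketch_bitwise_select_lane(uint32_t m, uint32_t a, uint32_t b)
{
    return (a & m) | (b & ~m);
}
```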

Thanks to Roland Scheidegger for spotting and analyzing the issue.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-04-03 09:51:27 +01:00
Jose Fonseca
11c4e5b45c gallivm: Remove lp_build_load_volatile.
No longer needed.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-04-03 09:51:27 +01:00
Jose Fonseca
bcfb86b09d gallivm: Use standard LLVMSetAlignment from LLVM 3.4 onwards.
Only provide a fallback for LLVM 3.3.

One less dependency on LLVM C++ interface.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-04-03 09:51:27 +01:00
Ilia Mirkin
d64134ecae gm107/ir: add OP_SELP emission, used in DSQRT lowering
The current DSQRT lowering code emits an OP_SELP, so we have to handle
its emission. This will eventually go away, but no harm supporting this
op.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-02 19:27:51 -04:00
Ilia Mirkin
3610b1466d nv50/ir: we can't load local memory directly into an output
This fixes piglit tests like

tests/spec/glsl-1.10/execution/variable-indexing/vs-output-array-float-index-wr.shader_test

and related ones.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
2016-04-02 18:10:20 -04:00
Samuel Pitoiset
0852c5703b nv50/ir: fix envyas variants when building the code lib
nvc0 and nve4 have been respectively replaced by gf100 and gk104.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-02 20:00:57 +02:00
Brian Paul
36d8fed798 svga: remove unused svga_compile_key::texture_msaa field
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-04-02 08:05:20 -06:00
Brian Paul
b283c76342 svga: check TXF instruction's target to determine MSAA
Rather than the currently bound texture.  This goes along with the
earlier patch to get away from examining bound textures and sampler
views during shader translation.

Fixes VMware bug 1632739.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-04-02 08:05:20 -06:00
Brian Paul
ef10b5427a tgsi: add simple tgsi_is_msaa_target() helper
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-04-02 08:05:20 -06:00
Bas Nieuwenhuizen
1a5c8c24b5 gallium: distinguish between shader IR in get_compute_param
For radeonsi, native and TGSI use different compilers and this results
in different limits for different IRs.

The set we strictly need for radeonsi is only the MAX_BLOCK_SIZE
and MAX_THREADS_PER_BLOCK params, but I added a few others as shader
related that seemed like they would also typically depend on the
compiler.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-04-02 01:51:13 +02:00
Bas Nieuwenhuizen
be5899dcf9 gallium: add global buffer memory barrier bit
Currently radeonsi synchronizes after every dispatch and Clover
does nothing to synchronize. This is overzealous, especially with
GL compute, so add a barrier for global buffers.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-04-02 01:51:06 +02:00
Bas Nieuwenhuizen
01f993a21f gallium: add threads per block TGSI property
The value 0 for unknown has been chosen so that
drivers using tgsi_scan_shader do not need to detect
missing properties if they zero-initialize the struct.
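A sketch of why 0 works as "unknown": a zero-initialized info struct needs no separate "was this property declared?" flag, and the driver simply falls back to a default. The struct, function, and the fallback parameter are hypothetical stand-ins, not tgsi_shader_info itself.

```c
/* Hypothetical, simplified stand-in for the scanned shader info. */
struct sketch_shader_info {
    unsigned threads_per_block; /* 0 = not declared by the shader */
};

/* Use the declared value if present, else an assumed hardware maximum. */
static unsigned sketch_threads_per_block(const struct sketch_shader_info *info,
                                         unsigned hw_max)
{
    return info->threads_per_block ? info->threads_per_block : hw_max;
}
```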

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-04-02 01:50:59 +02:00
Bas Nieuwenhuizen
ea8f4a6b13 gallium: add compute shader IR type
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-04-02 01:49:57 +02:00
Samuel Pitoiset
60e1c6a7fc nvc0: enable compute shaders on GK104 and GM107+
Compute support on GK110 is still unstable for weird reasons, but
this can be fixed later as the NVF0_COMPUTE envvar prevents using
compute.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-01 22:26:24 +02:00
Samuel Pitoiset
71f327aa21 nvc0: bump the maximum number of UBOs for compute on Kepler
The maximum number of uniform blocks (MAX_COMPUTE_UNIFORM_BLOCKS)
per compute program must be at least 12.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-01 22:26:24 +02:00
Samuel Pitoiset
839a469166 nvc0/ir: do not lower shared+atomics on GM107+
For Maxwell, the ATOMS instruction can be used to perform atomic
operations on shared memory instead of this load/store lowering pass.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-01 22:26:24 +02:00
Samuel Pitoiset
543fb95473 nvc0/ir: add atomics support on shared memory for Kepler
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-01 22:26:24 +02:00
Samuel Pitoiset
275019d7db nvc0/ir: fix wrong pred emission for ld lock on GK104
This fixes 84b9b8f (nvc0/ir: add missing emission of locked load
predicate).

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-01 22:26:24 +02:00