Commit graph

15886 commits

Author SHA1 Message Date
Bas Nieuwenhuizen
fc67375379 radeonsi: Synchronize a streamout write after read hazard.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-12 13:55:38 +02:00
Hans de Goede
dccdb655a1 nv30: Add missing PIPE_SHADER_CAP_INTEGERS to get_shader_param()
Add missing PIPE_SHADER_CAP_INTEGERS for frag shaders to
nv30_screen_get_shader_param().

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-04-12 11:41:12 +02:00
Dave Airlie
afa8707ba9 softpipe: add SSBO/shader atomics support.
This adds support for the features required for ARB_shader_storage_buffer_object
and ARB_shader_atomic_counters, ARB_shader_atomic_counter_ops.

[airlied: some cleanups applied]
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-04-12 14:16:13 +10:00
Dave Airlie
081a958bcd tgsi: add support for buffer/atomic operations to tgsi_exec.
This adds support for doing load/store/atomic operations on
buffer objects.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-04-12 14:15:33 +10:00
Boyuan Zhang
1c7ba7f156 radeon/uvd: alignment fix for decode message buffer
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-04-11 19:30:47 -04:00
Jason Ekstrand
a9e6213edd nir/lower_system_values: Add support for several computed values
Reviewed-by: Rob Clark <robdclark@gmail.com>
2016-04-11 13:53:03 -07:00
Emil Velikov
5e010a72c9 drivers/softpipe: add missing header to the tarball
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-04-11 19:08:23 +01:00
Ilia Mirkin
cdb6fa91fa nvc0: handle the case where there are no framebuffer attachments
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-09 14:55:44 -04:00
Ilia Mirkin
59ca92137b nv50,nvc0: support sending string markers down into the command stream
This should hopefully make it a little easier to debug with GL
applications like glretrace and looking at command streams.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-09 14:55:43 -04:00
Ilia Mirkin
f9480d7918 nv50,nvc0: add invalidate_resource support for buffer resources
Provide a callback to reallocate the underlying storage of a resource so
that it is not bound to any existing fences.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-09 14:55:43 -04:00
Eric Anholt
30b818d5eb vc4: Move FRAG_X/Y/REV_FLAG to a QFILE like VPM or TLB color writes.
This gives us one less set of special instruction generation cases, and
instead just the case for returning the correct register to read.
2016-04-08 18:41:46 -07:00
Eric Anholt
f029932cac vc4: Allow TLB Z/color/stencil writes from any ALU operation in QIR.
This lets us write the Z directly from the FTOI for computed Z, and may
let us coalesce color writes in the future.

No change in my shader-db, but clearly drops an instruction in piglit's
early-z test.
2016-04-08 18:41:46 -07:00
Eric Anholt
44d7b8ad12 vc4: Add a helper function for the construction of qregs.
The separate declaration of the struct is not helping clarity, and I was
going to be writing a whole lot more of these in the upcoming patches.
2016-04-08 18:41:45 -07:00
Eric Anholt
114c8b38d3 vc4: Add missing scheduling dependency for MS color writes.
2016-04-08 18:41:45 -07:00
Eric Anholt
483c172989 vc4: Drop the multi_instruction distinction for QIR instructions.
It wasn't correctly flagged everywhere, and QPU generation now handles the
only remaining case that was paying attention to it.

No change on shader-db.
2016-04-08 18:41:45 -07:00
Eric Anholt
a8b525f8c4 vc4: Handle SF on instructions that write r4.
Normal SFU writes couldn't have SF because they were marked as
multi_instruction, but tex_result and tlb_color_read weren't.  This ended
up not being a problem according to anything in shader-db, but it seems
possible.
2016-04-08 18:41:45 -07:00
Eric Anholt
e46b48963a vc4: Allow multi-instruction QIR nodes to get VPM optimization.
There used to be multi-instruction operations that would use src[] twice,
which is why we couldn't do some optimizations on them.  This is no longer
the case.

total instructions in shared programs: 77973 -> 77969 (-0.01%)
instructions in affected programs:     84 -> 80 (-4.76%)
total estimated cycles in shared programs: 234165 -> 234157 (-0.00%)
estimated cycles in affected programs:     92 -> 84 (-8.70%)
2016-04-08 18:41:45 -07:00
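The percentages in the shader-db summary above are plain relative changes. A minimal sketch of the arithmetic, assuming a hypothetical helper named percent_change (not part of shader-db or Mesa):

```c
#include <assert.h>

/* Hypothetical helper: relative change from `before` to `after`, in percent. */
static double percent_change(long before, long after)
{
    assert(before != 0); /* a zero baseline has no relative change */
    return 100.0 * (double)(after - before) / (double)before;
}
```

With the figures quoted above, percent_change(84, 80) is about -4.76 and percent_change(92, 84) is about -8.70, matching the "affected programs" lines.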
Eric Anholt
99a759a4a3 vc4: Switch to using NIR_PASS macros.
This gets us better validation of our NIR transformations.
2016-04-08 18:41:45 -07:00
Eric Anholt
7030eadbed vc4: Handle nir_intrinsic_load_user_clip_plane as a vec4.
I liked having all my NIR be scalar, but nir_validate() complains that the
intrinsic writes 4 components but the destination we set up was only 1
component.  I could generate a new scalar variant, but it's a lot easier
to just leave it as a vec4.  This doesn't hurt codegen since we GC unused
uniforms, and UCP dot products use all the components anyway.
2016-04-08 18:40:55 -07:00
Rhys Kidd
40e77741cf vc4: Emit a warning and proceed for handling loops in NIR.
We don't really support control flow yet, but it's a lot nicer to render
something and warn on stderr than to crash.

Fixes the following piglit tests:
- shaders/complex-loop-analysis-bug
- shaders/glsl-fs-discard-04

Converts the following piglit tests from crash to fail:
- shaders/glsl-fs-continue-inside-do-while
- shaders/glsl-fs-loop
- shaders/glsl-fs-loop-continue
- shaders/glsl-fs-loop-nested
- shaders/glsl-texcoord-array
- shaders/glsl-vs-continue-inside-do-while
- shaders/glsl-vs-loop
- shaders/glsl-vs-loop-continue
- shaders/glsl-vs-loop-nested

No piglit regressions.

v2 (Eric): Add stronger stderr warning.

Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-04-08 18:28:43 -07:00
Rhys Kidd
2450b219e5 vc4: Add a stub for NIR->QIR of control flow function nodes
We shouldn't have any NIR functions present since all GLSL functions get
inlined, but this would be a more informative error if it does happen.

Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-04-08 18:28:43 -07:00
Rhys Kidd
e5997778bc vc4: Add better debug of NIR->QIR control flow graph failure
Ensure NIR control flow graph nodes that are unhandled in QIR
are reported with sufficient verbosity to aid debugging.

This improves piglit outputs, amongst other tools.

There are no other remaining uses of assert(0) as a blunt tool
within vc4.

Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-04-08 18:28:43 -07:00
Rhys Kidd
e529dd179f vc4: Remove unused include from vc4_program.c
Found with grep and inspection. Test compiled on RPi hw.
Assists any future effort to remove TGSI as an intermediate stage.

Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-04-08 18:28:43 -07:00
Marek Olšák
1cd19ebc4a radeonsi: do per-pixel clipping based on viewport states
In other words, vport scissors are derived from viewport states.
If the scissor test is enabled, the intersection of both is used.

The guard band will disable clipping, so we have to clip per-pixel.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-08 00:23:05 +02:00
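The rule described in the commit above (derive a scissor from the viewport, and intersect it with the user scissor when the scissor test is enabled) can be sketched roughly as follows. This is an illustration only, not the radeonsi code: the struct layouts, the whole-pixel viewport, and every function name here are assumptions.

```c
#include <assert.h>
#include <stdbool.h>

/* Half-open pixel rectangle: [minx, maxx) x [miny, maxy). */
struct rect {
    int minx, miny, maxx, maxy;
};

/* Hypothetical, simplified viewport given in whole pixels. */
struct viewport {
    int x, y, width, height;
};

static int imax(int a, int b) { return a > b ? a : b; }
static int imin(int a, int b) { return a < b ? a : b; }

/* Rectangle covered by the viewport. */
static struct rect rect_from_viewport(const struct viewport *vp)
{
    assert(vp->width >= 0 && vp->height >= 0);
    struct rect r = { vp->x, vp->y, vp->x + vp->width, vp->y + vp->height };
    return r;
}

/* Intersection of two rectangles; a disjoint pair collapses to zero area. */
static struct rect rect_intersect(struct rect a, struct rect b)
{
    struct rect r = {
        imax(a.minx, b.minx), imax(a.miny, b.miny),
        imin(a.maxx, b.maxx), imin(a.maxy, b.maxy),
    };
    if (r.maxx < r.minx)
        r.maxx = r.minx;
    if (r.maxy < r.miny)
        r.maxy = r.miny;
    return r;
}

/* Viewport-derived scissor, narrowed by the user scissor when enabled. */
static struct rect effective_scissor(const struct viewport *vp,
                                     bool scissor_enabled,
                                     struct rect user_scissor)
{
    struct rect r = rect_from_viewport(vp);
    return scissor_enabled ? rect_intersect(r, user_scissor) : r;
}
```

The point of the sketch is that the viewport rectangle always applies, so pixels outside the viewport are discarded even when the guard band has turned off geometric clipping.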
Samuel Pitoiset
059308db84 nv50/ir: do not try to attach JOIN ops to ATOM
This might result in an INVALID_OPCODE dmesg error in case a join is
attached to an atomic operation.

Spotted with arb_shader_image_load_store-host-mem-barrier on GK104.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2016-04-07 23:10:26 +02:00
Nicolai Hähnle
2abe4f8d7d radeonsi: raise number of samplers per shader to 32
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94835
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-07 13:15:06 -05:00
Nicolai Hähnle
9d2693f58a radeonsi: expand the compressed color and depth texture masks to 64 bits
This is in preparation for raising the number of exposed sampler views to 32,
which will raise the total number of sampler views to 33 for the
polygon stipple texture. That texture should never be compressed (and it's
certainly not a depth texture), but this approach seems cleaner to me than
special-casing the last slot in all affected code paths.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-07 13:15:06 -05:00
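Why 33 sampler views need a 64-bit mask: slot 32 has no bit in a 32-bit word. A minimal sketch, assuming hypothetical constant and helper names (these are not the identifiers used in radeonsi):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: 32 user-visible slots plus one extra slot for the
 * polygon stipple texture, so 33 slots in total. */
#define EXAMPLE_NUM_USER_SAMPLERS 32
#define EXAMPLE_POLY_STIPPLE_SLOT EXAMPLE_NUM_USER_SAMPLERS
#define EXAMPLE_NUM_SAMPLER_VIEWS (EXAMPLE_NUM_USER_SAMPLERS + 1)

/* Set the bit for `slot`; the 64-bit constant keeps the shift well defined. */
static uint64_t mask_set(uint64_t mask, unsigned slot)
{
    assert(slot < EXAMPLE_NUM_SAMPLER_VIEWS);
    return mask | (UINT64_C(1) << slot);
}

/* Clear the bit for `slot`. */
static uint64_t mask_clear(uint64_t mask, unsigned slot)
{
    assert(slot < EXAMPLE_NUM_SAMPLER_VIEWS);
    return mask & ~(UINT64_C(1) << slot);
}

/* Report whether the bit for `slot` is set. */
static bool mask_test(uint64_t mask, unsigned slot)
{
    assert(slot < EXAMPLE_NUM_SAMPLER_VIEWS);
    return (mask >> slot) & 1;
}
```

With a 32-bit mask, shifting 1 by 32 would be undefined behaviour in C, which is why widening the mask is simpler than special-casing the last slot.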
Nicolai Hähnle
f270067ef9 radeonsi: replace magic 16 by SI_NUM_USER_SAMPLERS
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-07 13:15:06 -05:00
Brian Paul
b7e67b2337 svga: new SVGA_MSAA env var to disable/enable MSAA pixel formats
On by default.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-04-07 11:42:43 -06:00
Brian Paul
9f443af449 svga: add some trivial null pointer checks
These small mallocs will probably never fail, but static analysis tools
may complain about the missing checks.

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-04-07 11:42:43 -06:00
Samuel Pitoiset
60cf2fa477 trace: add missing set_shader_images()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-07 18:52:27 +02:00
Marek Olšák
5fac4887d8 radeonsi: disable perfect ZPASS counts for PIPE_QUERY_OCCLUSION_PREDICATE
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-07 13:58:01 +02:00
Marek Olšák
baa0b3f4cc radeonsi: don't use the real barrier instruction in tess ctrl shaders
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-07 13:58:01 +02:00
Dave Airlie
828d84c8e2 r600: use radeon_emit in a few more places in evergreen_compute
This is just a cleanup of the code.

Acked-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-04-07 04:39:26 +01:00
Dave Airlie
0c40b6f96c r600: make compute global buffer functions static.
This moves things around so that the global buffer handling
functions in evergreen_compute.c are static.

Acked-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-04-07 04:39:22 +01:00
Dave Airlie
a5d247dda0 r600: make two compute functions static.
These aren't used outside evergreen_compute.c

Acked-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-04-07 04:39:17 +01:00
Dave Airlie
41558efa87 r600: using pipe_grid_info more in evergreen_compute.
No reason to pull the pieces apart here, also make
one of the functions static as it's unused outside this.

Acked-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-04-07 04:39:13 +01:00
Dave Airlie
a6e17d7d69 r600: in evergreen_compute use ctx consistently instead of ctx_
Acked-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-04-07 04:39:09 +01:00
Dave Airlie
aeb2be3a2f r600: use rctx consistently in evergreen_compute.c
Another step towards cleaning this up.

Acked-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-04-07 04:39:05 +01:00
Dave Airlie
0560c82ff6 r600: cleanup whitespace in evergreen_compute.c
This aligns the code with the style of the rest of the driver.

Makes editing it a lot less painful.

Acked-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-04-07 04:38:51 +01:00
Edward O'Callaghan
ea310f2b38 r600g: Enable ARB_framebuffer_no_attachments
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-07 12:03:59 +10:00
Edward O'Callaghan
483a686f80 radeonsi: Enable ARB_framebuffer_no_attachments
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-07 12:03:59 +10:00
Edward O'Callaghan
1156cad405 radeonsi: Improve assert info out of si_set_framebuffer_state()
Let's give the developer a little hand if we are going to assert
on a zero literal at the end of a branch.

Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-07 12:03:58 +10:00
Edward O'Callaghan
bb1bd0ddd7 radeonsi: Allow 16 samples MSAA mode for PIPE_FORMAT_NONE
For ARB_framebuffer_no_attachments: an is_format_supported() query
with 'PIPE_FORMAT_NONE' passed implies a query of the number of
samples supported from the framebuffer with no attachment.

Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-07 12:03:58 +10:00
Edward O'Callaghan
63f2b2f2c0 softpipe: Set samples and layers in set_framebuffer_state() cb
Carries across the number of samples and layers state in the
'softpipe_set_framebuffer_state()' callback. This state is
part of 'ARB_framebuffer_no_attachments' support.

Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-07 12:03:58 +10:00
Edward O'Callaghan
7ff28d2af0 gallium/trace: Dump no.of samples and layers in fb state
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-07 12:03:58 +10:00
Edward O'Callaghan
4bc9130fba gallium: Add PIPE_CAP_FRAMEBUFFER_NO_ATTACHMENT
Add PIPE_CAP to determine if the GL extension
'GL_ARB_framebuffer_no_attachments' shall be
supported.

The driver is required to support 'PIPE_FORMAT_NONE'
via its 'is_format_supported()' callback in order
to determine the MSAA modes the hardware supports so
that values requested from the application using
'GL_ARB_framebuffer_no_attachments' may be quantized
to what the hardware expects.

V.2:
 Fix doc for a more detailed description of the PIPE_CAP
 and the corresponding GL constant.

V.3:
 Renamed and repurposed once again.

V.4:
 Remove CAP from cap_mapping array.

[airlied: fix damaged whitespace]

Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-04-07 11:56:44 +10:00
Bas Nieuwenhuizen
3393358115 radeonsi: set shader calling conventions
Note that old mesa + new LLVM or new mesa + old LLVM breaks
with this change and the corresponding LLVM change (D18559).

For LLVM version <= 3.8 we use the old method, but we can't detect
people using a post 3.8 svn version that is still too old.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-06 21:54:35 +02:00
Rob Clark
506b561ba7 freedreno/ir3: insert extra move into phi
We had an implicit assumption that the phi src was assigned in its
source (pred) block leading into the phi.  But this is not true with
NIR, so we can't just ignore the source block specified in the
nir_phi_src.  Insert an extra mov in the source block.  If it is not
required the CP pass will take it back out again.

Fixes:

  ./tests/spec/glsl-1.10/execution/vs-call-in-nested-loop.shader_test
  ./tests/spec/glsl-1.10/execution/vs-inner-loop-modifies-outer-loop-var.shader_test

and probably others.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-05 15:04:43 -04:00
Rob Clark
f9cdbf4405 freedreno/ir3: eliminate unnecessary absneg's
The frontend inserts (abs) and (neg)'s to convert between NIR boolean
(~0/0) and native boolean (1/0).  So we'd end up with things like:

   cmps.s.ge r1.x, ...
   absneg.s r1.x, (neg)r1.x
   absneg.s r1.x, (abs)r1.x
   sel.b32 r2.x, r0.x, r1.x, r0.y

The (neg) already gets collapsed due to the following (abs).  Now by
realizing that r1.x comes from a cmps.s instruction, we can drop the
(abs) as well.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-05 15:04:25 -04:00
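The boolean conversions mentioned in the commit above can be modelled in plain C. This is a sketch of the arithmetic only, assuming 32-bit two's-complement values; the function names are hypothetical and do not correspond to ir3 code.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a compare that yields a native boolean: 1 for true, 0 for false. */
static int32_t native_cmp_ge(int32_t a, int32_t b)
{
    return a >= b ? 1 : 0;
}

/* (neg) turns a native boolean (1/0) into a NIR boolean (~0/0). */
static int32_t native_to_nir(int32_t b)
{
    assert(b == 0 || b == 1);
    return -b;
}

/* (abs) turns a NIR boolean (~0/0) back into a native boolean (1/0). */
static int32_t nir_to_native(int32_t b)
{
    assert(b == 0 || b == -1);
    return b < 0 ? -b : b;
}
```

For a value that is already 1 or 0, nir_to_native(native_to_nir(x)) returns x unchanged, which is the reasoning behind dropping both modifiers when the source is known to be a compare result.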