Commit graph

32837 commits

Author SHA1 Message Date
Rob Clark
c267750bb1 freedreno/a5xx: stencil texturing support
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-11-17 11:19:34 -05:00
Rob Clark
a39d403202 freedreno/a5xx/gmem: fix z32/s8 restore/resolve
BLIT_ZS mode is used for either combined z24/s8 or z32 in which case
BLIT_S mode is used for separate stencil.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-11-17 11:19:34 -05:00
Rob Clark
010ebed72a freedreno/a5xx/gmem: move ZS restore tiling hack
Code motion to simplify next patch.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-11-17 11:19:33 -05:00
Rob Clark
22605dce4b freedreno: update generated headers 2017-11-17 11:19:33 -05:00
Brian Paul
501591e852 svga: add missing PIPE_SHADER_CAP_MAX_HW_ATOMIC_COUNTER* cases
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Acked-by: Dave Airlie <airlied@redhat.com>
2017-11-16 20:35:17 -07:00
Brian Paul
940fba68c9 util/tgsi: use ASSERT_BITFIELD_SIZE() to check opcode field size
I've noticed at least two places where we store the TGSI opcode in
an unsigned:8 bitfield.  We're at 249 opcodes now.  If we hit 256 we'll
need to grow those bitfields.  Use the new ASSERT_BITFIELD_SIZE() macro
to detect that.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-11-16 20:35:17 -07:00
Dave Airlie
8c52ece581 r600: enable ARB_shader_image_load_store, ARB_shader_image_size
This also enables GL4.2 for gpus with hw fp64 (cayman, cypress)

Tested-By: Gert Wollny <gw.fossdev@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-11-17 11:31:41 +10:00
Dave Airlie
31db4a3200 r600: handle image size support.
This adds support for the RESQ opcode with the workaround
required due to hw bugs for buffers and cube arrays.

Tested-By: Gert Wollny <gw.fossdev@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-11-17 11:31:41 +10:00
Dave Airlie
f94f5c9a9f r600/sb: disable SB for images.
Until we can work further on sb, disable it for images for now.

Tested-By: Gert Wollny <gw.fossdev@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-11-17 11:31:41 +10:00
Dave Airlie
aa38bf658f r600/shader: add support for load/store/atomic ops on images.
This adds support to the shader assembler for load/store/atomic
ops on images which are handled via the RAT operations.

Tested-By: Gert Wollny <gw.fossdev@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-11-17 11:31:41 +10:00
Dave Airlie
a6b3792843 r600: add core pieces of image support.
This adds the atoms and gallium api implementations,
along with support for compress/decompress paths for
shader images.

Tested-By: Gert Wollny <gw.fossdev@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-11-17 11:31:40 +10:00
Dave Airlie
5689bb0022 r600/shader: implement getting thread id.
We need the thread id to use the immediate buffer readback
mechanism, so add support for calculating it.

Tested-By: Gert Wollny <gw.fossdev@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-11-17 11:31:40 +10:00
Dave Airlie
c119a1acb3 r600/shader: add flag to denote if shader uses images
Tested-By: Gert Wollny <gw.fossdev@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-11-17 11:31:40 +10:00
Dave Airlie
894e2deb7e r600: implement basic memory barrier.
This isn't 100% perfect (fglrx also fails a bunch of those tests)
but implement the start of a memory barrier for image support.

Tested-By: Gert Wollny <gw.fossdev@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-11-17 11:31:40 +10:00
Dave Airlie
ac4f175d79 r600: allocate immed buffer resource for images.
In order to image readback we have to execute a MEM_RAT instruction
that needs a buffer to transfer the result into until the shader
can fetch it.

Tested-By: Gert Wollny <gw.fossdev@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-11-17 11:31:40 +10:00
Dave Airlie
77d36cbc8d r600: handle writes_memory properly
This implements proper handling for shaders with side effects.

Tested-By: Gert Wollny <gw.fossdev@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-11-17 11:31:40 +10:00
Dylan Baker
d8acf79f0c autotools: change version TINY -> PATCH
Because patch is more common than tiny for talking about the 3rd element
of a version.

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Emil Velikov <emli.velikov@collabora.com>
2017-11-16 16:16:45 -08:00
Dylan Baker
65fc16c974 autotools: set XA versions in configure.ac and configure header file
Currently the versions are set in the header, and then sed is used to
extract them, so that autotools can use them elsewhere.

This is odd. Autotools is perfectly capable of configuring the header
with the versions, and then they don't need to be extracted from the
the header. This is cleaner and more obvious.

Tested with make distcheck.

v2: - Split tiny -> patch change
    - Drop temporary variables
    - change XA_VERSION_* -> XA_*
v3: - Finish splitting the tiny -> patch change

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Emil Velikov <emli.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com> (v2)
2017-11-16 16:16:29 -08:00
Rob Clark
ff018a3f55 freedreno: also mark images used by draw/grid
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-11-16 08:44:19 -05:00
Rob Clark
92e75bf0ec freedreno: mark SSBOs written at draw time
Comment was right, implementation was wrong ;-)

Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-11-16 08:44:19 -05:00
Rob Clark
2878af74dd freedreno/a5xx: ARB_framebuffer_no_attachments support
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-11-16 08:44:19 -05:00
Nicolai Hähnle
f3fa3b0d95 tgsi/exec: fix LDEXP in softpipe
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103128
Fixes: cad959d901 ("gallium: add LDEXP TGSI instruction and corresponding cap")
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-11-16 06:47:05 +01:00
Timothy Arceri
a8bdf0e0c4 radeonsi: copy some nir gs info
v2: copy input primitive

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-11-16 10:54:03 +11:00
Timothy Arceri
b73ce64fb8 ac: add gs_{prim,invocation}_id to the abi
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-11-16 10:54:03 +11:00
Timothy Arceri
8ae92a9209 radeonsi: gather stream info in nir path
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-11-16 10:51:35 +11:00
Brian Paul
ae7b4fdb32 tgsi: whitespace clean-ups in tgsi_util.[ch]
Trivial.
2017-11-15 16:12:44 -07:00
Brian Paul
cbfc92c04b svga: s/unsigned/enum tgsi_texture_type/
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2017-11-15 16:12:43 -07:00
Brian Paul
dc37cb0f86 tgsi: s/unsigned/enum tgsi_texture_type/
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2017-11-15 16:12:43 -07:00
Frank Richter
bf41b2b262 gallium/wgl: fix default pixel format issue
When creating a context without SetPixelFormat() don't blindly take the
pixel format reported by GDI. Instead, look for our own closest pixel
format.

Minor clean-ups added by Brian Paul.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103412
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
2017-11-15 16:12:43 -07:00
Brian Paul
824e8084ed svga: issue debug warning for unsupported two-sided stencil state
We only have a single stencil read mask and write mask.  Issue a
warning if different front/back values are used.  The Piglit
gl-2.0-two-sided-stencil test hits this.

Reviewed-by: Neha Bhende <bhenden@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2017-11-15 16:12:43 -07:00
Wladimir J. van der Laan
d61a914394 etnaviv: Add sampler TS support
Sampler TS is an hardware optimization that can be used when rendering
to textures. After rendering to a resource with TS enabled, the
texture unit can use this to bypass lookups to empty tiles. This also
means a resolve-in-place can be avoided to flush the TS.

This commit is also an optimization when not using sampler TS, as
resolve-in-place will now be skipped if a resource has no (valid) TS.

Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-11-15 23:27:54 +01:00
Wladimir J. van der Laan
59d76e7ab6 etnaviv: Flush TS cache before changing TS configuration
This is to make sure that the TS is properly flushed to memory before
rendering to a new surface starts.

Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-11-15 23:27:39 +01:00
Wladimir J. van der Laan
0d6d9b520b etnaviv: Add TS_SAMPLER formats to etnaviv_format
Sampler TS introduces yet another format enumeration for
renderable+textureable formats. Introduce it into the etnaviv_format
table as another column.

Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-11-15 23:27:26 +01:00
Wladimir J. van der Laan
ade528edd1 etnaviv: Check that resource has a valid TS in etna_resource_needs_flush
Resources only need a resolve-to-itself if their TS is valid for any
level, not just if it happens to be allocated.

Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-11-15 23:27:09 +01:00
Wladimir J. van der Laan
b24cb40188 etnaviv: rnndb update
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-11-15 23:26:53 +01:00
Roland Scheidegger
65123ee62c r600: set the number type correctly for float rts in cb setup
Float rts were always set as unorm instead of float.
Not sure of the consequences, but at least it looks like the blend clamp
would have been enabled, which is against the rules (only eg really bothered
to even attempt to specify this correctly, r600 always used clamp anyway).
Albeit r600 (not r700) setup still looks bugged to me due to never setting
BLEND_FLOAT32 which must be set according to docs...
Not sure if the hw really cares, no piglit change (on eg/juniper).

Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-11-15 03:13:46 +01:00
Roland Scheidegger
570d5b7992 r600: use ieee version of rsq
Both r600 and evergreen used the clamped version, whereas cayman used the
ieee one. I don't think there's a valid reason for this discrepancy, so let's
switch to the ieee version for r600 and evergreen too, since we generally
want to stick to ieee arithmetic.
With this, behavior for both rcp and rsq should now be the same for all of
r600, eg, cm, all using ieee versions (albeit note rsq retains the abs
behavior for everybody, which may not be a good idea ultimately).

Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-11-15 03:13:46 +01:00
Roland Scheidegger
1c8d57a008 r600: use ieee version of rcp
r600 used the clamped version for rcp, whereas both evergreen and cayman
used the ieee version. I don't know why that discrepancy exists (it does so
since day 1) but there does not seem to be a valid reason for this, so make
it consistent. This seems now safer than before the previous commit (using
the dx10 clamp bit).
Note that rsq still uses clamped version (as before even though the table
may have suggested otherwise for evergreen) for r600/eg, but not for cayman.
Will be changed separately for better regression tracking...

Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-11-15 03:13:46 +01:00
Roland Scheidegger
3835009796 r600: use DX10_CLAMP bit in shader setup
The docs are not very concise in what this really does, however both
Alex Deucher and Nicolai Hähnle suggested this only really affects instructions
using the CLAMP output modifier, and I've confirmed that with the newly
changed piglit isinf_and_isnan test.
So, with this bit set, if an instruction has the CLAMP modifier bit (which
clamps to [0,1]) set, then NaNs will be converted to zero, otherwise the result
will be NaN.
D3D10 would require this, glsl doesn't have modifiers (with mesa
clamp(x,0,1) would get converted to such a modifier) coupled with a
whatever-floats-your-boat specified NaN behavior, but the clamp behavior
should probably always be used (this also matches what a decomposition into
min(1.0, max(x, 0.0)) would do, if min/max also adhere to the ieee spec of
picking the non-nan result).
Some apps may in fact rely on this, as this prevents misrenderings in
This War of Mine since using ieee muls
(ce7a045fee), without having to use clamped
rcp opcode, which would also fix this bug there.
radeonsi also seems to set this bit nowadays if I see that righ (albeit the
llvm amdgpu code comment now says "Make clamp modifier on NaN input returns 0"
instead of "Do not clamp NAN to 0" since it was changed, which also looks
a bit misleading).

v2: set it in all shader stages.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103544

Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-11-15 03:13:46 +01:00
Roland Scheidegger
aab0bfc648 r600: use min_dx10/max_dx10 instead of min/max
I believe this is the safe thing to do, especially ever since the driver
actually generates NaNs for muls too.
The ISA docs are not very helpful here, however the dx10 versions will pick
a non-nan result over a NaN one (this is also the ieee754 behavior), whereas
the non-dx10 ones will pick the NaN (verified by newly changed piglit
isinf-and-isnan test).
Other "modern" drivers will most likely do the same.
This was shown to make some difference for bug 103544, albeit it is not
required to fix it.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-11-15 03:13:46 +01:00
Dave Airlie
3ceee04a4f r600: fix cubemap arrays
A lot of cubemap array piglits fail, port the texture type
picking code from radeonsi which seems to fix most of them.

For images I will port the rest of the code.

Fixes:
getteximage-depth gl_texture_cube_map_array-*
fbo-generatemipmap-cubemap array
getteximage-targets cube_array
amongst others.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-11-15 11:26:11 +10:00
Rob Clark
7676e71113 freedreno/a5xx: small comment fix
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-11-14 18:12:47 -05:00
Rob Clark
d27318bdd0 freedreno/a5xx: indirect draw support
A couple failures in piglit tests w/ TF or gl_VertexID + indirect draws.
OTOH all the deqp tests (although they don't test those combinations).
I suspect this could be fixed by a firmware update, but I don't think
there is much we can do in mesa for that.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-11-14 18:10:58 -05:00
Rob Clark
f383cf9d41 freedreno/a5xx: split out helper for pipeline stalls
We need a similar thing for indirect draws.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-11-14 18:10:51 -05:00
Rob Clark
d74029bddc freedreno: update generated headers
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-11-14 18:10:43 -05:00
Timothy Arceri
5041ea96a0 gallium/radeon: disable the cache when nir backend enabled
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-15 08:47:31 +11:00
Timothy Arceri
bc308122cc gallium/tgsi: add tess output supoort to tgsi_get_gl_varying_semantic()
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-11-15 08:26:34 +11:00
Timothy Arceri
3d21eb3b7d gallium/tgsi: add prim id to tgsi_get_gl_varying_semantic()
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-11-15 08:26:34 +11:00
Andres Rodriguez
f7580e7204 broadcom/vc4: fix indentation in vc4_screen.c
Stumbled into this when adding a new PIPE_CAP.

Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2017-11-14 11:31:36 -08:00
Tim Rowley
d8489517a5 swr/rast: Faster emulated simd16 permute
Speed up simd16 frontend (default) on avx/avx2 platforms;
fixes performance regression caused by switch to simdlib.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Cc: mesa-stable@lists.freedesktop.org
2017-11-14 11:40:19 -06:00