Commit graph

26585 commits

Author SHA1 Message Date
Bas Nieuwenhuizen
1a5c8c24b5 gallium: distinguish between shader IR in get_compute_param
For radeonsi, native and TGSI use different compilers and this results
in different limits for different IRs.

The set we strictly need for radeonsi is only the MAX_BLOCK_SIZE
and MAX_THREADS_PER_BLOCK params, but I added a few others as shader
related that seemed like they would also typically depend on the
compiler.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-04-02 01:51:13 +02:00
Bas Nieuwenhuizen
be5899dcf9 gallium: add global buffer memory barrier bit
Currently radeonsi synchronizes after every dispatch and Clover
does nothing to synchronize. This is overzealous, especially with
GL compute, so add a barrier for global buffers.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-04-02 01:51:06 +02:00
Bas Nieuwenhuizen
01f993a21f gallium: add threads per block TGSI property
The value 0 for unknown has been chosen so that
drivers using tgsi_scan_shader do not need to detect
missing properties if they zero-initialize the struct.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-04-02 01:50:59 +02:00
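A minimal sketch of the zero-means-unknown convention described in the commit above. The struct, field, and function names here are hypothetical and are not Mesa's actual tgsi_shader_info layout; the point is only that a zero-initialized struct already encodes "property not declared", so a driver can fall back to its own default without extra detection logic.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a scanned shader-info struct. */
struct shader_info_sketch {
   unsigned max_threads_per_block; /* 0 means "not declared by the shader" */
};

/* Zero-initialize, as a scanner would before walking the shader tokens. */
static void
scan_init(struct shader_info_sketch *info)
{
   assert(info != NULL);
   memset(info, 0, sizeof(*info));
}

/* Use the declared value if present, otherwise the driver's default. */
static unsigned
effective_threads_per_block(const struct shader_info_sketch *info,
                            unsigned driver_default)
{
   assert(info != NULL);
   return info->max_threads_per_block ? info->max_threads_per_block
                                      : driver_default;
}
```

With this shape, a shader that never sets the property needs no special handling: the memset in scan_init() is enough.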
Bas Nieuwenhuizen
ea8f4a6b13 gallium: add compute shader IR type
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-04-02 01:49:57 +02:00
Samuel Pitoiset
60e1c6a7fc nvc0: enable compute shaders on GK104 and GM107+
Compute support on GK110 is still unstable for weird reasons, but
this can be fixed later as the NVF0_COMPUTE envvar prevents using
compute.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-01 22:26:24 +02:00
Samuel Pitoiset
71f327aa21 nvc0: bump the maximum number of UBOs for compute on Kepler
The maximum number of uniform blocks (MAX_COMPUTE_UNIFORM_BLOCKS)
per compute program must be at least 12.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-01 22:26:24 +02:00
Samuel Pitoiset
839a469166 nvc0/ir: do not lower shared+atomics on GM107+
For Maxwell, the ATOMS instruction can be used to perform atomic
operations on shared memory instead of this load/store lowering pass.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-01 22:26:24 +02:00
Samuel Pitoiset
543fb95473 nvc0/ir: add atomics support on shared memory for Kepler
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-01 22:26:24 +02:00
Samuel Pitoiset
275019d7db nvc0/ir: fix wrong pred emission for ld lock on GK104
This fixes 84b9b8f (nvc0/ir: add missing emission of locked load
predicate).

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-01 22:26:24 +02:00
Samuel Pitoiset
4f58b78c30 nvc0/ir: add support for compute UBOs on Kepler
Make sure to avoid out of bounds access in presence of indirect
array indexing by loading the size from the driver constant buffer.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-01 22:26:24 +02:00
Samuel Pitoiset
3b246a71d7 nvc0: add indirect compute support on Kepler
The grid size is stored as three 32-bit integers in the indirect
buffer but the launch descriptor uses a single 32-bit integer for both
griddim_y and griddim_z, like this: (z << 16) | y. To make it work,
the 16 high bits of griddim_y are overwritten by griddim_z.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-01 22:26:24 +02:00
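The packing arithmetic from the commit above can be sketched on the host as follows. The helper names are hypothetical, and the sketch assumes both dimensions fit in 16 bits; how the driver actually patches the launch descriptor is not shown here.

```c
#include <assert.h>
#include <stdint.h>

/* Pack y and z into one word as the commit message describes:
 * (z << 16) | y. Assumes both values fit in 16 bits. */
static uint32_t
pack_griddim_yz(uint32_t y, uint32_t z)
{
   assert(y <= 0xffffu && z <= 0xffffu);
   return (z << 16) | y;
}

/* Given a word that already holds y in its low half, overwrite the
 * 16 high bits with z and leave the low half untouched. */
static uint32_t
overwrite_high16(uint32_t word_with_y, uint32_t z)
{
   return (word_with_y & 0xffffu) | ((z & 0xffffu) << 16);
}
```

Both helpers produce the same word when y fits in 16 bits; the second mirrors the "overwrite the high bits" wording of the message.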
Samuel Pitoiset
7797d5f7d9 nvc0: reduce likelihood of collision for real buffers on Kepler
Reduce likelihood of collision with real buffers by placing the
hole at the top of the 4G area. This fixes some indirect draw+compute
tests with large buffers.

Suggested by Ilia Mirkin.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-01 22:26:24 +02:00
Samuel Pitoiset
e2e8085fac nvc0: store ubo info to the driver constbuf on Kepler
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-01 22:26:24 +02:00
Samuel Pitoiset
12aa047c98 nvc0: bind user uniforms for compute on Kepler
Uniform buffer objects will be attached to the driver constant buffer,
as buffers are, because the launch descriptor only allows 8 CBs.

Input kernel parameters for OpenCL are still uploaded to screen->parm
which is bound on c0, but this will be changed later with a new series.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-01 22:26:24 +02:00
Samuel Pitoiset
1828d90a00 nvc0: bind shader buffers for compute on Kepler
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-01 22:26:24 +02:00
Samuel Pitoiset
debd910512 nvc0: bind driver cb for compute on c7[] for Kepler
Instead of using the screen->parm buffer object which will be removed,
upload auxiliary constants to uniform_bo to be consistent with
what we already do for Fermi.

This breaks surfaces support (for compute only) but this will be
properly re-introduced later for ARB_shader_image_load_store.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-01 22:26:24 +02:00
Jose Fonseca
f72de6f386 gallivm: Prevent disassembly debug output from being truncated.
By using os_log_message directly, as _debug_vprintf truncates messages
to 4K.

Also cleanup the disassemble interface.

Spotted by Roland.

Trivial.
2016-04-01 21:22:42 +01:00
Mauro Rossi
e09d04cd56 radeonsi: use util_strchrnul() to fix android build error
Android Bionic does not support the strchrnul() string function;
gallium's auxiliary util/u_string.h provides util_strchrnul().

This change avoids the following build error:

external/mesa/src/gallium/drivers/radeonsi/si_shader.c:3863: error:
undefined reference to 'strchrnul'
collect2: error: ld returned 1 exit status

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-04-01 13:56:57 +01:00
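For readers unfamiliar with strchrnul(), a portable equivalent can be sketched as below. The name sketch_strchrnul is hypothetical and this is not claimed to be the body of Mesa's util_strchrnul(); it only shows the behaviour: like strchr(), except that a pointer to the terminating NUL is returned when the character is not found.

```c
#include <assert.h>
#include <stddef.h>

/* Return a pointer to the first occurrence of c in s, or to the
 * terminating NUL byte if c does not occur. Never returns NULL. */
static const char *
sketch_strchrnul(const char *s, char c)
{
   assert(s != NULL);
   while (*s && *s != c)
      s++;
   return s;
}
```

Because the result is never NULL, callers can compute a line length as `sketch_strchrnul(p, '\n') - p` without a separate not-found branch.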
Jose Fonseca
cdf7c6b83d gallivm: Use vector selects on LLVM 3.3+.
This is an old patch I had around.

Vector selects seem to work well from LLVM 3.3.  Using them should
improve code quality, as it might make the constant propagation pass more
effective.

Tested lp_test_*

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-04-01 09:05:19 +01:00
Ilia Mirkin
df03be196a nv50,nvc0: add PIPE_BIND_LINEAR support to is_format_supported
vdpau has recently come to rely on this, so make sure to check it
properly.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-03-31 21:53:11 -04:00
Samuel Pitoiset
d22eca5f90 tgsi: silence compiler warning in fetch_sampler_unit()
The unit variable can be used uninitialized.

Fixes: 24e77cb09 ("tgsi: handle indirect sampler arrays. (v2)")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-04-01 07:16:24 +10:00
Samuel Pitoiset
05902a6686 tgsi: fix out of bounds access in exec_atomop()
The number of channels must be 4 for all RGBA components.

Fixes: 22d129601 ("tgsi: add support for image operations to tgsi_exec. (v2.1)")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-04-01 07:15:16 +10:00
Brian Paul
9076e04934 tgsi: split tgsi_util_get_texture_coord_dim() function into two
It was kind of overloaded, returning two different things.  Now get
the index of the shadow reference src register with a new
tgsi_util_get_shadow_ref_src_index() function.

To verify the new code, I added some temp/debug code which looped
over all TGSI_TEXTURE_x values, calling the old function and new and
checking that the returned indexes matched.

Also tested piglit "shadow" tests with softpipe/llvmpipe.
No testing of ilo and radeonsi changes.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-31 09:48:00 -06:00
Brian Paul
9d7cd43988 tgsi: skip texture query opcodes when examining texture targets
Should fix the assertion in piglit
spec@arb_gpu_shader5@texturegather@fs-r-none-shadow-2d when the
TXQ instruction specifies a 2D target but the sampler view was
declared as SHADOW2D.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
2016-03-31 09:47:40 -06:00
Pierre Moreau
f96a403bc3 nv50/ir: Check for valid insn instead of def size
This fixes a null pointer dereference during the register allocation pass,
if a function had arguments.

Function arguments get a definition from the function itself, a definition
which is therefore not linked to any instruction. If a value ends up having
a definition but no linked instruction, the register allocation pass doesn't
need to consider whether that value is generated by an instruction that
can only handle "short" registers (on nv50).

Signed-off-by: Pierre Moreau <pierre.morrow@free.fr>
2016-03-31 10:30:29 -04:00
Dave Airlie
eb9ad9faa3 softpipe: add image support to softpipe (v3)
This adds support for ARB_shader_image_load_store to softpipe.

v2: add RESQ support (Ilia)
v3: constify, cleanup internals, add some comments (Brian).

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-03-31 09:14:16 +10:00
Dave Airlie
0d1f679ded draw: add support for passing images to vs/gs shaders.
This just adds support for passing through images to the
tgsi execution stage.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-03-31 09:14:11 +10:00
Dave Airlie
22d1296013 tgsi: add support for image operations to tgsi_exec. (v2.1)
This adds support for load/store/atomic operations on images
along with image tracking support.

v2: add RESQ support. (Ilia)
v2.1: constify interface (Brian)
split get_image_coord_dim (Brian)

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-03-31 09:14:05 +10:00
Dave Airlie
493eab7679 softpipe: add support for explicit early depth testing
ARB_shader_image_load_store adds support for explicit early
depth testing. However we need to make sure we don't overwrite
values using the shader written values in this case.

This fixes early depth testing in softpipe to conform with
those requirements.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-03-31 09:13:54 +10:00
Dave Airlie
827393b76f tgsi: introduce NonHelperMask
This is a mask of which of the current 2x2 grid are non-helper
invocations. This allows us to mask off the helper invocations
later for the image operations.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-03-31 09:13:50 +10:00
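A small sketch of how a non-helper mask over a 2x2 quad can gate side effects, as the commit above describes. The function and macro names are hypothetical and the bit-per-invocation layout is an assumption for illustration; it is not the tgsi_exec implementation.

```c
#include <assert.h>
#include <stdint.h>

#define QUAD_SIZE 4 /* one bit per invocation in the 2x2 quad */

/* Store src into dst only for invocations that are both currently
 * executing and not helper invocations. */
static void
masked_store(uint32_t dst[QUAD_SIZE], const uint32_t src[QUAD_SIZE],
             unsigned exec_mask, unsigned non_helper_mask)
{
   unsigned mask = exec_mask & non_helper_mask;
   unsigned i;

   assert((mask >> QUAD_SIZE) == 0);
   for (i = 0; i < QUAD_SIZE; i++) {
      if (mask & (1u << i))
         dst[i] = src[i];
   }
}
```

Helper invocations still run the shader (so derivatives work), but with the combined mask their image writes are dropped.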
Dave Airlie
ca180c09bb tgsi_exec: handle execmask when doing indirect lookups
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-03-31 09:13:46 +10:00
Dave Airlie
1ff4cc0535 tgsi_exec: add support for up to 3 address registers (v2)
v2: be consistent with other definitions.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-03-31 09:13:08 +10:00
Christian König
1faca438bd r600: ignore PIPE_BIND_LINEAR in *_is_format_supported
Similar to radeonsi, linear layout should work for all formats that are
not compressed or depth/stencil. Fixes issues with VDPAU on r600.

Signed-off-by: Christian König <christian.koenig@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2016-03-30 20:00:27 +02:00
Thomas Hindoe Paaboel Andersen
9a73f5728e st/vdpau: correct null check
The null check of result was the wrong way around. Also, move memset
and dereference of result after the null check.

Reviewed-by: Christian König <christian.koenig@amd.com>
2016-03-30 20:00:27 +02:00
Roland Scheidegger
2d3b8aefda tgsi: (trivial) only verify target for is_tex instructions
d3d10 state tracker does not encode (valid) target (only offsets are
really used from the texture bits), since that information always comes
from the sview dcl, and not the instruction (note the meaning of target
is actually slightly different between gl and d3d10 in any case, because
d3d10 target never includes the shadow bit).
Also move the msaa sampler identification - we would need to set that
on the sview, not the sampler, so while this does not fix it, it at least
makes it obvious that it won't work with sample instructions.
2016-03-30 04:26:54 +02:00
Brian Paul
5c85c3be26 tgsi: simplify tgsi_shader_info::is_msaa_sampler checking
We assert that fullinst->Instruction.Texture != 0 above so no need to
check it in the conditional.  We also have the fullinst->Texture.Texture
value in a local variable, so use it.

Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-03-29 18:13:46 -06:00
Brian Paul
86e1768c13 tgsi: collect texture sampler target info in tgsi_scan_shader()
Texture sample instructions specify a sampler unit and texture target
such as "1D", "2D", "CUBE", etc.  Sampler view declarations also specify
the sampler unit and texture target.

This patch checks that the texture instructions agree with the declarations
and collects the texture target type for each sampler unit.

v2: only compare instruction's texture target to the sampler view declaration
target if the instruction is a TEX instruction, not a SAMPLE instruction.

Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-03-29 18:13:46 -06:00
Brian Paul
6775268b61 gallium/docs: s/gven/given/
2016-03-29 18:13:46 -06:00
Rovanion Luckey
7087e0ab27 gallium: Format code in pb_buffer_fenced.c according to style guide.
This is a tiny housekeeping patch which does the following:

  * Replaced tabs with three spaces.
  * Formatted oneline and multiline code comments. Some doxygen
    comments weren't marked as such and some code comments were marked
    as doxygen comments.
  * Spaces between if- and while-statements and their parenthesis.

According to the mesa coding style guidelines.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-03-29 13:44:11 -06:00
Charmaine Lee
2d8df0306b svga: emit sampler declarations in the helper function for non vgpu10
With commit dc9ecf58c0,
we are now getting the sampler target from the sampler view
declaration. But since a sampler view declaration can be defined
after a sampler declaration, we need to emit the
sampler declarations in the pre-helpers function, otherwise,
the sampler target might not have been defined yet for the sampler declaration.

Fixes viewperf maya-03 and various gl trace regressions in hwv11.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-03-29 13:35:09 -06:00
Brian Paul
96e0894106 svga: avoid freeing non-malloced memory
svga_shader_expand() will fall back to using non-malloced memory for
emit.buf if malloc fails. We should check if the memory is malloced
before freeing it in the error path of svga_tgsi_vgpu9_translate.

Original patch by Thomas Hindoe Paaboel Andersen <phomes@gmail.com>.
Remove trivial svga_destroy_shader_emitter() function, by BrianP.

Signed-off-by: Brian Paul <brianp@vmware.com>
2016-03-29 13:35:08 -06:00
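The pattern behind the fix above (only free what was actually malloced) can be sketched as follows. Everything here is hypothetical: the struct, the ownership flag, and the 64-byte fallback are illustrative choices, not the svga driver's real data structures or its way of detecting the fallback buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical emit buffer that falls back to inline storage when
 * malloc fails, and remembers whether it owns heap memory. */
struct emit_buf_sketch {
   char *buf;
   size_t size;
   bool owns_buf;
   char fallback[64];
};

static void
emit_buf_init(struct emit_buf_sketch *e, size_t size)
{
   assert(e != NULL);
   e->buf = malloc(size);
   e->owns_buf = (e->buf != NULL);
   if (!e->owns_buf) {
      /* Allocation failed: point at non-malloced storage instead. */
      e->buf = e->fallback;
      size = sizeof(e->fallback);
   }
   e->size = size;
}

static void
emit_buf_fini(struct emit_buf_sketch *e)
{
   assert(e != NULL);
   if (e->owns_buf)
      free(e->buf); /* never free the fallback storage */
   e->buf = NULL;
   e->owns_buf = false;
}
```

Tracking ownership explicitly keeps the error path safe regardless of which storage ended up in use.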
Samuel Pitoiset
9d57c84994 nvc0/ir: move load/store lowering pass to handleLDST()
Having all this code in a big switch is not really a good practice.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-03-29 19:55:51 +02:00
Christian König
bdeb22b7b6 st/vdpau: implement the new DMA-buf based interop v2
That should allow us to get away from passing internal structures around.

v2: rebased

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
2016-03-29 17:29:18 +02:00
Christian König
0042aa508e st/vdpau: move FormatRGBAToPipe into the interop
We are going to need that in the Mesa state tracker as well.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
2016-03-29 17:29:14 +02:00
Christian König
faba96bc60 st/vdpau: add new interop interface
Use DMA-buf for the VDPAU interop interface instead of using
internal structures.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
2016-03-29 17:29:10 +02:00
Christian König
d180de3532 st/vdpau: use linear layout for output surfaces
Works around a bug in radeonsi and tiling is actually
not very beneficial in this use case.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
2016-03-29 17:28:43 +02:00
Christian König
7eb5e5b8b4 radeonsi: ignore PIPE_BIND_LINEAR in si_is_format_supported v2
Linear layout should work for all formats that are not compressed or depth/stencil.

v2: restrict it a bit more

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-29 17:28:35 +02:00
Samuel Pitoiset
b8b3af2932 nvc0: use a different offset for buffers and surfaces
To not overwrite buffers and surfaces information, we need to use
a different offset in the driver constant buffer. Currently, OP_SUQ
is only supported for buffers but this will be slightly updated for
images support.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-03-29 00:47:28 +02:00
Rhys Kidd
668b6ddfc5 vc4: Remove unused include from vc4_nir_lower_txf_ms.c
Found with grep and inspection. Test compiled on RPi hw.
Assists any future effort to remove TGSI as an intermediate stage.

Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
2016-03-28 11:51:11 -07:00
Rob Clark
b4c72b792c freedreno/ir3: fix for load_front_face intrinsic
Seems like trying to widen in the same instruction as the add.s does a
non-sign-extending widen.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-03-28 10:19:53 -04:00