For radeonsi, native and TGSI use different compilers and this results
in different limits for different IR's.
The set we strictly need for radeonsi is only the MAX_BLOCK_SIZE
and MAX_THREADS_PER_BLOCK params, but I added a few others as shader
related that seemed like they would also typically depend on the
compiler.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Currently radeonsi synchronizes after every dispatch and Clover
does nothing to synchronize. This is overzealous, especially with
GL compute, so add a barrier for global buffers.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Dave Airlie <airlied@redhat.com>
The value 0 for unknown has been chosen to so that
drivers using tgsi_scan_shader do not need to detect
missing properties if they zero-initialize the struct.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Compute support on GK110 is still unstable for weird reasons, but
this can be fixed later as the NVF0_COMPUTE envvar prevent using
compute.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
The maximum number of uniform blocks (MAX_COMPUTE_UNIFORM_BLOCKS)
per compute program must be at least 12.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
For Maxwell, the ATOMS instruction can be used to perform atomic
operations on shared memory instead of this load/store lowering pass.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This fixes 84b9b8f (nvc0/ir: add missing emission of locked load
predicate).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Make sure to avoid out of bounds access in presence of indirect
array indexing by loading the size from the driver constant buffer.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
The grid size is stored as three 32-bits integers in the indirect
buffer but the launch descriptor uses a 32-bits integer for both
griddim_y and griddim_z like this (z << 16) | y. To make it work,
the 16 high bits of griddim_y are overwritten by griddim_z.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reduce likelihood of collision with real buffers by placing the
hole at the top of the 4G area. This fixes some indirect draw+compute
tests with large buffers.
Suggested by Ilia Mirkin.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Uniform buffer objects will be sticked to the driver constant buffer
like buffers because the launch descriptor only allows 8 CBs.
Input kernel parameters for OpenCL are still uploaded to screen->parm
which is bound on c0, but this will be changed later with a new series.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Instead of using the screen->parm buffer object which will be removed,
upload auxiliary constants to uniform_bo to be consistent regarding
what we already do for Fermi.
This breaks surfaces support (for compute only) but this will be
properly re-introduced later for ARB_shader_image_load_store.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Android Bionic does not support strchrnul() string function,
gallium auxiliary util/u_string.h provides util_strchrnul()
This change avoids the following building error:
external/mesa/src/gallium/drivers/radeonsi/si_shader.c:3863: error:
undefined reference to 'strchrnul'
collect2: error: ld returned 1 exit status
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
This is an old patch I had around.
Vector selects seem to work well from LLVM 3.3. Using them should
improve code quality, as it might make constant propagation pass more
effective.
Tested lp_test_*
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The unit variable can be used uninitialized.
Fixes: 24e77cb09 ("tgsi: handle indirect sampler arrays. (v2)")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
The number of channels must be 4 for all RGBA components.
Fixes: 22d129601 ("tgsi: add support for image operations to tgsi_exec. (v2.1)")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
It was kind of overloaded, returning two different things. Now get
the index of the shadow reference src register with a new
tgsi_util_get_shadow_ref_src_index() function.
To verify the new code, I added some temp/debug code which looped
over all TGSI_TEXTURE_x values, calling the old function and new and
checking that the returned indexes matched.
Also tested piglit "shadow" tests with softpipe/llvmpipe.
No testing of ilo and radeonsi changes.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Should fix the assertion in piglit
spec@arb_gpu_shader5@texturegather@fs-r-none-shadow-2d when the
TXQ instruction specifies a 2D target but the sampler view was
declared as SHADOW2D.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
This fixes a null pointer dereference during the register allocation pass,
if a function had arguments.
Functions arguments get a definition from the function itself, a definition
which is therefore not linked to any instruction. If a value ends up having
a definition but no linked instruction, the register allocation pass doesn't
need to consider whether that value is generated by an instruction that
can only handle "short" registers (on nv50).
Signed-off-by: Pierre Moreau <pierre.morrow@free.fr>
This adds support for ARB_shader_image_load_store to softpipe.
v2: add RESQ support (Ilia)
v3: constify, cleanup internals, add some comments (Brian).
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just adds support for passing through images to the
tgsi execution stage.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds support for load/store/atomic operations on images
along with image tracking support.
v2: add RESQ support. (Ilia)
v2.1: constify interface (Brian)
split get_image_coord_dim (Brian)
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
ARB_shader_image_load_store adds support for explicit early
depth testing. However we need to make sure we don't overwrite
values using the shader written values in this case.
This fixes early depth testing in softpipe to conform with
those requirements.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is a mask of which of the current 2x2 grid are non-helper
invocations. This allows us to mask off the helper invocations
later for the image operations.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Similar to radeonsi linear layout should work for all not compressed
or depth/stencil formats. Fixes issues with VDPAU on r600.
Signed-off-by: Christian König <christian.koenig@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
The null check of result was the wrong way around. Also, move memset
and dereference of result after the null check.
Reviewed-by: Christian König <christian.koenig@amd.com>
d3d10 state tracker does not encode (valid) target (only offsets are
really used from the texture bits), since that information always comes
from the sview dcl, and not the instruction (note the meaning of target
is actually slightly different between gl and d3d10 in any case, because
d3d10 target does never include shadow bit).
Also move the msaa sampler identification as well - would need to set that
on the sview not sampler, so while this does not fix it make it at least
obvious it won't work with sample instructions.
We assert that fullinst->Instruction.Texture != 0 above so no need to
check it in the conditional. We also have the fullinst->Texture.Texture
value in a local variable, so use it.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Texture sample instructions specify a sampler unit and texture target
such as "1D", "2D", "CUBE", etc. Sampler view declarations also specify
the sampler unit and texture target.
This patch checks that the texture instructions agree with the declarations
and collects the texture target type for each sampler unit.
v2: only compare instruction's texture target to the sampler view declaration
target if the instruction is a TEX instruction, not a SAMPLE instruction.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This is a tiny housekeeping patch which does the following:
* Replaced tabs with three spaces.
* Formatted oneline and multiline code comments. Some doxygen
comments weren't marked as such and some code comments were marked
as doxygen comments.
* Spaces between if- and while-statements and their parenthesis.
According to the mesa coding style guidelines.
Reviewed-by: Brian Paul <brianp@vmware.com>
With commit dc9ecf58c0,
we are now getting the sampler target from the sampler view
declaration. But since a sampler view declaration can be defined
after a sampler declaration, we need to emit the
sampler declarations in the pre-helpers function, otherwise,
the sampler target might not have defined yet for the sampler declaration.
Fixes viewperf maya-03 and various gl trace regressions in hwv11.
Reviewed-by: Brian Paul <brianp@vmware.com>
svga_shader_expand() will fall back to using non-malloced memory for
emit.buf if malloc fails. We should check if the memory is malloced
before freeing it in the error path of svga_tgsi_vgpu9_translate.
Original patch by Thomas Hindoe Paaboel Andersen <phomes@gmail.com>.
Remove trivial svga_destroy_shader_emitter() function, by BrianP.
Signed-off-by: Brian Paul <brianp@vmware.com>
Having all this code in a big switch is not really a good pratice.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
That should allow us to get away from passing internal structures around.
v2: rebased
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
We are going to need that in the Mesa state tracker as well.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Use DMA-buf for the VDPAU interop interface instead of using
internal structures.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Works around a bug in radeonsi and tiling is actually
not very beneficial in this use case.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Linear layout should work for all not compressed or depth/stencil formats.
v2: restrict it a bit more
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
To not overwrite buffers and surfaces information, we need to use
a different offset in the driver constant buffer. Currently, OP_SUQ
is only supported for buffers but this will be slightly updated for
images support.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Found with grep and inspection. Test compiled on RPi hw.
Assists any future effort to remove TGSI as an intermediate stage.
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Signed-off-by: Eric Anholt <eric@anholt.net>