Commit graph

27608 commits

Author SHA1 Message Date
Rob Herring
f0f4259324 virgl: also build vtest for Android
Enabling swrast on Android causes a link error because vtest is missing.

Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-02-02 09:58:51 +10:00
Rob Herring
2d3301e4d5 virgl: fix reference counting of prime handles
The virgl reference counting of buffers is broken for prime fd buffers.
Each prime fd passed into virgl_drm_winsys_resource_create_handle creates
a new resource. The solution requires creating a separate hash table to
track flink names separately from prime handles.

Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-02-02 09:58:29 +10:00
Rob Herring
f87330dbce virgl: reuse screen when fd is already open
It is necessary to share the screen between mesa and gralloc to
properly ref count resources. This implements a hash lookup on
the file description to re-use an already created screen. This is
a similar implementation as freedreno and radeon.

Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-02-02 09:58:29 +10:00
Mauro Rossi
6711592c2f nouveau/video: wrap assertion within #ifndef NDEBUG
The change is necessary to avoid the following building error in android:

external/mesa/src/gallium/drivers/nouveau/nouveau_vp3_video_bsp.c: In function 'nouveau_vp3_bsp_next':
external/mesa/src/gallium/drivers/nouveau/nouveau_vp3_video_bsp.c:269:14: error: 'bsp_bo' undeclared (first use in this function)
       assert(bsp_bo->size >= str_bsp->w0[0] + num_bytes[i]);
              ^
This matches the declaration of the variables in question.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-01 17:45:19 -05:00
François Tigeot
a48afb92ff gallium: Add DragonFly support
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2016-01-31 11:56:09 +00:00
Ilia Mirkin
7f19e29305 nv50/ir: get rid of memory stores with nop values
This happens especially with exports and varying packing, where the last
bits aren't always filled in. We end up trying to do quad-wide stores,
which ends up being a lot of register moves that carefully preserve the
nop value. Instead don't do the stores.

total instructions in shared programs : 6131375 -> 6125267 (-0.10%)
total gprs used in shared programs    : 910139 -> 895501 (-1.61%)
total local used in shared programs   : 15328 -> 15328 (0.00%)

                local        gpr       inst
    helped           0        7442        4693
      hurt           0          90        2687

Most of the helped/hurt instruction changes are by one or two ops
because can no longer do quad-wide stores in all cases.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-30 17:18:41 -05:00
Ilia Mirkin
3ca941d60e nv50/ir: fix false global CSE on instructions with multiple defs
If an instruction has multiple defs, we have to do a lot more checks to
make sure that we can move it forward. Among other things, various code
likes to do

    a, b = tex()
    if () c = a
    else c = b

which means that a single phi node will have results pointing at the
same instruction. We obviously can't propagate the tex in this case, but
properly accounting for this situation is tricky. Just don't try for
instructions with multiple defs.

This fixes about 20 shaders in shader-db, including the dolphin efb2ram
shader.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2016-01-30 17:18:41 -05:00
Ilia Mirkin
3ca2001b53 nv50,nvc0: fix buffer clearing to respect engine alignment requirements
It appears that the nvidia render engine is quite picky when it comes to
linear surfaces. It doesn't like non-256-byte aligned offsets, and
apparently doesn't even do non-256-byte strides.

This makes arb_clear_buffer_object-unaligned pass on both nv50 and nvc0.

As a side-effect this also allows RGB32 clears to work via GPU data
upload instead of synchronizing the buffer to the CPU (nvc0 only).

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> # tested on GF108, GT215
Tested-by: Nick Sarnie <commendsarnex@gmail.com> # GK208
Cc: mesa-stable@lists.freedesktop.org
2016-01-30 16:01:41 -05:00
Rob Clark
f15447e7c9 freedreno/ir3: ignore clip-vertex varying
Since we emulate clip-planes, the clip-vertex is used within the VS
itself (thanks to nir_lower_clip).  So just ignore it as a VS output.
Fixes a boatload of piglit tests that were asserting on unknown
varying slot.

(Also unrelated spelling/typo fix.)

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-30 12:29:21 -05:00
Rob Clark
f20cf22b54 freedreno/ir3: don't ignore local vars
With glsl_to_nir we end up with local variables, instead of global, for
arrays.

Note that we'll eventually have to do something more clever, I think,
when we support multiple functions, but that will probably take some
work in a few places.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-30 12:27:57 -05:00
Rob Clark
8039a2a6b3 freedreno/ir3: handle tex instrs w/ const offset
Something we start to see with glsl_to_nir.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-30 12:27:27 -05:00
Rob Clark
f212d7dc50 freedreno/ir3: support load_front_face intrinsic
With tgsi_to_nir we get this as a normal input with VARYING_SLOT_FACE.
But glsl_to_nir plus nir_lower_system_values this becomes an intrinsic.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-30 12:11:54 -05:00
Rob Clark
9e05e8cb75 freedreno: limit string marker to max packet size
Experimentally derived max size.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-30 12:10:13 -05:00
Ilia Mirkin
438d421f8b nvc0: avoid crashing when there are holes in vertex array bindings
When using the "shared" vertex array configuration strategy, we bind
each of the buffers as a separate array. However there can be holes in
such vertex buffer lists, so just emit a disable for those.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2016-01-29 22:10:42 -05:00
Ilia Mirkin
899b1b98a4 nvc0: enable atomic counters and ssbo
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-29 22:10:42 -05:00
Ilia Mirkin
48cf392c0e nv50/ir: handle new TGSI MEMBAR opcode
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-29 21:22:48 -05:00
Ilia Mirkin
df043f0764 nvc0/ir: fix atomic compare-and-swap arguments
Teach the emitter that the two registers are sequential, and drop the
second arg entirely, in favor of a double-wide first argument.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-29 21:22:48 -05:00
Ilia Mirkin
7b9a77b905 nv50/ir: add support for indirect buffer loading
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-29 21:22:48 -05:00
Ilia Mirkin
2c4eeb0b5c nv50/ir: add SUQ op by reading the info from driver constbuf
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-29 21:22:47 -05:00
Ilia Mirkin
c3083c7082 nv50/ir: add support for BUFFER accesses
This largely leaves the existing image logic alone. When image support
is added this will have to be harmonized somehow.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-29 21:22:47 -05:00
Ilia Mirkin
abe427ebd2 nvc0: handle shader buffer memory barrier
Issue a MEM_BARRIER. No idea if this is sufficient. As there are no
tests for this, it'll have to do for now.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-29 21:22:38 -05:00
Ilia Mirkin
fe01be4ad5 nvc0: add state management for shader buffers
(address, length) pairs are uploaded to the driver constbuf as well to
make these values available to the shaders.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-29 21:06:07 -05:00
Ilia Mirkin
b4688c4615 nvc0: double per-shader stage driver constants area
We need to store a lot more info now with per-buffer address/size.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-29 21:06:06 -05:00
Ilia Mirkin
ae725d5746 trace: add support for set_shader_buffers
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
v1 -> v2: add arg_begin/arg_end around buffer array
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-01-29 21:05:47 -05:00
Ilia Mirkin
6fb8fac853 st/mesa: add shader buffer barrier bit
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-29 21:05:47 -05:00
Ilia Mirkin
2ccc42fd2c tgsi: add MEMBAR opcode to handle memoryBarrier* GLSL intrinsics
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
v1 -> v2: add defines for the various bits
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-01-29 21:04:36 -05:00
Michel Dänzer
30fcf241e1 winsys/amdgpu: Process RADEON_FLAG_* independently from RADEON_DOMAIN_*
In particular, AMDGPU_GEM_CREATE_CPU_GTT_USWC can affect even BOs created
in VRAM if they get evicted to GTT. In general there's no need to
restrict any of the flags to any particular domains.

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2016-01-29 16:06:06 +09:00
Michel Dänzer
62f837e2ea winsys/amdgpu: Handle RADEON_FLAG_NO_CPU_ACCESS
Failing to do this was resulting in the kernel driver unnecessarily
leaving open the possibility of CPU access to tiled BOs.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93862

(This change shouldn't be backported to stable branches, because
released versions of xf86-video-amdgpu unnecessarily try to map the
front buffer)

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2016-01-29 16:06:06 +09:00
Karol Herbst
29d09f8747 nv50/ir: optimize mad/fma with third argument 0 to mul
Very modest effect, but it's clearly the right thing to do.

total instructions in shared programs : 6131491 -> 6131398 (-0.00%)
total gprs used in shared programs    : 910157 -> 910131 (-0.00%)
total local used in shared programs   : 15328 -> 15328 (0.00%)

                local        gpr       inst      bytes
    helped           0          55          85          85
      hurt           0          26          20          20

Signed-off-by: Karol Herbst <nouveau@karolherbst.de>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-28 15:59:41 -05:00
Karol Herbst
3aa681449e nv50/ir: run DCE backwards
Reduces calls up to 50%

Signed-off-by: Karol Herbst <nouveau@karolherbst.de>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-28 15:34:29 -05:00
Karol Herbst
978ae28ca2 nv50/ir: optimize shl(shr(a, c), c) to and(a, ~((1 << c) - 1))
Following shader-db results on GK110:

total instructions in shared programs : 6141510 -> 6131491 (-0.16%)
total gprs used in shared programs    : 910187 -> 910157 (-0.00%)
total local used in shared programs   : 15328 -> 15328 (0.00%)

                local        gpr       inst      bytes
    helped           0          18         821         821
      hurt           0           0           0           0

Signed-off-by: Karol Herbst <nouveau@karolherbst.de>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-28 15:34:22 -05:00
Axel Davy
dda7a84986 radeonsi: Add option for SI scheduler
Add a debug option to select the LLVM SI Machine Scheduler.
R600_DEBUG=sisched

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-28 17:22:44 +01:00
Eric Anholt
3fba517bdd vc4: Throttle outstanding rendering after submission.
Just make sure that after we've submitted, we get to at least 5
(global) submits ago before we go on to do more.  Prevents up to
seconds of lag with window movement in X with xcompmgr -c.  There may
be useful tuning to do in the future, but for now this gets us
usability.

Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
2016-01-27 20:05:37 -08:00
Eric Anholt
2a449ce7c9 vc4: Don't record the seqno of a failed job submit.
On an error return, the returned seqno will probably be unset, so we'd
lose track of what we've submitted so far for waiting on in the
future.

Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
2016-01-27 20:05:37 -08:00
Karol Herbst
19ae5de981 nv50/ir: fix memory corruption when spilling and redoing RA
When RA fails, and we spill, we have to clean everything up before doing
RA again. We were forgetting to reset the hi/lo linked lists - at
least the hi list is guaranteed to still have pointers to now-deleted
RIG nodes.

Signed-off-by: Karol Herbst <nouveau@karolherbst.de>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2016-01-26 17:55:06 -05:00
Jan Vesely
efc4142acd r600,compute: Plug few memory leaks
v2: drop inline keyword
    drop radeon_llvm_dispose_kernel_module wrapper

v3: move definitions to .c file
    use in radeonsi

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2016-01-26 19:04:38 +01:00
Jan Vesely
e1dcd333e4 r600: Typos and whitespace fixes
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-26 19:01:22 +01:00
Marek Olšák
2924ca131f radeonsi: fix clover crash
caused by ce1e7784d0

Trivial.
2016-01-26 18:53:41 +01:00
Marek Olšák
af57507e4f radeonsi: fix shader precompilation for shader-db
The addition of spi_shader_col_format killed all color outputs
in precompiled shaders.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> (v1)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)

v2: also set the alpha func (trivial)
2016-01-26 18:49:50 +01:00
Emil Velikov
eb63640c1d glsl: move to compiler/
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Acked-by: Jose Fonseca <jfonseca@vmware.com>
2016-01-26 16:08:33 +00:00
Emil Velikov
a39a8fbbaa nir: move to compiler/
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Acked-by: Jose Fonseca <jfonseca@vmware.com>
2016-01-26 16:08:30 +00:00
Emil Velikov
1a882fd2ee nir: move shader_enums.[ch] to compiler
This way one can reuse it in glsl, nir or other infrastructure without
pulling nir as dependency.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Acked-by: Jose Fonseca <jfonseca@vmware.com>
2016-01-26 16:08:20 +00:00
Nicolai Hähnle
41875ac4ed gallium/ddebug: add 'verbose' option
This currently just writes out the name of dump files, which can be useful
to easily correlate those files with other log outputs (driver debug output,
apitrace calls, etc.)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-26 09:58:55 -05:00
Nicolai Hähnle
f4c8fa4e49 gallium/ddebug: make 'noflush' also affect 'always' mode
This changes the default behavior of 'always' mode to be consistent with
hang detection mode.

I have used this to more easily compare dumped command streams using diff.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-26 09:58:49 -05:00
Nicolai Hähnle
8894b5f008 radeonsi: use llvm.amdgcn.s.barrier instead of llvm.AMDGPU.barrier.local
The new name for the intrinsic was introduced in LLVM r258558.

v2: use ternary operator instead of preprocessor

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> (v1)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-26 09:57:06 -05:00
Nicolai Hähnle
1067e6eb55 radeonsi: add DCC buffer for sampler views on new CS
This fixes a VM fault and possible lockup in high memory pressure situations.

Cc: "11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-01-25 10:16:12 -05:00
Nicolai Hähnle
0bacbf5b7e radeonsi: emit rw_buffers for tes_shader only if tes_shader present
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-25 10:16:08 -05:00
Nicolai Hähnle
2385b253c6 radeonsi: do not set the shader->key for gs copy shaders
The key for a geometry shader would be interpreted as the key for a vertex
shader further down the line, which really doesn't make sense.

This does not affect the contents of shader->key because geometry shaders
don't have any key entries anyway.

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-25 10:16:05 -05:00
Nicolai Hähnle
46c0ba60c6 radeonsi: si_llvm_emit_vs_epilogue is never used with gs copy shaders
Hence remove the misleading branch on is_gs_copy_shader.

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-25 10:16:02 -05:00
Nicolai Hähnle
c55b9499d5 radeonsi: move is_gs_copy_shader to si_shader_context
It is only used during shader creation now, so no need to keep it around
afterwards.

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-25 10:16:00 -05:00