Commit graph

29173 commits

Author SHA1 Message Date
Marek Olšák
fec7f74129 gallium/pb_cache: check parameters that are more likely to fail first
This makes Bioshock Infinite with deferred flushing 2% faster.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-19 23:45:06 +02:00
Marek Olšák
2596ae2b6e radeonsi: emit PS exports last
This effectively removes s_waitcnt instructions after FP16 exports.

Before:

    v_cvt_pkrtz_f16_f32_e32 v0, v0, v1   ; 5E000300
    v_cvt_pkrtz_f16_f32_e32 v1, v2, v3   ; 5E020702
    exp 15, 0, 1, 0, 0, v0, v1, v0, v0   ; F800040F 00000100
    s_waitcnt expcnt(0)                  ; BF8C0F0F
    v_cvt_pkrtz_f16_f32_e32 v0, v4, v5   ; 5E000B04
    v_cvt_pkrtz_f16_f32_e32 v1, v6, v7   ; 5E020F06
    exp 15, 1, 1, 0, 0, v0, v1, v0, v0   ; F800041F 00000100
    s_waitcnt expcnt(0)                  ; BF8C0F0F
    v_cvt_pkrtz_f16_f32_e32 v0, v8, v9   ; 5E001308
    v_cvt_pkrtz_f16_f32_e32 v1, v10, v11 ; 5E02170A
    exp 15, 2, 1, 0, 0, v0, v1, v0, v0   ; F800042F 00000100
    s_waitcnt expcnt(0)                  ; BF8C0F0F
    v_cvt_pkrtz_f16_f32_e32 v0, v12, v13 ; 5E001B0C
    v_cvt_pkrtz_f16_f32_e32 v1, v14, v15 ; 5E021F0E
    exp 15, 3, 1, 1, 1, v0, v1, v0, v0   ; F8001C3F 00000100
    s_endpgm                             ; BF810000

After:

    v_cvt_pkrtz_f16_f32_e32 v0, v0, v1   ; 5E000300
    v_cvt_pkrtz_f16_f32_e32 v1, v2, v3   ; 5E020702
    v_cvt_pkrtz_f16_f32_e32 v2, v4, v5   ; 5E040B04
    v_cvt_pkrtz_f16_f32_e32 v3, v6, v7   ; 5E060F06
    exp 15, 0, 1, 0, 0, v0, v1, v0, v0   ; F800040F 00000100
    v_cvt_pkrtz_f16_f32_e32 v4, v8, v9   ; 5E081308
    v_cvt_pkrtz_f16_f32_e32 v5, v10, v11 ; 5E0A170A
    exp 15, 1, 1, 0, 0, v2, v3, v0, v0   ; F800041F 00000302
    v_cvt_pkrtz_f16_f32_e32 v6, v12, v13 ; 5E0C1B0C
    v_cvt_pkrtz_f16_f32_e32 v7, v14, v15 ; 5E0E1F0E
    exp 15, 2, 1, 0, 0, v4, v5, v0, v0   ; F800042F 00000504
    exp 15, 3, 1, 1, 1, v6, v7, v0, v0   ; F8001C3F 00000706
    s_endpgm                             ; BF810000

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-19 23:45:06 +02:00
Marek Olšák
b2b45cecef radeonsi: set optimal settings in COMPUTE_RESOURCE_LIMITS
ported from Vulkan

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-19 23:45:06 +02:00
Marek Olšák
ad70c3954b radeonsi: really wait for the second EOP event and not the first one
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-19 23:45:06 +02:00
Marek Olšák
1a1cc67edd gallium/radeon: remove RADEON_FLUSH_KEEP_TILING_FLAGS flag
always set

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-19 23:45:06 +02:00
Samuel Pitoiset
9c63224540 gm107/ir: make use of ADD32I for all immediates
ADD only allows to emit 19-bits immediates.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: <mesa-stable@lists.freedesktop.org>
2016-07-19 18:07:15 +02:00
Samuel Pitoiset
0904a2ba97 gm107/ir: add missing NEG modifier for IADD32I
Like FADD32I, the NEG modifier of src0 is at position 56.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2016-07-19 18:07:10 +02:00
Andreas Boll
c482decd4d ddebug: Fix trivial typo in stderr message
Signed-off-by: Andreas Boll <andreas.boll.dev@gmail.com>
2016-07-19 16:04:40 +02:00
Eric Engestrom
8ba46fbd9e vl: fix memory leak
CovID: 1363008
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-07-19 12:41:00 +02:00
Boyuan Zhang
60c7450f16 vl: add entry point
Add entrypoint to distinguish H.264 decode and encode. For example, in patch
5/11 when is calling "VaCreateContext", "pps" and "sps" shouldn't be allocated
for H.264 encoding. So we need to use the entry_point to determine this is
H.264 decode or H.264 encode. We can use config to determine the entrypoint
since config_id is passed to us for VaCreateContext call. However, for
VaDestoyContext call, only context_id is passed to us. So we need to know the
entrypoint in order to not free the pps/sps for encoding case.

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-07-19 12:36:46 +02:00
Ilia Mirkin
ed9dd3bcd9 nv50,nvc0: srgb rendering is only available for rgba/bgra
Mark both L8_SRGB and L8A8_SRGB as non-renderable (the latter already
didn't have the bind flags). This makes the state tracker pick a
different format when rendering is required, or mark the fb as
incomplete. This fixes:

  bin/getteximage-formats init-by-clear-and-render -auto -fbo
  bin/getteximage-formats init-by-rendering -auto -fbo

which previously ran into srgb-encoding differences.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
2016-07-18 20:04:17 -04:00
Ilia Mirkin
8e7893eb53 nvc0: add support for BGRA8 images
This is useful for pbo downloads, which are now accelerated with images.
BGRA8 is a moderately common format to do that in.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-07-18 20:04:17 -04:00
Christian König
3e1ad846f9 radeon/uvd: add session context buffer for polaris 10/11 v2
This way we have unlimited UVD sessions.

v2: only enable it when kernel supports it as well.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
2016-07-18 17:13:17 +02:00
Leo Liu
134d6e4e4f vl/dri3: fix a memory leak from front buffer
Inspired by fix for mem leak of vdpau interop, resource_from_handle
set texture reference count, that need to be decreased and released,
recall there is a similar case for DRI3, that is with VA-API glx
extension, there is temporary TFP(texture from pixmap), we target it
through dma-buf. leak happens when without count down the reference.

Checked and found with mpv vo=opengl case, there only one static TFP,
the leak happens once, but for totem player using gstreamer VA-API glx,
the dynamic TFP for each frame, so leak quite a bit.

This fixes mem leak for mpv and totem.

Signed-off-by: Leo Liu <leo.liu@amd.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-07-18 09:20:40 -04:00
Kenneth Graunke
ac1181ffbe compiler: Rename INTERP_QUALIFIER_* to INTERP_MODE_*.
Likewise, rename the enum type to glsl_interp_mode.

Beyond the GLSL front-end, talking about "interpolation modes" seems
more natural than "interpolation qualifiers" - in the IR, we're removed
from how exactly the source language specifies how to interpolate an
input.  Also, SPIR-V calls these "decorations" rather than "qualifiers".

Generated by:
$ find . -regextype egrep -regex '.*\.(c|cpp|h)' -type f -exec sed -i \
  -e 's/INTERP_QUALIFIER_/INTERP_MODE_/g' \
  -e 's/glsl_interp_qualifier/glsl_interp_mode/g' {} \;

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Dave Airlie <airlied@redhat.com>
2016-07-17 19:26:48 -07:00
Dave Airlie
e7d96e7685 virgl: drop pointless leftover init of virgl_transfer_inline_write.
Pointed out by Marek.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-07-17 06:20:53 +10:00
Ilia Mirkin
062c6b8e54 nv50: fix alphatest for non-blendable formats
The hardware can only do alphatest when using a blendable format. This
means that the various *16 norm formats didn't work with alphatest. It
appears that Talos Principle uses such formats, as well as alpha tests,
for some internal renders, which made them be incorrect. However this
does not appear to affect the final renders, but in a different game it
easily could.

The approach we take is that when alphatests are enabled and a suitable
format is used (which we anticipate is the vast minority of the time),
we insert code into the shader to perform the comparison and discard.
Once inserted, that code lives in the shader forever, and we re-upload
it each time the function changes with a fixed-up compare. To avoid
re-uploading too often, if we switch back to a blendable format, the
test is (effectively) disabled and the hw alphatest functionality is
used.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-07-16 11:45:30 -04:00
Rob Clark
44bbfedbd9 gallium/u_queue: add optional cleanup callback
Adds a second optional cleanup callback, called after the fence is
signaled.  This is needed if, for example, the queue has the last
reference to the object that embeds the util_queue_fence.  In this
case we cannot drop the ref in the main callback, since that would
result in the fence being destroyed before it is signaled.

Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-16 10:00:04 -04:00
Nicolai Hähnle
6f73c7595f radeonsi: remove the DRAW_PREAMBLE packet
According to firmware guys, the new sequence that we added for Polaris should
work on all CIK parts, and should actually be faster on some parts.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-07-16 13:02:37 +02:00
Eric Anholt
3bcd0f1912 vc4: Speed up glGenerateMipmaps by avoiding shadow baselevel.
To support general GL_TEXTURE_BASE_LEVEL we have to copy to a temporary
miptree.  However, if a single level is being selected, we can use the
existing miptree and force all the sampling to be from that particular
level.

This avoids a ton of software fallbacks in glGenerateMipmaps(), which uses
base levels in the blit implementation in gallium.  Improves "glmark2 -b
terrain" from 2 fps to 3 (perhaps some more precision would be useful?),
and cuts its CPU usage during the benchmarking from ~30% to ~10% (total
CPU time from 8.8s to 7.6s).
2016-07-15 13:54:00 -07:00
Eric Anholt
88152d7dc0 vc4: Drop VC4_DIRTY_TEXSTATE in favor of the per-stage flags.
The compiler uses the per-stage flags already, so it didn't need this.
vc4_uniforms was using it, so just replace it with both of the stage flags
for now.
2016-07-15 13:54:00 -07:00
Eric Anholt
5db82e0c89 vc4: Remove dead dirty_samplers field.
We use a big VC4_DIRTY_FRAGTEX/VC4_DIRTY_VERTEX on the stage, instead.
2016-07-15 13:54:00 -07:00
Eric Anholt
219b75deb9 vc4: Turn on control flow support in the simulator environment.
We can't merge the non-simulator support until we merge the kernel side and
get a new libdrm release.
2016-07-15 13:54:00 -07:00
Charmaine Lee
6b7923ee46 svga: avoid ubinding render targets that have already been unbound
Fixed the remaining redundant SetRenderTargets command emission.

Tested with lightsMark2008, Heaven, mtt piglit, glretrace, conform.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-07-15 14:24:34 -06:00
Neha Bhende
4f633d110a svga: dump code for GenMips.
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-07-15 14:24:33 -06:00
Yaakov Selkowitz
5d303867f5 Use correct names for dlopen()ed files on Cygwin
Signed-off-by: Yaakov Selkowitz <yselkowi@redhat.com>
Reviewed-by: Jon Turney <jon.turney@dronecode.org.uk>
2016-07-15 19:46:54 +01:00
Brian Paul
50a669de4e svga: handle mismatched number of samplers, sampler views
in svga_init_shader_key_common().  Since the CSO module only tracks
sampler views for fragment shaders, the number of samplers and sampler
views can be mismatched for other types of shaders.  This situation
triggered an assertion in Chrome with maps.google.com

This patch adds defensive code to handle that situation.

Fixes VMware bug 1694027
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-07-15 11:05:18 -06:00
Leo Liu
b9d10e79c8 st/omx/enc: check uninitialized list from task release
The uninitialized list should be checked and returned.

Thank Julien for the notification and suggested fix.

Signed-off-by: Leo Liu <leo.liu@amd.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-07-15 09:17:36 -04:00
Samuel Pitoiset
ea6b236ab1 nv50/ir: add missing string for SV_WORK_DIM
Fixes: 2aa1197 ("nouveau: Add support for SV_WORK_DIM")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
2016-07-14 22:28:39 +02:00
Marek Olšák
f84e9d749f Revert "radeon/llvm: Use alloca instructions for larger arrays"
This reverts commit 513fccdfb6.

Bioshock Infinite hangs with that.
2016-07-14 22:15:08 +02:00
Jan Vesely
489bb5473b r600,compute: Reserve vtx 3 for kernel arguments
Using vtx 0 does not work for dynamic offsets.

v2: add explanatory comment

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2016-07-14 16:04:50 -04:00
Marek Olšák
33eddde4a7 radeon/uvd: fail to create a decoder if RUVD_MSG_CREATE submission fails
This is the bare minimum for reporting the error to the user.

Reviewed-by: Christian König <christian.koenig@amd.com>
2016-07-14 22:00:54 +02:00
Marek Olšák
85388652f9 winsys/amdgpu: return an error on IB submission failures
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-07-14 22:00:54 +02:00
Marek Olšák
a7d84f7731 gallium/radeon: add a return value to cs_flush
Required by our UVD code.

Reviewed-by: Christian König <christian.koenig@amd.com>
2016-07-14 22:00:54 +02:00
francians@gmail.com
3db7f3458f freedreno/a4xx: Fix sign compare warnings
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-07-14 09:55:02 -04:00
francians@gmail.com
948822018f freedreno/a3xx: Fix sign compare warnings
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-07-14 09:55:02 -04:00
francians@gmail.com
cf2f345356 freedreno/a2xx: Fix sign compare warnings
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-07-14 09:55:02 -04:00
Boyuan Zhang
23c5e8bc58 radeon/vce: handle newly added parameters
Replace the previous hardcoded value with newly defined parameters

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-07-14 09:49:21 +02:00
Boyuan Zhang
5490068fb1 st/omx: assign previous values to new structure
Assign previously hardcoded values for OMX to newly defined
structure. As a result, OMX behaviour will not change at all.

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-07-14 09:49:14 +02:00
Boyuan Zhang
b86bf4b568 vl: add parameters for VAAPI encode
Allow to specify more parameters in the encoding interface
which previously just hardcoded in the encoder

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-07-14 09:49:07 +02:00
Eric Anholt
9194473dd2 vc4: Emit resets of the uniform stream at the starts of blocks.
If a block might be entered from multiple locations, then the uniform
stream will (probably) be at different points, and we need to make sure
that it's pointing where we expect it to be.  The kernel also enforces
that any block reading a uniform resets uniforms, to prevent reading
outside of the uniform stream by using looping.
2016-07-13 23:54:15 -07:00
Eric Anholt
44df061aaa vc4: Add support for scheduling of branch instructions.
For now we don't fill the delay slots, and instead just drop in NOPs.
2016-07-13 23:54:15 -07:00
Eric Anholt
a59da513d3 vc4: Move the QPU instructions to schedule into each block.
We'll want to schedule them individually, to handle delay slots.
2016-07-13 23:54:15 -07:00
Eric Anholt
37ecc61662 vc4: Disable vc4_opt_vpm in the presence of control flow.
It's a really valuable pass currently, but it will be a mess to rewrite
for control flow.  For now, just disable it if we have multiple blocks
present.
2016-07-13 23:54:15 -07:00
Eric Anholt
ee69cfd11d vc4: Convert vc4_opt_dead_code to work in the presence of control flow.
With control flow, we can't be sure that we'll see the uses of a variable
before its def as we walk backwards.  Given that NIR is eliminating our
long chains of dead code, a simple solution for now seems fine.

This slightly changes the order of some optimizations, and so an opt_vpm
happens before opt_dce, causing 3 dead MOVs to be turned into dead FMAXes
in Minecraft:

instructions in affected programs:     52 -> 54 (3.85%)
2016-07-13 23:54:15 -07:00
Eric Anholt
4e797bd98f vc4: Update copy propagation for control flow.
Previously, we could assume that a MOV from a temp was always an available
copy, because all temps were SSA in NIR, and their non-SSA state in QIR
was just due to the fact that they were from a bcsel or pack_unorm_4x8, so
we could use the current value of the temp after that series of QIR
instructions to define it.

However, this is no longer the case with control flow.  Instead, we track
a new array of MOVs defined within the block that haven't had their source
or dest killed yet, and use that primarily.  We fall back to looking
through the QIR defs array to handle across-block MOVs, but now require
that copies from the SSA defs have an SSA src as well.
2016-07-13 23:54:15 -07:00
Tim Rowley
29f53d7937 Revert "gallium: Force blend color to 16-byte alignment"
This reverts commit d8d6091a84.

Heap allocations may be only 8-byte aligned on 32-bit system, and so having
members with 16-byte alignment (such as in the case where pipe_blend_color is
embedded in radeonsi's si_context) is undefined behavior which indeed causes
crashes when compiled with gcc -O3.

Cc: <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96835
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
Acked-by: Chuck Atkins <chuck.atkins@kitware.com>
2016-07-13 13:55:33 -05:00
Marek Olšák
0f7a6ea5e7 radeonsi: report accurate SGPR and VGPR spills
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-13 19:46:16 +02:00
Marek Olšák
d227dbe272 radeonsi: add a workaround for a compute VGPR-usage LLVM bug
v2: use abort(), describe which LLVM version is affected

Cc: 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-13 19:46:16 +02:00
Marek Olšák
f4d1de7f86 radeonsi: use LLVMGetTypeKind to tell if an input is an array of descriptors
just a cleanup

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-13 19:46:16 +02:00