Commit graph

28135 commits

Author SHA1 Message Date
Boyuan Zhang
dd208ea006 st/va: enable h264 VAAPI encode
Enable H.264 VAAPI encoding through config. Currently only H.264 baseline is supported. Encode entrypoint is not accepted by driver.

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
2016-07-25 13:39:54 +02:00
Boyuan Zhang
71da1354d7 st/va: add function to handle misc param type frame rate
Frame rate can be passed to the driver through either VAEncSequenceParameterBufferType or VAEncMiscParameterTypeFrameRate. The previous code only implemented the former, which is used by Gstreamer-Vaapi. This adds an implementation for VAEncMiscParameterTypeFrameRate, and sets a default frame rate of 30 in case the application never provides frame rate information to the driver.

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
2016-07-25 13:39:53 +02:00
Boyuan Zhang
10dec2de2d st/va: add environmental variable to disable interlace
Add an environment variable to disable interlace mode. At the VAAPI decoding stage, the driver cannot distinguish between the pure decoding case and the transcoding case. Since interlaced encoding is not supported, interlace has to be disabled for the transcoding case. The temporary solution is to use an environment variable to disable interlace mode.

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
2016-07-25 13:39:53 +02:00
Boyuan Zhang
b0ceb4cc48 st/va: add preset values for VAAPI encode
Add some hardcoded values that the hardware needs, mainly for rate control. With the values previously hardcoded for OMX, the rate control result is not correct. This change fixes the rate control result by setting correct values for Vaapi.

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
2016-07-25 13:39:52 +02:00
Boyuan Zhang
85d807f2e0 st/va: add functions for VAAPI encode
Add necessary functions/changes for VAAPI encoding to buffer and picture. These changes will allow driver to handle all Vaapi encode related operations. This patch doesn't change the Vaapi decode behaviour.

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
2016-07-25 13:39:52 +02:00
Boyuan Zhang
10c1cc47a6 st/va: get rate control method from configattrib v2
Rate control method is passed from app to driver through config attrib list.
That is why we need to store this rate control method to config. And later
on, we will pass this value to context->desc.h264enc.rate_ctrl.rate_ctrl_method.

v2 (chk): fix broken build and commit message

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
2016-07-25 13:39:51 +02:00
Boyuan Zhang
34f4634843 st/va: add conversion for yv12 to nv12 in putimage v2
For putimage call, if image format is yv12 (or IYUV with U V field swap) and
surface format is nv12, then we need to convert yv12 to nv12 and then copy
the converted data from image to surface. We can't use the existing logic
where surface is destroyed and re-created with yv12 format.

v2 (chk): fix some compiler warnings and commit message

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
2016-07-25 13:39:51 +02:00
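The yv12-to-nv12 conversion that 34f4634843 and 23b4ab1738 refer to amounts to copying the Y plane and interleaving the two chroma planes. A minimal sketch, assuming tightly packed buffers (pitch equal to width) and even dimensions; `yv12_to_nv12` is a hypothetical helper, not the function added to vl/util:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Convert a tightly packed YV12 frame (Y plane, then V plane, then U plane)
 * into NV12 (Y plane, then interleaved U/V byte pairs).
 * Assumes even width/height and no row padding.
 */
static void yv12_to_nv12(const uint8_t *src, uint8_t *dst,
                         size_t width, size_t height)
{
    assert(src != NULL && dst != NULL);
    assert(width % 2 == 0 && height % 2 == 0);

    const size_t y_size = width * height;
    const size_t c_size = (width / 2) * (height / 2);
    const uint8_t *v = src + y_size; /* V comes first in YV12 */
    const uint8_t *u = v + c_size;   /* U follows V */
    uint8_t *uv = dst + y_size;      /* interleaved chroma plane of NV12 */

    /* The luma plane is identical in both layouts. */
    for (size_t i = 0; i < y_size; i++)
        dst[i] = src[i];

    /* NV12 stores chroma as U0 V0 U1 V1 ... */
    for (size_t i = 0; i < c_size; i++) {
        uv[2 * i] = u[i];
        uv[2 * i + 1] = v[i];
    }
}
```

For IYUV input, where the U plane precedes the V plane, the same loop applies with the `u` and `v` pointers swapped.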
Boyuan Zhang
23b4ab1738 vl/util: add copy func for yv12image to nv12surface v2
Add function to copy from yv12 image to nv12 surface for VAAPI putimage call.
We need this function in the VaPutImage call when copying from a yv12 image to an nv12
surface for encoding. The existing function can't be used because it only works for
copying from a yv12 surface to an nv12 image in Vaapi.

v2: cleanup variable types and commit message

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
2016-07-25 13:39:18 +02:00
Boyuan Zhang
5bcaa1b9e9 st/va: add encode entrypoint v2
VAAPI passes PIPE_VIDEO_ENTRYPOINT_ENCODE as entry point for encoding case. We
will save this encode entry point in config. config_id was used as profile
previously. Now, config has both profile and entrypoint field, and config_id is
used to get the config object. Later on, we pass this entrypoint to
context->templat.entrypoint instead of always hardcoded to
PIPE_VIDEO_ENTRYPOINT_BITSTREAM for decoding case previously. Encode entrypoint
is not accepted by driver until we enable Vaapi encode in later patch.

v2 (chk): fix commit message to match 80 chars, use switch instead of ifs,
	  fix memory leaks in the error path, implement vlVaQueryConfigEntrypoints
	  as well, drop VAEntrypointEncPicture (only used for JPEG).

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
2016-07-25 13:30:42 +02:00
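The idea in 5bcaa1b9e9 of storing both profile and entrypoint in a config object, selected with a switch, might look roughly like this. All types and names below are local stand-ins invented for illustration; they are not the libva or gallium definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the API-level and driver-level entrypoint enums. */
enum toy_va_entrypoint { TOY_VA_VLD, TOY_VA_ENC_SLICE, TOY_VA_ENC_PICTURE };
enum toy_pipe_entrypoint { TOY_PIPE_UNKNOWN, TOY_PIPE_BITSTREAM, TOY_PIPE_ENCODE };

/* A config now carries both fields instead of the id doubling as profile. */
struct toy_config {
    int profile;
    enum toy_pipe_entrypoint entrypoint;
};

/* Returns 0 on success, -1 for an unsupported entrypoint. */
static int toy_config_init(struct toy_config *cfg, int profile,
                           enum toy_va_entrypoint ep)
{
    assert(cfg != NULL);
    cfg->profile = profile;

    switch (ep) {
    case TOY_VA_VLD:
        cfg->entrypoint = TOY_PIPE_BITSTREAM; /* decode */
        return 0;
    case TOY_VA_ENC_SLICE:
        cfg->entrypoint = TOY_PIPE_ENCODE;    /* encode */
        return 0;
    default:
        /* e.g. the JPEG-only picture entrypoint is rejected here */
        cfg->entrypoint = TOY_PIPE_UNKNOWN;
        return -1;
    }
}
```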
Samuel Pitoiset
e7b2ce5fd8 nvc0: upload sample locations on GM20x
This fixes a bunch of multisample piglit tests on GM206, like
bin/arb_texture_multisample-texelfetch 2 -auto -fbo

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-07-24 22:46:26 +02:00
Rob Clark
2f57e57881 freedreno/a4xx: time-elapsed query should be active for clears
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-07-24 09:33:05 -04:00
Samuel Pitoiset
3a2e67bf78 nvc0/ir: fix up an assertion in emitUADD()
It's illegal to have neg modifiers on both sources for OP_ADD,
and it's illegal to have OP_SUB with just src0 neg.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-07-24 00:42:47 +02:00
Samuel Pitoiset
a159a3d5cb nvc0: fix wrong indentation in nvc0_validate_fb()
Trivial.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-07-23 23:59:10 +02:00
Rob Clark
9253dcde58 freedreno/a4xx: timestamp queries
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-07-23 13:39:30 -04:00
Rob Clark
b888d8e937 freedreno: hw timestamp support
If the kernel supports it, use hw counter for timestamps.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-07-23 13:39:30 -04:00
Rob Clark
6a4b052820 freedreno: prep work for timestamp queries
We need the "NULL" state to be a valid bit in the bitmask, because timestamp
queries are not restricted to draw/etc stages (i.e. the only commands to
submit may just be to read the timestamp).  And the absence of draws is
not a reason to skip the flush and return zero.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-07-23 13:39:30 -04:00
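Why the "NULL" state has to be a real bit can be seen in a small sketch: if it were the value 0, no mask could ever contain it. The enum and helper below are hypothetical and only illustrate the bitmask reasoning from 6a4b052820, not the freedreno data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Each stage gets its own bit, including the "nothing is happening" stage. */
enum toy_stage {
    TOY_STAGE_NULL  = 1 << 0,
    TOY_STAGE_DRAW  = 1 << 1,
    TOY_STAGE_CLEAR = 1 << 2,
};

struct toy_query {
    unsigned active_stages; /* bitmask of stages in which the query runs */
};

/* A query is active if the current stage's bit is set in its mask. */
static bool toy_query_active(const struct toy_query *q, enum toy_stage current)
{
    assert(q != NULL);
    return (q->active_stages & (unsigned)current) != 0;
}
```

With this layout a timestamp-style query can list `TOY_STAGE_NULL` in its mask and stay active when no draw or clear has been issued, while a draw-only query does not.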
Nicolai Hähnle
3d69357da9 radeonsi: ensure sample locations are set for line and polygon smoothing
Since commit d938b8c, the sample locations are no longer set unconditionally,
so we need to set the atom to dirty on all chips, not just Polaris.

Cc: 12.0 <mesa-stable@lists.freedesktop.org>
2016-07-23 15:36:39 +02:00
Nicolai Hähnle
f755da0f2f radeonsi: fix Polaris MSAA regression
The regression was introduced by commit d938b8c. The problem here is that in
order to use the small primitive filter, we need to explicitly set the sample
locations to 0. But the DB doesn't properly process the change of sample
locations without a flush, and so we can end up with incorrect Z values.

Instead of doing a flush, just disable the small primitive filter when MSAA
is force-disabled.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96908
Cc: 12.0 <mesa-stable@lists.freedesktop.org>
2016-07-23 15:36:38 +02:00
francians@gmail.com
abb2a865a4 freedreno/ir3: Add missing braces in initializer
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-07-23 09:14:55 -04:00
francians@gmail.com
c99cdd2175 freedreno/a2xx: silence missing case 'SHADER_COMPUTE' warning (v2)
v2: no need for break after an unreachable (Matt Turner)

Signed-off-by: Francesco Ansanelli <francians@gmail.com>
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-07-23 09:14:18 -04:00
Marek Olšák
700de07771 radeonsi: implement buffer_subdata without indirect calls
There is less noise in CPU profile data now.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-23 13:33:42 +02:00
Marek Olšák
8e3e9d2839 gallium/util: don't modify usage in pipe_buffer_write
All drivers were already doing it except virgl.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-23 13:33:42 +02:00
Marek Olšák
1ffe77e7bb gallium: split transfer_inline_write into buffer and texture callbacks
to reduce the call indirections with u_resource_vtbl.

The worst call tree you could get was:
  - u_transfer_inline_write_vtbl
    - u_default_transfer_inline_write
      - u_transfer_map_vtbl
        - driver_transfer_map
      - u_transfer_unmap_vtbl
        - driver_transfer_unmap

That's 6 indirect calls. Some drivers only had 5. The goal is to have
1 indirect call for drivers that care. The resource type can be determined
statically at most call sites.

The new interface is:
  pipe_context::buffer_subdata(ctx, resource, usage, offset, size, data)
  pipe_context::texture_subdata(ctx, resource, level, usage, box, data,
                                stride, layer_stride)

v2: fix whitespace, correct ilo's behavior

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Acked-by: Roland Scheidegger <sroland@vmware.com>
2016-07-23 13:33:42 +02:00
Samuel Pitoiset
3f5cf8c488 nv50/ir: allow to swap sources for OP_SUB
This allows the load-propagation pass to swap the sources in presence
of immediate values.

Maxwell (GM107):

total instructions in shared programs :1928187 -> 1927634 (-0.03%)
total gprs used in shared programs    :330741 -> 330154 (-0.18%)
total local used in shared programs   :28032 -> 28032 (0.00%)

                local        gpr       inst      bytes
    helped           0         271         425         425
      hurt           0           0         194         194

Fermi (GF114):

total instructions in shared programs :2334474 -> 2333829 (-0.03%)
total gprs used in shared programs    :380934 -> 380215 (-0.19%)
total local used in shared programs   :33304 -> 33264 (-0.12%)

                local        gpr       inst      bytes
    helped           5         314         521         521
      hurt           0           4         195         195

No regressions on GM107 and GF114 with full piglit.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-07-22 22:51:37 +02:00
Marek Olšák
2e890b5350 gallium/radeon: make deferred flushes asynchronous
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-07-22 22:34:49 +02:00
Marek Olšák
d17b35e671 gallium: add PIPE_FLUSH_DEFERRED
There are 2 uses:
- Asynchronous flushing for multithreaded drivers.
- Return a fence without flushing (mid-command-buffer fence). The driver
  can defer flushing until fence_finish is called.

This is required to make Bioshock Infinite faster, which creates
1000 fences (flushes) per frame.

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
2016-07-22 22:34:49 +02:00
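The second use listed for PIPE_FLUSH_DEFERRED in d17b35e671 (return a fence without flushing, and flush only when the fence is waited on) can be modelled with a toy context. Everything here is hypothetical: `toy_ctx`, `toy_flush`, `toy_fence_finish`, and `TOY_FLUSH_DEFERRED` are illustration names, and a "submit" is just a counter rather than real GPU work.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_FLUSH_DEFERRED 1u

struct toy_ctx {
    int pending_cmds; /* commands recorded but not yet submitted */
    int submits;      /* number of real submissions so far */
};

struct toy_fence {
    struct toy_ctx *ctx;
    bool submitted;   /* has the work behind this fence been submitted? */
};

/* Submit the pending commands, if there are any. */
static void toy_submit(struct toy_ctx *ctx)
{
    if (ctx->pending_cmds > 0) {
        ctx->submits++;
        ctx->pending_cmds = 0;
    }
}

/* With the deferred flag, hand out a fence but do not submit yet. */
static struct toy_fence toy_flush(struct toy_ctx *ctx, unsigned flags)
{
    struct toy_fence f = { ctx, false };
    assert(ctx != NULL);
    if (!(flags & TOY_FLUSH_DEFERRED)) {
        toy_submit(ctx);
        f.submitted = true;
    }
    return f;
}

/* Waiting on a deferred fence triggers the postponed submission. */
static void toy_fence_finish(struct toy_fence *f)
{
    assert(f != NULL);
    if (!f->submitted) {
        toy_submit(f->ctx);
        f->submitted = true;
    }
    /* A real driver would now wait for the GPU to signal the fence. */
}
```

In this model, creating many deferred fences per frame costs no submissions until one of them is actually waited on.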
Marek Olšák
4cdc482283 gallium/os: use CLOCK_MONOTONIC for sleeps (v2)
v2: handle EINTR, remove backslashes

Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2016-07-22 22:34:49 +02:00
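One way to sleep against CLOCK_MONOTONIC while handling EINTR, as the title and v2 note of 4cdc482283 mention, is POSIX `clock_nanosleep` with an absolute deadline. This is a sketch of that general technique, not the gallium/os code; `sleep_usec_monotonic` is a hypothetical name, and the use of an absolute deadline is an assumption made here so that a retry after a signal does not lengthen the sleep.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

/* Sleep for at least `usecs` microseconds, measured on CLOCK_MONOTONIC. */
static void sleep_usec_monotonic(int64_t usecs)
{
    struct timespec deadline;

    assert(usecs >= 0);
    clock_gettime(CLOCK_MONOTONIC, &deadline);

    /* Turn the relative duration into an absolute deadline. */
    deadline.tv_sec += (time_t)(usecs / 1000000);
    deadline.tv_nsec += (long)(usecs % 1000000) * 1000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    /* clock_nanosleep returns the error number directly; retry on EINTR. */
    while (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME,
                           &deadline, NULL) == EINTR)
        ;
}
```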
Samuel Pitoiset
c2801f9272 nvc0/mme: fix offsets used for indirect draws
This fixes a regression introduced in
1da704a94c because the offset has moved
from 0x180 to 0x1a0, and the macros have to be re-compiled.

Fixes: 1da704a ("nvc0: increase the tex handles area size in the driver")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-07-22 11:32:09 +02:00
Samuel Pitoiset
dbcff7fdbb nvc0: fix offsets of MP perf counters input parameters
This fixes a regression introduced in
1da704a94c because the offset has moved
from 0x600 to 0x620, and the kernels used for reading MP perf counters
have to be re-assembled.

This also fixes amd_performance_monitor_measure piglit.

Fixes: 1da704a ("nvc0: increase the tex handles area size in the driver")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-07-22 11:32:04 +02:00
Eric Anholt
d2b4b16589 vc4: Return V3D version details in the GL renderer info.
This is as close as we get to a name for the 3D blocks.
2016-07-20 16:15:15 -07:00
Eric Anholt
d81934cded vc4: Check the V3D version reported by the kernel.
We don't want to bring up an old userspace driver on a kernel for
newer hardware.  We'll also want to look at the other ident fields in
the future.
2016-07-20 16:15:15 -07:00
Eric Anholt
83b8ca58e1 vc4: Detect and report kernel support for branching.
2016-07-20 16:15:15 -07:00
Eric Anholt
16985eb308 vc4: Switch to using the libdrm-provided vc4_drm.h.
The required version is set to .69 for the getparam ioctl that will be
used in the next commit.
2016-07-20 16:15:15 -07:00
Tom Stellard
106946153f clover: Re-order includes in invocation.cpp to fix build
The build was failing because the official CL headers have a few defines,
like: # define cl_khr_gl_sharing 1

Which have the same name as some class members of clang's OpenCLOptions class.
If we include the cl headers first, this breaks the build because the member
names of this class are replaced by the literal 1.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Vedran Miletić <vedran@miletic.net>
2016-07-20 21:15:53 +00:00
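The include-order problem described in 106946153f can be reproduced in miniature. The sketch below uses a hypothetical struct, `toy_opencl_options`, in place of clang's OpenCLOptions class, and a local `#define` in place of the CL header. It shows the working order: the declaration and everything that names the member come first, the macro second.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the class whose member shares its name with a CL macro. */
struct toy_opencl_options {
    int cl_khr_gl_sharing;
};

/* Code that names the member must also be seen before the macro exists. */
static int toy_gl_sharing(const struct toy_opencl_options *o)
{
    assert(o != NULL);
    return o->cl_khr_gl_sharing;
}

/* Stand-in for the define from the CL headers, placed AFTER the uses above. */
#define cl_khr_gl_sharing 1

/* From here on the identifier is replaced by the literal 1. */
static int toy_macro_value(void)
{
    return cl_khr_gl_sharing;
}
```

If the `#define` were moved above the struct, the preprocessor would rewrite the member declaration to `int 1;`, which does not compile. That is the failure the reordering avoids.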
Tom Stellard
a73bf11a63 clover: Add missing include v2
clang commit r275822 removed unnecessary includes from header files,
so we now need to explicitly include clang/Lex/PreprocessorOptions.h

v2:
  - Use <> instead of "" for the include path.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Vedran Miletić <vedran@miletic.net>
2016-07-20 21:15:53 +00:00
Tim Rowley
0f13a8f770 swr: [rasterizer core] introduce simd16intrin.h
Refactoring to leave existing simd_* intrinsics in "simdintrin.h" unchanged,
adding corresponding simd16_* intrinsics in "simd16intrin.h" on the side,
with emulation, that we can use piecemeal, rather than the all-or-nothing
approach to bring up avx512.

Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-07-20 10:22:15 -05:00
Tim Rowley
5fe361e2c0 swr: [rasterizer core] fix for possible int32 overflow condition
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-07-20 10:22:15 -05:00
Tim Rowley
a123d12e14 swr: [rasterizer core] rename *_MAX enum values to *_COUNT
Makes these names semantically correct.

Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-07-20 10:22:15 -05:00
Tim Rowley
e41d9dd576 swr: [rasterizer core] centroid correction
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-07-20 10:22:15 -05:00
Tim Rowley
e0529a4668 swr: [rasterizer core] support range of values in TemplateArgUnroller
Fixes Linux warnings.

Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-07-20 10:22:15 -05:00
Tim Rowley
0363015964 swr: [rasterizer core] ensure adjacent topologies use the cut-aware PA
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-07-20 10:22:15 -05:00
Tim Rowley
efdaf5fa3e swr: [rasterizer] attribute swizzling and linkage
Add support for enhanced attribute swizzling. Currently supports constant
source overrides to handle PrimitiveID support. No support yet for input
select swizzling or wrap shortest. Removes obsoleted linkageMask and
associated code.

Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-07-20 10:22:15 -05:00
Tim Rowley
a5846fb75a swr: [rasterizer common] icc declspec definitions
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-07-20 10:22:15 -05:00
Tim Rowley
0d13f2e801 swr: [rasterizer jitter] rework vertex/instance ID storage in fetch
Moved the setting into the existing component control code. Fixes bad
interaction between attribute/component setting for vertex/instance ID
and component packing.

Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-07-20 10:22:14 -05:00
Tim Rowley
1d09b3971a swr: [rasterizer core] avx512 simd utility work
Enabling KNOB_SIMD_WIDTH = 16 for AVX512 pre-work and low level simd utils

Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-07-20 10:22:14 -05:00
Tim Rowley
98641f4e73 swr: [rasterizer core] viewport rounding for disabled scissor
Adjust viewport rounding when scissor rect is disabled during macro
tile scissor setup.

Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-07-20 10:22:14 -05:00
Tomasz Figa
70a28afb29 gallium/dri: Add shared glapi to LIBADD on Android
An earlier patch fixed the problem for classic drivers, however Gallium
was still left broken. This patch applies the same workaround to
Gallium, when compiled for Android. Following is a quote from the
original patch:

0cbc90c57c mesa: dri: Add shared glapi to LIBADD on Android

/system/vendor/lib/dri/*_dri.so actually depend on libglapi: without
this, loading the so file fails with:
cannot locate symbol "__emutls_v._glapi_tls_Context"

On non-Android (non-bionic) platform, EGL uses the following
workflow, which works fine:
  dlopen("libglapi.so", RTLD_LAZY | RTLD_GLOBAL);
  dlopen("dri/<driver>_dri.so", RTLD_NOW | RTLD_GLOBAL);

However, bionic does not respect the RTLD_GLOBAL flag, and the dri
library cannot find symbols in libglapi.so, so we need to link
to libglapi.so explicitly. Android.mk already does this.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Tomasz Figa <tfiga@chromium.org>
Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-07-20 15:10:33 +01:00
Józef Kucia
14608ef920 radeonsi: advertise 8 bits subpixel precision for viewport bounds
Signed-off-by: Józef Kucia <joseph.kucia@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2016-07-20 12:45:31 +02:00
Józef Kucia
98aa807188 r600: advertise 8 bits subpixel precision for viewport bounds
Signed-off-by: Józef Kucia <joseph.kucia@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2016-07-20 12:45:31 +02:00
Józef Kucia
3cd28fe3de gallium: add a cap for VIEWPORT_SUBPIXEL_BITS (v2)
This allows Gallium drivers to advertise the subpixel precision
for floating-point viewport bounds.

v2:
  - Set ViewportSubpixelBits in st_init_limits.

Signed-off-by: Józef Kucia <joseph.kucia@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-07-20 12:45:31 +02:00