Commit graph

24966 commits

Author SHA1 Message Date
Oded Gabbay
39b4dfe6ab llvmpipe: use simple coeffs calc for 128bit vectors
There are currently two methods in llvmpipe code to calculate coeffs to
be used as inputs for the fragment shader. The two methods use slightly
different ways to do the floating point calculations and thus produce
slightly different results.

The decision which method to use is determined by the size of the vector
that is used by the platform.

For vectors with size of more than 128bit, a single-step method is used,
in which coeffs_init_simple() + attribs_update_simple() are called.

For vectors with size of 128bit or less, a two-step method is used, in
which coeffs_init() + attribs_update() are called.

This causes some piglit tests (clip-distance-bulk-copy,
interface-vs-unnamed-to-fs-unnamed) to fail when using platforms with
128bit vectors (such as ppc64le or x86-64 without AVX).

This patch makes platforms with 128bit vectors use the single-step
method (aka "simple" method) instead of the two-step method.
This would make the resulting coeffs identical between more platforms,
make sure the piglit tests passes, and make debugging and maintainability
a bit easier as the generated LLVM IR will be the same for more platforms.

The performance impact is negligible for x86-64 without AVX, and
basically non-existent for ppc64le, as it can be seen from the following
benchmarking results:

- glxspheres, on ppc64le:

   - original code:  4.892745317 frames/sec 5.460303857 Mpixels/sec
   - with the patch: 4.932083873 frames/sec 5.504205571 Mpixels/sec
   - Additional 0.8% performance boost

- glxspheres, on x86-64 without AVX:

   - original code:  20.16418809 frames/sec 22.50323395 Mpixels/sec
   - with the patch: 20.31328989 frames/sec 22.66963152 Mpixels/sec
   - Additional 0.74% performance boost

- glmark2, on ppc64le:

  - original code:  score of 58
  - with my change: score of 57

- glmark2, on x86-64 without AVX:

  - original code:  score of 175
  - with the patch: score of 167
  - Impact of of -4.5% on performance

- OpenArena, on ppc64le:

  - original code:  3398 frames 1719.0 seconds 2.0 fps
                    255.0/505.9/2773.0/0.0 ms

  - with the patch: 3398 frames 1690.4 seconds 2.0 fps
                    241.0/497.5/2563.0/0.2 ms

  - 29 seconds faster with the patch, which is about 2%

- OpenArena, on x86-64 without AVX:

  - original code:  3398 frames 239.6 seconds 14.2 fps
                    38.0/70.5/719.0/14.6 ms

  - with the patch: 3398 frames 244.4 seconds 13.9 fps
                    38.0/71.9/697.0/14.3 ms

  - 0.3 fps slower with the patch (about 2%)

Additional details can be found at:
http://lists.freedesktop.org/archives/mesa-dev/2015-October/098635.html

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2015-11-04 02:38:53 +01:00
Marek Olšák
3b37155a68 gallium/radeon: allow returning SDMA fences from pipe->flush
pipe->flush never returned SDMA fences. This fixes it.
This is only an issue on amdgpu where fences can signal out of order.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-11-04 00:43:14 +01:00
Marek Olšák
7f9122c968 gallium/radeon: always return the last SDMA fence on SDMA flush if needed
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-11-04 00:43:14 +01:00
Samuel Pitoiset
e887407491 nvc0: add missing compute parameters required by clover
This fixes crashes with some piglit OpenCL tests.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-03 22:17:00 +01:00
Samuel Pitoiset
e640ba41ed nvc0: handle NULL pointer in nvc0_get_compute_param()
To get the size (in bytes) of a compute parameter, clover first calls
get_compute_param() with a NULL data pointer. The RET() macro is based
on nv50.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-03 22:16:45 +01:00
Samuel Pitoiset
00bb524716 nv50: use correct heaps for FP and GP code segments
This is just a cosmetic change. Trivial.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-11-01 23:29:20 +01:00
Ilia Mirkin
67635a0a71 nouveau: get rid of tabs
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-31 19:58:14 -04:00
Dave Airlie
425d8c2578 virgl/vtest: fix extra malloc
This somehow got added twice, drop the first one.

Reported by Coverity.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-10-31 18:05:33 +10:00
Dave Airlie
8d731ebd33 virgl: free sampler view on failure path
Reported by Coverity.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-10-31 16:16:44 +10:00
Dave Airlie
7153b12651 gallium/swrast: fixup build breakage and warnings
The front buffer rendering changes broke an interface, I didn't
fix up all of them.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-10-31 16:16:44 +10:00
Dave Airlie
2b67657096 gallium/swrast: fix front buffer blitting. (v2)
So I've known this was broken before, cogl has a workaround
for it from what I know, but with the gallium based swrast
drivers BlitFramebuffer from back to front or vice-versa
was pretty broken.

The legacy swrast driver tracks when a front buffer is used
and does the get/put images when it is mapped/unmapped,
so this patch attempts to add the same functionality to the
gallium drivers.

It creates a new context interface to denote when a front
buffer is being created, and passes a private pointer to it,
this pointer is then used to decide on map/unmap if the
contents should be updated from the real frontbuffer using
get/put image.

This is primarily to make gtk's gl code work, the only
thing I've tested so far is the glarea test from
https://github.com/ebassi/glarea-example.git

v2: bump extension version,
check extension version before calling get image. (Ian)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91930

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-10-31 16:04:36 +10:00
Emil Velikov
7bac333508 winsys/virgl: rework line wrapping/indent
Wrap some of the 'omg it's getting out of hand' long lines, and
re-indent where things feel off.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:37:09 +00:00
Emil Velikov
493e410d55 virgl: unwrap the includes
Include what you want, rather than relying on a header foo.h N levels
down the include chain, to provide something that you need.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:37:09 +00:00
Emil Velikov
7154d48c6e winsys/virgl: remove temporary ret variable
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:37:09 +00:00
Emil Velikov
bdcb005788 winsys/virgl: always memset prior to ioctl
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:37:09 +00:00
Emil Velikov
e992715da2 winsys/virgl: use MALLOC to match FREE
The uppercase versions are wrappers which must be matched.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:37:09 +00:00
Emil Velikov
72d7d1e224 winsys/virgl: remove calloc/malloc casts
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:37:09 +00:00
Emil Velikov
1ce685f05e winsys/virgl: throw in some inline wrappers
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:37:09 +00:00
Emil Velikov
78be78b681 virgl: introduce virgl_query() inline wrapper
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:37:09 +00:00
Emil Velikov
dafcb21405 virgl: use virgl_screen/surface upcast wrappers
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:37:09 +00:00
Emil Velikov
7af46b9c74 virgl: introduce and use virgl_transfer/texture/resource inline wrappers
The only two remaining cases of (struct virgl_resource *) require a
closer look. Either the error checking is missing or the arguments
provided feel wrong.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:37:09 +00:00
Emil Velikov
6b123fa07f virgl: add virgl_context/sampler_view/so_target() upcast wrappers
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:37:08 +00:00
Emil Velikov
1f43e4e1a3 winsys/virgl/drm: drop unneeded forward declaration
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:37:08 +00:00
Emil Velikov
e0056228f6 virgl: remove sw_winsys pointer from virgl_screen
The screen already has a pointer to the (base) winsys object.
With the latter of which implemented/sub-classed as either drm or sw
based one, depending on the target.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:37:08 +00:00
Emil Velikov
0c82c2fb0b virgl: rename virgl.h to virgl_screen.h
Provide a more meaningful name considering it's purpose.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:37:08 +00:00
Emil Velikov
87f7d61e19 virgl: move virgl_hw.h into the driver dir
Strictly speaking virgl_hw.h should reside in the driver folder, as
it describes the hardware. Moving it allows us to nuke the following
strange dependency

winsys/vtest > driver > winsys/drm

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:37:08 +00:00
Emil Velikov
014f8ef2ff virgl: straighten the includes confusion
Use the relevant GALLIUM_foo_CFLAGS which has all the requirements
(not to mention VISIBITY_CFLAGS) and keep ../ out of the include
directives.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:37:08 +00:00
Emil Velikov
2c705d2220 virgl: remove the _FILE_OFFSET_BITS defines
The build already sets it as needed.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:37:08 +00:00
Emil Velikov
a05648fd7e winsys/virgl/drm: add all files to the tarball
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:37:08 +00:00
Emil Velikov
8b9e69e2ea winsys/virgl/vtest: list all files in Makefile.sources
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:36:46 +00:00
Emil Velikov
73308ca802 virgl: move sources list to Makefile.sources
... and add the missing files while we're at it.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:33:11 +00:00
Emil Velikov
c1bf71f77c virgl: fix drm.h include path
The drm/ prefix is required, if using the kernel provided headers. As
most distros don't ship them it and we already depend on libdrm (which
adds the relevant -I flag) just drop the drm/ from the include.

Once a libdrm release with the virtgpu_drm.h header is released, we can
drop our local copy of the file.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:29:01 +00:00
Samuel Pitoiset
0d0329df8f nv50: do not create an invalid HW query type
While we are at it, store the rotate offset for occlusion queries to
nv50_hw_query like on nvc0.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
2015-10-30 17:57:15 +01:00
Samuel Pitoiset
5f1eeb799b nv50: move HW queries to nv50_query_hw.c/h files
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
2015-10-30 17:57:15 +01:00
Samuel Pitoiset
76b48ceee9 nv50: move nva0_so_target_save_offset() to its correct location
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
2015-10-30 17:57:15 +01:00
Samuel Pitoiset
2e3fe0379e nv50: add a header file for nv50_query
Like for nvc0, this will allow to split different types of queries and
to prepare the way for both global performance counters and MP counters.

While we are at it, make use of nv50_query struct instead of pipe_query.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-10-30 17:57:15 +01:00
Julien Isorce
e7ed3963ed st/va: add support to export a surface as dmabuf
I.e. implements:
VaAcquireBufferHandle
VaReleaseBufferHandle
for memory of type VA_SURFACE_ATTRIB_MEM_TYPE_DRM_PRIME

And apply relatives change to:
vlVaMapBuffer
vlVaUnMapBuffer
vlVaDestroyBuffer

Implementation inspired from cgit.freedesktop.org/vaapi/intel-driver

Tested with gstreamer-vaapi with nouveau driver.

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2015-10-30 13:21:20 +01:00
Julien Isorce
802ba6f865 st/va: implement VaDeriveImage
And apply relatives change to:
vlVaBufferSetNumElements
vlVaCreateBuffer
vlVaMapBuffer
vlVaUnmapBuffer
vlVaDestroyBuffer
vlVaPutImage

It is unfortunate that there is no proper va buffer type and struct
for this. Only possible to use VAImageBufferType which is normally
used for normal user data array.
On of the consequences is that it is only possible VaDeriveImage
is only useful on surfaces backed with contiguous planes.
Implementation inspired from cgit.freedesktop.org/vaapi/intel-driver

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2015-10-30 13:21:11 +01:00
Julien Isorce
5e763aaa21 st/va: add more errors checks in vlVaBufferSetNumElements and vlVaMapBuffer
Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2015-10-30 13:20:41 +01:00
Julien Isorce
86eb4131a9 st/va: add headless support, i.e. VA_DISPLAY_DRM
This patch allows to use gallium vaapi without requiring
a X server running for your second graphic card.

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2015-10-30 13:20:35 +01:00
Julien Isorce
1bdea0e579 st/va: handle Video Post Processing for configs
Add support for VA_PROFILE_NONE and VAEntrypointVideoProc
in the 4 following functions:

vlVaQueryConfigProfiles
vlVaQueryConfigEntrypoints
vlVaCreateConfig
vlVaQueryConfigAttributes

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2015-10-30 13:20:29 +01:00
Julien Isorce
0b868807e4 st/va: add colospace conversion through Video Post Processing
Add support for VPP in the following functions:
vlVaCreateContext
vlVaDestroyContext
vlVaBeginPicture
vlVaRenderPicture
vlVaEndPicture

Add support for VAProcFilterNone in:
vlVaQueryVideoProcFilters
vlVaQueryVideoProcFilterCaps
vlVaQueryVideoProcPipelineCaps

Add handleVAProcPipelineParameterBufferType helper.

One application is:
VASurfaceNV12 -> gstvaapipostproc -> VASurfaceRGBA

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2015-10-30 13:20:10 +01:00
Julien Isorce
05b6ce4209 st/va: implement dmabuf import for VaCreateSurfaces2
For now it is limited to RGBA, BGRA, RGBX, BGRX surfaces.

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2015-10-30 13:20:03 +01:00
Julien Isorce
adf1133118 st/va: implement VaCreateSurfaces2 and VaQuerySurfaceAttributes
Inspired from http://cgit.freedesktop.org/vaapi/intel-driver/
especially src/i965_drv_video.c::i965_CreateSurfaces2.

This patch is mainly to support gstreamer-vaapi and tools that uses
this newer libva API. The first advantage of using VaCreateSurfaces2
over existing VaCreateSurfaces, is that it is possible to select which
the pixel format for the surface. Indeed with the simple VaCreateSurfaces
function it is only possible to create a NV12 surface. It can be useful
to create a RGBA surface to use with video post processing.

The avaible pixel formats can be query with VaQuerySurfaceAttributes.

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2015-10-30 13:19:54 +01:00
Julien Isorce
d42029d2d9 st/va: do not destroy old buffer when new one failed
If formats are not the same vlVaPutImage re-creates the video
buffer with the right format. But if the creation of this new
video buffer fails then the surface looses its current buffer.
Let's just destroy the previous buffer on success.

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2015-10-30 13:19:47 +01:00
Julien Isorce
87109e5f88 st/va: properly defines VAImageFormat formats and improve VaCreateImage
Added PIPE_VIDEO_CHROMA_FORMAT_NONE in p_format.h
and return it by default in ChromaToPipe.

Renamed YCbCrToPipe to VaFourccToPipeFormat because it now
contains RGB.

Implemented PipeFormatToVaFourcc which will be used later in
VlVaDeriveImage.

Note that gstreamer-vaapi check all the VAImageFormat fields.

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2015-10-30 13:05:23 +01:00
Eric Anholt
04c42f3ab5 vc4: Allow user index buffers, to avoid slow readback for shadow IBs.
Improves low-settings openarena performance by 31.9975% +/- 0.659931%
(n=7).
2015-10-29 22:58:01 -07:00
Ilia Mirkin
06fa2e864a nv50: mark contexts shareable, compile at creation time
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-29 23:25:08 -04:00
Ilia Mirkin
f768eaa87d nv50: allow per-sample interpolation to be forced via rast
Uses the same technique as for nvc0 of fixups before upload, and
evicting in case of state change. Removes one source of variants kept by
st/mesa.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-29 22:42:38 -04:00
Dave Airlie
744cc036b9 r600: enable SB for geom shaders on pre-evergreen
I've checked with piglit and one tests fails, but it fails
on evergreen as well, so will get fixed later.

Otherwise SB seems to be working fine for geom shaders on my
rv635.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-10-30 10:40:59 +10:00