Commit graph

82384 commits

Author SHA1 Message Date
Emil Velikov
dafcb21405 virgl: use virgl_screen/surface upcast wrappers
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:37:09 +00:00
Emil Velikov
7af46b9c74 virgl: introduce and use virgl_transfer/texture/resource inline wrappers
The only two remaining cases of (struct virgl_resource *) require a
closer look. Either the error checking is missing or the arguments
provided feel wrong.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:37:09 +00:00
Emil Velikov
6b123fa07f virgl: add virgl_context/sampler_view/so_target() upcast wrappers
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:37:08 +00:00
Emil Velikov
1f43e4e1a3 winsys/virgl/drm: drop unneeded forward declaration
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:37:08 +00:00
Emil Velikov
e0056228f6 virgl: remove sw_winsys pointer from virgl_screen
The screen already has a pointer to the (base) winsys object.
With the latter of which implemented/sub-classed as either drm or sw
based one, depending on the target.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:37:08 +00:00
Emil Velikov
0c82c2fb0b virgl: rename virgl.h to virgl_screen.h
Provide a more meaningful name considering it's purpose.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:37:08 +00:00
Emil Velikov
87f7d61e19 virgl: move virgl_hw.h into the driver dir
Strictly speaking virgl_hw.h should reside in the driver folder, as
it describes the hardware. Moving it allows us to nuke the following
strange dependency

winsys/vtest > driver > winsys/drm

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:37:08 +00:00
Emil Velikov
014f8ef2ff virgl: straighten the includes confusion
Use the relevant GALLIUM_foo_CFLAGS which has all the requirements
(not to mention VISIBITY_CFLAGS) and keep ../ out of the include
directives.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:37:08 +00:00
Emil Velikov
2c705d2220 virgl: remove the _FILE_OFFSET_BITS defines
The build already sets it as needed.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:37:08 +00:00
Emil Velikov
a05648fd7e winsys/virgl/drm: add all files to the tarball
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:37:08 +00:00
Emil Velikov
8b9e69e2ea winsys/virgl/vtest: list all files in Makefile.sources
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:36:46 +00:00
Emil Velikov
73308ca802 virgl: move sources list to Makefile.sources
... and add the missing files while we're at it.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:33:11 +00:00
Emil Velikov
c1bf71f77c virgl: fix drm.h include path
The drm/ prefix is required, if using the kernel provided headers. As
most distros don't ship them it and we already depend on libdrm (which
adds the relevant -I flag) just drop the drm/ from the include.

Once a libdrm release with the virtgpu_drm.h header is released, we can
drop our local copy of the file.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:29:01 +00:00
Emil Velikov
60418a28ea i965: enable ARB_shader_clock on gen7+
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-30 17:23:18 +00:00
Emil Velikov
4379ca22f1 i965: Implement nir_intrinsic_shader_clock
v2:
 - Add a few const qualifiers for good measure.
 - Drop unneeded retype()s (Matt)
 - Convert timestamp to SIMD8/16, as fs_visitor::get_timestamp() returns
SIMD4 (Connor)

v3:
 - Remove unneeded temporary + MOV (Connor)

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-30 17:22:40 +00:00
Emil Velikov
6a15517242 i965/fs: move the fs_reg::smear() from get_timestamp() to the callers
We're about to reuse get_timestamp() for the nir_intrinsic_shader_clock.
In the latter the generalisation does not apply, so move the smear()
where needed. This also makes the function analogous to the vec4 one.

v2: Tweak the comment - The caller -> We (Matt, Connor).
v3: More comment tweaks (Connor)

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-30 17:22:36 +00:00
Emil Velikov
7682844f34 nir: add shader_clock intrinsic
v2: Add flags and inline comment/description.
v3: None of the input/outputs are variables
v4: Drop clockARB reference, relate code motion barrier comment wrt
intrinsic flag.
v5: Drop the "thus we can eliminate..." comment (Connor)

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-30 17:22:32 +00:00
Emil Velikov
f1d98fc90a glsl: add support for the clock2x32ARB function
v2: correctly set the return type

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-30 17:22:29 +00:00
Emil Velikov
51265c1b85 glsl: add ARB_shader_clock infrastructure
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-30 17:22:27 +00:00
Emil Velikov
e916d5e013 mesa: add infra for ARB_shader_clock
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-30 17:22:23 +00:00
Samuel Pitoiset
0d0329df8f nv50: do not create an invalid HW query type
While we are at it, store the rotate offset for occlusion queries to
nv50_hw_query like on nvc0.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
2015-10-30 17:57:15 +01:00
Samuel Pitoiset
5f1eeb799b nv50: move HW queries to nv50_query_hw.c/h files
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
2015-10-30 17:57:15 +01:00
Samuel Pitoiset
76b48ceee9 nv50: move nva0_so_target_save_offset() to its correct location
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
2015-10-30 17:57:15 +01:00
Samuel Pitoiset
2e3fe0379e nv50: add a header file for nv50_query
Like for nvc0, this will allow to split different types of queries and
to prepare the way for both global performance counters and MP counters.

While we are at it, make use of nv50_query struct instead of pipe_query.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-10-30 17:57:15 +01:00
Julien Isorce
e7ed3963ed st/va: add support to export a surface as dmabuf
I.e. implements:
VaAcquireBufferHandle
VaReleaseBufferHandle
for memory of type VA_SURFACE_ATTRIB_MEM_TYPE_DRM_PRIME

And apply relatives change to:
vlVaMapBuffer
vlVaUnMapBuffer
vlVaDestroyBuffer

Implementation inspired from cgit.freedesktop.org/vaapi/intel-driver

Tested with gstreamer-vaapi with nouveau driver.

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2015-10-30 13:21:20 +01:00
Julien Isorce
802ba6f865 st/va: implement VaDeriveImage
And apply relatives change to:
vlVaBufferSetNumElements
vlVaCreateBuffer
vlVaMapBuffer
vlVaUnmapBuffer
vlVaDestroyBuffer
vlVaPutImage

It is unfortunate that there is no proper va buffer type and struct
for this. Only possible to use VAImageBufferType which is normally
used for normal user data array.
On of the consequences is that it is only possible VaDeriveImage
is only useful on surfaces backed with contiguous planes.
Implementation inspired from cgit.freedesktop.org/vaapi/intel-driver

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2015-10-30 13:21:11 +01:00
Julien Isorce
5e763aaa21 st/va: add more errors checks in vlVaBufferSetNumElements and vlVaMapBuffer
Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2015-10-30 13:20:41 +01:00
Julien Isorce
86eb4131a9 st/va: add headless support, i.e. VA_DISPLAY_DRM
This patch allows to use gallium vaapi without requiring
a X server running for your second graphic card.

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2015-10-30 13:20:35 +01:00
Julien Isorce
1bdea0e579 st/va: handle Video Post Processing for configs
Add support for VA_PROFILE_NONE and VAEntrypointVideoProc
in the 4 following functions:

vlVaQueryConfigProfiles
vlVaQueryConfigEntrypoints
vlVaCreateConfig
vlVaQueryConfigAttributes

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2015-10-30 13:20:29 +01:00
Julien Isorce
0b868807e4 st/va: add colospace conversion through Video Post Processing
Add support for VPP in the following functions:
vlVaCreateContext
vlVaDestroyContext
vlVaBeginPicture
vlVaRenderPicture
vlVaEndPicture

Add support for VAProcFilterNone in:
vlVaQueryVideoProcFilters
vlVaQueryVideoProcFilterCaps
vlVaQueryVideoProcPipelineCaps

Add handleVAProcPipelineParameterBufferType helper.

One application is:
VASurfaceNV12 -> gstvaapipostproc -> VASurfaceRGBA

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2015-10-30 13:20:10 +01:00
Julien Isorce
05b6ce4209 st/va: implement dmabuf import for VaCreateSurfaces2
For now it is limited to RGBA, BGRA, RGBX, BGRX surfaces.

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2015-10-30 13:20:03 +01:00
Julien Isorce
adf1133118 st/va: implement VaCreateSurfaces2 and VaQuerySurfaceAttributes
Inspired from http://cgit.freedesktop.org/vaapi/intel-driver/
especially src/i965_drv_video.c::i965_CreateSurfaces2.

This patch is mainly to support gstreamer-vaapi and tools that uses
this newer libva API. The first advantage of using VaCreateSurfaces2
over existing VaCreateSurfaces, is that it is possible to select which
the pixel format for the surface. Indeed with the simple VaCreateSurfaces
function it is only possible to create a NV12 surface. It can be useful
to create a RGBA surface to use with video post processing.

The avaible pixel formats can be query with VaQuerySurfaceAttributes.

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2015-10-30 13:19:54 +01:00
Julien Isorce
d42029d2d9 st/va: do not destroy old buffer when new one failed
If formats are not the same vlVaPutImage re-creates the video
buffer with the right format. But if the creation of this new
video buffer fails then the surface looses its current buffer.
Let's just destroy the previous buffer on success.

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2015-10-30 13:19:47 +01:00
Julien Isorce
87109e5f88 st/va: properly defines VAImageFormat formats and improve VaCreateImage
Added PIPE_VIDEO_CHROMA_FORMAT_NONE in p_format.h
and return it by default in ChromaToPipe.

Renamed YCbCrToPipe to VaFourccToPipeFormat because it now
contains RGB.

Implemented PipeFormatToVaFourcc which will be used later in
VlVaDeriveImage.

Note that gstreamer-vaapi check all the VAImageFormat fields.

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2015-10-30 13:05:23 +01:00
Samuel Iglesias Gonsalvez
7b8cc37585 main: fix basename match's check if it's an array or struct
Commit 4565b6f did not update the basename match's check for
the case that string would exactly match the name of the
variable if the suffix "[0]" were appended to it.

Fixes two dEQP-GLES31 tests:

dEQP-GLES31.functional.program_interface_query.shader_storage_block.resource_list.block_array
dEQP-GLES31.functional.program_interface_query.shader_storage_block.resource_list.block_array_single_element

v2:
- Change the position of rname_has_array_index_zero to avoid an out-of-bounds
  read. Reported by Tapani Pälli.

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-10-30 08:12:53 +01:00
Kristian Høgsberg
f7f1bc6cca i965: Fix invalid memory accesses after resizing brw_codegen's store table
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-30 07:49:10 +01:00
Connor Abbott
73caa26e43 i965/sched: use liveness analysis for computing register pressure
Previously, we were using some heuristics to try and detect when a write
was about to begin a live range, or when a read was about to end a live
range. We never used the liveness analysis information used by the
register allocator, though, which meant that the scheduler's and the
allocator's ideas of when a live range began and ended were different.
Not only did this make our estimate of the register pressure benefit of
scheduling an instruction wrong in some cases, but it was preventing us
from knowing the actual register pressure when scheduling each
instruction, which we want to have in order to switch to register
pressure scheduling only when the register pressure is too high.

This commit rewrites the register pressure tracking code to use the same
model as our register allocator currently uses. We use the results of
liveness analysis, as well as the compute_payload_ranges() function that
we split out in the last commit. This means that we compute live ranges
twice on each round through the register allocator, although we could
speed it up by only recomputing the ranges and not the live in/live out
sets after scheduling, since we only shuffle around instructions within
a single basic block when we schedule.

Shader-db results on bdw:

total instructions in shared programs: 7130187 -> 7129880 (-0.00%)
instructions in affected programs: 1744 -> 1437 (-17.60%)
helped: 1
HURT: 1

total cycles in shared programs: 172535126 -> 172473226 (-0.04%)
cycles in affected programs: 11338636 -> 11276736 (-0.55%)
helped: 876
HURT: 873

LOST:   8
GAINED: 0

v2: use regs_read() in more places.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-10-30 02:19:43 -04:00
Connor Abbott
c1860299b8 i965/fs: split out calculation of payload live ranges
We'll need this for the scheduler too, since it wants to know when the
live ranges of payload registers end in order to model them in our
register pressure calculations.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-10-30 02:19:33 -04:00
Connor Abbott
45cd76e342 i965: dump scheduling cycle estimates
The heuristic we're using is rather lame, since it assumes everything is
non-uniform and loops execute 10 times, but it should be enough for
measuring improvements in the scheduler that don't result in a change in
the number of instructions.

v2:
- Switch loops and cycle counts to be compatible with older shader-db.
- Make loop heuristic 10x to match with spilling code.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-10-30 02:19:24 -04:00
Connor Abbott
486268bdb0 i965: always run the post-RA scheduler
Before, we would only do scheduling after register allocation if we
spilled, despite the fact that the pre-RA scheduler was only supposed to
be for register pressure and set the latencies of every instruction to
1. This meant that unless we spilled, which we rarely do, then we never
considered instruction latencies at all, and we usually never bothered
to try and hide texture fetch latency. Although a later commit removes
the setting the latency to 1 part, we still want to always run the
post-RA scheduler since it's able to take the false dependencies that
the register allocator creates into account, and it can be more
aggressive than the pre-RA scheduler since it doesn't have to worry
about register pressure at all.

Test                   master      post-ra-sched     diff       %diff
bench_OglPSBump2       396.730     402.386           5.656      +1.400%
bench_OglPSBump8       244.370     247.591           3.221      +1.300%
bench_OglPSPhong       241.117     242.002           0.885      +0.300%
bench_OglPSPom         59.555      59.725            0.170      +0.200%
bench_OglShMapPcf      86.149      102.346           16.197     +18.800%
bench_OglVSTangent     388.849     395.489           6.640      +1.700%
bench_trex             65.471      65.862            0.390      +0.500%
bench_trexoff          69.562      70.150            0.588      +0.800%
bench_heaven           25.179      25.254            0.074      +0.200%

Reviewed-by: Jason Ekstrand <jasoan.ekstrand@intel.com>
2015-10-30 02:19:00 -04:00
Connor Abbott
85fce2d2f5 i965/sched: write-after-read dependencies are free
Although write-after-write dependencies have the same latency as
read-after-write dependencies due to how the register scoreboard works,
write-after-read dependencies aren't checked by the EU at all, so
they're purely a constraint on how the scheduler can order the
instructions.

v2: fix accumulator dependencies too.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-10-30 02:18:56 -04:00
Connor Abbott
6f231fddff i965: fix cycle estimates when there's a pipeline stall
The issue time for an instruction is how many cycles it takes to
actually put it into the pipeline. If there's a pipeline stall that
causes the instruction to be delayed, we should first take that into
account to figure out when the instruction would start executing and
*then* add the issue time. The old code had it backwards, and so we
would underestimate the total time whenever we thought there would be a
pipeline stall by up to the issue time of the instruction.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-10-30 02:18:53 -04:00
Eric Anholt
04c42f3ab5 vc4: Allow user index buffers, to avoid slow readback for shadow IBs.
Improves low-settings openarena performance by 31.9975% +/- 0.659931%
(n=7).
2015-10-29 22:58:01 -07:00
Jason Ekstrand
3883728730 anv: Add better push constant support
What we had before was kind of a hack where we made certain untrue
assumptions about the incoming data.  This new support, while it still
doesn't support indirects properly (that will come), at least pulls the
offsets and strides from SPIR-V like it's supposed to.
2015-10-29 22:26:36 -07:00
Jason Ekstrand
1f2624e6dd nir/spirv: Add support for push constants 2015-10-29 22:26:00 -07:00
Jason Ekstrand
a2283508b0 nir/intrinsics: Add a load_push_constant intrinsic 2015-10-29 22:26:00 -07:00
Jason Ekstrand
f2a8c9db24 nir/spirv: Rework the way we handle interface types 2015-10-29 22:26:00 -07:00
Ilia Mirkin
06fa2e864a nv50: mark contexts shareable, compile at creation time
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-29 23:25:08 -04:00
Ilia Mirkin
f768eaa87d nv50: allow per-sample interpolation to be forced via rast
Uses the same technique as for nvc0 of fixups before upload, and
evicting in case of state change. Removes one source of variants kept by
st/mesa.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-29 22:42:38 -04:00
Matt Turner
85ee2f7fcf i965: Add INTEL_DEBUG=nocompact to disable instruction compaction.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-29 17:51:16 -07:00