Commit graph

32650 commits

Author SHA1 Message Date
Eric Anholt
3e3772c1b3 broadcom/vc4: Fix release build
I remember thinking "gosh, it would be nice if I could do a kernel-style
'if (!IS_ENABLED(DEBUG))' instead of using an #ifdef, so the code was
compiled on both builds", and then forgot to test a release build anyway.

Fixes: a8fd58eae5 ("vc4: Add labels to BOs for debug builds or with VC4_DEBUG=surf set.")
Reported-by: Derek Foreman <derekf@osg.samsung.com>
2017-09-27 13:03:14 -07:00
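
The "kernel-style" pattern mentioned in 3e3772c1b3 can be sketched as follows. This is only an illustration, not the vc4 code: `SKETCH_DEBUG_BUILD`, `struct sketch_bo` and `sketch_bo_label()` are hypothetical names. The point is that a constant tested in an ordinary `if` keeps both branches visible to the compiler on every build, whereas code under `#ifdef` is never compiled on the other configuration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical build flag: 1 for debug builds, 0 for release builds. */
#ifndef SKETCH_DEBUG_BUILD
#define SKETCH_DEBUG_BUILD 0
#endif

/* Hypothetical stand-in for a buffer object that can carry a debug label. */
struct sketch_bo {
    const char *name;
};

/*
 * Label the BO only on debug builds or when the runtime debug flag is set.
 * Because SKETCH_DEBUG_BUILD is tested with a plain 'if', this function is
 * parsed and type-checked on release builds too; the optimizer can drop the
 * dead branch.
 */
static bool
sketch_bo_label(struct sketch_bo *bo, const char *name, bool debug_surf)
{
    assert(bo != NULL);

    if (!SKETCH_DEBUG_BUILD && !debug_surf)
        return false;   /* release build without the runtime flag: skip */

    bo->name = name;
    return true;
}
```

With this shape, a mistake inside the labeling path breaks the release build at the same time as the debug build, which is the property the commit message wishes for.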
Eric Anholt
a8fd58eae5 vc4: Add labels to BOs for debug builds or with VC4_DEBUG=surf set.
This has proven to be incredibly useful for debugging CMA allocation
failures and driving memory management improvements.  However, we don't
want to burden entry and exit from the BO cache with the labeling ioctl's
overhead on release builds.
2017-09-27 10:21:49 -07:00
Marek Olšák
a65db0ad1c st/dri: don't expose modifiers in EGL if the driver doesn't implement them
This unbreaks waffle/gbm (piglit/gbm) which fails initialization.

v2: also don't set queryDmaBufFormats

Reviewed-by: Daniel Stone <daniel@fooishbar.org>
2017-09-27 17:59:50 +02:00
Jan Vesely
f67ceeffd4 clover: Query and export int64 atomics
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-09-27 11:13:22 -04:00
Marek Olšák
f70f6baaa3 gallium/radeon: consolidate PIPE_BIND_SHARED/SCANOUT handling
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-09-27 10:38:46 +02:00
Samuel Pitoiset
3ab0cff32c radeonsi: remove useless check in si_blit_decompress_color()
It's unnecessary to double-check that dcc_offset is not 0
because all callers already check that.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-09-27 09:31:24 +02:00
Samuel Pitoiset
eba2abf54b gallium/radeon: more use of vi_dcc_formats_are_incompatible()
Found by inspection.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-09-27 09:31:24 +02:00
George Kyriazis
e927cb55a9 swr: Remove unneeded comparison
There is no need to check whether screen->pipe != pipe, so just assign it.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-09-26 18:09:19 -05:00
George Kyriazis
b9aa0fa7d6 swr: Handle resource across context changes
Swr caches fb contents in tiles.  Those tiles are stored on a per-context
basis.

When switching contexts that share resources we need to make sure that
the tiles of the old context are being stored and the tiles of the new
context are being invalidated (marked as invalid, hence contents need
to be reloaded).

The context does not get any dirty bits to identify this case.  This has
to be, then, coordinated by the resources that are being shared between
the contexts.

Add a "curr_pipe" hook in swr_resource that will allow us to identify a
MakeCurrent of the above form during swr_update_derived().  At that time,
we invalidate the tiles of the new context.  The old context will need to
have already stored its tiles by that time, which happens during glFlush().
glFlush() is called at the beginning of MakeCurrent.

So, the sequence of operations is:
- At the beginning of glXMakeCurrent(), glFlush() will store the tiles
  of all bound surfaces of the old context.
- After the store, a fence will guarantee that all tile stores make
  it to the surface.
- During swr_update_derived(), when we validate the new context, we check
  all resources to see what changed, and if so, we invalidate the
  current tiles.

Fixes rendering problems with CEI/Ensight.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-09-26 18:09:15 -05:00
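
The "curr_pipe" idea from b9aa0fa7d6 can be sketched as below. This is a simplified illustration under assumptions, not the swr implementation: `struct sketch_context`, `struct sketch_resource` and `sketch_resource_changed_context()` are hypothetical names, and only the `curr_pipe` field name comes from the commit message. Each shared resource remembers the last context that validated it, so a mismatch at validation time identifies a MakeCurrent to a different context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a rendering context. */
struct sketch_context {
    int id;
};

/* Hypothetical stand-in for a resource shared between contexts. */
struct sketch_resource {
    struct sketch_context *curr_pipe;  /* last context that validated this resource */
};

/*
 * Called while validating 'ctx'.  Returns true when the resource was last
 * used by a different context (or by none), meaning the tiles cached by
 * 'ctx' for it must be invalidated and reloaded.  The resource is then
 * recorded as belonging to 'ctx'.
 */
static bool
sketch_resource_changed_context(struct sketch_resource *res,
                                struct sketch_context *ctx)
{
    assert(res != NULL && ctx != NULL);

    if (res->curr_pipe == ctx)
        return false;       /* same context as before: cached tiles stay valid */

    res->curr_pipe = ctx;   /* remember the new owner */
    return true;            /* caller invalidates its tiles */
}
```

In this sketch a resource that has never been validated also reports a change, which errs on the side of reloading tiles.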
Eric Anholt
6cc59de9cd gallium: Weaken assertion about u_mm's align2 field.
vc5 MMU mappings are access-controlled at a 128kb boundary, so the 4kb
here was too small for that purpose.  Allowing any valid align2 value that
u_mm's 32-bit addressing can represent will still catch most cases of
people passing in a byte alignment.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-09-26 14:50:29 -07:00
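
To illustrate the reasoning in 6cc59de9cd: align2 is the log2 of the byte alignment, so 4kb is align2 = 12 and 128kb is align2 = 17. The sketch below is an assumption about what such a check could look like, not the u_mm code; `sketch_align2_is_valid()` and `sketch_align_up()` are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * align2 is a power-of-two exponent.  Any exponent that a 32-bit address
 * can represent is accepted; a caller that passes a byte alignment such as
 * 4096 by mistake is rejected because 4096 >= 32.
 */
static bool
sketch_align2_is_valid(int align2)
{
    return align2 >= 0 && align2 < 32;
}

/* Round 'offset' up to the next multiple of (1 << align2). */
static uint32_t
sketch_align_up(uint32_t offset, int align2)
{
    assert(sketch_align2_is_valid(align2));

    uint32_t mask = ((uint32_t)1 << align2) - 1;
    return (offset + mask) & ~mask;
}
```

A byte alignment below 32 (for example 16) would still slip through such a check, which matches the commit's wording that it catches most, not all, cases.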
Boris Brezillon
ef578906d8 broadcom/vc4: Fix infinite retry in vc4_bo_alloc()
cleared_and_retried is always reset to false when jumping to the retry
label, thus leading to an infinite retry loop.

Fix that by moving the cleared_and_retried variable definition to the
beginning of the function.  While we're at it, move the create variable
with the other local variables and explicitly reset its content in the
retry path.

Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Fixes: 78087676c9 "vc4: Restructure the simulator mode."
2017-09-26 14:49:48 -07:00
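
The bug class fixed by ef578906d8 can be reproduced in a few lines. The sketch below is not vc4_bo_alloc(); `struct sketch_cache`, `sketch_try_alloc()` and `sketch_alloc_with_retry()` are hypothetical. It shows the corrected shape: the flag is defined before the `retry` label, so jumping back does not reset it and the function retries at most once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical allocator state: allocation only succeeds once the cache is cleared. */
struct sketch_cache {
    bool cleared;
    int attempts;
};

static bool
sketch_try_alloc(struct sketch_cache *c, bool can_succeed)
{
    c->attempts++;
    return can_succeed && c->cleared;
}

static bool
sketch_alloc_with_retry(struct sketch_cache *c, bool can_succeed)
{
    /* Defined before the label: 'goto retry' does not run this initializer again. */
    bool cleared_and_retried = false;

    assert(c != NULL);

retry:
    if (sketch_try_alloc(c, can_succeed))
        return true;

    if (!cleared_and_retried) {
        c->cleared = true;              /* free cached memory, then try once more */
        cleared_and_retried = true;
        goto retry;
    }

    return false;                       /* second failure: give up */
}
```

If the `cleared_and_retried = false` initialization sat after the label instead, every jump would reset it and a persistent failure would loop forever, which is the behavior the commit describes.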
Eric Anholt
68c91a87d7 broadcom/vc4: Keep pipe_sampler_view->texture matching the original texture.
I was overwriting view->texture with the shadow resource when we need to
do shadow copies (retiling or baselevel rebase), but that tripped up some
critical new sanity checking in state_tracker (making sure that stObj->pt
hasn't changed from view->texture through TexImage-related paths).

To avoid that, move the shadow resource to the vc4_sampler_view struct.

Fixes: f0ecd36ef8 ("st/mesa: add an entirely separate codepath for setting up buffer views")
2017-09-26 14:49:43 -07:00
Brian Paul
8822ea100c svga: silence unused var warning in optimized build with MAYBE_UNUSED
Trivial
2017-09-26 09:51:43 -06:00
Marek Olšák
06bfb2d28f r600: fork and import gallium/radeon
This marks the end of code sharing between r600 and radeonsi.
It's getting difficult to work on radeonsi without breaking r600.

A lot of functions had to be renamed to prevent linker conflicts.
There are also minor cleanups.

Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-09-26 04:21:14 +02:00
Tim Rowley
5a2bca5db5 swr/rast: Handle instanceID offset / Instance Stride enable
Supported in JitGatherVertices(); FetchJit::JitLoadVertices() may require
similar changes, which will need to be addressed if it is determined that
this path is still in use.

Handle Force Sequential Access in FetchJit::Create.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-09-25 13:38:57 -05:00
Tim Rowley
68d8dd1fb5 swr/rast: Remove code supporting legacy llvm (<3.9)
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-09-25 13:38:57 -05:00
Tim Rowley
9c468c775b swr/rast: Fix allocation of DS output data for USE_SIMD16_FRONTEND
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-09-25 13:38:57 -05:00
Tim Rowley
d18c2a1fa4 swr/rast: Slightly more efficient blend jit
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-09-25 13:38:57 -05:00
Tim Rowley
5033d49d5d swr/rast: Properly sized null GS buffer
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-09-25 13:38:57 -05:00
Tim Rowley
9c82cf0f1e swr/rast: Move SWR_GS_CONTEXT from thread local storage to stack
Move structure, as the size is significantly reduced due to dynamic
allocation of the GS buffers.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-09-25 13:38:57 -05:00
Tim Rowley
efe7fa4384 swr/rast: Fetch compile state changes
Add ForceSequentialAccessEnable and InstanceIDOffsetEnable bools to
FETCH_COMPILE_STATE.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-09-25 13:38:57 -05:00
Tim Rowley
cd6e91d3a2 swr/rast: New GS state/context API
One piglit regression, which was a false pass:
  spec@glsl-1.50@execution@geometry@dynamic_input_array_index

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-09-25 13:38:57 -05:00
Tim Rowley
41565ddf7a swr/rast: Add support for R10G10B10_FLOAT_A2_UNORM pixel format
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-09-25 13:38:57 -05:00
Leo Liu
f3ed1d2f6b st/va/postproc: implement the DRM prime grabber
Acked-by: Christian König <christian.koenig@amd.com>
2017-09-25 09:42:14 -04:00
Leo Liu
b47bdf55dc vl/compositor: convert RGB buffer to YUV with color conversion
Acked-by: Christian König <christian.koenig@amd.com>
2017-09-25 09:42:14 -04:00
Leo Liu
737d13637d vl/csc: add a RGB to YUV CSC matrix
Acked-by: Christian König <christian.koenig@amd.com>
2017-09-25 09:42:14 -04:00
Leo Liu
a2ebe57992 vl/compositor: create RGB to YUV fragment shader
Acked-by: Christian König <christian.koenig@amd.com>
2017-09-25 09:42:14 -04:00
Leo Liu
169c077d1d st/va/postproc: use progressive target buffer for scaling
Scaling between interlaced buffers is inaccurate, especially when scaling
up, because blit scales the top field and the bottom field separately, so
weaving the resulting buffers loses accuracy.
So use the shader deint for this case.

Acked-by: Christian König <christian.koenig@amd.com>
2017-09-25 09:42:14 -04:00
Leo Liu
1d1299f8a4 st/va: make internal func vlVaHandleSurfaceAllocate() call simpler
Acked-by: Christian König <christian.koenig@amd.com>
2017-09-25 09:42:14 -04:00
Leo Liu
96f89f440b st/va/postproc: add a full NV12 deint support from buffer I to P
Before, it was impossible to transcode an interlaced video, because
for the encoder to work we have to force the buffer to progressive,
but the deint from an I buffer to a P buffer was missing. Now, with
the new full YUV deint function, it works with weave and bob deint.

Also this will benefit transcoding video with scaling parameters.

Acked-by: Christian König <christian.koenig@amd.com>
2017-09-25 09:42:14 -04:00
Leo Liu
4f9e7b1279 vl/compositor: add Bob top and bottom to YUV deint function
Acked-by: Christian König <christian.koenig@amd.com>
2017-09-25 09:42:14 -04:00
Leo Liu
9484852cdb vl/compositor: remove vl_compositor_yuv_deint() function
No longer used.

Acked-by: Christian König <christian.koenig@amd.com>
2017-09-25 09:42:13 -04:00
Leo Liu
3ad8687295 st/va: use new vl_compositor_yuv_deint_full() to deint
We also set the src rectangle explicitly in case the sizes of the
interlaced buffer and the progressive buffer do not match.

Acked-by: Christian König <christian.koenig@amd.com>
2017-09-25 09:42:13 -04:00
Leo Liu
db28fdc0ad st/omx: use new vl_compositor_yuv_deint_full() to deint
v2: add dst rect to make sure no scale

Acked-by: Christian König <christian.koenig@amd.com>
2017-09-25 09:42:13 -04:00
Leo Liu
001358a97c vl/compositor: add a new function for YUV deint
It will replace the previous deint function and adds scaling and
field deinterlacing.

Acked-by: Christian König <christian.koenig@amd.com>
2017-09-25 09:42:13 -04:00
Leo Liu
abd05a6cc4 vl/compositor: extend YUV deint function to do field deint
It will add Bob deint support for interlaced video to the HW encoder path.

Acked-by: Christian König <christian.koenig@amd.com>
2017-09-25 09:42:13 -04:00
Leo Liu
4ef0828946 vl/compositor: separate YUV part from shader video buffer function
So that it can be re-used

Acked-by: Christian König <christian.koenig@amd.com>
2017-09-25 09:42:13 -04:00
Leo Liu
eb51838771 st/va/postproc: use video original size for postprocessing
Otherwise the aligned size will cause the video to be scaled.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-09-25 09:42:13 -04:00
Eric Engestrom
eb2efbba78 scons: use python3-compatible generator
These changes were generated using python's `2to3` tool.

Suggested-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-09-25 12:05:47 +01:00
Eric Engestrom
7d48219b3a scons: use python3-compatible print()
These changes were generated using python's `2to3` tool.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102852
Reported-by: Alex Granni <liviuprodea@yahoo.com>
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2017-09-25 11:57:12 +01:00
Wladimir J. van der Laan
3f7093bed2 etnaviv: Add missing includes after 6ace0b8
Add missing includes after 6ace0b8 (etnaviv: don't enable RT
full-overwrite when logicop is enabled), otherwise the etnaviv driver
won't build because of missing macros.

Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Tested-by: Andres Gomez <agomez@igalia.com>
2017-09-22 20:49:03 +02:00
Lucas Stach
e9d37d68cf etnaviv: fix 16bpp clears
util_pack_color may leave undefined values in the upper half of the packed
integer. As our hardware needs the upper 16 bits to mirror the lower 16 bits,
this breaks clears of those formats if the undefined values aren't masked off.

I've only observed the issue with R5G6B5_UNORM surfaces, other 16bpp
formats seem to work fine.

Fixes: d6aa2ba2b2 (etnaviv: replace translate_clear_color with util_pack_color)
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-09-22 20:48:32 +02:00
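
The masking described in e9d37d68cf can be sketched as below. This is an illustration under assumptions, not the etnaviv code: `sketch_pack_r5g6b5()` uses a hypothetical channel layout and `sketch_clear_value_16bpp()` is a hypothetical helper. It discards whatever the packer left in the upper half and mirrors the lower 16 bits there.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical 5:6:5 packer (red in the top bits).  Only the low 16 bits
 * of the result are meaningful.
 */
static uint32_t
sketch_pack_r5g6b5(uint32_t r, uint32_t g, uint32_t b)
{
    assert(r < 32 && g < 64 && b < 32);
    return (r << 11) | (g << 5) | b;
}

/*
 * Build a 32-bit clear value for a 16bpp surface: mask off the possibly
 * undefined upper half, then replicate the lower 16 bits into it.
 */
static uint32_t
sketch_clear_value_16bpp(uint32_t packed)
{
    uint32_t lo = packed & 0xffffu;
    return (lo << 16) | lo;
}
```

For example, an input of 0xdead1234 yields 0x12341234: the stale upper half is dropped rather than passed on to the hardware.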
Tim Rowley
066d1dc951 swr/rast: remove llvm fence/atomics from generated files
We currently don't use these instructions, and since their API
changed in llvm-5.0 having them in the autogen files broke the mesa
release tarballs which ship with generated autogen files.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102847
CC: mesa-stable@lists.freedesktop.org
Tested-by: Laurent Carlier <lordheavym@gmail.com>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-09-22 11:38:57 -05:00
Lucas Stach
6ace0b8bc8 etnaviv: don't enable RT full-overwrite when logicop is enabled
Logicop is a form of blending with the framebuffer, so we must allow
framebuffer reads when logicop is enabled.

Fixes: piglit gl-1.0-logicop on GC3000, which has logicop support

Signed-off-by: Lucas Stach <dev@lynxeye.de>
2017-09-22 12:30:42 +02:00
Thomas Helland
030f4ecf74 gallium/util: Remove unused keymap
This is not used anywhere in the codebase. It's a hashtable
implementation that is based around cso_hash, and is therefore
(and as mentioned in a comment in the source) quite similar to
u_hash_table.

CC: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-21 20:42:38 +02:00
Jan Vesely
9c87150618 gallium: Add PIPE_SHADER_CAP_INT64_ATOMICS
Denotes availability of 64-bit int atomic instructions.

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-21 11:18:17 -04:00
Jan Vesely
3a5b69c09b clover: Wait for requested operation if blocking flag is set
v2: wait in map_buffer and map_image as well
v3: use event::wait instead of wait (skips fence wait for hard_event)
v4: use wait_signalled()

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Aaron Watry <awatry@gmail.com>
2017-09-20 18:48:46 -04:00
Francisco Jerez
bc4000ee40 clover: Run the associated action before an event is signalled.
And define a method for other threads to wait until the action
function associated with an event has been executed to completion.

For hard events, this will mean waiting until the corresponding
command has been submitted to the pipe driver, without necessarily
flushing the pipe_context and waiting for the actual command to be
processed by the GPU (which is what hard_event::wait() already does).

This weaker kind of event wait will allow implementing blocking memory
transfers efficiently.

Acked-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
2017-09-20 18:48:41 -04:00
Francisco Jerez
02f8ac6b70 clover: Wrap event::wait_count in a method taking care of the required locking.
Acked-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
2017-09-20 18:48:28 -04:00
Roland Scheidegger
886626960b llvmpipe, gallivm: implement lod queries (LODQ opcode)
This uses all the existing code to calculate lod values for mip linear
filtering. Though we'll have to disable the simplifications (if we know some
parts of the lod calculation won't actually matter for filtering purposes due
to mip clamps etc.). For better or worse, we'll also disable lod calculation
hacks (mostly should make a difference for cube maps) always - the issue with
per-pixel lod being difficult is mostly because we then have different mipmaps
needed for the actual texel fetch, which isn't a problem with lodq.
We still use approximation for the log2 - for that reason I believe the float
part of the lod is only accurate to about 4-5 bits (and one bit less with 1d
textures actually) which is hopefully good enough (though d3d10 technically
requires 6 bits - could use quadratic interpolation instead of linear to get
8 bits or so).
Since lodq requires unclamped lod, we also have to move some sampler key
calculations to texture sampling code - even if we know we're going to access
mipmap 0 we still have to calculate lod and apply lod_bias for lodq.

Passes piglit ARB_texture_query_lod tests (after having fixed the test).

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2017-09-20 21:18:54 +02:00
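
The accuracy remark in 886626960b about the log2 approximation can be made concrete with a small sketch. The code below is not the gallivm implementation; `sketch_fast_log2()` is a hypothetical scalar helper that assumes IEEE 754 single precision and a positive, normal input. It takes the integer part of log2 from the float's exponent field and approximates the fractional part linearly by the mantissa, which is exact at powers of two and least accurate in between.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Linear log2 approximation: log2(x) ~= exponent + mantissa fraction.
 * Assumes x is a positive, normal IEEE 754 binary32 value.
 */
static float
sketch_fast_log2(float x)
{
    uint32_t bits;

    assert(x > 0.0f);
    memcpy(&bits, &x, sizeof bits);     /* reinterpret the float's bit pattern */

    int exponent = (int)((bits >> 23) & 0xffu) - 127;          /* unbiased exponent */
    float mantissa = (float)(bits & 0x7fffffu) / 8388608.0f;   /* fraction in [0, 1) */

    return (float)exponent + mantissa;
}
```

For x = 1.5 this returns 0.5 while the true log2 is about 0.585, which shows why only the leading few fractional bits of a linearly interpolated lod can be trusted, and why the message suggests quadratic interpolation if more bits are needed.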