Commit graph

25808 commits

Author SHA1 Message Date
Christian König
eaf7ec9cfc st/va: add motion adaptive deinterlacing v2
v2: minor cleanup

Signed-off-by: Christian König <christian.koenig@amd.com>
2016-01-18 10:59:32 +01:00
Michel Dänzer
ad20be1f30 gallium/radeon: Rename do_invalidate_resource to invalidate_buffer
And only call it from r600_invalidate_resource for buffer resources.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-18 17:39:37 +09:00
Michel Dänzer
0491dd1deb st/dri: Don't call invalidate_resource for NULL depth/stencil buffers
Fixes crash in 4 EGL piglit tests with radeonsi.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-18 17:39:37 +09:00
Michel Dänzer
a9ab7172a6 radeonsi: Avoid warning about LLVM generating R_0286D0_SPI_PS_INPUT_ADDR
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2016-01-18 17:39:37 +09:00
Michel Dänzer
4297259fc8 radeonsi: Print "LLVM emitted unknown config register" warning only once
Say "LLVM" instead of "Compiler" for clarity.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-18 17:39:37 +09:00
Oded Gabbay
679a654a77 llvmpipe: use vpkswss when dst is signed
This patch fixes a bug when building a pack instruction.

For POWER (altivec), in case the destination is signed and the
src width is 32, we need to use vpkswss. The original code used vpkuwus,
which emits an unsigned result.

This fixes the following piglit tests on ppc64le:
- spec@arb_color_buffer_float@gl_rgba8-drawpixels
- shaders@glsl-fs-fogscale

I've also corrected some coding style issues in the function.

v2: Returned else statements to vmware style

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-01-18 09:45:25 +02:00
Ilia Mirkin
4ac1274caa gm107/ir: don't do indirect frag shader inputs on GM107
Apparently the IPA op decided to stop working with offsets. Need to
figure out if we need to do an AL2P situation or something similar. For
now just turn it back off.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-17 16:37:04 -05:00
Ilia Mirkin
3281ae96c8 tgsi: initialize Atomic field in tgsi_default_declaration
Spotted by Coverity.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-17 16:37:04 -05:00
Ilia Mirkin
5a81b48ad0 nvc0: bsp_bo can't be null
We already deref it earlier. And these are all allocated on load.
Spotted by Coverity.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-17 16:37:04 -05:00
Oded Gabbay
529aa8249a llvmpipe: fix arguments order given to vec_andc
This patch fixes a classic "confuse the enemy" bug.

_mm_andnot_si128 (SSE) and vec_andc (VMX) do the same operation, but the
arguments are opposite.

_mm_andnot_si128 performs "r = (~a) & b" while
vec_andc performs "r = a & (~b)"

To make sure this error won't return in another place, I added a wrapper
function, vec_andnot_si128, in u_pwr8.h, which makes the swap inside.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-01-17 21:07:27 +02:00
Rob Clark
02ac91d717 freedreno/ir3: fix mad 3rd src delay calc
In fad158a0 ("freedreno/ir3: array rework") the src # (n) shifted by
one, but missed updating delay-slot calc.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-17 12:21:45 -05:00
Rob Clark
2a6ec1e061 freedreno/ir3: better array register allocation
Detect arrays which don't conflict with each other and allow overlapping
register allocation.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-16 14:23:52 -05:00
Rob Clark
6a33c5c0df freedreno/ir3: array offset can be negative
It at least happens with some piglit tests, like
$piglit/bin/vp-address-01

  VERT
  DCL IN[0]
  DCL IN[1]
  DCL OUT[0], POSITION
  DCL OUT[1], COLOR
  DCL CONST[0..7]
  DCL ADDR[0]
    0: ARL ADDR[0].x, IN[1].xxxx
    1: MOV_SAT OUT[1], CONST[ADDR[0].x-1]
    2: DP4 OUT[0].x, CONST[4], IN[0]
    3: DP4 OUT[0].y, CONST[5], IN[0]
    4: DP4 OUT[0].z, CONST[6], IN[0]
    5: DP4 OUT[0].w, CONST[7], IN[0]
    6: END

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-16 14:23:20 -05:00
Rob Clark
ddede497b8 freedreno/ir3: workaround bug/feature
Seems like in certain cases, we cannot use c<a0.x+0> as the third src to
cat3 instructions.  This may be slightly conservative, we may only have
this restriction when the first src is also const.

This fixes, for example, +24/-0 of the variable-indexing piglit tests.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-16 14:22:43 -05:00
Rob Clark
ebd3a1fc17 ttn: use writemask for store_var
Only user is freedreno, and after array-rework it can cope.  Avoids
generating loads for a store.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-16 14:21:52 -05:00
Rob Clark
fad158a0e0 freedreno/ir3: array rework
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-16 14:21:08 -05:00
Rob Clark
cc7ed34df9 freedreno/ir3: refactor/simplify cp
If we handle separately the special case of eliminating output mov
(which includes keeps and various other cases where we don't have a
consuming instruction's src register to collapse things into), we
can simplify the logic.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-16 14:20:46 -05:00
Rob Clark
680664dff9 freedreno/ir3: fix incorrect decoding of mov instructions
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-16 14:20:37 -05:00
Rob Clark
2809c87f90 freedreno/ir3: remove unused tgsi tokens ptr
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-16 14:18:59 -05:00
Rob Clark
fc0d2f7e02 freedreno/ir3: bit of ra refactor
Shuffle things slightly, passing instr-data to ra_name() to reduce the
number of places where we need to add support for array names.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-16 14:18:47 -05:00
Rob Clark
d430f443de freedreno/ir3: cosmetic de-indent
Collapse two nested if's into one to reduce indent level.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-16 14:18:33 -05:00
Rob Clark
6f0377d651 ttn: add missing writemask on store_output
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-01-16 13:35:44 -05:00
Ilia Mirkin
32a9fe013b nv50/ir: add saturate support on ex2
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-16 00:10:56 -05:00
Jeff Muizelaar
e5fefe49f2 gallivm: avoid crashing in mod by 0 with llvmpipe
This adds code that is basically the same as the code in umod, udiv and idiv.
However, unlike idiv we return -1.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-01-16 03:36:29 +01:00
Roland Scheidegger
03f66dfb4b llvmpipe: ditch additional ref counting for vertex/geometry sampler views
The cleaning up was quite a performance hog (making pipe_resource_reference
the number two in profilers on the vertex path, and 3rd overall, with its
cousin pipe_reference_described not far behind) if there were lots
of tiny draw calls (ipers). Now the reason was really that it was blindly
calling this for all potential shader views (so 32 each for vs and gs) even
though the app never touched a single one which could have been fixed,
however I can't come up with a good reason why we refcount these. We've got
references, of course, in the sampler views, which should be quite sufficient
as we do all vertex and geometry shader execution fully synchronous.
(Calling prepare_shader_sampling for all draw calls even if there were no
changes looks quite suboptimal too, but generally we don't really expect vs/gs
shader sampling to be used much with llvmpipe, and there's even an early exit
if there aren't any views to avoid the "null loop" albeit it's now no longer
always trying to loop through all 32 slots. Maybe improve another time...).
Of course, if we manage to make vertex loads run asynchronously some day,
we need references again, but adding that back would be the least of the
problems...
Also only set LP_NEW_SAMPLER_VIEW for fragment sampler views. Nothing on the
vertex side depends on it (I suppose we'd really wanted a separate flag in
any case).
(Good for a 3% improvement or so in ipers under the right conditions.)

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-01-15 20:13:45 +01:00
Roland Scheidegger
2f9a325b6a llvmpipe: fix "leaking" textures
This was not really a leak per se, but we were referencing the textures for
longer than intended. If textures were set via llvmpipe_set_sampler_views()
(for fs) and then picked up by lp_setup_set_fragment_sampler_views(), they
were referenced in the setup state. However, the only way to unreference them
was by replacing them with another texture, and not when the texture slot
was replaced with a NULL sampler view. (They were then further also referenced
by the scene too which might have additional minor side effects as we limit
the memory size which is allowed to be referenced by a scene in a rather crude
way.) Only setup destruction (at context destruction time) then finally would
get rid of the references.
Fix this by noting the number of textures the last time, and unreference
things if the new view is NULL (avoiding having to unreference things
always up to PIPE_MAX_SHADER_SAMPLER_VIEWS which would also have worked).
Found by code inspection, no test...

v2: rename var

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-01-15 20:13:45 +01:00
Ilia Mirkin
fffb559129 nv50/ir: rebase indirect temp arrays to 0, so that we use less lmem space
Reduces local memory usage in a lot of Metro 2033 Redux and a few KSP
shaders:

total local used in shared programs   : 54116 -> 30372 (-43.88%)

Probably modest advantage to execution, but it's an imporant
prerequisite to dropping some of the TGSI optimizations done by the
state tracker.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-14 20:14:01 -05:00
Ilia Mirkin
e231f59b6d nv50/ir: only use FILE_LOCAL_MEMORY for temp arrays that use indirection
Previously we were treating any indirect temp array usage to mean that
everything should end up in lmem. The MemoryOpt pass would clean a lot
of that up later, but in the meanwhile we would lose a lot of
opportunity for optimization.

This helps a lot of Metro 2033 Redux and a handful of KSP shaders:

total instructions in shared programs : 6288373 -> 6261517 (-0.43%)
total gprs used in shared programs    : 944051 -> 945131 (0.11%)
total local used in shared programs   : 54116 -> 54116 (0.00%)

A typical case is for register usage to double and for instructions to
halve. A future commit can also optimize local memory usage size to be
reduced with better packing.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-14 20:13:59 -05:00
Ilia Mirkin
37b67db6ae nvc0/ir: be careful about propagating very large offsets into const load
Indirect constbuf indexing works by using very large offsets. However if
an indirect constbuf index load is const-propagated, it becomes a very
large const offset. Take that into account when legalizing the SSA by
moving the high parts of that offset into the file index. Also disallow
very large (or small) indices on most other instructions.

This fixes regressions in ubo_array_indexing/*-two-arrays piglit tests.

Fixes: abd326e81b (nv50/ir: propagate indirect loads into instructions)
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-14 18:20:27 -05:00
Ilia Mirkin
7a521ddf36 nvc0: allow fragment shader inputs to use indirect indexing
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-14 14:28:04 -05:00
Marek Olšák
dc96a18d24 radeonsi: don't miss changes to SPI_TMPRING_SIZE
I'm not sure about the consequences of this bug, but it's definitely
dangerous.

This applies to SI, CIK, VI.

Cc: 11.0 11.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-14 19:55:41 +01:00
Charmaine Lee
6303231a1d svga: add DXGenMips command support
For those formats that support hw mipmap generation, use the
DXGenMips command. Otherwise fallback to the mipmap generation utility.

Tested with piglit, OpenGL apps (Heaven, Turbine, Cinebench)

v2: make sure the texture surface was created with the render target bind flag
    set relocation flag to SVGA_RELOC_WRITE for the texture surface

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-01-14 10:44:25 -07:00
Charmaine Lee
78e628ae43 svga: add num-generate-mipmap HUD query
The actual increment of the num-generate-mipmap counter will be done
in a subsequent patch when hw generate mipmap is supported.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-01-14 10:39:53 -07:00
Charmaine Lee
3038e8984d gallium/st: add pipe_context::generate_mipmap()
This patch adds a new interface to support hardware mipmap generation.
PIPE_CAP_GENERATE_MIPMAP is added to allow a driver to specify
if this new interface is supported; if not supported, the state tracker will
fallback to mipmap generation by rendering/texturing.

v2: add PIPE_CAP_GENERATE_MIPMAP to the disabled section for all drivers
v3: add format to the generate_mipmap interface to allow mipmap generation
    using a format other than the resource format
v4: fix return type of trace_context_generate_mipmap()

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-01-14 10:39:53 -07:00
Nicolai Hähnle
e976860638 gallium/radeon: do not reallocate user memory buffers
The whole point of AMD_pinned_memory is that applications don't have to map
buffers via OpenGL - but they're still allowed to, so make sure we don't break
the link between buffer object and user memory unless explicitly instructed
to.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-14 09:41:24 -05:00
Nicolai Hähnle
321140d563 gallium/radeon: implement PIPE_CAP_INVALIDATE_BUFFER
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-14 09:41:04 -05:00
Nicolai Hähnle
08c71740ad gallium/radeon: reset valid_buffer_range on PIPE_TRANSFER_DISCARD_WHOLE_RESOURCE
This accomodates a streaming pattern where the discard flag is set when the
application wraps back to the beginning of the buffer.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-14 09:40:00 -05:00
Nicolai Hähnle
654670b404 gallium: add PIPE_CAP_INVALIDATE_BUFFER
It makes sense to re-use pipe->invalidate_resource for the purpose of
glInvalidateBufferData, but this function is already implemented in vc4
where it doesn't have the expected behavior. So add a capability flag
to indicate that the driver supports the expected behavior.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-14 09:39:38 -05:00
Nicolai Hähnle
cbcdef7b40 winsys/radeon: fix warnings about incompatible pointer types
Some confusion between pb_buffer and radeon_bo as well as between
radeon_drm_winsys and radeon_winsys.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-14 09:33:58 -05:00
Marek Olšák
4ea0febcb0 radeonsi: move POSITION and FACE fragment shader inputs to system values
And FACE becomes integer instead of float.

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-01-13 12:27:28 +01:00
Marek Olšák
caf3c2abea radeonsi: simplify gl_FragCoord behavior
It will become a system value, not an input.

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-01-13 12:27:28 +01:00
Roland Scheidegger
38cdcb000d llvmpipe: (trivial) use cast wrapper for __m128d to __m128 casts
some compiler was unhappy.
2016-01-13 04:48:41 +01:00
Roland Scheidegger
49ec647c3b llvmpipe: avoid most 64 bit math in rasterization
The trick here is to recognize that in the c + n * dcdx calculations,
not only can the lower FIXED_ORDER bits not change (as the dcdx values
have those all zero) but that this means the sign bit of the calculations
cannot be different as well, that is
sign(c + n*dcdx) == sign((c >> FIXED_ORDER) + n*(dcdx >> FIXED_ORDER)).
That shaves off more than enough bits to never require 64bit masks.
A shifted plane c value could still easily exceed 32 bits, however since we
throw out planes which are trivial accept even before binning (and similarly
don't even get to see tris for which there was a trivial reject plane)) this
is never a problem.
The idea isnt't all that revolutionary, in fact something similar was tried
ages ago (9773722c2b) back when the values were
only 32 bit anyway. I believe now it didn't quite work then because the
adjustment needed for testing trivial reject / partial masks wasn't handled
correctly.
This still keeps the separate 32/64 bit paths for now, as the 32 bit one still
looks minimally simpler (and also because if we'd pass in dcdx/dcdy/eo unscaled
from setup which would be a good reason to ditch the 32 bit path, we'd need to
change the special-purpose rasterization functions for small tris).

This passes piglit triangle-rasterization (-fbo -auto -max_size
-subpixelbits 8) and triangle-rasterization-overdraw (with some hacks
to make it work correctly with large sizes) easily (full piglit as
well of course, but most tests wouldn't use triangles large enough to
be affected, that is tris with a bounding box over 128x128).
The profiler says indeed time spent in rast_tri functions is reduced
substantially, BUT of course only if the tris are large. I measured a 3%
improvement in mesa gloss demo when supersized to twice the screen size...

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-01-13 03:50:57 +01:00
Roland Scheidegger
16530fdc82 llvmpipe: scale up bounding box planes to subpixel precision
Otherwise some planes we get in rasterization have subpixel precision, others
not. Doesn't matter so far, but will soon. (OpenGL actually supports viewports
with subpixel accuracy, so could even do bounding box calcs with that).

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-01-13 03:34:59 +01:00
Roland Scheidegger
0298f5aca7 llvmpipe: add sse code for fixed position calculation
This is quite a few less instructions, albeit still do the 2 64bit muls
with scalar c code (they'd need way more shuffles, plus fixup for the signed
mul so it totally doesn't seem worth it - x86 can do 32x32->64bit signed
scalar muls natively just fine after all (even on 32bit).

(This still doesn't have a very measurable performance impact in reality,
although profiler seems to say time spent in setup indeed has gone down by
10% or so overall. Maybe good for a 3% or so improvement in openarena.)

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-01-13 03:34:09 +01:00
Roland Scheidegger
9422999e40 draw: fix key comparison with uninitialized value
Discovered by accident, valgrind was complaining (could have possibly caused
us to create redundant geometry shader variants).

v2: convinced by Brian and Jose, just use memset for both gs and vs keys,
just as easy and less error prone.
2016-01-13 02:43:04 +01:00
Tom St Denis
56fc2986d5 st/omx: Avoid segfault in deconstructor if constructor fails
If the constructor fails before the LIST_INIT calls the pointers
will be null and the deconstructor will segfault.

Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-01-12 19:13:19 +01:00
Christian König
6f898f740c vl: use preferred format for deinterlacing
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2016-01-12 13:28:42 +01:00
Christian König
5fdd4a5aef vl: improve motion adaptive deinterlacer
Handle other formats than YV12 as well.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2016-01-12 13:28:39 +01:00
Christian König
e945235aed st/va: add BOB deinterlacing v2
Tested with MPV.

v2: correctly handle compositor deinterlacing as well.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2016-01-12 13:28:35 +01:00