Commit graph

25358 commits

Author SHA1 Message Date
Eric Anholt
74c4b3b80c vc4: Add support for storing sample mask.
From the API perspective, writing 1 bits can't turn on pixels that were
off, so we AND it with the sample mask from the payload.
2015-12-04 09:23:55 -08:00
Eric Anholt
3a508a0d94 vc4: Fix up tile alignment checks for blitting using just an RCL.
We were checking that the blit started at 0 and was 1:1, but not that it
went to the full width of the surface, or that the width was aligned to a
tile.  We then told it to blit to the full width/height of the surface,
causing contents to be stomped in a bunch of MSAA tests that happen to
include half-screen-width blits to 0,0.
2015-12-04 09:10:53 -08:00
Eric Anholt
a664233042 vc4: Add support for loading sample mask. 2015-12-04 09:10:53 -08:00
Rob Clark
4b18d51756 freedreno/ir3: convert scheduler back to recursive algo
I've played with a few different approaches to tweak instruction
priority according to how much they increase/decrease register pressure,
etc.  But nothing seems to change the fact that compared to original
(pre-multiple-block-support) scheduler, in some edge cases we are
generating shaders w/ 5-6x higher register usage.

The problem is that the priority queue approach completely looses the
dependency between instructions, and ends up scheduling all paths at the
same time.

Original reason for switching was that recursive approach relied on
starting from the shader outputs array.  But we can achieve more or less
the same thing by starting from the depth-sorted list.

shader-db results:

total instructions in shared programs:          113350 -> 105183 (-7.21%)
total dwords in shared programs:                219328 -> 211168 (-3.72%)
total full registers used in shared programs:   7911 -> 7383 (-6.67%)
total half registers used in shader programs:   109 -> 109 (0.00%)
total const registers used in shared programs:  21294 -> 21294 (0.00%)

                 half       full      const      instr     dwords
    helped           0         322           0         711         215
      hurt           0         163           0          38           4

The shaders hurt tend to gain a register or two.  While there are also a
lot of helped shaders that only loose a register or two, the more
complex ones tend to loose significanly more registers used.  In some
more extreme cases, like glsl-fs-convolution-1.shader_test it is more
like 7 vs 34 registers!

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-12-04 10:27:09 -05:00
Rob Clark
ad2cc7bddc freedreno/ir3: don't reuse a0.x across blocks
It causes confusion in sched if we need to split_addr() since otherwise
we wouldn't easily know which block the new addr instr will be scheduled
in.  So just side-step the whole situation.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-12-04 10:27:09 -05:00
Rob Clark
8e52344dc1 freedreno/ir3: rename ir3_block::bd
We'll need to add similar for ir3_instruction, but following the pattern
to use 'id' seems confusing.  Let's just go w/ generic 'data' as the
name.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-12-04 10:27:09 -05:00
Giuseppe Bilotta
efaac624af xvmc: force assertion in XvMC tests
This follows the src/util/u_atomic_test.c model of undefining NDEBUG
unconditionally throughouth the XvMC tests, to force asserts regardless
of debug mode.

The comment on u_atomic_test.c is also fixed (read 'debug' where it
should have been 'release').

v2: s/debug/release/ in relevant comments

Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
[Emil Velikov: keep the src/util/ hunk as separate patch]
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-12-04 14:06:41 +00:00
Ilia Mirkin
204f803ce0 nv50/ir: replace zeros in movs as well
The original change to put zeroes directly into instructions created
conditional mov's with the zero immediate. However that can't be
emitted, so make sure to replace the zero with r63.

Fixes: 52a800a68 (nv50/ir: allow immediate 0 to be loaded anywhere)
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-12-03 23:46:02 -05:00
Ilia Mirkin
a3722b81f5 nv50/ir: fold fma/mad when all 3 args are immediates
This happens pretty rarely, but might as well do it when it does.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-12-03 23:02:57 -05:00
Ilia Mirkin
2b98914fe0 nv50/ir: avoid looking at uninitialized srcMods entries
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
2015-12-03 23:02:57 -05:00
Ilia Mirkin
49692f86a1 nv50/ir: fix DCE to not generate 96-bit loads
A situation where there's a 128-bit load where the last component gets
DCE'd causes a 96-bit load to be generated, which no GPU can actually
emit. Avoid generating such instructions by scaling back to 64-bit on
the first load when splitting.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
2015-12-03 23:02:57 -05:00
Roland Scheidegger
51140f452a draw: fix clipping of layer/vp index outputs
This was just plain broken. It used always the value from v0 (for vp_index)
but would pass the value from the provoking vertex to later stages - but only
if there was a corresponding fs input, otherwise the layer/vp index would get
lost completely (as it would try to interpolate the (unsigned) values as
floats).
So, make it obey provoking vertex rules (drivers relying on draw will need to
do the same). And make sure that the default interpolation mode (when no
corresponding fs input is found) for them is constant.
Also, change the code a bit so constant inputs aren't interpolated then
copied over later.

Fixes the new piglit test gl-layer-render-clipped.

v2: more consistent whitespaces fixes for function defs, and more tab killing
(overall still not quite right however).

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-12-04 03:42:19 +01:00
Roland Scheidegger
5ea5b169e9 softpipe: use provoking vertex for layer
Same as for llvmpipe, albeit softpipe only really handles multiple layers,
not multiple viewports/scissors.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-12-04 03:42:19 +01:00
Roland Scheidegger
ddaf8d7b10 llvmpipe: use provoking vertex for layer/viewport
d3d10 actually requires using provoking (first) vertex. GL is happy with
any vertex (as long as we say it's undefined in the corresponding queries).
Up to now we actually used vertex 0 for viewport index, and vertex 1 for
layer (for tris), which really didn't make sense (probably a typo). Also,$
since we reorder vertices of clockwise triangle, that actually meant we used
a different vertex depending if the traingle was cw or ccw (still ok by gl).
However, it should be consistent with what draw (clip) does, and using
provoking vertex seems like the sensible choice (draw clip will be fixed
next as it is totally broken there).
While here, also use the correct viewport always even when not needed
in setup (we pass it down to jit fragment shader it might be needed there
for getting correct near/far depth values).

No piglit changes.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-12-04 03:42:19 +01:00
Eric Anholt
83e65ca831 vc4: Add the RCL to CL debug dumping when in simulator mode.
We can't dump it in the real driver, since the kernel doesn't give us a
handle to it (except after a GPU hang, using a root ioctl).  In the
simulator we can.
2015-12-03 18:20:39 -08:00
Marek Olšák
dd27825c8c radeonsi: fix Fiji for LLVM <= 3.7
Cc: 11.0 11.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2015-12-03 23:55:23 +01:00
Marek Olšák
bfc14796b0 radeonsi: fix occlusion queries on Fiji
Tested.
2015-12-03 23:46:37 +01:00
Marek Olšák
0b03f2def0 radeonsi: dump init_config IBs
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-12-03 23:41:23 +01:00
Marek Olšák
3a6de8c86e radeonsi: print framebuffer info into ddebug logs
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-12-03 23:41:23 +01:00
Marek Olšák
a0bfb2798d gallium/radeon: print more info about HTILE
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-12-03 23:41:23 +01:00
Marek Olšák
1cca259d99 gallium/radeon: print more info about CMASK
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-12-03 23:41:23 +01:00
Marek Olšák
84fbb0aff9 gallium/radeon: rename fmask::pitch -> pitch_in_pixels
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-12-03 23:41:23 +01:00
Marek Olšák
19eaceb6ed gallium/radeon: print more information about textures
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-12-03 23:41:23 +01:00
Marek Olšák
2d712d35c5 gallium/radeon: move printing texture info into a separate function
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-12-03 23:41:23 +01:00
Marek Olšák
c60d49161e gallium/radeon: remove unused r600_texture::pitch_override
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-12-03 23:41:23 +01:00
Marek Olšák
75d64698f0 gallium/radeon: remove DBG_TEXMIP
we don't need 2 flags for dumping texture info

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-12-03 23:41:23 +01:00
Edward O'Callaghan
a5055e2f86 gallium/aux/util: Trivial, we already have format use it
No need to dereference again, fixup for clarity.

Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-12-03 23:41:23 +01:00
Jose Fonseca
5294debfa4 automake: Fix typo in MSVC2008 compat flags.
It should be MSVC2008_COMPAT_CFLAGS and not MSVC2008_COMPAT_CXXFLAGS.

This is why the recent util_blitter breakage went unnoticed on autotools
builds.

Trivial.
2015-12-03 22:00:49 +00:00
Jose Fonseca
071af9a511 ttn: Whitelist from -Werror=declaration-after-statement.
nir is the exception among gallium/auxiliary -- we don't need to compile
it with MSVC2008 yet.  And this enables us to use
-Werror=declaration-after-statement in the next commit as we should,
without complicated fixes to tgsi_to_nir module.

Trvial.  Tested with GCC and Clang.
2015-12-03 22:00:49 +00:00
Brian Paul
72a913ceb8 st/wgl: add new stw_ext_rendertexture.c file
This should have been included in the previous commit.

Signed-off-by: Brian Paul <brianp@vmware.com>
2015-12-03 09:33:55 -07:00
Brian Paul
e832b5b7fa st/wgl: add support for WGL_ARB_render_texture
There are a few legacy OpenGL apps on Windows which need this extension.
We basically use glCopyTex[Sub]Image to implement wglBindTexImageARB (see
the implementation notes for details).

v2: refactor code to use st_copy_framebuffer_to_texture() helper function.

Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2015-12-03 09:12:20 -07:00
Ilia Mirkin
6c6f28c35e nv50/ir: fix moves to/from flags
Noticed this when looking at a trace that caused flags to spill to/from
registers. The flags source/destination wasn't encoded correctly
according to both envydis and nvdisasm.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-12-02 20:41:38 -05:00
Ilia Mirkin
101e315cc1 nv50/ir: don't forget to mark flagsDef on cvt in txb lowering
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
2015-12-02 20:41:38 -05:00
Ilia Mirkin
06055121e6 nv50/ir: fix instruction permutation logic
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
2015-12-02 20:41:38 -05:00
Ilia Mirkin
11fcf46590 nv50/ir: the mad source might not have a defining instruction
For example if it's $r63 (aka 0), there won't be a definition.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
2015-12-02 20:41:37 -05:00
Ilia Mirkin
52b68375ae nv50/ir: make sure entire graph is reachable
The algorithm expects the entire CFG to be reachable, so make sure that
we hit every node. Otherwise we will end up with uninitialized data,
memory corruption, etc.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-12-02 18:51:15 -05:00
Ilia Mirkin
adcc547bfb nv50/ir: deal with loops with no breaks
For example if there are only returns, the break bb will not end up part
of the CFG. However there will have been a prebreak already emitted for
it, and when hitting the RET that comes after, we will try to insert the
current (i.e. break) BB into the graph even though it will be
unreachable. This makes the SSA code sad.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
2015-12-02 18:51:15 -05:00
Ilia Mirkin
ff61ac4838 nvc0/ir: fold postfactor into immediate
SM20-SM50 can't emit a post-factor in the presence of a long immediate.
Make sure to fold it in.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
2015-12-02 18:51:15 -05:00
Ilia Mirkin
52a800a687 nv50/ir: allow immediate 0 to be loaded anywhere
There's a post-RA fixup to replace 0's with $r63 (or $r127 if too many
regs are used), so just as nvc0, let an immediate 0 be loaded anywhere.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-12-02 18:51:15 -05:00
Samuel Pitoiset
8482763d35 nv50/ir/gk110: add memory barriers support for GK110
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-12-02 22:44:53 +01:00
Samuel Pitoiset
c672bf3b04 nv50/ir: do not call textureMask() for surface ops
That texture mask thing doesn't seem to be needed for surface ops, so
just as nve4+, let do that only for texture ops.

This fixes a segfault with 'test_surface_st' from
gallium/tests/trivial/compute.c on Fermi because this test uses sustp.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-12-02 22:10:44 +01:00
Jose Fonseca
4a3d388834 util/blitter: Fix "SO C90 forbids mixed declarations and code".
Trivial.
2015-12-02 17:49:20 +00:00
Edward O'Callaghan
772f429f0a gallium/util: Fix util_blitter_clear_depth_stencil() for num_layers>1
Previously util_blitter_clear_depth_stencil() could not clear more
than the first layer. We need to generalise this as we did for
util_blitter_clear_render_target().

Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-12-02 18:23:43 +01:00
Edward O'Callaghan
8f2c5e281d gallium/util: Fix util_blitter_clear_render_target() for num_layers>1
Previously util_blitter_clear_render_target() could not clear more
than the first layer. We need to generalise this so that
ARB_clear_texture can pass the 3d piglit test.

Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-12-02 18:23:43 +01:00
Jose Fonseca
56aff6bb4e Remove Sun CC specific code.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Alan Coopersmith <alan.coopersmith@oracle.com>
2015-12-02 07:51:04 +00:00
Dave Airlie
0e49151dcf r600: set mega fetch count to 16 for gs copy shader
Seems like MFC should be set for this shader.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-02 08:25:13 +10:00
Dave Airlie
4ebcf5194d r600: increment ring index after emit vertex not before.
The docs say we should send the emit after the ring writes,
so lets do that and not have an ALU in between.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-02 08:25:13 +10:00
Dave Airlie
13b134a443 r600: add alu + cf nop to copy shader on r600
SB suggests we do this for r600, so lets do it,
for the copy shader.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-02 08:25:13 +10:00
Dave Airlie
af4013d26b r600: SMX returns CONTEXT_DONE early workaround
streamout, gs rings bug on certain r600s, requires a wait idle
before each surface sync.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "10.6 11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-02 08:25:00 +10:00
Dave Airlie
b63944e8b9 r600: do SQ flush ES ring rolling workaround
Need to insert a SQ_NON_EVENT when ever geometry
shaders are enabled.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "10.6 11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-02 08:24:32 +10:00