Commit graph

69167 commits

Author SHA1 Message Date
Jordan Justen
d70f4e6daf i965/state: Create separate dirty state bits for each pipeline
When clearing the state for a pipeline, we will save changed state for
the other pipelines.

v3:
 * Adjust brw_upload_pipeline_state
   * Don't pull pipeline state bits into common state bits
   * Don't clear pipeline state bits
 * Adjust 'clear' phase
   * brw_clear_dirty_bits is now brw_render_state_finished
   * Move cross-pipeline state flagging to brw_pipeline_state_finished
   * Move pipeline clears to brw_pipeline_state_finished

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-03-31 16:40:24 -07:00
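The bookkeeping described in d70f4e6daf can be sketched roughly as follows. This is an illustration under assumptions, not the driver code: `state_tracker`, `flag_dirty`, `pending_bits`, `pipeline_state_finished` and the two-entry pipeline enum are hypothetical names made up for this example. The idea shown is that when one pipeline finishes uploading, the bits it just consumed are saved for every other pipeline before they are cleared.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pipelines; the real driver's list may differ. */
enum pipeline { PIPELINE_RENDER, PIPELINE_COMPUTE, PIPELINE_COUNT };

struct state_tracker {
    uint64_t dirty;                  /* bits flagged since the last upload */
    uint64_t saved[PIPELINE_COUNT];  /* bits each pipeline has not consumed yet */
};

/* Flag state as changed; it is not yet attributed to any pipeline. */
void flag_dirty(struct state_tracker *st, uint64_t bits)
{
    assert(st);
    st->dirty |= bits;
}

/* Bits that pipeline p must look at on its next upload. */
uint64_t pending_bits(const struct state_tracker *st, enum pipeline p)
{
    assert(st && (int)p >= 0 && (int)p < PIPELINE_COUNT);
    return st->dirty | st->saved[p];
}

/* Called after pipeline p has uploaded its state. */
void pipeline_state_finished(struct state_tracker *st, enum pipeline p)
{
    assert(st && (int)p >= 0 && (int)p < PIPELINE_COUNT);
    for (int i = 0; i < PIPELINE_COUNT; i++) {
        if (i != (int)p)
            st->saved[i] |= st->dirty;  /* other pipelines still need these */
    }
    st->saved[p] = 0;  /* p has consumed everything it had pending */
    st->dirty = 0;
}
```

With this layout a state change flagged once is seen by each pipeline exactly once, regardless of the order in which the pipelines are used.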
Jordan Justen
db11955072 i965/state: Support multiple pipelines in brw->num_atoms
brw->num_atoms is converted to an array, but currently just an array
of length 1.

Adds brw_copy_pipeline_atoms which copies the atoms for a pipeline,
and sets brw->num_atoms[p] for pipeline p.

v2:
 * Rename brw->atoms[] to render_atoms
 * Rename brw_add_pipeline_atoms to brw_copy_pipeline_atoms
 * Rename brw_pipeline_first_atom to brw_get_pipeline_atoms

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-03-31 16:40:23 -07:00
Jordan Justen
736a31d462 i965/state: Rename brw_clear_dirty_bits to brw_render_state_finished
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-03-31 16:40:23 -07:00
Jordan Justen
2c02baa487 i965/state: Rename brw_upload_state to brw_upload_render_state
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-03-31 16:40:23 -07:00
Roland Scheidegger
611bd80f3b gallivm: do some hack heuristic to disable texture functions
We've seen some cases where performance can suffer quite a bit.
Technically, the more simple the function the more overhead there is
for using a function for this (and the less benefits this provides).
Hence don't do this if we expect the generated code to be simple.
There's an even more important reason why this hurts performance,
which is shaders reusing the same unit with some of the same inputs,
as llvm cannot figure out the calculations are the same if they
are performed in the function (even just reusing the same unit without
any input being the same provides such optimization opportunities though
not very much). This is something which would need to be handled by IPO
passes however.
2015-04-01 00:56:12 +02:00
Matt Turner
47c4b38540 i965/fs: Allow CSE to handle MULs with negated arguments.
mul x, -y is equivalent to mul -x, y; and mul x, y is the negation of
mul x, -y.

With NIR:
total instructions in shared programs: 6167779 -> 6161193 (-0.11%)
instructions in affected programs:     983511 -> 976925 (-0.67%)
helped:                                4106
HURT:                                  16
GAINED:                                18
LOST:                                  7

Without NIR:
total instructions in shared programs: 6192323 -> 6185299 (-0.11%)
instructions in affected programs:     987875 -> 980851 (-0.71%)
helped:                                4146
HURT:                                  16
GAINED:                                16
LOST:                                  0
2015-03-31 14:14:36 -07:00
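The identity used by 47c4b38540 (a product's sign depends only on the parity of its negated operands) can be illustrated with a small sketch. This is not the i965 CSE pass: the structs, the `mul_cse` function and the `cse_result` enum are hypothetical, and treating the operands as commutative is an assumption of this example.

```c
#include <assert.h>
#include <stdbool.h>

struct src { int reg; bool negate; };
struct mul { struct src a, b; };

enum cse_result { CSE_NONE, CSE_SAME, CSE_NEGATED };

/* The product is negated when exactly one operand carries a negate. */
bool mul_result_negated(const struct mul *m)
{
    return m->a.negate != m->b.negate;
}

/* Same registers, ignoring sign and operand order. */
bool mul_operands_match(const struct mul *p, const struct mul *q)
{
    return (p->a.reg == q->a.reg && p->b.reg == q->b.reg) ||
           (p->a.reg == q->b.reg && p->b.reg == q->a.reg);
}

/* Decide whether q can reuse p's result directly or with a negation. */
enum cse_result mul_cse(const struct mul *p, const struct mul *q)
{
    assert(p && q);
    if (!mul_operands_match(p, q))
        return CSE_NONE;
    return mul_result_negated(p) == mul_result_negated(q) ? CSE_SAME
                                                          : CSE_NEGATED;
}
```

Here `mul x, -y` and `mul -x, y` compare as `CSE_SAME`, while `mul x, y` against `mul x, -y` yields `CSE_NEGATED`, meaning the earlier result can be reused with its sign flipped.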
Matt Turner
438c1c0080 i965: Mark brw_inst_bits' brw_inst* parameter const.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-03-31 14:14:36 -07:00
Matt Turner
ac6102bcc5 glsl: Remove bogus Makefile dependency. 2015-03-31 14:14:36 -07:00
Matt Turner
2c38f891ad glsl: Reassociate multiplication of mat*mat*vec.
The typical case of mat4*mat4*vec4 is 80 scalar multiplications, but
mat4*(mat4*vec4) is only 32.

On HSW (with vec4 vertex shaders):
instructions in affected programs:     4420 -> 3194 (-27.74%)

On BDW (with scalar vertex shaders):
instructions in affected programs:     12756 -> 6726 (-47.27%)

Implementing a general matrix chain ordering is harder (or at least
tedious) because of having to walk the GLSL IR to create a list of
multiplicands. I'm guessing that this patch handles 90+% of cases, but
of course to tell definitively you'd have to implement the general
thing.

Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2015-03-31 14:01:15 -07:00
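The 80 versus 32 figure quoted in 2c38f891ad follows from counting scalar multiplications: an (r x k) by (k x c) product needs r*k*c of them. A small sketch of that arithmetic, with function names invented for this example:

```c
#include <assert.h>

/* Scalar multiplications needed for an (rows x inner) * (inner x cols) product. */
unsigned mul_count(unsigned rows, unsigned inner, unsigned cols)
{
    assert(rows > 0 && inner > 0 && cols > 0);
    return rows * inner * cols;
}

/* (M * M) * v: one matrix-matrix product, then one matrix-vector product. */
unsigned left_assoc_count(unsigned n)
{
    return mul_count(n, n, n) + mul_count(n, n, 1);
}

/* M * (M * v): two matrix-vector products. */
unsigned right_assoc_count(unsigned n)
{
    return mul_count(n, n, 1) + mul_count(n, n, 1);
}
```

For n = 4 this gives 64 + 16 = 80 for (mat4*mat4)*vec4 and 16 + 16 = 32 for mat4*(mat4*vec4), matching the numbers in the commit message.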
Matt Turner
cf2dc1624f glsl: Implement type inferencing of matrix types.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2015-03-31 14:01:15 -07:00
Matt Turner
73f6f9b9be glsl: Factor out a get_mul_type() function.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2015-03-31 14:01:15 -07:00
Marcin Ślusarz
f9e2295560 nouveau: synchronize "scratch runout" destruction with the command stream
When nvc0_push_vbo calls nouveau_scratch_done it does not mean
scratch buffers can be freed immediately. It means "when hardware
advances to this place in the command stream the scratch buffers
can be freed".

To fix it, postpone scratch runout destruction until the current
fence is signalled.

The bug existed for a very long time. Nobody noticed, because
"scratch runout" code path is rarely executed.

Fixes hang at the very beginning of first mission in "Serious Sam 3"
on nve7/gk107. It manifested as:

nouveau E[   PFIFO][0000:01:00.0] read fault at 0x000a9e0000 [PTE] from GR/GPC0/PE_2 on channel 0x007f853000 [Sam3[17056]]

Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-03-31 22:04:31 +02:00
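The fix in f9e2295560 defers a release until the hardware has passed a point in the command stream. A generic sketch of that pattern is shown here; it is not the nouveau code, and `struct fence`, `fence_add_work`, `fence_signal` and `release_scratch` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_FENCE_WORK 8

typedef void (*work_fn)(void *data);

struct fence_work { work_fn fn; void *data; };

struct fence {
    bool signalled;
    size_t work_count;
    struct fence_work work[MAX_FENCE_WORK];
};

/* Queue fn(data) to run when the fence signals.
 * Runs it immediately if the fence has already signalled.
 * Returns false if the fixed-size queue is full. */
bool fence_add_work(struct fence *f, work_fn fn, void *data)
{
    assert(f && fn);
    if (f->signalled) {
        fn(data);
        return true;
    }
    if (f->work_count == MAX_FENCE_WORK)
        return false;
    f->work[f->work_count].fn = fn;
    f->work[f->work_count].data = data;
    f->work_count++;
    return true;
}

/* Called when the hardware reaches the fence: run all deferred work. */
void fence_signal(struct fence *f)
{
    assert(f);
    f->signalled = true;
    for (size_t i = 0; i < f->work_count; i++)
        f->work[i].fn(f->work[i].data);
    f->work_count = 0;
}

/* Stand-in for freeing a scratch buffer; it only records that it ran. */
void release_scratch(void *data)
{
    bool *released = data;
    *released = true;
}
```

The caller that is "done" with the scratch buffers queues `release_scratch` on the current fence instead of freeing right away, so the memory stays valid while the GPU may still read it.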
Brian Paul
3db0317351 docs: document Viewperf 12 issues
Signed-off-by: Brian Paul <brianp@vmware.com>
2015-03-31 11:50:20 -06:00
Neil Roberts
fe026d7ce5 i965/skl: Avoid using the 1D stencil layout for stencil-only images
Commit cf67ca9ffa made the layouting code pick a special layout for
1D images on Skylake. This should not be used for depth and stencil
buffers because these need to be treated as 2D tiled images. However
the patch was missing a check for images with a base format of
GL_STENCIL_INDEX. In practice I don't think it's currently possible to
hit this because Mesa doesn't support GL_ARB_texture_stencil8 and it's
not possible to create a 1D renderbuffer, but it'll be good to be
ready for when the extension is supported.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-03-31 18:22:01 +01:00
Tom Stellard
fda7558057 clover: Return CL_BUILD_ERROR for CL_PROGRAM_BUILD_STATUS when compilation fails v2
v2:
  - Don't use _errs map

Cc: 10.5 10.4 <mesa-stable@lists.freedesktop.org>

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-03-31 15:40:51 +00:00
Tom Stellard
4c53d2acbb radeonsi/compute: Default to the same PIPE_SHADER_CAP values as other shader types v2
v2:
  - Fix typo

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-03-31 15:40:51 +00:00
Leo Liu
a714fbacf7 radeon/vce: implement video usability information support
This will help encoding VUI into the bitstream

v2: make backward compatible

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2015-03-31 12:31:58 -04:00
Leo Liu
8e3668a7c0 st/omx/enc: export framerate to vce driver
The framerate will be used for video usability info support by VCE driver

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2015-03-31 12:31:58 -04:00
Roland Scheidegger
489866938f llvmpipe: enable ARB_texture_gather
Just announce support for 4 components.
While here also increase the max/min texel offsets (the limit is completely
artificial, was chosen because that's what other hardware did, however there's
other drivers using larger limits).
Over a thousand little piglits skip->pass.

v2: update docs/GL3.txt

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-03-31 17:23:51 +02:00
Roland Scheidegger
0753b135f6 gallivm: implement TG4 for ARB_texture_gather
This is quite trivial, essentially just follow all the same code you'd
use with linear min/mag (and no mip) filter, then just skip the filtering
after looking up the texels in favor of direct assignment of the right channel
to the result. (This is though not true for the multi-offset version if we'd
want to support it - for this would probably need to do something along the
lines of 4x nearest sampling due to the necessity of doing coord wrapping
individually per texel.)
Supports multi-channel formats.
From the SM5 gather cap bit, should support non-constant offsets, plus shadow
comparisons (the former untested), but not component selection (should be
easy to implement but all this stuff is not really exposable anyway for now).

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-03-31 17:23:51 +02:00
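The approach described in 0753b135f6 (locate the same 2x2 footprint a linear filter would use, then return one channel of each texel instead of blending) can be sketched in scalar C. This is an illustration only, not the gallivm code generator: the `texture` struct and `gather4` are hypothetical, clamp-to-edge wrapping is assumed, and the output order is assumed to follow the GL textureGather convention of (i0,j1), (i1,j1), (i1,j0), (i0,j0).

```c
#include <assert.h>

struct texture {
    int width, height;
    const float *texels;  /* row-major, 4 channels per texel */
};

static int clamp_int(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* floor() for the small values used here, avoiding a libm dependency. */
static int floor_to_int(float v)
{
    int i = (int)v;
    return (v < (float)i) ? i - 1 : i;
}

/* Fetch one channel with clamp-to-edge wrapping applied per texel. */
static float fetch(const struct texture *t, int x, int y, int channel)
{
    x = clamp_int(x, 0, t->width - 1);
    y = clamp_int(y, 0, t->height - 1);
    return t->texels[(y * t->width + x) * 4 + channel];
}

/* Gather one channel of the 2x2 linear-filter footprint at (s, tc). */
void gather4(const struct texture *t, float s, float tc, int channel,
             float out[4])
{
    assert(t && t->texels && out && channel >= 0 && channel < 4);
    int i0 = floor_to_int(s * (float)t->width - 0.5f);
    int j0 = floor_to_int(tc * (float)t->height - 0.5f);
    int i1 = i0 + 1, j1 = j0 + 1;
    out[0] = fetch(t, i0, j1, channel);
    out[1] = fetch(t, i1, j1, channel);
    out[2] = fetch(t, i1, j0, channel);
    out[3] = fetch(t, i0, j0, channel);
}
```

Because wrapping is applied inside `fetch`, each of the four texels is wrapped individually, which is the property the commit message notes would matter for a multi-offset variant.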
Roland Scheidegger
73c6914195 gallivm: add gather support to sampler interface
Luckily thanks to the revamped interface this is a lot less work now...

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-03-31 17:23:51 +02:00
Roland Scheidegger
1863ed21ff gallivm: simplify sampler interface
This has got a bit out of control with more and more parameters added.
Worse, whenever something in there changes all callees have to be updated
for that, even though they don't really do much with any parameter in there
except pass it on to the actual sampling function.
Hence simply put almost everything into a struct. Also instead of relying
on some arguments being NULL, be explicit and set this in a key (which is
just reused for function generation for simplicity). (The code still relies
on them being NULL in the end for now.)
Technically there is a minimal functional change here for shadow sampling:
whether shadow sampling is done is now determined explicitly by the texture
function (either sample_c or the gl-style tex func inherit this from target)
instead of the static texture state. These two should always match, however.
Otherwise, it should generate all the same code.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-03-31 17:23:51 +02:00
Jose Fonseca
0fc5b80e7a util/debug: Update MgwHelp link, drop BfdHelp link. 2015-03-31 09:42:06 +01:00
Michel Dänzer
b8797a7875 gallivm: Fix build against LLVM 3.7 SVN r233648
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-03-31 15:05:01 +09:00
Eric Anholt
1dcc1ee314 vc4: Drop integer multiplies with 0 to moves of 0.
This cleans up more instructions generated by uniform array indexing
multiplies.

total instructions in shared programs: 39989 -> 39961 (-0.07%)
instructions in affected programs:     896 -> 868 (-3.12%)
2015-03-30 12:57:45 -07:00
Eric Anholt
8c5dcdbccb vc4: Add a constant folding pass.
This cleans up some pointless operations generated by the in-driver mul24
lowering (commonly generated by making a vec4 index for a matrix in a
uniform array).

I could fill in other operations, but pretty much anything else ought to
be getting handled at the NIR level, I think.

total uniforms in shared programs: 13423 -> 13421 (-0.01%)
uniforms in affected programs:     346 -> 344 (-0.58%)
2015-03-30 12:57:45 -07:00
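The kind of cleanup done by 8c5dcdbccb (constant folding) and by 1dcc1ee314 (a multiply by 0 becomes a move of 0) can be sketched on a toy instruction format. None of these types come from the vc4 driver: `opcode`, `operand`, `instr` and `simplify_mul` are hypothetical, and plain 32-bit wrapping multiplication is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum opcode { OP_MOV, OP_MUL };

struct operand {
    bool is_const;
    uint32_t value;  /* constant value, or register index when !is_const */
};

struct instr {
    enum opcode op;
    struct operand src[2];
};

/* Rewrite a MUL into "MOV constant" when the result is known.
 * Returns true if the instruction was changed. */
bool simplify_mul(struct instr *in)
{
    assert(in);
    if (in->op != OP_MUL)
        return false;

    const struct operand a = in->src[0];
    const struct operand b = in->src[1];
    uint32_t result;

    if (a.is_const && b.is_const)
        result = a.value * b.value;  /* constant folding */
    else if ((a.is_const && a.value == 0) || (b.is_const && b.value == 0))
        result = 0;                  /* x * 0 == 0 for integers */
    else
        return false;

    in->op = OP_MOV;
    in->src[0] = (struct operand){ true, result };
    in->src[1] = (struct operand){ false, 0 };
    return true;
}
```

The multiply-by-zero rule is only safe for integers; a floating-point version would have to account for NaN and infinity.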
Brian Paul
dbe67d76e0 glsl: allow ForceGLSLVersion to override #version directives
Previously, the ctx->Const.ForceGLSLVersion setting only worked if
the shader lacked a #version directive.  Now, the ForceGLSLVersion
setting will override the #version directive too.

This change should be safe since it should be rare to have an app
that has a mix of shader versions and we only wanted to override
the #version for shaders which lacked the #version directive.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-03-30 11:25:39 -06:00
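The behavior change in dbe67d76e0 amounts to swapping the priority of the forced version and the #version directive. A sketch of the two selection rules, with hypothetical function names, assuming that a forced value of 0 means "no override" and that the caller supplies the default used when a shader has no directive:

```c
#include <assert.h>
#include <stdbool.h>

/* Old rule: the directive wins; the forced version only fills the gap. */
unsigned old_effective_version(unsigned forced, bool has_directive,
                               unsigned directive, unsigned default_version)
{
    assert(default_version != 0);
    if (has_directive)
        return directive;
    return forced ? forced : default_version;
}

/* New rule: a non-zero forced version overrides the directive too. */
unsigned new_effective_version(unsigned forced, bool has_directive,
                               unsigned directive, unsigned default_version)
{
    assert(default_version != 0);
    if (forced)
        return forced;
    return has_directive ? directive : default_version;
}
```

For a shader declaring `#version 150` with the forced version set to 130, the old rule yields 150 and the new rule yields 130; with no override set, both rules agree.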
Eric Anholt
c519c4d85e vc4: Don't bother masking out the low 24 bits for integer multiplies
The hardware just uses the low 24 lines, saving us an AND to drop the high
bits.

total uniforms in shared programs: 13433 -> 13423 (-0.07%)
uniforms in affected programs:     356 -> 346 (-2.81%)
total instructions in shared programs: 40003 -> 39989 (-0.03%)
instructions in affected programs:     910 -> 896 (-1.54%)
2015-03-30 09:23:39 -07:00
Eric Anholt
5df8bf86fe vc4: Make integer multiply use 24 bits for the low parts.
The hardware uses the low 24 bits in integer multiplies, so we can have
fewer high bits (and so probably drop them more frequently).
2015-03-30 09:23:39 -07:00
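The arithmetic behind 5df8bf86fe and c519c4d85e can be checked with a short sketch. It assumes a primitive, modeled here as `mul24`, that multiplies the low 24 bits of each operand and keeps the low 32 bits of the product; the function names and the exact decomposition are illustrative and not taken from the vc4 driver. Writing a = ah*2^24 + al and b = bh*2^24 + bl, the product modulo 2^32 is al*bl + ((ah*bl + al*bh) << 24), since the ah*bh term is shifted out entirely.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the assumed hardware primitive: the high 8 bits of each
 * operand are ignored, so callers need no explicit AND. */
uint32_t mul24(uint32_t a, uint32_t b)
{
    uint64_t product = (uint64_t)(a & 0xffffff) * (uint64_t)(b & 0xffffff);
    return (uint32_t)product;
}

/* Full 32-bit wrapping multiply built from three 24-bit multiplies. */
uint32_t mul32_from_mul24(uint32_t a, uint32_t b)
{
    uint32_t low = mul24(a, b);  /* al * bl */
    uint32_t cross = mul24(a >> 24, b) + mul24(a, b >> 24);
    return low + (cross << 24);
}

/* Self-check against the native 32-bit multiply. */
void check_mul32(uint32_t a, uint32_t b)
{
    assert(mul32_from_mul24(a, b) == a * b);
}
```

Passing `a` and `b` unmasked to `mul24` mirrors the point of c519c4d85e: if the hardware only looks at the low 24 bits, masking them first is redundant.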
Samuel Iglesias Gonsalvez
18004c338f glsl: fail when a shader's input var has not an equivalent out var in previous
GLSL ES 3.00 spec, 4.3.10 (Linking of Vertex Outputs and Fragment Inputs),
page 45 says the following:

"The type of vertex outputs and fragment input with the same name must match,
otherwise the link command will fail. The precision does not need to match.
Only those fragment inputs statically used (i.e. read) in the fragment shader
must be declared as outputs in the vertex shader; declaring superfluous vertex
shader outputs is permissible."
[...]
"The term static use means that after preprocessing the shader includes at
least one statement that accesses the input or output, even if that statement
is never actually executed."

And it includes a table with all the possibilities.

Similar table or content is present in other GLSL specs: GLSL 4.40, GLSL 1.50,
etc but for more stages (vertex and geometry shaders, etc).

This patch detects that case and returns a link error. It fixes the following
dEQP test:

  dEQP-GLES3.functional.shaders.linkage.varying.rules.illegal_usage_1

However, it adds a new regression in piglit because the test has no
vertex shader and it checks the link status.

bin/glslparsertest \
tests/spec/glsl-1.50/compiler/gs-also-uses-smooth-flat-noperspective.geom pass \
1.50 --check-link

This piglit test is wrong according to the spec wording above, so if this patch
is merged it should be updated.

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
2015-03-30 13:29:05 +02:00
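The rule quoted in 18004c338f (every statically used input needs a matching output in the previous stage, while unused inputs and superfluous outputs are fine) can be sketched as a name lookup. This is a simplified illustration, not the GLSL linker: `shader_input`, `has_output` and `find_unmatched_input` are hypothetical, and type matching is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct shader_input {
    const char *name;
    bool statically_used;  /* read somewhere after preprocessing */
};

/* True if the previous stage declares an output with this name. */
bool has_output(const char *const *outputs, size_t n_out, const char *name)
{
    for (size_t i = 0; i < n_out; i++) {
        if (strcmp(outputs[i], name) == 0)
            return true;
    }
    return false;
}

/* Returns the name of the first statically used input that has no
 * matching output (a link error), or NULL if linking may proceed. */
const char *find_unmatched_input(const struct shader_input *inputs,
                                 size_t n_in,
                                 const char *const *outputs, size_t n_out)
{
    assert(inputs || n_in == 0);
    assert(outputs || n_out == 0);
    for (size_t i = 0; i < n_in; i++) {
        if (inputs[i].statically_used &&
            !has_output(outputs, n_out, inputs[i].name))
            return inputs[i].name;
    }
    return NULL;
}
```

A non-NULL return corresponds to the link error the patch adds; extra outputs never trigger it, consistent with the spec text allowing superfluous vertex shader outputs.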
Michel Dänzer
d64adc3a79 radeonsi: Cache LLVMTargetMachineRef in context instead of in screen
Fixes a crash in genymotion with several threads compiling shaders
concurrently.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89746

Cc: 10.5 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2015-03-30 15:15:10 +09:00
Tapani Pälli
ce83a6ec81 glsl: fix unreachable(!"") to unreachable("")
Correct error with commit 151fb1e where assert was renamed
to unreachable without removing ! from string argument.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-03-30 08:16:00 +03:00
Emil Velikov
938b17940f docs: add news item and link release notes for mesa 10.5.2
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-03-28 19:21:31 +00:00
Emil Velikov
dc8d8a2951 docs: Add sha256 sums for the 10.5.2 release
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit ff87ae1e00)
2015-03-28 19:21:31 +00:00
Emil Velikov
6e19f6b4d0 Add release notes for the 10.5.2 release
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit 5e59f895c4)
2015-03-28 19:21:31 +00:00
Ilia Mirkin
ee670c9efa freedreno/a3xx: add support for point sprite coordinate replacement
This does not (yet) support different coordinate origins, so the tests
still fail due to fbo flipping.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-03-28 14:54:41 -04:00
Ilia Mirkin
995f55a6ce freedreno/a3xx: make vs-set point size work
This appears to need the A2XX version of the point list, so select it at
draw time if necessary.

Experimentally, always using the A2XX version causes hangs when PSIZE
isn't actually emitted.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-03-28 14:54:41 -04:00
Ilia Mirkin
7fc5da8b93 freedreno/a3xx: point size should not be divided by 2
The division is probably a holdover from the days when the fixed point
inline functions generated by headergen were broken.

Also reduce the maximum point size to 4092 (vs 4096), which is what the
blob does.

Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-03-28 14:54:41 -04:00
Ilia Mirkin
738c8319ac freedreno/a3xx: fix 3d texture layout
The SZ2 field contains the layer size of a lower miplevel. It only
contains 4 bits, which limits the maximum layer size it can describe. In
situations where the next miplevel would be too big, the hardware
appears to keep minifying the size until it hits one of that size.
Unfortunately the hardware's ideas about sizes can differ from
freedreno's, which can still lead to issues. Minimize those by stopping
minification as soon as possible.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>
2015-03-28 14:54:41 -04:00
Ilia Mirkin
3735643df3 freedreno/a3xx: LAYERSZ2 appears to have no effect on arrays
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-03-28 14:54:40 -04:00
Kenneth Graunke
72b06fb08e nir: Fix copy and pasted error message in nir_validate.
These are nir_cf_nodes, not ALU instructions.
Also, use unreachable() to preempt said review feedback.

v2: Do it right (thanks Ilia).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-03-28 09:36:46 -07:00
Kenneth Graunke
31dc63d5ca i965/nir: Use NIR for ARB_vertex_program support on Gen8+.
Everything is already in place; we simply have to take the scalar code
generation path.  This gives us SIMD8 VS programs, instead of SIMD4x2.

v2: Rebase on the patch that drops brw->gen >= 8.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-03-27 21:16:51 -07:00
Kenneth Graunke
ac69ab7302 i965: Move env_var_as_boolean to intel_debug.c.
I need to use this in brw_vec4.cpp, so it can't be static anymore.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-03-27 21:16:43 -07:00
Kenneth Graunke
826d3afb8f i965/fs: Add ARB_fragment_program support to the NIR backend.
Use prog_to_nir where we would normally call glsl_to_nir, handle program
parameter lists, and skip a few things that don't exist.

Using NIR generates much better shader code than Mesa IR, since we get
real optimizations, as opposed to prog_optimize:

total instructions in shared programs: 314007 -> 279892 (-10.86%)
instructions in affected programs:     285173 -> 251058 (-11.96%)
helped:                                2001
HURT:                                  67
GAINED:                                4
LOST:                                  7

v2: Change early return in nir_setup_uniforms to if/else (Jordan).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-03-27 21:16:34 -07:00
Kenneth Graunke
bf2c3bc316 nir: Lower subtraction to add with negation when !lower_negate.
prog->nir will generate fsub opcodes, but i965 doesn't implement them.
We may as well lower them at the NIR level, since it's trivial to do.

Suggested by Connor Abbott.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2015-03-27 21:16:34 -07:00
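The lowering in bf2c3bc316 rewrites a - b as a + (-b). A toy version on a made-up three-address instruction format is sketched below; `alu`, `lower_fsub` and the use of a caller-supplied temporary register are assumptions of this example and do not reflect NIR's actual data structures.

```c
#include <assert.h>
#include <stddef.h>

enum alu_op { OP_FADD, OP_FSUB, OP_FNEG };

struct alu {
    enum alu_op op;
    int dst, src0, src1;  /* register indices; src1 is -1 when unused */
};

/* Expand "dst = src0 - src1" into "tmp = -src1; dst = src0 + tmp".
 * Other instructions are copied through unchanged.
 * Returns the number of instructions written to out (1 or 2). */
size_t lower_fsub(const struct alu *in, int tmp_reg, struct alu out[2])
{
    assert(in && out);
    if (in->op != OP_FSUB) {
        out[0] = *in;
        return 1;
    }
    out[0] = (struct alu){ OP_FNEG, tmp_reg, in->src1, -1 };
    out[1] = (struct alu){ OP_FADD, in->dst, in->src0, tmp_reg };
    return 2;
}
```

A backend that supports negation as a source modifier can then fold the negate into the add, so the lowered form need not cost an extra instruction.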
Kenneth Graunke
faf6106c6f nir: Implement a Mesa IR -> NIR translator.
Shamelessly ripped off from Eric Anholt's tgsi_to_nir pass.

This is not built on SCons, like the rest of NIR.

v2:
- Delete redundant c->s, c->impl, and c->cf_node_list pointers (Ken)
- Use nir_builder directly instead of ptn_compile in more places (Ken)
- Drop 'struct' keyword in front of nir_builder (ken)
- Add a file level Doxygen comment (Ken)
- Use scalar constants instead of splatting (Eric)
- Use nir_builder helpers for constants, moves, and swizzles (Connor)

v3: Minor indentation improvements.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2015-03-27 21:16:34 -07:00
Kenneth Graunke
06f7bea96a nir: Add builder helpers for MOVs with ALU sources and swizzling MOVs.
These will be useful for prog->nir and tgsi->nir.

v2: Don't forget to mark nir_swizzle as inline (Eric).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2015-03-27 21:16:33 -07:00
Kenneth Graunke
75c922e0fe nir: Add nir_builder helpers for creating load_const intrinsics.
Both prog->nir and tgsi->nir will want to use these.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2015-03-27 21:16:33 -07:00
Ben Widawsky
74fd226e34 i965/skl: Don't use the PMA depth stall workaround
The PMA depth stall must be enabled (optimization turned off) under certain
circumstances on gen8. This was supposedly fixed for Gen9, which means we do not
need to check, or toggle the state. The hardware is supposed to enable the
hardware optimization by default, unlike BDW, so we also don't need to set it at
init. For whatever reason this improves stability on ETQW with the bug mentioned
below.

References: https://bugs.freedesktop.org/show_bug.cgi?id=89039 (doesn't fix)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Tested-by: Anuj Phogat <anuj.phogat@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-03-27 21:04:41 -07:00
Ben Widawsky
9d32d35850 i965/skl: Disable partial resolve in VC
Recomendation [sic] is to set this field to 1 always. Programming it to default
value of 0, may have -ve impact on performance for MSAA WLs.

Another don't suck bit which needs to get set.

The patch wasn't as well tested as I would have liked, primarily I don't have
perf numbers for it, but it's getting to a point where it is in danger of being
lost.

v2: v1 was a mix of two patches. Since 0x7004 is masked, we only need to set it
once at initialization and make sure the pma workaround doesn't set the mask bit
(which it doesn't).
Move LRI to init gpu state (Ken)
Add a comment.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-03-27 21:04:37 -07:00
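The v2 note in 9d32d35850 relies on how a masked register behaves: a write only touches the bits whose mask bit is set, so a bit set once at initialization survives later writes that leave its mask bit clear. A sketch of that behavior follows, assuming the common layout in which the upper 16 bits of the written dword are the write mask for the lower 16; the helper names and the bit positions used in the usage note are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Apply a masked write: bits 31:16 of 'written' select which of
 * bits 15:0 are updated; unselected bits keep their current value. */
uint32_t masked_write(uint32_t current, uint32_t written)
{
    uint32_t mask = written >> 16;
    uint32_t value = written & 0xffff;
    return (current & ~mask & 0xffff) | (value & mask);
}

/* Dword that sets 'bit' and touches nothing else. */
uint32_t masked_bit_enable(uint32_t bit)
{
    assert((bit & ~0xffffu) == 0);
    return bit | (bit << 16);
}

/* Dword that clears 'bit' and touches nothing else. */
uint32_t masked_bit_disable(uint32_t bit)
{
    assert((bit & ~0xffffu) == 0);
    return bit << 16;
}
```

For example, after enabling one bit at initialization, a later `masked_bit_enable` or `masked_bit_disable` on a different bit leaves the first bit set, which is why the setting only has to be programmed once.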