Commit graph

62693 commits

Author SHA1 Message Date
Jason Ekstrand
e50cf5faa5 i965/generator: Get rid of the ! in the unreachable statement
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
2015-04-02 10:21:18 -07:00
Jason Ekstrand
0573d0e484 nir/print: Correctly print swizzles for explicitly sized alu sources
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-04-02 10:21:18 -07:00
Ilia Mirkin
4a3c0e9950 freedreno/a3xx: add MRT support
The hardware only supports 4 MRTs. It should be possible to emulate
support for 8, but doesn't seem worth the trouble.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-02 00:09:14 -04:00
Ilia Mirkin
6f4c1976f4 freedreno: convert blit program to array for each number of rts
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-02 00:09:14 -04:00
Ilia Mirkin
d9992ab35a freedreno: add support for laying out MRTs in gmem
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-02 00:09:14 -04:00
Ilia Mirkin
602bc6c88d freedreno: add core infrastructure support for MRTs
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-02 00:09:14 -04:00
Ilia Mirkin
d13803c76f freedreno/ir3: add support for FS_COLOR0_WRITES_ALL_CBUFS property
This will enable the driver to tell which regids to link up to which
MRT outputs.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-02 00:09:14 -04:00
Ilia Mirkin
f27ec59084 freedreno/a3xx: add independent blend function support
This is needed for MRT support

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-02 00:09:14 -04:00
Ilia Mirkin
8efa3e340d freedreno: remove alpha key from ir3_shader
This complication is unnecessary and makes MRTs more complicated and
likely to generate tons of variants.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-02 00:09:14 -04:00
Stéphane Marchesin
70eed78cac i915g: Implement EGL_EXT_image_dma_buf_import
This adds all the plumbing to get EGL_EXT_image_dma_buf_import in
i915g.

Signed-off-by: Stéphane Marchesin <marcheu@chromium.org>
2015-04-01 20:13:37 -07:00
Matt Turner
a03d0ba78f i965/fs: Relax type check in cmod propagation.
The thing we want to avoid is int/float comparisons, but int/unsigned
comparisons with 0 are equivalent.

total instructions in shared programs: 6194829 -> 6193996 (-0.01%)
instructions in affected programs:     117192 -> 116359 (-0.71%)
helped:                                471

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-04-01 13:43:57 -07:00
Matt Turner
781badee7a nir: Remove useless ftrunc inside f2i/f2u.
No shader-db changes, probably because they're all removed by the GLSL
compiler optimization added in commit 69ad5fd4.

Reviewed-by: Eric Anholt <eric@anholt.net>
2015-04-01 13:43:57 -07:00
Matt Turner
97e6c1b957 nir: Recognize (a < b || a < c) as a < max(b, c).
Doesn't work for analogous && cases, because of NaNs.

total instructions in shared programs: 6195712 -> 6194829 (-0.01%)
instructions in affected programs:     42000 -> 41117 (-2.10%)
helped:                                403

Reviewed-by: Eric Anholt <eric@anholt.net>
2015-04-01 13:43:57 -07:00
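The rewrite in the commit above, and the reason its && counterpart is unsafe, can be checked with a small Python sketch. The helper names are hypothetical, and the sketch assumes that min/max return the non-NaN operand when exactly one input is NaN (as C's fmax/fmin do); the commit message itself only says "because of NaNs".

```python
import math

def fmax(x, y):
    """Max that returns the non-NaN operand if one input is NaN (assumed semantics)."""
    if math.isnan(x):
        return y
    if math.isnan(y):
        return x
    return x if x > y else y

def fmin(x, y):
    """Min that returns the non-NaN operand if one input is NaN (assumed semantics)."""
    if math.isnan(x):
        return y
    if math.isnan(y):
        return x
    return x if x < y else y

def or_form(a, b, c):
    return a < b or a < c

def or_rewritten(a, b, c):
    return a < fmax(b, c)

def and_form(a, b, c):
    return a < b and a < c

def and_rewritten(a, b, c):
    # The analogous rewrite that the commit deliberately does not perform.
    return a < fmin(b, c)

nan = float("nan")
# OR case: both forms agree even when b is NaN.
print(or_form(1.0, nan, 2.0), or_rewritten(1.0, nan, 2.0))
# AND case: a < NaN is false, but fmin drops the NaN, so the forms disagree.
print(and_form(1.0, nan, 2.0), and_rewritten(1.0, nan, 2.0))
```

Under these assumptions the || form is safe because a comparison against NaN is false either way, while the && form silently discards the NaN that should have forced the result to false.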
Matt Turner
a2b6e908cf nir: Add addition/multiplication identities of exp/log.
instructions in affected programs:     2858 -> 2808 (-1.75%)
helped:                                12

Reviewed-by: Eric Anholt <eric@anholt.net>
2015-04-01 13:43:57 -07:00
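The commit above does not spell out which identities it adds. As an assumption, the natural candidates are exp2(a) * exp2(b) = exp2(a + b) and log2(a) + log2(b) = log2(a * b), each of which trades two transcendental operations for one. A minimal numeric check in Python, with hypothetical function names:

```python
import math

def exp2_product(a, b):
    # Two exponentials followed by a multiply.
    return 2.0 ** a * 2.0 ** b

def exp2_of_sum(a, b):
    # One add followed by a single exponential.
    return 2.0 ** (a + b)

def log2_sum(a, b):
    # Two logarithms followed by an add (a and b must be positive).
    return math.log2(a) + math.log2(b)

def log2_of_product(a, b):
    # One multiply followed by a single logarithm.
    return math.log2(a * b)

print(math.isclose(exp2_product(3.0, 2.0), exp2_of_sum(3.0, 2.0)))
print(math.isclose(log2_sum(8.0, 4.0), log2_of_product(8.0, 4.0)))
```

The two sides are equal in real arithmetic but may round differently in floating point, so the check uses a tolerance rather than exact equality.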
Matt Turner
099c729b4c nir: Add identities for the log function.
The rcp(log(x)) pattern affects instruction counts.

instructions in affected programs:     144 -> 138 (-4.17%)
helped:                                6

Reviewed-by: Eric Anholt <eric@anholt.net>
2015-04-01 13:43:57 -07:00
Matt Turner
8a6ae384b2 nir: Add identities for the exponential function.
No changes in shader-db.

Reviewed-by: Eric Anholt <eric@anholt.net>
2015-04-01 13:43:57 -07:00
Matt Turner
e26783d445 nir: Recognize another open coded lrp.
total instructions in shared programs: 6195924 -> 6195768 (-0.00%)
instructions in affected programs:     4876 -> 4720 (-3.20%)
helped:                                58
HURT:                                  10

Reviewed-by: Eric Anholt <eric@anholt.net>
2015-04-01 13:43:57 -07:00
Matt Turner
e82437e141 nir: Recognize open coded lrp.
total instructions in shared programs: 6197614 -> 6195924 (-0.03%)
instructions in affected programs:     34773 -> 33083 (-4.86%)
helped:                                147
HURT:                                  6

Reviewed-by: Eric Anholt <eric@anholt.net>
2015-04-01 13:43:57 -07:00
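The two lrp commits above do not quote the patterns they match. As an illustration only, the Python sketch below shows two common ways shaders open-code a linear interpolation and checks that both agree with a reference lrp; the function names and the choice of patterns are assumptions.

```python
def lrp(x, y, t):
    """Reference linear interpolation: x when t == 0, y when t == 1."""
    return x * (1.0 - t) + y * t

def open_coded_weighted_sum(x, y, t):
    # Weighted sum written out by hand, with the subtraction as add-of-negate.
    return x * (1.0 + (-t)) + y * t

def open_coded_offset(x, y, t):
    # Start at x and move a fraction t of the way towards y.
    return x + t * (y - x)

# Values chosen to be exactly representable, so all three forms match exactly.
print(lrp(2.0, 10.0, 0.25))
print(open_coded_weighted_sum(2.0, 10.0, 0.25))
print(open_coded_offset(2.0, 10.0, 0.25))
```

Recognizing either form lets the backend emit a single interpolation instruction where the hardware has one, which is consistent with the instruction-count drops reported above.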
Kenneth Graunke
25e214db00 nir: Use _mesa_flsll(InputsRead) in prog->nir.
InputsRead is a 64-bit bitfield.  Using _mesa_fls would silently
truncate off the high bits, claiming inputs 32..56 (VARYING_SLOT_MAX)
were never read.

Using <= here was a hack I threw in at the last minute to fix programs
which happened to use input slot 32.  Switch back to using < now that
the underlying problem is fixed.

Fixes crashes in "Euro Truck Simulator 2" when using prog->nir, which
uses input slot 33.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-01 13:30:13 -07:00
Kenneth Graunke
3d166b313d mesa: Implement _mesa_flsll().
This is _mesa_fls() for 64-bit values.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-01 13:30:13 -07:00
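The truncation problem fixed by the two commits above can be modelled in a few lines of Python. This is a sketch of the semantics only, not Mesa's implementation: fls32 and fls64 are stand-in names, and "find last set" is taken to mean the 1-based position of the highest set bit, with 0 for an empty mask.

```python
def fls32(value):
    """Find-last-set on a value truncated to 32 bits (models the old behaviour)."""
    return (value & 0xFFFFFFFF).bit_length()

def fls64(value):
    """Find-last-set on the full 64-bit value (models the fixed behaviour)."""
    return (value & 0xFFFFFFFFFFFFFFFF).bit_length()

# A bitfield with input slots 0 and 33 in use.
inputs_read = (1 << 33) | (1 << 0)

print(fls32(inputs_read))  # slot 33 is lost to truncation
print(fls64(inputs_read))  # slot 33 is seen

# With the 64-bit variant, a plain "<" loop bound covers every used slot.
slots = [i for i in range(fls64(inputs_read)) if inputs_read & (1 << i)]
print(slots)
```

With the 32-bit helper the loop bound stops below slot 33, which matches the symptom described above of high inputs appearing never to be read.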
Kenneth Graunke
4b38c5c783 nir: In prog->nir, don't wrap dot products with ptn_channel(..., X).
ptn_move_dest and nir_fadd already take care of replicating the last
channel out, so we can just use a scalar and skip splatting it.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2015-04-01 13:30:13 -07:00
Jason Ekstrand
218e45e2f7 i965: Use the same nir options for all gens
If we tell NIR to split ffma's, then we don't need separate options
anymore.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-01 12:51:04 -07:00
Jason Ekstrand
b9d7454571 i965/nir: Run DCE again before going out of SSA
We run lowering and optimization passes that might leave garbage lying
around. This keeps the FS cse from having to clean it up.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-01 12:51:04 -07:00
Jason Ekstrand
37703040a1 i965/nir: Run the ffma peephole after the rest of the optimizations
The idea here is that fusing multiply-add combinations too early can reduce
our ability to perform CSE and value-numbering.  Instead, we split ffma
opcodes up-front, hope CSE cleans up, and then fuse after-the-fact.
Unless an algebraic pass does something silly where it inserts something
between the multiply and the add, splitting and re-fusing should never
cause a problem.  We run the late algebraic optimizations after this so
that things like compare-with-zero don't hurt our ability to fuse things.

shader-db results for fragment shaders on Haswell:
total instructions in shared programs: 4390538 -> 4379236 (-0.26%)
instructions in affected programs:     989359 -> 978057 (-1.14%)
helped:                                5308
HURT:                                  97
GAINED:                                78
LOST:                                  5

This does, unfortunately, cause some substantial hurt to a shader in Kerbal
Space Program.  However, the damage is caused by changing a single
instruction from a ffma to an add.  This, in turn, *decreases* register
pressure in one part of the program causing it to fail to register allocate
and spill.  Given the overwhelmingly positive results in other shaders and
the fact that the NIR for the Kerbal shaders is actually better, this
should be considered a positive.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-01 12:51:04 -07:00
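The split-then-re-fuse idea in the commit above can be illustrated with a toy expression rewriter in Python. Everything here is hypothetical (tuple-based expressions, the helper names, and a fuser that only looks at a multiply in the left operand of an add); it is not the NIR pass, only a sketch of why mixed fused and unfused forms hide common subexpressions.

```python
def split_ffma(expr):
    """Rewrite ('ffma', a, b, c) as ('fadd', ('fmul', a, b), c), recursively."""
    if not isinstance(expr, tuple):
        return expr
    op, *args = expr
    args = [split_ffma(arg) for arg in args]
    if op == "ffma":
        a, b, c = args
        return ("fadd", ("fmul", a, b), c)
    return (op, *args)

def fuse_ffma(expr):
    """Rewrite ('fadd', ('fmul', a, b), c) back into ('ffma', a, b, c)."""
    if not isinstance(expr, tuple):
        return expr
    op, *args = expr
    args = [fuse_ffma(arg) for arg in args]
    # Toy restriction: only a multiply in the left operand is fused.
    if op == "fadd" and isinstance(args[0], tuple) and args[0][0] == "fmul":
        _, a, b = args[0]
        return ("ffma", a, b, args[1])
    return (op, *args)

def unique_ops(exprs):
    """Distinct operation nodes, i.e. what an ideal CSE would leave behind."""
    seen = set()
    def visit(expr):
        if isinstance(expr, tuple) and expr not in seen:
            seen.add(expr)
            for arg in expr[1:]:
                visit(arg)
    for expr in exprs:
        visit(expr)
    return seen

# The same value computed twice, once fused and once unfused.
program = [("ffma", "a", "b", "c"), ("fadd", ("fmul", "a", "b"), "c")]
split = [split_ffma(e) for e in program]
refused = [fuse_ffma(e) for e in split]

print(len(unique_ops(program)))  # fused and unfused forms look unrelated
print(len(unique_ops(split)))    # after splitting they are identical
print(len(unique_ops(refused)))  # after re-fusing a single ffma remains
```

In this toy case the program goes from three distinct operations to one, because splitting first lets the two computations be recognized as the same value before anything is fused.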
Jason Ekstrand
7f344721b1 nir/peephole_ffma: Be less aggressive about fusing multiply-adds
shader-db results for fragment shaders on Haswell:
total instructions in shared programs: 4395688 -> 4389623 (-0.14%)
instructions in affected programs:     355876 -> 349811 (-1.70%)
helped:                                1455
HURT:                                  14
GAINED:                                5
LOST:                                  0

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-01 12:51:04 -07:00
Jason Ekstrand
a8c8b3b872 nir: Add a dedicated ffma peephole optimization
i965/nir: Use the dedicated ffma peephole

total instructions in shared programs: 4418748 -> 4394618 (-0.55%)
instructions in affected programs:     1292790 -> 1268660 (-1.87%)
helped:                                5999
HURT:                                  457
GAINED:                                4
LOST:                                  9

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-01 12:51:04 -07:00
Jason Ekstrand
e06a3d0282 nir: Move the compare-with-zero optimizations to the late section
total instructions in shared programs: 4422307 -> 4422363 (0.00%)
instructions in affected programs:     4230 -> 4286 (1.32%)
helped:                                0
HURT:                                  12

While this does hurt some things, the losses are minor and it prevents the
compare-with-zero optimization from fighting with ffma which is much more
important.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-01 12:51:03 -07:00
Jason Ekstrand
da294f9b2f nir/algebraic: Add a separate section for "late" optimizations
i965/nir: Use the late optimizations

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-01 12:51:03 -07:00
Jason Ekstrand
1779dc060f nir/algebraic: Remove a duplicate optimization
This optimization is repeated verbatim above

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-01 12:51:03 -07:00
Jason Ekstrand
22ee7eeb4e nir/algebraic: #define around structure definitions
Previously, we couldn't generate two algebraic passes in the same file
because of multiple structure definitions.  To solve this, we play the
age-old header file trick and just #define around it.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-01 12:51:03 -07:00
Jason Ekstrand
793a94d6b5 nir/print: Don't print extra swizzle components
Previously, NIR would just print 4 swizzle components if the swizzle was
anything other than foo.xyzw.  This creates lots of noise if, for example,
you have a one-component element with a swizzle of foo.xxxx.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-04-01 12:49:49 -07:00
Emil Velikov
d99135b2e9 configure: nuke --with-max-{width,height}
Unused as of commit 630ab0d27ba (mesa: remove last of MAX_WIDTH,
MAX_HEIGHT). Update all the remaining references to the defines.

v2: Use the correct variable name in the comments

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-04-01 19:43:34 +00:00
Emil Velikov
bd4925c6ac gallium: ship tgsi_to_nir.h in the tarball
Acked-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-04-01 19:33:37 +00:00
Matt Turner
3384179faa glsl: Make sure not to dereference NULL.
Found by Coverity.
2015-04-01 12:25:29 -07:00
Laura Ekstrand
142909f19d main: create_buffers unlocks mutex when throwing OUT_OF_MEMORY.
Ilia Mirkin found that I had forgotten to free the mutex in the error case.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-01 12:07:28 -07:00
Jose Fonseca
3321724c10 automake,scons: Put NIR source files in a separate var to fix SCons build.
SCons does not build NIR yet.

Trivial.
2015-04-01 19:49:09 +01:00
Jose Fonseca
7f0682cebf automake: Fix out-of-source builds.
Add include path for generated nir_opcodes.h.

Trivial.
2015-04-01 19:48:09 +01:00
Brian Paul
1625d7a87a mesa: don't include colormac.h in format code
Acked-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
2015-04-01 12:04:28 -06:00
Brian Paul
2768a0b1b4 mesa: remove unneeded #include of colormac.h
Acked-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
2015-04-01 12:04:28 -06:00
Brian Paul
f1d55017d7 tnl: remove unneeded #include of colormac.h
Acked-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
2015-04-01 12:04:28 -06:00
Brian Paul
8ac9407a83 swrast: remove unneeded #include of colormac.h
Acked-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
2015-04-01 12:04:28 -06:00
Brian Paul
2ad8af1a0c mesa: remove unused macros from colormac.h
Acked-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
2015-04-01 12:04:28 -06:00
Eric Anholt
15b03b7964 nir: Recognize a pattern of bool frobbing from TGSI KILL_IF.
TGSI's conditional discards take a float arg and negate it, so GLSL to TGSI
generates a b2f and negates that value.  Only, in NIR we want a proper
bool once again, so we compare with 0.  This is a lot of pointless extra
instructions.

total instructions in shared programs: 39735 -> 39702 (-0.08%)
instructions in affected programs:     1342 -> 1309 (-2.46%)

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-04-01 10:57:01 -07:00
Eric Anholt
6e8d4a2f80 nir: Recognize a pattern for doing b2f without the opcode.
Since we have patterns based on b2f, generate them if we see the b2f
equivalent using an iand.  This is common when generating NIR from TGSI.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-04-01 10:57:01 -07:00
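The "b2f equivalent using an iand" in the commit above relies on a bit trick that a short Python sketch can make concrete. It assumes 32-bit booleans encoded as all-zeros for false and all-ones for true, which the commit message does not state; the constant and function names are hypothetical.

```python
import struct

NIR_FALSE = 0x00000000  # assumed encoding of false
NIR_TRUE = 0xFFFFFFFF   # assumed encoding of true (all bits set)

# Bit pattern of the 32-bit float 1.0.
ONE_F_BITS = struct.unpack("<I", struct.pack("<f", 1.0))[0]

def bits_to_float(bits):
    """Reinterpret a 32-bit integer pattern as an IEEE single-precision float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def b2f_via_iand(boolean_bits):
    # ANDing with the bits of 1.0 yields 1.0 for true and 0.0 for false.
    return bits_to_float(boolean_bits & ONE_F_BITS)

print(hex(ONE_F_BITS))
print(b2f_via_iand(NIR_TRUE), b2f_via_iand(NIR_FALSE))
```

Rewriting this integer AND as an explicit b2f is what lets the existing b2f-based patterns, such as the KILL_IF one above, match code that came from TGSI.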
Eric Anholt
26261bca21 vc4: Add shader-db dumping of NIR instruction count.
I was previously using temporary disables of VC4 optimization to show the
benefits of improved NIR optimization, but this can get me quick and dirty
numbers for NIR-only improvements without having to add hacks to disable
VC4's code (disabling of which might hide ways that the NIR changes would
hurt actual VC4 codegen).
2015-04-01 10:57:01 -07:00
Eric Anholt
73e2d4837d vc4: Convert to consuming NIR.
NIR brings us better optimization than I would have bothered to write
within the driver, developers sharing future optimization work, and the
ability to share device-specific lowering code that we and other
GLES2-level drivers need.

total uniforms in shared programs: 13421 -> 13422 (0.01%)
uniforms in affected programs:     62 -> 63 (1.61%)
total instructions in shared programs: 39961 -> 39707 (-0.64%)
instructions in affected programs:     15494 -> 15240 (-1.64%)

v2: Add missing imov support, and assert that there are no dest saturates.
v3: Rebase on the target-specific algebraic series.
v4: Rebase on gallium-includes-from-NIR changes in master.
v5: Rebase on variables being in lists instead of hash tables.
v6: Squash in intermediate changes that used the NIR-to-TGSI pass (which
    I'm not committing)
2015-04-01 10:57:01 -07:00
Eric Anholt
783ad697d2 gallium: Add tgsi_to_nir to get a nir_shader for a TGSI shader.
This will be used by the VC4 driver for doing device-independent
optimization, and hopefully eventually replacing its whole IR.  It also
may be useful to other drivers for the same reason.

v2: Add all of the instructions I was relying on tgsi_lowering to remove,
    and more.
v3: Rebase on SSA rework of the builder.
v4: Use the NIR ineg operation instead of doing a src modifier.
v5: Don't use ineg for fnegs.  (infer_src_type on MOV doesn't do what I
    expect, again).
v6: Fix handling of multi-channel KILL_IF sources.
v7: Make ttn_get_f() return a swizzle of a scalar load_const, rather than
    a vector load_const.  CSE doesn't recognize that srcs out of those
    channels are actually all the same.
v8: Rebase on nir_builder auto-sizing, make the scalar arguments to
    non-ALU instructions actually be scalars.
v9: Add support for if/loop instructions, additional texture targets, and
    untested support for indirect addressing on temps.
v10: Rebase on master, drop bad comment about control flow and just choose
     the X channel, use int comparison opcodes in LIT for now, drop unused
     pipe_context argument.
v11: Fix translation of LRP (previously missed because I mis-translated
     back out), use nir_builder init helpers.
v12: Rebase on master, adding explicit include of mtypes.h to get
     INTERP_QUALIFIER_*
v13: Rebase on variables being in lists instead of hash tables, drop use
     of mtypes.h in favor of util/pipeline.h.  Use Ken's nir_builder
     swizzle and fmov/imov_alu helpers, drop "struct" in front of
     nir_builder, use nir_builder directly as the function arg in a lot of
     cases, drop redundant members of ttn_compile that are also in
     nir_builder, drop some half-baked malloc failure handling.
v14: The indirect uniform src0 should be scalar, not vector (noticed as
     odd by robclark, confirmed by cwabbott).  Apply Ken's review to
     initialize s->num_uniforms and friends, skip ttn_channel for dot
     products, and use the simpler discard_if intrinsic.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v13)
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-04-01 10:57:01 -07:00
Eric Anholt
486dcfbbd9 vc4: Tell shader-db how big our UBOs are, if present.
I had regressed them for a while with the NIR work.
2015-04-01 10:57:01 -07:00
Eric Anholt
a3a07d46d1 mesa: Make a shared header for 3D pipeline enum / #defines.
NIR uses these enums/#defines in nir_variables and associated intrinsics,
but I want to be able to use them from TGSI->NIR and NIR->TGSI.
Otherwise, we had to pull in all of mtypes.h.

This doesn't cover all of the enums we might want from a shared compiler
core (like varying slots or vert attribs), but it at least covers what I
need at the moment (system values and interp qualifiers).

v2: Move to src/glsl since util/ is really vague.  Include in Makefile.am
    list.  Use plain bitshifts and stdint types instead of undefined
    BITFIELD64_BIT.
v3: Rename to shader_enums.h. Move it into Makefile.sources.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v2, with
             recommendation to rename)
2015-04-01 10:57:01 -07:00
Emil Velikov
5604d7675e nir: add nir_builder.h to the tarball
The header was added with commit 2a135c470e3 (nir: Add an ALU op builder
kind of like ir_builder.h) but did not make it into the sources list.

Fortunately it remained unused until a recent commit faf6106c6f6 (nir:
Implement a Mesa IR -> NIR translator.)

v2: Remove the bogus dependency. Tweak commit message.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-01 14:46:42 +01:00