Commit graph

67714 commits

Author SHA1 Message Date
Matt Turner
c638ea3d19 i965: Don't make instructions with a null dest a barrier to scheduling.
Now that we properly track accumulator dependencies, the scheduler is
able to schedule instructions between the mach and mov in the common
the integer multiplication pattern:

   mul  acc0, x, y
   mach null, x, y
   mov  dest, acc0

Since a null destination implies no dependency on the destination, we
can also safely schedule instructions (that don't write the accumulator)
between the mul and mach.

GAINED:                                103
LOST:                                  43

Causes one program to spill (643 -> 1076 instructions).

I committed this patch last year (commit 42a26cb5) but reverted it
(commit 0d3f83f4) after inexplicable artifacts in Kerbal Space Program
(bug 78648). Tapani reapplied this patch and could not reproduce the bug
with current Mesa.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-01-23 17:57:39 -08:00
Ian Romanick
f02f1af9f7 i965/fs: Allow SIMD16 on pre-SNB when try_replace_with_sel is successful
If try_replace_with_sel is able to replace the flow control with a SEL
instruction, then there is no flow control... failing SIMD16 because
of nonexistent flow control is wrong.

No piglit regressions on any i965 platform in Jenkins.

total instructions in shared programs: 4382707 -> 4382707 (0.00%)
instructions in affected programs:     0 -> 0
helped:                                0
HURT:                                  0
GAINED:                                2089
LOST:                                  0

No other platforms affected in shader-db.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-01-23 17:34:47 -08:00
Eric Anholt
0680d170d1 nir: Expose nir_print_instr() for debug prints
It's nice to have this present in your default cases so you can see what
instruction is triggering an abort.

v2: Just pass a NULL state, now that it won't crash when you do.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-01-23 17:30:11 -08:00
Eric Anholt
6445a40520 nir: When asked to print with a NULL state, just use bare variable names.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-01-23 17:30:01 -08:00
Eric Anholt
447ddfc137 nir: Add nir_lower_alu_to_scalar.
This is the equivalent of brw_fs_channel_expressions.cpp, which I wanted
for vc4.

v2: Use the nir_src_for_ssa() helper, and another instance of
    nir_alu_src_copy().
v3: Drop the non-SSA support.  All intended callers will have SSA-only ALU
    ops.
v4: Use insert_before, drop stale bcsel/fcsel comment, drop now-unused
    unsupported() function, drop lower_context struct.
v5: Completely rename the pass to nir_lower_alu_to_scalar(), add an assert
    about weird input_sizes[].

Reviewed-by: Jason Ekstrand <jason.ekstrand@iastate.edu>
2015-01-23 16:37:23 -08:00
Eric Anholt
b200127816 nir: Make some helpers for copying ALU src/dests.
There aren't many users yet, but I wanted to do this from my scalarizing
pass.

v2: Constify the src arguments.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-01-23 16:37:16 -08:00
Kenneth Graunke
15063d2ad0 nir: Add algebraic optimizations for division and reciprocal.
These also exist in opt_algebraic.cpp.

total NIR instructions in shared programs: 2011430 -> 2011211 (-0.01%)
NIR instructions in affected programs:     42221 -> 42002 (-0.52%)
helped:                                    198

total i965 instructions in shared programs: 6020553 -> 6020116 (-0.01%)
i965 instructions in affected programs:     84322 -> 83885 (-0.52%)
helped:                                     394
HURT:                                       1 (by 1 instruction)

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-01-23 14:53:26 -08:00
Kenneth Graunke
bbd60f6d79 nir: Add algebraic optimizations for exponential/logarithmic functions.
Most of these exist in the GLSL IR algebraic pass already.  However,
SSA allows us to find more instances of the patterns.

total NIR instructions in shared programs: 2015593 -> 2011430 (-0.21%)
NIR instructions in affected programs:     124189 -> 120026 (-3.35%)
helped:                                    604

total i965 instructions in shared programs: 6025505 -> 6018717 (-0.11%)
i965 instructions in affected programs:     261295 -> 254507 (-2.60%)
helped:                                     1295
HURT:                                       3

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-01-23 14:53:26 -08:00
Kenneth Graunke
391fb32bbe nir: Add algebraic optimizations for simplifying comparisons.
The first batch removes bonus fnot/inot operations, possibly allowing
other optimizations to better recognize patterns.

The next batch replaces a fadd and constant 0.0 with an fneg - negation
is usually free on GPUs, while addition is not.

total NIR instructions in shared programs: 2020814 -> 2015593 (-0.26%)
NIR instructions in affected programs:     411143 -> 405922 (-1.27%)
helped:                                    2233
HURT:                                      214

A few shaders are hurt by a few instructions due to moving neg such
that it has a constant operand, which is then folded, resulting in two
distinct load_consts for x and -x.  We can always clean that up later.

total i965 instructions in shared programs: 6035392 -> 6025505 (-0.16%)
i965 instructions in affected programs:     784980 -> 775093 (-1.26%)
helped:                                     4508
HURT:                                       2

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-01-23 14:53:26 -08:00
Kenneth Graunke
551a752a59 nir: Add algebraic optimizations for pointless shifts.
The GLSL IR optimization pass contained these; we may as well include
them too.

v2: Fix a >> 0 and a << 0 optimizations (caught by Matt).

No change in the number of NIR instructions on a shader-db run.

total i965 instructions in shared programs: 6035397 -> 6035392 (-0.00%)
i965 instructions in affected programs:     542 -> 537 (-0.92%)
helped:                                     2 (in glamor)

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-01-23 14:53:26 -08:00
Kenneth Graunke
3e56572c49 nir: Add a bunch of algebraic optimizations on logic/bit operations.
Matt and I noticed a bunch of "val <- ior a a" operations in a shader,
so we decided to add an algebraic optimization for that.  While there,
I decided to add a bunch more of them.

v2: Delete bogus fand/for optimizations (caught by Jason).

total NIR instructions in shared programs: 2023511 -> 2020814 (-0.13%)
NIR instructions in affected programs:     149634 -> 146937 (-1.80%)
helped:                                    1032

total i965 instructions in shared programs: 6035392 -> 6035397 (0.00%)
i965 instructions in affected programs:     537 -> 542 (0.93%)
HURT:                                       2

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-01-23 14:53:26 -08:00
Kenneth Graunke
978b0a9cda nir: Implement CSE on intrinsics that can be eliminated and reordered.
Matt and I noticed that one of the shaders hurt by INTEL_USE_NIR=1 had
load_input and load_uniform intrinsics repeated several times, with the
same parameters, but each one generating a distinct SSA value.  This
made ALU operations on those values appear distinct as well.

Generating distinct SSA values is silly - these are read only variables.
CSE'ing them makes everything use a single SSA value, which then allows
other operations to be CSE'd away as well.

Generalizing a bit, it seems like we should be able to safely CSE any
intrinsics that can be eliminated and reordered.  I didn't implement
support for variables for the time being.

v2: Assert that info->num_variables == 0 (requested by Jason).

total NIR instructions in shared programs: 2435936 -> 2023511 (-16.93%)
NIR instructions in affected programs:     2413496 -> 2001071 (-17.09%)
helped:                                    16872

total i965 instructions in shared programs: 6028987 -> 6008427 (-0.34%)
i965 instructions in affected programs:     640654 -> 620094 (-3.21%)
helped:                                     2071
HURT:                                       585
GAINED:                                     14
LOST:                                       25

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-01-23 14:53:26 -08:00
Kenneth Graunke
cbdd623f13 nir: Pull nir_instr_can_cse()'s SSA checks out of the switch.
This should not be a change in behavior, as all current cases that
potentially answer "yes" require SSA.

The next patch will introduce another case that requires SSA.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-01-23 14:53:26 -08:00
Kenneth Graunke
d7743bb1c2 i965/nir: Report NIR instruction counts (in SSA form) via KHR_debug.
This allows us to count NIR instructions via shader-db.

Use "run" as normal.  The results file will contain both NIR and
assembly.

Then, to generate a NIR report:
./report.py <(grep    NIR results/foo) <(grep    NIR results/bar)

Or, to generate an i965 report:
./report.py <(grep -v NIR results/foo) <(grep -v NIR results/bar)

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-01-23 14:53:26 -08:00
Kenneth Graunke
f3e06fcc6a i965/nir: Print NIR on INTEL_DEBUG=fs.
This is useful for debugging and looking for optimization opportunities.

It will need to be expanded when we add support for other scalar stages.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-01-23 14:53:26 -08:00
Kenneth Graunke
faa38e16aa i965/nir: Do optimizations again just before lowering source mods.
We want to run CSE and algebraic optimizations again after lowering IO.
Some of the passes in the optimization loop don't handle saturates and
other modifiers, so run it before lowering to source modifiers.

total instructions in shared programs: 6046190 -> 6045768 (-0.01%)
instructions in affected programs:     22406 -> 21984 (-1.88%)
helped:                                47
HURT:                                  0
GAINED:                                0
LOST:                                  0

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-01-23 14:53:25 -08:00
Matt Turner
9b5efac461 loader: Remove NEED_OPENGL_COMMON check.
HAVE_DRICOMMON is sufficient since OpenGL must be enabled for DRI.
2015-01-23 14:28:44 -08:00
Matt Turner
2e7b62cbb9 gitignore: Ignore .tar.xz files. 2015-01-23 14:28:44 -08:00
Matt Turner
dd6f641303 mesa: Build with subdir-objects. 2015-01-23 14:28:44 -08:00
Matt Turner
145919b2ab glsl: Build a libglsl_util library.
Rather than sourcing files with ../dir/file.c which leads to distclean
wiping out ../dir's .deps directory.
2015-01-23 14:28:44 -08:00
Matt Turner
a37ae2ab92 mapi: Build with subdir-objects. 2015-01-23 14:28:44 -08:00
Matt Turner
961def1074 mapi: Remove vgapi from SUBDIRS.
OpenVG is disabled with via autotools.
2015-01-23 14:28:44 -08:00
Matt Turner
ce98519266 mesa: Drop inclusion of glapi_gen.mk.
Some glapi headers used to be generated from this Makefile.am, but no
longer.
2015-01-23 14:28:43 -08:00
Matt Turner
618c3b35f1 glsl: Build with subdir-objects.
Apparently $(top_srcdir) is not expanded in a source list when using
subdir-objects, so remove that. It's not clear to me why we were going
to such lengths to prefix each source file anyway.
2015-01-23 14:28:42 -08:00
Matt Turner
a8b880bd63 nir: Add headers to distribution. 2015-01-23 14:27:39 -08:00
Matt Turner
ae494281a4 nir: Add nir_{opt_,}algebraic.py to distribution. 2015-01-23 14:26:53 -08:00
Matt Turner
4db329ddff mesa: Add format_{un,}pack.py to distribution. 2015-01-23 14:26:53 -08:00
Matt Turner
195488e945 mesa: Remove pack_tmp.h from sources.
Missed in commit 3a4de321.
2015-01-23 13:35:25 -08:00
Connor Abbott
68a9d0b36f nir: add generated file to .gitignore
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-01-23 10:20:46 -08:00
Ville Syrjälä
f4b31d29d7 i965: Fix min_vs_entries for CHV
According to BSpec the correct number for min_vs_entries is 34 for CHV.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
2015-01-23 12:09:41 +02:00
Ville Syrjälä
99754446ab i965: Fix max_wm_threads for CHV
Change max_wm_threads to match the spec on CHV. The max number of
threads in 3DSTATE_PS is always programmed to 64 and the hardware
internally scales that depending on the GT SKU. So this doesn't
change the max number of threads actually used, but it does affect
the scratch space calculation.

On CHV the old value was too small, so the amount of scratch space
allocated wasn't sufficient to satisfy the actual max number of
threads used.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
2015-01-23 12:09:35 +02:00
Connor Abbott
c8761c8559 glsl: fix stale comment
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
2015-01-23 00:23:51 -05:00
Jason Ekstrand
6be2434031 i965/emit: Assert that src1 is not an MRF after doing the MRF->GRF conversion
When emitting texturing from indirect texture units, we need to be able to
scratch around in the header message.  Since we only do this for >= HSW,
this is ok since there are no MRFs.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj phogat <anuj.phogat@gmail.com>
2015-01-22 16:00:34 -08:00
Jason Ekstrand
7de8a3e13e i965/emit: Do the sampler index adjustment directly in header.0.3
Prior to this commit, the adjust_sampler_state_pointer function took an
extra register that it could use as scratch space.  The usual candidate was
the destination of the sampler instruction.  However, if that register ever
aliased anything important such as the sampler index, this would scratch
all over important data.  Fortunately, the calculation is such that we can
just do it in place and we don't need the scratch space at all.

Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2015-01-22 15:19:13 -08:00
Axel Davy
8751734613 st/nine: Correctly handle when ff vs should have no texture coord input/output
Previous code semantic was:

. if ff ps will not run a ff stage, then do not output texture coords for this stage
for vs
. if XYZRHW is used (position_t), use only the mode where input coordinates are copied
to the outputs.

Problem is when apps don't give texture inputs. When apps precise PASSTHRU, it means
copy texture coord input to texture coord output if there is such input. The case
where there is no texture coord input wasn't handled correctly.

Drivers like r300 dislike when vs has inputs that are not fed.

Moreover if the app uses ff vs with a programmable ps, we shouldn't look at
what are the parameters of the ff ps to decide to output or not texture
coordinates.

The new code semantic is:

. if XYZRHW is used, restrict to PASSTHRU
. if PASSTHRU is used and no texture input is declared, then do not output
texture coords for this stage

The case where ff ps needs a texture coord input and ff vs doesn't output
it is not handled, and should probably be a runtime error.

This fixes 3Dmark05, which uses ff vs with programmable ps.

Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
2015-01-22 22:16:24 +00:00
Axel Davy
77fcff37cf st/nine: Change comment relating to vertex shader inputs not matching declaration
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
2015-01-22 22:16:24 +00:00
Axel Davy
f8a74410f1 st/nine: Allocate vs constbuf buffer for indirect addressing once.
When the shader does indirect addressing on the constants,
we allocate a temporary constant buffer to which we copy
the constants from the app given user constants and
the constants filled in the shader.

This patch makes this buffer be allocated once.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Tiziano Bacocco <tizbac2@gmail.com>

Cc: "10.4" <mesa-stable@lists.freedesktop.org>
2015-01-22 22:16:24 +00:00
Axel Davy
e0f75044c8 st/nine: Allocate the correct size for the user constant buffer
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
2015-01-22 22:16:24 +00:00
Axel Davy
b9cbea9dbc st/nine: Add variables containing the size of the constant buffers
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
2015-01-22 22:16:24 +00:00
Axel Davy
a721987077 st/nine: Fix sm3 relative addressing for non-debug build
Relative addressing needs the constant buffer to get all
the correct constants, even those defined by the shader.

The code to copy the shader constants to the constant buffer
was enabled only for debug build. Enable it always.

Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
2015-01-22 22:16:23 +00:00
Axel Davy
4b7a9cfddb st/nine: Remove unused code for ps
Since constant indirect adressing is not allowed for ps,
we can remove our code to handle that.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
2015-01-22 22:16:23 +00:00
Axel Davy
9690bf33d7 st/nine: Correct rules for relative adressing and constants.
relative adressing for constants is possible only for vs float
constants.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
2015-01-22 22:16:23 +00:00
Axel Davy
bce94ce831 st/nine: Implement TEXREG2AR, TEXREG2GB and TEXREG2RGB
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
2015-01-22 22:16:23 +00:00
Axel Davy
9e23b64c15 st/nine: Implement TEXDP3TEX
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
2015-01-22 22:16:23 +00:00
Axel Davy
09eb1e901f st/nine: Implement TEXDP3
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
2015-01-22 22:16:23 +00:00
Axel Davy
f19e699368 st/nine: Implement TEXDEPTH
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
2015-01-22 22:16:23 +00:00
Axel Davy
3676ab02fb st/nine: Implement TEXM3x3SPEC
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
2015-01-22 22:16:22 +00:00
Axel Davy
2b9f079ae3 st/nine: Implement TEXM3x2TEX
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
2015-01-22 22:16:22 +00:00
Axel Davy
fdff111dc8 st/nine: implement TEXM3x2DEPTH
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
2015-01-22 22:16:22 +00:00
Axel Davy
7865210670 st/nine: Fix TEXM3x3 and implement TEXM3x3VSPEC
The fix is that this line:
"src[s] = tx->regs.vT[s];" is wrong if s doesn't start from 0.
Instead access tx->regs.vT directly when needed.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
2015-01-22 22:16:22 +00:00