Commit graph

71 commits

Author SHA1 Message Date
Jason Ekstrand
f7dcc11603 i965/fs: Add a builder argument to offset()
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Acked-by: Francisco Jerez <currojerez@riseup.net>
2015-06-30 16:13:48 -07:00
Francisco Jerez
74c2458ecf i965/fs: Migrate opt_cse to the IR builder.
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-06-09 15:18:32 +03:00
Francisco Jerez
e7069fbc70 i965/fs: Don't drop force_writemask_all and _sechalf when copying a CSE temporary.
LOAD_PAYLOAD instructions need the same treatment as any other
generator instructions, at least FB writes and typed surface messages
will need a payload built with non-zero execution controls.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-06-09 15:18:31 +03:00
Francisco Jerez
8013b8147a i965/fs: Take into account all instruction fields in CSE instructions_match().
Most of these fields affect the behaviour of the instruction so it
could actually break the program if we CSE a pair of otherwise
matching instructions with different values of these fields.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-06-09 15:18:31 +03:00
Jason Ekstrand
41868bb682 i965/fs: Rework the fs_visitor LOAD_PAYLOAD instruction
The newly reworked instruction is far more straightforward than the
original.  Before, the LOAD_PAYLOAD instruction was lowered by a the
complicated and broken-by-design pile of heuristics to try and guess
force_writemask_all, exec_size, and a number of other factors on the
sources.

Instead, we use the header_size on the instruction to denote which sources
are "header sources".  Header sources are required to be a single physical
hardware register that is copied verbatim.  The registers that follow are
considered the actual payload registers and have a width that correspond's
to the LOAD_PAYLOAD's exec_size and are treated as being per-channel.  This
gives us a fairly straightforward lowering:

 1) All header sources are copied directly using force_writemask_all and,
    since they are guaranteed to be a single register, there are no
    force_sechalf issues.

 2) All non-header sources are copied using the exact same force_sechalf
    and force_writemask_all modifiers as the LOAD_PAYLOAD operation itself.

 3) In order to accommodate older gens that need interleaved colors,
    lower_load_payload detects when the destination is a COMPR4 register
    and automatically interleaves the non-header sources.  The
    lower_load_payload pass does the right thing here regardless of whether
    or not the hardware actually supports COMPR4.

This patch commit itself is made up of a bunch of smaller changes squashed
together.  Individual change descriptions follow:

i965/fs: Rework fs_visitor::LOAD_PAYLOAD

   We rework LOAD_PAYLOAD to verify that all of the sources that count as
   headers are, indeed, exactly one register and that all of the non-header
   sources match the destination width.  We then take the exec_size for
   LOAD_PAYLOAD directly from the destination width.

i965/fs: Make destinations of load_payload have the appropreate width

i965/fs: Rework fs_visitor::lower_load_payload

   v2: Don't allow the saturate flag on LOAD_PAYLOAD instructions

i965/fs_cse: Support the new-style LOAD_PAYLOAD

i965/fs_inst::is_copy_payload: Support the new-style LOAD_PAYLOAD

i965/fs: Simplify setup_color_payload

   Previously, setup_color_payload was a a big helper function that did a
   lot of gen-specific special casing for setting up the color sources of
   the LOAD_PAYLOAD instruction.  Now that lower_load_payload is much more
   sane, most of that complexity isn't needed anymore.  Instead, we can do
   a simple fixup pass for color clamps and then just stash sources
   directly in the LOAD_PAYLOAD.  We can trust lower_load_payload to do the
   right thing with respect to COMPR4.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-05-06 10:29:30 -07:00
Jason Ekstrand
94ee908448 i965/fs: Make LOAD_PAYLOAD take a header size
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-05-06 10:29:30 -07:00
Jason Ekstrand
32af7d4188 i965/fs_inst: Add an is_copy_payload helper
This commit adds a new is_copy_payload helper to fs_inst that takes the
place of the similarly named functions in cse and register coalesce.  The
two is_copy_payload functions in CSE and register coalesce were subtly
different and potentially subtly broken.  The new version unifies the two
and should be more correct.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-05-06 10:29:30 -07:00
Jason Ekstrand
76c1086f2d i965: Change header_present to header_size in backend_instruction
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-05-06 10:29:30 -07:00
Jason Ekstrand
a9ccb14d14 i965/fs_cse: Factor out code to create copy instructions
v2: Get rid of the block parameter and make src a const reference

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-05-06 10:29:29 -07:00
Francisco Jerez
3da9f708d4 i965: Perform basic optimizations on the FIND_LIVE_CHANNEL opcode.
v2: Save some CPU cycles by doing 'return progress' rather than
    'depth++' in the discard jump special case.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-05-04 17:44:17 +03:00
Francisco Jerez
f2fad0dc80 i965: Perform basic optimizations on the BROADCAST opcode.
v2: Style fixes.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-05-04 17:44:17 +03:00
Matt Turner
3ca17e75e4 i965/fs: Correct mistake in determining whether a MUL is negated.
a * b is equivalent to -a * -b, and the previous code was failing at
that.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89961
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-04-14 12:16:03 -07:00
Matt Turner
47c4b38540 i965/fs: Allow CSE to handle MULs with negated arguments.
mul x, -y is equivalent to mul -x, y; and mul x, y is the negation of
mul x, -y.

With NIR:
total instructions in shared programs: 6167779 -> 6161193 (-0.11%)
instructions in affected programs:     983511 -> 976925 (-0.67%)
helped:                                4106
HURT:                                  16
GAINED:                                18
LOST:                                  7

Without NIR:
total instructions in shared programs: 6192323 -> 6185299 (-0.11%)
instructions in affected programs:     987875 -> 980851 (-0.71%)
helped:                                4146
HURT:                                  16
GAINED:                                16
LOST:                                  0
2015-03-31 14:14:36 -07:00
Kenneth Graunke
db095eb43b i965: De-duplicate is_expression_commutative() functions.
Create a backend_inst::is_commutative() method to replace two static
functions that did the exact same thing.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2015-03-15 03:14:53 -07:00
Francisco Jerez
447879eb88 i965: Factor out virtual GRF allocation to a separate object.
Right now virtual GRF book-keeping and allocation is performed in each
visitor class separately (among other hundred different things),
leading to duplicated logic in each visitor and preventing layering as
it forces any code that manipulates i965 IR and needs to allocate
virtual registers to depend on the specific visitor that happens to be
used to translate from GLSL IR.

v2: Use realloc()/free() to allocate VGRF book-keeping arrays (Connor).

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-02-10 16:05:47 +02:00
Matt Turner
f0aec4ee1e i965: Don't consider null dst instructions as matching non-null dst.
When performing common subexpression elimination on instructions with
non-null destinations we emit a MOV to copy the result to a new
register that must have no other uses. In the case of:

   cmp.g.f0.0(8) null:D, vgrf43:F, 0.500000f
   ...
   cmp.g.f0.0(8) vgrf113:D, vgrf43:F, 0.500000f

we put the first instruction in the AEB and decided that we could reuse
its result when we found the second. Unfortunately, that meant that we'd
emit a MOV from the first's destination, which is null.

Don't do anything if the entry's destination is null and the
instruction's destination is non-null.

Tested-by: Tapani Pälli <tapani.palli@intel.com>
2015-01-15 10:11:42 -08:00
Matt Turner
3d8188d4f8 i965: Consider SEL.{GE,L} to be commutative operations.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-01-08 15:38:16 -08:00
Matt Turner
a28ad9d4c0 i965/fs: Perform CSE on MOV ..., VF instructions.
Safe from causing optimization loops, since we don't constant propagate
VF arguments.

(for this and the previous patch):
total instructions in shared programs: 4289075 -> 4271932 (-0.40%)
instructions in affected programs:     1616779 -> 1599636 (-1.06%)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2014-12-05 16:43:31 -08:00
Matt Turner
bd50213929 i965: Combine offset/texture_offset fields.
texture_offset was only used by some texturing operations, and offset
was only used by spill/unspill and some URB operations. These fields are
never used at the same time.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2014-11-21 10:26:38 -08:00
Matt Turner
b65bd9583b i965/fs: Perform CSE on MAD instructions with final arguments switched.
Multiplication is commutative.

instructions in affected programs:     48314 -> 47954 (-0.75%)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-10-29 21:35:46 -07:00
Kenneth Graunke
39a5a60b57 i965: Allow CSE on Gen4-5 unary math.
Due to the implicit move-from-GRF, unary math looks a lot like the Gen6+
math instruction: it's a single instruction (SEND) with a GRF source.
The difference is that it also implicitly clobbers a message register.

The only visible effect is that CSE will remove the MRF-clobbering from
later math operations.  This should be fine; compute_to_mrf and
remove_redundant_mrf_writes don't look at the values populated by
implied writes, so they can't rely on those values being present.
Less interference may actually help those passes make more progress.

Binary math is still problematic, since it involves a separate MOV
instruction to load the second operand.  We continue disallowing CSE for
binary math operations.

total instructions in shared programs: 3340303 -> 3340100 (-0.01%)
instructions in affected programs:     26927 -> 26724 (-0.75%)
Nothing hurt, gained, or lost.  ~6% reduction on a few shaders.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-10-15 08:44:54 -07:00
Jason Ekstrand
7210583eb8 i965/fs_reg: Allocate double the number of vgrfs in SIMD16 mode
This is actually the squash of a bunch of different changes.  Individual
commit titles follow:

i965/fs: Always 2-align registers SIMD16 for gen <= 5

i965/fs: Use the register width when applying offsets

   This reworks both byte_offset() and offset() to be more intelligent.
   The byte_offset() function now supports offsets bigger than 32. The
   offset() function uses the byte_offset() function together with the
   register width and the type size to offset the register by the correct
   amount.

i965/fs: Change regs_read to be in hardware registers

i965/fs: Change regs_written to be actual hardware registers

i965/fs: Properly handle register widths in LOAD_PAYLOAD

   The LOAD_PAYLOAD instruction is a bit special because it collects a
   bunch of registers (with possibly different widths) into a single
   payload block.  Once the payload is constructed, it's treated as a
   single block of data and most of the information such as register widths
   doesn't matter anymore.  In particular, the offset of any particular
   source register is the accumulation of the sizes of the previous source
   registers.

i965/fs: Properly set writemasks in LOAD_PAYLOAD

i965/fs: Handle register widths in demote_pull_constants

i965/fs: Get rid of implicit register doubling in the allocator

i965/fs: Reserve enough registers for PLN instructions

i965/fs: Make sources and destinations interfere in 16-wide

i965/fs: Properly handle register widths in CSE

i965/fs: Properly handle register widths in register_coalesce

i965/fs: Properly handle widths in copy propagation

i965/fs: Properly handle register widths in VARYING_PULL_CONSTANT_LOAD

i965/fs: Properly handle register widths and odd register sizes in spilling

i965/fs: Don't waste a register on texture lookups for gen >= 7

   Previously, we were waisting a register in SIMD16 mode because we could
   only allocate registers in pairs.  Now that we can allocate and address
   odd-sized registers, let's get rid of this special-case.

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-09-30 10:29:14 -07:00
Jason Ekstrand
f0d43c09b2 i965/fs: Use offset a lot more places
We have this wonderful offset() function for advancing registers, but we're
not using it.  Using offset() allows us to do some sanity checking and
avoid manually touching fs_reg::reg_offset.  In a few commits, we will make
offset do even more nifty things for us.

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-09-30 10:29:13 -07:00
Matt Turner
49374fab5d i965: Make instruction lists local to the bblocks.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2014-09-24 09:42:46 -07:00
Matt Turner
072ea414d0 i965: Remove cfg-invalidating parameter from invalidate_live_intervals.
Everything has been converted to preserve the CFG.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2014-09-24 09:42:46 -07:00
Matt Turner
20a849b4aa i965: Use basic-block aware insertion/removal functions.
To avoid invalidating and recreating the control flow graph. Also stop
invalidating the CFG in places we didn't add or remove an instruction.

cfg calculations:     202951 -> 80307 (-60.43%)

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2014-08-22 10:23:34 -07:00
Matt Turner
596990d91e i965: Add and use foreach_block macro.
Use this as an opportunity to rename 'block_num' to 'num'. block->num is
clear, and block->block_num has always been redundant.
2014-08-18 18:56:30 -07:00
Jason Ekstrand
f5cc3fdcf1 i965/cse: Don't eliminate instructions with side-effects
This casues problems when converting atomics to use the GRF.  Sometimes the atomic operation would get eaten by CSE when it shouldn't.

v2: Roll the has_side_effects check into is_expression

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-08-11 11:40:32 -07:00
Chris Forbes
0f4c5a70c6 i965: Get rid of backend_instruction::sampler
The generators no longer use this.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2014-08-09 13:12:35 +12:00
Matt Turner
680fe0acb3 i965: Add cfg to backend_visitor.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2014-07-21 10:35:34 -07:00
Matt Turner
1d97212007 i965/fs: Perform CSE on sends-from-GRF rather than textures.
Should potentially allow a few more cases, while avoiding doing CSE on
texture operations on Gen <= 6 with the MRF.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80211
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: lu hua <huax.lu@intel.com>
2014-07-15 10:12:29 -07:00
Matt Turner
1ca6b5d2e8 i965/fs: Invalidate live intervals in opt_cse, not _local.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-07-14 11:27:52 -07:00
Matt Turner
bdbaa9ab5b i965/fs: Move aeb list into opt_cse_local.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-07-14 11:27:52 -07:00
Matt Turner
2e90d1fb62 i965/fs: Pass cfg to calculate_live_intervals().
We've often created the CFG immediately before, so use it when
available.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2014-07-01 08:55:52 -07:00
Matt Turner
266109736a i965: Use typed foreach_in_list_safe instead of foreach_list_safe.
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
2014-07-01 08:55:51 -07:00
Matt Turner
bc2fbbafd2 i965: Add and use foreach_inst_in_block macros.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2014-07-01 08:55:51 -07:00
Matt Turner
22cd917329 mesa: Add and use foreach_in_list_use_after.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2014-07-01 08:55:51 -07:00
Matt Turner
35bc02dee8 i965/fs: Perform CSE on texture operations.
Helps Unigine Tropics and some (old) gstreamer shaders in shader-db.

instructions in affected programs:     792 -> 744 (-6.06%)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-06-17 09:40:31 -07:00
Matt Turner
31ae9c25ff i965/fs: Perform CSE on load_payload instructions if it's not a copy.
Since CSE creates instructions, if we let CSE generate things register
coalescing can't remove, bad things will happen. Only let CSE combine
non-copy load_payloads.

E.g., allow CSE to handle this

   load_payload vgrf4+0, vgrf5, vgrf6

but not this

   load_payload vgrf4+0, vgrf5+0, vgrf5+1
2014-06-17 09:40:30 -07:00
Matt Turner
4b7bca8979 i965/fs: Emit load_payload instead of multiple MOVs for large VGRFs. 2014-06-17 09:40:07 -07:00
Matt Turner
68b7b03429 i965/fs: Only consider real sources when comparing instructions. 2014-06-17 09:38:06 -07:00
Matt Turner
f51a7e00da i965/fs: Clean up tabs in brw_fs_cse.cpp.
I'm adding vec4 CSE, and I want to diff the files.
2014-06-11 20:09:22 -07:00
Kenneth Graunke
3a439534de i965/fs: Allow CSE on math opcodes on Gen6+.
total instructions in shared programs: 2081469 -> 2081248 (-0.01%)
instructions in affected programs:     22606 -> 22385 (-0.98%)
No programs were hurt by this patch.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2014-06-10 16:38:25 -07:00
Matt Turner
b1dcdcde2e i965/fs: Loop from 0 to inst->sources, not 0 to 3.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-06-01 13:29:24 -07:00
Matt Turner
63d57f3b08 i965/fs: Name temporary ralloc contexts something other than mem_ctx.
Or else poor programmers might mistakenly use the temporary mem_ctx,
instead of the fs_visitor's mem_ctx and wonder why their code is
crashing.

Also remove the parenting. These contexts are local to the optimization
passes they're in and are freed at the end.
2014-04-05 09:44:54 -07:00
Matt Turner
d2fcdd0973 i965/cfg: Clean up cfg_t constructors.
parent_mem_ctx was unused since db47074a, so remove the two wrappers
around create() and make create() the constructor.

Reviewed-by: Eric Anholt <eric@anholt.net>
2013-12-04 20:05:42 -08:00
Matt Turner
68349e5219 i965/fs: Don't perform CSE on inst HW_REG dests (unless it's null)
Commit b16b3c87 began performing CSE on CMP instructions with null
destinations. I relaxed the restrictions a bit too much, thereby
allowing CSE to be performed on instructions with, for instance, an
explicit accumulator destination.

This broke the arb_gpu_shader5/fs-imulExtended shader tests because
they emit MUL instructions with the accumulator as the destination. CSE
would instead cause the MUL to write to a GRF, which is lower precision
than the accumulator.

Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: 10.0 <mesa-stable@lists.freedesktop.org>
2013-11-09 09:10:24 -08:00
Matt Turner
b16b3c8703 i965/fs: Perform CSE on CMP(N) instructions.
Optimizes

      cmp.ge.f0(8)  null     g45<8,8,1>F  0F
(+f0) sel(8)        g50<1>F  g40<8,8,1>F  g10<8,8,1>F
      cmp.ge.f0(8)  null     g45<8,8,1>F  0F
(+f0) sel(8)        g51<1>F  g41<8,8,1>F  g11<8,8,1>F
      cmp.ge.f0(8)  null     g45<8,8,1>F  0F
(+f0) sel(8)        g52<1>F  g42<8,8,1>F  g12<8,8,1>F
      cmp.ge.f0(8)  null     g45<8,8,1>F  0F
(+f0) sel(8)        g53<1>F  g43<8,8,1>F  g13<8,8,1>F

into

      cmp.ge.f0(8)  null     g45<8,8,1>F  0F
(+f0) sel(8)        g50<1>F  g40<8,8,1>F  g10<8,8,1>F
(+f0) sel(8)        g51<1>F  g41<8,8,1>F  g11<8,8,1>F
(+f0) sel(8)        g52<1>F  g42<8,8,1>F  g12<8,8,1>F
(+f0) sel(8)        g53<1>F  g43<8,8,1>F  g13<8,8,1>F

total instructions in shared programs: 1644938 -> 1638181 (-0.41%)
instructions in affected programs:     574955 -> 568198 (-1.18%)

Two more 16-wide programs (in L4D2). Some large (-9%) decreases in
instruction count in some of Valve's Source Engine games. No
regressions.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
2013-10-30 19:49:27 -07:00
Matt Turner
219b43c612 i965/fs: Don't emit null MOVs in CSE.
We'd like to CSE some instructions, like CMP, that often have null
destinations. Instead of replacing them with MOVs to null, just don't
emit the MOV.

Reviewed-by: Paul Berry <stereotype441@gmail.com>
2013-10-30 19:49:27 -07:00
Matt Turner
e52959e961 i965/fs: Match commutative expressions with reversed arguments.
total instructions in shared programs: 1645011 -> 1644938 (-0.00%)
instructions in affected programs:     17543 -> 17470 (-0.42%)

Reviewed-by: Eric Anholt <eric@anholt.net>
2013-10-25 10:34:02 -07:00